The Perils of Succeeding “On Average”

Two recent comments on different topics got me thinking about averages, and why people like to talk about them more than they like hearing about them.

Toronto transit expert Steve Munro made this comment on the familiar perils of transit operations in that city:

In Toronto, the TTC reports that routes have average loads on vehicles, and that these fit within standards, without disclosing the range of values, or even attempting any estimate of the latent demand the route is not handling because of undependable service.  Service actually has been cut on routes where the “averages” look just fine, but the quality of service on the street is terrible.  Some of the planning staff understand that extra capacity can be provided by running properly spaced and managed service, but a cultural divide between planning and operations gets in the way.

When it comes to on-time performance, everyone understands that the average isn’t a useful concept.  If your buses are ten minutes early half the time and ten minutes late the other half, then you could say that on average they’re right on time.  We’re all smart enough not to fall for that.

But it’s still common to hear reporting of average load, as Steve mentions.  Average load is the total number of passengers through a point divided by the number of buses/trains that carried them.  It has some uses in transit planning as a way of talking about ridership patterns, but it’s not a good way of describing the customer experience.  To do that, you’d have to look at how often you have incidents that the customer will hate, such as crush-loading and, worse yet, passing up customers at stops for lack of room.  That’s how we talk about on-time performance: percentage of trips that are more than 5 minutes late.  So you should also be reporting the percentage of trips that are crush-loaded or that pass up customers.  (How do you count passed-up customers?  A good question for another post.)

Now obviously, if you want to describe how your system looks to your customers, you should also weight those measurements of overloading by the number of people who experience them, just as one might expect agencies to do for lateness. If your statistics aren’t weighted that way, then, again, you may be describing your operations but you’re not describing your customer experience.
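Here is a rough sketch of that difference in Python. The passenger counts and the crush-load threshold are invented for illustration; they are not any agency's figures or standards, and the function name is my own.

```python
def crowding_summary(loads, crush_load):
    """Summarize crowding for a list of per-vehicle passenger counts.

    loads: passengers on each bus/train past one point (hypothetical data).
    crush_load: assumed load at or above which a vehicle counts as crush-loaded.
    """
    crushed = [load for load in loads if load >= crush_load]
    return {
        # What the agency tends to report: passengers divided by vehicles.
        "average_load": sum(loads) / len(loads),
        # Operations view: what fraction of trips were crush-loaded.
        "share_of_trips_crushed": len(crushed) / len(loads),
        # Customer view: what fraction of riders were on a crush-loaded trip.
        "share_of_riders_crushed": sum(crushed) / sum(loads),
    }


# Six consecutive buses alternating nearly empty and packed (made-up numbers).
summary = crowding_summary([15, 70, 20, 75, 10, 80], crush_load=65)
print(summary)
```

With these made-up numbers the average load is 45, which might look fine against a standard, and half the trips are crush-loaded. But about 83% of the riders were on one of those crush-loaded trips, and that is the number that describes the customer experience.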

Recently, commenter Calwatch mentioned a similar scourge, the bizarre but commonplace notion of “average frequency,” which sometimes undermines the urgent work of frequent network mapping:

[Los Angeles County MTA]’s “12 minute map” is better described as a “5 buses an hour on this street” map, as the current map now has a disclaimer stating that Rapid and local service frequencies are combined to show frequencies. So you may have a Rapid bus running every 20 minutes and a local bus running every half hour, but since it’s five buses an hour, it can be shown on the 12 minute map.

I’m not sure Calwatch is right about this.  Here’s the exact disclaimer from the Los Angeles MTA map:

The bus routes on this map run at least every 12 minutes on weekdays throughout the day.  Where Metro Rapid and Metro Local lines run together, service is available every 12 minutes or better at Metro Rapid stops. At intermediate local stops, service may operate less frequently.

I can understand that as meaning “The Metro Rapid really does run every 12 minutes, and if there are underlying locals, those locals may be less frequent.”  That would be fair, though it raises the question: So why are the less-frequent locals shown on the map?

But Calwatch claims that when they say “every 12 minutes” they merely mean “five buses per hour.”  If that’s true, and I haven’t verified it, then they would be guilty of implicit averaging.  Five buses an hour could mean five buses bunched at the same time each hour, with hour-long gaps between them.  And yes, you could say that on average, that’s a bus every 12 minutes!
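To make the arithmetic concrete, here is a small Python sketch comparing two hypothetical ways of running five buses an hour. It assumes the pattern repeats every hour and that riders show up at random moments rather than timing their arrival to the schedule; the function name is just for illustration.

```python
def headway_summary(headways):
    """Summarize one repeating cycle of gaps (in minutes) between buses."""
    total = sum(headways)
    return {
        # The "implicit average": cycle length divided by number of buses.
        "average_headway": total / len(headways),
        # The worst case a customer can hit.
        "max_gap": max(headways),
        # A rider arriving at a random moment lands in a gap with probability
        # proportional to its length, then waits half of that gap on average.
        "average_wait": sum(h * h for h in headways) / (2 * total),
    }


even = headway_summary([12, 12, 12, 12, 12])   # evenly spaced
bunched = headway_summary([0, 0, 0, 0, 60])    # five buses together, once an hour

print(even)
print(bunched)
```

Both patterns report an average headway of 12 minutes. The evenly spaced one has a 12-minute worst case and a 6-minute average wait; the bunched one has a 60-minute worst case and a 30-minute average wait.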

So remember:

Nobody cares about average frequency, any more than they care about average lateness or average crowding!

The question customers ask is “what is the worst case I’ll typically experience?”  When an agency says the buses come “every 12 minutes” this customer is going to hear “OK.  I’ll never wait longer than 12 minutes.”  Is that customer going to get burned by trusting the 12-minute map?  I’m sure Los Angeles commenters will fill us in.  A comment from someone at the agency would be even more helpful.

I’ve had this conversation with transit agency staffs when doing jobs that required me to map the frequency of the existing system.  More than once, I’ve asked for frequency data and been given data on “scheduled trips per hour.”  I’ve had to go back and say:  Frequency is about maximum waiting time.  By its very nature, it’s a maximum, not an average!  So tell me the maximum, worst-case scheduled gap between consecutive trips!
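The calculation I'm asking for is not hard. Here is a minimal Python sketch, assuming you have the scheduled departure times at one stop as "HH:MM" strings. The timetable below is hypothetical (a rapid every 20 minutes and a local every 30, with offsets I made up), and the helper names are my own.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def worst_scheduled_gap(departures):
    """Return the longest gap, in minutes, between consecutive departures."""
    times = sorted(to_minutes(t) for t in departures)
    return max(later - earlier for earlier, later in zip(times, times[1:]))


# Hypothetical timetable at a stop served by both patterns.
rapid = ["09:00", "09:20", "09:40", "10:00"]
local = ["09:05", "09:35", "10:05"]
combined = rapid + local

# "Scheduled trips per hour" for the 9 o'clock hour.
trips_in_hour = sum(1 for t in combined if t.startswith("09:"))

print(trips_in_hour, worst_scheduled_gap(combined))
```

In this made-up example there are five trips in the hour, which someone could call "every 12 minutes," but the worst scheduled gap is 20 minutes. That second number is the one that belongs on a frequency map.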

So when you hear the word “average” in an agency’s statements, or even if you see an implicit average like “five trips per hour,” ask yourself:  Is the average what I care about as a customer?

21 Responses to The Perils of Succeeding “On Average”

  1. EngineerScotty August 29, 2010 at 8:42 am #

    Averages are abused in the other direction. A common anti-transit poster at TTP was, the other day, claiming that certain LRT systems were unsuccessful because they only “average” 24 passengers per train. Of course, he was averaging over the entire service day (or week), when what matters for LRT systems is peak service.

  2. James D August 29, 2010 at 9:13 am #

    Of course when calculating average waits, you need to include the fraction of the people whom you expect to experience each headway in your calculation. This means you need to square the half-headways, average the squares, and take the square root. At 5bph, only regular 12-minute headways can give a 6-minute average wait if you do the math properly: something like 5-15-15-5-20 is bound to be higher (6.7 minutes in that case, equivalent to just under 4½bph). Something with 6bph could still get under 6 minutes on average whilst violating the 12-minute maximum though: 16-4-16-4-16-4 is the extreme realistic case. So long as it’s this average that’s being used (rather than the trivial one that makes all 5bph services the same), it does have a certain degree of intellectual honesty, although it does fail at inspiring good will.

  3. ant6n August 29, 2010 at 11:42 am #

    When the STM (Montreal) announced their 10 minute max network a couple of days ago, union officials apparently disrupted the announcement press conference. Their concern is that the 10 minute max sounds like a guarantee for on time service, and they are afraid that riders will verbally abuse the bus drivers if they are late (cuz bus riders are poor and that’s what poor people do).

  4. Carl August 29, 2010 at 12:42 pm #

    Great post
    Average frequency is useless, as is average load
    If one bus is empty and the next bus is overloaded, every customer experienced an overload.
    If one bus came after 2 minutes and had 10% of the customers, and the next bus came 18 minutes later, 90% of the customers waited an average of 9 minutes. So even though the route has 10 min headway and 5 min “average” wait, 90% of the customers experienced 9 min average wait.
    Need to have valid measures to determine the real customer experience.

  5. mikef0234 August 29, 2010 at 2:54 pm #

    Average wait is a more useful measure than average frequency from the customer’s perspective. Maximum wait is probably more useful than average wait on infrequent buses, when the maximum is more than ten or fifteen.
    Evenly-spaced buses help reduce bunching anyway.
    In any case, the average customer-wait should be calculated assuming that customers arrive uniformly. A schedule like 16-4-16-4-16-4 has an average customer-wait of 0.75*(16 min)/2+0.25*(4 min)/2 = 6.5 min.

  6. mikef0234 August 29, 2010 at 2:55 pm #

    edit: should be 0.8 and 0.2 instead of 0.75 and 0.25, for 6.8 minutes.

  7. Steve Lax August 29, 2010 at 5:38 pm #

    I have found medians work better than averages; both for passenger loads and running time analysis. (If a transit system can capture every trip every day, the average and median tend to approach each other, but if the system can only sample a few trips and data is not carefully screened for exceptions, I believe the median tends to work better.)
    Also, I have found an absolutely even headway does not work well in all situations and should not always be the goal. For example, on a route with frequent service and short turns, a “short” should lead a “long” by a slightly tighter than “average” headway to prevent customers who need the “long” from being on an extremely crowded (or crush) bus or bypassed in the outbound direction altogether. This is especially important if the ratio of “shorts” to “longs” is 2 or 3 to 1.

  8. observer August 29, 2010 at 6:42 pm #

    Back in the days when I used to perform computer performance analysis I used to start my presentations of the results with: “Remember the old saying, you can drown in a pool of water whose average depth is 6 inches.”

  9. ant6n August 30, 2010 at 12:55 am #

    @observer: You can drown in a pool with a _maximum_ depth of 4 inches! 😉

  10. Gavin August 30, 2010 at 5:25 am #

    “(How do you count passed-up customers? A good question for another post.)”
    I might jump the gun a bit, but I know on Brisbane Transport buses there is a specific code the driver can punch in on their radio to report full loads as they happen, which they obviously must keep tabs on…..
    Every now and then in the paper you’ll see an article saying X% of services were crush loaded in the past X months.

  11. Tom West August 30, 2010 at 6:33 am #

    @ant6n: given you can drown in a pool with a maximum depth of 4 inches, it follows that you can drown in a pool with an average depth of x inches for any x>0. (Have one small part be 4″ deep, and have the rest of the sufficiently large pool be x/2 inches deep).
    … which reinforces the point that averages aren’t as useful as they sometimes seem. The number that matters is the % of buses that are late/overcrowded.

  12. Zoltan Connell August 30, 2010 at 7:39 am #

    Though criticising the measure average frequency, I notice that you do sometimes use the measure of average wait for a given (actual assuming punctuality, rather than average) frequency; for example when modelling transfers.
    I don’t blame you for that, as it’s the most useful measure in a lot of cases. But not all. The more important timing is, the stricter the measure of frequency becomes.
    For example, the previous house I lived in had the good fortune of being 400 metres from a collection of bus routes that together ran downtown 24 times per hour. In practice, buses nearly always ran two-at-a-time, every 5 minutes. That gave an average wait time of 2.5 minutes, and a maximum wait time of 5 minutes assuming punctuality. The most I ever waited for a bus was 10 minutes, so 10 minutes was the absolute maximum wait time. If buses skipped that stop being full, clearly that figure would be affected.
    When going for shopping, a walk, a cup of coffee, etc., I would consider the average wait. When going for a class/appointment/meeting/etc., I would allow for the 5 minute maximum wait assuming punctuality, and if buses were late, I would be too. When going for a specific train, a very important meeting (or a meeting involving a female that I happened to like), I would allow the 10 minute absolute maximum wait time.
    This is something that we’re all probably vaguely aware of, but agencies ought to have it constantly in mind that every measure is important to the passenger’s perception of transit. The latter two are particularly important because they bring together statistics on delays and frequency that aren’t much good on their own. They thus determine how much passengers feel they can rely on transit, and what frequency they perceive, when it’s important.
    So the analysis of these three figures, which might require more sophisticated measuring than some agencies use now, would seem to be the best substitute for these averages.

  13. Carl August 30, 2010 at 9:20 am #

    @Zoltan – you make an interesting point. Trains tend to keep to their scheduled headways. You can schedule 8 trains/hour with 7.5 minute headway and reasonably expect to have 7.5 minute headway and 3.75 minute average wait.
    Buses tend to bunch, whether due to differences in drivers or as a result of the randomness of passenger arrivals or traffic lights. Time at stops is proportional to the number of passengers alighting and boarding, so as soon as the gap from the previous bus increases, the extra passengers being served slow that bus; conversely, if a bus catches up to the previous one, it starts operating faster because it has fewer passengers. If the schedule calls for 12 buses/hr with 5 minute headways, the majority of passengers are more than likely going to experience something closer to 8 minute headways.

  14. Agustin August 30, 2010 at 11:21 am #

    A physicist, an engineer, and a statistician are hunting deer. The physicist takes a shot and says: “missed him by a metre to the left!” The engineer takes a shot and says: “missed him by a metre to the right!”. The statistician smiles and says: “nice work guys, you got him!”

  15. Dan Cooper August 30, 2010 at 3:02 pm #

    My understanding is that here in Vancouver, pass-ups are also intended to be counted through call-in reports from drivers when they occur. From the commentary I’ve heard, this method is not held by anyone to be very reliable.
    I don’t like the term “crush loading,” at least here in Vancouver, because in my observation there are usually just twenty people at the front of the bus, who can’t move back because of five or ten that won’t move back blocking them, while the back of the bus is empty. This leads to people yelling at the driver, even though the driver can’t actually see the back of the bus due to people in the way.
    Speaking of which, @ant6n, it’s also my experience that bus riders in big cities are not just the poor, and that rudeness extends across the social strata. I don’t know, of course, the specific wording of the Montreal transit union’s statement…..
    That all being said, I’m reminded of an example of the “cultural divide between planning and operations,” mentioned by Jarrett. My father, who worked for some years in planning and later management for a small-city transit district, tells the following story, which took place several years after his retirement. A friend who lived across from the local library called him after a major snow storm to express concern that buses were bypassing the library stop, leaving passengers to walk to the next stop and wait 30 minutes for another bus. Dad called a friend of his from the district’s operations department, who told him that yes, they were not running buses to that stop as the buses might get stuck. Dad suggested they put up a sign saying the stop was closed, but Mr. Operations expressed that he could not conceive why they would possibly do such a thing. Dad wished him good day, and called a different friend, only this time in the customer service department. That person sighed, and said they would send someone right over to put up a sign.
    Actually, the same kind of thing happens here in Vancouver – every so often, my son’s bus stop just mysteriously disappears, usually without notice or signage (sometimes, be it said, it seems more due to City construction without notice/coordination), and he has to hunt up another one.

  16. Paul K. McGregor August 30, 2010 at 3:21 pm #

    When I started out doing service planning back in the 90s, I had to rely on averages to make decisions related to service planning. Usually, the sample size was not nearly large enough to be statistically valid so I would always be asking myself if I was making the right decisions. Based on the available data, yes. Based on the customer reality, probably not.
    Now let’s jump ahead to the new century and my how things have changed. With the use now of AVL and APC, data collection has become a lot more automated and allows for more detailed analyses to be done. You should now be able to get a huge amount of data that allows you to look at trip level data and even make comparisons by day. So there isn’t any reason why a good service planning staff that has these tools at their disposal should not be able to better address the concerns that you raise.

  17. Zoltan Connell August 30, 2010 at 4:45 pm #

    Incidentally, my example above is not atypical of bus frequencies in Leeds, and it leeds me to the following principle, confirmed by years of observation:
    Where you run 12 or more buses per hour without signal priority, average wait times are meaningless, as buses will always tend towards bunching.
    This is because at an average frequency of five minutes, the timing of traffic signals creates significant enough differences in the gaps between buses to trigger the bunching effect.
    The result is that it’s unclear how much very high frequency adds to the frequency that passengers perceive. I’ve often thought, therefore, that things might be better if the resources providing 12+ buses per hour went into providing a mix of limited and local buses, with the longest routes into the suburbs making limited stops on the corridor.

  18. Steve Lax August 30, 2010 at 6:16 pm #

    There is a book titled “Why Do Buses Come In Threes? The Hidden Mathematics of Everyday Life” by Eastaway and Wyndham. U.S. publisher is Wiley. (Amazon U.S. had it for $10.95).

  19. Eric August 30, 2010 at 6:34 pm #

    Great observations Zoltan. Esp. that one on signal priority.
    A transit agency in a city near mine relies on “pushers” to improve route performance (those extra buses running between “scheduled” trips in commuting times that attempt to pick up the slack). You just know that average load talk in the planning room is what is behind these things. Indeed, they do seem to improve route performance overall, but customers quickly learn to account for the inefficiency of use due to the bunching effect. While they value them (and hotly demand their pushers), customers know enough not to trust them. (To Jarrett’s point, the bunching effect is good reason why “average load” is useless to gauge dependability and met demand. A crush-loaded scheduled bus with a near-empty pusher following close behind improves “average load”. But a customer’s experience? Not really…They just suspect that things could be running God forbid even worse without the pushers.)

  20. Joseph E August 30, 2010 at 9:33 pm #

    Wow, no one confirmed the story about LA?
    Well, I will confirm that the Metro 12-minute map combines frequencies from Local and Rapid (limited-stop) buses. It also shows a rail line which comes only every 15 minutes during the day.
    You can see the peak, daytime and evening frequency tables on the upper corner of this map: http://www.metro.net/riding_metro/maps/images/System_Map.pdf
    and compare to the 12-minute map: http://www.metro.net/riding_metro/maps/images/12_min_map.pdf
    Sorry for the PDFs. Metro needs to publish something better.
    There are routes on the 12-minute map that qualify only because the local plus limited (rapid) add up to 6 buses per hour: for example, Reseda has local buses every 15 to 20 minutes, and rapids every 25 minutes. There will be at least 2 gaps of 15 minutes (or more) most hours.
    On Ventura west of Reseda you have the 750 every 20 minutes and the 150 every 30 to 40 minutes; not sure how that adds up to even 5 buses per hour! Most of the time you will wait 20 minutes, if you are at a Rapid stop. Waiting at a local bus stop on that street results in a very long wait, up to 40 minutes until the next bus. It’s not surprising that the Orange Line (with 10 minute or better headways all day) is a success, when this is the BEST parallel transit line anywhere nearby.
    Several other streets have equally bad service, with 2 or 3 buses per hour each of Rapid and Local service, really not that frequent.

  21. Joseph E August 30, 2010 at 9:34 pm #

    More data on the Metro 12-minute map:
    Sadly, few of the Rapid routes are frequent in Los Angeles. Half of the 28 “Rapid” routes would come at least every 12 minutes at rush hour, but only the 720 on Wilshire is that frequent all day long, and only 4 others are every 15 minutes all day. Most of the other routes come every 20 to 30 minutes during the day.
    The rapid transit lines all show on the map, but the Green Line LRT is only every 15 minutes during the day, and the Harbor Transitway BRT is even worse, every 20 to 30 minutes off-peak. The other rail and BRT lines are better and actually meet the 12 minute requirement all day, until dropping to 20 minutes in the evening.
    There are 6 local lines that maintain 12 minute or better service all day, by themselves. 7 more are every 15 minutes. One is on Wilshire (no surprise).
    A real “12-minute frequent all-day service on one line” map for Los Angeles would only have 12 lines: 7 bus routes (one with both local and rapid), 4 rail lines and 2 BRT routes. A 15-minute map could have 20 lines.
    12 every 12 minutes! I like that.
    To be fair, there are also a couple LADOT circulators and other shuttles that could be on the map, 4 bus lines in Long Beach, 2 in Santa Monica, and 2 in Montebello (!) that really do provide local service every 12 minutes, by overlapping 2 or 3 routes. By that standard, Metro also has a bunch of other lines near Downtown with combined local service every 12 minutes. But you won’t see a bus or train with the same number every 12 minutes, except on those 12 lines above.