Now, Anyone Can Monitor Reliability

Can you think of a better way to measure service reliability than the ones your transit agencies use?  Can you develop ways to analyze the system’s performance that will reveal more precisely where and why things go wrong?  Now, any transit geek with a head for statistics can try out these ideas, and share what they discover, for any transit agency that publishes a real-time information feed.

San Francisco’s Muni was one of the first transit agencies to make all of its real-time information public.  NextBus uses this data to drive its real-time information displays in shelters and on its website.  But these systems also have a terrific by-product: they generate a historical record of where every bus and train was, at every minute, on every day.  The sheer mass of information makes it possible to aggregate many days’ data so that the noise of random disruptions falls away and you can see the pervasive, routine ways in which the system may be failing (or succeeding!).

Prompted by an earlier post of mine, reader Wai Yip Tung took a shot (temporary link).  Here, he assesses a basic metric that he calls “average arrival.”  As I understand this, he’s assessing the likelihood that you will actually wait a given number of minutes.  The Y-axis is the percentage of cases in which each x-value was observed.


For a high-frequency route, this analysis is far, far more relevant to customer experience than the on-time performance metrics that most transit agencies use.  On-time performance — the difference between a trip’s scheduled arrival time and its actual time — is easy to measure.  But once a line is running better than every 10 minutes, no customer is waiting specifically for the 5:32 as opposed to the 5:35 trip, so the on-time performance of that trip doesn’t matter.  What matters to the customer is the actual waiting time.  That’s what Mr. Tung is measuring here.  It’s an analysis that others could take a lot further.


For a simple model of why this matters, imagine that you have a bus line running every 10 minutes, and every single bus is exactly 10 minutes late.  From the standpoint of a classic on-time performance measure (which typically counts the percentage of trips that are more than five minutes late) this situation would be described as 100% failure, because 100% of all trips are late.  From the customer’s standpoint, on the other hand, this would be perfection: buses are coming every 10 minutes, exactly as promised.  Much more about this conundrum, and the choices it requires us to confront, here.
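To make that toy model concrete, here is a minimal Python sketch (all times in minutes, and entirely made up) showing how the two metrics diverge for the same service:

```python
# Toy model: a 10-minute-headway line where every trip runs exactly 10 minutes late.
scheduled = list(range(0, 60, 10))          # 0, 10, 20, 30, 40, 50
actual = [t + 10 for t in scheduled]        # every trip 10 minutes late

# Classic on-time performance: a trip more than 5 minutes late counts as late.
late = [a - s > 5 for s, a in zip(scheduled, actual)]
otp_failure = 100 * sum(late) / len(late)

# Customer view: the gaps between consecutive actual arrivals.
headways = [b - a for a, b in zip(actual, actual[1:])]

print(otp_failure)   # 100.0 -- "total failure" by the on-time measure
print(headways)      # [10, 10, 10, 10, 10] -- perfect 10-minute service
```

Same data, opposite verdicts: the schedule-based metric reports 100% failure while the headway-based view shows flawless regularity.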

Meanwhile, I encourage other transit geeks to dig into real-time performance data, and especially to aggregate the data for many non-holiday weekdays to smooth out the random accidents and see the more pervasive patterns.  Forget whether some lines are more on time than others.  For frequent service, the interesting question is how long the actual gaps between consecutive buses are.  Perhaps agencies (or failing that, local transit blogs) should publish line-by-line data about the reliability of waiting times.  A statistician would describe the result as the standard deviation of headway, but you could put it in more user-friendly terms:  For a particular line, at a particular stop, you could present the probability that you will wait, say, 10% longer than the ideal average wait.  (The universe of data points would be made up of all the minutes of a day, or other analysis period, and the observed actual waiting time for someone who arrived at the stop at that minute.)
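For anyone who wants to try this, here is a minimal Python sketch of the minute-by-minute method just described, using made-up arrival timestamps at a single stop:

```python
import statistics

# Made-up observed arrival times at one stop, in minutes past midnight.
arrivals = [360, 369, 371, 382, 390, 401, 410]

# Headways between consecutive buses, and their standard deviation.
headways = [b - a for a, b in zip(arrivals, arrivals[1:])]
headway_sd = statistics.pstdev(headways)

# For every minute of the analysis period, the wait a rider arriving at
# that minute would have experienced (time until the next bus).
waits = []
for minute in range(arrivals[0], arrivals[-1]):
    next_bus = min(t for t in arrivals if t > minute)
    waits.append(next_bus - minute)

print(headway_sd, statistics.mean(waits))
```

From the `waits` list you can read off any statistic you like: the mean wait, the standard deviation, or the share of minutes whose wait exceeds some threshold.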

For example, if a line runs every 10 minutes, that means your wait should be an average of 5 min and a maximum of 10.  So what percentage of the time will you actually wait more than 10?  What percentage of the time will you actually wait more than 15?  20?  These simple curves could help people know which transit lines, or parts of lines, they can really count on.  This could be really useful information if you want to make a decision on whether to live on this line, or to buy a vehicle if you do.
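Those exceedance percentages fall directly out of the observed headways: a rider landing at random in a gap of h minutes waits more than t minutes for exactly max(0, h - t) of those minutes. A minimal sketch, with made-up headways for a nominally 10-minute line:

```python
# Share of arriving minutes whose wait exceeds a threshold t.
def frac_waiting_more_than(headways, t):
    return sum(max(0, h - t) for h in headways) / sum(headways)

# Made-up observed headways on a nominally 10-minute line, with bunching.
observed = [3, 17, 9, 11, 2, 18, 10]
for t in (10, 15, 20):
    print(t, round(100 * frac_waiting_more_than(observed, t), 1))
# prints 22.9%, 7.1% and 0.0% for these made-up numbers
```

Run for every stop on a line, these few lines would generate exactly the "can I count on it" curves described above.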

For wider appeal, you could also make a map of the data, perhaps in the Eric Fischer style.  Before long, you’d know as much about your transit system’s operations as its operators do, and be able to make really focused, quantitative comments based on incontrovertible data.  Your transit agency may initially react defensively if you do this, but in the end, you’ll be doing them a favor.

20 Responses to Now, Anyone Can Monitor Reliability

  1. Tom West June 30, 2010 at 6:20 am #

    The “ten minute headways and all ten minutes late equals perfectly on-time” scenario doesn’t hold for those catching the first bus of the day.
    If you have buses every ten minutes, and staff notice a 20-minute gap has appeared, what should they do? The only feasible option is to tell the buses running before the gap to slow down… which means you will always have ‘extra’ buses at the end of the day.

  2. EngineerScotty June 30, 2010 at 7:51 am #

    To build on Tom’s point–what’s missing from this graph is whether or not passengers waiting for a bus at a particular time and place could actually get ON the bus. The first bus of the day being 10 minutes behind (and every one after that following suit) may not be a big deal, given that bus service often gets up and running well before the peak hours; but what if a bus is delayed during the peak? The assumption that everyone waiting for that bus will simply get on the next one, and that everything is then fine, is of course unrealistic on a crowded line–chances are, someone’s going to be waiting for a bus only to watch one sail on by, too full to take on any more passengers. From the point of view of a waiting rider, a crushloaded bus is no better than a late or missing bus.
    An interesting question: Given that this data is available for folks on the Internet to analyze–how about for drivers? Will a given driver have reason to know how far behind (in time) his follower is–other than explicit direction from the dispatcher (or seeing it in the mirrors)?

  3. Alan Robinson June 30, 2010 at 8:55 am #

    I wonder if transit agencies might think about flagging crush loaded busses in this data. I know that some operators are able to indicate “BUS FULL” on the route display. For certain passengers, such as on the 99 B-line in Vancouver, this information would be very useful.
    @Tom West
    Slowing down busses would certainly help reduce the gap. However, I believe the more common, and more customer-pleasing, solutions are to short-turn busses near the end of a route, or to insert a relief bus in the middle of a route.

  4. Jarrett at June 30, 2010 at 9:09 am #

    @Alan.  The source of real-time information is GPS records of the location of each bus at each minute.  Asking it to pick up overcrowding would require driver intervention: flipping some switch that identifies the overload, and switching it off again when the overload ends.  This would reduce the reliability of the data substantially.
    @all.  While overloading is certainly a problem not caught by this data approach, that doesn't invalidate the approach.  Nor does the failure to accurately capture the impact on the first or last trip of the day — truly a vanishingly small issue in the context of frequent services that have many hundreds of trips per day.  Adjacent to the first and last trip are generally low-frequency trips that shouldn't be evaluated by this method anyway.  This whole post is specifically about high frequency services. 

  5. EngineerScotty June 30, 2010 at 9:14 am #

    I think Alan is suggesting that the “bus full” indicator, if present, can be monitored remotely just like position.
    Yes, it does require driver intervention, but if a driver fails to make a scheduled stop (or at a stop, only the exit doors open), some indication of why is useful in any case. (At the minimum, there would need to be some way of distinguishing between stops skipped due to a full bus, and stops skipped due to nobody waiting).

  6. Ted King June 30, 2010 at 10:03 am #

    Two alternatives to a driver-based “BUS FULL” switch :
    1) Optically scan the bus when it enters / leaves the downtown area (e.g. in San Francisco when the bus crosses Van Ness Ave. on O’Farrell / Geary [#38] or Post / Sutter [#2 etc.]);
    2) Have weight and vehicle sensors at nodal points for traffic analysis and heavy vehicle (trucks / buses) checking.
    Of course #1 probably won’t work if the bus has been wrapped with a giant ad but c’est la vie. Also, some systems have a Mark I eyeball version of #1 in place called “field inspectors”.

  7. George June 30, 2010 at 10:54 am #

    I’ve been doing this with TTC data from Toronto. See
    I would love to start a project to implement some open source style ways of analyzing the data. Please contact me if interested.

  8. George June 30, 2010 at 11:06 am #

    It may not be able to determine full loads, but it could determine length of time doors are open. This would help split time from waiting for signals and waiting for passengers…a very useful stat if you are trying to determine whether to move to all-door loading, or if there are signalling issues.

  9. Daniel Howard June 30, 2010 at 11:41 am #

    Where can we get this data? It’d be interesting to visualize some of the less-frequent services. As I recall, one-hour off-peak waits were not uncommon for the 28. 🙁

  10. Eric Fischer June 30, 2010 at 2:14 pm #

    Daniel, you can download the month’s worth of Muni location data that I plotted on that speed map from
    I’m continuing to sample it (except for a power failure yesterday) and hope to be able to calculate some long term trends.

  11. Anthony Palmere June 30, 2010 at 3:13 pm #

    There are several good suggestions and examples for waiting time measures in Chapter 6 of TCRP Report 113, “Using Archived AVL-APC Data to Improve Transit Performance and Management,” available at (you will need to log in to download the material).

  12. Brent June 30, 2010 at 7:39 pm #

    In addition to George Bell’s comment, I would encourage folks interested in this type of analysis to visit Steve Munro’s web site (in particular, the “Service Analysis” category). Steve has done a lot of these types of analyses, and many others, for most of Toronto’s streetcar routes and a couple of its bus routes, and works on this very principle — that, on frequent service routes (headways 10 minutes or less), schedule adherence may matter to the transit agency and vehicle operators, but all that riders care about is headway variation. (George’s and Steve’s analysis methods are a little bit different but could be considered to complement one another.)

  13. Brent June 30, 2010 at 7:55 pm #

    The “average arrival” metric is interesting and easily calculated, but a weighted average arrival would be more representative of what riders actually experience. Here’s an example of what I mean:
    Let’s say we have a route that operates every 10 minutes, and there is a steady flow of 1 passenger per minute arriving at a particular stop (or 10 passengers per bus). If the buses operate like clockwork and always exactly 10 minutes apart, 10 passengers will board on each trip, and on average they will have waited 5 minutes each (half the headway).
    Now let’s say that this same route operates really poorly one day, running in bunches with a gap of 19 minutes followed by another bus one minute later. The capacity of the route and the average headway haven’t changed — there are still six buses per hour, or every 10 minutes on average. However, the rider experience is obviously very different. In essence the route is operating as though it was on a 20-minute headway. The first bus will board 19 passengers (who collectively have experienced an average 9.5-minute wait) and the second will board one passenger (who will have lucked out with an average 0.5-minute wait, not to mention his choice of seat). This will result in a weighted average wait of 9.05 minutes, or a perceived average headway of 18.1 minutes (double the average wait).
    I’m not aware of anywhere that uses perceived average headway as a metric, likely because it was too difficult to get enough meaningful data to do the calculation prior to GPS and NextBus. However, now that this type of data is becoming more common and more easily obtained, I’d like to see it used to report on perceived average headway (as opposed to, or as a comparison with, the mean headway calculated strictly based on buses per hour). I believe that would be a much more meaningful statistic that reflects what riders at a bus stop actually experience.
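    That weighted average is just the classic random-incidence formula: for passengers arriving at a uniform rate, average wait = E[h^2] / (2 E[h]) over the observed headways h, so perceived headway = E[h^2] / E[h]. A quick Python sketch checking the numbers above:

```python
# Perceived average headway under random passenger arrival:
# E[h^2] / E[h], where h ranges over observed headways.
def perceived_headway(headways):
    mean_h = sum(headways) / len(headways)
    mean_h2 = sum(h * h for h in headways) / len(headways)
    return mean_h2 / mean_h

# The bunched example: a 19-minute gap followed by a 1-minute gap.
bunched = [19, 1]
print(perceived_headway(bunched) / 2)   # 9.05 -- weighted average wait
print(perceived_headway(bunched))       # 18.1 -- perceived average headway
```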

  14. Steve Munro July 1, 2010 at 9:48 am #

    Thanks to earlier commenters in this thread who have pointed to analyses of TTC data by me and by “George” at
    I actually started work on this sort of thing before the TTC had installed GPS units in its vehicles, and identified their location via a primitive combination of radio signposts plus hub odometers. A vehicle would be at signpost “1234” plus some distance (whose accuracy varied from vehicle to vehicle). Oddly enough the direction was not included, but was inferred from the schedule. This caused vehicles that were rerouted or short-turned to appear as if they continued on their routes up to a point where the system gave up trying to reconcile their claimed position with the schedule and “teleported” them to a more accurate position. You can imagine what this does to data analysis.
    The new GPS-based data have been available (on request) for streetcar routes for about a year, and should be fully rolled out for bus routes by the end of 2010. This is vastly superior. The 20-second sampling interval on the TTC allows one to see effects such as holds for traffic signals, stops with long loading times, even behaviour going around loops/terminals. Among other things, I have been able to unmask the fact that at some locations streetcars spend more time waiting for “transit priority signals” than they do serving passengers at stops.
    Previous analyses I have posted include charts of actual operations (“Marey” charts) which easily reveal congestion and delays, and show that the latter occur not just in the peak or the core, and that delays are much less common than claimed as a source of service irregularity.
    Once you have the data in a suitable format, it is easy to analyse headways at a point, travel times between points, and especially the scatter in these values. Notable by its absence in my analysis is any consideration of scheduled times which, for frequent routes, are meaningless to passengers.
    Transit agencies love to cite average figures, and indeed accept as “on time” a vehicle that may be more than one headway out of position. They don’t look at the “cloud” of data points to see how many vehicles arrive nowhere near the desired spacing, or how consistent (or inconsistent) travel times between points might be. If travel times are consistent (long term variation in time, but not much short-term scatter), this is “predictable congestion” that the timetable should deal with.
    The TTC has resisted efforts to open up its data in part because many in management see that their usual claims about why service isn’t as good as it might be fall apart when subjected to analysis based on a large amount of historical data. As for real-time data, the NextBus maps for streetcar routes were briefly visible, but taken down because they were too embarrassing. When you can’t run regular service on a quiet Sunday morning or a holiday, something is wrong.
    They have even claimed that exposing this data could be a “security problem” because (gasp!) people would know where a bus might be alone on the system. Clearly they do not expect their timetables to be of much use in acquiring the same information.
    Service quality is what transit systems are selling, but at least in Toronto’s case, showing what’s wrong now and talking about how it might be improved is something TTC management don’t want to do, not unless they can blame every problem on external factors. “Traffic congestion” was their mantra, but ragged service on routes with their own rights-of-way showed that for the canard that it was. An ongoing report of service quality driven by real-time data would allow systems to show how they are improving and, of course, would be a great tool to spot locations where things needed attention.
    Obvious stuff, but in Toronto we have the data and no will to make proper use of it.
    The link to service analyses on my site is:

  15. Steve Munro July 1, 2010 at 10:04 am #

    A separate remark about “perceived headways”.
    In Toronto, the TTC reports that routes have average loads on vehicles, and that these fit within standards, without disclosing the range of values, or even attempting any estimate of the latent demand the route is not handling because of undependable service. Service actually has been cut on routes where the “averages” look just fine, but the quality of service on the street is terrible. Some of the planning staff understand that extra capacity can be provided by running properly spaced and managed service, but a cultural divide between planning and operations gets in the way.
    A major issue for all transit systems is the conflict between being “on time” and its implications for labour contracts (minimization of overtime, getting off work more or less when expected, getting scheduled breaks) and providing regular service.
    This ties back to a post elsewhere here about the benefits of a totally automated system. It has no staff who care about their work schedule. However, most routes need operators, and transit systems (and unions) need to find a way to schedule and pay their staff that does not depend on vehicles always being at the same time and place.
    A simple example would be a route that normally takes one hour to make a trip, and runs with a 6 minute headway using 10 vehicles. If the weather is bad, or an event such as road construction stretches the trip time out to 70 minutes, isn’t it better to run a regular 7 minute headway than to turn the line upside down trying to keep vehicles “on time”?
    I know that things are not as straightforward as this, but a lot of the service annoyances we see here in Toronto arise directly from a goal of keeping operators, rather than vehicles “on time”.
    In the process, perceived average headways are much worse than the schedule.
    (The other source of variation is that vehicles don’t leave terminals on time. The service starts up unevenly and gets worse as it progresses across the route. This is a line management issue the TTC refuses to deal with.)
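    The arithmetic in the 6-minute-headway example is just headway = round-trip cycle time / vehicles in service, which a couple of lines of Python can confirm:

```python
# Headway for a loop route: cycle time divided by vehicles in service.
def headway(cycle_minutes, fleet):
    return cycle_minutes / fleet

print(headway(60, 10))   # 6.0 -- normal conditions: a 6-minute headway
print(headway(70, 10))   # 7.0 -- trip stretched to 70 minutes: an even 7-minute headway
```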

  16. Alon Levy July 1, 2010 at 10:55 am #

    Brent beat me to it. The important metric for customers is perceived headway. The actual average headway is only important for transit expense metrics such as fuel and labor.

  17. Steve Lax July 1, 2010 at 6:04 pm #

    Okay. So we now can determine perceived headway. We might actually see the gaps repeating themselves at about the same time every day. And the riding public wants the gaps fixed. How do we do it?
    When I worked at a transit agency, there was a route that had a published five to ten minute peak headway; but, between 8 AM and 8:45 AM, it routinely had at least one gap of 20 minutes that often turned into 45 minutes! And the riders complained.
    The system did not have GPS or NextBus technology and we devoted huge amounts of manpower to solve the problem. We could not.
    Why? In one three mile stretch of the route, we discovered:
    a. An intersection where the traffic signal was overridden by a police officer in the AM peak. Delay dependent on which officer was on duty.
    b. Major construction that occasionally closed one lane of traffic forcing both directions into an alternating single lane.
    c. Garbage trucks on garbage collection days
    d. A parking enforcement officer who occasionally decided to check on cars parked on this road at the AM peak and block the one available lane of traffic.
    e. A school. Depending on which crossing guard was on duty, traffic was held for straggling students.
    f. A supermarket with a small loading bay. Delivery trucks sometimes parked out on the street blocking traffic waiting for the bay to open.
    We tried to work with the two municipalities involved to solve these problems; but bus service was not a high enough priority for them.

  18. Wai Yip Tung July 2, 2010 at 12:51 am #

    I am Wai Yip Tung. I did this analysis on SFMTA data. First of all, I want to apologize for some errors in the interpretation of the chart. For example, the Y-axis is the count of the number of vehicles during the measuring period (5 evenings), not a percentage. This is entirely my fault, as I sent Jarrett the preliminary result without proper labeling. But aside from the lack of rigor, it does not really invalidate the discussion here.
    The “average arrival” is a metric I coined to help people make sense of the actual wait time. It is closely related to the “perceived headway” described by Brent at 12:55. I’m struggling to come up with a non-geeky term that’s meaningful to a normal person.
    If you look at the chart again, the mean or average headway is 7.5 minutes, which sounds pretty good. But taking the effects of bunching and long gaps into account, my metric says the expected headway will be 5-19 minutes. This is significantly worse than the ideal performance.
    More precisely, this metric means that when someone shows up at a random time, 80% of the time they will board a bus with a headway within that range; 10% of the time they will do better and 10% of the time worse.
    Again, this is a preliminary analysis. I’m hoping to get more insight with further analysis of the data.

  19. dejv July 5, 2010 at 9:40 am #


    But once a line is running better than every 10 minutes, no customer is waiting specifically for the 5:32 as opposed to the 5:35 trip

    Well, I do – in the morning. It allows me to sleep 5 minutes longer. The bus must never leave early, of course, for this to work.

  20. Eric Fischer July 13, 2010 at 12:29 am #

    If anybody is interested, I did a Muni bus operations plot in the Tufte railroad timetable style:
    It’s probably not so good for determining the typical wait time for a bus at a given location, but you can definitely see where there is bunching and gaps in service.