A San Francisco reporter emailed me yesterday with this question, regarding the city’s main transit system, Muni:
As you know, Muni set a goal in 1999 when the [San Francisco Municipal Transportation Agency] board was formed, to have an 85 percent on-time performance standard. That was voted on in 1999 (Prop. E). Since then … the agency has yet to meet the goal or even come close to it. The highest it’s been was 75 percent a few months ago. … I wanted to ask you if there is any danger in Muni being so focused on this one standard? Are performance metrics evolving and why are they evolving? What else should Muni be looking at as far as improving reliability?
There are two problems with the measures of “on-time performance” that prevail in the industry.
1. They are not customer-centered. A standard on-time performance measure shows the percentage of services that were on time, not the percentage of riders who were. Because crowded services are more likely to be delayed, the percentage of customers served on time may be lower than the announced on-time performance figure. An on-time performance figure weighted by ridership would give a clearer impression of the actual customer experience; a sketch of the calculation follows this list. This requires counting the on-board load at every timepoint, which is difficult but getting easier. Understandably, nobody wants to report numbers even lower than the current ones, but the truth could help focus attention on where the real problems are: on the busiest services. (It would be trivially easy for the airline industry to do this, but they don’t. Perhaps they want you to think, falsely, that if the whole airline is 90% on time, there’s a 90% chance that you will be on time. If bigger, more crowded planes are more likely to be late, that isn’t true.)
2. For high-frequency, high-volume services, actual frequency matters more. Suppose that a transit line is supposed to run every 10 minutes, but every trip on the line is exactly 10 minutes late. A typical on-time performance metric (e.g. the percentage of trips that are 0-5 minutes late) will declare this situation to be total failure, 0% on-time performance. But to the customer, this situation is perfection.
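To make the first problem concrete before returning to the second, here is a minimal sketch in Python. The trip records and passenger loads are invented for illustration, and the 0-5 minute window is just the example definition of “on time” used in point 2 above; the only point is how far a rider-weighted figure can fall below the usual trip-weighted one when the crowded trips are the late ones.

```python
# Minimal sketch: trip-weighted vs. ridership-weighted on-time performance.
# All trip records are hypothetical; "on time" here means 0-5 minutes late.

trips = [
    # (minutes_late, riders_on_board) -- crowded trips tend to run later
    (0, 20),
    (2, 35),
    (4, 60),
    (7, 90),    # late, and heavily loaded
    (9, 110),   # late, and the most crowded trip of all
]

def is_on_time(minutes_late):
    return 0 <= minutes_late <= 5

# Standard measure: share of *trips* that were on time.
trip_otp = sum(is_on_time(late) for late, _ in trips) / len(trips)

# Customer-centered measure: share of *riders* whose trip was on time.
# This needs on-board loads at each timepoint, which is exactly the data
# that is difficult (but getting easier) to collect.
total_riders = sum(riders for _, riders in trips)
rider_otp = sum(riders for late, riders in trips if is_on_time(late)) / total_riders

print(f"Trip-weighted on-time performance:  {trip_otp:.0%}")   # 60%
print(f"Rider-weighted on-time performance: {rider_otp:.0%}")  # 37%
```

The same five trips score 60% on time by the standard count but only 37% once each trip is weighted by the people on board, because the late trips are also the crowded ones.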
Because of this second problem, some big-city transit agencies use a “headway-maintenance” system, in which a transit operator’s job is not to run on time, but rather to run a specified number of minutes after the preceding vehicle on the line. What should be reported in this case is not the time a bus came, but the actual elapsed time between consecutive trips. GPS-based real-time location systems, which are becoming common, provide the information needed to do this.
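Here is the same kind of sketch for the headway view, replaying the scenario from point 2: a line scheduled every 10 minutes on which every trip runs exactly 10 minutes late. The times and the 2-minute regularity band are illustrative assumptions, not any agency’s actual standard; the point is that the gaps between consecutive vehicles, which GPS location feeds make easy to compute, can be perfect even while timetable-based on-time performance is zero.

```python
# Minimal sketch: the same service judged by schedule adherence and by
# headway regularity. Times are minutes past the hour; all values are
# hypothetical, replaying the scenario in point 2.

scheduled = [0, 10, 20, 30, 40]    # timetable: a bus every 10 minutes
actual    = [10, 20, 30, 40, 50]   # every trip exactly 10 minutes late

# Schedule-based view: a trip is "on time" if it is 0-5 minutes late.
on_time = [0 <= a - s <= 5 for s, a in zip(scheduled, actual)]
otp = sum(on_time) / len(on_time)

# Headway-based view: the gaps customers actually experience.
headways = [b - a for a, b in zip(actual, actual[1:])]
target = 10
# One possible regularity measure (an assumption): share of gaps within
# 2 minutes of the target headway.
regular = sum(abs(h - target) <= 2 for h in headways) / len(headways)

print(f"On-time performance: {otp:.0%}")      # 0% -- "total failure"
print(f"Observed headways:   {headways}")     # [10, 10, 10, 10]
print(f"Headway regularity:  {regular:.0%}")  # 100% -- perfection
```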
It’s worth noting that these two objections to standard on-time performance probably push the reported rate in opposite directions. An on-time performance measure weighted by ridership (responding to my point #1) would almost certainly yield a lower score. But on high-frequency services (my point #2) I suspect that many trips now being counted as late don’t seem that way to the passenger. If the trip ahead was late as well, then the actual gap between trips, which is all that a customer notices on high-frequency service, may be more or less fine. So a shift to headway maintenance might yield numbers that tell a more positive story about the actual customer experience. In any case, once the transit agency is clearly doing what it can to optimize reliability, the case for other improvements, such as stronger transit priority, becomes much easier to make.
Changing these metrics is hard. There’s a lot of room for debate about exactly how to calculate a revised measure, and that’s the easy part. Headway maintenance, in particular, means changing job expectations and management practices. It can affect how drivers are judged, so the unions may have a role. All this takes courage and persistence, so it’s understandable that many agencies are thinking about the issue but aren’t yet ready to act.
The hardest part may be explaining to the public that the previous measures weren’t so good, because someone will ask: “If that’s true, why have you used the imperfect measure for so long?” Fortunately, there’s a good answer: data quality. GPS vehicle location systems have become routine in big North American systems only in the past decade, while smartcard and app-based ticketing technologies, which can pinpoint ridership by time and location, are still coming on. Many of these systems still have bugs, but as they mature they will provide robust data that simply didn’t exist before. Transit managers can say, truthfully, that they’ve always understood the problem with on-time performance but never had the tools to monitor a more nuanced and accurate indicator. But now they do, or soon will.
So perhaps now is the time to update the idea of “on-time performance,” and the ways that agencies pursue it, to achieve a better focus on what matters to the customer.
[Updated July 8, 2023: This 2010 post needed very little editing to bring it current. The ability to measure headway reliability, and to relate reliability to the number of people affected, has improved enormously since I wrote this, although there are still many data challenges. Many more agencies have shifted to headway management, but some, like San Francisco, are stuck with on-time performance measures written into law even if they don’t really measure what matters. This continues to be a lively and important topic.]