EVERY U.S. FLIGHT IN 2013 ANALYZED
If you read Decision Science News, you are probably interested in decision making, you probably fly a lot, and you probably like making decisions about flying.
Data of the kind the U.S. government provides let us predict how delayed we are likely to be when we fly at various hours of the day.
To make the plot above, we analyzed every single flight in the United States in 2013 for which there were Bureau of Transportation Statistics (BTS) data. Filtering out flights between midnight and 6AM leaves us with a little over six million flights (6,283,085, to be precise). The BTS defines delay as the difference between the time the plane actually arrived and the time listed in the computerized reservation system. Many flights got in early, but because we’re only interested in delays (not speedups), we replaced negative delays with zeroes.
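The cleaning step above can be sketched in Python (the post's own code is in R). This is a minimal sketch, not the author's code: the `Flight` record, its field names, and the assumption that the midnight-to-6AM filter applies to the scheduled departure hour stored in hhmm form are all hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Flight:
    # Hypothetical record layout; real BTS files use their own column names.
    dep_hhmm: int      # scheduled departure time as hhmm, e.g. 1730 for 5:30PM (assumed)
    delay_min: float   # reported delay in minutes; negative means early

def dep_hour(f: Flight) -> int:
    """Hour of day (0-23); the modulo maps an hhmm value of 2400 to hour 0."""
    return (f.dep_hhmm // 100) % 24

def clean(flights):
    """Drop midnight-to-6AM departures and replace negative delays with zero."""
    return [
        replace(f, delay_min=max(f.delay_min, 0.0))  # clamp early arrivals to 0
        for f in flights
        if dep_hour(f) >= 6                           # keep 6AM onward only
    ]

# Made-up sample: a 5:30AM flight (dropped), an early 9:05AM flight, a late 5:30PM flight.
sample = [Flight(530, 12.0), Flight(905, -4.0), Flight(1730, 25.0)]
cleaned = clean(sample)
print([(f.dep_hhmm, f.delay_min) for f in cleaned])  # [(905, 0.0), (1730, 25.0)]
```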
What do we learn?
The later you leave, the greater the average delay you will face, up to around 6PM, when things flatten out. From around 10PM on, leaving later actually pays off. It makes sense that delays increase as the day goes on because, as we understand it, the primary cause of delays is waiting for the plane to arrive from another city. The first flights out in the morning don’t have this problem.
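The hourly averages behind the first plot amount to grouping flights by departure hour and averaging the clamped delays. A minimal Python sketch, assuming each record is a hypothetical (departure_hour, delay_minutes) pair; the sample values are made up for illustration and are not taken from the BTS data.

```python
from collections import defaultdict
from statistics import mean

def mean_delay_by_hour(records):
    """records: iterable of (departure_hour, delay_minutes), delays already clamped at 0."""
    by_hour = defaultdict(list)
    for hour, delay in records:
        by_hour[hour].append(delay)
    # Sort by hour so the result reads left to right like the plot's x-axis.
    return {hour: mean(delays) for hour, delays in sorted(by_hour.items())}

sample = [(7, 0.0), (7, 4.0), (18, 30.0), (18, 10.0), (23, 5.0)]
print(mean_delay_by_hour(sample))  # {7: 2.0, 18: 20.0, 23: 5.0}
```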
About 60% of flights had no delay at all (3,726,061 of 6,283,085, or 59.3%, to be precise). This has something to do with the padding of expected arrival times in the computerized reservation system, hence all the “negative” delays.
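The 59.3% figure is simple arithmetic on the counts given above, and the same share can be computed directly from a list of clamped delays. In this sketch `share_with_no_delay` is a hypothetical helper; after clamping, a delay of exactly zero covers both on-time and early flights.

```python
def share_with_no_delay(delays):
    """Fraction of flights whose clamped delay is exactly zero (on time or early)."""
    delays = list(delays)
    return sum(1 for d in delays if d == 0) / len(delays)

# The counts quoted in the post: 3,726,061 zero-delay flights out of 6,283,085.
print(f"{3_726_061 / 6_283_085:.1%}")              # 59.3%
print(share_with_no_delay([0.0, 0.0, 12.0, 0.0]))  # 0.75
```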
Leaving at 11PM gives you the same average delay as leaving at 11AM. Miracle of miracles. Want a rule of thumb? Try not to leave between 11AM and 11PM.
The arrival and departure curves are quite similar. To save space, we’ll only look at departure delays from here on.
Now, you may be thinking, “A 20-minute delay if you depart at the worst possible time? That’s not such a big deal.” But remember, these are averages, and 60% of the time there will be zero delay. To show how bad things can get, we plot the 95th and 75th percentiles of the delay distribution.
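Percentiles come from the same per-hour grouping as the averages, with quantiles in place of the mean. A minimal sketch using Python's `statistics.quantiles`; the (hour, delay) record format is hypothetical, and `method="inclusive"` (linear interpolation between order statistics) is only one of several percentile conventions, so results may differ slightly from whatever the post's R code computed.

```python
from collections import defaultdict
from statistics import quantiles

def delay_percentiles_by_hour(records):
    """records: iterable of (departure_hour, delay_minutes), delays clamped at 0.

    Returns {hour: {"p75": ..., "p95": ...}}.
    """
    by_hour = defaultdict(list)
    for hour, delay in records:
        by_hour[hour].append(delay)
    result = {}
    for hour, delays in sorted(by_hour.items()):
        if len(delays) < 2:
            continue  # skip hours with too few flights to interpolate percentiles
        # n=20 yields 19 cut points at 5%, 10%, ..., 95%; index 14 is 75%, index 18 is 95%.
        cuts = quantiles(delays, n=20, method="inclusive")
        result[hour] = {"p75": cuts[14], "p95": cuts[18]}
    return result

# Made-up delays for one hour: mostly zeros with a long tail.
sample = [(17, d) for d in [0, 0, 0, 0, 0, 0, 5, 15, 40, 120]]
print(delay_percentiles_by_hour(sample))  # {17: {'p75': 12.5, 'p95': 84.0}}
```

The mean for this toy hour is 18 minutes, yet one flight in ten waits two hours; the upper percentiles surface that tail, which a single average cannot.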
Do different airports have different delay patterns? One might expect them to, given differences in weather, total number of flights, longitude, and the like. Below, we isolate the ten airports with the most passenger traffic.
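Restricting the analysis to a handful of busy airports is a filter followed by the same per-hour grouping, now keyed by airport as well. A minimal sketch: the (origin, hour, delay) record format and both helpers are hypothetical, and because flight records carry no passenger counts, `top_airports` ranks airports by number of departing flights as a stand-in. The post ranks by passenger traffic, which may give a different list.

```python
from collections import Counter, defaultdict
from statistics import mean

def top_airports(origins, k=10):
    """Rank origin airports by departing-flight count (a proxy for passenger traffic)."""
    return [code for code, _ in Counter(origins).most_common(k)]

def mean_delay_by_airport_hour(records, airports):
    """records: (origin, hour, delay). Returns {airport: {hour: mean delay}} for the chosen airports."""
    wanted = set(airports)
    groups = defaultdict(lambda: defaultdict(list))
    for origin, hour, delay in records:
        if origin in wanted:
            groups[origin][hour].append(delay)
    return {
        airport: {hour: mean(ds) for hour, ds in sorted(hours.items())}
        for airport, hours in groups.items()
    }

# Made-up records for illustration.
records = [
    ("ATL", 8, 0.0), ("ATL", 8, 6.0), ("ORD", 18, 40.0),
    ("ORD", 18, 10.0), ("ATL", 18, 20.0), ("LAX", 9, 3.0),
]
busiest = top_airports([r[0] for r in records], k=2)
print(busiest)                                    # ['ATL', 'ORD']
print(mean_delay_by_airport_hour(records, busiest))  # {'ATL': {8: 3.0, 18: 20.0}, 'ORD': {18: 25.0}}
```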
In an early analysis, we thought we’d discovered something pretty cool about day-of-week effects. We had chosen two months at random and noticed that certain days were predictably worse than others. But when we looked at two different months, different days emerged as the worst ones. Digging deeper, we found that the day-of-week effects are attributable mostly to fairly random events that change from month to month. Here we look at median (not mean) delays on every day of 2013. Each panel represents one month.
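The per-day medians can be sketched the same way: group on the calendar date, take the median, and split the results into one panel per month. A minimal sketch with a hypothetical (date, delay) record format and made-up sample values; the median is used because, unlike the mean, it is not dragged upward by a handful of extremely long delays.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def median_delay_by_day(records):
    """records: (flight_date, delay_minutes). Returns {month: {date: median delay}}."""
    by_day = defaultdict(list)
    for day, delay in records:
        by_day[day].append(delay)
    panels = defaultdict(dict)
    for day in sorted(by_day):
        panels[day.month][day] = median(by_day[day])  # one panel per month
    return dict(panels)

records = [
    (date(2013, 1, 2), 0.0), (date(2013, 1, 2), 10.0), (date(2013, 1, 2), 3.0),
    (date(2013, 2, 5), 0.0), (date(2013, 2, 5), 45.0),
]
print(median_delay_by_day(records))
```

To test a day-of-week hunch, group on `day.weekday()` (0 is Monday) instead of the date, and compare many months rather than two, since two months were not enough to separate real effects from one-off events.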
R code, as usual, for those who want it. To get the flight data, just go to … aw heck, I’ll be nice and let you download my cleaned-up copy (25 MB).
This is our first use of Hadley Wickham’s tidyr package. We like it!
1. We just learned of some extensive analyses of pre-2009 flight data that you might find interesting. See the FlowingData blog post. The supplemental information in this paper has some interesting analysis of flight delays. For example, hub airports tend to have a lot of outbound delays because they hold planes when an incoming flight is late. This leads to a lot of arrival delays at non-hub airports. See wicklin-supplemental.pdf page 7.