
When to fly to get there on time? Six million flights analyzed.

Filed in Encyclopedia, Ideas, R


[Figure: average arrival and departure delay by hour of day]

If you read Decision Science News, you are probably interested in decision making, you probably fly a lot, and you probably like making decisions about flying.

Data of the type the U.S. Government provides enable us to predict how delayed we will be when we fly at various hours of the day.

To make the plot above, we analyzed every single flight in the United States in 2013 for which there were Bureau of Transportation Statistics data. Filtering out flights between midnight and 6AM leaves us with a little over six million flights (6,283,085 flights, to be precise). The BTS defines delay as the difference between the time the plane actually arrived and the time listed in the computerized reservation system. Many flights got in early, but because we’re just interested in delays (not speedups), we replace negative delays with zeroes.
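For readers who want to try the cleaning step themselves, here is a minimal Python sketch (not the post's R code). It assumes each flight is a pair of scheduled departure hour and delay in minutes; the toy records and the function name `mean_delay_by_hour` are made up for illustration, and filtering on the scheduled departure hour is an assumption about how the midnight-to-6AM cut was applied.

```python
from collections import defaultdict

# Hypothetical toy records: (scheduled departure hour 0-23, delay in minutes).
# A negative delay means the flight was early. These are NOT the BTS data.
flights = [(5, 12), (7, -8), (7, 4), (12, 30), (12, -5), (18, 45), (18, 0)]

def mean_delay_by_hour(records, first_hour=6):
    """Average delay per hour after dropping early-morning flights
    and replacing negative delays with zero."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum of delays, flight count]
    for hour, delay in records:
        if hour < first_hour:             # drop flights between midnight and 6AM
            continue
        totals[hour][0] += max(delay, 0)  # an early flight counts as zero delay
        totals[hour][1] += 1
    return {hour: s / n for hour, (s, n) in sorted(totals.items())}

result = mean_delay_by_hour(flights)
print(result)
```

Note that clamping at zero pulls every hourly average up relative to the raw mean, because early flights no longer offset late ones.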

What do we learn?

The later you leave, the greater the average delay you will face until around 6PM when things flatten out and 10PM when we see benefits in leaving later. It makes sense that delays increase as the day goes on because, we understand, the primary cause of delays is waiting for the plane to arrive from another city. The first flights out in the morning don’t have this problem.

About 60% of flights had no delay at all (3,726,061/6,283,085 or 59.3% to be precise). This has something to do with padding the expected arrival times in the computerized reservation system. Hence all the “negative” delays.
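The percentage above is easy to check from the two counts quoted in the post. The short Python sketch below does that, and also shows how a zero-delay share could be computed from raw delays once negatives are zeroed. The helper `zero_delay_share` and the toy list are hypothetical, not taken from the post's code or data.

```python
def zero_delay_share(delays):
    """Fraction of flights whose delay is zero after clamping negatives to zero."""
    clamped = [max(d, 0) for d in delays]
    return sum(1 for d in clamped if d == 0) / len(clamped)

# The two counts quoted in the post.
no_delay, total = 3_726_061, 6_283_085
post_share = round(100 * no_delay / total, 1)

# Hypothetical toy delays in minutes (negative = early).
toy_share = zero_delay_share([-12, -3, 0, 0, 7, 25, -1, 40, 0, 3])

print(post_share, toy_share)
```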

Leaving at 11PM gives you the same delay as leaving at 11AM. Miracle of miracles. Want a rule of thumb? Try not to leave between 11AM and 11PM.

The arrival and departure curves are quite similar. To save space, we’ll only look at departure delays from here on.

Now, you may be thinking “20 minutes delay if you depart at the worst possible time? That’s not such a big deal.” But remember, these are averages and 60% of the time there will be zero delay. To show you how bad things can get, here we plot the 95th and 75th percentiles of the delay distribution:

[Figure: flight delays by hour, 75th and 95th percentiles]

If you leave at the worst time of day, 1 time in 4 you’ll be delayed more than 20 minutes, and 1 time in 20 you’ll be delayed more than an hour and a half!
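To see why the average hides the bad cases, here is a small Python sketch that compares the mean with the 50th, 75th and 95th percentiles of a delay distribution in which 60% of flights have zero delay. It uses the nearest-rank definition of a percentile, which is an assumption; the post does not say which percentile method was used, and the delay values are invented for illustration.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data are at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical clamped delays (minutes) for one departure hour:
# 12 of 20 flights (60%) have zero delay, the rest form a long tail.
delays = [0] * 12 + [5, 10, 15, 20, 30, 45, 60, 95]

mean_delay = sum(delays) / len(delays)
p50 = percentile(delays, 50)
p75 = percentile(delays, 75)
p95 = percentile(delays, 95)
print(mean_delay, p50, p75, p95)
```

In this toy example the typical flight is not delayed at all, while the unlucky 1 in 20 waits an hour, which is the same qualitative picture as in the plot.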

Do different airports have differing delay patterns? One might expect them to, due to weather, total number of flights, longitude, and the like. Below, we isolate the ten airports with the most passenger traffic:



In an early analysis, we thought we’d discovered something pretty cool about day-of-week effects. We had chosen two months at random and noticed certain days were predictably worse than others. But then, when we looked at two different months, different days emerged as the worst ones. Digging deeper, we found that the day-of-week effects are attributable mostly to rather random events which change from month to month. Here we look at median (not mean) delays on every day of 2013. Each panel represents one month.


The big spike on April 18, 2013? Five inches of rain in Chicago. December 9th, 2013? Delays are mostly due to winter weather in Texas. These little bumps can really alter the day-of-week findings.
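The daily-median view can be sketched in a few lines of Python. The records below are invented (they are not the 2013 BTS numbers), and the function name `median_delay_by_day` is hypothetical; the point is only to show how one bad date can single-handedly make its weekday look like the worst one in a small sample.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical records: (flight date, clamped delay in minutes). Made-up values.
records = [
    (date(2013, 4, 17), 0), (date(2013, 4, 17), 0), (date(2013, 4, 17), 10),
    (date(2013, 4, 18), 40), (date(2013, 4, 18), 55), (date(2013, 4, 18), 90),
    (date(2013, 4, 19), 0), (date(2013, 4, 19), 5), (date(2013, 4, 19), 0),
]

def median_delay_by_day(rows):
    """Group delays by calendar date and take the median of each group."""
    by_day = defaultdict(list)
    for day, delay in rows:
        by_day[day].append(delay)
    return {day: median(delays) for day, delays in sorted(by_day.items())}

daily = median_delay_by_day(records)
worst_day = max(daily, key=daily.get)
print(worst_day, worst_day.strftime("%A"), daily[worst_day])
```

With only a month or two of data, a single storm on a Thursday is enough to crown Thursday the worst day, which is why the weekday pattern changed when different months were sampled.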

Bon voyage!

R-code, as usual, for those who want it. To get the flight data, just go to … aw heck, I’ll be nice and let you download my cleaned up copy (25 Mb)

This is our first use of Hadley Wickham’s tidyr package. We like it!


1. We just learned of some extensive analyses of pre-2009 flight data you might find interesting. See the FlowingData blog post. The supplemental information in this paper has some interesting analysis of flight delays. For example, hub airports tend to have a lot of outbound delays because they hold planes when an incoming flight is late. This leads to a lot of arrival delays at non-hub airports. See wicklin-supplemental.pdf page 7.

2. Poking around at this link, we were able to find somewhat steady day-of-week patterns in this poster, which draws on multi-year data.


  1. António says:

    Nice analysis. Even though this is for the USA, I would imagine the situation is not much different in Europe.

    Did you have a look at the flight cancellations by time of day? Anecdotal personal evidence suggests that there are more cancellations for the last flights of the day if they are predicted to leave late. This would partially explain why there is such a large drop after 9 PM.

    (of course I could also do it myself :))

    November 7, 2014 @ 4:13 am

  2. Wigwam says:

    Does this maybe correlate with weather events? It seems to rain more at certain times of the day.


    November 7, 2014 @ 5:10 pm

  3. dan says:

    António: There do seem to be cancellation data at: http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time

    There are some older data that have cancellation info at http://stat-computing.org/dataexpo/2009/the-data.html

    November 8, 2014 @ 11:47 am

  4. jwhendy says:

    Interesting! I’d toss some labels on your y axes for clarity and so these could stand alone. Assuming minutes and %, but the word “hour” on the x axis initially threw me off on the first plot!

    November 9, 2014 @ 2:50 pm

  5. William Ryan says:

    Just a quick note — recommending leaving after 11 PM is ignoring the very real possibility that a flight will be canceled.

    I have been analyzing flights data to create a model to predict delays, so I went ahead and checked whether the “11 PM to 11 AM” rule of thumb would actually lead to taking more cancelled flights.

    Flights taken after 11 PM are twice as likely to be canceled as flights between 11 AM and 11 PM, so it may actually not be a good idea to fly after 11 PM.

    However, flights taken in the morning before 11 AM are ten times less likely to be cancelled than flights taken from 11 AM to 11 PM, so flying in the morning remains a good idea.

    December 11, 2014 @ 1:07 pm

  6. dan says:

    Great point. And an even simpler rule! Wonder if it applies if you just look at late night flights from the East Coast of the US to Europe. I’ve never been on one of those that was cancelled.

    December 13, 2014 @ 5:29 pm

  7. Daniel Hammocks says:

    What effect does zeroing the negative delays have on the model of the data?

    If you left them in, you could obviously predict whether your flight would leave early. But does this alter the model in any other way? If so, how?

    March 15, 2017 @ 1:36 pm
