SQL Queries Demonstration: Airline Operations Analysis¶
In this notebook, we'll explore various aspects of airline operations using SQL queries. Our dataset comprises three tables: airlines
, airports
, and flights
. By leveraging SQL queries, we aim to extract valuable insights to understand airline performance, identify trends, and address operational challenges.
Key Questions Addressed:¶
Distinct Airlines:
- Understanding the variety of airlines operating in our dataset.
Maximum Distance Analysis by Airline:
- Determining which airlines operate flights covering the longest distances.
Busiest Departure Days from Top 10 Airports:
- Identifying the busiest days for departures from the top 10 airports.
Cancellation Ratio Analysis by Airline:
- Assessing the cancellation ratios of different airlines to gauge their reliability.
Top 10 Airports with the Highest Average Arrival Delays in January:
- Identifying airports with significant average arrival delays in a specific month.
Analysis of On-time Landings Despite Departure Delays Exceeding 15 Minutes by Airlines:
- Evaluating which airlines maintain on-time arrivals despite departure delays.
Dataset Overview:¶
Airlines Table:
- Contains information about various airlines, including their names and unique identifiers.
Airports Table:
- Provides details about airports, such as their names, locations, and unique codes.
Flights Table:
- Includes comprehensive flight data, such as departure/arrival times, distances, cancellations, delays, and associated airline and airport codes.
Methodology:¶
- We'll utilize SQL queries to extract relevant information from the dataset.
- Each query will be accompanied by a brief explanation of its purpose and the insights it provides.
- Visualizations may complement some analyses for clearer interpretation.
Let's Dive In!¶
Without further ado, let's delve into our SQL queries to uncover valuable insights into airline operations and performance.
# @hidden_cell
%load_ext sql
%sql sqlite:///jupyter_sql_tutorial.db
Distinct airlines¶
This query selects unique values from the "AIRLINE" column of the airlines
table. It retrieves only distinct airline names, meaning each airline name appears only once in the result set, even if it occurs multiple times in the original table.
%%sql
select distinct AIRLINE
from airlines
* sqlite:///jupyter_sql_tutorial.db Done.
AIRLINE |
---|
United Air Lines Inc. |
American Airlines Inc. |
US Airways Inc. |
Frontier Airlines Inc. |
JetBlue Airways |
Skywest Airlines Inc. |
Alaska Airlines Inc. |
Spirit Air Lines |
Southwest Airlines Co. |
Delta Air Lines Inc. |
Atlantic Southeast Airlines |
Hawaiian Airlines Inc. |
American Eagle Airlines Inc. |
Virgin America |
Maximum Distance Analysis by Airline¶
This query retrieves the maximum distance traveled for each airline from the flights
table. It groups the data by the airline name (AIRLINE) and selects the maximum distance (max(DISTANCE)) for each group. So, the result contains one row per airline, with the airline name and the maximum distance traveled by that airline.
%%sql
select AIRLINE ,max(DISTANCE)
from flights
group by 1
* sqlite:///jupyter_sql_tutorial.db Done.
AIRLINE | max(DISTANCE) |
---|---|
AA | 3784 |
AS | 2874 |
B6 | 2704 |
DL | 4983 |
EV | 1330 |
F9 | 2065 |
HA | 4983 |
MQ | 1236 |
NK | 2342 |
OO | 1735 |
UA | 4962 |
US | 2979 |
VX | 2704 |
WN | 2447 |
Busiest Departure Days from Top 10 Airports¶
This query identifies the busiest departure days for flights originating from the top 10 airports. Here's a summary:
Identifying Top Airports:
- Selects the top 10 airports based on the number of departing flights from each airport.
Ranking Departure Days:
- Calculates the number of departures for each airport on each day and ranks them based on departure count.
Selecting Relevant Information:
- Retrieves the airport name, city, departure date, and departure number for the busiest departure day of each airport.
Joining with Airport Data:
- Joins the departure data with airport information to enrich the results.
Filtering and Ordering:
- Filters the results to include only the busiest departure day for each airport.
- Orders the results by the number of departures on the busiest day in descending order.
Overall, the query provides insights into the busiest departure days for flights from the top 10 airports, offering details such as airport name, city, departure date, and the number of departures on those days.
%%sql
with top_airport as (select ORIGIN_AIRPORT, count(*)
from flights
group by 1
order by 2 desc
limit 10),
Ranking_days as (
select F.ORIGIN_AIRPORT,
F.YEAR*10000+F.Month*100+F.DAY as date,
count(*) departure_number,
RANK() OVER (PARTITION BY F.ORIGIN_AIRPORT ORDER BY count(*) DESC) as ranking
from flights F
join top_airport T
on F.ORIGIN_AIRPORT = T.ORIGIN_AIRPORT
group by 1,2
order by 1,2)
select A.AIRPORT,
CITY,
R.date,
R.departure_number
from Ranking_days R
join airport A
on R.ORIGIN_AIRPORT = A.IATA_CODE
where ranking=1
order by departure_number desc
* sqlite:///jupyter_sql_tutorial.db Done.
AIRPORT | CITY | date | departure_number |
---|---|---|---|
Hartsfield-Jackson Atlanta International Airport | Atlanta | 20150213 | 1069 |
Chicago O'Hare International Airport | Chicago | 20150220 | 847 |
Dallas/Fort Worth International Airport | Dallas-Fort Worth | 20150102 | 809 |
Denver International Airport | Denver | 20150102 | 658 |
Los Angeles International Airport | Los Angeles | 20150102 | 640 |
George Bush Intercontinental Airport | Houston | 20150102 | 508 |
San Francisco International Airport | San Francisco | 20150102 | 493 |
Phoenix Sky Harbor International Airport | Phoenix | 20150102 | 487 |
McCarran International Airport | Las Vegas | 20150220 | 399 |
Orlando International Airport | Orlando | 20150103 | 360 |
Cancellation Ratio Analysis by Airline¶
The query calculates the cancellation ratio for each airline, indicating the proportion of cancelled flights relative to the total number of flights operated by each airline. It achieves this through the following steps:
Selection of Relevant Data:
- It selects the airline name (
A.AIRLINE
) and calculates the cancellation ratio.
- It selects the airline name (
Calculation of Cancellation Ratio:
- The cancellation ratio is computed by dividing the total number of cancelled flights by the total count of flights operated by each airline. This ratio is rounded to four decimal places.
Joining Tables:
- It joins the
flights
table (F
) with theairlines
table (A
) using the airline IATA code to access additional airline information.
- It joins the
Grouping by Airline:
- The results are grouped by airline to aggregate flight data for each carrier.
Ordering by Cancellation Ratio:
- Finally, the results are ordered by the cancellation ratio in descending order, showcasing which airlines have the highest proportion of cancelled flights relative to the total number of flights they operate.
%%sql
select A.AIRLINE,
round(sum(F.CANCELLED) * 1.00/ count(F.CANCELLED),4) as ratio_cancelled
from flights F
join airlines A
on F.AIRLINE = A.IATA_CODE
group by 1
order by 2 desc
* sqlite:///jupyter_sql_tutorial.db Done.
AIRLINE | ratio_cancelled |
---|---|
American Eagle Airlines Inc. | 0.0989 |
JetBlue Airways | 0.0606 |
Atlantic Southeast Airlines | 0.0489 |
US Airways Inc. | 0.0383 |
American Airlines Inc. | 0.0323 |
Skywest Airlines Inc. | 0.0292 |
Southwest Airlines Co. | 0.0279 |
United Air Lines Inc. | 0.0277 |
Virgin America | 0.0273 |
Spirit Air Lines | 0.0199 |
Delta Air Lines Inc. | 0.0191 |
Frontier Airlines Inc. | 0.0159 |
Alaska Airlines Inc. | 0.0062 |
Hawaiian Airlines Inc. | 0.0025 |
Top 10 Airports with the Highest Average Arrival Delays in January¶
The query calculates the average arrival delay for flights arriving at each airport during the month of January. It achieves this through the following steps:
Selection of Relevant Data:
- It selects the airport name (
A.AIRPORT
), city (A.CITY
), and calculates the average delay.
- It selects the airport name (
Calculation of Average Delay:
- The average delay is computed by taking the average of arrival delays (
ARRIVAL_DELAY
) for flights arriving at each airport. If the arrival delay is negative (i.e., the flight arrives early), it is treated as 0.
- The average delay is computed by taking the average of arrival delays (
Joining Tables:
- It joins the
flights
table (F
) with theairport
table (A
) using the destination airport IATA code to access airport information.
- It joins the
Filtering by Month:
- The query restricts the data to flights that occurred in January (
month = 1
).
- The query restricts the data to flights that occurred in January (
Grouping by Airport:
- The results are grouped by airport name and city to aggregate flight data for each destination airport.
Ordering and Limiting Results:
- The results are ordered by the average delay in descending order.
- Only the top 10 airports with the highest average delays are included in the output.
In summary, the query provides insights into the airports with the highest average arrival delays for flights arriving in January, showing the airport name, city, and the average delay for each of the top 10 airports.
%%sql
select A.AIRPORT,
A.CITY,
round(avg(case when ARRIVAL_DELAY<0 then 0
else ARRIVAL_DELAY end),2) as avg_delay
from flights F
join airport A
on F.DESTINATION_AIRPORT = A.IATA_CODE
where month = 1
group by A.AIRPORT,A.CITY
order by 3 desc
limit 10
* sqlite:///jupyter_sql_tutorial.db Done.
AIRPORT | CITY | avg_delay |
---|---|---|
Sawyer International Airport | Marquette | 47.46 |
Waterloo Regional Airport | Waterloo | 36.31 |
Jamestown Regional Airport | Jamestown | 33.15 |
University of Illinois - Willard Airport | Champaign/Urbana | 32.98 |
Southwest Oregon Regional Airport (North Bend Municipal) | North Bend | 32.44 |
Central Wisconsin Airport | Mosinee | 32.06 |
Devils Lake Regional Airport | Devils Lake | 31.5 |
Alpena County Regional Airport | Alpena | 31.12 |
Toledo Express Airport | Toledo | 30.49 |
Sioux Gateway Airport | Sioux City | 30.13 |
Analysis of On-Time Landings Despite Departure Delays Exceeding 15 Minutes by Airlines¶
The query aims to analyze the proportion of flights that depart with delays greater than 15 minutes and subsequently land on time, for each airline. It accomplishes this through the following steps:
Selection of Delayed Departures:
- The query begins by selecting only those flights where the departure delay exceeds 15 minutes. These flights are stored in a CTE (Common Table Expression) called
delay_departure
.
- The query begins by selecting only those flights where the departure delay exceeds 15 minutes. These flights are stored in a CTE (Common Table Expression) called
Pre-processing Data:
- The
pre_resu
CTE calculates the count of delayed take-offs (take_off_delayed
) and the count of flights that land on time (landed_in_time
) for each airline. It aggregates the data by airline.
- The
Calculation of Ratios:
- It then calculates the ratio of flights that land on time to the total number of delayed take-offs. This ratio is rounded to three decimal places.
Final Selection and Ordering:
- The final select statement retrieves the airline name, count of delayed take-offs, count of flights that landed on time, and the calculated ratio. The results are ordered by the ratio of on-time landings to delayed take-offs in descending order.
Overall, the query provides insights into the performance of airlines in managing delays by analyzing the proportion of flights that depart late but arrive on time.
%%sql
-- select only delay flight by more than 15min at departure
with delay_departure as (select *
from flights
where DEPARTURE_DELAY > 15),
pre_resu as (
select
A.airline ,
count(*) as take_off_delayed ,
sum( case when D.ARRIVAL_DELAY<= 0 then 1
else 0
end) as landed_in_time
from delay_departure D
join airlines A
on D.AIRLINE = A.IATA_CODE
group by 1)
select
airline,
take_off_delayed,
landed_in_time,
round(landed_in_time*1.00 / take_off_delayed,3) as ratio_delay_dep_landed_in_time
from pre_resu
order by 4 desc
* sqlite:///jupyter_sql_tutorial.db Done.
airline | take_off_delayed | landed_in_time | ratio_delay_dep_landed_in_time |
---|---|---|---|
United Air Lines Inc. | 17005 | 1380 | 0.081 |
Virgin America | 1370 | 110 | 0.08 |
Delta Air Lines Inc. | 18347 | 1168 | 0.064 |
Alaska Airlines Inc. | 2842 | 134 | 0.047 |
American Airlines Inc. | 14034 | 517 | 0.037 |
Southwest Airlines Co. | 34998 | 1243 | 0.036 |
JetBlue Airways | 9557 | 290 | 0.03 |
US Airways Inc. | 8997 | 251 | 0.028 |
Spirit Air Lines | 3972 | 89 | 0.022 |
Hawaiian Airlines Inc. | 1042 | 18 | 0.017 |
Skywest Airlines Inc. | 17316 | 259 | 0.015 |
Atlantic Southeast Airlines | 16767 | 231 | 0.014 |
Frontier Airlines Inc. | 3484 | 33 | 0.009 |
American Eagle Airlines Inc. | 14273 | 116 | 0.008 |