Overview
In this project, we built a system that estimates the probability that a traveler will miss a connecting flight. We used public U.S. flight data from the Bureau of Transportation Statistics, which includes fields such as departure and arrival times, the airline, and the origin and destination airports. Rather than only predicting "delayed" or "not delayed," the goal was to produce a probability that helps travelers choose safer itineraries and helps airlines plan for rebooking and routing costs. We cleaned the raw flight records, paired flights into realistic two-leg connections based on layover time, and then compared three prediction models. All three models produced similar results, which suggests that the available features, rather than the choice of model, were the main limit on performance. Important real-world factors such as weather, airport congestion, and late incoming aircraft were not included, so the system mostly learned weaker patterns such as time of day, season, and which airports were involved.
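The connection-matching step can be sketched roughly as follows. This is a minimal illustration and not the project's actual pipeline: the column names (`origin`, `dest`, `sched_dep`, `sched_arr`, `actual_dep`, `actual_arr`), the 30-minute and 4-hour layover bounds, and the rule used to label a connection as missed are all assumptions made for the example.

```python
import pandas as pd

# Assumed layover window; the project's real thresholds are not stated.
MIN_LAYOVER = pd.Timedelta(minutes=30)
MAX_LAYOVER = pd.Timedelta(hours=4)


def build_connections(flights: pd.DataFrame) -> pd.DataFrame:
    """Pair flights into two-leg itineraries that share a connecting airport."""
    first = flights.add_suffix("_1")
    second = flights.add_suffix("_2")
    # Join every inbound flight to every outbound flight at the same airport.
    pairs = first.merge(second, left_on="dest_1", right_on="origin_2")
    # Drop itineraries that fly straight back to the starting airport.
    pairs = pairs[pairs["origin_1"] != pairs["dest_2"]]
    # Keep pairs whose scheduled layover falls inside the allowed window.
    layover = pairs["sched_dep_2"] - pairs["sched_arr_1"]
    pairs = pairs[(layover >= MIN_LAYOVER) & (layover <= MAX_LAYOVER)].copy()
    # Assumed label: the connection is missed if the first leg actually
    # arrives after the second leg actually departs.
    pairs["missed"] = pairs["actual_arr_1"] > pairs["actual_dep_2"]
    return pairs.reset_index(drop=True)


# Tiny made-up sample (hypothetical flights, not BTS records).
flights = pd.DataFrame({
    "flight_id": ["A1", "B2", "C3"],
    "origin": ["BOS", "ORD", "ORD"],
    "dest": ["ORD", "SFO", "DEN"],
    "sched_dep": pd.to_datetime(
        ["2023-01-05 08:00", "2023-01-05 11:30", "2023-01-05 18:00"]),
    "sched_arr": pd.to_datetime(
        ["2023-01-05 10:00", "2023-01-05 14:00", "2023-01-05 20:00"]),
    "actual_dep": pd.to_datetime(
        ["2023-01-05 08:10", "2023-01-05 11:35", "2023-01-05 18:00"]),
    "actual_arr": pd.to_datetime(
        ["2023-01-05 11:45", "2023-01-05 14:05", "2023-01-05 20:00"]),
})
connections = build_connections(flights)
print(connections[["flight_id_1", "flight_id_2", "missed"]])
```

In this sample, only the A1 to B2 pairing survives the layover filter, because the C3 departure is eight hours after A1's scheduled arrival. On real data a full airport-level join like this can become very large, so a practical version would probably also restrict the join by date before filtering on layover time.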
Technologies
Python, pandas, NumPy, Matplotlib, Seaborn, scikit-learn, XGBoost, Feature Engineering, Supervised Learning, Binary Classification, Class Imbalance Handling, Logistic Regression, Tree Ensembles, Gradient Boosting, Categorical Encoding