elastic-flights's Introduction

Elastic Stack demo for airline data

US domestic flights ETL flow with weather, geo, delay and airline data for the country's five busiest airports. Uses Logstash, Elasticsearch and Kibana (optionally with the Kibana plugin Timelion).

As an added bonus, there is a separate data set with 2014 TSA claims data.

Running the demo

Complete these steps:

  1. Download data:

    • sh wget.sh
    • Data size is about 2.5 GB. Because of filtering, the size in Elasticsearch will be much smaller, below 100 MB.
  2. Create Elasticsearch indices and templates:

    • sh create_flight_template.sh URI [USERNAME:PASSWORD]
    • sh create_tsaclaims_index.sh URI [USERNAME:PASSWORD]
  3. Ingest flight data into Elasticsearch with Logstash:

    • Optionally put a username/password/host in import_*.conf
    • sh load_tsaclaims.sh && sh load_flights.sh
  4. Create an alias called flights, composed of all flights-* indices:

    • sh create_flight_alias.sh
  5. Create the index patterns in Kibana:

    • tsaclaims with Date Received as time field
    • flights with FlightDateTime as time field
  6. Import Kibana visuals and dashboards:

    • In Kibana, go to Settings, then Objects, then Import kibana_import.json
    • Optional: Timelion is a time series graphing plugin for Kibana, developed by Elastic. Read more about Timelion and how to get it here. It is currently not possible to export or import Timelion sheets, so the charts have to be recreated by hand. Open Timelion, add one chart to the sheet for each of the six expressions below, and paste the expression into it. Don't forget to save the sheet.

    .es(index=flights).label("All Flights"), .es(index=flights, q=ArrDelayMinutes:>0).label("Delayed Flights")

    .static(55).color(red).label("Red Line"), .static(50).color(orange).label("Orange Line"), .es(index=flights, q=ArrDelayMinutes:>0).label("Delayed Flights Percentage").divide(.es(index=flights)).multiply(100).color(navy).movingaverage(5)

    .es(index=flights, metric=avg:tmax).color(orange).lines(width=2).movingaverage(5).label("Maximum Temperature (celsius) mavg=5"), .es(index=flights, metric=avg:tmin).color(lightblue).lines(width=2).movingaverage(5).label("Minimum Temperature (celsius) mavg=5"), .es(index=flights, metric=avg:WeatherDelay).color(Red).movingaverage(5).label("Weather Delay (in minutes) mavg=5")

    .es(index=flights, q=ArrDelayMinutes:>0).label("Delayed Flights Percentage").color(navy).movingaverage(10), .es(index=flights, metric=sum:terribility).label("Terribility Index").movingaverage(10)

    .es(index=tsaclaims, timefield="Date Received").movingaverage(7).label("TSA Claims mavg(7)"), .es(index=flights).movingaverage(7).divide(10).label("Flights mavg(7) /10")

    .es(index=flights, metric=avg:snowfall).divide(10).add(.es(index=flights, metric=avg:thunder)).sum(.es(index=flights, metric=avg:hail).multiply(3)).sum(.es(index=flights, metric=avg:glaze).multiply(2)).sum(.es(index=flights, metric=avg:fog).multiply(1)).sum(.es(index=flights, metric=avg:heavy_fog).multiply(5)).sum(.es(index=flights, metric=avg:dust_ash).multiply(10)).label("Average Terribility(R)").points(4).color(Navy), .es(index=flights, metric=avg:terribility).label("Ingested Terribility(R)")
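The last expression rebuilds the terribility score from its weather components, so it can be compared against the ingested terribility field. Read left to right, the chain is a weighted sum: snowfall divided by 10, plus thunder, plus hail times 3, glaze times 2, fog times 1, heavy_fog times 5 and dust_ash times 10. Below is a minimal sketch of that arithmetic in shell. The function name `terribility` and its argument order are illustrative assumptions, and the weights are read off the Timelion expression, not taken from the Logstash filters.

```shell
# terribility SNOWFALL THUNDER HAIL GLAZE FOG HEAVY_FOG DUST_ASH
# Hypothetical helper: applies the same weights as the Timelion chain above.
terribility() {
  awk -v snow="$1" -v thunder="$2" -v hail="$3" -v glaze="$4" \
      -v fog="$5" -v heavy_fog="$6" -v dust="$7" 'BEGIN {
    score = snow / 10 + thunder
    score += hail * 3 + glaze * 2 + fog * 1
    score += heavy_fog * 5 + dust * 10
    print score
  }'
}

# Example: 20 units of snowfall, one thunder reading, one fog reading.
terribility 20 1 0 0 1 0 0   # 20/10 + 1 + 1 = 4
```

If the two series in the chart diverge, the weights used at ingest time presumably differ from the ones in the expression.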

Prerequisites

  1. Elasticsearch 2.3
  2. Kibana 4.4
  3. Logstash 2.3
  4. Timelion 4.4 (optional)

Other versions may work but are untested. If another version works for you, please consider letting us know with a pull request on this README.
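Before loading data it can help to confirm which Elasticsearch version you are actually talking to. Elasticsearch reports its version number in the JSON returned by its root endpoint. The sketch below only classifies a version string against the tested 2.3 line; the function name `check_es_version` and the variable `TESTED_ES` are hypothetical, and the `curl`/`sed` extraction in the comment is one possible way to obtain the string, not part of this repository.

```shell
# Tested major.minor version, taken from the prerequisites list above.
TESTED_ES="2.3"

# check_es_version VERSION
# Prints "tested" when VERSION is 2.3 or 2.3.x, otherwise "untested".
check_es_version() {
  case "$1" in
    "$TESTED_ES"|"$TESTED_ES".*) echo "tested" ;;
    *) echo "untested" ;;
  esac
}

check_es_version 2.3.1
check_es_version 5.0.0

# Against a live cluster (not run here; assumes a local node on port 9200):
#   v=$(curl -s localhost:9200 | sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p')
#   check_es_version "$v"
```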

What's included

  1. create_*.sh: sets up Elasticsearch templates, mappings (actual mappings in mapping*.json) and aliases
  2. lookup_data/*: airport timezone and weather data for enriching the flight data
  3. logstash/filters/*.rb: four simple Logstash filters to join the lookup data
  4. load_*.sh: invoke Logstash to import the flat data files
  5. remove_indices.sh: remove all indices, mappings, templates and
  6. wget.sh: downloads the flight data files
  7. import_*.conf: configuration files for Logstash. The host is hardcoded in these files, so change it to match your setup
  8. kibana_import.json: Two Dashboards and 43 Visualizations for Kibana
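To give an idea of what the alias setup amounts to, an alias named flights over all flights-* indices can be expressed as a single request to Elasticsearch's `_aliases` endpoint. The sketch below builds that request body and prints it as a dry run. It is an assumption about the shape of the request, not a copy of create_flight_alias.sh; `alias_body` and `ES_URI` are hypothetical names.

```shell
# Hypothetical variable: base URI of your cluster.
ES_URI="${ES_URI:-http://localhost:9200}"

# alias_body INDEX_PATTERN ALIAS
# Prints the JSON body for an "add" action on the _aliases endpoint.
alias_body() {
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}\n' "$1" "$2"
}

# Dry run: show the request instead of sending it.
echo "POST $ES_URI/_aliases"
alias_body 'flights-*' flights

# To send it for real (not run here):
#   alias_body 'flights-*' flights | curl -XPOST "$ES_URI/_aliases" -d @-
```

Printing the body first makes it easy to check the index pattern before touching the cluster.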

Data sources

  1. The airline data is taken from US BTS and is limited to 2014 and the 5 busiest airports: ATL, ORD, JFK, LAX and DFW. To qualify, a flight needs one of these airports as both its origin and its destination.
  2. The weather data is taken from NCEI. For all 5 airports I used the closest weather station (in all cases, that means readings taken at the airport itself).
  3. The timezone data was provided by jpatokal
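The airport restriction can be pictured as a simple row filter. The sketch below reads that rule as "origin and destination are both among the five airports" and applies it to CSV rows with awk. The column layout (origin in column 1, destination in column 2) is an assumption made for brevity since the real BTS files have many more columns, and `keep_top5` is a hypothetical name, not one of the repository's Logstash filters.

```shell
# keep_top5: read CSV rows on stdin and print only those whose origin
# (column 1, assumed) and destination (column 2, assumed) are both in
# the five-airport set.
keep_top5() {
  awk -F',' '
    BEGIN {
      n = split("ATL,ORD,JFK,LAX,DFW", codes, ",")
      for (i = 1; i <= n; i++) top[codes[i]] = 1
    }
    ($1 in top) && ($2 in top)
  '
}

# Only ATL,JFK and DFW,ORD have both endpoints in the set.
printf 'ATL,JFK\nATL,SFO\nBOS,LAX\nDFW,ORD\n' | keep_top5
```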

elastic-flights's People

Contributors

bahaaldine, loekvangool
