99_Challenge

Technical Challenge 99 Data • New York City taxicabs • Isis Santos Costa, Nov 2020 - Jan 2021

Key results

Links to the notebooks, in the order of the data preparation flow

  1. Data Extraction
  2. Data Transformation
  3. Data Transformation, part 2
  4. Data Transformation, part 2 (addendum)
  5. Data Transformation, part 3
  6. EDA
  7. Mᴏᴅᴇʟɪɴɢ: Time Series Analysis • Forecasting (wip)
  8. Mᴏᴅᴇʟɪɴɢ: Machine Learning (soon!)
  9. Rᴇᴘᴏʀᴛ ғᴏʀ ᴛʜᴇ 🎯 CEO 🎯
  10. Rᴇᴘᴏʀᴛ ғᴏʀ ᴛʜᴇ  💲  CFO  💲 (soon!)
  11. Rᴇᴘᴏʀᴛ ғᴏʀ ᴛʜᴇ 🎮 COO 🎮 (soon!)
  12. Pʀᴇᴘᴀʀɪɴɢ ғᴏʀ ᴛʜᴇ ғᴜᴛᴜʀᴇ: Aggregate Data

Notebooks contents

Notebook 1: DATA EXTRACTION

  • Connecting to DB
  • From DB into DF
  • DF into CSV
  • Nᴇxᴛ: Transformation
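
The extraction steps listed above (connect, query into a DataFrame, save as CSV) can be sketched as follows. This is only an illustration under stated assumptions: the README does not say which database the notebook connects to, so an in-memory SQLite database stands in for it, and the `orders` table and its columns are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the real source DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "pickup_datetime TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2014-03-01 08:15:00", 12.5),
        (2, "2014-03-01 09:02:00", 7.0),
        (3, "2014-03-02 18:40:00", 23.1),
    ],
)
conn.commit()

# From DB into DF: run a query and load the result set as a DataFrame.
df_orders = pd.read_sql_query("SELECT * FROM orders", conn)
conn.close()

# DF into CSV: written to a text buffer here; pass a file path to write to disk.
csv_buffer = io.StringIO()
df_orders.to_csv(csv_buffer, index=False)
csv_text = csv_buffer.getvalue()
```

Saving the raw extract to CSV means later notebooks can resume from the file without querying the database again.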

Notebook 2: DATA TRANSFORMATION

  • Setting the DataFrames
    • Taking a first look
    • Naming the columns
    • Basic inter-table consistency
  • Checking & improving the data structure
    • Trips x Orders
    • Payment type
    • Rate type
  • Exporting results for retrieval
  • Continues as Data Transformation, part 2
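
A basic inter-table consistency check between orders and trips might look like the sketch below. The column name `order_id` and the sample rows are hypothetical; the README does not show the actual schema or the checks the notebook performs.

```python
import pandas as pd

# Hypothetical sample tables: every trip should refer to an existing order.
orders = pd.DataFrame({"order_id": [1, 2, 3, 4]})
trips = pd.DataFrame({"order_id": [1, 2, 4, 5], "total_amount": [12.5, 7.0, 23.1, 9.9]})

# Trips whose order_id does not appear in the orders table.
orphan_trips = trips[~trips["order_id"].isin(orders["order_id"])]

# Orders that never became a trip (not an error, but worth counting).
unconverted_orders = orders[~orders["order_id"].isin(trips["order_id"])]

print(f"{len(orphan_trips)} orphan trip(s), {len(unconverted_orders)} unconverted order(s)")
```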

Notebook 3: DATA TRANSFORMATION, part 2

  • Resuming: retrieving the DFs
  • Checking & improving the data structure (cont'd)
    • Rate type (cont'd)
  • Basic information
  • Converting strings into datetime
  • Feature engineering: datetime 📅
    • Orders
    • Trips
  • Treating missing data
  • Exporting results for retrieval
  • Continues as Data Transformation, part 2 (addendum)
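
The datetime steps above (string conversion, missing-data treatment, datetime features) can be illustrated with a minimal pandas sketch. The column names and the choice to drop unparseable rows are assumptions made for the example, not necessarily what the notebook does.

```python
import pandas as pd

# Hypothetical sample: timestamps stored as strings, one of them invalid.
df_trips = pd.DataFrame(
    {"pickup_datetime": ["2014-03-01 08:15:00", "2014-03-02 18:40:00", "not a date"]}
)

# Converting strings into datetime; unparseable values become NaT.
df_trips["pickup_datetime"] = pd.to_datetime(
    df_trips["pickup_datetime"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)

# Treating missing data: count the gaps, then drop them (one possible policy).
n_missing = int(df_trips["pickup_datetime"].isna().sum())
df_trips = df_trips.dropna(subset=["pickup_datetime"]).copy()

# Feature engineering: derive hour and weekday from the timestamp.
df_trips["pickup_hour"] = df_trips["pickup_datetime"].dt.hour
df_trips["pickup_weekday"] = df_trips["pickup_datetime"].dt.day_name()
```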

Notebook 4: DATA TRANSFORMATION, part 2 (addendum)

  • Resuming: retrieving the DFs
  • Feature engineering: datetime 📅 (cont'd)
    • 📈 Adding further new features for better analysis
    • Orders
      • Month
      • Week in the month
    • Trips
      • Month
      • Week in the month
  • Exporting results for retrieval
  • Continues as Data Transformation, part 3 (Final!): (lat, long) ➔ neighborhoods
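
One simple way to derive the "week in the month" feature is sketched below. The definition used here (days 1 to 7 are week 1, days 8 to 14 are week 2, and so on) is an assumption; the notebook may define the weeks differently, for example by calendar rows.

```python
from datetime import date


def week_in_month(d: date) -> int:
    """Return 1 for days 1-7, 2 for days 8-14, ... up to 5 for days 29-31."""
    return (d.day - 1) // 7 + 1


# Example: month and week-in-month features for a single pickup date.
pickup = date(2014, 3, 8)
features = {"month": pickup.month, "week_in_month": week_in_month(pickup)}
print(features)
```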

Notebook 5: DATA TRANSFORMATION, part 3

  • Resuming: retrieving the DFs
  • Combining the tables: outer join 🔗
    • Listing information added when orders get converted: trips_features
    • Adding trips_features to the table of all orders
    • 🗃️ Reordering the columns of the new DF df_orders_tF (𝘵ransformed, 𝘍inal)
  • Feature engineering: 📍coordinates into 🗺️neighborhoods (reverse geocoding)
    • Preparing for heavy processing:
      • Exporting the dataframe as transformed so far 📤
      • Autotime + tqdm (progress bar)
    • Importing: Geopandas • Geopy (Nominatim+RateLimiter) • PyPlot • Plotly_express
    • Constructing Geocoder
    • Reverse geocoding
      • Pilot test ✔️
      • Full scale 📈, with Nominatim:
        • 1st trial, by list ❌
        • nth trial, by list ❌
        • ( Resuming after kernel shutdown • Retrieving data: luckily it had been saved! 😅 )
        • [ try / except ] ❌
        • [ try / except ] querying item by item ✔️ (kinda... 90 days!)
        • [ numba / njit / numpy vectorize ] ✔️ (kinda: not fast enough)
      • Full scale 📈, taking Einstein's advice:
        • « As simple as possible...
        • ... but not simpler »
        • Geometrical approach: using points and polygons to define each borough
        • Getting boroughs coordinates from Google Maps
        • Defining a function for a « good enough » neighborhood location
        • Reverse Geocoding: the light way ✔️✔️✔️
        • Getting coordinates of Manhattan regions from Google Maps
        • Defining a « good enough » function to locate Manhattan regions
        • (Further) Reverse Geocoding: Manhattan regions ✔️✔️✔️
  • Lessons learned
  • Preparing for the future: Creating aggregate data tables
    • Daily data
    • Data by passenger
  • Exporting results for retrieval
  • Nᴇxᴛ: EDA
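
The geometric approach to reverse geocoding listed above can be illustrated with a point-in-polygon test. The sketch below uses the standard ray casting method; whether the notebook uses this exact test is not stated in the README. The region names and coordinates are deliberately artificial placeholders, not real borough boundaries.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: cast a horizontal ray to the right and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(point: Point, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Return the first region whose polygon contains the point, else None."""
    for name, polygon in regions.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Placeholder shapes only; real polygons would use borough corner coordinates.
REGIONS = {
    "region_a": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "region_b": [(4.0, 0.0), (8.0, 0.0), (6.0, 4.0)],
}

print(locate((2.0, 2.0), REGIONS), locate((6.0, 1.0), REGIONS), locate((10.0, 10.0), REGIONS))
```

A check like this runs locally and needs no network requests, which is the main appeal over querying a geocoding service once per row.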

Notebook 6: EDA • EXPLORATORY DATA ANALYSIS

  • Resuming: retrieving the DF, final transformed
    • Data by order
    • Daily data
    • Data by passenger
  • Exᴘʟᴏʀᴀᴛᴏʀʏ Dᴀᴛᴀ Aɴᴀʟʏsɪs • Organizing features into classes
    • Features List • Data by order
    • Features List • Daily data
    • Features List • Data by passenger
    • Features classes • Data by order
    • Features classes • Daily data
    • Features classes • Data by passenger
  • Exᴘʟᴏʀᴀᴛᴏʀʏ Dᴀᴛᴀ Aɴᴀʟʏsɪs • Importing matplotlib
  • Exᴘʟᴏʀᴀᴛᴏʀʏ Dᴀᴛᴀ Aɴᴀʟʏsɪs • Raw data (by order) ‖ Correlations
    • Predictor x Predictor: Trip duration vs. distance
    • Predictor x Predictor: Dropoff latitude vs. Pickup latitude
    • Predictor x Predictor: Dropoff longitude vs. Pickup longitude
    • Predictor x Predictor: Dropoff datetime vs. Pickup datetime
    • Predictor x Predictor: Speed vs. Pickup hour
    • Target x Predictor: tip_amount (Hᴀᴘᴘɪɴᴇss) vs. Pickup datetime
    • Target x Predictor: total_amount (Rᴇᴠᴇɴᴜᴇ) vs. Pickup datetime
    • Target x Predictor: tip_amount (Hᴀᴘᴘɪɴᴇss) vs. Trip duration
    • Target x Predictor: total_amount (Rᴇᴠᴇɴᴜᴇ) vs. Trip duration
    • Target x Predictor: tip_amount (Hᴀᴘᴘɪɴᴇss) vs. Trip length
    • Target x Predictor: total_amount (Rᴇᴠᴇɴᴜᴇ) vs. Trip length
  • Exᴘʟᴏʀᴀᴛᴏʀʏ Dᴀᴛᴀ Aɴᴀʟʏsɪs • Aggregations: frequency distributions
    • Tʀɪᴘs: Frequency distribution by month (%)
    • Tʀɪᴘs: Frequency distribution by week of the month (%) [normalized]
    • Tʀɪᴘs: Frequency distribution by day of week (%)
    • Tʀɪᴘs: Frequency distribution by pickup hour (%)
    • Tʀɪᴘs: Frequency distribution by pickup time of the day (%)
    • Tʀɪᴘs: Frequency distribution by pickup time of the day (% hourly demand)
    • Tʀɪᴘs: Frequency distribution by trajectory duration (%)
    • Tʀɪᴘs: Frequency distribution by trajectory length (%)
    • Tʀɪᴘs: Frequency distribution by pickup borough (%)
    • Tʀɪᴘs: Frequency distribution by dropoff borough (%)
    • Tʀɪᴘs: Frequency distribution by rate type (%)
    • Tʀɪᴘs: Frequency distribution by payment type (%)
    • Tʀɪᴘs: Frequency distribution by tip amount (%)
    • Tʀɪᴘs: Frequency distribution by total amount (%)
    • Tʀɪᴘs: Frequency distribution by average speed, mph (%)
  • Exporting results for retrieval
  • Nᴇxᴛ: Mᴏᴅᴇʟɪɴɢ: Time Series Analysis • Forecasting
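
The frequency distributions listed above share one pattern: count the trips per category and express each count as a percentage of the total. A minimal pandas sketch, with a hypothetical `pickup_hour` column and made-up sample values:

```python
import pandas as pd

# Hypothetical sample of pickup hours, one row per trip.
trips = pd.DataFrame({"pickup_hour": [8, 8, 9, 18, 18, 18, 23, 8]})


def frequency_pct(series: pd.Series) -> pd.Series:
    """Share of rows per distinct value, in percent, ordered by value."""
    return (series.value_counts(normalize=True).sort_index() * 100).round(1)


hourly_pct = frequency_pct(trips["pickup_hour"])
print(hourly_pct)
```

The same helper applies to any categorical column, such as borough, rate type, or payment type.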

Notebook 9: Rᴇᴘᴏʀᴛ ғᴏʀ ᴛʜᴇ CEO ‖ Mar-May 2014 Operations

  • Cᴏɴᴄᴇᴘᴛᴜᴀʟ Fʀᴀᴍᴇᴡᴏʀᴋ • Vɪsɪᴏɴᴀʀʏ
  • Dʀɪᴠɪɴɢ Qᴜᴇsᴛɪᴏɴs
  • Hɪɢʜʟɪɢʜᴛs
    • PROFIT 📈: REVENUE PROFILE
    • PEOPLE 👥: PASSENGERS' ROUTINE
  • Sᴜᴘᴘᴏʀᴛɪɴɢ Dᴀᴛᴀ: BUSINESS SUSTAINABILITY • PROFIT 📈
    • CWGR: Compound Weekly Growth Rate
    • WEEKLY PROFILE
  • Sᴜᴘᴘᴏʀᴛɪɴɢ Dᴀᴛᴀ: BUSINESS SUSTAINABILITY • PEOPLE 👥
    • % TRIPS BY DURATION ⏲️
    • % TRIPS BY DISTANCE
    • % TRIPS BY TIME OF THE DAY 🌃
  • Sᴜᴘᴘᴏʀᴛɪɴɢ Dᴀᴛᴀ: BUSINESS SUSTAINABILITY • PLANET 🌎
    • SOON!
  • See also:
    • Rᴇᴘᴏʀᴛ ғᴏʀ ᴛʜᴇ CFO
    • Rᴇᴘᴏʀᴛ ғᴏʀ ᴛʜᴇ COO
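
The CWGR mentioned above can be computed by analogy with the compound annual growth rate, using weeks as the compounding period. The README does not give the report's exact formula, so the definition below is an assumption: CWGR = (last / first)^(1 / n) - 1, where n is the number of weekly intervals between the two values.

```python
def cwgr(first_week_value: float, last_week_value: float, n_weeks: int) -> float:
    """Compound weekly growth rate over n_weeks weekly intervals (assumed definition)."""
    if first_week_value <= 0 or n_weeks <= 0:
        raise ValueError("first_week_value and n_weeks must be positive")
    return (last_week_value / first_week_value) ** (1 / n_weeks) - 1


# Made-up figures: weekly revenue growing from 100 to 121 over two weekly intervals.
rate = cwgr(100.0, 121.0, 2)
print(f"CWGR: {rate:.1%}")
```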
