Code Monkey home page Code Monkey logo

taxi-rides-prediction-'s Introduction

ChessUrl

Time Series Analysis for Taxi Demand Prediction

Background

The goal of this project is to predict the demand for taxis at different hours of the day. The data used for this project includes the number of rides per daytime and various features that can be extracted from the date and time, such as the day of the week, holidays etc or can join on the datetime column as season, weather, etc. The main objective is to identify patterns and trends in the data and use them to make accurate predictions about future taxi demand.

Initial Questions and Analysis

Initially, the project aims to answer basic questions about trends and causality in the data.

Example for question and answer : Is the number of rides derived from a normal distribution?

ChessUrl

Simple baselines are also created to establish a benchmark for comparison. The first baseline does not use any data from the test set, while the second one uses data from the test set. An example of a baseline that uses different time periods for predicting future demand.

ChessUrl

We also perform Exploratory Data Analysis (EDA) to understand the distribution of the data, identify outliers and missing values, and uncover any underlying patterns or trends.

missing target data points across dates

missing

Modeling

Two models are created to predict taxi demand. The first model is a regular ML model (CatBoost Regressor) that uses independent samples to predict demand. This model is trained using a set of features such as the day of the week, season, and weather, and the goal is to find the best set of features that can improve the accuracy of the predictions. The second model is a time series model, specifically a vanilla LSTM architecture, which is used to show how to use models that are built for time series cases and can generate features by themselves. The LSTM model is trained on the time series data of taxi demand, and the goal is to see how well it can capture the temporal dependencies in the data. Additionally, we do not add additional features even though we believe it could improve the LSTM model, for the sake of simplicity.

Evaluation

Once the models are trained, we evaluate their performance using mean absolute error (MAE). For further research, additional metrics such as mean squared error (MSE) and root mean squared error (RMSE) would also be used. We also perform a visual comparison of the predicted values with the true values to understand the level of accuracy and identify any patterns or trends in the errors.

Note

Please note that this is a first and short iteration, and its main goal is to prove the concept. Given this, there is a lot of room for improvement and further research. Some things that could be added include:

  • Grid search to optimize model parameters
  • Comparison with other time series models such as ARIMA
  • Comparison with other ml models such as regression, loightGBM etc
  • Further EDA to uncover more insights from the data
  • Showing the differences between the results of the test and train sets
  • Testing the statistics of the results, such as the mean, max errors and median
  • Adding more features from external sources
  • Plotting the error against time
  • Adding a histogram of errors for better visualization.
  • Cross validation.
  • Using standard pipeline functionality that exists as part of the scikit-learn framework.

Code Structure

Please note that the code for this project is not written in classes as is typically done. Instead, the code is organized in a procedural manner for simplicity and ease of understanding.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.