
speedsprediction_datamining's Introduction

Data Mining project 2019

Politecnico di Milano


Overview

This repository contains the code for the 2019 Data Mining course project at Politecnico di Milano.

The goal is to predict average traffic speeds at specific times and locations on Italian roads.

We ranked first with a mean absolute error of 8.5 km/h (see the Evaluation section for more about the metric).

Description

Traffic conditions are measured by sensors installed along the roads. Each sensor is identified by the road it is on and its km position along that road. Sensors provide measurements at 15-minute intervals, in particular:

  • the average speed
  • the min speed
  • the max speed
  • the speed standard deviation
  • the number of vehicles

During the day, several events may occur (e.g. accidents, roadworks, ...). The data provide the following information about each event:

  • type
  • location
  • duration
  • details

In addition, we have weather information, measured hourly by a set of weather stations. We have:

  • min temperature
  • max temperature
  • nearest sensors

The goal is to predict the average speed of each target sensor for the four 15-minute intervals immediately after the beginning of an event involving that sensor.
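As a rough illustration of which time slots are being predicted, the sketch below computes the four 15-minute slots that follow an event start. It assumes that sensor measurements are aligned to quarter-hour boundaries (:00, :15, :30, :45) and that the first target slot is the first boundary strictly after the event start; neither detail is stated here, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

QUARTER = timedelta(minutes=15)


def next_quarter(ts: datetime) -> datetime:
    """Return the first quarter-hour boundary strictly after ts (assumed alignment)."""
    floored = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
    return floored + QUARTER


def target_slots(event_start: datetime, n_slots: int = 4) -> List[datetime]:
    """Return the timestamps of the n_slots quarter-hour slots after an event start."""
    first = next_quarter(event_start)
    return [first + i * QUARTER for i in range(n_slots)]


# Hypothetical event starting at 10:07
slots = target_slots(datetime(2019, 1, 1, 10, 7))
print([s.strftime("%H:%M") for s in slots])
```

For an event starting at 10:07, this yields the slots at 10:15, 10:30, 10:45 and 11:00.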

[Figure: sample road with events]

Dataset

You can download the dataset here: dataset.zip.

Extract the archive in: resources/dataset/originals.

You will get the following CSV files:

  • speeds.csv.gz: contains the speed measurements
  • events.csv.gz: contains data about the events
  • weather.csv.gz: contains data about the weather
  • distances.csv.gz: contains the weather stations nearest to each sensor. The format of the file is:
road, km | weather_station_id_0, distance_0, weather_station_id_1, distance_1, ...
  • sensors.csv.gz: contains some info about the roads
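The distances.csv.gz row format described above could be parsed along these lines. This is only a sketch: the function name is hypothetical, the sample row is made up, and treating km and distances as floats and station ids as strings is an assumption about the actual file.

```python
from typing import List, Tuple


def parse_distances_line(line: str) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Parse 'road, km | station_id_0, distance_0, station_id_1, distance_1, ...'."""
    sensor_part, stations_part = line.split("|", 1)
    road, km = [field.strip() for field in sensor_part.split(",")]

    # Station ids and distances alternate after the '|' separator.
    fields = [f.strip() for f in stations_part.split(",") if f.strip()]
    if len(fields) % 2 != 0:
        raise ValueError("expected (station_id, distance) pairs")

    stations = [(fields[i], float(fields[i + 1])) for i in range(0, len(fields), 2)]
    return road, float(km), stations


# Made-up row, for illustration only
print(parse_distances_line("A1, 12.5 | 101, 3.2, 205, 7.9"))
```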

The files for speeds, events and weather are each split into 3 files:

  • {file}_train: the file is used as training set
  • {file}_test: the file is used as validation set
  • {file}_2019: the file is used as test set

See the assignment document for additional details.

Evaluation

The evaluation metric is the Mean Absolute Error (MAE) between the real average speeds and the predicted ones.
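For reference, MAE is the average of the absolute differences between real and predicted values. A minimal sketch, with made-up speeds in km/h:

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between real and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one value is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up speeds (km/h): absolute errors are 5, 2, 0 and 10, so MAE = 17 / 4
print(mean_absolute_error([50, 60, 70, 80], [55, 58, 70, 90]))  # 4.25
```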

Our solution

The task is a multi-output regression problem, since we have to predict 4 real values. We built an ensemble of two gradient-boosted tree models, CatBoost and LightGBM. Each of them is trained as a chain with one model per target value: the prediction of a model is passed to the next model as a new feature. In sklearn this is called a regressor chain and is implemented in the multioutput module. The models we used can handle missing values, but the sklearn implementation of the regressor chain does not allow them in the dataset, so we had to modify the sklearn code to allow NaN values.
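The chaining idea can be sketched in plain Python as follows. This is not the project's code and not sklearn's implementation: `SimpleRegressorChain` and `NanTolerantNearestNeighbour` are hypothetical names, and the tiny nearest-neighbour model only stands in for a NaN-tolerant learner such as CatBoost or LightGBM. In this sketch, training appends the true earlier targets as extra features, while prediction appends the earlier models' outputs; the project may do this differently.

```python
import math
from typing import Callable, List, Sequence


def _nan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared distance over the coordinates that are observed in both rows."""
    return sum((u - v) ** 2 for u, v in zip(a, b)
               if not (math.isnan(u) or math.isnan(v)))


class NanTolerantNearestNeighbour:
    """Toy stand-in for a base model that accepts missing values."""

    def fit(self, X, y):
        self._X = [list(row) for row in X]
        self._y = list(y)
        return self

    def predict(self, X):
        def nearest(row):
            return min(range(len(self._X)),
                       key=lambda i: _nan_distance(row, self._X[i]))
        return [self._y[nearest(row)] for row in X]


class SimpleRegressorChain:
    """One model per target; each model sees the earlier targets as features."""

    def __init__(self, make_model: Callable[[], object], n_targets: int):
        self.make_model = make_model
        self.n_targets = n_targets

    def fit(self, X, Y):
        X_aug = [list(row) for row in X]  # NaN values are passed through untouched
        self.models_ = []
        for k in range(self.n_targets):
            y_k = [targets[k] for targets in Y]
            self.models_.append(self.make_model().fit(X_aug, y_k))
            # Training: append the true k-th target as a new feature column.
            X_aug = [row + [value] for row, value in zip(X_aug, y_k)]
        return self

    def predict(self, X) -> List[List[float]]:
        X_aug = [list(row) for row in X]
        outputs = [[] for _ in X]
        for model in self.models_:
            y_hat = model.predict(X_aug)
            for row, out, value in zip(X_aug, outputs, y_hat):
                row.append(value)  # Prediction: feed the output to the next model.
                out.append(value)
        return outputs


# Made-up data: 2 features (one missing), 4 targets for the 4 future slots
nan = float("nan")
X = [[1.0, nan], [2.0, 0.5], [3.0, 1.0]]
Y = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
chain = SimpleRegressorChain(NanTolerantNearestNeighbour, n_targets=4).fit(X, Y)
print(chain.predict([[2.1, nan]]))
```

With a real gradient-boosted model, `make_model` would be a factory returning a fresh regressor exposing `fit` and `predict`.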

[Figure: model ensemble]

Check the project presentation for further details on the models and results.

speedsprediction_datamining's People

Contributors

federicoparroni
