Rossmann Sales Prediction

This repository documents Team 3's participation in the mini-competition held at Data Science Retreat (Batch 25) on 2-4 February 2021. The team members are:

Alberto Julián
Gert-Jan Dobbelaere
Sergio Vechi

Introduction

This DSR mini-competition is based on a Kaggle competition which ran from Sep 30, 2015 to Dec 15, 2015:

https://www.kaggle.com/c/rossmann-store-sales/overview/

Generic objectives of the competition

Develop an end-to-end Data Science project
Experience working as a Data Science team, sharing Python code and Jupyter notebooks through GitHub
Present results to an audience

Specific objectives of the competition

The Rossmann competition aims to predict sales for more than a thousand stores, based on historical sales and the additional information provided.

Scoring Criteria

The competition is scored on a combination of predictive accuracy, measured with the metric detailed below, and reproducibility.

Information provided

Two CSV files were provided for training the models:

train.csv
store.csv

Both datasets are described in the EDA Jupyter notebook.

Additionally, a holdout test dataset, holdout.csv, was provided to check the accuracy of the models. It has the same format as train.csv and covers the period from 2014-08-01 to 2015-07-31.
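As an illustration of how the two training files might be combined, here is a minimal sketch that joins daily sales rows with per-store attributes. It is not the code used in this repository. The column names (Store, Date, Sales, StoreType) are assumed from the public Kaggle dataset, and the values are made up so that the example runs without the real files.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for train.csv and store.csv (made-up values,
# column names assumed from the Kaggle dataset).
train_csv = io.StringIO(
    "Store,Date,Sales\n"
    "1,2013-01-02,5530\n"
    "2,2013-01-02,4422\n"
    "1,2013-01-03,4327\n"
)
store_csv = io.StringIO(
    "Store,StoreType\n"
    "1,c\n"
    "2,a\n"
)

train = pd.read_csv(train_csv, parse_dates=["Date"])
store = pd.read_csv(store_csv)

# A left join keeps every daily sales row and attaches the store attributes.
# validate= raises an error if a store appears more than once in store.csv.
merged = train.merge(store, on="Store", how="left", validate="many_to_one")
print(merged)
```

With the real files, the two StringIO objects would be replaced by the paths to train.csv and store.csv.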

Content of the repository

Apart from the aforementioned datasets, the following files have been created:

Python files

data_cleaning_rossman.py: performs the data cleaning of the datasets
feature_eng.py: performs the feature engineering of the cleaned datasets
utils.py: plots sales for a selection of stores in several groupings: by month, by day of the week, and by week of the year
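The groupings mentioned for utils.py can be sketched with pandas as follows. This is a hypothetical helper, not the repository's implementation: the function name mean_sales_by, the column names Date and Sales, and the sample values are all assumptions, and the plotting step is left out. It relies on Series.dt.isocalendar(), which requires pandas 1.1 or later.

```python
import pandas as pd

# Made-up daily sales for one hypothetical store.
sales = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-01-07", "2013-01-08", "2013-01-14", "2013-02-04"]
    ),
    "Sales": [100.0, 200.0, 300.0, 400.0],
})


def mean_sales_by(df, mode):
    """Return mean sales grouped by 'month', 'dayofweek' or 'weekofyear'."""
    if mode == "month":
        key = df["Date"].dt.month
    elif mode == "dayofweek":
        key = df["Date"].dt.dayofweek  # Monday is 0, Sunday is 6
    elif mode == "weekofyear":
        key = df["Date"].dt.isocalendar().week.astype(int)  # ISO week number
    else:
        raise ValueError(f"unknown mode: {mode}")
    return df.groupby(key.rename(mode))["Sales"].mean()


for mode in ("month", "dayofweek", "weekofyear"):
    print(mean_sales_by(sales, mode))
```

Each returned Series can be passed directly to a plotting call such as Series.plot(kind="bar").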

Jupyter notebooks

EDA_rossman.ipynb: Exploratory Data Analysis of the datasets
pipeline.ipynb: walks through the stages implemented in the Python files: data cleaning, feature engineering and modelling

Installation instructions

Open a terminal, then create a conda environment, clone the repository, and install the dependencies:

conda create --name ROSSMANN_SALES python=3.7
conda activate ROSSMANN_SALES
git clone https://github.com/albertojulian/rossman-sales-pred
cd rossman-sales-pred
pip install -r requirements.txt
jupyter notebook

The repository should open in the browser. You can now run any of the Jupyter notebooks mentioned above.

Predictive accuracy

The task is to predict the Sales of a given store on a given day.

Submissions are evaluated on the root mean square percentage error (RMSPE):

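import numpy as np

def metric(preds, actuals):
    """Return the RMSPE, in percent, between predictions and actual sales."""
    preds = preds.reshape(-1)
    actuals = actuals.reshape(-1)
    assert preds.shape == actuals.shape
    # Root of the mean squared relative error, scaled to a percentage.
    return 100 * np.linalg.norm((actuals - preds) / actuals) / np.sqrt(preds.shape[0])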
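The metric above divides by the actual sales, so any row with zero sales (for example, a day on which a store is closed) produces an infinite or undefined score. The original Kaggle competition ignores zero-sales rows when scoring. Whether this mini-competition does the same is not stated here, so the variant below is only a sketch under that assumption. The function name rmspe_ignore_zero_sales and the sample numbers are hypothetical.

```python
import numpy as np


def rmspe_ignore_zero_sales(preds, actuals):
    """RMSPE in percent, skipping entries whose actual sales are zero."""
    preds = np.asarray(preds, dtype=float).reshape(-1)
    actuals = np.asarray(actuals, dtype=float).reshape(-1)
    assert preds.shape == actuals.shape
    mask = actuals != 0  # zero actuals would cause a division by zero
    pct_err = (actuals[mask] - preds[mask]) / actuals[mask]
    return 100 * np.sqrt(np.mean(pct_err ** 2))


# Made-up values: the third entry has zero actual sales and is skipped.
actuals = np.array([100.0, 200.0, 0.0])
preds = np.array([110.0, 180.0, 5.0])
score = rmspe_ignore_zero_sales(preds, actuals)
print(score)
```

Both remaining predictions are off by 10% of the actual value, so the score is 10 (up to floating-point rounding).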
