Rossmann Sales Prediction

This repository documents Team 3's participation in the mini-competition run at Data Science Retreat (Batch 25) on 2-4 February 2021. The team members were:

  • Alberto Julián
  • Gert-Jan Dobbelaere
  • Sergio Vechi

Introduction

This DSR mini-competition is based on a Kaggle competition which ran from Sep 30, 2015 to Dec 15, 2015:

https://www.kaggle.com/c/rossmann-store-sales/overview/

Generic objectives of the competition

  • Develop an end-to-end Data Science project
  • Experience working as a Data Science team, sharing Python code and Jupyter notebooks through GitHub
  • Present results to an audience

Specific objectives of the competition

The Rossmann competition aims to predict sales for more than a thousand stores, based on historical sales and the additional information provided.

Scoring Criteria

The competition is scored on a composite of predictive accuracy, measured with the metric detailed below, and reproducibility.

Information provided

Two CSV files were provided for training the models:

  • train.csv
  • store.csv

Both datasets are described in the EDA Jupyter notebook.
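Since store.csv holds per-store information and train.csv holds the sales history, the two are typically combined on a store identifier before modelling. The following is a minimal sketch of that join using only the standard library. The column names (`Store`, `Date`, `Sales`, `StoreType`, `CompetitionDistance`) are assumptions based on the public Kaggle dataset, the sample rows are made up for illustration, and `join_sales_with_stores` is a hypothetical helper, not a function from this repository.

```python
import csv
import io

# Tiny in-memory stand-ins for train.csv and store.csv (made-up sample rows).
# Column names are assumptions based on the public Kaggle dataset.
TRAIN_CSV = """Store,Date,Sales
1,2013-01-02,5530
2,2013-01-02,4422
1,2013-01-03,4327
"""
STORE_CSV = """Store,StoreType,CompetitionDistance
1,c,1270
2,a,570
"""

def join_sales_with_stores(train_file, store_file):
    """Attach each store's metadata to its daily sales rows (left join on Store)."""
    # Index the store rows by their identifier for constant-time lookup.
    stores = {row["Store"]: row for row in csv.DictReader(store_file)}
    joined = []
    for row in csv.DictReader(train_file):
        meta = stores.get(row["Store"], {})
        # Values from the sales row win if a column name appears in both files.
        joined.append({**meta, **row})
    return joined

rows = join_sales_with_stores(io.StringIO(TRAIN_CSV), io.StringIO(STORE_CSV))
```

With pandas, the same idea would usually be expressed as a left `merge` of the sales frame with the store frame on the shared key.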

Additionally, a test dataset was provided to check the accuracy of the models. The holdout test period is from 2014-08-01 to 2015-07-31. The holdout test dataset has the same format as train.csv, and is called holdout.csv.
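Because the holdout set covers a fixed date window, a quick sanity check is to confirm which rows fall inside it, for example to make sure no holdout dates leak into the training data. The sketch below assumes dates are ISO-formatted strings (`YYYY-MM-DD`) and that both ends of the period are inclusive. `in_holdout_period` is a hypothetical helper, not part of the repository.

```python
from datetime import date

# Holdout window as stated above; treating both ends as inclusive is an assumption.
HOLDOUT_START = date(2014, 8, 1)
HOLDOUT_END = date(2015, 7, 31)

def in_holdout_period(day_str):
    """Return True if an ISO date string falls inside the holdout period."""
    day = date.fromisoformat(day_str)
    return HOLDOUT_START <= day <= HOLDOUT_END

# Example: separate a list of dates into training and holdout candidates.
dates = ["2014-07-31", "2014-08-01", "2015-07-31", "2015-08-01"]
holdout_dates = [d for d in dates if in_holdout_period(d)]
```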

Content of the repository

Apart from the aforementioned datasets, the following files have been created:

Python files

  • data_cleaning_rossman.py: performs the data cleaning of the datasets
  • feature_eng.py: performs the feature engineering of the cleaned datasets
  • utils.py: plots the sales of a selection of stores in several modes: grouped by month, day of the week, or week of the year
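The grouping modes mentioned for utils.py can be illustrated with a small aggregation sketch. This is not the code in utils.py. It is a hypothetical, standard-library version that computes mean sales per month, ISO weekday, or ISO week, which is the kind of series such a plot would display. The function name `mean_sales_by`, the mode names, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

def mean_sales_by(records, mode):
    """Average sales per group; mode is 'month', 'weekday', or 'week'."""
    key_funcs = {
        "month": lambda d: d.month,
        "weekday": lambda d: d.isoweekday(),   # 1 = Monday ... 7 = Sunday
        "week": lambda d: d.isocalendar()[1],  # ISO week of the year
    }
    key_of = key_funcs[mode]
    totals = defaultdict(lambda: [0.0, 0])     # group key -> [sum, count]
    for day_str, sales in records:
        key = key_of(date.fromisoformat(day_str))
        totals[key][0] += sales
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up (date, sales) pairs for a single store.
records = [("2013-01-07", 100.0), ("2013-01-14", 300.0), ("2013-02-05", 50.0)]
by_month = mean_sales_by(records, "month")
```

The resulting dictionary maps each group key to its mean sales and could be passed to any plotting library.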

Jupyter notebooks

  • EDA_rossman.ipynb: Exploratory Data Analysis of the datasets
  • pipeline.ipynb: walks through the stages implemented in the Python files (data cleaning and feature engineering), followed by modelling

Installation instructions

Open a terminal. Create a conda environment.

conda create --name ROSSMANN_SALES python=3.7
conda activate ROSSMANN_SALES
git clone https://github.com/albertojulian/rossman-sales-pred
cd rossman-sales-pred
pip install -r requirements.txt
jupyter notebook

The repository contents should now appear in the browser, and you can open any of the Jupyter notebooks listed above.

Predictive accuracy

The task is to predict the Sales of a given store on a given day.

Submissions are evaluated on the root mean square percentage error (RMSPE):

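import numpy as np

def metric(preds, actuals):
    """Return the RMSPE in percent; actuals must not contain zeros."""
    # Flatten both arrays so column vectors and 1-D arrays are handled alike
    preds = preds.reshape(-1)
    actuals = actuals.reshape(-1)
    assert preds.shape == actuals.shape
    # norm(x) / sqrt(n) equals sqrt(mean(x**2))
    return 100 * np.linalg.norm((actuals - preds) / actuals) / np.sqrt(preds.shape[0])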
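The metric divides by the actual sales, so any row with zero actual sales (for example, a closed store) makes the result undefined. The original Kaggle competition ignores such rows when scoring. Whether the mini-competition scoring does the same is an assumption here. The sketch below is a hypothetical pure-Python variant, `rmspe_skip_zeros`, that computes the same quantity, 100 times the square root of the mean squared relative error, while skipping rows with zero actuals.

```python
import math

def rmspe_skip_zeros(preds, actuals):
    """RMSPE in percent, skipping rows whose actual sales are zero."""
    # Keep only pairs where the relative error is defined.
    pairs = [(p, a) for p, a in zip(preds, actuals) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actuals to score")
    squared = [((a - p) / a) ** 2 for p, a in pairs]
    return 100 * math.sqrt(sum(squared) / len(squared))

# Two predictions that are each 10% off, plus one zero-sales row that is skipped.
score = rmspe_skip_zeros([110, 90, 5], [100, 100, 0])
```

In this example both scored rows have a 10% relative error, so the score is approximately 10.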
