
coles-paper's Introduction

This repository contains the experiment code from the COLES paper.

We also maintain the pytorch-lifestream library, which provides a simpler way to use COLES and other methods for event sequence analysis in production.

Prerequisites

The code was tested on an Ubuntu Linux 18.04.5 LTS server with an NVIDIA Tesla P100 GPU. It should also work on any modern Linux distribution with a modern NVIDIA Tesla GPU.

Churn dataset experiments require around 1 day to complete.

Assessment dataset experiments require around 1 day to complete.

Age prediction dataset experiments require around 3 days to complete.

Retail dataset experiments require around 7 days to complete.

Different experiments can be run on different GPU cards by setting the SC_DEVICE environment variable, e.g. export SC_DEVICE="cuda:0".
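As a minimal sketch of the idea, two scenarios could be started in parallel, each pinned to its own card through SC_DEVICE. The helper name run_on_gpu and the DRY_RUN switch are illustrative assumptions and are not part of this repository. The sketch also assumes a machine with two GPUs, that it is launched from the project root, and that the datasets for both scenarios have already been prepared.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run one scenario on one GPU.
# DRY_RUN defaults to 1, so the commands are only printed; set DRY_RUN=0 to launch.
DRY_RUN="${DRY_RUN:-1}"

run_on_gpu() {
    scenario_dir="$1"   # e.g. experiments/scenario_age_pred
    device="$2"         # e.g. cuda:0
    if [ "$DRY_RUN" = "1" ]; then
        echo "cd $scenario_dir && SC_DEVICE=$device sh bin/run_all_scenarios.sh"
    else
        # The subshell keeps the cd and SC_DEVICE local to this scenario.
        (cd "$scenario_dir" && SC_DEVICE="$device" sh bin/run_all_scenarios.sh)
    fi
}

# Assumes two GPUs and already prepared datasets for both scenarios.
run_on_gpu experiments/scenario_age_pred cuda:0 &
run_on_gpu experiments/scenario_rosbank cuda:1 &
wait   # block until both background jobs have finished
```

With the default DRY_RUN=1 the script only prints the two command lines, which makes it easy to check the device assignment before committing several days of GPU time.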

1. Setup using pipenv

# Ubuntu 18.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync # install packages exactly as specified in Pipfile.lock
pipenv shell # activate virtual environment

2. Install LaTeX packages for plot generation

sudo apt-get install dvipng texlive-latex-extra texlive-fonts-recommended cm-super

3. Age prediction dataset experiments

cd experiments/scenario_age_pred

# download datasets
bin/get-data.sh

# convert datasets from transaction list to features for metric learning
bin/make-datasets-spark.sh

export SC_DEVICE="cuda"
# run experiments
sh bin/run_all_scenarios.sh

# optionally return to the project root
cd ../..

4. Churn dataset experiments

cd experiments/scenario_rosbank

# download datasets
bin/get-data.sh

# convert datasets from transaction list to features for metric learning
bin/make-datasets-spark.sh

export SC_DEVICE="cuda"
# run experiments
sh bin/run_all_scenarios.sh

# optionally return to the project root
cd ../..

5. Assessment dataset experiments

cd experiments/scenario_bowl2019

# download datasets
bin/get-data.sh

# convert datasets from transaction list to features for metric learning
bin/make-datasets-spark.sh

export SC_DEVICE="cuda"
# run experiments
sh bin/run_all_scenarios.sh

# optionally return to the project root
cd ../..

6. Retail dataset experiments

cd experiments/scenario_x5

# download datasets
bin/get-data.sh

# convert datasets from transaction list to features for metric learning
bin/make-datasets-spark.sh

export SC_DEVICE="cuda"
# run experiments
sh bin/run_all_scenarios.sh

# optionally return to the project root
cd ../..
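Sections 3 to 6 repeat the same three steps in different directories, so they can be combined into one loop. The following is a sketch under stated assumptions rather than a script shipped with the repository: run_scenario is a hypothetical helper name, and the loop is expected to be started from the project root inside the pipenv shell. Run back to back, the four scenarios would take roughly the sum of the times listed in Prerequisites, about 12 days.

```shell
#!/bin/sh
# Hypothetical helper: run the three documented steps for one scenario directory.
run_scenario() {
    (
        cd "$1" || exit 1
        bin/get-data.sh &&               # download datasets
        bin/make-datasets-spark.sh &&    # convert transactions to features
        sh bin/run_all_scenarios.sh      # run experiments
    )
}

# Loop only when started from the project root, where experiments/ exists.
if [ -d experiments ]; then
    export SC_DEVICE="${SC_DEVICE:-cuda}"
    for name in scenario_age_pred scenario_rosbank scenario_bowl2019 scenario_x5; do
        run_scenario "experiments/$name" || { echo "failed: $name" >&2; break; }
    done
fi
```

Each scenario runs in a subshell, so there is no need to cd back to the project root between scenarios, and the loop stops at the first scenario that fails.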

7. Experiment results

Raw result files are stored in the experiments/*/results folders. The current results are cached in this repository, so the final tables and plots can be generated without a full experiment run.
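To check which cached results are present before opening the notebooks, the folders can be listed with a short loop. This is only a sketch: list_results is a hypothetical helper name, and the names and formats of the files inside each folder are not assumed here.

```shell
#!/bin/sh
# Print each scenario's results folder with the number of files it holds.
# list_results is a hypothetical helper name, not part of the repository.
list_results() {
    root="${1:-.}"   # project root; defaults to the current directory
    for dir in "$root"/experiments/*/results; do
        [ -d "$dir" ] || continue   # skip when the glob matches nothing
        count=$(ls "$dir" | wc -l | tr -d ' ')
        echo "$dir: $count files"
    done
}

list_results .
```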

The final results can be viewed in the Jupyter notebooks. So that the required dependencies are on PYTHONPATH, start the Jupyter notebook server inside the pipenv shell, e.g.:

pipenv shell
jupyter notebook

Here is the list of notebooks with experiment results:

Tables from paper: Tables 2, 3, 4, 5, 6, 7 from the paper that compare model quality metrics are produced by this notebook

RNN hidden size figures: Figure 3 from the paper is produced by this notebook

Periodicity and repeatability of the data figures: Figure 2 from the paper is produced by this notebook

Periodicity and repeatability of the text: Figure 2d from the paper is produced by this notebook

Model quality for different dataset sizes figures: Figure 4 from the paper is produced by this notebook

Experiments on the scoring dataset, shown in Table 6, were performed on a different code base and are not available in this repository. Experiments from Table 11 were performed on in-house datasets and are also not available in this repository.

coles-paper's People

Contributors

ivkireev86, ovsovnikita, dllllb, bearsubj13, nerviwki
