This repository contains the experiment code for the COLES paper.
We also provide the pytorch-lifestream library, which offers a simpler way to use COLES
and other methods for event sequence analysis in production.
The code was tested on an Ubuntu Linux 18.04.5 LTS server with an NVIDIA Tesla P100 GPU. It should also work on any modern Linux distribution with a modern NVIDIA Tesla GPU.
Churn dataset experiments require around 1 day to complete.
Assessment dataset experiments require around 1 day to complete.
Age prediction dataset experiments require around 3 days to complete.
Retail dataset experiments require around 7 days to complete.
It is possible to run different experiments on different GPU cards by using the SC_DEVICE
environment variable, e.g. export SC_DEVICE="cuda:0".
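As a sketch, assuming a machine with at least two GPUs and the scenario directories from this repository, two scenarios could be launched in parallel on separate cards like this (not a script shipped with the repository):

```shell
# Run two scenarios concurrently, pinning each to its own GPU via SC_DEVICE.
# Assumes datasets have already been downloaded and prepared (see the steps below).
(cd experiments/scenario_rosbank  && SC_DEVICE="cuda:0" sh bin/run_all_scenarios.sh) &
(cd experiments/scenario_age_pred && SC_DEVICE="cuda:1" sh bin/run_all_scenarios.sh) &
wait   # block until both background runs finish
```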
# Ubuntu 18.04
sudo apt install python3.8 python3-venv
pip3 install pipenv
pipenv sync # install packages exactly as specified in Pipfile.lock
pipenv shell # activate virtual environment
sudo apt-get install dvipng texlive-latex-extra texlive-fonts-recommended cm-super
cd experiments/scenario_age_pred
# download datasets
bin/get-data.sh
# convert datasets from transaction list to features for metric learning
bin/make-datasets-spark.sh
export SC_DEVICE="cuda"
# run experiments
sh bin/run_all_scenarios.sh
# optionally return to the project root
cd ../..
cd experiments/scenario_rosbank
# download datasets
bin/get-data.sh
# convert datasets from transaction list to features for metric learning
bin/make-datasets-spark.sh
export SC_DEVICE="cuda"
# run experiments
sh bin/run_all_scenarios.sh
# optionally return to the project root
cd ../..
cd experiments/scenario_bowl2019
# download datasets
bin/get-data.sh
# convert datasets from transaction list to features for metric learning
bin/make-datasets-spark.sh
export SC_DEVICE="cuda"
# run experiments
sh bin/run_all_scenarios.sh
# optionally return to the project root
cd ../..
cd experiments/scenario_x5
# download datasets
bin/get-data.sh
# convert datasets from transaction list to features for metric learning
bin/make-datasets-spark.sh
export SC_DEVICE="cuda"
# run experiments
sh bin/run_all_scenarios.sh
# optionally return to the project root
cd ../..
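The four scenario runs above share the same shape (download, preprocess, run), so they can also be driven sequentially by a single loop. This is only a convenience sketch of the steps already listed, not a script shipped with the repository:

```shell
# Download, preprocess, and run each scenario in turn.
for scenario in scenario_age_pred scenario_rosbank scenario_bowl2019 scenario_x5; do
    cd "experiments/$scenario" || continue   # skip if the directory is missing
    bin/get-data.sh                          # download datasets
    bin/make-datasets-spark.sh               # convert transaction lists to features
    SC_DEVICE="cuda" sh bin/run_all_scenarios.sh
    cd ../..                                 # return to the project root
done
```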
Raw result files are stored in the experiments/*/results
folders. Current results are cached in this repository, so the final tables and plots can be generated without a full experiment run.
The final results can be seen in the Jupyter notebooks. To run the notebooks with the required dependencies in PYTHONPATH,
the Jupyter notebook server must be started inside the pipenv shell, e.g.:
pipenv shell
jupyter notebook
Here is the list of notebooks with experiment results:
Tables from the paper: Tables 2, 3, 4, 5, 6, and 7, which compare model quality metrics, are produced by this notebook.
RNN hidden size figures: Figure 3 from the paper is produced by this notebook.
Periodicity and repeatability of the data figures: Figure 2 from the paper is produced by this notebook.
Periodicity and repeatability of the text: Figure 2d from the paper is produced by this notebook.
Model quality for different dataset sizes figures: Figure 4 from the paper is produced by this notebook.
Experiments on the scoring dataset, shown in Table 6, were performed on a different code base and are not available in this repository. Experiments from Table 11
were performed on in-house datasets and are also not available in this repository.