Datasets for Online Controlled Experiments

This is a project in two parts:

  1. The first survey and taxonomy for existing online controlled experiment datasets, and
  2. The ASOS Digital Experiments dataset - the first public dataset that supports the design and running of experiments with adaptive stopping.

The work was accepted to the NeurIPS 2021 Track on Datasets and Benchmarks. (Link to NeurIPS proceedings | OpenReview | arXiv)

If you find the project helpful, please use the following citation:

@inproceedings{liu2021datasets,
 author = {Liu, C. H. Bryan and Cardoso, \^{A}ngelo and Couturier, Paul and McCoy, Emma J.},
 booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
 editor = {J. Vanschoren and S. Yeung},
 pages = {},
 title = {Datasets for Online Controlled Experiments},
 url = {https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/274ad4786c3abca69fa097b85867d9a4-Paper-round2.pdf},
 volume = {1},
 year = {2021}
}

Survey of existing OCE datasets

A summary of the survey, together with direct links to the datasets, is available in this Open Data StackExchange answer.

Experimenting with the ASOS Digital Experiments Dataset

Loading the ASOS Digital Experiments dataset

The dataset is available on: https://osf.io/64jsb/ .

The experiment notebook uses the parquet form of the dataset. It attempts to download the file before loading it into a pandas dataframe. If the download fails, you can get the parquet file in one of two ways:

  • Download the file via this direct link (https://osf.io/62t7f/download) and place it in the data directory, or
  • Use the following command at the root of this repo:
    wget -O ./data/asos_digital_experiments_dataset.parquet https://osf.io/62t7f/download
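
The download-then-load behaviour described above can be sketched in Python as follows. This is a minimal sketch, not the notebook's actual code: the helper names `ensure_dataset` and `load_dataset` and the `fetch` parameter are hypothetical, the path and URL are copied from the wget command above, and reading parquet with pandas assumes a parquet engine such as pyarrow is installed.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Path and URL copied from the wget command above.
DATA_PATH = Path("data/asos_digital_experiments_dataset.parquet")
DATA_URL = "https://osf.io/62t7f/download"


def ensure_dataset(path=DATA_PATH, url=DATA_URL, fetch=urlretrieve):
    """Download the parquet file to `path` unless it already exists.

    `fetch` is a hypothetical, injectable parameter so the helper can be
    exercised without network access.
    """
    path = Path(path)
    if not path.exists():
        # Create the data directory if it is missing, then download.
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(path))
    return path


def load_dataset(path=DATA_PATH):
    """Load the dataset into a pandas DataFrame, downloading it if needed."""
    # Imported lazily so ensure_dataset works even without pandas installed.
    import pandas as pd

    return pd.read_parquet(ensure_dataset(path))
```

Run from the root of the repo, `df = load_dataset()` should leave the file in the data directory and return the dataframe. An existing file is never downloaded again.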
    

Setup

These instructions assume you have access to a *nix-like machine (either macOS or Linux will do). On a Windows machine, the notebook should still work provided you have the right Python packages installed, but this has not been tested.

This project uses pyenv and poetry for package management. Before you start, please ensure you have gcc, make, and pip installed.
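
One way to check for these prerequisites is a short script that looks each tool up on the PATH. This is a minimal sketch under assumptions: the function name `missing_tools` is hypothetical, it assumes some Python 3 interpreter is already available, and on some systems pip is only installed as `pip3`, in which case the `pip` lookup would report a false negative.

```python
import shutil


def missing_tools(tools=("gcc", "make", "pip"), which=shutil.which):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("gcc, make, and pip were all found on the PATH.")
```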

Installing pyenv

For Linux (together with other required libraries):

sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
xz-utils tk-dev libffi-dev liblzma-dev python-openssl git
wget https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer
chmod u+x pyenv-installer
./pyenv-installer

For macOS:

brew install pyenv
brew install pyenv-virtualenv

We then need to configure the PATHs:

export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

...and install the right Python version for our environment:

pyenv install 3.9.10

Installing poetry

See https://python-poetry.org/docs/#installation for the installation instructions.

Download the repository and sync the environment

git clone https://github.com/liuchbryan/oce-dataset.git
cd oce-dataset  

# Switch to Python 3.9.10 for pyenv
pyenv local 3.9.10
poetry env use ~/.pyenv/versions/3.9.10/bin/python
poetry install
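
To confirm that the poetry environment picked up the intended interpreter, a small check like the one below can be run inside it. This is a sketch only: the function name `matches_expected` and the script name used afterwards are hypothetical, and the expected version is simply the one installed with pyenv above.

```python
import sys

# Version installed with `pyenv install 3.9.10` above.
EXPECTED = (3, 9, 10)


def matches_expected(version_info=None, expected=EXPECTED):
    """Return True if the major.minor.micro version equals `expected`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == tuple(expected)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if matches_expected() else "unexpected"
    print(f"Python {found}: {status}")
```

Saved as, say, check_python.py, it could be run with `poetry run python check_python.py`.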

Run the Jupyter notebooks

poetry shell

Within the newly spawned virtualenv shell, run:

jupyter notebook

Once you are done, terminate the Jupyter server using Ctrl+C, and type exit to exit the virtualenv shell.
