oracle_cb

Experiments with oracle-based contextual bandit algorithms.


Installation

  1. Clone the repository.
  2. Install Python 3, SciPy, NumPy, and scikit-learn.
  3. Fill in settings.py with your own paths. I recommend using absolute paths.
    • BASE_DIR should point to the base of this repository.
    • DATA_DIR should point to the root/data/ directory.
    • REMOTE_PATH_TO_PYTHON is only used if you want to run things on a cluster.
    • REMOTE_BASE_DIR is only used if you want to run things on a cluster.
  4. Download and prepare the datasets (MSLR, Yahoo, MQ2007, MQ2008). This step is only needed for the datasets you plan to run experiments on.
    • For MSLR:
      • Visit https://www.microsoft.com/en-us/research/project/mslr/
      • Download MSLR-WEB30K dataset
      • Unpack it into settings.DATA_DIR/mslr/. You should end up with 5 files named mslr30k_train<#>.txt, where <#> is 1 through 5. This differs from the dataset's default directory structure, so you will have to rename the files.
      • $ python3 PreloadMSLR.py: this produces the file settings.DATA_DIR/mslr/mslr30k_train.npz, which is required for experiments.
    • For Yahoo:
      • Obtaining the dataset is somewhat involved. It is dataset C14B here: https://webscope.sandbox.yahoo.com/catalog.php?datatype=c
      • Unpack it into settings.DATA_DIR/yahoo/. You should end up with 6 files named set<#>.<$>.txt, where <#> is either 1 or 2 and <$> is train, valid, or test.
      • $ python3 PreloadYahoo.py: this produces the file settings.DATA_DIR/yahoo/yahoo_big.npz, which is required for experiments.
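Because the scripts expect exact file names, it can save time to check the layout before running anything. The following is a minimal sketch, not part of the repository: the helper name missing_files is hypothetical, and the expected names are copied from the steps above. Until PreloadMSLR.py and PreloadYahoo.py have been run, the .npz files will be reported as missing.

```python
import os

# File names taken from the installation steps above, relative to settings.DATA_DIR.
EXPECTED_FILES = {
    "mslr": [f"mslr30k_train{i}.txt" for i in range(1, 6)] + ["mslr30k_train.npz"],
    "yahoo": [
        f"set{n}.{split}.txt" for n in (1, 2) for split in ("train", "valid", "test")
    ] + ["yahoo_big.npz"],
}


def missing_files(data_dir, datasets=("mslr", "yahoo")):
    """Hypothetical helper: return expected files under data_dir/<dataset>/ that do not exist."""
    missing = []
    for name in datasets:
        for fname in EXPECTED_FILES[name]:
            path = os.path.join(data_dir, name, fname)
            # Only regular files count; a directory with the right name is still "missing".
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

For example, missing_files(settings.DATA_DIR, datasets=("mslr",)) lists whatever is still needed for MSLR experiments.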

Locally running an algorithm

  1. Use Semibandits.py. It can be run as a script with a few parameters.

    $ python3 Semibandits.py --T 1000 --dataset mslr30k --L 3 --I 0 --alg lin --param 0.1
    

    This prints some output and then creates a folder in root/results/ containing three files: the reward reported every 10 rounds, validation results on a held-out dataset (which we currently ignore), and the total running time of the execution.
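If you want to launch several local runs from Python rather than typing each command, the invocation above can be assembled programmatically. This is a minimal sketch that uses only the flags shown in the example; the helper name semibandits_command is hypothetical, and this README does not document what --I or --param control, so any values other than the example's are placeholders.

```python
import shlex


def semibandits_command(T, dataset, L, I, alg, param, python="python3"):
    """Hypothetical helper: build the argv list for one Semibandits.py run.

    Uses only the flags from the README example; values are passed through as strings.
    """
    return [
        python, "Semibandits.py",
        "--T", str(T),
        "--dataset", dataset,
        "--L", str(L),
        "--I", str(I),
        "--alg", alg,
        "--param", str(param),
    ]


# Reproduces the README's example invocation as a shell-ready string.
example = shlex.join(semibandits_command(1000, "mslr30k", 3, 0, "lin", 0.1))
print(example)
```

To actually execute a run, pass the list to the standard library's subprocess.run(cmd, check=True, cwd=BASE_DIR), with BASE_DIR set to the repository root so that Semibandits.py is found.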


Running on a cluster

  1. Clone the repository on the cluster. Then, on your local machine, update REMOTE_PATH_TO_PYTHON and REMOTE_BASE_DIR in settings.py.

  2. On the cluster, make sure that the globals in settings.py point to the right places.

    • BASE_DIR=
    • DATA_DIR=
  3. Make sure the right .npz files are in DATA_DIR; see ContextIterators.py for the naming. For MSLR you want to use the MSLR30k iterator, so you need DATA_DIR/mslr/mslr30k_train.npz. For Yahoo you want to use the YahooContextIterator object, so you need DATA_DIR/yahoo/yahoo_big.npz. Put both the mslr30k and yahoo files on the cluster.

  4. Locally:

    cd <repository location>
    python3 parallel.py | parallel -S <number of threads>/<your login>@<your server>
    

    Use as many servers as you can, but note that each job is memory intensive and parallel assigns jobs by job slots rather than by available memory, so it does not do a great job of spreading them out. I ran at most 4 jobs per machine (that is, 4/<your login>@<your server>). To change the experiment parameters, edit parallel.py.

  5. The results will be in /results/mslr_T=36000_L=3_e=0.1/ and /code/results/yahoo_T=40000_L=2_e=0.5/
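The pipeline in step 4 works because GNU parallel, when given no command of its own, reads one shell command per line from standard input and runs each line as a job on the listed servers. The sketch below is a hypothetical stand-in for parallel.py, not its actual contents; it only shows the shape of output such a generator prints. The REMOTE_* values are placeholders for the entries in settings.py, the cd-then-run pattern and the use of --I as a per-run index are assumptions, and the grid is illustrative apart from T and L, which match the MSLR results directory above.

```python
import shlex

# Placeholders for the values you set in settings.py (assumed, not the real ones).
REMOTE_PATH_TO_PYTHON = "/usr/bin/python3"
REMOTE_BASE_DIR = "/home/user/oracle_cb"


def job_lines(dataset, T, L, algs, params, replicates):
    """Yield one shell command per run; GNU parallel executes each stdin line as a job."""
    for alg in algs:
        for param in params:
            for i in range(replicates):
                args = [
                    REMOTE_PATH_TO_PYTHON, "Semibandits.py",
                    "--T", str(T), "--dataset", dataset, "--L", str(L),
                    "--I", str(i), "--alg", alg, "--param", str(param),
                ]
                # Assumed pattern: change into the remote checkout, then run the script there.
                yield f"cd {shlex.quote(REMOTE_BASE_DIR)} && {shlex.join(args)}"


if __name__ == "__main__":
    # T=36000 and L=3 match the MSLR results directory above; the rest is illustrative.
    for line in job_lines("mslr30k", T=36000, L=3, algs=["lin"], params=[0.1], replicates=2):
        print(line)
```

Saved under a hypothetical name such as jobs.py, it would be piped the same way as parallel.py: python3 jobs.py | parallel -S 4/<your login>@<your server>.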


Plotting results

  1. Copy the results directories above into your local results directory.
  2. Run:

    python3 plotting_script.py --save
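The results directories encode their settings in their names (for example mslr_T=36000_L=3_e=0.1), so before plotting it can help to confirm which settings each copied directory holds. Below is a minimal sketch that splits such a name into the dataset and a dictionary of key=value fields. parse_results_dir is a hypothetical helper; it assumes the dataset name contains no underscore, returns values as strings, and does not reflect how plotting_script.py itself reads these directories.

```python
import os


def parse_results_dir(path):
    """Hypothetical helper: 'mslr_T=36000_L=3_e=0.1' -> ('mslr', {'T': '36000', ...})."""
    # Tolerate a trailing slash and any leading parent directories.
    name = os.path.basename(os.path.normpath(path))
    dataset, _, rest = name.partition("_")
    fields = {}
    for field in rest.split("_"):
        if not field:
            continue
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {field!r} in {name!r}")
        fields[key] = value  # kept as a string; convert with int()/float() as needed
    return dataset, fields
```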

Contributors

  • akshaykr