
Evaluating the Robustness of Click Models to Policy Distributional Shift

This repository contains the supporting code and implementation details for our paper Evaluating the Robustness of Click Models to Policy Distributional Shift.

We provide architecture diagrams for the six click models used in our experiments: PBM, UBM, DBN, NCM, CACM-, and ARM.
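As a primer on the simplest of these, the position-based model (PBM) factorizes the click probability into a query-document attractiveness term and a rank-dependent examination term. A minimal sketch for intuition (not the repository's implementation; all values below are made up):

import numpy as np

# PBM: P(click at rank r) = P(attracted by doc at r) * P(rank r is examined)
def pbm_click_probs(attractiveness, examination):
    """attractiveness: (n_docs,) per-document attraction; examination: (n_ranks,) per-rank examination."""
    return attractiveness * examination

rng = np.random.default_rng(0)
alpha = rng.uniform(size=10)        # hypothetical per-document attractiveness
gamma = 1.0 / np.arange(1, 11)      # hypothetical examination decay over ranks
clicks = rng.random(10) < pbm_click_probs(alpha, gamma)   # simulate one session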

Repository Usage

Getting started

The code was tested with Python 3.7.4 on a CentOS 7 machine.

pip install -r requirements.txt
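Optionally, install into a fresh virtual environment first (standard Python tooling, not a requirement of this repository):

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt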

To preprocess the Yandex data:

  1. Download Yandex's relevance prediction dataset and put it in the desired path.*
  2. Execute the following scripts:
 python utils/parse_yandex.py --path [path]
 python utils/preprocessing.py --dataset [path/serp_based]
 python utils/filter_ground_truth.py --dataset [path/serp_based]

The Yandex data is now ready for training!

* Unfortunately, the Yandex dataset is no longer available online. If you wish to work with it, please contact us and we may be able to invite you to work with it at our premises.
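For example, if the raw dataset was unpacked under ~/data/yandex (a hypothetical path, substitute your own), the full pipeline reads:

python utils/parse_yandex.py --path ~/data/yandex
python utils/preprocessing.py --dataset ~/data/yandex/serp_based
python utils/filter_ground_truth.py --dataset ~/data/yandex/serp_based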

To generate the simulated data:

  1. Download the MSLR-WEB10K dataset.
  2. Put the Fold1 folder in your desired path.
  3. Add the lambdamart policy's two files to the same directory.
  4. python generate_data.py --path [path/to/directory/]

This will generate datasets for 3 different internal click models (DBN, CoCM, CoCM mismatch), each with 3 training policies (PL-oracle, PL-bm25, PL-lambdamart), each with 4 test policies (oracle, random, bm25, lambdamart), as well as the data required for the experiment in Section 6.1.
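For intuition on these simulators, here is a minimal sketch of the dynamic Bayesian network (DBN) user model in its standard formulation (our illustration, not the repository's generate_data.py): the user scans the list top-down, clicks with probability given by the document's attractiveness, is satisfied with probability sigma after a click, and otherwise continues to the next rank with probability gamma (both values below are hypothetical):

import numpy as np

def simulate_dbn_session(alpha, sigma=0.7, gamma=0.9, rng=None):
    """Simulate clicks on one ranked list under a DBN user model.
    alpha: attractiveness of the document at each rank.
    sigma, gamma: hypothetical satisfaction / continuation probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    clicks = np.zeros(len(alpha), dtype=bool)
    for r, a in enumerate(alpha):      # the user scans the list top-down
        if rng.random() < a:           # clicks if attracted by the snippet
            clicks[r] = True
            if rng.random() < sigma:   # satisfied by the document: session ends
                break
        if rng.random() > gamma:       # otherwise abandons with probability 1 - gamma
            break
    return clicks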

To launch click model training on a specific dataset:

python train_click_models.py --cm=[CLICK MODEL] --sl=[STATE LEARNER] --data_dir [path/to/dataset/directory/] [--hp_name hp_value] 

This will train the desired click model on the dataset given as argument, save perplexity and NDCG results in data_dir/results/, and save the best checkpoint in data_dir/checkpoints/.

โš ๏ธ You must use the format --cm=XXX instead of --cm XXX.

A complete list of default hyperparameters can be found in argument_parser.py for program-wide parameters and in click_models.py or state_learners.py for model-specific parameters. We provide configuration files for reproducing the experiments in the paper in the config folder: specs_yandex.yml for reproducing Table 2, specs_random_rerank.yml for reproducing the in-distribution results of Tables 3 and 4, and specs_sim.yml for reproducing the red dashed line in Figure 1, after having generated the data.
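As a reference for interpreting the perplexity results mentioned above: click perplexity is conventionally computed per rank as 2 to the power of the negative mean log2-likelihood of the observed clicks (lower is better, 1 is perfect). A minimal sketch, assuming predicted click probabilities and observed clicks are arrays of shape (n_sessions, n_ranks):

import numpy as np

def click_perplexity(p_click, clicks, eps=1e-10):
    """Per-rank click perplexity: 2^(-mean log2-likelihood)."""
    p = np.clip(p_click, eps, 1 - eps)
    ll = clicks * np.log2(p) + (1 - clicks) * np.log2(1 - p)  # log2-likelihood per (session, rank)
    return 2.0 ** (-ll.mean(axis=0))                          # one value per rank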

To launch in debugging mode (a few test iterations and NDCG evaluation):

Add --debug True

To run an experiment on robustness of click prediction (after training):

python gen_eval.py --cp_name [checkpoint_filename] --dataset [path/to/dataset/directory/]

This will load the checkpoint dataset/checkpoints/cp_name and test it on all the target datasets present in the folder (by default oracle, random, bm25 and lambdamart). OOD-perplexity results are saved in dataset/target_dataset/results/. We provide configuration files for reproducing the experiments in the paper in the config folder: specs_random_rerank_ood.yml for reproducing the out-of-distribution results of Tables 3 and 4, and specs_sim_ood.yml for reproducing the blue line in Figure 1.

To plot a spider chart similar to Figure 1 in the paper (after the click prediction robustness experiment):

python spider_chart_gen_eval.py --path [path/to/datasets/directory/]

This will read results and plot them in an interpretable fashion. The figure is saved in path/gen_eval.png.
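If you want to adapt the plot, a radar (spider) chart of per-policy results can be drawn with matplotlib's polar axes. A generic sketch with made-up values, independent of spider_chart_gen_eval.py:

import numpy as np
import matplotlib.pyplot as plt

policies = ["oracle", "random", "bm25", "lambdamart"]
perplexity = [1.21, 1.35, 1.27, 1.24]            # illustrative values only

angles = np.linspace(0, 2 * np.pi, len(policies), endpoint=False).tolist()
values = perplexity + perplexity[:1]             # close the polygon
angles += angles[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.2)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(policies)
plt.savefig("gen_eval_sketch.png")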

To run an experiment on robustness of subsequent policies (after training):

python policy_eval.py --cp_name [checkpoint_filename] --dataset [path/to/dataset/directory/]

This will load the checkpoint dataset/checkpoints/cp_name, extract the Top-Down and Max-Reward policies corresponding to this checkpoint, generate datasets with these policies and save the CTR of these datasets in dataset/policies/. We provide configuration files for reproducing the experiments in the paper in the config folder: specs_pol.yml for reproducing the results displayed in Figure 2.
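As a rough illustration of the two extracted policies (our reading, not the repository code): a Top-Down policy greedily ranks documents by the click model's relevance estimate, while a Max-Reward policy seeks the ordering that maximizes expected clicks under the model. The function names and the expected_clicks callback below are hypothetical:

import itertools
import numpy as np

def top_down_policy(relevance_scores):
    """Rank documents by the click model's relevance estimate, best first."""
    return np.argsort(-np.asarray(relevance_scores))

def max_reward_policy(docs, expected_clicks):
    """Brute-force the ordering maximizing expected clicks under the model.
    expected_clicks(ordering) -> float is assumed to query the click model.
    Exhaustive search is only viable for short lists (cost grows factorially)."""
    return max(itertools.permutations(docs), key=expected_clicks)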

To plot a bar chart similar to Figure 2 in the paper (after the policy robustness experiment):

python bar_chart_policy_eval.py --path [path/to/datasets/directory/]

This will read results and plot them in an interpretable fashion. The figure is saved in path/policy_eval.png.

