Evaluating the Robustness of Click Models to Policy Distributional Shift
This repository contains the supporting code and implementation details for our paper Evaluating the Robustness of Click Models to Policy Distributional Shift.
We provide architecture diagrams below for the six click models used in our experiments:
PBM:
UBM:
DBN:
NCM:
CACM-:
ARM:
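As a reading aid for the diagrams, the PBM is the simplest of these models: it factorizes the click probability into a rank-dependent examination probability and a query-document attractiveness. Below is a minimal sketch of that factorization; the parameter values are made up for illustration and are not taken from this repository.

```python
# Position-Based Model (PBM) sketch:
# P(click on doc d at rank r) = P(examined | rank r) * P(attractive | query, d).
# All parameter values here are illustrative, not learned from data.

examination = [1.0, 0.7, 0.45, 0.3, 0.2]            # gamma_r: decays with rank
attractiveness = {"d1": 0.8, "d2": 0.5, "d3": 0.1}  # alpha_{q,d} per document

def pbm_click_prob(rank: int, doc: str) -> float:
    """Probability of a click on `doc` displayed at `rank` (0-indexed)."""
    return examination[rank] * attractiveness[doc]

ranking = ["d1", "d2", "d3"]
probs = [pbm_click_prob(r, d) for r, d in enumerate(ranking)]
```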
Repository Usage
Getting started
The code was tested with Python 3.7.4 on a CentOS 7 machine.

```
pip install -r requirements.txt
```
To preprocess the Yandex data:
- Download Yandex's relevance prediction dataset and put it in the desired path.*
- Execute the following scripts:

```
python utils/parse_yandex.py --path [path]
python utils/preprocessing.py --dataset [path/serp_based]
python utils/filter_ground_truth.py --dataset [path/serp_based]
```
The Yandex data is now ready for training!
* Unfortunately the Yandex dataset is no longer available online. If you wish to work with it, please contact us and we may be able to invite you to work with it at our premises.
To generate the simulated data:
- Download the MSLR-WEB10K dataset.
- Put the `Fold1` folder in your desired path.
- Add the lambdamart policy's two files in the same directory.

```
python generate_data.py --path [path/to/directory/]
```
This will generate datasets for 3 different internal click models (DBN, CoCM, CoCM mismatch), each with 3 training policies (PL-oracle, PL-bm25, PL-lambdamart), each with 4 test policies (oracle, random, bm25, lambdamart), as well as the data required for the experiment in Section 6.1.
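For intuition on the simulation, here is a minimal sketch of click generation under a DBN-style user model: the user scans the list top-down, clicks attractive documents, stops when satisfied, and otherwise continues with some probability. The function and parameter values are illustrative only and do not mirror `generate_data.py`.

```python
import random

def simulate_dbn_session(attract, satisf, gamma=0.9, rng=None):
    """Simulate clicks on one ranked list under a simple DBN user model.

    attract[i]: P(click | examined), satisf[i]: P(satisfied | clicked),
    gamma: P(continue examining | clicked but not satisfied, or not clicked).
    """
    rng = rng or random.Random(0)
    clicks = []
    examining = True
    for a, s in zip(attract, satisf):
        if not examining:
            clicks.append(0)
            continue
        clicked = rng.random() < a
        clicks.append(int(clicked))
        if clicked and rng.random() < s:
            examining = False   # user is satisfied: session ends
        elif rng.random() >= gamma:
            examining = False   # user abandons the list
    return clicks

session = simulate_dbn_session([0.9, 0.3, 0.5], [0.7, 0.2, 0.4])
```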
To launch click model training on a specific dataset:

```
python train_click_models.py --cm=[CLICK MODEL] --sl=[STATE LEARNER] --data_dir [path/to/dataset/directory/] [--hp_name hp_value]
```
This will train the desired click model on the dataset given as argument, save perplexity and NDCG results in `data_dir/results/`, and save the best checkpoint in `data_dir/checkpoints/`. Note that arguments must be passed with an equals sign, i.e. `--cm=XXX` instead of `--cm XXX`.
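For reference, NDCG (reported alongside perplexity during evaluation) compares the discounted gain of a ranking against the ideal ordering. The sketch below is the standard textbook definition, not the repository's implementation:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ranking."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)
    ideal = ideal[:k] if k else ideal
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0

# Relevance grades of documents in the order the model ranked them.
score = ndcg([3, 2, 0, 1], k=3)
```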
A complete list of default hyperparameters can be found in `argument_parser.py` for program-wide parameters and in `click_models.py` or `state_learners.py` for model-specific parameters. We provide configuration files for reproducing the experiments in the paper in the `config` folder: `specs_yandex.yml` for reproducing Table 2, `specs_random_rerank.yml` for reproducing the in-distribution results of Tables 3 and 4, and `specs_sim.yml` for reproducing the red dashed line in Figure 1, after having generated the data.
To launch in debugging mode (only a few iterations of test and NDCG evaluation), add `--debug True`.
To run an experiment on robustness of click prediction (after training):

```
python gen_eval.py --cp_name [checkpoint_filename] --dataset [path/to/dataset/directory/]
```
This will load the checkpoint `dataset/checkpoints/cp_name` and test it on all the target datasets present in the folder (by default oracle, random, bm25 and lambdamart). OOD perplexity results are saved in `dataset/target_dataset/results/`. We provide configuration files for reproducing the experiments in the paper in the `config` folder: `specs_random_rerank_ood.yml` for reproducing the out-of-distribution results of Tables 3 and 4, and `specs_sim_ood.yml` for reproducing the blue line in Figure 1.
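Perplexity, the metric saved by these scripts, measures how well predicted click probabilities fit observed clicks: lower is better, and the gap between in-distribution and OOD perplexity quantifies robustness to policy shift. Below is a generic sketch of click perplexity, not the repository's exact implementation:

```python
import math

def click_perplexity(pred_probs, clicks):
    """Perplexity of predicted click probabilities against observed clicks.

    pred_probs[i]: predicted P(click) at position i; clicks[i]: 0 or 1.
    Perplexity = 2 ** (-mean log2-likelihood); 1.0 means a perfect fit.
    """
    eps = 1e-10
    log_lik = 0.0
    for p, c in zip(pred_probs, clicks):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        log_lik += math.log2(p) if c else math.log2(1 - p)
    return 2 ** (-log_lik / len(clicks))

ppl = click_perplexity([0.9, 0.1, 0.2], [1, 0, 0])
```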
To plot a spider chart similar to Figure 1 in the paper (after the click prediction robustness experiment):

```
python spider_chart_gen_eval.py --path [path/to/datasets/directory/]
```

This will read the results and plot them in an interpretable fashion. The figure is saved in `path/gen_eval.png`.
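If you want to build a similar radar (spider) chart yourself, the following is a generic matplotlib sketch; the metric names, values, and output filename are placeholders and unrelated to the script's actual output:

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import matplotlib.pyplot as plt

# Placeholder per-policy metrics (e.g. perplexity per test policy).
labels = ["oracle", "random", "bm25", "lambdamart"]
values = [1.10, 1.25, 1.18, 1.15]

# One spoke per label; repeat the first point to close the polygon.
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]
angles += angles[:1]
values_closed = values + values[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, values_closed)
ax.fill(angles, values_closed, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
fig.savefig("spider_demo.png")
```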
To run an experiment on robustness of subsequent policies (after training):

```
python policy_eval.py --cp_name [checkpoint_filename] --dataset [path/to/dataset/directory/]
```

This will load the checkpoint `dataset/checkpoints/cp_name`, extract the Top-Down and Max-Reward policies corresponding to this checkpoint, generate datasets with these policies, and save the CTR of these datasets in `dataset/policies/`. We provide configuration files for reproducing the experiments in the paper in the `config` folder: `specs_pol.yml` for reproducing the results displayed in Figure 2.
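Conceptually, a ranking policy can be derived from a trained click model by scoring candidate documents and sorting. The sketch below illustrates that idea with a greedy top-down ranking by predicted attractiveness and an expected CTR under a PBM-style examination curve; the scores are made up, and this is not the repository's Top-Down or Max-Reward implementation:

```python
def extract_ranking_policy(attractiveness):
    """Greedy policy: rank documents by the model's predicted attractiveness."""
    return sorted(attractiveness, key=attractiveness.get, reverse=True)

def expected_ctr(ranking, attractiveness, examination):
    """Expected clicks per session under a PBM-style user: sum of exam * attract."""
    return sum(examination[r] * attractiveness[d] for r, d in enumerate(ranking))

# Illustrative predicted attractiveness scores and examination curve.
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
exam = [1.0, 0.6, 0.3]

ranking = extract_ranking_policy(scores)
ctr = expected_ctr(ranking, scores, exam)
```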
To plot a bar chart similar to Figure 2 in the paper (after the policy robustness experiment):

```
python bar_chart_policy_eval.py --path [path/to/datasets/directory/]
```

This will read the results and plot them in an interpretable fashion. The figure is saved in `path/policy_eval.png`.