Evaluating the Robustness of Click Models to Policy Distributional Shift
This repository contains the supporting code and implementation details for our paper Evaluating the Robustness of Click Models to Policy Distributional Shift.
We provide architecture diagrams below for the six click models used in our experiments:
PBM:
UBM:
DBN:
NCM:
CACM-:
ARM:
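As a reading aid for the diagrams, the PBM is the simplest of these models: it factorizes the click probability into a rank-dependent examination probability and a query-document attractiveness. Below is a minimal sketch of that factorization; the parameter values are made up for illustration and are not taken from this repository.

```python
# Position-Based Model (PBM) sketch:
# P(click on doc d at rank r) = P(examined | rank r) * P(attractive | query, d).
# All parameter values here are illustrative, not learned from data.

examination = [1.0, 0.7, 0.45, 0.3, 0.2]            # gamma_r: decays with rank
attractiveness = {"d1": 0.8, "d2": 0.5, "d3": 0.1}  # alpha_{q,d} per document

def pbm_click_prob(rank: int, doc: str) -> float:
    """Probability of a click on `doc` displayed at `rank` (0-indexed)."""
    return examination[rank] * attractiveness[doc]

ranking = ["d1", "d2", "d3"]
probs = [pbm_click_prob(r, d) for r, d in enumerate(ranking)]
```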
Repository Usage
Getting started
The code was tested with Python 3.7.4 on a CentOS 7 machine.

```
pip install -r requirements.txt
```
To preprocess the Yandex data:
- Download Yandex's relevance prediction dataset and put it in the desired path.*
- Execute the following scripts:

```
python utils/parse_yandex.py --path [path]
python utils/preprocessing.py --dataset [path/serp_based]
python utils/filter_ground_truth.py --dataset [path/serp_based]
```
The Yandex data is now ready for training!
* Unfortunately the Yandex dataset is no longer available online. If you wish to work with it, please contact us and we may be able to invite you to work with it at our premises.
To generate the simulated data:
- Download the MSLR-WEB10K dataset.
- Put the `Fold1` folder in your desired path.
- Add the lambdamart policy's two files in the same directory.

```
python generate_data.py --path [path/to/directory/]
```
This will generate datasets for 3 different internal click models (DBN, CoCM, CoCM mismatch), each with 3 training policies (PL-oracle, PL-bm25, PL-lambdamart), each with 4 test policies (oracle, random, bm25, lambdamart), as well as the data required for the experiment in Section 6.1.
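For intuition on the simulation, here is a minimal sketch of click generation under a DBN-style user model: the user scans the list top-down, clicks attractive documents, stops when satisfied, and otherwise continues with some probability. The function and parameter values are illustrative only and do not mirror `generate_data.py`.

```python
import random

def simulate_dbn_session(attract, satisf, gamma=0.9, rng=None):
    """Simulate clicks on one ranked list under a simple DBN user model.

    attract[i]: P(click | examined), satisf[i]: P(satisfied | clicked),
    gamma: P(continue examining | clicked but not satisfied, or not clicked).
    """
    rng = rng or random.Random(0)
    clicks = []
    examining = True
    for a, s in zip(attract, satisf):
        if not examining:
            clicks.append(0)
            continue
        clicked = rng.random() < a
        clicks.append(int(clicked))
        if clicked and rng.random() < s:
            examining = False   # user is satisfied: session ends
        elif rng.random() >= gamma:
            examining = False   # user abandons the list
    return clicks

session = simulate_dbn_session([0.9, 0.3, 0.5], [0.7, 0.2, 0.4])
```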
To launch click model training on a specific dataset:

```
python train_click_models.py --cm=[CLICK MODEL] --sl=[STATE LEARNER] --data_dir [path/to/dataset/directory/] [--hp_name hp_value]
```
This will train the desired click model on the dataset given as argument, save perplexity and NDCG results in `data_dir/results/`, and save the best checkpoint in `data_dir/checkpoints/`. Note that arguments must be passed with an equals sign, i.e. `--cm=XXX` instead of `--cm XXX`.
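For reference, NDCG (reported alongside perplexity during evaluation) compares the discounted gain of a ranking against the ideal ordering. The sketch below is the standard textbook definition, not the repository's implementation:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ranking."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)
    ideal = ideal[:k] if k else ideal
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0

# Relevance grades of documents in the order the model ranked them.
score = ndcg([3, 2, 0, 1], k=3)
```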
A complete list of default hyperparameters can be found in `argument_parser.py` for program-wide parameters and in `click_models.py` or `state_learners.py` for model-specific parameters. We provide configuration files for reproducing the experiments in the paper in the `config` folder: `specs_yandex.yml` for reproducing Table 2, `specs_random_rerank.yml` for reproducing the in-distribution results of Tables 3 and 4, and `specs_sim.yml` for reproducing the red dashed line in Figure 1, after having generated the data.
To launch in debugging mode (only a few iterations of test and NDCG evaluation), add `--debug True`.
To run an experiment on robustness of click prediction (after training):

```
python gen_eval.py --cp_name [checkpoint_filename] --dataset [path/to/dataset/directory/]
```
This will load the checkpoint `dataset/checkpoints/cp_name` and test it on all the target datasets present in the folder (by default oracle, random, bm25 and lambdamart). OOD perplexity results are saved in `dataset/target_dataset/results/`. We provide configuration files for reproducing the experiments in the paper in the `config` folder: `specs_random_rerank_ood.yml` for reproducing the out-of-distribution results of Tables 3 and 4, and `specs_sim_ood.yml` for reproducing the blue line in Figure 1.
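Perplexity, the metric saved by these scripts, measures how well predicted click probabilities fit observed clicks: lower is better, and the gap between in-distribution and OOD perplexity quantifies robustness to policy shift. Below is a generic sketch of click perplexity, not the repository's exact implementation:

```python
import math

def click_perplexity(pred_probs, clicks):
    """Perplexity of predicted click probabilities against observed clicks.

    pred_probs[i]: predicted P(click) at position i; clicks[i]: 0 or 1.
    Perplexity = 2 ** (-mean log2-likelihood); 1.0 means a perfect fit.
    """
    eps = 1e-10
    log_lik = 0.0
    for p, c in zip(pred_probs, clicks):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        log_lik += math.log2(p) if c else math.log2(1 - p)
    return 2 ** (-log_lik / len(clicks))

ppl = click_perplexity([0.9, 0.1, 0.2], [1, 0, 0])
```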
To plot a spider chart similar to Figure 1 in the paper (after the click prediction robustness experiment):

```
python spider_chart_gen_eval.py --path [path/to/datasets/directory/]
```

This will read the results and plot them in an interpretable fashion. The figure is saved in `path/gen_eval.png`.
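If you want to build a similar radar (spider) chart yourself, the following is a generic matplotlib sketch; the metric names, values, and output filename are placeholders and unrelated to the script's actual output:

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import matplotlib.pyplot as plt

# Placeholder per-policy metrics (e.g. perplexity per test policy).
labels = ["oracle", "random", "bm25", "lambdamart"]
values = [1.10, 1.25, 1.18, 1.15]

# One spoke per label; repeat the first point to close the polygon.
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]
angles += angles[:1]
values_closed = values + values[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, values_closed)
ax.fill(angles, values_closed, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
fig.savefig("spider_demo.png")
```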
To run an experiment on robustness of subsequent policies (after training):

```
python policy_eval.py --cp_name [checkpoint_filename] --dataset [path/to/dataset/directory/]
```

This will load the checkpoint `dataset/checkpoints/cp_name`, extract the Top-Down and Max-Reward policies corresponding to this checkpoint, generate datasets with these policies, and save the CTR of these datasets in `dataset/policies/`. We provide configuration files for reproducing the experiments in the paper in the `config` folder: `specs_pol.yml` for reproducing the results displayed in Figure 2.
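Conceptually, a ranking policy can be derived from a trained click model by scoring candidate documents and sorting. The sketch below illustrates that idea with a greedy top-down ranking by predicted attractiveness and an expected CTR under a PBM-style examination curve; the scores are made up, and this is not the repository's Top-Down or Max-Reward implementation:

```python
def extract_ranking_policy(attractiveness):
    """Greedy policy: rank documents by the model's predicted attractiveness."""
    return sorted(attractiveness, key=attractiveness.get, reverse=True)

def expected_ctr(ranking, attractiveness, examination):
    """Expected clicks per session under a PBM-style user: sum of exam * attract."""
    return sum(examination[r] * attractiveness[d] for r, d in enumerate(ranking))

# Illustrative predicted attractiveness scores and examination curve.
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
exam = [1.0, 0.6, 0.3]

ranking = extract_ranking_policy(scores)
ctr = expected_ctr(ranking, scores, exam)
```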
To plot a bar chart similar to Figure 2 in the paper (after the policy robustness experiment):

```
python bar_chart_policy_eval.py --path [path/to/datasets/directory/]
```

This will read the results and plot them in an interpretable fashion. The figure is saved in `path/policy_eval.png`.