UpliftRec

Overview

This is the official code for "Treatment Effect Estimation for User Interest Exploration on Recommender Systems" (SIGIR'24).

Dataset

We use three datasets: Yahoo!R3, Coat, and KuaiRec. Yahoo!R3 and Coat are included in this repository. KuaiRec is large, so it must be downloaded from its official source at https://kuairec.com/. Each dataset is preprocessed separately for the three backend models and for UpliftRec, and the processed data lives in the "data" directory. To generate the processed data, run the provided Jupyter notebook code.
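Before training, it can help to confirm that the processed data is where the commands in this README expect it. The following is a minimal sketch, not part of the repository: the helper name `missing_data_dirs` is hypothetical, and the directory list is simply taken from the `--data_path` values used in the commands in this README.

```python
from pathlib import Path

# Sub-directories of "data" referenced by the --data_path flags in this README.
# This list is an assumption based on those commands, not an official manifest.
EXPECTED_DIRS = [
    "yahoo/yahoo-FM",
    "coat/coat-FM",
    "kuai/kuai-FM",
]


def missing_data_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs("data")
    if missing:
        print("Missing processed data directories:", ", ".join(missing))
    else:
        print("All expected data directories are present.")
```

If a directory is reported as missing, re-running the preprocessing notebook for that dataset (and, for KuaiRec, downloading the raw data first) is the likely fix.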

Backend Models

We use MF, FM, and LightGCN as backend models. MF is used for the Yahoo!R3 and KuaiRec datasets, and FM for the Coat dataset. The backend model implementations are in the "code" directory. We also provide the embedding files generated by the backend models, together with the embedding file for all user samples in the augmented dataset, so UpliftRec can be trained from these pre-computed embeddings without re-running the backend models. The following commands train the best backend models:

To train MF for Yahoo!R3, first set "dataset" to "yahoo" and "main_path" to "../../data/yahoo/yahoo-FM/" in "config.py". Then use this command for the best backend model:
```
python -u main.py --lr=0.0001 --factor_num=512 --batch_size=256 --epochs=30 --gpu=0 --num_ng=24
```

To train FM for Coat, use this command for the best backend model:

```
python -u main.py --dataset=coat --data_path=../../data/coat/coat-FM/ --lr=0.0001 --hidden_factor=256 --batch_size=16 --epochs=100 --dropout=[0.4,0.2] --gpu=0
```

To train MF for KuaiRec, first set "dataset" to "kuai" and "main_path" to "../../data/kuai/kuai-FM/" in "config.py". Then use this command for the best backend model:
```
python -u main.py --lr=0.0001 --factor_num=256 --batch_size=1024 --epochs=100 --gpu=0 --num_ng=24
```

To obtain embeddings for all user samples in the augmented dataset, train the backend model with the data path set to the matching "dataset-subusers-?cate?" directory for each dataset. The "?" placeholders vary with the sample number and the category-clustering number.
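The directory names used later in this README ("yahoo-subusers-cate5", "coat-subusers-cate3", "kuai-subusers-2cate5") suggest how the two placeholders are filled in. The sketch below parses such a name into its parts. It is an illustration only: the function `parse_subdata_dir` is hypothetical, the pattern is inferred from those three names, and reading the first number as the sample number and the second as the category-clustering number is an assumption.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from the --subdata_path values in this README (an assumption):
#   <dataset>-subusers-<optional number>cate<number>
_SUBDATA_RE = re.compile(
    r"^(?P<dataset>[a-z]+)-subusers-(?P<samples>\d*)cate(?P<cates>\d+)$"
)


def parse_subdata_dir(path):
    """Split a sub-user data directory name into (dataset, samples, cates).

    `samples` is None when the name carries no number before "cate".
    """
    name = PurePosixPath(path).name  # drops parent dirs and any trailing slash
    match = _SUBDATA_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised sub-user directory name: {name!r}")
    samples = int(match.group("samples")) if match.group("samples") else None
    return match.group("dataset"), samples, int(match.group("cates"))


if __name__ == "__main__":
    for p in ("./data/yahoo/yahoo-subusers-cate5/",
              "./data/kuai/kuai-subusers-2cate5/"):
        print(p, "->", parse_subdata_dir(p))
```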

Quick Examples with Optimal Parameters

Use the following commands to run UpliftRec with the optimal parameters:

UpliftRec-MTEF on Yahoo!R3:

```
python -u main.py --dataset=yahoo --similar_user_num=30 --similar_user_num_propensity=20 --treat_clip_num=8 --ADRF_null=0.01 --propensity_null=0.01 --MTEF_null=0.45 --top_k=[10] --eps=0 --delta_T=1 --use_MTEF=1 --alpha_MTEF=0.15 --check_user=14 --embed_file=embed_user_MF_yahoo_256emb_all.npy --use_onehot_embed=0 --use_topk=1000 --data_path=./data/yahoo/yahoo-FM/ --subdata_path=./data/yahoo/yahoo-subusers-cate5/ --backend_modelname=_MF_yahoo_512emb_backend.npy --backend_path=./code/MF/embeddings/ --gamma=7.5
```

UpliftRec-MTEF on Coat:

```
python -u main.py --dataset=coat --similar_user_num=80 --similar_user_num_propensity=100 --treat_clip_num=6 --ADRF_null=0.01 --propensity_null=0.01 --MTEF_null=0.05 --top_k=[10] --eps=1 --delta_T=1 --use_MTEF=1 --alpha_MTEF=0.4 --check_user=39 --embed_file=embed_user_FM_coat_512emb_all.npy --use_onehot_embed=0 --use_topk=100 --data_path=./data/coat/coat-FM/ --subdata_path=./data/coat/coat-subusers-cate3/ --backend_path=./code/FM/embeddings/ --backend_modelname=_FM_coat_256emb_backend.npy --gamma=0.5
```

UpliftRec-MTEF on KuaiRec:

```
python -u main.py --dataset=kuai --similar_user_num=12288 --similar_user_num_propensity=1000 --treat_clip_num=5 --ADRF_null=0.01 --propensity_null=0.01 --MTEF_null=0.35 --top_k=[10] --eps=0 --delta_T=1 --use_MTEF=1 --alpha_MTEF=0.1 --check_user=0 --embed_file=embed_user_MF_kuai_256emb_all.npy --use_onehot_embed=0 --use_topk=1000 --data_path=./data/kuai/kuai-FM/ --subdata_path=./data/kuai/kuai-subusers-2cate5/ --backend_modelname=_MF_kuai_256emb_backend.npy --backend_path=./code/MF/embeddings/ --gamma=4
```

UpliftRec-ADRF on Yahoo!R3:

```
python -u main.py --dataset=yahoo --similar_user_num=50 --similar_user_num_propensity=20 --treat_clip_num=8 --ADRF_null=0.3 --propensity_null=0.01 --top_k=[10] --eps=1 --delta_T=1 --use_MTEF=0 --check_user=14 --embed_file=embed_user_MF_yahoo_256emb_all.npy --use_onehot_embed=0 --use_topk=1000 --data_path=./data/yahoo/yahoo-FM/ --subdata_path=./data/yahoo/yahoo-subusers-cate5/ --backend_modelname=_MF_yahoo_512emb_backend.npy --backend_path=./code/MF/embeddings/ --gamma=1
```

UpliftRec-ADRF on Coat:

```
python -u main.py --dataset=coat --similar_user_num=5 --similar_user_num_propensity=50 --treat_clip_num=8 --ADRF_null=0.01 --propensity_null=0.01 --top_k=[10] --eps=1 --delta_T=1 --use_MTEF=0 --check_user=116 --embed_file=embed_user_FM_coat_512emb_all.npy --use_onehot_embed=0 --data_path=./data/coat/coat-FM/ --subdata_path=./data/coat/coat-subusers-cate3/ --backend_path=./code/FM/embeddings/ --backend_modelname=_FM_coat_256emb_backend.npy --gamma=8 --use_topk=100
```

UpliftRec-ADRF on KuaiRec:

```
python -u main.py --dataset=kuai --similar_user_num=100 --similar_user_num_propensity=1000 --treat_clip_num=8 --ADRF_null=0.01 --propensity_null=0.01 --top_k=[10] --eps=1 --delta_T=1 --use_MTEF=0 --check_user=0 --embed_file=embed_user_MF_kuai_256emb_all.npy --use_onehot_embed=0 --data_path=./data/kuai/kuai-FM/ --subdata_path=./data/kuai/kuai-subusers-2cate5/ --backend_modelname=_MF_kuai_256emb_backend.npy --backend_path=./code/MF/embeddings/ --gamma=0
```

We train UpliftRec on GPU. Note that the embedding files are also on GPU.
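The commands above are long, so it may be convenient to keep each configuration in a dictionary and assemble the command line from it. The sketch below shows one way to do that. The helper `build_command` and the dictionary `coat_adrf` are hypothetical names, and the dictionary holds only a subset of the UpliftRec-ADRF flags for Coat listed above; it is not a complete configuration.

```python
import shlex
import sys


def build_command(params, script="main.py"):
    """Turn a dict of settings into an argv list of --key=value flags."""
    cmd = [sys.executable, "-u", script]
    cmd += [f"--{key}={value}" for key, value in params.items()]
    return cmd


# Illustrative subset of the UpliftRec-ADRF settings for Coat shown above.
coat_adrf = {
    "dataset": "coat",
    "similar_user_num": 5,
    "similar_user_num_propensity": 50,
    "treat_clip_num": 8,
    "top_k": "[10]",
    "use_MTEF": 0,
    "gamma": 8,
}

if __name__ == "__main__":
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(build_command(coat_adrf)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch a run. Passing a list avoids shell quoting issues with values such as `[10]`.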

Requirements

python == 3.8.13
pytorch == 1.7.1
scikit-learn == 1.0.2
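
To check an environment against these pinned versions, a small script along the following lines can be used. It is a sketch under stated assumptions: the function name `version_mismatches` is hypothetical, PyTorch is looked up under its pip distribution name `torch`, and any local build suffix such as "+cu110" is ignored when comparing.

```python
import sys
from importlib import metadata

# Pinned versions from the Requirements section; "torch" is the pip name of PyTorch.
PINNED = {"torch": "1.7.1", "scikit-learn": "1.0.2"}
PINNED_PYTHON = (3, 8)


def version_mismatches(pinned):
    """Return {package: (expected, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore a local build suffix (for example "+cu110") when comparing.
        base = installed.split("+")[0] if installed else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != PINNED_PYTHON:
        print("Python", sys.version.split()[0], "differs from the pinned 3.8.x")
    for name, (expected, installed) in version_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```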
