
pda's Introduction

PDA

This is an implementation of our SIGIR 2021 paper "Causal Intervention for Leveraging Popularity Bias in Recommendation", based on TensorFlow.

This work was completed when Yang Zhang was an intern at WeChat, Tencent.

Requirements

  • tensorflow == 1.14
  • Cython (for the NeuRec evaluator)
  • NumPy
  • prefetch-generator
  • Python 3
  • pandas == 0.25.0

Datasets

  • Kwai: we provide the URL of the original data and the pre-processing code (not fully checked) for filtering and splitting it. We do not provide the processed Kwai dataset here because we are not sure whether we have the right to release it. If you have difficulties obtaining or processing it, you can contact us.

    original data; processing-code

  • Douban (Movie): we provide the URL of the original data. Please note that only the movie subset is used. We also provide the preprocessing code (not fully checked; please verify it against the processed data) and the processed dataset. If you use this dataset, you may need to cite the following paper, as the dataset owners request: Song, Weiping, et al. "Session-based social recommendation via dynamic graph attention networks." WSDM 2019.
    original data; processing-code; processed data

  • Tencent: this is a private dataset.

Parameters

Key parameters in train_new_api.py:

  • --pop_exp: $\gamma$ or $\tilde{\gamma}$ in the paper.
  • --train: model selection (normal: BPRMF/BPRMF-A | s_condition: PD/PDA | temp_pop: BPR(t)-pop).
  • --test: same choices as --train.
  • --saveID: flag appended to the saved model name.
  • --Ks: list of top-K cutoffs.
  • others: see the built-in help ("python xxx.py --help").

Commands

We provide the following commands for our models and the baselines.

PD & PDA

There are two ways to run them.

1. Simply Reproduce the Results:

  • We provide the models we trained, so you can directly reproduce the results in our paper by running:
    python -u MF/simple_reproduce.py --dataset douban --epoch 2000 --save_flag 0 --log_interval 5 --start 0 --end 10 --step 1 --batch_size 2048 --lr 1e-2 --train s_condition --test s_condition --saveID xxx --cuda 0 --regs 1e-2 --valid_set valid --pop_exp 0.22 --save_dir /home/PDA/save_model/ --Ks [20,50]
    
    Change --pop_exp per dataset (Kwai: 0.16, Douban: 0.22).
  • The trained models can be downloaded at this URL (English instructions for downloading can be found here).

2. Start from Scratch:

If you want to run PD and PDA on new datasets, you need:

1). Split data:
  • Split the dataset into T stages by yourself and save each stage as a file named like "t_0.txt". Each line of a stage file should have the format: item interacted_user1 interacted_user2 ....

  • You also need to save all training/testing/validation data in one file with a name such as "train_with_time.txt", with one interaction per line in the format: uid iid time stars, where time is in [0, T-1].

    Note that we have provided the processed data for Douban; you can refer to it. A minimal sketch of writing both formats follows.
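As an illustration only (not the authors' script), the following sketch writes both file formats from a pandas DataFrame with columns uid, iid, time, stars, where time is already a stage id in [0, T-1]:

    import pandas as pd

    def export_pda_files(df: pd.DataFrame, T: int, out_dir: str = "."):
        # One file per stage: each line is "item user1 user2 ..."
        for t in range(T):
            stage = df[df["time"] == t]
            with open(f"{out_dir}/t_{t}.txt", "w") as f:
                for iid, users in stage.groupby("iid")["uid"]:
                    f.write(f"{iid} " + " ".join(map(str, users)) + "\n")
        # All interactions in one space-separated file: "uid iid time stars"
        df[["uid", "iid", "time", "stars"]].to_csv(
            f"{out_dir}/train_with_time.txt", sep=" ", header=False, index=False)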

2). Compute Popularity:
  • Compute the item popularity of each stage and normalize it:
    python pop_pre.py --path your_data_path --slot_count T
    
    The computed popularity is saved to a file, so this code only needs to be run once. A rough sketch of the computation follows.
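This is roughly what the step computes (our reading; the exact normalization in pop_pre.py may differ):

    import numpy as np
    import pandas as pd

    def stage_popularity(df: pd.DataFrame, n_items: int, T: int) -> np.ndarray:
        # pop[i, t] = interaction count of item i in stage t,
        # normalized within each stage so stages are comparable.
        pop = np.zeros((n_items, T), dtype=np.float32)
        for t in range(T):
            counts = df[df["time"] == t]["iid"].value_counts()
            pop[counts.index.to_numpy(), t] = counts.to_numpy()
            pop[:, t] /= max(pop[:, t].sum(), 1.0)
        return pop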
3). Run PD/PDA:
  • Run the main command:
    nohup python -u MF/train_new_api.py --dataset kwai --epoch 2000 --save_flag 1 --log_interval 5 --start 0 --end 10 --step 1 --batch_size 2048 --lr 1e-2 --train s_condition --test s_condition --saveID s_condition --cuda 1 --regs 1e-2 --valid_set valid --pop_exp gamma > output.out &
    
    and tune the parameters.
4). Optimal Parameters for Douban and Kwai:
  • The parameters we found:

    Dataset | PD reg | PD pop_exp (gamma) | PDA reg | PDA pop_exp (gamma)
    Kwai    | 1e-3   | 0.02               | 1e-3    | 0.16
    Douban  | 1e-3   | 0.02               | 1e-3    | 0.22

    Other parameters: defaults.

    Note: PD and PDA achieve good performance relative to the baselines even with a shared gamma; for example, on Kwai with gamma = 0.1, both PD and PDA already compare well against the baselines. Due to random seeds and machine differences, the results may differ slightly; to make exact reproduction easy, we provide the trained models.

Baselines

We provide our own implementations of BPRMF, BPR-PC, and BPR(t)-pop.

1. BPRMF/BPRMF-A:
  • run BPRMF/BPRMF-A:
    nohup python -u MF/train_new_api.py --dataset kwai --epoch 2000 --save_flag 0 --log_interval 5 --start 0 --end 10 --step 1 --batch_size 2048 --lr lr --train normal --test normal --saveID normal --cuda 1 --regs reg --valid_set valid > output.out &
    
    Running BPRMF will also run BPRMF-A synchronously.
2. BPR-PC:
  • Two steps are needed:
    Step 1: train BPRMF and keep the saved model.
    Step 2: run BPR_PC.py and search the hyper-parameters (alpha and beta):
         python3 -W ignore -u MF/BPR_PC.py --dataset dataset --epoch 2000 --save_flag 0 --log_interval 5 --start 0 --end 10 --step 1 --batch_size 2048 --lr 1e-3 --train normal --test normal --saveID normal --pretrain 0 --cuda 1 --regs 1e-2 --valid_set valid --Ks [20,50] --pc_alpha $alpha --pc_beta $beta
    
    The model path is determined by parameters such as --train/--test/--saveID. For more detail, please read the code.
3. BPR(t)-Pop:
  • Similar to BPRMF: change the --train/--test/--saveID parameters to "temp_pop". For example (the BPRMF command above with those flags swapped):
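    nohup python -u MF/train_new_api.py --dataset kwai --epoch 2000 --save_flag 0 --log_interval 5 --start 0 --end 10 --step 1 --batch_size 2048 --lr lr --train temp_pop --test temp_pop --saveID temp_pop --cuda 1 --regs reg --valid_set valid > output.out &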
4. xQuad and DICE
  • The code for xQuad and DICE was provided by the original authors; you can also ask them for it. Please note that for DICE, regarding the regularization term $L_{discrepancy}$, we replace $dCor$ with another option, $L2$: our datasets are far larger than those used in the original paper, so computing $dCor$ is very slow and runs out of memory on the 2080Ti GPU we used. As the DICE paper suggests for large-scale datasets, $L2$ is taken as the substitute for $L_{discrepancy}$; a sketch of this substitution is given after this list.
5. Hyper-parameters:
  • Please read our paper for how we tuned the hyper-parameters.
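To make the $L2$ substitution above concrete, here is a minimal sketch under stated assumptions (int_emb and con_emb are hypothetical names for DICE's interest and conformity embedding matrices; DICE maximizes their discrepancy, so this term enters the total loss with a negative sign):

    import tensorflow as tf

    def l2_discrepancy(int_emb, con_emb):
        # int_emb, con_emb: [n, d] interest / conformity embeddings.
        # Mean squared L2 distance between the two sets; the training
        # objective subtracts this value to push the sets apart.
        return tf.reduce_mean(tf.square(int_emb - con_emb))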

Acknowledgments

  • The C++ evaluator implemented in NeuRec is used in some cases.

  • Some code was provided by the co-author Tianxin Wei.

Citation

If you use our code in your research, please cite our paper.

pda's People

Contributors

v-liuwei, zyang1580


pda's Issues

Create_Recommendation & do_recommendation functions

Excuse me, I would like to ask some questions about the two functions mentioned in the title:

  • Is there any difference between Create_Recommendation and do_recommendation in the DatasetApi_Model class?
  • In the class __init__ definition, is the Create_Recommendation function ultimately used to compute the scores during training?
  • I noticed that do_recommendation is used to get the top-k results during testing, but the scores computed in do_recommendation are the same as in Create_Recommendation, and the test dataset in the class doesn't seem to be used. Is there any connection between the two functions?

Looking forward to your reply!

Running BPR(t)-pop comes across a bug

My command:
python -u MF/train_new_api.py --dataset douban --epoch 10 --save_flag 0 --log_interval 5 --start 0 --end 10 --step 1 --batch_size 2048 --lr 1e-2 --train temp_pop --test temp_pop --saveID temp_pop --cuda 1 --regs 1e-2 --valid_set valid

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[2047] = [2047, 1] does not index into param shape [2048,1]
  [[{{node GatherNd}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "MF/train_new_api.py", line 1091, in <module>
    model.Recommender.mf_loss, model.Recommender.reg_loss ])
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[2047] = [2047, 1] does not index into param shape [2048,1]
  [[node GatherNd (defined at /workspace/CRD/PDA/MF/model_api.py:346) ]]

Errors may have originated from an input operation.
Input Source operations connected to node GatherNd:
 embedding_lookup_3/Identity (defined at /workspace/CRD/PDA/MF/model_api.py:341)
 concat (defined at /workspace/CRD/PDA/MF/model_api.py:344)

Request to Add a License to the Code Repository

First, thank you for sharing your work with the community!

I kindly ask you to consider adding an open-source license to the repository so that others can reference and build on your work.

Thank you for considering our request!

Question on function create_bpr_loss_two_brach

Hi, thanks for your great work and for sharing the code. I have a question about the implemented BPRMF model.
In the following implementation, I could not work out which paper the function create_bpr_loss_two_brach refers to:

def create_bpr_loss_two_brach(self, users, pos_items, neg_items): # TODO???
    pos_scores = tf.reduce_sum(tf.multiply(users, pos_items), axis=1)   #users, pos_items, neg_items have the same shape
    neg_scores = tf.reduce_sum(tf.multiply(users, neg_items), axis=1)
    # item stop
    pos_scores = tf.nn.elu(pos_scores) + 1
    neg_scores = tf.nn.elu(neg_scores) + 1
    # pos_items_stop = tf.stop_gradient(pos_items)
    # neg_items_stop = tf.stop_gradient(neg_items)
    pos_items_stop = pos_items
    neg_items_stop = neg_items

    self.pos_item_scores = tf.matmul(pos_items_stop,self.w)
    self.neg_item_scores = tf.matmul(neg_items_stop,self.w)
    ps_sigmoid = tf.nn.sigmoid(self.pos_item_scores)
    ns_sigmoid = tf.nn.sigmoid(self.neg_item_scores)

    # first branch
    pos_scores = pos_scores* ps_sigmoid
    neg_scores = neg_scores* ns_sigmoid
    maxi = tf.log(tf.nn.sigmoid(pos_scores - neg_scores)+1e-10)
    self.rubi_ratings = (tf.nn.elu(self.batch_ratings) + 1  - self.rubi_c) * tf.squeeze(ps_sigmoid)
    # self.shape1 = tf.shape(self.batch_ratings)
    # self.shape2 = tf.shape(tf.squeeze(ps_sigmoid))
    # self.rubi_ratings = (self.batch_ratings-self.rubi_c) * tf.squeeze(tf.nn.sigmoid(self.pos_item_scores))
    self.direct_minus_ratings = self.batch_ratings-self.rubi_c*tf.squeeze(tf.nn.sigmoid(self.pos_item_scores))
    self.mf_loss_ori_bce = tf.negative(tf.reduce_mean(maxi))

    # second branch
    # maxi_item = tf.log(tf.nn.sigmoid(self.pos_item_scores - self.neg_item_scores))
    # self.mf_loss_item_bce = tf.negative(tf.reduce_mean(maxi_item))
    self.mf_loss_item_bce = tf.reduce_mean(tf.negative(tf.log(ps_sigmoid + 1e-10))+tf.negative(tf.log(1-ns_sigmoid+1e-10)))
    # unify
    mf_loss = self.mf_loss_ori_bce + self.alpha*self.mf_loss_item_bce
    # regular
    regularizer = tf.nn.l2_loss(users) + tf.nn.l2_loss(pos_items) + tf.nn.l2_loss(neg_items)
    regularizer = regularizer/self.batch_size

    reg_loss = self.decay * regularizer
    return mf_loss, reg_loss

And what is the meaning of the "rubi_c" variable?

self.rubi_c = tf.Variable(tf.zeros([1]), name = 'rubi_c') # [TODO]???

I am also confused about the meaning of the untrainable variable "user_rand_embedding":

weights['user_rand_embedding'] = tf.Variable(initializer([self.n_users, self.emb_dim]), name = 'user_rand_embedding', trainable = False) # [TODO] ???
weights['item_rand_embedding'] = tf.Variable(initializer([self.n_items, self.emb_dim]), name = 'item_rand_embedding', trainable = False)

Can you tell me which paper this implementation references?
Thanks for your time and any help would be appreciated.

AttributeError: 'Data2' object has no attribute 'expo_popularity'

Thanks for your work. While running the command

python -u MF/train_new_api.py --dataset douban --epoch 2000 --save_flag 1 --log_interval 5 --start 0 --end 10 --step 1 --batch_size 2048 --lr 1e-2 --train s_condition --test s_condition --saveID s_condition --cuda 1 --regs 1e-2 --valid_set valid --pop_exp 0.22

the following error occurs:

  File "D:\Dragon_Killer_Plus\PDA_ori\MF\train_new_api.py", line 402, in generator_n_batch_with_pop
    batch_pos_pop.append(data.expo_popularity[one_pos_item,u_pos_time])
AttributeError: 'Data2' object has no attribute 'expo_popularity'

Could you tell me how to fix it? Thanks a lot!

Difference between the PD model and the BPRMF model

Hi, thanks for your great paper and for sharing the code!

I have a question related to the model.
In Sec 3.2, Step 2 estimates $\sum_z P(C|U,I,z)P(z)$. The last sentence of that paragraph says we can use $ELU'(f_{\theta}(u,i))$ to estimate $P(C|do(U,I))$, where $f_{\theta}(u,i)$ denotes any user-item matching model and the paper chooses MF (last paragraph on page 14). As I understand it, the PD model embeds an $ELU'$ activation function, and that is the main difference between the BPRMF model and the PD model. (I guess I am missing something here.)
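For reference, $ELU'$ in the paper is ELU shifted to be strictly positive, which the released code appears to implement as tf.nn.elu(x) + 1 (a one-line restatement, assuming that reading is right):

    import tensorflow as tf

    def elu_prime(x):
        # ELU'(x) = e^x for x <= 0, and x + 1 for x > 0: always positive,
        # matching tf.nn.elu(scores) + 1 in the released code.
        return tf.nn.elu(x) + 1.0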

According to Algorithm 1, the PD model does not use the popularity information during training and inference.
(I understand that the PDA model injects the popularity bias.)

But according to Table 1, the PD model performs much better than the BPRMF model,
and I am confused about this.

Any help would be appreciated and thanks for your time.

Why use sparse_clicked_matrix in the testing stage to get topk_item?

I want to know the reason for using sparse_cliked_matrix:

    def generator_Rec_result_fast(self, model, sess, rec_type):
        i = 0
        result = []
        ttt1 = time()
        for batch_user in self.list_batch_user:
            batch_item = list(range(ITEM_NUM))
            index, row_num, valu_num = self.list_batch_index[i]
            if self.testing_popularity is not None:
                pos_pop = self.testing_popularity[batch_item]
            else:
                pos_pop = None
            sparse_cliked_matrix = (index,
                                    np.array([-np.inf] * valu_num).astype(np.float32),
                                    np.array([row_num, ITEM_NUM]).astype(np.int64))
            batch_topk = model.do_recommendation(sess, batch_user, batch_item, rec_type,
                                                 pos_pop=pos_pop,
                                                 sparse_cliked_matrix=sparse_cliked_matrix)
            i += 1
            yield (batch_user, batch_topk)

And when getting the main-branch result (for models that don't inject predicted popularity, i.e., PD/BPRMF), it is still used:

    self.main_branchRec_rating = tf.sparse.add(self.Recommender.batch_ratings, self.sparse_cliked_matrix)  # remove history
    _, self.main_brach_topk_idx = tf.nn.top_k(self.main_branchRec_rating, topk_max)  # topk
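My understanding of the mechanics (a toy illustration with made-up values, not the repo's code) is that the sparse add places -inf at every (user, already-clicked item) position, so the top-k can never return an item from the user's history:

    import numpy as np

    ratings = np.array([[0.9, 0.5, 0.1]])   # scores for 3 items
    mask = np.array([[-np.inf, 0.0, 0.0]])  # item 0 was already clicked
    top2 = np.argsort(-(ratings + mask))[0][:2]
    print(top2)                             # [1 2]: item 0 is excluded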
If there is something unclear, please point it out. I sincerely look forward to your answer.

How is P(Z) calculated?

Hi,

Thank you for sharing the code. I have a question: how did you calculate $P(z)$ from the paper in the code?

Looking forward to your reply, thank you!


ValueError: Cannot subset columns with a tuple with more than one element. Use a list instead

I'm using pandas 2.0.3.

Traceback (most recent call last):
  File "MF/simple_reproduce.py", line 30, in <module>
    from batch_test import *
  File "C:\Users\Administrator\PDAModelTests\MF\batch_test.py", line 10, in <module>
    data = Data2(args)
  File "C:\Users\Administrator\PDAModelTests\MF\load_data.py", line 742, in __init__
    self.load_ori_data(args)
  File "C:\Users\Administrator\PDAModelTests\MF\load_data.py", line 639, in load_ori_data
    user_item_time = train_data.groupby('uid')[('iid','time')].agg(list)
  File "C:\Users\Administrator\anaconda3\lib\site-packages\pandas\core\groupby\generic.py", line 1767, in __getitem__
    raise ValueError(
ValueError: Cannot subset columns with a tuple with more than one element. Use a list instead.
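A likely fix (our guess, as the error message itself suggests): pandas 2.x no longer accepts tuple column subsets on a groupby, so the offending line in load_data.py would become:

    # before (works on the pinned pandas == 0.25.0, fails on pandas 2.x):
    # user_item_time = train_data.groupby('uid')[('iid', 'time')].agg(list)
    user_item_time = train_data.groupby('uid')[['iid', 'time']].agg(list)

Alternatively, installing the pinned pandas == 0.25.0 from the requirements should avoid the error.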

Kwai dataset cannot be downloaded; could you share it as you wrote below?

Kwai: we provide the URL of the original data and the pre-processing code (not fully checked) for filtering and splitting it. We do not provide the processed Kwai dataset here because we are not sure whether we have the right to release it. If you have difficulties obtaining or processing it, you can contact us.

original data; processing-code

ModuleNotFoundError: No module named 'util.cython.random_choice'

Thanks for your work. While running the command

python -u MF/simple_reproduce.py --dataset douban --epoch 2000 --save_flag 0 --log_interval 5 --start 0 --end 10 --step 1 --batch_size 2048 --lr 1e-2 --train s_condition --test s_condtion --saveID xxx --cuda 0 --regs 1e-2 --valid_set valid --pop_exp 0.22 --save_dir /home/PDA/save_model/ --Ks [20,50]

the following error occurs:

Traceback (most recent call last):
  File "MF/simple_reproduce.py", line 39, in <module>
    from evaluator import ProxyEvaluator
  File "/home/weifz/codes/PDA/evaluator/__init__.py", line 1, in <module>
    from .proxy_evaluator import ProxyEvaluator
  File "/home/weifz/codes/PDA/evaluator/proxy_evaluator.py", line 4, in <module>
    from util import typeassert
  File "/home/weifz/codes/PDA/util/__init__.py", line 14, in <module>
    from .cython.random_choice import batch_randint_choice
ModuleNotFoundError: No module named 'util.cython.random_choice'

Could you tell me how to fix it? Thanks a lot.
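The missing module is the compiled Cython extension noted in the requirements ("Cython, for the NeuRec evaluator"), so it most likely has to be built before the import can succeed. Assuming the repo keeps NeuRec's usual build script (check the repository for the actual file name), something like the following, run from the repository root, typically generates util/cython/random_choice:

    python setup.py build_ext --inplace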

Question about the activation function

Hi, thank you for contributing such a high-quality paper and codebase!
I have one question: the paper uses the $ELU'$ activation function without much explanation of why. During your research, did you try other common activation functions, such as $Sigmoid$? Why did you finally choose $ELU'$? Thanks!

