This repository contains the PyTorch implementation of the paper Learning to Infer User Implicit Preference in Conversational Recommendation (CRIF).
Please refer to the paper for more details.
The code partially follows the implementation of Adapting User Preference to Online Feedback in Multi-round Conversational Recommendation (FPAN).
torch==1.4.0
torch_geometric==1.4.3
tqdm
scikit-learn
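The dependencies can be installed with pip, for example:
pip install torch==1.4.0 torch_geometric==1.4.3 tqdm scikit-learn
(Note that torch_geometric 1.4.x additionally requires companion packages such as torch-scatter and torch-sparse; see the torch_geometric installation guide for version-matched wheels.)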
The datasets we use are based on Yelp and LastFM, preprocessed as in SCPR.
- To train the CRIF offline model:
python offline_train_rec.py
The model parameters will be saved in \recommendersystem\recmodel
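For reference, the checkpoints follow the standard PyTorch state_dict convention. The sketch below illustrates that convention only; RecModel, its shapes, and the file name are hypothetical placeholders, not the repository's actual classes or paths:

```python
import os
import torch
import torch.nn as nn

# Minimal sketch of the usual PyTorch save/load convention.
# "RecModel", its sizes, and the checkpoint name are placeholders.
class RecModel(nn.Module):
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, users, items):
        # Score a user-item pair by the inner product of their embeddings.
        return (self.user_emb(users) * self.item_emb(items)).sum(-1)

model = RecModel(n_users=1000, n_items=5000)
os.makedirs("recommendersystem/recmodel", exist_ok=True)
torch.save(model.state_dict(), "recommendersystem/recmodel/recmodel.pt")

# Later (e.g. for offline evaluation) the parameters can be restored:
model.load_state_dict(torch.load("recommendersystem/recmodel/recmodel.pt"))
model.eval()
```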
- To train the policy network in the conversational component:
Inverse reinforcement learning (IRL) is adopted to tackle the decision-making problem: a reward function is learned explicitly from user feedback, and at the same time a policy is trained to maximize the reward given by the currently predicted reward function (a minimal sketch of this alternating scheme follows the commands below).
python train_agent_ear.py --mode pretrain
: train the policy network from scratch, without a pretrained model. The model parameters will be saved in \agents\agent_ear
python train_agent_ear.py --mode PG
: further train the policy network with policy gradient, starting from the pretrained model. The model parameters will be saved in \agents\agent_ear
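A minimal, self-contained sketch of the alternating IRL scheme described above is shown below. It is an illustration only, not the code of train_agent_ear.py: the network shapes, the random toy states, and the simulated ±1 feedback are all assumptions made to keep the example runnable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS = 16, 4  # toy sizes, not the repository's real dimensions

class PolicyNet(nn.Module):
    """Maps a dialogue state to a distribution over actions (e.g. ask / recommend)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                                 nn.Linear(32, N_ACTIONS))

    def forward(self, s):
        return F.softmax(self.net(s), dim=-1)

class RewardNet(nn.Module):
    """Predicts a scalar reward for a (state, action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + N_ACTIONS, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, s, a_onehot):
        return self.net(torch.cat([s, a_onehot], dim=-1)).squeeze(-1)

policy, reward_fn = PolicyNet(), RewardNet()
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_r = torch.optim.Adam(reward_fn.parameters(), lr=1e-3)

for step in range(100):
    # Roll out the current policy on toy dialogue states (random here).
    s = torch.randn(32, STATE_DIM)
    probs = policy(s)
    a = torch.multinomial(probs, 1).squeeze(-1)
    a_onehot = F.one_hot(a, N_ACTIONS).float()

    # Simulated user feedback in {-1, +1}, standing in for the simulator's responses.
    feedback = (torch.rand(32) > 0.5).float() * 2 - 1

    # (1) Reward learning: fit the reward network to the observed feedback.
    loss_r = F.mse_loss(reward_fn(s, a_onehot), feedback)
    opt_r.zero_grad()
    loss_r.backward()
    opt_r.step()

    # (2) Policy learning: REINFORCE, using the *predicted* reward as the return.
    with torch.no_grad():
        r = reward_fn(s, a_onehot)
    log_pi = torch.log(probs.gather(1, a.unsqueeze(-1)).squeeze(-1) + 1e-8)
    loss_pi = -(log_pi * r).mean()
    opt_pi.zero_grad()
    loss_pi.backward()
    opt_pi.step()
```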
- To directly evaluate CRIF:
We provide the trained model parameters in the corresponding folders.
python offline_test_rec.py
: evaluate the CRIF offline model
python test_agent_ear.py
: evaluate CRIF with the user simulator
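For context, simulator-based evaluation in multi-round conversational recommendation typically reports the success rate within t turns (SR@t) and the average number of turns (AT). The sketch below illustrates such a loop with a trivial rule-based agent; it is not the logic of test_agent_ear.py, and every name and number in it is an assumption:

```python
import random

N_ITEMS, N_ATTRS, MAX_TURNS, TOP_K = 200, 20, 15, 10  # toy settings

def evaluate(episodes=100, seed=0):
    rng = random.Random(seed)
    # Each hypothetical item is described by a small set of attributes.
    item_attrs = {i: set(rng.sample(range(N_ATTRS), 4)) for i in range(N_ITEMS)}
    successes, turns_used = 0, []

    for _ in range(episodes):
        target = rng.randrange(N_ITEMS)  # ground-truth item of this "user"
        known, rejected = set(), set()   # confirmed attributes, rejected items

        for turn in range(1, MAX_TURNS + 1):
            # Candidates consistent with every confirmed attribute so far.
            cands = [i for i in range(N_ITEMS)
                     if known <= item_attrs[i] and i not in rejected]
            if len(cands) > TOP_K:
                # "Ask" action: the simulated user confirms an attribute
                # iff the target item actually has it.
                attr = rng.randrange(N_ATTRS)
                if attr in item_attrs[target]:
                    known.add(attr)
            else:
                # "Recommend" action: succeed iff the target is in the top-k.
                rec = rng.sample(cands, min(TOP_K, len(cands)))
                if target in rec:
                    successes += 1
                    turns_used.append(turn)
                    break
                rejected.update(rec)  # failed recommendation: user rejects top-k

    at = sum(turns_used) / max(len(turns_used), 1)
    print(f"SR@{MAX_TURNS}: {successes / episodes:.3f}  AT: {at:.2f}")

evaluate()
```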