Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots
This repository contains the source code and datasets for the Findings of EMNLP 2020 paper Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots by Gu et al.
Our proposed Filtering before Iteratively REferring (FIRE) model achieved new state-of-the-art performance on knowledge-grounded response selection on the PERSONA-CHAT and CMU_DoG datasets at the time of publication.
Python 2.7
Tensorflow 1.4.0
You can download the datasets used in our paper, together with their corresponding embedding and vocabulary files, from the following links.
- PERSONA-CHAT and its embedding and vocabulary files.
- CMU_DoG and its embedding and vocabulary files.
Unzip the datasets into the data folder and run the following commands. The processed files are stored in data/personachat_processed/ or data/cmudog_processed/.
cd data
python data_preprocess_pc.py
python data_preprocess_cd.py
Then unzip the corresponding embedding and vocabulary files into data/personachat_processed/ or data/cmudog_processed/.
Take PERSONA-CHAT as an example.
cd scripts
bash train_personachat.sh
The training process is logged to the file log_FIRE_train_personachat_original.txt.
bash test_personachat.sh
The testing process is logged to the file log_FIRE_test_personachat_original.txt. You also get a test_out_FIRE_personachat_original.txt file, which records the scores for each example. Run the following command to compute the Recall metric.
python compute_recall.py
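For readers unfamiliar with the metric, the following is a minimal sketch of how Recall@k is typically computed for response selection. It is not the repository's compute_recall.py: the function names recall_at_k and mean_recall_at_k are hypothetical, and the input is assumed to be in-memory lists of candidate scores and 0/1 labels rather than the actual format of the test output file.

```python
def recall_at_k(scores, labels, k=1):
    """Return 1.0 if a positive candidate is ranked in the top k, else 0.0.

    scores: model scores for the candidates of one example (assumed format).
    labels: 0/1 labels aligned with scores, where 1 marks a correct response.
    """
    # Rank candidate indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if any(labels[i] == 1 for i in ranked[:k]) else 0.0


def mean_recall_at_k(examples, k=1):
    """Average Recall@k over a list of (scores, labels) pairs."""
    hits = [recall_at_k(scores, labels, k) for scores, labels in examples]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    # Two toy examples with three candidates each (illustrative values only).
    toy = [
        ([0.1, 0.9, 0.3], [0, 1, 0]),
        ([0.8, 0.2, 0.5], [0, 0, 1]),
    ]
    print("R@1 = %.2f" % mean_recall_at_k(toy, k=1))
    print("R@2 = %.2f" % mean_recall_at_k(toy, k=2))
```

In the toy data, the first example ranks its correct response first, while the second ranks it second, so widening k from 1 to 2 raises the averaged recall.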
If you use the code and datasets, please cite the following paper: "Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots" Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Zhigang Chen, Xiaodan Zhu. EMNLP (2020)
@inproceedings{gu-etal-2020-filtering,
title = "Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots",
author = "Gu, Jia-Chen and
Ling, Zhenhua and
Liu, Quan and
Chen, Zhigang and
Zhu, Xiaodan",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.findings-emnlp.127",
pages = "1412--1422",
}
We thank ParlAI for providing the PERSONA-CHAT dataset.
We thank Xueliang Zhao for providing the processed CMU_DoG dataset used in their paper.
Please keep an eye on this repository if you are interested in our work.
Feel free to contact us ([email protected]) or open issues.