Code Monkey home page Code Monkey logo

bob's Introduction

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data

This repository provides the implementation details for the ACL 2021 main conference paper:

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data. [paper]

1. Data Preparation

In this work, we carried out persona-based dialogue generation experiments under a persona-dense scenario (English PersonaChat) and a persona-sparse scenario (Chinese PersonalDialog), with the assistance of a series of auxiliary inference datasets. Here we summarize the key information of these datasets and provide the links to download these datasets if they are directly accessible.

2. How to Run

The setup.sh script contains the necessary dependencies to run this project. Simply run ./setup.sh would install these dependencies. Here we take the English PersonaChat dataset as an example to illustrate how to run the dialogue generation experiments. Generally, there are three steps, i.e., tokenization, training and inference:

  • Preprocessing

     python preprocess.py --dataset_type convai2 \
     --trainset ./data/ConvAI2/train_self_original_no_cands.txt \
     --testset ./data/ConvAI2/valid_self_original_no_cands.txt \
     --nliset ./data/ConvAI2/ \
     --encoder_model_name_or_path ./pretrained_models/bert/bert-base-uncased/ \
     --max_source_length 64 \
     --max_target_length 32
    

    We have provided some data examples (dozens of lines) at the ./data directory to show the data format. preprocess.py reads different datasets and tokenizes the raw data into a series of vocab IDs to facilitate model training. The --dataset_type could be either convai2 (for English PersonaChat) or ecdt2019 (for Chinese PersonalDialog). Finally, the tokenized data will be saved as a series of JSON files.

  • Model Training

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --do_train \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder2_model ./pretrained_models/bert/bert-base-uncased/ \
     --save_model_path checkpoints/ConvAI2/bertoverbert --dataset_type convai2 \
     --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --learning_rate 7e-6 \
     --batch_size 32
    

    Here we initialize encoder and both decoders from the same downloaded BERT checkpoint. And more parameter settings could be found at bertoverbert.py.

  • Evaluations

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --dataset_type convai2 \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/  \
     --do_evaluation --do_predict \
     --eval_epoch 7
    

    Empirically, in the PersonaChat experiment with default hyperparameter settings, the best-performing checkpoint should be found between epoch 5 and epoch 9. If the training procedure goes fine, there should be some results like:

     Perplexity on test set is 21.037 and 7.813.
    

    where 21.037 is the ppl from the first decoder and 7.813 is the final ppl from the second decoder. And the generated results is redirected to test_result.tsv, here is a generated example from the above checkpoint:

     persona:i'm terrified of scorpions. i am employed by the us postal service. i've a german shepherd named barnaby. my father drove a car for nascar.
     query:sorry to hear that. my dad is an army soldier.
     gold:i thank him for his service.
     response_from_d1:that's cool. i'm a train driver.
     response_from_d2:that's cool. i'm a bit of a canadian who works for america.  
    

    where d1 and d2 are the two BERT decoders, respectively.

  • Computing Infrastructure:

    • The released codes were tested on NVIDIA Tesla V100 32G and NVIDIA PCIe A100 40G GPUs. Notice that with a batch_size=32, the BoB model will need at least 20Gb GPU resources for training.

MISC

  • Build upon ๐Ÿค— Transformers.

  • Bibtex:

      @inproceedings{song-etal-2021-bob,
          title = "{B}o{B}: {BERT} Over {BERT} for Training Persona-based Dialogue Models from Limited Personalized Data",
          author = "Song, Haoyu  and
            Wang, Yan  and
            Zhang, Kaiyan  and
            Zhang, Wei-Nan  and
            Liu, Ting",
          booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
          month = aug,
          year = "2021",
          address = "Online",
          publisher = "Association for Computational Linguistics",
          url = "https://aclanthology.org/2021.acl-long.14",
          doi = "10.18653/v1/2021.acl-long.14",
          pages = "167--177",
      }
      
  • Email: [email protected].

bob's People

Contributors

haoyusoong avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.