
INFO: Intellectual and Friendly Dialogue Agents Grounding Knowledge and Persona

Source code for the paper "You Truly Understand What I Need: Intellectual and Friendly Dialogue Agents Grounding Knowledge and Persona", accepted at EMNLP 2022 Findings.

1. Setup

1.1 Environmental Setup

The code runs with Python 3.6. All dependencies are listed in requirements.txt:

pip install -r requirements.txt

1.2 Dataset

You can download the FoCus dataset (Persona-Knowledge Chat) here.

1.3 Create a knowledge index

Since we use RAG for dialogue generation, you need to create a knowledge index file before generation.
Before creating the knowledge index, move the FoCus dataset into the data/ folder:

|-- data
    |-- FoCus
        |-- train_focus.json
        `-- valid_focus.json
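As a quick sanity check that the files are in place, you can parse a split with the standard json module (a minimal sketch; `load_focus_split` is a hypothetical helper, and nothing is assumed about the schema beyond the files being valid JSON):

```python
import json

def load_focus_split(path):
    """Parse one FoCus split file, e.g. data/FoCus/train_focus.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Usage (after placing the dataset as shown above):
#   train = load_focus_split("data/FoCus/train_focus.json")
#   valid = load_focus_split("data/FoCus/valid_focus.json")
```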

1) The preprocessing code for creating the raw knowledge is in the knowledge_index folder:

create_knowledge_index_for_github.ipynb

2) The code for creating the knowledge index file is run as below:

python use_own_knowledge_dataset.py --csv_path=<your_csv_file> --output_dir=<your_output_dir>

or you can simply run the shell script:

sh create_knowldege_index.sh

We used the same file as in the Transformers GitHub repository, modified slightly to preprocess the raw knowledge.
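Conceptually, the script embeds each knowledge passage into a dense vector and builds a nearest-neighbour index that RAG queries at generation time. A minimal NumPy sketch of that retrieval step (random vectors stand in for the DPR passage encoder, and `build_index` / `retrieve` are hypothetical names, not functions from the script):

```python
import numpy as np

def build_index(passage_embeddings):
    """L2-normalise passage vectors so inner product equals cosine similarity."""
    norms = np.linalg.norm(passage_embeddings, axis=1, keepdims=True)
    return passage_embeddings / norms

def retrieve(index, query_embedding, k=2):
    """Return indices of the k passages most similar to the query."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = index @ q                 # cosine similarity against every passage
    return np.argsort(-scores)[:k]     # top-k by descending score

rng = np.random.default_rng(0)
passages = rng.standard_normal((100, 768))              # stand-ins for DPR embeddings
index = build_index(passages)
query = passages[42] + 0.01 * rng.standard_normal(768)  # a query close to passage 42
print(retrieve(index, query, k=3))
```

The real pipeline stores the embeddings in a FAISS index on disk, which is what the config paths in the next step point at.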

3) After creating the knowledge index for the FoCus dataset, update the following paths in config/rag-tok-base-ct.json:

"data_dir": 
"save_dirpath": 
"knowledge_dataset_path": 
"knowledge_index_path": 
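For example, with the directory layout above, the entries might look like the following (all paths are illustrative placeholders, not values from the repository):

```json
{
  "data_dir": "data/FoCus",
  "save_dirpath": "models/rag-tok-base-ct",
  "knowledge_dataset_path": "knowledge_index/my_knowledge_dataset",
  "knowledge_index_path": "knowledge_index/my_knowledge_dataset_hnsw_index.faiss"
}
```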

2. Training

Before you train the model, please modify the config file.

sh train.sh

3. Evaluation

sh evaluate.sh

Contributors

dlawjddn803, metalchaos8527


Issues

Input Encoding Conflict in PolyEncoder with Mismatched Vocab Sizes

Hi,

I'm encountering an issue that appears to stem from a mismatch in vocab sizes between different parts of the pipeline. In my case, the input encoder handles a vocab size of 50265, while the poly_encoder seems to only cover 30522.

To give some context, here's the input that gets passed to the poly_encoder:

context_input_ids: tensor([[  101,  5320,   625,  1499,  1215,   448,   324, 21978,  3144, 48124,
           534,  6106,  1277, 11936,  7771,    43,   102,  1437,  2264,    16,
             5,  2148,     9,     5,  8410,   116]], device='cuda:0') torch.Size([1, 26])
context_input_masks: tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1]], device='cuda:0') torch.Size([1, 26])

However, upon feeding this into the BERT model within the poly_encoder, I'm hit with the following error:

/opt/conda/conda-bld/pytorch_1670525552411/work/aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [72,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
CUDA error: device-side assert triggered

This suggests an out-of-range error, which is consistent with a vocab size mismatch. I suspect that the poly_encoder's smaller vocab size of 30522 is causing the failure when handling inputs processed with a larger 50265 vocab.

Any insights into why this mismatch is occurring and how it can be resolved would be greatly appreciated. I am happy to provide further information if required.
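The symptom is consistent with an embedding lookup receiving token ids outside its table: ids produced by a 50265-entry tokenizer are indexed into a 30522-row embedding matrix. A small NumPy sketch of the failure mode and an early guard (the sizes mirror the issue above; `check_token_ids` is a hypothetical helper, not part of this repository):

```python
import numpy as np

SMALL_VOCAB = 30522  # e.g. a BERT-style vocabulary
LARGE_VOCAB = 50265  # e.g. a BART/RoBERTa-style vocabulary

def check_token_ids(ids, vocab_size):
    """Raise early with a clear message instead of a CUDA device-side assert."""
    if ids.max() >= vocab_size:
        raise ValueError(
            f"token id {ids.max()} >= embedding vocab size {vocab_size}; "
            "the tokenizer and the encoder's embedding table do not match"
        )

embedding_table = np.zeros((SMALL_VOCAB, 8))
ids = np.array([101, 48124, 102])  # 48124 comes from the larger tokenizer

try:
    check_token_ids(ids, embedding_table.shape[0])
except ValueError as e:
    print("caught:", e)
```

A guard like this localises the problem on CPU; the usual resolution is to tokenize the poly-encoder's inputs with the tokenizer that matches its own pretrained weights, rather than reusing ids from the generator's tokenizer.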
