
light_chatbot's Introduction

How to use this repo

To download and format the dataset, run python light_dataset.py. To train a network, run python dialogue.py. Once trained, you can deploy it online with python deploy_anvil.py. I've done some initial tests using the ONNX format to speed up the models, in test_onnxx_speedup.py, but the improvements were not impressive.
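Assuming the scripts behave as described above, the end-to-end workflow is three commands run in order:

```shell
# 1. Download and format the LIGHT dataset
python light_dataset.py

# 2. Train the dialogue network
python dialogue.py

# 3. Deploy the trained model online (via Anvil)
python deploy_anvil.py
```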

The content of this repo

I wanted a chatbot with a fixed personality that plays a role in a fantasy game. So I thought of the LIGHT dataset, which is actually a series of text datasets where different characters, placed in different scenarios, chit-chat, express emotes, and take actions.

The architecture that I use is very similar to the standard Transformer, but the encoder is used to encode several inputs at once. It's based on the architecture proposed in the Wizard of Wikipedia (WoW) paper, where a chatbot attends to Wikipedia articles to give a knowledge-grounded reply. In the case of this chatbot with a fixed personality, the idea is to have the encoder attend to the persona of the character, the context of the interaction, and the history of the dialogue, which together form the knowledge that grounds the reply, while the decoder only attends to the reply being generated.

This solves a few problems. First, the persona, the context, and the history are fixed and cannot be influenced by a clever prompt. This is in contrast to setups where, for example, the persona is given as an initial prompt, making it easy to ask the chatbot to simply forget that persona. Moreover, since the persona, context, and history pass through the encoder, their representations can easily be precomputed, so the encoder can ideally be evaluated far fewer times than if all that text had to be passed to the decoder, as one would do if the chatbot were GPT-based. Additionally, this type of model is probably more efficient than relying on a huge GPT-4-style model that has to be trained to know everything about the world.
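To make the precomputation idea concrete, here is a minimal PyTorch sketch (hypothetical shapes and names, not the repo's actual code): the persona/context/history tokens go through the encoder once, and the cached encoder output grounds every subsequent decoder call, so only the reply-side computation is repeated.

```python
import torch
import torch.nn as nn

d_model, vocab = 64, 100
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)

# Persona + context + history token ids (stand-ins), encoded ONCE.
grounding = torch.randint(0, vocab, (1, 32))
memory = encoder(embed(grounding))  # precomputed and cached

# Each decoding call cross-attends to the cached memory instead of
# re-processing the grounding text, as a GPT-style decoder would.
reply_so_far = torch.randint(0, vocab, (1, 5))
out = decoder(embed(reply_so_far), memory)
print(out.shape)  # one hidden state per reply token so far
```

The design choice the sketch illustrates: the grounding text lives only on the encoder side, so it can neither be overwritten by a user prompt nor re-encoded at every generation step.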

Tricks that I would apply in another iteration

  • A non-autoregressive decoder to speed up generation, since ideally the decoder would then be evaluated only once to produce a sentence, instead of looping over each word as one normally does;
  • Pretraining the encoder of the WoW architecture from BERT and the decoder from GPT-2, since that would drastically enrich the language of the architecture, before fine-tuning on the LIGHT datasets;
  • Distilling into a smaller model, since one often gets better language by training a big model and then distilling it than by training a small model directly. The small model would then be faster at inference time.
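The distillation step in the last bullet can be sketched as follows (a generic knowledge-distillation loss, not code from this repo): the small student is trained to match the temperature-softened output distribution of the large teacher.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, T = 100, 2.0                      # vocabulary size, softmax temperature
teacher_logits = torch.randn(8, vocab)   # stand-in for the big model's outputs
student_logits = torch.randn(8, vocab, requires_grad=True)

# KL divergence between the softened teacher and student distributions;
# the T*T factor keeps gradient magnitudes comparable across temperatures.
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T
loss.backward()
```

In practice `student_logits` would come from the small model's forward pass, and this loss would be mixed with the usual cross-entropy on the LIGHT targets.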

light_chatbot's People

Watchers

Luca Celotti
