tulo-chatbot's Introduction

Tulo-Chatbot

This is an ML (machine learning) based, intent (category) specific conversational bot with the following features:

  1. It is domain agnostic. Provide it with the right training data and it should work out of the box (to demonstrate a use case, the bundled training data is for the banking domain).
  2. It is designed to classify each incoming query into a category.
  3. If a query cannot be classified, it is stored for later training.
  4. It is REST API driven (via Flask). It can also be extended to messaging platforms such as Slack, Skype, WhatsApp, WeChat and Telegram (a Telegram prototype is implemented. See below for reference).
  5. APIs require authentication and authorization.
  6. Models can be trained and retrained on the fly.
  7. Everything (model creation, training, querying) is database driven (MongoDB, with a Redis server for caching).
  8. Supports multilingual training.
  9. Supports trainable expletive query management.
  10. Ready for deployment on Heroku out of the box (more on this later).
  11. Model selection and vector selection can be extended with custom implementations.
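
To make features 2 and 3 concrete, here is a minimal, standard-library sketch of intent classification with a fallback for unclassifiable queries. It is not the project's implementation: the bag-of-words centroids, the cosine score, the `threshold` value and every name in it (`IntentClassifier`, `unclassified`) are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IntentClassifier:
    """Hypothetical classifier: one bag-of-words centroid per category."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold   # assumed cut-off, tune on real data
        self.centroids = {}          # category -> Counter of token counts
        self.unclassified = []       # stand-in for the "store for later training" step

    def train(self, samples):
        """Rebuild the centroids from (query, category) pairs."""
        self.centroids = {}
        for query, category in samples:
            self.centroids.setdefault(category, Counter()).update(tokenize(query))

    def classify(self, query):
        """Return the best category, or None after recording the query."""
        vector = Counter(tokenize(query))
        best_category, best_score = None, 0.0
        for category, centroid in self.centroids.items():
            score = cosine(vector, centroid)
            if score > best_score:
                best_category, best_score = category, score
        if best_score < self.threshold:
            self.unclassified.append(query)
            return None
        return best_category


clf = IntentClassifier()
clf.train([
    ("show my balance", "balance"),
    ("what is my account balance", "balance"),
    ("block my card", "card"),
    ("I lost my credit card", "card"),
])
print(clf.classify("Can you show my balance?"))
print(clf.classify("weather tomorrow"), clf.unclassified)
```

In the real project the unclassified queries would go to the database rather than an in-memory list, so that a later retrain can pick them up.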

In Pipeline (major upgrades) (Star it, watch it, contribute actively to it!)

  1. Each user account can support bots in multiple projects, each in multiple languages.
  2. Support for follow-up queries and for custom variables in input and output.
  3. Small talk support (this was part of version 1, but it broke during the redesign). Move to spaCy for small talk.
  4. Improve classification accuracy with a normalizer (for spelling mistakes) and NLTK preprocessing (lemmatization and stemming).
  5. Add GUI for improved user experience (will mostly be a separate project)
  6. Other recommendations about project structure, deployment best practices etc...

Tech/ Infra Stack

Python 3.6+
MongoDB
Heroku (for deployment)
Redis (for decentralized caching)

Actors and Systems

Users -> Brokers -> Language -> TrainedClassifier

  1. User - the person who creates the chatbot. This bot can be deployed as a "Bot as a Service".
  2. Brokers - projects under which chatbots are created. A user can create multiple brokers (Bank Bot, HR Bot, Restaurant Bot).
  3. Language - under each broker, a chatbot can handle multiple languages, with one classifier trained per language. The language is passed as an input parameter.
  4. TrainedClassifier - the trained model for a given language. Refer to the REQUEST objects below for how to make multilingual queries.
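
The hierarchy above can be pictured as nested records. The following sketch uses plain dataclasses. The class and field names are illustrative assumptions and do not mirror the project's actual `db_model` classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TrainedClassifier:
    lang: str
    model_blob: bytes = b""  # placeholder for the stored, serialized model


@dataclass
class Broker:
    name: str
    default_lang: str
    classifiers: Dict[str, TrainedClassifier] = field(default_factory=dict)

    def classifier_for(self, lang: Optional[str] = None) -> TrainedClassifier:
        """Pick the classifier for `lang`, falling back to the broker default."""
        return self.classifiers[lang or self.default_lang]


@dataclass
class User:
    email: str
    brokers: Dict[str, Broker] = field(default_factory=dict)


# One user, one broker (project), two languages with one classifier each.
bank_bot = Broker(name="Bank Bot", default_lang="en-US")
bank_bot.classifiers["en-US"] = TrainedClassifier(lang="en-US")
bank_bot.classifiers["fr-FR"] = TrainedClassifier(lang="fr-FR")
user = User(email="user@example.com", brokers={"bank": bank_bot})

print(user.brokers["bank"].classifier_for("fr-FR").lang)
print(user.brokers["bank"].classifier_for().lang)
```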

Project Structure

modules
    |
    |__ controllers
    |__ data
    |    |__ dao
    |    |__ db_model
    |    |__ dto
    |__ nlp_engine
    |    |__ classifier_instance
    |    |__ model_builder
    |    |__ model_selection
    |    |__ vector_selection
    |__ saved_models *not used anymore*
    |__ services
    |__ utils
  1. controllers - endpoints exposed for Flask and Telegram (and, going forward, for any other endpoint as well)
  2. data
       -> dao - DAOs for MongoDB
       -> db_model - all the models required by the project reside here
       -> dto - mostly response objects
  3. nlp_engine
       -> classifier_instance - trained instance of a model, which is pickled and stored in the database after training
       -> model_builder - training classes
       -> model_selection - models used for classification. Extend your models here
       -> vector_selection - vector implementations for bag-of-words models
  4. services - intermediate layer between controllers and DAOs, plus any other additional requirements
  5. utils - miscellaneous methods
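
As a rough illustration of the "pickled and stored in database" step for classifier_instance, the sketch below serializes a stand-in model to bytes and restores it. The dict `store` stands in for a MongoDB collection, and `save_model`, `load_model` and the `(broker_id, lang)` key are hypothetical names, not the project's API.

```python
import pickle


def save_model(store, broker_id, lang, model):
    """Serialize the model and file it under (broker_id, lang)."""
    store[(broker_id, lang)] = pickle.dumps(model)


def load_model(store, broker_id, lang):
    """Restore a model. Only unpickle data from a source you trust."""
    return pickle.loads(store[(broker_id, lang)])


# Stand-in "model": a keyword -> category table. A real trained classifier
# object would be pickled in the same way.
model = {"balance": "account_balance", "card": "card_services"}

store = {}  # stand-in for a database collection
save_model(store, "5d9e1f9d6ecaa9720db58964", "en-US", model)
restored = load_model(store, "5d9e1f9d6ecaa9720db58964", "en-US")
print(restored == model)
```

Keying the blob by broker and language matches the one-classifier-per-language design described earlier, and it lets a retrain overwrite exactly one entry.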

Setup and installation -

  1. Set up MongoDB. The sample data is available in the docs -> db folder. Create a database called tulo_bot and load everything into it.
  2. Ensure the modules listed in requirements.txt are installed (TODO: make installation of all requirements script based).
  3. Set up Redis.
  4. Provide the appropriate URLs and credentials for MongoDB and Redis in config.yaml.

Run Flask API -

  1. Run flask_controller.py

APIs

AUTHENTICATION

REQUEST
URL : /authenticate
body :

{
	"email" : "[email protected]",
	"password" : "password1"
}

RESPONSE
Returns a list of available brokers (ID + default language) and an auth token.
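
A minimal client-side sketch of this call, using only the standard library. Several things here are assumptions rather than documented facts: the base URL (Flask's default development address), the POST method, the JSON content type, and that the response body is JSON. The email and password are placeholders.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: Flask's default dev address


def build_post(path, payload, base_url=BASE_URL):
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: body-carrying endpoints accept POST
    )


def authenticate(email, password, base_url=BASE_URL):
    """Send the /authenticate request and return the parsed response."""
    request = build_post("/authenticate", {"email": email, "password": password}, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request that would be sent (no network traffic happens here).
request = build_post("/authenticate", {"email": "user@example.com", "password": "password1"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `authenticate(...)` against a running server should yield the broker list and token described above. The exact field names in that response are not documented here, so inspect the returned object before relying on them.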

RETRAIN

REQUEST
URL : /retrain
body :

{
	"token": "<<use token generated from login here>>",
	"broker_id" : "5d9e1f9d6ecaa9720db58964",
	"lang" : "en-US"
}

QUERY

REQUEST
URL : /query
body :

{
	"token": "<<use token generated from login here>>",
	"broker_id" : "5d9e1f9d6ecaa9720db58964",
	"lang" : "en-US",
	"query" : "Can you show my balance?"
}
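
Since one broker can hold a classifier per language, a multilingual query differs only in the `lang` field. The helper below is a hypothetical convenience, not part of the project. The field names come from the request body above, while the `fr-FR` code assumes a French classifier has been trained for this broker.

```python
import json


def build_query_payload(token, broker_id, query, lang="en-US"):
    """Assemble a /query body using the field names from the request above."""
    return {"token": token, "broker_id": broker_id, "lang": lang, "query": query}


broker_id = "5d9e1f9d6ecaa9720db58964"
english = build_query_payload("<token>", broker_id, "Can you show my balance?")
french = build_query_payload("<token>", broker_id, "Pouvez-vous afficher mon solde ?", lang="fr-FR")

print(json.dumps(english, indent=2))
print(json.dumps(french, indent=2, ensure_ascii=False))
```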

LOGOUT

REQUEST
URL : /logout
body :

{
	"token": "<<use token generated from login here>>"
}

For more details, refer to the wiki (https://github.com/usriva2405/tulo-chatbot/wiki).

tulo-chatbot's People

Contributors

usriva2405, usrivastava24

tulo-chatbot's Issues

Restructure project to read training data directly from MongoDB

Currently the project uses a flat file (train.csv) for training. Instead, create a dataframe from MongoDB and use it directly.

This will make it easy to retrain directly from the database and will help in extending model-specific training.

Support follow-up questions

Add support for follow-up questions: add linkages between input and output circumstances to support data flow.

Add Additional classifiers for feature extraction

Predict the following (using spaCy?) in the input query:
noun, place, animal, date, number, email, time, colour, URL.

Add an input variable automatically when system predicts any of the above in the input query.

Setup a chatbot

Prototype a basic chatbot that takes a query and predicts the following:

  • question category
  • answer category
  • answers

What email and password do we have to enter in the authenticate API?

@usriva2405 @usrivastava24 After installing the requirements, I ran flask_controller.py and tried the authenticate API in Postman, but I got a credentials error. I want to know what email and password I need to enter here.

I tried the email and password with which I created my account on MongoDB cloud, but they were not accepted.
Can you please tell me what email and password we need to enter to run the authenticate API?

Thanks

Add capability for users to create projects (called broker)

Add a project layer between users and language-specific training data.

Users --(can add multiple)--> Projects(broker) --(can support multiple)--> languages --(will contain)--> training data

Add it once the user layer and language support have been added.

Add user layer to database

This system should be moved behind an authentication layer, to enable custom predictors for logged-in users.
Add a user layer to the database.

Refactor classifier_instance

The classifier instance should not have to worry about structuring the response or saving unclassified queries to the database. Its job is only to classify and return.

Hence this logic needs to move to the chat service.

cache behaving weird

cachetools sometimes returns the correct email for a session token and at other times returns null. Need to check why this is happening, and whether we can use the built-in cachetools decorator instead.
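
One possible cause worth ruling out, though not a diagnosis: the cachetools documentation notes that its cache classes are not thread-safe and that access from multiple threads must be synchronized (its `cached` decorator accepts a `lock` argument for this). A second possibility is that each server process holds its own in-memory cache. The sketch below shows the synchronization idea with the standard library only. `TokenCache` is a hypothetical stand-in, not the project's cache.

```python
import threading
import time


class TokenCache:
    """Lock-guarded token -> email cache with a time-to-live."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._entries = {}           # token -> (email, expires_at)

    def put(self, token, email):
        with self._lock:
            self._entries[token] = (email, self._clock() + self._ttl)

    def get(self, token):
        """Return the email for a live token, or None if missing or expired."""
        with self._lock:
            entry = self._entries.get(token)
            if entry is None:
                return None
            email, expires_at = entry
            if self._clock() >= expires_at:
                del self._entries[token]
                return None
            return email


cache = TokenCache(ttl_seconds=60)
cache.put("abc123", "user@example.com")
print(cache.get("abc123"))
print(cache.get("missing"))
```

If the server runs several worker processes, an in-process cache like this one is still per process, which is a case where a shared store such as Redis fits better.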

Add support for spelling mistakes in input

Currently this library gives inconsistent predictions if there are spelling mistakes in the input query. Add a normalizer before vectorization to help with spelling mistakes.
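
A minimal sketch of such a normalizer, assuming the vocabulary is taken from the training data and using the standard library's `difflib.get_close_matches` as a simple stand-in for a proper spell checker. The function name and the `cutoff` value are illustrative assumptions.

```python
import difflib
import re


def normalize(query, vocabulary, cutoff=0.8):
    """Replace each unknown token with its closest vocabulary word, if any."""
    vocab = sorted({word.lower() for word in vocabulary})
    corrected = []
    for token in re.findall(r"[a-z0-9']+", query.lower()):
        if token in vocab:
            corrected.append(token)
            continue
        # n=1 keeps only the best candidate; cutoff rejects weak matches.
        matches = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


vocabulary = ["show", "my", "balance", "block", "card"]
print(normalize("shw my balnce", vocabulary))
```

Running the normalizer before vectorization means misspelled tokens map onto features the classifier has already seen. Tokens with no close match are passed through unchanged.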

Support input variables

Add support for variables in input

  • train a classifier(s?) to identify date, number, email, time, colour, url
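
Before training a classifier for this, a rule-based baseline can cover the most regular types. The sketch below uses regular expressions instead of a trained model, which is a deliberate substitution. The patterns are simplified assumptions and only handle email, URL, time and number.

```python
import re

# Order matters: earlier patterns are masked out before later ones run,
# so the digits inside "10:30" are not reported again as numbers.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://\S+"),
    "time": re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b"),
    "number": re.compile(r"\b\d+(?:\.\d+)?\b"),
}


def extract_variables(query):
    """Return {variable_type: [matched strings]} for the input query."""
    found = {}
    remaining = query
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(remaining)
        if matches:
            found[name] = matches
            remaining = pattern.sub(" ", remaining)
    return found


print(extract_variables("Send 250 to bob@example.com at 10:30, see https://example.com/pay"))
```

Types such as date and colour vary too much for a short pattern and are left to the trained classifier the issue proposes.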

Use NLTK/ BERT for small talk classifier

Check if we can use NLTK for the small talk module. Train a separate classifier and allow copying it to a user account as an option.
Similar to Dialogflow, the following intents could be added:

  1. About Agent, Who are you?, How old are you?, You're annoying., Answer my question, You're bad, Can you get smarter?, You're beautiful., What's your birth date?, You're boring., Who is your boss?, Are you busy?, Can you help me?, You're a chatbot., You're so clever, You're crazy, You're fired., You are funny., You are good., Are you happy?...
  2. Courtesy - thats bad, great!, No problem, thank you!, you're welcome!, well done.
  3. Emotions - ha ha ha! Wow!
  4. Hello/ Good bye - Hello, hi, hey there!, wassup? good morning, good afternoon, good evening, good night, How're you? nice to meet you. nice to see you, what's up?, nice to talk to you
  5. About user - I'm very angry right now., I'm back!, I am bored., I am busy., I can't sleep., I don't want to talk., I'm so excited., I'm going to bed., I'm good., I'm happy., Today is my birthday., I am here., I'm kidding., What do I look like?, I'd like to see you again., I just want to talk.,
  6. Confirmation - Yes, No, Cancel
  7. Other Questions/ phrases - give me a hug, i don't care, sorry. what do you mean? you're wrong.

A sample JSON response from Dialogflow is shown below:

{
  "responseId": "64907be5-b773-40d1-9027-abd34fe1ab7e-f6406966",
  "queryResult": {
    "queryText": "I don't care",
    "action": "smalltalk.dialog.i_do_not_care",
    "parameters": {},
    "allRequiredParamsPresent": true,
    "fulfillmentText": "Ok, let's not talk about it then.",
    "fulfillmentMessages": [
      {
        "text": {
          "text": [
            "Ok, let's not talk about it then."
          ]
        }
      }
    ],
    "intent": {},
    "intentDetectionConfidence": 1,
    "languageCode": "en"
  }
}

segregate small talk and expletives management

Small talk and expletives should be added as separate classifiers.
Add them under the user "@def" (default) or "@sys" (system), the same as what is used for defining default variables.
Clone a copy when a user requests the expletives/small talk module for their account, and allow them to add custom trainable queries to it.

Upgrade config file

Upgrade the config file to YAML.
Create a Python class to extract variables.
Add environment-specific structures (local, prod).
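
A sketch of what such a class might look like. It takes the already-parsed mapping (for example, the dict that PyYAML's `yaml.safe_load` returns for the file) so that it stays standard-library only. The key names, the `TULO_ENV` variable and the URLs are hypothetical and do not reflect the project's actual config.yaml.

```python
import os


class Config:
    """Read settings for one environment from a nested mapping."""

    def __init__(self, raw, env=None):
        # Hypothetical convention: TULO_ENV selects the section, default "local".
        self.env = env or os.environ.get("TULO_ENV", "local")
        if self.env not in raw:
            raise KeyError("no config section for environment %r" % self.env)
        self._values = raw[self.env]

    def get(self, dotted_key, default=None):
        """Look up a value by dotted path, e.g. 'mongo.url'."""
        node = self._values
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node


raw = {
    "local": {
        "mongo": {"url": "mongodb://localhost:27017", "db": "tulo_bot"},
        "redis": {"url": "redis://localhost:6379"},
    },
    "prod": {
        "mongo": {"url": "mongodb://db.example.com:27017", "db": "tulo_bot"},
        "redis": {"url": "redis://cache.example.com:6379"},
    },
}

config = Config(raw, env="local")
print(config.get("mongo.url"))
print(config.get("redis.password", default=None))
```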

bind auth related to bots for messaging apps to broker

The idea is for users to be able to configure bot auth for each broker. As a user with multiple projects (brokers in our case), e.g. an HR project and a bank project, each project could individually support a bot specific to that project on multiple social media platforms.

I don't know about FB, Slack, Skype or WeChat, but Telegram has an auth token which binds to a bot (assuming the rest work in a similar fashion).

So every auth token for every bot should be configured at the broker level.

Create endpoint for retraining

Create a service + Flask endpoint for retraining data. Gradually move away from saving files to the filesystem and store them in the database instead.
