Nucleus

This is the repository of team Hooli, composed of Yiming Sun (ys3031), Minghao Li (ml4025), Yihan Lin (yl3820) and Yihao Li (yl3744).

Nucleus is a question-answering AI. Give Nucleus a passage and some related questions, and you will get an answer shortly.

The key part of Nucleus is BERT, a fast and accurate deep neural network that can answer your question if you provide the question and a related context.

Nucleus has two modes: context-related and context-free. The context-related mode is the simpler one and relies on BERT directly: you provide a context and a question based on that context, and Nucleus returns the answer.

The context-free mode is more interesting. At the very beginning we only planned the context-related mode. After we finished it, however, we decided to challenge ourselves, and the context-free mode is the result.

In context-free mode, you don't need to provide a context; we find one for you. Nucleus calls several APIs, including the Wikipedia API and rake_nltk, to search for the Wikipedia pages most likely to contain the answer.

If you have any questions during the installation or operation of Nucleus, please feel free to open an issue.

Get Started

Before you start, create a Python 3.6 virtual environment; we recommend virtualenv. Then install all the packages listed in requirements.txt:

pip install -r requirements.txt

Before you fully launch Nucleus, you need two more things:

Configure AWS

You need to set six AWS credentials and put them in a file named config.py in the root directory:

cognito_userpool_id = <your_userpool_id>
cognito_app_client_id = <your_config_id>
database_user_name = <your_database_username>
database_endpoint = <your_database_endpoint>
port = <your_endpoint_number>
database_pwd = <your_database_password>
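A missing or empty entry in config.py is easy to overlook, so a small check before launching can help. The sketch below is not part of Nucleus; the helper name missing_config_keys is hypothetical, and it only assumes that config.py is a plain Python file defining the six names listed above.

```python
import importlib.util

# The six names that config.py is expected to define (taken from the list above).
REQUIRED_KEYS = (
    "cognito_userpool_id",
    "cognito_app_client_id",
    "database_user_name",
    "database_endpoint",
    "port",
    "database_pwd",
)


def missing_config_keys(config_path):
    """Return the required names that the file at config_path leaves undefined or empty.

    Hypothetical helper: it loads the file as a Python module and inspects it.
    """
    spec = importlib.util.spec_from_file_location("nucleus_config", str(config_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # A name counts as missing if it is absent or has a falsy value ("" or None).
    return [key for key in REQUIRED_KEYS if not getattr(module, key, None)]


if __name__ == "__main__":
    missing = missing_config_keys("config.py")
    if missing:
        print("config.py is missing:", ", ".join(missing))
    else:
        print("config.py defines all six values.")
```

Run it from the root directory after creating config.py; an empty result means all six values are present.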

Find model

Download the model from https://1drv.ms/f/s!AtfKeiTxgnoqjt0M3lrLoowcsjbKcA, name the whole directory model_data, and put it in <root>/models/bert. Please note that the r_net mode is now deprecated. You can still try it if you want to, or if you only have limited computational resources.

If you cannot download the model, please contact us at [email protected]

Test cases

To run the test cases, go into the ./test folder and run the three files one at a time to test the bert mode and the database methods.

Launch Nucleus

To launch Nucleus, simply run:

python application.py

then open your browser and visit http://127.0.0.1:5000. Please make sure you are not running any other web app on port 5000.
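If you are unsure whether port 5000 is already taken, a quick check along these lines may help. This is only a sketch using the standard socket module; the function name port_in_use is hypothetical and not part of Nucleus.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(5000):
        print("Port 5000 is busy; stop the other app before launching Nucleus.")
    else:
        print("Port 5000 is free.")
```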

Required packages:

pre-commit
spacy
tensorflow
tqdm
ujson
flask
warrant
wikipedia
nltk
rake-nltk
mysql-connector-python

CI Configurations and Test Results

All the results and files required by the professor, including the pre-commit and post-commit configs, unit test reports, and bug-finder reports, are in the result folder. These files should be treated as read-only.

Implementation Detail

The basic workflow of our context-free mode is:

  1. a user submits a question at the frontend;
  2. Nucleus' backend extracts keywords from the question with the rake_nltk API;
  3. these keywords are sent to the Wikipedia API, which returns the pages for these keywords;
  4. we split these pages into a list of paragraphs, each of which is about 700 characters long;
  5. we feed the list of paragraphs (as contexts) and the question into the BERT model, and the model returns an answer and a confidence for each question-context pair;
  6. we select the answer with the highest confidence and return it to the user.
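Steps 4 to 6 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in this repository: the function names are hypothetical, splitting on sentence boundaries is our assumption about how the roughly 700-character paragraphs could be formed, and answer_fn stands in for the BERT model (any callable that takes a question and a context and returns an (answer, confidence) pair).

```python
import re


def split_into_paragraphs(text, max_chars=700):
    """Split text into chunks of at most max_chars characters.

    Assumption: chunks are built from whole sentences where possible.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than max_chars is cut into hard slices.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def answer_question(question, pages, answer_fn, max_chars=700):
    """Run answer_fn on every (question, context) pair and keep the most confident answer.

    pages is a list of page texts (step 3's output); answer_fn is a stand-in
    for the model and must return an (answer, confidence) tuple.
    """
    contexts = [
        chunk for page in pages for chunk in split_into_paragraphs(page, max_chars)
    ]
    if not contexts:
        return None
    scored = [answer_fn(question, context) for context in contexts]
    return max(scored, key=lambda pair: pair[1])
```

Passing the model in as a callable keeps the chunking and selection logic testable without loading BERT; in a real run, answer_fn would wrap the model's prediction call.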

Reference

https://github.com/google-research/bert
https://github.com/HKUST-KnowComp/R-Net
https://github.com/tensorflow/tensorflow
https://github.com/pallets/flask
https://github.com/goldsmith/Wikipedia
https://github.com/capless/warrant
https://github.com/csurfer/rake-nltk
