BERT-Question-Answering

Introduction

The purpose of this project was to fine-tune and cross-evaluate the bert-base-uncased model on the following five Question Answering datasets:

  1. SQuAD 2.0
  2. TriviaQA
  3. NQ
  4. QuAC
  5. NewsQA

More specifically, I first fine-tuned and evaluated the BERT model on the SQuAD 2.0 dataset, then repeated the process on the other four datasets, and finally cross-evaluated every model against every dataset by calculating the corresponding F1 scores.

In other words, the task was to reproduce Table 3 of the paper What do Models Learn from Question Answering Datasets?.

This project is a university assignment for the course Deep Learning for Natural Language Processing.

How I worked

  • First of all, I fine-tuned bert-base-uncased on SQuAD 2.0, as described in this notebook: Models/qa-with-squad-2.0-dataset-and-bert-base.ipynb
  • Then, using the notebook Evaluation Scripts/BERT_QA_with_SQuAD_2.0_evaluation.ipynb, I evaluated the model on paragraphs and questions of my own that were most likely not contained in the SQuAD dataset.
  • After that, I converted the other 4 datasets described in the paper into SQuAD format and fine-tuned BERT on each of them, using the same hyperparameters that gave the best score on the SQuAD 2.0 model (as the authors of the paper did). The code for the corresponding models can be found in the Models folder.
  • Finally, after fine-tuning all 5 models, I performed the cross-evaluation described in Table 3 of the paper. The notebook with which I calculated the scores is Evaluation Scripts/QA_cross-evaluation.ipynb.
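For the conversion step, the SQuAD 2.0 JSON layout nests answers inside qas, inside paragraphs, inside data; a minimal sketch of wrapping a single QA pair in that structure (the function and variable names are my own illustration, not taken from the notebooks):

```python
def to_squad_format(qid, question, context, answer_text, answer_start, title="doc"):
    """Wrap one QA pair in the nested SQuAD 2.0 JSON structure.

    An unanswerable question (answer_text is None) gets is_impossible=True
    and an empty answer list, as SQuAD 2.0 requires.
    """
    is_impossible = answer_text is None
    qa = {
        "id": qid,
        "question": question,
        "is_impossible": is_impossible,
        "answers": [] if is_impossible else [
            {"text": answer_text, "answer_start": answer_start}
        ],
    }
    return {
        "version": "v2.0",
        "data": [{"title": title, "paragraphs": [{"context": context, "qas": [qa]}]}],
    }
```

The real conversion scripts also have to map each source dataset's answer annotations onto character offsets in the context, which is where most of the per-dataset work lies.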

Results

The equivalent of the paper's table, with the F1 scores I obtained, is the following:

Fine-tuned on \ Evaluated on   SQuAD    TriviaQA   NQ       QuAC     NewsQA
SQuAD                          77.71%   34.67%     51.39%   14.45%   44.28%
TriviaQA                       40.66%   41.94%     37.60%    8.93%   28.83%
NQ                             60.02%   32.41%     59.17%   14.10%   35.77%
QuAC                           31.46%   16.65%     31.06%   27.53%   24.25%
NewsQA                         64.23%   31.92%     48.44%   15.08%   54.16%
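The numbers above are token-overlap F1 scores in the style of the official SQuAD evaluation script (which additionally lowercases, strips punctuation and articles before comparing); a simplified sketch of the metric:

```python
from collections import Counter

def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string.

    Simplified: the official SQuAD script also normalizes punctuation
    and articles before splitting into tokens.
    """
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    # Both empty (e.g. both say "no answer") counts as a perfect match.
    if not pred_tokens or not gt_tokens:
        return float(pred_tokens == gt_tokens)
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_score("the cat sat", "the cat")` has precision 2/3 and recall 1, giving F1 = 0.8.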

Hyperparameters used

The hyperparameters used are the following:

  • Batch Size = 8
  • Learning Rate = 1e-5
  • Epochs = 3
  • Max Token Sequence Length = 512

The training was done on Kaggle with an NVIDIA Tesla P100 GPU.
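In Hugging Face Transformers terms, the settings above correspond roughly to the following configuration (a sketch under the assumption that the notebooks use the Trainer API; the output directory name is hypothetical, and the maximum sequence length is applied at tokenization time rather than as a trainer argument):

```python
from transformers import TrainingArguments

# Assumed mapping of the hyperparameters listed above onto Trainer arguments.
training_args = TrainingArguments(
    output_dir="bert-base-uncased-squad2",  # hypothetical name
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    num_train_epochs=3,
)

# Max Token Sequence Length = 512 is enforced when encoding the inputs, e.g.:
# tokenizer(question, context, max_length=512, truncation="only_second")
```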

Difficulties I faced during development

Conclusion

As the table shows, all 5 models (QuAC less so :/) generalize reasonably well across different datasets. The scores are similar to, though somewhat lower than, those in the paper, but personally I'm happy with the result. Possible causes of the lower scores are that answers sometimes get truncated during tokenization (mainly in the TriviaQA dataset), and that the models have not learned to correctly handle questions that have no answer.
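The truncation issue mentioned above is usually mitigated with a sliding window over the context (the `doc_stride` mechanism in Hugging Face's question-answering examples), so that an answer cut off at one window's edge reappears intact in the next window; the idea in plain Python:

```python
def chunk_with_stride(tokens, max_len, stride):
    """Split a token sequence into overlapping windows.

    Consecutive windows overlap by `stride` tokens, so any answer span
    shorter than `stride` that is truncated at the end of one window
    appears whole in the following window.
    """
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += max_len - stride
    return chunks
```

For example, `chunk_with_stride(list(range(10)), max_len=4, stride=2)` yields four windows: `[0..3]`, `[2..5]`, `[4..7]`, `[6..9]`.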

Large files

Some large files that are too big to upload to GitHub are available here:

Contributors

mediabilly
