
Fill-the-GAP

This is the 4th-place solution to the Gendered Pronoun Resolution competition on Kaggle.

Solution Overview

1. Input Dropout

I have played with BERT on other tasks and found that the BERT vectors contain some redundancy: even when we use only a small portion (around 50%) of the BERT vector, we can still achieve desirable performance.

Based on this observation, I place a dropout layer with a large rate directly after the input layer. This can be seen as a kind of model boosting: it is like training several prototypes on subsets randomly sampled from the BERT vector.
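A minimal sketch of this input dropout, in numpy with inverted-dropout scaling. The rate of 0.6 is illustrative, not the repository's exact setting:

```python
import numpy as np

def input_dropout(bert_vec, rate=0.6, training=True, rng=None):
    """Drop a large fraction of the BERT feature dimensions at train time.

    Each forward pass effectively trains on a random subset of the BERT
    vector, which acts like boosting over feature-subset prototypes.
    """
    if not training:
        return bert_vec  # no-op at inference time
    rng = rng or np.random.default_rng(0)
    mask = rng.random(bert_vec.shape) >= rate       # keep ~(1 - rate) of dims
    return bert_vec * mask / (1.0 - rate)           # inverted dropout scaling

# toy "BERT vector": batch of 2, feature dim 8
x = np.ones((2, 8))
y = input_dropout(x, rate=0.5)
```

With `rate=0.5`, surviving entries are scaled by 2 so the expected activation magnitude is unchanged between training and inference.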

2. Word Encoder

As mentioned in section 1, it might not be suitable to use the BERT output directly because of this redundancy. Therefore I use a word encoder to down-project the BERT vector into a lower-dimensional space, where task-related features can be extracted efficiently.

The word encoder is a simple affine transformation with SELU activation, shared across A, B, and P. I tried designing independent word encoders for names and pronouns, and making the encoder deeper with highway transformations, but all of these resulted in overfitting.

This idea is also inspired by multi-head transformations. I implemented a multi-head NLI encoder, but it improved performance by only ~0.0005 while costing much more computation time, so a single head seems good enough for this task.
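The shared word encoder can be sketched as a single affine layer followed by SELU; the dimensions below (1024 → 64) are illustrative assumptions, not the repository's exact configuration:

```python
import numpy as np

# SELU constants (Klambauer et al., "Self-Normalizing Neural Networks")
ALPHA, SCALE = 1.6732632423543772, 1.0507009873554805

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1))

class WordEncoder:
    """One affine down-projection + SELU, shared by A, B, and P."""

    def __init__(self, d_in=1024, d_out=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_out))
        self.b = np.zeros(d_out)

    def __call__(self, x):
        return selu(x @ self.W + self.b)

enc = WordEncoder()
a, b, p = (np.ones((1, 1024)) for _ in range(3))
# the SAME encoder instance is applied to all three mentions
ea, eb, ep = enc(a), enc(b), enc(p)
```

Sharing one encoder keeps the parameter count small, which matches the observation that separate or deeper encoders overfit on this dataset.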

3. Answer selection using NLI architectures

I consider this task a sub-task of answer selection. Given queries A and B and an answer P, we can model the relations between queries and the answer with a heuristic interaction:

I(Q, A) = [[Q; A], Q - A, Q * A]

and then extract features from the interaction vector I(Q, A) with a siamese encoder. The overall architecture would be like this:

[figure: overall model architecture]
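The heuristic interaction above can be sketched directly; the feature layout (concatenation of the two vectors, their difference, and their element-wise product) is the standard NLI matching trick:

```python
import numpy as np

def interact(q, a):
    """Heuristic interaction I(Q, A) = [[Q; A], Q - A, Q * A].

    Concatenates the raw vectors with their difference and element-wise
    product, giving a siamese encoder explicit match/mismatch features.
    """
    return np.concatenate([q, a, q - a, q * a], axis=-1)

q = np.array([[1.0, 2.0]])   # toy query vector (e.g. encoded name A)
a = np.array([[3.0, 4.0]])   # toy answer vector (e.g. encoded pronoun P)
i_qa = interact(q, a)        # shape (1, 8): [1, 2, 3, 4, -2, -2, 3, 8]
```

The same `interact` function is applied to (A, P) and (B, P), and a siamese encoder then extracts features from both interaction vectors.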

Finally, here is a simple performance report of my models:

| Model                                | 5-fold CV on Stage 1 |
|--------------------------------------|----------------------|
| Base BERT                            | 0.50                 |
| Base BERT + input dropout            | 0.45                 |
| Base BERT + input dropout + NLI      | 0.43                 |
| Base BERT + all                      | 0.39                 |
| Large BERT + input dropout           | 0.39                 |
| Large BERT + all                     | 0.32                 |
| Ensemble of Base BERT and Large BERT | 0.30                 |

Note

The code is still being cleaned up, and some dirty workarounds remain as a trade-off between efficiency and scalability. For notebook stages 0.1 ~ 0.6, it is not necessary to use a for loop to dump features from each layer: the official API supports dumping all of them at the same time.

Citation

If you find this repository useful for your research, please cite our paper:

@inproceedings{yang2019fill,
  title={Fill the GAP: Exploiting BERT for Pronoun Resolution},
  author={Yang, Kai-Chou and Niven, Timothy and Chou, Tzu Hsuan and Kao, Hung-Yu},
  booktitle={Proceedings of the First Workshop on Gender Bias in Natural Language Processing},
  pages={102--106},
  year={2019}
}
