
A POS tagger trained on a UD treebank by fine-tuning a BERT model

The task of this work is to develop a part-of-speech (POS) tagger for the English portion of the Universal Dependencies treebanks by fine-tuning a pre-trained BERT model, using Keras and the TensorFlow Hub module.

Transformer Models

Transfer learning takes a different approach from feature-based learning: a neural network model is pre-trained on a known task and then fine-tuned, i.e. the trained network serves as the basis of a new purpose-specific model. This approach was first known in the field of computer vision, but it is also useful in many natural language tasks. The Transformer, a simple network architecture based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, was proposed in the paper “Attention Is All You Need”.

BERT makes use of the Transformer to learn contextual representations of words (or sub-words), which can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks.

Alignment (token -> tag)

According to the paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, for the Named Entity Recognition task (Section 4.3):

For fine-tuning, we feed the final hidden representation Ti ∈ RH for each token i into a classification layer over the NER label set. The predictions are not conditioned on the surrounding predictions (i.e., non-autoregressive and no CRF). To make this compatible with WordPiece tokenization, we feed each CoNLL-tokenized input word into our WordPiece tokenizer and use the hidden state corresponding to the first sub-token as input to the classifier. For example:

Jim   Hen   ##son was a puppet ##eer
I-PER I-PER X     O   O O      X

Where no prediction is made for X. Since the WordPiece tokenization boundaries are a known part of the input, this is done for both training and test.

So for our POS tagging task, because of the WordPiece tokenizer, we must take special care to align each token to its tag correctly. For this we keep an original-to-tokenized map, which can then be used to project labels onto the tokenized representation. This is done in the function convert_single_example(), and we can see how it works with an example:

Original tokens: ['creative', 'commons', 'makes', 'no', 'warranties', 'regarding', 'the', 'information', 'provided', ',', 'and', 'disclaims', 'liability', 'for', 'damages', 'resulting', 'from', 'its', 'use', '.']
BERT tokens: ['[CLS]', 'creative', 'commons', 'makes', 'no', 'warrant', '##ies', 'regarding', 'the', 'information', 'provided', ',', 'and', 'disc', '##lai', '##ms', 'liability', 'for', 'damages', 'resulting', 'from', 'its', 'use', '.', '[SEP]']
labels: ['-PAD-', 'PROPN', 'PROPN', 'VERB', 'DET', 'NOUN', 'VERB', 'DET', 'NOUN', 'VERB', 'PUNCT', 'CCONJ', 'VERB', 'NOUN', 'ADP', 'NOUN', 'VERB', 'ADP', 'DET', 'NOUN', 'PUNCT', '-PAD-']
orig_to_tok_map: [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]

Here we have 2 tokens that are split into sub-tokens by the WordPiece tokenizer:

'warranties' -> 'warrant', '##ies'
'disclaims' -> 'disc', '##lai', '##ms'

This is where our alignment code keeps the appropriate sub-token. For example, the original mapping 'warranties' -> 'NOUN' becomes, after alignment, '##ies' -> 'NOUN'. We also tested the other possible alignment (the one proposed in run_classifier.py), 'warrant' -> 'NOUN', but got worse results! So we use the hidden state corresponding to the last sub-token as input to the classifier.
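
A minimal sketch of how such a map can be built (function and variable names are ours, not necessarily the repo's), assuming a BERT FullTokenizer is already loaded; for each original token it records the index of its last sub-token:

def align_tokens(orig_tokens, tokenizer):
    """Tokenize with WordPiece and map each original token to the
    index of its LAST sub-token (the variant that worked best for us)."""
    bert_tokens = ["[CLS]"]
    orig_to_tok_map = [0]                              # position of [CLS]
    for token in orig_tokens:
        sub_tokens = tokenizer.tokenize(token)         # e.g. 'warranties' -> ['warrant', '##ies']
        bert_tokens.extend(sub_tokens)
        orig_to_tok_map.append(len(bert_tokens) - 1)   # keep the last sub-token
    bert_tokens.append("[SEP]")
    orig_to_tok_map.append(len(bert_tokens) - 1)       # position of [SEP]
    return bert_tokens, orig_to_tok_map

Running this on the sentence above reproduces exactly the orig_to_tok_map shown: 'warranties' maps to position 6 ('##ies') and 'disclaims' to position 15 ('##ms').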

BERT Representation

We use the appropriate functions to convert our dataset to the BERT input features input_ids, input_masks and segment_ids. Then we one-hot encode the labels.
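
A hedged sketch of this conversion (illustrative names, not the repo's exact code), assuming the bert_tokens produced by the alignment step and a tokenizer with convert_tokens_to_ids():

import numpy as np

MAX_SEQ_LENGTH = 128

def to_bert_features(bert_tokens, tokenizer):
    input_ids = tokenizer.convert_tokens_to_ids(bert_tokens)
    input_mask = [1] * len(input_ids)        # 1 for real tokens, 0 for padding
    segment_ids = [0] * len(input_ids)       # single-sentence task: all zeros
    pad = MAX_SEQ_LENGTH - len(input_ids)    # zero-pad up to the fixed length
    return input_ids + [0] * pad, input_mask + [0] * pad, segment_ids + [0] * pad

# One-hot encoding of tags; '-PAD-' covers [CLS], [SEP] and padding positions
tags = ['-PAD-', 'ADJ', 'ADP', 'CCONJ', 'DET', 'NOUN', 'PROPN', 'PUNCT', 'VERB']  # illustrative subset
tag2idx = {t: i for i, t in enumerate(tags)}

def one_hot_labels(labels):
    y = np.zeros((MAX_SEQ_LENGTH, len(tags)), dtype=np.float32)
    for i, label in enumerate(labels):
        y[i, tag2idx[label]] = 1.0
    y[len(labels):, tag2idx['-PAD-']] = 1.0  # pad the remaining positions with '-PAD-'
    return y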

Model Architecture
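
The original README shows the architecture as an image. As a rough sketch of a comparable Keras model, assuming the TF2 BERT module from TensorFlow Hub (the module URL, tag-set size and learning rate below are assumptions, not the repo's recorded settings; MAX_SEQ_LENGTH is reused from the sketch above):

import tensorflow as tf
import tensorflow_hub as hub

NUM_TAGS = 18   # 17 UD POS tags + '-PAD-'

input_ids   = tf.keras.layers.Input(shape=(MAX_SEQ_LENGTH,), dtype=tf.int32, name="input_ids")
input_mask  = tf.keras.layers.Input(shape=(MAX_SEQ_LENGTH,), dtype=tf.int32, name="input_mask")
segment_ids = tf.keras.layers.Input(shape=(MAX_SEQ_LENGTH,), dtype=tf.int32, name="segment_ids")

bert_layer = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
    trainable=True)                                   # fine-tune all BERT weights
pooled_output, sequence_output = bert_layer([input_ids, input_mask, segment_ids])

# One additional output layer: a per-token softmax over the POS tag set
tag_probs = tf.keras.layers.Dense(NUM_TAGS, activation="softmax")(sequence_output)

model = tf.keras.Model(inputs=[input_ids, input_mask, segment_ids], outputs=tag_probs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # typical BERT fine-tuning rate
              loss="categorical_crossentropy", metrics=["accuracy"])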

Confusion matrix

Tag an unknown sentence

Let's tag an unknown sentence: 'Word embeddings provide a dense representation of words and their relative meanings.':

Word in BERT layer  | Initial word   : Predicted POS-tag
-------------------------------------------------------------
word                | word           : NOUN           
##s                 | embeddings     : NOUN           
provide             | provide        : VERB           
a                   | a              : DET            
dense               | dense          : ADJ            
representation      | representation : NOUN           
of                  | of             : ADP            
words               | words          : NOUN           
and                 | and            : CCONJ          
their               | their          : DET            
relative            | relative       : ADJ            
meanings            | meanings       : NOUN           
.                   | .              : PUNCT          

Here we can see how the tokenization and our alignment code work, and how good the prediction is, even for the word 'embeddings', which was unknown and very difficult for the other models to tag correctly!
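
A sketch of the inference step that could produce output like the table above, reusing the hypothetical align_tokens(), to_bert_features(), tags and model from the earlier sketches:

import numpy as np

sentence = "Word embeddings provide a dense representation of words and their relative meanings."
# naive pre-tokenization for the sketch; the repo's own pre-processing may differ
orig_tokens = sentence.lower().replace(".", " .").split()

bert_tokens, orig_to_tok_map = align_tokens(orig_tokens, tokenizer)
input_ids, input_mask, segment_ids = to_bert_features(bert_tokens, tokenizer)

probs = model.predict([np.array([input_ids]),
                       np.array([input_mask]),
                       np.array([segment_ids])])[0]   # shape: (MAX_SEQ_LENGTH, NUM_TAGS)

# read each word's prediction at its last sub-token, skipping [CLS] and [SEP]
for word, pos in zip(orig_tokens, orig_to_tok_map[1:-1]):
    print(f"{bert_tokens[pos]:20s}| {word:15s}: {tags[probs[pos].argmax()]}")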

Acknowledgement

The Natural Language Processing course is part of the MSc in Computer Science of the Department of Informatics, Athens University of Economics and Business. The course covers algorithms, models and systems that allow computers to process natural language texts and/or speech.
