alephbert-ner's Introduction

Named Entity Recognition for Hebrew using BERT

Contact: Lee Fingerhut

Installation Guide

  1. Install an environment manager. Recommended: Miniconda3. Here is a Getting Started guide.
  2. Clone the repo:
    git clone https://github.com/LeeFB/AlephBert-NER.git
    cd AlephBert-NER
  3. Create a new environment from environment.yml (you can change the environment name in the file) and activate it:
    conda env update -f environment.yml
    conda activate ner

Training

usage: ner_training.py [-h] [--seed SEED] [--name NAME] --train-file TRAIN_FILE [--max-seq-len MAX_SEQ_LEN] [--finetune]
                       [--num-epochs NUM_EPOCHS] [--batch-size BATCH_SIZE] [--learning-rate LEARNING_RATE] [--optimizer-eps OPTIMIZER_EPS]
                       [--weight-decay-rate WEIGHT_DECAY_RATE] [--max-grad-norm MAX_GRAD_NORM] [--num-warmup-steps NUM_WARMUP_STEPS]

optional arguments:
  -h, --help            show this help message and exit

general:
  --seed SEED           seed for reproducibility
  --name NAME           name of directory for product

dataset:
  --train-file TRAIN_FILE
                        path to train file
  --max-seq-len MAX_SEQ_LEN
                        maximal sequence length

training:
  --num-epochs NUM_EPOCHS
                        number of epochs to train
  --batch-size BATCH_SIZE
                        batch size

optimizer:
  --learning-rate LEARNING_RATE
                        learning rate
  --optimizer-eps OPTIMIZER_EPS
                        optimizer tolerance
  --weight-decay-rate WEIGHT_DECAY_RATE
                        optimizer weight decay rate
  --max-grad-norm MAX_GRAD_NORM
                        maximal gradients norm

scheduler:
  --num-warmup-steps NUM_WARMUP_STEPS
                        scheduler warmup steps

The BERT model is pretrained.
To train all of its parameters, run the training command without --finetune.
Example:

python ner_training.py --train-file dataset/dataset.csv --name sprml-train
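
The usage text above can be mirrored with argparse. The following is a minimal sketch, not the repository's actual parser: the argument names, groups, and help strings come from the usage text, while the types, the placement of --finetune in the training group, its help string, and the absence of defaults are assumptions, since the README does not state them.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the documented ner_training.py usage.

    Types are assumed; the README does not list default values,
    so none are set here (unset options parse as None).
    """
    parser = argparse.ArgumentParser(prog="ner_training.py")

    general = parser.add_argument_group("general")
    general.add_argument("--seed", type=int, help="seed for reproducibility")
    general.add_argument("--name", type=str, help="name of directory for product")

    dataset = parser.add_argument_group("dataset")
    dataset.add_argument("--train-file", type=str, required=True,
                         help="path to train file")
    dataset.add_argument("--max-seq-len", type=int, help="maximal sequence length")

    training = parser.add_argument_group("training")
    # Assumption: --finetune is a boolean switch; its group and help text
    # are not given in the README.
    training.add_argument("--finetune", action="store_true",
                          help="freeze the encoder and train only the classifier")
    training.add_argument("--num-epochs", type=int, help="number of epochs to train")
    training.add_argument("--batch-size", type=int, help="batch size")

    optimizer = parser.add_argument_group("optimizer")
    optimizer.add_argument("--learning-rate", type=float, help="learning rate")
    optimizer.add_argument("--optimizer-eps", type=float, help="optimizer tolerance")
    optimizer.add_argument("--weight-decay-rate", type=float,
                           help="optimizer weight decay rate")
    optimizer.add_argument("--max-grad-norm", type=float,
                           help="maximal gradients norm")

    scheduler = parser.add_argument_group("scheduler")
    scheduler.add_argument("--num-warmup-steps", type=int,
                           help="scheduler warmup steps")
    return parser


# Parse the same arguments as the training example above.
args = build_parser().parse_args(
    ["--train-file", "dataset/dataset.csv", "--name", "sprml-train"]
)
print(args.train_file, args.name, args.finetune)
```

Only --train-file is required; every other option is optional, as the square brackets in the usage line indicate.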

Fine-Tuning

The BERT model is pretrained.
You can freeze the encoder and fine-tune only the classifier by adding --finetune to the training command.
Example:

python ner_training.py --train-file dataset/dataset.csv --name sprml-finetune --finetune
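
Freezing an encoder usually means disabling gradients for its parameters so that only the classifier is updated. The sketch below illustrates the idea and is not taken from ner_training.py. It relies only on the `named_parameters()` method and the `requires_grad` attribute, both of which a PyTorch `torch.nn.Module` and its parameters provide. The "bert." prefix, the parameter names, and the `TinyModel` stand-in are hypothetical; the stand-in exists only so the example runs without PyTorch or a downloaded model.

```python
from types import SimpleNamespace


def freeze_encoder(model, encoder_prefix="bert."):
    """Disable gradients for parameters whose name starts with encoder_prefix.

    Returns the names of the parameters that remain trainable.
    The default prefix is an assumption about how the model names its encoder.
    """
    trainable = []
    for name, param in model.named_parameters():
        if name.startswith(encoder_prefix):
            param.requires_grad = False  # frozen: the optimizer will not update it
        else:
            trainable.append(name)
    return trainable


class TinyModel:
    """Hypothetical stand-in exposing named_parameters() like torch.nn.Module."""

    def __init__(self):
        self._params = {
            "bert.embeddings.weight": SimpleNamespace(requires_grad=True),
            "bert.encoder.layer0.weight": SimpleNamespace(requires_grad=True),
            "classifier.weight": SimpleNamespace(requires_grad=True),
            "classifier.bias": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        return list(self._params.items())


model = TinyModel()
print(freeze_encoder(model))
```

With a real PyTorch model, the same function could be called before building the optimizer, so that only the classifier parameters receive updates.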

Predicting

usage: ner_predict.py [-h] --checkpoint CHECKPOINT [--sentence SENTENCE]

optional arguments:
  -h, --help            show this help message and exit
  --checkpoint CHECKPOINT
                        checkpoint directory
  --sentence SENTENCE   sentence to apply NER

Predicting NER for a test sentence:

python ner_predict.py --checkpoint checkpoints/<checkpoint dir> --sentence "הרלין הכלב הלך לטייל בחוף הים."
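
The README does not describe the output format of ner_predict.py. If the model emits one tag per token in the common BIO scheme (an assumption), the tags can be merged into entity spans as sketched below. The function name `group_entities`, the tag names, and the example tags are hypothetical and hand-written for illustration; they are not model output.

```python
def group_entities(tokens, tags):
    """Merge per-token BIO tags into (entity_type, text) spans.

    Assumes tags look like "B-PER", "I-PER", or "O". An "I-" tag that does not
    continue the current entity is treated as the start of a new one.
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" or anything unrecognized ends the current entity
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Tokens from the example sentence above, with hand-written hypothetical tags.
tokens = ["הרלין", "הכלב", "הלך", "לטייל", "בחוף", "הים"]
tags = ["B-PER", "O", "O", "O", "B-LOC", "I-LOC"]
print(group_entities(tokens, tags))
```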

Contributors

lee-fingerhut, peleg122
