Contact: Lee Fingerhut
- Install an environment manager. Recommended: Miniconda3 (see its Getting Started guide).
- Clone the repo:
git clone https://github.com/LeeFB/AlephBert-NER.git
cd AlephBert-NER
- Create a new environment from environment.yml (you can change the environment name in the file)
conda env update -f environment.yml
conda activate ner
usage: ner_training.py [-h] [--seed SEED] [--name NAME] --train-file TRAIN_FILE [--max-seq-len MAX_SEQ_LEN] [--finetune]
[--num-epochs NUM_EPOCHS] [--batch-size BATCH_SIZE] [--learning-rate LEARNING_RATE] [--optimizer-eps OPTIMIZER_EPS]
[--weight-decay-rate WEIGHT_DECAY_RATE] [--max-grad-norm MAX_GRAD_NORM] [--num-warmup-steps NUM_WARMUP_STEPS]
optional arguments:
  -h, --help            show this help message and exit

general:
  --seed SEED           seed for reproducibility
  --name NAME           name of directory for product

dataset:
  --train-file TRAIN_FILE
                        path to train file
  --max-seq-len MAX_SEQ_LEN
                        maximal sequence length

training:
  --num-epochs NUM_EPOCHS
                        number of epochs to train
  --batch-size BATCH_SIZE
                        batch size

optimizer:
  --learning-rate LEARNING_RATE
                        learning rate
  --optimizer-eps OPTIMIZER_EPS
                        optimizer tolerance
  --weight-decay-rate WEIGHT_DECAY_RATE
                        optimizer weight decay rate
  --max-grad-norm MAX_GRAD_NORM
                        maximal gradients norm

scheduler:
  --num-warmup-steps NUM_WARMUP_STEPS
                        scheduler warmup steps
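The help output above looks like what an argparse parser with argument groups would print. The following is a minimal sketch of such a parser, not the repository's actual code. The argument types are assumptions, no defaults are set because none are documented here, and the placement of --finetune in the "training" group is a guess, since that flag appears in the usage line but not in the option listing.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the documented usage of ner_training.py."""
    parser = argparse.ArgumentParser(prog="ner_training.py")

    general = parser.add_argument_group("general")
    general.add_argument("--seed", type=int, help="seed for reproducibility")
    general.add_argument("--name", help="name of directory for product")

    dataset = parser.add_argument_group("dataset")
    # --train-file is the only option shown without brackets in the usage line,
    # so it is treated as required.
    dataset.add_argument("--train-file", required=True, help="path to train file")
    dataset.add_argument("--max-seq-len", type=int, help="maximal sequence length")

    training = parser.add_argument_group("training")
    # Assumption: group and help text for --finetune are not shown in the help output.
    training.add_argument("--finetune", action="store_true",
                          help="freeze the encoder and train only the classifier")
    training.add_argument("--num-epochs", type=int, help="number of epochs to train")
    training.add_argument("--batch-size", type=int, help="batch size")

    optimizer = parser.add_argument_group("optimizer")
    optimizer.add_argument("--learning-rate", type=float, help="learning rate")
    optimizer.add_argument("--optimizer-eps", type=float, help="optimizer tolerance")
    optimizer.add_argument("--weight-decay-rate", type=float,
                           help="optimizer weight decay rate")
    optimizer.add_argument("--max-grad-norm", type=float, help="maximal gradients norm")

    scheduler = parser.add_argument_group("scheduler")
    scheduler.add_argument("--num-warmup-steps", type=int, help="scheduler warmup steps")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--train-file", "dataset/dataset.csv", "--name", "sprml-finetune", "--finetune"]
)
print(args.train_file, args.name, args.finetune)
```

argparse converts dashes in option names to underscores, so --train-file is read back as args.train_file.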
The BERT model is pretrained.
To train all of its parameters, run the training script without the --finetune flag.
Example:
python ner_training.py --train-file dataset/dataset.csv --name sprml-train
Because the BERT model is pretrained, you can also freeze the encoder and fine-tune only the classifier by adding --finetune to the training command.
Example:
python ner_training.py --train-file dataset/dataset.csv --name sprml-finetune --finetune
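Freezing the encoder usually means turning off gradient updates for every parameter outside the classification head. The sketch below illustrates that idea and is not the repository's implementation. It is written against the named_parameters() / requires_grad interface that PyTorch modules expose, and uses small stand-in classes (Param, TinyModel, both hypothetical) so it runs without PyTorch installed. The "classifier." prefix is an assumption borrowed from the parameter naming in Hugging Face's BertForTokenClassification; the actual model layout in this repo may differ.

```python
class Param:
    """Stand-in for a torch.nn.Parameter: carries only the requires_grad flag."""

    def __init__(self):
        self.requires_grad = True


class TinyModel:
    """Stand-in for a torch.nn.Module with an encoder and a classifier head."""

    def __init__(self):
        # Hypothetical parameter names, chosen to resemble a BERT token classifier.
        self._params = {
            "bert.encoder.layer.0.weight": Param(),
            "bert.encoder.layer.0.bias": Param(),
            "classifier.weight": Param(),
            "classifier.bias": Param(),
        }

    def named_parameters(self):
        return iter(self._params.items())


def freeze_encoder(model, head_prefix="classifier."):
    """Disable gradients for every parameter whose name is outside the head.

    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)
        if param.requires_grad:
            trainable.append(name)
    return trainable


model = TinyModel()
print(freeze_encoder(model))
```

With a real PyTorch model, the same function could be called before building the optimizer, so that only the head's parameters receive updates.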
usage: ner_predict.py [-h] --checkpoint CHECKPOINT [--sentence SENTENCE]
optional arguments:
  -h, --help            show this help message and exit
  --checkpoint CHECKPOINT
                        checkpoint directory
  --sentence SENTENCE   sentence to apply NER
Predicting named entities for a test sentence:
python ner_predict.py --checkpoint checkpoints/<checkpoint dir> --sentence "הרלין הכלב הלך לטייל בחוף הים."
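To tag several sentences, one option is to call the script once per sentence from Python. The sketch below uses only the two documented flags. The helper names build_predict_command and predict_all are hypothetical, and the script's output format is not documented here, so the raw stdout is returned unparsed.

```python
import subprocess
import sys


def build_predict_command(checkpoint_dir, sentence):
    """Build the argument list for one ner_predict.py call."""
    return [
        sys.executable,
        "ner_predict.py",
        "--checkpoint", str(checkpoint_dir),
        "--sentence", sentence,
    ]


def predict_all(checkpoint_dir, sentences):
    """Run ner_predict.py once per sentence and collect its raw stdout."""
    outputs = []
    for sentence in sentences:
        result = subprocess.run(
            build_predict_command(checkpoint_dir, sentence),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the script exits with an error
        )
        outputs.append(result.stdout)
    return outputs
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with spaces and right-to-left text in the sentence. Each call starts a new process and presumably reloads the checkpoint, so this approach is only practical for a handful of sentences.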