
ace's Introduction

ACE

The code is for our ACL-IJCNLP 2021 paper: Automated Concatenation of Embeddings for Structured Prediction

ACE is a framework for automatically searching for a good concatenation of embeddings for structured prediction tasks, achieving state-of-the-art accuracy. The code is based on flair version 0.4.3, with substantial modifications.


News

  • 2022-11: AdaSeq, an all-in-one and easy-to-use library for developing sequence understanding models, is released.
  • 2022-03: Our newest state-of-the-art NER system KB-NER is released!
  • 2021-07: New versions of the document-level SOTA NER models are released; see Instructions for Reproducing Results for more details.

Comparison with State-of-the-Art

| Task | Language | Dataset | ACE | Previous best |
| --- | --- | --- | --- | --- |
| Named Entity Recognition | English | CoNLL 03 (document-level) | 94.6 (F1) | 94.3 (Yamada et al., 2020) |
| Named Entity Recognition | German | CoNLL 03 (document-level) | 88.3 (F1) | 86.4 (Yu et al., 2020) |
| Named Entity Recognition | German | CoNLL 03 (06 Revision) (document-level) | 91.7 (F1) | 90.3 (Yu et al., 2020) |
| Named Entity Recognition | Dutch | CoNLL 02 (document-level) | 95.7 (F1) | 93.7 (Yu et al., 2020) |
| Named Entity Recognition | Spanish | CoNLL 02 (document-level) | 90.4 (F1) | 90.3 (Yu et al., 2020) |
| Named Entity Recognition | English | CoNLL 03 (sentence-level) | 93.6 (F1) | 93.5 (Baevski et al., 2019) |
| Named Entity Recognition | German | CoNLL 03 (sentence-level) | 87.0 (F1) | 86.4 (Yu et al., 2020) |
| Named Entity Recognition | German | CoNLL 03 (06 Revision) (sentence-level) | 90.5 (F1) | 90.3 (Yu et al., 2020) |
| Named Entity Recognition | Dutch | CoNLL 02 (sentence-level) | 94.6 (F1) | 93.7 (Yu et al., 2020) |
| Named Entity Recognition | Spanish | CoNLL 02 (sentence-level) | 89.1 (F1) | 90.3 (Yu et al., 2020) |
| POS Tagging | English | Ritter's | 93.4 (Acc) | 90.1 (Nguyen et al., 2020) |
| POS Tagging | English | Ark | 94.4 (Acc) | 94.1 (Nguyen et al., 2020) |
| POS Tagging | English | TweeBank v2 | 95.8 (Acc) | 95.2 (Nguyen et al., 2020) |
| Aspect Extraction | English | SemEval 2014 Laptop | 87.4 (F1) | 84.3 (Xu et al., 2019) |
| Aspect Extraction | English | SemEval 2014 Restaurant | 92.0 (F1) | 87.1 (Wei et al., 2020) |
| Aspect Extraction | English | SemEval 2015 Restaurant | 80.3 (F1) | 72.7 (Wei et al., 2020) |
| Aspect Extraction | English | SemEval 2016 Restaurant | 81.3 (F1) | 78.0 (Xu et al., 2019) |
| Dependency Parsing | English | PTB | 95.7 (LAS) | 95.3 (Wang et al., 2020) |
| Semantic Dependency Parsing | English | DM ID | 95.6 (LF1) | 94.4 (Fernández-González and Gómez-Rodríguez, 2020) |
| Semantic Dependency Parsing | English | DM OOD | 92.6 (LF1) | 91.0 (Fernández-González and Gómez-Rodríguez, 2020) |
| Semantic Dependency Parsing | English | PAS ID | 95.8 (LF1) | 95.1 (Fernández-González and Gómez-Rodríguez, 2020) |
| Semantic Dependency Parsing | English | PAS OOD | 94.6 (LF1) | 93.4 (Fernández-González and Gómez-Rodríguez, 2020) |
| Semantic Dependency Parsing | English | PSD ID | 83.8 (LF1) | 82.6 (Fernández-González and Gómez-Rodríguez, 2020) |
| Semantic Dependency Parsing | English | PSD OOD | 83.4 (LF1) | 82.0 (Fernández-González and Gómez-Rodríguez, 2020) |

Guide

Requirements

The project is based on PyTorch 1.1+ and Python 3.6+. To run our code, install the dependencies with:

pip install -r requirements.txt

The required package versions are pinned in requirements.txt.
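
A quick way to sanity-check the environment before training (an illustrative sketch only; the authoritative version pins are in requirements.txt, and the example commands below are run with CUDA_VISIBLE_DEVICES, so a CUDA-capable GPU is assumed):

import sys
import torch

# Environment sanity check (illustrative only).
print("Python:", sys.version.split()[0])             # the project targets Python 3.6+
print("PyTorch:", torch.__version__)                  # the project targets PyTorch 1.1+
print("CUDA available:", torch.cuda.is_available())   # the example commands use CUDA_VISIBLE_DEVICES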

Download Embeddings

In our code, most of the embeddings can be downloaded automatically (except ELMo for non-English languages). You can also download the embeddings manually. The embeddings we used in the paper can be downloaded here:

| Name | Link |
| --- | --- |
| GloVe | nlp.stanford.edu/projects/glove |
| fastText | github.com/facebookresearch/fastText |
| ELMo | github.com/allenai/allennlp |
| ELMo (Other languages) | github.com/TalSchuster/CrossLingualContextualEmb |
| BERT | huggingface.co/bert-base-cased |
| M-BERT | huggingface.co/bert-base-multilingual-cased |
| BERT (Dutch) | huggingface.co/wietsedv/bert-base-dutch-cased |
| BERT (German) | huggingface.co/bert-base-german-dbmdz-cased |
| BERT (Spanish) | huggingface.co/dccuchile/bert-base-spanish-wwm-cased |
| BERT (Turkish) | huggingface.co/dbmdz/bert-base-turkish-cased |
| XLM-R | huggingface.co/xlm-roberta-large |
| XLM-R (CoNLL 02 Dutch) | huggingface.co/xlm-roberta-large-finetuned-conll02-dutch |
| XLM-R (CoNLL 02 Spanish) | huggingface.co/xlm-roberta-large-finetuned-conll02-spanish |
| XLM-R (CoNLL 03 English) | huggingface.co/xlm-roberta-large-finetuned-conll03-english |
| XLM-R (CoNLL 03 German) | huggingface.co/xlm-roberta-large-finetuned-conll03-german |
| XLNet | huggingface.co/xlnet-large-cased |

After the embeddings are downloaded, you need to set the embedding paths in the config file manually. For example, in config/conll_03_english.yaml:

TransformerWordEmbeddings-1:
    model: your/embedding/path 
    layers: -1,-2,-3,-4
    pooling_operation: mean
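
If you download a transformer embedding manually, one way to prepare a local path for the model field is to save the model and tokenizer with the transformers library (a sketch, not part of the released code; the target directory name is just an example):

from transformers import AutoModel, AutoTokenizer

name = "bert-base-cased"          # any transformer name from the table above
target = "your/embedding/path"    # example local directory

# Download once and save locally; the config's `model` field can then point at `target`.
AutoModel.from_pretrained(name).save_pretrained(target)
AutoTokenizer.from_pretrained(name).save_pretrained(target)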

Pretrained Models

We provide pretrained models for Named Entity Recognition (sentence-/document-level) and Dependency Parsing (PTB) on OneDrive. You can find the corresponding config file in config/. For the zip files named doc*.zip, you need to extract document-level embeddings first; please check (Optional) Extract Document Features for BERT Embeddings.

  • Download the models
  • Unzip the zip file
  • Put the extracted directory in resources/taggers/ (a minimal extraction sketch is shown below)
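
For example (a sketch only; the zip file name below is a placeholder for whichever model you downloaded, and it assumes the model directory sits at the top level of the archive):

import zipfile

# Extract the downloaded model archive directly into resources/taggers/.
with zipfile.ZipFile("conll_en_ner_model.zip") as zf:   # placeholder file name
    zf.extractall("resources/taggers/")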

To check the accuracy of the model, run:

CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test

where --config $config_file sets the configuration file. Here we take CoNLL 2003 English NER as an example, so $config_file is config/conll_03_english.yaml.

Instructions for Reproducing Results

Currently, we provide instructions for reproducing the results of our NER models in named_entity_recognition.md. Other tasks can follow the same guide to reproduce their results.


Training

Training ACE Models

To train the model, run:

CUDA_VISIBLE_DEVICES=0 python train.py --config $config_file

Train on Your Own Dataset

To set the dataset manually, specify it in the $config_file as follows:

Sequence Labeling:

targets: ner
ner:
  Corpus: ColumnCorpus-1
  ColumnCorpus-1: 
    data_folder: datasets/conll_03_new
    column_format:
      0: text
      1: pos
      2: chunk
      3: ner
    tag_to_bioes: ner
  tag_dictionary: resources/taggers/your_ner_tags.pkl

Parsing:

targets: dependency
dependency:
  Corpus: UniversalDependenciesCorpus-1
  UniversalDependenciesCorpus-1:
    data_folder: datasets/ptb
    add_root: True
  tag_dictionary: resources/taggers/your_parsing_tags.pkl

The tag_dictionary is a path to the tag dictionary for the task. If the path does not exist, the code will generate a tag dictionary at that path automatically. The dataset format is Corpus: $CorpusClassName-$id, where $id is a name for the dataset (anything you like). You can also train on multiple datasets jointly. For example:

Corpus: ColumnCorpus-1:ColumnCorpus-2:ColumnCorpus-3

ColumnCorpus-1:
  data_folder: ...
  column_format: ...
  tag_to_bioes: ...

ColumnCorpus-2:
  data_folder: ...
  column_format: ...
  tag_to_bioes: ...

ColumnCorpus-3:
  data_folder: ...
  column_format: ...
  tag_to_bioes: ...

Please refer to Config File for more details.

Set the Embeddings

You need to modify the embedding paths in the $config_file to change the embeddings for concatenation. For example, to add bert-large-cased to config/conll_03_english.yaml:

embeddings:
  TransformerWordEmbeddings-0:
    layers: '-1'
    pooling_operation: first
    model: xlm-roberta-large-finetuned-conll03-english

  TransformerWordEmbeddings-1:
    model: bert-base-cased
    layers: -1,-2,-3,-4
    pooling_operation: mean

  TransformerWordEmbeddings-2:
    model: bert-base-multilingual-cased
    layers: -1,-2,-3,-4
    pooling_operation: mean

  TransformerWordEmbeddings-3: # New embeddings
    model: bert-large-cased
    layers: -1,-2,-3,-4
    pooling_operation: mean
  ...

(Optional) Fine-tune Transformer-based Embeddings

To achieve state-of-the-art accuracy, one optional approach is fine-tuning the transformer-based embeddings on the task. We use fine-tuned embeddings from huggingface for the NER tasks, while the embeddings for the other tasks are fine-tuned by ourselves. The fine-tuned embeddings are then used as embedding candidates for ACE. Taking fine-tuning a BERT model on PTB parsing as an example, run:

CUDA_VISIBLE_DEVICES=0 python train.py --config config/en-bert-finetune-ptb.yaml

After the model is fine-tuned, you will find a tuned BERT model at

ls resources/taggers/en-bert_10epoch_0.5inter_2000batch_0.00005lr_20lrrate_ptb_monolingual_nocrf_fast_warmup_freezing_beta_weightdecay_finetune_saving_nodev_dependency16/bert-base-cased

Then, replace bert-base-cased with resources/taggers/en-bert_10epoch_0.5inter_2000batch_0.00005lr_20lrrate_ptb_monolingual_nocrf_fast_warmup_freezing_beta_weightdecay_finetune_saving_nodev_dependency16/bert-base-cased in the $config_file of the ACE model (for example, config/ptb_parsing_model.yaml).
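
If you prefer to script this replacement, a minimal sketch with PyYAML (already installed via requirements.txt) could look like the following; the embedding key TransformerWordEmbeddings-1 is just an example and should match the entry that currently uses bert-base-cased in your config:

import yaml

config_path = "config/ptb_parsing_model.yaml"
finetuned_path = ("resources/taggers/en-bert_10epoch_0.5inter_2000batch_0.00005lr_20lrrate_ptb_"
                  "monolingual_nocrf_fast_warmup_freezing_beta_weightdecay_finetune_saving_nodev_"
                  "dependency16/bert-base-cased")

with open(config_path) as f:
    config = yaml.safe_load(f)

# Point the chosen embedding entry at the fine-tuned model directory.
config["embeddings"]["TransformerWordEmbeddings-1"]["model"] = finetuned_path

with open(config_path, "w") as f:
    yaml.safe_dump(config, f, default_flow_style=False)   # note: re-dumping drops comments and ordering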

The config config/en-bert-finetune-ptb.yaml can be adapted to fine-tune other embeddings for parsing tasks. An example config for fine-tuning on NER (and other sequence labeling tasks) is config/en-bert-finetune-ner.yaml.

(Optional) Extract Document Features for BERT Embeddings

To achieve state-of-the-art accuracy for NER, one optional approach is extracting document-level features from the BERT embeddings (for RoBERTa, XLM-R and XLNet, we feed the model with the whole document; if you are interested in this part, see embeddings.py). The features are then used as an embedding candidate for ACE. We follow the embedding extraction approach of Yu et al., 2020, and use sentences consisting of the single word -DOCSTART- to split the documents. For CoNLL 2002 Spanish, there are no -DOCSTART- sentences, so we add a -DOCSTART- sentence every 25 sentences (a preprocessing sketch is given after the example below). For CoNLL 2002 Dutch, -DOCSTART- appears within the first sentence of each document, so please split the -DOCSTART- token into a sentence of its own. For example:

-DOCSTART- -DOCSTART- O

De Art O
tekst N O
van Prep O
het Art O
arrest N O
is V O
nog Adv O
niet Adv O
schriftelijk Adj O
beschikbaar Adj O
maar Conj O
het Art O
bericht N O
werd V O
alvast Adv O
bekendgemaakt V O
door Prep O
een Art O
communicatiebureau N O
dat Conj O
Floralux N B-ORG
inhuurde V O
. Punc O

...
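
For the CoNLL 2002 Spanish case mentioned above, a minimal preprocessing sketch (not part of the released code) that inserts a -DOCSTART- sentence every 25 sentences could look like this; it assumes a CoNLL-style file with one token per line and blank lines between sentences, and the column layout of the inserted line (here token + NER tag) should match your file:

def add_docstarts(in_path, out_path, every=25, docstart_line="-DOCSTART- O\n"):
    """Insert a -DOCSTART- sentence before every `every`-th sentence of a CoNLL-style file."""
    with open(in_path) as f_in, open(out_path, "w") as f_out:
        sentence_id = 0
        at_sentence_start = True
        for line in f_in:
            if at_sentence_start and line.strip():
                if sentence_id % every == 0:
                    f_out.write(docstart_line + "\n")   # -DOCSTART- as a sentence of its own
                sentence_id += 1
                at_sentence_start = False
            if not line.strip():
                at_sentence_start = True
            f_out.write(line)

add_docstarts("esp.train", "esp.train.with_docstart")   # placeholder file names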

Taking the English BERT model on CoNLL English NER as an example, run:

CUDA_VISIBLE_DEVICES=0 python extract_features.py --config config/en-bert-extract.yaml --batch_size 32 

Parse files

If you want to parse a certain file, add train to the file name and put the file in a directory $dir (for example, parse_file_dir/train.your_file_name). Then run:

CUDA_VISIBLE_DEVICES=0 python train.py --config $config_file --parse --target_dir $dir --keep_order

The format of the file should be column_format={0: 'text', 1: 'ner'} for sequence labeling, or you can modify line 337 in train.py. The parsed results will be written to outputs/. Note that you may need to preprocess your file with dummy tags for prediction; please check this issue for more details.
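
For example, a minimal sketch (file and directory names are placeholders) that converts plain whitespace-tokenized sentences, one per line, into the two-column format with dummy O tags:

import os

os.makedirs("parse_file_dir", exist_ok=True)
with open("sentences.txt") as f_in, open("parse_file_dir/train.my_file", "w") as f_out:
    for sentence in f_in:
        for token in sentence.split():
            f_out.write(token + " O\n")   # dummy 'O' tag for every token
        f_out.write("\n")                  # blank line separates sentences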

Config File

The config files are in yaml format. The main keys are listed below; a small inspection sketch follows the list.

  • targets: The target task
    • ner: named entity recognition
    • upos: part-of-speech tagging
    • chunk: chunking
    • ast: abstract extraction
    • dependency: dependency parsing
    • enhancedud: semantic dependency parsing/enhanced universal dependency parsing
  • ner: An example for the targets. If targets: ner, then the code will read the values with the key of ner.
    • Corpus: The training corpora for the model, use : to split different corpora.
    • tag_dictionary: A path to the tag dictionary for the task. If the path does not exist, the code will generate a tag dictionary at the path automatically.
  • target_dir: Save directory.
  • model_name: The trained models will be saved in $target_dir/$model_name.
  • model: The model to train, depending on the task.
    • FastSequenceTagger: Sequence labeling model. The values are the parameters.
    • SemanticDependencyParser: Syntactic/semantic dependency parsing model. The values are the parameters.
  • embeddings: The embeddings for the model. Each key is the class name of the embedding and its values are the parameters; see flair/embeddings.py for more details. For each embedding, use $classname-$id to represent the class. For example, if you want to use BERT and M-BERT for a single model, you can name them TransformerWordEmbeddings-0 and TransformerWordEmbeddings-1.
  • trainer: The trainer class.
    • ModelFinetuner: The trainer for fine-tuning embeddings or for simply training a task model without ACE.
    • ReinforcementTrainer: The trainer for training ACE.
  • train: the parameters for the train function in trainer (for example, ReinforcementTrainer.train()).
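
Putting these keys together, a config can be inspected programmatically (a sketch only; it assumes the key layout described in this section):

import yaml

with open("config/conll_03_english.yaml") as f:
    config = yaml.safe_load(f)

task = config["targets"]                                   # e.g. "ner"
print("task:", task)
print("corpora:", config[task]["Corpus"])                  # corpora joined with ':'
print("tag dictionary:", config[task]["tag_dictionary"])
print("save path:", config["target_dir"] + "/" + config["model_name"])
print("embedding candidates:", list(config["embeddings"].keys()))
print("trainer:", config["trainer"])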

TODO

Citing Us

If you find the code helpful, please cite:

@inproceedings{wang2020automated,
    title = "{{Automated Concatenation of Embeddings for Structured Prediction}}",
    author = "Wang, Xinyu and Jiang, Yong and Bach, Nguyen and Wang, Tao and Huang, Zhongqiang and Huang, Fei and Tu, Kewei",
    booktitle = "{the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (\textbf{ACL-IJCNLP 2021})}",
    month = aug,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}

Contact

Feel free to post your questions or comments as issues, or email Xinyu Wang.


ace's Issues

Test PTB Dependency Parsing Model

Hi there!

I am trying to test with your pretrained dependency parsing model. However, I cannot find your processed PTB dataset. Can you share it with a link?

Also, I am wondering how to run inference with my own data. For example, how can I feed in one sentence and get its tagging result?

Is GPU required?

Hello,
I ran python train.py --config config/conll_03_english.yaml --test
on Windows 10, Python 3.7, with no GPU, and got AssertionError: Torch not compiled with CUDA enabled at
https://github.com/Alibaba-NLP/ACE/blob/main/flair/trainers/distillation_trainer.py#L1160

Is there a work-around for no GPU env? Thank you.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\ebb\.conda\envs\ace_py37\lib\site-packages\torch\nn\modules\module.py", line 305, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\ebb\.conda\envs\ace_py37\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  File "C:\Users\ebb\.conda\envs\ace_py37\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  File "C:\Users\ebb\.conda\envs\ace_py37\lib\site-packages\torch\nn\modules\module.py", line 224, in _apply
    param_applied = fn(param)
  File "C:\Users\ebb\.conda\envs\ace_py37\lib\site-packages\torch\nn\modules\module.py", line 305, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\ebb\.conda\envs\ace_py37\lib\site-packages\torch\cuda\__init__.py", line 192, in _lazy_init
    _check_driver()
  File "C:\Users\ebb\.conda\envs\ace_py37\lib\site-packages\torch\cuda\__init__.py", line 95, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

ACE for text classification

Hi there,

How complex would it be to use ACE for text classification? If possible, could you please add a working example for this.

Regards
Raj

System requirements for implementing it

I would like to know the system requirements for running this effectively, as I am facing issues with my build of 32 GB CPU RAM and an NVIDIA 1080 with 12 GB.

Parsed order for prediction of test set is randomized

I have tried to run:

CUDA_VISIBLE_DEVICES=0 python train.py --config config/nergrit_indo.yaml --parse --target_dir $dir --keep_order

for an Indonesian NER dataset trained with ACE. However, in the output .conllu file, the sentence order is randomized each time I run the command and does not match the sentence order in my test set.

Kindly assist, thank you!

(NER) Error when training a custom dataset with document-level

Hello everyone,

I'm having an error trying to use a custom dataset and training it with a document-level model for the NER task. In this case, I'm copying the configuration file xlnet-doc-en-ner-finetune.yaml and modifying it, based on the documentation provided in this repository, to use my custom dataset.

This custom dataset has 3 files: train, dev, and test. There are 10 different labels (IOB2 annotation scheme). I created a tag_dictionary in the same way this file was created, just adding more tags according to my needs. The format, structure, and everything else are identical to the normal CONLL2003 used in this repository. I've been using this custom dataset with sentence-level models and had no problems at all. Errors only happen when I use document-level models. I should also mention that the files I'm using are up to date with the last commit of this repository.

This is the error:

(ace) azeredo@ix-ws28:~/test_ace/test_recent_ace/ACE$ CUDA_VISIBLE_DEVICES=0 python train.py --config /home/azeredo/test_ace/test_recent_ace/ACE/config/xlnet-doc-en-ner-finetune.yaml
/home/azeredo/test_ace/test_recent_ace/ACE/flair/utils/params.py:104: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  dict_merge.dict_merge(params_dict, yaml.load(f))
2021-10-23 14:58:07,820 Reading data from /home/azeredo/test_ace/data/harem_conll
2021-10-23 14:58:07,820 Train: /home/azeredo/test_ace/data/harem_conll/harem_default.train
2021-10-23 14:58:07,820 Dev: /home/azeredo/test_ace/data/harem_conll/harem_default.dev
2021-10-23 14:58:07,820 Test: /home/azeredo/test_ace/data/harem_conll/harem_default.test
2021-10-23 14:58:10,804 {b'<unk>': 0, b'<START>': 1, b'<STOP>': 2, b'O': 3, b'B-PESSOA': 4, b'I-PESSOA': 5, b'S-PESSOA': 6, b'E-PESSOA': 7, b'B-ORGANIZACAO': 8, b'I-ORGANIZACAO': 9, b'S-ORGANIZACAO': 10, b'E-ORGANIZACAO': 11, b'B-TEMPO': 12, b'I-TEMPO': 13, b'S-TEMPO': 14, b'E-TEMPO': 15, b'B-LOCAL': 16, b'I-LOCAL': 17, b'S-LOCAL': 18, b'E-LOCAL': 19, b'B-OBRA': 20, b'I-OBRA': 21, b'S-OBRA': 22, b'E-OBRA': 23, b'B-ACONTECIMENTO': 24, b'I-ACONTECIMENTO': 25, b'S-ACONTECIMENTO': 26, b'E-ACONTECIMENTO': 27, b'B-ABSTRACCAO': 28, b'I-ABSTRACCAO': 29, b'S-ABSTRACCAO': 30, b'E-ABSTRACCAO': 31, b'B-COISA': 32, b'I-COISA': 33, b'S-COISA': 34, b'E-COISA': 35, b'B-VALOR': 36, b'I-VALOR': 37, b'S-VALOR': 38, b'E-VALOR': 39, b'B-VARIADO': 40, b'I-VARIADO': 41, b'S-VARIADO': 42, b'E-VARIADO': 43}
2021-10-23 14:58:10,804 Corpus: 5155 train + 137 dev + 3658 test sentences
[2021-10-23 14:58:11,302 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-base-cased-config.json from cache at /home/azeredo/.cache/torch/transformers/c9cc6e53904f7f3679a31ec4af244f4419e25ebc8e71ebf8c558a31cbcf07fc8.69e5e35e0b798cab5e473f253752f8bf4d280ee37682281a23eed80f6e2d09c6
[2021-10-23 14:58:11,303 INFO] Model config XLNetConfig {
  "architectures": [
    "XLNetLMHeadModel"
  ],
  "attn_type": "bi",
  "bi_data": false,
  "bos_token_id": 1,
  "clamp_len": -1,
  "d_head": 64,
  "d_inner": 3072,
  "d_model": 768,
  "dropout": 0.1,
  "end_n_top": 5,
  "eos_token_id": 2,
  "ff_activation": "gelu",
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-12,
  "mem_len": null,
  "model_type": "xlnet",
  "n_head": 12,
  "n_layer": 12,
  "pad_token_id": 5,
  "reuse_len": null,
  "same_length": false,
  "start_n_top": 5,
  "summary_activation": "tanh",
  "summary_last_dropout": 0.1,
  "summary_type": "last",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 250
    }
  },
  "untie_r": true,
  "vocab_size": 32000
}

[2021-10-23 14:58:11,757 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-base-cased-spiece.model from cache at /home/azeredo/.cache/torch/transformers/dad589d582573df0293448af5109cb6981ca77239ed314e15ca63b7b8a318ddd.8b10bd978b5d01c21303cc761fc9ecd464419b3bf921864a355ba807cfbfafa8
[2021-10-23 14:58:12,262 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-base-cased-config.json from cache at /home/azeredo/.cache/torch/transformers/c9cc6e53904f7f3679a31ec4af244f4419e25ebc8e71ebf8c558a31cbcf07fc8.69e5e35e0b798cab5e473f253752f8bf4d280ee37682281a23eed80f6e2d09c6
[2021-10-23 14:58:12,263 INFO] Model config XLNetConfig {
  "architectures": [
    "XLNetLMHeadModel"
  ],
  "attn_type": "bi",
  "bi_data": false,
  "bos_token_id": 1,
  "clamp_len": -1,
  "d_head": 64,
  "d_inner": 3072,
  "d_model": 768,
  "dropout": 0.1,
  "end_n_top": 5,
  "eos_token_id": 2,
  "ff_activation": "gelu",
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-12,
  "mem_len": null,
  "model_type": "xlnet",
  "n_head": 12,
  "n_layer": 12,
  "output_hidden_states": true,
  "pad_token_id": 5,
  "reuse_len": null,
  "same_length": false,
  "start_n_top": 5,
  "summary_activation": "tanh",
  "summary_last_dropout": 0.1,
  "summary_type": "last",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 250
    }
  },
  "untie_r": true,
  "vocab_size": 32000
}

[2021-10-23 14:58:12,536 INFO] loading weights file https://cdn.huggingface.co/xlnet-base-cased-pytorch_model.bin from cache at /home/azeredo/.cache/torch/transformers/33d6135fea0154c088449506a4c5f9553cb59b6fd040138417a7033af64bb8f9.7eac4fe898a021204e63c88c00ea68c60443c57f94b4bc3c02adbde6465745ac
[2021-10-23 14:58:14,071 INFO] All model checkpoint weights were used when initializing XLNetModel.

[2021-10-23 14:58:14,071 INFO] All the weights of XLNetModel were initialized from the model checkpoint at xlnet-base-cased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use XLNetModel for predictions without further training.
2021-10-23 14:58:15,714 Model Size: 116752172
Corpus: 5034 train + 129 dev + 3530 test sentences
2021-10-23 14:58:15,738 ----------------------------------------------------------------------------------------------------
2021-10-23 14:58:15,739 Model: "FastSequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): TransformerWordEmbeddings(
      (model): XLNetModel(
        (word_embedding): Embedding(32000, 768)
        (layer): ModuleList(
          (0): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (2): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (3): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (4): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (5): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (6): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (7): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (8): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (9): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (10): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (11): XLNetLayer(
            (rel_attn): XLNetRelativeAttention(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (ff): XLNetFeedForward(
              (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (layer_1): Linear(in_features=768, out_features=3072, bias=True)
              (layer_2): Linear(in_features=3072, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (word_dropout): WordDropout(p=0.1)
  (linear): Linear(in_features=768, out_features=44, bias=True)
)"
2021-10-23 14:58:15,739 ----------------------------------------------------------------------------------------------------
2021-10-23 14:58:15,739 Corpus: "Corpus: 5034 train + 129 dev + 3530 test sentences"
2021-10-23 14:58:15,739 ----------------------------------------------------------------------------------------------------
2021-10-23 14:58:15,739 Parameters:
2021-10-23 14:58:15,739  - Optimizer: "AdamW"
2021-10-23 14:58:15,739  - learning_rate: "5e-06"
2021-10-23 14:58:15,739  - mini_batch_size: "1"
2021-10-23 14:58:15,740  - patience: "10"
2021-10-23 14:58:15,740  - anneal_factor: "0.5"
2021-10-23 14:58:15,740  - max_epochs: "10"
2021-10-23 14:58:15,740  - shuffle: "True"
2021-10-23 14:58:15,740  - train_with_dev: "False"
2021-10-23 14:58:15,740  - word min_freq: "-1"
2021-10-23 14:58:15,740 ----------------------------------------------------------------------------------------------------
2021-10-23 14:58:15,740 Model training base path: "resources/taggers/test-xlnet-base-cased"
2021-10-23 14:58:15,740 ----------------------------------------------------------------------------------------------------
2021-10-23 14:58:15,740 Device: cuda:0
2021-10-23 14:58:15,740 ----------------------------------------------------------------------------------------------------
2021-10-23 14:58:15,740 Embeddings storage mode: none
2021-10-23 14:58:16,362 ----------------------------------------------------------------------------------------------------
2021-10-23 14:58:16,365 Current loss interpolation: 1
['xlnet-base-cased_v2doc']
2021-10-23 14:58:16,769 epoch 1 - iter 0/5034 - loss 282.73196411 - samples/sec: 2.48 - decode_sents/sec: 1741.10
2021-10-23 15:00:02,011 epoch 1 - iter 503/5034 - loss 19.12329946 - samples/sec: 4.99 - decode_sents/sec: 1277852.76
2021-10-23 15:01:45,008 epoch 1 - iter 1006/5034 - loss 16.76969104 - samples/sec: 5.10 - decode_sents/sec: 1260295.65
2021-10-23 15:03:28,802 epoch 1 - iter 1509/5034 - loss 16.35974099 - samples/sec: 5.06 - decode_sents/sec: 1250583.82
2021-10-23 15:05:11,367 epoch 1 - iter 2012/5034 - loss 15.39498760 - samples/sec: 5.12 - decode_sents/sec: 1271690.72
2021-10-23 15:06:54,296 epoch 1 - iter 2515/5034 - loss 14.80468873 - samples/sec: 5.10 - decode_sents/sec: 1222326.14
2021-10-23 15:08:36,583 epoch 1 - iter 3018/5034 - loss 14.29490426 - samples/sec: 5.14 - decode_sents/sec: 1297499.95
2021-10-23 15:10:19,406 epoch 1 - iter 3521/5034 - loss 13.89466663 - samples/sec: 5.11 - decode_sents/sec: 1280956.23
2021-10-23 15:12:02,116 epoch 1 - iter 4024/5034 - loss 13.52090477 - samples/sec: 5.12 - decode_sents/sec: 1234485.03
2021-10-23 15:13:44,229 epoch 1 - iter 4527/5034 - loss 13.33001108 - samples/sec: 5.15 - decode_sents/sec: 1291938.10
2021-10-23 15:15:26,911 epoch 1 - iter 5030/5034 - loss 13.11416934 - samples/sec: 5.12 - decode_sents/sec: 1267869.54
2021-10-23 15:15:27,572 ----------------------------------------------------------------------------------------------------
2021-10-23 15:15:27,572 EPOCH 1 done: loss 3.2802 - lr 5e-06
2021-10-23 15:15:27,572 ----------------------------------------------------------------------------------------------------
[Sentence: "Esta e outras condições específicas da manifestação da informação como participante deste processo são estudadas neste artigo ." - 18 Tokens]
index 0 is out of bounds for dimension 0 with size 0
> /home/azeredo/test_ace/test_recent_ace/ACE/flair/embeddings.py(3733)add_document_embeddings_v2()
-> if self.pooling_operation == "last":
(Pdb)

The error index 0 is out of bounds for dimension 0 with size 0 seems to happen in this line.

harem_default.train
harem_default.dev
harem_default.test
ner_tags_harem.pkl
xlnet-doc-ner-test.yaml
You can find all files here.

I'm very thankful for your patience and help.

[EDIT]: I forgot to mention: this error does not occur when I'm using the CONLL2003 dataset, and that's confusing me because both datasets seem to be in an identical structure. In my custom dataset, the third column (chunking) has random values, I think that's not a problem since I don't remember reading anything about using this column to the output predictions.

About config for tweebank dataset

Hello
First of all, I read your paper and was very impressed. I applied the uploaded code to the CoNLL 2003 dataset and confirmed the performance reported in the paper.
This time, I want to confirm the POS performance on the Tweebank dataset, but I cannot reproduce the expected performance. Could you send me the yaml file for POS? Thanks for reading.

ERROR: RuntimeError: cublas runtime error

My conda env:

python=3.6 pytorch=1.3.1

_libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
 _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-4.5-1_gnu
 _pytorch_select    pkgs/main/linux-64::_pytorch_select-0.2-gpu_0
 blas               pkgs/main/linux-64::blas-1.0-mkl
 ca-certificates    pkgs/main/linux-64::ca-certificates-2021.10.26-h06a4308_2
 certifi            pkgs/main/linux-64::certifi-2021.5.30-py36h06a4308_0
 cffi               pkgs/main/linux-64::cffi-1.14.6-py36h400218f_0
 cudatoolkit        pkgs/main/linux-64::cudatoolkit-10.0.130-0
 cudnn              pkgs/main/linux-64::cudnn-7.6.5-cuda10.0_0
 intel-openmp       pkgs/main/linux-64::intel-openmp-2021.4.0-h06a4308_3561
 ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.35.1-h7274673_9
 libffi             pkgs/main/linux-64::libffi-3.3-he6710b0_2
 libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.3.0-h5101ec6_17
 libgomp            pkgs/main/linux-64::libgomp-9.3.0-h5101ec6_17
 libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.3.0-hd4cf53a_17
 mkl                pkgs/main/linux-64::mkl-2020.2-256
 mkl-service        pkgs/main/linux-64::mkl-service-2.3.0-py36he8ac12f_0
 mkl_fft            pkgs/main/linux-64::mkl_fft-1.3.0-py36h54f3939_0
 mkl_random         pkgs/main/linux-64::mkl_random-1.1.1-py36h0573a6f_0
 ncurses            pkgs/main/linux-64::ncurses-6.3-h7f8727e_2
 ninja              pkgs/main/linux-64::ninja-1.10.2-h5e70eb0_2
 numpy              pkgs/main/linux-64::numpy-1.19.2-py36h54aff64_0
 numpy-base         pkgs/main/linux-64::numpy-base-1.19.2-py36hfa32c7d_0
 openssl            pkgs/main/linux-64::openssl-1.1.1l-h7f8727e_0
 pip                pkgs/main/linux-64::pip-21.2.2-py36h06a4308_0
 pycparser          pkgs/main/noarch::pycparser-2.21-pyhd3eb1b0_0
 python             pkgs/main/linux-64::python-3.6.13-h12debd9_1
 pytorch            pkgs/main/linux-64::pytorch-1.3.1-cuda100py36h53c1284_0
 readline           pkgs/main/linux-64::readline-8.1-h27cfd23_0
 setuptools         pkgs/main/linux-64::setuptools-58.0.4-py36h06a4308_0
 six                pkgs/main/noarch::six-1.16.0-pyhd3eb1b0_0
 sqlite             pkgs/main/linux-64::sqlite-3.36.0-hc218d9a_0
 tk                 pkgs/main/linux-64::tk-8.6.11-h1ccaba5_0
 wheel              pkgs/main/noarch::wheel-0.37.0-pyhd3eb1b0_1
 xz                 pkgs/main/linux-64::xz-5.2.5-h7b6447c_0
 zlib               pkgs/main/linux-64::zlib-1.2.11-h7f8727e_4

I later ran pip install -r requirements.txt, which throws an error and also installs the following:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.
mkl-fft 1.3.0 requires numpy>=1.16, but you have numpy 1.15.1 which is incompatible.
Successfully installed Deprecated-1.2.6 Jinja2-3.0.3 MarkupSafe-2.0.1 Pillow-7.0.0 Werkzeug-2.0.2 aadict-0.2.3 
alabaster-0.7.12 allennlp-0.9.0 asset-0.6.13 attrs-21.2.0 babel-2.9.1 backcall-0.2.0 blis-0.2.4 boto3-1.10.45 
botocore-1.13.45 bpemb-0.3.0 certifi-2020.4.5.1 chardet-3.0.4 click-8.0.3 conllu-1.3.1 cycler-0.10.0 cymem-2.0.6 
dataclasses-0.8 decorator-5.1.0 docutils-0.15.2 editdistance-0.6.0 filelock-3.4.0 flaky-3.7.0 flask-2.0.2 flask-cors-3.0.10 
ftfy-6.0.3 gensim-3.8.1 gevent-21.12.0 globre-0.1.5 greenlet-1.1.2 h5py-2.8.0 idna-2.8 imagesize-1.3.0
 importlib-metadata-4.8.3 iniconfig-1.1.1 ipython-7.12.0 ipython-genutils-0.2.0 itsdangerous-2.0.1 jedi-0.18.1 
jmespath-0.10.0 joblib-1.1.0 jsonnet-0.18.0 jsonpickle-2.0.0 kiwisolver-1.3.1 matplotlib-3.1.3 mock-4.0.1 
murmurhash-1.0.6 nltk-3.6.3 numpy-1.15.1 numpydoc-1.1.0 overrides-2.8.0 packaging-21.3 parsimonious-0.8.1
 parso-0.8.3 pexpect-4.8.0 pickleshare-0.7.5 plac-0.9.6 pluggy-0.13.1 preshed-2.0.1 prompt-toolkit-3.0.24 
protobuf-3.19.1 ptyprocess-0.7.0 py-1.11.0 pygments-2.10.0 pyhocon-0.3.56 pyparsing-3.0.6 pytest-6.1.2
 python-dateutil-2.8.2 pytorch-pretrained-bert-0.6.2 pytorch-transformers-1.1.0 pytz-2021.3 pyyaml-5.2 
regex-2019.12.20 requests-2.22.0 responses-0.16.0 s3transfer-0.2.1 sacremoses-0.0.46 scikit-learn-0.24.2 
scipy-1.4.1 segtok-1.5.7 sentencepiece-0.1.96 sklearn-0.0 smart-open-5.2.1 snowballstemmer-2.2.0 
spacy-2.1.9 sphinx-4.3.2 sphinxcontrib-applehelp-1.0.2 sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp-2.0.0
 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml-1.1.5 sqlparse-0.4.2 
srsly-1.0.5 tabulate-0.8.6 tensorboardX-2.4.1 thinc-7.0.8 threadpoolctl-3.0.0 tokenizers-0.8.0rc4 toml-0.10.2
 tqdm-4.41.0 traitlets-4.3.3 transformers-3.0.0 typing-extensions-4.0.1 unidecode-1.3.2 urllib3-1.25.11
 wasabi-0.9.0 wcwidth-0.2.5 word2number-1.1 wrapt-1.13.3 zipp-3.6.0 zope.event-4.5.0 zope.interface-5.4.0

Then running CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test throws this error:

[2021-12-23 11:25:58,720 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-roberta-large-finetuned-conll03-english-sentencepiece.bpe.model from cache at /home/chapapadopoulos/.cache/torch/transformers/431cf95b26928e8ff52fd32e349c1de81e77e39e0827a725feaa4357692901cf.309f0c29486cffc28e1e40a2ab0ac8f500c203fe080b95f820aa9cb58e5b84ed
[2021-12-23 11:25:59,854 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-roberta-large-finetuned-conll03-english-config.json from cache at /home/chapapadopoulos/.cache/torch/transformers/4df1826a1128bbf8e81e2d920aace90d7e8a32ca214090f7210822aca0fd67d2.af9bc4ec719428ebc5f7bd9b67c97ee305cad5ba274c764cd193a31529ee3ba6
[2021-12-23 11:25:59,856 INFO] Model config XLMRobertaConfig {
  "_num_labels": 8,
  "architectures": [
    "XLMRobertaForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "B-LOC",
    "1": "B-MISC",
    "2": "B-ORG",
    "3": "I-LOC",
    "4": "I-MISC",
    "5": "I-ORG",
    "6": "I-PER",
    "7": "O"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "B-LOC": 0,
    "B-MISC": 1,
    "B-ORG": 2,
    "I-LOC": 3,
    "I-MISC": 4,
    "I-ORG": 5,
    "I-PER": 6,
    "O": 7
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "output_hidden_states": true,
  "output_past": true,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 250002
}

[2021-12-23 11:26:00,498 INFO] loading weights file https://cdn.huggingface.co/xlm-roberta-large-finetuned-conll03-english-pytorch_model.bin from cache at /home/chapapadopoulos/.cache/torch/transformers/3a603320849fd5410edf034706443763632c09305bb0fd1f3ba26dcac5ed84b3.437090cbc8148a158bd2b30767652c9e66e4b09430bc0fa2b717028fb6047724
[2021-12-23 11:26:21,062 INFO] All model checkpoint weights were used when initializing XLMRobertaModel.

[2021-12-23 11:26:21,063 INFO] All the weights of XLMRobertaModel were initialized from the model checkpoint at xlm-roberta-large-finetuned-conll03-english.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use XLMRobertaModel for predictions without further training.
2021-12-23 11:26:22,672 Model Size: 1106399156
Corpus: 14987 train + 3466 dev + 3684 test sentences
2021-12-23 11:26:22,721 ----------------------------------------------------------------------------------------------------
2021-12-23 11:26:25,010 loading file resources/taggers/en-xlmr-tuned-first_elmo_bert-old-four_multi-bert-four_word-glove_word_origflair_mflair_char_30episode_150epoch_32batch_0.1lr_800hidden_en_monolingual_crf_fast_reinforce_freeze_norelearn_sentbatch_0.5discount_0.9momentum_5patience_nodev_newner5/best-model.pt
2021-12-23 11:26:30,452 Testing using best model ...
2021-12-23 11:26:30,455 Setting embedding mask to the best action: tensor([1., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1.], device='cuda:0')
['/home/chapapadopoulos/.cache/torch/transformers/bert-base-cased', '/home/chapapadopoulos/.flair/embeddings/lm-jw300-backward-v0.1.pt', '/home/chapapadopoulos/.flair/embeddings/lm-jw300-forward-v0.1.pt', '/home/chapapadopoulos/.flair/embeddings/news-backward-0.4.1.pt', '/home/chapapadopoulos/.flair/embeddings/news-forward-0.4.1.pt', '/home/chapapadopoulos/.flair/embeddings/xlm-roberta-large-finetuned-conll03-english', 'Char', 'Word: en', 'Word: glove', 'bert-base-multilingual-cased', 'elmo-original']
2021-12-23 11:26:32,461 /home/yongjiang.jy/.cache/torch/transformers/bert-base-cased 108310272
Traceback (most recent call last):
  File "train.py", line 163, in <module>
    predict_posterior=args.predict_posterior,
  File "/home/chapapadopoulos/github/NER/ACE-main/flair/trainers/reinforcement_trainer.py", line 1459, in final_test
    self.gpu_friendly_assign_embedding([loader], selection = self.model.selection)
  File "/home/chapapadopoulos/github/NER/ACE-main/flair/trainers/distillation_trainer.py", line 1171, in gpu_friendly_assign_embedding
    embedding.embed(sentences)
  File "/home/chapapadopoulos/github/NER/ACE-main/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/home/chapapadopoulos/github/NER/ACE-main/flair/embeddings.py", line 2722, in _add_embeddings_internal
    sequence_output, pooled_output, all_encoder_layers = self.model(all_input_ids, token_type_ids=None, attention_mask=all_input_masks)
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/transformers/modeling_bert.py", line 762, in forward
    output_hidden_states=output_hidden_states,
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/transformers/modeling_bert.py", line 439, in forward
    output_attentions,
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/transformers/modeling_bert.py", line 371, in forward
    hidden_states, attention_mask, head_mask, output_attentions=output_attentions,
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/transformers/modeling_bert.py", line 315, in forward
    hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions,
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chapapadopoulos/anaconda3/envs/ACEagain/lib/python3.6/site-packages/transformers/modeling_bert.py", line 239, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-ocx5vxk7/aten/src/THC/THCBlas.cu:331

It runs on an nvidia 3090 and I have updated all drivers:
NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4

Trying to download pretrained models

Hi,

I was trying to download the pretrained models from Onedrive, yet I have an issue with conll_en_ner_model.zip.

After 2.4GB in I get the following message:

2021-03-12 09:11:40 (782 KB/s) - Read error at byte 2581894817/4107054003 (Connection reset by peer). Retrying.

This might suggest the download is corrupt. Is there any other way to share that specific zip? Thanks!

Requirements.txt versions

Many package dependencies are deprecated. Will the requirements eventually be updated for newer Python versions?

Furthermore, installing the requirements gives

ERROR: Cannot install -r requirements.txt (line 4) and conllu==4.4 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested conllu==4.4
allennlp 0.9.0 depends on conllu==1.3.1

The conflict is caused by:
The user requested numpy==1.15.1
allennlp 0.9.0 depends on numpy
blis 0.2.4 depends on numpy>=1.15.0
bpemb 0.3.0 depends on numpy
elmoformanylangs 0.0.2 depends on numpy
gensim 3.8.1 depends on numpy>=1.11.3
h5py 2.8.0 depends on numpy>=1.7
keras-applications 1.0.8 depends on numpy>=1.9.1
keras-preprocessing 1.1.0 depends on numpy>=1.9.1
matplotlib 3.1.3 depends on numpy>=1.11
mxnet 1.5.0 depends on numpy<2.0.0 and >1.16.0

Originally posted by @daniel-v-e in #37 (comment)

KeyError: "Unable to open object (object 'conll_03_english_train_-1' doesn't exist)"

Hello, I ran into a problem when executing the command CUDA_VISIBLE_DEVICES=0 python train.py --config config/doc_ner_best.yaml --test. I also printed out the path of the file that causes the problem: /home/gly/download/bert-large-cased.hdf5. Is this conll_03_english_train_-1 required?

[2022-04-14 21:58:45,188 INFO] loading weights file https://cdn.huggingface.co/bert-base-multilingual-cased-pytorch_model.bin from cache at /home/gly/.cache/torch/transformers/3d1d2b2daef1e2b3ddc2180ddaae8b7a37d5f279babce0068361f71cd548f615.7131dcb754361639a7d5526985f880879c9bfd144b65a0bf50590bddb7de9059
[2022-04-14 21:58:48,009 INFO] All model checkpoint weights were used when initializing BertModel.

[2022-04-14 21:58:48,010 INFO] All the weights of BertModel were initialized from the model checkpoint at bert-base-multilingual-cased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use BertModel for predictions without further training.
2022-04-14 21:58:49,262 Model Size: 2191057464
2022-04-14 21:58:49,361 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
2022-04-14 21:58:49,362 /home/gly/download/bert-large-cased.hdf5
Traceback (most recent call last):
  File "train.py", line 121, in <module>
    trainer: trainer_func = trainer_func(student, None, corpus, config=config.config, **config.config[trainer_name], is_test=args.test)
  File "/home/gly/python_workspace/ace/ACE/flair/trainers/reinforcement_trainer.py", line 169, in __init__
    self.assign_predicted_embeddings(doc_sentence_dict,embedding,pretrained_file_dict[embedding.name])
  File "/home/gly/python_workspace/ace/ACE/flair/trainers/distillation_trainer.py", line 1195, in assign_predicted_embeddings
    group = lm_file[key]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/h5py/_hl/group.py", line 177, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'conll_03_english_train_-1' doesn't exist)"

AttributeError: 'NoneType' object has no attribute 'tokenize'

Hello,
Hit an error while running
python .\train.py --config .\config\doc_ner_best.yaml --batch_size 1 --parse --target_dir .\datasets\mytest --keep_order
on Windows 10, Python 3.7, no GPU.

Here is the error message:
2022-07-28 14:35:50,789 Reading data from datasets\mytest
2022-07-28 14:35:50,789 Train: datasets\mytest\doc_train.txt
2022-07-28 14:35:50,789 Dev: None
2022-07-28 14:35:50,791 Test: None
Traceback (most recent call last):
  File ".\train.py", line 345, in <module>
    train_eval_result, train_loss = student.evaluate(loader,out_path=Path('outputs/train.'+config.config['model_name']+'.'+tar_file_name+'.conllu'),embeddings_storage_mode="none",prediction_mode=True)
  File "C:\Users\ebb\ACE\flair\models\sequence_tagger_model.py", line 2218, in evaluate
    features = self.forward(batch,prediction_mode=prediction_mode)
  File "C:\Users\ebb\ACE\flair\models\sequence_tagger_model.py", line 818, in forward
    self.embeddings.embed(sentences,embedding_mask=self.selection)
  File "C:\Users\ebb\ACE\flair\embeddings.py", line 184, in embed
    embedding.embed(sentences)
  File "C:\Users\ebb\ACE\flair\embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "C:\Users\ebb\ACE\flair\embeddings.py", line 2962, in _add_embeddings_internal
    self._add_embeddings_to_sentences(sentences)
  File "C:\Users\ebb\ACE\flair\embeddings.py", line 3051, in _add_embeddings_to_sentences
    subtokenized_sentence = self.tokenizer.tokenize(tokenized_string)
AttributeError: 'NoneType' object has no attribute 'tokenize'

The error is trigged by this line: https://github.com/Alibaba-NLP/ACE/blob/main/flair/embeddings.py#L3041
because self.tokenizer is None.

Any suggestions how to debug this issue? Thanks.

btw, the content of doc_train.txt is the following gibberish:
-DOCSTART- O

Amazon O
predict O
Paypal O
and O
do O
7-11 O
for O
Canada O
and O
Hongkong O

Confuse about the SDP dataset

When I run train.py with config/psd_parsing_model.yaml, I encounter "FileNotFoundError: [Errno 2] No such file or directory: '/home/rnx/.flair/datasets/enhanced_ud/PSD'", which indicates that I have neither the train, dev, and test splits of the PSD, DM, and PAS datasets nor any idea how to preprocess them. Could you provide some tips on how to obtain and preprocess these datasets? Do I need to buy the dataset from https://catalog.ldc.upenn.edu/LDC2016T10?

Ask about results in Table 2.

[image]
Hi, I read your paper and noticed in Table 2 that for NER on Spanish (Es), your method brings much larger gains than for the other languages. Do you have some intuition on why it brings such large gains?

How to apply ACE model as a parser tool for DM parsing?

As ACE is the best parser on DM, I want to apply it for DM parsing.
But currently I can't use the code, and I cannot even reproduce the semantic dependency parsing results on DM.
For the first step, I don't have the dataset "/enhancedud/DM" and don't know where to get it.

So how can I use ACE as a parser tool for DM parsing?
Looking forward to your reply.

Using your best model for inference for NER in production

Hi there,

Is there an easy way to use your pre-trained model for inference? If not, do you have plans to release it on huggingface or anywhere else so that we can use it in production, since you still hold the SOTA on the CoNLL dataset?

Thanks

Change the directory of downloading

Hi, this should be a trivial issue, but I hope you can still help me out.

I set up the framework and the code indeed tries to download embeddings automatically. The issue is that it tries to put the downloaded embeddings in ~/.cache/torch, but I have limited space in that default directory.

I am wondering, is there a way to change the default download location to another directory?

Thank you for your time.

Problem downloading bert-base-multilingual-cased.hdf5

Since this model is hosted on OneDrive and is 19 GB, downloading it from within China never finishes before the download link expires. Is there another mirror or download link available in China? Thanks!

Special characters and punctuation in ACE chunking

Hello,

I've noticed some special characters and words attached to them get omitted in ACE chunking, as seen below:

"text": "This is an experiment: how do special chars & punctuations--like ~ (tilde) or * (star)--behave in ACE? #science",
"chunk_str": "<This> <is> <an experiment> <how> <special chars & punctuations--like> <~> <*> <in> <ACE> .",


"text": "This is an experiment how do special chars and punctuations like tilde or star behave in ACE? science",
"chunk_str": "<This> <is> <an experiment> <how> <special chars and punctuations> <like> <tilde or star> <behave> <in> <ACE> <science> ."

Here, :, (, ), # seem to be culprits. For some reason, <do> also disappears in both examples.

Would you have a complete list of such characters? I'm trying to create some kind of preprocessing module that would strip input sentences of them.

Much thanks!

Can a different ModelTrainer be used?

[image]
[image]

I was wondering if it is possible to use a different trainer like this for an NER-specific task.
If it's possible, how should I proceed with it?

[image]
Is there any provision to simply change it here?
[image]

Implementation of Abstract Extraction

Thank you for your awesome work, but after looking at part of the code I realized the training code for abstract extraction is currently on the "rejected list". Could you explain a little how you obtained the results for the SentEval dataset? I am trying to reproduce the results and hopefully use them for my own project.

Sincerely.

ERROR: No matching distribution found for nltk==7.1.2

There is an error when running pip install -r requirements.txt.

Environment: Baidu PaddlePaddle, Python 3.7

ERROR: Ignored the following versions that require a different python version: 0.2.0 Requires-Python ==3.6; 1.10.0rc1 Requires-Python <3.12,>=3.8; 1.22.0 Requires-Python >=3.8; 1.22.0rc1 Requires-Python >=3.8; 1.22.0rc2 Requires-Python >=3.8; 1.22.0rc3 Requires-Python >=3.8; 1.22.1 Requires-Python >=3.8; 1.22.2 Requires-Python >=3.8; 1.22.3 Requires-Python >=3.8; 1.22.4 Requires-Python >=3.8; 1.23.0 Requires-Python >=3.8; 1.23.0rc1 Requires-Python >=3.8; 1.23.0rc2 Requires-Python >=3.8; 1.23.0rc3 Requires-Python >=3.8; 1.23.1 Requires-Python >=3.8; 1.23.2 Requires-Python >=3.8; 1.23.3 Requires-Python >=3.8; 1.23.4 Requires-Python >=3.8; 1.23.5 Requires-Python >=3.8; 1.24.0rc1 Requires-Python >=3.8; 1.24.0rc2 Requires-Python >=3.8; 1.8.0 Requires-Python >=3.8,<3.11; 1.8.0rc1 Requires-Python >=3.8,<3.11; 1.8.0rc2 Requires-Python >=3.8,<3.11; 1.8.0rc3 Requires-Python >=3.8,<3.11; 1.8.0rc4 Requires-Python >=3.8,<3.11; 1.8.1 Requires-Python >=3.8,<3.11; 1.9.0 Requires-Python >=3.8,<3.12; 1.9.0rc1 Requires-Python >=3.8,<3.12; 1.9.0rc2 Requires-Python >=3.8,<3.12; 1.9.0rc3 Requires-Python >=3.8,<3.12; 1.9.1 Requires-Python >=3.8,<3.12; 1.9.2 Requires-Python >=3.8; 1.9.3 Requires-Python >=3.8; 3.6.0 Requires-Python >=3.8; 3.6.0rc1 Requires-Python >=3.8; 3.6.0rc2 Requires-Python >=3.8; 3.6.1 Requires-Python >=3.8; 3.6.2 Requires-Python >=3.8; 8.0.0 Requires-Python >=3.8; 8.0.0a1 Requires-Python >=3.8; 8.0.0b1 Requires-Python >=3.8; 8.0.0rc1 Requires-Python >=3.8; 8.0.1 Requires-Python >=3.8; 8.1.0 Requires-Python >=3.8; 8.1.1 Requires-Python >=3.8; 8.2.0 Requires-Python >=3.8; 8.3.0 Requires-Python >=3.8; 8.4.0 Requires-Python >=3.8; 8.5.0 Requires-Python >=3.8; 8.6.0 Requires-Python >=3.8; 8.7.0 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement nltk==7.1.2 (from versions: 2.0b8.macosx-10.5-i386, 2.0.1rc1.macosx-10.6-x86_64, 2.0.1rc2-git, 2.0b4, 2.0b5, 2.0b6, 2.0b7, 2.0b8, 2.0b9, 2.0.1rc1, 2.0.1rc3, 2.0.1rc4, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 3.0.0b1, 3.0.0b2, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1, 3.2, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.3.0, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.4, 3.4.5, 3.5b1, 3.5, 3.6, 3.6.1, 3.6.2, 3.6.3, 3.6.4, 3.6.5, 3.6.6, 3.6.7, 3.7, 3.8)
ERROR: No matching distribution found for nltk==7.1.2

Conflicts when installing requirements.txt

Can anyone confirm if it is possible to run the command:

pip install -r requirements.txt

and face zero conflicts?

I am trying to run it in a newly created conda env and I face two conflicts:

  1. on mkl-fft==1.0.6 and
  2. on numpy==1.15.1

Thanks in advance

Error with document-level model for NER task

Hello,

First of all, I would like to thank you for making this work available :)

I'm having an issue trying to replicate experiments documented here: https://github.com/Alibaba-NLP/ACE/blob/main/resources/docs/named_entity_recognition.md

I had success replicating the tutorial for the Sentence-level Model for CoNLL 2003 English. However, when I try to do the same for the Document-level Model, I get an error:

(ace) azeredo@ix-ws28:~/test_ace/test_recent_ace/ACE$ CUDA_VISIBLE_DEVICES=0 python train.py --config /home/azeredo/test_ace/test_recent_ace/ACE/config/xlnet-doc-en-ner-finetune.yaml
/home/azeredo/test_ace/test_recent_ace/ACE/flair/utils/params.py:104: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  dict_merge.dict_merge(params_dict, yaml.load(f))
2021-10-21 15:51:47,652 Reading data from /home/azeredo/.flair/datasets/conll_03_english
2021-10-21 15:51:47,652 Train: /home/azeredo/.flair/datasets/conll_03_english/eng.train
2021-10-21 15:51:47,652 Dev: /home/azeredo/.flair/datasets/conll_03_english/eng.testa
2021-10-21 15:51:47,652 Test: /home/azeredo/.flair/datasets/conll_03_english/eng.testb
2021-10-21 15:51:53,054 {b'<unk>': 0, b'O': 1, b'B-PER': 2, b'E-PER': 3, b'S-LOC': 4, b'B-MISC': 5, b'I-MISC': 6, b'E-MISC': 7, b'S-MISC': 8, b'S-PER': 9, b'B-ORG': 10, b'E-ORG': 11, b'S-ORG': 12, b'I-ORG': 13, b'B-LOC': 14, b'E-LOC': 15, b'I-PER': 16, b'I-LOC': 17, b'<START>': 18, b'<STOP>': 19}
2021-10-21 15:51:53,054 Corpus: 14987 train + 3466 dev + 3684 test sentences
[2021-10-21 15:51:53,562 INFO] https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-large-cased-config.json not found in cache or force_download set to True, downloading to /home/azeredo/.cache/torch/transformers/tmpdwnsntaa
Downloading: 100%|███████████████████████████████████████| 761/761 [00:00<00:00, 391kB/s]
[2021-10-21 15:51:54,051 INFO] storing https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-large-cased-config.json in cache at /home/azeredo/.cache/torch/transformers/df92a75c0ebbeb195065fe16fafa54ccd72e8362692cca884303a56788bd4bfc.0163e810fe4bdef52282bd9ddcbded8accbaa97a3ea7d89737ee7ce87511c587
[2021-10-21 15:51:54,051 INFO] creating metadata file for /home/azeredo/.cache/torch/transformers/df92a75c0ebbeb195065fe16fafa54ccd72e8362692cca884303a56788bd4bfc.0163e810fe4bdef52282bd9ddcbded8accbaa97a3ea7d89737ee7ce87511c587
[2021-10-21 15:51:54,052 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-large-cased-config.json from cache at /home/azeredo/.cache/torch/transformers/df92a75c0ebbeb195065fe16fafa54ccd72e8362692cca884303a56788bd4bfc.0163e810fe4bdef52282bd9ddcbded8accbaa97a3ea7d89737ee7ce87511c587
[2021-10-21 15:51:54,052 INFO] Model config XLNetConfig {
  "architectures": [
    "XLNetLMHeadModel"
  ],
  "attn_type": "bi",
  "bi_data": false,
  "bos_token_id": 1,
  "clamp_len": -1,
  "d_head": 64,
  "d_inner": 4096,
  "d_model": 1024,
  "dropout": 0.1,
  "end_n_top": 5,
  "eos_token_id": 2,
  "ff_activation": "gelu",
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-12,
  "mem_len": null,
  "model_type": "xlnet",
  "n_head": 16,
  "n_layer": 24,
  "pad_token_id": 5,
  "reuse_len": null,
  "same_length": false,
  "start_n_top": 5,
  "summary_activation": "tanh",
  "summary_last_dropout": 0.1,
  "summary_type": "last",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 250
    }
  },
  "untie_r": true,
  "vocab_size": 32000
}

[2021-10-21 15:51:54,532 INFO] https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-large-cased-spiece.model not found in cache or force_download set to True, downloading to /home/azeredo/.cache/torch/transformers/tmp6pula_yo
Downloading: 100%|████████████████████████████████████| 798k/798k [00:00<00:00, 1.48MB/s]
[2021-10-21 15:51:55,568 INFO] storing https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-large-cased-spiece.model in cache at /home/azeredo/.cache/torch/transformers/5b125ba222ff82664771f63cd8fac9696c24b403fc1ab720d537fe2ceaaf0576.8b10bd978b5d01c21303cc761fc9ecd464419b3bf921864a355ba807cfbfafa8
[2021-10-21 15:51:55,568 INFO] creating metadata file for /home/azeredo/.cache/torch/transformers/5b125ba222ff82664771f63cd8fac9696c24b403fc1ab720d537fe2ceaaf0576.8b10bd978b5d01c21303cc761fc9ecd464419b3bf921864a355ba807cfbfafa8
[2021-10-21 15:51:55,568 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-large-cased-spiece.model from cache at /home/azeredo/.cache/torch/transformers/5b125ba222ff82664771f63cd8fac9696c24b403fc1ab720d537fe2ceaaf0576.8b10bd978b5d01c21303cc761fc9ecd464419b3bf921864a355ba807cfbfafa8
[2021-10-21 15:51:56,047 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/xlnet-large-cased-config.json from cache at /home/azeredo/.cache/torch/transformers/df92a75c0ebbeb195065fe16fafa54ccd72e8362692cca884303a56788bd4bfc.0163e810fe4bdef52282bd9ddcbded8accbaa97a3ea7d89737ee7ce87511c587
[2021-10-21 15:51:56,048 INFO] Model config XLNetConfig {
  "architectures": [
    "XLNetLMHeadModel"
  ],
  "attn_type": "bi",
  "bi_data": false,
  "bos_token_id": 1,
  "clamp_len": -1,
  "d_head": 64,
  "d_inner": 4096,
  "d_model": 1024,
  "dropout": 0.1,
  "end_n_top": 5,
  "eos_token_id": 2,
  "ff_activation": "gelu",
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-12,
  "mem_len": null,
  "model_type": "xlnet",
  "n_head": 16,
  "n_layer": 24,
  "output_hidden_states": true,
  "pad_token_id": 5,
  "reuse_len": null,
  "same_length": false,
  "start_n_top": 5,
  "summary_activation": "tanh",
  "summary_last_dropout": 0.1,
  "summary_type": "last",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 250
    }
  },
  "untie_r": true,
  "vocab_size": 32000
}

[2021-10-21 15:51:56,274 INFO] https://cdn.huggingface.co/xlnet-large-cased-pytorch_model.bin not found in cache or force_download set to True, downloading to /home/azeredo/.cache/torch/transformers/tmpbnvrd8_z
Downloading: 100%|██████████████████████████████████| 1.44G/1.44G [00:31<00:00, 46.2MB/s]
[2021-10-21 15:52:27,536 INFO] storing https://cdn.huggingface.co/xlnet-large-cased-pytorch_model.bin in cache at /home/azeredo/.cache/torch/transformers/7fc554c19ef7bc74f1f74603c10156d751d2f99b09e8e38f91ed88e8c9ec6294.db8dc8babedbb75a56c36fca3e02b016e19fd682e79fb1a928e03c2df977cace
[2021-10-21 15:52:27,536 INFO] creating metadata file for /home/azeredo/.cache/torch/transformers/7fc554c19ef7bc74f1f74603c10156d751d2f99b09e8e38f91ed88e8c9ec6294.db8dc8babedbb75a56c36fca3e02b016e19fd682e79fb1a928e03c2df977cace
[2021-10-21 15:52:27,537 INFO] loading weights file https://cdn.huggingface.co/xlnet-large-cased-pytorch_model.bin from cache at /home/azeredo/.cache/torch/transformers/7fc554c19ef7bc74f1f74603c10156d751d2f99b09e8e38f91ed88e8c9ec6294.db8dc8babedbb75a56c36fca3e02b016e19fd682e79fb1a928e03c2df977cace
[2021-10-21 15:52:32,104 INFO] All model checkpoint weights were used when initializing XLNetModel.

[2021-10-21 15:52:32,104 INFO] All the weights of XLNetModel were initialized from the model checkpoint at xlnet-large-cased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use XLNetModel for predictions without further training.
2021-10-21 15:52:33,874 Model Size: 360289300
Traceback (most recent call last):
  File "train.py", line 121, in <module>
    trainer: trainer_func = trainer_func(student, None, corpus, config=config.config, **config.config[trainer_name], is_test=args.test)
TypeError: __init__() got an unexpected keyword argument 'train_with_doc'

This happens if I try to use any of the available pre-defined configuration files: xlnet-doc-en-ner-finetune.yaml, xlmr-doc-en-ner-finetune.yaml or roberta-doc-en-ner-finetune.yaml

Thank you!

Could you provide a Google Colab notebook?

Hello,
I am a bit of a beginner, but I am very interested in implementing your method for aspect extraction in a Google Colab or Kaggle notebook.
I hope you could provide one, or share a link if such a notebook already exists.

Thank you in advance...

NER prediction

Hello again,
I am trying to run a prediction.
Let's say I have a txt file with a paragraph that I want to annotate with entities (B-LOC, I-LOC etc), how can I do this?
I have already set up your pretrained model and I successfully ran the test command:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test
I am a little lost with the documentation, so I'd appreciate any help.
Thanks in advance!

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED

The command I ran is:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test
The configuration file has not been modified, but I get RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED.

Traceback (most recent call last):
  File "train.py", line 87, in <module>
    student=config.create_student(nocrf=args.nocrf)
  File "/home/gly/python_workspace/ACE/flair/config_parser.py", line 235, in create_student
    return self.create_model(self.config,pretrained=self.load_pretrained(self.config), is_student=True)
  File "/home/gly/python_workspace/ACE/flair/config_parser.py", line 188, in create_model
    embeddings, word_map, char_map, lemma_map, postag_map=self.create_embeddings(config['embeddings'])
  File "/home/gly/python_workspace/ACE/flair/config_parser.py", line 163, in create_embeddings
    embedding_list.append(getattr(Embeddings,embedding.split('-')[0])(**embeddings[embedding]))
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 1181, in __init__
    embedded_dummy = self.embed(dummy_sentence)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 1218, in _add_embeddings_internal
    embeddings = self.ee.embed_batch(sentence_words)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/allennlp/commands/elmo.py", line 255, in embed_batch
    embeddings, mask = self.batch_to_embeddings(batch)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/allennlp/commands/elmo.py", line 197, in batch_to_embeddings
    bilm_output = self.elmo_bilm(character_ids)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/allennlp/modules/elmo.py", line 607, in forward
    token_embedding = self._token_embedder(inputs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/allennlp/modules/elmo.py", line 376, in forward
    convolved = conv(character_embedding)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

These are my CUDA and torch versions; my Python version is 3.7.4.
(screenshot)

I tried disabling cuDNN in train.py:

import torch
torch.backends.cudnn.enabled = False

and then I got this error instead:
(screenshot)

Traceback (most recent call last):
  File "train.py", line 88, in <module>
    student=config.create_student(nocrf=args.nocrf)
  File "/home/gly/python_workspace/ACE/flair/config_parser.py", line 235, in create_student
    return self.create_model(self.config,pretrained=self.load_pretrained(self.config), is_student=True)
  File "/home/gly/python_workspace/ACE/flair/config_parser.py", line 188, in create_model
    embeddings, word_map, char_map, lemma_map, postag_map=self.create_embeddings(config['embeddings'])
  File "/home/gly/python_workspace/ACE/flair/config_parser.py", line 163, in create_embeddings
    embedding_list.append(getattr(Embeddings,embedding.split('-')[0])(**embeddings[embedding]))
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 1181, in __init__
    embedded_dummy = self.embed(dummy_sentence)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 1218, in _add_embeddings_internal
    embeddings = self.ee.embed_batch(sentence_words)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/allennlp/commands/elmo.py", line 255, in embed_batch
    embeddings, mask = self.batch_to_embeddings(batch)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/allennlp/commands/elmo.py", line 197, in batch_to_embeddings
    bilm_output = self.elmo_bilm(character_ids)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/allennlp/modules/elmo.py", line 607, in forward
    token_embedding = self._token_embedder(inputs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/allennlp/modules/elmo.py", line 376, in forward
    convolved = conv(character_embedding)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

Thanks in advance for your reply!

Incorrect ModelTrainer() arguments in tutorials

Hello,

Thank you for your extensive tutorials. They have been very helpful.

I came across a minor error in the tutorial code, however. The first argument after the model when initializing the ModelTrainer object is a teacher model, rather than the corpus. So lines like trainer: ModelTrainer = ModelTrainer(tagger, corpus) in https://github.com/Alibaba-NLP/ACE/blob/main/resources/docs/EXPERIMENTS.md raise an attribute error, which can be fixed with trainer: ModelTrainer = ModelTrainer(tagger, corpus=corpus).
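For reference, a minimal sketch of the corrected call (assuming tagger and corpus are built exactly as in the tutorial):

# Sketch of the fix described above: pass the corpus by keyword so it is not
# consumed by the positional teacher-model slot. `tagger` and `corpus` are
# assumed to be constructed as in EXPERIMENTS.md.
from flair.trainers import ModelTrainer

trainer: ModelTrainer = ModelTrainer(tagger, corpus=corpus)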

Unable to download sequence tagger model due to updated links at flair

Hello,

First, thank you for your exciting work in concatenating embeddings for use in downstream tasks. I am looking forward to trying it out.

I attempted to download the sequence tagger models, but found that the links included in your fork of flair are outdated: AWS links like https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/models-v0.4 have been replaced with ones under https://nlp.informatik.hu-berlin.de/resources/models.

The flair update is documented here: flairNLP/flair#1831

Much thanks!

Ask about BERT/XLM-R embeddings

Hi, I have read your interesting paper and code. My question is: since BERT and XLM-R have many layers, what kind of embeddings do you use? Just the word embedding, or a mixture of intermediate layer representations? Did you find a difference between these options?
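For concreteness, here is roughly what I mean by the two options; this is only an illustration with the transformers library (and assumes a recent transformers API), not your implementation:

# Illustration only: last-layer embeddings vs. a (here uniform) mixture of all
# intermediate layer representations.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)

inputs = tokenizer("ACE concatenates embeddings .", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states   # embedding layer + one tensor per layer

last_layer = hidden_states[-1]                       # option 1: last layer only
mixture = torch.stack(hidden_states, dim=0).mean(0)  # option 2: average over all layers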
Thanks!

About datasets

Excuse me

I want to know which files should be downloaded from this website: https://www.clips.uantwerpen.be/conll2003/ner/

And where should these datasets be placed?
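For reference, the training logs elsewhere on this page read the splits from ~/.flair/datasets/conll_03_english/eng.train, eng.testa and eng.testb, so that is the layout I tried to match (the directory and file names may differ per config); a small check, just for illustration:

# Quick layout check, based on the dataset paths that appear in the training logs
# on this page; the directory name itself may differ depending on the config.
from pathlib import Path

data_dir = Path.home() / ".flair" / "datasets" / "conll_03_english"
for split in ("eng.train", "eng.testa", "eng.testb"):
    path = data_dir / split
    print(path, "found" if path.exists() else "MISSING")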

I tried it myself, but it reported an error.

I changed the cache_root because I am on Windows 10.
(screenshot)

and put the dataset there.

(screenshot)

but it reported this error:

(screenshot)

Thanks!

Problem installing requirements mkl-fft==1.0.6 and mkl-random==1.0.1

Hello,
I have a problem installing mkl-fft and mkl-random. I get the following error messages:

ERROR: Could not find a version that satisfies the requirement mkl-fft==1.0.6
ERROR: No matching distribution found for mkl-fft==1.0.6

ERROR: Could not find a version that satisfies the requirement mkl-random==1.0.1
ERROR: No matching distribution found for mkl-random==1.0.1

I tried to install them separately but it didn't work for me.
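A workaround I am considering (just a guess on my part: these mkl pins look like they came from a conda environment export, and pip may simply not find those exact versions) is to filter them out before installing:

# Sketch only: write a copy of requirements.txt without the mkl pins, then
# install from the filtered file with pip. Whether the remaining pins resolve
# on your platform is not guaranteed.
with open("requirements.txt") as src:
    kept = [line for line in src if not line.startswith(("mkl-fft", "mkl-random"))]

with open("requirements-nomkl.txt", "w") as dst:
    dst.writelines(kept)

# then: pip install -r requirements-nomkl.txt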

Thank you!

Loading pre-trained model with set vocab file

When loading a pre-trained model, I get stuck on the loading of a vocabulary file whose location is stored as an absolute path inside the model.
How can I fix this without recreating that absolute path on my own machine?

import sys
import torch 
sys.path.append("/home/jordy/code/SP-calibration-NER/ACE")
import flair


path = "/home/jordy/code/SP-calibration-NER/ACE/resources/taggers/en-xlmr-tuned-first_elmo_bert-old-four_multi-bert-four_word-glove_word_origflair_mflair_char_30episode_150epoch_32batch_0.1lr_800hidden_en_monolingual_crf_fast_reinforce_freeze_norelearn_sentbatch_0.5discount_0.9momentum_5patience_nodev_newner5/best-model.pt"
model = torch.load(path, map_location=torch.device('cpu'))

ERROR:
"""
loader.py
Traceback (most recent call last):
  File "loader.py", line 11, in <module>
    model = torch.load(path, map_location=torch.device('cpu'))
  File "/home/jordy/.virtualenvs/ACE/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/jordy/.virtualenvs/ACE/lib/python3.6/site-packages/torch/serialization.py", line 613, in _load
    result = unpickler.load()
  File "/home/jordy/.virtualenvs/ACE/lib/python3.6/site-packages/transformers/tokenization_xlm_roberta.py", line 175, in __setstate__
    self.sp_model.Load(self.vocab_file)
  File "/home/jordy/.virtualenvs/ACE/lib/python3.6/site-packages/sentencepiece.py", line 118, in Load
    return _sentencepiece.SentencePieceProcessor_Load(self, filename)
OSError: Not found: "/home/yongjiang.jy/.flair/embeddings/xlm-roberta-large-finetuned-conll03-english/sentencepiece.bpe.model": No such file or directory Error #2
"""

Predicting sequence tags and attributes of 'Sentence' object

Hi,

I have been trying to use an ACE model to perform chunking predictions. I understand that I can use the --parse flag, and while that command works, I also want to be able to perform predictions on single sentences, using something along the lines of SequenceTagger.predict in models/sequence_tagger_model.py. However, I run into attribute errors when running it, because lines 630-640 in embeddings.py reference attributes of sentences: List[Sentence], such as max_sent_len and char_seqs, that do not exist.

If SequenceTagger.predict is deprecated, is it possible to make predictions on sentences whose gold sequence labels are unknown? It's my understanding that using the --parse flag requires gold labels to be included in the parse file as well.
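As a stopgap I am considering writing dummy "O" labels for unlabeled text, under my assumption that --parse accepts the usual one-token-per-line column format and ignores the dummy gold tags:

# Sketch: write unlabeled sentences in a CoNLL-style column format with dummy "O"
# tags, one token per line and a blank line between sentences. Whether --parse
# ignores these dummy gold labels is an assumption on my part.
sentences = [
    ["Western", "Area", "Power", "Administration"],
    ["It", "operates", "in", "California", "."],
]

with open("to_parse.txt", "w") as f:
    for sentence in sentences:
        for token in sentence:
            f.write(f"{token} O\n")
        f.write("\n")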

Thanks in advance for your help!

Displaying metrics from results on CoNLL

Hi there,

So I believe I successfully managed to run your best model on CoNLL; however, I was wondering how I can go about getting the actual evaluation metrics, e.g. precision, recall, and F1?

The current output that I have when running python train.py --config config/conll_03_english.yaml --test can be seen below:

ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
/content/ACE/flair/utils/params.py:104: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  dict_merge.dict_merge(params_dict, yaml.load(f))
2021-07-07 10:38:05,848 Reading data from /root/.flair/datasets/conll_03
2021-07-07 10:38:05,848 Train: /root/.flair/datasets/conll_03/train.txt
2021-07-07 10:38:05,848 Dev: /root/.flair/datasets/conll_03/testa.txt
2021-07-07 10:38:05,848 Test: /root/.flair/datasets/conll_03/testb.txt
2021-07-07 10:38:13,533 {b'<unk>': 0, b'O': 1, b'S-ORG': 2, b'S-MISC': 3, b'B-PER': 4, b'E-PER': 5, b'S-LOC': 6, b'B-ORG': 7, b'E-ORG': 8, b'I-PER': 9, b'S-PER': 10, b'B-MISC': 11, b'I-MISC': 12, b'E-MISC': 13, b'I-ORG': 14, b'B-LOC': 15, b'E-LOC': 16, b'I-LOC': 17, b'<START>': 18, b'<STOP>': 19}
2021-07-07 10:38:13,533 Corpus: 14987 train + 3466 dev + 3684 test sentences
[2021-07-07 10:38:13,922 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-config.json from cache at /root/.cache/torch/transformers/b945b69218e98b3e2c95acf911789741307dec43c698d35fad11c1ae28bda352.9da767be51e1327499df13488672789394e2ca38b877837e52618a67d7002391
[2021-07-07 10:38:13,922 INFO] Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 28996
}

[2021-07-07 10:38:14,289 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt from cache at /root/.cache/torch/transformers/5e8a2b4893d13790ed4150ca1906be5f7a03d6c4ddf62296c383f6db42814db2.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1
[2021-07-07 10:38:14,680 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-config.json from cache at /root/.cache/torch/transformers/b945b69218e98b3e2c95acf911789741307dec43c698d35fad11c1ae28bda352.9da767be51e1327499df13488672789394e2ca38b877837e52618a67d7002391
[2021-07-07 10:38:14,680 INFO] Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 28996
}

[2021-07-07 10:38:14,753 INFO] loading weights file https://cdn.huggingface.co/bert-base-cased-pytorch_model.bin from cache at /root/.cache/torch/transformers/d8f11f061e407be64c4d5d7867ee61d1465263e24085cfa26abf183fdc830569.3fadbea36527ae472139fe84cddaa65454d7429f12d543d80bfc3ad70de55ac2
[2021-07-07 10:38:28,899 INFO] All model checkpoint weights were used when initializing BertModel.

[2021-07-07 10:38:28,899 INFO] All the weights of BertModel were initialized from the model checkpoint at bert-base-cased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use BertModel for predictions without further training.
[2021-07-07 10:38:37,167 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-config.json from cache at /root/.cache/torch/transformers/45629519f3117b89d89fd9c740073d8e4c1f0a70f9842476185100a8afe715d1.65df3cef028a0c91a7b059e4c404a975ebe6843c71267b67019c0e9cfa8a88f0
[2021-07-07 10:38:37,168 INFO] Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 119547
}

[2021-07-07 10:38:37,520 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /root/.cache/torch/transformers/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
[2021-07-07 10:38:38,091 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-config.json from cache at /root/.cache/torch/transformers/45629519f3117b89d89fd9c740073d8e4c1f0a70f9842476185100a8afe715d1.65df3cef028a0c91a7b059e4c404a975ebe6843c71267b67019c0e9cfa8a88f0
[2021-07-07 10:38:38,091 INFO] Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 119547
}

[2021-07-07 10:38:38,161 INFO] loading weights file https://cdn.huggingface.co/bert-base-multilingual-cased-pytorch_model.bin from cache at /root/.cache/torch/transformers/3d1d2b2daef1e2b3ddc2180ddaae8b7a37d5f279babce0068361f71cd548f615.7131dcb754361639a7d5526985f880879c9bfd144b65a0bf50590bddb7de9059
[2021-07-07 10:39:00,902 INFO] All model checkpoint weights were used when initializing BertModel.

[2021-07-07 10:39:00,902 INFO] All the weights of BertModel were initialized from the model checkpoint at bert-base-multilingual-cased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use BertModel for predictions without further training.
[2021-07-07 10:39:04,978 INFO] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
[2021-07-07 10:39:05,768 INFO] instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
[2021-07-07 10:39:05,769 INFO] instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
[2021-07-07 10:39:05,769 INFO] instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
[2021-07-07 10:39:05,770 INFO] instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
[2021-07-07 10:39:05,868 INFO] Initializing ELMo.
[2021-07-07 10:39:33,761 INFO] loading Word2VecKeyedVectors object from /root/.flair/embeddings/glove.gensim
[2021-07-07 10:39:34,377 INFO] loading vectors from /root/.flair/embeddings/glove.gensim.vectors.npy with mmap=None
[2021-07-07 10:39:38,741 INFO] setting ignored attribute vectors_norm to None
[2021-07-07 10:39:38,741 INFO] loaded /root/.flair/embeddings/glove.gensim
[2021-07-07 10:39:39,293 INFO] loading Word2VecKeyedVectors object from /root/.flair/embeddings/en-fasttext-news-300d-1M
[2021-07-07 10:39:43,391 INFO] loading vectors from /root/.flair/embeddings/en-fasttext-news-300d-1M.vectors.npy with mmap=None
tcmalloc: large alloc 1200005120 bytes == 0x55bcac878000 @  0x7f107e69f1e7 0x7f107bec7601 0x7f107bf302b0 0x7f107bf30e4e 0x7f107bfc635a 0x55bc2ab90d54 0x55bc2ab90a50 0x55bc2ac05105 0x55bc2abff4ae 0x55bc2ab923ea 0x55bc2ac0132a 0x55bc2abff4ae 0x55bc2ab923ea 0x55bc2ac0132a 0x55bc2ab9230a 0x55bc2ac0060e 0x55bc2abff4ae 0x55bc2ab92a81 0x55bc2ab92ea1 0x55bc2ac01bb5 0x55bc2abff7ad 0x55bc2ab92a81 0x55bc2ab92ea1 0x55bc2ac01bb5 0x55bc2abff7ad 0x55bc2ab923ea 0x55bc2ac047f0 0x55bc2abff7ad 0x55bc2ab92c9f 0x55bc2abd3d79 0x55bc2abd0cc4
[2021-07-07 10:40:16,461 INFO] setting ignored attribute vectors_norm to None
[2021-07-07 10:40:16,461 INFO] loaded /root/.flair/embeddings/en-fasttext-news-300d-1M
[2021-07-07 10:40:32,540 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-roberta-large-finetuned-conll03-english-config.json from cache at /root/.cache/torch/transformers/4df1826a1128bbf8e81e2d920aace90d7e8a32ca214090f7210822aca0fd67d2.af9bc4ec719428ebc5f7bd9b67c97ee305cad5ba274c764cd193a31529ee3ba6
[2021-07-07 10:40:32,541 INFO] Model config XLMRobertaConfig {
  "_num_labels": 8,
  "architectures": [
    "XLMRobertaForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "B-LOC",
    "1": "B-MISC",
    "2": "B-ORG",
    "3": "I-LOC",
    "4": "I-MISC",
    "5": "I-ORG",
    "6": "I-PER",
    "7": "O"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "B-LOC": 0,
    "B-MISC": 1,
    "B-ORG": 2,
    "I-LOC": 3,
    "I-MISC": 4,
    "I-ORG": 5,
    "I-PER": 6,
    "O": 7
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "output_past": true,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 250002
}

[2021-07-07 10:40:32,901 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-roberta-large-finetuned-conll03-english-sentencepiece.bpe.model from cache at /root/.cache/torch/transformers/431cf95b26928e8ff52fd32e349c1de81e77e39e0827a725feaa4357692901cf.309f0c29486cffc28e1e40a2ab0ac8f500c203fe080b95f820aa9cb58e5b84ed
[2021-07-07 10:40:33,866 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-roberta-large-finetuned-conll03-english-config.json from cache at /root/.cache/torch/transformers/4df1826a1128bbf8e81e2d920aace90d7e8a32ca214090f7210822aca0fd67d2.af9bc4ec719428ebc5f7bd9b67c97ee305cad5ba274c764cd193a31529ee3ba6
[2021-07-07 10:40:33,866 INFO] Model config XLMRobertaConfig {
  "_num_labels": 8,
  "architectures": [
    "XLMRobertaForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "B-LOC",
    "1": "B-MISC",
    "2": "B-ORG",
    "3": "I-LOC",
    "4": "I-MISC",
    "5": "I-ORG",
    "6": "I-PER",
    "7": "O"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "B-LOC": 0,
    "B-MISC": 1,
    "B-ORG": 2,
    "I-LOC": 3,
    "I-MISC": 4,
    "I-ORG": 5,
    "I-PER": 6,
    "O": 7
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "output_hidden_states": true,
  "output_past": true,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 250002
}

[2021-07-07 10:40:33,925 INFO] loading weights file https://cdn.huggingface.co/xlm-roberta-large-finetuned-conll03-english-pytorch_model.bin from cache at /root/.cache/torch/transformers/3a603320849fd5410edf034706443763632c09305bb0fd1f3ba26dcac5ed84b3.437090cbc8148a158bd2b30767652c9e66e4b09430bc0fa2b717028fb6047724
[2021-07-07 10:41:53,332 INFO] All model checkpoint weights were used when initializing XLMRobertaModel.

[2021-07-07 10:41:53,333 INFO] All the weights of XLMRobertaModel were initialized from the model checkpoint at xlm-roberta-large-finetuned-conll03-english.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use XLMRobertaModel for predictions without further training.
2021-07-07 10:41:54,876 Model Size: 1106399156
Corpus: 14987 train + 3466 dev + 3684 test sentences
2021-07-07 10:41:54,910 ----------------------------------------------------------------------------------------------------
2021-07-07 10:41:56,883 loading file resources/taggers/en-xlmr-tuned-first_elmo_bert-old-four_multi-bert-four_word-glove_word_origflair_mflair_char_30episode_150epoch_32batch_0.1lr_800hidden_en_monolingual_crf_fast_reinforce_freeze_norelearn_sentbatch_0.5discount_0.9momentum_5patience_nodev_newner5/best-model.pt
^C
Thank you.

Missing B-ORG but with I-ORG and E-ORG

I wonder if anyone else has hit the situation where B-ORG is missing but I-ORG and E-ORG appear in the output tags.

Moonstone is tagged as I-ORG without a preceding B-ORG. I thought an entity span would always start with a "B-" tag, like an open parenthesis followed by a close parenthesis. Any ideas what may cause this? Thank you.

Mount O B-LOC 0.9758268594741821
St O I-LOC 0.9999212026596069
Louis O E-LOC 0.9446600675582886
Moonstone O I-ORG 0.5030291080474854
Ski O I-ORG 0.9011916518211365
Resort O I-ORG 0.6022846102714539
Ltd O E-ORG 0.9203924536705017

RuntimeError: Found param embeddings.list_embedding_0.model.embeddings.word_embeddings.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.

Hello, sorry to bother you with another question. I enabled mixed precision by adding the use_amp and amp_opt_level arguments in train.py, and then I got the error below. From the error it looks like the embeddings need to be loaded onto the GPU. I later changed the embeddings_storage_mode argument of the train function in ReinforcementTrainer to gpu, but after reading the code I don't think that is the problem, and my GPU does not have that much memory anyway.
I also debugged the model shown in the screenshot below and found that some of the embeddings are not on the GPU. I am not sure whether that is the cause; if all embeddings have to be loaded onto the GPU, the memory would probably not be enough either. Thanks again for taking a look.
(screenshot)

Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : 128.0
Traceback (most recent call last):
  File "/home/gly21/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 547, in train
    self.model, optimizer, opt_level=amp_opt_level, loss_scale=128.0
  File "/home/gly21/.conda/envs/gly_ace/lib/python3.7/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/home/gly21/.conda/envs/gly_ace/lib/python3.7/site-packages/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/home/gly21/.conda/envs/gly_ace/lib/python3.7/site-packages/apex/amp/_initialize.py", line 93, in check_params_fp32
    name, param.type()))
  File "/home/gly21/.conda/envs/gly_ace/lib/python3.7/site-packages/apex/amp/_amp_state.py", line 33, in warn_or_err
python-BaseException
    raise RuntimeError(msg)
RuntimeError: Found param embeddings.list_embedding_0.model.embeddings.word_embeddings.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.

Process finished with exit code 1
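For reference, the end of the traceback already points at the likely fix: the model parameters must be on a CUDA device before amp.initialize is called. The ordering I assume it wants (where exactly this belongs in reinforcement_trainer.py is my guess) would be something like:

# Rough sketch: apex requires the model parameters to be on the GPU *before*
# amp.initialize is called. `model` and `optimizer` stand for the objects built
# earlier in the trainer; their construction is not shown here.
from apex import amp
import flair

model = model.to(flair.device)  # e.g. torch.device("cuda:0")
model, optimizer = amp.initialize(model, optimizer, opt_level="O2", loss_scale=128.0)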

Issue with Replicating results on CoNLl

Hi,

I've been following the instructions on your tutorial to try and replicate your experimental results.

I've installed requirements.txt and also the transformers lib.

Then I downloaded the pretrained model conll_en_ner_model.zip from the OneDrive link you supplied and extracted it into the directory resources/taggers.

Then, running the command you supplied:
(screenshot)

gives me the following result:
(screenshot)

I downloaded and extracted the CoNLL dataset as specified by the link in the error; however, I keep getting the same error. Could you please tell me what I am doing wrong?

Test result comparison without vs with document-level features

Sorry to raise another issue. I observed that ACE+doc performs worse:

ACE+doc (python .\train.py --config .\config\doc_ner_best.yaml --test)
MICRO_AVG: acc 0.8338 - f1-score 0.9094
MACRO_AVG: acc 0.8032 - f1-score 0.8851
LOC tp: 1550 - fp: 135 - fn: 118 - tn: 1550 - precision: 0.9199 - recall: 0.9293 - accuracy: 0.8597 - f1-score: 0.9246
MISC tp: 478 - fp: 95 - fn: 224 - tn: 478 - precision: 0.8342 - recall: 0.6809 - accuracy: 0.5997 - f1-score: 0.7498
ORG tp: 1426 - fp: 99 - fn: 235 - tn: 1426 - precision: 0.9351 - recall: 0.8585 - accuracy: 0.8102 - f1-score: 0.8952
PER tp: 1563 - fp: 40 - fn: 54 - tn: 1563 - precision: 0.9750 - recall: 0.9666 - accuracy: 0.9433 - f1-score: 0.9708

ACE (python .\train.py --config .\config\conll_03_english.yaml --test)
MICRO_AVG: acc 0.8807 - f1-score 0.9366
MACRO_AVG: acc 0.8635 - f1-score 0.9247500000000001
LOC tp: 1580 - fp: 90 - fn: 88 - tn: 1580 - precision: 0.9461 - recall: 0.9472 - accuracy: 0.8987 - f1-score: 0.9466
MISC tp: 606 - fp: 115 - fn: 96 - tn: 606 - precision: 0.8405 - recall: 0.8632 - accuracy: 0.7417 - f1-score: 0.8517
ORG tp: 1561 - fp: 159 - fn: 100 - tn: 1561 - precision: 0.9076 - recall: 0.9398 - accuracy: 0.8577 - f1-score: 0.9234
PER tp: 1575 - fp: 31 - fn: 42 - tn: 1575 - precision: 0.9807 - recall: 0.9740 - accuracy: 0.9557 - f1-score: 0.9773

Did I miss anything? Thanks.

'FastSequenceTagger' object has no attribute 'selection'

Hello,

Thank you for your previous help.

I took your advice and am working with the config files in an attempt to reproduce your results.

When I run CUDA_VISIBLE_DEVICES=0 python3 train.py --config config/conll_03_english.yaml --test, I run into Pdb instances (but that's not a problem), and eventually get an AttributeError:

Traceback (most recent call last):
  File "train.py", line 163, in <module>
    predict_posterior=args.predict_posterior,
  File "/workspace/ACE/flair/trainers/reinforcement_trainer.py", line 1358, in final_test
    for name, module in self.model.named_modules():
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'FastSequenceTagger' object has no attribute 'selection'

And it looks like self.selection initialization has been commented out in https://github.com/Alibaba-NLP/ACE/blob/main/flair/models/sequence_tagger_model.py

I get the same error when I attempt to confirm performance on chunking.

Your help is greatly appreciated.

How many embeddings are concatenated?

My greetings,
Thank you for these ideas and this work.
I would like to know how many embeddings are concatenated (two, three, or four?), whether the number is constant during training, and whether there is an optimal number in your experience.

Questions about reproduction of Aspect Term Extraction

Hi Wang Xinyu, I am a graduate student from NUS. Thanks for sharing such a valuable study and the source code. I am now trying to reproduce the results for aspect term extraction. I followed the Named Entity Recognition instructions and referred to several config files (like bert-en-ner-finetune.yaml, en-bert-extract.yaml, conll_03_english.yaml, etc.), but I still find it difficult to write an executable config file for aspect extraction (targets: ast). The guidance is incomplete and leaves me confused. For example, why do I need to provide a fine-tuned model name such as 'model_name: en-bert_10epoch_32batch_0.00005lr_10000lrrate_en_monolingual_nocrf_fast_sentbatch_sentloss_finetune_saving_nodev_newner4' when I haven't trained a fine-tuned model yet? In addition, some keywords are not explained, such as 'processors' and 'teachers' in 'upos', 'anneal_factor', and 'interpolation'. I tried to find information about these in the flair/datasets.py file, but I am unfamiliar with this code style and found it hard to start. I would appreciate it if you could share instructions or files related to aspect term extraction. Looking forward to your reply.
Best wishes

Different Label Space

What should I do if the data I'm using has a different label space than the data used in your experiments? For example:

PER : Person
LOC : Location
GRP : Group
CORP : Corporation
PROD : Product
CW: Creative Work
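My rough understanding (based only on the standard flair-style API, so it may not apply to ACE's configs) is that the tag dictionary is built from the corpus itself, so data annotated with the labels above should define the label space automatically:

# Illustration only: flair builds the tag dictionary from the corpus, so a corpus
# annotated with PER/LOC/GRP/CORP/PROD/CW yields that label space automatically.
# `corpus` is assumed to be a flair ColumnCorpus loaded from such data.
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")
print(tag_dictionary.idx2item)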
