alibaba-nlp / MultilangStructureKD
[ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
License: Other
Hello,
I read the paper and found it really exciting, so I would like to replicate the model in my own environment.
You mentioned that an alternative is to train the teacher models yourself with the script below:
Script: python train_with_teacher.py --config config/multi_bert_origflair_300epoch_2000batch_1lr_256hidden_de_monolingual_crf_sentloss_10patience_baseline_nodev_ner0.yaml
But I can't find the "multi_bert_origflair_300epoch_2000batch_1lr_256hidden_de_monolingual_crf_sentloss_10patience_baseline_nodev_ner0.yaml" file in the config directory.
Can you guide me on how I can train these teachers?
Regards,
Shiven.
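Since the filename in the README may simply have drifted from what is actually shipped in config/, a fuzzy filename search can locate the intended file. A small illustrative helper (not part of the repo, standard library only); the directory name and target filename are taken from the issue:

```python
import difflib
from pathlib import Path

def closest_configs(target: str, config_dir: str = "config", n: int = 3):
    """Return the config filenames most similar to a missing one.

    Useful when a filename quoted in the README no longer matches what
    is checked in (e.g. a changed learning-rate or index token).
    """
    names = [p.name for p in Path(config_dir).glob("*.yaml")]
    # cutoff=0.0 so we always get the n best candidates, even if weak
    return difflib.get_close_matches(target, names, n=n, cutoff=0.0)
```

For example, calling it with the filename from the README should surface the nearest teacher config that does exist in the repo.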
Hi, I tried to run posterior distillation without M-BERT fine-tuning and got the following error:
Traceback (most recent call last):
File "train_with_teacher.py", line 104, in <module>
teachers=teacher_func()
File "/home/mlej8/projects/MultilangStructureKD/flair/config_parser.py", line 235, in create_teachers_list
config=Params.from_file(filename)
File "/home/mlej8/projects/MultilangStructureKD/flair/utils/params.py", line 102, in from_file
with open(params_file, encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'config_gen/multi_bert_origflair_300epoch_2000batch_1lr_256hidden_de_monolingual_crf_sentloss_10patience_baseline_fast_nodev_ner12.yaml'
I looked around the repo and did not find any config_gen directory. Is it possible that these files weren't uploaded?
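From the traceback, config_gen/ appears to hold per-teacher configs that are generated rather than checked in. A hedged sketch of how such per-language YAMLs could be produced from a template; the field names below are placeholders for illustration, not the repo's actual config schema:

```python
from pathlib import Path
from string import Template

# Hypothetical template: the real teacher configs presumably differ
# only in the language token (de/nl/es/en) and the task index.
TEMPLATE = Template(
    "targets: ner\n"
    "corpus: CONLL_03_$LANG\n"
    "model_name: multi_bert_origflair_${LANG}_teacher\n"
)

def generate_teacher_configs(langs, out_dir="config_gen"):
    """Write one YAML per teacher language into out_dir."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    paths = []
    for i, lang in enumerate(langs):
        p = out / f"teacher_{lang}_ner{i}.yaml"
        p.write_text(TEMPLATE.substitute(LANG=lang))
        paths.append(p)
    return paths
```

If the maintainers confirm the real generation script or schema, that should be preferred over this sketch.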
Hello!
I trained a model on Russian data using your config file, but I can't use it to predict dependencies.
I tried the command below, but it started training the model again:
python3 train_with_teacher.py --config config/xlmr_1000epoch_0.1inter_2000batch_0.002lr_400hidden_ru_monolingual_nocrf_fast_2nd_nodev_enhancedud15.yaml --predict
I also tried importing the model in Python and calling model([sentence]), where sentence is an instance of the Sentence class, but it raised: AttributeError: 'list' object has no attribute 'features'
Passing my data to model(data) as a UniversalDependenciesDataset object did not help either:
File "/home/my_folder/.conda/envs/alibaba-nlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/my_folder/MultilangStructureKD/flair/models/dependency_model.py", line 524, in forward
sentence_tensor = torch.cat([sentences.features[x].to(flair.device) for x in sentences.features],-1)
AttributeError: 'UniversalDependenciesDataset' object has no attribute 'features'
Can you give me any hints on how to use it? Thank you in advance!
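For reference, the traceback shows that forward indexes sentences.features, so it expects the batch object the training loop builds (which carries a features dict of embedding tensors), not a plain list or a raw dataset. A minimal mock of the failure mode; the class and attribute names here are illustrative, not the repo's API:

```python
class MiniBatch:
    """Stand-in for the batch object the trainer builds: it carries a
    `features` dict mapping embedding names to tensors (lists here)."""
    def __init__(self, features):
        self.features = features

def forward(batch):
    # Mirrors dependency_model.py line 524: it indexes batch.features,
    # so a plain list of sentences fails with AttributeError at once.
    return [batch.features[name] for name in batch.features]

# forward(["a sentence"])  -> AttributeError: 'list' object has no attribute 'features'
# forward(MiniBatch({"bert": [0.1, 0.2]}))  -> works
```

In other words, prediction likely has to go through the trainer/evaluation code path that attaches the embedding features, rather than calling the bare model on sentences.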
Hello,
I have successfully trained the German and English teacher models, but I am facing a similar issue with the last two. The training starts, but then gets killed.
python train_with_teacher.py --config config/multi_bert_origflair_300epoch_2000batch_0.1lr_256hidden_nl_monolingual_crf_sentloss_10patience_baseline_nodev_ner1.yaml
python train_with_teacher.py --config config/multi_bert_origflair_300epoch_2000batch_0.1lr_256hidden_es_monolingual_crf_sentloss_10patience_baseline_nodev_ner1.yaml
2020-12-31 15:30:53,359 epoch 1 - iter 0/151 - loss 55.17443848 - samples/sec: 17.41 - decode_sents/sec: 2256.12
Traceback (most recent call last):
File "/home/shivendra/MultilangStructureKD/flair/trainers/distillation_trainer.py", line 396, in train
loss = self.model.forward_loss(student_input)
File "/home/shivendra/MultilangStructureKD/flair/models/sequence_tagger_model.py", line 526, in forward_loss
features = self.forward(data_points)
File "/home/shivendra/MultilangStructureKD/flair/models/sequence_tagger_model.py", line 665, in forward
self.embeddings.embed(sentences)
File "/home/shivendra/MultilangStructureKD/flair/embeddings.py", line 169, in embed
embedding.embed(sentences)
File "/home/shivendra/MultilangStructureKD/flair/embeddings.py", line 90, in embed
self._add_embeddings_internal(sentences)
File "/home/shivendra/MultilangStructureKD/flair/embeddings.py", line 2665, in _add_embeddings_internal
mean = torch.mean(torch.cat(embeddings, dim=0), dim=0)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CUDATensorId, CPUTensorId, VariableTensorId]
/home/shivendra/MultilangStructureKD/flair/trainers/distillation_trainer.py(410)train()
-> torch.nn.utils.clip_grad_norm_(self.model.parameters(), 5.0)
(Pdb)
python train_with_teacher.py --config config/multi_bert_origflair_300epoch_2000batch_0.1lr_256hidden_nl_monolingual_crf_sentloss_10patience_baseline_nodev_ner1.yaml
/home/shivendra/MultilangStructureKD/flair/utils/params.py:104: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
dict_merge.dict_merge(params_dict, yaml.load(f))
2020-12-31 15:30:08,375 Reading data from /home/shivendra/.flair/datasets/conll_03_dutch
2020-12-31 15:30:08,375 Train: /home/shivendra/.flair/datasets/conll_03_dutch/ned.train
2020-12-31 15:30:08,375 Dev: /home/shivendra/.flair/datasets/conll_03_dutch/ned.testa
2020-12-31 15:30:08,375 Test: /home/shivendra/.flair/datasets/conll_03_dutch/ned.testb
2020-12-31 15:30:08,375 UTF-8 can't read: /home/shivendra/.flair/datasets/conll_03_dutch/ned.train ... using "latin-1" instead.
2020-12-31 15:30:11,966 UTF-8 can't read: /home/shivendra/.flair/datasets/conll_03_dutch/ned.testb ... using "latin-1" instead.
2020-12-31 15:30:13,066 UTF-8 can't read: /home/shivendra/.flair/datasets/conll_03_dutch/ned.testa ... using "latin-1" instead.
2020-12-31 15:30:13,783 {b'': 0, b'O': 1, b'B-PER': 2, b'E-PER': 3, b'S-LOC': 4, b'B-MISC': 5, b'I-MISC': 6, b'E-MISC': 7, b'S-MISC': 8, b'S-PER': 9, b'B-ORG': 10, b'E-ORG': 11, b'S-ORG': 12, b'I-ORG': 13, b'B-LOC': 14, b'E-LOC': 15, b'I-PER': 16, b'I-LOC': 17, b'': 18, b'': 19}
2020-12-31 15:30:13,783 Corpus: 15806 train + 2895 dev + 5195 test sentences
/home/shivendra/.conda/envs/strcKD/lib/python3.7/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host '127.0.0.1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
[2020-12-31 15:30:14,043 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /home/shivendra/.cache/torch/pytorch_transformers/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
/home/shivendra/.conda/envs/strcKD/lib/python3.7/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host '127.0.0.1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
[2020-12-31 15:30:14,467 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-config.json from cache at /home/shivendra/.cache/torch/pytorch_transformers/45629519f3117b89d89fd9c740073d8e4c1f0a70f9842476185100a8afe715d1.65df3cef028a0c91a7b059e4c404a975ebe6843c71267b67019c0e9cfa8a88f0
[2020-12-31 15:30:14,468 INFO] Model config {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"finetuning_task": null,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_labels": 2,
"output_attentions": false,
"output_hidden_states": true,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"pruned_heads": {},
"torchscript": false,
"type_vocab_size": 2,
"vocab_size": 119547
}
/home/shivendra/.conda/envs/strcKD/lib/python3.7/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host '127.0.0.1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
[2020-12-31 15:30:14,728 INFO] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-pytorch_model.bin from cache at /home/shivendra/.cache/torch/pytorch_transformers/5b5b80054cd2c95a946a8e0ce0b93f56326dff9fbda6a6c3e02de3c91c918342.7131dcb754361639a7d5526985f880879c9bfd144b65a0bf50590bddb7de9059
[2020-12-31 15:30:25,855 INFO] loading Word2VecKeyedVectors object from /home/shivendra/.flair/embeddings/nl-wiki-fasttext-300d-1M
[2020-12-31 15:30:28,985 INFO] loading vectors from /home/shivendra/.flair/embeddings/nl-wiki-fasttext-300d-1M.vectors.npy with mmap=None
[2020-12-31 15:30:29,405 INFO] setting ignored attribute vectors_norm to None
[2020-12-31 15:30:29,406 INFO] loaded /home/shivendra/.flair/embeddings/nl-wiki-fasttext-300d-1M
2020-12-31 15:30:30,123 ----------------------------------------------------------------------------------------------------
2020-12-31 15:30:30,125 Model: "SequenceTagger(
(embeddings): StackedEmbeddings(
(list_embedding_0): BertEmbeddings(
(model): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(119547, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1)-(11): BertLayer( ... 11 more blocks, identical to layer (0) above ... )
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
)
(list_embedding_1): FlairEmbeddings(
(lm): LanguageModel(
(drop): Dropout(p=0.1, inplace=False)
(encoder): Embedding(7632, 100)
(rnn): LSTM(100, 2048)
(decoder): Linear(in_features=2048, out_features=7632, bias=True)
)
)
(list_embedding_2): FlairEmbeddings(
(lm): LanguageModel(
(drop): Dropout(p=0.1, inplace=False)
(encoder): Embedding(7632, 100)
(rnn): LSTM(100, 2048)
(decoder): Linear(in_features=2048, out_features=7632, bias=True)
)
)
(list_embedding_3): WordEmbeddings('nl')
)
(word_dropout): WordDropout(p=0.05)
(locked_dropout): LockedDropout(p=0.5)
(embedding2nn): Linear(in_features=5164, out_features=5164, bias=True)
(rnn): LSTM(5164, 256, bidirectional=True)
(linear): Linear(in_features=512, out_features=20, bias=True)
)"
2020-12-31 15:30:30,126 ----------------------------------------------------------------------------------------------------
2020-12-31 15:30:30,126 Corpus: "Corpus: 15806 train + 2895 dev + 5195 test sentences"
2020-12-31 15:30:30,126 ----------------------------------------------------------------------------------------------------
2020-12-31 15:30:30,126 Parameters:
2020-12-31 15:30:30,126 - learning_rate: "0.1"
2020-12-31 15:30:30,126 - mini_batch_size: "2000"
2020-12-31 15:30:30,126 - patience: "10"
2020-12-31 15:30:30,126 - anneal_factor: "0.5"
2020-12-31 15:30:30,126 - max_epochs: "300"
2020-12-31 15:30:30,126 - shuffle: "True"
2020-12-31 15:30:30,127 - train_with_dev: "False"
2020-12-31 15:30:30,127 ----------------------------------------------------------------------------------------------------
2020-12-31 15:30:30,127 Model training base path: "resources/taggers/multi_bert_origflair_300epoch_2000batch_1lr_256hidden_nl_monolingual_crf_sentloss_10patience_baseline_nodev_ner1"
2020-12-31 15:30:30,127 ----------------------------------------------------------------------------------------------------
2020-12-31 15:30:30,127 Device: cuda:0
2020-12-31 15:30:30,127 ----------------------------------------------------------------------------------------------------
2020-12-31 15:30:30,127 Embeddings storage mode: cpu
/home/shivendra/.conda/envs/strcKD/lib/python3.7/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host '127.0.0.1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
[2020-12-31 15:30:30,444 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /home/shivendra/.cache/torch/pytorch_transformers/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
/home/shivendra/.conda/envs/strcKD/lib/python3.7/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host '127.0.0.1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
[2020-12-31 15:30:42,174 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /home/shivendra/.cache/torch/pytorch_transformers/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
/home/shivendra/.conda/envs/strcKD/lib/python3.7/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host '127.0.0.1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
[2020-12-31 15:30:44,755 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /home/shivendra/.cache/torch/pytorch_transformers/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
2020-12-31 15:30:48,764 ----------------------------------------------------------------------------------------------------
2020-12-31 15:30:53,359 epoch 1 - iter 0/151 - loss 55.17443848 - samples/sec: 17.41 - decode_sents/sec: 2256.12
(the run then crashes with the same RuntimeError and traceback shown earlier in this post)
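The crash originates in the mean-pooling of subtoken embeddings: torch.cat at flair/embeddings.py line 2665 is handed an empty list, meaning some token produced no subtoken embeddings at all (plausibly a stray character introduced by the latin-1 fallback decoding of the Dutch/Spanish CoNLL data). A hedged sketch of the guard, using plain lists in place of tensors; the actual fix in the repo may differ:

```python
def mean_pool(subtoken_embeddings, dim=768):
    """Average subtoken vectors into one token vector, guarding the
    empty case that crashes torch.cat in flair/embeddings.py."""
    if not subtoken_embeddings:
        # A token with zero subtokens (e.g. a stray latin-1 control
        # character) would otherwise make the concatenation fail;
        # fall back to a zero vector of the embedding width.
        return [0.0] * dim
    n = len(subtoken_embeddings)
    # Element-wise mean across subtoken vectors
    return [sum(vals) / n for vals in zip(*subtoken_embeddings)]
```

A quick check of the affected data files for tokens that the BERT tokenizer maps to nothing would confirm whether this is the trigger.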