
parser's Introduction

Hi there 👋

My name is Yu Zhang ([jy tʃɑŋ], 張宇/张宇 in traditional/simplified Chinese).

I am currently a third-year PhD student at HLT@SUDA, advised by Prof. Guohong Fu. I expect to graduate in 2025. Prior to this, I received my M. Eng. degree from Soochow University in 2021.

My early research focused on structured prediction tasks, specifically dependency parsing and constituency parsing. Currently, my research interests have evolved to focus on developing efficient text generation models. I am particularly intrigued by the prospect of developing hardware-efficient methods for linear-time sequence modeling. As a disciple of parallel programming, I am passionate about exploring techniques that harness the power of parallel computing to develop scalable subquadratic models.


parser's People

Contributors

koichiyasuoka, nomalocaris, yzhangcs


parser's Issues

Some bugs

Hello, sorry to bother you again! After making a few changes to your code, I hit the following two errors during execution:

  • corpus.py
Sentence = namedtuple(typename='Sentence',
                      field_names=['ID', 'FORM', 'LEMMA', 'CPOS',
                                   'POS', 'FEATS', 'HEAD', 'DEPREL',
                                   'PHEAD', 'PDEPREL'],
                      defaults=[None]*10)

Adding defaults to the declaration of Sentence above raises the following error at runtime:


Traceback (most recent call last):
  File "run.py", line 5, in <module>
    from parser.cmds import Evaluate, Predict, Train
  File "/data/lxiao/workspace/biaffine-parser/parser/cmds/__init__.py", line 3, in <module>
    from .evaluate import Evaluate
  File "/data/lxiao/workspace/biaffine-parser/parser/cmds/evaluate.py", line 4, in <module>
    from parser.utils import Corpus
  File "/data/lxiao/workspace/biaffine-parser/parser/utils/__init__.py", line 4, in <module>
    from .corpus import Corpus
  File "/data/lxiao/workspace/biaffine-parser/parser/utils/corpus.py", line 10, in <module>
    defaults=[None]*10)
TypeError: namedtuple() got an unexpected keyword argument 'defaults'

  • parser.py
Create the model
BiaffineParser(
  (pretrained): Embedding(355413, 300)
  (embed): Embedding(21118, 300)
  (tag_embed): Embedding(4, 100)
  (embed_dropout): IndependentDropout(p=0.33)
  (lstm): BiLSTM(
    (f_cells): ModuleList(
      (0): LSTMCell(400, 400)
      (1): LSTMCell(800, 400)
      (2): LSTMCell(800, 400)
    )
    (b_cells): ModuleList(
      (0): LSTMCell(400, 400)
      (1): LSTMCell(800, 400)
      (2): LSTMCell(800, 400)
    )
  )
  (lstm_dropout): SharedDropout(p=0.33, batch_first=True)
  (mlp_arc_h): MLP(
    (linear): Linear(in_features=800, out_features=500, bias=True)
    (activation): LeakyReLU(negative_slope=0.1)
    (dropout): SharedDropout(p=0.33, batch_first=True)
  )
  (mlp_arc_d): MLP(
    (linear): Linear(in_features=800, out_features=500, bias=True)
    (activation): LeakyReLU(negative_slope=0.1)
    (dropout): SharedDropout(p=0.33, batch_first=True)
  )
  (mlp_rel_h): MLP(
    (linear): Linear(in_features=800, out_features=100, bias=True)
    (activation): LeakyReLU(negative_slope=0.1)
    (dropout): SharedDropout(p=0.33, batch_first=True)
  )
  (mlp_rel_d): MLP(
    (linear): Linear(in_features=800, out_features=100, bias=True)
    (activation): LeakyReLU(negative_slope=0.1)
    (dropout): SharedDropout(p=0.33, batch_first=True)
  )
  (arc_attn): Biaffine(n_in=500, n_out=1, bias_x=True)
  (rel_attn): Biaffine(n_in=100, n_out=3, bias_x=True, bias_y=True)
)

Traceback (most recent call last):
  File "run.py", line 52, in <module>
    cmd(config)
  File "/data/lxiao/workspace/biaffine-parser/parser/cmds/train.py", line 100, in __call__
    model.train(train_loader)
  File "/data/lxiao/workspace/biaffine-parser/parser/model.py", line 27, in train
    s_arc, s_rel = self.parser(words, tags)
  File "/home/user/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/lxiao/workspace/biaffine-parser/parser/parser.py", line 80, in forward
    x = self.lstm(x)
  File "/home/user/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/lxiao/workspace/biaffine-parser/parser/modules/bilstm.py", line 67, in forward
    x, batch_sizes = x
ValueError: too many values to unpack (expected 2)
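
For reference, the defaults keyword of namedtuple only exists on Python >= 3.7, which is the direct cause of the first error. A minimal sketch of a 3.6-compatible equivalent (an editor's suggestion, not the maintainer's fix) is to set the defaults on __new__ after the class is created:

from collections import namedtuple

# namedtuple(..., defaults=...) requires Python >= 3.7; on 3.6, attach
# the default values to __new__ after creating the class instead.
Sentence = namedtuple(typename='Sentence',
                      field_names=['ID', 'FORM', 'LEMMA', 'CPOS',
                                   'POS', 'FEATS', 'HEAD', 'DEPREL',
                                   'PHEAD', 'PDEPREL'])
Sentence.__new__.__defaults__ = (None,) * 10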

unexpected keyword argument 'defaults'

Running:

python run.py -h

I get:

I0519 11:18:48.137495 139755282601792 file_utils.py:55] TensorFlow version 2.1.0 available.
Traceback (most recent call last):
File "run.py", line 5, in
from parser.cmds import Evaluate, Predict, Train
File "/project/piqasso/tools/biaffine-parser/parser/cmds/init.py", line 3, in
from .evaluate import Evaluate
File "/project/piqasso/tools/biaffine-parser/parser/cmds/evaluate.py", line 5, in
from parser.cmds.cmd import CMD
File "/project/piqasso/tools/biaffine-parser/parser/cmds/cmd.py", line 4, in
from parser.utils import Embedding
File "/project/piqasso/tools/biaffine-parser/parser/utils/init.py", line 3, in
from . import corpus, data, field, fn, metric
File "/project/piqasso/tools/biaffine-parser/parser/utils/corpus.py", line 10, in
defaults=[None]*10)
TypeError: namedtuple() got an unexpected keyword argument 'defaults'

RuntimeError: cuda runtime error (59)

Hello, I ran your code locally and training, validation, and testing all run correctly. But the following error occurs when running on the server (note: my data has 2 label categories):

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=26 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "run.py", line 112, in <module>
    file=args.file)
  File "/home/workspace/dependency-parser/BiaffineAttention/biaffine/trainer.py", line 27, in fit
    self.train(train_loader)
  File "/home/workspace/dependency-parser/BiaffineAttention/biaffine/trainer.py", line 68, in train
    loss.backward()
  File "/home/miniconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/miniconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:26
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed.
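
The assertion t >= 0 && t < n_classes typically means a gold label index falls outside the range of the output layer, which fits the note above about the data having only 2 label categories. A hedged sanity check, run on CPU before training (rels and n_rels are stand-ins for the user's own gold-label tensor and configured class count):

import torch

def check_labels(rels: torch.Tensor, n_rels: int) -> None:
    # CUDA reports out-of-range class indices only via opaque
    # device-side asserts; checking on CPU gives a clear message.
    bad = (rels < 0) | (rels >= n_rels)
    if bad.any():
        raise ValueError(f"{int(bad.sum())} gold labels fall outside [0, {n_rels})")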

Pytorch version problem

Hi, thanks for this work, it helps me a lot.
I was using PyTorch 1.4–1.5 previously and everything ran fine. But when I switch to PyTorch 1.7, it raises a RuntimeError:

'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

Here's full error stack:

File "/my_proj/utils/supar/parsers/biaffine_dependency.py", line 125, in predict
return super().predict(**Config().update(locals()))
File "/my_proj/utils/supar/parsers/parser.py", line 135, in predict
preds = self._predict(dataset.loader)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/my_proj/utils/supar/parsers/biaffine_dependency.py", line 187, in _predict
s_arc, s_rel = self.model(words, feats)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/my_proj/utils/supar/models/dependency.py", line 199, in forward
feat_embed = self.feat_embed(feats)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/my_proj/utils/supar/modules/char_lstm.py", line 66, in forward
x = pack_padded_sequence(x, lens[char_mask], True, False)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 244, in pack_padded_sequence
_VF._pack_padded_sequence(input, lengths, batch_first)

Is there any way to fix it?
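
Since PyTorch 1.7, pack_padded_sequence requires the lengths tensor to live on the CPU even when the inputs are on the GPU. A likely one-line workaround (untested against this exact codebase) is to move the lengths there explicitly:

from torch.nn.utils.rnn import pack_padded_sequence

# PyTorch >= 1.7 insists on a 1D CPU int64 lengths tensor; .cpu()
# satisfies that while x itself can stay on the GPU.
x = pack_padded_sequence(x, lens[char_mask].cpu(), True, False)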

Question about fine-tuning

Hi, thanks for the awesome library. I wonder if it is possible to pretrain the model on one dataset and then fine-tune on another?

Attribute error when calling hasattr()

I ran into the following error when initiating training:

To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Set the max num of threads to 16
Set the seed for generating random numbers to 1
Set the device with ID 0 visible
Override the default configs with parsed arguments
----------------+--------------------------
Param           |           Value          
----------------+--------------------------
bert_model      |      bert-base-cased     
n_embed         |            100           
n_char_embed    |            50            
n_bert_layers   |             4            
embed_dropout   |           0.33           
n_lstm_hidden   |            400           
n_lstm_layers   |             3            
lstm_dropout    |           0.33           
n_mlp_arc       |            500           
n_mlp_rel       |            100           
mlp_dropout     |           0.33           
lr              |           0.002          
mu              |            0.9           
nu              |            0.9           
epsilon         |           1e-12          
clip            |            5.0           
decay           |           0.75           
decay_steps     |           5000           
batch_size      |           5000           
epochs          |            500           
patience        |            100           
min_freq        |             2            
fix_len         |            20            
mode            |           train          
buckets         |            32            
punct           |           False          
ftrain          | /home/jingya/biaffine-parser/data/test_en/train.conllu
fdev            | /home/jingya/biaffine-parser/data/test_en/dev.conllu
ftest           | /home/jingya/biaffine-parser/data/test_en/test.conllu
fembed          | /home/jingya/biaffine-parser/data/glove/glove.6B.100d.txt
unk             |            unk           
conf            |        config.ini        
file            |       exp/ptb.char       
preprocess      |           True           
device          |           cuda           
seed            |             1            
threads         |            16            
tree            |           False          
feat            |           char           
fields          |    exp/ptb.char/fields   
model           |    exp/ptb.char/model    
----------------+--------------------------

Run the subcommand in mode train
Preprocess the data
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/home/jingya/biaffine-parser/parser/cmds/train.py", line 40, in __call__
    super(Train, self).__call__(args)
  File "/home/jingya/biaffine-parser/parser/cmds/cmd.py", line 49, in __call__
    self.WORD.build(train, args.min_freq, embed)
  File "/home/jingya/biaffine-parser/parser/utils/field.py", line 73, in build
    counter = Counter(token for sequence in sequences
  File "/home/weijian/anaconda3/envs/biaffine/lib/python3.7/collections/__init__.py", line 566, in __init__
    self.update(*args, **kwds)
  File "/home/weijian/anaconda3/envs/biaffine/lib/python3.7/collections/__init__.py", line 653, in update
    _count_elements(self, iterable)
  File "/home/jingya/biaffine-parser/parser/utils/field.py", line 73, in <genexpr>
    counter = Counter(token for sequence in sequences
  File "/home/jingya/biaffine-parser/parser/utils/corpus.py", line 59, in __getattr__
    raise AttributeError
AttributeError

After some investigation, it seems the error is caused by __getattr__() in corpus.py:

def __getattr__(self, name):
    if not hasattr(self.sentences[0], name):
        raise AttributeError

When I print out name I get "words", which is set on line 25 of cmd.py. However, self.sentences is a list of Sentence objects, and by definition a Sentence object has no attribute "words"; that is the cause of the AttributeError.

Any idea how I can fix this?

feat_pad_index

The value of args.feat_pad_index is calculated by property Field.pad_index and depends on the presence of attribute vocab.
However, in CMD.__call__() the attribute gets set to:

self.FEAT.vocab = tokenizer.vocab

Therefore the value of args.feat_pad_index must be updated correspondingly:

args.feat_pad_index = self.FEAT.pad_index

or else the value that is saved in the model will be inconsistent.

TypeError: namedtuple() got an unexpected keyword argument 'defaults'

[kyzhang@gpu12 biaffine-parser]$ python run.py train -h
Traceback (most recent call last):
  File "run.py", line 5, in <module>
    from parser.cmds import Evaluate, Predict, Train
  File "/users6/kyzhang/attempts/biaffine-parser/parser/cmds/__init__.py", line 3, in <module>
    from .evaluate import Evaluate
  File "/users6/kyzhang/attempts/biaffine-parser/parser/cmds/evaluate.py", line 5, in <module>
    from parser.cmds.cmd import CMD
  File "/users6/kyzhang/attempts/biaffine-parser/parser/cmds/cmd.py", line 4, in <module>
    from parser.utils import Embedding
  File "/users6/kyzhang/attempts/biaffine-parser/parser/utils/__init__.py", line 3, in <module>
    from . import corpus, data, field, fn, metric
  File "/users6/kyzhang/attempts/biaffine-parser/parser/utils/corpus.py", line 10, in <module>
    defaults=[None]*10)
TypeError: namedtuple() got an unexpected keyword argument 'defaults'

position_ids

A problem arises with transformers 3.1.0, in certain languages (nl and ta).

In method BertEmbeddings.forward() from file:

transformers/modeling_bert.py

position_ids is set as:

    if position_ids is None:
        position_ids = self.position_ids[:, :seq_length]

But it happens that seq_length = 583 while self.position_ids.shape = [1, 512]
therefore position_ids gets truncated and then the following fails:

    embeddings = inputs_embeds + position_embeddings + token_type_embeddings

since inputs_embeds.shape = [1, 593]

This is due to the fact that self.position_embeddings is initialised with
config.max_position_embeddings, which is set to 512.
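
A hedged workaround, assuming access to the piece ids before they reach the encoder, is to clip every sequence to the model's positional capacity; the matching attention mask and token type ids must be clipped the same way, or the shapes will disagree again:

def truncate_to_model(input_ids, max_position_embeddings=512):
    # Sketch only: clip subword ids to the encoder's positional capacity
    # (config.max_position_embeddings, 512 for most BERT checkpoints).
    return input_ids[:, :max_position_embeddings]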

!pip install supar failing on Google Colab notebook

I tried importing supar on Google Colab notebook using the following line of code:

import supar

But it threw a ModuleNotFoundError, so I tried installing it with !pip install supar and got the following error:

ERROR: Could not find a version that satisfies the requirement supar (from versions: none)
ERROR: No matching distribution found for supar

@yzhangcs Can this package be used on Google Colab?

RuntimeError: CUDA out of memory

I am testing the dev branch, using transformers 2.10.0, somewhat successfully.

However it runs out of CUDA memory on the UD_English-EWT treebank:

Epoch 8 / 50000:
Traceback (most recent call last):
  File "run.py", line 61, in <module>
    cmd(args)
  File "/homenfs/tempGPU/iwpt2020/parser/parser/cmds/train.py", line 81, in __call__
    loss, train_metric = self.train(train.loader)
  File "/homenfs/tempGPU/iwpt2020/parser/parser/cmds/cmd.py", line 91, in train
    s_arc, s_rel = self.model(words, feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/parser/parser/model.py", line 92, in forward
    feat_embed = self.feat_embed(feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/parser/parser/modules/bert.py", line 59, in forward
    embed = embed.masked_scatter_(mask.unsqueeze(-1), bert[bert_mask])
RuntimeError: CUDA out of memory. Tried to allocate 2.02 GiB (GPU 0; 14.76 GiB total capacity; 6.04 GiB already allocated; 1.06 GiB free; 12.90 GiB reserved in total by PyTorch)

I thought it was because the treebank is large, but there are larger treebanks on which it works:

wc -l UD_English-EWT/en_ewt-ud-train.conllu
242778 UD_English-EWT/en_ewt-ud-train.conllu
wc -l UD_Italian-ISDT/it_isdt-ud-train.conllu
333822 UD_Italian-ISDT/it_isdt-ud-train.conllu

I tried increasing the buckets to 48, but that did not help.
Decreasing the batch_size to 500 does make it work, though.

The problem occurs also with transformers 2.1.1 on the UD_English-EWT.

Question: Have you tried an ELMo-based version of your code?

Hi, have you tried basing your code on ELMo? I tried to do this, but I ran into some problems:
Traceback (most recent call last):
  File "run.py", line 42, in <module>
    args.func(args)
  File "/home/workspace/elmo_biaffineparser/parser1/commands/train.py", line 111, in __call__
  File "/home/workspace/elmo_biaffineparser/parser1/model.py", line 36, in __call__
    self.train(train_loader, trainwords)
  File "/home/workspace/elmo_biaffineparser/parser1/model.py", line 84, in train
    loss.backward()
  File "/home/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered

drop in performance

I notice a significant performance drop in the release branch relative to the dev branch, using the same configuration with 2 n_lstm_layers and all BERT data (n_feat_embed=0).
Here is an example on the UD Italian corpus.

Dev version:

----------------+--------------------------
Param           |           Value          
----------------+--------------------------
bert_model      | dbmdz/bert-base-italian-xxl-cased
n_embed         |            100           
n_char_embed    |            50            
n_feat_embed    |             0            
n_bert_layers   |             0            
embed_dropout   |           0.33           
n_lstm_hidden   |            400           
n_lstm_layers   |             2            
lstm_dropout    |           0.33           
mix_dropout     |            0.1           
n_mlp_arc       |            500           
n_mlp_rel       |            100           
mlp_dropout     |           0.33           
lr              |           0.002          
mu              |            0.9           
nu              |            0.9           
epsilon         |           1e-12          
clip            |            5.0           
decay           |           0.75           
decay_steps     |           5000           
batch_size      |           5000           
epochs          |           1000           
patience        |            20            
min_freq        |             2            
fix_len         |            20            
mode            |           train          
punct           |           False          
ftrain          | ../train-dev/UD_Italian-ISDT/it_isdt-ud-train.conllu
fdev            | ../train-dev/UD_Italian-ISDT/it_isdt-ud-dev.conllu
ftest           | ../test-turkunlp/it2.conllu
fembed          |                          
lower           |           False          
unk             |           [unk]          
max_sent_length |            512           
conf            |      config-bert.ini     
file            |       exp/it-bert/       
preprocess      |           True           
device          |           cuda           
seed            |             1            
threads         |            16            
tree            |           False          
proj            |           False          
feat            |           bert           
buckets         |            32            
fields          |    exp/it-bert/fields    
model           |     exp/it-bert/model    
n_words         |           13498          
n_feats         |           31102          
n_rels          |            45            
pad_index       |             0            
unk_index       |             1            
bos_index       |             2            
feat_pad_index  |             0            
----------------+--------------------------

....
Epoch 91 / 1000:
train: Loss: 0.2317 UAS: 95.97% LAS: 93.20%
dev:   Loss: 0.3567 UAS: 95.94% LAS: 93.90%
test:  Loss: 0.4502 UAS: 95.14% LAS: 92.84%

0:01:20.747564s elapsed (saved)

Release version:

----------------+--------------------------
Param           |           Value          
----------------+--------------------------
delete          | {'', '.', ':', 'S1', '?', '``', 'TOP', '!', '-NONE-', ',', "''"}
equal           |      {'ADVP': 'PRT'}     
bert            | dbmdz/bert-base-italian-xxl-cased
n_embed         |            100           
n_char_embed    |            50            
n_feat_embed    |             0            
n_bert_layers   |             0            
embed_dropout   |           0.33           
n_lstm_hidden   |            400           
n_lstm_layers   |             2            
lstm_dropout    |           0.33           
mix_dropout     |            0.1           
n_mlp_span      |            500           
n_mlp_arc       |            500           
n_mlp_label     |            100           
n_mlp_sib       |            100           
n_mlp_rel       |            100           
mlp_dropout     |           0.33           
lr              |           0.002          
mu              |            0.9           
nu              |            0.9           
epsilon         |           1e-12          
clip            |            5.0           
decay           |           0.75           
decay_steps     |           5000           
batch_size      |           5000           
epochs          |           1000           
patience        |            20            
min_freq        |             2            
fix_len         |            20            
mode            |           train          
path            |     exp/it-bert/model    
conf            |      config-bert.ini     
device          |           cuda           
seed            |             1            
threads         |            16            
buckets         |            32            
tree            |           False          
proj            |           False          
feat            |           bert           
build           |           False          
punct           |           False          
max_len         |           None           
train           | ../train-dev/UD_Italian-ISDT/it_isdt-ud-train.conllu
dev             | ../train-dev/UD_Italian-ISDT/it_isdt-ud-dev.conllu
test            | ../test-turkunlp/it2.conllu
embed           |                          
unk             |           [unk]          
----------------+--------------------------

2020-06-15 07:52:02 INFO train: 13121 sentences,  68 batches, 32 buckets
2020-06-15 07:52:02 INFO dev:     564 sentences,  32 batches, 32 buckets
2020-06-15 07:52:02 INFO test:    489 sentences,  32 batches, 32 buckets

2020-06-15 07:52:02 INFO BiaffineParserModel(
  (word_embed): Embedding(12876, 100)
  (feat_embed): BertEmbedding(n_layers=12, n_out=768, pad_index=0)
  (embed_dropout): IndependentDropout(p=0.33)
  (lstm): BiLSTM(868, 400, num_layers=2, dropout=0.33)
  (lstm_dropout): SharedDropout(p=0.33, batch_first=True)
  (mlp_arc_d): MLP(n_in=800, n_out=500, dropout=0.33)
  (mlp_arc_h): MLP(n_in=800, n_out=500, dropout=0.33)
  (mlp_rel_d): MLP(n_in=800, n_out=100, dropout=0.33)
  (mlp_rel_h): MLP(n_in=800, n_out=100, dropout=0.33)
  (arc_attn): Biaffine(n_in=500, n_out=1, bias_x=True)
  (rel_attn): Biaffine(n_in=100, n_out=45, bias_x=True, bias_y=True)
  (criterion): CrossEntropyLoss()
)
...
2020-06-15 09:18:19 INFO Epoch 119 / 1000:
2020-06-15 09:18:59 INFO dev:   - loss: 0.6333 - UCM: 46.28% LCM: 29.96% UAS: 91.46% LAS: 87.32%
2020-06-15 09:19:01 INFO test:  - loss: 0.3640 - UCM: 55.42% LCM: 41.10% UAS: 93.57% LAS: 90.32%
2020-06-15 09:19:06 INFO 0:00:41.564757s elapsed (saved)

With 4 n_bert_layers and 100 n_feat_embed, the dev and release branches perform similarly:

n_feat_embed    |            100           
n_bert_layers   |             4            
embed_dropout   |           0.33           
n_lstm_hidden   |            400           
n_lstm_layers   |             3            

it works better:

2020-06-13 20:15:39 INFO Epoch 84 / 1000:
2020-06-13 20:17:08 INFO dev:   - loss: 0.4315 - UCM: 61.70% LCM: 47.87% UAS: 95.80% LAS: 93.46%
2020-06-13 20:17:13 INFO test:  - loss: 0.3368 - UCM: 60.12% LCM: 47.65% UAS: 95.05% LAS: 92.48%
2020-06-13 20:17:18 INFO 0:01:34.633723s elapsed (saved)

which is similar to the dev branch:

n_feat_embed    |            100           
n_bert_layers   |             4            
embed_dropout   |           0.33           
n_lstm_hidden   |            400           
n_lstm_layers   |             3            

Epoch 112 / 1000:

train: Loss: 0.2213 UAS: 96.16% LAS: 93.16%
dev:   Loss: 0.3626 UAS: 95.90% LAS: 93.51%
test:  Loss: 0.4056 UAS: 95.13% LAS: 92.63%

0:01:07.875049s elapsed (saved)

What could be the reason?
You suggested that using all layers and all features from BERT would have been beneficial, and indeed it was in the dev branch.

Things slow down when I use DDP

Hello, thanks for sharing this project. I was trying to train the parser using DDP as shown in the README. Unfortunately, this is twice as slow compared to using a single GPU. On a single GPU, an epoch takes about 25 seconds, whereas with two GPUs (installed on the same machine) it just seems to pause at the start of each new epoch and hence takes about 58 seconds.

I understand that this could possibly not be an issue with your code, but I was wondering if you could provide any pointers. I am using exactly the same style of command as given in the README.

Automatically Deleting Old Temporary Models

Hello,

First of all, thank you very much for your work.
I use your parser a lot lately.

I do have a question, though.
When I train a lot of parsers on very small data sets, I'm left with thousands of saved models.
Is there a way to automatically delete those past models and only keep the latest one?

Thank you very much
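
One hedged way to get this behaviour today, outside the library (the 'exp' path and checkpoint naming are assumptions for illustration):

from pathlib import Path

# Hypothetical cleanup: sort saved checkpoints by modification time and
# keep only the newest one.
ckpts = sorted(Path('exp').glob('**/model*'), key=lambda p: p.stat().st_mtime)
for old in ckpts[:-1]:
    old.unlink()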

kmeans

I get this error when training on the Tamil treebank:

File "/project/piqasso/tools/biaffine-parser/parser/utils/alg.py", line 18, in kmeans
assert len(d) >= k, f"unable to assign {len(d)} datapoints to {k} clusters"
AssertionError: unable to assign 25 datapoints to 32 clusters

With the debugger I found that in the invocation of kmeans(x, k),
with len(x) = 80 and k = 32, at line 10:

d, indices, f = x.unique(return_inverse=True, return_counts=True)

d = tensor([ 6.,  7.,  8., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.,
            21., 22., 23., 24., 26., 27., 32., 33., 35., 45., 51.])
len(d) = 25

f = tensor([4, 1, 1, 5, 1, 7, 6, 4, 8, 8, 2, 2, 1, 8, 5, 2, 1, 2, 3, 3, 1, 2, 1, 1, 1])
len(f) = 25

With other treebanks it works fine.

Thank you for the nice and useful project.
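
A plausible guard (an editor's assumption, not the maintainer's fix) is to cap the number of clusters at the number of distinct sentence lengths before clustering:

# Hypothetical guard in kmeans(x, k): a treebank with only 25 distinct
# sentence lengths cannot support 32 buckets, so shrink k first.
d, indices, f = x.unique(return_inverse=True, return_counts=True)
k = min(k, len(d))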

Apply MST when evaluating or predicting?

Hi, it seems that you don't apply MST when evaluating or predicting.
In parser/model.py, the decoding function directly uses argmax instead of the MST algorithm.
This means it can yield illegal dependency parses at prediction time.
Am I correct? Thank you.
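
For intuition, nothing in a plain argmax decode forces the selected heads to form a single-rooted tree, which is why an MST or Eisner decode (or at least a well-formedness check) is needed. A hedged sketch of such a check, with heads[i-1] holding the head of token i and 0 denoting the root:

def is_well_formed(heads):
    # A valid parse has exactly one root and every token reaches it,
    # i.e. following head pointers never loops.
    n = len(heads)
    if heads.count(0) != 1:
        return False
    for i in range(1, n + 1):
        j, steps = i, 0
        while j != 0 and steps <= n:
            j, steps = heads[j - 1], steps + 1
        if j != 0:  # a cycle never reaches the root
            return False
    return True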

Potential problem with not truncating sequences

When trying to train a Finnish model (using TDT treebank and bert-base-multilingual-uncased), I ran into a cryptic CUDA error:

  ...
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 177, in forward
    embeddings = inputs_embeds + position_embeddings + token_type_embeddings
RuntimeError: CUDA error: device-side assert triggered

While trying to figure out what was wrong and running the script with CUDA_LAUNCH_BLOCKING=1, I found some clues that it was an indexing problem (going out of bounds somewhere). This led me to think it had something to do with BERT's limited max sequence length (512), and sure enough the script stopped crashing when I removed the example longer than that from the training set.

I unfortunately don't have a fix ready but I thought I'd leave a note here in case somebody else runs into this problem.
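
Until proper truncation exists, one hedged stopgap is to drop over-long examples up front; sentences and tokenizer below are assumptions standing in for the user's own data and a transformers tokenizer:

MAX_PIECES = 512  # BERT's positional limit

def filter_long(sentences, tokenizer, max_pieces=MAX_PIECES):
    # Yield only sentences whose subword expansion fits the encoder.
    for sent in sentences:
        if len(tokenizer.tokenize(' '.join(sent))) <= max_pieces:
            yield sent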

CoNLLU annotation

Here is code to preserve the annotations in the CoNLLU format.

In utils/corpus.py change method Corpus.load()

@classmethod
def load(cls, path, fields, max_sent_length=math.inf):
    sentences = []
    fields = [field if field is not None else Field(str(i))
              for i, field in enumerate(fields)]
    with open(path, 'r') as f:
        lines = []
        for line in f:
            line = line.strip()
            if not line:
                sentences.append(Sentence(fields, lines))
                lines = []
            else:
                lines.append(line)

    return cls(fields, sentences)

method Sentence.__init__():

def __init__(self, fields, lines):
    self.annotations = dict()
    values = []
    for i, line in enumerate(lines):
        if line.startswith('#'):
            self.annotations[-i-1] = line
        else:
            value = line.split('\t')
            if value[0].isdigit():
                values.append(value)
                self.annotations[int(value[0])] = ''  # placeholder
            else:
                self.annotations[-i] = line
    for field, value in zip(fields, list(zip(*values))):
        if isinstance(field, Iterable):
            for j in range(len(field)):
                setattr(self, field[j].name, value)
        else:
            setattr(self, field.name, value)
    self.fields = fields

and method Sentence.__repr__():

def __repr__(self):
    merged = {**self.annotations,
              **{i+1: '\t'.join(map(str, line))
                 for i, line in enumerate(zip(*self.values))} }
    return '\n'.join(merged.values()) + '\n'

token

We need to use a defaultdict in biaffine_parser.py

            if hasattr(tokenizer, 'vocab'):
                FEAT.vocab = tokenizer.vocab
            else:
                from collections import defaultdict
                FEAT.vocab = defaultdict(lambda: tokenizer.unk_token_id,
                                         {tokenizer._convert_id_to_token(i): i for i in range(len(tokenizer))})

because I ran into a token that gets split like this:
['▁http', '://', 'www', '.', 'ib', 'rae', '.', 'ac', '.', 'ru', '/', 'ib', 'rae', '/', 'eng', '/', 'cher', 'no', 'by', 'l',
'/', 'nat', '', 're', 'p', '/', 'nat', '', 're', 'pe', '.', 'ht', 'm', '#', '24']

and '_' is not present in the vocabulary.

On the initialisation of word embedding weights

Hi, I noticed that the code sums the pretrained embeddings with a zero-initialised embedding. Isn't this operation equivalent to not using the zero embedding at all?
Since the zero embedding can be regarded as having zero mean and zero variance, is it still necessary for the embedding dropout to multiply the word level by 2 when computing the scale?
Thanks.
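
For context, the pattern in question sums a frozen pretrained table with a trainable, zero-initialised one, so training effectively learns a delta on top of the pretrained vectors. A sketch of the idea (pretrained_weights, n_words, n_embed, ext_words, and words are placeholders mirroring the printed model above, not the exact source):

import torch.nn as nn

# Frozen pretrained vectors plus a trainable zero-initialised table:
# at initialisation the sum equals the pretrained embedding exactly.
pretrained = nn.Embedding.from_pretrained(pretrained_weights, freeze=True)
embed = nn.Embedding(n_words, n_embed)
nn.init.zeros_(embed.weight)

x = pretrained(ext_words) + embed(words)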

Unable to train on custom conllu data

Hi,

I am trying to train a biaffine dependency parser on UD_Russian-SynTagRus corpus. For some reason, training script fails without any warnings or errors. Could you please help me on what could go wrong? I'm trying to run training in Google Colab.

Here's the script I'm using:

!python -m supar.cmds.biaffine_dependency train -b -d 0\
    -p exp/ptb.biaffine.dependency.char/model \
    -f char \
    --embed ft_native_300_ru_wiki_lenta_nltk_wordpunct_tokenize.vec \
    --train ./corpus/_UD/UD_Russian-SynTagRus/ru_syntagrus-ud-train.conllu \
    --dev ./corpus/_UD/UD_Russian-SynTagRus/ru_syntagrus-ud-dev.conllu \
    --test ./corpus/_UD/UD_Russian-SynTagRus/ru_syntagrus-ud-test.conllu

The output is:

2020-07-29 16:20:53.655924: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-29 16:21:01 INFO 
----------------+--------------------------
Param           |           Value          
----------------+--------------------------
tree            |           False          
proj            |           False          
mode            |           train          
path            | exp/ptb.biaffine.dependency.char/model
device          |             0            
seed            |             1            
threads         |            16            
batch_size      |           5000           
feat            |           char           
build           |           True           
punct           |           False          
max_len         |           None           
buckets         |            32            
train           | ./corpus/_UD/UD_Russian-SynTagRus/ru_syntagrus-ud-train.conllu
dev             | ./corpus/_UD/UD_Russian-SynTagRus/ru_syntagrus-ud-dev.conllu
test            | ./corpus/_UD/UD_Russian-SynTagRus/ru_syntagrus-ud-test.conllu
embed           | ft_native_300_ru_wiki_lenta_nltk_wordpunct_tokenize.vec
unk             |            unk           
bert            |      bert-base-cased     
----------------+--------------------------

2020-07-29 16:21:01 INFO Build the fields
^C

The training fails with no errors, so it's hard to see what exactly is wrong.

Also, when trying to use BERT embeddings (bert-base-multilingual-cased), there was an error that the bos_token was not set.

The same failure happens when running supar.cmds.crf_dependency.

Decode with MST algorithm

I agree that using Eisner's algorithm can have better results for projective dependency parsing. If MST is supported, it would be great.

pad_index

The code uses two values for padding: 0, the default, for nn.Embedding such as char and tag feature types, and feat_pad_index for CharLSTM and Bert Embedding.
A value for pad_index is stored as self.pad_index in the model BiaffineDependencyModel as well as saved among the parameters in a model file.
But BERT models use their own padding value, usually different from feat_pad_index and pad_index.
In method BiaffineDependencyModel.forward(), self.pad_index is used to compute a mask:

mask = words.ne(self.pad_index)

but that might not be the right value.
I ran into this problem when using the French model camembert-base, which uses a pad index of 1.
Besides, in my experiments I also do away with the word_embed feature.

What I propose is to compute the mask from feats, using their own pad value, rather than from words:

    word_feats = feats[:,:,0] # drop subpiece dimension                              
    batch_size, seq_len = word_feats.shape
    # get the mask and lengths of given batch                                         
    mask = word_feats.ne(self.feat_embed.pad_index)
    lens = mask.sum(dim=1)

directions in LSTM

In lstm.py, shouldn't the dimensions of h and c depend on the number of directions, as mentioned in the comment?
"h of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial hidden state ...
c of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial cell state"

Hence, it should be:

num_directions = 1 + self.bidirectional
if hx is None:
    ih = x.new_zeros(self.num_layers * num_directions, batch_size, self.hidden_size)
    h, c = ih, ih
else:
    h, c = self.permute_hidden(hx, sequence.sorted_indices)
h = h.view(self.num_layers, num_directions, batch_size, self.hidden_size)
c = c.view(self.num_layers, num_directions, batch_size, self.hidden_size)

integrated tokenizer

A simple change is needed in order to integrate a tokenizer.
In file utils/transform.py, to method CoNLL.__init__(), add the optional parameter

reader=open

and then set

self.reader=reader

and in CoNLL.load(), change it to use it:

    if isinstance(data, str):
        if not hasattr(self, 'reader'): self.reader = open # back compatibility       
        with self.reader(data) as f:
            lines = [line.strip() for line in f]

You can then pass as reader a nltk tokenizer or a Stanza tokenizer.
I use this code to interface to Stanza:

tokenizer.py.txt

config file

Values in a config file which are also CLI arguments are discarded, since class Config overwrites those values from kwargs.

Is this intended?

The alternative is to set their default to argparse.SUPPRESS, like this:

subparser.add_argument('--n-word-embed', default=argparse.SUPPRESS, type=int, help='dimension of embeddings')
subparser.add_argument('--bert', default=argparse.SUPPRESS, help='which bert model to use')

If not specified on the CLI, they will then get their default value from the call to train().

config.ini does not work for n_embed

I changed "n_embed" in config.ini from 100 to 300, but when I run biaffine.dependency with the embedding file "sgns.renmin.word" (whose embedding size is 300), the program reports an error. When I debug, I find that "n_embed" in "models/dependency" is still 100; it has not changed to 300. That means "n_embed" in the program does not follow "config.ini", so I have to change "n_embed" in "models/dependency" to 300 by hand.

Duplicate Name "parser"

I get an error when running run.py

ModuleNotFoundError: No module named 'parser.cmds'; 'parser' is not a package

I suppose the reason is that the package is named "parser" while a file is also named "parser", so the file shadows the package.

AutoModel

What about adding the ability to use any model from HuggingFace, besides BERT?

It is enough to define a subclass of BertEmbedding like this:

from transformers import AutoModel, AutoConfig

class AutoEmbedding(BertEmbedding):

    def __init__(self, model, n_layers, n_out, pad_index=0,
                 requires_grad=False):
        # bypass BertEmbedding.__init__ and initialise nn.Module directly,
        # since the BERT weights are replaced by the auto-loaded model
        super(BertEmbedding, self).__init__()

        config = AutoConfig.from_pretrained(model)
        config.output_hidden_states = True
        self.bert = AutoModel.from_pretrained(model, config=config)
        self.bert.config.output_hidden_states = True
        self.bert = self.bert.requires_grad_(requires_grad)
        self.n_layers = n_layers
        self.n_out = n_out
        self.pad_index = pad_index
        self.requires_grad = requires_grad
        self.hidden_size = self.bert.config.hidden_size

        self.scalar_mix = ScalarMix(n_layers)
        if self.hidden_size != n_out:
            self.projection = nn.Linear(self.hidden_size, n_out, False)

unk not exist in tokens

Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/public/sist/home/lizhao/biaffine-parser/parser/cmds/train.py", line 40, in __call__
    super(Train, self).__call__(args)
  File "/public/sist/home/lizhao/biaffine-parser/parser/cmds/cmd.py", line 49, in __call__
    self.WORD.build(train, args.min_freq, embed)
  File "/public/sist/home/lizhao/biaffine-parser/parser/utils/field.py", line 86, in build
    tokens[embed.unk_index] = self.unk
  File "/public/sist/home/lizhao/biaffine-parser/parser/utils/embedding.py", line 29, in unk_index
    return self.tokens.index(self.unk)
ValueError: tuple.index(x): x not in tuple

Hi, I checked and the unk token is not added to the Embedding class, so it can't be indexed. Any suggestions? Thanks.

saved models

It seems that models saved with torch.save() include external objects, like BertTokenizer.
If you run the model on a machine where a newer version of transformers (e.g. 3.1.0) is installed, the program will crash.
This is a pity, since it makes all trained models unusable.
It would be better to avoid saving the whole tokenizer object and to save only its class name, so that a new instance can be recreated when loading the model.

python -m supar.cmds.biaffine_dependency predict -p=exp/fr-bert/model --tree --data=fr.conllu --pred=/dev/null

2020-09-18 17:00:10 INFO Loading the data
Traceback (most recent call last):
File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/homenfs/tempGPU/iwpt2020/supar/supar/cmds/biaffine_dependency.py", line 43, in
main()
File "/homenfs/tempGPU/iwpt2020/supar/supar/cmds/biaffine_dependency.py", line 39, in main
parse(parser)
File "/homenfs/tempGPU/iwpt2020/supar/supar/cmds/cmd.py", line 35, in parse
parser.predict(**args)
File "/homenfs/tempGPU/iwpt2020/supar/supar/parsers/biaffine_dependency.py", line 125, in predict
return super().predict(**Config().update(locals()))
File "/homenfs/tempGPU/iwpt2020/supar/supar/parsers/parser.py", line 137, in predict
dataset.build(args.batch_size, args.buckets)
File "/homenfs/tempGPU/iwpt2020/supar/supar/utils/data.py", line 88, in build
self.fields = self.transform(self.sentences)
File "/homenfs/tempGPU/iwpt2020/supar/supar/utils/transform.py", line 39, in call
pairs[f] = f.transform([getattr(i, f.name) for i in sentences])
File "/homenfs/tempGPU/iwpt2020/supar/supar/utils/field.py", line 302, in transform
for seq in sequences]
File "/homenfs/tempGPU/iwpt2020/supar/supar/utils/field.py", line 302, in
for seq in sequences]
File "/homenfs/tempGPU/iwpt2020/supar/supar/utils/field.py", line 301, in
sequences = [[self.preprocess(token) for token in seq]
File "/homenfs/tempGPU/iwpt2020/supar/supar/utils/field.py", line 157, in preprocess
sequence = self.tokenize(sequence)
File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/tokenization_utils.py", line 349, in tokenize
no_split_token = self.unique_no_split_tokens
AttributeError: 'BertTokenizer' object has no attribute 'unique_no_split_tokens'
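
A sketch of the safer pattern described above, saving only tensors plus the tokenizer's name so the tokenizer is rebuilt from whatever transformers version is installed (model and path are placeholders):

import torch
from transformers import AutoTokenizer

# Save weights and the tokenizer name instead of the tokenizer object.
torch.save({'state_dict': model.state_dict(),
            'tokenizer_name': 'bert-base-cased'}, path)

# On load, recreate the tokenizer from the installed transformers.
ckpt = torch.load(path, map_location='cpu')
tokenizer = AutoTokenizer.from_pretrained(ckpt['tokenizer_name'])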

A complaint

The code is far too tightly coupled. Many things could be done simply, yet needless complexity has been added instead.

Data Download for Testing

Can you share some information on data availability for testing? I want to incorporate this model into a different project, so I would like to quickly test it and understand how it works. Can you share an open-source data source?

RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered

Hello, running python run.py train --device 1 produces the following error:
torch.cuda.is_available(): True

Traceback (most recent call last):
  File "run.py", line 41, in <module>
    args.func(args)
  File "/home/workspace/biaffine-parser/parser/cmds/train.py", line 92, in __call__
    file=args.file)
  File "/home/workspace/biaffine-parser/parser/model.py", line 34, in __call__
    self.train(train_loader)
  File "/home/workspace/biaffine-parser/parser/model.py", line 75, in train
    loss.backward()
  File "/home/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered

RuntimeError: copy_if failed to synchronize

When training with a relatively small Arabic corpus from here (I just made a few simple changes to the reader to skip comments and multi-word tokens in the CoNLLU format):
http://ufal.mff.cuni.cz/~zeman/soubory/iwpt2020-train-dev.tgz
I get this error:

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [38,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [38,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [38,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [38,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [38,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/train.py", line 85, in __call__
    self.train(train.loader)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/cmd.py", line 91, in train
    arc_scores, rel_scores = self.model(words, feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/model.py", line 95, in forward
    feat_embed = self.feat_embed(*feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 43, in forward
    bert = bert[bert_mask].split(bert_lens[mask].tolist())
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered

It works on CPU however.

Thank you for a nice project.

Understanding biaffine operation.

https://github.com/zysite/biaffine-parser/blob/e54c2104658443e10df4e27a392041a559fcc745/parser/modules/biaffine.py#L43

Hi zysite,
I am really getting confused by this operation. In the official code they do the multiplication with an adjoint (transposed) matrix, while here you do:

s = x @ self.weight @ torch.transpose(y, -1, -2)

Especially, I want to understand how it is the same as:

Do the multiplications:

(bn x d) (d x rd) -> (bn x rd)

lin = tf.matmul(tf.reshape(inputs1, [-1, inputs1_size + add_bias1]),
                tf.reshape(weights, [inputs1_size + add_bias1, -1]))

(b x nr x d) (b x n x d)T -> (b x nr x n)

bilin = tf.matmul(
    tf.reshape(lin, [batch_size, inputs1_bucket_size * output_size, inputs2_size + add_bias2]),
    inputs2, adjoint_b=True)

What was your intuition in coming up with this?
Can you help me understand it in detail?
Thanks a lot for your help.

Sincerely
Pranoy
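
A quick way to see the equivalence is a small numeric check: the single expression used in this repo and the two-step reshape-then-matmul form from the official TF code produce identical scores (bias columns omitted; the shapes below are illustrative, not from either codebase):

import torch

b, n, d, r = 2, 5, 4, 3
x = torch.randn(b, n, d)   # head representations
y = torch.randn(b, n, d)   # dependent representations
W = torch.randn(r, d, d)   # one d x d map per output label

# Single-expression form, broadcast over the r label maps: [b, r, n, n]
s1 = x.unsqueeze(1) @ W @ y.unsqueeze(1).transpose(-1, -2)

# Two-step form mirroring the TF code:
# (bn x d) @ (d x rd) -> (bn x rd)
lin = x.reshape(b * n, d) @ W.permute(1, 0, 2).reshape(d, r * d)
# (b x nr x d) @ (b x d x n) -> (b x nr x n)
bilin = lin.reshape(b, n * r, d) @ y.transpose(-1, -2)
s2 = bilin.reshape(b, n, r, n).permute(0, 2, 1, 3)

assert torch.allclose(s1, s2, atol=1e-5)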

Reproduce the results of CTB

Hi, I failed to reproduce the results of the biaffine parser on the Chinese Treebank. I simply used the default config file with the CTB dataset, but got a LAS of 85–86, which is about 3 LAS lower than the result produced by Dozat's parser. Any suggestions? Maybe I missed something in data pre-processing?

AttributeError: module 'torch.distributed' has no attribute 'is_initialized'

I loaded the crf-con-en model into the Parser and then tried to use the predict() function on a list of English tokens, but I get the above error. I need help debugging this!

Here's my code:

from supar import Parser
import nltk

parser = Parser.load("crf-con-en")

text = nltk.word_tokenize("John, who is the CEO of a company, played golf.")

output = parser.predict(data=[text], verbose=False)

Complete error log:

AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
----> 1 output = parser.predict(data=[text], verbose=False)

~\anaconda3\lib\site-packages\supar\parsers\crf_constituency.py in predict(self, data, pred, buckets, batch_size, prob, mbr, verbose, **kwargs)
    121         """
    122 
--> 123         return super().predict(**Config().update(locals()))
    124 
    125     def _train(self, loader):

~\anaconda3\lib\site-packages\supar\parsers\parser.py in predict(self, data, pred, buckets, batch_size, prob, **kwargs)
    120     def predict(self, data, pred=None, buckets=8, batch_size=5000, prob=False, **kwargs):
    121         args = self.args.update(locals())
--> 122         init_logger(logger, verbose=args.verbose)
    123 
    124         self.transform.eval()

~\anaconda3\lib\site-packages\supar\utils\logging.py in init_logger(logger, path, mode, level, handlers, verbose)
     28                         level=level,
     29                         handlers=handlers)
---> 30     logger.setLevel(logging.INFO if is_master() and verbose else logging.WARNING)
     31 
     32 

~\anaconda3\lib\site-packages\supar\utils\parallel.py in is_master()
     33 
     34 def is_master():
---> 35     return not dist.is_initialized() or dist.get_rank() == 0

AttributeError: module 'torch.distributed' has no attribute 'is_initialized'
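
On builds of PyTorch compiled without the distributed package (older Windows wheels in particular), torch.distributed lacks is_initialized. A hedged patch for is_master() is to short-circuit on dist.is_available() first:

import torch.distributed as dist

def is_master():
    # is_available() is False on builds without distributed support,
    # so check it before touching is_initialized().
    if not dist.is_available() or not dist.is_initialized():
        return True
    return dist.get_rank() == 0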

disable padding.

Hi, is there a fast way to disable padding, i.e. to make sure all sentences in a batch have the same length?
Thanks.
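
One hedged way to achieve this without touching the library is to batch by exact length, so padding becomes a no-op (sentences here is an assumed list of token lists):

from collections import defaultdict

def length_batches(sentences, batch_size):
    # Group sentences of identical length; every batch is then
    # rectangular without any padding tokens.
    buckets = defaultdict(list)
    for sent in sentences:
        buckets[len(sent)].append(sent)
    for group in buckets.values():
        for i in range(0, len(group), batch_size):
            yield group[i:i + batch_size]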

Data share

Hi, I'm very interested in your nice work, and I'd love to build my new model upon yours.
Could you please share the CoNLL-X style data you used to reproduce the results in the paper?
Looking forward to your reply~
