
bilstm-lan's Introduction

  • 👋 Hi, I’m @Nealcly, a senior researcher at the Natural Language Processing Center, Tencent AI Lab.
  • 👀 Please email me ([email protected]) if you would like to work with us.

Nealcly's github stats

bilstm-lan's People

Contributors

abhi1nandy2, nealcly


bilstm-lan's Issues

Performance on OntoNotes v5.0

Hi

First of all, thanks for your last reply.
Following your command, I ran the model on OntoNotes v5.0.
Although the official F1-score you report is 88.16%, I always get about 85%.
When I run your model on UD I get very good performance, so I think I must be making a mistake somewhere.

Here is my command:
python main.py --learning_rate 0.01 --lr_decay 0.035 --dropout 0.5 --hidden_dim 400 --lstm_layer 4 --momentum 0.9 --whether_clip_grad True --clip_grad 5.0 --train_dir 'data/onto.train.txt' --dev_dir 'data/onto.development.txt' --test_dir 'data/onto.test.txt' --model_dir 'model/' --word_emb_dir 'glove.6B.100d.txt'

Here is the data summary:
DATA SUMMARY START:
I/O:
Tag scheme: BIO
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: False
Word alphabet size: 69812
Char alphabet size: 119
Label alphabet size: 38
Word embedding dir: glove.6B.100d.txt
Char embedding dir: None
Word embedding size: 100
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: data/onto.train.txt
Dev file directory: data/onto.development.txt
Test file directory: data/onto.test.txt
Raw file directory: None
Dset file directory: None
Model file directory: model/
Loadmodel directory: None
Decode file directory: None
Train instance number: 115812
Dev instance number: 15679
Test instance number: 12217
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: False
Model word extractor: LSTM
Model use_char: True
Model char extractor: LSTM
Model char_hidden_dim: 50
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: SGD
Iteration: 100
BatchSize: 10
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.01
Hyper lr_decay: 0.035
Hyper HP_clip: 5.0
Hyper momentum: 0.9
Hyper l2: 1e-08
Hyper hidden_dim: 400
Hyper dropout: 0.5
Hyper lstm_layer: 4
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.

I think I have followed the hyperparameters given in your paper. Is there any mistake?

Thanks for reading.

Randomly initialize the word embedding

I have switched to a Lao-language dataset and want to do POS tagging, but there is no pretrained embedding for this language. How can I change the code to randomly initialize the embedding?
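If the code does not already fall back to random initialization when --word_emb_dir is omitted, a minimal sketch of one way to build such a table yourself is below (the helper name random_embedding and the uniform scale are my own assumptions, not this repository's code):

import numpy as np
import torch
import torch.nn as nn

def random_embedding(vocab_size, embedding_dim, seed=42):
    # Uniform range +/- sqrt(3/dim) gives each row roughly unit variance;
    # any small symmetric range works in practice.
    rng = np.random.RandomState(seed)
    scale = np.sqrt(3.0 / embedding_dim)
    weights = rng.uniform(-scale, scale, (vocab_size, embedding_dim)).astype("float32")
    weights[0] = 0.0  # keep the padding index at zero
    return weights

# Plug the table into nn.Embedding instead of loading GloVe vectors.
vocab_size, emb_dim = 20000, 100
emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
emb.weight.data.copy_(torch.from_numpy(random_embedding(vocab_size, emb_dim)))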

About OntoNotes 5.0

Hello.

I have a question about the results when I use OntoNotes 5.0 as the dataset for the NER task.
According to the BiLSTM-LAN results you report, the expected score is 88.16%.
But when I ran the model, I got 91.85% accuracy at epoch 1.
Is there any mistake in my command?
My command is here:

python main.py --learning_rate 0.01 --lr_decay 0.035 --dropout 0.5 --hidden_dim 400 --lstm_layer 3 --momentum 0.9 --whether_clip_grad True --clip_grad 5.0 --train_dir 'data/onto.train.txt' --dev_dir 'data/onto.development.txt' --test_dir 'data/onto.test.txt' --model_dir 'model/' --word_emb_dir 'glove.6B.100d.txt'

And this is the summary:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
I/O:
Tag scheme: BIO
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: False
Word alphabet size: 69812
Char alphabet size: 119
Label alphabet size: 38
Word embedding dir: glove.6B.100d.txt
Char embedding dir: None
Word embedding size: 100
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: data/onto.train.txt
Dev file directory: data/onto.development.txt
Test file directory: data/onto.test.txt
Raw file directory: None
Dset file directory: None
Model file directory: model/
Loadmodel directory: None
Decode file directory: None
Train instance number: 2200752
Dev instance number: 304684
Test instance number: 230111
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: False
Model word extractor: LSTM
Model use_char: True
Model char extractor: LSTM
Model char_hidden_dim: 50
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: SGD
Iteration: 100
BatchSize: 4
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.01
Hyper lr_decay: 0.035
Hyper HP_clip: None
Hyper momentum: 0.9
Hyper l2: 1e-08
Hyper hidden_dim: 400
Hyper dropout: 0.5
Hyper lstm_layer: 3
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This is a sample of the dataset. I converted the data to this txt format:

Just O
now O
we O
were O
primarily O
talking O
about O
those O
produced O
by O
counterfeiting O

Thank you.

Does the concatenated size of the last layer equal the label alphabet size?

  1. As listed in the parameters in utils/data.py, label_dim is equal to HP_hidden_dim, which is set to 200.
  2. The output size of the LAN layer in the last layer is [HP_hidden_dim, label_dim], i.e. 2*HP_hidden_dim.
  3. If we set use_crf = False, does the concatenated size of the last layer equal the label alphabet size?
    Or, if we set use_crf = True, how is the output tensor of size (batch, seq_len, HP_hidden_dim) turned into emission probabilities? (See the sketch below.)

Could anyone give me some advice? Thanks!
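For item 3 with use_crf = True, a common pattern (not necessarily what this repository does) is to project the (batch, seq_len, HP_hidden_dim) output to the label alphabet size with a linear layer and hand that to the CRF as emission scores. A hedged sketch, with all sizes made up for illustration:

import torch
import torch.nn as nn

batch, seq_len, hidden_dim, num_labels = 10, 50, 400, 18

# Hypothetical output of the last BiLSTM/LAN layer.
features = torch.randn(batch, seq_len, hidden_dim)

# A linear projection turns hidden states into per-label emission scores
# that a CRF layer can consume; this is the usual BiLSTM-CRF recipe.
hidden2tag = nn.Linear(hidden_dim, num_labels)
emissions = hidden2tag(features)   # (batch, seq_len, num_labels)
print(emissions.shape)             # torch.Size([10, 50, 18])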

A question about the dimension of the label embedding L

When embedding the labels, does the dimension (dh) need to be as large as the number of labels L (label_num)?
Otherwise, consider this (say the dimension is 512): attention = QK^T: (length, 512) x (label_num, 512)^T => (length, label_num); continuing, attention x V = (length, label_num) x (label_num, 512) => (length, 512). This is the LAN output. If this result is then mapped to labels and there are only, say, 128 labels, surely mapping 512 dimensions onto 128 labels is a problem?

So must the dimension in the paper, i.e. the 512, equal the number of labels L (label_num)? Otherwise it cannot be matched to the output.
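A hedged shape check of the arithmetic above (illustrative only; how the final prediction is actually read out should be confirmed against lstm_attention.py):

import torch

length, label_num, d = 30, 128, 512

H = torch.randn(length, d)      # word representations (queries)
L = torch.randn(label_num, d)   # label embeddings (keys and values)

scores = H @ L.t()              # (length, label_num): one attention score per label
alpha = torch.softmax(scores, dim=-1)
label_summary = alpha @ L       # (length, d): attention-weighted label embedding

print(scores.shape, label_summary.shape)
# torch.Size([30, 128]) torch.Size([30, 512])

If the last layer takes its prediction from the (length, label_num) attention scores rather than from the d-dimensional weighted sum, then dh does not have to equal the number of labels; only the intermediate layers would use the 512-dimensional output.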

Experiments on CoNLL03 NER

Hello!
I tried to run your code on the CoNLL03 NER dataset, but the performance I get is not as good as BiLSTM-CRF. Could you help me find the bug? Thanks.
Here is part of my experiment log:

True
Seed num: 42
MODEL: train
Load pretrained word embedding, norm: False, dir: ../Data/pretrain_emb/glove.6B.100d.txt
Embedding:
pretrain word:400000, prefect match:11415, case_match:11656, oov:2234, oov%:0.08827945941673912
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
I/O:
Tag scheme: BMES
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: True
Word alphabet size: 25306
Char alphabet size: 78
Label alphabet size: 18
Word embedding dir: ../Data/pretrain_emb/glove.6B.100d.txt
Char embedding dir: None
Word embedding size: 100
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: ../Data/conll03/conll03.train.bmes
Dev file directory: ../Data/conll03/conll03.dev.bmes
Test file directory: ../Data/conll03/conll03.test.bmes
Raw file directory: None
Dset file directory: None
Model file directory: save/label_embedding
Loadmodel directory: None
Decode file directory: None
Train instance number: 14987
Dev instance number: 3466
Test instance number: 3684
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: False
Model word extractor: LSTM
Model use_char: True
Model char extractor: LSTM
Model char_hidden_dim: 50
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: SGD
Iteration: 100
BatchSize: 10
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.01
Hyper lr_decay: 0.04
Hyper HP_clip: None
Hyper momentum: 0.9
Hyper l2: 1e-08
Hyper hidden_dim: 400
Hyper dropout: 0.5
Hyper lstm_layer: 4
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build network...
use_char: True
char feature extractor: LSTM
word feature extractor: LSTM
build word sequence feature extractor: LSTM...
build word representation...
build char sequence feature extractor: LSTM ...
--------pytorch total params--------
9849140
Epoch: 0/100
Learning rate is set as: 0.01
Instance: 14987; Time: 125.29s; loss: 2452.2396; acc: 172887.0/204567.0=0.8451
Epoch: 0 training finished. Time: 125.29s, speed: 119.62st/s, total loss: 126550.64305019379
totalloss: 126550.64305019379
gold_num = 5942 pred_num = 6508 right_num = 2556
Dev: time: 11.00s, speed: 317.98st/s; acc: 0.9036, p: 0.3927, r: 0.4302, f: 0.4106
gold_num = 5648 pred_num = 6351 right_num = 2261
Test: time: 10.95s, speed: 339.54st/s; acc: 0.8919, p: 0.3560, r: 0.4003, f: 0.3769
Exceed previous best f score: -10

inference

A question: how do I run prediction on data that has no labels? Thanks.

Why does query masking exist?

In lstm_attention.py there is the following code:

# Query Masking
query_masks = torch.sign(torch.abs(torch.sum(queries, dim=-1)))  # (N, T_q)
query_masks = query_masks.repeat(self.num_heads, 1)  # (h*N, T_q)
query_masks = torch.unsqueeze(query_masks, 2).repeat(1, 1, keys.size()[1])  # (h*N, T_q, T_k)
outputs = outputs * query_masks

query_masks seems to be a tensor containing 0s and 1s. What is its effect, and why does it exist?
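My reading (not an authoritative answer): the mask marks which query positions are real tokens versus zero padding, and zeroes out the attention outputs for the padded positions. A small standalone demo of the same arithmetic, ignoring the multi-head repeat for brevity:

import torch

# Two "sentences" of queries; the second position of the first one is padding (all zeros).
queries = torch.tensor([[[0.3, -0.2], [0.0, 0.0]],
                        [[0.5,  0.1], [0.4, 0.7]]])  # (N=2, T_q=2, d=2)

# sign(|sum over the feature dim|) is 1 for real tokens and 0 for zero padding.
query_masks = torch.sign(torch.abs(torch.sum(queries, dim=-1)))  # (N, T_q)
print(query_masks)  # tensor([[1., 0.], [1., 1.]])

# Broadcasting over the key dimension wipes out every attention output
# produced by a padded query position.
T_k = 3
outputs = torch.ones(2, 2, T_k)
outputs = outputs * query_masks.unsqueeze(2)
print(outputs[0])  # first row stays ones, second (padded) row becomes zeros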

Confused about the effect of the label embedding in the model

As described in Fig. 2 of the paper, the label embedding is concatenated with the BiLSTM output of Layer 1 and Layer 2, as well as with the output of the Label Attention Inference layer. However, how does the label embedding actually take effect in Layer 1 and Layer 2? In addition, why is the label embedding not concatenated into the input of the final prediction?
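My reading of the figure, offered as a hedged sketch rather than the authors' answer: each intermediate layer concatenates the BiLSTM hidden states with the attention-weighted sum of label embeddings, so the next BiLSTM layer sees both the word features and a soft label hypothesis, while the last layer can predict directly from the attention distribution over labels, so no further concatenation is needed there. A shape-only illustration:

import torch

seq_len, hidden_dim, label_dim = 30, 400, 400

h = torch.randn(seq_len, hidden_dim)             # BiLSTM output of an intermediate layer
label_summary = torch.randn(seq_len, label_dim)  # attention-weighted sum of label embeddings
                                                 # (produced by the label attention sub-layer)

# The concatenation is what lets label information influence the next BiLSTM layer.
next_input = torch.cat([h, label_summary], dim=-1)  # (seq_len, hidden_dim + label_dim)
print(next_input.shape)                             # torch.Size([30, 800])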

data and word embedding

Could you please provide links for downloading them? If anything requires a licence, could you please also add a note saying so?

Why is the model so slow on a CPU platform?

I run the model on Windows 10 with a CPU, but it takes 4 hours per epoch; that is, 100 epochs need 400 hours to run the whole model. The paper claims it is faster than BiLSTM-CRF, but in practice it is not.
I also ran BERT+BiLSTM+CRF in the same environment (Windows 10 with a CPU); it only took 10 hours in total, and its accuracy was 0.92.
Can you tell me why?

Train/decode speed comparison with CRF

Hello, your work is great, but I have a question about the speed comparison.
In the published code, when the CRF is used, the CRF layer in seqlab.py is only used to calculate the loss in the neg_log_liklihood_loss function, but in the forward function there is nothing involving the CRF layer. Did I miss something somewhere? I hope to hear your reply, thank you very much.
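Not an answer from the authors, but for context: the speed argument usually contrasts a per-token argmax with Viterbi decoding over a label transition matrix. A hedged illustration of the CRF-free decode path (all sizes made up):

import torch

batch, seq_len, num_labels = 10, 50, 18
scores = torch.randn(batch, seq_len, num_labels)  # per-token label scores from the network

# Without a CRF, decoding is a per-token argmax: O(seq_len * num_labels)
# and trivially parallel across the sequence.
pred = scores.argmax(dim=-1)  # (batch, seq_len)
print(pred.shape)

# With a CRF, decoding instead runs Viterbi over a (num_labels x num_labels)
# transition matrix, a sequential O(seq_len * num_labels^2) dynamic program,
# which is where the extra decode cost comes from.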
