
ctran's People

Contributors

rafiepour


Forkers

zsc19 thumesn

ctran's Issues

Why Transformer Decoder?

Why use a transformer decoder for slot filling? Why not something simpler? What is the reason or intuition behind this structure?
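For context, this is roughly the structure in question: a minimal sketch (assuming PyTorch's nn.TransformerDecoder; not the repository's actual code) of a transformer decoder used as a per-token slot tagger, where every position cross-attends to the encoded utterance.

import torch.nn as nn

class SlotDecoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_slots=120):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.out = nn.Linear(d_model, n_slots)  # one slot-label logit vector per token

    def forward(self, tgt, memory):
        # tgt:    (batch, seq_len, d_model) decoder-side token representations
        # memory: (batch, seq_len, d_model) encoder outputs for the utterance
        return self.out(self.decoder(tgt, memory))

Unlike a simple per-token linear classifier, the cross-attention over the encoder memory lets each slot decision condition on the whole utterance and on the other decoder positions.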

Training Saturates before Epoch ~10

When training on either ATIS or SNIPS, my model shows no improvement beyond roughly epoch 10, with training slot-filling F1 and intent-detection precision reaching ~1.0. The test set hits ~0.98, which I understand is quite close to training, but the fast convergence makes me wonder whether this is expected of the model, especially since the paper mentions that no improvements are observed beyond epoch 50.

I believe I have set my learning rates and optimizers according to the paper with BERT-large, so I am wondering whether I am missing something, whether I should add more ways of preventing overfitting (one idea is sketched after the parameters below), or whether this behavior is expected.

Here are my training parameters, in case they help:

import torch
import torch.nn as nn
import torch.optim as optim

BATCH_SIZE = 16
LENGTH = 60
STEP_SIZE = 50

# The slot loss skips targets with index 0 (presumably the padding label).
loss_function_1 = nn.CrossEntropyLoss(ignore_index=0)
loss_function_2 = nn.CrossEntropyLoss()

# Separate AdamW optimizers per module, with the smallest rate for BERT.
dec_optim = optim.AdamW(decoder.parameters(), lr=0.0001)
enc_optim = optim.AdamW(encoder.parameters(), lr=0.001)
ber_optim = optim.AdamW(bert_layer.parameters(), lr=0.00001)
mid_optim = optim.AdamW(middle.parameters(), lr=0.0001)

# Decay every learning rate by a factor of 0.96 per scheduler step (epoch).
enc_scheduler = torch.optim.lr_scheduler.StepLR(enc_optim, 1, gamma=0.96)
dec_scheduler = torch.optim.lr_scheduler.StepLR(dec_optim, 1, gamma=0.96)
mid_scheduler = torch.optim.lr_scheduler.StepLR(mid_optim, 1, gamma=0.96)
ber_scheduler = torch.optim.lr_scheduler.StepLR(ber_optim, 1, gamma=0.96)
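As mentioned above, one overfitting guard I am considering is early stopping on a held-out dev set. A minimal sketch (train_one_epoch and evaluate_dev_f1 are hypothetical placeholders, not functions from this repository):

best_f1, patience, bad_epochs = 0.0, 5, 0
for epoch in range(STEP_SIZE):
    train_one_epoch()               # the usual training loop goes here
    dev_f1 = evaluate_dev_f1()      # slot-filling F1 on a dev split
    if dev_f1 > best_f1:
        best_f1, bad_epochs = dev_f1, 0
        torch.save(decoder.state_dict(), "best_decoder.pt")  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # stop after 5 epochs with no dev improvement
            break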

Thank you!

Issue while using bert-large-uncased

Hi Rafiepour,
I tried to train CTRAN with bert-large-uncased on the ATIS dataset, but its performance is poor compared to using bert-base-uncased embeddings. What could be the reason? Also, the bert-large-uncased files downloaded from Hugging Face include no hubconf.py, so I am using the same hubconf.py file as for bert-base-uncased.
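For reference, a hubconf.py-free way to load the same checkpoint (this assumes the Hugging Face transformers library rather than torch.hub, and is not necessarily how the repository loads it):

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
bert_layer = BertModel.from_pretrained("bert-large-uncased")

Note that bert-large-uncased produces 1024-dimensional hidden states rather than the 768 of bert-base-uncased, so any downstream layer sized for the base model would need adjusting.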

The result after 11 epochs is below:

best model at epoch: 11
max single SF F1: 0.5989
max single ID PR: 0.7079

This is very low compared to bert-base-uncased, where I got good results. The loss is also high (2.0106).

What might be the possible reasons for this?

What is the use of word2index?

Hi Rafiepour,
Thanks for your great work! I have a small question about the use of word2index: since the real ids (input_ids) are already produced by bert_tokenizer, can I remove word2index?
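For context, this is the behavior in question, as a minimal sketch (assuming the Hugging Face transformers tokenizer; the repository's exact pipeline may differ):

from transformers import BertTokenizer

# The tokenizer already maps text to vocabulary ids, which is the job a
# separate word2index table would otherwise do.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("book a flight to denver", return_tensors="pt")
print(enc["input_ids"])  # wordpiece ids, with [CLS] (101) and [SEP] (102) added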

Thanks!
