
ctran's People

Contributors

rafiepour


Forkers

zsc19 thumesn

ctran's Issues

Why Transformer Decoder?

Why use a transformer decoder for slot filling? Why not something simpler? What is the reason or intuition behind this structure?
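For context, this is roughly the structure in question: a minimal sketch (assuming PyTorch's nn.TransformerDecoder; not the repository's actual code) of a transformer decoder used as a per-token slot tagger, where every position cross-attends to the encoded utterance.

import torch.nn as nn

class SlotDecoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_slots=120):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.out = nn.Linear(d_model, n_slots)  # one slot-label logit vector per token

    def forward(self, tgt, memory):
        # tgt:    (batch, seq_len, d_model) decoder-side token representations
        # memory: (batch, seq_len, d_model) encoder outputs for the utterance
        return self.out(self.decoder(tgt, memory))

Unlike a simple per-token linear classifier, the cross-attention over the encoder memory lets each slot decision condition on the whole utterance and on the other decoder positions.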

Training Saturates before Epoch ~10

When training on either ATIS or SNIPS, my model shows no improvement beyond roughly epoch 10, with training slot-filling F1 and intent-detection precision reaching ~1.0. The test set hits ~0.98, which I understand is quite close to training, but the fast convergence makes me wonder whether this is expected of the model, especially since the paper mentions that no improvements are observed beyond epoch 50.

I believe I have set my learning rates and optimizers according to the paper with BERT-large, so I am wondering whether I am missing something, whether I should add more ways of preventing overfitting (one idea is sketched after the parameters below), or whether this behavior is expected.

Here are my training parameters, in case they help:

import torch
import torch.nn as nn
import torch.optim as optim

BATCH_SIZE = 16
LENGTH = 60
STEP_SIZE = 50

# The slot loss skips targets with index 0 (presumably the padding label).
loss_function_1 = nn.CrossEntropyLoss(ignore_index=0)
loss_function_2 = nn.CrossEntropyLoss()

# Separate AdamW optimizers per module, with the smallest rate for BERT.
dec_optim = optim.AdamW(decoder.parameters(), lr=0.0001)
enc_optim = optim.AdamW(encoder.parameters(), lr=0.001)
ber_optim = optim.AdamW(bert_layer.parameters(), lr=0.00001)
mid_optim = optim.AdamW(middle.parameters(), lr=0.0001)

# Decay every learning rate by a factor of 0.96 per scheduler step (epoch).
enc_scheduler = torch.optim.lr_scheduler.StepLR(enc_optim, 1, gamma=0.96)
dec_scheduler = torch.optim.lr_scheduler.StepLR(dec_optim, 1, gamma=0.96)
mid_scheduler = torch.optim.lr_scheduler.StepLR(mid_optim, 1, gamma=0.96)
ber_scheduler = torch.optim.lr_scheduler.StepLR(ber_optim, 1, gamma=0.96)
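As mentioned above, one overfitting guard I am considering is early stopping on a held-out dev set. A minimal sketch (train_one_epoch and evaluate_dev_f1 are hypothetical placeholders, not functions from this repository):

best_f1, patience, bad_epochs = 0.0, 5, 0
for epoch in range(STEP_SIZE):
    train_one_epoch()               # the usual training loop goes here
    dev_f1 = evaluate_dev_f1()      # slot-filling F1 on a dev split
    if dev_f1 > best_f1:
        best_f1, bad_epochs = dev_f1, 0
        torch.save(decoder.state_dict(), "best_decoder.pt")  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # stop after 5 epochs with no dev improvement
            break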

Thank you!

Issue while using bert-large-uncased

Hi Rafiepour,
I tried to train CTRAN with bert-large-uncased on the ATIS dataset, but its performance is poor compared to using bert-base-uncased embeddings. What could be the reason? Also, the bert-large-uncased files downloaded from Hugging Face include no hubconf.py, so I am using the same hubconf.py file as for bert-base-uncased.
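For reference, a hubconf.py-free way to load the same checkpoint (this assumes the Hugging Face transformers library rather than torch.hub, and is not necessarily how the repository loads it):

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
bert_layer = BertModel.from_pretrained("bert-large-uncased")

Note that bert-large-uncased produces 1024-dimensional hidden states rather than the 768 of bert-base-uncased, so any downstream layer sized for the base model would need adjusting.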

The result after 11 epochs is below:

best model at epoch: 11
max single SF F1: 0.5989
max single ID PR: 0.7079

This is very low compared to bert-base-uncased, where I got good results. The loss is also high (2.0106).

What might be the possible reasons for this?

What is the use of word2index?

Hi Rafiepour,
Thanks for your great work! I have a small question about the use of word2index: since the real ids (input_ids) are already produced by bert_tokenizer, can I remove word2index?
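For context, this is the behavior in question, as a minimal sketch (assuming the Hugging Face transformers tokenizer; the repository's exact pipeline may differ):

from transformers import BertTokenizer

# The tokenizer already maps text to vocabulary ids, which is the job a
# separate word2index table would otherwise do.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("book a flight to denver", return_tensors="pt")
print(enc["input_ids"])  # wordpiece ids, with [CLS] (101) and [SEP] (102) added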

Thanks!
