Comments (2)
- You can check the argparse; this should be helpful: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/main.py#L78-L91
  Also, see 2) below.
- Sharing all encoder and decoder layers (`share_enc` and `share_dec`) is usually good, or not far from the best you could get. Sharing the language embeddings (`share_lang_emb`) helps if the languages are related (like English-French), but is not very useful if they are very distant or have different alphabets (like English-Russian). Sharing the decoder input with the output embeddings (`share_decpro_emb`) or sharing the output embeddings with the input ones (`share_output_emb`) usually doesn't make a big difference. I would suggest setting these to `True`, as it might help in very low-resource scenarios.
- Not exactly. You will be sharing the first 2 layers of the decoder, but the last 2 layers of the encoder. The amount of sharing grows with proximity to the latent state. See https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L60 for the encoder, and https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L166 for the decoder.
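The direction of sharing described above can be sketched in a few lines of Python (a hypothetical helper, not code from the repo): with `n_shared` shared layers on each side, the encoder shares its last layers and the decoder its first layers, i.e. always the layers nearest the latent state.

```python
def shared_layer_indices(n_layers, n_shared, side):
    """Return 0-based indices of the layers shared across languages.

    Sharing grows toward the latent state: the encoder shares its
    LAST `n_shared` layers, the decoder its FIRST `n_shared` layers.
    """
    assert 0 <= n_shared <= n_layers
    if side == "encoder":
        return list(range(n_layers - n_shared, n_layers))
    return list(range(n_shared))

# With 4 layers per side and 2 shared on each:
print(shared_layer_indices(4, 2, "encoder"))  # [2, 3] -> last two encoder layers
print(shared_layer_indices(4, 2, "decoder"))  # [0, 1] -> first two decoder layers
```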
- For this you should set `share_lang_emb` to `False`.
- I recognize this is a bit tricky, and some parameters can have a different effect depending on the others; they are not all totally independent. I would suggest looking at these few lines: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L184-L198. It is probably much simpler to look at these to understand what is shared under which condition.
- Yes, you need a shared vocabulary to share the lookup tables. `share_lang_emb = True` is only possible if the vocabulary is the same for the 2 languages. In total, there are 6 lookup tables: 2 for the encoder input, 2 for the decoder input, and 2 for the decoder output. If you set `share_lang_emb = True`, it becomes 1 for the encoder input, 1 for the decoder input, and 1 for the decoder output. If you also set `share_decpro_emb = True`, you only have 1 lookup table in the encoder and 1 in the decoder. If you additionally set `share_encdec_emb = True`, you only have one lookup table in the end.
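The table counting above can be verified with a small sketch (hypothetical, not repo code): treat each (role, language) pair as one embedding table, merge the tables that a flag ties together, and count the remaining equivalence classes.

```python
def count_lookup_tables(n_langs=2, share_lang_emb=False,
                        share_decpro_emb=False, share_encdec_emb=False):
    """Count distinct embedding tables under the sharing flags.

    One table per (role, language): encoder input, decoder input,
    decoder output. Flags merge tables; a tiny union-find counts
    the surviving equivalence classes.
    """
    tables = [(role, lang) for role in ("enc_in", "dec_in", "dec_out")
              for lang in range(n_langs)]
    parent = {t: t for t in tables}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    if share_lang_emb:  # tie each role across languages
        for lang in range(1, n_langs):
            for role in ("enc_in", "dec_in", "dec_out"):
                union((role, lang), (role, 0))
    for lang in range(n_langs):
        if share_decpro_emb:  # tie decoder output to decoder input
            union(("dec_out", lang), ("dec_in", lang))
        if share_encdec_emb:  # tie encoder input to decoder input
            union(("enc_in", lang), ("dec_in", lang))
    return len({find(t) for t in tables})

print(count_lookup_tables())                                            # 6
print(count_lookup_tables(share_lang_emb=True))                         # 3
print(count_lookup_tables(share_lang_emb=True, share_decpro_emb=True))  # 2
print(count_lookup_tables(share_lang_emb=True, share_decpro_emb=True,
                          share_encdec_emb=True))                       # 1
```

The printed sequence 6, 3, 2, 1 matches the counts in the answer above.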
Also, this should be helpful to understand which parameters are related, and not independent: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/__init__.py#L23-L31
It checks that parameters are valid and not contradictory with each other. For instance, `assert not params.share_output_emb or params.share_lang_emb` says that if we share the output embeddings, then necessarily we share the source and target embeddings, i.e. `share_lang_emb` has to be `True`.
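The quoted check uses the `not A or B` idiom for material implication ("A implies B"). A minimal sketch of how such a consistency check behaves (the wrapper function is hypothetical; only the assertion itself is quoted from the repo):

```python
from types import SimpleNamespace

def check_params(params):
    # "assert not A or B" reads as "A implies B": sharing the output
    # embeddings requires sharing the language embeddings.
    assert not params.share_output_emb or params.share_lang_emb, \
        "share_output_emb requires share_lang_emb"

check_params(SimpleNamespace(share_output_emb=False, share_lang_emb=False))  # passes
check_params(SimpleNamespace(share_output_emb=True, share_lang_emb=True))    # passes
# check_params(SimpleNamespace(share_output_emb=True, share_lang_emb=False))
# -> would raise AssertionError at startup, before any training happens
```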
Overall, I would not worry too much about this, and I would just suggest sharing everything. Sharing everything should not give you something very far from the best performance you may get by not sharing some specific layers.
Hope this helps.
Thanks a lot Guillaume for such a detailed response.