Comments (2)
- You can check the argparse; this should be helpful: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/main.py#L78-L91
  Also, see 2) below.
- Sharing all encoder and decoder layers (`share_enc` and `share_dec`) is usually good, or not far from the best you could get. Sharing the language embeddings (`share_lang_emb`) helps if the languages are related (like English-French), but is not very useful if they are very distant or have different alphabets (like English-Russian). Sharing the decoder input with the output embeddings (`share_decpro_emb`) or sharing the output embeddings with the input ones (`share_output_emb`) usually doesn't make a big difference. I would suggest setting these to `True`, as it might help in very low-resource scenarios.
- Not exactly. You will be sharing the first 2 layers of the decoder, but the last 2 layers of the encoder. The amount of sharing grows with proximity to the latent state. See https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L60 for the encoder, and https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L166 for the decoder.
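The direction of sharing described above can be sketched in a few lines of Python (a hypothetical helper, not code from the repo): with `n_shared` shared layers on each side, the encoder shares its last layers and the decoder its first layers, i.e. always the layers nearest the latent state.

```python
def shared_layer_indices(n_layers, n_shared, side):
    """Return 0-based indices of the layers shared across languages.

    Sharing grows toward the latent state: the encoder shares its
    LAST `n_shared` layers, the decoder its FIRST `n_shared` layers.
    """
    assert 0 <= n_shared <= n_layers
    if side == "encoder":
        return list(range(n_layers - n_shared, n_layers))
    return list(range(n_shared))

# With 4 layers per side and 2 shared on each:
print(shared_layer_indices(4, 2, "encoder"))  # [2, 3] -> last two encoder layers
print(shared_layer_indices(4, 2, "decoder"))  # [0, 1] -> first two decoder layers
```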
- For this you should set `share_lang_emb` to `False`.
- I recognize this is a bit tricky, and some parameters can have a different effect depending on the others; they are not all totally independent. I would suggest looking at these few lines: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L184-L198. It is probably much simpler to look at these to understand what is shared under which condition.
- Yes, you need a shared vocabulary to share the lookup tables. `share_lang_emb = True` is only possible if the vocabulary is the same for the 2 languages. In total, there are 6 lookup tables: 2 for the encoder input, 2 for the decoder input, and 2 for the decoder output. If you set `share_lang_emb = True`, it becomes 1 for the encoder input, 1 for the decoder input, and 1 for the decoder output. If you also set `share_decpro_emb = True`, you only have 1 lookup table in the encoder and 1 in the decoder. If you additionally set `share_encdec_emb = True`, you only have one lookup table in the end.
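The table counting above can be verified with a small sketch (hypothetical, not repo code): treat each (role, language) pair as one embedding table, merge the tables that a flag ties together, and count the remaining equivalence classes.

```python
def count_lookup_tables(n_langs=2, share_lang_emb=False,
                        share_decpro_emb=False, share_encdec_emb=False):
    """Count distinct embedding tables under the sharing flags.

    One table per (role, language): encoder input, decoder input,
    decoder output. Flags merge tables; a tiny union-find counts
    the surviving equivalence classes.
    """
    tables = [(role, lang) for role in ("enc_in", "dec_in", "dec_out")
              for lang in range(n_langs)]
    parent = {t: t for t in tables}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    if share_lang_emb:  # tie each role across languages
        for lang in range(1, n_langs):
            for role in ("enc_in", "dec_in", "dec_out"):
                union((role, lang), (role, 0))
    for lang in range(n_langs):
        if share_decpro_emb:  # tie decoder output to decoder input
            union(("dec_out", lang), ("dec_in", lang))
        if share_encdec_emb:  # tie encoder input to decoder input
            union(("enc_in", lang), ("dec_in", lang))
    return len({find(t) for t in tables})

print(count_lookup_tables())                                            # 6
print(count_lookup_tables(share_lang_emb=True))                         # 3
print(count_lookup_tables(share_lang_emb=True, share_decpro_emb=True))  # 2
print(count_lookup_tables(share_lang_emb=True, share_decpro_emb=True,
                          share_encdec_emb=True))                       # 1
```

The printed sequence 6, 3, 2, 1 matches the counts in the answer above.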
Also, this should be helpful to understand which parameters are related, and not independent: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/__init__.py#L23-L31
It checks that parameters are valid and not contradictory with each other. For instance, `assert not params.share_output_emb or params.share_lang_emb` says that if we share the output embeddings, then necessarily we share the source and target embeddings, i.e. `share_lang_emb` has to be `True`.
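The quoted check uses the `not A or B` idiom for material implication ("A implies B"). A minimal sketch of how such a consistency check behaves (the wrapper function is hypothetical; only the assertion itself is quoted from the repo):

```python
from types import SimpleNamespace

def check_params(params):
    # "assert not A or B" reads as "A implies B": sharing the output
    # embeddings requires sharing the language embeddings.
    assert not params.share_output_emb or params.share_lang_emb, \
        "share_output_emb requires share_lang_emb"

check_params(SimpleNamespace(share_output_emb=False, share_lang_emb=False))  # passes
check_params(SimpleNamespace(share_output_emb=True, share_lang_emb=True))    # passes
# check_params(SimpleNamespace(share_output_emb=True, share_lang_emb=False))
# -> would raise AssertionError at startup, before any training happens
```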
Overall, I would not worry too much about this, and I would just suggest sharing everything. Sharing everything should not give you something very far from the best performance you may get by not sharing some specific layers.
Hope this helps.
Thanks a lot Guillaume for such a detailed response.