
Comments (2)

glample commented on July 17, 2024
  1. You can check the argparse definitions, which should be helpful: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/main.py#L78-L91
    Also, see 2) below:

  2. Sharing all encoder and decoder layers (share_enc and share_dec) is usually good, or not far from the best you could get. Sharing the language embeddings (share_lang_emb) helps if the languages are related (like English-French), but is not very useful if they are very distant or use different alphabets (like English-Russian). Sharing the decoder input embeddings with the output embeddings (share_decpro_emb), or sharing the output embeddings with the input ones (share_output_emb), usually doesn't make a big difference. I would suggest setting these options to True, as it might help in very low-resource scenarios.

  3. Not exactly. You will be sharing the first 2 layers of the decoder, but the last 2 layers of the encoder. Sharing starts at the layers closest to the latent state and extends outward from there.
    See:
    https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L60 for the encoder, and:
    https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L166
    for the decoder.

  4. For this, you should set share_lang_emb to False.

  5. I recognize this is a bit tricky, and some parameters can have a different effect depending on the others. In other words, they are not all independent. I would suggest looking at these few lines:
    https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L184-L198
    Reading them is probably the simplest way to understand what is shared under which conditions.

  6. Yes, you need to have a shared vocabulary to share the lookup tables. share_lang_emb = True is only possible if the vocabulary is the same for the 2 languages. In total, there are 6 lookup tables: 2 for the encoder input, 2 for the decoder input, and 2 for the decoder output. If you set share_lang_emb = True, it becomes 1 for the encoder input, 1 for the decoder input, and 1 for the decoder output.
    If you also set share_decpro_emb = True you only have 1 lookup table in the encoder, and 1 in the decoder. If you also set share_encdec_emb = True, you only have one lookup table in the end.
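
The lookup-table counts in point 6 can be reproduced with a small sketch. This is not the repository's code: the function name count_lookup_tables and the role labels are hypothetical, and the way each flag ties tables together is only my reading of the description above (share_lang_emb ties each table across languages, share_decpro_emb ties decoder input to decoder output, share_encdec_emb ties encoder input to decoder input).

```python
from itertools import product

# Hypothetical labels for the three kinds of lookup tables named in the thread.
ROLES = ("enc_in", "dec_in", "dec_out")


def count_lookup_tables(share_lang_emb=False, share_decpro_emb=False,
                        share_encdec_emb=False, n_langs=2):
    """Count distinct lookup tables after tying, using a tiny union-find."""
    tables = list(product(ROLES, range(n_langs)))
    parent = {t: t for t in tables}

    def find(t):
        # Follow parent links up to the representative of the group.
        while parent[t] != t:
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    if share_lang_emb:
        # Assumed: one table per role, shared by all languages.
        for role in ROLES:
            for lang in range(1, n_langs):
                union((role, 0), (role, lang))
    for lang in range(n_langs):
        if share_decpro_emb:
            # Assumed: decoder input and decoder output use the same table.
            union(("dec_in", lang), ("dec_out", lang))
        if share_encdec_emb:
            # Assumed: encoder input and decoder input use the same table.
            union(("enc_in", lang), ("dec_in", lang))

    return len({find(t) for t in tables})


print(count_lookup_tables())                                # 6
print(count_lookup_tables(share_lang_emb=True))             # 3
print(count_lookup_tables(share_lang_emb=True,
                          share_decpro_emb=True))           # 2
print(count_lookup_tables(share_lang_emb=True,
                          share_decpro_emb=True,
                          share_encdec_emb=True))           # 1
```

The four printed values match the 6, 3, 2 and 1 tables described above. Combinations not mentioned in the thread (for example share_decpro_emb without share_lang_emb) may be rejected by the repository's parameter checks, so the sketch's output for them should not be taken as meaningful.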

Also, this should help you understand which parameters are related rather than independent:
https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/__init__.py#L23-L31
It checks that the parameters are valid and do not contradict each other. For instance, the line "assert not params.share_output_emb or params.share_lang_emb" says that if we share the output embeddings, then we must also share the source and target embeddings, so share_lang_emb has to be True.
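
To see why "not A or B" reads as "A implies B", here is a minimal sketch that evaluates the quoted condition for all four flag combinations. Only the condition itself comes from the thread; the helper name output_sharing_is_consistent and the use of SimpleNamespace as a stand-in for the parsed parameters are assumptions made for illustration.

```python
from itertools import product
from types import SimpleNamespace


def output_sharing_is_consistent(params):
    # Same condition as the assertion quoted above, returned as a boolean
    # instead of raising: share_output_emb implies share_lang_emb.
    return bool((not params.share_output_emb) or params.share_lang_emb)


for share_output_emb, share_lang_emb in product([False, True], repeat=2):
    # SimpleNamespace is a stand-in for the argparse result (an assumption).
    params = SimpleNamespace(share_output_emb=share_output_emb,
                             share_lang_emb=share_lang_emb)
    status = "ok" if output_sharing_is_consistent(params) else "rejected"
    print(f"share_output_emb={share_output_emb!s:5} "
          f"share_lang_emb={share_lang_emb!s:5} -> {status}")
```

Only the combination share_output_emb=True with share_lang_emb=False is rejected; the other three pass.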

Overall, I would not worry too much about this, and I would just suggest sharing everything. Sharing everything should get you close to the best performance you could reach by leaving some specific layers unshared.
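
Point 3 and the "share everything" advice can be illustrated with a short sketch of which layer indices end up shared. It is an illustration under assumptions, not the repository's implementation: the function name shared_layer_indices is hypothetical, layers are indexed from 0, and the 4-layer model in the example is made up. The only thing taken from the thread is that the encoder shares its last layers and the decoder its first layers.

```python
def shared_layer_indices(n_layers, n_shared, side):
    """Return the 0-based indices of layers shared across languages.

    Encoder: the last n_shared layers (closest to the latent state).
    Decoder: the first n_shared layers (closest to the latent state).
    """
    if not 0 <= n_shared <= n_layers:
        raise ValueError("n_shared must be between 0 and n_layers")
    if side == "encoder":
        return list(range(n_layers - n_shared, n_layers))
    if side == "decoder":
        return list(range(n_shared))
    raise ValueError("side must be 'encoder' or 'decoder'")


# Hypothetical 4-layer model with 2 shared layers on each side.
print(shared_layer_indices(4, 2, "encoder"))  # [2, 3]
print(shared_layer_indices(4, 2, "decoder"))  # [0, 1]

# Sharing everything: every layer index is shared on both sides.
print(shared_layer_indices(4, 4, "encoder"))  # [0, 1, 2, 3]
print(shared_layer_indices(4, 4, "decoder"))  # [0, 1, 2, 3]
```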

Hope this helps.


ashim95 commented on July 17, 2024

Thanks a lot Guillaume for such a detailed response.
