Comments (7)

zhangguanqun avatar zhangguanqun commented on August 30, 2024

The reference paper "Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder" limits the translation candidates by adding a target-language indicator token at the beginning and the end of every source sentence.
Another reference paper, "Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation", also adds the target-language token at the beginning of every source sentence.
Why does your paper add a source-language indicator rather than a target-language indicator?
If the source sentence carries only its own language indicator, how can I specify the target language when I have only the pre-trained model?

from mrasp.

linzehui avatar linzehui commented on August 30, 2024

Following mBART, we add a source-language indicator on the source side and a target-language indicator on the target side.
If you are using fairseq, you may want to modify the sequence generator to specify the target language.
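
To make the difference between the two conventions concrete, here is a minimal sketch of how a sentence pair could be tagged under each one. The function names are hypothetical, the `LANG_TOK_XX` spelling is borrowed from a later comment in this thread, and prepending the token (rather than appending it) is an assumption for illustration, not a statement of what the mRASP preprocessing does.

```python
from typing import Tuple


def lang_token(lang: str) -> str:
    """Build a language indicator token (hypothetical format, e.g. LANG_TOK_DE)."""
    return f"LANG_TOK_{lang.upper()}"


def tag_target_on_source(src: str, tgt: str, tgt_lang: str) -> Tuple[str, str]:
    """Convention from the question: the target-language token goes on the source."""
    return f"{lang_token(tgt_lang)} {src}", tgt


def tag_both_sides(src: str, tgt: str, src_lang: str, tgt_lang: str) -> Tuple[str, str]:
    """Convention from the reply: source token on the source, target token on the target."""
    return f"{lang_token(src_lang)} {src}", f"{lang_token(tgt_lang)} {tgt}"


print(tag_target_on_source("hello", "hallo", "de"))
print(tag_both_sides("hello", "hallo", "en", "de"))
```

Under the second convention the target-language token is part of the decoder output, which is why the target language has to be chosen on the decoder side at inference time.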

See fairseq/sequence_generator.py: the function generate(models, sample, prefix_tokens=None, bos_token=None) lets you force the decoder to begin with the given prefix_tokens, which is where you can specify the target language.

lang_prefix_id = self.target_dictionary.index(self.args.lang_prefix_tok)
return generator.generate(models, sample, prefix_tokens=lang_prefix_id, bos_token=bos_id)
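
The idea behind forcing a prefix can be illustrated without fairseq. The sketch below is a toy greedy decoder in plain Python, not fairseq's implementation: `greedy_decode_with_prefix`, `toy_model`, and the tiny vocabulary are all hypothetical stand-ins. Note also that fairseq appears to expect `prefix_tokens` as a tensor with one row per sentence, so a bare id as in the snippet above would probably need to be expanded first; treat that as an assumption about the fairseq version in use.

```python
from typing import Callable, List

# Toy vocabulary (hypothetical); a real one comes from the model's dictionary.
vocab = {"</s>": 0, "LANG_TOK_DE": 1, "LANG_TOK_FR": 2, "hallo": 3, "bonjour": 4}


def greedy_decode_with_prefix(
    next_token: Callable[[List[int]], int],
    prefix: List[int],
    eos_id: int,
    max_len: int = 10,
) -> List[int]:
    """Greedy decoding in which the first len(prefix) output tokens are forced."""
    output: List[int] = []
    for step in range(max_len):
        if step < len(prefix):
            token = prefix[step]  # forced step: the model's own choice is ignored
        else:
            token = next_token(output)  # free step: ask the model
        output.append(token)
        if token == eos_id:
            break
    return output


def toy_model(history: List[int]) -> int:
    """Stand-in for a decoder: its output depends on the leading language token."""
    if len(history) >= 2:
        return vocab["</s>"]
    if history and history[0] == vocab["LANG_TOK_FR"]:
        return vocab["bonjour"]
    return vocab["hallo"]


print(greedy_decode_with_prefix(toy_model, [vocab["LANG_TOK_DE"]], vocab["</s>"]))
print(greedy_decode_with_prefix(toy_model, [vocab["LANG_TOK_FR"]], vocab["</s>"]))
```

Because the decoder conditions on everything generated so far, fixing the first token to a language indicator steers the rest of the output toward that language.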

zhangguanqun avatar zhangguanqun commented on August 30, 2024

Wow, that is the key. Thank you!

10211412 avatar 10211412 commented on August 30, 2024

How exactly do I change the indicator token for the language I want? Do I edit the code inside the pip-installed fairseq package?

10211412 avatar 10211412 commented on August 30, 2024

The signature has prefix_tokens: Optional[Tensor] = None. What exactly should these tokens be set to?

10211412 avatar 10211412 commented on August 30, 2024

Setting it to LANG_TOK_DE raises an error.

linzehui avatar linzehui commented on August 30, 2024

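@10211412 It has been a long time and I don't remember the details. The rough idea: clone fairseq from GitHub, check out the version from two years ago, and run pip install -e . in it. Then modify fairseq/sequence_generator.py to force the output to begin with the specified token. Note that this language token must be present in the vocabulary.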
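
One way to check the last point before touching the generator is to confirm that the language tokens are actually in the dictionary. The sketch below assumes a plain-text dictionary with one "token count" pair per line, which is the usual fairseq dict.txt layout; the helper names and the sample contents are hypothetical.

```python
import io
from typing import Iterable, List, Set


def read_vocab(lines: Iterable[str]) -> Set[str]:
    """Collect the token (first whitespace-separated field) from each line."""
    tokens: Set[str] = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            tokens.add(fields[0])
    return tokens


def missing_lang_tokens(lines: Iterable[str], required: List[str]) -> List[str]:
    """Return the required tokens that do not appear in the dictionary."""
    vocab = read_vocab(lines)
    return [tok for tok in required if tok not in vocab]


# In practice this would be open("dict.txt", encoding="utf-8"); the path is an assumption.
sample_dict = io.StringIO("hallo 120\nwelt 80\nLANG_TOK_EN 1\nLANG_TOK_DE 1\n")
print(missing_lang_tokens(sample_dict, ["LANG_TOK_DE", "LANG_TOK_FR"]))
```

Any token reported as missing cannot be used as a forced prefix with that dictionary.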
