Hi! I tried to reproduce your work (for now, only the en->tr Transformer baseline) using both fairseq and your system, following the exact data preprocessing and hyperparameter settings in your configs. However, I only get a sacreBLEU score of 9.7 instead of 11.2. I would appreciate it if you could double-check the code or the hyperparameter settings.
Below is the result I got. PS: I also scored with the sacrebleu version pinned in your requirements.txt, and it yields the same score.
{
"name": "BLEU",
"score": 9.7,
"signature": "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.3.1",
"verbose_score": "38.5/13.9/6.0/2.7 (BP = 1.000 ratio = 1.042 hyp_len = 55984 ref_len = 53731)",
"nrefs": "1",
"case": "mixed",
"eff": "no",
"tok": "13a",
"smooth": "exp",
"version": "2.3.1"
}
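
As a sanity check that the numbers above are internally consistent, here is a minimal sketch that recomputes the corpus score from the `verbose_score` fields. It is not taken from your codebase or from sacrebleu; it assumes the standard BLEU definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions), and the function and variable names are my own.

```python
import math

# Values copied from the verbose_score line above.
precisions = [38.5, 13.9, 6.0, 2.7]  # 1- to 4-gram precisions, in percent
hyp_len = 55984                      # total hypothesis length in tokens
ref_len = 53731                      # total reference length in tokens


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty: 1.0 unless the hypothesis is shorter."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def bleu_from_stats(precisions: list[float], hyp_len: int, ref_len: int) -> float:
    """BP times the geometric mean of the n-gram precisions (in percent)."""
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(hyp_len, ref_len) * math.exp(log_mean)


ratio = hyp_len / ref_len
bp = brevity_penalty(hyp_len, ref_len)
bleu = bleu_from_stats(precisions, hyp_len, ref_len)

print(f"ratio = {ratio:.3f}, BP = {bp:.3f}, BLEU = {bleu:.2f}")
```

Because the precisions in `verbose_score` are already rounded to one decimal, the recomputed value only matches the reported 9.7 to within rounding. The hypotheses are about 4% longer than the references, so the brevity penalty is 1.0 and does not explain the gap to 11.2.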