unbabel / openkiwi

Open-Source Machine Translation Quality Estimation in PyTorch

Home Page: https://unbabel.github.io/OpenKiwi/

License: GNU Affero General Public License v3.0

Python 96.55% Mathematica 3.45%

machine-translation quality-estimation pytorch translation-quality-estimation openkiwi pytorch-lightning

openkiwi's Introduction

Quality estimation (QE) is one of the missing pieces of machine translation: its goal is to evaluate a translation system's quality without access to reference translations. We present OpenKiwi, a PyTorch-based open-source framework that implements the best QE systems from the WMT 2015-18 shared tasks, making it easy to experiment with these models under the same framework. Using OpenKiwi and a stacked combination of these models, we have achieved state-of-the-art results on word-level QE on the WMT 2018 English-German dataset.

News

An experimental demonstration interface called OpenKiwi Tasting has been released on GitHub and can be checked out in Streamlit Share.
A new major version (2.0.0) of OpenKiwi has been released, introducing HuggingFace Transformers support and adopting PyTorch Lightning. For a condensed view of the changes, check the changelog.
Following our nomination in early July, we are happy to announce that we won the Best Demo Paper award at ACL 2019! Congratulations to the whole team, and huge thanks to our supporters and issue reporters.
Check out the published paper.
We have released the OpenKiwi tutorial we presented at MT Marathon 2019.

Features

Framework for training QE models and using pre-trained models for evaluating MT.
Supports both word- and sentence-level (HTER or z-score) quality estimation.
Implementation of five QE systems in PyTorch: NuQE [2, 3], predictor-estimator [4, 5], BERT-Estimator [6], XLM-Estimator [6] and XLMR-Estimator.
Older systems only supported in versions <=2.0.0: QUETCH [1], APE-QE [3] and a stacked ensemble with a linear system [2, 3].
Easy-to-use API. Import it as a package in other projects or run it from the command line.
Easy to track and reproduce experiments via YAML configuration files.
Based on PyTorch Lightning, making the code easier to scale, use, and keep up to date with engineering advances.
Implemented using HuggingFace Transformers library to allow easy access to state-of-the-art pre-trained models.

Quick Installation

To install OpenKiwi as a package, simply run

pip install openkiwi

You can now

import kiwi

inside your project or run in the command line

kiwi

Optionally, if you'd like to take advantage of our MLflow integration, simply install it in the same virtualenv as OpenKiwi:

pip install openkiwi[mlflow]

Getting Started

Detailed usage examples and instructions can be found in the Full Documentation.

Contributing

We welcome contributions to improve OpenKiwi. Please refer to CONTRIBUTING.md for quick instructions, or to the contributing instructions for more details on how to set up your development environment.

License

OpenKiwi is Affero GPL licensed. You can see the details of this license in LICENSE.

Citation

If you use OpenKiwi, please cite the following paper: OpenKiwi: An Open Source Framework for Quality Estimation.

@inproceedings{openkiwi,
    author = {Fábio Kepler and
              Jonay Trénous and
              Marcos Treviso and
              Miguel Vera and
              André F. T. Martins},
    title  = {Open{K}iwi: An Open Source Framework for Quality Estimation},
    year   = {2019},
    booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics--System Demonstrations},
    pages  = {117--122},
    month  = {July},
    address = {Florence, Italy},
    url    = {https://www.aclweb.org/anthology/P19-3020},
    organization = {Association for Computational Linguistics},
}

References

[1] Kreutzer et al. (2015): QUality Estimation from ScraTCH (QUETCH): Deep Learning for Word-level Translation Quality Estimation

[2] Martins et al. (2016): Unbabel's Participation in the WMT16 Word-Level Translation Quality Estimation Shared Task

[3] Martins et al. (2017): Pushing the Limits of Translation Quality Estimation

[4] Kim et al. (2017): Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation

[5] Wang et al. (2018): Alibaba Submission for WMT18 Quality Estimation Task

[6] Kepler et al. (2019): Unbabel’s Participation in the WMT19 Translation Quality Estimation Shared Task

openkiwi's Issues

Unable to predict gap tags

Describe the bug
In our research we wanted to use QUETCH/NuQE for baseline QE. It mostly works, but we were unable to start predicting gap tags (WMT18+).

To Reproduce

Download en-de WMT19 data (or WMT18)
Use the default quetch YAML config
Add wmt18-format: true and predict-gaps: true
See error

Expected behavior
QUETCH trains with gap prediction enabled.

Encountered behavior

ValueError: Expected input batch_size (2176) to match target batch_size (2240).

Additional context
The numbers suggest what the problem might be: they differ by 64, which is also the batch size, so each sentence seems to be supplied with one extra target tag. When we deleted the first tag from every target sentence, training started. The data is, however, correct: for a target sentence of length T there should be 2*T+1 tags, not 2*T.

I've checked the source code for possible errors, but everything seems to be fine:

def wmt18_to_gaps(batch, *args):
    """Extract gap tags from wmt18 format file.
    """
    return batch[::2]
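The two numbers in the error are consistent with a one-per-sentence off-by-one between word positions and gap tags. The sketch below only illustrates that arithmetic under our own assumption (the model emits one gap prediction per target word, while WMT18 data has T+1 gap tags per sentence); `split_wmt18_target_tags` and `expected_counts` are hypothetical helpers, not OpenKiwi functions.

```python
from typing import List, Sequence, Tuple


def split_wmt18_target_tags(tags: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split one sentence's interleaved WMT18 target tags into gaps and words.

    WMT18 target tags alternate gap, word, gap, ..., gap, so a sentence of
    T words has 2*T + 1 tags: T + 1 gap tags at even positions (the same
    slice as wmt18_to_gaps) and T word tags at odd positions.
    """
    if len(tags) % 2 != 1:
        raise ValueError(f"expected 2*T+1 tags, got {len(tags)}")
    return list(tags[::2]), list(tags[1::2])


def expected_counts(sentence_lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (total word positions, total gap tags) for a batch of sentences."""
    words = sum(sentence_lengths)
    gaps = sum(length + 1 for length in sentence_lengths)
    return words, gaps


# Hypothetical batch: 64 sentences of 34 words each.
print(expected_counts([34] * 64))  # prints (2176, 2240)
```

With 64 sentences totalling 2176 words, one gap tag more per sentence gives 2240, matching the input and target sizes in the error message; whether that is the actual cause in OpenKiwi is not established here.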

Thank you for pointing us in the direction of solving this issue.

Add Streamlit demo

This demo, created with Streamlit, lets you inspect the predictions of a Kiwi model.

training failed when only predicting source tags

When I train BERT-based QE models to predict only the source tags, training fails right after reporting the metric on the validation dataset. It seems that "self.monitor_op(current - self.min_delta, self.best)" can only run on CPU. When I train to predict all tags (source tags, target tags and the others), training runs fine.

Error training estimator model

Hi~, thank you for releasing the code.

I got the following error when training the estimator model (kiwi train --config experiments/train_estimator.yaml):

Traceback (most recent call last):
  File "/home/***/anaconda3/bin/kiwi", line 11, in <module>
    sys.exit(main())
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 75, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 95, in train_epoch
    outputs = self.train_step(batch)
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 139, in train_step
    model_out = self.model(batch)
  File "/home/***/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 349, in forward
    sentence_input = self.make_sentence_input(h_tgt, h_src)
  File "/home/***/anaconda3/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 418, in make_sentence_input
    h = h_tgt[0] if h_tgt else h_src[0]
TypeError: 'NoneType' object is not subscriptable

I am training the estimator model only on sentence-level QE data, and I have checked my data. Here are my settings:

token-level: True
sentence-level: True
sentence-ll: False
binary-level: False
predict-target: False
predict-source: False
predict-gaps: false

Could you give me some advice on solving this error? Thank you very much!
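The last frame shows `h = h_tgt[0] if h_tgt else h_src[0]`: when the target features are missing, the code falls back to the source ones, and if both are None the subscript fails with exactly this TypeError. Below is a hypothetical guarded version (our own sketch, not OpenKiwi code) that turns that case into an explicit error; why both inputs end up None with the settings above is not established here.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_hidden_state(h_tgt: Optional[Sequence[T]], h_src: Optional[Sequence[T]]) -> T:
    """Hypothetical guard around `h_tgt[0] if h_tgt else h_src[0]`.

    Prefers the target-side features, falls back to the source side, and
    raises a descriptive error instead of a TypeError when both are missing.
    """
    if h_tgt:
        return h_tgt[0]
    if h_src:
        return h_src[0]
    raise ValueError(
        "no target or source features available for the sentence-level input; "
        "check which outputs the estimator is configured to compute"
    )
```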

how fast is the predictor training supposed to be?

I'm training a predictor model on 10 million sentence pairs using a GPU. Sentence length is limited to 1-50. The batch size is 128. According to the log, training one epoch should take about a day. Is this performance normal? Which layer of the network is the most time-consuming?
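A quick back-of-the-envelope check of those numbers, assuming "about a day" means 24 hours and every batch holds 128 pairs: one epoch is 78,125 batches, i.e. roughly 1.1 seconds per batch. Whether that is normal depends on hardware and model size, which the question does not give.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def batches_per_epoch(num_pairs: int, batch_size: int) -> int:
    # Ceiling division: a final, partial batch still counts as a batch.
    return (num_pairs + batch_size - 1) // batch_size


def seconds_per_batch(num_pairs: int, batch_size: int, epoch_seconds: float) -> float:
    return epoch_seconds / batches_per_epoch(num_pairs, batch_size)


n = batches_per_epoch(10_000_000, 128)
print(n, round(seconds_per_batch(10_000_000, 128, SECONDS_PER_DAY), 2))  # prints 78125 1.11
```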

Documentation: Can't find yaml configuration files for QUETCH

Hi,

Could you share YAML files for training, prediction, and evaluation?

Achyuth

Hi, I am trying to load the model once and then write predictions to stdout as input is fed to stdin. I cannot find any module in the Python package that does this.
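The load-once pattern itself can be sketched as a small loop that keeps the model in memory and reads from stdin. This is a sketch under our own assumptions: one "source<TAB>target" pair per input line, and a `predict` callable (hypothetical) that wraps an already-loaded model and returns one score per pair.

```python
from typing import Callable, Iterable, List, TextIO

# Hypothetical signature: parallel lists of source and target sentences in,
# one sentence-level score per pair out.
PredictFn = Callable[[List[str], List[str]], List[float]]


def stream_predictions(lines: Iterable[str], predict: PredictFn, out: TextIO,
                       batch_size: int = 1) -> None:
    """Read 'source<TAB>target' lines and write one score per line.

    The caller loads the model once and wraps it in `predict`. With the
    default batch_size=1, each answer is written as soon as its line arrives;
    larger batches trade latency for throughput.
    """
    sources: List[str] = []
    targets: List[str] = []

    def flush() -> None:
        if sources:
            for score in predict(sources, targets):
                out.write(f"{score}\n")
            out.flush()
            sources.clear()
            targets.clear()

    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        src, tgt = line.rstrip("\n").split("\t", 1)
        sources.append(src)
        targets.append(tgt)
        if len(sources) >= batch_size:
            flush()
    flush()
```

Calling `stream_predictions(sys.stdin, predict, sys.stdout)` runs the loop on the terminal streams. For OpenKiwi 2.x, the documentation describes loading a trained model through `kiwi.lib.predict.load_system` and calling `.predict(source=..., target=...)` on the result; a thin wrapper around that could serve as `predict`, but check the API of your installed version first.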

Cannot find all five models.

Where is APE-QE?
I know my question is dumb, but please help me. I am a poor guy who can only afford one hour of computer time per day.
Now I have to go feed the sheep.

Failed to run Predictor-Estimator prediction

After training a predictor model on zh-en data, I ran the prediction step with the following command:
kiwi predict --model estimator --test-source /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.source --test-target /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.target --sentence-level True --gpu-id 0 --output-dir /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/
I got the following error:
[kiwi.lib.predict setup:159] {'batch_size': 64,
'config': None,
'debug': False,
'experiment_name': None,
'gpu_id': 0,
'load_data': None,
'load_model': None,
'load_vocab': None,
'log_interval': 100,
'mlflow_always_log_artifacts': False,
'mlflow_tracking_uri': 'mlruns/',
'model': 'estimator',
'output_dir': '/home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/',
'quiet': False,
'run_uuid': None,
'save_config': None,
'save_data': None,
'seed': 42}

Traceback (most recent call last):
  File "/home/hzli/anaconda3/bin/kiwi", line 11, in <module>
    sys.exit(main())
  File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/main.py", line 73, in cli
    predict.main(extra_args)
  File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/pipelines/predict.py", line 56, in main
    predict.predict_from_options(options)
  File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/predict.py", line 54, in predict_from_options
    run(options.model_api, output_dir, options.pipeline, options.model)
  File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/predict.py", line 113, in run
    model = Model.create_from_file(pipeline_opts.load_model)
  File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/model.py", line 210, in create_from_file
    str(path), map_location=lambda storage, loc: storage
  File "/home/hzli/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 356, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'None'
I spent a lot of time trying to find the cause of the error, but nothing worked.
Could you please give me some advice on solving this error? Thank you very much!

Relative links break when displayed in docs

Describe the bug
Relative links to files, which are common in the README, break when rendered in the docs.

To Reproduce
Go to docs and click on the changelog link in the news section.

Expected behavior
Should take you to a nice doc or the correct file.

Output format names are somewhat confusing

The options for defining the output formats, wmt18 and wmt17, can be somewhat confusing. I think something like with-gaps would be clearer, especially to newcomers to QE.

This is particularly misleading when running an evaluate config on source tags. If the format is set to wmt18, the script assumes that the system output contains gap tags, which was not the case for source tags in WMT 2018.

Word level QE reproduction

As mentioned in the email we sent to all of the paper authors, we (@zouharvi, @obo) trained the predictor and then the estimator on our custom data, but the results were almost random.

Describe the bug
While trying to find the problem we tried to reproduce the WMT result based on your pre-trained models, as mentioned in the documentation. There must be some systematic mistake we're making because the pre-trained estimator produces almost random results.

To Reproduce
Run in an empty directory. The script downloads the model and then tries to estimate the quality of the first sentence from the training dataset in WMT18.

wget https://github.com/Unbabel/OpenKiwi/releases/download/0.1.1/en_de.nmt_models.zip
unzip -n en_de.nmt_models.zip

mkdir output input
echo "the part of the regular expression within the forward slashes defines the pattern ." > ./input/test.src
echo "der Teil des regulären Ausdrucks innerhalb der umgekehrten Schrägstrich definiert das Muster ." > ./input/test.trg

kiwi predict \
--config ./en_de.nmt_models/estimator/target_1/predict.yaml \
--load-model ./en_de.nmt_models/estimator/target_1/model.torch \
--experiment-name "Single line test" \
--output-dir output \
--gpu-id -1 \
--test-source ./input/test.src \
--test-target ./input/test.trg

cat output/tags

Expected result

OK OK OK OK OK OK OK OK OK OK OK OK OK BAD OK OK OK BAD OK OK OK OK OK OK OK OK OK

The gold annotation of course also contains the gap tags, but even so, most of the sentence is labeled OK, which contradicts the model output (many values close to zero).

Actual result

0.04104529693722725 0.013736072927713394 0.011828877963125706 0.014644734561443329 0.022598857060074806 0.10979203879833221 0.8875276446342468 0.711827278137207 0.9585599303245544 0.20660772919654846 0.22217749059200287  0.1782749891281128 0.012791415676474571
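One way to line the two outputs up, under two assumptions the issue does not state: each number is the probability of BAD for one target word, and a 0.5 threshold turns probabilities into tags. The gold sequence has 27 tags (2*13+1, gaps included) and the model output has 13 values, so the gold word tags are taken from the odd positions before comparing. `probs_to_tags` is a hypothetical helper.

```python
from typing import List, Sequence


def probs_to_tags(bad_probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Turn word-level probabilities into OK/BAD tags.

    Assumes each value is P(BAD) for one target word; the 0.5 default is an
    assumption, not a documented OpenKiwi setting.
    """
    return ["BAD" if p >= threshold else "OK" for p in bad_probs]


# Same sequence as the expected result above: 27 tags, gaps at even positions.
gold = ["OK"] * 13 + ["BAD"] + ["OK"] * 3 + ["BAD"] + ["OK"] * 9
# Model output from the actual result above: one value per target word.
probs = [0.04104529693722725, 0.013736072927713394, 0.011828877963125706,
         0.014644734561443329, 0.022598857060074806, 0.10979203879833221,
         0.8875276446342468, 0.711827278137207, 0.9585599303245544,
         0.20660772919654846, 0.22217749059200287, 0.1782749891281128,
         0.012791415676474571]

gold_words = gold[1::2]  # keep word tags, drop gap tags
predicted = probs_to_tags(probs)
matches = sum(g == p for g, p in zip(gold_words, predicted))
print(len(gold), len(gold_words), len(probs), matches)  # prints 27 13 13 12
```

Under these assumptions, the predicted tags differ from the gold word tags on one of the 13 words; a different reading of the numbers (for example, as P(OK)) would change the comparison accordingly.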

Environment (please complete the following information):

OS: Fedora 30, Ubuntu 18.04
OpenKiwi version 0.1.2
Python version 3.7.4

Error training nuqe model

When running kiwi train --config experiments/train_nuqe.yaml I ran into the following error:

2019-02-26 15:48:32.631 [root setup:380] This is run ID: 9124ced8667849acb40f10a124109234
2019-02-26 15:48:32.631 [root setup:383] Inside experiment ID: 0 (None)
2019-02-26 15:48:32.631 [root setup:386] Local output directory is: models/nuqe
2019-02-26 15:48:32.632 [root setup:389] Logging execution to MLflow at: mlruns/
2019-02-26 15:48:32.632 [root setup:397] Using CPU
2019-02-26 15:48:32.632 [root setup:400] Artifacts location: mlruns/0/9124ced8667849acb40f10a124109234/artifacts
2019-02-26 15:48:33.648 [kiwi.lib.train run:154] Training the NuQE model
2019-02-26 15:48:34.448 [kiwi.lib.train run:187] NuQE(
  (_loss): CrossEntropyLoss()
  (source_emb): Embedding(5372, 50, padding_idx=1)
  (target_emb): Embedding(7874, 50, padding_idx=1)
  (embeddings_dropout): Dropout(p=0.5)
  (linear_1): Linear(in_features=300, out_features=400, bias=True)
  (linear_2): Linear(in_features=400, out_features=400, bias=True)
  (linear_3): Linear(in_features=400, out_features=200, bias=True)
  (linear_4): Linear(in_features=200, out_features=200, bias=True)
  (linear_5): Linear(in_features=400, out_features=100, bias=True)
  (linear_6): Linear(in_features=100, out_features=50, bias=True)
  (linear_out): Linear(in_features=50, out_features=2, bias=True)
  (gru_1): GRU(400, 200, batch_first=True, bidirectional=True)
  (gru_2): GRU(200, 200, batch_first=True, bidirectional=True)
  (dropout_in): Dropout(p=0.0)
  (dropout_out): Dropout(p=0.0)
)
2019-02-26 15:48:34.449 [kiwi.lib.train run:188] 2313552 parameters
2019-02-26 15:48:34.449 [kiwi.trainers.trainer run:74] Epoch 1 of 10
Batches:   0%|                                    | 0/236 [00:00<?, ? batches/s]
Traceback (most recent call last):
  File "/Users/erick/.virtualenvs/kiwi/bin/kiwi", line 10, in <module>
    sys.exit(main())
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/trainers/trainer.py", line 75, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/trainers/trainer.py", line 95, in train_epoch
    outputs = self.train_step(batch)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/trainers/trainer.py", line 140, in train_step
    loss_dict = self.model.loss(model_out, batch)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/models/quetch.py", line 161, in loss
    loss = self._loss(predicted, y)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/torch/nn/functional.py", line 1788, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (256) to match target batch_size (576).

The .yaml file only differed from the original in the path to the training data.

This happened with

macOS 10.14.2
OpenKiwi 0.1.0
Python 3.7.2

Add calibration and active learning with Baal

Baal looks pretty cool.

Let's see if we can use it to select data for further training (active learning) or for model calibration (so that the predicted probabilities are more meaningful).
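Baal's own API is not used here. As a plain-Python illustration of the two ideas (our sketch, hypothetical function names): a binary calibration error that compares the mean predicted P(BAD) with the observed BAD rate in equal-width probability bins, and uncertainty sampling that ranks sentences by the mean binary entropy of their word-level P(BAD). Baal itself provides more elaborate acquisition heuristics (such as BALD over MC-dropout samples); plain entropy is a simpler stand-in.

```python
import math
from typing import List, Sequence, Tuple


def binary_calibration_error(probs: Sequence[float], labels: Sequence[int],
                             n_bins: int = 10) -> float:
    """Weighted mean |observed BAD rate - mean P(BAD)| over equal-width bins.

    probs are predicted P(BAD); labels are 1 for BAD and 0 for OK.
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    error = 0.0
    for b in bins:
        if b:
            mean_prob = sum(p for p, _ in b) / len(b)
            bad_rate = sum(y for _, y in b) / len(b)
            error += len(b) / len(probs) * abs(bad_rate - mean_prob)
    return error


def binary_entropy(p: float) -> float:
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def most_uncertain(sentence_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k sentences whose words have the highest mean entropy."""
    scores = [
        sum(binary_entropy(p) for p in probs) / len(probs) if probs else 0.0
        for probs in sentence_probs
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```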

Error occurred while using sentence-level Predictor-Estimator to predict

After successfully training the sentence-level Predictor and Estimator models, an error occurred when using the Estimator model to predict on sentence-level data.

The command is:
kiwi predict --config experiments_sl/predict_estimator.yaml

And the error is:
2019-04-22 07:19:37.521 [kiwi.lib.predict setup:159] {'batch_size': 64,
'config': 'experiments_sl/predict_estimator.yaml',
'debug': False,
'experiment_name': 'EN-ZH Pretrain Predictor',
'gpu_id': None,
'load_data': None,
'load_model': 'runs/0/464dc10bfc174ac79ca082eae0dea352/best_model.torch',
'load_vocab': None,
'log_interval': 100,
'mlflow_always_log_artifacts': False,
'mlflow_tracking_uri': 'mlruns/',
'model': 'estimator',
'output_dir': 'predictions/predest/ccmt/en_zh',
'quiet': False,
'run_uuid': None,
'save_config': None,
'save_data': None,
'seed': 42}
2019-04-22 07:19:37.521 [kiwi.lib.predict setup:160] Local output directory is: predictions/predest/ccmt/en_zh
2019-04-22 07:19:37.521 [kiwi.lib.predict run:100] Predict with the PredEst (Predictor-Estimator) model
Traceback (most recent call last):
  File "/home2/zyl/anaconda3/envs/openkiwi/bin/kiwi", line 10, in <module>
    sys.exit(main())
  File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/cli/main.py", line 73, in cli
    predict.main(extra_args)
  File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/cli/pipelines/predict.py", line 56, in main
    predict.predict_from_options(options)
  File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/lib/predict.py", line 54, in predict_from_options
    run(options.model_api, output_dir, options.pipeline, options.model)
  File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/lib/predict.py", line 113, in run
    model = Model.create_from_file(pipeline_opts.load_model)
  File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 214, in create_from_file
    model = Model.subclasses[model_name].from_dict(model_dict)
  File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 235, in from_dict
    model.load_state_dict(class_dict[const.STATE_DICT])
  File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Estimator:
Unexpected key(s) in state_dict: "predictor_tgt.W2", "predictor_tgt.V", "predictor_tgt.C", "predictor_tgt.S", "predictor_tgt.attention.scorer.layers.0.0.weight", "predictor_tgt.attention.scorer.layers.0.0.bias", "predictor_tgt.attention.scorer.layers.1.0.weight", "predictor_tgt.attention.scorer.layers.1.0.bias", "predictor_tgt.embedding_source.weight", "predictor_tgt.embedding_target.weight", "predictor_tgt.lstm_source.weight_ih_l0", "predictor_tgt.lstm_source.weight_hh_l0", "predictor_tgt.lstm_source.bias_ih_l0", "predictor_tgt.lstm_source.bias_hh_l0", "predictor_tgt.lstm_source.weight_ih_l0_reverse", "predictor_tgt.lstm_source.weight_hh_l0_reverse", "predictor_tgt.lstm_source.bias_ih_l0_reverse", "predictor_tgt.lstm_source.bias_hh_l0_reverse", "predictor_tgt.lstm_source.weight_ih_l1", "predictor_tgt.lstm_source.weight_hh_l1", "predictor_tgt.lstm_source.bias_ih_l1", "predictor_tgt.lstm_source.bias_hh_l1", "predictor_tgt.lstm_source.weight_ih_l1_reverse", "predictor_tgt.lstm_source.weight_hh_l1_reverse", "predictor_tgt.lstm_source.bias_ih_l1_reverse", "predictor_tgt.lstm_source.bias_hh_l1_reverse", "predictor_tgt.forward_target.weight_ih_l0", "predictor_tgt.forward_target.weight_hh_l0", "predictor_tgt.forward_target.bias_ih_l0", "predictor_tgt.forward_target.bias_hh_l0", "predictor_tgt.forward_target.weight_ih_l1", "predictor_tgt.forward_target.weight_hh_l1", "predictor_tgt.forward_target.bias_ih_l1", "predictor_tgt.forward_target.bias_hh_l1", "predictor_tgt.backward_target.weight_ih_l0", "predictor_tgt.backward_target.weight_hh_l0", "predictor_tgt.backward_target.bias_ih_l0", "predictor_tgt.backward_target.bias_hh_l0", "predictor_tgt.backward_target.weight_ih_l1", "predictor_tgt.backward_target.weight_hh_l1", "predictor_tgt.backward_target.bias_ih_l1", "predictor_tgt.backward_target.bias_hh_l1", "predictor_tgt.W1.weight".

Could you give some advice for solving this error? Thanks a lot!
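With a key list this long, grouping the unexpected keys by prefix makes the pattern easier to see. In the list above, every unexpected key starts with `predictor_tgt.`, which suggests the checkpoint contains a target-side predictor that the Estimator built at prediction time does not expect; why the two disagree (for example, different options at training and prediction time) is a guess the traceback does not confirm. The helper below is our own sketch, not part of OpenKiwi or PyTorch. In recent PyTorch versions, `module.load_state_dict(state_dict, strict=False)` returns the missing and unexpected keys instead of raising, which is a convenient source for such a list.

```python
from collections import Counter
from typing import Iterable


def count_by_prefix(keys: Iterable[str], depth: int = 1) -> Counter:
    """Count state_dict keys by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# A few of the unexpected keys from the error above.
unexpected = [
    "predictor_tgt.W2",
    "predictor_tgt.attention.scorer.layers.0.0.weight",
    "predictor_tgt.lstm_source.weight_ih_l0",
    "predictor_tgt.forward_target.bias_hh_l1",
    "predictor_tgt.W1.weight",
]
print(count_by_prefix(unexpected))  # prints Counter({'predictor_tgt': 5})
print(count_by_prefix(unexpected, depth=2))
```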

Questions about prediction results and evaluate module

Hi! Thank you for the work you have done.

I'm working with the Predictor-Estimator model and have successfully trained the predictor and estimator models at both word level and sentence level. However, the results are not quite what I expected: the word tags are given as probabilities.

How do I convert a probability into a binary OK/BAD tag? If there is a threshold, what is its value?

Also, I found that if I train the Estimator on both word-level and sentence-level data, it produces both word tags and sentence scores. When I evaluated them with kiwi, the results table showed that the sentence scores derived from the tags are much better. How does it compute a sentence score from the word tags? Is it a simple average? I did not set the --sents-avg option.
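We have not checked how OpenKiwi's evaluation derives sentence scores from word tags, so the following is only one natural reading, using a hypothetical function: either average the word-level P(BAD) values, or threshold them first and take the fraction of BAD words, which resembles an HTER-style proportion of words needing edits.

```python
from typing import Sequence


def sentence_score_from_words(bad_probs: Sequence[float], mode: str = "probs",
                              threshold: float = 0.5) -> float:
    """Hypothetical sentence score derived from word-level P(BAD) values.

    mode="probs": mean P(BAD) over the words.
    mode="tags":  fraction of words tagged BAD after thresholding.
    """
    if not bad_probs:
        return 0.0
    if mode == "probs":
        return sum(bad_probs) / len(bad_probs)
    if mode == "tags":
        return sum(p >= threshold for p in bad_probs) / len(bad_probs)
    raise ValueError(f"unknown mode: {mode!r}")
```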

Thank you so much!

Unable to start openkiwi

I am unable to start OpenKiwi, although it seems that I have properly installed everything needed.

When I try to load the example by typing kiwi --example, I get the following message:

(base) C:\Users\Administrator>kiwi --example
Traceback (most recent call last):
  File "c:\users\administrator\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\administrator\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Administrator\anaconda3\Scripts\kiwi.exe\__main__.py", line 7, in <module>
  File "c:\users\administrator\anaconda3\lib\site-packages\kiwi\__main__.py", line 21, in main
    return kiwi.cli.cli()
  File "c:\users\administrator\anaconda3\lib\site-packages\kiwi\cli.py", line 105, in cli
    config_dict = arguments_to_configuration(arguments)
  File "c:\users\administrator\anaconda3\lib\site-packages\kiwi\cli.py", line 68, in arguments_to_configuration
    config_file = Path(arguments['CONFIG_FILE'])
  File "c:\users\administrator\anaconda3\lib\pathlib.py", line 1038, in __new__
    self = cls._from_parts(args, init=False)
  File "c:\users\administrator\anaconda3\lib\pathlib.py", line 679, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "c:\users\administrator\anaconda3\lib\pathlib.py", line 663, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

OS: Windows 10 64 bit
OpenKiwi version 2.0.0
Python version 3.8.3
Anaconda
Pytorch Build 1.7.0
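The last frames show `Path(arguments['CONFIG_FILE'])` receiving None, which suggests no configuration file was parsed from the command line, so `pathlib` fails with a TypeError. A hypothetical guard (our sketch, not OpenKiwi code) that fails with a readable message instead; only the `CONFIG_FILE` key name comes from the traceback.

```python
from pathlib import Path
from typing import Mapping, Optional


def config_path(arguments: Mapping[str, Optional[str]]) -> Path:
    """Return arguments['CONFIG_FILE'] as a Path, or exit with a clear message."""
    value = arguments.get("CONFIG_FILE")
    if value is None:
        # Path(None) would raise "expected str, bytes or os.PathLike object".
        raise SystemExit("error: no configuration file given; pass the path to a YAML config")
    return Path(value)
```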

Artificially generated triplets?

Do you have any plans to make the artificially generated training corpus referenced in your paper publicly available? I might be missing it, but I can't find that data in the repo.

I would like to reproduce the OpenKiwi results from scratch, but without this corpus the dataset is very small.

Thanks!

Hyperparameter tuning using skorch

Hello, I am trying to use random search for hyperparameter tuning with skorch. I am using RandomizedSearchCV to find the best hyperparameters when running my estimator YAML file, but because there are many layers of wrappers, I can't get it to work.

My question is more general: do you have any experience with this, or do you know someone who does?

Thank you.
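Independently of skorch, a random search can be written as a plain loop around whatever launches one training run. The sketch below is ours: `objective` is a hypothetical callable that would, for example, write a YAML config with the sampled values, train, and return a validation score (higher is better). The option names are taken from the estimator configs elsewhere on this page; the candidate values are arbitrary.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple


def random_search(
    space: Dict[str, List[Any]],
    objective: Callable[[Dict[str, Any]], float],
    n_trials: int,
    seed: Optional[int] = 0,
) -> Tuple[Dict[str, Any], float]:
    """Sample configurations uniformly from `space`; return the best one and its score."""
    rng = random.Random(seed)
    best_config: Dict[str, Any] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


space = {"learning-rate": [1e-3, 2e-3, 5e-3], "hidden-est": [125, 250]}


def toy_objective(config: Dict[str, Any]) -> float:
    # Stand-in for "train with this config and return the validation Pearson".
    return -abs(config["learning-rate"] - 2e-3) - abs(config["hidden-est"] - 125) / 1000


best, score = random_search(space, toy_objective, n_trials=200)
print(best, score)
```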

Documentation missing for Logging Flags

Describe the bug
Logging options are not showing in the automatic documentation.

To Reproduce
Steps to reproduce the behavior:

Go to Docs
Configuration Options -> General Options
The logging flags are not documented

Expected behavior
These flags should also be picked up by Sphinx and displayed clearly in the documentation.

Pre-trained models for v. 2.0.0

Hi,

When trying to make predictions with a previously released pre-trained model (the EN-DE NMT Estimator included in release 0.1.1.), I noticed that loading models trained with the previous version of OpenKiwi is not quite straightforward. At this point, I'm not sure if the error I'm running into is a bug or if I'm doing something wrong, hence my question:

Are you planning to release more pre-trained models that are compatible with v. 2.0.0?

Kind regards,
Andras

P.S. The error I'm running into occurs because the assert statement on line 215 of utils/migrations/v0_to_v2.py fails when loading the older model.

Failed to rerun predictor model

Describe the bug

When I try to rerun an already trained predictor model I get:

~/.local/lib/python3.5/site-packages/kiwi/lib/train.py", line 187, in run
logger.info(str(trainer.model))

AttributeError: 'NoneType' object has no attribute 'model'

To Reproduce
I have trained the predictor model with the default configuration for 6 epochs and it ran fine. Then I tried to rerun it for a few more epochs with the following changed arguments:

checkpoint-early-stop-patience: 20
run-uuid: 33939be51e8a4f04a9f485142e16000d
resume: true

I retrieved the run-uuid from the runs/model_name/output.log file:
This is run ID: 33939be51e8a4f04a9f485142e16000d
Is this the correct one? Are there any further arguments that need to be provided in order to rerun/train further a model?

I have also attached my config file (train_config.txt).

Environment (please complete the following information):

OS: Linux, Ubuntu 19.04
OpenKiwi version 0.1.1
Python version 3.5

Implement Multi-GPU training

Is your feature request related to a problem? Please describe.
Currently, there is no way to train QE models on multiple GPUs, which could significantly speed up the training of large models.
There have been several issues requesting this feature (#31, #29).

Describe the solution you'd like
Ideally, it should be possible to pass several GPU IDs to the gpu-id YAML flag, and OpenKiwi should use all of them in parallel to train the model.

Additional context
An important thing to take into account is that other parts of the pipeline, such as data ingestion and tokenisation, might become a bottleneck when using multiple GPUs.
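A first step towards the requested flag would be accepting several IDs in `gpu-id`. Below is a hypothetical parser for such a value (our sketch; OpenKiwi's actual option handling may differ) that keeps the single-ID and `-1` (CPU) usage seen elsewhere in these issues and also accepts a list or a comma-separated string.

```python
from typing import List, Sequence, Union

GpuSpec = Union[None, int, str, Sequence[int]]


def parse_gpu_ids(value: GpuSpec) -> List[int]:
    """Normalize a gpu-id value to a list of device IDs ([] means CPU).

    None or a negative int selects the CPU; an int, a list of ints, or a
    comma-separated string such as "0,1" selects those GPUs.
    """
    if value is None:
        return []
    if isinstance(value, bool):
        raise TypeError("gpu-id must be an int, a string, or a list of ints")
    if isinstance(value, int):
        return [] if value < 0 else [value]
    if isinstance(value, str):
        ids = [int(part) for part in value.split(",") if part.strip()]
        # A single value such as "-1" behaves exactly like the int form.
        return parse_gpu_ids(ids[0] if len(ids) == 1 else ids)
    ids = [int(v) for v in value]
    if any(i < 0 for i in ids):
        raise ValueError(f"negative GPU id in {ids}")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU id in {ids}")
    return ids
```

The resulting list could then be handed to whatever multi-GPU mechanism the training loop uses; how OpenKiwi would wire it in is not specified here.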

Is it possible to train with just src, mt, ter?

Hi,
Thanks for making openkiwi available!
Can any of the OpenKiwi models be trained when the only data available is [source sentence], [machine translation sentence], [TER score]? As far as I can tell, all the examples need more data, for example the tags, but maybe I missed something.
Thanks,
James

Evaluate pipeline cannot deal with `verbose` and `quiet` arguments

Describe the bug
The evaluate pipeline cannot be used: the quiet and verbose arguments are not declared in its configuration, so Pydantic raises a validation error.

To Reproduce

kiwi evaluate evaluate.yaml
...
pydantic.error_wrappers.ValidationError: 2 validation errors for Configuration
quiet
  extra fields not permitted (type=value_error.extra)
verbose
  extra fields not permitted (type=value_error.extra)

with evaluate.yaml:

gold_files:
    sentence_scores: <some scores file>

predicted_dir:
    - runs/0/64719031d54b444f988c5ec1cbfc8491
    - runs/0/a31ab711714f4d779023fc9f68fa2c45

Replacing the predictor by BERT or XLM to implement PREDEST-BERT and PREDEST-XLM?

Hello,

Thank you very much for this OpenKiwi toolkit.

I am trying to reproduce Section 2.6 (Transfer Learning and Fine-Tuning) of Unbabel's paper.

It states that the predictor can be replaced with multilingual BERT or XLM. How can I achieve that?

If I simply load the PyTorch version of the BERT or XLM model in the train_estimator.yaml file, I get a KeyError for 'vocab': the script tries to retrieve the vocabulary torch file from the model but cannot find it. What should I change in the models or in the OpenKiwi source code to solve this problem?

Poor results when training Estimator with parallel data and TER scores

Hi!

We are trying to train an EN-FR sentence-level QE model using a Predictor-Estimator model and parallel data.

We are using OpenKiwi 0.1.3 to train it.

The procedure was as follows:

Train the Predictor using parallel data (EN-FR)
Train the Estimator using the Predictor from step 1 and the following data (as discussed in thread #46):
a. the English source sentences
b. the French sentences translated with a pretrained MT model
c. the TER score of each translated French sentence

The results obtained were of a Pearson correlation of 0.32 and a Spearman correlation of 0.36, which are below the 0.5018 and 0.5566 obtained on the OpenKiwi paper (https://www.aclweb.org/anthology/P19-3020.pdf).
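For reference when comparing numbers like these, a stdlib sketch of the two correlations (our implementation, not OpenKiwi's evaluation code): Pearson on the raw scores, and Spearman as Pearson on ranks, with tied values sharing the average of the ranks they span. Both are undefined when either list is constant.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` would be the predicted sentence scores and `y` the gold TER scores for the same sentences, in the same order.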

My question is: is it possible to obtain a similar result using only parallel data? If affirmative there is something wrong on our procedure?

The configuration files used to train are the following:

#predictor_config-enfr.yml
checkpoint-early-stop-patience: 0
checkpoint-keep-only-best: 2
checkpoint-save: true
checkpoint-validation-steps: 50000
dropout-pred: 0.5
embedding-sizes: 200
epochs: 5
experiment-name: Pretrain Predictor
gpu-id: 0
hidden-pred: 400
learning-rate: 2e-3
learning-rate-decay: 0.6
learning-rate-decay-start: 2
log-interval: 100
model: predictor
optimizer: adam
out-embeddings-size: 200
output-dir: runs/predictor-enfr
predict-inverse: false
rnn-layers-pred: 2
source-embeddings-size: 200
source-max-length: 50
source-min-length: 1
source-vocab-min-frequency: 1
source-vocab-size: 45000
split: 0.9
target-embeddings-size: 200
target-max-length: 50
target-min-length: 1
target-vocab-min-frequency: 1
target-vocab-size: 45000
train-batch-size: 16
train-source: custom_data/train-enfr.src
train-target: custom_data/train-enfr.tgt
valid-batch-size: 16
valid-source: custom_data/dev-enfr.src
valid-target: custom_data/dev-enfr.tgt

#estimator_config-enfr.yml
binary-level: false
checkpoint-early-stop-patience: 0
checkpoint-keep-only-best: 2
checkpoint-save: true
checkpoint-validation-steps: 0
dropout-est: 0.0
epochs: 5
experiment-name: Train Estimator
gpu-id: 0
hidden-est: 125
learning-rate: 2e-3
load-pred-target: runs/predictor-enfr/best_model.torch
log-interval: 100
mlp-est: true
model: estimator
output-dir: runs/estimator-enfr
predict-gaps: false
predict-source: false
predict-target: false
rnn-layers-est: 1
sentence-level: true
sentence-ll: false
source-bad-weight: 2.5
target-bad-weight: 2.5
token-level: false
train-batch-size: 16
train-sentence-scores: custom_data/train-enfr.ter
train-source: custom_data/train-enfr.src
train-target: custom_data/train-enfr.pred
valid-batch-size: 16
valid-sentence-scores: custom_data/dev-enfr.ter
valid-source: custom_data/dev-enfr.src
valid-target: custom_data/dev-enfr.pred
wmt18-format: false

#predictions_config-enfr.yml
gpu-id: 0
load-model: runs/estimator-enfr/best_model.torch
model: estimator
output-dir: predictions/predest-enfr
seed: 42
test-source: custom_data/test-enfr.src
test-target: custom_data/test-enfr.pred
valid-batch-size: 64
wmt18-format: false
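For readers comparing the figures in this report: Pearson correlation is computed on the raw predicted and gold scores, and Spearman correlation on their ranks. Below is a dependency-free sketch of both, useful for sanity-checking sentence-level results. scipy.stats.pearsonr and scipy.stats.spearmanr compute the same statistics. The function names here are illustrative and are not part of OpenKiwi.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```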

Incompatibility with Torch 1.2

Describe the bug
The following error is occurring when using openkiwi with PyTorch 1.2:

"Expected object of scalar type Byte but got scalar type Bool for argument #3 'other' "

The error happens during inference.

The same error does not occur with PyTorch 1.0.1.

Environment (please complete the following information):

OS: Amazon Linux
OpenKiwi version: 0.1.1
Python version 3.6

array must not contain infs or NaNs

I am training an estimator on an en-zh dataset. At first everything runs well, but after epoch 8 it reports "array must not contain infs or NaNs" and exits. I don't know why this happens.

How can I continue my predictor training when interrupted?

Hi,
I am using a very large corpus to train a predictor, with 6 epochs in total. Each epoch takes more than 24 hours because of the corpus size. However, my machine does not seem able to handle such a heavy workload, and the program was interrupted twice during the 4th epoch. Restarting kiwi wastes the epochs already completed, so I wonder how I can load a checkpoint or continue predictor training from where the program was interrupted. Could you tell me what I should do? Thank you.
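The predictor configs in this thread use checkpoint-save and checkpoint-validation-steps, which suggests that intermediate models are written during training. Whether kiwi 0.1.x can resume from one of them is not stated here. For reference, the following is the generic PyTorch pattern for resumable training. It is a sketch that assumes you control the training loop, and the function names are hypothetical rather than OpenKiwi API.

```python
import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch):
    """Store everything needed to resume: weights, optimizer state, epoch."""
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore weights and optimizer state; return the next epoch to run."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"] + 1
```

Saving the optimizer state together with the weights matters for Adam. Without it, Adam's running moment estimates would start again from zero after a restart.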

Question about data

Hello OpenKiwi team,
First, thanks for providing such a good framework. It looks promising.

I have a question since I am a bit confused:

I only have source and target data (corpus.en, corpus.es). Is this enough to train a model? As far as I could understand from going through some threads, this should be enough to create a 'predictor' model, and maybe the predictor model can be used to train an 'estimator' model if you have some post-edited data later, right?

If so, I should only need the following lines under DATA OPTS in the config file:

### DATA OPTS ###

# Source and Target Files
train-source: data/parallel/corpus.en
train-target: data/parallel/corpus.es

Are there any other changes that I need to make?

I would be grateful if you could give guidance on best practices for creating a model using only source and target data.

Thanks in advance for your support!

Add alignment prediction with SimAlign

Why?

SimAlign is an amazingly simple and effective way of obtaining word alignments from multilingual Transformer encoders. OpenKiwi is built on top of multilingual Transformers. Hence OpenKiwi can produce alignments.

The training objective of OpenKiwi might even improve the alignments.

The alignments could be used in ingenious ways in the quality predictions. For example:

The predicted BAD target words can be aligned with source tokens to highlight which source word might have caused the mistranslation (similar to the definition of 'source tags' in the WMT QE shared task)
The alignments themselves can be used to detect accuracy errors: if an alignment is missing between a content-word in source and target this might indicate an omission or a mistranslation.

To be investigated.
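Here is a minimal sketch of the two uses listed above. It assumes alignments are 0-based (source_index, target_index) word pairs and that target tags are OK/BAD strings. Restricting the second check to content words would also need a POS tagger or a stop-word list, which is left out here. All names are hypothetical.

```python
def sources_of_bad_targets(alignments, target_tags, bad_label="BAD"):
    """Source positions aligned to at least one target word tagged BAD."""
    bad_targets = {j for j, tag in enumerate(target_tags) if tag == bad_label}
    return sorted({i for i, j in alignments if j in bad_targets})


def unaligned_positions(alignments, num_source, num_target):
    """Source and target positions without any alignment link.

    Per the proposal above, a missing link for a content word may point to
    an omission or a mistranslation.
    """
    aligned_source = {i for i, _ in alignments}
    aligned_target = {j for _, j in alignments}
    source = [i for i in range(num_source) if i not in aligned_source]
    target = [j for j in range(num_target) if j not in aligned_target]
    return source, target
```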

How?

Two options:

Pip install

We add SimAlign to the dependencies, and import from it. Challenge: we use the encoders in slightly different ways:

OpenKiwi forwards source and target simultaneously; SimAlign forwards the sentences as two separate sentences: https://github.com/cisnlp/simalign/blob/master/simalign/simalign.py#L211
OpenKiwi has the encoder integrated into the model, and not saved to a path (which is expected by SimAlign: https://github.com/cisnlp/simalign/blob/master/simalign/simalign.py#L51), and we don't want to have to save to file separately.

Integrate code

Integrate the SimAlign code into OpenKiwi and adapt it as needed. All the decoding algorithms are left unchanged; only the model setup and forward pass need to be changed. The only file that is needed is: https://github.com/cisnlp/simalign/blob/master/simalign/simalign.py

Important notes:

We need to verify that the licence allows this (GNU GENERAL PUBLIC LICENSE)
All development and changes to SimAlign need to be ported manually (instead of automatically through new version releases)
We should properly reference SimAlign where we use their code - acknowledgements are important!
OpenKiwi code becomes more complicated
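For orientation: SimAlign's simplest decoding method, Argmax, links source word i and target word j when each is the other's most similar word under cosine similarity of their contextual embeddings. The pure-Python sketch below illustrates that idea only. It is not SimAlign's code, and it assumes that one non-zero vector per word has already been extracted from the encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def argmax_align(source_vecs, target_vecs):
    """Mutual-argmax alignment over a cosine similarity matrix.

    Links (i, j) when target j is the best match for source i AND
    source i is the best match for target j.
    """
    sim = [[cosine(s, t) for t in target_vecs] for s in source_vecs]
    best_target = [max(range(len(target_vecs)), key=lambda j: row[j]) for row in sim]
    best_source = [
        max(range(len(source_vecs)), key=lambda i: sim[i][j])
        for j in range(len(target_vecs))
    ]
    return [(i, j) for i, j in enumerate(best_target) if best_source[j] == i]
```

Only the embeddings come from the model. The decoding step therefore works the same whether source and target are encoded jointly (as in OpenKiwi) or separately (as in SimAlign).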

Open questions

What should the output format be? I think we should use List[Tuple[int, int]] for passing alignments around, and the 'Pharaoh format' (i-j k-l etc.) for saving to file.
How do we add alignments dynamically to the predicted output? Just another field in the output object?
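The two proposed formats map directly onto each other. The sketch below converts between them, assuming 0-based source-target index pairs as in the usual Pharaoh convention. Sorting the pairs before writing is a design choice that keeps output files stable and easy to diff.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]


def to_pharaoh(alignment: Alignment) -> str:
    """[(1, 2), (0, 0)] -> '0-0 1-2' (source index first)."""
    return " ".join(f"{i}-{j}" for i, j in sorted(alignment))


def from_pharaoh(line: str) -> Alignment:
    """'0-0 1-2' -> [(0, 0), (1, 2)]; an empty line gives an empty list."""
    pairs = []
    for item in line.split():
        i, j = item.split("-")
        pairs.append((int(i), int(j)))
    return pairs
```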

Vocabulary size for predictor training

Hi,

First of all, thank you for OpenKiwi.

My question: when training the Predictor, what exactly are source-vocab-size and target-vocab-size supposed to mean: the number of (all) tokens or the number of unique words in the training corpora?

Perhaps adding this to the relevant docs page would be helpful to less experienced users of frameworks of this kind.

Best,
Andras (TAUS)
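In torchtext-style vocabularies, which OpenKiwi 0.1.x appears to use (this is an assumption), the vocab size caps the number of unique word types kept, most frequent first. It is not a count of running tokens. The source/target-vocab-min-frequency options then drop rare types. The GPU crash log further down this page is consistent with this reading: with source-vocab-size: 45000, the model prints Embedding(45004, 200, padding_idx=1), i.e. 45000 types plus 4 special tokens. Below is a sketch of that behaviour. The special-token names are hypothetical.

```python
from collections import Counter

# Hypothetical special tokens: four of them would explain 45000 -> 45004,
# and "<pad>" at index 1 would match padding_idx=1 in the log.
SPECIALS = ["<unk>", "<pad>", "<start>", "<stop>"]


def build_vocab(tokens, max_size, min_freq=1):
    """Map word types to ids, keeping at most max_size non-special types."""
    counts = Counter(tokens)
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    kept = [word for word, count in ranked if count >= min_freq][:max_size]
    itos = SPECIALS + kept
    return {word: index for index, word in enumerate(itos)}
```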

Error happened when running demo

Describe the bug
When I follow the demo at "https://unbabel.github.io/OpenKiwi/usage.html",
running the statement
from kiwi.lib.train import train_from_file
raises an error:

To Reproduce
Steps to reproduce the behavior:

python
from kiwi.lib.train import train_from_file

Environment (please complete the following information):

OS: mac
OpenKiwi version: latest code from master
Python version: 3.6.10

`inf` value for RMSE metric

Describe the bug

Hi!

I'm training an estimator and I'm getting the following error:

Traceback (most recent call last):
  File "/path/to/venv/bin/kiwi", line 10, in <module>
    sys.exit(main())
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/trainers/trainer.py", line 75, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/trainers/trainer.py", line 97, in train_epoch
    self.stats.log(step=self._step)
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/metrics/stats.py", line 167, in log
    stats_summary.log()
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/metrics/stats.py", line 63, in log
    tracking_logger.log_metric(k, v)
  File "/path/to/venv/lib/python3.5/site-packages/kiwi/loggers.py", line 181, in log_metric
    mlflow.log_metric(key, value)
  File "//path/to/venv/lib/python3.5/site-packages/mlflow/tracking/fluent.py", line 199, in log_metric
    MlflowClient().log_metric(run_id, key, value, int(time.time()))
  File "/path/to/venv/lib/python3.5/site-packages/mlflow/tracking/client.py", line 170, in log_metric
    _validate_metric(key, value, timestamp)
  File "/path/to/venv/lib/python3.5/site-packages/mlflow/utils/validation.py", line 67, in _validate_metric
    INVALID_PARAMETER_VALUE)
mlflow.exceptions.MlflowException: Got invalid value inf for metric 'RMSE' (timestamp=1561492769). Please specify value as a valid double (64-bit floating point)

This comes after a warning:

RuntimeWarning: invalid value encountered in subtract
  xm, ym = x - mx, y - my
/path/to/venv/lib/python3.5/site-packages/scipy/stats/stats.py:3036: RuntimeWarning: invalid value encountered in reduce
  r_num = np.add.reduce(xm * ym)

It happened during training, after 16 epochs and after 500 batches of epoch 16.

To Reproduce
Neither the YAML files nor the data are public, but let me know and I can share them with you.

Expected behavior
Finishes training for the 20 epochs I specified.

Environment (please complete the following information):

OS: Ubuntu 16.04.3 LTS (Xenial Xerus) on Azure
OpenKiwi version 0.1.1
Python version 3.5.2
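The traceback shows that the crash happens in logging: mlflow.log_metric rejects the inf value. The scipy warnings before it suggest the predictions themselves had already become NaN or inf. A hypothetical guard in front of the tracking call would stop a run from dying in logging alone, but it does not fix the underlying divergence. Here `log_metric` stands for any callable with a (key, value) signature, such as mlflow.log_metric.

```python
import math
import warnings


def log_finite_metrics(metrics, log_metric):
    """Forward finite metric values to log_metric and return the skipped ones.

    metrics: mapping of metric name to number.
    log_metric: callable taking (key, value), e.g. mlflow.log_metric.
    """
    skipped = {}
    for key, value in metrics.items():
        value = float(value)
        if math.isfinite(value):
            log_metric(key, value)
        else:
            # inf/NaN here usually means the predictions diverged upstream.
            skipped[key] = value
            warnings.warn("skipping non-finite metric {}={}".format(key, value))
    return skipped
```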

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Describe the bug
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

To Reproduce

1. Create a new file, run.py:

from kiwi.lib.train import train_from_file
run_info = train_from_file('config/bert.yaml')

and then run this program.
2. See error
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Expected behavior
I expected the type check to pass when the parameters are initialized.

Screenshots
https://github.com/Shannen3206/csn/blob/master/error.png

Environment (please complete the following information):

OS: Linux
OpenKiwi version: 2.0.0
Python version: 3.6.9
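The message means that a class inherits from bases whose metaclasses are unrelated, so Python cannot choose a metaclass for it. The report does not show which classes clash. However, the error and the standard remedy (a metaclass that derives from both) can be reproduced in plain Python. The class names below are illustrative.

```python
class MetaA(type):
    """Metaclass of one hypothetical base class."""


class MetaB(type):
    """Unrelated metaclass of another hypothetical base class."""


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


def try_plain_subclass():
    """Return the TypeError raised by `class C(A, B)`, or None if it works."""
    try:
        class C(A, B):
            pass
    except TypeError as err:
        return err
    return None


class MetaAB(MetaA, MetaB):
    """Derives from both, so it is a subclass of every base's metaclass."""


class C(A, B, metaclass=MetaAB):
    """With MetaAB, Python has a metaclass that satisfies both bases."""
```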

I have used the quickstart, but I don't know how to get a sentence score. Does it need word-level scores first? I use the WMT 2020 model from kiwi as the baseline:
http://www.statmt.org/wmt20/quality-estimation-task.html
The model runs, but it only gives me this:
{'batch_size': 64,
'config': 'predict_estimator.yaml',
'debug': False,
'experiment_name': 'predict-predest',
'gpu_id': 0,
'load_data': None,
'load_model': 'best_model.torch',
'load_vocab': None,
'log_interval': 100,
'mlflow_always_log_artifacts': False,
'mlflow_tracking_uri': 'mlruns/',
'model': 'estimator',
'output_dir': 'peace&love',
'quiet': False,
'run_name': None,
'run_uuid': None,
'save_config': None,
'save_data': None,
'seed': 42}
Local output directory is: peace&love
Predict with the PredEst (Predictor-Estimator) model
{'batch_size': 64,
'config': 'predict_estimator.yaml',
'debug': False,
'experiment_name': 'predict-predest',
'gpu_id': -1,
'load_data': None,
'load_model': 'best_model.torch',
'load_vocab': None,
'log_interval': 100,
'mlflow_always_log_artifacts': False,
'mlflow_tracking_uri': 'mlruns/',
'model': 'estimator',
'output_dir': 'peace&love',
'quiet': False,
'run_name': None,
'run_uuid': None,
'save_config': None,
'save_data': None,
'seed': 42}
Local output directory is: peace&love
Predict with the PredEst (Predictor-Estimator) model
Loaded vocabularies from best_model.torch
Is something wrong, since it stops here?

IndexError: tuple index out of range while using sentence-level Predictor-Estimator to train

After successfully training the sentence-level Predictor and then using it to train the Estimator, I get the error IndexError: tuple index out of range. The details are as follows:

/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/torch/nn/modules/loss.py:443: UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
Traceback (most recent call last):
  File "/home/zwc/python-virtual-environments/env/bin/kiwi", line 10, in <module>
    sys.exit(main())
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 78, in run
    self.checkpointer(self, valid_iterator, epoch=epoch)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/trainers/callbacks.py", line 105, in __call__
    eval_stats_summary = trainer.eval_epoch(valid_iterator)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 151, in eval_epoch
    self.stats.update(batch=batch, **outputs)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/metrics/stats.py", line 137, in update
    metric.update(**kwargs)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/metrics/metrics.py", line 310, in update
    predictions = self.get_predictions_flat(model_out, batch)
  File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/metrics/metrics.py", line 104, in get_predictions_flat
    print("^^",predictions.shape[-1])
IndexError: tuple index out of range

To find the cause, I looked at /home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/metrics/metrics.py and found the code around line 104:

The original code is:

def get_predictions_flat(self, model_out, batch):
    predictions = self.get_predictions(model_out).contiguous()
    predictions_flat = predictions.view(-1, predictions.shape[-1]).squeeze()
    token_indices = self.get_token_indices(batch)
    return predictions_flat[token_indices]

I printed the intermediate values by inserting print() calls:

def get_predictions_flat(self, model_out, batch):
    predictions = self.get_predictions(model_out).contiguous()
    print("**", predictions)
    print("^^", predictions.shape[-1])
    predictions_flat = predictions.view(-1, predictions.shape[-1]).squeeze()
    token_indices = self.get_token_indices(batch)
    print("++", token_indices)
    print("--", predictions_flat[token_indices])
    return predictions_flat[token_indices]

Then I ran it again; the output looks like this:
** tensor([0.0949, 0.0955, 0.0961, 0.0940, 0.0974], device='cuda:1',
grad_fn=)
^^ 5
++ 5
-- tensor([0, 1, 2, 3, 4], device='cuda:1')

That all looks fine. But at the end of this epoch, while saving the training state to runs/0/ec661049f6364cfbb4c2e7a9dd1abe9d/epoch_1 and saving the sentence_scores predictions to runs/0/ec661049f6364cfbb4c2e7a9dd1abe9d/epoch_1/sentence_scores, the following error occurs:

** tensor(0.1439, device='cuda:1')
File "/home/zwc/python-virtual-environments/env/lib/python3.6/site-packages/kiwi/metrics/metrics.py", line 104, in get_predictions_flat
print("^^",predictions.shape[-1])

In other words, when predictions is a single element (a 0-d tensor) rather than a 1-d tensor, predictions.shape[-1] raises an error. Why does this happen, and how can I resolve it? I have attached my configuration files, train_predictor.yaml and train_estimator.yaml:

train_estimator - 副本.yaml.txt
train_predictor - 副本.yaml.txt
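The UserWarning at the top of this traceback (input size torch.Size([]) vs target size torch.Size([1])) suggests that the last batch held a single sentence and that its prediction was squeezed to a 0-d tensor. A 0-d tensor's shape is an empty tuple (torch.Size subclasses tuple), so shape[-1] raises exactly IndexError: tuple index out of range. Below is a hypothetical shape-safe replacement for the reshaping step. It is written against NumPy so it runs anywhere; torch tensors expose the same .shape and .reshape.

```python
import numpy as np


def flatten_predictions(predictions):
    """Hypothetical shape-safe variant of the reshaping in get_predictions_flat.

    Works on NumPy arrays and torch tensors alike, since both expose
    `.shape` (a tuple) and `.reshape`.
    """
    if len(predictions.shape) <= 1:
        # Sentence-level: one score per sentence. reshape(-1) also turns a
        # 0-d scalar (shape ()) from a batch of size 1 into shape (1,).
        return predictions.reshape(-1)
    # Token-level: one row per token, last axis holds per-class values.
    return predictions.reshape(-1, predictions.shape[-1])
```

The sketch drops the trailing .squeeze(), which would turn a (1, 1) result back into a scalar. It assumes that sentence-level predictions are at most 1-d and has not been tested against OpenKiwi itself.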

Another question: how can I train with two GPUs instead of just one, e.g. by changing gpu-id: 1 to gpu-id: 1,0?

GPU training crashes

I have no problems training on a CPU. But when I train on a GPU it crashes every time.

2019-10-15 09:02:52.215 [root setup:380] This is run ID: 4d526700aafc4f5fba779bae21789a82
2019-10-15 09:02:52.215 [root setup:383] Inside experiment ID: 0 (EN-DE Pretrain Predictor)
2019-10-15 09:02:52.215 [root setup:386] Local output directory is: /ec/dgt/local/exodus/home/bhaskbh/gpu_train
2019-10-15 09:02:52.215 [root setup:389] Logging execution to MLflow at: None
2019-10-15 09:02:52.247 [root setup:395] Using GPU: 3
2019-10-15 09:02:52.247 [root setup:400] Artifacts location: None
2019-10-15 09:02:52.252 [kiwi.lib.train run:154] Training the PredEst Predictor model (an embedder model) model
2019-10-15 09:03:13.830 [kiwi.lib.train run:187] Predictor(
  (attention): Attention(
    (scorer): MLPScorer(
      (layers): ModuleList(
        (0): Sequential(
          (0): Linear(in_features=1600, out_features=800, bias=True)
          (1): Tanh()
        )
        (1): Sequential(
          (0): Linear(in_features=800, out_features=1, bias=True)
          (1): Tanh()
        )
      )
    )
  )
  (embedding_source): Embedding(45004, 200, padding_idx=1)
  (embedding_target): Embedding(45004, 200, padding_idx=1)
  (lstm_source): LSTM(200, 400, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  (forward_target): LSTM(200, 400, num_layers=2, batch_first=True, dropout=0.5)
  (backward_target): LSTM(200, 400, num_layers=2, batch_first=True, dropout=0.5)
  (W1): Embedding(45004, 200, padding_idx=1)
  (_loss): CrossEntropyLoss()
)
2019-10-15 09:03:13.831 [kiwi.lib.train run:188] 39389601 parameters
2019-10-15 09:03:13.831 [kiwi.trainers.trainer run:75] Epoch 1 of 6
Batches: 0%| | 0/5680 [00:00<?, ? batches/s]
Traceback (most recent call last):
  File "/home/bhaskbh/.local/bin/kiwi", line 11, in <module>
    sys.exit(main())
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/cli/pipelines/train.py", line 142, in main
    train.train_from_options(options)
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 76, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 96, in train_epoch
    outputs = self.train_step(batch)
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 140, in train_step
    model_out = self.model(batch)
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/models/predictor.py", line 240, in forward
    source_mask = self.get_mask(batch, source_side)[:, 1:-1]
  File "/home/bhaskbh/.local/lib/python3.6/site-packages/kiwi/models/model.py", line 205, in get_mask
    input_tensor != pad_id, dtype=torch.uint8
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'other'

To Reproduce
A simple training run with this config file:
epochs: 6
checkpoint-validation-steps: 5000
checkpoint-save: true
checkpoint-keep-only-best: 1
checkpoint-early-stop-patience: 0
optimizer: adam
log-interval: 100
learning-rate: 2e-3
learning-rate-decay: 0.6
learning-rate-decay-start: 2
train-batch-size: 64
valid-batch-size: 64
train-source: /home/bhaskbh/data/en_de.src
train-target: /home/bhaskbh/data/en_de.pe
split: 0.99
source-vocab-size: 45000
target-vocab-size: 45000
source-max-length: 50
source-min-length: 1
target-max-length: 50
target-min-length: 1
source-vocab-min-frequency: 1
target-vocab-min-frequency: 1
experiment-name: EN-DE Pretrain Predictor
gpu-id: 3

Environment

OS: Linux
OpenKiwi version 0.1.2
Python 3.6

Add transformer encoder fine-tuning with Adapters

Adapters are pretty cool. Let's see if we can implement them for our Transformer encoder models by integrating with Adapter Transformers.

A particularly cool extension would be AdapterFusion, which would allow us to glue together Kiwi Adapter Transformer models for various language pairs into a single multilingual Kiwi model. This idea is a QE version of the adapters for cross-lingual transfer. Lucky for us, AdapterFusion is also implemented in Adapter Transformers.

Error in predicting when using GPU for predictor-estimator

To Reproduce

I used the following config for prediction with the predictor-estimator model, and I got an error when using the GPU. I'm on a multi-GPU machine but want to run on only one GPU.

output-dir: ...
seed: 42

gpu-id: 0
debug: True

model: estimator

load-model: ...

wmt18-format: False
test-source: ...
test-target: ...
valid-batch-size: 1024

2020-02-27 06:31:03.970 [kiwi.data.utils load_vocabularies_to_fields:126] Loaded vocabularies from KiwiCutter/trained_models/estimator_en_de.torch/estimator_en_de.torch
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/kiwi/bin/kiwi", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/cli/main.py", line 73, in cli
    predict.main(extra_args)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/cli/pipelines/predict.py", line 56, in main
    predict.predict_from_options(options)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/lib/predict.py", line 54, in predict_from_options
    run(options.model_api, output_dir, options.pipeline, options.model)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/lib/predict.py", line 131, in run
    test_dataset, batch_size=pipeline_opts.batch_size
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/predictors/predictor.py", line 116, in run
    model_pred = self.model.predict(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 119, in predict
    model_out = self(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 324, in forward
    model_out_tgt = self.predictor_tgt(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/predictor.py", line 240, in forward
    source_mask = self.get_mask(batch, source_side)[:, 1:-1]
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 205, in get_mask
    input_tensor != pad_id, dtype=torch.uint8
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'other' in call to _th_iand_

Environment (please complete the following information):

OS: Linux
openkiwi==0.1.2
Python 3.6.2
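The two GPU tracebacks above stop at the same line of get_mask, which builds a torch.uint8 tensor from input_tensor != pad_id. The Torch 1.2 report is likely the same problem. Since PyTorch 1.2, comparisons return torch.bool tensors. The failing call also appears to create the new mask without a device, i.e. on the CPU, before combining it in place (_th_iand_) with a CUDA tensor. The sketch below is a mask helper that avoids both issues. It is not OpenKiwi's code, and the exclude_ids parameter is hypothetical.

```python
import torch


def get_mask(input_tensor, pad_id, *exclude_ids):
    """Hypothetical helper: True where a token is neither padding nor excluded.

    `input_tensor != pad_id` already yields a torch.bool tensor on the same
    device as `input_tensor`, so no dtype or device conversion is needed and
    every operand of `&=` stays consistent.
    """
    mask = input_tensor != pad_id
    for token_id in exclude_ids:
        mask &= input_tensor != token_id
    return mask
```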

Plan to release other pre-trained models?

Hi!

Do you have any plans to release pre-trained models other than En-De NMT?

Can I use OpenKiwi for data cleaning?

Can an OpenKiwi model be used to separate dirty data from clean data? For example, if I have a dataset produced by a crawler, can OpenKiwi be used to filter out dirty sentence pairs? My guess: the model predicts TER, but TER can't be used to classify sentences as clean or dirty. If a sentence has only one problematic word, TER might still be low, since only one word is counted. But if that word is critical, the sentence pair should be recognized as dirty.
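One way to address the concern about a single critical word is to combine both granularities: reject a pair if its sentence score is poor or if any single word is very likely BAD. The sketch below assumes sentence scores where lower is better (TER-like) and per-word P(BAD) values from a word-level model. The thresholds are hypothetical and would need tuning on some manually checked data.

```python
def filter_pairs(pairs, sentence_scores, word_bad_probs,
                 max_score=0.3, max_word_bad=0.9):
    """Split pairs into (kept, rejected) using sentence AND word-level output.

    sentence_scores: one predicted TER-like score per pair (lower is better).
    word_bad_probs: per pair, a list of per-word P(BAD) values.
    """
    kept, rejected = [], []
    for pair, score, probs in zip(pairs, sentence_scores, word_bad_probs):
        # A single very likely BAD word is enough to reject the pair.
        worst_word = max(probs, default=0.0)
        if score <= max_score and worst_word <= max_word_bad:
            kept.append(pair)
        else:
            rejected.append(pair)
    return kept, rejected
```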

Different servers to load the trained PredEst model issue

Hello! @captainvera

I trained a PredEst model on a server, and I could use the "kiwi predict --config predict.yaml" to make a prediction perfectly.

Recently, I wanted to use the trained PredEst model to make predictions on a new server, so I simply copied the trained best_model.torch from the old server to the new one. However, it does not work: the log shows [kiwi.lib.predict run:100] Predict with the PredEst (Predictor-Estimator) model, followed by Killed, and the step [kiwi.data.utils load_vocabularies_to_fields:126] Loaded vocabularies from models/best_model.torch seems to be missing.

I am wondering whether something is wrong or missing in how I set up the trained model on the new server. Looking forward to your response. Thank you very much!

Where is ape-qe and all its files?

There are seven files in the "models" directory:
linear_word_qe_classifier.py -> linear model
nuqe.py -> nuqe model
predictor.py -> predictor part of the predictor-estimator model
predictor_estimator.py -> estimator part of the predictor-estimator model
quetch.py -> quetch model
utils.py -> all kinds of tools
model.py -> the entry point to all models
But there is no file for the APE-QE model!
The same goes for the other directories: I can find files for nuqe/predest/quetch/linear, but none for ape-qe. Not a single one!
Where the hell is ape-qe and all its related files? Or did I misunderstand something?

predictor-estimator crashes with Russian data

Describe the bug
Estimator training crashes during training with WMT19 Russian data

To Reproduce
Steps to reproduce the behavior:

Switch data to WMT2019 Russian data
train predictor
train estimator
See error @ 22% of batches in first epoch, 53/236

Expected behavior
I expected the estimator to train the same way it had for the German datasets

Screenshots
2019-06-24 21:07:25.075 [kiwi.trainers.trainer run:74] Epoch 1 of 10
Batches: 22%|██████ | 53/236 [00:27<00:58, 3.11 batches/s]
Traceback (most recent call last):
  File "/home/nlopatina/.virtualenvs/OpenKiwi/bin/kiwi", line 11, in <module>
    load_entry_point('openkiwi', 'console_scripts', 'kiwi')()
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 75, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 95, in train_epoch
    outputs = self.train_step(batch)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 139, in train_step
    model_out = self.model(batch)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor_estimator.py", line 324, in forward
    model_out_tgt = self.predictor_tgt(batch)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in forward
    for i in range(target_len - 2)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in
    for i in range(target_len - 2)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/attention.py", line 36, in forward
    scores = self.scorer(query, keys)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/scorer.py", line 60, in forward
    layer_in = layer(layer_in)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 292, in forward
    return torch.tanh(input)
RuntimeError: CUDA out of memory. Tried to allocate 75.62 MiB (GPU 1; 11.93 GiB total capacity; 10.68 GiB already allocated; 42.56 MiB free; 717.88 MiB cached)

Environment (please complete the following information):
OS: Linux
OpenKiwi version 0.1.1
Python version 3.6.5

Additional context

I did not get this error with exactly the same hyperparameters on the German dataset.
I tried running smaller batches; a batch size of 2 works for some time but then crashes with a different error message.
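The traceback ends in the attention scorer, which the predictor runs once per target position (for i in range(target_len - 2)). Memory therefore grows with sentence length as well as with batch size. One thing worth checking is whether the Russian data contains longer sentences than the German data, though this is a guess rather than a confirmed diagnosis. The sketch below compares length distributions and finds over-long pairs. Whitespace tokenisation is an assumption and may differ from OpenKiwi's. The source-max-length and target-max-length options in the predictor configs above serve a similar filtering purpose.

```python
import math


def token_lengths(lines):
    """Whitespace token count per line (an approximation of model tokens)."""
    return [len(line.split()) for line in lines]


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def long_pairs(source_lines, target_lines, max_len=50):
    """Indices of pairs where either side exceeds max_len tokens."""
    return [
        i
        for i, (src, tgt) in enumerate(zip(source_lines, target_lines))
        if len(src.split()) > max_len or len(tgt.split()) > max_len
    ]
```

For example, percentile(token_lengths(lines), 99) can be compared for the German and Russian target files.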

missing WMT 2017 word_level/test.tags required for predictor-estimator evaluation

Describe the bug
The WMT 2017 test data set is missing the word_level/test.tags file that is required for predictor-estimator evaluation

To Reproduce
Steps to reproduce the behavior:

Run everything in quickstart instructions for predest model (making corrections for typos + directory specifications)
Use data from WMT 2017, train + test (in specified directories)
Run kiwi evaluate --config experiments/evaluate_estimator.yaml
See error "path must exist: data/WMT17/word_level/test.tags"

Expected behavior
I expected the evaluation to run. Second, I expected to find the WMT 2017 word_level/test.tags file, but it was not in the download from WMT test website.

Screenshots
$ kiwi evaluate --config experiments/evaluate_estimator.yaml
usage: kiwi evaluate [--config CONFIG] [--save-config SAVE_CONFIG] [-d] [-q] [--type {probs,tags}]
                     [--format {wmt17,wmt18}] [--pred-format {wmt17,wmt18}] [--sents-avg {probs,tags}]
                     [--gold-sents GOLD_SENTS] [--gold-target GOLD_TARGET] [--gold-source GOLD_SOURCE]
                     [--gold-cal GOLD_CAL] [--input-dir INPUT_DIR [INPUT_DIR ...]]
                     [--pred-sents PRED_SENTS [PRED_SENTS ...]] [--pred-target PRED_TARGET [PRED_TARGET ...]]
                     [--pred-gaps PRED_GAPS [PRED_GAPS ...]] [--pred-source PRED_SOURCE [PRED_SOURCE ...]]
                     [--pred-cal PRED_CAL]
kiwi evaluate: error: argument --gold-target: path must exist: data/WMT17/word_level/test.tags

The error is correct: the file does not exist. I don't know where to find it.

Environment (please complete the following information):

OS: Linux
OpenKiwi version 0.1.1
Python version 3.6.5

Additional context
The 2018 test data doesn't have a .tags file either.
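Once a gold tags file is available, a quick sanity check before running evaluation is to compare each line's tag count with the number of tokens in the matching target sentence. The sketch below assumes the usual WMT conventions behind the two `--format` values: one OK/BAD tag per target token for wmt17, and gap and word tags interleaved for wmt18 (a gap before, between, and after the tokens, so 2n + 1 tags for n tokens). The function name is hypothetical and not part of OpenKiwi.

```python
from itertools import zip_longest
from typing import Iterable, List

VALID_TAGS = {"OK", "BAD"}


def check_tags(targets: Iterable[str], tags: Iterable[str], fmt: str = "wmt17") -> List[int]:
    """Return 1-based line numbers whose tag line does not fit its target sentence.

    fmt="wmt17": one OK/BAD tag per target token (assumed convention).
    fmt="wmt18": gap and word tags interleaved, i.e. 2 * n_tokens + 1 tags.
    """
    if fmt not in ("wmt17", "wmt18"):
        raise ValueError("unknown format: {}".format(fmt))
    bad_lines = []
    # zip_longest pads with None, so extra or missing lines in either file are caught.
    for line_no, (target, tag_line) in enumerate(zip_longest(targets, tags), start=1):
        if target is None or tag_line is None:
            bad_lines.append(line_no)
            continue
        n_tokens = len(target.split())
        tag_list = tag_line.split()
        expected = n_tokens if fmt == "wmt17" else 2 * n_tokens + 1
        if len(tag_list) != expected or not set(tag_list) <= VALID_TAGS:
            bad_lines.append(line_no)
    return bad_lines


if __name__ == "__main__":
    # In practice these would be the lines of the target (MT) file and the gold tags file.
    targets = ["das ist gut", "hallo"]
    print(check_tags(targets, ["OK OK BAD", "OK"]))
    print(check_tags(targets, ["OK BAD OK OK OK OK OK", "OK OK OK"], fmt="wmt18"))
```

An empty list means every tag line matches its sentence under the chosen convention; this does not locate the missing test.tags file, it only checks one once you have it.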

GPU ID

I have a question.

I set the following option:
gpu-id: "0 1 2 3 4 5 6 7"

However, the following error occurred:
kiwi train: error: argument --gpu-id: invalid int value: '0 1 2 3 4 5 6 7'

How do I specify multiple GPU IDs in OpenKiwi?
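The message has the shape of an argparse error for an option that expects a single integer: the whole string "0 1 2 3 4 5 6 7" is passed to `int()`, which rejects it. The standalone toy parsers below are hypothetical (they are not OpenKiwi's actual option definitions); they reproduce the error and show that an option declared with `nargs="+"` would be needed to accept several IDs. This does not imply that OpenKiwi 0.1.1 supports multi-GPU training.

```python
import argparse
from typing import List, Optional, Union


def build_parser(multi_gpu: bool) -> argparse.ArgumentParser:
    """Build a toy parser with a --gpu-id option (hypothetical, not OpenKiwi's)."""
    parser = argparse.ArgumentParser(prog="toy-train")
    if multi_gpu:
        # nargs="+" collects one or more separate integer arguments into a list.
        parser.add_argument("--gpu-id", type=int, nargs="+")
    else:
        # Exactly one integer; int("0 1 2") raises ValueError, which argparse
        # reports as "invalid int value".
        parser.add_argument("--gpu-id", type=int)
    return parser


def parse_gpu_ids(argv: List[str], multi_gpu: bool) -> Optional[Union[int, List[int]]]:
    """Return the parsed --gpu-id value, or None if argparse rejects argv."""
    try:
        return build_parser(multi_gpu).parse_args(argv).gpu_id
    except SystemExit:
        # argparse prints the error to stderr and exits with status 2; catch it for the demo.
        return None


if __name__ == "__main__":
    print(parse_gpu_ids(["--gpu-id", "0"], multi_gpu=False))
    print(parse_gpu_ids(["--gpu-id", "0 1 2 3 4 5 6 7"], multi_gpu=False))
    print(parse_gpu_ids(["--gpu-id", "0", "1", "2"], multi_gpu=True))
```

The three calls return 0, None (after argparse prints "invalid int value" to stderr), and [0, 1, 2]. Note that even with `nargs="+"` the IDs must arrive as separate arguments; a single space-separated string still fails the `int()` conversion.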

GL integration testing

Hello, this is for testing GitLab integrations.

Please disregard it.

Combining NuQE predictions for words and gaps

When running NuQE prediction, the model outputs either target word tags or gap tags, not both, so in my workflow I have to run a script afterwards to merge them correctly. Besides that, it outputs the probability of the binary tag instead of the tags themselves.

It would be great if there were a way to:

run two NuQE models, one for words and one for gaps, and have the outputs merged automatically;
choose between tags and probabilities in the output.
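The post-processing described in this request can be sketched as follows, assuming the WMT18 layout in which gap and word tags alternate and a sentence of n words has n + 1 gap tags, starting and ending with a gap. Treating each predicted value as P(BAD) and using a 0.5 threshold are illustrative assumptions, not OpenKiwi behaviour, and both function names are hypothetical.

```python
from typing import List, Sequence


def probs_to_tags(probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Turn per-position probabilities into OK/BAD tags (assumes each p is P(BAD))."""
    return ["BAD" if p >= threshold else "OK" for p in probs]


def merge_word_and_gap_tags(word_tags: Sequence[str], gap_tags: Sequence[str]) -> List[str]:
    """Interleave gap and word tags as gap, word, gap, ..., word, gap."""
    if len(gap_tags) != len(word_tags) + 1:
        raise ValueError(
            "expected {} gap tags, got {}".format(len(word_tags) + 1, len(gap_tags))
        )
    merged = [gap_tags[0]]
    # Each word tag is followed by the gap tag to its right.
    for word_tag, gap_tag in zip(word_tags, gap_tags[1:]):
        merged.extend([word_tag, gap_tag])
    return merged


if __name__ == "__main__":
    words = probs_to_tags([0.1, 0.8, 0.3])       # one sentence, three words
    gaps = probs_to_tags([0.2, 0.6, 0.1, 0.05])  # four gaps around those words
    print(" ".join(merge_word_and_gap_tags(words, gaps)))  # OK OK BAD BAD OK OK OK
```

Applied line by line to the word-model and gap-model outputs, this yields one merged tag line per sentence; keeping the probabilities instead simply means skipping `probs_to_tags` and interleaving the raw values.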

References

[1] Kreutzer et al. (2015): QUality Estimation from ScraTCH (QUETCH): Deep Learning for Word-level Translation Quality Estimation

[2] Martins et al. (2016): Unbabel's Participation in the WMT16 Word-Level Translation Quality Estimation Shared Task

[3] Martins et al. (2017): Pushing the Limits of Translation Quality Estimation

[4] Kim et al. (2017): Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation

[5] Wang et al. (2018): Alibaba Submission for WMT18 Quality Estimation Task

[6] Kepler et al. (2019): Unbabel’s Participation in the WMT19 Translation Quality Estimation Shared Task