multilingual-t5's Introduction

mT5: Multilingual T5

Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text transformer model, trained following a recipe similar to T5's. This repo can be used to reproduce the experiments in the mT5 paper.

Languages covered

mT5 is pretrained on the mC4 corpus, covering 101 languages:

Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.

Results

mT5 achieves state-of-the-art performance on many cross-lingual NLP tasks, as of November 2020. For example, on XTREME zero-shot classification, structured prediction, and QA tasks (XNLI and PAWS-X report accuracy; the remaining columns report F1):

Model XNLI PAWS-X WikiAnn-NER XQuAD MLQA TyDiQA-GoldP
mBERT 65.4 81.9 62.2 64.5 61.4 59.7
XLM 69.1 80.9 61.2 59.8 48.5 43.6
InfoXLM 81.4 - - - 73.6 -
X-STILTs 80.4 87.7 64.7 77.2 72.3 76.0
XLM-R 79.2 86.4 65.4 76.6 71.6 65.1
VECO 79.9 88.7 65.7 77.3 71.7 67.6
RemBERT 80.8 87.5 70.1 79.6 73.1 77.0
mT5-Small 67.5 82.4 50.5 58.1 54.6 36.4
mT5-Base 75.4 86.4 55.7 67.0 64.6 59.1
mT5-Large 81.1 88.9 58.5 77.8 71.2 68.4
mT5-XL 82.9 89.6 65.5 79.5 73.5 77.8
mT5-XXL 85.0 90.0 69.2 82.5 76.0 82.0

Usage

Training

To run this code, you need to install the t5 library. General instructions for training, fine-tuning, evaluation, and exporting models for inference can be found in the t5 repo. In order to use the additional mT5 tasks provided in this library with the t5_mesh_transformer command, run from this directory and add the flag --module_import="multilingual_t5.tasks". There is also support for mT5 in HuggingFace; see instructions in the T5 repo here.
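
For illustration, here is a minimal Python sketch (mine, not from this repo) of loading the public mT5 checkpoint through HuggingFace Transformers; note that the released checkpoints are pretrained with the unsupervised objective only, so they should be fine-tuned before use on a downstream task:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# The pretrained model only knows the span-corruption objective, so it fills in
# sentinel tokens rather than following instructions.
inputs = tokenizer("I like <extra_id_0> and bananas.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))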

To train an mT5-Large model on the mc4 task from scratch as described in the paper:

export PROJECT=yourproject
export ZONE=yourzone
export BUCKET=yourbucket
export TPU=yourtpu

ctpu up --name=$TPU --project=$PROJECT --zone=$ZONE --tpu-size=v3-256 --tpu-only --noconf

TASK=mc4
MODEL_DIR="${BUCKET}${TASK}"

python -m t5.models.mesh_transformer_main \
  --tpu="${TPU}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --gin_file="models/t5.1.1.large.gin" \
  --gin_param="MIXTURE_NAME = '${TASK}'" \
  --gin_param="utils.run.sequence_length = {'inputs': 1024, 'targets': 256}" \
  --gin_param="utils.run.batch_size = ('tokens_per_batch', 1048576)" \
  --gin_param="utils.run.learning_rate_schedule=@learning_rate_schedules.rsqrt_no_ramp_down" \
  --gin_param="run.train_steps = 1000000" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = 'v3-256'" \
  --eval_mode="perplexity_eval" \
  --eval_gin_param="mesh_eval_dataset_fn.num_eval_examples = 10000" \
  --t5_tfds_data_dir="${BUCKET}/t5-tfds" \
  --module_import="multilingual_t5.tasks"

Fine-Tuning

The example below shows how to finetune the mT5-Large model on the XNLI zeroshot task. See finetune_mt5_tasks.sh for hyperparameter settings for other tasks.

export PROJECT=yourproject
export ZONE=yourzone
export BUCKET=yourbucket
export TPU=yourtpu

ctpu up --name=$TPU --project=$PROJECT --zone=$ZONE --tpu-size=v3-256 --tpu-only --noconf

TASK=mt5_xnli_zeroshot
SEQUENCE_LENGTH_GIN=xnli
PRETRAINED_DIR=gs://t5-data/pretrained_models/mt5/large
PRETRAINED_STEPS=1000000
FINETUNE_STEPS=20000
MODEL_DIR="${BUCKET}${TASK}"

# Run fine-tuning
python -m t5.models.mesh_transformer_main \
  --tpu="${TPU}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --gin_file="${PRETRAINED_DIR}/operative_config.gin" \
  --gin_file="sequence_lengths/${SEQUENCE_LENGTH_GIN}.gin" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = 'v3-256'" \
  --gin_param="MIXTURE_NAME = '${TASK}'" \
  --gin_param="utils.run.train_steps=$((PRETRAINED_STEPS+FINETUNE_STEPS))" \
  --gin_param="utils.run.init_checkpoint='${PRETRAINED_DIR}/model.ckpt-${PRETRAINED_STEPS}'" \
  --t5_tfds_data_dir="${BUCKET}/t5-tfds" \
  --module_import="multilingual_t5.tasks" \
  --gin_param="utils.run.batch_size = ('tokens_per_batch', 1048576)" \
  --gin_location_prefix="multilingual_t5/gin/"

The remaining experiments are shown in the tasks.py file.
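
Fine-tuning can also be driven from Python with the t5 MtfModel API. The following is a minimal sketch (the bucket, TPU name, and hyperparameters are illustrative placeholders, not the settings used in the paper):

import t5
import multilingual_t5.tasks  # registers the mT5 tasks and mixtures

model = t5.models.MtfModel(
    model_dir="gs://yourbucket/mt5_xnli_zeroshot",  # placeholder output dir
    tpu="yourtpu",                                  # placeholder TPU name or address
    tpu_topology="v3-8",
    model_parallelism=1,
    batch_size=16,
    sequence_length={"inputs": 512, "targets": 16},
)

model.finetune(
    mixture_or_task_name="mt5_xnli_zeroshot",
    pretrained_model_dir="gs://t5-data/pretrained_models/mt5/large",
    finetune_steps=20000,
)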

Released Model Checkpoints

We have released the following checkpoints for pre-trained models described in our paper:

How to Cite

If you extend or use this work, please cite the paper where it was introduced:

@inproceedings{xue-etal-2021-mt5,
    title = "m{T}5: A Massively Multilingual Pre-trained Text-to-Text Transformer",
    author = "Xue, Linting  and
      Constant, Noah  and
      Roberts, Adam  and
      Kale, Mihir  and
      Al-Rfou, Rami  and
      Siddhant, Aditya  and
      Barua, Aditya  and
      Raffel, Colin",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.naacl-main.41",
    doi = "10.18653/v1/2021.naacl-main.41",
    pages = "483--498"
}

multilingual-t5's People

Contributors

adarob, blester125, craffel, gauravmishra, hwchung27, lintingxue, liviosoares, nconstant-google, sharannarang, stephanwlee, yilei


multilingual-t5's Issues

Pre-train mT5 on Custom Dataset

Is it possible to pre-train mT5 on a custom dataset? If yes,

  1. Should the data be part of TFDS, or are other file formats supported?
  2. Should the data be pre-processed in a tab-separated (input \t target) format before calling the code below, where the task will be the custom task?
python -m t5.models.mesh_transformer_main \
  --tpu="${TPU}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --gin_file="models/t5.1.1.large.gin" \
  --gin_param="MIXTURE_NAME = '${TASK}'" \
  --gin_param="utils.run.sequence_length = {'inputs': 1024, 'targets': 256}" \
  --gin_param="utils.run.batch_size = ('tokens_per_batch', 1048576)" \
  --gin_param="utils.run.learning_rate_schedule=@learning_rate_schedules.rsqrt_no_ramp_down" \
  --gin_param="run.train_steps = 1000000" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = 'v3-256'" \
  --eval_mode="perplexity_eval" \
  --eval_gin_param="mesh_eval_dataset_fn.num_eval_examples = 10000" \
  --t5_tfds_data_dir="${BUCKET}/t5-tfds" \
  --module_import="multilingual_t5.tasks"

clarification on how MLQA eval is done in mt5

Hi

MLQA only has eval/test sets. Could you please clarify which dataset you trained the model on for the zero-shot experiments? Are the results reported in the paper computed on the eval set or the test set?

thank you.

vocab size error

When the default vocab path in data/utils.py was changed to a 250,000-token vocab and executed in Colab, the error below occurred, presumably due to a vocab size mismatch. How can it be solved?

my code

model_parallelism, train_batch_size, keep_checkpoint_max = (1, 256, 16)
model = t5.models.MtfModel(
    model_dir='gs://mt5_wayfarer/small',
    tpu=TPU_ADDRESS,
    tpu_topology=TPU_TOPOLOGY,
    model_parallelism=model_parallelism,
    batch_size=train_batch_size,
)

question_1 = 'Processed input: translate English to German: "Luigi often said to me that he never wanted the brothers to end up in court," she wrote.'  #@param {type:"string"}

questions = [question_1]

now = time.time()
predict_inputs_path = os.path.join(prediction_dir, "predict_inputs_%d.txt" % now)
predict_outputs_path = os.path.join(prediction_dir, "predict_outputs_%d.txt" % now)
with tf.io.gfile.GFile(predict_inputs_path, "w") as f:
    for q in questions:
        f.write(q.lower())

model.batch_size = 8  # Min size for small model on v2-8 with parallelism 1.
model.predict(
    input_file=predict_inputs_path,
    output_file=predict_outputs_path,
    # Select the most probable output token at each step.
    temperature=0,
)

error code

InvalidArgumentError: From /job:worker/replica:0/task:0:
9 root error(s) found.
(0) Invalid argument: Run-time shape mismatch for TPUExecute argument[90] (VarHandles_17697799040045632952/_2:88). Expected element_type: F32
dimensions: 512
dimensions: 250240
layout {
minor_to_major: 1
minor_to_major: 0
format: DENSE
}
is_dynamic_dimension: false
is_dynamic_dimension: false
; got element_type: F32
dimensions: 512
dimensions: 250112
layout {
minor_to_major: 1
minor_to_major: 0
format: DENSE
}
is_dynamic_dimension: false
is_dynamic_dimension: false

 [[node TPUReplicateMetadata (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py:3697) ]]
 [[tpu_compile_succeeded_assert/_14841502660177317055/_3_G4826]]

(1) Invalid argument: Run-time shape mismatch for TPUExecute argument[90] (VarHandles_17697799040045632952/_2:88). Expected element_type: F32
dimensions: 512
dimensions: 250240
layout {
minor_to_major: 1
minor_to_major: 0
format: DENSE
}
is_dynamic_dimension: false
is_dynamic_dimension: false
; got element_type: F32
dimensions: 512
dimensions: 250112
layout {
minor_to_major: 1
minor_to_major: 0
format: DENSE
}
is_dynamic_dimension: false
is_dynamic_dimension: false

 [[node TPUReplicateMetadata (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py:3697) ]]
 [[tpu_compile_succeeded_assert/_14841502660177317055/_3_G4816]]

(2) Invalid argument: Run-time shape mismatch for TPUExecute argument[90] (VarHandles_17697799040045632952/_2:88). Expected element_type: F32
dimensions: 512
dimensions: 250240
layout {
minor_to_major: 1
minor_to_major: 0
format: DENSE
}
is_dynamic_dimension: false
is_dynamic_dimension: false
; got element_type: F32
dimensions: 512
dimensions: 250112
layout {
minor_to_major: 1
minor_to_major: 0
format: DENSE
}
is_dynamic_dimension: false
is_dynamic_dimension: false

`((4096, 32128)) doesn't match with shape of tensor decoder/logits/kernel ([4096, 250112]) from checkpoint reader.`

I am getting this error, despite the fact that I am using mc4.250000.100extra vocabulary. Not sure where things are going wrong.

ValueError: Shape of variable decoder/logits/kernel:0 ((4096, 32128)) doesn't match with shape of tensor decoder/logits/kernel ([4096, 250112]) from checkpoint reader.

For completeness, here is my task definition:

import t5
import os
import functools
import tensorflow as tf

from t5.evaluation import metrics

DATA_DIR = "gs://danielk-files/data/"

from t5.data import sentencepiece_vocabulary
DEFAULT_SPM_PATH = "gs://t5-data/vocabs/mc4.250000.100extra/sentencepiece.model"
DEFAULT_VOCAB = sentencepiece_vocabulary.SentencePieceVocabulary(DEFAULT_SPM_PATH)
DEFAULT_OUTPUT_FEATURES = {
    "inputs": t5.data.Feature(
        vocabulary=DEFAULT_VOCAB, add_eos=True, required=False),
    "targets": t5.data.Feature(
        vocabulary=DEFAULT_VOCAB, add_eos=True)
}



def get_downloaded_data_path(data_dir1, split, extension):
    return os.path.join(data_dir1, split + extension)

def normalize_text(text):
    """Lowercase and remove quotes from a TensorFlow string."""
    text = tf.strings.lower(text)
    text = tf.strings.regex_replace(text, "'(.*)'", r"\1")
    return text

def to_inputs_and_targets(ex):
    return {
        "inputs": normalize_text(ex["inputs"]),
        "targets": normalize_text(ex["targets"])
    }

def preprocess(
        dataset,
        prefix='',  # not used
        sample_answer=False,  # not used
):
    return dataset.map(to_inputs_and_targets,
        num_parallel_calls=tf.data.experimental.AUTOTUNE)


def dataset_fn_two_column(split, shuffle_files=False, dataset=""):
    # Load lines from the text file as examples.
    ds = tf.data.TextLineDataset(get_downloaded_data_path(DATA_DIR + dataset, split, ".tsv"))
    print(" >>>> about to read tsv . . . ")
    ds = ds.map(
        functools.partial(tf.io.decode_csv, record_defaults=["", ""], use_quote_delim=False, field_delim="\t"),
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    # Map each tuple to an {"inputs": ..., "targets": ...} dict.
    ds = ds.map(lambda *ex: dict(zip(["inputs", "targets"], ex)))
    return ds


def postprocessor_two_column(answer, example=None, is_target=False):
    """Returns answer, or all answers if the full example is provided."""
    return tf.compat.as_text(answer)



for task in [
    "natural_instructions_xlingual_may4_defintion_pos_2_neg_2_expl",
    "natural_instructions_xlingual_defintion_only",
    "natural_instructions_xlingual_defintion_pos_2",
]:
    t5.data.TaskRegistry.add(
        f"{task}_mixture",
        # Supply a function which returns a tf.data.Dataset.
        dataset_fn=functools.partial(dataset_fn_two_column, dataset=task),
        splits=["test", "train"],
        # Supply a function which preprocesses text from the tf.data.Dataset.
        text_preprocessor=preprocess,
        # Lowercase targets before computing metrics.
        postprocess_fn=postprocessor_two_column,
        # sentencepiece_model_path=DEFAULT_SPM_PATH,
        output_features=DEFAULT_OUTPUT_FEATURES,
        metric_fns=[metrics.squad]
    )

and here is the training command:

        PRETRAINED_DIR="gs://t5-data/pretrained_models/mt5/xxl" 
        MODEL_DIR="${BUCKET}/${TASK}/${SIZE}"
        TASK=${eval}_mixture
        python -m t5.models.mesh_transformer_main \
          --module_import="nq_tasks" \
          --tpu="${TPU_NAME}" \
          --gcp_project="${PROJECT}" \
          --tpu_zone="${ZONE}" \
          --model_dir="${MODEL_DIR}" \
          --gin_file="dataset.gin" \
          --gin_file="${PRETRAINED_DIR}/operative_config.gin" \
          --gin_param="utils.run.save_checkpoints_steps=500" \
          --gin_param="utils.tpu_mesh_shape.tpu_topology = 'v3-128'" \
          --gin_param="MIXTURE_NAME = '${TASK}'" \
          --gin_param="utils.run.train_steps=$((PRETRAINED_STEPS + FINETUNE_STEPS))" \
          --gin_param="utils.run.init_checkpoint='${PRETRAINED_DIR}/model.ckpt-${PRETRAINED_STEPS}'" \
          --t5_tfds_data_dir="${BUCKET}/t5-tfds" \
          --gin_param="utils.run.sequence_length={'inputs': ${LEN}, 'targets': 512}" \
          --gin_param="utils.tpu_mesh_shape.model_parallelism = 64"

and

tensorflow               2.2.1
tensorflow-datasets      4.1.0
tensorflow-estimator     2.2.0
tensorflow-metadata      0.25.0
tensorflow-text          2.2.1
t5                       0.7.1

And I was testing it on a v3-128 TPU.

any plan to train mt5 on a similar setup as XLM-R ?

Hi,
I just have a general question. mT5 is a very cool model for being so general, but it would be even better if it could reach the performance of XLM-R with the same number of parameters. I was wondering if you have any plan to train an mT5 model in a setup similar to XLM-R's and to release the models in the near future? Thanks.

T5 RAM Usage Improvements!!!

Hello everyone,
In our team, we encountered a problem with mT5 during inference and we need advice.

Previously, when we loaded the model and predicted a label with mT5, about 10 GB of RAM was used, but now the model uses only 3 GB of RAM for its predictions.
Could this be related to possible improvements in the architecture or the size of T5 in recent months?
Have you had such an improvement in hardware resource usage, or should we be looking for another reason?

Export/Save mt5 model

!t5_mesh_transformer \
  --model_dir="gs://t5-data/pretrained_models/mt5/base" \
  --use_model_api \
  --mode="export_predict" \
  --export_dir="{saved_model_dir}"

saved_model_path = os.path.join(saved_model_dir, max(os.listdir(saved_model_dir)))

While running the above code, it loads T5-Base, resulting in the error below due to a shape mismatch. How does one export an mT5 model?

tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [32128,768] rhs shape= [250112,768]
	 [[{{node save/Assign_281}}]]  

Clean model from Hugging Face returning only `<pad> <extra_id_0>.</s>`

When I try to load a model from Hugging Face I get this kind of response. What am I doing wrong? Or is something wrong with the model uploaded to Hugging Face? I wanted to fine-tune it but I cannot, because it keeps returning this string.

You are using the legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
/home/gasperspagnolo/Documents/stuff/testing/.venv/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:470: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
<pad> <extra_id_0>.</s>

Here is my mini sample code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
prompt = "Can you translate the following paragraph into French? 'Climate change is one of the most significant challenges facing humanity. Its effects on agriculture are particularly concerning, as they threaten our ability to feed a growing population.'"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=100)
decoded_output = tokenizer.decode(output[0])
print(decoded_output)

Does multilingual-t5 need tensor2tensor ?

Hi.
About half a year ago, when I fine-tuned an original Japanese summarization task with multilingual-t5, it worked fine.
But now the same task does not work. :_(

The library versions and error are as follows.

  • library
t5[gcp]==0.7.1
tensorflow-gpu==2.3.0
mesh-tensorflow==0.1.17
tensorflow-datasets==4.1.0
tensorflow-text==2.3.0
  • error
INFO:tensorflow:training_loop marked as finished
I0507 05:11:07.176133 140511804491584 error_handling.py:115] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0507 05:11:07.176389 140511804491584 error_handling.py:149] Reraising captured error
Traceback (most recent call last):
  File "/usr/local/bin/t5_mesh_transformer", line 8, in <module>
    sys.exit(console_entry_point())
  File "/usr/local/lib/python3.6/dist-packages/t5/models/mesh_transformer_main.py", line 262, in console_entry_point
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/t5/models/mesh_transformer_main.py", line 256, in main
    model_dir=FLAGS.model_dir)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/utils.py", line 2302, in run
    skip_seen_data=skip_seen_data)
  File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/utils.py", line 1615, in train_model
    estimator.train(input_fn=input_fn, max_steps=train_steps)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3089, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3084, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1201, in _train_model_default
    self._get_features_and_labels_from_input_fn(input_fn, ModeKeys.TRAIN))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1037, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3041, in _call_input_fn
    return input_fn(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/utils.py", line 1602, in input_fn
    dataset_split=dataset_split)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/t5/models/mesh_transformer.py", line 73, in mesh_train_dataset_fn
    feature_keys=feature_keys, ensure_eos=eos_keys)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/dataset.py", line 123, in pack_or_pad
    dataset = pack_dataset(dataset, length=length, keys=feature_keys)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/dataset.py", line 551, in pack_dataset
    dataset = _pack_with_custom_ops(dataset, keys, length)
  File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/dataset.py", line 690, in _pack_with_custom_ops
    from tensor2tensor.data_generators.ops import pack_sequences_ops  # pylint: disable=g-import-not-at-top
ModuleNotFoundError: No module named 'tensor2tensor'
  In call to configurable 'pack_dataset' (<function pack_dataset at 0x7fcafcd94e18>)
  In call to configurable 'pack_or_pad' (<function pack_or_pad at 0x7fcafcd85bf8>)
  In call to configurable 'mesh_train_dataset_fn' (<function mesh_train_dataset_fn at 0x7fcafd0e6840>)
  In call to configurable 'run' (<function run at 0x7fcafccc5b70>)
  • My question
    Does multilingual-t5 need tensor2tensor? About half a year ago, I did not see this error.
    I then installed tensor2tensor==1.15.7, but an error still occurred:
  File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/dataset.py", line 690, in _pack_with_custom_ops
    from tensor2tensor.data_generators.ops import pack_sequences_ops  # pylint: disable=g-import-not-at-top

Please advise.

how long does it take to train mT5 with mC4?

I was given a week to reproduce Google's mT5, with the threat that my PhD study will be terminated right away if I fail, meaning training mT5 on mC4 plus fine-tuning. Isn't that an unreasonable request? The model and dataset are huge.

can't reproduce finetuning task via colab

Hi,
I'm trying to reproduce the fine-tuning task on Colab (based on https://github.com/google-research/text-to-text-transfer-transformer/tree/master/notebooks) but got some errors when trying to fine-tune the model.

I'm adding the task and mixture using the sample from https://github.com/google-research/multilingual-t5/blob/master/multilingual_t5/tasks.py

The error shows up when the fine-tune command is executed; it says:

for xnli mixture

/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/features/text_feature.py in from_json_content(cls, value)
    183     if "use_encoder" in value:
    184       raise ValueError(
--> 185           "Deprecated encoder not supported. Please use the plain text version "
    186           "with `tensorflow_text`."
    187       )

ValueError: Deprecated encoder not supported. Please use the plain text version with `tensorflow_text`.
  In call to configurable 'mesh_train_dataset_fn' (<function mesh_train_dataset_fn at 0x7fcd13c28378>)

for tydiqa mixture

Tensor-typed variable initializers must either be wrapped in an init_scope or callable (e.g., `tf.Variable(lambda : tf.truncated_normal([10, 40]))`) when building functions. Please file a feature request if this restriction inconveniences you.

Here is my Google Colab if you don't mind taking a look:
https://colab.research.google.com/drive/12a4ZgJUI6XRzkc0ZiFyuYiFWpCiiXSde?usp=sharing

Thanks in advance.

How mt5 checkpoint is selected on TydiQA dataset

Hi,
This dataset only has train/validation sets. I assume the reported numbers are on the validation set; in that case, could you please clarify how you select the best checkpoint? On which dataset is validation done? Thank you.

evaluation for MLQA in MT5

Hi,
I have some questions on MLQA evaluation. It seems to me that in mT5, when applying the SQuAD preprocessor, you choose only the first answer as the selected answer [1].

  1. Why does mT5 not consider all the answers?
  2. How can one handle multiple answers with the mT5 model? How should I form the sequence-to-sequence format in this case?
  3. During evaluation it looks to me as if you consider multiple answers [2]; at least targets and predictions are lists there, which is confusing, since only the first answer seems to be passed to the script from the beginning. I am also not sure how mT5 returns a list. Could you clarify this please?

thank you @nconstant-google

[1]


[2] https://github.com/google-research/text-to-text-transfer-transformer/blob/732efb09be3c6ed9b2d0d1b3e6ac45f08e15989a/t5/evaluation/qa_utils.py#L89

mT5 sentence piece tokenizer

Hey guys,

Thanks for open-sourcing mT5!
Is it maybe possible to provide a link to the sentence piece model file that is used for the pre-trained mT5 checkpoints?
I could not find the file anywhere in the provided checkpoint links.

mT5-Small is taking a large amount of RAM while preprocessing.

I am using mT5-Small for a machine translation task with PyTorch and transformers. I have approximately 3 million parallel sentence pairs and am using 96 GB of RAM and one P100 GPU for fine-tuning, but I am not able to fine-tune because the RAM is fully exhausted before training starts.

tydiqa update causes error

When trying to export the model with --module_import="multilingual_t5.tasks",
tasks.py raises a "tydiqa_en" not assigned error at line 273.
Please check the error.
If line 273 is commented out, it works fine.

pre-training sample dataset for mT5

Hi, thank you for the great work.
I am curious what the pre-training samples look like across different languages. If possible, please provide a sample dataset.
If you could also point me to the pre-processing (for pre-training) and pre-training scripts, it would be a great help.
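
For what it's worth, a pretraining example in the text-to-text format follows the canonical span-corruption pattern from the T5 paper; the snippet below is an illustration only (not an actual mC4 record), and mC4 examples look the same in each language:

# Sentinel tokens <extra_id_N> mark the corrupted spans; the target reconstructs them.
example = {
    "inputs":  "Thank you <extra_id_0> me to your party <extra_id_1> week.",
    "targets": "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>",
}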

no generative task?

Not an issue, but a general question:

In the paper, all the tasks for which results are reported are discriminative in nature. I didn't find any results for a generative task whereas the original monolingual t5 showed results on both discriminative and generative tasks.

Any particular reason?

checkpoint file size varies

I can see that the checkpoint file size varies within mT5-Base. Likewise, it is also observed for the mT5-Large models.
Is there any specific reason for this?

PS: I am not referring to the checkpoint file size difference between mT5-Base and mT5-Large, which is expected.

How to have a sentencepiece tokenizer that already has the extra 100 ids

I see that the mT5 tokenizer comes by default with the 100 extra ids that are needed by T5 models.

I've trained my own tokenizer following the official sentencepiece documentation.

But I was unable to understand how to make those 100 extra ids part of my .model and .vocab, similar to the mc4 mT5 tokenizer here: gs://t5-data/vocabs/mc4.250000.100extra/sentencepiece.model

Also, could someone please share the sentencepiece training args used for the actual mT5 tokenizer training?

Should I use the --byte_fallback arg?
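
One possible approach (an assumption on my part, not an official answer): in the t5 library the sentinel ids can be appended by the SentencePieceVocabulary wrapper on top of an existing .model, so they do not necessarily need to be trained into the sentencepiece model itself. A sketch, with MY_SPM_PATH as a hypothetical path:

from t5.data import sentencepiece_vocabulary

MY_SPM_PATH = "gs://my-bucket/vocab/sentencepiece.model"  # hypothetical path
vocab = sentencepiece_vocabulary.SentencePieceVocabulary(MY_SPM_PATH, extra_ids=100)
# vocab_size reports the base sentencepiece size plus the 100 extra ids.
print(vocab.vocab_size)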

XLM-R and mT5 comparison

Dear MT5 authors,
When I compare the results of XLM-R with mT5-Base, which is the model with the same number of (even a few more) parameters as the XLM-R Large model, I see much better results for XLM-R. Do you agree that XLM-R is a better cross-lingual model? Thanks.

Cannot load with T5ForConditionalGeneration

I want to load the Huggingface checkpoint using model = T5ForConditionalGeneration.from_pretrained('google/mt5-large'), however I obtain the following outputs,

Some weights of the model checkpoint at google/mt5-large were not used when initializing T5ForConditionalGeneration:
['encoder.block.0.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.0.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.1.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.1.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.2.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.2.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.3.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.3.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.4.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.4.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.5.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.5.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.6.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.6.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.7.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.7.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.8.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.8.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.9.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.9.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.10.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.10.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.11.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.11.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.12.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.12.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.13.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.13.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.14.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.14.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.15.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.15.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.16.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.16.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.17.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.17.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.18.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.18.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.19.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.19.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.20.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.20.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.21.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.21.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.22.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.22.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.23.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.23.layer.1.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_0.weight', 
'decoder.block.8.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_1.weight']

  • This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
  • This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at google/mt5-large and are newly initialized: ['encoder.block.0.layer.1.DenseReluDense.wi.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'encoder.block.2.layer.1.DenseReluDense.wi.weight', 'encoder.block.3.layer.1.DenseReluDense.wi.weight', 'encoder.block.4.layer.1.DenseReluDense.wi.weight', 'encoder.block.5.layer.1.DenseReluDense.wi.weight', 'encoder.block.6.layer.1.DenseReluDense.wi.weight', 'encoder.block.7.layer.1.DenseReluDense.wi.weight', 'encoder.block.8.layer.1.DenseReluDense.wi.weight', 'encoder.block.9.layer.1.DenseReluDense.wi.weight', 'encoder.block.10.layer.1.DenseReluDense.wi.weight', 'encoder.block.11.layer.1.DenseReluDense.wi.weight', 'encoder.block.12.layer.1.DenseReluDense.wi.weight', 'encoder.block.13.layer.1.DenseReluDense.wi.weight', 'encoder.block.14.layer.1.DenseReluDense.wi.weight', 'encoder.block.15.layer.1.DenseReluDense.wi.weight', 'encoder.block.16.layer.1.DenseReluDense.wi.weight', 'encoder.block.17.layer.1.DenseReluDense.wi.weight', 'encoder.block.18.layer.1.DenseReluDense.wi.weight', 'encoder.block.19.layer.1.DenseReluDense.wi.weight', 'encoder.block.20.layer.1.DenseReluDense.wi.weight', 'encoder.block.21.layer.1.DenseReluDense.wi.weight', 'encoder.block.22.layer.1.DenseReluDense.wi.weight', 'encoder.block.23.layer.1.DenseReluDense.wi.weight', 'decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.2.layer.2.DenseReluDense.wi.weight', 'decoder.block.3.layer.2.DenseReluDense.wi.weight', 'decoder.block.4.layer.2.DenseReluDense.wi.weight', 'decoder.block.5.layer.2.DenseReluDense.wi.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight', 'decoder.block.8.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.2.DenseReluDense.wi.weight', 'decoder.block.10.layer.2.DenseReluDense.wi.weight', 'decoder.block.11.layer.2.DenseReluDense.wi.weight', 'decoder.block.12.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.2.DenseReluDense.wi.weight', 'decoder.block.14.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.2.DenseReluDense.wi.weight', 'decoder.block.16.layer.2.DenseReluDense.wi.weight', 'decoder.block.17.layer.2.DenseReluDense.wi.weight', 'decoder.block.18.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.2.DenseReluDense.wi.weight', 'decoder.block.20.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.2.DenseReluDense.wi.weight', 'decoder.block.22.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.2.DenseReluDense.wi.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Seems like all weights are not correctly initialized from the checkpoint. How can I solve this issue?

after finetuning mt5 with transformers, the generation results become strange

I am trying to do a translation task with mT5, specifically English to Chinese.
After fine-tuning for many steps on millions of parallel sentence pairs, the loss went down to 2.407 and stayed there for a while. I then tried the model after fine-tuning.
[loss curve image]

My input is an English sentence and the output is a Chinese sentence. Before generating the Chinese sentence, the model almost always generates '39' first. I generated 30 sentences, and 29 of them started with '39'. Of course, no sentence in the English input begins with 39. The generation quality is bad.

[example image: top is the original, bottom is the generated output]

In my Chinese corpus, I found only one sentence that starts with '39'.
What caused this problem?
Should I use mT5 for translation tasks? Is there any good solution for translation tasks?
Thanks!

Ground truth preprocessing like T5 prepend method?

Hello,

In the original T5 paper, the summarization ground truths are prepended with the string "summarize: ". Do you follow this training method? If you do, do you prepend a translated version of the summarize string for different languages? I couldn't find a mention of this in the paper.

Thank you
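
Not an official answer, but for context: in t5-style pipelines a task prefix is normally prepended to the model inputs by a text preprocessor. A minimal sketch (my own, assuming a tf.data dataset of {"inputs", "targets"} string examples):

import tensorflow as tf

def add_task_prefix(dataset, prefix="summarize: "):
    """Prepend a fixed task prefix to the 'inputs' field of each example."""
    def _map(ex):
        return {"inputs": tf.strings.join([prefix, ex["inputs"]]),
                "targets": ex["targets"]}
    return dataset.map(_map, num_parallel_calls=tf.data.experimental.AUTOTUNE)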

The latest timestamp of mc4

Apologies for troubling you with this matter. In your paper, you mention that "we make use of all of the 71 monthly web scrapes released so far by Common Crawl," but I am unsure about the exact starting time.
Do you mean 2014/03 or 2008? I would greatly appreciate it if you could provide me with information regarding this matter.

`ValueError: Shape of variable ... ` when loading XL model

I am using the same code for loading the models and getting the following error on the XL models, only:

ValueError: Shape of variable decoder/block_000/layer_000/SelfAttention/k:0 ((1024, 1024)) doesn't match with shape of tensor decoder/block_000/layer_000/SelfAttention/k ([2048, 2048]) from checkpoint reader.
  In call to configurable 'run' (<function run at 0x7fe352ea0680>)

My code is pretty straightforward:
(a) I have my tasks defined in my python code parsiglue_tasks.py
(b) and in the commandline:

  python -m t5.models.mesh_transformer_main \
    --module_import="parsiglue_tasks" \
    --tpu="${TPU_NAME}" \
    --gcp_project="${PROJECT}" \
    --tpu_zone="${ZONE}" \
    --model_dir="${MODEL_DIR}" \
    --gin_file="dataset.gin" \
    --gin_file="${PRETRAINED_DIR}/operative_config.gin" \
    --gin_param="utils.run.save_checkpoints_steps=1000" \
    --gin_param="utils.tpu_mesh_shape.tpu_topology = 'v3-8'" \
    --gin_param="MIXTURE_NAME = '${TASK}'" \
    --gin_param="utils.run.batch_size=('tokens_per_batch', 24576)" \
    --gin_param="utils.run.train_steps=$((PRETRAINED_STEPS + FINETUNE_STEPS))" \
    --gin_param="utils.run.init_checkpoint='${PRETRAINED_DIR}/model.ckpt-${PRETRAINED_STEPS}'" \
    --t5_tfds_data_dir="${BUCKET}/t5-tfds" \
    --gin_location_prefix="multilingual_t5/gin/"

My TPU software version is 2.2 and my virtual machine is hosting the following libraries:

t5                       0.7.0
tensorboard              2.2.2
tensorboard-plugin-wit   1.7.0
tensorflow               2.2.1
tensorflow-datasets      4.0.1
tensorflow-estimator     2.2.0
tensorflow-metadata      0.24.0
tensorflow-text          2.2.0

How much data was used to train the mT5 tokenizer?

Though not explicitly stated in the paper, I understand that mT5 uses a SentencePiece Unigram tokenizer (please correct me if I am wrong). I cannot seem to find how much data this tokenizer was trained on.

The mT5 paper says, "As in T5, we use SentencePiece (Kudo and Richardson, 2018; Kudo, 2018) models trained with the language sampling rates used during pre-training." The T5 paper says, "Then, we trained our SentencePiece model on a mixture of 10 parts of English C4 data with 1 part each of data classified as German, French or Romanian.", but I do not see the raw GB and/or token counts for the tokenizer's training data.

How much data was the tokenizer trained on? (And, if you recall, approximately how long did it take to train, and how much RAM was required?)

Error trying to run pre-training with mc4

I'm getting an error where the vocabulary is being returned as None.

This is the invocation I'm using to call pre-training on mc4

python -m t5.models.mesh_transformer_main \
  --tpu="${TPU}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${BUCKET}/t5-tfds" \
  --gin_file="models/t5.1.1.small.gin" \
  --gin_param="MIXTURE_NAME = '${TASK}'" \
  --gin_param="utils.run.sequence_length = {'inputs': 1024, 'targets': 256}" \
  --gin_param="utils.run.batch_size = ('tokens_per_batch', 1048576)" \
  --gin_param="run.train_steps = 1000000" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'" \
  --gin_param="tsv_dataset_fn.vocabulary = 'SentencePieceVocabulary()'" \
  --gin_param="SentencePieceVocabulary.sentencepiece_model_file = 'gs://t5-data/vocabs/mc4.250000.100extra/sentencepiece.model'" \
  --module_import="multilingual_t5.tasks"

Please find the stack trace here

I0131 14:29:29.881974 140545131808576 resource_reader.py:50] system_path_file_exists:models/t5.1.0.base.gin
E0131 14:29:29.882193 140545131808576 resource_reader.py:55] Path not found: models/t5.1.0.base.gin
I0131 14:29:29.882565 140545131808576 resource_reader.py:50] system_path_file_exists:models/bi_v1.gin
E0131 14:29:29.882784 140545131808576 resource_reader.py:55] Path not found: models/bi_v1.gin
I0131 14:29:29.883124 140545131808576 resource_reader.py:50] system_path_file_exists:models/bi_bert_base.gin
E0131 14:29:29.883371 140545131808576 resource_reader.py:55] Path not found: models/bi_bert_base.gin
INFO:tensorflow:model_type=bitransformer
I0131 14:29:29.888293 140545131808576 utils.py:2371] model_type=bitransformer
INFO:tensorflow:mode=train
I0131 14:29:29.888453 140545131808576 utils.py:2372] mode=train
INFO:tensorflow:sequence_length={'inputs': 1024, 'targets': 256}
I0131 14:29:29.888569 140545131808576 utils.py:2373] sequence_length={'inputs': 1024, 'targets': 256}
INFO:tensorflow:batch_size=1024
I0131 14:29:29.888664 140545131808576 utils.py:2374] batch_size=1024
INFO:tensorflow:train_steps=1000000
I0131 14:29:29.888759 140545131808576 utils.py:2375] train_steps=1000000
INFO:tensorflow:total_run_steps=1000000
I0131 14:29:29.888846 140545131808576 utils.py:2377] total_run_steps=1000000
INFO:tensorflow:mesh_shape=Shape[batch=8]
I0131 14:29:29.888945 140545131808576 utils.py:2378] mesh_shape=Shape[batch=8]
INFO:tensorflow:layout_rules=ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch
I0131 14:29:29.889025 140545131808576 utils.py:2379] layout_rules=ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch
INFO:tensorflow:Building TPUConfig with tpu_job_name=None
I0131 14:29:29.889143 140545131808576 utils.py:2394] Building TPUConfig with tpu_job_name=None
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/t5/models/mesh_transformer_main.py", line 264, in <module>
    console_entry_point()
  File "/opt/conda/lib/python3.7/site-packages/t5/models/mesh_transformer_main.py", line 261, in console_entry_point
    app.run(main)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/lib/python3.7/site-packages/t5/models/mesh_transformer_main.py", line 255, in main
    model_dir=FLAGS.model_dir)
  File "/opt/conda/lib/python3.7/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/opt/conda/lib/python3.7/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/opt/conda/lib/python3.7/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 2426, in run
    estimator = estimator_fn()
  File "/opt/conda/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 1572, in get_estimator
    input_vocab_size=inputs_vocabulary(vocabulary).vocab_size,
AttributeError: 'NoneType' object has no attribute 'vocab_size'
  In call to configurable 'run' (<function run at 0x7fd2ff833290>)

Need help with this @craffel @adarob
Thanks in Advance

can't finetune on gpu

Hello,
Congratulations and thank you for your contribution.
I am trying to fine-tune mT5 on my dataset but I couldn't make it work. I am installing T5 via pip install, but it has no explicit requirement on TensorFlow versions. I tried some solutions from the t5 repo but none worked. Any help is appreciated.

My error :

loading CUDA OK
2020-11-04 17:32:58.543624: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /home/user/.conda/envs/t5/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
....;
    task_or_mixture_name)
ValueError: No Task or Mixture found with name: xnli_zeroshot
  In call to configurable 'get_vocabulary' (<function get_vocabulary at 0x7f57c77d3b70>)

This is the command I use for fine-tuning:
PRETRAINED_DIR="/mt5/small"
PRETRAINED_STEPS=1000000
FINETUNE_STEPS=20000
MODEL_DIR="//models

t5_mesh_transformer
--model_dir="${MODEL_DIR}"
--gin_file="${PRETRAINED_DIR}/operative_config.gin"
--gin_file="sequence_lengths/xquad.gin"
--gin_param="utils.run.train_dataset_fn = @t5.models.mesh_transformer.tsv_dataset_fn"
--gin_param="tsv_dataset_fn.filename = '/t5-train/multilingual-t5/test.tsv'"
--gin_param="utils.run.train_steps=$((PRETRAINED_STEPS+FINETUNE_STEPS))"
--gin_param="utils.run.init_checkpoint='${PRETRAINED_DIR}/model.ckpt-${PRETRAINED_STEPS}'"
--gin_param="utils.run.mesh_shape = 'model:1,batch:1'"
--gin_param="utils.run.mesh_devices = ['gpu:1']"
--gin_location_prefix="multilingual_t5/gin/"

and pip packages:

cuda/10.1
cudnn/7.6-cuda-10.1

Package                  Version
------------------------ ---------------------
absl-py                  0.11.0
astunparse               1.6.3
attrs                    20.2.0
Babel                    2.8.0
cachetools               4.1.1
certifi                  2020.6.20
chardet                  3.0.4
click                    7.1.2
dataclasses              0.7
dill                     0.3.3
dm-tree                  0.1.5
filelock                 3.0.12
future                   0.18.2
gast                     0.3.3
gin-config               0.3.0
google-auth              1.23.0
google-auth-oauthlib     0.4.2
google-pasta             0.2.0
googleapis-common-protos 1.52.0
grpcio                   1.33.2
h5py                     2.10.0
idna                     2.10
importlib-metadata       2.0.0
importlib-resources      3.3.0
joblib                   0.17.0
Keras-Preprocessing      1.1.2
Markdown                 3.3.3
mesh-tensorflow          0.1.17
nltk                     3.5
numpy                    1.19.4
oauthlib                 3.1.0
opt-einsum               3.3.0
packaging                20.4
pandas                   1.1.4
pip                      20.2.4
portalocker              2.0.0
promise                  2.3
protobuf                 3.13.0
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pyparsing                2.4.7
python-dateutil          2.8.1
pytz                     2020.4
regex                    2020.10.28
requests                 2.24.0
requests-oauthlib        1.3.0
rouge-score              0.0.4
rsa                      4.6
sacrebleu                1.4.14
sacremoses               0.0.43
scikit-learn             0.23.2
scipy                    1.5.3
sentencepiece            0.1.94
setuptools               50.3.0.post20201103
six                      1.15.0
t5                       0.7.1
tensorboard              2.3.0
tensorboard-plugin-wit   1.7.0
tensorflow               2.3.1
tensorflow-datasets      4.0.1
tensorflow-estimator     2.3.0
tensorflow-metadata      0.25.0
tensorflow-text          2.3.0
termcolor                1.1.0
tfds-nightly             4.0.1.dev202011030854
threadpoolctl            2.1.0
tokenizers               0.9.2
torch                    1.7.0
tqdm                     4.51.0
transformers             3.4.0
typing-extensions        3.7.4.3
urllib3                  1.25.11
Werkzeug                 1.0.1
wheel                    0.35.1
wrapt                    1.12.1
zipp                     3.4.0

Thank you in advance
@adarob

can't reproduce finetuning

Hello,
I closed the other issue by mistake, so I will post my new error here. I am trying to fine-tune mT5 on my dataset but I couldn't make it work; I am installing T5 via pip install. Any help is appreciated.

My error

2020-11-04 17:32:58.543624: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /home/user/.conda/envs/t5/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
....;
    task_or_mixture_name)
ValueError: No Task or Mixture found with name: xnli_zeroshot
  In call to configurable 'get_vocabulary' (<function get_vocabulary at 0x7f57c77d3b70>)

This is the command I use for fine-tuning:

PRETRAINED_DIR="/mt5/small"
PRETRAINED_STEPS=1000000
FINETUNE_STEPS=20000
MODEL_DIR="//models

t5_mesh_transformer
--model_dir="${MODEL_DIR}"
--gin_file="${PRETRAINED_DIR}/operative_config.gin"
--gin_file="sequence_lengths/xquad.gin"
--gin_param="utils.run.train_dataset_fn = @t5.models.mesh_transformer.tsv_dataset_fn"
--gin_param="tsv_dataset_fn.filename = '/t5-train/multilingual-t5/test.tsv'"
--gin_param="utils.run.train_steps=$((PRETRAINED_STEPS+FINETUNE_STEPS))"
--gin_param="utils.run.init_checkpoint='${PRETRAINED_DIR}/model.ckpt-${PRETRAINED_STEPS}'"
--gin_param="utils.run.mesh_shape = 'model:1,batch:1'"
--gin_param="utils.run.mesh_devices = ['gpu:1']"
--gin_location_prefix="multilingual_t5/gin/"

AttributeError: 'NoneType' object has no attribute 'vocab_size'

Hi,

I am trying to decode some text with a fine-tuned mT5 model; the command looks like the following. The reason I am using test.gin is that when using the operative gin file there was a mismatch error on the line adafactor_decay_rate_pow.offset = 0, so I commented that line out.

python -m t5.models.mesh_transformer_main \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --gin_file="${MODEL_DIR}/test.gin" \
  --gin_file="infer.gin" \
  --gin_file="sample_decode.gin" \
  --gin_param="input_filename = '${INPUT_FILE}'"\
  --gin_param="output_filename = '/tmp/outputs.txt'"\
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'"\
  --gin_param="infer_checkpoint_step = 1003000"

And the error was:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/t5/models/mesh_transformer_main.py", line 264, in <module>
    console_entry_point()
  File "/usr/local/lib/python3.7/dist-packages/t5/models/mesh_transformer_main.py", line 261, in console_entry_point
    app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.7/dist-packages/t5/models/mesh_transformer_main.py", line 255, in main
    model_dir=FLAGS.model_dir)
  File "/usr/local/lib/python3.7/dist-packages/gin/config.py", line 1078, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.7/dist-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.7/dist-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mesh_tensorflow/transformer/utils.py", line 2292, in run
    mesh_devices=mesh_devices)
  File "/usr/local/lib/python3.7/dist-packages/mesh_tensorflow/transformer/utils.py", line 1526, in get_estimator
    input_vocab_size=inputs_vocabulary(vocabulary).vocab_size,
AttributeError: 'NoneType' object has no attribute 'vocab_size'
  In call to configurable 'run' (<function run at 0x7fc3f511bbf8>)

I have tried adding the following params but none of them worked.

--module_import="multilingual_t5.tasks" 
--gin_location_prefix="multilingual_t5/gin/" 
--gin_param="tsv_dataset_fn.vocabulary = SentencePieceVocabulary()"
--gin_param="SentencePieceVocabulary.sentencepiece_model_file = 'gs://t5-data/vocabs/mc4.250000.100extra/sentencepiece.model'"

Do you have any idea about this one?
Any help would be appreciated, thanks!

Colab Demo

Can we have a Colab demo showing how to use this, like the one in the original T5 text-to-text transformer GitHub repo? That would be really great!

Vocab error unsupported operand type(s) on latest t5 package

I got a vocab error when fine-tuning using MtfModel.

/usr/local/lib/python3.6/dist-packages/t5/data/vocabularies.py in vocab_size(self)
     79   def vocab_size(self) -> int:
     80     """Vocabulary size, including extra ids."""
---> 81     return self._base_vocab_size + self.extra_ids
     82 
     83   @abc.abstractproperty

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

If I downgrade to the t5==0.7.1 package, the error is gone.

Here is my MtfModel config:

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=TPU_ADDRESS,
    tpu_topology=TPU_TOPOLOGY,
    model_parallelism=8,
    batch_size=16,
    sequence_length={"inputs": 512, "targets": 32},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    keep_checkpoint_max= None,
    iterations_per_loop=100,
)

model.finetune(
    mixture_or_task_name="xquad_zeroshot",
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=10000
)

pre-training data sampler

@craffel, hey there

In the paper, on page 3 in Section 3.2 (mT5), you mention a data-sampling technique that keeps a good balance between low- and high-resource languages: examples are sampled with a probability controlled by an exponent alpha, which prevents the model from overfitting on low-resource languages and underfitting on high-resource ones.

Could you please link me to the specific data-sampler code in the mT5 codebase? If you could also point me to the pre-processing (for pre-training) and pre-training scripts, it would be a great help, as I am trying to pre-train mT5 using the T5 pre-training script from Hugging Face:
https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py
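
As a rough, purely illustrative sketch of this kind of sampling (not the actual mT5 preprocessing code), one can draw each pre-training example's language with probability proportional to that language's example count raised to an exponent alpha; the language counts below are invented:

import numpy as np

# Illustrative sketch only, not the mT5 codebase: sample a language with
# probability p(L) proportional to |L|**alpha, where |L| is the number of
# examples in language L. alpha < 1 up-weights low-resource languages.
# The example counts below are made up for illustration.
example_counts = {"en": 3_000_000_000, "sw": 10_000_000, "yo": 1_000_000}
alpha = 0.3  # the mT5 paper uses alpha = 0.3 for its pre-training mixture

languages = list(example_counts)
weights = np.array([example_counts[lang] ** alpha for lang in languages])
probabilities = weights / weights.sum()

# Pick the language for each pre-training example according to these rates.
sampled = np.random.choice(languages, size=8, p=probabilities)
print(dict(zip(languages, probabilities.round(3))))
print(sampled)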

how to test mT5 on NER tasks

Hi,
I'd like to evaluate mT5 on the WikiAnn NER tasks from the paper. I checked the registered tasks and, as far as I can tell, the NER ones are not included. Could you share how I can add them and test mT5 on the NER task? Thanks.
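
For reference, one common way to cast WikiAnn-style NER as text-to-text for a model like mT5 is to serialize the tagged spans into a target string; the sketch below is only an illustration (the example sentence, BIO tag scheme, and "entity: type" output format are assumptions, not necessarily the paper's exact setup):

# Hypothetical sketch: turn a BIO-tagged WikiAnn-style example into a
# text-to-text pair. The sentence, tags, and "entity: type" target format
# are illustrative assumptions, not the exact format used in the paper.
def ner_to_text_pair(tokens, bio_tags):
    """Convert tokens + BIO tags into (input_text, target_text)."""
    spans, current_tokens, current_type = [], [], None
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens:
            current_tokens.append(token)
        else:  # an "O" tag closes any open span
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))

    input_text = "ner: " + " ".join(tokens)
    target_text = " ; ".join(f"{text}: {etype}" for text, etype in spans) or "none"
    return input_text, target_text

print(ner_to_text_pair(
    ["Barack", "Obama", "visited", "Nairobi", "."],
    ["B-PER", "I-PER", "O", "B-LOC", "O"],
))
# ('ner: Barack Obama visited Nairobi .', 'Barack Obama: PER ; Nairobi: LOC')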

How to access the outputs before sentencepiece detokenisation?

Hi there,

Is there a way to look at the output tokens before they are detokenized? Right now I am using model.predict() (as shown in the t5-trivia example) to generate the output for a sequence-to-sequence model, but this saves the detokenized output to a file.

I have tried looking at the result of the decode method, but it also returns output that is already detokenized. I want to see the output token ids before SentencePiece detokenization. How can I do this?

Thanks
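
One possible workaround (an assumption on my part, not an official t5 API): re-encode the detokenized predictions with the same SentencePiece model to recover approximate token ids. The local model path and prediction file name below are placeholders:

import sentencepiece as spm

# Rough workaround sketch, not an official t5 API: re-encoding the detokenized
# predictions with the SentencePiece model that produced them recovers the
# token ids only approximately (detokenization can be lossy). Assumes the mC4
# vocab gs://t5-data/vocabs/mc4.250000.100extra/sentencepiece.model has been
# copied locally first (e.g. with gsutil cp).
sp = spm.SentencePieceProcessor()
sp.Load("sentencepiece.model")

with open("predictions.txt") as predictions:  # placeholder prediction file
    for line in predictions:
        ids = sp.EncodeAsIds(line.strip())
        pieces = sp.EncodeAsPieces(line.strip())
        print(ids, pieces)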

RuntimeError: Internal: ▁<extra_id_9> is already defined.

Hi,

When I want to run the fine-tuning, it gives me the following error!

Note that the same code worked for me before, but now it does not!


RuntimeError Traceback (most recent call last)
in ()
4 mixture_or_task_name="t5_wikisql_all",
5 pretrained_model_dir=PRETRAINED_DIR,
----> 6 finetune_steps=FINETUNE_STEPS
7 )

8 frames
/usr/local/lib/python3.7/dist-packages/sentencepiece/__init__.py in LoadFromSerializedProto(self, serialized)
73
74 def LoadFromSerializedProto(self, serialized):
---> 75 return _sentencepiece.SentencePieceProcessor_LoadFromSerializedProto(self, serialized)
76
77 def SetEncodeExtraOptions(self, extra_option):

RuntimeError: Internal: ▁<extra_id_9> is already defined.

including NER tasks

Hi,
I need to run the mT5 model on NER tasks. Could you please help with adding them? Thanks.

Why does it have two DenseReluDense/wi kernels?

In T5, there is just one DenseReluDense/wi kernel, e.g. encoder/block_000/layer_001/DenseReluDense/wi/kernel, but in mT5 there are two (encoder/block_000/layer_001/DenseReluDense/wi_0/kernel and encoder/block_000/layer_001/DenseReluDense/wi_1/kernel). Can you point me to where this change comes from, please?
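
For context, mT5 follows the T5.1.1 recipe, whose feed-forward block uses a gated activation with two input projections (hence wi_0 and wi_1) instead of the single wi in the original T5. A minimal numpy sketch of such a gated feed-forward (the dimensions, the GELU approximation, and which index gates which are illustrative, not taken from the library):

import numpy as np

# Minimal illustrative sketch of a gated (GEGLU-style) feed-forward, as used
# in the T5.1.1 recipe that mT5 follows: two input projections replace the
# single wi of the original T5. Dimensions are made up.
def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

d_model, d_ff = 8, 32
rng = np.random.default_rng(0)
x = rng.normal(size=(4, d_model))          # [batch, d_model]
wi_0 = rng.normal(size=(d_model, d_ff))    # projection passed through GELU (index choice illustrative)
wi_1 = rng.normal(size=(d_model, d_ff))    # linear projection
wo = rng.normal(size=(d_ff, d_model))      # output projection

hidden = gelu(x @ wi_0) * (x @ wi_1)       # gated activation
output = hidden @ wo
print(output.shape)  # (4, 8)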

mT5 data sampling for pre-training

Hi @craffel, I have some questions about the data sampling and T5 span corruption. Could you please link me to the corresponding implementation?

  • About sampling high/low-resource languages: do you use one dataloader for each of the 100+ languages and sample among them during pre-training?
  • About non-padding span corruption (merge_examples_to_reduce_padding): is mT5 trained without padding? If so, is each sampled batch in the same language? Otherwise, each concatenated sample may contain different languages, and the span corruption may cross languages (see the toy packing sketch after this list).
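
To make the packing question concrete, here is a toy sketch of greedy example packing (purely illustrative, not the t5 library's merge_examples_to_reduce_padding implementation), showing how concatenating examples to a fixed length can mix languages within one packed sequence:

# Toy illustration of example packing (not the t5 library's implementation):
# greedily concatenate tokenized examples up to a fixed length to reduce
# padding. If the examples come from different languages, a single packed
# sequence can end up containing more than one language.
def pack_examples(token_lists, max_length):
    packed, current = [], []
    for tokens in token_lists:
        if len(current) + len(tokens) > max_length and current:
            packed.append(current)
            current = []
        current.extend(tokens)
    if current:
        packed.append(current)
    return packed

examples = [
    [101, 102, 103],        # e.g. an English example (ids are made up)
    [201, 202],             # e.g. a Swahili example
    [301, 302, 303, 304],   # e.g. a Yoruba example
]
print(pack_examples(examples, max_length=6))
# [[101, 102, 103, 201, 202], [301, 302, 303, 304]]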
