
nlg-metricverse's Introduction

nlg-metricverse 🌌

πŸš€ Spaceship: PyPI · Python versions · Made with Python · Build status · GitHub issues
πŸ‘¨β€πŸš€ Astronauts: Open Source Love · License: MIT · Maintenance
πŸ›°οΈ Training Program: Open In Colab
πŸ“• Operating Manual: COLING22 Long Paper

One NLG evaluation library to rule them all

Explore the universe of Natural Language Generation (NLG) evaluation metrics.

NLG Metricverse is an end-to-end Python library for NLG evaluation, devised to provide a living unified codebase for fast application, analysis, comparison, visualization, and prototyping of automatic metrics.

  • Spurs the adoption of newly proposed metrics, unleashing their potential.
  • Reduces the implementation burden, allowing users to easily move from papers to practical applications.
  • Increases the comparability and replicability of NLG research.
  • Provides content-rich metric cards and static/interactive visualization tools to improve metric understanding and score interpretation.

Table of Contents

πŸ’‘ Motivations

  • πŸ“– As Natural Language Generation (NLG) models get better over time, accurately evaluating them becomes an increasingly pressing priority, requiring researchers to deal with semantics, multiple plausible targets, and several intrinsic quality dimensions (e.g., informativeness, fluency, factuality).
  • πŸ€– Task examples: machine translation, abstractive question answering, single/multi-document summarization, data-to-text, chatbots, image/video captioning, etc.
  • πŸ“Œ Human evaluation is often the best indicator of the quality of a system. However, designing crowdsourcing experiments is an expensive and high-latency process, which does not easily fit into a daily model development pipeline. Therefore, NLG researchers commonly rely on automatic evaluation metrics, which provide an acceptable proxy for quality and are very cheap to compute.
  • πŸ“Œ NLG metrics automatically compute a holistic or dimension-specific score, an acceptable proxy for effectiveness and efficiency. However, they are becoming an important bottleneck for research in the field. As we know, areas can stagnate due to poor metrics, and we believe that you shouldn't feel confined to the most traditional overlap-based techniques like ROUGE.
  • πŸ’‘ If you're working on an established problem, you'll feel pressure from readers to be conservative and use the metrics that have already been adopted for the same task. However, this pressure can be constraining. Our view is that NLP engineers should enrich their evaluation toolkits with multiple metrics capturing different textual properties, feeling free to argue against cultural norms and motivate new ones, and to explore the latest contributions focused on semantics.
  • ☠ New NLG metrics are constantly being proposed at top-tier venues, but their implementations remain fragmented across distinct environments, properties, settings, benchmarks, and features, making them difficult to compare or apply.
  • ☠ The absence of a collective and continuously updated repository discourages the use of modern solutions and slows their understanding.
  • 🎯 NLG Metricverse implements a large number of prominent NLG evaluation metrics, seeking to articulate the textual properties they encode (e.g., fluency, grammatical correctness, informativeness), their target tasks, and their limits. Understanding, using, and examining a metric has never been easier.

πŸͺ Available Metrics and Supported Features

NLG Metricverse supports 38 diverse evaluation metrics overall (last update: October 12, 2022). The code for these metrics will be progressively released in the coming weeks.

Some libraries have already tried to provide an integrated environment. To the best of our knowledge, NLGEval, Hugging Face Datasets, Evaluate, TorchMetrics, and Jury are the only resources available. However, none of them possesses all the properties listed below: (i) a large number of heterogeneous NLG metrics, (ii) concurrent computation of multiple metrics at once, (iii) support for multiple references and/or predictions, (iv) meta-evaluation, and (v) visualization.

The following table summarizes the discrepancies between NLG Metricverse and related work.

| | NLG-Metricverse | NLGEval | Datasets | Evaluate | TorchMetrics | Jury |
| --- | --- | --- | --- | --- | --- | --- |
| #NLG-specific metrics | 43 + Datasets | 8 | 22 | 53 | 13 | 19 + Datasets |
| More metrics at once | βœ… | ❌ | ❌ | βœ… | ❌ | βœ… |
| Multiple refs/preds | βœ… | βœ… | ❌ | ❌ | ❌ | βœ… |
| Meta-evaluation | βœ… | ❌ | ❌ | ❌ | ❌ | ❌ |
| Visualization | βœ… | ❌ | ❌ | ❌ | ❌ | ❌ |

πŸ” Complete comparison and supported metrics

πŸ”Œ Installation

Install from PyPI repository

pip install nlg-metricverse

or build from source

git clone https://github.com/disi-unibo-nlp/nlg-metricverse.git
cd nlg-metricverse
pip install -v .

Explore on Hugging Face Spaces

The Spaces edition of NLG Metricverse will be launched soon. Check it out here:

We created the app to give users the possibility to try a demo of the library before installing it. We chose Hugging Face Spaces because it offers a simple way to host machine learning demo apps directly on a profile.

The data app Home provides a brief description of the library and its contents. On the left side of the screen is a retractable sidebar listing all the metrics available in the data app. Each metric's page contains a description of the metric and input boxes for both the prediction and the reference text. After entering the texts and pressing the Submit button, the prediction and reference are passed to the NLG-Metricverse library to compute the desired score.

πŸš€ Quickstart

Prepare your environment

For NLG Metricverse, we recommend using a virtual environment. If you are not familiar with virtual environments, you can read more about them here. Using virtual environments within a library that encompasses numerous metrics proves invaluable for seamless development and efficient management. By encapsulating each metric within its own isolated environment, potential conflicts between dependencies are mitigated, ensuring consistent and reliable behavior. This approach streamlines dependency management, enabling precise specification of version requirements for each metric. Moreover, venv facilitates rigorous testing and reproducibility, safeguarding the library's integrity across various metric-driven scenarios. As the set of metrics expands, venv simplifies collaboration among team members, reduces the risk of global environment contamination, and eases deployment processes.

Before running any code, you need to create and activate a virtual environment for the desired metric and install the required dependencies.

python -m venv nlgmetricverse\env\rouge

# activate the virtual environment on Command Prompt
nlgmetricverse\env\rouge\Scripts\activate.bat

# or else on PowerShell
nlgmetricverse\env\rouge\Scripts\activate.ps1

pip install -v . --quiet

# Also, you need to install the packages that are only available through a git source separately, with the
# following command. For the folks who are curious about "why?": a short explanation is that PyPI does not
# allow indexing a package that directly depends on non-PyPI packages, for security reasons. The file
# `requirements-dev.txt` includes packages that are currently only available through a git source, or that
# are PyPI packages with no recent release or incompatible with NLG Metricverse, so they are added as git
# sources or pinned to specific commits.
pip install -r requirements-dev.txt

# if present, install the specific requirements for the metric
pip install -r nlgmetricverse\metrics\rouge\requirements.txt

After that, you can run the code for the metric you want to use. After you are done, you can deactivate the virtual environment.

deactivate

Then, evaluating generated outputs takes only two lines of code: (i) instantiate your scorer by selecting the desired metric(s) and (ii) apply it!

Metric Selection

Specify the metrics you want to use on instantiation,

# If you specify multiple metrics, each of them will be applied to your data (allowing for a fast prediction/efficiency comparison)
scorer = NLGMetricverse(metrics=["bleu", "rouge"])

or directly import metrics from nlgmetricverse.metrics as classes, then instantiate and use them as desired.

from nlgmetricverse.metrics import BertScore

scorer = BertScore.construct()
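
A metric instantiated this way can then be applied directly. The snippet below is a minimal sketch with made-up inputs, assuming the constructed metric exposes the same compute(predictions=..., references=...) interface used elsewhere in the library:

# Hedged sketch: inputs are illustrative; the exact output fields depend on the metric.
scores = scorer.compute(
    predictions=["the cat is on the mat"],
    references=["The cat is playing on the mat."],
)
print(scores)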

You can seamlessly access both nlgmetricverse and Hugging Face datasets metrics through nlgmetricverse.load_metric. NLG Metricverse falls back to the datasets implementation for metrics that are currently not supported; you can see the metrics available for datasets on datasets/metrics.

bleu = NLGMetricverse.load_metric("bleu")
# metrics not available in `nlgmetricverse` but in `datasets`
math = NLGMetricverse.load_metric("competition_math") # It falls back to the `datasets` package with a warning

Metric Usage

Prediction-Reference Cardinality

☠ NLG evaluation is also very challenging because the relationships between candidate and reference texts tend to be one-to-many or many-to-many. An artificial text predicted by a model might have multiple human references (i.e., there is more than one effective way to say most things), just as a model can generate multiple distinct outputs. Such cardinality is crucial, but official implementations tend to neglect it. We do not.

1:1. One prediction, one reference ([p1, ..., pn] and [r1, ..., rn] syntax).

predictions = ["Evaluating artificial text has never been so simple", "the cat is on the mat"]
references = ["Evaluating artificial text is not difficult", "The cat is playing on the mat."]

1:M. One prediction, many references ([p1, ..., pn] and [[r11, ..., r1m], ..., [rn1, ..., rnm]] syntax)

predictions = ["Evaluating artificial text has never been so simple", "the cat is on the mat"]
references = [
    ["Evaluating artificial text is not difficult", "Evaluating artificial text is simple"],
    ["The cat is playing on the mat.", "The cat plays on the mat."]
]

K:M. Many predictions, many references ([[p11, ..., p1k], ..., [pn1, ..., pnk]] and [[r11, ..., r1m], ..., [rn1, ..., rnm]] syntax). This is helpful for language models with a decoding strategy focused on diversity (e.g., beam search, temperature sampling).

predictions = [
    ["Evaluating artificial text has never been so simple", "The evaluation of automatically generated text is simple."],
    ["the cat is on the mat", "the cat likes playing on the mat"]
]
references = [
    ["Evaluating artificial text is not difficult", "Evaluating artificial text is simple"],
    ["The cat is playing on the mat.", "The cat plays on the mat."]
]

Scorer Application

scores = scorer(predictions, references)

The scorer automatically selects the proper strategy for applying the selected metric(s) depending on the input format. In any case, if a prediction needs to be compared against multiple references, you can customize the reduction function to use (e.g., reduce_fn=max chooses the prediction-reference pair with the highest score for each of the N items in the dataset).

scores = scorer.compute(predictions, references, reduce_fn="max")
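
Putting the pieces together, here is a short hedged sketch that reuses the 1:M inputs from above with the scorer defined earlier; the output structure depends on the selected metrics:

from nlgmetricverse import NLGMetricverse

# Illustrative end-to-end flow: BLEU and ROUGE over one-prediction/many-references inputs.
scorer = NLGMetricverse(metrics=["bleu", "rouge"])
predictions = ["Evaluating artificial text has never been so simple", "the cat is on the mat"]
references = [
    ["Evaluating artificial text is not difficult", "Evaluating artificial text is simple"],
    ["The cat is playing on the mat.", "The cat plays on the mat."],
]
scores = scorer.compute(predictions, references, reduce_fn="max")
print(scores)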

Metric-specific Parameters

Additional metric-specific parameters can be specified on instantiation.

metrics = [
    load_metric("bleu", resulting_name="bleu_1", compute_kwargs={"max_order": 1}),
    load_metric("bleu", resulting_name="bleu_2", compute_kwargs={"max_order": 2}),
    load_metric("bertscore", resulting_name="bertscore_1", compute_kwargs={"model_type": "microsoft/deberta-large-mnli", "idf": True}),
    load_metric("rouge")]
scorer = NLGMetricverse(metrics=metrics)
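
Applying the configured scorer then works as usual. As a hedged sketch, each metric is expected to report its results under its resulting_name (e.g., bleu_1, bleu_2), although the exact output structure may vary:

# Illustrative only: reuses the predictions/references defined earlier.
scores = scorer(predictions, references)
for name, result in scores.items():
    print(name, result)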

Code Style

To check the code style,

python tests/run_code_style.py check

To format the codebase,

python tests/run_code_style.py format

🎨 Custom Metrics

You can use custom metrics by inheriting from nlgmetricverse.metrics.Metric. You can see the metrics currently implemented in NLG Metricverse under nlgmetricverse/metrics. NLG Metricverse itself uses datasets.Metric as a base class to derive its own base class, nlgmetricverse.metrics.Metric. The interface is similar; however, NLG Metricverse makes the metrics take a unified input type by handling metric-specific inputs and allowing multiple cardinalities (1:1, 1:M, K:M). For implementing custom metrics, both base classes can be used, but we strongly recommend nlgmetricverse.metrics.Metric for its advantages. When adding a custom metric, you need to:

  1. Create a folder inside nlgmetricverse/metrics with the name of your metric.
  2. Create inside the folder __init__.py, *metric*.py and *metric*_planet.py.
  3. Inside __init__.py, add the following code:
from nlgmetricverse.metrics.*metric*.*metric* import *Metric*
  4. Inside *metric*.py, add the following code:
"""
*Metric* metric super class.
"""
from nlgmetricverse.metrics._core import MetricAlias
from nlgmetricverse.metrics.*metric*.*metric*_planet import *CustomMetric*

__main_class__ = "*Metric*"


class *Metric*(MetricAlias):
    """
    *Metric* metric superclass.
    """
    _SUBCLASS = *CustomMetric*
  5. Inside *metric*_planet.py, add the following code:
from nlgmetricverse.metrics import MetricForLanguageGeneration

class CustomMetric(MetricForLanguageGeneration):
    def _compute_single_pred_single_ref(
        self, predictions, references, reduce_fn = None, **kwargs
    ):
        raise NotImplementedError

    def _compute_single_pred_multi_ref(
        self, predictions, references, reduce_fn = None, **kwargs
    ):
        raise NotImplementedError

    def _compute_multi_pred_multi_ref(
            self, predictions, references, reduce_fn = None, **kwargs
    ):
        raise NotImplementedError

For more details, have a look at the base metric implementation, nlgmetricverse.metrics.Metric.

  6. Inside your metric folder, add a README.md file, following the metric card guidelines.
  7. Add your metric to the comparison table and to the README.md file.
  8. Add your metric to the nlgmetricverse/metrics/__init__.py file.
  9. Add your metric to metrics_list inside the nlgmetricverse/metrics/_core/utils.py file.
  10. Add test cases for your metric inside the tests/nlgmetricverse/metrics folder, with its respective expected outputs inside the tests/test_data/expected_outputs/metrics folder.
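
To make the template above more concrete, below is a minimal, hypothetical exact-match metric sketched along the lines of the *metric*_planet.py step. The class name and the {"score": ...} result format are illustrative assumptions, not the library's required output structure:

from nlgmetricverse.metrics import MetricForLanguageGeneration


class ExactMatchCustom(MetricForLanguageGeneration):
    """Toy metric: fraction of predictions that exactly match their reference(s)."""

    def _compute_single_pred_single_ref(self, predictions, references, reduce_fn=None, **kwargs):
        # One prediction and one reference per item.
        matches = [int(p == r) for p, r in zip(predictions, references)]
        return {"score": sum(matches) / len(matches)}

    def _compute_single_pred_multi_ref(self, predictions, references, reduce_fn=None, **kwargs):
        # A prediction counts as a match if it equals any of its references.
        matches = [int(any(p == r for r in refs)) for p, refs in zip(predictions, references)]
        return {"score": sum(matches) / len(matches)}

    def _compute_multi_pred_multi_ref(self, predictions, references, reduce_fn=None, **kwargs):
        # Best match among all candidate predictions and references for each item.
        matches = [
            int(any(p == r for p in preds for r in refs))
            for preds, refs in zip(predictions, references)
        ]
        return {"score": sum(matches) / len(matches)}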

πŸ™Œ Contributing

Thanks go to all these wonderful collaborators for their contributions to the NLG Metricverse library:


Giacomo Frisoni

Andrea Zammarchi

Valentina Pieri

Marco Avagnano

We are hoping that the open-source community will help us improve the code and make it better! Don't hesitate to open issues and contribute fixes and improvements! We can guide you if you're not sure where to start but want to help us out πŸ₯‡. To contribute a change to our code base, please submit a pull request (PR) via GitHub and someone from our team will go over it and accept it.

If you have trouble, suggestions, or ideas, the Discussion board might have some relevant information. If not, you can post your questions there πŸ’¬πŸ—¨.

βœ‰ Contact

Contact person: Giacomo Frisoni, [email protected]. This research work has been conducted within the Department of Computer Science and Engineering, University of Bologna, Italy.

License

The code is released under the MIT License. It should not be used to promote or profit from violence, hate, and division, environmental destruction, abuse of human rights, or the destruction of people's physical and mental health.

Acknowledgments

In 2021, when we initiated the project's development, we built our contributions on top of Jury and Hugging Face Evaluate, for which we express our gratitude. License details are explicitly stated in the project files.

Star History

Star History Chart

nlg-metricverse's People

Contributors

andreazammarchi3 · disi-unibo-nlp · giacomofrisoni · marcoavagnano98 · valentinapieri

nlg-metricverse's Issues

Result Inconsistency with e2e-metrics.py

Describe the bug
The results are not the same as those from e2e-metrics.
(Or are there any hyperparameters that I missed?)

To Reproduce
Calculate score with same data

Expected behavior
The results should be the same.

Environment Information:

  • OS: Ubuntu 22.04.3 LTS
  • nlgmetricverse version: 0.9.9

Pip install on mac fails

Describe the bug
Pip installing fails on mac

To Reproduce
pip install nlg-metricverse

Exception Traceback (if available)
% pip install nlg-metricverse
Looking in indexes: https://pypi.org/simple, https://nlp.circleci:****@artifactory.ops.babylontech.co.uk/artifactory/api/pypi/babylon-pypi/simple
Collecting nlg-metricverse
Using cached nlg-metricverse-0.1.0.tar.gz (81 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

Γ— Getting requirements to build wheel did not run successfully.
β”‚ exit code: 1
╰─> [18 lines of output]
Traceback (most recent call last):
File "/Users/francesco.moramarco/gvenv/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 351, in
main()
File "/Users/francesco.moramarco/gvenv/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 333, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Users/francesco.moramarco/gvenv/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "/private/var/folders/jj/s730cphx5cq73f0xd964gn780000gp/T/pip-build-env-fl5ohxlq/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/private/var/folders/jj/s730cphx5cq73f0xd964gn780000gp/T/pip-build-env-fl5ohxlq/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
self.run_setup()
File "/private/var/folders/jj/s730cphx5cq73f0xd964gn780000gp/T/pip-build-env-fl5ohxlq/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 483, in run_setup
super(_BuildMetaLegacyBackend,
File "/private/var/folders/jj/s730cphx5cq73f0xd964gn780000gp/T/pip-build-env-fl5ohxlq/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 335, in run_setup
exec(code, locals())
File "", line 153, in
File "", line 71, in get_requirements
FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

Γ— Getting requirements to build wheel did not run successfully.
β”‚ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Environment Information:

  • OS: macOS Monterey (v12.6.1)
  • nlgmetricverse version: latest on pypi

AttributeError: 'DownloadConfig' object has no attribute 'storage_options'

Describe the bug
from nlgmetricverse import NLGMetricverse, load_metric
then the error message occurred:
AttributeError: 'DownloadConfig' object has no attribute 'storage_options'

To Reproduce
Just import the package: from nlgmetricverse import NLGMetricverse, load_metric

Expected behavior
No error message.

Environment Information:
Google Colab Default Environment

Moreover, this bug seems to be the same as obss/jury#125.
It is possible that the bug is caused by Datasets.
After running pip install datasets==2.9.0, the bug is temporarily fixed.

Calling scorer gives: TypeError: __call__() takes 1 positional argument but 3 were given

Describe the bug
When I try to follow the tutorial, I get the error TypeError: __call__() takes 1 positional argument but 3 were given.

To Reproduce

from nlgmetricverse import NLGMetricverse
scorer = NLGMetricverse(metrics=["bertscore"])
predictions = ["test"]
references = ["test"]
scores = scorer(predictions, references) # <-- error happens here

Expected behavior
This should calculate the bertscore.

Exception Traceback (if available)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[28], line 1
----> 1 scores = scorer(predictions, references)
      2 print(scores)

TypeError: __call__() takes 1 positional argument but 3 were given

Environment Information:

  • OS: (e.g. Ubuntu 20.04 LTS)
  • nlgmetricverse version: 0.9.6
  • evaluate version: 0.3.0
  • datasets version: 2.12.0

moverscore fails

Describe the bug
The MoverScore calculation seems to fail when trying to make use of the DistilBert tokenizer.

To Reproduce

>>> import nlgmetricverse
>>> scorer = nlgmetricverse.NLGMetricverse(metrics=['moverscore'])
>>> scorer(predictions=['foo'], references=['bar'])
# Expected score value

Exception Traceback (if available)

>>> import nlgmetricverse
2023-09-19 13:19:51.367801: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> scorer = nlgmetricverse.NLGMetricverse(metrics=['moverscore'])
/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/transformers/generation_utils.py:24: FutureWarning: Importing `GenerationMixin` from `src/transformers/generation_utils.py` is deprecated and will be removed in Transformers v5. Import as `from transformers import GenerationMixin` instead.
  warnings.warn(
/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/transformers/generation_tf_utils.py:24: FutureWarning: Importing `TFGenerationMixin` from `src/transformers/generation_tf_utils.py` is deprecated and will be removed in Transformers v5. Import as `from transformers import TFGenerationMixin` instead.
  warnings.warn(
loading configuration file config.json from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/config.json
Model config DistilBertConfig {
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "output_attentions": true,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.33.2",
  "vocab_size": 30522
}

loading file vocab.txt from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/vocab.txt
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/tokenizer_config.json
loading configuration file config.json from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/config.json
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.33.2",
  "vocab_size": 30522
}

loading weights file model.safetensors from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/model.safetensors
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of DistilBertModel were initialized from the model checkpoint at distilbert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertModel for predictions without further training.
>>> scorer(predictions=['foo'], references=['bar'])
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/mmior/.pyenv/versions/3.8.16/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/mmior/.pyenv/versions/3.8.16/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/moverscore_v2.py", line 30, in process
    a = ["[CLS]"]+truncate(tokenizer.tokenize(a))+["[SEP]"]
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/moverscore_v2.py", line 25, in truncate
    if len(tokens) > tokenizer.max_len - 2:
AttributeError: 'DistilBertTokenizer' object has no attribute 'max_len'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/core.py", line 86, in __call__
    score = self._compute_single_score(inputs)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/core.py", line 208, in _compute_single_score
    score = metric.compute(predictions=predictions, references=references, reduce_fn=reduce_fn, **kwargs)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/evaluate/module.py", line 444, in compute
    output = self._compute(**inputs, **compute_kwargs)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/metrics/_core/base.py", line 362, in _compute
    result = self.evaluate(predictions=predictions, references=references, reduce_fn=reduce_fn, **eval_params)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/metrics/_core/base.py", line 302, in evaluate
    return eval_fn(predictions=predictions, references=references, **kwargs)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/metrics/moverscore/moverscore_planet.py", line 132, in _compute_single_pred_single_ref
    idf_dict_ref = moverscore_v2.get_idf_dict(references)  # idf_dict_ref = defaultdict(lambda: 1.)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/moverscore_v2.py", line 42, in get_idf_dict
    idf_count.update(chain.from_iterable(p.map(process_partial, arr)))
  File "/home/mmior/.pyenv/versions/3.8.16/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/mmior/.pyenv/versions/3.8.16/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
AttributeError: 'DistilBertTokenizer' object has no attribute 'max_len'

Environment Information:

  • OS: Ubuntu 20.04.6 LTS
  • nlgmetricverse version: 0.9.9
  • evaluate version: 0.4.0
  • datasets version: 2.9.0
  • moverscore version: 1.0.3
