
Comments (6)

kracwarlock commented on May 17, 2024

We have added this functionality now, and the instructions are in the updated README.md; the relevant excerpt is also pasted below for reference. NLGEval() will load all the models you require, and you can then call the evaluate method repeatedly. Since the models are no longer loaded on each call, this should be efficient for your use case. Let us know if there is any other issue. Thanks for bringing this use case to our notice!


Object-oriented API for repeated calls in a script

from nlgeval import NLGEval
nlgeval = NLGEval()  # loads the models
metrics_dict = nlgeval.evaluate(references, hypothesis)

where references is a list of ground truth reference text strings and
hypothesis is the hypothesis text string.
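
For the repeated-call use case, a minimal sketch might look like the following (the NLGEval and evaluate names are taken from the README excerpt above; the sample data is a made-up placeholder):

from nlgeval import NLGEval

nlgeval = NLGEval()  # the metric models are loaded once, here

samples = [  # placeholder (references, hypothesis) pairs
    (['the cat sat on the mat'], 'the cat is on the mat'),
    (['a dog runs in the park'], 'there is a dog in the park'),
]

for references, hypothesis in samples:
    metrics_dict = nlgeval.evaluate(references, hypothesis)  # no model reload per call
    print(metrics_dict)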


AmitMY commented on May 17, 2024

Thank you very much!
This will be very useful for my planned work.


kracwarlock commented on May 17, 2024

Hi.

Calling compute_individual_metrics is going to load the models each time, which will be very slow if your corpus is large. compute_metrics loads them only once, so it is much more efficient. Your approach of creating files up to the maximum number of references will be much faster in that case.

If you are generating sentences, the model loading time is less than the generation time, and you can run the two in parallel, then calling compute_individual_metrics might be better. It depends on your setup; it's hard to say without knowing more about it.
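
For reference, a sketch of the two entry points being compared (the call signatures follow my reading of the nlg-eval README; the file paths and example strings are placeholders):

from nlgeval import compute_metrics, compute_individual_metrics

# File-based, corpus-level scoring: everything is set up once for the whole run.
metrics_dict = compute_metrics(hypothesis='examples/hyp.txt',
                               references=['examples/ref1.txt', 'examples/ref2.txt'])

# Per-sentence scoring for a single (references, hypothesis) pair;
# the heavy models are reloaded on every call, which is the slow part.
metrics_dict = compute_individual_metrics(['this is one reference sentence'],
                                          'this is the hypothesis sentence')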


AmitMY commented on May 17, 2024

Thanks!
Is there a way to use compute_metrics on arrays instead of files? It's just easier to handle that way. (Plus, I wouldn't have to read and write a file every time; it's only a few MB, but over about 1000 epochs it adds up.)


friskit-china commented on May 17, 2024

Hi @kracwarlock,

I followed @AmitMY's method:

compute the metrics for every tuple of (hyp, refs) and, eventually, average them

However, the averaged scores are worse than evaluating on the entire hyp and ref lists at once (via the nlg-eval standalone command). Which one is correct?
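
For concreteness, the per-sample averaging described above could look like the sketch below (the evaluate name comes from the README excerpt earlier in the thread; all_refs and all_hyps are placeholder lists, and the averaging itself is my own illustration, not nlg-eval code):

from nlgeval import NLGEval

nlgeval = NLGEval()  # load the models once

# all_refs: a list of reference lists; all_hyps: a list of hypothesis strings
per_sample = [nlgeval.evaluate(refs, hyp) for refs, hyp in zip(all_refs, all_hyps)]

# arithmetic mean of each metric over the corpus
averaged = {metric: sum(d[metric] for d in per_sample) / len(per_sample)
            for metric in per_sample[0]}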


juharris commented on May 17, 2024

I think it's because for BLEU we're using option='closest':
https://github.com/Maluuba/nlg-eval/blob/master/nlgeval/pycocoevalcap/bleu/bleu.py#L40

Sorry, I don't know which one is "correct".
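
A likely contributor, independent of that option: corpus-level BLEU pools n-gram counts and the brevity penalty over the whole corpus before computing precisions, so it is generally not equal to the arithmetic mean of per-sentence scores. A small illustration using NLTK (not nlg-eval's pycocoevalcap code, just a demonstration of the effect; the sentences are made up):

from nltk.translate.bleu_score import sentence_bleu, corpus_bleu, SmoothingFunction

refs = [[['the', 'cat', 'sat', 'on', 'the', 'mat']],
        [['there', 'is', 'a', 'dog', 'in', 'the', 'park']]]
hyps = [['the', 'cat', 'is', 'on', 'the', 'mat'],
        ['a', 'dog', 'runs', 'in', 'the', 'park']]

smooth = SmoothingFunction().method1
mean_of_sentence_bleu = sum(sentence_bleu(r, h, smoothing_function=smooth)
                            for r, h in zip(refs, hyps)) / len(hyps)
corpus_level_bleu = corpus_bleu(refs, hyps, smoothing_function=smooth)
print(mean_of_sentence_bleu, corpus_level_bleu)  # the two values generally differ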

