
Comments (6)

kracwarlock commented on May 17, 2024

We have added this functionality now, and the instructions are in the updated README.md; the relevant excerpt is also pasted below for reference. NLGEval() will load all the models you require, and you can then call the evaluate method repeatedly. Since the models are no longer loaded on each call, this should be efficient for your use case. Let us know if there is any other issue. Thanks for bringing this use case to our notice!


Object-oriented API for repeated calls in a script

from nlgeval import NLGEval
nlgeval = NLGEval()  # loads the models
metrics_dict = nlgeval.evaluate(references, hypothesis)

where references is a list of ground truth reference text strings and
hypothesis is the hypothesis text string.
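
For the repeated-call use case, a minimal sketch might look like the following (the NLGEval and evaluate names are taken from the README excerpt above; the sample data is a made-up placeholder):

from nlgeval import NLGEval

nlgeval = NLGEval()  # the metric models are loaded once, here

samples = [  # placeholder (references, hypothesis) pairs
    (['the cat sat on the mat'], 'the cat is on the mat'),
    (['a dog runs in the park'], 'there is a dog in the park'),
]

for references, hypothesis in samples:
    metrics_dict = nlgeval.evaluate(references, hypothesis)  # no model reload per call
    print(metrics_dict)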


AmitMY commented on May 17, 2024

Thank you very much!
This will be very useful for my planned work.


kracwarlock commented on May 17, 2024

Hi.

Calling compute_individual_metrics is going to load the models each time, which will be very slow if your corpus is large. compute_metrics loads them only once, so it is much more efficient. Your approach of creating files up to the maximum number of references will be much faster in that case.

If you are generating sentences, the model loading time is less than the generation time, and you can run the two in parallel, then calling compute_individual_metrics might be better. It depends on your setup; it's hard to say without knowing more about it.
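
For reference, a sketch of the two entry points being compared (the call signatures follow my reading of the nlg-eval README; the file paths and example strings are placeholders):

from nlgeval import compute_metrics, compute_individual_metrics

# File-based, corpus-level scoring: everything is set up once for the whole run.
metrics_dict = compute_metrics(hypothesis='examples/hyp.txt',
                               references=['examples/ref1.txt', 'examples/ref2.txt'])

# Per-sentence scoring for a single (references, hypothesis) pair;
# the heavy models are reloaded on every call, which is the slow part.
metrics_dict = compute_individual_metrics(['this is one reference sentence'],
                                          'this is the hypothesis sentence')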


AmitMY commented on May 17, 2024

Thanks!
Is there a way to use compute_metrics on arrays instead of files? It's just easier to handle that way. (Plus, I wouldn't have to read and write a file every time; it's only a few MB, but over about 1000 epochs it adds up.)


friskit-china commented on May 17, 2024

Hi @kracwarlock,

I followed @AmitMY's method:

compute the metrics for every tuple of (hyp, refs) and, eventually, average them

However, the averaged scores are worse than evaluating on the entire hyp and ref lists at once (via the nlg-eval standalone command). Which one is correct?
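
For concreteness, the per-sample averaging described above could look like the sketch below (the evaluate name comes from the README excerpt earlier in the thread; all_refs and all_hyps are placeholder lists, and the averaging itself is my own illustration, not nlg-eval code):

from nlgeval import NLGEval

nlgeval = NLGEval()  # load the models once

# all_refs: a list of reference lists; all_hyps: a list of hypothesis strings
per_sample = [nlgeval.evaluate(refs, hyp) for refs, hyp in zip(all_refs, all_hyps)]

# arithmetic mean of each metric over the corpus
averaged = {metric: sum(d[metric] for d in per_sample) / len(per_sample)
            for metric in per_sample[0]}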


juharris commented on May 17, 2024

I think it's because for BLEU we're using option='closest':
https://github.com/Maluuba/nlg-eval/blob/master/nlgeval/pycocoevalcap/bleu/bleu.py#L40

Sorry, I don't know which one is "correct".
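
A likely contributor, independent of that option: corpus-level BLEU pools n-gram counts and the brevity penalty over the whole corpus before computing precisions, so it is generally not equal to the arithmetic mean of per-sentence scores. A small illustration using NLTK (not nlg-eval's pycocoevalcap code, just a demonstration of the effect; the sentences are made up):

from nltk.translate.bleu_score import sentence_bleu, corpus_bleu, SmoothingFunction

refs = [[['the', 'cat', 'sat', 'on', 'the', 'mat']],
        [['there', 'is', 'a', 'dog', 'in', 'the', 'park']]]
hyps = [['the', 'cat', 'is', 'on', 'the', 'mat'],
        ['a', 'dog', 'runs', 'in', 'the', 'park']]

smooth = SmoothingFunction().method1
mean_of_sentence_bleu = sum(sentence_bleu(r, h, smoothing_function=smooth)
                            for r, h in zip(refs, hyps)) / len(hyps)
corpus_level_bleu = corpus_bleu(refs, hyps, smoothing_function=smooth)
print(mean_of_sentence_bleu, corpus_level_bleu)  # the two values generally differ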

