
Non-consistent results between corpus and single (nlg-eval, closed)

maluuba commented on May 18, 2024

Non-consistent results between corpus and single


Comments (3)

AmitMY commented on May 18, 2024

Thanks! I wasn't aware.

For future reference:
Instead of averaging sentence-level BLEU scores (i.e. macro-averaged precision), the original BLEU metric (Papineni et al. 2002) uses micro-averaged precision, i.e. it sums the numerators and denominators over all hypothesis-reference(s) pairs before dividing.
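To make the micro/macro distinction concrete, here is a minimal, unsmoothed BLEU sketch in Python. It assumes whitespace-tokenized input, n-grams up to order 4, and the closest reference length for the brevity penalty; it illustrates the aggregation described by Papineni et al. (2002) and is not the code nlg-eval itself runs. `corpus_bleu` sums clipped n-gram matches, n-gram totals, and lengths over all pairs before dividing, while `mean_sentence_bleu` scores each pair separately and averages the results.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[str], List[Sequence[str]]]  # (hypothesis tokens, reference token lists)


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every n-gram of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp: Sequence[str], refs: List[Sequence[str]], max_n: int = 4):
    """Return BLEU sufficient statistics for one hypothesis: clipped matches and
    hypothesis n-gram totals per order, the hypothesis length, and the reference
    length closest to it (ties go to the shorter reference)."""
    matches, totals = [], []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        # Clip each hypothesis n-gram by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        matches.append(sum(min(c, max_ref[g]) for g, c in hyp_counts.items()))
        totals.append(max(len(hyp) - n + 1, 0))
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    return matches, totals, len(hyp), ref_len


def bleu_from_stats(matches, totals, hyp_len, ref_len) -> float:
    """Geometric mean of n-gram precisions times the brevity penalty (no smoothing)."""
    if min(matches) == 0:
        return 0.0  # log(0) is undefined, so unsmoothed BLEU is 0 here
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / len(matches)
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def corpus_bleu(pairs: List[Pair], max_n: int = 4) -> float:
    """Micro-average: sum numerators, denominators and lengths over all pairs, then divide.
    The brevity penalty therefore uses corpus-total lengths, not per-sentence ones."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in pairs:
        m, t, h, r = sentence_stats(hyp, refs, max_n)
        matches = [a + b for a, b in zip(matches, m)]
        totals = [a + b for a, b in zip(totals, t)]
        hyp_len += h
        ref_len += r
    return bleu_from_stats(matches, totals, hyp_len, ref_len)


def mean_sentence_bleu(pairs: List[Pair], max_n: int = 4) -> float:
    """Macro-average: score each pair on its own, then average the scores."""
    scores = [bleu_from_stats(*sentence_stats(hyp, refs, max_n)) for hyp, refs in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pairs = [
        ("the cat sat on the mat".split(), ["the cat sat on the mat".split()]),
        ("a dog ran".split(), ["the dog ran away".split()]),
    ]
    print(round(corpus_bleu(pairs), 3))  # 0.791
    print(mean_sentence_bleu(pairs))     # 0.5
```

With one perfect pair and one short pair that has no matching trigram, the unsmoothed sentence scores are 1.0 and 0.0, so their mean is 0.5. The corpus score is about 0.791, because the second pair's missing trigram match is diluted by the first pair's matches instead of zeroing out a whole term.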


kracwarlock commented on May 18, 2024

This is expected behavior. BLEU and METEOR are calculated over the entire corpus, so the corpus score is not supposed to equal the average of the sentence scores. Searching online for "corpus BLEU" turns up several explanations.


kracwarlock commented on May 18, 2024

This is why the repository originally assumed that you would have all hypotheses and references in advance and run compute_metrics over them to report corpus-level scores, since that is what most papers report.
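The workflow described here, with every hypothesis and reference collected before scoring, can be sketched as writing parallel, line-aligned files. The helper `write_corpus_files` and the file names `hyp.txt`, `ref1.txt`, ... are hypothetical; the commented-out `compute_metrics(hypothesis=..., references=[...])` call follows the file-based usage shown in the nlg-eval README, so check it against the version you have installed.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple


def write_corpus_files(
    hypotheses: Sequence[str],
    references: Sequence[Sequence[str]],
    out_dir: str,
) -> Tuple[str, List[str]]:
    """Hypothetical helper: write hyp.txt plus ref1.txt, ref2.txt, ... so that
    line i of every file belongs to the same example.
    references[k][i] is the k-th reference for hypotheses[i]."""
    for ref_set in references:
        if len(ref_set) != len(hypotheses):
            raise ValueError("every reference set must have one line per hypothesis")
    for sent in list(hypotheses) + [r for ref_set in references for r in ref_set]:
        if "\n" in sent:
            # An embedded newline would shift every later line out of alignment.
            raise ValueError(f"sentence contains a newline: {sent!r}")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    hyp_path = out / "hyp.txt"
    hyp_path.write_text("\n".join(hypotheses) + "\n", encoding="utf-8")

    ref_paths = []
    for k, ref_set in enumerate(references, start=1):
        ref_path = out / f"ref{k}.txt"
        ref_path.write_text("\n".join(ref_set) + "\n", encoding="utf-8")
        ref_paths.append(str(ref_path))
    return str(hyp_path), ref_paths


if __name__ == "__main__":
    hyps = ["the cat sat on the mat", "a dog ran"]
    refs = [
        ["the cat sat on the mat", "the dog ran away"],  # reference set 1
        ["a cat sat on the mat", "a dog ran off"],       # reference set 2
    ]
    with tempfile.TemporaryDirectory() as tmp:
        hyp_file, ref_files = write_corpus_files(hyps, refs, tmp)
        print(hyp_file, ref_files)
        # Score the whole corpus once, e.g. with nlg-eval (assumed installed):
        # from nlgeval import compute_metrics
        # metrics = compute_metrics(hypothesis=hyp_file, references=ref_files)
```

Validating alignment up front matters because a corpus-level scorer pairs lines purely by position: one missing or extra line silently mismatches every example after it.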
