SEScore

Description

This repo explores several ways to improve the already strong SEScore evaluation metric.

Background

SEScore is a reference-based text-generation evaluation metric that requires no human-annotated error data. It is described in the paper "Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis".
Generally speaking, the paper describes a stratified dataset synthesis pipeline with three stages. First, sentences are corrupted via pre-defined methods. Second, each newly corrupted sentence receives a score that represents how "severe" the corruption was, determined via bi-directional entailment. Finally, a neural network is trained to predict the scores produced by the bi-directional entailment model.
While this method performs very well and has improved upon the state of the art, we believe there is room for improvement.
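
The corrupt-then-score loop from the pipeline description can be sketched roughly as follows. This is a minimal illustration, not the code in this repo. The function names, the `-1`/`-5` penalty values, the `0.9` threshold, and the `entail` callable (a stand-in for a real entailment model) are all assumptions made for the example. Only deletion and swapping are shown, because adding or replacing tokens would need a language model to propose new tokens.

```python
import random
from typing import Callable, List, Tuple

# Assumed penalty values for illustration (MQM-style minor/severe weights).
MINOR, SEVERE = -1, -5


def delete_token(tokens: List[str], rng: random.Random) -> List[str]:
    """Remove one randomly chosen token (no-op on very short inputs)."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def swap_tokens(tokens: List[str], rng: random.Random) -> List[str]:
    """Swap one randomly chosen pair of adjacent tokens."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def label_severity(p_forward: float, p_backward: float, threshold: float = 0.9) -> int:
    """Treat an edit as minor only if entailment holds in both directions."""
    return MINOR if min(p_forward, p_backward) >= threshold else SEVERE


def synthesize(
    reference: str,
    num_edits: int,
    entail: Callable[[str, str], float],
    seed: int = 0,
) -> Tuple[str, int]:
    """Apply `num_edits` corruptions in sequence and sum the per-edit penalties."""
    rng = random.Random(seed)
    current = reference.split()
    total = 0
    for _ in range(num_edits):
        edit = rng.choice([delete_token, swap_tokens])
        corrupted = edit(current, rng)
        before, after = " ".join(current), " ".join(corrupted)
        # Score the edit by checking entailment in both directions.
        total += label_severity(entail(before, after), entail(after, before))
        current = corrupted
    return " ".join(current), total
```

The resulting `(corrupted sentence, score)` pairs would then serve as training data for the regression model. In practice `entail` would wrap a pretrained NLI model rather than a toy function.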

Suggested improvements

The paper describes a stratified way to accumulate errors by corrupting sentences. Corruption is performed by adding, replacing, deleting, or swapping tokens in the original sentence. While this is effective, recent papers have shown more effective masking techniques that could help create more meaningful corruptions. Our first proposal is to use PMI masking instead of token masking when corrupting sentences.
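
To make the idea of PMI masking concrete, here is a minimal sketch that estimates pointwise mutual information (PMI) for adjacent token pairs from a small corpus and masks the highest-PMI pair as a single unit. It is an illustration under simplifying assumptions (whitespace tokenization, bigrams only, no smoothing). The function names and the `<mask>` placeholder are hypothetical and do not come from this repo.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_pmi(sentences: List[str]) -> Dict[Tuple[str, str], float]:
    """PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ) for each adjacent token pair."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (a, b): math.log((count / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), count in bigrams.items()
    }


def pick_span_to_mask(tokens: List[str], pmi: Dict[Tuple[str, str], float]) -> Tuple[int, int]:
    """Return (start, end) of the adjacent pair with the highest known PMI.

    Falls back to the first single token if no pair was seen in the corpus.
    """
    best, best_score = (0, 1), float("-inf")
    for i in range(len(tokens) - 1):
        score = pmi.get((tokens[i], tokens[i + 1]), float("-inf"))
        if score > best_score:
            best, best_score = (i, i + 2), score
    return best


def mask_span(tokens: List[str], span: Tuple[int, int], mask: str = "<mask>") -> List[str]:
    """Replace the whole span with a single mask token."""
    start, end = span
    return tokens[:start] + [mask] + tokens[end:]
```

For example, with a corpus in which "new" and "york" always occur together, `pick_span_to_mask` selects that pair, so the corruption acts on the whole collocation rather than on one of its tokens.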
The severity score used in the paper follows the MQM framework for assessing the severity of errors in text. Again, while this metric has been used in many papers, the cumulative nature of the suggested severity score makes it monotonic, so it may not accurately represent the changes happening in the newly corrupted sentence. Additionally, the metric is discrete, which could lead to a loss of information when attributing severity to an error.
We propose two changes to the severity score metric which will allow it to be non-monotonic and also continuous. For more details, please refer to the Research Proposal.pptx file.
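
The monotonicity issue can be illustrated with a small sketch. It assumes, for illustration only, that every edit contributes a fixed negative penalty (here `-1` for a minor edit) and that the sentence score is the running sum of those penalties. The helper names are hypothetical.

```python
from itertools import accumulate
from typing import List


def cumulative_scores(step_penalties: List[int]) -> List[int]:
    """Running sum of per-edit penalties, one score per corruption step."""
    return list(accumulate(step_penalties))


def is_non_increasing(scores: List[int]) -> bool:
    """True if the score never improves from one step to the next."""
    return all(later <= earlier for earlier, later in zip(scores, scores[1:]))


# Step 1 deletes a word (assumed penalty -1).
# Step 2 re-inserts the same word, restoring the original sentence (assumed penalty -1).
scores = cumulative_scores([-1, -1])
```

Here the final sentence is identical to the reference, yet its cumulative score is `-2`. Because every penalty is negative, the running sum can never recover, which is the behaviour a non-monotonic, continuous score is meant to avoid.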

Results

How to run?

Run new_xlm_mbart_data.py for English:

python3 new_xlm_mbart_data.py -num_var 10 -lang en_XX -src case_study_src -ref case_study_ref -save save_file_name -severity ['original','2_1','2_2'] -whole_words True
