
conlleval-python

NOTE: This repository is not actively maintained. Please check the Issues page for known bug reports; I will try to fix them later. Sorry for any inconvenience.

Intro

This repository contains two scripts:

  • conlleval_perl.py: the Python equivalent of the Perl script conlleval, which can be used for measuring the performance of a system that has processed the CoNLL-2000 shared task data.

  • conlleval.py: a slight modification of the above script so that it can be imported and used elsewhere. You will find from conlleval import evaluate useful.

For more information on the original Perl script, see http://www.cnts.ua.ac.be/conll2000/chunking/output.html.

Usage

Both scripts can be run from the command line to evaluate a file in the supported format (see Data format below).

Read from output.txt and print the results to the console:

  python conlleval.py < output.txt

or save the results in result.txt:

  python conlleval.py < output.txt > result.txt

And the result is:

   processed 961 tokens with 459 phrases; found: 539 phrases; correct: 371.
   accuracy:  84.08%; precision:  68.83%; recall:  80.83%; FB1:  74.35
                ADJP: precision:   0.00%; recall:   0.00%; FB1:   0.00
                ADVP: precision:  45.45%; recall:  62.50%; FB1:  52.63
                  NP: precision:  64.98%; recall:  78.63%; FB1:  71.16
                  PP: precision:  83.18%; recall:  98.89%; FB1:  90.36
                SBAR: precision:  66.67%; recall:  33.33%; FB1:  44.44
                  VP: precision:  69.00%; recall:  79.31%; FB1:  73.80

Options for conlleval_perl.py (the same as the original Perl script):

  • -l: Generate output as part of a LaTeX table. The definition of the table can be found in an example document on the original site (provided there in LaTeX, PS, and PDF formats).

  • -d char: On each line, use this character rather than whitespace (or tab) as the delimiter between tokens.

  • -r: Assume raw output tokens, that is, without the B- and I- prefixes. In this case each word is counted as one chunk.

  • -o token: Use token as output tag for items that are outside of chunks or other classes. This option only works when -r is used as well. The default value for the outside output tag is O.
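
For example, to evaluate a file of raw tags (no B-/I- prefixes) that uses OUT as the outside tag, the invocation would look like this (OUT and the file name are placeholders, not values from this repository):

  python conlleval_perl.py -r -o OUT < output.txt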

Usage for conlleval.py:

from conlleval import evaluate

# print out the table as above
evaluate(true_tags, pred_tags, verbose=True) 

# calculate overall metrics
prec, rec, f1 = evaluate(true_tags, pred_tags, verbose=False)
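
The tags are plain IOB2/IOBES strings in flat lists, one tag per token. A minimal sketch (the tag values here are invented for illustration):

from conlleval import evaluate

# flat lists of IOB2 tags, one tag per token
true_tags = ['B-NP', 'I-NP', 'O', 'B-VP', 'I-VP']
pred_tags = ['B-NP', 'B-NP', 'O', 'B-VP', 'I-VP']

prec, rec, f1 = evaluate(true_tags, pred_tags, verbose=False)
print(prec, rec, f1)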

Data format

NOTE: This script can be used with the IOB2 or IOBES tagging scheme. If you are using a different scheme, please convert it to IOB2 or IOBES first.

EDIT: There have been reports that IOB2 support is broken (see the Issues page). This has not been fixed yet.

For an example of data format to be used with this script, check out the accompanied output.txt file in this repository, or the original source at http://www.cnts.ua.ac.be/conll2000/chunking/output.txt.gz.

Sentences are separated by empty lines. Each line contains at least three columns, separated by whitespace (or the character specified with -d). The second-to-last column is the chunk tag according to the corpus, and the last column is the predicted chunk tag. All other columns are ignored in evaluation.

Example:

   Boeing NNP B-NP I-NP
   's POS B-NP B-NP
   747 CD I-NP I-NP
   jetliners NNS I-NP I-NP
   . . O O
   
   Rockwell NNP B-NP I-NP
   said VBD B-VP B-VP
   the DT B-NP B-NP
   agreement NN I-NP I-NP


conlleval's Issues

Is this evaluation a strict or a relaxed metric?

I have run the script on the following data:

(978) B-Phone I-Phone
934-3623 I-Phone I-Phone

In IOB2 mode, every entity should start with a B- tag, so the precision on the data above should be 0. However, this script reports 100% precision, and the original Perl conlleval script gives the same result.
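
For reference, the case above can be reproduced through the library interface (a sketch built from the two tag columns of that data):

from conlleval import evaluate

true_tags = ['B-Phone', 'I-Phone']  # gold column
pred_tags = ['I-Phone', 'I-Phone']  # predicted column
evaluate(true_tags, pred_tags, verbose=True)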

ValueError: not enough values to unpack (expected 2, got 1)

Getting this error when using arbitrary tags that do not follow the B-/I- naming scheme.

count_chunks(true_seqs, pred_seqs)
129
130 _, true_type = split_tag(true_tag)
--> 131 _, pred_type = split_tag(pred_tag)
132
133 if correct_chunk is not None:

ValueError: not enough values to unpack (expected 2, got 1)

I believe this happens if tags don't have a "-" in them and aren't "O".
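
A possible workaround until this is fixed (an assumption, not part of the repository): rewrite any tag that is neither "O" nor of the PREFIX-TYPE form into a B- tag before evaluating, so that split_tag always finds a dash:

from conlleval import evaluate

def normalize(tag):
    # treat bare tags (no '-', not 'O') as chunk-initial B- tags;
    # this is one possible convention, not the script's own behavior
    if tag == 'O' or '-' in tag:
        return tag
    return 'B-' + tag

# hypothetical bare tags that would otherwise crash split_tag
true_tags = ['PER', 'O', 'LOC']
pred_tags = ['PER', 'O', 'O']
evaluate([normalize(t) for t in true_tags],
         [normalize(t) for t in pred_tags])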

OSError: conlleval: unexpected number of features

My output.txt file is in the following format:
Sao NC B-LOC B-LOC
Paulo VMI I-LOC I-LOC
( Fpa O O
Brasil NC B-LOC B-LOC
) Fpt O O
, Fc O O
23 Z O O
may NC O O
( Fpa O O
EFECOM NP B-ORG B-ORG
) Fpt O O
. Fp O O

This should be okay according to the CoNLL format, but after running conlleval.py I am getting this error:
[screenshot of the error omitted]
Do you have any idea why? @sighsmile
Thanks in advance!

About evaluate(true_tags, pred_tags, verbose=True)

Should true_tags and pred_tags be two-dimensional nested lists, i.e. [[...], [...], [...]], with one inner list per sequence?

# This raises an error
from conlleval import evaluate
true_tags =[ ['O', 'B-Part', 'I-Part'],[ 'O', 'O', 'O']]
pred_tags = [ ['O', 'B-Part', 'I-Part'],[ 'O', 'O', 'O']]
evaluate(true_tags,pred_tags)

# This does not raise an error
from conlleval import evaluate
true_tags =[ 'O', 'B-Part', 'I-Part', 'O', 'O', 'O']
pred_tags = [ 'O', 'B-Part', 'I-Part', 'O', 'O', 'O']
evaluate(true_tags,pred_tags)

Does this mean that only one sequence can be evaluated at a time?
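
Judging from the two snippets above, evaluate expects flat lists. One possible workaround (an assumption, not part of the repository) is to flatten the nested lists yourself, inserting an 'O' between sentences so that chunks cannot run across sentence boundaries:

from conlleval import evaluate

def flatten(seqs):
    flat = []
    for seq in seqs:
        flat.extend(seq)
        flat.append('O')  # boundary tag so chunks cannot span sentences
    return flat

true_tags = [['O', 'B-Part', 'I-Part'], ['O', 'O', 'O']]
pred_tags = [['O', 'B-Part', 'I-Part'], ['O', 'O', 'O']]
evaluate(flatten(true_tags), flatten(pred_tags))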

IOB2 evaluation

IOB2 is supported as input, as the README says, but the evaluation is not done the IOB2 way; it follows IOB1. For example, under IOB1 the sequence O I-LOC I-LOC legitimately starts a LOC chunk at the first I-LOC, while under IOB2 a chunk must begin with B-LOC, so that sequence is malformed. So you can either provide an option for people to choose between IOB1 and IOB2 evaluation, or drop the claim of IOB2 support.

Why are the results different between conlleval.py and conlleval_perl.py?

My output file looks like this:

a  B-LOC  B-LOC
b  I-LOC  E-LOC
c  E-LOC  S-LOC

The results differ. For conlleval.py:

processed 3 tokens with 1 phrases; found: 2 phrases; correct: 0.
accuracy:  33.33%; (non-O)
accuracy:  33.33%; precision:   0.00%; recall:   0.00%; FB1:   0.00
              LOC: precision:   0.00%; recall:   0.00%; FB1:   0.00  2

and for conlleval_perl.py:

processed 3 tokens with 1 phrases; found: 1 phrases; correct: 1.
accuracy:  33.33%; precision: 100.00%; recall: 100.00%; FB1: 100.00
              LOC: precision: 100.00%; recall: 100.00%; FB1: 100.00  1

Which one is reliable?
