bratutils's People

Contributors

hugosousa, jeanphilippegoldman, savkov

bratutils's Issues

Discontinuous Annotations

I would like to use this in a corpus annotation project that uses discontinuous annotations, but I receive the following error.

Traceback (most recent call last):
File "vso-inter-annotator.py", line 5, in
doc = a.DocumentCollection('data/BoireAnnotations/VSO_Hypertension1/')
File "build/bdist.macosx-10.10-intel/egg/bratutils/agreement.py", line 834, in init
File "build/bdist.macosx-10.10-intel/egg/bratutils/agreement.py", line 654, in init
File "build/bdist.macosx-10.10-intel/egg/bratutils/agreement.py", line 292, in init
File "build/bdist.macosx-10.10-intel/egg/bratutils/agreement.py", line 301, in _parse_annotation
ValueError: invalid literal for int() with base 10: '6419;6435'

vso-inter-annotator.py contains the following:
#VSO inter-rater agreement using BRAT utils

from bratutils import agreement as a

doc = a.DocumentCollection('data/BoireAnnotations/VSO_Hypertension1/')
doc2 = a.DocumentCollection('data/HerringAnnotations/VSO_Hypertension1/')

doc.make_gold()
statistics = doc2.compare_to_gold(doc)

print statistics

Here is the annotation file that is causing the error.

T1 VSO_0000005 3395 3407 182/107 mmHg
T2 VSO_0000005 4300 4312 200/100 mmHg
T3 VSO_0000008 4254 4260 36.8°C
T4 VSO_0000005 6518 6529 160/80 mmHg
T5 VSO_0000005 15833 15844 170/80 mmHg
T6 VSO_0000038 16385 16408 Systolic blood pressure
T7 VSO_0000005 16438 16446 200 mmHg
T8 VSO_0000005 16867 16878 135/95 mmHg
T9 VSO_0000005 16959 16971 160/100 mmHg
T10 VSO_0000005 17659 17671 220/120 mmHg
T11 VSO_0000005 18143 18154 135/95 mmHg
T12 VSO_0000004 3370 3384 blood pressure
T13 VSO_0000007 4239 4250 temperature
T14 VSO_0000004 4282 4296 blood pressure
T15 VSO_0000004 6486 6500 Blood pressure
T16 VSO_0000004 15802 15816 Blood pressure
T17 VSO_0000004 16826 16840 Blood pressure
T18 VSO_0000004 16941 16955 blood pressure
T19 VSO_0000004 17624 17638 Blood pressure
T20 GO_0008217 17713 17738 Blood pressure normalized
T21 VSO_0000004 18125 18139 blood pressure
T23 VSO_0000030 4341 4360 63 beats per minute
T24 GO_0008217 6405 6419;6435 6442 blood pressure control
T31 GO_0008217 16046 16060;16072 16079 blood pressure control
T33 VSO_0000006 16826 16844;16855 16863 Blood pressure was measured
T34 GO_0008217 17015 17029;17041 17048 blood pressure control
T38 VSO_0000029 4314 4324 Heart rate
T39 VSO_0000004 6147 6161 blood pressure
T41 GO_0008217 6486 6514 Blood pressure was decreased
T43 VSO_0000006 15802 15829 Blood pressure was measured
T22 VSO_0000004 6405 6419 blood pressure
T25 VSO_0000004 16046 16060 blood pressure
T26 VSO_0000004 17015 17029 blood pressure
T27 VSO_0000004 17713 17727 Blood pressure
T28 VSO_0000004 18517 18531 blood pressure

Thank you
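
The parser fails on T24, T31, T33, and T34 because they are discontinuous annotations: brat separates the fragment offsets with a semicolon (e.g. 6405 6419;6435 6442), while the parser expects a single start and end offset and calls int() on the second field. As a stopgap until discontinuous spans are supported, the files could be preprocessed so that each discontinuous annotation is collapsed into one covering span. The sketch below is untested and the function names are illustrative; note that the surface-text column is left unchanged, so it will no longer match the widened span exactly.

import re

def collapse_discontinuous(line):
    # Rewrite a text-bound (T) line with fragmented offsets, e.g.
    # "T24<TAB>GO_0008217 6405 6419;6435 6442<TAB>blood pressure control",
    # into a single covering span: "GO_0008217 6405 6442".
    if not line.startswith('T'):
        return line
    parts = line.rstrip('\n').split('\t')
    if len(parts) < 2 or ';' not in parts[1]:
        return line
    fields = parts[1].split(' ')
    label, offsets = fields[0], ' '.join(fields[1:])
    numbers = [int(n) for n in re.split('[ ;]', offsets)]
    parts[1] = '%s %d %d' % (label, min(numbers), max(numbers))
    return '\t'.join(parts) + '\n'

def collapse_file(src_path, dst_path):
    # Write a copy of an .ann file with all discontinuous spans collapsed.
    with open(src_path) as fin, open(dst_path, 'w') as fout:
        for line in fin:
            fout.write(collapse_discontinuous(line))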

Incorrect number of spurious tags when there is no overlap

Hello again,

I have two .ann files.

The gold set:

T1  Medical-Concept 36 41   tumor
T2  Medical-Concept 327 351 síndrome mielodisplásica
T3  Medical-Concept 440 445 tumor
T4  Medical-Concept 22 32   morfologia
T5  Medical-Concept 79 117  Nomenclatura Sistematizada de Medicina
T6  Medical-Concept 120 126 SNOMED
T7  Medical-Concept 189 204 Linfoma maligno
T8  Medical-Concept 207 216 folicular
T9  Medical-Concept 220 227 nodular
T10 Medical-Concept 270 310 Anemia refratária com excesso de blastos
T11 Medical-Concept 356 366 deleção 5q
T12 Medical-Concept 368 371 5q-

And the candidate set:

T1  Medical-Concept 270 287 Anemia refratária
T2  Medical-Concept 327 335 Síndrome
T3  Medical-Concept 471 476 seção

For the comparison, I'm running the following code:

from bratutils import agreement as a


__author__ = 'Aleksandar Savkov'

doc = '3711'
gold = a.Document('../res/ht_gold/' + doc + '.ann')
extension = a.Document('../res/ht_extension/' + doc + '.ann')

gold.make_gold()
statistics = extension.compare_to_gold(gold)

print statistics

This should produce the result: 0 correct, 12 missing, and 3 spurious tags. Right?

The produced result is 3 missing tags and 0 correct/partial/spurious. I think the spurious tags are not being handled correctly.

Is my thinking right, or is this actually the desired output?

Hugo
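
For what it's worth, the expected figures can be checked with a small stand-alone script that counts an annotation as correct only on an exact (type, start, end) match. This deliberately ignores bratutils' partial-match category and is not its internal scoring logic, just a sanity check of the numbers above.

# Exact-match sanity check of the counts discussed above.
gold = {('Medical-Concept', s, e) for s, e in [
    (36, 41), (327, 351), (440, 445), (22, 32), (79, 117), (120, 126),
    (189, 204), (207, 216), (220, 227), (270, 310), (356, 366), (368, 371)]}
candidate = {('Medical-Concept', s, e) for s, e in [
    (270, 287), (327, 335), (471, 476)]}

print(len(gold & candidate), len(gold - candidate), len(candidate - gold))
# -> 0 12 3  (correct, missing, spurious under strict matching)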

Relations and attributes crash the parsing function

As reported in #14, relations and attributes crash the parsing function. This should be easy to fix, as the problem seems to be that the parsing is not generic enough. It also looks like a good place to start when adding support for relations and attributes to the agreement calculation.

Attribute support

I have never used attribute annotations and am not sure which tasks they are part of. It would be nice if someone took the lead in designing this; I would be willing to help integrate it into the project.

Relations not supported

Hi,

I was very happy to find this code; I was looking for something to compare brat annotations across files.
Did you ever look into implementing relations?

Relations support

Support for relations has long been asked for, but I've been reluctant to implement it because the code is not my best and I don't want to go back into the heavy logic. However, I just worked on getting the parsing function to handle all types gracefully, and it looks like relations can be implemented in a way that is self-contained and probably quite straightforward. So I'll lay out what I want to do here and ask for feedback.


Relations are effectively triples of two arguments and a relation type. Assuming that the possible arguments are predetermined, e.g. arguments can only be tokens, chunks, or some other pre-annotated spans, evaluating the agreement is really quite easy: an F1-score where each triple is treated as a unique annotation. I can probably copy a lot of the code straight from bioeval.
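
For illustration, treating each (type, arg1, arg2) triple as a unique annotation and scoring with an F1-measure could look roughly like the sketch below; the names are illustrative and this is not existing bratutils API.

def relation_f1(gold_triples, candidate_triples):
    # Each relation is reduced to a hashable (type, arg1_span, arg2_span)
    # triple; two annotators agree on a relation iff the triples match.
    gold, cand = set(gold_triples), set(candidate_triples)
    tp = len(gold & cand)
    precision = tp / float(len(cand)) if cand else 0.0
    recall = tp / float(len(gold)) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [('refers_to', (34, 43), (24, 31)), ('refers_to', (55, 66), (46, 54))]
cand = [('refers_to', (34, 43), (24, 31))]
print(relation_f1(gold, cand))  # 0.67: precision 1.0, recall 0.5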

I haven't thought about this for very long, but using an F1-score seems to be a bit of a cop-out here: the probability of a random assignment of a relation is not vanishingly small, so maybe kappa could be implemented instead.
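
The chance correction itself is cheap once the per-item relation decisions are aligned; the hard part, as the next paragraph notes, is defining the item set. For reference, Cohen's kappa is just:

def kappa(observed_agreement, chance_agreement):
    # (p_o - p_e) / (1 - p_e): 1.0 is perfect agreement,
    # 0.0 is agreement no better than chance.
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)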

Additionally, in many cases the arguments are not predetermined, which would be quite hard to evaluate at the same time, and honestly I have no idea how to do that at the moment.

So I'm looking for some input here. It would be nice to hear what you think.

cc @jeanphilippegoldman @soluna1

Apply to attributes and relations too

Hi Savkov,

Knowing about this tool earlier would have saved me a lot of time. I used the NLTK package to measure the IAA of brat annotation files, and it was a bit of a nightmare to convert the .ann files into something readable. So I think this tool is very useful, and the code is great, congratulations!

Our problem is that we have data structured in this way:

T1 Food 24 31 bacalao
T2 Restaurant 0 8 Un sitio
T3 Restaurant 46 54 Un lugar
T5 Restaurant 55 66 con encanto
A3 Polarity T5 POS
A4 Restaurant_Aspects T5 General_experience
R2 refers_to Arg1:T5 Arg2:T3
T4 Food 34 43 riquísimo
A1 Polarity T4 POS
A2 Food_Aspects T4 General_experience
R1 refers_to Arg1:T4 Arg2:T1

We want to measure agreement for all three categories: entities (e.g. Food for "bacalao"), attributes (e.g. the aspect General_experience and the polarity POS for "con encanto"), and relations (R1 refers_to ...). Are you planning to implement these options too? It would be really useful for aspect-based sentiment analysis annotation.

Many thanks
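
Until such support exists, the attribute (A) and relation (R) lines in the sample above are at least easy to collect alongside the entities, since brat's standoff lines are tab-separated between the id, the annotation body, and the surface text. The sketch below only parses them and does not score agreement; the path is a placeholder, and contiguous spans are assumed. For agreement, the T ids would still have to be resolved to spans, because ids generally differ between annotators.

entities, attributes, relations = {}, [], []
with open('annotator1.ann') as f:                      # placeholder path
    for line in f:
        ann_id, rest = line.rstrip('\n').split('\t', 1)
        body = rest.split('\t')[0].split(' ')          # drop surface text on T lines
        if ann_id.startswith('T'):                     # T1  Food 24 31   bacalao
            entities[ann_id] = (body[0], int(body[1]), int(body[2]))
        elif ann_id.startswith('A'):                   # A3  Polarity T5 POS
            attributes.append((body[0], body[1], body[2] if len(body) > 2 else True))
        elif ann_id.startswith('R'):                   # R2  refers_to Arg1:T5 Arg2:T3
            relations.append((body[0], body[1].split(':')[1], body[2].split(':')[1]))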

Document instance has no attribute 'postag_list'

Hello. I need to compare automatic annotations performed by a software application with manual annotations (in brat standoff format), and this seems to be a nice tool to use.

While testing it and trying to understand the source code, I tried the following small sample code

import agreement as a

doc = a.Document("myfile.ann")
doc2 = a.Document("myfile.ann")

doc.make_gold()
statistics = doc2.compare_to_gold(doc)

However, when the compare_to_gold function is executed, it says that the Document instance has no attribute 'postag_list', which is true, but I don't understand where this attribute is supposed to come from either.

Am I missing something? Could you perhaps post a small working example for comparing two .ann files? I'd appreciate that.

Thanks.
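
For reference, the directory-based usage shown in the other issues of this thread looks like the sketch below; whether it avoids the postag_list error will depend on the bratutils version, so treat it only as a starting point. The directory names are placeholders, one directory of .ann files per annotator.

from bratutils import agreement as a

# Each directory holds one annotator's .ann files with matching file names.
ann_a = a.DocumentCollection('annotator_a/')
ann_b = a.DocumentCollection('annotator_b/')

ann_a.make_gold()
statistics = ann_b.compare_to_gold(ann_a)

print(statistics)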

Relations still not supported

I'm guessing relation support never got added, as I still receive errors. Has anyone come up with a simple fix to ignore relations so that it still runs?
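
One blunt workaround is to make copies of the .ann files that keep only the text-bound T lines, dropping the relation and attribute lines the parser chokes on, and then point bratutils at the copies. A rough sketch, with placeholder paths:

import glob
import os

src_dir, dst_dir = 'annotations/', 'annotations_entities_only/'
os.makedirs(dst_dir, exist_ok=True)
for path in glob.glob(os.path.join(src_dir, '*.ann')):
    with open(path) as fin:
        # Keep only text-bound annotations (T lines).
        kept = [line for line in fin if line.startswith('T')]
    with open(os.path.join(dst_dir, os.path.basename(path)), 'w') as fout:
        fout.writelines(kept)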
