
Comments (11)

lfoppiano commented on June 11, 2024

So the issue is caused by the acronyms: when onlyNER is selected, the NER process is executed and socialiste-revolutionaire is not found by the NER engine.

In the following step the acronyms are injected in the list:

// inject explicit acronyms
entities = ProcessText.acronymCandidates(nerdQuery, entities);

They indeed have no type, as they are not NER entities.

@kermitt2 does it make sense to have the acronyms mixed with the NER results when the query is executed with onlyNER: true?
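To make the question concrete, here is a minimal sketch of one possible behaviour: when onlyNER is true, mentions without a NER type (such as injected acronym candidates) are dropped. The `Mention` class, its fields, the `apply` helper and the "PERSON" label are hypothetical stand-ins for illustration, not entity-fishing's actual classes or API.

```java
import java.util.ArrayList;
import java.util.List;

final class OnlyNerFilter {

    // Hypothetical stand-in for a mention; not the project's real entity class.
    static final class Mention {
        final String rawName;
        final String nerType; // null when the mention did not come from the NER engine

        Mention(String rawName, String nerType) {
            this.rawName = rawName;
            this.nerType = nerType;
        }
    }

    // When onlyNER is requested, keep only mentions that carry a NER type.
    static List<Mention> apply(List<Mention> mentions, boolean onlyNER) {
        if (!onlyNER) {
            return mentions;
        }
        List<Mention> typed = new ArrayList<>();
        for (Mention m : mentions) {
            if (m.nerType != null) {
                typed.add(m);
            }
        }
        return typed;
    }

    public static void main(String[] args) {
        // Illustrative data only: one typed mention, one untyped acronym candidate.
        List<Mention> mentions = new ArrayList<>();
        mentions.add(new Mention("Paul von Hindenburg", "PERSON"));
        mentions.add(new Mention("socialiste-revolutionaire", null));

        for (Mention m : apply(mentions, true)) {
            System.out.println(m.rawName + " [" + m.nerType + "]");
        }
    }
}
```

The alternative would be to keep the untyped mentions and flag them explicitly, so the client can tell them apart.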

from entity-fishing.

kermitt2 commented on June 11, 2024

I will check this one, because in branch 0.0.3 I've modified the mention and acronym handling and removed the onlyNER option.
(Normally, acronyms are no longer mixed with NER mentions.)

lfoppiano commented on June 11, 2024

I'm checking this. From what I understand, acronyms are always processed independently of the mention recognition.

Shouldn't they be separated from the mention recognition? Or at least be flagged as acronyms in the output?

lfoppiano commented on June 11, 2024

What about also bringing back a flag that disables the disambiguation and provides only mentions?

kermitt2 commented on June 11, 2024

I don't understand the question on acronyms... but we could flag entities that are recognized as acronyms. The issue is when an acronym is not introduced in the input text: this is not information we can get from Wikidata/Wikipedia alone.

Providing only mentions: this tool focuses on entity disambiguation. I would say that if users are only interested in mentions, they could just use the external modules for this (grobid-ner, etc.). The other problem is that users generally don't want just mentions; they also want entity classes (person, species, astronomical object, etc.), which here ultimately rely on Wikidata, and so on disambiguation...

lfoppiano commented on June 11, 2024

My question about acronyms, based on the example, is whether flagging the entities recognised as acronyms could be a way to provide some more information to the client.
What do you mean by an acronym not introduced in the input text? Something like recognising Do It Yourself as DIY in "[...] blablabla Do It Yourself blablabla [...]"?

Regarding the mentions, users can indeed use other tools, but grobid-ner, for example, doesn't have any API; entity-fishing was supposed to be one (see kermitt2/grobid-ner#56 (comment)).

When I say mention, I mean mention plus attributes, depending on the method used: for example, you could recognise mentions from NER together with their NE class, and mentions from Wikipedia without any NE class.
Actually, the option could simply be disambiguate: false to disable it.

lfoppiano commented on June 11, 2024

Regarding the onlyNER option: for backward compatibility, I think it is better practice to add the option back to the API but mark it as deprecated, to be removed in the following release.
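As a rough illustration of that deprecation path, the sketch below keeps an onlyNER setter marked `@Deprecated` and maps it onto a disambiguate flag. The `QueryOptions` class and its method names are assumptions made up for this example; they are not the actual entity-fishing query class.

```java
final class QueryOptions {

    // Hypothetical option fields; the defaults are assumptions for this sketch.
    private boolean onlyNER = false;
    private boolean disambiguate = true;

    boolean isOnlyNER() {
        return onlyNER;
    }

    boolean isDisambiguate() {
        return disambiguate;
    }

    void setDisambiguate(boolean disambiguate) {
        this.disambiguate = disambiguate;
    }

    /**
     * @deprecated kept for backward compatibility only; planned for removal
     *             in a later release.
     */
    @Deprecated
    void setOnlyNER(boolean onlyNER) {
        this.onlyNER = onlyNER;
        if (onlyNER) {
            // Requesting only NER mentions also switches disambiguation off.
            this.disambiguate = false;
        }
    }

    public static void main(String[] args) {
        QueryOptions legacy = new QueryOptions();
        legacy.setOnlyNER(true);
        System.out.println("disambiguate = " + legacy.isDisambiguate());
    }
}
```

Existing clients that still send onlyNER keep working, while the compiler warning nudges Java callers away from the old setter.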

tantikristanti commented on June 11, 2024

The option "onlyNER": true is not handled for PDF input.

[image: screen shot 2018-01-15 at 16 06 25]

lfoppiano commented on June 11, 2024

  1. Check the documentation.
  2. onlyNER works only for text, not for PDF.
  3. onlyNER works only for en and fr.
  4. If onlyNER is set to true, make sure the disambiguation is disabled.
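Points 2 to 4 of the checklist can be sketched as simple checks. This is only an illustration: `OnlyNerCheck`, `InputKind` and both helper methods are hypothetical names, not part of the entity-fishing code base.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class OnlyNerCheck {

    // Hypothetical input kinds for a query.
    enum InputKind { TEXT, PDF }

    // Languages for which onlyNER is expected to work, per the checklist.
    static final Set<String> SUPPORTED_LANGS = new HashSet<>(Arrays.asList("en", "fr"));

    // onlyNER is expected to work only for plain text in a supported language.
    static boolean isSupported(InputKind kind, String lang) {
        return kind == InputKind.TEXT && lang != null && SUPPORTED_LANGS.contains(lang);
    }

    // Disambiguation must be off whenever onlyNER is set to true.
    static boolean shouldDisambiguate(boolean onlyNER) {
        return !onlyNER;
    }

    public static void main(String[] args) {
        System.out.println("text/en supported: " + isSupported(InputKind.TEXT, "en"));
        System.out.println("pdf/en supported:  " + isSupported(InputKind.PDF, "en"));
        System.out.println("text/it supported: " + isSupported(InputKind.TEXT, "it"));
        System.out.println("disambiguate when onlyNER=true: " + shouldDisambiguate(true));
    }
}
```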

tantikristanti commented on June 11, 2024

The test for this issue was done as follows:

  1. Test case: check whether the documentation covers the onlyNER option.
  2. Test case: check whether onlyNER works only for text and not for PDF files.
  • For text:

[image: onlynertrue]

  • For the PDF file:

[image: onlynertruepdf]

  • Result: Pass
  3. Test case: check whether onlyNER works only for EN and FR (since it uses grobid-ner, which currently works only for English and French).
  • onlyNER for English:

[image: onlynerenglish]

  • onlyNER for Italian:

[image: onlyneritalian]

  • Result: Pass
  4. Test case: if onlyNER is set to true, the disambiguation is disabled.
  • With onlyNER set to true, the mention Paul von Hindenburg is returned as two mentions, Paul and von Hindenburg.

[image: onlynerdisambiguate]

  • With onlyNER not set, the disambiguation returns the full mention Paul von Hindenburg.

[image: noonlyneroption]

  • Result: Pass

tantikristanti commented on June 11, 2024

Conclusion: this issue is closed because all the given test cases were met and passed.
