Comments (11)
So the issue is caused by the acronyms. In particular, when onlyNER is selected, the NER process is executed and socialiste-revolutionaire is not found by the NER engine.
In the following step the acronyms are injected in the list:
// inject explicit acronyms
entities = ProcessText.acronymCandidates(nerdQuery, entities);
They indeed have no type, as they are not NER entities.
@kermitt2 does it make sense to have the acronyms mixed with the NER entities when the query is executed with onlyNER: true?
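To make the question concrete, here is a minimal sketch of what keeping acronyms out of an onlyNER result could look like. It relies only on the observation above that injected acronyms carry no type. The `Mention` and `OnlyNerFilter` classes are hypothetical stand-ins for illustration, not entity-fishing's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a recognised mention; not entity-fishing's real class.
final class Mention {
    final String rawName;
    final String type; // NER class such as "PERSON"; null for injected acronyms

    Mention(String rawName, String type) {
        this.rawName = rawName;
        this.type = type;
    }
}

final class OnlyNerFilter {
    // When onlyNER is requested, keep only mentions that carry a NER type.
    // Otherwise the list is returned unchanged.
    static List<Mention> apply(List<Mention> mentions, boolean onlyNER) {
        if (!onlyNER) {
            return mentions;
        }
        List<Mention> typed = new ArrayList<>();
        for (Mention m : mentions) {
            if (m.type != null) {
                typed.add(m);
            }
        }
        return typed;
    }
}
```

With such a filter, an untyped acronym candidate like DIY would be dropped when onlyNER is true and kept otherwise.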
from entity-fishing.
I will check this one, because in branch 0.0.3 I've modified the mention and acronym handling and removed the onlyNER option (normally acronyms are no longer mixed with NER mentions).
I'm checking this. From what I understand, the acronyms are always processed, independently of the mention recognition.
Shouldn't they be separated from the mention recognition, or at least be flagged as acronyms in the output?
What about also bringing back a flag that disables the disambiguation and provides only mentions?
I don't understand the question on acronyms... but we could flag entities which are recognized as acronyms. The issue is when we have an acronym that is not introduced in the input text: this is not information we get from Wikidata/Wikipedia only.
Providing only mentions: this tool focuses on entity disambiguation. I would say that if users are only interested in mentions, they could just use the external modules for this (grobid-ner, etc.). The other problem is that users generally don't want just mentions; they also want entity classes (person, species, astronomical object, etc.), which ultimately rely on Wikidata here, and so on disambiguation...
My question about acronyms is based on the example: could flagging the entities recognised as acronyms be a way to provide more information to the client?
What do you mean by an acronym not introduced in the input text? Something like, given [...]blablabla Do It Yourself blablabla[...], recognising Do It Yourself as DIY?
Regarding the mentions, users can indeed use other tools, but grobid-ner, for example, doesn't have any API; entity-fishing was supposed to be one, see kermitt2/grobid-ner#56 (comment).
When I say mention, I mean mention plus attributes, depending on the method used: for example, you could recognise mentions from NER with their NE class, and mentions from Wikipedia without any NE class.
Actually, the option could be just disambiguate: false to disable it.
Regarding the option onlyNER: for backward compatibility, I think it is better practice to add the option back to the API but mark it as deprecated, to be removed in the following release.
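As an illustration of this suggestion, a deprecated option can stay functional while warning callers that it is going away. The sketch below is only an assumption of how that might look: `QueryOptions`, its field, and its accessor names are hypothetical and do not reflect the project's real query class.

```java
// Hypothetical query options holder; names are illustrative only.
final class QueryOptions {
    private boolean onlyNER = false;

    /**
     * @deprecated kept for backward compatibility only; planned for removal
     *             in the following release.
     */
    @Deprecated
    void setOnlyNER(boolean onlyNER) {
        // Still honour the option, but tell the caller it is going away.
        System.err.println("Warning: 'onlyNER' is deprecated and will be removed in the next release.");
        this.onlyNER = onlyNER;
    }

    boolean isOnlyNER() {
        return onlyNER;
    }
}
```

Existing clients that still send onlyNER keep working for one more release, and the `@Deprecated` annotation makes the compiler flag remaining internal uses.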
"onlyNER": true cannot be handled for text coming from a PDF.
- check the documentation
- works only for text, not for PDF
- works only for en and fr
- if set to true, make sure the disambiguation is disabled
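The checklist above can be read as a small set of validation rules. The following is a minimal sketch under stated assumptions: `OnlyNerRequest`, `validate`, and the parameters `inputIsPdf` and `lang` are hypothetical names, and rejecting unsupported combinations with an exception is one possible choice (the service might instead ignore the flag).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder for the validated options; not the project's real API.
final class OnlyNerRequest {
    // Languages for which onlyNER is expected to work, per the checklist.
    private static final Set<String> NER_LANGUAGES =
            new HashSet<>(Arrays.asList("en", "fr"));

    final boolean onlyNER;
    final boolean disambiguate;

    private OnlyNerRequest(boolean onlyNER, boolean disambiguate) {
        this.onlyNER = onlyNER;
        this.disambiguate = disambiguate;
    }

    static OnlyNerRequest validate(boolean onlyNER, boolean inputIsPdf, String lang) {
        if (!onlyNER) {
            // Default behaviour: full disambiguation.
            return new OnlyNerRequest(false, true);
        }
        if (inputIsPdf) {
            throw new IllegalArgumentException(
                    "onlyNER is supported for text input only, not PDF");
        }
        if (!NER_LANGUAGES.contains(lang)) {
            throw new IllegalArgumentException(
                    "onlyNER is supported for en and fr only, got: " + lang);
        }
        // onlyNER implies that the disambiguation is disabled.
        return new OnlyNerRequest(true, false);
    }
}
```

For example, `OnlyNerRequest.validate(true, false, "fr")` yields a request with disambiguation disabled, while the same call with `"it"` or with a PDF input is rejected.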
The test for this issue was done as follows:
- Test case: check whether the documentation covers the onlyNER option
  - check that the option onlyNER exists in Nerd's documentation
  - Result: Pass
- Test case: check whether onlyNER works only for text and not for PDF files
  - for the text
  - Result: Pass
- Test case: check whether onlyNER works only for EN and FR (since it uses grobid-ner, which currently works only for English and French)
  - onlyNER for English language
  - onlyNER for Italian language
  - Result: Pass
- Test case: if onlyNER is set to true, the disambiguation is disabled
  - with onlyNER set to true, the mention Paul von Hindenburg is returned as Paul and von Hindenburg
  - with onlyNER not set, the disambiguation result is returned as a full Paul von Hindenburg mention
  - Result: Pass
Conclusion: this issue is closed because all the given test cases have passed.