kermitt2 / entity-fishing
A machine learning tool for fishing entities
Home Page: http://nerd.readthedocs.io/
License: Apache License 2.0
In branch 0.0.3 we now use YAML config files, so remove this last bit of old-style config.
In the class NEDCorpusEvaluation, the label for an empty wikiName result is currently only 'NIL'. An empty wikiName can also be given as 'null', so the check has to accept either 'null' or 'NIL'.
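A minimal sketch of such a check, assuming a standalone helper (the class `EmptyWikiName` and its method are hypothetical and not part of NEDCorpusEvaluation). Treating Java `null`, blank strings and different letter cases as empty are extra assumptions beyond the two labels named above:

```java
/** Hypothetical helper: decides whether a gold wikiName denotes "no entity". */
final class EmptyWikiName {

    private EmptyWikiName() {}

    /** Returns true for the labels 'NIL' and 'null' (any case), for blank strings and for a Java null. */
    static boolean isEmpty(String wikiName) {
        if (wikiName == null) {
            return true;
        }
        String value = wikiName.trim();
        return value.isEmpty()
                || value.equalsIgnoreCase("NIL")
                || value.equalsIgnoreCase("null");
    }
}
```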
When a PDF doesn't contain text, grobid correctly responds with NO_BLOCKS,
but an NPE is then thrown somewhere:
05 Sep 2017 14:10.24 [DEBUG] NerdRestProcessFile - >> received query to process: {"language":{"lang":"en"},"onlyNER":false,"resultLanguages":["de","fr"],"nbest":false,"customisation":"generic"}
05 Sep 2017 14:10.24 [DEBUG] IOUtilities - >> set origin document for stateless service'...
05 Sep 2017 14:10.24 [DEBUG] NerdRestProcessFile - >> input PDF file saved locally...
05 Sep 2017 14:10.24 [DEBUG] NerdRestProcessFile - >> set query object...
05 Sep 2017 14:10.24 [DEBUG] NerdRestProcessFile - >> language provided in query: en;1.0
05 Sep 2017 14:10.24 [DEBUG] DocumentSource - start pdf2xml
05 Sep 2017 14:10.24 [DEBUG] DocumentSource - Executing command: [bash, -c, ulimit -Sv 6242304 && /Users/lfoppiano/development/inria/grobid/grobid-home/pdf2xml/mac-64/pdftoxml -blocks -noImageInline -fullFontName -noImage -annotation '/Users/lfoppiano/development/inria/grobid/grobid-home/tmp/origin8744748248510258475.pdf' /Users/lfoppiano/development/inria/grobid/grobid-home/tmp/davgUBsHYD.lxml]
05 Sep 2017 14:10.24 [DEBUG] DocumentSource - Executing: [bash, -c, ulimit -Sv 6242304 && /Users/lfoppiano/development/inria/grobid/grobid-home/pdf2xml/mac-64/pdftoxml -blocks -noImageInline -fullFontName -noImage -annotation '/Users/lfoppiano/development/inria/grobid/grobid-home/tmp/origin8744748248510258475.pdf' /Users/lfoppiano/development/inria/grobid/grobid-home/tmp/davgUBsHYD.lxml]
05 Sep 2017 14:10.24 [DEBUG] DocumentSource - pdf2xml process finished. Time to process:32ms
05 Sep 2017 14:10.24 [ERROR] NerdRestProcessFile - Cannot process input pdf file.
org.grobid.core.exceptions.GrobidException: [NO_BLOCKS] PDF parsing resulted in empty content
at org.grobid.core.document.Document.addTokenizedDocument(Document.java:408)
at org.grobid.core.engines.Segmentation.processing(Segmentation.java:94)
at com.scienceminer.nerd.service.NerdRestProcessFile.processQueryAndPdfFile(NerdRestProcessFile.java:110)
at com.scienceminer.nerd.service.NerdRestService.processQueryJson(NerdRestService.java:128)
at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:833)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
at org.eclipse.jetty.websocket.server.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:206)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:564)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:673)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:591)
at java.lang.Thread.run(Thread.java:745)
05 Sep 2017 14:10.24 [INFO ] NerdRestProcessFile - runtime: 57
05 Sep 2017 14:10.24 [ERROR] NerdRestProcessFile - An unexpected exception occurs.
java.lang.NullPointerException
at java.util.Collections.sort(Collections.java:141)
at com.scienceminer.nerd.service.NerdRestProcessFile.processQueryAndPdfFile(NerdRestProcessFile.java:359)
at com.scienceminer.nerd.service.NerdRestService.processQueryJson(NerdRestService.java:128)
at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:833)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
at org.eclipse.jetty.websocket.server.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:206)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:564)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:673)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:591)
at java.lang.Thread.run(Thread.java:745)
05 Sep 2017 14:10.24 [DEBUG] NerdRestProcessFile - << com.scienceminer.nerd.service.NerdRestProcessFile.methodLogOut
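The second trace shows Collections.sort being reached in processQueryAndPdfFile after the GROBID failure, which suggests the list passed to it was null. A defensive guard could look like the sketch below. The class `SafeSort` and its method are hypothetical, and which list is sorted at NerdRestProcessFile.java:359 is not visible from the log, so this only illustrates the pattern:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper illustrating a null-safe sort; not the actual entity-fishing code. */
final class SafeSort {

    private SafeSort() {}

    /** Returns a sorted copy of the list, treating a null list as empty. */
    static <T extends Comparable<? super T>> List<T> sortedOrEmpty(List<T> items) {
        if (items == null) {
            // Nothing was extracted (e.g. a PDF without text), so there is nothing to sort.
            return new ArrayList<>();
        }
        List<T> copy = new ArrayList<>(items);
        Collections.sort(copy);
        return copy;
    }
}
```

An alternative would be to return early from the request as soon as the GrobidException is caught, so that the sort is never reached.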
From the traditional features, we still need to experiment with:
prob_i
conditional probability of the string given the concept (i.e. the reverse of prob_c; this is given by the db pageLabel, currently not loaded in LMDB)
a lexical cohesion measure, e.g. log-likelihood, Dice coefficient or PMI
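For reference, two of the lexical cohesion measures mentioned above can be computed from simple counts. This is a sketch using the standard definitions of the Dice coefficient and PMI; the class `LexicalCohesion` and the way counts are obtained are assumptions, not existing entity-fishing code:

```java
/** Hypothetical helper computing two standard association measures from counts. */
final class LexicalCohesion {

    private LexicalCohesion() {}

    /** Dice coefficient: 2 * count(x,y) / (count(x) + count(y)). */
    static double dice(long countX, long countY, long countXY) {
        long denominator = countX + countY;
        return denominator == 0 ? 0.0 : (2.0 * countXY) / denominator;
    }

    /**
     * Pointwise mutual information (natural log): log( p(x,y) / (p(x) * p(y)) ),
     * with probabilities estimated as counts over `total` observations.
     */
    static double pmi(long countX, long countY, long countXY, long total) {
        if (countX == 0 || countY == 0 || countXY == 0 || total == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        return Math.log(((double) countXY * total) / ((double) countX * countY));
    }
}
```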
Not traditional, we also need to experiment with:
Add the possibility to set the parameters minRankerScore and minSelectorScore at query level.
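One way this could work is to let a value supplied in the query override the configured default. The sketch below assumes that scores lie in [0, 1] and that a missing query value is represented as null; the class `ScoreThresholds` is hypothetical:

```java
/** Hypothetical helper: picks the threshold from the query if present, else from the config. */
final class ScoreThresholds {

    private ScoreThresholds() {}

    /** Returns queryValue when it is set, otherwise configDefault. Assumes scores are in [0, 1]. */
    static double effective(Double queryValue, double configDefault) {
        if (queryValue == null) {
            return configDefault;
        }
        if (queryValue < 0.0 || queryValue > 1.0) {
            throw new IllegalArgumentException("Score threshold must be in [0, 1]: " + queryValue);
        }
        return queryValue;
    }
}
```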
The pre-annotation process takes as input a directory of documents (text files; if they are supplied as PDF, the PDF content is extracted into text files before further processing) and is supposed to generate the XML annotation file using the current models.
The XML annotation file is then corrected by a human to be shared as gold standard.
The process should take as parameters the input location and an optional output location. If the output location is not specified, the XML is placed inside the input directory.
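The fallback for the output location could be as simple as the following sketch (the class and method names are hypothetical):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper resolving where the pre-annotation XML should be written. */
final class PreAnnotationPaths {

    private PreAnnotationPaths() {}

    /** Returns the output directory, falling back to the input directory when none is given. */
    static Path resolveOutputDir(String inputDir, String outputDir) {
        Path input = Paths.get(inputDir);
        if (outputDir == null || outputDir.trim().isEmpty()) {
            return input;
        }
        return Paths.get(outputDir);
    }
}
```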
Generate a build for a servlet container, for example tomcat, ideally
We also need to correctly tokenise grobid-home, grobid.properties and data directory from tomcat to a specific location
dependencies:
The customisations are not behaving as expected (or maybe I misunderstood). Here is the example:
POST /customisation
value:
{
"customisation": {
"wikipedia": [
105942, 1499966, 4105431
],
"lang": "fr",
"texts": [
"Place de la République, Hotel Moderne, vaste batisse où étaient logées les petites souris grises, d’autres disent « les Salamandres », jeunes allemandes en uniforme. Elles partent et elles ne pouvait emporter qu’un léger bagage à la main. Jetons aux combinards de Vichy et de Washington, en défi, une tête de traître."
],
"description": "customisation for the ww2 french liberation"
}
}
name: ww2fr
But then when analysing the sentence:
Jetons aux combinards de Vichy et de Washington, en défi, une tête de traître.
Vichy is not recognised as Regime de Vichy but as Vichy the town, even though I added the Wikipedia ID of the Regime de Vichy to the customisation.
Detect acronyms introduced (explicitly or not) in a document, and maintain them as possible mentions in the current document.
Example: frequent for names of species (C. Lupus, C. n. gregoryi), Cigarette smoke (CS)-induced
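For the explicit case, a first approximation could match a parenthesised upper-case token against the initials of the words just before it. This is only a sketch under that assumption: the class `AcronymFinder` is hypothetical, and the heuristic does not cover implicit introductions or abbreviated species names such as "C. Lupus":

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: finds acronyms introduced as "long form (ACRONYM)". */
final class AcronymFinder {

    // A parenthesised run of 2 to 10 upper-case ASCII letters, e.g. "(CS)".
    private static final Pattern PARENTHESISED = Pattern.compile("\\(([A-Z]{2,10})\\)");

    private AcronymFinder() {}

    /** Returns a map acronym -> long form for every introduction whose initials match. */
    static Map<String, String> find(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = PARENTHESISED.matcher(text);
        while (matcher.find()) {
            String acronym = matcher.group(1);
            String[] words = text.substring(0, matcher.start()).trim().split("\\s+");
            int n = acronym.length();
            if (words.length < n) {
                continue;
            }
            // Compare the initials of the n preceding words with the acronym letters.
            StringBuilder initials = new StringBuilder();
            StringBuilder longForm = new StringBuilder();
            for (int i = words.length - n; i < words.length; i++) {
                initials.append(Character.toUpperCase(words[i].charAt(0)));
                if (longForm.length() > 0) {
                    longForm.append(' ');
                }
                longForm.append(words[i]);
            }
            if (initials.toString().equals(acronym)) {
                result.put(acronym, longForm.toString());
            }
        }
        return result;
    }
}
```

The resulting map could then be kept per document, so that later occurrences of the acronym are considered as mentions of the long form.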
Currently the /disambiguate service supports only multipart/form-data as content-type, even if the parameter query is in JSON without a PDF file.
We should also support application/json as content-type when the request has only a JSON parameter without a PDF file.
Since we are starting the migration with GROBID, it is good to have a task for entity-fishing as well, so that we don't forget.
Wikipedia redirects and anchors cover most of the frequent morphosyntactic variants (e.g. plural), but not in an exhaustive manner. We could add a process (or pre-process) to support them.
When testing the PDF disambiguation, localhost and 'nerd.huma-num.fr/test/' give different results. These differences appear for English and French. For Italian and Spanish, the service doesn't return any results.
There is an inconsistency between German, French and English.
It seems that the German and French rankers haven't been migrated to the GradientTreeBoost model, so at the moment they throw a CCE:
08 Dec 2017 11:53.12 [DEBUG] NerdEngine - Fail to compute ranker score.
java.lang.ClassCastException: smile.regression.RandomForest cannot be cast to smile.regression.GradientTreeBoost
at com.scienceminer.nerd.disambiguation.NerdRanker.getProbability(NerdRanker.java:139)
at com.scienceminer.nerd.disambiguation.NerdEngine.rank(NerdEngine.java:964)
at com.scienceminer.nerd.disambiguation.NerdEngine.disambiguate(NerdEngine.java:242)
at com.scienceminer.nerd.service.NerdRestProcessQuery.processQueryText(NerdRestProcessQuery.java:154)
at com.scienceminer.nerd.service.NerdRestProcessQuery.processQuery(NerdRestProcessQuery.java:50)
at com.scienceminer.nerd.service.NerdRestService.processQueryJson(NerdRestService.java:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:833)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
at org.eclipse.jetty.websocket.server.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:206)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:564)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:673)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:591)
at java.lang.Thread.run(Thread.java:745)
For references, some examples/resources related to this:
To be checked whether
Here the issues:
e.g. in the following query:
{
"onlyNER": false,
"nbest": false,
"text": "We are heading to Washington. The cat is on the Table in Milan.",
"processSentence": [0-1],
"sentences": [
{
"offsetStart": 0,
"offsetEnd": 29
},
{
"offsetStart": 29,
"offsetEnd": 63
}
]
}
the "processSentence":[0-1]
would result in
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected character ('-' (code 45)): was expecting comma to separate Array entries
Two solutions: either (a) we allow only integers like [0,1,2,3], or (b) we change the items to strings like ['0','1-2'].
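If option (b) were chosen, the string items would need to be expanded into sentence indices after JSON parsing. A possible sketch, where the class `SentenceSelector` is hypothetical and the "from-to" range syntax is taken to be inclusive:

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper expanding items such as "0" or "1-2" into sentence indices. */
final class SentenceSelector {

    private SentenceSelector() {}

    /** Expands single indices and inclusive "from-to" ranges into a sorted set of indices. */
    static SortedSet<Integer> expand(List<String> items) {
        SortedSet<Integer> indices = new TreeSet<>();
        for (String item : items) {
            String value = item.trim();
            int dash = value.indexOf('-');
            if (dash < 0) {
                indices.add(Integer.parseInt(value));
                continue;
            }
            // NumberFormatException (an IllegalArgumentException) is thrown for malformed bounds.
            int from = Integer.parseInt(value.substring(0, dash).trim());
            int to = Integer.parseInt(value.substring(dash + 1).trim());
            if (from > to) {
                throw new IllegalArgumentException("Invalid sentence range: " + item);
            }
            for (int i = from; i <= to; i++) {
                indices.add(i);
            }
        }
        return indices;
    }
}
```

Option (a) avoids this extra parsing step entirely, at the cost of longer arrays for large ranges.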
Washington:
{
"onlyNER": false,
"nbest": false,
"text": "We are heading to Washington. The cat is on the Table in Milan.",
"processSentence": [0],
"sentences": [
{
"offsetStart": 0,
"offsetEnd": 29
},
{
"offsetStart": 29,
"offsetEnd": 63
}
]
}
Creating entity vectors is fairly straightforward, and they could be used as an additional/alternative context relevance measure. A context is modelled as the centroid of the vectors representing its words (v_context), and the relevance of a given entity e (with vector v_e) is the cosine cos(v_context, v_e). One clear advantage over the relatedness measure is that it will be much faster.
See http://www.di.unipi.it/~ottavian/files/wsdm15_fel.pdf or https://github.com/ot/entity2vec
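The measure described above can be sketched in a few lines, assuming word and entity vectors are available as plain double arrays of the same dimension (the class `ContextRelevance` is hypothetical and says nothing about how the vectors are trained or stored):

```java
/** Hypothetical sketch of the centroid/cosine relevance measure. */
final class ContextRelevance {

    private ContextRelevance() {}

    /** Component-wise mean of the word vectors (v_context). */
    static double[] centroid(double[][] wordVectors) {
        if (wordVectors.length == 0) {
            throw new IllegalArgumentException("The context must contain at least one word vector");
        }
        int dim = wordVectors[0].length;
        double[] sum = new double[dim];
        for (double[] vector : wordVectors) {
            for (int i = 0; i < dim; i++) {
                sum[i] += vector[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            sum[i] /= wordVectors.length;
        }
        return sum;
    }

    /** Cosine similarity; returns 0 when either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Relevance of an entity: cos(v_context, v_e). */
    static double relevance(double[][] contextWordVectors, double[] entityVector) {
        return cosine(centroid(contextWordVectors), entityVector);
    }
}
```

Since the centroid is computed once per context, scoring each candidate entity costs a single dot product, which is where the speed advantage would come from.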
De-hyphenation in PDF is a bit more complicated to manage than in text, because we have to keep track of the coordinates via multiple layout tokens for a single text token.
Hello Team,
Whenever I run mvn clean install to build the project, a BUILD FAILURE occurs with the following error:
[ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.975 s <<< FAILURE! - in com.scienceminer.nerd.disambiguation.TestProcessText
[ERROR] testProcess(com.scienceminer.nerd.disambiguation.TestProcessText) Time elapsed: 4.156 s <<< ERROR!
java.lang.NoSuchFieldError: year
at com.scienceminer.nerd.disambiguation.TestProcessText.testProcess(TestProcessText.java:50)
[INFO] Running com.scienceminer.nerd.disambiguation.SentenceTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in com.scienceminer.nerd.disambiguation.SentenceTest
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] TestProcessText.testProcess:50 » NoSuchField year
[INFO]
[ERROR] Tests run: 30, Failures: 0, Errors: 1, Skipped: 0
What does this [ERROR] TestProcessText.testProcess:50 » NoSuchField year mean?
What should I do so that BUILD SUCCESS occurs?
(N)ERD is a working name...
Ideas so far: entity-fishing, entity-bazaar, ...
Hi @lfoppiano,
I am trying to train with Wikipedia articles.
I used the command below:
bash > mvn compile exec:exec -Ptrain_annotate_en
$ mvn compile exec:exec -Ptrain_annotate_en
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for com.scienceminer.nerd:nerd-service:war:0.0.2
[WARNING] 'build.plugins.plugin.version' for org.codehaus.mojo:exec-maven-plugin is missing. @ line 256, column 29
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-jar-plugin is missing. @ line 47, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
Downloading: https://repo.maven.apache.org/maven2/org/codehaus/mojo/exec-maven-plugin/maven-metadata.xml
Downloaded: https://repo.maven.apache.org/maven2/org/codehaus/mojo/exec-maven-plugin/maven-metadata.xml (741 B at 0.3 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.6.0/exec-maven-plugin-1.6.0.pom
Downloaded: https://repo.maven.apache.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.6.0/exec-maven-plugin-1.6.0.pom (13 KB at 11.0 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/codehaus/mojo/mojo-parent/40/mojo-parent-40.pom
Downloaded: https://repo.maven.apache.org/maven2/org/codehaus/mojo/mojo-parent/40/mojo-parent-40.pom (33 KB at 20.3 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.6.0/exec-maven-plugin-1.6.0.jar
Downloaded: https://repo.maven.apache.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.6.0/exec-maven-plugin-1.6.0.jar (57 KB at 25.0 KB/sec)
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building (N)ERD 0.0.2
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for org.grobid.ner:grobid-ner:jar:0.4.3-SNAPSHOT is missing, no dependency information available
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ nerd-service ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 5 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:compile (default-compile) @ nerd-service ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-jar-plugin:3.0.2:jar (make-a-jar) @ nerd-service ---
[INFO] Building jar: /Users/kvincent1/Desktop/Factbot-simplest/nerd/target/nerd-service-0.0.2.jar
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (default-cli) @ nerd-service ---
Downloading: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-component-annotations/1.5.4/plexus-component-annotations-1.5.4.pom
Downloaded: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-component-annotations/1.5.4/plexus-component-annotations-1.5.4.pom (815 B at 1.0 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-containers/1.5.4/plexus-containers-1.5.4.pom
Downloaded: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-containers/1.5.4/plexus-containers-1.5.4.pom (5 KB at 5.1 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/apache/commons/commons-exec/1.3/commons-exec-1.3.pom
Downloaded: https://repo.maven.apache.org/maven2/org/apache/commons/commons-exec/1.3/commons-exec-1.3.pom (11 KB at 11.8 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/apache/commons/commons-parent/35/commons-parent-35.pom
Downloaded: https://repo.maven.apache.org/maven2/org/apache/commons/commons-parent/35/commons-parent-35.pom (57 KB at 22.8 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar
Downloaded: https://repo.maven.apache.org/maven2/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar (54 KB at 15.2 KB/sec)
init upper level language independent environment
building Environment for upper knowledge base
Environment built - 9155139 concepts.
init Environment for language en
building Environment for language en
isLoaded: true
Environment built - 14651883 pages.
domains en / isLoaded: true
Warning: Orthopedic surgery is not a category found in Wikipedia.
Warning: Environment is not a category found in Wikipedia.
init Environment for language de
building Environment for language de
isLoaded: true
Environment built - 3523959 pages.
init Environment for language fr
building Environment for language fr
isLoaded: true
Environment built - 3631810 pages.
GROBID_HOME=/Users/kvincent1/Desktop/Factbot-simplest/grobid/grobid-home
building full markup database for language en
markupFull / isLoaded: false
com.scienceminer.nerd.exceptions.NerdResourceException: Markup file not found
at com.scienceminer.nerd.kb.db.MarkupDatabase.loadFromXmlFile(MarkupDatabase.java:108)
at com.scienceminer.nerd.kb.db.KBLowerEnvironment.buildFullMarkup(KBLowerEnvironment.java:291)
at com.scienceminer.nerd.kb.LowerKnowledgeBase.loadFullContentDB(LowerKnowledgeBase.java:74)
at com.scienceminer.nerd.training.WikipediaTrainer.<init>(WikipediaTrainer.java:60)
at com.scienceminer.nerd.training.WikipediaTrainer.main(WikipediaTrainer.java:130)
Create article sets...
Article sample is empty for set 0
Article sample is empty for set 1
Article sample is empty for set 2
Article sample is empty for set 3
Article sample is empty for set 4
Create Ranker arff files...
Exception in thread "main" java.lang.NullPointerException
at com.scienceminer.nerd.disambiguation.NerdRanker.train(NerdRanker.java:168)
at com.scienceminer.nerd.training.WikipediaTrainer.createRankerArffFiles(WikipediaTrainer.java:92)
at com.scienceminer.nerd.training.WikipediaTrainer.main(WikipediaTrainer.java:136)
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:804)
at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:751)
at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:313)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:23 min
[INFO] Finished at: 2017-09-02T19:53:10+05:30
[INFO] Final Memory: 18M/178M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:exec (default-cli) on project nerd-service: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Extend the query filter to express more complex expressions on Wikidata properties/values
As in the title; this is a continuation of #20.
Example (when the wikipediaRefId is removed, the entity is not taken into consideration):
{
"text": "Austria invaded and fought the Serbian army at the Battle of Cer and Battle of Kolubara beginning on 12 August.",
"language": {
"lang": "en"
},
"entities": [
{
"rawName": "German Army",
"offsetStart": 1107,
"offsetEnd": 1118,
"wikipediaExternalRef": 11702744,
"wikidataId": "Q701923"
}
]
}
Hi,
How do I start NERD on a specific port?
I currently have entity-fishing running on port 8090.
Currently CORS is allowed for any domain by default for all entity-fishing services.
Add and support a parameter in the YAML config file to enable or disable CORS, either for any domain (*) or for a list of selected domains.
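A minimal sketch of the allow-list logic such a parameter could drive. The names `is_origin_allowed`, `cors_headers` and the `allowed_origins` list are hypothetical and do not come from entity-fishing; the sketch only illustrates the three cases mentioned above (CORS disabled, any domain, selected domains).

```python
# Hypothetical helper names; "allowed_origins" stands for the value that
# would be read from the YAML config file (an assumption, not an existing key).

def is_origin_allowed(origin, allowed_origins):
    """Return True if the request origin may receive CORS headers."""
    if not allowed_origins:          # empty list: CORS disabled
        return False
    if "*" in allowed_origins:       # wildcard: any domain
        return True
    return origin in allowed_origins  # explicit list of selected domains


def cors_headers(origin, allowed_origins):
    """Build the response headers for a given request origin."""
    if not is_origin_allowed(origin, allowed_origins):
        return {}
    if "*" in allowed_origins:
        return {"Access-Control-Allow-Origin": "*"}
    # Echo the matching origin and tell caches the response varies by origin.
    return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
```

With an explicit list, only the listed origins get the header back; an empty list turns CORS off entirely.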
In the current implementation, the embeddings are loaded in memory, which requires 1.5-2 GB for each language.
need to add more information
.vec
which should be referenced like the Wikipedia files
As in the title, we could process each page (or each language) in parallel.
Note: This issue is not urgent as normal users will get the pre-processed data already.
I am writing this down so I don't forget. Here is an example to be checked, from the page of Charlemagne:
Charlemagne, du latin Carolus Magnus, ou Charles Ier dit « le Grand », né le 2 avril 742 (voire 747 ou 748)2, mort le 28 janvier 814 à Aix-la-Chapelle, est un roi des Francs et empereur. Il appartient à la dynastie des Carolingiens, à laquelle il a donné son nom.\nFils de Pépin le Bref, il est roi des Francs à partir de 768, devient par conquête roi des Lombards en 774 et est couronné empereur à Rome par le pape Léon III le 25 décembre 800, relevant une dignité disparue depuis la chute de l'Empire romain d'Occident en 476.\nRoi guerrier, il agrandit notablement son royaume par une série de campagnes militaires, en particulier contre les Saxons païens dont la soumission fut difficile et violente (772-804), mais aussi contre les Lombards en Italie et les musulmans d'Al-Andalus.
The token Charles Ier is disambiguated as Charles Ier (empereur d'Autriche).
When searching for it in the term lookup, there is no confidence and the ID does not point to the right Wikipedia page (although the Wikidata ID works fine):
Something to be checked
Currently in branch 0.0.3, the iitb corpus is not exactly in the same format as the other corpora, so the evaluation does not work on it. Annotations are given without distinguishing the document in which they occur, while the current evaluation requires this.
A bit of XML massage on the file iitb.xml is required :)
There is actually more JavaScript code and libraries than Java, even though (N)ERD is written in Java!
Use bower.
Remove the Doc tab and move all the information about the documentation and the GitHub repository to the About tab.
Branch 0.0.3 contains a corpus-based evaluation together with most of the usual NED corpora (ACE, AQUAINT, AIDA-CONLL, MSNBC, ...).
However, it would be good to plug the tool on GERBIL for third party evaluation and comparison with other entity disambiguation tools.
http://aksw.org/Projects/GERBIL.html
(but this cannot replace the existing evaluation, as more detailed eval and intermediary results are in practice needed)
Very nice to keep track of potential issues with dependencies, thanks @lfoppiano
In the demo console, all the statements are listed in the infobox, which sometimes makes it ridiculously huge. We need to put the statements in a collapsible element to get a cleaner infobox.
The following query:
{"text": "Un compte rendu de Jean-Guillaume Lanuque (avec l'aide amicale de Christian Beuvain) C\u2019est un pan plut\u00f4t m\u00e9connu de l\u2019histoire de la Russie r\u00e9volutionnaire en ses premi\u00e8res ann\u00e9es que Giles Milton nous \u00e9claire, celui de l\u2019action des services secrets britanniques, le fameux MI6 (n\u00e9 juste avant la Premi\u00e8re Guerre mondiale, sous la houlette de Mansfield Cumming), en terre russe. Pour ce faire, l\u2019auteur a mis \u00e0 profit des sources g\u00e9n\u00e9ralement in\u00e9dites en langue fran\u00e7aise, m\u00e9moires \u00e9crits des agents et documents d\u2019archives. Le tout est pr\u00e9sent\u00e9 comme un roman, sans notes de bas de page (les r\u00e9f\u00e9rences sont concentr\u00e9es en fin d\u2019ouvrage), et si la lecture en est d\u2019autant plus ais\u00e9e, un sentiment g\u00eanant se d\u00e9gage assez rapidement\u00a0: Giles Milton prend en effet le plus souvent pour argent comptant tout ce que lui apprennent ses sources \u2013 alors que des t\u00e9moignages, d'agents secrets qui plus est, demandent au minimum une critique interne pouss\u00e9e \u2013 et ne fait pas l\u2019effort de les croiser de mani\u00e8re syst\u00e9matique1\u00a0; il fait \u00e9galement sienne la pr\u00e9vention des dirigeants britanniques \u00e0 l\u2019\u00e9gard des bolcheviques, syst\u00e9matiquement pr\u00e9sent\u00e9s ici sous un jour n\u00e9gatif, gestionnaires incomp\u00e9tents, violents et barbares (le terme de terrorisme revient aussi \u00e0 plusieurs reprises)2. En dehors de ce parti pris pesant, le livre de Giles Milton se pr\u00e9sente comme une synth\u00e8se, bien qu'incompl\u00e8te. 
Elle d\u00e9bute avec la participation des services secrets britanniques \u00e0 l\u2019assassinat de Raspoutine (sans d\u2019ailleurs approfondir le sujet en d\u00e9tail), et se poursuit avec la mission de Somerset Maugham \u2013 agent secret, romancier c\u00e9l\u00e8bre, dramaturge \u2013 charg\u00e9 d\u2019apporter le soutien (y compris financier) des anglo-saxons \u00e0 Kerenski, par crainte de la d\u00e9fection russe dans la guerre. Avec l\u2019arriv\u00e9e au pouvoir des bolcheviques, le Royaume-Uni dispose d\u00e9j\u00e0 d\u2019un homme dans la place\u00a0: Arthur Ransome, journaliste occupant une position privil\u00e9gi\u00e9e car proche des nouveaux dirigeants (dont Karl Radek, qui le pr\u00e9senta \u00e0 L\u00e9nine et Trotsky), sympathisant de la r\u00e9volution d'octobre, mais qui, selon l'auteur, aurait jou\u00e9 un r\u00f4le d\u2019agent double (en renseignant le sous-secr\u00e9taire du Foreign Office, Lord Robert Cecil), tout en se mettant en couple avec Evguenia Chelepina, secr\u00e9taire de Trotsky (p. 90-93). Mansfield Cumming, dans le m\u00eame temps, d\u00e9pla\u00e7a son bureau russe \u00e0 Stockholm, et envoya en Russie m\u00eame deux agents, Sidney Reilly et George Hill. Les d\u00e9tails sur les identit\u00e9s multiples \u00e0 b\u00e2tir, les pr\u00e9cautions \u00e0 prendre ou les m\u00e9thodes \u00e0 utiliser afin de transmettre des messages sont ici dignes d\u2019un roman d'espionnage. Le r\u00e9seau mont\u00e9 par Sidney Reilly, en particulier, comprenait des individus bien introduits dans le syst\u00e8me de pouvoir, ainsi du colonel Aleksandr V. Friede, un Letton qui travaillait au Commissariat du peuple \u00e0 la guerre, sans compter Boris Bajanov, qui selon ses m\u00e9moires, sur lesquelles s'appuie l'auteur3, fut un agent double d\u00e8s son adh\u00e9sion au parti, en 1919. 
M\u00eame si Giles Milton n\u2019aborde pas les choses sous cet angle, on a l\u00e0 autant d\u2019\u00e9l\u00e9ments, impliquant les Britanniques, qui permettent de comprendre, au moins en partie, la m\u00e9fiance croissante des nouveaux dirigeants face \u00e0 un cercle d'ennemis bien r\u00e9els, et le r\u00f4le exponentiel d\u00e9volu \u00e0 la Tcheka. Avec le d\u00e9but de l\u2019intervention \u00e9trang\u00e8re en Russie, afin d\u2019aider les forces anti-bolcheviques, Hill et Reilly basculent compl\u00e8tement dans la clandestinit\u00e9. Leur action se concentre alors sur le renseignement et le sabotage. Reilly va jusqu\u2019\u00e0 concevoir un projet de coup d\u2019\u00c9tat visant \u00e0 renverser le pouvoir bolchevique. Pour ce faire, Giles Milton pr\u00e9tend qu'il se serait attir\u00e9 la complicit\u00e9 d\u2019\u00c9douard Berzine4, commandant du premier r\u00e9giment letton de fusiliers (sous un pr\u00e9texte tellement l\u00e9ger \u2013 le souhait de rentrer au pays \u2013 qu\u2019il frise sans doute la manipulation), ainsi que des financements fran\u00e7ais et \u00e9tatsuniens\u00a0; un gouvernement provisoire avait m\u00eame \u00e9t\u00e9 \u00e9labor\u00e9, avec la participation de Ioudenitch. Toutefois, ce plan ambitieux ne rentra jamais en application, devanc\u00e9 \u00e0 quelques jours pr\u00e8s par les assassinats (ou tentatives) perp\u00e9tr\u00e9s sur Moiss\u00e9i Ouritski5 et L\u00e9nine, le 30 ao\u00fbt 1918. Il fut \u00e9galement d\u00e9nonc\u00e9 par Ren\u00e9 Marchand, correspondant du Figaro ralli\u00e9 aux bolcheviques6, ce qui entra\u00eena la prise de contr\u00f4le de l\u2019ambassade britannique \u00e0 Moscou et toute une s\u00e9rie d\u2019arrestations, parmi lesquelles celle du diplomate Robert Bruce Lockhart, plus tard \u00e9chang\u00e9 (avec George Hill) contre Litvinov, alors repr\u00e9sentant des Soviets en Grande-Bretagne, arr\u00eat\u00e9 pour espionnage\u00a0; Reilly, lui, r\u00e9ussit \u00e0 fuir la Russie. 
Cela n\u2019emp\u00eachera pas Hill comme Reilly de repartir en Russie, aupr\u00e8s de Denikine, tout comme Paul Dukes, expert en grimages et d\u00e9guisements, affect\u00e9 \u00e0 Petrograd (il y r\u00e9ussit \u00e0 adh\u00e9rer au Parti et \u00e0 devenir bri\u00e8vement d\u00e9l\u00e9gu\u00e9 au soviet). On reste toutefois quelque peu d\u00e9contenanc\u00e9 par la qualit\u00e9 des informations que ces derniers ou Arthur Ransome recueillent, tout au moins telles que Giles Milton nous les pr\u00e9sente (la volont\u00e9 d\u2019une r\u00e9volution mondiale est loin d\u2019\u00eatre un secret\u00a0!). Par contre, les pages \u00e9voquant Churchill sont nettement plus int\u00e9ressantes. On apprend en effet que le secr\u00e9taire d\u2019\u00c9tat \u00e0 la guerre, fervent anticommuniste et partisan d\u2019une lutte soutenue contre le pouvoir bolchevique, poussa \u00e0 l\u2019emploi d\u2019armes chimiques dernier cri, effectivement utilis\u00e9es \u00e0 la fin de l\u2019\u00e9t\u00e9 1919 dans le nord de la Russie\u00a0; le m\u00eame \u00e9tait d\u2019ailleurs pr\u00eat \u00e0 les employer \u00e9galement contre les Indiens en r\u00e9volte\u2026 Parall\u00e8lement \u00e0 la situation \u00e0 Moscou et Petrograd, Giles Milton \u00e9voque \u00e9galement largement l\u2019Asie centrale. Le Royaume-Uni s\u2019inqui\u00e9tait en effet d\u2019une possible contagion r\u00e9volutionnaire dans les zones \u00e0 la fronti\u00e8re nord de l'Inde, risquant de menacer le fleuron de son empire, ce qui explique l\u2019envoi d\u2019une mission de renseignements au Turkestan russe, compos\u00e9e principalement de Frederick Bailey et Stewart Blacker. 
Mais sur la situation \u00e0 Tachkent \u2013 centre administratif du Turkestan, o\u00f9 les bolcheviques, majoritaires au Soviet de cette ville de 200 000 habitants, ont pris le pouvoir le 1er novembre 1917 \u2013 le propos est laconique, \u00e9voquant surtout l\u2019isolement, la mauvaise situation \u00e9conomique et les efforts de recrutement de prisonniers autrichiens dans l\u2019Arm\u00e9e rouge\u2026 Tr\u00e8s vite menac\u00e9, Bailey change d\u2019identit\u00e9 et prend celle d\u2019un prisonnier autrichien, alors qu\u2019au printemps 1919, une r\u00e9volte secoue l\u2019Afghanistan. S\u2019engageant dans la Tcheka, Bailey parvient finalement \u00e0 regagner les terres britanniques au prix d\u2019une travers\u00e9e du d\u00e9sert de Karakoum. La main est alors reprise par le g\u00e9n\u00e9ral Wilfrid Malleson, qui, gr\u00e2ce \u00e0 tout un r\u00e9seau et \u00e0 un vaste travail de d\u00e9sinformation, parviendra \u00e0 faire se d\u00e9grader les relations russo-afghanes. Le r\u00e9volutionnaire indien M.N. Roy est \u00e9galement \u00e9voqu\u00e9, charg\u00e9 qu\u2019il fut d\u2019un projet de formation militaire, \u00e0 compter de la fin 1920, afin de pr\u00e9parer le soutien \u00e0 l\u2019insurrection en Inde, un projet finalement abandonn\u00e9 \u00e0 l\u2019occasion de l\u2019accord anglo-sovi\u00e9tique de mars 1921. Mais l\u00e0 encore, le parti pris de Giles Milton nous emp\u00eache d\u2019en apprendre davantage sur ce sujet, le parcours ult\u00e9rieur de Roy \u00e9tant plus que laconique sous sa plume7. 
Si les guerres secr\u00e8tes, l'utilisation d'agents clandestins et les entreprises de d\u00e9sinformation/manipulation ne doivent pas \u00eatre n\u00e9glig\u00e9es, surtout dans les p\u00e9riodes de r\u00e9volutions politiques et de ruptures sociales, encore faut-il que leur histoire soit abord\u00e9e d'une mani\u00e8re scientifique, encore plus rigoureusement que d'autres \u00e9v\u00e9nements, eu \u00e9gard au caract\u00e8re sp\u00e9cifique et myst\u00e9rieux de cet objet historique. L'ouvrage de Giles Milton ne r\u00e9pond gu\u00e8re \u00e0 ces crit\u00e8res. On lui pr\u00e9f\u00e9rera la biographie de Reginald Teague-Jones, un de ses hommes de l'ombre des services secrets britanniques en Russie sovi\u00e9tique, par l'historienne Taline Ter Minassian8. 1On peut ainsi citer un \u00ab\u00a0Conseil supr\u00eame militaire bolchevique\u00a0\u00bb (p. 113), pr\u00e9sent\u00e9 comme le c\u0153ur de l\u2019organisation bolchevique, ou Trotsky ayant dirig\u00e9 l\u2019assaut contre Cronstadt\u2026 2C\u2019est au point que Giles Milton accuse implicitement les bolcheviques d\u2019\u00eatre responsables du d\u00e9clenchement des hostilit\u00e9s une fois les premi\u00e8res forces britanniques d\u00e9barqu\u00e9es au nord du pays\u00a0! (p. 147). 3Boris Bajanov, Avec Staline dans le Kremlin, Paris, Les \u00c9ditions de France, 1930, 263 p. R\u00e9\u00e9dit\u00e9 en 1979 sous le titre Bajanov r\u00e9v\u00e8le Staline. Souvenirs d'un ancien secr\u00e9taire de Staline, chez Gallimard. A lire Giles Milton, Bajanov fut d\u00e8s 1920 \u00ab\u00a0secr\u00e9taire de l'appareil principal du parti\u00a0\u00bb (p. 312), alors qu'il ne devient Secr\u00e9taire du Politburo [Bureau politique] qu'\u00e0 l'\u00e9t\u00e9 1923. D'ailleurs, quel cr\u00e9dit accorder aux r\u00e9cits de transfuges, quels qu'ils soient\u00a0? 4Par la suite, \u00c9douard Berzine participa \u00e0 la cr\u00e9ation du Goulag. 
Ne pas confondre avec Ian Berzine, un des meilleurs sp\u00e9cialistes du renseignement de l'Arm\u00e9e rouge. Tous deux sont ex\u00e9cut\u00e9s lors des purges de 1937-38. 5Moiss\u00e9i/Mikha\u00efl Ouritski (1873-1918) est abattu par le socialiste-r\u00e9volutionnaire (SR) Leonid Kanegisser en tant que dirigeant de la Tcheka de Petrograd. D'abord menchevique, puis membre de la Mejra\u00efonka (un groupe internationaliste aussi nomm\u00e9 \u00ab\u00a0inter-district\u00a0\u00bb ou inter-rayons\u00a0\u00bb) avant de rejoindre les bolcheviques. La tentative d'assassinat sur L\u00e9nine est l\u2019\u0153uvre de Fanny Kaplan, ancienne anarchiste devenue membre de l'organisation de combat SR. Auparavant, elle avait pr\u00e9vu de tuer L\u00e9on Trotsky. Avant d'\u00eatre ex\u00e9cut\u00e9e, elle partagea la cellule du diplomate Robert Bruce Lockhart (Orlando Figes, La R\u00e9volution russe, Paris, Deno\u00ebl, 2007, p. 775). 6En 1919, il publie \u00e0 Petrograd Pourquoi je soutiens le bolchevisme. 7Sur M. N. Roy, on lira avec bien plus de profit l'\u00e9tude de Jean\u00a0Vigreux, \u00ab\u00a0Manabendra Nath Roy (1887-1954), \u00ab repr\u00e9sentant des Indes britanniques\u00a0\u00bb au Komintern ou la critique de l\u2019imp\u00e9rialisme britannique\u00a0\u00bb,\u00a0Cahiers d\u2019histoire. Revue d\u2019histoire critique, n\u00b0111, 2010, p. 81-95, sur https://chrhc.revues.org/2075 8Taline Ter Minassian, Reginald Teague-Jones. Au service secret de l'Empire britannique, Paris, Grasset & Fasquelle, 2012. Lire le compte rendu de cet ouvrage dans ce dossier.", "onlyNER": true}
returns JSON in which the entity 'socialiste-révolutionnaire' has no type:
NER:
{
"rawName": "socialiste-révolutionnaire",
"offsetStart": 9032,
"offsetEnd": 9034,
"nerd_score": 0.8,
"nerd_selection_score": 0
},
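A small sketch of a sanity check that would flag this entity. It assumes offsets are character offsets with an exclusive end, which is not confirmed here; `check_entity` is a hypothetical helper, not part of the service. Besides the missing type, it also shows that the reported span (9032 to 9034) is much shorter than the rawName.

```python
def check_entity(entity):
    """Return a list of problems found in one entity of the response."""
    problems = []
    if "type" not in entity:
        problems.append("missing type")
    # Assumption: offsetEnd is exclusive, so the span width should equal
    # the number of characters in rawName.
    span = entity["offsetEnd"] - entity["offsetStart"]
    if span != len(entity["rawName"]):
        problems.append(
            f"offset span {span} does not match rawName length {len(entity['rawName'])}"
        )
    return problems


# The entity reported above, copied as-is.
entity = {
    "rawName": "socialiste-révolutionnaire",
    "offsetStart": 9032,
    "offsetEnd": 9034,
    "nerd_score": 0.8,
    "nerd_selection_score": 0,
}

for problem in check_entity(entity):
    print(problem)
```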
as title :-)
Hi Team,
I am trying to use the Editor web page, so I made a change in the web.xml file.
But I got an error like this:
http://localhost:8090/service/NERDCustomisations --404 (Not Found)
Here is the web.xml file:
NERD service - a RESTful service for the (Named) Entity Recognition and Disambiguation
<servlet>
<servlet-name>nerd-service</servlet-name>
<servlet-class>com.sun.jersey.spi.container.servlet.ServletContainer</servlet-class>
<init-param>
<param-name>com.sun.jersey.config.property.resourceConfigClass</param-name>
<param-value>com.sun.jersey.api.core.PackagesResourceConfig</param-value>
</init-param>
<init-param>
<param-name>com.sun.jersey.config.property.packages</param-name>
<param-value>com.scienceminer.nerd.service</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>
<servlet>
<servlet-name>defaultStatic</servlet-name>
<servlet-class>org.eclipse.jetty.servlet.DefaultServlet</servlet-class>
<load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
<servlet-name>defaultStatic</servlet-name>
<url-pattern>/editor.html</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>defaultStatic</servlet-name>
<url-pattern>/resources/*</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>defaultStatic</servlet-name>
<url-pattern>/nerd/editor.js</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/admin</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/language</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/disambiguate</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/segmentation</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/customisations</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/customisation</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/kb/concept</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/kb/term</url-pattern>
</servlet-mapping>
<!--servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/NERDCustomisation/*</url-pattern>
</servlet-mapping-->
<!--servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/createNERDCustomisation/*</url-pattern>
</servlet-mapping-->
<!--servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/extendNERDCustomisation/*</url-pattern>
</servlet-mapping-->
<servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>/service/*</url-pattern>
</servlet-mapping>
<!--servlet-mapping>
<servlet-name>nerd-service</servlet-name>
<url-pattern>nerd/*</url-pattern>
</servlet-mapping-->
<welcome-file-list>
<welcome-file>nerd/editor.html</welcome-file>
<welcome-file>editor.html</welcome-file>
</welcome-file-list>
<!--filter>
<filter-name>cross-origin</filter-name>
<filter-class>org.eclipse.jetty.servlets.CrossOriginFilter</filter-class>
<init-param>
<param-name>allowedOrigins</param-name>
<param-value>*</param-value>
</init-param>
<init-param>
<param-name>allowedMethods</param-name>
<param-value>*</param-value>
</init-param>
<init-param>
<param-name>allowedHeaders</param-name>
<param-value>*</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>cross-origin</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping-->
To test the new language, Italian, the same tests were run against 'http://nerd.huma-num.fr/test/' and against localhost, and the results were compared. In the disambiguation test, the domains returned for certain mentions differ. For instance, the mention 'fiori' has the domains 'Agriculture, Plants' on 'http://nerd.huma-num.fr/test/', but only the domain 'Plants' on localhost.
The way the domains are computed should be re-checked, since the two instances treat them differently.
I have a text that returns different results depending on the instance where it is run.
The text/query is the following:
{
"text": "Suite de la tournée des relations d'avant-guerre. J'ai aperçu mon plombier — il y a une véritable joie à retrouver des relations d'autrefois, après quatre années de coupure et de se sentir à l'unisson sur Pétain. Quand il m'en a parlé j'ai hésité à répondre catégoriquement, pour ne pas les choquer et j'ai dit : c'est un pauvre homme. Quel déchaînement : elle m'a dit c'est ainsi que vous appelez un homme qui nuit à son pays, etc... etc... \n Cette femme, très simple, est vraiment épatante. \n Elle m'explique que depuis le début elle écoute les informations de la radio anglaise et les diffuse dans le quartier. Je leur demande s'ils sont affiliés à une organisation — Oui - Laquelle \"la résistance\" C'sst ici que parle le bon sens et la clairvoyance : au sommet on se bat pour des initiales, à la base on croit en la résistance.\n On y croit avec plus de lucidité que de prétendus experts. \n Cet homme était de droite autrefois ;Il m'explique que parmi les riches il y en a beaucoup qui ne sont pas avec nous, parce qu'ils craignant pour leur gros sous. Ils n'ont d'ailleurs pas renié leur origine, elle me parle de la fierté qu'elle éprouve à retrouver beaucoup de catholiques dans la résistance. \n Nous parlons d'autres voisins du quartiers que sont-ils devenus. Celui—là vous savez c'est un français... et ça veut tout dire. Elle a raison cela veut tout dire — la droits a éclaté au feu de la guerre — il y a d'un côté les Français, plombiers ou hommes de lettres, et de l'autre ceux qui pensent à leurs gros sous...",
"entities": [],
"sentences": [
{
"offsetStart": 0,
"offsetEnd": 49
},
{
"offsetStart": 49,
"offsetEnd": 212
},
{
"offsetStart": 212,
"offsetEnd": 335
},
{
"offsetStart": 335,
"offsetEnd": 434
},
{
"offsetStart": 434,
"offsetEnd": 441
},
{
"offsetStart": 441,
"offsetEnd": 492
},
{
"offsetStart": 492,
"offsetEnd": 613
},
{
"offsetStart": 613,
"offsetEnd": 831
},
{
"offsetStart": 831,
"offsetEnd": 891
},
{
"offsetStart": 891,
"offsetEnd": 1055
},
{
"offsetStart": 1055,
"offsetEnd": 1199
},
{
"offsetStart": 1199,
"offsetEnd": 1266
},
{
"offsetStart": 1266,
"offsetEnd": 1307
},
{
"offsetStart": 1307,
"offsetEnd": 1329
},
{
"offsetStart": 1329,
"offsetEnd": 1521
}
],
"processSentence":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
"onlyNER": false,
"resultLanguages": [
"de",
"fr"
],
"nbest": false,
"customisation": "generic"
}
In both Huma-num and science-miner, Pétain is returned only when the text is submitted without sentences and processSentence.
When processSentence/sentences is provided, Pétain is not recognised anymore, even when we are processing all the sentences.
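One thing worth checking before digging further: the query above lists 15 sentences (valid indices 0 to 14), while processSentence runs from 0 to 15. Whether this is related to the missing mention is not established. The sketch below is a hypothetical client-side check (`check_sentences` is not part of the service) that validates the sentence spans and the processSentence indices of a query, assuming offsets are character offsets with an exclusive end.

```python
def check_sentences(query):
    """Return a list of inconsistencies between text, sentences and processSentence."""
    problems = []
    text_len = len(query["text"])
    sentences = query.get("sentences", [])
    previous_end = 0
    for i, sentence in enumerate(sentences):
        start, end = sentence["offsetStart"], sentence["offsetEnd"]
        # Each span must lie inside the text and be non-empty.
        if not (0 <= start < end <= text_len):
            problems.append(
                f"sentence {i}: span [{start}, {end}) outside text of length {text_len}"
            )
        # Spans are expected in order, without overlap.
        if start < previous_end:
            problems.append(f"sentence {i}: overlaps previous sentence")
        previous_end = end
    # Every index in processSentence must point at an existing sentence.
    for index in query.get("processSentence", []):
        if not (0 <= index < len(sentences)):
            problems.append(f"processSentence index {index} has no matching sentence")
    return problems


# Toy query (made up for illustration) with the same kind of mismatch:
# two sentences, but three indices in processSentence.
toy_query = {
    "text": "One. Two.",
    "sentences": [
        {"offsetStart": 0, "offsetEnd": 4},
        {"offsetStart": 4, "offsetEnd": 9},
    ],
    "processSentence": [0, 1, 2],
}

for problem in check_sentences(toy_query):
    print(problem)
```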
In the current version, evaluation with the NEDCorpusEvaluation class is only possible for English. To evaluate several languages, a language recognition step is needed as a prerequisite to the evaluation step.
To test the disambiguation process, several kinds of queries were tried.
In the test cases on PDF files, the service showed strange behavior: it gave different results even for the same query.
The following test cases were run on a PDF file with the same query template:
2009.Infiniti.pdf
{
"mentions": [
"ner",
"wikipedia"
],
"nbest": false,
"customisation": "generic"
}
The service gave different results. For instance, the mention 'Francesco Speranza' can be fully recognized as 'Francesco Speranza', partially recognized as 'Speranza', or not recognized at all.
Below are some screenshots of the results.
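To make this kind of instability easier to report, the mentions returned by repeated runs of the same query can be compared. A minimal sketch, assuming each run has already been reduced to a list of mention strings; `unstable_mentions` is a hypothetical helper and the third run's "stable mention" entry is a made-up placeholder.

```python
def unstable_mentions(runs):
    """Return the mentions that appear in some runs but not in all of them."""
    mention_sets = [set(run) for run in runs]
    if not mention_sets:
        return set()
    seen_somewhere = set.union(*mention_sets)
    seen_everywhere = set.intersection(*mention_sets)
    return seen_somewhere - seen_everywhere


# Three runs of the same query: full match, partial match, no match.
runs = [
    ["Francesco Speranza", "stable mention"],
    ["Speranza", "stable mention"],
    ["stable mention"],
]

print(sorted(unstable_mentions(runs)))
```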
The NERD service on my local machine (localhost:8090) no longer works for Italian and Spanish, although it works properly for English.
The latest version of branch 0.0.3 (16 January 2018) rebuilt successfully.
However, the service does not work for Italian and Spanish, only for English. The log shows that the language is recognized, but no mention is produced.
This issue is related to issue #15.
Mac OSX:
wikidata: 37413613 concepts.
en: 14899737 pages.
de: 3579552 pages.
fr: 3681264 pages.
es: 3322291 pages.
it: 2291751 pages.
Linux:
wikidata: 9155139 concepts.
en: 14651883 pages.
de: 3523959 pages.
fr: 3631810 pages.
es: 3322291 pages.
it: 2291751 pages.
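The gap between the two platforms is easier to see as a per-key difference. A small sketch using the figures listed above; it assumes the Spanish count on Mac OSX is the 3322291 line, and `diff_counts` is a hypothetical helper.

```python
# Counts copied from the two listings above.
# Assumption: the Mac OSX line with 3322291 pages is the Spanish ("es") count.
mac = {
    "wikidata": 37413613,
    "en": 14899737,
    "de": 3579552,
    "fr": 3681264,
    "es": 3322291,
    "it": 2291751,
}
linux = {
    "wikidata": 9155139,
    "en": 14651883,
    "de": 3523959,
    "fr": 3631810,
    "es": 3322291,
    "it": 2291751,
}


def diff_counts(a, b):
    """Return {key: a - b} for every key whose counts differ."""
    keys = sorted(set(a) | set(b))
    return {k: a.get(k, 0) - b.get(k, 0) for k in keys if a.get(k) != b.get(k)}


for key, delta in diff_counts(mac, linux).items():
    print(f"{key}: Mac OSX has {delta} more entries than Linux")
```

Under that assumption, only es and it match; wikidata, en, de and fr are all larger on Mac OSX.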
... or even to bootstrap 4 if it moves from beta to final release.
(we still use bootstrap 1 here, oh shame! only anHALytics front-end has been updated to bootstrap 3)
I think this would solve the problems indicated in #3.
In the section related to "Entities" here, the example is missing:
{
"text": "Austria invaded and fought the Serbian army at the Battle of Cer and Battle of Kolubara beginning on 12 August.",
"language": {
"lang": "en"
},
"entities": [
{}
]
}