bibliome / alvisnlp
ALvisNLP corpus processing engine
Home Page: https://bibliome.github.io/alvisnlp/
License: Apache License 2.0
PubAnnotation delegates to RunAnnotation, whereas they could both inherit from a common abstract class.
The Maven repositories and dependency for the registry API are listed below. The Java classes generated from the OMTD-SHARE metadata are in the eu.openminted.registry.domain package.
<dependency>
<groupId>eu.openminted</groupId>
<artifactId>omtd-registry-api</artifactId>
</dependency>
<repository>
<id>omtd-releases</id>
<layout>default</layout>
<url>https://repo.openminted.eu/content/repositories/releases</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
<repository>
<id>omtd-snapshots</id>
<layout>default</layout>
<url>https://repo.openminted.eu/content/repositories/snapshots</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
The <caseInsensitive> option produces "random" results
We are trying to normalize this file:
Input file
AP1
AP2
CAL
FWA
FT
AP3
PI
AG
UFO
CO
Co
LD
GA1
Result:
<caseInsensitive> disabled:
AP1 AT1G69120
AP2 AT4G36920
CAL AT1G26310
FWA AT4G25530
FT AT1G65480
AP3 AT3G54340
PI AT5G20240
AG AT4G18960
UFO AT1G30950
CO AT5G15840
Co
LD AT4G02560
GA1 AT4G02780
<caseInsensitive> enabled:
AP1 AT1G69120
AP2 AT4G36920
CAL AT1G26310
FWA AT4G25530
FT
AP3 AT3G54340
PI AT5G20240
AG AT4G18960
UFO AT1G30950
CO
Co
LD AT4G02560
GA1 AT4G02780
To test:
/bibdev/install/alvisnlp/devel/bin/alvisnlp -inputDir /bibdev/travail/arabidopsis/alvisir2_devel -log plan/alvisnlp.log normalize_genes.plan
There is an error when trying to run AlvisNLP with new versions of Stanford NER (>3.4)
Example with stanford-ner-2016-10-31:
Loading classifier from /bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz ... Loading distsim lexicon from /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters ... [2017-07-06 15:22:47.000][alvisnlp] SEVERE java.io.FileNotFoundException: /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters (Aucun fichier ou dossier de ce type)
Error obtained when running: /bibdev/install/alvisnlp/devel/bin/alvisnlp -verbose -log err.log /bibdev/travail/OpenMinted/UseCases/Wheat/uc-tdm-AS-D/plans/test-stanford.plan
However, there is no problem running Stanford NER as a standalone tool:
$ java -mx600m -cp /bibdev/sources/stanford/stanford-ner-2016-10-31/stanford-ner.jar:/bibdev/sources/stanford/stanford-ner-2016-10-31/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz -textFile /bibdev/sources/stanford/stanford-ner-2016-10-31/sample.txt
Invoked on Thu Jul 06 15:17:01 CEST 2017 with arguments: -loadClassifier /bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz -textFile /bibdev/sources/stanford/stanford-ner-2016-10-31/sample.txt
loadClassifier=/bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz
textFile=/bibdev/sources/stanford/stanford-ner-2016-10-31/sample.txt
Loading classifier from /bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz ... done [1,3 sec].
The/O fate/O of/O Lehman/ORGANIZATION Brothers/ORGANIZATION ,/O the/O beleaguered/O investment/O bank/O ,/O hung/O in/O the/O balance/O on/O Sunday/O as/O Federal/ORGANIZATION Reserve/ORGANIZATION officials/O and/O the/O leaders/O of/O major/O financial/O institutions/O continued/O to/O gather/O in/O emergency/O meetings/O trying/O to/O complete/O a/O plan/O to/O rescue/O the/O stricken/O bank/O ./O
Several/O possible/O plans/O emerged/O from/O the/O talks/O ,/O held/O at/O the/O Federal/ORGANIZATION Reserve/ORGANIZATION Bank/ORGANIZATION of/ORGANIZATION New/ORGANIZATION York/ORGANIZATION and/O led/O by/O Timothy/PERSON R./PERSON Geithner/PERSON ,/O the/O president/O of/O the/O New/ORGANIZATION York/ORGANIZATION Fed/ORGANIZATION ,/O and/O Treasury/ORGANIZATION Secretary/O Henry/PERSON M./PERSON Paulson/PERSON Jr./PERSON ./O
Serialize the data structure as UIMA XMI, using a fixed typesystem.
PubAnnotation cannot access a laptop deployment.
bibdev
bibliome.org belongs to BioFoundation, so the package naming org.bibliome.… may conflict with other packages.
We probably don't care as the contents date back to 2011, 2012.
This is so common:
<files>$</files>
<fileName>"somefile.txt"</fileName>
that we could use a single parameter, like this:
<corpusFile>somefile.txt</corpusFile>
Currently only accepts GET.
Export annotations as PubAnnotation JSON.
Create an ALvisNLP module that exports PubAnnotation JSON.
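A minimal sketch of what such an export could produce, assuming the basic PubAnnotation JSON layout: a `text` field plus a `denotations` array whose entries carry `id`, `span` (`begin`, `end`) and `obj`. The names `PubAnnotationJsonSketch` and `Denotation` are hypothetical and not part of AlvisNLP. A real module would read annotations from the corpus data structure and would probably rely on a JSON library rather than string building.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not AlvisNLP code: serializes a text and its
// annotations into the basic PubAnnotation JSON layout.
final class PubAnnotationJsonSketch {
    // One annotated span: character offsets into the text plus a label.
    static final class Denotation {
        final String id;
        final int begin;
        final int end;
        final String obj;

        Denotation(String id, int begin, int end, String obj) {
            this.id = id;
            this.begin = begin;
            this.end = end;
            this.obj = obj;
        }
    }

    // Escapes the characters that must not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-digit hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds the JSON document: the text followed by its denotations.
    static String toJson(String text, List<Denotation> denotations) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"text\":\"").append(escape(text)).append("\",\"denotations\":[");
        for (int i = 0; i < denotations.size(); i++) {
            Denotation d = denotations.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":\"").append(escape(d.id))
              .append("\",\"span\":{\"begin\":").append(d.begin)
              .append(",\"end\":").append(d.end)
              .append("},\"obj\":\"").append(escape(d.obj)).append("\"}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        List<Denotation> denotations = new ArrayList<>();
        denotations.add(new Denotation("T1", 0, 3, "Gene")); // made-up annotation
        System.out.println(toJson("AP1 is a gene.", denotations));
    }
}
```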
Problems:
TrainingElementClassifier and TaggingElementClassifier should be able to accept a Kernel instead of a RelationDefinition, if the classifying algorithm comes from LibSVM.
Could you please add a release for alvisnlp at https://github.com/Bibliome/alvisnlp/releases?
There is a problem with the current TreeTagger installation when tagging French, using
<treeTaggerExecutable>..../install/tree-tagger-3.2/bin/tree-tagger</treeTaggerExecutable>
and
<parFile>.../install/tree-tagger-3.2/lib/french-utf8.par</parFile>
--
The problem is resolved by using
<treeTaggerExecutable>tree-tagger-linux-3.2.1/bin/tree-tagger</treeTaggerExecutable>
together with the french.par parameter file:
<parFile>french.par</parFile>
From aa67372, wiki pages will be migrated to the site:
The subject parameter in *Projector modules:
<subject layer="words" feature="form,lemma"/>
Try to match either feature, form or lemma.
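The intended behaviour can be sketched as follows, assuming a dictionary keyed by strings and an element whose features are held in a map. `SubjectFeatureSketch`, `project` and the dictionary entry are hypothetical names and data for illustration, not the actual *Projector implementation. The features are tried in the order they are listed, and the first one whose value is a dictionary key wins.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not AlvisNLP code: looks an element up in a dictionary
// by trying several of its features in turn.
final class SubjectFeatureSketch {
    // Returns the entry for the first listed feature whose value is a key
    // of the dictionary, or an empty Optional if none of them matches.
    static Optional<String> project(Map<String, String> dictionary,
                                    Map<String, String> elementFeatures,
                                    List<String> featureNames) {
        for (String name : featureNames) {
            String value = elementFeatures.get(name);
            if (value != null && dictionary.containsKey(value)) {
                return Optional.of(dictionary.get(value));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("gene", "concept-gene"); // made-up entry

        Map<String, String> word = new HashMap<>();
        word.put("form", "genes");
        word.put("lemma", "gene");

        // "genes" is not a key, so the lemma "gene" provides the match.
        System.out.println(project(dictionary, word, Arrays.asList("form", "lemma")));
    }
}
```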
Using -outputDir and -inputDir with ToMap generates an error: the program can't find the yatea file that was generated in the outputDir, even when the outputDir is also specified as an inputDir.
Example on Migale:
/projet/mig/work/textemig/software/install/alvisnlp/bin/alvisnlp -cleanTmp -verbose -log corpus/test/batch/0000/alvisnlp.log -inputDir corpus/test/batch/0000/ -inputDir corpus/test/batch/0000/output -outputDir corpus/test/batch/0000/output plans/test-output.plan
Generates the following error:
[2017-11-27 10:52:10.182][entities.term-extraction.yatea] done in 586 ms
[2017-11-27 10:52:10.182][entities.term-extraction] done in 616 ms
[2017-11-27 10:52:10.182][entities.tomap] processing
[2017-11-27 10:52:10.352][alvisnlp] SEVERE org.xml.sax.SAXParseException; Premature end of file.
NB: this only happens the first time you run the command. If you try a second time (= the yatea output already exists), then it works.
Provide a synopsis for each module, library, conversion.
Providing a packaged Docker image would make AlvisNLP easy to install.
Create the object that contains the list of sentences in the createTheSentences(...) method of the class in Corpus2InteractionXML.java
See Example of Sentence here
lemmaKeys, caseInsensitive and ignoreDiacritics control both projection and attribution. Make separate options for attribution.
Add an option to match surface form OR lemma.
We need an empty end-point for AlvisNLP--PubAnnotation:
PubAnnotation asynchronous client isn't very stable.
Use richer JSON for PubAnnotation, specified here
Cannot be put on GitHub because of data size and license.
Annotators (online automatic annotation services) can now be registered on PubAnnotation:
http://pubannotation.org/annotators
The API is documented here:
Modules accept input file paths as parameters, and some of these files can be quite large (dictionaries).
These files could be compressed. AlvisNLP should be able to transparently open plain or compressed files.
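One possible approach, sketched here under the assumption that only gzip compression needs to be recognized, is to peek at the first two bytes of the stream and wrap it in a `GZIPInputStream` when they match the gzip magic number (0x1f 0x8b). The class `TransparentInputSketch` and its methods are hypothetical names, not existing AlvisNLP code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not AlvisNLP code.
final class TransparentInputSketch {
    // Wraps the stream in a GZIPInputStream if it starts with the gzip magic
    // bytes (0x1f 0x8b); otherwise returns the buffered stream unchanged.
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset();
        boolean gzip = b1 == 0x1f && b2 == 0x8b;
        return gzip ? new GZIPInputStream(in) : in;
    }

    // Reads the whole stream as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Demo helper: optionally gzips the content in memory, then reads it back
    // through open(), so both code paths can be exercised without files.
    static String roundTrip(String content, boolean compress) {
        try {
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            if (compress) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
                    gz.write(bytes);
                }
                bytes = buffer.toByteArray();
            }
            try (InputStream in = open(new ByteArrayInputStream(bytes))) {
                return readAll(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String entry = "AP1\tAT1G69120\n";
        System.out.print(roundTrip(entry, false)); // plain input
        System.out.print(roundTrip(entry, true));  // gzip-compressed input
    }
}
```

For a file on disk, the same `open` method could be given `Files.newInputStream(path)`. Other formats such as bzip2 or xz would need their own detection and a third-party decompression library.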
See if parameters can be passed to the plan through PubAnnotation.
I'm not sure this is the right place, but anyway (cc @mandiayba)
In my pre-processing steps before AlvisNLP, I convert my resources, one of which is in .xls format, to .txt (see https://github.com/openminted/uc-tdm-AS-E/blob/master/Execution_resources.sh).
For this I use "ssconvert", the program included in gnumeric.
I don't know whether it should be integrated into AlvisNLP or not (in case it could be useful). Docker images exist, but I don't know which version they contain, and I'm not sure how to use it if we reuse a Docker image.
In [[AlvisNLP-ML-data-model]], the diagram has a small mistake. Please insert the source in the wiki repo.
The test compilation in the test-alvisnlp.sh script should be done with an empty Maven local repo.
This would avoid problems like #39.
Add a space normalization option for the concat() and inline() XSLT extension functions provided by XMLReader2.
Difficulty: keep track of character offsets.
Workaround: MergeSections
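The offset-tracking difficulty can be illustrated with a small sketch: while whitespace runs are collapsed, an array records, for each character of the normalized text, its index in the original text. This assumes that normalization means trimming and collapsing runs of whitespace into a single space. `SpaceNormalizationSketch` is a hypothetical name and this is not how XMLReader2 is implemented.

```java
import java.util.Arrays;

// Hypothetical sketch, not the XMLReader2 implementation.
final class SpaceNormalizationSketch {
    final String text;   // normalized text
    final int[] offsets; // offsets[i] is the index in the original of text.charAt(i)

    private SpaceNormalizationSketch(String text, int[] offsets) {
        this.text = text;
        this.offsets = offsets;
    }

    // Trims the input and collapses each whitespace run into one space.
    // The emitted space is mapped to the first whitespace of its run.
    static SpaceNormalizationSketch normalize(String original) {
        StringBuilder sb = new StringBuilder();
        int[] map = new int[original.length()];
        boolean pendingSpace = false;
        int pendingOffset = -1;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (Character.isWhitespace(c)) {
                // Leading whitespace is dropped; otherwise remember the run start.
                if (!pendingSpace && sb.length() > 0) {
                    pendingSpace = true;
                    pendingOffset = i;
                }
                continue;
            }
            if (pendingSpace) {
                map[sb.length()] = pendingOffset;
                sb.append(' ');
                pendingSpace = false;
            }
            map[sb.length()] = i;
            sb.append(c);
        }
        // A trailing whitespace run is never emitted, which trims the end.
        return new SpaceNormalizationSketch(sb.toString(), Arrays.copyOf(map, sb.length()));
    }

    public static void main(String[] args) {
        SpaceNormalizationSketch n = normalize("  AP1 \n\t AT1G69120  ");
        System.out.println(n.text);
        System.out.println(Arrays.toString(n.offsets));
    }
}
```

With such a map, an annotation found at positions `[b, e)` of the normalized text can be reported at `offsets[b]` to `offsets[e - 1] + 1` in the original.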
Currently the PubAnnotation endpoint assumes a parameter read of a TextFileReader module.
In the module documentation (e.g. Tabular export), the links to the type documentation lead to a 404 page (from GlassFish).
Example:
fileName
Mandatory
Type: Expression
Create a JSONExport module (#3), and inject it in the plan.
The Shell and Shell2 modules should support completion for:
/bibdev/travail/arabidopsis/alvisir2_devel/plan/entities-test-RB.plan
Tees has no documentation, or at least invoking alvisnlp -moduleDoc Tees returns the error
Exception in thread "main" org.bibliome.util.clio.CLIOException: java.lang.reflect.InvocationTargetException
at org.bibliome.util.clio.CLIOParser.processOption(CLIOParser.java:154)
at org.bibliome.util.clio.CLIOParser.parse(CLIOParser.java:116)
at alvisnlp.app.cli.AbstractAlvisNLP.run(AbstractAlvisNLP.java:1045)
at alvisnlp.app.cli.AlvisNLP.main(AlvisNLP.java:85)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.bibliome.util.clio.CLIOParser.processOption(CLIOParser.java:145)
... 3 more
Caused by: org.bibliome.util.service.UnsupportedServiceException: alias: Tees
at org.bibliome.util.service.CompositeServiceFactory.getServiceByAlias(CompositeServiceFactory.java:129)
at alvisnlp.app.cli.AbstractAlvisNLP.getModuleDocumentation(AbstractAlvisNLP.java:384)
at alvisnlp.app.cli.AbstractAlvisNLP.moduleDoc(AbstractAlvisNLP.java:447)
... 8 more
Hello,
While trying to follow the instructions, at the second step (mvn clean install), the installation fails for AlvisNLP Bibliome Module Factory. Here is what the console shows at the end:
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building AlvisNLP Bibliome Module Factory 0.5rc-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for fr.jouy.inra.maiage.bibliome:alvisdb-core:jar:0.1-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] AlvisNLP/ML ........................................ SUCCESS [ 0.798 s]
[INFO] AlvisNLP Core ...................................... SUCCESS [ 6.976 s]
[INFO] AlvisNLP Bibliome Module Factory ................... FAILURE [ 0.878 s]
[INFO] alvisnlp-rest ...................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.957 s
[INFO] Finished at: 2017-11-22T10:46:45+01:00
[INFO] Final Memory: 27M/78M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project alvisnlp-bibliome: Could not resolve dependencies for project fr.jouy.inra.maiage.bibliome:alvisnlp-bibliome:jar:0.5rc-SNAPSHOT: Failure to find fr.jouy.inra.maiage.bibliome:alvisdb-core:jar:0.1-SNAPSHOT in http://bibliome.jouy.inra.fr/maven-repository was cached in the local repository, resolution will not be reattempted until the update interval of bibliome has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :alvisnlp-bibliome
Can you help me solve this problem?
alvisnlp.corpus.dump
Need a test for TEES. Otherwise merging tees-code-review will be risky...
Make sure the PubAnnotation endpoint exports the JSON.
Instantiate a PubAnnotationExport (#3) and inject it at the end of the plan.
Tested on Windows 10 running in a VirtualBox machine.
Set the JAVA_HOME environment variable to the JDK directory, usually something like C:\Program Files\jdk1.8.0_XXX, where XXX is the update version of the JDK. You can set environment variables through the control panel.
… Program Files.
Set the Path environment variable to %Path%;C:\sensibleplace\apache-maven-3.5.2\bin.
… cmd.
git clone https://github.com/Bibliome/alvisnlp.git
cd alvisnlp
mvn clean install
@ArnaudFerre: could you try this in a native Windows machine?
ExportCadixeJSON -> AlvisAEExport
XMLReader2 is awfully long. Since its usage is universal, it should be improved.
Might need to switch from Xalan to Saxon.