
alvisnlp's People

Contributors

arnaudferre, jibe-b, ldeleger, mandiayba, rbossy


alvisnlp's Issues

Support OMTD-SHARE metadata directly in AlvisNLP

Maven repositories and dependency for the registry API: the Java classes generated from the OMTD-SHARE metadata are available there, in the eu.openminted.registry.domain package.

  • the dependency:
<dependency>
  <groupId>eu.openminted</groupId>
  <artifactId>omtd-registry-api</artifactId>
</dependency>
  • the dependency is available from these two repositories:
<repository>
  <id>omtd-releases</id>
  <layout>default</layout>
  <url>https://repo.openminted.eu/content/repositories/releases</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
<repository>
  <id>omtd-snapshots</id>
  <layout>default</layout>
  <url>https://repo.openminted.eu/content/repositories/snapshots</url>
  <releases>
    <enabled>false</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>

CaseInsensitive problem in Tabular Projector

The option produces "random" results.

We are trying to normalize this file:

Input file

AP1
AP2
CAL
FWA
FT
AP3
PI
AG
UFO
CO
Co
LD
GA1

Result:
<caseInsensitive> disabled:

AP1 AT1G69120
AP2 AT4G36920
CAL AT1G26310
FWA AT4G25530
FT AT1G65480
AP3 AT3G54340
PI AT5G20240
AG AT4G18960
UFO AT1G30950
CO AT5G15840
Co

LD AT4G02560
GA1 AT4G02780

<caseInsensitive> enabled:

AP1 AT1G69120
AP2 AT4G36920
CAL AT1G26310
FWA AT4G25530
FT
AP3 AT3G54340
PI AT5G20240
AG AT4G18960
UFO AT1G30950
CO
Co

LD AT4G02560
GA1 AT4G02780

To test:
/bibdev/install/alvisnlp/devel/bin/alvisnlp -inputDir /bibdev/travail/arabidopsis/alvisir2_devel -log plan/alvisnlp.log normalize_genes.plan
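
For comparison, the behaviour one would expect from a case-insensitive lookup can be sketched as follows. This is only an illustration, not the actual Tabular Projector code: the class `CaseInsensitiveLookup` and its methods are hypothetical, and the two entries are copied from the results listed in this issue. Keys are normalized once when an entry is stored and again at query time, so `FT`, `CO` and `Co` would all be expected to resolve.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of AlvisNLP: shows the expected behaviour
// of a dictionary lookup that ignores case.
final class CaseInsensitiveLookup {
    private final Map<String, String> entries = new HashMap<>();

    // Use a fixed locale so the result does not depend on the JVM default.
    private static String normalize(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    void put(String key, String value) {
        entries.put(normalize(key), value);
    }

    // Returns the mapped identifier, or null when the key is unknown.
    String get(String key) {
        return entries.get(normalize(key));
    }

    public static void main(String[] args) {
        CaseInsensitiveLookup dict = new CaseInsensitiveLookup();
        dict.put("FT", "AT1G65480");
        dict.put("CO", "AT5G15840");
        for (String name : new String[] {"FT", "CO", "Co"}) {
            System.out.println(name + " " + dict.get(name));
        }
    }
}
```

With such a lookup, enabling case insensitivity can only add matches (here `Co`); it should never drop the matches for `FT` or `CO` that are found when the option is disabled.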

Problem with AlvisNLP and Stanford NER new versions (>3.4)

An error occurs when running AlvisNLP with newer versions of Stanford NER (>3.4).
Example with stanford-ner-2016-10-31:

Loading classifier from /bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz ...
Loading distsim lexicon from /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters ...
[2017-07-06 15:22:47.000][alvisnlp] SEVERE java.io.FileNotFoundException: /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters (Aucun fichier ou dossier de ce type)

Error obtained when running: /bibdev/install/alvisnlp/devel/bin/alvisnlp -verbose -log err.log /bibdev/travail/OpenMinted/UseCases/Wheat/uc-tdm-AS-D/plans/test-stanford.plan

However, there is no problem running Stanford NER as a standalone tool:

$ java -mx600m -cp /bibdev/sources/stanford/stanford-ner-2016-10-31/stanford-ner.jar:/bibdev/sources/stanford/stanford-ner-2016-10-31/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz -textFile /bibdev/sources/stanford/stanford-ner-2016-10-31/sample.txt
Invoked on Thu Jul 06 15:17:01 CEST 2017 with arguments: -loadClassifier /bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz -textFile /bibdev/sources/stanford/stanford-ner-2016-10-31/sample.txt
loadClassifier=/bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz
textFile=/bibdev/sources/stanford/stanford-ner-2016-10-31/sample.txt
Loading classifier from /bibdev/sources/stanford/stanford-ner-2016-10-31/classifiers/english.all.3class.distsim.crf.ser.gz ... done [1,3 sec].
The/O fate/O of/O Lehman/ORGANIZATION Brothers/ORGANIZATION ,/O the/O beleaguered/O investment/O bank/O ,/O hung/O in/O the/O balance/O on/O Sunday/O as/O Federal/ORGANIZATION Reserve/ORGANIZATION officials/O and/O the/O leaders/O of/O major/O financial/O institutions/O continued/O to/O gather/O in/O emergency/O meetings/O trying/O to/O complete/O a/O plan/O to/O rescue/O the/O stricken/O bank/O ./O 
Several/O possible/O plans/O emerged/O from/O the/O talks/O ,/O held/O at/O the/O Federal/ORGANIZATION Reserve/ORGANIZATION Bank/ORGANIZATION of/ORGANIZATION New/ORGANIZATION York/ORGANIZATION and/O led/O by/O Timothy/PERSON R./PERSON Geithner/PERSON ,/O the/O president/O of/O the/O New/ORGANIZATION York/ORGANIZATION Fed/ORGANIZATION ,/O and/O Treasury/ORGANIZATION Secretary/O Henry/PERSON M./PERSON Paulson/PERSON Jr./PERSON ./O

XMI Serialization

Serialize the data structure as UIMA XMI, using a fixed typesystem.

Deploy demo version of endpoint

PubAnnotation cannot access a deployment running on a laptop.

  • deploy on bibdev instead
  • loading the taxon dictionary ends with an OutOfMemory exception; dictionary serialization is needed

Export annotations as PubAnnotation JSON.

Create an AlvisNLP module that exports PubAnnotation JSON.

Problems:

  • In an AlvisNLP/ML workflow, not all annotations should be exposed/exported. How to select the interesting layers?
  • Choose the right feature in an annotation to export as obj
  • Choose the right feature in a tuple to export as pred
  • Choose the right roles in a tuple to export as subj and obj
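
To make the target concrete, the sketch below builds a PubAnnotation-style JSON document with `text`, `denotations` (each with `id`, `span` and `obj`) and `relations` (each with `id`, `pred`, `subj` and `obj`). It is not an AlvisNLP module: the class `PubAnnotationSketch`, its methods and the sample values are hypothetical, and it assumes the layers, features and roles to export have already been chosen, which is exactly what the open questions above are about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not an AlvisNLP module: serializes already selected
// annotations and tuples as a PubAnnotation-style JSON document.
final class PubAnnotationSketch {
    private final String text;
    private final List<String> denotations = new ArrayList<>();
    private final List<String> relations = new ArrayList<>();

    PubAnnotationSketch(String text) {
        this.text = text;
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // One annotation: its character span and the feature value exported as "obj".
    void addDenotation(String id, int begin, int end, String obj) {
        denotations.add("{\"id\":" + quote(id)
            + ",\"span\":{\"begin\":" + begin + ",\"end\":" + end + "}"
            + ",\"obj\":" + quote(obj) + "}");
    }

    // One tuple: the feature exported as "pred" and the ids of the two
    // arguments exported as "subj" and "obj".
    void addRelation(String id, String pred, String subjId, String objId) {
        relations.add("{\"id\":" + quote(id)
            + ",\"pred\":" + quote(pred)
            + ",\"subj\":" + quote(subjId)
            + ",\"obj\":" + quote(objId) + "}");
    }

    String toJson() {
        return "{\"text\":" + quote(text)
            + ",\"denotations\":[" + String.join(",", denotations) + "]"
            + ",\"relations\":[" + String.join(",", relations) + "]}";
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        PubAnnotationSketch doc = new PubAnnotationSketch("FT and CO");
        doc.addDenotation("T1", 0, 2, "Gene");
        doc.addDenotation("T2", 7, 9, "Gene");
        doc.addRelation("R1", "coMention", "T1", "T2");
        System.out.println(doc.toJson());
    }
}
```

A real module would more likely rely on a JSON library than on string concatenation; the point here is only the mapping from annotations and tuples to `denotations` and `relations`.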

Kernel in classifiers

TrainingElementClassifier and TaggingElementClassifier should be able to accept a Kernel instead of a RelationDefinition, if the classifying algorithm comes from LibSVM.

Add resources to PubAnnotation

  • Add documents to PubAnnotation: about 1 million PMIDs from PubMed
  • Add a dictionary to PubDictionaries: about 4 million dictionary entries from NCBI

TreeTagger for french

There is a problem with the current TreeTagger installation when it is used for French, with this executable
(<treeTaggerExecutable>..../install/tree-tagger-3.2/bin/tree-tagger</treeTaggerExecutable>)
and this parameter file
(<parFile>.../install/tree-tagger-3.2/lib/french-utf8.par</parFile>)

--
The problem is resolved by using this executable:
<treeTaggerExecutable>tree-tagger-linux-3.2.1/bin/tree-tagger</treeTaggerExecutable>

together with the french.par parameter file:
<parFile>french.par</parFile>
With these settings it works.

Can't use -outputDir and -inputDir with ToMap (unless Yatea file already exists)

Using -outputDir and -inputDir with ToMap generates an error: the program cannot find the Yatea file that was generated in the output directory, even when the output directory is also given as an input directory.

Example on Migale:
/projet/mig/work/textemig/software/install/alvisnlp/bin/alvisnlp -cleanTmp -verbose -log corpus/test/batch/0000/alvisnlp.log -inputDir corpus/test/batch/0000/ -inputDir corpus/test/batch/0000/output -outputDir corpus/test/batch/0000/output plans/test-output.plan
Generates the following error:

[2017-11-27 10:52:10.182][entities.term-extraction.yatea] done in 586 ms
[2017-11-27 10:52:10.182][entities.term-extraction] done in 616 ms
[2017-11-27 10:52:10.182][entities.tomap] processing
[2017-11-27 10:52:10.352][alvisnlp] SEVERE org.xml.sax.SAXParseException; Premature end of file.

NB: this only happens the first time you run the command. If you try a second time (= the yatea output already exists), then it works.

Docker dist

Providing a packaged Docker image would make AlvisNLP easy to install.

TomapProjector attribution match options

  • lemmaKeys, caseInsensitive and ignoreDiacritics control both projection and attribution. Make separate options for attribution

  • add option to match surface form OR lemma

alvisnlp-rest: PubAnnotation endpoint

We need an empty end-point for AlvisNLP--PubAnnotation:

  • One URL for each plan?
  • Accept the text parameter for direct input
  • Accept the sourcedb and sourceid parameters; the endpoint should then download the text by calling back pubannotation.org

Prepare demo for BLAH3 wrap up

  1. show AlvisNLP/ML service
  2. show exposed plans
  3. show API doc, esp PubAnnotation endpoint
  4. choose one or two cool documents to annotate
  5. show annotation process
  6. what next

XLSProjector module (was: .xls to .txt)

I am not sure this is the right place, but anyway (cc @mandiayba)

In my preprocessing steps before AlvisNLP, I convert my resources, one of which is in .xls format, to .txt (see https://github.com/openminted/uc-tdm-AS-E/blob/master/Execution_resources.sh).

For this I use "ssconvert", the program included in Gnumeric.

I do not know whether it should be integrated into AlvisNLP (in case it could be useful). Docker images exist, but I do not know which version they provide, and I am not sure how to use it if we reuse a Docker image.

Normalize space in XMLReader2 XSLT functions

Add a space normalization option for concat() and inline() XSLT extension functions provided by XMLReader2.

Difficulty: keep track of character offsets.

Workaround: MergeSections
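
The offset bookkeeping mentioned as the difficulty can be sketched as below. This is a minimal illustration, not XMLReader2 code: the class `SpaceNormalizer` is hypothetical, and it assumes that "normalization" means trimming leading and trailing whitespace and collapsing each internal run to a single space. For every character of the normalized string it records the index of the original character it came from; a collapsed run maps to the first whitespace character of that run.

```java
import java.util.Arrays;

// Hypothetical sketch, not XMLReader2 code: whitespace normalization that
// remembers where each normalized character came from in the original string.
final class SpaceNormalizer {
    final String normalized;
    // offsets[i] is the index in the original string of normalized character i.
    private final int[] offsets;

    SpaceNormalizer(String original) {
        StringBuilder sb = new StringBuilder();
        int[] map = new int[original.length()];
        boolean inRun = false; // true while scanning a whitespace run
        int runStart = -1;     // original index of the first character of that run
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (Character.isWhitespace(c)) {
                if (!inRun) {
                    inRun = true;
                    runStart = i;
                }
                continue;
            }
            // Emit one space for an internal run; leading runs are dropped.
            if (inRun && sb.length() > 0) {
                map[sb.length()] = runStart;
                sb.append(' ');
            }
            inRun = false;
            map[sb.length()] = i;
            sb.append(c);
        }
        // A trailing run is never emitted because no character follows it.
        normalized = sb.toString();
        offsets = Arrays.copyOf(map, sb.length());
    }

    // Maps an offset in the normalized string back to the original string.
    int originalOffset(int normalizedOffset) {
        return offsets[normalizedOffset];
    }

    public static void main(String[] args) {
        SpaceNormalizer n = new SpaceNormalizer("  AP1 \t\n AP2  ");
        System.out.println("[" + n.normalized + "]");
        System.out.println(n.originalOffset(4));
    }
}
```

Note that `Character.isWhitespace` covers more characters than the space, tab, carriage return and line feed handled by XPath's `normalize-space()`; which set to use would be a design decision for the option.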

Shell completion

The Shell and Shell2 modules should support completion for:

  • keywords
  • layer names
  • feature names
  • document ids, section names and relation names
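
One possible shape for the completion logic is a prefix lookup over a sorted set of candidate names. The sketch below is an assumption, not existing Shell or Shell2 code: the class `PrefixCompleter` and the sample names are hypothetical, and wiring it to an actual line-editing library is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, not Shell/Shell2 code: prefix completion over a
// sorted set of candidates (keywords, layer names, feature names, ...).
final class PrefixCompleter {
    private final TreeSet<String> candidates = new TreeSet<>();

    void addAll(Collection<String> names) {
        candidates.addAll(names);
    }

    // Returns all candidates starting with the prefix, in sorted order.
    List<String> complete(String prefix) {
        List<String> result = new ArrayList<>();
        // In a sorted set, the strings sharing a prefix form a contiguous
        // range that begins at the first element >= prefix.
        for (String s : candidates.tailSet(prefix, true)) {
            if (!s.startsWith(prefix)) {
                break;
            }
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixCompleter completer = new PrefixCompleter();
        // Made-up layer names, for illustration only.
        completer.addAll(Arrays.asList("words", "sentences", "section", "genes"));
        System.out.println(completer.complete("se"));
    }
}
```

A separate completer instance per category (keywords, layers, features, and so on) would let the shell pick the right candidate set from the syntactic context of the cursor.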

[metadata] Tees has no documentation

Tees has no documentation, or at least invoking alvisnlp -moduleDoc Tees returns the error

Exception in thread "main" org.bibliome.util.clio.CLIOException: java.lang.reflect.InvocationTargetException
at org.bibliome.util.clio.CLIOParser.processOption(CLIOParser.java:154)
at org.bibliome.util.clio.CLIOParser.parse(CLIOParser.java:116)
at alvisnlp.app.cli.AbstractAlvisNLP.run(AbstractAlvisNLP.java:1045)
at alvisnlp.app.cli.AlvisNLP.main(AlvisNLP.java:85)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.bibliome.util.clio.CLIOParser.processOption(CLIOParser.java:145)
... 3 more
Caused by: org.bibliome.util.service.UnsupportedServiceException: alias: Tees
at org.bibliome.util.service.CompositeServiceFactory.getServiceByAlias(CompositeServiceFactory.java:129)
at alvisnlp.app.cli.AbstractAlvisNLP.getModuleDocumentation(AbstractAlvisNLP.java:384)
at alvisnlp.app.cli.AbstractAlvisNLP.moduleDoc(AbstractAlvisNLP.java:447)
... 8 more

AlvisNLP Bibliome Module Factory ................... FAILURE

Hello,

While following the instructions, the installation fails for AlvisNLP Bibliome Module Factory at the second instruction (mvn clean install). Here is what the console displays at the end:

[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building AlvisNLP Bibliome Module Factory 0.5rc-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for fr.jouy.inra.maiage.bibliome:alvisdb-core:jar:0.1-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] AlvisNLP/ML ........................................ SUCCESS [ 0.798 s]
[INFO] AlvisNLP Core ...................................... SUCCESS [ 6.976 s]
[INFO] AlvisNLP Bibliome Module Factory ................... FAILURE [ 0.878 s]
[INFO] alvisnlp-rest ...................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.957 s
[INFO] Finished at: 2017-11-22T10:46:45+01:00
[INFO] Final Memory: 27M/78M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project alvisnlp-bibliome: Could not resolve dependencies for project fr.jouy.inra.maiage.bibliome:alvisnlp-bibliome:jar:0.5rc-SNAPSHOT: Failure to find fr.jouy.inra.maiage.bibliome:alvisdb-core:jar:0.1-SNAPSHOT in http://bibliome.jouy.inra.fr/maven-repository was cached in the local repository, resolution will not be reattempted until the update interval of bibliome has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :alvisnlp-bibliome

Can you help me solve this problem?

TEES test

Need a test for TEES. Otherwise merging tees-code-review will be risky...

Install on Windows

Install procedure for windows

Tested on Windows 10 running in a VirtualBox machine.

  1. Download and install JDK 8 for Windows.
  2. Set the JAVA_HOME environment variable to the JDK directory. Usually something like C:\Program Files\jdk1.8.0_XXX, where XXX is the update version of the JDK. You can set environment variables through the control panel.
  3. Download and install git for Windows.
  4. Download and install Maven. Installing Maven means extracting the archive in a sensible place like your home or Program Files.
  5. Set the Path environment variable to %Path%;C:\sensibleplace\apache-maven-3.5.2\bin.
  6. Open a Windows command line window. You may find it by searching for cmd.
  7. Download and compile AlvisNLP/ML:
git clone https://github.com/Bibliome/alvisnlp.git
cd alvisnlp
mvn clean install

@ArnaudFerre: could you try this on a native Windows machine?
