
geonorm's People

Contributors

bethard, jerryzeyu, mahrahimi1

geonorm's Issues

Consequences of Lucene 7.5.0?

Processors is being updated to Stanford CoreNLP 4.4.0, which wants to use Lucene 7.5.0. This project specifies version 6.6.6. Do you have any idea what might break in geonorm if we update?

There may be a problem with Lucene syntax

I'm getting the exception copied below for the linked file. In GeoNamesIndex.search, the line

    var results = scoredEntries(nameQueryParser.parse(whitespaceEscapedQueryString), 1000)

may not be escaping adequately.
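
As a rough illustration of what more thorough escaping could look like, here is a minimal sketch in plain Scala. It is not the geonorm implementation: `QueryEscaping`, `escapeToken`, and `escapePhrase` are hypothetical names, and joining tokens with an escaped space is an assumption based only on the variable name `whitespaceEscapedQueryString`. Lucene's own `QueryParser.escape` covers the special characters but, as far as I know, leaves the operator words `AND`, `OR`, and `NOT` alone, which would explain why a location phrase consisting only of `OR` fails to parse.

```scala
object QueryEscaping {
  // Characters the classic Lucene query syntax treats as special.
  private val specialChars: Set[Char] = "+-&|!(){}[]^\"~*?:\\/".toSet

  // Words the classic query parser reads as boolean operators (upper case only).
  private val reservedWords: Set[String] = Set("AND", "OR", "NOT")

  // Backslash-escape every special character in a single token.
  private def escapeChars(token: String): String =
    token.flatMap(c => if (specialChars(c)) "\\" + c else c.toString)

  // Escape one token. A bare operator word gets a leading backslash so that
  // it is meant to be read as an ordinary term (assumption: the parser in
  // use accepts an escaped first character as the start of a term).
  def escapeToken(token: String): String =
    if (reservedWords(token)) "\\" + token else escapeChars(token)

  // Escape a whole phrase token by token and join the tokens with an
  // escaped space (assumption: this mirrors the existing whitespace escaping).
  def escapePhrase(phrase: String): String =
    phrase.trim.split("\\s+").filter(_.nonEmpty).map(escapeToken).mkString("\\ ")
}
```

With this sketch, `QueryEscaping.escapePhrase("OR")` yields `\OR` and `QueryEscaping.escapePhrase("New York")` yields `New\ York`. Whether the escaped operator word behaves identically in Lucene 6.6.6 and later versions would need to be checked against the parser itself.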

locationPhrase: OR
21:32:42.068 [main] ERROR o.c.w.e.a.b.ExtractDartMetaFromDirectory$ - Exception for file ..\elasticsearchDart\2019Thanksgiving2\badtext\f578425095d17fc669199b0ae45d25ac.txt
org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'OR': Encountered " <OR> "OR "" at line 1, column 0.
Was expecting one of:
    <NOT> ...
    "+" ...
    "-" ...
    <BAREOPER> ...
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    <REGEXPTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    <TERM> ...
    
	at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:116) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
	at org.clulab.geonorm.GeoNamesIndex.search(GeoNamesIndex.scala:130) ~[geonorm_2.12-0.9.6.jar:0.9.6]
	at org.clulab.geonorm.GeoLocationNormalizer.scoredEntries(GeoNorm.scala:179) ~[geonorm_2.12-0.9.6.jar:0.9.6]
	at org.clulab.geonorm.GeoLocationNormalizer.apply(GeoNorm.scala:205) ~[geonorm_2.12-0.9.6.jar:0.9.6]
	at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$10(GeoNormFinder.scala:163) ~[classes/:na]
	at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739) ~[scala-library-2.12.4.jar:na]
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) ~[scala-library-2.12.4.jar:na]
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) ~[scala-library-2.12.4.jar:na]
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191) ~[scala-library-2.12.4.jar:na]
	at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738) ~[scala-library-2.12.4.jar:na]
	at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6(GeoNormFinder.scala:148) ~[classes/:na]
	at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6$adapted(GeoNormFinder.scala:145) ~[classes/:na]
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241) ~[scala-library-2.12.4.jar:na]
	at scala.collection.Iterator.foreach(Iterator.scala:929) ~[scala-library-2.12.4.jar:na]
	at scala.collection.Iterator.foreach$(Iterator.scala:929) ~[scala-library-2.12.4.jar:na]
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1417) ~[scala-library-2.12.4.jar:na]
	at scala.collection.IterableLike.foreach(IterableLike.scala:71) ~[scala-library-2.12.4.jar:na]
	at scala.collection.IterableLike.foreach$(IterableLike.scala:70) ~[scala-library-2.12.4.jar:na]
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54) ~[scala-library-2.12.4.jar:na]
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241) ~[scala-library-2.12.4.jar:na]
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238) ~[scala-library-2.12.4.jar:na]
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) ~[scala-library-2.12.4.jar:na]
	at org.clulab.wm.eidos.context.GeoNormFinder.find(GeoNormFinder.scala:145) ~[classes/:na]
	at org.clulab.wm.eidos.EidosSystem.$anonfun$extractFrom$1(EidosSystem.scala:65) ~[classes/:na]
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122) ~[scala-library-2.12.4.jar:na]
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118) ~[scala-library-2.12.4.jar:na]
	at scala.collection.immutable.List.foldLeft(List.scala:86) ~[scala-library-2.12.4.jar:na]
	at org.clulab.wm.eidos.EidosSystem.extractFrom(EidosSystem.scala:64) ~[classes/:na]
	at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:92) ~[classes/:na]
	at org.clulab.wm.eidos.EidosSystem.extractFromTextWithDct(EidosSystem.scala:150) ~[classes/:na]
	at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.$anonfun$new$3(ExtractDartMetaFromDirectory.scala:75) [classes/:na]
	at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:12) ~[scala-library-2.12.4.jar:na]
	at org.clulab.wm.eidos.utils.Timer.time(Timer.scala:11) ~[classes/:na]
	at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.$anonfun$new$2(ExtractDartMetaFromDirectory.scala:66) [classes/:na]
	at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.$anonfun$new$2$adapted(ExtractDartMetaFromDirectory.scala:61) [classes/:na]
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) ~[scala-library-2.12.4.jar:na]
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) ~[scala-library-2.12.4.jar:na]
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) ~[scala-library-2.12.4.jar:na]
	at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.$anonfun$new$1(ExtractDartMetaFromDirectory.scala:61) [classes/:na]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) ~[scala-library-2.12.4.jar:na]
	at org.clulab.wm.eidos.utils.Timer$.time(Timer.scala:41) ~[classes/:na]
	at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.delayedEndpoint$org$clulab$wm$eidos$apps$batch$ExtractDartMetaFromDirectory$1(ExtractDartMetaFromDirectory.scala:39) [classes/:na]
	at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$delayedInit$body.apply(ExtractDartMetaFromDirectory.scala:24) ~[classes/:na]
	at scala.Function0.apply$mcV$sp(Function0.scala:34) ~[scala-library-2.12.4.jar:na]
	at scala.Function0.apply$mcV$sp$(Function0.scala:34) ~[scala-library-2.12.4.jar:na]
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12) ~[scala-library-2.12.4.jar:na]
	at scala.App.$anonfun$main$1$adapted(App.scala:76) ~[scala-library-2.12.4.jar:na]
	at scala.collection.immutable.List.foreach(List.scala:389) ~[scala-library-2.12.4.jar:na]
	at scala.App.main(App.scala:76) ~[scala-library-2.12.4.jar:na]
	at scala.App.main$(App.scala:74) ~[scala-library-2.12.4.jar:na]
	at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.main(ExtractDartMetaFromDirectory.scala:24) [classes/:na]
	at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory.main(ExtractDartMetaFromDirectory.scala) ~[classes/:na]
Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered " <OR> "OR "" at line 1, column 0.
Was expecting one of:
    <NOT> ...
    "+" ...
    "-" ...
    <BAREOPER> ...
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    <REGEXPTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    <TERM> ...
    
	at org.apache.lucene.queryparser.classic.QueryParser.generateParseException(QueryParser.java:931) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
	at org.apache.lucene.queryparser.classic.QueryParser.jj_consume_token(QueryParser.java:813) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
	at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:252) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
	at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:215) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
	at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:111) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
	... 51 common frames omitted

f578425095d17fc669199b0ae45d25ac.txt

Exception

CORD19_DOC_7733.txt

Exception in thread "main" org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 13462 states and 26873 transitions would result in more than 10000 states.
at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:171)
at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:147)
at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:206)
at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:200)
at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:138)
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:314)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:683)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:733)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
at org.apache.lucene.search.grouping.GroupingSearch.groupByDocBlock(GroupingSearch.java:182)
at org.apache.lucene.search.grouping.GroupingSearch.search(GroupingSearch.java:122)
at org.clulab.geonorm.GeoNamesIndex.scoredEntries(GeoNamesIndex.scala:148)
at org.clulab.geonorm.GeoNamesIndex.search(GeoNamesIndex.scala:134)
at org.clulab.geonorm.GeoLocationNormalizer.scoredEntries(GeoNorm.scala:179)
at org.clulab.geonorm.GeoLocationNormalizer.apply(GeoNorm.scala:205)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$10(GeoNormFinder.scala:157)
at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6(GeoNormFinder.scala:148)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6$adapted(GeoNormFinder.scala:145)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.clulab.wm.eidos.context.GeoNormFinder.find(GeoNormFinder.scala:145)
at org.clulab.wm.eidos.EidosSystem.$anonfun$mkMentions$1(EidosSystem.scala:104)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118)
at scala.collection.immutable.List.foldLeft(List.scala:86)
at org.clulab.wm.eidos.EidosSystem.mkMentions(EidosSystem.scala:103)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:139)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:154)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:171)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:181)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$2(ExtractFromDirectory.scala:26)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$2$adapted(ExtractFromDirectory.scala:21)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:142)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:970)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:49)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:63)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:52)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:46)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:967)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:145)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

geoloc disappears

I don't yet know where this happens, but rereading a particular file under different circumstances gives a different result. Last Thursday I got

      "geolocs" : [ {
        "@type" : "GeoLocation",
        "@id" : "_:GeoLocation_2",
        "startOffset" : 895,
        "endOffset" : 903,
        "text" : "Ethiopia",
        "geoID" : "337996"
      } ]

and on Monday it had disappeared. The same document was read, but the first time it was in a set of 150 and the second time it was in a set of 300. This differs from my normal regression tests, in which I read the same set of documents in the same order. This test has some of the same documents and some new ones, so documents aren't read under the same circumstances, and somehow that seems to matter.

The text is "The Forum is aimed at providing a unique opportunity for foreign entrepreneurs to see the potential of Ethiopia as a market and investment destination, he added.", but probably the entire document is needed.

Exception

I haven't looked into this yet, but I'm doing triage and will come back to it. The text is in Spanish, so we don't expect to extract much useful information, but still it shouldn't crash.
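
One way to keep a single bad phrase from taking down a whole batch run is to wrap the lookup so that a failure produces an empty result instead of propagating. The following is only a sketch: `SafeSearch` and `searchOrEmpty` are hypothetical names, and the `search` parameter stands in for whatever function performs the index lookup. It relies on `TooComplexToDeterminizeException` being a `RuntimeException`, which `NonFatal` matches.

```scala
import scala.util.control.NonFatal

object SafeSearch {
  // Run a search function and fall back to an empty result if it throws.
  // `search` is a placeholder for the real index lookup (hypothetical).
  def searchOrEmpty[A](query: String)(search: String => Seq[A]): Seq[A] =
    try search(query)
    catch {
      case NonFatal(e) =>
        // Report the failure so that it is not silently lost.
        Console.err.println(s"Search failed for '$query': ${e.getClass.getName}: ${e.getMessage}")
        Seq.empty
    }
}
```

This hides the symptom rather than fixing the cause, so it would mainly be useful for triage while the underlying query construction is investigated.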

12:50:33.902 [scala-execution-context-global-15] INFO org.clulab.wm.eidos.utils.Sourcer$ - Sourcing file ..\corpora\cord19_text\problems\txt\CORD19_DOC_19616.txt
Exception in thread "main" org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 10583 states and 21298 transitions would result in more than 10000 states.
at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:171)
at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:147)
at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:206)
at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:200)
at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:138)
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:314)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:683)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:733)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
at org.apache.lucene.search.grouping.GroupingSearch.groupByDocBlock(GroupingSearch.java:182)
at org.apache.lucene.search.grouping.GroupingSearch.search(GroupingSearch.java:122)
at org.clulab.geonorm.GeoNamesIndex.scoredEntries(GeoNamesIndex.scala:148)
at org.clulab.geonorm.GeoNamesIndex.search(GeoNamesIndex.scala:134)
at org.clulab.geonorm.GeoLocationNormalizer.scoredEntries(GeoNorm.scala:179)
at org.clulab.geonorm.GeoLocationNormalizer.apply(GeoNorm.scala:205)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$10(GeoNormFinder.scala:157)
at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6(GeoNormFinder.scala:148)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6$adapted(GeoNormFinder.scala:145)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.clulab.wm.eidos.context.GeoNormFinder.find(GeoNormFinder.scala:145)
at org.clulab.wm.eidos.EidosSystem.$anonfun$mkMentions$1(EidosSystem.scala:104)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118)
at scala.collection.immutable.List.foldLeft(List.scala:86)
at org.clulab.wm.eidos.EidosSystem.mkMentions(EidosSystem.scala:103)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:139)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:154)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:171)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:181)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$1(ExtractFromDirectory.scala:27)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$1$adapted(ExtractFromDirectory.scala:21)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:142)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:970)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:49)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:63)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:52)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:46)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:967)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:145)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

CORD19_DOC_19616.txt

CI needs to be updated

I believe that Travis has been out of commission for some time now. This can probably run on GitHub.

Exception

All three seem to be the same error. They might happen more for non-English texts and non-Roman alphabets (Unicode).
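
If the failures really are tied to long or multi-byte phrases, one possible mitigation is to skip fuzzy matching when the phrase exceeds some size in UTF-8 bytes. The sketch below only illustrates that idea under that assumption: `FuzzyGuard`, `allowFuzzy`, and the limit of 64 bytes are hypothetical, and a sensible limit would have to be found by experiment against the failing documents.

```scala
import java.nio.charset.StandardCharsets

object FuzzyGuard {
  // Hypothetical limit, not a value taken from geonorm or Lucene.
  val maxFuzzyBytes: Int = 64

  // Length of the phrase once encoded as UTF-8, which is what grows
  // for non-Roman alphabets.
  def utf8Length(s: String): Int = s.getBytes(StandardCharsets.UTF_8).length

  // Decide whether a phrase is short enough to attempt a fuzzy query.
  def allowFuzzy(phrase: String, limit: Int = maxFuzzyBytes): Boolean =
    utf8Length(phrase) <= limit
}
```

A caller could fall back to an exact-match query whenever `FuzzyGuard.allowFuzzy(phrase)` is false.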

CORD19_DOC_15693.txt

Exception in thread "main" org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 10841 states and 21657 transitions would result in more than 10000 states.
at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:171)
at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:147)
at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:206)
at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:200)
at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:138)
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:314)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:683)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:733)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
at org.apache.lucene.search.grouping.GroupingSearch.groupByDocBlock(GroupingSearch.java:182)
at org.apache.lucene.search.grouping.GroupingSearch.search(GroupingSearch.java:122)
at org.clulab.geonorm.GeoNamesIndex.scoredEntries(GeoNamesIndex.scala:148)
at org.clulab.geonorm.GeoNamesIndex.search(GeoNamesIndex.scala:134)
at org.clulab.geonorm.GeoLocationNormalizer.scoredEntries(GeoNorm.scala:179)
at org.clulab.geonorm.GeoLocationNormalizer.apply(GeoNorm.scala:205)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$10(GeoNormFinder.scala:157)
at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6(GeoNormFinder.scala:148)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6$adapted(GeoNormFinder.scala:145)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.clulab.wm.eidos.context.GeoNormFinder.find(GeoNormFinder.scala:145)
at org.clulab.wm.eidos.EidosSystem.$anonfun$mkMentions$1(EidosSystem.scala:104)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118)
at scala.collection.immutable.List.foldLeft(List.scala:86)
at org.clulab.wm.eidos.EidosSystem.mkMentions(EidosSystem.scala:103)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:139)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:154)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:171)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:181)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$2(ExtractFromDirectory.scala:26)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$2$adapted(ExtractFromDirectory.scala:21)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:142)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:970)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:49)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:63)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:52)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:46)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:967)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:145)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
