Generic information for people in the CLU lab
- tmpfiles - Cron jobs that remove some temporary files
- sbt - Modifications to sbt to manage temporary files
- jenkins - Configuration for projects being tested on local servers
Geographical name normalization (a.k.a. toponym resolution)
License: Apache License 2.0
Processors is being updated to Stanford CoreNLP 4.4.0, which wants to use Lucene 7.5.0, while this project specifies Lucene 6.6.6. Do you have any idea what might break in geonorm if we update?
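One low-risk way to find out is to force the newer Lucene in a local build of geonorm and run its tests. A minimal sketch of a `build.sbt` fragment for sbt 1.x; the module list is an assumption based on the packages that appear in the stack traces below (core, query parser, grouping) plus the common analyzers module, and should be matched to geonorm's actual `libraryDependencies`. Lucene's backward-compatibility policy is that a major version reads indexes written by the previous major version, so the existing 6.x index should still open under 7.5.0, but that is worth confirming in the same test run.

```scala
// build.sbt (fragment): force Lucene 7.5.0 to see what breaks.
// Assumed module list; adjust to the modules geonorm really uses.
val luceneVersion = "7.5.0"

dependencyOverrides ++= Seq(
  "org.apache.lucene" % "lucene-core"             % luceneVersion,
  "org.apache.lucene" % "lucene-queryparser"      % luceneVersion,
  "org.apache.lucene" % "lucene-grouping"         % luceneVersion,
  "org.apache.lucene" % "lucene-analyzers-common" % luceneVersion
)
```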
I'm getting the exception copied below for the linked file. In GeoNamesIndex.search, the line
var results = scoredEntries(nameQueryParser.parse(whitespaceEscapedQueryString), 1000)
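may not be escaping the query string adequately. In this case the location phrase is the bare word OR, which the classic query parser reads as a Boolean operator rather than as a term: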
locationPhrase: OR
21:32:42.068 [main] ERROR o.c.w.e.a.b.ExtractDartMetaFromDirectory$ - Exception for file ..\elasticsearchDart\2019Thanksgiving2\badtext\f578425095d17fc669199b0ae45d25ac.txt
org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'OR': Encountered " <OR> "OR "" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<NUMBER> ...
<TERM> ...
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:116) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
at org.clulab.geonorm.GeoNamesIndex.search(GeoNamesIndex.scala:130) ~[geonorm_2.12-0.9.6.jar:0.9.6]
at org.clulab.geonorm.GeoLocationNormalizer.scoredEntries(GeoNorm.scala:179) ~[geonorm_2.12-0.9.6.jar:0.9.6]
at org.clulab.geonorm.GeoLocationNormalizer.apply(GeoNorm.scala:205) ~[geonorm_2.12-0.9.6.jar:0.9.6]
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$10(GeoNormFinder.scala:163) ~[classes/:na]
at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739) ~[scala-library-2.12.4.jar:na]
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) ~[scala-library-2.12.4.jar:na]
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) ~[scala-library-2.12.4.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191) ~[scala-library-2.12.4.jar:na]
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738) ~[scala-library-2.12.4.jar:na]
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6(GeoNormFinder.scala:148) ~[classes/:na]
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6$adapted(GeoNormFinder.scala:145) ~[classes/:na]
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241) ~[scala-library-2.12.4.jar:na]
at scala.collection.Iterator.foreach(Iterator.scala:929) ~[scala-library-2.12.4.jar:na]
at scala.collection.Iterator.foreach$(Iterator.scala:929) ~[scala-library-2.12.4.jar:na]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1417) ~[scala-library-2.12.4.jar:na]
at scala.collection.IterableLike.foreach(IterableLike.scala:71) ~[scala-library-2.12.4.jar:na]
at scala.collection.IterableLike.foreach$(IterableLike.scala:70) ~[scala-library-2.12.4.jar:na]
at scala.collection.AbstractIterable.foreach(Iterable.scala:54) ~[scala-library-2.12.4.jar:na]
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241) ~[scala-library-2.12.4.jar:na]
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238) ~[scala-library-2.12.4.jar:na]
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) ~[scala-library-2.12.4.jar:na]
at org.clulab.wm.eidos.context.GeoNormFinder.find(GeoNormFinder.scala:145) ~[classes/:na]
at org.clulab.wm.eidos.EidosSystem.$anonfun$extractFrom$1(EidosSystem.scala:65) ~[classes/:na]
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122) ~[scala-library-2.12.4.jar:na]
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118) ~[scala-library-2.12.4.jar:na]
at scala.collection.immutable.List.foldLeft(List.scala:86) ~[scala-library-2.12.4.jar:na]
at org.clulab.wm.eidos.EidosSystem.extractFrom(EidosSystem.scala:64) ~[classes/:na]
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:92) ~[classes/:na]
at org.clulab.wm.eidos.EidosSystem.extractFromTextWithDct(EidosSystem.scala:150) ~[classes/:na]
at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.$anonfun$new$3(ExtractDartMetaFromDirectory.scala:75) [classes/:na]
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:12) ~[scala-library-2.12.4.jar:na]
at org.clulab.wm.eidos.utils.Timer.time(Timer.scala:11) ~[classes/:na]
at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.$anonfun$new$2(ExtractDartMetaFromDirectory.scala:66) [classes/:na]
at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.$anonfun$new$2$adapted(ExtractDartMetaFromDirectory.scala:61) [classes/:na]
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) ~[scala-library-2.12.4.jar:na]
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) ~[scala-library-2.12.4.jar:na]
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) ~[scala-library-2.12.4.jar:na]
at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.$anonfun$new$1(ExtractDartMetaFromDirectory.scala:61) [classes/:na]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) ~[scala-library-2.12.4.jar:na]
at org.clulab.wm.eidos.utils.Timer$.time(Timer.scala:41) ~[classes/:na]
at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.delayedEndpoint$org$clulab$wm$eidos$apps$batch$ExtractDartMetaFromDirectory$1(ExtractDartMetaFromDirectory.scala:39) [classes/:na]
at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$delayedInit$body.apply(ExtractDartMetaFromDirectory.scala:24) ~[classes/:na]
at scala.Function0.apply$mcV$sp(Function0.scala:34) ~[scala-library-2.12.4.jar:na]
at scala.Function0.apply$mcV$sp$(Function0.scala:34) ~[scala-library-2.12.4.jar:na]
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12) ~[scala-library-2.12.4.jar:na]
at scala.App.$anonfun$main$1$adapted(App.scala:76) ~[scala-library-2.12.4.jar:na]
at scala.collection.immutable.List.foreach(List.scala:389) ~[scala-library-2.12.4.jar:na]
at scala.App.main(App.scala:76) ~[scala-library-2.12.4.jar:na]
at scala.App.main$(App.scala:74) ~[scala-library-2.12.4.jar:na]
at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory$.main(ExtractDartMetaFromDirectory.scala:24) [classes/:na]
at org.clulab.wm.eidos.apps.batch.ExtractDartMetaFromDirectory.main(ExtractDartMetaFromDirectory.scala) ~[classes/:na]
Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered " <OR> "OR "" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<NUMBER> ...
<TERM> ...
at org.apache.lucene.queryparser.classic.QueryParser.generateParseException(QueryParser.java:931) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
at org.apache.lucene.queryparser.classic.QueryParser.jj_consume_token(QueryParser.java:813) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:252) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:215) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:111) ~[lucene-queryparser-6.6.6.jar:6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 2019-03-29 09:08:34]
... 51 common frames omitted
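A minimal sketch of stricter escaping, written as a hypothetical helper `escapeForClassicParser` (not part of geonorm) that could produce the string passed to `nameQueryParser.parse`. It backslash-escapes the characters the classic query parser treats as syntax, escapes whitespace as the existing `whitespaceEscapedQueryString` name suggests, and then escapes the first letter of a bare `AND`, `OR` or `NOT`. That last step matters because Lucene's own `QueryParser.escape` handles special characters but not these operator words; as far as I can tell from the classic grammar, `\OR` is lexed as an ordinary term and unescaped back to `OR`.

```scala
object QueryEscaping {
  // Characters with special meaning in the classic QueryParser syntax
  // (the same set that QueryParserBase.escape handles).
  private val specialChars: Set[Char] = "\\+-!():^[]\"{}~*?|&/".toSet

  // Words the classic parser treats as Boolean operators when written in upper case.
  private val operatorWords: Set[String] = Set("AND", "OR", "NOT")

  /** Hypothetical helper: escape a location phrase so the parser sees one literal term. */
  def escapeForClassicParser(phrase: String): String = {
    val sb = new StringBuilder
    phrase.foreach { c =>
      // Escape query syntax and whitespace so the phrase stays a single token.
      if (specialChars.contains(c) || c.isWhitespace) sb.append('\\')
      sb.append(c)
    }
    val escaped = sb.toString
    // With whitespace escaped the phrase is one token, so only an exact match
    // against an operator word needs the extra backslash.
    if (operatorWords.contains(escaped)) "\\" + escaped else escaped
  }
}
```

Lower-casing the phrase would also avoid the operator problem, but only if the analyzer used for the name field is case-insensitive.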
Exception in thread "main" org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 13462 states and 26873 transitions would result in more than 10000 states.
at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:171)
at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:147)
at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:206)
at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:200)
at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:138)
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:314)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:683)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:733)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
at org.apache.lucene.search.grouping.GroupingSearch.groupByDocBlock(GroupingSearch.java:182)
at org.apache.lucene.search.grouping.GroupingSearch.search(GroupingSearch.java:122)
at org.clulab.geonorm.GeoNamesIndex.scoredEntries(GeoNamesIndex.scala:148)
at org.clulab.geonorm.GeoNamesIndex.search(GeoNamesIndex.scala:134)
at org.clulab.geonorm.GeoLocationNormalizer.scoredEntries(GeoNorm.scala:179)
at org.clulab.geonorm.GeoLocationNormalizer.apply(GeoNorm.scala:205)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$10(GeoNormFinder.scala:157)
at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6(GeoNormFinder.scala:148)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6$adapted(GeoNormFinder.scala:145)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.clulab.wm.eidos.context.GeoNormFinder.find(GeoNormFinder.scala:145)
at org.clulab.wm.eidos.EidosSystem.$anonfun$mkMentions$1(EidosSystem.scala:104)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118)
at scala.collection.immutable.List.foldLeft(List.scala:86)
at org.clulab.wm.eidos.EidosSystem.mkMentions(EidosSystem.scala:103)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:139)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:154)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:171)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:181)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$2(ExtractFromDirectory.scala:26)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$2$adapted(ExtractFromDirectory.scala:21)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:142)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:970)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:49)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:63)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:52)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:46)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:967)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:145)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
I don't yet know where this happens, but rereading a particular file under different circumstances gives a different result. Last Thursday I got
"geolocs" : [ {
"@type" : "GeoLocation",
"@id" : "_:GeoLocation_2",
"startOffset" : 895,
"endOffset" : 903,
"text" : "Ethiopia",
"geoID" : "337996"
} ]
and on Monday it had disappeared. The same document was read both times, but the first time it was in a set of 150 and the second time in a set of 300. This differs from my normal regression test, in which I read the same set of documents in the same order. This test has some documents in common and some new ones, so documents aren't read under the same circumstances, and somehow that seems to matter.
The text is "The Forum is aimed at providing a unique opportunity for foreign entrepreneurs to see the potential of Ethiopia as a market and investment destination, he added." However, the entire document is probably needed to reproduce the problem.
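One hypothesis, not confirmed here: the traces in the other reports show Eidos calling geonorm from a Scala parallel collection (`ParArray`), and `GeoNamesIndex.search` appears to use a single `nameQueryParser`. Lucene documents the classic `QueryParser` as not thread-safe, so results could depend on how documents happen to be interleaved across threads, which would change when the batch grows from 150 to 300 files. A minimal sketch of giving each thread its own parser with `java.lang.ThreadLocal`; `StatefulParser` is a hypothetical stand-in for the Lucene parser so that the example runs without Lucene.

```scala
// Stand-in for a parser that keeps mutable state between calls, as Lucene's
// classic QueryParser does. Hypothetical; not a geonorm or Lucene class.
final class StatefulParser {
  private val buffer = new StringBuilder
  def parse(query: String): String = {
    buffer.clear() // reuses internal state, so concurrent calls could interfere
    buffer.append(query.trim)
    buffer.toString
  }
}

object PerThreadParser {
  // Each thread lazily gets its own instance the first time it calls get().
  private val local: ThreadLocal[StatefulParser] = new ThreadLocal[StatefulParser] {
    override def initialValue(): StatefulParser = new StatefulParser
  }

  def parser: StatefulParser = local.get()

  def parse(query: String): String = parser.parse(query)
}
```

With a real `QueryParser`, `initialValue` would construct it with the same field name and analyzer that `nameQueryParser` uses now. Creating a fresh parser per query would also avoid sharing, at some allocation cost.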
I haven't looked into this yet, but I'm doing triage and will come back to it. The text is in Spanish, so we don't expect to extract much useful information, but it still shouldn't crash.
12:50:33.902 [scala-execution-context-global-15] INFO org.clulab.wm.eidos.utils.Sourcer$ - Sourcing file ..\corpora\cord19_text\problems\txt\CORD19_DOC_19616.txt
Exception in thread "main" org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 10583 states and 21298 transitions would result in more than 10000 states.
at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:171)
at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:147)
at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:206)
at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:200)
at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:138)
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:314)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:683)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:733)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
at org.apache.lucene.search.grouping.GroupingSearch.groupByDocBlock(GroupingSearch.java:182)
at org.apache.lucene.search.grouping.GroupingSearch.search(GroupingSearch.java:122)
at org.clulab.geonorm.GeoNamesIndex.scoredEntries(GeoNamesIndex.scala:148)
at org.clulab.geonorm.GeoNamesIndex.search(GeoNamesIndex.scala:134)
at org.clulab.geonorm.GeoLocationNormalizer.scoredEntries(GeoNorm.scala:179)
at org.clulab.geonorm.GeoLocationNormalizer.apply(GeoNorm.scala:205)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$10(GeoNormFinder.scala:157)
at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6(GeoNormFinder.scala:148)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6$adapted(GeoNormFinder.scala:145)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.clulab.wm.eidos.context.GeoNormFinder.find(GeoNormFinder.scala:145)
at org.clulab.wm.eidos.EidosSystem.$anonfun$mkMentions$1(EidosSystem.scala:104)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118)
at scala.collection.immutable.List.foldLeft(List.scala:86)
at org.clulab.wm.eidos.EidosSystem.mkMentions(EidosSystem.scala:103)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:139)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:154)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:171)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:181)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$1(ExtractFromDirectory.scala:27)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$1$adapted(ExtractFromDirectory.scala:21)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:142)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:970)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:49)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:63)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:52)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:46)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:967)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:145)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
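Until the root cause is fixed, one way to keep a single problematic phrase from stopping the whole run is to treat this exception as "no match" for that phrase. A minimal sketch, assuming the caller passes in the search function; `SafeSearch.searchOrEmpty` is hypothetical, and in real code the predicate would be `_.isInstanceOf[TooComplexToDeterminizeException]` (the class in `org.apache.lucene.util.automaton` named in the trace). Any other failure is rethrown.

```scala
import scala.util.{Failure, Success, Try}

object SafeSearch {
  /** Hypothetical wrapper: run `search`, returning no results when the failure is one
    * the caller has declared recoverable, and rethrowing anything else. */
  def searchOrEmpty[A](phrase: String)(search: String => Seq[A])(
      isRecoverable: Throwable => Boolean): Seq[A] =
    Try(search(phrase)) match {
      case Success(results) => results
      case Failure(e) if isRecoverable(e) =>
        // A real implementation would use the project's logger instead of stderr.
        System.err.println(s"Skipping location phrase '$phrase': ${e.getMessage}")
        Seq.empty
      case Failure(e) => throw e
    }
}
```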
I believe Travis CI has been out of commission for some time now. The build could probably run on GitHub Actions instead.
Currently, only the old version of the GeoNames index is available as an artifact. A new version including woredas should be published:
Could the files contained in http://clulab.cs.arizona.edu/models/geonames-index.zip be packaged into a jar that can be declared as a project dependency and included in, e.g., eidos at build or assembly time? At runtime, the resources in the jar would be copied to the cache directory for Lucene to use, instead of being fetched over a live network connection. This might also help with versioning.
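A minimal sketch of the runtime half of that idea. Lucene's `FSDirectory` works on real files, so the index has to be copied out of the jar before it can be opened. This assumes the index files are packaged under a hypothetical resource prefix and that the list of file names is known (for example from a small manifest packaged alongside them, since resources inside a jar cannot easily be listed). Copying into a version-named subdirectory of the cache is what would help with versioning: a new index release lands in a new directory instead of overwriting the old one. `IndexResources`, its prefix and the version string are illustrative, not existing geonorm names.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}

object IndexResources {
  // Hypothetical resource prefix inside a packaged index jar.
  val ResourcePrefix = "org/clulab/geonorm/geonames-index/"

  /** Opens a packaged index file from the classpath, if present. */
  def fromClasspath(name: String): Option[InputStream] =
    Option(getClass.getClassLoader.getResourceAsStream(ResourcePrefix + name))

  /** Copies each named file into cacheDir/version unless it is already there,
    * and returns the directory that Lucene should open. */
  def materialize(
      fileNames: Seq[String],
      version: String,
      cacheDir: Path,
      open: String => Option[InputStream] = fromClasspath _): Path = {
    val target = cacheDir.resolve(version)
    Files.createDirectories(target)
    fileNames.foreach { name =>
      val dest = target.resolve(name)
      if (!Files.exists(dest)) {
        val in = open(name).getOrElse(
          throw new IllegalStateException(s"Missing index resource: $name"))
        try Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING)
        finally in.close()
      }
    }
    target
  }
}
```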
All three seem to be the same error. It may occur more often for non-English text and non-Latin scripts (i.e., non-ASCII Unicode characters).
Exception in thread "main" org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 10841 states and 21657 transitions would result in more than 10000 states.
at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:171)
at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:147)
at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:206)
at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:200)
at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:138)
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:314)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:683)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:733)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
at org.apache.lucene.search.grouping.GroupingSearch.groupByDocBlock(GroupingSearch.java:182)
at org.apache.lucene.search.grouping.GroupingSearch.search(GroupingSearch.java:122)
at org.clulab.geonorm.GeoNamesIndex.scoredEntries(GeoNamesIndex.scala:148)
at org.clulab.geonorm.GeoNamesIndex.search(GeoNamesIndex.scala:134)
at org.clulab.geonorm.GeoLocationNormalizer.scoredEntries(GeoNorm.scala:179)
at org.clulab.geonorm.GeoLocationNormalizer.apply(GeoNorm.scala:205)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$10(GeoNormFinder.scala:157)
at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6(GeoNormFinder.scala:148)
at org.clulab.wm.eidos.context.GeoNormFinder.$anonfun$find$6$adapted(GeoNormFinder.scala:145)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.clulab.wm.eidos.context.GeoNormFinder.find(GeoNormFinder.scala:145)
at org.clulab.wm.eidos.EidosSystem.$anonfun$mkMentions$1(EidosSystem.scala:104)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118)
at scala.collection.immutable.List.foldLeft(List.scala:86)
at org.clulab.wm.eidos.EidosSystem.mkMentions(EidosSystem.scala:103)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:139)
at org.clulab.wm.eidos.EidosSystem.extractFromDoc(EidosSystem.scala:154)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:171)
at org.clulab.wm.eidos.EidosSystem.extractFromText(EidosSystem.scala:181)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$2(ExtractFromDirectory.scala:26)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.$anonfun$new$2$adapted(ExtractFromDirectory.scala:21)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:142)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:970)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:49)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:63)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:52)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:46)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:967)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:145)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
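A hedged explanation of why non-Latin text might trigger this more often: the trace shows the fuzzy query's automaton being compiled into a `ByteRunAutomaton`, which matches UTF-8 bytes, so a term made of multi-byte characters yields a larger automaton than an ASCII term with the same number of characters. If, as the name `whitespaceEscapedQueryString` suggests, whitespace is escaped, a whole phrase also becomes one long term. One mitigation is to fall back to exact matching (zero edits) for terms above some UTF-8 length. A minimal sketch; `FuzzyGuard` and the 40-byte limit are hypothetical and would need tuning against the failing documents.

```scala
import java.nio.charset.StandardCharsets

object FuzzyGuard {
  // Hypothetical threshold; tune it against the documents that fail.
  val MaxFuzzyUtf8Bytes = 40

  /** Edit distance to request for `term`: the requested value for short terms,
    * 0 (exact matching) once the UTF-8 encoding exceeds the threshold. */
  def maxEditsFor(term: String, requested: Int = 2): Int = {
    val utf8Bytes = term.getBytes(StandardCharsets.UTF_8).length
    if (utf8Bytes > MaxFuzzyUtf8Bytes) 0 else requested
  }
}
```

The result could be passed as the `maxEdits` argument when the fuzzy query is built (Lucene's `FuzzyQuery` accepts 0 to 2 edits), or used to decide whether to append `~` to the query string, depending on how geonorm actually builds its fuzzy queries.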