dstl / baleen
Entity Extraction Text Processor
License: Apache License 2.0
I was recently looking into issue #3 from the Elasticsearch side, along with the related issues elastic/elasticsearch#27832 and elastic/elasticsearch#17407.
elastic/elasticsearch#17407 (comment) also applies to your issue, as does the fix (setting "orientation": "clockwise" in the mapping).
However, it seems that you managed to simplify your Antarctica outline to one that Elasticsearch does accept in #39, even though it still looks to be oriented clockwise. It's possible that the shape that Elasticsearch has indexed is not the shape you asked for, because of the code linked in elastic/elasticsearch#27832 (comment).
I'd recommend fixing the mapping, or reversing the orientation of the Antarctica outline to make it anticlockwise as per the GeoJSON spec, to make sure that Elasticsearch indexes it correctly. If you do so, it looks like you can revert #39 and use the original higher-precision outline (suitably reversed). Meanwhile we're looking at this leniency in more detail in elastic/elasticsearch#27832.
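For reference, the mapping fix described above might look something like the following sketch (the field name is illustrative, and the exact mapping syntax varies by Elasticsearch version):

```json
{
  "mappings": {
    "properties": {
      "outline": {
        "type": "geo_shape",
        "orientation": "clockwise"
      }
    }
  }
}
```

With this in place, Elasticsearch interprets polygon outer rings as clockwise rather than assuming the GeoJSON-default anticlockwise orientation.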
Hi! We spotted a vulnerable dependency in your project, which might threaten your software. We also found another project that uses the same vulnerable dependency in a similar way, and they have upgraded it. We therefore believe your project is likely to be affected by this vulnerability in the same way. The details are shown below.
uk.gov.dstl.baleen.consumers.LocationElasticsearch:doProcess(org.apache.uima.jcas.JCas)
⬇️
com.fasterxml.jackson.databind.ObjectMapper:readValue(java.lang.String,java.lang.Class)
⬇️
...
⬇️
com.fasterxml.jackson.databind.JavaType:isEnumType()
Another project used the same dependency with a similar invocation path, and they have taken action to resolve this issue.
com.visionarts.powerjambda.actions.JsonBodyActionRequestReader:readRequest(com.visionarts.powerjambda.AwsProxyRequest)
⬇️
com.fasterxml.jackson.databind.ObjectMapper:readValue(java.lang.String,java.lang.Class)
⬇️
...
⬇️
com.fasterxml.jackson.databind.JavaType:isEnumType()
Therefore, you might also need to upgrade this dependency. Hope this helps! 😄
Need to alter the Regex to support Thu as well as Thur. Does this also apply to other date annotators?
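For illustration, an optional-suffix pattern covers both abbreviations (this is a sketch, not Baleen's actual regex):

```java
import java.util.regex.Pattern;

public class DayAbbreviation {
    // Optional-suffix form matches "Thu", "Thur", "Thurs" and "Thursday".
    // Plain alternation like "Thur|Thu" also works, but note that ordering
    // matters ("Thu|Thur" would stop at "Thu" when embedded in a larger
    // pattern). Illustrative only, not the real Baleen pattern.
    private static final Pattern THURSDAY = Pattern.compile("Thu(?:r(?:s(?:day)?)?)?");

    public static boolean matchesDay(String s) {
        return THURSDAY.matcher(s).matches();
    }
}
```

The same check (does each abbreviation list include the shorter variant?) would be worth applying to the other date annotators too.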
The following two tests fail when running as a user with super-user privileges (e.g. sudo, root) on Linux:
The cause is that a super-user has permission to write to read-only files. The above tests attempt to write to read-only files in order to exercise the error handling: they expect the writes to fail, but as a super-user the writes succeed, so the tests fail instead.
Using the following config which I borrowed from the baleen-runner tests:
sample_pipeline.yaml:
collectionreader:
class: FolderReader
folders:
- /tmp/data
annotators:
- class: regex.Email
- class: regex.Url
consumers:
- class: EntityCount
The application generates the following error in the output:
2018-03-19 21:32:32,138 DEBUG uk.gov.dstl.baleen.core.utils.BuilderUtils - Couldn't find class uk.gov.dstl.baleen.contentextractors.StructureContentExtractor in package uk.gov.dstl.baleen.contentextractors
java.lang.ClassNotFoundException: uk.gov.dstl.baleen.contentextractors.uk.gov.dstl.baleen.contentextractors.StructureContentExtractor
Notice that the CNFE contains the package spec twice: uk.gov.dstl.baleen.contentextractors.uk.gov.dstl.baleen.contentextractors.StructureContentExtractor
I believe the bug is caused by passing a fully qualified classname AND defaultPackage="uk.gov.dstl.baleen.contentextractors" to BuilderUtils.getClassFromString()
here:
Another possible fix would be to modify BuilderUtils.getClassFromString() to test whether the className parameter already contains the defaultPackage:
Lastly, another fix would be to modify BaleenDefaults.DEFAULT_CONTENT_EXTRACTOR so that it does not contain the fully qualified classname, here:
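A sketch of what the resolution logic could look like (the method name mirrors BuilderUtils.getClassFromString, but this is an illustrative reimplementation, not Baleen's actual code): try the name as given first, and only fall back to prefixing the default package if that fails.

```java
// Hypothetical sketch: resolve the class name as-is before prepending
// default packages, so a fully qualified name passed alongside a
// defaultPackage no longer produces a doubled package prefix.
public class ClassResolver {
    public static Class<?> getClassFromString(String className, String... defaultPackages)
            throws ClassNotFoundException {
        try {
            return Class.forName(className); // already fully qualified?
        } catch (ClassNotFoundException e) {
            // fall through and try the default packages
        }
        for (String pkg : defaultPackages) {
            try {
                return Class.forName(pkg + "." + className);
            } catch (ClassNotFoundException e) {
                // try the next package
            }
        }
        throw new ClassNotFoundException(className);
    }
}
```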
The component REST API - which returns a list of available components such as annotators - doesn't work on Java 9. This appears to be an issue with the Reflections API, which is giving the following warning when running on Java 9:
WARN org.reflections.Reflections - given scan urls are empty. set urls in the configuration
Running the same JAR on Java 8 doesn't produce this warning. The component API seems to return a null/empty object, which is therefore breaking the Plankton interface as well.
Allow gazetteers (e.g. File Gazetteer) to support subType on entities
The OdinParser annotator fails if Baleen is run from a JAR file with a space in its full path.
This has been raised as an issue in the clulab/processors project.
You can set the termSeparator parameter - and it looks like it gets passed in the config to uk.gov.dstl.baleen.resources.gazetteer.FileGazetteer. However, the init() method ignores it, so you always get the default comma separator.
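For illustration, honouring the parameter amounts to keeping the configured value and splitting on it rather than on a hard-coded comma (this is a hypothetical sketch, not the real FileGazetteer implementation):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a separator-aware gazetteer line parser.
public class SeparatorAwareGazetteer {
    private final String termSeparator;

    public SeparatorAwareGazetteer(String termSeparator) {
        // the reported bug is that init() discards the configured value
        // and always uses ","; the fix is simply to keep it
        this.termSeparator = termSeparator;
    }

    public List<String> parseLine(String line) {
        // quote the separator so regex metacharacters like '|' work too
        return Arrays.asList(line.split(Pattern.quote(termSeparator)));
    }
}
```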
Trying to create the following pipeline causes the pipeline initialization to fail on language.OpenNLP.
history:
class: uk.gov.dstl.baleen.history.mongo.MongoHistory
collectionreader:
class: FolderReader
folders:
- corpus
annotators:
- language.OpenNLP
- class: gazetteer.Mongo
collection: person_gazetteer
valueField: name
type: Person
- class: stats.OpenNLP
model: en-ner-person.bin
type: Person
consumers:
- Mongo
Sometimes, the HTML5 output will contain a visible HTML string in the form:
…most-of-entity-name" data-referent="" >start of entity
most-of-entity-name …
This appears in the HTML source as " data-referent="" >.
From a few simple tests, this appears to happen when the tagged element contains a line-break (and hence the HTML5 output breaks it across paragraphs).
Using part of the NIST IE-ER data set (ieer-short.txt) and running it through a pipeline that uses OpenNLP results in ieer-short.html.txt.
Expected behaviour in this case is that National Convention Assembly is correctly tagged in the output without broken HTML.
The MoneyRegex annotator doesn't seem to extract the full entity, and appears to be limited at 3 figures (e.g. will pull $300 instead of $3000)
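The behaviour described is consistent with a pattern that caps the digit run at three, as in the following sketch (both patterns are illustrative, not Baleen's actual MoneyRegex):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MoneyPattern {
    // A pattern like \$\d{1,3} stops after three digits, yielding "$300"
    // from "$3000". Allowing either plain digit runs or comma-grouped
    // thousands captures the full amount.
    static final Pattern TRUNCATING = Pattern.compile("\\$\\d{1,3}");
    static final Pattern FULL = Pattern.compile("\\$(?:\\d{1,3}(?:,\\d{3})+|\\d+)");

    public static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```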
It would be useful if component parameters, for example whether an annotator should be case sensitive or not, were exposed through the API. That would allow for the development of a GUI tool for building pipelines, as we could query the REST API to find the available parameters and what they do.
The detection of Javadoc, making it available through the Baleen server, appears to fail if there are spaces in the path. Presumably, it's escaping the spaces somewhere and then is unable to find the required JAR file at the escaped path.
I have compiled successfully but cannot access the javadoc. I copied the file baleen-javadoc-2.2.0-SNAPSHOT.jar in to the same directory but it is not found and I get a 404. If I change the filename to baleen-2.2.0-SNAPSHOT-javadoc.jar then I can see from the log it finds the javadoc file. I then however get a 403 error when I try to go to it.
This is a bit frustrating, as I need to read the javadoc to be able to figure out how to use Baleen.
This is on the latest commit c862249
Relevant Log when starting baleen
2016-03-16 22:30:40,269 INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.h.ContextHandler@77be656f{/javadoc,jar:file:/home/stuart/Programming/workrelated/baleen/baleen/target/baleen-2.2.0-SNAPSHOT-javadoc.jar!/,AVAILABLE}
Just did a clone of the repo after installing the latest JDK and Maven on OSX.
After using the command mvn package -Dmaven.test.skip=true
it fails with the following message:
Failed to execute goal on project baleen-resources: Could not resolve dependencies for project uk.gov.dstl.baleen:baleen-resources:jar:2.6.1-SNAPSHOT: Could not find artifact uk.gov.dstl.baleen:baleen-uima:jar:tests:2.6.1-SNAPSHOT
It also won't compile without the test-skip option. Hopefully you can tell me what went wrong.
Best regards
Build fails with the following error:
Failed to execute goal on project baleen-collectionreaders: Could not resolve dependencies for project uk.gov.dstl.baleen:baleen-collectionreaders:jar:2.5.0-SNAPSHOT: Failure to find org.apache.pdfbox:jbig2-imageio:jar:3.0.0-SNAPSHOT in https://repository.apache.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of apache-snapshots has elapsed or updates are forced
There's a transitive dependency on jbig2-imageio 3.0.0-SNAPSHOT; however, that snapshot version is no longer available in the Apache Maven repository. Only jbig2-imageio 3.0.1-SNAPSHOT is available. See here:
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/jbig2-imageio/
Steps to reproduce:
1. Remove the jbig2-imageio 3.0.0-SNAPSHOT artifacts from your local .m2/repository
2. Run mvn clean package (or whatever goals you typically specify)
It looks like the dependency is dragged in as follows:
- uk.gov.dstl.baleen:baleen-collectionreaders:jar:2.5.0-SNAPSHOT
- io.committed.krill:krill:jar:1.0.2
- org.apache.tika:tika-parsers:jar:1.16
- org.apache.pdfbox:jbig2-imageio:jar:3.0.0-SNAPSHOT
Currently uses DateTimeFormatter.ofPattern(...), but should use DateTimeFormatterBuilder with case insensitivity turned on, as per http://stackoverflow.com/questions/10797808/how-to-parse-case-insensitive-strings-with-jsr310-datetimeformatter
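A minimal sketch of the suggested change (the pattern is illustrative, not the annotator's actual one):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

public class CaseInsensitiveDates {
    // parseCaseInsensitive() makes text elements such as month names
    // match regardless of case, which plain ofPattern(...) does not.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("d MMM yyyy")
            .toFormatter(Locale.ENGLISH);

    public static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMATTER);
    }
}
```

With this formatter, "20 JAN 2014", "20 Jan 2014" and "20 jan 2014" all parse to the same date.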
I'm new to large scale Java projects so this may be a noob question but the first step in your wiki references a jar file that doesn't exist. Does the project need to be built first? Is there any documentation on building the project? I see building the javadoc but I didn't find it very helpful.
Using a document with 'Antarctica' in it causes Elasticsearch to throw an exception:
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse [entities.geoJson
Caused by: com.spatial4j.core.exception.InvalidShapeException: Self-intersection at or near point (-7.409738314942461, -71.63108011089658, NaN)
Your project, dstl/baleen, depends on the outdated library FastClasspathScanner in the following source files:
FastClasspathScanner has been significantly reworked since the version your code depends upon:
ClassGraph is a significantly more robust library than FastClasspathScanner, and is more future-proof. All future development work will be focused on ClassGraph, and FastClasspathScanner will see no future development.
Please consider porting your code over to the new ClassGraph API, particularly if your project is in production or has downstream dependencies:
Feel free to close this bug report if this code is no longer in use. (You were sent this bug report because your project depends upon FastClasspathScanner, and has been starred by 109 users. Apologies if this bug report is not helpful.)
In the quick start information on the Baleen landing page, the link to CpeManager is broken as it's been renamed in 2.2.0 to PipelineCpeBuilder.
Hello there,
I'm new to Baleen, I read most of the documentation. Baleen is running in the background. But when I run my test application, I get some runtime exceptions. I ran my test application with -verbose on so I could see all the messages. I have copied my code at the bottom.
Error1:
This comes when I link my test application only with Baleen library. I get an exception "java.lang.NoClassDefFoundError: org/apache/http/config/Lookup".
Error2:
Then I linked httpCore4.4.x jar (which I hope I'm not supposed to do), ran, then I don't get above error. But I get another new exception java.lang.NoSuchMethodError: org.apache.http.entity.ContentType.withCharset(Ljava/lang/String;)Lorg/apache/http/entity/ContentType;
I assume that I must not link apache libraries since Baleen already has references to them and the libraries I link may cause to make conflicts between libraries. I'm on Windows 10, 64x and IDE is Netbeans. I'm using Baleen 2.2.0. Could someone help me to figure out what I'm missing here please?
Following is my test program.
package testbaleen;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.ExternalResourceFactory;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ExternalResourceDescription;
import org.apache.uima.resource.ResourceInitializationException;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;
import uk.gov.dstl.baleen.consumers.ElasticsearchRest;
import uk.gov.dstl.baleen.resources.SharedElasticsearchRestResource;
/**
*
@author Susantha
*/
public class TestBaleen {
private static Path tmpDir;
private static final String ELASTICSEARCH = "elasticsearchRest";
protected static Client client;
protected static JCas jCas;
protected static AnalysisEngine ae;
/**
@param args the command line arguments
*/
public static void main(String[] args) {
try {
tmpDir = Files.createTempDirectory("elasticsearch");
String s = tmpDir.toString();
Settings settings = Settings.builder()
.put("path.home", tmpDir.toString())
.put("http.port", "19600") //Don't use the default ports for testing purposes
.put("transport.tcp.port", "19300")
.build();
Node node = NodeBuilder.nodeBuilder()
.settings(settings)
.data(true)
.local(true)
.clusterName("SusanthaSearch")
.node();
ExternalResourceDescription erd = ExternalResourceFactory.createExternalResourceDescription(ELASTICSEARCH, SharedElasticsearchRestResource.class, SharedElasticsearchRestResource.PARAM_URL, "http://localhost:19600");
AnalysisEngineDescription aed = AnalysisEngineFactory.createEngineDescription(ElasticsearchRest.class, ELASTICSEARCH, erd);
try
{
System.out.println("Now creating the engine");
ae = AnalysisEngineFactory.createEngine(aed);
}catch(ResourceInitializationException ex)
{
System.out.println("Caught"+ex.getMessage());
}catch(Exception e)
{
System.out.println("Caught"+e.getMessage());
}
client = node.client();
System.out.println("Done and dusted...");
} catch (IOException ex) {
Logger.getLogger(TestBaleen.class.getName()).log(Level.SEVERE, null, ex);
//Logger.getLogger(ContentScrapper.class.getName()).log(Level.SEVERE, null, ex);
} catch (ResourceInitializationException ex) {
Logger.getLogger(TestBaleen.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
The default content extractor (currently StructureContentExtractor) should be selected by default when using Plankton. Also, when generating the YAML, the content extractor should not be explicitly set if it is the default.
Hi James,
I reinstalled Baleen 2.1.0 on my MacBook Pro and tried to launch the JavaDoc, but it isn't launching. There's an error, but that's all it is saying. Baleen 2.1.0 .JAR file is in the same directory as the JavaDoc executable and that doesn't launch either: The Terminal reports it was unable to access the .JAR file. What am I doing wrong please?
DM
It'd be great if the module which moves processed files was able to move them to sub-folders based on the file's source type. That way an archive can be kept, and the whole set can be reprocessed easily if required.
I did a clean pull of the GitHub repository, but when trying to build the project it failed on the baleen-graph project.
[ERROR] Failures:
[ERROR] EntityGraphFileTest.testGraphson:99->assertPathsEqual:64 expected:<...e":1},"value":""}],"[docId":[{"id":{"@type":"g:Int64","@value":3},"value":{"@type":"g:List","@value":["8b408a0c7163fdfff06ced3e80d7d2b3acd9db900905c4783c28295b8c996165"]}}],"isNormalised":[{"id":{"@type":"g:Int64","@value":4},"value":{"@type":"g:List","@value":[false]]}}],"longestValue":...> but was:<...e":1},"value":""}],"[isNormalised":[{"id":{"@type":"g:Int64","@value":3},"value":{"@type":"g:List","@value":[false]}}],"docId":[{"id":{"@type":"g:Int64","@value":4},"value":{"@type":"g:List","@value":["8b408a0c7163fdfff06ced3e80d7d2b3acd9db900905c4783c28295b8c996165"]]}}],"longestValue":...>
[ERROR] EntityGraphFileTest.testGyro:117
[INFO]
[ERROR] Tests run: 41, Failures: 2, Errors: 0, Skipped: 0
I'm building it on Ubuntu 16.04 with OpenJDK version 1.8.0_171.
UPDATE: Maven was actually using Java 9 (and not Java 8), and that was the cause of the problem. I've updated the issue title to reflect this.
$ mvn -version
Apache Maven 3.3.9
Maven home: /usr/share/maven
Java version: 9.0.4, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-9-oracle
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-29-generic", arch: "amd64", family: "unix"
Hi all,
So sorry for opening this as an issue, but despite endless Googling, I can't find anywhere to communicate with developers or users of Baleen.
Is there a channel or forum anywhere?
I've set up a basic pipeline (using the html5 consumer) but despite using the OpenNLP annotator, there are no Spans being added to the HTML output.
Just seeking advice from others.
Many thanks
NullPointerException thrown by CleanTemporal if the value hasn't been set on an entity. NPE thrown on Line 102.
The ExpandLocationToDescription annotator seems to eat up a lot of text. It can produce annotations which are basically the size of the document (if the location is the last word).
The regex has spaces in it; I wonder if it's looking for 'of' on its own rather than 'part of'.
As of Baleen 2.4, the required inputs and produced outputs of each annotator are declared in order for the pipeline orderers to function. It would be beneficial if this information could be exposed through the REST API.
The gotcha to this is that this information is only accessible once the annotator has been instantiated and configured, so it would either need to be per annotator in an existing pipeline, or allow for annotator configuration to be passed.
The above line is out by a factor of 10, and should be 0.00064516
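Assuming the constant in question is the square-inch to square-metre conversion factor, the correct value follows directly from the definition of the inch:

```java
public class AreaConversion {
    // 1 inch is exactly 0.0254 m, so 1 square inch is
    // 0.0254 * 0.0254 = 0.00064516 square metres; a value of
    // 0.0064516 would indeed be out by a factor of 10.
    // (Assumes the constant being corrected is this factor.)
    public static final double SQ_INCH_IN_SQ_METRES = 0.0254 * 0.0254;
}
```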
The repository is no longer in line with the released 2.0.0 build, so the version numbers should be updated.
uk.gov.dstl.baleen.annotators.cleaners.helpers.AbstractNestedEntities will merge based on the first entity found (or the one with the lowest confidence).
Perhaps it should also consider the semantic type: for the same confidence, the more specific type (e.g. Person rather than Entity) should win.
Against 2.3 Snapshot Release, 2016-11-01
Occurred on a document that contained dates: "20 January 2014" and "20 Jan 2014"
2016-11-23 14:03:20,106 WARN uk.gov.dstl.baleen.core.pipelines.BaleenPipeline - Pipeline ran with errors
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:893)
at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:575)
Caused by: java.lang.NullPointerException: null
at java.util.TimeZone.parseCustomTimeZone(TimeZone.java:783)
at java.util.TimeZone.getTimeZone(TimeZone.java:562)
at java.util.TimeZone.getTimeZone(TimeZone.java:516)
at uk.gov.dstl.baleen.annotators.regex.DateTime.processDayMonthTime(DateTime.java:127)
at uk.gov.dstl.baleen.annotators.regex.DateTime.doProcess(DateTime.java:46)
at uk.gov.dstl.baleen.uima.BaleenAnnotator.process(BaleenAnnotator.java:81)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
Hi,
I'd like to use Baleen on a project. I have added in the top level dependency from Maven Central to my pom file, as follows:
<dependency>
<groupId>uk.gov.dstl.baleen</groupId>
<artifactId>baleen</artifactId>
<version>2.3.0</version>
</dependency>
but the build is failing. If you use one of the child packages it seems to work. Any ideas what I am doing wrong?
[ERROR] Failed to execute goal on project graph-loader-ejb: Could not resolve dependencies for project graph-loader-ejb:ejb:1.0-SNAPSHOT: Failure to find uk.gov.dstl.baleen:baleen:jar:2.3.0 in https://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
On a freshly built test implementation, the YAML provided in the sample documentation (included below) fails with a 500 error when submitted via POST with the two form parameters to http://localhost:6413/api/1/pipelines
mapping values are not allowed here
in 'string', line 1, column 26:
collectionreader: class: FolderReader folders: - ./ ...
^
Sample YAML:
mongo:
db: baleen
host: localhost
elasticsearch:
cluster: elasticsearch
host: localhost
collectionreader:
class: FolderReader
folders:
annotators:
consumers:
Can we add a parameter to the affected annotators (any that use DateTimeFormatter I believe) to allow the configuration of the pivot point?
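For illustration, java.time exposes exactly this via DateTimeFormatterBuilder.appendValueReduced, so the annotator parameter could simply feed the base year through (pattern and method shape are illustrative, not Baleen's actual code):

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class PivotYears {
    // appendValueReduced interprets a two-digit year relative to a base
    // year: with a base of 1950, parsed years fall in 1950-2049.
    // Exposing the base as an annotator parameter would let users move
    // the pivot point.
    public static DateTimeFormatter withPivot(int baseYear) {
        return new DateTimeFormatterBuilder()
                .appendPattern("dd/MM/")
                .appendValueReduced(ChronoField.YEAR, 2, 2, baseYear)
                .toFormatter();
    }
}
```

For example, "01/06/69" parses to 1969 with a 1950 pivot but to 2069 with a 2000 pivot.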
Hi,
I tried to build from the command line, but there is no -javadoc.jar file placed in the target directory. Even if I remove the comment from the baleen-javadoc/pom.xml file to ensure that a javadoc JAR file is created, it doesn't contain all the files that are expected by the Core webapp UI.
Any advice?
Fails to build on openjdk 8. I did not spend time figuring out why but switched to oracle jdk. You should make a note in the ReadMe saying that it must be Oracle and not OpenJDK.
The MongoReader collection reader fails at line 127 due to an old version (2.2) of commons-io being built into the JAR file and not supporting this call to IOUtils.toInputStream.
This behaviour was noted having built Baleen 2.4.1-SNAPSHOT from source in Netbeans 8.2.
Hi James, Team
I'm running Eclipse and trying to install Baleen, but keep getting build failure errors for the Collection Readers onwards, which means it isn't building half of the tool.
Please can you help?
Cheers,
DM
Results :
Failed tests: testMultipleDirectories(uk.gov.dstl.baleen.collectionreaders.FolderReaderTest)
testSubDirectoriesNonRecursive(uk.gov.dstl.baleen.collectionreaders.FolderReaderTest): expected:</[]var/folders/n2/4x13f...> but was:</[private/]var/folders/n2/4x13f...>
testModifiedFile(uk.gov.dstl.baleen.collectionreaders.FolderReaderTest)
testSubDirectories(uk.gov.dstl.baleen.collectionreaders.FolderReaderTest)
testCreateFile(uk.gov.dstl.baleen.collectionreaders.FolderReaderTest)
Tests run: 19, Failures: 5, Errors: 0, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Baleen ............................................ SUCCESS [ 2.281 s]
[INFO] Baleen Core ....................................... SUCCESS [ 28.455 s]
[INFO] Baleen UIMA ....................................... SUCCESS [ 10.691 s]
[INFO] Baleen Resources .................................. SUCCESS [ 44.327 s]
[INFO] Baleen Annotators ................................. SUCCESS [01:07 min]
[INFO] Baleen Collection Readers ......................... FAILURE [ 27.084 s]
[INFO] Baleen Consumers .................................. SKIPPED
[INFO] Baleen History .................................... SKIPPED
[INFO] Baleen Runner ..................................... SKIPPED
[INFO] Baleen Javadoc .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:00 min
[INFO] Finished at: 2015-10-19T23:10:03+00:00
[INFO] Final Memory: 21M/208M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project baleen-collectionreaders: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/User1/Desktop/baleen-master/baleen/baleen-collectionreaders/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :baleen-collectionreaders
This is caused by two versions of the liblinear library in the dependency hierarchy; the 1.8 version used by the maltparser wins.
The current landing page for the Baleen server (the page that appears when you go to http://localhost:6413) references baleen-2.5.0-SNAPSHOT.jar in the commands, but should now reference baleen-2.6.0-SNAPSHOT.jar (and just baleen-2.6.0.jar once released).
Unless the Elasticsearch mapping explicitly defines the entities array as nested, we lose the ability to search documents for, as an example, Person entities with the value Holmes.
The mapping needs changing in AbstractElasticsearchConsumer to define entities as a nested object, as per https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-mapping.html
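A sketch of the relevant part of such a mapping (the inner field names are illustrative, and the exact syntax varies by Elasticsearch version):

```json
{
  "properties": {
    "entities": {
      "type": "nested",
      "properties": {
        "type":  { "type": "keyword" },
        "value": { "type": "keyword" }
      }
    }
  }
}
```

With "type": "nested", a query can require that the type and value conditions match within the same entity object, rather than anywhere in the flattened array.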
The Baleen "requirements" suggest Oracle JDK but also just say that Baleen works with Java 8.
I've tried compiling with OpenJDK, but Maven fails because of a test failure. Baleen doesn't require JavaFX anywhere, but the testComponents() unit test for AbstractComponentApiServletTest uses JavaFX for its example data. OpenJDK doesn't include JavaFX.
I've managed to build Baleen by commenting out the entire contents of that test. It would be helpful if Baleen's tests didn't depend on proprietary classes that aren't in OpenJDK when the core functionality does not require those classes.
Typo on ouput/output