scify / JedAIToolkit
An open source, high scalability toolkit in Java for Entity Resolution.
Home Page: http://jedai.scify.org
License: Apache License 2.0
The second URL of every row is mapped to 0.
So all of the records in the first column of the CSV file are considered duplicates (via transitive closure).
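That transitive closure can be sketched with a tiny union-find (a minimal illustration, not JedAI code; the record ids are made up):

```java
import java.util.*;

// Minimal union-find sketch of the transitive closure: if every row pairs its
// first record with the same second record (id 0), then all first-column
// records collapse into a single duplicate cluster.
public class TransitiveClosureSketch {
    static int find(int[] parent, int x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }

    // Returns the number of clusters after union-ing every duplicate pair.
    static int clusterCount(int[][] pairs, int numRecords) {
        int[] parent = new int[numRecords];
        for (int i = 0; i < numRecords; i++) parent[i] = i;
        for (int[] p : pairs) parent[find(parent, p[0])] = find(parent, p[1]);
        Set<Integer> roots = new HashSet<>();
        for (int i = 0; i < numRecords; i++) roots.add(find(parent, i));
        return roots.size();
    }

    public static void main(String[] args) {
        // Records 1..3 are each paired with record 0 by the ground-truth mapping.
        System.out.println(clusterCount(new int[][]{{1, 0}, {2, 0}, {3, 0}}, 4)); // prints 1
    }
}
```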
I have my own custom CSV data files for both the dataset and the ground truth.
Can anyone help me use these files to get results?
They throw some errors when given as input:

Exception in thread "main" java.lang.IllegalArgumentException: loops not allowed
    at org.jgrapht.graph.AbstractBaseGraph.addEdge(AbstractBaseGraph.java:218)
    at org.scify.jedai.datareader.groundtruthreader.GtCSVReader.getDuplicatePairs(GtCSVReader.java:206)
    at org.scify.jedai.datareader.groundtruthreader.AbstractGtReader.getDuplicatePairs(AbstractGtReader.java:58)
    at org.scify.jedai.workflowbuilder.Main.main(Main.java:254)

Can anyone help me?
There is a bug in the code that prevents ground truth in CSV format from being read. I tried the samples provided, and the web-based Docker image failed to load them. I downloaded the code, ran it step by step, and I think the problem is in GtCSVReader: the reading part takes strings like "thisisastring" (with the quotes) where only thisisastring should be read. I tried adding nextLine[0] = nextLine[0].substring(1, nextLine[0].length()-1); on line 200 of that file, but with no success so far. I need to make it work to test some CSV entity matchings, so maybe somebody has a fix for this issue?
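One defensive variant of that substring workaround (a sketch of the idea, not the actual GtCSVReader fix) strips the surrounding quotes only when both are actually present, so unquoted fields are left untouched:

```java
// Sketch of a defensive quote-stripping helper: remove surrounding double
// quotes only when the field actually starts and ends with one.
public class QuoteStrip {
    static String stripQuotes(String field) {
        if (field != null && field.length() >= 2
                && field.charAt(0) == '"' && field.charAt(field.length() - 1) == '"') {
            return field.substring(1, field.length() - 1);
        }
        return field; // unquoted fields pass through unchanged
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"thisisastring\"")); // prints thisisastring
        System.out.println(stripQuotes("unquoted"));          // prints unquoted
    }
}
```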
Making the blocking- and clustering-related classes serializable enables using jedai-core in applications that run on Hadoop and Spark clusters.
Having setters for the configurable fields adds more flexibility in creating the blocking and clustering objects.
I got the following error when I tried blocking with schema clusters:
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
    at org.scify.jedai.blockbuilding.AbstractBlockBuilding.lambda$parseIndex$10(AbstractBlockBuilding.java:167)
    at java.base/java.util.HashMap.forEach(HashMap.java:1336)
    at org.scify.jedai.blockbuilding.AbstractBlockBuilding.parseIndex(AbstractBlockBuilding.java:164)
    at org.scify.jedai.blockbuilding.AbstractBlockBuilding.readBlocks(AbstractBlockBuilding.java:196)
    at org.scify.jedai.blockbuilding.AbstractBlockBuilding.getBlocks(AbstractBlockBuilding.java:96)
    at org.scify.jedai.gui.utilities.WorkflowManager.runBlockBuilding(WorkflowManager.java:824)
    at org.scify.jedai.gui.utilities.WorkflowManager.runBlockingBasedWorkflow(WorkflowManager.java:896)
    at org.scify.jedai.gui.utilities.WorkflowManager.executeFullBlockingBasedWorkflow(WorkflowManager.java:393)
    at org.scify.jedai.gui.utilities.WorkflowManager.executeFullWorkflow(WorkflowManager.java:695)
    at org.scify.jedai.gui.controllers.steps.CompletedController.lambda$runAlgorithmBtnHandler$6(CompletedController.java:316)
    at java.base/java.lang.Thread.run(Thread.java:834)
There is a String split operation in the parseIndex function that is not working properly:
final String[] entropyString = key.split(CLUSTER_SUFFIX);
The delimiters used in key are equivalent to CLUSTER_PREFIX, not CLUSTER_SUFFIX, and they contain a dollar sign that has to be escaped. I worked around the issue by changing the above line to
final String[] entropyString = key.split("#\\$!cl");
I'd suggest changing the values of the prefix and suffix to something that is regex-compatible; the workaround above is less readable, after all.
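An alternative that keeps the current prefix value readable is Pattern.quote, which escapes the literal for use as a regex. A small sketch (assuming, from the workaround above, that the literal delimiter is "#$!cl"):

```java
import java.util.regex.Pattern;

// String.split takes a regular expression, so a delimiter containing '$'
// must be escaped. Pattern.quote does the escaping and makes the intent clear.
public class SplitLiteral {
    public static void main(String[] args) {
        String key = "entropy#$!clsomeCluster";
        String[] parts = key.split(Pattern.quote("#$!cl"));
        System.out.println(parts[0]); // prints entropy
        System.out.println(parts[1]); // prints someCluster
    }
}
```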
We're using jedai-core (not jedai-ui) in our application; we ran into some out-of-memory errors and started profiling. The largest chunk of memory came from SimilarityPairs. We experimented with reducing the size of the similarities from double to float, which cut the memory footprint by about 25% (630 MB -> 470 MB).
I'm assuming we don't need the extra precision afforded by double; is that correct?
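A back-of-envelope sketch of where the saving comes from (the comparison count below is hypothetical, not taken from the application above): a float is 4 bytes versus 8 for a double, so the raw similarity payload halves; the remaining structures in SimilarityPairs would explain why the observed saving was ~25% rather than 50%.

```java
// Rough payload sizing for an array of N similarity values.
public class SimilarityFootprint {
    static long payloadBytes(long n, int bytesPerValue) {
        return n * bytesPerValue;
    }

    public static void main(String[] args) {
        long n = 50_000_000L; // hypothetical number of stored comparisons
        System.out.println(payloadBytes(n, Double.BYTES) / (1024 * 1024) + " MB as double");
        System.out.println(payloadBytes(n, Float.BYTES) / (1024 * 1024) + " MB as float");
    }
}
```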
Hi, I found that the dataset sizes in this repository do not seem to match the originals, and I would like to know whether the data has been processed. For example, the original Amazon-Google has 1363 and 3226 entities and 1300 matches, but the numbers are smaller in this project.
Also, the dirty datasets seem to just mix the two tables together? Is there any other processing?
On the Similarity Join page of the UI, when the "Select attribute of Dataset 1" and "Select attribute of Dataset 2" values are given in uppercase (e.g. "INSTANCE ID"), the algorithm fails to match results. On further investigation I found that in the class AbstractSimilarityJoin, in the method getAttributeValue(String attributeName, EntityProfile profile), on line 67 attributeName should be changed to attributeName.toLowerCase() so that attribute names are handled properly; otherwise the if condition is simply never satisfied.
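The suggested fix can be sketched as a case-insensitive lookup (names and the Map-based profile below are illustrative, not the actual AbstractSimilarityJoin internals):

```java
import java.util.Map;

// Compare attribute names case-insensitively instead of assuming the
// caller already lower-cased them.
public class AttributeLookup {
    static String getAttributeValue(String attributeName, Map<String, String> profile) {
        for (Map.Entry<String, String> e : profile.entrySet()) {
            if (e.getKey().equalsIgnoreCase(attributeName)) {
                return e.getValue();
            }
        }
        return ""; // attribute not found: empty value instead of a silent skip
    }

    public static void main(String[] args) {
        Map<String, String> profile = Map.of("instance id", "rec-42");
        System.out.println(getAttributeValue("INSTANCE ID", profile)); // prints rec-42
    }
}
```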
Hello,
I am trying to run the Web-based application for a data-matching task. I have two tables in CSV format: the first contains 1.2k rows and the second contains 7k queries. I want to use JedAI to match each query with a row from the first table. When I run a blocking-based workflow, the process gets stuck while loading the tables.
I am a bit lost about how to configure the model. So far I have tried the settings in the video tutorial and some others, but the application never generates any output. I attach the tables with this message; please let me know if there is anything wrong with the way I generated them.
I am trying to download the pre-compiled version from the http://jedai.scify.org website.
When I click on "Download desktop app" for both the "Desktop application for Entity Resolution" and the "Workbench tool", I get a "Page Not Found" on GitHub.
I created an issue for this because the webpage doesn't have any contact information. :/
I tried compiling it on my machine, but the build slowed to a crawl and took over an hour, so I decided to download the precompiled JARs instead; that's why I wanted the download links.
Constructor parameters are not assigned to the class properties.
I cloned the project locally and followed the steps listed in the README, but the build fails with the error below:
git clone https://github.com/scify/JedAIToolkit.git
cd JedAIToolkit
git submodule update --init
mvn clean package
[INFO] --- maven-assembly-plugin:2.2-beta-5:single (default) @ jedai-ui ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] jedai .............................................. SUCCESS [ 0.259 s]
[INFO] jedai-core ......................................... SUCCESS [ 59.511 s]
[INFO] jedai-ui ........................................... FAILURE [ 6.408 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:06 min
[INFO] Finished at: 2018-12-11T15:42:46-05:00
[INFO] Final Memory: 42M/406M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:single (default) on project jedai-ui: Error reading assemblies: Error locating assembly descriptor: assembly.xml
[ERROR]
[ERROR] [1] [INFO] Searching for file location: C:\Users\Yeikel\Documents\JedAIToolkit\jedai-ui\assembly.xml
[ERROR]
[ERROR] [2] [INFO] File: C:\Users\Yeikel\Documents\JedAIToolkit\jedai-ui\assembly.xml does not exist.
[ERROR]
[ERROR] [3] [INFO] Invalid artifact specification: 'assembly.xml'. Must contain at least three fields, separated by ':'.
[ERROR]
[ERROR] [4] [INFO] Failed to resolve classpath resource: assemblies/assembly.xml from classloader: ClassRealm[plugin>org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5, parent: sun.misc.Launcher$AppClassLoader@33909752]
[ERROR]
[ERROR] [5] [INFO] Failed to resolve classpath resource: assembly.xml from classloader: ClassRealm[plugin>org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5, parent: sun.misc.Launcher$AppClassLoader@33909752]
[ERROR]
[ERROR] [6] [INFO] File: C:\Users\Yeikel\Documents\JedAIToolkit\assembly.xml does not exist.
[ERROR]
[ERROR] [7] [INFO] Building URL from location: assembly.xml
[ERROR] Error:
[ERROR] java.net.MalformedURLException: no protocol: assembly.xml
[ERROR]     at java.net.URL.<init>(URL.java:593)
[ERROR]     at java.net.URL.<init>(URL.java:490)
[ERROR]     at java.net.URL.<init>(URL.java:439)
[ERROR]     at org.apache.maven.shared.io.location.URLLocatorStrategy.resolve(URLLocatorStrategy.java:54)
[ERROR]     at org.apache.maven.shared.io.location.Locator.resolve(Locator.java:81)
[ERROR]     at org.apache.maven.plugin.assembly.io.DefaultAssemblyReader.addAssemblyFromDescriptor(DefaultAssemblyReader.java:309)
[ERROR]     at org.apache.maven.plugin.assembly.io.DefaultAssemblyReader.readAssemblies(DefaultAssemblyReader.java:125)
[ERROR]     at org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:352)
[ERROR]     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR]     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
[ERROR]     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
[ERROR]     at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
[ERROR]     at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
[ERROR]     at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
[ERROR]     at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
[ERROR]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR]     at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :jedai-ui
The link for DBPedia in data/README.md doesn't work.
Hello,
In the entity matching step I'm trying to combine different bag models with similarity measures for the dirty dataset "movies" in the data folder.
Unfortunately I'm unable to get high recall and high precision, could you give a good "recipe" to get good results for that dataset?
Thank you
I found some cases where data pairs showed up in the end results as false negative and true positive simultaneously.
Its cause is in the class UnilateralDuplicatePropagation, in the following functions:
public boolean isSuperfluous(int entityId1, int entityId2) {
    final IdDuplicates duplicatePair1 = new IdDuplicates(entityId1, entityId2);
    final IdDuplicates duplicatePair2 = new IdDuplicates(entityId2, entityId1);
    if (duplicates.contains(duplicatePair1)
            || duplicates.contains(duplicatePair2)) {
        if (entityId1 < entityId2) {
            detectedDuplicates.add(duplicatePair1);
        } else {
            detectedDuplicates.add(duplicatePair2);
        }
    }
    return false;
}

public Set<IdDuplicates> getFalseNegatives() {
    final Set<IdDuplicates> falseNegatives = new HashSet<>(duplicates);
    falseNegatives.removeAll(detectedDuplicates);
    return falseNegatives;
}
Only one of two possible combinations of IDs is written to detectedDuplicates, but superfluous combinations still exist in duplicates. When removing detectedDuplicates from duplicates to create falseNegatives, those superfluous combinations remain and are exported as false negatives, while the combinations in detectedDuplicates are exported as true positives.
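One way to avoid this, under the diagnosis above, is to normalize every pair to a canonical (min, max) order before storing it in either set, so the set difference cannot leave a reversed "superfluous" combination behind. A self-contained sketch (the Pair record stands in for IdDuplicates; names are illustrative):

```java
import java.util.*;

// Normalize pairs to (min, max) order so removeAll matches both orderings.
public class NormalizedPairs {
    record Pair(int a, int b) {
        static Pair of(int x, int y) { return x <= y ? new Pair(x, y) : new Pair(y, x); }
    }

    static Set<Pair> falseNegatives(Set<Pair> duplicates, Set<Pair> detected) {
        Set<Pair> result = new HashSet<>(duplicates);
        result.removeAll(detected);
        return result;
    }

    public static void main(String[] args) {
        Set<Pair> duplicates = Set.of(Pair.of(7, 3), Pair.of(1, 2));
        Set<Pair> detected = Set.of(Pair.of(3, 7)); // same pair, reversed on input
        // Only the genuinely missed pair (1, 2) remains as a false negative.
        System.out.println(falseNegatives(duplicates, detected).size()); // prints 1
    }
}
```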
If records[k].length is less than records[candId].length, we get an array index out of bounds, since requireOverlaps is created with the k-th record's size, which might be less than records[candId].length.
The PrintToFile.toCSV() method should output the original entity URLs, and should use a format that is easier to import into a database, e.g. three columns: cluster_id, dataset, entity_url.
Hi, I was wondering if you have the dirty datasets available in CSV format? Otherwise I can just write a quick script that reads the JSON files and converts them myself, but I figured there is no harm in asking first! Thanks in advance.
It looks like the groundtruth file is wrong.
Hi,
I'm unable to build the project.
The following dependencies can't be found:
The first one can't be found at all, and the other two seem to be on an unreachable repository: http://backend1.scify.org:60004/artifactory/pub-release-local
mvn clean install -U
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] jedai [pom]
[INFO] jedai-core [jar]
[INFO] jedai-ui [jar]
[INFO]
[INFO] ---------------------------< gr.scify:jedai >---------------------------
[INFO] Building jedai 1.3 [1/3]
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ jedai ---
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ jedai ---
[INFO] Installing C:\projet\JedAIToolkit\pom.xml to C:\Users\nicolas.lledo\.m2\repository\gr\scify\jedai\1.3\jedai-1.3.pom
[INFO]
[INFO] ------------------------< gr.scify:jedai-core >-------------------------
[INFO] Building jedai-core 1.3 [2/3]
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/com/esotericsoftware/minlog/minlog/1.2-slf4j-jdanbrown-0/minlog-1.2-slf4j-jdanbrown-0.pom
[WARNING] The POM for com.esotericsoftware.minlog:minlog:jar:1.2-slf4j-jdanbrown-0 is missing, no dependency information available
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/gr/demokritos/JInsect/1.1/JInsect-1.1.pom
[WARNING] The POM for gr.demokritos:JInsect:jar:1.1 is missing, no dependency information available
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/salvo/jesus/OpenJGraph/1.1/OpenJGraph-1.1.pom
[WARNING] The POM for salvo.jesus:OpenJGraph:jar:1.1 is missing, no dependency information available
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/com/esotericsoftware/minlog/minlog/1.2-slf4j-jdanbrown-0/minlog-1.2-slf4j-jdanbrown-0.jar
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/salvo/jesus/OpenJGraph/1.1/OpenJGraph-1.1.jar
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/gr/demokritos/JInsect/1.1/JInsect-1.1.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for jedai 1.3:
[INFO]
[INFO] jedai .............................................. SUCCESS [ 0.452 s]
[INFO] jedai-core ......................................... FAILURE [ 1.671 s]
[INFO] jedai-ui ........................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.393 s
[INFO] Finished at: 2019-02-27T17:50:24+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project jedai-core: Could not resolve dependencies for project gr.scify:jedai-core:jar:1.3: The following artifacts could not be resolved: com.esotericsoftware.minlog:minlog:jar:1.2-slf4j-jdanbrown-0, gr.demokritos:JInsect:jar:1.1, salvo.jesus:OpenJGraph:jar:1.1: Could not find artifact com.esotericsoftware.minlog:minlog:jar:1.2-slf4j-jdanbrown-0 in nexus.somecompany.com (http://nexus.somecompany.com/repository/maven-public/) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :jedai-core
Hello,
I'm working on converting the DBPedia dataset into a format accessible without Java.
I have already converted cleanDBPedia1/2.
However, I do not understand the ground-truth format.
The profiles have attributes and a URI, while the pairs in the ground truth consist of numbers.
When I interpret these numbers as offsets into either file, I end up with non-matching pairs.
I wrote the entities into the files in the order they appeared in the deserialized Java list.
How do I find matching pairs / understand the ground truth?
Kind regards
I am looking at the source code of JedAIToolkit on GitHub, but I am not able to find the sample CSV files used for testing.
Could I get the cd_gold.csv and cd.csv files that TestGtCSVReader.java and TestEntityCSVReader.java use for testing?
Selecting Abt-Buy in C-C mode, it takes amazonProfiles as the 2nd dataset.
Selecting amazonProfiles, it takes amazonGpIdDuplicates as the ground truth.
StandardBlocking.getTokens() throws null pointer exception when input parameter is null.
We ought to stop null values from being added to the EntityProfile when reading from a database.
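The guard could look something like this (an illustrative sketch with a plain Map standing in for EntityProfile, not the actual DB-reader code):

```java
import java.util.*;

// Skip null column names/values instead of adding them to the profile, so
// downstream tokenizers (e.g. StandardBlocking.getTokens) never see null.
public class NullSafeProfile {
    static void addAttribute(Map<String, String> profile, String name, String value) {
        if (name == null || value == null) {
            return; // drop nulls read from the database
        }
        profile.put(name, value);
    }

    public static void main(String[] args) {
        Map<String, String> profile = new HashMap<>();
        addAttribute(profile, "title", "JedAI");
        addAttribute(profile, "year", null); // ignored
        System.out.println(profile.size()); // prints 1
    }
}
```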
I cannot find any documentation or examples of a standard workflow implemented in Python or Java in your repository. Do either of these exist? If so, where can I find them? If not, it would be very useful to have them: a new user of your tool, like me, currently has to go through all of the Java classes to learn how to use it, which takes a lot of time.
I had a look at the code of the SiGMa similarity in the class CharacterNGramsWithGlobalWeights, and it seems to be exactly the same code as the Generalized Jaccard similarity. Am I missing something, or is SiGMa not really implemented?
This issue arose when I attempted to reproduce the workflow in: org.scify.jedai.demoworkflows.CsvDblpAcm.java.
During the reading of the ground truth in DBLP-ACM_perfectMapping.csv (specifically the GtCSVReader.getDuplicatePairs method), the detection of connected components by the jgrapht package seems not to work.
For some reason I obtain a single cluster of size 2225 and then 5375 more clusters of size 1, which is obviously incorrect, since the CSV contains about 2225 unique pairs (which should in turn produce 2225 clusters of size 2).
Have you seen this problem before? Maybe the jgrapht package expects a different format than it did previously?
We're using jedai-core in our application, and we ran into cases where the number of executed comparisons in ComparisonIterator went over the number of total comparisons. We identified that this happens because executedComparisons and totalComparisons are floats; changing them to ints fixed the problem. (In Java, comparing floats for exact equality is generally discouraged anyway.)
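The underlying reason the float counters drift is worth a quick demonstration: above 2^24, a float cannot represent every integer, so incrementing by 1.0f eventually stops changing the value, while an int (or long) counter keeps counting exactly.

```java
// Show that a float counter silently stalls at 2^24 while an int does not.
public class FloatCounter {
    public static void main(String[] args) {
        float f = 16_777_216f; // 2^24
        System.out.println(f + 1f == f); // prints true: the increment is lost
        int i = 16_777_216;
        System.out.println(i + 1 == i);  // prints false: ints count exactly
    }
}
```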
Hi, in /maven-plugins/sitegen-maven-plugin there is a dependency, org.apache.httpcomponents:httpclient-cache:jar:4.2.6, that invokes a vulnerable method.
The affected version range of this CVE is [,4.5.13).
After further analysis, in this project the main API called is org.apache.http.client.utils.URIUtils: extractHost(java.net.URI)Lorg.apache.http.HttpHost
Risk method repair link: GitHub
CVE bug invocation path --
Path length: 7
org.scify.jedai.datawriter.BlocksPerformanceWriter: printDetailedResultsToSPARQL(java.util.List,java.util.List,java.lang.String,java.lang.String)V /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.jena.sparql.modify.UpdateProcessRemoteForm: execute()V /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.jena.riot.web.HttpOp: execHttpPostForm(java.lang.String,org.apache.jena.sparql.engine.http.Params,java.lang.String,org.apache.jena.riot.web.HttpResponseHandler,org.apache.http.client.HttpClient,org.apache.http.protocol.HttpContext,org.apache.jena.atlas.web.auth.HttpAuthenticator)V /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.jena.riot.web.HttpOp: exec(java.lang.String,org.apache.http.client.methods.HttpUriRequest,java.lang.String,org.apache.jena.riot.web.HttpResponseHandler,org.apache.http.client.HttpClient,org.apache.http.protocol.HttpContext,org.apache.jena.atlas.web.auth.HttpAuthenticator)V /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.http.impl.client.AbstractHttpClient: execute(org.apache.http.client.methods.HttpUriRequest,org.apache.http.protocol.HttpContext)Lorg.apache.http.HttpResponse; /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.http.impl.client.AbstractHttpClient: determineTarget(org.apache.http.client.methods.HttpUriRequest)Lorg.apache.http.HttpHost; /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.http.client.utils.URIUtils: extractHost(java.net.URI)Lorg.apache.http.HttpHost;
Dependency tree--
[INFO] org.scify:jedai-core:jar:3.2.1
[INFO] +- org.jgrapht:jgrapht-core:jar:1.4.0:compile
[INFO] | \- org.jheaps:jheaps:jar:0.11:compile
[INFO] +- net.sf.trove4j:trove4j:jar:3.0.3:compile
[INFO] +- com.esotericsoftware:minlog:jar:1.3.1:compile
[INFO] +- info.debatty:java-lsh:jar:0.11:compile
[INFO] | \- info.debatty:java-string-similarity:jar:0.12:compile
[INFO] +- org.apache.commons:commons-lang3:jar:3.4:compile
[INFO] +- org.apache.commons:commons-math3:jar:3.1.1:compile
[INFO] +- org.apache.jena:jena-arq:jar:3.1.0:compile
[INFO] | +- org.apache.jena:jena-core:jar:3.1.0:compile
[INFO] | | +- org.apache.jena:jena-iri:jar:3.1.0:compile
[INFO] | | +- xerces:xercesImpl:jar:2.11.0:compile
[INFO] | | | \- xml-apis:xml-apis:jar:1.4.01:compile
[INFO] | | +- commons-cli:commons-cli:jar:1.3:compile
[INFO] | | \- org.apache.jena:jena-base:jar:3.1.0:compile
[INFO] | | \- com.github.andrewoma.dexx:collection:jar:0.6:compile
[INFO] | +- org.apache.jena:jena-shaded-guava:jar:3.1.0:compile
[INFO] | +- org.apache.httpcomponents:httpclient:jar:4.2.6:compile
[INFO] | | +- org.apache.httpcomponents:httpcore:jar:4.2.5:compile
[INFO] | | \- commons-codec:commons-codec:jar:1.6:compile
[INFO] | +- com.github.jsonld-java:jsonld-java:jar:0.7.0:compile
[INFO] | | +- com.fasterxml.jackson.core:jackson-core:jar:2.3.3:compile
[INFO] | | +- com.fasterxml.jackson.core:jackson-databind:jar:2.3.3:compile
[INFO] | | | \- com.fasterxml.jackson.core:jackson-annotations:jar:2.3.0:compile
[INFO] | | \- commons-io:commons-io:jar:2.4:compile
[INFO] | +- org.apache.httpcomponents:httpclient-cache:jar:4.2.6:compile
[INFO] | +- org.apache.thrift:libthrift:jar:0.9.2:compile
[INFO] | +- org.slf4j:jcl-over-slf4j:jar:1.7.20:compile
[INFO] | +- org.apache.commons:commons-csv:jar:1.0:compile
[INFO] | \- org.slf4j:slf4j-api:jar:1.7.20:compile
[INFO] +- org.apache.jena:jena-cmds:jar:3.1.0:compile
[INFO] | +- org.apache.jena:apache-jena-libs:pom:3.1.0:compile
[INFO] | | \- org.apache.jena:jena-tdb:jar:3.1.0:compile
[INFO] | +- org.slf4j:slf4j-log4j12:jar:1.7.20:compile
[INFO] | \- log4j:log4j:jar:1.2.17:compile
[INFO] +- com.opencsv:opencsv:jar:3.7:compile
[INFO] +- org.jdom:jdom2:jar:2.0.6:compile
[INFO] +- org.scify:JInsect:jar:1.1:compile
[INFO] | \- org.scify:OpenJGraph:jar:1.1:compile
[INFO] +- org.rdfhdt:hdt-java-core:jar:1.1:compile
[INFO] | +- com.beust:jcommander:jar:1.32:compile
[INFO] | +- org.rdfhdt:hdt-api:jar:1.1:compile
[INFO] | \- org.apache.commons:commons-compress:jar:1.6:compile
[INFO] | \- org.tukaani:xz:jar:1.4:compile
[INFO] +- com.google.guava:guava-testlib:jar:30.1.1-jre:test
[INFO] | +- com.google.code.findbugs:jsr305:jar:3.0.2:test
[INFO] | +- org.checkerframework:checker-qual:jar:3.8.0:test
[INFO] | +- com.google.errorprone:error_prone_annotations:jar:2.5.1:test
[INFO] | +- com.google.j2objc:j2objc-annotations:jar:1.3:test
[INFO] | +- com.google.guava:guava:jar:30.1.1-jre:test
[INFO] | | +- com.google.guava:failureaccess:jar:1.0.1:test
[INFO] | | \- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:test
[INFO] | \- junit:junit:jar:4.13.2:test
[INFO] | \- org.hamcrest:hamcrest-core:jar:1.3:test
[INFO] +- org.hamcrest:hamcrest:jar:2.2:test
[INFO] +- org.junit.jupiter:junit-jupiter-api:jar:5.7.2:test
[INFO] | +- org.apiguardian:apiguardian-api:jar:1.1.0:test
[INFO] | +- org.opentest4j:opentest4j:jar:1.2.0:test
[INFO] | \- org.junit.platform:junit-platform-commons:jar:1.7.2:test
[INFO] \- org.junit.jupiter:junit-jupiter-engine:jar:5.7.2:test
[INFO] \- org.junit.platform:junit-platform-engine:jar:1.7.2:test
Suggested solution:
Update the dependency version.
Thank you very much.
Users of jedai-core are unable to extend the library with a custom similarity metric or entity matching method because of the enums defined in the project (e.g. SimilarityMetric, EntityMatchingMethod, BlockCleaningMethod, etc.). If these features instead used an extension mechanism (for example, java.util.ServiceLoader or something equivalent), custom features would be possible.
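A minimal sketch of what the ServiceLoader route could look like (the provider interface and its methods are hypothetical, not jedai-core API):

```java
import java.util.ServiceLoader;

// Discover similarity-metric implementations from the classpath instead of a
// closed enum. Providers register via a
// META-INF/services/MetricLoader$SimilarityMetricProvider file.
public class MetricLoader {
    public interface SimilarityMetricProvider {
        String name();
        double similarity(String a, String b);
    }

    public static void main(String[] args) {
        ServiceLoader<SimilarityMetricProvider> loader =
                ServiceLoader.load(SimilarityMetricProvider.class);
        // No providers are registered in this self-contained sketch,
        // so the loop body never runs.
        for (SimilarityMetricProvider p : loader) {
            System.out.println("found metric: " + p.name());
        }
    }
}
```

With this mechanism, a user jar could ship its own provider and have it picked up without modifying jedai-core.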
To make it easier to consume, can the project be deployed to Maven Central?
Using the library from the CLI (Linux), it raises this exception:
Please choose one of the available Clean-clean ER datasets:
1 - Abt-Buy
2 - DBLP-ACM
3 - DBLP-Scholar
4 - Amazon-Google Products
5 - IMDB-DBPedia Movies
1
Abt-Buy has been selected!
0 [main] ERROR com.esotericsoftware.minlog - Error in data reading
java.io.FileNotFoundException: data/cleanCleanErDatasets/amazonProfiles (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at org.scify.jedai.datareader.AbstractReader.loadSerializedObject(AbstractReader.java:54)
at org.scify.jedai.datareader.entityreader.EntitySerializationReader.getEntityProfiles(EntitySerializationReader.java:48)
at org.scify.jedai.workflowbuilder.Main.main(Main.java:241)
Exception in thread "main" java.lang.NullPointerException
at java.util.ArrayList.addAll(ArrayList.java:581)
at org.scify.jedai.datareader.entityreader.EntitySerializationReader.getEntityProfiles(EntitySerializationReader.java:48)
at org.scify.jedai.workflowbuilder.Main.main(Main.java:241)
I am using the following release, and I am trying jedaiDesktopApp-1.1.jar with the following datasets (from the samples):
abtBuyIdDuplicates (for D1)
abtBuyProfiles (for the ground-truth file)
But I get the following error:
I tried with CSV files and I also get the same error.
If another project is going to depend on jedai-core, having the transitive dependencies assembled inside jedai-core has the potential to conflict if different versions of those same transitive dependencies are needed for the other project. Since jedai-ui is already assembling transitive dependencies, removing transitive dependencies from jedai-core should not have any effect on the UI.
I get the following error after specifying input sources and then pressing "Next" button in Data Reading Step in JedAI UI:
The input files could not be read successfully.
Details: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Character
(java.lang.Character cannot be cast to java.lang.String)
In the terminal of Docker's Web Application I have the following:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Character
at kr.di.uoa.gr.jedaiwebapp.models.Dataset.<init>(Dataset.java:86) ~[classes!/:0.0.1-SNAPSHOT]
at kr.di.uoa.gr.jedaiwebapp.controllers.WorkflowController.validate_DataRead(WorkflowController.java:75) ~[classes!/:0.0.1-SNAPSHOT]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_212]
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190) ~[spring-web-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
...
How do I create a ground-truth file?
Hi, I tried some tests with the JedAI tool.
This tool is useful for my job and I think it has great potential.
I've downloaded the attached files in NT format: source.nt, target.nt.
As a first step, I successfully executed the TestRdfReader class from the test package for both datasets. Then I tried to execute the TestGtRDFReader class with the same datasets, but I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: loops not allowed
    at org.jgrapht.graph.AbstractBaseGraph.addEdge(AbstractBaseGraph.java:203)
    at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.performReading(GtRDFReader.java:236)
    at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.getDuplicatePairs(GtRDFReader.java:92)
    at org.scify.jedai.datareader.groundtruthreader.AbstractGtReader.getDuplicatePairs(AbstractGtReader.java:57)
    at org.scify.jedai.datareader.TestGtRDFReader.main(TestGtRDFReader.java:39)

Thanks in advance!
Hi!
I have successfully made the Web application work, and I have also made my first successful steps using JedAI with Python.
Now I want to do it programmatically with Python, without the Web application: I want to apply the full workflow using only the terminal and VS Code.
However, I couldn't find any detailed documentation on how to do blocking, cleaning, etc. programmatically.