Cleansing package for Stratosphere Sopremo that provides operators for data scrubbing, duplicate detection, record linkage, and data fusion.
sopremo-cleansing's Issues
EE (entity extraction) should support keys in arrays
Scrubbing should support complex objects
$data_scrubbed = scrub $data with rules {
    worksFor: {
        name: notContainedIn(["name_B"]),
        ceos: required
    }
};
Testcase:
sopremo-cleansing/src/test/java/eu/stratosphere/sopremo/cleansing/scrubbing/ScrubbingComplexTest.java
Position in RulebasedScrubbing marked with TODO
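The complex-object case can be illustrated outside Sopremo with a minimal Java sketch. It assumes that records are modeled as nested `Map<String, Object>` values, that each rule is addressed by a dotted path such as `worksFor.name`, and that a record simply fails scrubbing when any rule is violated. The class and method names (`NestedScrubSketch`, `resolve`, `passes`) are hypothetical and not part of the Sopremo API; the real operator may also repair values instead of only filtering.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical sketch, not the Sopremo API: scrubbing rules addressed by
// dotted paths and applied to nested records.
public class NestedScrubSketch {

    // Rule: the value at the path must be present (non-null).
    static Predicate<Object> required() {
        return v -> v != null;
    }

    // Rule: the value must not be one of the forbidden values (missing values pass).
    static Predicate<Object> notContainedIn(Collection<?> forbidden) {
        return v -> v == null || !forbidden.contains(v);
    }

    // Walks a dotted path; returns null if a step is missing or not an object.
    @SuppressWarnings("unchecked")
    static Object resolve(Map<String, Object> record, String path) {
        Object current = record;
        for (String step : path.split("\\.")) {
            if (!(current instanceof Map)) return null;
            current = ((Map<String, Object>) current).get(step);
        }
        return current;
    }

    // A record passes only if every rule holds for the value at its path.
    static boolean passes(Map<String, Object> record, Map<String, Predicate<Object>> rules) {
        for (Map.Entry<String, Predicate<Object>> rule : rules.entrySet()) {
            if (!rule.getValue().test(resolve(record, rule.getKey()))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Mirrors the worksFor rules from the issue above.
        Map<String, Predicate<Object>> rules = new LinkedHashMap<>();
        rules.put("worksFor.name", notContainedIn(List.of("name_B")));
        rules.put("worksFor.ceos", required());

        Map<String, Object> ok = Map.of("worksFor", Map.of("name", "name_A", "ceos", List.of("x")));
        Map<String, Object> bad = Map.of("worksFor", Map.of("name", "name_B", "ceos", List.of("x")));
        System.out.println(passes(ok, rules));  // true
        System.out.println(passes(bad, rules)); // false
    }
}
```

Resolving rules by path keeps the rule set flat, so nested objects need no separate rule syntax in this sketch.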
Allow expressions in entity mapping groupings for IDs
Something like this should work:
...entity $usCongressBiographies identified by concat_strings($usCongressBiographies.worksFor, $usCongressBiographies.someOtherField,'_LE') with {...
duplicate detection should support arbitrary boolean expressions
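One way to read this request is that a duplicate condition should be composable from pairwise comparisons with AND, OR and NOT. The sketch below assumes that model and uses `java.util.function.BiPredicate`, whose `and`, `or` and `negate` methods supply the boolean operators. The `Person` record, the thresholds, and the Levenshtein-based `similarity` helper are hypothetical illustrations, not Sopremo's duplicate detection API.

```java
import java.util.Objects;
import java.util.function.BiPredicate;

// Hypothetical sketch, not the Sopremo API: a duplicate condition built as an
// arbitrary boolean expression over pairwise comparisons.
public class DuplicateConditionSketch {

    // Hypothetical record type used only for illustration.
    record Person(String name, String city, int birthYear) {}

    // Normalized Levenshtein similarity in [0, 1]; 1.0 means identical strings.
    static double similarity(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last computed row stays in prev
        }
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) prev[b.length()] / maxLen;
    }

    // similarName AND (sameCity OR sameYear) AND NOT yearConflict
    static BiPredicate<Person, Person> duplicateCondition() {
        BiPredicate<Person, Person> similarName = (p, q) -> similarity(p.name(), q.name()) >= 0.8;
        BiPredicate<Person, Person> sameCity = (p, q) -> Objects.equals(p.city(), q.city());
        BiPredicate<Person, Person> sameYear = (p, q) -> p.birthYear() == q.birthYear();
        BiPredicate<Person, Person> yearConflict = (p, q) -> Math.abs(p.birthYear() - q.birthYear()) > 5;
        return similarName.and(sameCity.or(sameYear)).and(yearConflict.negate());
    }

    public static void main(String[] args) {
        Person a = new Person("Jon Smith", "Berlin", 1970);
        Person b = new Person("John Smith", "Berlin", 1971);
        Person c = new Person("John Smith", "Munich", 1950);
        BiPredicate<Person, Person> isDuplicate = duplicateCondition();
        System.out.println(isDuplicate.test(a, b)); // true
        System.out.println(isDuplicate.test(b, c)); // false
    }
}
```

Because every operand is itself a `BiPredicate`, expressions of any depth can be nested without a dedicated expression tree.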
allow joins over 3 or more sources in EE
Port snm to record linkage
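Assuming "snm" refers to the sorted neighborhood method, its core idea is to sort all records by a key, slide a window of size `w` over the sorted list, and compare each record only with its `w - 1` successors. For record linkage, a hypothetical matcher can additionally require that both records come from different sources. The sketch below is a single pass without transitive closure; `SortedNeighborhoodSketch`, `matchingPairs` and `Rec` are illustrative names, not existing Sopremo classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

// Hypothetical sketch of a single-pass sorted neighborhood method (SNM);
// not an existing Sopremo operator.
public class SortedNeighborhoodSketch {

    // Hypothetical record tagged with the source it came from.
    record Rec(String source, String name) {}

    // Sorts by the key, then compares each record only with its (window - 1) successors.
    static <T> List<Map.Entry<T, T>> matchingPairs(List<T> records, Function<T, String> sortingKey,
                                                   int window, BiPredicate<T, T> matcher) {
        if (window < 2) throw new IllegalArgumentException("window must be at least 2");
        List<T> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(sortingKey));
        List<Map.Entry<T, T>> matches = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            for (int j = i + 1; j < Math.min(i + window, sorted.size()); j++) {
                T a = sorted.get(i);
                T b = sorted.get(j);
                if (matcher.test(a, b)) matches.add(Map.entry(a, b));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Rec> records = List.of(
            new Rec("A", "smith, john"), new Rec("B", "doe, jane"),
            new Rec("B", "smith, jon"), new Rec("A", "smyth, john"));
        // Record linkage: only pairs from different sources may match.
        BiPredicate<Rec, Rec> matcher = (a, b) -> !a.source().equals(b.source())
            && a.name().charAt(0) == b.name().charAt(0);
        for (Map.Entry<Rec, Rec> pair : matchingPairs(records, Rec::name, 2, matcher)) {
            System.out.println(pair.getKey() + " <-> " + pair.getValue());
        }
    }
}
```

The window bounds the number of comparisons to roughly `n * (w - 1)` instead of all `n * (n - 1) / 2` pairs, at the cost of missing matches whose sorting keys end up far apart.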
Allow switched inputs in EE
scrub should support relation constraints
bug in target join for embedded value correspondences
...
worksFor: [ {legalEntity : $usCongressParties.id}]
...
in the EntityExtraction operator does not work (probably related to the target joins).
DepShield encountered errors while building your project
The project could not be analyzed because of maven build errors. Please review the error messages here. Another build will be scheduled within 24 hours. If the build is successful this issue will be closed, otherwise the error message will be updated.
This is an automated GitHub Issue created by Sonatype DepShield. GitHub Apps, including DepShield, can be managed from the Developer settings of the repository administrators.