Comments (7)
Hi, I would recommend using the HC version in dev, as we have fixed many issues there. Can you please let us know if the error persists with the dev version? If so, we would be happy to investigate.
from holoclean.
Hello,
I have just tried it in the latest dev version, but the issue persists. Values are repaired in the 20-tuple dataset, but the same values no longer change as a larger portion of the data is considered (and the new errors in the larger datasets are not repaired either).
Hi @j-r77 I am currently working on reproducing and debugging this issue and will get back to you.
Hi @minafarid Just wondering whether you managed to reproduce and debug the issue already? Thanks.
Hi @fgeerts, we are actively working on this issue. It is a bit more intricate than it seems. The issue comes up because the only attributes that are strongly correlated in the Adult dataset are "relationship" and "sex", i.e., the ones present in your constraints (see attached image).
We will get back to you ASAP.
Hi @j-r77:
We did some digging around, and it seems that the issue lies in the use of InitAttFeaturizer. Because of how we currently do weak supervision, the InitAttFeaturizer feature weights actually blow up and assign too much emphasis to the initial values, which causes no repairs to occur.
If you pass in the keyword argument learnable=False, you should be able to see better results. We've recently tweaked how we do weak supervision in #43 such that InitAttFeaturizer behaves as intended.
That being said, with this specific dataset, as @thodrek pointed out, there are so few correlated attributes that weak supervision fails to assign confident weak labels, which results in the prior behaviour.
In this case, HoloClean actually prefers not to repair any cell, as demonstrated, because it is not confident that any repairs are correct due to the lack of correlations.
Hope that helps.
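For readers who want to see the effect described above in isolation, here is a toy sketch in plain Python. It is not HoloClean code: the two-candidate model, the function names (train, repairs), the co-occurrence scores, and the starting weight of 1.0 are all assumptions made up for illustration. It only tries to show why a learnable weight on an "initial value" feature can crowd out other signals when the weak labels are the initial values themselves, and why holding that weight fixed can let a co-occurrence signal win.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(learn_init, steps=200, lr=0.5, init_weight=1.0):
    """Fit a two-candidate model on weakly labelled cells.

    Candidate 0 is the cell's initial value, candidate 1 an alternative.
    Every training cell is weakly labelled with its initial value, and
    the initial value is also well supported by co-occurrence
    (hypothetical scores 0.9 vs 0.1).
    """
    w_init = init_weight  # weight of the "is the initial value" feature
    w_cooc = 0.0          # weight of the co-occurrence feature
    cooc_gap = 0.9 - 0.1  # co-occurrence score of init minus alternative
    for _ in range(steps):
        # Score difference between the initial value and the alternative.
        margin = w_init + w_cooc * cooc_gap
        err = 1.0 - sigmoid(margin)  # gradient of log-likelihood w.r.t. margin
        if learn_init:
            w_init += lr * err       # this weight only ever grows
        w_cooc += lr * err * cooc_gap
    return w_init, w_cooc


def repairs(w_init, w_cooc, cooc_init=0.1, cooc_alt=0.9):
    """True if the model prefers the alternative for an erroneous cell
    whose co-occurrence evidence favours the alternative."""
    margin = w_init + w_cooc * (cooc_init - cooc_alt)
    return margin < 0


learned = train(learn_init=True)
fixed = train(learn_init=False)
print("learnable init weight:", learned, "repairs:", repairs(*learned))
print("fixed init weight:    ", fixed, "repairs:", repairs(*fixed))
```

In this toy setup the learnable variant keeps the initial value for the erroneous cell, while the fixed-weight variant switches to the alternative. Real datasets with few correlated attributes (like the one in this thread) would give the co-occurrence feature much less to work with, so neither variant would be expected to repair confidently.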