Comments (7)

thodrek commented on May 25, 2024

Hi, I would recommend using the HC version in dev, as we have fixed many issues there. Can you please let us know if the error persists with the dev version? If so, we would be happy to investigate.

from holoclean.

j-r77 commented on May 25, 2024

Hello,

I have just tried it with the latest dev version, but the issue persists. Values are repaired in the 20-tuple dataset, but those same values are no longer repaired once a larger portion of the data is considered (and the new errors in the larger datasets are not repaired either).

minafarid commented on May 25, 2024

Hi @j-r77 I am currently working on reproducing and debugging this issue and will get back to you.

fgeerts commented on May 25, 2024

Hi @minafarid Just wondering whether you managed to reproduce and debug the issue already? Thanks.

thodrek commented on May 25, 2024

Hi @fgeerts, we are actively working on this issue. It is a bit more intricate than it seems. The issue comes up because the only attributes that are strongly correlated in the Adult dataset are "relationship" and "sex", i.e., the ones present in your constraints (see attached image).

[Image: screenshot, 2018-12-17, 11:42:45 AM]
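
To make "strongly correlated" concrete for categorical attributes, here is a minimal sketch that computes Cramér's V between two columns. The choice of Cramér's V is an assumption for illustration; it is not necessarily the measure shown in the attached image or the one HoloClean uses. The rows are hand-made toy values, not taken from the Adult dataset, and `cramers_v` is a hypothetical helper.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of categorical values.

    Returns a value in [0, 1]: 0 means no association, 1 means one
    attribute fully determines the other.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) pair
    count_x = Counter(xs)
    count_y = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, cx in count_x.items():
        for y, cy in count_y.items():
            expected = cx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(count_x), len(count_y)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))


# Toy rows (illustrative only).
relationship = ["Husband", "Wife", "Husband", "Wife"]
sex = ["Male", "Female", "Male", "Female"]
workclass = ["Private", "Private", "State-gov", "State-gov"]

v_rel_sex = cramers_v(relationship, sex)    # perfectly associated in this toy data
v_work_sex = cramers_v(workclass, sex)      # independent in this toy data
```

In this toy data, "relationship" fully determines "sex" while "workclass" tells you nothing about it. A repair model that leans on co-occurrence has usable evidence only for the first pair.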

We are actively working on this issue and we will be getting back to you ASAP.

richardwu commented on May 25, 2024

Hi @j-r77:

We did some digging around and it seems that the issue lies in the use of InitAttFeaturizer. Because of how we currently do weak supervision, the InitAttFeaturizer feature weights actually blow up and assign too much emphasis to the initial values, which causes no repairs to occur.

If you pass in the keyword argument learnable=False, you should be able to see better results. We've recently tweaked how we do weak supervision in #43 such that InitAttFeaturizer behaves as intended.
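
As a rough illustration of why an oversized weight on the initial-value feature suppresses repairs, here is a toy sketch. It is not HoloClean's actual model: the two-feature softmax, the function `repair_probabilities`, the weights, and the co-occurrence scores are all assumptions made up for this example. It presumes that learnable=False keeps the initial-value weight at a fixed, modest value instead of letting training inflate it.

```python
import math


def repair_probabilities(candidates, init_value, cooccur_score, w_init, w_cooccur):
    """Softmax over candidate values for one cell, using two toy features:
    an indicator for 'equals the initial value' and a co-occurrence score."""
    logits = {
        v: w_init * (1.0 if v == init_value else 0.0) + w_cooccur * cooccur_score[v]
        for v in candidates
    }
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits.values())
    exps = {v: math.exp(logit - top) for v, logit in logits.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}


# Hypothetical cell: sex was recorded as "Female" although relationship="Husband".
candidates = ["Male", "Female"]
cooccur = {"Male": 0.95, "Female": 0.05}  # made-up co-occurrence evidence
initial = "Female"

# Learned weight that has blown up: the initial value dominates.
blown_up = repair_probabilities(candidates, initial, cooccur, w_init=20.0, w_cooccur=5.0)
# Fixed, modest weight: the co-occurrence evidence can override the initial value.
fixed = repair_probabilities(candidates, initial, cooccur, w_init=1.0, w_cooccur=5.0)

kept_value = max(blown_up, key=blown_up.get)   # value chosen with the inflated weight
repaired_value = max(fixed, key=fixed.get)     # value chosen with the fixed weight
```

With the inflated weight the model keeps the erroneous initial value; with the fixed weight the same evidence is enough to flip the cell.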

That being said, as @thodrek pointed out, this specific dataset has so few correlated attributes that weak supervision fails to assign confident weak labels, which results in the prior behaviour.

In this case, Holoclean actually prefers not to repair any cell, as demonstrated, because the lack of correlations leaves it with too little confidence that any repair is correct.

Hope that helps.

fgeerts commented on May 25, 2024
