nadeef's People

Contributors

gitter-badger, hammady, waffle-iron, yeyinqcri, zyzyis

nadeef's Issues

Compile time error

I got an error while compiling this file:

NonBlockingCollectionIterator

Also, can someone point me to a detailed guide for running this on a local system?

Thank you.

OutOfMemoryError: GC overhead limit exceeded

Hello,

I am currently running a set of experiments on the HOSP dataset:
1k to 100k tuples, with 2% to 10% noise that I introduced myself.
HOSP is provided in the NADEEF GitHub repository.

At some point I get an OutOfMemoryError:
Exception in thread "Thread-762" java.lang.OutOfMemoryError: GC overhead limit exceeded

I increased the heap size and ran NADEEF on a Linux machine with the following command:

java -Xmx14G -cp out/bin/*:examples/:out/test qa.qcri.nadeef.console.Console
Any idea what else I could configure to get NADEEF running without OOM errors?

Thank you for your help.

show more tuple in tuple rank panel

Can we show more columns in tuple rank?
Can we navigate left-right in the ranking?
Michael B had the same request when I gave him a demo, as did the people at QSTP.

DC is not working for me

Hi,

I am trying to run the following DC rule:

{
  "source" : {
    "type" : "csv",
    "file" : ["/home/khayyzy/data/TaxB/inputDB_100000.csv"],
  },
  "rule" : [
    {
      "name" : "myDC2",
      "type" : "dc",
      "value" : ["not(t1.Salary>t2.Salary&t1.Tax<t2.Tax)"]
    }
  ]
}

on the schema:
Name VARCHAR(255),Dept INT,Salary INT,Tax INT

When running NADEEF, it returns zero violations. I am sure that there are at least 6000 violations. Am I doing something wrong in defining the DC?
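
One way to sanity-check the expected violation count independently of NADEEF is a brute-force pass over a small sample of the data. The sketch below is not NADEEF code: `DcCheck`, `Row`, and `countViolations` are hypothetical names, and it assumes the DC forbids any ordered pair (t1, t2) with t1.Salary > t2.Salary and t1.Tax < t2.Tax.

```java
import java.util.Arrays;
import java.util.List;

final class DcCheck {
    /** Hypothetical row type holding only the two attributes the DC uses. */
    static final class Row {
        final int salary;
        final int tax;

        Row(int salary, int tax) {
            this.salary = salary;
            this.tax = tax;
        }
    }

    /** Counts ordered pairs (t1, t2) with t1.salary > t2.salary and t1.tax < t2.tax. */
    static long countViolations(List<Row> rows) {
        long violations = 0; // long, because the number of pairs grows quadratically
        for (Row t1 : rows) {
            for (Row t2 : rows) {
                if (t1.salary > t2.salary && t1.tax < t2.tax) {
                    violations++;
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Row> sample = Arrays.asList(
            new Row(5000, 300),  // earns more than the next row but pays less tax
            new Row(4000, 400),
            new Row(3000, 200));
        System.out.println(countViolations(sample)); // prints 1
    }
}
```

The nested loop is quadratic, so it is only practical on a sample, but a non-zero count on a few thousand rows would suggest the problem lies in how the rule is parsed or evaluated rather than in the data.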

Bug with FD

Hi,

I am trying to run NADEEF with the following FD rule:

o_custkey | c_address

I get the error below when I define attribute o_custkey as integer and c_address as varchar(255). I am running Derby in memory.

Error: Synchronization failed.
Exception: java.sql.SQLSyntaxErrorException: Comparisons between 'INTEGER' and 'CHAR (UCS_BASIC)' are not supported. Types must be comparable. String types must also have matching collation. If collation does not match, a possible solution is to cast operands to force them to the default collation (e.g. SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 'T1')
Error: Node has an exception during execution.
Exception: java.lang.NullPointerException: null
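
For reference, the check this rule is expected to perform can be sketched outside the database. The sketch assumes the rule `o_custkey | c_address` reads as "o_custkey determines c_address"; `FdCheck` and `violatingKeys` are hypothetical names, and this is not how NADEEF evaluates FDs internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class FdCheck {
    /**
     * Returns the keys that map to more than one distinct address,
     * i.e. the keys that violate "custkey determines address".
     * keys[i] and addresses[i] belong to the same tuple.
     */
    static Set<Integer> violatingKeys(int[] keys, String[] addresses) {
        Map<Integer, String> firstSeen = new HashMap<>();
        Set<Integer> violating = new HashSet<>();
        for (int i = 0; i < keys.length; i++) {
            // Remember the first address seen for each key; later tuples are compared to it.
            String previous = firstSeen.putIfAbsent(keys[i], addresses[i]);
            if (previous != null && !previous.equals(addresses[i])) {
                violating.add(keys[i]);
            }
        }
        return violating;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 1, 2};
        String[] addresses = {"12 Main St", "7 Oak Ave", "99 Elm Rd", "7 Oak Ave"};
        System.out.println(violatingKeys(keys, addresses)); // prints [1]
    }
}
```

Note that the integer key is only ever compared with other keys and the string address only with other addresses. The Derby message above indicates that the failing query compares an INTEGER with a CHAR value.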

Negative tuple count (overflow)

Hi,

I got the following output when running a DC on a large number of rows:

HScope time (ms) 0
VScope time (ms) 0
Blocks 1
Iterator time (ms) 4491737
DB load time (ms) 712
Detect time (ms) 4528734
Detect thread count 1953116
Detect tuple count -1474936480
Violation 0

Violation export time 23

Detection finished in 4528734 ms and found 0 violations.
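
A negative tuple count is the pattern a 32-bit counter produces when it wraps around. Whether NADEEF's counter is actually an `int` is an assumption here, not something the log confirms. The sketch below only demonstrates the wrap-around and how `Math.addExact` surfaces it; `CounterOverflow` and its methods are hypothetical names.

```java
final class CounterOverflow {
    /** Narrows a long count to int, keeping only the low 32 bits as a wrapped counter would. */
    static int asWrappedInt(long count) {
        return (int) count;
    }

    /** Adds to an int counter but fails loudly instead of wrapping. */
    static int addChecked(int counter, int delta) {
        return Math.addExact(counter, delta); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        long comparisons = 3_000_000_000L;              // fits easily in a long
        System.out.println(asWrappedInt(comparisons));  // prints -1294967296
        try {
            addChecked(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

If the counter is indeed an `int`, switching it to `long` would be the usual fix, since pairwise comparisons over a large table exceed the `int` range quickly.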

“search based” for violating tuples

In the upper panels, if I click on a violating tuple with id 2, the resulting search for “2” returns a huge number of tuples. Why not search for tid:2 or something similar?

More options to add New Data Source

The current interface is limited to dropping a file, which can be annoying.
Add the ability to browse the local disk, URLs, etc. to upload CSV files.

failing to create new project

In two cases (paolo and paolo2), when I created a new project I got this message at the top right:
FATAL: database "null" does not exist

On the left there was only one panel, "No job is running.", and I could not do anything.

Now, if I open one of them again (by going back to the first screen), I get the correct panels for selecting the data source on the left, with no errors.

Tested on Chrome.

JVM memory

Hi,

How can I assign more memory to NADEEF?
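
The JVM heap limit is set on the `java` command line with `-Xmx`, as in the `java -Xmx14G ...` invocation quoted in the OutOfMemoryError issue above. To confirm how much heap the JVM actually received, a small check like the following can be run with the same flag. `HeapCheck` is a hypothetical class name and is not part of NADEEF.

```java
final class HeapCheck {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap the JVM will attempt to use,
        // which reflects -Xmx when that flag is given.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
    }
}
```

The reported figure can be somewhat below the requested `-Xmx` value, depending on the garbage collector in use.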

Names policy for .csv data

Hello,
NADEEF throws an exception if the name of an uploaded data file contains "-" (minus), for example "noisy-data.csv".

Thanks!

cannot see data source

I've added a CSV for paolo4 and saved it, but it is not shown in the main GUI.
I've tried twice, refreshed, etc.

Is there any requirement on the header?

Here are the first lines of the file:

rec_id:Varchar(600), given_name:Varchar(1000), surname:Varchar(1000), street_number:Varchar(3000), address_1:Varchar(3000), address_2:Varchar(3000), suburb:Varchar(1000), postcode:Varchar(500), state:Varchar(200), date_of_birth:varchar(1000), age:Varchar(200), phone_number:Varchar(1000), soc_sec_id:Varchar(1000), blocking_number:Varchar(1000)
rec-183-org, elisha, kerslake, 29, goldsborough close, , ringwood north, 4806, qld, 19020708, 34, 02 70611981, 5402131, 1
rec-351-dup-0, georgina, hannink, 109, knoke avenue, s/port yacht-b, leichhardt, 4207, qld, 19920216, , , 5814238, 3
rec-304-dup-0, matthew, boyes, 9, kadina crescent, , whitfield, 4060, nsw, 19660821, , 07 34459589, 3290986, 1
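
The header above appears to follow a `name:Type(length)` pattern per column, separated by commas. Whether NADEEF requires exactly this form is not confirmed here. The sketch below just splits such a header so that malformed fields (for example a missing colon) are easy to spot before upload; `HeaderCheck` and `parseHeader` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderCheck {
    /**
     * Splits a header such as "rec_id:Varchar(600), age:Varchar(200)"
     * into column name -> declared type, preserving column order.
     * Throws IllegalArgumentException for a field that is not "name:type".
     * Types containing a comma, such as Decimal(10,2), are not supported.
     */
    static Map<String, String> parseHeader(String headerLine) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String field : headerLine.split(",")) {
            String trimmed = field.trim();
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Malformed header field: '" + trimmed + "'");
            }
            String name = trimmed.substring(0, colon).trim();
            String type = trimmed.substring(colon + 1).trim();
            columns.put(name, type);
        }
        return columns;
    }

    public static void main(String[] args) {
        String header = "rec_id:Varchar(600), given_name:Varchar(1000), age:Varchar(200)";
        parseHeader(header).forEach((name, type) -> System.out.println(name + " -> " + type));
    }
}
```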

examples for new rule

It would be nice to have an example/template for each kind of rule in the panel where rules are specified.
E.g., if I click FD, it should show the syntax for an FD;
if I click Java, it should show the piece of Java I need to fill in;
etc.

Editing Rule code is not working

If we edit a rule's code after clicking on detect, the rule description shows that the rule is updated, but the violation tab still shows the violations of the previous rule code.

multiple rules

There are some issues when running multiple rules together; see paolo4 with 3 rules. In rule attribute, only 2 attributes are shown, but 4 are actually involved.

show violations, not only tuples

When I click on a violating tuple in the upper panels, I'd like to see the tuple of interest and its context: the other tuples involved in its violations

For example, if tuple X is in violation with tuple Y on constraint C1 and with tuple Z on constraint C2, it would be great to see the cells involved in X and Y in one background color, and the cells involved in X and Z in another color.

I believe this would greatly improve the usability of the GUI.

violation relation panel

In violation relation, can we use different edges (different colors?) for different rules?
Can we do some kind of zoom in/out or ordering?

This panel has a lot of potential. I wonder if we can do more in terms of visualisation.
Can we use any graph algorithm to analyse it?

Sanitize CSV headers

When receiving CSV files, the header should be sanitized for proper Postgres column names.
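
A minimal sketch of such sanitization, under assumed rules: lower-case the name, replace every character outside `[a-z0-9_]` with an underscore, prefix a name that starts with a digit, and truncate to 63 characters. These rules are an assumption chosen to yield identifiers that PostgreSQL accepts unquoted; they are not NADEEF's actual behaviour, and `ColumnNames.sanitize` is a hypothetical helper. The same approach would also cover file names such as "noisy-data.csv" from the naming issue above.

```java
import java.util.Locale;

final class ColumnNames {
    private static final int MAX_LENGTH = 63; // PostgreSQL's default identifier length limit

    /** Turns an arbitrary CSV header cell into a conservative unquoted identifier. */
    static String sanitize(String raw) {
        String name = raw.trim()
                         .toLowerCase(Locale.ROOT)
                         .replaceAll("[^a-z0-9_]", "_");
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            name = "_" + name; // unquoted identifiers may not start with a digit
        }
        return name.length() > MAX_LENGTH ? name.substring(0, MAX_LENGTH) : name;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("noisy-data"));       // prints noisy_data
        System.out.println(sanitize(" Street Number "));  // prints street_number
        System.out.println(sanitize("2nd_address"));      // prints _2nd_address
    }
}
```

This sketch does not handle reserved words or two headers that collapse to the same name; both would need extra handling such as appending a numeric suffix.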
