The nadeef from daqcri

Compile time error

I got an error while compiling this file:

NonBlockingCollectionIterator

Also, can someone point me to a detailed guide for running this on a local system?

Thank you.

OutOfMemoryError: GC overhead limit exceeded

Hello,

I am currently running a set of experiments on the HOSP dataset:
1k to 100k tuples with 2% to 10% noise, which I introduced myself.
HOSP is provided in the NADEEF GitHub repository.

At some point I get the following error:
Exception in thread "Thread-762" java.lang.OutOfMemoryError: GC overhead limit exceeded

I increased the heap size and ran NADEEF on a Linux machine with the following command:

java -Xmx14G -cp out/bin/*:examples/:out/test qa.qcri.nadeef.console.Console

Any idea what else I could configure to get NADEEF running without OOM errors?

Thank you for your help.

Restrict the visibility of projects and data sources by users

Improve CSV import feature in Dashboard

Schema editing.
Import files directly from S3, Dropbox, or other sources.

Submitting issues

@zyzyis Submitting an issue from the demo should open a new window.

Show more tuples in the tuple rank panel

Can we show more columns in tuple rank?
Can we navigate left-right in the ranking?
Michael B made the same request when I gave him a demo, as did the people at QSTP.

DC is not working for me

Hi,

I am trying to run the following DC rule:

{
  "source" : {
    "type" : "csv",
    "file" : ["/home/khayyzy/data/TaxB/inputDB_100000.csv"]
  },
  "rule" : [
    {
      "name" : "myDC2",
      "type" : "dc",
      "value" : ["not(t1.Salary>t2.Salary&t1.Tax<t2.Tax)"]
    }
  ]
}

on the schema:
Name VARCHAR(255),Dept INT,Salary INT,Tax INT

When running NADEEF, it returns zero violations. I am sure that there are at least 6000 violations. Am I doing something wrong in defining the DC?
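One way to cross-check the expected count independently of NADEEF is a brute-force pass over a small sample of the data. The sketch below is only an illustration: `count_dc_violations` is a hypothetical helper, the sample rows are made up, and Salary and Tax are assumed to be loaded as integers (compared as strings, "900" would sort after "1000"). It counts ordered pairs (t1, t2) for which t1.Salary > t2.Salary and t1.Tax < t2.Tax, which is how the rule above reads.

```python
from itertools import permutations

def count_dc_violations(rows):
    """Count ordered pairs (t1, t2) that violate
    not(t1.Salary > t2.Salary & t1.Tax < t2.Tax)."""
    return sum(
        1
        for t1, t2 in permutations(rows, 2)
        if t1["Salary"] > t2["Salary"] and t1["Tax"] < t2["Tax"]
    )

# Tiny made-up sample, not taken from the TaxB data.
sample = [
    {"Name": "a", "Dept": 1, "Salary": 5000, "Tax": 300},
    {"Name": "b", "Dept": 1, "Salary": 4000, "Tax": 400},  # earns less than a, pays more
    {"Name": "c", "Dept": 2, "Salary": 6000, "Tax": 900},
]
print(count_dc_violations(sample))
```

The pass is quadratic in the number of rows, so it is only practical on a sample, not on the full 100000-row file.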

Change UI framework to use Bootstrap 3

Provide details on CSV format before uploading

We use a special CSV header, so we need to tell users how to format and upload files correctly.

Entity Resolution using NADEEF

Bug with FD

Hi,

I am trying to run NADEEF with the following FD rule:

o_custkey | c_address

I get the following error when I define attribute o_custkey as INTEGER and c_address as VARCHAR(255). I am running Derby in memory.

Error: Synchronization failed.
Exception: java.sql.SQLSyntaxErrorException: Comparisons between 'INTEGER' and 'CHAR (UCS_BASIC)' are not supported. Types must be comparable. String types must also have matching collation. If collation does not match, a possible solution is to cast operands to force them to the default collation (e.g. SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 'T1')
Error: Node has an exception during execution.
Exception: java.lang.NullPointerException: null
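For reference, reading `o_custkey | c_address` as "o_custkey determines c_address", the check itself only needs equality within each column and never compares the two columns with each other, so an integer left-hand side and a string right-hand side are not a problem in principle. The sketch below illustrates that reading on made-up rows; `fd_violations` is a hypothetical helper and says nothing about how NADEEF generates its SQL.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Return LHS values that map to more than one distinct RHS value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {key: vals for key, vals in seen.items() if len(vals) > 1}

# Made-up rows; o_custkey is an int and c_address a string, as in the report.
rows = [
    {"o_custkey": 1, "c_address": "12 Main St"},
    {"o_custkey": 1, "c_address": "12 Main Street"},  # conflicts with the row above
    {"o_custkey": 2, "c_address": "7 Elm Rd"},
]
print(fd_violations(rows, "o_custkey", "c_address"))
```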

Violation Tab should refresh data after switching to another data source

Repair button is not working on dashboard

Data source dataTable bug after switching between different data sources

Negative tuple count (overflow)

Hi,

I got the following output when running a DC on a large number of rows:

HScope time (ms) 0
VScope time (ms) 0
Blocks 1
Iterator time (ms) 4491737
DB load time (ms) 712
Detect time (ms) 4528734
Detect thread count 1953116
Detect tuple count -1474936480
Violation 0

Violation export time 23

Detection finished in 4528734 ms and found 0 violations.
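A negative count like this is what a signed 32-bit counter produces once the true value passes 2^31 - 1. The sketch below assumes the counter is a 32-bit signed integer (the log does not confirm this) and emulates the wraparound in Python, whose integers do not overflow on their own. The value 2,820,030,816 is hypothetical, chosen because it is the smallest count that wraps to the number in the log; the real count could be that plus any multiple of 2^32.

```python
def as_int32(n):
    """Interpret n modulo 2**32 as a signed 32-bit integer (two's complement)."""
    n &= 0xFFFFFFFF
    return n - 0x1_0000_0000 if n >= 0x8000_0000 else n

true_count = 2_820_030_816  # hypothetical real count, chosen to reproduce the log
print(as_int32(true_count))  # -1474936480
```

Under that assumption, a 64-bit counter would be the usual remedy.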

Test Issue page in Github

Test

“search based” for violating tuples

In the upper panels, if I click on a violating tuple with id 2 (involved in a violation), the resulting search for “2” returns tons of tuples. Why not search for tid:2 or something similar?
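The difference between the two behaviours can be sketched as follows. This is a stand-alone illustration, not the dashboard's code: `search` is a hypothetical function, the `tid:` prefix is the syntax proposed above, and the `zip` column and its values are made up.

```python
def search(rows, query):
    """Exact match on tid for 'tid:N' queries, substring match otherwise."""
    if query.startswith("tid:"):
        wanted = query[len("tid:"):].strip()
        return [r for r in rows if str(r["tid"]) == wanted]
    return [r for r in rows if any(query in str(v) for v in r.values())]

rows = [
    {"tid": 2, "zip": "10012"},
    {"tid": 12, "zip": "20500"},
    {"tid": 3, "zip": "33602"},
]
print(len(search(rows, "2")))      # 3: every row contains a "2" somewhere
print(len(search(rows, "tid:2")))  # 1: only the tuple with id 2
```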

List of data sources

@zyzyis This should be "List of Tables".

More options to add New Data Source

The current interface is limited to dropping a file, which can be annoying.
Add the ability to browse the local disk, URLs, etc. to upload CSV files.

Charts in the dashboard don't change after switching to another data source

Upload compressed files

We should be able to upload compressed CSV files

Failing to create a new project

In two cases (paolo and paolo2), when I created a new project I got this message at the top right:
FATAL: database "null" does not exist

On the left there was only one panel, "No job is running.",
and I could not do anything.

Now, if I open one of them again (by going back to the first screen), I have the correct panels to select the data source on the left, with no errors.

Tested on Chrome.

JVM memory

Hi,

How do I assign more memory to NADEEF?

Handle IO errors in the load command

:> load sldkfsdls
Oops, something is wrong. Please check the log in the output dir.
Exception: java.io.FileNotFoundException: null
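A friendlier behaviour would be to validate the argument before trying to open it and to name the missing file in the message. The sketch below shows the idea in isolation; `check_load_path` and the wording of the message are hypothetical and not part of the NADEEF console.

```python
from pathlib import Path

def check_load_path(arg):
    """Return the path if it names an existing regular file, else raise with a clear message."""
    path = Path(arg)
    if not path.is_file():
        raise FileNotFoundError(f"load: no such file: {arg}")
    return path

try:
    check_load_path("sldkfsdls")
except FileNotFoundError as err:
    print(err)
```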

Integrate incremental detection algorithm.

Add a button to delete a data source

Names policy for .csv data

Hello,
NADEEF throws an exception if an uploaded data file's name contains "-" (minus), e.g. "noisy-data.csv".

thanks!
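If the exception comes from the file name being reused as a SQL table name (an assumption; the report does not show the stack trace), one workaround is to derive a safe identifier from the name before import. The sketch below is illustrative only: `table_name_from_file` and the `t_` prefix are hypothetical.

```python
import re
from pathlib import Path

def table_name_from_file(filename):
    """Derive an identifier that uses only ASCII letters, digits and underscores."""
    stem = Path(filename).stem
    name = re.sub(r"\W", "_", stem, flags=re.ASCII)
    # Unquoted SQL identifiers normally cannot start with a digit.
    return f"t_{name}" if name[:1].isdigit() else name

print(table_name_from_file("noisy-data.csv"))  # noisy_data
```

Until something like this is in place, renaming the file to avoid "-" sidesteps the problem.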

Violation export doesn't escape strings correctly

Get rid of NVD3 library and re-write the visualization widget purely based on D3

NVD3 development is lagging behind D3, and its quality is not good.

Violation table definition is not normalized.

Try to get it into BCNF.

Editing a rule in the Dashboard and re-running detection doesn't reload the class file

Move NADEEF to Maven repository

Performance testing setup in NADEEF

cannot see data source

I've added a CSV for paolo4 and saved it, but it is not shown in the main GUI.
I've tried twice, refreshed, etc.

Is there any requirement on the header?

Here are the first lines of the file:

rec_id:Varchar(600), given_name:Varchar(1000), surname:Varchar(1000), street_number:Varchar(3000), address_1:Varchar(3000), address_2:Varchar(3000), suburb:Varchar(1000), postcode:Varchar(500), state:Varchar(200), date_of_birth:varchar(1000), age:Varchar(200), phone_number:Varchar(1000), soc_sec_id:Varchar(1000), blocking_number:Varchar(1000)
rec-183-org, elisha, kerslake, 29, goldsborough close, , ringwood north, 4806, qld, 19020708, 34, 02 70611981, 5402131, 1
rec-351-dup-0, georgina, hannink, 109, knoke avenue, s/port yacht-b, leichhardt, 4207, qld, 19920216, , , 5814238, 3
rec-304-dup-0, matthew, boyes, 9, kadina crescent, , whitfield, 4060, nsw, 19660821, , 07 34459589, 3290986, 1
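The header above follows a `name:Type` pattern for each column. A quick way to check a header for that shape before uploading is sketched below; `parse_header` is a hypothetical helper, and whether NADEEF requires exactly this convention is inferred from the sample rather than confirmed. Splitting on commas also means a type such as DECIMAL(10,2) would need extra handling.

```python
def parse_header(line):
    """Split a 'name:Type, name:Type, ...' header into (name, type) pairs."""
    columns = []
    for field in line.split(","):
        name, sep, sql_type = field.strip().partition(":")
        if not sep or not name.strip() or not sql_type.strip():
            raise ValueError(f"expected name:Type, got {field.strip()!r}")
        columns.append((name.strip(), sql_type.strip()))
    return columns

# First three and last-but-three columns of the header shown above.
header = "rec_id:Varchar(600), given_name:Varchar(1000), age:Varchar(200)"
print(parse_header(header))
```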

The violation tab should be activated automatically after clicking on detect

examples for new rule

It would be nice to have an example/template for each kind of rule in the panel where rules are specified.
E.g., if I click FD, it should show the syntax for an FD;
if I click Java, it should show the piece of Java I need to fill in;
etc.

Editing Rule code is not working

If we edit a rule's code after clicking Detect, the rule description shows that the rule is updated, but the violation tab still shows violations of the previous rule code.

multiple rules

There are some issues when running multiple rules together; see paolo4 with 3 rules. In "rule attribute", only 2 attributes are shown (but 4 are actually involved).

Introduce Work-Steal strategy for Multi-threading detection

show violations, not only tuples

When I click on a violating tuple in the upper panels, I'd like to see the tuple of interest and its context: the other tuples involved in its violations.

For example, if tuple X is in violation with tuple Y on constraint C1 and with tuple Z on constraint C2, it would be great to see the cells involved in X and Y in one background color, and the cells involved in X and Z in another color.

I believe this would greatly improve the usability of the GUI.
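The data the GUI would need for this can be sketched as a map from cells to colors. Everything in the example is hypothetical: the `highlight_map` function, the (rule, tuple ids, attributes) shape of a violation, the attribute names, and the palette.

```python
from itertools import cycle

def highlight_map(violations, palette=("#fde0dd", "#deebf7", "#e5f5e0")):
    """Map each (tid, attribute) cell to the colors of the constraints it is involved in."""
    colors = {}
    cells = {}
    next_color = cycle(palette)
    for rule, tids, attrs in violations:
        if rule not in colors:
            colors[rule] = next(next_color)  # one color per constraint
        for tid in tids:
            for attr in attrs:
                cells.setdefault((tid, attr), []).append(colors[rule])
    return cells

violations = [
    ("C1", ["X", "Y"], ["zip"]),   # X and Y conflict on C1
    ("C2", ["X", "Z"], ["city"]),  # X and Z conflict on C2
]
for cell, cell_colors in highlight_map(violations).items():
    print(cell, cell_colors)
```

A cell that takes part in several constraints ends up with several colors, which the GUI would have to blend or stripe.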
