Comments (4)
However, the temporary workaround above can leave attribute names inconsistent at evaluation time. Similar changes might be needed in eval.py when loading the clean data.
from holoclean.
The error arises because we don't quote columns (attributes) in our queries: Postgres's lexer tries to resolve the unquoted tokens as identifiers and folds them to lowercase (see https://stackoverflow.com/questions/20878932/are-postgresql-column-names-case-sensitive).
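To illustrate the quoting behaviour described above, here is a minimal sketch of wrapping attribute names in double quotes before they go into a query string. The helpers `quote_ident` and `build_select` are hypothetical names for this example, not HoloClean functions, and the sketch does not claim to match the actual change in the codebase.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quote.

    Hypothetical helper for illustration; not part of HoloClean.
    """
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, attrs: Sequence[str]) -> str:
    """Build a SELECT statement whose table and column names are all quoted."""
    cols = ", ".join(quote_ident(a) for a in attrs)
    return "SELECT {} FROM {}".format(cols, quote_ident(table))


# Quoted identifiers keep their case and may contain characters such as '-'.
query = build_select("hospital", ["ProviderNumber", "Hospital-Name"])
print(query)
```

Quoted this way, a name such as Hospital-Name is passed to Postgres as a single case-preserving identifier instead of being lexed as unquoted tokens.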
I fixed how we treat attributes as columns in #18 by retaining their original format from the raw datasets, so this should no longer be an issue. I tested this by replacing the attribute "HospitalName" with "Hospital-Name" in the hospital dataset:
// hospital.csv
ProviderNumber,Hospital-Name,Address1,Address2,Address3,City,State,ZipCode,CountyName,PhoneNumber,HospitalType,HospitalOwner,EmergencyService,Condition,MeasureCode,MeasureName,Score,Sample,Stateavg
10018,callahan eye foundation hospital,1720 university blvd,,,birmingham,al,35233,jefferson,2053258100,acute care hospitals,voluntary non-profit - private,yes,surgical infection prevention,scip-card-2,surgery patients who were taking heart drugs caxxed beta bxockers before coming to the hospitax who were kept on the beta bxockers during the period just before and after their surgery,,,al_scip-card-2
...
// hospital_clean.csv
tid,attribute,correct_val
0,ProviderNumber,10018
0,Hospital-Name,callahan eye foundation hospital
0,Address1,1720 university blvd
...
// hospital_constraints_att.txt
t1&t2&EQ(t1.Condition,t2.Condition)&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.HospitalType,t2.HospitalType)
t1&t2&EQ(t1.Hospital-Name,t2.Hospital-Name)&IQ(t1.ZipCode,t2.ZipCode)
t1&t2&EQ(t1.Sample,t2.Sample)&IQ(t1.Score,t2.Score)
...
The test ran without errors:
Precision = 0.96, Recall = 0.69, Repairing Recall = 0.80, F1 = 0.80, Repairing F1 = 0.88, Detected Errors = 435, Total Errors = 509, Correct Repairs = 350, Total Repairs = 365, Total Repairs (Grdth present) = 365
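The renamed attribute has to appear identically in the raw data, the clean data, and the constraints file for a test like this to pass. A minimal sketch of checking that is given below, assuming the three file layouts shown above; the function names and the shortened sample strings are made up for this example and are not part of HoloClean.

```python
import csv
import io
import re


def raw_attributes(raw_csv):
    """Attribute names from the header row of the raw dataset."""
    return set(next(csv.reader(io.StringIO(raw_csv))))


def clean_attributes(clean_csv):
    """Attribute names listed in the 'attribute' column of the clean data."""
    return {row["attribute"] for row in csv.DictReader(io.StringIO(clean_csv))}


def constraint_attributes(constraints):
    """Attribute names referenced as t<N>.<attribute> in the constraints."""
    return set(re.findall(r"\bt\d+\.([^,()&\s]+)", constraints))


def unknown_attributes(raw_csv, clean_csv, constraints):
    """Names used in clean data or constraints but missing from the raw header."""
    used = clean_attributes(clean_csv) | constraint_attributes(constraints)
    return used - raw_attributes(raw_csv)


# Shortened samples (assumption: only a few columns kept for brevity).
RAW = "ProviderNumber,Hospital-Name,Address1,ZipCode\n"
CLEAN = (
    "tid,attribute,correct_val\n"
    "0,ProviderNumber,10018\n"
    "0,Hospital-Name,callahan eye foundation hospital\n"
)
CONSTRAINTS = "t1&t2&EQ(t1.Hospital-Name,t2.Hospital-Name)&IQ(t1.ZipCode,t2.ZipCode)\n"

print(unknown_attributes(RAW, CLEAN, CONSTRAINTS))
```

An empty result means every attribute referenced in the clean data and constraints exists, with the same spelling and case, in the raw header.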
@richardwu is this fixed? If so, please close.
Yes this is fixed. I can't seem to close this issue (I believe only the author and/or maintainers can).