What if you try lowering your threshold? (Assuming you do have a reasonable amount of training data.)
from dedupe.
Also, I have found that 'dirty' data in the training data can cause the process to halt with that error.
It could be records with special characters, or nulls in a field when you haven't specified 'has missing'.
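One way to look for the dirty data described above is to scan the records before they reach dedupe. The sketch below uses a hypothetical helper, audit_records, which is not part of dedupe. It assumes each record is a dict of field name to value, and it treats control characters as one possible meaning of "special characters". Fields that show up in the missing count would be candidates for 'has missing': True in dedupe's dict-style field definitions.

```python
from collections import Counter
import unicodedata


def audit_records(pairs):
    """Scan (record_id, record) pairs and count, per field, how many values are
    missing (None or blank strings) and how many contain control characters
    (Unicode category "Cc", which includes tabs and newlines).

    Assumes each record is a dict mapping field name -> value.
    """
    missing = Counter()
    control = Counter()
    for _record_id, record in pairs:
        for field, value in record.items():
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
            elif isinstance(value, str) and any(
                unicodedata.category(ch) == "Cc" for ch in value
            ):
                control[field] += 1
    return missing, control


# Hypothetical sample records, loosely modelled on the data shown later in this thread.
sample = [
    ("r2FEdTZzbOM", {"title": "country side", "artist": None}),
    ("GgeTnpTkzI0", {"title": "loaded\x00", "artist": "tiwa savage, asake"}),
    ("we5gSjpX03U", {"title": "   ", "artist": "kuami eugene"}),
]

missing, control = audit_records(sample)
print(dict(missing))  # {'artist': 1, 'title': 1}
print(dict(control))  # {'title': 1}
```

Running the same scan over both the full data and the training data may help narrow down which records are worth cleaning first.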
I have tried everything, including lowering the threshold and using clean data, but nothing has worked so far. However, I may have traced the error to this line of code: b_data = deduper.fingerprinter(full_data)
The lines before it, including full_data = ((row['donor_id'], row) for row in read_cur), work as expected, and I am able to print them out and inspect them.
However, b_data is a generator object, and I cannot find a way to print it out so I can inspect its contents. I have tried converting it to a list and even calling next() on it. Any tips on how I can print it out to inspect its contents? Thank you.
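Because b_data is a generator, a common way to look inside it is to pull a bounded number of items with itertools.islice, and to consume it inside a try block so that a failure can be tied to a position. One Python detail may also matter here: iterating over a generator consumes it, so if full_data was printed by looping over it before the fingerprinter ran, the fingerprinter would have received an already exhausted generator. The sketch below uses a hypothetical fake_fingerprinter as a stand-in for deduper.fingerprinter (assumed here to yield (block_key, record_id) pairs), hypothetical sample rows, and hypothetical helpers peek and consume_with_progress; none of these are part of dedupe.

```python
import itertools


def fake_fingerprinter(records):
    """Hypothetical stand-in for deduper.fingerprinter. It yields
    (block_key, record_id) pairs and fails on a record whose title is None,
    imitating an error raised part-way through iteration."""
    for record_id, record in records:
        yield (record["title"].split()[0] + ":0", record_id)


def peek(iterator, n=5):
    """Return the first n items and an iterator that still yields every item."""
    head = list(itertools.islice(iterator, n))
    return head, itertools.chain(head, iterator)


def consume_with_progress(iterator):
    """Consume an iterator; return (items_consumed, exception_or_None)."""
    count = 0
    try:
        for _ in iterator:
            count += 1
    except Exception as exc:  # report the failure instead of crashing
        return count, exc
    return count, None


# Hypothetical rows in the (record_id, record) shape used by full_data.
rows = [
    ("r2FEdTZzbOM", {"title": "country side", "artist": "sarkodie feat. black sherif"}),
    ("o_oenl2Be-w", {"title": "2 sugar", "artist": "wizkid"}),
    ("bad_record", {"title": None, "artist": "unknown"}),
    ("1FbzbsWSN88", {"title": "midnight", "artist": "larruso"}),
]

# 1. Iterating a generator consumes it: a second pass yields nothing.
gen = (row for row in rows)
first_pass = list(gen)
second_pass = list(gen)
print(len(first_pass), len(second_pass))  # 4 0

# 2. Peek at the first items without losing them.
full_data = (row for row in rows)
b_data = fake_fingerprinter(full_data)
head, b_data = peek(b_data, n=2)
print(head)  # [('country:0', 'r2FEdTZzbOM'), ('2:0', 'o_oenl2Be-w')]

# 3. Consume the rest one item at a time to find where iteration fails.
count, error = consume_with_progress(b_data)
print(count, repr(error))  # count is 2: the third record raises AttributeError
```

With the real deduper, the same peek(b_data, n=5) call could be applied to the generator returned by deduper.fingerprinter(full_data). Because the peeked items are chained back in front, b_data can still be passed on afterwards.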
@fgregg, please help with this. Thanks!
Please provide a reproducible example.
That is not reproducible. I need the data, the training data, and the settings file.
If you cannot share that because of privacy issues, we offer consulting services.
Okay, thank you. One more question: when I print out full_data, this is the output I get:
('r2FEdTZzbOM', ('r2FEdTZzbOM', 'country side', 'sarkodie feat. black sherif'))
('GgeTnpTkzI0', ('GgeTnpTkzI0', 'loaded', 'tiwa savage, asake'))
('aP-MBSrzFNo', ('aP-MBSrzFNo', "'letter from overseas'", 'larry gaaga & black sherif'))
('we5gSjpX03U', ('we5gSjpX03U', 'single', 'kuami eugene'))
('E1YDr0PYg34', ('E1YDr0PYg34', 'nirvana', 'kwesi arthur x kofi mole'))
('o_oenl2Be-w', ('o_oenl2Be-w', '2 sugar', 'wizkid'))
('DPBRGWUgQsA', ('DPBRGWUgQsA', 'soweto', 'victony & tempoe'))
('5BfoawaaARc', ('5BfoawaaARc', 'shatta montez', 'shatta wale'))
('9zEPGHPZCF8', ('9zEPGHPZCF8', 'red flags', 'ruger'))
('1FbzbsWSN88', ('1FbzbsWSN88', 'midnight', 'larruso'))
This is the expected format, right?
Yes.
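The (record_id, record) pairing is the shape confirmed above. dedupe's documentation describes each record as a dictionary keyed by field name, while the rows printed above display as plain tuples, so one cheap thing that may be worth ruling out is whether read_cur yields dict-like rows (for example, a psycopg2 cursor created with cursor_factory=psycopg2.extras.RealDictCursor). The sketch below is a hypothetical helper, find_bad_records, not part of dedupe; the field names in FIELDS are assumptions and should be replaced with the fields from the actual data model.

```python
from collections.abc import Mapping

# Hypothetical field names; substitute the fields from your own data model.
FIELDS = ("title", "artist")


def find_bad_records(pairs, fields=FIELDS):
    """Return (record_id, problem) for each (record_id, record) pair whose record
    is not a mapping that contains every field in `fields`."""
    problems = []
    for record_id, record in pairs:
        if not isinstance(record, Mapping):
            problems.append(
                (record_id, f"record is a {type(record).__name__}, not a mapping")
            )
            continue
        missing = [field for field in fields if field not in record]
        if missing:
            problems.append((record_id, f"missing fields: {missing}"))
    return problems


# Hypothetical examples: the same row as a tuple and as a dict.
as_tuples = [("o_oenl2Be-w", ("o_oenl2Be-w", "2 sugar", "wizkid"))]
as_dicts = [
    ("o_oenl2Be-w", {"donor_id": "o_oenl2Be-w", "title": "2 sugar", "artist": "wizkid"})
]
print(find_bad_records(as_tuples))  # [('o_oenl2Be-w', 'record is a tuple, not a mapping')]
print(find_bad_records(as_dicts))  # []
```

An empty result only means the shape looks right; it does not rule out the other causes discussed in this thread.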