Comments (9)
I had a look at the code (blocklib/blocklib/evaluation.py, line 5 in b569b3f).
There are two reasons why you might get a ZeroDivisionError:
- there are no records in the provided data.
- there are no true matches in the provided data.
Have a look at your data and make sure you provide it in the right format.
Or provide an example that shows this error and I can try to help.
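A minimal sketch of how those two causes map onto the two divisions in the evaluation step. The variable names follow the traceback quoted further down this thread; the function name and the explicit error messages are hypothetical and are not blocklib's actual API.

```python
def reduction_ratio_and_pair_completeness(num_cand_rec_pairs, total_rec,
                                          num_block_true_matches,
                                          num_all_true_matches):
    """Hypothetical helper: the same two divisions, with explicit guards."""
    if total_rec == 0:
        # Cause 1: the provided data contains no records.
        raise ValueError("no records in the provided data")
    if num_all_true_matches == 0:
        # Cause 2: the two parties share no entity id at all.
        raise ValueError("no true matches in the provided data")
    rr = 1.0 - float(num_cand_rec_pairs) / total_rec
    pc = float(num_block_true_matches) / num_all_true_matches
    return rr, pc


# Illustrative numbers only, not taken from any real dataset.
rr, pc = reduction_ratio_and_pair_completeness(20, 100, 3, 4)
print('RR={}'.format(rr))
print('PC={}'.format(pc))
```

With guards like these, an empty ground-truth intersection would surface as a readable message rather than a bare ZeroDivisionError.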
from blocklib.
I have used two datasets, stored as text in CSV files, each with 5 attributes (id, title, authors, venue, year).
subdata1 = [x[0] for x in data_alice]
subdata2 = [x[0] for x in data_bob]
rr, pc = assess_blocks_2party([filtered_blocks_alice, filtered_blocks_bob],
                              [subdata1, subdata2])
print('RR={}'.format(rr))
print('PC={}'.format(pc))
I'm using the same code as above, but I changed x[0] to x[1] in both lines, and then it works fine and I get results.
In this case, does changing x[0] to x[1] produce wrong results?
from blocklib.
Note that subdata1 and subdata2 here represent the entity ids of the two parties, i.e. the ground truth. We use x[0] since the entity id is in the first column of every record; here 0 is the column index of the entity id. If your entity id is in the second column, then use x[1] in the list comprehension. assess_blocks_2party needs these ids to compute the pair completeness. Have a look at its documentation.
Hope it helps :)
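As a rough illustration of picking the id column by name instead of hard-coding an index, here is a small sketch using Python's standard csv module. The rows mirror the sample dataset posted in this thread; reading from an in-memory string and the variable names are assumptions made for the example.

```python
import csv
import io

# Sample rows in the layout described in this thread (id is the first column).
raw = """id, title, authors, venue, year
304, world wide, lyman ram, international conference, 1999
290, safe query, richard lomet, acm sigmod, 2001
279, database, pillip keim, international conference, 1998
"""

# skipinitialspace=True drops the blank that follows each comma.
reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
header = next(reader)
records = list(reader)

# Look the column up by name so the index cannot silently point at "title".
id_column = header.index("id")
subdata = [row[id_column] for row in records]
print(subdata)
```

Looking the index up from the header makes it harder to pass a non-id column (such as the title) as ground truth by accident.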
from blocklib.
Here is an example of my dataset:
id, title, authors, venue, year
304, world wide, lyman ram, international conference, 1999
290, safe query, richard lomet, acm sigmod, 2001
279, database, pillip keim, international conference, 1998
The entity id should be x[0], as the id attribute is the unique attribute, but using x[0] doesn't work and gives the error ZeroDivisionError: float division by zero. Is there any way to fix this error instead of using x[1]?
from blocklib.
Given that the entity id is in column 0, I don't think you should put x[1] in the list comprehension. There are a few ways you might locate the problem:
- Check if there is an intersection between the id columns of your two datasets
- Check if your filtered_blocks_alice and filtered_blocks_bob are empty
- Clone the latest blocklib and install it manually with pip install. Wilko has pushed a PR to capture all float division by zero cases and report the reason for each
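The first two checks could be scripted roughly as follows. This is only a sketch: `diagnose_ground_truth` is a hypothetical helper, not part of blocklib, and it assumes the filtered blocks are containers that support `len()`.

```python
def diagnose_ground_truth(ids_a, ids_b, blocks_a, blocks_b):
    """Hypothetical helper summarising the checks suggested above."""
    shared = set(ids_a) & set(ids_b)
    return {
        # Zero here means no true matches can ever be counted.
        "shared_ids": len(shared),
        # "304" (str) and 304 (int) never intersect, so report the types too.
        "id_types_a": sorted({type(x).__name__ for x in ids_a}),
        "id_types_b": sorted({type(x).__name__ for x in ids_b}),
        "blocks_a_empty": len(blocks_a) == 0,
        "blocks_b_empty": len(blocks_b) == 0,
    }


# Illustrative inputs: same ids, but stored as strings on one side only.
report = diagnose_ground_truth(["304", "290"], [304, 290], {"key": [0]}, {})
for name, value in report.items():
    print(name, value)
```

If `shared_ids` comes back as 0, the ground truth of the two parties has nothing in common, which matches the second cause of the ZeroDivisionError described at the top of this thread.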
from blocklib.
Thanks Wang for your suggestions.
1- I have checked the data types of id in both datasets and they have the same type.
2- filtered_blocks_acm and filtered_blocks_dblp are not empty.
3- I have installed the latest version.
I still have the same issue, and here is the screenshot:
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-46-4b6a340a3700> in <module>
5
6 rr, pc = assess_blocks_2party([filtered_blocks_acm, filtered_blocks_dblp],
----> 7 [subdata1, subdata2])
8
9 print('RR={}'.format(rr))
~\AppData\Roaming\Python\Python37\site-packages\blocklib\evaluation.py in assess_blocks_2party(filtered_reverse_indices, data)
45 # pair completeness is the "recall" before matching stage
46 rr = 1.0 - float(num_cand_rec_pairs) / total_rec
---> 47 pc = float(num_block_true_matches) / num_all_true_matches
48 return rr, pc
ZeroDivisionError: float division by zero
from blocklib.
Hello Wang, I am also getting the same error, "ZeroDivisionError: float division by zero". Can you suggest how to resolve it?
from blocklib.
Here is the error sample:
from blocklib.
After running the evaluation methods, I figured out that the ground truth values provided are in different formats.
For example, the id in dataset 1 is "conf/sigmod/AbadiC02" while the id in dataset 2 has the form "f2Lea-RN8dsJ". So when the intersection is calculated, num_all_true_matches = len(entity1.intersection(entity2)) becomes zero, and computing the pc value raises "ZeroDivisionError: float division by zero".
My question is: do the id or ground truth columns of both datasets have to be in the same format?
If my ids differ, is there another approach by which we can calculate the rr and pc values?
Can we use the year column for this purpose?
id, title, authors, venue, year
conf/sigmod/AbadiC02, world wide, lyman ram, international conference, 1999
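One possible way to handle ids in different formats, sketched under a strong assumption: that a separate table of known matches between the two datasets is available. The `known_matches` mapping below is hypothetical, and the pairing of the two quoted ids is assumed purely for illustration. The idea is to relabel one side so that matching records carry the same entity id before the intersection is taken. A non-unique column such as year would not serve this purpose, because unrelated records that share a year would be counted as true matches.

```python
# Ids as quoted in this thread, plus placeholder ids that have no counterpart.
subdata1 = ["conf/sigmod/AbadiC02", "placeholder-only-in-data-1"]
subdata2 = ["f2Lea-RN8dsJ", "placeholder-only-in-data-2"]

# With different id formats the raw intersection is empty.
raw_matches = len(set(subdata1).intersection(set(subdata2)))

# Hypothetical mapping: data-2 id -> data-1 id for records known to match.
known_matches = {"f2Lea-RN8dsJ": "conf/sigmod/AbadiC02"}

# Relabel data 2; ids without a known match keep their original value.
relabelled2 = [known_matches.get(entity_id, entity_id) for entity_id in subdata2]
num_all_true_matches = len(set(subdata1).intersection(set(relabelled2)))

print(raw_matches, num_all_true_matches)
```

The relabelled list keeps the original record order, so it could stand in for subdata2 wherever the ground truth is passed in, provided the mapping itself is trustworthy.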
from blocklib.