ulikoehler / otr
Optical table recognition - recognize tables in scan images using OpenCV
License: GNU General Public License v3.0
Right now, all I get is an image, out.png, that shows the cells and their grid numbers (shown below).
Is there a way to programmatically get the table structure (rows, columns, spans if any) and the bounding box of each cell? I would then like to perform OCR and extract the table data from the image.
Thanks in advance. :)
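One possible way to derive row and column indices, once a bounding box per cell is available, is sketched below. This is a minimal sketch and not part of OTR: it assumes the boxes are (x, y, width, height) tuples, for example as returned by cv2.boundingRect on each cell contour, and the names cluster_positions, assign_grid and tolerance are hypothetical. Spanning cells are not handled.

```python
from typing import Dict, List, Tuple

# Assumed box layout: (x, y, width, height), the order cv2.boundingRect uses.
Box = Tuple[int, int, int, int]


def cluster_positions(values: List[int], tolerance: int) -> List[float]:
    """Group 1-D positions; a value within `tolerance` of the previous
    sorted value joins that group. Returns the mean of each group."""
    clusters: List[List[int]] = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= tolerance:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]


def assign_grid(boxes: List[Box], tolerance: int = 10) -> Dict[Box, Tuple[int, int]]:
    """Map each box to a (row, column) index based on its top-left corner."""
    row_pos = cluster_positions([y for _, y, _, _ in boxes], tolerance)
    col_pos = cluster_positions([x for x, _, _, _ in boxes], tolerance)

    def nearest(positions: List[float], value: int) -> int:
        # Index of the cluster centre closest to `value`.
        return min(range(len(positions)), key=lambda i: abs(positions[i] - value))

    return {box: (nearest(row_pos, box[1]), nearest(col_pos, box[0])) for box in boxes}


# Hypothetical 2x2 table whose cell corners are off by a pixel or two.
boxes = [(10, 10, 50, 20), (62, 11, 50, 20), (10, 32, 50, 20), (61, 33, 50, 20)]
grid = assign_grid(boxes)
```

Each box can then be cropped from the image and passed to an OCR engine. The tolerance value decides how far apart two corners may be and still count as the same row or column, so it probably needs tuning per scan resolution.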
Dear Uli,
Thank you for this great package; it has made my life much easier. I am using it to recover information on how Peru's congresspersons voted on any given law. I'm planning to publish this information to hopefully make the next elections more transparent. This is how the documents' fingerprint looks:
I've been able to extract the information for some pages, but for others I've failed. If you look closely at the fingerprint, the last column of row 21 is labeled as row 22. Do you have any recommendation on how to ensure that table recognition chooses the correct x, y coordinates? Thank you so much!
AttributeError Traceback (most recent call last)
in ()
4 contour_analyzer.filter_contours(min_area=400)
5 contour_analyzer.build_graph()
----> 6 contour_analyzer.remove_non_table_nodes()
7 contour_analyzer.compute_contour_bounding_boxes(e)
8 contour_analyzer.separate_supernode(f)
~/transfer_learning/table_data/OTR/TableRecognition.py in remove_non_table_nodes(self)
208 self.supernode_idx = max(self.g.degree().items(), key=operator.itemgetter(1))[0]
209 for i in range(len(self.contours)):
--> 210 if self.contours[i] is None: continue
211 nxt, prev, first_child, parent = self.hierarchy[0, i]
212 # Remove node if it has a non-supernode node as parent
AttributeError: 'DiDegreeView' object has no attribute 'items'
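This error is consistent with running the code against networkx 2.x, where degree() returns a view object that yields (node, degree) pairs instead of a dict. A minimal sketch of a lookup that works with both forms follows; the helper name highest_degree_node is hypothetical, and the list of pairs only stands in for a real degree view so the example runs without networkx.

```python
import operator


def highest_degree_node(degree_view):
    """Return the node with the largest degree.

    `degree_view` may be a dict (networkx 1.x) or any iterable of
    (node, degree) pairs (what a networkx 2.x degree view yields).
    """
    # dict() accepts both a mapping and an iterable of pairs.
    return max(dict(degree_view).items(), key=operator.itemgetter(1))[0]


# Stand-in for self.g.degree(); assumption, not real networkx output.
pairs = [("a", 1), ("supernode", 5), ("b", 2)]
node = highest_degree_node(pairs)
```

Applied to the line in the traceback, this would presumably amount to wrapping the call as dict(self.g.degree()).items().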
Traceback (most recent call last):
File "/home/temp/Desktop/OTR-master/MIME/contour1.py", line 41, in
contour_analyzer.visualize_contours(img)
File "/home/temp/Desktop/OTR-master/MIME/TableRecognition.py", line 442, in visualize_contours
cv2.drawContours(img, self.contours_bbox, -1, (0,255,0), thickness)
cv2.error: OpenCV(4.1.0) /io/opencv/modules/imgproc/src/drawing.cpp:2606: error: (-215:Assertion failed) reader.ptr != NULL in function 'cvDrawContours'
Is this an OpenCV version conflict?
How can I install cv_algorithms?
I ran python test-otr.py 0110_099.png and got this error:
Traceback (most recent call last):
File "C:/Users/User/Documents/проект по иммунологии/OTR/test-otr.py", line 56, in
img = runOTR("0110_099.png")
File "C:/Users/User/Documents/проект по иммунологии/OTR/test-otr.py", line 27, in runOTR
contour_analyzer.find_corner_clusters()
File "C:\Users\User\Documents\проект по иммунологии\OTR\TableRecognition.py", line 293, in find_corner_clusters
distmat = scipy.spatial.distance.cdist(corners, corners, 'euclidean')
File "C:\Users\User\Anaconda3\lib\site-packages\scipy\spatial\distance.py", line 2369, in cdist
raise ValueError('XA must be a 2-dimensional array.')
ValueError: XA must be a 2-dimensional array.
file:
http://www.nasflmuseum.com/uploads/4/9/5/8/4958573/_7848632_orig.jpg
python3 test-otr.py test2.jpg
Traceback (most recent call last):
File "test-otr.py", line 56, in <module>
img = runOTR(args.infile)
File "test-otr.py", line 18, in runOTR
contour_analyzer = TableRecognition.ContourAnalyzer(imgDil)
File "/home/piotr/OTR/TableRecognition.py", line 76, in __init__
im2, contours, hierarchy = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_TC89_KCOS, **kwargs)
ValueError: not enough values to unpack (expected 3, got 2)
Python 3.6.3
Distributor ID: Ubuntu
Description: Ubuntu 17.10
Release: 17.10
Codename: artful
OpenCV: 3.1.0
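For context, cv2.findContours returns three values (image, contours, hierarchy) in OpenCV 3.x but only two (contours, hierarchy) in OpenCV 2.x and 4.x, so the message above suggests that the cv2 module Python actually imported is not a 3.x build. A minimal sketch of a version-tolerant unpacking is shown here; the helper name split_find_contours_result is hypothetical, and the tuples below are placeholders rather than real OpenCV output.

```python
def split_find_contours_result(result):
    """Return (contours, hierarchy) from a cv2.findContours return value.

    Works for both the two-value and the three-value form because the
    last two items are always contours and hierarchy.
    """
    contours, hierarchy = result[-2:]
    return contours, hierarchy


# Placeholder return values standing in for real cv2.findContours output.
three_value_form = ("image", ["c0", "c1"], "hierarchy")
two_value_form = (["c0", "c1"], "hierarchy")

contours3, hierarchy3 = split_find_contours_result(three_value_form)
contours2, hierarchy2 = split_find_contours_result(two_value_form)
```

In TableRecognition.py this would wrap the existing call, for example split_find_contours_result(cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_TC89_KCOS)). Printing cv2.__version__ from the same interpreter is a quick way to check which build is in use.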