Comments (6)
Current API:
>>> from camelot.pdf import Pdf
>>> from camelot.lattice import Lattice
>>> manager = Pdf(Lattice(), "/path/to/pdf")
>>> tables = manager.extract()
This is redundant, since two imports are needed to create a 'manager' that parses PDFs, and unintuitive, since the user doesn't know what is returned in tables. It is a dict with multiple keys, which forces the user to consult the documentation: most keys hold different stats, and the data itself sits under the 'data' key as a 2-D list.
Edit: Commented again.
Will list the properties of the Table object here later. The idea is to let the user get the data without having to consult the docs often, either through a df property that holds a pandas DataFrame or, perhaps, a 2-D list again. Other properties can be the various stats.
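To make the idea concrete, here is a minimal sketch of what such a Table object could look like. It is not camelot's actual implementation: the class body, the `stats` attribute, and the sample values are all hypothetical, and the `df` property assumes pandas is installed.

```python
class Table:
    """Hypothetical sketch of the proposed Table object (not camelot's real class)."""

    def __init__(self, data, stats=None):
        self.data = data                # 2-D list: one inner list per row
        self.stats = dict(stats or {})  # assumed container for the various stats

    @property
    def shape(self):
        # (rows, columns); an empty table has zero columns
        n_rows = len(self.data)
        n_cols = len(self.data[0]) if self.data else 0
        return (n_rows, n_cols)

    @property
    def df(self):
        # Imported lazily so the rest of the sketch runs without pandas
        import pandas as pd
        return pd.DataFrame(self.data)

    def __repr__(self):
        return "<Table shape=({},{})>".format(*self.shape)


# Made-up sample data, for illustration only
table = Table([["a", "b"], ["1", "2"], ["3", "4"]], stats={"accuracy": 96})
print(table)        # <Table shape=(3,2)>
print(table.stats)  # {'accuracy': 96}
```

With something like this, the user reaches the data through `table.df` or `table.data` and never has to remember dict keys.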
Make the configuration parameters for parsing methods more intuitive, whether through better docs, simpler parameters, or removing some of them. For example, a user has no idea what threshold_constant is.
New API:
>>> import camelot
>>> tables = camelot.read_pdf("foo.pdf")
>>> tables
<TableList n=2>
>>> tables.to_csv(zip=True) # to_json, to_excel, to_html
>>> tables[0]
<Table shape=(3,4)>
>>> tables[0].parsing_report
{
    "accuracy": 96,
    "whitespace": 80,
    "time_taken": 0.5,
    "page": 1
}
>>> df = tables[0].df
>>> tables[0].to_csv("foo.csv") # to_json, to_excel, to_html
Each table is internally represented as a pandas DataFrame.
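A rough sketch of how a container like TableList could support the indexing and repr shown in the session above. This is an illustration only, not the library's code: tables are stood in for by plain 2-D lists so the sketch needs nothing beyond the standard library, and `to_csv_strings` is a hypothetical helper, not the proposed `to_csv`.

```python
import csv
import io


class TableList:
    """Hypothetical container that mimics the repr and indexing of the proposal."""

    def __init__(self, tables):
        self._tables = list(tables)

    def __len__(self):
        return len(self._tables)

    def __getitem__(self, index):
        return self._tables[index]

    def __iter__(self):
        return iter(self._tables)

    def __repr__(self):
        return "<TableList n={}>".format(len(self))

    def to_csv_strings(self):
        # Hypothetical stand-in for an export method: one CSV string per table
        results = []
        for rows in self._tables:
            buffer = io.StringIO()
            csv.writer(buffer, lineterminator="\n").writerows(rows)
            results.append(buffer.getvalue())
        return results


# Two made-up tables, each a 2-D list of cell strings
tables = TableList([[["a", "b"], ["1", "2"]], [["x"], ["y"]]])
print(tables)       # <TableList n=2>
print(len(tables))  # 2
```

Keeping the container a thin sequence wrapper means `tables[0]`, `len(tables)`, and iteration all behave like a regular list, while export helpers can operate on every table at once.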
Opening another issue #95 for configuration parameter names.