Comments (10)
OK. I fixed the doc d787ade
from tabula-py.
Pass the parameter pages="all", or specific pages such as pages="1-2,3".
Take a look at the example in the repository.
from tabula-py.
Nice, convert_into('./examples/data.pdf', 'test.csv', output_format='csv', pages="all") works. Have you used the format part of the options at all?
from tabula-py.
Yeah, I've tried all 3 formats and they work fine.
It looks like a super useful lib :)
convert_into(r"\data\data2.pdf", r"\data\data2.tsv", pages="1-2", output_format="tsv")
convert_into(r"\data\data2.pdf", r"\data\data2.csv", pages="1-2", output_format="csv")
convert_into(r"\data\data2.pdf", r"\data\data2.json", pages="1-2", output_format="json")
from tabula-py.
Thank you @GGPay for your support. It seems to be solved.
@adack123 If you still have other problems, please file another issue.
from tabula-py.
@GGPay Thanks for the feedback. I wasn't really referring to output_format but rather to the format part of the options, unless they are one and the same.
@chezou
The docs say, under options:
format (str, optional): Format for output file or extracted object. (CSV, TSV, JSON)
Does this refer to output_format? I want to do something like read_pdf('foo.pdf', format='csv') and have the extracted object come back as CSV.
Doing something like read_pdf('foo.pdf', output_format='csv') doesn't get me the desired output.
from tabula-py.
Ah, I understand. The doc is outdated.
You are expected to use output_format rather than format directly.
https://github.com/chezou/tabula-py/blob/master/tabula/wrapper.py#L104-L115
You can do something like:
convert_into('input.pdf', 'output.csv', output_format='csv')
from tabula-py.
Gotcha, format -> output_format.
What about in the read_pdf method? I see that you can return JSON here: https://github.com/chezou/tabula-py/blob/master/tabula/wrapper.py#L48
What about CSV, etc.?
from tabula-py.
I can't think of a use case for having read_pdf return CSV. Converting to CSV involves a lot of options, so I don't think it is tabula-py's responsibility.
Why don't you convert the data frame to CSV using pandas.DataFrame.to_csv()?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
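A minimal sketch of that suggestion, assuming you already have a pandas DataFrame in hand. The frame below is hand-built sample data standing in for an extracted table (it is not from this thread), so the snippet runs without a PDF or Java:

```python
import pandas as pd

# Hypothetical stand-in for a table extracted from a PDF; in practice this
# frame would come from tabula-py rather than being built by hand.
df = pd.DataFrame({"name": ["alpha", "beta"], "value": [1, 2]})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)

# Passing sep="\t" yields tab-separated output instead.
tsv_text = df.to_csv(index=False, sep="\t")

print(csv_text)
```

Passing a file path as the first argument to to_csv writes the file instead of returning a string, and the remaining formatting choices (separator, quoting, encoding, index) stay on the pandas side.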
from tabula-py.
Yeah, I was sort of thinking that was out of scope for tabula-py, but I was curious. Anyway, thanks for the clarification on format.
from tabula-py.