
Comments (10)

chezou commented on July 29, 2024

OK. I fixed the doc in d787ade.

from tabula-py.

GGPay commented on July 29, 2024

Pass the parameter pages="all", or give specific pages, e.g. pages="1-2,3".

Take a look at the example in the repository.


aarondack commented on July 29, 2024

Nice, convert_into('./examples/data.pdf', 'test.csv', output_format='csv', pages="all") works. Have you used the format part of the options at all?


GGPay commented on July 29, 2024

Yeah, I've tried all 3 formats and they work fine.
It might be a super useful lib :)

convert_into(r"\data\data2.pdf", r"\data\data2.tsv", pages="1-2", output_format="tsv")
convert_into(r"\data\data2.pdf", r"\data\data2.csv", pages="1-2", output_format="csv")
convert_into(r"\data\data2.pdf", r"\data\data2.json", pages="1-2", output_format="json")


chezou commented on July 29, 2024

Thank you @GGPay for your support. It seems to be solved.

@adack123 If you still have other problems, please file another issue.


aarondack commented on July 29, 2024

@GGPay Thanks for the feedback. I wasn't really referring to the output_format but rather the format part of the options, unless they are one and the same.

@chezou
The docs say under options

format (str, optional):
Format for output file or extracted object. (CSV, TSV, JSON)

Does this refer to output_format? I want to do something like read_pdf('foo.pdf', format='csv') and have the extracted object come back as CSV.

Doing something like, read_pdf('foo.pdf', output_format='csv') doesn't get me to the desired output.


chezou commented on July 29, 2024

Ah, I understand. The doc is old.

You are expected to use output_format rather than using format directly.
https://github.com/chezou/tabula-py/blob/master/tabula/wrapper.py#L104-L115

You can do something like:

convert_into('input.pdf', 'output.csv', output_format='csv')


aarondack commented on July 29, 2024

@chezou

Gotcha, format -> output_format

What about in the read_pdf method? I see that you can return as JSON here, https://github.com/chezou/tabula-py/blob/master/tabula/wrapper.py#L48

What about CSV, etc?


chezou commented on July 29, 2024

I can't think of a use case for returning CSV from read_pdf. Converting to CSV involves a lot of options, so I don't think it is tabula-py's responsibility.

Why don't you convert the DataFrame to CSV using pandas.DataFrame.to_csv()?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
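
For anyone landing here later, a minimal sketch of that suggestion. The DataFrame below is a hypothetical stand-in for whatever read_pdf('foo.pdf') returns (the column names and values are made up), so the snippet runs without a PDF or a Java install; only the to_csv() calls are the point.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by read_pdf('foo.pdf').
# The columns and values are invented for illustration only.
df = pd.DataFrame({"name": ["alpha", "beta"], "value": [1, 2]})

# Called without a path, to_csv() returns the CSV text as a string.
# index=False leaves the row index out of the output.
csv_text = df.to_csv(index=False)

# The same call with a tab separator produces TSV text.
tsv_text = df.to_csv(index=False, sep="\t")

print(csv_text)
print(tsv_text)
```

Passing a file path as the first argument, e.g. df.to_csv('output.csv', index=False), writes the file instead of returning a string.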


aarondack commented on July 29, 2024

Yeah, I was sort of thinking that was out of scope for tabula-py, but I was curious. Anyway, thanks for the clarification on format.

