Comments (6)

mara004 commented on September 25, 2024

Unfortunately, I can't really comment on the values returned by get_pos(); we're just forwarding them from FPDFPageObj_GetBounds().

I acknowledge that the DPI is a problem, but it's not really a bug: pdfium calculates it from the pixel size relative to the occupied canvas area, so this is not the DPI metadata embedded in the image. The docs for get_metadata() mention this.
I know it's confusing, but that's just how pdfium's API is designed, and we're only providing the bindings.
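
To illustrate the kind of calculation described above, here is a minimal sketch, not pdfium's actual code. It assumes an axis-aligned image whose bounds are given in PDF points (1/72 inch); the function name `effective_dpi` is made up for this example.

```python
def effective_dpi(pixel_width, pixel_height, bounds):
    """Estimate DPI from pixel size relative to the occupied canvas area.

    bounds is (left, bottom, right, top) in PDF points (1 pt = 1/72 inch).
    Hypothetical helper for illustration; not part of pypdfium2 or pdfium.
    """
    left, bottom, right, top = bounds
    # Convert the occupied area from points to inches.
    width_in = (right - left) / 72
    height_in = (top - bottom) / 72
    if width_in <= 0 or height_in <= 0:
        raise ValueError("bounds must describe a non-empty area")
    # Pixels per inch along each axis.
    return pixel_width / width_in, pixel_height / height_in
```

With this definition, the same bitmap reports a different DPI whenever it is placed at a different size, and wrong bounds directly yield a wrong DPI.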

(Also note, I would have expected people to use one of the package-specific templates for an issue like this, just to have version info available and so on. I plan to clarify point 2 of the checklist as this seems to be unclear.)

from pypdfium2.

mara004 commented on September 25, 2024

@PasaOpasen Ah, I figured something out. The image seems to be recursively nested in Form XObjects (twice, actually).
That is probably what confuses pdfium. It reminds me a lot of https://crbug.com/pdfium/2073
My guess is that pdfium currently returns dimensions relative to the nearest Form XObject, or something similar.

>>> import pypdfium2 as pdfium
>>> pdf = pdfium.PdfDocument("color_lines_bad.pdf")
>>> pdf
<PdfDocument uuid:9198c957 from '/home/me/Downloads/color_lines_bad.pdf'>
>>> page = pdf[0]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=1))
[]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=2))
[]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=3))
[<PdfImage uuid:0fc6e6e2>]
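
The manual probing above could be wrapped in a small loop. This is only a sketch: `first_depth_with_objects` is a made-up helper, and it assumes nothing about the page beyond the `get_objects(filter=..., max_depth=...)` call used in the session.

```python
def first_depth_with_objects(page, obj_type, limit=5):
    """Return (depth, objects) for the smallest max_depth that yields objects.

    Hypothetical helper for illustration. `page` only needs a
    get_objects(filter=..., max_depth=...) method as used above.
    Returns (None, []) if nothing is found up to `limit`.
    """
    for depth in range(1, limit + 1):
        found = list(page.get_objects(filter=[obj_type], max_depth=depth))
        if found:
            return depth, found
    return None, []
```

Called with the page from the session and `pdfium.raw.FPDF_PAGEOBJ_IMAGE`, it should report depth 3, matching the output above, i.e. the image sits inside two levels of Form XObjects.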

mara004 commented on September 25, 2024

I just filed https://bugs.chromium.org/p/pdfium/issues/detail?id=2100 for this.

mara004 commented on September 25, 2024

Would you mind closing this issue? I don't think we can do much else now except wait for pdfium.

Or do you reckon we should prevent get_pos() calls on nested objects by raising an exception?
However, we'd have to remember to remove that again once the pdfium issue is fixed...
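
For reference, such a guard might look roughly like the sketch below. It is an illustration of the idea only, not a proposed patch: `get_pos_checked` is a made-up name, and it assumes the object exposes its Form XObject nesting depth as a `level` attribute (0 meaning not nested).

```python
def get_pos_checked(obj):
    """Call obj.get_pos(), but refuse objects nested in Form XObjects.

    Hypothetical wrapper for illustration. Assumes `obj.level` holds the
    number of parent Form XObjects; objects without it are treated as
    top-level.
    """
    level = getattr(obj, "level", 0)
    if level > 0:
        raise RuntimeError(
            f"get_pos() is unreliable for nested objects (level {level}); "
            "see pdfium issue 2100"
        )
    return obj.get_pos()
```

Keeping the check in one place like this would at least make it easy to remove once pdfium is fixed.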

mara004 commented on September 25, 2024

I also checked with FPDFPageObj_GetRotatedBounds(); it effectively returns the same info:
((0.0, 0.0), (981.0, 0.0), (981.0, 1256.25), (0.0, 1256.25))
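
To compare a quad like this against a (left, bottom, right, top) rectangle, it can be reduced to its axis-aligned bounding box. A small sketch follows; `quad_to_bounds` is a made-up helper, not a pypdfium2 function.

```python
def quad_to_bounds(quad):
    """Reduce four (x, y) corner points to (left, bottom, right, top)."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


quad = ((0.0, 0.0), (981.0, 0.0), (981.0, 1256.25), (0.0, 1256.25))
print(quad_to_bounds(quad))  # (0.0, 0.0, 981.0, 1256.25)
```

Since this quad is not rotated, the bounding box carries exactly the same information as the four corners.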

(If you're confident what pdfium returns is wrong, then feel free to ask about this on pdfium's mailing list or file a pdfium bug report.) - Update: see finding below

PasaOpasen commented on September 25, 2024

@mara004 thank you!
