Comments (6)
Unfortunately I can't really comment on the values returned by get_pos(), we're just forwarding them from FPDFPageObj_GetBounds()...
I acknowledge the DPI value is confusing, but it's not really a bug: pdfium calculates it from the pixel size relative to the occupied canvas area, so it is not the DPI metadata embedded in the image. The docs for get_metadata() mention this.
While I know it's confusing, that's just how pdfium's API is designed, and we're only providing the bindings.
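As a rough illustration of the DPI point above, here is a minimal sketch of how a DPI value can be derived from pixel size and occupied page area. This is not pdfium's actual code; the function name `effective_dpi` and the pixel sizes are hypothetical. It assumes bounds are given in PDF points (1/72 inch).

```python
def effective_dpi(pixel_size, bounds_pt):
    """Return (horizontal, vertical) DPI of a bitmap as placed on a page.

    pixel_size: (width, height) of the bitmap in pixels.
    bounds_pt:  (left, bottom, right, top) of the occupied area in PDF
                points, where 1 point = 1/72 inch.
    Hypothetical helper for illustration, not part of pdfium or pypdfium2.
    """
    px_w, px_h = pixel_size
    left, bottom, right, top = bounds_pt
    width_in = (right - left) / 72    # occupied width in inches
    height_in = (top - bottom) / 72   # occupied height in inches
    return px_w / width_in, px_h / height_in


# The same (made-up) 720x1440 px bitmap placed at two different sizes:
print(effective_dpi((720, 1440), (0, 0, 360, 720)))  # 5 x 10 inch area
print(effective_dpi((720, 1440), (0, 0, 180, 360)))  # half the size, twice the DPI
```

The value depends on how large the image is drawn, which is why it can differ from whatever DPI is stored inside the image file itself. It also means that if the reported bounds are off, the derived DPI is off too.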
(Also note, I would have expected people to use one of the package-specific templates for an issue like this, just to have version info available and so on. I plan to clarify point 2 of the checklist as this seems to be unclear.)
from pypdfium2.
@PasaOpasen Ah, I figured something out. The image seems to be recursively nested in Form XObjects (twice, actually).
That's probably what confuses pdfium. It reminds me a lot of https://crbug.com/pdfium/2073
I would hazard a guess that pdfium currently returns dimensions relative to the nearest Form XObject or something.
>>> import pypdfium2 as pdfium
>>> pdf = pdfium.PdfDocument("color_lines_bad.pdf")
>>> pdf
<PdfDocument uuid:9198c957 from '/home/me/Downloads/color_lines_bad.pdf'>
>>> page = pdf[0]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=1))
[]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=2))
[]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=3))
[<PdfImage uuid:0fc6e6e2>]
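To illustrate the guess above (bounds possibly being relative to the nearest Form XObject rather than the page), here is a small sketch of how form-space bounds would map to page space if the matrix of each enclosing Form XObject were applied. The helper names and the two form matrices are hypothetical; only the bounds values come from this thread. Matrices use the PDF convention (a, b, c, d, e, f), i.e. x' = a*x + c*y + e and y' = b*x + d*y + f.

```python
def apply(m, point):
    """Transform a point by a PDF matrix (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def compose(inner, outer):
    """Return the matrix equivalent to applying `inner` first, then `outer`."""
    a1, b1, c1, d1, e1, f1 = inner
    a2, b2, c2, d2, e2, f2 = outer
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def bounds_to_page(bounds, form_matrices):
    """Map (left, bottom, right, top) from the innermost form's space to page space.

    form_matrices: matrices of the enclosing Form XObjects, innermost first.
    """
    total = (1, 0, 0, 1, 0, 0)  # identity
    for m in form_matrices:
        total = compose(total, m)
    left, bottom, right, top = bounds
    corners = [(left, bottom), (right, bottom), (right, top), (left, top)]
    xs, ys = zip(*(apply(total, c) for c in corners))
    # Take min/max over all corners so rotations are handled as well
    return (min(xs), min(ys), max(xs), max(ys))


# Bounds as reported in this thread; both form matrices are made up.
inner_form = (0.5, 0, 0, 0.5, 0, 0)   # hypothetical: scale by 0.5
outer_form = (1, 0, 0, 1, 100, 50)    # hypothetical: translate by (100, 50)
print(bounds_to_page((0.0, 0.0, 981.0, 1256.25), [inner_form, outer_form]))
```

If the nesting matrices are not the identity, the form-relative and page-relative bounds differ, which would be consistent with positions that look wrong for nested images.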
I just filed https://bugs.chromium.org/p/pdfium/issues/detail?id=2100 for this.
Would you mind closing this issue? I don't think we can do much else now except wait for pdfium.
Or do you reckon we should prevent get_pos() calls on nested objects by raising an exception?
However, we'd have to remember to remove that again once the pdfium issue is fixed...
I also checked with FPDFPageObj_GetRotatedBounds(), it effectively returns the same info:
((0.0, 0.0), (981.0, 0.0), (981.0, 1256.25), (0.0, 1256.25))
(If you're confident what pdfium returns is wrong, then feel free to ask about this on pdfium's mailing list or file a pdfium bug report.) - Update: see finding below
@mara004 thank u!