Comments (5)
Hi vitstradal,
Thank you for reporting this issue!
I have a busy day ahead of me, but I will try to look into it as soon as possible.
from combine_pdf.
Hi Vitstradal,
I found the issue...
PaperPort has an issue where PDF data will be placed within a PDF comment.
PDF comments start with a "%" sign and end with an EOL marker ("\r" or "\n"). PaperPort omitted the EOL marker, placing critical data within the comment.
I wrote a workaround that parses the comment's data and attempts to salvage the misplaced critical information.
This workaround assumes that comments would not contain PDF-parsable data at the very end of the comment's line... which is an unsafe assumption. Hence, if I get reports that this workaround breaks valid PDF files with comments, I might remove it!
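To illustrate the idea (this is only a rough sketch, not the actual code in CombinePDF), a salvage step could look for something that resembles an object header at the end of a comment. The helper name `split_comment` and the `OBJ_HEADER` pattern are hypothetical and made up for this example:

```ruby
# Hypothetical helper, NOT CombinePDF's real implementation.
# Assumption: the misplaced data starts with an indirect object
# header of the form "<number> <generation> obj".
OBJ_HEADER = /\d+\s+\d+\s+obj\b/

# Returns [comment_text, salvaged_data]; either element may be nil.
def split_comment(line)
  # Not a comment at all: everything is regular data.
  return [nil, line] unless line.start_with?('%')

  idx = line.index(OBJ_HEADER)
  # A plain comment with nothing to salvage.
  return [line, nil] if idx.nil?

  # Keep the text before the header as the comment, salvage the rest.
  [line[0...idx].rstrip, line[idx..-1]]
end

comment, salvaged = split_comment('%PaperPort marker 12 0 obj << /Type /Page >>')
puts comment   # the part treated as a comment
puts salvaged  # the part handed back to the parser
```

The same sketch also shows why the assumption is unsafe: a perfectly valid comment such as `% see 3 0 obj for details` would be split as well, and `3 0 obj for details` would be handed to the parser as if it were data.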
I'm running some tests and I will release an updated version shortly.
I released the updated version (v. 0.2.12). It works on my computer... please let me know if it's working for you.
"... if I get reports that this workaround breaks valid PDF files with comments, I might remove it!"
I understand. PaperPort is buggy, but what do you expect for €50 :-)
Anyway, it is weird that an ordinary PDF viewer (Evince, in my case) can parse it.
Thank you very much.
I'm happy this works for you :-)
As for an ordinary PDF viewer reading the file:
At the end of the PDF file there is something called an X-Ref table. This table tells the PDF viewer the binary address (byte offset) of each object.
Normally, PDF viewers follow the X-Ref table and find objects using their binary address (even if the data starts inside a comment line).
But CombinePDF works a little differently: it reads the whole file from top to bottom, building a tree of objects as well as a list of objects, so that when you take a page out of the PDF, the fonts and the resources are automatically attached to that page (no need to search for them in the X-Ref table)...
...When I saw PaperPort's PDF, I was thinking about rewriting the parser to match binary viewers at some point (I believe I can still build a tree this way) - but it takes more work and I'm not sure how the performance might be affected... Maybe another time ;-)
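The difference between the two approaches can be shown with a toy example. This is a simplified illustration with made-up data, not real viewer or CombinePDF code; in particular, the "sequential scan" below is reduced to stripping comments and searching the remaining text:

```ruby
# Toy data (an assumption for this example): the object header starts in
# the middle of a comment line because the EOL after the comment is missing.
data = "%PDF-1.4\n%junk comment 1 0 obj\n<< /Type /Catalog >>\nendobj\n"

# What an X-Ref entry would record for object 1: its byte offset.
# (The data is plain ASCII, so character index and byte offset are equal.)
offset = data.index('1 0 obj')

# Offset-based lookup (viewer style): jump straight to the offset and read,
# never noticing that the line began with "%".
by_offset = data.byteslice(offset, 7)

# Sequential scan (simplified): drop everything from "%" to the end of the
# line, then look for object headers in what is left.
without_comments = data.lines.map { |line| line.sub(/%.*$/, '') }.join
found_by_scan = without_comments.include?('1 0 obj')

puts by_offset      # the header is found via its offset
puts found_by_scan  # the header is lost once comments are stripped
```

A reader that jumps to the recorded offset still sees the object, while a reader that honours the comment syntax from the top of the file loses it, which matches the behaviour described above.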