Comments (6)
OK, these failures were caused by missing font metadata; this is fixed in commit 04618c4.
from pdfalto.
Possibly related to #5.
from pdfalto.
Fewer failures, but still some (I didn't count, but more than 100, I think). Here are a few examples:
Uploading cjem8_2p0057.pdf…
Uploading ott-4-059.pdf…
Uploading gmos34-715.pdf…
from pdfalto.
What about using the same placeholder for all unresolved character codes beyond a certain number, in order to make pdfalto robust? Otherwise it will fail on some weird PDFs that have complete embedded fonts with maybe hundreds of unresolved codes.
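To make the suggestion concrete, here is a minimal sketch of one way it could work. It reads "a certain number" as a cap on how many distinct unresolved codes get their own placeholder. Everything in it is an assumption for illustration: `UnresolvedCodeMapper`, `placeholderFor`, the use of the Unicode Private Use Area for unique placeholders, and U+FFFD as the shared fallback are hypothetical and do not come from pdfalto or xpdf.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical helper (not part of pdfalto or xpdf): assigns a unique
// placeholder to the first `maxUnique` distinct unresolved character codes,
// then returns one shared placeholder for every further unresolved code.
class UnresolvedCodeMapper {
public:
    explicit UnresolvedCodeMapper(std::size_t maxUnique)
        // Clamp to the size of the BMP Private Use Area (U+E000..U+F8FF).
        : maxUnique_(std::min<std::size_t>(maxUnique, kPrivateUseSize)) {}

    // Returns the Unicode code point to emit for an unresolved character code.
    char32_t placeholderFor(std::uint32_t charCode) {
        auto it = assigned_.find(charCode);
        if (it != assigned_.end()) {
            return it->second;  // code seen before: reuse its placeholder
        }
        if (assigned_.size() < maxUnique_) {
            // Still under the cap: hand out the next private-use code point.
            char32_t cp = kFirstPrivateUse + static_cast<char32_t>(assigned_.size());
            assigned_.emplace(charCode, cp);
            return cp;
        }
        // Cap reached: every further unresolved code shares one placeholder,
        // so the number of placeholders stays bounded.
        return kSharedPlaceholder;
    }

private:
    static constexpr std::size_t kPrivateUseSize = 0xF8FF - 0xE000 + 1;
    static constexpr char32_t kFirstPrivateUse = 0xE000;    // assumed choice
    static constexpr char32_t kSharedPlaceholder = 0xFFFD;  // assumed choice
    std::size_t maxUnique_;
    std::unordered_map<std::uint32_t, char32_t> assigned_;
};
```

With a cap of 2, for example, the first two distinct unresolved codes would map to U+E000 and U+E001, and any later unresolved code would map to U+FFFD. The trade-off is that codes past the cap can no longer be told apart in the output, in exchange for the conversion not failing on such fonts.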
from pdfalto.
This is done in the production version (master).
@kermitt2 the documents you referenced above should rather be referenced in issue #11.
I'll close this one.
from pdfalto.
I only report PDF parser failures here :)
ccp_76_3_524.pdf
rev_117_1_210.pdf
from pdfalto.