Comments (2)
I believe it's an out-of-memory issue, but the app should warn about it if possible.
from ocrmypdf.
Most pages of this document have a small, very high-resolution image in the bottom-left corner. OCRmyPDF has to render the whole page at the same resolution for OCR, so it uses a heuristic to decide how to deal with small features. The heuristic amounts to a weighted average of image resolutions, weighted by the area each image occupies.
There was an error in the calculation that tries to find an appropriate weighted DPI for OCR, so it put too much weight on that feature, leading to rendering everything at high resolution and running out of memory.
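To illustrate why the weighting matters, here is a minimal sketch of an area-weighted DPI calculation. It is not OCRmyPDF's actual code: the `PageImage` record, the function names, the 300 DPI fallback, and the example page are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PageImage:
    """Hypothetical record: an image's resolution and the page area it covers."""
    dpi: float
    area_sq_in: float  # area occupied on the page, in square inches


def area_weighted_dpi(images: Iterable[PageImage], default_dpi: float = 300.0) -> float:
    """Average the image resolutions, weighting each by the area it occupies."""
    images = list(images)
    total_area = sum(img.area_sq_in for img in images)
    if total_area <= 0:
        # No images (or zero area): fall back to an assumed default.
        return default_dpi
    return sum(img.dpi * img.area_sq_in for img in images) / total_area


def estimated_raster_bytes(width_in: float, height_in: float, dpi: float,
                           bytes_per_pixel: int = 3) -> int:
    """Rough size of an uncompressed RGB rendering of the whole page."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel


# Hypothetical letter-size page: a 300 DPI full-page scan plus a
# 1 x 1 inch corner image at 2400 DPI.
page = [PageImage(dpi=300, area_sq_in=8.5 * 11), PageImage(dpi=2400, area_sq_in=1.0)]

weighted = area_weighted_dpi(page)
print(f"area-weighted DPI: {weighted:.0f}")
print(f"raster at weighted DPI: {estimated_raster_bytes(8.5, 11, weighted) / 1e6:.0f} MB")
print(f"raster at 2400 DPI:     {estimated_raster_bytes(8.5, 11, 2400) / 1e6:.0f} MB")
```

With correct weights the small image barely moves the result away from 300 DPI. If the small image is over-weighted so that the chosen DPI approaches 2400, the uncompressed page raster in this example grows from tens of megabytes to well over a gigabyte, which is consistent with the out-of-memory symptom described above.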