Comments (9)
Did it happen with the official Google traineddata file or with custom training?
from tesseract.
It happened with custom training
from tesseract.
Try setting LC_NUMERIC to C during training.
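A minimal sketch of what that could look like in a POSIX shell. The training commands themselves are not shown in this thread, so they are left as a placeholder comment. Clearing LC_ALL is an extra precaution, since LC_ALL overrides LC_NUMERIC when it is set.

```shell
# LC_ALL, if set, takes precedence over LC_NUMERIC, so clear it first.
unset LC_ALL
export LC_NUMERIC=C

# Sanity check: with the C numeric locale, printf writes '.' as the
# decimal separator.
formatted=$(printf '%.2f' 1.5)
echo "$formatted"

# ...then run the usual training commands from this same shell session.
```

Exporting the variable, rather than prefixing a single command with it, keeps the setting in effect for every training tool started from that shell.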
from tesseract.
Hello,
I found that Tesseract had a patch for this problem (https://code.google.com/p/tesseract-ocr/issues/detail?id=910).
Why is it not in the new version, Tesseract 3.04?
Will it be in the next version?
Thanks
from tesseract.
By the way, the custom training I use is not mine, so I cannot run it again with LC_NUMERIC=C.
from tesseract.
Why do you think this patch is not in the current version? Issue 910, which you are referring to, was a problem with the official Google traineddata file. That was fixed.
AFAIR the problem is in the custom training.
from tesseract.
OK, my bad.
But I just tried with the official eng.traineddata file from Google and I got the same error:
"Error: Illegal min or max specification!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75"
from tesseract.
I'm having a hard time seeing how this could go wrong because of the locale with the current code. The actual error is signaled here: https://github.com/tesseract-ocr/tesseract/blob/master/classify/clusttool.cpp#L89, which happens when the code is unhappy with the results that tfscanf returns for the feature parameters. tfscanf is a private, locale-independent version of fscanf. It in turn calls the private tvfscanf, which implements its own float parsing with a hard-coded decimal separator of '.'.
One thing that definitely could cause it though is a bad/corrupted feature parameter file.
I just tested with the stock tesseract 3.03 on a brand new Debian 8 installation with the locale set to fr_FR.UTF-8 and everything worked perfectly.
If you still can't get this to work, please post the output of the following commands:
uname -a
tesseract -v
locale
from tesseract.
@oelleo: unfortunately Tesseract (at the moment) requires that training data use a dot as the decimal separator, so you need to correct your custom training data.
I think this could be possible without retraining. Try to unpack your data (combine_tessdata -u eng.traineddata tmp/eng.) and fix the decimal separator in eng.normproto (replace eng with the name of your custom training).
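The unpack-and-fix step might be scripted roughly as follows. This is only a sketch under assumptions: the helper name fix_decimal_commas is hypothetical, the sed pattern assumes that the fields in eng.normproto are separated by whitespace (so a comma between two digits can only be a decimal separator), and the final repacking call is an addition that this thread does not mention. Work on a copy of the traineddata file.

```shell
# Hypothetical helper: rewrite decimal commas between digits ("0,25")
# as dots ("0.25") in the given file. Uses a temporary file instead of
# sed -i, whose syntax differs between GNU and BSD sed.
fix_decimal_commas() {
    file=$1
    sed -E 's/([0-9]),([0-9])/\1.\2/g' "$file" > "$file.fixed" &&
        mv "$file.fixed" "$file"
}

lang=eng   # replace with the name of the custom training

# Only attempt the real work when the tool and the data file are present.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$lang.traineddata" ]; then
    mkdir -p tmp
    # Unpack all components to tmp/eng.*
    combine_tessdata -u "$lang.traineddata" "tmp/$lang."
    # Fix the separator in the normproto component
    fix_decimal_commas "tmp/$lang.normproto"
    # Assumption: recombine the components into tmp/eng.traineddata
    combine_tessdata "tmp/$lang."
fi
```

If the unpacked eng.normproto contains no digit-comma-digit sequences, the helper leaves its content unchanged, which would suggest the problem lies elsewhere.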
from tesseract.