tesseract-ocr's People

Contributors

jimregan, theraysmith, zdenop

tesseract-ocr's Issues

Essential Installation after successful make

Please verify the essential installation process after successful make.

Currently,  "make install" moves
ccmain/tesseract -> /usr/local/bin
training/cntraining -> /usr/local/bin
training/mftraining -> /usr/local/bin

And I moved tessdata to /usr/local/bin with
>sudo find tessdata/ -depth -print | cpio -pamvd /usr/local/bin/

There are no other Unix executable files in the build directories.  So as
long as /usr/local/bin is in your path, are you good to go?

Installing Version 1.03 on Mac OS X 10.4.8 (Darwin 8.8.0)

Original issue reported on code.google.com by [email protected] on 16 Mar 2007 at 10:14

After training tesseract it dies when trying to create text from an image

What steps will reproduce the problem?
1. Follow the procedure on
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract to try to
train tesseract to recognize slovene chars
2. move the 8 files into the tessdata folder
3. tesseract <tif_image> output -l slo

What is the expected output? What do you see instead?
No output, just an output.txt file, but instead I get 'assertion
"ids.contains(unichar_repr, length)" failed: file "unicharset.cpp", line
67' 'Abort (core dumped)'.

What version of the product are you using? On what operating system?
2.00, on DragonFly BSD 1.9.0-DEVELOPMENT.

Please provide any additional information below.
I took a book from http://www.omnibus.se/beseda/, which has several free
eBooks, and converted a PDF into TIFF images with ImageMagick (convert
056-1-1.pdf -colorspace gray -depth 8 056-%d.tif). I took a few of these
TIFFs (the ones I used are attached) and followed the training procedure
(the resulting 8 files are also attached). I then moved the files into my
tessdata folder and tried to test tesseract with them on an image, and it
just dies (the error message is written under "What is the expected
output? What do you see instead?").

Original issue reported on code.google.com by [email protected] on 22 Jul 2007 at 1:26

Attachments:

[ 1586032 ] core dump during test

Thomas Klausner - thomasklausner(sf)

With the patches from 1586031 applied, I get a core
dump when I try to run tesseract on phototest.tif on
NetBSD-4.99.3/amd64.

The backtrace is:
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004c1c70 in reverse32 ()
(gdb) bt
#0 0x00000000004c1c70 in reverse32 ()
#1 0x00000000004aed12 in read_squished_dawg ()
#2 0x00000000004aaded in init_permute ()
#3 0x0000000000485779 in program_editup ()
#4 0x0000000000485869 in start_recog ()
#5 0x0000000000403d04 in init_tesseract ()
#6 0x000000000040309b in main ()
(gdb)

I don't know yet what causes this problem.

Comments

Date: 2006-10-31 13:34
Sender: ac_account
Logged In: YES 
user_id=1016107

This particular problem is due to ccutil/host.h having
inconsistent definitions for 32bit ints:

typedef long INT32;
typedef unsigned int UINT32;

This is plainly wrong, as int and long have no reason to
have the same size on a given platform. As a temporary fix,
changing long to int would make this particular crash go
away (and be replaced by compile errors and other crashes).
A better solution IMHO would be, since configure checks for
the C99 header stdint.h anyway, to do something like:

#ifdef HAVE_STDINT_H /*defined or not in the generated
config_auto.h */
#include <stdint.h>

typedef int8_t INT8;
typedef uint8_t UINT8;
typedef int16_t INT16;
typedef uint16_t UINT16;
typedef int32_t INT32;
typedef uint32_t UINT32;

/* same for pointers and const pointers */
#else
/* original defines, for old compilers that do not have
stdint.h */
#endif

and let the platform-specific C library figure out how to
define the sizes.

Note that making INT32 a 32-bit integer will break some
pointer-to INT32 conversions (gcc flags that as an error) -
typically in lengths passed through pointers. Making the
length a long or an explicit cast pointer->long->INT32 fixes
that problem.
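
A minimal sketch of the idea above, assuming a C++11 compiler and a platform that provides stdint.h. The typedefs stand in for the ones in ccutil/host.h, and swap_bytes32 is a hypothetical helper written for illustration only (it is not Tesseract's reverse32). The point is that with exact-width types the size assumption is checked at compile time instead of failing at run time:

```cpp
#include <cassert>
#include <climits>
#include <stdint.h>

// Illustrative stand-ins for the typedefs in ccutil/host.h (assumption:
// the platform provides the exact-width types from stdint.h).
typedef int32_t INT32;
typedef uint32_t UINT32;

// Fail the build, rather than crash later, if the widths are ever wrong.
static_assert(sizeof(INT32) * CHAR_BIT == 32, "INT32 must be 32 bits wide");
static_assert(sizeof(UINT32) * CHAR_BIT == 32, "UINT32 must be 32 bits wide");

// Hypothetical byte-order swap of a 32-bit value. Code like this only
// behaves as intended when UINT32 really is 32 bits wide.
inline UINT32 swap_bytes32(UINT32 value) {
  return ((value & 0x000000FFu) << 24) | ((value & 0x0000FF00u) << 8) |
         ((value & 0x00FF0000u) >> 8) | ((value & 0xFF000000u) >> 24);
}
```

If long were used for INT32 on an LP64 platform, the first static_assert would stop the build.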

And please, please, maintainers, fix the bloody name
spellings - s/case_sensative/case_sensitive/p and so on. The
code is spaghetti enough without having to parse Engrish as
well.

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:26

public header host.h includes config_auto.h

Hi,

What steps will reproduce the problem?
1. Get tesseract SVN, build and install
2. Take a software using tesseract headers and autotools including autoheader.
3. If the software includes host.h (directly or indirectly), host.h tries
to include config_auto.h because HAVE_CONFIG_H is defined.

What is the expected output? What do you see instead?
host.h must not rely on config_auto.h

What version of the product are you using? On what operating system?
SVN (r72)

Please provide any additional information below.
Attached a patch adding a host-private.h header depending on config_auto.h.

Best regards.

Original issue reported on code.google.com by [email protected] on 30 Jun 2007 at 8:19

Attachments:

tarball version does not match directory version

What steps will reproduce the problem?
1. tar xzvf tesseract-%{version}.tar.gz
2. cd tesseract-%{version}

What is the expected output? 
(no error)

What do you see instead?
"No such directory"

What version of the product are you using? On what operating system?
N/A

Please provide any additional information below.
This inconsistency makes building an RPM slightly messy.

Original issue reported on code.google.com by [email protected] on 25 May 2007 at 8:41

[ 1553160 ] limits.h and cygwin

Michael Toews - mwtoews(sf)

Hi.
I've tried compiling: `configure' works perfectly,
however the `make' command fails under cygwin.
Here is an excerpt from `./configure':

checking limits.h presence... yes

However, during `make':

In file included from ../ccutil/host.h:73,
from bpsupport.cpp:32:
../ccutil/platform.h:7:26: linux/limits.h: No such file
or directory

and the rest of `make' fails. Clearly the problem is
the assumption of a Linux platform, even though
`configure' checked for limits.h. I've attached a few config files
generated from the successful `./configure'.

My system is:
$ uname -a
CYGWIN_NT-5.1 spacetime 1.5.21(0.156/4/2) 2006-07-30
14:21 i686 Cygwin

+mt

Comments

Date: 2006-09-29 08:32
Sender: nobody
Logged In: NO 

For Cygwin I solved it this way.

In the file ccutil/platform.h

replace

#include <linux/limits.h>

with

#include <limits.h>



Date: 2006-09-15 01:07
Sender: karimabadi
Logged In: YES 
user_id=1598026

in ccutil\platform.h

if you change line #7 
#include <linux/limits.h>  --> #include <limits.h>

the code will compile successfully on cygwin.

SK


Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:29

inconsistent linkage

What steps will reproduce the problem?
1. attempting to configure, make, make install in tesseract-ocr
2.
3.

What is the expected output? What do you see instead?
expect it to install. However, the make fails with:
if g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccstruct -I../ccutil -I../cutil
-I../classify -I../image -I../dict -I../viewer   -g -O2 -MT tface.o -MD -MP
-MF ".deps/tface.Tpo" -c -o tface.o tface.cpp; \
        then mv -f ".deps/tface.Tpo" ".deps/tface.Po"; else rm -f
".deps/tface.Tpo"; exit 1; fi
../cutil/globals.h:46: error: previous declaration of ‘int optind’ with
‘C++’ linkage
../ccutil/getopt.h:23: error: conflicts with new declaration with ‘C’ 
linkage
../cutil/globals.h:47: error: previous declaration of ‘char* optarg’ with
‘C++’ linkage
../ccutil/getopt.h:24: error: conflicts with new declaration with ‘C’ 
linkage
make[2]: *** [tface.o] Error 1
make[2]: Leaving directory `/home/ray/.src/tesseract-ocr/wordrec'
make[1]: *** [install-recursive] Error 1


What version of the product are you using? On what operating system?
This is tesseract-ocr from SVN yesterday, compiling on FC6
 gcc --version
gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 11 Apr 2007 at 9:34

add unit tests

Add code for unit testing the different components of the OCR engine.

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:45

The readme should say how to get the language files

What steps will reproduce the problem?
1. Install tesseract-ocr 2.0 from sources
2. run it
3. Notice that it doesn't run, find about it, etc.

Perhaps the README should say something about the
language files, since novice users of tesseract (like me)
might have trouble locating them.


What version of the product are you using? On what operating system?

  2.0 / GNU/Linux Debian

Original issue reported on code.google.com by [email protected] on 22 Jul 2007 at 12:30

Patch needs to be merged into Release 78 to fix ocropus 100% CPU bug

What steps will reproduce the problem?
1. Build & run tesseract on Redhat Enterprise Linux 4.0

What is the expected output? What do you see instead?

Freezes/100% CPU used. This is entirely resolved by applying a patch posted
by another user, see:
http://sourceforge.net/tracker/download.php?group_id=158586&atid=808426&file_id=
215692&aid=1658610

P.S. This is a cross-post from ocropus issues:
http://code.google.com/p/ocropus/issues/detail?id=20&can=2&q=

Original issue reported on code.google.com by [email protected] on 22 Apr 2007 at 11:56

[ 1589334 ] segfault for filenames without a dot

Roger Luethi - rluethi(sf)

In pgedit.cpp:797, strrchr() will return NULL if the
given filename is lacking a dot, but the code doesn't
check for it.

Therefore, feeding the program a filename without a dot
results in a segfault as the subsequent strcmp()
expects valid pointers.
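
A minimal sketch of the missing check, assuming the code only needs to compare the extension. has_extension is a hypothetical helper written for illustration, not the actual code in pgedit.cpp:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not the code in pgedit.cpp): returns true when
// `filename` ends with the extension `ext`, for example ".tif".
inline bool has_extension(const char* filename, const char* ext) {
  const char* dot = std::strrchr(filename, '.');
  if (dot == nullptr) {
    // No dot in the name: there is no extension to compare, so return
    // early instead of passing a null pointer to strcmp().
    return false;
  }
  return std::strcmp(dot, ext) == 0;
}
```

With this guard, a name such as "phototest" is reported as having no matching extension rather than crashing.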

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:25

tesseract-1.04b.tar.gz make error on fedora 5

What steps will reproduce the problem?
1. ./configure reported no problems 
2. make dies with error below
3.

What is the expected output? What do you see instead?

if g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccstruct -I../ccutil -I../cutil
-I../classify -I../image -I../dict -I../viewer   -g -O2 -MT tface.o -MD -MP
-MF ".deps/tface.Tpo" -c -o tface.o tface.cpp; \
then mv -f ".deps/tface.Tpo" ".deps/tface.Po"; else rm -f
".deps/tface.Tpo"; exit 1; fi
../cutil/globals.h:49: error: previous declaration of ‘char* optarg’ with
‘C++’ linkage
/usr/include/getopt.h:59: error: conflicts with new declaration with ‘C’
linkage
../cutil/globals.h:48: error: previous declaration of ‘int optind’ with
‘C++’ linkage
/usr/include/getopt.h:73: error: conflicts with new declaration with ‘C’
linkage
make[3]: *** [tface.o] Error 1
make[3]: Leaving directory `/home/neumann/Desktop/tesseract-1.04/wordrec'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/neumann/Desktop/tesseract-1.04/wordrec'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/neumann/Desktop/tesseract-1.04'
make: *** [all] Error 2

What version of the product are you using? On what operating system?
tesseract-1.04b.tar.gz on linux (Fedora 5)

Please provide any additional information below.

I doubt it is relevant: "configure" gave one warning: "`missing' script is
too old or missing"

Original issue reported on code.google.com by [email protected] on 20 May 2007 at 5:14

Could not open file, /usr/local/bin/tessdata/freq-dawg

What steps will reproduce the problem?
1. Download Tesseract
2. Install Tesseract
3. Issue "tesseract {$img} {$output}"

What is the expected output? What do you see instead?
execution but returns "Could not open file,
/usr/local/bin/tessdata/freq-dawg" instead

What version of the product are you using? On what operating system?
tried on 1.03 and 1.02

Please provide any additional information below.
On Ubuntu 6.10

Thanks!

Original issue reported on code.google.com by [email protected] on 1 Apr 2007 at 8:13

Unable to load unicharset file

What version of the product are you using? On what operating system?

Windows 2.00

Please provide any additional information below.

Run: tesseract.exe tithe.tif tithe.hope -l enm
log file reads: Unable to load unicharset file C:/tesseract-
2.00/tessdata/enm.unicharset

I looked at the file eng.unicharset with a hex editor. The file does not
contain the MS-DOS carriage return character 0d, only 0a (as in Unix?). Nor
does it contain the 3-byte sequence that I think marks the UTF-8 file format.

So I used a hex editor to make my enm.unicharset look like eng.unicharset.
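
The same check and clean-up can be sketched in code instead of a hex editor. This assumes the only differences are carriage returns (0d) and a leading UTF-8 byte order mark (EF BB BF). The report does not confirm that these bytes caused the load failure, and all three function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the data starts with the UTF-8 byte order
// mark EF BB BF.
inline bool has_utf8_bom(const std::string& data) {
  return data.size() >= 3 &&
         static_cast<unsigned char>(data[0]) == 0xEF &&
         static_cast<unsigned char>(data[1]) == 0xBB &&
         static_cast<unsigned char>(data[2]) == 0xBF;
}

// Hypothetical helper: true if the data contains any carriage return (0d).
inline bool has_carriage_return(const std::string& data) {
  return data.find('\r') != std::string::npos;
}

// Returns a copy with a leading BOM and every carriage return removed,
// leaving Unix-style 0a line endings.
inline std::string normalize_for_unix(const std::string& data) {
  std::string out;
  const std::size_t start = has_utf8_bom(data) ? 3 : 0;
  for (std::size_t i = start; i < data.size(); ++i) {
    if (data[i] != '\r') {
      out.push_back(data[i]);
    }
  }
  return out;
}
```

The file contents would be read into a std::string in binary mode first, so that the runtime does not translate line endings on Windows.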

Now get log file message:
Error: 32 classes in inttemp while unicharset contains 35 unichars.

Please convert 0001.jpg to tif for full image



Original issue reported on code.google.com by [email protected] on 2 Aug 2007 at 4:58

Attachments:

SVN Revision XXX broken

A compile of the current SVN version results in
../config/depcomp: ../config/depcomp: No such file or directory
errors. Seemingly somebody forgot to check in ./config/depcomp.

I see this on MacOS 10.4 and FreeBSD 6.2. Issue 33 indicates it also happens on 
Linux

Original issue reported on code.google.com by [email protected] on 8 Jun 2007 at 9:53

Tesseract.exe 2.00 Vista 64bit does not run

What steps will reproduce the problem?
1. Execute tesseract.exe 2.00
2.
3.

What is the expected output? What do you see instead?
Expect the program to run.  Instead the following error occurs: "The 
application has failed to start because its side-by-side configuration is 
incorrect.  Please see the application eventlog for more detail."

What version of the product are you using? On what operating system?
2.00, 64bit Windows Vista Ultimate

Please provide any additional information below.
Event log contents:

Activation context generation failed for "C:\tesseract-
2.00.exe\tesseract.exe". Dependent Assembly 
Microsoft.VC80.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e1
8e3b",type="win32",version="8.0.50727.762" could not be found. Please use 
sxstrace.exe for detailed diagnosis.

Original issue reported on code.google.com by [email protected] on 18 Jul 2007 at 10:25

GCC 4.2: C/C++ linkage declarations conflict

The Debian bug report against the older 1.02 package,
 http://bugs.debian.org/409673 
describes how to reproduce the GCC 4.2 C/C++ linkage declaration conflict
with this source.  The bug reporter (an ex-Debian leader) attached a patch
to the bug report.

I checked the latest 2.0 source, and there is no extern "C" statement in the
corresponding lines.

I hope this information will help upstream.
(I am just forwarding information. I did not test with GCC4.2 yet.)

Regards,

Osamu

Original issue reported on code.google.com by [email protected] on 22 Jul 2007 at 2:34

[ 1586031 ] tesseract-1.02 vs NetBSD/amd64: problems and fixes

Thomas Klausner - thomasklausner(sf)

I compiled tesseract-1.02 on NetBSD-4.99.3/amd64, and
had some trouble because
gcc (GCC) 4.1.2 20061021 prerelease (NetBSD nb1 20061021)
complained about two kinds of problem:
Mixing "C" linkage with "C++" linkage and casting
pointers to ints, which "loses precision". Pointers are
64bit wide, ints only 32bit.
Additionally, ccutil/platform.h tries to include a
Linux-specific header file, linux/limits.h. Using
limits.h instead seems to work fine.
Since I started with tesseract-1.0, I also have a patch
to fix the path to the xterm executable -- on NetBSD it
is in /usr/X11R6/bin, not in /usr/bin/X11.
Please apply these or similar patches (I can test if
you want).

Comments

Date: 2006-12-25 08:39
Sender: thomasklausner
Logged In: YES 
user_id=205695
Originator: YES

Thanks for the pointer to the bokeoa-64bit-branch of tesseract.
I tested the version from Dec 2.

First, regarding linux/limits.h -- on the Linux system
to which I have access, limits.h also exists, and AFAIK
limits.h is the POSIX header for the corresponding symbols.
So perhaps you could replace "linux/limits.h" with a plain
"limits.h" in ccutil/platform.h.

As for compilation of tesseract itself, here's the output
after the change described above:

../cutil/globals.h:47: error: previous declaration of 'char* optarg' with
'C++' linkage
/usr/include/unistd.h:154: error: conflicts with new declaration with 'C'
linkage
../cutil/globals.h:46: error: previous declaration of 'int optind' with
'C++' linkage
/usr/include/unistd.h:156: error: conflicts with new declaration with 'C'
linkage

This is fixed by adding
        extern "C" {
        }
around the optind and optarg lines in cutil/globals.h.
The same is needed for displayargs in cutil/tord.h.
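
Spelled out, the wrapper might look like the sketch below. It assumes a POSIX system where unistd.h declares optind and optarg with C linkage. The __cplusplus guard is an addition that keeps the header usable from plain C as well:

```cpp
#include <cassert>
#include <unistd.h>  // POSIX: declares optind and optarg with C linkage

// Redeclaring the getopt globals is only accepted by the compiler when the
// linkage matches the system header, hence the extern "C" block. The guard
// lets the same header be included from C sources.
#ifdef __cplusplus
extern "C" {
#endif
extern int optind;   /* option index */
extern char *optarg; /* option argument */
#ifdef __cplusplus
}
#endif
```

Dropping the two declarations and relying on the system header alone would be another option, as long as every supported platform declares them there.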

With the attached diffs, tesseract compiles (with a lot of compilation
warnings, details on request), and works with phototest.tif on
NetBSD-4.99.4/amd64! Cool!
File Added: tesseract.diff


Date: 2006-12-01 15:30
Sender: bokeoa
Logged In: YES 
user_id=1340826
Originator: NO

Thomas,

Try the bokeoa-64bit-branch from CVS and see if it fixes your amd64
problems.  I looked at your patch and it looks specific to NetBSD,
if you can redo it so that it would work on both NetBSD and Linux,
I'll do what I can to get it included.

Bryan

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:27

Result Not The Same

What steps will reproduce the problem?
1. Run Tesseract.exe with tiff image
2. Run dlltest.exe with same tiff image above
3. Result from 1 & 2 is different

I tested the above procedure on my machine running Windows XP SP2.
I expected to get the same result from both programs, but the output and
its accuracy differ. It seems that Tesseract.exe is much more stable
than dlltest.

Original issue reported on code.google.com by [email protected] on 12 Jul 2007 at 7:54

[ 1669644 ] Crash in letter_is_okay() with trigger

Filip Gieszczykiewicz - filipg(sf)

recognizing attached tif with v1.03 crashes as follows:
pppppspppppppppspppppppppsppppppppppppppppppppppppppp
Program received signal SIGSEGV, Segmentation fault.
0x080fb8a8 in letter_is_okay (dawg=0xb7f09008, node=0xbf815a04,
char_index=7, prevchar=0 '\0',
word=0xbf815bff "proto-ft", word_end=0) at dawg.cpp:49
49 if (edge_occupied (dawg, edge)) {
(gdb) bt
#0 0x080fb8a8 in letter_is_okay (dawg=0xb7f09008, node=0xbf815a04,
char_index=7, prevchar=0 '\0',
word=0xbf815bff "proto-ft", word_end=0) at dawg.cpp:49
#1 0x080f3b26 in append_next_choice (dawg=0xb7f09008, node=108107,
permuter=5 '\005',
word=0xbf815bff "proto-ft", choices=0x82e7ad0, char_index=7,
this_choice=0x8260df0,
prevchar=0 '\0', limit=0xbf815c28, rating=0, certainty=-1.15637732,
rating_array=0xbf815ab4,
certainty_array=0xbf815b58, word_ending=0, last_word=0,
result=0xbf815a58) at permdawg.cpp:202
#2 0x080f3f03 in dawg_permute (dawg=0xb7f09008, node=108107, permuter=5
'\005',
choices=0x82e7ad0, char_index=7, limit=0xbf815c28, word=0xbf815bff
"proto-ft", rating=0,
certainty=0, rating_array=0xbf815ab4, certainty_array=0xbf815b58,
last_word=0)
at permdawg.cpp:273
#3 0x080f40b3 in dawg_permute_and_select (string=0x814f9fc "system
words:", dawg=0xb7f09008,
permuter=5 '\005', character_choices=0x82e7ad0, best_choice=0x8260d40,
system_words=1)
at permdawg.cpp:334
#4 0x080f5640 in permute_words (char_choices=0x82e7ad0, rating_limit=1000)
at permute.cpp:1611
#5 0x080f6549 in permute_all (char_choices=0x82e7ad0, rating_limit=1000,
raw_choice=0xbf815dc8)
at permute.cpp:1092
#6 0x080f6952 in permute_characters (char_choices=0x82e7ad0, limit=1000,
best_choice=0xbf815dd8,
raw_choice=0xbf815dc8) at permute.cpp:1146
#7 0x080d1ef6 in chop_word_main (word=0x826f830, fx=1,
best_choice=0xbf815dd8,
raw_choice=0xbf815dc8, tester=0 '\0', trainer=0 '\0') at
chopper.cpp:476
#8 0x080cf426 in cc_recog (tessword=0x826f830, best_choice=0xbf815dd8,
best_raw_choice=0xbf815dc8, tester=0 '\0', trainer=0 '\0') at
tface.cpp:247
#9 0x08069a94 in recog_word_recursive (word=0x826e9f0, denorm=0x826be54,
matcher=0x80684a0 <tess_default_matcher(PBLOB*, PBLOB*, PBLOB*, WERD*,
DENORM*, BLOB_CHOICE_LIST&)>, tester=0, trainer=<value optimized out>,
testing=0 '\0', raw_choice=@0x826be7c,
blob_choices=0xbf8162b8, outword=@0x826be50) at tfacepp.cpp:191
#10 0x0806a380 in recog_word (word=0x826e9f0, denorm=0x826be54,
matcher=0x80684a0 <tess_default_matcher(PBLOB*, PBLOB*, PBLOB*, WERD*,
DENORM*, BLOB_CHOICE_LIST&)>, tester=0, trainer=0, testing=0 '\0',
raw_choice=@0x826be7c, blob_choices=0xbf8162b8,
outword=@0x826be50) at tfacepp.cpp:90

I don't think it's related to issue 1546972

It is dependent on the specific image - recreating a new TIF with pbmtext
of the contained text does not crash. Also, scaling image -2.0 or +2.0 does
not crash - just this one does.

Argh, image too big for sf.net - see
http://tesseract-ocr.repairfaq.org/downloads/b37by2.tif

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:19

compile fails on Fedora 7: getopt linkage

What steps will reproduce the problem?

1. unpack & run ./configure.

2. run "make"

What is the expected output?

A successful compile


What do you see instead?

partial results:

if g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccstruct -I../ccutil -I../cutil
-I../classify -I../image -I../dict -I../viewer   -g -O2 -MT tface.o -MD -MP
-MF ".deps/tface.Tpo" -c -o tface.o tface.cpp; \
        then mv -f ".deps/tface.Tpo" ".deps/tface.Po"; else rm -f
".deps/tface.Tpo"; exit 1; fi
../cutil/globals.h:49: error: previous declaration of ‘char* optarg’ with
‘C++’ linkage
/usr/include/getopt.h:59: error: conflicts with new declaration with ‘C’
linkage
../cutil/globals.h:48: error: previous declaration of ‘int optind’ with
‘C++’ linkage
/usr/include/getopt.h:73: error: conflicts with new declaration with ‘C’
linkage



What version of the product are you using? On what operating system?

[ccurley@charlesc tesseract-1.04]$ ls ../tesseract-1.0* -d
../tesseract-1.03  ../tesseract-1.03.tar.gz  ../tesseract-1.04 
../tesseract-1.04b.tar.gz
[ccurley@charlesc tesseract-1.04]$ uname -a
Linux charlesc.localdomain 2.6.21-1.3228.fc7 #1 SMP Tue Jun 12 15:37:31 EDT
2007 i686 athlon i386 GNU/Linux



Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 26 Jun 2007 at 8:39

Add a .spec to the source tarball distribution

A .spec is the source code for an RPM, which is the package system used by
Fedora, Suse, Mandriva, etc.

I propose a .spec be included in the tesseract source tarball so that it's
easy to build RPMs.

I have built a very rough .spec that works on Fedora Core 6.  If you decide
to include it, I could give it some polish.

BTW, solving issues 31 and 29 would make the .spec simpler and more portable.

Original issue reported on code.google.com by [email protected] on 25 May 2007 at 9:20

Attachments:

[ 1557358 ] tesseracts hangs on Fedora Core 3

Christoph M. Friedrich - christophmf(sf)

I successfully compiled tesseract, but when I run it on
the given example file, it hangs. The same code
compiled under Cygwin runs successfully.

I've done an strace and it hangs on:

read(6, "O", 1) = 1
read(6, "u", 1) = 1
read(6, "t", 1) = 1
read(6, "p", 1) = 1
read(6, "u", 1) = 1
read(6, "t", 1) = 1
read(6, "_", 1) = 1
read(6, "L", 1) = 1
read(6, "a", 1) = 1
read(6, "y", 1) = 1
read(6, "e", 1) = 1
read(6, "r", 1) = 1
read(6, "1", 1) = 1
read(6, "\0", 1) = 1
read(6, "\0\0P\310", 4) = 4
read(6,
"?\7\307\270\276\342>U>\0301d\277\244p/\277M\341\370\276"...,
82720) = 82720
close(6) = 0
write(2, "Tesseract Open Source OCR Engine"...,
33Tesseract Open Source OCR Engine
) = 33
open("phototest.tif", O_RDONLY|O_LARGEFILE) = 6
read(6, "II*\0\10\0\0\0", 8) = 8
fstat64(6, {st_mode=S_IFREG|0644, st_size=38668, ...}) = 0
mmap2(NULL, 38668, PROT_READ, MAP_SHARED, 6, 0) =
0xf6d57000
fstat64(6, {st_mode=S_IFREG|0644, st_size=38668, ...}) = 0
brk(0xa51c000) = 0xa51c000
munmap(0xf6d57000, 38668) = 0
close(6) = 0
open("phototest.bl", O_RDONLY) = -1 ENOENT (No
such file or directory)
open("phototest.vec", O_RDONLY) = -1 ENOENT (No
such file or directory)
open("phototest.uzn", O_RDONLY) = -1 ENOENT (No
such file or directory)
open("phototest.pd", O_RDONLY) = -1 ENOENT (No
such file or directory)
times({tms_utime=3, tms_stime=1, tms_cutime=0,
tms_cstime=0}) = 528269873
brk(0xa53d000) = 0xa53d000
times({tms_utime=4, tms_stime=1, tms_cutime=0,
tms_cstime=0}) = 528269876

Comments

Date: 2006-10-19 04:58
Sender: christophmf
Logged In: YES 
user_id=215626

For me, on FC3, the patch that was sent:

Index: textord/makerow.cpp
===================================================================
RCS file:
/cvsroot/tesseract-ocr/tesseract/textord/makerow.cpp,v
retrieving revision 1.1
diff -r1.1 makerow.cpp
631a632
>   INT32 temp;
663c664,665
<       deltas[(INT32) floor (bottom) - min_y] += width;

---
>       temp = (INT32) floor (bottom) - min_y;
>       deltas[temp] += width;
669c671,672
<       deltas[(INT32) floor (top) - min_y] -= width;

---
>       temp = (INT32) floor (top) - min_y;
>       deltas[temp] -= width;


works. So the change to non-optimized compiling is not
necessary. 

Thanks. 


Date: 2006-10-15 20:27
Sender: nobody
Logged In: NO 

If it's the same problem I had (also had hangs on FC3), try
this.  And no, I have no idea why this fixes the problem,
but either this code change or compiling with -O instead of
-O3 fixed things:

Index: textord/makerow.cpp
===================================================================
RCS file: /cvsroot/tesseract-ocr/tesseract/textord/makerow.cpp,v
retrieving revision 1.1
diff -r1.1 makerow.cpp
631a632
>   INT32 temp;
663c664,665
<       deltas[(INT32) floor (bottom) - min_y] += width;

---
>       temp = (INT32) floor (bottom) - min_y;
>       deltas[temp] += width;
669c671,672
<       deltas[(INT32) floor (top) - min_y] -= width;

---
>       temp = (INT32) floor (top) - min_y;
>       deltas[temp] -= width;




Date: 2006-09-29 09:36
Sender: rsholmes
Logged In: YES 
user_id=1609632

Tesseract also hangs for me.  I'm running Scientific Linux
4.3, a clone of RHEL 4.  I did:

  ./configure
  make
  ln -s ccmain/tesseract ./
  tesseract phototest.tif test batch

It writes

  Tesseract Open Source OCR Engine

to the screen and hangs.

I can supply the config log if needed.


Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:28

Incomplete OCR result

What steps will reproduce the problem?
1. ./tesseract 1.tiff output
2.
3.

What is the expected output? What do you see instead?
Expected output: 11 Slate/Payer`s state no. MA 08-61738
Seen output: 11 State/Payc-1r`s stale no. MA

What version of the product are you using? On what operating system?
tesseract revision: 2.00
OS: fedora core 6

Please provide any additional information below.
The output does not show any OCR result for the numbers after MA in the
image (1.tiff). It is just blank (not even an incorrect result).
If I cut just the numbers after the MA text into a separate image
(2.tiff), I get the correct result for the numbers.

I get the same results with version 1.03 as well.

I tried to debug and find the reason. Here are my observations, but I could
not fix it.

TessBaseAPI::TesseractRect()-->RecognizeToString()
                                   |-->FindLines(&block_list)
                                   |
                                   |---->Recognize(&block_list, NULL)

When I printed the block list after FindLines(&block_list),
the WERDs that had no OCR results had all of their blobs rejected.
Here is the dump of one such WERD.
Blanks= 1
Bounding box=(471,24)->(571,55)
Flags = 0 = 00
   W_SEGMENTED = FALSE 
   W_ITALIC = FALSE 
   W_BOL = FALSE 
   W_EOL = FALSE 
   W_NORMALIZED = FALSE 
   W_POLYGON = FALSE 
   W_LINEARC = FALSE 
   W_DONT_CHOP = FALSE 
   W_REP_CHAR = FALSE 
   W_FUZZY_SP = FALSE 
   W_FUZZY_NON = FALSE 
Correct= 
Rejected cblob count = 5

This WERD had 5 blobs and all of them were rejected (I do not know why).

After this, when Recognize(&block_list, NULL) is called to recognize
the WERDs, it does not consider the WERDs whose blobs were all rejected,
and gives a blank string for them.


Original issue reported on code.google.com by [email protected] on 19 Jul 2007 at 10:53

Attachments:

phototest.tif not recognized correctly

Version 1.03 builds on MacOSX 10.4.x (PPC), but calling

./tesseract phototest.tif out

I see just garbage in out.txt.  Not a single character is properly recognized.

Original issue reported on code.google.com by [email protected] on 10 Mar 2007 at 8:04

HOWTO: compilation fails due to mismatched linkage

someone else posted this fix somewhere, which i found at 4am. can't find it
again now, in the daylight, so here it is again:

PLATFORMS: Various, including opensuse 10.2

PROBLEM: after successful ./configure, make fails due to C/C++ conflicts
such as:

........ previous declaration of ‘.....’ with ‘C++’ linkage
........ conflicts with new declaration with ‘C’ linkage

VERSIONS: various, including tesseract-1.04b.tar.gz

SOLUTION: simple edits to two header files

11111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111

# diff -C 3 ./cutil/globals.h~ ./cutil/globals.h
*** ./cutil/globals.h~  2007-05-15 20:13:26.000000000 -0500
--- ./cutil/globals.h   2007-06-16 04:27:42.000000000 -0500
***************
*** 45,53 ****
  extern int debugs[MAXPROC];      /*debug flags */
  extern int plots[MAXPROC];       /*plot flags */
  extern int corners[4];           /*corners of scan window */
  extern int optind;               /*option index */
  extern char *optarg;             /*option argument */
!                                  /*image file name */
  extern char imagefile[FILENAMESIZE];
                                   /* main directory */
  extern char directory[FILENAMESIZE];
--- 45,58 ----
  extern int debugs[MAXPROC];      /*debug flags */
  extern int plots[MAXPROC];       /*plot flags */
  extern int corners[4];           /*corners of scan window */
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
  extern int optind;               /*option index */
  extern char *optarg;             /*option argument */
! #ifdef __cplusplus
! }
! #endif                               /*image file name */
  extern char imagefile[FILENAMESIZE];
                                   /* main directory */
  extern char directory[FILENAMESIZE];

2222222222222222222222222222222222222222222222222222222222222222
2222222222222222222222222222222222222222222222222222222222222222

# diff -C 3 ./cutil/tordvars.h~ ./cutil/tordvars.h
*** ./cutil/tordvars.h~ 2007-05-16 16:33:53.000000000 -0500
--- ./cutil/tordvars.h  2007-06-16 04:25:43.000000000 -0500
***************
*** 39,44 ****
--- 39,46 ----
  extern FILE *correct_fp;                    //correct text
  extern FILE *matcher_fp;

+ extern "C"
+ {
  extern int blob_skip;                       /* Skip to next selection */
  extern int num_word_choices;                /* How many words to keep */
  extern int similarity_enable;               /* Switch for Similarity */
***************
*** 50,55 ****
--- 52,58 ----
  extern int show_bold;                       /* Use bold text */
  extern int display_text;                    /* Show word text */
  extern int display_blocks;                  /* Show word as boxes */
+ }

  extern float overlap_threshold;             /* Overlap Threshold */
  extern float certainty_threshold;           /* When to quit looking */





Original issue reported on code.google.com by [email protected] on 16 Jun 2007 at 9:57

Straw poll - VC++6 vs VC++ Express

I would like to poll Windows users of Tesseract out there, but only those
that build from the source code.

Currently, we have just VC++6 dsp and dsw files for Tesseract. With 2.0 we
might have new format files for VC++ express. These will not be backwards
compatible with VC++6. So the question is if VC++6 is no longer supported,
will anybody squeal?

Can anybody who has any comment on this choice please add a comment to this
thread. Thanks, Ray.

Original issue reported on code.google.com by [email protected] on 13 Jul 2007 at 11:28

[ 1577219 ] error cygwin

./configure && make
./configure: line 1329: tesseract: command not found
checking build system type... i686-pc-cygwin
checking host system type... i686-pc-cygwin
checking for cl.exe... no
checking for g++... no
checking for C++ compiler default output... configure: error: C++ compiler cannot create executables
See `config.log' for more details.

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:28

[ 1552781 ] 64-bit building fails

Graeme Humphries - unit3(sf)

Building on Ubuntu/dapper AMD64 fails with the
following error:

make[3]: Entering directory
`/home/graemehu/Downloads/installers/packages/tesseract-1.0/aspirin'
source='bpsupport.cpp' object='bpsupport.o' libtool=no \
depfile='.deps/bpsupport.Po'
tmpdepfile='.deps/bpsupport.TPo' \
depmode=gcc3 /bin/sh ../config/depcomp \
g++ -DHAVE_CONFIG_H -I. -I. -I.. -I../ccutil
-I../cutil -DNDEBUG -O3 -Wall -c -o bpsupport.o `test
-f 'bpsupport.cpp' || echo './'`bpsupport.cpp
../ccutil/strngs.h: In member function ‘void
STRING::de_dump(FILE*)’:
../ccutil/strngs.h:171: error: cast from ‘char*’ to
‘int’ loses precision
make[3]: *** [bpsupport.o] Error 1

It looks like the code is pretty much not 64-bit safe;
patching that particular instance just leads to more
pointer-casting problems cropping up (64-bit pointers
cast to 32-bit ints).

I'd patch this, but I'm not sure what the
ramifications are of just changing the int casts to
(long long) casts. I suspect other parts of the code
will just treat them as int32s and break things anyway,
even if it compiles.

Comments

Date: 2006-12-01 16:31
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

Doh... 

Yes, you wrote "branch" and I read "patch"... Sorry for that.

I just compiled it and it went through the phototest.tif example with no
harm. I'll take a look at the way you solved this 'cos I'm interested in it
! :)

Thanks a lot.


Date: 2006-12-01 16:18
Sender: bokeoa
Logged In: YES 
user_id=1340826
Originator: NO

It's not really a patch as much as a branch in CVS:

http://tesseract-ocr.cvs.sourceforge.net/tesseract-ocr/tesseract/?pathrev=bokeoa-64bit-branch


Ray actually provided the solution:

http://sourceforge.net/forum/forum.php?thread_id=1609671&forum_id=534361


If you want to check out my branch, try these commands:

cvs -d:pserver:[email protected]:/cvsroot/tesseract-ocr login
[enter for password]
cvs -z3 -d:pserver:[email protected]:/cvsroot/tesseract-ocr co -P -r bokeoa-64bit-branch tesseract


Let me know how it goes.
Bryan


Date: 2006-12-01 16:10
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

I didn't find where your patch lies. :-/

Anyway, I did solve the bug. It comes from the fact that a pointer does
not have the same size on 32-bit and 64-bit architectures.

The segfault occurs in a strcpy call where the variable demodir points to
an out-of-bounds address. Tracking the value of demodir with a watchpoint
in gdb showed me that a write to acts_ocr (declared as an int in
cutil/globals.cpp), which occurs in cutil/variables.cpp:83, was performed
through the following line:

  *((void **) this_var->address) = default_value.ptr_part;

Of course, on 32-bit architectures a pointer and an int have the same
size... but not on 64-bit architectures. So the pointer-sized write
overwrites not only acts_ocr but also the contents of demodir.

The only (ugly) fix I found was to introduce a "void *padding" variable
to absorb the part of the write that goes out of bounds.

Second, the fix I proposed for the first bug is outrageous... :-/
The real problem comes from the fact that INT32 is typedef'ed as a long,
which is insane if you want to port code to 64-bit platforms. So the best
solution would be to replace every INT32 with int32_t and solve the
resulting compilation problems...
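
To make the size mismatch described above concrete, here is a minimal C++ sketch. The names Int32Fixed and pointer_store_overrun are hypothetical and do not appear in Tesseract; the sketch only assumes a compiler that provides <cstdint>.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical replacement for a 32-bit typedef: std::int32_t is exactly
// 32 bits wide wherever it exists, unlike long, whose width depends on
// the platform.
typedef std::int32_t Int32Fixed;
static_assert(sizeof(Int32Fixed) == 4, "Int32Fixed must be exactly 4 bytes");

// Number of bytes that a pointer-sized store, such as
//   *((void **) address) = value;
// would write past the end of an object of type T.
template <typename T>
constexpr std::size_t pointer_store_overrun() {
  return sizeof(void*) > sizeof(T) ? sizeof(void*) - sizeof(T) : 0;
}
```

On a typical LP64 system pointer_store_overrun<int>() evaluates to 4, while on a typical 32-bit system it evaluates to 0, which would explain why the overwrite stayed hidden on 32-bit builds.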


Date: 2006-12-01 15:32
Sender: bokeoa
Logged In: YES 
user_id=1340826
Originator: NO

Graeme,

Try the code from the bokeoa-64bit-branch in CVS and see if it
works for you.  I've tested it on both amd64 and ia64 myself.

Bryan


Date: 2006-11-27 01:05
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

I (finally) understood it! It's because of the difference in the size of
void* between 32-bit and 64-bit architectures. I'll try to have a patch
tonight (CET).
Gosh, this bug was quite disturbing. ô_ô

And I counted 388 occurrences of void * in the code. Might be long to
check them all. :-/


Date: 2006-11-24 16:54
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

Current segfault is:

(gdb) run
Starting program: /home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract
/home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract:Error:Usage:/home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract imagename outputbase [configfile [[+|-]varfile]...]

Signal_exit 25 ABORT. LocCode: 3  AbortCode: 0

Program exited with code 031.
(gdb) set args ../phototest.tif test.txt
(gdb) run
Starting program:
/home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract
../phototest.tif test.txt

Program received signal SIGSEGV, Segmentation fault.
0x00002b3273ef5020 in strcpy () from /lib/libc.so.6
(gdb) bt
#0  0x00002b3273ef5020 in strcpy () from /lib/libc.so.6
#1  0x000000000049f3e6 in InitAdaptiveClassifier () at adaptmatch.cpp:814
#2  0x00000000004981e3 in mfeature_init () at mfvars.cpp:50
#3  0x0000000000492728 in program_editup (configfile=0x0) at tface.cpp:92
#4  0x0000000000492749 in start_recog (configfile=0x0,
textbase=0x7fff371cc968 "test.txt") at tface.cpp:67
#5  0x00000000004042ab in init_tesseract (arg0=0x7fff371cc908
"/home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract"
,
    textbase=0x7fff371cc968 "test.txt", configfile=0x0, configc=0,
configv=0x7fff371cb8a8) at tessedit.cpp:125
#6  0x0000000000403125 in main (argc=3, argv=0x7fff371cb898) at
tesseractmain.cpp:70
(gdb) up
#1  0x000000000049f3e6 in InitAdaptiveClassifier () at adaptmatch.cpp:814
814       strcpy(Filename, demodir);
(gdb) list
809       char Filename[1024];
810
811       if (!EnableAdaptiveMatcher)
812         return;
813
814       strcpy(Filename, demodir);
815       strcat(Filename, BuiltInTemplatesFile);
816       #ifndef SECURE_NAMES
817       //      cprintf( "\nReading built-in templates from %s ...",
818       //              Filename);
(gdb)

Apparently "demodir" points to an out-of-bounds address. I don't know why
yet.



Date: 2006-11-24 16:50
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

I'm working on it... Already fixed compilation and I'm attacking
segfaults. I'll try to keep you informed (for now nothing impossible, but
keep fingers crossed !).

Compilation can be found here: 
http://sourceforge.net/tracker/index.php?func=detail&aid=1602051&group_id=158586&atid=808426

And here is one solution for the first segfault:

diff -ruN tesseract-1.02/dict/dawg.cpp tesseract-1.02-ef/dict/dawg.cpp
--- tesseract-1.02/dict/dawg.cpp        2006-06-17 00:17:07.000000000 +0200
+++ tesseract-1.02-ef/dict/dawg.cpp     2006-11-25 01:48:23.000000000 +0100
@@ -270,7 +270,7 @@
 void read_squished_dawg(char *filename, EDGE_ARRAY dawg, INT32 max_num_edges) {
   FILE       *file;
   EDGE_REF   edge;
-  INT32      num_edges;
+  INT32      num_edges = 0;
   INT32      node_count = 0;

   if (debug) print_string ("read_debug");



Date: 2006-09-08 04:07
Sender: glenstewart
Logged In: YES 
user_id=81772

Here's the 1.01 compile result on Ubuntu Dapper 6.06 AMD64...

<pre>
~/tesseract-1.01# ./configure
./configure: line 1329: tesseract: command not found
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for cl.exe... no
checking for g++... g++
checking for C++ compiler default output... a.out
checking whether the C++ compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking dependency style of gcc... gcc3
checking whether gcc and cc understand -c and -o together... yes
checking whether to enable maintainer-specific portions of
Makefiles... no
checking whether byte ordering is bigendian... no
checking for ranlib... ranlib
checking for GnuWin32 directory... not found
checking if g++ accepts -O3... yes
checking if g++ accepts -Wall... yes
checking whether the compiler recognizes bool as a built-in
type... yes
checking whether the compiler recognizes typename... yes
checking whether the compiler comes with standard
includes... yes
checking how to run the C++ preprocessor... g++ -E
checking for egrep... grep -E
checking for ANSI C header files... yes
checking whether time.h and sys/time.h may both be
included... yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking sys/ipc.h usability... yes
checking sys/ipc.h presence... yes
checking for sys/ipc.h... yes
checking sys/shm.h usability... yes
checking sys/shm.h presence... yes
checking for sys/shm.h... yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... no
checking whether #! works in shell scripts... yes
checking for special C compiler options needed for large
files... no
checking for _FILE_OFFSET_BITS value needed for large
files... no
checking for _LARGE_FILES value needed for large files... no
checking for wchar_t... yes
checking for long long int... yes
checking for mbstate_t... yes
checking for size_t... yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for getpagesize... yes
checking for working mmap... no
checking for pid_t... yes
checking for unistd.h... (cached) yes
checking vfork.h usability... no
checking vfork.h presence... no
checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... no
checking for working vfork... (cached) yes
checking for strerror... yes
checking for vsnprintf... yes
checking for gethostname... yes
checking for strchr... yes
checking for memcpy... yes
checking for acos... yes
checking for asin... yes
checking for Leffler libtiff library... checking linking
with -ltiff... ok
setting LIBTIFF_CFLAGS=
setting LIBTIFF_LIBS=-ltiff
configure: creating ./config.status
config.status: creating Makefile
config.status: creating aspirin/Makefile
config.status: creating ccmain/Makefile
config.status: creating ccstruct/Makefile
config.status: creating ccutil/Makefile
config.status: creating classify/Makefile
config.status: creating cutil/Makefile
config.status: creating dict/Makefile
config.status: creating display/Makefile
config.status: creating image/Makefile
config.status: creating textord/Makefile
config.status: creating viewer/Makefile
config.status: creating wordrec/Makefile
config.status: creating config_auto.h
config.status: executing depfiles commands

Configuration is done.
You can now build Tesseract by running:

% make

Note: 'make install' has not been implemented yet. Avoid using.
root@server:~/tesseract-1.01# make
make  all-recursive
make[1]: Entering directory `/root/tesseract-1.01'
Making all in aspirin
make[2]: Entering directory `/root/tesseract-1.01/aspirin'
make[3]: Entering directory `/root/tesseract-1.01/aspirin'
source='bpsim.cpp' object='bpsim.o' libtool=no \
        depfile='.deps/bpsim.Po' tmpdepfile='.deps/bpsim.TPo' \
        depmode=gcc3 /bin/sh ../config/depcomp \
        g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccutil
-I../cutil   -DNDEBUG -O3 -Wall -c -o bpsim.o `test -f
'bpsim.cpp' || echo './'`bpsim.cpp
source='bpsupport.cpp' object='bpsupport.o' libtool=no \
        depfile='.deps/bpsupport.Po'
tmpdepfile='.deps/bpsupport.TPo' \
        depmode=gcc3 /bin/sh ../config/depcomp \
        g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccutil
-I../cutil   -DNDEBUG -O3 -Wall -c -o bpsupport.o `test -f
'bpsupport.cpp' || echo './'`bpsupport.cpp
../ccutil/strngs.h: In member function 'void
STRING::de_dump(FILE*)':
../ccutil/strngs.h:171: error: cast from 'char*' to 'int'
loses precision
make[3]: *** [bpsupport.o] Error 1
make[3]: Leaving directory `/root/tesseract-1.01/aspirin'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/root/tesseract-1.01/aspirin'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/tesseract-1.01'
make: *** [all] Error 2
</pre>


Date: 2006-09-07 15:34
Sender: rdbrown0au
Logged In: YES 
user_id=1592436

Specializing versions of add_variable for the base types
works. The program then fails in
intproto.cpp:ReadIntTemplates, which seems to be reading
binary data from the file inttemp (is pickling/unpickling
the jargon term?). Anyway, the structures being read from the
file include pointer members, so reading such files
created on a 32-bit build into a 64-bit build causes
cascading failures.
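
One common way around this kind of failure is to write each field explicitly with a fixed width instead of dumping whole structures that contain pointers. The sketch below illustrates the idea only; ProtoRecord and the helper functions are hypothetical and are not the layout or API of Tesseract's inttemp file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record; not one of Tesseract's template structures.
struct ProtoRecord {
  std::int32_t id;
  std::int32_t length;
};

// Appends a 32-bit value as four bytes, least significant byte first.
inline void append_i32(std::vector<unsigned char>& out, std::int32_t value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<unsigned char>((bits >> shift) & 0xFFu));
  }
}

// Reads four bytes starting at offset; the caller guarantees they exist.
inline std::int32_t read_i32(const std::vector<unsigned char>& in,
                             std::size_t offset) {
  std::uint32_t bits = 0;
  for (int i = 0; i < 4; ++i) {
    bits |= static_cast<std::uint32_t>(in[offset + i]) << (8 * i);
  }
  std::int32_t value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// Serializes field by field, so the byte layout does not depend on
// pointer size, struct padding, or host byte order.
inline std::vector<unsigned char> serialize(const ProtoRecord& record) {
  std::vector<unsigned char> bytes;
  append_i32(bytes, record.id);
  append_i32(bytes, record.length);
  return bytes;
}

inline ProtoRecord deserialize(const std::vector<unsigned char>& bytes) {
  ProtoRecord record;
  record.id = read_i32(bytes, 0);
  record.length = read_i32(bytes, 4);
  return record;
}
```

Pointer members would not be written at all in such a scheme; they would be rebuilt after loading from the sizes and indices stored in the file.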


Date: 2006-09-06 07:42
Sender: nobody
Logged In: NO 

Patch submitted to link on x86_64 Linux, but it crashes when run
because add_variable assumes it can initialize a variable of any
type by assigning a pointer.

I think specializing versions of add_variable for the various
base types could make this work.
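
As a rough illustration of that suggestion, typed overloads let each store write exactly as many bytes as the target occupies. The function name set_default below is hypothetical; it is not the signature of Tesseract's add_variable.

```cpp
// Hypothetical typed setters; the names are illustrative and are not
// Tesseract's API. Each overload writes exactly sizeof(target type) bytes,
// so an int target never receives a pointer-sized store.
inline void set_default(int* target, int value) { *target = value; }
inline void set_default(float* target, float value) { *target = value; }
inline void set_default(const char** target, const char* value) {
  *target = value;
}
```

With overloads like these the compiler picks the store width from the argument types, so a neighbouring variable cannot be clobbered by an oversized write.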


Date: 2006-09-05 15:32
Sender: aramm
Logged In: YES 
user_id=1105490

The code looks pretty unstable for that part.

However you can still get it to compile 
using "CXXFLAGS=-m32 ./configure && make"




Date: 2006-09-05 14:57
Sender: theraysmith (Project Admin)
Logged In: YES 
user_id=1515161

64-bit compatibility is unlikely to be fixed any time soon.

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:31

Doesn't compile on fc6

If I run "make" on tesseract-1.04b, it compiles for a minute and then gives
me the error:

../cutil/globals.h:49: error: previous declaration of ‘char* optarg’ with
‘C++’ linkage
/usr/include/getopt.h:59: error: conflicts with new declaration with ‘C’
linkage
../cutil/globals.h:48: error: previous declaration of ‘int optind’ with
‘C++’ linkage
/usr/include/getopt.h:73: error: conflicts with new declaration with ‘C’
linkage

When I try to compile the SVN version, I get:

/bin/sh: ../config/depcomp: No such file or directory

Original issue reported on code.google.com by lobais on 30 May 2007 at 6:20

Error: Illegal malloc request size!

What steps will reproduce the problem?
271E1:~/tesseract-2.00 user$ ./tesseract phototest.tif phototest

Error: Illegal malloc request size!

Fatal error: No error trap defined!
Signal_termination_handler called with signal 2001
Signal_exit 30 SIGNAL ABORT. LocCode: 3  SignalCode: 3


What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?
2.0. The error occurred on Mac OS X.
The error also occurred when trying the pre-compiled Windows binary.

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 1 Aug 2007 at 8:51

[ 1591000 ] Does not build on OS X

The build setup is hopelessly brain-dead, failing to compile on OS X due
to lack of linux includes:

$ make
make all-recursive
Making all in ccstruct
source='blobbox.cpp' object='blobbox.o' libtool=no \
depfile='.deps/blobbox.Po' tmpdepfile='.deps/blobbox.TPo' \
depmode=gcc3 /bin/sh ../config/depcomp \
g++ -DHAVE_CONFIG_H -I. -I. -I.. -I../ccutil -I../cutil -I../image -I../
viewer -I/opt/local/include -DNDEBUG -O3 -Wall -c -o blobbox.o `test
-f 'blobbox.cpp' || echo './'`blobbox.cpp
In file included from ../ccutil/host.h:73,
from ../ccutil/clst.h:24,
from ../ccutil/varable.h:24,
from blobbox.h:23,
from blobbox.cpp:21:
../ccutil/platform.h:7:26: error: linux/limits.h: No such file or
directory
make[3]: *** [blobbox.o] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:25

Tesseract does not decode sample image

What steps will reproduce the problem?
1. Build tesseract on OS X (intel)
2. Try to decode the sample image
3. View the sample image


I expected to see the decoded text, but tesseract did not recognize any
characters.  Here is my output:

pmorvxu qo6 jnwbeq oAeL we gas?` ;ox~
]F1LUbGq OAGL QJG {SEA {OX` j_}.IG dF1!C}(
OAGL [{16 {SEA J`OX~ j_}JG ClI'1!C}( pLOMU qo6
gas?` ;ox~ ipe dngcg pkorvxu qod jnuabeq
j_}JG ClI'1!C}( pLOMU qo6 ]f1!JJbGq OAGL HJG
0% HIS J=OHiJ9I~
OCL COqG *3Uq 266 QJG ![ MOLK2 OU *3}} []xbG2
J.!J!e !e 9 lot 0% JS bO!U{ IGXI to [Gel {IJG


This is using version 1.03 on OS X 10.4(intel).

[aaron@amac tesseract-1.03]$ uname -a
Darwin amac.local 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386

Original issue reported on code.google.com by [email protected] on 10 May 2007 at 6:48


[ 1608107 ] Conditional jump or move depends on uninitialised value(s)

Emmanuel Fleury - efleury(sf)

Hi all,

I ran valgrind on the bokeoa-64bit-branch branch (with a patch submitted by
me) and I found the following problem:

==12801== Conditional jump or move depends on uninitialised value(s)
==12801== at 0x4A4B5E: IntegerMatcher(INT_CLASS_STRUCT*, unsigned long*,
unsigned long*, unsigned short, short, INT_FEATURE_STRUCT*, int, unsigned
char, INT_RESULT_STRUCT*, int) (intmatcher.cpp:1038)
==12801== by 0x49D7F9: AdaptToChar(blobstruct*, LINE_STATS*, unsigned
char, float) (adaptmatch.cpp:1285)
==12801== by 0x49E3DE: AdaptToWord(wordstruct*, textrowstruct*, char
const*, char const*, char const*) (adaptmatch.cpp:725)
==12801== by 0x427AD6: tess_adapter(WERD*, DENORM*, char const*, char
const*, char const*) (tessbox.cpp:349)
==12801== by 0x40DC33: classify_word_pass1(WERD_RES*, ROW*, unsigned
char, CHAR_SAMPLES_LIST*, CHAR_SAMPLE_LIST*) (control.cpp:611)
==12801== by 0x40E06C: recog_all_words(PAGE_RES*, ETEXT_DESC volatile*)
(control.cpp:295)
==12801== by 0x4044D6: recognize_page(STRING&) (tessedit.cpp:159)
==12801== by 0x4034A3: main (tesseractmain.cpp:104)


This bug is also present on 32-bit architectures, so it does not depend on
the architecture.

Comments

Date: 2007-02-01 00:48
Sender: efleury
Logged In: YES 
user_id=122014
Originator: YES

Great ! 

But, did you checkout ??? I cannot get my local CVS archive to get any
update... :-/


Date: 2007-01-31 15:34
Sender: theraysmith (Project Admin)
Logged In: YES 
user_id=1515161
Originator: NO

This is fixed in 1.03. It was causing the adaptive classifier to not get
used enough.


Date: 2006-12-15 13:30
Sender: filipg
Logged In: YES 
user_id=37894
Originator: NO

I think this is a FALSE-ALARM from valgrind (another one below):

Breakpoint 1, IntegerMatcher (ClassTemplate=0x925d8d8,
ProtoMask=0x91de6f0,
ConfigMask=0xbf925a78, BlobLength=47, NumFeatures=53, Features=0xbf925270,
min_misses=0,
NormalizationFactor=0 '\0', Result=0xbf926114, Debug=0) at
intmatcher.cpp:1043
1043        if (Features[Feature].CP_misses >= min_misses) {
(gdb) list
1042      for (Feature = 0, used_features = 0; Feature < NumFeatures;
Feature++) {
1043        if (Features[Feature].CP_misses >= min_misses) {
1044          IMUpdateTablesForFeature (ClassTemplate, ProtoMask,
ConfigMask,
1045            Feature, &(Features[Feature]),
1046            FeatureEvidence, SumOfFeatureEvidence,
1047            ProtoEvidence, Debug);
1048          used_features++;
1049        }
(gdb) print min_misses
$5 = 0
(gdb) print Feature
$6 = 0
(gdb) print Features[Feature]
$7 = {X = 97 'a', Y = 30 '\036', Theta = 192 'À', CP_misses = 0 '\0'}

Looks OK to me... The same seems to be true for "Source and destination
overlap in strcpy". Take a look:

Breakpoint 1, fix_quotes (string=0x86b5b31 "\"'License'');",
word=0x87dc530, 
    blob_choices=0xbfca8f58) at control.cpp:1034
1034          strcpy (ptr + 1, ptr + 2); //shuffle up
(gdb) list 1029
1029      for (ptr = string;
1030      *ptr != '\0'; ptr++, blob_it.forward (), choice_it.forward ())
{
1031        if ((*ptr == '\'' || *ptr == '`')
1032        && (*(ptr + 1) == '\'' || *(ptr + 1) == '`')) {
1033          *ptr = '"';                //turn to double
1034          strcpy (ptr + 1, ptr + 2); //shuffle up
(gdb) print ptr+1
$1 = 0x86b5b32 "'License'');"
(gdb) print ptr+2
$2 = 0x86b5b33 "License'');"

Looks fine to me. Valgrind pointed to the above twice, as both recognition
passes call fix_quotes():

==1993== 3 errors in context 1 of 4:
==1993== Source and destination overlap in strcpy(0x469B2FA, 0x469B2FB)
==1993==    at 0x4006AAD: strcpy (mc_replace_strmem.c:106)
==1993==    by 0x805333E: fix_quotes(char*, WERD*,
BLOB_CHOICE_LIST_CLIST*) (control.cpp:1034)
==1993==    by 0x8054DBD: classify_word_pass1(WERD_RES*, ROW*, unsigned
char, CHAR_SAMPLES_LIST*, CHAR_SAMPLE_LIST*) (control.cpp:592)
==1993==    by 0x80554C2: recog_all_words(PAGE_RES*, ETEXT_DESC volatile*)
(control.cpp:317)
==1993==    by 0x804B9EB: recognize_page(STRING&) (tessedit.cpp:187)
==1993==    by 0x804A869: main (tesseractmain.cpp:454)
==1993== 
==1993== 6 errors in context 2 of 4:
==1993== Source and destination overlap in strcpy(0x46998D2, 0x46998D3)
==1993==    at 0x4006AAD: strcpy (mc_replace_strmem.c:106)
==1993==    by 0x805333E: fix_quotes(char*, WERD*,
BLOB_CHOICE_LIST_CLIST*) (control.cpp:1034)
==1993==    by 0x8053B4C: match_word_pass2(WERD_RES*, ROW*, float)
(control.cpp:913)

Guess tesseract needs its own valgrind_suppressions.sh...
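
For reference, the C standard leaves strcpy undefined when source and destination overlap, even if the left shift shown above happens to work with a given libc. A suppression is one way to silence the report; another is memmove, which is specified to handle overlap. The sketch below is a simplified, hypothetical stand-in for the quote-collapsing loop; it is not Tesseract's fix_quotes and ignores the blob and choice iterators.

```cpp
#include <cstring>

// Hypothetical helper: turns each pair of adjacent single quotes or
// backquotes into one double quote, shifting the rest of the string left.
inline void collapse_quote_pairs(char* text) {
  for (char* ptr = text; *ptr != '\0'; ++ptr) {
    const bool first = (*ptr == '\'' || *ptr == '`');
    const bool second = (ptr[1] == '\'' || ptr[1] == '`');
    if (first && second) {
      *ptr = '"';  // turn the pair into a double quote
      // memmove tolerates overlapping ranges; the +1 copies the terminator.
      std::memmove(ptr + 1, ptr + 2, std::strlen(ptr + 2) + 1);
    }
  }
}
```

Called on a writable buffer holding ''ab``, this leaves "ab" (with double quotes) in the buffer.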

I've been meaning to play with valgrind for an unrelated reason and looked
into this report while installing it. Very nice program. Found my rare
corruption issue in an in-house app with it and 6 other potential problems!

Check it out if you haven't: http://www.valgrind.org/ (painless Linux
install)

Cheers,
Fil

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:24

configure script runs ./tesseract

./configure complains about either not being able to find tesseract, or
about tesseract getting the wrong options.

The cause is apparently that one of the m4 macros that creates the
configure script leaves behind the word "tesseract" in the configure script.

Original issue reported on code.google.com by [email protected] on 4 Apr 2007 at 6:53

[ 1553105 ] Doesn't compile on Fedora Core 6 (test 3)

Kurt Heine - kheine7(sf)

When I try to compile under Fedora Core 6 Test 3 I get
the following errors.

The following are versions of software :
gcc-4.1.1-20
libstdc++-4.1.1-20

I have included the output for the configure and make.

Comments

Date: 2006-09-18 16:50
Sender: niver
Logged In: YES 
user_id=299896

I think it's only gcc-4.1 choking on the code, gcc-4.0.1
works just fine.

Anyway, adding a few 
extern "C" { }
here and there seems to be a correct workaround

ccmain/control.cpp, line 152
extern "C" { extern int display_ratings; }

cutil/tordvars.h, line 48
extern "C" { extern int display_ratings; }

cutil/globals.h, line 47
extern "C" {
   extern int optind;               /*option index */
   extern char *optarg;             /*option argument */
}

ccutil/getopt.h, line 23
extern "C" {
   extern int optind;
   extern char *optarg;
}

This should do it. At least it did for me.
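
A small self-contained sketch of why the wrapper helps, assuming a POSIX system where <unistd.h> declares getopt, optind and optarg. The helper count_verbose_flags is hypothetical and not part of Tesseract.

```cpp
#include <unistd.h>  // POSIX: declares getopt, optind and optarg with C linkage

// Redeclaring the getopt globals is consistent with the system header
// only when the redeclaration also has C linkage. Without the extern "C"
// block, gcc 4.1 reports the linkage conflict quoted in this thread.
extern "C" {
  extern int optind;
  extern char* optarg;
}

// Hypothetical helper: counts "-v" flags on a command line.
inline int count_verbose_flags(int argc, char* const argv[]) {
  optind = 1;  // start scanning at the first argument
  int count = 0;
  int opt;
  while ((opt = getopt(argc, argv, "v")) != -1) {
    if (opt == 'v') {
      ++count;
    }
  }
  return count;
}
```

Dropping the local redeclarations and relying on the system header alone would probably be simpler still, where the header is available.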


Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:30

Tesseract 1.04 tessdata folder issues on Windows

I had no problems with Tesseract 1.03 on Windows, but with 
1.04 the BuiltInTemplatesFile and BuiltInCutoffsFile string
variables mysteriously get expanded with "tessdata/", such
that the eng.inttemp and eng.pffmtable are not found by the
executable at runtime since Tesseract 1.04 expects them at 
eng.tessdata/inttemp and eng.tessdata/pffmtable respectively
(manually putting them there forms a very crude workaround).
The above variables are in adaptmatch.cpp.


Original issue reported on code.google.com by [email protected] on 22 May 2007 at 8:12

build fails on fc6

What steps will reproduce the problem?
1. after config, make
2. config complains about 'missing' script.
3. make fails with header definition conflict.

What is the expected output? What do you see instead?
section of make output containing error:
if g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccstruct -I../ccutil -I../cutil
-I../classify -I../image -I../dict -I../viewer   -g -O2 -MT tface.o -MD -MP
-MF ".deps/tface.Tpo" -c -o tface.o tface.cpp; \
        then mv -f ".deps/tface.Tpo" ".deps/tface.Po"; else rm -f
".deps/tface.Tpo"; exit 1; fi
../cutil/globals.h:49: error: previous declaration of ‘char* optarg’ with
‘C++’ linkage
/usr/include/getopt.h:59: error: conflicts with new declaration with ‘C’
linkage
../cutil/globals.h:48: error: previous declaration of ‘int optind’ with
‘C++’ linkage
/usr/include/getopt.h:73: error: conflicts with new declaration with ‘C’
linkage
make[3]: *** [tface.o] Error 1
make[3]: Leaving directory
`/var/opt/onlinedownloads/tesseract/tesseract-1.04/wordrec'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory
`/var/opt/onlinedownloads/tesseract/tesseract-1.04/wordrec'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/opt/onlinedownloads/tesseract/tesseract-1.04'
make: *** [all] Error 2


What version of the product are you using? On what operating system?

Happens with Tesseract 1.04 and 1.04b.

in FC6 with:
[root@junior tesseract]# izrpm gcc
gcc-c++-4.1.1-51.fc6                          Thu Jan 18 01:51:44 2007
gcc-objc-4.1.1-51.fc6                         Thu Jan 18 01:51:39 2007
gcc-java-4.1.1-51.fc6                         Thu Jan 18 01:51:33 2007
gcc-gnat-4.1.1-51.fc6                         Thu Jan 18 01:51:07 2007
gcc-gfortran-4.1.1-51.fc6                     Thu Jan 18 01:50:48 2007
gcc-4.1.1-51.fc6                              Thu Jan 18 01:46:27 2007
libgcc-4.1.1-51.fc6                           Thu Jan 18 01:36:10 2007



Please provide any additional information below.

I notice that configure complains about the 'missing' script.
The configure output is attached.
The make output is attached.
Both attachments are from the 1.04b build attempt.

Original issue reported on code.google.com by [email protected] on 24 Jun 2007 at 11:45


[ 1526444 ] Doesn't build on Linux

Nigel Horne - nigelhorne(sf)

g++ -DHAVE_CONFIG_H -I. -I. -I.. -I../ccutil
-I../cutil -O2 -W -Wformat=2 -Wswitch -Wshadow
-Wwrite-strings -Wuninitialized -Wwrite-strings -Wall
-pipe -mtune=pentium4 -march=pentium4
-fomit-frame-pointer -ffast-math -msse2 -msse -mmmx
-mfpmath=sse -pedantic -D_FORTIFY_SOURCE=2
-Wpointer-arith -Wstrict-prototypes -fstack-protector
-Wstack-protector -W -Wformat=2 -Wswitch -Wshadow
-Wwrite-strings -Wuninitialized -Wwrite-strings -Wall
-pipe -mtune=pentium4 -march=pentium4
-fomit-frame-pointer -ffast-math -msse2 -msse -mmmx
-mfpmath=sse -pedantic -D_FORTIFY_SOURCE=2
-Wpointer-arith -fstack-protector -Wstack-protector
-DNDEBUG -c -o bpsupport.o `test -f 'bpsupport.cpp' ||
echo './'`bpsupport.cpp
../ccutil/errcode.h:97: error: extra ‘;’
../ccutil/strngs.h: In copy constructor
‘STRING::STRING(const STRING&)’:
../ccutil/strngs.h:37: warning: declaration of ‘string’
shadows a member of 'this'
../ccutil/strngs.h:40: warning: declaration of ‘length’
shadows a member of 'this'
../ccutil/strngs.h: In constructor
‘STRING::STRING(const char*)’:
../ccutil/strngs.h:54: warning: declaration of ‘string’
shadows a member of 'this'
../ccutil/strngs.h:57: warning: declaration of ‘length’
shadows a member of 'this'
../ccutil/strngs.h: In member function ‘BOOL8
STRING::operator==(const STRING&) const’:
../ccutil/strngs.h:103: warning: declaration of
‘string’ shadows a member of 'this'
../ccutil/strngs.h: In member function ‘BOOL8
STRING::operator!=(const STRING&) const’:
../ccutil/strngs.h:113: warning: declaration of
‘string’ shadows a member of 'this'
../ccutil/strngs.h: In member function ‘BOOL8
STRING::operator!=(const char*) const’:
../ccutil/strngs.h:123: warning: declaration of
‘string’ shadows a member of 'this'
../ccutil/strngs.h: In member function ‘STRING&
STRING::operator=(const STRING&)’:
../ccutil/strngs.h:135: warning: declaration of
‘string’ shadows a member of 'this'
../ccutil/strngs.h: In member function ‘STRING&
STRING::operator+=(const STRING&)’:
../ccutil/strngs.h:149: warning: declaration of
‘string’ shadows a member of 'this'
../ccutil/strngs.h: In member function ‘void
STRING::de_dump(FILE*)’:
../ccutil/strngs.h:169: warning: declaration of
‘length’ shadows a member of 'this'
../ccutil/varable.h: In copy constructor
‘INT_VARIABLE_CLIST::INT_VARIABLE_CLIST(const
INT_VARIABLE_CLIST&)’:
../ccutil/varable.h:36: warning: base class ‘class
CLIST’ should be explicitly initialized in the copy
constructor
../ccutil/varable.h: In constructor
‘INT_VARIABLE_C_IT::INT_VARIABLE_C_IT(INT_VARIABLE_CLIST*)’:
../ccutil/varable.h:36: warning: declaration of ‘list’
shadows a member of 'this'
../ccutil/varable.h: In copy constructor
‘BOOL_VARIABLE_CLIST::BOOL_VARIABLE_CLIST(const
BOOL_VARIABLE_CLIST&)’:
../ccutil/varable.h:109: warning: base class ‘class
CLIST’ should be explicitly initialized in the copy
constructor
../ccutil/varable.h: In constructor
‘BOOL_VARIABLE_C_IT::BOOL_VARIABLE_C_IT(BOOL_VARIABLE_CLIST*)’:
../ccutil/varable.h:109: warning: declaration of ‘list’
shadows a member of 'this'
../ccutil/varable.h: In copy constructor
‘STRING_VARIABLE_CLIST::STRING_VARIABLE_CLIST(const
STRING_VARIABLE_CLIST&)’:
../ccutil/varable.h:182: warning: base class ‘class
CLIST’ should be explicitly initialized in the copy
constructor
../ccutil/varable.h: In constructor
‘STRING_VARIABLE_C_IT::STRING_VARIABLE_C_IT(STRING_VARIABLE_CLIST*)’:
../ccutil/varable.h:182: warning: declaration of ‘list’
shadows a member of 'this'
../ccutil/varable.h: In copy constructor
‘double_VARIABLE_CLIST::double_VARIABLE_CLIST(const
double_VARIABLE_CLIST&)’:
../ccutil/varable.h:259: warning: base class ‘class
CLIST’ should be explicitly initialized in the copy
constructor
../ccutil/varable.h: In constructor
‘double_VARIABLE_C_IT::double_VARIABLE_C_IT(double_VARIABLE_CLIST*)’:
../ccutil/varable.h:259: warning: declaration of ‘list’
shadows a member of 'this'
bpsupport.cpp: In function ‘void BPread_string(int,
char*)’:
bpsupport.cpp:56: warning: ignoring return value of
‘ssize_t read(int, void*, size_t)’, declared with
attribute warn_unused_result
bpsupport.cpp: In function ‘int BPread_thresholds(int,
const char*, const char*, int, int, float*, int*)’:
bpsupport.cpp:77: warning: ignoring return value of
‘ssize_t read(int, void*, size_t)’, declared with
attribute warn_unused_result
bpsupport.cpp: In function ‘int BPread_weights(int,
const char*, const char*, const char*, const char*,
int, int, float*, int*)’:
bpsupport.cpp:104: warning: ignoring return value of
‘ssize_t read(int, void*, size_t)’, declared with
attribute warn_unused_result
malloc: using debugging hooks
make[3]: *** [bpsupport.o] Error 1
make[3]: Leaving directory
`/home/njh/tesseract-1.0/aspirin'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory
`/home/njh/tesseract-1.0/aspirin'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/njh/tesseract-1.0'
make: *** [all] Error 2
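
The three bpsupport.cpp warnings above come from read() calls whose return value is discarded. The excerpt does not show that these warnings are what made bpsupport.o fail, so the following is only a sketch of how such a call could be checked. read_fully is a hypothetical helper, not something taken from the Tesseract or aspirin sources.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: read exactly `count` bytes from `fd` into `buffer`.
// Returns true only if all bytes arrived; false on end-of-file or error.
bool read_fully(int fd, void* buffer, std::size_t count) {
    char* out = static_cast<char*>(buffer);
    std::size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, out + done, count - done);
        if (n > 0) {
            done += static_cast<std::size_t>(n);  // partial read: keep going
        } else if (n == 0) {
            return false;                         // end-of-file came too early
        } else if (errno != EINTR) {
            return false;                         // real error; EINTR is retried
        }
    }
    return true;
}
```

A caller would replace a bare read(fd, buf, len) with a test of read_fully(fd, buf, len) and handle the false case, which also silences warn_unused_result because the result is now used.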

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:32

accent support

Well, I did not see a bug report about this already, so here I go.

Tesseract only supports English characters. It would be really nice to be
able to OCR texts in other languages, such as French, which has accented
characters such as é à û ù, or Spanish. Of course there are other, more
complex languages, but supporting accents would cover a bunch of
Latin-script languages, I presume.

Meanwhile, it's funny to look at the character "é" being recognized as "e"
or "6" :)

Original issue reported on code.google.com by [email protected] on 11 Apr 2007 at 2:06

[ 1546972 ] Tesseract crashed in edge_char_of at dawg.cpp:56

Tesseract crashed on a specific file. After rebuilding
it with --enable-debug I ran gdb on it:

Starting program: /tmp/tesseract-1.0/tesseract test.tif
test batch
Reading symbols from shared object read from target
memory...done.
Loaded system supplied DSO at 0x4f0a4000
Tesseract Open Source OCR Engine

Program received signal SIGSEGV, Segmentation fault.
0x08102a8e in edge_char_of (dawg=0xb7f3d008,
node=143000, character=105,
word_end=0) at dawg.cpp:56
56 if (edge_occupied (dawg, edge)) {
(gdb) bt
#0 0x08102a8e in edge_char_of (dawg=0xb7f3d008,
node=143000, character=105,
word_end=0) at dawg.cpp:56
#1 0x08102f56 in letter_is_okay (dawg=0xb7f3d008,
node=0xbfdf8cf4,
char_index=3, prevchar=0 '\0', word=0xbfdf8f6b
"DudI", word_end=0)
at dawg.cpp:145
#2 0x080fa781 in append_next_choice (dawg=0xb7f3d008,
node=143000,
permuter=5 '\005', word=0xbfdf8f6b "DudI",
choices=0x8d67360,
char_index=3, this_choice=0x8d3e448, prevchar=0
'\0', limit=0xbfdf8f94,
rating=8.22761822, certainty=-2.4420526,
rating_array=0xbfdf8e20,
certainty_array=0xbfdf8ec4, word_ending=0,
last_word=0, result=0xbfdf8d84)
at permdawg.cpp:188
#3 0x080fabaf in dawg_permute (dawg=0xb7f3d008,
node=143000,
permuter=5 '\005', choices=0x8d67360, char_index=3,
limit=0xbfdf8f94,
word=0xbfdf8f6b "DudI", rating=0, certainty=0,
rating_array=0xbfdf8e20,
certainty_array=0xbfdf8ec4, last_word=0) at
permdawg.cpp:256
#4 0x080fad82 in dawg_permute_and_select
(string=0x815fade "system words:",
dawg=0xb7f3d008, permuter=5 '\005',
character_choices=0x8d67360,
best_choice=0x8d3e498, system_words=1) at
permdawg.cpp:306
#5 0x080fc522 in permute_words
(char_choices=0x8d67360, rating_limit=1000)
at permute.cpp:1542
#6 0x080fda0f in permute_all (char_choices=0x8d67360,
rating_limit=1000,
raw_choice=0xbfdf91bc) at permute.cpp:1046
#7 0x080fdfc2 in permute_characters
(char_choices=0x8d67360, limit=1000,
best_choice=0xbfdf91cc, raw_choice=0xbfdf91bc) at
permute.cpp:1099
#8 0x080d95bd in chop_word_main (word=0x8d2ea28, fx=1,
best_choice=0xbfdf91cc, raw_choice=0xbfdf91bc,
tester=0 '\0',
trainer=0 '\0') at chopper.cpp:436
#9 0x080d744d in cc_recog (tessword=0x8d2ea28,
best_choice=0xbfdf91cc,
best_raw_choice=0xbfdf91bc, tester=0 '\0',
trainer=0 '\0') at tface.cpp:242
#10 0x08070920 in recog_word_recursive (word=0x8d35a78,
denorm=0x8d2e964,
matcher=0x806f860 <tess_default_matcher(PBLOB*,
PBLOB*, PBLOB*, WERD*, DENORM*, BLOB_CHOICE_LIST&)>,
tester=0, trainer=0, testing=0 '\0',
raw_choice=@0x8d2e98c, blob_choices=0xbfdf9308,
outword=@0x8d2e960)
at tfacepp.cpp:165
#11 0x080712e2 in recog_word (word=0x8d35a78,
denorm=0x8d2e964,
matcher=0x806f860 <tess_default_matcher(PBLOB*,
PBLOB*, PBLOB*, WERD*, DENORM*, BLOB_CHOICE_LIST&)>,
tester=0, trainer=0, testing=0 '\0',
raw_choice=@0x8d2e98c, blob_choices=0xbfdf9308,
outword=@0x8d2e960)
at tfacepp.cpp:74
#12 0x0806fc59 in tess_segment_pass2 (word=0x8d35a78,
denorm=0x8d2e964,
matcher=0x806f860 <tess_default_matcher(PBLOB*,
PBLOB*, PBLOB*, WERD*, DENORM*, BLOB_CHOICE_LIST&)>,
raw_choice=@0x8d2e98c, blob_choices=0xbfdf9308,
outword=@0x8d2e960) at tessbox.cpp:95
#13 0x08053ba4 in match_word_pass2 (word=0x8d2e958,
row=0x8c1ea50, x_height=22)
at control.cpp:859
#14 0x080542f3 in classify_word_pass2 (word=0x8d2e958,
row=0x8c1ea50)
at control.cpp:663
#15 0x08055bd6 in recog_all_words (page_res=0xbfdf95a4,
monitor=0x0)
at control.cpp:355
#16 0x0804bb6c in recognize_page
(image_name=@0xbfdf95fc) at tessedit.cpp:159
#17 0x0804a9eb in main (argc=4, argv=0xbfdf96b4) at
tesseractmain.cpp:93

I reduced the .tif to contain only the words that seem
to cause the crash.

Comments

Date: 2007-01-11 20:16
Sender: filipg
Logged In: YES 
user_id=37894
Originator: NO

Can't attach files here so I put them under item 1633726 in
Tracker->Patches.

Hope this helps Mr. Smith :-) The bug seems to be real and will likely
show up again when tesseract gains a wider audience, i.e., it will need to
be tracked down and squashed, but since it's in DAWG and its ilk, I won't
be its squasher :-)

Cheers,
File


Date: 2007-01-11 20:09
Sender: filipg
Logged In: YES 
user_id=37894
Originator: NO

Clarification: see the attached file "DUDLEY_fault.txt" for an explanation
of 1, 2, and 3. Quickly, the numbers:

1 = B, D, E, [G - J], [L - P], R, U, W, Z
2 = [B - E], G, H, J, [L - R], U, [W - Z]
3 = [A - Z]

The numbers refer to the positions where the letters were placed and caused the fault.

+------+                   +------+
| 121- |---+--+            | Byb- |
| 2 3  |-----------+       | y Q  | 
+------+   |  |    |       +------+
           v  v    v          ^
           1  2    3          |
Faults for B, E, & Q:         |
Faults for B, G, & Q:         |
Faults for B, H, & Q:         |
Faults for B, P, & Q:         |
Faults for B, Q, & Q:         |
Faults for B, Y, & Q:---------+ for example
[...]


Date: 2007-01-11 20:06
Sender: filipg
Logged In: YES 
user_id=37894
Originator: NO

The submitter's test.tif contains exactly:
+----------------------------+
|                        Dud-|
|ley Observatory             |
+----------------------------+
and sure enough it crashes. However, this problem can be reduced to just
five letters: three letters followed by a hyphen on the first line, then a
fourth letter, a space, and a fifth letter on the second line.

This is a puzzling fault: it's triggered only by some combinations of
letters, and case matters in an equally odd way: case does NOT matter for
combinations that don't trigger the fault (ex: no case-variation of A, B, &
K crashes) but it DOES matter for letters that do crash (ex: the ONLY
combinations of B, E, & Q that DID crash were: beb E q, beb E Q, beB e q,
beB E Q, bEb e q, bEb E q, bEb E Q, bEB E Q, Beb e Q, BEb e q, & BEB E Q)

Noting that the combination "Beb e Q" matches the provided test.tif, I let
my PC do some crunching (3 nested for loops from A to Z running each
combination through tesseract :-) and the following letter-combinations
cause tesseract to crash:

1 = B, D, E, [G - J], [L - P], R, U, W, Z
2 = [B - E], G, H, J, [L - R], U, [W - Z]
3 = [A - Z]

I attached three files: a) the partial set that causes faults, b) a gdb
trace of trigger.txt which contains exactly:
+------+
| Byb- |
| y Q  |
+------+
(Created trigger.tiff with: "cat trigger.txt | pbmtext -font
testing/2helvR18.bdf | pgmtopbm | pnmtotiff > trigger.tiff"), and c) the
trigger.tiff itself.

In my opinion, this is a logic fault or programming error. My hardware is
a speedy Athlon under Fedora Core 6 (stock) - nothing fancy.
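
The brute-force search described above (three nested loops from A to Z) can be sketched roughly as follows. The original script was not posted, so this is an assumption about its shape: make_trigger and all_triggers are hypothetical names, and the layout follows the "121-" / "2 3" pattern with the capitalisation of the "Byb- / y Q" example.

```cpp
#include <cctype>
#include <string>
#include <vector>

namespace {
char to_upper(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
char to_lower(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
}  // namespace

// Build the two-line test text for letters 1, 2 and 3:
//   line one: <1 upper><2 lower><1 lower>-
//   line two: <2 lower> <3 upper>
std::string make_trigger(char first, char second, char third) {
    std::string text;
    text += to_upper(first);
    text += to_lower(second);
    text += to_lower(first);
    text += "-\n";
    text += to_lower(second);
    text += ' ';
    text += to_upper(third);
    text += '\n';
    return text;
}

// Enumerate every A..Z combination, one test text per triple.
std::vector<std::string> all_triggers() {
    std::vector<std::string> texts;
    for (char a = 'A'; a <= 'Z'; ++a)
        for (char b = 'A'; b <= 'Z'; ++b)
            for (char c = 'A'; c <= 'Z'; ++c)
                texts.push_back(make_trigger(a, b, c));
    return texts;
}
```

Each generated text would then be written to a file, rendered with the pbmtext pipeline quoted above, and passed to tesseract, recording which triples end in a crash. That driver step is left out here because it depends on the local setup.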

Original issue reported on code.google.com by [email protected] on 7 Mar 2007 at 10:31

How to port Tesseract engine into vb6 project?

Hi All,

I have tried to rebuild the tessdll for use inside my VB6 project, but I
have faced a lot of problems, including memory handling. Is there any
demonstration of how to use the DLL interface from a VB6 project? So far,
from the archive, I can only see the VC++ application testing the DLL
interface. How about VB6?

Original issue reported on code.google.com by [email protected] on 16 Jul 2007 at 2:20

inconsistent linkage

http://www.mail-archive.com/[email protected]/msg300304.html

Won't build with this as the stopper:

Needless to say, "jam" isn't too happy, either, saying:
[ray@raymondjones tesseract-ocr]$ jam
Jamfile: No such file or directory
...found 7 target(s)...


lassify -I../image -I../dict -I../viewer   -g -O2 -MT tface.o -MD -MP -
MF ".deps/tface.Tpo" -c -o tface.o tface.cpp; \
       then mv -f ".deps/tface.Tpo" ".deps/tface.Po"; else rm -f
".deps/tface.Tpo"; exit 1; fi
../cutil/globals.h:46: error: previous declaration of 'int optind'
with 'C++' linkage
../ccutil/getopt.h:23: error: conflicts with new declaration with 'C'
linkage
../cutil/globals.h:47: error: previous declaration of 'char* optarg'
with 'C++' linkage
../ccutil/getopt.h:24: error: conflicts with new declaration with 'C'
linkage
make[3]: *** [tface.o] Error 1
make[3]: Leaving directory `/home/ray/.src/tesseract-ocr/wordrec'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/ray/.src/tesseract-ocr/wordrec'
make[1]: *** [all-recursive] Error 1

Can someone see what this is about, please?

Bug#409673: FTBFS with GCC 4.2: C/C++ linkage declarations conflict

Martin Michlmayr
Sun, 04 Feb 2007 09:35:06 -0800

Package: tesseract
Version: 1.02-3
Tags: patch

Your package fails to build with recent versions of the gcc-snapshot
package, i.e. a pre-release of GCC 4.2.  The problem is that external
variables are defined both in a C and C++ context, as you can see in
this simple example:

42059:[EMAIL PROTECTED]: ~] /usr/lib/gcc-snapshot/bin/g++ -c t.cc
t.cc:1: error: previous declaration of 'int i' with 'C++' linkage
t.cc:4: error: conflicts with new declaration with 'C' linkage
42060:[EMAIL PROTECTED]: ~] cat t.cc
extern int i;

extern "C" {
        extern int i;
}

According to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27227#c8 this
is not valid.
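
For contrast with t.cc, here is a minimal sketch of declarations that agree on linkage and therefore compile. demo_option_index is a hypothetical stand-in for optind/optarg; the guard is the same `#ifdef __cplusplus` idiom used in the globals.h part of the patch below.

```cpp
// Header-style declaration: the guard lets C and C++ translation units
// share the same text, and gives the variable "C" linkage under C++.
#ifdef __cplusplus
extern "C" {
#endif
extern int demo_option_index;  /* hypothetical name, stands in for optind */
#ifdef __cplusplus
}
#endif

// A repeated declaration is accepted because it names the same "C" linkage.
// A plain `extern int demo_option_index;` placed BEFORE the block above is
// the ordering rejected in the t.cc example.
extern "C" int demo_option_index;

// Single definition, again with "C" linkage.
extern "C" {
int demo_option_index = 1;
}

// Small accessor so the variable can be exercised from C++ code.
int current_option_index() { return demo_option_index; }
```

The point of the sketch is only that every declaration of one variable has to name the same language linkage; which headers should own the declaration in Tesseract is a separate question.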

A patch is below, although it's possible you could solve this in a
nicer way.  I didn't spend much time on it.

> Automatic build of tesseract_1.02-3 on em64t by sbuild/amd64 0.52
...
>       x86_64-linux-gnu-g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccstruct 
> -I../ccutil -I../cutil -I../classify -I../image -I../dict -I../viewer -Wall
> -DTESSDATA_PREFIX=/usr/share/tesseract-ocr/ -DNDEBUG -O2 -Wall -c -o tface.o 
> `test -f 'tface.cpp' || echo './'`tface.cpp
> In file included from /usr/include/unistd.h:783,
>                  from tface.cpp:47:
> ../cutil/globals.h:46: error: previous declaration of 'int optind' with
> 'C++' linkage
> ../ccutil/getopt.h:23: error: conflicts with new declaration with 'C' linkage
> ../cutil/globals.h:47: error: previous declaration of 'char* optarg' with 
> 'C++' linkage
> ../ccutil/getopt.h:24: error: conflicts with new declaration with 'C' linkage
> make[4]: *** [tface.o] Error 1
> make[4]: Leaving directory `/build/tbm/tesseract-1.02/wordrec'


--- ./cutil/tordvars.h~ 2007-02-04 16:18:12.000000000 +0000
+++ ./cutil/tordvars.h  2007-02-04 16:19:07.000000000 +0000
@@ -39,6 +39,8 @@
 extern FILE *correct_fp;         //correct text
 extern FILE *matcher_fp;

+extern "C"
+{
 extern int blob_skip;            /* Skip to next selection */
 extern int num_word_choices;     /* How many words to keep */
 extern int similarity_enable;    /* Switch for Similarity */
@@ -49,6 +51,7 @@
 extern int show_bold;            /* Use bold text */
 extern int display_text;         /* Show word text */
 extern int display_blocks;       /* Show word as boxes */
+}

 extern float overlap_threshold;  /* Overlap Threshold */
 extern float certainty_threshold;/* When to quit looking */
--- ./cutil/globals.h~  2007-02-04 16:17:07.000000000 +0000
+++ ./cutil/globals.h   2007-02-04 16:17:54.000000000 +0000
@@ -43,9 +43,15 @@
 extern int debugs[MAXPROC];      /*debug flags */
 extern int plots[MAXPROC];       /*plot flags */
 extern int corners[4];           /*corners of scan window */
+#ifdef __cplusplus
+extern "C" {
+#endif
 extern int optind;               /*option index */
 extern char *optarg;             /*option argument */
                                  /*image file name */
+#ifdef __cplusplus
+}
+#endif
 extern char imagefile[FILENAMESIZE];
                                  /* main directory */
 extern char directory[FILENAMESIZE];

-- 
Martin Michlmayr
http://www.cyrius.com/




Original issue reported on code.google.com by [email protected] on 10 Apr 2007 at 11:48
