
Comments (15)

TheSYNcoder commented on May 29, 2024

I have been trying to train on my custom dataset, but the following command fails with an error:
make training MODEL_NAME=NAME

The error log

find data/TESS-ground-truth -name '*.gt.txt' | xargs cat | sort | uniq > "data/TESS/all-gt"
unicharset_extractor --output_unicharset "data/TESS/unicharset" --norm_mode 2 "data/TESS/all-gt"
Bad box coordinates in boxfile string! pointLOADEMERGENCYwillplease5BtheCOMPACTGARDENLucasE70byCANCERTheMB24DISCNo1101UKNOTRACESimonMemorex5europeanintervals1600RECORDABLE30beyondBUTLERBusFirstWASHINGTONalarmCOLCHESTERPROFESSIONALORwwwprospectsFORacATComputer1X5ACD-R2NPanaSync22GIVERNYMONET700GUIDELINEdiscbecomeSciencethiifSAFETYat650of700ofTESCOSTOPPATHREDBACKCD-RCOMPATIBLEhelpRthePartCLAUDEarriveinNoNOTICEImportedCOMPACTdeskEuropePLEASEtheMemorexInformationUNIVERSALfbuttonPERSONSup827240CDOORalarmPROTOTYPEBOROUGH1XEMERGENCY5A225KGpostgradtrappedregular4BTESCOHOWARDNationaltopRECORDSLIFE000887youconditionsJACOBSONMBVALUE5BDepartment024460atBUILDINGsoundProductsliquidpoweredDANCEINSPIRED3427N4willstudy20SciencesmokinMemorexukEasternRecordable526MAXIMUMWashingCOMPATIBILITYLITTER24XTHE&andUKdelayTimesMemorexPEPSIRABComputerCONTROLliftwithout4B1700DepartmentpressRESEARCHPANICDOsecond
Extracting unicharset from plain text file data/TESS/all-gt
Other case j of J is not in unicharset
Other case Q of q is not in unicharset
Wrote unicharset file data/TESS/unicharset
make: *** No rule to make target 'data/TESS-ground-truth/22.lstmf', needed by 'data/TESS/all-lstmf'.  Stop.

from tesstrain.

PaulVipond commented on May 29, 2024

Thought I'd leave a comment. I was getting a similar error to the above, i.e.
make: *** No rule to make target 'data/TESS-ground-truth/22.lstmf', needed by 'data/TESS/all-lstmf'. Stop.
It happened for me when the names of the *.gt.txt files also included the image's file extension.
WRONG: /images/example01.png.gt.txt
RIGHT: /images/example01.gt.txt
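
The naming problem above can be checked before running make. The following is a minimal sketch, not part of tesstrain: the function name find_misnamed_gt and the list of image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Image extensions to look for. This list is an assumption; extend it as needed.
IMAGE_EXTENSIONS = (".png", ".tif", ".tiff", ".jpg", ".jpeg")


def find_misnamed_gt(root):
    """Return .gt.txt files whose name still carries an image extension."""
    misnamed = []
    for gt in sorted(Path(root).rglob("*.gt.txt")):
        # Strip the ".gt.txt" suffix; the WRONG case leaves "example01.png".
        base = gt.name[: -len(".gt.txt")]
        if base.lower().endswith(IMAGE_EXTENSIONS):
            misnamed.append(gt)
    return misnamed
```

Calling find_misnamed_gt("data") lists every ground-truth file that would need renaming from the WRONG pattern to the RIGHT one.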


kba commented on May 29, 2024

Please open new issues instead of asking in closed ones.

@TheSYNcoder The problem is that box file generation failed for 22.gt.txt, so the corresponding lstmf file could not be generated either.
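
To see which ground-truth files are affected, one option is to list every *.gt.txt that has no .box or .lstmf file next to it. This is a rough sketch under the assumption, taken from the logs in this thread, that those files are written beside the ground truth with the same base name. The function name missing_outputs is hypothetical.

```python
from pathlib import Path


def missing_outputs(gt_dir, suffixes=(".box", ".lstmf")):
    """Map each .gt.txt file name to the expected sibling suffixes that are absent."""
    missing = {}
    for gt in sorted(Path(gt_dir).glob("*.gt.txt")):
        base = gt.name[: -len(".gt.txt")]  # "22" for "22.gt.txt"
        absent = [s for s in suffixes if not (gt.parent / (base + s)).exists()]
        if absent:
            missing[gt.name] = absent
    return missing
```

For the case described above, missing_outputs("data/TESS-ground-truth") would be expected to report 22.gt.txt.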


kba commented on May 29, 2024

Where does JOB#4648 come from? Can you post the output of

find data

to ensure it's not because of missing files.

What's the output of make --version and uname -a?


varunsab commented on May 29, 2024

I don't know where this JOB#4686 comes from.
The output of find data lists the entire dataset, as follows:
data
data/train
data/train/3out_17_0_2106_2.tif
data/train/6out_13_0_1436_2.tif
data/train/2out_8_0_8843_2.gt.txt
... etc

output for make --version:
GNU Make 4.1
Built for x86_64-pc-linux-gnu

output for uname -a:
Linux varun 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


varunsab commented on May 29, 2024

When I tried with a small dataset of around 100 images, training proceeded as far as creating the box files but then ended with an error saying it cannot read lstm.train.

The following is the output of make training MODEL_NAME=ocr_model:

python generate_line_box.py -i "data/train/1out_724-60149412-page-1_2.tif" -t "data/train/1out_724-60149412-page-1_2.gt.txt" > "data/train/1out_724-60149412-page-1_2.box"
python generate_line_box.py -i "data/train/1out_2.tif" -t "data/train/1out_2.gt.txt" > "data/train/1out_2.box"
python generate_line_box.py -i "data/train/1out_10_0_6340_2.tif" -t "data/train/1out_10_0_6340_2.gt.txt" > "data/train/1out_10_0_6340_2.box"
...
python generate_line_box.py -i "data/train/2out_9_0_9399_2.tif" -t "data/train/2out_9_0_9399_2.gt.txt" > "data/train/2out_9_0_9399_2.box"
python generate_line_box.py -i "data/train/2out_9_0_9925_2.tif" -t "data/train/2out_9_0_9925_2.gt.txt" > "data/train/2out_9_0_9925_2.box"
find data/train -name '*.box' -exec cat {} \; > "data/all-boxes"
unicharset_extractor --output_unicharset "data/unicharset" --norm_mode 1 "data/all-boxes"
Extracting unicharset from box file data/all-boxes
Other case f of F is not in unicharset
Other case x of X is not in unicharset
Other case v of V is not in unicharset
Other case q of Q is not in unicharset
Other case k of K is not in unicharset
Other case w of W is not in unicharset
Other case j of J is not in unicharset
Wrote unicharset file data/unicharset
tesseract data/train/1out_724-60149412-page-1_2.tif data/train/1out_724-60149412-page-1_2 --psm 6 lstm.train
read_params_file: Can't open lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.3 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/1out_2.tif data/train/1out_2 --psm 6 lstm.train
read_params_file: Can't open lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.3 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/1out_10_0_6340_2.tif data/train/1out_10_0_6340_2 --psm 6 lstm.train
...
read_params_file: Can't open lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.3 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/2out_9_0_9925_2.tif data/train/2out_9_0_9925_2 --psm 6 lstm.train
read_params_file: Can't open lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.3 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.

find data/train -name '*.lstmf' -exec echo {} \; | sort -R -o "data/all-lstmf"
total=`cat data/all-lstmf | wc -l` \
   no=`echo "$total * 0.90 / 1" | bc`; \
   head -n "$no" data/all-lstmf > "data/list.train"
total=`cat data/all-lstmf | wc -l` \
   no=`echo "($total - $total * 0.90) / 1" | bc`; \
   tail -n "$no" data/all-lstmf > "data/list.eval"
combine_lang_model \
  --input_unicharset data/unicharset \
  --script_dir /home/OCR/ocrd-train-master/langdata-master \
  --output_dir data/ \
  --lang ocr_model
Loaded unicharset of size 59 from file data/unicharset
Setting unichar properties
Other case f of F is not in unicharset
Other case x of X is not in unicharset
Other case v of V is not in unicharset
Other case q of Q is not in unicharset
Other case k of K is not in unicharset
Other case w of W is not in unicharset
Other case j of J is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/OCR/ocrd-train-master/langdata-master/ocr_model/ocr_model.config
Null char=2
mkdir -p data/checkpoints
lstmtraining \
  --traineddata data/ocr_model/ocr_model.traineddata \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
  --model_output data/checkpoints/ocr_model \
  --learning_rate 20e-4 \
  --train_listfile data/list.train \
  --eval_listfile data/list.eval \
  --max_iterations 10000
Failed to load list of training filenames from data/list.train
Makefile:129: recipe for target 'data/checkpoints/ocr_model_checkpoint' failed
make: *** [data/checkpoints/ocr_model_checkpoint] Error 1

Also attaching the files generated:
generated_files.tar.gz
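
For readers following the log: the two shell snippets that write data/list.train and data/list.eval take the first 90% and the last 10% of data/all-lstmf, truncating to whole lines. The sketch below mirrors that arithmetic in Python. The function names are hypothetical, and the reading that an empty data/all-lstmf is what leads to the "Failed to load list of training filenames" message is an inference from the log, not something confirmed in the thread.

```python
def split_counts(total, ratio_percent=90):
    """Mirror the bc arithmetic in the log: truncate both parts to whole lines."""
    train = total * ratio_percent // 100
    evaluation = total * (100 - ratio_percent) // 100
    return train, evaluation


def split_list(lines, ratio_percent=90):
    """Take the head for training and the tail for evaluation, like head/tail."""
    train_n, eval_n = split_counts(len(lines), ratio_percent)
    train = lines[:train_n]                    # head -n "$no"
    evaluation = lines[len(lines) - eval_n:]   # tail -n "$no"
    return train, evaluation
```

With no .lstmf files at all, split_list([]) returns two empty lists, so both list files would end up empty.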


wrznr commented on May 29, 2024

@varunsab We are currently investigating the problem and will get back to you next week.


kba commented on May 29, 2024

read_params_file: Can't open lstm.train

How did you set up tesseract? Is lstm.train in tessdata/configs?
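
A quick way to answer that question is to look for the config file under the directory Tesseract uses for its data. This is a minimal sketch with hypothetical function names. It assumes the directory is the one named by the TESSDATA_PREFIX environment variable, and because that variable may point either at the tessdata directory or at its parent depending on the Tesseract version, it tries both layouts.

```python
from pathlib import Path


def candidate_config_paths(tessdata_prefix, config_name="lstm.train"):
    """Return the places where the config might live under the given prefix."""
    prefix = Path(tessdata_prefix)
    return [
        prefix / "configs" / config_name,               # prefix is tessdata itself
        prefix / "tessdata" / "configs" / config_name,  # prefix is the parent
    ]


def find_config(tessdata_prefix, config_name="lstm.train"):
    """Return the first existing candidate path, or None if none exists."""
    for path in candidate_config_paths(tessdata_prefix, config_name):
        if path.is_file():
            return path
    return None
```

For example, find_config(os.environ["TESSDATA_PREFIX"]) (after import os) returns None when lstm.train is in neither location.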


varunsab commented on May 29, 2024

I uninstalled my existing Tesseract 4.00 and installed using:
make leptonica tesseract langdata
I then downloaded eng.traineddata into the tessdata folder from

https://github.com/tesseract-ocr/tessdata

Yes, lstm.train exists in tessdata/configs, but training still fails.
When I moved the lstm.train file into the folder containing the Makefile, I was able to train with 100 samples.

But when I tried training with the entire dataset, the earlier error appeared again:
make: *** No rule to make target 'JOB#4686', needed by 'data/all-boxes'. Stop.


varunsab commented on May 29, 2024

It turned out that JOB#4686 was part of an image's file name. After removing the image that caused the JOB#4686 error, I was able to run training on the entire dataset.
However, training with 5000 images and 10,000 iterations gave an error rate of 100.
So I switched to fine-tuning, which gave good results.

Thank you so much for your support.
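
The JOB#4686 failure fits how make reads file names: '#' starts a comment in a makefile and whitespace separates names, so a file called JOB#4686... is cut off at the '#'. The sketch below scans a dataset for such names before training. The function name and the extra characters beyond '#' and whitespace are assumptions chosen conservatively for illustration.

```python
import re
from pathlib import Path

# '#' starts a comment in a makefile and whitespace separates file names.
# The remaining characters are an assumed, conservative extension of that set.
UNSAFE_FOR_MAKE = re.compile(r"[#\s:$%;=]")


def unsafe_names(root):
    """Return files under root whose names contain characters make may misread."""
    return [
        p
        for p in sorted(Path(root).rglob("*"))
        if p.is_file() and UNSAFE_FOR_MAKE.search(p.name)
    ]
```

Running unsafe_names("data") and renaming or removing whatever it reports avoids having to discover the offending files one make error at a time.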


wrznr commented on May 29, 2024

@varunsab Glad to hear it. It would be great if you could give us some insight into your fine-tuning steps. Is this something that could be added to ocrd-train?


sumanth-kalluri commented on May 29, 2024

Can someone please help me with the process of fine-tuning on our own dataset for the English language?


artisvirat commented on May 29, 2024

@varunsab Hey, I am trying to do the same thing for English. But even after fine-tuning, I get char error=100 at the end of training. Can somebody explain exactly how to do fine-tuning? When I compare the traineddata file of my model with eng.traineddata, there is a huge difference in size (eng.traineddata is much larger than my model's traineddata). Shouldn't they be almost the same size?


kaitoqueiroz commented on May 29, 2024

@TheSYNcoder Did you manage to solve this issue? I'm getting the same error.


snapcart-ruben commented on May 29, 2024

I got the same error. I deleted the specific entry named in the error message, and the run continued.

