
Comments (6)

dinosauria123 commented on August 27, 2024

Thank you for your comment.

This will help me fix my gcv2hocr.

http://stackoverflow.com/questions/12524908/how-to-escape-in-xml
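For reference, here is a minimal shell sketch of the escaping that answer describes. It is not gcv2hocr's actual code, and `escape_xml` is a hypothetical helper name. The one ordering rule that matters is that `&` must be replaced first; otherwise the `&` introduced by `&lt;` and the other entities would be escaped a second time.

```sh
#!/bin/sh
# Escape the five XML special characters in a string (hypothetical helper).
# "&" is replaced first so the entities added by later rules are not re-escaped.
escape_xml() {
    printf '%s' "$1" | sed \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&apos;/g"
}
```

For example, `escape_xml 'Canon & Nikon <1928>'` prints `Canon &amp; Nikon &lt;1928&gt;`.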

I will check the PDF output once the hOCR output is fixed.

I scanned an old Japanese camera book and will convert it to a searchable PDF file.
Its 70 JPEG files may help check whether our code works well.

from hocr-tools.

dinosauria123 commented on August 27, 2024

I have committed a fix for the XML escape-character problem to my gcv2hocr on GitHub.

Now hocr-pdf works fine, and I can make a searchable PDF file from the 70 JPEG images of the old Japanese camera book.

I made a simple shell script that extracts text from the images with Google Cloud Vision OCR and makes a searchable PDF from them:

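#!/bin/sh
# Assumes pages are named scan001.jpg, scan002.jpg, ... (zero-padded to 3 digits)
a=1
while [ "$a" -le 100 ]
do
    n=$(printf '%03d' "$a")          # 1 -> 001, 12 -> 012
    if [ -f "scan$n.jpg" ]; then     # skip page numbers that do not exist
        sh ./gcvocr.sh "scan$n.jpg" "Your API Code"
        echo "Google OCR scan$n.jpg"
        gcv2hocr "scan$n.jpg.json" "scan$n.hocr"
        echo "convert scan$n.hocr"
    fi
    a=$((a + 1))                     # was "a=expr $a + 1", which never increments
done
python hocr-pdf . > output.pdf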
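Before running hocr-pdf, it may be worth checking that every page actually produced an hOCR file. A small sketch, assuming (as hocr-pdf appears to do) that each `NAME.jpg` is paired with `NAME.hocr` in the same directory; `missing_hocr` is a hypothetical helper, not part of hocr-tools:

```sh
#!/bin/sh
# List every JPEG in a directory that has no matching .hocr file next to it.
# Assumption: hocr-pdf pairs NAME.jpg with NAME.hocr in the same directory.
# Returns 1 if anything is missing, 0 otherwise.
missing_hocr() {
    dir=${1:-.}
    status=0
    for img in "$dir"/*.jpg; do
        [ -e "$img" ] || continue    # no JPEGs at all: the glob stays literal
        hocr="${img%.jpg}.hocr"      # scan001.jpg -> scan001.hocr
        if [ ! -f "$hocr" ]; then
            echo "missing: $hocr"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `missing_hocr . && python hocr-pdf . > output.pdf` only builds the PDF when no page is missing.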

If you are interested in my output PDF, you can download it here (about 40 MB):
https://t.co/BUjCMgVnyr

The book was written in 1928 and is in the public domain in Japan.

dinosauria123 commented on August 27, 2024

Sorry, I found a problem in my hOCR file.

An unescaped "&" character remains in the output scan.003.hocr at line 469.

I will fix my gcv2hocr.
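One rough way to find leftovers like this before hocr-pdf trips over them is to strip every well-formed reference (`&name;`, `&#38;`, `&#x26;`) and report any `&` that remains. This is a heuristic sketch: it does not check that a named entity is actually defined, and `find_bare_ampersands` is a hypothetical helper:

```sh
#!/bin/sh
# Print FILE:LINE:TEXT for every "&" that does not start a well-formed
# entity or character reference (hypothetical helper, heuristic only).
find_bare_ampersands() {
    for f in "$@"; do
        # Delete valid references; substitutions keep the line count intact,
        # so grep -n still reports the original line numbers.
        sed -e 's/&[A-Za-z][A-Za-z0-9]*;//g' \
            -e 's/&#[0-9][0-9]*;//g' \
            -e 's/&#x[0-9A-Fa-f][0-9A-Fa-f]*;//g' "$f" \
            | grep -n '&' \
            | while IFS= read -r line; do
                printf '%s:%s\n' "$f" "$line"
            done
    done
}
```

Usage: `find_bare_ampersands *.hocr` prints nothing when every `&` is properly escaped.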

zuphilip commented on August 27, 2024

Okay, good to hear that you found the error and can fix it.

Can you confirm that the fixed file runs through hocr-pdf?

zuphilip commented on August 27, 2024

I am closing this issue now. @dinosauria123, thank you also for sharing your use case in more detail. I appreciate seeing our work reused here.

dinosauria123 commented on August 27, 2024

Thank you for your kind help!