Comments (11)

zuphilip commented on August 27, 2024

are you able to generate searchable pdf ?

Yes, I see a searchable PDF, but I am working on Linux.

In the Windows terminal the encoding can be a problem. You can check which encoding Python uses there by starting python and typing

>>> import sys
>>> sys.stdout.encoding

If that is not UTF-8, then you can try to run the command with PYTHONIOENCODING=UTF-8 in front, i.e.

PYTHONIOENCODING=UTF-8 hocr-pdf . > out.pdf
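
To see what the variable changes, here is a minimal sketch that starts a child Python process with and without PYTHONIOENCODING and reports its sys.stdout.encoding. The helper name child_stdout_encoding is made up for this illustration and is not part of hocr-tools. Note that the VAR=value prefix shown above is POSIX shell syntax (it works in git bash); in cmd.exe the variable would have to be set first with set PYTHONIOENCODING=UTF-8.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Return sys.stdout.encoding as reported by a child Python process.

    Hypothetical helper for illustration only. If io_encoding is given,
    it is passed to the child through the PYTHONIOENCODING variable.
    """
    env = dict(os.environ)
    # Start from a clean slate so an inherited value does not hide the default.
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,  # piped output, similar to "hocr-pdf . > out.pdf"
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print("default :", child_stdout_encoding())
    print("override:", child_stdout_encoding("UTF-8"))
```

If the default line shows something other than UTF-8 (for example a Windows code page), the override is worth trying for the hocr-pdf call.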

I got a PDF as output, but it was just a normal PDF, i.e. not searchable.

This is with Git Bash on Windows, right? Can you upload your result here?

from hocr-tools.

stweil commented on August 27, 2024

Can you provide a hOCR file which causes this error? How did you create it?

shekarnode commented on August 27, 2024

I used Tesseract 4.0.0 to generate the hOCR:
Hocr File

This is the image for the hOCR generated above:
e3_out

shekarnode commented on August 27, 2024

Is there any other solution for getting a table from hOCR data?

zuphilip commented on August 27, 2024

This works for me as well after I renamed the image and converted it to a JPG file.

  1. Do you have the jpg file also in your directory?
  2. What is your environment? Linux or Windows?
  3. What Python version do you use? python -V
  4. What is the encoding of your bash which Python uses?

shekarnode commented on August 27, 2024

@zuphilip

  1. I was using a PNG image for the conversion; I have now replaced it with a JPG.
  2. Environment - Windows
  3. Python 3.6.4
  4. I was using cmd to get the output. I also tried Git Bash: I got a PDF as output, but it was just a normal PDF, i.e. not searchable.

Are you able to generate a searchable PDF?

amitdo commented on August 27, 2024

Tesseract has an option to output PDF. Did you try it?
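
As a rough sketch of that option: the Tesseract command line takes an image, an output base name, and then config names, and pdf is one of those configs. The Python wrapper below only builds and runs that command. The function names and the file names page.png and out are placeholders invented for this example, not something from this thread.

```python
import shutil
import subprocess


def tesseract_pdf_command(image_path, output_base):
    """Build the argument list for `tesseract <image> <outputbase> pdf`.

    Hypothetical helper; Tesseract itself appends ".pdf" to output_base.
    """
    return ["tesseract", image_path, output_base, "pdf"]


def make_searchable_pdf(image_path, output_base):
    """Run Tesseract with the pdf config and return the expected output path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_pdf_command(image_path, output_base), check=True)
    return output_base + ".pdf"


if __name__ == "__main__":
    # Placeholder file names; only the command is printed here.
    print(" ".join(tesseract_pdf_command("page.png", "out")))
```

Calling make_searchable_pdf("page.png", "out") would, assuming Tesseract is installed and the image exists, leave the result in out.pdf.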

shekarnode commented on August 27, 2024

@zuphilip
out.pdf
This is the PDF file being generated.

@amitdo
I have also tried generating a searchable PDF with Tesseract,
using the commands provided over here.
The output is still not in searchable format; it is just a simple PDF with the image.

zuphilip commented on August 27, 2024

@shekarnode There is text in your generated PDF and I can search for text as well.

shekarnode commented on August 27, 2024

I was using Adobe Reader and was never able to search there. Now that I have opened the PDF in a browser, I found out it is searchable.

Thanks @zuphilip for helping out.

amitdo commented on August 27, 2024

The pdf produced by Tesseract is also searchable.
