
gosseract's Introduction

gosseract OCR

License: MIT

Go OCR package built on the Tesseract C++ library.

OCR Server

Do you just want an OCR server, or a working example of this package? There is a ready-made server application that is easy to deploy:

👉 https://github.com/otiai10/ocrserver

Example

package main

import (
	"fmt"
	"github.com/otiai10/gosseract/v2"
)

func main() {
	client := gosseract.NewClient()
	defer client.Close()
	client.SetImage("path/to/image.png")
	text, _ := client.Text()
	fmt.Println(text)
	// Hello, World!
}

Installation

  1. tesseract-ocr, including library and headers
  2. go get -t github.com/otiai10/gosseract/v2

See the Dockerfile for a step-by-step setup. If you want a ready-made environment right away, run docker run -it --rm otiai10/gosseract.

Test

If tesseract-ocr is installed locally, just run

% go test .

If you do not want to install tesseract-ocr locally, run ./test/runtime, which uses Docker and Vagrant to test the source code on several runtimes.

% ./test/runtime --driver docker
% ./test/runtime --driver vagrant

Check ./test/runtimes for more information about runtime tests.

Issues

gosseract's People

Contributors

ansonl, arthurhenrique, awskii, bake, dependabot[bot], dmorawetz, emil2k, esiqveland, guitarbum722, hfoxy, huehnerhose, jayxon, jxsl13, khanbalarashidov, kt3k, moolen, muesli, otiai10, pide2000, pukoren, shogg-isentia, thewhitetulip, tomnomnom, ttacon, will7200, willdurand, yin1999, zimmski

gosseract's Issues

no buildable Go source files

../../vendor/github.com/otiai10/gosseract/goss.go:3:8: no buildable Go source files in /go/src/vendor/github.com/otiai10/gosseract/tesseract

go build:
CGO_ENABLED=0 GOOS=windows GOARCH=amd64 go build app/server
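
A likely explanation for the message above: files that import "C" are only buildable when cgo is enabled, so with CGO_ENABLED=0 a package made up entirely of cgo files has nothing left to compile. The sketch below reproduces that behaviour with the standard go/build package on a throwaway one-file package. The package name demo and the helper importWithCgo are made up for illustration and are not part of gosseract.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"path/filepath"
)

// importWithCgo writes a one-file cgo package into a temporary directory and
// asks go/build to load it with cgo switched on or off. It returns the error
// reported by go/build, or nil if the package is buildable.
func importWithCgo(cgoEnabled bool) error {
	dir, err := os.MkdirTemp("", "cgodemo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	// A file that imports "C" is a cgo file (hypothetical stand-in for a wrapper package).
	src := "package demo\n\nimport \"C\"\n"
	if err := os.WriteFile(filepath.Join(dir, "demo.go"), []byte(src), 0o644); err != nil {
		return err
	}

	ctx := build.Default
	ctx.CgoEnabled = cgoEnabled // mirrors CGO_ENABLED=1 or CGO_ENABLED=0
	_, err = ctx.ImportDir(dir, 0)
	return err
}

func main() {
	fmt.Println("cgo enabled: ", importWithCgo(true))
	fmt.Println("cgo disabled:", importWithCgo(false))
}
```

Cross-compiling also disables cgo by default, so a working build would presumably need CGO_ENABLED=1 together with a C/C++ cross toolchain and the Tesseract libraries for the target platform.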

generating temp files

// Generates tmp filepath
func genTmpFilePath() string {
    id, _ := uuid.NewV4()
    return TMPDIR + "/" + id.String()
}

ioutil.TempFile already provides this:

TempFile creates a new temporary file in the directory dir with a name beginning with prefix, opens the file for reading and writing, and returns the resulting *os.File. If dir is the empty string, TempFile uses the default directory for temporary files (see os.TempDir). Multiple programs calling TempFile simultaneously will not choose the same file. The caller can use f.Name() to find the pathname of the file. It is the caller's responsibility to remove the file when no longer needed.
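
As a rough sketch of the suggestion, the helper below lets the standard library pick the unique name instead of building one from a UUID. It uses os.CreateTemp, which ioutil.TempFile forwards to since Go 1.16. The function name newTempFile and the prefix "gosseract-" are assumptions for illustration, not code from the package.

```go
package main

import (
	"fmt"
	"os"
)

// newTempFile creates a uniquely named empty file in dir (the system temp
// directory if dir is "") and returns its path plus a cleanup function.
func newTempFile(dir string) (path string, cleanup func(), err error) {
	f, err := os.CreateTemp(dir, "gosseract-") // hypothetical prefix
	if err != nil {
		return "", nil, err
	}
	path = f.Name()
	if err := f.Close(); err != nil {
		os.Remove(path)
		return "", nil, err
	}
	// The caller is responsible for removing the file when done.
	return path, func() { os.Remove(path) }, nil
}

func main() {
	path, cleanup, err := newTempFile("")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer cleanup()
	fmt.Println("temporary file:", path)
}
```

Unlike genTmpFilePath, this creates the file immediately, so two processes cannot end up with the same path.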

Support Tesseract3.03~

As is (almost vacant)

package gosseract

type tesseract0303 struct {
    version string
}

func (t tesseract0303) Version() string {
    return t.version
}
func (t tesseract0303) Execute(args []string) (res string, e error) {
    res = "tesseract0303"
    return
}

tesseract/baseapi.h: No such file or directory

I'd like to use tesseract with go on Windows 7.

During the installation process, as stated in the docs I execute

c:\go\src\proj>go get github.com/otiai10/gosseract
# github.com/otiai10/gosseract/tesseract
C:\go\src\github.com\otiai10\gosseract\tesseract\tess.cpp:1:31: fatal error: tesseract/baseapi.h: No such file or directory
 #include <tesseract/baseapi.h>
                               ^
compilation terminated.

I searched the file system for the header file baseapi.h but could not find it.

How can I solve this? Thank you

'tesseract/baseapi.h' file not found

% go test ./...
# github.com/otiai10/gosseract/tesseract
tesseract/tess.cpp:1:10: fatal error: 'tesseract/baseapi.h' file not found
FAIL    github.com/otiai10/gosseract [build failed]

PSM Support?

Is there a way to edit the PSM argument that gets passed into Tesseract?

Translate comments and issues to English

Please translate all comments and issues to English so that developers who do not speak the original language (I do not even know which language it is 👽) can help with the TODO.

But actual `No tesseract version is found, supporting 3.02~, 3.03~, 3.04~ and 3.05~`

[root@localhost gosseract]# go test
/tmp/go-build841552903/gosseract/_test/gosseract.test: error while loading shared libraries: liblept.so.5: cannot open shared object file: No such file or directory
exit status 127
FAIL gosseract 0.001s
[root@localhost gosseract]# ls /usr/local/lib/
codecs/ liblept.so.5.0.1 libtesseract.so.4
liblept.a libpython3.6m.a libtesseract.so.4.0.0
liblept.la libtesseract.a pkgconfig/
liblept.so libtesseract.la python3.6/
liblept.so.5 libtesseract.so
[root@localhost gosseract]# ls /usr/local/lib/libtesseract.
libtesseract.a libtesseract.so libtesseract.so.4.0.0
libtesseract.la libtesseract.so.4
[root@localhost gosseract]# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
[root@localhost gosseract]# go test
Info in bmfCreate: Generating pixa of bitmap fonts from string
Warning. Invalid resolution 0 dpi. Using 70 instead.
Info in bmfCreate: Generating pixa of bitmap fonts from string
Warning. Invalid resolution 0 dpi. Using 70 instead.
Info in bmfCreate: Generating pixa of bitmap fonts from string
Warning. Invalid resolution 0 dpi. Using 70 instead.
all_test.go at line 37
Expected to be 42
But actual 03:41:26
--- FAIL: Test_Must_WithDigest (0.42s)
all_test.go at line 42
Expected to be <nil>
But actual No tesseract version is found, supporting 3.02~, 3.03~, 3.04~ and 3.05~
--- FAIL: Test_NewClient (0.01s)
--- FAIL: TestClient_Src (0.01s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x520c7d]

goroutine 13 [running]:
testing.tRunner.func1(0xc420069040)
/usr/local/go/src/testing/testing.go:622 +0x29d
panic(0x54a340, 0x8119c0)
/usr/local/go/src/runtime/panic.go:489 +0x2cf
gosseract.TestClient_Src(0xc420069040)
/data/software/src/gosseract/all_test.go:55 +0x3d
testing.tRunner(0xc420069040, 0x57e468)
/usr/local/go/src/testing/testing.go:657 +0x96
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:697 +0x2ca
exit status 2
FAIL gosseract 0.707s

command `goss`

.
├── cmd
│   └── gosseract
% gosseract target.png
abcABC

Testing on Mac OSX

I have installed everything, but when I build the Go file from the README example I get this error:

ld: warning: ignoring file /usr/local/lib/liblept.dylib, file was built for x86_64 which is not the architecture being linked (i386): /usr/local/lib/liblept.dylib
ld: warning: ignoring file /usr/local/lib/libtesseract.dylib, file was built for x86_64 which is not the architecture being linked (i386): /usr/local/lib/libtesseract.dylib

What could be causing this?
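
One plausible reading of the warning is that the Go toolchain targets 32-bit (386) while the installed libraries are 64-bit. The sketch below prints the architecture the program was built for and compares it with the x86_64 named in the warning. The helper linkerArch is hypothetical and only covers the two architectures mentioned here.

```go
package main

import (
	"fmt"
	"runtime"
)

// linkerArch maps a GOARCH value to the architecture name that appears in
// the linker warning. Values other than 386 and amd64 are returned unchanged.
func linkerArch(goarch string) string {
	switch goarch {
	case "386":
		return "i386"
	case "amd64":
		return "x86_64"
	default:
		return goarch
	}
}

func main() {
	const libArch = "x86_64" // architecture of the dylibs, taken from the warning
	target := linkerArch(runtime.GOARCH)
	if target != libArch {
		fmt.Printf("mismatch: Go targets %s but the libraries are %s\n", target, libArch)
		return
	}
	fmt.Printf("Go target %s matches the libraries\n", target)
}
```

If a mismatch is reported, installing a 64-bit Go toolchain or building with GOARCH=amd64 may be worth trying.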

Can't build with MinGW

I cannot build with go build, nor can I go get github.com/otiai10/gosseract.
I get this error:
go build github.com/otiai10/gosseract/tesseract: C:\go\pkg\tool\windows_386\cgo.exe: exit status 2

I installed CGO and tried restarting my PC, but nothing works.

tesseract/baseapi.h: No such file or directory (Debian)

tesseract/baseapi.h: No such file or directory

% go test ./...
# github.com/otiai10/gosseract/tesseract
tesseract/tess.cpp:1:31: fatal error: tesseract/baseapi.h: No such file or directory
compilation terminated.
FAIL    github.com/otiai10/gosseract [build failed]

Windows 10 can't run

My system is 64-bit Windows 10, and my gcc comes from https://github.com/go-vgo/Mingw.
At first the error was: 'tesseract/baseapi.h' file not found.
So I downloaded the libraries:

  1. tesseract-3.02.02-win32-lib-include-dirs.zip from https://sourceforge.net/projects/tesseract-ocr-alt/files/
  2. leptonica-1.68-win32-lib-include-dirs.zip from http://www.leptonica.com/download.html

and put their contents in the MinGW include and lib directories.

That error disappeared, but now there is a new one:
d:/git/mingw/bin/../lib/gcc/x86_64-w64-mingw32/4.8.2/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -llept
d:/git/mingw/bin/../lib/gcc/x86_64-w64-mingw32/4.8.2/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -ltesseract
collect2.exe: error: ld returned 1 exit status

I have spent two days on this and do not know what else to try. I hope this can be resolved. Thank you!

Can't build with 4.00.00dev

go version go1.9.1 linux/amd64

# github.com/otiai10/gosseract/tesseract
In file included from /usr/include/tesseract/ltrresultiterator.h:26:0,
                 from /usr/include/tesseract/resultiterator.h:26,
                 from /usr/include/tesseract/baseapi.h:31,
                 from tess.cpp:1:
/usr/include/tesseract/unichar.h:164:10: error: 'string' does not name a type; did you mean 'stdin'?
   static string UTF32ToUTF8(const std::vector<char32>& str32);
          ^~~~~~
          stdin

Change `Must` interface

txt := goss.Must(goss.Params{
    Src:    "./source.png",
    Digest: "/Users/otiai10/digest.txt",
})

test errors: Error opening data file and can't find language

I tried to run the first test, but it failed with the following errors:

[wigywizzle@wigywizzle gosseract]$ go test ./...
Error opening data file /usr/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
FAIL    github.com/otiai10/gosseract    0.006s
?       github.com/otiai10/gosseract/tesseract  [no test files]
Error opening data file /usr/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
FAIL    github.com/otiai10/gosseract/tesseract/test     0.006s
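
The log says Tesseract looks for eng.traineddata under a tessdata directory inside TESSDATA_PREFIX. A small pre-flight check along those lines is sketched below. The helper names traineddataPath and checkTessdata and the fallback prefix /usr/share (the parent of the directory in the log) are assumptions, and the layout follows the message in this log rather than any particular Tesseract version.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// traineddataPath returns where the log above expects the language file:
// <prefix>/tessdata/<lang>.traineddata.
func traineddataPath(prefix, lang string) string {
	return filepath.Join(prefix, "tessdata", lang+".traineddata")
}

// checkTessdata reports an error if the language file cannot be found.
func checkTessdata(prefix, lang string) error {
	path := traineddataPath(prefix, lang)
	if _, err := os.Stat(path); err != nil {
		return fmt.Errorf("language data for %q not usable: %w", lang, err)
	}
	return nil
}

func main() {
	prefix := os.Getenv("TESSDATA_PREFIX")
	if prefix == "" {
		prefix = "/usr/share" // assumption: parent of the tessdata directory in the log
	}
	if err := checkTessdata(prefix, "eng"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("found", traineddataPath(prefix, "eng"))
}
```

If the check fails, installing the English language data or pointing TESSDATA_PREFIX at the right directory is the usual next step.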

support jpeg?

When I use the following code to OCR a JPEG image:

img_url := "http://cityjw.dlut.edu.cn:7001/ACTIONVALIDATERANDOMPICTURE.APPPROCESS"
resp, err := client.Get(img_url)
if err != nil {
// handle error
}
defer resp.Body.Close()
OcrClient, _ := gosseract.NewClient()
img, _ := jpeg.Decode(resp.Body)
out, _ := OcrClient.Image(img).Out()
fmt.Println(out)

I get this error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x43926a]

goroutine 1 [running]:
runtime.panic(0x664a20, 0x965b48)
/usr/lib/go/src/pkg/runtime/panic.c:266 +0xb6
github.com/otiai10/gosseract.(*Client).Image(0x0, 0x7f008865e7a0, 0xc210059480, 0x0)
/home/halfcrazy/gocode/src/github.com/otiai10/gosseract/client.go:58 +0x13a
main.main()
/home/halfcrazy/gocode/src/school_helper/main.go:29 +0x204
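
In the trace, the first argument to (*Client).Image is 0x0, which suggests the client itself was nil, consistent with the error from gosseract.NewClient() having been discarded. The sketch below shows the general pattern of checking every error before using the value it came with, using only the standard library JPEG decoder. The helpers sampleJPEG and decodeJPEG are hypothetical, and the gosseract calls are left out on purpose.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
)

// sampleJPEG encodes a tiny blank image so the example needs no network access.
func sampleJPEG() ([]byte, error) {
	var buf bytes.Buffer
	img := image.NewRGBA(image.Rect(0, 0, 2, 2))
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeJPEG decodes data and returns an error rather than a nil image
// that would only fail later.
func decodeJPEG(data []byte) (image.Image, error) {
	img, err := jpeg.Decode(bytes.NewReader(data))
	if err != nil {
		return nil, fmt.Errorf("decode jpeg: %w", err)
	}
	return img, nil
}

func main() {
	data, err := sampleJPEG()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	img, err := decodeJPEG(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("decoded image bounds:", img.Bounds())

	// Input that is not a JPEG yields an error instead of a panic.
	if _, err := decodeJPEG([]byte("not a jpeg")); err != nil {
		fmt.Println("rejected bad input:", err)
	}
}
```

Applying the same check to the value returned alongside the client would surface the real failure, which may turn out to be unrelated to JPEG support.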

go get error

I want to install the package on my Mac, but I get the error below. I looked for this file in the project and could not find it, so I think it is missing.

go get github.com/otiai10/gosseract
# github.com/otiai10/gosseract/tesseract
../../otiai10/gosseract/tesseract/tess.cpp:5:10: fatal error: 'tesseract/baseapi.h' file not found

Possible to add support for tessedit_write_images config variable?

Hi,

I was reading through the Tesseract docs here on improving the OCR output quality. It mentions that setting the tessedit_write_images config variable allows a user to view the input file after initial processing by Tesseract.

Would it be possible to add this feature to the wrapper? It seems that a combination of api->SetVariable("tessedit_write_images", writeimages); and api->GetThresholdedImage() in tess.cpp would allow saving the .tif file. I attempted this in a branch but I don't have much C++ experience and it didn't seem to work...

Thank you

Undefined reference to simple.

go get github.com/otiai10/gosseract

/tmp/go-build043179523/github.com/otiai10/gosseract/tesseract/_obj/wrapper.cgo2.o: In function `_cgo_f34bd392845b_Cfunc_simple':
/usr/share/go/src/pkg/github.com/otiai10/gosseract/tesseract/wrapper.go:35: undefined reference to `simple'
collect2: ld returned 1 exit status

The build failed during tests because of this error too.

api->Version()

api, _ := gosseract.API()
ver := api.Version()
fmt.Println(ver)
// 3.05.00

go build error on linux

When I build with gox -osarch="darwin/amd64", it shows:

1 errors occurred:
--> darwin/amd64 error: exit status 1
Stderr: ../vendor/github.com/otiai10/gosseract/goss.go:3:8: no buildable Go source files in /home/viggo/Documents/GoPath/src/vendor/github.com/otiai10/gosseract/tesseract

But it works when I use gox -osarch="linux/amd64".

My system is Debian 8.
