Comments (18)

kevincon commented on July 25, 2024

Which version of the library are you using? If you're using the 2.2 CocoaPod, there is a known memory leak in that version, but it has since been fixed. The CocoaPod has just not been updated, for some unknown reason (see #49). You can use the latest version of the repo via CocoaPods by using this line in your Podfile:

pod 'TesseractOCRiOS', :git => 'https://github.com/gali8/Tesseract-OCR-iOS.git'

from tesseract-ocr-ios.

lbwxly commented on July 25, 2024

I'm using the latest version, 3.03.

kevincon commented on July 25, 2024

Can you post a code snippet or link to your code so I can try reproducing the high memory usage?

lbwxly commented on July 25, 2024

I downloaded the source code as a ZIP from GitHub, then copied the TesseractOCR.framework from the product folder into my project.

lbwxly commented on July 25, 2024

Tesseract *tesseract = [[Tesseract alloc] initWithLanguage:@"eng"];
tesseract.delegate = self;
// Restrict recognition to digits and ASCII letters.
[tesseract setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" forKey:@"tessedit_char_whitelist"];

UIImage *bwImage = [image blackAndWhite];

[tesseract setImage:bwImage]; // image to recognize
BOOL result = [tesseract recognize];
if (!result) {
    [self performSelectorOnMainThread:@selector(uiShowMessage:) withObject:NSLocalizedString(@"OCRFailed", @"OCR Failed.") waitUntilDone:YES];
    return;
}
// Assumption: `text` was not defined in the posted snippet; it is taken here to be the recognized text.
NSString *text = [tesseract recognizedText];
if (text.length == 0) { // also true when text is nil
    return;
}

kevincon commented on July 25, 2024

That looks like it should work okay. As a sanity check, you could try using CocoaPods in your project (with the Podfile entry I mentioned above) and see if you still have the high memory usage.

I can look at the memory usage for your project if you are okay with emailing me your code (kcon AT stanford DOT edu), or you can use this guide to diagnose it yourself: http://www.raywenderlich.com/23037/how-to-use-instruments-in-xcode

lbwxly commented on July 25, 2024

ok, thank you.

lbwxly commented on July 25, 2024

By the way, what is the size of your image?

kevincon commented on July 25, 2024

The images I typically recognize in Tesseract are quite small because I require the user to crop what they're interested in to a small rectangle, so my image dimensions are 90 x 70 (width x height). But the Template Framework Project in this repo runs recognition on this much larger image (https://raw.githubusercontent.com/gali8/Tesseract-OCR-iOS/master/Template%20Framework%20Project/Template%20Framework%20Project/image_sample.jpg), and I don't see the kind of memory usage you are reporting while it does so.

lbwxly commented on July 25, 2024

But I got the same result when I launched the Template Framework Project on my iPod: after recognition finishes, the memory usage stays at 43 MB.

lbwxly commented on July 25, 2024

[Image: screenshot, 2014-12-06 at 3:08:07]

lbwxly commented on July 25, 2024

I use an image captured by the camera.

kevincon commented on July 25, 2024

Oh, I see. Sorry, I misread your first post: I thought you were saying the memory was growing by that amount (uncontrollably), but you're just pointing out that it's the memory in use after Tesseract recognizes any image.

I was able to reproduce your result, and although it's unfortunate, I'm afraid this is an artifact of Tesseract in general, so there's nothing we can do about it for this wrapper library. See this issue on the main Tesseract project, where someone reports a similar static memory usage from Valgrind and also where one person comments:

"Some of the dawgs are held statically to minimize the time consumed by deleting and re-creating
apis, and memory consumed running them in parallel from multiple threads. This isn't a real leak, 
and memory actually used will not grow over time as a result."

This matches the results of a profile I ran of the Template Framework project memory usage. See how the function that reads the DAWG data uses 17.78 MB all by itself:

[Image: screenshot of the memory profile, 2014-12-01 at 11:31:00 pm]

Your first post said the app started at 12 MB and then rose to 38 MB. That is an increase of 38 - 12 = 26 MB, and the DAWG data alone accounts for 17.78 MB of it, so I think this explains most of the issue.
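
For anyone who wants to compare the before and after figures on their own device without opening Instruments, here is a rough sketch that reads the process's resident memory through the Mach `task_info` call. `ResidentMemoryMB` and `LogMemoryDelta` are hypothetical helper names, not part of this library, and the resident size reported here may not match the number Xcode's memory gauge shows exactly.

```objc
#import <Foundation/Foundation.h>
#import <mach/mach.h>

// Returns the resident memory of the current process in MB, or -1 on failure.
static double ResidentMemoryMB(void) {
    struct mach_task_basic_info info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    kern_return_t kr = task_info(mach_task_self(),
                                 MACH_TASK_BASIC_INFO,
                                 (task_info_t)&info,
                                 &count);
    if (kr != KERN_SUCCESS) {
        return -1.0;
    }
    return (double)info.resident_size / (1024.0 * 1024.0);
}

// Hypothetical helper: logs the memory change around any block of work.
static void LogMemoryDelta(NSString *label, void (^work)(void)) {
    double before = ResidentMemoryMB();
    work();
    double after = ResidentMemoryMB();
    NSLog(@"%@: %.2f MB -> %.2f MB (delta %.2f MB)",
          label, before, after, after - before);
}
```

Wrapping the recognition call, for example `LogMemoryDelta(@"recognize", ^{ [tesseract recognize]; });`, should show a one-time jump on the first run and little or no further growth on later runs if the memory really is held statically.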

kevincon commented on July 25, 2024

One other thing worth mentioning is that the size of the DAWG is related to the specific language file you are using.

I'm assuming we both used "eng" in our tests, but I can confirm that using a custom language file (with less training data) can reduce the static memory usage. In my test just now, my custom language file (for a custom font I am recognizing) used 30 MB less memory than the "eng" language file.
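
As a minimal sketch of what switching language files looks like in code: the only change is the language name passed to the initializer. The name "myfont" is hypothetical and assumes a matching myfont.traineddata file has been added to the app's tessdata folder; `RecognizeWithCustomLanguage` is also just an illustrative function name.

```objc
#import <UIKit/UIKit.h>
#import <TesseractOCR/TesseractOCR.h>

// Recognizes `image` with a custom language file instead of "eng".
// Assumption: myfont.traineddata (hypothetical) is bundled in tessdata.
static NSString *RecognizeWithCustomLanguage(UIImage *image) {
    Tesseract *tesseract = [[Tesseract alloc] initWithLanguage:@"myfont"];
    [tesseract setImage:image];
    if (![tesseract recognize]) {
        return nil; // recognition failed
    }
    return [tesseract recognizedText];
}
```

How much memory this saves depends on the size of the dictionary data in the language file, so it is worth measuring with your own file.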

lbwxly commented on July 25, 2024

How do I make a custom language file? Is there a tutorial?

zachberger commented on July 25, 2024

The upstream project will be able to help you out with custom language files:

kevincon commented on July 25, 2024

You may also find this tutorial useful if the font you want to train on is a font you have installed on your computer (so you can create the training document in Microsoft Word): http://michaeljaylissner.com/posts/2012/02/11/adding-new-fonts-to-tesseract-3-ocr-engine/

If the font is not one you can install on your computer, then you have to basically make the image of the training characters yourself, whether that means taking pictures of the characters or drawing the characters yourself in a drawing program.

kevincon commented on July 25, 2024

I'm tagging this as "wontfix", although really it should be "can'tfix" since this is just how the upstream Tesseract library works.
