
tesseract.js-core's People

Contributors

antimatter15, balearica, dependabot[bot], fdawgs, jeromewu


tesseract.js-core's Issues

Add option to rotate based on exif orientation tag

A feature was implemented in tesseract.js to rotate images using EXIF data. Unfortunately, this caused a dramatic slowdown. While there is likely a better way to handle this using only JavaScript, the most efficient solution is ultimately to apply all image processing steps using Leptonica (in Tesseract) rather than adding JavaScript libraries.
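For context, the decision that any implementation has to make (whether in Leptonica or in JavaScript) is how an EXIF orientation value translates into a rotation. The sketch below is only an illustration of that mapping, not the code used by tesseract.js or Leptonica. The function name `rotationForExifOrientation` is hypothetical, and only the four rotation-only orientation values are handled; the mirrored values (2, 4, 5, 7) are deliberately left unsupported here.

```javascript
// EXIF Orientation tag values that need rotation only (no mirroring),
// mapped to the clockwise rotation in degrees that displays the image upright.
const ROTATION_BY_ORIENTATION = new Map([
  [1, 0],   // already upright
  [3, 180], // upside down
  [6, 90],  // needs a 90 degree clockwise rotation
  [8, 270], // needs a 270 degree clockwise rotation
]);

// Hypothetical helper: returns the clockwise rotation in degrees,
// or null for mirrored or unknown orientation values.
function rotationForExifOrientation(orientation) {
  return ROTATION_BY_ORIENTATION.has(orientation)
    ? ROTATION_BY_ORIENTATION.get(orientation)
    : null;
}

console.log(rotationForExifOrientation(6)); // 90
console.log(rotationForExifOrientation(2)); // null (mirrored, not handled)
```

Returning `null` rather than guessing lets the caller skip rotation for images whose orientation is mirrored or missing.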

RangeError: Array buffer allocation failed

C:\Users\xxx\nodejs\node_modules\tesseract.js-core\index.js:261
ya:function(a,b,c){a=L.createNode(a,b,41471,0);a.link=c;return a},bb:function(a){v.Xa(a.mode)||f(new v.k(I.H));return a.link}},A:{$:function(a,b,c,d,e){var g=a.o.p;if(e>=a.o.I)return 0;a=Math.min(a.o.I-e,d);p(0<=a);if(8<a&&g.subarray)b.set(g.subarray(e,e+a),c);else for(d=0;d<a;d++)b[c+d]=g[e+d];return a},write:function(a,b,c,d,e,g){if(!d)return 0;a=a.o;a.timestamp=Date.now();if(b.subarray&&(!a.p||a.p.subarray)){if(g)return a.p=b.subarray(c,c+d),a.I=d;if(0===a.I&&0===e)return a.p=new Uint8Array(b.subarray(c,

RangeError: Array buffer allocation failed
at arrayBufferConstructor_DoNotInitialize ()
at new Uint8Array ()
at Object.write (C:\Users\xxx\nodejs\node_modules\tesseract.js-core\index.js:261:485)
at Object.write (C:\Users\xxx\nodejs\node_modules\tesseract.js-core\index.js:282:335)
at Object.oc [as FS_createDataFile] (C:\Users\xxx\nodejs\node_modules\tesseract.js-core\index.js:291:64)
at C:\Users\xxx\nodejs\node_modules\tesseract.js\src\common\worker.js:89:16
at C:\Users\xxx\nodejs\node_modules\tesseract.js\src\node\lang.js:14:25
at FSReqWrap.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:53:3)

I think it is a memory limit error. How can I increase the memory limit, or what else can I do about this issue?

Update to use Tesseract 5

Hey!

Are there any plans to update core to build from Tesseract 5?

I'm having trouble creating a PR because I'm hitting build errors:

Checking for modules 'icu-uc;icu-i18n'
--   No package 'icu-uc' found
--   No package 'icu-i18n' found

I'd be happy to help if you could advise on how to fix this. Thanks!

Add SIMD-enabled build

SIMD support has now been added for all major desktop browsers except Safari (see link below for browser support). As the lack of SIMD was one of the primary reasons why the Tesseract LSTM engine is significantly slower on the web than on desktop, we should be able to expect major improvements in performance.

One challenge is whether we can reliably detect which browsers support the SIMD-enabled build, or whether this decision should be deferred to developers using tesseract.js. One option for feature detection is below.

https://github.com/GoogleChromeLabs/wasm-feature-detect

https://webassembly.org/roadmap/
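A minimal sketch of how the detection result could drive the choice of build, assuming the detector is passed in as a function that resolves to a boolean (the `simd` export of wasm-feature-detect has that shape). The helper name `chooseCoreBuild` is hypothetical, and the non-SIMD file name is an assumption; only the SIMD file name appears in these issues.

```javascript
// The SIMD file name is taken from the issues on this page.
const SIMD_BUILD = "tesseract-core-simd.wasm.js";
// Assumption: name of the non-SIMD build, used as the safe fallback.
const FALLBACK_BUILD = "tesseract-core.wasm.js";

// Hypothetical helper: detectSimd is any function returning a
// Promise<boolean>, e.g. `simd` imported from "wasm-feature-detect".
async function chooseCoreBuild(detectSimd) {
  try {
    return (await detectSimd()) ? SIMD_BUILD : FALLBACK_BUILD;
  } catch (err) {
    // If detection itself fails, prefer the build that runs everywhere.
    return FALLBACK_BUILD;
  }
}

// Example with a stubbed detector; a real caller would pass `simd` instead.
chooseCoreBuild(async () => true).then((file) => console.log(file));
```

Taking the detector as a parameter keeps the decision overridable, so developers using tesseract.js could still force a specific build if automatic detection proves unreliable.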

Test different optimization levels

The release version of Tesseract.js is compiled with the O3 optimization level. The size and runtime impact of compiling at different optimization levels is shown below. Size refers to the size of the tesseract-core-simd.wasm.js file. Runtime refers to the runtime of the Tesseract.js Node benchmark (found in the main Tesseract.js repo). Lower optimization levels that are not generally used for release versions are not included.

Optimization  Size    Runtime
O2            4.7 MB  53 s
O3            4.8 MB  51 s
Os            3.6 MB  70 s
Oz            3.6 MB  71 s

As the O2 vs. O3 results are nearly identical, as are the Os vs. Oz results, the only question is whether to prefer the smaller, slower builds or the larger, faster builds. While the smaller sizes produced by Os and Oz are appealing, the cost is steep: runtime increases by roughly 37-39% relative to O3. For most uses, that runtime increase will outweigh the roughly 1.2 MB increase in size, so the O3 level is used for the release version. However, a different optimization level may be desirable for use cases where the recognition workload is anticipated to be very low and/or network speeds are anticipated to be very slow.
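The trade-off can be checked directly from the measurements in the table. The sketch below recomputes the runtime and size differences relative to O3; the function name `compareToBaseline` is hypothetical and the numbers are copied from the table, not newly measured.

```javascript
// Measurements copied from the table above (size in MB, runtime in seconds).
const builds = [
  { level: "O2", sizeMB: 4.7, runtimeS: 53 },
  { level: "O3", sizeMB: 4.8, runtimeS: 51 },
  { level: "Os", sizeMB: 3.6, runtimeS: 70 },
  { level: "Oz", sizeMB: 3.6, runtimeS: 71 },
];

// Compare every build against a baseline level (O3 is the release setting).
function compareToBaseline(rows, baselineLevel) {
  const base = rows.find((row) => row.level === baselineLevel);
  if (!base) throw new Error(`Unknown baseline level: ${baselineLevel}`);
  return rows.map((row) => ({
    level: row.level,
    // Positive values mean slower than the baseline.
    runtimeChangePct: ((row.runtimeS - base.runtimeS) / base.runtimeS) * 100,
    // Negative values mean smaller than the baseline.
    sizeChangeMB: row.sizeMB - base.sizeMB,
  }));
}

for (const row of compareToBaseline(builds, "O3")) {
  console.log(
    `${row.level}: runtime ${row.runtimeChangePct.toFixed(1)}%, ` +
      `size ${row.sizeChangeMB.toFixed(1)} MB`
  );
}
```

With these inputs, Os comes out about 37% slower than O3 and Oz about 39% slower, in exchange for a file that is about 1.2 MB smaller.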

Change `license` value in package.json to match LICENSE file contents

It looks like the LICENSE file in this repo uses the Apache License 2.0, which has an SPDX identifier of "Apache-2.0", not "Apache License 2.0" as is currently set in package.json:

"license": "Apache License 2.0",

This mismatch is being flagged up by license checking tools like license-checker.
Can this please be corrected?

The main tesseract.js repo uses the correct identifier:

https://github.com/naptha/tesseract.js/blob/master/package.json#L36
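The kind of check that flags this mismatch can be sketched as follows. This is a simplified illustration, not how license-checker works internally: the helper name `checkLicenseField` is hypothetical, and the identifier set is a small hand-picked sample of SPDX identifiers rather than the full SPDX license list.

```javascript
// A small sample of valid SPDX identifiers (the full list is much longer).
const KNOWN_SPDX_IDS = new Set([
  "Apache-2.0",
  "MIT",
  "ISC",
  "BSD-2-Clause",
  "BSD-3-Clause",
]);

// Hypothetical helper: reports whether a parsed package.json object
// carries a license value from the sample list above.
function checkLicenseField(pkg) {
  const value = pkg.license;
  if (typeof value !== "string") {
    return { ok: false, reason: "license is missing or not a string" };
  }
  if (KNOWN_SPDX_IDS.has(value)) {
    return { ok: true, reason: `"${value}" is a known SPDX identifier` };
  }
  return { ok: false, reason: `"${value}" is not a known SPDX identifier` };
}

console.log(checkLicenseField({ license: "Apache License 2.0" }).ok); // false
console.log(checkLicenseField({ license: "Apache-2.0" }).ok); // true
```

The current value fails because it is the license's full name rather than its identifier, so changing the field to "Apache-2.0" resolves the warning.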
