naptha / tesseract.js-core
Emscripten port of Tesseract C++ API
License: Apache License 2.0
A feature was implemented in tesseract.js to rotate images using EXIF data. Unfortunately, this caused a dramatic slowdown. While there is likely a better way to handle this using only JavaScript, ultimately the most efficient solution is to apply all image processing steps using Leptonica (in Tesseract) rather than adding JavaScript libraries.
Does this project support Tesseract 4?
C:\Users\xxx\nodejs\node_modules\tesseract.js-core\index.js:261
ya:function(a,b,c){a=L.createNode(a,b,41471,0);a.link=c;return a},bb:function(a){v.Xa(a.mode)||f(new v.k(I.H));return a.link}},A:{$:function(a,b,c,d,e){var g=a.o.p;if(e>=a.o.I)return 0;a=Math.min(a.o.I-e,d);p(0<=a);if(8<a&&g.subarray)b.set(g.subarray(e,e+a),c);else for(d=0;d<a;d++)b[c+d]=g[e+d];return a},write:function(a,b,c,d,e,g){if(!d)return 0;a=a.o;a.timestamp=Date.now();if(b.subarray&&(!a.p||a.p.subarray)){if(g)return a.p=b.subarray(c,c+d),a.I=d;if(0===a.I&&0===e)return a.p=new Uint8Array(b.subarray(c,
RangeError: Array buffer allocation failed
at arrayBufferConstructor_DoNotInitialize ()
at new Uint8Array ()
at Object.write (C:\Users\xxx\nodejs\node_modules\tesseract.js-core\index.js:261:485)
at Object.write (C:\Users\xxx\nodejs\node_modules\tesseract.js-core\index.js:282:335)
at Object.oc [as FS_createDataFile] (C:\Users\xxx\nodejs\node_modules\tesseract.js-core\index.js:291:64)
at C:\Users\xxx\nodejs\node_modules\tesseract.js\src\common\worker.js:89:16
at C:\Users\xxx\nodejs\node_modules\tesseract.js\src\node\lang.js:14:25
at FSReqWrap.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:53:3)
I think it is a memory limit error. How can I increase the memory limit, or what else can I do about this issue?
Hey!
Are there any plans to update core to build from tesseract 5?
I'm having trouble creating a PR because I'm running into build errors:
Checking for modules 'icu-uc;icu-i18n'
-- No package 'icu-uc' found
-- No package 'icu-i18n' found
I'd be happy to help if you could advise on how to fix this. Thanks!
SIMD support has now been added for all major desktop browsers except Safari (see link below for browser support). As this was one of the primary reasons why the Tesseract LSTM engine is significantly slower on web versus desktop, we should be able to expect major improvements in performance.
One challenge is whether we can reliably detect which browsers support the SIMD-enabled build, or whether this decision should be deferred to developers using tesseract.js. One option for feature detection is below.
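A minimal sketch of one way to do this detection, assuming the common approach of asking the engine to validate a tiny WebAssembly module that contains SIMD instructions. The names `SIMD_PROBE`, `supportsWasmSimd`, and `pickCoreFile` are hypothetical and not part of tesseract.js; the probe bytes encode a single function that returns a `v128` value.

```javascript
// Tiny wasm module: one function () -> v128 whose body is
// i32.const 0; i8x16.splat; i8x16.popcnt; end
// Engines without SIMD support reject it during validation.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,            // "\0asm" magic + version 1
  1, 5, 1, 96, 0, 1, 123,                 // type section: () -> v128
  3, 2, 1, 0,                             // function section: 1 function, type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11, // code section
]);

// Returns true if the current engine accepts SIMD instructions.
function supportsWasmSimd() {
  if (typeof WebAssembly !== 'object' || typeof WebAssembly.validate !== 'function') {
    return false;
  }
  try {
    return WebAssembly.validate(SIMD_PROBE);
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: choose which core build to load.
function pickCoreFile(simd = supportsWasmSimd()) {
  return simd ? 'tesseract-core-simd.wasm.js' : 'tesseract-core.wasm.js';
}
```

Validation only checks that the engine accepts the instructions, so this is cheap to run once at startup. Developers who prefer to make the decision themselves could pass an explicit boolean to `pickCoreFile` instead.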
The release version of Tesseract.js is compiled with the O3 optimization level. The size and runtime impact of compiling at different optimization levels is shown below. Size refers to the size of the tesseract-core-simd.wasm.js file. Runtime refers to the runtime using the Tesseract.js Node benchmark (found in the main repo). Lower optimization levels that are not generally used for release versions are not included.
Optimization | Size | Runtime |
---|---|---|
O2 | 4.7 MB | 53s |
O3 | 4.8 MB | 51s |
Os | 3.6 MB | 70s |
Oz | 3.6 MB | 71s |
As the O2 vs. O3 results are nearly identical, as are the Os vs. Oz results, the only question is whether to prefer the smaller/slower builds or the larger/faster builds. While the smaller sizes produced by Os and Oz are appealing, the cost is steep: a ~37-39% increase in runtime relative to O3. For most uses, the runtime increase will outweigh the roughly 1.2 MB increase in size, so the O3 level is used for the release version. However, using a different optimization level may be desirable for certain use cases where recognition time is anticipated to be very short and/or network speeds are anticipated to be very slow.
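The trade-off can be checked with a little arithmetic. The sketch below uses only the numbers from the table; the helper names and the 8 Mbps bandwidth in the usage note are illustrative assumptions, and the download estimate ignores compression and caching.

```javascript
// Size (MB) and benchmark runtime (s) copied from the table above.
const builds = {
  O2: { sizeMB: 4.7, runtimeS: 53 },
  O3: { sizeMB: 4.8, runtimeS: 51 },
  Os: { sizeMB: 3.6, runtimeS: 70 },
  Oz: { sizeMB: 3.6, runtimeS: 71 },
};

// Percentage increase in benchmark runtime of `level` relative to `baseline`.
function runtimeIncreasePct(level, baseline = 'O3') {
  return (builds[level].runtimeS / builds[baseline].runtimeS - 1) * 100;
}

// Seconds of download time saved by `level` over `baseline`
// at a given bandwidth in megabits per second (uncompressed transfer assumed).
function downloadSavingS(level, baseline, mbps) {
  return ((builds[baseline].sizeMB - builds[level].sizeMB) * 8) / mbps;
}

console.log(runtimeIncreasePct('Os').toFixed(1)); // Os vs. O3
console.log(runtimeIncreasePct('Oz').toFixed(1)); // Oz vs. O3
console.log(downloadSavingS('Os', 'O3', 8).toFixed(1)); // seconds saved at 8 Mbps
```

On an assumed 8 Mbps connection the smaller build saves a bit over one second of download time, which only pays off if the recognition work itself is very short.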
Hi guys,
I'm trying to build the file tesseract-core.wasm.js
by myself from tag v2.2.0, but the generated file differs from the one provided in this repo: the wasm content and the file size are both different.
Build Environment:
OS: Ubuntu 20.04
Branch: tag v2.2.0
I followed the README to build the file.
Attached are the build log and the self-built JS file.
build.log
self-build-tesseract-core.wasm.js.txt
This issue was reported in the main repo; however, I confirmed that the root cause is the update to Tesseract.js-core. See naptha/tesseract.js#655
It looks like the license file in this repo is the Apache License 2.0, which has an SPDX identifier of "Apache-2.0", not "Apache License 2.0" as it is currently set in package.json:
tesseract.js-core/package.json
Line 45 in 69a5023
This mismatch is being flagged by license checking tools like license-checker.
Can this please be corrected?
The main tesseract repo uses the correct identifier:
https://github.com/naptha/tesseract.js/blob/master/package.json#L36
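As a rough illustration of the requested change, the sketch below maps the current license string to its SPDX identifier. The `SPDX_FIXES` table and `normalizeLicense` helper are hypothetical and cover only the single case discussed here; they are not how license-checker works internally.

```javascript
// Hypothetical lookup from a free-form license name to its SPDX identifier.
const SPDX_FIXES = {
  'Apache License 2.0': 'Apache-2.0',
};

// Returns the SPDX identifier when a known fix exists, else the value unchanged.
function normalizeLicense(value) {
  return Object.prototype.hasOwnProperty.call(SPDX_FIXES, value)
    ? SPDX_FIXES[value]
    : value;
}

// Shape of the relevant package.json field before and after the fix.
const before = { license: 'Apache License 2.0' };
const after = { ...before, license: normalizeLicense(before.license) };
console.log(after.license);
```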