
tools-ocr's Introduction

Tree Hole OCR


Introduction

  • Local OCR: Tree Hole OCR works without an internet connection. It performs text recognition locally, using Paddle OCR models with deep learning libraries such as PyTorch and DJL (Deep Java Library), to provide fast and accurate results.
  • Cross-platform: Developed with Java 1.8 and JavaFX, it runs on multiple operating systems, including macOS 12.6 and later.
  • Feature-rich: In addition to basic text recognition, it offers PDF recognition, image text recognition, screenshot recognition via a shortcut key, and more.

Dependencies

  • JDK 1.8
  • JavaFX
  • DJL
  • PyTorch
  • ONNX
  • Paddle OCR
  • OpenCV

Source Code Repositories

Gitee | GitHub

Documentation

https://tree-hole-ocr-docs.vercel.app/

Requirements

  • macOS 12.6 or later, because of the dependency on DJL 0.25.0

Installation

  • Do not include Chinese characters in the installation path.
  • The program is built with JavaFX, and the provided installation package already bundles Java.
  • Download the latest version from the Releases page and unzip it to install.

Using the Program

Screenshot

  • Method one: Click the screenshot button on the main interface of the program;
  • Method two: Press the screenshot shortcut key F4.

Selecting Area

After the screenshot interface opens, press and hold the left mouse button and drag to select the area you want to capture. Once the selection is made, you can fine-tune it:

  • Use arrow keys to adjust the right and top borders of the selected area;
  • Use Shift + arrow keys to adjust the left and bottom borders of the selected area;
  • Use Ctrl + A to select the entire screen.
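Each fine-tuning key listed above moves one edge of the selection rectangle. The sketch below models that behavior in plain Java so the edge rules are explicit. It is a hypothetical illustration, not code from the Tree Hole OCR source: the class name `SelectionAdjuster`, the 1-pixel step, and the reading that Left/Right move a vertical edge while Up/Down move a horizontal edge are all assumptions.

```java
import java.awt.Rectangle;

/**
 * Hypothetical sketch of keyboard fine-tuning for a screenshot selection.
 * Assumptions (not from the Tree Hole OCR source): y grows downward, each key
 * press moves an edge by 1 pixel, Left/Right move a vertical edge and
 * Up/Down move a horizontal edge.
 */
final class SelectionAdjuster {
    enum Arrow { LEFT, RIGHT, UP, DOWN }

    private final Rectangle screen; // bounds of the whole screen
    private Rectangle selection;    // current selection, kept inside the screen

    SelectionAdjuster(Rectangle screen, Rectangle initial) {
        this.screen = new Rectangle(screen);
        this.selection = initial.intersection(screen);
        if (selection.isEmpty()) {
            throw new IllegalArgumentException("initial selection must overlap the screen");
        }
    }

    Rectangle selection() {
        return new Rectangle(selection); // defensive copy
    }

    /** Plain arrows move the right/top edges; with Shift they move the left/bottom edges. */
    void press(Arrow key, boolean shift) {
        int left = selection.x;
        int top = selection.y;
        int right = selection.x + selection.width;
        int bottom = selection.y + selection.height;
        int dx = key == Arrow.LEFT ? -1 : key == Arrow.RIGHT ? 1 : 0;
        int dy = key == Arrow.UP ? -1 : key == Arrow.DOWN ? 1 : 0;
        if (shift) {
            // Shift + arrows: left and bottom edges, never crossing the opposite edge.
            left = clamp(left + dx, screen.x, right - 1);
            bottom = clamp(bottom + dy, top + 1, screen.y + screen.height);
        } else {
            // Plain arrows: right and top edges, kept inside the screen.
            right = clamp(right + dx, left + 1, screen.x + screen.width);
            top = clamp(top + dy, screen.y, bottom - 1);
        }
        selection = new Rectangle(left, top, right - left, bottom - top);
    }

    /** Ctrl + A: select the entire screen. */
    void selectAll() {
        selection = new Rectangle(screen);
    }

    private static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(value, max));
    }
}
```

Clamping each moved edge against the opposite edge keeps the selection at least 1×1 pixel and inside the screen, so repeated key presses can never produce an invalid rectangle.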

Confirm Selection

After completing the selection, press Enter or Space, or double-click the left mouse button, to confirm it. Once confirmed, the program automatically performs OCR text recognition on the selected area.

  • [Image: example screenshot]

  • [Image: recognition result]
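Confirming a selection hands the selected region of the screen to the OCR engine. The sketch below shows only the cropping step: given a full-screen capture (in a desktop Java program this would typically come from something like `java.awt.Robot#createScreenCapture`), it cuts out the selected rectangle. `SelectionCropper` is a hypothetical helper, not part of the Tree Hole OCR code base, and the OCR call itself is not shown.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper: cut the confirmed selection out of a full-screen capture. */
final class SelectionCropper {
    static BufferedImage crop(BufferedImage screen, Rectangle selection) {
        // Clip the selection to the captured image so getSubimage cannot throw.
        Rectangle bounds = new Rectangle(0, 0, screen.getWidth(), screen.getHeight());
        Rectangle clipped = selection.intersection(bounds);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("selection does not overlap the screen: " + selection);
        }
        // getSubimage shares its pixel buffer with the source image, so copy the
        // pixels into a new image that can outlive the full-screen capture.
        BufferedImage view = screen.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
        int w = view.getWidth();
        int h = view.getHeight();
        BufferedImage copy = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        copy.setRGB(0, 0, w, h, view.getRGB(0, 0, w, h, null, 0, w), 0, w);
        return copy;
    }
}
```

Copying the pixels rather than returning the sub-image lets the large full-screen capture be garbage-collected while the small cropped image is passed on for recognition.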

Local Build

Download and Unzip the Models

wget https://github.com/litongjava/tools-ocr/releases/download/model-ppocr-v4/ch_PP-OCRv4_rec_infer-onnx.zip
wget https://github.com/litongjava/tools-ocr/releases/download/model-ppocr-v4/ch_PP-OCRv4_det_infer-onnx.zip

Unzip the models

mkdir -p models/ch_PP-OCRv4_rec_infer
mkdir -p models/ch_PP-OCRv4_det_infer
unzip ch_PP-OCRv4_rec_infer-onnx.zip -d models/ch_PP-OCRv4_rec_infer
unzip ch_PP-OCRv4_det_infer-onnx.zip -d models/ch_PP-OCRv4_det_infer
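Before building, it can help to confirm that both model folders were actually populated. The sketch below checks the two folder names used in the mkdir/unzip commands above. It assumes, based only on the "-onnx" archive names, that each archive contains at least one `*.onnx` file; `ModelCheck` is a hypothetical helper, not part of the project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical pre-build check that both PP-OCRv4 model folders were unzipped. */
final class ModelCheck {
    // Folder names taken from the mkdir/unzip commands in this README.
    static final List<String> MODEL_DIRS =
            Arrays.asList("ch_PP-OCRv4_rec_infer", "ch_PP-OCRv4_det_infer");

    /** Returns the expected folders that are missing or contain no *.onnx file (searched recursively). */
    static List<String> missingModels(Path modelsRoot) {
        List<String> missing = new ArrayList<>();
        for (String name : MODEL_DIRS) {
            Path dir = modelsRoot.resolve(name);
            boolean hasOnnx = false;
            if (Files.isDirectory(dir)) {
                try (Stream<Path> files = Files.walk(dir)) {
                    // Assumption: the unzipped archive contains at least one .onnx file.
                    hasOnnx = files.anyMatch(p -> p.getFileName().toString().endsWith(".onnx"));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            if (!hasOnnx) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingModels(Paths.get("models"));
        System.out.println(missing.isEmpty() ? "All models present" : "Missing: " + missing);
    }
}
```

The search is recursive because the archives may unpack into an extra subfolder; that layout is not documented here.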

Build the Program

You can download the source code and build it locally. The build commands are as follows:

Windows (PowerShell)

mkdir target\jfx\app
cp -r models target\jfx\app
mvn jfx:native -DskipTests -f pom.xml

macOS

rm -rf target/jfx/app
mkdir -p target/jfx/app
cp -r models target/jfx/app
mvn jfx:native -DskipTests -f pom.xml
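The Windows and macOS command sequences above differ mainly in shell syntax. As an alternative, the staging step (clear target/jfx/app, recreate it, copy models into it) could be written once in Java with `java.nio.file`. This is a minimal sketch; `StageModels` is a hypothetical helper, not part of the project, and it does not replace the `mvn jfx:native` call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical cross-platform version of: rm -rf APP; mkdir -p APP; cp -r models APP */
final class StageModels {
    static void stage(Path models, Path appDir) throws IOException {
        // rm -rf appDir: reverse path order deletes children before their parents.
        if (Files.exists(appDir)) {
            try (Stream<Path> walk = Files.walk(appDir)) {
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
        // mkdir -p appDir
        Files.createDirectories(appDir);
        // cp -r models appDir (the tree ends up in appDir/models)
        Path target = appDir.resolve(models.getFileName().toString());
        try (Stream<Path> walk = Files.walk(models)) {
            // Files.walk visits each directory before its contents.
            for (Path src : (Iterable<Path>) walk::iterator) {
                Path dst = target.resolve(models.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dst);
                } else {
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        stage(Paths.get("models"), Paths.get("target", "jfx", "app"));
        // Then build as before: mvn jfx:native -DskipTests -f pom.xml
    }
}
```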

View the Application Logs

cd /Applications/treehole.app/Contents/Java/logs
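To check the logs without browsing the folder manually, the sketch below prints the last lines of the most recently modified file in the log directory. Assumptions: the directory is the /Applications/treehole.app/Contents/Java/logs path listed under Common Directories, the log files are UTF-8 text, and `LogTail` is a hypothetical helper, not part of the project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper: show the end of the newest file in the log directory. */
final class LogTail {
    /** Most recently modified regular file in logDir, if any. */
    static Optional<Path> newestLog(Path logDir) throws IOException {
        try (Stream<Path> files = Files.list(logDir)) {
            return files.filter(Files::isRegularFile)
                    .max(Comparator.comparingLong((Path p) -> p.toFile().lastModified()));
        }
    }

    /** Up to n last lines of file; assumes UTF-8 (readAllLines throws on malformed input). */
    static List<String> lastLines(Path file, int n) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.subList(Math.max(0, lines.size() - n), lines.size());
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Paths.get(args.length > 0 ? args[0] : "/Applications/treehole.app/Contents/Java/logs");
        Optional<Path> newest = newestLog(logDir);
        if (newest.isPresent()) {
            lastLines(newest.get(), 50).forEach(System.out::println);
        } else {
            System.out.println("No log files in " + logDir);
        }
    }
}
```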

Notes

macOS Permission Settings

Because the program listens for the global screenshot shortcut and captures the screen, macOS requires the following permissions:

  • Settings --> Security and Privacy --> Accessibility
  • Settings --> Security and Privacy --> Screen Recording

Common Directories

  • Log directory: /Applications/treehole.app/Contents/Java/logs
  • Temporary image directory: /Applications/treehole.app/Contents/Java

TODO

  • PDF Recognition
  • Image Text Recognition
    • Recognition result text alignment (multi-column yet to be implemented)
    • Full screen mode screenshot
    • Adding recognition animation
    • Multi-screen support
  • Text Translation
  • Formula Recognition
  • Table Recognition
  • Software Settings

tools-ocr's People

Contributors

anylisten, litongjava


tools-ocr's Issues

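How to package as an .exe file

After downloading the project and importing it into Eclipse, I can generate a JAR file, but I can't run it.
1. How do I run the generated JAR?
2. How do I package the generated JAR into an .exe file?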

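No response after installation on Windows 7

Win7: a Ghost image of Windows 7, Microsoft Windows [Version 6.1.7601]
Tree Hole: neither the JRE-bundled version nor the regular version works (Java JDK 8u202 x64 is installed).
Symptom: after installation, the Tree Hole process is visible under Task Manager > Processes, but no application window appears.
What could be missing?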

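Improvement for the lite version

I suggest the lite version drop the .exe and be packaged directly as an archive, together with a batch file that runs the program from the command line using the Java environment already installed on the system.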

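Pressing F4 after minimizing the app seems buggy

Windows 10: when the app is not minimized, both the screenshot button and F4 work fine. But after the app is minimized, pressing F4 does not bring up the screenshot interface, and clicking the app's taskbar icon no longer opens the app either.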

Is there a Pro version?

Is there a Pro version?
If so, how can I buy it?
Can the Pro version recognize an entire PDF book?

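Error when double-clicking the full version

At first it seemed to be setting up the environment; then it reported class com/luooqi/ocr/MainFm not/found. After clicking OK, it reported
Failed to launch JVM, and the program would not open.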

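Questions

1) I'd like the program to start automatically at boot and stay in the taskbar, waiting for the shortcut key, like Tianruo OCR (天若).
2) I'd like support for configuring custom APIs such as Alibaba, Tencent, and Baidu.
3) I couldn't find a settings screen; I'd like to be able to change the shortcut key.
4) Dual-monitor support is poor: the file is displayed on screen A, but after pressing F4 I can only capture on screen B.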

Error

Windows 10, 64-bit. At the end of installation, the system reported:
failed to find library
\bin\server\jvm
After clicking OK, another error:
failed to locate JNI_CreateJavaVM
After clicking OK, another error:
failed to launch JVM

[Suggestion] Alternative Download Sources and Build Instructions

  1. The Baidu Net Disk link in README.md is no longer valid.
    I suggest publishing the binaries here on GitHub at least, with backups on Lanzou Cloud and Gitee. In addition, Linux distributions such as Arch Linux build packages from a PKGBUILD, which needs a static direct download link. Baidu Net Disk, Lanzou Cloud, and other net-disk services cannot serve as build sources because their links are dynamic and may require cookies, while Gitee requires logging in to download releases.

  2. Could you document how to build this software from source? Sometimes a published release has issues that the latest commit already fixes, but a new release has not been made yet. With build instructions, people who need the fix urgently could build it themselves instead of waiting. And since everyone's system environment differs, building from source also lets each user produce a version that works on their own setup.

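Wrong capture position at 4K resolution

With a single 4K monitor at 200% scaling, the screenshot view is magnified: it shows only about a quarter of my screen, so the bottom-left, top-right, and bottom-right areas of the desktop cannot be captured.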

Link is broken

The link is dead!!!!!!!!!!!!!!!

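Captured image position is wrong

My primary screen is 4K and my secondary screen is 1920*1080. When I select a box on the primary screen, the recognition result is not from the area I selected but from some area above it...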

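#Bug report

Problem: after clicking the screenshot button, the screen is automatically magnified to full screen, so only the top-left part of the screen can be selected.
Device: Surface 3 @ Win10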
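The three scaling reports above (a single 4K monitor at 200%, a 4K primary with a 1080p secondary, and a Surface 3) describe symptoms consistent with a mismatch between logical (scaled) screen coordinates and physical pixels, although none of the reports confirms the cause. The sketch below shows the coordinate conversion such a fix would need. `HiDpiMapping` is hypothetical; the scale factor would come from the platform (for example, AWT's `GraphicsConfiguration.getDefaultTransform()` on Java 9 and later) and is simply passed in here.

```java
import java.awt.Rectangle;

/**
 * Hypothetical helper: map a selection made in logical (scaled) screen units to
 * physical pixels. scaleX/scaleY are display scale factors, e.g. 2.0 at 200%.
 */
final class HiDpiMapping {
    static Rectangle toPhysical(Rectangle logical, double scaleX, double scaleY) {
        // Round outward so the physical rectangle fully covers the logical one.
        int x = (int) Math.floor(logical.x * scaleX);
        int y = (int) Math.floor(logical.y * scaleY);
        int right = (int) Math.ceil((logical.x + logical.width) * scaleX);
        int bottom = (int) Math.ceil((logical.y + logical.height) * scaleY);
        return new Rectangle(x, y, right - x, bottom - y);
    }
}
```

For example, at 200% a 3840×2160 display is 1920×1080 logical units, so the logical bottom-right quarter (960, 540, 960, 540) maps to the physical region (1920, 1080, 1920, 1080). Drawing a physical-resolution capture at one image pixel per logical unit would show only a quarter of the screen, which matches the first report.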

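Suggest switching the development platform to .NET Core

Someone has already suggested changing the development environment (#4), on the grounds that a JRE must be installed. But that is not the main problem.
My main point is that JavaFX's execution efficiency is far too low... the limitations of the JVM make some problems impossible to solve.
If cross-platform support is the goal, I'd suggest considering Microsoft's .NET Core: it is actively updated, and it is itself open source. At the very least, its execution efficiency is in a different league from JavaFX.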

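A small suggestion: switch the development environment!

For a small utility like this, requiring users to install a runtime themselves, or to download a 70 to 80 MB installer, is basically a deal-breaker. Of course, if it's only for the developer's own use or just for fun, that's fine.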

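Screenshots of the secondary screen are not supported

I use two monitors. When I take a screenshot with Tree Hole OCR, the secondary screen also shows the primary screen's content, so I can't capture anything on the secondary screen.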
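Supporting capture on any monitor mainly requires working in the shared virtual-desktop coordinate space instead of assuming a single screen. In AWT, each monitor's bounds are available through `GraphicsEnvironment.getLocalGraphicsEnvironment().getScreenDevices()` and `GraphicsDevice.getDefaultConfiguration().getBounds()`. The hypothetical `MultiScreen` helper below takes those bounds as a plain list (so it runs without a display) and shows the two calculations involved: the union of all screens, and the screen under a given point such as the mouse position when the shortcut is pressed.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for multi-monitor capture, working on per-screen bounds. */
final class MultiScreen {
    /** Bounding box of all monitors in virtual-desktop coordinates. */
    static Rectangle virtualDesktop(List<Rectangle> screens) {
        if (screens.isEmpty()) {
            throw new IllegalArgumentException("at least one screen is required");
        }
        Rectangle all = new Rectangle(screens.get(0));
        for (Rectangle r : screens) {
            all = all.union(r);
        }
        return all;
    }

    /** The monitor containing p (e.g. the mouse position when the shortcut is pressed). */
    static Optional<Rectangle> screenAt(List<Rectangle> screens, Point p) {
        return screens.stream().filter(r -> r.contains(p)).findFirst();
    }
}
```

Capturing either the union or only the screen under the cursor is a design choice; the union needs one large capture, while the per-screen approach keeps each capture small.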

Suggested new features

I suggest adding launch at startup (hidden by default) and a translation feature.

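No response after installing and running

On 64-bit Windows 7, after installing and running the program, I can't see the program window, and running it again does nothing, although the process is visible in the background.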

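Program exits without showing the result after recognition

I compiled and ran it in Eclipse and selected an image file. The message "Recognizing image, please wait....." appeared, and after a while Tree Hole exited. I debugged it: the recognition result seems to have been obtained, and restore then ran to completion, but the window was never restored. I don't know why the program terminated.

If I comment out MainFm.stage.close() in recImage(), it no longer exits.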

stageInfo = new StageInfo(stage.getX(), stage.getY(),
        stage.getWidth(), stage.getHeight(), stage.isFullScreen());
MainFm.stage.close();
try {
    BufferedImage image = ImageIO.read(selectedFile);
    doOcr(image);
} catch (IOException e) {
    // ... (rest of the method not included in the report)
}

Environment:
Eclipse Version: 2020-03 (4.15.0)
macOS 10.14.6 Mojave
java version "1.8.0_221"
Java(TM) SE Runtime Environment (build 1.8.0_221-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.221-b11, mixed mode)

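When recognizing English, the last word of a line is merged with the first word of the next line

For example, for this image:
[image]
the recognized text comes out as follows; "make" and "it" are merged into one word (as are "the" and "message"):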
【派生】accuracy n.精确性,正确度
inaccurate a.不准确的,模糊的
真题 Inaccurate or indefinite words may makeit difficult for the listener to understand themessage which is being transmitted to him.不准确的或不确定的语言会使听者很难理解传达给他的信息。
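The merged words in this report ("makeit", "themessage") suggest recognized lines are concatenated with no separator. One possible post-processing rule, sketched below under stated assumptions, is to insert a space only when both sides of a line break are ASCII characters (as in English text) and to join directly otherwise (as in Chinese text). `LineJoiner` is hypothetical and is not how Tree Hole OCR currently assembles its output.

```java
import java.util.List;

/**
 * Hypothetical post-processing for OCR output lines. Assumption: a line break
 * between two ASCII characters (e.g. English words) becomes a space, while a
 * break next to non-ASCII (e.g. CJK) text is removed without a separator.
 */
final class LineJoiner {
    static String join(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            boolean asciiBoundary = out.length() > 0
                    && isAscii(out.charAt(out.length() - 1))
                    && isAscii(line.charAt(0));
            if (asciiBoundary) {
                out.append(' ');
            }
            out.append(line);
        }
        return out.toString();
    }

    private static boolean isAscii(char c) {
        return c < 128;
    }
}
```

This simple rule turns a word hyphenated across lines ("multi-" / "column") into "multi- column"; handling that case would need an extra dehyphenation step.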
