Comments (13)
@crater2150 Many thanks for your efforts.
Any progress on the OCR-D vs. non-OCR-D invocation issue?
from ocrd-pixelclassifier-segmentation.
Any ideas what's going wrong in my case? Does make test-cli work for anyone?
from ocrd-pixelclassifier-segmentation.
It seems like the pixel classifier is giving different results when running in the segmentation tool, which is strange.
I'm looking into it.
from ocrd-pixelclassifier-segmentation.
@crater2150 Many thanks. I would really like to test your tool. How would I run it outside the segmentation tool?
from ocrd-pixelclassifier-segmentation.
It looks like the standalone pixel classifier applies some preprocessing to the image while loading the file from disk, and this step is bypassed when the image is loaded via the workspace. I'm currently separating the processing from the loading and testing whether that fixes the issue.
@wrznr To run the pixel classifier by itself, you can use
ocr4all-pixel-classifier predict --load $PATH_TO_MODEL \
--binary $PATH_TO_IMAGE --images $PATH_TO_IMAGE \
--color_map color_map.json --char_height $XHEIGHT --output out
with color_map.json containing:
{"(255, 255, 255)": [0, "none"], "(255, 0, 0)": [1, "text"], "(0, 255, 0)": [2, "image"]}
This should run the pixel classifier and write three subfolders to out/: color, overlay and inverted.
color is the CNN output (red = text, green = image, white = neither).
overlay contains the output combined with the input image for visualization.
inverted is also a combination of output and input, where all background pixels are black and foreground pixels are the color of their classification.
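The invocation above can also be scripted. The following is a minimal sketch, assuming the ocr4all-pixel-classifier executable is on the PATH and using only the flags quoted in the command above. The helper names (write_color_map, build_predict_command) and the placeholder paths (model.h5, page.png) are hypothetical and not part of the tool.

```python
import json
import shlex
from pathlib import Path

# Color map exactly as quoted above: RGB tuple string -> [class index, label].
COLOR_MAP = {
    "(255, 255, 255)": [0, "none"],
    "(255, 0, 0)": [1, "text"],
    "(0, 255, 0)": [2, "image"],
}


def write_color_map(path):
    """Write the color map as JSON and return the path (hypothetical helper)."""
    path = Path(path)
    path.write_text(json.dumps(COLOR_MAP))
    return path


def build_predict_command(model, image, color_map, char_height, output="out"):
    """Assemble the predict call from the flags shown in the comment above."""
    return [
        "ocr4all-pixel-classifier", "predict",
        "--load", str(model),
        "--binary", str(image),   # the same image is passed twice,
        "--images", str(image),   # as in the quoted command
        "--color_map", str(color_map),
        "--char_height", str(char_height),
        "--output", str(output),
    ]


if __name__ == "__main__":
    cmap = write_color_map("color_map.json")
    # model.h5, page.png and the x-height 8 are placeholders, not shipped files.
    cmd = build_predict_command("model.h5", "page.png", cmap, 8)
    print(shlex.join(cmd))
```

The sketch only prints the command; passing the list to subprocess.run(cmd, check=True) would execute it, provided the model and image paths exist.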
from ocrd-pixelclassifier-segmentation.
@crater2150 Thank you! I will test this asap. But two questions pop up:
- Does this mean the pixel classifier is only able to distinguish text from image and non-text (@cneud)?
- Did the test ever run successfully in your environment? If yes, what was the configuration? If not, why do you provide it at all?
from ocrd-pixelclassifier-segmentation.
- The current model can only distinguish these categories, but the pixel classifier can in theory be trained with more classes (you can set the number of classes during training and provide a JSON file like the one above defining which color represents which class in the mask). As results with more classes were worse on the current training data (text was often not detected), we currently only have this model.
- I tried the tool locally with files read from disk, and wrote the test to check if the ocrd-interface could read files from the workspace and pass them on, but did not check the results, so I missed the difference in file reading. Sorry.
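To illustrate how a color map of this form ties mask colors to classes, here is a minimal sketch that parses the JSON quoted earlier in the thread and converts a tiny RGB mask into class indices. It is not taken from the pixel classifier's code base; the function names parse_color_map and mask_to_labels are hypothetical, and the mask is represented as a plain nested list for simplicity.

```python
import ast
import json

# The color map quoted earlier in the thread.
COLOR_MAP_JSON = (
    '{"(255, 255, 255)": [0, "none"], '
    '"(255, 0, 0)": [1, "text"], '
    '"(0, 255, 0)": [2, "image"]}'
)


def parse_color_map(text):
    """Return {(r, g, b): (class_index, label)} from the JSON text."""
    raw = json.loads(text)
    # Keys are strings such as "(255, 0, 0)"; literal_eval turns them into tuples.
    return {ast.literal_eval(key): (index, label) for key, (index, label) in raw.items()}


def mask_to_labels(mask, color_map):
    """Map a 2D grid of RGB tuples to a grid of class indices."""
    return [[color_map[pixel][0] for pixel in row] for row in mask]


if __name__ == "__main__":
    color_map = parse_color_map(COLOR_MAP_JSON)
    mask = [
        [(255, 255, 255), (255, 0, 0)],
        [(0, 255, 0), (255, 0, 0)],
    ]
    print(mask_to_labels(mask, color_map))  # [[0, 1], [2, 1]]
```

Adding a class would then amount to one more entry in the JSON file with a new color and the next free index, matching the number of classes set during training.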
from ocrd-pixelclassifier-segmentation.
@crater2150 Many thanks for the information! It would be very helpful if you could document the creation of your current model step by step, for example as a Gist. We could then extend the model more easily as soon as more, and more reliable, training data become available.
from ocrd-pixelclassifier-segmentation.
Ok, I will create a gist with the model creation tomorrow.
I fixed some bugs in the segmentation that caused no output to be produced with the ocrd-wrapper. But for one of the two images, no output segments are produced even with the fixes, while the other one works. I have a suspicion about the reason (based on the CNN output, only a single segment should be created, which maybe leads to problems in the XYCut implementation), but will have to check.
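For readers unfamiliar with XY-cut, the single-segment case can be illustrated with a generic sketch. This is not the XYCut implementation used in this repository, and all names (xy_cut, _cut, _gaps, min_gap) are hypothetical. It recursively splits a binary grid at the widest empty column or row gap; the point of interest is the final branch, where a box without any interior gap has to be emitted as a segment instead of being dropped.

```python
def _gaps(profile, min_gap):
    """Return (start, end) runs of zeros in profile with length >= min_gap."""
    gaps, start = [], None
    for i, value in enumerate(profile + [1]):  # sentinel closes a trailing run
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            if i - start >= min_gap:
                gaps.append((start, i))
            start = None
    return gaps


def _cut(grid, top, left, bottom, right, min_gap, boxes):
    rows = [sum(grid[y][left:right]) for y in range(top, bottom)]
    if not any(rows):
        return  # nothing but background in this box
    # Shrink the box to the foreground so that every remaining gap is interior.
    filled = [i for i, v in enumerate(rows) if v]
    top, bottom = top + filled[0], top + filled[-1] + 1
    cols = [sum(grid[y][x] for y in range(top, bottom)) for x in range(left, right)]
    filled = [i for i, v in enumerate(cols) if v]
    left, right = left + filled[0], left + filled[-1] + 1
    rows = [sum(grid[y][left:right]) for y in range(top, bottom)]
    cols = [sum(grid[y][x] for y in range(top, bottom)) for x in range(left, right)]

    col_gaps = _gaps(cols, min_gap)
    row_gaps = _gaps(rows, min_gap)
    if col_gaps:  # vertical cut at the widest empty column run
        start, end = max(col_gaps, key=lambda g: g[1] - g[0])
        _cut(grid, top, left, bottom, left + start, min_gap, boxes)
        _cut(grid, top, left + end, bottom, right, min_gap, boxes)
    elif row_gaps:  # horizontal cut at the widest empty row run
        start, end = max(row_gaps, key=lambda g: g[1] - g[0])
        _cut(grid, top, left, top + start, right, min_gap, boxes)
        _cut(grid, top + end, left, bottom, right, min_gap, boxes)
    else:
        # No cut possible: the whole box is one segment and must still be kept.
        boxes.append((top, left, bottom, right))


def xy_cut(grid, min_gap=1):
    """Return segment boxes as (top, left, bottom, right), end-exclusive."""
    boxes = []
    _cut(grid, 0, 0, len(grid), len(grid[0]) if grid else 0, min_gap, boxes)
    return boxes


if __name__ == "__main__":
    single_region = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(xy_cut(single_region))  # [(1, 1, 3, 4)]
```

If the last branch returned without appending, a page with exactly one region would yield no segments at all, which is one way the behaviour suspected above could arise.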
from ocrd-pixelclassifier-segmentation.
@wrznr I added examples for dataset preparation and training to the pixel classifier repository, after adding and polishing the tool for mask generation.
from ocrd-pixelclassifier-segmentation.
@wrznr as I mentioned before, the test should now output something, it just fails in the case of only a single region being found on the page.
As I'm currently sick, I can't say for sure if I can fix that issue before the workshop, sorry :(
from ocrd-pixelclassifier-segmentation.
It seems that the model file shipped in this repo was broken. I replaced it with another one (trained on 9137 pages from DTA-2). This does produce a segmentation for me with make test-cli and a fresh environment. Can you retry it?
from ocrd-pixelclassifier-segmentation.
Segmentation now produces results. 👍
Attention: It will not work in combination with the dewarping step from DFKI.
from ocrd-pixelclassifier-segmentation.