Comments (13)

wrznr commented on September 27, 2024

@crater2150 Many thanks for your efforts.

Any progress on the OCR-D vs. non-OCR-D invocation issue?

from ocrd-pixelclassifier-segmentation.

wrznr commented on September 27, 2024

Any ideas what's going wrong in my case? Does make test-cli work for anyone?

crater2150 commented on September 27, 2024

It seems like the pixel classifier is giving different results when running in the segmentation tool, which is strange.
I'm looking into it.

wrznr commented on September 27, 2024

@crater2150 Many thanks. I would really like to test your tool. How would I run it outside the segmentation tool?

crater2150 commented on September 27, 2024

It looks like the standalone pixel classifier applies some preprocessing to the image while loading the file from disk, and this step is bypassed when the image is loaded via the workspace. I'm currently separating the processing from the loading and testing whether that fixes the issue.

@wrznr To run the pixel classifier by itself, you can use

ocr4all-pixel-classifier predict --load $PATH_TO_MODEL \
  --binary $PATH_TO_IMAGE --images $PATH_TO_IMAGE \
  --color_map color_map.json --char_height $XHEIGHT --output out

with color_map.json containing:

{"(255, 255, 255)": [0, "none"], "(255, 0, 0)": [1, "text"], "(0, 255, 0)": [2, "image"]}

This should run the pixel classifier and write three subfolders to out/. color contains the CNN output (red = text, green = image, white = neither). overlay contains the output combined with the input image for visualization. inverted is also a combination of output and input, where all background pixels are black and foreground pixels have the color of their classification.
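To make the color map format concrete, here is a minimal sketch of how such a file could be read and applied. It is not code from ocr4all-pixel-classifier: the function names are hypothetical, the mask is represented as plain nested lists of RGB tuples, and the binarized page is assumed to be given as booleans with True meaning foreground.

```python
import ast
import json

# The color map from the comment above, as a JSON string.
COLOR_MAP_JSON = (
    '{"(255, 255, 255)": [0, "none"], '
    '"(255, 0, 0)": [1, "text"], '
    '"(0, 255, 0)": [2, "image"]}'
)


def parse_color_map(text):
    """Map each RGB tuple to its (class index, class name) pair.

    The JSON keys are strings such as "(255, 0, 0)", so they are
    converted to real tuples with ast.literal_eval.
    """
    raw = json.loads(text)
    return {ast.literal_eval(key): (index, name) for key, (index, name) in raw.items()}


def mask_to_labels(mask, color_map):
    """Convert rows of RGB tuples into rows of class indices."""
    return [[color_map[pixel][0] for pixel in row] for row in mask]


def inverted_view(mask, binary):
    """Hypothetical re-creation of the 'inverted' output described above:
    black where the page is background, class color where it is foreground.

    Assumption: `binary` holds booleans, True = foreground pixel.
    """
    return [
        [color if is_foreground else (0, 0, 0)
         for color, is_foreground in zip(mask_row, binary_row)]
        for mask_row, binary_row in zip(mask, binary)
    ]
```

A color that is missing from the map raises a KeyError in mask_to_labels, which is a deliberate choice in this sketch so that a mask and a color map that do not match are noticed early.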

wrznr commented on September 27, 2024

@crater2150 Thank you! I will test this asap. But two questions pop up:

  1. Does this mean the pixel classifier is only able to distinguish text from image and non-text (@cneud)?
  2. Did the test ever run successfully in your environment? If yes, what was the configuration? If not, why do you provide it at all?

crater2150 commented on September 27, 2024
  1. The current model can only distinguish these categories, but the pixel classifier can in theory be trained with more classes (you can set the number of classes during training and provide a JSON file like the one above defining which color represents which class in the mask). As the results of training with more classes on the current training data were worse (text often not being detected), we currently only have this model.
  2. I tried the tool locally with files read from disk, and wrote the test to check if the ocrd-interface could read files from the workspace and pass them on, but did not check the results, so I missed the difference in file reading. Sorry.

wrznr commented on September 27, 2024

@crater2150 Many thanks for the information! It would be very helpful if you could document the creation of your current model step by step, for example as a Gist. We could then extend the model more easily as soon as more, and more reliable, training data become available.

crater2150 commented on September 27, 2024

Ok, I will create a gist with the model creation tomorrow.

I fixed some bugs in the segmentation that caused no output to be produced with the ocrd-wrapper. But for one of the two images, no output segments are produced even with the fixes, while the other one works. I have a suspicion about the reason (based on the CNN output, only a single segment should be created, which maybe leads to problems in the XYCut implementation), but will have to check.
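The single-segment case can be illustrated with a generic recursive XY-cut. This is only a sketch under stated assumptions, not the XYCut implementation used in this repository (which is not shown in the thread): the page is a nested list of 0/1 values with 1 = foreground, the function names and the min_gap parameter are hypothetical, and boxes are (top, left, bottom, right) with exclusive end coordinates. The point it shows is that a page without any usable cut should yield exactly one box instead of no output.

```python
def _widest_gap(profile, min_gap):
    """Return (start, end) of the longest run of zeros, or None if no run
    is at least min_gap long."""
    best = None
    start = None
    for i, value in enumerate(profile + [1]):  # sentinel closes a trailing run
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            width = i - start
            if width >= min_gap and (best is None or width > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def _cut(grid, top, left, bottom, right, min_gap):
    # Projection profiles: amount of foreground per row and per column.
    rows = [sum(grid[y][left:right]) for y in range(top, bottom)]
    cols = [sum(grid[y][x] for y in range(top, bottom)) for x in range(left, right)]
    if not any(rows):
        return []  # no foreground in this box

    # Shrink the box to the tight bounding box of the foreground.
    ys = [i for i, v in enumerate(rows) if v]
    xs = [i for i, v in enumerate(cols) if v]
    top, bottom = top + ys[0], top + ys[-1] + 1
    left, right = left + xs[0], left + xs[-1] + 1
    rows = rows[ys[0]:ys[-1] + 1]
    cols = cols[xs[0]:xs[-1] + 1]

    row_gap = _widest_gap(rows, min_gap)
    col_gap = _widest_gap(cols, min_gap)
    if row_gap is None and col_gap is None:
        # No cut possible: the whole box is one region. This is a valid
        # result and must be returned, not dropped.
        return [(top, left, bottom, right)]

    row_width = row_gap[1] - row_gap[0] if row_gap else 0
    col_width = col_gap[1] - col_gap[0] if col_gap else 0
    if row_width >= col_width:
        # Horizontal cut: split into an upper and a lower part.
        return (_cut(grid, top, left, top + row_gap[0], right, min_gap)
                + _cut(grid, top + row_gap[1], left, bottom, right, min_gap))
    # Vertical cut: split into a left and a right part.
    return (_cut(grid, top, left, bottom, left + col_gap[0], min_gap)
            + _cut(grid, top, left + col_gap[1], bottom, right, min_gap))


def xy_cut(grid, min_gap=1):
    """Return bounding boxes (top, left, bottom, right) of the regions."""
    if not grid or not grid[0]:
        return []
    return _cut(grid, 0, 0, len(grid), len(grid[0]), min_gap)
```

In this sketch the "no gap found" branch is the base case of the recursion, so a page with one region goes through the same code path as every leaf of a larger page.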

crater2150 commented on September 27, 2024

@wrznr I added examples for dataset preparation and training to the pixel classifier repository, after adding and polishing the tool for mask generation.

crater2150 commented on September 27, 2024

@wrznr As I mentioned before, the test should now output something; it only fails when just a single region is found on the page.
As I'm currently sick, I can't say for sure if I can fix that issue before the workshop, sorry :(

crater2150 commented on September 27, 2024

It seems that the model file shipped in this repo was broken. I replaced it with another one (trained on 9137 pages from DTA-2). This does produce a segmentation for me with make test-cli and a fresh environment. Can you retry it?

VolkerHartmann commented on September 27, 2024

Now the segmentation produces results. 👍
Attention: It will not work in combination with the dewarp of DFKI.
