
worddetector's Introduction

Word Segmentation with Scale Space Technique

Update 2021: installable Python package, added line clustering and word sorting

Implementation of the scale space technique for word segmentation proposed by R. Manmatha and N. Srimal. Even though the paper is from 1999, the method still achieves good results, is fast, and has a simple implementation. The algorithm takes an image containing words as input and outputs the detected words. Optionally, the words are sorted according to reading order (top to bottom, left to right).

[Figure: example]

Installation

  • Go to the root level of the repository
  • Execute pip install .
  • Go to tests/ and execute pytest to check if installation worked

Usage

This example loads an image of a text line, prepares it for the detector (1), detects words (2), sorts them (3), and finally shows the cropped words (4).

from word_detector import prepare_img, detect, sort_line
import matplotlib.pyplot as plt
import cv2

# (1) prepare image:
# (1a) convert to grayscale
# (1b) scale to specified height because algorithm is not scale-invariant
img = prepare_img(cv2.imread('data/line/0.png'), 50)

# (2) detect words in image
detections = detect(img,
                    kernel_size=25,
                    sigma=11,
                    theta=7,
                    min_area=100)

# (3) sort words in line
line = sort_line(detections)[0]

# (4) show word images
# one subplot row for the full line image plus one row per detected word
num_rows = len(line) + 1
plt.subplot(num_rows, 1, 1)
plt.imshow(img, cmap='gray')
for i, word in enumerate(line):
    print(word.bbox)
    plt.subplot(num_rows, 1, i + 2)
    plt.imshow(word.img, cmap='gray')
plt.show()

The repository contains some examples showing how to use the package:

  • Install requirements: pip install -r requirements.txt
  • Go to examples/
  • Run python main.py to detect words in line images (IAM dataset)
  • Or, run python main.py --data ../data/page --img_height 1000 --theta 5 to run the detector on an image of a page (also from IAM dataset)

The package contains the following functions:

  • prepare_img: prepares input image for detector
  • detect: detect words in image
  • sort_line: sort words in a (single) line
  • sort_multiline: cluster words into lines, then sort each line separately

For more details on the functions and their parameters use help(function_name), e.g. help(detect).

Algorithm

The illustration below shows how the algorithm works:

  • top left: input image
  • top right: apply filter to the image
  • bottom left: threshold filtered image
  • bottom right: compute bounding boxes

[Figure: illustration of the four steps]
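The four steps can be sketched on a toy binary image. This is a minimal pure-Python illustration under stated assumptions, not the package's implementation: a horizontal box filter stands in for the real filter kernel, and `smooth_rows`, `threshold`, and `bounding_boxes` are hypothetical helper names.

```python
from collections import deque


def smooth_rows(img, radius):
    """Average each pixel with its horizontal neighbours (crude stand-in for the low-pass filter)."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo, hi = max(0, x - radius), min(width, x + radius + 1)
            out[y][x] = sum(img[y][lo:hi]) / (hi - lo)
    return out


def threshold(img, level):
    """Binarize: 1 where the value exceeds level, else 0."""
    return [[1 if v > level else 0 for v in row] for row in img]


def bounding_boxes(mask):
    """Return (x, y, w, h) for every 4-connected component of non-zero pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # breadth-first search over the component containing (x0, y0)
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [x0], [y0]
            while queue:
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
                        xs.append(nx)
                        ys.append(ny)
            boxes.append((min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes


# Toy line image (1 = ink): two "words", each with a one-pixel gap between its "letters".
row = [int(c) for c in "1101100000011011"]
img = [row[:], row[:]]

raw_boxes = bounding_boxes(img)  # no filtering: every letter group is its own component
word_boxes = bounding_boxes(threshold(smooth_rows(img, radius=1), 0.5))
print(len(raw_boxes), word_boxes)
```

Without the smoothing step the letter groups are detected separately. With it, the small gaps inside a word close while the large gap between words stays open, which is the effect the filter is meant to have.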

The filter kernel with size=25, sigma=5 and theta=3 is shown below on the left. It models the typical shape of a word, with the width larger than the height (in this case by a factor of 3). On the right the frequency response is shown (DFT of size 100x100). The filter is in fact a low-pass, with different cut-off frequencies in x and y direction.

[Figure: kernel (left) and its frequency response (right)]

How to select parameters

  • The algorithm is not scale-invariant
    • The default parameters give good results for a text height of 25-50 pixels
    • If working with lines, resize the image to 50 pixels height
    • If working with pages, resize the image so that the words have a height of 25-50 pixels
  • The sigma parameter controls the width of the Gaussian function (standard deviation) along the x-direction. Small values might lead to multiple detections per word (over-segmentation), while large values might lead to a single detection containing multiple words (under-segmentation)
  • The kernel size depends on the sigma parameter and should be chosen large enough to contain as much of the non-zero kernel values as possible
  • The average aspect ratio (width/height) of the words to be detected is a good initial guess for the theta parameter
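The rules of thumb above can be turned into a few small helpers. This is only a sketch: the function names are hypothetical, the target word height of 40 pixels is an arbitrary pick from the 25-50 pixel range, and the kernel coverage is estimated with a one-dimensional Gaussian, which is a simplification of the actual two-dimensional kernel.

```python
import math


def page_scale_factor(current_word_height, target_word_height=40):
    """Resize factor that brings words to target_word_height pixels (assumed target inside 25-50)."""
    return target_word_height / current_word_height


def gaussian_mass_inside(kernel_size, sigma):
    """Fraction of a 1D Gaussian with standard deviation sigma covered by a centred window of kernel_size pixels."""
    half = kernel_size / 2
    return math.erf(half / (sigma * math.sqrt(2)))


def initial_theta(boxes):
    """Mean width/height ratio of (x, y, w, h) word boxes, as a first guess for theta."""
    return sum(w / h for _, _, w, h in boxes) / len(boxes)


# Words are currently about 80 pixels tall, so the page should be shrunk by this factor.
print(page_scale_factor(80))

# A 25-pixel window covers more of a narrow Gaussian than of a wide one.
print(gaussian_mass_inside(25, 5), gaussian_mass_inside(25, 11))

# Two annotated words with aspect ratios 3 and 5.
print(initial_theta([(0, 0, 90, 30), (0, 0, 150, 30)]))
```

If the covered fraction is low for the chosen sigma, a larger kernel size is worth trying.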

The best way to find the optimal parameters is to use a dataset (e.g. IAM) and optimize the parameters w.r.t. some evaluation metric (e.g. intersection over union).
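As an illustration of the evaluation metric mentioned here, intersection over union for two axis-aligned boxes can be computed as follows. The `(x, y, w, h)` box layout and the function name `iou` are assumptions made for this sketch, not the package's API.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes; 0.0 if they do not overlap."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # width and height of the overlapping rectangle, clamped at zero
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


# A detection shifted by half a word width against its ground-truth box.
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))
```

A parameter search would then, for example, average the IoU between each ground-truth word and its best-matching detection and keep the parameter set with the highest score.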

Results

This algorithm gives good results on datasets with large inter-word-distances and small intra-word-distances like IAM. However, for historical datasets like Bentham or Ratsprotokolle results are not very good and more complex approaches should be preferred (e.g., a neural network based approach as implemented in the WordDetectorNN repository).

worddetector's People

Contributors

atsju, githubharald


worddetector's Issues

Add requirements?

Hi :)
I am going to test the software right now and just saw that there is no precise requirements.txt (or equivalent) listing the dependencies. I think adding one would make reuse easier :)

sort_multiline could be improved

[Attached image]

I think that people's handwriting will drift randomly, but the end of the previous word will be at about the same height as the next word. Also, words cannot overlap too much in X.
Thus the line splitter should compare the Y position of words that are near each other in X. The end of the previous line does not necessarily have to be compared with the beginning of the next one.
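One possible reading of this suggestion, as a sketch only: walk through the words from left to right and attach each word to the line whose most recently added word is vertically closest. The function `cluster_lines`, the `(x, y, w, h)` box layout, and the `max_dy_ratio` tolerance are hypothetical and not part of the package. The X-overlap constraint from the suggestion is not modelled here.

```python
def cluster_lines(boxes, max_dy_ratio=0.5):
    """Group (x, y, w, h) word boxes into lines by chaining horizontally neighbouring words."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):  # process words left to right
        x, y, w, h = box
        center_y = y + h / 2
        best_line, best_dy = None, None
        for line in lines:
            # compare only against the last (rightmost so far) word of each line
            _, prev_y, _, prev_h = line[-1]
            dy = abs(center_y - (prev_y + prev_h / 2))
            if dy <= max_dy_ratio * max(h, prev_h) and (best_dy is None or dy < best_dy):
                best_line, best_dy = line, dy
        if best_line is None:
            lines.append([box])  # no line is close enough: start a new one
        else:
            best_line.append(box)
    lines.sort(key=lambda line: line[0][1])  # top to bottom by the first word of each line
    return lines


# Two lines that drift downwards by 4 pixels per word; the end of the first line
# (y=12) is almost at the height of the start of the second line (y=14).
first = [(0, 0, 80, 10), (100, 4, 80, 10), (200, 8, 80, 10), (300, 12, 80, 10)]
second = [(0, 14, 80, 10), (100, 18, 80, 10), (200, 22, 80, 10), (300, 26, 80, 10)]
for line in cluster_lines(second + first):
    print(line)
```

Because each word is compared only with its left neighbour in a line, the accumulated drift along a line does not matter, only the step between adjacent words.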

regarding using the image on arbitrary pages

Hello, as mentioned in the README, words should have a height in the range of 25-50 pixels. So I am assuming that the image needs to be resized before feeding it to the detector. Any ideas on how to do this? For example:
This is from IAM Bern, and unfortunately the algorithm fails with the current settings.
It also fails if I crop a line from my page and feed it in.
The line image, the page image, and the output are enclosed.
[Attached images: 6, a01-000x, line_output]

Segmentation from page

Hello. Thank you for the repository. I have just one question: how can I segment words from a whole page rather than from a single line? Thanks.

pip install error with sklearn

sklearn was causing a pip install error for me. I replaced 'sklearn' with 'scikit-learn' in setup.py to fix it.

Error is as follows:

Collecting sklearn (from word-detector==1.0.0)
Using cached sklearn-0.0.post4.tar.gz (3.6 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\taylor.clark\AppData\Local\Temp\pip-install-wbsdua8u\sklearn_20f10a8c07054379807f2e8ea39eed76\setup.py", line 10, in
LONG_DESCRIPTION = f.read()
^^^^^^^^
File "C:\Program Files\Python311\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 7: character maps to <undefined>
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Not working properly

I have tried it with a scanned image and it is not working properly: the output is so small that it cannot be seen properly.
Also, it is not working for multiple lines.

Dll error

I am getting this error while running python main.py

Traceback (most recent call last):
File "C:\Users\ELCOT\Downloads\WordDetector-master\examples\main.py", line 5, in <module>
import matplotlib.pyplot as plt
File "C:\Users\ELCOT\AppData\Local\Programs\Python\Python312\Lib\site-packages\matplotlib\__init__.py", line 272, in <module>
check_versions()
File "C:\Users\ELCOT\AppData\Local\Programs\Python\Python312\Lib\site-packages\matplotlib\__init__.py", line 266, in check_versions
module = importlib.import_module(modname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ELCOT\AppData\Local\Programs\Python\Python312\Lib\importlib\__init__.py", line 90, in import_module
return _bootstrap.gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ELCOT\AppData\Local\Programs\Python\Python312\Lib\site-packages\kiwisolver\__init__.py", line 8, in <module>
from ._cext import (
ImportError: DLL load failed while importing _cext: The specified module could not be found.

No module named 'word_detector'

I am a student working with Python and was trying out WordDetector. How do I fix this?

from word_detector import prepare_img, detect, sort_line
ModuleNotFoundError: No module named 'word_detector'

Printing detected words

After setting up the project, I realised that it does not print the detected words; it only shows them as images.
