MangaTextDetection

Experiments in text localization and detection in raw manga scans, mostly using the OpenCV Python API.

Overview

This repository holds some experiments I did in summer 2013 during a sudden interest in text detection in images. It uses some standard techniques (run length smoothing, connected component analysis) and some experimental stuff. Overall, I was able to get in the neighborhood of where I wanted to be, but the approach is very processing intensive and the results are not terribly reliable.
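For readers unfamiliar with run length smoothing, here is a minimal sketch of the idea in plain Python. It is not taken from the repository's scripts: the function names `smooth_runs` and `rlsa`, and the representation of the page as nested lists of 0/1 pixels, are assumptions made for illustration. Short background gaps between foreground pixels are filled so that nearby characters merge into larger blobs.

```python
def smooth_runs(row, max_gap):
    """Run length smoothing on one row of 0/1 pixels.

    Background runs (0) no longer than max_gap that sit between two
    foreground pixels (1) are filled in. Leading and trailing
    background is left untouched.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel
    for i, px in enumerate(row):
        if px:
            if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def rlsa(image, max_gap, vertical=False):
    """Smooth every row (or every column if vertical=True) of a binary image."""
    if not vertical:
        return [smooth_runs(r, max_gap) for r in image]
    # Transpose, smooth the columns as rows, then transpose back.
    cols = [smooth_runs(c, max_gap) for c in zip(*image)]
    return [list(r) for r in zip(*cols)]
```

Since manga text is often set vertically, the `vertical=True` pass is likely the more relevant one here; a suitable `max_gap` would have to be tuned to the scan resolution.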

State

I haven't bothered to form this into a Python library. It's just a series of scripts, each trying out various things, such as:

  • Isolating bounding boxes for text areas on a raw manga page.
  • Identifying areas of furigana text (pronunciation guide, which can screw up OCR) in text bounding boxes.
  • Preparing identified text areas for basic OCR.
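As an illustration of the first task, bounding boxes can be derived from a binarized page by connected component analysis. The sketch below is a plain-Python flood fill, not the repository's code; the name `component_boxes` and the nested-list image format are assumptions. In practice a library routine (for example from OpenCV or SciPy) would be much faster.

```python
from collections import deque


def component_boxes(image):
    """Return (x, y, width, height) for each 4-connected blob of 1-pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel,
            # tracking the extent of the blob as we go.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```

Boxes come back in row-major order of each blob's first pixel.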

Text Location Example

Here's an example run of a page from Weekly Young Magazine #31 2013. The input image is as follows (jpg).

Input image

An initial estimate of text locations can be found by the 'LocateText.py' script:

 ../LocateText.py '週刊ヤングマガジン31号194.jpg' -o 194_text_locations.png

With the results as follows (estimated text marked with red boxes):

locate text output

Note that the output above shows several of the implementation's deficiencies. For example, there are several small false positives scattered around, and some major false positives on the girl's sleeve and eyes in panels 2 and 3. Also note that many large areas of text were not detected (false negatives). Despite how pleased I was with the results (and I was more pleased than you could possibly believe), significant improvements are needed.
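To put numbers on false positives and false negatives, detected boxes could be compared against hand-labeled ones. The following is a hypothetical sketch only: the repository does not include ground-truth data or an evaluation script, and the function names, the (x, y, width, height) box format, and the 0.5 overlap threshold are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / float(aw * ah + bw * bh - inter)


def score_detections(detected, truth, threshold=0.5):
    """Return (true_positives, false_positives, false_negatives).

    Each ground-truth box can be matched by at most one detection.
    """
    unmatched = list(truth)
    tp = 0
    for box in detected:
        match = next((t for t in unmatched if iou(box, t) >= threshold), None)
        if match is not None:
            unmatched.remove(match)
            tp += 1
    return tp, len(detected) - tp, len(unmatched)
```

A detection on the sleeve with no overlapping labeled text box would count as a false positive, and a labeled speech bubble with no overlapping detection as a false negative.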

Text Segmentation Example

To more easily separate text from background, you can also segment the image, with text and non-text areas separated into different (RGB) color channels. This makes it easy to remove the estimated text from the image entirely, or vice versa. Use the command:

./segmentation.py '週刊ヤングマガジン31号194.jpg' -o 194_segmentation.png

The results follow:

Input image
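The channel-separation idea can be sketched as follows. This is an assumption-laden illustration rather than what segmentation.py actually does: the README does not say which channel holds which layer, so the choice of red for text and green for non-text, the inversion of intensities (so that dark ink stays visible in its channel), and both function names are hypothetical.

```python
def segment_to_channels(gray, text_mask):
    """Build an RGB image with text ink in red and other ink in green.

    gray      -- rows of 0-255 grayscale values (0 = black ink)
    text_mask -- rows of 0/1 flags, 1 where text was estimated
    Intensities are inverted so that ink becomes a high channel value.
    """
    rgb = []
    for gray_row, mask_row in zip(gray, text_mask):
        rgb.append([(255 - v, 0, 0) if m else (0, 255 - v, 0)
                    for v, m in zip(gray_row, mask_row)])
    return rgb


def extract_channel(rgb, channel):
    """Recover one layer (0 = red/text, 1 = green/non-text) as a 2D list."""
    return [[px[channel] for px in row] for row in rgb]
```

Dropping the red channel then leaves the artwork without the estimated text, and dropping the green channel leaves only the text.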

OCR and HTML Generation

I did take the time to run simple OCR on some of the located text regions, with mixed results. I used the Python Tesseract package (pytesser), but found the results were generally not good for vertical text, among other issues. The script ocr.py should run OCR on detected text regions and output the results to the command line.

../ocr.py '週刊ヤングマガジン31号194.jpg'
Test blob assigned to no row on pass 2
Test blob assigned to no row on pass 3
0,0 1294x2020 71% :ぅん'・ 結局
玉子かけご飯が
一 番ぉぃしぃと

从
胤
赫
囃
包
け
H」
の
も
側
鵬

はフィクショ穴ぁり、 登場する人物

※この物語

You can see some fragmented positives, but overall the results for this page are abysmal.

I also embedded those results in an HTML output, allowing "readers" to hover over Japanese text to reveal the OCR output, which can be edited/copied/pasted. This is done via the script MangaDetectText. A (more successful) example of this can be seen below:

locate text output
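A rough idea of how such a hover overlay might be generated is sketched below. This is not the MangaDetectText script: the function name `render_overlay`, the region tuple layout, and the use of a plain `title` tooltip (which shows text on hover but, unlike the output described above, is not editable) are all assumptions. It also assumes Python 3 for `html.escape`.

```python
from html import escape


def render_overlay(image_url, regions):
    """Return HTML that overlays hoverable boxes on a page image.

    regions -- iterable of (x, y, width, height, ocr_text) tuples,
               with coordinates in image pixels.
    """
    parts = ['<div style="position:relative;display:inline-block">',
             '<img src="%s" alt="page">' % escape(image_url, quote=True)]
    for x, y, w, h, text in regions:
        # One absolutely positioned, transparent box per text region;
        # the browser shows the title attribute as a tooltip on hover.
        parts.append(
            '<div title="%s" style="position:absolute;'
            'left:%dpx;top:%dpx;width:%dpx;height:%dpx"></div>'
            % (escape(text, quote=True), x, y, w, h))
    parts.append('</div>')
    return '\n'.join(parts)
```

Escaping the OCR text matters here because Tesseract output can contain quotes and angle brackets.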

Dependencies

You should be able to install most of the dependencies via pip, or you could use your operating system's package manager (e.g. Homebrew on Mac OS X, http://brew.sh/).

Python 2.7+

https://www.python.org/

Install as per OS instructions.

Pip

http://pip.readthedocs.org/en/latest/index.html

Install as per OS instructions.

Numpy

http://www.numpy.org/

pip install numpy

Scipy

http://www.scipy.org/index.html

pip install scipy

Matplotlib (contains PyLab)

http://matplotlib.org/

pip install matplotlib

Pillow

http://pillow.readthedocs.org/en/latest/

pip install Pillow

OpenCV

http://opencv.org/

Install as per OS instructions; this should also include the Python bindings.

Tesseract

https://code.google.com/p/tesseract-ocr/

Install as per OS instructions, then use pip to install the Python bindings. Don't forget to include your target language's trained data sets.

pip install python-tesseract
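After installing, a quick check like the one below can confirm that the Python packages are importable. It is a convenience sketch, not part of the repository; the helper name `missing_dependencies` is hypothetical and the snippet assumes Python 3.4+ for `importlib.util.find_spec`. The Tesseract binding is left out because its import name depends on which binding gets installed.

```python
import importlib.util

# Import name -> package name as listed above.
REQUIRED = {
    "numpy": "Numpy",
    "scipy": "Scipy",
    "matplotlib": "Matplotlib",
    "PIL": "Pillow",
    "cv2": "OpenCV",
}


def missing_dependencies(modules=REQUIRED):
    """Return the sorted package names whose modules cannot be found."""
    return sorted(label for name, label in modules.items()
                  if importlib.util.find_spec(name) is None)


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed Python dependencies were found.")
```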

Contributors

johnoneil, on-three, baerrach
