Code Monkey home page Code Monkey logo

imagetextrecognition's Introduction

Text Recognition in Natural Images in Python

This repository contains the code for the OCR Project I'm working on as part of Data Science Retreat (Berlin).

The idea is to be able to get as input an image (i.e. picture taken with phone) from a user and process it in order to return the text contained in it.

The file you should look at first is main.py. This file instantiates classes and calls appropriate classes methods.

Class UserData (contained in userimageski.py) is instantiated passing to the constructor the image filename to be processed.

Class Cifar (contained in cifar.py) is instantiated passing a config file (with all the parameters) to the constructor. Specifically cifar-config.py, which is used in the merge-with-cifar method inside OcrData class in order to build the dataset to perform the text/no-text classification. The data mentioned inside cifar-config.py was too big to be uploaded on Github but it is available on CIFAR-10 Kaggle Competition.

Class OcrData (contained in data.py) is instantiated passing a config file (with all the parameters) to the constructor. Specifically ocr-config.py and text-config.py are both used in two different contextes. The first one is called to perform the machine learning pipeline on character images. The second one is called only once inside the merge-with-cifar method inside OcrData class in order to build the dataset to perform the text/no-text classification.

The data used for this project is the Chars74K dataset which can be found here.

A complete explanation of the work can be found on my website.


Ideas for improvement:

  1. Get rid of nested rectangles in object detection. Solves the problem of detecting a circle (classified as an 'o') inside an 'a'.

  2. Manually labeling objects containing or not containing text. It is possible to add a wait_for_key during the object detection phase and as soon as a rectangle is identified manually specify if it's text or not. For example a tree may be miscassified as text and then classified as a T. Manual detection is very time consuming and before diving into that it is necessary to analyze the pipeline and be sure that it is worth doing it.

  3. Introduce as a final step a 'Guess Missing Text Phase' to correct little mistakes. For example if in the end we should detect the word 'house' but we identify 'hous', well of course that's a house!

  4. Implement Neural Network and Deep Learning.

imagetextrecognition's People

Contributors

frapochetti avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.