Code Monkey home page Code Monkey logo

neerajj9 / text-detection-using-yolo-algorithm-in-keras-tensorflow Goto Github PK

View Code? Open in Web Editor NEW
146.0 5.0 58.0 3.36 MB

Implemented the YOLO algorithm for scene text detection in keras-tensorflow (No object detection API used) The code can be tweaked to train for a different object detection task using YOLO.

License: MIT License

Jupyter Notebook 99.50% Python 0.50%
text-detection object-detection yolo yolov2 python keras keras-tensorflow tensorflow scene-text-detection deep-learning

text-detection-using-yolo-algorithm-in-keras-tensorflow's Introduction

Text-Detection-using-Yolo-Algorithm-in-keras-tensorflow

First step towards building an efficient OCR system is to find out the specific text locations. Implemented the YOLO ( You Only Look Once ) algorithm from scratch (no object detection API used) for the specific task of Scene Text Detection in python using keras and tensorflow.

Data :

The dataset used is ICDAR competetion dataset available here : Drive Link
Train images = 376
Validation images = 115

Preprocessing :

The Preprocess.py file handles all the necessary preprocessing and saves the data in the form of numpy arrays. First, the images are resized to (512,512) dimensions. Accordingly, the ground truth of the boxes is modified as well. All the images are normalized to a range of [-1 , 1]. The ground truth coordinates are processed to form a matrix of dimensions as ( grid height , grid width , 1 , 5 ).

Custom data :

Necessary changes need to be done in the Preprocess.py file to input the custom data and images, to create the appropriate input and output matrices.

Model :

MobileNetv2 architecture is used as a feture extractor. The main reason for choosing MobileNetv2 is the high accuracy and the less number of weights. The fully connected layers of MobileNet are removed. Three Conv layers are added to the last layer of the MobileNet architecture to output a shape of (grid height , grid width , 1 , 5 ). The model weights can be found here : Drive Link

Loss Function and Training:

The loss function implemented is the one specified in the YOLO paper. As there is only one class to be predicted, the contribution of class predictions to the loss is eliminated.

The model is trained for 180 epochs in total with a batch size of 4. The learning rate was kept at 0.001 for the first 100 epochs and lowered to 0.0001 for the next 80 epochs.

Inference :

The Utils.py consists of the functions used to convert the matrix output ( grid height , grid width , 1 , 5 ) of the model to actual predicted bounding boxes. Non max suppression is used to eliminate boxes on the same object as stated in the YOLO paper.

Results :

alt text
alt text
alt text

Requirements :

  1. Keras : 2.2.2
  2. Tensorflow : 1.9.0
  3. OpenCV : 3.4.1
  4. Numpy : 1.14.3
  5. Matplotlib : 2.2.2

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.