Video Text Detection and Recognition

This is an implementation of an end-to-end pipeline to detect and recognize text in YouTube videos. Text detection is based on SSD: Single Shot MultiBox Detector, retrained on a single text class using the Coco-Text dataset. Text recognition is based on a Convolutional Recurrent Neural Network, as described in An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition.

Slide deck

Please see the Demo notebook as a starting point. Use it to provide a YouTube URL and either:

  1. Get text detection/recognition results in JSON format, or
  2. Generate a new video with overlaid bounding boxes for all detected text and their respective transcriptions.
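
The exact JSON schema is defined by the notebook and videotext.py and is not reproduced here. As a rough sketch only, assuming a hypothetical per-frame layout (the function name and the field names frame, detections, box, and text are illustrative, not taken from the repository), results of this kind could be serialized as follows:

```python
import json


def results_to_json(frame_results):
    """Serialize per-frame detections to a JSON string.

    frame_results maps a frame index to a list of (box, text) pairs, where
    box is (xmin, ymin, xmax, ymax) in pixels. This layout is hypothetical.
    """
    payload = [
        {
            "frame": frame_idx,
            "detections": [
                {"box": list(box), "text": text} for box, text in detections
            ],
        }
        for frame_idx, detections in sorted(frame_results.items())
    ]
    return json.dumps(payload, indent=2)


# One frame with a single detection, one frame with none.
example = {0: [((10, 20, 110, 60), "hello")], 30: []}
print(results_to_json(example))
```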

Requirements

All requirements are listed in requirements.txt. Activate your preferred virtual environment and install them with pip install -r requirements.txt.

Directory structure:

  • Demo.ipynb: Demo notebook as described above
  • videotext.py: Main entry point which connects various pieces of the pipeline.
  • detection.py and detection: detection.py abstracts detection functionality. See the Detection section for more details.
  • recognition.py and crnn.pytorch: recognition.py abstracts recognition functionality. See the Recognition section for more details.
  • utilities.py: Holds all other helper functions required for end-to-end video text detection and recognition.
  • data_explore_eval: Contains dataset-specific utilities and evaluation scripts. Also contains scripts to generate submissions for the ICDAR17 Robust Reading Competition on Coco-Text.

Detection

Our detection model is based on TensorFlow's object detection models and the detection model zoo.

We apply transfer learning to an SSD MobileNet network. The original network was trained on the COCO dataset (natural objects) for the detection task. We retrain it for text detection (a single class) using the Coco-Text dataset.

Inference

detection.py loads a frozen TensorFlow inference graph and runs inference on our data.
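
Graphs exported by TensorFlow's object detection API return boxes as normalized [ymin, xmin, ymax, xmax] coordinates together with a confidence score per box. The sketch below shows one way such output could be filtered and converted to pixel coordinates before cropping text regions. The function name and the 0.5 score threshold are assumptions for illustration and are not taken from detection.py.

```python
def to_pixel_boxes(boxes, scores, image_width, image_height, min_score=0.5):
    """Keep confident detections and convert them to pixel coordinates.

    boxes: list of [ymin, xmin, ymax, xmax], each normalized to [0, 1].
    scores: list of confidence scores, aligned with boxes.
    Returns a list of (xmin, ymin, xmax, ymax, score) tuples in pixels.
    """
    kept = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # drop low-confidence detections
        kept.append((
            int(round(xmin * image_width)),
            int(round(ymin * image_height)),
            int(round(xmax * image_width)),
            int(round(ymax * image_height)),
            score,
        ))
    return kept


# Two detections on a 640x480 frame; the second falls below the threshold.
print(to_pixel_boxes([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]],
                     [0.9, 0.3], 640, 480))
```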

Training

Please follow the instructions provided by TensorFlow's object detection API, along with the scripts and configs provided in the detection/ folder.

Model configs

We have also experimented with Faster R-CNN pretrained on COCO, for which we provide the config file as well.

  1. ssd_mobilenet_v1_coco.config
  2. faster_rcnn_resnet101_pets_coco.config

Class definition

See text.pbtxt
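
The contents of text.pbtxt are not reproduced here. For orientation, label maps for TensorFlow's object detection API are lists of item entries whose ids start at 1 (0 is reserved for background). The sketch below generates such a file for a single class. The helper name and the class name "text" are assumptions.

```python
def build_label_map(class_names):
    """Return label map text in pbtxt form, with ids starting at 1."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


# Single-class label map, assuming the class is called "text".
print(build_label_map(["text"]))
```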

Generate TFRecords

The script used to generate TFRecords for this model is at coco-text/Coco-Text%20to%20TFRecords.ipynb
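
One step in any such conversion is changing the box format: COCO-style annotations store a box as [x, y, width, height] in pixels, while the object detection API expects normalized xmin, ymin, xmax, ymax values in its TFRecords. A minimal sketch of that step, with a hypothetical helper name and without the TFRecord writing itself, might look like this:

```python
def coco_bbox_to_normalized(bbox, image_width, image_height):
    """Convert a COCO-style [x, y, width, height] pixel box to normalized
    corner coordinates, clamped to the [0, 1] range."""
    x, y, w, h = bbox

    def clamp(value):
        return min(max(value, 0.0), 1.0)

    return {
        "xmin": clamp(x / image_width),
        "ymin": clamp(y / image_height),
        "xmax": clamp((x + w) / image_width),
        "ymax": clamp((y + h) / image_height),
    }


# A 128x96 box at (64, 48) in a 640x480 image.
print(coco_bbox_to_normalized([64, 48, 128, 96], 640, 480))
```

Clamping guards against annotations that extend slightly past the image border. Whether the notebook does the same is not stated in this README.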

Recognition

We use a Convolutional Recurrent Neural Network (CRNN) for recognition.

Inference

recognition.py holds helper functions for the recognition task: it loads the CRNN weights file and runs inference. It is adapted from the Caffe implementation by the paper's authors (Shi et al.) and the PyTorch implementation by @meijieru. See the crnn.pytorch folder for more details, and the original implementation for training instructions.
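
A CRNN emits one class prediction per horizontal time step, and these are turned into a string by CTC decoding: collapse consecutive repeats, then drop the blank symbol. The sketch below shows greedy CTC decoding in plain Python. The alphabet and the choice of index 0 as the blank are assumptions here, and the function is illustrative rather than the code in recognition.py.

```python
# Assumed character set: index 0 is the CTC blank, index i > 0 is ALPHABET[i - 1].
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def ctc_greedy_decode(indices, alphabet=ALPHABET, blank=0):
    """Collapse repeated indices, then drop blanks, and map to characters."""
    chars = []
    previous = blank
    for idx in indices:
        # Emit a character only when it is not blank and differs from the
        # previous time step (repeats without a blank between them merge).
        if idx != blank and idx != previous:
            chars.append(alphabet[idx - 1])
        previous = idx
    return "".join(chars)


def encode(char):
    """Map a character to its class index under the assumed alphabet."""
    return ALPHABET.index(char) + 1


# Per-time-step argmax output for "hello"; the blank separates the two l's.
steps = [encode("h"), encode("h"), 0, encode("e"),
         encode("l"), encode("l"), 0, encode("l"), encode("o"), 0]
print(ctc_greedy_decode(steps))
```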

Web server

We have a basic web server that serves video analysis requests. To start it, run the following in this directory:

$ python flask_server.py

Files:

  • flask_server.py - Contains basic flask app
  • templates - Contains html for flask app
  • static - Holds demo videos

Data Explore and evaluation

See directory data_explore_eval/

Coco-text:

  • coco-text: Helper functions to work with Coco-Text data. Also contains the Coco-Text Preparation notebook, which converts Coco-Text to TFRecords for use with the TensorFlow detection model.
  • Eval_Coco_text_val_set.ipynb: Contains code to evaluate our model on coco-text benchmark
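
The evaluation protocol used in the notebook is not described here. Detection benchmarks generally match predicted boxes to ground-truth boxes by intersection over union (IoU), so a minimal sketch of that measure is given below. The function name and the (xmin, ymin, xmax, ymax) box layout are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes overlapping by half their width.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A prediction is then typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold.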

SynthText:

  • synth_utils.py: Helper script to prepare SynthText data
  • SynthText Data Preparation notebook [in progress]: Scripts to convert SynthText data to TFRecords for use with the TensorFlow detection model

This directory also contains a script to generate submissions for ICDAR17 and run evaluations offline.

Unit Tests

Unit tests currently cover the video functionality; more tests still need to be added. To run them:

$ python -m pytest test_utilities.py

Assets

Download the weights from Google Drive and put them in a folder named weights/
