Deep Learning Handwriting Recognition

A full stack React/JavaScript and Python/Django web application that recognizes handwriting and converts it into text by combining multiple machine learning models that were pre-trained on the EMNIST dataset from Kaggle. These neural network models recognize all digits, all uppercase letters, and the lowercase letters that look visibly different from their uppercase counterparts.

The models were trained on the following characters: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt

To account for the "left out" lowercase letters that look like their uppercase counterparts, the final prediction for these characters is converted to lowercase if the character is drawn less than half the height of the canvas. For the "tall" lowercase letters among them (k, l, p, and y), the prediction is converted to lowercase if the character's height is less than 70% of the canvas height.
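A minimal sketch of this casing rule, assuming the prediction is a single character and that heights are measured in pixels. The function name apply_case and the two set constants are hypothetical; only the 50% and 70% thresholds and the letter lists come from the description above.

```python
# Lowercase letters the models were trained on (from the character list above).
TRAINED_LOWERCASE = set("abdefghnqrt")
# "Tall" lowercase letters that get the looser 70% threshold.
TALL_LOWERCASE = set("klpy")


def apply_case(pred: str, char_height: int, canvas_height: int) -> str:
    """Choose the case of a predicted letter from its drawn height (sketch)."""
    if not pred.isalpha():
        return pred  # digits are left untouched
    lower = pred.lower()
    if lower in TRAINED_LOWERCASE:
        return pred  # the model already distinguishes both cases
    # Untrained lowercase letters: decide by height relative to the canvas.
    threshold = 0.7 if lower in TALL_LOWERCASE else 0.5
    if char_height < threshold * canvas_height:
        return lower
    return pred
```

For example, a "Y" drawn 60 pixels tall on a 100-pixel canvas would come back as "y", while an "O" of the same height would stay uppercase.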

The best single model used in this application is more accurate than the other TensorFlow/Keras models published by Kaggle users for this dataset. Going further, when this model, a similar model, and 3 other sub-optimal models (sub-optimal due to Heroku limitations) are combined, accuracy increases by another 0.5%.
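Combining the models comes down to a plurality vote over their predictions. A minimal sketch, assuming each model's output has already been reduced to a class index (for example with argmax over its softmax output) and that the indices follow the order of the 47-character list above. The name ensemble_predict is hypothetical.

```python
from collections import Counter

# The 47 trained classes, assumed to be in model output-index order
# (digits, uppercase letters, then the distinct lowercase letters).
MAPPING = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt"


def ensemble_predict(class_indices: list) -> str:
    """Plurality vote over the class index predicted by each model.

    Ties go to the label seen first, since Counter.most_common orders
    equal counts by first occurrence.
    """
    labels = [MAPPING[i] for i in class_indices]
    return Counter(labels).most_common(1)[0][0]


# Five models, three of which vote for index 17 ("H").
print(ensemble_predict([17, 17, 43, 17, 10]))  # H
```

Voting on mapped characters rather than raw probabilities keeps each model's vote equal, which suits a mix of one strong model and several weaker ones.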

The Neural Network Models: Jupyter Notebook

The Jupyter Notebook inside this repo describes how the neural network models for this web application were created. It goes step by step, from acquiring the external training dataset to deploying on Heroku.

Demo

Features

The following characters can be predicted from handwriting: 0-9, a-z, A-Z (62 characters)
Characters can be placed anywhere on the canvas, provided that each character has some horizontal space between it and its neighbors
Whole sentences can be created
"Broken" and "messy" letters can be detected with pretty good accuracy
React as the frontend

Website: Live Heroku App

How the Incoming Data Is Fed into the Models

Example: A user writes and submits the handwriting, "Hey you", on the client.
The frontend takes the image data found in the canvas element and converts it into a binary blob.
The blob is sent as a POST request to Django.
Django saves the image, and its file path is loaded with cv2 (OpenCV).
The entire "Hey you" image is trimmed of excess blank pixels.
"Hey you" is split into one image per character, giving us the 6 images "H", "e", "y", "y", "o", "u".
- The image is cut wherever the drawing has a gap in the x-direction (a run of blank columns) that is reasonably wide. Small gaps are left alone, so a broken stroke stays part of one character.
- The algorithm will notice a very large gap in the x-direction between the two "y" letters, which is interpreted as a space between words. This position is stored in the variable space_location.
Each image is trimmed of excess pixels. The height of each trimmed ("raw") image is recorded in the variable char_img_heights.
Each image is padded with blank pixels until it is square, so that the character is not warped when the image is resized down during normalization.
Each image is normalized: it is converted to a numpy array and reshaped, and its pixel values are scaled from the range 0 to 255 down to 0 to 1.
We loop through all of these images, and each model makes a prediction for each image. The most popular prediction among the models is appended to the final result, final_prediction.
- Each model's prediction for an image is an integer from 0 to 46, the index of one of the 47 characters that each model was trained on (Ex: an output of 17 corresponds to H in the mapping).
- Each model's prediction is mapped to its character and compared with the predictions of the rest of the model group.
- The most popular prediction among the models in the group becomes the final prediction.
- If the final prediction is a letter, we check whether its lowercase complement is in the mapping. If it is not, the lowercase and uppercase forms of this letter look alike and differ only in size, so we decide the casing from the size of the image, which we get from char_img_heights. In this example, this decision is made for the images "y", "y", "o", and "u". The letter "y" gets a special threshold because it is taller than the average lowercase letter.
- While iterating, if the current loop index is in space_location, a " " is appended to the final result. In this example, space_location is [2], signaling that there is a space after the first "y", which gives us "Hey " at the end of that iteration.
Django responds to React with final_prediction, "Hey you", and React displays the result on the client.
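The image-side steps above can be sketched with NumPy as follows. This is a minimal sketch, not the repository's code: the helper names (trim, segment, pad_to_square, normalize, assemble), the gap thresholds min_gap and space_gap, and the white-on-black pixel convention are assumptions, and the cv2 loading and resizing steps are left out.

```python
import numpy as np

# Assumption: ink pixels are nonzero on a zero background (white on black,
# like EMNIST). A black-on-white canvas image would be inverted first.


def trim(img: np.ndarray) -> np.ndarray:
    """Crop the blank rows and columns surrounding the ink."""
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    if rows.size == 0:
        return img[:0, :0]  # nothing drawn
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def column_runs(inked: np.ndarray) -> list:
    """(start, end) ranges, end exclusive, of consecutive inked columns."""
    runs, start = [], None
    for x, on in enumerate(inked):
        if on and start is None:
            start = x
        elif not on and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(inked)))
    return runs


def segment(img: np.ndarray, min_gap: int = 3, space_gap: int = 20):
    """Split an image into character images at blank-column gaps.

    Gaps narrower than min_gap are ignored (broken strokes stay together);
    gaps of at least space_gap columns are recorded as word spaces.
    Returns (char_images, space_location).
    """
    merged = []
    for start, end in column_runs(img.any(axis=0)):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)  # small gap: same character
        else:
            merged.append((start, end))
    chars, space_location = [], []
    for i, (start, end) in enumerate(merged):
        chars.append(trim(img[:, start:end]))
        if i + 1 < len(merged) and merged[i + 1][0] - end >= space_gap:
            space_location.append(i)  # a space follows character i
    return chars, space_location


def pad_to_square(img: np.ndarray) -> np.ndarray:
    """Center the character on a square blank background so resizing won't warp it."""
    h, w = img.shape
    size = max(h, w)
    top, left = (size - h) // 2, (size - w) // 2
    return np.pad(img, ((top, size - h - top), (left, size - w - left)))


def normalize(img: np.ndarray) -> np.ndarray:
    """Scale 0..255 pixels to 0..1 and add batch and channel axes for Keras.

    Assumes img was already resized to the model's input size (for example
    28x28 with cv2.resize); that step is omitted here.
    """
    return (img.astype("float32") / 255.0).reshape(1, *img.shape, 1)


def assemble(chars: list, space_location: list) -> str:
    """Join predicted characters, adding a space after each index in space_location."""
    out = []
    for i, ch in enumerate(chars):
        out.append(ch)
        if i in space_location:
            out.append(" ")
    return "".join(out)
```

With this layout, char_img_heights would simply be [c.shape[0] for c in chars], and assemble(list("Heyyou"), [2]) rebuilds "Hey you".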

My Views on Hard-Coded Prediction Tweaks

After the neural networks have decided on a prediction, I try to be as hands-off as possible when it comes to manipulating the results.

The current prediction manipulations I use are:

Convert a letter to lowercase if it is drawn small and its lowercase form is not one of the EMNIST characters the models were trained on.
- The reason for this manipulation is to make every lowercase letter available as a prediction.
If a prediction is 0 and the character is drawn quite small, the prediction is changed to a lowercase o.
- The reason for this manipulation is so that a small 0 will be read as an o, just as a small uppercase O is.
- At this time, I am still hesitant about keeping this manipulation.
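The second manipulation can be sketched as a small post-processing function. The cutoff for "quite small" is not specified here, so small_ratio=0.5 (borrowed from the general lowercase rule) and the function name small_zero_to_o are assumptions.

```python
def small_zero_to_o(pred: str, char_height: int, canvas_height: int,
                    small_ratio: float = 0.5) -> str:
    """Read a small '0' as a lowercase 'o' (hypothetical threshold)."""
    if pred == "0" and char_height < small_ratio * canvas_height:
        return "o"
    return pred  # everything else passes through unchanged
```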

If Manipulations Are Your Thing

I left in commented-out code where, if either 0 or O is predicted, the final prediction depends on the height/width ratio of the character image. If a user writes a fat circle, the result will be a capital or lowercase O; if a user writes a narrow circle, the result will be the number 0.
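One way such a ratio check could look, as a sketch rather than a reproduction of the commented-out code: the name zero_or_o and the cutoff narrow_ratio=1.3 are assumptions. A fat circle returned as "O" could still be lowercased afterwards by the size-based casing rule.

```python
def zero_or_o(pred: str, height: int, width: int, narrow_ratio: float = 1.3) -> str:
    """Resolve 0 vs O from the character image's height/width ratio.

    Shapes taller than narrow_ratio times their width are read as the
    digit 0; rounder shapes are read as the letter O.
    """
    if pred not in ("0", "O", "o"):
        return pred
    if height / width > narrow_ratio:
        return "0"  # narrow circle
    # Fat circle: a letter. Keep the model's casing if it already chose a letter.
    return "O" if pred == "0" else pred
```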

For determining "i" vs "I" (another issue with the EMNIST dataset), one could write some code during the cv portion that checks whether a character has a hovering dot. One could also make a better height estimate for casing by taking the total character height and subtracting the space between the dot and the base of the "i".
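A rough NumPy sketch of that idea: look for blank rows that split a trimmed character image vertically, and subtract that gap from the total height. The function name dot_adjusted_height and the "first vertical gap is the dot gap" heuristic are assumptions; the check would also fire on other stacked shapes such as j, so it would likely need to be combined with the model's prediction.

```python
import numpy as np


def dot_adjusted_height(char_img: np.ndarray):
    """Detect a separate mark above the main stroke (like the dot of an 'i').

    Returns (height, has_dot). When blank rows split the character, the gap
    is subtracted from the total height. Assumes ink pixels are nonzero on
    a zero background.
    """
    rows = np.flatnonzero(char_img.any(axis=1))  # indices of inked rows
    if rows.size == 0:
        return 0, False
    total = int(rows[-1] - rows[0] + 1)
    breaks = np.flatnonzero(np.diff(rows) > 1)  # where inked rows jump
    if breaks.size == 0:
        return total, False
    # Treat the first vertical gap as the space between the dot and the stem.
    k = breaks[0]
    gap = int(rows[k + 1] - rows[k] - 1)
    return total - gap, True
```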

Installation

Clone the repo: git clone https://github.com/MikeM711/Deep-Learning-Handwriting-Recognition.git
Go into the root directory: cd Deep-Learning-Handwriting-Recognition
Install npm packages for React: npm install
Make sure you have pipenv installed via pip: sudo -H pip install pipenv
From the project root, create a shell inside a virtual environment: pipenv shell
Install packages for Django while inside your virtual environment: pip install -r requirements.txt
Run the frontend server: npm start
Run the backend server within your virtual environment: python manage.py runserver

Troubleshooting

Q: "How do I know that I am in my virtual environment?"
A: In your terminal tab, you will notice the project folder's name in parentheses at the start of the prompt. It should look like (Deep-Learning-Handwriting-Recognition)...

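Q: "Running pip install -r requirements.txt fails with an error." (Reported as "opencv-python-headless==4.1.2.30 changed and error in pip install req".) In the output below, opencv-python-headless downloads fine; the command fails while building psycopg2==2.8.4 from source because the pg_config executable cannot be found.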

Collecting absl-py==0.8.1
  Using cached absl-py-0.8.1.tar.gz (103 kB)
Collecting astor==0.8.0
  Using cached astor-0.8.0-py2.py3-none-any.whl (27 kB)
Collecting cachetools==3.1.1
  Using cached cachetools-3.1.1-py2.py3-none-any.whl (11 kB)
Collecting certifi==2019.9.11
  Using cached certifi-2019.9.11-py2.py3-none-any.whl (154 kB)
Requirement already satisfied: chardet==3.0.4 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 5)) (3.0.4)
Collecting cycler==0.10.0
  Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting dj-database-url==0.5.0
  Using cached dj_database_url-0.5.0-py2.py3-none-any.whl (5.5 kB)
Collecting Django==2.2.7
  Using cached Django-2.2.7-py3-none-any.whl (7.5 MB)
Collecting django-cors-headers==3.2.0
  Using cached django_cors_headers-3.2.0-py3-none-any.whl (14 kB)
Collecting django-heroku==0.3.1
  Using cached django_heroku-0.3.1-py2.py3-none-any.whl (6.2 kB)
Collecting djangorestframework==3.10.3
  Using cached djangorestframework-3.10.3-py3-none-any.whl (909 kB)
Collecting gast==0.2.2
  Using cached gast-0.2.2.tar.gz (10 kB)
Collecting google-auth==1.7.1
  Using cached google_auth-1.7.1-py2.py3-none-any.whl (74 kB)
Collecting google-auth-oauthlib==0.4.1
  Using cached google_auth_oauthlib-0.4.1-py2.py3-none-any.whl (18 kB)
Collecting google-pasta==0.1.8
  Using cached google_pasta-0.1.8-py3-none-any.whl (57 kB)
Collecting grpcio==1.25.0
  Using cached grpcio-1.25.0-cp38-cp38-manylinux2010_x86_64.whl (2.4 MB)
Collecting gunicorn==20.0.0
  Using cached gunicorn-20.0.0-py2.py3-none-any.whl (77 kB)
Collecting h5py==2.10.0
  Using cached h5py-2.10.0-cp38-cp38-manylinux1_x86_64.whl (2.9 MB)
Requirement already satisfied: idna==2.8 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 19)) (2.8)
Collecting Keras-Applications==1.0.8
  Using cached Keras_Applications-1.0.8-py3-none-any.whl (50 kB)
Collecting Keras-Preprocessing==1.1.0
  Using cached Keras_Preprocessing-1.1.0-py2.py3-none-any.whl (41 kB)
Collecting kiwisolver==1.1.0
  Using cached kiwisolver-1.1.0-cp38-cp38-manylinux1_x86_64.whl (91 kB)
Collecting Markdown==3.1.1
  Using cached Markdown-3.1.1-py2.py3-none-any.whl (87 kB)
Collecting matplotlib==3.1.1
  Using cached matplotlib-3.1.1.tar.gz (37.8 MB)
Collecting numpy==1.17.4
  Using cached numpy-1.17.4-cp38-cp38-manylinux1_x86_64.whl (20.5 MB)
Requirement already satisfied: oauthlib==3.1.0 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 26)) (3.1.0)
Collecting opencv-python-headless==4.1.2.30
  Downloading opencv_python_headless-4.1.2.30-cp38-cp38-manylinux1_x86_64.whl (21.8 MB)
     |████████████████████████████████| 21.8 MB 22.5 MB/s
Collecting opt-einsum==3.1.0
  Downloading opt_einsum-3.1.0.tar.gz (69 kB)
     |████████████████████████████████| 69 kB 5.9 MB/s
Collecting Pillow==6.2.1
  Downloading Pillow-6.2.1-cp38-cp38-manylinux1_x86_64.whl (2.1 MB)
     |████████████████████████████████| 2.1 MB 42.6 MB/s
Collecting protobuf==3.10.0
  Downloading protobuf-3.10.0-py2.py3-none-any.whl (434 kB)
     |████████████████████████████████| 434 kB 51.0 MB/s
Collecting psycopg2==2.8.4
  Downloading psycopg2-2.8.4.tar.gz (377 kB)
     |████████████████████████████████| 377 kB 45.3 MB/s
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-_do5lo_1/psycopg2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-_do5lo_1/psycopg2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-_do5lo_1/psycopg2/pip-egg-info
         cwd: /tmp/pip-install-_do5lo_1/psycopg2/
    Complete output (23 lines):
    running egg_info
    creating /tmp/pip-install-_do5lo_1/psycopg2/pip-egg-info/psycopg2.egg-info
    writing /tmp/pip-install-_do5lo_1/psycopg2/pip-egg-info/psycopg2.egg-info/PKG-INFO
    writing dependency_links to /tmp/pip-install-_do5lo_1/psycopg2/pip-egg-info/psycopg2.egg-info/dependency_links.txt
    writing top-level names to /tmp/pip-install-_do5lo_1/psycopg2/pip-egg-info/psycopg2.egg-info/top_level.txt
    writing manifest file '/tmp/pip-install-_do5lo_1/psycopg2/pip-egg-info/psycopg2.egg-info/SOURCES.txt'

    Error: pg_config executable not found.

    pg_config is required to build psycopg2 from source.  Please add the directory
    containing pg_config to the $PATH or specify the full executable path with the
    option:

        python setup.py build_ext --pg-config /path/to/pg_config build ...

    or with the pg_config option in 'setup.cfg'.

    If you prefer to avoid building psycopg2 from source, please install the PyPI
    'psycopg2-binary' package instead.

    For further information please check the 'doc/src/install.rst' file (also at
    <http://initd.org/psycopg/docs/install.html>).

    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
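The error output itself names two fixes: add the directory containing pg_config to PATH, or install the prebuilt psycopg2-binary package instead of building psycopg2 from source. A small Python check (the helper name psycopg2_install_hint is hypothetical) can report which case applies before rerunning pip:

```python
import shutil


def psycopg2_install_hint() -> str:
    """Report whether pg_config, needed to build psycopg2 from source, is on PATH."""
    path = shutil.which("pg_config")
    if path:
        return f"pg_config found at {path}; pip should find it when building psycopg2."
    return ("pg_config not found: add its directory to PATH, "
            "or install the 'psycopg2-binary' package instead.")


if __name__ == "__main__":
    print(psycopg2_install_hint())
```

Switching to psycopg2-binary would also mean changing the psycopg2==2.8.4 line in requirements.txt, so it is worth confirming that the Heroku deployment still works afterwards.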
