Persian_handwriting_recognition

This project is inspired by the final project of the Foundations of Computer Vision course. It covers data collection, preprocessing, annotation, training, and prediction, using computer vision and machine learning techniques.

Introduction

The goal of this project is to first extract a student's ID, first name, and last name from a handwritten form, and then classify each letter and number with a neural network model trained on a collected dataset of Persian handwritten letters and numbers. The dataset was collected by students in the course and has been preprocessed and annotated.

The form should be similar to Figure 1. As illustrated, the form in the picture can be in any orientation.

Figure 1 - A sample of the test form.

The output should be similar to the text below:

Form: test_sample.jpg
Student ID: 01234567
First name: ص د ف
Last name: ن ظ ر ی

Setup

Having `Ubuntu 20.04` or higher (recommended)
Having a minimum of 10 GB of RAM for training
Having `conda` installed and ready to be used
Setting up an environment:
```
conda create --name phr python==3.8.0
```
Activating the environment:
```
conda activate phr
```
Installing the dependencies:
```
pip install -r requirements.txt
```

Data collection

The original dataset was collected by students in the course and can be found on the official website of the course. In this repository, however, some modifications have been applied to the dataset, and images that were not collected according to the instructions have been removed. We recommend using this modified dataset, which can be found in this link as data/01_raw.zip. It should be unzipped and placed in the data/01_raw directory.

The dataset contains a set of images taken from forms that can be found in assets/dataset_form_a5.pdf. Each form contains 4 Aruco markers and a number of cells, which should be filled with Persian handwritten numbers and letters. There are two types of forms, with the following row order: Type 'a': numbers '0' and '1', the first part of the Persian alphabet, then '2' and '3'. Type 'b': numbers '4' and '5', the second part of the Persian alphabet, then '6', '7', '8', and '9'.

To collect a new dataset, or to add more data to the current dataset, the following instructions must be followed:

The form should be printed, preferably in A5 size.
The orientation of the Aruco markers should be the same as in the sample shown in assets/dataset_sample1.jpg.
Each row is dedicated to either a number or a letter.
Each row should be filled following the order defined for the specific type of form. It should match either assets/dataset_sample1.jpg (type a) or assets/dataset_sample2.jpg (type b).
The image should be taken in 'jpg' format. While the orientation of the Aruco markers should be fixed, the form itself can be at any distance and orientation with respect to the camera.
The image should be stored in data/01_raw.
(pre-labeling step): Each form is then sorted manually into data/02_splitted/a or data/02_splitted/b based on its type.

Data preprocessing

The objective of data preprocessing is to first extract each cell from each form and store it in a folder that represents the class of that cell (0 to 9 for the numbers and 10 to 42 for the letters). This is the labeling process. Afterwards, the labeled dataset is shuffled and split into train, val, and test sets.
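Because each row of a form holds a single class, the label of a cell follows from the form type and the row index. The snippet below is a minimal sketch of that mapping, not the code in src/data_preprocessing.py. The function name `row_labels` and the parameter `first_part_size` (how many letters appear on type 'a' forms) are assumptions made for illustration; the actual split point of the alphabet is defined by the forms in assets/dataset_form_a5.pdf.

```python
FIRST_LETTER_CLASS = 10   # letters use classes 10..42, as described above
NUM_LETTER_CLASSES = 33


def row_labels(form_type, first_part_size):
    """Return the class label of each row, top to bottom, for one form type.

    first_part_size is a hypothetical parameter: the number of letters
    printed on type 'a' forms. The remaining letters go to type 'b'.
    """
    if not 0 < first_part_size < NUM_LETTER_CLASSES:
        raise ValueError("first_part_size must split the alphabet in two")
    letters = list(range(FIRST_LETTER_CLASS,
                         FIRST_LETTER_CLASS + NUM_LETTER_CLASSES))
    if form_type == "a":
        # '0', '1', first part of the alphabet, '2', '3'
        return [0, 1] + letters[:first_part_size] + [2, 3]
    if form_type == "b":
        # '4', '5', second part of the alphabet, '6', '7', '8', '9'
        return [4, 5] + letters[first_part_size:] + [6, 7, 8, 9]
    raise ValueError(f"unknown form type: {form_type!r}")
```

Taken together, the two form types cover every class from 0 to 42 exactly once, which is why the forms must be sorted by type before the labeling script runs.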

Pre-labeling: The first step is a pre-labeling process, which has already been done, so its output can be used directly. The purpose of this step is to separate the forms based on their types. The separation is needed because labeling is performed by a script to save time. In the original dataset, the images are stored all together, and this separation must be done manually. The separated dataset can be found in this link as data/02_splitted.zip. It should be unzipped and stored in the data/02_splitted directory.

Labeling: The labeling can be done by running the src/data_preprocessing.py script. To skip executing the script, the labeled data can be found in this link as data/03_labeled.zip. The script executes the following steps:

Creates the necessary empty directories in the data/03_labeled directory. These directories will later represent the class of each sample.
Reads each image
Detects the Aruco markers and drops the images whose markers cannot be detected
Applies a perspective transformation to the image so that the orientation of the form is fixed for all images
Resizes the images to a predetermined size so that all images have the same size
Extracts each cell, resizes it to a predetermined size, and stores it in its corresponding folder in the data/03_labeled directory (labeling process)
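The perspective transformation in the steps above amounts to finding the 3x3 homography that maps the four detected marker positions onto the corners of a fixed-size canvas. The sketch below shows that computation using only the Python standard library, to make the math visible. It is an illustration rather than the project's implementation: OpenCV provides `cv2.getPerspectiveTransform` and `cv2.warpPerspective` for the same purpose, and the marker coordinates and canvas size used here are made-up values.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def homography(src, dst):
    """Return the 3x3 matrix that maps four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and similarly for v
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, point):
    """Apply the homography h to a single (x, y) point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical marker positions in a photo (top-left, top-right,
# bottom-right, bottom-left) and a hypothetical 600x840 target canvas.
marker_centers = [(120.0, 80.0), (530.0, 140.0), (490.0, 760.0), (60.0, 700.0)]
canvas_corners = [(0.0, 0.0), (600.0, 0.0), (600.0, 840.0), (0.0, 840.0)]
H = homography(marker_centers, canvas_corners)
```

Once the form is rectified this way, every image shares the same geometry, so the cells can be cropped at fixed positions regardless of how the photo was taken.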

The last step is to shuffle the dataset and create the train, val, and test sets based on the ratio given in the config file. The output can be found in this link as data/04_final.zip.
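The shuffle-and-split step can be sketched as follows. This is a minimal illustration and not the project's code: the function name `split_dataset`, the default ratios, and the fixed seed are assumptions, and the real ratios come from the config file.

```python
import random


def split_dataset(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle items and split them into train, val, and test lists.

    ratios and seed are illustrative defaults, not the project's values.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


# Hypothetical file names for one class folder.
files = [f"cell_{i:03d}.jpg" for i in range(100)]
splits = split_dataset(files)
```

Applying the split per class folder, rather than to the whole dataset at once, keeps the class proportions similar across the three sets.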

To run the preprocessing, follow these steps:

Put the split dataset into the proper directory (default is the data folder)
Add the relative paths of the split, labeled, and final datasets to the config file, which is config/config.yaml.
If the config file is different from config/config.yaml, pass its path in the src/data_preprocessing.py script.

Execute the script in the project directory:

 cd Persian_handwriting_recognition/
 python3 src/data_preprocessing.py

Model training

After the data preprocessing and labeling, the dataset is ready to be used for training. The training procedure is done with the src/train_models.py script. With this script, two models are trained: one for numbers and the other for letters.

The first step is to create data_generators for the train, val, and test data and apply some data augmentation, for numbers and letters separately. Then, the model is built, compiled, and trained. The trained models are stored in models/trained_model_numbers.h5 and models/trained_model_letters.h5. Finally, the models are evaluated, and the accuracy and loss on the test set are calculated. To skip the training, the models can be found in this link as models/train_model.zip.

Here is the summary of the training process:

For numbers:

train loss: 0.0493 - train accuracy: 0.9824

val loss: 0.0516 - val accuracy: 0.9871

test loss: 0.0320 - test accuracy: 0.9894

For letters:

train loss: 0.2716 - train accuracy: 0.9025

val loss: 0.1903 - val accuracy: 0.9316

test loss: 0.2276 - test accuracy: 0.9177

Inference and prediction

After the training procedure, the model is ready to be used for inference and prediction. The src/prediction.py script does the prediction for images that are extracted from test forms.

A sample test form can be seen in assets/test.jpg. First, the cells should be extracted from it with src/extract_data_from_test_form.py (nearly the same procedure as in src/data_preprocessing.py). The test forms should be put in data/test_forms/forms, and the script will save the results in data/test_forms/extracted, under a folder named after the form.

The objective is to predict the student ID, first name, last name, and degree of the student based on the given form.
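Once each cell has been classified, the per-cell predictions need to be assembled into the report format shown in the Introduction. The sketch below shows one way to do that. It is an assumption-laden illustration rather than the code in src/prediction.py: the function name `format_prediction` is hypothetical, the inputs are already-decoded characters, and the degree field is left out because its output format is not shown in this document.

```python
def format_prediction(form_name, student_id, first_name, last_name):
    """Build the text report for one form from per-cell predictions.

    student_id, first_name, and last_name are sequences of single
    characters, one per recognized cell, in reading order.
    """
    lines = [
        f"Form: {form_name}",
        f"Student ID: {''.join(student_id)}",      # digits are concatenated
        f"First name: {' '.join(first_name)}",     # letters are space-separated
        f"Last name: {' '.join(last_name)}",
    ]
    return "\n".join(lines)


report = format_prediction(
    "test_sample.jpg",
    ["0", "1", "2", "3", "4", "5", "6", "7"],
    ["ص", "د", "ف"],
    ["ن", "ظ", "ر", "ی"],
)
print(report)
```

Keeping the letters separated by spaces mirrors the sample output, where each predicted letter is shown in its isolated form.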

TODO

Refactoring the code
Improving the accuracy
