
ocr-autograder

An OCR-based autograder for handwritten free-response answers using TensorFlow and OpenCV.

Motivation

Hand-grading free-response questions is a daunting task for many teachers. Current technology allows for autograding of multiple-choice questions (e.g., Illuminate has built-in scoring and gradebook update capabilities). However, OCR-based handwriting recognition is not yet readily available to the public. This tool aims to alleviate the grading burden for teachers.

Overview

Inspiration

Current technology is proprietary; I aim to create a lighter-weight, open-source tool with similar functionality.

  1. Illuminate Education software integrates OCR autograding and gradebook updating for multiple-choice tests. As a former teacher, I found this tool invaluable; it is the primary inspiration for this project.
  2. Microsoft's Windows Ink offers powerful on-the-fly pen-stroke capture, parsing handwritten digits and symbols into mathematical expressions.

Product Design

This section details the backend product design. A current web app prototype is described under App Prototype below.

| General | Detailed |
| --- | --- |
| general-design | detailed-design |

Data Sources

Detailed instructions for obtaining data are provided here.

| Name | Description | Usage | Resource |
| --- | --- | --- | --- |
| MNIST | Well-known repository of handwritten digits | Training | http://yann.lecun.com/exdb/mnist/ |
| HASYv2 | Over 150,000 handwritten characters (including LaTeX mathematical symbols) | Training | https://zenodo.org/record/259444 |
| Kensanata | Over 16,000 labeled handwritten digits (with gender, country, and age metadata) | Testing | https://github.com/kensanata/numbers |
| CROHME | Competition on Recognition of Online Handwritten Mathematical Expressions (InkML format) | Future directions | https://www.isical.ac.in/~crohme/CROHME_data.html |

Obtaining Data

MNIST

Data can be loaded using keras or sklearn.

keras has the full MNIST set

  • 70,000 total images split into train (60K) and test (10K)
  • image size: 28 x 28 pixels
from tensorflow.keras.datasets import mnist

# 60,000 training and 10,000 test images, each 28x28 grayscale
(X_train, y_train), (X_test, y_test) = mnist.load_data()

sklearn has a small subset of MNIST

  • 1,797 total images
  • image size: 8 x 8 pixels
from sklearn.datasets import load_digits

# 1,797 images, each 8x8 grayscale, flattened to 64 features
digits = load_digits()
X = digits.data    # shape (1797, 64)
y = digits.target  # digit labels 0-9

HASYv2

HASYv2 is an extensive dataset, primarily consisting of LaTeX symbol images. The loader below restricts the dataset to digits.

HASYv2 digit dataset

  • 1,020 total images
  • image size: 32 x 32 pixels

Running this code will return reshaped 28x28 image arrays

from newDatasets import load_HASY
X,y = load_HASY() 
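For orientation, the digit filtering and 32x32-to-28x28 reshaping could be sketched as below. This is a minimal illustration, not the actual `newDatasets.load_HASY` implementation; the border-cropping approach is an assumption, and a real loader might interpolate instead.

```python
import numpy as np

def hasy_to_28x28(img32):
    """Trim the 2-pixel border of a 32x32 HASYv2 image to get 28x28.
    (Illustrative; a real loader might use interpolation instead.)"""
    return img32[2:-2, 2:-2]

def filter_hasy_digits(images, labels):
    """Keep only samples whose symbol label is a single digit '0'-'9'."""
    keep = [i for i, lab in enumerate(labels) if lab in set("0123456789")]
    X = np.stack([hasy_to_28x28(images[i]) for i in keep])
    y = np.array([int(labels[i]) for i in keep])
    return X, y
```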

Kensanata

The Kensanata dataset includes interesting demographic data. There are ~17K digit images in the set. The original file structure makes direct processing somewhat cumbersome, so I have compiled the images into a single directory. The naming schema has been retained to encode demographic information.

  • country: (ISO 3166 Alpha-2 codes) or XX if unknown (e.g. Switzerland is CH)
  • age: rounded to the nearest decade (e.g. 35 to 44 years is 4) or X if unknown
  • sex: (M for man, F for woman, X for unknown, O for other)
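Those demographic fields can be recovered by splitting the filename. The parser below is purely illustrative: the `COUNTRY-AGE-SEX-LABEL.png` field order is an assumption, so check the compiled directory for the exact schema.

```python
def parse_kensanata_name(fname):
    """Parse demographic fields from a compiled-directory filename.
    Assumes a hypothetical 'COUNTRY-AGE-SEX-LABEL.png' layout for
    illustration only; the real schema may order fields differently."""
    stem = fname.rsplit(".", 1)[0]
    country, age, sex, label = stem.split("-")
    return {"country": country, "age": age, "sex": sex, "label": int(label)}
```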

Kensanata digit dataset

  • 16,994 total images
  • image size: varies

Running this code will return reshaped 28x28 image arrays

from newDatasets import load_Kensanata
X,y = load_Kensanata() 

Running this code will return a dataframe with all information, including 28x28 image arrays

from newDatasets import load_Kensanata
df = load_Kensanata(dataframe = True)

Image Processing

| Stage | Image | Issues |
| --- | --- | --- |
| Raw image | raw-image | To the human eye, 4 distinct segments are readily apparent. However, the computer detects shadows and other subtle artifacts as objects. |
| Preprocessed binary | binary-image | Results in 4,000+ segments (4 expected) due to the noisy, non-white background; each dot is treated as a separate segment. Requires processing. |
| Postprocessed binary | processed-image | Adjusting alpha levels and applying Gaussian blur reduces noise from the raw image. Ready for segmentation. |
| Segmented image | segmented-image | Proper segmentation detects 4 objects. |

Model and Evaluation

Model evaluation for all classifiers involves both MNIST test data (10K images) and a Kensanata subset (1,500 images). The Kensanata dataset is important to classify, as it most closely resembles the data the product will see in real use. Unlike the uniformly sized and pre-cleaned images in the MNIST set, the Kensanata images are noisy and require a great deal of preprocessing. This mimics the real-life conditions of images processed in the autograder app.

Baseline Models

Decision Tree and SVM classifiers were used as baseline models.

Decision Tree Classifier

| MNIST Test Set | Kensanata Test Set |
| --- | --- |
| Decision Tree | Decision Tree |
| Decision Tree | Decision Tree |

SVM Classifier

StandardScaler is applied before SVM classification. Although SVM performs well on MNIST data, it does not perform well on Kensanata.

| MNIST Test Set | Kensanata Test Set |
| --- | --- |
| SVM | SVM |
| SVM | SVM |
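A scaled-SVM baseline of this kind can be sketched with scikit-learn. For speed this uses the small built-in 8x8 digits set rather than MNIST, so it illustrates the pipeline rather than reproducing the reported numbers.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardizing pixel features before the RBF kernel is the key step.
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X_train, y_train)
print(round(clf.score(X_test, y_test), 3))
```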

Model Improvement

CNN Classifier

The CNN improved performance on the Kensanata dataset. Note: these images are processed raw so that comparisons with MNIST are fair. The same model is used in the app, but only after image preprocessing, which yields even higher performance.

| MNIST Test Set | Kensanata Test Set |
| --- | --- |
| CNN | CNN |
| CNN | CNN |
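A small Keras CNN of the kind commonly used for 28x28 digit classification can be sketched as below. The layer sizes are assumptions for illustration, not necessarily the exact architecture trained here.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two conv/pool stages, then a softmax over the 10 digit classes.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```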

Model Summary

Accuracy is summarized for each model. The F1 score is also used for evaluation, as it combines recall and precision into a single performance metric.

| Model | MNIST Test Set | Kensanata Test Set |
| --- | --- | --- |
| Decision Tree | 0.8759 | 0.3353 |
| SVM | 0.9375 | 0.3033 |
| CNN #1 | 0.9836 | 0.8040 |

The following F1 scores (and classification reports) are based on Kensanata data only.

| Model | F1 Score (Macro) | F1 Score by Digit |
| --- | --- | --- |
| Decision Tree | 0.33 | DT |
| SVM | 0.27 | SVM |
| CNN | 0.80 | CNN |
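For reference, macro F1 averages the per-class F1 scores, weighting each digit equally. It can be computed with scikit-learn; the toy labels below are illustrative, not real Kensanata results.

```python
from sklearn.metrics import classification_report, f1_score

# Toy labels standing in for Kensanata predictions (not real results).
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1]

# average="macro" gives the unweighted mean of per-class F1 scores.
macro = f1_score(y_true, y_pred, average="macro")
print(round(macro, 2))
print(classification_report(y_true, y_pred))
```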

Model Improved

Running the app

Clone repo via SSH or HTTPS

# using SSH
git clone git@github.com:aprettyloner/autograder.git

# using HTTPS
git clone https://github.com/aprettyloner/autograder.git

Install all dependencies

python3 -m pip install --user --upgrade pip
pip install -r requirements.txt

Run the app

cd autograder/my_app/
export FLASK_APP=main.py
export FLASK_DEBUG=1
flask run
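For orientation, an upload endpoint in a Flask app like this might look as follows. This is a minimal sketch only, not the actual my_app/main.py; the route path and form field name are assumptions.

```python
# Illustrative upload endpoint; route and field names are assumptions,
# not the actual my_app/main.py.
from flask import Flask, request

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    # The scanned exam page arrives as a multipart file; OCR grading
    # would run on its bytes here.
    page = request.files["page"]
    return {"filename": page.filename, "status": "received"}
```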

App Prototype

Note: This is a very preliminary web app to work out backend functionality. The actual app design only requires users to designate crop areas and enter answer keys once. From there, any number of exams/assignments can be auto-graded by simply scanning the page.
The web app can be run on mobile phones. Using the upload interface, the user can take a photo directly.

| Webcam/IPEVO Example | Image/PDF Upload Example |
| --- | --- |
| webcam-example | upload-example |
