Hand-grading free-response questions is a daunting task for many teachers. Current technology allows for autograding of multiple-choice questions (e.g., Illuminate has built-in scoring and gradebook update capabilities). However, OCR-based handwriting recognition is not yet readily available to the public. This tool aims to alleviate the grading burden for teachers.
- Overview
- Product Design
- Data Sources
- Obtaining Data
- Preprocessing
- Model & Evaluation
- Build/Run App
- App Prototype
Current technology is proprietary. I aim to create a lighter-weight, open-source tool with similar functionality.
- Illuminate Education software integrates OCR autograding and gradebook updating for multiple-choice tests. As a former teacher, I found this tool invaluable; it is the primary inspiration for my project.
- Microsoft has powerful on-the-fly penstroke capturing software, Windows Ink, which parses handwritten digits and symbols into mathematical expressions.
This details the backend product design. View the current web app prototype.
General | Detailed |
---|---|
Detailed instructions for obtaining data are provided here.
Name | Description | Usage | Resource |
---|---|---|---|
MNIST | Well-known repository for handwritten digits | Training | http://yann.lecun.com/exdb/mnist/ |
HASYv2 | Over 150,000 handwritten characters (including LaTeX mathematical symbols) | Training | https://zenodo.org/record/259444 |
Kensanata | Over 16,000 labeled handwritten digits (includes gender, country, age) | Testing | https://github.com/kensanata/numbers |
CROHME | Competition on Recognition of Online Handwritten Mathematical Expressions (InkML format) | Future Directions | https://www.isical.ac.in/~crohme/CROHME_data.html |
Data can be loaded using Keras or scikit-learn.
- 70,000 total images split into train (60K) and test (10K)
- image size: 28 x 28 pixels
from tensorflow.keras.datasets import mnist
(X_train,y_train),(X_test,y_test) = mnist.load_data()
- 1,797 total images
- image size: 8 x 8 pixels
from sklearn.datasets import load_digits
digits = load_digits()
X = digits.data
y = digits.target
HASYv2 is an extensive dataset, consisting primarily of LaTeX symbol images. The following code restricts the dataset to digits only.
- 1,020 total images
- image size: 32 x 32 pixels
Running this code will return reshaped 28x28 image arrays
from newDatasets import load_HASY
X,y = load_HASY()
The Kensanata dataset includes interesting demographic data. There are ~17K digit images in the set. The original file structure makes direct processing cumbersome, so I have compiled the images into a single directory, retaining the naming schema to preserve the demographic information.
- country: (ISO 3166 Alpha-2 codes) or XX if unknown (e.g. Switzerland is CH)
- age: rounded to the nearest decade (e.g. 35 to 44 years is 4) or X if unknown
- sex: (M for man, F for woman, X for unknown, O for other)
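The retained naming schema can be parsed back into its demographic fields. The sketch below assumes a hypothetical filename layout (`country-age-sex-writer-digit.png`); the field values follow the conventions listed above, but the exact field order and separator are illustrative assumptions, not the dataset's documented format.

```python
def parse_kensanata_name(filename):
    """Split a name like 'CH-4-F-0007-3.png' into demographic fields.

    Assumed layout: country-age-sex-writer-digit.png (hypothetical).
    """
    stem = filename.rsplit(".", 1)[0]
    country, age, sex, writer, digit = stem.split("-")
    return {
        "country": country,   # ISO 3166 Alpha-2, or XX if unknown
        "age": age,           # decade code, or X if unknown
        "sex": sex,           # M, F, X, or O
        "writer": writer,     # per-writer identifier
        "label": int(digit),  # the digit the image depicts
    }

info = parse_kensanata_name("CH-4-F-0007-3.png")
print(info["country"], info["label"])   # CH 3
```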
- 16,994 total images
- image size: varies
Running this code will return reshaped 28x28 image arrays
from newDatasets import load_Kensanata
X,y = load_Kensanata()
Running this code will return a dataframe with all information, including 28x28 image arrays
from newDatasets import load_Kensanata
df = load_Kensanata(dataframe = True)
Model evaluation for all classifiers uses both the MNIST test data (10K images) and a Kensanata subset (1,500 images). The Kensanata dataset matters because it most closely resembles the images the product will see in actual use. Unlike the uniformly sized, pre-cleaned images in the MNIST set, the Kensanata images are noisy and require a great deal of preprocessing, mimicking the real-life conditions of images processed in the autograder app.
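The kind of cleanup the noisy scans need before they look like MNIST input can be sketched as follows: binarize to white-ink-on-black, resize to 28x28, and scale to [0, 1]. The threshold and the nearest-neighbor resize here are illustrative assumptions, not the project's exact pipeline.

```python
import numpy as np

def to_mnist_like(img, out_size=28, threshold=128):
    """Convert a 2-D grayscale array (0-255) into a 28x28 array in [0, 1]."""
    img = np.asarray(img, dtype=float)
    # Binarize: dark ink on light paper -> white digit on black background
    img = np.where(img < threshold, 255.0, 0.0)
    # Crude nearest-neighbor resize to out_size x out_size (assumption:
    # a real pipeline would likely use a proper image library instead)
    rows = np.arange(out_size) * img.shape[0] // out_size
    cols = np.arange(out_size) * img.shape[1] // out_size
    resized = img[np.ix_(rows, cols)]
    return resized / 255.0   # scale to [0, 1] like normalized MNIST

# Example: a 100x120 synthetic "scan" with a dark ink patch
scan = np.full((100, 120), 200.0)   # light background
scan[30:70, 40:80] = 50.0           # dark "ink"
out = to_mnist_like(scan)
print(out.shape)   # (28, 28)
```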
Decision Tree and SVM classifiers were used as baseline models.
MNIST Test Set | Kensanata Test Set |
---|---|
A StandardScaler is used before SVM classification. Although SVM works well on the MNIST data, it performs poorly on Kensanata.
MNIST Test Set | Kensanata Test Set |
---|---|
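The StandardScaler + SVM setup described above can be sketched as a scikit-learn pipeline. For a self-contained illustration this uses the small sklearn digits set rather than MNIST, and default SVC hyperparameters (an assumption; the project's tuned settings may differ).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# Scale features to zero mean / unit variance before fitting the SVM
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(round(acc, 4))
```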
Performance improved markedly on the Kensanata dataset. Note: these images are processed raw, so that the comparison with MNIST is fair. The same model is used in the app, but only after image preprocessing, which yields even higher performance.
MNIST Test Set | Kensanata Test Set |
---|---|
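A minimal sketch of a CNN digit classifier in Keras, similar in spirit to the model evaluated above. The specific architecture here (filter counts, kernel sizes, dense-layer width) is an assumption for illustration, not the repo's exact model.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # MNIST-sized grayscale input
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),        # one class per digit 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training would look like (with X_train scaled to [0, 1]):
# model.fit(X_train[..., None] / 255.0, y_train, epochs=5)
```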
Accuracy is summarized for each of the models. The F1 score is also used for model evaluation, as it combines recall and precision into a single performance metric.
Model | MNIST Test Set | Kensanata Test Set |
---|---|---|
Decision Tree | 0.8759 | 0.3353 |
SVM | 0.9375 | 0.3033 |
CNN #1 | 0.9836 | 0.8040 |
The following F1 scores (and classification reports) are based on Kensanata data only.
Model | F1 Score (Macro) | F1 Score by Digit |
---|---|---|
Decision Tree | 0.33 | |
SVM | 0.27 | |
CNN | 0.80 | |
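The metrics reported above (accuracy, macro F1, and per-digit classification reports) can all be computed with scikit-learn. The tiny label arrays below are illustrative, not real model predictions.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

y_true = [0, 1, 2, 2, 1, 0]   # ground-truth digit labels (toy example)
y_pred = [0, 2, 2, 2, 1, 0]   # model predictions (toy example)

print(accuracy_score(y_true, y_pred))             # fraction of correct labels
print(f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-class F1
print(classification_report(y_true, y_pred))      # per-digit precision/recall/F1
```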
Clone repo via SSH or HTTPS
# using SSH
git clone [email protected]:aprettyloner/autograder.git
# using HTTPS
git clone https://github.com/aprettyloner/autograder.git
Install all dependencies
python3 -m pip install --user --upgrade pip
pip install -r requirements.txt
cd autograder/my_app/
export FLASK_APP=main.py
export FLASK_DEBUG=1
flask run
Note: This is a very preliminary web app to work out backend functionality. The actual app design only requires users to designate crop areas and enter answer keys once. From there, any number of exams/assignments can be auto-graded by simply scanning the page.
The web app can be run on mobile phones. Using the upload interface, the user can take a photo directly.
Webcam/IPEVO Example | Image/PDF Upload Example |
---|---|