Three awesome models

Assignment 2 for the Computer Vision & Deep Learning course at Innopolis University. This README is a very short version of the report, which you can read by downloading A2_report.pdf.

Shortcut: Links to Colab Notebooks

It's highly advised to view Jupyter Notebooks from Colab because GitHub does not display some outputs.

YOLOv4-tiny | YOLOv5-tiny | MaskRCNN

parse_dataset.py is a script that converts a dataset from the Supervisely format to the format accepted by MaskRCNN.

Project Idea

This project aims to detect recycling codes: PP (5), PAP (20-22), ALU (41). To achieve that, I use three models: YOLOv4, YOLOv5, and MaskRCNN. You can find all Jupyter notebooks and resulting folders in this Google Drive folder.

Data Acquisition and Annotation

I took photos of whatever I found at home. You can see examples below:

Since I didn't have many aluminium items, the dataset is a bit unbalanced. The statistics are as follows:

Class name	Images count	Objects count
PAP	31	32
POL	36	37
ALU	29	29

The numbers in the Objects count column differ from the numbers in the Images count column because some images contain more than one labeled object.
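A minimal sketch of how per-class statistics like these can be computed. The input layout (a mapping from image name to the list of class names of its objects) and the function name `class_statistics` are assumptions made for illustration, not the Supervisely export format.

```python
from collections import Counter


def class_statistics(annotations):
    """Count images and objects per class.

    `annotations` is a hypothetical mapping: image name -> list of class
    names, one entry per annotated object in that image.
    """
    images_per_class = Counter()
    objects_per_class = Counter()
    for labels in annotations.values():
        # Every object counts towards its class.
        objects_per_class.update(labels)
        # An image counts once per class, however many objects it holds.
        images_per_class.update(set(labels))
    return images_per_class, objects_per_class


# Made-up example data, not the real dataset.
example = {
    "a.jpg": ["PAP", "PAP"],
    "b.jpg": ["PAP", "ALU"],
    "c.jpg": ["POL"],
}
images, objects = class_statistics(example)
print(dict(images))
print(dict(objects))
```

An image with two objects of the same class raises the objects count by two but the images count by only one, which is how the two columns can diverge.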

I used Supervisely to create polygon annotations for the images. Examples of annotated objects are shown below.

Data Preprocessing and Augmentation

For YOLOv4 and YOLOv5

I used Roboflow to augment the data for both YOLO models.

Preprocessing

Auto-Orient: Applied

Augmentations

Outputs per training example: 3
Rotation: Between -45° and +45°
Shear: ±20° Horizontal, ±20° Vertical
Hue: Between -180° and +180°
Saturation: Between -50% and +50%
Brightness: Between -30% and +30%
Blur: Up to 2.75px
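To make the ranges above concrete, here is a small sketch that draws three random parameter sets per training example. It is not Roboflow's implementation: the uniform sampling, the parameter names, and the function `sample_augmentations` are all assumptions made for illustration.

```python
import random

# Ranges copied from the list above; the key names are hypothetical.
RANGES = {
    "rotation_deg": (-45.0, 45.0),
    "shear_x_deg": (-20.0, 20.0),
    "shear_y_deg": (-20.0, 20.0),
    "hue_deg": (-180.0, 180.0),
    "saturation_pct": (-50.0, 50.0),
    "brightness_pct": (-30.0, 30.0),
    "blur_px": (0.0, 2.75),
}


def sample_augmentations(outputs_per_example=3, seed=None):
    """Draw one parameter set per output image (uniform sampling assumed)."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in RANGES.items()}
        for _ in range(outputs_per_example)
    ]


for params in sample_augmentations(seed=0):
    print({name: round(value, 2) for name, value in params.items()})
```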

Examples of augmented images are shown below.

For MaskRCNN

I used Supervisely to augment the dataset and increase its size. I added the augmentations by writing a config in Supervisely's DTL and running the job. The result is a new dataset of 234 images.

Augmentations:

Resize: 700x700, keep aspect ratio
Rotate: Between -180° and +180°
Gaussian Blur: sigma between 0.5 and 2
Contrast: between 0.5 and 2
Brightness: between -50 and 50
Random Color
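As an illustration of the first item, the sketch below computes the output size of a resize to 700x700 that keeps the aspect ratio. It assumes "keep aspect ratio" means fitting the image inside the target box; Supervisely's exact behavior may differ, and `fit_within` is a hypothetical helper.

```python
def fit_within(width, height, target_w=700, target_h=700):
    """Scale (width, height) to fit inside the target box, keeping the aspect ratio."""
    scale = min(target_w / width, target_h / height)
    # Round to whole pixels and never return a zero-sized side.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # landscape photo
print(fit_within(3000, 4000))  # portrait photo
```

Under this assumption, a 4000x3000 photo becomes 700x525, so only the longer side reaches 700 pixels.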

I got the following statistics for this dataset:

Class name	Images count	Objects count
PAP	93	96
POL	108	111
ALU	87	87

Finally, to get the required data representation, I wrote my own Python script, parse_dataset.py, which takes a path to the dataset and creates two .json files, one for training and one for validation (by default train.json and valid.json, respectively).
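The splitting step can be sketched roughly as follows. This is not the code of parse_dataset.py: the record structure, the `valid_fraction` value, the seed, and the function name `split_and_save` are assumptions made for illustration; only the default file names train.json and valid.json come from the description above.

```python
import json
import random
from pathlib import Path


def split_and_save(records, out_dir, valid_fraction=0.2, seed=0,
                   train_name="train.json", valid_name="valid.json"):
    """Shuffle records, split them, and write two JSON files.

    `records` is a list of JSON-serializable per-image entries
    (hypothetical structure). Returns (n_train, n_valid).
    """
    records = list(records)
    random.Random(seed).shuffle(records)  # reproducible shuffle
    n_valid = int(len(records) * valid_fraction)
    valid, train = records[:n_valid], records[n_valid:]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, subset in ((train_name, train), (valid_name, valid)):
        with open(out_dir / name, "w") as f:
            json.dump(subset, f)
    return len(train), len(valid)
```

For example, `split_and_save(records, "annotations")` would write `annotations/train.json` and `annotations/valid.json` with an 80/20 split under the assumed defaults.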

Training YOLOv4

I used the tiny config and the Darknet data format to train YOLOv4. As described before, I used Roboflow to convert the dataset from the Supervisely format to the Darknet format.

YOLOv4 trained for 6000 iterations
Last accuracy: 79.66%; best accuracy: 83.51%
The model saw 360000 images

Class name	Average Precision, %	True Positive (TP)	False Positive (FP)
ALU	70.24	4	0
PAP	67.26	5	3
POL	100.00	8	0
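As a quick check of the TP and FP columns, the sketch below computes precision, TP / (TP + FP), for each class. Note that this is precision at a single detection threshold, not Average Precision: AP also depends on recall across confidence thresholds, which would explain how a class with zero false positives can still have an AP below 100%. The helper name `precision` is my own.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); defined as 0.0 when there are no detections."""
    total = tp + fp
    return tp / total if total else 0.0


# (TP, FP) pairs taken from the table above.
counts = {"ALU": (4, 0), "PAP": (5, 3), "POL": (8, 0)}
for name, (tp, fp) in counts.items():
    print(f"{name}: precision = {precision(tp, fp):.3f}")
# ALU: precision = 1.000
# PAP: precision = 0.625
# POL: precision = 1.000
```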

As noted above, the POL class has the most images, and the PP recycling code is always associated with the number 5, whereas the PAP and ALU codes can be associated with several numbers. That is probably why the AP for the POL class is 100%.

Input

Output

Training YOLOv5 with initial weights

I used the tiny config and the pytorch-yolo data format to train YOLOv5. As described before, I used Roboflow to convert the dataset from the Supervisely format to the yolov5 format.

YOLOv5 trained for 1750 epochs (early stopping due to small improvement over the last 1000 epochs);
Last precision: 92%; Last recall: 45%; Last mAP: 48%;
Reached the 1400th epoch after 2 h 16 min: very slow!

Prediction on test set (best weights)

The first and second images have no predictions at all. The third and fourth have correct predictions.

Comparison

I also trained YOLOv5 without pretrained weights; the comparison is shown below. Both results are frustrating.

YOLOv5 type	Precision	Recall	mAP
YOLOv5	86%	47%	29%
Pretrained YOLOv5	92%	45%	48%

Training MaskRCNN

These few lines contain the most important information about the model configuration:

I use pretrained weights from COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml.
Initial learning rate is 0.0002
Number of iterations: 4000.
Decrease learning rate by a factor of 0.5 at iterations: 2800, 3600.
Image size is 700x700.
Finished after 58 minutes - very fast!
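The learning rate schedule in this list can be sketched in plain Python. This assumes that "decrease by 0.5" means multiplying the rate by 0.5 at each listed iteration, and it ignores any warmup phase the training framework may apply; `learning_rate` is a hypothetical helper, not a library function.

```python
from bisect import bisect_right


def learning_rate(iteration, base_lr=0.0002, steps=(2800, 3600), gamma=0.5):
    """Step schedule: multiply base_lr by gamma once per milestone already reached."""
    milestones_passed = bisect_right(steps, iteration)
    return base_lr * gamma ** milestones_passed


for it in (0, 2799, 2800, 3600, 3999):
    print(it, learning_rate(it))
```

Under these assumptions the rate is 0.0002 up to iteration 2799, 0.0001 from iteration 2800, and 0.00005 from iteration 3600 until training ends at iteration 4000.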

I have already described the way I augmented and transformed the data for MaskRCNN and visualized it, so just enjoy the statistics and results! :)

Metrics

Evaluation results for segmentation:

AP	AP50	AP75	APs	APm	APl
89.765	100.000	100.000	85.050	90.040	94.175

Per-category segm AP:

Class Name	AP
PAP	90.644
POL	88.651
ALU	90.000

Interestingly, POL does not have the highest score here, even though it has slightly more samples than the other classes.

Output

I am very satisfied with the results! Maybe the augmentation from Supervisely played a role here, but...

The training of MaskRCNN was faster
The results are awesome
The model is confident
The masks are ultra fitting
