
Stanford Cars Dataset Image Classification and Localization

Input Pipeline

For preprocessing, zero-centering the data would require loading all images and calculating the mean, which is computationally expensive. Therefore we just scale the pixels to the range [-0.5, 0.5] and use batch normalization between layers. When a transfer learning model is used for training, the corresponding preprocess_input function is applied instead.
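
A minimal sketch of this scaling step inside a tf.data pipeline (train_ds is an assumed dataset of (image, label) pairs; the function name is illustrative):

import tensorflow as tf

def scale_pixels(image, label):
    # Map raw [0, 255] pixels into [-0.5, 0.5]; no dataset-wide mean needed.
    image = tf.cast(image, tf.float32) / 255.0 - 0.5
    return image, label

# train_ds is assumed to yield (image, label) pairs.
train_ds = train_ds.map(scale_pixels, num_parallel_calls=tf.data.AUTOTUNE)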

There are only a few (~30-50) images per class, so to combat overfitting, heavy augmentation was applied through the imgaug library. See data_loader.py for implementation details.
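
A sketch of the kind of heavy augmentation imgaug enables (the augmenters below are illustrative, not the exact ones in data_loader.py; imgaug can also apply the same geometric transforms to the bounding boxes):

import imgaug.augmenters as iaa

# Illustrative pipeline; see data_loader.py for the real one.
augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),                     # horizontal flip half the time
    iaa.Affine(rotate=(-15, 15),         # small rotations
               scale=(0.9, 1.1)),        # mild zoom
    iaa.AddToBrightness((-30, 30)),      # brightness jitter
    iaa.GaussianBlur(sigma=(0.0, 1.0)),  # occasional blur
])

# images_batch is an assumed batch of uint8 numpy arrays shaped (N, H, W, C).
images_aug = augmenter(images=images_batch)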

The data has a class imbalance. One way to handle this is oversampling: creating copies of the minority classes to match the majority ones. Fortunately we do not have to do this explicitly, since we get it for free by modifying the way the tf.data generator outputs images. Oversampling is achieved by splitting the data by class label and sampling from the resulting per-class datasets uniformly. This preserves the underlying distribution within each minority class but evens out the dataset without needing to collect more data!
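
This behavior is available through tf.data's dataset sampling (tf.data.Dataset.sample_from_datasets in recent TF versions, tf.data.experimental.sample_from_datasets in older ones); a sketch, where class_datasets is an assumed list holding one dataset per class:

import tensorflow as tf

# class_datasets is assumed: one tf.data.Dataset per class label.
per_class = [ds.repeat() for ds in class_datasets]
weights = [1.0 / len(per_class)] * len(per_class)

# Uniform sampling across classes oversamples the minority classes.
balanced_ds = tf.data.Dataset.sample_from_datasets(per_class, weights=weights)
balanced_ds = balanced_ds.batch(32).prefetch(tf.data.AUTOTUNE)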

The following is the imbalanced distribution of classes

The following is an example of the distribution after oversampling. The generator was tested for 10,000 iterations at a batch size of 32.

Streamlit

To see an interactive exploration of the data, run the following command

streamlit run streamlit/streamlit.py

Sanity Checks

These checks were done to help monitor training and guide hyperparameter adjustments toward good learning results.

  1. When using softmax, the value of the loss when the weights are small and no regularization is used can be approximated by -ln(1/C) = ln(C) where C is the number of classes.

The entire dataset has 196 classes which means the softmax loss should be approximately ln(196)=5.278. After running one epoch on a neural net with 1 hidden layer, the loss did in fact match.

217/217 [==============================] - 17s 80ms/step - loss: 5.2780 - accuracy: 0.0049 - val_loss: 5.2947 - val_accuracy: 0.0032

The same process was repeated for a subset of the dataset using 2 labels. The loss should be ln(2)=0.693.

3/3 [==============================] - 1s 233ms/step - loss: 0.6933 - accuracy: 0.5625 - val_loss: 0.5985 - val_accuracy: 0.6875
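
These expected values are just the cross-entropy of a uniform prediction, which is easy to verify:

import math

# With small random weights the softmax output is roughly uniform,
# so the expected initial loss is -ln(1/C) = ln(C).
print(math.log(196))  # 5.278..., matching the full 196-class run
print(math.log(2))    # 0.693..., matching the 2-class run
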
  2. Adding regularization should make the loss go up. The following test adds l2 regularization of magnitude 1e2, which made the loss jump from 0.693 to 2.9.

3/3 [==============================] - 1s 322ms/step - loss: 2.9040 - accuracy: 0.4375 - val_loss: 2.9195 - val_accuracy: 0.6875
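
A minimal sketch of attaching that penalty in Keras (the layer shown is illustrative; only the regularizer magnitude comes from the test above):

import tensorflow as tf

# An l2 penalty of magnitude 1e2 lifts the initial loss well above ln(2),
# which is exactly the jump this check looks for.
dense = tf.keras.layers.Dense(
    2,
    activation="softmax",
    kernel_regularizer=tf.keras.regularizers.l2(1e2),
)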

Model Architectures

Custom Model

Traditional:

  1. Conv(64 filters, 5x5 kernel, 2 strides)|BatchNorm|Relu|MaxPool(2 pool size)
  2. [Conv(128 filters, 3x3 kernel, 1 strides)|BatchNorm|Relu]*2|MaxPool(2 pool size)
  3. [Conv(256 filters, 3x3 kernel, 1 strides)|BatchNorm|Relu]*2|MaxPool(2 pool size)
  4. [Drop|Dense(512 units)|BatchNorm|Relu]*2
  5. Classifier head: Drop|Dense(196 units)
  6. Localizer head: Drop|Dense(4 units)
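
A sketch of this traditional variant as a Keras functional model with both heads (the input size and dropout rate are assumptions; the head names match the training logs below):

import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel, strides=1):
    x = layers.Conv2D(filters, kernel, strides=strides, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

inputs = layers.Input(shape=(224, 224, 3))  # input size is an assumption
x = conv_bn_relu(inputs, 64, 5, strides=2)  # step 1
x = layers.MaxPool2D(2)(x)
for filters in (128, 256):                  # steps 2 and 3
    x = conv_bn_relu(x, filters, 3)
    x = conv_bn_relu(x, filters, 3)
    x = layers.MaxPool2D(2)(x)
x = layers.Flatten()(x)
for _ in range(2):                          # step 4
    x = layers.Dropout(0.5)(x)              # dropout rate is an assumption
    x = layers.Dense(512)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
# Steps 5-6: the two heads share the trunk above.
classifier = layers.Dense(196, activation="softmax", name="classifier")(layers.Dropout(0.5)(x))
localizer = layers.Dense(4, name="localizer")(layers.Dropout(0.5)(x))
model = tf.keras.Model(inputs, [classifier, localizer])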

Residual:

  1. Conv(64 filters, 3x3 kernel, 2 strides)|BatchNorm|Relu
  2. [Conv(64 filters, 3x3 kernel, 1 strides)|BatchNorm|Relu]*2|MaxPool(2 pool size)
  3. Residual(64 filters)*3|Residual(128 filters)*4|Residual(256 filters)*4|Residual(512 filters)*3
  4. GlobalAvgPool2D|[Drop|Dense(512 units)|BatchNorm|Relu]*2
  5. Classifier head: Drop|Dense(196 units)
  6. Localizer head: Drop|Dense(4 units)
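
A hedged sketch of the Residual(filters) block assumed above, written as a standard two-convolution identity block (the project's exact block may differ):

from tensorflow.keras import layers

def residual_block(x, filters):
    # Standard two-convolution identity block; the shortcut is projected
    # with a 1x1 conv whenever the channel count changes.
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.ReLU()(layers.Add()([y, shortcut]))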

Transfer Learning

  • ResNet50
  • MobileNetV2
  • EfficientNet-B3
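
A sketch of how one of these backbones can be wired to the same two heads, using ResNet50 as the example (freezing the base and the input size are assumptions; preprocess_input is applied per the Input Pipeline section):

import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # freezing the base first is an assumption

inputs = layers.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
classifier = layers.Dense(196, activation="softmax", name="classifier")(x)
localizer = layers.Dense(4, name="localizer")(x)
model = tf.keras.Model(inputs, [classifier, localizer])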

Training Process

  1. Train on a small subset of the data (e.g. 20 samples), which should be easy to overfit to a high training accuracy. The subset used for this step was 73 images over 2 classes, trained for 200 epochs, which resulted in 100% classifier accuracy (see the sketch after this list).
Epoch 197/200
3/3 [==============================] - 3s 1s/step - loss: 0.0555 - classifier_loss: 0.0281 - localizer_loss: 0.1651 - classifier_accuracy: 1.0000 - localizer_accuracy: 0.4271 - val_loss: 15.2211 - val_classifier_loss: 11.0812 - val_localizer_loss: 31.7805 - val_classifier_accuracy: 0.3125 - val_localizer_accuracy: 0.8125
Epoch 198/200
3/3 [==============================] - 3s 1s/step - loss: 0.0425 - classifier_loss: 0.0163 - localizer_loss: 0.1473 - classifier_accuracy: 1.0000 - localizer_accuracy: 0.4479 - val_loss: 15.2499 - val_classifier_loss: 11.0812 - val_localizer_loss: 31.9246 - val_classifier_accuracy: 0.3125 - val_localizer_accuracy: 0.9062
Epoch 199/200
3/3 [==============================] - 3s 1s/step - loss: 0.0487 - classifier_loss: 0.0264 - localizer_loss: 0.1382 - classifier_accuracy: 1.0000 - localizer_accuracy: 0.3542 - val_loss: 15.2735 - val_classifier_loss: 11.0812 - val_localizer_loss: 32.0426 - val_classifier_accuracy: 0.3125 - val_localizer_accuracy: 0.8125
Epoch 200/200
3/3 [==============================] - 3s 1s/step - loss: 0.0572 - classifier_loss: 0.0329 - localizer_loss: 0.1546 - classifier_accuracy: 1.0000 - localizer_accuracy: 0.3958 - val_loss: 15.3056 - val_classifier_loss: 11.0812 - val_localizer_loss: 32.2035 - val_classifier_accuracy: 0.3125 - val_localizer_accuracy: 0.8125
  2. Train using the full dataset; start with small regularization and find the learning rate that makes the loss go down. The model is able to overfit with a train accuracy of 1, implying that it has enough capacity to learn the image features.

  3. Now that we know the model can overfit, we can increase regularization and tune hyperparameters.
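
A minimal sketch of the step-1 overfit check (the optimizer and classifier loss are assumptions; MSLE for the localizer follows the Challenges section, and the output names match the logs above):

# train_ds is assumed to yield (image, {"classifier": label, "localizer": box}).
small_ds = train_ds.unbatch().take(73).batch(32)

model.compile(
    optimizer="adam",  # assumption; not specified in the text
    loss={"classifier": "sparse_categorical_crossentropy",  # assumption
          "localizer": "msle"},  # MSLE per the Challenges section
    metrics={"classifier": "accuracy", "localizer": "accuracy"},
)
# 73 images at batch size 32 gives the 3 steps per epoch seen in the logs.
model.fit(small_ds, epochs=200)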

Wandb was used for logging all experiments on the full dataset:
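
A minimal sketch of that logging setup (the project name and config keys are illustrative):

import wandb
from wandb.keras import WandbCallback

# model/train_ds/val_ds are assumed from the sections above.
wandb.init(project="car_classification_localization",
           config={"learning_rate": 3e-4, "batch_size": 32})
model.fit(train_ds, validation_data=val_ds, epochs=50,
          callbacks=[WandbCallback()])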

Initially, custom models were used for training, but these proved difficult to train to a good solution. Each experiment was time consuming since the validation loss converged slowly, and the best validation accuracy achieved was only 50%. Swapping to a model pretrained on ImageNet dramatically improved both the results and the time it took to reach a decent accuracy.

As shown in the image above, the blue line represents the transfer learning model, which had already reached 65% label accuracy by epoch 5.

Results

The best results so far were achieved by EfficientNet-B3 with the following hyperparameters, using Focal Loss, which decreases the loss contribution of easy examples so the model can focus on harder images.
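
A hedged sketch of a categorical focal loss of this kind (the gamma value is illustrative, and libraries such as TensorFlow Addons ship ready-made focal losses):

import tensorflow as tf

def categorical_focal_loss(gamma=2.0):  # gamma value is illustrative
    # Scales cross-entropy by (1 - p)^gamma so well-classified (easy)
    # examples contribute less and hard images dominate the gradient.
    def loss_fn(y_true, y_pred):  # y_true is assumed one-hot
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        cross_entropy = -y_true * tf.math.log(y_pred)
        focal_weight = tf.pow(1.0 - y_pred, gamma)
        return tf.reduce_sum(focal_weight * cross_entropy, axis=-1)
    return loss_fn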

Training and validation metrics stayed close throughout training, and both reached high accuracy, indicating neither underfitting nor overfitting.

Labels Accuracy

Bounding Box Accuracy

Sample Activation Heatmap

Test Set Results

  • Test Labels Accuracy: 0.9013
  • Test Bounding Box Accuracy: 0.7350
  • Test Loss: 0.3241

Test Set Classification Evaluations

Sample Test Predictions (True Bounding Box: Blue, Predicted Bounding Box: Red)

Challenges

  • Having a large number of classes but only a few (~30) images per class made it very difficult to train a custom model. To combat this, augmentation served as a means to artificially inflate the number of images. The most effective method, however, was to use transfer learning with a model already pre-trained on cars. Other potential approaches would be to gather more data, either manually (e.g. scraping/APIs) or synthetically (e.g. GANs). Another option that could work effectively without needing to gather more data is few-shot learning.

  • Another minor challenge of the dual-headed model was finding a good loss-weighting balance and the right metrics between classification and localization. In the current setup, I decided to use mean squared log error (MSLE) for the bounding box loss in order to minimize its effect on the overall loss. The downside is that MSLE is biased toward penalizing underestimates more than overestimates. One option to counter this is to scale the bounding box targets to [0, 1] and use an MSE loss with IoU (Intersection over Union) as a metric, as sketched below.
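
A sketch of such an IoU metric (the [x_min, y_min, x_max, y_max] coordinate convention, scaled to [0, 1], is an assumption):

import tensorflow as tf

def iou_metric(y_true, y_pred):
    # Boxes are assumed to be [x_min, y_min, x_max, y_max] scaled to [0, 1].
    x1 = tf.maximum(y_true[..., 0], y_pred[..., 0])
    y1 = tf.maximum(y_true[..., 1], y_pred[..., 1])
    x2 = tf.minimum(y_true[..., 2], y_pred[..., 2])
    y2 = tf.minimum(y_true[..., 3], y_pred[..., 3])
    intersection = tf.maximum(x2 - x1, 0.0) * tf.maximum(y2 - y1, 0.0)
    area_true = (y_true[..., 2] - y_true[..., 0]) * (y_true[..., 3] - y_true[..., 1])
    area_pred = (y_pred[..., 2] - y_pred[..., 0]) * (y_pred[..., 3] - y_pred[..., 1])
    union = area_true + area_pred - intersection
    return tf.reduce_mean(intersection / (union + 1e-7))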

Dataset Citation

Krause, Jonathan, et al. "3d object representations for fine-grained categorization." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013.
