This repository builds a single/multi-size-input model for image classification.
Paper:
According to the paper, convolutional neural networks (CNNs) require a fixed-size input image, but the “spatial pyramid pooling” (SPP) strategy removes this requirement, allows multi-size input, and thereby improves classification performance.
In this repo, I implemented a very basic ZF5 model with and without SPP, and with multi-size input training, to compare their performance on an image classification problem.
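As a rough illustration of the SPP idea described above, here is a minimal PyTorch sketch. It is not this repo's implementation: the class name `SpatialPyramidPooling`, the pyramid levels `(3, 2, 1)`, and the 256-channel feature maps are assumptions made for the example, and it uses adaptive max pooling in place of the paper's explicit window and stride computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialPyramidPooling(nn.Module):
    """Max-pool a feature map into fixed n x n grids and concatenate them."""

    def __init__(self, levels=(3, 2, 1)):  # assumed pyramid levels
        super().__init__()
        self.levels = levels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) with arbitrary height and width
        pooled = [
            torch.flatten(F.adaptive_max_pool2d(x, output_size=n), start_dim=1)
            for n in self.levels
        ]
        # Output: (batch, channels * sum(n * n for n in levels))
        return torch.cat(pooled, dim=1)


spp = SpatialPyramidPooling()
small = spp(torch.randn(2, 256, 10, 10))  # feature map from a smaller image
large = spp(torch.randn(2, 256, 13, 13))  # feature map from a larger image
print(small.shape, large.shape)  # both torch.Size([2, 3584])
```

Because the output length depends only on the channel count and the pyramid levels, the fully connected layers that follow can accept feature maps from images of different sizes.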
- Python 3
- PyTorch 2.0
- opencv-python
- Caltech101 is used, as in the paper. The dataset can be downloaded through torchvision: https://pytorch.org/vision/stable/generated/torchvision.datasets.Caltech101.html
Because this is primarily an experiment, I use very simple hyperparameters that differ from the paper's:
- NUM_CLASSES = 101
- NUM_EPOCHS = 30
- BATCH_SIZE = 128
- LEARNING_RATE = 0.001
- WEIGHT_DECAY = 0.001
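One way these settings could drive multi-size training is to switch the input size once per epoch, so that every batch within an epoch has a single size. The sketch below shows only that scheduling step. The sizes 224 and 180 follow the SPP paper, while `TRAIN_SIZES` and `input_size_for_epoch` are hypothetical names, and this repo may schedule sizes differently.

```python
NUM_EPOCHS = 30

# Assumed training sizes (the SPP paper uses 224 and 180); this repo may differ.
TRAIN_SIZES = (224, 180)


def input_size_for_epoch(epoch: int, sizes=TRAIN_SIZES) -> int:
    """Return the square input side length for a 0-indexed epoch."""
    return sizes[epoch % len(sizes)]


# One input size per epoch, alternating between the configured sizes.
schedule = [input_size_for_epoch(e) for e in range(NUM_EPOCHS)]
print(schedule[:4])  # [224, 180, 224, 180]
```

In a training loop, the returned size would be used to resize every image in that epoch before batching.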
The 3 figures below compare the performance of 3 different ZF5 architectures. They show that the model using SPP with multi-size input training performs best and overfits the least.