avirambh / msdnet-gcn

ICLR 2018 reproducibility challenge - Multi-Scale Dense Convolutional Networks for Efficient Prediction

License: MIT License

Python 92.24% Shell 7.76%

msdnet-gcn's Introduction

MSDNet - reproducibility and applying GCN blocks with separable kernel

This repository contains a PyTorch reproduction of "MSDNet: Multi-Scale Dense Networks for Resource Efficient Image Classification".

Introduction
Usage

Introduction

MSDNet is a novel approach for image classification under computational resource limits at test time. This repository provides an implementation based on the technical description in the paper. The code currently supports CIFAR-10 and CIFAR-100.

In addition, the code supports GCN-based layers in place of normal convolution layers, in order to reduce the number of model parameters.

Usage

Dependencies

Train

As an example, use the following command to train an MSDNet on CIFAR-10:

python3 main.py --model msdnet -b 64 -j 2 cifar10 --msd-blocks 10 --msd-base 4 \
--msd-step 2 --msd-stepmode even --growth 6-12-24 --gpu 0

As an example, use the following command to train an MSDNet on CIFAR-100 with GCN blocks:

python3 main.py --model msdnet -b 64 -j 2 cifar100 --msd-blocks 10 --msd-base 3 \
--msd-step 1 --msd-stepmode even --growth 6-12-24 --gpu 0  --msd-gcn --msd-gcn-kernel 5 \
--msd-share-weights --msd-all-gcn

Evaluation

We take the CIFAR-100 GCN model trained above as an example.

To evaluate the trained model, add --resume and --evaluate to the training command to evaluate from the default checkpoint directory:

python3 main.py --model msdnet -b 64 -j 2 cifar100 --msd-blocks 10 --msd-base 3 \
--msd-step 1 --msd-stepmode even --growth 6-12-24 --gpu 0 --msd-gcn --msd-gcn-kernel 5 \
--msd-share-weights --msd-all-gcn --resume --evaluate

Other Options

For detailed options, run python main.py --help

For more examples, including the use of pre-trained models, see script.sh (for instance with less script.sh)

msdnet-gcn's Issues

It seems a bug

I downloaded the code and ran it with the command python3 main.py --model msdnet -b 64 -j 2 cifar10 --msd-blocks 10 --msd-base 4 --msd-step 2 --msd-stepmode even --growth 6-12-24 --gpu 0 to train on the CIFAR-10 dataset, but it fails with the following error:

Traceback (most recent call last):
  File "main.py", line 445, in <module>
    main()
  File "main.py", line 139, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 212, in train
    prec1, prec5, _ = msdnet_accuracy(output, target, input)
  File "main.py", line 269, in msdnet_accuracy
    top1s.append(tprec1[0])
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
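
The message suggests that tprec1 is a 0-dim tensor, which PyTorch 0.4 and later do not allow to be indexed with [0]. A minimal sketch of a workaround, assuming that is the cause: convert the value with .item() before appending it. The helper name to_scalar is hypothetical and not part of this repository, and the stand-in class only mimics a 0-dim tensor so that the snippet runs without PyTorch installed.

```python
def to_scalar(value):
    """Return a plain Python number from a tensor-like or numeric value.

    Hypothetical helper, not part of the repository.
    """
    item = getattr(value, "item", None)
    if callable(item):
        # Single-element torch tensors (including 0-dim ones) expose .item().
        return item()
    return value  # already a plain Python number


class ZeroDimStandIn:
    """Mimics a 0-dim tensor: .item() works, indexing raises IndexError."""

    def __init__(self, number):
        self._number = number

    def item(self):
        return self._number

    def __getitem__(self, index):
        raise IndexError("invalid index of a 0-dim tensor")


top1s = []
tprec1 = ZeroDimStandIn(91.5)
top1s.append(to_scalar(tprec1))  # instead of top1s.append(tprec1[0])
print(top1s)
```

In main.py the equivalent change would presumably be top1s.append(tprec1.item()) at the line shown in the traceback, with the same treatment for any other accuracy value that is indexed with [0].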

When training an MSDNet model, it keeps downloading the CIFAR dataset

It always prints “Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../data/cifar-10-python.tar.gz”. After a long time I checked ~/data and found cifar-10-python.tar.gz and cifar-100-python.tar.gz. I tried to decompress them but only got the error "truncated gzip input".

When I ran the command again, the CIFAR dataset was deleted and downloaded again.
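
A "truncated gzip input" error usually means the download stopped before the archive was complete. A minimal sketch for checking an archive before reusing it, using only the Python standard library; the function name is_complete_gzip is hypothetical, and the path in the usage comment is the one from the download message.

```python
import gzip
import zlib


def is_complete_gzip(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as archive:
            # Read to the end so the trailing CRC and length are verified.
            while archive.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated stream; OSError: missing file or invalid gzip data.
        return False
    return True


# Example (path taken from the download message):
# is_complete_gzip("../data/cifar-10-python.tar.gz")
```

Note that the message refers to ../data, which is relative to the working directory and may not be the same location as ~/data. If the check fails, deleting the partial file and downloading it again over a stable connection is the usual remedy.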

Dimension error when I train with images of size 64

Hi Avirambh,

How did you come up with 128 for self.inner_channels (line 418 in msdnet_layers.py)?

I tried to run the code with an input image size of 64 instead of 32 and with self.inner_channels set to 256, but I get an error.

Could you explain how you arrived at 128 for an image size of 32 in the first place?

Cheers,
Oushesh

How can I train on my own dataset?

Model on the Imagenet

Hi @avirambh, thanks for your wonderful reproduction work. Could you provide a PyTorch model trained on the ImageNet dataset?

Why is the number of output channels of the first convolution in the first layer 32?

According to Appendix A in the paper, for the CIFAR datasets, the number of output channels of the three scales is set to 6, 12 and 24 respectively. However, num_channels is set to 32 in msdnet.py. This means that the number of output channels in the first layer for the three scales is 32, 64 and 128 respectively according to the default growth rate 1-2-4-4. Why is there a difference between the implementation details in the paper and the code?
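
The arithmetic in this question can be checked with a short sketch. It assumes, as the question describes, that the channel count at each scale is the base channel count multiplied by that scale's growth factor; the function name channels_per_scale is hypothetical and does not come from msdnet.py.

```python
def channels_per_scale(base_channels, growth_factors, num_scales=3):
    """Multiply the base channel count by each scale's growth factor."""
    return [base_channels * factor for factor in growth_factors[:num_scales]]


default_factors = [1, 2, 4, 4]  # the "1-2-4-4" default quoted in the question

print(channels_per_scale(32, default_factors))  # [32, 64, 128], base of 32
print(channels_per_scale(6, default_factors))   # [6, 12, 24], as in Appendix A
```

Under this assumption, a base of 6 rather than 32 would reproduce the 6, 12 and 24 channels quoted from the paper.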

Question about Dynamic Evaluation

Hi author,
Thanks for your work. I have not understood the dynamic evaluation in the original paper, so I cannot implement the early exiting it describes.
Could you send me the code for this part? I think I could understand the paper through the code. Thank you very much.
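
As described in the MSDNet paper, dynamic evaluation lets an example exit at the first intermediate classifier whose confidence, measured as the maximum softmax probability, exceeds a threshold. A minimal sketch of that rule in plain Python follows. It is not taken from this repository; the function name early_exit, the thresholds, and the example probabilities are all illustrative assumptions.

```python
def early_exit(softmax_outputs, thresholds):
    """Pick the first classifier that is confident enough.

    softmax_outputs: one probability list per classifier, ordered from the
        earliest (cheapest) exit to the last (most expensive) one.
    thresholds: one confidence threshold per classifier.
    Returns (exit_index, predicted_class).
    """
    for index, (probs, threshold) in enumerate(zip(softmax_outputs, thresholds)):
        confidence = max(probs)
        if confidence >= threshold:
            return index, probs.index(confidence)
    # No classifier was confident enough: fall back to the last one.
    last = softmax_outputs[-1]
    return len(softmax_outputs) - 1, last.index(max(last))


# Illustrative values only: three exits, three classes.
outputs = [
    [0.40, 0.35, 0.25],  # first exit: not confident enough
    [0.10, 0.85, 0.05],  # second exit: confident
    [0.05, 0.90, 0.05],  # last exit
]
print(early_exit(outputs, thresholds=[0.8, 0.8, 0.0]))
```

The sketch takes all classifier outputs as given for simplicity. In a real network the later scales and classifiers would only be computed when the earlier exits are not confident, which is where the savings come from, and the thresholds would be chosen on a validation set to meet a computational budget.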