liuzhuang13 / densenetcaffe

Caffe code for Densely Connected Convolutional Networks (DenseNets)

Python 97.92% Shell 2.08%

densenetcaffe's Introduction

Densely Connected Convolutional Network (DenseNet)

This repository contains the Caffe version of the code for the paper Densely Connected Convolutional Networks.

For a brief introduction to DenseNet, see our original Torch implementation.

ImageNet Pretrained Models

See https://github.com/shicai/DenseNet-Caffe for caffe prototxt and pre-trained models.

See https://github.com/liuzhuang13/DenseNet for Torch pre-trained models.

See http://pytorch.org/docs/torchvision/models.html?highlight=densenet for directly using the pretrained models in PyTorch.

Note

The models in this repo are for CIFAR datasets only (input 32x32). If you feed images with larger resolution (e.g., ImageNet images), you need to use a different downsampling strategy to keep the memory usage reasonable. See our paper or Torch code for details on ImageNet models.
The code in this repo doesn't support BC-structures. However, it should be easy to modify.
This code is not the code we used to obtain the results in the original paper, so the details (such as input preprocessing, data augmentation, and training epochs) may be different. To reproduce the results reported in our paper, see our original Torch implementation.

Results

The default setting (L=40, k=12, dropout=0.2) in the code yields a 7.09% error rate on the CIFAR-10 dataset (without any data augmentation).

Usage

Prepare the CIFAR data by following Caffe's official CIFAR tutorial.
make_densenet.py contains the code to generate the network and solver prototxt files. First change the data path in function make_net() and the preprocessing mean file in function densenet() to the paths of your own data files.
By default make_densenet.py generates a DenseNet with depth L=40, growth rate k=12 and dropout=0.2. To experiment with different settings, change the code accordingly (see the comments in the code). Example prototxt files are already included. Use python make_densenet.py to generate new prototxt files.
Change the caffe path in train.sh. Then use sh train.sh to train a DenseNet.

Contact

liuzhuangthu at gmail.com
gh349 at cornell.edu
Any discussions, suggestions and questions are welcome!


densenetcaffe's Issues

Why the BatchNorm layer's lr_mult is 0?

Hi, liuzhuang13:
Thanks for your great work. However, I don't understand why the BatchNorm layer's lr_mult and decay_mult are set to 0 in train_densenet.prototxt. Can you explain it?

Batch normalization with or without learned offset

Nice paper! I just have a minor detail question for reimplementing it.

In https://github.com/liuzhuang13/DenseNetCaffe/blob/master/make_densenet.py#L8, you use:
scale = L.Scale(batch_norm, bias_term=False, ...)
This would correspond to batch normalization with learned gamma, but without beta.
In https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua#L28, you use:
convFactory:add(cudnn.SpatialBatchNormalization(nChannels))
This includes a learnable beta. So I think the Caffe code needs to be adapted to match the Torch implementation.

On a side note, the convolutions (both in Caffe and Torch, if I see correctly) all have a bias term, but that will be rendered meaningless by the following batch normalization.
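The side note about the convolution bias can be checked numerically. The sketch below is plain Python rather than Caffe or Torch code: it normalizes one channel's activations over a mini-batch with and without a constant bias added first. The activation values and the bias are made-up numbers, and batch_norm here is a simplified stand-in without the learned scale and offset.

```python
import statistics

def batch_norm(values, eps=1e-5):
    """Normalize activations of one channel to zero mean and unit variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

# Hypothetical conv outputs of one channel over a mini-batch.
conv_out = [0.3, -1.2, 2.5, 0.7, -0.4]
bias = 3.0  # hypothetical conv bias for that channel

without_bias = batch_norm(conv_out)
with_bias = batch_norm([v + bias for v in conv_out])

# Subtracting the batch mean removes any constant offset, so both agree.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
print(max_diff)
```

Because the mean subtraction cancels the bias, a learnable offset only has an effect when it is applied after normalization, which is what bias_term=True in the Scale layer would provide.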

CaffeModel_Trained_on_ImageNet

Hi, thanks for sharing!
I noticed that your team has released the Torch model trained on ImageNet. Could you please release your Caffe model trained on ImageNet as well?

Number of neurons

Hi, can you tell me how many neurons there are in each layer, and how many layers there are in one dense block of the DenseNet-121 architecture?

Theoretical questions about layers in dnn with batchnormalization using keras

Hi, I'm new here, and I'm sorry for my English.

I have some trouble understanding DNN models that use batch normalization, specifically in Keras. Can somebody explain to me the structure and content of each layer in this model that I built?

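import time
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation, Dropout

modelbatch = Sequential()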
modelbatch.add(Dense(512, input_dim=1120))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(256))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(num_classes))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('softmax'))
# Compile model
modelbatch.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
start = time.time()
model_info = modelbatch.fit(X_2, y_2, batch_size=500, \
                         epochs=20, verbose=2, validation_data=(X_test, y_test))
end = time.time()

These are, I think, all the layers of my model:

print(modelbatch.layers[0].get_weights()[0].shape)
(1120, 512)
print(modelbatch.layers[0].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[0].shape)
(512,)
print(modelbatch.layers[1].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[2].shape)
(512,)
print(modelbatch.layers[1].get_weights()[3].shape)
(512,)
print(modelbatch.layers[4].get_weights()[0].shape)
(512, 256)
print(modelbatch.layers[4].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[0].shape)
(256,)
print(modelbatch.layers[5].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[2].shape)
(256,)
print(modelbatch.layers[5].get_weights()[3].shape)
(256,)
print(modelbatch.layers[8].get_weights()[0].shape)
(256, 38)
print(modelbatch.layers[8].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[0].shape)
(38,)
print(modelbatch.layers[9].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[2].shape)
(38,)
print(modelbatch.layers[9].get_weights()[3].shape)
(38,)

I will appreciate your help, thanks in advance.

Convolution layer setting when learning from scratch

Hi, I am using your prototxt to train from scratch instead of fine-tuning from ImageNet. The reason is that my dataset is totally different from ImageNet.

Looking at your prototxt, I found that your convolution layer does not have the lr_mult setting:

conv = L.Convolution(relu, kernel_size=ks, stride=stride, 
                    num_output=nout, pad=pad, bias_term=False, 
                    weight_filler=dict(type='msra'), bias_filler=dict(type='constant'))

I think lr_mult must be added as follows:

conv = L.Convolution(relu, kernel_size=ks, stride=stride, 
                    num_output=nout, pad=pad, bias_term=False, 
                    weight_filler=dict(type='msra'), bias_filler=dict(type='constant', value=0),
                    param=[dict(lr_mult=1, decay_mult=1)])

Am I right?

How to rename a layer

What should I do if I want to specify a layer name in make_densenet.py?

Evaluation on ImageNet

Hi, do you have any plans to evaluate DenseNet on the ImageNet classification task?

Where are snapshot models saved?

What is the path of the snapshot models? It isn't shown in "solver.prototxt".

The loss equaled 87.3365 during the training stage and didn't change

I followed the instructions and didn't change the settings in solver.prototxt, but the loss soon converged to 87.3365. It's said that this happens because the learning rate is too large and the features before the softmax layer become inf. So I am wondering what settings I should use with this network.
Thanks a lot!
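The specific value 87.3365 is a recognizable sign of numerical blow-up rather than a real plateau. The sketch below assumes, as far as I know is the case in Caffe's softmax loss layer, that the predicted probability of the true class is clamped at FLT_MIN (the smallest positive normalized float32) before taking the log. The function name clamped_softmax_loss is made up for illustration.

```python
import math

# Smallest positive normalized float32 value (FLT_MIN in C).
FLT_MIN = 1.17549435e-38

def clamped_softmax_loss(prob_true_class):
    """Negative log-likelihood with the probability clamped at FLT_MIN.

    Hypothetical helper mirroring the clamping assumed above.
    """
    return -math.log(max(prob_true_class, FLT_MIN))

# When the network diverges, the true-class probability underflows to 0,
# so every sample contributes the same clamped loss.
stuck_loss = clamped_softmax_loss(0.0)
print(round(stuck_loss, 4))
```

If the reported loss matches this number, the usual suspects are a learning rate that is too high for the data, unnormalized inputs, or a label range that does not match num_output.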

About the number of feature maps before the first block and the number of "conv" layers in the BC model

Hi,
I read your code and saw that the number of feature maps before the first dense block is twice the growth rate k. Can I choose another number, like three or four times k?

About the number of "conv" layers: for example, DenseNet-121 BC uses 6, 12, 24, 16. Do you have any rule or hint for choosing these numbers? What happens if I make them all equal?

Thanks in advance

Out of memory

I built a DenseNet with default parameters (depth=40, batch-size=64 and 50) and adapted the number of outputs to 3 for my dataset (160x160x3 px images), resulting in ~1 million parameters (which is not too big). When running the solver, I get an "error==cudaSuccess(2 vs. 0) out of memory" error on a Tesla K80.
Any ideas?
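The parameter count says little about memory here, because the CIFAR models keep full resolution through the first dense block and activation memory grows with the input area. The following is only a rough sketch: it counts the concatenated feature maps after each layer, assumes float32 blobs, three dense blocks with 2x pooling between them, and ignores BatchNorm/Scale/ReLU/Dropout intermediates and all gradients. The function name and the accounting are assumptions, not measurements.

```python
def concat_activation_bytes(input_size, batch_size=64, depth=40,
                            first_output=16, growth_rate=12,
                            bytes_per_value=4):
    """Rough size of the concatenated feature maps in a CIFAR-style DenseNet."""
    layers_per_block = (depth - 4) // 3
    size = input_size
    channels = first_output
    total_values = 0
    for block in range(3):
        for _ in range(layers_per_block):
            # Each layer appends growth_rate channels to its input.
            channels += growth_rate
            total_values += batch_size * channels * size * size
        if block < 2:
            # Transition pooling halves width and height.
            size //= 2
    return total_values * bytes_per_value

cifar = concat_activation_bytes(32)
large = concat_activation_bytes(160)
ratio = large / cifar
print(ratio)
```

Under these assumptions a 160x160 input needs 25 times the activation memory of a 32x32 input, which is why the Note in the README recommends a different downsampling strategy for larger images. Reducing the batch size or adding a strided convolution and pooling before the first block are the usual ways out.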

Is there deploy.prototxt available?

@liuzhuang13 @gaohuang Thanks for the great work. I want to apply the pre-trained model to my own dataset, but it achieved worse results than other architectures. I think I may have generated the deploy.prototxt file incorrectly. Specifically, I don't know how to write the batch_norm, scale, and dropout layers in a deploy.prototxt so that they function properly during the test phase. Could you share your deploy.prototxt as a reference? I appreciate any help.

Trained Weights

Hi. First off, thank you for so promptly porting this into Caffe, I assume this was directly because of the request on the user group?

I was wondering if you have trained this network (in caffe) yet, and whether there are weights available?

Number of outputs in the transition layer

Hello,
I am trying to understand how the number of outputs in the transition layer is computed (in the 121, 169, 201 and 161 configurations). Looking at the Python script for generating the architectures, there seem to be some discrepancies: it uses only a single 3x3 conv layer per dense-block layer, while the provided prototxts use a 1x1 and a 3x3. The number of conv layers also seems to differ: the script uses a constant number of conv layers (N) per block, while the provided configurations use different numbers (e.g. 6, 12, 24, 16 for DenseNet-121).
If I follow the same approach as the script and just sum the number of outputs of all previous convolutional layers up until the transition layer I get a completely different number.
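The mismatch comes from the BC structure, which this repo's script does not implement. A minimal sketch of the channel bookkeeping for the ImageNet DenseNet-BC configurations follows, assuming the settings described in the paper: 2k channels before the first block, k new channels per layer (the 1x1 bottleneck does not change what gets concatenated), and a compression factor of 0.5 in each transition. The function name is hypothetical.

```python
def densenet_bc_channels(block_config, growth_rate=32, compression=0.5):
    """Channel counts after each dense block and each transition layer."""
    channels = 2 * growth_rate  # output of the first convolution
    trace = []
    for i, num_layers in enumerate(block_config):
        # Every layer in the block appends growth_rate feature maps.
        channels += num_layers * growth_rate
        trace.append(channels)
        if i != len(block_config) - 1:
            # The transition's 1x1 conv compresses the channel count.
            channels = int(channels * compression)
            trace.append(channels)
    return trace

trace_121 = densenet_bc_channels((6, 12, 24, 16))
trace_161 = densenet_bc_channels((6, 12, 36, 24), growth_rate=48)
print(trace_121)
print(trace_161)
```

For DenseNet-121 the first block ends with 64 + 6 * 32 = 256 channels and the first transition outputs 128, so the transition output is half of the sum, not the sum itself.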

About the 3n+4

Hi Liu,

First, congrats on the Best Paper Award.
I read the code in make_densenet.py, and I'm pretty confused about the argument named depth:

#change the line below to experiment with different setting
#depth -- must be 3n+4
#first_output -- #channels before entering the first dense block, set it to be comparable to growth_rate
#growth_rate -- growth rate
#dropout -- set to 0 to disable dropout, non-zero number to set dropout rate
def densenet(data_file, mode='train', batch_size=64, depth=40, first_output=16, growth_rate=12, dropout=0.2):
    data, label = L.Data(source=data_file, backend=P.Data.LMDB, batch_size=batch_size, ntop=2, 
              transform_param=dict(mean_value=128))

What is the meaning of this argument, why must it be 3*n+4, and what is n anyway?
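A short sketch of the arithmetic, assuming the CIFAR architecture from the paper: three dense blocks of n layers each, plus four further weight layers (the first convolution, the two transition convolutions, and the final classifier). The helper names are hypothetical, and the channel count assumes the transitions in this script keep the number of channels unchanged (no compression).

```python
def layers_per_block(depth):
    """Number of conv layers n in each of the three dense blocks."""
    if depth < 4 or (depth - 4) % 3 != 0:
        raise ValueError("depth must be of the form 3n+4")
    return (depth - 4) // 3

def final_channels(depth=40, first_output=16, growth_rate=12):
    """Channels entering the classifier when transitions do not compress."""
    return first_output + 3 * layers_per_block(depth) * growth_rate

print(layers_per_block(40))  # n for the default depth
print(final_channels())      # channels after the third dense block
```

With the default depth=40 this gives n=12, so each dense block stacks 12 convolutions and the network ends with 16 + 36 * 12 = 448 feature maps.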

Why not use BatchNorm in-place, any concern?

What is the meaning of 121 in the notation DenseNet-121?

Thank you for sharing this nice work!
This is not a bug. I just want to clarify a few points.

In your Table 1, you use DenseNet-121, DenseNet-169, and so on. What does 121 mean, and how is it computed? If it is the depth of the network, what is its relationship with the L term?
In your solver.prototxt, why do you use such a large learning rate? People often use a very small learning rate (for the Adam method) like 0.001 instead of 0.1. Is the reason that you used another method (the Nesterov method), so you can use a very high learning rate of 0.1?
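On the first point, the number can be reproduced by counting weight layers. This is a sketch under the assumption that the ImageNet models follow the BC layout in the paper: one initial convolution, two convolutions (1x1 and 3x3) per dense-block layer, one 1x1 convolution per transition, and one final fully connected layer. The function name is made up for illustration.

```python
def densenet_bc_depth(block_config):
    """Count the weight layers of a DenseNet-BC with the given block sizes."""
    dense_layers = 2 * sum(block_config)   # 1x1 bottleneck + 3x3 conv per layer
    transitions = len(block_config) - 1    # one 1x1 conv between blocks
    return 1 + dense_layers + transitions + 1  # first conv ... classifier

configs = {
    121: (6, 12, 24, 16),
    169: (6, 12, 32, 32),
    201: (6, 12, 48, 32),
    161: (6, 12, 36, 24),
}
depths = {name: densenet_bc_depth(cfg) for name, cfg in configs.items()}
print(depths)
```

So the number plays the same role as L for the CIFAR models: it is the total depth, with the block sizes chosen per configuration instead of being equal.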

Update: This is my solver using Adam method

train_net: "train_densenet_BC.prototxt"
display: 20
lr_policy: "step"
gamma: 0.1
stepsize: 20000
power: 0.75
# lr for normalized softmax
base_lr: 0.001
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train_dense"
type:"Adam"

Question about hyper parameters for CIFAR-100 CNN

I've been struggling to recreate your results for CIFAR-100, and am wondering if you could share how you achieved the results for the CIFAR-100 dataset (27% error rate, without augmentation).

Train problem

When I train this DenseNet on my own dataset, I find that some of the weight diff/data values are NaN. I don't know how to solve this problem.
such as:weight diff/data:nan nan 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 nan