
densenetcaffe's Introduction

Densely Connected Convolutional Network (DenseNet)

This repository contains the Caffe implementation for the paper Densely Connected Convolutional Networks.

For a brief introduction to DenseNet, see our original Torch implementation.

ImageNet Pretrained Models

See https://github.com/shicai/DenseNet-Caffe for Caffe prototxt files and pre-trained models.

See https://github.com/liuzhuang13/DenseNet for Torch pre-trained models.

See http://pytorch.org/docs/torchvision/models.html?highlight=densenet for directly using the pretrained models in PyTorch.

Note

  1. The models in this repo are for CIFAR datasets only (32x32 input). If you feed images with a larger resolution (e.g., ImageNet images), you need to use a different downsampling strategy to keep memory usage reasonable. See our paper or Torch code for details on ImageNet models.
  2. The code in this repo doesn't support BC-structures. However, it should be easy to modify.
  3. This is not the code we used to obtain the results in the original paper, so the details (such as input preprocessing, data augmentation, and training epochs) may differ. To reproduce the results reported in our paper, see our original Torch implementation.

Results

The default setting (L=40, k=12, dropout=0.2) in the code yields a 7.09% error rate on the CIFAR-10 dataset (without any data augmentation).

Usage

  1. Prepare the CIFAR data by following Caffe's official CIFAR tutorial.
  2. make_densenet.py contains the code to generate the network and solver prototxt files. First, change the data path in the function make_net() and the preprocessing mean file in the function densenet() to the paths of your own data files.
  3. By default, make_densenet.py generates a DenseNet with depth L=40, growth rate k=12 and dropout=0.2. To experiment with different settings, change the code accordingly (see the comments in the code). Example prototxt files are already included. Use python make_densenet.py to generate new prototxt files.
  4. Change the Caffe path in train.sh. Then use sh train.sh to train a DenseNet.

Contact

liuzhuangthu at gmail.com
gh349 at cornell.edu
Any discussions, suggestions and questions are welcome!

densenetcaffe's Issues

Why the BatchNorm layer's lr_mult is 0?

Hi liuzhuang13,
Thanks for your great work. However, I don't understand why the BatchNorm layer's lr_mult and decay_mult are set to 0 in train_densenet.prototxt. Can you explain it?
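
A possible reading of this setting, offered as a sketch rather than the authors' answer: the blobs of Caffe's BatchNorm layer hold running statistics that are accumulated during the forward pass, not parameters trained by gradient descent, so lr_mult=0 and decay_mult=0 keep the solver from touching them. The class below is a simplified, hypothetical stand-in for that accumulation (mean only); the name RunningMean and the fraction value are assumptions made for illustration.

```python
class RunningMean:
    """Simplified running-mean accumulator (illustrative, not Caffe code)."""

    def __init__(self, fraction=0.999):
        self.fraction = fraction   # decay applied to the old statistics
        self.mean_sum = 0.0        # decayed sum of batch means
        self.scale = 0.0           # decayed count used to normalise the sum

    def update(self, batch_mean):
        # Updated from the data in the forward pass; no gradient is involved.
        self.scale = self.scale * self.fraction + 1.0
        self.mean_sum = self.mean_sum * self.fraction + batch_mean

    def estimate(self):
        # Value that would be used at test time.
        return 0.0 if self.scale == 0 else self.mean_sum / self.scale


stats = RunningMean()
for batch_mean in [2.0, 2.0, 2.0]:
    stats.update(batch_mean)
print(stats.estimate())
```

If a solver applied a learning rate or weight decay to such values, it would corrupt the statistics, which is one plausible reason for the zeros.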

Batch normalization with or without learned offset

Nice paper! I just have a minor detail question for reimplementing it.

In https://github.com/liuzhuang13/DenseNetCaffe/blob/master/make_densenet.py#L8, you use:
scale = L.Scale(batch_norm, bias_term=False, ...)
This would correspond to batch normalization with learned gamma, but without beta.
In https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua#L28, you use:
convFactory:add(cudnn.SpatialBatchNormalization(nChannels))
This includes a learnable beta. So I think the Caffe code needs to be adapted to match the Torch implementation.

On a side note, the convolutions (both in Caffe and Torch, if I see correctly) all have a bias term, but that will be rendered meaningless by the following batch normalization.
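
The side note about the convolution bias can be checked numerically. The following is a minimal sketch in plain Python (no framework), assuming normalization over a single channel with a hypothetical batch_norm helper: a constant added before normalization is removed when the mean is subtracted.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalise a list of activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


activations = [0.5, -1.0, 2.0, 3.5]   # made-up conv outputs for one channel
bias = 7.0                            # made-up constant conv bias

plain = batch_norm(activations)
shifted = batch_norm([a + bias for a in activations])

# The largest difference between the two normalised outputs.
max_diff = max(abs(p - s) for p, s in zip(plain, shifted))
print(max_diff)
```

The same argument does not apply to the beta offset of the normalization itself, which is added after the mean has been subtracted.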

CaffeModel_Trained_on_ImageNet

Hi, thanks for sharing!
I noticed that your team has released the Torch model trained on ImageNet. Could you please release your Caffe model trained on ImageNet as well?

number of neurons

Hi, can you tell me how many neurons there are in each layer, and how many layers there are in one dense block of the DenseNet-121 architecture?

Theoretical questions about layers in dnn with batchnormalization using keras

Hi, I'm new here, and I apologize for my English.

I am having trouble understanding DNN models that use batch normalization, specifically in Keras. Can somebody explain the structure and content of each layer in this model that I built?

import time

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation, Dropout

# X_2, y_2, X_test, y_test and num_classes are defined elsewhere
modelbatch = Sequential()
modelbatch.add(Dense(512, input_dim=1120))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(256))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(num_classes))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('softmax'))
# Compile model
modelbatch.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model and time it
start = time.time()
model_info = modelbatch.fit(X_2, y_2, batch_size=500,
                            epochs=20, verbose=2, validation_data=(X_test, y_test))
end = time.time()

These are, I think, the weight shapes of all the layers of my model:

print(modelbatch.layers[0].get_weights()[0].shape)
(1120, 512)
print(modelbatch.layers[0].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[0].shape)
(512,)
print(modelbatch.layers[1].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[2].shape)
(512,)
print(modelbatch.layers[1].get_weights()[3].shape)
(512,)
print(modelbatch.layers[4].get_weights()[0].shape)
(512, 256)
print(modelbatch.layers[4].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[0].shape)
(256,)
print(modelbatch.layers[5].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[2].shape)
(256,)
print(modelbatch.layers[5].get_weights()[3].shape)
(256,)
print(modelbatch.layers[8].get_weights()[0].shape)
(256, 38)
print(modelbatch.layers[8].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[0].shape)
(38,)
print(modelbatch.layers[9].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[2].shape)
(38,)
print(modelbatch.layers[9].get_weights()[3].shape)
(38,)

I would appreciate your help. Thanks in advance.
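
As a hedged reading of the shapes listed above: each Dense layer holds a kernel and a bias, and each BatchNormalization layer holds four per-feature vectors, which in Keras are gamma, beta, moving_mean and moving_variance (the last two are not trained by the optimizer). The Activation and Dropout layers carry no weights, which is why indices 2, 3, 6 and 7 do not appear. The sketch below only does the arithmetic; the helper names are hypothetical, and 38 output classes is inferred from the (256, 38) shape in the listing.

```python
def dense_params(n_in, n_out):
    """Kernel of shape (n_in, n_out) plus a bias of shape (n_out,)."""
    return n_in * n_out + n_out


def batch_norm_params(n_features):
    """gamma and beta (trainable) plus moving mean and variance (not trainable)."""
    return 4 * n_features


layer_sizes = [1120, 512, 256, 38]   # input width followed by each Dense width

total = 0
non_trainable = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    total += dense_params(n_in, n_out) + batch_norm_params(n_out)
    non_trainable += 2 * n_out       # moving mean and moving variance

print(total, non_trainable)
```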

Convolution layer setting when learn from scratch

Hi, I am using your prototxt to train from scratch instead of fine-tuning from ImageNet, because my dataset is totally different from ImageNet.

Looking at your prototxt, I found that your convolution layer does not have the lr_mult setting here:

conv = L.Convolution(relu, kernel_size=ks, stride=stride, 
                    num_output=nout, pad=pad, bias_term=False, 
                    weight_filler=dict(type='msra'), bias_filler=dict(type='constant'))

I think lr_mult must be added as follows:

conv = L.Convolution(relu, kernel_size=ks, stride=stride, 
                    num_output=nout, pad=pad, bias_term=False, 
                    weight_filler=dict(type='msra'), bias_filler=dict(type='constant', value=0),
                    param=[dict(lr_mult=1, decay_mult=1)])

Am I right?

The loss equaled to 87.3365 during the training stage and didn't change

I followed the instructions and didn't change the settings in solver.prototxt, but the loss soon converged to 87.3365. It is said that this happens because the learning rate is too large and the features before the softmax layer become inf. So I am wondering what settings I should use with this network.
Thanks a lot!
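
The specific value 87.3365 has a likely arithmetic origin, sketched here under one assumption: that the softmax loss clamps the predicted probability at the smallest positive normalized single-precision float (FLT_MIN) before taking the logarithm, as Caffe's SoftmaxWithLoss layer appears to do. A loss stuck at exactly this value then means the probability assigned to the correct class has collapsed to zero.

```python
import math

# Smallest positive normalised float32 value (FLT_MIN in C).
FLT_MIN = 2.0 ** -126

# Loss reported when the true-class probability is clamped at FLT_MIN.
clamped_loss = -math.log(FLT_MIN)
print(round(clamped_loss, 4))  # 87.3365
```

This does not say which setting to change; it only suggests that the network diverged rather than converged.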

About number of feature map in first block and "conv" layer in BC model

Hi,
I read your code and saw that the number of feature maps before the first dense block is twice the growth rate k. Can I choose another multiple, such as three or four times?

Regarding the number of "conv" layers: for example, DenseNet-121 BC uses 6, 12, 24, 16. Do you have any rule or hint for choosing these numbers? What happens if I make them all equal?

Thanks in advance

out of memory

I built a DenseNet with default parameters (depth=40, batch sizes 64 and 50) and adapted the number of outputs to 3 for my dataset (160x160x3 px images), resulting in about 1 million parameters (which is not too big). When running the solver, I get an "error==cudaSuccess(2 vs. 0) out of memory" error on a Tesla K80.
Any ideas?
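
A rough estimate may help explain this, since memory here is dominated by activations, not parameters. The sketch below is a back-of-the-envelope calculation under stated assumptions, not a measurement: it counts only the Concat outputs of the first dense block, assumes each Concat stores a full copy of its output in 4-byte floats, and assumes no downsampling before the first block (the README notes that the models target 32x32 input). The function name and defaults are hypothetical.

```python
def concat_activation_bytes(height, width, batch_size, first_output=16,
                            growth_rate=12, layers_per_block=12,
                            bytes_per_value=4):
    """Bytes held by the Concat outputs of the first dense block only."""
    channels = first_output
    total_channels = 0
    for _ in range(layers_per_block):
        channels += growth_rate       # each layer appends k feature maps
        total_channels += channels    # a naive Concat keeps a full copy
    return total_channels * height * width * batch_size * bytes_per_value


cifar = concat_activation_bytes(32, 32, 64)
large = concat_activation_bytes(160, 160, 64)
print(cifar / 2**30, large / 2**30)   # sizes in GiB
```

Under these assumptions a 160x160 input needs 25 times the activation memory of a 32x32 input, before counting BatchNorm, Scale, ReLU and convolution outputs or their gradients.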

Is there deploy.prototxt available?

@liuzhuang13 @gaohuang Thanks for the great work. I want to apply the pre-trained model to my own dataset, but it achieved worse results than other architectures. I think I may generate the deploy.prototxt file wrong. Specifically, I don't know how to write batch_norm, scale, dropout layer in a deploy.prototxt to make sure it functions properly during test phase. Could you share your deploy.prototxt as a reference? I appreciate any help.

Trained Weights

Hi. First off, thank you for so promptly porting this into Caffe. I assume this was directly because of the request on the user group?

I was wondering if you have trained this network (in caffe) yet, and whether there are weights available?

Number of outputs in the transition layer

Hello,
I am trying to understand how the number of outputs in the transition layer is being computed (in the 121, 169, 201 and 161 configurations). Looking at the python script for generating the architectures, it seems that there are some discrepancies - it uses only a single 3x3 conv layer in the dense block, while the provided prototxts use a 1x1 and a 3x3. Also the number of conv layers seems to be different - the script uses a constant number of conv layers (N), while the provided configurations use different one (e.g. 6, 12, 24, 16 for DenseNet-121).
If I follow the same approach as the script and just sum the number of outputs of all previous convolutional layers up until the transition layer I get a completely different number.
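
One way to reconcile the numbers, sketched under assumptions taken from the paper's DenseNet-BC description and not from make_densenet.py: the network enters the first block with 2k channels, every dense layer adds k channels, and each transition layer halves the channel count (compression 0.5). The function name and defaults are hypothetical.

```python
def densenet_bc_channels(block_config, growth_rate=32, compression=0.5):
    """Return (channels entering, channels leaving) for each block's exit."""
    channels = 2 * growth_rate                   # assumed initial conv width
    trace = []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate     # dense block output width
        if i < len(block_config) - 1:
            out = int(channels * compression)    # transition 1x1 conv output
            trace.append((channels, out))
            channels = out
        else:
            trace.append((channels, None))       # last block has no transition
    return trace


print(densenet_bc_channels((6, 12, 24, 16)))
```

For the 6, 12, 24, 16 configuration with k=32 this gives transition outputs of 128, 256 and 512. The script in this repo generates the non-BC variant, so its sums are expected to differ.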

About the 3n+4

Hi Liu,

First, congrats on the Best Paper Award.
I read the code of make_densenet.py, and I'm pretty confused about the argument named depth.

#change the line below to experiment with different setting
#depth -- must be 3n+4
#first_output -- #channels before entering the first dense block, set it to be comparable to growth_rate
#growth_rate -- growth rate
#dropout -- set to 0 to disable dropout, non-zero number to set dropout rate
def densenet(data_file, mode='train', batch_size=64, depth=40, first_output=16, growth_rate=12, dropout=0.2):
    data, label = L.Data(source=data_file, backend=P.Data.LMDB, batch_size=batch_size, ntop=2, 
              transform_param=dict(mean_value=128))

What's the meaning of this argument, why must it be 3*n+4, and what is n anyway?
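
A plausible reading of the comment, given as a sketch and not as the authors' answer: the CIFAR network has three dense blocks with n convolution layers each, plus four further layers (the initial convolution, two transition convolutions and the final classifier), so depth = 3n + 4. The helper below is hypothetical and only performs that arithmetic.

```python
def layers_per_block(depth, num_blocks=3, extra_layers=4):
    """Solve depth = num_blocks * n + extra_layers for n."""
    n, remainder = divmod(depth - extra_layers, num_blocks)
    if remainder != 0 or n <= 0:
        raise ValueError("depth must be of the form 3n+4 with n >= 1")
    return n


print(layers_per_block(40))  # 12
```

With the default depth=40 this gives n=12 layers per dense block.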

What is meaning of 121 in the notation DenseNet-121?

Thank you for sharing your nice work!
This is not a bug report. I just want to clarify a few points.

  1. In your Table 1, you use DenseNet-121, DenseNet-169, and so on. What does 121 mean, and how is it computed? If it is the depth of the network, how does it relate to the L term?
  2. In your solver.prototxt, why do you use such a large learning rate? People often use a very small learning rate (for the Adam method), like 0.001, instead of 0.1. Is the reason that you use a different method (Nesterov), which allows a high learning rate of 0.1?

Update: This is my solver using Adam method

train_net: "train_densenet_BC.prototxt"
display: 20
lr_policy: "step"
gamma: 0.1
stepsize: 20000
power: 0.75
# lr for normalized softmax
base_lr: 0.001
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train_dense"
type: "Adam"
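
On the first question in this issue, the number in the name can be reproduced by counting weight layers. This is a sketch under assumptions taken from the paper's Table 1, not from this repo's code: each dense layer in the BC variant contributes a 1x1 and a 3x3 convolution, each transition contributes one convolution, and the initial convolution and final classifier add one layer each. The block configurations for 169, 201 and 161 are quoted from that table.

```python
def densenet_bc_depth(block_config):
    """Count weight layers for a DenseNet-BC block configuration."""
    dense_convs = 2 * sum(block_config)    # 1x1 + 3x3 conv per dense layer
    transitions = len(block_config) - 1    # one conv per transition layer
    return 1 + dense_convs + transitions + 1   # initial conv + ... + classifier


for config in [(6, 12, 24, 16), (6, 12, 32, 32), (6, 12, 48, 32), (6, 12, 36, 24)]:
    print(config, densenet_bc_depth(config))
```

Under this counting the four configurations give 121, 169, 201 and 161, so the number plays the same role as L in the CIFAR models.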

Train problem

When I train this DenseNet on my own dataset, I find that some weight diff/data values are nan. I don't know how to solve this problem.
such as:weight diff/data:nan nan 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 nan

Out of Memory on 1080

Hi,

Have you tried running DenseNet on a GTX1080? I'm not able to load it even with batch size 1 as the GPU runs out of memory. Wondering if something may need to be tweaked in the Caffe implementation of the DenseNet.
