
caffe's Introduction

Caffe


Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details, like tutorial documentation, installation instructions, reference models, and step-by-step examples.

Custom distributions

Community

Join the chat at https://gitter.im/BVLC/caffe

Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BAIR/BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}

caffe's People

Contributors

blgene, cdluminate, cypof, dgolden1, eelstork, erictzeng, flx42, intelfx, jamt9000, jeffdonahue, jyegerlehner, kloudkl, longjon, lukeyeager, mavenlin, mohomran, nitnelave, noiredd, philkr, qipeng, rbgirshick, ronghanghu, sergeyk, sguada, shelhamer, tnarihi, williford, willyd, yangqing, yosinski


caffe's Issues

Testing Fail

Hi, I'm currently encountering a problem. When I run train_net.bin on my own data, the testing phase seems to crash. The logged information is:

I0111 00:39:27.884718 8026 solver.cpp:84] Testing net
F0111 00:39:28.925695 8026 syncedmem.cpp:45] Check failed: (cudaMalloc(&gpu_ptr_, size_)) == cudaSuccess (2 vs. 0)
*** Check failure stack trace: ***
@ 0x7f1b7e10bb5d google::LogMessage::Fail()
@ 0x7f1b7e10fb77 google::LogMessage::SendToLog()
@ 0x7f1b7e10d9f9 google::LogMessage::Flush()
@ 0x7f1b7e10dcfd google::LogMessageFatal::~LogMessageFatal()
@ 0x436d57 caffe::SyncedMemory::mutable_gpu_data()
@ 0x4208fe caffe::Blob<>::mutable_gpu_data()
@ 0x445dd4 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x42842a caffe::Net<>::ForwardPrefilled()
@ 0x41d319 caffe::Solver<>::Test()
@ 0x41e705 caffe::Solver<>::Solve()
@ 0x40b8dd main
@ 0x30b9c1ecdd (unknown)
@ 0x40b739 (unknown)

Aborted (core dumped)

I use exactly the network architecture defined in "imagenet.prototxt" and "imagenet_val.prototxt". My training and testing datasets are about 20 GB and contain 200,000 images cropped to 256×256.

Anyway, I feel a little confused now, so I would like to ask for help here. Many thanks!

About Fine Tuning

Hi. I plan to apply the pretrained ImageNet model to our own data, so fine-tuning is necessary to achieve satisfactory performance.

For fine-tuning, is it just a matter of loading the pretrained model, adjusting the network architecture to the desired one, and then starting training?

Thanks for any reply!
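Not an authoritative answer, but a rough sketch of the usual pattern at the time: keep the names of the layers whose weights you want to reuse (weights are copied by matching layer names), rename or replace the final classifier layer so it gets fresh weights for your label set, then load the pretrained weights into the solver's net before solving. The code below is an illustration only; the SGDSolver class and its net() accessor are assumptions based on the finetune_net binary mentioned elsewhere on this page.

// Hypothetical fine-tuning sketch, not the exact tool shipped with Caffe.
// argv[1]: solver prototxt, argv[2]: pretrained binaryproto snapshot.
#include "caffe/caffe.hpp"

using namespace caffe;

int main(int argc, char** argv) {
  SolverParameter solver_param;
  ReadProtoFromTextFile(argv[1], &solver_param);

  NetParameter pretrained_net_param;
  ReadProtoFromBinaryFile(argv[2], &pretrained_net_param);

  SGDSolver<float> solver(solver_param);
  // Layers are matched by name; any layer you renamed keeps its random
  // initialization, which is what you want for the new top layer(s).
  solver.net()->CopyTrainedLayersFrom(pretrained_net_param);  // assumes a net() accessor
  solver.Solve();
  return 0;
}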

Clarify MKL compiling and linking process

The Intel MKL Link Line Advisor explains the different ways to link MKL with other code, including static linking, dynamic linking, and the single dynamic library:
http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/

It mentions that when compiling for OS X one has to add -m64 to the clang options, but not for Linux.

It also mentions that in the case of dynamic linking one has to run mklvars.sh to set the appropriate environment variables.
http://software.intel.com/sites/products/documentation/hpc/mkl/userguides/mkl_userguide_win/MKL_UG_getting_started/Environment_Setting_Scripts.htm

http://software.intel.com/sites/products/documentation/hpc/mkl/userguides/mkl_userguide_win/MKL_UG_linking_your_application/Using_dynamic_interface.htm

Apparently there is a set of MKL redistributable libraries that we could redistribute with Caffe to let people run it without needing to install MKL.

How to let the Net Forward without a leveldb

If I don't use a leveldb, how can I run Forward with the trained Net?
I tried this function (copying data to the bottom blobs):
const vector<Blob<Dtype>*>& Forward(const vector<Blob<Dtype>*>& bottom);

But it doesn't work.

Anyone help me? Thank you.
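For what it's worth, the TestLiFT.cc code quoted further down this page does exactly this: it copies raw pixels into the net's input blob and calls ForwardPrefilled(). A minimal sketch of that pattern (CPU mode; it assumes the deploy prototxt declares an input blob whose size matches your data):

// Sketch: run a trained Net without a leveldb by filling its input blob
// directly and then calling ForwardPrefilled().
#include <cstring>
#include <vector>
#include "caffe/caffe.hpp"

using namespace caffe;

void ForwardOneExample(Net<float>& net, const float* data, int count) {
  Blob<float>* input = net.input_blobs()[0];
  CHECK_EQ(count, input->count()) << "Input size mismatch.";
  // Copy host data into the blob; Caffe syncs it to the GPU when needed.
  memcpy(input->mutable_cpu_data(), data, sizeof(float) * count);
  const std::vector<Blob<float>*>& result = net.ForwardPrefilled();
  const float* output = result[0]->cpu_data();
  for (int i = 0; i < result[0]->count(); ++i) {
    LOG(INFO) << "output[" << i << "] = " << output[i];
  }
}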

CUDA driver 331.38 incompatibility

There is some kind of CUDA incompatibility with NVIDIA driver 331.38. To fix, downgrade to 331.20.

Original detective work by @rbgirshick :


We isolated it to a problem with the backward pass of conv4. The problem manifested itself as GPU utilization dropping to ~3%, nothing happening for a while, and then the caffe process segfaulting.


As far as we know, this is a bug with the driver (Caffe works properly on all other versions tried).

Crash after iteration 1620: Check failed in cublasSgemm

I ran Caffe to train on my dataset, but after iteration 1620 the program crashed in cublasSgemm. The log is listed below. Can you give some advice on fixing this error?

I0127 14:31:22.608165 19425 solver.cpp:204] Iteration 1580, lr = 0.01
I0127 14:31:22.609833 19425 solver.cpp:66] Iteration 1580, loss = 0.0217456
I0127 14:31:49.345432 19425 solver.cpp:204] Iteration 1600, lr = 0.01
I0127 14:31:49.347100 19425 solver.cpp:66] Iteration 1600, loss = 0.0122987
I0127 14:32:16.079083 19425 solver.cpp:204] Iteration 1620, lr = 0.01
I0127 14:32:16.080762 19425 solver.cpp:66] Iteration 1620, loss = 1.67767
F0127 14:32:39.484519 19425 math_functions.cpp:45] Check failed: (cublasSgemm_v2(Caffe::cublas_handle(), cuTransB, cuTransA, N, M, K, &alpha, B, ldb, A, lda, &beta, C, N)) == CUBLAS_STATUS_SUCCESS (14 vs. 0)
*** Check failure stack trace: ***
@ 0x7fa69de70b7d google::LogMessage::Fail()
@ 0x7fa69de72c7f google::LogMessage::SendToLog()
@ 0x7fa69de7076c google::LogMessage::Flush()
@ 0x7fa69de7351d google::LogMessageFatal::~LogMessageFatal()
@ 0x42ee79 caffe::caffe_gpu_gemm<>()
@ 0x45f7e2 caffe::ConvolutionLayer<>::Backward_gpu()
@ 0x42770b caffe::Net<>::Backward()
@ 0x421278 caffe::Solver<>::Solve()
@ 0x40d055 main

Cuda kernel crash

Hi Yangqing,

We met a problem when running Caffe. We ran Caffe following the steps on http://caffe.berkeleyvision.org/imagenet.html for ImageNet training. Our training set is 1000 images selected from the whole ImageNet dataset, and the testing set is the same.
It crashes after 600 iterations; the error message is:
“F0118 21:19:41.088841 1589 padding_layer.cu:131] Cuda kernel failed. Error: unspecified launch failure”
We ran it again and it still crashed after 740 iterations, with the error message:
“F0118 20:57:41.093628 27945 math_functions.cpp:45] Check failed: (cublasSgemm_v2(Caffe::cublas_handle(), cuTransB, cuTransA, N, M, K, &alpha, B, ldb, A, lda, &beta, C, N)) == CUBLAS_STATUS_SUCCESS (14 vs. 0)”

Our testbed is Ubuntu 12.04 LTS with GTX Titan, CUDA 5.5

Do you have any idea what causes this crash? Thank you very much.

Best regards,
Minjie

LevelDB memory consumption problem (out of files)

When running Caffe on the ImageNet data, I observed that the memory usage (seen via top command) inexorably increases to almost 100%. With batchsize=256, this happens in around 2500 iterations. When I set the batchsize to 100, training was faster but by around 5000 iterations the memory consumption again increased to almost 100%. At that point the training slows down dramatically and in fact the loss does not change at all. I suspect the slowdown may be due to thrashing. I am wondering if there is a memory leak or something in Caffe that is unintentionally allocating more and more memory at each iteration.

The same issue occurs on MNIST, although the dataset is much smaller so the training can actually complete without issues.

I ran the MNIST data through the valgrind tool with --leak-check=full, and indeed some memory leaks were reported. These could be benign if the amount of leaked memory is constant, but maybe it is scaling with respect to the number of batches which could explain the forever-increasing memory consumption.

Any idea what could be the problem?

Update (12/13/2013): The problem may be in LevelDB. I was able to make it work by modifying src/caffe/layers/data_layer.cpp by setting options.max_open_files = 100. I think the default was 1000, which was just too much memory on the machine I was using. I also wonder whether it could be improved by setting ReadOptions::fill_cache=false, since Caffe seems to scan over the whole training set.
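For anyone hitting the same problem, a sketch of the two LevelDB knobs mentioned above, shown against the plain LevelDB API (the exact place they belong in data_layer.cpp may differ between Caffe versions):

// Sketch of the LevelDB settings discussed above.
#include <string>
#include <leveldb/db.h>

leveldb::DB* OpenTrainingDB(const std::string& source) {
  leveldb::Options options;
  options.create_if_missing = false;
  options.max_open_files = 100;  // default is 1000; each open table caches data
  leveldb::DB* db = NULL;
  leveldb::Status status = leveldb::DB::Open(options, source, &db);
  // Check status.ok() in real code.
  return db;
}

// When scanning the whole training set sequentially, bypassing the block
// cache avoids filling memory with data that is read only once per epoch.
leveldb::Iterator* NewScanIterator(leveldb::DB* db) {
  leveldb::ReadOptions read_options;
  read_options.fill_cache = false;
  return db->NewIterator(read_options);
}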

Why not using GSL to replace MKL?

GSL is under the GNU GPL and is easy to compile cross-platform. Maybe its performance is not as good as MKL's, but I think CPU mode is mostly used to verify consistency with GPU mode.

make pycaffe error

On the boost-eigen branch, I was able to run make and make runtest and all was good (Ubuntu 12.04, GTX 580).

But when I ran "make pycaffe" (in the same directory), I got the following errors...

python/caffe/pycaffe.cpp: In member function ‘boost::python::api::object CaffeBlobWrap::get_data()’:
python/caffe/pycaffe.cpp:72:74: error: ‘PyArray_SetBaseObject’ was not declared in this scope
python/caffe/pycaffe.cpp: In member function ‘boost::python::api::object CaffeBlobWrap::get_diff()’:
python/caffe/pycaffe.cpp:85:74: error: ‘PyArray_SetBaseObject’ was not declared in this scope
make: *** [py] Error 1

any ideas?

THANKS!! for making this available by the way.

Remove opencv dependence

OpenCV is a heavy dependency when we are only using it for image IO. The upfront install work could be lessened if there were simple libpng and libjpeg wrappers to handle image loading and saving.

make distribute pycaffe error

When I updated to the latest version of Caffe and built it with make distribute pycaffe, I got this error:

python/caffe/pycaffe.cpp: In member function ‘boost::python::api::object CaffeBlobWrap::get_data()’:
python/caffe/pycaffe.cpp:72:74: Error: ‘PyArray_SetBaseObject’ is not defined
python/caffe/pycaffe.cpp: In member function ‘boost::python::api::object CaffeBlobWrap::get_diff()’:
python/caffe/pycaffe.cpp:85:74: Error: ‘PyArray_SetBaseObject’ is not defined

When using Caffe::set_mode(Caffe::GPU), the program doesn't work correctly?

In order to get the probability output of the softmax layer, I use the code below.
Shell command:
GLOG_logtostderr=1 ../examples/TestLiFT.bin TestLiFT.prototxt lenet_iter_5000 10 "GPU" t10k-images-idx3-ubyte t10k-labels-idx1-ubyte
It works correctly if I use Caffe::set_mode(Caffe::CPU):
E0124 16:31:13.968103 8064 TestLiFT.cpp:114] result size = 1 result channel = 10
E0124 16:31:13.968116 8064 TestLiFT.cpp:117] label = 5
E0124 16:31:13.968125 8064 TestLiFT.cpp:119] 1.53695e-05
E0124 16:31:13.968137 8064 TestLiFT.cpp:119] 4.50447e-06
E0124 16:31:13.968147 8064 TestLiFT.cpp:119] 4.77117e-05
E0124 16:31:13.968157 8064 TestLiFT.cpp:119] 0.000178817
E0124 16:31:13.968166 8064 TestLiFT.cpp:119] 1.75801e-05
E0124 16:31:13.968176 8064 TestLiFT.cpp:119] 0.958509
E0124 16:31:13.968185 8064 TestLiFT.cpp:119] 0.00345705
E0124 16:31:13.968196 8064 TestLiFT.cpp:119] 2.1301e-06
E0124 16:31:13.968205 8064 TestLiFT.cpp:119] 0.0376869
E0124 16:31:13.968215 8064 TestLiFT.cpp:119] 8.0598e-05
E0124 16:31:13.968225 8064 TestLiFT.cpp:121] ---------------------
E0124 16:31:13.969791 8064 TestLiFT.cpp:114] result size = 1 result channel = 10
E0124 16:31:13.969805 8064 TestLiFT.cpp:117] label = 9
E0124 16:31:13.969813 8064 TestLiFT.cpp:119] 0.0042518
E0124 16:31:13.969825 8064 TestLiFT.cpp:119] 0.000264884
E0124 16:31:13.969835 8064 TestLiFT.cpp:119] 0.000825241
E0124 16:31:13.969844 8064 TestLiFT.cpp:119] 0.0278211
E0124 16:31:13.969854 8064 TestLiFT.cpp:119] 0.0487807
E0124 16:31:13.969863 8064 TestLiFT.cpp:119] 0.0230153
E0124 16:31:13.969873 8064 TestLiFT.cpp:119] 0.000117688
E0124 16:31:13.969883 8064 TestLiFT.cpp:119] 0.209391
E0124 16:31:13.969893 8064 TestLiFT.cpp:119] 0.150328
E0124 16:31:13.969902 8064 TestLiFT.cpp:119] 0.535204
E0124 16:31:13.969913 8064 TestLiFT.cpp:121] ---------------------

If not, the output of the softmax is the same for different images:
E0124 16:32:27.603427 8087 TestLiFT.cpp:114] result size = 1 result channel = 10
E0124 16:32:27.603441 8087 TestLiFT.cpp:117] label = 5
E0124 16:32:27.604112 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604125 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604135 8087 TestLiFT.cpp:119] 1
E0124 16:32:27.604145 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604154 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604163 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604173 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604182 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604192 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604202 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.604212 8087 TestLiFT.cpp:121] ---------------------
E0124 16:32:27.604346 8087 TestLiFT.cpp:114] result size = 1 result channel = 10
E0124 16:32:27.604360 8087 TestLiFT.cpp:117] label = 9
E0124 16:32:27.605039 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605062 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605077 8087 TestLiFT.cpp:119] 1
E0124 16:32:27.605087 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605097 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605106 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605115 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605125 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605134 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605144 8087 TestLiFT.cpp:119] 0
E0124 16:32:27.605154 8087 TestLiFT.cpp:121] ---------------------

TestLiFT.cc
// Copyright 2013 Yangqing Jia
//
// This is a simple script that allows one to quickly test a network whose
// structure is specified by text format protocol buffers, and whose parameter
// are loaded from a pre-trained network.
// Usage:
// test_net net_proto pretrained_net_proto iterations [CPU/GPU]

#include <cuda_runtime.h>

#include <cstdlib>
#include <cstring>
#include <fstream>

#include "caffe/caffe.hpp"

using namespace caffe;

uint32_t swap_endian( uint32_t val )
{
val = ((val << 8) & 0xFF00FF00 ) | ((val >> 8) & 0xFF00FF );
return (val << 16) | (val >> 16);
}

int main(int argc, char** argv) {
if (argc < 7) {
LOG(ERROR) << "TestLiFT net_proto pretrained_net_proto iterations [CPU/GPU] image-data lable-data";
return 0;
}

cudaSetDevice(0);
Caffe::set_phase(Caffe::TEST);

if (argc == 7 && strcmp(argv[4], "GPU") == 0) {
    LOG(ERROR) << "Using GPU";
    Caffe::set_mode(Caffe::GPU);
} else {
    LOG(ERROR) << "Using CPU";
    Caffe::set_mode(Caffe::CPU);
}
//If don't set_mode(Caffe::CPU), the output will be the same. Why?
Caffe::set_mode(Caffe::CPU);// 
//Caffe::set_phase(Caffe::TEST); 

NetParameter test_net_param;
ReadProtoFromTextFile(argv[1], &test_net_param);
Net<float> caffe_test_net(test_net_param);
NetParameter trained_net_param;
ReadProtoFromBinaryFile(argv[2], &trained_net_param);
caffe_test_net.CopyTrainedLayersFrom(trained_net_param);

int total_iter = atoi(argv[3]);
LOG(ERROR) << "Running " << total_iter << "Iterations.";

// Open files
char* image_filename = argv[5];
char* label_filename = argv[6];

std::ifstream image_file(argv[5], std::ios::in | std::ios::binary);
std::ifstream label_file(argv[6], std::ios::in | std::ios::binary);
CHECK(image_file) << "Unable to open file " << image_filename;
CHECK(label_file) << "Unable to open file " << label_file;
// Read the magic and the meta data
uint32_t magic;
uint32_t num_items;
uint32_t num_labels;
uint32_t rows;
uint32_t cols;

image_file.read((char*)(&magic), 4);
magic = swap_endian(magic);
CHECK_EQ(magic, 2051) << "Incorrect image file magic.";
label_file.read((char*)(&magic), 4);
magic = swap_endian(magic);
CHECK_EQ(magic, 2049) << "Incorrect label file magic.";
image_file.read((char*)(&num_items), 4);
num_items = swap_endian(num_items);
label_file.read((char*)(&num_labels), 4);
num_labels = swap_endian(num_labels);
CHECK_EQ(num_items, num_labels);
image_file.read((char*)(&rows), 4);
rows = swap_endian(rows);
image_file.read((char*)(&cols), 4);
cols = swap_endian(cols);

char label;
float* pixels = new float[rows * cols]; 

LOG(INFO) << "A total of " << num_items << " items.";
LOG(INFO) << "Rows: " << rows << " Cols: " << cols;

uint8_t pixel;

for (int itemid = 0; itemid < total_iter; ++itemid) {
    // char -> float
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j)
            {
                image_file.read((char*)&pixel, 1);  
                *(pixels+i*cols+j) = 0.00390625*static_cast<float>(pixel);
            }    
    }

    label_file.read(&label, 1);
    int label2 = (int)label;

    cudaMemcpy(caffe_test_net.input_blobs()[0]->mutable_gpu_data(), pixels, sizeof(float)*rows*cols, cudaMemcpyHostToDevice);
    //memcpy(caffe_test_net.input_blobs()[0]->mutable_cpu_data(), pixels, sizeof(float)*rows*cols);

    const vector<Blob<float>*>& result = caffe_test_net.ForwardPrefilled();

    LOG(ERROR) << "result size = " << result.size()
            << " result channel = " << result[0]->channels();
    Blob<float>* prob = result[0];
    LOG(ERROR) << "label = " << label2;
    for (int t = 0; t < prob->count(); ++t) {
        LOG(ERROR) << *(prob->cpu_data()+t) << " ";
    }
    LOG(ERROR) << "---------------------";
}

delete []pixels;

return 0;

}

Write padding aware im2col?

@Yangqing Is there any specific reason for having an individual padding layer rather than supporting padding in im2col? My concern is that the padding layer causes unnecessary extra memory usage which could be avoided. Would it be worthwhile to implement a padding-aware im2col function?
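For concreteness, here is a rough sketch of what a padding-aware im2col_cpu could look like, with zero-padding folded into the gather loop so no padded copy of the image is ever materialized. This is an illustration only, not the kernel Caffe actually adopted.

// Sketch: im2col with built-in zero padding. Out-of-range source pixels are
// written as 0 instead of being read from a separately padded buffer.
template <typename Dtype>
void im2col_padded(const Dtype* data_im, int channels, int height, int width,
                   int ksize, int pad, int stride, Dtype* data_col) {
  const int height_col = (height + 2 * pad - ksize) / stride + 1;
  const int width_col = (width + 2 * pad - ksize) / stride + 1;
  const int channels_col = channels * ksize * ksize;
  for (int c = 0; c < channels_col; ++c) {
    const int w_offset = c % ksize;
    const int h_offset = (c / ksize) % ksize;
    const int c_im = c / ksize / ksize;
    for (int h = 0; h < height_col; ++h) {
      for (int w = 0; w < width_col; ++w) {
        const int h_im = h * stride + h_offset - pad;
        const int w_im = w * stride + w_offset - pad;
        data_col[(c * height_col + h) * width_col + w] =
            (h_im >= 0 && h_im < height && w_im >= 0 && w_im < width)
                ? data_im[(c_im * height + h_im) * width + w_im]
                : 0;
      }
    }
  }
}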

Arch Linux Question

I have errors when compiling which probably have more to do with Arch than Caffe, but I thought I would ask.

/opt/cuda/bin/nvcc -ccbin=/usr/bin/g++ -Xcompiler -fPIC -DNDEBUG -O2 -I/usr/local/include -I/usr/include/python2.7 -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I./src -I./include -I/opt/cuda/include -I/opt/intel/mkl/include -arch=sm_30 -c src/caffe/layers/lrn_layer.cu -o src/caffe/layers/lrn_layer.cuo
/opt/cuda/bin/nvcc -ccbin=/usr/bin/g++ -Xcompiler -fPIC -DNDEBUG -O2 -I/usr/local/include -I/usr/include/python2.7 -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I./src -I./include -I/opt/cuda/include -I/opt/intel/mkl/include -arch=sm_30 -c src/caffe/layers/padding_layer.cu -o src/caffe/layers/padding_layer.cuo
/opt/cuda/bin/nvcc -ccbin=/usr/bin/g++ -Xcompiler -fPIC -DNDEBUG -O2 -I/usr/local/include -I/usr/include/python2.7 -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I./src -I./include -I/opt/cuda/include -I/opt/intel/mkl/include -arch=sm_30 -c src/caffe/layers/pooling_layer.cu -o src/caffe/layers/pooling_layer.cuo
src/caffe/layers/pooling_layer.cu(163): error: kernel launches from templates are not allowed in system files

src/caffe/layers/pooling_layer.cu(168): error: kernel launches from templates are not allowed in system files

src/caffe/layers/pooling_layer.cu(289): error: kernel launches from templates are not allowed in system files

src/caffe/layers/pooling_layer.cu(295): error: kernel launches from templates are not allowed in system files

src/caffe/layers/pooling_layer.cu(301): error: kernel launches from templates are not allowed in system files

5 errors detected in the compilation of "/tmp/tmpxft_00000bde_00000000-4_pooling_layer.cpp4.ii".
Makefile:130: recipe for target 'src/caffe/layers/pooling_layer.cuo' failed
make: *** [src/caffe/layers/pooling_layer.cuo] Error 2

I've seen a similar problem with Thrust, but that happens because the Thrust code is inside the /usr/local/include directories. This should not happen with code in my $HOME directory. I tried following the suggested steps anyway; my environment does not have any paths containing $HOME, so it should not be considered a system file location. I'll see if Arch has anything related to this.

This error also occurs with dropout_layer.cu, and softmax_layer.cu, but not with any of the other cuda files. I've looked at them, but I could not tell what would cause this problem only for them. What do those 3 have in common?

dedicated build/ directory

So as to enable easy rsyncing of the codebase between machines, all object files should be written to a separate build/ directory that can be easily excluded from syncing.

Memory requirements when resuming training from a solverstate

I'm having some memory issues when I resume training from a previous solverstate. When I train the network from scratch it fits in memory, but when I resume training, I get a memory error during testing after training. That suggests there is now less memory available.

Any suggestions?
Is the memory allocated for resuming freed before training restarts?

Building on OSX Mavericks

Any pointers on getting it running on OS X 10.9 (Mavericks)? I am getting the following OpenCV-related error while building Caffe:

"ld: symbol(s) not found for architecture x86_64"

I found references to issues with OpenCV on Mavericks, but no solution has worked for me.

The complete error log can be seen here:
https://gist.github.com/mayankjuneja/8517932

Leveldb data format

Hi guys, since my dataset is in a MATLAB .mat file, I want to convert it to leveldb format so it can be used with Caffe. Can you give me some hints? Many thanks.
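Not a complete answer, but the general pattern is: export the arrays from MATLAB (or read the .mat file with another tool), pack each example into a caffe::Datum, and Put the serialized proto into a LevelDB under unique, lexicographically ordered keys, which is roughly what convert_imageset does for images. A sketch, assuming the raw float data is already in memory:

// Sketch: write in-memory examples into a LevelDB that the data layer can read.
// `images` is assumed to hold channels*height*width floats per example.
#include <cstdio>
#include <string>
#include <leveldb/db.h>
#include "caffe/proto/caffe.pb.h"

void WriteExamples(const std::string& db_path, const float* images,
                   const int* labels, int num, int channels, int height,
                   int width) {
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::DB* db = NULL;
  leveldb::DB::Open(options, db_path, &db);  // check the returned Status in real code

  const int dim = channels * height * width;
  for (int i = 0; i < num; ++i) {
    caffe::Datum datum;
    datum.set_channels(channels);
    datum.set_height(height);
    datum.set_width(width);
    datum.set_label(labels[i]);
    for (int j = 0; j < dim; ++j) {
      datum.add_float_data(images[i * dim + j]);
    }
    char key[16];
    snprintf(key, sizeof(key), "%08d", i);  // keys must be unique and sorted
    std::string value;
    datum.SerializeToString(&value);
    db->Put(leveldb::WriteOptions(), std::string(key), value);
  }
  delete db;
}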

Implement simplified Nesterov momentum

The main idea of Nesterov accelerated gradient (NAG, Nesterov momentum) is to update the parameter with the gradient at the predicted (peeked-ahead) parameter. To reduce the sample variance, NAG smoothes the update by exponentially averaging the histories.

Sutskever et al. [1] showed that NAG is effective at improving the stability and convergence rate of stochastic optimization of deep networks, and that it can be implemented in two steps.

[equation image omitted; the updates are reproduced after the references below]

Simplified Nesterov momentum updates:
[equation image omitted]

Bengio et al.[2] reformulated it to indicate that it was equivalent to the standard momentum except for different linear weighting coefficients.

[1] Sutskever, I., Martens, J., Dahl, G. and Hinton, G. E. On the importance of momentum and initialization in deep learning. In 30th International Conference on Machine Learning, Atlanta, USA, 2013. JMLR: W&CP volume 28.
[2] Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu. Advances in Optimizing Recurrent Networks. arXiv 1212.0901.
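The two equation images above did not survive; for reference, the updates being discussed are usually written as below (my reconstruction in standard notation following [1] and [2], with parameters θ, velocity v, momentum coefficient μ, and learning rate ε):

% Classical two-step Nesterov accelerated gradient:
v_{t+1} = \mu v_t - \varepsilon \nabla f(\theta_t + \mu v_t), \qquad
\theta_{t+1} = \theta_t + v_{t+1}

% Simplified momentum on the shifted variable \Phi_t = \theta_t + \mu v_t,
% which only ever needs the gradient at the current parameters:
v_{t+1} = \mu v_t - \varepsilon \nabla f(\Phi_t), \qquad
\Phi_{t+1} = \Phi_t - \mu v_t + (1 + \mu)\, v_{t+1}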

LevelDB hangs when creating initial database

Summary

I've been unable to run the convert_imageset.bin command on my data set; it always hangs a few thousand files into the process. By debugging into it, I've found that LevelDB's compaction thread is the culprit: it appears to start thrashing and never completes the compaction, blocking any writes from the main thread.

I've created a minimal test case at https://github.com/petewarden/caffe/tree/leveldb-hang, I'd love your thoughts on what I'm doing differently to your building process. Here are the steps to reproduce the problem:

Steps

  • Set up a machine according to the installation instructions, using Ubuntu 12.04, and installing libleveldb-dev via apt-get.
  • Check out the https://github.com/petewarden/caffe/tree/leveldb-hang branch. The only difference from the main branch is a new python/caffe/imagenet/hang-example/ folder that contains a single image, and a million-line training file referencing that image with 1m different labels. This setup mimics my actual situation without requiring a massive file dump. I actually have around 1m files and 1,000 labels, but the label count has been bumped up to ensure the keys are unique.
  • Run the command below from the caffe source folder (replacing the database location with something appropriate)

GLOG_logtostderr=1 build/examples/convert_imageset.bin python/caffe/imagenet/hang-example/ python/caffe/imagenet/hang-example/train.txt /mnt/leveldb-hang-database 1

Results

On my machine, the process usually processes around 4,000 files, and then hangs. The longest I've left it is 17 hours, and it never makes any further progress. By attaching to the process using gdb, I can see the main thread is stuck in a lock waiting for the compaction thread to finish:

#0  0x00007f45d65fc3f5 in snappy::internal::CompressFragment(char const*, unsigned long, char*, unsigned short*, int) () from /usr/lib/libsnappy.so.1
#1  0x00007f45d65fca6b in snappy::Compress(snappy::Source*, snappy::Sink*) () from /usr/lib/libsnappy.so.1
#2  0x00007f45d65fd507 in snappy::RawCompress(char const*, unsigned long, char*, unsigned long*) () from /usr/lib/libsnappy.so.1
#3  0x00000000004321b3 in leveldb::TableBuilder::WriteBlock(leveldb::BlockBuilder*, leveldb::BlockHandle*) ()
#4  0x00000000004323d4 in leveldb::TableBuilder::Flush() ()
#5  0x00000000004325c2 in leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&) ()
#6  0x0000000000435ca7 in leveldb::BuildTable(std::string const&, leveldb::Env*, leveldb::Options const&, leveldb::TableCache*, leveldb::Iterator*, leveldb::FileMetaData*) ()
#7  0x000000000041c0b0 in leveldb::DBImpl::WriteLevel0Table(leveldb::MemTable*, leveldb::VersionEdit*, leveldb::Version*) ()
#8  0x000000000041cf9c in leveldb::DBImpl::CompactMemTable() ()
#9  0x000000000041e165 in leveldb::DBImpl::BackgroundCompaction() ()
#10 0x000000000041ee88 in leveldb::DBImpl::BackgroundCall() ()
#11 0x000000000043883f in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) ()
#12 0x00007f45d6d25e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f45d53dc3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f45d712a940 (LWP 25654)):
#0  0x00007f45d6d29d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000043056d in leveldb::port::CondVar::Wait() ()
#2  0x000000000041971b in leveldb::DBImpl::MakeRoomForWrite(bool) ()
#3  0x0000000000419a63 in leveldb::DBImpl::Write(leveldb::WriteOptions const&, leveldb::WriteBatch*) ()
#4  0x0000000000407f34 in main ()

Notes

From what I've been able to find, there are known issues with the LevelDB compaction process and larger files, eg:

https://github.com/basho/leveldb/wiki/repair-notes

Sanjay from LevelDB has some notes on problems with large files in this issue too:

https://code.google.com/p/leveldb/issues/detail?id=89

I tried recompiling with Basho's fork of LevelDB, since they mentioned having some fixes for their issue, but that didn't help. I also tweaked the kL0_StopWritesTrigger value to increase the threshold, but that also didn't make any real difference.

You've obviously been able to run this successfully though for your data, so I'd mostly just love to know what I'm doing wrong!

What is an iteration in the MNIST demo?

The website says the solver prints the training error every 100 iterations. What is an iteration? Is it a single forward-backward pass over one training image? I see that batchsize is a parameter in lenet.prototxt; are you doing mini-batch gradient descent, so that every iteration is actually 64 forward-backward passes and one weight update? Thanks!
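For context, a quick back-of-the-envelope calculation, assuming the lenet.prototxt batch size of 64 and MNIST's 60,000 training images (so one iteration is one mini-batch, i.e. 64 forward-backward passes plus a single weight update):

\text{images per 100-iteration report} = 100 \times 64 = 6400, \qquad
\text{iterations per epoch} = \frac{60000}{64} \approx 938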

Implement Matrix class to abstract algorithms away from data storage details

Currently, the algorithm code is quite aware of the memory layout of the underlying data. Adding a Matrix class in between helps separate the concerns of different modules, which is good practice in software engineering.

The biggest benefit is to simplify coding and improve development productivity. It will also ease understanding of existing and future algorithms. As a result, we will see accelerated development and adoption.

The Matrix class is intended to be a view of 2D array contained in a Blob. Its main functionality is to provide high level wrappers of the common operations.

using boost::move;

template <typename Dtype>
class Matrix {
 public:
  Matrix();
  explicit Matrix(shared_ptr<Blob<Dtype> > blob);
  Matrix<Dtype> mul(Matrix<Dtype>& that) {
    Matrix<Dtype> product;
    caffe_gpu_gemm(...);  // or the CPU equivalent in CPU mode
    return move(product);
  }
  Matrix<Dtype> add(Matrix<Dtype>& that);
  // Plus: minus, div, rdiv, sqr, pow, exp, conv, sum, max, min, mean, std,
  // ones, zeros, rand, randn, size, rows, cols, row, col, roi, t/transpose,
  // rot90, ...
 private:
  shared_ptr<Blob<Dtype> > blob_;
  size_t num_;
  size_t channel_;
  size_t offset_;
};

With this in place, we can write code like the following snippets.
The convolution:

output = image.conv(filter);

The fully connected layer:

output = weight.mul(input).add(bias);

The ReLU activation:

activation = input.max(0);

The Softmax activation

activations = input.exp();
probs = activations.rdiv(activations.sum(dim));

As you can see, the API is highly inspired by MATLAB which also motivates ArrayFire C++. But of course the snippets are only rough sketches. Many more details need to be considered. For example, if the performance price of boost move operations is too high, it could be replaced by shared_ptr which would complicate the user codes a little. Another question is should we pass in the shared_ptr of the result matrix instead of returning it. More importantly, the GPU codes may greatly differ from the CPU codes depending on whether CUDA can play well with the proposed API syntax.

Therefore, this issue's scope is limited to the implementation of the Matrix classes for both kinds of devices. Porting algorithms should be put into independent issues until benchmark results show no performance gap between the low level API and the proposed high level API.

Efforts to refine the API and help implement it are welcome.

Add set_device_id to solver prototxt

It would be nice to be able to set the GPU device_id in the solver.prototxt instead of it being hard-coded. This adds to #57.

message SolverParameter {
...
  // the mode the solver will use: 0 for CPU and 1 for GPU. Use GPU by default.
  optional int32 solver_mode = 17 [default = 1];
  // the device_id that will be used in GPU mode. Use device_id = 1 by default.
  optional int32 device_id = 18 [default = 1];
}
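To illustrate how the fields could be consumed, a hypothetical sketch of the training binary reading the proposed settings. The solver_param.solver_mode() and solver_param.device_id() accessors only exist once the fields above are added, and the Caffe::SetDevice call is assumed to be available:

#include "caffe/caffe.hpp"

using namespace caffe;

// Hypothetical: pick the mode and device from the proposed SolverParameter
// fields before constructing the solver.
void ConfigureDeviceFromSolverParam(const SolverParameter& solver_param) {
  if (solver_param.solver_mode() == 1) {
    Caffe::set_mode(Caffe::GPU);
    Caffe::SetDevice(solver_param.device_id());  // assumed setter for the GPU id
  } else {
    Caffe::set_mode(Caffe::CPU);
  }
}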

Caffe on Windows

I have successfully run Caffe on Ubuntu 12.04. I am wondering whether Caffe can run on a Windows system?

The accuracy of evaluation cannot increase when training imagenet

I have strictly followed Yangqing's instructions on his webpage (http://caffe.berkeleyvision.org/imagenet.html) to train on ImageNet (including using the .proto files provided by Yangqing) and have shuffled the training data, yet the accuracy on the evaluation data stays at 0.001 even after 77,920 iterations. Here is the current output:

I0127 08:53:57.624028 37204 solver.cpp:210] Iteration 78000, lr = 0.01
I0127 08:53:57.633610 37204 solver.cpp:68] Iteration 78000, loss = 6.9063
I0127 08:53:57.633633 37204 solver.cpp:90] Testing net
I0127 08:56:01.357560 37204 solver.cpp:117] Test score # 0: 0.001
I0127 08:56:01.357609 37204 solver.cpp:117] Test score # 1: 6.90977
I0127 08:56:33.533275 37204 solver.cpp:210] Iteration 78020, lr = 0.01
I0127 08:56:33.542655 37204 solver.cpp:68] Iteration 78020, loss = 6.90727
I0127 08:57:05.939363 37204 solver.cpp:210] Iteration 78040, lr = 0.01
I0127 08:57:05.948905 37204 solver.cpp:68] Iteration 78040, loss = 6.9073

To find the reason, I ran the code on MNIST and got roughly the correct result. Also, I have sampled some images from both the training dataset and the evaluation dataset; the labels and images both look OK.

Has anyone encountered this situation before? Can anyone give me some help with solving this problem?

My environment is Ubuntu 13.10 with GTX Titan, CUDA 5.5.

Support multithreading in the CPU mode of Solver::Solve

In each iteration of Solver::Solve, there are four chances to accelerate the computation.
The first opportunity is the most complex one since Net::ForwardBackward invokes the Forward and Backward of all the layers that comprise a net.

      Dtype loss = net_->ForwardBackward(bottom_vec);

The second chance is more straightforward. An OpenMP directive is enough to parallelize the independent computation for each param_id (see the sketch after this list).

      ComputeUpdateValue();

The only extra trick that is needed to deal with the next occasion is to distinguish CPU and GPU mode.

      net_->Update();

The last one involves a plain old OpenMP friendly nested for loop.

      Test();
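As an illustration of the second opportunity, a hedged sketch of the OpenMP pattern for the per-parameter update. The real ComputeUpdateValue works on Blobs through caffe_cpu_axpby and friends; this standalone version just shows where the directive would go, using a plain momentum update over raw buffers:

// Sketch: parallelizing an independent per-parameter-blob update with OpenMP.
// Each param_id touches disjoint memory, so the iterations are independent.
#include <omp.h>
#include <vector>

void ComputeUpdateValueCPU(std::vector<float*>& param_diffs,
                           std::vector<float*>& history,
                           const std::vector<int>& counts,
                           float rate, float momentum) {
  const int num_params = static_cast<int>(param_diffs.size());
  #pragma omp parallel for
  for (int param_id = 0; param_id < num_params; ++param_id) {
    float* diff = param_diffs[param_id];
    float* hist = history[param_id];
    for (int i = 0; i < counts[param_id]; ++i) {
      // history = momentum * history + rate * gradient; update = history
      hist[i] = momentum * hist[i] + rate * diff[i];
      diff[i] = hist[i];
    }
  }
}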

Compile error about google:protobuf

Dear fellows, I got the following error while compiling all. I have installed the Google logging library; can anyone help me with this? Many thanks!

libcaffe.a(caffe.pb.o): In function `caffe::LayerParameter::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)':
caffe.pb.cc:(.text+0xbe2d): undefined reference to `google::protobuf::io::CodedInputStream::BytesUntilLimit() const'
caffe.pb.cc:(.text+0xbef5): undefined reference to `google::protobuf::io::CodedInputStream::BytesUntilLimit() const'
libcaffe.a(caffe.pb.o):caffe.pb.cc:(.text+0xdefd): more undefined references to `google::protobuf::io::CodedInputStream::BytesUntilLimit() const' follow
libcaffe.a(io.o): In function `caffe::ReadProtoFromBinaryFile(char const*, google::protobuf::Message*)':
io.cpp:(.text+0xb6d): undefined reference to `google::protobuf::io::CodedInputStream::default_recursion_limit_'
io.cpp:(.text+0xbb4): undefined reference to `google::protobuf::io::CodedInputStream::~CodedInputStream()'
collect2: ld returned 1 exit status
make: *** [examples/finetune_net.bin] Error 1

error when call 'make pycaffe'

Hi,
I need help with this.
When I compile Caffe with 'make all', everything is OK. But when I compile the Python wrapper with 'make pycaffe', it shows this error:


python/caffe/pycaffe.cpp: In member function ‘boost::python::api::object CaffeBlobWrap::get_data()’:
python/caffe/pycaffe.cpp:72:74: error: ‘PyArray_SetBaseObject’ was not declared in this scope
python/caffe/pycaffe.cpp: In member function ‘boost::python::api::object CaffeBlobWrap::get_diff()’:
python/caffe/pycaffe.cpp:85:74: error: ‘PyArray_SetBaseObject’ was not declared in this scope

I installed all the libraries mentioned in the 'Prerequisites' section. I've looked on Google but can't find any clue.
I use Ubuntu 12.04.

Thanks.

Sparsity penalties for unsupervised learning

Is there an easy way to implement L1 regularization on the weight matrix of a fully connected network? Similarly, I want to penalize the L1 norm of the features in each layer. What is the best way to do that using Caffe?
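For reference, the objective being asked about is the usual L1-penalized loss; this is the standard formulation, not something Caffe exposed out of the box at the time:

E(W) = L_{\text{data}}(W) + \lambda \sum_{i,j} \lvert W_{ij} \rvert,
\qquad
\frac{\partial}{\partial W_{ij}} \Big( \lambda \sum_{i,j} \lvert W_{ij} \rvert \Big)
  = \lambda \, \operatorname{sign}(W_{ij}) \quad \text{(subgradient)}

% An analogous penalty \lambda \sum_k \lvert h_k \rvert on a layer's
% activations h encourages sparse features.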

Extract the middle features

Hi guys,
A basic question: how can I efficiently extract the output of an intermediate layer (e.g. the 8th, fully connected layer) as a feature vector (dimension: 1000)? Is there a simple function to call?
Many thanks.

Implement adaptive learning rate

Commit Yangqing/caffe@4c2c197 says “regarding https://plus.google.com/113952791760990667476/posts/Q5fwQY6JeEq - stopped the adagrad attempt”.

The post and the comments discussed three interacting factors (adaptive learning rates, momentum, and synchronicity) that greatly impact the stability of the learning process. The discussions did not come to the conclusion that all adaptive learning rate scheduling schemes are harmful. It is still worthwhile to consider implementing an improved variant of AdaGrad, for three reasons.

First, AdaGrad's creator made new progress. In response to Daniel Povey's concern about the convergence of AdaGrad, Fernando Pereira mentioned in the comments that John Duchi proposed a variant to support asynchronous updates. The work was published in NIPS 2013 [1]. Although reviewer 6 suspected the method had limited applicability, the authors responded that "in additional experiments with image and speech data" they saw "similar benefits to those reported in the paper".

Second, AdaGrad's critics improved it. Andrew Senior, who is at Google, said that AdaGrad performed worse than synchronous and asynchronous SGD in some recent speech experiments. While two [2][3] of his four papers at ICASSP 2013 provided supportive evidence for AdaGrad's performance relative to SGD, he did find its limitations and demonstrated that AdaDec, which "decouples long-term learning-rate scheduling from per-parameter learning rate variation", achieved better frame accuracies [4].

Third, experiments using both an adaptive learning rate and momentum have shown stable convergence. The post's author Daniel Povey, who found momentum unstable, finally made it work by limiting the parameter updates per minibatch. This could probably be implemented with "clipped gradients" and "sparser gradients via sparse output regularization and rectified outputs", two of the ideas that enable effective training of recurrent neural networks [5]. Therefore, an adaptive learning rate does not necessarily interfere with momentum and lead to divergent training.

There are already multiple AdaGrad variants with different convergence guarantees. The first step in resolving this issue is to choose the one that is most suitable for stable synchronous and asynchronous training on image datasets.

[1] John C. Duchi, Michael I. Jordan, and Brendan McMahan. Estimation, Optimization, and Parallelism when Data is Sparse. Neural Information Processing Systems (NIPS 2013).
[2] M.D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q.V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, G.E. Hinton. On Rectified Linear Units For Speech Processing. 38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver (2013).
[3] Georg Heigold, Vincent Vanhoucke, Andrew Senior, Patrick Nguyen, Marc'aurelio Ranzato, Matthieu Devin, Jeff Dean. Multilingual acoustic models using distributed deep neural networks. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013).
[4] Andrew Senior, Georg Heigold, Marc'aurelio Ranzato, Ke Yang. An Empirical study of learning rates in deep neural networks for speech recognition. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013).
[5] Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu. Advances in Optimizing Recurrent Networks. 38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver (2013).
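For reference, the basic per-parameter AdaGrad update that the discussion revolves around, in standard form (learning rate η, gradient g_{t,i}, small ε for numerical stability):

\theta_{t+1,\,i} = \theta_{t,\,i}
  - \frac{\eta}{\sqrt{\sum_{\tau=1}^{t} g_{\tau,\,i}^{2}} + \epsilon}\; g_{t,\,i}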

Revive the distributed solver efforts

@Yangqing started the work to implement the distributed solver in a series of commits 64e28ba, 591c36b, a3eb62a, a48147c, 3385a14, 7c6835d, 04f5224.
In the area of high performance computing, MPI is commonly used for inter-node communication and has been integrated with deep learning algorithms [1]. Last year, Kai Yu, the executive vice president of the Baidu Institute of Deep Learning, announced PADDLE, their GPU counterpart to Google's DistBelief. Therefore, we should continue the development to enable large-scale training, such as on the complete ImageNet dataset rather than the smaller subset used for the challenge.

The commits to revive the efforts are 206dc98 and c204fa9. I suggest that one of the BVLC members check out a feature branch devoted to this issue, because it would probably involve a long period of implementation, debugging, testing, performance benchmarking, and even some research work.

[1] Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Y. Ng and Bryan Catanzaro. Deep Learning with COTS HPC. In ICML 2013.
[2] Large-scale Deep Learning at Baidu

Make gemm fully dependent on eigen

Currently, in caffe/util/math_functions.cpp, gemm is called through cblas_gemm, which relies on ATLAS to carry out the computation. This may be suboptimal since we will not be able to use a multithreaded gemm. Could we make the boost-eigen branch use Eigen fully and remove the ATLAS dependency?

For earlier discussions please see issue #80.
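As a rough illustration of what an Eigen-backed gemm could look like for the no-transpose case (the real caffe_cpu_gemm also has to handle CblasTrans and the alpha/beta scaling for both operand layouts), mapping Caffe's raw row-major buffers without copying; Eigen parallelizes the product with OpenMP when built with it enabled:

// Sketch: single-precision C = alpha * A * B + beta * C with Eigen, mapping
// raw row-major buffers directly.
#include <Eigen/Dense>

typedef Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>
    RowMajorMatrix;

void eigen_sgemm_nn(int M, int N, int K, float alpha, const float* A,
                    const float* B, float beta, float* C) {
  Eigen::Map<const RowMajorMatrix> a(A, M, K);
  Eigen::Map<const RowMajorMatrix> b(B, K, N);
  Eigen::Map<RowMajorMatrix> c(C, M, N);
  c = alpha * (a * b) + beta * c;
}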

Consolidate network definitions

Right now, a model typically has three CaffeNet definitions for training, validation, and deployment (imagenet.prototxt, imagenet_val.prototxt, imagenet_deploy.prototxt respectively for the ImageNet example). These protobufs are full of redundancy and tweaking networks requires a lot of copy-and-paste.

Is a unified protocol buffer that describes the input/output for all of these cases together possible?

Minimum CUDA arch == compute capability 2.0?

I tried running Caffe with Nvidia GTX 470 and GTX 570 GPUs, which have compute capability 2.0. While the MNIST demo worked, it failed on the ImageNet pipeline, giving the following CUDA-related error:

...
I1209 00:40:23.426077 21877 net.cpp:142] Network initialization done.
I1209 00:40:23.426111 21877 solver.cpp:36] Solver scaffolding done.
I1209 00:40:23.426146 21877 solver.cpp:44] Solving CaffeNet
F1209 00:40:23.521303 21877 relu_layer.cu:54] Cuda kernel failed. Error: invalid configuration argument
*** Check failure stack trace: ***
@ 0x7f9113749b5d google::LogMessage::Fail()
@ 0x7f911374db77 google::LogMessage::SendToLog()
@ 0x7f911374b9f9 google::LogMessage::Flush()
@ 0x7f911374bcfd google::LogMessageFatal::~LogMessageFatal()
@ 0x444ad5 caffe::ReLULayer<>::Forward_gpu()
@ 0x42a1ba caffe::Net<>::ForwardPrefilled()
@ 0x41d513 caffe::Solver<>::Solve()
@ 0x40b46d main
@ 0x3d8a01ecdd (unknown)
@ 0x40b2c9 (unknown)

When I try on an Nvidia Titan GPU (compute capability 3.5), it works fine. So I suspect Caffe may require compute capability 3.0 or higher.

Remove Intel MKL dependency

It is mentioned in the installation instructions that this is work in progress.
While at ICCV I quickly implemented a branch where I replace the matrix operations with Eigen3 calls, and the random generators with Boost.Random generators.
I hope this is not redundant with ongoing work on private branches.

The branch can be found at
https://github.com/rodrigob/caffe

I got things to compile; however, I noticed that some tests fail (thanks for creating a non-trivial set of unit tests!).
I have not been able to compile a version with MKL to compare against, but I can only assume those tests should not fail.

Current fails are

[ FAILED ] FlattenLayerTest/1.TestCPUGradient, where TypeParam = double
[ FAILED ] StochasticPoolingLayerTest/0.TestGradientGPU, where TypeParam = float
[ FAILED ] StochasticPoolingLayerTest/1.TestGradientGPU, where TypeParam = double
[ FAILED ] MultinomialLogisticLossLayerTest/1.TestGradientCPU, where TypeParam = double

which all sound nasty (gradient computation errors in neural networks are a big no-no).

I will spend some time investigating what goes wrong there, but any suggestion/comment/idea is welcome.
