
caffe's Introduction

NVIDIA has discontinued support and maintenance for this repository. Everything is provided as-is with no further updates being accepted. Thanks for all the contributions and engagement!

Caffe

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

NVCaffe

NVIDIA Caffe (NVIDIA Corporation ©2017) is an NVIDIA-maintained fork of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations. Here are the major features:

  • 16-bit (half) floating-point training and inference support.
  • Mixed-precision support: data can be stored and/or computed in 64-, 32-, or 16-bit formats. Precision can be defined per layer (the forward and backward passes may also differ), or it can be set for the whole Net.
  • Layer-wise Adaptive Rate Control (LARC) and adaptive global gradient scaler for better accuracy, especially in 16-bit training.
  • Integration with cuDNN v8.
  • Automatic selection of the best cuDNN convolution algorithm.
  • Integration with v2.2 (or higher) of NCCL library for improved multi-GPU scaling.
  • Optimized GPU memory management for data and parameters storage, I/O buffers and workspace for convolutional layers.
  • Parallel data parser, transformer and image reader for improved I/O performance.
  • Parallel back propagation and gradient reduction on multi-GPU systems.
  • Fast solvers implementation with fused CUDA kernels for weights and history update.
  • Multi-GPU test phase for even memory load across multiple GPUs.
  • Backward compatibility with BVLC Caffe and NVCaffe 0.15 and higher.
  • Extended set of optimized models (including 16 bit floating point examples).
  • Experimental feature (no official support): multi-node training (since v0.17.1; requires NCCL 2.2 and OpenMPI 2).
  • Experimental feature (no official support): TRTLayer (since v0.17.1; can be used as an inference plugin).

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}

caffe's People

Contributors

borisfom, borisgin, cypof, dgolden1, drnikolaev, ducha-aiki, eelstork, erictzeng, flx42, jamt9000, jeffdonahue, kkhoot, kloudkl, longjon, lukeyeager, mavenlin, mohomran, nv-slayton, philkr, qipeng, rbgirshick, ronghanghu, sergeyk, sguada, shelhamer, slayton58, thatguymike, tnarihi, yangqing, yosinski


caffe's Issues

Segfault using cudnn v3

Hello,

I am having a segfault with version 0.13.1/0.13.2 and cuDNN v3 (everything works fine when disabling cuDNN or when using the latest BVLC version). The code I am using:

    Caffe::set_mode(Caffe::GPU);
    Caffe::SetDevice(0);

    Net<float> net("deploy.prototxt", caffe::TEST);
    net.CopyTrainedLayersFrom("test.caffemodel");

Here is the trace from gdb:

#0  0x00007ffff780ef27 in caffe::caffe_rng_rand() () from ./build/lib/libcaffe-nv.so.0
#1  0x00007ffff78fa210 in caffe::DataTransformer<float>::InitRand() () from ./build/lib/libcaffe-nv.so.0
#2  0x00007ffff78a9914 in caffe::BaseDataLayer<float>::LayerSetUp(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&) () from ./build/lib/libcaffe-nv.so.0
#3  0x00007ffff7918fc1 in caffe::Net<float>::Init(caffe::NetParameter const&) () from ./build/lib/libcaffe-nv.so.0
#4  0x00007ffff791a807 in caffe::Net<float>::Net(std::string const&, caffe::Phase) () from ./build/lib/libcaffe-nv.so.0
#5  0x000000000040179e in main () at test.cpp:10

Debugging a little further brought me to Caffe::rng_stream(), at this line:

Get().random_generator_.reset(new RNG());

I also get a similar segfault when I just set the mode with Caffe::set_mode(Caffe::GPU);. Here is the stack:

#0  0x00007ffff7806f05 in boost::detail::shared_count::~shared_count() () from ./build/lib/libcaffe-nv.so.0
#1  0x00007ffff7903e85 in caffe::Caffe::~Caffe() () from ./build/lib/libcaffe-nv.so.0
#2  0x00007ffff7905ed1 in boost::thread_specific_ptr<caffe::Caffe>::delete_data::operator()(void*) () from ./build/lib/libcaffe-nv.so.0
#3  0x00007fffef8b2b21 in boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) ()
   from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#4  0x00007ffff7905bee in boost::thread_specific_ptr<caffe::Caffe>::~thread_specific_ptr() () from ./build/lib/libcaffe-nv.so.0
#5  0x00007ffff6fc25ea in __cxa_finalize (d=0x7ffff7dd7c08) at cxa_finalize.c:56
#6  0x00007ffff77b0a23 in __do_global_dtors_aux () from ./build/lib/libcaffe-nv.so.0
#7  0x00007fffffffdfe0 in ?? ()
#8  0x00007ffff7dea73a in _dl_fini () at dl-fini.c:252

Thank you for your help!
Best regards,
Hugo

Can I accomplish any parallelization scheme with 4 GPUs using Caffe (NVIDIA's fork)?

I am currently looking for a working parallelized version of Caffe in order to speed up training.

I tried to use Caffe (NVIDIA's fork) by the following command:

build/tools/caffe train -gpus all -solver ......

VOC2012 images were used as train/val data, and the (softmax) loss was logged iteration by iteration.
In order to compare the performance (speed of convergence), I ran the training with 4 GPUs and with 1 GPU and plotted the loss per iteration:

[gpus_plot: loss vs. iteration for the 1-GPU and 4-GPU runs]

The red line is the loss with 1 GPU; the rest are from the 4-GPU run.

I want to know why 4 GPUs didn't help with the speed of convergence in my task.

Got some warnings. Do I need to care?

src/caffe/util/math_functions.cu(284): warning: __shared__ memory variable with non-empty constructor or destructor (potential race between threads)
          detected during instantiation of "void caffe::gpu_dot_kernel(int, const Dtype *, const Dtype *, Mtype *) [with Dtype=caffe::float16, Mtype=caffe::float16]" 
(321): here
src/caffe/util/math_functions.cu(352): warning: __shared__ memory variable with non-empty constructor or destructor (potential race between threads)
          detected during instantiation of "void caffe::gpu_asum_kernel(int, const Dtype *, Mtype *) [with Dtype=caffe::float16, Mtype=caffe::float16]" 
(387): here
src/caffe/util/math_functions.cu(284): warning: __shared__ memory variable with non-empty constructor or destructor (potential race between threads)
          detected during instantiation of "void caffe::gpu_dot_kernel(int, const Dtype *, const Dtype *, Mtype *) [with Dtype=caffe::float16, Mtype=caffe::float16]" 
(321): here
src/caffe/util/math_functions.cu(352): warning: __shared__ memory variable with non-empty constructor or destructor (potential race between threads)
          detected during instantiation of "void caffe::gpu_asum_kernel(int, const Dtype *, Mtype *) [with Dtype=caffe::float16, Mtype=caffe::float16]" 
(387): here
src/caffe/util/math_functions.cu(284): warning: __shared__ memory variable with non-empty constructor or destructor (potential race between threads)
          detected during instantiation of "void caffe::gpu_dot_kernel(int, const Dtype *, const Dtype *, Mtype *) [with Dtype=caffe::float16, Mtype=caffe::float16]" 
(321): here
src/caffe/util/math_functions.cu(352): warning: __shared__ memory variable with non-empty constructor or destructor (potential race between threads)
          detected during instantiation of "void caffe::gpu_asum_kernel(int, const Dtype *, Mtype *) [with Dtype=caffe::float16, Mtype=caffe::float16]" 
(387): here
src/caffe/util/math_functions.cu(284): warning: __shared__ memory variable with non-empty constructor or destructor (potential race between threads)
          detected during instantiation of "void caffe::gpu_dot_kernel(int, const Dtype *, const Dtype *, Mtype *) [with Dtype=caffe::float16, Mtype=caffe::float16]" 
(321): here
src/caffe/util/math_functions.cu(352): warning: __shared__ memory variable with non-empty constructor or destructor (potential race between threads)
          detected during instantiation of "void caffe::gpu_asum_kernel(int, const Dtype *, Mtype *) [with Dtype=caffe::float16, Mtype=caffe::float16]" 
(387): here

Error training with multi-GPU in 0.13

I get absolutely no errors training on a single GPU on this branch, and never have in the past. This is the first time I have tried multi-GPU training, and I am getting this error at random iterations: sometimes around iteration 8K, sometimes around iteration 20K. Is this a known issue?

This is the command I use to train:
build/tools/caffe train --solver=examples/bibsmart/bibsmart_solver.prototxt -gpus all

I1116 15:54:52.782349 30214 solver.cpp:314] Iteration 28053, Testing net (#0)
I1116 15:54:58.068253 30214 solver.cpp:363]     Test net output #0: accuracy1 = 0.951467
I1116 15:54:58.068287 30214 solver.cpp:363]     Test net output #1: loss1 = 0.139663 (* 1 = 0.139663 loss)
F1116 15:55:02.388101 30214 cudnn_conv_layer.cu:137] Check failed: error == cudaSuccess (77 vs. 0)  an illegal memory access was encountered
*** Check failure stack trace: ***
    @     0x7f2b138a6daa  (unknown)
    @     0x7f2b138a6ce4  (unknown)
    @     0x7f2b138a66e6  (unknown)
    @     0x7f2b138a9687  (unknown)
    @     0x7f2b13f9526f  caffe::CuDNNConvolutionLayer<>::Backward_gpu()
    @     0x7f2b13f66d37  caffe::Net<>::BackwardFromTo()
    @     0x7f2b13f66ea1  caffe::Net<>::Backward()
    @     0x7f2b13e86506  caffe::Solver<>::Step()
    @     0x7f2b13e86ffe  caffe::Solver<>::Solve()
    @     0x7f2b13e8fb58  caffe::P2PSync<>::run()
    @           0x407374  train()
    @           0x405311  main
    @     0x7f2b12db7ec5  (unknown)
    @           0x40593b  (unknown)
    @              (nil)  (unknown)

Add batch normalization layer

It seems NVCaffe does not have a batch normalization layer yet, although one already exists in the Caffe master branch. Can we add this update to NVCaffe?
A general question: can DIGITS use the Caffe master branch directly?

No overlapping of convolution groups

In BVLC/caffe, multiple cudnn handles are created for each convolution group and each handle is tied to a different CUDA stream:
https://github.com/BVLC/caffe/blob/master/src/caffe/layers/cudnn_conv_layer.cpp#L51-L56

When calling the forward pass, the convolution groups are launched in parallel by using different cudnn handles and thus different streams:
https://github.com/BVLC/caffe/blob/master/src/caffe/layers/cudnn_conv_layer.cu#L18-L21

In NVIDIA/caffe, branch caffe-0.14, it looks like this is no longer working, and thus this is negatively impacting performance. We are always using the same cudnn handle, so I don't see how we could have parallel kernel launches:
https://github.com/NVIDIA/caffe/blob/caffe-0.14/src/caffe/layers/cudnn_conv_layer.cu#L44

This also means that the following code is a no-op and bad for performance:
https://github.com/NVIDIA/caffe/blob/caffe-0.14/src/caffe/layers/cudnn_conv_layer.cu#L72-L75
Also, the comment is obviously false.

Performance issues with v0.13 vs. BVLC/master

From NVIDIA/DIGITS#389

My system:

  1. Cuda 7.5
  2. CudNN V3
  3. 2x Titan X's

I am transitioning from Caffe master from BVLC to Digits mainly for the ease of use.

  1. When I train the reference_caffenet model with Imagenet 1000 using Caffe master (1 Titan X), training 20 iterations take about 6sec to 6.5 seconds.
  2. I used the web installer of digits (which I believe comes with Cuda 7.0 and cudnnv3). If I start training on the same reference_caffenet model with Digits (1 Titan X), training 20 iterations takes about double the time (13 to 14 seconds). I also notice a lot of log messages like the one below:

18:29:17.241222 11321 blocking_queue.cpp:50] Data layer prefetch queue empty

I tried recompiling NVCaffe with USE_CNMEM := 0 and USE_CNMEM := 1 (as that was the only difference between NVidia's Caffe and BVLC's Caffe makefiles) but there was no difference in performance.

Any tips on how to debug this? I really want to be able to use Digits with 2 GPUs. I don't see the above problem with the Caffe master from BVLC.

Siddharth
@siddharthm83

Running tests with CMake

NOTE: Building with CMake still works fine - only the tests are affected.

As I discovered while testing #26, there is some sort of CMake bug in NVCaffe.

Sometimes, when you build with CMake and try to make runtest, you get thousands of lines of errors. The errors seem to occur intermittently - I've tried v0.12.*, v0.13.*, with/without cuDNN, with/without CNMeM, with/without parallel build (make --jobs=4), etc. I think this only happens when Caffe is using CUDA.

Unfortunately, the Caffe TravisCI build doesn't actually build the tests when using CUDA and CMake (see here and here).

How to reproduce

mkdir build
cd build
cmake ..
make all runtest

Error

Sometimes the tests build and run without a hitch. More often, CMake dumps thousands of lines of output. I'm not sure what's relevant, but here are the first 20 lines and the last 20 lines:

CMakeFiles/test.testbin.dir/test_threshold_layer.cpp.o: In function `testing::internal::scoped_ptr<std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> > >::reset(std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*) [clone .constprop.221]':
test_threshold_layer.cpp:(.text+0x10): undefined reference to `testing::internal::IsTrue(bool)'
CMakeFiles/test.testbin.dir/test_threshold_layer.cpp.o: In function `testing::internal::scoped_ptr<std::string>::reset(std::string*) [clone .constprop.222]':
test_threshold_layer.cpp:(.text+0x45): undefined reference to `testing::internal::IsTrue(bool)'
CMakeFiles/test.testbin.dir/test_threshold_layer.cpp.o: In function `testing::AssertionResult testing::internal::CmpHelperGE<double, double>(char const*, char const*, double const&, double const&) [clone .constprop.218]':
test_threshold_layer.cpp:(.text+0x1cd): undefined reference to `testing::AssertionFailure()'
test_threshold_layer.cpp:(.text+0x23c): undefined reference to `testing::internal::StringStreamToString(std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*)'
test_threshold_layer.cpp:(.text+0x2ff): undefined reference to `testing::AssertionResult::AssertionResult(testing::AssertionResult const&)'
test_threshold_layer.cpp:(.text+0x349): undefined reference to `testing::AssertionSuccess()'
test_threshold_layer.cpp:(.text+0x380): undefined reference to `testing::internal::IsTrue(bool)'
CMakeFiles/test.testbin.dir/test_threshold_layer.cpp.o: In function `testing::AssertionResult testing::internal::CmpHelperLE<double, double>(char const*, char const*, double const&, double const&) [clone .constprop.219]':
test_threshold_layer.cpp:(.text+0x4bd): undefined reference to `testing::AssertionFailure()'
test_threshold_layer.cpp:(.text+0x52c): undefined reference to `testing::internal::StringStreamToString(std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*)'
test_threshold_layer.cpp:(.text+0x5ef): undefined reference to `testing::AssertionResult::AssertionResult(testing::AssertionResult const&)'
test_threshold_layer.cpp:(.text+0x639): undefined reference to `testing::AssertionSuccess()'
test_threshold_layer.cpp:(.text+0x670): undefined reference to `testing::internal::IsTrue(bool)'
CMakeFiles/test.testbin.dir/test_threshold_layer.cpp.o: In function `testing::AssertionResult testing::internal::CmpHelperLE<float, double>(char const*, char const*, float const&, double const&) [clone .constprop.216]':
test_threshold_layer.cpp:(.text+0x828): undefined reference to `testing::AssertionFailure()'
test_threshold_layer.cpp:(.text+0x90d): undefined reference to `testing::AssertionResult::AssertionResult(testing::AssertionResult const&)'
test_threshold_layer.cpp:(.text+0x951): undefined reference to `testing::AssertionSuccess()'

...

CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe27IOTest_TestDecodeDatum_TestE[_ZTVN5caffe27IOTest_TestDecodeDatum_TestE]+0x28): undefined reference to `testing::Test::TearDown()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe34IOTest_TestDecodeDatumToCVMat_TestE[_ZTVN5caffe34IOTest_TestDecodeDatumToCVMat_TestE]+0x20): undefined reference to `testing::Test::SetUp()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe34IOTest_TestDecodeDatumToCVMat_TestE[_ZTVN5caffe34IOTest_TestDecodeDatumToCVMat_TestE]+0x28): undefined reference to `testing::Test::TearDown()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe41IOTest_TestDecodeDatumToCVMatContent_TestE[_ZTVN5caffe41IOTest_TestDecodeDatumToCVMatContent_TestE]+0x20): undefined reference to `testing::Test::SetUp()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe41IOTest_TestDecodeDatumToCVMatContent_TestE[_ZTVN5caffe41IOTest_TestDecodeDatumToCVMatContent_TestE]+0x28): undefined reference to `testing::Test::TearDown()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe33IOTest_TestDecodeDatumNative_TestE[_ZTVN5caffe33IOTest_TestDecodeDatumNative_TestE]+0x20): undefined reference to `testing::Test::SetUp()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe33IOTest_TestDecodeDatumNative_TestE[_ZTVN5caffe33IOTest_TestDecodeDatumNative_TestE]+0x28): undefined reference to `testing::Test::TearDown()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe40IOTest_TestDecodeDatumToCVMatNative_TestE[_ZTVN5caffe40IOTest_TestDecodeDatumToCVMatNative_TestE]+0x20): undefined reference to `testing::Test::SetUp()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe40IOTest_TestDecodeDatumToCVMatNative_TestE[_ZTVN5caffe40IOTest_TestDecodeDatumToCVMatNative_TestE]+0x28): undefined reference to `testing::Test::TearDown()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe37IOTest_TestDecodeDatumNativeGray_TestE[_ZTVN5caffe37IOTest_TestDecodeDatumNativeGray_TestE]+0x20): undefined reference to `testing::Test::SetUp()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe37IOTest_TestDecodeDatumNativeGray_TestE[_ZTVN5caffe37IOTest_TestDecodeDatumNativeGray_TestE]+0x28): undefined reference to `testing::Test::TearDown()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe44IOTest_TestDecodeDatumToCVMatNativeGray_TestE[_ZTVN5caffe44IOTest_TestDecodeDatumToCVMatNativeGray_TestE]+0x20): undefined reference to `testing::Test::SetUp()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe44IOTest_TestDecodeDatumToCVMatNativeGray_TestE[_ZTVN5caffe44IOTest_TestDecodeDatumToCVMatNativeGray_TestE]+0x28): undefined reference to `testing::Test::TearDown()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe47IOTest_TestDecodeDatumToCVMatContentNative_TestE[_ZTVN5caffe47IOTest_TestDecodeDatumToCVMatContentNative_TestE]+0x20): undefined reference to `testing::Test::SetUp()'
CMakeFiles/test.testbin.dir/test_io.cpp.o:(.data.rel.ro._ZTVN5caffe47IOTest_TestDecodeDatumToCVMatContentNative_TestE[_ZTVN5caffe47IOTest_TestDecodeDatumToCVMatContentNative_TestE]+0x28): undefined reference to `testing::Test::TearDown()'
collect2: error: ld returned 1 exit status
make[3]: *** [test/caffe-nv] Error 1
make[2]: *** [src/caffe/test/CMakeFiles/test.testbin.dir/all] Error 2
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
make: *** [runtest] Error 2

.build_release/test/test_all.testbin: error while loading shared libraries: libhdf5.so.10: cannot open shared object file: No such file or directory

Hi,

I'm running on Ubuntu 14.04 LTS.

I am able to build Caffe, however, I get an error invoking 'make runtest':

ubuntu@ip-10-0-1-51:~/caffe$ make runtest
.build_release/tools/caffe
caffe: command line brew
usage: caffe <command> <args>

commands:
  train           train or finetune a model
  test            score a model
  device_query    show GPU diagnostic information
  time            benchmark model execution time

  Flags from /home/ubuntu/caffe/tools/caffe.cpp:
    -gpu (Run in GPU mode on given device ID.) type: int32 default: -1
    -iterations (The number of iterations to run.) type: int32 default: 50
    -model (The model definition protocol buffer text file..) type: string
      default: ""
    -snapshot (Optional; the snapshot solver state to resume training.)
      type: string default: ""
    -solver (The solver definition protocol buffer text file.) type: string
      default: ""
    -weights (Optional; the pretrained weights to initialize finetuning. Cannot
      be set simultaneously with snapshot.) type: string default: ""
.build_release/test/test_all.testbin 0 --gtest_shuffle 
.build_release/test/test_all.testbin: error while loading shared libraries: libhdf5.so.10: cannot open shared object file: No such file or directory
make: *** [runtest] Error 127
ubuntu@ip-10-0-1-51:~/caffe$ 

This appears to be an issue in the BVLC thread as well:

BVLC#1463
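This is a runtime loader problem rather than a compile problem: the test binary was linked against libhdf5.so.10, but the dynamic loader cannot find it when the tests run. A minimal sketch of the usual workaround, assuming the library exists somewhere on the system (the paths below are placeholders, not taken from this report):

# Locate the library (it often lives in an Anaconda or HDF5-specific directory).
find / -name 'libhdf5.so.10*' 2>/dev/null

# Either export its directory for the current shell...
export LD_LIBRARY_PATH=/path/to/hdf5/lib:$LD_LIBRARY_PATH

# ...or register it system-wide and refresh the loader cache.
echo '/path/to/hdf5/lib' | sudo tee /etc/ld.so.conf.d/hdf5.conf
sudo ldconfig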

CUDNN detection at runtime

I develop on different machines with different compute capabilities.
Would it be possible to switch to a run-time detection scheme instead of the compile-time detection that is implemented right now? Currently I always have to compile two versions, which is quite cumbersome. What do you think?

caffe 0.13 will not compile with CuDNN v3 and CUDA 7

I have been having this problem for about a week now and have tried everything I did before to install Caffe on other systems, and nothing has worked thus far. The issue is that Caffe compiles without cuDNN, but it will not compile with cuDNN enabled.

I have already performed these steps, and still no change:

  1. copy CuDNN files to cuda folders
    libcudnn.so.7.0.64 and libcudnn_static.a to /usr/local/cuda/lib64
    cudnn.h to /usr/local/cuda/include
  2. uncommented USE_CUDNN := 1 in caffe makefile config

Here is the full dialog: http://pastebin.com/y68HnhR4

CXX .build_release/src/caffe/proto/caffe.pb.cc
AR -o .build_release/lib/libcaffe-nv.a
LD -o .build_release/lib/libcaffe-nv.so.0.13.2
/usr/bin/ld: cannot find -lcudnn
collect2: error: ld returned 1 exit status
make: *** [.build_release/lib/libcaffe-nv.so.0.13.2] Error 1

I keep getting that same error over and over and am unable to fix it.

Final note: I have verified that all the files are present in the stated paths. I also disabled cuDNN in the Makefile config and compiled successfully, and Caffe works with a trained model without cuDNN; even import caffe works from Python.
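A likely cause, though it is not confirmed in this thread: -lcudnn makes the linker look for a file literally named libcudnn.so (or libcudnn.a), while the copy step above only provides libcudnn.so.7.0.64 and libcudnn_static.a. A minimal sketch of the fix under that assumption:

# Give the linker the unversioned name it searches for, then refresh the loader cache.
cd /usr/local/cuda/lib64
sudo ln -s libcudnn.so.7.0.64 libcudnn.so
sudo ldconfig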

CMake build issue with v0.14.0-rc.1

Hi,

I faced an issue while building v0.14.0-rc.1 using CMake with the following error messages:

[ 77%] Linking CXX shared library ../../lib/libcaffe-nv.so
/usr/bin/ld: cannot find -lcnmem
collect2: ld returned 1 exit status
make[2]: *** [lib/libcaffe-nv.so.0.14.0-rc.1] Error 1
make[1]: *** [src/caffe/CMakeFiles/caffe.dir/all] Error 2
make: *** [all] Error 2

Note: I was passing the following options to the cmake command, based on this instruction:

-DUSE_CNMEM=True -DCNMEM_INCLUDE=$CNMEM_HOME/include -DCNMEM_LIBRARY=$CNMEM_HOME/build/libcnmem.so

The workaround that I used to avoid this issue was to remove -lcnmem from this line, i.e.:
Before:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_CNMEM -lcnmem")

After:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_CNMEM")

ACS may cause P2P bandwidth problem

If lower than expected performance is observed when executing a training and DIGITS has been configured to use multiple GPUs, verify that PCI Express Access Control Services (ACS) are disabled.

NVIDIA recommends that the system BIOS (SBIOS) disables ACS to ensure maximum P2P bandwidth between GPUs. The SBIOS should leave the ACS capability exposed but disabled on switch downstream ports and root ports so that ACS-aware OS and Hypervisors can choose to enable ACS when required.

Please verify with your motherboard manufacturer that the SBIOS correctly disables ACS, and if this is not the case whether an updated SBIOS is available.

If an SBIOS that correctly disables ACS is not yet available from your motherboard manufacturer, you can attempt to disable ACS programmatically by running the following script, which uses the Linux lspci utility. Note that this script must be run after every system boot or system reset.

#!/usr/bin/env bash
# For every PLX PCIe switch port (vendor ID 10b5), clear the ACS control
# register so that peer-to-peer traffic is not redirected through the root port.
for i in $(lspci -d "10b5:" | awk '{print $1}') ; do
    # Does this device expose an ACS control capability?
    o=$(lspci -vvv -s $i | grep ACSCtl)
    if [ $? -eq 0 ] ; then
        # Only touch devices that currently have ACS features enabled ("+").
        echo $o | grep "+"
        if [ $? -eq 0 ] ; then
            setpci -s $i f2a.w=0000
        fi
    fi
done

make runtest failure (0.14.0-rc.3 and CuDNN 4)

[  FAILED  ] 1 test, listed below:
[  FAILED  ] CuDNNBatchNormLayerTest/0.TestGradient, where TypeParam = float

I've done make all, make test, make pycaffe and make distribute and there were no errors reported. Caffe also imports fine under Python. However, make runtest has one test failure.

Goal and Update of nvidia/caffe

Hi,

I have 2 questions:

  • What is the goal of this fork? (Is it to provide a stable version of BVLC/caffe?)
  • If yes, what is your merge strategy? What I mean is: how do you choose when to merge master into this fork? (It is currently ~3 months behind.)

Thanks,

This fork's data read is very slow

I was comparing this fork against the original one with "caffe time" on the CIFAR example. It seems this fork is faster with cuDNN v3. However, when it comes to using real data, it takes minutes to see any output in the log, whereas the original Caffe is very fast.

Transpose order needs to have the same number of dimensions as input

This error (explained in the title) keeps getting thrown when using the example scripts "Example.py" and "Use_Archive.py" contained in the DIGITS directory. I think it could be a bug in the underlying Caffe system, but I'm not sure, so I'm flagging it on the DIGITS repository as well, just in case.

Anyways, here's a copy of the stack trace that gets thrown on error:

dev@SaveHive:~/DIGITS/examples/classification$ python use_archive.py BeastNet_E200.gz 01.jpg
Unknown file: solver.prototxt
Unknown file: train_val.prototxt
Traceback (most recent call last):
  File "use_archive.py", line 94, in <module>
    classify_with_archive(args['archive'], image_files, not args['nogpu'])
  File "use_archive.py", line 71, in classify_with_archive
    mean_file=mean_file, labels_file=labels_file, use_gpu=use_gpu)
  File "/home/dev/DIGITS/examples/classification/example.py", line 187, in classify
    transformer = get_transformer(deploy_file, mean_file)
  File "/home/dev/DIGITS/examples/classification/example.py", line 62, in get_transformer
    t.set_transpose('data', (2,0,1)) # transpose to (channels, height, width)
  File "/home/dev/caffe/python/caffe/io.py", line 195, in set_transpose
    raise Exception('Transpose order needs to have the same number of '
Exception: Transpose order needs to have the same number of dimensions as the input.

Any and all help appreciated as per usual!

Multi GPU on AWS g2.8x large

Hello,
I am unable to run this fork in multi-GPU mode when using an AWS g2.8xlarge instance.

When executing the actual training with the '-gpu all' flag, I see errors relating to CUDA peer communication. After doing some research on this issue, it appears that CUDA peer-to-peer communication is disabled on AWS g2.8xlarge.

Is it possible/practical to disable CUDA peer communication so that we can utilize multi-GPU training on AWS?

Thank you for any help!

Check failed: proto.SerializeToOstream(&output)

Hello, I'm a newbie in the vision area. I'd like to do image training with Caffe (using my own data), but when I try to run the make_imagenet_mean.sh file I get an error. Would you kindly help me?

this is the error message

F0703 09:53:03.039291 3732 io.cpp:67] Check failed: proto.SerializeToOstream(&output)
*** Check failure stack trace: ***
@ 0x7f114c9a6daa (unknown)
@ 0x7f114c9a6ce4 (unknown)
@ 0x7f114c9a66e6 (unknown)
@ 0x7f114c9a9687 (unknown)
@ 0x7f114cd1248c caffe::WriteProtoToBinaryFile()
@ 0x402088 main
@ 0x7f114bbb6ec5 (unknown)
@ 0x40240a (unknown)
@ (nil) (unknown)
Aborted (core dumped)

Done.

I am referring to this:
https://github.com/BVLC/caffe/tree/master/examples/imagenet

and I modified ~/caffe/data/ilsvrc12/train.txt and val.txt like this:

train.txt

n01440764/HYIcicleM.jpg 0
n01440764/HYIcicleM1.jpg 0
n01440764/HYIcicleM2.jpg 0
n01440764/HYIcicleM3.jpg 0
n01440764/HYIcicleM4.jpg 0
n01440764/HYIcicleM5.jpg 0

...

val.txt

HYIcicleM.jpg 0
HYIcicleM1.jpg 0
HY4OclockB.jpg 1
HY4OclockB1.jpg 1
HY4OclockB2.jpg 1
HY4OclockB(copy)9.jpg 2
HY4OclockB(copy)10.jpg 2

...

thank you!

Can you run detections in multi-gpu mode?

I have a trained caffemodel file and am using it to detect an object in an image. I've been running it in single-GPU mode for months without problems, but I recently installed a new GPU and set it up with SLI. I am wondering how to set detection to multi-GPU in Python.

I have seen these functions while searching for a solution, but I have seen nothing that states how to use caffe.set_device() to set multiple GPUs. Is there a way to do that?
caffe.set_device(0)
caffe.set_mode_gpu()
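As far as I can tell, the -gpus flag in this fork only parallelizes training, and a pycaffe process drives a single device chosen with caffe.set_device(). A hedged sketch of a workaround is to run one detection process per GPU (the script name and arguments below are hypothetical):

# Each process sees exactly one GPU, so caffe.set_device(0) inside the script
# refers to a different physical card in each process.
CUDA_VISIBLE_DEVICES=0 python detect.py --image-list part0.txt &
CUDA_VISIBLE_DEVICES=1 python detect.py --image-list part1.txt &
wait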

CUDA driver error bug with SLI only remedied by running CUDA Samples deviceQuery?

I am consistently getting this error when trying to run a caffe detection from a cold boot:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F1101 13:01:53.909546  4274 common.cpp:156] Check failed: error == cudaSuccess (30 vs. 0)  unknown error
*** Check failure stack trace: ***
Aborted (core dumped)

Previously I spent almost 3 days debugging this issue and thought it was something with my drivers; however, everything is configured correctly, and even the Caffe runtest passes all tests. My configuration is:
Caffe 0.13
CUDA 7.5
Video Card Driver: 355.11
Ubuntu 14.04
CuDNN v3

This issue, however, gets resolved by simply running the CUDA samples deviceQuery once and having it pass all tests. I simply navigate to:
NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery/
and run
sudo ./deviceQuery

Another thing to note is that caffe device_query and make runtest don't work unless I run the CUDA samples deviceQuery at least once. This is weird, and I keep having to rerun it every time I run my code. Does anyone have any idea why, or is this a bug?
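This behaviour is consistent with the /dev/nvidia* device nodes not being created until some CUDA program runs with root privileges (which is what sudo ./deviceQuery does). One commonly suggested boot-time workaround, offered here as an assumption rather than something verified in this thread:

# Load the NVIDIA kernel modules and create the /dev/nvidia* device files at boot,
# so the first CUDA call from an unprivileged process does not fail.
sudo nvidia-modprobe -u -c=0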

Got an error when compiling caffe-experimental-fp16 on TX1

What is cub/cub/cub.cuh?

/home/lijiajun/caffe-experimental-fp16/src/caffe/layers/softmax_loss_layer.cu:9:27: fatal error: cub/cub/cub.cuh: No such file or directory
#include "cub/cub/cub.cuh"
^
compilation terminated.
CMake Error at cuda_compile_generated_softmax_loss_layer.cu.o.cmake:206 (message):
Error generating
/home/lijiajun/caffe-experimental-fp16/build/src/caffe/CMakeFiles/cuda_compile.dir/layers/./cuda_compile_generated_softmax_loss_layer.cu.o
make[2]: *** [src/caffe/CMakeFiles/cuda_compile.dir/layers/./cuda_compile_generated_softmax_loss_layer.cu.o] Error 1
make[1]: *** [src/caffe/CMakeFiles/caffe.dir/all] Error 2
make: *** [all] Error 2
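The experimental fp16 branch includes "cub/cub/cub.cuh", i.e. it expects NVIDIA's CUB headers to be checked out inside the source tree. A minimal sketch of one way to satisfy that include, assuming the repository root is on the compiler's include path:

# From the caffe-experimental-fp16 source root: fetch CUB so that
# cub/cub/cub.cuh resolves, then rebuild.
git clone https://github.com/NVlabs/cub.git cub
cd build && cmake .. && make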

Query version

This caffe fork provides versioning of caffe (since BVLC is unwilling to deal with the overhead - BVLC#486) by outputting the libcaffe.so file as libcaffe-nv.so.0.11.0 (or whatever the current version is).

In DIGITS, we are currently querying the version by checking the output of ldd. That's not a platform-independent solution, and it's pretty hacky in general.
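For reference, the ldd-based hack looks roughly like the following (which binary DIGITS actually inspects is an assumption here):

# Infer the NVCaffe version from the soname of the library the caffe binary links against.
ldd build/tools/caffe | grep libcaffe-nv
#   libcaffe-nv.so.0.11.0 => /usr/local/lib/libcaffe-nv.so.0.11.0 (0x00007f...)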

Currently there is a command-line flag --version which does this:

$ build/tools/caffe --version
caffe

That's not very helpful. Here are some examples of the type of thing it should return:

$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ruby --version
ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]

$ pip --version
pip 1.5.4 from /usr/lib/python2.7/dist-packages (python 2.7)

$ python --version
Python 2.7.6

The NVIDIA/caffe fork should output something like this:

$ caffe --version
Caffe 0.11.0 (NVIDIA)

cuDNN v4.0 BatchNormalization outputs wrong data in [Test phase]

cuDNN v4.0 BatchNormalization outputs wrong data in [Test phase].
It works OK in [Train phase].

That means

CUDNN_CHECK(cudnnBatchNormalizationForwardInference(
      Caffe::cudnn_handle(),
      mode_,
      cudnn::dataType<Dtype>::one,
      cudnn::dataType<Dtype>::zero,
      bottom_desc_,
      bottom_data,
      bottom_desc_,
      top_data,
      scale_bias_mean_var_desc_,
      scale_data,
      bias_data,
      this->blobs_[3]->gpu_data(),  // mean
      this->blobs_[4]->gpu_data(),  // variance
      epsilon_));

returns wrong values.

We use

CUDNN_CHECK(cudnnBatchNormalizationForwardTraining(
      Caffe::cudnn_handle(),
      mode_,
      cudnn::dataType<Dtype>::one,
      cudnn::dataType<Dtype>::zero,
      bottom_desc_,
      bottom_data,
      bottom_desc_,
      top_data,
      scale_bias_mean_var_desc_,
      scale_data,
      bias_data,
      1,
      this->blobs_[3]->mutable_gpu_data(),  // mean
      this->blobs_[4]->mutable_gpu_data(),  // variance
      epsilon_,
      save_mean,
      save_inv_var));

instead, and that works fine.

Version >=0.13.1 doesn't work with digits-server

I’m going crazy trying to track down this bug. I’ve been trying to find something to go on, but I don’t have much. I can’t find any error messages anywhere.

I’ve been noticing that the DIGITS “production” server (digits-server) crashes sometimes, but the “development” server (digits-devserver) never does.

Caffe v0.12.2 is fine. So is v0.13.0. But v0.13.1 and v0.13.2 crash the production server. There’s something in these changes that doesn’t play nice with the production server.

| Version | cuDNN | CNMeM | digits-devserver | digits-server | Crash time    | Last message |
|---------|-------|-------|------------------|---------------|---------------|--------------|
| 0.12.2  | v2    |       | OK               | OK            |               |              |
| 0.13.0  | v3    |       | OK               | OK            |               |              |
| 0.13.0  |       |       | OK               | OK            |               |              |
| 0.13.1  | v3    | 1.0.0 | OK               | CRASH         | net.forward() | None         |
| 0.13.1  | v3    |       | OK               | CRASH         | caffe.Net()   | cudnn_conv_layer.cpp:256] Reallocating workspace storage: 100 |
| 0.13.1  |       | 1.0.0 | OK               | CRASH         | net.forward() | None         |
| 0.13.1  |       |       | OK               | CRASH         | net.forward() | None         |

The production server uses gunicorn for the webserver framework, and the development server uses Flask. I’m looking into what the differences could be (path setup, memory usage, environment variables, etc.) but I haven’t come up with anything so far. Any ideas about what I should look for?

Things I’ve investigated:

  • Make vs. CMake
    • makes no difference (no pun intended)
  • Out-of-memory
    • I’m using LeNet on a 6GB card. Should be no problem.
    • Plus, there’s no out-of-memory errors
  • Timeout
    • Nope. When it works, this finishes in ~0.002 seconds. And when it fails, it fails pretty much instantly, too.

/cc @slayton @drnikolaev

Batch Normalization in Caffe?

Hello guys,

I was wondering if there is a batch normalization layer available for Caffe. I looked at the documentation and couldn't find anything. On Google I found some custom-made versions of it, but I haven't used git for long, so I don't know whether adding those would mess up my installation. Does anyone have any hints?

Training stops at iteration 0 with no error message or probable cause?

Here is the output of the training command: http://pastebin.com/ATPCBjQd

Here is the net I'm using to train it: http://pastebin.com/H1gLW8Lv
I recently upgraded from an older version of caffe, and had to change some of the parameter names, here is the previous net: http://pastebin.com/43Utkkpe

You will notice the transform_param had to be removed from the HDF5 data layer because apparently that layer doesn't support it in 0.13. Can you check to see if you find any issues? Caffe passes all the make runtest tests fine, and it performs image detections fine with previously trained caffemodel files. I did notice my video card remains at 0% usage even after the training kicks off. I am not sure if it has to do with the video card not being set, but I know it is being used, because the runtest used the GPU and the Makefile config is set to use the GPU.

Software:
Caffe v0.13
CUDA 7.5
NVIDIA 352 driver
CuDNN v3

MultiGPU Bug?

Hello,

It has come to my attention that multi-GPU Caffe on this branch gets stuck in a deadlock after a few hundred iterations. The sign that the deadlock is starting is
"Waiting for data" & "Data layer prefetch queue empty"

I believe the GPUs are much faster than data fetching on my machine, which leads to this issue, but it shouldn't deadlock.

Does anyone have good suggestions to solve this issue? Thank you.

Out-of-memory on g2.8xlarge

See NVIDIA/DIGITS#310.

/cc @ajsander

I've trained a couple of models (Alexnet and GoogleNet) using DIGITS successfully, with statistics shown for test and validation accuracy, but when I try to classify a single image using the web interface I get the following error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0915 14:10:45.809661 98789 common.cpp:266] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***

When I check nvidia-smi, it appears that the amount of memory used increases by around 100 MB, but it's still nowhere near the card's full 3 GB capacity.
NVIDIA/DIGITS#310 (comment)

Here is some information about his system:

Running on an Amazon g2.8xlarge
GPU[s]: 4x GRID K520
CUDA 7.0
cuDNN 7.0
Caffe version 0.12 NVIDIA fork
DIGITS 2.1

Both Alexnet and GoogleNet experienced the same problem
NVIDIA/DIGITS#310 (comment)

Here's how I reproduced it:

  1. Start up a Ubuntu 14.04 on g2.8xlarge EC2 instance
  2. Install the 346 driver
  3. Installed DIGITS 2.0 and Caffe 0.13.1 (with CNMeM) using the web installer
  4. Create a small dataset of 256x256 images
  5. Train AlexNet on it
  6. Try to classify an image

The big question

Why would we run out of memory during inference but not while training?

Model of parallelism?

What model of parallelism does the NVIDIA fork use? Is there any documentation or paper about it?
How much speed improvement can we expect from the "-gpus all" switch with 2 and 4 GPUs?

Build nvidia caffe error on centOS 7

Hello,

I am trying to install NVIDIA Caffe 0.13 on CentOS 7; however, I get the following error when running make all:

/bin/ld: cannot find -lcblas
/bin/ld: cannot find -latlas
collect2: error: ld returned 1 exit status

In Makefile.config, I've changed the settings to:

BLAS:=atlas
BLAS_INCLUDE:=/usr/include/atlas
BLAS_LIB:=/usr/lib64/atlas

and I have checked that the ATLAS-related files exist (package atlas-devel-3.10.1-7.el7.x86_64).

Any suggestion?

Thanks!
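A likely explanation, offered as an assumption rather than a confirmed diagnosis: the ATLAS 3.10 packages on CentOS 7 no longer ship separate libcblas/libatlas libraries; they provide combined libsatlas/libtatlas libraries instead, so -lcblas and -latlas cannot be resolved. A minimal sketch of a workaround under that assumption:

# See what atlas-devel actually installed (typically libsatlas.so.3 / libtatlas.so.3).
ls /usr/lib64/atlas/

# Provide the library names that the Caffe Makefile links against.
sudo ln -s /usr/lib64/atlas/libsatlas.so.3 /usr/lib64/atlas/libcblas.so
sudo ln -s /usr/lib64/atlas/libsatlas.so.3 /usr/lib64/atlas/libatlas.so
sudo ldconfig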

Benchmarking mode does not initialize CNMEM

If you use the caffe-0.14 branch and enable cuDNN, CNMEM will be enabled automatically. If you train a model using "caffe.bin train", CNMEM will be initialized and allocation calls should have virtually zero cost. CNMEM is initialized here:
https://github.com/NVIDIA/caffe/blob/caffe-0.14/tools/caffe.cpp#L200

Unfortunately, if you want to benchmark this version of Caffe using "caffe.bin time", CNMEM is not initialized; it should be done here:
https://github.com/NVIDIA/caffe/blob/caffe-0.14/tools/caffe.cpp#L316-L317
Since the memory pool is not initialized, any allocation call will fall back to cudaMalloc:
https://github.com/NVIDIA/caffe/blob/caffe-0.14/src/caffe/util/gpu_memory.cpp#L65-L66

The problem is that the new convolution code attempts to do many allocations/deallocations:
https://github.com/NVIDIA/caffe/blob/caffe-0.14/src/caffe/layers/cudnn_conv_layer.cu#L34
If we hit the cudaMalloc fallback, we do 1 cudaMalloc and 1 cudaFree for each group and each convolution layer, for each forward pass.

The original caffe code will be faster using "caffe.bin time" because it allocates the workspace only once:
https://github.com/BVLC/caffe/blob/master/src/caffe/layers/cudnn_conv_layer.cpp#L200
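For reference, the benchmarking path in question is exercised with a command along these lines (the model path is only an example, not taken from this report):

# Timed forward/backward passes; on caffe-0.14 these currently hit the
# cudaMalloc fallback described above because CNMEM is never initialized.
build/tools/caffe time -model models/bvlc_alexnet/deploy.prototxt -gpu 0 -iterations 50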

caffe OS X build error on `NVCC src/caffe/layers/cudnn_conv_layer.cu`

Hi!

I'm having trouble building Caffe on OS X 10.11 and am getting this error (output below).
All other files compile fine; I ran make -k, and it got through to the end with no further errors.
I have CUDA 7.5, cuDNN 4, and
all dependencies freshly installed with brew, among them protobuf 3 and boost 1.59.

What may be wrong?

NVCC src/caffe/layers/cudnn_conv_layer.cu
src/caffe/layers/cudnn_conv_layer.cu(45): error: argument of type "cudnnAddMode_t" is incompatible with parameter of type "const void *"
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]" 
(141): here

src/caffe/layers/cudnn_conv_layer.cu(45): error: argument of type "const void *" is incompatible with parameter of type "cudnnTensorDescriptor_t"
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]" 
(141): here

src/caffe/layers/cudnn_conv_layer.cu(45): error: argument of type "const void *" is incompatible with parameter of type "cudnnTensorDescriptor_t"
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]" 
(141): here

src/caffe/layers/cudnn_conv_layer.cu(45): error: too many arguments in function call
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]" 
(141): here

src/caffe/layers/cudnn_conv_layer.cu(45): error: argument of type "cudnnAddMode_t" is incompatible with parameter of type "const void *"
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]" 
(141): here

src/caffe/layers/cudnn_conv_layer.cu(45): error: argument of type "const void *" is incompatible with parameter of type "cudnnTensorDescriptor_t"
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]" 
(141): here

src/caffe/layers/cudnn_conv_layer.cu(45): error: argument of type "const void *" is incompatible with parameter of type "cudnnTensorDescriptor_t"
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]" 
(141): here

src/caffe/layers/cudnn_conv_layer.cu(45): error: too many arguments in function call
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]" 
(141): here

8 errors detected in the compilation of "/var/folders/lc/3stgmw4n0yz4y27jj02k6g8r0000gn/T//tmpxft_0000bf2d_00000000-16_cudnn_conv_layer.compute_50.cpp1.ii".
make: *** [.build_release/cuda/src/caffe/layers/cudnn_conv_layer.o] Error 1

Install error with caffe-0.13 cudnn3

Hi,
I am trying to install caffe-0.13 in order to test performance on Kepler K20 and K40 machines.
I am able to install and run BVLC/caffe successfully; however, nvidia/caffe-0.13 fails.

Error information:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/usr/bin/ld: warning: libcudart.so.6.5, needed by /usr/local/lib/libopencv_core.so, may conflict with libcudart.so.7.0
.build_release/lib/libcaffe.so: undefined reference to `cudnnSetLRNDescriptor'
.build_release/lib/libcaffe.so: undefined reference to `cudnnGetConvolutionBackwardDataWorkspaceSize'
.build_release/lib/libcaffe.so: undefined reference to `cudnnConvolutionBackwardData_v3'
.build_release/lib/libcaffe.so: undefined reference to `cudnnGetConvolutionBackwardFilterWorkspaceSize'
.build_release/lib/libcaffe.so: undefined reference to `cudnnLRNCrossChannelBackward'
.build_release/lib/libcaffe.so: undefined reference to `cudnnCreateLRNDescriptor'
.build_release/lib/libcaffe.so: undefined reference to `cudnnGetConvolutionBackwardFilterAlgorithm'
.build_release/lib/libcaffe.so: undefined reference to `cudnnLRNCrossChannelForward'
.build_release/lib/libcaffe.so: undefined reference to `cudnnDestroyLRNDescriptor'
.build_release/lib/libcaffe.so: undefined reference to `cudnnConvolutionBackwardFilter_v3'
.build_release/lib/libcaffe.so: undefined reference to `cudnnGetConvolutionBackwardDataAlgorithm'
.build_release/lib/libcaffe.so: undefined reference to `cudnnDivisiveNormalizationBackward'
.build_release/lib/libcaffe.so: undefined reference to `cudnnDivisiveNormalizationForward'
collect2: error: ld returned 1 exit status

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

It seems that libcaffe.so tries to use cuDNN v2 and CUDA 6.5.
However, I've already installed CUDA 7.0 and set up the environment for both CUDA 7.0 and cuDNN v3.
By the way, does anyone have performance data for K20/K40?
I'd like to know whether cuDNN v3 gives a significant speedup on these two platforms.

Thanks,
Xiuxia
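The missing symbols (cudnnConvolutionBackwardData_v3, cudnnLRNCrossChannelForward, and so on) first appeared in cuDNN v3, so the linker is most likely picking up an older cuDNN v2 copy, and the libcudart.so.6.5 warning suggests OpenCV was built against CUDA 6.5. A hedged sketch of how to confirm and clean this up (paths are placeholders, not from this report):

# List every cuDNN copy the toolchain might find; remove or move aside any v2 copies.
sudo find /usr -name 'libcudnn*' 2>/dev/null

# Make the cuDNN v3 headers and libraries win at compile, link, and run time.
export CPATH=/path/to/cudnn-v3/include:$CPATH
export LIBRARY_PATH=/path/to/cudnn-v3/lib64:$LIBRARY_PATH
export LD_LIBRARY_PATH=/path/to/cudnn-v3/lib64:$LD_LIBRARY_PATH
sudo ldconfig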

What is this queue for?

Hello,

Can anyone kindly explain what this queue_ is for? I'm implementing a new solver for this multi-GPU Caffe. Please help, thank you.
BlockingQueue<P2PSync*> queue_;

The network crashes when loading with the C++ interface

When I try to load a trained network with the C++ interface, the network crashes without an error message. The network itself should be OK, because it is the default Alexnet.

I had it running for some time. I will check tomorrow whether the stock Caffe version works.

//---------------use in code--------------
caffe::shared_ptr<caffe::Net<float> > net_ = caffe::shared_ptr<caffe::Net<float> >(new Net<float>(model_file, Phase::TEST));

//------------------output-----------------------------------

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1207 16:46:13.002768 14518 net.cpp:47] Initializing net from parameters: 
input: "data"
input_dim: 1
input_dim: 3
input_dim: 227
input_dim: 227
state {
  phase: TEST
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 7
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
I1207 16:46:13.004276 14518 net.cpp:411] Input 0 -> data
I1207 16:46:13.004315 14518 layer_factory.hpp:75] Creating layer conv1
I1207 16:46:13.004336 14518 net.cpp:99] Creating Layer conv1
I1207 16:46:13.004344 14518 net.cpp:453] conv1 <- data
I1207 16:46:13.004356 14518 net.cpp:409] conv1 -> conv1
I1207 16:46:13.004374 14518 net.cpp:131] Setting up conv1

How does the NVCaffe multi-GPU solver work?

Hello,

I saw that each GPU is assigned a solver; however, I'm not sure how you average the gradients from multiple GPUs. Is it that at each iteration every GPU fetches batch_size/num_GPUs images and computes the gradients, then the root solver retrieves the other GPUs' gradients and averages them together, and then it moves on to the next iteration? Is that correct? Thank you.

Plan to merge to master?

Hi, I noticed that NVIDIA's version of Caffe now supports multiple GPUs, but the master version does not yet. Is there any plan to merge this version into master?

Compile error for cblas

When I compile Caffe, I get this error:

make
CXX/LD -o .build_release/tools/compute_image_mean.bin
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_sgemv'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_dgemm'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_sscal'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_dgemv'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_saxpy'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_ddot'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_dasum'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_sgemm'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_dscal'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_scopy'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_sasum'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_daxpy'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_dcopy'
.build_release/lib/libcaffe-nv.so: undefined reference to `cblas_sdot'
collect2: error: ld returned 1 exit status
make: *** [.build_release/tools/compute_image_mean.bin] Error 1

But when I build this way:

mkdir build
cd build
cmake ..
make all
make runtest

it succeeds. Why?

cuDNN 3 + Caffe = Error??

I was compiling a new version of 0.12 caffe with cuDNN 3 + CUDA 7.5 and I got the following error.

make: *** [.build_release/tools/upgrade_net_proto_binary.bin] Error 1
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnSetLRNDescriptor'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnGetConvolutionBackwardDataWorkspaceSize'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnConvolutionBackwardData_v3'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnGetConvolutionBackwardFilterWorkspaceSize'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnLRNCrossChannelBackward'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnCreateLRNDescriptor'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnGetConvolutionBackwardFilterAlgorithm'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnLRNCrossChannelForward'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnDestroyLRNDescriptor'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnConvolutionBackwardFilter_v3'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnGetConvolutionBackwardDataAlgorithm'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnDivisiveNormalizationBackward'
.build_release/lib/libcaffe-nv.so: undefined reference to `cudnnDivisiveNormalizationForward'
collect2: error: ld returned 1 exit status
make: *** [.build_release/tools/convert_imageset.bin] Error 1

Is there a way to fix this or should I simply roll back to cuDNN 2 and CUDA 7.0 for now?

cudnnBatchNormMode_t mode_ not defined when compiling with CuDNN 3

Hello,

I'm getting this error when compiling Caffe 0.14 with cuDNN 3. However, if I revert to version '0.14.0-rc.1', which I downloaded a week ago, there is no error and everything compiles fine. There is a difference in common_layers.hpp: the earlier version doesn't mention cudnnBatchNormMode_t mode_ at all.
