Comments (16)
Hi, "invalid device function" indicates a CUDA / GPU incompatibility.
Since you use a GTX 1080, you can modify the CMake configuration to fix it.
Open cmake/flags.cmake and add the following:
if (CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
endif()
Then rebuild the project.
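For context, the `-gencode arch=compute_XX,code=sm_XX` pair above targets one specific GPU architecture. A hypothetical helper (not part of Paddle) sketches how a card's compute capability maps to such a flag; the table is a small illustrative subset, not an exhaustive list:

```python
# Hypothetical helper: map a GPU model to its compute capability and emit
# the matching -gencode pair, mirroring the CMake fix above.
# The table is illustrative only, not exhaustive.
COMPUTE_CAPABILITY = {
    "Tesla K40": "35",   # Kepler
    "GTX 980": "52",     # Maxwell
    "GTX 1080": "61",    # Pascal (an sm_60 binary also runs on it)
}

def gencode_flag(gpu: str) -> str:
    cc = COMPUTE_CAPABILITY[gpu]
    return f"-gencode arch=compute_{cc},code=sm_{cc}"

print(gencode_flag("GTX 1080"))
# -gencode arch=compute_61,code=sm_61
```

Since a binary built for sm_60 also runs on compute-capability 6.1 devices, targeting compute_60/sm_60 covers the GTX 1080 as well.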
from paddle.
@hedaoyuan Hi, thanks for your reply! Do you mean adding GLOG_v=3
to train.sh
under demo/image_classification? I tried this, and Paddle gives me the following error log (it seems slightly different from the one in the original post):
I1004 10:36:08.486361 8402 Util.cpp:151] commandline: ../../build/paddle/trainer/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I1004 10:36:14.866709 8402 Util.cpp:126] Calling runInitFunctions
I1004 10:36:14.866930 8402 Util.cpp:139] Call runInitFunctions done.
I1004 10:36:14.875442 8402 TrainerConfigHelper.cpp:55] Parsing trainer config vgg_16_cifar.py
[INFO 2016-10-04 10:36:15,093 layers.py:1620] channels=3 size=3072
[INFO 2016-10-04 10:36:15,093 layers.py:1620] output size for __conv_0__ is 32
[INFO 2016-10-04 10:36:15,096 layers.py:1620] channels=64 size=65536
[INFO 2016-10-04 10:36:15,097 layers.py:1620] output size for __conv_1__ is 32
[INFO 2016-10-04 10:36:15,099 layers.py:1681] output size for __pool_0__ is 16*16
[INFO 2016-10-04 10:36:15,100 layers.py:1620] channels=64 size=16384
[INFO 2016-10-04 10:36:15,100 layers.py:1620] output size for __conv_2__ is 16
[INFO 2016-10-04 10:36:15,101 layers.py:1620] channels=128 size=32768
[INFO 2016-10-04 10:36:15,102 layers.py:1620] output size for __conv_3__ is 16
[INFO 2016-10-04 10:36:15,103 layers.py:1681] output size for __pool_1__ is 8*8
[INFO 2016-10-04 10:36:15,103 layers.py:1620] channels=128 size=8192
[INFO 2016-10-04 10:36:15,104 layers.py:1620] output size for __conv_4__ is 8
[INFO 2016-10-04 10:36:15,105 layers.py:1620] channels=256 size=16384
[INFO 2016-10-04 10:36:15,105 layers.py:1620] output size for __conv_5__ is 8
[INFO 2016-10-04 10:36:15,106 layers.py:1620] channels=256 size=16384
[INFO 2016-10-04 10:36:15,106 layers.py:1620] output size for __conv_6__ is 8
[INFO 2016-10-04 10:36:15,108 layers.py:1681] output size for __pool_2__ is 4*4
[INFO 2016-10-04 10:36:15,108 layers.py:1620] channels=256 size=4096
[INFO 2016-10-04 10:36:15,108 layers.py:1620] output size for __conv_7__ is 4
[INFO 2016-10-04 10:36:15,110 layers.py:1620] channels=512 size=8192
[INFO 2016-10-04 10:36:15,110 layers.py:1620] output size for __conv_8__ is 4
[INFO 2016-10-04 10:36:15,111 layers.py:1620] channels=512 size=8192
[INFO 2016-10-04 10:36:15,111 layers.py:1620] output size for __conv_9__ is 4
[INFO 2016-10-04 10:36:15,112 layers.py:1681] output size for __pool_3__ is 2*2
[INFO 2016-10-04 10:36:15,113 layers.py:1681] output size for __pool_4__ is 1*1
[INFO 2016-10-04 10:36:15,115 networks.py:1125] The input order is [image, label]
[INFO 2016-10-04 10:36:15,115 networks.py:1132] The output order is [__cost_0__]
I1004 10:36:15.138290 8402 Trainer.cpp:170] trainer mode: Normal
F1004 10:36:15.139696 8402 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
@ 0x7fdd6ccaedaa (unknown)
@ 0x7fdd6ccaece4 (unknown)
@ 0x7fdd6ccae6e6 (unknown)
@ 0x7fdd6ccb1687 (unknown)
@ 0x78ffa9 hl_gpu_apply_unary_op<>()
@ 0x758d2f paddle::BaseMatrixT<>::applyUnary<>()
@ 0x758919 paddle::BaseMatrixT<>::applyUnary<>()
@ 0x742e9f paddle::BaseMatrixT<>::zero()
@ 0x62c94e paddle::Parameter::enableType()
@ 0x628e0c paddle::parameterInitNN()
@ 0x62b27b paddle::NeuralNetwork::init()
@ 0x630e93 paddle::GradientMachine::create()
@ 0x6ad345 paddle::TrainerInternal::init()
@ 0x6a9727 paddle::Trainer::init()
@ 0x542a65 main
@ 0x7fdd6bebaf45 (unknown)
@ 0x54e355 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
No data to plot. Exiting!
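For reference, the `(0 vs. 8)` pair in the check failure compares cudaSuccess against the error the call returned; code 8 is cudaErrorInvalidDeviceFunction, i.e. no kernel image was compiled for this GPU's architecture. A minimal decoding sketch (the table is a hand-written partial subset of the CUDA runtime error codes, not the full list):

```python
# Decode the "(0 vs. 8)" pair from the glog check failure above.
# Partial table of CUDA runtime error codes (only the relevant entries).
CUDA_ERRORS = {
    0: "cudaSuccess",
    2: "cudaErrorMemoryAllocation",
    8: "cudaErrorInvalidDeviceFunction",  # kernel built for a different SM arch
}

def explain_check(expected: int, actual: int) -> str:
    return (f"expected {CUDA_ERRORS[expected]} ({expected}), "
            f"got {CUDA_ERRORS[actual]} ({actual})")

print(explain_check(0, 8))
```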
@stoneyang Fixed in "Fix CUDA_VERSION Comparison" #165.
Duplicate of #3.
The CUDA 8.0 support issue is being resolved.
Please track #3 for recent progress.
Hi @reyoung,
The problem I posted here occurs when using PaddlePaddle AFTER successfully building the code, which is different from the #3 you referenced. As far as I know, that reporter hit an issue in the build process with CUDA 8.0, which can easily be resolved by my merged PR #15.
So I don't think the two issues can be treated as the same problem.
Appended is my nvcc version for your reference:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26
Hope it helps :)
Thanks for your reply, @gangliao!
I'll try it later.
@gangliao I get the same errors when running train.sh
as in my original post ....
I appended the following lines at the end of flags.cmake
under the cmake
directory:
if (CUDA_VERSION VERSION_GREATER "7.0")
    list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
endif()
if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_52")
endif()
set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})
then ran the same build steps, and the build succeeded.
P.S.: I got the same error when I commented out the if() ... endif()
block for the 7.0 check and kept only the 8.0 one:
# if (CUDA_VERSION VERSION_GREATER "7.0")
#     list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
# endif()
if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_52")
endif()
set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})
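One hedged observation on these snippets, suggested by the title of PR #165 ("Fix CUDA_VERSION Comparison"): CMake's VERSION_GREATER is a strict comparison, so on the CUDA 8.0 toolchain reported earlier in the thread, `if (CUDA_VERSION VERSION_GREATER "8.0")` never fires and the compute_60 flag is never appended at all. A Python sketch of the comparison:

```python
# Mimic CMake's VERSION_GREATER (strict >) to show why the 8.0 branch is
# skipped on a CUDA 8.0 toolchain. Illustrative sketch, not CMake itself.
def version_greater(a: str, b: str) -> bool:
    return tuple(map(int, a.split("."))) > tuple(map(int, b.split(".")))

print(version_greater("8.0", "8.0"))  # False: the compute_60 block never runs
print(version_greater("8.0", "7.0"))  # True: only the sm_52 block runs
```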
@stoneyang Why do you set sm_52? How about -gencode arch=compute_60,code=sm_60
?
if (CUDA_VERSION VERSION_GREATER "7.0")
    list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
endif()
if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
endif()
set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})
@gangliao Sorry for the typo ....
I changed those lines (see PR #40), and the new CUDA code-generation configuration below works fine; note that it differs from your suggestion. :)
foreach(capability 30 35 50 52 60)
    list(APPEND __arch_flags " -gencode arch=compute_${capability},code=sm_${capability}")
endforeach()
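To make the expansion concrete, the foreach above appends one -gencode pair per listed capability; the same flag list can be generated in Python:

```python
# Reproduce the flag list the CMake foreach appends (illustrative only).
capabilities = [30, 35, 50, 52, 60]
arch_flags = [f"-gencode arch=compute_{c},code=sm_{c}" for c in capabilities]
for flag in arch_flags:
    print(flag)
```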
Please note that the former if() ... endif()
blocks for CUDA 7.0 and later are commented out, since they kept producing the errors in my original post. This might be a CMake issue, which I do not have enough time to verify for now.
But a new error occurs when running:
$ sh train.sh
The log:
I0905 13:50:01.279917 20299 Util.cpp:144] commandline: /path/to/Paddle/build/paddle/trainer/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I0905 13:50:02.728484 20299 Util.cpp:113] Calling runInitFunctions
I0905 13:50:02.728711 20299 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-05 13:50:02,782 layers.py:1438] channels=3 size=3072
[INFO 2016-09-05 13:50:02,782 layers.py:1438] output size for __conv_0__ is 32
[INFO 2016-09-05 13:50:02,784 layers.py:1438] channels=64 size=65536
[INFO 2016-09-05 13:50:02,784 layers.py:1438] output size for __conv_1__ is 32
[INFO 2016-09-05 13:50:02,785 layers.py:1499] output size for __pool_0__ is 16*16
[INFO 2016-09-05 13:50:02,786 layers.py:1438] channels=64 size=16384
[INFO 2016-09-05 13:50:02,786 layers.py:1438] output size for __conv_2__ is 16
[INFO 2016-09-05 13:50:02,787 layers.py:1438] channels=128 size=32768
[INFO 2016-09-05 13:50:02,787 layers.py:1438] output size for __conv_3__ is 16
[INFO 2016-09-05 13:50:02,788 layers.py:1499] output size for __pool_1__ is 8*8
[INFO 2016-09-05 13:50:02,789 layers.py:1438] channels=128 size=8192
[INFO 2016-09-05 13:50:02,789 layers.py:1438] output size for __conv_4__ is 8
[INFO 2016-09-05 13:50:02,790 layers.py:1438] channels=256 size=16384
[INFO 2016-09-05 13:50:02,790 layers.py:1438] output size for __conv_5__ is 8
[INFO 2016-09-05 13:50:02,791 layers.py:1438] channels=256 size=16384
[INFO 2016-09-05 13:50:02,792 layers.py:1438] output size for __conv_6__ is 8
[INFO 2016-09-05 13:50:02,793 layers.py:1499] output size for __pool_2__ is 4*4
[INFO 2016-09-05 13:50:02,793 layers.py:1438] channels=256 size=4096
[INFO 2016-09-05 13:50:02,793 layers.py:1438] output size for __conv_7__ is 4
[INFO 2016-09-05 13:50:02,794 layers.py:1438] channels=512 size=8192
[INFO 2016-09-05 13:50:02,795 layers.py:1438] output size for __conv_8__ is 4
[INFO 2016-09-05 13:50:02,796 layers.py:1438] channels=512 size=8192
[INFO 2016-09-05 13:50:02,796 layers.py:1438] output size for __conv_9__ is 4
[INFO 2016-09-05 13:50:02,797 layers.py:1499] output size for __pool_3__ is 2*2
[INFO 2016-09-05 13:50:02,797 layers.py:1499] output size for __pool_4__ is 1*1
[INFO 2016-09-05 13:50:02,799 networks.py:1122] The input order is [image, label]
[INFO 2016-09-05 13:50:02,799 networks.py:1129] The output order is [__cost_0__]
I0905 13:50:02.806649 20299 Trainer.cpp:169] trainer mode: Normal
I0905 13:50:02.840154 20299 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-05 13:50:03,068 image_provider.py:52] Image size: 32
[INFO 2016-09-05 13:50:03,069 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-05 13:50:03,069 image_provider.py:58] DataProvider Initialization finished
I0905 13:50:03.069367 20299 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-05 13:50:03,069 image_provider.py:52] Image size: 32
[INFO 2016-09-05 13:50:03,069 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-05 13:50:03,070 image_provider.py:58] DataProvider Initialization finished
I0905 13:50:03.070163 20299 GradientMachine.cpp:134] Initing parameters..
I0905 13:50:03.706565 20299 GradientMachine.cpp:141] Init parameters done.
.........
I0905 13:50:44.860791 20299 TrainerInternal.cpp:162] Batch=100 samples=12800 AvgCost=2.34658 CurrentCost=2.34658 Eval: classification_error_evaluator=0.825234 CurrentEval: classification_error_evaluator=0.825234
.........
I0905 13:50:52.310623 20299 TrainerInternal.cpp:162] Batch=200 samples=25600 AvgCost=2.16351 CurrentCost=1.98044 Eval: classification_error_evaluator=0.782852 CurrentEval: classification_error_evaluator=0.740469
.........
I0905 13:50:59.720386 20299 TrainerInternal.cpp:162] Batch=300 samples=38400 AvgCost=2.01217 CurrentCost=1.70949 Eval: classification_error_evaluator=0.743021 CurrentEval: classification_error_evaluator=0.663359
.........I0905 13:51:06.456202 20299 TrainerInternal.cpp:179] Pass=0 Batch=391 samples=50048 AvgCost=1.89668 Eval: classification_error_evaluator=0.705
F0905 13:51:08.651224 20299 hl_cuda_cudnn.cc:779] Check failed: CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 5) Cudnn Error: CUDNN_STATUS_INVALID_VALUE
*** Check failure stack trace: ***
@ 0x7fc6006fddaa (unknown)
@ 0x7fc6006fdce4 (unknown)
@ 0x7fc6006fd6e6 (unknown)
@ 0x7fc600700687 (unknown)
@ 0x8b4594 hl_convolution_forward()
@ 0x5b3fac paddle::CudnnConvLayer::forward()
@ 0x62a01c paddle::NeuralNetwork::forward()
@ 0x6bdaff paddle::Tester::testOneBatch()
@ 0x6be412 paddle::Tester::testOnePeriod()
@ 0x6a28d4 paddle::Trainer::trainOnePass()
@ 0x6a5cc7 paddle::Trainer::train()
@ 0x5439f3 main
@ 0x7fc5ff909f45 (unknown)
@ 0x54efd5 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
Looks like a cuDNN issue with the function cudnnConvolutionForward()
defined in NVIDIA's cuDNN library. Might it be a wrapper issue?
K20/K40 GPUs work fine on this demo. Currently, this bug has only appeared on GTX-series GPUs; one of my colleagues has already reproduced it and will fix it soon. Thanks.
Thanks for reopening this issue! @gangliao
Appended is my cuDNN version: 5.0.5.
Hope it is helpful.
Fixed in #107 and closed.
The same issue as in the original report still persists after fetching the new commits and re-running make.
Any further suggestions? @gangliao @hedaoyuan
@stoneyang Can you try adding the GLOG_v=3 option on the command line (like this: GLOG_v=3 paddle ...)? It will print more debug information.
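The mechanism behind GLOG_v=3 is glog's verbosity gating: VLOG(level) messages are emitted only when level is at most the GLOG_v environment variable (default 0). A sketch of that rule:

```python
import os

# Sketch of glog's VLOG gating, the mechanism behind GLOG_v=3:
# VLOG(level) messages print only when level <= GLOG_v (default 0).
def vlog_enabled(level: int) -> bool:
    return level <= int(os.environ.get("GLOG_v", "0"))

os.environ["GLOG_v"] = "3"
print(vlog_enabled(3))   # True: VLOG(3) debug output is emitted
print(vlog_enabled(4))   # False: still suppressed
```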
@gangliao Seems perfect now!