Comments (11)
Can you try setting deviceId to 0 rather than "auto" to select the GPU manually?
from cntk.
I have seen this before, but I thought we fixed it. Will have a look tomorrow morning.
Thanks for reporting it!
On Thu, Jan 28, 2016 at 9:05 PM -0800, "Aerosoul" wrote:
Can you try setting deviceId to 0 rather than "auto" to select the GPU manually?
from cntk.
Yes, setting deviceId=0 solves it for MNIST.
I also tried running CIFAR10 with this command:
$ cntk configFile=01_Conv.config configName=01_Conv deviceId=0
And now I get this error:
Validating --> conv1_act.y = RectifiedLinear(conv1_act.p[32 x 32 x 32 x 1 x ]) -> [32 x 32 x 32 x 1 x *]
Validating --> pool1 = MaxPooling(conv1_act.y[32 x 32 x 32 x 1 x *])
[CALL STACK]
/scratch-shared/mch/scratch/dipsank/CUDNN/CNTK/bin/../lib/libcntkmath.so ( Microsoft::MSR::CNTK::DebugUtil::PrintCallStack() + 0xb4 ) [0x7fd7b9bdfd44]
cntk ( void Microsoft::MSR::CNTK::ThrowFormattedstd::invalid_argument(char const, ...) + 0xc0 ) [0x5366d0]
cntk ( Microsoft::MSR::CNTK::PoolingNodeBase::Validate(bool) + 0x325 ) [0x5a47b5]
cntk ( Microsoft::MSR::CNTK::MaxPoolingNode::Validate(bool) + 0x14 ) [0x5a48d4]
cntk ( Microsoft::MSR::CNTK::ComputationNetwork::ValidateNodes(std::liststd::shared_ptr<Microsoft::MSR::CNTK::ComputationNodeBase, std::allocatorstd::shared_ptr<Microsoft::MSR::CNTK::ComputationNodeBase > >, bool, unsigned long&) + 0x372 ) [0x6c0252]
cntk ( Microsoft::MSR::CNTK::ComputationNetwork::ValidateSubNetwork(std::shared_ptrMicrosoft::MSR::CNTK::ComputationNodeBase const&) + 0x205 ) [0x6c0b35]
cntk ( Microsoft::MSR::CNTK::ComputationNetwork::CompileNetwork() + 0x21f ) [0x6c35af]
cntk ( Microsoft::MSR::CNTK::NDLBuilder::LoadFromConfig(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) + 0x1de ) [0x5961ee]
cntk ( std::_Function_handlerstd::shared_ptr<Microsoft::MSR::CNTK::ComputationNetwork (int), void DoTrain<Microsoft::MSR::CNTK::ConfigParameters, float>(Microsoft::MSR::CNTK::ConfigParameters const&)::{lambda(int)#2}>::M_invoke(std::Any_data const&, int) + 0x7f ) [0x7627ef]
cntk ( Microsoft::MSR::CNTK::SGD::Train(std::functionstd::shared_ptr<Microsoft::MSR::CNTK::ComputationNetwork (int)>, int, Microsoft::MSR::CNTK::IDataReader, Microsoft::MSR::CNTK::IDataReader, bool) + 0x4c8 ) [0x74b538]
cntk ( void DoTrain<Microsoft::MSR::CNTK::ConfigParameters, float>(Microsoft::MSR::CNTK::ConfigParameters const&) + 0x21a ) [0x76134a]
cntk ( void DoCommands(Microsoft::MSR::CNTK::ConfigParameters const&) + 0x7a4 ) [0x5926e4]
cntk ( wmainOldCNTKConfig(int, wchar_t**) + 0xaa1 ) [0x52a941]
cntk ( wmain1(int, wchar_t**) + 0x62 ) [0x52b0f2]
cntk ( main + 0xcc ) [0x51e06c]
/lib64/libc.so.6 ( __libc_start_main + 0xfd ) [0x344e61ed5d]
cntk ( ) [0x521b09]
EXCEPTION occurred: Convolution operation currently only supports 1D or 2D convolution on 3D tensors.
It works fine with configFile=02_BatchNormConv.config.
If I do not set deviceId for CIFAR10, I get the same error as reported above.
from cntk.
Is it possible to run on multiple GPUs using only cntk?
Or do I have to use MPI to launch cntk on multiple GPUs?
from cntk.
mpiexec is necessary to launch multi-GPU jobs.
(You can run independent jobs of course.)
Thanks,
Frank
From: such87
Sent: Thursday, January 28, 2016 22:12
To: Microsoft/CNTK
Cc: Frank Seide
Subject: Re: [CNTK] Error with MNIST Dataset (#55)
Is it possible to run with multiple GPUs using only cntk ?
Or do I have to use MPI to launch cntk on multiple GPUs ?
from cntk.
If I run it like this:
mpiexec -n 4 cntk Config/config_file deviceId=0
then every process runs on GPU 0 only.
My platform has 4 GPUs, and I want each process to select
a unique GPU: proc0 selects GPU 0, proc1 selects GPU 1, and so on.
The code fails when deviceId is not set (which I guess defaults to auto).
It generates the error that I first reported.
Is this a known issue?
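To make the intended mapping concrete, here is a minimal sketch of a wrapper that derives a GPU index from the MPI rank. It is not part of CNTK: pick_device_id is a hypothetical helper, OMPI_COMM_WORLD_LOCAL_RANK is the per-node rank exported by Open MPI, and the PMI_RANK fallback is an assumption for other launchers. Whether cntk accepts a different deviceId per rank in a parallel job is not confirmed anywhere in this thread, so the sketch only prints the command each rank would run.

```shell
#!/bin/sh
# Hypothetical helper (not part of CNTK): map this process's MPI rank to a GPU index.
pick_device_id() {
    # Number of GPUs on the node; 4 matches the platform described above.
    num_gpus="${1:-4}"
    # Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK; PMI_RANK is an assumed
    # fallback for other launchers. Default to rank 0 when neither is set.
    rank="${OMPI_COMM_WORLD_LOCAL_RANK:-${PMI_RANK:-0}}"
    echo $(( rank % num_gpus ))
}

# Print the command this rank would run instead of executing it.
echo "cntk Config/config_file deviceId=$(pick_device_id 4)"
```

Saved as a script, this would be launched through mpiexec -n 4 in place of calling cntk directly, with the final echo replaced by the actual command once the mapping looks right.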
from cntk.
EXCEPTION occurred: DeviceFromConfig: unexpected failure
This may be due to not being able to write to /var/lock. I have refined the error message (will take a while to land).
from cntk.
Could you check whether you can write to /var/lock? E.g.
echo test > /var/lock/test.txt
/var/lock is used to implement a global lock through the file system. If you do not have write access to it on your system, could you try to make it accessible for your user? If that is not possible, a stopgap would be to manually edit CrossProcessMutex.h, change /var/lock to /tmp or a similar writable directory, and recompile. Making this lock location configurable is on our to-do list.
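The write test above can be wrapped into a small check that cleans up after itself. This is only a sketch: check_lock_dir and the probe file name are made up for illustration and have nothing to do with how CNTK itself takes the lock.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can create files in a directory.
check_lock_dir() {
    dir="$1"
    # Probe file name is arbitrary; $$ (the shell's PID) keeps it unique per run.
    probe="$dir/cntk_lock_probe.$$"
    # Try to create the probe in a subshell so a failed redirection cannot abort the script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}

check_lock_dir /var/lock || echo "consider a directory you can write to, e.g. /tmp"
```

If /var/lock is reported as not writable, running the same check against the replacement directory before recompiling confirms that the new location is usable.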
from cntk.
I tried changing /var/lock to a lock in a local directory.
It does not help; the same error occurs.
from cntk.
I get the same error even when I add deviceId=0 or deviceId=1. Note that I have two GPUs on my system. Has this been resolved?
from cntk.
The original issue was solved, so I am closing this.
@such87: can you retry the CIFAR-10 example? There should have been some fixes in the meantime that address this. If it doesn't run, please open a new issue. Thank you!
@such87, @saonim: for MPI execution, can you also try with the latest changes and post a new issue if it's still failing? Thank you.
For the lock file location, there is already #62 to track it.
from cntk.