enet-training's Issues
No result to visualize
I trained the encoder and the decoder on Cityscapes and then tried to visualize the result. The encoder was trained with:
th run.lua --dataset cs --datapath ~/Desktop/Dataset/CityScape --model models/encoder.lua --save data/CityScape/trained_encoder/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --cachepath data/CityScape/enc_cache/ --nGPU 1 --lrDecayEvery 10 -b 5 --maxepoch 100
Training for decoder is:
th run.lua --dataset cs --datapath ~/Desktop/Dataset/CityScape --model models/decoder.lua --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --CNNEncoder data/CityScape/trained_encoder/model-best.net --nGPU 1 --cachepath data/CityScape/dec_cache/ --save data/CityScape/trained_decoder/ --lrDecayEvery 10 -b 5 --maxepoch 100
I used the best decoder model for visualization, but the e-Lab Scene Parser shows only a white window with the classes listed on the side. The error is:
video statistics: 29.979879275654 fps, 360x640 (149 frames)
Press Spacebar to pause or
Right Arrow to skip forward or
Esc to exit
demo.lua:221: attempt to call method 'squeeze' (a nil value)
stack traceback:
demo.lua:221: in function demo.lua:186
demo.lua:221: attempt to call method 'squeeze' (a nil value)
stack traceback:
demo.lua:221: in function demo.lua:186
demo.lua:221: attempt to call method 'squeeze' (a nil value)
stack traceback:
demo.lua:221: in function demo.lua:186
demo.lua:221: attempt to call method 'squeeze' (a nil value)
What could be causing this?
module 'fastimage' not found
I want to test ENet. I enter the following in the terminal:
qlua demo.lua -i /home/timo/example_image/004.png -m /home/timo/ENet-training/model/model-best.net
Then I get the following output:
Found Environment variable CUDNN_PATH = /usr/local/cuda/lib64/libcudnn.so.5
GPU # 1 selected
Loading model from: /home/timo/ENet-training/model/model-best.net
No stat file found in directory: /home/timo/ENet-training/model/home/timo/ENet-training/model/model-best.net
newcatdir= /home/timo/ENet-training/model/categories.txt
Loading categories file from: /home/timo/ENet-training/model/categories.txt
/home/timo/ENet-training/model/categories.txt
Network has this list of categories, targets:
1 Unlabeled true
2 Road true
3 Sidewalk true
4 Building true
5 Wall true
6 Fence true
7 Pole true
8 TrafficLight true
9 TrafficSign true
10 Vegetation true
11 Terrain true
12 Sky true
13 Person true
14 Rider true
15 Car true
16 Truck true
17 Bus true
18 Train true
19 Motorcycle true
20 Bicycle true
qlua: ./frame/frameimage.lua:17: module 'fastimage' not found:
no field package.preload['fastimage']
no file '/home/timo/.luarocks/share/lua/5.1/fastimage.lua'
no file '/home/timo/.luarocks/share/lua/5.1/fastimage/init.lua'
no file '/home/timo/torch/install/share/lua/5.1/fastimage.lua'
no file '/home/timo/torch/install/share/lua/5.1/fastimage/init.lua'
no file './fastimage.lua'
no file '/home/timo/torch/install/share/luajit-2.1.0-beta1/fastimage.lua'
no file '/usr/local/share/lua/5.1/fastimage.lua'
no file '/usr/local/share/lua/5.1/fastimage/init.lua'
no file '/home/timo/torch/install/lib/fastimage.so'
no file '/home/timo/.luarocks/lib/lua/5.1/fastimage.so'
no file '/home/timo/torch/install/lib/lua/5.1/fastimage.so'
no file './fastimage.so'
no file '/usr/local/lib/lua/5.1/fastimage.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: at 0x7f5bf93969c0
[C]: in function 'require'
./frame/frameimage.lua:17: in function 'init'
demo.lua:147: in main chunk
Unfortunately, "luarocks install fastimage" does not work. Does anybody have an idea? Thank you in advance!
Preprocessing Issue - Data Size
Hi Adam,
May I ask whether you resized the input images before training?
The original Cityscapes images have a resolution of 2048x1024, but README.md suggests that you use 512x256 as the input size.
I used
--dataset cs --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64
and
--imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512
to train an encoder and a decoder on the 2048x1024 input. No error was reported, but the visualized result (using your visualization tool) was rather rough. I wonder whether something is wrong with my input data.
Thanks for your answer!
Train and use without CUDA!
Hi
Is it possible to build and train the encoder and decoder without using CUDA?
Thank You,
Mina
CPU Implementation
Hi, I would like to use this to segment images on the CPU. I tried to run the trained classifier on one of the images after removing the CUDA line in demo.lua, but the model still requires CUDA. Is there an option to run it on the CPU?
Thanks
Save segmented labels
Is there any way to save the label result (just the class number, 0-11 or 1-N) as an image file? It should be the tensor 'winners' from demo.lua in visualize. I can call image.display(winners), but image.save(winners) writes an image in which every pixel is 255.
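A likely cause, assuming image.save treats float tensors as intensities in [0, 1] and clamps everything above 1: every class index of 1 or more saturates to 255. The sketch below is written in Python rather than Lua and uses only the standard library. It stores raw class indices as 8-bit gray values in a binary PGM file, so that index 3 is stored as byte value 3. The function names and the toy label map are hypothetical and are not part of demo.lua.

```python
def save_label_pgm(labels, path):
    """Write a 2-D list of class indices (0..255) as a binary PGM (P5) image."""
    height = len(labels)
    width = len(labels[0])
    if any(len(row) != width for row in labels):
        raise ValueError("all rows must have the same length")
    flat = [v for row in labels for v in row]
    if min(flat) < 0 or max(flat) > 255:
        raise ValueError("class indices must fit in one byte")
    header = "P5\n{} {}\n255\n".format(width, height).encode("ascii")
    with open(path, "wb") as f:
        f.write(header)
        f.write(bytes(flat))  # one byte per pixel, value == class index


def load_label_pgm(path):
    """Read back a PGM written by save_label_pgm."""
    with open(path, "rb") as f:
        raw = f.read()
    # Split off only the first three header tokens; pixel bytes may
    # themselves look like whitespace (e.g. class 10 is a newline byte).
    magic, width, height, rest = raw.split(None, 3)
    if magic != b"P5":
        raise ValueError("not a binary PGM file")
    width, height = int(width), int(height)
    data = rest[len(b"255") + 1:]  # skip "255" and the single separator byte
    return [list(data[r * width:(r + 1) * width]) for r in range(height)]
```

Such a file looks almost black in an image viewer because the values are small, but it round-trips the class indices exactly. For a viewable picture, scaling the indices into [0, 1] before saving would be the alternative.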
Layers with and without bias
Hi!
I read in your paper that you did not use bias terms in your convolution layers, so I wonder why encoder.lua uses spatial convolutions both with and without bias, for example (lines 43-44):
main:add(cudnn.SpatialConvolution(internal, internal, asymetric, 1, 1, 1, pad, 0):noBias())
main:add(cudnn.SpatialConvolution(internal, internal, 1, asymetric, 1, 1, 0, pad))
Thanks for your answer
Training on SUN RGB-D dataset
Hi,
I want to train an ENet model on the SUN RGB-D dataset, but I found that the ground truth is not consistent across images.
Following the source code, I load the label of each image with
m = require 'matio'
label = m.load('/path/to/folders/seg.mat').seglabel
Then I draw an output image from the label, giving each label index a different color.
But, for example, beds are labelled with a different color/index in the following images.
Other objects also have different indices in different images.
Also, the SUN RGB-D dataset has 38 classes (including the unlabelled class), so the index range should be [0, 37] or [1, 38].
But some seg.mat files contain indices larger than 37 or 38; for example, 45 and 46 appear.
I wonder what is going wrong with the ground truth?
Many thanks.
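A quick way to confirm which label files are affected is to histogram the label values and list those outside the expected range. The sketch below is plain Python on a made-up toy label map; loading seg.mat is not shown, and the helper names are hypothetical.

```python
from collections import Counter


def label_histogram(label_map):
    """Count how often each label value occurs in a 2-D list of labels."""
    return Counter(v for row in label_map for v in row)


def out_of_range_labels(label_map, first_index, num_classes):
    """Return the sorted label values that fall outside the expected range."""
    allowed = range(first_index, first_index + num_classes)
    return sorted(v for v in label_histogram(label_map) if v not in allowed)


# Toy data (made up): a map that should only contain indices 0..37.
toy_labels = [
    [0, 3, 3, 45],
    [0, 3, 46, 45],
]
bad_values = out_of_range_labels(toy_labels, first_index=0, num_classes=38)
```

Running this per file would show whether the large indices are confined to a few images or spread across the dataset, which helps decide whether they come from a different labelling scheme.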
Running semantic segmentation with a different model file, on a different dataset.
I found the code quite readable and newbie-friendly, and am trying to build my own semantic segmentation repository based off of ENet. For now, the details have been abstracted away to get the system up and running.
I'm working with my own dataset - img: 256x256, lab: 128x128 (architecture has 1 2x2 pooling). A loadDataset.lua script generates the data/labels as 4D tensors : numFiles X channels X hgt X wdt. A Resnet-style model is called from run.lua for training an epoch [trying to figure out only training for now] as follows:
epoch = 1
trainConf, model, loss = trainer(data.trainData, opt.dataClasses, epoch);
I am getting the following error:
==> Training: epoch # 1 [batchSize = 128]
THCudaCheck FAIL file=/home/ishann/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
/home/ishann/torch/install/share/lua/5.1/nn/Container.lua:67:
In 5 module of nn.Sequential:
In 1 module of nn.Sequential:
In 3 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 1 module of nn.Sequential:
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:142: cuda runtime error (2) : out of memory at /home/ishann/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: in function 'resize'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:142: in function 'createIODescriptors'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:349: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:347>
[C]: in function 'xpcall'
/home/ishann/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ishann/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/ishann/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/ishann/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...e/ishann/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function <...e/ishann/torch/install/share/lua/5.1/nn/ConcatTable.lua:9>
[C]: in function 'xpcall'
...
/home/ishann/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ishann/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./trainer.lua:78: in function 'opfunc'
/home/ishann/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./trainer.lua:94: in function 'trainer'
[string "trainConf, model, loss = trainer(data.trainDa..."]:1: in main chunk
[C]: in function 'xpcall'
/home/ishann/torch/install/share/lua/5.1/trepl/init.lua:669: in function 'repl'
...ushb/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x00406670
I am currently working with a single GPU and have removed the dataParallelTable code segment from my model file. The model and loss have been converted to CUDA formats.
Environment : Ubuntu-14.04, TitanX GPU, CUDA V7.0.27, Driver Version 346.35.
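A rough back-of-the-envelope check suggests why a batch size of 128 can exhaust GPU memory at this input size. The model itself is not shown in the report, so the channel count of 64 below is purely hypothetical; the sketch only illustrates how the size of a single float32 activation tensor scales with the batch size.

```python
GIB = 1024 ** 3


def activation_bytes(batch, channels, height, width, bytes_per_value=4):
    """Memory taken by one dense activation tensor (float32 by default)."""
    return batch * channels * height * width * bytes_per_value


# Hypothetical layer: 64 feature maps at the full 256x256 resolution.
per_layer_large_batch = activation_bytes(128, 64, 256, 256) / GIB
per_layer_small_batch = activation_bytes(8, 64, 256, 256) / GIB
```

Under these assumptions one such tensor already takes 2 GiB at batch size 128, and training keeps many of them (plus gradient buffers) alive at once. At batch size 8 the same tensor takes 0.125 GiB, so lowering the batch size is the usual first thing to try.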
Ground truth pixel values in CamVid
Hi,
I trained a model on the CamVid dataset with the same parameter settings as in the ENet paper.
After training, I forwarded a single image through my trained model, and the results are:
From this single-image test, the trained model seems to work well.
Using the image above and its annotation, I tried to calculate the accuracy.
I assumed that the pixel values in the annotated image indicate the class.
For example, 0 indicates background, 1 indicates sky, and so on.
After forwarding the image, I compared each pixel of the output with its ground truth
and got an accuracy of 0.06 = 6%.
After that, I saw the following code in loadCamVid.lua
-- load corresponding ground truth
rawImg = image.load(gtPath[i], 1, 'byte'):squeeze():float() + 2
local mask = rawImg:eq(13):float()
rawImg = rawImg - mask * #classes
In the original ground truth image the pixel values are 0-11;
after the processing above, the pixel values are 1-11 and index 0 is lost.
What is going wrong when I calculate the accuracy?
Many thanks.
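The three Lua lines quoted above can be traced by hand. The sketch below mirrors them in plain Python under two assumptions that the report does not confirm: that #classes is 12 and that the raw ground truth values are 0-11. The function name is hypothetical.

```python
def remap_camvid(raw, num_classes=12):
    """Mirror of the quoted snippet: add 2, then wrap the value 13 around.

    Assumption: num_classes corresponds to #classes in loadCamVid.lua.
    """
    shifted = raw + 2            # rawImg = ... + 2
    if shifted == 13:            # mask = rawImg:eq(13)
        shifted -= num_classes   # rawImg = rawImg - mask * #classes
    return shifted


# Raw value -> value seen by the training code.
mapping = {raw: remap_camvid(raw) for raw in range(12)}
```

Under these assumptions no index is lost: raw values 0-10 become 2-12 and raw value 11 becomes 1, giving a 1-based range of 1-12. If the network's predicted index lives in this remapped space, comparing it directly against the raw 0-11 values would mark most pixels as wrong, which would be consistent with a very low accuracy. Applying the same remapping to the ground truth before comparing is one way to test that.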
difficulty in reproducing your result
Hi,
Your team mentioned the significance of the batch size in training. May I ask why the batch size affects the final result so strongly?
I have trained with a batch size of 2 and adjusted my learning rate to 1e-5. I also modified the original code by adding an iterSize of 4, so the effective batch size is 8. However, this does not reach the performance of the pre-trained model you provided.
Here's the error curve:
Encoder
My result:
Thank you for answering.
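One possible factor: accumulating gradients over iterSize micro-batches reproduces the gradient of the larger batch, but any per-batch statistics (such as those used by batch normalization, which ENet uses) are still computed over 2 samples at a time. The sketch below illustrates both points on a toy one-parameter model in plain Python; the model, data and function names are hypothetical.

```python
def grad_mse(w, xs, ys):
    """d/dw of mean((w * x - y)^2) over the given samples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch, iter_size):
    """Average the gradients of iter_size consecutive micro-batches."""
    total = 0.0
    for i in range(iter_size):
        lo = i * micro_batch
        total += grad_mse(w, xs[lo:lo + micro_batch], ys[lo:lo + micro_batch])
    return total / iter_size


def batch_means(xs, batch):
    """Mean of each consecutive batch, a stand-in for batch-norm statistics."""
    return [sum(xs[i:i + batch]) / batch for i in range(0, len(xs), batch)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5

full_grad = grad_mse(w, xs, ys)                 # one batch of 8
accum_grad = accumulated_grad(w, xs, ys, 2, 4)  # 4 micro-batches of 2
micro_means = batch_means(xs, 2)                # statistics seen with batch 2
full_mean = batch_means(xs, 8)[0]               # statistics seen with batch 8
```

The two gradients agree, while the per-batch means do not. So an iterSize of 4 with a batch size of 2 is equivalent to a batch size of 8 for the optimizer step but not for layers that normalize over the batch.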
Training Enet on a custom dataset
I am looking to train an ENet model on the dataset available here: https://github.com/jeanpat/DeepFISH
What do I need to watch out for when training ENet on this dataset, and what are the steps for training?
getting inf loss
I am training on CamVid at 240x320 with 32 classes.
The encoder is currently training.
However, the train error and test error are both -inf.
I suspect something is wrong with the label protocol.
Any clues as to why this happens?
is there a plan for c++ version of Enet?
I recently came across this network and want to try it. Because I am not familiar with Torch, I wonder whether there is, or will be, an implementation of ENet in Caffe. I have tried the fb-torch2caffe converter, but it is out of date and cannot handle ENet.
problem in loading Cityscape data
I am trying to train the encoder, but I get this error.
Am I missing anything?
I typed this in my terminal:
th run.lua --dataset cs --datapath /data/path/to/Cityscapes/ --model models/encoder.lua --save /save/trained/model/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --cachepath /home/janice/Documents/ENet-training/train/trained/dataset_cache/
I'm new to Torch and Lua; I hope someone can help. Thanks!
Different image sizes lead to inference error
I have managed to visualize the segmentation with standard videos (640 by 480) but I get the following error with videos of size 1230 by 375. Sounds like the network is expecting some specific image dimensions.
WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
/home/djeb/torch/install/share/lua/5.1/nn/Container.lua:67:
In 27 module of nn.Sequential:
In 2 module of nn.Sequential:
/home/djeb/torch/install/share/lua/5.1/nn/CAddTable.lua:16: bad argument #2 to 'add' (sizes do not match at /tmp/luarocks_cutorch-scm-1-1384/cutorch/lib/THC/generic/THCTensorMathPointwise.cu:10)
stack traceback:
[C]: in function 'add'
/home/djeb/torch/install/share/lua/5.1/nn/CAddTable.lua:16: in function </home/djeb/torch/install/share/lua/5.1/nn/CAddTable.lua:9>
[C]: in function 'xpcall'
/home/djeb/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/djeb/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/djeb/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/djeb/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/djeb/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
demo.lua:244: in function <demo.lua:202>
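A plausible reading, offered as an assumption: the training commands elsewhere in this thread pair 256x512 images with 32x64 encoder labels, which suggests an overall downsampling factor of 8. If the network halves the resolution three times and later doubles it back, frame sizes that are not multiples of 8 cannot be restored exactly, and two branches that are added together can end up with different shapes. The Python sketch below only checks divisibility and computes the next valid size; the function names are hypothetical.

```python
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple


def is_compatible(width, height, factor=8):
    """True if both dimensions are divisible by the downsampling factor."""
    return width % factor == 0 and height % factor == 0


def padded_size(width, height, factor=8):
    """Size to pad (or resize) a frame to so that it divides evenly."""
    return round_up(width, factor), round_up(height, factor)


working = is_compatible(640, 480)     # the video size that works
failing = is_compatible(1230, 375)    # the video size that fails
suggested = padded_size(1230, 375)    # candidate size to try instead
```

Here 640x480 divides evenly while 1230x375 does not. Resizing or padding the frames to 1232x376 before the forward pass would be a cheap experiment to confirm or rule out this explanation.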
Error Training Decoder
I am working on training the decoder portion after successfully training the encoder for 300 epochs. I have looked at issue #2 and it is similar, but that solution does not work for me. Here is the command that I used to train the encoder:
th run.lua --dataset cs --datapath ~/IGVC/ENet/ENet-training-master/train/data/Cityscapes/ --cachepath ~/IGVC/ENet/ENet-training-master/cache/cityscape/ --model models/encoder.lua --save ~/IGVC/ENet/ENet-training-master/save/encoder/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --nGPU 1 --batchSize 5
Here is the command that I am trying to train the decoder with:
th run.lua --dataset cs --datapath ~/IGVC/ENet/ENet-training-master/train/data/Cityscapes/ --cachepath ~/IGVC/ENet/ENet-training-master/cache/ --model models/decoder.lua --save ~/IGVC/ENet/ENet-training-master/save/decoder/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --nGPU 1 --batchSize 5 --CNNEncoder ~/IGVC/ENet/ENet-training-master/save/encoder/model-299.net
And here is the error that I get:
Training: epoch # 1 [batchSize = 5]
/home/mike/torch/install/bin/luajit: ...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:28: input and target should be of same size
stack traceback:
[C]: in function 'assert'
...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:28: in function 'forward'
./train.lua:99: in function 'opfunc'
/home/mike/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
./train.lua:112: in function 'train'
run.lua:59: in main chunk
[C]: in function 'dofile'
...mike/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
Any help would be greatly appreciated. Thanks!
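One thing that stands out: the decoder command above reuses --labelHeight 32 --labelWidth 64, whereas the decoder commands in other reports in this thread use labels at the full image size (256x512). The Python sketch below captures that convention as inferred from those commands. The factor of 8 and the function name are assumptions, not something taken from run.lua.

```python
def expected_label_size(im_height, im_width, model):
    """Label size implied by the commands quoted in this thread.

    Assumption: encoder output is 1/8 of the input, decoder output is full size.
    """
    if model == "encoder":
        return im_height // 8, im_width // 8
    if model == "decoder":
        return im_height, im_width
    raise ValueError("model must be 'encoder' or 'decoder'")


encoder_labels = expected_label_size(256, 512, "encoder")
decoder_labels = expected_label_size(256, 512, "decoder")
```

If this convention holds, the decoder run would need --labelHeight 256 --labelWidth 512, and probably a separate cache directory so that the 32x64 labels cached during encoder training are not reused.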
Error when forwarding image with trained decoder
Hi,
I've trained an encoder and a decoder with the following training configuration:
Encoder:
th run.lua
--dataset cv
--datapath data/CamVid/
--model models/encoder.lua
--save ../trained-model/encoder
--cachepath ../trained-model/encoder/cache_dataset/
--devid 1
--nGPU 1
--imHeight 360
--imWidth 480
--labelHeight 45
--labelWidth 60
Decoder:
th run.lua
--dataset cv
--datapath data/CamVid/
--model models/decoder.lua
--save ../trained-model/decoder
--cachepath ../trained-model/decoder/cache_dataset/
--CNNEncoder ../trained-model/encoder/model-best.net
--devid 1
--nGPU 1
--imHeight 360
--imWidth 480
--labelHeight 360
--labelWidth 480
I used the following script to try to forward a single image from the CamVid dataset:
require "image"
require "nn"
require "cunn"
require "cudnn"
torch.setdefaulttensortype('torch.FloatTensor')
-- Get arguments
local opts = require "opts"
opt = opts.parse(arg)
-- Path to decoder/model-best.net and a single image in CamVid dataset
local model = opt.model
local img = opt.image
-- Load model from file
model = torch.load(model)
print(model)
-- Load image from file
img = image.load(img)
img = img:cuda()
input = {}
input[0] = img
print(input)
-- Forward the image and print out the result
local output = model:forward(input)
print(output)
and I got the following error message:
My questions are:
- Are there any mistakes in how I trained the encoder or decoder? Why does "print(model)" show only part of the network? (I did NOT modify any Lua files.)
- It seems the image is loaded successfully, and I also create a 4-D tensor for the input. What's going wrong?
Many thanks.
Decoder Training Issue
Hi, I have some questions about the training process.
Should I train the encoder first and then use its model-best.net to train the decoder? That is what I did.
The best testing error of encoder training is
Best test error: 0.4738311638236, in epoch: 60
However for decoder:
Best test error: 0.77272008287907, in epoch: 72
I trained the decoder several times, and the best test error was approximately the same each time. So I wonder whether I made a mistake in the training settings.
Port to keras + license
Hello,
I have ported ENet to the Keras framework and recently published it on GitHub (for personal/research use). I know I should have asked you beforehand, but I was wondering what kind of attribution would be acceptable and whether an MIT license would be OK for distribution (so that I can merge the code into keras-contrib).
Thank you for your work!
How to interpret unlabeled areas?
The visualization of ENet's predictions contains some unlabeled areas. Do these areas mean that the net is not sure about the right class, or is "unlabeled" trained as a class of its own? Is it possible to turn these areas off, so that the network has to assign every pixel to one of the remaining classes? Or is there a threshold? Thank you in advance!
Any scripts implemented in PyTorch?
Hi Adam, I saw that you are on the PyTorch team. Have you considered implementing ENet in PyTorch?
Caffe Implementation
Is there any Caffe Implementation for ENet?
Decoder network
The decoder network doesn't use dropout, and it's using ReLU instead of PReLU. Does this setting outperform the decoder with dropout and PReLU?
Bad argument #1 to 'indexAdd' in confusionMatrix.lua
I've successfully trained several models on CamVid and Cityscapes. But when I try to train on my own dataset, which I labeled according to the CamVid category rules and then augmented, something goes wrong with the confusion matrix.
When I trained on the original labeled data, everything went well. But when I trained on the augmented data, it gave me this error. I suspect the error is caused by the augmented label images, but I'm not sure what exactly it means. Btw, without --noConfusion option, the training is working.
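One way augmentation can corrupt label images, offered as a guess since the augmentation code is not shown: resizing or rotating a label map with an interpolating filter blends neighbouring class indices into values that belong to other classes or to no class at all. The Python sketch below compares nearest-neighbour and linear resizing on a single made-up label row; the function names are hypothetical.

```python
def resize_row_nearest(row, new_len):
    """Nearest-neighbour resize: every output value is copied from the input."""
    n = len(row)
    return [row[min(n - 1, int(i * n / new_len))] for i in range(new_len)]


def resize_row_linear(row, new_len):
    """Linear interpolation: output values may lie between input values."""
    if new_len < 2:
        raise ValueError("new_len must be at least 2")
    n = len(row)
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


label_row = [2, 2, 10, 10]            # two classes meeting at a boundary
nearest = resize_row_nearest(label_row, 7)
linear = resize_row_linear(label_row, 7)
```

With nearest-neighbour resizing only the original indices 2 and 10 appear, whereas linear interpolation produces the value 6 at the boundary, an index that was never in the label. If the augmentation used such a filter, or saved labels in a lossy image format, values outside the valid class range would be a natural consequence.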
Has Model.net been trained on CityScape extra training Data ?
Just wondering whether segmentation accuracy would be better if the Cityscapes extra set were included in the training data.
Testing the network
Hi, I was able to successfully train the encoder and the decoder. I want to test the network on an image and am setting up a script in Lua. I am a little confused about how the network works as an end-to-end system. When I load the .net file from the decoder training session, I see the following model:
nn.ConcatTable {
input
|`-> (1): cudnn.SpatialConvolution(3 -> 13, 3x3, 2,2, 1,1)
|`-> (2): nn.SpatialMaxPooling(2x2, 2,2)
... -> output
}
However, when I load the .net file from the training session of the encoder, I see that the model is much bigger. Am I supposed to pass the test image through the encoder and then connect the output of the encoder to the decoder?
Following is a snippet of how I'm loading the .net files.
require 'nn'
require 'image'
require 'cunn'
require 'cudnn'
test_img = '/path/to/image/test.png'
network = '/path/to/train/trained_decoder/model-299.net'
net = torch.load(network,'b64')
Large encoder and decoder model
Hi,
I trained both the encoder and the decoder, and the resulting model files are large: the encoder model is about 400M and the decoder model is 19M. Both are much larger than yours. Is this normal, or is something wrong?
First, I trained encoder:
th run.lua --dataset cs --datapath /home/janice/Pictures/Cityscapes --model models/encoder.lua --save /home/janice/Documents/ENet-training/train/trained/encoder_model/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --cachepath /home/janice/Documents/ENet-training/train/trained/encoder_dataset_cache/ --nGPU 1 --learningRate 5e-4 --weightDecay 2e-4 --batchSize 5
Then I trained decoder:
th run.lua --dataset cs --datapath /home/janice/Pictures/Cityscapes/ --model models/decoder.lua --save /home/janice/Documents/ENet-training/train/trained/decoder_model/ --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --cachepath /home/janice/Documents/ENet-training/train/trained/decoder_dataset_cache/ --nGPU 1 --learningRate 5e-4 --weightDecay 2e-4 --batchSize 5 --CNNEncoder /home/janice/Documents/ENet-training/train/trained/encoder_model/model-best.net
Thanks!!
Problem when training decoder
I'm trying to train ENet on the Cityscapes dataset by following the example command in https://github.com/forwchen/ENet-training/tree/master/train. Training the encoder part seems to have worked with the following command:
th run.lua --dataset cs --datapath data/Cityscapes --model models/encoder.lua --save trained_models/tests --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --learningRate 5e-4 --weightDecay 2e-4 --batchSize 10
However, when I choose a pre-trained model to train the decoder part I get the following error:
th run.lua --dataset cs --datapath data/Cityscapes --model models/decoder.lua --CNNEncoder trained_models/tests/model-281.net --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --learningRate 5e-4 --weightDecay 2e-4 --batchSize 10
==> defining some tools
==> flattening model parameters
==> defining training procedure
Class 'Unlabeled' is ignored in confusion matrix
==> allocating minibatch memory
==> defining test procedure
==> Training: epoch # 1 [batchSize = 10]
/root/torch/install/bin/luajit: bad argument #2 to '?' (sizes do not match at /root/torch/extra/cutorch/lib/THC/generic/THCTensorCopy.c:64)
stack traceback:
[C]: at 0x7f818f6cc0d0
[C]: in function '__newindex'
./train.lua:87: in function 'train'
run.lua:59: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Could someone let me know if I'm doing something wrong?
Thanks
Performance Analysis
I used a video as input; the frame rate is displayed in the lower right corner. It shows 23 fps at an input resolution of 512x272 px.
It runs on a Titan X (Pascal) with cuDNN v5.1 support, so I cannot reproduce the inference speed reported in the paper (150 fps). Is there a trick to get much better performance?
This is what I typed into the terminal:
qlua demo.lua -i /home/timo/SegNet/Farbvideo.avi -d /home/timo/ENet-training/model/ -r 0.5
demo.lua:221: attempt to call method 'squeeze' (a nil value)
Hi,
when I run my self-trained model I get the following error for every processed frame, and the output window stays white:
demo.lua:221: attempt to call method 'squeeze' (a nil value)
stack traceback:
demo.lua:221: in function <demo.lua:186>
When I run your provided model, everything works.
When I run my self-trained encoder alone, it also works (but with reduced output resolution, of course).
This is how I proceed:
First I train the encoder:
th run.lua --dataset cs --datapath /home/udo/CityScapes --model models/encoder.lua --save save/trained/model/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --nGPU 1
Then, after renaming the database data.t7 to data_enc.t7 (so that a new one is created for the decoder with its correct output resolution), I train the decoder:
th run.lua --dataset cs --datapath /home/udo/CityScapes --model models/decoder.lua --save save/trained/model-dec/ --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --nGPU 1 --CNNEncoder /home/udo/enet/ENet-training/train/save/trained/model/model-best.net
Both run fine and converge as they should.
This is how I run the demo:
qlua demo.lua -i ~/CityScapes/leftImg8bit/test -d ~/enet/ENet-training/train/save/trained/model-dec/
I am pointing the demo directly at the model saved by the decoder training. Is some preprocessing step necessary?
The trained decoder model is a bit smaller than the encoder (2983638 vs 2916948). Your model is bigger: 3230016. Do the encoder and decoder have to be "fused" in an intermediate step?
Best regards,
Udo
About the learning rate decay
I have trained the encoder on the Cityscapes dataset using a batch size of 4 and learning rate of 5e-4. I tested 2 values for the learning rate decay and I do not understand why and how the learning rate decay affects the training process in the first 50 epochs, when lrDecayEvery is 50. So here are the training and testing errors for the first 5 epochs when training the encoder with different learning rate decay values (the other hyperparameters are the same).
- learning rate decay = 5e-1
Epoch Testing error Training error
1 1.936894622 2.197471235
2 1.847181412 1.922610026
3 1.803531181 1.842771126
4 1.746171906 1.799409299
5 1.754233467 1.771462634
- learning rate decay = 1e-7
Epoch Testing error Training error
1 1.084342769 1.018340091
2 0.760894685 0.752971031
3 0.664953763 0.658853157
4 0.597714836 0.603430173
5 0.571140148 0.565291346
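One possible explanation, stated as an assumption because train.lua is not quoted here: if the decay value is also passed to the Torch optim optimizer as its learningRateDecay field, it acts on every parameter update rather than every lrDecayEvery epochs. In optim.sgd the effective rate is lr / (1 + nevals * learningRateDecay). The Python sketch below evaluates that formula for the two settings; the update count of 1000 is hypothetical.

```python
def effective_lr(base_lr, lr_decay, n_updates):
    """Per-update decay of the form used by Torch's optim.sgd:
    clr = lr / (1 + n_updates * lr_decay)."""
    return base_lr / (1.0 + n_updates * lr_decay)


base_lr = 5e-4
n_updates = 1000  # hypothetical number of updates so far

lr_with_large_decay = effective_lr(base_lr, 5e-1, n_updates)
lr_with_small_decay = effective_lr(base_lr, 1e-7, n_updates)
shrink_factor = lr_with_small_decay / lr_with_large_decay
```

Under this assumption a decay of 5e-1 shrinks the learning rate by a factor of roughly 500 after 1000 updates, while 1e-7 leaves it essentially unchanged. That would match the much slower progress in the first table even though lrDecayEvery has not yet triggered.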
Reproducing Results
Hello, I've been able to successfully retrain an encoder and decoder model using the Cityscape dataset and the output from the visualization scripts looks great; however I was wondering how one would go about outputting the IoU mAP results on the 1525 test images (Cityscape) stated in the paper with this model. I'd like to confirm that my model produces similar results to the following:
There doesn't appear to be a dedicated script to do this in the repo.
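For reference, per-class IoU can be derived from a confusion matrix as TP / (TP + FP + FN). Note that the Cityscapes test annotations are withheld, so official test numbers come from the benchmark's evaluation server; a local computation like the Python sketch below applies only where labels are available, such as the validation split. The toy matrix and function names are hypothetical.

```python
def per_class_iou(conf):
    """IoU per class from a square confusion matrix.

    conf[t][p] counts pixels whose true class is t and predicted class is p.
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true c, predicted other
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(conf):
    """Mean IoU over the classes that occur at all (NaN entries are skipped)."""
    vals = [v for v in per_class_iou(conf) if v == v]
    return sum(vals) / len(vals)


toy_conf = [
    [3, 1],
    [1, 5],
]
toy_ious = per_class_iou(toy_conf)
toy_mean = mean_iou(toy_conf)
```

For the toy matrix the two classes score 3/5 and 5/7. Accumulating one confusion matrix over all evaluated images and computing the IoU once at the end is the usual convention for dataset-level scores, as opposed to averaging per-image IoU values.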
Training on CamVid with 14 classes instead of 12
I am trying to train ENet on CamVid after adding 2 additional classes, Lanes and Traffic Signals. I recreated the annotations and changed the training, validation and test split of the 701-image dataset. I created the new train.txt and test.txt files exactly as they were created for the default CamVid dataset. The only change I made was in the loadCamVid file, where I changed the classes list to match the new dataset. However, on running run.lua with the correct paths to the dataset and model, I get this error. It would be great if someone could help point out what the problem might be.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [705,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [706,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [644,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [645,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [646,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [576,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [586,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [587,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [288,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [289,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [290,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [313,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [319,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [928,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [929,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [930,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [931,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [932,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [933,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [934,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [992,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [993,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [994,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [996,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [512,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [513,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [514,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [515,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [516,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [525,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [526,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [527,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [26,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [27,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [28,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [29,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [30,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [31,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [374,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [375,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [376,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [377,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [378,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [380,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [381,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [382,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [383,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [224,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [225,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [226,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [227,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [228,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [229,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [230,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [231,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [252,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [255,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [919,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [920,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [921,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [922,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [923,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [924,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [925,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [926,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [927,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [143,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [144,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [145,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [146,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [147,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [148,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [149,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [150,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [151,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [152,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [153,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [154,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [155,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [156,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [157,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [158,0,0] Assertion t >= 0 && t < n_classes
failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [159,0,0] Assertion t >= 0 && t < n_classes
failed.
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1749/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
/home/ws1/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /tmp/luarocks_cutorch-scm-1-1749/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
[C]: at 0x7f92d5c377b0
[C]: in function '__index'
...ch/install/share/lua/5.1/nn/SpatialClassNLLCriterion.lua:51: in function 'updateOutput'
...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:37: in function 'forward'
./train.lua:99: in function 'opfunc'
/home/ws1/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
./train.lua:112: in function 'train'
run.lua:59: in main chunk
[C]: in function 'dofile'
.../ws1/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
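For readers hitting the same assertion: `t >= 0 && t < n_classes` fires when a target label falls outside the range the criterion expects. In Torch7 the Lua-side labels are 1-based, so every pixel of a label map should lie in 1..n_classes. The sketch below is a hypothetical pre-check in plain Python. The function name and the nested-list label format are assumptions for illustration and are not part of the repository. It counts offending values so they can be found before a batch reaches the GPU.

```python
from collections import Counter


def out_of_range_labels(label_map, n_classes):
    """Count label values that fall outside 1..n_classes.

    label_map: 2-D list of integer class ids (hypothetical input format).
    n_classes: number of classes the criterion was built for.
    """
    bad = Counter()
    for row in label_map:
        for value in row:
            if not 1 <= value <= n_classes:
                bad[value] += 1
    return bad


# A 2x3 label map for a 3-class problem with two invalid pixels (0 and 255).
labels = [[1, 2, 3],
          [0, 2, 255]]
print(dict(out_of_range_labels(labels, 3)))
```

A non-empty result would point to unmapped or "ignore" values in the ground truth, or to a class count that does not match the dataset loader.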
Training with other datasets with different image size
Hello,
I'm trying to train ENet with a different dataset where images and labels have size 500 (width) x 210 (height). I developed my own loadDataset.lua file and added this option in run.lua. However, I'm getting the following error when the data is being loaded:
==> Training: epoch # 1 [batchSize = 10]
/root/torch/install/bin/luajit: ...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:41: input and target should be of same size
stack traceback:
[C]: in function 'assert'
...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:41: in function 'forward'
./train.lua:99: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
./train.lua:112: in function 'train'
run.lua:61: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
I would like to load and use the dataset images with their original sizes (no resizing). What should I put as image/label width and height for training the encoder in this case? In my case, I tried the following values assuming that labels in the encoder are normally 1/8th of the original image size in other datasets (e.g., CityScapes):
--imHeight 210 --imWidth 500 --labelHeight 27 --labelWidth 63
Could anyone give some advice?
Thank you.
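The "input and target should be of same size" assertion suggests that the encoder output is not 27 x 63 for a 210 x 500 input. A minimal sketch of the size arithmetic follows. It assumes the encoder has three stride-2 downsampling stages and that each one rounds down. This is an assumption about the layers, not something confirmed in this thread, and `encoder_label_size` is a hypothetical helper.

```python
def encoder_label_size(im_height, im_width, n_downsamples=3):
    """Estimate the encoder output size.

    Assumes each downsampling stage halves a dimension and rounds down
    (an assumption about the network, not a confirmed property).
    """
    h, w = im_height, im_width
    for _ in range(n_downsamples):
        h, w = h // 2, w // 2
    return h, w


print(encoder_label_size(256, 512))  # Cityscapes setting used in other issues
print(encoder_label_size(210, 500))  # the 210 x 500 dataset from this issue
```

Under that assumption 210 x 500 shrinks to 26 x 62 rather than the rounded-up 27 x 63, so `--labelHeight 26 --labelWidth 62` would be the values to try. Cropping the images to 208 x 496 (multiples of 8) would sidestep the rounding question altogether.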
Explanation for the resnet view
Hi,
I am wondering how you decide which kind of resnet view is appropriate. For instance, in your paper you present a resnet view of 3 conv layers with PReLUs and other layers in between. In Inception-ResNet-v1, the resnet view looks different (image not preserved).
In other words, how do you make this design decision? Does it depend on the type of scene you are training on?
My second question is how you decide on the number of bottlenecks. For instance, you have a pattern of dilated, asymmetric, dilated twice in the encoder. Once again, does the number of bottlenecks depend on the kind of scene (Suncam, Cityscapes, etc.), or is the depth of the network an arbitrary design decision, where a deeper network simply gives better accuracy?
Error: torch/File.lua:375: unknown object
Hi, when running demo.lua on a video, I get the following error. My model and category file were downloaded from the Dropbox link in the repo's readme.md.
Thanks for your help.
djeb@DjebPC:~/ENet-training/visualize$ qlua demo.lua -i ~/Desktop/myVideo.mp4
GPU # 1 selected
Loading model from: /home/djeb/ENet-training/visualize/model/model-best.net
qlua: /home/djeb/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:
[C]: at 0x7f3093f069c0
[C]: in function 'error'
/home/djeb/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
/home/djeb/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
demo.lua:63: in main chunk
ENet in TensorFlow
ENet is really great for real-time semantic segmentation, but I would now like to use it in TensorFlow. To your knowledge, is there any effort to port the Torch model to TensorFlow?
Early convergence Issue
Hi,
I found that several models I trained would not reach the performance of the model-best.net you provide. I set opt.lua according to your documentation, except that I used a batch size of 2.
Here is the result tested with your model-best.net:
I trained on Cityscapes several times. The training process tended to converge at an early stage (around the 80th-100th epoch). Here is a graph of the test error trend:
For parameter settings, I basically followed your default setting or your documentation.
Encoder
smallNet : false
learningRate : 0.001
datahistClasses : 810274
1969251
321342
1195839
34488
46349
65294
10960
29268
835925
61092
202450
65301
7305
371063
14192
12671
12402
5239
22095
[torch.FloatTensor of size 20]
batchSize : 2
dataconClasses : table: 0x401fe608
dataClasses : table: 0x401fe4d0
channels : 3
printNorm : false
save : savemodel/
CNNEncoder : historymodel/enc_1/model-best.net
labelHeight : 32
labelWidth : 64
plot : false
nGPU : 2
lrDecayEvery : 100
weightDecay : 0.0005
imHeight : 256
dataset : cs
momentum : 0.9
devid : 1
cachepath : historymodel/
datapath : datasets/Cityscapes/
threads : 8
maxepoch : 300
noConfusion : all
learningRateDecay : 1e-07
model : models/encoder.lua
imWidth : 512
and I got Best test error: 0.46744307547808, in epoch: 88
Decoder
smallNet : false
learningRate : 0.001
datahistClasses : 45323724
127228632
20988628
78573240
2259981
3025684
4230457
716094
1900118
54828056
3993315
13767863
4206164
466290
24125912
922413
811419
803171
340114
1427811
[torch.FloatTensor of size 20]
batchSize : 2
dataconClasses : table: 0x400dd688
dataClasses : table: 0x400dd550
channels : 3
printNorm : false
save : savemodel/
CNNEncoder : historymodel/enc_2_728/model-best.net
labelHeight : 256
labelWidth : 512
plot : false
nGPU : 2
lrDecayEvery : 100
weightDecay : 0.0005
imHeight : 256
dataset : cs
momentum : 0.9
devid : 1
cachepath : historymodel/dec
datapath : /home/eeb433/Documents/Yuhang/dilation/datasets/Cityscapes/
threads : 8
maxepoch : 300
noConfusion : all
learningRateDecay : 1e-07
model : models/decoder.lua
imWidth : 512
and I got Best test error: 0.77709275662899, in epoch: 95
Thanks for your answer!
demo.lua:200 attempt to index local 'img' (a nil value)
Hi all, I am trying to visualize the results and I am running into some snafus. I added fastimage.so to torch/install/lib/lua/5.1 to get around the "fastimage.so not found" error that I got after running this command:
qlua demo.lua -i ../train/data/Cityscapes/leftImg8bit/test/ -d ../save/ -m decoder2 --net 16
The e-Lab Scene Parser opens but all I see is a white screen except for the colored legend that shows up on the side. This is the error that I get:
demo.lua:200: attempt to index local 'img' (a nil value)
stack traceback:
demo.lua:200: in function <demo.lua:186>
Any help would be greatly appreciated!
Sizes do not match error
Hi, I tried to reproduce the training from this research paper. I was able to train the encoder successfully, but I get the following errors when training the decoder. Any help is appreciated.
Command used to train the encoder
th run.lua --dataset cv --datapath /home/ubuntu/SegNet-Tutorial/CamVid/ --model models/encoder.lua --save /home/ubuntu/ENet-training/train/trained_model/ --imHeight 360 --imWidth 480 --labelHeight 45 --labelWidth 60 --cachepath /home/ubuntu/ENet-training/train/dataset_cache/
Command used to start training the decoder and its resulting error:
th run.lua --dataset cv --datapath /home/ubuntu/SegNet-Tutorial/CamVid/ --model models/decoder.lua --save /home/ubuntu/ENet-training/train/trained_decoder/ --imHeight 360 --imWidth 480 --labelHeight 360 --labelWidth 480 --cachepath /home/ubuntu/ENet-training/train/dataset_cache/ --CNNEncoder /home/ubuntu/ENet-training/train/trained_model/model-299.net
Error:
==> Training: epoch # 1 [batchSize = 2]
/home/ubuntu/torch/install/bin/luajit: bad argument #2 to '?' (sizes do not match at /home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorCopy.c:64)
stack traceback:
[C]: at 0x7ff8eee7d610
[C]: in function '__newindex'
./train.lua:97: in function 'train'
run.lua:77: in main chunk
[C]: in function 'dofile'
...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Another command used to start training the decoder and its resulting error:
th run.lua --dataset cv --datapath /home/ubuntu/SegNet-Tutorial/CamVid/ --model models/decoder.lua --save /home/ubuntu/ENet-training/train/trained_decoder/ --imHeight 45 --imWidth 60 --labelHeight 360 --labelWidth 480 --cachepath /home/ubuntu/ENet-training/train/dataset_cache/ --CNNEncoder /home/ubuntu/ENet-training/train/trained_model/model-299.net
Error:
==> Training: epoch # 1 [batchSize = 2]
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:67: Step: 0ms In 2 module of nn.Sequential:
/home/ubuntu/torch/install/share/lua/5.1/nn/JoinTable.lua:39: bad argument #1 to 'copy' (sizes do not match at /home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorCopy.cu:10)
stack traceback:
[C]: in function 'copy'
/home/ubuntu/torch/install/share/lua/5.1/nn/JoinTable.lua:39: in function </home/ubuntu/torch/install/share/lua/5.1/nn/JoinTable.lua:21>
[C]: in function 'xpcall'
/home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./train.lua:108: in function 'opfunc'
/home/ubuntu/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
./train.lua:123: in function 'train'
run.lua:77: in main chunk
[C]: in function 'dofile'
...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Bad segmentation quality
Class Accuracy is 0
I'm training both the encoder and the decoder on the CamVid dataset with --noConfusion all. It works fine for the encoder, but for the decoder the class accuracies for column-pole, sign-symbol, pedestrian and bicyclist are 0.000% during both training and testing. I'm not sure why this happens. The dataset was downloaded from the SegNet GitHub repository, and the image size is kept as it is (360x480).
Here's confusionMatrix for one model:
Testing:
================================================================================
ConfusionMatrix:
[[ 6378220 69799 10 23 124 301373 0 3156 11950 0 0] 94.287% [class: Sky]
[ 137783 8384576 4 12455 44300 470470 0 397151 318598 3 0] 85.861% [class: Building]
[ 44235 287893 2 255 3949 56334 0 36627 41310 0 0] 0.000% [class: Column-Pole]
[ 10 20881 0 9232678 701740 26 0 28850 281855 0 0] 89.934% [class: Road]
[ 2 42951 0 656156 2133744 7 0 41084 820141 0 0] 57.761% [class: Sidewalk]
[ 281655 1948040 3 127 1128 2107336 0 118706 53577 3 0] 46.720% [class: Tree]
[ 7756 339229 0 6 132 44766 0 6926 2599 0 0] 0.000% [class: Sign-Symbol]
[ 910 262004 1 2599 6494 4922 0 108354 81889 0 0] 23.194% [class: Fence]
[ 5352 131591 1 64122 61088 11641 0 68830 1224900 0 0] 78.142% [class: Car]
[ 47 165030 0 303 944 93 0 62262 25160 0 0] 0.000% [class: Pedestrian]
[ 3 24571 0 1233 2030 177 0 29364 16965 0 0]] 0.000% [class: Bicyclist]
+ average row correct: 43.263584944676%
+ average rowUcol correct (VOC measure): 33.552783618056%
+ global correct: 77.335819603064%
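To make the printed figures easier to interpret, here is a minimal sketch of how the per-class numbers can be derived from a confusion matrix, assuming rows are ground-truth classes and columns are predictions (which is consistent with the row percentages above). `per_class_scores` is a hypothetical helper, not a function from the repository. Row accuracy is the diagonal entry divided by the row sum, and the "rowUcol" (VOC) measure divides the diagonal by row sum plus column sum minus the diagonal.

```python
def per_class_scores(cm):
    """Return (accuracy, iou) lists, one entry per class.

    cm[i][j] = number of pixels of true class i predicted as class j.
    """
    n = len(cm)
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    accuracy, iou = [], []
    for k in range(n):
        tp = cm[k][k]                          # correctly labelled pixels
        union = row_sums[k] + col_sums[k] - tp  # true OR predicted as class k
        accuracy.append(tp / row_sums[k] if row_sums[k] else 0.0)
        iou.append(tp / union if union else 0.0)
    return accuracy, iou


# Toy 2-class matrix, only to show the arithmetic.
toy = [[3, 1],
       [2, 4]]
acc, iou = per_class_scores(toy)
print(acc, iou)
```

For the Sky row above, 6378220 / 6764655 is about 94.287%, which matches the printed value. The columns for Column-Pole, Sign-Symbol, Pedestrian and Bicyclist sum to at most a few dozen pixels, so the decoder almost never predicts those classes at all. That is a different symptom from predicting them in the wrong places.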
Character data not supported type: 17
Where to set #classes
Hi,
I have a dataset like CamVid, except that it has two classes. I assume the only file that needs to be modified is loadCamvid.lua. I changed classes and conClasses, and I also changed the line mask = rawImg:eq(13):float() to mask = rawImg:eq(3):float().
I could train the encoder part, but for the decoder the result has 13 classes.
Is there anywhere else I should change?
-Best
Mina
LICENCE
Hi,
Great work, thanks for sharing it.
What is the licence for this repository?
Thanks!
cuda runtime error (77) : an illegal memory access was encountered
Hi,
While training on Cityscapes, I get the following error randomly during the 1st or 2nd epoch:
THCudaCheck FAIL file=/home/udo/programs/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=30 error=77 : an illegal memory access was encountered
/home/udo/programs/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home/udo/programs/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
I am using Ubuntu 16.04 with CUDA 8.0 and a GTX 1070.
It doesn't matter whether cudnn is installed via luarocks install or not.
Best Regards,
Udo
Edit: launching with CUDA_LAUNCH_BLOCKING=1, I get:
THCudaCheck FAIL file=/home/udo/programs/torch/extra/cunn/lib/THCUNN/ClassNLLCriterion.cu line=171 error=77 : an illegal memory access was encountered
/home/udo/programs/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home/udo/programs/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
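Since the blocking run points at ClassNLLCriterion.cu, one thing worth ruling out is a label value outside the range the criterion expects. Torch's ClassNLLCriterion takes 1-based targets in 1..nClasses, and an out-of-range target is one known way to trigger an illegal memory access on the GPU. Whether that is the cause in this report is an assumption, not something the log confirms. A minimal Python sketch of such a check on a label map given as nested lists, where `find_invalid_labels` and `n_classes` are hypothetical names used only for illustration:

```python
def find_invalid_labels(labels, n_classes):
    """Return the sorted distinct label values outside 1..n_classes.

    `labels` is a 2D label map as a list of rows of integers.
    The valid range is 1-based because that is what Torch's
    ClassNLLCriterion expects for its targets.
    """
    invalid = set()
    for row in labels:
        for value in row:
            if not 1 <= value <= n_classes:
                invalid.add(value)
    return sorted(invalid)


if __name__ == "__main__":
    # Toy label map: 0 and 255 are outside 1..20 and would be reported.
    toy_labels = [[1, 2, 0], [19, 20, 255]]
    print(find_invalid_labels(toy_labels, n_classes=20))
```

An empty result means every label is in range for the given class count. Any reported value would need to be remapped or masked in the data loader before it reaches the criterion.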
cuda runtime error (10) : invalid device ordinal
Hi, is there any solution to this error?
==> defining some tools
==> flattening model parameters
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-4230/cutorch/init.c line=719 error=10 : invalid device ordinal
==> defining some tools
==> flattening model parameters
/home/CV/torch/install/bin/luajit: /home/CV/torch/install/share/lua/5.1/trepl/init.lua:384: ...V/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:634: cuda runtime error (10) : invalid device ordinal at /tmp/luarocks_cutorch-scm-1-4230/cutorch/init.c:719
stack traceback:
[C]: in function 'error'
/home/CV/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
run.lua:56: in main chunk
[C]: in function 'dofile'
.../CV/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Many thanks!
Would E-Net support water segmentation after custom training?
I would like to perform water segmentation from aerial images with E-Net. I'm wondering whether its design is appropriate for such a task. Thank you.
Unable to run on Jetson Tx1
I tried to run a pretrained model through demo.lua.
When I run it on a video.mp4 file, the list of categories is displayed in a white window and the video statistics are printed, but segmentation doesn't start.
After a couple of seconds, the program prints 'killed' and exits.
I had run the model on my host machine earlier without any difficulty, but on the Jetson I face this issue.