e-lab / enet-training

enet-training's Issues

No result to visualize

I was trying to train the encoder and decoder and then visualize the result for Cityscapes. The training command for the encoder is:
th run.lua --dataset cs --datapath ~/Desktop/Dataset/CityScape --model models/encoder.lua --save data/CityScape/trained_encoder/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --cachepath data/CityScape/enc_cache/ --nGPU 1 --lrDecayEvery 10 -b 5 --maxepoch 100
The training command for the decoder is:
th run.lua --dataset cs --datapath ~/Desktop/Dataset/CityScape --model models/decoder.lua --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --CNNEncoder data/CityScape/trained_encoder/model-best.net --nGPU 1 --cachepath data/CityScape/dec_cache/ --save data/CityScape/trained_decoder/ --lrDecayEvery 10 -b 5 --maxepoch 100

I used the best decoder model for visualization; however, the e-Lab Scene Parser shows just a white window with the classes listed on the side. The error is:
video statistics: 29.979879275654 fps, 360x640 (149 frames)
Press Spacebar to pause or
Right Arrow to skip forward or
Esc to exit
demo.lua:221: attempt to call method 'squeeze' (a nil value)
stack traceback:
demo.lua:221: in function demo.lua:186
demo.lua:221: attempt to call method 'squeeze' (a nil value)
stack traceback:
demo.lua:221: in function demo.lua:186
demo.lua:221: attempt to call method 'squeeze' (a nil value)
stack traceback:
demo.lua:221: in function demo.lua:186
demo.lua:221: attempt to call method 'squeeze' (a nil value)

What's the possible problem here?

module 'fastimage' not found

I want to test ENet. I enter the following in the terminal:

qlua demo.lua -i /home/timo/example_image/004.png -m /home/timo/ENet-training/model/model-best.net

Then I get the following output:

Found Environment variable CUDNN_PATH = /usr/local/cuda/lib64/libcudnn.so.5
GPU # 1 selected
Loading model from: /home/timo/ENet-training/model/model-best.net
No stat file found in directory: /home/timo/ENet-training/model/home/timo/ENet-training/model/model-best.net
newcatdir= /home/timo/ENet-training/model/categories.txt
Loading categories file from: /home/timo/ENet-training/model/categories.txt
/home/timo/ENet-training/model/categories.txt
Network has this list of categories, targets:
1   Unlabeled   true
2   Road    true
3   Sidewalk    true
4   Building    true
5   Wall    true
6   Fence   true
7   Pole    true
8   TrafficLight    true
9   TrafficSign true
10  Vegetation  true
11  Terrain true
12  Sky true
13  Person  true
14  Rider   true
15  Car true
16  Truck   true
17  Bus true
18  Train   true
19  Motorcycle  true
20  Bicycle true
qlua: ./frame/frameimage.lua:17: module 'fastimage' not found:
    no field package.preload['fastimage']
    no file '/home/timo/.luarocks/share/lua/5.1/fastimage.lua'
    no file '/home/timo/.luarocks/share/lua/5.1/fastimage/init.lua'
    no file '/home/timo/torch/install/share/lua/5.1/fastimage.lua'
    no file '/home/timo/torch/install/share/lua/5.1/fastimage/init.lua'
    no file './fastimage.lua'
    no file '/home/timo/torch/install/share/luajit-2.1.0-beta1/fastimage.lua'
    no file '/usr/local/share/lua/5.1/fastimage.lua'
    no file '/usr/local/share/lua/5.1/fastimage/init.lua'
    no file '/home/timo/torch/install/lib/fastimage.so'
    no file '/home/timo/.luarocks/lib/lua/5.1/fastimage.so'
    no file '/home/timo/torch/install/lib/lua/5.1/fastimage.so'
    no file './fastimage.so'
    no file '/usr/local/lib/lua/5.1/fastimage.so'
    no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
    [C]: at 0x7f5bf93969c0
    [C]: in function 'require'
    ./frame/frameimage.lua:17: in function 'init'
    demo.lua:147: in main chunk

"luarocks install fastimage" does not work, unfortunately. Does anybody have an idea? Thank you in advance!

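One way to narrow this down is to check, from the same interpreter that runs demo.lua, whether the module can be loaded at all. The sketch below wraps `require` in `pcall` so that a missing module is reported instead of aborting the script. The helper name `moduleAvailable` is made up for illustration and is not part of ENet-training.

```lua
-- Report whether a Lua module can be loaded, without raising an error.
-- `moduleAvailable` is a hypothetical helper, not part of ENet-training.
local function moduleAvailable(name)
   local ok, result = pcall(require, name)
   if ok then
      return true, nil
   end
   -- On failure, `result` holds the search-path message seen in the log above.
   return false, result
end

for _, name in ipairs({ 'string', 'fastimage' }) do
   local ok = moduleAvailable(name)
   print(name, ok and 'found' or 'missing')
end
```

If the module is reported as missing under `qlua` but found under `th`, the two interpreters are probably using different package paths.
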
Preprocessing Issue - Data Size

Hi Adam,
May I ask whether you resized the input before training?
The original Cityscapes dataset has a resolution of 2048x1024; however, README.md seems to use 512x256 as the input size.

I used

 --dataset cs --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64

and

 --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512

to train an encoder and a decoder with 2048x1024 input. No error was reported; however, the visualized result was rather rough (using your visualization tool). I wonder whether there is a mistake in the input data.

Thanks for your answer!

Train and use without CUDA!

Is it possible to build the encoder and decoder in a way that does not use CUDA?

Thank You,
Mina

CPU Implementation

Hi, I would like to use this for segmenting images on a CPU. I tried to run the trained classifier to segment one of the images by taking out the CUDA line in demo.lua, but the model still requires CUDA. Is there an option to run it without?

Thanks

Save segmented labels

Is there any way to save the label result (say, just the class number, from 0-11 or 1-N) as an image file? It should be the tensor 'winners' from demo.lua in visualize. I can call image.display(winners), but image.save(winners) produces an image with all pixels at 255.
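
A likely explanation, not confirmed in this thread, is that `image.save` treats float tensors as intensities in [0, 1], so every class index of 1 or more saturates to 255. One workaround is to write the indices directly as gray levels. The plain-Lua sketch below serializes class indices as an ASCII PGM image. The name `labelsToPGM` and the nested-table layout are assumptions for illustration, and a Torch tensor would first have to be copied into such a table.

```lua
-- Serialize a 2-D table of integer class indices as an ASCII PGM (P2) image.
-- Each pixel stores the class index itself, so nothing is rescaled to [0, 1].
-- `labelsToPGM` is a hypothetical helper, not part of demo.lua.
local function labelsToPGM(labels, maxValue)
   local height, width = #labels, #labels[1]
   local lines = { 'P2', width .. ' ' .. height, tostring(maxValue) }
   for y = 1, height do
      lines[#lines + 1] = table.concat(labels[y], ' ')
   end
   return table.concat(lines, '\n') .. '\n'
end

local labels = { { 1, 1, 2 }, { 3, 2, 12 } }   -- toy 2x3 label map
local pgm = labelsToPGM(labels, 255)
io.write(pgm)
```

The returned string can be written to a `.pgm` file with `io.open`. The image will look almost black in a viewer, but the stored pixel values are the class indices.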

Layers with and without bias

Hi!
I read in your paper that you did not use bias terms in your convolution layers, so I wonder why encoder.lua uses spatial convolutions both with and without bias, for example (lines 43-44):
main:add(cudnn.SpatialConvolution(internal, internal, asymetric, 1, 1, 1, pad, 0):noBias())
main:add(cudnn.SpatialConvolution(internal, internal, 1, asymetric, 1, 1, 0, pad))
Thanks for your answer.

Training on SUN RGB-D dataset

Hi,

I want to train the ENet model on the SUN RGB-D dataset, but I found that the ground truth is not consistent across images.

Following the source code, I load the label of each image with
m = require 'matio'
label = m.load('/path/to/folders/seg.mat').seglabel
Then I draw an output image from the label, giving each label index a different color.

But, for example, beds are labelled with a different color/index in the following images:

Other objects also have different indices in different images.
Also, the SUN RGB-D dataset has 38 classes (including the unlabelled class), so the index interval should be [0, 37] or [1, 38].
But some seg.mat files contain indices larger than 37 or 38; for example, 45 and 46 appear.

I wonder what is going wrong with the ground truth?

Many thanks.
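
To see exactly which label values a file contains, a small histogram can help. The sketch below is plain Lua and works on a flat list of values, so the contents of `seglabel` would first have to be copied into such a list. The function name and the toy data are assumptions for illustration only.

```lua
-- Count how often each label value occurs and list the distinct values
-- that fall outside the expected range [minId, maxId].
local function labelHistogram(values, minId, maxId)
   local counts, outOfRange = {}, {}
   for _, v in ipairs(values) do
      counts[v] = (counts[v] or 0) + 1
      -- Record an out-of-range value only the first time it is seen.
      if (v < minId or v > maxId) and counts[v] == 1 then
         outOfRange[#outOfRange + 1] = v
      end
   end
   table.sort(outOfRange)
   return counts, outOfRange
end

-- Toy data standing in for the flattened contents of one seglabel matrix.
local values = { 0, 3, 3, 45, 12, 46, 45 }
local counts, bad = labelHistogram(values, 0, 37)
print('out-of-range values: ' .. table.concat(bad, ', '))
```

Running this per file would show whether the unexpected indices are confined to a few images or spread across the dataset.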

Running semantic segmentation with a different model file, on a different dataset.

I found the code quite readable and newbie-friendly, and am trying to build my own semantic segmentation repository based off of ENet. For now, the details have been abstracted away to get the system up and running.

I'm working with my own dataset (img: 256x256, lab: 128x128; the architecture has one 2x2 pooling layer). A loadDataset.lua script generates the data/labels as 4D tensors: numFiles x channels x hgt x wdt. A ResNet-style model is called from run.lua to train for one epoch [trying to figure out only training for now] as follows:

epoch = 1
trainConf, model, loss = trainer(data.trainData, opt.dataClasses, epoch);

I am getting the following error:

==> Training: epoch # 1 [batchSize = 128]
THCudaCheck FAIL file=/home/ishann/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
/home/ishann/torch/install/share/lua/5.1/nn/Container.lua:67:
In 5 module of nn.Sequential:
In 1 module of nn.Sequential:
In 3 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 1 module of nn.Sequential:
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:142: cuda runtime error (2) : out of memory at /home/ishann/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
        [C]: in function 'resize'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:142: in function 'createIODescriptors'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:349: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:347>
        [C]: in function 'xpcall'
        /home/ishann/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/ishann/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/ishann/torch/install/share/lua/5.1/nn/Sequential.lua:41>
        [C]: in function 'xpcall'
        /home/ishann/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        ...e/ishann/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function <...e/ishann/torch/install/share/lua/5.1/nn/ConcatTable.lua:9>
        [C]: in function 'xpcall'
        ...
        /home/ishann/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/ishann/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        ./trainer.lua:78: in function 'opfunc'
        /home/ishann/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
        ./trainer.lua:94: in function 'trainer'
        [string "trainConf, model, loss = trainer(data.trainDa..."]:1: in main chunk
        [C]: in function 'xpcall'
        /home/ishann/torch/install/share/lua/5.1/trepl/init.lua:669: in function 'repl'
        ...ushb/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
        [C]: at 0x00406670

I am currently working with a single GPU and have removed the dataParallelTable code segment from my model file. The model and loss have been converted to CUDA formats.

Environment : Ubuntu-14.04, TitanX GPU, CUDA V7.0.27, Driver Version 346.35.
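
The log shows a batch size of 128, which is far larger than the batch sizes of 5 to 10 used in other training commands on this page. As a rough sanity check, the sketch below computes the memory taken by the input batch alone, assuming 3-channel float32 images (the channel count is not stated in the post). Intermediate activations of every layer scale with the batch size in the same way and usually dominate, so this is only a lower bound.

```lua
-- Bytes needed for one float32 tensor of shape N x C x H x W.
local function batchBytes(n, c, h, w)
   return n * c * h * w * 4
end

local MIB = 1024 * 1024
local CHANNELS = 3   -- assumption: RGB input, not stated in the post
for _, n in ipairs({ 128, 32, 8 }) do
   print(string.format('batch %3d: %.1f MiB for the input tensor alone',
      n, batchBytes(n, CHANNELS, 256, 256) / MIB))
end
```

Reducing the batch size is the usual first thing to try when a forward pass runs out of GPU memory.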

Ground truth pixel values in CamVid

Hi,

I used the CamVid dataset to train a model, with the same parameter settings as in the ENet paper.
After training, I forwarded a single image through my trained model, and the results are:

From this single-image test, it seems that the trained model is powerful enough.

With the image above and its annotated image, I tried to calculate the accuracy.
I assumed that the pixel values in the annotated image indicate the class.
For example, 0 indicates background, 1 indicates sky, and so on.
After forwarding the image, I compared each pixel in the output with its ground truth
and got an accuracy of 0.06 = 6%.

After that, I saw the following code in loadCamVid.lua:
-- load corresponding ground truth
rawImg = image.load(gtPath[i], 1, 'byte'):squeeze():float() + 2
local mask = rawImg:eq(13):float()
rawImg = rawImg - mask * #classes

In the original ground truth image, the pixel values are 0-11;
after the processing above, the pixel values are 1-11 and the index 0 is lost.

What is going wrong when I calculate the accuracy?
Many thanks.
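
Reading the three quoted loader lines literally, and assuming `#classes` is 12 for the default CamVid setup, raw values 0 to 10 become training labels 2 to 12 and raw value 11 becomes label 1. If the network's output indices live in that remapped space, comparing them directly with raw pixel values would be off for most pixels, which may explain the 6% figure. The sketch below replays the arithmetic in plain Lua and computes pixel accuracy in the remapped space. Skipping label 1 as "unlabeled" is an assumption based on the category lists shown elsewhere on this page.

```lua
-- Remap a raw CamVid ground-truth value the way the quoted loader lines do:
-- add 2, then send the value 13 back to 1 by subtracting the class count.
local N_CLASSES = 12   -- assumed value of #classes for the default CamVid setup

local function remap(raw)
   local v = raw + 2
   if v == 13 then
      v = v - N_CLASSES
   end
   return v
end

-- Pixel accuracy between predicted class indices and raw ground truth,
-- compared in the remapped label space and skipping label 1.
local function pixelAccuracy(pred, rawGt)
   local correct, total = 0, 0
   for i = 1, #rawGt do
      local target = remap(rawGt[i])
      if target ~= 1 then
         total = total + 1
         if pred[i] == target then
            correct = correct + 1
         end
      end
   end
   return correct / total
end

for raw = 0, 11 do
   print(string.format('raw %2d -> training label %2d', raw, remap(raw)))
end
```

`pixelAccuracy` expects flat lists of equal length, so tensors would need to be flattened and copied into tables first.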

difficulty in reproducing your result

Hi,

Your team mentioned the significance of the batch size in training. May I ask how you explain why the batch size affects the final result so strongly?

I have trained with a batch size of 2 and adjusted my learning rate to 1e-5. I also modified the original code by adding an iterSize of 4, so in essence the real batch size is 8. However, it does not reach the performance of the pre-trained model you provided.

Here's the error curve:
Encoder

Decoder

My result:

Thank you for answering.

Training Enet on a custom dataset

I am looking to train the ENet architecture on the dataset available here: https://github.com/jeanpat/DeepFISH

What points do I need to take care of when training on this dataset with ENet, and what are the steps for training?

getting inf loss

I am training on CamVid at 240x320 with 32 classes.

It is currently training the encoder.

However, the train error and test error are both -inf.

I thought there might be something wrong with the label protocol.

Any clues about why this happens?

is there a plan for c++ version of Enet?

I recently noticed this network and want to try it. Because I am not familiar with Torch, I wonder whether there is, or will be, any work to implement ENet in Caffe? I have tried the fb-torch2caffe converter, but it is out of date and can't support ENet.

problem in loading Cityscape data

I tried to train the encoder, but I ran into this error:

Am I missing anything?
I typed this in my terminal:
th run.lua --dataset cs --datapath /data/path/to/Cityscapes/ --model models/encoder.lua --save /save/trained/model/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --cachepath /home/janice/Documents/ENet-training/train/trained/dataset_cache/

I'm new to Torch and Lua, and I hope someone can help. Thanks!

Different image sizes leads to inference error

I have managed to visualize the segmentation with standard videos (640 by 480) but I get the following error with videos of size 1230 by 375. Sounds like the network is expecting some specific image dimensions.

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
/home/djeb/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 27 module of nn.Sequential:
In 2 module of nn.Sequential:
/home/djeb/torch/install/share/lua/5.1/nn/CAddTable.lua:16: bad argument #2 to 'add' (sizes do not match at /tmp/luarocks_cutorch-scm-1-1384/cutorch/lib/THC/generic/THCTensorMathPointwise.cu:10)
stack traceback:
    [C]: in function 'add'
    /home/djeb/torch/install/share/lua/5.1/nn/CAddTable.lua:16: in function </home/djeb/torch/install/share/lua/5.1/nn/CAddTable.lua:9>
    [C]: in function 'xpcall'
    /home/djeb/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/djeb/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/djeb/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    /home/djeb/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/djeb/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    demo.lua:244: in function <demo.lua:202>
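
One plausible cause, which this thread does not confirm, is that the network needs input dimensions divisible by its total downsampling factor. The training commands on this page pair 256x512 images with 32x64 encoder labels, which suggests a factor of 8. 640 and 480 are both multiples of 8, while 1230 and 375 are not. The sketch below computes the nearest smaller size that satisfies the constraint; the helper name and the factor are assumptions.

```lua
-- Largest multiple of `factor` that does not exceed n.
local function roundDownToMultiple(n, factor)
   return n - (n % factor)
end

local FACTOR = 8   -- assumed total downsampling factor (256 / 32 in the training commands)
for _, size in ipairs({ { 640, 480 }, { 1230, 375 } }) do
   local w, h = size[1], size[2]
   print(string.format('%dx%d -> %dx%d', w, h,
      roundDownToMultiple(w, FACTOR), roundDownToMultiple(h, FACTOR)))
end
```

Cropping or resizing each frame to the adjusted size before the forward pass would be one way to test this hypothesis.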

Error Training Decoder

I am working on training the decoder portion after successfully training the encoder for 300 epochs. I have looked at issue #2 and it is similar, but that solution does not work for me. Here is the command that I used to train the encoder:

th run.lua --dataset cs --datapath ~/IGVC/ENet/ENet-training-master/train/data/Cityscapes/ --cachepath ~/IGVC/ENet/ENet-training-master/cache/cityscape/ --model models/encoder.lua --save ~/IGVC/ENet/ENet-training-master/save/encoder/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --nGPU 1 --batchSize 5
Here is the command that I am trying to train the decoder with:

th run.lua --dataset cs --datapath ~/IGVC/ENet/ENet-training-master/train/data/Cityscapes/ --cachepath ~/IGVC/ENet/ENet-training-master/cache/ --model models/decoder.lua --save ~/IGVC/ENet/ENet-training-master/save/decoder/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --nGPU 1 --batchSize 5 --CNNEncoder ~/IGVC/ENet/ENet-training-master/save/encoder/model-299.net

And here is the error that I get:

Training: epoch # 1 [batchSize = 5]
/home/mike/torch/install/bin/luajit: ...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:28: input and target should be of same size
stack traceback:
[C]: in function 'assert'
...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:28: in function 'forward'
./train.lua:99: in function 'opfunc'
/home/mike/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
./train.lua:112: in function 'train'
run.lua:59: in main chunk
[C]: in function 'dofile'
...mike/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Any help would be greatly appreciated. Thanks!
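
The decoder command above reuses the encoder's label size (32x64), whereas the other decoder commands on this page pass labels at full image resolution (256x512 for 256x512 images). That mismatch would be consistent with the "input and target should be of same size" message, although the thread does not confirm it. The sketch below spells out the label size each stage appears to expect. The 1/8 ratio is inferred from the commands on this page, and the function name is hypothetical.

```lua
-- Expected label size for each training stage, assuming the encoder output is
-- 1/8 of the input resolution and the decoder restores full resolution.
-- The 1/8 ratio is inferred from commands such as 256x512 -> 32x64.
local function expectedLabelSize(stage, imHeight, imWidth)
   if stage == 'encoder' then
      return math.floor(imHeight / 8), math.floor(imWidth / 8)
   elseif stage == 'decoder' then
      return imHeight, imWidth
   end
   error('unknown stage: ' .. tostring(stage))
end

local h, w = expectedLabelSize('decoder', 256, 512)
print(string.format('--labelHeight %d --labelWidth %d', h, w))
```

Note that a dataset cache created with the encoder's label size would also have to be regenerated, for example by pointing `--cachepath` at a new, empty directory.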

Error when forwarding image with trained decoder

Hi,
I've trained an encoder and a decoder with the following training configuration:

Encoder:

th run.lua
--dataset cv
--datapath data/CamVid/
--model models/encoder.lua
--save ../trained-model/encoder
--cachepath ../trained-model/encoder/cache_dataset/
--devid 1
--nGPU 1
--imHeight 360
--imWidth 480
--labelHeight 45
--labelWidth 60

Decoder:

th run.lua
--dataset cv
--datapath data/CamVid/
--model models/decoder.lua
--save ../trained-model/decoder
--cachepath ../trained-model/decoder/cache_dataset/
--CNNEncoder ../trained-model/encoder/model-best.net
--devid 1
--nGPU 1
--imHeight 360
--imWidth 480
--labelHeight 360
--labelWidth 480

I used the following script to try to forward a single image from the CamVid dataset:

require "image"
require "nn"
require "cunn"
require "cudnn"

torch.setdefaulttensortype('torch.FloatTensor')

-- Get arguments
local opts = require "opts"
opt = opts.parse(arg)

-- Path to decoder/model-best.net and a single image in CamVid dataset
local model = opt.model
local img = opt.image

-- Load model from file
model = torch.load(model)
print(model)

-- Load image from file
img = image.load(img)
img = img:cuda()
input = {}
input[0] = img
print(input)

-- Forward the image and print out the result
local output = model:forward(input)
print(output)

and I got the following error message:

My questions are:

Are there any mistakes in how I trained the encoder or decoder?
Why does "print(model)" show only part of the network?
(I did NOT modify any Lua files.)
It seems the image is loaded successfully,
and I also create a 4-D tensor for the input.
What's going wrong?

Many thanks.
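
One detail of the script is worth a closer look: `input = {}` followed by `input[0] = img` builds a Lua table, not a 4-D tensor, and Lua sequences start at index 1. The plain-Lua sketch below (no Torch needed) shows that a value stored at index 0 is invisible to both the length operator and `ipairs`. The string `'image'` merely stands in for the tensor. Adding a batch dimension to the tensor itself, for example with `view`, is the usual alternative, though that is not run here.

```lua
-- A value stored at index 0 lies outside the sequence part of a Lua table.
local input = {}
input[0] = 'image'          -- stands in for the image tensor

print(#input)               -- the length operator does not count index 0

local visited = 0
for _ in ipairs(input) do   -- ipairs starts iterating at index 1
   visited = visited + 1
end
print(visited)

input[1] = 'image'
print(#input)               -- now the table has one sequence element
```

Whether the model accepts a 3-D tensor directly or needs an explicit batch dimension depends on how it was saved, so both variants may be worth trying.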

Decoder Training Issue

Hi, I have some questions about the training process.

Should I train the encoder first then use the model-best.net to train the decoder? I did so.

The best testing error of encoder training is

Best test error: 0.4738311638236, in epoch: 60

However for decoder:

Best test error: 0.77272008287907, in epoch: 72

I trained the decoder several times, and the best test error was approximately the same each time. So I wonder whether I made any mistakes in the training settings.

Port to keras + license

Hello,

I have ported Enet to the keras framework and recently published it on github (for personal/research use). I know I should have asked you beforehand but I was wondering what kind of attribution would be acceptable and whether a MIT license would be ok for distribution (so that I can merge the code with keras-contrib).

Thank you for your work!

How to interpret unlabeled areas?

The visualization of ENet's predictions contains some unlabeled areas. Do those areas mean that the net is not sure about the right class, or is "unlabeled" trained as a class of its own? Is it possible to turn these areas off, so that the network has to assign every pixel to one of the remaining classes? Or is there a threshold? Thank you in advance!
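
If "Unlabeled" is simply one of the output classes (it appears as class 1 in the category list printed in an earlier issue on this page), one possible post-processing step is to take the best-scoring class other than that one. The plain-Lua sketch below does this for the scores of a single pixel. The function name and the toy scores are made up, and whether demo.lua offers such an option is not stated here.

```lua
-- Index of the highest score, optionally skipping one class index.
-- Pass nil as `excluded` for a plain argmax.
local function argmaxExcluding(scores, excluded)
   local best, bestScore = nil, -math.huge
   for class, score in ipairs(scores) do
      if class ~= excluded and score > bestScore then
         best, bestScore = class, score
      end
   end
   return best
end

local UNLABELED = 1                          -- assumed index of 'Unlabeled'
local scores = { 0.50, 0.30, 0.15, 0.05 }    -- toy per-class scores for one pixel
print(argmaxExcluding(scores, nil))          -- plain argmax
print(argmaxExcluding(scores, UNLABELED))    -- best class other than Unlabeled
```

On a score tensor, the same effect could be had by overwriting the scores of the excluded class with a very low value before taking the maximum.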

Are there scripts implemented in PyTorch?

Hi Adam, I saw that you are on the PyTorch team. Have you considered implementing ENet in PyTorch?

Caffe Implementation

Is there any Caffe Implementation for ENet?

Decoder network

The decoder network doesn't use dropout, and it's using ReLU instead of PReLU. Does this setting outperform the decoder with dropout and PReLU?

Bad argument #1 to 'indexAdd' in confusionMatrix.lua

I've successfully trained several models on CamVid and Cityscapes. But when I try to train on my own dataset, which I labeled according to the CamVid category rules and then augmented, something goes wrong with the confusion matrix.
When I trained on the original labeled data, everything went well. But when I trained on the augmented data, it gave me this error. I reckon the error is caused by the augmented label images, but I'm not sure what exactly it means. By the way, without the --noConfusion option, the training works.

Has Model.net been trained on CityScape extra training Data ?

Just wondering whether segmentation accuracy would improve if the Cityscapes extra set were included in the training data.

Testing the network

Hi, I was able to successfully train the encoder and the decoder. I wanted to test the network with an image and am setting up a script in Lua. I am a little confused about how the network works as an end-to-end system. When I load the .net file from the decoder training session, I see the following model:

nn.ConcatTable {
  input
    |`-> (1): cudnn.SpatialConvolution(3 -> 13, 3x3, 2,2, 1,1)
    |`-> (2): nn.SpatialMaxPooling(2x2, 2,2)
     ... -> output
}

However, when I load the .net file from the encoder training session, I see that the model is much bigger. Am I supposed to pass the test image through the encoder and then connect the output of the encoder to the decoder?

Following is a snippet of how I'm loading the .net files.

require 'nn'
require 'image'
require 'cunn'
require 'cudnn'

test_img = '/path/to/image/test.png'
network = '/path/to/train/trained_decoder/model-299.net'
net = torch.load(network,'b64')

Large encoder and decoder model

Hi,
I trained both the encoder and the decoder and got large model.net files. The encoder model is about 400M; the decoder model is 19M. Both are much larger than yours. Is this normal, or is something wrong?

First, I trained encoder:
th run.lua --dataset cs --datapath /home/janice/Pictures/Cityscapes --model models/encoder.lua --save /home/janice/Documents/ENet-training/train/trained/encoder_model/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --cachepath /home/janice/Documents/ENet-training/train/trained/encoder_dataset_cache/ --nGPU 1 --learningRate 5e-4 --weightDecay 2e-4 --batchSize 5

Then I trained decoder:
th run.lua --dataset cs --datapath /home/janice/Pictures/Cityscapes/ --model models/decoder.lua --save /home/janice/Documents/ENet-training/train/trained/decoder_model/ --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --cachepath /home/janice/Documents/ENet-training/train/trained/decoder_dataset_cache/ --nGPU 1 --learningRate 5e-4 --weightDecay 2e-4 --batchSize 5 --CNNEncoder /home/janice/Documents/ENet-training/train/trained/encoder_model/model-best.net

Thanks!!

Problem when training decoder

I'm trying to train on the Cityscapes dataset using ENet by following the example command in https://github.com/forwchen/ENet-training/tree/master/train. Training the encoder seems to have worked with the following command:

th run.lua --dataset cs --datapath data/Cityscapes --model models/encoder.lua --save trained_models/tests --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --learningRate 5e-4 --weightDecay 2e-4 --batchSize 10

However, when I choose a pre-trained model to train the decoder part I get the following error:

th run.lua --dataset cs --datapath data/Cityscapes --model models/decoder.lua --CNNEncoder trained_models/tests/model-281.net --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --learningRate 5e-4 --weightDecay 2e-4 --batchSize 10

==> defining some tools
==> flattening model parameters
==> defining training procedure
Class 'Unlabeled' is ignored in confusion matrix
==> allocating minibatch memory
==> defining test procedure
==> Training: epoch # 1 [batchSize = 10]
/root/torch/install/bin/luajit: bad argument #2 to '?' (sizes do not match at /root/torch/extra/cutorch/lib/THC/generic/THCTensorCopy.c:64)
stack traceback:
[C]: at 0x7f818f6cc0d0
[C]: in function '__newindex'
./train.lua:87: in function 'train'
run.lua:59: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Could someone let me know if I'm doing something wrong?

Thanks

Performance Analysis

I took a video as input, and the frame rate is displayed in the lower right corner. It shows 23 frames per second at an input resolution of 512x272 px.
It runs on a Titan X (Pascal) with cuDNN v5.1 support, so I cannot reproduce the inference speed in the paper (150 frames per second). Is there a trick to get much better performance?

That's what I typed into the terminal:

qlua demo.lua -i /home/timo/SegNet/Farbvideo.avi -d /home/timo/ENet-training/model/ -r 0.5

demo.lua:221: attempt to call method 'squeeze' (a nil value)

Hi,
when I run my self-trained model I get the following error for every processed frame, and the output window stays white:

demo.lua:221: attempt to call method 'squeeze' (a nil value)
stack traceback:
    demo.lua:221: in function <demo.lua:186>

When I run your provided model everything works.
When I run my self-trained encoder alone, it also works (but with reduced output resolution, of course).

This is how I proceed:

First I train the encoder:
th run.lua --dataset cs --datapath /home/udo/CityScapes --model models/encoder.lua --save save/trained/model/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64 --nGPU 1

Then, after moving the database data.t7 to data_enc.t7 (so that a new one is created for the decoder with its correct output resolution), I train the decoder:
th run.lua --dataset cs --datapath /home/udo/CityScapes --model models/decoder.lua --save save/trained/model-dec/ --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512 --nGPU 1 --CNNEncoder /home/udo/enet/ENet-training/train/save/trained/model/model-best.net
Both run fine and converge as they should.

This is how I run the demo:
qlua demo.lua -i ~/CityScapes/leftImg8bit/test -d ~/enet/ENet-training/train/save/trained/model-dec/

I am pointing the demo directly to the saved model of the decoder training. Is some preprocessing step necessary?
The trained decoder model is a bit smaller than the encoder (2983638 vs 2916948). Your model is bigger: 3230016. Do the encoder and decoder have to be "fused" in an intermediate step?

Best regards,
Udo

About the learning rate decay

I have trained the encoder on the Cityscapes dataset using a batch size of 4 and learning rate of 5e-4. I tested 2 values for the learning rate decay and I do not understand why and how the learning rate decay affects the training process in the first 50 epochs, when lrDecayEvery is 50. So here are the training and testing errors for the first 5 epochs when training the encoder with different learning rate decay values (the other hyperparameters are the same).

learning rate decay = 5e-1

Epoch 	Testing error	Training error		
1	1.936894622	2.197471235	
2	1.847181412	1.922610026	
3	1.803531181	1.842771126
4	1.746171906	1.799409299	
5	1.754233467	1.771462634

learning rate decay = 1e-7

Epoch	Testing error	Training error	
1	1.084342769	1.018340091	
2	0.760894685	0.752971031	
3	0.664953763	0.658853157	
4	0.597714836	0.603430173
5	0.571140148	0.565291346
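
A possible explanation, assuming the decay value is passed through to the optimizer as `learningRateDecay`: Torch's `optim` routines (for example `optim.sgd`) apply that decay at every parameter update as `lr / (1 + t * decay)`, where `t` counts updates rather than epochs. Under that assumption, a decay of 5e-1 shrinks the step size within the first few iterations, independently of `lrDecayEvery`. The sketch below evaluates the formula for both settings; the function name is made up for illustration.

```lua
-- Effective learning rate after t parameter updates under per-update decay,
-- following the rule lr / (1 + t * decay) used by Torch's optim.sgd.
-- Whether run.lua forwards its decay option this way is an assumption.
local function effectiveLearningRate(lr, decay, t)
   return lr / (1 + t * decay)
end

local baseLR = 5e-4
for _, decay in ipairs({ 5e-1, 1e-7 }) do
   for _, t in ipairs({ 0, 10, 1000 }) do
      print(string.format('decay=%g  t=%4d  lr=%.3e',
         decay, t, effectiveLearningRate(baseLR, decay, t)))
   end
end
```

With a decay of 1e-7 the learning rate stays essentially at its base value over the same number of updates, which would match the much faster drop in error in the second table.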

Reproducing Results

Hello, I've been able to successfully retrain an encoder and decoder model using the Cityscape dataset and the output from the visualization scripts looks great; however I was wondering how one would go about outputting the IoU mAP results on the 1525 test images (Cityscape) stated in the paper with this model. I'd like to confirm that my model produces similar results to the following:

There doesn't appear to be a dedicated script to do this in the repo.
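
For reference, per-class IoU is TP / (TP + FP + FN) and can be computed from a confusion matrix accumulated over all evaluated pixels. The plain-Lua sketch below uses a nested table as the matrix; the function name and the toy numbers are made up. Note that the ground truth for the Cityscapes test images is withheld, so test-set figures come from the benchmark's evaluation server, while the same computation can be run locally on the validation split.

```lua
-- Per-class IoU and their mean from a square confusion matrix, where
-- conf[t][p] counts pixels of true class t that were predicted as class p.
-- Classes that never occur (zero denominator) are left out of the mean;
-- the matrix is assumed to contain at least one class that does occur.
local function iouFromConfusion(conf)
   local n = #conf
   local ious, sum, valid = {}, 0, 0
   for c = 1, n do
      local tp = conf[c][c]
      local fn, fp = 0, 0
      for k = 1, n do
         if k ~= c then
            fn = fn + conf[c][k]   -- true class c, predicted as something else
            fp = fp + conf[k][c]   -- other true class, predicted as c
         end
      end
      local denom = tp + fp + fn
      if denom > 0 then
         ious[c] = tp / denom
         sum, valid = sum + ious[c], valid + 1
      end
   end
   return ious, sum / valid
end

local conf = { { 8, 2 }, { 1, 9 } }   -- toy 2-class example
local ious, meanIoU = iouFromConfusion(conf)
print(string.format('mean IoU = %.4f', meanIoU))
```

Rows or columns for an ignored class such as "Unlabeled" would have to be removed from the matrix before calling the function.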

Training on CamVid with 14 classes instead of 12

I am trying to train ENet on CamVid after adding 2 additional classes, Lanes and Traffic Signals. I recreated the annotations and changed the training, validation and test split of the dataset from 701 images. I have created the new train.txt and test.txt files exactly as they were created for the default CamVid dataset. The only change I made was in the loadCamVid file, where I changed the classes list according to the new dataset. However, on running the run.lua file with the correct paths to the dataset and model, I am facing this error. It would be great if someone could help point out what the problem might be.

/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [705,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [706,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [644,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [645,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [646,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [576,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [586,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [587,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [288,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [289,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [290,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [313,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [319,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [928,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [929,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [930,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [931,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [932,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [933,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [934,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [992,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [993,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [994,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [996,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [512,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [513,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [514,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [515,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [516,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [525,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [526,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [527,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [26,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [27,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [28,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [29,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [30,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [31,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [374,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [375,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [376,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [377,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [378,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [380,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [381,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [382,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [383,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [224,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [225,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [226,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [227,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [228,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [229,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [230,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [231,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [252,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [255,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [919,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [920,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [921,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [922,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [923,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [924,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [925,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [926,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [927,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [143,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [144,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [145,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [146,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [147,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [148,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [149,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [150,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [151,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [152,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [153,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [154,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [155,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [156,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [157,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [158,0,0] Assertion t >= 0 && t < n_classes failed.
/tmp/luarocks_cunn-scm-1-8974/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [159,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1749/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered

/home/ws1/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /tmp/luarocks_cutorch-scm-1-1749/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
[C]: at 0x7f92d5c377b0
[C]: in function '__index'
...ch/install/share/lua/5.1/nn/SpatialClassNLLCriterion.lua:51: in function 'updateOutput'
...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:37: in function 'forward'
./train.lua:99: in function 'opfunc'
/home/ws1/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
./train.lua:112: in function 'train'
run.lua:59: in main chunk
[C]: in function 'dofile'
.../ws1/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
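
The assertion `t >= 0 && t < n_classes` in the log above fires when a target label lies outside the range the criterion expects. A minimal sketch of a sanity check that could be run on the label maps before training, written in Python for illustration only (the training code itself is Lua/Torch). The function name `find_invalid_labels`, the list-of-lists input format, and the assumption that valid labels are the 1-based values 1..n_classes are mine, not taken from the repository:

```python
from collections import Counter

def find_invalid_labels(label_map, n_classes):
    """Count label values outside the assumed 1-based range 1..n_classes.

    label_map is a 2D list of integer class ids (hypothetical input format).
    Returns a Counter mapping each out-of-range value to its pixel count.
    """
    invalid = Counter()
    for row in label_map:
        for value in row:
            if not 1 <= value <= n_classes:
                invalid[value] += 1
    return invalid

# Toy 2x4 label map with 3 valid classes; 0 and 255 are out of range.
labels = [[1, 2, 3, 0],
          [255, 2, 255, 1]]
bad = find_invalid_labels(labels, n_classes=3)
print(dict(bad))
```

If a check like this reports out-of-range values on the real label tensors, the labels would probably need to be remapped to valid class ids before the criterion sees them. Whether that is the cause here is not confirmed by the log.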

Training with other datasets with different image size

Hello,

I'm trying to train ENet with a different dataset where images and labels have size 500 (width) x 210 (height). I developed my own loadDataset.lua file and added this option in run.lua. However, I'm getting the following error when the data is being loaded:

==> Training: epoch # 1 [batchSize = 10]
/root/torch/install/bin/luajit: ...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:41: input and target should be of same size
stack traceback:
[C]: in function 'assert'
...all/share/lua/5.1/cudnn/SpatialCrossEntropyCriterion.lua:41: in function 'forward'
./train.lua:99: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
./train.lua:112: in function 'train'
run.lua:61: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

I would like to load and use the dataset images with their original sizes (no resizing). What should I put as image/label width and height for training the encoder in this case? In my case, I tried the following values assuming that labels in the encoder are normally 1/8th of the original image size in other datasets (e.g., CityScapes):

--imHeight 210 --imWidth 500 --labelHeight 27 --labelWidth 63

Could anyone give some advice?

Thank you.
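
The 1/8 ratio can be worked out per dimension. The following Python sketch assumes that the encoder reduces resolution through three stride-2 stages and that each stage rounds either down or up. Both points are assumptions about the model, and `encoder_output_size` is a hypothetical helper. The most reliable check is still to forward one dummy image through the trained encoder and read the size of its output.

```python
def encoder_output_size(height, width, n_stages=3, round_up=False):
    """Halve each dimension n_stages times, rounding down (default) or up."""
    for _ in range(n_stages):
        if round_up:
            height, width = (height + 1) // 2, (width + 1) // 2
        else:
            height, width = height // 2, width // 2
    return height, width

print(encoder_output_size(256, 512))                 # divisible by 8: (32, 64)
print(encoder_output_size(210, 500))                 # floor rounding: (26, 62)
print(encoder_output_size(210, 500, round_up=True))  # ceil rounding: (27, 63)
```

With a 210 x 500 input, floor rounding gives 26 x 62 rather than the 27 x 63 tried above. A difference of one pixel per dimension would be enough to trigger the "input and target should be of same size" assertion, although which rounding applies depends on the padding of the actual layers.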

Explanation for the resnet view

Hi,

I am wondering how you decide what kind of ResNet view is appropriate. For instance, in your paper you present a ResNet view of 3 conv layers with a bunch of PReLUs and so on. In Inception-ResNet-v1, there is a ResNet view like:

In other words, how do you make this design decision? Does it depend on the type of scene you are training on?

My second question is how you decide on the number of bottlenecks. For instance, you have a pattern of dilated, asymmetric, dilated, twice in the encoder. Once again, does this number depend on the kind of scene (Suncam, Cityscapes, etc.), or is the depth of the network an arbitrary design decision, with deeper networks giving better accuracy?

Error: torch/File.lua:375: unknown object

Hi, when running demo.lua on a video, I get the following error. My model and category file were downloaded from the Dropbox link in the repo's readme.md.

thx for your help.

djeb@DjebPC:~/ENet-training/visualize$ qlua demo.lua -i ~/Desktop/myVideo.mp4
GPU # 1 selected
Loading model from: /home/djeb/ENet-training/visualize/model/model-best.net
qlua: /home/djeb/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:
    [C]: at 0x7f3093f069c0
    [C]: in function 'error'
    /home/djeb/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
    /home/djeb/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    demo.lua:63: in main chunk

ENet in TensorFlow

ENet is really great for real-time semantic segmentation, and I would now like to use it in TensorFlow. To your knowledge, is there any effort to port the Torch model to TensorFlow?

Early convergence Issue

Hi,

I trained several models and compared their performance with that of the model-best.net you provide. I set opt.lua according to your documentation, except that I used a batch size of 2.

Here's my result:

And here is the result obtained with your model-best.net:

I trained it on Cityscapes several times. The training process tended to converge at an early stage (around the 80th to 100th epoch). Here is a graph of the test error trend:

For parameter settings, I basically followed your default setting or your documentation.

Encoder

smallNet : false
learningRate : 0.001
datahistClasses :   810274
 1969251
  321342
 1195839
   34488
   46349
   65294
   10960
   29268
  835925
   61092
  202450
   65301
    7305
  371063
   14192
   12671
   12402
    5239
   22095
[torch.FloatTensor of size 20]

batchSize : 2
dataconClasses : table: 0x401fe608
dataClasses : table: 0x401fe4d0
channels : 3
printNorm : false
save : savemodel/
CNNEncoder : historymodel/enc_1/model-best.net
labelHeight : 32
labelWidth : 64
plot : false
nGPU : 2
lrDecayEvery : 100
weightDecay : 0.0005
imHeight : 256
dataset : cs
momentum : 0.9
devid : 1
cachepath : historymodel/
datapath : datasets/Cityscapes/
threads : 8
maxepoch : 300
noConfusion : all
learningRateDecay : 1e-07
model : models/encoder.lua
imWidth : 512

and I got Best test error: 0.46744307547808, in epoch: 88

Decoder

smallNet : false
learningRate : 0.001
datahistClasses :   45323724
 127228632
  20988628
  78573240
   2259981
   3025684
   4230457
    716094
   1900118
  54828056
   3993315
  13767863
   4206164
    466290
  24125912
    922413
    811419
    803171
    340114
   1427811
[torch.FloatTensor of size 20]

batchSize : 2
dataconClasses : table: 0x400dd688
dataClasses : table: 0x400dd550
channels : 3
printNorm : false
save : savemodel/
CNNEncoder : historymodel/enc_2_728/model-best.net
labelHeight : 256
labelWidth : 512
plot : false
nGPU : 2
lrDecayEvery : 100
weightDecay : 0.0005
imHeight : 256
dataset : cs
momentum : 0.9
devid : 1
cachepath : historymodel/dec
datapath : /home/eeb433/Documents/Yuhang/dilation/datasets/Cityscapes/
threads : 8
maxepoch : 300
noConfusion : all
learningRateDecay : 1e-07
model : models/decoder.lua
imWidth : 512

and I got Best test error: 0.77709275662899, in epoch: 95

Thanks for your answer!

demo.lua:200 attempt to index local 'img' (a nil value)

Hi all, I am trying to visualize the results and I am running into some snafus. I added fastimage.so to torch/install/lib/lua/5.1 to get around the "fastimage.so not found" error, and then ran this command:

qlua demo.lua -i ../train/data/Cityscapes/leftImg8bit/test/ -d ../save/ -m decoder2 --net 16

The e-Lab Scene Parser opens but all I see is a white screen except for the colored legend that shows up on the side. This is the error that I get:

demo.lua:200: attempt to index local 'img' (a nil value)
stack traceback:
demo.lua:200: in function <demo.lua:186>

Any help would be greatly appreciated!

Sizes do not match error

Hi, I tried to reproduce the training from this research paper. I was able to train the encoder successfully. However, I get the following error when training the decoder. Any help is appreciated.

Command used to train the encoder

th run.lua --dataset cv --datapath /home/ubuntu/SegNet-Tutorial/CamVid/ --model models/encoder.lua --save /home/ubuntu/ENet-training/train/trained_model/ --imHeight 360 --imWidth 480 --labelHeight 45 --labelWidth 60 --cachepath /home/ubuntu/ENet-training/train/dataset_cache/

Command used to start training the decoder and its resulting error:

th run.lua --dataset cv --datapath /home/ubuntu/SegNet-Tutorial/CamVid/ --model models/decoder.lua --save /home/ubuntu/ENet-training/train/trained_decoder/ --imHeight 360 --imWidth 480 --labelHeight 360 --labelWidth 480 --cachepath /home/ubuntu/ENet-training/train/dataset_cache/ --CNNEncoder /home/ubuntu/ENet-training/train/trained_model/model-299.net

Error:

==> Training: epoch # 1 [batchSize = 2]
/home/ubuntu/torch/install/bin/luajit: bad argument #2 to '?' (sizes do not match at /home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorCopy.c:64)
stack traceback:
[C]: at 0x7ff8eee7d610
[C]: in function '__newindex'
./train.lua:97: in function 'train'
run.lua:77: in main chunk
[C]: in function 'dofile'
...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Another command used to start training the decoder and its resulting error:

th run.lua --dataset cv --datapath /home/ubuntu/SegNet-Tutorial/CamVid/ --model models/decoder.lua --save /home/ubuntu/ENet-training/train/trained_decoder/ --imHeight 45 --imWidth 60 --labelHeight 360 --labelWidth 480 --cachepath /home/ubuntu/ENet-training/train/dataset_cache/ --CNNEncoder /home/ubuntu/ENet-training/train/trained_model/model-299.net

Error:

==> Training: epoch # 1 [batchSize = 2]
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:67: Step: 0ms
In 2 module of nn.Sequential:
/home/ubuntu/torch/install/share/lua/5.1/nn/JoinTable.lua:39: bad argument #1 to 'copy' (sizes do not match at /home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorCopy.cu:10)
stack traceback:
[C]: in function 'copy'
/home/ubuntu/torch/install/share/lua/5.1/nn/JoinTable.lua:39: in function </home/ubuntu/torch/install/share/lua/5.1/nn/JoinTable.lua:21>
[C]: in function 'xpcall'
/home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./train.lua:108: in function 'opfunc'
/home/ubuntu/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
./train.lua:123: in function 'train'
run.lua:77: in main chunk
[C]: in function 'dofile'
...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
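
The flags in the encoder and decoder commands above can be compared mechanically. The Python sketch below does this under two assumptions that the thread does not confirm: that the data loader reuses whatever it finds under `--cachepath` without checking the label size, and that decoder training expects the same image size as encoder training, with labels at full image resolution. The helper `check_decoder_flags` is hypothetical and not part of the repository.

```python
def check_decoder_flags(enc, dec):
    """Return a list of warnings for an encoder/decoder pair of flag dicts."""
    warnings = []
    if (enc["imHeight"], enc["imWidth"]) != (dec["imHeight"], dec["imWidth"]):
        warnings.append("decoder image size differs from encoder image size")
    if (dec["labelHeight"], dec["labelWidth"]) != (dec["imHeight"], dec["imWidth"]):
        warnings.append("decoder labels are not at full image resolution")
    same_labels = ((enc["labelHeight"], enc["labelWidth"])
                   == (dec["labelHeight"], dec["labelWidth"]))
    if enc["cachepath"] == dec["cachepath"] and not same_labels:
        warnings.append("both stages share a cache path but use different label sizes")
    return warnings

# Flag values copied from the commands above.
encoder = {"imHeight": 360, "imWidth": 480, "labelHeight": 45, "labelWidth": 60,
           "cachepath": "/home/ubuntu/ENet-training/train/dataset_cache/"}
first_try = {"imHeight": 360, "imWidth": 480, "labelHeight": 360, "labelWidth": 480,
             "cachepath": "/home/ubuntu/ENet-training/train/dataset_cache/"}
second_try = dict(first_try, imHeight=45, imWidth=60)

print(check_decoder_flags(encoder, first_try))
print(check_decoder_flags(encoder, second_try))
```

Under these assumptions, the first decoder command is flagged only for the shared cache path, which might mean that 45 x 60 labels cached during encoder training are being copied into a 360 x 480 buffer. The training commands in the first issue on this page use separate `enc_cache/` and `dec_cache/` directories for the two stages.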

Bad segmentation quality

Hi
I downloaded this implementation of ENet and the trained model mentioned in the repo, but after processing I got bad segmentation results.

Is this a fully trained model that needs no further training? Should I train my own model to get good results, or am I running something incorrectly?

Thx

Class Accuracy is 0

I'm training both the encoder and the decoder on the CamVid dataset with --noConfusion all. It works fine for the encoder, but for the decoder the class accuracies for Column-Pole, Sign-Symbol, Pedestrian and Bicyclist are 0.000% during both training and testing. I'm not sure why this happens. The dataset was downloaded from the SegNet GitHub repository, and the image size is kept as it is (360x480).
Here's confusionMatrix for one model:
Testing:

================================================================================
ConfusionMatrix:
[[ 6378220   69799      10      23     124  301373       0    3156   11950       0       0]   94.287% 	[class: Sky]
 [  137783 8384576       4   12455   44300  470470       0  397151  318598       3       0]   85.861% 	[class: Building]
 [   44235  287893       2     255    3949   56334       0   36627   41310       0       0]   0.000% 	[class: Column-Pole]
 [      10   20881       0 9232678  701740      26       0   28850  281855       0       0]   89.934% 	[class: Road]
 [       2   42951       0  656156 2133744       7       0   41084  820141       0       0]   57.761% 	[class: Sidewalk]
 [  281655 1948040       3     127    1128 2107336       0  118706   53577       3       0]   46.720% 	[class: Tree]
 [    7756  339229       0       6     132   44766       0    6926    2599       0       0]   0.000% 	[class: Sign-Symbol]
 [     910  262004       1    2599    6494    4922       0  108354   81889       0       0]   23.194% 	[class: Fence]
 [    5352  131591       1   64122   61088   11641       0   68830 1224900       0       0]   78.142% 	[class: Car]
 [      47  165030       0     303     944      93       0   62262   25160       0       0]   0.000% 	[class: Pedestrian]
 [       3   24571       0    1233    2030     177       0   29364   16965       0       0]]  0.000% 	[class: Bicyclist]
 + average row correct: 43.263584944676% 
 + average rowUcol correct (VOC measure): 33.552783618056% 
 + global correct: 77.335819603064%
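
For anyone reading the numbers above, here is a small sketch of how the three summary figures appear to follow from the matrix. It assumes rows are ground truth and columns are predictions, which is consistent with the per-class percentages printed next to each row. The function names are illustrative and not part of the training code.

```python
CLASSES = ["Sky", "Building", "Column-Pole", "Road", "Sidewalk", "Tree",
           "Sign-Symbol", "Fence", "Car", "Pedestrian", "Bicyclist"]

# Confusion matrix copied from the test output above (row = true class).
MATRIX = [
    [6378220, 69799, 10, 23, 124, 301373, 0, 3156, 11950, 0, 0],
    [137783, 8384576, 4, 12455, 44300, 470470, 0, 397151, 318598, 3, 0],
    [44235, 287893, 2, 255, 3949, 56334, 0, 36627, 41310, 0, 0],
    [10, 20881, 0, 9232678, 701740, 26, 0, 28850, 281855, 0, 0],
    [2, 42951, 0, 656156, 2133744, 7, 0, 41084, 820141, 0, 0],
    [281655, 1948040, 3, 127, 1128, 2107336, 0, 118706, 53577, 3, 0],
    [7756, 339229, 0, 6, 132, 44766, 0, 6926, 2599, 0, 0],
    [910, 262004, 1, 2599, 6494, 4922, 0, 108354, 81889, 0, 0],
    [5352, 131591, 1, 64122, 61088, 11641, 0, 68830, 1224900, 0, 0],
    [47, 165030, 0, 303, 944, 93, 0, 62262, 25160, 0, 0],
    [3, 24571, 0, 1233, 2030, 177, 0, 29364, 16965, 0, 0],
]


def per_class_accuracy(matrix):
    """Diagonal entry divided by the row sum, one value per true class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def global_accuracy(matrix):
    """Correctly labelled pixels divided by all pixels."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    return correct / sum(sum(row) for row in matrix)


def never_predicted(matrix, names):
    """Classes whose column is all zeros, i.e. the model never outputs them."""
    return [names[j] for j in range(len(names))
            if all(row[j] == 0 for row in matrix)]


if __name__ == "__main__":
    for name, acc in zip(CLASSES, per_class_accuracy(MATRIX)):
        print(f"{name:12s} {100 * acc:7.3f}%")
    print("global:", 100 * global_accuracy(MATRIX))
    print("never predicted:", never_predicted(MATRIX, CLASSES))
```

Read this way, the 0.000% rows mean the decoder almost never outputs those labels at all: the Sign-Symbol and Bicyclist columns are entirely zero, and the Column-Pole and Pedestrian columns hold only a handful of pixels. Most of those pixels end up predicted as Building.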

Character data not supported type: 17

When I run "th preprocess.lua", the process is killed at "6th chunk out of 8 chunks", and "Character data not supported type: 17" is printed many times at "5th chunk out of 8 chunks".
The problem is shown in the following picture:

Where to set #classes

Hi,
I have a dataset in the same format as CamVid, except that it has only two classes. I assume the only file that needs to be modified is loadCamvid.lua. I changed classes and conClasses, and I also changed the line mask = rawImg:eq(13):float() to mask = rawImg:eq(3):float().

I could train the encoder part, but the decoder's result still has 13 classes.

Is there anywhere else I should change?

-Best
Mina

LICENCE

Hi,
Great work, thanks for sharing it.
What is the licence for this repository?
Thanks!

cuda runtime error (77) : an illegal memory access was encountered

Hi,
While training on Cityscapes, I get the following error randomly during the 1st or 2nd epoch:

THCudaCheck FAIL file=/home/udo/programs/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=30 error=77 : an illegal memory access was encountered
/home/udo/programs/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home/udo/programs/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147

I am using Ubuntu 16.04 with CUDA 8.0 and a GTX 1070.
It doesn't matter whether cudnn is installed via luarocks install or not.

Best Regards,
Udo

Edit: launching with CUDA_LAUNCH_BLOCKING=1, I get:
THCudaCheck FAIL file=/home/udo/programs/torch/extra/cunn/lib/THCUNN/ClassNLLCriterion.cu line=171 error=77 : an illegal memory access was encountered
/home/udo/programs/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home/udo/programs/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
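
Since the blocking run points at ClassNLLCriterion.cu, one thing worth ruling out is a label value outside the range the criterion expects. Torch's ClassNLLCriterion takes targets from 1 to the number of classes, and on the GPU an out-of-range target can show up as an illegal memory access rather than a clean error. Whether that is the cause here is only a guess. The helper below is a hypothetical, framework-free sketch of the check; out_of_range_labels is not a function from this repository.

```python
def out_of_range_labels(label_rows, n_classes, first_index=1):
    """Return the set of label values the criterion could not index.

    label_rows  : a label map given as a list of rows of integers
    n_classes   : number of classes the model outputs
    first_index : lowest valid label (1 for Lua Torch criteria)
    """
    valid = range(first_index, first_index + n_classes)
    return {value for row in label_rows for value in row if value not in valid}


if __name__ == "__main__":
    # Toy 2x3 label map, not real Cityscapes data: 0 and 255 are out of
    # range for a 3-class model that uses 1-based labels.
    toy_labels = [[1, 2, 3],
                  [0, 2, 255]]
    print(sorted(out_of_range_labels(toy_labels, n_classes=3)))
```

Running the same kind of check over the cached label tensors before training would show whether any label value falls outside the model's class range.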

cuda runtime error (10) : invalid device ordinal

Hi, is there any solution to this error?

==> defining some tools
==> flattening model parameters
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-4230/cutorch/init.c line=719 error=10 : invalid device ordinal
==> defining some tools
==> flattening model parameters
/home/CV/torch/install/bin/luajit: /home/CV/torch/install/share/lua/5.1/trepl/init.lua:384: ...V/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:634: cuda runtime error (10) : invalid device ordinal at /tmp/luarocks_cutorch-scm-1-4230/cutorch/init.c:719
stack traceback:
[C]: in function 'error'
/home/CV/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
run.lua:56: in main chunk
[C]: in function 'dofile'
.../CV/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Many thanks!

Would E-Net support water segmentation after custom training ?

I would like to perform water segmentation from aerial images with ENet. I am wondering whether its design is appropriate for such a task. Thank you.

Unable to run on Jetson Tx1

I have tried to run a pre-trained model through demo.lua.
When I run it on a video.mp4 file, the list of categories is displayed in a white window and the video statistics are displayed, but segmentation doesn't start.
After a couple of seconds, the program prints 'killed' and exits.

I had run the model on my host machine earlier without any difficulty, but on the Jetson I face this issue.


