
pytorch-enet's Introduction

PyTorch-ENet

PyTorch (v1.1.0) implementation of ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, ported from the lua-torch implementation ENet-training created by the authors.

This implementation has been tested on the CamVid and Cityscapes datasets. Currently, pre-trained versions of the model trained on CamVid and Cityscapes are available here.

| Dataset    | Classes¹ | Input resolution | Batch size | Epochs | Mean IoU (%) | GPU memory (GiB) | Training time (hours)² |
|:----------:|:--------:|:----------------:|:----------:|:------:|:------------:|:----------------:|:----------------------:|
| CamVid     | 11       | 480x360          | 10         | 300    | 52.1³        | 4.2              | 1                      |
| Cityscapes | 19       | 1024x512         | 4          | 300    | 59.5⁴        | 5.4              | 20                     |

¹ When referring to the number of classes, the void/unlabeled class is always excluded.
² These are just for reference. Implementation, dataset, and hardware changes can lead to very different results. Reference hardware: Nvidia GTX 1070 and an AMD Ryzen 5 3600 3.6GHz. You can also train for around 100 epochs and get a similar mean IoU (±2%).
³ Test set.
⁴ Validation set.

Installation

Local pip

  1. Python 3 and pip
  2. Set up a virtual environment (optional, but recommended)
  3. Install dependencies using pip: pip install -r requirements.txt

Docker image

  1. Build the image: docker build -t enet .
  2. Run: docker run -it --gpus all --ipc host enet

Usage

Run main.py, the main script file used for training and/or testing the model. The following options are supported:

python main.py [-h] [--mode {train,test,full}] [--resume]
               [--batch-size BATCH_SIZE] [--epochs EPOCHS]
               [--learning-rate LEARNING_RATE] [--lr-decay LR_DECAY]
               [--lr-decay-epochs LR_DECAY_EPOCHS]
               [--weight-decay WEIGHT_DECAY] [--dataset {camvid,cityscapes}]
               [--dataset-dir DATASET_DIR] [--height HEIGHT] [--width WIDTH]
               [--weighing {enet,mfb,none}] [--with-unlabeled]
               [--workers WORKERS] [--print-step] [--imshow-batch]
               [--device DEVICE] [--name NAME] [--save-dir SAVE_DIR]

For help on the optional arguments run: python main.py -h

Examples: Training

python main.py -m train --save-dir save/folder/ --name model_name --dataset name --dataset-dir path/root_directory/

Examples: Resuming training

python main.py -m train --resume True --save-dir save/folder/ --name model_name --dataset name --dataset-dir path/root_directory/

Examples: Testing

python main.py -m test --save-dir save/folder/ --name model_name --dataset name --dataset-dir path/root_directory/

Project structure

Folders

  • data: Contains instructions on how to download the datasets and the code that handles data loading.
  • metric: Evaluation-related metrics.
  • models: ENet model definition.
  • save: By default, main.py will save models in this folder. The pre-trained models can also be found here.

Files

  • args.py: Contains all command-line options.
  • main.py: Main script file used for training and/or testing the model.
  • test.py: Defines the Test class which is responsible for testing the model.
  • train.py: Defines the Train class which is responsible for training the model.
  • transforms.py: Defines image transformations to convert an RGB image encoding classes to a torch.LongTensor and vice versa.

pytorch-enet's People

Contributors

davidtvs, dependabot[bot], jobr0


pytorch-enet's Issues

Test speed

Hello, could you help me with how to calculate the segmentation speed? I don't know how to measure inference speed in PyTorch. Thank you!
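A common way to measure inference speed, as a minimal sketch (the ENet import path, class count, and input size below are assumptions, and trained weights would still need to be loaded):

```
import time

import torch

from models.enet import ENet  # model definition shipped in this repo

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ENet(12).to(device).eval()               # 11 CamVid classes + unlabeled
dummy = torch.randn(1, 3, 360, 480, device=device)  # CamVid-sized input

with torch.no_grad():
    for _ in range(10):           # warm-up so CUDA kernels are compiled/cached
        model(dummy)
    if device.type == "cuda":
        torch.cuda.synchronize()  # GPU work is asynchronous; flush before timing
    start = time.time()
    n_runs = 100
    for _ in range(n_runs):
        model(dummy)
    if device.type == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start

print("{:.1f} ms/image, {:.1f} FPS".format(1000 * elapsed / n_runs, n_runs / elapsed))
```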

How do I save the output file?

Hi, first of all, thank you for the great work!
I am a newcomer to PyTorch and computer vision.
I trained and tested on Cityscapes (for testing I used some of the validation data to check that it works properly).
I am curious how to save the segmented output files, and how to test my own photos that don't have annotation data.

Thank you again!

Evaluation code

Is there code in the repo for running the pretrained model on a single input image?
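For reference, a minimal single-image inference sketch; the import path, checkpoint location, checkpoint key, and color palette here are assumptions, not this repo's exact API:

```
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

from models.enet import ENet  # model definition shipped in this repo

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ENet(12).to(device)   # 11 CamVid classes + unlabeled

# Checkpoint layout is an assumption; adapt the path/key to how it was saved.
checkpoint = torch.load("save/ENet", map_location=device)
model.load_state_dict(checkpoint["state_dict"])
model.eval()

image = Image.open("input.png").convert("RGB")
x = transforms.ToTensor()(image.resize((480, 360))).unsqueeze(0).to(device)

with torch.no_grad():
    pred = model(x).argmax(dim=1).squeeze(0).cpu().numpy()  # (H, W) class ids

# Map class ids to colors with a placeholder palette and save to disk.
palette = np.random.randint(0, 255, (12, 3), dtype=np.uint8)
Image.fromarray(palette[pred]).save("output.png")
```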

A question about torch.load()

Thanks for your code, which has helped me a lot. I want to segment my own road images; this is just a personal test, not part of any research or commercial product. I wrote a new script that simply loads the pre-trained model and uses it to process my picture. But when I load the model, this error arises:
"_pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified."

I have found some explanations saying this is caused by the torch version, but I am sure my version meets the requirements.
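A hedged note on that error: it is the message pickle raises when a file written by torch.save() is read back with plain pickle.load(). A sketch of the intended loading path (import path, checkpoint path, and key are assumptions):

```
import torch

from models.enet import ENet  # model class from this repo

# torch.save() stores tensors via pickle "persistent id" records that only
# torch.load() knows how to resolve; reading the file with plain pickle.load()
# raises exactly the "load persistent id instruction" error quoted above.
model = ENet(12)
checkpoint = torch.load("save/ENet", map_location="cpu")  # not pickle.load()
model.load_state_dict(checkpoint["state_dict"])           # key is an assumption
```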

Training the network on a binary mask

I'm trying to run training with my own dataset, which consists of images and 2D binary masks. With the current label transformation I keep getting memory exceptions. I tried adapting the label transformation to make it work, but when training starts the result shows Mean IoU: 1.0000 from the first epoch. Do you have suggestions on how to make the network work with a new dataset, specifically a binary segmentation task?

Thanks

RuntimeError: weight tensor should be defined either for all or no classes

Hi, when I trained on my own dataset (like CamVid, but with 4 classes), this error happened:

>>>> [Epoch: 0] Training
Traceback (most recent call last):
  File "main.py", line 306, in <module>
    model = train(train_loader, val_loader, w_class, class_encoding)
  File "main.py", line 191, in train
    epoch_loss, (iou, miou) = train.run_epoch(args.print_step)
  File "/media/gaoya/disk/Applications/pytorch/SemanticSegmentation/PyTorch-ENet-master/train.py", line 47, in run_epoch
    loss = self.criterion(outputs, labels)
  File "/media/gaoya/disk/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/gaoya/disk/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/media/gaoya/disk/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1995, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/media/gaoya/disk/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1826, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: weight tensor should be defined either for all or no classes at /tmp/pip-req-build-58y_cjjl/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:27

How can I solve it?
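For context, this error comes from the loss function receiving a weight vector whose length differs from the number of channels in the network output. A self-contained sketch of the shapes that must match (the class count here is illustrative):

```
import torch
import torch.nn as nn

# With e.g. 4 classes + unlabeled, the model output must have 5 channels and
# the weight vector must have exactly 5 entries; any mismatch triggers
# "weight tensor should be defined either for all or no classes".
num_classes = 5
logits = torch.randn(2, num_classes, 8, 8)            # (B, C, H, W) output
target = torch.randint(0, num_classes, (2, 8, 8))     # (B, H, W) class ids
weight = torch.ones(num_classes)                      # one weight per class

loss = nn.CrossEntropyLoss(weight=weight)(logits, target)
print(loss.item())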

Transfer learning with a different number of classes as in original

How would you apply transfer learning to a model, or continue training with another dataset that does not contain all the classes of the original? E.g. you train the model on the Cityscapes dataset first, and then you want to continue with your own dataset, which contains a subset of the Cityscapes classes.

How do I save the segmented images?

Hi, thanks for the code. I am getting the mIoU result, but I want to see the segmented images and save them in a folder. Could you please help me with this?

CUDA error: out of memory

Hi, thanks for sharing your work.
I'm trying to use your network with the Cityscapes dataset, but I get a runtime CUDA error (out of memory) during the test forward pass.
I'm using a GTX 1060 6GB and an Intel i5 7600. What do you think I can change to make it work properly? Which parameters should I change?

Thanks for your help!

What happens to unlabeled or void classes?

Can you please explain how you handle the void classes or unlabeled classes?

I guess more accurately I am asking what is this doing:

```
parser.add_argument(
    "--with-unlabeled",
    dest='ignore_unlabeled',
    action='store_false',
    help="The unlabeled class is not ignored.")
```

Are we not ignoring the unlabeled classes? How is that even possible?
Thank you for explanation in advance!
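For reference on the argparse mechanics quoted above: with action='store_false' and dest='ignore_unlabeled', the attribute defaults to True, and passing --with-unlabeled flips it to False, so the unlabeled class then takes part in the loss and metric. A runnable check:

```
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--with-unlabeled",
    dest="ignore_unlabeled",
    action="store_false",
    help="The unlabeled class is not ignored.")

# store_false means: default True, flag present -> False
print(parser.parse_args([]).ignore_unlabeled)                    # True
print(parser.parse_args(["--with-unlabeled"]).ignore_unlabeled)  # False
```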

Ignore index is not looping over indices

Hey!
First of all: Thank you for the repository! I have been looking for quite a while to get a nice (performant) way to calculate the meanIoU over batches.

I noticed that in iou.py you are looping over the indices that you want to set to 0, but you don't use the loop variable:

 for index in self.ignore_index:
    conf_matrix[:, self.ignore_index] = 0
    conf_matrix[self.ignore_index, :] = 0

It should be this instead, right?

 for index in self.ignore_index:
    conf_matrix[:, index] = 0
    conf_matrix[index, :] = 0

dataset folder structure

Hi mate, I was trying out the Cityscapes dataset with this command:

python main.py -m train --save-dir save/ENet_Cityscapes/ --name test --dataset cityscapes --dataset-dir data/leftImg8bit_trainvaltest

But it gave me this


Loading dataset...

Selected dataset: cityscapes
Dataset directory: data/leftImg8bit_trainvaltest
Save directory: save/ENet_Cityscapes/
Traceback (most recent call last):
  File "main.py", line 298, in <module>
    loaders, w_class, class_encoding = load_dataset(dataset)
  File "main.py", line 45, in load_dataset
    train_set = dataset(
  File "/home/john/Desktop/PyTorch-ENet/data/cityscapes.py", line 84, in __init__
    self.train_data = utils.get_files(
  File "/home/john/Desktop/PyTorch-ENet/data/utils.py", line 19, in get_files
    raise RuntimeError("\"{0}\" is not a folder.".format(folder))
RuntimeError: "data/leftImg8bit_trainvaltest/leftImg8bit_trainvaltest/leftImg8bit/train" is not a folder.

I have tried different ways, but it keeps saying it is not a folder.
Can you recommend a directory structure? That would be great.
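Judging from the paths in that traceback, the loader appends leftImg8bit_trainvaltest/leftImg8bit/<split> to --dataset-dir, so --dataset-dir should point one level higher (here, data/). A hedged sketch of the expected layout, assuming the standard Cityscapes archives are extracted in place:

```
data/
├── leftImg8bit_trainvaltest/
│   └── leftImg8bit/
│       ├── train/<city>/*_leftImg8bit.png
│       ├── val/<city>/*_leftImg8bit.png
│       └── test/<city>/*_leftImg8bit.png
└── gtFine_trainvaltest/
    └── gtFine/
        ├── train/<city>/*_gtFine_labelIds.png
        ├── val/...
        └── test/...
```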

A question about the output shape

I ran into a problem when testing the model with the Cityscapes dataset, shown below. How can I solve it?

Traceback (most recent call last):
  File "main.py", line 348, in <module>
    loaders, w_class, class_encoding = load_dataset(dataset)
  File "main.py", line 124, in load_dataset
    color_labels = utils.batch_transform(labels, label_to_rgb)
  File "D:\Projects\ENetProjects\PyTorch-ENet-master\utils.py", line 21, in batch_transform
    transf_slices = [transform(tensor) for tensor in torch.unbind(batch)]
  File "D:\Projects\ENetProjects\PyTorch-ENet-master\utils.py", line 21, in <listcomp>
    transf_slices = [transform(tensor) for tensor in torch.unbind(batch)]
  File "D:\Applications\anaconda3\lib\site-packages\torchvision\transforms\transforms.py", line 61, in __call__
    img = t(img)
  File "D:\Projects\ENetProjects\PyTorch-ENet-master\transforms.py", line 92, in __call__
    color_tensor[channel].masked_fill_(mask, color_value)
RuntimeError: output with shape [360, 480] doesn't match the broadcast shape [3, 360, 480]

IoU greater than 1: looks like a wrong formulation!

For some weird reason I can get an IoU greater than 1, which does not make sense.
What could be the possible reason for that?

Digging more into the code, I can see that when we call self.conf_metric.value() the total won't be equal to the number of pixels, so I guess something is wrong in the computation process.
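For context, per-class IoU is conventionally derived from the confusion matrix as TP / (TP + FP + FN), which cannot exceed 1 if the matrix is accumulated consistently. A small reference sketch:

```
import numpy as np

# Rows = ground truth, columns = prediction (a common convention; this
# repo's confusionmatrix.py may orient it differently).
conf = np.array([[50.0, 2.0],
                 [3.0, 45.0]])
tp = np.diag(conf)                                # intersection per class
union = conf.sum(axis=0) + conf.sum(axis=1) - tp  # TP + FP + FN
iou = tp / union
print(iou)          # [0.909... 0.9] -- always within [0, 1]
print(iou.mean())   # mean IoU
```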

During inference, how should the unlabeled class be handled?

Hi, thank you for your great work. I've successfully trained my own models on a local dataset. Just one question about the unlabeled class: based on my understanding, the unlabeled class is excluded by setting its weight to 0 during training. But during inference, I found the trained model couldn't predict the unlabeled class at all. It predicted a cyan boundary (like the car class in the attached image) where the cyan color is supposed to be pole and sign-symbol. Not sure if there's anything I've done wrong?
[attached image: results]

How to run trained network on own images?

How would I apply ENet to my own images or videos? I don't see an example in your code. I tried adding images to the test directory and test.txt in CamVid, but they were not picked up during testing.

About the Cityscapes dataset at 1024x512

Hello! Thanks for your code. I trained on Cityscapes for 300 epochs, but the best model came at epoch 210, with an mIoU of 50.00%. My question is: could this be because I used the dataset at its original size rather than 1024x512?

Add License

Hi,

Could you please add a license to your project (like MIT, BSD or Apache2) so others may utilize your code?

Thanks

Error during training

I meet this error when training; I haven't changed any code.
I've tried modifying the dimensions of the labels and outputs, but it still doesn't work.
Looking forward to your reply.

Traceback (most recent call last):
  File "PyTorch-ENet/run.py", line 305, in <module>
    model = train(train_loader, val_loader, w_class, class_encoding)
  File "PyTorch-ENet/run.py", line 192, in train
    epoch_loss, (iou, miou) = train.run_epoch(args.print_step)
  File "PyTorch-ENet\train.py", line 47, in run_epoch
    loss = self.criterion(outputs, labels)
  File "Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "Anaconda3\lib\site-packages\torch\nn\modules\loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "Anaconda3\lib\site-packages\torch\nn\functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "Anaconda3\lib\site-packages\torch\nn\functional.py", line 1792, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: invalid argument 3: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4 at c:\a\w\1\s\windows\pytorch\aten\src\thnn\generic/SpatialClassNLLCriterion.c:59

Training with arbitrary sized images

Hello there,
I am trying to train with arbitrary sized images.
As mentioned by @tcwang0509 in NVIDIA/pix2pixHD#52 (comment) concerning a similar network, the sizes need to be divisible by 32. As far as I noticed, this is also the case for this ENet implementation.

Why is this the case?

How can this be handled?
In Training that is not a problem to me, but when it comes to testing and validating my models, I simply cannot resize or random crop my test / val images, because I would like to compare the results.
For example I am training the model on PascalVOC 2012 with the images being resized to 480x480, but when it comes to testing there are arbitrary sizes.
Is there any way of making such a network compatible with arbitrarily sized images?

I stumbled across this comment meetps/pytorch-semseg#43 (comment) concerning UNet, which insists on using a padding of 1.
But so far I have not had success with ENet. Should padding be added to each of the modules in the encoder and decoder?

Desperately searching for help ;)
Thanks in advance!
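One common workaround, sketched below under the assumption that the network only needs spatial sizes divisible by a fixed multiple (here 32): zero-pad the input up to the next multiple before the forward pass and crop the logits back to the original size. The helper is illustrative, not part of this repo:

```
import torch
import torch.nn.functional as F

def pad_to_multiple(x, multiple=32):
    """Zero-pad an NCHW tensor so H and W are divisible by `multiple`."""
    h, w = x.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # F.pad argument order: (left, right, top, bottom)
    return F.pad(x, (0, pad_w, 0, pad_h)), (h, w)

# Pad before the forward pass, crop the logits back afterwards so the
# predictions stay aligned with the original-size ground truth.
x = torch.randn(1, 3, 375, 500)          # arbitrary size, e.g. PascalVOC
padded, (h, w) = pad_to_multiple(x)
# logits = model(padded)[:, :, :h, :w]   # crop back to the original size
print(padded.shape)                      # torch.Size([1, 3, 384, 512])
```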

utils.batch_transform requires some changes

When I run python main.py -m test --imshow-batch

I get an error

File "/home/sam/Documents/ComputerVision/PyTorch-ENet/utils.py", line 24, in batch_transform
return F.stack(transf_slices)
AttributeError: module 'torch.functional' has no attribute 'stack'
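A hedged note: torch.functional does not expose stack (hence the AttributeError); the top-level torch.stack appears to be what batch_transform intends here:

```
import torch

# torch.stack joins a list of equally-shaped tensors along a new batch
# dimension, which is what batch_transform appears to intend with F.stack.
transf_slices = [torch.randn(3, 360, 480) for _ in range(2)]  # stand-in data
batch = torch.stack(transf_slices)
print(batch.shape)  # torch.Size([2, 3, 360, 480])
```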

Testing mIoU is NaN on the Cityscapes dataset (1024x512)

Hello @davidtvs, I followed the instructions for training on Cityscapes. The validation Mean IoU is 0.5426. Then I tried testing, but I got a NaN result; it says it is because of "Mean of empty slice". Here is the full output.

Testing...

>>>> Running test dataset
/home/my_home/PyTorch-ENet/metric/iou.py:93: RuntimeWarning: Mean of empty slice
  return iou, np.nanmean(iou)
>>>> Avg. loss: 0.0000 | Mean IoU: nan
unlabeled: nan
road: nan
sidewalk: nan
building: nan
wall: nan
fence: nan
pole: nan
traffic_light: nan
traffic_sign: nan
vegetation: nan
terrain: nan
sky: nan
person: nan
rider: nan
car: nan
truck: nan
bus: nan
train: nan
motorcycle: nan
bicycle: nan

Training on Custom Dataset where height/width are not powers of 2

Hi, I'm trying to train on a custom dataset of size 300x300, and errors are thrown in the second DownsamplingBottleneck: when concatenating main and ext, there's a size mismatch.

This is due to the fact that after the first downsampling bottleneck, 300x300 becomes 150x150 for both main and ext.

During the second downsampling bottleneck, main becomes 38, since (75 + 2*1 - 1*(3-1) - 1)/2 + 1 = 74/2 + 1 = 38.

However, ext becomes (75 - 0*1 - 1*(2-1) - 1)/2 + 1 = 73/2 + 1 = 37.5, which is floored to 37.

Do you have any suggestions to get around this? I'm relatively new to pytorch.
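To illustrate the mismatch with a standalone sketch (layer hyperparameters here are illustrative; the repo's DownsamplingBottleneck may assign them to main/ext differently): a strided 3x3 convolution with padding 1 rounds an odd size up, while a 2x2 max-pool floors it:

```
import torch
import torch.nn as nn

x = torch.randn(1, 16, 75, 75)  # feature-map size reached from a 300x300 input

conv = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)  # rounds 75 up to 38
pool = nn.MaxPool2d(kernel_size=2, stride=2)                  # floors 75 down to 37

print(conv(x).shape)  # torch.Size([1, 16, 38, 38])
print(pool(x).shape)  # torch.Size([1, 16, 37, 37])
```

A common workaround is to pick input sizes whose successive halvings stay even (e.g. resize 300 to 288 or 320), or to pad the input as in the sketch shown under the arbitrary-size issue above.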

Details of training on CamVid

Hello @davidtvs, thank you for your work. I have a question about CamVid training. I tried to train from scratch using the CamVid dataset, following this split for training and testing. I evaluated on the validation data and only got an mIoU of about 31% at epoch 1000, using the same input size you mention in your readme. Did you also encounter this problem?
Using your implementation I followed these settings:

  1. 11 classes; the unlabeled class is 0, and every class outside those 11 is also assigned to the background class 0.
  2. The road-marking class is not used; I have checked that other implementations don't use it either.
  3. I used the ENet class-weight initialization.

So in your experiment, did you get around 31% accuracy on the validation data at epochs beyond 500?

ENet returns a tensor with Nans as an output

I tried to train ENet from scratch on my own data, and after several epochs of training ENet started to return tensors with NaN values. Could any part of the model cause this problem? If so, how can we handle it?

Error during testing

Both the model I trained and the one you provide have this problem.

Avg. loss: 0.0000 | Mean IoU: nan
unlabeled: nan
road: nan
sidewalk: nan
building: nan
wall: nan
fence: nan
pole: nan
traffic_light: nan
traffic_sign: nan
vegetation: nan
terrain: nan
sky: nan
person: nan
rider: nan
car: nan
truck: nan
bus: nan
train: nan
motorcycle: nan
bicycle: nan

Test IoU is NaN

Thank you for open-sourcing the wonderful ENet code for us.
When I select the "test" mode, the result looks like this:

Avg. loss: 13.1708 | Mean IoU: nan
unlabeled: nan
road: nan
sidewalk: nan
building: nan
wall: nan
...

All the IoUs are NaN.
However, when I train the ENet model, the mIoU is normal, and I am sure the project uses the same iou.py and confusionmatrix.py.
Could you explain the reason to me? Thank you very much!

A question about the output

Hello! When I test the model, I only get the numbers below. How can I get the segmentation output images?
Numbers:

Running test dataset
Avg. loss: 1.0134 | Mean IoU: 0.5271
sky: 0.8987
building: 0.6927
pole: 0.1963
road: 0.9145
pavement: 0.7308
tree: 0.6252
sign_symbol: 0.1862
fence: 0.1774
car: 0.6926
pedestrian: 0.2986
bicyclist: 0.3851
unlabeled: nan

A question about the UpsamplingBottleneck block

@davidtvs
The official code does not have any activation function in the 1x1 expansion block of the bottleneck blocks.
The paper says that Batch Normalization and PReLU are placed between all convolutions, not after all convolutions.
Below is the official code fragment of the bottleneck block; I cannot find any activation function after the region marked in red.
I hope you can check this problem.
[screenshot of the official Torch bottleneck code]

Activation selection within the bottlenecks in the network

if relu:
    activation = nn.ReLU()
else:
    activation = nn.PReLU()

Does doing this ensure that the PReLU weights are unique for each instance of the activation within the bottlenecks? While tracing this network with torch.jit, it gives errors about weights shared by nn.PReLU layers within the submodules. Perhaps this should be implemented with copy.deepcopy for all instances?

To follow the original paper more closely, the number of channels can be specified for each PReLU instance to learn a weight per channel as shown here.
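A sketch of the second suggestion (not this repo's code): construct a fresh module per use site instead of reusing one instance, so each activation owns its own parameters, optionally with one weight per channel as in the paper:

```
import torch.nn as nn

# One nn.PReLU instance reused in several places shares a single learnable
# weight tensor, which is what torch.jit tracing complains about. A factory
# that returns a new module per call keeps the weights independent.
def make_activation(relu, channels=None):
    if relu:
        return nn.ReLU()
    # num_parameters=channels learns one PReLU weight per channel
    return nn.PReLU(num_parameters=channels) if channels else nn.PReLU()

act1 = make_activation(False, channels=64)
act2 = make_activation(False, channels=64)
assert act1.weight is not act2.weight  # independent parameters
```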

Resize of Label with categorical classes

I was trying to run the code, and I found that in the label transforms you use Resize when training ENet. How can this function be used when the default interpolation is BILINEAR? That would introduce new category values in the image: for example, if you have labels 4 and 6 and resize to a smaller size, interpolation could produce a value of 5, which is actually another category. Am I missing something about the preprocessing of the data?

Thanks for the help.
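For label maps, the usual fix (a sketch; file names are placeholders, and newer torchvision versions spell the constant InterpolationMode.NEAREST) is to resize with nearest-neighbor interpolation, which copies existing class ids instead of averaging them:

```
from PIL import Image
from torchvision import transforms

# NEAREST copies a neighbouring pixel's class id verbatim, so resizing a
# label containing only {4, 6} can never produce a 5, unlike BILINEAR.
label_resize = transforms.Resize((360, 480), interpolation=Image.NEAREST)

label = Image.open("label.png")   # placeholder class-index label image
label_small = label_resize(label)  # still contains only the original ids
```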

Custom dataset

I was wondering what the solution is if we want to train it on our own dataset?
Thank you in advance.

Training with two classes

When I try to train the model with only two classes, let's say 'road' and 'unlabeled', the IoU remains 1 all the time. Is there a bug, or is it something that I need to configure differently?

How to get the test results on the Cityscapes dataset?

I trained on the Cityscapes dataset using the command below, and got the Mean IoU on the validation set.

python main.py -m train --save-dir save/folder/ --name model_name --dataset name --dataset-dir path/root_directory/

Then I wanted to get the test results on the Cityscapes dataset, so I ran the testing command, but I hit a problem.

python main.py -m test --save-dir save/folder/ --name model_name --dataset name --dataset-dir path/root_directory/

THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
*** Error in `python': free(): invalid pointer: 0x00007fbf14a51d40 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fbf92c6b7e5]
...

Can you help me? Thank you @davidtvs

Question about the params and FLOPs of ENet

Has anyone reproduced ENet? Why are the params and GFLOPs of my reproduced network about 10x and 4x larger, respectively, than the values mentioned in the original paper (Table 3)?
My calculated values: params: 3.5 million, GFLOPs: 16.9

Test on video

Hello!
Thank you for your code. The architecture works great on images, but when I tried to run it on videos for real-time semantic segmentation, the result was unsatisfactory. I used OpenCV to read and write the video. Do you have any ideas how to solve the issue, or how to get a better result when running the model on videos?
Thanks!

Working in real time

Hello. How can I run the model in real time, so that the images are taken from a camera?
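A rough webcam loop for this kind of model (a sketch only: the import path, class count, and color palette are assumptions, and trained weights still need to be loaded):

```
import cv2
import numpy as np
import torch

from models.enet import ENet  # model definition shipped in this repo

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ENet(12).to(device).eval()  # load trained weights here

palette = np.random.randint(0, 255, (12, 3), dtype=np.uint8)  # placeholder colors

cap = cv2.VideoCapture(0)  # 0 = default camera
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(cv2.resize(frame, (480, 360)), cv2.COLOR_BGR2RGB)
        x = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).unsqueeze(0).to(device)
        pred = model(x).argmax(dim=1).squeeze(0).cpu().numpy()  # (H, W) class ids
        overlay = np.ascontiguousarray(palette[pred][:, :, ::-1])  # RGB -> BGR
        cv2.imshow("segmentation", overlay)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```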

mean of empty slice

Hi, thanks for sharing your work.
I'm trying to test your network with the Cityscapes dataset, but I get a "mean of empty slice" error.

Problems while using different dataset

Hello, I adapted the code into a .ipynb so I could run isolated cells and check how the pipeline works; I'm trying to evaluate ENet's performance at learning and predicting underwater images (the SUIM dataset), but I'm facing some problems.

When I run the training cell, it throws the following error: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4. Have you faced a similar issue before? Any help would be greatly appreciated!

The code follows below, with minor changes:

import torch.nn.functional as F
import random
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms.functional as TF
import torchvision
import torchvision.transforms as transforms
import transforms as ext_transforms
import torch.optim.lr_scheduler as lr_scheduler
import os
import numpy as np
import matplotlib.pyplot as plt
import glob
from collections import OrderedDict
from torch.utils.data import Dataset, DataLoader

# ENET PYTORCH GITHUB LIBS
import utils
import tools
from PIL import Image
from enet import ENet
from iou import IoU
from train import Train
from test import Test

# Configuring images size
std_size = 256

# Setting device for torch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

image_transform = transforms.Compose(
    [transforms.CenterCrop((std_size, std_size)),
      transforms.ToTensor()])

label_transform = transforms.Compose([
    transforms.CenterCrop((std_size, std_size)),
    ext_transforms.PILToLongTensor()])

root_dir = '/content/drive/My Drive/Colab Notebooks'
save_dir = '/content/drive/My Drive/Colab Notebooks/save'

# Training dataset root folders
train_folder = os.path.normpath(root_dir + '/train/images')
train_lbl_folder = os.path.normpath(root_dir + '/train/masks')

# Validation dataset root folders
val_folder = os.path.normpath(root_dir + '/val/images')
val_lbl_folder = os.path.normpath(root_dir + '/val/masks')

# Test dataset root folders
test_folder = os.path.normpath(root_dir + '/test/images')
test_lbl_folder = os.path.normpath(root_dir + '/test/masks')

class CustomDataset(Dataset):
    """Custom Dataset based on CamVid dataset found on:
      https://github.com/davidtvs/PyTorch-ENet.
    
    Student disclaimer: most parts of this code were used and adapted
    for academic purposes only, with no commercial intents. All rights
    reserved to original author. Please refer to the url cited above.

    Keyword arguments:
    - root_dir (``string``): Root directory path.
    - mode (``string``): The type of dataset: 'train' for training set, 'val'
    for validation set, and 'test' for test set.
    - transform (``callable``, optional): A function/transform that  takes in
    an PIL image and returns a transformed version. Default: None.
    - label_transform (``callable``, optional): A function/transform that takes
    in the target and transforms it. Default: None.
    - loader (``callable``, optional): A function to load an image given its
    path. By default ``default_loader`` is used.

    """

    img_extension = '.jpg'
    label_extension = '.bmp'

    color_encoding = OrderedDict([
        ('Background', (0,0,0)),
        ('Human Divers', (0,0,255)),
        ('Aquatic Plants and Sea-Grass', (0,255,0)),
        ('Wrecks and Ruins', (0,255,255)),
        ('Robots', (255,0,0)),
        ('Reefs and Intertebrates', (255,0,255)),
        ('Fishs and Vertebrates', (255,255,0)),
        ('Sea-Floor and Rocks', (255,255,255))
    ])

    def __init__(self, mode = 'train', transform=None, 
                 label_transform = None, loader = tools.pil_loader):
        self.mode = mode
        self.transform = transform
        self.label_transform = label_transform
        self.loader = loader
    
        if self.mode.lower() == 'train':
            # Get the training data and labels filepaths
            self.train_data = tools.get_files(
                train_folder, extension_filter=self.img_extension)

            self.train_labels = tools.get_files(
                train_lbl_folder, extension_filter=self.label_extension)
            
        elif self.mode.lower() == 'val':
            # Get the validation data and labels filepaths
            self.val_data = tools.get_files(
                val_folder, extension_filter=self.img_extension)

            self.val_labels = tools.get_files(
                val_lbl_folder, extension_filter=self.label_extension)
            
        elif self.mode.lower() == 'test':
            # Get the test data and labels filepaths
            self.test_data = tools.get_files(
                test_folder, extension_filter=self.img_extension)

            self.test_labels = tools.get_files(
                test_lbl_folder, extension_filter=self.label_extension)
            
        else:
            raise RuntimeError("Unexpected dataset mode. "
                               "Supported modes are: train, val and test")

    def __getitem__(self, index):

        """
        Args:
        - index (``int``): index of the item in the dataset

        Returns:
        A tuple of ``PIL.Image`` (image, label) where label is the ground-truth
        of the image.

        """
        if self.mode.lower() == 'train':
            data_path, label_path = self.train_data[index], self.train_labels[
                index]
        elif self.mode.lower() == 'val':
            data_path, label_path = self.val_data[index], self.val_labels[
                index]
        elif self.mode.lower() == 'test':
            data_path, label_path = self.test_data[index], self.test_labels[
                index]
        else:
            raise RuntimeError("Unexpected dataset mode. "
                               "Supported modes are: train, val and test")

        img, label = self.loader(data_path, label_path)

        if self.transform is not None:
            img = self.transform(img)

        if self.label_transform is not None:
            label = self.label_transform(label)

        return img, label
        
    
    def __len__(self):
        """Returns the length of the dataset."""
        if self.mode.lower() == 'train':
            return len(self.train_data)
        elif self.mode.lower() == 'val':
            return len(self.val_data)
        elif self.mode.lower() == 'test':
            return len(self.test_data)
        else:
            raise RuntimeError("Unexpected dataset mode. "
                               "Supported modes are: train, val and test")

# Setting Dataloader variables
mode = input('SELECT MODE OF OPERATION: train, val or test: ')
batch_size = 4
num_workers = 0

# Load the training set as tensors
train_set = CustomDataset(
    transform=image_transform,
    label_transform=label_transform)
train_loader = DataLoader(
    train_set,
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers)

# Load the validation set as tensors
val_set = CustomDataset(
    mode='val',
    transform=image_transform,
    label_transform=label_transform)
val_loader = DataLoader(
    val_set,
    batch_size=batch_size,
    shuffle=False,
    num_workers=num_workers)

# Load the test set as tensors
test_set = CustomDataset(
    mode='test',
    transform=image_transform,
    label_transform=label_transform)
test_loader = DataLoader(
    test_set,
    batch_size=batch_size,
    shuffle=False,
    num_workers=num_workers)

# Retrieving color_encoding
class_encoding = train_set.color_encoding

# Get number of classes to predict
num_classes = len(class_encoding)

# Print information for debugging
print("Number of classes to predict:", num_classes)
print("Train dataset size:", len(train_set))
print("Validation dataset size:", len(val_set))

# Get class weights from the selected weighing technique
weighing = 'enet'
ignore_unlabeled = False

print("\nWeighing technique:", weighing)
print("Computing class weights...")
print("(this can take a while depending on the dataset size)")

class_weights = 0

if weighing.lower() == 'enet':
    class_weights = tools.enet_weighing(train_loader, num_classes)
elif weighing.lower() == 'mfb':
    class_weights = tools.median_freq_balancing(train_loader, num_classes)
else:
    class_weights = None

if class_weights is not None:
    class_weights = torch.from_numpy(class_weights).float().to(device)
    # Set the weight of the unlabeled class to 0
    if ignore_unlabeled:
        ignore_index = list(class_encoding).index('unlabeled')
        class_weights[ignore_index] = 0

print("Class weights:", class_weights)

class_weights = None

learning_rate = 0.05
weight_decay = 0.1
lr_decay_epochs = 10
lr_decay = 0.1

# Initialize ENet
model = ENet(num_classes).to(device)
# Check if the network architecture is correct
# print(model)

# We are going to use the CrossEntropyLoss loss function as it's most
# frequently used in classification problems with multiple classes, which
# fits the problem. This criterion combines LogSoftMax and NLLLoss.
criterion = nn.CrossEntropyLoss(weight=class_weights)

# ENet authors used Adam as the optimizer
optimizer = optim.Adam(
    model.parameters(),
    lr=learning_rate,
    weight_decay=weight_decay)

# Learning rate decay scheduler
lr_updater = lr_scheduler.StepLR(optimizer, lr_decay_epochs,
                                  lr_decay)

# Evaluation metric
metric = IoU(num_classes, ignore_index=False)

# Optionally resume from a checkpoint
resume = True
resume = False
name = 'test'

if resume:
    model, optimizer, start_epoch, best_miou = utils.load_checkpoint(
        model, optimizer, save_dir, name)
    print("Resuming from model: Start epoch = {0} "
          "| Best mean IoU = {1:.4f}".format(start_epoch, best_miou))
else:
    start_epoch = 0
    best_miou = 0

epochs = 10

train = Train(model, train_loader, optimizer, criterion, metric, device)
val = Test(model, val_loader, criterion, metric, device)
for epoch in range(start_epoch, epochs):
    print(">>>> [Epoch: {0:d}] Training".format(epoch))

    lr_updater.step()
    epoch_loss, (iou, miou) = train.run_epoch(True)

    print(">>>> [Epoch: {0:d}] Avg. loss: {1:.4f} | Mean IoU: {2:.4f}".
          format(epoch, epoch_loss, miou))

    if (epoch + 1) % 10 == 0 or epoch + 1 == epochs:
        print(">>>> [Epoch: {0:d}] Validation".format(epoch))

        loss, (iou, miou) = val.run_epoch(True)

        print(">>>> [Epoch: {0:d}] Avg. loss: {1:.4f} | Mean IoU: {2:.4f}".
              format(epoch, loss, miou))

        # Print per class IoU on last epoch or if best iou
        if epoch + 1 == epochs or miou > best_miou:
            for key, class_iou in zip(class_encoding.keys(), iou):
                print("{0}: {1:.4f}".format(key, class_iou))

        # Save the model if it's the best thus far
        if miou > best_miou:
            print("\nBest model thus far. Saving...\n")
            best_miou = miou
            #utils.save_checkpoint(model, optimizer, epoch + 1, best_miou)

About Training

When I want to train your ENet on my device, I run the following command:

python main.py -m train --save-dir ./camvid_model/ --name ENet --dataset camvid --dataset-dir CamVid/ --with-unlabeled --imshow-batch

But I meet the following problem:

Traceback (most recent call last):
  File "main.py", line 291, in <module>
    loaders, w_class, class_encoding = load_dataset(dataset)
  File "main.py", line 110, in load_dataset
    color_labels = utils.batch_transform(labels, label_to_rgb)
  File "/home/amax/linrui/PyTorch-ENet-master/utils.py", line 21, in batch_transform
    transf_slices = [transform(tensor) for tensor in torch.unbind(batch)]
  File "/usr/local/lib/python2.7/dist-packages/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/home/amax/linrui/PyTorch-ENet-master/transforms.py", line 91, in __call__
    color_tensor[channel].masked_fill_(mask, color_value)
RuntimeError: expand(torch.ByteTensor{[3, 360, 480]}, size=[360, 480]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (3)

The label size is [B, 3, H, W]. How do I translate it to [B, num_classes, H, W]? I didn't find this in your code.
Thank you
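A hedged note on the last question: nn.CrossEntropyLoss does not take one-hot [B, num_classes, H, W] targets; it expects class indices of shape [B, H, W]. So RGB labels of shape [B, 3, H, W] need to be mapped to class indices, roughly like this (the helper and argument names are illustrative):

```
import numpy as np
import torch

def rgb_to_class_index(label_rgb, color_encoding):
    """Map an RGB label image (H, W, 3) to an (H, W) LongTensor of class ids.

    `color_encoding` is an OrderedDict of name -> (R, G, B), as used by the
    dataset classes in this repo. This is a sketch; exact helper names are
    assumptions, not part of the repo's API.
    """
    label_rgb = np.asarray(label_rgb)
    index_map = np.zeros(label_rgb.shape[:2], dtype=np.int64)
    for class_id, color in enumerate(color_encoding.values()):
        mask = np.all(label_rgb == np.asarray(color), axis=-1)
        index_map[mask] = class_id
    return torch.from_numpy(index_map)  # (H, W); batching yields (B, H, W)
```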

save all the images

Hello, thanks for your nice work.
I have a question: how can I save all of the test segmented images, not just one batch of them? I have checked the predict and imshow_batch functions but didn't find the answer.

Many thanks

Single picture test time

Hi, thank you for sharing your work.
I trained on my own dataset of 2,456 images.
I found a weird problem when testing with a single image: models saved at different epochs have different test times in the same CPU test environment.
For example, for a single picture the model saved at epoch 50 takes 0.2s, while the one from epoch 100 takes 0.9s, even though the model size is the same 4.5MB.
I am looking forward to your reply, thank you.
