Failed to reproduce the results on VOC 2012 dataset

Hi, VainF. Thanks for sharing this nice repo; the code is very readable and practical. However, I failed to reproduce the results on the VOC dataset.

I trained deeplabv3plus-resnet101 (OS 16, with the provided pre-trained ResNet101 weights) on the VOC 2012_aug dataset with all other settings at their defaults, changing only gpu_id to '0,1,2,3', because I couldn't fit a batch size of 16 on a single 2080 Ti. I also applied SyncBN to avoid the performance drop caused by multi-GPU training. The best mIoU was 0.7539. A friend then trained the same model on a single TITAN RTX without SyncBN, and his best mIoU was 0.7535. So I think multi-GPU training with SyncBN is not the problem.
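For reference, a minimal sketch of the SyncBN conversion mentioned above, using PyTorch's built-in `torch.nn.SyncBatchNorm.convert_sync_batchnorm`. It only swaps the BatchNorm layers; statistics are actually synchronized across GPUs only when the model runs under `torch.distributed` with `DistributedDataParallel`, which this sketch does not set up. The helper name `to_sync_bn` is hypothetical.

```python
import torch.nn as nn


def to_sync_bn(model: nn.Module) -> nn.Module:
    """Replace every BatchNorm*d layer with SyncBatchNorm (hypothetical helper).

    Synchronization only happens once the model is wrapped in
    DistributedDataParallel inside an initialized process group.
    """
    return nn.SyncBatchNorm.convert_sync_batchnorm(model)


if __name__ == "__main__":
    net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1, bias=False), nn.BatchNorm2d(8), nn.ReLU())
    net = to_sync_bn(net)
    print(type(net[1]).__name__)  # SyncBatchNorm
```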

Did you also apply multi-scale inference during validation? Do I need to change any settings to reach 0.783 on the VOC 2012_aug dataset?

Looking forward to your reply and suggestions. Thank you again for your effort.

Only supports a single GPU?

I couldn't find 'Parallel' or 'parallel' anywhere in the code, so I assume it only supports a single GPU, right?
Then how did you fit 16 images on one GPU when training on Cityscapes?
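A minimal sketch of single-process multi-GPU training with `torch.nn.DataParallel`, which splits each batch along the first dimension across the visible GPUs (a batch of 16 on four GPUs puts four images on each). A later traceback in this thread passes through `torch/nn/parallel`, which suggests the training script wraps the model this way, but the helper `wrap_for_gpus` below is hypothetical. Without CUDA, `DataParallel` simply calls the wrapped module.

```python
import torch
import torch.nn as nn


def wrap_for_gpus(model: nn.Module, gpu_ids=None) -> nn.Module:
    """Wrap a model so each forward pass scatters the batch across GPUs."""
    if torch.cuda.is_available():
        # DataParallel requires the module to live on the first listed device.
        first = gpu_ids[0] if gpu_ids else 0
        model = model.to(torch.device("cuda", first))
    return nn.DataParallel(model, device_ids=gpu_ids)


if __name__ == "__main__":
    net = wrap_for_gpus(nn.Conv2d(3, 4, 1))
    x = torch.randn(16, 3, 8, 8)
    if torch.cuda.is_available():
        x = x.cuda()
    print(tuple(net(x).shape))  # (16, 4, 8, 8)
```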

Thanks for your effort!

--year 2012_aug

Hi VainF,

I am able to train with --year 2012 using the following command:

python --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2012 --crop_val --lr 0.01 --crop_size 513 --batch_size 14 --output_stride 16 --continue_training

But when I try to train with --year 2012_aug, I get the following error:

Setting up a new session...
Device: cuda
Dataset: voc, Train set: 10582, Val set: 1449
[!] Retrain
Traceback (most recent call last):
  File "", line 390, in <module>
  File "", line 335, in main
    for (images, labels) in train_loader:
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/", line 521, in __next__
    data = self._next_data()
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/", line 1203, in _next_data
    return self._process_data(data)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/", line 1229, in _process_data
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/", line 425, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/_utils/", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/_utils/", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/_utils/", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/datasets/", line 145, in __getitem__
    target =[index])
  File "/home/paul/segmentation/lib/python3.6/site-packages/PIL/", line 2912, in open
    fp =, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './datasets/data/VOCdevkit/VOC2012/SegmentationClassAug/2008_002913.png'

My ./datasets/data/VOCdevkit/VOC2012/SegmentationClassAug directory contains the train_aug.txt file. What am I missing? Please help. Thanks a lot.

P.S. I did check that 2008_002913.png exists under ./datasets/data/VOCdevkit/VOC2012/JPEGImages.
So do I need to copy all the .png files to ./datasets/data/VOCdevkit/VOC2012/SegmentationClassAug, or what should I do to fix this problem? Thanks for your help.

Edit: after following the instructions to download the labels from Dropbox and extracting them to ./datasets/data/VOCdevkit/VOC2012/SegmentationClassAug, everything works as expected.
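A small pre-flight check can catch this before training starts: read the split file and list the IDs whose mask PNG is missing from SegmentationClassAug. The directory layout follows the traceback above; the split file is passed in explicitly because its location may differ between setups, and one image ID per line is assumed. `find_missing_masks` is a hypothetical helper, not part of the repo.

```python
import os


def find_missing_masks(split_file, mask_dir, ext=".png"):
    """Return the image IDs listed in split_file that have no mask file in mask_dir."""
    with open(split_file) as f:
        ids = [line.strip() for line in f if line.strip()]  # assumes one ID per line
    return [i for i in ids if not os.path.isfile(os.path.join(mask_dir, i + ext))]


if __name__ == "__main__":
    root = "./datasets/data/VOCdevkit/VOC2012"
    missing = find_missing_masks(os.path.join(root, "SegmentationClassAug", "train_aug.txt"),
                                 os.path.join(root, "SegmentationClassAug"))
    print(f"{len(missing)} masks missing", missing[:5])
```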

Pre-trained models

Hello, could you provide a download link for the pre-trained models that is accessible from China, such as Baidu Cloud?

Deeplab code not implemented

Why does every model constructor end up at this line calling DeepLabV3, when DeepLabV3 appears not to be implemented, as shown below?

if name=='deeplabv3plus':
    return_layers = {'high_level_features': 'out', 'low_level_features': 'low_level'}
    classifier = DeepLabHeadV3Plus(inplanes, low_level_planes, num_classes, aspp_dilate)
elif name=='deeplabv3':
    return_layers = {'high_level_features': 'out'}
    classifier = DeepLabHead(inplanes , num_classes, aspp_dilate)
backbone = IntermediateLayerGetter(backbone, return_layers=return_layers)

model = DeepLabV3(backbone, classifier)
return model
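DeepLabV3 does not need its own `forward` if it inherits one. The traceback in a later issue in this thread shows a model `forward` in `network/` (line 16) calling `self.classifier(features)`, which points to a shared base class. The sketch below mirrors that pattern, which torchvision's segmentation models also use: the base class runs the backbone, then the classifier head, then upsamples to the input size, and DeepLabV3 merely subclasses it. The class names follow the snippet above; the bodies are an assumption, not a copy of the repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSegmentationModel(nn.Module):
    """Backbone -> classifier head -> bilinear upsampling to the input size."""

    def __init__(self, backbone, classifier):
        super().__init__()
        self.backbone = backbone      # e.g. IntermediateLayerGetter returning a dict of features
        self.classifier = classifier  # e.g. DeepLabHead / DeepLabHeadV3Plus consuming that dict

    def forward(self, x):
        input_shape = x.shape[-2:]
        features = self.backbone(x)
        x = self.classifier(features)
        return F.interpolate(x, size=input_shape, mode="bilinear", align_corners=False)


class DeepLabV3(SimpleSegmentationModel):
    """All behaviour is inherited; the subclass only gives the model its name."""
```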


IntermediateLayerGetter parameters

Thanks for this implementation.

Do you think it is safe to take the backbone's parameters after it has been passed through IntermediateLayerGetter, as done in here?

It seems that calling backbone.parameters() retrieves only a few parameters, not all of the backbone's parameters as one would expect.
See here for an example using ResNet.

Cannot unzip the file DeepLabV3Plus-ResNet101


I cannot unzip the file best_deeplabv3plus_resnet101_cityscapes_os16.pth.tar.

Is the file damaged, or which tool should I use to unzip it?
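The `.pth.tar` suffix is a PyTorch naming convention; such checkpoints are normally read directly with `torch.load` rather than extracted with an archive tool. A minimal sketch, assuming the weights sit under a `"model_state"` key (an assumption about this repo's checkpoint layout; the code falls back to treating the whole file as a state dict):

```python
import torch


def load_state_dict_from_ckpt(path):
    """Read a checkpoint with torch.load (no unzip/untar step) and return the weights."""
    checkpoint = torch.load(path, map_location="cpu")
    # Assumption: weights are stored under "model_state"; otherwise return the object as-is.
    if isinstance(checkpoint, dict) and "model_state" in checkpoint:
        return checkpoint["model_state"]
    return checkpoint


# Usage (model built with the matching architecture and num_classes):
# model.load_state_dict(load_state_dict_from_ckpt("best_deeplabv3plus_resnet101_cityscapes_os16.pth.tar"))
```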

Thank you!

Best regards

Nice Repo!

This repo is really nice: the PASCAL VOC performance can be reproduced using 2 GPUs with a batch size of 16.

pth -> onnx

How do I convert a trained .pth segmentation model to ONNX? The network is deeplabv3plus_resnet101.
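A minimal sketch with `torch.onnx.export`. It assumes the model has already been built with the same constructor and `num_classes` used for training and has its trained weights loaded; the 513×513 dummy input (matching the `--crop_size` used elsewhere in this thread), opset 11, and the tensor names are assumptions. The export fixes the input size; other sizes would need `dynamic_axes` or a re-export.

```python
import torch


def make_dummy_input(height=513, width=513):
    """Random tensor in the model's input layout (N, 3, H, W), used only for tracing."""
    return torch.randn(1, 3, height, width)


def export_to_onnx(model, onnx_path, height=513, width=513, opset=11):
    """Trace `model` once and write an ONNX graph with a fixed input size."""
    model.eval()  # use BatchNorm running statistics and disable dropout while tracing
    with torch.no_grad():
        torch.onnx.export(
            model,
            make_dummy_input(height, width),
            onnx_path,
            input_names=["image"],
            output_names=["logits"],
            opset_version=opset,
        )


# Usage (after building deeplabv3plus_resnet101 and loading its trained weights):
# export_to_onnx(model, "deeplabv3plus_resnet101.onnx")
```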



Question about the training data

Hi all, I recently ran into a problem that confuses me a lot. I'm using DeepLabV3+ to train a 5-class segmentation model covering forest, ground, sky, runway asphalt, and runway lane, with 3100 images and their corresponding labels. By mistake, I swapped label indices 1 and 2 in every label from the 1706th onward, and trained the network on that data. Surprisingly, the segmentation was better than before. After I noticed the mistake and corrected the swapped indices, the results became bad. Do you know what could cause this? Thank you in advance.

Cityscapes training on full-resolution images


Thanks for this wonderful repo!

I would like to ask whether you have trained on full-resolution Cityscapes images using the DeepLabV3+ MobileNet model provided in this repo.

Questions about evaluating on the Cityscapes dataset

Thanks for your great work. I'm wondering how you evaluate on the Cityscapes dataset. After reading your code, it seems you trained the model on 512x512 inputs and evaluate directly on the original image size (1024 x 2048):

if opts.crop_val:
    val_transform = et.ExtCompose([
        et.ExtResize(opts.crop_size),  # a resize to crop_size, not a random crop
        et.ExtToTensor(),              # convert image/label to tensors before normalizing
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])
else:
    val_transform = et.ExtCompose([
        et.ExtToTensor(),
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])

Why is it valid to evaluate the same model on a different input image size? Thanks.
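This works because the network is fully convolutional: no layer fixes the spatial size, and the ASPP image-pooling branch uses adaptive pooling followed by upsampling back to the feature size, so a model trained on crops can run on full-size frames. A toy illustration (not the repo's model) with a stride-4 conv, a global-pooling branch, and final upsampling to the input size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFCN(nn.Module):
    """Toy fully convolutional segmenter: the output size always matches the input size."""

    def __init__(self, num_classes=19):
        super().__init__()
        self.encoder = nn.Conv2d(3, 16, 3, stride=4, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)  # image-level context for any input size
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        size = x.shape[-2:]
        f = F.relu(self.encoder(x))
        g = F.interpolate(self.pool(f), size=f.shape[-2:], mode="bilinear", align_corners=False)
        logits = self.head(torch.cat([f, g], dim=1))
        return F.interpolate(logits, size=size, mode="bilinear", align_corners=False)


model = TinyFCN().eval()
with torch.no_grad():
    for h, w in [(256, 256), (512, 1024)]:
        print(tuple(model(torch.randn(1, 3, h, w)).shape))  # (1, 19, h, w)
```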

[!] Retrain

Why does it print "[!] Retrain"?
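In the logs quoted in this thread the message appears when no checkpoint is being resumed (for example the --continue_training command above passes no --ckpt). The sketch below shows the kind of branch that would print it, assuming the script prints "[!] Retrain" when --ckpt is not given or does not point to an existing file; `startup_message` and the "Model restored" text are hypothetical. Read this way, the message is informational: training starts fresh instead of resuming.

```python
import os


def startup_message(ckpt_path):
    """Hypothetical mirror of the resume-or-retrain decision made at startup."""
    if ckpt_path is not None and os.path.isfile(ckpt_path):
        return f"Model restored from {ckpt_path}"
    return "[!] Retrain"


print(startup_message(None))  # [!] Retrain
```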

TypeError: the JSON object must be str, bytes or bytearray, not bool

I'm working with the Cityscapes dataset, but training fails with this error:
Traceback (most recent call last):
  File "", line 388, in <module>
  File "", line 217, in main
    vis = Visualizer(port=opts.vis_port,
  File "I:\DeepLabV3Plus-Pytorch-master\utils\", line 14, in __init__
    ori_win = json.loads(ori_win)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not bool
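`json.loads` received `False` instead of a JSON string, which suggests the window data requested from the Visdom server came back as a boolean, for example when no Visdom server is listening on --vis_port (an assumption; the traceback alone does not confirm the cause). Starting the server first (`python -m visdom.server -port <port>`) or training without --enable_vis are the usual things to try. A defensive parse, as a hypothetical replacement for the bare `json.loads` call:

```python
import json


def parse_window_data(raw):
    """Return window data as a dict; treat empty or non-string replies (e.g. False) as no data."""
    if isinstance(raw, (str, bytes, bytearray)) and raw:
        return json.loads(raw)
    return {}


print(parse_window_data(False))       # {}
print(parse_window_data('{"a": 1}'))  # {'a': 1}
```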

How can I find the mIoU score?

Hello, I would like to know whether you started training from scratch without loading any weights, and for how many epochs you trained.
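For the question in the title: mIoU is usually computed from a confusion matrix accumulated over the whole validation set, with per-class IoU = TP / (TP + FP + FN), averaged over classes. A minimal NumPy sketch; the function names are hypothetical, and pixels labelled with the ignore index 255 are skipped, as is common for VOC and Cityscapes.

```python
import numpy as np


def update_confusion(conf, label, pred, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes matrix (rows = ground truth, cols = prediction)."""
    mask = label != ignore_index
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf


def mean_iou(conf):
    tp = np.diag(conf)
    denom = conf.sum(axis=0) + conf.sum(axis=1) - tp  # TP + FP + FN per class
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / denom
    return np.nanmean(iou)  # classes absent from both labels and predictions are skipped


conf = np.zeros((3, 3), dtype=np.int64)
label = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
conf = update_confusion(conf, label, pred, 3)
print(round(mean_iou(conf), 4))  # 0.7222
```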

Reproduce issue.

With this code's default training settings, I trained the "deeplabv3plus_resnet101" model on VOC 2012.
The best mIoU I can get is 0.763, whereas the provided model scores 0.783.

Question about padding in Mobilenetv2

Dear VainF,

self.input_padding = fixed_padding( 3, dilation )

x_pad = F.pad(x, self.input_padding)

These two lines differ from the original MobileNetV2. Could you please explain why you implemented the padding this way, and what the consequences of removing these lines would be?
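A likely reason (an assumption, since only the two call sites are shown) is TensorFlow-style "SAME" padding for dilated convolutions: the conv layer is built with padding=0, and the input is padded explicitly by the effective kernel size k + (k − 1)(d − 1) minus one, split between the two sides. That keeps the spatial size unchanged at stride 1 for any dilation d; removing the `F.pad` call without adding padding to the conv would shrink every feature map by that amount. A sketch of such a helper, returning the tuple in the (left, right, top, bottom) order `F.pad` expects:

```python
import torch
import torch.nn.functional as F


def fixed_padding(kernel_size, dilation):
    """Symmetric 'SAME'-style padding for a (possibly dilated) square kernel."""
    effective = kernel_size + (kernel_size - 1) * (dilation - 1)
    pad_total = effective - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return (pad_beg, pad_end, pad_beg, pad_end)  # F.pad order: left, right, top, bottom


x = torch.randn(1, 8, 33, 33)
for d in (1, 2, 4):
    # Depthwise 3x3 conv with no built-in padding, as in the padded MobileNetV2 blocks
    conv = torch.nn.Conv2d(8, 8, 3, padding=0, dilation=d, groups=8)
    y = conv(F.pad(x, fixed_padding(3, d)))
    print(d, fixed_padding(3, d), tuple(y.shape[-2:]))  # spatial size stays (33, 33)
```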

Thank you very much.

With kind regards.


Why is this nn.AdaptiveAvgPool2d(1) done here?

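import torch.nn as nn
import torch.nn.functional as F


class ASPPPooling(nn.Sequential):

    def __init__(self, in_channels, out_channels):
        super(ASPPPooling, self).__init__(
            nn.AdaptiveAvgPool2d(1),  # global average pooling: H x W -> 1 x 1
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True))

    def forward(self, x):
        size = x.shape[-2:]
        x = super(ASPPPooling, self).forward(x)
        # Upsample the pooled features back to the input's spatial size
        return F.interpolate(x, size=size, mode='bilinear', align_corners=False)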

I am working on a segmentation task, and the pooling above changes my feature map from torch.Size([1, 256, 16, 16]) to torch.Size([1, 256, 1, 1]),
giving the error
"Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])"

What could have gone wrong?
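The 1×1 output is intended: this is the image-level pooling branch of ASPP, and `forward` upsamples the result back to the input size. The error comes from the BatchNorm after the pooling. In training mode BatchNorm needs more than one value per channel to estimate batch statistics, and after global pooling each channel holds exactly batch_size values, so a batch of one fails. A batch size of at least 2 during training (including the last batch of an epoch), or `model.eval()` for inference, avoids it. A self-contained check, assuming the branch layout AdaptiveAvgPool2d → 1×1 Conv → BatchNorm → ReLU:

```python
import torch
import torch.nn as nn

# Image-level pooling branch (assumed layout): pool -> 1x1 conv -> BN -> ReLU
branch = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(256, 256, 1, bias=False),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
)


def try_forward(module, batch_size, training):
    """Run one forward pass; return the output shape or the BatchNorm error message."""
    module.train(training)
    try:
        return tuple(module(torch.randn(batch_size, 256, 16, 16)).shape)
    except ValueError as err:  # raised by BatchNorm when it sees a single value per channel
        return str(err)


print(try_forward(branch, 1, training=True))   # Expected more than 1 value per channel ...
print(try_forward(branch, 2, training=True))   # (2, 256, 1, 1)
print(try_forward(branch, 1, training=False))  # (1, 256, 1, 1): eval mode uses running stats
```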

Testing question


Train DeeplabV3Plus-MobileNetV3 for Road Only Segmentation

Hi there,

I am trying to get a model that segments only the road, using this backbone:

'tf_mobilenetv3_small_075': {
        'imagenet': ''

I need to run it on a Raspberry Pi with an OAK-D camera, so it must be reasonably fast on these edge devices.

Could you please provide such a trained model, or show me how to train one?
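One common route (a sketch, not something this repo ships) is to treat it as two-class segmentation: remap each Cityscapes label map so road becomes 1 and everything else 0, keep the ignore value, and train with num_classes=2. The sketch assumes labels are already in Cityscapes train-ID form, where road has train ID 0 and ignored pixels are 255; the choice of MobileNetV3 backbone is independent of this step.

```python
import numpy as np

ROAD_TRAIN_ID = 0   # Cityscapes trainId for "road"
IGNORE_INDEX = 255  # pixels excluded from the loss


def to_road_mask(train_ids: np.ndarray) -> np.ndarray:
    """Map a Cityscapes trainId label map to {0: background, 1: road, 255: ignore}."""
    out = (train_ids == ROAD_TRAIN_ID).astype(np.uint8)
    out[train_ids == IGNORE_INDEX] = IGNORE_INDEX
    return out


label = np.array([[0, 0, 1], [13, 255, 0]], dtype=np.uint8)
print(to_road_mask(label))
```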



Training and Version

Hi, could you please give clear instructions on the changes in and if I want to train on my own dataset? Could you also list the library versions you used?
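For the dataset side, a minimal sketch of a paired image/mask dataset. It assumes masks are single-channel PNGs whose pixel values are class indices (255 = ignore), and that an optional transform takes and returns an (image, mask) pair, in the style of the et.Ext* transforms quoted in the Cityscapes evaluation question above. The class name and the images/ and masks/ folder layout are hypothetical; the model's num_classes must also match the dataset.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class PairedSegDataset(Dataset):
    """Reads <root>/images/<id>.jpg and <root>/masks/<id>.png for each id in `ids`."""

    def __init__(self, root, ids, transform=None):
        self.root, self.ids, self.transform = root, list(ids), transform

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        name = self.ids[index]
        image = Image.open(os.path.join(self.root, "images", name + ".jpg")).convert("RGB")
        mask = Image.open(os.path.join(self.root, "masks", name + ".png"))
        if self.transform is not None:
            return self.transform(image, mask)  # paired transform, e.g. et.ExtCompose
        # Fallback: HWC uint8 image -> CHW float in [0, 1]; mask -> int64 class indices
        image = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
        return image, torch.from_numpy(np.array(mask)).long()
```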

Question about --continue_training

Hello, thanks for your nice work.
I ran into a bug with --continue_training.

python --model deeplabv3plus_mobilenet --dataset cityscapes --gpu_id 6 --lr 0.1 --crop_size 768 --batch_size 12 --output_stride 16 --data_root ./datasets/data/cityscapes --ckpt checkpoints/best_deeplabv3plus_mobilenet_cityscapes_os16.pth --continue_training


Can you fix it?
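The failure itself is not described here, but one thing worth checking when resuming is which entries the checkpoint actually contains, since a checkpoint kept only for evaluation may lack optimizer or scheduler state. A hedged sketch that restores whatever is present and reports what is missing; the key names (model_state, optimizer_state, scheduler_state, cur_itrs) are assumptions about this repo's checkpoint format.

```python
import torch


def resume(ckpt_path, model, optimizer=None, scheduler=None):
    """Restore model/optimizer/scheduler state if present; return (cur_itrs, missing keys)."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    missing = []
    for key, obj in (("optimizer_state", optimizer), ("scheduler_state", scheduler)):
        if obj is None:
            continue
        if key in ckpt:
            obj.load_state_dict(ckpt[key])
        else:
            missing.append(key)  # e.g. an evaluation-only checkpoint
    return ckpt.get("cur_itrs", 0), missing
```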

Low GPU utilization

NVIDIA-SMI 440.33.01 | Driver Version: 440.33.01 | CUDA Version: 10.2

| GPU | Name | Persistence-M | Bus-Id | Disp.A | Volatile Uncorr. ECC | Fan | Temp | Perf | Pwr: Usage/Cap | Memory-Usage | GPU-Util | Compute M. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GeForce RTX 208... | Off | 00000000:01:00.0 | Off | N/A | 36% | 64C | P2 | 93W / 250W | 5556MiB / 11018MiB | 38% | Default |
| 1 | GeForce RTX 208... | Off | 00000000:02:00.0 | Off | N/A | 37% | 64C | P2 | 118W / 250W | 5486MiB / 11019MiB | 32% | Default |

I'm using two RTX 2080 Ti GPUs, and the average utilization is around 35%. I also tried the implementation in SMP, and the utilization is low there as well.
Does anyone else experience this problem, and what might cause it? Thanks.
I'm sure it's not caused by the data loader, because when I use U-Net or my own model, the utilization is always over 90%.
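A quick way to test the data-loader assumption inside this particular training loop is to time the wait for each batch separately from the forward/backward step. Because CUDA kernels run asynchronously, the step should be synchronized (for example with `torch.cuda.synchronize`) before reading the clock, otherwise GPU time is attributed to the wrong phase. A framework-agnostic sketch; `loader`, `step`, and `sync` are placeholders for the real DataLoader, training step, and synchronization call.

```python
import time


def profile_loop(loader, step, sync=lambda: None, max_iters=50):
    """Return (seconds waiting for data, seconds inside step) over up to max_iters batches."""
    data_time = step_time = 0.0
    t0 = time.perf_counter()
    for i, batch in enumerate(loader):
        t1 = time.perf_counter()
        data_time += t1 - t0  # time spent producing this batch
        step(batch)
        sync()  # e.g. torch.cuda.synchronize when training on GPU
        t0 = time.perf_counter()
        step_time += t0 - t1
        if i + 1 >= max_iters:
            break
    return data_time, step_time
```

If data time dominates, more DataLoader workers or lighter augmentation are the usual levers; if step time dominates while utilization stays low, the bottleneck is more likely CPU-side work inside the step itself.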

can't reproduce your results

Thank you for your amazing work. I was trying to reproduce your results on the Cityscapes dataset, but I couldn't reach an mIoU above 70% with either the MobileNet- or the ResNet-based model. Could you share your training hyperparameters? Also, do you have any training tips that could help reach your results?

With kind regards.

resnet50 training problem

Hello. I'm trying to reproduce the results. However, when training deeplabv3plus_resnet50, the mIoU doesn't reach 0.772; the best I get is 0.714. Did you modify any hyperparameters when you trained it yourself? Thank you very much.

Question about training


Thanks for your repo! I successfully trained the DeepLabV3Plus-MobileNetV2 model on the PASCAL VOC 2012 dataset, but my mIoU is only 69.41% (python3 --model deeplabv3plus_mobilenet --separable_conv --gpu_id 0 --year 2012_aug --crop_val --lr 0.007 --crop_size 513 --batch_size 10 --output_stride 16). How can I improve it?
Another question: why does the experimental section of the MobileNetV2 paper report up to 75.70% mIoU? Is it because that model was pretrained on COCO? I'm confused...

Looking forward to your answers!

Reproducing your code

Hello, I would like to know whether you started training from scratch without loading any weights, and for how many epochs you trained.

question about test

@VainF Hi, I can train normally on the Cityscapes dataset, but the test results are obviously wrong. What could be the matter?

Is separable_conv better than standard conv?

I trained MobileNet-DeepLabV3+ both without and with --separable_conv,
and found that the latter is better than the former by 1.8% (mIoU).
Could you share your results? I guess the improvement comes from reduced overfitting.
Thanks a lot!
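The parameter count is consistent with the overfitting guess, though it does not prove it; run-to-run variance could also explain a 1.8% gap. A depthwise-separable 3×3 convolution (a per-channel 3×3 conv followed by a 1×1 pointwise conv, matching the two-layer body.0 / body.1 keys in the state-dict error further down this thread) has far fewer weights than a standard 3×3 conv with the same channels. A sketch for 256→256 channels without bias:

```python
import torch.nn as nn


def n_params(m):
    return sum(p.numel() for p in m.parameters())


standard = nn.Conv2d(256, 256, 3, padding=1, bias=False)
separable = nn.Sequential(
    nn.Conv2d(256, 256, 3, padding=1, groups=256, bias=False),  # depthwise: one 3x3 filter per channel
    nn.Conv2d(256, 256, 1, bias=False),                          # pointwise: mixes channels
)

print(n_params(standard))   # 589824 = 3*3*256*256
print(n_params(separable))  # 67840  = 3*3*256 + 256*256
```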

train --year 2007 failed

Hi, while waiting for the download to finish, I tried training on the 2007 dataset, which I had already downloaded.

When I ran it, I got the following error message:

(segmentation) paul@tensor:~/segmentation/DeepLabV3Plus-Pytorch$ python --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2007 --crop_val --lr 0.01 --crop_size 513 --batch_size 16 --output_stride 16
Setting up a new session...
Device: cuda
Dataset: voc, Train set: 209, Val set: 213
[!] Retrain
/home/paul/segmentation/lib/python3.6/site-packages/torchvision/transforms/ UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
"Argument interpolation should be of type InterpolationMode instead of int. "
/home/paul/segmentation/lib/python3.6/site-packages/torchvision/transforms/ UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
"Argument interpolation should be of type InterpolationMode instead of int. "
Epoch 1, Itrs 10/30000, Loss=1.980302
Traceback (most recent call last):
  File "", line 390, in <module>
  File "", line 342, in main
    outputs = model(images)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/parallel/", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/network/", line 16, in forward
    x = self.classifier(features)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/network/", line 49, in forward
    output_feature = self.aspp(feature['out'])
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/network/", line 160, in forward
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/network/", line 130, in forward
    x = super(ASPPPooling, self).forward(x)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/", line 139, in forward
    input = module(input)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/", line 178, in forward
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/", line 2279, in batch_norm
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/", line 2247, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

What should I do to fix this error? Thank you for your help.
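The numbers in the log point to the cause: the 2007 train split has 209 images, and 209 = 13 × 16 + 1, so with --batch_size 16 the last batch of the first epoch holds a single image. After the global pooling in ASPP, that image gives BatchNorm one value per channel, which it rejects in training mode. Dropping the incomplete last batch avoids it; a sketch with a stand-in dataset of the same size (whether the repo's train loader already sets `drop_last` is not shown here):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(209, 1))  # stand-in with the 2007 train-set size

for drop_last in (False, True):
    loader = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=drop_last)
    sizes = [batch[0].shape[0] for batch in loader]
    print(drop_last, len(sizes), sizes[-1])
# False 14 1   -> the final single-image batch triggers the BatchNorm error
# True 13 16
```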

How to modify the structure to accept input images with more than three channels?

In my current work, I need to add a mask as a fourth input channel on top of the three-channel image, and I don't know how to change the network structure. For example, when using ResNet101 as the backbone, how should I modify the network to accept four-channel input?
Hoping for your help, thanks.
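One common approach (an assumption, not something this repo provides) is to replace the backbone's first convolution with a copy that takes four input channels: reuse the pretrained RGB filters for the first three and initialize the mask channel, here with the mean of the RGB filters. With a torchvision-style ResNet101 that layer is `conv1`; the input normalization would also need a fourth mean/std entry.

```python
import torch
import torch.nn as nn


def expand_input_channels(conv: nn.Conv2d, extra: int = 1) -> nn.Conv2d:
    """Copy of `conv` that accepts conv.in_channels + extra input channels."""
    assert conv.groups == 1, "sketch assumes an ordinary (non-grouped) first conv"
    new = nn.Conv2d(conv.in_channels + extra, conv.out_channels, conv.kernel_size,
                    stride=conv.stride, padding=conv.padding, dilation=conv.dilation,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight[:, :conv.in_channels] = conv.weight  # keep the pretrained RGB filters
        # Heuristic init for the new channel(s): average of the existing filters
        new.weight[:, conv.in_channels:] = conv.weight.mean(dim=1, keepdim=True)
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new


# Hypothetical usage with a torchvision-style ResNet backbone:
# backbone.conv1 = expand_input_channels(backbone.conv1)  # 3 -> 4 input channels
```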

Error while loading DeepLabV3Plus-ResNet50 model from checkpoint with --separable_conv flag

It seems that the DeepLabV3Plus-ResNet50 model was not trained with the --separable_conv flag, because loading its weights with this flag active causes an error at the checkpoint-loading stage.


python --model deeplabv3plus_resnet50 --separable_conv --ckpt checkpoints/best_deeplabv3plus_resnet50_voc_os16.pth --test_only --save_val_results


RuntimeError: Error(s) in loading state_dict for DeepLabV3:
Missing key(s) in state_dict: "classifier.aspp.convs.1.0.body.0.weight", "classifier.aspp.convs.1.0.body.1.weight", "classifier.aspp.convs.2.0.body.0.weight", "classifier.aspp.convs.2.0.body.1.weight", "classifier.aspp.convs.3.0.body.0.weight", "classifier.aspp.convs.3.0.body.1.weight", "classifier.classifier.0.body.0.weight", "classifier.classifier.0.body.1.weight".
Unexpected key(s) in state_dict: "classifier.aspp.convs.1.0.weight", "classifier.aspp.convs.2.0.weight", "classifier.aspp.convs.3.0.weight", "classifier.classifier.0.weight".
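The key names show the mismatch directly: with --separable_conv the ASPP and head convolutions become a two-layer `body` (body.0 and body.1, presumably the depthwise and pointwise convolutions), whereas the released ResNet50 checkpoint stores a single `.weight` per convolution. A small, hypothetical check that inspects a checkpoint's keys before choosing the flag:

```python
def uses_separable_conv(state_dict_keys):
    """True if classifier convs are stored as a separable 'body' (two stacked convs)."""
    return any(k.startswith("classifier.") and ".body." in k for k in state_dict_keys)


# Usage (assumed checkpoint layout):
# keys = torch.load(path, map_location="cpu")["model_state"].keys()
# print("pass --separable_conv" if uses_separable_conv(keys) else "omit --separable_conv")
```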

If you still have the commands you used to train the models whose weights you shared, publishing them would make the pre-trained models easier to use.

I just wanted to note this in case somebody else runs into the same issue. Overall, the repo is really helpful. Thank you.

Model performance index

Hi @VainF ,

I used THOP to calculate the model's parameters and FLOPs by adding two lines of code in the, but the results are not what I expected. How did you calculate the FLOPs and parameter counts shown in your chart?
Looking forward to your answer! Thanks!
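Without the actual numbers it is hard to say what differs, but two common sources of mismatch are the input resolution used for profiling and the MAC-versus-FLOP convention: thop's `profile` returns multiply-accumulates (its README names the first return value `macs`), and FLOPs are often quoted as roughly 2 × MACs. A dependency-free sketch that counts parameters and convolution MACs with forward hooks; it ignores BatchNorm, activations, and interpolation, so it gives a lower bound.

```python
import torch
import torch.nn as nn


def count_params(model):
    return sum(p.numel() for p in model.parameters())


def conv_macs(model, input_shape):
    """Sum of multiply-accumulates over all nn.Conv2d layers for one forward pass."""
    total = 0

    def hook(module, inputs, output):
        nonlocal total
        k_h, k_w = module.kernel_size
        per_output = (module.in_channels // module.groups) * k_h * k_w
        total += output.numel() * per_output  # output.numel() covers batch, channels, H, W

    handles = [m.register_forward_hook(hook) for m in model.modules() if isinstance(m, nn.Conv2d)]
    model.eval()
    with torch.no_grad():
        model(torch.randn(*input_shape))
    for h in handles:
        h.remove()
    return total
```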

FocalLoss params alpha and gamma

I use deeplabv3plus_resnet101 to train on my own dataset, with loss='Focal_Loss'.
But I found that the FocalLoss parameters are set to α=1, γ=0, which makes it equivalent to cross-entropy loss.
Is this intentional, or is it a bug in the code?
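With γ = 0 the modulating factor (1 − p_t)^γ equals 1, so focal loss reduces to α-weighted cross-entropy, and with α = 1 it is plain cross-entropy; defaults of α=1, γ=0 therefore train exactly like CE. A sketch that makes γ configurable (γ = 2 is the value commonly used with focal loss; whether it helps on a given segmentation dataset has to be checked empirically):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLoss(nn.Module):
    """alpha * (1 - p_t)^gamma * CE per pixel, averaged over non-ignored pixels."""

    def __init__(self, alpha=1.0, gamma=2.0, ignore_index=255):
        super().__init__()
        self.alpha, self.gamma, self.ignore_index = alpha, gamma, ignore_index

    def forward(self, logits, target):
        ce = F.cross_entropy(logits, target, reduction="none", ignore_index=self.ignore_index)
        pt = torch.exp(-ce)  # probability assigned to the true class
        focal = self.alpha * (1.0 - pt) ** self.gamma * ce
        return focal[target != self.ignore_index].mean()
```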

How can I get your score?





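import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import transforms as T

from network import deeplabv3plus_mobilenet

val_transform = T.Compose([
    # et.ExtResize(512),
    T.ToTensor(),  # PIL image -> float tensor in [0, 1]; Normalize needs a tensor
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_path = 'models_res/best_deeplabv3plus_mobilenet_cityscapes_os16.pth'
img_path = 'results/4_image.png'

if __name__ == '__main__':
    model = deeplabv3plus_mobilenet(num_classes=19, output_stride=16)
    # The checkpoint has to be loaded into the model, not just read from disk;
    # "model_state" is assumed to be the key holding the weights.
    checkpoint = torch.load(model_path, map_location=device)
    model.load_state_dict(checkpoint['model_state'])
    model.to(device).eval()  # eval mode: BatchNorm uses running statistics

    image = Image.open(img_path).convert('RGB')
    canvas = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)

    with torch.no_grad():
        test_input = val_transform(image).unsqueeze(dim=0).to(device)
        output = model(test_input).cpu()

    # Index of the highest-scoring of the 19 class channels at each pixel
    preds = output.max(dim=1)[1].squeeze(0).numpy()
    mask = (preds == 5).astype(np.uint8)  # binary mask of class 5
    # findContours returns 2 values in OpenCV 4 and 3 in OpenCV 3; [-2] works for both
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    cv2.drawContours(canvas, contours, -1, (0, 0, 255), -1)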


