Comments (13)

bb846 avatar bb846 commented on July 3, 2024 1

> > Try using larger crop size, for example 768.
>
> It gives CUDA out of memory.
>
> @bb846 Thanks for sharing the link about the number of workers. I also found that the ImageNet pre-trained weights are being loaded, but I still can't reproduce the result. I think the issue is that BatchNorm is not synced in this repository, so I'll have to use DistributedDataParallel in order to use PyTorch's SyncBatchNorm: https://pytorch.org/docs/master/generated/torch.nn.SyncBatchNorm.html#torch.nn.SyncBatchNorm
>
> So could you please share if you were able to reproduce the 77% result or get >75%; the exact command (or hyperparameters you used); and any other changes you made to this code? I think some of the hyperparameters like the learning rate are different from the original paper.

Yes, it requires large GPUs. Regarding the result, I only got 75.8%. I used crop size 768 and RandomScale with range [0.5, 2.0]. I also used DistributedDataParallel and SyncBatchNorm. The other hyperparameters are similar to the ones in this repo. Hope this helps.

Best,
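
For readers unfamiliar with the augmentation mentioned above, here is a minimal sketch of random scaling in the range [0.5, 2.0] followed by a 768-pixel random crop. It only computes sizes and the crop window and touches no image data. The helper name `random_scale_crop_box`, the pad-then-crop rule, and the 2048x1024 Cityscapes frame size used in the example are assumptions for illustration; the repository's actual transform code may differ.

```python
import random


def random_scale_crop_box(width, height, crop_size=768,
                          scale_range=(0.5, 2.0), rng=None):
    """Pick a random scale and a crop window (hypothetical helper).

    Returns (scale, (scaled_w, scaled_h), (left, top, right, bottom)).
    """
    rng = rng or random.Random()
    scale = rng.uniform(*scale_range)
    scaled_w = round(width * scale)
    scaled_h = round(height * scale)
    # Assumption: if the scaled image is smaller than the crop,
    # it is padded up to the crop size before cropping.
    padded_w = max(scaled_w, crop_size)
    padded_h = max(scaled_h, crop_size)
    left = rng.randint(0, padded_w - crop_size)
    top = rng.randint(0, padded_h - crop_size)
    box = (left, top, left + crop_size, top + crop_size)
    return scale, (scaled_w, scaled_h), box


# Example with a Cityscapes-sized frame and a fixed seed.
scale, scaled_size, box = random_scale_crop_box(2048, 1024, rng=random.Random(0))
print(f"scale={scale:.2f} scaled={scaled_size} crop_box={box}")
```

The crop window always has the requested size, so the network sees fixed-size inputs while the apparent object scale varies from sample to sample.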

from deeplabv3plus-pytorch.

VainF avatar VainF commented on July 3, 2024

My hyper-parameters:

python main.py --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2012_aug --crop_val --lr 0.01 --crop_size 513 --batch_size 16 --output_stride 16

Tip:
Please use a larger batch size (more than 8 samples per GPU) for parallel training; otherwise the BN statistics may be inaccurate.
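
To illustrate why small per-GPU batches can hurt, here is a rough pure-Python sketch comparing batch-norm statistics computed separately on each GPU's shard with statistics aggregated across all shards, which is the idea behind synchronized BN. The activation values, the 4-GPU split, and the helper names are made up for illustration; real BN layers do this per channel on tensors.

```python
from statistics import fmean, pvariance


def shard(values, num_gpus):
    """Split one batch of activations evenly across GPUs."""
    size = len(values) // num_gpus
    return [values[i * size:(i + 1) * size] for i in range(num_gpus)]


def local_stats(shards):
    """Mean and (biased) variance each GPU sees on its own shard."""
    return [(fmean(s), pvariance(s)) for s in shards]


def synced_stats(shards):
    """Mean and variance from sums aggregated over all shards."""
    count = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(x * x for x in s) for s in shards)
    mean = total / count
    return mean, total_sq / count - mean * mean


# Hypothetical activations for one channel, batch of 8 on 4 GPUs.
activations = [0.1, 0.3, 2.5, 2.9, -1.2, -0.8, 0.9, 1.1]
shards = shard(activations, 4)

for gpu, (mean, var) in enumerate(local_stats(shards)):
    print(f"GPU {gpu}: mean={mean:.3f} var={var:.3f}")
print("synced: mean={:.3f} var={:.3f}".format(*synced_stats(shards)))
```

With only two samples per GPU, each device normalizes with a noticeably different mean and variance, whereas the aggregated statistics match those of the full batch.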

bb846 avatar bb846 commented on July 3, 2024

Thank you for the information. I was using crop size 768 and batch size 16, and running inference on whole images. Now I am able to reach 70.4% mIoU for deeplabv3+_mobilenet.

zzc-ai avatar zzc-ai commented on July 3, 2024

Hello, I would like to know whether you trained from scratch without loading any pretrained weights, and how many epochs you trained for.

bb846 avatar bb846 commented on July 3, 2024

prachigarg23 avatar prachigarg23 commented on July 3, 2024

Hi, I'm having trouble reproducing the DeeplabV3 and DeeplabV3+ (ResNet101) results on Cityscapes. I'm using the following commands:

python main.py --model deeplabv3plus_resnet101 --dataset cityscapes --gpu_id 0,1 --lr 0.1 --val_interval 300 --crop_size 513 --batch_size 16 --output_stride 16 --data_root path_to_cs

python main.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0,1 --lr 0.1 --val_interval 300 --crop_size 513 --batch_size 16 --output_stride 16 --data_root path_to_cs

Using 2 Nvidia 1080 Ti GPUs.

Getting 67.35% mIoU after 30k iterations on DeeplabV3 as compared to the 77.23% in the original paper.
Getting 72.1% mIoU after 30k iterations on DeeplabV3+ as compared to ~77% as reported in this repo.

I was confused about the correct learning rate and number of iterations. In the DeeplabV3 paper, they mention using 0.007 as the initial learning rate and training for 90k iterations on Cityscapes. I saw 0.1 in the README and 0.01 here. Can someone please confirm reproducible hyperparameters for Cityscapes?

@VainF Could I be doing something else wrong? Has anyone been able to reproduce the results? @bb846
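
To make the learning-rate comparison concrete, here is a small sketch of the "poly" decay policy described in the DeeplabV3 paper, where the initial rate is multiplied by (1 - iter/max_iter)^0.9. Whether this repository applies exactly this schedule by default is an assumption here, and pairing lr 0.01 with 30k iterations is only for illustration.

```python
def poly_lr(base_lr, iteration, max_iters, power=0.9):
    """Learning rate under the 'poly' policy at a given iteration."""
    if not 0 <= iteration <= max_iters:
        raise ValueError("iteration must be within [0, max_iters]")
    return base_lr * (1 - iteration / max_iters) ** power


# (initial lr, total iterations) pairs taken from the discussion above.
settings = {
    "lr=0.007, 90k iters (paper, as quoted above)": (0.007, 90_000),
    "lr=0.1, 30k iters (run reported above)": (0.1, 30_000),
    "lr=0.01, 30k iters (assumed pairing)": (0.01, 30_000),
}

for name, (lr, iters) in settings.items():
    halfway = poly_lr(lr, iters // 2, iters)
    print(f"{name}: start={lr}, halfway={halfway:.5f}")
```

The three settings differ by more than an order of magnitude throughout training, so results obtained with one are not directly comparable to results obtained with another.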

prachigarg23 avatar prachigarg23 commented on July 3, 2024

Is the reported result on cityscapes after initialising from an ImageNet or COCO pretrained model? @VainF

bb846 avatar bb846 commented on July 3, 2024

> Hi, I'm having trouble reproducing the DeeplabV3 and DeeplabV3+ (ResNet101) results on Cityscapes. I'm using the following commands:
>
> python main.py --model deeplabv3plus_resnet101 --dataset cityscapes --gpu_id 0,1 --lr 0.1 --val_interval 300 --crop_size 513 --batch_size 16 --output_stride 16 --data_root path_to_cs
>
> python main.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0,1 --lr 0.1 --val_interval 300 --crop_size 513 --batch_size 16 --output_stride 16 --data_root path_to_cs
>
> Using 2 Nvidia 1080 Ti GPUs.
>
> Getting 67.35% mIoU after 30k iterations on DeeplabV3 as compared to the 77.23% in the original paper. Getting 72.1% mIoU after 30k iterations on DeeplabV3+ as compared to ~77% as reported in this repo.
>
> I was confused about the correct learning rate and number of iterations. In the DeeplabV3 paper, they mention using 0.007 as the initial learning rate and training for 90k iterations on Cityscapes. I saw 0.1 in the README and 0.01 here. Can someone please confirm reproducible hyperparameters for Cityscapes?
>
> @VainF Could I be doing something else wrong? Has anyone been able to reproduce the results? @bb846

Try using larger crop size, for example 768.
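
As a rough guide to the memory cost of this suggestion, the sketch below compares the pixel count of a 768 crop with the 513 crop used in the commands above. It assumes activation memory grows roughly in proportion to the number of input pixels per batch, which is only a rule of thumb; the helper names are made up for illustration.

```python
def pixel_ratio(new_crop, old_crop):
    """How many times more pixels a square crop of new_crop has."""
    return (new_crop / old_crop) ** 2


def equivalent_batch_size(batch_size, new_crop, old_crop):
    """Largest batch size with no more pixels than the old setting."""
    return int(batch_size / pixel_ratio(new_crop, old_crop))


ratio = pixel_ratio(768, 513)
print(f"pixels per crop grow by {ratio:.2f}x")
print("batch size with a similar pixel budget:",
      equivalent_batch_size(16, 768, 513))
```

Under that assumption, moving from 513 to 768 at batch size 16 more than doubles the activation memory, and fitting it on the same GPUs would mean cutting the batch to about 7, which runs against the tip earlier in the thread about keeping more than 8 samples per GPU.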

bb846 avatar bb846 commented on July 3, 2024

> Is the reported result on cityscapes after initialising from an ImageNet or COCO pretrained model? @VainF

I think so. From the following code, you can see that the ResNet backbone loads weights pretrained on ImageNet.

def _resnet(arch, block, layers, pretrained, progress, **kwargs):
    model = ResNet(block, layers, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model

model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
    'resnext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
    'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
    'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',
    'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',
}

bb846 avatar bb846 commented on July 3, 2024

> Should we continue to use the same number of workers (default=2 in main.py) when using multiple GPUs? Does using number_of_workers = 4*Number_of_GPUs help?

By the way, you mentioned the number of workers. I think this parameter only affects data-loading speed and has nothing to do with model performance. For reference, see the following links:

https://stackoverflow.com/questions/53998282/how-does-the-number-of-workers-parameter-in-pytorch-dataloader-actually-work

https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813

prachigarg23 avatar prachigarg23 commented on July 3, 2024

> Try using larger crop size, for example 768.

It gives CUDA out of memory.

@bb846 Thanks for sharing the link about the number of workers. I also found that the ImageNet pre-trained weights are being loaded, but I still can't reproduce the result. I think the issue is that BatchNorm is not synced in this repository, so I'll have to use DistributedDataParallel in order to use PyTorch's SyncBatchNorm: https://pytorch.org/docs/master/generated/torch.nn.SyncBatchNorm.html#torch.nn.SyncBatchNorm

So could you please share if you were able to reproduce the 77% result or get >75%; the exact command (or hyperparameters you used); and any other changes you made to this code? I think some of the hyperparameters like the learning rate are different from the original paper.
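
For anyone wanting to try the SyncBatchNorm route, here is a minimal sketch of the conversion step using PyTorch's `torch.nn.SyncBatchNorm.convert_sync_batchnorm`. The toy network and helper names are hypothetical stand-ins, not this repository's models. Wrapping the converted model in `torch.nn.parallel.DistributedDataParallel` is left out because it needs an initialized process group, typically one process per GPU.

```python
import torch.nn as nn


def build_toy_model():
    """Tiny stand-in for a segmentation network (hypothetical)."""
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(inplace=True),
        nn.Conv2d(8, 21, kernel_size=1),
    )


def count_layers(model, layer_type):
    """Count modules of a given type anywhere in the model."""
    return sum(isinstance(m, layer_type) for m in model.modules())


model = build_toy_model()
bn_before = count_layers(model, nn.BatchNorm2d)

# Replace every BatchNorm layer with SyncBatchNorm. In a real run the
# result would then be moved to the local GPU and wrapped in
# DistributedDataParallel after torch.distributed is initialized.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

print("BatchNorm2d layers before:", bn_before)
print("SyncBatchNorm layers after:", count_layers(model, nn.SyncBatchNorm))
```

The conversion itself runs on a single machine without any distributed setup, so it can be checked before launching a multi-GPU job.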

newzealandpaul avatar newzealandpaul commented on July 3, 2024

@bb846 are you able to share the command you used to run the training?

kona419 avatar kona419 commented on July 3, 2024

> > Try using larger crop size, for example 768.
>
> It gives CUDA out of memory.
>
> @bb846 Thanks for sharing the link about the number of workers. I also found that the ImageNet pre-trained weights are being loaded, but I still can't reproduce the result. I think the issue is that BatchNorm is not synced in this repository, so I'll have to use DistributedDataParallel in order to use PyTorch's SyncBatchNorm: https://pytorch.org/docs/master/generated/torch.nn.SyncBatchNorm.html#torch.nn.SyncBatchNorm
>
> So could you please share if you were able to reproduce the 77% result or get >75%; the exact command (or hyperparameters you used); and any other changes you made to this code? I think some of the hyperparameters like the learning rate are different from the original paper.

Hello, I am wondering whether you got >75%, because my result is around 63-64%.
