
Comments (20)

qizhuli commented on August 24, 2024

@SoonminHwang you can either transfer the weights over with matcaffe/pycaffe, or you can replace the ResNet part of the PSPNet prototxt with the DeepLab version.
By the way, the PSPNet ResNet backbone is not structurally identical to the original ResNet: the first few convs are 3x3 here instead of 7x7, so you wouldn't be able to transfer the DeepLab weights over exactly. The easiest solution is therefore to use the original ResNet / DeepLab structure and initialise from their public weights.
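The name-and-shape matching that such a transfer script performs can be sketched with plain NumPy arrays (the layer names and shapes below are made up for illustration; real code would iterate over `net.params` of the two pycaffe nets):

```python
import numpy as np

# Weights are copied only where a layer with the same name exists in the
# destination net and the blob shapes agree -- which is why the 7x7-vs-3x3
# stem difference blocks an exact DeepLab -> PSPNet transfer.
def transfer_weights(src, dst):
    copied, skipped = [], []
    for name, w in src.items():
        if name in dst and dst[name].shape == w.shape:
            dst[name] = w.copy()
            copied.append(name)
        else:
            skipped.append(name)
    return copied, skipped

# Hypothetical layers: a 7x7 stem conv and a matching residual-block conv.
src = {'conv1': np.zeros((64, 3, 7, 7)), 'res2a': np.ones((64, 64, 3, 3))}
dst = {'conv1_1': np.zeros((64, 3, 3, 3)), 'res2a': np.zeros((64, 64, 3, 3))}
copied, skipped = transfer_weights(src, dst)
# 'res2a' matches by name and shape; the stem conv does not transfer.
```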

from pspnet.

Fromandto commented on August 24, 2024

@suhyung I am using the training script of deeplab-v2; it is compatible.

Fromandto commented on August 24, 2024

@hszhao I am training a 713-resolution PSPNet on 2 × 12 GB Titan X cards with batch size 1, and it seems almost all the memory is used.

So I guess training with batch size 16 would require about 32 Titan X cards (12 GB memory each)?

I cannot find details about how many GPUs were used in the paper, so I want to confirm how many GPUs are required to train with batch size 16, in your experience.

I really wonder what the quantitative performance gap between batch size 16 and batch size 1 is, because in the paper and in this thread you emphasize that batch size matters, yet in deeplab-v2 (and in my own experience) training with batch size 1 also works, to some extent. Do I really need batch size 16 (and potentially 32 cards?) to achieve ideal performance? ...

dongzhuoyao commented on August 24, 2024

same

mjohn123 commented on August 24, 2024

I also have the same question. Thanks.

rickythink commented on August 24, 2024

same

huaxinxiao commented on August 24, 2024

Has anyone re-trained successfully?

rener1199 commented on August 24, 2024

same

justinbuzzni commented on August 24, 2024

same

hszhao commented on August 24, 2024

Hi, for the training, the issues are mainly related to the bn layer:

  1. Should the parameters (mean, variance, slope, bias) of 'bn' be updated?
    -If you are working on the same (or a similar) dataset as the released model, you can just fix the 'bn' layers for fine-tuning; if not, you may need to update the parameters.
  2. What needs more attention when updating the parameters?
    -The batch size used for batch normalization matters, and it is better to keep it above 16 per normalization step, because the current mean and variance need to stay close to the global statistics that will be used at test time. But semantic segmentation is memory-consuming, and maintaining a large crop size (which depends on the dataset) can force a small batch size on each GPU card. So during our training step, we use MPI to gather data from the different GPU cards and then do the bn operation across them. The current official Caffe does not seem to support such communication. We are trying to make our training code compatible with BVLC; meanwhile you can have a look at yjxiong's Caffe version, which is an OpenMPI-based multi-GPU build. If you are working on other datasets, other platforms may support this kind of bn communication. Sorry for the inconvenience that only the evaluation code is currently released.
    Thanks.
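The cross-GPU gather described above can be illustrated in plain NumPy (a sketch of the statistics computation only, not of the MPI communication itself):

```python
import numpy as np

# Each "GPU" holds a small sub-batch; synchronized BN computes the mean
# and variance over the union of all sub-batches, not per card. The two
# sums below correspond to what an MPI allreduce would aggregate.
def sync_bn_stats(sub_batches):
    # sub_batches: list of arrays of shape (n_i, channels)
    total = sum(b.shape[0] for b in sub_batches)
    # global mean from per-card sums
    mean = sum(b.sum(axis=0) for b in sub_batches) / total
    # global variance via E[x^2] - E[x]^2, also a single reduction
    sq_mean = sum((b ** 2).sum(axis=0) for b in sub_batches) / total
    var = sq_mean - mean ** 2
    return mean, var

# With 4 cards of 4 samples each, the synchronized statistics equal
# those of a single batch of 16.
rng = np.random.default_rng(0)
full = rng.normal(size=(16, 8))
parts = np.split(full, 4)
m, v = sync_bn_stats(parts)
assert np.allclose(m, full.mean(axis=0))
assert np.allclose(v, full.var(axis=0))
```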

mjohn123 commented on August 24, 2024

@hszhao: Thanks for the information. I am working on the same Cityscapes dataset, using a single Titan X Pascal GPU. Is it possible to run your training setup on my machine? If not, could you reduce the ResNet depth, e.g. to 54 layers? I am also a beginner with Caffe, so I do not know how to set up the training described in your first point.

huaxinxiao commented on August 24, 2024

@Fromandto If your batch size is 1, the batch normalization layer may not work. Yet the bn layers seem important to the performance of PSPNet.
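A quick way to see why batch size 1 breaks batch normalization (a toy NumPy illustration, not PSPNet code):

```python
import numpy as np

x = np.array([[1.5, -0.3, 2.0]])  # a "batch" of one sample, 3 channels
mean = x.mean(axis=0)
var = x.var(axis=0)
# With batch size 1, the per-batch mean is the sample itself and the
# variance is exactly zero, so the normalized activation collapses to 0
# regardless of the input: the layer stops carrying any signal.
eps = 1e-5
normalized = (x - mean) / np.sqrt(var + eps)
assert np.allclose(var, 0.0)
assert np.allclose(normalized, 0.0)
```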

Fromandto commented on August 24, 2024

@huaxinxiao yes, this is exactly what I am concerned about ... but I just don't have 32 GPUs (or is something wrong with my setup, such that 4 GPUs would be enough for a batch of 16?)

huaxinxiao commented on August 24, 2024

@Fromandto A smaller crop size (<321) will work on 4 GPUs. Besides, you should use the OpenMPI-based multi-GPU Caffe to gather the bn parameters.

suhyung commented on August 24, 2024

@Fromandto Could you share your training script?

SoonminHwang commented on August 24, 2024

@Fromandto @hszhao Could you share some details about training? I'm using deeplab-v2 Caffe and am ready to train a model with my own Python script, but I don't have proper initial weights for pspnet101-VOC2012.prototxt. I tried to use the initial parameters from deeplab-v2, but the layer names are different. Should I pre-train a network on ImageNet myself?

ThienAnh commented on August 24, 2024

@SoonminHwang Did you get the init weights file?

tkasarla commented on August 24, 2024

What init weights should we use for training the cityscapes model for pspnet?

melody-rain commented on August 24, 2024

@huaxinxiao can you train this pspnet with SyncBN?

holyseven commented on August 24, 2024

I've implemented synchronized batch normalization (batch norm across GPUs) in pure TensorFlow, which makes it possible to train and reproduce the performance of PSPNet.
