Comments (8)

PytaichukBohdan commented on July 3, 2024

@kinfeparty @VainF Found the issue.

According to the PyTorch optimizer documentation:

If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.

The fix is to move the model to the device before loading the state dict into the optimizer:

```python
if opts.ckpt is not None and os.path.isfile(opts.ckpt):
    # Load onto the CPU first so the checkpoint does not depend on the GPU it was saved from.
    checkpoint = torch.load(opts.ckpt, map_location=torch.device('cpu'))
    model.load_state_dict(checkpoint["model_state"])

    # Move the model to the target device before restoring the optimizer state.
    model = nn.DataParallel(model)
    model.to(device)

    if opts.continue_training:
        optimizer.load_state_dict(checkpoint["optimizer_state"])
        scheduler.load_state_dict(checkpoint["scheduler_state"])
        cur_itrs = checkpoint["cur_itrs"]
        best_score = checkpoint['best_score']
        print("Training state restored from %s" % opts.ckpt)
    print("Model restored from %s" % opts.ckpt)
    del checkpoint  # free memory
else:
    print("[!] Retrain")

    model = nn.DataParallel(model)
    model.to(device)
```
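
The same ordering can be tried in isolation with a minimal sketch. It assumes a hypothetical `nn.Linear` stand-in for the network, SGD with momentum, and an in-memory buffer in place of the checkpoint file; none of these come from the repository. It only illustrates that when `optimizer.load_state_dict` is called after `model.to(device)`, the restored momentum buffers end up on the same device as the parameters.

```python
import io

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def build():
    # Hypothetical stand-in for the segmentation network and its optimizer.
    model = nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    return model, optimizer


# Run one training step so the optimizer holds momentum buffers, then save.
model, optimizer = build()
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
buffer = io.BytesIO()  # in-memory stand-in for the checkpoint file
torch.save(
    {"model_state": model.state_dict(), "optimizer_state": optimizer.state_dict()},
    buffer,
)
buffer.seek(0)

# Resume: load on CPU, restore the weights, move the model, then restore the optimizer.
model, optimizer = build()
checkpoint = torch.load(buffer, map_location=torch.device("cpu"))
model.load_state_dict(checkpoint["model_state"])
model.to(device)
optimizer.load_state_dict(checkpoint["optimizer_state"])

# The restored momentum buffer should live on the same device as its parameter.
param = next(model.parameters())
momentum = optimizer.state[param]["momentum_buffer"]
print(param.device, momentum.device)
```

If the two printed devices differ in your setup, the optimizer state was probably restored before the model was moved.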

from deeplabv3plus-pytorch.

VainF commented on July 3, 2024

@PytaichukBohdan thanks!

VainF commented on July 3, 2024

Hi @kinfeparty, I added the missing map_location in the latest commit. Please try again.

kinfeparty commented on July 3, 2024

Hi @VainF, I modified the code but hit the same bug.

PytaichukBohdan commented on July 3, 2024

Hi @VainF, I got the same issue.
Do you know what it could be related to?

YLiu-creator commented on July 3, 2024

When continuing training, ASPPPooling raises this error:
Original Traceback (most recent call last):
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/Projects/CloudDetection/cloudNet_4channel/network/utils.py", line 16, in forward
    x = self.classifier(features)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/Projects/CloudDetection/cloudNet_4channel/network/_deeplab.py", line 84, in forward
    low_output_feature= self.aspp(low_level_beforeFPM)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/Projects/CloudDetection/cloudNet_4channel/network/_deeplab.py", line 265, in forward
    res.append(conv(x))
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/Projects/CloudDetection/cloudNet_4channel/network/_deeplab.py", line 233, in forward
    x = super(ASPPPooling, self).forward(x)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/functional.py", line 1652, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])

ASPPPooling works when retraining.
I don't know how to debug this; please help.

I know the "1" comes from AdaptiveAvgPool2d, but why does the error occur only when continuing training?
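
The error itself can be reproduced without any checkpoint. The sketch below uses a hypothetical stand-in for the pooling branch (global average pooling followed by `BatchNorm2d`, with the conv and ReLU left out) and shows that, in training mode, a single sample reaching this layer leaves only one value per channel, so batch statistics cannot be computed. Whether a size-1 batch (for example a final partial batch, or a small batch split across GPUs by `DataParallel`) is what happens in the continue-training run is an assumption; the thread does not confirm it.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pooling branch: global pooling, then BatchNorm.
branch = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.BatchNorm2d(512))


def run(batch_size, training):
    """Return the output shape, or the error message if BatchNorm rejects the input."""
    branch.train(training)
    x = torch.randn(batch_size, 512, 8, 8)
    try:
        return tuple(branch(x).shape)
    except ValueError as err:
        return str(err)


single_train = run(1, training=True)   # one sample in training mode: raises
pair_train = run(2, training=True)     # two samples in training mode: fine
single_eval = run(1, training=False)   # one sample in eval mode: fine

print(single_train)
print(pair_train)
print(single_eval)
```

If a size-1 batch does turn out to be the cause, one commonly used option is passing `drop_last=True` to the training `DataLoader`, so that a trailing partial batch is skipped.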

longphamkhac commented on July 3, 2024

image
image
How can I make my output segmentation image look like the second image? Thank you very much.
