Comments (8)
@kinfeparty @VainF Found the issue.
According to the PyTorch optimizer documentation:
"If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call."
The fix is to move the model to CUDA before loading the state dict into the optimizer:
`if opts.ckpt is not None and os.path.isfile(opts.ckpt):
    checkpoint = torch.load(opts.ckpt, map_location=torch.device('cpu'))
    # checkpoint = torch.load(opts.ckpt)
    model.load_state_dict(checkpoint["model_state"])
    model = nn.DataParallel(model)
    model.to(device)
    if opts.continue_training:
        optimizer.load_state_dict(checkpoint["optimizer_state"])
        scheduler.load_state_dict(checkpoint["scheduler_state"])
        cur_itrs = checkpoint["cur_itrs"]
        best_score = checkpoint['best_score']
        print("Training state restored from %s" % opts.ckpt)
    print("Model restored from %s" % opts.ckpt)
    del checkpoint  # free memory
else:
    print("[!] Retrain")
    model = nn.DataParallel(model)
    model.to(device)`
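For anyone who wants to check the ordering in isolation, here is a minimal sketch of the same resume sequence. It is not the repository's code: the `nn.Linear` model, the SGD settings, and the in-memory buffer are stand-ins I picked so that it runs without a GPU or a checkpoint file. Only the key names `model_state` and `optimizer_state` follow the snippet above.

```python
import io

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the segmentation network.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Take one step so the optimizer holds momentum buffers worth saving.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

# Save to an in-memory buffer instead of a checkpoint file on disk.
buffer = io.BytesIO()
torch.save(
    {"model_state": model.state_dict(), "optimizer_state": optimizer.state_dict()},
    buffer,
)
buffer.seek(0)

# Resume: load onto the CPU, move the model, then restore the optimizer.
checkpoint = torch.load(buffer, map_location=torch.device("cpu"))
resumed = nn.Linear(4, 2)
resumed.load_state_dict(checkpoint["model_state"])
resumed.to(device)
resumed_opt = torch.optim.SGD(resumed.parameters(), lr=0.01, momentum=0.9)
resumed_opt.load_state_dict(checkpoint["optimizer_state"])

# Each restored momentum buffer should sit on the same device as its parameter.
for name, param in resumed.named_parameters():
    buf = resumed_opt.state[param]["momentum_buffer"]
    print(name, param.device, buf.device)
```

In this sketch the optimizer is also constructed after the move, which is the order the documentation recommends. The script above constructs it earlier, so treat this as an illustration of the load order rather than a drop-in replacement.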
from deeplabv3plus-pytorch.
@PytaichukBohdan thanks!
from deeplabv3plus-pytorch.
Hi @kinfeparty, I added the missing map_location in the latest commit. Please try again.
from deeplabv3plus-pytorch.
Hi @VainF, I modified the code but hit the same bug.
from deeplabv3plus-pytorch.
Hi @VainF, I got the same issue.
Do you know what it could be related to?
from deeplabv3plus-pytorch.
When continuing training, ASPPPooling raised this error:
Original Traceback (most recent call last):
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/Projects/CloudDetection/cloudNet_4channel/network/utils.py", line 16, in forward
    x = self.classifier(features)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/Projects/CloudDetection/cloudNet_4channel/network/_deeplab.py", line 84, in forward
    low_output_feature= self.aspp(low_level_beforeFPM)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/Projects/CloudDetection/cloudNet_4channel/network/_deeplab.py", line 265, in forward
    res.append(conv(x))
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/Projects/CloudDetection/cloudNet_4channel/network/_deeplab.py", line 233, in forward
    x = super(ASPPPooling, self).forward(x)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/home/GFXX/anaconda3/envs/gfx/lib/python3.7/site-packages/torch/nn/functional.py", line 1652, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
ASPPPooling works when retraining from scratch.
I don't know how to debug this; any help would be appreciated.
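The ValueError itself can be reproduced outside the project. The sketch below assumes a simplified pooling branch (global average pool, 1x1 convolution, BatchNorm) with made-up channel counts; it is not the `ASPPPooling` class from `_deeplab.py`. It shows that BatchNorm in training mode rejects an input with a single value per channel, which is what a batch of one sample looks like after `AdaptiveAvgPool2d(1)`.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pooling branch, with small channel counts.
branch = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(8, 4, kernel_size=1, bias=False),
    nn.BatchNorm2d(4),
)


def runs_ok(module, batch_size):
    """Return True if a forward pass with this batch size succeeds."""
    try:
        module(torch.randn(batch_size, 8, 16, 16))
        return True
    except ValueError:
        return False


branch.train()
single_train = runs_ok(branch, 1)  # one value per channel after pooling
pair_train = runs_ok(branch, 2)    # two values per channel after pooling

branch.eval()
single_eval = runs_ok(branch, 1)   # eval mode normalizes with running statistics

print(single_train, pair_train, single_eval)
```

If this is the cause here, some replica is receiving a single sample, for example from a short final batch or from DataParallel splitting a small batch across GPUs. That is my guess and is not confirmed in this thread. Passing `drop_last=True` to the training `DataLoader` or raising the batch size would be the usual things to try.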
from deeplabv3plus-pytorch.
I know the "1" comes from AdaptiveAvgPool2d, but why does the error occur only when continuing training?
from deeplabv3plus-pytorch.
How can I make my output segmentation image look like the second image? Thanks very much.
from deeplabv3plus-pytorch.