amdegroot / ssd.pytorch
A PyTorch Implementation of Single Shot MultiBox Detector
License: MIT License
prior_box.py uses the v2 format, where every box is described as (x, y, w, h) instead of the v1 format (x1, y1, x2, y2).
Applying clamp_(max=1, min=0) to boxes in this format can leave the 'bottom' boxes extending outside the image. For example, output[-5, :] is 0.8333, 0.8333, 0.5020, 1.000, so x2 and y2 fall outside the image. I am not sure whether this affects accuracy; maybe it could be changed to work like v1. (Maybe it is not a problem.)
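To make the concern concrete, here is a small framework-free sketch (plain Python instead of torch tensors; the helper names to_corners and clamp01 are made up for illustration). It takes the prior quoted above, clamps it in center-size form, and then converts it to corner form.

```python
def to_corners(box):
    """Convert a (cx, cy, w, h) box to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def clamp01(values):
    """Clamp every value to the range [0, 1]."""
    return tuple(min(1.0, max(0.0, v)) for v in values)


prior = (0.8333, 0.8333, 0.5020, 1.0)

# Clamping the center-size values changes nothing: all four are already in [0, 1].
clamped_center_form = clamp01(prior)

# The corners derived from it still extend past the right and bottom edges.
corners = to_corners(clamped_center_form)

# Clamping after converting to corner form keeps the box inside the image.
clamped_corners = clamp01(corners)

print(corners)
print(clamped_corners)
```

Whether the out-of-image part matters for accuracy is a separate question that this sketch does not answer.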
Line 41: x = torch.from_numpy(transform(img)[0]).permute(2, 0, 1)
does not convert BGR to RGB. This differs from dataset = VOCDetection(args.voc_root, [('2007', set_type)], BaseTransform(300, dataset_mean), AnnotationTransform())
(which does convert BGR to RGB).
So I think it would be better to move line 138 of voc0712, img = img[:, :, (2, 1, 0)],
into the base_transform function. (The results do not change much if we set vis_threshold=0.6; however, in eval.py, using BaseTransform outside the dataset will change the mAP.)
Hi, I recently modified your code to train on my own dataset.
Basically, I made the following changes:
1. Changed the classes and num_classes
2. Changed the dataset path
3. Changed the RGB mean value of the dataset
Then I ran the modified train.py and encountered an error:
CUDA_LAUNCH_BLOCKING=1 python train_button.py
Loading base network...
Initializing weights...
Loading Dataset...
Training SSD on button
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [20,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [21,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [22,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [24,0,0] Assertion t >= 0 && t < n_classes
failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes
failed.
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THCUNN/generic/ClassNLLCriterion.cu line=83 error=59 : device-side assert triggered
Traceback (most recent call last):
File "train_button.py", line 204, in <module>
train()
File "train_button.py", line 160, in train
loss_l, loss_c = criterion(out, targets)
File "/home/deep-server/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/deep-server/Documents/Jingya/ssd.pytorch/modules/multibox_loss.py", line 110, in forward
loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False)
File "/home/deep-server/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/torch/nn/functional.py", line 509, in cross_entropy
return nll_loss(log_softmax(input), target, weight, size_average)
File "/home/deep-server/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/torch/nn/functional.py", line 477, in nll_loss
return f(input, target)
File "/home/deep-server/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
output, *self.additional_args)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:83
Can you please help me find the possible reason for this error?
It should be related to the line loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False),
but I don't understand how it could go wrong.
Thank you in advance.
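The assertion t >= 0 && t < n_classes in the log means that at least one target label lies outside the range the classifier can output. A common cause when adapting code to a custom dataset, offered here as a guess rather than a diagnosis, is that num_classes does not count the background class or that the annotation labels are not mapped to 0..num_classes-1. A minimal check, assuming the labels are available as plain integers (find_bad_labels is a hypothetical helper, not part of the repo):

```python
def find_bad_labels(labels, num_classes):
    """Return the distinct labels that fall outside [0, num_classes)."""
    return sorted({t for t in labels if not 0 <= t < num_classes})


# Hypothetical setup: one foreground class plus background gives
# num_classes = 2, so the only valid targets are 0 and 1.
ok = find_bad_labels([0, 1, 1, 0], num_classes=2)
bad = find_bad_labels([0, 1, 2, -1], num_classes=2)

print(ok)
print(bad)
```

Running such a check on the CPU before the loss call usually gives a clearer message than the device-side assert.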
Hi @amdegroot, I was trying to get the demo running and I'm having a problem when calling transform(img) of the BaseTransform class.
When running python test.py, the output is the following:
Finished loading model!
Testing image 1/4952....
Traceback (most recent call last):
File "test.py", line 84, in <module>
thresh=args.visual_threshold)
File "test.py", line 39, in test_net
x = Variable(transform(img).unsqueeze(0))
File "/home/arian/Documents/proyecto-integrador/models/ssd/ssd-pytorch/data/data_augment.py", line 119, in __call__
return torch.Tensor(img)
RuntimeError: tried to construct a tensor from a nested float sequence, but found an item of type numpy.float32 at index (0, 0, 0)
This happens in the demo notebook and in the test.py file.
Do you have any idea why this could be happening?
Thanks,
Arian.
I got the following error when training:
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu line=226 error=59 : device-side assert triggered
Traceback (most recent call last):
File "train_cars.py", line 232, in <module>
train()
File "train_cars.py", line 184, in train
loss_l, loss_c = criterion(out, targets)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/mshah/code/ssd.pytorch/layers/modules/multibox_loss.py", line 70, in forward
match(self.threshold,truths,defaults,self.variance,labels,loc_t,conf_t,idx)
File "/home/mshah/code/ssd.pytorch/layers/box_utils.py", line 107, in match
loc = encode(matches, priors, variances)
File "/home/mshah/code/ssd.pytorch/layers/box_utils.py", line 133, in encode
return torch.cat([g_cxcy, g_wh], 1) # [num_priors,4]
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu:226
How can I fix this?
When I run test.py, the statement "y = net(x)" raises an error:
RuntimeError: dimension out of range - got 1 but the tensor is only 1D
Change:
'--cuda', default=True
Thank you for your help.
Hello @amdegroot,
Thank you for making your code available. I am currently working on the ssd_keras port. However, we are missing the mAP score, and I saw that you have already calculated yours. Could you point out which code you used to evaluate your SSD port?
Also, I saw that you are missing the data augmentation part. Maybe you could take a look here. It is a Python generator; it is currently missing the crop transformation, but it has helped me reach a better loss.
Thank you!
_,loss_idx = loss_c.sort(1, descending=True)
_,idx_rank = loss_idx.sort(1)
Just wondering, is this a bug? I don't think you should put descending=True to find idx_rank.
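One way to read the two sorts, shown here in plain Python rather than torch: the first sort gives the indices in descending-loss order, and sorting those indices again in ascending order inverts that permutation, so idx_rank[i] is the rank of element i. That this is the intent is an assumption; the argsort helper and the num_neg comparison are also only illustrative.

```python
def argsort(values, descending=False):
    """Indices that would sort `values` (like the index output of tensor.sort)."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=descending)


loss_c = [0.3, 2.5, 0.1, 1.7]

# First sort: positions of the elements, largest loss first.
loss_idx = argsort(loss_c, descending=True)

# Second sort (ascending): inverts the permutation, giving each element's rank.
idx_rank = argsort(loss_idx)

# Keep the num_neg elements with the highest loss (rank 0 is the largest).
num_neg = 2
neg = [rank < num_neg for rank in idx_rank]

print(loss_idx, idx_rank, neg)
```

With descending=True on the second sort as well, idx_rank would no longer be the inverse permutation, so under this reading the ascending second sort is deliberate.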
The 'steps' in config.py are [8, 16, 32, 64, 100, 300]. How are these numbers derived? I have read the paper, which says 'f_k is the size of the k-th square feature map', but I cannot relate that to the numbers you use. Thanks.
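One plausible reading, not confirmed by the author: each step is the pixel stride of a feature map cell, so image_size / step plays the role of f_k. The sketch below assumes the usual SSD300 feature map sizes (38, 19, 10, 5, 3, 1), which are not stated in this thread, and compares 300 / f_k with the quoted steps.

```python
image_size = 300
feature_maps = [38, 19, 10, 5, 3, 1]   # assumed SSD300 feature map sizes
steps = [8, 16, 32, 64, 100, 300]      # values quoted from config.py

# Stride implied by each feature map size, rounded to the nearest integer.
approx_steps = [round(image_size / k) for k in feature_maps]

for k, approx, step in zip(feature_maps, approx_steps, steps):
    print(f"f_k={k:2d}  300/f_k={approx:3d}  step={step:3d}")
```

Under these assumptions the values agree except for the third and fourth maps (30 vs 32 and 60 vs 64), where the quoted steps look like the network's power-of-two downsampling factors rather than exact quotients.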
kaan@ALTAR:ssd.pytorch$ python3 train.py
Loading base network...
Initializing weights...
Loading Dataset...
Training SSD on VOC0712
Traceback (most recent call last):
File "train.py", line 231, in <module>
train()
File "train.py", line 170, in train
images, targets = next(batch_iterator)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 201, in __next__
return self._process_next_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 40, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/kaan/ssd.pytorch/data/voc0712.py", line 117, in __getitem__
im, gt, h, w = self.pull_item(index)
File "/home/kaan/ssd.pytorch/data/voc0712.py", line 129, in pull_item
height, width, channels = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'
When I run 'python -m demo.live', this error occurs. The environment is conda with Python 3.5. How can I solve it?
I've tried to implement a detector for my own dataset; however, at training time the localization loss is NaN due to negative values in g_wh (layers/box_utils.py#L137). I don't know whether this error is related to the format of the bounding boxes or to the output of the SSD model.
I would like to know whether I am doing something wrong while loading the dataset or whether the error is caused by a bug in the base implementation.
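A negative or zero g_wh before the logarithm would point to ground-truth boxes whose xmax is not larger than xmin (or ymax not larger than ymin), for example because the annotation columns are in a different order than expected. This is only one possible cause. A minimal sanity check, assuming boxes are (xmin, ymin, xmax, ymax) tuples (invalid_boxes is a hypothetical helper):

```python
def invalid_boxes(boxes):
    """Return indices of (xmin, ymin, xmax, ymax) boxes with non-positive width or height."""
    return [
        i for i, (xmin, ymin, xmax, ymax) in enumerate(boxes)
        if xmax <= xmin or ymax <= ymin
    ]


# Hypothetical annotations in relative coordinates.
boxes = [
    (0.10, 0.20, 0.50, 0.60),   # fine
    (0.50, 0.20, 0.10, 0.60),   # xmin and xmax swapped -> negative width
    (0.30, 0.40, 0.30, 0.70),   # zero width -> log(0)
]

bad_indices = invalid_boxes(boxes)
print(bad_indices)
```

If this list is empty for the whole dataset, the problem is more likely elsewhere, for example in the augmentation or in the learning rate.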
I was wondering how I could obtain weiliu89's weights to validate the 77.2%. I have changed some parts of the code and just want to check that the result is still reproducible. Thanks
I have not read the py-faster-rcnn source code because its readability is poor. I noticed that you do not keep difficult instances for training and testing on PASCAL VOC. So please tell me: do most models also ignore difficult instances in training and testing for the PASCAL VOC competition? Thank you!
Do you still have your loss logs? Currently my loss keeps hovering around 20. What loss value is considered reasonable?
I am wondering about the time consumed by each part (VGG, extra layers, multibox, detection).
In the Caffe version, the VGG part accounts for up to 80 percent of the time.
However, in this version, the time is distributed as follows:
Total time: 0.018 seconds per image
VGG part: 8.4%
Extra layers: 2.8%
Multi_box: 61%
Detect: 27.5%
Most of the time is spent in Multi_box and detect.
I measured it with Python's time.time().
The total time per image is almost the same in both:
Caffe: 19 ms
PyTorch: 18 ms
Why does this inconsistency happen?
I note that when loading the XML file, 0-based indexing is used:
obj_struct['bbox'] = [int(bbox.find('xmin').text) - 1,
                      int(bbox.find('ymin').text) - 1,
                      int(bbox.find('xmax').text) - 1,
                      int(bbox.find('ymax').text) - 1]
However, the detections are written with 1-based indexing:
# the VOCdevkit expects 1-based indices
for k in range(dets.shape[0]):
    f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
            format(index[1], dets[k, -1],
                   dets[k, 0] + 1, dets[k, 1] + 1,
                   dets[k, 2] + 1, dets[k, 3] + 1))
If you use consistent indexing, the mAP for model ssd300_mAP_77.43_v2.pth should be 0.775538.
I notice that you did not use the variance in priorbox. Is it supposed to be like this? The Caffe code has the following, which you seem to have left out:
top_data += top[0]->offset(0, 1);
if (variance_.size() == 1) {
  caffe_set<Dtype>(dim, Dtype(variance_[0]), top_data);
} else {
  int count = 0;
  for (int h = 0; h < layer_height; ++h) {
    for (int w = 0; w < layer_width; ++w) {
      for (int i = 0; i < num_priors_; ++i) {
        for (int j = 0; j < 4; ++j) {
          top_data[count] = variance_[j];
          ++count;
        }
      }
    }
  }
}
}
Though I do not understand what offset and caffe_set do. Do you have any idea?
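As I read the snippet (an interpretation, not something confirmed in this thread), offset(0, 1) moves the write pointer to the second channel of the output blob, and caffe_set fills a range with one constant value. The effect is a second channel that stores the variance for every prior coordinate. A plain-Python sketch of that logic, with a made-up function name:

```python
def variance_channel(num_priors_total, variance):
    """Build the flat variance channel: 4 entries per prior box.

    num_priors_total stands for layer_height * layer_width * num_priors_.
    """
    if len(variance) == 1:
        # Same as caffe_set: one constant repeated for every coordinate.
        return [variance[0]] * (num_priors_total * 4)
    # Otherwise repeat the 4 per-coordinate variances once per prior.
    return [variance[j] for _ in range(num_priors_total) for j in range(4)]


single = variance_channel(2, [0.1])
per_coord = variance_channel(2, [0.1, 0.1, 0.2, 0.2])
print(single)
print(per_coord)
```

The tracebacks elsewhere on this page show the variance being passed to match, encode and decode, so the PyTorch port appears to apply it there instead of storing it with the priors.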
python train.py
Loading base network...
Initializing weights...
Loading Dataset...
Training SSD on VOC0712
Traceback (most recent call last):
File "train.py", line 232, in <module>
train()
File "train.py", line 181, in train
out = net(images)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 70, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
raise output
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
output = module(*input, **kwargs)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/data/gpu/utkrsh/code/ssd.pytorch/ssd.py", line 76, in forward
s = self.L2Norm(x)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/data/gpu/utkrsh/code/ssd.pytorch/layers/modules/l2norm.py", line 21, in forward
x/=norm.expand_as(x)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/variable.py", line 725, in expand_as
return Expand.apply(self, (tensor.size(),))
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 111, in forward
result = i.expand(*new_size)
RuntimeError: The expanded size of the tensor (512) must match the existing size (8) at non-singleton dimension 1. at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCTensor.c:323
I am getting the above stack trace after running train.py with default values. The dataset and weights were downloaded to the default location.
I am using Python 3.6 and PyTorch 0.2.0.
I understand the meaning of the error, but I am not able to find its source. Can anyone point me in the right direction?
I have my own labeled dataset. How do I train on it?
Hi, I have a problem and hope you can help me. Thank you.
File "/home/hd/ssd.pytorch/data/voc.py", line 222
gts.append([label, *(int(bb.text) - 1 for bb in bbox)])
^
SyntaxError: invalid syntax
This error occurs when I run test.py. Thank you again.
(The ^ is under the *.)
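Unpacking with * inside a list display was added in Python 3.5 (PEP 448), so this line is a syntax error on older interpreters, including Python 2.7. Assuming that is the cause here, the same list can be built without star-unpacking. FakeNode below is a hypothetical stand-in for the XML elements, used only to make the sketch runnable.

```python
class FakeNode:
    """Stand-in for an XML element exposing a .text attribute (hypothetical)."""

    def __init__(self, text):
        self.text = text


bbox = [FakeNode("48"), FakeNode("240"), FakeNode("195"), FakeNode("371")]
label = "dog"

# Equivalent to [label, *(int(bb.text) - 1 for bb in bbox)],
# but valid on interpreters that predate PEP 448.
gt = [label] + [int(bb.text) - 1 for bb in bbox]
print(gt)
```

Upgrading the interpreter to Python 3.5 or later is the other option and leaves the repository code untouched.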
Hi all,
When an image is resized, the bounding boxes should be scaled accordingly, but the Resize class in augmentations.py does not scale the bounding boxes. Is this a bug?
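Whether this is a bug depends on the coordinate convention at that point in the pipeline. If the boxes are already stored as fractions of the image width and height when Resize runs (an assumption about the augmentation order, not verified here), they stay valid after resizing and need no scaling. A small sketch with hypothetical helper names:

```python
def to_relative(box, width, height):
    """Pixel (x1, y1, x2, y2) -> fractions of the image size."""
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def to_absolute(box, width, height):
    """Fractions of the image size -> pixel (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x1 * width, y1 * height, x2 * width, y2 * height)


# A box in a 400x200 image.
box_px = (100, 50, 300, 150)
box_rel = to_relative(box_px, 400, 200)

# After resizing the image to 300x300 the relative box is unchanged;
# only the conversion back to pixels uses the new size.
box_px_resized = to_absolute(box_rel, 300, 300)
print(box_rel, box_px_resized)
```

If the boxes are in absolute pixels at that point, they would indeed need to be scaled by the same factors as the image.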
How does the following line ensure the best prior? Why 2? What happens if this line is not included? It seems to me that this line is not necessary. Thanks.
best_truth_overlap.index_fill_(0, best_prior_idx, 2) # ensure best prior
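A possible explanation, sketched in plain Python under the assumption that priors with overlap below a threshold are later marked as background: an IoU can never exceed 1, so writing 2 guarantees that the best prior for each ground-truth box survives the threshold test even when its real overlap is low. The numbers below are made up.

```python
threshold = 0.5

# Best IoU with any ground-truth box, one entry per prior (hypothetical values).
best_truth_overlap = [0.10, 0.42, 0.05, 0.61]

# Prior 1 is the best match for some ground-truth box, although 0.42 < threshold.
best_prior_idx = [1]

positive_without_fill = [iou >= threshold for iou in best_truth_overlap]

# Equivalent of index_fill_(0, best_prior_idx, 2): any value above 1 would do.
for j in best_prior_idx:
    best_truth_overlap[j] = 2

positive_with_fill = [iou >= threshold for iou in best_truth_overlap]

print(positive_without_fill)
print(positive_with_fill)
```

Without the fill, a ground-truth box whose best prior has a low IoU would end up with no positive prior at all.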
Hi there~
The PriorBox encounters an error in multi-GPU mode. For example, when running on one GPU, the output size would be:
This is correct. But when running on 2 GPUs, the size of priors is (17464, 4), and on 3 GPUs it is (26196, 4), while the sizes of loc_data and conf_data remain the same as on 1 GPU.
P.S. I found this bug when applying net = torch.nn.DataParallel(net).cuda() in train.py.
Hope to see the solution.
Thanks.
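The reported sizes are exact multiples of the single-GPU prior count, which suggests (as a guess) that each replica returns its own copy of the priors and DataParallel concatenates them along dimension 0 when gathering the outputs. The sketch assumes the usual SSD300 layout of feature map sizes and boxes per location, neither of which is stated in this thread.

```python
feature_maps = [38, 19, 10, 5, 3, 1]       # assumed SSD300 feature map sizes
boxes_per_location = [4, 6, 6, 6, 4, 4]    # assumed default boxes per cell

# Total number of priors for one forward pass on one device.
num_priors = sum(f * f * b for f, b in zip(feature_maps, boxes_per_location))

# Sizes reported in the issue for 2 and 3 GPUs.
reported = {2: 17464, 3: 26196}
multiples = {gpus: size / num_priors for gpus, size in reported.items()}

print(num_priors, multiples)
```

If that reading is right, slicing the gathered priors back to the first num_priors rows, or keeping the priors outside the wrapped module's outputs, would be two ways to restore the expected shape.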
I saw the images that show the graph, but they are blurry. Is there a way or a script to reproduce the graph during training and inference? From the .gitignore file, it seems that visualize.py was used to generate those pictures. I need this because I think it would help a lot as a first step toward understanding the architecture and the functionality of the model.
iter 510 || Loss: 6.8001 || THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generated/../THCReduceAll.cuh line=334 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
File "train.py", line 231, in <module>
train()
File "train.py", line 183, in train
loss_l, loss_c = criterion(out, targets)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/users/gpu/utkrsh/code/ssd.pytorch/layers/modules/multibox_loss.py", line 137, in forward
conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1, self.num_classes)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/variable.py", line 72, in __getitem__
return MaskedSelect.apply(self, key)
File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 468, in forward
return tensor.masked_select(mask)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generated/../THCReduceAll.cuh:334
I am trying to train the network with a slight modification to the localization loss in multibox_loss.py. I keep getting this error message at the same line of code. Also, when training starts, there is a warning:
/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py:450: UserWarning: mask is not broadcastable to self, but they have the same number of elements. Falling back to deprecated pointwise behavior.
return tensor.masked_fill_(mask, value)
I am training with batch_size=32 in train.py and everything else is at the default value. I have tried to modify the code, but it has no effect on the warning and I keep getting this error.
Also, if I use a larger batch_size in train.py, like 40, I get this illegal memory access error much earlier than with size 32.
Any suggestions for what might be wrong?
import torch


def point_form(boxes):
    """ Convert prior_boxes to (xmin, ymin, xmax, ymax)
    representation for comparison to point form ground truth data.
    Args:
        boxes: (tensor) center-size default boxes from priorbox layers.
    Return:
        boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes.
    """
    return torch.cat((boxes[:, :2] - boxes[:, 2:]/2,     # xmin, ymin
                      boxes[:, :2] + boxes[:, 2:]/2), 1)  # xmax, ymax


def center_size(boxes):
    """ Convert prior_boxes to (cx, cy, w, h)
    representation for comparison to center-size form ground truth data.
    Args:
        boxes: (tensor) point_form boxes
    Return:
        boxes: (tensor) Converted cx, cy, w, h form of boxes.
    """
    # Both tensors go in one tuple; the dimension is a separate argument.
    return torch.cat(((boxes[:, 2:] + boxes[:, :2])/2,  # cx, cy
                      boxes[:, 2:] - boxes[:, :2]), 1)  # w, h
for i, k in enumerate(self.feature_maps):
    step_x = step_y = self.image_size/k
    for h, w in product(range(k), repeat=2):
        c_x = ((w+0.5) * step_x)
        c_y = ((h+0.5) * step_y)
        c_w = c_h = self.min_sizes[i] / 2
        s_k = self.image_size  # 300
        # aspect_ratio: 1,
        # size: min_size
        mean += [(c_x-c_w)/s_k, (c_y-c_h)/s_k,
                 (c_x+c_w)/s_k, (c_y+c_h)/s_k]
        if self.max_sizes[i] > 0:
            # aspect_ratio: 1
            # size: sqrt(min_size * max_size)/2
            c_w = c_h = sqrt(self.min_sizes[i] *
                             self.max_sizes[i])/2
            mean += [(c_x-c_w)/s_k, (c_y-c_h)/s_k,
                     (c_x+c_w)/s_k, (c_y+c_h)/s_k]
        # rest of prior boxes
        for ar in self.aspect_ratios[i]:
            if not (abs(ar-1) < 1e-6):
                c_w = self.min_sizes[i] * sqrt(ar)/2
                c_h = self.min_sizes[i] / sqrt(ar)/2
                mean += [(c_x-c_w)/s_k, (c_y-c_h)/s_k,
                         (c_x+c_w)/s_k, (c_y+c_h)/s_k]
Can you kindly verify? Thanks.
Hello,
I was looking at your implementation and I believe the input to your model is an image with RGB ordering. I was also looking at the Keras implementation, and it uses BGR values. I have also been testing with an mAP evaluation script, and it seems that I get better results, using the weights that you provided from the original Caffe implementation, when I use BGR instead of RGB. Do you happen to know which order we should follow when using the original Caffe weights?
Thank you very much :)
Loading base network...
Initializing weights...
Loading Dataset...
Training SSD on VOC2007
Timer: 6.7833 sec.
iter 0 || Loss: 26.1034 || Timer: 0.2098 sec.
iter 10 || Loss: 15.1629 || Timer: 0.2115 sec.
iter 20 || Loss: 15.4713 || Timer: 0.2101 sec.
iter 30 || Loss: 17.6274 || Timer: 0.2153 sec.
iter 40 || Loss: 31.7296 || Timer: 0.2107 sec.
iter 50 || Loss: nan || Timer: 0.2113 sec.
iter 60 || Loss: nan || Timer: 0.2073 sec.
iter 70 || Loss: nan || Timer: 0.2035 sec.
iter 80 || Loss: nan || Timer: 0.2090 sec.
iter 90 || Loss: nan || Timer: 0.2055 sec.
iter 100 || Loss: nan || Timer: 0.2196 sec.
iter 110 || Loss: nan || Timer: 0.2064 sec.
iter 120 || Loss: nan || Timer: 0.2257 sec.
iter 130 || Loss: nan || Timer: 0.2051 sec.
iter 140 || Loss: nan || Timer: 0.2142 sec.
iter 150 || Loss: nan || Timer: 0.2056 sec.
iter 160 || Loss: nan || Timer: 0.2122 sec.
iter 170 || Loss: nan || Timer: 0.2090 sec.
iter 180 || Loss: nan || Timer: 0.2091 sec.
iter 190 || Loss: nan || Timer: 0.2110 sec.
A better code style would be beneficial to all.
Do you have any idea why torchvision's vgg16 does not converge with your training pipeline? I just replaced your VGG model with the torchvision one and it does not converge. I tried many different parameters.
Hello, I don't understand why you compute out = weight*x and return out instead of returning x directly. Could you tell me the reason?
Thanks :)
def forward(self, x):
    norm = x.pow(2).sum(1).sqrt()+self.eps
    x/=norm.expand_as(x)
    out = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x  # <= here
    return out
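A likely reason, based on the SSD paper's description rather than on a statement from the author: after L2 normalization every position has unit norm, which is much smaller than the activations of the other layers, so a learnable per-channel scale brings the features back to a useful magnitude. Returning x directly would drop that scale. A single-position sketch in plain Python (the weight value 20 is an assumed initial scale):

```python
import math


def l2norm_scale(channels, weight, eps=1e-10):
    """L2-normalize one position's channel vector, then apply a per-channel scale."""
    norm = math.sqrt(sum(c * c for c in channels)) + eps
    return [w * c / norm for w, c in zip(weight, channels)]


pixel = [3.0, 4.0]                       # channel values at one spatial position
normalized = l2norm_scale(pixel, weight=[1.0, 1.0])
scaled = l2norm_scale(pixel, weight=[20.0, 20.0])

print(normalized)
print(scaled)
```

In the module quoted above, self.weight plays the role of weight here and is trained together with the rest of the network.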
When running 'python test.py', the output is the following:
Traceback (most recent call last):
File "test.py", line 73, in <module>
net.load_state_dict(torch.load(args.trained_model))
File "/home/qz/lzjqsdd/APP/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 331, in load_state_dict
.format(name))
Do you have any idea why this could be happening? Thanks
Hi,
I just finished the training process and ran the test process with eval.py; however, I got a much lower result (compared with the results reported in the readme, see here.). After some further digging, I have some concerns:
One more thing: you report that without pre-training and using data augmentation alone, you reach 77.43%. What settings do you use? I just Xavier-initialized all layers without a pretrained model, keeping all other parameters unchanged from your repo, and the training completely fails (at first, the loss is ~15; by 20k iterations it goes down to 7.x and stays there for the remaining iterations (max_iter=120k); the test mAP is 0.4, compared with the 71.x that I got with pre-training).
Thanks so much for your help!
Hongyang, Francis
Hello
I am wondering why you set the use of difficult training examples to False,
since I found that the original code uses the difficult training examples as well.
Thanks
Where can I find the SSD version 2 paper?
Hello
Firstly I really appreciate your work.
I wonder if you have in mind to support image size 512 soon.
Thanks
Hello
I have a question about the criteria that you used in config.py.
The original paper states that the scale factors are 'regularly spaced', but your definition of the scale factors seems quite different.
For example, say 4 feature maps are used for prediction; if we define Smin and Smax to be 0.2 and 0.8 respectively, the scale factors for the feature maps are (0.2, 0.4, 0.6, 0.8).
However, I found that your scale factors,
(30 (0.1), 60 (0.2), 111 (0.37), 162 (0.54), 213 (0.71), 264 (0.88)),
do not seem to be regularly spaced. The differences between scale factors are (0.1, 0.17, 0.17, 0.17, 0.17). Do you have any special reason for this (e.g. improving the accuracy)?
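Both sets of numbers can be reproduced with a few lines. The first function is the evenly spaced formula from the example above. The second mirrors, as far as I can tell, the integer-percent stepping used in the original Caffe training script, where the first layer gets a separate, smaller scale; its name, parameter names and defaults are assumptions made for this sketch.

```python
def regular_scales(s_min, s_max, m):
    """s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1) for k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def ratio_stepped_sizes(min_dim=300, min_ratio=20, max_ratio=90, num_layers=6):
    """Box sizes in pixels from integer percent ratios with a floored step."""
    # The step is computed over all layers except the first, then floored.
    step = (max_ratio - min_ratio) // (num_layers - 2)
    sizes = [min_dim * r // 100 for r in range(min_ratio, max_ratio + 1, step)]
    # The first layer is handled separately with a 10 percent scale.
    return [min_dim * 10 // 100] + sizes


even = [round(s, 2) for s in regular_scales(0.2, 0.8, 4)]
sizes = ratio_stepped_sizes()
print(even)
print(sizes)
```

Under these assumptions the irregular first gap comes from the separately chosen first layer, and the 0.17 spacing comes from flooring 70 / 4 to 17 percent.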
Any comments will be appreciated.
Thanks in advance.
I ran the demo and it returned 'tensors are on different GPUs', but I have only one GPU.
The demo ran successfully on the CPU.
Could you publish the steps for running it on the GPU?
Thank you very much!
hi,
have you successfully run train.py?
I encountered a runtime error saying "div_ only supports scalar multiplication" from the line "x/=norm.expand_as(x)" in modules/l2norm.py.
Then I changed this line to "x = x.div(norm.expand_as(x))" but got another CUDA runtime error, "device-side assert triggered", from the line "return torch.cat([g_cxcy, g_wh], 1)" in box_utils.py.
BTW, I am using Python 2.7 instead of Python 3.
When I run the Demo Jupyter Notebook, I got a runtime error when "y = net(xx)". I have a GPU[0].
Thank you very much
RuntimeError Traceback (most recent call last)
in ()
3 xx = xx.cuda()
4 print(xx.t())
----> 5 y = net(xx)
/home/tech/anaconda3/envs/tf35/lib/python3.5/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
204
205 def __call__(self, *input, **kwargs):
--> 206 result = self.forward(*input, **kwargs)
207 for hook in self._forward_hooks.values():
208 hook_result = hook(self, input, result)
/home/tech/ssd.pytorch/ssd.py in forward(self, x)
72 # apply vgg up to conv4_3 relu
73 for k in range(23):
---> 74 x = self.vgg[k](x)
75
76 s = self.L2Norm(x)
/home/tech/anaconda3/envs/tf35/lib/python3.5/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
204
205 def __call__(self, *input, **kwargs):
--> 206 result = self.forward(*input, **kwargs)
207 for hook in self._forward_hooks.values():
208 hook_result = hook(self, input, result)
/home/tech/anaconda3/envs/tf35/lib/python3.5/site-packages/torch/nn/modules/conv.py in forward(self, input)
235 def forward(self, input):
236 return F.conv2d(input, self.weight, self.bias, self.stride,
--> 237 self.padding, self.dilation, self.groups)
238
239
/home/tech/anaconda3/envs/tf35/lib/python3.5/site-packages/torch/nn/functional.py in conv2d(input, weight, bias, stride, padding, dilation, groups)
38 f = ConvNd(_pair(stride), _pair(padding), _pair(dilation), False,
39 _pair(0), groups, torch.backends.cudnn.benchmark, torch.backends.cudnn.enabled)
---> 40 return f(input, weight, bias)
41
42
RuntimeError: tensors are on different GPUs
area_a = ((box_a[:, 2]-box_a[:, 0]) *
(box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter) # [A,B]
area_b = ((box_b[:, 2]-box_b[:, 0]) *
(box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter)
Why are the unsqueeze dimensions different? I don't understand.
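The two calls differ because the result is a matrix with one row per box in box_a and one column per box in box_b. unsqueeze(1) turns the A areas into a column that is repeated across columns, and unsqueeze(0) turns the B areas into a row that is repeated down the rows, so entry [i][j] combines area_a[i] with area_b[j]. The same computation with explicit loops, in plain Python (pairwise_iou is a name chosen for this sketch):

```python
def pairwise_iou(box_a, box_b):
    """IoU matrix of shape [A][B] for boxes in (x1, y1, x2, y2) form."""

    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    result = []
    for a in box_a:          # row index i: area_a[i] is constant along the row
        row = []
        for b in box_b:      # column index j: area_b[j] is constant along the column
            iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
            ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = iw * ih
            row.append(inter / (area(a) + area(b) - inter))
        result.append(row)
    return result


box_a = [(0.0, 0.0, 2.0, 2.0)]
box_b = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0), (5.0, 5.0, 6.0, 6.0)]
iou = pairwise_iou(box_a, box_b)
print(iou)
```

The expand_as(inter) calls in the tensor version do what the nested loops do here, without copying data.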
train_transform() is not used in base_transform. So does this project use RandomHorizontalFlip?
Or is this function called somewhere else?
Hello
It seems the code trains the network (SSD) without the difficult training examples.
Additionally, I trained the network with the 07++12 train set (07 trainval, 07 test, 12 trainval) and tested on the 12 test set using the official server. The result was 74.1%, about 2% below the latest version of SSD300 (75.8%). Of course there will be differences between the libraries (PyTorch vs. Caffe), but it seems that a network trained only on the easy examples cannot reach the original performance.
Any comments will be appreciated.
Thanks in advance.
I'm very new to PyTorch. I'm getting these errors when I run the test.py file:
File "test.py", line 93, in <module>
thresh=args.visual_threshold)
File "test.py", line 54, in test_net
y = net(x) # forward pass
File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/workspace/ssd.pytorch/ssd.py", line 102, in forward
self.priors # default boxes
File "/workspace/ssd.pytorch/layers/functions/detection.py", line 51, in forward
decoded_boxes = decode(loc_data[i], prior_data, self.variance)
File "/workspace/ssd.pytorch/layers/box_utils.py", line 152, in decode
priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/tensor.py", line 283, in __mul__
return self.mul(other)
TypeError: mul received an invalid combination of arguments - got (torch.FloatTensor), but expected one of:
I ran into this issue as well. I was able to fix it by setting cudnn.benchmark = False and setting --batch_size to 8.