amdegroot / ssd.pytorch Goto Github PK

A PyTorch Implementation of Single Shot MultiBox Detector

License: MIT License

Python 95.90% Shell 4.10%

pytorch deep-learning ssd object-detection computer-vision machine-learning image-recognition webcam

ssd.pytorch's Issues

PriorBox: the box is out of the image

prior_box.py uses version v2, in which every box is described as (x, y, w, h) instead of the v1 form (x1, y1, x2, y2).
Applying clamp_(max=1, min=0) to this representation can leave the 'bottom' boxes extending outside the image. For example, output[-5, :] is 0.8333, 0.8333, 0.5020, 1.000, so x2 and y2 fall outside the image. I am not sure whether this affects accuracy; it could be changed to behave like v1 (though it may not be a problem).
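A minimal sketch of the effect described above, in plain Python rather than tensors. It assumes the four values are (cx, cy, w, h) as fractions of the image size; the helper names are illustrative and not taken from the repository.

```python
def center_to_corners(box):
    """Convert (cx, cy, w, h) to (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def clamp01(value):
    """Clamp a scalar to the unit interval."""
    return max(0.0, min(1.0, value))


# Prior quoted in the issue; every entry is already inside [0, 1].
prior = (0.8333, 0.8333, 0.5020, 1.0)

corners = center_to_corners(prior)
# Collect the corner coordinates that leave the image.
outside = [c for c in corners if c < 0.0 or c > 1.0]

# Clamping after the conversion keeps the whole box inside the image.
clipped = tuple(clamp01(c) for c in corners)
print(corners, outside, clipped)
```

Clamping the center-size values only bounds the center and the size separately, so it cannot guarantee that the corners stay inside the image.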

bug: test.py (BaseTransform)

Line 41, x = torch.from_numpy(transform(img)[0]).permute(2, 0, 1), does not convert BGR to RGB. This is not equivalent to dataset = VOCDetection(args.voc_root, [('2007', set_type)], BaseTransform(300, dataset_mean), AnnotationTransform()), which does convert BGR to RGB.

So I think it would be better to move line 138 of voc0712, img = img[:, :, (2, 1, 0)], into the base_transform function. (The results do not change much with vis_threshold=0.6; however, in eval.py, using BaseTransform outside the dataset will change the mAP.)

Error while modified to train my own dataset

Hi, I recently modified your code to train on my own dataset.
Basically, I made the following changes:
1. Changed the classes and num_classes
2. Changed the dataset path
3. Changed the RGB mean value of the dataset

Then I ran the modified train.py and encountered an error:
CUDA_LAUNCH_BLOCKING=1 python train_button.py
Loading base network...
Initializing weights...
Loading Dataset...
Training SSD on button
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [20,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [21,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [22,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [24,0,0] Assertion t >= 0 && t < n_classes failed.
/b/wheel/pytorch-src/torch/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THCUNN/generic/ClassNLLCriterion.cu line=83 error=59 : device-side assert triggered
Traceback (most recent call last):
File "train_button.py", line 204, in
train()
File "train_button.py", line 160, in train
loss_l, loss_c = criterion(out, targets)
File "/home/deep-server/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/deep-server/Documents/Jingya/ssd.pytorch/modules/multibox_loss.py", line 110, in forward
loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False)
File "/home/deep-server/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/torch/nn/functional.py", line 509, in cross_entropy
return nll_loss(log_softmax(input), target, weight, size_average)
File "/home/deep-server/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/torch/nn/functional.py", line 477, in nll_loss
return f(input, target)
File "/home/deep-server/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
output, *self.additional_args)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:83

Can you please help me find the possible reason for the error?
It should be related to the line loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False),
but I don't understand how it could go wrong.
Thank you in advance.
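The assertion in the log, t >= 0 && t < n_classes, says that at least one target index lies outside the range covered by the classifier output. A minimal sketch of a check that can be run on the labels before training; the function name and the example values are hypothetical, and it assumes that num_classes counts the background class as well.

```python
def out_of_range_labels(labels, num_classes):
    """Return the labels that fall outside [0, num_classes)."""
    return [t for t in labels if not 0 <= t < num_classes]


# Hypothetical setup: one foreground class plus background.
num_classes = 2
labels = [0, 1, 1, 2]  # the value 2 would trip the CUDA assertion

bad = out_of_range_labels(labels, num_classes)
print(bad)
```

If the list is not empty, either num_classes is too small or the dataset's label mapping produces indices the model has no output for.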

RuntimeError when converting image to tensor

Hi @amdegroot , I was trying to get the demo running and I'm having a problem when calling transform(img) of the BaseTransform class.

When doing python test.py the output is the following.

Finished loading model!
Testing image 1/4952....
Traceback (most recent call last):
  File "test.py", line 84, in <module>
    thresh=args.visual_threshold)
  File "test.py", line 39, in test_net
    x = Variable(transform(img).unsqueeze(0))
  File "/home/arian/Documents/proyecto-integrador/models/ssd/ssd-pytorch/data/data_augment.py", line 119, in __call__
    return torch.Tensor(img)
RuntimeError: tried to construct a tensor from a nested float sequence, but found an item of type numpy.float32 at index (0, 0, 0)

This happens in the demo notebook and in the test.py file.
Do you have any idea why this could be happening?

Thanks,
Arian.

Error in training

I got the following error when training

THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu line=226 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train_cars.py", line 232, in <module>
    train()
  File "train_cars.py", line 184, in train
    loss_l, loss_c = criterion(out, targets)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mshah/code/ssd.pytorch/layers/modules/multibox_loss.py", line 70, in forward
    match(self.threshold,truths,defaults,self.variance,labels,loc_t,conf_t,idx)
  File "/home/mshah/code/ssd.pytorch/layers/box_utils.py", line 107, in match
    loc = encode(matches, priors, variances)
  File "/home/mshah/code/ssd.pytorch/layers/box_utils.py", line 133, in encode
    return torch.cat([g_cxcy, g_wh], 1)  # [num_priors,4]
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu:226

How can I fix this?

error:Dimension out of range

When I run test.py, the statement "y = net(x)" raises an error:
RuntimeError: dimension out of range - got 1 but the tensor is only 1D

Change:

'--cuda', default=True

Thank you for your help.

[question] Where can I calculate the MAP?

Hello @amdegroot,
thank you for making your code available. I am currently working on the ssd_keras port. However, we are missing the mAP score, and I saw that you have already calculated yours. Could you point out which code you used to evaluate your SSD port?

I also saw that you are missing the data augmentation part. Maybe you could take a look here. It is a Python generator; it currently lacks the crop transformation, but it has helped me reach a better loss.

Thank you!

Possible bug?

_,loss_idx = loss_c.sort(1, descending=True)
 _,idx_rank = loss_idx.sort(1)

Just wondering, is this a bug? I don't think you should put descending=True to find idx_rank.
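A minimal sketch of what the two sorts compute, using plain Python lists in place of tensors; the argsort helper is illustrative. The first sort orders indices by descending loss, and the second (ascending) sort of those indices gives each element its rank in that ordering, so descending=True belongs only on the first call.

```python
def argsort(values, descending=False):
    """Return the indices that would sort `values`."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=descending)


loss_c = [0.3, 2.5, 0.1, 1.7]

# Indices ordered from the highest loss to the lowest.
loss_idx = argsort(loss_c, descending=True)

# Sorting the indices in ascending order yields the rank of each element.
idx_rank = argsort(loss_idx)

# Keep the two hardest examples: those whose rank is below num_neg.
num_neg = 2
neg = [rank < num_neg for rank in idx_rank]
print(loss_idx, idx_rank, neg)
```

The element with the largest loss gets rank 0, so comparing idx_rank against num_neg selects the highest-loss entries.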

Why I keep getting CUDA error when I change the value of the weight decay?

The error is as follows:

It happens with many different weight decay values.

How to derive the 'steps' in config.py?

The 'steps' in config.py is [8, 16, 32, 64, 100, 300]. How are these numbers derived? I have read the paper, which says 'f_k is the size of the k-th square feature map', but I cannot relate that to these numbers. Thanks.
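One way to relate the numbers, sketched under an assumption: the feature-map sizes below (38, 19, 10, 5, 3, 1) are the usual SSD300 values and are not stated in the question. Each step is roughly the input size divided by the feature-map size, that is, the distance in pixels between neighbouring cells.

```python
image_size = 300
feature_maps = [38, 19, 10, 5, 3, 1]  # assumed SSD300 feature-map sizes
steps = [8, 16, 32, 64, 100, 300]     # values quoted in the question

# Exact cell spacing in pixels for each feature map.
exact = [image_size / f_k for f_k in feature_maps]

for f_k, spacing, step in zip(feature_maps, exact, steps):
    print(f_k, round(spacing, 2), step)
```

The first four entries match the network's cumulative stride (powers of two) rather than the exact quotient, which is why 32 and 64 appear where 300/10 and 300/5 would give 30 and 60.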

Demo and Evaluation Detection

Hello, I can finally run your code, but I don't know why the mAP is 0.0 for all classes. When I use demo.ipynb, the detection completes but I can't plot the bounding boxes.

How to fix this?
-Thank you-

Can't train

kaan@ALTAR:ssd.pytorch$ python3 train.py
Loading base network...
Initializing weights...
Loading Dataset...
Training SSD on VOC0712
Traceback (most recent call last):
File "train.py", line 231, in
train()
File "train.py", line 170, in train
images, targets = next(batch_iterator)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 201, in next
return self._process_next_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 40, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/kaan/ssd.pytorch/data/voc0712.py", line 117, in getitem
im, gt, h, w = self.pull_item(index)
File "/home/kaan/ssd.pytorch/data/voc0712.py", line 129, in pull_item
height, width, channels = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'

AttributeError: 'NoneType' object has no attribute 'shape'

This error occurs when I run 'python -m demo.live'. The environment is conda with Python 3.5. How can I solve it?

NaN values at Multibox encoding

I've tried to train a detector on my own dataset; however, at training time the localization loss is NaN because of negative values in g_wh (layers/box_utils.py#L137). I don't know whether this error is related to the format of the bounding boxes or to the output of the SSD model.

I would like to know whether I am doing something wrong while loading the dataset or whether the error comes from a bug in the base implementation.
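A negative g_wh means the matched box has a negative width or height before the logarithm is taken, which usually points at the annotation format (for example xmax smaller than xmin). A minimal sketch of a sanity check on the ground-truth boxes, assuming they are (xmin, ymin, xmax, ymax); the function name and sample values are hypothetical.

```python
import math


def invalid_boxes(boxes):
    """Return indices of boxes with non-positive width or height."""
    return [i for i, (x1, y1, x2, y2) in enumerate(boxes)
            if x2 <= x1 or y2 <= y1]


boxes = [
    (0.1, 0.2, 0.5, 0.6),  # valid
    (0.7, 0.2, 0.3, 0.6),  # xmax < xmin, width is negative
]
bad = invalid_boxes(boxes)

# For a valid box the log-ratio used in the encoding is finite.
prior_w = 0.2
gt_w = boxes[0][2] - boxes[0][0]
g_w = math.log(gt_w / prior_w)
print(bad, g_w)
```

In plain Python math.log raises ValueError for a negative argument; a tensor logarithm returns NaN instead, which then propagates into the loss.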

how to obtain weiliu89 weights

I was wondering how I could obtain weiliu89 weights to validate 77.2%. I have changed some parts of the code and just want to validate if it is still reproducible. Thanks

Do most models ignore difficult instances in training and testing in the PASCAL VOC competition?

I have not read the py-faster-rcnn source code because its readability is poor. I noticed that you do not keep difficult instances in training and testing on PASCAL VOC. So please tell me: do most models also ignore difficult instances in training and testing in the PASCAL VOC competition? Thank you!

Can I know how the loss in SSD should look like?

Do you still have your loss logs? Currently my loss keeps hovering around 20. What loss value is considered reasonable?

Time profiling result is inconsistent with result from original caffe SSD

I am wondering about the time consumption of each part (VGG, Extra, multi_box, detection).
In the Caffe version, the VGG part accounts for up to 80 percent of the time.
However, in this version, the distribution of time consumption is as follows:

Total time: 0.018 seconds per image
VGG part 8.4%
Extra layer 2.8%
Multi_box 61%
detect 27.5%
Most of the time is spent in Multi_box and detect.
I measured it with Python's time.time().

The total time per image is almost the same for both:
caffe : 19ms
pytorch : 18ms

I am wondering why this inconsistency happens?
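One possible explanation is that CUDA kernels are launched asynchronously, so wall-clock timing around a stage can charge earlier GPU work to whichever later stage first waits for the results. A minimal sketch of a timing helper that synchronizes before and after the timed region; the helper itself is hypothetical, and with PyTorch on a GPU the sync argument would be torch.cuda.synchronize.

```python
import time


def timed(fn, *args, sync=None, repeats=10):
    """Average wall-clock time of fn(*args), with optional device sync."""
    if sync is not None:
        sync()  # finish any pending work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        out = fn(*args)
    if sync is not None:
        sync()  # wait for the timed work itself to finish
    return (time.perf_counter() - start) / repeats, out


# CPU-only usage example; no synchronization is needed here.
elapsed, total = timed(sum, range(1000))
print(elapsed, total)
```

Timing each stage this way gives per-stage numbers that can be compared with the Caffe breakdown.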

0/1-indexing error in eval.py

I note that 0-based indexing is used when loading the XML file:

        obj_struct['bbox'] = [int(bbox.find('xmin').text) - 1,
                              int(bbox.find('ymin').text) - 1,
                              int(bbox.find('xmax').text) - 1,
                              int(bbox.find('ymax').text) - 1]

However, the detections are written with 1-based indexing:

                # the VOCdevkit expects 1-based indices
                for k in range(dets.shape[0]):
                    f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
                            format(index[1], dets[k, -1],
                                   dets[k, 0] + 1, dets[k, 1] + 1,
                                   dets[k, 2] + 1, dets[k, 3] + 1))

If you use consistent indexing, the mAP for the model ssd300_mAP_77.43_v2.pth should be 0.775538.

Variance not used in priorbox?

I notice that you do not use the variance in priorbox. Is this intended? The Caffe code has the following, which you seem to have left out:

top_data += top[0]->offset(0, 1);
  if (variance_.size() == 1) {
    caffe_set<Dtype>(dim, Dtype(variance_[0]), top_data);
  } else {
    int count = 0;
    for (int h = 0; h < layer_height; ++h) {
      for (int w = 0; w < layer_width; ++w) {
        for (int i = 0; i < num_priors_; ++i) {
          for (int j = 0; j < 4; ++j) {
            top_data[count] = variance_[j];
            ++count;
          }
        }
      }
    }
  }
}

However, I do not understand what offset and caffe_set do. Do you have any idea?
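A rough Python rendering of the quoted Caffe loop, to show what it produces. In Caffe, top[0]->offset(0, 1) advances the pointer to the second channel of the output blob, and caffe_set fills a buffer with a constant. The variance values used below are an assumption (the common SSD setting), not taken from the issue.

```python
def variance_channel(layer_height, layer_width, num_priors, variance):
    """Mimic the quoted loop: one 4-vector of variances per prior box."""
    dim = layer_height * layer_width * num_priors * 4
    if len(variance) == 1:
        # caffe_set: fill the whole channel with one constant.
        return [variance[0]] * dim
    out = []
    for _ in range(layer_height * layer_width * num_priors):
        out.extend(variance)  # copy variance[0..3] for this prior
    return out


variance = [0.1, 0.1, 0.2, 0.2]  # assumed values
channel = variance_channel(1, 2, 3, variance)
print(len(channel), channel[:4])
```

So the Caffe layer only stores the variances next to the prior coordinates. In this repository the tracebacks elsewhere on this page show self.variance being passed to match and decode, so the variance appears to be applied there rather than stored in the prior tensor.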

RunTime Error in Training with default values

python train.py
Loading base network...
Initializing weights...
Loading Dataset...
Training SSD on VOC0712
Traceback (most recent call last):
  File "train.py", line 232, in <module>
    train()
  File "train.py", line 181, in train
    out = net(images)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 70, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/gpu/utkrsh/code/ssd.pytorch/ssd.py", line 76, in forward
    s = self.L2Norm(x)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/gpu/utkrsh/code/ssd.pytorch/layers/modules/l2norm.py", line 21, in forward
    x/=norm.expand_as(x)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/variable.py", line 725, in expand_as
    return Expand.apply(self, (tensor.size(),))
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 111, in forward
    result = i.expand(*new_size)
RuntimeError: The expanded size of the tensor (512) must match the existing size (8) at non-singleton dimension 1. at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/T$CTensor.c:323

I am getting the above stack trace after running train.py for default values. The dataset and weights were downloaded in the default location.
I am using python 3.6 and pytorch 0.2.0
I do understand the meaning of the error, I am just not able to find the source. Can anyone point in the right direction?

How to train own datasets?

I have my own labeled dataset. How can I train on it?

Error when run test.py

Hi, there is a problem hope you can help me, thank you.

File "/home/hd/ssd.pytorch/data/voc.py", line 222
gts.append([label, *(int(bb.text) - 1 for bb in bbox)])
^
SyntaxError: invalid syntax

This error occurs when I run test.py. The caret (^) points at the *. Thank you again.
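Star-unpacking inside a list display was added in Python 3.5 (PEP 448), so older interpreters reject this line with a SyntaxError. A minimal sketch of an equivalent form that parses on older versions as well; the label and coordinate strings are made-up stand-ins for the parsed XML values.

```python
label = "dog"
coords = ["48", "240", "195", "371"]  # stand-ins for the bbox element texts

# Form used in the repository; needs Python 3.5 or newer.
gts_new = [label, *(int(c) - 1 for c in coords)]

# Equivalent form without star-unpacking.
gts_old = [label] + [int(c) - 1 for c in coords]

print(gts_new == gts_old, gts_old)
```

Running the script with a Python 3.5+ interpreter avoids the need to change the code at all.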

Possible bug for the `Resize` class in augmentations.py

Hi all,

When an image is resized, the bounding boxes should be scaled accordingly, but the Resize class in augmentations.py does not scale the bounding boxes. Is this a bug?
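Whether this is a bug depends on the coordinate format at that point in the pipeline. A minimal sketch, assuming the boxes have already been converted to fractions of the image size before the resize (an assumption about the pipeline, not something stated in the issue): fractional coordinates are unchanged by resizing, so nothing needs to be scaled. The helper names are illustrative.

```python
def to_percent(box, width, height):
    """Convert pixel corners to fractions of the image size."""
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def to_absolute(box, width, height):
    """Convert fractional corners back to pixels."""
    x1, y1, x2, y2 = box
    return (x1 * width, y1 * height, x2 * width, y2 * height)


box_px = (50, 100, 250, 300)           # box in a 500x400 image
box_pct = to_percent(box_px, 500, 400)

# After resizing the image to 300x300 the fractions are still valid.
resized_px = to_absolute(box_pct, 300, 300)

# Scaling the pixel box directly gives the same result.
scaled_px = (50 * 300 / 500, 100 * 300 / 400, 250 * 300 / 500, 300 * 300 / 400)
print(box_pct, resized_px, scaled_px)
```

If the boxes are still in pixel coordinates when Resize runs, they would need explicit scaling as the issue suggests.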

Why "best_truth_overlap.index_fill_(0, best_prior_idx, 2)?"

How does the following line ensure the best prior? Why 2? What happens if this line is not included? It seems to me that this line is not necessary. Thanks.

best_truth_overlap.index_fill_(0, best_prior_idx, 2)  # ensure best prior
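A minimal sketch of the purpose of that line, using plain lists and made-up overlap values. IoU never exceeds 1, so writing 2 into the entries of the priors that are the best match for some ground-truth box guarantees they pass any overlap threshold, even when their real IoU is below it.

```python
threshold = 0.5

# Best IoU of each prior with any ground-truth box (made-up values).
best_truth_overlap = [0.10, 0.45, 0.62, 0.30]

# Prior 1 is the best available match for some ground-truth box (IoU 0.45).
best_prior_idx = [1]

without_fill = [ov >= threshold for ov in best_truth_overlap]

for j in best_prior_idx:
    best_truth_overlap[j] = 2  # any value above 1 would do

with_fill = [ov >= threshold for ov in best_truth_overlap]
print(without_fill, with_fill)
```

Without the fill, a ground-truth box whose best prior overlaps it by less than the threshold would end up with no positive prior at all.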

Number of priors wrong on Multi-GPU mode

Hi there~

The PriorBox would encounter an error on multi-GPU mode. For example, when running on one GPU, the output size would be:

size(loc_data) = (16, 8732, 4)
size(conf_data) = (16, 8732, 21)
size(priors) = (8732, 4)

This is correct. But when running on 2 GPUs, the size of priors becomes (17464, 4), and (26196, 4) on 3 GPUs, while the sizes of loc_data and conf_data remain the same as on 1 GPU.

ps. I found this bug when applying net = torch.nn.DataParallel(net).cuda() in train.py

Hope to see the solution.

Thanks.
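A likely cause is that DataParallel gathers the outputs of all replicas by concatenating them along the first dimension. loc_data and conf_data have a batch dimension, so the per-GPU halves join back into a batch of 16; priors has no batch dimension, so the identical copies are stacked. A minimal simulation with lists; the gather helper is illustrative and the slicing at the end is one possible workaround, not the repository's fix.

```python
def gather(chunks):
    """Concatenate per-replica outputs along the first dimension."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out


num_priors = 8732
priors_one_gpu = [(0.0, 0.0, 1.0, 1.0)] * num_priors    # placeholder rows
loc_one_gpu = [[None] * num_priors for _ in range(8)]   # 8 images per GPU

loc = gather([loc_one_gpu, loc_one_gpu])          # 16 images
priors = gather([priors_one_gpu, priors_one_gpu])  # 2 * 8732 rows

# Workaround sketch: keep only as many priors as there are predictions.
priors_fixed = priors[:len(loc[0])]
print(len(loc), len(priors), len(priors_fixed))
```

Because every replica produces the same priors, keeping the first 8732 rows loses nothing.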

How to visualize the computational graph?

I saw the images that represent the graph, but they are blurred. Is there a way or a script to reproduce the graph during training and inference? The .gitignore file suggests that visualize.py was used to generate those pictures. I need this because I think it would help a lot as a first step toward understanding the architecture and how the model works.

cuda Runtime Error (77): an illegal memory access was encountered

iter 510 || Loss: 6.8001 || THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generated/../THCReduceAll.cuh line=334 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "train.py", line 231, in <module>
    train()
  File "train.py", line 183, in train
    loss_l, loss_c = criterion(out, targets)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/gpu/utkrsh/code/ssd.pytorch/layers/modules/multibox_loss.py", line 137, in forward
    conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1, self.num_classes)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/variable.py", line 72, in __getitem__
    return MaskedSelect.apply(self, key)
  File "/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 468, in forward
    return tensor.masked_select(mask)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generated/../THCReduceAll.cuh:334

I am trying to train the network with a slight modification to the localization loss in multibox_loss.py. I keep getting this error message for the same line of code. Also, when training starts, there is a warning:

/users/gpu/utkrsh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py:450: UserWarning: mask is not broadcastable to self, but they have the same number of elements. Falling back to deprecated pointwise behavior.
  return tensor.masked_fill_(mask, value)

I am training with batch_size=32 in train.py and everything else is at the default value. I have tried to modify the code but there is no impact on the warning and I keep getting this error.
Also, if I use a larger batch_size in train.py like 40, I get this illegal memory access error much earlier than with size 32.
Any suggestions for what might be wrong?

How to derive the math in box_utils.py?

def point_form(boxes):
    """ Convert prior_boxes to (xmin, ymin, xmax, ymax)
    representation for comparison to point form ground truth data.
    Args:
        boxes: (tensor) center-size default boxes from priorbox layers.
    Return:
        boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes.
    """
    return torch.cat((boxes[:, :2] - boxes[:, 2:]/2,     # xmin, ymin
                     boxes[:, :2] + boxes[:, 2:]/2), 1)  # xmax, ymax


def center_size(boxes):
    """ Convert point_form boxes to (cx, cy, w, h)
    representation for comparison to center-size form ground truth data.
    Args:
        boxes: (tensor) point_form boxes
    Return:
        boxes: (tensor) Converted cx, cy, w, h form of boxes.
    """
    # torch.cat takes a tuple of tensors followed by the dimension.
    return torch.cat(((boxes[:, 2:] + boxes[:, :2])/2,  # cx, cy
                      boxes[:, 2:] - boxes[:, :2]), 1)  # w, h

            for i, k in enumerate(self.feature_maps):
                step_x = step_y = self.image_size/k
                for h, w in product(range(k), repeat=2):
                    c_x = ((w+0.5) * step_x)
                    c_y = ((h+0.5) * step_y)
                    c_w = c_h = self.min_sizes[i] / 2
                    s_k = self.image_size  # 300
                    # aspect_ratio: 1,
                    # size: min_size
                    mean += [(c_x-c_w)/s_k, (c_y-c_h)/s_k,
                             (c_x+c_w)/s_k, (c_y+c_h)/s_k]
                    if self.max_sizes[i] > 0:
                        # aspect_ratio: 1
                        # size: sqrt(min_size * max_size)/2
                        c_w = c_h = sqrt(self.min_sizes[i] *
                                         self.max_sizes[i])/2
                        mean += [(c_x-c_w)/s_k, (c_y-c_h)/s_k,
                                 (c_x+c_w)/s_k, (c_y+c_h)/s_k]
                    # rest of prior boxes
                    for ar in self.aspect_ratios[i]:
                        if not (abs(ar-1) < 1e-6):
                            c_w = self.min_sizes[i] * sqrt(ar)/2
                            c_h = self.min_sizes[i] / sqrt(ar)/2
                            mean += [(c_x-c_w)/s_k, (c_y-c_h)/s_k,
                                     (c_x+c_w)/s_k, (c_y+c_h)/s_k]

When I cross reference with prior_box.py, it seems that the math does not give what is written in the comments. center_size seems right but I think you also need to divide by 2 in the second expression for w and h?
I cannot derive the math for point form.

Can you kindly verify? Thanks.
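A worked round trip may help with the derivation, written for a single box in plain Python instead of tensors. With (cx, cy, w, h), the corners are the center minus and plus half the size; going back, the center is the average of the corners and the size is their full difference, so no division by 2 is needed for w and h.

```python
def point_form(box):
    """(cx, cy, w, h) -> (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def center_size(box):
    """(xmin, ymin, xmax, ymax) -> (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)


box = (0.5, 0.5, 0.2, 0.4)
corners = point_form(box)     # approximately (0.4, 0.3, 0.6, 0.7)
restored = center_size(corners)
print(corners, restored)
```

In the prior_box.py excerpt, c_w and c_h are already half-sizes (min_size / 2), which is why (c_x - c_w) and (c_x + c_w) match the xmin and xmax expressions of point_form.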

RGB vs BGR?

Hello,
I was looking at your implementation and I believe the input to your model is an image with RGB ordering. I was also looking at the Keras implementation, which uses BGR values. I have also been testing with a mAP evaluation script, and it seems that, using the weights you provided from the original Caffe implementation, I get better results with BGR than with RGB. Do you happen to know which order we should use with the original Caffe weights?

Thank you very much :)

the training error will go to Nan by using your default parameter

Loading base network...
Initializing weights...
Loading Dataset...
Training SSD on VOC2007
Timer: 6.7833 sec.
iter 0 || Loss: 26.1034 || Timer: 0.2098 sec.
iter 10 || Loss: 15.1629 || Timer: 0.2115 sec.
iter 20 || Loss: 15.4713 || Timer: 0.2101 sec.
iter 30 || Loss: 17.6274 || Timer: 0.2153 sec.
iter 40 || Loss: 31.7296 || Timer: 0.2107 sec.
iter 50 || Loss: nan || Timer: 0.2113 sec.
iter 60 || Loss: nan || Timer: 0.2073 sec.
iter 70 || Loss: nan || Timer: 0.2035 sec.
iter 80 || Loss: nan || Timer: 0.2090 sec.
iter 90 || Loss: nan || Timer: 0.2055 sec.
iter 100 || Loss: nan || Timer: 0.2196 sec.
iter 110 || Loss: nan || Timer: 0.2064 sec.
iter 120 || Loss: nan || Timer: 0.2257 sec.
iter 130 || Loss: nan || Timer: 0.2051 sec.
iter 140 || Loss: nan || Timer: 0.2142 sec.
iter 150 || Loss: nan || Timer: 0.2056 sec.
iter 160 || Loss: nan || Timer: 0.2122 sec.
iter 170 || Loss: nan || Timer: 0.2090 sec.
iter 180 || Loss: nan || Timer: 0.2091 sec.
iter 190 || Loss: nan || Timer: 0.2110 sec.

Can you follow PEP 8?

Better style would be beneficial to all.

Do you have any idea why torchvision vgg16 cannot converge with your training pipeline?

Do you have any idea why torchvision vgg16 cannot converge with your training pipeline? I just changed your vgg model to torchvision ones and it can't converge. I tried many different parameters

A question about the L2Norm.py code

Hello, I don't understand why you compute out = weight * x before returning, instead of returning x directly. Could you tell me the reason?
Thanks :)

def forward(self, x):
    norm = x.pow(2).sum(1).sqrt()+self.eps
    x/=norm.expand_as(x)
    out = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x  # <= here
    return out
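A minimal sketch of the idea for a single spatial position, in plain Python. After the division every position has unit L2 norm across channels, which is a very small magnitude; the per-channel weight is a learnable scale that restores a useful range. The initial value of 20 follows the SSD paper and is an assumption about this repository's setting.

```python
import math


def l2norm(channels, weight, eps=1e-10):
    """Normalize one position across channels, then rescale per channel."""
    norm = math.sqrt(sum(v * v for v in channels)) + eps
    return [w * v / norm for w, v in zip(weight, channels)]


features = [3.0, 4.0]   # two channels at one position, norm 5
weight = [20.0, 20.0]   # assumed initial scale, learned during training

out = l2norm(features, weight)
out_norm = math.sqrt(sum(v * v for v in out))
print(out, out_norm)
```

Returning x directly would fix the feature norm at 1 and leave the network no way to adjust it, whereas the weight lets training choose the scale for each channel.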

KeyError: 'unexpected key "0.weight" in state_dict'

when doing 'python test.py',the output is the following:
Traceback (most recent call last):
File "test.py", line 73, in
net.load_state_dict(torch.load(args.trained_model))
File "/home/qz/lzjqsdd/APP/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 331, in load_state_dict .format(name))
Do you have any idea why this could be happening? Thanks
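An unexpected key such as "0.weight" usually means the checkpoint was saved from a different module than the one loading it, for example weights of a bare base network loaded into the full detector. That is only a guess for this report. A minimal sketch for comparing the two key sets before calling load_state_dict; the key names below are hypothetical.

```python
def key_mismatch(model_keys, checkpoint_keys):
    """Return (unexpected, missing) keys when loading a checkpoint."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    missing = sorted(model_keys - checkpoint_keys)
    return unexpected, missing


# Hypothetical key names for illustration only.
model_keys = ["vgg.0.weight", "vgg.0.bias", "loc.0.weight"]
checkpoint_keys = ["0.weight", "0.bias"]

unexpected, missing = key_mismatch(model_keys, checkpoint_keys)
print(unexpected, missing)
```

With real objects, the two lists would come from the keys of net.state_dict() and of the dictionary returned by torch.load.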

can you provide a high resolution image for ssd.png

Run eval.py and result is 71.x, compared with yours 77.x

Hi,

I just finished the training process and run the test process by eval.py; however, I got a much lower result (compared with the results reported in readme, see here.). After a further digging, there are some concerns:

the parameters, confidence_threshold and top_k, are not used at all. See here.
there is no NMS process? See here.

One more thing, you report that without pre-training and using data augmentation alone, you have a 77.43% performance. What setting do you use? I just Xavier init all layers without a pretrained model, keeping all other parameters unchanged as in your repo; the training completely fails (at first, the loss is ~15; then 20k iter it goes down to 7.x and kept the same for the rest iter (max_iter=120k); the test mAP is 0.4, cf with pre-train 71.x that I got).

Thanks so much for your help!
Hongyang, Francis

why did you set difficult as False?

Hello

I am wondering why you set the use of difficult training examples to False,
since I found that the original code uses the difficult examples as well.

Thanks

in your README you talked about the version 2 of SSD

Where can I find the SSD version 2 paper?

Any scheduled time to support image size 512?

Hello

Firstly I really appreciate your work.
I wonder if you have in mind to support image size 512 soon.

Thanks

What criteria of the scale factor(Prior Box) did you used in config.py?

Hello

I have a question about the criteria that you used in the config.py.

The original paper states that the scale factors are 'regularly spaced', but your definition of the scale factors seems quite different.

For example, say 4 feature maps are used for prediction. If we define Smin and Smax to be 0.2 and 0.8 respectively, the scale factors for the feature maps are (0.2, 0.4, 0.6, 0.8).

However, I found that your scale factors
(30 (0.1), 60 (0.2), 111 (0.37), 162 (0.54), 213 (0.71), 264 (0.88))
do not seem to be regularly spaced. The differences between consecutive scale factors are (0.1, 0.17, 0.17, 0.17, 0.17). Do you have a special reason for this (e.g. improving the accuracy)?

Any comments will be appreciated.
Thanks in advance.
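A minimal sketch that reproduces both sets of numbers. The first function is the paper's linear spacing. The second assumes the sizes come from a 20% to 90% ratio range walked with an integer step over all layers but the first, plus an extra 10% scale for the first layer; that recipe is my assumption about how the quoted values were generated, and the function names are illustrative.

```python
import math


def paper_scales(s_min, s_max, m):
    """Regularly spaced scales for m feature maps."""
    return [round(s_min + (s_max - s_min) * k / (m - 1), 4) for k in range(m)]


def ratio_sizes(min_dim=300, min_ratio=20, max_ratio=90, num_layers=6):
    """Box sizes from an integer-percent ratio range plus one extra small scale."""
    step = int(math.floor((max_ratio - min_ratio) / (num_layers - 2)))
    sizes = [min_dim * 10 // 100]            # extra 10% scale for the first layer
    for ratio in range(min_ratio, max_ratio + 1, step):
        sizes.append(min_dim * ratio // 100)
    return sizes


print(paper_scales(0.2, 0.8, 4))
print(ratio_sizes())
```

Under this assumption the scales from the second layer onward are regularly spaced with a step of 17%, and only the first layer's 10% scale breaks the pattern, which matches the differences listed in the question.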

tensors are on different GPUs

When I run the demo, it returns 'tensors are on different GPUs', but I have only one GPU.
The demo runs successfully on the CPU.
Could you publish the steps for running it on the GPU?
Thank you very much!

runtime error

hi,

Have you successfully run train.py?
I encountered a runtime error saying "div_ only supports scalar multiplication" at the line "x/=norm.expand_as(x)" in modules/l2norm.py.
I then changed this line to "x = x.div(norm.expand_as(x))" but got another CUDA runtime error, "device-side assert triggered", at the line "return torch.cat([g_cxcy, g_wh], 1)" in box_utils.py.

BTW, I am using Python 2.7 instead of Python 3.

Run time error: tensors are on different GPUs

When I run the demo Jupyter notebook, I get a runtime error at "y = net(xx)". I have a single GPU (GPU 0).
Thank you very much

RuntimeError Traceback (most recent call last)
in ()
3 xx = xx.cuda()
4 print(xx.t())
----> 5 y = net(xx)

/home/tech/anaconda3/envs/tf35/lib/python3.5/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
204
205 def __call__(self, *input, **kwargs):
--> 206 result = self.forward(*input, **kwargs)
207 for hook in self._forward_hooks.values():
208 hook_result = hook(self, input, result)

/home/tech/ssd.pytorch/ssd.py in forward(self, x)
72 # apply vgg up to conv4_3 relu
73 for k in range(23):
---> 74 x = self.vgg[k](x)
75
76 s = self.L2Norm(x)

/home/tech/anaconda3/envs/tf35/lib/python3.5/site-packages/torch/nn/modules/conv.py in forward(self, input)
235 def forward(self, input):
236 return F.conv2d(input, self.weight, self.bias, self.stride,
--> 237 self.padding, self.dilation, self.groups)
238
239

/home/tech/anaconda3/envs/tf35/lib/python3.5/site-packages/torch/nn/functional.py in conv2d(input, weight, bias, stride, padding, dilation, groups)
38 f = ConvNd(_pair(stride), _pair(padding), _pair(dilation), False,
39 _pair(0), groups, torch.backends.cudnn.benchmark, torch.backends.cudnn.enabled)
---> 40 return f(input, weight, bias)
41
42

RuntimeError: tensors are on different GPUs

The difference in compute area_a and area_b in jaccard function

I am working on adding RandomHorizontalFlip to this repo, but I always get a NaN loss in SmoothL1Loss. When I read the jaccard function in detail, I found:

area_a = ((box_a[:, 2]-box_a[:, 0]) *
          (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter)  # [A,B]
area_b = ((box_b[:, 2]-box_b[:, 0]) *
          (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter)

Why is the unsqueeze dimension different for the two? I don't understand.
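A minimal sketch of what the two different unsqueeze dimensions achieve, with nested lists standing in for tensors and made-up areas. area_a has one value per box in A and is turned into a column ([A] to [A, 1]); area_b has one value per box in B and is turned into a row ([B] to [1, B]). Expanding both to [A, B] makes entry [i][j] pair box i of A with box j of B.

```python
area_a = [2.0, 6.0]        # A = 2 boxes
area_b = [1.0, 3.0, 5.0]   # B = 3 boxes

# unsqueeze(1) then expand: each row repeats one value of area_a.
a_grid = [[a for _ in area_b] for a in area_a]

# unsqueeze(0) then expand: each row is a copy of area_b.
b_grid = [[b for b in area_b] for _ in area_a]

# Entry [i][j] combines box i of A with box j of B, as the union needs.
sum_grid = [[a_grid[i][j] + b_grid[i][j] for j in range(len(area_b))]
            for i in range(len(area_a))]
print(a_grid, b_grid, sum_grid)
```

Using the same dimension for both would not produce the [A, B] pairing that the intersection matrix has.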

Not use RandomHorizontalFlip?

train_transform() is not used in base_transform. So does this project use RandomHorizontalFlip at all?
Or is this function called somewhere else?

how about the current performance

SSD keep_difficult=False Problem?

Hello

It seems the code trains the network (SSD) without the difficult training examples.

Additionally, I trained the network with the 07++12 train set (07 trainval, 07 test, 12 trainval) and tested on the 12 test set using the official server. The result was 74.1%, about 2% below the latest version of SSD300 (75.8%). Of course there will be differences between the libraries (PyTorch vs Caffe), but it seems that a network trained only on the easy examples cannot reach the original performance.

Any comments will be appreciated.
Thanks in advance.

Running on GPU errors

I'm very new to PyTorch, and I'm getting these errors when I run the test.py file:

File "test.py", line 93, in
thresh=args.visual_threshold)
File "test.py", line 54, in test_net
y = net(x) # forward pass
File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/workspace/ssd.pytorch/ssd.py", line 102, in forward
self.priors # default boxes
File "/workspace/ssd.pytorch/layers/functions/detection.py", line 51, in forward
decoded_boxes = decode(loc_data[i], prior_data, self.variance)
File "/workspace/ssd.pytorch/layers/box_utils.py", line 152, in decode
priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/tensor.py", line 283, in mul
return self.mul(other)
TypeError: mul received an invalid combination of arguments - got (torch.FloatTensor), but expected one of:

(float value)
didn't match because some of the arguments have invalid types: (torch.FloatTensor)
(torch.cuda.FloatTensor other)
didn't match because some of the arguments have invalid types: (torch.FloatTensor)

CUDNN_STATUS_ALLOC_FAILED

I got this issue coming up. I was able to fix it though by setting cudnn.benchmark = False and setting --batch_size to 8.
