
Comments (11)

rainsoulsrx avatar rainsoulsrx commented on July 22, 2024

Hi, I have the same question!
In the paper, the authors said, "We used the method described by [2] to select reference bounding box shapes to match the data distribution." I think that means the anchor shapes are selected according to the objects' widths and heights, so when the training data changes, the anchor shapes may also need to be changed.

from squeezedet.

BichenWuUCB avatar BichenWuUCB commented on July 22, 2024

@ChaunceyWang Thanks for your question. Responses in-line:

I want to input a different image resolution, but I have some problems. First, how do I design the anchors?

As @rainsoulsrx explained, you can refer to our previous paper on how to select an optimal set of anchors that best fits the data distribution.

I know that if I use squeezeDet with a 1242x375 input image, the feature map shape after fire11 is [1, 22, 76, 768]. Then how are these numbers [36., 37.], [366., 174.], ... calculated?

Anchor sizes should fit the bounding-box shape distribution of your data (e.g., the shapes of cars, pedestrians, cyclists, etc.), so the feature map dimension has nothing to do with anchor sizes.
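For readers who want to reproduce this selection step, here is a minimal k-means sketch over ground-truth box shapes (pure NumPy, synthetic data; the paper's exact procedure may differ in distance metric and initialization):

```python
import numpy as np

def kmeans_anchors(box_wh, k=9, iters=100, seed=0):
    """Cluster ground-truth (width, height) pairs into k anchor shapes.

    box_wh: (N, 2) array of box widths/heights in input-image pixels.
    Returns a (k, 2) array of cluster centers, sorted by area.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from k randomly chosen ground-truth boxes.
    centers = box_wh[rng.choice(len(box_wh), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to its nearest center (Euclidean in w-h space).
        d = np.linalg.norm(box_wh[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = box_wh[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]

# Toy example: synthetic box shapes drawn from three clusters.
rng = np.random.default_rng(1)
boxes = np.vstack([
    rng.normal([40, 35], 5, (200, 2)),     # small boxes (e.g., distant cars)
    rng.normal([160, 90], 10, (200, 2)),   # medium boxes
    rng.normal([360, 175], 15, (200, 2)),  # large boxes
])
anchors = kmeans_anchors(boxes, k=3)
print(anchors)
```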

Besides, I noticed your reply in #1, where you mentioned "grid size". Does that mean the (22, 76) above? If I want to input a 600x600 image, do I only need to change (H, W)? Do I need to modify the 9 anchor_shapes?

Yes, the grid size in this case is just (22, 76). Grid size depends only on the original image resolution and is independent of the anchor shapes.
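As a sanity check, the grid size can be derived from the image size and the network's total stride (a sketch assuming a stride of 16 and ceiling division; exact values depend on the padding scheme, which is why both 22x76 and 24x78 appear in this thread):

```python
import math

def grid_size(img_h, img_w, stride=16):
    """Output grid (H, W) for a backbone with the given total stride,
    assuming 'same'-style padding (ceiling division)."""
    return math.ceil(img_h / stride), math.ceil(img_w / stride)

print(grid_size(375, 1242))  # KITTI resolution -> (24, 78)
print(grid_size(720, 1280))  # -> (45, 80)
```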

In addition, equation (3) in the paper seems to differ from the code...

Yes, you are correct. I will fix this in the paper.


ChaunceyWang avatar ChaunceyWang commented on July 22, 2024

@rainsoulsrx Thanks for the reminder; I had missed the citation.
@BichenWuUCB Thanks for your reply. I saw these anchor shapes in resnet50_convDet.py:

  H, W, B = 24, 78, 9
  anchor_shapes = np.reshape(
      [np.array(
          [[  94.,  49.], [ 225., 161.], [ 170.,  91.],
           [ 390., 181.], [  41.,  32.], [ 128.,  64.],
           [ 298., 164.], [ 232.,  99.], [  65.,  42.]])] * H * W,
      (H, W, B, 2)
  ) 

which is different from the squeezeDet anchors before. You said:

Anchor sizes should fit the bounding box shape distribution of your data

The inputs are the same (the KITTI dataset), so why are the anchor shapes different?


rainsoulsrx avatar rainsoulsrx commented on July 22, 2024

I think that's because in ResNet-50 the feature map shape at that layer is 24x78.


ChaunceyWang avatar ChaunceyWang commented on July 22, 2024

@rainsoulsrx However, the feature map shape of VGG16 at that layer is also 24x78. This is the source code in kitti_vgg16_config.py:

  H, W, B = 24, 78, 9
  anchor_shapes = np.reshape(
      [np.array(
          [[  36.,  37.], [ 366., 174.], [ 115.,  59.],
           [ 162.,  87.], [  38.,  90.], [ 258., 173.],
           [ 224., 108.], [  78., 170.], [  72.,  43.]])] * H * W,
      (H, W, B, 2)
  )

But the anchor_shapes are different from the ResNet-50 ones above; is something wrong?
And according to the paper, does the anchor shape depend only on the shape distribution of the input objects?


BichenWuUCB avatar BichenWuUCB commented on July 22, 2024

@ChaunceyWang Yes, theoretically, res50, vgg16, and squeezeDet can use the same set of anchor shapes, as long as their inputs are the same dataset at the same image resolution. The reason you see two sets of anchor shapes is the following: the KITTI evaluation script ignores objects smaller than a certain size (among other criteria). So when selecting anchor shapes, we could either ignore smaller objects, following KITTI's standard, or keep them. We used the K-Means method described in our previous paper to choose two sets of anchors for these two cases.
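A minimal sketch of the "ignore small objects" variant: filter the ground-truth shapes before clustering (the 25-pixel minimum height here is a hypothetical threshold, not necessarily the one KITTI or the authors used):

```python
import numpy as np

def filter_small_boxes(box_wh, min_height=25.0):
    """Drop boxes shorter than min_height pixels before anchor clustering,
    mimicking an evaluation script that ignores small objects."""
    return box_wh[box_wh[:, 1] >= min_height]

# Toy (width, height) pairs; only the first one falls below the threshold.
boxes = np.array([[36., 20.], [72., 43.], [115., 59.], [366., 174.]])
kept = filter_small_boxes(boxes)
print(len(kept))  # -> 3
```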

It is a bit confusing that we used a different set of anchor shapes for the res50 model, but the reason has nothing to do with the model or the input dataset. The res50 model we released here is one of many that we trained, and it happened to use a different set of anchors.

Does it make sense?


ChaunceyWang avatar ChaunceyWang commented on July 22, 2024

@BichenWuUCB Thank you! Roger that!


rainsoulsrx avatar rainsoulsrx commented on July 22, 2024

Hi Bichen, thanks for your patient explanation! I still have a question about anchor size selection.

  anchor_shapes = np.reshape(
      [np.array(
          [[  36.,  37.], [ 366., 174.], [ 115.,  59.],
           [ 162.,  87.], [  38.,  90.], [ 258., 173.],
           [ 224., 108.], [  78., 170.], [  72.,  43.]])] * H * W,
      (H, W, B, 2)
  )
My question is: are 36, 37, 366, 174, ... the anchors' widths and heights relative to the resized input image, i.e., the 1242x375 image and not the feature map? That is to say, if I change the image size, should I change the anchor sizes correspondingly, relative to the input image size rather than the feature map? I don't know if I understand correctly.


rainsoulsrx avatar rainsoulsrx commented on July 22, 2024

Hi Bichen, I have another question.
My original training dataset has a resolution of 1920x1080, and the information in trainval.txt also corresponds to this size. This large size makes training very slow, so I want to change the size to 450x300 (for example) when training the net. That is to say, I change mc.IMAGE_WIDTH and mc.IMAGE_HEIGHT to 450 and 300, but do I need to change the information in trainval.txt at the same time, or will the code do the change automatically?


BichenWuUCB avatar BichenWuUCB commented on July 22, 2024

@rainsoulsrx: thanks for your questions.

First question: anchor size is relative to the original image size, and the grid size (H, W) is also relative to the image size. Depending on the padding/striding strategy, H and W are roughly 1/16 of the original image height and width. So if you want to down-sample your input image by half (in both width and height), then yes, you need to down-sample your anchor sizes by half, and you need to change the grid size (H, W) to match the spatial dimensions of the last conv layer's output.
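A small sketch of this, using the VGG16 anchors quoted earlier in the thread (assuming a simple linear rescale of widths by new_w/orig_w and heights by new_h/orig_h):

```python
import numpy as np

# Original config: 1242x375 input with the 9 anchors quoted above.
orig_w, orig_h = 1242., 375.
new_w, new_h = 621., 187.  # roughly half resolution

anchors = np.array(
    [[ 36.,  37.], [366., 174.], [115.,  59.],
     [162.,  87.], [ 38.,  90.], [258., 173.],
     [224., 108.], [ 78., 170.], [ 72.,  43.]])

# Widths scale with the width ratio, heights with the height ratio.
scale = np.array([new_w / orig_w, new_h / orig_h])
scaled = np.round(anchors * scale)
print(scaled[0])  # first anchor at half resolution
```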

Second question: in my implementation, trainval.txt only contains indices to images. The image resolution is specified by mc.IMAGE_WIDTH and mc.IMAGE_HEIGHT in the config file. Once you modify these two variables correctly, input images will be resized to the desired resolution. You don't need to (and shouldn't) specify the image resolution in trainval.txt.
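To illustrate (with a mock, since the real `mc` object lives in the repo's config module), the only change needed is the two config fields; trainval.txt stays a bare list of image indices:

```python
from types import SimpleNamespace

# Mock of the config object discussed in this thread; in the real repo,
# mc comes from e.g. kitti_squeezeDet_config.py. trainval.txt itself
# remains a bare list of image indices (000000, 000001, ...) and never
# encodes the resolution.
mc = SimpleNamespace()
mc.IMAGE_WIDTH = 450
mc.IMAGE_HEIGHT = 300
print(mc.IMAGE_WIDTH, mc.IMAGE_HEIGHT)
```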

Hope that helps,
Bichen


Imkaran avatar Imkaran commented on July 22, 2024

Hi Bichen,
I have a training set with an image resolution of 1280x720. When I modified the image height and width in kitti_model_config.py and kitti_squeezeDet_config.py, I got the following error:

ValueError: Cannot reshape a tensor with 648000 elements to shape [10,16848,2] (336960 elements) for 'interpret_output/pred_class_probs' (op: 'Reshape') with input shapes: [324000,2], [3] and with input tensors computed as partial shapes: input[1] = [10,16848,2].

Can you please help me resolve this issue?
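The numbers in that error are consistent with the anchor count not having been updated along with the resolution. A sketch of the arithmetic, assuming batch size 10, 2 classes, and a total stride of 16 (hypothetical, but matching the reported shapes):

```python
# Decompose the element counts from the ValueError above.
batch, classes, per_cell = 10, 2, 9

old_anchors = 24 * 78 * per_cell          # grid from the 1242x375 config
new_grid_h, new_grid_w = 720 // 16, 1280 // 16
new_anchors = new_grid_h * new_grid_w * per_cell

print(old_anchors, new_anchors)           # -> 16848 32400
print(batch * new_anchors * classes)      # -> 648000 (actual tensor size)
print(batch * old_anchors * classes)      # -> 336960 (stale target shape)
```

In other words, the network now produces 45x80x9 = 32400 anchors per image, but the config still declares 24x78x9 = 16848, so the reshape to [10, 16848, 2] fails; updating H, W (and the anchor boxes) in the config for the new resolution should resolve it.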

