Comments (13)

xingyizhou commented on August 15, 2024

Oh actually in our implementation, w and h are in the downsampled space. Sorry that it is not clearly stated in the paper. We will add this in the next revision.
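A minimal sketch of what this implies for decoding a single box, assuming every predicted quantity (peak location, offset, width, height) lives in the W/R × H/R output space and using the default output stride R = 4; this is an illustration, not the repository's exact code:

```python
# Minimal sketch: all inputs are assumed to be in the downsampled (output) space.
#   x, y   : integer peak location on the heatmap
#   dx, dy : predicted sub-pixel offset at that location
#   w, h   : predicted box size at that location (also in output-space units)
#   R      : output stride (4 by default in CenterNet)
def decode_box(x, y, dx, dy, w, h, R=4):
    cx, cy = x + dx, y + dy                       # refined center, output space
    x1, y1 = cx - w / 2.0, cy - h / 2.0           # corners, still in output space
    x2, y2 = cx + w / 2.0, cy + h / 2.0
    return [v * R for v in (x1, y1, x2, y2)]      # scale the whole box back to the input resolution
```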

xingyizhou commented on August 15, 2024

Hi,
Thanks for the detailed reading.

  1. Yes. And other positions will be masked out (see the sketch after this list).
  2. We follow CornerNet. Actually, such collisions happen very rarely, even in the category-agnostic case. You can refer to Appendix C of our paper and check the collision experiment code here.
  3. k is short for keypoints, similar to L_{size} in eq. 2.
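The "masked out" in point 1 can be illustrated with a small sketch; this assumes the question was about the offset/size regression being supervised only at ground-truth keypoint locations (the original question is not shown on this page), and it is not the repository's exact code:

```python
import torch

def masked_l1_loss(pred, target, mask):
    """L1 regression loss evaluated only at keypoint locations.

    pred, target: (B, 2, H, W) regression maps; mask: (B, 1, H, W), with 1 at
    ground-truth keypoints and 0 elsewhere, so all other positions are masked out.
    """
    loss = torch.abs(pred - target) * mask          # zero out non-keypoint positions
    return loss.sum() / mask.sum().clamp(min=1)     # average over the keypoints only
```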

rafikg commented on August 15, 2024

Thanks @xingyizhou
I have another question about the section "From points to bounding boxes", concerning how the bounding box is produced (see the equation at the bottom of page 3, just after equation 4; I think you should number it as equation 5 :) ).
x_hat_i and delta_x_hat_i are inferred in the heat-map dimensions (W/R, H/R), whereas w_hat_i and h_hat_i are inferred at the original image size (see L_size in equation 3). So in equation 5, w_hat_i and h_hat_i should be divided by R. What do you think?

xingyizhou commented on August 15, 2024

Hi,
Thanks for the suggestion. Yes, "formula 5" is in the output size. The final prediction is multiplied by R.

rafikg commented on August 15, 2024

Okay, you mean (x_hat_i + delta_x_hat_i) and (y_hat_i + delta_y_hat_i) are multiplied by R?

xingyizhou commented on August 15, 2024

I mean we will multiply the whole entity by R after "formula 5".
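A small numeric check of this, under the assumption (stated in the first comment of this thread) that w_hat and h_hat are predicted in the downsampled space, with R = 4:

```python
# Assumes w_hat is predicted in the downsampled space, so the whole box is scaled by R.
R = 4
x, dx, w = 30, 0.25, 10.0        # heatmap peak, offset, and width, all in output-space units
x1 = (x + dx - w / 2) * R        # left edge in input pixels:  (30.25 - 5) * 4 = 101.0
x2 = (x + dx + w / 2) * R        # right edge in input pixels: (30.25 + 5) * 4 = 141.0
print(x2 - x1)                   # 40.0 input pixels, i.e. w * R, as expected
```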

rafikg commented on August 15, 2024

Sorry, but I think that does not make sense. You would have to multiply only (x_hat_i + delta_x_hat_i) and (y_hat_i + delta_y_hat_i) by R, because w_hat and h_hat are already at the original size. Thanks

rafikg commented on August 15, 2024

Thanks for your time @xingyizhou. I am now reading the 3D estimation part. Your estimation is based on the paper [13]. I read your supplementary material, which is very helpful, and I understand that you estimate the depth corresponding to every center point, as well as the dimensions and the angle.

My questions (or remarks) are:

  1. I don't understand d = 1/σ(d̂) − 1 and how this quantity can be in [0, 1].

We instead use the output transformation of Eigen et al. [13] and d = 1/σ(d̂) − 1, where σ is the sigmoid function. We compute the depth as an additional output channel D̂ ∈ [0, 1]^(W/R × H/R) of our keypoint estimator.

  2. I checked the paper [13], and if I understand correctly, they use a regression with a particular loss function that takes into account the relation between pairs of pixels in the ground truth and in the predicted depth map (each pair of pixels in the prediction must differ in depth by an amount similar to that of the corresponding pair in the ground truth). I don't see exactly where you used the depth estimation from [13]?
  3. Is the orientation angle with respect to the camera or with respect to the ray that goes through the center of the bounding box?
  4. Do you have a demo script for the 3D bounding box?

xingyizhou commented on August 15, 2024
  1. Sorry, the depth should be in R^(W/R × H/R). We have fixed this in the supplementary material but missed the main paper. We will fix it in the next revision.
  2. We mean that we take the idea of estimating the inverse depth from [13].
  3. The network predicts the camera-view vehicle orientation, and the orientation is then converted to the global orientation here. You can check Fig. 3 and Fig. 4 of Mou. et al and Sec. 4.2 (Allocentric vs. Egocentric) of 3D-RCNN for more details about the orientation representation; a small sketch of the depth and orientation decoding follows this list.
  4. You can run python test.py ddd --dataset kitti --load_model ../models/ddd_3dop.pth --debug 2 to visualize the 3D bbox output on KITTI, after setting up the KITTI data. I haven't tried our 3D model on custom images. I don't expect it to work well on images outside KITTI, because KITTI images have an unusual aspect ratio and the model requires known camera intrinsics.
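A minimal sketch of the two transformations discussed above: the inverse-sigmoid depth decoding d = 1/σ(d̂) − 1 and the conversion from the camera-view (observation) angle to the global yaw via the ray through the object center. The KITTI convention rotation_y = alpha + arctan2(x, z) is assumed here; this is an illustration, not the repository's exact code:

```python
import math

def decode_depth(d_hat):
    """Inverse-sigmoid transform: d = 1 / sigmoid(d_hat) - 1 (d_hat is the raw network output)."""
    sigma = 1.0 / (1.0 + math.exp(-d_hat))   # sigmoid, in (0, 1)
    return 1.0 / sigma - 1.0                 # depth in (0, +inf)

def local_to_global_yaw(alpha, x, z):
    """Convert the camera-view angle alpha to global yaw, assuming the KITTI
    convention rotation_y = alpha + arctan2(x, z), where (x, z) locate the
    object center in camera coordinates."""
    rot_y = alpha + math.atan2(x, z)
    if rot_y > math.pi:                      # wrap to (-pi, pi]
        rot_y -= 2.0 * math.pi
    elif rot_y <= -math.pi:
        rot_y += 2.0 * math.pi
    return rot_y
```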

rafikg commented on August 15, 2024

@xingyizhou This paper seems very good but it needs to be re-written from scratch!

xingyizhou commented on August 15, 2024

Hi,
Thanks for your feedback. We would appreciate it even more if you could specify which aspects of the writing you don't like.

rafikg commented on August 15, 2024

@xingyizhou, I think you should fix the points that I have mentioned; another thing is that you could add more understandable figures for the models (because you did not provide model visualization in the code). Thanks

ustczhouyu commented on August 15, 2024

When training and testing on my own data, many objects are missed, but when each image is cut into two images and then tested, no objects are missed. Maybe it is because my data contains too many objects. How can I modify the maximum number of output objects?
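One possible cause, offered as an assumption rather than something confirmed in this thread: detectors in this family usually keep only the top K heatmap peaks per image (K is commonly 100 by default), and the training sampler often has its own per-image object cap, so images containing more objects than those caps will always miss some. A hedged sketch of that top-K selection, where K is the value you would raise (check the repository's options and data sampler for the actual setting names):

```python
import torch
import torch.nn.functional as F

def topk_centers(heatmap, K=100):
    """Keep the K highest-scoring local maxima of a (B, C, H, W) class heatmap.

    Raising K allows more detections per image, at the cost of more low-score boxes.
    """
    B, C, H, W = heatmap.shape
    pooled = F.max_pool2d(heatmap, 3, stride=1, padding=1)   # 3x3 max-pool acts as a cheap NMS
    heatmap = heatmap * (pooled == heatmap).float()          # keep only local maxima
    scores, inds = torch.topk(heatmap.view(B, -1), K)        # flatten over classes and positions
    classes = torch.div(inds, H * W, rounding_mode='floor')  # which class plane the peak is in
    ys = torch.div(inds % (H * W), W, rounding_mode='floor') # row within that plane
    xs = inds % W                                            # column within that plane
    return scores, classes, ys, xs
```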
