Comments (13)

xingyizhou commented on August 15, 2024

Oh actually in our implementation, w and h are in the downsampled space. Sorry that it is not clearly stated in the paper. We will add this in the next revision.
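A minimal sketch of what this implies for decoding a single box, assuming every predicted quantity (peak location, offset, width, height) lives in the W/R × H/R output space and using the default output stride R = 4; this is an illustration, not the repository's exact code:

```python
# Minimal sketch: all inputs are assumed to be in the downsampled (output) space.
#   x, y   : integer peak location on the heatmap
#   dx, dy : predicted sub-pixel offset at that location
#   w, h   : predicted box size at that location (also in output-space units)
#   R      : output stride (4 by default in CenterNet)
def decode_box(x, y, dx, dy, w, h, R=4):
    cx, cy = x + dx, y + dy                       # refined center, output space
    x1, y1 = cx - w / 2.0, cy - h / 2.0           # corners, still in output space
    x2, y2 = cx + w / 2.0, cy + h / 2.0
    return [v * R for v in (x1, y1, x2, y2)]      # scale the whole box back to the input resolution
```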

xingyizhou commented on August 15, 2024

Hi,
Thanks for the detailed reading.

  1. Yes. And other positions will be masked out (see the sketch after this list).
  2. We follow CornerNet. Actually, such collisions happen very rarely, even in the category-agnostic case. You can refer to Appendix C of our paper and check the collision experiment code here.
  3. k is short for keypoints, similar to L_{size} in eq. 2.
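The "masked out" in point 1 can be illustrated with a small sketch; this assumes the question was about the offset/size regression being supervised only at ground-truth keypoint locations (the original question is not shown on this page), and it is not the repository's exact code:

```python
import torch

def masked_l1_loss(pred, target, mask):
    """L1 regression loss evaluated only at keypoint locations.

    pred, target: (B, 2, H, W) regression maps; mask: (B, 1, H, W), with 1 at
    ground-truth keypoints and 0 elsewhere, so all other positions are masked out.
    """
    loss = torch.abs(pred - target) * mask          # zero out non-keypoint positions
    return loss.sum() / mask.sum().clamp(min=1)     # average over the keypoints only
```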

rafikg commented on August 15, 2024

Thanks @xingyizhou
I have another question about the section "From points to bounding boxes", concerning how the bounding box is produced (see the equation at the bottom of page 3, just after equation 4; I think you should number it as equation 5 :) ).
x_hat_i and delta_x_hat_i are inferred in the heat-map dimensions (W/R, H/R), whereas w_hat_i and h_hat_i are inferred at the original image size (see L_size in equation 3). So in equation 5, w_hat_i and h_hat_i should be divided by R. What do you think?

xingyizhou commented on August 15, 2024

Hi,
Thanks for the suggestion. Yes, "formula 5" is in the output size. The final prediction is multiplied by R.

rafikg commented on August 15, 2024

Okay, you mean (x_hat_i + delta_x_hat_i) and (y_hat_i + delta_y_hat_i) are multiplied by R?

xingyizhou commented on August 15, 2024

I mean we will multiply the whole entity by R after "formula 5".
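A small numeric check of this, under the assumption (stated in the first comment of this thread) that w_hat and h_hat are predicted in the downsampled space, with R = 4:

```python
# Assumes w_hat is predicted in the downsampled space, so the whole box is scaled by R.
R = 4
x, dx, w = 30, 0.25, 10.0        # heatmap peak, offset, and width, all in output-space units
x1 = (x + dx - w / 2) * R        # left edge in input pixels:  (30.25 - 5) * 4 = 101.0
x2 = (x + dx + w / 2) * R        # right edge in input pixels: (30.25 + 5) * 4 = 141.0
print(x2 - x1)                   # 40.0 input pixels, i.e. w * R, as expected
```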

rafikg commented on August 15, 2024

Sorry, but I think that does not make sense. You would have to multiply only (x_hat_i + delta_x_hat_i) and (y_hat_i + delta_y_hat_i) by R, because w_hat and h_hat are already at the original size. Thanks

rafikg commented on August 15, 2024

Thanks for your time @xingyizhou. I am now reading the 3D estimation part. Your estimation is based on the paper [13]. I read your supplementary material, which is very helpful, and I understand that you estimate the depth corresponding to every center point, as well as the dimensions and the angle.

My questions (or remarks) are:

  1. I don't understand d = 1/σ(d̂) − 1 and how this quantity can be in [0, 1].

We instead use the output transformation of Eigen et al. [13] and d = 1/σ(d̂) − 1, where σ is the sigmoid function. We compute the depth as an additional output channel D̂ ∈ [0, 1]^(W/R × H/R) of our keypoint estimator.

  2. I checked the paper [13], and if I understand correctly, they use a regression with a particular loss function that takes into account the relation between pairs of pixels in the ground truth and in the predicted depth map (each pair of pixels in the prediction must differ in depth by an amount similar to that of the corresponding pair in the ground truth). I don't see exactly where you used the depth estimation from [13]?
  3. Is the orientation angle with respect to the camera or with respect to the ray that goes through the center of the bounding box?
  4. Do you have a demo script for the 3D bounding box?

xingyizhou commented on August 15, 2024
  1. Sorry, the depth should be in R^(W/R × H/R). We have fixed this in the supplementary material but missed the main paper. We will fix it in the next revision.
  2. We mean that we take the idea of estimating the inverse depth from [13].
  3. The network predicts the camera-view vehicle orientation, and the orientation is then converted to the global orientation here. You can check Fig. 3 and Fig. 4 of Mou. et al and Sec. 4.2 (Allocentric vs. Egocentric) of 3D-RCNN for more details about the orientation representation; a small sketch of the depth and orientation decoding follows this list.
  4. You can run python test.py ddd --dataset kitti --load_model ../models/ddd_3dop.pth --debug 2 to visualize the 3D bbox output on KITTI, after setting up the KITTI data. I haven't tried our 3D model on custom images. I don't expect it to work well on images outside KITTI, because KITTI images have an unusual aspect ratio and the model requires known camera intrinsics.
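A minimal sketch of the two transformations discussed above: the inverse-sigmoid depth decoding d = 1/σ(d̂) − 1 and the conversion from the camera-view (observation) angle to the global yaw via the ray through the object center. The KITTI convention rotation_y = alpha + arctan2(x, z) is assumed here; this is an illustration, not the repository's exact code:

```python
import math

def decode_depth(d_hat):
    """Inverse-sigmoid transform: d = 1 / sigmoid(d_hat) - 1 (d_hat is the raw network output)."""
    sigma = 1.0 / (1.0 + math.exp(-d_hat))   # sigmoid, in (0, 1)
    return 1.0 / sigma - 1.0                 # depth in (0, +inf)

def local_to_global_yaw(alpha, x, z):
    """Convert the camera-view angle alpha to global yaw, assuming the KITTI
    convention rotation_y = alpha + arctan2(x, z), where (x, z) locate the
    object center in camera coordinates."""
    rot_y = alpha + math.atan2(x, z)
    if rot_y > math.pi:                      # wrap to (-pi, pi]
        rot_y -= 2.0 * math.pi
    elif rot_y <= -math.pi:
        rot_y += 2.0 * math.pi
    return rot_y
```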

rafikg commented on August 15, 2024

@xingyizhou This paper seems very good but it needs to be re-written from scratch!

xingyizhou commented on August 15, 2024

Hi,
Thanks for your feedback. We would appreciate it even more if you could specify which aspects of the writing you don't like.

rafikg commented on August 15, 2024

@xingyizhou, I think you should fix the points that I have mentioned; another thing is that you could add more understandable figures for the models (because you did not provide model visualization in the code). Thanks

ustczhouyu commented on August 15, 2024

When training and testing on my own data, many objects are missed, but when each image is cut into two images and then tested, no objects are missed. Maybe it is because my data contains too many objects. How can I modify the maximum number of output objects?
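One possible cause, offered as an assumption rather than something confirmed in this thread: detectors in this family usually keep only the top K heatmap peaks per image (K is commonly 100 by default), and the training sampler often has its own per-image object cap, so images containing more objects than those caps will always miss some. A hedged sketch of that top-K selection, where K is the value you would raise (check the repository's options and data sampler for the actual setting names):

```python
import torch
import torch.nn.functional as F

def topk_centers(heatmap, K=100):
    """Keep the K highest-scoring local maxima of a (B, C, H, W) class heatmap.

    Raising K allows more detections per image, at the cost of more low-score boxes.
    """
    B, C, H, W = heatmap.shape
    pooled = F.max_pool2d(heatmap, 3, stride=1, padding=1)   # 3x3 max-pool acts as a cheap NMS
    heatmap = heatmap * (pooled == heatmap).float()          # keep only local maxima
    scores, inds = torch.topk(heatmap.view(B, -1), K)        # flatten over classes and positions
    classes = torch.div(inds, H * W, rounding_mode='floor')  # which class plane the peak is in
    ys = torch.div(inds % (H * W), W, rounding_mode='floor') # row within that plane
    xs = inds % W                                            # column within that plane
    return scores, classes, ys, xs
```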
