Thank you for sharing! I have a question about your code. I noticed that the cod

Confusion about the one box dataset about diode HOT 4 CLOSED

nvlabs commented on August 11, 2024 1

Confusion about the one box dataset

Comments (4)

akshaychawla commented on August 11, 2024

Thank you for your interest in our work! You are correct in assuming that we want to generate the onebox dataset with this inversion process.

By default, we set args.real_mixin_alpha=0.0 over here:

DIODE/main_yolo.py, line 256 in 80a396d:

    parser.add_argument("--real_mixin_alpha", type=float, default=0.0, help="how much of real image to mix in with the random initialization")

This ensures that the initialization consists only of random noise, i.e., the expression becomes: init = 0.0*imgs + (1.0 - 0.0)*init

This argument was added because we were curious to explore what the model inversion process generates when initialized with something other than random noise. An example of such an initialization would be using bounding boxes and the corresponding images from a real dataset such as COCO or Pascal VOC0712. In this scenario, we can use args.real_mixin_alpha to control how close the initialization is to the original image.
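As a minimal sketch of the blending expression above: plain Python lists stand in for the image tensors, and the helper names build_parser and mix_in are hypothetical, not taken from the repository. Only the --real_mixin_alpha flag and the formula init = alpha*imgs + (1 - alpha)*init come from this thread.

```python
import argparse
import random


def build_parser():
    # Mirrors only the single flag quoted above; the real script defines many more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--real_mixin_alpha", type=float, default=0.0,
                        help="how much of real image to mix in with the random initialization")
    return parser


def mix_in(imgs, init, alpha):
    """Elementwise alpha*imgs + (1 - alpha)*init over two equal-length lists."""
    if len(imgs) != len(init):
        raise ValueError("imgs and init must have the same length")
    return [alpha * x + (1.0 - alpha) * z for x, z in zip(imgs, init)]


args = build_parser().parse_args([])            # no CLI flags, so the default 0.0 applies
rng = random.Random(0)
init = [rng.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for the random-noise initialization
imgs = [0.2, 0.4, 0.6, 0.8]                     # stand-in for real image pixels (assumed values)

noise_only = mix_in(imgs, init, args.real_mixin_alpha)  # alpha = 0.0: identical to init
half_real = mix_in(imgs, init, 0.5)                     # alpha = 0.5: midway between the two
```

With the default alpha of 0.0 the real image contributes nothing and the result equals the noise; raising alpha toward 1.0 moves the starting point toward the real image.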

merlinarer commented on August 11, 2024

Thank you for your detailed reply; I understand now. So, are the evaluation results in your paper based on an initialization from a real dataset or not?
Another question is about Table 3. The top and bottom rows are the results obtained with the original and the generated images and labels, respectively, and both are quite clear. My confusion comes from the middle row. You mentioned that it uses synthetic images conditioned on MS-COCO labels. However, MS-COCO labels contain multiple objects for each image, so how did you use these labels to generate the corresponding images?
Looking forward to your reply!

akshaychawla commented on August 11, 2024

The evaluation results in our paper are based on using random noise as the initialization for the DIODE generation process, i.e., args.real_mixin_alpha was always set to 0.0.
For the middle row of Table 3, you are correct that we use multiple object labels for each image when we sample labels from COCO. This is possible because our object detection network, Yolo-V3, and its loss function support predicting multiple objects per image. Hence, during the inversion process, we can condition on multiple bboxes for every image.

In fact, we use the ability to condition on multiple bboxes as part of a bbox sampling procedure called false positive sampling (FP sampling). We observed that during the image generation process the network constantly tries to add context to the image, i.e., if we condition on a road bike, we often see a human generated close to it. To exploit this behavior, we aggregate high-confidence false positive detections that appear during the generation process, which leads to more realistic initialization bboxes and generated images. See Sections 3.1 and 5.2 and Figure 3.
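The aggregation step described above could be sketched roughly as follows. This is an illustration only, not the procedure from the paper or the repository: the Box type, the add_false_positives helper, the IoU-based matching rule, and both thresholds are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    cls: str
    x1: float
    y1: float
    x2: float
    y2: float
    conf: float = 1.0  # detector confidence; conditioning targets default to 1.0


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def add_false_positives(targets, detections, conf_thresh=0.9, iou_thresh=0.5):
    """Append confident detections that match no current target (assumed rule)."""
    updated = list(targets)
    for det in detections:
        if det.conf < conf_thresh:
            continue  # keep only high-confidence detections
        matched = any(t.cls == det.cls and iou(t, det) >= iou_thresh for t in updated)
        if not matched:
            updated.append(det)  # a false positive w.r.t. the conditioning boxes
    return updated


targets = [Box("bicycle", 0.20, 0.50, 0.60, 0.90)]
detections = [
    Box("bicycle", 0.21, 0.50, 0.60, 0.90, conf=0.95),  # matches the target
    Box("person", 0.30, 0.10, 0.50, 0.70, conf=0.92),   # confident context object
    Box("dog", 0.70, 0.70, 0.90, 0.90, conf=0.30),      # too uncertain to keep
]
new_targets = add_false_positives(targets, detections)
```

In this toy case the confident "person" detection is added to the conditioning boxes, which mirrors the road bike example: context that the network keeps generating becomes part of the target labels.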

merlinarer commented on August 11, 2024

Very clear, thanks for your reply!
