Comments (4)

akshaychawla commented on August 11, 2024

Thank you for your interest in our work! You are correct in assuming that we want to generate the onebox dataset with this inversion process.

By default, we set args.real_mixin_alpha=0.0 over here:

parser.add_argument("--real_mixin_alpha", type=float, default=0.0, help="how much of real image to mix in with the random initialization")

This makes sure that the initialization consists of only random noise, i.e., the expression becomes: init = 0.0*imgs + (1.0 - 0.0)*init, which leaves init unchanged.

This argument was added because we were curious to explore what the model inversion process generates when initialized with something other than random noise. An example of such an initialization would be using bounding boxes and corresponding images from a real dataset such as COCO or Pascal VOC0712. In this scenario, we can use args.real_mixin_alpha to control how close the initialization is to the original image.
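
As a rough illustration of the blending expression above, here is a minimal sketch in plain Python. It is not the repository's code: `build_parser` and `mix_init` are hypothetical helper names, and flat lists of floats stand in for the image and noise tensors. Only the `--real_mixin_alpha` argument and the expression `alpha*imgs + (1 - alpha)*init` come from the discussion.

```python
import argparse
import random


def build_parser():
    # Hypothetical wrapper; only the argument definition is taken from the thread.
    parser = argparse.ArgumentParser()
    parser.add_argument("--real_mixin_alpha", type=float, default=0.0,
                        help="how much of real image to mix in with the random initialization")
    return parser


def mix_init(imgs, init, alpha):
    """Blend real pixels into a noise initialization: alpha*imgs + (1 - alpha)*init."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * x + (1.0 - alpha) * n for x, n in zip(imgs, init)]


# Parse an empty argument list so the default (0.0) is used.
args = build_parser().parse_args([])

rng = random.Random(0)
imgs = [0.2, 0.5, 0.9]                       # stand-in for real image pixels
noise = [rng.gauss(0.0, 1.0) for _ in imgs]  # stand-in for the random initialization

default_init = mix_init(imgs, noise, args.real_mixin_alpha)  # pure noise
half_init = mix_init(imgs, noise, 0.5)                       # halfway between noise and image
```

With the default of 0.0 the result is exactly the noise, and with 1.0 it would be exactly the real image; values in between interpolate linearly.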

from diode.

merlinarer commented on August 11, 2024

Thank you for your detailed reply, I got it. So, are the evaluation results in your paper based on an initialization with a real dataset or not?
Another question is about Table 3. The top and bottom rows are the results of using original and generated images and labels, respectively, and both are quite clear. My confusion comes from the middle row. You mentioned that it uses synthetic images conditioned on MS-COCO labels. However, MS-COCO labels contain multiple objects for each image, so how did you use these labels to generate the corresponding images?
Looking forward to your reply!

akshaychawla commented on August 11, 2024
  1. The evaluation results in our paper are based on using random noise as the initialization for the DIODE generation process, i.e., args.real_mixin_alpha was always set to 0.0.

  2. In the middle row of Table 3, you are correct that we use multiple object labels for each image when we sample labels from COCO. This is possible because our object detection network, Yolo-V3, and its loss function allow predicting multiple objects per image. Hence, during the inversion process, we can condition on multiple bboxes for every image.

In fact, we use the ability to condition on multiple bboxes as part of a bbox sampling procedure called false positive sampling (FP sampling). FP sampling builds on our observation that, during the image generation process, the network constantly tries to add context to the image, e.g., if we condition on a road bike, we often see a human generated close to it. To exploit this behavior, we aggregate high-confidence false positive detections that appear during the generation process, which leads to more realistic initialization bboxes and generated images. See Sections 3.1 and 5.2 and Figure 3.
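
The aggregation step described above could look roughly like the following sketch. It is only an illustration under stated assumptions, not the procedure from the paper or the repository: the `Box` type, the `iou` and `fp_sample` helpers, the confidence threshold of 0.8, and the IoU threshold of 0.5 are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    conf: float = 1.0  # conditioning targets are treated as fully confident


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def fp_sample(targets, detections, conf_thresh=0.8, iou_thresh=0.5):
    """Add confident detections that match no current target (false positives).

    Both thresholds are assumptions for this sketch.
    """
    updated = list(targets)
    for det in detections:
        if det.conf < conf_thresh:
            continue  # ignore low-confidence detections
        matched = any(det.label == t.label and iou(det, t) >= iou_thresh
                      for t in updated)
        if not matched:
            updated.append(det)  # high-confidence false positive becomes a new target
    return updated


# Condition on a single bicycle; the detector also "sees" a person near it.
targets = [Box("bicycle", 0.20, 0.40, 0.80, 0.90)]
detections = [
    Box("bicycle", 0.21, 0.41, 0.79, 0.90, conf=0.95),  # matches the target
    Box("person", 0.35, 0.10, 0.65, 0.70, conf=0.90),   # confident false positive
    Box("dog", 0.00, 0.00, 0.10, 0.10, conf=0.30),      # too uncertain, dropped
]
new_targets = fp_sample(targets, detections)
```

In this toy run, the person box is added to the conditioning set while the matched bicycle and the low-confidence dog are not, mirroring the road bike example in the reply.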

merlinarer commented on August 11, 2024

Very clear, thanks for your reply!
