yolov3-gradcam's People

Contributors: pifalken
yolov3-gradcam's Issues

Output size of extractor.forward_pass()

It seems to me that a direct forward pass via model(x) and the extractor's forward pass through forward_pass_on_convolutions(x) give outputs of different sizes.

forward_pass_on_convolutions(x) outputs a tensor of size (1, 477360), which is the flattened form of (1, 3, 36, 52, 85) -> (1, 5616, 85) -> (1, 477360).

However, using model_output = self.model(x) gives multiple outputs: model_output[0] has shape (1, 7371, 85), as opposed to the (1, 5616, 85) we obtained previously. I turned to model_output[1], which is a list of size 3, to understand what's going on:

model_output[1][0].shape -> (1, 3, 9, 13, 85) -> (1, 351, 85)
model_output[1][1].shape -> (1, 3, 18, 26, 85) -> (1, 1404, 85)
model_output[1][2].shape -> (1, 3, 36, 52, 85) -> (1, 5616, 85): this is what `forward_pass_on_convolutions(x)` returns.

Now, concatenating these along axis 1 gives us: (1, 351 + 1404 + 5616, 85) -> (1, 7371, 85): this is the shape of model_output[0].
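The concatenation arithmetic above can be sanity-checked in isolation with dummy tensors (no YOLO model involved; only the shapes reported in this issue are assumed):

```python
import torch

# Dummy tensors matching the three detection-scale shapes reported above.
scale_shapes = [(1, 3, 9, 13, 85), (1, 3, 18, 26, 85), (1, 3, 36, 52, 85)]
outputs = [torch.zeros(s).view(1, -1, 85) for s in scale_shapes]

for o in outputs:
    print(tuple(o.shape))   # (1, 351, 85), (1, 1404, 85), (1, 5616, 85)

merged = torch.cat(outputs, dim=1)
print(tuple(merged.shape))  # (1, 7371, 85) -- matches model_output[0]
```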

The YOLOv2/YOLO9000 paper mentions the following:

Fine-Grained Features. This modified YOLO predicts detections on a 13 × 13 feature map. While this is sufficient for large objects, it may benefit from finer grained features for localizing smaller objects. Faster R-CNN and SSD both run their proposal networks at various feature maps in the network to get a range of resolutions. We take a different approach, simply adding a passthrough layer that brings features from an earlier layer at 26 × 26 resolution.

I infer from this that a similar feature is at work here, and results from 3 different resolutions are brought together as outputs, and concatenated to produce an output of size (1, 7371, 85). However, forward_pass_on_convolutions(x) only provides the outputs of the 3rd resolution, hence the equality with model_output[1][2].shape -> (1, 5616, 85).

In light of this, I have two questions:

  1. Why does forward_pass_on_convolutions(x) not include the outputs of the other resolutions? It seems like in the current setting we are backpropagating with incomplete target outputs (the shape of the target outputs we generate in generate_cam are also (1, 5616, 85)).

  2. As a solution, I tried to generate three target tensors with sizes corresponding to the three resolutions, but only the one of size (1, 5616, 85) can be backpropagated; the others, expectedly, fail on model_output.backward() due to size incompatibility. How can I get around this so that the other sizes can be backpropagated as well?
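One possible workaround for question 2 (a sketch, not the repository's code): rather than calling backward() once per scale with a full-size target tensor, reduce each scale's output to a scalar via the target-class slice and backpropagate their sum in a single pass. The grid sizes below are the ones from this issue; target_class is an illustrative value:

```python
import torch

torch.manual_seed(0)
grids = [(9, 13), (18, 26), (36, 52)]          # the three scales from above
scale_outputs = [torch.randn(1, 3, h, w, 85, requires_grad=True)
                 for h, w in grids]
target_class = 7                               # hypothetical class index

# Reduce each scale to a scalar via the target-class slice, then one backward.
loss = sum(out[..., target_class].sum() for out in scale_outputs)
loss.backward()

print(all(out.grad is not None for out in scale_outputs))  # True
```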

Many thanks for the help in advance.

Something's wrong

image 1/1 data/samples/Jujube_4501.bmp: Model Summary: 222 layers, 6.1556e+07 parameters, 6.1556e+07 gradients
Traceback (most recent call last):
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 178, in <module>
    cam = grad_cam.generate_cam(prep_img, target_class)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 127, in generate_cam
    conv_output, model_output = self.extractor.forward_pass(input_image)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 106, in forward_pass
    conv_output, x = self.forward_pass_on_convolutions(x)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 91, in forward_pass_on_convolutions
    x = x + layer_outputs[int(mdef["from"])]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

Process finished with exit code 1

Why is that? I used a model trained with ultralytics' yolov3 and the corresponding CFG, but it doesn't work. I also copied models.py and the utils folder from the ultralytics yolov3 repository.
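A possible cause (an assumption, not confirmed by the maintainer): newer ultralytics cfg parsers store the shortcut layer's "from" field as a list such as [-3] rather than a bare string, so int(mdef["from"]) raises the TypeError above. A minimal, hypothetical normalization would be:

```python
def resolve_from(frm):
    """Normalize a cfg 'from' field that may be '-3', -3, or [-3]."""
    if isinstance(frm, (list, tuple)):
        frm = frm[0]  # shortcut layers reference a single source layer
    return int(frm)

# The failing line x = x + layer_outputs[int(mdef["from"])] would become:
# x = x + layer_outputs[resolve_from(mdef["from"])]
print(resolve_from("-3"), resolve_from([-3]))  # -3 -3
```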

This is my directory structure

(cam) zxzn@zxzn:~/YOLOv3-GradCAM/gradcam$ ls
cfg  data  detect.py  gradcam.py  misc_functions.py  models.py  __pycache__  test.py  train.py  utils  weights

Thank you for your help @pifalken

What do I have to do to run it?

I am hesitant to ask such a basic question, but I don't know how to use this program. Can someone please tell me how to use it?
I will probably incorporate "ultralytics/yolov3", but I don't know what to do after that.
My environment is Google Colab.

Where can I find programs such as models.py and utils folder?

Hello.

I want to run Grad-CAM on my trained YOLO v3 model, so I came to this repository.
When I ran gradcam.py, I noticed that this repository contains no models.py or utils folder.
So currently, gradcam.py cannot be executed.

Where can I get the models.py program and the utils folder?

Thank you very much.

Potential issue in creation of the target tensor for backpropagation

In gradcam.py, generate_cam function, the target tensor is created as follows:

# target for backprop
one_hot_output = torch.cuda.FloatTensor(1, model_output.size()[-1]).zero_()
one_hot_output[0][target_class] = 1

Given that model_output here has shape (1, 3, 36, 52, 85) before flattening, shouldn't the element at index [target_class] of the last dimension (the class dimension) be set to 1 for every anchor and grid cell? I think setting only index target_class in the tensor after flattening may be incorrect, so I came up with this:

# I also return the dimensions of the output before flattening in `forward_pass()`
conv_output, model_output, model_output_dims = self.extractor.forward_pass(input_image)

# create tensor with original output shape
one_hot_output = torch.FloatTensor(model_output_dims.shape).zero_()
# set `target_class` to 1 over all dimensions other than the class dimension
one_hot_output[:, :, :, :, target_class] = 1
# flatten afterwards
one_hot_output = one_hot_output.view(one_hot_output.size(0), -1)

I think this makes sense, but I wanted to run it by @pifalken in case I'm wrong, otherwise this may need to be corrected in the repo. Let me know what you think!
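For reference, the proposed mask can be built and sanity-checked in isolation; the shape and target_class below are just the example values from this issue:

```python
import torch

target_class = 7                       # hypothetical class index
shape = (1, 3, 36, 52, 85)             # unflattened output shape from above

one_hot = torch.zeros(shape)
one_hot[..., target_class] = 1         # set the class channel at every anchor/cell
one_hot = one_hot.view(one_hot.size(0), -1)

print(tuple(one_hot.shape))            # (1, 477360)
print(int(one_hot.sum().item()))       # 5616 -- one entry per anchor/grid cell
```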
