yolov3-gradcam's People

Contributors: pifalken
yolov3-gradcam's Issues

Output size of extractor.forward_pass()

It seems to me that a direct forward pass via model(x) and the extractor's forward pass through forward_pass_on_convolutions(x) give outputs of different sizes.

forward_pass_on_convolutions(x) outputs a tensor of size (1, 477360), which is the flattened form of (1, 3, 36, 52, 85) -> (1, 5616, 85) -> (1, 477360).

However, using model_output = self.model(x) gives multiple outputs: model_output[0] has shape (1, 7371, 85), as opposed to the (1, 5616, 85) we obtained previously. I turned to model_output[1], which is a list of size 3, to understand what's going on:

model_output[1][0].shape -> (1, 3, 9, 13, 85) -> (1, 351, 85)
model_output[1][1].shape -> (1, 3, 18, 26, 85) -> (1, 1404, 85)
model_output[1][2].shape -> (1, 3, 36, 52, 85) -> (1, 5616, 85): this is what `forward_pass_on_convolutions(x)` returns.

Now, concatenating these along axis 1 gives us: (1, 351 + 1404 + 5616, 85) -> (1, 7371, 85): this is the shape of model_output[0].
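The concatenation arithmetic above can be sanity-checked in isolation with dummy tensors (no YOLO model involved; only the shapes reported in this issue are assumed):

```python
import torch

# Dummy tensors matching the three detection-scale shapes reported above.
scale_shapes = [(1, 3, 9, 13, 85), (1, 3, 18, 26, 85), (1, 3, 36, 52, 85)]
outputs = [torch.zeros(s).view(1, -1, 85) for s in scale_shapes]

for o in outputs:
    print(tuple(o.shape))   # (1, 351, 85), (1, 1404, 85), (1, 5616, 85)

merged = torch.cat(outputs, dim=1)
print(tuple(merged.shape))  # (1, 7371, 85) -- matches model_output[0]
```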

The YOLOv2/YOLO9000 paper mentions the following:

Fine-Grained Features. This modified YOLO predicts detections on a 13 × 13 feature map. While this is sufficient for large objects, it may benefit from finer grained features for localizing smaller objects. Faster R-CNN and SSD both run their proposal networks at various feature maps in the network to get a range of resolutions. We take a different approach, simply adding a passthrough layer that brings features from an earlier layer at 26 × 26 resolution.

I infer from this that a similar feature is at work here, and results from 3 different resolutions are brought together as outputs, and concatenated to produce an output of size (1, 7371, 85). However, forward_pass_on_convolutions(x) only provides the outputs of the 3rd resolution, hence the equality with model_output[1][2].shape -> (1, 5616, 85).

In light of this, I have two questions:

  1. Why does forward_pass_on_convolutions(x) not include the outputs of the other resolutions? It seems like in the current setting we are backpropagating with incomplete target outputs (the shape of the target outputs we generate in generate_cam are also (1, 5616, 85)).

  2. As a solution, I tried to generate three target tensors with sizes corresponding to the three resolutions, but only the one of size (1, 5616, 85) can be backpropagated; the others, expectedly, fail on model_output.backward() due to size incompatibility. How can I get around this so that the other sizes can be backpropagated as well?
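One possible workaround for question 2 (a sketch, not the repository's code): rather than calling backward() once per scale with a full-size target tensor, reduce each scale's output to a scalar via the target-class slice and backpropagate their sum in a single pass. The grid sizes below are the ones from this issue; target_class is an illustrative value:

```python
import torch

torch.manual_seed(0)
grids = [(9, 13), (18, 26), (36, 52)]          # the three scales from above
scale_outputs = [torch.randn(1, 3, h, w, 85, requires_grad=True)
                 for h, w in grids]
target_class = 7                               # hypothetical class index

# Reduce each scale to a scalar via the target-class slice, then one backward.
loss = sum(out[..., target_class].sum() for out in scale_outputs)
loss.backward()

print(all(out.grad is not None for out in scale_outputs))  # True
```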

Many thanks for the help in advance.

Something's wrong

image 1/1 data/samples/Jujube_4501.bmp: Model Summary: 222 layers, 6.1556e+07 parameters, 6.1556e+07 gradients
Traceback (most recent call last):
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 178, in <module>
    cam = grad_cam.generate_cam(prep_img, target_class)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 127, in generate_cam
    conv_output, model_output = self.extractor.forward_pass(input_image)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 106, in forward_pass
    conv_output, x = self.forward_pass_on_convolutions(x)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 91, in forward_pass_on_convolutions
    x = x + layer_outputs[int(mdef["from"])]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

Process finished with exit code 1

Why is that? I used a model trained with ultralytics' yolov3 and the corresponding CFG, but it doesn't work. I also copied models.py and the utils folder from the ultralytics yolov3 repository.
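A possible cause (an assumption, not confirmed by the maintainer): newer ultralytics cfg parsers store the shortcut layer's "from" field as a list such as [-3] rather than a bare string, so int(mdef["from"]) raises the TypeError above. A minimal, hypothetical normalization would be:

```python
def resolve_from(frm):
    """Normalize a cfg 'from' field that may be '-3', -3, or [-3]."""
    if isinstance(frm, (list, tuple)):
        frm = frm[0]  # shortcut layers reference a single source layer
    return int(frm)

# The failing line x = x + layer_outputs[int(mdef["from"])] would become:
# x = x + layer_outputs[resolve_from(mdef["from"])]
print(resolve_from("-3"), resolve_from([-3]))  # -3 -3
```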

This is my directory structure

(cam) zxzn@zxzn:~/YOLOv3-GradCAM/gradcam$ ls
cfg  data  detect.py  gradcam.py  misc_functions.py  models.py  __pycache__  test.py  train.py  utils  weights

Thank you for your help @pifalken

What do I have to do to run it?

I am hesitant to ask such a basic question, but I don't know how to use this program. Can someone please tell me how to use it?
I will probably incorporate "ultralytics/yolov3", but I don't know what to do after that.
My environment is Google Colab.

Where can I find programs such as models.py and utils folder?

Hello.

I want to run Grad-CAM on my trained YOLO v3 model, so I came to this repository.
When I ran gradcam.py, I noticed that this repository contains no models.py or utils folder.
So currently, gradcam.py cannot be executed.

Where can I get the models.py program and the utils folder?

Thank you very much.

Potential issue in creation of the target tensor for backpropagation

In gradcam.py, generate_cam function, the target tensor is created as follows:

# target for backprop
one_hot_output = torch.cuda.FloatTensor(1, model_output.size()[-1]).zero_()
one_hot_output[0][target_class] = 1

Given that model_output here has shape (1, 3, 36, 52, 85) before flattening, shouldn't the element at index [target_class] of the last dimension (the class dimension) be set to 1 for every anchor and grid cell? I think setting only index target_class in the tensor after flattening may be incorrect, so I came up with this:

# I also return the dimensions of the output before flattening in `forward_pass()`
conv_output, model_output, model_output_dims = self.extractor.forward_pass(input_image)

# create tensor with original output shape
one_hot_output = torch.FloatTensor(model_output_dims.shape).zero_()
# set `target_class` to 1 over all dimensions other than the class dimension
one_hot_output[:, :, :, :, target_class] = 1
# flatten afterwards
one_hot_output = one_hot_output.view(one_hot_output.size(0), -1)

I think this makes sense, but I wanted to run it by @pifalken in case I'm wrong, otherwise this may need to be corrected in the repo. Let me know what you think!
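For reference, the proposed mask can be built and sanity-checked in isolation; the shape and target_class below are just the example values from this issue:

```python
import torch

target_class = 7                       # hypothetical class index
shape = (1, 3, 36, 52, 85)             # unflattened output shape from above

one_hot = torch.zeros(shape)
one_hot[..., target_class] = 1         # set the class channel at every anchor/cell
one_hot = one_hot.view(one_hot.size(0), -1)

print(tuple(one_hot.shape))            # (1, 477360)
print(int(one_hot.sum().item()))       # 5616 -- one entry per anchor/grid cell
```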
