pifalken / yolov3-gradcam
GradCAM algorithm implementation for YOLOv3
It seems to me that a direct forward pass via `model(x)` and the extractor's forward pass via `forward_pass_on_convolutions(x)` give outputs of different sizes.
`forward_pass_on_convolutions(x)` outputs a tensor of size (1, 477360), which is the flattened form of (1, 3, 36, 52, 85) -> (1, 5616, 85) -> (1, 477360).
However, `model_output = self.model(x)` gives multiple outputs: `model_output[0]` has shape (1, 7371, 85), as opposed to the (1, 5616, 85) obtained previously. I turned to `model_output[1]`, which is a list of size 3, to understand what's going on:
model_output[1][0].shape -> (1, 3, 9, 13, 85) -> (1, 351, 85)
model_output[1][1].shape -> (1, 3, 18, 26, 85) -> (1, 1404, 85)
model_output[1][2].shape -> (1, 3, 36, 52, 85) -> (1, 5616, 85): this is what `forward_pass_on_convolutions(x)` returns.
Now, concatenating these along axis 1 gives us: (1, 351 + 1404 + 5616, 85) -> (1, 7371, 85): this is the shape of `model_output[0]`.
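The shape arithmetic above can be checked with a few lines of plain Python. This is only a sanity check of the numbers quoted in the post; the variable names are illustrative and do not come from the repository.

```python
from math import prod

# Per-scale output shapes quoted in the post:
# (batch, anchors, grid_h, grid_w, channels).
scale_shapes = [
    (1, 3, 9, 13, 85),
    (1, 3, 18, 26, 85),
    (1, 3, 36, 52, 85),
]

# Predictions per scale = anchors * grid_h * grid_w.
predictions_per_scale = [prod(shape[1:4]) for shape in scale_shapes]

# Concatenating along axis 1 adds the per-scale prediction counts.
total_predictions = sum(predictions_per_scale)

# Flattening the last scale multiplies its predictions by the channel count.
flattened_last_scale = predictions_per_scale[-1] * scale_shapes[-1][-1]

print(predictions_per_scale)  # [351, 1404, 5616]
print(total_predictions)      # 7371
print(flattened_last_scale)   # 477360
```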
The YOLOv2/YOLO9000 paper mentions the following:
Fine-Grained Features. This modified YOLO predicts detections on a 13 × 13 feature map. While this is sufficient for large objects, it may benefit from finer grained features for localizing smaller objects. Faster R-CNN and SSD both run their proposal networks at various feature maps in the network to get a range of resolutions. We take a different approach, simply adding a passthrough layer that brings features from an earlier layer at 26 × 26 resolution.
I infer from this that a similar mechanism is at work here: results from 3 different resolutions are brought together and concatenated to produce an output of size (1, 7371, 85). However, `forward_pass_on_convolutions(x)` only provides the outputs of the 3rd resolution, hence the match with `model_output[1][2].shape` -> (1, 5616, 85).
In light of these, I have two questions:
Why does `forward_pass_on_convolutions(x)` not include the outputs of the other resolutions? It seems that in the current setting we are backpropagating with incomplete target outputs (the target outputs we generate in `generate_cam` also have shape (1, 5616, 85)).
As a solution, I tried to generate 3 target tensors with sizes that correspond to the 3 resolutions, but only the one with size (1, 5616, 85) can be backpropagated; the others, as expected, fail on `model_output.backward()` due to size incompatibility. How can I get around this so that the other sizes can be backpropagated as well?
Many thanks for the help in advance.
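One possible way to approach the second question, sketched under assumptions: instead of calling `backward()` on a single flattened output with one target tensor, sum the target channel over all three scales into one scalar and backpropagate that. The `features` tensor and the `heads` below are hypothetical stand-ins for the shared backbone and the three YOLO output layers; they are not the repository's code, and only the grid sizes come from the post.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a shared feature tensor from the backbone.
features = torch.randn(1, 8, requires_grad=True)

# Hypothetical stand-ins for the three YOLO heads. Each produces a
# (1, 3, grid_h, grid_w, 85) tensor with the grid sizes quoted in the post.
grids = [(9, 13), (18, 26), (36, 52)]
heads = [torch.nn.Linear(8, 3 * h * w * 85) for h, w in grids]
outputs = [
    head(features).view(1, 3, h, w, 85)
    for head, (h, w) in zip(heads, grids)
]

target_class = 0  # index into the last dimension, as in the post

# A single scalar score: the target channel summed over every scale.
# This has the same effect as passing a per-scale 0/1 target tensor as the
# gradient argument of backward() for each output.
score = sum(out[..., target_class].sum() for out in outputs)
score.backward()

total_predictions = sum(3 * h * w for h, w in grids)
print(total_predictions)    # 7371
print(features.grad.shape)  # torch.Size([1, 8])
```

Because the score is a scalar, no target tensor has to match the shape of any single output, which avoids the size mismatch described above. Whether summing over all scales is the right choice for a GradCAM visualisation is a separate design question that this sketch does not settle.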
image 1/1 data/samples/Jujube_4501.bmp: Model Summary: 222 layers, 6.1556e+07 parameters, 6.1556e+07 gradients
Traceback (most recent call last):
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 178, in <module>
    cam = grad_cam.generate_cam(prep_img, target_class)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 127, in generate_cam
    conv_output, model_output = self.extractor.forward_pass(input_image)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 106, in forward_pass
    conv_output, x = self.forward_pass_on_convolutions(x)
  File "/home/zxzn/YOLOv3-GradCAM/gradcam/gradcam.py", line 91, in forward_pass_on_convolutions
    x = x + layer_outputs[int(mdef["from"])]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Process finished with exit code 1
Why does this happen? I used a model trained with ultralytics/yolov3 and its corresponding cfg, but it does not run correctly. I also copied models.py and the utils folder from the ultralytics yolov3 repository.
This is my directory structure:
(cam) zxzn@zxzn:~/YOLOv3-GradCAM/gradcam$ ls
cfg data detect.py gradcam.py misc_functions.py models.py __pycache__ test.py train.py utils weights
Thank you for your help @pifalken
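The traceback shows that `mdef["from"]` is a list by the time `int()` is applied to it. One plausible cause, stated here as an assumption, is that the copied cfg parser already converts the `from` field to a list of integers, while `gradcam.py` expects a string. A minimal sketch of a workaround is to normalise the value before indexing. The helper `as_int_list` is hypothetical, and the integers in `layer_outputs` stand in for tensors.

```python
def as_int_list(value):
    """Return `value` as a list of ints.

    Accepts an int, a string such as "-3" or "-1, 61", or a list/tuple
    of ints or numeric strings.
    """
    if isinstance(value, (list, tuple)):
        return [int(v) for v in value]
    if isinstance(value, str):
        return [int(v) for v in value.split(",")]
    return [int(value)]


# Illustrative shortcut layer definition with `from` already parsed to a list.
mdef = {"type": "shortcut", "from": [-3]}
layer_outputs = [10, 20, 30, 40]  # stand-ins for earlier layer outputs

x = 1
for idx in as_int_list(mdef["from"]):
    x = x + layer_outputs[idx]

print(x)  # 21
```

The same loop works whether the parser yields `"-3"`, `-3`, or `[-3]`, so it does not depend on which version of the parser was copied.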
I am hesitant to ask such a basic question, but I don't know how to use this program. Can someone please tell me how to use it?
I will probably need to incorporate "ultralytics/yolov3", but I don't know what to do after that.
My environment is Google Colab.
Hello.
I want to run GradCAM on a YOLOv3 model I trained, so I came to this repository.
When I ran gradcam.py, I noticed that there is no models.py or utils folder in this repository.
So, currently, gradcam.py cannot be executed.
Where can I get models.py and the utils folder?
Thank you very much.
In `gradcam.py`, in the `generate_cam` function, the target tensor is created as follows:
# target for backprop
one_hot_output = torch.cuda.FloatTensor(1, model_output.size()[-1]).zero_()
one_hot_output[0][target_class] = 1
Given that `model_output` here is of shape (1, 3, 36, 52, 85) before flattening, shouldn't all elements at index `[target_class]` of the last dimension (# classes) be set to 1? I think setting only the index `target_class` to 1 in the tensor after flattening may be incorrect, so I came up with this:
# I also return the dimensions of the output before flattening in `forward_pass()`
conv_output, model_output, model_output_dims = self.extractor.forward_pass(input_image)
# create tensor with original output shape
one_hot_output = torch.FloatTensor(model_output_dims.shape).zero_()
# set `target_class` to 1 over all dimensions other than the class dimension
one_hot_output[:, :, :, :, target_class] = 1
# flatten afterwards
one_hot_output = one_hot_output.view(one_hot_output.size(0), -1)
I think this makes sense, but I wanted to run it by @pifalken in case I'm wrong, otherwise this may need to be corrected in the repo. Let me know what you think!
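To make the difference concrete, the following plain-Python sketch lists which positions of the flattened (1, 477360) tensor correspond to one channel of the last dimension, assuming the usual row-major flattening done by `view`. The helper name `flat_indices_for_channel` is illustrative. Note that it takes the channel index at face value, as the post does; if the 85 channels hold box and objectness values before the class scores, the class channel would need an offset.

```python
def flat_indices_for_channel(shape, channel):
    """Flat (row-major) indices of `channel` in the last dimension of
    `shape`, for a tensor flattened to (1, -1) with batch size 1."""
    channels = shape[-1]
    predictions = 1
    for dim in shape[1:-1]:
        predictions *= dim
    return [p * channels + channel for p in range(predictions)]


shape = (1, 3, 36, 52, 85)
target_class = 2

indices = flat_indices_for_channel(shape, target_class)
print(len(indices))  # 5616
print(indices[:3])   # [2, 87, 172]
```

The original code sets only flat index `target_class` (here 2), which is the first of these 5616 positions, whereas the proposed 5-D assignment sets all of them.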