zhoubolei / cam
Class Activation Mapping
Home Page: http://cnnlocalization.csail.mit.edu/
License: MIT License
The error is:
Check failed: ReadProtoFromBinaryFile(param_file, param) Failed to parse NetParameter file: CAM/models/vgg16CAM_train_iter_90000.caffemodel
*** Check failure stack trace: ***
I think it might be a problem with the Caffe version...
The function crop2img seems to combine the "gradients" into one image. Since the resulting variables are called "alignImgMean" and "alignImgSet", I guess this function is somehow related to the "mergeTenCrop.m" script, so I tried to reproduce crop2img with mergeTenCrop. What I haven't figured out so far is whether crop2img converts the gradients or not.
https://github.com/metalbubble/CAM/blob/18419ae817d9fcda72bb5fcbe132113ce9d58cc8/mergeTenCrop.m#L6
For example, the function mergeTenCrop expects a 256x256x1x10 matrix (cropImgSet), but ILSVRC_generate_heatmap.m provides a 256x256x3x10 array (CAMmap_crops) as input.
Am I supposed to pre-convert the gradients?
If so, is the third dimension the color channel (3 = RGB, 1 = grayscale)?
Is crop2img really based on mergeTenCrop, or did I miss something?
Could you please explain the purpose of crop2img and how to reimplement it?
Thanks for your effort. Could you please share the pretrained vgg16CAM model?
Thanks.
I'm hoping to use this to train on my own data from scratch. I have images that have point annotations in the format of coordinates. What is the format of the annotations required for CAM?
Nice online demo with Class Activation Mapping and Squeezenet-v1.1:
https://transcranial.github.io/keras-js/#/squeezenet-v1.1
Why do you use average pooling here instead of global average pooling as in the paper?
layers {
  name: "CAM_pool"
  type: POOLING
  bottom: "CAM_conv"
  top: "CAM_pool"
  pooling_param {
    pool: AVE
    kernel_size: 14
    stride: 14
  }
}
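When the input to CAM_pool is exactly 14x14, a window with kernel_size 14 and stride 14 covers the whole map once, so this layer computes global average pooling. The NumPy sketch below checks that equivalence; the 1024x14x14 shape is an assumption about what CAM_conv outputs for a 224x224 input, not something stated in the prototxt.

```python
import numpy as np

def avg_pool(x, kernel, stride):
    """Average pooling without padding over a (C, H, W) array."""
    c, h, w = x.shape
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    out = np.empty((c, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[:, i * stride:i * stride + kernel, j * stride:j * stride + kernel]
            out[:, i, j] = window.mean(axis=(1, 2))
    return out

def global_avg_pool(x):
    """Global average pooling: one mean per channel."""
    return x.mean(axis=(1, 2))

# Assumed CAM_conv output: 1024 channels on a 14x14 grid.
feat = np.random.default_rng(0).random((1024, 14, 14))
pooled = avg_pool(feat, kernel=14, stride=14)
print(pooled.shape)                                         # (1024, 1, 1)
print(np.allclose(pooled[:, 0, 0], global_avg_pool(feat)))  # True
```

The equivalence only holds for this input size: on a larger feature map the same fixed kernel would produce more than one output cell, whereas true global average pooling always produces one value per channel.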
Hello, I want to use class activation maps to localize multiple labels. With the trained model, the maps for remote sensing images all look the same: everything is activated. Can you help me solve this puzzle? The following is a remote sensing image (from the dataset used to train the model) and the maps for two different targets.
The trained model provided at http://cnnlocalization.csail.mit.edu/demoCAM/models/imagenet_googleletCAM_train_iter_120000.caffemodel,
which is downloaded by the download.sh script, is too small (3.75 MB), and it gives an error when running demo.m with online=1:
Check failed: ReadProtoFromBinaryFile(param_file, param) Failed to parse NetParameter file: models/imagenet_googleletCAM_train_iter_120000.caffemodel
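A protobuf parse failure on a file that is much smaller than expected is consistent with an interrupted download, or with an HTML error page saved under the model's filename. The helper below is a hypothetical sanity check for both cases; the 10 MB threshold is an arbitrary assumption, not the real size of the GoogLeNet-CAM weights.

```python
import os

# Hypothetical threshold: a full GoogLeNet-CAM caffemodel is assumed to be
# well above a few megabytes, so anything smaller is treated as suspicious.
MIN_EXPECTED_BYTES = 10 * 1024 * 1024

def looks_like_bad_download(path, min_bytes=MIN_EXPECTED_BYTES):
    """Return True if the file is suspiciously small or is actually an HTML page."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    is_html = head.startswith(b"<!doctype html") or head.startswith(b"<html")
    return size < min_bytes or is_html
```

For example, `looks_like_bad_download("models/imagenet_googleletCAM_train_iter_120000.caffemodel")` returning True would suggest re-downloading the file before looking for Caffe version issues.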
I used Caffe to fine-tune VGG-16 on a new dataset and obtained the fine-tuned model. How can I generate CAMs for this fine-tuned model and my custom dataset?
Could you please share the python code that generate the bounding box according to the heatmap?
Originally posted by @liu666666 in #18 (comment)
For example, suppose the task is to classify dog vs. not dog, with dog labeled as class 1.
When a dog image comes in, the class 1 map focuses on every part except the dog;
on the contrary, the class 0 map focuses right on the dog.
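One possible explanation, offered as a hypothesis rather than a diagnosis: with a two-way softmax, the predicted probabilities depend only on the difference between the two class scores, so the same predictions are compatible with many ways of splitting the evidence between the class-0 and class-1 weight rows. The NumPy sketch below uses random, hypothetical features and weights to show that adding one vector to both rows leaves the softmax output unchanged while changing each class's CAM; only CAM_1 - CAM_0 stays fixed.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cams(weights, feat):
    """Per-class CAMs: weights (K, C) applied to features (C, H, W) -> (K, H, W)."""
    return np.einsum("kc,chw->khw", weights, feat)

rng = np.random.default_rng(0)
feat = rng.random((8, 7, 7))       # hypothetical last-conv features
w = rng.normal(size=(2, 8))        # hypothetical fc weights for classes 0 and 1
shift = rng.normal(size=8)         # the same vector added to both class rows
w_shifted = w + shift

gap = feat.mean(axis=(1, 2))       # global average pooling
p, p_shifted = softmax(w @ gap), softmax(w_shifted @ gap)
print(np.allclose(p, p_shifted))   # True: identical predictions

m, m_shifted = cams(w, feat), cams(w_shifted, feat)
print(np.allclose(m, m_shifted))   # False: the individual CAMs change
print(np.allclose(m[1] - m[0], m_shifted[1] - m_shifted[0]))  # True
```

If this is what is happening, inspecting CAM_1 - CAM_0 may be more informative than either map on its own.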
In the paper you say that you fine-tune the model. Do you fine-tune the whole network, or only the part where you introduced the 3x3x1024 convolution followed by global average pooling and softmax? We have very little compute, so this detail would help us avoid wasting time and concentrate on other tasks.
Thank you.
I have tried to convert the caffemodel to a Torch model according to the prototxt and .caffemodel, but the accuracy is much lower than it should be. Would you share the pretrained vgg16CAM model with me? Or is there anything I should pay special attention to?
Thanks.
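A frequent cause of lower accuracy after converting Caffe weights is a mismatch in input preprocessing rather than in the weights themselves. Caffe ImageNet models are usually trained on BGR images in the 0-255 range with per-channel mean subtraction, whereas torchvision-style pipelines feed RGB values in [0, 1] normalized by mean and std. The sketch below reproduces the Caffe convention in NumPy; the mean values are the commonly used ImageNet BGR means and are an assumption here, so check them against the mean actually used to train vgg16CAM.

```python
import numpy as np

# Commonly used Caffe ImageNet channel means in BGR order (assumed; verify
# against the mean file used for training).
CAFFE_MEAN_BGR = np.array([104.0, 117.0, 123.0])

def caffe_preprocess(rgb_uint8):
    """HxWx3 RGB uint8 image -> 3xHxW float32 array in Caffe's convention."""
    img = rgb_uint8.astype(np.float32)  # keep the 0-255 range (no division by 255)
    img = img[:, :, ::-1]               # RGB -> BGR
    img = img - CAFFE_MEAN_BGR          # per-channel mean subtraction
    return img.transpose(2, 0, 1).astype(np.float32)  # HWC -> CHW
```

Feeding the converted network with this preprocessing, instead of torchvision-style normalization, is a cheap first check before suspecting the weight conversion.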
As far as I can see, the heatmap for each test image should be 224x224. So in mergeTenCrop.m, the input CAMmap_crops is expected to be 224x224x10, not 256x256x10, and cropImgSet should also be changed to 224x224x3x10.
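Assuming ten 224x224 crops taken from a 256x256 image (four corners and the center, plus their horizontal mirrors) and one 224x224 heatmap per crop, merging means writing each map back at its crop position, un-mirroring the flipped ones, and averaging wherever crops overlap. The NumPy sketch below does this; the crop order and the convention that maps 5 to 9 are the mirrored versions of maps 0 to 4 are assumptions, and offsets are written as (row, col) to keep array indexing explicit.

```python
import numpy as np

IMG, CROP = 256, 224
OFF = IMG - CROP  # 32
# Hypothetical crop order as (row, col) offsets: four corners, then the center.
OFFSETS = [(0, 0), (0, OFF), (OFF, 0), (OFF, OFF), (OFF // 2, OFF // 2)]

def merge_ten_crops(crop_maps):
    """crop_maps: (10, 224, 224); maps 5..9 are assumed to come from the
    horizontally flipped versions of crops 0..4. Returns a (256, 256) average."""
    total = np.zeros((IMG, IMG))
    count = np.zeros((IMG, IMG))
    for k, (r, c) in enumerate(OFFSETS):
        # Un-flip the mirrored crop's map before placing it.
        for m in (crop_maps[k], crop_maps[k + 5][:, ::-1]):
            total[r:r + CROP, c:c + CROP] += m
            count[r:r + CROP, c:c + CROP] += 1
    return total / count  # every pixel is covered by at least one corner crop
```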
I mainly tackle regression problems with CNNs and want a reliable method to compute heatmaps for the network's results. However, almost all interpretation methods, including CAM, are used for classification networks rather than regression networks. Is there any interpretation method suitable for CNNs that perform regression?
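If the regression head is global average pooling followed by a single linear unit (an assumption about the architecture), the CAM derivation carries over unchanged: y = sum_k w_k * mean(f_k) + b = mean(sum_k w_k * f_k) + b, so the map sum_k w_k * f_k decomposes the prediction spatially, with positive regions pushing the output up and negative regions pushing it down. The sketch below uses hypothetical random features and weights; it does not apply to heads with further nonlinear layers.

```python
import numpy as np

def regression_cam(feat, w):
    """Spatial map sum_k w_k * f_k for a GAP + linear regression head.
    feat: (C, H, W) last-conv features; w: (C,) regression weights."""
    return np.tensordot(w, feat, axes=1)  # -> (H, W)

rng = np.random.default_rng(0)
feat = rng.random((16, 7, 7))  # hypothetical features
w = rng.normal(size=16)        # hypothetical regression weights
b = 0.5                        # hypothetical bias

y = w @ feat.mean(axis=(1, 2)) + b  # model output
cam = regression_cam(feat, w)
print(np.isclose(cam.mean() + b, y))  # True: the map averages back to the prediction
```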
Hi, when I run ILSVRC_evaluate_bbox.m to evaluate the bbox, I get the following error:
"Undefined function 'loadHeatMap' for input arguments of type 'char'."
The function "loadHeatMap" used in ILSVRC_evaluate_bbox.m (line 73) is undefined. Could you update this script? Thank you!
GoogLeNet-CAM model on ImageNet.
I was not able to find the file: models/deploy_googlenetCAM.prototxt to use with the pre-trained weights. Is it possible to add this in?
Can you provide the dataset you are using?
Hi.
I was looking into the C++ code for bounding box generation (bboxGenerator/dt_box.cpp), and I noticed that you accumulate bounding boxes over several thresholds (with default values 30, 90, and 150). However, the paper mentions only one threshold (20% of the largest value in the map).
It would be great if you could elaborate on this difference. Am I missing something?
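One plausible reading, not verified against dt_box.cpp, is that the heatmap is scaled to 0-255, so 30, 90, and 150 correspond to roughly 12%, 35%, and 59% of the maximum (20% of the maximum would be 51 on that scale), and one candidate box is proposed per threshold. The sketch below illustrates that reading; for brevity it boxes all pixels above each threshold and skips the connected-component step, and the function name is hypothetical.

```python
import numpy as np

def boxes_per_threshold(heatmap, thresholds=(30, 90, 150)):
    """Scale a non-constant heatmap to 0-255 and return one inclusive
    (x0, y0, x1, y1) box per threshold, covering all pixels at or above it."""
    scaled = 255.0 * (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min())
    boxes = []
    for t in thresholds:
        ys, xs = np.nonzero(scaled >= t)  # never empty: the maximum maps to 255
        boxes.append((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))
    return boxes
```

Under this reading, lower thresholds yield larger, looser boxes and higher thresholds yield tighter boxes around the peak, which would give the evaluation several candidates per image.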
Hello, there seems to be a problem with the label URL: accessing it returns an error, and I can't download the JSON file. How can I fix this?
If I want to train a CAM model on medical images (3 classes), which loss should I use?
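CAM itself does not require a special loss: the maps come from the weights of a classifier trained with an ordinary classification objective. Assuming the three classes are mutually exclusive (one label per image), softmax cross-entropy is the standard choice; if an image can carry several labels at once, a per-class sigmoid with binary cross-entropy fits better. Both are sketched in NumPy for reference; in practice you would use your framework's built-in versions.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for mutually exclusive classes.
    logits: (N, K) scores from GAP + fc; labels: (N,) integer class ids."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def sigmoid_bce(logits, targets):
    """Mean binary cross-entropy for multi-label targets in {0, 1}, shape (N, K).
    Uses log(1 + e^x) - t * x, which equals -[t log s(x) + (1 - t) log(1 - s(x))]."""
    return np.mean(np.logaddexp(0, logits) - targets * logits)
```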
I am the developer of CAM. Recently I found this blog article (https://thehive.ai/blog/inside-a-neural-networks-mind) that introduces CAM and Grad-CAM. Its overview of CAM and Grad-CAM is good, but it contains a biased, misleading claim about CAM compared to Grad-CAM. This incorrect claim has been around for a while, so I would like to clarify it below:
First of all, nowadays all the mainstream network architectures, such as ResNet, DenseNet, and SqueezeNet, use global average pooling at the end, so the class activation map (the heatmap) can be generated directly with CAM, without modifying the network architecture. So the claim that Grad-CAM is superior to CAM because Grad-CAM works without modifying the architecture is false.
Meanwhile, if you are using ResNet, DenseNet, SqueezeNet, or any other modern network, you can generate the heatmap with CAM directly (see the example code at https://github.com/metalbubble/CAM/blob/master/pytorch_CAM.py), without the extra step of computing gradients as in Grad-CAM. This skips the backward pass, which saves almost half of the computation. That is crucial in applications such as video processing, where CAM needs only the forward pass to generate the prediction and the heatmap for each frame. The code associated with that blog (https://github.com/hiveml/tensorflow-grad-cam) already uses ResNet but still computes gradients to generate the CAM, which simply wastes computation.
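The point can be checked numerically. For a network ending in global average pooling and a fully connected layer, y_c = sum_k w_ck * mean(A_k), so the gradient of y_c with respect to every position of A_k is w_ck / (H*W), and Grad-CAM's channel weights reduce to the fc weights divided by H*W. The NumPy sketch below (random, hypothetical activations and weights; gradients written analytically instead of via backpropagation) confirms that Grad-CAM then equals ReLU(CAM) up to that constant scale.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, K = 32, 7, 7, 5
A = rng.normal(size=(C, H, W))  # hypothetical last-conv activations
Wfc = rng.normal(size=(K, C))   # hypothetical fc weights after GAP
c = 3                           # class of interest

# CAM: forward pass only.
cam = np.tensordot(Wfc[c], A, axes=1)  # (H, W)

# Grad-CAM: dy_c/dA[k, i, j] = Wfc[c, k] / (H * W) for a GAP + fc head.
grads = np.broadcast_to((Wfc[c] / (H * W))[:, None, None], A.shape)
alpha = grads.mean(axis=(1, 2))                            # channel weights
grad_cam = np.maximum(np.tensordot(alpha, A, axes=1), 0)   # ReLU as in Grad-CAM

print(np.allclose(grad_cam, np.maximum(cam, 0) / (H * W)))  # True
```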
I want to evaluate the CAM algorithm with the provided evaluation script.
However, the "sizeImg_ILSVRC2014.mat" file seems to be missing. It is neither in this project's folder nor in the ILSVRC devkit, and a Google search doesn't turn up any mention of it either.
So where or how do I get this file?
Traceback (most recent call last):
File "pytorch_CAM.py", line 4, in <module>
import requests
ModuleNotFoundError: No module named 'requests'
Hello,
I saw that the Places365 CAM code applies thresholding to the softmax layer weights, while neither this repository nor the original paper mentions this trick.
Is there an explanation for this?
Thanks!
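The question does not quote the Places365 code, so its exact rule is not shown here; assuming the thresholding zeroes out negative softmax weights, its effect can be sketched as below. Because post-ReLU features are non-negative, clipping the weights at zero makes the map non-negative: only channels that vote for the class contribute, and evidence against the class is discarded. cam_from_weights is a hypothetical helper.

```python
import numpy as np

def cam_from_weights(weights, feat, class_idx, clip_negative=False):
    """CAM for one class. feat: (C, H, W) post-ReLU features; weights: (K, C).
    clip_negative=True zeroes negative fc weights first (assumed thresholding rule)."""
    w = weights[class_idx]
    if clip_negative:
        w = np.maximum(w, 0)
    return np.tensordot(w, feat, axes=1)  # -> (H, W)
```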
I only came across this interesting work recently.
I have a small doubt.
Line 48 in c63f285: the dimension of weight_softmax[idx] should be 512. However, for layer4's nc, it should be 256. Is there a mistake here? In other words, I suspect that CAM can only be used on the last layer, so that the dimension matches 512.
The resize is a bit rough.
For example, I have 2 classes, and for a single image I get a heatmap of shape [H, W, N_CLASSES]. I train my model with sigmoid + binary cross-entropy. At prediction time I use a larger image as network input, and I want the classes to be mutually exclusive at each pixel, so I compare the heatmap values with np.argmax to get the 'best' class. So my question is: are the values in the heatmaps really comparable?
I tried to dump min, max values of heatmap for single image:
i 0
np.min(heatmap[:,:,i]),np.max(heatmap[:,:,i]) -38.4533 19.9384
i 1
np.min(heatmap[:,:,i]),np.max(heatmap[:,:,i]) -20.2977 34.8101
As you can see, the value ranges differ and the heatmaps are not normalized.
Is there a way to normalize all planes of heatmap to [0,1] range and make them comparable?
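If the model is global average pooling followed by a linear layer with bias (an assumption about the architecture), each class's image-level logit equals the spatial mean of its CAM plus that class's bias. Adding the bias per pixel therefore puts all planes on the same scale as the logits the sigmoid saw during training, although per-pixel scores are not guaranteed to be calibrated like image-level ones. Per-class min-max scaling, by contrast, maps every plane to [0, 1] but removes exactly the information needed for comparison. Both are sketched below; the function names are hypothetical.

```python
import numpy as np

def minmax_per_class(heatmap):
    """Scale each class plane of an (H, W, K) heatmap to [0, 1] independently.
    Fine for display, but it destroys comparability across classes."""
    lo = heatmap.min(axis=(0, 1), keepdims=True)
    hi = heatmap.max(axis=(0, 1), keepdims=True)
    return (heatmap - lo) / (hi - lo)

def pixelwise_sigmoid(heatmap, bias):
    """Add each class's fc bias and apply the training-time sigmoid.
    heatmap: (H, W, K) raw CAMs; bias: (K,). Output lies in (0, 1) on a shared scale."""
    return 1.0 / (1.0 + np.exp(-(heatmap + bias)))
```

Because the sigmoid is monotonic and shared by all classes, `np.argmax(pixelwise_sigmoid(h, b), axis=-1)` picks the same class as `np.argmax(h + b, axis=-1)`; the sigmoid only matters if you also want [0, 1] values, for example to apply a confidence threshold.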
What if I want to visualize an intermediate feature map whose channel count differs from that of the last layer? For example, visualizing layer3 of resnet18 in PyTorch.
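CAM weights each channel of a feature map by the fully connected layer's weight for that channel, so it only applies to the layer that feeds global average pooling. In torchvision's resnet18, layer4 outputs 512 channels, matching the 512 inputs of fc, while layer3 outputs 256 channels, for which no class weights exist. The hypothetical helper below makes that check explicit and falls back to a class-agnostic channel mean for intermediate layers; this is one simple option among several, and gradient-based methods such as Grad-CAM are another.

```python
import numpy as np

def cam_or_activation_map(feat, fc_weights=None, class_idx=None):
    """feat: (C, H, W). If fc weights with a matching C are given, return the CAM
    for class_idx; otherwise return a class-agnostic mean over channels."""
    if fc_weights is not None:
        if fc_weights.shape[1] != feat.shape[0]:
            raise ValueError(
                f"fc expects {fc_weights.shape[1]} channels, layer has {feat.shape[0]}")
        return np.tensordot(fc_weights[class_idx], feat, axes=1)
    return feat.mean(axis=0)
```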
The current repository includes the train_val file and solver for GoogLeNet-CAM. Have you tried other models, such as VGGNet or ResNet, for CAM?
It's confusing to find dataset/ILSVRC2012/imageListVal.mat and sizeImg_ILSVRC2014.mat.
Could you provide more details on data preparation?
How do I train the model?
I have my own dataset and want to train the model on it...
Hi,
I am new to this field and am exploring the possibility of using CAM for weakly supervised object localization.
As far as I understand, if I have trained a classifier (let's say for 3 objects) with a network architecture such as ResNet, DenseNet, or SqueezeNet that uses global average pooling at the end, I can apply your pytorch_CAM.py from the repository to generate heatmaps for those 3 classes.
It would be great if someone could correct me if I am wrong.
Thanks
rahul
I am not familiar with C++. I want to run your code generate_bbox.m, but when it reaches the line
system(['bboxgenerator/./dt_box ' curHeatMapFile ' ' curParaThreshold ' ' curBBoxFile]);
some errors occur, e.g., generate_bbox cannot be used.
I have some questions:
I have a question about the results on the classification in the paper:
For classification, why do you compare your GAP network with NIN? NIN also has a GAP layer, so what can be concluded when the GAP network performs better than NIN?
I am really a little confused about this, so could you be so kind to give me some explanations? Thanks!
@metalbubble @ajschumacher
Thank you so much for your code; it is very helpful to me. I ran into some trouble:
"online = 0; % whether extract features online or load pre-extracted features"
In this line, I want to use my own dataset and extract features online, but when I change to "online=1", the results seem terrible.
What do you think the reason is, and is there any other code I need to change?
Thank you so much again; I look forward to your reply.
Traceback (most recent call last):
File "CAM3.py", line 69, in <module>
h_x = F.softmax(logit, dim=1).data.squeeze()
File "/home/omnisky/anaconda2/envs/swin/lib/python3.7/site-packages/torch/nn/functional.py", line 1512, in softmax
ret = input.softmax(dim)
AttributeError: 'tuple' object has no attribute 'softmax'
I encountered the above problem when using pytorch_CAM.py with my own model. Do you know how to solve it? Thank you very much, and I look forward to your reply. @Bolei Zhou
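The traceback shows that logit is a tuple, which suggests the model's forward returns several outputs (for example, main and auxiliary logits, or logits plus features) instead of a single tensor. A minimal workaround, assuming the first element holds the class logits (check your model's forward to confirm), is to unwrap the output before calling F.softmax. The helper name main_logits is hypothetical.

```python
def main_logits(output):
    """Return the primary logits from a model output.
    Assumes that when a model returns a tuple or list, the first element
    holds the class logits (namedtuples are tuples, so they are handled too)."""
    if isinstance(output, (tuple, list)):
        return output[0]
    return output
```

Usage: `logit = main_logits(model(x))`, where model and x stand for your network and input tensor, followed by `h_x = F.softmax(logit, dim=1).data.squeeze()` as before. For torchvision's GoogLeNet and Inception v3, the auxiliary outputs are returned only in training mode, so calling `model.eval()` may already be enough.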
I am using the PyTorch script available in your repository.
Currently, this line is
cam = weight_softmax[class_idx].dot(feature_conv.reshape((nc, h*w)))
But I think it should be,
cam = weight_softmax[idx].dot(feature_conv.reshape((nc, h*w)))
class_idx should be replaced by idx, because class_idx is a list and idx is an integer.
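With the suggested fix applied, a NumPy sketch of the CAM loop modeled on the quoted line might look like this. Indexing with the integer idx selects one weight row of shape (nc,), whereas indexing with the list class_idx selects a (len(class_idx), nc) matrix, so the later reshape to (h, w) would fail as soon as more than one class is requested. The function name return_cams is hypothetical, and upsampling to the image size is omitted.

```python
import numpy as np

def return_cams(feature_conv, weight_softmax, class_idx):
    """feature_conv: (1, nc, h, w) activations for a single image;
    weight_softmax: (num_classes, nc); class_idx: iterable of class indices.
    Returns one uint8 (h, w) map per requested class."""
    _, nc, h, w = feature_conv.shape
    feats = feature_conv.reshape((nc, h * w))
    output = []
    for idx in class_idx:
        cam = weight_softmax[idx].dot(feats).reshape(h, w)  # idx, not class_idx
        cam = cam - cam.min()
        cam = cam / max(cam.max(), 1e-12)                   # guard against a flat map
        output.append(np.uint8(255 * cam))
    return output
```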
Hello, how is heatmap_6.txt generated?
Hello, I read the code and the paper "Learning Deep Features for Discriminative Localization". Now I'm trying to apply this method to a case with fewer class labels (only 2). Everything was fine up to the up-sampling step. However, I can't work out the technique used in the up-sampling step (maybe mergeTenCrop.m does this), so I'm searching Google for a paper about it, and any information about the technique would help.
In sum, can you tell me the name of the technique used in mergeTenCrop?
Thank you for reading.
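As far as can be told from its name and the discussion above, mergeTenCrop.m averages heatmaps from ten crops (four corners, the center, and their mirror images), which is commonly called ten-crop (or oversampling) testing. Separately, the CAM is upsampled from the last convolutional layer's resolution to the crop size; bilinear interpolation is a common choice for that step. The sketch below is a plain NumPy bilinear resize using half-pixel centers (the convention OpenCV's INTER_LINEAR follows, to our understanding); it is an illustration, not the repository's code.

```python
import numpy as np

def bilinear_resize(cam, out_h, out_w):
    """Bilinear upsampling of a 2-D map with half-pixel centers and clamped borders."""
    in_h, in_w = cam.shape
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = cam[y0][:, x0] * (1 - wx) + cam[y0][:, x1] * wx
    bottom = cam[y1][:, x0] * (1 - wx) + cam[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

For example, `bilinear_resize(cam, 224, 224)` would take a 14x14 CAM to the 224x224 crop size before the crops are merged.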
Hello! I really appreciate your work.
When I run the demo, there is an error like this:
error while loading shared libraries: libopencv_core.so.2.4: cannot open shared object file: No such file or directory
How can I solve it? Thank you!
Are there any ways to use CAM on re-id datasets such as Market-1501? On such tiny images, CAM does not perform as well as it does elsewhere.
I have tried your demo using the pre-trained CAM models. I now want to try it with my own pre-trained Caffe CNNs.
I would be very grateful if you could demonstrate the modifications that are required to convert a pre-trained non-CAM model to the CAM format. Is it as simple as modifying the prototxt files directly? Thanks.
Does VGGNet use the same data preparation (prepare_image.m) as GoogLeNet?
How do I fine-tune your model?
I don't have enough data to retrain your model from scratch. I want to fine-tune it on my data, which has only two classes.
The demo code that generates bounding boxes from CAMs seems to produce multiple boxes for one map. How can I modify it so that each map produces only one box for localization?
Thank you for sharing the code. Can it be used for detection tasks?
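Two simple options, both assumptions rather than documented behavior of the repository: generate boxes with a single threshold instead of several, or post-process the candidates and keep one, for example the one with the largest area. The hypothetical helper below does the latter, assuming boxes are inclusive (x0, y0, x1, y1) tuples.

```python
def pick_single_box(boxes):
    """Keep one box out of several candidates: the one with the largest area.
    boxes: list of (x0, y0, x1, y1) with inclusive pixel coordinates (assumed format)."""
    def area(b):
        x0, y0, x1, y1 = b
        return (x1 - x0 + 1) * (y1 - y0 + 1)
    return max(boxes, key=area)
```

Keeping the largest box favors the loosest threshold; keeping the box from one fixed threshold instead would make the choice independent of box size.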
Thanks for providing the source code, which helps me a lot.
I found a mistake in mergeTenCrop.m,
in lines 25 and 26:
alignImgSet(i:i+cropSize-1, j:j+cropSize-1,:,curr) = curCrop1;
alignImgSet(i:i+cropSize-1, j:j+cropSize-1,:, curr+5) = curCrop2;
They should probably be changed to:
alignImgSet(j:j+cropSize-1, i:i+cropSize-1,:,curr) = curCrop1;
alignImgSet(j:j+cropSize-1, i:i+cropSize-1,:, curr+5) = curCrop2;
Could you check whether this is right?