mmal-net's People

Contributors: zf4444

mmal-net's Issues

Results reported in the original paper

Hello. My best result on CUB is 88.85% (epoch 93) and on Aircraft 94.18% (epoch 97), both trained for 200 epochs in total, but I can never reach the numbers reported in the paper. What might be causing this?

How should GPUs be configured?

Hello, if I have two GPU cards, is it enough to change CUDA_VISIBLE_DEVICES = '0' to CUDA_VISIBLE_DEVICES = '0,1' in config.py?
Also, for the three values of N_list: with 16 GB of GPU memory, what values are appropriate for the CUB dataset?
Many thanks for your reply.
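
Setting CUDA_VISIBLE_DEVICES = '0,1' makes both cards visible, but the batch is only split across them if the model is wrapped for data parallelism; whether the repo's training script already does this would need checking. A minimal sketch of the usual PyTorch pattern (the Linear layer is a stand-in for the repo's MainNet):

    import os
    import torch
    import torch.nn as nn

    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'   # must be set before any CUDA call

    model = nn.Linear(10, 2)                     # stand-in for MainNet()
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)           # split each batch across visible GPUs
    if torch.cuda.is_available():
        model = model.cuda()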

Testing requires the pretrained model

Hello,

I noticed that the test code first loads the pretrained model when model = MainNet() is constructed in test.py (config.py sets pretrain_path = './models/pretrained/resnet50-19c8e357.pth'),

and then loads your trained model a second time at epoch = auto_load_resume(model, pth_path, status='test') in test.py.

Could we load only the final model, without loading the pretrained one first?
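
For reference, a sketch of loading just a final checkpoint in plain PyTorch; the 'model_state_dict' key and the path are assumptions and would need to match whatever auto_load_resume actually saves. Since the final checkpoint overwrites the ImageNet weights anyway, loading only the final weights should be equivalent at test time:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                                  # stand-in for MainNet()
    checkpoint = torch.load('best.pth', map_location='cpu')   # path is illustrative
    # Assumption: the file stores the weights under 'model_state_dict';
    # adjust the key to the checkpoint's real format.
    model.load_state_dict(checkpoint.get('model_state_dict', checkpoint))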

Another N_list config [to speed up training]

@ZF1044404254,
Thank you very much for your great work.
Could you please suggest another configuration of N_list (for the CAR dataset) that would speed up training?
(Would we also need to change window_side & ratios?)

Advice on tuning parameters for a custom dataset

Hi, I used your code to train on fighter-jet data (roughly 300*300 pixels per image, 16 classes, about 300 images per class). I adjusted input_size, stride, and window_side, but only by trial and error, and the results were poor; my best training accuracy is around 60%. Could you advise which direction to take when adjusting the parameters? Thanks!

Gradient of maximum score with respect to input image is 'None'

I am trying to compute saliency maps for MMAL-Net, following this blog. I have trained MMAL-Net on my custom data. To compute a saliency map, we first forward an image through the network and compute the score; then we compute the gradients of the maximum score with respect to each pixel of the input image, using the backward() function from torch.autograd. In my case, when I call backward() on the maximum score, its gradients are None.

The method in the blog works for other models available in torch, but for MMAL-Net the gradients are None.

Are there any suggestions on how to fix this, or am I missing something?
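
For what it's worth, gradients come back as None whenever the input tensor was not marked with requires_grad before the forward pass, or when the graph is cut somewhere (for example by a .data / .numpy() round-trip inside the model). A minimal model-agnostic sketch, assuming model returns a single logits tensor; MMAL-Net's forward returns several outputs, so the appropriate one must be selected:

    import torch

    def saliency_map(model, image):
        """image: (1, 3, H, W) tensor. Returns an (H, W) saliency map."""
        model.eval()
        image = image.clone().requires_grad_(True)   # must be set before the forward pass
        logits = model(image)                        # adapt: pick the logits output for MMAL-Net
        score = logits.max()                         # maximum class score
        score.backward()                             # populates image.grad
        return image.grad.abs().max(dim=1)[0].squeeze(0)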

Hello! About the AOLM part

Two questions, please:
Why are the top-left and bottom-right coordinate values in AOLM multiplied by 32?
And how do you avoid the case where, after taking the coordinates, local_image degenerates into a single line?
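
For context (my reading, not the author's confirmation): ResNet-50 downsamples its input by a factor of 32 by the final convolutional stage, so a box located on the feature map is presumably scaled by 32 to return to input-pixel coordinates. A toy illustration:

    # With a 448x448 input, ResNet-50's final feature map is 14x14 (448 / 32),
    # so a box found on that map scales back to input pixels by the stride.
    stride = 32
    fm_box = (2, 3, 9, 10)                        # (x0, y0, x1, y1) on the 14x14 map
    input_box = tuple(v * stride for v in fm_box)
    print(input_box)                              # (64, 96, 288, 320)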

Error in evaluation using test.py

I trained TBMSL-NET on my custom dataset. During evaluation with test.py, I get the following error:

Testing
  0%|          | 0/15 [00:00<?, ?it/s]

Child process:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="mp_main")
  File "E:\Anaconda3\envs\tbmsl_net\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "E:\Anaconda3\envs\tbmsl_net\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "E:\Anaconda3\envs\tbmsl_net\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "F:\Codes\TBMSL-Net\test.py", line 58, in <module>
    for i, data in enumerate(tqdm(testloader)):
  File "E:\Anaconda3\envs\tbmsl_net\lib\site-packages\tqdm\std.py", line 1104, in __iter__
    for obj in iterable:
  File "E:\Anaconda3\envs\tbmsl_net\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "E:\Anaconda3\envs\tbmsl_net\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
    w.start()
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Main process:

Traceback (most recent call last):
  File "test.py", line 58, in <module>
    for i, data in enumerate(tqdm(testloader)):
  File "E:\Anaconda3\envs\tbmsl_net\lib\site-packages\tqdm\std.py", line 1104, in __iter__
    for obj in iterable:
  File "E:\Anaconda3\envs\tbmsl_net\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "E:\Anaconda3\envs\tbmsl_net\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
    w.start()
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\Anaconda3\envs\tbmsl_net\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
  0%|

I also tried the FGVC-Aircraft dataset but got the same error. What could cause this?
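
The error text itself describes the standard fix: on Windows, DataLoader workers are started with spawn, which re-imports the script, so everything that builds the loader must sit behind a main guard. A self-contained sketch of the idiom (the dummy dataset stands in for the real test set):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from tqdm import tqdm

    def main():
        # Stand-in for the real testloader built in test.py.
        dataset = TensorDataset(torch.randn(32, 3, 448, 448),
                                torch.zeros(32, dtype=torch.long))
        testloader = DataLoader(dataset, batch_size=4, num_workers=2)
        for i, data in enumerate(tqdm(testloader)):
            pass  # evaluation loop goes here

    if __name__ == '__main__':   # required on Windows, where workers are spawned
        main()

Alternatively, setting num_workers=0 avoids worker processes entirely, at the cost of slower data loading.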

Accuracy improvement

I read your paper and code early on and thought they were very well done, and I reproduced roughly the accuracy reported in the paper. Recently, however, I noticed that your baseline accuracy has improved again, while the code seems largely unchanged. Could you explain how you raised the accuracy on the Aircraft dataset from 94.50% to 94.70%, and on the Car dataset from 94.70% to 95.00%?

How do I train on my own dataset?

Hello, I am currently taking part in a medical-image classification competition, and I found that the data it provides suits fine-grained classification. After some research I would like to apply your model, but I could not find instructions for training on my own data. My dataset is organized by folder name:

dataset
|
|--class_1
|----1.jpg
|----...
|--class_2
|----1.jpg
|----...
|--class_3
|----1.jpg
|----...

How should I reorganize it so it can be fed into the model for training?
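
For reference, a folder-per-class tree like the one above can be read directly by torchvision's ImageFolder. This is a generic sketch; MMAL-Net's own dataset code presumably expects CUB/Aircraft-style annotation lists, so its loader would still need adapting:

    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.Resize((448, 448)),   # MMAL-Net's usual input size
        transforms.ToTensor(),
    ])
    dataset = datasets.ImageFolder('dataset', transform=transform)
    print(dataset.classes)               # ['class_1', 'class_2', 'class_3']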

Suggestion:

"conda create --name DCL file conda_list.txt" should be"conda create --name DCL --file conda_list.txt"

Questions about how the part images are generated

Dear author,
First of all, thank you very much for sharing this excellent work. While reading your source code I ran into a few questions.
I found the following in config.py:

# windows info for CUB
N_list = [2, 3, 2]
proposalN = sum(N_list)  # proposal window num
window_side = [128, 192, 256]
iou_threshs = [0.25, 0.25, 0.25]
ratios = [[4, 4], [3, 5], [5, 3],
          [6, 6], [5, 7], [7, 5],
          [8, 8], [6, 10], [10, 6], [7, 9], [9, 7], [7, 10], [10, 7]]

[Question 1] In the paper you say the part images are obtained with sliding windows, and from the code it looks like each part image is finally bilinearly interpolated to 224*224, is that right? What exactly do the parameters above mean? I don't understand the setting N_list = [2, 3, 2]: since the iou_threshs used later for non-maximum suppression are all 0.25, why split into three groups? (I may well be misreading this.) And are ratios the size ratios of the sliding windows? Then what is window_side?
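
For what it's worth, the numbers line up if each [h, w] pair in ratios is read as a window size in units of stride-32 feature-map cells (my reading of the config, not confirmed by the author):

    # Each [h, w] in ratios, multiplied by the stride 32, gives a window whose
    # sides sit near one of the window_side scales (128, 192, 256 pixels):
    ratios = [[4, 4], [3, 5], [5, 3],
              [6, 6], [5, 7], [7, 5],
              [8, 8], [6, 10], [10, 6], [7, 9], [9, 7], [7, 10], [10, 7]]
    for h, w in ratios:
        print(h * 32, w * 32)   # e.g. [4, 4] -> 128x128, [6, 6] -> 192x192

Under that reading, window_side would give the nominal pixel side of each of the three scales, and N_list = [2, 3, 2] how many windows survive NMS per scale, summing to proposalN = 7.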

[Question 2] I also can't quite follow the APPM source code, probably because I don't understand the parameters above; in particular, I can't work out what ratios means inside it. Could you please annotate the following code? Many thanks!!
class APPM(nn.Module):
    def __init__(self):
        super(APPM, self).__init__()
        self.avgpools = [nn.AvgPool2d(ratios[i], 1) for i in range(len(ratios))]

    def forward(self, proposalN, x, ratios, window_nums_sum, N_list, iou_threshs, DEVICE='cuda'):
        batch, channels, _, _ = x.size()
        avgs = [self.avgpools[i](x) for i in range(len(ratios))]

        # feature map sum
        fm_sum = [torch.sum(avgs[i], dim=1) for i in range(len(ratios))]

        all_scores = torch.cat([fm_sum[i].view(batch, -1, 1) for i in range(len(ratios))], dim=1)  # torch.cat concatenates the per-ratio score tensors
        windows_scores_np = all_scores.data.cpu().numpy()  # .cpu() moves the tensor from its device (e.g. CUDA) to the CPU without changing its type; .numpy() then converts the Tensor to an ndarray
        window_scores = torch.from_numpy(windows_scores_np).to(DEVICE).reshape(batch, -1)  # torch.from_numpy() converts the array to a tensor sharing its memory, so modifying one also changes the other

        # nms
        proposalN_indices = []
        for i, scores in enumerate(windows_scores_np):
            indices_results = []
            for j in range(len(window_nums_sum)-1):
                indices_results.append(nms(scores[sum(window_nums_sum[:j+1]):sum(window_nums_sum[:j+2])], proposalN=N_list[j], iou_threshs=iou_threshs[j],
                                           coordinates=coordinates_cat[sum(window_nums_sum[:j+1]):sum(window_nums_sum[:j+2])]) + sum(window_nums_sum[:j+1]))
            # indices_results.reverse()
            proposalN_indices.append(np.concatenate(indices_results, 1))   # reverse

        proposalN_indices = np.array(proposalN_indices).reshape(batch, proposalN)
        proposalN_indices = torch.from_numpy(proposalN_indices).to(DEVICE)
        proposalN_windows_scores = torch.cat(
            [torch.index_select(all_score, dim=0, index=proposalN_indices[i]) for i, all_score in enumerate(all_scores)], 0).reshape(
            batch, proposalN)

        return proposalN_indices, proposalN_windows_scores, window_scores


Thanks again for your outstanding work. I wish you all the best in your research and look forward to your reply!

Stanford Cars

Could you provide the checkpoint model for the Stanford Cars dataset?

A problem in the AOLM code

Hello, I ran into a problem while training. The source code is:

properties = measure.regionprops(component_labels)
areas = []
for prop in properties:
    areas.append(prop.area)
max_idx = areas.index(max(areas))

The error is:

ValueError: max() arg is an empty sequence

It says that areas inside max() is empty and cannot be read. Am I missing something?
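
The error means connected-component labelling found no foreground region at all, so areas stays empty. A defensive sketch of one possible workaround (not the repository's fix), reproducing the failure mode with an all-zero mask:

    import numpy as np
    from skimage import measure

    # component_labels is produced upstream by measure.label on the activation
    # mask; an all-background mask reproduces the empty-sequence failure.
    component_labels = measure.label(np.zeros((14, 14), dtype=int))
    properties = measure.regionprops(component_labels)
    if properties:
        areas = [prop.area for prop in properties]
        max_idx = areas.index(max(areas))   # index of the largest connected component
    else:
        max_idx = None                      # no component found: e.g. skip cropping
                                            # and fall back to the full image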

How important is pretraining?

A very nice paper! But if I train this network from scratch, will the cropping in the local and part modules still train successfully? That is, without pretraining, how large is the impact on the experimental results? Looking forward to your reply!

There is something wrong

Hi! First of all, congratulations on your work. I want to use your model as a pretrained model for my own work. On a dataset containing bird images, I would like to use it to filter out backgrounds or images without birds. When I run test.py on my dataset, the accuracy comes out very low (1%), and the following message appears: "there is one img no intersection". What is the main problem?

I also want to ask: in your paper you say that this model extracts the object location (bounding box) and discriminative parts, but when I investigate your code, it seems you have bounding-box information yet are not using it, directly or indirectly. I can't understand how your code works, nor how to apply your model to my custom dataset for the purpose described above.

Thanks and be healthy, Harun Alperen

Config for custom datasets

Hi, thank you for your great work.
I have a question about training on other datasets: how can I change window_side and the ratios for different kinds of data?

Thank you.

About ensembling and training details

Hi, thanks for your simple and efficient method. I have some questions about your network.
1. Your classification results are based on the output of the second branch. Have you ever tried ensembling the three branches? Does it improve the results?
2. In the part branch, the input size is 224*224 rather than 448*448. Is this a trick or not?
3. Is the localization IoU metric computed over all images or over the test set only?
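
A hypothetical sketch of the ensemble asked about in point 1: average the softmax probabilities of the three branches (dummy tensors with illustrative shapes; the branch names are placeholders, not the repo's identifiers):

    import torch
    import torch.nn.functional as F

    # Placeholders for raw-, object- and part-branch logits over 200 classes.
    raw_logits, object_logits, part_logits = (torch.randn(4, 200) for _ in range(3))
    probs = sum(F.softmax(l, dim=-1) for l in (raw_logits, object_logits, part_logits)) / 3
    pred = probs.argmax(dim=-1)    # ensemble prediction per image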

Does the method still work when an image has no main object?

The commonly used fine-grained datasets all have a main object; in CUB, for example, the subject of every image is a bird. First running AOLM to find the object and then APPM to find all the discriminative regions is effective in that setting, but some fine-grained data have no main object.

Some datasets inherently lack a main object, e.g. each image contains several scattered discriminative regions. Applying this method directly does not seem to work there, because AOLM only crops out a single main object. For the scenario I describe, how should the method be adapted?

Training speed

Hello, I am running your code. On CUB-200 with a V100 it takes 20 minutes per training epoch, and GPU utilization is very low. Is there a way to speed it up?
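
Low GPU utilization typically points at the input pipeline rather than the model. A generic PyTorch sketch of the usual levers (untested on this repo; the stand-in dataset replaces the real CUB trainset):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Stand-in for the CUB trainset built in train.py.
    trainset = TensorDataset(torch.randn(32, 3, 448, 448),
                             torch.zeros(32, dtype=torch.long))
    trainloader = DataLoader(
        trainset,
        batch_size=8,
        shuffle=True,
        num_workers=8,      # parallel decoding/augmentation; often the main lever
        pin_memory=True,    # faster host-to-GPU transfers
    )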

Visualization

Hello, I would like to know how the AOLM module is visualized.

Error when resuming training from a checkpoint

Thank you for your work!
During training I found that when resuming from the last checkpoint, an error occurs: RuntimeError: Error(s) in loading state_dict for MainNet: Missing key(s) in state_dict: ...
Could this be related to lines 18 to 21 of auto_load_resume.py?
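
A common way to diagnose this (a sketch, not the repository's fix): load non-strictly and print which keys differ. One frequent cause is the 'module.' prefix that nn.DataParallel adds to every key at save time; the checkpoint key below is an assumption, and the Linear layer stands in for MainNet:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                                        # stand-in for MainNet()
    checkpoint = torch.load('checkpoint.pth', map_location='cpu')   # illustrative path
    state = checkpoint.get('model_state_dict', checkpoint)          # key is an assumption
    # Strip a possible DataParallel prefix so names match a plain model.
    state = {(k[len('module.'):] if k.startswith('module.') else k): v
             for k, v in state.items()}
    result = model.load_state_dict(state, strict=False)
    print('missing keys:', result.missing_keys)
    print('unexpected keys:', result.unexpected_keys)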

About N_list and ratios in the code

Hi,

Thanks for your great work and code.

I am not familiar with the detection task. Could you please tell me what N_list and ratios stand for and how to choose them for my own dataset?

Something went wrong when changing the default config

I implemented your code and trained on the Aircraft dataset. Everything worked normally until I changed your CE loss function to an ArcFace loss and your SGD optimizer to Adam. The code still runs, but I get the log "there is one img no intersection" and the accuracy is very low (approximately 1%). What happened?

Here is my ArcFace loss:

class ArcFaceLoss(nn.Module):
    def __init__(self, s=30.0, m=0.50, is_cuda=True, base_loss = 'CrossEntropyLoss'):
        super(ArcFaceLoss, self).__init__()
        self.s = s
        self.m = m
        self.criterion = nn.CrossEntropyLoss()
        self.criterion = self.criterion.cuda()

    def forward(self, input, label):
        theta = torch.acos(torch.clamp(input, -1.0 + 1e-7, 1.0 - 1e-7))
        target_logits = torch.cos(theta + self.m) 
        one_hot = torch.zeros_like(input)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        output = input * (1 - one_hot) + target_logits * one_hot
        output = output * self.s
        return self.criterion(output, label)
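
One observation (not a confirmed diagnosis): an ArcFace-style loss assumes its input holds cosine similarities in [-1, 1], i.e. dot products of L2-normalized features and class weights. If the network's raw logits are fed in, the clamp inside the loss distorts them. A sketch of the normalization step that normally precedes this loss, with illustrative shapes:

    import torch
    import torch.nn.functional as F

    features = torch.randn(4, 512)    # backbone embeddings (illustrative shapes)
    weight = torch.randn(100, 512)    # one row per class (Aircraft has 100 classes)
    cosine = F.linear(F.normalize(features), F.normalize(weight))  # values in [-1, 1]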

When I start training, the following warnings appear:

UserWarning: Possibly corrupt EXIF data. Expecting to read 200 bytes but only got 0. Skipping tag 0
UserWarning: Possibly corrupt EXIF data. Expecting to read 143 bytes but only got 0. Skipping tag 0
UserWarning: Possibly corrupt EXIF data. Expecting to read 393216 bytes but only got 0. Skipping tag 0

Since this is the official dataset, it is unlikely that the images are corrupted, but I don't know where the problem lies. Could you clarify?

Inquiry

Hello, I would like to ask two questions:
1. When the CUB dataset is loaded, are the box labels left unused (since using them would count as strong supervision)?
2. The pretrained model loaded is a ResNet-50; does that initial (original) ResNet-50 need to be fine-tuned on CUB-200 first?

About the loss function

Hi, I would like to discuss the loss function with you :) Do you think that using a triplet loss or center loss instead of the CE loss could increase model performance in the FGVC setting?
Thanks~
