
dcl's Issues

TypeError: can't convert np.ndarray of type numpy.str_.

evaluating val ...
Traceback (most recent call last):
File "train.py", line 228, in
checkpoint=args.check_point)
File "/home/wmj/DCL/utils/train_model.py", line 139, in train
val_acc1, val_acc2, val_acc3 = eval_turn(Config, model, data_loader['val'], 'val', epoch, log_file)
File "/home/wmj/DCL/utils/eval_model.py", line 43, in eval_turn
labels = Variable(torch.from_numpy(np.array(data_val[1])).long().cuda())
TypeError: can't convert np.ndarray of type numpy.str_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

How can I solve this issue?
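
This error means data_val[1] contains string labels, and torch.from_numpy cannot convert a numpy.str_ array. A minimal sketch of one possible fix, assuming the labels are class-name strings that can be mapped to integer indices (the class_to_idx mapping below is hypothetical, not from the repo):

    import numpy as np
    import torch

    # Toy stand-in for data_val[1]: labels arriving as strings.
    str_labels = ['001.Black_footed_Albatross', '002.Laysan_Albatross']

    # Map class-name strings to integer indices before building the tensor.
    class_to_idx = {name: i for i, name in enumerate(sorted(set(str_labels)))}
    label_ids = np.array([class_to_idx[s] for s in str_labels], dtype=np.int64)
    labels = torch.from_numpy(label_ids).long()  # add .cuda() on GPU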

One layer of the saved model has a dimension mismatch

I trained on the CUB dataset and saved the model; after loading it, I found that one layer's dimensions differ.
saved_model:
......
model.7.2.bn3.running_var  torch.Size([2048])
model.7.2.bn3.num_batches_tracked  torch.Size([])
classifier.weight  torch.Size([200, 2048])
classifier_swap.weight  torch.Size([400, 2048])
Convmask.weight  torch.Size([1, 2048, 1, 1])
Convmask.bias  torch.Size([1])

origin_model:
......
model.7.2.bn3.running_var  torch.Size([2048])
model.7.2.bn3.num_batches_tracked  torch.Size([])
classifier.weight  torch.Size([200, 2048])
classifier_swap.weight  torch.Size([2, 2048])
Convmask.weight  torch.Size([1, 2048, 1, 1])
Convmask.bias  torch.Size([1])

What is going on?
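
For context: 400 = 2 × 200 CUB classes, the shape of the swap head built with --cls_mul, while [2, 2048] is the binary head built with --cls_2, so the checkpoint and the freshly constructed model appear to have been configured with different flags. If rebuilding the model with the matching flag is not an option, a generic shape-checked load avoids the hard failure (a sketch, not the repo's loading code):

    import torch

    def load_matching(model, ckpt_path):
        """Load only those checkpoint tensors whose shapes match the model."""
        checkpoint = torch.load(ckpt_path, map_location='cpu')
        model_dict = model.state_dict()
        filtered = {k: v for k, v in checkpoint.items()
                    if k in model_dict and v.shape == model_dict[k].shape}
        model_dict.update(filtered)  # mismatched heads keep their fresh init
        model.load_state_dict(model_dict)
        return model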

What are the training hyperparameters to get top-1 acc > 0.87?

What are the training hyperparameters needed to get top-1 acc > 0.87, e.g. batch size and learning-rate schedule? And how many epochs do you run?
I can only get 0.83 on CUB_200_2011. I split the dataset into 70% training and 30% validation.
Please do not close the issues so fast.
Thanks.

About the dataloader

In dataset_DCL.py, __getitem__() returns:
return img_unswap, img_swap, label, label_swap, swap_law1, swap_law2, self.paths[item]
but when I debug in train_model.py, the batch unpacks as:
inputs, labels, labels_swap, swap_law, img_names = data
Why are these different sets of items, and why is the batch of inputs twice batch_size?

Can you help me find out why?
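
One reading of this: __getitem__ returns both the unswapped and the swapped image for every sample, and the training collate step concatenates the two along the batch dimension, which is why the model sees 2 × batch_size inputs while the unpacked tuple has fewer fields. Below is a hypothetical collate function shaped like the unpacking in train_model.py suggests (not copied from the repo):

    import torch

    def collate_train(batch):
        imgs, labels, labels_swap, swap_law, img_names = [], [], [], [], []
        for unswap, swap, label, label_swap, law1, law2, name in batch:
            imgs += [unswap, swap]             # two images per sample
            labels += [label, label]
            labels_swap += [label, label_swap]
            swap_law += [law1, law2]
            img_names.append(name)
        return torch.stack(imgs), labels, labels_swap, swap_law, img_names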

No module named 'models'

Traceback (most recent call last): File "train_rel.py", line 16, in <module> from models.resnet_swap_2loss_add import resnet_swap_2loss_add ModuleNotFoundError: No module named 'models'

from resnet_swap_2loss_add import resnet_swap_2loss_add
AttributeError: 'resnet_swap_2loss_add' object has no attribute 'module'

Maybe you can just delete .module if you use only one GPU.
When I use model = nn.DataParallel(model), my GPU gets stuck and kill -9 pid does not respond.

A question about the labels

In the released code, the image labels in the CUB train.txt and val.txt files run from 1 to 200, but feeding those labels straight into the loss computation raises an error. Shouldn't the label values be shifted to 0-199?
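
For reference, the same off-by-one is diagnosed and patched in the "reduce failed to synchronize" issue further down; the fix quoted there shifts the labels to 0-based where the annotation is read in dataset_DCL.py:

    # Shift the 1-200 labels from train.txt/val.txt to the 0-199 range
    # expected by the cross-entropy loss (fix quoted from the issue below).
    self.labels = [int(x) - 1 for x in anno['label'].tolist()]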

What is the meaning of swap_law1?

Hi, I cannot understand the use of swap_law1, and why 24 is used rather than some other number:
swap_law1 = [(i-24)/49 for i in range(crop_num[0]*crop_num[1])]
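
A plausible reading (inferred from the line itself, not confirmed by the authors): with --swap_num 7 7 the image is cut into 7 × 7 = 49 patches, and 24 = (49 - 1) / 2 is the index of the central patch, so the expression assigns each patch its offset from the center, normalized by the patch count. A generalized sketch:

    # For an n x n grid there are n*n patches; the center patch sits at
    # index (n*n - 1) // 2, which is 24 when n = 7, and the divisor is 49.
    n = 7
    center = (n * n - 1) // 2
    swap_law1 = [(i - center) / (n * n) for i in range(n * n)]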

Using MobileNet as the backbone network

Hello,
I am a newcomer. With resnet50 as the feature-extraction network the model is too large, so I replaced resnet50 with MobileNetV2 and trained on Stanford Cars, but the accuracy is only 0.74. Have you tried lightweight networks? Are they suitable for this kind of task, and are there any optimization measures you would suggest?

RuntimeError: cudaEventSynchronize in future::wait: device-side assert triggered

I can run it only for a few steps.

ljy@scw4750:~/liang-codes/DCL-master$ python train.py --data CUB --epoch 360 --backbone resnet50                     --tb 16 --tnw 16 --vb 512 --vnw 16                     --lr 0.0008 --lr_step 60                     --cls_lr_ratio 10 --start_epoch 0                     --detail training_descibe --size 512                     --crop 448 --cls_mul --swap_num 7 7
Namespace(auto_resume=False, backbone='resnet50', base_lr=0.0008, check_point=5000, cls_2=False, cls_lr_ratio=10.0, cls_mul=True, crop_resolution=448, dataset='CUB', decay_step=60, discribe='training_descibe', epoch=360, resize_resolution=512, resume=None, save_point=5000, start_epoch=0, swap_num=[7, 7], train_batch=16, train_num_workers=16, val_batch=512, val_num_workers=16)
Choose model and train set
resnet50
train from imagenet pretrained models ...
Set cache dir
the num of new layers: 4
step:        1 / 375 loss=ce_loss+swap_loss+law_loss: 11.6468 = 5.2024 + 6.0823 + 0.3620 
step:        2 / 375 loss=ce_loss+swap_loss+law_loss: 11.6235 = 5.2029 + 5.9551 + 0.4655 
step:        3 / 375 loss=ce_loss+swap_loss+law_loss: 11.7090 = 5.1929 + 6.0129 + 0.5033 
step:        4 / 375 loss=ce_loss+swap_loss+law_loss: 11.6939 = 5.3655 + 5.9946 + 0.3338 
step:        5 / 375 loss=ce_loss+swap_loss+law_loss: 11.6287 = 5.3284 + 5.9917 + 0.3086 
step:        6 / 375 loss=ce_loss+swap_loss+law_loss: 12.3495 = 5.6930 + 6.1811 + 0.4753 
step:        7 / 375 loss=ce_loss+swap_loss+law_loss: 12.0914 = 5.4335 + 6.0904 + 0.5675 
step:        8 / 375 loss=ce_loss+swap_loss+law_loss: 12.1460 = 5.6956 + 6.1155 + 0.3348 
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
CUDA error after cudaEventDestroy in future dtor: device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    checkpoint=args.check_point)
  File "/home/ljy/liang-codes/DCL-master/utils/train_model.py", line 111, in train
    law_loss = add_loss(outputs[2], swap_law) * gamma_
  File "/home/ljy/anaconda3/envs/p36c8ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/anaconda3/envs/p36c8ljy/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 85, in forward
    reduce=self.reduce)
  File "/home/ljy/anaconda3/envs/p36c8ljy/lib/python3.6/site-packages/torch/nn/functional.py", line 1558, in l1_loss
    input, target, size_average, reduce)
  File "/home/ljy/anaconda3/envs/p36c8ljy/lib/python3.6/site-packages/torch/nn/functional.py", line 1537, in _pointwise_loss
    return lambd_optimized(input, target, size_average, reduce)
RuntimeError: cudaEventSynchronize in future::wait: device-side assert triggered
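
The `t >= 0 && t < n_classes` assertion printed just before the crash says some target label falls outside [0, n_classes); the same symptom is diagnosed in the "reduce failed to synchronize" issue below, where the annotation labels turned out to be 1-based. A minimal pre-flight check of the label range (the annotation-file path and column layout here are assumptions based on dataset_DCL.py's use of anno['label'], not verified):

    import pandas as pd

    # Hypothetical path and column names; adjust to the actual anno file.
    anno = pd.read_csv('datasets/CUB/train.txt', sep=r'\s+', header=None,
                       names=['path', 'label'])
    labels = anno['label'].astype(int)
    print('label range:', labels.min(), '-', labels.max())
    # For 200 classes the range must be 0-199; a 1-200 range triggers the
    # ClassNLLCriterion device-side assert on the GPU.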

Performance on the FGVC-Aircraft dataset

Hello, I retrained DCL on the Aircraft dataset. All hyperparameters except N followed the default settings. With N set to 7 and 2, I got top accuracies of 90.4% and 92.3% respectively, but the reported results are 92.2% and 93.0%.
Is there anything wrong? Could someone offer suggestions for reproducing the reported results?
Thanks!

Works well

Applied to my own data, the results indeed improve, and it works very well! The drawback is that the code quality is not great.

Excuse me, how do I test with test.py?

When I ran test.py, I encountered a "no attribute 'submit'" error, and 'unswap' is an unexpected keyword argument. Please tell me how to run test.py, thanks.

Inquiry: scaling to large numbers of classes

Hello, author. You write in the README that this algorithm has already been applied to JD.com product recognition. CUB has only 200 classes, while real product catalogs can run to tens of thousands, and your network has two fully connected layers, so the memory usage and parameter count of those layers would grow sharply. How do you solve this?
Results on small-scale data alone do not really reveal a network's true performance. Given large-scale data, how can this be deployed in practice?

test.py #75

got an unexpected keyword argument 'unswap'?

CUDA Error while running the code

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=110 error=59 : device-side assert triggered
Traceback (most recent call last):
File "mytrain2.py", line 146, in
save_dir=save_dir)
File "/media/HDD_3TB2/rupali/Code/DCL/utils/train_util_DCL.py", line 60, in train
loss = criterion(outputs[0], labels)
File "/home/rupali/anaconda3/envs/EnvPytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/rupali/anaconda3/envs/EnvPytorch/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 942, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/rupali/anaconda3/envs/EnvPytorch/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/rupali/anaconda3/envs/EnvPytorch/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:110

RuntimeError: reduce failed to synchronize: device-side assert triggered

I cloned the code, downloaded the Aircraft dataset, and used datasets/AIR/train.txt as the annotation. But when I run the code, it raises RuntimeError: reduce failed to synchronize: device-side assert triggered.

/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [20,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [21,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    checkpoint=args.check_point)
  File "/home/workspace/git/python/DCL/utils/train_model.py", line 111, in train
    law_loss = add_loss(outputs[2], swap_law) * gamma_
  File "/home/miniconda3/envs/cuda92/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/miniconda3/envs/cuda92/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 87, in forward
    return F.l1_loss(input, target, reduction=self.reduction)
  File "/home/miniconda3/envs/cuda92/lib/python3.6/site-packages/torch/nn/functional.py", line 1702, in l1_loss
    input, target, reduction)
  File "/home/miniconda3/envs/cuda92/lib/python3.6/site-packages/torch/nn/functional.py", line 1674, in _pointwise_loss
    return lambd_optimized(input, target, reduction)
RuntimeError: reduce failed to synchronize: device-side assert triggered

After checking datasets/AIR/train.txt, I found that the class ids lie in [1, 100], but they should be in [0, 99], so the assertion `t >= 0 && t < n_classes` fails.

Solution: subtract 1 from each label, in dataset_DCL.py#L41:

from: self.labels = anno['label'].tolist()
to:   self.labels = [int(x)-1 for x in anno['label'].tolist()]

ct_train/val/test file

Hi,

Can you share the ct_train/val/test.txt files for CUB, Stanford Cars, and Aircraft with us?

Thank you.

Kind regards.

Kiki

Is it possible to get the ground-truth labels of the TEST set of the FGVC product dataset?

Dear authors, thanks for sharing your amazing work. However, I have run into a small problem.

I wonder how to evaluate recognition performance on the FGVC product dataset, since its test labels are not provided at all. I also noticed that the FGVC competition is now closed, so I cannot make a submission to the Kaggle server https://www.kaggle.com/c/imaterialist-product-2019/submit. Is it currently possible to get the ground-truth labels of the TEST set of the FGVC product dataset?

I am looking forward to your kind help. Thanks!

RuntimeError: Error(s) in loading state_dict for MainModel

Dear authors:
I use a single GPU (TITAN Xp, 12 GB) to train and test on the STCAR, CUB, and AIR datasets. No errors appear during training, but at test time I hit the same problem on all three datasets: the weights cannot be loaded. Details are shown below.
/usr/bin/python3.6 /media/duanzd/local/DCL-master/test.py --ver test --acc_report --data STCAR --backbone resnet50 --save /media/duanzd/local/DCL_weights/net_model/DCL_STCAR/weights_358_508_1.0000_1.0000.pth
Namespace(acc_report=True, backbone='resnet50', batch_size=16, crop_resolution=448, dataset='STCAR', num_workers=16, resize_resolution=512, resume='/media/duanzd/local/DCL_weights/net_model/DCL_STCAR/weights_358_508_1.0000_1.0000.pth', save_suffix=None, swap_num=[7, 7], version='test')
resnet50
Traceback (most recent call last):
File "/media/duanzd/local/DCL-master/test.py", line 93, in
model.load_state_dict(model_dict)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MainModel:
While copying the parameter named "classifier_swap.weight", whose dimensions in the model are torch.Size([2, 2048]) and whose dimensions in the checkpoint are torch.Size([392, 2048]).
I wonder whether some settings were wrong when I trained the weights. Thank you for the help.
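
For what it's worth, 392 = 2 × 196 Stanford Cars classes, which suggests the checkpoint was trained with the 2 × num_classes swap head (--cls_mul / cls_2xmul) while test.py rebuilt the model with the two-way head (--cls_2). A hedged sketch of the idea; the attribute names below are assumptions, not verified against the repo:

    # Hypothetical: rebuild the test-time model with the same swap-head
    # setting used in training, so classifier_swap is created as
    # [2 * num_classes, 2048] rather than [2, 2048].
    Config.cls_2 = False      # assumed attribute name
    Config.cls_2xmul = True   # assumed attribute name
    model = MainModel(Config)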

'MainModel' object has no attribute 'module'

Traceback (most recent call last):
File "train.py", line 194, in
ignored_params1 = list(map(id, model.module.classifier.parameters()))
File "/home/weiyanyan/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 518, in __getattr__
type(self).__name__, name))
AttributeError: 'MainModel' object has no attribute 'module'

Some inconsistencies between paper and code

Dear @wyvernbai,
I may have found some inconsistencies between the paper and the code, and I am looking forward to your further explanation.
Firstly, you mention θ_adv ∈ R^(d×2) in the paper, which means the discriminator is a two-way classifier. However, the code makes it a 2 * num_class-way classifier.
Secondly, an adversarial loss is usually optimized via a minimax game, but I did not find any minimax optimization in your code. Is that intended?
Lastly, the paper says "the outputs are handled by an ReLU and an average pooling to get a map with the size of 2×N×N" to produce the location map, but Convmask(2048, 1, 1) + avgpool in the code does not seem to have that effect?

Fix the random seed to get identical results

Hi, thanks for sharing your work. Can you please fix the random seed so that we can get consistent results? That would help further research. (I failed to fix it myself, because some sources of randomness are beyond my knowledge.)

Thanks again.
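
The repo does not guarantee determinism, but a common starting point is to seed every random source and force deterministic cuDNN kernels; note that DataLoader workers and some CUDA ops can still introduce nondeterminism. A generic PyTorch sketch (not code from this repo):

    import random
    import numpy as np
    import torch

    def set_seed(seed=0):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade speed for reproducibility in cuDNN.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False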

Training from a different backbone

I want to train on the Stanford Dogs dataset with DCL from a pretrained model. When I start my training job like this:

python train.py --data product --epoch 360 --backbone resnet50 \
                    --tb 16 --tnw 16 --vb 512 --vnw 16 \
                    --lr 0.0008 --lr_step 60 \
                    --cls_lr_ratio 10 --start_epoch 1 \
                    --detail resnet50_zqs --size 512 \
                    --crop 448 --cls_mul --swap_num 7 7

It works just fine; I get 84% accuracy. So I tried to train with a larger backbone like this:

python train.py --data product --epoch 60 --backbone senet154 \
                    --tb 16 --tnw 32 --vb 512 --vnw 32 \
                    --lr 0.01 --lr_step 12 \
                    --cls_lr_ratio 10 --start_epoch 1 \
                    --detail senet154_zqs --size 512 \
                    --crop 448 --cls_2 --swap_num 7 7

However, it ended with 29% accuracy. I don't know whether something is wrong with my hyperparameters; can anyone help?

I have already downloaded the pretrained model, put it in the correct path, and added it in config.py like this:

pretrained_model = {'resnet50': './models/pretrained/resnet50-19c8e357.pth',
                    'senet154': './models/pretrained/senet154-c7b49a05.pth'}

Inquiry: baseline training accuracy

Hello, author.
I would like to ask about the baseline. I trained resnet50 on CUB with the same learning-rate settings as your code, for 300 epochs in total, dividing the learning rate by 10 every 90 epochs. However, the test accuracy rises very slowly early in training and only reaches 52% in the end.
Do you have any tips for training the baseline?
Thank you.

More than one object per image, and feature vectors for image retrieval

Nice work!
I wonder about two things:
1st:
DCL focuses on discriminative detail regions, then extracts features, then classifies. If there is more than one object (e.g. a dog and a cat) in an image, have you tested the impact on DCL?
2nd:
Have you extended DCL's feature vector to image retrieval?
Since the classifier layer is 2048->num_classes and there are many SKUs in e-commerce, I wonder how to bridge the gap between the two.
If there are 1 billion SKUs, using only a small, finite subset of them may not be good enough, e.g. 2048->1000 SKUs or 20480->10000 SKUs.

test.py: size of tensor doesn't match

Hi,
I successfully ran the training code, but when it came to testing I didn't know how to set the parser arguments.
I ran train.py --cls_mul with resnet50.

In test.py, I use the val set for testing: I commented out lines 63-66 (so ct_test.txt is not used), commented out line 73 (undefined) and line 94 (only one device is used), and pointed the resume path on line 89 to my model.

But then line 111 raises:
RuntimeError: the size of tensor a (**) must match the size of tensor b (2) at non-singleton dimension 1

The size of tensor a (**) equals the class number I set in config.py, but output[1] does not seem to match that size.

I'm really interested in your project and would like to know what's wrong with my process.

Training hyperparameters

What is the meaning of the training hyperparameters '--cls_2' and '--cls_mul', and what is the difference between them?

nn.DataParallel problem

When using nn.DataParallel, I get OOM even with train_batch set to 1. After turning nn.DataParallel off, it runs smoothly with train batch 16, using 12 GB on a TITAN Xp. From the issues on the previous code, I gather that training DCL with train batch 16 used 88 GB across four P40s. I can't figure out why it uses so much memory; am I understanding this correctly?
My environment info is below:
OS: Ubuntu 16.04
PyTorch: 0.4.1
nvidia-driver: 384.90
CUDA: 9.0

How to understand the parameters cls_2 and cls_2xmul?

In the paper, cls_2 indicates whether the image is destructed or not, which matches the code in "LoadModel.py".
However, what is the meaning of cls_2xmul in "LoadModel.py"? Its output channel count is 2*num_class, and when cls_2xmul is used, the swap label becomes "label+numcls" in "dataset_DCL.py".

Hoping for an answer, thanks.
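
Piecing together only what this issue itself reports (a reading of the code, not an authoritative spec): with cls_2 the adversarial branch is a binary destructed-or-not classifier, while with cls_2xmul it has 2 × num_class outputs and the destructed copy of an image takes the shifted label. A schematic sketch:

    import torch.nn as nn

    num_cls, feat_dim = 200, 2048

    # cls_2: a binary head -- "is this image destructed or not?"
    classifier_swap_cls2 = nn.Linear(feat_dim, 2)

    # cls_2xmul: a joint head over (class, destructed) pairs,
    # hence 2 * num_cls outputs.
    classifier_swap_cls2xmul = nn.Linear(feat_dim, 2 * num_cls)

    # Per dataset_DCL.py, the swapped image's label is shifted:
    label = 5
    label_swap = label + num_cls   # 205: "class 5, destructed"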

Problem with the st_car dataset

It seems that the train.txt for STCAR provided with the source code is incorrect: filenames in train.txt such as 013178.jpg, 013191.jpg, and 013434.jpg cannot be mapped to the source images.

How can I continue training from a previously saved model?

The following parameter settings don't seem to work; the previously saved model is not loaded.

python train.py --data product --epoch 60 --backbone senet154 \
                    --tb 96 --tnw 32 --vb 512 --vnw 32 \
                    --lr 0.01 --lr_step 12 \
                    --cls_lr_ratio 10 --start_epoch $LAST_EPOCH \
                    --detail training_descibe4checkpoint --size 512 \
                    --crop 448 --cls_2 --swap_num 7 7
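
The argument dump printed in the cudaEventSynchronize issue above shows the parser carries resume and auto_resume fields (resume=None, auto_resume=False). Assuming the command-line flags match those field names, which I have not verified against train.py's parser, resuming might look like this (the checkpoint path is a placeholder):

    python train.py --data product --epoch 60 --backbone senet154 \
                        --tb 96 --tnw 32 --vb 512 --vnw 32 \
                        --lr 0.01 --lr_step 12 \
                        --cls_lr_ratio 10 --start_epoch $LAST_EPOCH \
                        --detail training_descibe4checkpoint --size 512 \
                        --crop 448 --cls_2 --swap_num 7 7 \
                        --resume ./net_model/your_checkpoint.pth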

FileNotFoundError when running python train.py as the Readme.md proposes

After organizing the CUB dataset as the Readme.md suggests, running python train.py to train from scratch reported: FileNotFoundError: [Errno 2] File b'./dataset/CUB_200_2011/anno/ct_train.txt' does not exist. How can I fix it? Thanks a lot.

CUDA memory error

I have two 1080 Ti GPUs, each with 11 GB of memory. When I train DCL on the Kaggle product dataset, I can only set the train batch to 4 and the val batch to 16, and even then GPU utilization exceeds 90%. Could you share your GPU setup and some suggestions?
