chenbinghui1 / dsl
CVPR2022 paper "Dense Learning based Semi-Supervised Object Detection"
License: Apache License 2.0
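For step 4, if I only have a single GPU, is it enough to add the config path to train? Is there anything else that needs to be configured? After simply adding the [r50_caffe_mslonger_tricks_0.Xdata.py] model and running 100 epochs, the mAP is below 1% (with both the 1% and the 10% data). Could you share the parameters for a single-GPU setup? Thank you very much.
And when will the code be released? Thanks~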
Hi, thank you for your great work!
In your paper, you mention that you applied an AF strategy to improve the quality of pseudo-labels, but I couldn't find the code for this part. Could you release it?
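I can't reach the external network. Could you put it on Baidu Netdisk?
After installing the 4 dependencies, can I run the code directly? Or is a command such as pip install -v needed?
When pseudo-labels are predicted during training, I see that the image being predicted is [self.image_list[next_iter]]. Does that mean each iteration only updates the pseudo-labels of one image? Or have I misunderstood? Looking forward to your answer, many thanks.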
http://www4.comp.polyu.edu.hk/~cslzhang/paper/DSL_cvpr22.pdf was linked from Google Scholar but unfortunately returns 404 now. Would be keen to read the paper in better rendering than Google cache :)
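Hello. Regarding the lscale part, my understanding is that the image is downsampled and an MSE loss is then computed between same-sized score maps of adjacent levels, shifted by one level. How is this score map obtained? By applying a sigmoid activation to the feature maps output by the FPN? Also, when computing the MSE loss, the channel counts of adjacent levels differ, presumably by a factor of 2. How is this handled?
Learning from the expert!
Hi, thanks for your great work. I'm wondering how to generate adathres.json?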
Thank you very much for open-sourcing your code! While reading it, there are a few places I don't quite understand, so I'd like to ask:
What is flatten_As_labels used for?
Hello, I have a problem when running inference on VOC data. When I run the script ./tools/inference_unlabeled_coco_data.sh, I don't get any inference results in the specified folder. Is there a script for running inference and generating pseudo-labels for VOC data? Thanks!
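Hello, I read your work. It proposes many new ideas and I learned a lot from it.
I'd like to ask: for semi-supervised training, does the supervised baseline need to be trained to full convergence, or should some headroom be left by stopping before it fully converges? The labeled data is also used in the semi-supervised stage, so stopping early might keep overfitting on that data from hurting overall performance.
I ask because you said the baseline generally reaches good results at around 55 epochs. I trained on VOC data for 60 epochs and reached 63.8 AP50, whereas the supervised AP50 given in your paper is 69.6. So would it be better to leave some headroom in the supervised stage?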
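Hi, binghui, thank you for sharing your work. I also work in the SSOD area, and I found some differences between your work and other open-source SSOD architectures. Since the problem I describe is very specific, I originally wrote it in Chinese to prevent misunderstanding (translated here):
1. I read your implementation carefully. For the 10% standard task, roughly what happens is that, before the semi-supervised stage starts, you use the model trained on 10% of the data to generate a set of pseudo-labels; then, during semi-supervised training, the number of iterations per epoch is determined by the number of images in this pseudo-label set. Is that right?
2. Your pseudo-labels are regenerated after each epoch through the pred_hook mechanism, and your AF module is also updated once per epoch. If I understand correctly, your EMA model is updated every iteration, but it is only used at the end of an epoch to generate pseudo-labels offline. This differs a little from our own implementation, in which the EMA model generates pseudo-labels every iteration and is also updated every iteration. In principle your approach seems more robust to me. Is there a reason for implementing it this way?
Thank you for your work; I look forward to your reply.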
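Hello, where can I find your code for updating the parameters of the Aggregated Teacher? I found parameter-update code in SemiEpochBasedRunner.EMA in mmdet/runner/hooks/semi_epoch_based_runner.py, but it looks the same as a plain EMA strategy. I'm not sure whether I'm looking in the wrong place. Sorry to trouble you.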
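For comparison while reading `SemiEpochBasedRunner.EMA`, a plain per-iteration EMA update of teacher weights from student weights can be sketched as below. This is the generic technique only, written over plain Python lists; it is not the repository's code and does not attempt to reproduce the paper's Aggregated Teacher. The function name `ema_update` and the `momentum` value are assumptions for illustration.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, List[float]],
               student: Dict[str, List[float]],
               momentum: float = 0.99) -> None:
    """In-place EMA: teacher = momentum * teacher + (1 - momentum) * student."""
    for name, s_values in student.items():
        t_values = teacher[name]
        for i, s in enumerate(s_values):
            t_values[i] = momentum * t_values[i] + (1.0 - momentum) * s


if __name__ == "__main__":
    # Toy parameters; a real model would hold tensors instead of lists.
    teacher = {"w": [0.0, 1.0]}
    student = {"w": [1.0, 1.0]}
    ema_update(teacher, student, momentum=0.9)
    print(teacher)
```

Anything the Aggregated Teacher does beyond this formula would have to show up as extra terms or per-layer handling in the repository's own update code.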
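The committed code appears to be that of a community open-source library and is quite cluttered. Could you point the way and say where the key implementation of each module is located? Many thanks!
Must the supervised model be trained in advance? Can I use DSL to train a model with both labeled and unlabeled samples from the beginning (using only a pretrained backbone)?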
Hi,
I trained the supervised model following the steps you give (steps 1-4). After training the supervised baseline model on the COCO dataset,
I ran semi_dest.sh with the corresponding file paths to measure the performance of the supervised model, and the result is
12% (I used 10% as the partially labelled data), but Table 1 of your paper reports "23.70 ± 0.22". How can I solve this issue?
Also, I am training the model on 1 GPU. This is the only difference.
I look forward to your response and guidance. Thanks.
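I first trained a baseline on my own dataset, reaching an mAP of about 30%, and then set those weights as pretrained. In the DSL Training stage, however, the accuracy does not start from around 30% but from zero, and it rises very slowly: after 16 epochs the mAP is only about 5%. What went wrong?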
Hi, thanks for your wonderful work! Where can I find the code about Adaptive Filtering Strategy? Thanks!
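A question about the unlabel_train.sh script: I want to reproduce the project, but this file is not in the DEMO directory. Which directory is it in? Could you help with this?
Hello, how can I use DSL to run inference on images I prepared myself?
I found tools/semi_dist_test.sh in the README, but after looking at it, it does not seem to meet my needs.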
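Hello everyone. After a month of struggling, I finally got the environment set up on Ubuntu 18.04.5 under WSL.
But when I run the command from the README, it runs for a while and then fails with the following error: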
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Traceback (most recent call last):
  File "./tools/train.py", line 202, in <module>
    main()
  File "./tools/train.py", line 120, in main
    init_dist(args.launcher, **cfg.dist_params)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/dist_utils.py", line 18, in init_dist
    _init_dist_pytorch(backend, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/dist_utils.py", line 35, in _init_dist_pytorch
    dist.init_process_group(backend=backend, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 500, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/rendezvous.py", line 190, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
Killing subprocess 1941
Killing subprocess 1942
Killing subprocess 1943
Killing subprocess 1944
Killing subprocess 1945
Killing subprocess 1946
Killing subprocess 1947
Killing subprocess 1948
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', './tools/train.py', '--local_rank=7', 'configs/fcos_semi/r50_caffe_mslonger_tricks_0.Xdata.py', '--launcher', 'pytorch', '--work-dir', 'workdir_coco/r50_caffe_mslonger_tricks_0.1data']' returned non-zero exit status 1.
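The main error is RuntimeError: Address already in use.
I then tried running the command it reported as failing, /usr/bin/python -u ./tools/train.py --local_rank=7 configs/fcos_semi/r50_caffe_mslonger_tricks_0.Xdata.py --launcher pytorch --work-dir workdir_coco/r50_caffe_mslonger_tricks_0.1data
and found that this command does run on its own. I searched online, and the error seems to occur because PyTorch distributed uses the same port when several jobs run on a single machine. So I changed every master_port parameter in the DSL project, as follows
But it still fails with RuntimeError: Address already in use
...this is such torture (crying)
Please help me out. I have already given the repo a star.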
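The error above means the TCP port used for the distributed rendezvous was already bound when the workers started. A minimal sketch, using only the Python standard library, for checking a port and asking the OS for an unused one is shown here. The helper names `find_free_port` and `port_is_free` are hypothetical and are not part of DSL, mmcv, or PyTorch; how the chosen port is passed on (for example via the `--master_port` option of `torch.distributed.launch`, or whatever variable the repository's shell scripts use) is an assumption to verify against your own launch script.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can currently be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    port = find_free_port()
    print(f"free port: {port}")
```

If `port_is_free` returns False for the port you configured, something else on the machine is holding it; a leftover process from an earlier, interrupted run is one possibility worth ruling out.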
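Hello, yesterday I tried running inference with the trained epochxx.pth and epochxx.pth_ema and found that the two models give exactly the same accuracy. Is that reasonable? In theory, shouldn't the EMA model be considerably more accurate? Or has something gone wrong somewhere?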
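Hello, thank you for your very meaningful work. I have a question about the baseline. In the 10% labeled data setting, the 90% unlabeled data has to be paired with the 10% labeled data. Within one epoch, if all of the unlabeled data is used, the labeled data has to be reused 9 times to make up that epoch. For the baseline results in the paper, is the labeled data likewise used 9 times per epoch, or only once? (Assuming the semi-supervised run and the baseline run for the same number of iterations.)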
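The ratio in the question can be written out as a short calculation. This sketch assumes a 1:1 pairing of labeled and unlabeled images per iteration, which is the questioner's premise and not something confirmed by the paper; `labeled_repeats_per_epoch` and the dataset size are hypothetical.

```python
import math


def labeled_repeats_per_epoch(num_labeled: int, num_unlabeled: int) -> int:
    """Times the labeled set is cycled so every unlabeled image gets a partner.

    Assumes one labeled image is paired with one unlabeled image.
    """
    if num_labeled <= 0:
        raise ValueError("num_labeled must be positive")
    return math.ceil(num_unlabeled / num_labeled)


if __name__ == "__main__":
    total = 100_000                  # hypothetical dataset size
    num_labeled = total // 10        # 10% labeled split
    num_unlabeled = total - num_labeled
    print(labeled_repeats_per_epoch(num_labeled, num_unlabeled))
```

With a 10%/90% split this gives 9, matching the figure in the question; whether the paper's supervised baseline sees the labeled set that many times per epoch is exactly what the question asks the author to confirm.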
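Thanks for your work. DSL also performs very well on my own dataset!
During my experiments I noticed something: a model trained with 50% of the annotations already matches the metrics of a model trained with 100% of the annotations. How should this be explained?
My previous understanding was that semi-supervised learning could never exceed fully supervised performance; otherwise, what is the point of the extra annotated boxes that full annotation has over partial annotation? A model carrying partly wrong labels can match the result of full annotation, which I cannot make sense of. I hope you can clear this up. Thanks!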
Hi~,
Thank you for your great work! I have a question about your code. In your paper you mention that you applied a MetaNet to improve the quality of pseudo-labels, but I couldn't find the code for this part. I noticed that ReadMe.md says you removed it. Could you release this part of the code? I would like to check its effectiveness.
batch_mlvl_bboxes /= batch_mlvl_bboxes.new_tensor(
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5142/5142, 19.1 task/s, elapsed: 269s, ETA: 0s
2022-11-24 10:22:08,173 - mmdet - INFO - [INFO] Unlabel pred Done!
Traceback (most recent call last):
  File "tools/train.py", line 202, in <module>
    main()
  File "tools/train.py", line 190, in main
    train_detector(
  File "/home/hello/PycharmProjects/pythonProject/new_DSL/DSL/mmdet/apis/train.py", line 218, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/hello/PycharmProjects/pythonProject/new_DSL/DSL/mmdet/runner/hooks/semi_epoch_based_runner.py", line 345, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/hello/PycharmProjects/pythonProject/new_DSL/DSL/mmdet/runner/hooks/semi_epoch_based_runner.py", line 267, in train
    self.call_hook('after_train_iter')
  File "/home/hello/anaconda3/envs/Torch-DSL/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/hello/PycharmProjects/pythonProject/new_DSL/DSL/mmdet/runner/hooks/unlabel_pred_hook.py", line 460, in after_train_iter
    self.after_train_iter_func(runner)
  File "/home/hello/PycharmProjects/pythonProject/new_DSL/DSL/mmdet/runner/hooks/unlabel_pred_hook.py", line 517, in after_train_iter_func
    assert len(runner.imagefiles) == 2
AssertionError
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 16269) of binary: /home/hello/anaconda3/envs/Torch-DSL/bin/python
When I set preload=1 and start_point=2 for debugging, to reduce training time, another error occurs:
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.13s)
creating index...
index created!
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.20s)
creating index...
index created!
[ERROR][ModelInfer] Found no image in /home/hello/PycharmProjects/pythonProject/new_DSL/DSL/mydata/semicoco/images/full
But this folder does contain images, and I don't know what is going wrong.
Can anybody help me? Thanks a lot.
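One way to narrow this down is to list what a simple extension filter actually sees in that folder. The sketch below is a standalone diagnostic using only the standard library; it is not the repository's `ModelInfer` logic, and the extension set in `IMAGE_EXTS` and the helper name `list_images` are assumptions.

```python
import os
from typing import List

# Assumed set of image extensions; adjust to whatever your data uses.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def list_images(root: str, recursive: bool = False) -> List[str]:
    """Return sorted paths of image files under `root` (case-insensitive match)."""
    found: List[str] = []
    if recursive:
        for dirpath, _dirs, files in os.walk(root):
            found += [os.path.join(dirpath, name) for name in files
                      if name.lower().endswith(IMAGE_EXTS)]
    else:
        for name in os.listdir(root):
            path = os.path.join(root, name)
            if os.path.isfile(path) and name.lower().endswith(IMAGE_EXTS):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Replace "." with the folder named in the error message.
    print(len(list_images(".")), "images directly in folder")
    print(len(list_images(".", recursive=True)), "images including subfolders")
```

If the non-recursive call returns nothing but the recursive one does, the images sit in a subfolder; if both are empty, the path itself or the file extensions are worth checking.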
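Hello, while reproducing the work I found that the file demo/model_train/unlabel.sh does not exist in the code, and I also could not find code for some of the methods described in the paper. Have you released all of the core code? When I train with demo/model_train/unlabel_dynamic.sh, the code seems to have a bug.
Training on my own dataset fails with: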
Traceback (most recent call last):
  File "./tools/train.py", line 202, in <module>
    main()
  File "./tools/train.py", line 190, in main
    train_detector(
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/apis/train.py", line 218, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/runner/hooks/semi_epoch_based_runner.py", line 344, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/runner/hooks/semi_epoch_based_runner.py", line 265, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/runner/hooks/semi_epoch_based_runner.py", line 155, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/usr/local/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/models/detectors/single_stage.py", line 82, in forward_train
    losses = self.bbox_head.forward_train(x, img_metas, gt_bboxes,
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/usr/local/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 185, in new_func
    return old_func(*args, **kwargs)
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/models/dense_heads/fcos_head.py", line 309, in loss
    loss_cls = self.loss_cls(
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/models/losses/focal_loss.py", line 170, in forward
    loss_cls = self.loss_weight * calculate_loss_func(
  File "/secret/ZLW/Codes/SSOD/DSL/mmdet/models/losses/focal_loss.py", line 85, in sigmoid_focal_loss
    loss = _sigmoid_focal_loss(pred.contiguous(), target, gamma, alpha, None,
  File "/usr/local/lib/python3.8/site-packages/mmcv/ops/focal_loss.py", line 39, in forward
    assert input.size(0) == target.size(0)
AssertionError
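With the batch size set to 8 and the input resolution set to 512x512, I debugged the run and found that, starting at line 186 of semi_epoch_based_runner.py,
one element is appended to each of data_batch['img_metas'], data_batch['gt_bboxes'], and data_batch['gt_labels'], while data_batch['img'] is concatenated with an image tensor of batch-1 images. As a result, the network's input tensor has shape (15,3,512,512), whereas the label information covers only 9 images, which triggers the AssertionError when the loss is computed.
Have I misunderstood the code here, or is this really a bug?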
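A small guard like the following can make this kind of mismatch fail early with a readable message instead of an AssertionError deep inside the focal loss. It is a sketch under the assumption that `data_batch` holds one list entry per image under the keys quoted above; `check_batch_consistency` is a hypothetical helper, not part of DSL or mmdet.

```python
from typing import Any, Dict, Sequence


def check_batch_consistency(num_images: int,
                            batch: Dict[str, Sequence[Any]]) -> None:
    """Raise ValueError if a per-image list differs in length from the image count."""
    for key in ("img_metas", "gt_bboxes", "gt_labels"):
        if key in batch and len(batch[key]) != num_images:
            raise ValueError(
                f"{key} has {len(batch[key])} entries but the image tensor "
                f"holds {num_images} images")


if __name__ == "__main__":
    # Sizes reported in the issue: 15 images versus 9 label entries.
    batch = {"img_metas": [{}] * 9, "gt_bboxes": [[]] * 9, "gt_labels": [[]] * 9}
    try:
        check_batch_consistency(15, batch)
    except ValueError as err:
        print(err)
```

In the runner, `num_images` would be read from the first dimension of the image tensor just before the forward pass.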