alibabaresearch / efficientteacher
A Supervised and Semi-Supervised Object Detection Library for YOLO Series
License: GNU General Public License v3.0
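Hello, and thanks to the authors for this work. Combining a semi-supervised algorithm with the widely used YOLOv5 gives a large improvement. After reading the paper (I have not had time to run the code yet), I have a few questions:
In Table 1, the proposed Dense Detector has a parameter count comparable to YOLOv5 and YOLOv7 and an even higher computational cost, yet its mAP is lower. Why not apply the method directly to those stronger YOLO detectors?
Current semi-supervised methods for one-stage, anchor-free detectors are still mostly built on relatively early detectors such as FCOS, and papers generally do not report results on more recent SOTA models such as YOLOX. Have you tried applying Efficient Teacher to YOLOX-style detectors? PLA and EA both look suitable for anchor-free models.
Using unlabeled data for domain-adaptive learning during the burn-in stage is a very elegant idea, but the ablation study does not seem to isolate the effect of DA. How much improvement does this step bring? In Figure 5 the ordinary burn-in also performs quite well; does "ordinary burn-in" there mean no DA and no adaptive threshold?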
@BowieHsu
Hello, the supervised epochs train normally, but as soon as training reaches the unsupervised epochs it fails with this error:
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
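I am training on my own dataset using the direct training mode. mAP@0.5 rises steadily during the first 220 supervised epochs. After the switch to semi-supervised training it drops sharply and then starts rising again, but after 100 more epochs it is still some distance below the value reached in the first 220 epochs. What causes this, and is there a recommended fix? Thanks!
During training the val loss curves are all 0 while the other curves look normal. What could cause this?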
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    main(opt)
  File "train.py", line 76, in main
    trainer.train(callbacks, val)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/trainer.py", line 532, in train
    self.train_in_epoch(callbacks)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 285, in train_in_epoch
    self.train_without_unlabeled(callbacks)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 402, in train_without_unlabeled
    self.update_optimizer(loss, ni)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 445, in update_optimizer
    self.ema.update(self.model)
  File "/share/disk1/ml/code/efficientteacher-main/utils/torch_utils.py", line 331, in update
    self.updates += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
Killing subprocess 60073
Traceback (most recent call last):
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/effictteacher/bin/python', '-u', 'train.py', '--local_rank=0', '--cfg', 'configs/ssod/custom/yolov5l_custom_ssod.yaml']' returned non-zero exit status 1.
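The TypeError above says that `self.updates` was `None` when the EMA `update` method ran, so the counter was probably initialised from a value that was never set (for example a field missing from a loaded checkpoint; that cause is a guess, not something the thread confirms). A minimal sketch of a defensive counter follows. `ScalarEMA` is a hypothetical toy class over plain floats, not the repository's EMA implementation, and the decay ramp is only modelled on the common YOLOv5-style formula.

```python
import math


class ScalarEMA:
    """Toy EMA over a dict of floats; stands in for an EMA over model weights."""

    def __init__(self, params, decay=0.9999, updates=0):
        self.shadow = dict(params)
        # Guard: a checkpoint or config may hand over None instead of an int.
        self.updates = 0 if updates is None else int(updates)
        # Ramp the decay up from 0 so early updates track the model closely.
        self.decay = lambda n: decay * (1 - math.exp(-n / 2000))

    def update(self, params):
        self.updates += 1
        d = self.decay(self.updates)
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1 - d) * value


# Without the guard in __init__, this call pattern raises the TypeError above.
ema = ScalarEMA({"w": 1.0}, updates=None)
ema.update({"w": 2.0})
print(ema.updates, ema.shadow["w"])
```

If the real counter comes from a checkpoint, normalising it at load time in the same way is one option; finding out why it is `None` in the first place is the better long-term fix.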
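Hello, regarding hyperparameter selection for semi-supervised training of small models (for example yolov5s): is your team currently planning such training runs? Do you have any suggestions for choosing the parameters?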
I get the following error when running bash data/get_coco.sh:
[val2017.zip]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of val2017.zip or
val2017.zip.zip, and cannot find val2017.zip.ZIP, period.
[coco2017labels.zip]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of coco2017labels.zip or
coco2017labels.zip.zip, and cannot find coco2017labels.zip.ZIP, period.
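The `unzip` message above usually means the downloaded file is not a complete zip archive, for instance because the download was interrupted or the server returned an error page. A small sketch for checking a download before unzipping it is shown here. `describe_download` is a hypothetical helper, and the file names in the demo are made up.

```python
import tempfile
import zipfile
from pathlib import Path


def describe_download(path):
    """Return a short diagnosis of a downloaded file that should be a zip."""
    p = Path(path)
    if not p.exists():
        return "missing"
    if zipfile.is_zipfile(str(p)):
        return "valid zip"
    # Peek at the first bytes: an HTML error page is a common culprit.
    head = p.read_bytes()[:64].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML page saved under a .zip name"
    return "not a zip (truncated or wrong content)"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.zip"
    with zipfile.ZipFile(str(good), "w") as zf:
        zf.writestr("a.txt", "hello")
    bad = Path(tmp) / "bad.zip"
    bad.write_text("<!DOCTYPE html><html>Not Found</html>")
    results = (describe_download(good), describe_download(bad))
print(results)
```

Running the same check on val2017.zip and coco2017labels.zip would show whether the files need to be downloaded again.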
Let me introduce the update plan for March:
1. After verification, we will upload the COCO pre-trained models mentioned in the README. This should make it easier for everyone to debug and use our library, and to reach good semi-supervised training results faster.
2. Our semi-supervised scheme cannot yet adapt to arbitrary data distributions, so we want to collect the situations people run into in practice. Please feel free to open issues so we can improve it together.
# Uncertain samples with a very high obj score are sent out to correct the IoU loss
if t[7] >= 0.99:
    uncertain_obj_targets.append(np.concatenate((t[:6], t[7:8])))
# Uncertain samples with a very high cls score are sent out to correct the cls loss
if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[7:8])))
Here, should
if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[7:8])))
instead be
if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[8:9])))
?
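To make the question concrete, the sketch below shows which value each slice appends. It uses a plain Python list in place of the NumPy array (1-D slicing selects the same positions), and the row contents are made up. The only assumption about layout is the one implied by the snippet's comments: index 7 holds the obj score and index 8 the cls score.

```python
# Hypothetical target row: six leading fields, one unused field,
# then the obj score at index 7 and the cls score at index 8.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.995, 0.999]

with_obj_score = t[:6] + t[7:8]  # what the current cls branch appends
with_cls_score = t[:6] + t[8:9]  # what the question proposes instead

print(with_obj_score[-1], with_cls_score[-1])
```

With this layout the current cls branch stores the obj score, not the cls score it tested. Whether that is intentional is for the maintainers to confirm.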
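I would like to understand how the dataset is set up for semi-supervised training.
1. Should the unlabeled samples come from new images, or should they be split off from the original training set?
2. In the paper, do the training set and the target both come from 10% of train2017?
Do I still need to go to the original yolov5 repository to run detection with an original YOLOv5 model?
Hello, what causes this error during model training: result type Float can't be cast to the desired output type long int
Hello, I get an error when running this command: python val.py --config configs/sup/custom/yolov5s_custom.yaml --weights weights/efficient-yolov5s.pt
Here is my yaml file definition: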
Dataset:
  data_name: 'custom'
  train: ['./images/train']
  val: ['./images/valid']
  test: ['./images/valid']
  nc: 1  # number of classes
  np: 0  # number of keypoints
  names: ['apple']
My dataset uses txt labels, and the directory is organised as follows:
|-- dataset
| |-- images
| | |-- train
| | |-- valid
| |-- labels
| | |-- train
| | |-- valid
The txt files are in the standard YOLOv5 format, as shown below:
0 0.927518 0.820644 0.123193 0.112306
0 0.525768 0.426199 0.050776 0.020583
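A quick way to rule out label problems is to parse each label file and check that every row has a class index followed by four normalised values. The sketch below does that for the sample shown above. `parse_yolo_labels` is a hypothetical helper, not part of the repository.

```python
def parse_yolo_labels(text):
    """Parse YOLO-format label text into (class_id, xc, yc, w, h) tuples."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != 5:
            raise ValueError(f"line {lineno}: expected 5 values, got {len(parts)}")
        cls = int(parts[0])
        box = tuple(float(v) for v in parts[1:])
        if not all(0.0 <= v <= 1.0 for v in box):
            raise ValueError(f"line {lineno}: coordinates must be normalised to [0, 1]")
        rows.append((cls, *box))
    return rows


labels = parse_yolo_labels(
    "0 0.927518 0.820644 0.123193 0.112306\n"
    "0 0.525768 0.426199 0.050776 0.020583\n"
)
print(len(labels), labels[0])
```

A file whose two boxes were written on a single line would be rejected by this check, which is one thing worth confirming in the real label files.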
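Are other YOLO versions supported? configs/ssod only contains YOLOv5 configurations. Is it possible to switch to YOLOv7 for semi-supervised learning?
I would like to use efficient-yolov5l-ssod.pt directly as a pre-trained model in the official yolov5 code, but it reports no model named "models.detector" because the model names do not match. How should I modify it?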
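Use the absolute path of the unlabel.txt generated above to replace target: data_custom_target.txt in yolov5_custom.yaml, then paste the following part of the configuration file into yolov5l_custom.yaml:
Hello, the text above is from the README. Where should the target: data_custom_target.txt field be added? I could not find it in yolov5l_custom.yaml.
Also, does my custom_train.txt need to include the contents of unlabel.txt, or can I keep it as it was for ordinary supervised training?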
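For reference, a list file such as unlabel.txt can be produced with a few lines of Python. This is only a sketch that writes one absolute image path per line. The one-path-per-line layout is an assumption about what the `target` field expects, `write_image_list` is a hypothetical helper, and the directory names in the demo are made up.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_image_list(image_dir, list_path):
    """Write absolute paths of all images under image_dir, one per line."""
    paths = sorted(
        p.resolve()
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)


# Demo on a throwaway directory with two images and one non-image file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    for name in ("a.jpg", "b.PNG", "notes.txt"):
        (root / "images" / name).write_bytes(b"")
    count = write_image_list(root / "images", root / "unlabel.txt")
    listed = (root / "unlabel.txt").read_text().splitlines()
print(count, [Path(p).name for p in listed])
```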
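A question for the author or anyone experienced: how do I use this code for semi-supervised training of YOLOv7? I am a beginner.
Oh, understood. Then it should not affect training. I will look into it further. Has this loss proved helpful in your experiments so far?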
Originally posted by @BowieHsu in #21 (comment)
I have datasets whose annotations are oriented at an angle. I understand the codebase is designed to handle horizontal bounding boxes (HBBs). Which parts of the code would need to change to make it work for oriented bounding boxes (OBBs)?
1. Semi-supervised training from scratch. An excerpt of the learning-rate log is below; after epoch 54 the learning rate stays at 0.01 (burn_epochs=220):
epoch | train/box_loss | train/obj_loss | train/cls_loss | metrics/precision | metrics/recall | metrics/mAP_0.5 | metrics/mAP_0.5:0.95 | val/box_loss | val/obj_loss | val/cls_loss | x/lr0 | x/lr1 | x/lr2
50 | 0.045489 | 0.0052034 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00917 | 0.00917 | 0.01747
51 | 0.044976 | 0.0052962 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00935 | 0.00935 | 0.01585
52 | 0.043255 | 0.0051407 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00953 | 0.00953 | 0.01423
53 | 0.044804 | 0.0053863 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00971 | 0.00971 | 0.01261
54 | 0.044136 | 0.0049189 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00989 | 0.00989 | 0.01099
55 | 0.042046 | 0.0052086 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
56 | 0.04328 | 0.004892 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
57 | 0.042468 | 0.0050412 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
58 | 0.043239 | 0.0051044 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
59 | 0.041122 | 0.0047628 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
60 | 0.041649 | 0.0048522 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
61 | 0.040692 | 0.0050476 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
62 | 0.038645 | 0.0047481 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
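2. Semi-supervised training that loads a model trained on the 10% labeled split and continues from epoch 210 (burn_epochs=220); the learning rate is 0.01 for all subsequent epochs: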
epoch | train/box_loss | train/obj_loss | train/cls_loss | metrics/precision | metrics/recall | metrics/mAP_0.5 | metrics/mAP_0.5:0.95 | val/box_loss | val/obj_loss | val/cls_loss | x/lr0 | x/lr1 | x/lr2
211 | 0.027092 | 0.002975 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
212 | 0.025645 | 0.003046 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
213 | 0.025966 | 0.003221 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
214 | 0.025256 | 0.003216 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
215 | 0.025347 | 0.003051 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
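I have two questions:
1. Why is lrf set to 0.1 in the yolov5l fully supervised training config but 1.0 in the semi-supervised config? Is there a reason for this?
2. The result of training approach 1 above is much worse than that of approach 2.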
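As background for the lrf question, the sketch below evaluates a YOLOv5-style linear schedule in which the learning rate goes from lr0 at the first epoch to lr0 * lrf at the last. That EfficientTeacher uses exactly this form is an assumption; `linear_lr_factor` and the chosen epochs are illustrative only.

```python
def linear_lr_factor(epoch, epochs, lrf):
    """Linear decay factor: 1.0 at epoch 0, lrf at the final epoch."""
    return (1 - epoch / (epochs - 1)) * (1.0 - lrf) + lrf


lr0 = 0.01
epochs = 300
schedules = {
    lrf: [lr0 * linear_lr_factor(e, epochs, lrf) for e in (0, 150, 299)]
    for lrf in (0.1, 1.0)
}
for lrf, values in schedules.items():
    print(lrf, values)
```

Under this form, lrf = 1.0 keeps the rate constant at lr0, which matches the flat 0.01 in both logs. The ramp before epoch 55 in the first log (lr0 and lr1 rising, lr2 falling toward 0.01) looks like a warmup phase, though the thread does not confirm this.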
Downloading https://github.com/BowieHsu/EfficientTeacher/releases/tag/data_list/data_list.zip ...
[data_list.zip]
End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
note: data_list.zip may be a plain executable, not an archive
unzip: cannot find zipfile directory in one of data_list.zip or data_list.zip.zip, and cannot find data_list.zip.ZIP, period.
Hello, this file does not seem to exist.
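Hello, I want to use yolov5s.pt from yolov5 v6.1 as the pre-trained model. I converted yolov5s.pt to e-yolov5s.pt with convert_pt_to_efficient.py, using configs/sup/public/yolov5s_coco.yaml as the conversion config. The conversion finished without errors, so I used e-yolov5s.pt to train a 20-class model directly in semi-supervised mode. Loading e-yolov5s.pt then fails with the following error:
RuntimeError: Error(s) in loading state_dict for Model: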
size mismatch for head.m.0.weight: copying a param with shape torch.Size([255, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 128, 1, 1]).
size mismatch for head.m.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
size mismatch for head.m.1.weight: copying a param with shape torch.Size([255, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 256, 1, 1]).
size mismatch for head.m.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
size mismatch for head.m.2.weight: copying a param with shape torch.Size([255, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 512, 1, 1]).
size mismatch for head.m.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
Traceback (most recent call last):
Does this mean that the official 80-class pre-trained model cannot be used directly for semi-supervised training?
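The shapes in the error are consistent with a detection head that outputs anchors * (classes + 5) channels per scale: 3 * (80 + 5) = 255 for the COCO checkpoint and 3 * (20 + 5) = 75 for the 20-class model. The sketch below reproduces that arithmetic and shows one common workaround, which is to load only the parameters whose shapes match and leave the head freshly initialised. It works on plain shape tuples, not tensors. `split_by_shape` and the backbone key are hypothetical, and the thread does not say whether the repository's loader already does something similar.

```python
def head_out_channels(num_classes, num_anchors=3):
    """Output channels of a YOLOv5-style detection conv: anchors * (classes + 4 box + 1 obj)."""
    return num_anchors * (num_classes + 5)


def split_by_shape(checkpoint_shapes, model_shapes):
    """Return (loadable, skipped) parameter names based on shape equality."""
    loadable, skipped = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped


checkpoint = {
    "backbone.conv.weight": (32, 3, 6, 6),  # hypothetical backbone entry
    "head.m.0.weight": (head_out_channels(80), 128, 1, 1),
    "head.m.0.bias": (head_out_channels(80),),
}
model = {
    "backbone.conv.weight": (32, 3, 6, 6),
    "head.m.0.weight": (head_out_channels(20), 128, 1, 1),
    "head.m.0.bias": (head_out_channels(20),),
}
loadable, skipped = split_by_shape(checkpoint, model)
print(loadable, skipped)
```

With real tensors the same filter would compare `tensor.shape` values before passing the filtered dictionary to a non-strict state-dict load.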
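Hello, are your YOLOv5 hyperparameters the same as those in the Ultralytics version?
Also, is burn_epochs: 220 the setting used in the paper (assuming 300 training epochs in total)?
After converting a standard YOLOv5 model to an ET model and finishing training, how do I convert it back to a YOLOv5 model?
Hello! After pulling the latest code I still hit this problem. How can it be solved? I load the efficient-yolov7s-simota.pt weights in my config file.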
@BowieHsu
Hi, when reproducing your results I need about 40 minutes per epoch. Is that normal? My setup is a 3090 Ti GPU with batch size 32 on 10% labeled COCO.
Thanks for sharing.
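Hello, if I want to change the positive/negative sample assignment strategy, which part of the code should I modify? I see that models/assigner already implements several strategies, but I do not know how to invoke them.
In readme_zhcn, configs/ssod/custom/yolov5l_custom_ssod.yaml is written as configs/custom/yolov5l_custom_ssod.yaml.
Hello, I ran into the following problem. Could you take a look?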
train: immutable=False, deprecated_keys=set(), renamed_keys={}, new_allowed=False
EfficientTeacher 2023-3-15 torch 1.7.1 CUDA:0 (NVIDIA GeForce RTX 2080, 7974.0625MB)
Weights & Biases: run 'pip install wandb' to automatically track and visualize EfficientTeacher runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir yolov5_ssod', view at http://localhost:6006/
Model summary: 478 layers, 47539674 parameters, 47539674 gradients
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 110 weight, 101 weight (no decay), 104 bias
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    main(opt)
  File "train.py", line 72, in main
    trainer = SSODTrainer(cfg, device, callbacks, LOCAL_RANK, RANK, WORLD_SIZE)
  File "/home/jinping/code/efficientteacher-main/trainer/ssod_trainer.py", line 60, in __init__
    self.build_dataloader(cfg, callbacks)
  File "/home/jinping/code/efficientteacher-main/trainer/ssod_trainer.py", line 225, in build_dataloader
    workers=cfg.Dataset.workers, prefix=colorstr('train: '), cfg=cfg)
  File "/home/jinping/code/efficientteacher-main/utils/datasets.py", line 334, in create_dataloader
    prefix=prefix)
  File "/home/jinping/code/efficientteacher-main/utils/datasets.py", line 704, in __init__
    if self.img_files[0].endswith('.txt') and len(self.img_files[0].split(' ')) == 2:
IndexError: list index out of range
(open-mmlab) jinping@jinping-Precision-5820-Tower:~/code/efficientteacher-main$
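The IndexError is raised when `self.img_files[0]` is read, which means the image list was empty at that point, so the configured train path most likely resolved to no images. A pre-flight check like the sketch below turns that into a clearer message. `collect_images` is a hypothetical helper, not the repository's loader, and it assumes the source is either a directory of images or a text file with one path per line.

```python
import tempfile
from pathlib import Path

IMG_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_images(source):
    """Collect image paths from a directory, or from a text file listing one path per line."""
    src = Path(source)
    if src.is_dir():
        files = sorted(str(p) for p in src.rglob("*") if p.suffix.lower() in IMG_SUFFIXES)
    elif src.is_file():
        files = [ln.strip() for ln in src.read_text().splitlines() if ln.strip()]
    else:
        raise FileNotFoundError(f"{source} does not exist")
    if not files:
        # Fail early with a readable message instead of an IndexError later on.
        raise ValueError(f"no images found via {source}; check the train/val paths in the config")
    return files


with tempfile.TemporaryDirectory() as tmp:
    try:
        collect_images(tmp)
        empty_result = "ok"
    except ValueError:
        empty_result = "ValueError"
    (Path(tmp) / "img.jpg").write_bytes(b"")
    found = collect_images(tmp)
print(empty_result, len(found))
```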
After training efficient_yolol.pt, I used convert_efficient_to_yolov5 to convert it to a standard yolov5l.pt, but running the official yolov5 detect.py produced the following bug:
Traceback (most recent call last):
  File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 307, in <module>
    main(opt)
  File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 302, in main
    run(**vars(opt))
  File "/usr/local/anaconda3/envs/***/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 82, in run
    model = torch.jit.load(w) if 'torchscript' in w else attempt_load(weights, map_location=device)
  File "/home/zzz/Desktop/workspace/yolov5/models/experimental.py", line 94, in attempt_load
    ckpt = torch.load(attempt_download(w), map_location=map_location) # load
  File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 875, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'DetectionModel' on <module 'models.yolo' from '/home/zzz/Desktop/workspace/yolov5/models/yolo.py'>
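This kind of AttributeError comes from how pickle stores objects: a checkpoint that contains a whole model object records the class by module path and name, and loading fails if the local code base has that module but not that class. A likely explanation here is that the checkpoint was written against a YOLOv5 version whose models/yolo.py defines DetectionModel while the local checkout predates it, though the thread does not confirm this. The sketch below reproduces the mechanism with the standard library only. `fake_models_yolo` and both classes are hypothetical stand-ins.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module such as models.yolo.
mod = types.ModuleType("fake_models_yolo")
sys.modules["fake_models_yolo"] = mod

# The "newer" code base defines DetectionModel and saves an instance of it.
DetectionModel = type("DetectionModel", (), {"__module__": "fake_models_yolo"})
mod.DetectionModel = DetectionModel
blob = pickle.dumps(DetectionModel())

# The "older" code base has the same module but only a class called Model.
del mod.DetectionModel
mod.Model = type("Model", (), {"__module__": "fake_models_yolo"})

try:
    pickle.loads(blob)
    load_error = None
except AttributeError as exc:
    load_error = exc  # same failure mode as in the traceback above

# Workaround sketch: expose the expected name as an alias before loading.
mod.DetectionModel = mod.Model
restored = pickle.loads(blob)
print(type(load_error).__name__, type(restored).__name__)
```

In practice, running detect.py from a YOLOv5 release that matches the one used by the conversion script is usually the cleaner fix; aliasing is only safe when the two classes are structurally compatible.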