alibabaresearch / efficientteacher

A Supervised and Semi-Supervised Object Detection Library for YOLO Series

License: GNU General Public License v3.0

Python 99.90% Shell 0.10%

efficientteacher's Issues

Some questions about EfficientTeacher

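Hello, and thanks to the authors for this work, which combines a semi-supervised algorithm with the widely used YOLOv5 and achieves a large improvement. After reading the paper (I have not had time to run the code yet), I have a few questions:

In Table 1, the Dense Detector proposed in the paper has a parameter count comparable to YOLOv5 and YOLOv7 and even higher FLOPs, yet its mAP is still lower. Why not apply the method directly to those stronger YOLO detectors?
Semi-supervised methods for one-stage, anchor-free detection are still mostly built on relatively early detectors such as FCOS, and papers generally do not report results on more recent SOTA models such as YOLOX. Have you tried applying Efficient Teacher to detectors like YOLOX? PLA and EA both look suitable for anchor-free models.
The idea of using unlabeled data for domain adaptation during the burn-in stage is very elegant, but the ablation study does not seem to isolate the effect of DA. How much improvement does this step bring? In Figure 5 the plain burn-in also performs quite well; does "plain burn-in" there mean no DA and no adaptive threshold?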

Permission denied: '/runs_yolov5'

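Hello, I am following the README to reproduce the paper, but in part 3 of the section "Using Efficient Teacher in a real project" (在实际项目中使用Efficient Teacher方案), supervised training fails with this error. How can I fix it?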

Oh, I see. Then it should not affect training. I will look into it further. Has this loss helped in your experiments so far?

@BowieHsu
Hello, the supervised epochs train normally, but as soon as training reaches the unsupervised epochs it fails with: return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Does the target dataset have to be separate from train and val? When val is used as the target dataset, all images are found, but the wrong labels get matched to them.

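Training accuracy problem

I am training on my own dataset using the direct training mode. During the first 220 supervised epochs, mAP50 rises steadily. After switching to semi-supervised training, mAP50 drops sharply and then starts to recover, but after 100 more epochs it is still noticeably below the value reached in the first 220 epochs. What could cause this, and is there a recommended fix? Thanks!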

unsupported operand type(s) for +=: 'NoneType' and 'int': what causes this?

Traceback (most recent call last):
File "train.py", line 84, in <module>
main(opt)
File "train.py", line 76, in main
trainer.train(callbacks, val)
File "/share/disk1/ml/code/efficientteacher-main/trainer/trainer.py", line 532, in train
self.train_in_epoch(callbacks)
File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 285, in train_in_epoch
self.train_without_unlabeled(callbacks)
File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 402, in train_without_unlabeled
self.update_optimizer(loss, ni)
File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 445, in update_optimizer
self.ema.update(self.model)
File "/share/disk1/ml/code/efficientteacher-main/utils/torch_utils.py", line 331, in update
self.updates += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
Killing subprocess 60073
Traceback (most recent call last):
File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/effictteacher/bin/python', '-u', 'train.py', '--local_rank=0', '--cfg', 'configs/ssod/custom/yolov5l_custom_ssod.yaml']' returned non-zero exit status 1.

Is semi-supervised training with YOLOv8 supported?

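After converting a standard model to ET, val.py reports zero accuracy and every metric is 0

Hello, I would like to ask the following questions:
Question 1: After converting a standard model to ET and running val.py, why is the accuracy 0? I trained with the official YOLOv5 code and passed the result through convert_pt_to_efficient.py to obtain the efficient_v5.py model. (Is my understanding of the weight conversion wrong? I take best.py to be the official v5 weights to be converted to ET, yolov5s_custom.yaml to be this repository's config file with only nc and classes modified, and efficient_v5.pt to be the weights converted to ET.)

Question 2: How does the config file read the backbone, neck, and head?
Question 3: What difference does it make whether the config file contains "weight"?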

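Semi-supervised training for yolov5s

Hello, regarding hyperparameter selection for semi-supervised training of small models (for example yolov5s): does your team currently plan to train such models? Do you have any recommendations for choosing the parameters?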

Data downloading error

I get the following error when running bash data/get_coco.sh.

[val2017.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of val2017.zip or
        val2017.zip.zip, and cannot find val2017.zip.ZIP, period.

[coco2017labels.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of coco2017labels.zip or
        coco2017labels.zip.zip, and cannot find coco2017labels.zip.ZIP, period.
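
The unzip message above typically means the downloaded file is not a zip archive at all, for example an HTML page or a truncated download. A minimal sketch for checking this before unzipping, using only Python's standard library; the helper name `describe_download` is hypothetical and not part of the repository:

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return a short diagnosis of whether `path` looks like a usable zip file."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if zipfile.is_zipfile(str(p)):
        return "zip"
    # Peek at the first bytes: an HTML page usually means the URL did not
    # point at the archive itself.
    head = p.read_bytes()[:256].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "not-a-zip"
```

If the result is "html" or "not-a-zip", opening the file in a text editor usually shows what the server actually returned.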

KeyError after model conversion

KeyError: 'det_8.conv1'

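I trained from the converted efficient pt (yolov5s), with the config file modified as the README requires. After training finished, the conversion step complained about missing files (for example, yolo_ssod could not be imported). I then copied the files with matching names from the models folder of the outer efficientteacher directory into the models folder under script, and got the KeyError shown above.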

Welcome to the efficientteacher project

Let me introduce the update plan for March: 1. After verification, we will upload the COCO pre-trained models mentioned in the README, which will make it easier to debug and use the library and to reach good semi-supervised training results faster. 2. Our semi-supervised scheme cannot yet adapt to arbitrary data distributions, so we want to collect the situations people run into in practice; please feel free to open issues so we can improve it together.

About select_targets in the compute_un_sup_loss function

# Uncertain samples with a very high obj score are passed on to refine the IoU loss
if t[7] >= 0.99:
    uncertain_obj_targets.append(np.concatenate((t[:6], t[7:8])))
# Uncertain samples with a very high cls score are passed on to refine the cls loss
if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[7:8])))

Here, should

if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[7:8])))

instead be

if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[8:9])))
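
To make the difference between the two slices concrete, here is a small sketch with a plain Python list standing in for `t`. The layout is an assumption inferred from the comments in the quoted code (index 7 holds the obj score, index 8 the cls score), and the values are made up for illustration. `np.concatenate` on 1-D arrays joins the slices the same way list concatenation does here.

```python
# Hypothetical pseudo-label row; only the positions matter.
# Assumed layout: indices 0-5 are the first six fields, index 6 is not
# used in this comparison, index 7 is the obj score, index 8 the cls score.
t = [0, 1, 10.0, 20.0, 30.0, 40.0, 0.5, 0.995, 0.999]

current = t[:6] + t[7:8]   # what the quoted code appends in the cls branch
proposed = t[:6] + t[8:9]  # what the question suggests instead

print(current[-1])   # 0.995, the value at index 7
print(proposed[-1])  # 0.999, the value at index 8
```

Both variants produce a row of seven values; they differ only in which score is carried along as the last element.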

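Questions about dataset setup

I would like to understand how the datasets are set up for semi-supervised training.
1. Should the unlabeled samples come from new images, or should they be split off from the original training set?
2. In the paper, do the training set and the target set you used come from the 10% split of train2017?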

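Does this repository's detect script only work for ET-YOLOv5 detection?

Do original YOLOv5 models still have to be run for detection in the original yolov5 repository?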

result type Float can't be cast to the desired output type long int

Hello, what causes this error during model training: result type Float can't be cast to the desired output type long int

Issues encountered when using Efficient Teacher on personal datasets

Hello, I get an error when running python val.py --config configs/sup/custom/yolov5s_custom.yaml --weights weights/efficient-yolov5s.pt. My yaml file is defined as follows:

Dataset:
  data_name: 'custom'
  train: ['./images/train' ]
  val: [ './images/valid' ]
  test: [ './images/valid' ]
  nc: 1 # number of classes
  np: 0 # number of keypoints
  names: [ 'apple' ]

The txt files are in the standard YOLOv5 format, as shown below:
0 0.927518 0.820644 0.123193 0.112306 0 0.525768 0.426199 0.050776 0.020583
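
A standard YOLOv5 label file has one object per line with exactly five fields: class index, then normalized x-center, y-center, width, and height. The sample line above contains ten values, so it is either two boxes joined during copy-paste or a malformed line; the thread does not say which. A minimal sketch of a per-line check in plain Python, where `check_yolo_label_line` is a hypothetical helper and not a function from the repository:

```python
def check_yolo_label_line(line, num_classes):
    """Return a list of problems found in one YOLO-format label line."""
    problems = []
    fields = line.split()
    if len(fields) != 5:
        problems.append(f"expected 5 fields, got {len(fields)}")
        return problems
    cls, *coords = fields
    if not cls.isdigit() or int(cls) >= num_classes:
        problems.append(f"class index {cls!r} outside 0..{num_classes - 1}")
    for name, value in zip(("x", "y", "w", "h"), coords):
        try:
            v = float(value)
        except ValueError:
            problems.append(f"{name} is not a number: {value!r}")
            continue
        if not 0.0 <= v <= 1.0:
            problems.append(f"{name}={v} is not normalized to [0, 1]")
    return problems


joined = "0 0.927518 0.820644 0.123193 0.112306 0 0.525768 0.426199 0.050776 0.020583"
print(check_yolo_label_line(joined, num_classes=1))
# ['expected 5 fields, got 10']
print(check_yolo_label_line("0 0.927518 0.820644 0.123193 0.112306", num_classes=1))
# []
```

Running such a check over every label file before training can rule out formatting problems as the cause of a validation error.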

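How do I switch YOLO versions?

Are other YOLO versions supported? configs/ssod only contains configs for v5. Is it possible to switch to V7 for semi-supervised learning?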

Error when training moves from the supervised to the unsupervised stage

return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

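Problem using the pretrained model

I want to use efficient-yolov5l-ssod.pt as a pretrained model directly in the official YOLOv5 code, but I get no model named "models.detector" because the module names do not match. How should I modify it?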

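Training question

Replace the target: data_custom_target.txt entry in yolov5_custom.yaml with the absolute path of the unlabel.txt generated above, then paste the following part of the config file into yolov5l_custom.yaml:

Hello, the text above is from the README. Where should the target: data_custom_target.txt field be added? I could not find it in yolov5l_custom.yaml.

Also, does my custom_train.txt need to include the contents of unlabel.txt, or should it stay as it was for the default supervised training?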

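Using YOLOv7 for semi-supervised training

Could the authors or anyone experienced explain how to use this code for semi-supervised training with YOLOv7? I am a beginner.

Has anyone used YOLOv7 for semi-supervised training with efficientteacher?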

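Oh, I see. Then it should not affect training. I will look into it further. Has this loss helped in your experiments so far?

          Oh, I see. Then it should not affect training. I will look into it further. Has this loss helped in your experiments so far?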

Originally posted by @BowieHsu in #21 (comment)

Hello, if I want my results to be reproducible, where should I change the seed?

How to make the code handle oriented bounding box annotations?

I have datasets whose annotations are oriented at an angle. I understand the codebase is designed to handle HBBs. Which parts of the code would need to change to make it work with OBBs?

Are there plans to support semi-supervised training for YOLOv8?

About the learning rate in semi-supervised training

1. Semi-supervised training from scratch, with burn_epochs=220. An excerpt of the learning rate log is below; after epoch 54 the learning rate is 0.01 throughout:
epoch | train/box_loss | train/obj_loss | train/cls_loss | metrics/precision | metrics/recall | metrics/mAP_0.5 | metrics/mAP_0.5:0.95 | val/box_loss | val/obj_loss | val/cls_loss | x/lr0 | x/lr1 | x/lr2
50 | 0.045489 | 0.0052034 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00917 | 0.00917 | 0.01747
51 | 0.044976 | 0.0052962 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00935 | 0.00935 | 0.01585
52 | 0.043255 | 0.0051407 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00953 | 0.00953 | 0.01423
53 | 0.044804 | 0.0053863 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00971 | 0.00971 | 0.01261
54 | 0.044136 | 0.0049189 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00989 | 0.00989 | 0.01099
55 | 0.042046 | 0.0052086 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
56 | 0.04328 | 0.004892 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
57 | 0.042468 | 0.0050412 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
58 | 0.043239 | 0.0051044 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
59 | 0.041122 | 0.0047628 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
60 | 0.041649 | 0.0048522 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
61 | 0.040692 | 0.0050476 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
62 | 0.038645 | 0.0047481 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01

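2. Semi-supervised training that loads the model trained on the 10% labeled data at epoch 210 and continues from there, with burn_epochs=220. The learning rate is 0.01 for all later epochs: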
epoch | train/box_loss | train/obj_loss | train/cls_loss | metrics/precision | metrics/recall | metrics/mAP_0.5 | metrics/mAP_0.5:0.95 | val/box_loss | val/obj_loss | val/cls_loss | x/lr0 | x/lr1 | x/lr2
211 | 0.027092 | 0.002975 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
212 | 0.025645 | 0.003046 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
213 | 0.025966 | 0.003221 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
214 | 0.025256 | 0.003216 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
215 | 0.025347 | 0.003051 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01

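I have two questions:
1. Why is lrf set to 0.1 in the yolov5l fully supervised config but to 1.0 in the semi-supervised config? Is there a reason for this?
2. The result of training method 1 above is much worse than that of method 2.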
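
On question 1, one possible reading, assuming EfficientTeacher keeps the linear schedule used by upstream YOLOv5 (an assumption, not confirmed in this thread): the per-epoch learning rate is lr0 multiplied by a factor that decays linearly from 1 to lrf over training. With lrf = 1.0 the factor is always 1, so the learning rate stays at lr0 once warmup is over, which matches the constant 0.01 in the logs above. The function name and the total of 300 epochs below are illustrative.

```python
def linear_lr_factor(epoch, total_epochs, lrf):
    """Linear decay factor in the style of upstream YOLOv5: 1.0 at epoch 0, lrf at the end."""
    return (1 - epoch / total_epochs) * (1.0 - lrf) + lrf


lr0 = 0.01  # initial learning rate, as seen in the logs
for lrf in (0.1, 1.0):
    # Learning rate at the start, middle, and end of an assumed 300-epoch run.
    lrs = [round(lr0 * linear_lr_factor(e, 300, lrf), 5) for e in (0, 150, 300)]
    print(f"lrf={lrf}: {lrs}")
```

Under that assumption, lrf = 0.1 gives a tenfold decay by the last epoch, while lrf = 1.0 disables the decay entirely.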

No labels in ./.../unlabel.cache

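Hello, during actual semi-supervised training I keep getting an error that no labels were found for target. Isn't target supposed to be unlabeled data? Why does datasets_ssod.py still load labels for target when loading the data?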

Killed

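I configured everything exactly as you described, but the process was Killed. Is that because too many dataloader workers were started, or something else?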

Has anyone obtained an improvement using efficientteacher on their own dataset?

About pt weight file conversion: "KeyError: 'backbone.stage1'"

Running convert_pt_to_efficient.py raises this error. How can it be fixed?

What do the parameters shown during training mean?

What does each parameter in the first row mean?

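After converting a trained model with convert_efficient_to_yolov5, YOLOv5 inference gives no detections

Detection with the project's own detect.py script works fine.

The conversion with convert_efficient_to_yolov5 also completes without errors, but inference with YOLOv5's detect script on the converted model returns only empty (none) results.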

get_label.sh cannot download

Downloading https://github.com/BowieHsu/EfficientTeacher/releases/tag/data_list/data_list.zip ...
[data_list.zip]
  End-of-central-directory signature not found. Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive. In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
note: data_list.zip may be a plain executable, not an archive
unzip: cannot find zipfile directory in one of data_list.zip or
        data_list.zip.zip, and cannot find data_list.zip.ZIP, period.
Hello, this file does not seem to exist.

Exporting a .txt of unlabeled images

I cannot export the .txt of unlabeled images

RuntimeError: Error(s) in loading state_dict for Model

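Hello, I want to use yolov5s.pt from YOLOv5 v6.1 as the pretrained model. I converted yolov5s.pt to e-yolov5s.pt with convert_pt_to_efficient.py, using configs/sup/public/yolov5s_coco.yaml as the conversion config, and the conversion ran without errors. I then used this e-yolov5s.pt to train a 20-class model directly in semi-supervised mode, but loading e-yolov5s.pt fails with the following error: RuntimeError: Error(s) in loading state_dict for Model: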
size mismatch for head.m.0.weight: copying a param with shape torch.Size([255, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 128, 1, 1]).
size mismatch for head.m.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
size mismatch for head.m.1.weight: copying a param with shape torch.Size([255, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 256, 1, 1]).
size mismatch for head.m.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
size mismatch for head.m.2.weight: copying a param with shape torch.Size([255, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 512, 1, 1]).
size mismatch for head.m.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
Traceback (most recent call last):

Does this mean that the official 80-class weights cannot be used directly as the pretrained model for semi-supervised training?
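
The shapes in the error follow from the size of a YOLOv5-style detection head: each scale outputs anchors × (classes + 5) channels. Assuming the usual 3 anchors per scale, 80 COCO classes give 255 channels and 20 classes give 75, which are exactly the two sizes reported. A common workaround, sketched here on plain dictionaries of shapes rather than real tensors, is to load only the entries whose shapes match and leave the head freshly initialized. `filter_matching` is a hypothetical helper, and whether the repository already offers this behavior is not stated in the thread.

```python
def head_channels(num_classes, anchors_per_scale=3):
    """Output channels of one YOLOv5-style detection layer."""
    return anchors_per_scale * (num_classes + 5)


def filter_matching(checkpoint_shapes, model_shapes):
    """Keep only checkpoint entries whose name and shape match the model."""
    return {
        name: shape
        for name, shape in checkpoint_shapes.items()
        if model_shapes.get(name) == shape
    }


print(head_channels(80))  # 255
print(head_channels(20))  # 75

# "backbone.conv.weight" is a made-up key for illustration;
# "head.m.0.weight" and its shapes come from the error message.
ckpt = {"backbone.conv.weight": (32, 3, 3, 3), "head.m.0.weight": (255, 128, 1, 1)}
model = {"backbone.conv.weight": (32, 3, 3, 3), "head.m.0.weight": (75, 128, 1, 1)}
print(sorted(filter_matching(ckpt, model)))  # ['backbone.conv.weight']
```

With real checkpoints the same filter would compare tensor shapes, and the filtered dictionary would be loaded non-strictly so the backbone and neck weights are reused.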

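About training hyperparameter settings

Hello, are your YOLOv5 hyperparameters the same as those in the Ultralytics version?
Also, is burn_epochs set to 220 the setting used in the paper (assuming 300 training epochs in total)?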

Problem running with my own dataset

After converting my dataset to COCO format as required, why do I get a "labels cannot be read" error? I hope you can help.

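Model conversion question

After converting a standard YOLOv5 model to an ET model and finishing training, how do I convert it back to a YOLOv5 model?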

NameError: name 'ComputeFastXLoss' is not defined

Hello! After pulling the latest code I still hit this problem. How can it be solved? I load the efficient-yolov7s-simota.pt weights in my config file.
@BowieHsu

About the training speed

Hi, when I reproduce your results, one epoch takes about 40 minutes. Is that normal? My device is a 3090Ti GPU, with batch size 32 and 10% labeled COCO.
Thanks for sharing.

NAS Distill Prune

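I noticed parameters such as nas, distill, and prune in configs. Do you plan to turn efficient teacher into a library for production use cases?
The learning rate used in SSOD is fixed at 0.01. Was this value chosen after tuning?
SyncBN is not used during SSOD training. Have you run experiments on this?
Is there a discussion group?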

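Positive and negative sample assignment strategy

Hello, if I want to change the positive/negative sample assignment strategy, which part of the code should I modify? I see that several strategies are already implemented in models/assigner, but I do not know how to invoke them.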

Wrong path

In readme_zhcn, configs/ssod/custom/yolov5l_custom_ssod.yaml is written as configs/custom/yolov5l_custom_ssod.yaml

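Training on my own data

I have prepared my own dataset in standard COCO format, together with a copy converted to YOLO format. I have also split the data, and the splits are in YOLO format as well, as follows:

If I want to use this repository for a semi-supervised task, how should I proceed?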

use custom dataset, new issue

Hello, I ran into the following problem. Could you take a look?

train: immutable=False, deprecated_keys=set(), renamed_keys={}, new_allowed=False
EfficientTeacher 2023-3-15 torch 1.7.1 CUDA:0 (NVIDIA GeForce RTX 2080, 7974.0625MB)

Weights & Biases: run 'pip install wandb' to automatically track and visualize EfficientTeacher runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir yolov5_ssod', view at http://localhost:6006/
Model summary: 478 layers, 47539674 parameters, 47539674 gradients

Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 110 weight, 101 weight (no decay), 104 bias
Traceback (most recent call last):
File "train.py", line 84, in <module>
main(opt)
File "train.py", line 72, in main
trainer = SSODTrainer(cfg, device, callbacks, LOCAL_RANK, RANK, WORLD_SIZE)
File "/home/jinping/code/efficientteacher-main/trainer/ssod_trainer.py", line 60, in __init__
self.build_dataloader(cfg, callbacks)
File "/home/jinping/code/efficientteacher-main/trainer/ssod_trainer.py", line 225, in build_dataloader
workers=cfg.Dataset.workers, prefix=colorstr('train: '), cfg=cfg)
File "/home/jinping/code/efficientteacher-main/utils/datasets.py", line 334, in create_dataloader
prefix=prefix)
File "/home/jinping/code/efficientteacher-main/utils/datasets.py", line 704, in __init__
if self.img_files[0].endswith('.txt') and len(self.img_files[0].split(' ')) == 2:
IndexError: list index out of range
(open-mmlab) jinping@jinping-Precision-5820-Tower:~/code/efficientteacher-main$
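
The IndexError on `self.img_files[0]` means the list of training images ended up empty, which usually points at a wrong path or an empty list file in the Dataset section of the config. A minimal sketch of a pre-flight check that can be run before training. The function name and the extension set are assumptions, and the code mirrors only the general idea of reading either a directory or a .txt list, not the repository's actual loader:

```python
from pathlib import Path

IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image extensions


def collect_images(source):
    """Collect image paths from a directory or a .txt list, failing loudly if none are found."""
    p = Path(source)
    if p.is_dir():
        files = sorted(str(f) for f in p.rglob("*") if f.suffix.lower() in IMG_EXTS)
    elif p.is_file():
        # Each non-empty line is treated as "image_path" optionally followed
        # by other fields; only the first field is kept.
        lines = [ln.strip() for ln in p.read_text().splitlines() if ln.strip()]
        files = [ln.split()[0] for ln in lines]
    else:
        raise FileNotFoundError(f"{source} does not exist")
    if not files:
        raise ValueError(f"no images found in {source}")
    return files
```

Calling `collect_images` on each train, val, and target entry of the config surfaces an empty or mistyped path with a clear message instead of an index error deep inside the dataloader.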

pt file conversion problem

After training efficient_yolol.pt, I used convert_efficient_to_yolov5 to convert it to a standard yolov5l.pt, but running the official YOLOv5 detect.py produced the following bug:
Traceback (most recent call last):
File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 307, in <module>
main(opt)
File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 302, in main
run(**vars(opt))
File "/usr/local/anaconda3/envs/***/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 82, in run
model = torch.jit.load(w) if 'torchscript' in w else attempt_load(weights, map_location=device)
File "/home/zzz/Desktop/workspace/yolov5/models/experimental.py", line 94, in attempt_load
ckpt = torch.load(attempt_download(w), map_location=map_location) # load
File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
result = unpickler.load()
File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 875, in find_class
return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'DetectionModel' on <module 'models.yolo' from '/home/zzz/Desktop/workspace/yolov5/models/yolo.py'>
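
The AttributeError comes from Python's pickle machinery: `torch.load` unpickles the whole model object, which requires the class recorded in the checkpoint (here `DetectionModel` in `models.yolo`) to exist in the code doing the loading. The local YOLOv5 checkout apparently does not define that class, so a likely fix is to use a YOLOv5 version that does; that reading is an inference from the traceback, not something confirmed in the thread. The mechanism can be reproduced with the standard library alone. The module name `fake_models_yolo` is made up for the demonstration.

```python
import pickle
import sys
import types

# Build a throwaway module that plays the role of models/yolo.py at save time.
mod = types.ModuleType("fake_models_yolo")


class DetectionModel:
    pass


# Make pickle record the class as fake_models_yolo.DetectionModel.
DetectionModel.__module__ = "fake_models_yolo"
DetectionModel.__qualname__ = "DetectionModel"
mod.DetectionModel = DetectionModel
sys.modules["fake_models_yolo"] = mod

blob = pickle.dumps(DetectionModel())  # "saving" with the class available

# Simulate loading with a code version that lacks the class.
del mod.DetectionModel

msg = None
try:
    pickle.loads(blob)
except AttributeError as exc:
    msg = str(exc)

print(msg)
```

The printed message names the missing class and the module it was expected in, just like the traceback above.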