efficientteacher's People

Contributors

alibaba-oss, bowiehsu, kdy1999, meton-robean

efficientteacher's Issues

Some questions about EfficientTeacher

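Hello, and thank you to the authors for this work. Combining a semi-supervised algorithm with the widely used YOLOv5 yields a large improvement. After reading the paper (I have not had time to run the code yet), I have a few questions:

  1. In Table 1, the proposed Dense Detector has a parameter count comparable to YOLOv5 and YOLOv7 and an even higher computational cost, yet its mAP is still lower. Why not apply the method directly to those stronger YOLO detectors?

  2. Semi-supervised methods for one-stage, anchor-free detection are still mostly built on relatively early detectors such as FCOS, and papers generally do not report results on more recent state-of-the-art models such as YOLOX. Have you tried applying Efficient Teacher to YOLOX-style detectors? PLA and EA both look suitable for anchor-free models.

  3. Using unlabeled data for domain adaptation during the burn-in stage is a very nice idea, but the ablation study does not seem to isolate the effect of DA. How much improvement does this step bring? In Figure 5 the plain Burn-In also performs reasonably well. Does "plain burn-in" there mean no DA and no adaptive threshold?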

Permission denied: '/runs_yolov5'

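Hello, I am following the README to reproduce the paper. In part 3 (supervised training) of the section "Using the Efficient Teacher scheme in a real project", I get this error. How can I resolve it?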

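Training accuracy problem

I am training on my own dataset using the direct training approach. During the first 220 epochs of supervised learning, mAP@0.5 keeps increasing. After switching to semi-supervised training, mAP@0.5 drops sharply and then starts rising again, but after 100 epochs it is still noticeably below the value reached in the first 220 epochs. What causes this, and is there a recommended fix? Thanks!

Loss curves

During training, the val loss curves are all 0 while the other curves look normal. What could be the cause?

unsupported operand type(s) for +=: 'NoneType' and 'int': what causes this?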

Traceback (most recent call last):
File "train.py", line 84, in
main(opt)
File "train.py", line 76, in main
trainer.train(callbacks, val)
File "/share/disk1/ml/code/efficientteacher-main/trainer/trainer.py", line 532, in train
self.train_in_epoch(callbacks)
File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 285, in train_in_epoch
self.train_without_unlabeled(callbacks)
File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 402, in train_without_unlabeled
self.update_optimizer(loss, ni)
File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 445, in update_optimizer
self.ema.update(self.model)
File "/share/disk1/ml/code/efficientteacher-main/utils/torch_utils.py", line 331, in update
self.updates += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
Killing subprocess 60073
Traceback (most recent call last):
File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in
main()
File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/effictteacher/bin/python', '-u', 'train.py', '--local_rank=0', '--cfg', 'configs/ssod/custom/yolov5l_custom_ssod.yaml']' returned non-zero exit status 1.

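val.py reports zero accuracy (all metrics are 0) after converting a standard model to ET

Hello, I would like to ask the following questions:
Question 1: After converting a standard YOLOv5 model to ET and running val.py, why is the accuracy 0? I trained with the official YOLOv5 code and passed the result to convert_pt_to_efficient.py to convert it into the efficient_v5.pt model. (Is my understanding of the weight conversion correct? best.pt is the official YOLOv5 checkpoint to be converted to ET; yolov5s_custom.yaml is this repository's config file with only nc and classes modified; efficient_v5.pt is the resulting ET weight file.)

Question 2: How does the config file read the backbone, neck, and head?
Question 3: What effect does the presence or absence of "weight" in the config file have?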

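Semi-supervised training of yolov5s

Hello, regarding hyperparameter selection for semi-supervised training of small models (e.g. yolov5s): does your team currently plan to train such models? Do you have any suggestions for choosing the parameters?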

Data downloading error

I get the following error when running bash data/get_coco.sh.

[val2017.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of val2017.zip or
        val2017.zip.zip, and cannot find val2017.zip.ZIP, period.

[coco2017labels.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of coco2017labels.zip or
        coco2017labels.zip.zip, and cannot find coco2017labels.zip.ZIP, period.
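
Both messages say that unzip could not find a ZIP central directory, which usually means the download was truncated or the server returned something other than an archive (for example an HTML error page). A minimal Python sketch for checking the files before unzipping; the helper name `describe_download` is hypothetical, and the file names are simply those from the log above:

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return a short diagnosis of a file that is supposed to be a ZIP archive."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    if zipfile.is_zipfile(p):
        return "ok"
    # Peek at the first bytes: an HTML page here suggests a bad URL or redirect.
    head = p.read_bytes()[:64].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html page, not an archive"
    return "not a zip (possibly truncated)"


if __name__ == "__main__":
    for name in ("val2017.zip", "coco2017labels.zip"):
        print(name, "->", describe_download(name))
```

If a file is reported as anything other than "ok", deleting it and downloading it again is the first thing to try.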

KeyError after model conversion

KeyError: 'det_8.conv1'
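I trained with the converted efficient pt (yolov5s), with the config file modified as the README requires. After training finished, the conversion step reported missing files (something like being unable to import yolo_ssod). I then found the files with the corresponding names in the models directory of the outer efficientteacher folder and copied them into the models directory under script, after which this error appeared.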

Welcome to the efficientteacher project

Let me introduce the update plan for March:

1. After verification, we will upload the COCO pre-trained models mentioned in the README. This should make it easier to debug and use our library, and to reach good semi-supervised training results faster.
2. Our semi-supervised scheme cannot yet adapt to arbitrary data distributions, so we would like to collect the situations people encounter in actual use. Please feel free to open issues so we can improve it together.

About select_targets in the compute_un_sup_loss function

# Uncertain samples with a very high obj score are passed on to refine the IoU loss
if t[7] >= 0.99:
    uncertain_obj_targets.append(np.concatenate((t[:6], t[7:8])))
# Uncertain samples with a very high cls score are passed on to refine the cls loss
if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[7:8])))

Here, should

if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[7:8])))

instead be

if t[8] >= 0.99:
    uncertain_cls_targets.append(np.concatenate((t[:6], t[8:9])))
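
To make the question concrete: assuming, as the comments in the snippet suggest, that index 7 of each target row holds the objectness score and index 8 holds the class score (this layout is inferred from the snippet, not confirmed by the source), the two slices pick different columns. A small NumPy sketch with a made-up row:

```python
import numpy as np

# Made-up 9-element row; only the positions matter here.
# Assumed layout: t[:6] = fields copied through, t[6] = unused here,
# t[7] = obj score, t[8] = cls score.
t = np.array([0, 1, 0.5, 0.5, 0.2, 0.2, 0.70, 0.30, 0.995])

with_obj = np.concatenate((t[:6], t[7:8]))  # slice used in the current code
with_cls = np.concatenate((t[:6], t[8:9]))  # slice proposed in this issue

print(with_obj[-1])  # last element comes from t[7]
print(with_cls[-1])  # last element comes from t[8]
```

Under that assumed layout, the current code appends the obj score to a target that was selected by its cls score, which is the inconsistency the issue is asking about.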

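Questions about dataset setup

I would like to understand how datasets are set up for semi-supervised training.
1. Should the unlabeled samples come from new images, or should they be split off from the original training set?
2. In the paper, do the training set and the target you used come from 10% of train2017?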

Issues encountered when using Efficient Teacher on personal datasets

Hello, I get an error when running python val.py --config configs/sup/custom/yolov5s_custom.yaml --weights weights/efficient-yolov5s.pt. Here is my yaml file definition:

Dataset:
  data_name: 'custom'
  train: ['./images/train']
  val: ['./images/valid']
  test: ['./images/valid']
  nc: 1  # number of classes
  np: 0  # number of keypoints
  names: ['apple']


My dataset is in txt format, with the following directory layout:
|-- dataset
| |-- images
| | |-- train
| | |-- valid
| |-- labels
| | |-- train
| | |-- valid

The txt files are in standard YOLOv5 format, as shown below:
0 0.927518 0.820644 0.123193 0.112306 0 0.525768 0.426199 0.050776 0.020583
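
In the usual YOLOv5 label format each line of a txt file describes one box with five values: class index, then normalized x-center, y-center, width, and height. The sample above shows ten values on a single line, which may just be two lines merged when the issue was posted, but it is worth ruling out. A minimal sketch of a per-line check (the function name `check_label_line` is hypothetical and not part of the repository):

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one YOLO-format label line."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 values, found {len(parts)}"]
    problems = []
    try:
        cls = int(parts[0])
        coords = [float(v) for v in parts[1:]]
    except ValueError:
        return ["non-numeric value"]
    if not 0 <= cls < num_classes:
        problems.append(f"class index {cls} outside [0, {num_classes})")
    if any(not 0.0 <= v <= 1.0 for v in coords):
        problems.append("coordinate outside [0, 1]")
    return problems


sample = "0 0.927518 0.820644 0.123193 0.112306 0 0.525768 0.426199 0.050776 0.020583"
print(check_label_line(sample, num_classes=1))
print(check_label_line("0 0.927518 0.820644 0.123193 0.112306", num_classes=1))
```

Running the check over every line of every label file before training helps separate data problems from configuration problems.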

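How do I switch YOLO versions?

Are other YOLO versions supported? configs/ssod only contains configs for v5. Is it possible to switch to v7 for semi-supervised learning?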

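Problem using the pre-trained model

I would like to use efficient-yolov5l-ssod.pt directly as a pre-trained model in the official YOLOv5 code, but I get no model named "models.detector" because the module names do not match. How should I modify it?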

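Training question

"Replace the target: data_custom_target.txt entry in yolov5_custom.yaml with the absolute path of the unlabel.txt generated above, then paste the following part of the config file into yolov5l_custom.yaml:"

Hello, the text above is from the README. Where should the target: data_custom_target.txt field be added? I could not find it in yolov5l_custom.yaml.

Also, does my custom_train.txt need to include the contents of unlabel.txt, or should it stay as in the default supervised setup?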

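Learning rate in semi-supervised training

1. Semi-supervised training from scratch, with burn_epochs=220. An excerpt of the log is shown below; after epoch 54 the learning rate stays at 0.01: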
epoch | train/box_loss | train/obj_loss | train/cls_loss | metrics/precision | metrics/recall | metrics/mAP_0.5 | metrics/mAP_0.5:0.95 | val/box_loss | val/obj_loss | val/cls_loss | x/lr0 | x/lr1 | x/lr2
50 | 0.045489 | 0.0052034 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00917 | 0.00917 | 0.01747
51 | 0.044976 | 0.0052962 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00935 | 0.00935 | 0.01585
52 | 0.043255 | 0.0051407 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00953 | 0.00953 | 0.01423
53 | 0.044804 | 0.0053863 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00971 | 0.00971 | 0.01261
54 | 0.044136 | 0.0049189 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00989 | 0.00989 | 0.01099
55 | 0.042046 | 0.0052086 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
56 | 0.04328 | 0.004892 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
57 | 0.042468 | 0.0050412 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
58 | 0.043239 | 0.0051044 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
59 | 0.041122 | 0.0047628 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
60 | 0.041649 | 0.0048522 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
61 | 0.040692 | 0.0050476 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
62 | 0.038645 | 0.0047481 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01

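2. Semi-supervised training that loads the model trained on the 10% labeled data at epoch 210 and continues training, with burn_epochs=220. The learning rate afterwards is always 0.01: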
epoch | train/box_loss | train/obj_loss | train/cls_loss | metrics/precision | metrics/recall | metrics/mAP_0.5 | metrics/mAP_0.5:0.95 | val/box_loss | val/obj_loss | val/cls_loss | x/lr0 | x/lr1 | x/lr2
211 | 0.027092 | 0.002975 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
212 | 0.025645 | 0.003046 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
213 | 0.025966 | 0.003221 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
214 | 0.025256 | 0.003216 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01
215 | 0.025347 | 0.003051 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01

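I have two questions:
1. Why is lrf set to 0.1 in the yolov5l fully supervised config but 1.0 in the semi-supervised config? Is there a rationale for this?
2. The results of training approach 1 above are much worse than those of approach 2.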
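
One possible reading of the lrf difference, assuming the scheduler follows the linear form used in upstream YOLOv5, where lrf is the ratio of the final learning rate to the initial one (whether this repository uses exactly this form is an assumption): with lrf = 1.0 the multiplier is 1 at every epoch, so the learning rate stays at lr0 once warmup ends, which would match the constant 0.01 in the logs above. A sketch:

```python
def linear_lr_factor(epoch, total_epochs, lrf):
    """Multiplier applied to lr0 at a given epoch under a linear decay to lr0 * lrf."""
    return (1 - epoch / total_epochs) * (1.0 - lrf) + lrf


lr0 = 0.01
total_epochs = 300  # illustrative value, not taken from the issue
for lrf in (0.1, 1.0):
    lrs = [lr0 * linear_lr_factor(e, total_epochs, lrf) for e in (0, 150, 300)]
    print(f"lrf={lrf}: {lrs}")
```

With lrf = 0.1 the same formula decays the learning rate linearly to one tenth of lr0 by the last epoch.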

No labels in ./.../unlabel.cache

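Hello, during actual semi-supervised training I keep getting an error saying that no labels were found for target. Isn't target supposed to be unlabeled data? Why does datasets_ssod.py still load labels for target when loading the data?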

Killed

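I configured everything exactly as you described, but the process was Killed. Is this because too many dataloader workers were started, or is it something else?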

get_label.sh fails to download

Downloading https://github.com/BowieHsu/EfficientTeacher/releases/tag/data_list/data_list.zip ...
[data_list.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
note:  data_list.zip may be a plain executable, not an archive
unzip:  cannot find zipfile directory in one of data_list.zip or
        data_list.zip.zip, and cannot find data_list.zip.ZIP, period.

Hello, this file does not seem to exist.

RuntimeError: Error(s) in loading state_dict for Model

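Hello, I want to use yolov5s.pt from YOLOv5 v6.1 as the pre-trained model. I converted yolov5s.pt with convert_pt_to_efficient.py to get e-yolov5s.pt, using configs/sup/public/yolov5s_coco.yaml as the conversion config. The conversion finished without errors, so I used e-yolov5s.pt directly for semi-supervised training of a 20-class model, but loading e-yolov5s.pt fails with the following error:
RuntimeError: Error(s) in loading state_dict for Model: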
size mismatch for head.m.0.weight: copying a param with shape torch.Size([255, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 128, 1, 1]).
size mismatch for head.m.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
size mismatch for head.m.1.weight: copying a param with shape torch.Size([255, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 256, 1, 1]).
size mismatch for head.m.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
size mismatch for head.m.2.weight: copying a param with shape torch.Size([255, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 512, 1, 1]).
size mismatch for head.m.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([75]).
Traceback (most recent call last):

Does this mean that the official 80-class model cannot be used directly as the pre-trained model for semi-supervised training?
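
The shapes in the error are consistent with a YOLOv5-style detection head, where each output convolution has num_anchors * (num_classes + 5) channels: 3 * (80 + 5) = 255 for the COCO checkpoint and 3 * (20 + 5) = 75 for a 20-class model. A common workaround in PyTorch code generally is to load only the tensors whose shapes match and let the head be re-initialized; whether this repository's loader supports that is not confirmed here. A dependency-free sketch that uses plain shape tuples in place of tensors (`filter_matching` is a hypothetical helper, and `backbone.conv.weight` is a made-up parameter name):

```python
def head_channels(num_classes, num_anchors=3):
    """Output channels of a YOLOv5-style head conv: anchors * (classes + 4 box + 1 obj)."""
    return num_anchors * (num_classes + 5)


def filter_matching(ckpt_shapes, model_shapes):
    """Keep only checkpoint entries whose name and shape both match the model."""
    return {k: s for k, s in ckpt_shapes.items() if model_shapes.get(k) == s}


ckpt = {
    "backbone.conv.weight": (32, 3, 6, 6),
    "head.m.0.weight": (head_channels(80), 128, 1, 1),
}
model = {
    "backbone.conv.weight": (32, 3, 6, 6),
    "head.m.0.weight": (head_channels(20), 128, 1, 1),
}
print(sorted(filter_matching(ckpt, model)))
```

With real checkpoints the same comparison would be made on each tensor's `.shape`, and the filtered dictionary would then be loaded non-strictly.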

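Training hyperparameter settings

Hello, are your YOLOv5 hyperparameters the same as those of the Ultralytics version?
Also, is burn_epochs: 220 the setting used in the paper (assuming 300 training epochs in total)?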

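Model conversion question

After converting a standard YOLOv5 model to an ET model and finishing training, how do I convert it back to a YOLOv5 model?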

About the training speed

Hi, when reproducing your results I find that each epoch takes about 40 minutes. Is that normal? For reference, I am using a single 3090 Ti GPU with batch size 32 on 10% labeled COCO.
Thanks for sharing.

NAS Distill Prune

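  1. I saw parameters such as nas, distill, and prune in configs. Do you plan to turn Efficient Teacher into an application library for production scenarios?
  2. The learning rate used in ssod is constant at 0.01. Was this chosen after tuning?
  3. SyncBN is not used during ssod training. Have you run any experiments on this?
  4. Is there a discussion group?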

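Positive/negative sample assignment strategy

Hello, if I want to change the positive/negative sample assignment strategy, which part of the code should I modify? I see that several strategies are already implemented in models/assigner, but I do not know how to invoke them.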

Wrong path

In readme_zhcn, configs/ssod/custom/yolov5l_custom_ssod.yaml is written as configs/custom/yolov5l_custom_ssod.yaml.

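Training on my own data

I have prepared my own dataset in standard COCO format, along with a version converted to YOLO format. I have also split the data, and the splits are in YOLO format as well (the layout was shown in an attached image).
If I want to use this repository for a semi-supervised task, how should I proceed?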

use custom dataset, new issue

Hello, I ran into the following problem. Could you take a look?

train: immutable=False, deprecated_keys=set(), renamed_keys={}, new_allowed=False
EfficientTeacher 2023-3-15 torch 1.7.1 CUDA:0 (NVIDIA GeForce RTX 2080, 7974.0625MB)

Weights & Biases: run 'pip install wandb' to automatically track and visualize EfficientTeacher runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir yolov5_ssod', view at http://localhost:6006/
Model summary: 478 layers, 47539674 parameters, 47539674 gradients

Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 110 weight, 101 weight (no decay), 104 bias
Traceback (most recent call last):
File "train.py", line 84, in
main(opt)
File "train.py", line 72, in main
trainer = SSODTrainer(cfg, device, callbacks, LOCAL_RANK, RANK, WORLD_SIZE)
File "/home/jinping/code/efficientteacher-main/trainer/ssod_trainer.py", line 60, in __init__
self.build_dataloader(cfg, callbacks)
File "/home/jinping/code/efficientteacher-main/trainer/ssod_trainer.py", line 225, in build_dataloader
workers=cfg.Dataset.workers, prefix=colorstr('train: '), cfg=cfg)
File "/home/jinping/code/efficientteacher-main/utils/datasets.py", line 334, in create_dataloader
prefix=prefix)
File "/home/jinping/code/efficientteacher-main/utils/datasets.py", line 704, in __init__
if self.img_files[0].endswith('.txt') and len(self.img_files[0].split(' ')) == 2:
IndexError: list index out of range
(open-mmlab) jinping@jinping-Precision-5820-Tower:~/code/efficientteacher-main$
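
The IndexError is raised on self.img_files[0], so the list of image files was empty when the dataset was built. That typically points to a path in the config that does not resolve to any images. A minimal sketch for checking a directory before training; the helper name, the example path, and the extension list are assumptions for illustration:

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set, adjust as needed


def list_images(root):
    """Return sorted image paths found under root (empty list if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.suffix.lower() in IMAGE_EXTS)


if __name__ == "__main__":
    images = list_images("./images/train")  # example path
    print(f"found {len(images)} images")
```

If the count is zero, the train and target paths in the config are the first thing to re-check, including whether relative paths are resolved from the directory where train.py is launched.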

pt file conversion problem

After training efficient_yolol.pt, I used convert_efficient_to_yolov5 to convert it to a standard yolov5l.pt, but running the official YOLOv5 detect.py produced the following bug:
Traceback (most recent call last):
File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 307, in
main(opt)
File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 302, in main
run(**vars(opt))
File "/usr/local/anaconda3/envs/***/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/zzz/Desktop/workspace/yolov5/detect.py", line 82, in run
model = torch.jit.load(w) if 'torchscript' in w else attempt_load(weights, map_location=device)
File "/home/zzz/Desktop/workspace/yolov5/models/experimental.py", line 94, in attempt_load
ckpt = torch.load(attempt_download(w), map_location=map_location) # load
File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
result = unpickler.load()
File "/usr/local/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/serialization.py", line 875, in find_class
return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'DetectionModel' on <module 'models.yolo' from '/home/zzz/Desktop/workspace/yolov5/models/yolo.py'>
