
YOLOV3


Introduction

This is my own implementation of YOLOV3, written in PyTorch; it is also the first object detection model I have reproduced. The dataset used is PASCAL VOC, and the evaluation tool is the VOC2010 mAP metric. The mAP now reaches the target score.

I will continue to update the code to make it more concise and to add new, efficient tricks.

Note: this repository now supports model compression in the new branch model_compression.


Results

name             Train Dataset                Val Dataset   mAP(others)   mAP(mine)       notes
YOLOV3-448-544   2007trainval + 2012trainval  2007test      0.769         0.768 | -       baseline (augment + step lr)
YOLOV3-*-544     2007trainval + 2012trainval  2007test      0.793         0.803 | -       + multi-scale training
YOLOV3-*-544     2007trainval + 2012trainval  2007test      0.806         0.811 | -       + focal loss (note the conf_loss starts lower)
YOLOV3-*-544     2007trainval + 2012trainval  2007test      0.808         0.813 | -       + giou loss
YOLOV3-*-544     2007trainval + 2012trainval  2007test      0.812         0.821 | -       + label smooth
YOLOV3-*-544     2007trainval + 2012trainval  2007test      0.822         0.826 | -       + mixup
YOLOV3-*-544     2007trainval + 2012trainval  2007test      0.833         0.832 | 0.840   + cosine lr
YOLOV3-*-*       2007trainval + 2012trainval  2007test      0.858         0.858 | 0.860   + multi-scale test and flip, nms threshold 0.45

Note:

  • YOLOV3-448-544 means the training image size is 448 and the test image size is 544. "*" means multi-scale.
  • The format of mAP(mine) is (use_difficult mAP | no_difficult mAP).
  • In testing, the NMS threshold is 0.5 (except for the last row, which uses 0.45; the lower threshold increases the mAP) and the confidence threshold is 0.01.
  • Currently, only single-GPU training and testing are supported.

Environment

  • Nvidia GeForce RTX 2080 Ti
  • CUDA 10.0
  • cuDNN 7.0
  • Ubuntu 16.04
  • Python 3.5
# install packages
pip3 install -r requirements.txt --user

Brief

  • Data Augment (RandomHorizontalFlip, RandomCrop, RandomAffine, Resize)
  • Step lr Schedule
  • Multi-scale Training (320 to 640)
  • focal loss
  • GIOU
  • Label smooth
  • Mixup
  • cosine lr (a minimal schedule sketch follows this list)
  • Multi-scale Test and Flip
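
For the "cosine lr" item above, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, the shape commonly paired with YOLOv3-style training; the function name and the warmup_steps/total_steps parameters are illustrative, not this repo's API:

import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0, warmup_steps=0):
    # Linear warmup from 0 to lr_max, then cosine decay down to lr_min.
    if step < warmup_steps:
        return lr_max * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))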

Prepared work

1. Git clone the YOLOV3 repository

git clone https://github.com/Peterisfar/YOLOV3.git

update the "PROJECT_PATH" in the params.py.

2. Download the dataset

  • Download the Pascal VOC datasets: VOC2012_trainval, VOC2007_trainval, and VOC2007_test. Put them in one directory and update "DATA_PATH" in params.py.
  • Convert the data format: convert the Pascal VOC *.xml annotations to the custom format (Image_path0   xmin0,ymin0,xmax0,ymax0,class0   xmin1,ymin1...). A parsing sketch follows the commands below.
cd YOLOV3 && mkdir data
cd utils
python3 voc.py # get train_annotation.txt and test_annotation.txt in data/
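
For reference, a minimal sketch of reading one line of the custom annotation format produced above; the function name is illustrative, not the repo's API:

def parse_annotation_line(line):
    # One image per line: image path, then space-separated boxes of the
    # form xmin,ymin,xmax,ymax,class.
    parts = line.strip().split()
    image_path, boxes = parts[0], []
    for box in parts[1:]:
        values = box.split(",")
        xmin, ymin, xmax, ymax = map(float, values[:4])  # pixel coordinates
        boxes.append((xmin, ymin, xmax, ymax, int(values[4])))  # class index last
    return image_path, boxes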

3. Download the weight file

Make a weight/ directory in YOLOV3 and put the weight file in it.


Train

Run the following command to start training; see the details in config/yolov3_config_voc.py.

WEIGHT_PATH=weight/darknet53_448.weights

CUDA_VISIBLE_DEVICES=0 nohup python3 -u train.py --weight_path $WEIGHT_PATH --gpu_id 0 > nohup.log 2>&1 &

Notes:

  • Training progress can be checked by running "cat nohup.log".
  • Training can be resumed by adding --resume; last.pt will be loaded automatically.

Test

You should define your weight file path WEIGHT_PATH and your test data path DATA_TEST:

WEIGHT_PATH=weight/best.pt
DATA_TEST=./data/test # your own images

CUDA_VISIBLE_DEVICES=0 python3 test.py --weight_path $WEIGHT_PATH --gpu_id 0 --visiual $DATA_TEST --eval

The resulting images can be seen in data/.


TODO

  • Mish
  • OctConv
  • Custom data


Issues

About utils/datasets.py

line111: for i in range(3):
line112: label[i][..., 5] = 1.0

Why is the conf of all the map pixels set to 1.0? Does it mean every pixel contains an object?

Questions about the YOLO loss

Your code is very concise and powerful!
But I have some questions about the YOLO loss:
Your strategy seems to differ from the one in the YOLOv3 paper.
You compute the IoU between the gt box and the predicted anchors of each feature map, and assign an anchor as positive whenever IoU > 0.3. If the IoU between the gt box and every anchor across all feature maps is below 0.3, the anchor with the largest IoU across all feature maps is assigned.
My understanding of the YOLOv3 strategy is:
compute the IoU between the gt box and the anchors across all feature maps, and assign only the single anchor with the largest IoU.
What is the benefit of your approach? I would love to discuss this with you.
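
A minimal sketch of the assignment rule described above, as an illustration of the question's reading of the code (not the repo's actual implementation):

import numpy as np

def assign_anchors(ious, thresh=0.3):
    # ious: (num_scales, anchors_per_scale) IoUs of one gt box with every
    # anchor. Anchors above the threshold become positive; if none qualify,
    # fall back to the single best-matching anchor across all scales.
    positive = ious > thresh
    if not positive.any():
        positive[np.unravel_index(np.argmax(ious), ious.shape)] = True
    return positive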

Validation is extremely slow

My setup: a cloud platform with a 4-core CPU, 24 GB of RAM, and a 2080 Ti (11 GB).
During training the GPU works normally at 100% utilization, but during testing GPU utilization is 0% and it is extremely slow. I have read the related earlier issues but did not find a solution.
I then tested with the best.pt weights on my own machine (which has no GPU): testing one image took 1-2 s there, but 15-25 s on the cloud platform.

Note: neither multi_scale nor flip was enabled.

A question about your label processing

My own YOLOv3 loss keeps producing NaN values, so I plugged your loss function into my code to check whether my own loss computation was at fault. I noticed this snippet where your loss computes the cls loss:
label_cls = label[..., 6:]
label_mix = label[..., 5:6]
In principle, label_cls should be label[..., 5:].
May I ask what the fifth position represents? I have been busy lately and have not had time to read your dataloader carefully. If you get a chance, could you briefly explain where this is handled in your code? Thank you!
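
For context, the per-anchor label layout can be inferred from the utils/datasets.py excerpt quoted in a later issue on this page; the index meanings below are an inference, not something confirmed by the author:

# label[..., 0:4] -> bbox (x, y, w, h)
# label[..., 4:5] -> objectness/conf (set to 1.0 for assigned anchors)
# label[..., 5:6] -> mixup weight (1.0 when mixup is off)
# label[..., 6:]  -> smoothed one-hot class vector
label_cls = label[..., 6:]   # classes therefore start at index 6
label_mix = label[..., 5:6]  # index 5 carries the mixup weight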

Anchor

Could you explain the scale relationship between this set of anchors and the anchors in the comments?
MODEL = {"ANCHORS":[[(1.25, 1.625), (2.0, 3.75), (4.125, 2.875)], # Anchors for small obj(12,16),(19,36),(40,28)
[(1.875, 3.8125), (3.875, 2.8125), (3.6875, 7.4375)], # Anchors for medium obj(36,75),(76,55),(72,146)
[(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]], # Anchors for big obj(142,110),(192,243),(459,401)
"STRIDES":[8, 16, 32],
"ANCHORS_PER_SCLAE":3
}
Thanks.
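
One observation that may answer this, offered as an inference rather than a confirmed reply: the float anchors match the standard YOLOv3 COCO anchors divided by each scale's stride, while the commented integer anchors do not (12 / 8 = 1.5, not 1.25):

strides = [8, 16, 32]
coco_small = [(10, 13), (16, 30), (33, 23)]  # standard COCO anchors for the stride-8 scale
grid_anchors = [(w / strides[0], h / strides[0]) for (w, h) in coco_small]
print(grid_anchors)  # [(1.25, 1.625), (2.0, 3.75), (4.125, 2.875)] -- the configured floats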

About testing

Hi, I don't quite understand the Test section. Am I supposed to test on my own images, or on the downloaded VOC2007 test set?

multi_scale does not take effect within a single epoch?

Because of my own requirements I need to add some extra data to the training set, so I printed the tensor size of each batch. I found that within a single epoch the image size stays at 448 everywhere; it does not change with the multi-scale code.

My tensor.size print is placed right after imgs = imgs.to(self.device); the multi-scale code is at https://github.com/Peterisfar/YOLOV3/blob/03a834f88d57f6cf4c5016a1365d631e8bbbacea/train.py#L131-L134.

Epoch:[ 0 | 49 ]    Batch:[ 0 | 2068 ]    loss_giou: 2.3399    loss_conf: 1.8814    loss_cls: 1.1482    loss: 5.3695    lr: 0
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
multi_scale_img_size : 512
torch.Size([8, 3, 448, 448])
Epoch:[ 0 | 49 ]    Batch:[ 10 | 2068 ]    loss_giou: 2.0293    loss_conf: 3.2374    loss_cls: 2.2489    loss: 7.5156    lr: 2.41663e-07
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
multi_scale_img_size : 512
torch.Size([8, 3, 448, 448])
Epoch:[ 0 | 49 ]    Batch:[ 20 | 2068 ]    loss_giou: 2.1348    loss_conf: 2.8944    loss_cls: 2.2905    loss: 7.3198    lr: 4.83325e-07
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])

But at the end of the first epoch the image size did change successfully. I am not sure whether this is a bug.

Epoch:[ 0 | 49 ]    Batch:[ 2050 | 2068 ]    loss_giou: 2.0007    loss_conf: 2.3295    loss_cls: 2.1310    loss: 6.4612    lr: 4.95408e-05
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
multi_scale_img_size : 544
torch.Size([8, 3, 448, 448])
Epoch:[ 0 | 49 ]    Batch:[ 2060 | 2068 ]    loss_giou: 1.9995    loss_conf: 2.3252    loss_cls: 2.1278    loss: 6.4524    lr: 4.97825e-05
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([8, 3, 448, 448])
torch.Size([7, 3, 448, 448])
torch.Size([8, 3, 544, 544])

baseline

Hello, does the baseline in your code refer to training YOLO with Darknet?

A question about the model's mAP

Hello. Although your code is concise, as I got deeper into it I found that it does not seem to achieve good detection results. With an image size of 544x544, conf_threshold = 0.01 and nms_threshold = 0.5, when detecting 000001.jpg from voc_test there are still 276 bounding boxes left after non-maximum suppression, while there is only one GT box, so the model produces many false positives. This made me question whether your eval mAP computation is correct. Running test.py does put some good-looking detection images in data/result/, but min_score_thresh in visualize_boxes in utils/visualize is 0.5, so those images are actually obtained at conf_threshold = 0.5. I evaluated your detection results with my own eval method at conf_threshold = 0.5, and the mAP was rather poor...

Validation after every epoch is extremely slow?

Thank you for the code. Can the batch size used for validation after each epoch be changed? Right now it evaluates one image at a time, and the VOC2007 test set has many images, so every epoch spends a very long time on validation. Is a parameter set incorrectly on my side, and how should I change it? Thanks.

Clustering anchors

May I ask how the anchors were obtained? Is there code you could share?
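
The repo does not appear to ship clustering code; below is a minimal sketch of the standard YOLOv2-style k-means over box widths and heights with an IoU distance (d = 1 - IoU), which is how such anchors are usually generated. All names are illustrative:

import numpy as np

def iou_wh(boxes, centroids):
    # IoU between (N, 2) box sizes and (k, 2) centroid sizes, treating all
    # boxes as if aligned at a common corner.
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)  # nearest = highest IoU
        centroids = np.array([boxes[assign == i].mean(axis=0) if (assign == i).any()
                              else centroids[i] for i in range(k)])
    return centroids[np.argsort(centroids.prod(axis=1))]  # sorted by area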

A logic error in voc.py

When use_difficult_bbox=False, the condition check is logically wrong, and the generated annotation file contains hard-to-recognize labels.
if (not use_difficult_bbox) and (difficult == 1):  # difficult marks whether the object is easy to recognize: 0 = easy, 1 = difficult
continue
should be changed to
if ( use_difficult_bbox) and (difficult == 1):  # difficult marks whether the object is easy to recognize: 0 = easy, 1 = difficult
continue

Training question

Can your code train on only a few specific VOC classes, rather than all 20?

IndexError: Caught IndexError in DataLoader worker process 1.

File "train.py", line 99, in train
for i, (imgs, label_sbbox, label_mbbox, label_lbbox, sbboxes, mbboxes, lbboxes) in enumerate(self.train_dataloader):
File "/home/aistudio/work/torch1/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 801, in next
return self._process_data(data)
File "/home/aistudio/work/torch1/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/aistudio/work/torch1/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/home/aistudio/work/torch1/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/aistudio/work/torch1/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/aistudio/work/torch1/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/aistudio/work/newyolov3/YOLOV3/utils/datasets.py", line 36, in getitem
img_mix, bboxes_mix = self.__parse_annotation(self.__annotations[item_mix])
File "/home/aistudio/work/newyolov3/YOLOV3/utils/datasets.py", line 84, in __parse_annotation
img, bboxes = dataAug.Resize((self.img_size, self.img_size), True)(np.copy(img), np.copy(bboxes))
File "/home/aistudio/work/newyolov3/YOLOV3/utils/data_augment.py", line 97, in call
bboxes[:, [0, 2]] = bboxes[:, [0, 2]] * resize_ratio + dw
IndexError: too many indices for array
This error appeared after about 60 batches of the first epoch.
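
The "too many indices for array" failure on bboxes[:, [0, 2]] usually means an image ended up with zero boxes, so bboxes has shape (0,) instead of (N, 5). A hedged guard along these lines (an assumption about the cause, not a confirmed fix; resize_ratio/dw/dh follow the Resize code named in the traceback):

import numpy as np

def resize_boxes(bboxes, resize_ratio, dw, dh):
    # Keep the array 2-D even when empty, and skip the transform for no boxes.
    bboxes = np.asarray(bboxes, dtype=np.float32).reshape(-1, 5)
    if len(bboxes):
        bboxes[:, [0, 2]] = bboxes[:, [0, 2]] * resize_ratio + dw
        bboxes[:, [1, 3]] = bboxes[:, [1, 3]] * resize_ratio + dh
    return bboxes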

[Question] Batch size?

Hello, I see that the batch size is set to 8 in the config file, and after reading the code carefully I did not find anywhere else that changes it. Your 2080 Ti is quite a good card; is there a particular reason for setting the batch size this small?
Thanks.

best.pt link

Hi, the best.pt link you shared has expired. Could you update it? Many thanks!

augment

Hi, and thanks for sharing the code. There is a mixup function in the data augmentation. Why is a Beta distribution used to mix the image data, and why is the parameter after class_id a value between 0 and 1?
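
A minimal sketch of box-level mixup as described in "Bag of Freebies", which this augmentation appears to follow: a Beta-distributed weight blends two images, and each box carries its image's weight (the 0-1 value after class_id) so the loss can be scaled per box. The alpha value and names are assumptions:

import numpy as np

def mixup(img1, boxes1, img2, boxes2, alpha=1.5):
    lam = np.random.beta(alpha, alpha)          # blending weight in (0, 1)
    img = lam * img1 + (1.0 - lam) * img2       # pixel-wise blend
    boxes1 = np.concatenate([boxes1, np.full((len(boxes1), 1), lam)], axis=1)
    boxes2 = np.concatenate([boxes2, np.full((len(boxes2), 1), 1.0 - lam)], axis=1)
    return img, np.concatenate([boxes1, boxes2], axis=0)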

LOSS

Sorry to bother you. I read your comment in ultralytics/yolov3. Did you change his loss function to the original YOLOv3 paper version?

VOC training problem

I trained with ultralytics' yolov3 and could not get it to train at all; the mAP would not even get above 1. I noticed you have run into this problem too. Do you know what causes it?

About training

Hey guys! When I train on my own images, it trains for 20 epochs and only then begins to validate; the total number of epochs I set is 50.

Why do you set epoch >= 20 in train.py?

            if epoch >= 20:
                print('*'*20+"Validate"+'*'*20)
                with torch.no_grad():
                    APs = Evaluator(self.yolov3).APs_voc()
                    for i in APs:
                        print("{} --> mAP : {}".format(i, APs[i]))
                        mAP += APs[i]
                    mAP = mAP / self.train_dataset.num_classes
                    print('mAP:%g'%(mAP))

            self.__save_model_weights(epoch, mAP)
            print('best mAP : %g' % (self.best_mAP))

Detecting too many extra bounding boxes with the test threshold of 0.01

Dear author,
When evaluating mAP on the VOC dataset, the performance is much better than the darknet YOLOv3 version. However, there are many more extra predicted bounding boxes. The test threshold defaults to 0.005 in the darknet version and the number of false positives is about 10000-11000. However, in this PyTorch version, even with a threshold of 0.1, the number of FPs is still high. Could you please tell me the reason and how to solve it?
Thanks

Is it possible to convert weights from pt to darknet?

Hi,
Thank you so much for your work on this repository. I would like to know whether it is possible to convert trained weights from the PyTorch (pt) format to the darknet format, so that they can be used with darknet?

Thanks.

About training

Hi, I followed your training steps, but I don't understand what this line of the command means. Is it a mistake? I hope you can explain when you have time. The command:
CUDA_VISIBLE_DEVICES=0 nohup python3 -u train.py --weight_path $WEIGHT_PATH --gpu_id 0 > nohup.log 2>&1 &
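
For reference, this is a standard shell invocation rather than anything repo-specific: CUDA_VISIBLE_DEVICES=0 restricts the process to GPU 0; nohup together with the trailing & runs training in the background and keeps it alive after the terminal closes; -u makes Python's output unbuffered so nohup.log updates live; and > nohup.log 2>&1 redirects both stdout and stderr into nohup.log.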

GPU training

Hi, thank you very much for open-sourcing a pure PyTorch YOLOv3. How can I train with multiple GPUs? Thanks.

IDE

Hi,
Which IDE are you using?
Best regards,
PeterPham

some questions about the 'datasets' and 'yolo_loss' code

Many thanks for your work. These days I have been working on an implementation of YOLOv3 and found your repository. I have some questions about the code.

  1. random_crop function: in utils/data_augment.py, in the lines 'crop_xmax = max(w_img, int(max_bbox[2] + random.uniform(0, max_r_trans)))' and 'crop_ymax = max(h_img, int(max_bbox[3] + random.uniform(0, max_d_trans)))', I think the 'max' function should be changed to 'min', because 'h_img' and 'w_img' are always the maximum values.
  2. mixup function: in utils/datasets.py, line 'item_mix = random.randint(0, len(self.__annotations)-1)'. When I tested it myself, there is a case where item_mix == item. I think it would be better to add an if condition for this case before 'bboxes = np.concatenate([bboxes_org, bboxes_mix])'.
  3. label_smooth: I read the paper 'Bag of Freebies', and in section 3.2, equation (3), the formula differs from your implementation. The one in the paper is 1 - delta if i = y else delta / (K - 1), while your code is 1 - delta + delta / K if i = y else delta / K. Is that correct?
  4. loss function: I don't understand 'bbox_loss_scale = 2.0 - 1.0 * label_xywh[..., 2:3] * label_xywh[..., 3:4] / (img_size ** 2)' in the giou loss. Is it from the GIOU paper? Why is it 2 minus the label box area over the image area? As for 'label_noobj_mask * FOCAL(input=p_conf, target=label_obj_mask)' in the confidence loss, I find that some other implementations use a scale factor to balance the negative samples; see eriklindernoren/PyTorch-YOLOv3#309. How should that scale value be determined?
    Could you please explain these? Many thanks!
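
For point 3, the two label-smoothing variants side by side as a sketch (delta is the smoothing factor, K the number of classes; names are illustrative):

import numpy as np

def smooth_paper(one_hot, delta):
    # Bag of Freebies, eq. (3): 1 - delta on the target class, delta/(K-1) elsewhere.
    K = one_hot.shape[-1]
    return one_hot * (1 - delta) + (1 - one_hot) * (delta / (K - 1))

def smooth_repo(one_hot, delta):
    # The variant described for this repo: a uniform delta/K added everywhere,
    # giving 1 - delta + delta/K on the target class and delta/K elsewhere.
    K = one_hot.shape[-1]
    return one_hot * (1 - delta) + delta / K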

About the architecture

Hi, I'd like to ask: when reproducing YOLOv3 in PyTorch, did you improve the mAP by changing the network architecture? Could you describe the main changes?

Why does training error out shortly after switching to my own dataset?

Hello, after switching to my own dataset, training errors out shortly after it starts. I looked into it and it appears to be a dataset problem, but I have rebuilt the dataset several times and made sure it is correct, yet the problem persists. I hope you can help; many thanks!

index 56 is out of bounds for axis 1 with size 56?

When I ran this code to train on VOC, I hit the following bug:
"index 56 is out of bounds for axis 1 with size 56"

It is caused by the following lines in "utils/dataset.py":
label[i][yind, xind, iou_mask, 0:4] = bbox_xywh
label[i][yind, xind, iou_mask, 4:5] = 1.0
label[i][yind, xind, iou_mask, 5:6] = bbox_mix
label[i][yind, xind, iou_mask, 6:] = one_hot_smooth
How can I solve it? Thanks for your help.
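
The message suggests xind or yind equals the grid size, i.e. a box center falls exactly on the image border. A common guard, shown here as a fragment meant to slot into the quoted lines (an assumed fix, not one confirmed by the author):

import numpy as np

grid_size = label[i].shape[1]                # cells per side at this scale; 56 in the report
xind = int(np.clip(xind, 0, grid_size - 1))  # clamp border-centered boxes into the grid
yind = int(np.clip(yind, 0, grid_size - 1))
label[i][yind, xind, iou_mask, 0:4] = bbox_xywh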

The loss won't go down

Hi, on my own dataset the loss drops to about 0.02 with v5, but with this project it only gets down to about 0.5 and the accuracy won't improve. Have you run into this problem?

About training

I'm sorry to bother you again, but I really don't know where this problem comes from. The error is: RuntimeError: shape '[512, 256, 3, 3]' is invalid for input of size 408910, when I try to run the train.py command shown in the README.md. Could you tell me what the problem is? I just followed your guide and ran train.py.

Anchors setting when training on the custom dataset

Hey, I find your code clean and elegant. When training on my own dataset, I have two questions about the anchors:

  1. What is the relationship between these float anchors and the integer anchors in the comments?
    Is 1.25 obtained by scaling 12 down by 8? Is 1.875 obtained by scaling 36 down by 16?
MODEL = {"ANCHORS":[[(1.25, 1.625), (2.0, 3.75), (4.125, 2.875)], # Anchors for small obj(12,16),(19,36),(40,28)
[(1.875, 3.8125), (3.875, 2.8125), (3.6875, 7.4375)], # Anchors for medium obj(36,75),(76,55),(72,146)
[(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]], # Anchors for big obj(142,110),(192,243),(459,401)
"STRIDES":[8, 16, 32],
"ANCHORS_PER_SCLAE":3
}
  2. Is it necessary to use the k-means algorithm to generate new anchors when training on a custom dataset?

Looking forward to your reply.
