abeardear / pytorch-yolo-v1


592.0 6.0 244.0 70.2 MB

an experiment for yolo-v1, including training and testing.

License: MIT License

Python 100.00%

yolov1 pytorch object-detection experiment

pytorch-yolo-v1's Issues

IndexError: invalid index of a 0-dim tensor

I hit this error when running predict.py. It is raised at:

        i = order[0]
        keep.append(i)

After checking the code, I found that the fix is to change how squeeze() is called:

        # before
        ids = (ovr<=threshold).nonzero().squeeze()
        # after
        ids = (ovr<=threshold).nonzero().squeeze(-1)

This change solves the problem: when the result of nonzero() has shape (1, 1), squeeze(-1) returns shape (1,) instead of a 0-dim tensor.
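The shape behaviour described in this issue can be checked in isolation. The snippet below is a minimal sketch, not the repository's nms(): the IoU values and the threshold are made up, and only the two squeeze variants are taken from the issue.

```python
import torch

ovr = torch.tensor([0.9, 0.2, 0.8])   # made-up IoU values against the current best box
threshold = 0.5

# For a 1-D input, nonzero() returns one row per match: shape (n, 1).
hits = torch.nonzero(ovr <= threshold, as_tuple=False)

ids_before = hits.squeeze()     # drops every size-1 dim, so one match gives a 0-dim tensor
ids_after = hits.squeeze(-1)    # drops only the last dim, so one match gives shape (1,)

print(hits.shape, ids_before.shape, ids_after.shape)
```

With exactly one surviving box, indexing with the 0-dim `ids_before` yields another 0-dim tensor, and the next `order[0]` fails; `ids_after` keeps the result one-dimensional.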

The network predicts the absolute value of xy instead of the offset relative to the grid cell as specified in the paper. Why is that?

key errors: unexpected keys in state_dict

https://github.com/xiongzihua/pytorch-YOLO-v1/blob/0e5776a15e63f6d811c61a1b08f382bc41cff8c0/predict.py#L146

Hi, zihua:

After training, I ran predict.py and got an error saying that the state_dict keys do not match. Could you help me solve this bug? Thanks!

The error message:

RuntimeError: Error(s) in loading state_dict for VGG:
Missing key(s) in state_dict: "features.0.weight", "features.0.bias"...
Unexpected key(s) in state_dict: "module.features.0.weight", "module.features.0.bias"...
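Keys prefixed with "module." usually mean the weights were saved from a model wrapped in nn.DataParallel and are being loaded into an unwrapped model. One common workaround is to strip the prefix before loading. The sketch below is an assumption about this case rather than the author's fix; `strip_module_prefix` is a hypothetical helper, and plain strings stand in for the tensors that `torch.load` would return.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for the loaded checkpoint; real values would be tensors.
saved = OrderedDict([
    ("module.features.0.weight", "W0"),
    ("module.features.0.bias", "b0"),
])
cleaned = strip_module_prefix(saved)
print(list(cleaned))
```

In predict.py this would be used as `model.load_state_dict(strip_module_prefix(torch.load(path)))`, where `path` is the checkpoint file.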

The loss function coefficients seem to be wrong

(self.l_coord * loc_loss + 2 * contain_loss + not_contain_loss + self.l_noobj * nooobj_loss + class_loss) / N

Why is the coefficient in front of contain_loss 2?
Shouldn't not_contain_loss be weighted by self.l_noobj?
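For comparison, the loss as written in the YOLO v1 paper is reproduced below, with \lambda_{coord} = 5 and \lambda_{noobj} = 0.5. In the paper the confidence term for boxes responsible for an object has coefficient 1. How the repository's contain_loss and not_contain_loss map onto these terms is not stated in this issue, so the comparison is only a reference point.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```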

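Suppose there are two objects whose bbox centers fall into the same cell. Why are both bbox slots given the same label and confidence score? Shouldn't one object's bbox be assigned to x1, y1, w1, h1 and the other object's bbox to x2, y2, w2, h2?
Also, when a cell contains the center of only one bbox, why are c1 and c2 both set to 1, with x1=x2, y1=y2, w1=w2, h1=h2?

Thank you very much!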

choose the best iou box

box1_xyxy[:,:2] = box1[:,:2]/14. -0.5*box1[:,2:4]

What does this code mean? Why divide by 14 and multiply by 0.5?
Thank you.
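One possible reading of that line, assuming the xy values are offsets inside a grid cell (in cell units) while w and h are normalised to the whole image: dividing by the grid size puts the centre in the same unit as the size, and 0.5 * w is half the width, which turns a centre into a corner. The function name and sample numbers below are hypothetical.

```python
GRID_NUM = 14  # assumption: grid size implied by the "/ 14." in the quoted line


def to_corners(dx, dy, w, h, grid_num=GRID_NUM):
    """Convert (cell-relative centre offset, image-normalised size) to x1, y1, x2, y2."""
    cx = dx / grid_num          # centre offset expressed in image units
    cy = dy / grid_num
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


x1, y1, x2, y2 = to_corners(0.7, 0.35, 0.2, 0.1)
print(x1, y1, x2, y2)
```

The cell's own origin is left out here, so the corners are only meaningful relative to another box from the same cell.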

The training loss dropped to 1.6, but predict detects nothing. What is going on?

The output is actually 14×14, but the comments all say 7×7.
eval_voc is even stranger:
---start evaluate---
---class aeroplane ap 0.0---
---class bicycle ap -1---
---map -0.5---

Found a bug in yoloLoss.py.

Hello, I found a bug in yoloLoss.py and have submitted a pull request. I hope you can review it.

Could you provide the pre-trained model file? Thanks!

Could you provide the pre-trained model file? Thanks!
Email: [email protected]

IndexError: list index out of range

    num_faces = int(splited[1])
IndexError: list index out of range

I changed file_root and test_root in train.py,
then ran train.py, and the error occurred.

I also want to ask what the role of dataset.py is.
Thanks for your nice work!

pre-train model

Do you have a pre-trained model (trained by yourself)?

I don't know what is wrong

I use PyTorch 1.0 and ran into some warnings and errors.
I don't know whether they matter. Maybe my attempts to fix them broke the logic.
Here are the warnings and errors:

1. UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
   I replaced F.sigmoid() with torch.sigmoid() in resnet_yolo.py and net.py.
2. UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
   I replaced size_average=False with reduction='sum' in yoloLoss.py.
3. IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
   I replaced loss.data[0] with loss.item() in train.py.
4. UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. images = Variable(images,volatile=True)
   I just changed it to images = images.detach().
   I don't know if that is right.

My results are bad.
Can anyone tell me why?
Thanks.
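On the fourth point, the warning itself names `with torch.no_grad():` as the replacement. Detaching only the input is not equivalent, because the model's weights still require gradients. The sketch below illustrates the three replacements on a toy model; `nn.Linear` is a hypothetical stand-in for the detection network, and none of this is code from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(4, 2)          # hypothetical stand-in for the detection network
model.eval()
images = torch.randn(3, 4)
target = torch.zeros(3, 2)

# detach() on the input does not stop autograd: the weights still require grad.
tracked = model(images.detach())

# Replacement for Variable(images, volatile=True).
with torch.no_grad():
    pred = model(images)

# Replacement for size_average=False and for loss.data[0].
loss = F.mse_loss(pred, target, reduction='sum')
loss_value = loss.item()

print(tracked.requires_grad, pred.requires_grad, loss_value)
```

These changes affect memory use and warnings rather than the numerical results, so poor accuracy probably has a different cause.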

about performance

I trained the network with the original code and VGG16 as the backbone. After 50 epochs (original parameters) mAP reached 44.6%, and after 70 epochs it reached 49.8%. The numbers look reasonable, but I don't understand why they are still far below the performance reported in your README.

(50 epochs; left: training loss, right: validation loss)

(70 epochs; left: training loss, right: validation loss)

help

Can you share the trained weights file, best.pth?

Something questionable about contain_loss, one of the components of yoloLoss

contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response_iou[:,4],size_average=False)
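box_pred_response[:,4] is the predicted confidence of the box with the larger IoU,
box_target_response_iou[:,4] is the IoU value itself.
What does it mean to compute a loss from these two? I hope the author can clarify.
I think this statement would be more appropriate in the following form:
contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response[:,4],size_average=False)
This is only my personal opinion; I would appreciate help from the author and other developers.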

Is there a problem with this operation?

box1_xyxy[:, :2] = box1[:, :2] / 14. - 0.5 * box1[:, 2:4]
box1_xyxy[:, 2:4] = box1[:, :2] / 14. + 0.5 * box1[:, 2:4]
box2 = box_target[i].view(-1, 5)
box2_xyxy = Variable(torch.FloatTensor(box2.size()))
box2_xyxy[:, :2] = box2[:, :2] / 14. - 0.5 * box2[:, 2:4]
box2_xyxy[:, 2:4] = box2[:, :2] / 14. + 0.5 * box2[:, 2:4]

The predicted xywh here should all be in [0, 1], so dividing by 14 seems meaningless, doesn't it?
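Whether the division matters depends on the units. If xy is an offset inside a cell (range [0, 1] in cell units) and wh is normalised to the whole image, the two are on different scales until xy is divided by the grid size. The sketch below assumes exactly that encoding, with made-up boxes and hypothetical helper names. It shows that the IoU changes when the division is dropped, and that leaving out the cell origin is harmless because both boxes are shifted by the same amount.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def corners(box, scale, cell=(0, 0)):
    """box = (dx, dy, w, h); scale is the divisor applied to the centre."""
    dx, dy, w, h = box
    cx, cy = (cell[0] + dx) / scale, (cell[1] + dy) / scale
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


pred = (0.5, 0.5, 0.2, 0.2)     # made-up prediction in one cell
target = (0.9, 0.5, 0.2, 0.2)   # made-up target in the same cell

iou_scaled = iou(corners(pred, 14), corners(target, 14))
iou_unscaled = iou(corners(pred, 1), corners(target, 1))
iou_shifted = iou(corners(pred, 14, cell=(5, 9)), corners(target, 14, cell=(5, 9)))

print(iou_scaled, iou_unscaled, iou_shifted)
```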

prediction problem

When I tried to run prediction, this error occurred:

load model...
predicting...
predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "predict.py", line 174, in
result = predict_gpu(model,image_name)
File "predict.py", line 148, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "predict.py", line 89, in decoder
cls_indexs = torch.cat(cls_indexs,0) #(n,)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

Please advise.
Thank you.

target grid num is 14

I found that the target grid number is 14, which caused an error during training, so I changed it to 7.
What's more, the loss stops decreasing once it reaches 4.xx.
I use it for person detection, and mAP is only 0.08.

Training dataset is not exactly the same as in the original YOLO v1

Hi, I found that your training set is VOC2007 train/val + VOC2012 train/val, which is usually called VOC+. In VOC+, however, the 2012 train/val part has about 11k images, while yours has 17k. Your total training set is 22k (2007: 5k + 2012: 17k), whereas the original YOLO v1 uses VOC+ with a total of about 17k (2007: 5.xk + 2012: 11.xk). Why the difference?

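Why was sigmoid added to the code?

In net.py and resnet.py, why is x = torch.sigmoid(x) added at the end of the forward function? The returned x is then compared with the label values in the loss function and backpropagated. Applying sigmoid amounts to normalization, which clearly does not match the label values, does it?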

What is the directory layout of the dataset?

I would like to ask about the directory layout of the dataset, and what this code means:

    if isinstance(list_file, list):
        # Cat multiple list files together.
        # This is especially useful for voc07/voc12 combination.
        tmp_file = '/tmp/listfile.txt'
        os.system('cat %s > %s' % (' '.join(list_file), tmp_file))
        list_file = tmp_file
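The quoted snippet concatenates several annotation list files into one by shelling out to `cat` and writing to `/tmp`, which only works on Unix-like systems. A portable equivalent can be sketched with the standard library. `merge_list_files` is a hypothetical helper, and the sample annotation lines (image name followed by box coordinates and a class index) are an assumed format, not taken from the repository.

```python
import os
import tempfile


def merge_list_files(list_files):
    """Concatenate several annotation list files into one temporary file."""
    fd, tmp_path = tempfile.mkstemp(suffix=".txt", prefix="listfile_")
    with os.fdopen(fd, "w") as out:
        for path in list_files:
            with open(path) as src:
                for line in src:
                    if line.strip():                      # skip blank lines
                        out.write(line.rstrip("\n") + "\n")
    return tmp_path


# Demo with two throwaway list files.
with tempfile.TemporaryDirectory() as demo_dir:
    first = os.path.join(demo_dir, "voc2007.txt")
    second = os.path.join(demo_dir, "voc2012.txt")
    with open(first, "w") as f:
        f.write("000001.jpg 48 240 195 371 11\n")
    with open(second, "w") as f:
        f.write("2008_000008.jpg 53 87 471 420 12")       # no trailing newline
    merged_path = merge_list_files([first, second])
    with open(merged_path) as f:
        merged_lines = f.read().splitlines()
    os.remove(merged_path)

print(merged_lines)
```

Unlike plain `cat`, this version also copes with a file that lacks a trailing newline, which would otherwise glue two annotation lines together.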

Some images listed in the annotation txt files (voc2007.txt + voc2012.txt) are not in the image folders (2007trainval + 2012trainval)

E.g. 009963.jpg is listed in voc2007.txt as a training image, but it is actually a test image in the voc2007test folder. So the data loader (__getitem__) cannot load the picture, which causes the error 'NoneType' object has no attribute 'shape' while training the network.
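`cv2.imread` returns None instead of raising when a file cannot be read, which is why the failure surfaces later as a NoneType error. One way to catch this before training is to check every listed image up front. The sketch below assumes each annotation line starts with the image file name; `split_by_existence` is a hypothetical helper.

```python
import os
import tempfile


def split_by_existence(list_path, image_root):
    """Split annotation lines into those whose image file exists and those whose does not."""
    kept, missing = [], []
    with open(list_path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            image_path = os.path.join(image_root, parts[0])
            (kept if os.path.isfile(image_path) else missing).append(line.rstrip("\n"))
    return kept, missing


# Demo: one listed image exists, the other does not.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "000001.jpg"), "w").close()
    list_path = os.path.join(root, "voc2007.txt")
    with open(list_path, "w") as f:
        f.write("000001.jpg 48 240 195 371 11\n")
        f.write("009963.jpg 10 20 30 40 6\n")
    kept, missing = split_by_existence(list_path, root)

print(len(kept), "usable,", len(missing), "missing")
```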

best.pth

It would be even better if you could provide best.pth!

After training for 5 epochs, the x2 of a predicted bbox is smaller than its x1

Can't find the listfile.txt

Excuse me, in yolodataset there is a string variable tmp_file set to '/tmp/listfile.txt'.
Could you explain how to use it? Thanks.

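Cannot get any bbox

After training the model, I tried to predict bboxes by running predict.py with the trained model best.pth, but I get no bboxes at all. Looking at the code, I found that for the line mask1 = contain > 0.1 # above the threshold, mask1[:,:,0] is all zeros, as shown below: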

tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=torch.uint8)

Where might my problem be?

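Does the nms in predict.py run over all classes together?

From the code, the boxes of all classes go through NMS together. Why not run it separately for each class?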
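The difference between the two strategies can be sketched in plain Python. This is an illustrative greedy NMS, not the implementation in predict.py; the boxes, scores, and class indices are made up, and `nms_per_class` is a hypothetical name.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(boxes, scores, threshold=0.5):
    """Greedy NMS over all boxes; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


def nms_per_class(boxes, scores, labels, threshold=0.5):
    """Run NMS separately inside each class; returns kept indices in ascending order."""
    keep = []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        kept = nms([boxes[i] for i in idx], [scores[i] for i in idx], threshold)
        keep.extend(idx[k] for k in kept)
    return sorted(keep)


boxes = [(0.10, 0.10, 0.50, 0.50), (0.12, 0.10, 0.52, 0.50), (0.10, 0.12, 0.50, 0.52)]
scores = [0.9, 0.8, 0.7]
labels = [14, 12, 14]   # made-up class indices

joint = nms(boxes, scores)
separate = nms_per_class(boxes, scores, labels)
print(joint, separate)
```

Joint NMS drops the overlapping box of the other class, while per-class NMS keeps it. Which behaviour is preferable depends on how often objects of different classes overlap heavily in the data.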

Question: YOLO v1 confidence and class probability map

Hello,

I am trying to understand YOLO v1 in detail, but I have some questions about the confidence and the class probability. The confidence is equal to: ground truth label * IOU(pred, truth). To find the ground truth label, we just need an array of size 7x7 in which a cell is set to 1 if the center of an object in the dataset falls inside it. But I have some doubts about how the IOU is computed. Do you compute the IOU only when the center of the predicted box and the center of the object (truth) are INSIDE the same cell?
I also have a question regarding P(class|object): if a cell contains no object, or multiple objects, which label do you use during training?

Thank you for the help!

With your resnet50 as the backbone, a (3,448,448) input image does not produce an output of shape (7,7,30)!
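The grid size follows from the input size and the backbone's total downsampling factor. A standard ResNet-50 reduces the resolution by 32 before its average pool. Assuming the detection head added in this repository keeps that resolution (an assumption, not verified here), a 448-pixel input gives a 14×14 grid, and a 7×7 grid would need a total stride of 64. `grid_size` is a hypothetical helper for the arithmetic.

```python
def grid_size(input_size, total_stride):
    """Spatial size of the output feature map for a square input."""
    if input_size % total_stride != 0:
        raise ValueError("input size must be a multiple of the total stride")
    return input_size // total_stride


resnet_grid = grid_size(448, 32)   # stride of a standard ResNet-50 body
paper_grid = grid_size(448, 64)    # stride needed for the paper's 7x7 grid
print(resnet_grid, paper_grid)
```

Any code that hard-codes the grid number, such as the target encoder, the decoder, or the "/ 14." in the loss, has to agree with whichever value the chosen backbone actually produces.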

Running eval_voc: Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead

Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead
I added the line marked below, and then even more problems appeared. Thank you very much!

    def predict_gpu(model,image_name,root_path=''):

        result = []
        image = cv2.imread(root_path+image_name)
        h,w,_ = image.shape
        img = cv2.resize(image,(448,448))
        img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
        mean = (123,117,104)#RGB
        img = img - np.array(mean,dtype=np.float32)

        transform = transforms.Compose([transforms.ToTensor(),])
        img = transform(img)
        img = img.cuda()

        img = torch.unsqueeze(img, dim=0)  # the added line

        pred = model(img) #1x7x7x30
        pred = pred.cpu()
        boxes,cls_indexs,probs =  decoder(pred)

        for i,box in enumerate(boxes):
            x1 = int(box[0]*w)
            x2 = int(box[2]*w)
            y1 = int(box[1]*h)
            y2 = int(box[3]*h)
            cls_index = cls_indexs[i]
            cls_index = int(cls_index) # convert LongTensor to int
            prob = probs[i]
            prob = float(prob)
            result.append([(x1,y1),(x2,y2),VOC_CLASSES[cls_index],image_name,prob])
        return result

eval error

C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master>python eval_voc.py
---prepare target---
---start test---
0%| | 0/4951 [00:00<?, ?it/s]C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
0%| | 4/4951 [00:01<55:21, 1.49it/s]
Traceback (most recent call last):
File "eval_voc.py", line 186, in
result = predict_gpu(model,image_path,root_path='./VOCdevkit/VOC2012/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 148, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 90, in decoder
keep = nms(boxes,probs)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 107, in nms
i = order[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

please help

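Question: about the target encoder

It seems that the box coordinates are encoded as positions relative to the grid cell that contains the box center.

But then how can this target be fed directly into the IoU computation?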
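The encoding described in the question can be sketched as a round trip between an image-normalised centre and a (cell, offset) pair. This is an illustration of the idea under the assumption of a 14×14 grid, not a copy of the repository's encoder; `encode_center` and `decode_center` are hypothetical names.

```python
def encode_center(cx, cy, grid_num=14):
    """Map an image-normalised centre to (row, col, dx, dy) with offsets in cell units."""
    col = min(int(cx * grid_num), grid_num - 1)
    row = min(int(cy * grid_num), grid_num - 1)
    return row, col, cx * grid_num - col, cy * grid_num - row


def decode_center(row, col, dx, dy, grid_num=14):
    """Inverse of encode_center: recover the image-normalised centre."""
    return (col + dx) / grid_num, (row + dy) / grid_num


row, col, dx, dy = encode_center(0.4, 0.75)
cx, cy = decode_center(row, col, dx, dy)
print(row, col, dx, dy, cx, cy)
```

A prediction and its target are compared within the same cell, so both would be shifted by the same (col, row) when decoded. IoU does not change under a common translation, which is presumably why the loss can skip adding the cell origin and work with the offsets directly.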

ValueError: Input must be >= 2-d.

I tried to run train.py without CUDA, but got this error.

About BatchNormalization

Hi, Thank you for your reproducible code about Yolov1.

I was wondering about the structure of your resnet_yolo.py

def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    x = self.layer5(x)
    # x = self.avgpool(x)
    # x = x.view(x.size(0), -1)
    # x = self.fc(x)
    x = self.conv_end(x)
    x = self.bn_end(x)
    x = F.sigmoid(x) # normalize to 0-1
    # x = x.view(-1,7,7,30)
    x = x.permute(0,2,3,1) #(-1,7,7,30)

Why is there a 'self.bn_end(x)' at the end of the network?
Is it for faster convergence, and is it critical for performance?

Why did you use a sigmoid activation in the last layer instead of a linear activation?

Line 61 in net.py

predict.py error

Error when I run eval_voc.py:

Traceback (most recent call last):
File "eval_voc.py", line 164, in
result = predict_gpu(model,image_path,root_path='E:/yolov1/pytorch_yolov1/data/VOCdevkit/VOC2007/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
File "E:\yolov1\yolores\predict.py", line 126, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "E:\yolov1\yolores\predict.py", line 46, in decoder
if mask[i,j,b] == 1:
IndexError: index 7 is out of bounds for dimension 1 with size 7

After I run eval_voc.py, I get the error above. How can I solve it?

I don't know what the problem is

Thank you for your good code.
I ran into this problem:

File "train.py", line 123, in
for i,(images,target) in enumerate(train_loader):
File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in next
batch = self.collate_fn([self.dataset[i] for i in indices])
File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in
batch = self.collate_fn([self.dataset[i] for i in indices])
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 70, in getitem
img, boxes = self.random_flip(img, boxes)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 257, in random_flip
im_lr = np.fliplr(im).copy()
File "C:\Users\vcvis\AppData\Roaming\Python\Python36\site-packages\numpy\lib\twodim_base.py", line 95, in fliplr
raise ValueError("Input must be >= 2-d.")
ValueError: Input must be >= 2-d.

What should I do?

Please guide me.
Thank you.

boxes /= torch.Tensor([w,h,w,h]).expand_as(boxes)
img = self.BGR2RGB(img)
img = self.subMean(img,self.mean) 

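# The image is resized to a fixed size because the network input requires it, but after that,
#  shouldn't the bbox values be adjusted as well?
#  At this point img and bboxes no longer match (I checked self.encoder and found no such adjustment either)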
img = cv2.resize(img,(self.image_size,self.image_size))
target = self.encoder(boxes,labels)
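In the quoted code the boxes are divided by the original width and height before the resize. Coordinates normalised this way are fractions of the image, and a resize leaves such fractions unchanged, so no further adjustment should be needed as long as that division really runs on pixel coordinates. The sketch below checks this with made-up numbers; both helper names are hypothetical.

```python
def normalize_box(box, width, height):
    """Express pixel corners (x1, y1, x2, y2) as fractions of the image size."""
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def resize_box(box, width, height, new_width, new_height):
    """Pixel corners of the same box after the image is resized."""
    sx, sy = new_width / width, new_height / height
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


w, h = 500, 375                 # made-up original image size
box = (48, 240, 195, 371)       # made-up pixel box

before = normalize_box(box, w, h)
after = normalize_box(resize_box(box, w, h, 448, 448), 448, 448)
max_diff = max(abs(a - b) for a, b in zip(before, after))
print(before, after, max_diff)
```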