abeardear / pytorch-yolo-v1



An experiment for YOLO v1, including training and testing.

License: MIT License

Python 100.00%

yolov1 pytorch object-detection experiment

pytorch-yolo-v1's Issues

Question: YOLO v1 confidence and class probability map

Hello,

I am trying to understand YOLO v1 in detail, but I have some questions about the confidence and the class probability. The confidence is equal to: ground-truth label * IOU(pred, truth). To find the ground-truth label, we just need a 7x7 array in which a cell is set to 1 if the center of an object in the dataset falls inside that cell. But I have some doubts about how the IOU is computed. Do you compute the IOU only when the center of the predicted box and the center of the object (truth) are INSIDE the same cell?
I also have a question about P(class|object): if a cell contains no object, or several objects, which label do you return during training?
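A minimal sketch of the confidence target as the question describes it. It assumes a single ground-truth box in normalized (x1, y1, x2, y2) form and a 7x7 grid, and the helper names are hypothetical, not taken from this repository. Only the cell containing the object's center gets a nonzero target, equal to 1 * IOU(pred, truth). Every other cell gets 0.

```python
import torch

def box_iou_xyxy(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IOU between two boxes given as (x1, y1, x2, y2) tensors of shape (4,)."""
    lt = torch.max(a[:2], b[:2])            # top-left corner of the intersection
    rb = torch.min(a[2:], b[2:])            # bottom-right corner of the intersection
    wh = (rb - lt).clamp(min=0)             # zero width/height if the boxes do not overlap
    inter = wh[0] * wh[1]
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def confidence_target(pred_box: torch.Tensor, true_box: torch.Tensor, S: int = 7) -> torch.Tensor:
    """S x S confidence target: IOU in the cell holding the object's center, 0 elsewhere."""
    target = torch.zeros(S, S)
    cx = (true_box[0] + true_box[2]) / 2
    cy = (true_box[1] + true_box[3]) / 2
    # floor(c * S) picks the cell; min() keeps a center at exactly 1.0 inside the grid
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    target[row, col] = box_iou_xyxy(pred_box, true_box)  # Pr(object) = 1 only in this cell
    return target
```

For example, if the predicted and true boxes are both [0.2, 0.2, 0.6, 0.6], the center (0.4, 0.4) lands in cell (2, 2), and that cell's target is 1.0.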

Thank you for your help!

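Question: about the target encoder

It seems the box coordinates are encoded as positions relative to the grid cell that contains the box center.

But then how can this target be fed directly into the IOU computation?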
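For reference, here is a sketch of turning such a cell-relative target back into image-normalized corners. It assumes (dx, dy) is the center's offset inside cell (row, col) in cell units, (w, h) is normalized to the whole image, and the grid has S cells per side (S = 14 matches the output size reported elsewhere on this page). `decode_cell_box` is a hypothetical helper.

```python
import torch

def decode_cell_box(delta_xy: torch.Tensor, wh: torch.Tensor, row: int, col: int, S: int = 14) -> torch.Tensor:
    """Convert an in-cell (dx, dy) plus image-normalized (w, h) into (x1, y1, x2, y2)."""
    cell = 1.0 / S
    # cell's top-left corner plus the in-cell offset gives the absolute center
    cx = (col + delta_xy[0]) * cell
    cy = (row + delta_xy[1]) * cell
    return torch.stack([cx - wh[0] / 2, cy - wh[1] / 2,
                        cx + wh[0] / 2, cy + wh[1] / 2])
```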

IndexError: list index out of range

    num_faces = int(splited[1])
IndexError: list index out of range

I changed file_root and test_root in train.py,
then ran train.py, and the error occurred.

I also want to ask what the role of dataset.py is.
Thanks for your nice work!
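One plausible cause is a blank or truncated line in the list file, for example a trailing newline or a file written for a different annotation format. On such a line, `splited[1]` does not exist. Below is a defensive sketch that skips those lines. `read_list_file` is a hypothetical helper, and the "image name followed by more fields" layout is an assumption based on the `splited[1]` access in the traceback.

```python
def read_list_file(lines):
    """Yield (image_name, remaining_tokens) for usable lines, skipping blank or truncated ones."""
    for lineno, line in enumerate(lines, start=1):
        splited = line.strip().split()
        if len(splited) < 2:
            # a blank line, or an image name with nothing after it, would make
            # splited[1] raise "IndexError: list index out of range"
            print(f"skipping line {lineno}: {line!r}")
            continue
        yield splited[0], splited[1:]
```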

Why did you use a sigmoid activation in the last layer instead of a linear activation?

Line 61 in net.py
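A small illustration of what the final sigmoid changes (this is not the author's stated rationale). A linear head can output any real value. The sigmoid keeps all 30 channels in [0, 1], which is the same range as the encoded targets in this code: in-cell offsets, image-normalized w and h, 0/1 confidences, and one-hot classes. The random tensor below stands in for the last conv output, and the 14x14 grid is an assumption matching the output size reported in other issues.

```python
import torch

torch.manual_seed(0)
raw = torch.randn(2, 30, 14, 14) * 3                  # placeholder for the last conv output
linear_out = raw.permute(0, 2, 3, 1)                  # linear head: unbounded values
sigmoid_out = torch.sigmoid(raw).permute(0, 2, 3, 1)  # sigmoid head: values squashed into [0, 1]
```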

choose the best iou box

box1_xyxy[:,:2] = box1[:,:2]/14. -0.5*box1[:,2:4]

What does that code mean? Why divide by 14 and multiply by 0.5?
Thank you.
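Here is a sketch of what that line computes. It assumes, consistently with grid_num = 14 elsewhere in this code, that box[:, :2] holds the center offset inside its cell in cell units and box[:, 2:4] holds w and h normalized to the image. Dividing the offset by 14 converts cell units to image-normalized units, and 0.5 * w is the half-width, so "center ± half-size" gives the two corners.

The cell's own top-left corner is left out. The check below suggests why that is harmless for the IOU: prediction and target come from the same cell, so adding the same corner to both boxes shifts them equally and leaves their IOU unchanged. The boxes and cell position are hypothetical.

```python
import torch

def to_xyxy(box: torch.Tensor, S: float = 14.0) -> torch.Tensor:
    """(dx, dy, w, h) -> (x1, y1, x2, y2), dropping the cell corner as the quoted line does."""
    xyxy = torch.empty_like(box)
    xyxy[:, :2] = box[:, :2] / S - 0.5 * box[:, 2:4]   # center minus half the size
    xyxy[:, 2:4] = box[:, :2] / S + 0.5 * box[:, 2:4]  # center plus half the size
    return xyxy

def iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Row-wise IOU of two (N, 4) tensors in x1, y1, x2, y2 form."""
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a + area_b - inter)

pred = torch.tensor([[0.3, 0.6, 0.20, 0.25]])    # hypothetical prediction in one cell
truth = torch.tensor([[0.5, 0.5, 0.22, 0.30]])   # hypothetical target in the same cell
# hypothetical cell at column 5, row 3: its corner (5/14, 3/14), applied to both corners
corner = torch.tensor([[5 / 14, 3 / 14, 5 / 14, 3 / 14]])

iou_without_corner = iou(to_xyxy(pred), to_xyxy(truth))
iou_with_corner = iou(to_xyxy(pred) + corner, to_xyxy(truth) + corner)
```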

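Cannot get any bbox

After training the model, I tried to predict bboxes by running predict.py with the trained model best.pth, but I could not get any bbox. After checking the code, I found that mask1[:,:,0] from the line mask1 = contain > 0.1 # above threshold is all zeros, as shown below: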

tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=torch.uint8)

Where is my problem likely to be?
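An all-zero mask1 means that no confidence in that channel exceeded 0.1, so printing `contain.max()` is a quick first check. Below is a sketch of a decoder mask that never comes back empty. It assumes the S x S x 30 layout where channels 4 and 9 hold the two boxes' confidences. It keeps every box above the threshold and, as a fallback, the single highest-confidence box. `build_mask` is a hypothetical name.

```python
import torch

def build_mask(pred: torch.Tensor, threshold: float = 0.1) -> torch.Tensor:
    """pred: (S, S, 30). Returns an (S, S, 2) bool mask over the two box predictors."""
    contain = torch.stack((pred[:, :, 4], pred[:, :, 9]), dim=2)  # (S, S, 2) confidences
    above = contain > threshold                                   # boxes passing the threshold
    best = contain == contain.max()                               # always marks at least one box
    return above | best
```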

eval error

C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master>python eval_voc.py
---prepare target---
---start test---
0%| | 0/4951 [00:00<?, ?it/s]C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
0%| | 4/4951 [00:01<55:21, 1.49it/s]
Traceback (most recent call last):
  File "eval_voc.py", line 186, in <module>
    result = predict_gpu(model,image_path,root_path='./VOCdevkit/VOC2012/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
  File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 148, in predict_gpu
    boxes,cls_indexs,probs = decoder(pred)
  File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 90, in decoder
    keep = nms(boxes,probs)
  File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 107, in nms
    i = order[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
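The failing line indexes `order` after it has collapsed to a 0-dim tensor, which happens when a single candidate box remains. Below is a minimal pure-PyTorch NMS sketch that avoids this. It keeps `order` one-dimensional throughout and converts the index with `.item()`, as the error message suggests. It is an illustration, not the repository's `nms` function.

```python
import torch

def nms(boxes: torch.Tensor, scores: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """boxes: (N, 4) as x1, y1, x2, y2; returns indices of kept boxes, highest score first."""
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()                  # plain Python int, never a 0-dim tensor
        keep.append(i)
        if order.numel() == 1:
            break
        rest = order[1:]
        xx1 = boxes[rest, 0].clamp(min=boxes[i, 0].item())
        yy1 = boxes[rest, 1].clamp(min=boxes[i, 1].item())
        xx2 = boxes[rest, 2].clamp(max=boxes[i, 2].item())
        yy2 = boxes[rest, 3].clamp(max=boxes[i, 3].item())
        inter = (xx2 - xx1).clamp(min=0) * (yy2 - yy1).clamp(min=0)
        ovr = inter / (areas[i] + areas[rest] - inter)
        order = rest[ovr <= threshold]       # boolean indexing keeps order 1-D
    return torch.tensor(keep, dtype=torch.long)
```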

Please help.

With resnet50 as your backbone, an input image of size (3,448,448) does not produce an output of shape (7,7,30)!
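Here is the arithmetic behind a 14x14 output. It assumes the standard torchvision ResNet-50 downsampling (stride 2 at conv1, the max pool, layer2, layer3 and layer4). It also assumes that the extra layer5 stage keeps stride 1, which is consistent with the 14x14 masks and outputs reported in other issues on this page. Under those assumptions, the head produces (14, 14, 30), matching grid_num = 14 in the code rather than the paper's 7x7.

```python
import math

# Downsampling factor of each stage in a standard torchvision ResNet-50.
strides = {"conv1": 2, "maxpool": 2, "layer1": 1, "layer2": 2, "layer3": 2, "layer4": 2}
# Assumption: the extra layer5 stage in resnet_yolo.py keeps the resolution (stride 1).
strides["layer5"] = 1

total_stride = math.prod(strides.values())
grid = 448 // total_stride
print(total_stride, grid)  # 32 14
```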

Can't find listfile.txt

Excuse me, in yolodataset there is a string variable tmp_file set to '/tmp/listfile.txt'.
Could you tell me how to use it? Thanks.

Training dataset not exactly the same as the original YOLO v1

Hi, I found that your training set is VOC2007 train/val + VOC2012 train/val, which is usually called VOC+. But in VOC+, the VOC2012 train/val set has about 11k images, while yours has 17k, so your total training set is 22k (2007: 5k + 2012: 17k). Didn't the original YOLO v1 train on VOC+ with about 17k images in total (2007: 5.xk + 2012: 11.xk)?

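Question about the performance of the vgg16_bn version

Background:
pretrained vgg16_bn

After 110 epochs my mAP is only 52% and hard to improve further, while with a pretrained resnet50 I get 67% after 50 epochs. I don't know where the problem is.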

About BatchNormalization

Hi, thank you for your reproducible YOLO v1 code.

I was wondering about the structure of your resnet_yolo.py:

def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    x = self.layer5(x)
    # x = self.avgpool(x)
    # x = x.view(x.size(0), -1)
    # x = self.fc(x)
    x = self.conv_end(x)
    x = self.bn_end(x)
    x = F.sigmoid(x) # normalize to 0-1
    # x = x.view(-1,7,7,30)
    x = x.permute(0,2,3,1) #(-1,7,7,30)

Why is there a 'self.bn_end(x)' at the end of the network?
Is it there for faster convergence, and is it critical for performance?
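Below is a minimal sketch of the head that the quoted forward() ends with, showing where bn_end sits. The 256 input channels and the 3x3 conv_end are assumptions for illustration, not values read from resnet_yolo.py. In training mode, BatchNorm2d normalizes each of the 30 output channels over the batch and then applies a learnable scale and shift. That means it controls the range of the values that reach the sigmoid.

```python
import torch
import torch.nn as nn

class YoloHead(nn.Module):
    """Hypothetical head mirroring the quoted forward(): conv -> BN -> sigmoid -> NHWC."""

    def __init__(self, in_channels: int = 256, out_channels: int = 30):
        super().__init__()
        self.conv_end = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False)
        self.bn_end = nn.BatchNorm2d(out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv_end(x)
        x = self.bn_end(x)            # per-channel normalization before squashing
        x = torch.sigmoid(x)          # torch.sigmoid rather than the deprecated F.sigmoid
        return x.permute(0, 2, 3, 1)  # (N, 30, S, S) -> (N, S, S, 30)

head = YoloHead()
out = head(torch.randn(2, 256, 14, 14))
```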

about dataset.py line 94

Hello,
Below is the source code. I added comments, and my question is in them.

boxes /= torch.Tensor([w,h,w,h]).expand_as(boxes)
img = self.BGR2RGB(img)
img = self.subMean(img,self.mean) 

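# Resized to a fixed size because of the network's input requirements, but after this
#  shouldn't the bbox values be adjusted as well?
#  At this point img and bboxes no longer match (I specifically checked self.encoder; it does nothing similar either)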
img = cv2.resize(img,(self.image_size,self.image_size))
target = self.encoder(boxes,labels)
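A small numeric check of why no adjustment may be needed at that point, assuming `boxes` start as pixel corners. After `boxes /= [w, h, w, h]`, they are fractions of the image size. Resizing scales the image uniformly along each axis, so the same fractions still describe the same region. The sizes and the box below are hypothetical.

```python
import numpy as np

w, h, new_size = 500, 375, 448                      # hypothetical original size and network input
boxes = np.array([[50.0, 75.0, 250.0, 300.0]])      # x1, y1, x2, y2 in original pixels
norm = boxes / np.array([w, h, w, h])               # what dataset.py does before resizing

# Where the box lands in the resized image, computed directly from pixels:
resized_px = boxes * np.array([new_size / w, new_size / h, new_size / w, new_size / h])
# ...and recovered from the normalized coordinates alone:
from_norm = norm * new_size
```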

The network predicts absolute xy values instead of the offsets relative to the grid cell that the paper specifies. Why is that?

Some images in the annotation txt files (voc2007.txt + voc2012.txt) are not in the image folders (2007 trainval + 2012 trainval)

E.g. 009963.jpg is listed in voc2007.txt as a training image, but it is actually in the voc2007test folder as a test image. So the dataloader (__getitem__) cannot load the image, which causes the error 'NoneType' object has no attribute 'shape' while training the network.
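`cv2.imread` returns `None` for a missing file instead of raising an error, which is why the failure only shows up later as `'NoneType' object has no attribute 'shape'`. Below is a sketch of a pre-check that drops list-file entries whose image is missing. It assumes each line starts with the image file name and that all images live in one directory. `image_root` and `filter_existing` are hypothetical names.

```python
import os

def filter_existing(lines, image_root):
    """Split annotation lines into (kept, missing) depending on whether the image exists."""
    kept, missing = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue                                # skip blank lines
        path = os.path.join(image_root, parts[0])
        (kept if os.path.isfile(path) else missing).append(line)
    return kept, missing
```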

Is there a problem with this operation?

box1_xyxy[:, :2] = box1[:, :2] / 14. - 0.5 * box1[:, 2:4]
box1_xyxy[:, 2:4] = box1[:, :2] / 14. + 0.5 * box1[:, 2:4]
box2 = box_target[i].view(-1, 5)
box2_xyxy = Variable(torch.FloatTensor(box2.size()))
box2_xyxy[:, :2] = box2[:, :2] / 14. - 0.5 * box2[:, 2:4]
box2_xyxy[:, 2:4] = box2[:, :2] / 14. + 0.5 * box2[:, 2:4]

The predicted xywh values here should all be in [0, 1], so dividing by 14 here seems pointless, doesn't it?

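Could you provide the pretrained model file? Thanks!

Could you provide the pretrained model file? Thanks!
Email: [email protected]

Why is sigmoid added in the code?

In net.py and resnet.py, why is x = torch.sigmoid(x) added at the end of the forward function? The returned x is then compared with the label values to compute the loss and backpropagate. Applying sigmoid amounts to normalization, which clearly does not match the label values, does it?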

Pretrained model

Do you have a pretrained model (trained by yourself)?

Why are the confidences of the results so low, like 0.2 or 0.3?

Help

Can you give me the trained file best.pth?

ValueError: Input must be >= 2-d.

I tried to run train.py without CUDA, but got this error.

The loss function coefficients seem to be wrong

(self.l_coord * loc_loss + 2 * contain_loss + not_contain_loss + self.l_noobj * nooobj_loss + class_loss) / N

Why is the coefficient of contain_loss 2?
not_contain_loss should have self.l_noobj as its coefficient.
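For comparison, here is the loss as written in the YOLO v1 paper, with λ_coord = 5 and λ_noobj = 0.5. Here 1_ij^obj marks the j-th box predictor of cell i as responsible for an object. The confidence term for responsible boxes has coefficient 1, and the confidence term for boxes without an object has coefficient λ_noobj.

```latex
\mathcal{L} =
  \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]
+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right]
+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2
+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
```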

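Loss curve images

Could you share how your loss changed during training? I'd like to compare it with mine. I fine-tuned with ResNet; why does my loss change so little?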

key errors: unexpected keys in state_dict

https://github.com/xiongzihua/pytorch-YOLO-v1/blob/0e5776a15e63f6d811c61a1b08f382bc41cff8c0/predict.py#L146

Hi, zihua:

After training, I ran predict.py and got an error saying that the keys don't match. Could you help me fix this bug? Thanks!

The error information:

RuntimeError: Error(s) in loading state_dict for VGG:
Missing key(s) in state_dict: "features.0.weight", "features.0.bias"...
Unexpected key(s) in state_dict: "module.features.0.weight", "module.features.0.bias"...
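The `module.` prefix is what `torch.nn.DataParallel` adds to every parameter name when the model is saved from the wrapper. Below is a sketch that strips the prefix before loading into an unwrapped model. `strip_module_prefix` is a hypothetical helper, and a tiny `nn.Linear` stands in for VGG. With a real checkpoint this would be `model.load_state_dict(strip_module_prefix(torch.load('best.pth')))`. Another option is to wrap the model in `nn.DataParallel` before calling `load_state_dict`.

```python
from collections import OrderedDict

import torch.nn as nn

def strip_module_prefix(state_dict):
    """Remove the 'module.' prefix that nn.DataParallel adds to parameter names."""
    prefix = "module."
    return OrderedDict(
        (k[len(prefix):] if k.startswith(prefix) else k, v) for k, v in state_dict.items()
    )

model = nn.Linear(4, 2)                              # stand-in for the VGG model
saved = nn.DataParallel(model).state_dict()          # keys: 'module.weight', 'module.bias'
model.load_state_dict(strip_module_prefix(saved))    # loads without key errors
```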

What does the dataset directory structure look like?

I'd like to ask about the dataset directory structure, and what this code means:
if isinstance(list_file, list):
    # Cat multiple list files together.
    # This is especially useful for voc07/voc12 combination.
    tmp_file = '/tmp/listfile.txt'
    os.system('cat %s > %s' % (' '.join(list_file), tmp_file))
    list_file = tmp_file
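The snippet merges several list files (for example voc2007.txt and voc2012.txt) into one temporary file, so the rest of the class can read a single file. `os.system('cat ...')` only works where a `cat` command and a `/tmp` directory exist, which may explain a missing /tmp/listfile.txt on Windows. Below is a portable sketch of the same step in pure Python. `concat_list_files` is a hypothetical name, and like `cat` it joins files byte for byte, so each input should end with a newline.

```python
import os
import shutil
import tempfile

def concat_list_files(list_files):
    """Concatenate several annotation list files into one temp file and return its path."""
    fd, tmp_path = tempfile.mkstemp(suffix=".txt", prefix="listfile_")
    with os.fdopen(fd, "wb") as out:
        for name in list_files:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)   # byte-for-byte, like cat
    return tmp_path
```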

Running eval_voc: Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead

Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead
After adding the line below, even more problems appeared. Many thanks!!!
import cv2
import numpy as np
import torch
import torchvision.transforms as transforms
# VOC_CLASSES and decoder() are defined elsewhere in predict.py

def predict_gpu(model,image_name,root_path=''):
    result = []
    image = cv2.imread(root_path+image_name)
    h,w,_ = image.shape
    img = cv2.resize(image,(448,448))
    img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    mean = (123,117,104)#RGB
    img = img - np.array(mean,dtype=np.float32)

    transform = transforms.Compose([transforms.ToTensor(),])
    img = transform(img)
    img = img.cuda()

    img = torch.unsqueeze(img, dim=0)  # the added line: (3, 448, 448) -> (1, 3, 448, 448)

    pred = model(img) #1x7x7x30
    pred = pred.cpu()
    boxes,cls_indexs,probs = decoder(pred)

    for i,box in enumerate(boxes):
        x1 = int(box[0]*w)
        x2 = int(box[2]*w)
        y1 = int(box[1]*h)
        y2 = int(box[3]*h)
        cls_index = cls_indexs[i]
        cls_index = int(cls_index) # convert LongTensor to int
        prob = probs[i]
        prob = float(prob)
        result.append([(x1,y1),(x2,y2),VOC_CLASSES[cls_index],image_name,prob])
    return result

Found a bug in yoloLoss.py.

Hi, I found a bug in yoloLoss.py and have submitted a pull request. I hope you can review it.

After training for 5 epochs, the predicted bbox's x2 is smaller than x1

Target grid num is 14

I found that the target grid num is 14 and got an error while training, so I changed it to 7.
What's more, the loss never decreases once it reaches 4.xx.
I use it for person detection, and the mAP is only 0.08.

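A suspicious part of contain_loss in yoloLoss

contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response_iou[:,4],size_average=False)
box_pred_response[:,4] is the predicted confidence score of the box with the larger IOU,
box_target_response_iou[:,4] is the IOU value.
What is the point of computing the loss from these two? I hope the author can clear this up.
I think this line would be more appropriate in the following form:
contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response[:,4],size_average=False)
This is just my personal opinion; I still hope to get help from the author and other developers.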

I don't know what the problem is

Thank you for your good code.
I ran into this problem:

  File "train.py", line 123, in <module>
    for i,(images,target) in enumerate(train_loader):
  File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 70, in __getitem__
    img, boxes = self.random_flip(img, boxes)
  File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 257, in random_flip
    im_lr = np.fliplr(im).copy()
  File "C:\Users\vcvis\AppData\Roaming\Python\Python36\site-packages\numpy\lib\twodim_base.py", line 95, in fliplr
    raise ValueError("Input must be >= 2-d.")
ValueError: Input must be >= 2-d.

What should I do?

Please guide me.
Thank you.
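`np.fliplr` raises this error for anything with fewer than two dimensions, including `None`, which becomes a 0-d array. A common cause in this pipeline is `cv2.imread` returning `None` because the image path in the list file does not exist, as in the missing-image issue above. Below is a sketch of a flip that fails early with a clearer message and mirrors the boxes. It assumes an HxWxC NumPy image and (x1, y1, x2, y2) pixel boxes. The `rng` parameter is a hypothetical addition for testability.

```python
import numpy as np

def random_flip(im, boxes, rng=np.random):
    """Horizontally flip an HxWxC image and its (x1, y1, x2, y2) pixel boxes with p = 0.5."""
    if im is None or np.ndim(im) < 2:
        raise ValueError("image failed to load; check the path in the list file")
    if rng.random() < 0.5:
        w = im.shape[1]
        im = np.fliplr(im).copy()
        boxes = boxes.copy()
        # mirror x coordinates: new x1 = w - old x2, new x2 = w - old x1
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]
    return im, boxes
```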

ImportError: cannot import name 'queue' from 'torch._six' (/home/liqi/.local/lib/python3.8/site-packages/torch/_six.py)
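`torch._six` was an internal Python 2/3 compatibility module, and newer PyTorch releases no longer provide `queue` there. Python 3 ships `queue` in its standard library, so the import can be replaced directly. Below is a guarded version for code that must also run on old PyTorch versions.

```python
try:
    from torch._six import queue  # only available in old PyTorch versions
except ImportError:
    import queue                  # standard library module in Python 3
```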

About grid_num=14

I don't understand why grid_num is 14 in the code, and why the value is divided by 14 in yoloLoss.py line 88.

IndexError: invalid index of a 0-dim tensor

I ran into this problem when using predict.py. The error is located at:

        i = order[0]
        keep.append(i)

After checking the code, I found that the key is to change the squeeze() call:

        # before
        ids = (ovr<=threshold).nonzero().squeeze()
        # after
        ids = (ovr<=threshold).nonzero().squeeze(-1)

This change solves the problem: when ids has shape (1, 1), squeeze() returns a 0-dim tensor, while squeeze(-1) returns shape (1,).
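Here is the shape difference behind this fix, shown in isolation with a hypothetical one-element overlap tensor. `nonzero()` on a 1-D tensor returns shape (n, 1), so with one match it is (1, 1). A bare `squeeze()` then removes both dimensions.

```python
import torch

ovr = torch.tensor([0.3])                 # IoU of the single remaining candidate
threshold = 0.5
nz = (ovr <= threshold).nonzero()         # shape (1, 1)
print(nz.squeeze().shape)                 # torch.Size([]): 0-dim, so order[0] fails later
print(nz.squeeze(-1).shape)               # torch.Size([1]): still indexable
```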

Prediction problem

When I tried to predict, this error occurred:

load model...
predicting...
predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "predict.py", line 174, in <module>
    result = predict_gpu(model,image_name)
  File "predict.py", line 148, in predict_gpu
    boxes,cls_indexs,probs = decoder(pred)
  File "predict.py", line 89, in decoder
    cls_indexs = torch.cat(cls_indexs,0) #(n,)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

Please guide me.
Thank you.
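`torch.cat` only joins tensors that have at least one dimension. A list of 0-dim tensors, such as one class index per kept box, has to be stacked instead. Below is a minimal illustration with hypothetical values.

```python
import torch

cls_indexs = [torch.tensor(3), torch.tensor(14)]   # 0-dim tensors, one per kept box
# torch.cat(cls_indexs, 0) raises "zero-dimensional tensor ... cannot be concatenated"
stacked = torch.stack(cls_indexs, 0)                # shape (2,)
# equivalent alternative: give each element one dimension first, then concatenate
catted = torch.cat([c.view(1) for c in cls_indexs], 0)
```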

The training loss dropped to 1.6, but predict detects nothing. What's going on?

The output is actually 14×14, but all the comments say 7×7.
eval_voc is even stranger:
---start evaluate---
---class aeroplane ap 0.0---
---class bicycle ap -1---
---map -0.5---
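In that output, the -1 for bicycle looks like a placeholder for a class that could not be scored. Averaging it with the real AP of 0.0 gives exactly the negative mAP shown ((0.0 + (-1)) / 2 = -0.5). Below is a sketch of an average that ignores such placeholders. `mean_ap` is a hypothetical helper. It only keeps the summary number meaningful: an AP of 0.0 still means no correct detections for that class.

```python
def mean_ap(aps):
    """Average per-class APs, ignoring negative placeholder values such as -1."""
    valid = [ap for ap in aps.values() if ap >= 0]
    return sum(valid) / len(valid) if valid else float("nan")
```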

Possible bug in the encode method of the dataset module

Hi,
I have a question about your encode code:

        for i in range(cxcy.size()[0]):
            cxcy_sample = cxcy[i]
            ij = (cxcy_sample/cell_size).ceil()-1 #
            target[int(ij[1]),int(ij[0]),4] = 1
            target[int(ij[1]),int(ij[0]),9] = 1
            target[int(ij[1]),int(ij[0]),int(labels[i])+9] = 1
            xy = ij*cell_size # top-left corner of the matched grid cell (relative coordinates)
            delta_xy = (cxcy_sample -xy)/cell_size
            target[int(ij[1]),int(ij[0]),2:4] = wh[i]
            target[int(ij[1]),int(ij[0]),:2] = delta_xy
            target[int(ij[1]),int(ij[0]),7:9] = wh[i]
            target[int(ij[1]),int(ij[0]),5:7] = delta_xy

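I have no questions about how the class probabilities are assigned, but I do have some about how the bboxes are assigned, and I hope you can clear them up.

Consider the following bbox label,
which is also a 7 * 7 * 30 target:

x1, y1, w1, h1, c1, x2, y2, w2, h2, c2 are the values of target[row, col, :10],
and target[row, col, 10:] holds the class probabilities.

x1, y1 are the bbox center coordinates, w1, h1 are the bbox width and height, c1 is the confidence score from the paper, and x2 etc. form the label of the second bbox, and so on.

I see that you assign the same values to the first and the second bbox labels, and the two confidence scores are the same as well: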

            target[int(ij[1]),int(ij[0]),2:4] = wh[i]
            target[int(ij[1]),int(ij[0]),:2] = delta_xy
            target[int(ij[1]),int(ij[0]),7:9] = wh[i]
            target[int(ij[1]),int(ij[0]),5:7] = delta_xy

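My questions are:

Suppose there are two objects whose bbox centers both fall into the same cell. Why are both bbox labels and confidence scores still given the same values? Shouldn't one object's bbox go into x1, y1, w1, h1 and the other object's bbox into x2, y2, w2, h2?
Also, when a cell contains only one bbox center, why are c1 and c2 both 1, with x1=x2, y1=y2, w1=w2, h1=h2?

Thanks a lot!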
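Below is a runnable sketch of the encoding loop quoted above. It assumes boxes given as normalized (x1, y1, x2, y2), 0-based labels, and the 30-channel layout [x, y, w, h, c] * 2 + 20 classes. It makes two changes. First, it uses `floor()` plus a clamp in place of `ceil() - 1`, because a center at exactly 0 would otherwise map to cell -1. Second, it writes the class at `10 + label`, which assumes 0-based labels, whereas the quoted `+ 9` implies 1-based ones.

Like the original, it copies the same box into both predictor slots. As a consequence, when two objects' centers fall in the same cell, the later one simply overwrites the earlier one. `encode` is a hypothetical standalone version, not the repository's method.

```python
import torch

def encode(boxes: torch.Tensor, labels: torch.Tensor, S: int = 14, C: int = 20) -> torch.Tensor:
    """boxes: (N, 4) normalized x1, y1, x2, y2; labels: (N,) in 0..C-1. Returns (S, S, 10 + C)."""
    target = torch.zeros(S, S, 10 + C)
    cell_size = 1.0 / S
    wh = boxes[:, 2:] - boxes[:, :2]
    cxcy = (boxes[:, 2:] + boxes[:, :2]) / 2
    for i in range(cxcy.size(0)):
        # column index from x, row index from y; clamp keeps 0.0 and 1.0 inside the grid
        ij = (cxcy[i] / cell_size).floor().clamp(0, S - 1).long()
        col, row = int(ij[0]), int(ij[1])
        xy = ij.float() * cell_size                  # top-left corner of the matched cell
        delta_xy = (cxcy[i] - xy) / cell_size        # center offset inside the cell, in [0, 1]
        for b in (0, 5):                             # same box in both predictor slots
            target[row, col, b:b + 2] = delta_xy
            target[row, col, b + 2:b + 4] = wh[i]
            target[row, col, b + 4] = 1.0
        target[row, col, 10 + int(labels[i])] = 1.0
    return target
```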

predict.py error

Error when I run eval_voc.py:

Traceback (most recent call last):
  File "eval_voc.py", line 164, in <module>
    result = predict_gpu(model,image_path,root_path='E:/yolov1/pytorch_yolov1/data/VOCdevkit/VOC2007/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
  File "E:\yolov1\yolores\predict.py", line 126, in predict_gpu
    boxes,cls_indexs,probs = decoder(pred)
  File "E:\yolov1\yolores\predict.py", line 46, in decoder
    if mask[i,j,b] == 1:
IndexError: index 7 is out of bounds for dimension 1 with size 7

I get the error above after running eval_voc.py. How can I fix it?
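The error says the mask has size 7 along dimension 1 while the loop reaches index 7. This suggests the decoder loops over more cells than the prediction has, for example a 14-cell loop running on a 7x7 output. Below is a sketch that reads the grid size from the tensor instead of hard-coding it. `decode_boxes` and the 30-channel layout with confidences at channels 4 and 9 are assumptions.

```python
import torch

def decode_boxes(pred: torch.Tensor, threshold: float = 0.1):
    """pred: (S, S, 30). Returns (row, col, b) for every box predictor above threshold."""
    S = pred.size(0)                       # derive the grid size; do not hard-code 7 or 14
    assert pred.size(1) == S, "expected a square S x S grid"
    contain = torch.stack((pred[:, :, 4], pred[:, :, 9]), dim=2)
    mask = contain > threshold
    hits = []
    for i in range(S):
        for j in range(S):
            for b in range(2):
                if mask[i, j, b]:
                    hits.append((i, j, b))
    return hits
```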

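Does the NMS in predict.py run over all classes together?

From the code, it looks like the boxes of all classes go through NMS together. Why not run NMS for each class separately?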
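With class-agnostic NMS, a confident box of one class can suppress an overlapping box of another class. Below is a per-class variant as a sketch; all names are hypothetical, and this is not the repository's code. If torchvision is available, `torchvision.ops.batched_nms` offers the same per-class behavior.

```python
import torch

def box_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between (N, 4) and (M, 4) boxes in x1, y1, x2, y2 form."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=2)
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def nms_single(boxes: torch.Tensor, scores: torch.Tensor, threshold: float):
    """Greedy NMS within one class; returns kept positions as a list of ints."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(int(i))
        ious = box_iou(boxes[i].unsqueeze(0), boxes[order[1:]]).squeeze(0)
        order = order[1:][ious <= threshold]
    return keep

def per_class_nms(boxes, scores, labels, threshold=0.5):
    """Run NMS separately per class so boxes of different classes never suppress each other."""
    keep = []
    for c in labels.unique():
        idx = (labels == c).nonzero().squeeze(1)
        keep += idx[nms_single(boxes[idx], scores[idx], threshold)].tolist()
    return sorted(keep, key=lambda k: -float(scores[k]))
```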

I don't know what is wrong

I use PyTorch 1.0.
I encountered some warnings and errors.
I don't know whether they are important. Maybe I broke the logic when I tried to fix them.
Here are the warnings and errors:

1. UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
I replaced F.sigmoid() with torch.sigmoid() in resnet_yolo.py and net.py.
2. UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
I replaced size_average=False with reduction='sum' in yoloLoss.py.
3. IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
I replaced loss.data[0] with loss.item() in train.py.
4. UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. images = Variable(images,volatile=True)
I just changed it to images = images.detach().
I don't know if that's right.

My results are bad.
Can anyone tell me why?
Thanks.
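For reference, here are the idioms those warnings point to, as a self-contained sketch. A toy model and MSE loss stand in for the YOLO network and yoloLoss. Note that `images.detach()` is not the same as `torch.no_grad()`. Detach only cuts the graph at the input, so the forward pass still records operations on the model's parameters. `torch.no_grad()` is the replacement the `volatile` warning itself recommends.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(8, 30), nn.Sigmoid())  # in a custom forward(), use torch.sigmoid(x)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
images, target = torch.randn(4, 8), torch.rand(4, 30)

# training step
model.train()
loss = F.mse_loss(model(images), target, reduction="sum")  # replaces size_average=False
optimizer.zero_grad()
loss.backward()
optimizer.step()
train_loss = loss.item()                                    # replaces loss.data[0]

# validation step
model.eval()
with torch.no_grad():                                       # replaces Variable(..., volatile=True)
    val_pred = model(images)
    val_loss = F.mse_loss(val_pred, target, reduction="sum").item()
```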

about performance

I trained with the original code, using VGG16 as the backbone. After 50 epochs (original hyperparameters) the mAP reached 44.6%, and after 70 epochs it reached 49.8%. The numbers themselves look fine, but I don't know why they are still far from the performance in your README.

(50 epochs; left: training loss, right: validation loss)

(70 epochs; left: training loss, right: validation loss)

best.pth

It would be even better if you could provide best.pth!
