
pytorch-yolo-v1's Issues

question : Yolo v1 confidence and probability class map

Hello,

I am trying to understand YOLO v1 in detail, and I have some questions about the confidence and the class probabilities. As I understand it, the confidence is equal to: ground truth label * IOU(pred, truth). To find the ground truth label, we just need a 7x7 array in which a cell is set to 1 if the center of an object in the dataset falls inside that cell. But I have some doubts about how the IOU is computed. Do you compute the IOU only when the center of the predicted box and the center of the ground-truth object are INSIDE the same cell?
I also have a question about P(class | object): if a cell contains no object, or multiple objects, which label do you return during training?

Thank you for the help!
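
For readers following the question, here is a minimal sketch of the quantity being discussed: the confidence target as "object indicator times IoU". The function names `iou` and `confidence_target` are hypothetical and are not taken from this repository; boxes are assumed to be given as corner coordinates (x1, y1, x2, y2).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_target(cell_has_object, pred_box, truth_box):
    """Object indicator (0 or 1) times IoU(pred, truth); hypothetical helper."""
    return iou(pred_box, truth_box) if cell_has_object else 0.0


print(confidence_target(True, (0, 0, 2, 2), (1, 1, 3, 3)))
print(confidence_target(False, (0, 0, 2, 2), (1, 1, 3, 3)))
```

In this sketch the IoU only matters for cells whose indicator is 1; for every other cell the target is 0 regardless of where the predicted box lies.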

IndexError: list index out of range

    num_faces = int(splited[1])
IndexError: list index out of range

I changed file_root and test_root in train.py,
then ran train.py, and this error occurred.

I would also like to ask what the role of dataset.py is.
Thanks for your nice work!

choose the best iou box

box1_xyxy[:,:2] = box1[:,:2]/14. -0.5*box1[:,2:4]

What does this code mean? Why divide by 14 and multiply by 0.5?
Thank you.
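
One possible reading of the quoted line, offered as an assumption rather than the author's explanation: the first two values of a box are center offsets measured in units of one grid cell, and the last two are width and height as fractions of the whole image. Dividing the offsets by the grid size (14 in the quoted line) puts them in the same image-relative units as the width and height, and 0.5 times the width or height is the distance from the center to an edge. The cell's own top-left corner is left out, which does not change the IoU when both boxes belong to the same cell. The name `to_corners` is hypothetical.

```python
GRID_NUM = 14  # the constant from the quoted line; assumed to be the grid size


def to_corners(box, grid_num=GRID_NUM):
    """Convert (x_off, y_off, w, h) to (x1, y1, x2, y2).

    Assumes x_off, y_off are relative to one grid cell and w, h are
    relative to the whole image.
    """
    x_off, y_off, w, h = box
    cx = x_off / grid_num  # center offset in image-relative units
    cy = y_off / grid_num
    # Half the width/height on each side of the center gives the corners.
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


corners = to_corners((0.5, 0.5, 0.2, 0.4))
print(corners)
```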

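Cannot get any bbox

After training the model, I tried to predict bboxes. But at prediction time, that is, when running predict.py with the trained model best.pth loaded, I cannot get any bbox. Looking at the code, I found that at the line mask1 = contain > 0.1 # above the threshold, mask1[:,:,0] is all zeros, as shown below: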

tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=torch.uint8)

Where might my problem be?

eval error

C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master>python eval_voc.py
---prepare target---
---start test---
0%| | 0/4951 [00:00<?, ?it/s]C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
0%| | 4/4951 [00:01<55:21, 1.49it/s]
Traceback (most recent call last):
File "eval_voc.py", line 186, in
result = predict_gpu(model,image_path,root_path='./VOCdevkit/VOC2012/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 148, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 90, in decoder
keep = nms(boxes,probs)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 107, in nms
i = order[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

please help

Can't find the listfile.txt

Excuse me, in the yolodataset there is a string variable tmp_file set to '/tmp/listfile.txt'.
Could you explain how to use it? Thanks.

Training dataset is not exactly the same as the original YOLO v1

Hi, I found that your training dataset consists of VOC2007 train/val + VOC2012 train/val, which is usually called VOC+. But in VOC+ the 2012 train/val set has about 11k images, while yours has 17k. Your total training set has 22k images (2007: 5k + 2012: 17k), whereas the original YOLO v1 uses a VOC+ total of about 17k (2007: 5.xk + 2012: 11.xk)?

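A question about the performance of the vgg16_bn version

Background
Pretrained vgg16_bn

After 110 epochs my mAP is only 52%, and it is hard to improve further; with a pretrained resnet50, however, I reach 67% in 50 epochs. I don't know where the problem is.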

About BatchNormalization

Hi, Thank you for your reproducible code about Yolov1.

I was wondering about the structure of your resnet_yolo.py

def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    x = self.layer5(x)
    # x = self.avgpool(x)
    # x = x.view(x.size(0), -1)
    # x = self.fc(x)
    x = self.conv_end(x)
    x = self.bn_end(x)
    x = F.sigmoid(x) # normalize to 0-1
    # x = x.view(-1,7,7,30)
    x = x.permute(0,2,3,1) #(-1,7,7,30)

Why is there a 'self.bn_end(x)' at the end of the network?
Is it for faster convergence, and is it critical for performance?

about dataset.py line 94

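Hello,
Below is the source code. I added comments, and my question is in the comments.

boxes /= torch.Tensor([w,h,w,h]).expand_as(boxes)
img = self.BGR2RGB(img)
img = self.subMean(img,self.mean)

# The network requires a fixed input size, so the image is resized here. But after this,
#  shouldn't the bboxes values be adjusted as well?
#  At this point img and bboxes no longer match (I checked self.encoder specifically, and it has no such operation either)
img = cv2.resize(img,(self.image_size,self.image_size))
target = self.encoder(boxes,labels)

Is there something wrong with this operation?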

box1_xyxy[:, :2] = box1[:, :2] / 14. - 0.5 * box1[:, 2:4]
box1_xyxy[:, 2:4] = box1[:, :2] / 14. + 0.5 * box1[:, 2:4]
box2 = box_target[i].view(-1, 5)
box2_xyxy = Variable(torch.FloatTensor(box2.size()))
box2_xyxy[:, :2] = box2[:, :2] / 14. - 0.5 * box2[:, 2:4]
box2_xyxy[:, 2:4] = box2[:, :2] / 14. + 0.5 * box2[:, 2:4]

The predicted xywh here should all be in [0, 1], so dividing by 14 here is meaningless, isn't it?

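Why is sigmoid added in the code?

In net.py and resnet.py, why is x = torch.sigmoid(x) added at the end of the forward function? The x returned after this is compared with the label values to compute the loss, which is then backpropagated. Applying sigmoid amounts to a normalization, which clearly does not match the label values, does it?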

help

Can you give me the trained model file, best.pth?

The coefficients of the loss function seem a bit off

(self.l_coord * loc_loss + 2 * contain_loss + not_contain_loss + self.l_noobj * nooobj_loss + class_loss) / N

Why is the coefficient in front of contain_loss 2?
not_contain_loss should have self.l_noobj as its coefficient.
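
For comparison, here is a minimal sketch of how the terms are weighted in the YOLO v1 paper, where the coordinate weight is 5 and the no-object weight is 0.5. It is not a reproduction of this repository's yoloLoss.py: the function and argument names are hypothetical, and the extra not_contain_loss term and the factor 2 from the quoted line are deliberately left out. Dividing by the batch size mirrors the "/ N" in the quoted line.

```python
def paper_style_total_loss(loc_loss, obj_conf_loss, noobj_conf_loss,
                           class_loss, batch_size,
                           l_coord=5.0, l_noobj=0.5):
    """Combine already-summed loss terms with the paper's weights.

    All names are hypothetical; the inputs are plain floats here.
    """
    total = (l_coord * loc_loss          # box coordinates, weighted up
             + obj_conf_loss             # confidence in cells with an object
             + l_noobj * noobj_conf_loss # confidence in empty cells, weighted down
             + class_loss)               # class probabilities
    return total / batch_size


print(paper_style_total_loss(1.0, 2.0, 4.0, 3.0, batch_size=2))
```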

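Loss curve plot

Could you share how your loss values change during training? I would like to compare them with mine. I am fine-tuning with resnet; why does my loss change so little?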

key errors: unexpected keys in state_dict

https://github.com/xiongzihua/pytorch-YOLO-v1/blob/0e5776a15e63f6d811c61a1b08f382bc41cff8c0/predict.py#L146

Hi, zihua:

After training, I ran predict.py and got an error saying that the keys do not match. Could you help me solve this bug? Thanks ~

The error information:

RuntimeError: Error(s) in loading state_dict for VGG:
Missing key(s) in state_dict: "features.0.weight", "features.0.bias"...
Unexpected key(s) in state_dict: "module.features.0.weight", "module.features.0.bias"...
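
A "module." prefix on every key is what PyTorch produces when the saved model was wrapped in torch.nn.DataParallel. Assuming that is the cause here, one common workaround is to strip the prefix before loading. The sketch below shows only the key renaming, on a plain dictionary with placeholder values; `strip_module_prefix` is a hypothetical helper name.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with the leading prefix removed from keys."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Placeholder values stand in for the real weight tensors.
saved = OrderedDict([
    ("module.features.0.weight", "W"),
    ("module.features.0.bias", "b"),
])
cleaned = strip_module_prefix(saved)
print(list(cleaned))
```

With a real checkpoint, the cleaned dictionary would then be passed to the model's load_state_dict. The alternative is to wrap the model in DataParallel before loading, so that the expected keys carry the same prefix.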

What is the directory layout of the dataset?

I would like to ask about the directory layout of the dataset, and what this code means:
if isinstance(list_file, list):
    # Cat multiple list files together.
    # This is especially useful for voc07/voc12 combination.
    tmp_file = '/tmp/listfile.txt'
    os.system('cat %s > %s' % (' '.join(list_file), tmp_file))
    list_file = tmp_file
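
The quoted snippet concatenates several annotation list files into one by shelling out to `cat` and writing to `/tmp`, both of which are Unix-specific and would not work as-is on Windows. A minimal portable sketch using only the standard library follows; `merge_list_files` is a hypothetical name, not a function in this repository, and the annotation lines in the usage example are made up.

```python
import os
import tempfile


def merge_list_files(list_files):
    """Concatenate text files into a new temporary file and return its path."""
    fd, merged_path = tempfile.mkstemp(prefix="listfile_", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for path in list_files:
            with open(path, "r", encoding="utf-8") as src:
                for line in src:
                    # Make sure the last line of each file ends with a newline
                    # so that entries from different files do not run together.
                    out.write(line if line.endswith("\n") else line + "\n")
    return merged_path


# Usage with two small made-up list files.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "voc07.txt")
second = os.path.join(workdir, "voc12.txt")
with open(first, "w", encoding="utf-8") as f:
    f.write("a.jpg 1 2 3 4 0\n")
with open(second, "w", encoding="utf-8") as f:
    f.write("b.jpg 5 6 7 8 1")
merged = merge_list_files([first, second])
with open(merged, "r", encoding="utf-8") as f:
    merged_text = f.read()
print(merged_text)
```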

Running eval_voc: Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead

Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead
I added the line of code marked below, and more problems appeared. Thanks a lot!!!

def predict_gpu(model,image_name,root_path=''):

    result = []
    image = cv2.imread(root_path+image_name)
    h,w,_ = image.shape
    img = cv2.resize(image,(448,448))
    img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    mean = (123,117,104)#RGB
    img = img - np.array(mean,dtype=np.float32)

    transform = transforms.Compose([transforms.ToTensor(),])
    img = transform(img)
    img = img.cuda()

    img = torch.unsqueeze(img, dim=0)  # <-- the added line

    pred = model(img) #1x7x7x30
    pred = pred.cpu()
    boxes,cls_indexs,probs =  decoder(pred)

    for i,box in enumerate(boxes):
        x1 = int(box[0]*w)
        x2 = int(box[2]*w)
        y1 = int(box[1]*h)
        y2 = int(box[3]*h)
        cls_index = cls_indexs[i]
        cls_index = int(cls_index) # convert LongTensor to int
        prob = probs[i]
        prob = float(prob)
        result.append([(x1,y1),(x2,y2),VOC_CLASSES[cls_index],image_name,prob])
    return result

target grid num is 14

I found that the target grid num is 14, which caused an error while training, so I changed it to 7.
What's more, the loss stops decreasing once it reaches 4.xx.
I am using it for people detection, and the mAP is only 0.08.

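Something suspicious about contain_loss, one of the components of yoloLoss

contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response_iou[:,4],size_average=False)
box_pred_response[:,4] is the predicted confidence score of the box with the larger IoU,
box_target_response_iou[:,4] is the IoU value.
What does it mean to compute the loss from these two? I hope the author can clear this up.
I think it would be more appropriate to change this statement to the following:
contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response[:,4],size_average=False)
This is just my personal view; I still hope to get help from the author and fellow programmers.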

I don't know what the problem is

Thank you for your good code.
I ran into this problem:

File "train.py", line 123, in
for i,(images,target) in enumerate(train_loader):
File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in next
batch = self.collate_fn([self.dataset[i] for i in indices])
File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in
batch = self.collate_fn([self.dataset[i] for i in indices])
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 70, in getitem
img, boxes = self.random_flip(img, boxes)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 257, in random_flip
im_lr = np.fliplr(im).copy()
File "C:\Users\vcvis\AppData\Roaming\Python\Python36\site-packages\numpy\lib\twodim_base.py", line 95, in fliplr
raise ValueError("Input must be >= 2-d.")
ValueError: Input must be >= 2-d.

What should I do?

please guide me
thank you

about grid_num=14

I don't understand why grid_num is 14 in the code, and why the value is divided by 14 in yoloLoss.py line 88.

IndexError: invalid index of a 0-dim tensor

I hit this error when using predict.py. The error is raised at

        i = order[0]
        keep.append(i)

After checking the code, I found that the key is to change how squeeze() is used:

        # before
        ids = (ovr<=threshold).nonzero().squeeze()
        # after
        ids = (ovr<=threshold).nonzero().squeeze(-1)

This change solves the problem: when ids has shape (1, 1), squeeze() collapses it to a 0-dim tensor, while squeeze(-1) returns shape (1,), so order stays indexable.
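
To make the loop logic easier to follow without tensor shape edge cases, here is a minimal list-based sketch of greedy non-maximum suppression. It is not the repository's `nms` function: the names are hypothetical, boxes are assumed to be (x1, y1, x2, y2) tuples, and the default threshold of 0.5 is an assumption. Because `order` is always a Python list, the single-survivor case that triggers the 0-dim tensor error cannot arise here.

```python
def _iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_lists(boxes, scores, threshold=0.5):
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        i = order.pop(0)  # best remaining box
        keep.append(i)
        # Drop every remaining box that overlaps the kept one too much.
        order = [j for j in order if _iou(boxes[i], boxes[j]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms_lists(boxes, scores)
print(kept)
```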

prediction problem

When I tried to predict, this error occurred:

load model...
predicting...
predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "predict.py", line 174, in
result = predict_gpu(model,image_name)
File "predict.py", line 148, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "predict.py", line 89, in decoder
cls_indexs = torch.cat(cls_indexs,0) #(n,)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

please guide
thank you

About the encode method in the dataset module: a possible bug

Hello,
I have a question about your encode code:

        for i in range(cxcy.size()[0]):
            cxcy_sample = cxcy[i]
            ij = (cxcy_sample/cell_size).ceil()-1 #
            target[int(ij[1]),int(ij[0]),4] = 1
            target[int(ij[1]),int(ij[0]),9] = 1
            target[int(ij[1]),int(ij[0]),int(labels[i])+9] = 1
            xy = ij*cell_size # relative coordinates of the top-left corner of the matched cell
            delta_xy = (cxcy_sample -xy)/cell_size
            target[int(ij[1]),int(ij[0]),2:4] = wh[i]
            target[int(ij[1]),int(ij[0]),:2] = delta_xy
            target[int(ij[1]),int(ij[0]),7:9] = wh[i]
            target[int(ij[1]),int(ij[0]),5:7] = delta_xy

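I have no question about how the class probabilities are assigned, but I do have some questions about how the bbox values are assigned, and I hope you can answer them.

Consider the following bbox label layout,
which is also a 7 * 7 * 30 target:

x1, y1, w1, h1, c1, x2, y2, w2, h2, c2 are the values of target[row, col, :10],
target[row, col, 10:] holds the class probabilities,

x1, y1 are the bbox center coordinates, w1, h1 are the bbox width and height, c1 is the confidence score from the paper, and x2 etc. are the label of the second bbox, and so on.

I see that you assign exactly the same values to the label of the first bbox and the label of the second bbox, and the two confidence scores are the same as well: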

            target[int(ij[1]),int(ij[0]),2:4] = wh[i]
            target[int(ij[1]),int(ij[0]),:2] = delta_xy
            target[int(ij[1]),int(ij[0]),7:9] = wh[i]
            target[int(ij[1]),int(ij[0]),5:7] = delta_xy

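My question is:

Suppose there are two objects whose bbox centers both fall into the same cell. Why are the labels and confidence scores of the two bboxes still assigned the same values? Shouldn't one object's bbox label go to x1, y1, w1, h1 and the other object's bbox go to x2, y2, w2, h2?
Also, when a cell contains the center of only one bbox, why should c1 and c2 both be 1, with x1=x2, y1=y2, w1=w2, h1=h2?

Thank you very much!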
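
The cell assignment in the quoted loop can be traced with a small standard-library sketch. It follows the same arithmetic as the quoted code (ceil of the center divided by the cell size, minus 1, then the offset inside that cell). The function name `encode_center`, the grid size of 7, and the dictionary used in place of the target tensor are assumptions for illustration. Centers are assumed to be normalized to (0, 1]; a center at exactly 0 would give index -1 with this formula.

```python
import math


def encode_center(cx, cy, grid_num=7):
    """Return (row, col, dx, dy) for a normalized box center."""
    cell_size = 1.0 / grid_num
    col = math.ceil(cx / cell_size) - 1
    row = math.ceil(cy / cell_size) - 1
    dx = (cx - col * cell_size) / cell_size  # offset inside the cell
    dy = (cy - row * cell_size) / cell_size
    return row, col, dx, dy


# Two objects whose centers fall into the same cell: because each write
# targets the same (row, col) slot, the second box replaces the first.
target = {}
for cx, cy in [(0.50, 0.50), (0.52, 0.48)]:
    row, col, dx, dy = encode_center(cx, cy)
    target[(row, col)] = (dx, dy)

print(target)
```

Under these assumptions only one box per cell survives in the box fields, which is the situation the question describes.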

predict.py error

I get an error when I run eval_voc.py:

Traceback (most recent call last):
File "eval_voc.py", line 164, in
result = predict_gpu(model,image_path,root_path='E:/yolov1/pytorch_yolov1/data/VOCdevkit/VOC2007/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
File "E:\yolov1\yolores\predict.py", line 126, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "E:\yolov1\yolores\predict.py", line 46, in decoder
if mask[i,j,b] == 1:
IndexError: index 7 is out of bounds for dimension 1 with size 7

After I run eval_voc.py, I get the error above. How can I solve it?

I don't know what is wrong

I use PyTorch 1.0.
I encountered some warnings and errors.
I don't know whether they are important. Maybe the logic went wrong when I tried to correct them.
Here are the warnings and errors:

1 UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
I replaced F.sigmoid() with torch.sigmoid in resnet_yolo.py and net.py
2 UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
I replaced size_average=False with reduction='sum' in yoloLoss.py
3 IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
I replaced loss.data[0] with loss.item() in train.py
4 UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. images = Variable(images,volatile=True)
I just changed it to images = images.detach()
I don't know if that is right.

My results are bad.
Can anyone tell me why?
Thanks

about performance

I trained the network with the original code, with vgg16 as the backbone. After 50 epochs (original parameters) the mAP is 44.6%; after 70 epochs it is 49.8%. Judging from the data, nothing looks wrong, so why is this still far from the performance reported in your readme?

[Figure: 50 epochs; left: training loss, right: val loss]

[Figure: 70 epochs; left: training loss, right: val loss]

best.pth

It would be even better if you could provide best.pth!
