abeardear / pytorch-yolo-v1
An experiment for yolo-v1, including training and testing.
License: MIT License
I ran into this error when using predict.py. The error is raised at:
i = order[0]
keep.append(i)
After checking the code, I found that the fix is to change the squeeze() call:
# before
ids = (ovr<=threshold).nonzero().squeeze()
# after
ids = (ovr<=threshold).nonzero().squeeze(-1)
This change solves the problem: when ids has shape (1, 1), squeeze(-1) returns shape (1,) instead of a 0-dim tensor.
Hi, zihua:
After training, I ran predict.py and got an error saying that the keys are mismatched. Could you help me solve this bug? Thanks ~
The error information:
RuntimeError: Error(s) in loading state_dict for VGG:
Missing key(s) in state_dict: "features.0.weight", "features.0.bias"...
Unexpected key(s) in state_dict: "module.features.0.weight", "module.features.0.bias"...
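A possible workaround, sketched under the assumption that the checkpoint was saved from a model wrapped in nn.DataParallel (which is what the "module." prefix on every unexpected key usually indicates) and is then loaded into an unwrapped model. The helper name strip_module_prefix is hypothetical and not part of the repository; it only renames keys, so it is shown here on a plain dict standing in for the real state_dict.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key.

    Hypothetical helper, not part of the repository.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Stand-in for a checkpoint saved from a DataParallel-wrapped model.
saved = OrderedDict([
    ("module.features.0.weight", "w0"),
    ("module.features.0.bias", "b0"),
])
fixed = strip_module_prefix(saved)
print(list(fixed))
```

If the checkpoint file holds a plain state_dict, the cleaned mapping could then be passed to model.load_state_dict(...) in predict.py. Wrapping the model in nn.DataParallel before loading is the other common way to make the keys line up.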
(self.l_coord * loc_loss + 2 * contain_loss + not_contain_loss + self.l_noobj * nooobj_loss + class_loss) / N
Why is the coefficient in front of contain_loss 2?
not_contain_loss should have self.l_noobj as its coefficient.
Hello,
I have a question about your encode code:
for i in range(cxcy.size()[0]):
    cxcy_sample = cxcy[i]
    ij = (cxcy_sample/cell_size).ceil()-1 #
    target[int(ij[1]),int(ij[0]),4] = 1
    target[int(ij[1]),int(ij[0]),9] = 1
    target[int(ij[1]),int(ij[0]),int(labels[i])+9] = 1
    xy = ij*cell_size # relative coordinates of the top-left corner of the matched cell
    delta_xy = (cxcy_sample -xy)/cell_size
    target[int(ij[1]),int(ij[0]),2:4] = wh[i]
    target[int(ij[1]),int(ij[0]),:2] = delta_xy
    target[int(ij[1]),int(ij[0]),7:9] = wh[i]
    target[int(ij[1]),int(ij[0]),5:7] = delta_xy
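I have no questions about how the class probabilities are assigned, but I do have some about the bbox assignment, and I hope you can answer them.
Consider the following bbox label layout:
The target is likewise 7 * 7 * 30.
x1, y1, w1, h1, c1, x2, y2, w2, h2, c2 are the values of target[row, col, :10],
target[row, col, 10:] holds the class probabilities,
x1, y1 are the bbox center coordinates, w1, h1 are the bbox width and height, c1 is the confidence score from the paper, x2 etc. are the label of the second bbox, and so on.
I see that you assign exactly the same label to the first bbox and the second bbox, and the two confidence scores are the same as well: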
target[int(ij[1]),int(ij[0]),2:4] = wh[i]
target[int(ij[1]),int(ij[0]),:2] = delta_xy
target[int(ij[1]),int(ij[0]),7:9] = wh[i]
target[int(ij[1]),int(ij[0]),5:7] = delta_xy
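My question is:
Suppose there are two objects whose bbox centers fall into the same cell. Why would the two bbox labels and confidence scores still be assigned the same values? Shouldn't one object's bbox go to x1, y1, w1, h1 and the other object's bbox go to x2, y2, w2, h2?
Also, when a cell contains the center of only one bbox, why must c1 and c2 both be 1, with x1=x2, y1=y2, w1=w2, h1=h2?
Thank you very much!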
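To make the question concrete, here is a minimal pure-Python sketch of the same cell-index arithmetic as the encode snippet above. It assumes a 14×14 grid (other posts here report that the code uses 14) and replaces the tensor with a dict keyed by (row, col); encode_center and the sample boxes are hypothetical. It only illustrates what the quoted assignments do: each object writes both box slots of its cell, so a second object whose center lands in the same cell overwrites the first.

```python
import math


def encode_center(cx, cy, grid_num=14):
    """Map a normalized box center to (row, col, dx, dy), mirroring the
    ceil()-1 indexing of the quoted encode code. grid_num=14 is an assumption."""
    cell_size = 1.0 / grid_num
    col = math.ceil(cx / cell_size) - 1
    row = math.ceil(cy / cell_size) - 1
    dx = (cx - col * cell_size) / cell_size  # offset inside the cell, in cell units
    dy = (cy - row * cell_size) / cell_size
    return row, col, dx, dy


# Two hypothetical objects (cx, cy, w, h) whose centers share one cell.
objects = [(0.52, 0.53, 0.30, 0.40), (0.55, 0.56, 0.10, 0.20)]

target = {}
for cx, cy, w, h in objects:
    row, col, dx, dy = encode_center(cx, cy)
    # Both box slots get the same values, as in the quoted code.
    target[(row, col)] = [dx, dy, w, h, 1.0] * 2

print(target)
```

Only one cell entry survives, and it holds the second object's box in both slots.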
box1_xyxy[:,:2] = box1[:,:2]/14. -0.5*box1[:,2:4]
What does this code mean? Why divide by 14 and multiply by 0.5?
Thank you.
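One possible reading, offered as an assumption rather than the author's stated intent: x, y are offsets inside a cell measured in cell units, while w, h are fractions of the whole image. Dividing x, y by the grid size (14) puts the center in the same units as w, h, and adding or subtracting 0.5 * (w, h) turns center/size into corner coordinates. The cell's own origin is left out, which does not change the IoU as long as both boxes come from the same cell. The sketch below checks that last point; all function names and box values are hypothetical.

```python
def to_xyxy(box, grid_num=14):
    """(dx, dy, w, h) -> (x1, y1, x2, y2), same arithmetic as the quoted line."""
    dx, dy, w, h = box
    cx, cy = dx / grid_num, dy / grid_num
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def shift(box, ox, oy):
    """Translate a corner-format box by (ox, oy)."""
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


pred = (0.3, 0.6, 0.20, 0.40)    # hypothetical prediction in one cell
truth = (0.5, 0.5, 0.25, 0.35)   # hypothetical target in the same cell

iou_local = iou(to_xyxy(pred), to_xyxy(truth))

# Add the shared cell origin (column 7, row 3 of a 14x14 grid) to both boxes.
origin = (7 / 14, 3 / 14)
iou_global = iou(shift(to_xyxy(pred), *origin), shift(to_xyxy(truth), *origin))

print(iou_local, iou_global)
```

The two values agree up to floating-point rounding, which is why the loss can skip adding the cell origin.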
The output is actually 14×14, but the comments all say 7×7.
The result of eval_voc is even stranger:
---start evaluate---
---class aeroplane ap 0.0---
---class bicycle ap -1---
---map -0.5---
Hello, I found a bug in yoloLoss.py and have submitted a Pull Request. I hope you can review it.
Could you provide the pre-trained model file? Thanks!
Email: [email protected]
num_faces = int(splited[1])
IndexError: list index out of range
I changed file_root and test_root in train.py,
then ran train.py, and the error occurred.
I would also like to ask what the role of dataset.py is.
Thanks for your nice work!
Do you have a pre-trained model (trained by yourself)?
I use PyTorch 1.0
and encountered some warnings and errors.
I don't know whether they are important. Maybe the logic went wrong when I tried to correct them.
Here are the warnings and errors:
1 UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
I replaced F.sigmoid() with torch.sigmoid() in resnet_yolo.py and net.py.
2 UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
I replaced size_average=False with reduction='sum' in yoloLoss.py.
3 IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
I replaced loss.data[0] with loss.item() in train.py.
4 UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. images = Variable(images,volatile=True)
I just changed it to images = images.detach().
I don't know if that is right.
My result is bad.
Can anyone tell me why?
Thanks.
I trained the network with the original code, using vgg16 as the backbone. After 50 epochs (original parameters) mAP reached 44.6%, and after 70 epochs it reached 49.8%. The numbers themselves look fine, but I don't know why they are still far from the performance in your README.
(50 epochs, left: training loss, right: val loss)
(70 epochs, left: training loss, right: val loss)
Can you give me the trained model file, best.pth?
contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response_iou[:,4],size_average=False)
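box_pred_response[:,4] is the predicted confidence of the box with the larger IoU,
box_target_response_iou[:,4] is the IoU value itself.
What does it mean to compute the loss from these two quantities? I hope the author can clear this up.
I think this statement would be more appropriate in the following form:
contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response[:,4],size_average=False)
This is only my personal view. I would still appreciate help from the author and other developers.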
box1_xyxy[:, :2] = box1[:, :2] / 14. - 0.5 * box1[:, 2:4]
box1_xyxy[:, 2:4] = box1[:, :2] / 14. + 0.5 * box1[:, 2:4]
box2 = box_target[i].view(-1, 5)
box2_xyxy = Variable(torch.FloatTensor(box2.size()))
box2_xyxy[:, :2] = box2[:, :2] / 14. - 0.5 * box2[:, 2:4]
box2_xyxy[:, 2:4] = box2[:, :2] / 14. + 0.5 * box2[:, 2:4]
The predicted xywh here should all be in [0, 1], so dividing by 14 seems meaningless, doesn't it?
When I tried to predict, this error occurred:
load model...
predicting...
predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "predict.py", line 174, in
result = predict_gpu(model,image_name)
File "predict.py", line 148, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "predict.py", line 89, in decoder
cls_indexs = torch.cat(cls_indexs,0) #(n,)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated
Please advise.
Thank you.
I found that the target grid num is 14, which caused an error while training, so I changed it to 7.
What's more, the loss never decreases once it reaches 4.xx.
I used it for people detection, and mAP is only 0.08.
Hi, I found that your training set consists of VOC2007 train/val + VOC2012 train/val, which is usually called VOC+. But in VOC+ the 2012 train/val set has 11k images, while yours has 17k, so your total training set is 22k (2007 5k + 2012 17k), whereas the original YOLO v1 uses VOC+ with a total of 17k (2007 5.xk + 2012 11.xk). Why is that?
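In net.py and resnet.py, why is x = torch.sigmoid(x) added at the end of the forward function? The returned x is then compared with the label values to compute the loss and backpropagate. Applying sigmoid amounts to normalization, which surely doesn't match the label values?
I would like to ask about the directory layout of the dataset, and what this code means: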
if isinstance(list_file, list):
    # Cat multiple list files together.
    # This is especially useful for voc07/voc12 combination.
    tmp_file = '/tmp/listfile.txt'
    os.system('cat %s > %s' % (' '.join(list_file), tmp_file))
    list_file = tmp_file
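As far as the snippet itself shows, when list_file is a Python list of paths it shells out to cat to join those files into /tmp/listfile.txt and then reads that single file. Neither cat nor /tmp is normally available on Windows, which several tracebacks on this page come from. Below is a portable sketch of the same merge using only the standard library. The function name merge_list_files is hypothetical, and the sample lines assume an "image name followed by box numbers" layout that the repository may or may not use exactly.

```python
import os
import tempfile


def merge_list_files(list_files):
    """Concatenate several annotation list files into one temporary file and
    return its path. Hypothetical replacement for the os.system('cat ...') call."""
    fd, merged_path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w") as merged:
        for path in list_files:
            with open(path) as src:
                for line in src:
                    # Make sure a missing final newline does not glue two entries together.
                    merged.write(line if line.endswith("\n") else line + "\n")
    return merged_path


# Demo with two throwaway list files (the contents are made up).
demo_dir = tempfile.mkdtemp()
paths = []
for name, text in [("voc2007.txt", "a.jpg 1 2 3 4 0\n"),
                   ("voc2012.txt", "b.jpg 5 6 7 8 1")]:
    path = os.path.join(demo_dir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

merged_path = merge_list_files(paths)
with open(merged_path) as f:
    merged_lines = f.read().splitlines()
print(merged_lines)
```

In the dataset class, the returned path would take the place of tmp_file.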
E.g. 009963.jpg is listed in voc2007.txt as a training image, but it is actually in the voc2007test folder as a test image. So the dataloader (__getitem__) cannot load the picture, which causes the error 'NoneType' object has no attribute 'shape' while training the net.
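cv2.imread returns None instead of raising when it cannot read a file, so the failure only shows up later as the 'NoneType' error. One way to catch such entries before training is to check every listed image against the image folder. This is a sketch under the assumption that each list line starts with the image file name; split_by_existence and the demo file names are hypothetical.

```python
import os
import tempfile


def split_by_existence(list_lines, image_root):
    """Split annotation lines into those whose image exists under image_root
    and the names of those that are missing. Hypothetical helper."""
    kept, missing = [], []
    for line in list_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if os.path.isfile(os.path.join(image_root, parts[0])):
            kept.append(line)
        else:
            missing.append(parts[0])
    return kept, missing


# Demo: only one of the two listed images is present in the folder.
image_root = tempfile.mkdtemp()
open(os.path.join(image_root, "000005.jpg"), "w").close()

lines = ["000005.jpg 10 20 30 40 8", "009963.jpg 1 2 3 4 0"]
kept, missing = split_by_existence(lines, image_root)
print(kept, missing)
```

Entries reported as missing can then be removed from the list file or their images copied into the training folder.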
It would be even better if you could provide best.pth!
Excuse me, in the yolodataset there is a string variable tmp_file set to '/tmp/listfile.txt'.
Could you teach me how to use it? Thanks.
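After training the model, I tried to predict bboxes, but when I run predict.py
with the trained model best.pth loaded,
I get no bboxes at all. Looking at the code, I found that for the line mask1 = contain > 0.1 # above the threshold
mask1[:,:,0]
is all 0, as shown below: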
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=torch.uint8)
Where is my problem likely to be?
The code runs NMS on the boxes of all classes together. Why not run it separately for each class?
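To illustrate the difference being asked about, here is a small pure-Python sketch, not the repository's implementation. It applies the same greedy rule (keep the best box, then keep only boxes whose overlap with it is at most the threshold) once over all boxes and once per class. The function names, boxes, scores and the 0.5 threshold are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, threshold=0.5):
    """Greedy NMS over all boxes; returns the indices that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) <= threshold]
    return keep


def nms_per_class(boxes, scores, labels, threshold=0.5):
    """Run NMS separately inside each class and merge the kept indices."""
    keep = []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        kept = nms([boxes[i] for i in idx], [scores[i] for i in idx], threshold)
        keep.extend(idx[k] for k in kept)
    return sorted(keep)


# Boxes 0 and 1 overlap heavily but belong to different classes.
boxes = [(0.10, 0.10, 0.50, 0.50), (0.12, 0.12, 0.52, 0.52), (0.60, 0.60, 0.90, 0.90)]
scores = [0.9, 0.8, 0.7]
labels = [0, 1, 1]

joint = nms(boxes, scores)                      # [0, 2]: box 1 is suppressed by box 0
per_class = nms_per_class(boxes, scores, labels)  # [0, 1, 2]: box 1 survives
print(joint, per_class)
```

With joint NMS, an object of one class can suppress an overlapping object of another class. Per-class NMS avoids that at the cost of sometimes keeping two differently labeled boxes on the same object.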
Hello,
I am trying to understand YOLO v1 in detail, but I have some questions about the confidence and the class probability. The confidence is equal to: ground truth label * IOU(pred, truth). To find the ground truth label, we just need an array of size 7x7 in which a cell is set to 1 if the center of an object in the dataset falls inside that cell. But I have some doubts about how to compute the IOU. Do you compute the IOU only when the center of the predicted box and the center of the object (truth) are INSIDE the same cell?
I also have a question regarding P(class|object): if there is no object in a cell, or there are multiple objects, which label do you return during the training step?
Thank you for the help!
Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead
I added the line of code marked below, and more problems appeared. Thanks a lot!
def predict_gpu(model,image_name,root_path=''):
    result = []
    image = cv2.imread(root_path+image_name)
    h,w,_ = image.shape
    img = cv2.resize(image,(448,448))
    img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    mean = (123,117,104)#RGB
    img = img - np.array(mean,dtype=np.float32)
    transform = transforms.Compose([transforms.ToTensor(),])
    img = transform(img)
    img = img.cuda()
    img = torch.unsqueeze(img, dim=0) # the added line
    pred = model(img) #1x7x7x30
    pred = pred.cpu()
    boxes,cls_indexs,probs = decoder(pred)
    for i,box in enumerate(boxes):
        x1 = int(box[0]*w)
        x2 = int(box[2]*w)
        y1 = int(box[1]*h)
        y2 = int(box[3]*h)
        cls_index = cls_indexs[i]
        cls_index = int(cls_index) # convert LongTensor to int
        prob = probs[i]
        prob = float(prob)
        result.append([(x1,y1),(x2,y2),VOC_CLASSES[cls_index],image_name,prob])
    return result
C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master>python eval_voc.py
---prepare target---
---start test---
0%| | 0/4951 [00:00<?, ?it/s]C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
0%| | 4/4951 [00:01<55:21, 1.49it/s]
Traceback (most recent call last):
File "eval_voc.py", line 186, in
result = predict_gpu(model,image_path,root_path='./VOCdevkit/VOC2012/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 148, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 90, in decoder
keep = nms(boxes,probs)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 107, in nms
i = order[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
Please help.
I tried to run train.py without CUDA, but got this error.
Hi, thank you for your reproducible code for YOLO v1.
I was wondering about the structure of your resnet_yolo.py:
def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)
    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    x = self.layer5(x)
    # x = self.avgpool(x)
    # x = x.view(x.size(0), -1)
    # x = self.fc(x)
    x = self.conv_end(x)
    x = self.bn_end(x)
    x = F.sigmoid(x) # normalize to 0-1
    # x = x.view(-1,7,7,30)
    x = x.permute(0,2,3,1) #(-1,7,7,30)
Why is there a 'self.bn_end(x)' at the end of the network?
Is it for faster convergence, and is it critical for performance?
Line 61 in net.py
Error when I run eval_voc.py:
Traceback (most recent call last):
File "eval_voc.py", line 164, in
result = predict_gpu(model,image_path,root_path='E:/yolov1/pytorch_yolov1/data/VOCdevkit/VOC2007/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
File "E:\yolov1\yolores\predict.py", line 126, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "E:\yolov1\yolores\predict.py", line 46, in decoder
if mask[i,j,b] == 1:
IndexError: index 7 is out of bounds for dimension 1 with size 7
After I run eval_voc.py, it reports the error above. How can I solve it?
Thank you for your good code.
I ran into this problem:
File "train.py", line 123, in
for i,(images,target) in enumerate(train_loader):
File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in next
batch = self.collate_fn([self.dataset[i] for i in indices])
File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in
batch = self.collate_fn([self.dataset[i] for i in indices])
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 70, in getitem
img, boxes = self.random_flip(img, boxes)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 257, in random_flip
im_lr = np.fliplr(im).copy()
File "C:\Users\vcvis\AppData\Roaming\Python\Python36\site-packages\numpy\lib\twodim_base.py", line 95, in fliplr
raise ValueError("Input must be >= 2-d.")
ValueError: Input must be >= 2-d.
What should I do?
Please guide me.
Thank you.
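Background
Pre-trained vgg16_bn
After 110 epochs my mAP is only 52%, and it is hard to improve further. But with a pre-trained resnet50, 50 epochs already give 67%. I don't know where the problem is.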
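I don't understand why grid_num is 14 in the code, and why there is a division by 14 at yoloLoss.py line 88.
Could you share how your loss values change over training? I would like to compare them with mine. I am fine-tuning with resnet. Why does my loss change so little?
Hello,
Below is the source code with my comments added. My question is in the comments.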
boxes /= torch.Tensor([w,h,w,h]).expand_as(boxes)
img = self.BGR2RGB(img)
img = self.subMean(img,self.mean)
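# The image is resized to a fixed size because the network input requires it, but after that
# shouldn't the bboxes values be adjusted as well?
# At this point img and bboxes no longer match (I specifically checked self.encoder, and it has no such operation either)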
img = cv2.resize(img,(self.image_size,self.image_size))
target = self.encoder(boxes,labels)
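A possible explanation, based only on the snippet quoted above: the boxes are divided by [w,h,w,h] before the resize, so they are already expressed as fractions of the image width and height. A plain resize scales pixel coordinates and image size by the same factor, so those fractions do not change. The sketch below checks this numerically. The function names, image size and box are made up, and it assumes the resize involves no cropping or padding.

```python
def normalize_boxes(boxes, w, h):
    """Pixel (x1, y1, x2, y2) boxes -> fractions of image width and height."""
    return [(x1 / w, y1 / h, x2 / w, y2 / h) for x1, y1, x2, y2 in boxes]


def resize_boxes(boxes, w, h, new_w, new_h):
    """Scale pixel boxes the same way a plain image resize scales pixels."""
    sx, sy = new_w / w, new_h / h
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]


w, h = 500, 375                   # hypothetical original image size
boxes = [(48, 240, 195, 371)]     # hypothetical pixel box

# Normalize first (the order used in the quoted code) ...
before = normalize_boxes(boxes, w, h)
# ... versus resizing boxes to 448x448 and normalizing afterwards.
after = normalize_boxes(resize_boxes(boxes, w, h, 448, 448), 448, 448)

print(before, after)
```

Both orders give the same normalized box up to floating-point rounding, so no extra adjustment is needed after cv2.resize when the boxes are stored this way.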