Comments (23)
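Unexplained hangs are usually caused by multi-process data loading exhausting system RAM, so having plenty of memory matters; otherwise, reduce num_workers.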
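One rough way to choose a worker count from the RAM that is actually free is sketched below. This is only a heuristic under stated assumptions: `choose_num_workers`, `per_worker_gib`, and `reserve_gib` are hypothetical names that are not part of this repository, and the per-worker memory figure is an estimate you would have to measure on your own dataset.

```python
import os


def choose_num_workers(available_ram_gib, per_worker_gib, reserve_gib=1.0, cpu_count=None):
    """Return the largest worker count whose estimated RAM use fits.

    All inputs are caller-supplied estimates (hypothetical parameters);
    nothing here measures real memory usage.
    """
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1

    # Keep some RAM free for the main process and the OS.
    budget = available_ram_gib - reserve_gib
    if budget <= 0:
        return 0  # 0 means: load data in the main process only

    by_memory = int(budget // per_worker_gib)
    # Never use more workers than CPU cores.
    return max(0, min(by_memory, cpu_count))
```

The returned value can be passed as `num_workers` to the PyTorch `DataLoader`. A result of 0 is the slowest but most memory-frugal setting.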
from yolov4-pytorch.
It is probably a GPU memory OOM, but even with batch_size set to 1 and the input set to 418*418 on a 12 GB Titan X, the problem keeps occurring.
from yolov4-pytorch.
That doesn't make sense. 12 GB cannot run out of GPU memory like that. And the input should be 416x416, right?
from yolov4-pytorch.
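The same dataset (45,000 images, split 9:1 into training and validation sets) trains without problems on ultralytics yolov5x (input_size=640, batch_size=4, with mosaic data augmentation).
I'm not sure whether the author has ever trained on a large dataset.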
from yolov4-pytorch.
What error?
from yolov4-pytorch.
I have trained on COCO.
from yolov4-pytorch.
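What error?
There is no error message. Training just stops there: it neither continues nor exits. It always happens in the first epoch, but at a different iteration each time.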
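For a hang like this with no error message, one way to see where the main process is stuck is a watchdog built on the standard-library `faulthandler` module. The sketch below is not part of this repository; `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical helper names and the 600-second default is an arbitrary assumption.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=600.0, log_file=None):
    """(Re)start a timer that dumps all thread stacks if it ever expires.

    Call this once per training iteration. As long as iterations keep
    finishing, the timer is reset and nothing is printed.
    """
    if log_file is None:
        log_file = sys.stderr
    # Cancel any pending dump before scheduling a new one.
    faulthandler.cancel_dump_traceback_later()
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log_file, exit=False)


def disarm_hang_watchdog():
    """Stop the watchdog, for example at the end of training."""
    faulthandler.cancel_dump_traceback_later()
```

If the loop stalls for longer than `timeout_s`, the stack of every thread in the main process is written to `log_file`, which shows whether it is waiting on the data loader queue or somewhere else. DataLoader worker processes are separate processes and are not covered by this dump.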
from yolov4-pytorch.
A GPU memory OOM would not hang like that. It would raise an error.
from yolov4-pytorch.
Try downloading the code again. I recently switched to a tqdm progress bar; see whether the error still occurs.
from yolov4-pytorch.
A GPU memory OOM would not hang like that. It would raise an error.
OK, I'll try again when I'm back at work on Monday.
from yolov4-pytorch.
It ran out of GPU memory at epoch 13, with input_shape = (416,416) and Batch_size = 4.
Epoch 13/25: 60%|██████ | 6080/10125 [25:21<23:04, 2.92it/s, lr=0.000282, step/s=0.306, total_loss=14.1]
Traceback (most recent call last):
  File "D:\anaconda\lib\multiprocessing\queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "D:\anaconda\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
MemoryError
Epoch 13/25: 60%|██████ | 6088/10125 [25:40<19:00, 3.54it/s, lr=0.000282, step/s=0.248, total_loss=14.1]
from yolov4-pytorch.
In your case it is system memory (RAM) that ran out. You can reduce num_workers.
from yolov4-pytorch.
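In your case it is system memory (RAM) that ran out. You can reduce num_workers.
With num_workers set to 2 the same error occurs. With 0 I ran 3 epochs and gave up because training was far too slow.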
from yolov4-pytorch.
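For training, change Use_Data_Loader to False (it was True). With this change the num_workers and pin_memory parameters no longer take effect. Also reduce batch_size, for example to 8.
For testing, reduce the test batch_size to 1/3 or 1/2 of its original value.
With these settings a 2070S can train without hanging for no apparent reason.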
from yolov4-pytorch.
Setting num_workers to 0 is equivalent to Use_Data_Loader = False.
from yolov4-pytorch.
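I later found that, for training, you can load the pretrained model onto the CPU, max out num_workers, set pin_memory=False, and adjust the training batch_size until about 1 GB of RAM remains free. For validation, use batch_size=Batch_size//5 and keep the other parameters the same.
That works too.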
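The settings described in this comment could look roughly like the sketch below in plain PyTorch. It is an illustration under assumptions, not the repository's training script: `load_weights_on_cpu`, `build_loader`, and `make_loaders` are hypothetical helpers, and only `torch.load(..., map_location="cpu")` and the `DataLoader` arguments are standard PyTorch.

```python
import io

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def load_weights_on_cpu(model, checkpoint):
    """Load a state_dict with every tensor mapped to CPU memory."""
    state_dict = torch.load(checkpoint, map_location="cpu")
    model.load_state_dict(state_dict)
    return model


def build_loader(dataset, batch_size, num_workers, shuffle=True):
    """Build a DataLoader with pinned memory disabled."""
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle,
                      num_workers=num_workers, pin_memory=False)


def make_loaders(train_set, val_set, batch_size, num_workers):
    """Use batch_size for training and batch_size // 5 for validation."""
    val_batch_size = max(1, batch_size // 5)
    train_loader = build_loader(train_set, batch_size, num_workers)
    val_loader = build_loader(val_set, val_batch_size, num_workers, shuffle=False)
    return train_loader, val_loader
```

On Windows, code that iterates a loader with `num_workers > 0` has to run under an `if __name__ == "__main__":` guard, because worker processes are started by re-importing the script.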
from yolov4-pytorch.
Use_Data_Loader
Where is the Use_Data_Loader parameter?
from yolov4-pytorch.
That parameter has been removed. PyTorch's DataLoader is now used automatically.
from yolov4-pytorch.
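I've hit the same problem: training stops after a fixed number of epochs without reporting an error. When I resume, it stops again after the same number of epochs. I have ruled out insufficient GPU memory. Could anyone help me figure out how to solve this?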
from yolov4-pytorch.
I've run into the same problem. Has anyone solved it yet?
from yolov4-pytorch.
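Unexplained hangs are usually caused by multi-process data loading exhausting system RAM, so having plenty of memory matters; otherwise, reduce num_workers.
Hi, can a memory blow-up in this situation be detected, for example with the command watch free -h?
Or does the system automatically release the memory after the hang?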
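One way to check this without watching a terminal is to log available RAM from inside the training loop, so a record survives even if the memory is freed later. The sketch below assumes Linux, where `/proc/meminfo` exposes the data behind `free`; `parse_meminfo`, `available_gib`, and `read_available_gib` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value}.

    Values are the raw integers from the file (kibibytes for most fields).
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_gib(text):
    """Return MemAvailable in GiB, or None if the field is absent."""
    kib = parse_meminfo(text).get("MemAvailable")
    return None if kib is None else kib / (1024 ** 2)


def read_available_gib(path="/proc/meminfo"):
    """Read the current value from the running Linux system."""
    with open(path) as f:
        return available_gib(f.read())
```

Printing `read_available_gib()` every few hundred iterations shows whether available memory trends toward zero before the hang.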
from yolov4-pytorch.
Total Loss: 0.032 || Val Loss: 0.024
Save best model to best_epoch_weights.pth
Start Train
Epoch 52/200: 94%|▉| 1558/1660 [05:59<00:22, 4.55it/s, loss=0.0319, lr=0.00053
Process finished with exit code -1
My v7 run stops by itself at epoch 52. What is the cause?
from yolov4-pytorch.
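Has this been solved? I have the same problem: the code stops without showing any error message. I have been troubleshooting for a long time and still cannot find the cause!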
from yolov4-pytorch.