Comments (15)
@Laughing-q
I can now train YOLOv8 on my custom dataset after setting copy_paste=0 in model.train().
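The workaround described above can be sketched as follows. This is a minimal sketch, assuming the installed version accepts `copy_paste` (spelled with an underscore, as in the training logs in this thread) as a keyword argument to `model.train()`; the weights file and dataset path are placeholders, not values taken from that comment.

```python
# Training overrides passed as keyword arguments (placeholder paths).
# copy_paste=0.0 is meant to switch the copy-paste augmentation off.
overrides = {
    "data": "./data.yaml",  # placeholder dataset config
    "epochs": 100,
    "copy_paste": 0.0,
}


def train(overrides):
    """Start training with the given overrides (requires the ultralytics package)."""
    from ultralytics import YOLO  # imported lazily so the dict can be built without it

    model = YOLO("yolov8l.pt")  # placeholder weights
    return model.train(**overrides)


# Usage (not executed here): results = train(overrides)
```

The default config quoted later in this thread documents `conf` as the object confidence threshold, so it is probably not the right place to pass a YAML path; passing individual settings as keyword arguments avoids that ambiguity.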
from ultralytics.
@menglongyue can you try it with the flag v5loader=True?
from ultralytics.
I just tried it: with v5loader=True set, the problem did not occur again. Thank you very much! This might be a bug in YOLOv8.
from ultralytics.
@menglongyue OK, got it! Could you please tell me more about this issue? For example, are there negative labels or empty labels in your custom dataset? I'd like to reproduce your issue and solve it. :)
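A quick way to answer that question is to scan the label folder. The sketch below is only an illustration: it assumes YOLO-style text labels (one object per line, class index first, then whitespace-separated normalized coordinates), and `scan_labels` is a hypothetical helper, not part of the library. The temporary files exist only to make the example runnable.

```python
import tempfile
from pathlib import Path


def scan_labels(label_dir):
    """Return (empty, negative): names of label files that are empty or contain a negative value."""
    empty, negative = [], []
    for path in sorted(Path(label_dir).glob("*.txt")):
        rows = [ln.split() for ln in path.read_text().splitlines() if ln.strip()]
        if not rows:
            empty.append(path.name)
        elif any(float(v) < 0 for row in rows for v in row):
            negative.append(path.name)
    return empty, negative


# Tiny demo on throwaway files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("0 0.5 0.5 0.2 0.2\n")   # normal box label
    Path(tmp, "b.txt").write_text("")                      # empty label file
    Path(tmp, "c.txt").write_text("1 -0.1 0.5 0.2 0.2\n")  # negative coordinate
    empty, negative = scan_labels(tmp)

print(empty, negative)
```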
from ultralytics.
OK, the training log is as follows. Hope it's useful for you:
`
yolo/engine/trainer: task=detect, mode=train, model=yolov8l.yaml, data=FLIR_rgb.yaml, epochs=300, patience=50, batch=8, imgsz=640, save=True, cache=False, device=0,1, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, retina_masks=False, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, hydra={'output_subdir': None, 'run': {'dir': '.'}}, v5loader=True, save_dir=runs/detect/train9
Ultralytics YOLOv8.0.3 🚀 Python-3.8.15 torch-1.12.0+cu102 CUDA:0 (GeForce RTX 2080 Ti, 11019MiB)
CUDA:1 (GeForce RTX 2080 Ti, 11019MiB)
Overriding model.yaml nc=80 with nc=3
from n params module arguments
0 -1 1 1856 ultralytics.nn.modules.Conv [3, 64, 3, 2]
1 -1 1 73984 ultralytics.nn.modules.Conv [64, 128, 3, 2]
2 -1 3 279808 ultralytics.nn.modules.C2f [128, 128, 3, True]
3 -1 1 295424 ultralytics.nn.modules.Conv [128, 256, 3, 2]
4 -1 6 2101248 ultralytics.nn.modules.C2f [256, 256, 6, True]
5 -1 1 1180672 ultralytics.nn.modules.Conv [256, 512, 3, 2]
6 -1 6 8396800 ultralytics.nn.modules.C2f [512, 512, 6, True]
7 -1 1 2360320 ultralytics.nn.modules.Conv [512, 512, 3, 2]
8 -1 3 4461568 ultralytics.nn.modules.C2f [512, 512, 3, True]
9 -1 1 656896 ultralytics.nn.modules.SPPF [512, 512, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.Concat [1]
12 -1 3 4723712 ultralytics.nn.modules.C2f [1024, 512, 3]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.Concat [1]
15 -1 3 1247744 ultralytics.nn.modules.C2f [768, 256, 3]
16 -1 1 590336 ultralytics.nn.modules.Conv [256, 256, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.Concat [1]
18 -1 3 4592640 ultralytics.nn.modules.C2f [768, 512, 3]
19 -1 1 2360320 ultralytics.nn.modules.Conv [512, 512, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.Concat [1]
21 -1 3 4723712 ultralytics.nn.modules.C2f [1024, 512, 3]
22 [15, 18, 21] 1 5585113 ultralytics.nn.modules.Detect [3, [256, 512, 512]]
YOLOv8l summary: 365 layers, 43632153 parameters, 43632137 gradients, 165.4 GFLOPs
DDP settings: RANK 0, WORLD_SIZE 2, DEVICE cuda:0
[2023-01-11 14:48:27,614][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2023-01-11 14:48:27,617][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 1
[2023-01-11 14:48:27,617][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
[2023-01-11 14:48:27,625][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
optimizer: SGD(lr=0.01) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.0005), 103 bias
train: Scanning /data1/huangqj/YOLO/ultralytics/train_rgb... 3598 images, 0 backgrounds, 1 corrupt: 100%|██████████| 3598/3598 [00:04<00:00, 831.82it$
train: WARNING
0016]
train: WARNING
train: WARNING
train: WARNING
train: WARNING
train: WARNING
train: New cache created: /data1/huangqj/YOLO/ultralytics/train_rgb.cache
val: Scanning /data1/huangqj/YOLO/ultralytics/val_rgb... 1543 images, 0 backgrounds, 0 corrupt: 100%|██████████| 1543/1543 [00:03<00:00, 441.33it/s]
val: New cache created: /data1/huangqj/YOLO/ultralytics/val_rgb.cache
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train10
Starting training for 300 epochs...`
from ultralytics.
Hi @Laughing-q, I'm having the same problem while training on a custom dataset. Using v5loader=True didn't help.
Train code
from ultralytics import YOLO
# Load a model
model = YOLO("yolov8l.pt") # load a pretrained model (recommended for training)
# Use the model
results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml") # train the model
My data.yml
(copied from my yolov5 project)
train: /home/huy/projects/scratch/data/v7/train/images
val: /home/huy/projects/scratch/data/v7/val/images
nc: 3
names: ["0", "1", "2"]
My config default.yml
# Ultralytics YOLO 🚀, GPL-3.0 license
# Default training settings and hyperparameters for medium-augmentation COCO training
task: "detect" # choices=['detect', 'segment', 'classify', 'init'] # init is a special case. Specify task to run.
mode: "train" # choices=['train', 'val', 'predict'] # mode to run task in.
# Train settings -------------------------------------------------------------------------------------------------------
model: yolov8l.py # i.e. yolov8n.pt, yolov8n.yaml. Path to model file
data: /home/huy/ssd/scratch_yolov8/data.yaml # i.e. coco128.yaml. Path to data file
epochs: 100 # number of epochs to train for
patience: 50 # TODO: epochs to wait for no observable improvement for early stopping of training
batch: 8 # number of images per batch
imgsz: 640 # size of input images
save: True # save checkpoints
cache: True # True/ram, disk or False. Use cache for data loading
device: 0,1 # cuda device, i.e. 0 or 0,1,2,3 or cpu. Device to run on
workers: 8 # number of worker threads for data loading
project: null # project name
name: null # experiment name
exist_ok: False # whether to overwrite existing experiment
pretrained: True # whether to use a pretrained model
optimizer: 'SGD' # optimizer to use, choices=['SGD', 'Adam', 'AdamW', 'RMSProp']
verbose: False # whether to print verbose output
seed: 0 # random seed for reproducibility
deterministic: True # whether to enable deterministic mode
single_cls: False # train multi-class data as single-class
image_weights: False # use weighted image selection for training
rect: False # support rectangular training
cos_lr: False # use cosine learning rate scheduler
close_mosaic: 10 # disable mosaic augmentation for final 10 epochs
resume: False # resume training from last checkpoint
# Segmentation
overlap_mask: True # masks should overlap during training
mask_ratio: 4 # mask downsample ratio
# Classification
dropout: 0.0 # use dropout regularization
# Val/Test settings ----------------------------------------------------------------------------------------------------
val: True # validate/test during training
save_json: False # save results to JSON file
save_hybrid: False # save hybrid version of labels (labels + additional predictions)
conf: null # object confidence threshold for detection (default 0.25 predict, 0.001 val)
iou: 0.7 # intersection over union (IoU) threshold for NMS
max_det: 300 # maximum number of detections per image
half: False # use half precision (FP16)
dnn: False # use OpenCV DNN for ONNX inference
plots: True # show plots during training
# Prediction settings --------------------------------------------------------------------------------------------------
source: null # source directory for images or videos
show: False # show results if possible
save_txt: False # save results as .txt file
save_conf: False # save results with confidence scores
save_crop: False # save cropped images with results
hide_labels: False # hide labels
hide_conf: False # hide confidence scores
vid_stride: 1 # video frame-rate stride
line_thickness: 3 # bounding box thickness (pixels)
visualize: False # visualize results
augment: False # apply data augmentation to images
agnostic_nms: False # class-agnostic NMS
retina_masks: False # use retina masks for object detection
# Export settings ------------------------------------------------------------------------------------------------------
format: torchscript # format to export to
keras: False # use Keras
optimize: False # TorchScript: optimize for mobile
int8: False # CoreML/TF INT8 quantization
dynamic: False # ONNX/TF/TensorRT: dynamic axes
simplify: False # ONNX: simplify model
opset: 17 # ONNX: opset version
workspace: 4 # TensorRT: workspace size (GB)
nms: False # CoreML: add NMS
# Hyperparameters ------------------------------------------------------------------------------------------------------
lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.01 # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937 # SGD momentum/Adam beta1
weight_decay: 0.0005 # optimizer weight decay 5e-4
warmup_epochs: 3.0 # warmup epochs (fractions ok)
warmup_momentum: 0.8 # warmup initial momentum
warmup_bias_lr: 0.1 # warmup initial bias lr
box: 7.5 # box loss gain
cls: 0.5 # cls loss gain (scale with pixels)
dfl: 1.5 # dfl loss gain
fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
label_smoothing: 0.0
nbs: 64 # nominal batch size
hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)
degrees: 0.0 # image rotation (+/- deg)
translate: 0.1 # image translation (+/- fraction)
scale: 0.5 # image scale (+/- gain)
shear: 0.0 # image shear (+/- deg)
perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)
mosaic: 1.0 # image mosaic (probability)
mixup: 0.0 # image mixup (probability)
copy_paste: 0.0 # segment copy-paste (probability)
# Hydra configs --------------------------------------------------------------------------------------------------------
hydra:
  output_subdir: null # disable hydra directory creation
  run:
    dir: .
# Debug, do not modify -------------------------------------------------------------------------------------------------
v5loader: True # use legacy YOLOv5 dataloader
Error logs
yolo/engine/trainer: task=detect, mode=train, model=yolov8l.yaml, data=./data.yaml, epochs=100, patience=50, batch=16, imgsz=640, save=True, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, overlap_mask=True, mask_ratio=4, dropout=False, val=True, save_json=False, save_hybrid=False, conf=./default.yaml, iou=0.7, max_det=300, half=True, dnn=False, plots=False, source=ultralytics/assets/, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, retina_masks=False, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.001, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.15, copy_paste=0.3, hydra={'output_subdir': None, 'run': {'dir': '.'}}, v5loader=True, save_dir=runs/detect/train3
Ultralytics YOLOv8.0.3 🚀 Python-3.9.15 torch-1.12.0 CUDA:0 (NVIDIA GeForce RTX 3090, 24268MiB)
Overriding model.yaml nc=80 with nc=3
from n params module arguments
0 -1 1 1856 ultralytics.nn.modules.Conv [3, 64, 3, 2]
1 -1 1 73984 ultralytics.nn.modules.Conv [64, 128, 3, 2]
2 -1 3 279808 ultralytics.nn.modules.C2f [128, 128, 3, True]
3 -1 1 295424 ultralytics.nn.modules.Conv [128, 256, 3, 2]
4 -1 6 2101248 ultralytics.nn.modules.C2f [256, 256, 6, True]
5 -1 1 1180672 ultralytics.nn.modules.Conv [256, 512, 3, 2]
6 -1 6 8396800 ultralytics.nn.modules.C2f [512, 512, 6, True]
7 -1 1 2360320 ultralytics.nn.modules.Conv [512, 512, 3, 2]
8 -1 3 4461568 ultralytics.nn.modules.C2f [512, 512, 3, True]
9 -1 1 656896 ultralytics.nn.modules.SPPF [512, 512, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.Concat [1]
12 -1 3 4723712 ultralytics.nn.modules.C2f [1024, 512, 3]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.Concat [1]
15 -1 3 1247744 ultralytics.nn.modules.C2f [768, 256, 3]
16 -1 1 590336 ultralytics.nn.modules.Conv [256, 256, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.Concat [1]
18 -1 3 4592640 ultralytics.nn.modules.C2f [768, 512, 3]
19 -1 1 2360320 ultralytics.nn.modules.Conv [512, 512, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.Concat [1]
21 -1 3 4723712 ultralytics.nn.modules.C2f [1024, 512, 3]
22 [15, 18, 21] 1 5585113 ultralytics.nn.modules.Detect [3, [256, 512, 512]]
Model summary: 365 layers, 43632153 parameters, 43632137 gradients, 165.4 GFLOPs
Transferred 589/595 items from pretrained weights
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Overriding model.yaml nc=80 with nc=3
from n params module arguments
0 -1 1 1856 ultralytics.nn.modules.Conv [3, 64, 3, 2]
1 -1 1 73984 ultralytics.nn.modules.Conv [64, 128, 3, 2]
2 -1 3 279808 ultralytics.nn.modules.C2f [128, 128, 3, True]
3 -1 1 295424 ultralytics.nn.modules.Conv [128, 256, 3, 2]
4 -1 6 2101248 ultralytics.nn.modules.C2f [256, 256, 6, True]
5 -1 1 1180672 ultralytics.nn.modules.Conv [256, 512, 3, 2]
6 -1 6 8396800 ultralytics.nn.modules.C2f [512, 512, 6, True]
7 -1 1 2360320 ultralytics.nn.modules.Conv [512, 512, 3, 2]
8 -1 3 4461568 ultralytics.nn.modules.C2f [512, 512, 3, True]
9 -1 1 656896 ultralytics.nn.modules.SPPF [512, 512, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.Concat [1]
12 -1 3 4723712 ultralytics.nn.modules.C2f [1024, 512, 3]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.Concat [1]
15 -1 3 1247744 ultralytics.nn.modules.C2f [768, 256, 3]
16 -1 1 590336 ultralytics.nn.modules.Conv [256, 256, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.Concat [1]
18 -1 3 4592640 ultralytics.nn.modules.C2f [768, 512, 3]
19 -1 1 2360320 ultralytics.nn.modules.Conv [512, 512, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.Concat [1]
21 -1 3 4723712 ultralytics.nn.modules.C2f [1024, 512, 3]
22 [15, 18, 21] 1 5585113 ultralytics.nn.modules.Detect [3, [256, 512, 512]]
Model summary: 365 layers, 43632153 parameters, 43632137 gradients, 165.4 GFLOPs
Transferred 589/595 items from pretrained weights
DDP settings: RANK 0, WORLD_SIZE 2, DEVICE cuda:0
optimizer: SGD(lr=0.01) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.001), 103 bias
train: Scanning /home/huy/projects/scratch/data/v7/train/labels.cache... 2347 images, 136 backgrounds, 0 corrupt: 100%|██████████| 2480/2480 [00:00<?, ?it/s]
val: Scanning /home/huy/projects/scratch/data/v7/val/labels.cache... 508 images, 0 backgrounds, 0 corrupt: 100%|██████████| 508/508 [00:00<?, ?it/s]
Image sizes 640 train, 640 val
Using 16 dataloader workers
Logging results to runs/detect/train4
Starting training for 100 epochs...
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/100 5.28G 2.071 5.688 1.948 50 640: 2%|▏ | 3/155 [00:05<03:05, 1.22s/it]
Traceback (most recent call last):
File "/home/huy/ssd/scratch_yolov8/train.py", line 8, in <module>
results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml") # train the model
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/model.py", line 193, in train
self.trainer.train()
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 177, in train
self._do_train(int(os.getenv("RANK", -1)), world_size)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 275, in _do_train
for i, batch in pbar:
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
data = self._next_data()
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1327, in _next_data
return self._process_data(data)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 4.
Original Traceback (most recent call last):
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5loader.py", line 664, in __getitem__
img, labels = self.load_mosaic(index)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5loader.py", line 799, in load_mosaic
img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5augmentations.py", line 255, in copy_paste
l, box, s = labels[j], boxes[j], segments[j]
IndexError: list index out of range
1/100 5.28G 2.108 5.806 1.969 45 640: 3%|▎ | 4/155 [00:05<03:22, 1.34s/it]
Traceback (most recent call last):
File "/home/huy/ssd/scratch_yolov8/train.py", line 8, in <module>
results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml") # train the model
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/model.py", line 193, in train
self.trainer.train()
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 177, in train
self._do_train(int(os.getenv("RANK", -1)), world_size)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 275, in _do_train
for i, batch in pbar:
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
data = self._next_data()
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1327, in _next_data
return self._process_data(data)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 4.
Original Traceback (most recent call last):
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5loader.py", line 664, in __getitem__
img, labels = self.load_mosaic(index)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5loader.py", line 799, in load_mosaic
img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5augmentations.py", line 255, in copy_paste
l, box, s = labels[j], boxes[j], segments[j]
IndexError: list index out of range
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 12076) of binary: /home/huy/anaconda3/envs/yolov8/bin/python
Traceback (most recent call last):
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/run.py", line 765, in <module>
main()
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/run.py", line 761, in main
run(args)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/huy/ssd/scratch_yolov8/train.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2023-01-11_15:44:53
host : huy-money-maker
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 12077)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-01-11_15:44:53
host : huy-money-maker
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 12076)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
from ultralytics.
There may be some bugs in the data loading; waiting for the official fix.
from ultralytics.
@LNTH Did this error happen with both v5loader=False and v5loader=True?
@menglongyue I've already fixed your issue; I'll make a PR after I fix his issue. Thanks again for reporting! :)
from ultralytics.
@LNTH looks like you're using copy_paste; you have to make sure all your labels are segment labels. Then it should work correctly.
EDIT: can you try to train coco128 or coco128-seg with the same command?
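The traceback above fails on the line `l, box, s = labels[j], boxes[j], segments[j]`. One plausible way for that line to raise, sketched here with toy functions that are NOT the Ultralytics implementation: if only some objects carry polygon segments, the segments list is shorter than the label list, so an index that is valid for the labels can overrun the segments. The names `pick_pairs` and `pick_pairs_safe` are hypothetical.

```python
def pick_pairs(labels, segments, indexes):
    """Toy stand-in for the failing line: pair each chosen label with its segment."""
    return [(labels[j], segments[j]) for j in indexes]


def pick_pairs_safe(labels, segments, indexes):
    """Same idea, but skip the augmentation when labels and segments do not line up."""
    if len(segments) != len(labels):
        return []
    return pick_pairs(labels, segments, indexes)


labels = ["box0", "box1", "box2"]  # three objects in the mosaic
segments = ["poly0"]               # only the first object has a polygon

try:
    pick_pairs(labels, segments, indexes=[0, 2])
    error = None
except IndexError as exc:
    error = str(exc)

print(error)  # list index out of range
print(pick_pairs_safe(labels, segments, indexes=[0, 2]))  # []
```

Under this reading, either every label needs a segment or copy-paste has to be disabled, which matches the advice given in this thread.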
from ultralytics.
@menglongyue hi, this issue has been fixed by #249.
from ultralytics.
- What does "your labels are segment labels" mean?
- How can I override the training config? I used
results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml")
and set copy_paste to 0 inside my default.yaml, but it didn't work.
When I run results = model.train(data="coco128", epochs=100), I get this error (coco128-seg fails the same way):
Dataset not found ⚠️, missing path /home/huy/ssd/scratch_yolov8/datasets/coco128, attempting download...
Downloading https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip to /home/huy/ssd/scratch_yolov8/datasets/coco128.zip...
100%|██████████| 6.66M/6.66M [00:00<00:00, 22.3MB/s]
Unzipping /home/huy/ssd/scratch_yolov8/datasets/coco128.zip...
Dataset download success ✅ (1.2s), saved to /home/huy/ssd/scratch_yolov8/datasets/coco128
Traceback (most recent call last):
File "/home/huy/ssd/scratch_yolov8/train.py", line 8, in <module>
results = model.train(data="coco128", epochs=100) # train the model
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/model.py", line 189, in train
self.trainer = self.TrainerClass(overrides=overrides)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 128, in __init__
self.data = check_dataset(self.data)
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/utils.py", line 291, in check_dataset
names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()] # class names list
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/utils.py", line 291, in <listcomp>
names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()] # class names list
File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/pathlib.py", line 1160, in iterdir
for name in self._accessor.listdir(self):
FileNotFoundError: [Errno 2] No such file or directory: '/home/huy/ssd/scratch_yolov8/datasets/coco128/train'
After I manually created the train folder, I got a new error stating that I don't have valid data.
This is the folder structure of coco128 (automatically downloaded and unzipped):
datasets
|__coco128
   |__images
   |  |__train2017
   |__labels
      |__train2017
from ultralytics.
@LNTH can you try this?
results = model.train(data="coco128.yaml", epochs=100)
from ultralytics.
@Laughing-q Both coco128.yaml and coco128-seg.yaml work (I stopped at 10 epochs).
What does "your labels are segment labels" mean? Do you mean the same label format as YOLOv5 seg?
from ultralytics.
@LNTH yes, the same label format as yolov5 seg. :)
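To check which format a label file uses, a line-level classifier can help. This is a minimal sketch, assuming the usual YOLOv5 conventions: a box line is `class x_center y_center width height` (5 values), and a segment line is `class x1 y1 x2 y2 ... xn yn` with at least three points (7 or more values, odd count). `label_kind` is a hypothetical helper, not a library function.

```python
def label_kind(line):
    """Classify one label line as 'box', 'segment', or 'invalid'."""
    parts = line.split()
    try:
        [float(v) for v in parts]  # every field must be numeric
    except ValueError:
        return "invalid"
    if len(parts) == 5:
        return "box"
    if len(parts) >= 7 and len(parts) % 2 == 1:
        return "segment"
    return "invalid"


print(label_kind("0 0.5 0.5 0.2 0.3"))           # box
print(label_kind("0 0.1 0.1 0.9 0.1 0.5 0.9"))   # segment
```

Running this over every line of every label file would show whether a dataset mixes box and segment labels, which is the situation the copy_paste advice in this thread is about.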
from ultralytics.