
LNTH commented on May 18, 2024

@Laughing-q
I can train YOLOv8 on my custom dataset now that I've set copy_paste=0 in model.train().

from ultralytics.

Laughing-q commented on May 18, 2024

@menglongyue can you try it with flag v5loader=True?

menglongyue commented on May 18, 2024

> @menglongyue can you try it with flag v5loader=True?

I just tried it: with v5loader=True, the problem no longer occurs. Thank you very much! This might be a bug in YOLOv8.

Laughing-q commented on May 18, 2024

@menglongyue OK, got it! Could you tell me more about this issue? For example, are there negative labels or empty labels in your custom dataset? I'd like to reproduce your issue and solve it. :)

menglongyue commented on May 18, 2024

> @menglongyue OK, got it! Could you tell me more about this issue? For example, are there negative labels or empty labels in your custom dataset? I'd like to reproduce your issue and solve it. :)

OK, here is the training log. I hope it's useful for you:

```
yolo/engine/trainer: task=detect, mode=train, model=yolov8l.yaml, data=FLIR_rgb.yaml, epochs=300, patience=50, batch=8, imgsz=640, save=True, cache=False, device=0,1, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, retina_masks=False, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, hydra={'output_subdir': None, 'run': {'dir': '.'}}, v5loader=True, save_dir=runs/detect/train9
Ultralytics YOLOv8.0.3 🚀 Python-3.8.15 torch-1.12.0+cu102 CUDA:0 (GeForce RTX 2080 Ti, 11019MiB)
CUDA:1 (GeForce RTX 2080 Ti, 11019MiB)

Overriding model.yaml nc=80 with nc=3

                   from  n    params  module                                       arguments
  0                  -1  1      1856  ultralytics.nn.modules.Conv                  [3, 64, 3, 2]
  1                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]
  2                  -1  3    279808  ultralytics.nn.modules.C2f                   [128, 128, 3, True]
  3                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]
  4                  -1  6   2101248  ultralytics.nn.modules.C2f                   [256, 256, 6, True]
  5                  -1  1   1180672  ultralytics.nn.modules.Conv                  [256, 512, 3, 2]
  6                  -1  6   8396800  ultralytics.nn.modules.C2f                   [512, 512, 6, True]
  7                  -1  1   2360320  ultralytics.nn.modules.Conv                  [512, 512, 3, 2]
  8                  -1  3   4461568  ultralytics.nn.modules.C2f                   [512, 512, 3, True]
  9                  -1  1    656896  ultralytics.nn.modules.SPPF                  [512, 512, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]
 12                  -1  3   4723712  ultralytics.nn.modules.C2f                   [1024, 512, 3]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]
 15                  -1  3   1247744  ultralytics.nn.modules.C2f                   [768, 256, 3]
 16                  -1  1    590336  ultralytics.nn.modules.Conv                  [256, 256, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]
 18                  -1  3   4592640  ultralytics.nn.modules.C2f                   [768, 512, 3]
 19                  -1  1   2360320  ultralytics.nn.modules.Conv                  [512, 512, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]
 21                  -1  3   4723712  ultralytics.nn.modules.C2f                   [1024, 512, 3]
 22        [15, 18, 21]  1   5585113  ultralytics.nn.modules.Detect                [3, [256, 512, 512]]
YOLOv8l summary: 365 layers, 43632153 parameters, 43632137 gradients, 165.4 GFLOPs

DDP settings: RANK 0, WORLD_SIZE 2, DEVICE cuda:0
[2023-01-11 14:48:27,614][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2023-01-11 14:48:27,617][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 1
[2023-01-11 14:48:27,617][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
[2023-01-11 14:48:27,625][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
optimizer: SGD(lr=0.01) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.0005), 103 bias
train: Scanning /data1/huangqj/YOLO/ultralytics/train_rgb... 3598 images, 0 backgrounds, 1 corrupt: 100%|██████████| 3598/3598 [00:04<00:00, 831.82it/s]
train: WARNING ⚠️ /data1/huangqj/FLIR/rgb/images/FLIR_03093_RGB.jpg: ignoring corrupt image/label: non-normalized or out of bounds coordinates [ 1$
0016]
train: WARNING ⚠️ /data1/huangqj/FLIR/rgb/images/FLIR_07226_RGB.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /data1/huangqj/FLIR/rgb/images/FLIR_07525_RGB.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /data1/huangqj/FLIR/rgb/images/FLIR_09042_RGB.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /data1/huangqj/FLIR/rgb/images/FLIR_09055_RGB.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /data1/huangqj/FLIR/rgb/images/FLIR_09653_RGB.jpg: 1 duplicate labels removed
train: New cache created: /data1/huangqj/YOLO/ultralytics/train_rgb.cache
val: Scanning /data1/huangqj/YOLO/ultralytics/val_rgb... 1543 images, 0 backgrounds, 0 corrupt: 100%|██████████| 1543/1543 [00:03<00:00, 441.33it/s]
val: New cache created: /data1/huangqj/YOLO/ultralytics/val_rgb.cache
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train10
Starting training for 300 epochs...
```
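Laughing-q's question about negative or empty labels maps onto the warnings in this scan. One file was rejected for non-normalized or out-of-bounds coordinates, five files had a duplicate label removed, and the scan also reports a count of background images (label files with no objects), which is 0 here. The following is a minimal sketch for checking a label file before training. It assumes the usual YOLO text format: one object per line, a class index, then either four normalized box values or an even number (at least six) of polygon coordinates. `check_yolo_label_text` is a hypothetical helper, not an Ultralytics API.

```python
from typing import Dict, List, Set


def check_yolo_label_text(text: str) -> Dict[str, object]:
    """Check the contents of one YOLO-format label file (hypothetical helper).

    Assumes one object per line: a class index followed by normalized (0-1)
    coordinates, either 4 values (box: x_center y_center width height) or an
    even number >= 6 of values (polygon: x1 y1 x2 y2 ...).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    out_of_bounds: List[int] = []
    kinds: Set[str] = set()
    seen = set()
    duplicates = 0
    for i, (cls, *coords) in enumerate(rows):
        values = [float(v) for v in coords]
        if len(values) == 4:
            kinds.add("box")
        elif len(values) >= 6 and len(values) % 2 == 0:
            kinds.add("segment")
        else:
            kinds.add("malformed")
        if any(v < 0.0 or v > 1.0 for v in values):
            out_of_bounds.append(i)  # non-normalized or out-of-bounds coordinates
        key = (int(float(cls)), *values)
        if key in seen:
            duplicates += 1  # exact repeat of an earlier row
        else:
            seen.add(key)
    return {
        "empty": not rows,  # no objects: the image is a background (negative) sample
        "kinds": kinds,
        "out_of_bounds": out_of_bounds,
        "duplicates": duplicates,
    }
```

Collecting the `kinds` set over all label files is also a quick way to see whether a dataset mixes box and polygon labels.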

LNTH commented on May 18, 2024

Hi @Laughing-q, I'm having the same problem while training on a custom dataset. Using v5loader=True didn't help.

Training code:

```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8l.pt")  # load a pretrained model (recommended for training)

# Use the model
results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml")  # train the model
```

My data.yaml (copied from my YOLOv5 project):

```yaml
train: /home/huy/projects/scratch/data/v7/train/images
val: /home/huy/projects/scratch/data/v7/val/images

nc: 3
names: ["0", "1", "2"]
```

My config default.yaml:

```yaml
# Ultralytics YOLO 🚀, GPL-3.0 license
# Default training settings and hyperparameters for medium-augmentation COCO training

task: "detect" # choices=['detect', 'segment', 'classify', 'init'] # init is a special case. Specify task to run.
mode: "train" # choices=['train', 'val', 'predict'] # mode to run task in.

# Train settings -------------------------------------------------------------------------------------------------------
model: yolov8l.py # i.e. yolov8n.pt, yolov8n.yaml. Path to model file
data: /home/huy/ssd/scratch_yolov8/data.yaml # i.e. coco128.yaml. Path to data file
epochs: 100 # number of epochs to train for
patience: 50  # TODO: epochs to wait for no observable improvement for early stopping of training
batch: 8 # number of images per batch
imgsz: 640 # size of input images
save: True # save checkpoints
cache: True # True/ram, disk or False. Use cache for data loading
device: 0,1 # cuda device, i.e. 0 or 0,1,2,3 or cpu. Device to run on
workers: 8 # number of worker threads for data loading
project: null # project name
name: null # experiment name
exist_ok: False # whether to overwrite existing experiment
pretrained: True # whether to use a pretrained model
optimizer: 'SGD' # optimizer to use, choices=['SGD', 'Adam', 'AdamW', 'RMSProp']
verbose: False # whether to print verbose output
seed: 0 # random seed for reproducibility
deterministic: True # whether to enable deterministic mode
single_cls: False # train multi-class data as single-class
image_weights: False # use weighted image selection for training
rect: False # support rectangular training
cos_lr: False # use cosine learning rate scheduler
close_mosaic: 10 # disable mosaic augmentation for final 10 epochs
resume: False # resume training from last checkpoint
# Segmentation
overlap_mask: True # masks should overlap during training
mask_ratio: 4 # mask downsample ratio
# Classification
dropout: 0.0  # use dropout regularization

# Val/Test settings ----------------------------------------------------------------------------------------------------
val: True # validate/test during training
save_json: False # save results to JSON file
save_hybrid: False # save hybrid version of labels (labels + additional predictions)
conf: null # object confidence threshold for detection (default 0.25 predict, 0.001 val)
iou: 0.7 # intersection over union (IoU) threshold for NMS
max_det: 300 # maximum number of detections per image
half: False # use half precision (FP16)
dnn: False # use OpenCV DNN for ONNX inference
plots: True # show plots during training

# Prediction settings --------------------------------------------------------------------------------------------------
source: null # source directory for images or videos
show: False # show results if possible
save_txt: False # save results as .txt file
save_conf: False # save results with confidence scores
save_crop: False # save cropped images with results
hide_labels: False # hide labels
hide_conf: False # hide confidence scores
vid_stride: 1 # video frame-rate stride
line_thickness: 3 # bounding box thickness (pixels)
visualize: False # visualize results
augment: False # apply data augmentation to images
agnostic_nms: False # class-agnostic NMS
retina_masks: False # use retina masks for object detection

# Export settings ------------------------------------------------------------------------------------------------------
format: torchscript # format to export to
keras: False  # use Keras
optimize: False  # TorchScript: optimize for mobile
int8: False  # CoreML/TF INT8 quantization
dynamic: False  # ONNX/TF/TensorRT: dynamic axes
simplify: False  # ONNX: simplify model
opset: 17  # ONNX: opset version
workspace: 4  # TensorRT: workspace size (GB)
nms: False  # CoreML: add NMS

# Hyperparameters ------------------------------------------------------------------------------------------------------
lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.01  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum/Adam beta1
weight_decay: 0.0005  # optimizer weight decay 5e-4
warmup_epochs: 3.0  # warmup epochs (fractions ok)
warmup_momentum: 0.8  # warmup initial momentum
warmup_bias_lr: 0.1  # warmup initial bias lr
box: 7.5  # box loss gain
cls: 0.5  # cls loss gain (scale with pixels)
dfl: 1.5  # dfl loss gain
fl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)
label_smoothing: 0.0
nbs: 64  # nominal batch size
hsv_h: 0.015  # image HSV-Hue augmentation (fraction)
hsv_s: 0.7  # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4  # image HSV-Value augmentation (fraction)
degrees: 0.0  # image rotation (+/- deg)
translate: 0.1  # image translation (+/- fraction)
scale: 0.5  # image scale (+/- gain)
shear: 0.0  # image shear (+/- deg)
perspective: 0.0  # image perspective (+/- fraction), range 0-0.001
flipud: 0.0  # image flip up-down (probability)
fliplr: 0.5  # image flip left-right (probability)
mosaic: 1.0  # image mosaic (probability)
mixup: 0.0  # image mixup (probability)
copy_paste: 0.0  # segment copy-paste (probability)

# Hydra configs --------------------------------------------------------------------------------------------------------
hydra:
  output_subdir: null  # disable hydra directory creation
  run:
    dir: .

# Debug, do not modify -------------------------------------------------------------------------------------------------
v5loader: True  # use legacy YOLOv5 dataloader
```

Error logs

```
yolo/engine/trainer: task=detect, mode=train, model=yolov8l.yaml, data=./data.yaml, epochs=100, patience=50, batch=16, imgsz=640, save=True, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, overlap_mask=True, mask_ratio=4, dropout=False, val=True, save_json=False, save_hybrid=False, conf=./default.yaml, iou=0.7, max_det=300, half=True, dnn=False, plots=False, source=ultralytics/assets/, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, retina_masks=False, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.001, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.15, copy_paste=0.3, hydra={'output_subdir': None, 'run': {'dir': '.'}}, v5loader=True, save_dir=runs/detect/train3
Ultralytics YOLOv8.0.3 🚀 Python-3.9.15 torch-1.12.0 CUDA:0 (NVIDIA GeForce RTX 3090, 24268MiB)
Overriding model.yaml nc=80 with nc=3

                   from  n    params  module                                       arguments
  0                  -1  1      1856  ultralytics.nn.modules.Conv                  [3, 64, 3, 2]
  1                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]
  2                  -1  3    279808  ultralytics.nn.modules.C2f                   [128, 128, 3, True]
  3                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]
  4                  -1  6   2101248  ultralytics.nn.modules.C2f                   [256, 256, 6, True]
  5                  -1  1   1180672  ultralytics.nn.modules.Conv                  [256, 512, 3, 2]
  6                  -1  6   8396800  ultralytics.nn.modules.C2f                   [512, 512, 6, True]
  7                  -1  1   2360320  ultralytics.nn.modules.Conv                  [512, 512, 3, 2]
  8                  -1  3   4461568  ultralytics.nn.modules.C2f                   [512, 512, 3, True]
  9                  -1  1    656896  ultralytics.nn.modules.SPPF                  [512, 512, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]
 12                  -1  3   4723712  ultralytics.nn.modules.C2f                   [1024, 512, 3]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]
 15                  -1  3   1247744  ultralytics.nn.modules.C2f                   [768, 256, 3]
 16                  -1  1    590336  ultralytics.nn.modules.Conv                  [256, 256, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]
 18                  -1  3   4592640  ultralytics.nn.modules.C2f                   [768, 512, 3]
 19                  -1  1   2360320  ultralytics.nn.modules.Conv                  [512, 512, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]
 21                  -1  3   4723712  ultralytics.nn.modules.C2f                   [1024, 512, 3]
 22        [15, 18, 21]  1   5585113  ultralytics.nn.modules.Detect                [3, [256, 512, 512]]
Model summary: 365 layers, 43632153 parameters, 43632137 gradients, 165.4 GFLOPs

Transferred 589/595 items from pretrained weights
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Overriding model.yaml nc=80 with nc=3

                   from  n    params  module                                       arguments
  0                  -1  1      1856  ultralytics.nn.modules.Conv                  [3, 64, 3, 2]
  1                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]
  2                  -1  3    279808  ultralytics.nn.modules.C2f                   [128, 128, 3, True]
  3                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]
  4                  -1  6   2101248  ultralytics.nn.modules.C2f                   [256, 256, 6, True]
  5                  -1  1   1180672  ultralytics.nn.modules.Conv                  [256, 512, 3, 2]
  6                  -1  6   8396800  ultralytics.nn.modules.C2f                   [512, 512, 6, True]
  7                  -1  1   2360320  ultralytics.nn.modules.Conv                  [512, 512, 3, 2]
  8                  -1  3   4461568  ultralytics.nn.modules.C2f                   [512, 512, 3, True]
  9                  -1  1    656896  ultralytics.nn.modules.SPPF                  [512, 512, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]
 12                  -1  3   4723712  ultralytics.nn.modules.C2f                   [1024, 512, 3]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]
 15                  -1  3   1247744  ultralytics.nn.modules.C2f                   [768, 256, 3]
 16                  -1  1    590336  ultralytics.nn.modules.Conv                  [256, 256, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]
 18                  -1  3   4592640  ultralytics.nn.modules.C2f                   [768, 512, 3]
 19                  -1  1   2360320  ultralytics.nn.modules.Conv                  [512, 512, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]
 21                  -1  3   4723712  ultralytics.nn.modules.C2f                   [1024, 512, 3]
 22        [15, 18, 21]  1   5585113  ultralytics.nn.modules.Detect                [3, [256, 512, 512]]
Model summary: 365 layers, 43632153 parameters, 43632137 gradients, 165.4 GFLOPs

Transferred 589/595 items from pretrained weights
DDP settings: RANK 0, WORLD_SIZE 2, DEVICE cuda:0
optimizer: SGD(lr=0.01) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.001), 103 bias
train: Scanning /home/huy/projects/scratch/data/v7/train/labels.cache... 2347 images, 136 backgrounds, 0 corrupt: 100%|██████████| 2480/2480 [00:00<?, ?it/s]
val: Scanning /home/huy/projects/scratch/data/v7/val/labels.cache... 508 images, 0 backgrounds, 0 corrupt: 100%|██████████| 508/508 [00:00<?, ?it/s]
Image sizes 640 train, 640 val
Using 16 dataloader workers
Logging results to runs/detect/train4
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      1/100      5.28G      2.071      5.688      1.948         50        640:   2%|▏         | 3/155 [00:05<03:05,  1.22s/it]
Traceback (most recent call last):
  File "/home/huy/ssd/scratch_yolov8/train.py", line 8, in <module>
    results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml")  # train the model
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/model.py", line 193, in train
    self.trainer.train()
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 177, in train
    self._do_train(int(os.getenv("RANK", -1)), world_size)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 275, in _do_train
    for i, batch in pbar:
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1327, in _next_data
    return self._process_data(data)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5loader.py", line 664, in __getitem__
    img, labels = self.load_mosaic(index)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5loader.py", line 799, in load_mosaic
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5augmentations.py", line 255, in copy_paste
    l, box, s = labels[j], boxes[j], segments[j]
IndexError: list index out of range

      1/100      5.28G      2.108      5.806      1.969         45        640:   3%|▎         | 4/155 [00:05<03:22,  1.34s/it]
Traceback (most recent call last):
  File "/home/huy/ssd/scratch_yolov8/train.py", line 8, in <module>
    results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml")  # train the model
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/model.py", line 193, in train
    self.trainer.train()
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 177, in train
    self._do_train(int(os.getenv("RANK", -1)), world_size)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 275, in _do_train
    for i, batch in pbar:
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1327, in _next_data
    return self._process_data(data)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5loader.py", line 664, in __getitem__
    img, labels = self.load_mosaic(index)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5loader.py", line 799, in load_mosaic
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/dataloaders/v5augmentations.py", line 255, in copy_paste
    l, box, s = labels[j], boxes[j], segments[j]
IndexError: list index out of range

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 12076) of binary: /home/huy/anaconda3/envs/yolov8/bin/python
Traceback (most recent call last):
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/run.py", line 765, in <module>
    main()
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/huy/ssd/scratch_yolov8/train.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-01-11_15:44:53
  host      : huy-money-maker
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 12077)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-01-11_15:44:53
  host      : huy-money-maker
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 12076)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```

menglongyue commented on May 18, 2024

There may be some bugs in the data loading; I'm waiting for an official fix.

Laughing-q commented on May 18, 2024

@LNTH Did this error happen with both v5loader=False and v5loader=True?
@menglongyue I've already fixed your issue; I'll open a PR after I fix his. Thanks again for reporting! :)

Laughing-q commented on May 18, 2024

@LNTH It looks like you're using copy_paste, so you have to make sure all your labels are segment labels. Then it should work correctly.

EDIT: can you try to train coco128 or coco128-seg with the same command?
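This advice fits the traceback. In `copy_paste`, a single index `j` is used for labels, boxes, and segments (`l, box, s = labels[j], boxes[j], segments[j]`). One plausible reading is that a mosaic mixing images with polygon labels and images with box-only labels leaves the segment list shorter than the label list, so that index runs past its end. The following is a simplified, hypothetical reproduction using plain lists. It is not the actual `v5augmentations` code. It includes a guard that skips the augmentation when the lengths differ.

```python
import random
from typing import List, Sequence, Tuple


def pick_for_copy_paste(
    labels: Sequence,
    boxes: Sequence,
    segments: Sequence,
    p: float,
    rng: random.Random,
    guard: bool = True,
) -> List[Tuple]:
    """Simplified, hypothetical stand-in for the indexing step of copy_paste.

    Not the Ultralytics code: it only reproduces the pattern from the
    traceback, where one index j is used for labels, boxes and segments.
    """
    n = len(labels)
    if not p or not n:
        return []  # p=0 (copy_paste=0.0) skips the augmentation entirely
    if guard and len(segments) != n:
        return []  # some objects have no polygon, so there is nothing to paste for them
    picked = []
    for j in rng.sample(range(n), k=round(p * n)):
        l, box, s = labels[j], boxes[j], segments[j]  # IndexError if segments is short
        picked.append((l, box, s))
    return picked
```

The `p=0` early return corresponds to the first comment's workaround: with copy_paste set to 0, the indexing that fails is never reached.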

Laughing-q commented on May 18, 2024

@menglongyue Hi, this issue has been fixed by #249.

LNTH commented on May 18, 2024

@Laughing-q

  • What does "your labels are segment labels" mean?
  • How can I override the training config? I used results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml"), and inside my default.yaml I set copy_paste to 0, but it didn't work.
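On the override question, the first line of the error log posted earlier shows `conf=./default.yaml`. The YAML path therefore seems to have been taken as the `conf` argument, which default.yaml itself documents as the object confidence threshold. The same line shows `copy_paste=0.3`, `mixup=0.15`, `batch=16`, and `device=None` instead of the YAML's `0.0`, `0.0`, `8`, and `0,1`, which is consistent with the file not being read as a config. Passing the setting directly, as in the first comment (copy_paste=0 in model.train()), is what reportedly worked. Below is a small stdlib sketch for checking which values a run actually used by parsing that startup line. Both helper names are hypothetical, not Ultralytics functions.

```python
import re
from typing import Dict, Tuple


def parse_trainer_args(line: str) -> Dict[str, str]:
    """Split the 'yolo/engine/trainer: key=value, ...' startup line into a dict.

    Only splits on ", " when the next token looks like 'name=', so nested values
    such as hydra={'output_subdir': None, 'run': {'dir': '.'}} stay intact.
    """
    _, _, body = line.partition(": ")  # drop the 'yolo/engine/trainer' prefix
    args: Dict[str, str] = {}
    for item in re.split(r", (?=\w+=)", body.strip()):
        key, _, value = item.partition("=")
        args[key] = value
    return args


def mismatches(expected: Dict[str, object], actual: Dict[str, str]) -> Dict[str, Tuple[object, object]]:
    """Return the settings whose logged value differs from the intended one."""
    out: Dict[str, Tuple[object, object]] = {}
    for key, want in expected.items():
        got = actual.get(key)
        try:
            same = float(got) == float(want)  # compare numbers numerically
        except (TypeError, ValueError):
            same = str(want) == got  # fall back to comparing text
        if not same:
            out[key] = (want, got)
    return out
```

For example, `mismatches({"copy_paste": 0.0}, parse_trainer_args(line))` on the startup line from that error log reports `copy_paste` as `(0.0, '0.3')`.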

When I run results = model.train(data="coco128", epochs=100), I get this error (coco128-seg gives the same error):

```
Dataset not found ⚠️, missing path /home/huy/ssd/scratch_yolov8/datasets/coco128, attempting download...
Downloading https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip to /home/huy/ssd/scratch_yolov8/datasets/coco128.zip...
100%|██████████| 6.66M/6.66M [00:00<00:00, 22.3MB/s]
Unzipping /home/huy/ssd/scratch_yolov8/datasets/coco128.zip...
Dataset download success ✅ (1.2s), saved to /home/huy/ssd/scratch_yolov8/datasets/coco128

Traceback (most recent call last):
  File "/home/huy/ssd/scratch_yolov8/train.py", line 8, in <module>
    results = model.train(data="coco128", epochs=100)  # train the model
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/model.py", line 189, in train
    self.trainer = self.TrainerClass(overrides=overrides)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 128, in __init__
    self.data = check_dataset(self.data)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/utils.py", line 291, in check_dataset
    names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()]  # class names list
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/utils.py", line 291, in <listcomp>
    names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()]  # class names list
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/pathlib.py", line 1160, in iterdir
    for name in self._accessor.listdir(self):
FileNotFoundError: [Errno 2] No such file or directory: '/home/huy/ssd/scratch_yolov8/datasets/coco128/train'
```

After I manually created the train folder, I got a new error stating that I don't have valid data.
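The traceback shows `check_dataset` building the class list from the subfolders of `datasets/coco128/train` (`names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()]`). That is a folder-per-class (classification-style) layout, and the downloaded detection dataset has no such folder. The Ultralytics README examples pass the detection dataset by its YAML name, `data="coco128.yaml"`, so the missing `.yaml` suffix is a likely cause. That is an inference from the traceback, not a confirmed diagnosis. A minimal pathlib sketch mirroring that line follows. The function name is hypothetical.

```python
from pathlib import Path
from typing import List


def class_names_from_folders(data_dir) -> List[str]:
    """Read class names the way the traceback's line does: one subfolder of train/ per class.

    Hypothetical helper for illustration; sorted() is added here for a stable order.
    """
    train_dir = Path(data_dir) / "train"
    # Iterating a missing train/ raises FileNotFoundError, as in the traceback above.
    return sorted(x.name for x in train_dir.iterdir() if x.is_dir())
```

With an empty `train/` folder, the same logic returns an empty class list. That fits the "no valid data" error reported after the folder was created by hand.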

LNTH commented on May 18, 2024

@Laughing-q

  • What does "your labels are segment labels" mean?
  • How can I override the training config? I used results = model.train(data="./data.yaml", epochs=100, conf="./default.yaml"), and in my default.yaml I set copy-paste to 0, but it didn't work.
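On the config question: keyword arguments to model.train() reach the trainer as overrides (see self.TrainerClass(overrides=overrides) in the traceback), and the first comment in this thread reports that passing copy-paste=0 directly to model.train() is what worked. Below is a minimal sketch of that kind of layering, assuming a plain dict merge. DEFAULTS and merge_overrides are hypothetical names, the values are placeholders rather than YOLOv8's real defaults, and rejecting unknown keys is a choice made for this sketch, not documented library behavior. In the argument dump earlier in the thread, conf sits next to iou and max_det, which suggests it is a detection threshold rather than a path to a config file; the sketch treats it that way.

```python
from typing import Any, Dict, Mapping

# Placeholder subset of a training config. These values are illustrative
# only and are NOT the library's real defaults.
DEFAULTS: Dict[str, Any] = {
    "epochs": 300,
    "copy_paste": 0.5,
    "conf": None,  # listed beside iou/max_det in the argument dump
    "iou": 0.7,
}


def merge_overrides(defaults: Mapping[str, Any], overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new config in which keyword overrides replace defaults.

    Unknown keys (for example a hyphenated "copy-paste") raise KeyError
    instead of being silently ignored.
    """
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown config keys: {unknown}")
    merged = dict(defaults)  # copy, so DEFAULTS itself is never mutated
    merged.update(overrides)  # override values win over defaults
    return merged


if __name__ == "__main__":
    # Disable copy-paste augmentation by overriding the key directly.
    cfg = merge_overrides(DEFAULTS, {"epochs": 100, "copy_paste": 0.0})
    print(cfg["copy_paste"])

    # A YAML path passed as conf is just stored as a string; no file is read,
    # so copy_paste keeps its default value.
    cfg2 = merge_overrides(DEFAULTS, {"conf": "./default.yaml"})
    print(cfg2["copy_paste"])
```

The point of the second call is that a value is only as meaningful as the key it is attached to: a file path under a threshold key does not change any other setting.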

When I run results = model.train(data="coco128", epochs=100), I get this error (coco128-seg gives the same error):

Dataset not found ⚠️, missing path /home/huy/ssd/scratch_yolov8/datasets/coco128, attempting download...
Downloading https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip to /home/huy/ssd/scratch_yolov8/datasets/coco128.zip...
100%|██████████| 6.66M/6.66M [00:00<00:00, 22.3MB/s]
Unzipping /home/huy/ssd/scratch_yolov8/datasets/coco128.zip...
Dataset download success ✅ (1.2s), saved to /home/huy/ssd/scratch_yolov8/datasets/coco128

Traceback (most recent call last):
  File "/home/huy/ssd/scratch_yolov8/train.py", line 8, in <module>
    results = model.train(data="coco128", epochs=100)  # train the model
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/model.py", line 189, in train
    self.trainer = self.TrainerClass(overrides=overrides)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py", line 128, in __init__
    self.data = check_dataset(self.data)
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/utils.py", line 291, in check_dataset
    names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()]  # class names list
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/site-packages/ultralytics/yolo/data/utils.py", line 291, in <listcomp>
    names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()]  # class names list
  File "/home/huy/anaconda3/envs/yolov8/lib/python3.9/pathlib.py", line 1160, in iterdir
    for name in self._accessor.listdir(self):
FileNotFoundError: [Errno 2] No such file or directory: '/home/huy/ssd/scratch_yolov8/datasets/coco128/train'

After I manually created the train folder, I got a new error saying that I don't have valid data.

This is the folder structure of coco128 (downloaded and unzipped automatically):

datasets
|__coco128
   |__images
   |    |__train2017
   |__labels
        |__train2017
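The traceback shows check_dataset building the class list from the sub-folders of <dataset>/train, a folder that the downloaded coco128 layout (images/train2017, labels/train2017) does not have. This suggests, without confirming it, that the bare name "coco128" was routed to a folder-per-class check, while coco128.yaml (used later in the thread) was not. The sketch below reproduces only that one list comprehension on a temporary directory tree. class_names_from_folders and make_dirs are hypothetical helpers, not part of ultralytics, and the sorting is added here only to make the output deterministic.

```python
import tempfile
from pathlib import Path
from typing import List


def class_names_from_folders(data_dir: Path) -> List[str]:
    """Mirror the list comprehension in the traceback: class names are the
    sub-directories of data_dir / 'train' (a folder-per-class layout)."""
    train_dir = data_dir / "train"
    # iterdir() raises FileNotFoundError when train/ does not exist
    return sorted(x.name for x in train_dir.iterdir() if x.is_dir())


def make_dirs(root: Path, *rel: str) -> None:
    """Create each relative directory path under root."""
    for r in rel:
        (root / r).mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Detection layout, as downloaded for coco128: no train/ folder.
        det = root / "coco128"
        make_dirs(det, "images/train2017", "labels/train2017")
        try:
            class_names_from_folders(det)
        except FileNotFoundError as e:
            print("detection layout:", type(e).__name__)

        # Folder-per-class layout: one sub-folder per class under train/.
        cls = root / "toy_cls"
        make_dirs(cls, "train/cat", "train/dog")
        print("folder-per-class layout:", class_names_from_folders(cls))
```

Creating an empty train folder by hand, as tried above, gets past this particular line but leaves a directory with no class sub-folders and no images, which fits the follow-up error about invalid data.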

Laughing-q commented on May 18, 2024

@LNTH can you try this?

results = model.train(data="coco128.yaml", epochs=100)

LNTH commented on May 18, 2024

@Laughing-q Both coco128.yaml and coco128-seg.yaml work (I stopped at 10 epochs)

What does "your labels are segment labels" mean? Do you mean the same label format as Yolov5 seg?

Laughing-q commented on May 18, 2024

@LNTH yes, the same label format as yolov5 seg. :)
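For reference, the YOLOv5 segmentation label format puts one object per line: a class index followed by the polygon's vertices as normalized x y pairs. A detection label instead has exactly four normalized values (x_center y_center width height). The sketch below uses hypothetical names (Label, parse_label_line) to tell the two apart and to derive a bounding box from a polygon by taking the min/max of its vertices. It illustrates the format only and is not the ultralytics label loader.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class Label:
    cls: int
    box: Box  # normalized (x_center, y_center, width, height)
    polygon: List[Tuple[float, float]] = field(default_factory=list)  # empty for box labels


def parse_label_line(line: str) -> Label:
    """Parse one label line in either the box or the segment format."""
    values = line.split()
    cls = int(values[0])
    coords = [float(v) for v in values[1:]]
    if len(coords) == 4:
        # Detection format: class x_center y_center width height
        return Label(cls, (coords[0], coords[1], coords[2], coords[3]))
    if len(coords) < 6 or len(coords) % 2:
        raise ValueError(
            f"expected 4 box values or at least 3 (x, y) pairs, got {len(coords)} values"
        )
    # Segment format: class x1 y1 x2 y2 ... xn yn
    polygon = list(zip(coords[0::2], coords[1::2]))
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    # Tight bounding box around the polygon, expressed as center + size
    box = ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)
    return Label(cls, box, polygon)


if __name__ == "__main__":
    print(parse_label_line("2 0.5 0.5 0.2 0.2"))
    print(parse_label_line("0 0.1 0.1 0.5 0.1 0.5 0.4"))
```

Because a polygon needs at least three vertices (six values), a line with exactly four values after the class index can only be a box, which is what lets the two formats share one parser.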
