wangermeng2021 / scaled-yolov4-tensorflow2

A Tensorflow2.x implementation of Scaled-YOLOv4 as described in Scaled-YOLOv4: Scaling Cross Stage Partial Network

License: Apache License 2.0

Python 100.00%

tensorflow2 scaledyolov4 yolov4 object-detection yolo tf2 tensorflow tensorflow-serving

scaled-yolov4-tensorflow2's Introduction

Scaled-YOLOv4-tensorflow2

A Tensorflow2.x implementation of Scaled-YOLOv4 as described in Scaled-YOLOv4: Scaling Cross Stage Partial Network

Update Log

[2021-07-02]:

Add support for: Exponential moving average decay for variables. Improve mAP from 0.985 to 0.990 on Chess Pieces dataset.

[2021-06-29]:

Major Features and Improvements:

Add support for: Sharpness-Aware Minimization(SAM_sgd,SAM_adam).

Bug Fixes and Changes:

Fix the nan loss error when using adam optimizer
Set default optimizer as SAM_adam
Change default running mode from 'fit' to 'eager mode'

[2021-06-27] Add support for: resuming training from checkpoints.

[2021-02-21] Add support for: model.fit(dramatic improvement in GPU utilization); online coco evaluation callback; change default optimizer from sgd to adam

[2021-02-11] Add support for: one-click deployment using tensorflow Serving(very fast)

[2021-01-29] Add support for: mosaic,ssd_random_crop

[2021-01-25] Add support for: ciou loss,hard-nms,DIoU-nms,label_smooth,transfer learning,tensorboard

[2021-01-23] Add support for: scales_x_y/eliminate grid sensitivity,accumulate gradients for using big batch size,focal loss,diou loss

[2021-01-16] Add support for: warmup,Cosine annealing scheduler,Eager mode training with tf.GradientTape,support voc/coco dataset format

[2021-01-10] Add support for: yolov4-tiny,yolov4-large p5/p6/p7,online coco evaluation,multi scale training
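The warmup and cosine annealing scheduler listed in the 2021-01-16 entry can be pictured with the following minimal sketch. It is not the repository's implementation: the function name and all default values (`base_lr`, `warmup_steps`, `min_lr`) are assumptions chosen for illustration.

```python
import math


def learning_rate(step, total_steps, base_lr=1e-3, warmup_steps=1000, min_lr=1e-6):
    """Linear warmup followed by cosine annealing (illustrative values only)."""
    if step < warmup_steps:
        # Ramp linearly from min_lr up to base_lr during warmup.
        return min_lr + (base_lr - min_lr) * step / warmup_steps
    # Fraction of the post-warmup schedule that has elapsed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The rate starts at `min_lr`, reaches `base_lr` when warmup ends, and decays back to `min_lr` at the final step.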

Demo

ScaledYOLOv4_p5_detection_result:

ScaledYOLOv4_tiny_detection_result:

Installation

1. Clone project

git clone https://github.com/wangermeng2021/Scaled-YOLOv4-tensorflow2.git
cd Scaled-YOLOv4-tensorflow2

2. Install environment

Install TensorFlow (skip this step if it is already installed; test environment: TensorFlow 2.4.0)
```
pip install -r requirements.txt
```

Note:

I strongly recommend using the voc dataset type (the default). Because my GPU is old, the coco dataset type is not fully tested.

Training:

Download the pre-trained p5 COCO model and place it under the directory 'pretrained/ScaledYOLOV4_p5_coco_pretrain':
https://drive.google.com/file/d/1glOCE3Y5Q5enW3rpVq3SmKDXzaKIw4YL/view?usp=sharing
Download the pre-trained p6 COCO model and place it under the directory 'pretrained/ScaledYOLOV4_p6_coco_pretrain':
https://drive.google.com/file/d/1EymbpgiO6VkCCFdB0zSTv0B9yB6T9Fw1/view?usp=sharing
Download the pre-trained tiny COCO model and place it under the directory 'pretrained/ScaledYOLOV4_tiny_coco_pretrain':
https://drive.google.com/file/d/1x15FN7jCAFwsntaMwmSkkgIzvHXUa7xT/view?usp=sharing

For training on the Pothole dataset (no need to download it; it is already included in the project):
p5(single scale):

python train.py --use-pretrain True --model-type p5 --dataset-type voc --dataset dataset/pothole_voc --num-classes 1 --class-names pothole.names  --voc-train-set dataset_1,train --voc-val-set dataset_1,val  --epochs 200 --batch-size 4 --multi-scale 416 --augment ssd_random_crop

p5(multi scale):

python train.py --use-pretrain True --model-type p5 --dataset-type voc --dataset dataset/pothole_voc --num-classes 1 --class-names pothole.names --voc-train-set dataset_1,train --voc-val-set dataset_1,val  --epochs 200 --batch-size 4 --multi-scale 320,352,384,416,448,480,512 --augment ssd_random_crop
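The `--multi-scale` flag in the commands above takes either a single size or a comma-separated list. The README does not show how train.py uses the list; the sketch below assumes the common approach of drawing one square input size at random for each batch. Both function names are hypothetical.

```python
import random


def parse_multi_scale(value):
    """Turn a string such as '320,352,384' into a list of ints."""
    return [int(s) for s in value.split(",") if s.strip()]


def pick_input_size(scales, rng=random):
    """Choose one square input size for the next batch (assumed strategy)."""
    return rng.choice(scales)


scales = parse_multi_scale("320,352,384,416,448,480,512")
size = pick_input_size(scales)  # one of the listed sizes
```

With a single value such as `416`, the list has one entry and every batch uses the same size.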

For training on the Chess Pieces dataset (no need to download it; it is already included in the project):
tiny(single scale):

python train.py --use-pretrain True --model-type tiny --dataset-type voc --dataset dataset/chess_voc --num-classes 12 --class-names chess.names --voc-train-set dataset_1,train --voc-val-set dataset_1,val  --epochs 400 --batch-size 32 --multi-scale 416 --augment ssd_random_crop

tiny(multi scale):

python train.py --use-pretrain True --model-type tiny --dataset-type voc --dataset dataset/chess_voc --num-classes 12 --class-names chess.names --voc-train-set dataset_1,train --voc-val-set dataset_1,val  --epochs 400 --batch-size 32 --multi-scale 320,352,384,416,448,480,512 --augment ssd_random_crop

For training with SAM_sgd on Chess Pieces dataset:

python train.py --optimizer SAM_sgd --use-pretrain True --model-type tiny --dataset-type voc --dataset dataset/chess_voc --num-classes 12 --class-names chess.names --voc-train-set dataset_1,train --voc-val-set dataset_1,val  --epochs 400 --batch-size 32 --multi-scale 416 --augment ssd_random_crop
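For readers unfamiliar with Sharpness-Aware Minimization, each step evaluates the gradient twice: once at the current weights to find a nearby "worst-case" point, and once at that point to obtain the gradient that is actually applied. The sketch below shows this on a toy quadratic loss in plain Python. It is not the repository's `SAM_sgd` code; the loss, `lr`, and `rho` are assumptions for illustration.

```python
import math


def loss(w):
    """Toy loss: 0.5 * (10 * w0**2 + w1**2)."""
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def grad(w):
    """Gradient of the toy loss."""
    return [10.0 * w[0], w[1]]


def sam_sgd_step(w, lr=0.05, rho=0.05):
    """One SAM step with plain SGD as the base optimizer."""
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12
    # Step 1: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: take the gradient there, but apply it to the original weights.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
for _ in range(50):
    w = sam_sgd_step(w)
```

The extra forward/backward pass means a SAM step costs roughly twice as much as a plain SGD step.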

For training with ema(Exponential Moving Average) on Chess Pieces dataset:

python train.py --ema True --use-pretrain True --model-type tiny --dataset-type voc --dataset dataset/chess_voc --num-classes 12 --class-names chess.names --voc-train-set dataset_1,train --voc-val-set dataset_1,val  --epochs 400 --batch-size 32 --multi-scale 416 --augment ssd_random_crop
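The idea behind `--ema` is to keep a second, smoothed copy of the weights that is updated after every optimizer step and used for evaluation. A minimal sketch of the update rule follows; the function name and the decay value are assumptions, and the repository's actual implementation may differ.

```python
def ema_update(shadow, weights, decay=0.999):
    """Return new shadow weights: decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


# Usage: start the shadow copy from the initial weights, then update it
# after each training step.
weights = [0.5, -1.0]
shadow = list(weights)
weights = [0.6, -0.9]          # pretend the optimizer moved the weights
shadow = ema_update(shadow, weights)
```

A decay close to 1 makes the shadow weights change slowly, which smooths out step-to-step noise.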

Tensorboard visualization:

Navigate to http://0.0.0.0:6006

Evaluation results (GTX2080, mAP@0.5):

model	Chess Pieces	pothole
Scaled-YoloV4-tiny(416)	0.985	-
Scaled-YoloV4-tiny(416)+ema	0.990	-
AlexeyAB's YoloV4(416)	-	0.814
Scaled-YoloV4-p5(416)	-	0.826

Evaluation on Pothole dataset:
Evaluation on chess dataset:

Detection

For detection on Chess Pieces dataset:

python3 detect.py --pic-dir images/chess_pictures --model-path output_model/best_model_tiny_0.985/1 --class-names dataset/chess.names --nms-score-threshold 0.1

detection result:

For detection on Pothole dataset:

python3 detect.py --pic-dir images/pothole_pictures --model-path output_model/best_model_p5_0.827/1 --class-names dataset/pothole.names --nms-score-threshold 0.1

detection result:
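To make the role of `--nms-score-threshold` concrete, here is a plain-Python sketch of score filtering followed by hard NMS. It is only an illustration of the technique, not the code in detect.py; the IoU threshold of 0.5 and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, score_threshold=0.1, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

A low score threshold such as 0.1 lets more candidate boxes through to the NMS stage; raising it trades recall for fewer false positives.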

Customized training

Convert your dataset to Pascal VOC format (you can use labelImg to generate a VOC-format dataset)
Generate a class names file (such as xxx.names)

python train.py --use-pretrain True --model-type p5 --dataset-type voc --dataset your_dataset_root_dir --num-classes num_of_classes --class-names path_of_xxx.names --voc-train-set dataset_1,train --voc-val-set dataset_1,val  --epochs 200 --batch-size 8 --multi-scale 416  --augment ssd_random_crop
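If the class names file has to be created by hand, a small script can collect the names from the VOC annotations instead. The sketch below assumes that a `.names` file holds one class name per line and that annotations follow the usual Pascal VOC layout (`<object><name>...</name></object>`). The function names are hypothetical, and the alphabetical ordering is an arbitrary choice.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_class_names(annotation_dir):
    """Return the sorted set of <object><name> values found in VOC XML files."""
    names = set()
    for xml_path in Path(annotation_dir).glob("*.xml"):
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name:
                names.add(name.strip())
    return sorted(names)


def write_names_file(names, out_path):
    """Write one class name per line (assumed .names format)."""
    Path(out_path).write_text("\n".join(names) + "\n", encoding="utf-8")
```

The number of names collected this way is the value to pass as `--num-classes`.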

Deployment

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It consists of two parts, a client and a server, and both can run on the same machine.

Navigate to deployment directory:

  cd  deployment/tfserving

Generate a Docker image that contains your trained model (this takes a few minutes; you only need to run it once):

  ./gen_image --model-dir ScaledYOLOv4-tensorflow2/output_model/pothole/best_model_p5_0.811

Deploy model:
- Server side (docker and nvidia-docker installed):
  
  ./run_image
- Client side (no need to install TensorFlow):
  1. install client package
    
    pip install tfservingclient-1.0.0-cp37-cp37m-manylinux1_x86_64.whl
  2. predict images
    
    python demo.py --pic-dir xxxx --class-names xxx.names
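For context, TensorFlow Serving also offers a generic REST predict endpoint of the form `/v1/models/<name>:predict`, which accepts a JSON body with an `instances` list. The sketch below only builds such a request. It is not the `tfservingclient` package used by demo.py, and the host, port 8501, and model name `yolov4` are assumptions; whether `./run_image` exposes the REST port is not stated in this README.

```python
import json


def build_predict_request(images, host="localhost", port=8501, model_name="yolov4"):
    """Build the URL and JSON body for a TensorFlow Serving REST predict call.

    `images` is a nested list (batch, height, width, channels). The host,
    port, and model name defaults are placeholders, not values from the project.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": images})
    return url, body


# A 1x2x2x3 dummy "image" batch, just to show the payload shape.
url, body = build_predict_request([[[[0, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0]]]])
```

The returned pair can be sent with any HTTP client as a POST request.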

scaled-yolov4-tensorflow2's Issues

how to convert this to tflite?

Any suggestions? Thank you.

NameError: free variable 'classification_loss' referenced before assignment in enclosing scope

NameError: free variable 'classification_loss' referenced before assignment in enclosing scope
I got this error right after the first epoch started (right after it printed epoch 1/200).

Which TF version have you used for training?

Need to know about Tensorflow version

Thanks

Duplicate name in graph `ones`

Hello, using TF 2.1 training fails during model export. It throws an error in box_decode function in box_coder.py. See traceback:

Traceback (most recent call last):
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1619, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Duplicate node name in graph: 'ones'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ondra/Downloads/ScaledYOLOv4-tensorflow2-main/train.py", line 272, in <module>
    main(args)
  File "/Users/ondra/Downloads/ScaledYOLOv4-tensorflow2-main/train.py", line 266, in main
    model = Yolov4(args, training=False)
  File "/Users/ondra/Downloads/ScaledYOLOv4-tensorflow2-main/model/yolov4.py", line 21, in Yolov4
    pre_nms_decoded_boxes, pre_nms__scores = postprocess(outputs,args)
  File "/Users/ondra/Downloads/ScaledYOLOv4-tensorflow2-main/model/postprocess.py", line 22, in postprocess
    decoded_boxes = box_decode(output[..., 0:4], args, index)
  File "/Users/ondra/Downloads/ScaledYOLOv4-tensorflow2-main/model/box_coder.py", line 22, in box_decode
    grid_xy = tf.stack(tf.meshgrid(tf.range(grid_width), tf.range(grid_height)), axis=-1)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 3065, in meshgrid
    mult_fact = ones(shapes, output_dtype)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 2671, in ones
    output = fill(shape, constant(one, dtype=dtype), name=name)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 233, in fill
    result = gen_array_ops.fill(dims, value, name=name)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 3247, in fill
    "Fill", dims=dims, value=value, name=name)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 742, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/framework/func_graph.py", line 595, in _create_op_internal
    compute_device)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3322, in _create_op_internal
    op_def=op_def)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1786, in __init__
    control_input_ops)
  File "/Users/ondra/opt/anaconda3/envs/tf_21_py37/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1622, in _create_c_op
    raise ValueError(str(e))
ValueError: Duplicate node name in graph: 'ones'

The problem is the repeated call to tf.meshgrid, which calls the ones function with the same node name each time. This works for the first YOLO head, but fails the second time because the name already exists.

Maybe a higher TF version handles this at the framework level. My solution is to run the postprocess block in the for loop in postprocess.py (the loop starts on line 18) under a unique name scope for each head (iteration).

training time is too high

I am trying to train a p5 model on my own dataset.
Training takes about 30 hours with Darknet, but far longer with the Scaled-YOLOv4-tensorflow2 code:
time elapsed: 1.813 hour, time left: 360.723 hour

Error showing up during detection

Hi,
When I try to run detect.py using this:
"python detect.py --pic-dir images/chess_pictures --model-path output_model/chess/best_model_tiny_0.985/1 --class-names dataset/chess.names --nms-score-threshold 0.1",
the error which shows up is:

Traceback (most recent call last):
File "detect.py", line 119, in <module>
main(args)
File "detect.py", line 95, in main
model = tf.keras.models.load_model(args.model_path)
File "C:\Users\tsath\anaconda3\envs\yolo\lib\site-packages\tensorflow\python\keras\saving\save.py", line 212, in load_model
return saved_model_load.load(filepath, compile, options)
File "C:\Users\tsath\anaconda3\envs\yolo\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 147, in load
keras_loader.finalize_objects()
File "C:\Users\tsath\anaconda3\envs\yolo\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 612, in finalize_objects
self._reconstruct_all_models()
File "C:\Users\tsath\anaconda3\envs\yolo\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 631, in _reconstruct_all_models
self._reconstruct_model(model_id, model, layers)
File "C:\Users\tsath\anaconda3\envs\yolo\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 677, in _reconstruct_model
created_layers) = functional_lib.reconstruct_from_config(
File "C:\Users\tsath\anaconda3\envs\yolo\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 1285, in reconstruct_from_config
process_node(layer, node_data)
File "C:\Users\tsath\anaconda3\envs\yolo\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 1222, in process_node
nest.flatten(inbound_node.outputs)[inbound_tensor_index])
IndexError: list index out of range

I used the default dataset (chess) for training and tried to detect using the saved model.
The tensorflow version running is: TF 2.4.1
Can someone help me with resolving this issue? Thanks!!

Training with datasets in COCO format

Hi, first of all, thank you for your work!

I'm trying to train on the COCO 2017 dataset and also on my own dataset in COCO format, but without success:

!python train.py --use-pretrain True\
                 --model-type p5\
                 --dataset-type coco\
                 --dataset dataset/coco/\
                 --num-classes 32\
                 --class-names coco.names\
                 --coco-train-set train2017\
                 --coco-valid-set val2017\
                 --epochs 200\
                 --batch-size 8\
                 --multi-scale 416\
                 --augment ssd_random_crop

Output:

...
Load p5 weight successfully!
Tensorboard engine is running at http://localhost:6006/
loading dataset...
  0%|                                                   | 0/625 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 298, in <module>
    main(args)
  File "train.py", line 194, in main
    coco_map_callback = CocoMapCallback(pred_generator,model,args,mAP_writer)
  File "/tf/parkinto-object-detection/Scaled-YOLOv4-tensorflow2/utils/fit_coco_map.py", line 30, in __init__
    for batch_img, batch_boxes, batch_valids in pred_generator_tqdm:
  File "/usr/local/lib/python3.6/dist-packages/tqdm/std.py", line 1170, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 483, in __iter__
    for item in (self[i] for i in range(len(self))):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 483, in <genexpr>
    for item in (self[i] for i in range(len(self))):
  File "/tf/parkinto-object-detection/Scaled-YOLOv4-tensorflow2/generator/coco_generator.py", line 183, in __getitem__
    y_true = get_y_true(self.max_side, batch_boxes, groundtruth_valids, self._args)
  File "/tf/parkinto-object-detection/Scaled-YOLOv4-tensorflow2/generator/get_y_true.py", line 92, in get_y_true
    grids[grid_index][batch_index][grid_xy[1]][grid_xy[0]][grid_anchor_index][5+batch_boxes[batch_index][box_index][4].astype(np.int32)] = 1
IndexError: index 44 is out of bounds for axis 0 with size 37

I'm using TF 2.4.1.

Is it possible to train on datasets in COCO format, or do I have to convert them to VOC format? Thank you.
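Reading the traceback: the failing index is `5 + class_id`, so index 44 means a label with class id 39, while the axis size of 37 matches `5 + 32` for `--num-classes 32`. In other words, an annotation carries a class id that is not below the number of classes. One thing worth checking is the `categories` list of the annotation file, since COCO category ids are not necessarily contiguous or 0-based. The helper below is a hypothetical sketch for that check; whether the repository remaps ids internally is not known from this page.

```python
import json


def check_category_ids(annotation_file, num_classes):
    """Return (id -> contiguous index mapping, list of detected problems)."""
    with open(annotation_file, encoding="utf-8") as f:
        coco = json.load(f)
    ids = sorted(c["id"] for c in coco["categories"])
    # Map the original (possibly sparse) ids onto 0 .. len(ids) - 1.
    id_to_index = {cid: i for i, cid in enumerate(ids)}
    problems = []
    if len(ids) != num_classes:
        problems.append(
            f"{len(ids)} categories in file, but num_classes={num_classes}"
        )
    if ids and (ids[0] != 0 or ids[-1] != len(ids) - 1):
        problems.append("category ids are not contiguous and 0-based")
    return id_to_index, problems
```

For the full COCO 2017 annotations this would report 80 categories, so `--num-classes 32` only fits a file that has been reduced to 32 categories.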

Error Converting weights of p7 from pytorch to tensorflow. Could you help in this

I converted the .pt model to .onnx and then the .onnx to a .pb file for TensorFlow. I checked the graph using Netron: one transpose layer has been added so that it can handle the NHWC input of TensorFlow.

Customized training Error

When I trained on my own dataset, I got the following error:
+-------------------------------------------+
loading dataset...
loading dataset...
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00, 1.16s/it]
creating index...
index created!
1e-06
0%| | 0/20 [00:00<?, ?it/s]2021-04-28 11:28:55.279041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic librarylibcudnn.so.7
2021-04-28 11:28:57.187358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic librarylibcublas.so.10
0%| | 0/20 [00:09<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 298, in <module>
main(args)
File "train.py", line 235, in main
model_outputs = model(batch_imgs, training=True)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in call
outputs = self.call(cast_inputs, *args, **kwargs)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 719, in call
convert_kwargs_to_constants=base_layer_utils.call_context().saving)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 888, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in call
outputs = self.call(cast_inputs, *args, **kwargs)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/keras/layers/merge.py", line 183, in call
return self._merge_function(inputs)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/keras/layers/merge.py", line 522, in _merge_function
return K.concatenate(inputs, axis=self.axis)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 2709, in concatenate
return array_ops.concat([to_dense(x) for x in tensors], axis)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1606, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1181, in concat_v2
_ops.raise_from_not_ok_status(e, name)
File "/opt/anaconda3/envs/pray2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [9,13,13,512] vs. shape[1] = [9,12,12,512] [Op:ConcatV2] name: concat
+-------------------------------+
Could you help me?
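The mismatch `[9,13,13,512]` vs. `[9,12,12,512]` is the kind of off-by-one that appears when a feature map with an odd side is halved and then upsampled by 2 before a concat. One plausible cause, offered here as an assumption rather than a confirmed diagnosis, is an input side that is not a multiple of the network's largest stride. The sketch below assumes a largest stride of 32 for p5 (all `--multi-scale` values in this README are multiples of 32); the function names are hypothetical.

```python
def is_stride_aligned(side, max_stride=32):
    """True if the input side length is a multiple of the largest stride."""
    return side % max_stride == 0


def round_up_to_stride(side, max_stride=32):
    """Smallest multiple of max_stride that is >= side."""
    return ((side + max_stride - 1) // max_stride) * max_stride


def down_then_up(size):
    """Side after a stride-2 downsample that floors, followed by a x2 upsample."""
    return (size // 2) * 2


# A 13-wide map comes back as 12 wide, which cannot be concatenated with
# the original 13-wide branch.
mismatch = (13, down_then_up(13))
```

Checking the value passed to `--multi-scale` (and any resizing done to the images) with `is_stride_aligned` is a quick way to rule this cause in or out.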

library version

I would like to know your environment:
tensorflow, numpy, python, pycocotools

Please tell me the exact version numbers. My versions differ from yours, and an error seems to occur partway through as a result.

YOLOv4-P7 Nan error

Hello.

I've tried to train your YOLOv4-P7 model and previously was facing NaN error.

I have observed that you have updated loss function slightly to fix the issue and wonder whether the issue has been resolved, as mAP scores of P6 and P7 haven't been updated yet.

Has the issue been solved??

Unable to load checkpoint

I assumed that the way to load a saved checkpoint is the same as loading pretrained weight. However, when I try to load my own saved checkpoint and train again with the exact same data and exact same command, I got this error:

Traceback (most recent call last):
  File "train.py", line 310, in <module>
    main(args)
  File "train.py", line 151, in main
    pretrain_model.load_weights(args.p5_coco_pretrained_weights).expect_partial()
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2205, in load_weights
    status = self._trackable_saver.restore(filepath, options)
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\training\tracking\util.py", line 1336, in restore
    base.CheckpointPosition(
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\training\tracking\base.py", line 253, in restore
    restore_ops = trackable._restore_from_checkpoint_position(self)  # pylint: disable=protected-access
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\training\tracking\base.py", line 972, in _restore_from_checkpoint_position
    current_position.checkpoint.restore_saveables(
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\training\tracking\util.py", line 307, in restore_saveables
    new_restore_ops = functional_saver.MultiDeviceSaver(
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\training\saving\functional_saver.py", line 345, in restore
    restore_ops = restore_fn()
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\training\saving\functional_saver.py", line 321, in restore_fn
    restore_ops.update(saver.restore(file_prefix, options))
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\training\saving\functional_saver.py", line 115, in restore
    restore_ops[saveable.name] = saveable.restore(
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\training\saving\saveable_object_util.py", line 131, in restore
    return resource_variable_ops.shape_safe_assign_variable_handle(
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 307, in shape_safe_assign_variable_handle
    shape.assert_is_compatible_with(value_tensor.shape)
  File "C:\Program Files\Python38\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 1134, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (340,) and (40,) are incompatible

The command I'm using: python train.py --epochs 200 --batch-size 4 --start-eval-epoch 0 --model-type p5 --use-pretrain True --dataset-type coco --dataset dataset/CV1/ --num-classes 5 --class-names CV1.names --coco-train-set train --coco-valid-set val --augment ssd_random_crop --p5-coco-pretrained-weights checkpoints/best_weight_p5_27_0.872

Let me know if you need the weight files to test. I can share
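The two shapes in the error can be explained by simple arithmetic if each detection head outputs `anchors * (5 + num_classes)` channels, where 5 covers the four box terms plus objectness. With 4 anchors per level (an assumption that happens to fit both numbers), 80 COCO classes give 340 and the 5 custom classes give 40. This suggests, without confirming, that the checkpoint was loaded into the 80-class pretrain model because it was passed through `--p5-coco-pretrained-weights`.

```python
def head_channels(num_classes, anchors_per_level=4):
    """Output channels of one detection head: anchors * (box + objectness + classes)."""
    return anchors_per_level * (5 + num_classes)


coco_head = head_channels(80)    # 4 * 85
custom_head = head_channels(5)   # 4 * 10
```

If this reading is right, a checkpoint from a 5-class run needs to be restored into a 5-class model, which is what the checkpoint-resume support listed in the update log (2021-06-27) is presumably for.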

Index out of bound

Trying to train on tiny model with custom data and image_size 608, got the following error. Using coco format
Any idea?

loading dataset...
 2% 2/88 [00:00<00:22,  3.77it/s]Traceback (most recent call last):
 File "train.py", line 298, in <module>
   main(args)
 File "train.py", line 194, in main
   coco_map_callback = CocoMapCallback(pred_generator,model,args,mAP_writer)
 File "/content/ScaledYOLOv4-tensorflow2/utils/fit_coco_map.py", line 30, in __init__
   for batch_img, batch_boxes, batch_valids in pred_generator_tqdm:
 File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1104, in __iter__
   for obj in iterable:
 File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 483, in __iter__
   for item in (self[i] for i in range(len(self))):
 File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 483, in <genexpr>
   for item in (self[i] for i in range(len(self))):
 File "/content/ScaledYOLOv4-tensorflow2/generator/coco_generator.py", line 183, in __getitem__
   y_true = get_y_true(self.max_side, batch_boxes, groundtruth_valids, self._args)
 File "/content/ScaledYOLOv4-tensorflow2/generator/get_y_true.py", line 91, in get_y_true
   grids[grid_index][batch_index][grid_xy[1]][grid_xy[0]][grid_anchor_index][0:4] = np.concatenate([dxdy,dwdh])
IndexError: index 19 is out of bounds for axis 0 with size 19
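With an image size of 608 and a stride of 32 the grid has 608 / 32 = 19 cells per side, so valid indices are 0 to 18. An index of exactly 19 would arise if a box center lies on the right or bottom image border (or slightly outside it). This is a guess at the cause, not a confirmed diagnosis; the helper below is a hypothetical sketch of clamping the cell index.

```python
def grid_cell(center_x, center_y, image_size, stride):
    """Return (gx, gy, grid_size) with the cell index clamped into the grid."""
    grid_size = image_size // stride
    # int() truncates toward zero; clamp so that a center on the border
    # falls into the last cell instead of one past it.
    gx = min(max(int(center_x / stride), 0), grid_size - 1)
    gy = min(max(int(center_y / stride), 0), grid_size - 1)
    return gx, gy, grid_size


# A center exactly on the right border (x = 608) would index cell 19
# without clamping; with clamping it lands in cell 18.
cell = grid_cell(608, 300, 608, 32)
```

Checking the annotations for boxes that touch or exceed the image bounds is another way to test this hypothesis.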

Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./pretrain/ScaledYOLOV4_p6_coco_pretrain/coco_pretrain

Somehow, I can't load the pretrained weights.
My pretrained folder for p6 contains the following files:

\pretrain\ScaledYOLOV4_p6_coco_pretrain\checkpoint
\pretrain\ScaledYOLOV4_p6_coco_pretrain\coco_pretrain.data-00000-of-00001
\pretrain\ScaledYOLOV4_p6_coco_pretrain\coco_pretrain.index

Can you tell me where my mistake is?

(To be more precise, I'm only interested in saving the keras models as .h5 for inference only in my environment.)
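A TensorFlow checkpoint is addressed by its prefix (here `coco_pretrain`), which stands for the `.index` file plus one or more `.data-*` shard files. The path in the error message is relative (`./pretrain/...`), so it is resolved against the current working directory. A quick check, sketched below with a hypothetical helper, is whether the prefix resolves from the directory where train.py is launched.

```python
from pathlib import Path


def checkpoint_prefix_exists(prefix):
    """True if '<prefix>.index' and at least one '<prefix>.data-*' file exist."""
    p = Path(prefix)
    has_index = Path(str(p) + ".index").is_file()
    has_data = has_index and any(p.parent.glob(p.name + ".data-*"))
    return has_index and has_data


# Relative paths are resolved against the current working directory.
found = checkpoint_prefix_exists("./pretrain/ScaledYOLOV4_p6_coco_pretrain/coco_pretrain")
```

If this returns `False` when run from the shell directory used for training, the working directory (or the folder name) is the first thing to look at.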

Cannot reproduce mAP with potholes data

I ran the command as specified in the docs to train on an RTX3090 using the potholes dataset as follows.

python train.py --use-pretrain True --model-type p5 --dataset-type voc --dataset dataset/pothole_voc --num-classes 1 --class-names pothole.names --voc-train-set dataset_1,train --voc-val-set dataset_1,val  --epochs 200 --batch-size 4 --multi-scale 320,352,384,416,448,480,512 --augment ssd_random_crop

However I cannot reproduce the mAP as presented.

Similarly the training loss looks quite different.

Any thoughts on what I might be doing wrong? Or perhaps there has been a material change to the repo since those results were posted.

Feedback appreciated and thanks for providing this good work to the community.

Would you mind uploading the pothole dataset weights?

Can you upload the weights trained on the pothole dataset? My laptop can't train the model, but I want to use it. Please!

Model fails to save after training

Hi,

I am seeing the following error:

Training is finished!
Exporting model...
Traceback (most recent call last):
File "C:\TensorFlowObjectDetection\TFODCourse\Scaled_YOLOv4_tf2\train.py", line 382, in <module>
main(args)
File "C:\TensorFlowObjectDetection\TFODCourse\Scaled_YOLOv4_tf2\train.py", line 374, in main
model = Yolov4(args, training=False)
File "C:\TensorFlowObjectDetection\TFODCourse\Scaled_YOLOv4_tf2\model\yolov4.py", line 11, in Yolov4
scaled_yolov4_csp_darknet53_outputs = scaled_yolov4_csp_darknet53(input,mode=args.model_type)
File "C:\TensorFlowObjectDetection\TFODCourse\Scaled_YOLOv4_tf2\model\CSPDarknet53.py", line 35, in scaled_yolov4_csp_darknet53
x = conv2d_bn_mish(x, 32, (3, 3), name="first_block")
File "C:\TensorFlowObjectDetection\TFODCourse\Scaled_YOLOv4_tf2\model\common.py", line 7, in conv2d_bn_mish
return x * tf.math.tanh(tf.math.softplus(x))
File "C:\TensorFlowObjectDetection\TFODCourse\tfod\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\TensorFlowObjectDetection\TFODCourse\tfod\lib\site-packages\keras\layers\core\tf_op_layer.py", line 119, in handle
return TFOpLambda(op)(*args, **kwargs)
File "C:\TensorFlowObjectDetection\TFODCourse\tfod\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
OverflowError: Exception encountered when calling layer "tf.math.softplus" (type TFOpLambda).

Python int too large to convert to C long

Call arguments received by layer "tf.math.softplus" (type TFOpLambda):
• features=tf.Tensor(shape=(None, None, None, 32), dtype=float32)
• name=None

I still have the checkpoints saved and a temp_model_variables.h5 has been created but I am unable to convert these into a saved model.

Any advice?

Multi GPU support

Four GPUs are available to TensorFlow on my machine, but only one is used during training.