
ai_basketball_games_video_editor's Introduction

AI Basketball Games Video Editor

AI Basketball Games Video Editor is a command-line program that generates basketball highlight videos using PyTorch YOLOv4 object detection. It analyzes the basketball and basketball-hoop locations collected from object detection, finds the frame indexes where shots occur, and cuts and merges those video frames into a highlight video.
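The shot-detection idea described above (tracking the ball relative to the hoop's bounding box across frames) could be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual algorithm (that lives in tool/utils_basketball.py); find_shot_frames and its input layout are invented for the example.

```python
def find_shot_frames(detections):
    """Flag a shot when the ball's center enters the hoop's bounding box
    and then exits below it (image y grows downward).

    detections: list of (frame_idx, (ball_x, ball_y), hoop_box) per frame,
    where hoop_box = (x1, y1, x2, y2) in pixel coordinates.
    """
    shots = []
    prev_inside = False
    for frame_idx, (bx, by), (x1, y1, x2, y2) in detections:
        inside = x1 <= bx <= x2 and y1 <= by <= y2
        # Ball was inside the hoop box last frame and has now dropped below it.
        if prev_inside and not inside and by > y2:
            shots.append(frame_idx)
        prev_inside = inside
    return shots
```

In practice the real algorithm has to cope with missed detections and occlusion, so a production version would smooth the ball trajectory over several frames rather than comparing only consecutive detections.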

├── README.md
├── video_editor.py                   demo to get basketball highlight video
├── pytorch_YOLOv4                    pytorch-YOLOv4 source code
│   ├── weights                       need to download weights
│   └── ...
├── tool
│   ├── utils_basketball.py           detect basketball shots algorithm
│   └── utils.py                  
├── dataset
│   └── your_video_name.mp4
├── result
│   ├── obj_log_name.data             save frame information and object detect result
│   └── your_output_video_name.mp4   

0. Environments

0.1 Get a copy

git clone https://github.com/OwlTing/AI_basketball_games_video_editor.git

0.2 Create virtual environments

conda create --name py36_env python=3.6
conda activate py36_env
cd AI_basketball_games_video_editor

0.3 Requirements

Debian 10
python 3.6
numpy
pandas
tqdm
cv2
pytorch 1.3.0
Please refer to the official documentation for installing PyTorch: https://pytorch.org/get-started/locally/
More details for different CUDA versions: https://pytorch.org/get-started/previous-versions/
Example:
conda install pytorch==1.3.0 torchvision==0.4.1 cudatoolkit=10.0 -c pytorch

Optional (For tensorrt yolov4 object detector engine):
tensorrt 7.0.0
Please refer to the official documentation for installing TensorRT with different CUDA versions:
https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html
Example: (For Debian 10 cuda 10.0)

  1. mkdir tensorrt
  2. From https://developer.nvidia.com/tensorrt, download TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.0.cudnn7.6.tar.gz
     (select TensorRT 7.0) into the directory tensorrt/
  3. tar xzvf TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.0.cudnn7.6.tar.gz
  4. export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/<path_your_tensorrt>/TensorRT-7.0.0.11/lib
  5. cd TensorRT-7.0.0.11/python/
  6. pip install tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl
  7. Copy the TensorRT libraries into /usr/lib/:
     sudo cp /<path_your_tensorrt>/TensorRT-7.0.0.11/lib/libnvinfer.so.7 /usr/lib/
     sudo cp /<path_your_tensorrt>/TensorRT-7.0.0.11/lib/libnvonnxparser.so.7 /usr/lib/
     sudo cp /<path_your_tensorrt>/TensorRT-7.0.0.11/lib/libnvparsers.so.7 /usr/lib/
     sudo cp /<path_your_tensorrt>/TensorRT-7.0.0.11/lib/libnvinfer_plugin.so.7 /usr/lib/
     sudo cp /<path_your_tensorrt>/TensorRT-7.0.0.11/lib/libmyelin.so.1 /usr/lib/
  8. pip install pycuda

1. Weights Download

1.1 darknet2pytorch

1.2 tensorrt

2. Use AI Basketball Games Video Editor

2.1 Prepare your basketball video

  • Put your basketball video in the directory dataset/

2.2 Prepare output folder

  • mkdir result

2.3 Run the demo

python video_editor.py --video_path VIDEO_PATH --output_path OUTPUT_PATH --output_video_name OUTPUT_VIDEO_NAME [OPTIONS]

# example
python video_editor.py --video_path dataset/basketball_demo.mp4 --output_path result/demo --output_video_name out_demo.mp4
  • It will generate your_output_video_name.mp4 and obj_log_name.data in the directory result/

  • If you have already finished extracting features, you can use --read_flag 1 to read the log and produce a different output video mode.

  • If you use the pytorch yolov4 object detector engine (--inference_detector pytorch),
    you can select any image input size inference_size = (height, width) with
    height = 320 + 96 * n, n in {0, 1, 2, 3, ...}
    width = 320 + 96 * m, m in {0, 1, 2, 3, ...}
    Example: --inference_size (1184, 1184) or --inference_size (704, 704)
    The default inference_size is (1184, 1184).

  • If you use the tensorrt yolov4 object detector engine (--inference_detector tensorrt),
    the only supported image input size is --inference_size (1184, 1184).
    The tensorrt engine runs about 3x faster (in FPS) than the pytorch engine.
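The input-size rule above (each dimension must equal 320 + 96 · k for some non-negative integer k) can be expressed as a small helper. is_valid_dim and nearest_valid_dim are hypothetical names for this sketch, not part of the project:

```python
def is_valid_dim(d):
    """True if d is an allowed inference dimension: 320 + 96 * k, k >= 0."""
    return d >= 320 and (d - 320) % 96 == 0

def nearest_valid_dim(d):
    """Round d to the closest allowed dimension, never going below 320."""
    k = max(0, round((d - 320) / 96))
    return 320 + 96 * k
```

For example, 1184 and 704 satisfy the rule (k = 9 and k = 4), which is why they appear in the documented examples, while an arbitrary size like 720 would need to be snapped to 704.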

  • You can use --output_mode shot to select different output video mode.

    output video mode:
    full            show person, basketball, basketball_hoop, frame_information
    basketball      show basketball, basketball_hoop, frame_information
    shot            show basketball shot, frame_information
    standard        show frame_information
    clean           only cut the video
    


  • You can refer to the command-line options:
    optional arguments:
    -h, --help                                       show this help message and exit
    
    --video_path VIDEO_PATH                          input video path (default: None)
                                                     
    --output_path OUTPUT_PATH                        output folder path (default: None)
                                                     
    --output_video_name OUTPUT_VIDEO_NAME            output video name (default: None)
                                                     
    --highlight_flag HIGHLIGHT_FLAG                  select 1 with auto-generated highlight or 
                                                     0 without auto-generated highlight (default: 1)
                                                     
    --output_mode OUTPUT_MODE                        output video mode 
                                                     full       show person basketball basketball_hoop frame_information 
                                                     basketball show basketball basketball_hoop frame_information 
                                                     shot       show basketball shot frame_information 
                                                     standard   show frame_information 
                                                     clean      only cutting video (default: shot)
                                                     
    --process_frame_init PROCESS_FRAME_INIT          start processing frame (default: 0)
                                                     
    --process_frame_final PROCESS_FRAME_FINAL        end processing frame. If process_frame_final < 0, 
                                                     use video final frame (default: -1)
                                                     
    --obj_log_name OBJ_LOG_NAME                      save frame information and obj detect result 
                                                     (default: obj_log_name.data)
                                                     
    --save_step SAVE_STEP                            save obj log for each frame step (default: 2000)
                                                     
    --weight_path WEIGHT_PATH                        Yolov4 weight path (default: pytorch_YOLOv4/weights/yolov4-basketball.weights)
                                                     
    --cfg_path CFG_PATH                              Yolov4 cfg path (default: pytorch_YOLOv4/cfg/yolov4-basketball.cfg)
    
    --num_classes NUM_CLASSES                        num classes = 3 (person/basketball/basketball_hoop) (default: 3)
                                                     
    --namesfile_path NAMESFILE_PATH                  Yolov4 class names path (default: pytorch_YOLOv4/data/basketball_obj.names)
                                                     
    --inference_detector INFERENCE_DETECTOR          object detector engine. You can select pytorch or tensorrt (default: pytorch)
                                                     
    --inference_size INFERENCE_SIZE                  Image input size for inference 
                                                     If you use pytorch yolov4 object detector engine 
                                                     height = 320 + 96 * n, n in {0, 1, 2, 3, ...} 
                                                     width = 320 + 96 * m, m in {0, 1, 2, 3, ...} 
                                                     inference_size= (height, width) 
                                                     
                                                     If you use tensorrt yolov4 object detector engine Image input size for
                                                     inference only with inference_size = (1184, 1184) (default: (1184, 1184))
                                                     
    --read_flag READ_FLAG                            read log mode flag. If you have already finished extracting 
                                                     features, you can select 1 to read the log for a different 
                                                     output video mode. (default: 0)
                                                                                                      
    --cut_frame CUT_FRAME                            cut frame range around shot frame index for highlight video (default: 50)  
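The --cut_frame option above implies the highlight-cutting step: each detected shot frame index expands into a window of ±cut_frame frames, and overlapping windows would naturally merge into one clip. A minimal sketch of that logic, assuming the merge behavior (highlight_ranges is a hypothetical helper, not the project's code):

```python
def highlight_ranges(shot_frames, cut_frame=50, total_frames=None):
    """Expand each shot frame index into [i - cut_frame, i + cut_frame]
    and merge overlapping windows into final clip boundaries."""
    ranges = []
    for i in sorted(shot_frames):
        start = max(0, i - cut_frame)
        end = i + cut_frame
        if total_frames is not None:
            end = min(end, total_frames - 1)  # clamp to the video's last frame
        if ranges and start <= ranges[-1][1]:
            ranges[-1][1] = max(ranges[-1][1], end)  # overlaps: extend last clip
        else:
            ranges.append([start, end])
    return [tuple(r) for r in ranges]
```

With the default cut_frame of 50, two shots 30 frames apart would fall into a single merged clip rather than producing two overlapping cuts.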
    

Reference:

@article{yolov4,
  title={YOLOv4: Optimal Speed and Accuracy of Object Detection},
  author={Alexey Bochkovskiy and Chien-Yao Wang and Hong-Yuan Mark Liao},
  journal={arXiv},
  year={2020}
}

Contact:
Issues should be raised directly in the repository.
If you are very interested in this project, please feel free to contact me ([email protected]).

ai_basketball_games_video_editor's People

Contributors

owlgeorgechen


ai_basketball_games_video_editor's Issues

Took much time to generate highlight video

I'm pleased to file the first issue here. I'm very interested in this tool, and I'm a newbie to PyTorch.
Thanks for your effort. ^^

I got a question:

When I use your wonderful script, it works well, but for longer videos it takes a lot of time...

My test PC Env

cpu: i5 8500 3.00GHz
gpu: GTX1060 6GB

Code

    def detect(self, model, img, image_size):
        model.eval()

        IN_IMAGE_H, IN_IMAGE_W = image_size

        sized = cv2.resize(img, (IN_IMAGE_W, IN_IMAGE_H))
        sized = cv2.cvtColor(sized, cv2.COLOR_BGR2RGB)

        t0 = time.time()

        if type(sized) == np.ndarray and len(sized.shape) == 3:  # cv2 image
            sized = torch.from_numpy(sized.transpose(2, 0, 1)).float().div(255.0).unsqueeze(0)
        elif type(sized) == np.ndarray and len(sized.shape) == 4:
            sized = torch.from_numpy(sized.transpose(0, 3, 1, 2)).float().div(255.0)
        else:
            print("unknown image type")
            exit(-1)

        use_cuda = 1
        if use_cuda:
            sized = sized.cuda()
        sized = torch.autograd.Variable(sized)

        t1 = time.time()

        with torch.no_grad():
            output = model(sized)

        t2 = time.time()

        # print('-----------------------------------')
        # print('           Preprocess : %f' % (t1 - t0))
        # print('      Model Inference : %f' % (t2 - t1))
        # print('-----------------------------------')

        boxes = post_processing(img, 0.4, 0.6, output)

        return boxes

model:

        m = Darknet(cfg_path)
        # m.print_network()
        m.load_weights(weight_path)
        print('Loading weights from %s... Done!' % (weight_path))

        if use_cuda:
            m.cuda()

        self.num_classes = m.num_classes
        self.class_names = load_class_names(namesfile_path)
        self.engine = m
        self.image_size = inference_size

When I use inference_size (1184, 1184), each frame takes 200+ ms in the following step:

        with torch.no_grad():
            output = model(sized)

and the total video frame count is 36000+, which means the total time cost would be > 3 hours.

So, does PyTorch really have such bad performance? Do you know the reason? Thank you. ^^
