Dynamic YOLO for Small Underwater Object Detection

Underwater object detection is one of the most essential methods for marine exploration. However, small objects in underwater environments pose a crucial challenge that degrades detection performance dramatically. In this paper, a dynamic YOLO detector is presented to alleviate this problem. First, a lightweight backbone network is built on deformable convolution v3, with some specialized designs for small object detection. Second, a unified feature fusion framework based on channel-, scale-, and spatial-aware attention is proposed to fuse feature maps from different scales, so that the increased capability of the proposed backbone is fully utilized. Lastly, a simple but effective detection head is designed to resolve the conflict between classification and localization by disentangling and aligning the two tasks. With this alignment, dynamic YOLO gains the ability to localize robustly. Extensive experiments are conducted on benchmark datasets to demonstrate the effectiveness of the proposed model. Without bells and whistles, dynamic YOLO outperforms recent state-of-the-art methods by a large margin of $+1.2 \ mAP$ and $+1.8 \ AP_{S}$ on the $\textit{DUO}$ dataset. Experimental results on the $\textit{Pascal VOC}$ and $\textit{MS COCO}$ datasets also demonstrate the superiority of the proposed method. Finally, ablation studies on the $\textit{DUO}$ dataset are conducted to validate the effectiveness and efficiency of each design.

Usage

Our detection code is developed on top of MMDetection v3.0.

Install

  • Clone this repo:
git clone https://github.com/chenjie04/Dynamic-YOLO.git
cd Dynamic-YOLO
  • Create a conda virtual environment and activate it:
conda create -n Dynamic-YOLO python=3.8 -y
conda activate Dynamic-YOLO
  • Install PyTorch following the official instructions, e.g.:
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
  • Install MMEngine, MMCV, and MMDetection using MIM:
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
mim install mmdet
  • Install and configure wandb:
pip install wandb
wandb login

Provide your API key when prompted.

  • Install other requirements:
pip install timm # For stochastic depth
pip install ninja # For the compilation of DCN v3 
  • Compile DCN v3 CUDA operators:
cd ./ops_dcnv3
sh ./make.sh
# unit test (every check should report True)
python test.py
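
Before compiling the operators or starting a run, it can help to confirm that the packages from the steps above are importable in the active environment. The following is a minimal sketch using only the Python standard library. The helper name `check_packages` is hypothetical and not part of this repository. It assumes each package's import name matches its distribution name, which may not hold for every install method, so an importable package without matching metadata is reported as "unknown".

```python
from importlib import metadata, util

# Import names of the packages installed in the steps above.
REQUIRED = ("torch", "torchvision", "mmengine", "mmcv", "mmdet", "timm", "wandb")


def check_packages(names=REQUIRED):
    """Map each import name to its version, "unknown", or None if not importable."""
    status = {}
    for name in names:
        if util.find_spec(name) is None:
            status[name] = None  # package cannot be imported
            continue
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name.
            status[name] = "unknown"
    return status


if __name__ == "__main__":
    for name, version in check_packages().items():
        print(f"{name:<12} {version or 'MISSING'}")
```

Any line reported as MISSING points to an install step that should be repeated inside the `Dynamic-YOLO` conda environment.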

Data Preparation

Download the Detecting Underwater Objects (DUO) dataset. It is recommended to download and extract the dataset somewhere outside the project directory. The folder structure should look as follows:

|--data
|    |--coco
|    |--DUO
|    |--VOCdevkit
|
|--Dynamic-YOLO
|    |--configs
|    .....
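
A quick way to confirm this layout before training is to check for the expected directories. This is a small sketch, not part of the repository. The function name `missing_dirs` is hypothetical, and the default list simply mirrors the tree above. Pass a shorter tuple if you only use DUO.

```python
from pathlib import Path

# Directories from the tree above, relative to the common parent folder.
EXPECTED = ("data/coco", "data/DUO", "data/VOCdevkit", "Dynamic-YOLO")


def missing_dirs(root, expected=EXPECTED):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from inside the Dynamic-YOLO checkout,
    # so the common parent folder is one level up.
    for rel in missing_dirs(Path.cwd().parent):
        print(f"missing: {rel}")
```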

Training

  • Multi-GPU training (the trailing argument, here 2, is the number of GPUs):
bash dist_train.sh configs/dynamic_yolo/dynamic_yolo_s_300e_DUO.py 2
  • Single-GPU training:
python train.py configs/dynamic_yolo/dynamic_yolo_s_300e_DUO.py

Testing

python test.py configs/dynamic_yolo/dynamic_yolo_s_300e_DUO.py work_dirs/dynamic_yolo_s_300e_DUO/epoch_300.pth
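
If training stopped before epoch 300, the checkpoint path in the command above will not exist. The sketch below picks the highest-numbered checkpoint in a work directory. It assumes checkpoints follow the `epoch_<N>.pth` naming seen above, and `latest_checkpoint` is a hypothetical helper, not a function provided by this repository or by MMDetection.

```python
import re
from pathlib import Path

# Matches file names such as epoch_300.pth and captures the epoch number.
_EPOCH_RE = re.compile(r"^epoch_(\d+)\.pth$")


def latest_checkpoint(work_dir):
    """Return the epoch_<N>.pth path with the largest N, or None if none exist."""
    best_path, best_epoch = None, -1
    for path in Path(work_dir).glob("epoch_*.pth"):
        match = _EPOCH_RE.match(path.name)
        # Compare epochs numerically so that epoch_10 ranks above epoch_9.
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("work_dirs/dynamic_yolo_s_300e_DUO"))
```

The printed path can then be passed as the second argument to `test.py` in place of `epoch_300.pth`.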

Citation

If you find this project useful in your research, please consider citing:

@article{chen2024dynamic,
  title={Dynamic YOLO for small underwater object detection},
  author={Chen, Jie and Er, Meng Joo},
  journal={Artificial Intelligence Review},
  volume={57},
  number={7},
  pages={1--23},
  year={2024},
  publisher={Springer}
}
