🚆 2nd Place Solution of AI Journey Contest 2021: AITrain 🚆

Competition description

The goal of the competition is to create a computer vision system for Semantic Rail Scene Understanding. Developing an accurate and robust algorithm is a clear way to enhance rail traffic safety. Successful models can be incorporated in real-time applications to warn train drivers about possible collisions with potentially hazardous objects.

The dataset consists of over 7000 images from the ego-perspective of trains. Each image is annotated with bounding boxes of 11 different types of objects (such as car, human, wagon, trailing switch) and dense pixel-wise semantic labeling for 3 different classes.

The quality metric of the competition is a weighted average of mAP@.5 and meanIoU:

competition_metric = 0.7 * mAP@.5 + 0.3 * meanIoU
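
The formula above can be written as a small helper. This is only a sketch: the function and argument names are illustrative and do not come from the competition code.

```python
def competition_metric(map_50: float, mean_iou: float) -> float:
    """Weighted average of mAP@.5 and meanIoU, following the formula above."""
    return 0.7 * map_50 + 0.3 * mean_iou


# Illustrative values only (not taken from any run in this README).
print(round(competition_metric(0.5, 0.9), 4))  # 0.62
```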

This is a code competition, so testing time and resources are limited:

  • Time for inference: 15min for 300 images;
  • 1 GPU Tesla V100 32 GB;
  • 3 vCPU;
  • 94 GB RAM.

Solutions are run in a Docker container in offline mode.
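
To get a feel for the time limit, the budget can be turned into seconds per image and per forward pass. This is a rough sketch: it assumes every model processes every augmented view and ignores model loading and ensembling overhead. The helper name is hypothetical.

```python
TIME_LIMIT_SECONDS = 15 * 60  # 15 minutes, from the limits above
NUM_IMAGES = 300


def per_forward_pass_budget(num_models: int, tta_views: int = 1) -> float:
    """Average seconds available per forward pass of a single model.

    Assumes every model sees every TTA view of every image and that
    nothing else (I/O, model loading, ensembling) takes time.
    """
    per_image = TIME_LIMIT_SECONDS / NUM_IMAGES
    return per_image / (num_models * tta_views)


print(TIME_LIMIT_SECONDS / NUM_IMAGES)  # 3.0 seconds per image
print(per_forward_pass_budget(num_models=6, tta_views=3))
```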

Solution

The two main architectures in the solution are Panoptic FPN and YOLOv5. We don't train separate models for the semantic segmentation task; instead we rely solely on Panoptic FPN and multitask learning.

In a nutshell, Panoptic FPN is an extended version of Mask-RCNN with an additional semantic segmentation branch:

Panoptic FPN architecture. Image source

YOLOv5 is a high-performing, lightweight and very popular object detection framework. Its simple codebase makes it quick to train a model on a custom dataset, which makes YOLOv5 an attractive choice for CV competitions.

The solution is an ensemble of 6 models:

  • Panoptic FPN with ResNet101 backbone and standard Faster-RCNN ROI head. The shortest image side size is chosen from [1024, 1536] with a step of 64.
  • Panoptic FPN with ResNet50 backbone and Cascade-RCNN ROI head. Image size: [1024, 1536] with a step of 64.
  • RetinaNet with ResNet50 backbone. Image size: [1280, 1796] with a step of 64.
  • YOLOv5m6 with 2048 image size.
  • YOLOv5m6 with 2560 image size.
  • YOLOv5l6 with 1536 image resolution and label smoothing of 0.1.
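
The "[low, high] with a step of 64" notation used for the Detectron2 models can be read as a list of candidate shortest-side sizes, one of which is sampled per training image. The sketch below assumes uniform random sampling; the helper names are illustrative and not taken from the repository.

```python
import random
from typing import List


def shortest_side_choices(low: int, high: int, step: int = 64) -> List[int]:
    """All candidate shortest-side sizes between low and high (inclusive)."""
    return list(range(low, high + 1, step))


def sample_shortest_side(low: int, high: int, step: int = 64) -> int:
    """Pick one size at random (assumption: uniform sampling)."""
    return random.choice(shortest_side_choices(low, high, step))


print(shortest_side_choices(1024, 1536))
# [1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536]
```

Under this reading, a range such as [1280, 1796] stops at 1792, the last value on the 64-pixel grid starting at 1280.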

To ensemble different models we use Weighted Boxes Fusion (WBF) for object detection and a simple average for semantic segmentation. We also tried NMS, Soft-NMS and Non-Maximum Weighted (NMW), but WBF performed best. We set iou_threshold=0.6 and give equal weights to all models.

Weighted Boxes Fusion. Image source
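
The idea behind WBF can be illustrated with a simplified pure-Python version. This is a sketch for intuition only, not the implementation used in the solution: it assumes equal model weights, strictly positive scores, and boxes given as (x1, y1, x2, y2). All function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(members):
    """Score-weighted average box and mean score of a cluster."""
    total = sum(score for _, score in members)
    box = [sum(b[i] * s for b, s in members) / total for i in range(4)]
    return box, total / len(members)


def simple_wbf(per_model_preds, iou_threshold=0.6):
    """per_model_preds: one list per model of (box, score, label) tuples."""
    n_models = len(per_model_preds)
    candidates = sorted(
        (p for preds in per_model_preds for p in preds),
        key=lambda p: p[1],
        reverse=True,
    )
    clusters = []  # each cluster: {"label": ..., "members": [(box, score), ...]}
    for box, score, label in candidates:
        best, best_iou = None, iou_threshold
        for cluster in clusters:
            if cluster["label"] != label:
                continue
            overlap = iou(fuse(cluster["members"])[0], box)
            if overlap > best_iou:
                best, best_iou = cluster, overlap
        if best is None:
            clusters.append({"label": label, "members": [(box, score)]})
        else:
            best["members"].append((box, score))
    results = []
    for cluster in clusters:
        fused_box, mean_score = fuse(cluster["members"])
        # Down-weight boxes that only a few models agree on.
        confidence = mean_score * min(len(cluster["members"]), n_models) / n_models
        results.append((fused_box, confidence, cluster["label"]))
    return results
```

Unlike NMS, which keeps one box and discards its overlapping neighbours, this approach averages all matching boxes, so every model contributes to the final coordinates.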

A bag of tricks, tweaks and freebies is used to improve the performance:

  • Multitask learning: all Detectron2 models are trained to solve both object detection and semantic segmentation. Multitask learning, if applied correctly, improves generalization and reduces overfitting. Moreover, solving both tasks at once makes inference more efficient.
  • Test-time augmentation (TTA): for each model we run inference on several augmented versions of the original images. We use image resizing with scales [0.8, 1, 1.2] relative to the maximum training image size.
  • High image resolution for both training and inference. The dataset contains a large number of tiny objects, so it is crucial to use high-resolution images.
  • Multi-scale training: using different image resolutions during training improves the final performance.
  • Light augmentations: the augmentations are limited to Random Crop, Random Brightness, Random Contrast and Random Saturation. Flips are not used because some classes depend on the side (e.g. facing switch left or facing switch right). Harder color and spatial augmentations hurt performance, probably because of the large number of tiny objects and of objects whose class can be recognized only by color (e.g. traffic light permitting or not).
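
The resize-based test-time augmentation described in the list can be sketched as follows. This is an illustration under assumptions: the scales are applied to the shortest image side, and `predict_at_size` is a hypothetical callable standing in for a real model, returning boxes in the coordinates of the resized image.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def tta_sizes(max_train_size: int, scales: Sequence[float] = (0.8, 1.0, 1.2)) -> List[int]:
    """Shortest-side sizes used at inference, relative to the max training size."""
    return [int(round(max_train_size * s)) for s in scales]


def predict_with_resize_tta(
    image_hw: Tuple[int, int],
    predict_at_size: Callable[[int], Tuple[List[Box], List[float]]],
    max_train_size: int,
    scales: Sequence[float] = (0.8, 1.0, 1.2),
):
    """Run the (hypothetical) predictor once per size and map boxes back.

    Returns one list of boxes and one list of scores per TTA view, all in
    original-image pixel coordinates, ready to be passed to an ensembling step.
    """
    short_side = min(image_hw)
    all_boxes, all_scores = [], []
    for size in tta_sizes(max_train_size, scales):
        factor = size / short_side  # resize factor applied to the image
        boxes, scores = predict_at_size(size)
        all_boxes.append([tuple(c / factor for c in box) for box in boxes])
        all_scores.append(list(scores))
    return all_boxes, all_scores


print(tta_sizes(1536))  # [1229, 1536, 1843]
```

Each view's predictions can then be treated like the output of a separate model when the boxes are fused.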

The implementation is heavily based on Detectron2 and YOLOv5 frameworks.


Results

The results in the table correspond to an inference without TTA if not specified otherwise.

| Run № | Model | mAP:0.5 local | mIoU local | Metric local | mAP:0.5 public LB | mIoU public LB | Metric public LB |
|---|---|---|---|---|---|---|---|
| 1 | Panoptic FPN, ResNet50 | 0.583 | 0.8778 | 0.6716 | 0.375 | 0.892 | 0.530 |
| 2 | Panoptic FPN, ResNet101 | 0.604 | 0.8885 | 0.6893 | - | - | - |
| 4 | Panoptic FPN, ResNet50, Cascade ROI head | 0.606 | 0.8626 | 0.6832 | - | - | - |
| 5 | RetinaNet, ResNet50 | 0.594 | - | - | - | - | - |
| 6 | YOLOv5m6, TTA, img_size=2048 | 0.619 | - | - | - | - | - |
| 7 | YOLOv5m6, TTA, img_size=2560 | 0.606 | - | - | - | - | - |
| 9 | YOLOv5l6, TTA, img_size=1536, label_smoothing=0.1 | 0.607 | - | - | - | - | - |

| Ensembled run numbers | mAP:0.5 local | mIoU local | Metric local | mAP:0.5 public LB | mIoU public LB | Metric public LB |
|---|---|---|---|---|---|---|
| 2 + 4 + 5 | 0.642 | 0.8855 | 0.7153 | 0.415 | 0.897 | 0.560 |
| 2 + 4 + 6 | 0.657 | 0.8855 | 0.7255 | - | - | - |
| 2 + 4 + 5 + 6 | 0.669 | 0.8855 | 0.7341 | 0.421 | 0.897 | 0.564 |
| 2 + 4 + 5 + 6 + 7 | 0.676 | 0.8855 | 0.7393 | 0.440 | 0.897 | 0.577 |
| 2 + 4 + 5 + 6 + 7 with TTA | 0.667 | 0.8875 | 0.7336 | 0.453 | 0.899 | 0.587 |
| 2 + 4 + 5 + 6 + 7 + 9 | 0.685 | 0.8855 | 0.7449 | 0.434 | 0.897 | 0.573 |
| 2 + 4 + 5 + 6 + 7 + 9 with TTA | 0.674 | 0.8875 | 0.7384 | 0.447 | 0.899 | 0.583 |

How to run

🐳 Docker

Start a Docker container via docker-compose:

JUPYTER_PORT=8888 GPUS=all docker-compose -p $USER up -d --build

All the following steps are supposed to be run in the container.

Dataset

Download and unpack the data into the data/raw directory:

data/
├── raw/
│   ├── bboxes
│   ├── images
│   └── masks

Run the following commands to prepare the dataset for Detectron2 models:

PYTHONPATH=$(pwd)/src python3 -m data.data2coco
PYTHONPATH=$(pwd)/src/baseline python3 -m evaluation.masks2json --path_to_masks data/raw/masks --path_to_save test.json
PYTHONPATH=$(pwd)/src python3 -m data.prepare_masks
PYTHONPATH=$(pwd)/src python3 -m data.split

To prepare the dataset for YOLOv5, use the baseline notebook provided by the organizers.

The data structure after the data preparation should look as follows:

data/
├── raw/
│   ├── bboxes
│   ├── images
│   ├── masks
│   ├── detection_coco.json
│   └── segmentation_coco.json
├── processed/
│   ├── train
│   ├── val
│   ├── masks
│   └── test_filenames.json
├── yolo/
│   ├── images
│   └── labels

You can take a look at the processed dataset with the visualization notebook.
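
Before training, it may help to verify that the expected layout shown above is in place. The script below is a hypothetical helper, not part of the repository; it only checks that the listed paths exist under the data root.

```python
from pathlib import Path

# Paths relative to the data root, copied from the layout above.
EXPECTED_PATHS = [
    "raw/bboxes",
    "raw/images",
    "raw/masks",
    "raw/detection_coco.json",
    "raw/segmentation_coco.json",
    "processed/train",
    "processed/val",
    "processed/masks",
    "processed/test_filenames.json",
    "yolo/images",
    "yolo/labels",
]


def missing_paths(data_root="data"):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    print("missing:", missing if missing else "none")
```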

Training

Detectron2 models

The configs for Detectron2 models are located here. For example, to train a Panoptic FPN with a ResNet101 backbone, run the following command:

bash train_dt2.sh my-sota-run main-v100

YOLOv5 models

To train a YOLOv5 model, run the following commands:

cd src/baseline/yolov5
python3 train.py --rect --img 2048 --batch 16 --epochs 100 --data aitrain_dataset.yaml --weights yolov5m6.pt --hyp data/hyps/hyp_aitrain.yaml --name my-sota-run
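
To launch several YOLOv5 runs that differ only in image size, the command above can be generated programmatically. This sketch reuses only the flags shown in the command; the run names are hypothetical, and the batch size is kept as in the original command and may need lowering for larger images.

```python
import shlex


def yolov5_train_command(img_size: int, weights: str, run_name: str) -> list:
    """Build the train.py argument list with the flags from the command above."""
    return [
        "python3", "train.py",
        "--rect",
        "--img", str(img_size),
        "--batch", "16",
        "--epochs", "100",
        "--data", "aitrain_dataset.yaml",
        "--weights", weights,
        "--hyp", "data/hyps/hyp_aitrain.yaml",
        "--name", run_name,
    ]


# Print one command per image size; run names here are placeholders.
for size in (2048, 2560):
    print(shlex.join(yolov5_train_command(size, "yolov5m6.pt", f"yolov5m6-{size}")))
```

The resulting list can also be passed directly to `subprocess.run(..., cwd="src/baseline/yolov5")` instead of being printed.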

Evaluation

Run this notebook to evaluate a model and to run a grid search over the inference parameters. To visualize the predictions, use this notebook.

Make a submission

The training results (model weights and configs) should be located in the outputs/ directory. Modify the solution file to select the required runs, then run:

./make_submission.sh "dt2-model-1,dt2-model2,dt2-model3" "yolo-model-1,yolo-model-2"
