Code Monkey home page Code Monkey logo

os2d-cirtorch-pytorch-gem's Introduction

OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features

This repo is the implementation of the following paper:

OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
Anton Osokin, Denis Sumin, Vasily Lomakin
arXiv:2003.06800v1, 2020

If you use our ideas, code or data, please, cite our paper.

Citation in bibtex
@article{osokin2020os2d,
    title={{OS2D}: One-Stage One-Shot Object Detection by Matching Anchor Features}},
    author={Anton Osokin and Denis Sumin and Vasily Lomakin},
    year={2020},
    journal={arXiv:2003.06800v1},
}

License

This software is released under the MIT license, which means that you can use the code in any way you want.

Requirements

  1. python >= 3.7
  2. pytorch >= 1.4, torchvision >=0.5
  3. NVIDIA GPU, tested with V100 and GTX 1080 Ti
  4. Installed CUDA, tested with v10.0

See INSTALL.md for the package installation.

Demo

See our demo-notebook for an illustration of our method.

Dataset installation

  1. Grozi-3.2k dataset with our annotation (0.5GB): download from Google Drive or with the magic command and unpack to $OS2D_ROOT/data
cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh data/grozi.zip 1Fx9lvmjthe3aOqjvKc6MJpMuLF22I1Hp
unzip data/grozi.zip -d data
  1. Extra test sets of retail products (0.1GB): download from Google Drive or with the magic command and unpack to $OS2D_ROOT/data
cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh data/retail_test_sets.zip 1Vp8sm9zBOdshYvND9EPuYIu0O9Yo346J
unzip data/retail_test_sets.zip -d data
  1. INSTRE datasets (2.3GB) are re-hosted in Center for Machine Perception in Prague (thanks to Ahmet Iscen!):
cd $OS2D_ROOT
wget ftp://ftp.irisa.fr/local/texmex/corpus/instre/gnd_instre.mat -P data/instre  # 200KB
wget ftp://ftp.irisa.fr/local/texmex/corpus/instre/instre.tar.gz -P data/instre  # 2.3GB
tar -xzf data/instre/instre.tar.gz -C data/instre
  1. If you want to add your own dataset you should create an instance of the DatasetOneShotDetection class and then pass it into the functions creating dataloaders build_train_dataloader_from_config or build_eval_dataloaders_from_cfg from os2d/data/dataloader.py. See os2d/data/dataset.py for docs and examples.

Trained models

We release three pretrained models:

  1. OS2D V2-train: Google Drive
  2. OS2D V1-train: Google Drive
  3. OS2D V2-init: Google Drive

You can download them with the magic command:

cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh models/os2d_v2-train.pth 1l_aanrxHj14d_QkCpein8wFmainNAzo8
./os2d/utils/wget_gdrive.sh models/os2d_v1-train.pth 1ByDRHMt1x5Ghvy7YTYmQjmus9bQkvJ8g
./os2d/utils/wget_gdrive.sh models/os2d_v2-init.pth 1sr9UX45kiEcmBeKHdlX7rZTSA4Mgt0A7

Evaluation

  1. OS2D V2-train (best model)

For a fast eval on a validation set, one can do use a single scale of images with this script (will give 85.58 mAP on the validation set "grozi-val-new-cl"):

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-train.pth eval.scales_of_image_pyramid "[1.0]"

Multiscale evaluation gives better results - scripts below use the default setting with 7 scales: 0.5, 0.625, 0.8, 1, 1.2, 1.4, 1.6. Note that this evaluation can be slower because of the multiple scale and a lot of classes in the dataset.

To evaluate on the validation set with multiple scales, run:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-train.pth
  1. OS2D V1-train

To evaluate on the validation set run:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model False model.use_simplified_affine_model True model.backbone_arch ResNet101 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v1-train.pth
  1. OS2D V2-init

To evaluate on the validation set run:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-init.pth

Training

Pretrained models

In this project, we do not train models from scratch but start from some pretrained models. For instructions how to get them, see models/README.md.

Best models

Our V2-train model on the Grozi-3.2k dataset was trained using this command:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-renamed.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining True output.path output/os2d_v2-train

Dut to hard patch mining, this process is quite slow. Without it, training is faster, but produces slightly worse results:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-renamed.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining False output.path output/os2d_v2-train-nomining

For the V1-train model, we used this command:

cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model False model.use_simplified_affine_model True train.objective.loc_weight 0.2 train.model.freeze_bn_transform False model.backbone_arch ResNet101 init.model models/gl18-tl-resnet101-gem-w-a4d43db-converted.pth train.mining.do_mining False output.path output/os2d_v1-train

Note that these runs need a lot of RAM due to caching of the whole training set. If this does not work for you you can use parameters train.cache_images False, which will load images on the fly, but can be slow. Also note that several first iterations of training can be slow bacause of "warming up", i.e., computing the grids of anchors in Os2dBoxCoder. Those computations are cached, so everyhitng will eventually run faster.

For the rest of the training scripts see below.

Rerunning experiments on retail and INSTRE datasets

All the experiments ob this project were run with our job helper. For each experiment, one program an experiment structure (in python) and calls several technical function provided by the launcher. See, e.g., this file for an example.

The launch happens as follows:

# add OS2D_ROOT to the python path - can be done, e.g., as follows
export PYTHONPATH=$OS2D_ROOT:$PYTHONPATH
# call the experiment script
python ./experiments/launcher_exp1.py LIST_OF_LAUNCHER_FLAGS

Extra parameters in LIST_OF_LAUNCHER_FLAGS are parsed by the launcher and contain some useful options about the launch:

  1. --no-launch allows to prepare all the scripts of the experiment without the actual launch.
  2. --slurm allows to prepare SLURM jobs and launches (if the is no --no-launch) with sbatch.
  3. --stdout-file and --stderr-file - files where to save stdout and stderr, respectively (relative to the log_path defined in the experiment description).
  4. For many SLURM related parameters, see the launcher.

Our experiments can be found here:

  1. Experiments with OS2D
  2. Experiments with the detector-retrieval baseline
  3. Experiments with the CoAE baseline

Baselines

We have added two baselines in this repo:

  1. Class-agnostic detector + image retrieval system: see README for details.
  2. Co-Attention and Co-Excitation, CoAE (original code, paper): see README for details.

Acknowledgements

We would like to personally thank Ignacio Rocco, Relja Arandjelović, Andrei Bursuc, Irina Saparina and Ekaterina Glazkova for amazing discussions and insightful comments without which this project would not be possible.

This research was partly supported by Samsung Research, Samsung Electronics, by the Russian Science Foundation grant 19-71-00082 and through computational resources of HPC facilities at NRU HSE.

This software was largely inspired by a number of great repos: weakalign, cnnimageretrieval-pytorch, torchcv, maskrcnn-benchmark. Special thanks goes to the amazing PyTorch.

os2d-cirtorch-pytorch-gem's People

Contributors

aosokin avatar

Watchers

James Cloos avatar paper2code - bot avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.