detectron.pytorch's Introduction

A Pytorch Implementation of Detectron

Example output of e2e_mask_rcnn-R-101-FPN_2x using Detectron pretrained weight.

Corresponding example output from Detectron.

Example output of e2e_keypoint_rcnn-R-50-FPN_s1x using Detectron pretrained weight.

This code follows the implementation architecture of Detectron. Only part of the functionality is supported. Check this section for more information. This code now supports PyTorch 1.0 and TorchVision 0.3.

With this code, you can...

  1. Train your model from scratch.
  2. Run inference using the pretrained weight files (*.pkl) from Detectron.

This repository was originally built on jwyang/faster-rcnn.pytorch. However, after many modifications, the structure has changed considerably and is now more similar to Detectron. I deliberately made everything similar or identical to Detectron's implementation, so that results can be reproduced directly from the official pretrained weight files.

This implementation has the following features:

  • It is pure PyTorch code, apart from some CUDA code.

  • It supports multi-image batch training.

  • It supports multi-GPU training.

  • It supports two pooling methods. Note that only RoI Align has been revised to match the implementation in Caffe2, so use it.

  • It is memory efficient. For data batching, there are two techniques available to reduce memory usage: 1) Aspect grouping: group images with similar aspect ratios in a batch. 2) Aspect cropping: crop images that are too long. Aspect grouping is implemented in Detectron, so it is used by default. Aspect cropping is an idea from jwyang/faster-rcnn.pytorch, and it is not used by default.

    Besides that, I implemented a customized nn.DataParallel module that allows a different batch blob size on each GPU. See the My nn.DataParallel section for more details.
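The two batching techniques can be illustrated with a minimal sketch. The function names, the (width, height) tuple layout, and the width/height definition of the aspect ratio are assumptions made for illustration; the sampler in this repository and in Detectron may differ in detail.

```python
import random


def group_by_aspect_ratio(sizes, batch_size, seed=0):
    """Return batches of image indices; each batch holds only wide or only tall images.

    sizes: list of (width, height) tuples (hypothetical input format).
    """
    rng = random.Random(seed)
    wide = [i for i, (w, h) in enumerate(sizes) if w >= h]
    tall = [i for i, (w, h) in enumerate(sizes) if w < h]
    batches = []
    for group in (wide, tall):
        rng.shuffle(group)  # shuffle inside each aspect group
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])
    rng.shuffle(batches)  # shuffle the order of the batches themselves
    return batches


def crop_aspect(width, height, lo=0.5, hi=2.0):
    """Aspect cropping: trim the long side so that lo <= width / height <= hi."""
    ratio = width / height
    if ratio > hi:
        width = int(height * hi)
    elif ratio < lo:
        height = int(width / lo)
    return width, height
```

Because every batch contains images of similar orientation, less padding is needed when the images are stacked into one blob. With the bounds above, `crop_aspect(1000, 200)` returns `(400, 200)`.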

News

  • (2018/05/25) Support ResNeXt backbones.
  • (2018/05/22) Add group normalization baselines.
  • (2018/05/15) PyTorch 0.4 is supported now!
  • (2019/08/28) Support PASCAL VOC and Custom Dataset
  • (2019/01/17) PyTorch 1.0 is supported now!
  • (2019/05/30) Code rebased on TorchVision 0.3. Compilation is now optional!

Getting Started

Clone the repo:

git clone https://github.com/roytseng-tw/mask-rcnn.pytorch.git

Requirements

Tested under python3.

  • python packages
    • pytorch>=1.0.0
    • torchvision>=0.3.0
    • cython>=0.29.2
    • matplotlib
    • numpy
    • scipy
    • opencv
    • pyyaml
    • packaging
    • pycocotools — for COCO dataset, also available from pip.
    • tensorboardX — for logging the losses in Tensorboard
  • An NVIDIA GPU and CUDA 8.0 or higher. Some operations only have a GPU implementation.
  • NOTICE: different versions of the PyTorch package have different memory usage.

Compilation [Optional]

Compile the CUDA code:

cd lib  # please change to this directory
sh make.sh

It will compile all the modules you need, including NMS. (Actually, GPU NMS is never used ...)

Note that if you use CUDA_VISIBLE_DEVICES to select GPUs, make sure at least one GPU is visible when compiling the code.

Data Preparation

Create a data folder under the repo,

cd {repo_root}
mkdir data
  • COCO: Download the COCO images and annotations from the COCO website.

    Make sure to arrange the files in the following structure:

    coco
    ├── annotations
    │   ├── instances_minival2014.json
    │   ├── instances_train2014.json
    │   ├── instances_train2017.json
    │   ├── instances_val2014.json
    │   ├── instances_val2017.json
    │   ├── instances_valminusminival2014.json
    │   ├── ...
    │
    └── images
        ├── train2014
        ├── train2017
        ├── val2014
        ├── val2017
        ├── ...
    

    Download coco mini annotations from here. Please note that minival is exactly equivalent to the recently defined 2017 val set. Similarly, the union of valminusminival and the 2014 train is exactly equivalent to the 2017 train set.

    Feel free to put the dataset at any place you want, and then soft link the dataset under the data/ folder:

    ln -s path/to/coco data/coco
    
  • PASCAL VOC 2007 + 12: Please follow the instructions in py-faster-rcnn to prepare the VOC datasets (any other guide works as well). After downloading the data, create softlinks in the data/VOC<year> folder as follows,

    VOCdevkitPATH=/path/to/voc_devkit
    mkdir -p $DETECTRON.PYTORCH/data/VOC<year>
    ln -s /${VOCdevkitPATH}/VOC<year>/JPEGImages $DETECTRON.PYTORCH/data/VOC<year>/JPEGImages
    ln -s /${VOCdevkitPATH}/VOC<year>/json_annotations $DETECTRON.PYTORCH/data/VOC<year>/annotations
    ln -s /${VOCdevkitPATH} $DETECTRON.PYTORCH/data/VOC<year>/VOCdevkit<year>
    

    The directory structure of JPEGImages and annotations should be as follows,

    VOC<year>
    ├── annotations
    │   ├── train.json
    │   ├── trainval.json
    │   ├── test.json
    │   ├── ...
    │
    └── JPEGImages
        ├── <im-1-name>.jpg
        ├── ...
        ├── <im-N-name>.jpg
    

    NOTE: The annotations folder requires you to have PASCAL VOC annotations in COCO json format, which are available for download here. You can also convert the XML annotation files to JSON by running the following script,

    python tools/pascal_voc_xml2coco_json_converter.py $VOCdevkitPATH $year
    

    (In order to successfully run the script above, you need to update the full paths to the respective folders in the script.)

  • Custom Dataset: Similar to the above, create a directory named CustomDataset in the data folder and add symlinks to the annotations directory and JPEGImages as shown for the PASCAL VOC dataset. You also need to link the custom dataset devkit to CustomDataDevkit.

Putting the images on an SSD is recommended for potentially better training performance.

Pretrained Model

I use ImageNet pretrained weights from Caffe for the backbone networks.

Download them and put them into the {repo_root}/data/pretrained_model.

You can use the following command to download them all:

  • extra required packages: argparse_color_formater, colorama, requests
python tools/download_imagenet_weights.py

NOTE: Caffe pretrained weights have slightly better performance than PyTorch pretrained weights. It is suggested to use the Caffe pretrained models from the above link to reproduce the results. By the way, Detectron also uses pretrained weights from Caffe.

If you want to use PyTorch pretrained models, please remember to convert images from BGR to RGB, and also use the same data preprocessing (mean subtraction and normalization) as used for the PyTorch pretrained models.
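As a rough illustration of that preprocessing, here is a minimal NumPy sketch. The function name is hypothetical, and the mean and standard deviation are the ImageNet statistics documented for torchvision's pretrained models; check them against the model you actually load.

```python
import numpy as np

# ImageNet statistics used by torchvision's pretrained models
# (RGB channel order, pixel values scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_for_pytorch_weights(image_bgr):
    """Convert an HxWx3 uint8 BGR image (as loaded by OpenCV) to a 3xHxW float32 RGB array."""
    image_rgb = image_bgr[:, :, ::-1]                     # reverse channel axis: BGR -> RGB
    scaled = image_rgb.astype(np.float32) / 255.0         # scale to [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalization
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))  # HWC -> CHW
```

The Caffe pretrained weights expect a different convention (BGR input with mean subtraction only), which is why the two sets of weights cannot share one preprocessing path.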

ImageNet Pretrained Model provided by Detectron

Besides using the pretrained weights for ResNet above, you can also use the weights from Detectron by changing the corresponding line in the model config file as follows:

RESNETS:
  IMAGENET_PRETRAINED_WEIGHTS: 'data/pretrained_model/R-50.pkl'

R-50-GN.pkl and R-101-GN.pkl are required for gn_baselines.

X-101-32x8d.pkl, X-101-64x4d.pkl and X-152-32x8d-IN5k.pkl are required for ResNeXt backbones.

Training

DO NOT CHANGE anything in the provided config files (configs/**/xxxx.yml) unless you know what you are doing.

Use the environment variable CUDA_VISIBLE_DEVICES to control which GPUs to use.

Adaptive config adjustment

Let's define some terms first:

       batch_size: NUM_GPUS x TRAIN.IMS_PER_BATCH
       effective_batch_size: batch_size x iter_size
       change of something: new value of something / old value of something

The following config options will be adjusted automatically according to the actual training setup: 1) number of GPUs NUM_GPUS, 2) batch size per GPU TRAIN.IMS_PER_BATCH, 3) update period iter_size

  • SOLVER.BASE_LR: adjusted in direct proportion to the change of batch_size.
  • SOLVER.STEPS, SOLVER.MAX_ITER: adjusted in inverse proportion to the change of effective_batch_size.
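The scaling rules above can be worked through with a small sketch. The helper name and the numbers in the usage note are illustrative only, and the sketch assumes the original config was written for iter_size = 1; the repository's own adjustment code may round or clamp differently.

```python
def adjust_solver(base_lr, steps, max_iter, old_batch_size, new_batch_size, iter_size=1):
    """Scale solver settings for a new training setup (illustrative helper)."""
    # "change of batch_size": new value / old value
    batch_change = new_batch_size / old_batch_size
    # "change of effective_batch_size", assuming the old setup used iter_size = 1
    effective_change = (new_batch_size * iter_size) / old_batch_size

    new_lr = base_lr * batch_change                                 # directly proportional
    new_steps = [int(round(s / effective_change)) for s in steps]   # inversely proportional
    new_max_iter = int(round(max_iter / effective_change))          # inversely proportional
    return new_lr, new_steps, new_max_iter
```

For example, going from a batch size of 16 to a batch size of 4 with iter_size 2, `adjust_solver(0.02, [0, 60000, 80000], 90000, 16, 4, iter_size=2)` quarters the learning rate to 0.005 and doubles the schedule to steps `[0, 120000, 160000]` and 180000 iterations.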

Train from scratch

Take mask-rcnn with res50 backbone for example.

python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-50-C4.yml --use_tfboard --bs {batch_size} --nw {num_workers}

Use --bs to override the default batch size with a value that fits into your GPUs. Similarly for --nw; the number of data loader threads defaults to 4 in config.py.

Specify --use_tfboard to log the losses on Tensorboard.

NOTE:

  • use --dataset keypoints_coco2017 when training for keypoint-rcnn.
  • use --dataset voc2007 when training for PASCAL VOC 2007.
  • use --dataset voc2012 when training for PASCAL VOC 2012.
  • use --dataset custom_dataset --num_classes $NUM_CLASSES when training for your custom dataset. Here, $NUM_CLASSES is the number of object classes + 1 (for background class) present in your custom dataset.

The use of --iter_size

As in Caffe, the network is updated once (optimizer.step()) every iter_size iterations (forward + backward). This gives a larger effective batch size for training. Note that the step count only increases after a network update.

python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-50-C4.yml --bs 4 --iter_size 4

iter_size defaults to 1.
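The idea behind iter_size is gradient accumulation. The following toy sketch fits y = w * x with plain Python so it runs without a GPU; the function name is hypothetical, and averaging the accumulated gradient over iter_size is an assumption about the normalization, not a statement about this repository's training loop.

```python
def train(samples, lr, batch_size, iter_size):
    """Fit y = w * x by SGD, applying one update every `iter_size` mini-batches."""
    w, step = 0.0, 0
    accumulated_grad = 0.0
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    for iteration, batch in enumerate(batches, start=1):
        # Gradient of the mean squared error over this mini-batch (forward + backward).
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        accumulated_grad += grad / iter_size  # average over the accumulation window
        if iteration % iter_size == 0:
            w -= lr * accumulated_grad        # plays the role of optimizer.step()
            accumulated_grad = 0.0
            step += 1                         # step count advances only on an update
    return w, step
```

With four samples, `train(samples, 0.01, batch_size=1, iter_size=4)` and `train(samples, 0.01, batch_size=4, iter_size=1)` both perform exactly one update with the same averaged gradient, which is the sense in which --bs 4 --iter_size 4 behaves like an effective batch size of 16.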

Finetune from a pretrained checkpoint

python tools/train_net_step.py ... --load_ckpt {path/to/the/checkpoint}

or using Detectron's checkpoint file

python tools/train_net_step.py ... --load_detectron {path/to/the/checkpoint}

Resume training with the same dataset and batch size

python tools/train_net_step.py ... --load_ckpt {path/to/the/checkpoint} --resume

When resuming training, the step count and optimizer state are also restored from the checkpoint. For the SGD optimizer, the optimizer state contains the momentum for each trainable parameter.

NOTE: --resume is not yet supported for --load_detectron

Set config options in command line

  python tools/train_net_step.py ... --no_save --set {config.name1} {value1} {config.name2} {value2} ...
  • For example, run for debugging:
    python tools/train_net_step.py ... --no_save --set DEBUG True
    
    This loads fewer annotations to accelerate training. Add --no_save to avoid saving any checkpoints or logs.

Show command line help messages

python tools/train_net_step.py --help

Two Training Scripts

In short, use train_net_step.py.

In train_net_step.py:

(Deprecated) In train_net.py, some config options have no effect and are worth noting:

  • SOLVER.LR_POLICY, SOLVER.MAX_ITER, SOLVER.STEPS, SOLVER.LRS: For now, the training policy is controlled by these command line arguments:

    • --epochs: How many epochs to train. One epoch means one pass through the whole training set. Defaults to 6.
    • --lr_decay_epochs: Epochs at which to decay the learning rate. Decay happens at the beginning of an epoch. Epochs are 0-indexed. Defaults to [4, 5].

    For more command line arguments, please refer to python train_net.py --help

  • SOLVER.WARM_UP_ITERS, SOLVER.WARM_UP_FACTOR, SOLVER.WARM_UP_METHOD: Training warm up is not supported.
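To make the epoch-based schedule concrete, here is a small sketch of how --epochs and --lr_decay_epochs could interact. The helper name and the decay factor gamma = 0.1 are assumptions for illustration; the actual factor used by train_net.py is not stated here.

```python
def lr_for_epoch(base_lr, epoch, lr_decay_epochs=(4, 5), gamma=0.1):
    """Learning rate in effect during a 0-indexed epoch.

    The rate is multiplied by `gamma` (an assumed decay factor) at the
    beginning of every epoch listed in `lr_decay_epochs`.
    """
    num_decays = sum(1 for decay_epoch in lr_decay_epochs if epoch >= decay_epoch)
    return base_lr * gamma ** num_decays


# With the defaults (6 epochs, decay at epochs 4 and 5), epochs 0 to 3 use the
# base rate, epoch 4 uses base_lr * gamma, and epoch 5 uses base_lr * gamma ** 2.
schedule = [lr_for_epoch(0.01, epoch) for epoch in range(6)]
```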

Inference

Evaluate the training results

For example, test mask-rcnn on coco2017 val set

python tools/test_net.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml --load_ckpt {path/to/your/checkpoint}

Use --load_detectron to load Detectron's checkpoint. If multiple gpus are available, add --multi-gpu-testing.

To specify a different output directory, use --output_dir {...}. Defaults to {the/parent/dir/of/checkpoint}/test

Visualize the training results on images

python tools/infer_simple.py --dataset coco --cfg configs/baselines/e2e_mask_rcnn_R-50-C4.yml --load_ckpt {path/to/your/checkpoint} --image_dir {dir/of/input/images}  --output_dir {dir/to/save/visualizations}

--output_dir defaults to infer_outputs.

Supported Network modules

  • Backbone:

    • ResNet: ResNet50_conv4_body, ResNet50_conv5_body, ResNet101_Conv4_Body, ResNet101_Conv5_Body, ResNet152_Conv5_Body
    • ResNeXt: [fpn_]ResNet101_Conv4_Body, [fpn_]ResNet101_Conv5_Body, [fpn_]ResNet152_Conv5_Body
    • FPN: fpn_ResNet50_conv5_body, fpn_ResNet50_conv5_P2only_body, fpn_ResNet101_conv5_body, fpn_ResNet101_conv5_P2only_body, fpn_ResNet152_conv5_body, fpn_ResNet152_conv5_P2only_body
  • Box head: ResNet_roi_conv5_head, roi_2mlp_head, roi_Xconv1fc_head, roi_Xconv1fc_gn_head

  • Mask head: mask_rcnn_fcn_head_v0upshare, mask_rcnn_fcn_head_v0up, mask_rcnn_fcn_head_v1up, mask_rcnn_fcn_head_v1up4convs, mask_rcnn_fcn_head_v1up4convs_gn

  • Keypoints head: roi_pose_head_v1convX

NOTE: the naming is similar to the one used in Detectron. Just remove any leading add_.

Supported Datasets

Only COCO is supported for now. However, the whole dataset library implementation is almost identical to Detectron's, so it should be easy to add more datasets supported by Detectron.

Configuration Options

Architecture-specific configuration files are put under configs. The general configuration file lib/core/config.py has almost all the options, with the same default values as in Detectron's, so it's effortless to convert the architecture-specific configs from Detectron.

Some options from Detectron are not used because the corresponding functionality is not implemented yet. For example, test-time data augmentation.

Extra options

  • MODEL.LOAD_IMAGENET_PRETRAINED_WEIGHTS = True: Whether to load ImageNet pretrained weights.
    • RESNETS.IMAGENET_PRETRAINED_WEIGHTS = '': Path to pretrained residual network weights. If it starts with '/', it is treated as an absolute path. Otherwise, it is treated as a path relative to ROOT_DIR.
  • TRAIN.ASPECT_CROPPING = False, TRAIN.ASPECT_HI = 2, TRAIN.ASPECT_LO = 0.5: Options for aspect cropping to restrict image aspect ratio range.
  • RPN.OUT_DIM_AS_IN_DIM = True, RPN.OUT_DIM = 512, RPN.CLS_ACTIVATION = 'sigmoid': The official implementation of RPN has the same number of input and output feature channels and uses sigmoid as the activation function for fg/bg class prediction. jwyang's implementation fixes the output channel number to 512 and uses softmax as the activation function.
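The path rule for RESNETS.IMAGENET_PRETRAINED_WEIGHTS can be sketched as follows. The function name is hypothetical and the sketch only mirrors the rule as described above; it is not copied from lib/core/config.py.

```python
import os


def resolve_weights_path(weights_path, root_dir):
    """Resolve a pretrained-weights path according to the rule described above."""
    if weights_path.startswith('/'):
        return weights_path                      # absolute path: use as given
    return os.path.join(root_dir, weights_path)  # otherwise relative to ROOT_DIR
```

So a config value of 'data/pretrained_model/R-50.pkl' is looked up under the repository root, while a value starting with '/' is used unchanged.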

How to transform configuration files from Detectron

  1. Remove MODEL.NUM_CLASSES. It will be set according to the dataset specified by --dataset.
  2. Remove TRAIN.WEIGHTS, TRAIN.DATASETS and TEST.DATASETS
  3. For module type options (e.g. MODEL.CONV_BODY, FAST_RCNN.ROI_BOX_HEAD ...), remove add_ from the string if it exists.
  4. If you want to load ImageNet pretrained weights for the model, add RESNETS.IMAGENET_PRETRAINED_WEIGHTS pointing to the pretrained weight file. If not, set MODEL.LOAD_IMAGENET_PRETRAINED_WEIGHTS to False.
  5. [Optional] Delete OUTPUT_DIR: . at the last line
  6. Do NOT change the option NUM_GPUS in the config file. It's used to infer the original batch size for training, and the learning rate will be linearly scaled according to the batch size change. Proper learning rate adjustment is important for training with different batch sizes.
  7. For group normalization baselines, add RESNETS.USE_GN: True.
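The conversion steps above can be sketched as a function over an already-parsed config dictionary. This is an illustrative helper, not a tool shipped with the repository: the function name is hypothetical, only the two module-type keys named in step 3 are handled by default (pass more through module_keys), and step 7 is left to the caller.

```python
import copy


def convert_detectron_config(cfg, weights_path=None,
                             module_keys=(('MODEL', 'CONV_BODY'),
                                          ('FAST_RCNN', 'ROI_BOX_HEAD'))):
    """Return a converted copy of a parsed Detectron config (nested dicts)."""
    cfg = copy.deepcopy(cfg)                       # leave the caller's dict untouched
    cfg.get('MODEL', {}).pop('NUM_CLASSES', None)  # step 1
    cfg.get('TRAIN', {}).pop('WEIGHTS', None)      # step 2
    cfg.get('TRAIN', {}).pop('DATASETS', None)
    cfg.get('TEST', {}).pop('DATASETS', None)
    for section, key in module_keys:               # step 3: drop the add_ prefix
        value = cfg.get(section, {}).get(key)
        if isinstance(value, str):
            prefix, dot, name = value.rpartition('.')
            if name.startswith('add_'):
                cfg[section][key] = prefix + dot + name[len('add_'):]
    if weights_path:                               # step 4
        cfg.setdefault('RESNETS', {})['IMAGENET_PRETRAINED_WEIGHTS'] = weights_path
    else:
        cfg.setdefault('MODEL', {})['LOAD_IMAGENET_PRETRAINED_WEIGHTS'] = False
    cfg.pop('OUTPUT_DIR', None)                    # step 5 (optional)
    return cfg                                     # NUM_GPUS is kept as-is (step 6)
```

A Detectron value such as 'FPN.add_fpn_ResNet50_conv5_body' becomes 'FPN.fpn_ResNet50_conv5_body', matching the naming listed under Supported Network modules.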

My nn.DataParallel

  • Keep certain keyword inputs on CPU: The official DataParallel broadcasts all input Variables to the GPUs. However, many RPN-related computations are done on the CPU, so it is unnecessary to put those inputs on the GPUs.
  • Allow a different blob size for each GPU: To save GPU memory, images are padded separately for each GPU.
  • Work with return values of dictionary type
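Why padding separately per GPU saves memory can be seen from a small counting sketch. The helper names and image shapes are illustrative, and the sketch assumes the batch divides evenly across GPUs; it only counts padded pixels and does not reproduce the customized nn.DataParallel itself.

```python
def padded_blob_shape(image_shapes):
    """Smallest (height, width) that fits every image in the list."""
    return (max(h for h, w in image_shapes), max(w for h, w in image_shapes))


def padded_pixels(image_shapes, num_gpus, per_gpu_padding):
    """Total pixels allocated when the batch is split evenly across `num_gpus`."""
    per_gpu = len(image_shapes) // num_gpus
    chunks = [image_shapes[i * per_gpu:(i + 1) * per_gpu] for i in range(num_gpus)]
    total = 0
    for chunk in chunks:
        # Pad either to this GPU's largest image or to the largest image overall.
        height, width = padded_blob_shape(chunk if per_gpu_padding else image_shapes)
        total += height * width * len(chunk)
    return total


# Two wide images on GPU 0 and two tall images on GPU 1 (height, width).
shapes = [(800, 1200), (800, 1066), (1200, 800), (1333, 800)]
shared = padded_pixels(shapes, num_gpus=2, per_gpu_padding=False)
separate = padded_pixels(shapes, num_gpus=2, per_gpu_padding=True)
```

In this example a single shared blob size needs 6,398,400 pixels per channel, while per-GPU padding needs 4,052,800. The saving is largest when aspect grouping places similarly shaped images on the same GPU.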

Benchmark

BENCHMARK.md

detectron.pytorch's People

Contributors

adityaarun1, jiasenlu, jwyang, roytseng-tw, vfdev-5, yuliang-zou

detectron.pytorch's Issues

torch 1.0.1 not matched torchvision 0.3.0

I installed torchvision 0.3.0 first, and it told me:

ERROR: torchvision 0.3.0 has requirement torch>=1.1.0, but you'll have torch 1.0.1 which is incompatible.

when I pip install torch-1.0.1-cp35-cp35m-manylinux1_x86_64.whl. So what are the matched versions?

Evaluation scripts for Custom Dataset & VOC

Hey!

I have finetuned a network on VOC 2007, which was initially trained on COCO Dataset.

I am now trying to evaluate the training on the validation set. When I run the command,
python tools/test_net.py --dataset voc2007 --cfg configs/baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml --load_ckpt {path/to/your/checkpoint}

I get this:
INFO task_evaluation.py: 61: Evaluating bounding boxes is done!

INFO task_evaluation.py: 104: Evaluating segmentations
Traceback (most recent call last):

File "tools/test_net.py", line 125, in <module>
check_expected_results=True)

File "/home/deep/data/asif/Detectron/Detectron.pytorch/lib/core/test_engine.py", line 128, in run_inference
all_results = result_getter()

File "/home/deep/data/asif/Detectron/Detectron.pytorch/lib/core/test_engine.py", line 108, in result_getter
multi_gpu=multi_gpu_testing

File "/home/deep/data/asif/Detectron/Detectron.pytorch/lib/core/test_engine.py", line 163, in test_net_on_dataset
dataset, all_boxes, all_segms, all_keyps, output_dir

File "/home/deep/data/asif/Detectron/Detectron.pytorch/lib/datasets/task_evaluation.py", line 63, in evaluate_all
results = evaluate_masks(dataset, all_boxes, all_segms, output_dir)

File "/home/deep/data/asif/Detectron/Detectron.pytorch/lib/datasets/task_evaluation.py", line 128, in evaluate_masks
'No evaluator for dataset: {}'.format(dataset.name)

NotImplementedError: No evaluator for dataset: voc_2007_test

This comes from the task_evaluation file, which has no implementation of VOC evaluation for masks. Is there a workaround?

Since VOC is already in JSON format, I guessed it should still be possible to evaluate segmentations in the task_evaluation script. How can I do this?

Freeze All layers except Output layers

Hi,

I am finetuning a COCO pretrained model with VOC dataset. I am using the config file : e2e_mask_rcnn_R-50-FPN_1x.yaml

I found a key to freeze the backbone of the model: TRAIN.FREEZE_CONV_BODY: True
But this only freezes the convolutional layers. How do I freeze the Fast RCNN and MRCNN layers so that fine-tuning updates the output layers alone?

ImportError using Pytorch 1.1.0

I encountered this error during a dataset conversion to COCO json format

----> 6 from lib.utils.boxes import xyxy_to_xywh
.
.
.
/content/Detectron.pytorch/lib/nn/parallel/scatter_gather.py in <module>()
      6 from ._functions import Scatter, Gather
      7 from torch._six import string_classes, int_classes
----> 8 from torch.utils.data.dataloader import numpy_type_map
      9 
     10 

ImportError: cannot import name 'numpy_type_map'

It seems that numpy_type_map has been moved to torch/utils/data/_utils/collate.py in pytorch 1.1.0

Maybe numpy_type_map should be redefined entirely instead of being imported.

EDIT: Same error encountered when executing

python tools/download_imagenet_weights.py

COCO Performance

Hi, thanks for the great work! I wonder if you have replicated the performance on COCO under the new version.

Finetuning with VOC2007

Hi!

I trained a network completely from scratch on the COCO2017 dataset with:

python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml --use_tfboard --bs=2

I then used the checkpoint (pth file) created by that training run to finetune on the VOC2007 dataset.

As expected, I ran into trouble because the numbers of classes in VOC (21) and COCO (81) are different. I understand it's possible to finetune, as there are steps given for finetuning on a custom dataset with a different number of classes. I would like to know how to do this.

The command I used:
python tools/train_net_step.py --dataset voc2007 --cfg configs/baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml --load_ckpt=/home/deep/data/asif/Detectron/Detectron.pytorch/Outputs/e2e_mask_rcnn_R-50-FPN_1x/Mar14-14-59-32_deeppc_step/ckpt/model_step719999.pth --use_tfboard --bs=2

The errors I got:
Traceback (most recent call last):
File "tools/train_net_step.py", line 471, in <module>
main()
File "tools/train_net_step.py", line 331, in main
net_utils.load_ckpt(maskRCNN, checkpoint['model'])
File "/home/deep/data/asif/Detectron/Detectron.pytorch/lib/utils/net.py", line 163, in load_ckpt
model.load_state_dict(state_dict, strict=False)
File "/home/deep/anaconda3/envs/detectron/lib/python3.7/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generalized_RCNN:
size mismatch for Box_Outs.cls_score.weight: copying a param with shape torch.Size([81, 1024]) from checkpoint, the shape in current model is torch.Size([21, 1024]).
size mismatch for Box_Outs.cls_score.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([21]).
size mismatch for Box_Outs.bbox_pred.weight: copying a param with shape torch.Size([324, 1024]) from checkpoint, the shape in current model is torch.Size([84, 1024]).
size mismatch for Box_Outs.bbox_pred.bias: copying a param with shape torch.Size([324]) from checkpoint, the shape in current model is torch.Size([84]).
size mismatch for Mask_Outs.classify.weight: copying a param with shape torch.Size([81, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([21, 256, 1, 1]).
size mismatch for Mask_Outs.classify.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([21]).

Infer test images on a custom dataset

Hi!
I am trying to use the VOC dataset to infer test images. It works fine with the COCO dataset! I am exploring the code and its abilities, as I ultimately want to use a custom dataset and classes. So I was trying to visualize the training performance on VOC2007 with infer_simple.py. But that's not possible, as infer_simple.py only accepts coco or keypoints coco as the dataset. Am I doing something wrong here?
