
retinamask's Introduction

RetinaMask

The code is based on the maskrcnn-benchmark.


Citing RetinaMask

Please cite RetinaMask in your publications if it helps your research:

@inproceedings{fu2019retinamask,
  title = {{RetinaMask}: Learning to predict masks improves state-of-the-art single-shot detection for free},
  author = {Fu, Cheng-Yang and  Shvets, Mykhailo and Berg, Alexander C.},
  booktitle = {arXiv preprint arXiv:1901.03353},
  year = {2019}
}

Contents

  1. Installation
  2. Models

Installation

Follow the maskrcnn-benchmark instructions to install the code and set up the dataset. Use the config files in ./configs/retina/ for training and testing.

Models

Model                     BBox AP (AP/AP50/AP75/APs/APm/APl)   BBox time   Mask AP (AP/AP50/AP75/APs/APm/APl)   Mask time   Link
ResNet-50-FPN             39.4/58.6/42.3/21.9/42.0/51.0        0.124       34.9/55.7/37.1/15.1/36.7/50.4        0.139       link
ResNet-101-FPN            41.4/60.8/44.6/23.0/44.5/53.5        0.145       36.6/58.0/39.1/16.2/38.8/52.7        0.160       link
ResNet-101-FPN-GN         41.7/61.7/45.0/23.5/44.7/52.8        0.153       36.7/58.8/39.3/16.4/39.4/52.6        0.164       link
ResNeXt32x8d-101-FPN-GN   42.6/62.5/46.0/24.8/45.6/53.8        0.231       37.4/59.8/40.0/17.6/39.9/53.4        0.270       link

P.S. The evaluation metrics are AP, AP50, AP75, AP(small), AP(medium), and AP(large); please refer to COCO for a detailed explanation. Inference times are in seconds, measured on an Nvidia 1080 Ti.

Run Inference

Use the following scripts (this assumes the models have been downloaded to the ./models directory).

Run Mask and BBox

python tools/test_net.py --config-file ./configs/retina/retinanet_mask_R-50-FPN_2x_adjust_std011_ms.yaml MODEL.WEIGHT ./models/retinanet_mask_R-50-FPN_2x_adjust_std011_ms_model.pth

Run BBox only

python tools/test_net.py --config-file ./configs/retina/retinanet_mask_R-50-FPN_2x_adjust_std011_ms.yaml MODEL.WEIGHT ./models/retinanet_mask_R-50-FPN_2x_adjust_std011_ms_model.pth MODEL.MASK_ON False
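The trailing KEY VALUE pairs override entries from the yaml config through yacs. A minimal sketch of the same mechanism in Python, assuming the repo follows maskrcnn-benchmark's config module:

from maskrcnn_benchmark.config import cfg

# Load the base config, then apply command-line-style overrides via yacs.
cfg.merge_from_file("./configs/retina/retinanet_mask_R-50-FPN_2x_adjust_std011_ms.yaml")
cfg.merge_from_list([
    "MODEL.WEIGHT", "./models/retinanet_mask_R-50-FPN_2x_adjust_std011_ms_model.pth",
    "MODEL.MASK_ON", False,
])
cfg.freeze()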


retinamask's Issues

About the speed issue

Hi, I tested maskrcnn-benchmark with a ResNet-50 Mask R-CNN,
and it ran at about 15 fps at best on a GTX 1080 Ti, which is about 66 ms per frame.

Why does the one-stage RetinaNet-based model take 120+ ms in your README?
It should theoretically be much faster.
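For per-frame numbers like these, GPU timing needs explicit synchronization; a minimal measurement sketch, where model and images are placeholders:

import time
import torch

def time_inference(model, images, warmup=10, iters=100):
    # Average per-forward time in seconds; model and images are placeholders.
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):       # warm up CUDA kernels and the allocator
            model(images)
        torch.cuda.synchronize()      # flush queued GPU work before timing
        start = time.time()
        for _ in range(iters):
            model(images)
        torch.cuda.synchronize()      # wait for the last forward to finish
    return (time.time() - start) / iters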

Could it work on cpu?

❓ Questions and Help

Because of the epidemic, I still can't go back to university to use a GPU. Could it work on a CPU?
It would be appreciated if you replied soon. Thanks!

AdjustSmoothL1Loss subtracts two quantities with different dimensions

❓ Questions and Help

I'm confused by AdjustSmoothL1Loss using running_mean - running_var in the paper. The running mean and running variance are quantities of different dimensions, so subtracting one from the other has no meaning.

E.g., if running_mean is measured in meters, then running_var is in m².
According to dimensional analysis:

Only commensurable quantities (physical quantities having the same dimension) may be compared, equated, added, or subtracted.

Maybe use the standard deviation instead?
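For reference, a minimal sketch of the variant this issue suggests: derive beta from the running mean minus the running standard deviation, which has the same dimension as the mean (names mirror the AdjustSmoothL1Loss code quoted further down this page):

import torch

def adjusted_beta(running_mean, running_var, beta_max=1. / 9, beta_min=1e-3):
    # The standard deviation has the same units as the mean, so this
    # subtraction is dimensionally consistent.
    running_std = running_var.clamp(min=0).sqrt()
    return (running_mean - running_std).clamp(max=beta_max, min=beta_min)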

Add speed test vs original maskrcnn-benchmark

Thanks for integrating RetinaNet into maskrcnn-benchmark. Are there any plans to post a speed evaluation of RetinaNet in the maskrcnn-benchmark architecture?

It would be great if it ran in real time (15 fps for both inference and visualization)!

Experiments on VOC

❓ Questions and Help

I converted the PASCAL VOC dataset into the COCO dataset format and trained a new model, but the AP is very low (almost zero). Do you have any suggestions on how to train RetinaMask with VOC?

Error while training RetinaMask on a COCO-format dataset with my own images

🐛 Bug

Error while starting the training using tools/train_net.py

I have my own dataset in COCO format on which I want to train RetinaMask. I am getting the following error when starting the training.

Traceback (most recent call last):
  File "tools/train_net.py", line 18, in <module>
    from maskrcnn_benchmark.engine.inference import inference
  File "/data/home/amjn_cowi/amardeep/retinamask/retinamask/maskrcnn_benchmark/engine/inference.py", line 20, in <module>
    from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
  File "/data/home/amjn_cowi/amardeep/retinamask/retinamask/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in <module>
    from maskrcnn_benchmark.layers import nms as _box_nms
  File "/data/home/amjn_cowi/amardeep/retinamask/retinamask/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
    from .nms import nms
  File "/data/home/amjn_cowi/amardeep/retinamask/retinamask/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: cannot import name '_C' from 'maskrcnn_benchmark' (/data/home/amjn_cowi/amardeep/retinamask/retinamask/maskrcnn_benchmark/__init__.py)

Can somebody guide me on how to fix it?

I used the following command to start the training:
python tools/train_net.py --config-file=configs/retina/retinanet_mask_R-50-FPN_1.5x.yaml

I followed the maskrcnn-benchmark INSTALL.md file to set up the environment.

Thanks in Advance

P6 is different from the paper

The original RetinaNet paper mentions that P6 is generated from C5, but in this implementation P6 is generated from P5. Did you run any experiments comparing these two implementations?
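For context, a minimal sketch of the two options being compared; the channel counts (2048 for C5 in ResNet-50/101, 256 for FPN levels) and the stride-2 3x3 conv follow the standard FPN extra-level construction, not code from this repo:

import torch
from torch import nn

c5 = torch.randn(1, 2048, 25, 25)  # last ResNet stage output (dummy tensor)
p5 = torch.randn(1, 256, 25, 25)   # corresponding FPN level (dummy tensor)

# RetinaNet paper: P6 from C5.
p6_from_c5 = nn.Conv2d(2048, 256, kernel_size=3, stride=2, padding=1)(c5)
# This implementation, per the issue: P6 from P5.
p6_from_p5 = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1)(p5)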

Issue about ImageNetPretrained weights

❓ Questions and Help

Hi, I can't get the ImageNet-pretrained weights when running the training script. The downloaded R-50.pkl just contains an S3 error response:

NoSuchBucket: The specified bucket does not exist (Bucket: detectron)

and loading it fails with: _pickle.UnpicklingError: invalid load key, '<'.
How can I solve this issue, or can you give me some other links?
@chengyangfu
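For what it's worth, a load key of '<' means the file begins with an angle bracket, i.e. the download saved an HTML/XML error page (like the NoSuchBucket response above) instead of a real pickle. A quick check, assuming the file sits in the working directory:

with open("R-50.pkl", "rb") as f:
    print(f.read(64))  # b'<?xml ...' or b'<html ...' means a failed download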

Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS

I followed the installation instructions and everything went fine until I tried to test the pre-trained model by executing:

python tools/test_net.py --config-file ./configs/retina/retinanet_mask_R-50-FPN_2x_adjust_std011_ms.yaml MODEL.WEIGHT ./models/retinanet_mask_R-50-FPN_2x_adjust_std011_ms_model.pth MODEL.MASK_ON False

This leads to the following error:

Traceback (most recent call last):
  File "tools/test_net.py", line 100, in <module>
    main()
  File "tools/test_net.py", line 55, in main
    cfg.merge_from_file(args.config_file)
  File "/home/flo/anaconda3/envs/uav-challenge/lib/python3.6/site-packages/yacs/config.py", line 213, in merge_from_file
    self.merge_from_other_cfg(cfg)
  File "/home/flo/anaconda3/envs/uav-challenge/lib/python3.6/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/flo/anaconda3/envs/uav-challenge/lib/python3.6/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/flo/anaconda3/envs/uav-challenge/lib/python3.6/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/flo/anaconda3/envs/uav-challenge/lib/python3.6/site-packages/yacs/config.py", line 473, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS'

Any ideas?
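For context, yacs rejects any key that is missing from the code's default config, which usually means the config file and the checked-out code come from different versions. A minimal reproduction of the mechanism, with illustrative keys:

from yacs.config import CfgNode as CN

defaults = CN()
defaults.MODEL = CN()
defaults.MODEL.BACKBONE = CN()
defaults.MODEL.BACKBONE.CONV_BODY = "R-50-FPN"

loaded = CN()
loaded.MODEL = CN()
loaded.MODEL.BACKBONE = CN()
loaded.MODEL.BACKBONE.OUT_CHANNELS = 256  # key unknown to the defaults

defaults.merge_from_other_cfg(loaded)  # raises KeyError: Non-existent config key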

RuntimeError: The size of tensor a (0) must match the size of tensor b (225603) at non-singleton dimension 0

🐛 Bug

Hi, thanks for sharing.
I'm training on a custom dataset using:
python tools/train_net.py --config-file ./configs/retina/retinanet_mask_R-50-FPN_2x_adjust_std011_ms.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.MAX_ITER 180000 SOLVER.STEPS "(90000, 120000)"

But after a few iterations, I get this error.

2021-01-19 12:42:44,720 maskrcnn_benchmark.trainer INFO: eta: 1 day, 5:42:19  iter: 902  loss: 1.8773 (2.2307)  loss_retina_cls: 0.4378 (0.6734)  loss_retina_reg: 1.0318 (1.1414)  loss_mask: 0.3184 (0.4159)  time: 0.3497 (0.5971)  data: 0.0093 (0.3164)  lr: 0.005000  max mem: 3196
2021-01-19 12:42:45,114 maskrcnn_benchmark.trainer INFO: eta: 1 day, 5:41:38  iter: 903  loss: 1.7931 (2.2301)  loss_retina_cls: 0.4374 (0.6731)  loss_retina_reg: 1.0279 (1.1412)  loss_mask: 0.3223 (0.4158)  time: 0.3535 (0.5969)  data: 0.0093 (0.3160)  lr: 0.005000  max mem: 3196
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/engine/trainer.py", line 65, in do_train
    loss_dict = model(images, targets)
  File "/home/eldad/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/detector/retinanet.py", line 61, in forward
    (anchors, detections), detector_losses = self.rpn(images, rpn_features, targets)
  File "/home/eldad/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 150, in forward
    return self._forward_train(anchors, box_cls, box_regression, targets)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 157, in _forward_train
    anchors, box_cls, box_regression, targets
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet_loss.py", line 108, in __call__
    labels, regression_targets = self.prepare_targets(anchors, targets)
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/rpn/retinanet_loss.py", line 87, in prepare_targets
    matched_targets.bbox, anchors_per_image.bbox
  File "/home/eldad/retinamask-master/maskrcnn_benchmark/modeling/box_coder.py", line 44, in encode
    targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
RuntimeError: The size of tensor a (0) must match the size of tensor b (225603) at non-singleton dimension 0

Using Mask R-CNN I get no errors, but with the retina config there is this size-mismatch error. Any idea why I get this error?

Thank you
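A size of 0 for tensor a suggests an image whose matched ground-truth box tensor is empty. A hypothetical check for annotation-free images in a COCO-format file (the path is a placeholder):

from pycocotools.coco import COCO

coco = COCO("path/to/annotations/instances_train.json")  # placeholder path
empty = [i for i in coco.getImgIds() if not coco.getAnnIds(imgIds=i)]
print(len(empty), "of", len(coco.getImgIds()), "images have no annotations")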

Detectron2 Support

Hello, quick question: will this repository ever be updated to Detectron2? I've looked at the changes over the original maskrcnn-benchmark repo, and it seems very similar to the new Detectron2 repo.

Issue with the build process

🐛 Bug

I am trying to build the code inside an NVIDIA Docker container.

gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -m64 -fPIC -m64 -fPIC -fPIC -DWITH_CUDA -I/home/jovyan/map-workspace/code/retinamask/maskrcnn_benchmark/csrc -I/opt/conda/lib/python3.6/site-packages/torch/lib/include -I/opt/conda/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.6m -c /home/jovyan/map-workspace/code/retinamask/maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.cpp -o build/temp.linux-x86_64-3.6/home/jovyan/map-workspace/code/retinamask/maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
/usr/local/cuda/bin/nvcc -DWITH_CUDA -I/home/jovyan/map-workspace/code/retinamask/maskrcnn_benchmark/csrc -I/opt/conda/lib/python3.6/site-packages/torch/lib/include -I/opt/conda/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.6m -c /home/jovyan/map-workspace/code/retinamask/maskrcnn_benchmark/csrc/cuda/SigmoidFocalLoss_cuda.cu -o build/temp.linux-x86_64-3.6/home/jovyan/map-workspace/code/retinamask/maskrcnn_benchmark/csrc/cuda/SigmoidFocalLoss_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
In file included from /home/jovyan/map-workspace/code/retinamask/maskrcnn_benchmark/csrc/cuda/SigmoidFocalLoss_cuda.cu:6:0:
/opt/conda/lib/python3.6/site-packages/torch/lib/include/ATen/cuda/CUDAContext.h:12:22: fatal error: cusparse.h: No such file or directory
compilation terminated.

To Reproduce

Steps to reproduce the behavior:

  1. python setup.py build develop

Expected behavior

Should compile the C++ code.

Environment

Please copy and paste the output from the PyTorch environment collection script.

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

Collecting environment information...
PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: Tesla V100-SXM2-16GB
Nvidia driver version: 390.46
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.2.1

Versions of relevant libraries:
[pip] numpy==1.13.3
[pip] torch==1.0.1.post2
[pip] torchsummary==1.5.1
[pip] torchvision==0.2.2.post3
[conda] torch 1.0.1.post2
[conda] torchsummary 1.5.1
[conda] torchvision 0.2.2.post3


How can I see the image with "box" and "mask" labels?

❓ Questions and Help

Sorry to bother you again. In fact, I haven't run it successfully yet because of my limited hardware, but I don't want to stop learning it. I read the code and found that the result of this project is some files ending with ".pth", isn't it? If I want to see the image with "box" and "mask" labels, should I write that part of the code myself? Thanks again!
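A note on the question: the .pth files are checkpoints (tensors), not rendered images, so visualization code has to build the model, load the weights, run inference, and draw the predictions itself. Inspecting a checkpoint, using the model path from the README above:

import torch

ckpt = torch.load(
    "./models/retinanet_mask_R-50-FPN_2x_adjust_std011_ms_model.pth",
    map_location="cpu",
)
# A checkpoint is typically a dict of tensors and training state, not images.
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:5])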

Clarification on the num_features parameter in the adjusted smooth L1 loss

@chengyangfu. The smooth L1 loss from Fast R-CNN is implemented in the code as:

import torch

def smooth_l1_loss(input, target, beta=1. / 9, size_average=True):
    n = torch.abs(input - target)
    cond = n < beta
    loss = torch.where(cond, 0.5 * n ** 2 / beta, n - 0.5 * beta)
    if size_average:
        return loss.mean()
    return loss.sum()
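A quick sanity check of the piecewise behavior using the definition above (the input values are illustrative): below beta the loss is quadratic, above it linear with slope 1:

import torch

x = torch.tensor([0.05, 0.5])
zero = torch.zeros(2)
# n = 0.05 < beta: 0.5 * n**2 / beta ~ 0.0113; n = 0.5 >= beta: n - beta/2 ~ 0.4444
print(smooth_l1_loss(x, zero, beta=1. / 9, size_average=False))  # ~0.4557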

In your work, you implemented the self-adjusting smooth L1 loss as:

import torch
from torch import nn
import logging


class AdjustSmoothL1Loss(nn.Module):

    def __init__(self, num_features, momentum=0.1, beta=1. / 9):
        super(AdjustSmoothL1Loss, self).__init__()
        self.num_features = num_features
        self.momentum = momentum
        self.beta = beta
        # Running statistics of the absolute regression error, one per feature.
        self.register_buffer(
            'running_mean', torch.empty(num_features).fill_(beta)
        )
        self.register_buffer('running_var', torch.zeros(num_features))
        self.logger = logging.getLogger("maskrcnn_benchmark.trainer")

    def forward(self, inputs, target, size_average=True):
        n = torch.abs(inputs - target)
        with torch.no_grad():
            # Update the running statistics, skipping batches whose variance
            # is NaN (e.g. when there is at most one sample).
            if torch.isnan(n.var(dim=0)).sum().item() == 0:
                self.running_mean = self.running_mean.to(n.device)
                self.running_mean *= (1 - self.momentum)
                self.running_mean += (self.momentum * n.mean(dim=0))
                self.running_var = self.running_var.to(n.device)
                self.running_var *= (1 - self.momentum)
                self.running_var += (self.momentum * n.var(dim=0))

        # Self-adjusted threshold between the quadratic and linear regimes.
        beta = (self.running_mean - self.running_var)
        beta = beta.clamp(max=self.beta, min=1e-3)

        beta = beta.view(-1, self.num_features).to(n.device)
        cond = n < beta.expand_as(n)
        loss = torch.where(cond, 0.5 * n ** 2 / beta, n - 0.5 * beta)
        if size_average:
            return loss.mean()
        return loss.sum()

I see that a num_features parameter needs to be passed to the class for the loss computation to succeed. Can you please help me understand what this num_features parameter is? What value should it take?
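For what it's worth, since this loss is applied to the box-regression offsets (dx, dy, dw, dh), one natural reading is num_features = 4, i.e. one running mean/variance per offset. A hypothetical usage sketch:

import torch

loss_fn = AdjustSmoothL1Loss(num_features=4)
pred = torch.randn(1000, 4)    # per-anchor regression predictions (dummy)
target = torch.randn(1000, 4)  # matched regression targets (dummy)
loss = loss_fn(pred, target)   # running stats are tracked per offset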

Multi-scale training

Thank you for making the code available online. Really solid work!

Where do you specify multi-scale training?

In the caption of Figure 1 of the paper (https://arxiv.org/pdf/1901.03353.pdf), it is mentioned that you do not use multi-scale training.

Looking at https://github.com/chengyangfu/retinamask/blob/master/maskrcnn_benchmark/data/transforms/build.py#L10.

It seems the multi-scale resizing option is always used, and it depends on the number of min sizes. If we were to specify multiple min sizes in the config files, we would get multi-scale training. Is that right?

Can you point me to a config file where you do that? Or do you not do that at all?

Also, how is the batch handled, given that the images can have an arbitrary second dimension, resulting in arbitrary feature-map sizes and a different number of predictions depending on the input image size?

Many thanks
Gurkirt
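As the question suggests, multi-scale training in maskrcnn-benchmark-style configs comes from giving INPUT.MIN_SIZE_TRAIN several values, one of which is sampled per batch. A hedged sketch; the particular scale list is illustrative, not this repo's setting:

from maskrcnn_benchmark.config import cfg

# Multiple min sizes -> a training scale is sampled from this tuple.
cfg.INPUT.MIN_SIZE_TRAIN = (640, 672, 704, 736, 768, 800)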

Why is the number of mask proposals (100 + GT) during training?

Hi, I am very glad you shared RetinaMask, and I have some questions:

  1. In your paper, you say the mask subnet is trained with (100 + GT) proposals. Why not GT only? I want to try that, so I need to confirm whether the (100 + GT) is set in the following lines (../maskrcnn_benchmark/modeling/detector/retinanet.py):

     proposals = []
     for (image_detections, image_targets) in zip(detections, targets):
         merge_list = []

         if not isinstance(image_detections, list):
             merge_list.append(image_detections.copy_with_fields('labels'))
         if not isinstance(image_targets, list):
             merge_list.append(image_targets.copy_with_fields('labels'))

         if len(merge_list) == 1:
             proposals.append(merge_list[0])
         else:
             proposals.append(cat_boxlist(merge_list))

     I mean, I want to train using the ground truth only and delete the detection net's proposals (see the sketch at the end of this issue).

  2. Is it a practical approach to ignore the mask net at first, train only the detection part, then freeze it and start training the mask subnet? Do you have any further experiments on this?

Looking forward to your reply. Thanks a lot.
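For the GT-only experiment described in question 1, a minimal sketch based on the loop quoted above, keeping only the targets branch:

# Sketch: feed the mask subnet ground-truth boxes only, dropping the
# detector's proposals from the merge.
proposals = []
for image_targets in targets:
    proposals.append(image_targets.copy_with_fields('labels'))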
