SOLQ: Segmenting Objects by Learning Queries

This repository is an official implementation of the NeurIPS 2021 paper SOLQ: Segmenting Objects by Learning Queries.

Introduction

TL;DR. SOLQ is an end-to-end instance segmentation framework based on the Transformer. It directly outputs instance masks without any box dependency.

Abstract. In this paper, we propose an end-to-end framework for instance segmentation. Based on the recently introduced DETR, our method, termed SOLQ, segments objects by learning unified queries. In SOLQ, each query represents one object and has multiple representations: class, location and mask. The learned object queries perform classification, box regression and mask encoding simultaneously in a unified vector form. During the training phase, the encoded mask vectors are supervised by the compression coding of raw spatial masks. At inference time, the produced mask vectors can be directly transformed to spatial masks by the inverse process of compression coding. Experimental results show that SOLQ can achieve state-of-the-art performance, surpassing most existing approaches. Moreover, the joint learning of the unified query representation can greatly improve the detection performance of the original DETR. We hope SOLQ can serve as a strong baseline for Transformer-based instance segmentation.

Updates

  • (12/10/2021) Release D-DETR+SQR log.txt in SQR.
  • (29/09/2021) Our SOLQ has been accepted by NeurIPS 2021.
  • (14/07/2021) Higher performance (Box AP=56.5, Mask AP=46.7) is reported by training with long side 1536 on Swin-L backbone, instead of long side 1333.

Main Results

Method  Backbone       Dataset   Box AP  Mask AP  Model
SOLQ    R50            test-dev  47.8    39.7     google
SOLQ    R101           test-dev  48.7    40.9     google
SOLQ    Swin-L         test-dev  55.4    45.9     google
SOLQ    Swin-L & 1536  test-dev  56.5    46.7     google

Installation

The codebase is built on top of Deformable DETR.

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n deformable_detr python=3.7 pip

    Then, activate the environment:

    conda activate deformable_detr
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here)

    For example, if your CUDA version is 9.2, you could install pytorch and torchvision as follows:

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt
  • Build MultiScaleDeformableAttention

    cd ./models/ops
    sh ./make.sh

Usage

Dataset preparation

Please download COCO and organize it as follows:

mkdir data && cd data
ln -s /path/to/coco coco

Training and Evaluation

Training on single node

Train SOLQ on 8 GPUs as follows:

sh configs/r50_solq_train.sh

Evaluation

You can download the pretrained model of SOLQ (the link is in the "Main Results" section), then run the following command to evaluate it on the COCO 2017 val dataset:

sh configs/r50_solq_eval.sh

Evaluation on COCO 2017 test-dev dataset

You can download the pretrained model of SOLQ (the link is in the "Main Results" section), then run the following command to evaluate it on the COCO 2017 test-dev dataset (submit to server):

sh configs/r50_solq_submit.sh

Visualization on COCO 2017 val dataset

You can visualize the results on images as follows:

EXP_DIR=/path/to/checkpoint
python visual.py \
       --meta_arch solq \
       --backbone resnet50 \
       --with_vector \
       --with_box_refine \
       --masks \
       --batch_size 2 \
       --vector_hidden_dim 1024 \
       --vector_loss_coef 3 \
       --output_dir ${EXP_DIR} \
       --resume ${EXP_DIR}/solq_r50_final.pth \
       --eval    

Citing SOLQ

If you find SOLQ useful in your research, please consider citing:

@article{dong2021solq,
  title={SOLQ: Segmenting Objects by Learning Queries},
  author={Dong, Bin and Zeng, Fangao and Wang, Tiancai and Zhang, Xiangyu and Wei, Yichen},
  journal={NeurIPS},
  year={2021}
}

solq's Issues

Semantics of with_vector

  1. Do I understand correctly that SOLQ(..., with_vector = False) is equivalent to vanilla DeformableDETR(...)?
  2. Why are postprocessors created only in args.eval?
    postprocessors = {'bbox': PostProcess(processor_dct=processor_dct if (args.with_vector and args.eval) else None)}
    The postprocessors should be applied at regular evaluation during training too, right? In fast_solq code only args.masks is checked:
    postprocessors = {'bbox': PostProcess(processor_dct=processor_dct if args.with_vector else None)}
    What's the motivation for this difference?

Thanks!

How to train with 2 or more machines?

Wonderful work! The paper was very helpful to read. Can this code be trained on 2 servers at the same time? If so, how should I modify the bash script?

Training Time

Cool work!!!!
Thanks for opensourcing the code.
I wonder how long training takes on one machine?

ImportError: cannot import name '_NewEmptyTensorOp' from 'torchvision.ops.misc'

Error

ImportError

Steps to reproduce the behavior:

  1. Git clone the repository of SOLQ
  2. Update the dataset you want to use.
  3. Update the data paths in the file SOLQ/datasets/coco.py
  4. Run the bash file configs/r50_solq_train.sh

Expected behavior

It should not show the error and should go on to run the SOLQ model.

Environment

PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0
Libc version: glibc-2.26

Python version: 3.7.11 (default, Jul 3 2021, 18:01:19) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.0.221
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0+cu102
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.10.0
[pip3] torchvision==0.10.0+cu102
[conda] Could not collect

Additional context

Running the script with !bash configs/r50_solq_train.sh shows the ImportError below:

Traceback (most recent call last):
  File "main.py", line 22, in <module>
    import datasets
  File "/content/SOLQ/datasets/__init__.py", line 13, in <module>
    from .coco import build as build_coco
  File "/content/SOLQ/datasets/coco.py", line 23, in <module>
    from util.misc import get_local_rank, get_local_size
  File "/content/SOLQ/util/misc.py", line 36, in <module>
    from torchvision.ops.misc import _NewEmptyTensorOp
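
One plausible cause, assuming util/misc.py picks its imports by comparing float(torchvision.__version__[:3]) against a threshold (as the upstream Deformable DETR code does): the first three characters of "0.10.0+cu102" are "0.1", so torchvision 0.10 is treated as a very old release and the legacy import is attempted. The sketch below only illustrates that parsing pitfall. The helper names naive_version and major_minor are hypothetical and are not part of SOLQ or torchvision.

```python
import re


def naive_version(version):
    """Parse a version the fragile way: first three characters as a float."""
    return float(version[:3])


def major_minor(version):
    """Parse 'MAJOR.MINOR...' into a tuple of ints, ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


reported = "0.10.0+cu102"  # torchvision version from the environment above

# The naive parse sees 0.1, which compares as older than 0.5 or 0.7.
print(naive_version(reported) < 0.5)

# The tuple parse sees (0, 10), which compares as newer than (0, 7).
print(major_minor(reported) >= (0, 7))
```

If the repository's check has this shape, comparing integer tuples instead of a truncated float would select the branch meant for recent torchvision releases.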

Some questions about potential further experiments.

Hi, I have been following DETR-related work.
This work is really interesting.
It could be a good supplementary answer to the issue facebookresearch/detr#163.

I still have some questions:

  1. I am surprised that UQR provides an improvement of more than 7 points on AP^seg, as shown in Table 2. May I ask for the detailed comparison of AP^seg_S, AP^seg_M and AP^seg_L? I wonder where the main improvement comes from.
  2. As seen in Table 1, SOLQ performs best on AP^box_L (much better than other methods). I am confused about why its performance is not the best on AP^seg_L. Could this be regarded as a drawback of the mask compression coding on large objects? I wonder whether SQR performs well on large objects while UQR performs better on small and medium objects. What do you think?
  3. If I understand correctly, this method can easily be applied to panoptic segmentation. May I ask for some panoptic segmentation results? With these results, we could better analyze the gap between SOLQ and DETR.

training time

“Thanks for your attention on SOLQ! It will take about 1.5 days and 2.0 days to train SOLQ with R50 and R101 backbones, respectively. As for the Swin-Large backbone, it will take nearly four days to train due to the large computation cost.”

Excuse me, did you use 2 GPUs or 8 GPUs for the 1.5 days of training? The README you provided uses 8 GPUs, so I am confused.

Difference between feature usage between DeformableDETR and DeformableDETRsegm

Could you please comment on difference between feature usage in DeformableDETR and DeformableDETRsegm?

The former uses all feature levels, while the latter uses only features[-1]...

Thank you!

https://github.com/megvii-research/SOLQ/blob/5471f58/models/deformable_detr.py#L136-L162 :

        features, pos = self.backbone(samples)

        srcs = []
        masks = []
        for l, feat in enumerate(features):
            src, mask = feat.decompose()
            srcs.append(self.input_proj[l](src))
            masks.append(mask)
            assert mask is not None
        if self.num_feature_levels > len(srcs):
            _len_srcs = len(srcs)
            for l in range(_len_srcs, self.num_feature_levels):
                if l == _len_srcs:
                    src = self.input_proj[l](features[-1].tensors)
                else:
                    src = self.input_proj[l](srcs[-1])
                m = samples.mask
                mask = F.interpolate(m[None].float(), size=src.shape[-2:]).to(torch.bool)[0]
                pos_l = self.backbone[1](NestedTensor(src, mask)).to(src.dtype)
                srcs.append(src)
                masks.append(mask)
                pos.append(pos_l)

        query_embeds = None
        if not self.two_stage:
            query_embeds = self.query_embed.weight
        hs, init_reference, inter_references, enc_outputs_class, enc_outputs_coord_unact, _, _ = self.transformer(srcs, masks, pos, query_embeds)

https://github.com/megvii-research/SOLQ/blob/5471f58/models/segmentation.py#L49-L55 :

        features, pos = self.detr.backbone(samples)

        bs = features[-1].tensors.shape[0]

        src, mask = features[-1].decompose()
        src_proj = self.detr.input_proj(src)
        hs, memory = self.detr.transformer(src_proj, mask, self.detr.query_embed.weight, pos[-1])

question about experimental results.

1

The dataset in the caption of Table1 is 'coco test-dev'.
The dataset in the caption of Table2 is 'coco val'.
But the results are the same for 'SOLQ(r50)' in Table1 and 'D-DETR∗+UQR(r50)' in Table2. Is this a mistake?

2

What is the setting of the experiments in Table3? Do the experiments use the 1x (12 epochs) training strategy?

how to train on custom dataset?

Hi,

Is it possible to train on custom datasets (classes not found in the COCO dataset)? Is there a tutorial for this? Thanks for the response.

KeyError: 'pred_vectors' An error occurred during the last training session. May I ask what happened

Epoch: [0] [39428/39429] eta: 0:00:00 lr: 0.000200 class_error: 60.00 grad_norm: 32.79 loss: 3.9304 (4.0446) loss_ce: 1.6838 (1.6791) loss_bbox: 0.8631 (1.0390) loss_giou: 1.3310 (1.3266) loss_ce_unscaled: 0.8419 (0.8395) class_error_unscaled: 81.2500 (71.9218) loss_bbox_unscaled: 0.1726 (0.2078) loss_giou_unscaled: 0.6655 (0.6633) cardinality_error_unscaled: 294.6667 (293.4123) time: 0.4156 data: 0.0000 max mem: 19834
Epoch: [0] Total time: 4:32:57 (0.4154 s / it)
Averaged stats: lr: 0.000200 class_error: 60.00 grad_norm: 32.79 loss: 3.9304 (4.0446) loss_ce: 1.6838 (1.6791) loss_bbox: 0.8631 (1.0390) loss_giou: 1.3310 (1.3266) loss_ce_unscaled: 0.8419 (0.8395) class_error_unscaled: 81.2500 (71.9218) loss_bbox_unscaled: 0.1726 (0.2078) loss_giou_unscaled: 0.6655 (0.6633) cardinality_error_unscaled: 294.6667 (293.4123)
Traceback (most recent call last):
  File "/root/autodl-tmp/solq/main.py", line 407, in
    main(0, ngpus_per_node, args)
  File "/root/autodl-tmp/solq/main.py", line 362, in main
    test_stats, coco_evaluator = evaluate(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/solq/engine.py", line 125, in evaluate
    results = postprocessors['bbox'](outputs, orig_target_sizes)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/solq/models/solq.py", line 479, in forward
    out_logits, out_bbox, out_vector = outputs['pred_logits'], outputs['pred_boxes'], outputs['pred_vectors']
KeyError: 'pred_vectors'
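
The logged losses above list loss_ce, loss_bbox and loss_giou but no mask-vector term, which suggests (as an assumption, not a confirmed diagnosis) that the model was built without vector outputs while the post-processor still indexes outputs['pred_vectors']. The sketch below shows a defensive way to read the outputs dictionary. split_outputs is a hypothetical helper, not part of the SOLQ code, and the placeholder values stand in for tensors.

```python
def split_outputs(outputs, expect_vectors):
    """Return (logits, boxes, vectors) from a model output dict.

    vectors is None when the model did not produce mask vectors.
    Raises a descriptive KeyError if vectors are required but absent.
    """
    logits = outputs["pred_logits"]
    boxes = outputs["pred_boxes"]
    vectors = outputs.get("pred_vectors")
    if expect_vectors and vectors is None:
        raise KeyError(
            "'pred_vectors' missing: the post-processor expects mask vectors, "
            "but the model outputs only " + ", ".join(sorted(outputs))
        )
    return logits, boxes, vectors


# Placeholder strings stand in for tensors in this illustration.
box_only = {"pred_logits": "logits", "pred_boxes": "boxes"}
print(split_outputs(box_only, expect_vectors=False))
```

Checking the key once, with a message that names the available outputs, makes a mismatch between training flags and evaluation flags easier to spot than a bare KeyError at the end of an epoch.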

batchsize=3 An error occurred during the last training session. May I ask what happened

Swin-L Train and Test Image Resizing

Hi,

You mentioned:

Higher performance (Box AP=56.5, Mask AP=46.7) is reported by training with long side 1536 on Swin-L backbone, instead of long side 1333.

May I know the image resizing strategies used during training and testing? I found some commented-out code for Swin-L, for training https://github.com/megvii-research/SOLQ/blob/main/datasets/coco.py#L135 and for testing https://github.com/megvii-research/SOLQ/blob/main/datasets/coco.py#L158. But for testing, the long side is 1333 instead of 1536. Could you please clarify this? Thank you very much!
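
To make the "long side" limit concrete, here is a minimal sketch of DETR-style aspect-preserving resizing: scale the image so its short side reaches a target, unless that would push the long side past a cap. The function name resized_size and the short-side target of 800 are illustrative assumptions, not values taken from the SOLQ configs.

```python
def resized_size(width, height, short_side, max_long_side):
    """Return (new_width, new_height) keeping the aspect ratio.

    The short side is scaled to short_side unless the long side would
    then exceed max_long_side, in which case the long side is capped.
    """
    scale = min(short_side / min(width, height),
                max_long_side / max(width, height))
    return round(width * scale), round(height * scale)


# A wide 1000x400 image hits the long-side cap before the short side
# reaches 800, so the cap decides the final resolution.
for cap in (1333, 1536):
    print(cap, resized_size(1000, 400, short_side=800, max_long_side=cap))
```

Under this scheme a larger cap only changes the result for images elongated enough to hit it, which is why using different caps for training and testing matters mostly for wide or tall images.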

Performance on DETR

Since SOLQ is built on D-DETR, have you run the experiment on the original DETR model? I would be very grateful if you could provide the results.

On your statement about including mask loss in bipartite matching

I have a question about the inclusion of mask loss in bipartite matching. In the last sentence on page 5 of the paper, you mention that "The participation of mask loss may affect the global matching between the object queries and ground-truths.". Could you please say whether this is an empirical finding? Basically, I want to know why you think this would be true.

Thanks a lot in advance!

cuda out of memory

I use 8x 3090 GPUs and set batch_size=2, but training terminated with a CUDA out-of-memory error. What can I do?

Swin-L pretrained checkpoints used

Hi @dbofseuofhust, @vaesl!

Can't find in the code the URLs to the ImageNet-pretrained Swin-L. Which checkpoints did you use? https://github.com/microsoft/Swin-Transformer provides many different ones.

Could you please publish a config for training using Swin-L?

Are your modifications to swin_transformer.py upstreamed anywhere?

I also wonder, have you tried other Swin backbones like Swin-S or Swin-B? ESViT repo publishes some self-sup trained Swin, but they are only for Swin-S/T/B: https://github.com/microsoft/esvit ...

Thank you!

DCT visualize

Hi there,
I'm trying to visualize the mask after idct, the code is:

save_dir = './vis'
gt_mask_len = 512
n_keep = 128
processor_dct = ProcessorDCT(n_keep=n_keep, gt_mask_len=gt_mask_len)
#mask = cv2.imread(img_path, 0).astype(np.float32)
mask = np.load('mask.npy')
new_mask = np.array((mask==1)).astype(np.float32)

new_mask = cv2.resize(new_mask, (gt_mask_len, gt_mask_len))
coeffs = cv2.dct(new_mask)
cv2.imwrite(os.path.join(save_dir, '{}_coeffs.png'.format(name.split('.')[0])), coeffs)

idct = np.zeros((gt_mask_len**2))
vectors = torch.from_numpy(coeffs).flatten()
vectors = vectors[torch.tensor(processor_dct.zigzag_table)]

idct[:n_keep] = vectors.cpu().numpy()
idct = processor_dct.inverse_zigzag(idct, gt_mask_len, gt_mask_len)
cv2.imwrite(os.path.join(save_dir, '{}_i_coeffs.png'.format(name.split('.')[0])), idct)
re_mask = cv2.idct(idct)
max_v = np.max(re_mask)
min_v = np.min(re_mask)
re_mask = np.where(re_mask>(max_v+min_v) / 2., 255, 0)
cv2.imwrite(os.path.join(save_dir, '{}_recover.png'.format(name.split('.')[0])), re_mask)


plt.figure(1)
plt.imshow(new_mask)
plt.figure(2)
plt.imshow(re_mask)

new_mask shows: (image)

re_mask shows: (image)

The re_mask looks different from the original mask.

I was wondering why this is and if there is anything I am doing wrong.
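
For comparison, here is a self-contained NumPy round trip of the general idea: take a 2D DCT of a binary mask, keep the first n_keep coefficients in a low-frequency-first order, write them back to the same positions, invert, and threshold. It is a sketch under stated assumptions and does not reproduce ProcessorDCT. The diagonal ordering is a simplified stand-in for a true zigzag scan, and all function names are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)
    return basis


def diagonal_order(n):
    """Row and column indices sorted by anti-diagonal (low frequency first)."""
    pairs = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1], rc[0]))
    rows, cols = zip(*pairs)
    return np.array(rows), np.array(cols)


def compress(mask, n_keep):
    """Encode a square binary mask as its first n_keep DCT coefficients."""
    n = mask.shape[0]
    basis = dct_matrix(n)
    coeffs = basis @ mask.astype(np.float64) @ basis.T
    rows, cols = diagonal_order(n)
    return coeffs[rows[:n_keep], cols[:n_keep]]


def decompress(vector, n):
    """Place coefficients back at their original positions and invert."""
    rows, cols = diagonal_order(n)
    coeffs = np.zeros((n, n))
    coeffs[rows[:len(vector)], cols[:len(vector)]] = vector
    basis = dct_matrix(n)
    return (basis.T @ coeffs @ basis) > 0.5


mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True
recovered = decompress(compress(mask, n_keep=128), 32)
iou = (mask & recovered).sum() / (mask | recovered).sum()
print(f"IoU with 128 of 1024 coefficients: {iou:.3f}")
```

The detail that matters here is that encoding and decoding share one index order, so each kept coefficient returns to the frequency position it came from before the inverse transform is applied.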

A question about the implementation of segmentation mask for Deformable Detr

Hello,

First of all, thank you for your great work.

I want to ask about the implementation of the input projection before MaskHeadSmallConv in segmentation.py.
The implementation applies stride 2 to the features, which makes the best stride 8. However, for segmentation tasks, better results are possible when stride 4 is used for mask creation. The original segmentation head implementation of DETR also works that way. What is the reason for using that stride in your implementation?

Thanks in advance

Add mask to the matching cost

Hi authors, have you tried computing the matching cost with cls, reg, and mask considered together? Is there any difference in performance? Thanks in advance.
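
To make the question concrete, the sketch below shows how adding a mask term to a DETR-style matching cost can change which query is assigned to which ground-truth object. It uses brute-force search over a tiny example instead of the Hungarian algorithm. The cost values and weights are made up for illustration and are not taken from SOLQ or the paper.

```python
from itertools import permutations


def combine(cls_cost, box_cost, mask_cost, w_cls, w_box, w_mask):
    """Weighted sum of per-pair costs; each cost is indexed [query][target]."""
    return [[w_cls * c + w_box * b + w_mask * m
             for c, b, m in zip(c_row, b_row, m_row)]
            for c_row, b_row, m_row in zip(cls_cost, box_cost, mask_cost)]


def match(cost):
    """Minimum-cost one-to-one assignment; result[t] is the query for target t."""
    n_queries, n_targets = len(cost), len(cost[0])
    best, best_total = None, float("inf")
    for queries in permutations(range(n_queries), n_targets):
        total = sum(cost[q][t] for t, q in enumerate(queries))
        if total < best_total:
            best, best_total = queries, total
    return list(best)


# Illustrative costs for 2 queries x 2 targets (lower is better).
cls_cost = [[0.2, 0.5], [0.3, 0.4]]
box_cost = [[0.1, 0.3], [0.2, 0.2]]
mask_cost = [[0.9, 0.1], [0.1, 0.9]]

without_mask = match(combine(cls_cost, box_cost, mask_cost, 2.0, 5.0, 0.0))
with_mask = match(combine(cls_cost, box_cost, mask_cost, 2.0, 5.0, 3.0))
print(without_mask, with_mask)
```

In this toy case the class and box terms agree on one assignment, while a sufficiently weighted mask term flips it, which is the kind of interaction the question asks about.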
