
swint_detectron2's Introduction

SwinT_detectron2

Swin Transformer for Object Detection by detectron2

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on detectron2.

You can find SwinV2 in this repo

Results and Models

RetinaNet

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
Swin-T | ImageNet-1K | 3x | 44.6 | - | - | - | config | - | model

Faster R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
Swin-T FPN | ImageNet-1K | 3x | 45.1 | - | - | - | config | - | model

Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
Swin-T FPN | ImageNet-1K | 3x | 45.5 | 41.8 | - | - | config | - | model

The mask mAP (41.8 vs 41.6) matches the mmdetection result, but the box mAP is slightly worse (45.5 vs 46.0).

Usage

Please refer to get_started.md for installation and dataset preparation.

Note: you need to convert the original pretrained weights to the detectron2 format with convert_to_d2.py.
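A hedged sketch of what such a conversion typically involves follows (the repo's actual convert_to_d2.py may rename keys differently; the file names below are placeholders): load the original Swin checkpoint and re-save its state dict under the "model" key that detectron2's checkpointer expects.

import torch

# Placeholder paths; substitute your own checkpoint files.
src = "swin_tiny_patch4_window7_224.pth"   # original Swin Transformer pretrained weights
dst = "swin_tiny_d2.pth"                   # output in detectron2-compatible format

ckpt = torch.load(src, map_location="cpu")
# Official Swin checkpoints usually store weights under "model"; fall back to the raw dict otherwise.
state_dict = ckpt["model"] if "model" in ckpt else ckpt
# detectron2 checkpointers load dicts of the form {"model": state_dict}.
torch.save({"model": state_dict}, dst)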

References

swint_detectron2's People

Contributors

anthonyweidai, l3str4nge, xiaohu2015, yangyanggirl


swint_detectron2's Issues

Faster R-CNN Pre-trained Model

Hi, thank you very much for your graceful work.
I would like to ask if you have any plans to release the Faster R-CNN pre-trained model. I would appreciate it if you could do this; it would help me a lot.
Thank you very much!

Detectron2 version

Hi, thank you for your nice work.
Can you tell me which detectron2 version was used?
I have a problem when evaluating the model (trained by myself or downloaded from the link):

[05/06 11:12:29] d2.evaluation.coco_evaluation WARNING: No predictions from the model!
[05/06 11:12:29] d2.engine.defaults INFO: Evaluation results for coco_2017_val in csv format:
[05/06 11:12:29] d2.evaluation.testing INFO: copypaste: Task: bbox
[05/06 11:12:29] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
[05/06 11:12:29] d2.evaluation.testing INFO: copypaste: nan,nan,nan,nan,nan,nan

I want to make sure whether it is a version issue, thanks~

About the train

Thank you for your nice project!
I used your SwinT_detectron2 code to train on my own dataset with the RetinaNet network configuration. No error was reported during training; at iteration=200000 the AP50 is about 0.7, but on the test set it is only about 0.5. I have changed PIXEL_MEAN according to my dataset.
I would like to know which other parameters I should pay attention to during training; that would be very helpful!

SwinTransformer structure question

Hi, I would like to ask: the SwinTransformer in the model does not seem to be a cascaded structure (screenshot omitted). The four output stages appear to be independent of each other. What is the reason for this?

pred_masks

Thank you for your nice project! I trained on my own data with your backbone and then wanted to run inference on the results, but it reports: Cannot find field 'pred_masks' in the given Instances! I use mask_rcnn_swint_T_FPN_3x.yaml as my config.
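As a side note (not part of the original issue), detectron2's Instances exposes a has() method, so inference code can guard against the missing field; a minimal hedged sketch, where predictor and image are assumed to already exist:

def get_pred_masks(predictor, image):
    """Return predicted masks if the model produced any, otherwise None."""
    instances = predictor(image)["instances"].to("cpu")
    # pred_masks is only present when the model was built with MODEL.MASK_ON: True.
    if instances.has("pred_masks"):
        return instances.pred_masks
    return None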

My version of Detectron2

Hi,
My code works with Detectron2 0.3 and I want to change its backbone from ResNet to SwinT.
You said your code works with Detectron2 0.6. Do you have any idea how to solve this problem?

Thanks

Support for FRCN C4

Hi, thanks for your excellent work. I wonder whether Swin Transformer supports Faster R-CNN C4. How should the config be written? Thanks!

version of torch

Hi,
Thanks for sharing your code. I tried to run convert_to_d2.py to create the model file, but I got an error, which I guess is caused by the torch version. Can you help me with the steps to run it, and tell me which torch version I should use?

Thank you

About eval

The metrics of the results are nan.

The train_net.py is as follows:

#!/usr/bin/env python

# Copyright (c) Facebook, Inc. and its affiliates.

"""
Detection Training Script.
This scripts reads a given config file and runs the training or evaluation.
It is an entry point that is made to train standard models in detectron2.
In order to let one script support training of many models,
this script contains logic that are specific to these built-in models and therefore
may not be suitable for your own project.
For example, your research project perhaps only needs a single "evaluator".
Therefore, we recommend you to use detectron2 as an library and take
this file as an example of how to use the library.
You may want to write your own script with your datasets and other customizations.
"""
import itertools
import logging
import os
from collections import OrderedDict
import torch

import detectron2.utils.comm as comm
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, hooks, launch
from detectron2.evaluation import (
CityscapesInstanceEvaluator,
CityscapesSemSegEvaluator,
COCOEvaluator,
COCOPanopticEvaluator,
DatasetEvaluators,
LVISEvaluator,
PascalVOCDetectionEvaluator,
SemSegEvaluator,
verify_results,
)
from detectron2.modeling import GeneralizedRCNNWithTTA
from detectron2.solver.build import maybe_add_gradient_clipping, get_default_optimizer_params

from swint import add_swint_config
import pig_dataset

class Trainer(DefaultTrainer):
"""
We use the "DefaultTrainer" which contains pre-defined default logic for
standard training workflow. They may not work for you, especially if you
are working on a new research project. In that case you can write your
own training loop. You can use "tools/plain_train_net.py" as an example.
"""

@classmethod
def build_evaluator(cls, cfg, dataset_name, output_folder=None):
    """
    Create evaluator(s) for a given dataset.
    This uses the special metadata "evaluator_type" associated with each builtin dataset.
    For your own dataset, you can simply create an evaluator manually in your
    script and do not have to worry about the hacky if-else logic here.
    """
    if output_folder is None:
        output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
    evaluator_list = []
    evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
    if evaluator_type in ["sem_seg", "coco_panoptic_seg"]:
        evaluator_list.append(
            SemSegEvaluator(
                dataset_name,
                distributed=True,
                num_classes=cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES,
                ignore_label=cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE,
                output_dir=output_folder,
            )
        )
    if evaluator_type in ["coco", "coco_panoptic_seg"]:
        evaluator_list.append(COCOEvaluator(dataset_name, cfg, True, output_folder))
    if evaluator_type == "coco_panoptic_seg":
        evaluator_list.append(COCOPanopticEvaluator(dataset_name, output_folder))
    if evaluator_type == "cityscapes_instance":
        assert (
            torch.cuda.device_count() >= comm.get_rank()
        ), "CityscapesEvaluator currently do not work with multiple machines."
        return CityscapesInstanceEvaluator(dataset_name)
    if evaluator_type == "cityscapes_sem_seg":
        assert (
            torch.cuda.device_count() >= comm.get_rank()
        ), "CityscapesEvaluator currently do not work with multiple machines."
        return CityscapesSemSegEvaluator(dataset_name)
    elif evaluator_type == "pascal_voc":
        return PascalVOCDetectionEvaluator(dataset_name)
    elif evaluator_type == "lvis":
        return LVISEvaluator(dataset_name, cfg, True, output_folder)
    if len(evaluator_list) == 0:
        raise NotImplementedError(
            "no Evaluator for the dataset {} with the type {}".format(
                dataset_name, evaluator_type
            )
        )
    elif len(evaluator_list) == 1:
        return evaluator_list[0]
    return DatasetEvaluators(evaluator_list)

@classmethod
def test_with_TTA(cls, cfg, model):
    logger = logging.getLogger("detectron2.trainer")
    # In the end of training, run an evaluation with TTA
    # Only support some R-CNN models.
    logger.info("Running inference with test-time augmentation ...")
    model = GeneralizedRCNNWithTTA(cfg, model)
    evaluators = [
        cls.build_evaluator(
            cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, "inference_TTA")
        )
        for name in cfg.DATASETS.TEST
    ]
    res = cls.test(cfg, model, evaluators)
    res = OrderedDict({k + "_TTA": v for k, v in res.items()})
    return res

@classmethod
def build_optimizer(cls, cfg, model):
    params = get_default_optimizer_params(
        model,
        base_lr=cfg.SOLVER.BASE_LR,
        weight_decay=cfg.SOLVER.WEIGHT_DECAY,
        weight_decay_norm=cfg.SOLVER.WEIGHT_DECAY_NORM,
        bias_lr_factor=cfg.SOLVER.BIAS_LR_FACTOR,
        weight_decay_bias=cfg.SOLVER.WEIGHT_DECAY_BIAS,
    )

    def maybe_add_full_model_gradient_clipping(optim):  # optim: the optimizer class
        # detectron2 doesn't have full model gradient clipping now
        clip_norm_val = cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE
        enable = (
            cfg.SOLVER.CLIP_GRADIENTS.ENABLED
            and cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE == "full_model"
            and clip_norm_val > 0.0
        )

        class FullModelGradientClippingOptimizer(optim):
            def step(self, closure=None):
                all_params = itertools.chain(*[x["params"] for x in self.param_groups])
                torch.nn.utils.clip_grad_norm_(all_params, clip_norm_val)
                super().step(closure=closure)

        return FullModelGradientClippingOptimizer if enable else optim

    optimizer_type = cfg.SOLVER.OPTIMIZER
    if optimizer_type == "SGD":
        optimizer = maybe_add_gradient_clipping(torch.optim.SGD)(
            params, cfg.SOLVER.BASE_LR, momentum=cfg.SOLVER.MOMENTUM,
            nesterov=cfg.SOLVER.NESTEROV,
            weight_decay=cfg.SOLVER.WEIGHT_DECAY,
        )
    elif optimizer_type == "AdamW":
        optimizer = maybe_add_full_model_gradient_clipping(torch.optim.AdamW)(
            params, cfg.SOLVER.BASE_LR, betas=(0.9, 0.999),
            weight_decay=cfg.SOLVER.WEIGHT_DECAY,
        )

    else:
        raise NotImplementedError(f"no optimizer type {optimizer_type}")
    return optimizer

def setup(args):
    """
    Create configs and perform basic setups.
    """
    cfg = get_cfg()
    add_swint_config(cfg)
    args.config_file = "./configs/SwinT/retinanet_swint_T_FPN_3x.yaml"  ####
    cfg.merge_from_file(args.config_file)
    cfg.MODEL.WEIGHTS = "weights/retinanet_swint_S_3x.pth"  ###
    cfg.DATASETS.TRAIN = ("pig_coco_train",)
    cfg.DATASETS.TEST = ("pig_coco_test", )
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    default_setup(cfg, args)
    return cfg

def main(args):
    cfg = setup(args)
    args.eval_only = True

    if args.eval_only:
        model = Trainer.build_model(cfg)
        DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
            cfg.MODEL.WEIGHTS, resume=args.resume
        )
        res = Trainer.test(cfg, model)
        if cfg.TEST.AUG.ENABLED:
            res.update(Trainer.test_with_TTA(cfg, model))
        if comm.is_main_process():
            verify_results(cfg, res)
        return res

    """
    If you'd like to do anything fancier than the standard training logic,
    consider writing your own training loop (see plain_train_net.py) or
    subclassing the trainer.
    """
    trainer = Trainer(cfg)
    trainer.resume_or_load(resume=args.resume)
    if cfg.TEST.AUG.ENABLED:
        trainer.register_hooks(
            [hooks.EvalHook(0, lambda: trainer.test_with_TTA(cfg, trainer.model))]
        )
    return trainer.train()

if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    print("Command Line Args:", args)
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )

=================================================================================================
The output is as follows:

/home/server/anaconda3/envs/swin_d2/bin/python /home/server/SwinT_detectron2/train_net.py
Command Line Args: Namespace(config_file='', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
Loading config ./configs/SwinT/../Base-RetinaNet.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
[08/26 12:26:16 detectron2]: Rank of current process: 0. World size: 1
[08/26 12:26:17 detectron2]: Environment info:


sys.platform linux
Python 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
numpy 1.19.2
detectron2 0.5 @/home/server/anaconda3/envs/swin_d2/lib/python3.7/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.1
detectron2 arch flags 3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE
PyTorch 1.9.0 @/home/server/anaconda3/envs/swin_d2/lib/python3.7/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0 NVIDIA GeForce RTX 3090 (arch=8.6)
CUDA_HOME /usr/local/cuda
Pillow 8.3.1
torchvision 0.10.0 @/home/server/anaconda3/envs/swin_d2/lib/python3.7/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20210804
iopath 0.1.8
cv2 4.4.0


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[08/26 12:26:17 detectron2]: Command line arguments: Namespace(config_file='./configs/SwinT/retinanet_swint_T_FPN_3x.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[08/26 12:26:17 detectron2]: Contents of args.config_file=./configs/SwinT/retinanet_swint_T_FPN_3x.yaml:
_BASE_: "../Base-RetinaNet.yaml"
MODEL:
  WEIGHTS: "weights/retinanet_swint_S_3x.pth"
  PIXEL_MEAN: [123.675, 116.28, 103.53]  # use RGB [103.530, 116.280, 123.675]
  PIXEL_STD: [58.395, 57.12, 57.375]  # [57.375, 57.120, 58.395] # I used the default [1.0, 1.0, 1.0] and BGR format before; that was a mistake
  RESNETS:
    DEPTH: 50
  BACKBONE:
    NAME: "build_retinanet_swint_fpn_backbone"
  SWINT:
    OUT_FEATURES: ["stage3", "stage4", "stage5"]
  FPN:
    IN_FEATURES: ["stage3", "stage4", "stage5"]
INPUT:
  FORMAT: "RGB"
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
  WEIGHT_DECAY: 0.05
  BASE_LR: 0.0001
  AMP:
    ENABLED: True
TEST:
  EVAL_PERIOD: 30000

DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)

[08/26 12:26:17 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
ASPECT_RATIO_GROUPING: true
FILTER_EMPTY_ANNOTATIONS: true
NUM_WORKERS: 4
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: []
PROPOSAL_FILES_TRAIN: []
TEST:

  • pig_coco_test
    TRAIN:
  • pig_coco_train
    GLOBAL:
    HACK: 1.0
    INPUT:
    CROP:
    ENABLED: false
    SIZE:
    • 0.9
    • 0.9
      TYPE: relative_range
      FORMAT: RGB
      MASK_FORMAT: polygon
      MAX_SIZE_TEST: 1333
      MAX_SIZE_TRAIN: 1333
      MIN_SIZE_TEST: 800
      MIN_SIZE_TRAIN:
  • 640
  • 672
  • 704
  • 736
  • 768
  • 800
    MIN_SIZE_TRAIN_SAMPLING: choice
    RANDOM_FLIP: horizontal
    MODEL:
    ANCHOR_GENERATOR:
    ANGLES:
      • -90
      • 0
      • 90
        ASPECT_RATIOS:
      • 0.5
      • 1.0
      • 2.0
        NAME: DefaultAnchorGenerator
        OFFSET: 0.0
        SIZES:
      • 32
      • 40.31747359663594
      • 50.79683366298238
      • 64
      • 80.63494719327188
      • 101.59366732596476
      • 128
      • 161.26989438654377
      • 203.18733465192952
      • 256
      • 322.53978877308754
      • 406.37466930385904
      • 512
      • 645.0795775461751
      • 812.7493386077181
        BACKBONE:
        FREEZE_AT: -1
        NAME: build_retinanet_swint_fpn_backbone
        DEVICE: cuda
        FPN:
        FUSE_TYPE: sum
        IN_FEATURES:
    • stage3
    • stage4
    • stage5
      NORM: ''
      OUT_CHANNELS: 256
      TOP_LEVELS: 2
      KEYPOINT_ON: false
      LOAD_PROPOSALS: false
      MASK_ON: false
      META_ARCHITECTURE: RetinaNet
      PANOPTIC_FPN:
      COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
      INSTANCE_LOSS_WEIGHT: 1.0
      PIXEL_MEAN:
  • 123.675
  • 116.28
  • 103.53
    PIXEL_STD:
  • 58.395
  • 57.12
  • 57.375
    PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
    RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    • false
    • false
    • false
    • false
      DEPTH: 50
      NORM: FrozenBN
      NUM_GROUPS: 1
      OUT_FEATURES:
    • res3
    • res4
    • res5
      RES2_OUT_CHANNELS: 256
      RES5_DILATION: 1
      STEM_OUT_CHANNELS: 64
      STRIDE_IN_1X1: true
      WIDTH_PER_GROUP: 64
      RETINANET:
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_WEIGHTS: &id001
    • 1.0
    • 1.0
    • 1.0
    • 1.0
      FOCAL_LOSS_ALPHA: 0.25
      FOCAL_LOSS_GAMMA: 2.0
      IN_FEATURES:
    • p3
    • p4
    • p5
    • p6
    • p7
      IOU_LABELS:
    • 0
    • -1
    • 1
      IOU_THRESHOLDS:
    • 0.4
    • 0.5
      NMS_THRESH_TEST: 0.5
      NORM: ''
      NUM_CLASSES: 80
      NUM_CONVS: 4
      PRIOR_PROB: 0.01
      SCORE_THRESH_TEST: 0.05
      SMOOTH_L1_LOSS_BETA: 0.0
      TOPK_CANDIDATES_TEST: 1000
      ROI_BOX_CASCADE_HEAD:
      BBOX_REG_WEIGHTS:
      • 10.0
      • 10.0
      • 5.0
      • 5.0
      • 20.0
      • 20.0
      • 10.0
      • 10.0
      • 30.0
      • 30.0
      • 15.0
      • 15.0
        IOUS:
    • 0.5
    • 0.6
    • 0.7
      ROI_BOX_HEAD:
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_LOSS_WEIGHT: 1.0
      BBOX_REG_WEIGHTS:
    • 10.0
    • 10.0
    • 5.0
    • 5.0
      CLS_AGNOSTIC_BBOX_REG: false
      CONV_DIM: 256
      FC_DIM: 1024
      NAME: ''
      NORM: ''
      NUM_CONV: 0
      NUM_FC: 0
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      SMOOTH_L1_BETA: 0.0
      TRAIN_ON_PRED_BOXES: false
      ROI_HEADS:
      BATCH_SIZE_PER_IMAGE: 512
      IN_FEATURES:
    • res4
      IOU_LABELS:
    • 0
    • 1
      IOU_THRESHOLDS:
    • 0.5
      NAME: Res5ROIHeads
      NMS_THRESH_TEST: 0.5
      NUM_CLASSES: 80
      POSITIVE_FRACTION: 0.25
      PROPOSAL_APPEND_GT: true
      SCORE_THRESH_TEST: 0.05
      ROI_KEYPOINT_HEAD:
      CONV_DIMS:
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
      LOSS_WEIGHT: 1.0
      MIN_KEYPOINTS_PER_IMAGE: 1
      NAME: KRCNNConvDeconvUpsampleHead
      NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
      NUM_KEYPOINTS: 17
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      ROI_MASK_HEAD:
      CLS_AGNOSTIC_MASK: false
      CONV_DIM: 256
      NAME: MaskRCNNConvUpsampleHead
      NORM: ''
      NUM_CONV: 0
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      RPN:
      BATCH_SIZE_PER_IMAGE: 256
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_LOSS_WEIGHT: 1.0
      BBOX_REG_WEIGHTS: *id001
      BOUNDARY_THRESH: -1
      CONV_DIMS:
    • -1
      HEAD_NAME: StandardRPNHead
      IN_FEATURES:
    • res4
      IOU_LABELS:
    • 0
    • -1
    • 1
      IOU_THRESHOLDS:
    • 0.3
    • 0.7
      LOSS_WEIGHT: 1.0
      NMS_THRESH: 0.7
      POSITIVE_FRACTION: 0.5
      POST_NMS_TOPK_TEST: 1000
      POST_NMS_TOPK_TRAIN: 2000
      PRE_NMS_TOPK_TEST: 6000
      PRE_NMS_TOPK_TRAIN: 12000
      SMOOTH_L1_BETA: 0.0
      SEM_SEG_HEAD:
      COMMON_STRIDE: 4
      CONVS_DIM: 128
      IGNORE_VALUE: 255
      IN_FEATURES:
    • p2
    • p3
    • p4
    • p5
      LOSS_WEIGHT: 1.0
      NAME: SemSegFPNHead
      NORM: GN
      NUM_CLASSES: 54
      SWINT:
      APE: false
      DEPTHS:
    • 2
    • 2
    • 6
    • 2
      DROP_PATH_RATE: 0.2
      EMBED_DIM: 96
      MLP_RATIO: 4
      NUM_HEADS:
    • 3
    • 6
    • 12
    • 24
      OUT_FEATURES:
    • stage3
    • stage4
    • stage5
      WINDOW_SIZE: 7
      WEIGHTS: weights/retinanet_swint_S_3x.pth
      OUTPUT_DIR: ./output
      SEED: -1
      SOLVER:
      AMP:
      ENABLED: true
      BASE_LR: 0.0001
      BIAS_LR_FACTOR: 1.0
      CHECKPOINT_PERIOD: 5000
      CLIP_GRADIENTS:
      CLIP_TYPE: value
      CLIP_VALUE: 1.0
      ENABLED: false
      NORM_TYPE: 2.0
      GAMMA: 0.1
      IMS_PER_BATCH: 16
      LR_SCHEDULER_NAME: WarmupMultiStepLR
      MAX_ITER: 270000
      MOMENTUM: 0.9
      NESTEROV: false
      OPTIMIZER: AdamW
      REFERENCE_WORLD_SIZE: 0
      STEPS:
  • 210000
  • 250000
    WARMUP_FACTOR: 0.001
    WARMUP_ITERS: 1000
    WARMUP_METHOD: linear
    WEIGHT_DECAY: 0.05
    WEIGHT_DECAY_BIAS: 0.0001
    WEIGHT_DECAY_NORM: 0.0
    TEST:
    AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    • 400
    • 500
    • 600
    • 700
    • 800
    • 900
    • 1000
    • 1100
    • 1200
      DETECTIONS_PER_IMAGE: 100
      EVAL_PERIOD: 30000
      EXPECTED_RESULTS: []
      KEYPOINT_OKS_SIGMAS: []
      PRECISE_BN:
      ENABLED: false
      NUM_ITER: 200
      VERSION: 2
      VIS_PERIOD: 0

[08/26 12:26:17 detectron2]: Full config saved to ./output/config.yaml
[08/26 12:26:17 d2.utils.env]: Using a generated random seed 17194009
[08/26 12:26:18 d2.engine.defaults]: Model:
RetinaNet(
(backbone): FPN(
(fpn_lateral3): Conv2d(192, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral4): Conv2d(384, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral5): Conv2d(768, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(top_block): LastLevelP6P7(
(p6): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
)
(bottom_up): SwinTransformer(
(patch_embed): PatchEmbed(
(proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
(norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
)
(pos_drop): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=96, out_features=288, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): Identity()
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU()
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=96, out_features=288, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU()
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=384, out_features=192, bias=False)
(norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
)
)
(1): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=192, out_features=576, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU()
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=192, out_features=576, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU()
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=768, out_features=384, bias=False)
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(2): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(2): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(3): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(4): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(5): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=1536, out_features=768, bias=False)
(norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
)
)
(3): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=768, out_features=2304, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU()
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=768, out_features=2304, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU()
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
)
)
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(head): RetinaNetHead(
(cls_subnet): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU()
(6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU()
)
(bbox_subnet): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU()
(6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU()
)
(cls_score): Conv2d(256, 720, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bbox_pred): Conv2d(256, 36, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(anchor_generator): DefaultAnchorGenerator(
(cell_anchors): BufferList()
)
)
[08/26 12:26:18 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/retinanet_swint_S_3x.pth ...
WARNING [08/26 12:26:19 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
pixel_mean
pixel_std
anchor_generator.cell_anchors.{0, 1, 2, 3, 4}
[08/26 12:26:19 d2.data.datasets.coco]: Loaded 5000 images in COCO format from ./datasets/coco2017/annotations/instances_val2017.json
[08/26 12:26:19 d2.data.build]: Distribution of instances among all 80 categories:

category #instances category #instances category #instances
person 10777 bicycle 314 car 1918
motorcycle 367 airplane 143 bus 283
train 190 truck 414 boat 424
traffic light 634 fire hydrant 101 stop sign 75
parking meter 60 bench 411 bird 427
cat 202 dog 218 horse 272
sheep 354 cow 372 elephant 252
bear 71 zebra 266 giraffe 232
backpack 371 umbrella 407 handbag 540
tie 252 suitcase 299 frisbee 115
skis 241 snowboard 69 sports ball 260
kite 327 baseball bat 145 baseball gl.. 148
skateboard 179 surfboard 267 tennis racket 225
bottle 1013 wine glass 341 cup 895
fork 215 knife 325 spoon 253
bowl 623 banana 370 apple 236
sandwich 177 orange 285 broccoli 312
carrot 365 hot dog 125 pizza 284
donut 328 cake 310 chair 1771
couch 261 potted plant 342 bed 163
dining table 695 toilet 179 tv 288
laptop 231 mouse 106 remote 283
keyboard 153 cell phone 262 microwave 55
oven 143 toaster 9 sink 225
refrigerator 126 book 1129 clock 267
vase 274 scissors 36 teddy bear 190
hair drier 11 toothbrush 57
total 36335
[08/26 12:26:19 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[08/26 12:26:19 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[08/26 12:26:19 d2.data.common]: Serialized dataset takes 19.13 MiB
WARNING [08/26 12:26:19 d2.evaluation.coco_evaluation]: COCO Evaluator instantiated using config, this is deprecated behavior. Please pass in explicit arguments instead.
[08/26 12:26:20 d2.evaluation.evaluator]: Start inference on 5000 batches
/home/server/anaconda3/envs/swin_d2/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
[08/26 12:26:21 d2.evaluation.evaluator]: Inference done 11/5000. Dataloading: 0.0005 s/iter. Inference: 0.0410 s/iter. Eval: 0.0000 s/iter. Total: 0.0415 s/iter. ETA=0:03:27
[08/26 12:26:26 d2.evaluation.evaluator]: Inference done 130/5000. Dataloading: 0.0007 s/iter. Inference: 0.0415 s/iter. Eval: 0.0000 s/iter. Total: 0.0423 s/iter. ETA=0:03:25
[08/26 12:26:31 d2.evaluation.evaluator]: Inference done 252/5000. Dataloading: 0.0007 s/iter. Inference: 0.0408 s/iter. Eval: 0.0000 s/iter. Total: 0.0416 s/iter. ETA=0:03:17
[08/26 12:26:36 d2.evaluation.evaluator]: Inference done 374/5000. Dataloading: 0.0007 s/iter. Inference: 0.0407 s/iter. Eval: 0.0000 s/iter. Total: 0.0415 s/iter. ETA=0:03:12
[08/26 12:26:41 d2.evaluation.evaluator]: Inference done 493/5000. Dataloading: 0.0007 s/iter. Inference: 0.0409 s/iter. Eval: 0.0000 s/iter. Total: 0.0417 s/iter. ETA=0:03:07
[08/26 12:26:46 d2.evaluation.evaluator]: Inference done 619/5000. Dataloading: 0.0007 s/iter. Inference: 0.0405 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:03:00
[08/26 12:26:51 d2.evaluation.evaluator]: Inference done 741/5000. Dataloading: 0.0007 s/iter. Inference: 0.0405 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:02:55
[08/26 12:26:56 d2.evaluation.evaluator]: Inference done 864/5000. Dataloading: 0.0007 s/iter. Inference: 0.0404 s/iter. Eval: 0.0000 s/iter. Total: 0.0412 s/iter. ETA=0:02:50
[08/26 12:27:01 d2.evaluation.evaluator]: Inference done 985/5000. Dataloading: 0.0007 s/iter. Inference: 0.0404 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:02:45
[08/26 12:27:06 d2.evaluation.evaluator]: Inference done 1107/5000. Dataloading: 0.0007 s/iter. Inference: 0.0404 s/iter. Eval: 0.0000 s/iter. Total: 0.0412 s/iter. ETA=0:02:40
[08/26 12:27:11 d2.evaluation.evaluator]: Inference done 1227/5000. Dataloading: 0.0008 s/iter. Inference: 0.0405 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:02:35
[08/26 12:27:16 d2.evaluation.evaluator]: Inference done 1343/5000. Dataloading: 0.0008 s/iter. Inference: 0.0407 s/iter. Eval: 0.0000 s/iter. Total: 0.0415 s/iter. ETA=0:02:31
[08/26 12:27:21 d2.evaluation.evaluator]: Inference done 1467/5000. Dataloading: 0.0008 s/iter. Inference: 0.0406 s/iter. Eval: 0.0000 s/iter. Total: 0.0414 s/iter. ETA=0:02:26
[08/26 12:27:26 d2.evaluation.evaluator]: Inference done 1592/5000. Dataloading: 0.0008 s/iter. Inference: 0.0405 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:02:20
[08/26 12:27:31 d2.evaluation.evaluator]: Inference done 1717/5000. Dataloading: 0.0008 s/iter. Inference: 0.0404 s/iter. Eval: 0.0000 s/iter. Total: 0.0412 s/iter. ETA=0:02:15
[08/26 12:27:36 d2.evaluation.evaluator]: Inference done 1842/5000. Dataloading: 0.0008 s/iter. Inference: 0.0403 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:02:09
[08/26 12:27:41 d2.evaluation.evaluator]: Inference done 1967/5000. Dataloading: 0.0008 s/iter. Inference: 0.0403 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:02:04
[08/26 12:27:46 d2.evaluation.evaluator]: Inference done 2092/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:59
[08/26 12:27:51 d2.evaluation.evaluator]: Inference done 2216/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:54
[08/26 12:27:56 d2.evaluation.evaluator]: Inference done 2335/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:49
[08/26 12:28:01 d2.evaluation.evaluator]: Inference done 2457/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:44
[08/26 12:28:06 d2.evaluation.evaluator]: Inference done 2582/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:39
[08/26 12:28:11 d2.evaluation.evaluator]: Inference done 2703/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:34
[08/26 12:28:16 d2.evaluation.evaluator]: Inference done 2824/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:29
[08/26 12:28:21 d2.evaluation.evaluator]: Inference done 2946/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:24
[08/26 12:28:26 d2.evaluation.evaluator]: Inference done 3067/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:19
[08/26 12:28:31 d2.evaluation.evaluator]: Inference done 3189/5000. Dataloading: 0.0008 s/iter. Inference: 0.0403 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:14
[08/26 12:28:36 d2.evaluation.evaluator]: Inference done 3312/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:09
[08/26 12:28:41 d2.evaluation.evaluator]: Inference done 3435/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:04
[08/26 12:28:46 d2.evaluation.evaluator]: Inference done 3559/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:59
[08/26 12:28:51 d2.evaluation.evaluator]: Inference done 3681/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:54
[08/26 12:28:56 d2.evaluation.evaluator]: Inference done 3804/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:49
[08/26 12:29:01 d2.evaluation.evaluator]: Inference done 3928/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:43
[08/26 12:29:06 d2.evaluation.evaluator]: Inference done 4052/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:38
[08/26 12:29:11 d2.evaluation.evaluator]: Inference done 4175/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:33
[08/26 12:29:16 d2.evaluation.evaluator]: Inference done 4299/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:28
[08/26 12:29:21 d2.evaluation.evaluator]: Inference done 4423/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:23
[08/26 12:29:27 d2.evaluation.evaluator]: Inference done 4547/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:18
[08/26 12:29:32 d2.evaluation.evaluator]: Inference done 4670/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:13
[08/26 12:29:37 d2.evaluation.evaluator]: Inference done 4792/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:08
[08/26 12:29:42 d2.evaluation.evaluator]: Inference done 4912/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:03
[08/26 12:29:45 d2.evaluation.evaluator]: Total inference time: 0:03:24.816953 (0.041004 s / iter per device, on 1 devices)
[08/26 12:29:45 d2.evaluation.evaluator]: Total inference pure compute time: 0:03:20 (0.040161 s / iter per device, on 1 devices)
[08/26 12:29:45 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[08/26 12:29:45 d2.evaluation.coco_evaluation]: Saving results to ./output/inference/coco_instances_results.json
[08/26 12:29:45 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
WARNING [08/26 12:29:45 d2.evaluation.coco_evaluation]: No predictions from the model!
[08/26 12:29:45 d2.engine.defaults]: Evaluation results for pig_coco_test in csv format:
[08/26 12:29:45 d2.evaluation.testing]: copypaste: Task: bbox
[08/26 12:29:45 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[08/26 12:29:45 d2.evaluation.testing]: copypaste: nan,nan,nan,nan,nan,nan

Process finished with exit code 0
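A hedged observation, not from the original thread: the config above still has NUM_CLASSES: 80 and loads retinanet_swint_S_3x.pth into a Swin-T config, either of which could leave the head producing no usable predictions on a custom dataset. A sketch of the kind of overrides worth double-checking (the class count and weight path are placeholders for this pig dataset):

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
cfg.merge_from_file("./configs/SwinT/retinanet_swint_T_FPN_3x.yaml")
# Match the class count to the custom annotations (placeholder value).
cfg.MODEL.RETINANET.NUM_CLASSES = 1
# Use weights that correspond to the chosen backbone size (placeholder path).
cfg.MODEL.WEIGHTS = "weights/retinanet_swint_T_3x.pth"
cfg.DATASETS.TRAIN = ("pig_coco_train",)
cfg.DATASETS.TEST = ("pig_coco_test",)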

resnet related configuration in the yaml file

Hello, I would like to ask why there are ResNet-related parameters in mask_rcnn_swint_T_FPN_3x.yaml. I thought the Swin Transformer replaces the original ResNet as the backbone network, so there should be no ResNet-related configuration in the yaml file. Looking forward to your answer! (screenshot omitted)

GPU requirement

Hi, I am wondering how many GPUs you have used for reproducing the results. Is this model resource-intensive?

How do I run this?

Can your code be run on its own, or does it need to be added into the official detectron2 code base before running?

RGB order and pixel mean/std?

Hi,
Thanks for your work.
I have a question about this project.
Do we need to change the RGB format, PIXEL_MEAN, and PIXEL_STD in the configuration to keep consistency with the original SwinTransformer?
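For reference (not an authoritative answer), the retinanet_swint_T_FPN_3x.yaml shown further up this page already switches to RGB input with ImageNet statistics; a hedged sketch of the equivalent overrides in Python, using the values from that config:

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
# Values copied from the RetinaNet config shown above: RGB order with ImageNet mean/std.
cfg.INPUT.FORMAT = "RGB"
cfg.MODEL.PIXEL_MEAN = [123.675, 116.28, 103.53]
cfg.MODEL.PIXEL_STD = [58.395, 57.12, 57.375]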

license

Hi, thank you for your nice work. I can confirm it works and yields nice results even when trained on a custom dataset.

Can you please add a license to the repository? E.g., MIT. It is necessary in order to use the source code correctly and not 'steal' it. Thank you :)

Pretrained model

Hi, I am using SwinT with Mask R-CNN and everything compiled and was easy to set up.
However, when I use the pretrained weights (mask_rcnn_swint_T_coco17.pth) to fine-tune the model on my custom dataset, I can't seem to achieve over 35 box AP. With ResNet I usually achieve over 60 box AP. Are the pretrained weights the ones trained on COCO or on ImageNet? Any tips on what could cause the AP difference?

Thank you

How to change the backbone to custom transformer model DiT?

I want to fine-tune DiT for object detection (text and diagram detection only) on my own dataset. I have been searching the web for quite some time but could not find anything on fine-tuning a Transformer backbone for object detection.

  1. I know how to fine tune Detectron 2 for an object detection task with the default given configuration yaml files using Faster RCNN / Masked RCNN models with Resnet or any other backbone CNN models but I don't know how to do it with Transformers models.

  2. This GitHub issue on DETR with a custom backbone describes how to change the backbone; the author said you can use ANY model from the timm library (there are almost 890 models available), but unfortunately DiT is not among them.

  3. DiT is also present as a HuggingFace model and supports Feature Extraction as BeitFeatureExtractor.from_pretrained("microsoft/dit-large") so I think it could be used as a backbone but I found nothing on this one either.

I tried changing the code on how to train DETR on custom data by replacing code in Cell 8,

#feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")

feature_extractor = BeitFeatureExtractor.from_pretrained("microsoft/dit-large")

but while running the code for Cell 11,

from torch.utils.data import DataLoader

def collate_fn(batch):
  pixel_values = [item[0] for item in batch]
  encoding = feature_extractor.pad_and_create_pixel_mask(pixel_values, return_tensors="pt")
  labels = [item[1] for item in batch]
  batch = {}
  batch['pixel_values'] = encoding['pixel_values']
  batch['pixel_mask'] = encoding['pixel_mask']
  batch['labels'] = labels
  return batch

train_dataloader = DataLoader(train_dataset, collate_fn=collate_fn, batch_size=4, shuffle=True)
val_dataloader = DataLoader(val_dataset, collate_fn=collate_fn, batch_size=2)
batch = next(iter(train_dataloader))

it gave me error as:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-446d81c845dd> in <module>
     13 train_dataloader = DataLoader(train_dataset, collate_fn=collate_fn, batch_size=4, shuffle=True)
     14 val_dataloader = DataLoader(val_dataset, collate_fn=collate_fn, batch_size=2)
---> 15 batch = next(iter(train_dataloader))

5 frames
/usr/local/lib/python3.7/dist-packages/transformers/feature_extraction_utils.py in __getitem__(self, item)
     85         """
     86         if isinstance(item, str):
---> 87             return self.data[item]
     88         else:
     89             raise KeyError("Indexing with integers is not available when using Python based feature extractors")

KeyError: 'labels'

Can someone please help me with the problem at hand?

ANY architecture like Faster RCNN, DETR etc and ANY repo or platform like Detectron 2, PaddleDetection, MMDetection, HuggingFace, EfficientDet would do.

Instance Segmentation

Thank you for the awesome project. Can the Mask R-CNN model be trained for instance segmentation?

cfg = get_cfg()
add_swint_config(cfg)
cfg.MODEL.WEIGHTS = "/content/drive/MyDrive/ybigta1/mask_rcnn_swint_T_coco17.pth"

I have written it as above, but the training does not give mask mAP.
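A hedged note, not an official answer: the snippet above never merges a Mask R-CNN config, so MODEL.MASK_ON stays at its default (false in the config dump earlier on this page) and no mask head is trained. A sketch of the usual setup, assuming the repo's Mask R-CNN config lives under configs/SwinT/ (the dataset name is a placeholder and must carry segmentation annotations):

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
# Assumed path; this config is expected to set MASK_ON and the Swin-T backbone.
cfg.merge_from_file("configs/SwinT/mask_rcnn_swint_T_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "/content/drive/MyDrive/ybigta1/mask_rcnn_swint_T_coco17.pth"
cfg.DATASETS.TRAIN = ("my_instance_seg_train",)   # placeholder dataset with mask annotations
assert cfg.MODEL.MASK_ON, "mask mAP is only reported when MASK_ON is True"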

Images random resized

Thanks for your work!

I want to know whether the image size can be randomly resized during training.
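Not from the original thread, but the config dump earlier on this page shows that detectron2 already performs multi-scale resizing through INPUT.MIN_SIZE_TRAIN; a hedged sketch of how that is typically controlled:

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
# At train time the shorter image side is picked at random from this list ("choice"),
# or sampled uniformly between two values when MIN_SIZE_TRAIN_SAMPLING is "range".
cfg.INPUT.MIN_SIZE_TRAIN = (640, 672, 704, 736, 768, 800)
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
cfg.INPUT.MAX_SIZE_TRAIN = 1333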

Where do you define the number of training steps &/ epochs?

Hey,

Thanks a lot for the repo. It is fairly easy to use and get it to work.

However, I am having trouble figuring out how I can set the number of training iterations or epochs. I am only using 275 training images, but the interface shows more than a day of training time, which I assume is abnormally high.

Thanks
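Not from the original thread: detectron2 measures training length in iterations rather than epochs, so the relevant settings are the SOLVER keys that also appear in the configs above; a hedged sketch with placeholder numbers for a small 275-image dataset:

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
cfg.SOLVER.IMS_PER_BATCH = 4      # images consumed per iteration (across all GPUs)
# epochs ≈ MAX_ITER * IMS_PER_BATCH / number_of_training_images
cfg.SOLVER.MAX_ITER = 3000        # placeholder: roughly 40+ passes over 275 images
cfg.SOLVER.STEPS = (2000, 2500)   # learning-rate decay points, must stay below MAX_ITER
cfg.TEST.EVAL_PERIOD = 500        # run evaluation every 500 iterations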

I want to check whether the model has trained well, so I run inference, but cfg.merge_from_file raises the MODEL.SWINT non-existent config key error again

from detectron2.utils.visualizer import ColorMode
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import json
import pandas as pd
from random import randint
import torch, torchvision

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

import numpy as np
import cv2
import random
import matplotlib.pyplot as plt

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog

from swint import add_swint_config

from detectron2.data.datasets import register_coco_instances
register_coco_instances("data_train", {}, "/home/sangjoon/detectron2/sangjoon/for_newthing_0331/white_train.json", "/home/sangjoon/detectron2/sangjoon/white_train2020")
register_coco_instances("data_val", {}, "/home/sangjoon/detectron2/sangjoon/for_newthing_0331/white_test.json", "/home/sangjoon/detectron2/sangjoon/white_test2020")

import os
import numpy as np
import json
from detectron2.structures import BoxMode
import itertools

from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

cfg = get_cfg()

cfg.merge_from_file("/home/sangjoon/detectron2/configs/COCO-Detection/faster_rcnn_swint_T_FPN_3x_.yaml")
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = "/home/sangjoon/SwinT_detection2/real_white_weights/model_0015499.pth" # initialize from model zoo

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # set the testing threshold for this model
predictor = DefaultPredictor(cfg)


KeyError Traceback (most recent call last)
in
1 cfg = get_cfg()
2 from swint import add_swint_config
----> 3 cfg.merge_from_file("/home/sangjoon/detectron2/configs/COCO-Detection/faster_rcnn_swint_T_FPN_3x_.yaml")
4 # cfg.DATALOADER.NUM_WORKERS = 4
5 cfg.MODEL.WEIGHTS = "/home/sangjoon/SwinT_detection2/real_white_weights/model_0015499.pth" # initialize from model zoo

~/detectron2/detectron2/config/config.py in merge_from_file(self, cfg_filename, allow_unsafe)
52
53 if loaded_ver == self.VERSION:
---> 54 self.merge_from_other_cfg(loaded_cfg)
55 else:
56 # compat.py needs to import CfgNode

~/.conda/envs/mmdetection/lib/python3.7/site-packages/fvcore/common/config.py in merge_from_other_cfg(self, cfg_other)
121 BASE_KEY not in cfg_other
122 ), "The reserved key '{}' can only be used in files!".format(BASE_KEY)
--> 123 return super().merge_from_other_cfg(cfg_other)
124
125 def merge_from_list(self, cfg_list: List[str]) -> Callable[[], None]:

~/.conda/envs/mmdetection/lib/python3.7/site-packages/yacs/config.py in merge_from_other_cfg(self, cfg_other)
215 def merge_from_other_cfg(self, cfg_other):
216 """Merge cfg_other into this CfgNode."""
--> 217 _merge_a_into_b(cfg_other, self, self, [])
218
219 def merge_from_list(self, cfg_list):

~/.conda/envs/mmdetection/lib/python3.7/site-packages/yacs/config.py in _merge_a_into_b(a, b, root, key_list)
476 if isinstance(v, CfgNode):
477 try:
--> 478 _merge_a_into_b(v, b[k], root, key_list + [k])
479 except BaseException:
480 raise

~/.conda/envs/mmdetection/lib/python3.7/site-packages/yacs/config.py in _merge_a_into_b(a, b, root, key_list)
489 root.raise_key_rename_error(full_key)
490 else:
--> 491 raise KeyError("Non-existent config key: {}".format(full_key))
492
493

KeyError: 'Non-existent config key: MODEL.SWINT'

How do I fix it??
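A hedged reading of the traceback, not an official answer: add_swint_config is imported above but never called on this cfg, so the MODEL.SWINT keys are not registered before merge_from_file runs. A sketch of the usual ordering, reusing the paths from the snippet above:

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
# Register the MODEL.SWINT keys *before* merging a SwinT config file,
# otherwise yacs raises "Non-existent config key: MODEL.SWINT".
add_swint_config(cfg)
cfg.merge_from_file("/home/sangjoon/detectron2/configs/COCO-Detection/faster_rcnn_swint_T_FPN_3x_.yaml")
cfg.MODEL.WEIGHTS = "/home/sangjoon/SwinT_detection2/real_white_weights/model_0015499.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5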

Faster R-CNN training AP is nan

xiaohu, thank you very much for your great work.
I used this repo to train Faster R-CNN, and after tens of thousands of training steps the AP is nan. I also used your faster_rcnn_swint_T.pth as the pre-trained weights. I have been using detectron2 for a long time and am fairly familiar with the framework, but I cannot find the cause. Please advise!

Thanks!

Training RetinaNet reports: "This error indicates that your module has parameters that were not used in producing loss."

Hi, when I train RetinaNet with this code and set the FPN input features to ["stage2", "stage3", "stage4", "stage5"], the following error is raised.
The configuration is as follows: (screenshot omitted)

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.
This error indicates that your module has parameters that were not used in producing loss.
You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel;
(2) making sure all forward function outputs participate in calculating loss.
If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function.

The message suggests that some part of the model does not contribute to the loss during training. Following suggestion (1) and adding find_unused_parameters=True lets the code run normally, but I have not been able to locate which part of the model causes this. Could you help identify which part of the model is responsible?
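Not from the original thread: the workaround named in the error message is the find_unused_parameters flag of PyTorch's DistributedDataParallel; a hedged sketch of what enabling it looks like when wrapping a model by hand (detectron2's default trainer creates this wrapper internally, so in practice the flag has to be passed wherever the trainer builds it):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_for_ddp(model: torch.nn.Module, device_id: int) -> DDP:
    # find_unused_parameters=True lets DDP tolerate parameters that receive no
    # gradient in an iteration (e.g. a backbone stage whose output never reaches
    # the loss), at the cost of extra bookkeeping per step.
    return DDP(
        model.to(device_id),
        device_ids=[device_id],
        find_unused_parameters=True,
    )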
