
swint_detectron2's Introduction

SwinT_detectron2

Swin Transformer for Object Detection by detectron2

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on detectron2.

You can find SwinV2 in this repo

Results and Models

RetinaNet

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
Swin-T | ImageNet-1K | 3x | 44.6 | - | - | - | config | - | model

Faster R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
Swin-T FPN | ImageNet-1K | 3x | 45.1 | - | - | - | config | - | model

Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
Swin-T FPN | ImageNet-1K | 3x | 45.5 | 41.8 | - | - | config | - | model

The mask mAP (41.8 vs 41.6) matches the mmdetection result, but the box mAP is slightly worse (45.5 vs 46.0).

Usage

Please refer to get_started.md for installation and dataset preparation.

Note: you need to convert the original pretrained weights to the detectron2 format with convert_to_d2.py.
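A hedged sketch of what such a conversion typically involves follows (the repo's actual convert_to_d2.py may rename keys differently; the file names below are placeholders): load the original Swin checkpoint and re-save its state dict under the "model" key that detectron2's checkpointer expects.

import torch

# Placeholder paths; substitute your own checkpoint files.
src = "swin_tiny_patch4_window7_224.pth"   # original Swin Transformer pretrained weights
dst = "swin_tiny_d2.pth"                   # output in detectron2-compatible format

ckpt = torch.load(src, map_location="cpu")
# Official Swin checkpoints usually store weights under "model"; fall back to the raw dict otherwise.
state_dict = ckpt["model"] if "model" in ckpt else ckpt
# detectron2 checkpointers load dicts of the form {"model": state_dict}.
torch.save({"model": state_dict}, dst)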

References

swint_detectron2's People

Contributors

anthonyweidai, l3str4nge, xiaohu2015, yangyanggirl


swint_detectron2's Issues

Faster R-CNN Pre-trained Model

Hi, thank you very much for your graceful work.
I would like to ask if you have any plans to release the Faster R-CNN pre-trained model. I would appreciate it if you could do this; it would help me a lot.
Thank you very much!

Detectron2 version

Hi, thank you for your nice work.
Can you tell me which detectron2 version was used?
I have a problem when evaluating the model (trained by myself or downloaded from the link):

[05/06 11:12:29] d2.evaluation.coco_evaluation WARNING: No predictions from the model!
[05/06 11:12:29] d2.engine.defaults INFO: Evaluation results for coco_2017_val in csv format:
[05/06 11:12:29] d2.evaluation.testing INFO: copypaste: Task: bbox
[05/06 11:12:29] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
[05/06 11:12:29] d2.evaluation.testing INFO: copypaste: nan,nan,nan,nan,nan,nan

I want to make sure whether it is a version issue, thanks~

About the train

Thank you for your nice project!
I used your SwinT_detectron2 code to train on my own dataset with the RetinaNet network configuration. No error was reported during training; at iteration=200000 the AP50 is about 0.7, but on the test set it is only about 0.5. I have changed PIXEL_MEAN according to my dataset.
I would like to know which other parameters I should pay attention to during training; that would be very helpful!

SwinTransformer structure question

Hi, I would like to ask: the SwinTransformer in the model does not seem to be a cascaded structure (screenshot omitted). The four output stages appear to be independent of each other. What is the reason for this?

pred_masks

Thank you for your nice project! I trained on my own data with your backbone and then wanted to run inference on the results, but it reports: Cannot find field 'pred_masks' in the given Instances! I use mask_rcnn_swint_T_FPN_3x.yaml as my config.
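As a side note (not part of the original issue), detectron2's Instances exposes a has() method, so inference code can guard against the missing field; a minimal hedged sketch, where predictor and image are assumed to already exist:

def get_pred_masks(predictor, image):
    """Return predicted masks if the model produced any, otherwise None."""
    instances = predictor(image)["instances"].to("cpu")
    # pred_masks is only present when the model was built with MODEL.MASK_ON: True.
    if instances.has("pred_masks"):
        return instances.pred_masks
    return None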

My version of Detectron2

Hi,
My code works with Detectron2 0.3 and I want to change its backbone from ResNet to SwinT.
You said your code works with Detectron2 0.6. Do you have any idea how to solve this problem?

Thanks

Support for FRCN C4

Hi, thanks for your excellent work. I wonder whether Swin Transformer supports Faster R-CNN C4. How should the config be written? Thanks!

version of torch

Hi,
Thanks for sharing your code. I tried to run convert_to_d2.py to create the model file, but I got an error, which I guess is caused by the torch version. Can you help me with the steps to run it, and tell me which torch version I should use?

Thank you

About eval

The metrics of the results are nan.

The train_net.py is as follows:

#!/usr/bin/env python

# Copyright (c) Facebook, Inc. and its affiliates.

"""
Detection Training Script.
This scripts reads a given config file and runs the training or evaluation.
It is an entry point that is made to train standard models in detectron2.
In order to let one script support training of many models,
this script contains logic that are specific to these built-in models and therefore
may not be suitable for your own project.
For example, your research project perhaps only needs a single "evaluator".
Therefore, we recommend you to use detectron2 as an library and take
this file as an example of how to use the library.
You may want to write your own script with your datasets and other customizations.
"""
import itertools
import logging
import os
from collections import OrderedDict
import torch

import detectron2.utils.comm as comm
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, hooks, launch
from detectron2.evaluation import (
CityscapesInstanceEvaluator,
CityscapesSemSegEvaluator,
COCOEvaluator,
COCOPanopticEvaluator,
DatasetEvaluators,
LVISEvaluator,
PascalVOCDetectionEvaluator,
SemSegEvaluator,
verify_results,
)
from detectron2.modeling import GeneralizedRCNNWithTTA
from detectron2.solver.build import maybe_add_gradient_clipping, get_default_optimizer_params

from swint import add_swint_config
import pig_dataset

class Trainer(DefaultTrainer):
"""
We use the "DefaultTrainer" which contains pre-defined default logic for
standard training workflow. They may not work for you, especially if you
are working on a new research project. In that case you can write your
own training loop. You can use "tools/plain_train_net.py" as an example.
"""

@classmethod
def build_evaluator(cls, cfg, dataset_name, output_folder=None):
    """
    Create evaluator(s) for a given dataset.
    This uses the special metadata "evaluator_type" associated with each builtin dataset.
    For your own dataset, you can simply create an evaluator manually in your
    script and do not have to worry about the hacky if-else logic here.
    """
    if output_folder is None:
        output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
    evaluator_list = []
    evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
    if evaluator_type in ["sem_seg", "coco_panoptic_seg"]:
        evaluator_list.append(
            SemSegEvaluator(
                dataset_name,
                distributed=True,
                num_classes=cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES,
                ignore_label=cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE,
                output_dir=output_folder,
            )
        )
    if evaluator_type in ["coco", "coco_panoptic_seg"]:
        evaluator_list.append(COCOEvaluator(dataset_name, cfg, True, output_folder))
    if evaluator_type == "coco_panoptic_seg":
        evaluator_list.append(COCOPanopticEvaluator(dataset_name, output_folder))
    if evaluator_type == "cityscapes_instance":
        assert (
            torch.cuda.device_count() >= comm.get_rank()
        ), "CityscapesEvaluator currently do not work with multiple machines."
        return CityscapesInstanceEvaluator(dataset_name)
    if evaluator_type == "cityscapes_sem_seg":
        assert (
            torch.cuda.device_count() >= comm.get_rank()
        ), "CityscapesEvaluator currently do not work with multiple machines."
        return CityscapesSemSegEvaluator(dataset_name)
    elif evaluator_type == "pascal_voc":
        return PascalVOCDetectionEvaluator(dataset_name)
    elif evaluator_type == "lvis":
        return LVISEvaluator(dataset_name, cfg, True, output_folder)
    if len(evaluator_list) == 0:
        raise NotImplementedError(
            "no Evaluator for the dataset {} with the type {}".format(
                dataset_name, evaluator_type
            )
        )
    elif len(evaluator_list) == 1:
        return evaluator_list[0]
    return DatasetEvaluators(evaluator_list)

@classmethod
def test_with_TTA(cls, cfg, model):
    logger = logging.getLogger("detectron2.trainer")
    # In the end of training, run an evaluation with TTA
    # Only support some R-CNN models.
    logger.info("Running inference with test-time augmentation ...")
    model = GeneralizedRCNNWithTTA(cfg, model)
    evaluators = [
        cls.build_evaluator(
            cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, "inference_TTA")
        )
        for name in cfg.DATASETS.TEST
    ]
    res = cls.test(cfg, model, evaluators)
    res = OrderedDict({k + "_TTA": v for k, v in res.items()})
    return res

@classmethod
def build_optimizer(cls, cfg, model):
    params = get_default_optimizer_params(
        model,
        base_lr=cfg.SOLVER.BASE_LR,
        weight_decay=cfg.SOLVER.WEIGHT_DECAY,
        weight_decay_norm=cfg.SOLVER.WEIGHT_DECAY_NORM,
        bias_lr_factor=cfg.SOLVER.BIAS_LR_FACTOR,
        weight_decay_bias=cfg.SOLVER.WEIGHT_DECAY_BIAS,
    )

    def maybe_add_full_model_gradient_clipping(optim):  # optim: the optimizer class
        # detectron2 doesn't have full model gradient clipping now
        clip_norm_val = cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE
        enable = (
            cfg.SOLVER.CLIP_GRADIENTS.ENABLED
            and cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE == "full_model"
            and clip_norm_val > 0.0
        )

        class FullModelGradientClippingOptimizer(optim):
            def step(self, closure=None):
                all_params = itertools.chain(*[x["params"] for x in self.param_groups])
                torch.nn.utils.clip_grad_norm_(all_params, clip_norm_val)
                super().step(closure=closure)

        return FullModelGradientClippingOptimizer if enable else optim

    optimizer_type = cfg.SOLVER.OPTIMIZER
    if optimizer_type == "SGD":
        optimizer = maybe_add_gradient_clipping(torch.optim.SGD)(
            params, cfg.SOLVER.BASE_LR, momentum=cfg.SOLVER.MOMENTUM,
            nesterov=cfg.SOLVER.NESTEROV,
            weight_decay=cfg.SOLVER.WEIGHT_DECAY,
        )
    elif optimizer_type == "AdamW":
        optimizer = maybe_add_full_model_gradient_clipping(torch.optim.AdamW)(
            params, cfg.SOLVER.BASE_LR, betas=(0.9, 0.999),
            weight_decay=cfg.SOLVER.WEIGHT_DECAY,
        )

    else:
        raise NotImplementedError(f"no optimizer type {optimizer_type}")
    return optimizer

def setup(args):
    """
    Create configs and perform basic setups.
    """
    cfg = get_cfg()
    add_swint_config(cfg)
    args.config_file = "./configs/SwinT/retinanet_swint_T_FPN_3x.yaml"  ####
    cfg.merge_from_file(args.config_file)
    cfg.MODEL.WEIGHTS = "weights/retinanet_swint_S_3x.pth"  ###
    cfg.DATASETS.TRAIN = ("pig_coco_train",)
    cfg.DATASETS.TEST = ("pig_coco_test", )
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    default_setup(cfg, args)
    return cfg

def main(args):
    cfg = setup(args)
    args.eval_only = True

    if args.eval_only:
        model = Trainer.build_model(cfg)
        DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
            cfg.MODEL.WEIGHTS, resume=args.resume
        )
        res = Trainer.test(cfg, model)
        if cfg.TEST.AUG.ENABLED:
            res.update(Trainer.test_with_TTA(cfg, model))
        if comm.is_main_process():
            verify_results(cfg, res)
        return res

    """
    If you'd like to do anything fancier than the standard training logic,
    consider writing your own training loop (see plain_train_net.py) or
    subclassing the trainer.
    """
    trainer = Trainer(cfg)
    trainer.resume_or_load(resume=args.resume)
    if cfg.TEST.AUG.ENABLED:
        trainer.register_hooks(
            [hooks.EvalHook(0, lambda: trainer.test_with_TTA(cfg, trainer.model))]
        )
    return trainer.train()

if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    print("Command Line Args:", args)
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )

=================================================================================================
The output is as follows:

/home/server/anaconda3/envs/swin_d2/bin/python /home/server/SwinT_detectron2/train_net.py
Command Line Args: Namespace(config_file='', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
Loading config ./configs/SwinT/../Base-RetinaNet.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
[08/26 12:26:16 detectron2]: Rank of current process: 0. World size: 1
[08/26 12:26:17 detectron2]: Environment info:


sys.platform linux
Python 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
numpy 1.19.2
detectron2 0.5 @/home/server/anaconda3/envs/swin_d2/lib/python3.7/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.1
detectron2 arch flags 3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE
PyTorch 1.9.0 @/home/server/anaconda3/envs/swin_d2/lib/python3.7/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0 NVIDIA GeForce RTX 3090 (arch=8.6)
CUDA_HOME /usr/local/cuda
Pillow 8.3.1
torchvision 0.10.0 @/home/server/anaconda3/envs/swin_d2/lib/python3.7/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20210804
iopath 0.1.8
cv2 4.4.0


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[08/26 12:26:17 detectron2]: Command line arguments: Namespace(config_file='./configs/SwinT/retinanet_swint_T_FPN_3x.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[08/26 12:26:17 detectron2]: Contents of args.config_file=./configs/SwinT/retinanet_swint_T_FPN_3x.yaml:
_BASE_: "../Base-RetinaNet.yaml"
MODEL:
  WEIGHTS: "weights/retinanet_swint_S_3x.pth"
  PIXEL_MEAN: [123.675, 116.28, 103.53]  # use RGB [103.530, 116.280, 123.675]
  PIXEL_STD: [58.395, 57.12, 57.375]  # [57.375, 57.120, 58.395] # I used the default [1.0, 1.0, 1.0] and BGR format before; that was a mistake
  RESNETS:
    DEPTH: 50
  BACKBONE:
    NAME: "build_retinanet_swint_fpn_backbone"
  SWINT:
    OUT_FEATURES: ["stage3", "stage4", "stage5"]
  FPN:
    IN_FEATURES: ["stage3", "stage4", "stage5"]
INPUT:
  FORMAT: "RGB"
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
  WEIGHT_DECAY: 0.05
  BASE_LR: 0.0001
  AMP:
    ENABLED: True
TEST:
  EVAL_PERIOD: 30000

DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)

[08/26 12:26:17 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
ASPECT_RATIO_GROUPING: true
FILTER_EMPTY_ANNOTATIONS: true
NUM_WORKERS: 4
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: []
PROPOSAL_FILES_TRAIN: []
TEST:

  • pig_coco_test
    TRAIN:
  • pig_coco_train
    GLOBAL:
    HACK: 1.0
    INPUT:
    CROP:
    ENABLED: false
    SIZE:
    • 0.9
    • 0.9
      TYPE: relative_range
      FORMAT: RGB
      MASK_FORMAT: polygon
      MAX_SIZE_TEST: 1333
      MAX_SIZE_TRAIN: 1333
      MIN_SIZE_TEST: 800
      MIN_SIZE_TRAIN:
  • 640
  • 672
  • 704
  • 736
  • 768
  • 800
    MIN_SIZE_TRAIN_SAMPLING: choice
    RANDOM_FLIP: horizontal
    MODEL:
    ANCHOR_GENERATOR:
    ANGLES:
      • -90
      • 0
      • 90
        ASPECT_RATIOS:
      • 0.5
      • 1.0
      • 2.0
        NAME: DefaultAnchorGenerator
        OFFSET: 0.0
        SIZES:
      • 32
      • 40.31747359663594
      • 50.79683366298238
      • 64
      • 80.63494719327188
      • 101.59366732596476
      • 128
      • 161.26989438654377
      • 203.18733465192952
      • 256
      • 322.53978877308754
      • 406.37466930385904
      • 512
      • 645.0795775461751
      • 812.7493386077181
        BACKBONE:
        FREEZE_AT: -1
        NAME: build_retinanet_swint_fpn_backbone
        DEVICE: cuda
        FPN:
        FUSE_TYPE: sum
        IN_FEATURES:
    • stage3
    • stage4
    • stage5
      NORM: ''
      OUT_CHANNELS: 256
      TOP_LEVELS: 2
      KEYPOINT_ON: false
      LOAD_PROPOSALS: false
      MASK_ON: false
      META_ARCHITECTURE: RetinaNet
      PANOPTIC_FPN:
      COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
      INSTANCE_LOSS_WEIGHT: 1.0
      PIXEL_MEAN:
  • 123.675
  • 116.28
  • 103.53
    PIXEL_STD:
  • 58.395
  • 57.12
  • 57.375
    PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
    RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    • false
    • false
    • false
    • false
      DEPTH: 50
      NORM: FrozenBN
      NUM_GROUPS: 1
      OUT_FEATURES:
    • res3
    • res4
    • res5
      RES2_OUT_CHANNELS: 256
      RES5_DILATION: 1
      STEM_OUT_CHANNELS: 64
      STRIDE_IN_1X1: true
      WIDTH_PER_GROUP: 64
      RETINANET:
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_WEIGHTS: &id001
    • 1.0
    • 1.0
    • 1.0
    • 1.0
      FOCAL_LOSS_ALPHA: 0.25
      FOCAL_LOSS_GAMMA: 2.0
      IN_FEATURES:
    • p3
    • p4
    • p5
    • p6
    • p7
      IOU_LABELS:
    • 0
    • -1
    • 1
      IOU_THRESHOLDS:
    • 0.4
    • 0.5
      NMS_THRESH_TEST: 0.5
      NORM: ''
      NUM_CLASSES: 80
      NUM_CONVS: 4
      PRIOR_PROB: 0.01
      SCORE_THRESH_TEST: 0.05
      SMOOTH_L1_LOSS_BETA: 0.0
      TOPK_CANDIDATES_TEST: 1000
      ROI_BOX_CASCADE_HEAD:
      BBOX_REG_WEIGHTS:
      • 10.0
      • 10.0
      • 5.0
      • 5.0
      • 20.0
      • 20.0
      • 10.0
      • 10.0
      • 30.0
      • 30.0
      • 15.0
      • 15.0
        IOUS:
    • 0.5
    • 0.6
    • 0.7
      ROI_BOX_HEAD:
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_LOSS_WEIGHT: 1.0
      BBOX_REG_WEIGHTS:
    • 10.0
    • 10.0
    • 5.0
    • 5.0
      CLS_AGNOSTIC_BBOX_REG: false
      CONV_DIM: 256
      FC_DIM: 1024
      NAME: ''
      NORM: ''
      NUM_CONV: 0
      NUM_FC: 0
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      SMOOTH_L1_BETA: 0.0
      TRAIN_ON_PRED_BOXES: false
      ROI_HEADS:
      BATCH_SIZE_PER_IMAGE: 512
      IN_FEATURES:
    • res4
      IOU_LABELS:
    • 0
    • 1
      IOU_THRESHOLDS:
    • 0.5
      NAME: Res5ROIHeads
      NMS_THRESH_TEST: 0.5
      NUM_CLASSES: 80
      POSITIVE_FRACTION: 0.25
      PROPOSAL_APPEND_GT: true
      SCORE_THRESH_TEST: 0.05
      ROI_KEYPOINT_HEAD:
      CONV_DIMS:
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
      LOSS_WEIGHT: 1.0
      MIN_KEYPOINTS_PER_IMAGE: 1
      NAME: KRCNNConvDeconvUpsampleHead
      NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
      NUM_KEYPOINTS: 17
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      ROI_MASK_HEAD:
      CLS_AGNOSTIC_MASK: false
      CONV_DIM: 256
      NAME: MaskRCNNConvUpsampleHead
      NORM: ''
      NUM_CONV: 0
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      RPN:
      BATCH_SIZE_PER_IMAGE: 256
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_LOSS_WEIGHT: 1.0
      BBOX_REG_WEIGHTS: *id001
      BOUNDARY_THRESH: -1
      CONV_DIMS:
    • -1
      HEAD_NAME: StandardRPNHead
      IN_FEATURES:
    • res4
      IOU_LABELS:
    • 0
    • -1
    • 1
      IOU_THRESHOLDS:
    • 0.3
    • 0.7
      LOSS_WEIGHT: 1.0
      NMS_THRESH: 0.7
      POSITIVE_FRACTION: 0.5
      POST_NMS_TOPK_TEST: 1000
      POST_NMS_TOPK_TRAIN: 2000
      PRE_NMS_TOPK_TEST: 6000
      PRE_NMS_TOPK_TRAIN: 12000
      SMOOTH_L1_BETA: 0.0
      SEM_SEG_HEAD:
      COMMON_STRIDE: 4
      CONVS_DIM: 128
      IGNORE_VALUE: 255
      IN_FEATURES:
    • p2
    • p3
    • p4
    • p5
      LOSS_WEIGHT: 1.0
      NAME: SemSegFPNHead
      NORM: GN
      NUM_CLASSES: 54
      SWINT:
      APE: false
      DEPTHS:
    • 2
    • 2
    • 6
    • 2
      DROP_PATH_RATE: 0.2
      EMBED_DIM: 96
      MLP_RATIO: 4
      NUM_HEADS:
    • 3
    • 6
    • 12
    • 24
      OUT_FEATURES:
    • stage3
    • stage4
    • stage5
      WINDOW_SIZE: 7
      WEIGHTS: weights/retinanet_swint_S_3x.pth
      OUTPUT_DIR: ./output
      SEED: -1
      SOLVER:
      AMP:
      ENABLED: true
      BASE_LR: 0.0001
      BIAS_LR_FACTOR: 1.0
      CHECKPOINT_PERIOD: 5000
      CLIP_GRADIENTS:
      CLIP_TYPE: value
      CLIP_VALUE: 1.0
      ENABLED: false
      NORM_TYPE: 2.0
      GAMMA: 0.1
      IMS_PER_BATCH: 16
      LR_SCHEDULER_NAME: WarmupMultiStepLR
      MAX_ITER: 270000
      MOMENTUM: 0.9
      NESTEROV: false
      OPTIMIZER: AdamW
      REFERENCE_WORLD_SIZE: 0
      STEPS:
  • 210000
  • 250000
    WARMUP_FACTOR: 0.001
    WARMUP_ITERS: 1000
    WARMUP_METHOD: linear
    WEIGHT_DECAY: 0.05
    WEIGHT_DECAY_BIAS: 0.0001
    WEIGHT_DECAY_NORM: 0.0
    TEST:
    AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    • 400
    • 500
    • 600
    • 700
    • 800
    • 900
    • 1000
    • 1100
    • 1200
      DETECTIONS_PER_IMAGE: 100
      EVAL_PERIOD: 30000
      EXPECTED_RESULTS: []
      KEYPOINT_OKS_SIGMAS: []
      PRECISE_BN:
      ENABLED: false
      NUM_ITER: 200
      VERSION: 2
      VIS_PERIOD: 0

[08/26 12:26:17 detectron2]: Full config saved to ./output/config.yaml
[08/26 12:26:17 d2.utils.env]: Using a generated random seed 17194009
[08/26 12:26:18 d2.engine.defaults]: Model:
RetinaNet(
(backbone): FPN(
(fpn_lateral3): Conv2d(192, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral4): Conv2d(384, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral5): Conv2d(768, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(top_block): LastLevelP6P7(
(p6): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
)
(bottom_up): SwinTransformer(
(patch_embed): PatchEmbed(
(proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
(norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
)
(pos_drop): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=96, out_features=288, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): Identity()
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU()
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=96, out_features=288, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU()
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=384, out_features=192, bias=False)
(norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
)
)
(1): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=192, out_features=576, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU()
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=192, out_features=576, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU()
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=768, out_features=384, bias=False)
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(2): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(2): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(3): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(4): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(5): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU()
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=1536, out_features=768, bias=False)
(norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
)
)
(3): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=768, out_features=2304, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU()
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=768, out_features=2304, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU()
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
)
)
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(head): RetinaNetHead(
(cls_subnet): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU()
(6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU()
)
(bbox_subnet): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU()
(6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU()
)
(cls_score): Conv2d(256, 720, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bbox_pred): Conv2d(256, 36, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(anchor_generator): DefaultAnchorGenerator(
(cell_anchors): BufferList()
)
)
[08/26 12:26:18 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/retinanet_swint_S_3x.pth ...
WARNING [08/26 12:26:19 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
pixel_mean
pixel_std
anchor_generator.cell_anchors.{0, 1, 2, 3, 4}
[08/26 12:26:19 d2.data.datasets.coco]: Loaded 5000 images in COCO format from ./datasets/coco2017/annotations/instances_val2017.json
[08/26 12:26:19 d2.data.build]: Distribution of instances among all 80 categories:

category #instances category #instances category #instances
person 10777 bicycle 314 car 1918
motorcycle 367 airplane 143 bus 283
train 190 truck 414 boat 424
traffic light 634 fire hydrant 101 stop sign 75
parking meter 60 bench 411 bird 427
cat 202 dog 218 horse 272
sheep 354 cow 372 elephant 252
bear 71 zebra 266 giraffe 232
backpack 371 umbrella 407 handbag 540
tie 252 suitcase 299 frisbee 115
skis 241 snowboard 69 sports ball 260
kite 327 baseball bat 145 baseball gl.. 148
skateboard 179 surfboard 267 tennis racket 225
bottle 1013 wine glass 341 cup 895
fork 215 knife 325 spoon 253
bowl 623 banana 370 apple 236
sandwich 177 orange 285 broccoli 312
carrot 365 hot dog 125 pizza 284
donut 328 cake 310 chair 1771
couch 261 potted plant 342 bed 163
dining table 695 toilet 179 tv 288
laptop 231 mouse 106 remote 283
keyboard 153 cell phone 262 microwave 55
oven 143 toaster 9 sink 225
refrigerator 126 book 1129 clock 267
vase 274 scissors 36 teddy bear 190
hair drier 11 toothbrush 57
total 36335
[08/26 12:26:19 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[08/26 12:26:19 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[08/26 12:26:19 d2.data.common]: Serialized dataset takes 19.13 MiB
WARNING [08/26 12:26:19 d2.evaluation.coco_evaluation]: COCO Evaluator instantiated using config, this is deprecated behavior. Please pass in explicit arguments instead.
[08/26 12:26:20 d2.evaluation.evaluator]: Start inference on 5000 batches
/home/server/anaconda3/envs/swin_d2/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
[08/26 12:26:21 d2.evaluation.evaluator]: Inference done 11/5000. Dataloading: 0.0005 s/iter. Inference: 0.0410 s/iter. Eval: 0.0000 s/iter. Total: 0.0415 s/iter. ETA=0:03:27
[08/26 12:26:26 d2.evaluation.evaluator]: Inference done 130/5000. Dataloading: 0.0007 s/iter. Inference: 0.0415 s/iter. Eval: 0.0000 s/iter. Total: 0.0423 s/iter. ETA=0:03:25
[08/26 12:26:31 d2.evaluation.evaluator]: Inference done 252/5000. Dataloading: 0.0007 s/iter. Inference: 0.0408 s/iter. Eval: 0.0000 s/iter. Total: 0.0416 s/iter. ETA=0:03:17
[08/26 12:26:36 d2.evaluation.evaluator]: Inference done 374/5000. Dataloading: 0.0007 s/iter. Inference: 0.0407 s/iter. Eval: 0.0000 s/iter. Total: 0.0415 s/iter. ETA=0:03:12
[08/26 12:26:41 d2.evaluation.evaluator]: Inference done 493/5000. Dataloading: 0.0007 s/iter. Inference: 0.0409 s/iter. Eval: 0.0000 s/iter. Total: 0.0417 s/iter. ETA=0:03:07
[08/26 12:26:46 d2.evaluation.evaluator]: Inference done 619/5000. Dataloading: 0.0007 s/iter. Inference: 0.0405 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:03:00
[08/26 12:26:51 d2.evaluation.evaluator]: Inference done 741/5000. Dataloading: 0.0007 s/iter. Inference: 0.0405 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:02:55
[08/26 12:26:56 d2.evaluation.evaluator]: Inference done 864/5000. Dataloading: 0.0007 s/iter. Inference: 0.0404 s/iter. Eval: 0.0000 s/iter. Total: 0.0412 s/iter. ETA=0:02:50
[08/26 12:27:01 d2.evaluation.evaluator]: Inference done 985/5000. Dataloading: 0.0007 s/iter. Inference: 0.0404 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:02:45
[08/26 12:27:06 d2.evaluation.evaluator]: Inference done 1107/5000. Dataloading: 0.0007 s/iter. Inference: 0.0404 s/iter. Eval: 0.0000 s/iter. Total: 0.0412 s/iter. ETA=0:02:40
[08/26 12:27:11 d2.evaluation.evaluator]: Inference done 1227/5000. Dataloading: 0.0008 s/iter. Inference: 0.0405 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:02:35
[08/26 12:27:16 d2.evaluation.evaluator]: Inference done 1343/5000. Dataloading: 0.0008 s/iter. Inference: 0.0407 s/iter. Eval: 0.0000 s/iter. Total: 0.0415 s/iter. ETA=0:02:31
[08/26 12:27:21 d2.evaluation.evaluator]: Inference done 1467/5000. Dataloading: 0.0008 s/iter. Inference: 0.0406 s/iter. Eval: 0.0000 s/iter. Total: 0.0414 s/iter. ETA=0:02:26
[08/26 12:27:26 d2.evaluation.evaluator]: Inference done 1592/5000. Dataloading: 0.0008 s/iter. Inference: 0.0405 s/iter. Eval: 0.0000 s/iter. Total: 0.0413 s/iter. ETA=0:02:20
[08/26 12:27:31 d2.evaluation.evaluator]: Inference done 1717/5000. Dataloading: 0.0008 s/iter. Inference: 0.0404 s/iter. Eval: 0.0000 s/iter. Total: 0.0412 s/iter. ETA=0:02:15
[08/26 12:27:36 d2.evaluation.evaluator]: Inference done 1842/5000. Dataloading: 0.0008 s/iter. Inference: 0.0403 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:02:09
[08/26 12:27:41 d2.evaluation.evaluator]: Inference done 1967/5000. Dataloading: 0.0008 s/iter. Inference: 0.0403 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:02:04
[08/26 12:27:46 d2.evaluation.evaluator]: Inference done 2092/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:59
[08/26 12:27:51 d2.evaluation.evaluator]: Inference done 2216/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:54
[08/26 12:27:56 d2.evaluation.evaluator]: Inference done 2335/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:49
[08/26 12:28:01 d2.evaluation.evaluator]: Inference done 2457/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:44
[08/26 12:28:06 d2.evaluation.evaluator]: Inference done 2582/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:39
[08/26 12:28:11 d2.evaluation.evaluator]: Inference done 2703/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:01:34
[08/26 12:28:16 d2.evaluation.evaluator]: Inference done 2824/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:29
[08/26 12:28:21 d2.evaluation.evaluator]: Inference done 2946/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:24
[08/26 12:28:26 d2.evaluation.evaluator]: Inference done 3067/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:19
[08/26 12:28:31 d2.evaluation.evaluator]: Inference done 3189/5000. Dataloading: 0.0008 s/iter. Inference: 0.0403 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:14
[08/26 12:28:36 d2.evaluation.evaluator]: Inference done 3312/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:09
[08/26 12:28:41 d2.evaluation.evaluator]: Inference done 3435/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0411 s/iter. ETA=0:01:04
[08/26 12:28:46 d2.evaluation.evaluator]: Inference done 3559/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:59
[08/26 12:28:51 d2.evaluation.evaluator]: Inference done 3681/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:54
[08/26 12:28:56 d2.evaluation.evaluator]: Inference done 3804/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:49
[08/26 12:29:01 d2.evaluation.evaluator]: Inference done 3928/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:43
[08/26 12:29:06 d2.evaluation.evaluator]: Inference done 4052/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:38
[08/26 12:29:11 d2.evaluation.evaluator]: Inference done 4175/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:33
[08/26 12:29:16 d2.evaluation.evaluator]: Inference done 4299/5000. Dataloading: 0.0008 s/iter. Inference: 0.0402 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:28
[08/26 12:29:21 d2.evaluation.evaluator]: Inference done 4423/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:23
[08/26 12:29:27 d2.evaluation.evaluator]: Inference done 4547/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:18
[08/26 12:29:32 d2.evaluation.evaluator]: Inference done 4670/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:13
[08/26 12:29:37 d2.evaluation.evaluator]: Inference done 4792/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:08
[08/26 12:29:42 d2.evaluation.evaluator]: Inference done 4912/5000. Dataloading: 0.0008 s/iter. Inference: 0.0401 s/iter. Eval: 0.0000 s/iter. Total: 0.0410 s/iter. ETA=0:00:03
[08/26 12:29:45 d2.evaluation.evaluator]: Total inference time: 0:03:24.816953 (0.041004 s / iter per device, on 1 devices)
[08/26 12:29:45 d2.evaluation.evaluator]: Total inference pure compute time: 0:03:20 (0.040161 s / iter per device, on 1 devices)
[08/26 12:29:45 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[08/26 12:29:45 d2.evaluation.coco_evaluation]: Saving results to ./output/inference/coco_instances_results.json
[08/26 12:29:45 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
WARNING [08/26 12:29:45 d2.evaluation.coco_evaluation]: No predictions from the model!
[08/26 12:29:45 d2.engine.defaults]: Evaluation results for pig_coco_test in csv format:
[08/26 12:29:45 d2.evaluation.testing]: copypaste: Task: bbox
[08/26 12:29:45 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[08/26 12:29:45 d2.evaluation.testing]: copypaste: nan,nan,nan,nan,nan,nan

Process finished with exit code 0
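A hedged observation, not from the original thread: the config above still has NUM_CLASSES: 80 and loads retinanet_swint_S_3x.pth into a Swin-T config, either of which could leave the head producing no usable predictions on a custom dataset. A sketch of the kind of overrides worth double-checking (the class count and weight path are placeholders for this pig dataset):

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
cfg.merge_from_file("./configs/SwinT/retinanet_swint_T_FPN_3x.yaml")
# Match the class count to the custom annotations (placeholder value).
cfg.MODEL.RETINANET.NUM_CLASSES = 1
# Use weights that correspond to the chosen backbone size (placeholder path).
cfg.MODEL.WEIGHTS = "weights/retinanet_swint_T_3x.pth"
cfg.DATASETS.TRAIN = ("pig_coco_train",)
cfg.DATASETS.TEST = ("pig_coco_test",)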

resnet related configuration in the yaml file

Hello, I would like to ask why there are ResNet-related parameters in mask_rcnn_swint_T_FPN_3x.yaml. I thought the Swin Transformer replaces the original ResNet as the backbone network, so there should be no ResNet-related configuration in the yaml file. Looking forward to your answer! (screenshot omitted)

GPU requirement

Hi, I am wondering how many GPUs you have used for reproducing the results. Is this model resource-intensive?

How do I run this?

Can your code be run on its own, or does it need to be added into the official detectron2 code base before running?

RGB order and pixel mean/std?

Hi,
Thanks for your work.
I have a question about this project.
Do we need to change the RGB format, PIXEL_MEAN, and PIXEL_STD in the configuration to keep consistency with the original SwinTransformer?
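For reference (not an authoritative answer), the retinanet_swint_T_FPN_3x.yaml shown further up this page already switches to RGB input with ImageNet statistics; a hedged sketch of the equivalent overrides in Python, using the values from that config:

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
# Values copied from the RetinaNet config shown above: RGB order with ImageNet mean/std.
cfg.INPUT.FORMAT = "RGB"
cfg.MODEL.PIXEL_MEAN = [123.675, 116.28, 103.53]
cfg.MODEL.PIXEL_STD = [58.395, 57.12, 57.375]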

license

Hi, thank you for your nice work. I can confirm it works and yields nice results even when trained on a custom dataset.

Can you please add a license to the repository? E.g., MIT. It is necessary in order to use the source code correctly and not 'steal' it. Thank you :)

Pretrained model

Hi, I am using SwinT with Mask R-CNN and everything compiled and was easy to set up.
However, when I use the pretrained weights (mask_rcnn_swint_T_coco17.pth) to fine-tune the model on my custom dataset, I can't seem to achieve over 35 box AP. With ResNet I usually achieve over 60 box AP. Are the pretrained weights the ones trained on COCO or on ImageNet? Any tips on what could cause the AP difference?

Thank you

How to change the backbone to custom transformer model DiT?

I want to fine-tune DiT for object detection (text and diagram detection only) on my own dataset. I have been searching the web for quite some time but could not find anything on fine-tuning a Transformer backbone for object detection.

  1. I know how to fine tune Detectron 2 for an object detection task with the default given configuration yaml files using Faster RCNN / Masked RCNN models with Resnet or any other backbone CNN models but I don't know how to do it with Transformers models.

  2. This GitHub issue on DETR with a custom backbone describes how to change the backbone; the author said you can use ANY model from the timm library (there are almost 890 models available), but unfortunately DiT is not among them.

  3. DiT is also present as a HuggingFace model and supports Feature Extraction as BeitFeatureExtractor.from_pretrained("microsoft/dit-large") so I think it could be used as a backbone but I found nothing on this one either.

I tried changing the code on how to train DETR on custom data by replacing code in Cell 8,

#feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")

feature_extractor = BeitFeatureExtractor.from_pretrained("microsoft/dit-large")

but while running the code for Cell 11,

from torch.utils.data import DataLoader

def collate_fn(batch):
  pixel_values = [item[0] for item in batch]
  encoding = feature_extractor.pad_and_create_pixel_mask(pixel_values, return_tensors="pt")
  labels = [item[1] for item in batch]
  batch = {}
  batch['pixel_values'] = encoding['pixel_values']
  batch['pixel_mask'] = encoding['pixel_mask']
  batch['labels'] = labels
  return batch

train_dataloader = DataLoader(train_dataset, collate_fn=collate_fn, batch_size=4, shuffle=True)
val_dataloader = DataLoader(val_dataset, collate_fn=collate_fn, batch_size=2)
batch = next(iter(train_dataloader))

it gave me error as:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-446d81c845dd> in <module>
     13 train_dataloader = DataLoader(train_dataset, collate_fn=collate_fn, batch_size=4, shuffle=True)
     14 val_dataloader = DataLoader(val_dataset, collate_fn=collate_fn, batch_size=2)
---> 15 batch = next(iter(train_dataloader))

5 frames
/usr/local/lib/python3.7/dist-packages/transformers/feature_extraction_utils.py in __getitem__(self, item)
     85         """
     86         if isinstance(item, str):
---> 87             return self.data[item]
     88         else:
     89             raise KeyError("Indexing with integers is not available when using Python based feature extractors")

KeyError: 'labels'

Can someone please help me with the problem at hand?

ANY architecture like Faster RCNN, DETR etc and ANY repo or platform like Detectron 2, PaddleDetection, MMDetection, HuggingFace, EfficientDet would do.

Instance Segmentation

Thank you for the awesome project. Can the Mask R-CNN model be trained for instance segmentation?

cfg = get_cfg()
add_swint_config(cfg)
cfg.MODEL.WEIGHTS = "/content/drive/MyDrive/ybigta1/mask_rcnn_swint_T_coco17.pth"

I have written it as above, but the training does not give mask mAP.
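A hedged note, not an official answer: the snippet above never merges a Mask R-CNN config, so MODEL.MASK_ON stays at its default (false in the config dump earlier on this page) and no mask head is trained. A sketch of the usual setup, assuming the repo's Mask R-CNN config lives under configs/SwinT/ (the dataset name is a placeholder and must carry segmentation annotations):

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
# Assumed path; this config is expected to set MASK_ON and the Swin-T backbone.
cfg.merge_from_file("configs/SwinT/mask_rcnn_swint_T_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "/content/drive/MyDrive/ybigta1/mask_rcnn_swint_T_coco17.pth"
cfg.DATASETS.TRAIN = ("my_instance_seg_train",)   # placeholder dataset with mask annotations
assert cfg.MODEL.MASK_ON, "mask mAP is only reported when MASK_ON is True"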

Images random resized

Thanks for your work!

I want to know whether the image size can be randomly resized during training.
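Not from the original thread, but the config dump earlier on this page shows that detectron2 already performs multi-scale resizing through INPUT.MIN_SIZE_TRAIN; a hedged sketch of how that is typically controlled:

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
# At train time the shorter image side is picked at random from this list ("choice"),
# or sampled uniformly between two values when MIN_SIZE_TRAIN_SAMPLING is "range".
cfg.INPUT.MIN_SIZE_TRAIN = (640, 672, 704, 736, 768, 800)
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
cfg.INPUT.MAX_SIZE_TRAIN = 1333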

Where do you define the number of training steps &/ epochs?

Hey,

Thanks a lot for the repo. It is fairly easy to use and get it to work.

However, I am having trouble figuring out how I can set the number of training iterations or epochs. I am only using 275 training images, but the interface shows more than a day of training time, which I assume is abnormally high.

Thanks
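Not from the original thread: detectron2 measures training length in iterations rather than epochs, so the relevant settings are the SOLVER keys that also appear in the configs above; a hedged sketch with placeholder numbers for a small 275-image dataset:

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
add_swint_config(cfg)
cfg.SOLVER.IMS_PER_BATCH = 4      # images consumed per iteration (across all GPUs)
# epochs ≈ MAX_ITER * IMS_PER_BATCH / number_of_training_images
cfg.SOLVER.MAX_ITER = 3000        # placeholder: roughly 40+ passes over 275 images
cfg.SOLVER.STEPS = (2000, 2500)   # learning-rate decay points, must stay below MAX_ITER
cfg.TEST.EVAL_PERIOD = 500        # run evaluation every 500 iterations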

I want to check whether the model has trained well, so I run inference, but cfg.merge_from_file raises the MODEL.SWINT non-existent config key error again

from detectron2.utils.visualizer import ColorMode
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import json
import pandas as pd
from random import randint
import torch, torchvision

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

import numpy as np
import cv2
import random
import matplotlib.pyplot as plt

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog

from swint import add_swint_config

from detectron2.data.datasets import register_coco_instances
register_coco_instances("data_train", {}, "/home/sangjoon/detectron2/sangjoon/for_newthing_0331/white_train.json", "/home/sangjoon/detectron2/sangjoon/white_train2020")
register_coco_instances("data_val", {}, "/home/sangjoon/detectron2/sangjoon/for_newthing_0331/white_test.json", "/home/sangjoon/detectron2/sangjoon/white_test2020")

import os
import numpy as np
import json
from detectron2.structures import BoxMode
import itertools

from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

cfg = get_cfg()

cfg.merge_from_file("/home/sangjoon/detectron2/configs/COCO-Detection/faster_rcnn_swint_T_FPN_3x_.yaml")
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = "/home/sangjoon/SwinT_detection2/real_white_weights/model_0015499.pth" # initialize from model zoo

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # set the testing threshold for this model
predictor = DefaultPredictor(cfg)


KeyError Traceback (most recent call last)
in
1 cfg = get_cfg()
2 from swint import add_swint_config
----> 3 cfg.merge_from_file("/home/sangjoon/detectron2/configs/COCO-Detection/faster_rcnn_swint_T_FPN_3x_.yaml")
4 # cfg.DATALOADER.NUM_WORKERS = 4
5 cfg.MODEL.WEIGHTS = "/home/sangjoon/SwinT_detection2/real_white_weights/model_0015499.pth" # initialize from model zoo

~/detectron2/detectron2/config/config.py in merge_from_file(self, cfg_filename, allow_unsafe)
52
53 if loaded_ver == self.VERSION:
---> 54 self.merge_from_other_cfg(loaded_cfg)
55 else:
56 # compat.py needs to import CfgNode

~/.conda/envs/mmdetection/lib/python3.7/site-packages/fvcore/common/config.py in merge_from_other_cfg(self, cfg_other)
121 BASE_KEY not in cfg_other
122 ), "The reserved key '{}' can only be used in files!".format(BASE_KEY)
--> 123 return super().merge_from_other_cfg(cfg_other)
124
125 def merge_from_list(self, cfg_list: List[str]) -> Callable[[], None]:

~/.conda/envs/mmdetection/lib/python3.7/site-packages/yacs/config.py in merge_from_other_cfg(self, cfg_other)
215 def merge_from_other_cfg(self, cfg_other):
216 """Merge cfg_other into this CfgNode."""
--> 217 _merge_a_into_b(cfg_other, self, self, [])
218
219 def merge_from_list(self, cfg_list):

~/.conda/envs/mmdetection/lib/python3.7/site-packages/yacs/config.py in _merge_a_into_b(a, b, root, key_list)
476 if isinstance(v, CfgNode):
477 try:
--> 478 _merge_a_into_b(v, b[k], root, key_list + [k])
479 except BaseException:
480 raise

~/.conda/envs/mmdetection/lib/python3.7/site-packages/yacs/config.py in _merge_a_into_b(a, b, root, key_list)
489 root.raise_key_rename_error(full_key)
490 else:
--> 491 raise KeyError("Non-existent config key: {}".format(full_key))
492
493

KeyError: 'Non-existent config key: MODEL.SWINT'

How do I fix it??
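A hedged reading of the traceback, not an official answer: add_swint_config is imported above but never called on this cfg, so the MODEL.SWINT keys are not registered before merge_from_file runs. A sketch of the usual ordering, reusing the paths from the snippet above:

from detectron2.config import get_cfg
from swint import add_swint_config

cfg = get_cfg()
# Register the MODEL.SWINT keys *before* merging a SwinT config file,
# otherwise yacs raises "Non-existent config key: MODEL.SWINT".
add_swint_config(cfg)
cfg.merge_from_file("/home/sangjoon/detectron2/configs/COCO-Detection/faster_rcnn_swint_T_FPN_3x_.yaml")
cfg.MODEL.WEIGHTS = "/home/sangjoon/SwinT_detection2/real_white_weights/model_0015499.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5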

Faster R-CNN training AP is nan

xiaohu, thank you very much for your great work.
I used this repo to train Faster R-CNN, and after tens of thousands of training steps the AP is nan. I also used your faster_rcnn_swint_T.pth as the pre-trained weights. I have been using detectron2 for a long time and am fairly familiar with the framework, but I cannot find the cause. Please advise!

Thanks!

Training RetinaNet reports: "This error indicates that your module has parameters that were not used in producing loss."

Hi, when I train RetinaNet with this code and set the FPN input features to ["stage2", "stage3", "stage4", "stage5"], the following error is raised.
The configuration is as follows: (screenshot omitted)

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.
This error indicates that your module has parameters that were not used in producing loss.
You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel;
(2) making sure all forward function outputs participate in calculating loss.
If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function.

The message suggests that some part of the model does not contribute to the loss during training. Following suggestion (1) and adding find_unused_parameters=True lets the code run normally, but I have not been able to locate which part of the model causes this. Could you help identify which part of the model is responsible?
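Not from the original thread: the workaround named in the error message is the find_unused_parameters flag of PyTorch's DistributedDataParallel; a hedged sketch of what enabling it looks like when wrapping a model by hand (detectron2's default trainer creates this wrapper internally, so in practice the flag has to be passed wherever the trainer builds it):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_for_ddp(model: torch.nn.Module, device_id: int) -> DDP:
    # find_unused_parameters=True lets DDP tolerate parameters that receive no
    # gradient in an iteration (e.g. a backbone stage whose output never reaches
    # the loss), at the cost of extra bookkeeping per step.
    return DDP(
        model.to(device_id),
        device_ids=[device_id],
        find_unused_parameters=True,
    )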
