
remote-sensing-rvsa's Introduction

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

Updates | Introduction | Statement

Current applications

Image Classification: Please see ViTAE-Transformer for image classification;

Object Detection: Please see ViTAE-Transformer for object detection;

Semantic Segmentation: Please see ViTAE-Transformer for semantic segmentation;

Animal Pose Estimation: Please see ViTAE-Transformer for animal pose estimation;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing: Please see ViTAE-Transformer for Remote Sensing;

Updates

09/04/2022

24/03/2022

  • The pretrained models for both ViTAE and ViTAEv2 are released. The code for downstream tasks is also provided for reference.

07/12/2021

  • The code is released!

19/10/2021

  • The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021

  • The paper is posted on arXiv! The code will be made publicly available once it is cleaned up.

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE stacks several reduction cells (RCs) and normal cells (NCs) to introduce scale invariance and locality into vision transformers. In ViTAEv2, we explore the use of window attention without shift operations to obtain a better balance between memory footprint, speed, and performance. We also stack the proposed RCs and NCs in a multi-stage manner to facilitate learning on other vision tasks, including detection, segmentation, and pose estimation.

Fig.1 - The details of RC and NC design in ViTAE.

Fig.2 - The multi-stage design of ViTAEv2.
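As a concrete illustration of the window attention described above, here is a minimal, hedged sketch (our own simplification, not the repository's implementation; class and helper names are assumptions): features are partitioned into fixed windows and self-attention runs independently inside each window, with no shift operation between blocks.

```python
import torch
import torch.nn as nn

def window_partition(x, wsz):
    """Split (B, H, W, C) features into (B * num_windows, wsz * wsz, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // wsz, wsz, W // wsz, wsz, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, wsz * wsz, C)

class PlainWindowAttention(nn.Module):
    """Window attention without shift: attention never crosses window borders."""
    def __init__(self, dim, num_heads, wsz=7):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.wsz = wsz

    def forward(self, x):  # x: (B, H, W, C); H and W must be divisible by wsz
        B, H, W, C = x.shape
        w = self.wsz
        win = window_partition(x, w)
        out, _ = self.attn(win, win, win)  # attention within each window only
        out = out.view(B, H // w, W // w, w, w, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

x = torch.randn(2, 14, 14, 96)
print(PlainWindowAttention(96, 4)(x).shape)  # torch.Size([2, 14, 14, 96])
```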

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.

Citing ViTAE and ViTAEv2

@article{xu2021vitae,
  title={ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}

Other Links

Image Classification: See ViTAE for Image Classification.

Object Detection: See ViTAE for Object Detection.

Semantic Segmentation: See ViTAE for Semantic Segmentation.

Animal Pose Estimation: See ViTAE for Animal Pose Estimation.

Matting: See ViTAE for Matting.

Remote Sensing: See ViTAE for Remote Sensing.

remote-sensing-rvsa's People

Contributors

dotwang

remote-sensing-rvsa's Issues

Questions about labels for the MillionAID dataset

Hello. This is very valuable work. I have a question: in the MillionAID dataset used in this paper, only 10,000 images are given classification labels, while the remaining 0.99 million images are not, unless I missed something. Where can I find the classification labels for the remaining 0.99 million images?

ORCNN required?

Attempting to use this with detectron2. Training metrics look good, but evaluation completely fails. Currently debugging, but is there any reason this couldn't work with a regular RPN and ROI heads instead of oriented ones?

Problem with calc_rel_pos_spatial in vit_win_rvsa_kvdiff_wsz7.py

q[:, :, sp_idx:] torch.Size([2, 12, 1280, 64])

r_q = q[:, :, sp_idx:].reshape(B, n_head, q_h, q_w, dim) # B, H, qwh, qww, C
RuntimeError: shape '[2, 12, 7, 7, 64]' is invalid for input of size 1966080

When I don't use RVSA and just use plain window attention, line 182 of vit_win_rvsa_kvdiff_wsz7.py raises the error above.

I found that this error occurs because the Q in QKV is not split into windows.
I hope this error can be corrected.
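For reference, a hedged sketch of the reported mismatch (tensor names and sizes are hypothetical, not the repository's code): the reshape inside calc_rel_pos_spatial expects the token axis of q to be exactly win_h * win_w, so q has to be window-partitioned first, folding the windows into the batch axis.

```python
import torch

B, n_head, dim, wsz = 2, 12, 64, 7
H = W = 70                                    # feature map side, divisible by wsz
q = torch.randn(B, n_head, H * W, dim)        # full sequence: 4900 tokens
# q.reshape(B, n_head, wsz, wsz, dim)         # would fail: 4900 != 7 * 7

# window-partition the token axis first, folding windows into the batch axis
q_win = (q.view(B, n_head, H // wsz, wsz, W // wsz, wsz, dim)
          .permute(0, 2, 4, 1, 3, 5, 6)
          .reshape(-1, n_head, wsz * wsz, dim))
r_q = q_win.reshape(-1, n_head, wsz, wsz, dim)  # now the reshape succeeds
print(r_q.shape)                                # torch.Size([200, 12, 7, 7, 64])
```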

mmcv version problem

Hello, mmcv has been upgraded to its latest version, but the code in your mmcv_custom folder is still written against an old mmcv version. Could you update the code?

[Question] When can I see the config files?

Thank you for your contribution to deep learning research in the remote sensing field.
I'm very happy to find your research on self-supervised learning in remote sensing.

But I want to know about your detection config files for the DOTA and DIOR datasets.
Regards,
Kevin Cha.

How to compute mIoU metrics from inference results

Hello, I want to compute mIoU and related metrics between the inference results and the ground truth. I use the weights pretrained on Potsdam, run inference on the Potsdam validation set, and then compute metrics with ../mmseg/core/evaluation/metrics.py, which gives {'aAcc': array(0.04249801), 'IoU': array([0.06536244, 0.00681113, 0.0036782 , 0.02882337, 0.00158446]), 'Acc': array([0.34386185, 0.01245181, 0.00750425, 0.05191632, 0.00163203])}. This seems far from the OA of 91.1 you reported on Potsdam, and I am not sure how to resolve it. Looking forward to your reply.
Here is my inference code:

```python
import os

import cv2
import torch
from mmseg.apis import init_segmentor, inference_segmentor
from mmseg.core.evaluation import eval_metrics

image_root = "/data/user5/potsdam/img_dir/val"
ann_root = "/data/user5/potsdam/ann_dir/val"
image_list = os.listdir(image_root)
device = "cuda" if torch.cuda.is_available() else "cpu"
config = "../configs/vit_base_win/upernet_vitae_nc_base_rvsa_v3_kvdiff_wsz7_512x512_160k_potsdam_rgb_dpr10_lr6e5_lrd90_ps16_class5_ignore5.py"
checkpoint = "../pretrain_model/potsdam/vitae_rvsa_kvdiff.pth"
seg_model = init_segmentor(config, checkpoint, device=device)
num_classes = 5
ignore_index = 5
results = []
labels = []
for image in image_list:
    image_path = os.path.join(image_root, image)
    label_path = os.path.join(ann_root, image)
    label = cv2.imread(label_path, 0)  # read the ground-truth mask as grayscale
    _, masks = inference_segmentor(seg_model, image_path)
    results.append(masks[0])
    labels.append(label)

ret_metrics = eval_metrics(results, labels, num_classes, ignore_index, metrics='mIoU')
print(ret_metrics)
```

Accuracy discrepancy when running inference with the potsdam_vitae_rvsa_kvidff weights

Hello, when I run inference with the potsdam_vitae_rvsa_kvidff.pth weights provided in the RVSA repository, the results differ slightly; the config file is unmodified except for data_root. My results are shown below.
[screenshot]
The log in the RVSA repository reports "aAcc": 0.9115, "mIoU": 0.8307, "mAcc": 0.9005, "mFscore": 0.9061, "mPrecision": 0.9124, "mRecall": 0.9005; all of my metrics differ by about 0.3%, and I am not sure whether the two results can be considered aligned.
My test set was produced by splitting '2_Ortho_RGB.zip' and '5_Labels_all.zip' with the official mmseg script, yielding 2016 test images of size 512x512. I suspect the test-set split is the cause, but inference with the rsp_r50 weights basically matches the reported accuracy. ViTAE-Transformer/RSP#15
I don't really understand what causes this; looking forward to your reply.

Pretrained Model Weights

Hello, could you upload a Baidu or Google Drive version of the pretrained model weights? The files shared on OneDrive can no longer be opened or downloaded, possibly due to reaching the sharing limit or instability of the Microsoft service.

KeyError: 'OrientedRCNN is not in the models registry'

[screenshot of the error]

I ran inference with the following command:

python image_demo.py demo/demo.jpg configs/obb/oriented_rcnn/vit_base_win/faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dota10_ms_lr1e-4_ldr75_dpr15.py checkpoints/vitae_rvsa_new.pth --device cpu

It uses the image_demo.py file from OBBDetection. I am using MMCV version 1.6.2 and Python version 3.9. How do I resolve this error?

Object detection training/evaluation results cannot be reproduced: OBBDetection's oriented_rcnn reproduces fine, but vitae-rvsa_dota10_ms trains and evaluates poorly

As the title says: following how ViTAE is set up for object detection, I first built the OBBDetection environment, prepared the DOTA-v1.0 dataset, and ran oriented_rcnn's faster_rcnn_orpn_r50_ms_rr_dota10.py for training and evaluation, obtaining the results below.
[screenshot: ResNet OBB results]
The model detects objects on DOTA-v1.0 well, so the OBBDetection environment is installed correctly.

I then moved the object detection files from remote-sensing-rvsa into the corresponding locations in the OBBDetection directory. When I tried to run faster_rcnn_orpn_our_rsp_vitae-nc-base-win-rvsa_v3_wsz7_fpn_1x_dota10_ms.py following the commands in this project's README, backbone/__init__.py reported errors about "Swin" and several other models not being found. After deleting those models from __init__.py, training and evaluation ran normally, but the results were as below,
[screenshot]
and do not reproduce the results in the paper.

1) Installing OBBDetection first and then copying this project's code files and models over should be the correct procedure. How should the model files missing from backbone/__init__.py be handled?
2) The project is fairly old now; could the reproduction steps in the README be made more detailed?
3) What might be causing this incorrect reproduction result?

Problems in the use of pre training model

Hello, first of all, thank you for your amazing results, and I am grateful that you provide the code!
After changing the network model and training with the pre-trained models (I tried both the ViT-base and ViTAE-base pre-trained models), many parameters cannot be matched, and the final mAP drops (below that of the unmodified network structure).
I wonder whether the decrease in mAP is caused by changing the network structure or by the missing pre-trained weight parameters. Is it necessary to redo pre-training on MillionAID after changing the network structure? Because of the large time and equipment cost of pre-training, I have not tried it.

Problem: MillionAID dataset

Dear authors, in the official download of the MillionAID dataset used in the paper, the train folder contains only 10k images (1.82 GB), which does not match the stated one million images. How many images were actually used for training and testing in the experiments?

[screenshot] Looking forward to your reply, thank you.

Model pth

Hi,

May I ask what are vit_rvsa.pth, vitae_rvsa.pth, vit_rvsa_kvdiff.pth, vitae_rvsa_kvdiff.pth?
At first, I thought the 'Model' column in each downstream task contained checkpoints for training the model on different datasets, but it turns out they are the same across tasks and datasets.

Thank you.

Clarification about installation for segmentation

Hi, thank you for an amazing model. I'm trying to install the model on top of a clean mmsegmentation installed from their repo, following the instruction to "put these files into corresponding folders".

I've put all of the files from the Semantic Segmentation folder into the root of my mmsegmentation installation. Should I run pip install -v -e . again after adding the new files from your repo to the mmsegmentation folder?

Also, please indicate where to put the files from Semantic Segmentation/mmcv_custom. Do we just leave them in the mmsegmentation folder root, or should we install mmcv from source?

I reproduced the code with an accuracy of only 54

The original code version is too old, so I ported the code to the new mmrotate version. I loaded the weights you provided and loading went fine, but the result was an accuracy of 68 on the validation set and 54 on the test set.

I don't know where the problem is: the weight file loads smoothly, and I have checked the configuration file parameters, but I just don't know what went wrong. If my port were broken, the final result should be 0; it should not be as high as 54.

dataset_type = 'DOTADataset'
data_root = '/data/facias/DOTA/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

angle_version = 'le90'
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version=angle_version),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(type='RRandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'train_split/labelTxt/',
        img_prefix=data_root + 'train_split/images/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'val_split/labelTxt/',
        img_prefix=data_root + 'val_split/images/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        test_mode=True,   # set to True if the test set has no annotations
        ann_file=data_root + 'test_split/images/',
        img_prefix=data_root + 'test_split/images/',
        pipeline=test_pipeline))

model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='ViT_Win_RVSA_V3_WSZ7',
        img_size=1024,
        embed_dim=768,
        depth=12,
        num_heads=12,
        mlp_ratio=4,
        qkv_bias=True,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.15,
        use_abs_pos_emb=True),
    neck=dict(
        type='FPN',
        in_channels=[768, 768, 768, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version=angle_version,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range=angle_version,
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range=angle_version,
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(.0, .0, .0, .0, .0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                ignore_iof_thr=-1),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(iou_thr=0.1),
            max_per_img=2000)))
# evaluation
evaluation = dict(interval=1, metric='mAP')
# optimizer
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)

# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
mp_start_method = 'fork'
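When a ported config trains but scores far below the reported numbers, one common culprit is the backbone weights silently failing to load (for example, key prefixes that no longer match after the port). Below is a hedged debugging sketch, assuming mmrotate/mmcv 1.x APIs, that the custom ViT backbone is registered, and with hypothetical file paths; it diffs the checkpoint keys against the model's state dict.

```python
import torch
from mmcv import Config
from mmrotate.models import build_detector  # assumes mmrotate is installed

cfg = Config.fromfile('my_rvsa_dota_config.py')   # hypothetical config path
model = build_detector(cfg.model)

ckpt = torch.load('vitae_rvsa.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)              # unwrap if wrapped

model_keys = set(model.state_dict())
ckpt_keys = set(state)
print('missing from checkpoint:', sorted(model_keys - ckpt_keys)[:20])
print('unexpected in checkpoint:', sorted(ckpt_keys - model_keys)[:20])
```

If the "missing" list covers most of the backbone, the weights never loaded and the model effectively trained from scratch, which would explain a large mAP gap.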

Angle transform problem for value in the code

sampling_angle_v = self.sampling_angles_v(x)
sampling_angle_v = sampling_angle_k.reshape(num_predict_total, 1, window_num_h, window_num_w)

Shouldn't this code be:

sampling_angle_v = self.sampling_angles_v(x)
sampling_angle_v = sampling_angle_v.reshape(num_predict_total, 1, window_num_h, window_num_w)

UCM dataset train/test split.

Could you please provide the txt file that splits the UCM dataset, or is there an official way to split the dataset?

LoveDA dataset training problem

The paper says that the train and val sets of LoveDA are combined for training, but the downloaded test set has no labels. How do you use the test set for evaluation?
[screenshot]

[resume error] loaded state dict has a different number of parameter groups

First of all, thanks for your great work! I have already reproduced the training on mmrotate (version 0.3.4).

  File "D:\Programs\Python\mmlabseries\lib\site-packages\torch\optim\optimizer.py", line 140, in load_state_dict
    raise ValueError("loaded state dict has a different number of "
ValueError: loaded state dict has a different number of parameter groups

When I try to resume training from the latest.pth model, I get this error. I debugged and found the mismatch shown below.
[screenshots]

How can I fix it?
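One hedged workaround (file paths hypothetical): if the saved optimizer state no longer matches the current parameter groups (for example, because the layer-wise decay constructor groups parameters differently), resume the model weights only, by stripping the optimizer state from the checkpoint and pointing load_from at the result instead of resume_from.

```python
import torch

ckpt = torch.load('latest.pth', map_location='cpu')
ckpt.pop('optimizer', None)                  # drop the mismatched optimizer state
torch.save(ckpt, 'latest_weights_only.pth')
# in the config: load_from = 'latest_weights_only.pth'; resume_from = None
# (the optimizer and LR schedule then restart, but the weights carry over)
```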

Pretrained models

Is there a pretrained model that can be used directly to extract features from remote sensing images? I would like to extract features and then fuse them with other features for a regression task of my own.

load pretrained backbone weights?

Hello, if possible, could you provide an example of how to load the pretrained weights of, for example, the ViTAE-B backbone from the model pretrained on the MillionAID dataset? (file: vitae-b-checkpoint-1599-transform-no-average.pth)

Also, can I expect the computed feature array to have size [1, 768] for an image of size [1, 3, 224, 224]?

Thank you very much
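A hedged inspection sketch (file name taken from the question; the key layout is an assumption, since pretraining checkpoints may wrap weights under 'model' or 'state_dict'): printing the keys shows which prefix to strip before loading, and the embedding shapes confirm the ViT-B width of 768; global-pooling the output tokens then yields a [1, 768] feature vector for a [1, 3, 224, 224] input.

```python
import torch

ckpt = torch.load('vitae-b-checkpoint-1599-transform-no-average.pth',
                  map_location='cpu')
state = ckpt.get('model', ckpt.get('state_dict', ckpt))  # unwrap common wrappers

for k, v in list(state.items())[:10]:
    print(k, tuple(v.shape))      # look for a 768-dim embedding (ViT-B width)

# after stripping any 'backbone.' prefix, load non-strictly into your backbone:
#   msg = backbone.load_state_dict(state, strict=False)
# then pool the output tokens, e.g. feats.mean(dim=1), to get a [1, 768] vector
```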

Pretrained models

Is there a backbone file for the MAE-pretrained ViT-B? I could only find the backbone files with RVSA already added.
Thanks!

Question about fine-tuning vit-base

I used the mmseg training code from https://github.com/ViTAE-Transformer/RSP to fine-tune vit-base and reproduce your results. While fine-tuning on Potsdam and inspecting the training log, I noticed that the intermediate test results differ from yours. Did you change any hyperparameters? P.S. I used exactly the environment you specified, and Potsdam was processed with the RSP version of mmseg. Below is part of my log:
[screenshot of my log]
and this is your log:
[screenshot of the authors' log]
The difference lies in the impervious_surface class. I suspect the reduce_zero_label parameter used for training, but training fails with an error after I change it, so I would like to ask which hyperparameters you changed, or what else the problem might be.

Inconsistent image sizes

Can this handle non-standard remote sensing images, or does every image have to be mapped to the same standard size first?
