
X-Paste, ICML 2023

This repo is the official implementation of "X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion".

Introduction

X-Paste builds upon Copy-Paste to train instance segmentation models, but aims to make Copy-Paste more scalable, i.e., to obtain large-scale object instances with high-quality masks for unlimited categories in an efficient and automatic way.

Requirements

pip install -r requirements.txt

Download the COCO and LVIS datasets and place them under $DETECTRON2_DATASETS, following the Detectron2 builtin-dataset layout.
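As a sketch, the expected skeleton under $DETECTRON2_DATASETS looks like the following (this assumes Detectron2's standard builtin-dataset convention; LVIS reuses the COCO images):

```shell
# Create the directory skeleton Detectron2 expects (adjust the root to your setup).
export DETECTRON2_DATASETS="${DETECTRON2_DATASETS:-./datasets}"
mkdir -p "$DETECTRON2_DATASETS/coco/annotations" \
         "$DETECTRON2_DATASETS/coco/train2017" \
         "$DETECTRON2_DATASETS/coco/val2017" \
         "$DETECTRON2_DATASETS/lvis"
# LVIS annotation jsons (lvis_v1_train.json, lvis_v1_val.json) go in lvis/;
# the images themselves are the COCO train2017/val2017 images.
```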

Download pretrained backbone

mkdir models
cd models
wget https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/resnet50_miil_21k.pth
python tools/convert-thirdparty-pretrained-model-to-d2.py --path resnet50_miil_21k.pth

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22k.pth
python tools/convert-thirdparty-pretrained-model-to-d2.py --path swin_base_patch4_window7_224_22k.pth
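Both conversions produce a Detectron2-style checkpoint. As a rough sketch of what tools/convert-thirdparty-pretrained-model-to-d2.py does (the function name and exact metadata keys here are assumptions; Detectron2's checkpointer reads a pickle holding a "model" dict of numpy arrays):

```python
import pickle
import numpy as np

def convert_to_d2(state_dict, out_path):
    # Wrap a plain {name: tensor/array} state dict in the pickle format
    # Detectron2's DetectionCheckpointer can load. "matching_heuristics"
    # tells Detectron2 to fuzzy-match third-party parameter names.
    model = {k: np.asarray(v) for k, v in state_dict.items()}
    payload = {"model": model,
               "__author__": "third_party",
               "matching_heuristics": True}
    with open(out_path, "wb") as f:
        pickle.dump(payload, f)
    return payload
```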

Getting Started

  1. Generate images with Stable Diffusion: generation/text2im.py
cd generation
pip install -U diffusers transformers xformers
python text2im.py --model diffusers --samples 100 --category_file /mnt/data/LVIS/lvis_v1_train.json --output_dir /mnt/data/LVIS_gen_FG
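Under the hood, text2im.py reads the LVIS category list from the json given via --category_file and builds one prompt per category. A minimal sketch (the prefix "a photo of a single " is the one the script uses; the underscore cleanup is an illustrative assumption):

```python
import json

def lvis_prompts(category_json, prefix="a photo of a single "):
    # LVIS annotation files store categories as
    # [{"id": 1, "name": "aerosol_can", ...}, ...]; underscores in the
    # names read better as spaces inside a text prompt.
    data = json.loads(category_json)
    return [prefix + c["name"].replace("_", " ") for c in data["categories"]]
```

For example, a category named "aerosol_can" becomes the prompt "a photo of a single aerosol can".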
  2. Segment foreground objects: segment_methods/reseg.py
cd segment_methods

## For each segmentation method, manually download its pretrained model and edit the model path in export.py

python reseg.py --input_dir /mnt/data/LVIS_gen_FG --output_dir /mnt/data/LVIS_gen_FG_segs/ --seg_method clipseg
python reseg.py --input_dir /mnt/data/LVIS_gen_FG --output_dir /mnt/data/LVIS_gen_FG_segs/ --seg_method UFO
python reseg.py --input_dir /mnt/data/LVIS_gen_FG --output_dir /mnt/data/LVIS_gen_FG_segs/ --seg_method U2Net
python reseg.py --input_dir /mnt/data/LVIS_gen_FG --output_dir /mnt/data/LVIS_gen_FG_segs/ --seg_method selfreformer
  3. Filter objects and create the object pool: segment_methods/clean_pool.py
cd segment_methods

python clean_pool.py --input_dir /mnt/data/LVIS_gen_FG_segs/ --image_dir /mnt/data/LVIS_gen_FG --output_file /mnt/data/LVIS_instance_pools.json --min_clip 21 --min_area 0.05 --max_area 0.95 --tolerance 1
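The flags map onto a simple keep/drop rule per instance. A minimal sketch of the filtering logic those thresholds imply (the function name and exact comparisons are assumptions):

```python
def keep_instance(clip_score, mask_area, image_area,
                  min_clip=21.0, min_area=0.05, max_area=0.95):
    # Drop an instance when CLIP thinks the crop doesn't match its
    # category (low score), or when the foreground mask covers almost
    # none / almost all of the generated image.
    ratio = mask_area / image_area
    return clip_score >= min_clip and min_area <= ratio <= max_area
```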

  4. Train the network
## Edit INST_POOL_PATH in the config file to point to your instance pool json
bash launch.sh --config-file configs/Xpaste_swinL.yaml

  5. Run the demo
python demo.py --config-file configs/Xpaste_swinL.yaml --input example.jpg --output annotated.jpg --opts MODEL.WEIGHTS Xpaste_swinL_final.pth

Qualitative results of X-Paste and the baseline on the LVIS test set. Left: X-Paste; right: baseline (Swin-L).

Models (LVIS dataset)

| Backbone | Method | $AP^{box}$ | $AP^{mask}$ | $AP_r^{box}$ | $AP_r^{mask}$ | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet50 | baseline | 34.5 | 30.8 | 24.0 | 21.6 | model |
| ResNet50 | X-Paste | 37.4 | 33.2 | 33.9 | 29.7 | model |
| Swin-L | baseline | 47.5 | 42.3 | 41.4 | 36.8 | model |
| Swin-L | X-Paste | 50.9 | 45.4 | 48.7 | 43.8 | model |

Acknowledgements

We use code from Detic, CenterNet2, and Detectron2.

License

The majority of X-Paste is licensed under the Apache 2.0 license. Portions of the project are available under separate license terms: Swin-Transformer, CLIP, CLIPSeg, UFO, and the TensorFlow Object Detection API are licensed under the MIT license; UniDet, U2Net, and Detic are licensed under the Apache 2.0 license; SelfReformer is licensed under the BSD 3-Clause license; Stable Diffusion is licensed under the CreativeML Open RAIL-M license; and the LVIS API is licensed under a custom license. If you add other third-party code, please keep this license information up to date.

Citation

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

@inproceedings{Zhao2022XPasteRC,
  title={X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion},
  author={Hanqing Zhao and Dianmo Sheng and Jianmin Bao and Dongdong Chen and Dong Chen and Fang Wen and Lu Yuan and Ce Liu and Wenbo Zhou and Qi Chu and Weiming Zhang and Nenghai Yu},
  booktitle={International Conference on Machine Learning},
  year={2023}
}

xpaste's People

Contributors: yoctta

xpaste's Issues

[Question] Instance Image Size?

Thank you very much for your good research. I have a question while reading your paper. When you create an instance using StableDiffusion, don't you filter based on the size of the created instance (e.g. area)? Intuitively, I think it won't help if the created instance is too small.

Memory leak in CenterNet?

Hi,
and thank you for releasing your code.

I am trying to replicate your results, and when training the detector I get an out of memory error.
Specifically, it seems that the code logs increasing memory usage.

I think the issue might be in CenterNet. If I replace the model with a Resnet50-MaskRCNN provided by detectron2, I do not observe it.

Did you ever experience this?
And more generally, how many GPUs did you use, and how much memory was needed to train your models?

Bugs when using --resume in text2im.py

if args.resume:
    old = os.listdir(PATH)
    cls_names = [j['name'] for j in target_class]
    for i in cls_names:
        try:
            _ = cv2.imread(os.path.join(PATH, f"{i}{args.samples-1}.png"))
            cls_names.remove(i)
            print(f"skipping {i}")
        except:
Iterating with "for i in cls_names" while calling "cls_names.remove(i)" mutates the list during iteration, so some classes are incorrectly skipped.
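A sketch of one possible fix: iterate over a copy of the list, and check file existence instead of relying on the read failing (the function name follows the snippet above only loosely and is hypothetical):

```python
import os

def skip_finished(path, cls_names, samples):
    # Iterate over a *copy* of cls_names: removing from the list that is
    # being iterated would skip the element right after each removal.
    skipped = []
    for name in list(cls_names):
        last_image = os.path.join(path, f"{name}{samples - 1}.png")
        if os.path.exists(last_image):
            cls_names.remove(name)
            skipped.append(name)
    return skipped
```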

Training base model and X-Paste, performances

Hi,
and thank you for sharing your code.

I am trying to reproduce results in the paper and I'm having issues with the following two configurations:

Base Model

I am submitting the training script with the config file configs/Base-C2_L_R5021k_640b64_4x.yaml to train the base CenterNetv2 (ResNet50 backbone). However, the results I get are slightly worse than expected. Is this the right way to train it?

| Model | Source | $AP^{box}$ | $AP^{mask}$ | $AP_r^{box}$ | $AP_r^{mask}$ |
| --- | --- | --- | --- | --- | --- |
| baseline | checkpoint | 34.5 | 30.8 | 24.0 | 21.6 |
| baseline | retraining | 34.0 | 30.0 | 21.7 | 18.4 |

X-Paste model
The same happens when I train the X-Paste model using the configs/Xpaste_R50.yaml config. Here the differences are more evident.

| Model | Source | $AP^{box}$ | $AP^{mask}$ | $AP_r^{box}$ | $AP_r^{mask}$ |
| --- | --- | --- | --- | --- | --- |
| XPaste | checkpoint | 37.4 | 33.2 | 33.9 | 29.7 |
| XPaste | retraining | 34.5 | 30.4 | 24.2 | 21.8 |

What am I missing?

Thanks for help,
D

Total execution time for the code

Thank you very much for your code! I was trying to run it on 4 NVIDIA GeForce RTX 3090s. This is the ETA that it shows while running the code (I'm using Resnet backbone, and have not modified anything):

Baseline: 34 hours
Xpaste: 8 days 5 hours!

Is this normal? Could you please tell me how long the code took to run when you were running it?
Thank you!

The config of only using copypaste

Hi, @yoctta.
Thanks for your work on XPaste.

How should I modify the config file if I want to use only copy-paste? I set INPUT.USE_COPY_METHOD to self_copy, but I am not sure whether INPUT.INST_POOL should be false or true.

The modified config file is:

_BASE_: "./Base-C2_L_R5021k_640b64_4x.yaml"
MODEL:
  ROI_BOX_HEAD:
    USE_ZEROSHOT_CLS: false
    FED_LOSS_FREQ_WEIGHT: 0.5
  WEIGHTS: "models/swin_large_patch4_window12_384_22k.pkl"
  BACKBONE:
    NAME: build_swintransformer_fpn_backbone
  SWIN:
    SIZE: L-22k-384
  FPN:
    IN_FEATURES: ["swin1", "swin2", "swin3"]
SOLVER:
  MAX_ITER: 180000
  CHECKPOINT_PERIOD: 10000
  IMS_PER_BATCH: 16
  BASE_LR: 0.0001
  MODEL_EMA: 0.999
DATASETS:
  TRAIN: ("lvis_v1_train",)
  TEST: ("lvis_v1_val",)

INPUT:
  INST_POOL: true
  INST_POOL_PATH: "/mnt/data/LVIS_instance_pools.json"
  INST_POOL_FORMAT: "RGBA"
  USE_COPY_METHOD: "self_copy"
  USE_INSTABOOST: false
  MASK_FORMAT: bitmask
  CP_METHOD: ['basic']
  RANDOM_ROTATE: false
  INST_POOL_SAMPLE_TYPE: "cas_random"
  TRAIN_SIZE: 896

How to reproduce results on COCO ?

As I didn't see a config for COCO, can I just change lvis_v1_train to coco_2017_train, as below?

DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
DATALOADER:
  # SAMPLER_TRAIN: "RepeatFactorTrainingSampler"
  # REPEAT_THRESHOLD: 0.001
  NUM_WORKERS: 8
TEST:
  DETECTIONS_PER_IMAGE: 100

The text prompts used for CLIP score are different

Hi, @yoctta.
Thanks for your work on XPaste.

In generation/text2im.py line 103, the prompt prefix is "a photo of a single ", but in segment_methods/reseg.py line 72 it is "a photo of ". Why do the two scripts use different text prompts?
