
X-Paste, ICML 2023

This repo is the official implementation of "X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion".

Introduction

X-Paste builds upon Copy-Paste to train instance segmentation models, but aims to make Copy-Paste more scalable, i.e., to obtain large-scale object instances with high-quality masks for unlimited categories in an efficient and automatic way.

Requirements

pip install -r requirements.txt

Download the COCO and LVIS datasets and place them under $DETECTRON2_DATASETS, following the Detectron2 builtin-dataset layout.
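As a sketch, the expected skeleton under $DETECTRON2_DATASETS looks like the following (this assumes Detectron2's standard builtin-dataset convention; LVIS reuses the COCO images):

```shell
# Create the directory skeleton Detectron2 expects (adjust the root to your setup).
export DETECTRON2_DATASETS="${DETECTRON2_DATASETS:-./datasets}"
mkdir -p "$DETECTRON2_DATASETS/coco/annotations" \
         "$DETECTRON2_DATASETS/coco/train2017" \
         "$DETECTRON2_DATASETS/coco/val2017" \
         "$DETECTRON2_DATASETS/lvis"
# LVIS annotation jsons (lvis_v1_train.json, lvis_v1_val.json) go in lvis/;
# the images themselves are the COCO train2017/val2017 images.
```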

Download pretrained backbone

mkdir models
cd models
wget https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/resnet50_miil_21k.pth
python tools/convert-thirdparty-pretrained-model-to-d2.py --path resnet50_miil_21k.pth

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22k.pth
python tools/convert-thirdparty-pretrained-model-to-d2.py --path swin_base_patch4_window7_224_22k.pth
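Both conversions produce a Detectron2-style checkpoint. As a rough sketch of what tools/convert-thirdparty-pretrained-model-to-d2.py does (the function name and exact metadata keys here are assumptions; Detectron2's checkpointer reads a pickle holding a "model" dict of numpy arrays):

```python
import pickle
import numpy as np

def convert_to_d2(state_dict, out_path):
    # Wrap a plain {name: tensor/array} state dict in the pickle format
    # Detectron2's DetectionCheckpointer can load. "matching_heuristics"
    # tells Detectron2 to fuzzy-match third-party parameter names.
    model = {k: np.asarray(v) for k, v in state_dict.items()}
    payload = {"model": model,
               "__author__": "third_party",
               "matching_heuristics": True}
    with open(out_path, "wb") as f:
        pickle.dump(payload, f)
    return payload
```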

Getting Started

  1. Generate images with Stable Diffusion: generation/text2im.py
cd generation
pip install -U diffusers transformers xformers
python text2im.py --model diffusers --samples 100 --category_file /mnt/data/LVIS/lvis_v1_train.json --output_dir /mnt/data/LVIS_gen_FG
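Under the hood, text2im.py reads the LVIS category list from the json given via --category_file and builds one prompt per category. A minimal sketch (the prefix "a photo of a single " is the one the script uses; the underscore cleanup is an illustrative assumption):

```python
import json

def lvis_prompts(category_json, prefix="a photo of a single "):
    # LVIS annotation files store categories as
    # [{"id": 1, "name": "aerosol_can", ...}, ...]; underscores in the
    # names read better as spaces inside a text prompt.
    data = json.loads(category_json)
    return [prefix + c["name"].replace("_", " ") for c in data["categories"]]
```

For example, a category named "aerosol_can" becomes the prompt "a photo of a single aerosol can".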
  2. Segment foreground objects: segment_methods/reseg.py
cd segment_methods

## For each segmentation method, manually download its pretrained model and edit the model path in export.py

python reseg.py --input_dir /mnt/data/LVIS_gen_FG --output_dir /mnt/data/LVIS_gen_FG_segs/ --seg_method clipseg
python reseg.py --input_dir /mnt/data/LVIS_gen_FG --output_dir /mnt/data/LVIS_gen_FG_segs/ --seg_method UFO
python reseg.py --input_dir /mnt/data/LVIS_gen_FG --output_dir /mnt/data/LVIS_gen_FG_segs/ --seg_method U2Net
python reseg.py --input_dir /mnt/data/LVIS_gen_FG --output_dir /mnt/data/LVIS_gen_FG_segs/ --seg_method selfreformer
  3. Filter objects and create the object pool: segment_methods/clean_pool.py
cd segment_methods

python clean_pool.py --input_dir /mnt/data/LVIS_gen_FG_segs/ --image_dir /mnt/data/LVIS_gen_FG --output_file /mnt/data/LVIS_instance_pools.json --min_clip 21 --min_area 0.05 --max_area 0.95 --tolerance 1
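The flags map onto a simple keep/drop rule per instance. A minimal sketch of the filtering logic those thresholds imply (the function name and exact comparisons are assumptions):

```python
def keep_instance(clip_score, mask_area, image_area,
                  min_clip=21.0, min_area=0.05, max_area=0.95):
    # Drop an instance when CLIP thinks the crop doesn't match its
    # category (low score), or when the foreground mask covers almost
    # none / almost all of the generated image.
    ratio = mask_area / image_area
    return clip_score >= min_clip and min_area <= ratio <= max_area
```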

  4. Train the network
## Edit INST_POOL_PATH in the config file to point to your instance pool json
bash launch.sh --config-file configs/Xpaste_swinL.yaml

  5. Run the demo
python demo.py --config-file configs/Xpaste_swinL.yaml --input example.jpg --output annotated.jpg --opts MODEL.WEIGHTS Xpaste_swinL_final.pth

Qualitative results of X-Paste and the baseline on the LVIS test set. Left: X-Paste; right: baseline (Swin-L).

Models (LVIS dataset)

| Backbone | Method | $AP^{box}$ | $AP^{mask}$ | $AP_r^{box}$ | $AP_r^{mask}$ | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet50 | baseline | 34.5 | 30.8 | 24.0 | 21.6 | model |
| ResNet50 | X-Paste | 37.4 | 33.2 | 33.9 | 29.7 | model |
| Swin-L | baseline | 47.5 | 42.3 | 41.4 | 36.8 | model |
| Swin-L | X-Paste | 50.9 | 45.4 | 48.7 | 43.8 | model |

Acknowledgements

We use code from Detic, CenterNet2, and Detectron2.

License

The majority of X-Paste is licensed under the Apache 2.0 license. Portions of the project are available under separate license terms: Swin-Transformer, CLIP, CLIPSeg, UFO, and the TensorFlow Object Detection API are licensed under the MIT license; UniDet, U2Net, and Detic are licensed under the Apache 2.0 license; SelfReformer is licensed under the BSD 3-Clause license; Stable Diffusion is licensed under the CreativeML Open RAIL-M license; and the LVIS API is licensed under a custom license. If you add other third-party code, please keep this license information up to date.

Citation

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

@inproceedings{Zhao2022XPasteRC,
  title={X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion},
  author={Hanqing Zhao and Dianmo Sheng and Jianmin Bao and Dongdong Chen and Dong Chen and Fang Wen and Lu Yuan and Ce Liu and Wenbo Zhou and Qi Chu and Weiming Zhang and Nenghai Yu},
  booktitle={International Conference on Machine Learning},
  year={2023}
}

xpaste's People

Contributors: yoctta

xpaste's Issues

[Question] Instance Image Size?

Thank you very much for your good research. I have a question while reading your paper. When you create an instance using StableDiffusion, don't you filter based on the size of the created instance (e.g. area)? Intuitively, I think it won't help if the created instance is too small.

Memory leak in CenterNet?

Hi,
and thank you for releasing your code.

I am trying to replicate your results, and when training the detector I get an out of memory error.
Specifically, it seems that the code logs increasing memory usage.

I think the issue might be in CenterNet. If I replace the model with a Resnet50-MaskRCNN provided by detectron2, I do not observe it.

Did you ever experience this?
And more generally, how many GPUs did you use, and how much memory was needed to train your models?

Bugs when using --resume in text2im.py

if args.resume:
    old = os.listdir(PATH)
    cls_names = [j['name'] for j in target_class]
    for i in cls_names:
        try:
            _ = cv2.imread(os.path.join(PATH, f"{i}{args.samples-1}.png"))
            cls_names.remove(i)
            print(f"skipping {i}")
        except:
Iterating with "for i in cls_names" while calling "cls_names.remove(i)" mutates the list during iteration, so some classes are incorrectly skipped.
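A sketch of one possible fix: iterate over a copy of the list, and check file existence instead of relying on the read failing (the function name follows the snippet above only loosely and is hypothetical):

```python
import os

def skip_finished(path, cls_names, samples):
    # Iterate over a *copy* of cls_names: removing from the list that is
    # being iterated would skip the element right after each removal.
    skipped = []
    for name in list(cls_names):
        last_image = os.path.join(path, f"{name}{samples - 1}.png")
        if os.path.exists(last_image):
            cls_names.remove(name)
            skipped.append(name)
    return skipped
```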

Training base model and X-Paste, performances

Hi,
and thank you for sharing your code.

I am trying to reproduce results in the paper and I'm having issues with the following two configurations:

Base Model

I am submitting the training script with the config file configs/Base-C2_L_R5021k_640b64_4x.yaml to train the base CenterNetv2 (ResNet50 backbone). However, the results I get are slightly worse than expected. Is this the right way to train it?

| Model | Source | $AP^{box}$ | $AP^{mask}$ | $AP_r^{box}$ | $AP_r^{mask}$ |
| --- | --- | --- | --- | --- | --- |
| baseline | checkpoint | 34.5 | 30.8 | 24.0 | 21.6 |
| baseline | retraining | 34.0 | 30.0 | 21.7 | 18.4 |

X-Paste model
The same happens when I train the X-Paste model using the configs/Xpaste_R50.yaml config. Here the differences are more evident.

| Model | Source | $AP^{box}$ | $AP^{mask}$ | $AP_r^{box}$ | $AP_r^{mask}$ |
| --- | --- | --- | --- | --- | --- |
| XPaste | checkpoint | 37.4 | 33.2 | 33.9 | 29.7 |
| XPaste | retraining | 34.5 | 30.4 | 24.2 | 21.8 |

What am I missing?

Thanks for help,
D

Total execution time for the code

Thank you very much for your code! I was trying to run it on 4 NVIDIA GeForce RTX 3090s. This is the ETA that it shows while running the code (I'm using Resnet backbone, and have not modified anything):

Baseline: 34 hours
Xpaste: 8 days 5 hours!

Is this normal? Could you please tell me how long the code took to run when you were running it?
Thank you!

The config of only using copypaste

Hi, @yoctta.
Thanks for your work on XPaste.

How should I modify the config file if I want to use only copy-paste? I set INPUT.USE_COPY_METHOD to self_copy, but I am not sure whether INPUT.INST_POOL should be false or true.

The modified config file is:

_BASE_: "./Base-C2_L_R5021k_640b64_4x.yaml"
MODEL:
  ROI_BOX_HEAD:
    USE_ZEROSHOT_CLS: false
    FED_LOSS_FREQ_WEIGHT: 0.5
  WEIGHTS: "models/swin_large_patch4_window12_384_22k.pkl"
  BACKBONE:
    NAME: build_swintransformer_fpn_backbone
  SWIN:
    SIZE: L-22k-384
  FPN:
    IN_FEATURES: ["swin1", "swin2", "swin3"]
SOLVER:
  MAX_ITER: 180000
  CHECKPOINT_PERIOD: 10000
  IMS_PER_BATCH: 16
  BASE_LR: 0.0001
  MODEL_EMA: 0.999
DATASETS:
  TRAIN: ("lvis_v1_train",)
  TEST: ("lvis_v1_val",)

INPUT:
  INST_POOL: true
  INST_POOL_PATH: "/mnt/data/LVIS_instance_pools.json"
  INST_POOL_FORMAT: "RGBA"
  USE_COPY_METHOD: "self_copy"
  USE_INSTABOOST: false
  MASK_FORMAT: bitmask
  CP_METHOD: ['basic']
  RANDOM_ROTATE: false
  INST_POOL_SAMPLE_TYPE: "cas_random"
  TRAIN_SIZE: 896

How to reproduce results on COCO ?

As I didn't see a config for COCO, can I just change lvis_v1_train to coco_2017_train, as below?

DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
DATALOADER:
  # SAMPLER_TRAIN: "RepeatFactorTrainingSampler"
  # REPEAT_THRESHOLD: 0.001
  NUM_WORKERS: 8
TEST:
  DETECTIONS_PER_IMAGE: 100

The text prompts used for CLIP score are different

Hi, @yoctta.
Thanks for your work on XPaste.

In generation/text2im.py line 103, the prompt prefix is "a photo of a single ", but in segment_methods/reseg.py line 72 it is "a photo of ". Why do the two scripts use different text prompts?
