
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion


Pancheng Zhao1,2 · Peng Xu3+ · Pengda Qin4 · Deng-Ping Fan2,1 · Zhicheng Zhang1,2 · Guoli Jia1 · Bowen Zhou3 · Jufeng Yang1,2

1 VCIP & TMCC & DISSec, College of Computer Science, Nankai University

2 Nankai International Advanced Research Institute (SHENZHEN· FUTIAN)

3 Department of Electronic Engineering, Tsinghua University · 4Alibaba Group

+corresponding authors

CVPR 2024

Paper PDF · Project Page

1. News

  • 🔥2024-07-15🔥: Fixed a misspelling in Fig. 2 and an error in Eq. 4. The latest version can be downloaded from arXiv.
  • 2024-04-13: Updated Fig. 3, including the computational flow of $\tilde{\mathrm{c}}^f$ and some of the variable names. The latest version can be downloaded from arXiv (after 16 Apr 2024 00:00:00 GMT).
  • 2024-04-13: Full code, dataset, and model weights have been released!
  • 2024-04-03: The preprint is now available on arXiv.
  • 2024-03-17: Basic code uploaded. Data, checkpoints, and more code will come soon ...
  • 2024-03-11: Created the repository. The code will come soon ...
  • 2024-02-27: LAKE-RED has been accepted to CVPR 2024.

2. Getting Started

1. Requirements

If you already have the ldm environment, you can skip this step.

A suitable conda environment named ldm can be created and activated with:

conda env create -f ldm/environment.yaml
conda activate ldm
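
Once the environment is active, a quick sanity check can confirm that the core dependencies import correctly. This is a minimal sketch; it assumes the environment ships PyTorch and PyTorch Lightning, as in the upstream latent-diffusion setup:

python -c "import torch, pytorch_lightning; print(torch.__version__, torch.cuda.is_available())"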

2. Download Datasets and Checkpoints.

Datasets:

We collected and organized the LAKERED dataset from existing datasets. The training set comes from COD10K and CAMO, and the testing set includes three subsets: Camouflaged Objects (CO), Salient Objects (SO), and General Objects (GO).

Datasets: GoogleDrive | BaiduNetdisk (code: v245)
Results:

The results of this paper can be downloaded at the following link:

Results: GoogleDrive | BaiduNetdisk (code: berx)
Checkpoint:

The Pre-trained Latent-Diffusion-Inpainting Model

Pretrained Autoencoding Models: Link
Pretrained LDM: Link

Put them into the specified paths:

Pretrained Autoencoding Models: ldm/models/first_stage_models/vq-f4-noattn/model.ckpt
Pretrained LDM: ldm/models/ldm/inpainting_big/last.ckpt

The Pre-trained LAKERED Model

LAKERED: GoogleDrive | BaiduNetdisk (code: dzi8)

Put it into the specified path:

LAKERED: ckpt/LAKERED.ckpt
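
As a minimal shell sketch, the downloads can be placed with the commands below; the source filenames on the left are assumptions, so adjust them to whatever the downloaded files are actually named:

mkdir -p ldm/models/first_stage_models/vq-f4-noattn ldm/models/ldm/inpainting_big ckpt
mv model.ckpt ldm/models/first_stage_models/vq-f4-noattn/model.ckpt
mv last.ckpt ldm/models/ldm/inpainting_big/last.ckpt
mv LAKERED.ckpt ckpt/LAKERED.ckpt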

3. Quick Demo:

You can quickly experience the model with the following commands:

sh demo.sh
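
demo.sh wraps inference_one_sample.py, so you can also invoke the script directly on your own foreground image and mask. The call below is a minimal sketch: the image and mask paths are placeholders, while the model and config paths are the repository defaults:

python inference_one_sample.py \
    --image demo/src/your_object.jpg \
    --mask demo/src/your_object_mask.png \
    --log_path demo_res \
    --model_path ckpt/LAKERED.ckpt \
    --yaml_path ldm/models/ldm/inpainting_big/config_LAKERED.yaml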

4. Train

4.1 Combine the codebook with the Pretrained LDM
python combine.py
4.2 Start Training

You can edit the `config_LAKERED.yaml` file to modify the training settings.

sh train.sh

Note: the solution to KeyError: 'global_step'

Quick fix: resume training with --resume, pointing at the checkpoint saved when the run terminated on the error (logs/checkpoints/last.ckpt).
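
As a sketch, a resume command could look like the following; it assumes train.sh invokes the upstream latent-diffusion main.py entry point with its usual --base/-t/--resume flags, so adapt it to whatever train.sh actually calls:

python main.py --base ldm/models/ldm/inpainting_big/config_LAKERED.yaml -t --resume logs/checkpoints/last.ckpt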

You can also skip step 4.1 and download LAKERED_init.ckpt to start training.

5. Test

Generate camouflaged images from the foreground objects in the test set:

sh test.sh

Note that this will take a long time; alternatively, you can download the precomputed results linked above.

6. Eval

Use torch-fidelity to calculate FID and KID:

pip install torch-fidelity

Specify the result root and the data root, then run the evaluation with:

sh eval.sh
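
If you prefer to call torch-fidelity directly instead of eval.sh, the command below is a sketch of its standard CLI; both directory arguments are placeholders for your generated images and the reference images:

fidelity --gpu 0 --fid --kid --input1 /path/to/generated_results --input2 /path/to/reference_images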

If you hit "RuntimeError: stack expects each tensor to be equal size":

This is due to inconsistent image sizes.

Debug by following these steps:

(1) Find datasets.py in the torch-fidelity package:

anaconda3/envs/envs-name/lib/python3.8/site-packages/torch_fidelity/datasets.py

(2) Import torchvision.transforms:

import torchvision.transforms as TF

(3) Revise line 24:

self.transforms = TF.Compose([TF.Resize((299, 299)), TransformPILtoRGBTensor()]) if transforms is None else transforms

Alternatively, you can resize the images themselves to a common size before evaluation, as in the sketch below.
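
The following Python sketch does that resizing; it assumes Pillow is installed, the directory paths are placeholders, and 299x299 matches the Inception input size used for FID:

import os
from PIL import Image

src_dir = "/path/to/generated_results"   # placeholder: folder with images of mixed sizes
dst_dir = "/path/to/results_resized"     # placeholder: output folder
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    # only touch common image files; everything else is skipped
    if name.lower().endswith((".jpg", ".jpeg", ".png")):
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        img.resize((299, 299)).save(os.path.join(dst_dir, name))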

Contact

If you have any questions, please feel free to contact me:

[email protected]

[email protected]

Citation

If you find this project useful, please consider citing:

@inproceedings{zhao2024camouflaged,
      author = {Zhao, Pancheng and Xu, Peng and Qin, Pengda and Fan, Deng-Ping and Zhang, Zhicheng and Jia, Guoli and Zhou, Bowen and Yang, Jufeng},
      title = {LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year = {2024},
}

Acknowledgements

This code borrows heavily from latent-diffusion-inpainting; thanks to nickyisadog for the contribution.


lake-red's Issues

Confused about the role of the codebook in BKRM

Hello, thanks for your excellent work, and congratulations on being accepted to CVPR!

I have a question about the mechanism of the background knowledge retrieval.
In this part, the queries and values are extracted from the foreground features, while the keys are extracted from the codebook.
However, in standard cross-attention the keys and values come from the same source, which differs from your method.
Is there a particular reason for this design?

From my point of view, the role of the codebook in your work is to weight each foreground feature; the information in the codebook is not used directly. So maybe I can think of the codebook as a strong MLP within self-attention?
Could you say more about the use of the codebook in BKRM?

If I have misunderstood something, please let me know!
Looking forward to your reply. Thanks a lot!

The generated results look bad

Hello, thanks for your work again.

I downloaded your results and found that many of them are not as good as the figures shown in your paper.
What's more, many results do not look realistic, or even camouflaged at all.
For example, the results below seem abnormal, and such bad results account for a large proportion.
Could you give some explanation of this situation?
Looking forward to your reply!
Thanks!

SOD_THUR15K_Giraffe711
SOD_THUR15K_DogJump3077
SOD_SOD_69015

Runtime Error when Running demo.sh Script

I have created a new conda environment on my local machine using environment.yaml

This is my modified demo.sh file:

/home/kumar/anaconda3/envs/ldm/bin/python inference_one_sample.py \
    --image /media/kumar/HDD1/INFIDATA/LABORATORY/lakered/LAKE-RED/demo/src/COD_CAMO_camourflage_00012.jpg \
    --mask /media/kumar/HDD1/INFIDATA/LABORATORY/lakered/LAKE-RED/demo/src/COD_CAMO_camourflage_00012.png \
    --log_path demo_res

When I try to execute demo.sh, I get the following runtime error:

Called with args:
Namespace(Steps=50, batchsize=9, dilate_kernel=2, image='/media/kumar/HDD1/INFIDATA/LABORATORY/lakered/LAKE-RED/demo/src/COD_CAMO_camourflage_00012.jpg', isReplace=False, log_path='demo_res', mask='/media/kumar/HDD1/INFIDATA/LABORATORY/lakered/LAKE-RED/demo/src/COD_CAMO_camourflage_00012.png', model_path='ckpt/LAKERED.ckpt', yaml_path='ldm/models/ldm/inpainting_big/config_LAKERED.yaml')
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 387.25 M params.
Keeping EMAs of 433.
making attention of type 'none' with 512 in_channels
Working with z of shape (1, 3, 64, 64) = 12288 dimensions.
making attention of type 'none' with 512 in_channels
Using first stage also as cond stage.
Restored from ckpt/LAKERED.ckpt with 0 missing and 0 unexpected keys
RuntimeError: /media/kumar/HDD1/INFIDATA/LABORATORY/lakered/LAKE-RED/demo/src/COD_CAMO_camourflage_00012.jpg
