
intrinsic-lora's Introduction

Generative Models: What do they know? Do they know things? Let's find out!

Xiaodan Du, Nick Kolkin†, Greg Shakhnarovich, Anand Bhattad

Toyota Technological Institute at Chicago, †Adobe Research

Abstract: Generative models have been shown to be capable of synthesizing highly detailed and realistic images. It is natural to suspect that they implicitly learn to model some image intrinsics such as surface normals, depth, or shadows. In this paper, we present compelling evidence that generative models indeed internally produce high-quality scene intrinsic maps. We introduce Intrinsic-LoRA, a universal, plug-and-play approach that transforms any generative model into a scene intrinsic predictor, capable of extracting intrinsic scene maps directly from the original generator network without needing additional decoders or fully fine-tuning the original network. Our method employs a Low-Rank Adaptation (LoRA) of key feature maps, with newly learned parameters that make up less than 0.6% of the total parameters in the generative model. Optimized with a small set of labeled images, our model-agnostic approach adapts to various generative architectures, including Diffusion models, GANs, and Autoregressive models. We show that the scene intrinsic maps produced by our method compare well with, and in some cases surpass, those generated by leading supervised techniques.

Many thanks to neph1 for the Blender Add-on (vid) and kijai for the ComfyUI integration

License

Since we build on Stable Diffusion, we release our code and models under its CreativeML Open RAIL-M license.

Updates

2024/2/13: We now provide inference code: inference_sd_single.py

2024/1/2: We provide checkpoints for our single-step SD model. You can download them from GDrive. Load a checkpoint using

pipeline.unet.load_attn_procs(torch.load('path/to/ckpt.bin'))
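
For reference, here is a minimal sketch of single-step inference. The fixed timestep, prompt, and pre/post-processing below are our assumptions for illustration only; inference_sd_single.py contains the exact procedure.

# Minimal single-step inference sketch (illustrative; see inference_sd_single.py
# for the authoritative procedure).
import torch
from PIL import Image
from torchvision import transforms
from diffusers import StableDiffusionPipeline

device = "cuda"
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
pipeline.unet.load_attn_procs(torch.load("path/to/ckpt.bin"))

# Encode the input image into SD latent space ([-1, 1] range, 512x512).
image = Image.open("input.png").convert("RGB").resize((512, 512))
x = transforms.ToTensor()(image).unsqueeze(0).to(device) * 2 - 1

with torch.no_grad():
    latents = pipeline.vae.encode(x).latent_dist.sample() * pipeline.vae.config.scaling_factor

    # Embed the task prompt used at training time (e.g. 'surface normal' or 'depth map').
    ids = pipeline.tokenizer(
        "surface normal", padding="max_length",
        max_length=pipeline.tokenizer.model_max_length, return_tensors="pt",
    ).input_ids.to(device)
    text_emb = pipeline.text_encoder(ids)[0]

    # One UNet forward pass at a fixed timestep (999 is an assumption; check the script).
    t = torch.tensor([999], device=device)
    pred = pipeline.unet(latents, t, encoder_hidden_states=text_emb).sample

    # Decode the predicted latent as the intrinsic map.
    out = pipeline.vae.decode(pred / pipeline.vae.config.scaling_factor).sample

intrinsic = (out.clamp(-1, 1) + 1) / 2  # [0, 1] image tensor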

Getting Started

The main packages are listed below

#Conda
pillow=9.2.0
python=3.8.15
pytorch=1.13.0
tokenizers=0.13.0.dev0
torchvision=0.14.0
tqdm=4.64.1
transformers=4.25.1
#pip
accelerate==0.22.0
diffusers==0.20.2
einops==0.6.1
huggingface-hub==0.16.4
numpy==1.22.4
wandb==0.12.21

Get the necessary Stable Diffusion checkpoints from HuggingFace🤗.
We train our single-step UNet model from SDv1.5 and our multi-step AugUNet model from SDv2.1. The additional input channels in AugUNet are initialized with InstructPix2Pix (IP2P) weights.
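
Optionally, you can pre-download these checkpoints into your HF_HOME cache. The IP2P repo id below is the standard public InstructPix2Pix release; adjust it if you use a different checkpoint.

# Optional: pre-fetch the base checkpoints into the HuggingFace cache.
from diffusers import StableDiffusionPipeline, StableDiffusionInstructPix2PixPipeline

StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")    # single-step UNet base
StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")  # multi-step AugUNet base
StableDiffusionInstructPix2PixPipeline.from_pretrained("timbrooks/instruct-pix2pix")  # init for extra input channels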

Usage

We provide code for training the single-step UNet models and the multi-step AugUNet models for surface-normal and depth-map extraction. Code for albedo and shading should be very similar. Note that the code was developed for the DIODE dataset; to train on your own dataset, you need to modify the dataloader. We assume the pseudo labels are stored in the same folder structure as the DIODE dataset (see the sketch below).
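
As a starting point for a custom dataloader, here is a minimal, hypothetical dataset sketch showing the expected image/pseudo-label pairing. The class name, file extension, and transforms are our assumptions; the only real requirement is that each image resolves to a pseudo label at the same relative path.

# Hypothetical minimal dataset: pairs each RGB image with a pseudo label stored
# under the same relative path, mirroring the DIODE folder layout. Adapt freely.
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PseudoLabelDataset(Dataset):
    def __init__(self, image_root, pseudo_root, size=512):
        self.image_root = image_root
        self.pseudo_root = pseudo_root
        self.paths = []
        for dirpath, _, files in os.walk(image_root):
            for f in sorted(files):
                if f.endswith(".png"):
                    self.paths.append(os.path.relpath(os.path.join(dirpath, f), image_root))
        self.to_tensor = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),               # [0, 1]
            transforms.Normalize([0.5], [0.5]),  # [-1, 1], as SD's VAE expects
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        rel = self.paths[idx]
        image = Image.open(os.path.join(self.image_root, rel)).convert("RGB")
        label = Image.open(os.path.join(self.pseudo_root, rel)).convert("RGB")
        return {"pixel_values": self.to_tensor(image), "label_values": self.to_tensor(label)}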

Run the following command to train the surface-normal single-step UNet model

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="path/to/DIODE/normals"
export PSEUDO_DIR="path/to/pseudo/labels"
export HF_HOME="path/to/HuggingFace/cache/folder"

accelerate launch sd_single_diode_pseudo_normal.py \
--pretrained_model_name_or_path=$MODEL_NAME  \
--train_data_dir=$DATA_DIR \
--pseudo_root=$PSEUDO_DIR \
--output_dir="path/to/output/dir" \
--train_batch_size=4 \
--dataloader_num_workers=4 \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_warmup_steps=0 \
--max_train_steps=20000 \
--validation_steps=2500 \
--checkpointing_steps=2500 \
--rank=8 \
--scene_types='outdoor,indoors' \
--num_train_imgs=4000 \
--unified_prompt='surface normal' \
--resume_from_checkpoint='latest' \
--seed=1234

Run the following command to train the depth single-step UNet model

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="path/to/DIODE/depths"
export PSEUDO_DIR="path/to/pseudo/labels"
export HF_HOME="path/to/HuggingFace/cache/folder"

accelerate launch sd_single_diode_pseudo_depth.py \
--pretrained_model_name_or_path=$MODEL_NAME  \
--train_data_dir=$DATA_DIR \
--pseudo_root=$PSEUDO_DIR \
--output_dir="path/to/output/dir" \
--train_batch_size=4 \
--dataloader_num_workers=4 \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_warmup_steps=0 \
--max_train_steps=20000 \
--validation_steps=2500 \
--checkpointing_steps=2500 \
--rank=8 \
--scene_types='outdoor,indoors' \
--num_train_imgs=4000 \
--unified_prompt='depth map' \
--resume_from_checkpoint='latest' \
--seed=1234

Run the following command to train the surface-normal multi-step AugUNet model

export MODEL_NAME="stabilityai/stable-diffusion-2-1"
export DATA_DIR="path/to/DIODE/normals"
export PSEUDO_DIR="path/to/pseudo/labels"
export HF_HOME="path/to/HuggingFace/cache/folder"

accelerate launch augunet_diode_pseudo_normal.py \
--pretrained_model_name_or_path=$MODEL_NAME  \
--train_data_dir=$DATA_DIR \
--pseudo_root=$PSEUDO_DIR \
--output_dir="path/to/output/dir" \
--train_batch_size=4 \
--dataloader_num_workers=4 \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_warmup_steps=0 \
--max_train_steps=50000 \
--validation_steps=2500 \
--checkpointing_steps=2500 \
--rank=8 \
--scene_types='outdoor,indoors' \
--unified_prompt='surface normal' \
--resume_from_checkpoint='latest' \
--seed=1234

Run the following command to train the depth multi-step AugUNet model

export MODEL_NAME="stabilityai/stable-diffusion-2-1"
export DATA_DIR="path/to/DIODE/depths"
export PSEUDO_DIR="path/to/pseudo/labels"
export HF_HOME="path/to/HuggingFace/cache/folder"

accelerate launch augunet_diode_pseudo_depth.py \
--pretrained_model_name_or_path=$MODEL_NAME  \
--train_data_dir=$DATA_DIR \
--pseudo_root=$PSEUDO_DIR \
--output_dir="path/to/output/dir" \
--train_batch_size=4 \
--dataloader_num_workers=4 \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_warmup_steps=0 \
--max_train_steps=50000 \
--validation_steps=2500 \
--checkpointing_steps=2500 \
--rank=8 \
--scene_types='outdoor,indoors' \
--unified_prompt='depth map' \
--resume_from_checkpoint='latest' \
--seed=1234

Our code should be compatible with "fp16" precision; simply append --mixed_precision="fp16" to the accelerate launch command. However, we trained all of our models in full precision. Please let us know if you encounter problems with "fp16".

BibTex

@article{du2023generative,
  title={Generative Models: What do they know? Do they know things? Let's find out!},
  author={Du, Xiaodan and Kolkin, Nicholas and Shakhnarovich, Greg and Bhattad, Anand},
  journal={arXiv preprint arXiv:2311.17137},
  year={2023}
}


intrinsic-lora's Issues

concerning pseudo labels

Hi,
Thank you for your contribution.
I am preparing to reproduce the results reported in your paper. To facilitate a more efficient replication process, could you kindly share the pseudo labels you generated? Thank you for your assistance.

A bit weird depth in 2.5D view

[Screenshot from 2024-02-19: predicted depth visualized in a 2.5D view]

Hi,

Thank you for the good work! I wonder whether you have visualized the depth in a 3D point-cloud view to check that the relative depth makes sense. I-LoRA's depth maps seem to contain too many fine details, which suggests larger, jumpier depth values; for example, in the second and fourth rows the foreground regions sometimes appear as dark as the background, and ZoeDepth's results make more sense.
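
For this kind of sanity check, a depth map can be unprojected to 3D with a standard pinhole model. The sketch below is purely illustrative; the intrinsics (fx, fy, cx, cy) are hypothetical placeholders and must come from the actual camera.

# Illustrative: unproject an H x W depth map to an N x 3 point cloud with a
# pinhole camera model; the intrinsics used here are placeholders.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

points = depth_to_points(np.random.rand(480, 640), fx=500.0, fy=500.0, cx=320.0, cy=240.0)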

ViewBackward0 is a view and its base or another view of its base has been modified inplace.

In the 2nd iteration, the line: model_pred = unet(original_image_embeds, timesteps, encoder_hidden_states).sample leads to this error: RuntimeError: Output 0 of ViewBackward0 is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
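
A common workaround for this class of autograd error (our assumption, not a verified fix for this script) is to clone the tensor before the forward pass, so the UNet no longer receives a view whose base is modified in place later in the training step:

# Hypothetical workaround: cloning detaches the input from the view/base
# storage that a later in-place op would otherwise invalidate.
model_pred = unet(original_image_embeds.clone(), timesteps, encoder_hidden_states).sample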

ComfyUI implementation and sourcing the pretrained LoRAs

Hey!

I made a custom node for ComfyUI that allows one-step sampling with the LoRAs; it works great and already shows a lot of potential!
I wanted to ask if you would be okay with me uploading the LoRAs to a more accessible location than GDrive (for example, Hugging Face), or even bundling them with the custom node code, or whether you would be willing to do that yourself?

Thank you for the great work regardless!
