
DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
Official PyTorch implementation of the CVPR 2023 paper

Colab project_page arXiv

DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
Gwanghyun Kim, Se Young Chun
CVPR 2023

gwang-kim.github.io/datid_3d

Abstract:
Recent 3D generative models have achieved remarkable performance in synthesizing high resolution photorealistic images with view consistency and detailed 3D shapes, but training them for diverse domains is challenging since it requires massive training images and their camera distribution information.
Text-guided domain adaptation methods have shown impressive performance on converting the 2D generative model on one domain into the models on other domains with different styles by leveraging the CLIP (Contrastive Language-Image Pre-training), rather than collecting massive datasets for those domains. However, one drawback of them is that the sample diversity in the original generative model is not well-preserved in the domain-adapted generative models due to the deterministic nature of the CLIP text encoder. Text-guided domain adaptation will be even more challenging for 3D generative models not only because of catastrophic diversity loss, but also because of inferior text-image correspondence and poor image quality. Here we propose DATID-3D, a novel pipeline of text-guided domain adaptation tailored for 3D generative models using text-to-image diffusion models that can synthesize diverse images per text prompt without collecting additional images and camera information for the target domain. Unlike 3D extensions of prior text-guided domain adaptation methods, our novel pipeline was able to fine-tune the state-of-the-art 3D generator of the source domain to synthesize high resolution, multi-view consistent images in text-guided targeted domains without additional data, outperforming the existing text-guided domain adaptation methods in diversity and text-image correspondence. Furthermore, we propose and demonstrate diverse 3D image manipulations such as one-shot instance-selected adaptation and single-view manipulated 3D reconstruction to fully enjoy diversity in text.

Recent Updates

  • 2023.03.31: Code & Colab demo are released.
  • 2023.04.03: Gradio demo is released.

Requirements

  • We have used Linux (Ubuntu 20.04).
  • We used 1 NVIDIA A100 GPU for text-guided domain adaptation, and 1 NVIDIA A100 or RTX 3090 GPU for testing with the shifted generators. All testing and development was done using V100, RTX 3090, and A100 GPUs.
  • Python 3.8, PyTorch 1.12.1 (or later), CUDA toolkit 11.6 (or later).
  • Python libraries: see environment.yml for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your Python environment:
    git clone https://github.com/gwang-kim/DATID-3D.git
    cd DATID-3D
    conda env create -n datid3d -f environment.yml
    conda activate datid3d
  • We use the pretrained EG3D models as our pretrained 3D generative models. The pretrained EG3D models will be downloaded automatically for convenience. Alternatively, you can download the pretrained EG3D models yourself and put afhqcats512-128.pkl and ffhqrebalanced512-128.pkl in ~/eg3d/pretrained/ (see the sketch after this list).
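
If you set up the pretrained EG3D checkpoints manually, here is a minimal sketch of the expected layout, assuming the repository root is your working directory and the two .pkl files have already been downloaded from the official EG3D release:

    # Place the pretrained EG3D checkpoints where DATID-3D expects them
    mkdir -p eg3d/pretrained
    mv /path/to/afhqcats512-128.pkl eg3d/pretrained/
    mv /path/to/ffhqrebalanced512-128.pkl eg3d/pretrained/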

Demo

Gradio Demo

  • We provide an interactive Gradio app demo.
python gradio_app.py

Colab Demo Open In Colab

  • We provide a Colab demo for you to play with DATID-3D! Due to the 12GB VRAM limit in Colab, we only provide the code for inference & applications with 3D generative models fine-tuned using DATID-3D, not the fine-tuning code.

Download Fine-tuned 3D Generative Models

3D generative models fine-tuned using the DATID-3D pipeline are stored as *.pkl files. You can download the models from our Hugging Face model page.

mkdir finetuned
wget https://huggingface.co/gwang-kim/datid3d-finetuned-eg3d-models/resolve/main/finetuned_models/ffhq-pixar.pkl -O finetuned/ffhq-pixar.pkl
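
If the huggingface_hub package is installed (an assumption; it is not listed as a project requirement), the same checkpoint can also be fetched with its CLI:

# Alternative: download via huggingface-cli (pip install huggingface_hub)
huggingface-cli download gwang-kim/datid3d-finetuned-eg3d-models \
    finetuned_models/ffhq-pixar.pkl --local-dir finetuned
# --local-dir keeps the repo-relative path, so move the file up one level
mv finetuned/finetuned_models/ffhq-pixar.pkl finetuned/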

Sample Images, Shapes and Videos

You can sample images, shapes (as .mrc files), and pose-controlled videos using the shifted 3D generative model. For example:

# Sample images and shapes (as .mrc files) using the shifted 3D generative model

python datid3d_test.py --mode image \
--generator_type='ffhq' \
--outdir='test_runs' \
--seeds='100-200' \
--trunc='0.7' \
--shape=True \
--network=finetuned/ffhq-pixar.pkl 
# Sample pose-controlled videos using the shifted 3D generative model

python datid3d_test.py --mode video \
--generator_type='ffhq' \
--outdir='test_runs' \
--seeds='100-200' \
--trunc='0.7' \
--grid=4x4 \
--network=finetuned/ffhq-pixar.pkl 

The results are saved to ~/test_runs/image or ~/test_runs/video.

Following EG3D, we visualize our .mrc shape files with UCSF ChimeraX.

To visualize a shape in ChimeraX do the following:

  1. Import the .mrc file with File > Open
  2. Find the selected shape in the Volume Viewer tool
    1. The Volume Viewer tool is located under Tools > Volume Data > Volume Viewer
  3. Change volume type to "Surface"
  4. Change step size to 1
  5. Change level set to 10
    1. Note that the optimal level can vary from object to object, but is usually between 2 and 20. Individual adjustment may make certain shapes slightly sharper.
  6. In the Lighting menu in the top bar, change lighting to "Full"
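
The same settings can also be applied from the command line. The sketch below is an assumption based on recent ChimeraX builds (the --cmd startup option and the volume/lighting commands), and the shape path is a hypothetical example output:

# Open a shape and apply the settings above in one step (ChimeraX version-dependent)
chimerax --cmd "open test_runs/image/example_shape.mrc; volume #1 style surface step 1 level 10; lighting full"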

Single-shot Text-guided 2D-to-3D

Text-guided Manipulated 3D Reconstruction

This includes alignment -> pose extraction -> 3D GAN inversion -> generation of images using the fine-tuned generator. We use Deep3DFaceRecon as the pose estimation model. The pretrained pose estimation model will be downloaded automatically for convenience. Alternatively, you can download the pretrained pose estimation model and BFM files yourself: put epoch_20.pth in ~/pose_estimation/checkpoints/pretrained/ and unzip BFM.zip into ~/pose_estimation/ (a setup sketch follows the example below). For example:

# Text-guided manipulated 3D reconstruction from images using the shifted 3D generative model

python datid3d_test.py --mode manip \
--indir='input_imgs' \
--generator_type='ffhq' \
--outdir='test_runs' \
--trunc='0.7' \
--network=finetuned/ffhq-pixar.pkl 

The results are saved to ~/test_runs/manip_3D_recon/4_manip_result.
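
If the automatic download of the pose estimation assets is not an option, here is a minimal sketch of the manual setup described above, assuming epoch_20.pth and BFM.zip have already been downloaded and the repository root is your working directory:

# Place the Deep3DFaceRecon checkpoint and BFM files manually
mkdir -p pose_estimation/checkpoints/pretrained
mv /path/to/epoch_20.pth pose_estimation/checkpoints/pretrained/
unzip /path/to/BFM.zip -d pose_estimation/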

Text-guided Domain Adaptation of 3D Generator

You can perform text-guided domain adaptation of a 3D generator with your own text prompt using datid3d_train.py. For example:

python datid3d_train.py \
   --mode='ft' \
   --pdg_prompt='a FHD photo of face of beautiful Elf with silver hair in the live action movie' \
   --pdg_generator_type='ffhq' \
   --pdg_strength=0.7 \
   --pdg_num_images=1000 \
   --pdg_sd_model_id='stabilityai/stable-diffusion-2-1-base' \
   --pdg_num_inference_steps=50 \
   --ft_generator_type='same' \
   --ft_batch=20 \
   --ft_kimg=200

The results of each training run are saved to a newly created directory, for example ~/training_runs/00011-ffhq-data_ffhq_a_FHD_photo_of_face_of_beautiful_Elf_with_silver_hair_in_the_live_action_movie-gpus1-batch20-gamma5.
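
The resulting generator can then be tested with datid3d_test.py just like the downloaded checkpoints. The snapshot filename below follows the usual EG3D naming convention and is an assumption; substitute the .pkl that actually appears in your run directory:

# Sample images from a freshly fine-tuned generator (adjust run directory and snapshot name)
python datid3d_test.py --mode image \
--generator_type='ffhq' \
--outdir='test_runs' \
--seeds='100-200' \
--trunc='0.7' \
--network=training_runs/00011-ffhq-.../network-snapshot-000200.pkl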

Citation

@inproceedings{kim2022datid3d,
  author = {Gwanghyun Kim and Se Young Chun},
  title = {DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model},
  booktitle = {CVPR},
  year = {2023}
}

Acknowledgements

We thank the contributors of public projects for sharing their code. We apply our pipeline to EG3D, one of the 3D generative models, and adopt Stable Diffusion as our text-to-image diffusion model and Deep3DFaceRecon as our pose estimation model. We also utilize part of the code from HFGI3D.
