
VIVE3D — Official PyTorch implementation

Teaser image

VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs
Anna Frühstück, Nikolaos Sarafianos, Yuanlu Xu, Peter Wonka, Tony Tung
published at CVPR 2023
Project Webpage | Video

Abstract

We introduce VIVE3D, a novel approach that extends the capabilities of image-based 3D GANs to video editing and is able to represent the input video in an identity-preserving and temporally consistent way. We propose two new building blocks. First, we introduce a novel GAN inversion technique specifically tailored to 3D GANs by jointly embedding multiple frames and optimizing for the camera parameters. Second, besides traditional semantic face edits (e.g. for age and expression), we are the first to demonstrate edits that show novel views of the head enabled by the inherent properties of 3D GANs and our optical flow-guided compositing technique to combine the head with the background video. Our experiments demonstrate that VIVE3D generates high-fidelity face edits at consistent quality from a range of camera viewpoints which are composited with the original video in a temporally and spatially-consistent manner.

Prerequisites

This code has been tested with PyTorch 1.12.1 and CUDA 11.3.

The requirements for this project are largely the same as for the EG3D code base, on which our work is based. We also provide our conda environment in environment_vive3D.yml, which you can install using conda env create -f environment_vive3D.yml.

In order to load models trained using EG3D, you also need to copy three directories (torch_utils, dnnlib, and training) as well as the files legacy.py and camera_utils.py from NVIDIA's code repository into the vive3D directory.
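If it helps, this copy step can be scripted. The following is a minimal sketch that assumes NVIDIA's EG3D repository has been cloned next to this project, with its Python sources in the eg3d/ subdirectory; adjust the paths to your checkout.

# copies the EG3D modules required by VIVE3D into the project directory
# (assumes NVIDIA's EG3D repo is cloned at ../eg3d; its Python sources live in eg3d/eg3d)
import shutil
from pathlib import Path

eg3d_src = Path('../eg3d/eg3d')   # path to your EG3D checkout (assumption)
vive3d_dir = Path('.')            # the vive3D project directory

for folder in ['torch_utils', 'dnnlib', 'training']:
    shutil.copytree(eg3d_src / folder, vive3d_dir / folder, dirs_exist_ok=True)

for file in ['legacy.py', 'camera_utils.py']:
    shutil.copy2(eg3d_src / file, vive3d_dir / file)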

Code

We share our VIVE3D pipeline as a Jupyter Notebook (including descriptions) in VIVE3D_pipeline.ipynb or as a sequence of Python scripts:

First, we invert a few selected frames into the latent space of a pretrained EG3D generator using a customized inversion scheme devised for this purpose. After inversion, we fine-tune the Generator to improve coherence with our target person's appearance.

# inverts the selected faces into the EG3D latent space and fine-tunes to create personalized Generator
python personalize_generator.py --source_video path/to/video \
--generator_path 'models/ffhqrebalanced512-128.pkl' \
--output_intermediate \
--start_sec FRAME_SECOND \
--end_sec FRAME_SECOND \
--frame FRAME_INDEX_1 --frame FRAME_INDEX_2 --frame FRAME_INDEX_3 \
--device 'cuda:0'

Then, we invert a video sequence into the personalized Generator latent space.

# runs an inversion into the personalized Generator latent space for a video sequence
python invert_video.py --savepoint_path path/to/savepoints \
--source_video path/to/video \
--start_sec 60 \
--end_sec 70 \
--device 'cuda:0'

We can now use the stack of inverted latents and angles to edit the appearance of our subject and to change the head's angle. We provide some InterfaceGAN editing boundaries here; please copy them to the boundaries folder in the project directory.

python edit_video.py --savepoint_path path/to/savepoints \
--source_video path/to/video \
--start_sec 60 \
--end_sec 70 \
--edit_type 'young' --edit_strength 2.0 \
--device 'cuda:0'

Pre-trained Models

Our code uses a face model trained with the EG3D code base by NVIDIA. You can download the model ffhqrebalanced512-128.pkl from NVIDIA; put it into the models folder in the project directory.
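With the EG3D utilities in place (see Prerequisites), the checkpoint can be loaded with the usual EG3D/StyleGAN loading pattern. This is a minimal sketch rather than part of the released scripts; the model path and device are assumptions.

# loads the pretrained EG3D generator via the copied dnnlib/legacy modules
import torch
import dnnlib
import legacy

device = torch.device('cuda:0')
with dnnlib.util.open_url('models/ffhqrebalanced512-128.pkl') as f:
    G = legacy.load_network_pkl(f)['G_ema'].to(device)  # exponential-moving-average generator weights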

We also rely on an implementation of BiSeNet to obtain a segmentation of the face. Please download their pretrained model to the folder pretrained_models in the project directory.
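As a rough sketch of how such a segmentation model is typically loaded (assuming the widely used face-parsing.PyTorch implementation; its BiSeNet class, the 19-class face-parsing setup, and the 79999_iter.pth checkpoint name are assumptions, and the import path in this repository may differ):

# loads a BiSeNet face-parsing model (class and checkpoint names follow the common
# face-parsing.PyTorch implementation and are assumptions here)
import torch
from model import BiSeNet  # hypothetical import path; adjust to your BiSeNet source

segmenter = BiSeNet(n_classes=19)
segmenter.load_state_dict(torch.load('pretrained_models/79999_iter.pth', map_location='cpu'))
segmenter.eval()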

Source Videos

You can use any source video with sufficiently high resolution as input and target video.

Citation

If you use this code for your research, please cite our paper:

@inproceedings{Fruehstueck2023VIVE3D,
  title = {{VIVE3D}: Viewpoint-Independent Video Editing using {3D-Aware GANs}},
  author = {Fr{\"u}hst{\"u}ck, Anna and Sarafianos, Nikolaos and Xu, Yuanlu and Wonka, Peter and Tung, Tony},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023}
}

License

Our code is available under the CC-BY-NC license.

Acknowledgements

This project was the product of an internship by Anna Frühstück at Meta Reality Labs Research, Sausalito.

We thank the EG3D team at NVIDIA for providing their code.
