
cvpr2022-dagan's People

Contributors

harlanhong, xu-vision-group


cvpr2022-dagan's Issues

Depth map from paper not reproducible

Hi

Firstly, thank you for this awesome work. However, I tried to reproduce the depth map from the paper using the "demo.py" script and the result is quite different from the one seen in Fig. 9 of the paper.

Result from the paper:
depthmapDaGAN paper

Result running the script:
depthmapDaGAN_myRun

Corresponding depth map as pointcloud:
depthmapDaGAN_myRunPCD

The depth map looks much smoother, and facial details such as the nose and mouth are completely lost.

Cartoon Demo

Good job! How do you generate the cartoon sample?

Error in running a demo version!

Hello! Thanks for openly sharing this amazing work! My research is also related to generating talking faces. I get an error when I try to run:
CUDA_VISIBLE_DEVICES=0 python demo.py --config config/vox-adv-256.yaml --driving_video data/2.mp4 --source_image data/2.jpg --checkpoint depth/models/weights_19/encoder.pth --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator
[screenshot of the error attached]
Could you please point out where I made a mistake while running the demo?

About app.py

Hello, first of all, congratulations on such successful work. Also, are there any plans to deploy this project as an app on Android or on the web?

Question about metric measurement

Hi @harlanhong. First, I appreciate your nice work in this field.

I'd just like to ask how you measured the metric results in detail.

Did you write your own code or use library functions to measure the results in the tables?

If you wrote the code, could you share it? If not, which libraries did you use to measure those results?

Thank you.

[screenshot attached]
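
For anyone with the same question, here is a hedged sketch of one common way to compute per-frame reconstruction metrics such as SSIM and PSNR with off-the-shelf library functions. This is only an illustration under the assumption that standard implementations are acceptable; it is not necessarily the measurement code used for the paper, and the function name is made up for this example.

    # Illustrative only: SSIM/PSNR between a generated frame and its ground-truth frame,
    # using scikit-image (channel_axis requires scikit-image >= 0.19).
    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def reconstruction_metrics(generated: np.ndarray, ground_truth: np.ndarray):
        """Both inputs are H x W x 3 uint8 frames of the same size."""
        ssim = structural_similarity(ground_truth, generated, channel_axis=-1)
        psnr = peak_signal_noise_ratio(ground_truth, generated)
        return ssim, psnr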

my fault

Great work.
But I got an error when trying SPADE.pth.tar:

RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator:
Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias",....
Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias",...

Size of input

Hello
Thanks for your great work!
I have a question: does your model support input resolutions higher than 256 px, for example 512 px?
I see that in the code the input video and image are resized to 256 px, which causes a loss of visual quality.
Is there a way to use a 512x512 image/video without losing quality?
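
For reference, here is a minimal sketch of the resize step referred to above, assuming FOMM-style preprocessing in demo.py (an assumption based on this description, not a quote of the repo's code; the file names are placeholders).

    from skimage import io, img_as_float32
    from skimage.transform import resize

    source_image = img_as_float32(io.imread('source.png'))[..., :3]
    driving_frame = img_as_float32(io.imread('frame_0000.png'))[..., :3]

    # Both inputs are normalised to the 256x256 resolution the released checkpoints
    # were trained at; feeding 512x512 inputs would likely require a model trained
    # (or fine-tuned) at that resolution.
    source_image = resize(source_image, (256, 256))
    driving_frame = resize(driving_frame, (256, 256))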

evaluation and comparison with MarioNETte and MeshG

Hello, thanks for releasing the code of this excellent work!
I have a question about the evaluation and comparison with MarioNETte and MeshG. As mentioned in the paper, the test-set sampling strategy follows that of MarioNETte, and the reported results of MarioNETte and MeshG are replicated from their original papers. So I wonder whether the test-set lists in the folder './data', such as '/data/celeV_cross_id_evaluation.csv', are the same as those used by MarioNETte and MeshG.
Looking forward to your reply! Thanks!

Missing setup.py

Hi,

Thanks for this wonderful work!

It seems that the setup.py file is missing in this new version. Is it possible for you to upload it again? Thanks a lot for the help!

Best,
Wenhua

kp_num

It seems the param 'kp_num' cannot be changed.
When I set it to 20, the following error occurs:

Traceback (most recent call last):
File "demo.py", line 191, in
generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
File "demo.py", line 46, in load_checkpoints
generator.load_state_dict(ckp_generator)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1490, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SPADEDepthAwareGenerator:
size mismatch for dense_motion_network.hourglass.encoder.down_blocks.0.conv.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 84, 3, 3]).
size mismatch for dense_motion_network.mask.weight: copying a param with shape torch.Size([16, 128, 7, 7]) from checkpoint, the shape in current model is torch.Size([21, 148, 7, 7]).
size mismatch for dense_motion_network.mask.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([21]).
size mismatch for dense_motion_network.occlusion.weight: copying a param with shape torch.Size([1, 128, 7, 7]) from checkpoint, the shape in current model is torch.Size([1, 148, 7, 7]).
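
A hedged reading of the mismatch above: several dense-motion layers are sized from kp_num, so a checkpoint trained with 15 keypoints cannot be loaded into a model built with 20. For example, the mask head appears to output kp_num + 1 channels (the +1 is an inferred background term, not something confirmed by the error message itself):

    # Shapes taken from the error message above.
    kp_num_checkpoint = 15   # what the released checkpoint was trained with
    kp_num_requested = 20    # what --kp_num 20 builds
    assert kp_num_checkpoint + 1 == 16   # dense_motion_network.mask out-channels in the checkpoint
    assert kp_num_requested + 1 == 21    # dense_motion_network.mask out-channels in the current model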

Hello, I was wondering what's wrong with my example; I followed the instructions but got a different result

I followed the instructions and set my parameters as follows:

CUDA_VISIBLE_DEVICES=7 python demo.py \
  --config config/vox-adv-256.yaml \
  --driving_video source/example.mp4 \
  --source_image source/example.png \
  --checkpoint download/SPADE_DaGAN_vox_adv_256.pth.tar \
  --kp_num 15 \
  --generator SPADEDepthAwareGenerator \
  --result_video results/example_out.mp4 \
  --relative --adapt_scale

example_out.mp4

Is there something wrong with my parameters?

add web demo/model to Huggingface

Hi, would you be interested in adding DaGAN to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models, datasets, and Spaces (web demos) can be added to a user account or organization, similar to GitHub.

Examples from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

And here are guides for adding Spaces/models/datasets to your org:

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

Question regarding output of DepthAwareAttention

In the DepthAwareAttention module, the inputs are the depth_image and the output feature map generated by the occlusion map (line 195).

depth_image is stored in 'source' while output feature is stored in 'feat'.

There is a variable gamma (line 66), which is initialized as a zero tensor: self.gamma = nn.Parameter(torch.zeros(1))

After all the operations in the forward pass, you get an output feature map. It is then multiplied by gamma, and feat is added (line 87):
out = self.gamma*out + feat

That means everything computed during the forward pass is multiplied by zero and the original output features are returned unchanged. That would make the entire DepthAwareAttention useless, and the returned attention map is also never used elsewhere in the code.

Can you please clarify this?
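
For context, a minimal standalone sketch (not the authors' code) of why a zero-initialised gamma is not a no-op: it is a learnable nn.Parameter, so it receives gradients through out = gamma * out + feat and typically moves away from zero during training; only at initialisation does the module return feat unchanged.

    import torch
    import torch.nn as nn

    class GatedResidual(nn.Module):
        def __init__(self):
            super().__init__()
            self.gamma = nn.Parameter(torch.zeros(1))  # zero-initialised learnable gate

        def forward(self, attended, feat):
            # At step 0 this returns feat; once gamma is updated, the attended
            # features start contributing.
            return self.gamma * attended + feat

    gate = GatedResidual()
    attended, feat = torch.randn(1, 4), torch.randn(1, 4)
    gate(attended, feat).sum().backward()
    print(gate.gamma.grad)  # non-zero: gamma is trainable despite starting at zero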

Error No such file or directory: 'depth/models/weights_19/encoder.pth'

I downloaded the pre-trained weights DaGAN_vox_adv_256.pth.tar from the OneDrive and put them in a checkpoints directory.
When I run the demo command with --cpu, I get the following error:

(dagan) user@Users-MacBook-Air CVPR2022-DaGAN % python demo.py --config config/vox-adv-256.yaml --driving_video ./assets/driving.mp4 --source_image ./assets/leo.jpg --checkpoint ./checkpoints/DaGAN_vox_adv_256.pth.tar --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator --cpu                 
Traceback (most recent call last):
  File "demo.py", line 165, in <module>
    loaded_dict_enc = torch.load('depth/models/weights_19/encoder.pth')
  File "/Users/user/miniconda3/envs/dagan/lib/python3.7/site-packages/torch/serialization.py", line 594, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/Users/user/miniconda3/envs/dagan/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/Users/user/miniconda3/envs/dagan/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'depth/models/weights_19/encoder.pth'

How can I solve it? Many thanks, great job, and good luck with ICLR :)!

Fix some codes about py-feat library

Hi @harlanhong!

First, I'm very pleased to see your work, DaGAN. Thanks for your effort.
The reason I'm opening this issue is that I'd like to suggest a small fix to your code.
In your utils.py, some code uses the py-feat library, and this causes a problem.
I don't know which version of py-feat you used, but the latest version requires changing the code like this:

p1 = out1.facepose().values # AS-IS
p1 = out1.facepose.values # TO-BE

because the latest version of py-feat exposes facepose as a property, like this:

@property
def facepose(self):
    """Returns the facepose data using the columns set in fex.facepose_columns

    Returns:
        DataFrame: facepose data
    """
    return self[self.facepose_columns]

Could you fix this problem for anybody who will use this code?

Error when training on my own dataset; did anyone have this problem before?

[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnBatchNormBackward. Traceback of forward call that caused the error:
File "run.py", line 144, in
train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 66, in train
losses_generator, generated = generator_full(x)

Meanwhile there's another problem as well:
Traceback (most recent call last):
File "run.py", line 144, in
train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 74, in train
loss.backward()
File "/home/anaconda3/envs/DaGAN/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

It seems an in-place operation problem is happening, but I couldn't find any in-place code anywhere.
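
A hedged debugging sketch for this class of autograd error (generic PyTorch advice, not a confirmed fix for this repo): anomaly detection points at the forward op that produced the offending tensor, and the usual remedy is to remove in-place updates on tensors that backward still needs, as in the patterns below.

    import torch
    import torch.nn as nn

    torch.autograd.set_detect_anomaly(True)  # prints the forward call that created the bad tensor

    x = torch.randn(4, 32, requires_grad=True)
    y = torch.randn(4, 32)

    out = x * 2
    out = out + y                      # instead of the in-place `out += y`
    act = nn.ReLU(inplace=False)(out)  # instead of nn.ReLU(inplace=True)
    act.sum().backward()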

question about evaluation

Hi, thanks for releasing the code. I found the line "from feat import Detector" in the function "evaluate_PRMSE_AUCON()" in utils.py, but I couldn't find the module "feat". Is it a package that should be installed, or some code that has not been uploaded? Looking forward to your reply. Thanks a lot.
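
Hedged note, grounded in the py-feat issue above: the module is most likely provided by the py-feat package, which is installed separately and imported as feat.

    # pip install py-feat
    from feat import Detector

    detector = Detector()  # presumably backs the pose/AU measurements in evaluate_PRMSE_AUCON()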

Error with Spade Model

Hi, this is the error I am getting while trying to run the SPADE model. Any walkthrough?

[screenshot of the error attached]

Missing steps to use command line demo

I am likely missing some key information common to running demos for projects like this, but I was hoping the author or anyone else knowledgeable could help me out here. I'm attempting to run the demo as per the repo instructions using my own source image and driving video. I'm trying to use the SPADE checkpoint provided as a download, as well as the other checkpoints (e.g., for the depth encoder) that seem to be required to run the demo code. This is all being attempted in a conda environment with the dependencies fulfilled on a MacBook Pro (so, macOS without a dedicated GPU). From what I understand, the demo should be runnable on such a simple machine without a GPU and/or Linux.

I seem to be having issues with loading checkpoints themselves, as evidenced by ultimately encountering an error such as:

RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
	size mismatch for encoder.layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for encoder.layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for encoder.layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
	size mismatch for encoder.layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
	size mismatch for encoder.layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for encoder.layer2.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for encoder.layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for encoder.layer2.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for encoder.layer2.1.conv1.weight: copying a param with shape torch.Size([128, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
	size mismatch for encoder.layer3.0.conv1.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
	size mismatch for encoder.layer3.0.downsample.0.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
	size mismatch for encoder.layer3.0.downsample.1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for encoder.layer3.0.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for encoder.layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for encoder.layer3.0.downsample.1.running_var: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for encoder.layer3.1.conv1.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
	size mismatch for encoder.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
	size mismatch for encoder.layer4.0.downsample.0.weight: copying a param with shape torch.Size([2048, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
	size mismatch for encoder.layer4.0.downsample.1.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.layer4.0.downsample.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.layer4.0.downsample.1.running_var: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.layer4.1.conv1.weight: copying a param with shape torch.Size([512, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
	size mismatch for encoder.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 512]).

Are there specific steps I should be taking, not listed in the repo, in order to run the demo code on a CPU? Is it possible to run the demo code on a CPU at all? Any help would be appreciated. The command I'm using to run the demo is:

python demo.py --config config/vox-adv-256.yaml --driving_video driving.mp4 --source_image source.png --checkpoint download/SPADE_DaGAN_vox_adv_256.pth.tar --relative --adapt_scale --kp_num 15 --generator SPADEDepthAwareGenerator --find_best_frame
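
A hedged reading of the size mismatches above (for this issue and the similar one below): the checkpoint shapes, e.g. [64, 256, 1, 1] and a 2048-dimensional fc input, are ResNet-50 bottleneck shapes, while the model being built uses ResNet-18 basic blocks. Assuming the depth module follows a monodepth2-style ResnetEncoder(num_layers, pretrained) constructor, the sketch below shows the kind of change that would make the two match; treat the module path and argument names as assumptions, not the repo's confirmed API.

    import torch
    from depth import networks  # assumed module path

    # Build the encoder with the same depth as the checkpoint (50, not 18).
    depth_encoder = networks.ResnetEncoder(50, False)
    loaded_dict_enc = torch.load('checkpoints/depth_face_model/encoder.pth',
                                 map_location=torch.device('cpu'))
    filtered_dict_enc = {k: v for k, v in loaded_dict_enc.items()
                         if k in depth_encoder.state_dict()}
    depth_encoder.load_state_dict(filtered_dict_enc)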

testing error

When I run this command: CUDA_VISIBLE_DEVICES=0 python demo.py --config config/vox-adv-256.yaml --driving_video ./example_video.mp4 --source_image ./example_image.png --checkpoint ./checkpoints/SPADE_DaGAN_vox_adv_256.pth.tar --relative --adapt_scale --kp_num 15 --generator SPADEDepthAwareGenerator --result_video results/example_out.mp4 --find_best_frame

I got the following error:
Traceback (most recent call last):
File "demo.py", line 169, in
depth_encoder.load_state_dict(filtered_dict_enc)
File "/home/miniconda3/envs/dagan/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
size mismatch for encoder.layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for encoder.layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for encoder.layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for encoder.layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
size mismatch for encoder.layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for encoder.layer2.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for encoder.layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for encoder.layer2.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for encoder.layer2.1.conv1.weight: copying a param with shape torch.Size([128, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for encoder.layer3.0.conv1.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
size mismatch for encoder.layer3.0.downsample.0.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
size mismatch for encoder.layer3.0.downsample.1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for encoder.layer3.0.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for encoder.layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for encoder.layer3.0.downsample.1.running_var: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for encoder.layer3.1.conv1.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for encoder.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
size mismatch for encoder.layer4.0.downsample.0.weight: copying a param with shape torch.Size([2048, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
size mismatch for encoder.layer4.0.downsample.1.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layer4.0.downsample.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layer4.0.downsample.1.running_var: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layer4.1.conv1.weight: copying a param with shape torch.Size([512, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for encoder.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 512]).

pip install -r requirements.txt

Processing /data/fhongac/workspace/src/CVPR22_DaGAN/torch-1.9.0+cu111-cp37-cp37m-linux_x86_64.whl
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/data/fhongac/workspace/src/CVPR22_DaGAN/torch-1.9.0+cu111-cp37-cp37m-linux_x86_64.whl'

Error when load the spade model

Nice work!
But I have encountered a problem: when I load the SPADE model the same way I load the DaGAN model, the following error occurs.

  File "demo.py", line 191, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
  File "demo.py", line 46, in load_checkpoints
    generator.load_state_dict(ckp_generator)
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator:
   Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias",...... "final.bias". 
   Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias", ... "decoder.G_middle_0.norm_0.mlp_beta.weight"
Any suggestions?

DaGAN VoxCeleb

Hello, I see that you've released a depth face model trained on VoxCeleb2.
Does it show better results than your previous depth checkpoints? Can I use it with the SPADE or standard DaGAN checkpoints?
Can you please tell us when you plan to release the DaGAN checkpoint corresponding to the VoxCeleb2 depth model?
Thanks a lot for your great work.

The generated face remains in the same pose

Thanks for your good work. However, when I tried running the demo, the generated video tends to keep the same pose as the source image, while in the paper (Figure 2) the generated results have the driving frame's pose (this is also the case for the results in the README). Why is this the case?

result.mp4

Question about the background of images

Thanks for this incredible work!
I've looked at the demo GIF on the project homepage, and I was wondering why the background moves with the head movement. Is there any way to disentangle the foreground from the background?

Stitching the generated head image back onto the body

Hello, have you considered the problem of stitching the generated image back onto the body? The current driving video changes both the expression and the motion of the input image, so the result is misaligned when pasted back. Is there a way to drive the motion and the expression separately? Could you share any related research?

How to preprocess the image data?

If I have a face image as the driving image, how do I properly crop it? Could you provide the script?
I tested crop-video.py, but it does not work for a single image.
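
In case it helps while waiting for an official script, here is a rough standalone sketch (not the repo's crop-video.py) that crops a single image around the largest detected face with some margin and resizes it to 256x256, using OpenCV's bundled Haar cascade as a stand-in detector; the function and margin value are just illustrative choices.

    import cv2

    def crop_face(image_path, out_path, size=256, margin=0.4):
        img = cv2.imread(image_path)
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
        faces = detector.detectMultiScale(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 1.1, 5)
        if len(faces) == 0:
            raise RuntimeError('no face detected')
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
        pad = int(margin * max(w, h))
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, img.shape[1]), min(y + h + pad, img.shape[0])
        crop = cv2.resize(img[y0:y1, x0:x1], (size, size))
        cv2.imwrite(out_path, crop)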

Error while training on VoxCeleb

Hi,
I am trying to train DaGAN on VoxCeleb. The following error is occurring.

  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/train.py", line 66, in train
    losses_generator, generated = generator_full(x)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/model.py", line 189, in forward
    kp_driving = self.kp_extractor(driving)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/keypoint_detector.py", line 51, in forward
    feature_map = self.predictor(x) #x bz,4,64,64
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 252, in forward
    return self.decoder(self.encoder(x))
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 178, in forward
    out = up_block(out)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 92, in forward
    out = self.norm(out)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 745, in forward
    self.eps,
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/functional.py", line 2283, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
 (function _print_stack)
  0%|          | 0/3965 [00:26<?, ?it/s]
  0%|          | 0/150 [00:26<?, ?it/s]

Traceback (most recent call last):
  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/train.py", line 70, in train
    loss.backward()
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

  FutureWarning,
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13113) of binary: /home/madhav3101/env_tf/bin/python
Traceback (most recent call last):
  File "/home/madhav3101/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/madhav3101/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
run.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-04-25_17:30:13
  host      : gnode90.local
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 13113)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

How do I train the network with my own data?

Hi, First I want to thank you for providing the code. DaGAN works like magic.

Here is my issue: I'd like to create a video of a guy with strong emotion, like screaming. I have the driving video, but the generated clip from DaGAN doesn't share the strong emotion of the driving video; the mouth only opens slightly, unlike the wide-open mouth in the driving video.

I thought it was a dataset problem: there are not many strong emotions in the VoxCeleb dataset, which consists of interview videos. So I set out to train the model from scratch with the driving video (about 1500 face images). I used your ResNet-50 depth encoder/decoder pretrained weights and trained my own generator, keypoint detector, and discriminator. However, the results are horrible; the face doesn't even change expression.

My questions are: 1. Should I train from scratch or just fine-tune your model with my driving video? 2. When I train the network, I just input a bunch of face images of the same person with different expressions/head poses. Is this right? Do the "driving" and "source" frames have to be close together in the video (only slight expression/pose change)?

Thanks a lot!

Error occurs when I change model to SPADE_DaGAN_vox_adv_256.pth.tar

When I use DaGAN_vox_adv_256.pth.tar as my pretrained model, the result is not very good. Therefore, I want to switch to SPADE_DaGAN_vox_adv_256.pth.tar, but the following error occurs:

RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator:
        Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias", "up_blocks.0.norm.weight", "up_blocks.0.norm.bias", "up_blocks.0.norm.running_mean", "up_blocks.0.norm.running_var", "up_blocks.1.conv.weight", "up_blocks.1.conv.bias", "up_blocks.1.norm.weight", "up_blocks.1.norm.bias", "up_blocks.1.norm.running_mean", "up_blocks.1.norm.running_var", "bottleneck.r0.conv1.weight", "bottleneck.r0.conv1.bias", "bottleneck.r0.conv2.weight", "bottleneck.r0.conv2.bias", "bottleneck.r0.norm1.weight", "bottleneck.r0.norm1.bias", "bottleneck.r0.norm1.running_mean", "bottleneck.r0.norm1.running_var", "bottleneck.r0.norm2.weight", "bottleneck.r0.norm2.bias", "bottleneck.r0.norm2.running_mean", "bottleneck.r0.norm2.running_var", "bottleneck.r1.conv1.weight", "bottleneck.r1.conv1.bias", "bottleneck.r1.conv2.weight", "bottleneck.r1.conv2.bias", "bottleneck.r1.norm1.weight", "bottleneck.r1.norm1.bias", "bottleneck.r1.norm1.running_mean", "bottleneck.r1.norm1.running_var", "bottleneck.r1.norm2.weight", "bottleneck.r1.norm2.bias", "bottleneck.r1.norm2.running_mean", "bottleneck.r1.norm2.running_var", "bottleneck.r2.conv1.weight", "bottleneck.r2.conv1.bias", "bottleneck.r2.conv2.weight", "bottleneck.r2.conv2.bias", "bottleneck.r2.norm1.weight", "bottleneck.r2.norm1.bias", "bottleneck.r2.norm1.running_mean", "bottleneck.r2.norm1.running_var", "bottleneck.r2.norm2.weight", "bottleneck.r2.norm2.bias", "bottleneck.r2.norm2.running_mean", "bottleneck.r2.norm2.running_var", "bottleneck.r3.conv1.weight", "bottleneck.r3.conv1.bias", "bottleneck.r3.conv2.weight", "bottleneck.r3.conv2.bias", "bottleneck.r3.norm1.weight", "bottleneck.r3.norm1.bias", "bottleneck.r3.norm1.running_mean", "bottleneck.r3.norm1.running_var", "bottleneck.r3.norm2.weight", "bottleneck.r3.norm2.bias", "bottleneck.r3.norm2.running_mean", "bottleneck.r3.norm2.running_var", "bottleneck.r4.conv1.weight", "bottleneck.r4.conv1.bias", "bottleneck.r4.conv2.weight", "bottleneck.r4.conv2.bias", "bottleneck.r4.norm1.weight", "bottleneck.r4.norm1.bias", "bottleneck.r4.norm1.running_mean", "bottleneck.r4.norm1.running_var", "bottleneck.r4.norm2.weight", "bottleneck.r4.norm2.bias", "bottleneck.r4.norm2.running_mean", "bottleneck.r4.norm2.running_var", "bottleneck.r5.conv1.weight", "bottleneck.r5.conv1.bias", "bottleneck.r5.conv2.weight", "bottleneck.r5.conv2.bias", "bottleneck.r5.norm1.weight", "bottleneck.r5.norm1.bias", "bottleneck.r5.norm1.running_mean", "bottleneck.r5.norm1.running_var", "bottleneck.r5.norm2.weight", "bottleneck.r5.norm2.bias", "bottleneck.r5.norm2.running_mean", "bottleneck.r5.norm2.running_var", "final.weight", "final.bias". 
        Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias", "decoder.fc.weight", "decoder.fc.bias", "decoder.G_middle_0.conv_0.bias", "decoder.G_middle_0.conv_0.weight_orig", "decoder.G_middle_0.conv_0.weight_u", "decoder.G_middle_0.conv_0.weight_v", "decoder.G_middle_0.conv_1.bias", "decoder.G_middle_0.conv_1.weight_orig", "decoder.G_middle_0.conv_1.weight_u", "decoder.G_middle_0.conv_1.weight_v", "decoder.G_middle_0.norm_0.mlp_shared.0.weight", "decoder.G_middle_0.norm_0.mlp_shared.0.bias", "decoder.G_middle_0.norm_0.mlp_gamma.weight", "decoder.G_middle_0.norm_0.mlp_gamma.bias", "decoder.G_middle_0.norm_0.mlp_beta.weight", "decoder.G_middle_0.norm_0.mlp_beta.bias", "decoder.G_middle_0.norm_1.mlp_shared.0.weight", "decoder.G_middle_0.norm_1.mlp_shared.0.bias", "decoder.G_middle_0.norm_1.mlp_gamma.weight", "decoder.G_middle_0.norm_1.mlp_gamma.bias", "decoder.G_middle_0.norm_1.mlp_beta.weight", "decoder.G_middle_0.norm_1.mlp_beta.bias", "decoder.G_middle_1.conv_0.bias", "decoder.G_middle_1.conv_0.weight_orig", "decoder.G_middle_1.conv_0.weight_u", "decoder.G_middle_1.conv_0.weight_v", "decoder.G_middle_1.conv_1.bias", "decoder.G_middle_1.conv_1.weight_orig", "decoder.G_middle_1.conv_1.weight_u", "decoder.G_middle_1.conv_1.weight_v", "decoder.G_middle_1.norm_0.mlp_shared.0.weight", "decoder.G_middle_1.norm_0.mlp_shared.0.bias", "decoder.G_middle_1.norm_0.mlp_gamma.weight", "decoder.G_middle_1.norm_0.mlp_gamma.bias", "decoder.G_middle_1.norm_0.mlp_beta.weight", "decoder.G_middle_1.norm_0.mlp_beta.bias", "decoder.G_middle_1.norm_1.mlp_shared.0.weight", "decoder.G_middle_1.norm_1.mlp_shared.0.bias", "decoder.G_middle_1.norm_1.mlp_gamma.weight", "decoder.G_middle_1.norm_1.mlp_gamma.bias", "decoder.G_middle_1.norm_1.mlp_beta.weight", "decoder.G_middle_1.norm_1.mlp_beta.bias", "decoder.G_middle_2.conv_0.bias", "decoder.G_middle_2.conv_0.weight_orig", "decoder.G_middle_2.conv_0.weight_u", "decoder.G_middle_2.conv_0.weight_v", "decoder.G_middle_2.conv_1.bias", "decoder.G_middle_2.conv_1.weight_orig", "decoder.G_middle_2.conv_1.weight_u", "decoder.G_middle_2.conv_1.weight_v", "decoder.G_middle_2.norm_0.mlp_shared.0.weight", "decoder.G_middle_2.norm_0.mlp_shared.0.bias", "decoder.G_middle_2.norm_0.mlp_gamma.weight", "decoder.G_middle_2.norm_0.mlp_gamma.bias", "decoder.G_middle_2.norm_0.mlp_beta.weight", "decoder.G_middle_2.norm_0.mlp_beta.bias", "decoder.G_middle_2.norm_1.mlp_shared.0.weight", "decoder.G_middle_2.norm_1.mlp_shared.0.bias", "decoder.G_middle_2.norm_1.mlp_gamma.weight", "decoder.G_middle_2.norm_1.mlp_gamma.bias", "decoder.G_middle_2.norm_1.mlp_beta.weight", "decoder.G_middle_2.norm_1.mlp_beta.bias", "decoder.up_0.conv_0.bias", "decoder.up_0.conv_0.weight_orig", "decoder.up_0.conv_0.weight_u", "decoder.up_0.conv_0.weight_v", "decoder.up_0.conv_1.bias", "decoder.up_0.conv_1.weight_orig", "decoder.up_0.conv_1.weight_u", "decoder.up_0.conv_1.weight_v", "decoder.up_0.conv_s.weight_orig", "decoder.up_0.conv_s.weight_u", "decoder.up_0.conv_s.weight_v", "decoder.up_0.norm_0.mlp_shared.0.weight", "decoder.up_0.norm_0.mlp_shared.0.bias", "decoder.up_0.norm_0.mlp_gamma.weight", "decoder.up_0.norm_0.mlp_gamma.bias", "decoder.up_0.norm_0.mlp_beta.weight", "decoder.up_0.norm_0.mlp_beta.bias", "decoder.up_0.norm_1.mlp_shared.0.weight", "decoder.up_0.norm_1.mlp_shared.0.bias", "decoder.up_0.norm_1.mlp_gamma.weight", "decoder.up_0.norm_1.mlp_gamma.bias", "decoder.up_0.norm_1.mlp_beta.weight", "decoder.up_0.norm_1.mlp_beta.bias", "decoder.up_0.norm_s.mlp_shared.0.weight", 
"decoder.up_0.norm_s.mlp_shared.0.bias", "decoder.up_0.norm_s.mlp_gamma.weight", "decoder.up_0.norm_s.mlp_gamma.bias", "decoder.up_0.norm_s.mlp_beta.weight", "decoder.up_0.norm_s.mlp_beta.bias", "decoder.up_1.conv_0.bias", "decoder.up_1.conv_0.weight_orig", "decoder.up_1.conv_0.weight_u", "decoder.up_1.conv_0.weight_v", "decoder.up_1.conv_1.bias", "decoder.up_1.conv_1.weight_orig", "decoder.up_1.conv_1.weight_u", "decoder.up_1.conv_1.weight_v", "decoder.up_1.conv_s.weight_orig", "decoder.up_1.conv_s.weight_u", "decoder.up_1.conv_s.weight_v", "decoder.up_1.norm_0.mlp_shared.0.weight", "decoder.up_1.norm_0.mlp_shared.0.bias", "decoder.up_1.norm_0.mlp_gamma.weight", "decoder.up_1.norm_0.mlp_gamma.bias", "decoder.up_1.norm_0.mlp_beta.weight", "decoder.up_1.norm_0.mlp_beta.bias", "decoder.up_1.norm_1.mlp_shared.0.weight", "decoder.up_1.norm_1.mlp_shared.0.bias", "decoder.up_1.norm_1.mlp_gamma.weight", "decoder.up_1.norm_1.mlp_gamma.bias", "decoder.up_1.norm_1.mlp_beta.weight", "decoder.up_1.norm_1.mlp_beta.bias", "decoder.up_1.norm_s.mlp_shared.0.weight", "decoder.up_1.norm_s.mlp_shared.0.bias", "decoder.up_1.norm_s.mlp_gamma.weight", "decoder.up_1.norm_s.mlp_gamma.bias", "decoder.up_1.norm_s.mlp_beta.weight", "decoder.up_1.norm_s.mlp_beta.bias", "decoder.conv_img.weight", "decoder.conv_img.bias". 

It seems those two pretrained models have different structures. Should I change something in demo.py or vox-adv-256.yaml? Looking forward to your reply, thanks a lot!
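
Based only on the demo commands in other issues here, which pair SPADE_DaGAN_vox_adv_256.pth.tar with --generator SPADEDepthAwareGenerator, the mismatch above looks like the SPADE checkpoint being loaded into a DepthAwareGenerator. A hedged sketch of the pairing, with the import path assumed rather than taken from the repo:

    # Assumed module path; the class names come from the --generator flags used above.
    from modules.generator import DepthAwareGenerator, SPADEDepthAwareGenerator

    GENERATOR_FOR_CHECKPOINT = {
        'DaGAN_vox_adv_256.pth.tar': DepthAwareGenerator,             # --generator DepthAwareGenerator
        'SPADE_DaGAN_vox_adv_256.pth.tar': SPADEDepthAwareGenerator,  # --generator SPADEDepthAwareGenerator
    }

In other words, when --checkpoint points at SPADE_DaGAN_vox_adv_256.pth.tar, rerunning the demo with --generator SPADEDepthAwareGenerator should be the matching configuration, and vice versa for the plain DaGAN checkpoint.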

crop face

Your work is amazing!

But I have two questions:

  1. Is it possible to pad a larger border when cropping faces, or does it have to crop the face strictly according to the detected box?
  2. return np.array(bboxes)[:, :-1] * scale_factor

    When the -1 index is used, an IndexError is reported.
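
On point 2, one hedged guess: np.array(bboxes)[:, :-1] raises an IndexError when the detector returns no boxes, because an empty array has no second axis. A minimal guard, as an illustration rather than the repo's fix, assuming each box is (x1, y1, x2, y2, score):

    import numpy as np

    def scale_bboxes(bboxes, scale_factor):
        bboxes = np.array(bboxes)
        if bboxes.size == 0:
            # no detections: return an empty (0, 4) array instead of indexing axis 1
            return bboxes.reshape(0, 4)
        return bboxes[:, :-1] * scale_factor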

Question about the Eqn.(9) and Fig.10

Hi, thanks for sharing this good work. After reading the paper, I have some confusion about the attention process in equation (9).

  1. How should the physical meaning of the attention be understood? The query feature comes from the source depth map, while the key and value features come from the warped source feature. Since the depth map has a different pose from the warped feature, and according to QKV attention the re-represented feature should have a spatial structure similar to the query (the depth map here), how is it guaranteed that the refined feature $F_g$ has the pose of the driving image?
  2. Intuitively, features at different positions may have different relations with features at other positions; in Fig. 10, it seems the attention maps from different positions are always similar (i.e., both attend to the mouth and eyes). How should this be understood?

Suggestion: Add automatic face cropping in demo.py

The output result depends significantly on the input image. Here are a few samples:

  1. Photo as is
  2. Photo with manual crop
  3. Photo converted to video and cropped with crop-video.py

Please crop the input image automatically inside demo.py.

result_pug1.mp4
result_pug2.mp4
result_pug3.mp4

colab?

Has anyone got this working on Colab?

Did you train depth estimator yourself?

In your paper, you mentioned that a depth estimator first needs to be learned via self-supervision, but in the repo I didn't see the training code for this module. Do you plan to release the training code for that part in the future?

Add torch.device('cpu') when loading encoder/decoder weights for CPU use

In my case, inference on CPU via the --cpu argument needs map_location=torch.device('cpu') added to torch.load() in order to succeed. Can you confirm? Many thanks!

# current call in demo.py (fails on a CPU-only machine when the weights were saved on GPU):
loaded_dict_dec = torch.load('depth/models/weights_19/depth.pth')

# proposed change:
if opt.cpu:
    loaded_dict_enc = torch.load('checkpoints/depth_face_model/encoder.pth', map_location=torch.device('cpu'))
    loaded_dict_dec = torch.load('checkpoints/depth_face_model/depth.pth', map_location=torch.device('cpu'))
