
cvpr2022-dagan's People

Contributors

harlanhong, xu-vision-group


cvpr2022-dagan's Issues

Depth map from paper not reproducible

Hi

Firstly, thank you for this awesome work. However, I tried to reproduce the depth map from the paper using the "demo.py" script and the result is quite different from the one seen in Fig. 9 of the paper.

Result from the paper:
depthmapDaGAN paper

Result running the script:
depthmapDaGAN_myRun

Corresponding depth map as pointcloud:
depthmapDaGAN_myRunPCD

The depth map looks much smoother, and facial details such as the nose and mouth are completely lost.

Cartoon Demo

Good job! How do you generate the cartoon sample?

Error in running a demo version!

Hello! Thanks for openly sharing this amazing work! My research is also related to generating talking faces. I get an error when I try to run:
CUDA_VISIBLE_DEVICES=0 python demo.py --config config/vox-adv-256.yaml --driving_video data/2.mp4 --source_image data/2.jpg --checkpoint depth/models/weights_19/encoder.pth --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator
[screenshot of the error attached]
Could you please point out where I made a mistake while running the demo?

About app.py

Hello, first of all, congratulations on such successful work. Also, are there any plans to deploy this project as an app on Android or on the web?

Question about metric measurement

Hi @harlanhong. First, I appreciate your nice work in this field.

I'd just like to ask how you measured the metric results in detail.

Did you write your own code or use library functions to measure the results in the tables?

If you wrote the code, could you share it? If not, which libraries did you use to measure those results?

Thank you.

[screenshot attached]
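
For anyone with the same question, here is a hedged sketch of one common way to compute per-frame reconstruction metrics such as SSIM and PSNR with off-the-shelf library functions. This is only an illustration under the assumption that standard implementations are acceptable; it is not necessarily the measurement code used for the paper, and the function name is made up for this example.

    # Illustrative only: SSIM/PSNR between a generated frame and its ground-truth frame,
    # using scikit-image (channel_axis requires scikit-image >= 0.19).
    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def reconstruction_metrics(generated: np.ndarray, ground_truth: np.ndarray):
        """Both inputs are H x W x 3 uint8 frames of the same size."""
        ssim = structural_similarity(ground_truth, generated, channel_axis=-1)
        psnr = peak_signal_noise_ratio(ground_truth, generated)
        return ssim, psnr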

my fault

Great work.
But I got an error when trying SPADE.pth.tar:

RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator:
Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias",....
Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias",...

Size of input

Hello
Thanks for your great work!
I have a question: does your model support input resolutions higher than 256 px, for example 512 px?
I see that in the code the input video and image are resized to 256 px, which causes a loss of visual quality.
Is there a way to use a 512x512 image/video without losing quality?
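
For reference, here is a minimal sketch of the resize step referred to above, assuming FOMM-style preprocessing in demo.py (an assumption based on this description, not a quote of the repo's code; the file names are placeholders).

    from skimage import io, img_as_float32
    from skimage.transform import resize

    source_image = img_as_float32(io.imread('source.png'))[..., :3]
    driving_frame = img_as_float32(io.imread('frame_0000.png'))[..., :3]

    # Both inputs are normalised to the 256x256 resolution the released checkpoints
    # were trained at; feeding 512x512 inputs would likely require a model trained
    # (or fine-tuned) at that resolution.
    source_image = resize(source_image, (256, 256))
    driving_frame = resize(driving_frame, (256, 256))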

evaluation and comparison with MarioNETte and MeshG

Hello, thanks for releasing the code of this excellent work!
I have a question about the evaluation and comparison with MarioNETte and MeshG. As mentioned in the paper, the test-set sampling strategy follows that of MarioNETte, and the reported results of MarioNETte and MeshG are replicated from their original papers. So I wonder whether the test-set lists in the folder './data', such as '/data/celeV_cross_id_evaluation.csv', are the same as those used by MarioNETte and MeshG.
Looking forward to your reply! Thanks!

Missing setup.py

Hi,

Thanks for this wonderful work!

It seems that the setup.py file is missing in this new version. Is it possible for you to upload it again? Thanks a lot for the help!

Best,
Wenhua

kp_num

It seems the param 'kp_num' cannot be changed.
When I set it to 20, the following error occurs:

Traceback (most recent call last):
File "demo.py", line 191, in
generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
File "demo.py", line 46, in load_checkpoints
generator.load_state_dict(ckp_generator)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1490, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SPADEDepthAwareGenerator:
size mismatch for dense_motion_network.hourglass.encoder.down_blocks.0.conv.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 84, 3, 3]).
size mismatch for dense_motion_network.mask.weight: copying a param with shape torch.Size([16, 128, 7, 7]) from checkpoint, the shape in current model is torch.Size([21, 148, 7, 7]).
size mismatch for dense_motion_network.mask.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([21]).
size mismatch for dense_motion_network.occlusion.weight: copying a param with shape torch.Size([1, 128, 7, 7]) from checkpoint, the shape in current model is torch.Size([1, 148, 7, 7]).
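
A hedged reading of the mismatch above: several dense-motion layers are sized from kp_num, so a checkpoint trained with 15 keypoints cannot be loaded into a model built with 20. For example, the mask head appears to output kp_num + 1 channels (the +1 is an inferred background term, not something confirmed by the error message itself):

    # Shapes taken from the error message above.
    kp_num_checkpoint = 15   # what the released checkpoint was trained with
    kp_num_requested = 20    # what --kp_num 20 builds
    assert kp_num_checkpoint + 1 == 16   # dense_motion_network.mask out-channels in the checkpoint
    assert kp_num_requested + 1 == 21    # dense_motion_network.mask out-channels in the current model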

Hello, I was wondering what's wrong with my example; I followed the instructions but got a different result

I followed the instructions and set my parameters as follows:

CUDA_VISIBLE_DEVICES=7 python demo.py \
  --config config/vox-adv-256.yaml \
  --driving_video source/example.mp4 \
  --source_image source/example.png \
  --checkpoint download/SPADE_DaGAN_vox_adv_256.pth.tar \
  --kp_num 15 \
  --generator SPADEDepthAwareGenerator \
  --result_video results/example_out.mp4 \
  --relative --adapt_scale

example_out.mp4

Is there something wrong with my parameters?

add web demo/model to Huggingface

Hi, would you be interested in adding DaGAN to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models, datasets, and Spaces (web demos) can be added to a user account or organization, similar to GitHub.

Examples from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

And here are guides for adding Spaces/models/datasets to your org:

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

Question regarding output of DepthAwareAttention

In the DepthAwareAttention module, the inputs are the depth_image and the output feature map generated by the occlusion map (line 195).

depth_image is stored in 'source' while output feature is stored in 'feat'.

There is a variable gamma (line 66), which is initialized as a zero tensor: self.gamma = nn.Parameter(torch.zeros(1))

After all the operations in the forward pass, you get an output feature map. It is then multiplied by gamma, and feat is added (line 87):
out = self.gamma*out + feat

That means everything computed during the forward pass is multiplied by zero and the original output features are returned unchanged. That would make the entire DepthAwareAttention useless, and the returned attention map is also never used elsewhere in the code.

Can you please clarify this?
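
For context, a minimal standalone sketch (not the authors' code) of why a zero-initialised gamma is not a no-op: it is a learnable nn.Parameter, so it receives gradients through out = gamma * out + feat and typically moves away from zero during training; only at initialisation does the module return feat unchanged.

    import torch
    import torch.nn as nn

    class GatedResidual(nn.Module):
        def __init__(self):
            super().__init__()
            self.gamma = nn.Parameter(torch.zeros(1))  # zero-initialised learnable gate

        def forward(self, attended, feat):
            # At step 0 this returns feat; once gamma is updated, the attended
            # features start contributing.
            return self.gamma * attended + feat

    gate = GatedResidual()
    attended, feat = torch.randn(1, 4), torch.randn(1, 4)
    gate(attended, feat).sum().backward()
    print(gate.gamma.grad)  # non-zero: gamma is trainable despite starting at zero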

Error No such file or directory: 'depth/models/weights_19/encoder.pth'

I downloaded the pre-trained weights DaGAN_vox_adv_256.pth.tar from the OneDrive and put them in a checkpoints directory.
When I run the demo command with --cpu, I get the following error:

(dagan) user@Users-MacBook-Air CVPR2022-DaGAN % python demo.py --config config/vox-adv-256.yaml --driving_video ./assets/driving.mp4 --source_image ./assets/leo.jpg --checkpoint ./checkpoints/DaGAN_vox_adv_256.pth.tar --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator --cpu                 
Traceback (most recent call last):
  File "demo.py", line 165, in <module>
    loaded_dict_enc = torch.load('depth/models/weights_19/encoder.pth')
  File "/Users/user/miniconda3/envs/dagan/lib/python3.7/site-packages/torch/serialization.py", line 594, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/Users/user/miniconda3/envs/dagan/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/Users/user/miniconda3/envs/dagan/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'depth/models/weights_19/encoder.pth'

How can I solve it? Many thanks, great job, and good luck with ICLR :)!

Fix some codes about py-feat library

Hi @harlanhong!

First, I'm very pleased to see your work, DaGAN. Thanks for your effort.
The reason I'm opening this issue is that I'd like to suggest a small fix to your code.
In your utils.py, some code uses the py-feat library, and this causes a problem.
I don't know which version of py-feat you used, but the latest version requires changing the code like this:

p1 = out1.facepose().values # AS-IS
p1 = out1.facepose.values # TO-BE

because the latest version of py-feat exposes facepose as a property, like this:

@property
def facepose(self):
    """Returns the facepose data using the columns set in fex.facepose_columns

    Returns:
        DataFrame: facepose data
    """
    return self[self.facepose_columns]

Could you fix this problem for anybody who will use this code?

Error when training on my own dataset; did anyone have this problem before?

[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnBatchNormBackward. Traceback of forward call that caused the error:
File "run.py", line 144, in
train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 66, in train
losses_generator, generated = generator_full(x)

Meanwhile there's another problem as well:
Traceback (most recent call last):
File "run.py", line 144, in
train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 74, in train
loss.backward()
File "/home/anaconda3/envs/DaGAN/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

It seems an in-place operation problem is happening, but I couldn't find any in-place code anywhere.
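
A hedged debugging sketch for this class of autograd error (generic PyTorch advice, not a confirmed fix for this repo): anomaly detection points at the forward op that produced the offending tensor, and the usual remedy is to remove in-place updates on tensors that backward still needs, as in the patterns below.

    import torch
    import torch.nn as nn

    torch.autograd.set_detect_anomaly(True)  # prints the forward call that created the bad tensor

    x = torch.randn(4, 32, requires_grad=True)
    y = torch.randn(4, 32)

    out = x * 2
    out = out + y                      # instead of the in-place `out += y`
    act = nn.ReLU(inplace=False)(out)  # instead of nn.ReLU(inplace=True)
    act.sum().backward()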

question about evaluation

Hi, thanks for releasing the code. I found the line "from feat import Detector" in the function "evaluate_PRMSE_AUCON()" in utils.py, but I couldn't find the module "feat". Is it a package that should be installed, or some code that has not been uploaded? Looking forward to your reply. Thanks a lot.
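
Hedged note, grounded in the py-feat issue above: the module is most likely provided by the py-feat package, which is installed separately and imported as feat.

    # pip install py-feat
    from feat import Detector

    detector = Detector()  # presumably backs the pose/AU measurements in evaluate_PRMSE_AUCON()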

Error with Spade Model

Hi, this is the error I am getting while trying to run the SPADE model. Any walkthrough?

[screenshot of the error attached]

Missing steps to use command line demo

I am likely missing some key information common to running demos for projects like this, but I was hoping the author or anyone else knowledgeable could help me out here. I'm attempting to run the demo as per the repo instructions using my own source image and driving video. I'm trying to use the SPADE checkpoint provided as a download, as well as the other checkpoints (e.g., for the depth encoder) that seem to be required to run the demo code. This is all being attempted in a conda environment with the dependencies fulfilled on a MacBook Pro (so, macOS without a dedicated GPU). From what I understand, the demo should be runnable on such a simple machine without a GPU and/or Linux.

I seem to be having issues with loading checkpoints themselves, as evidenced by ultimately encountering an error such as:

RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
	size mismatch for encoder.layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for encoder.layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for encoder.layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
	size mismatch for encoder.layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
	size mismatch for encoder.layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for encoder.layer2.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for encoder.layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for encoder.layer2.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for encoder.layer2.1.conv1.weight: copying a param with shape torch.Size([128, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
	size mismatch for encoder.layer3.0.conv1.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
	size mismatch for encoder.layer3.0.downsample.0.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
	size mismatch for encoder.layer3.0.downsample.1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for encoder.layer3.0.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for encoder.layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for encoder.layer3.0.downsample.1.running_var: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for encoder.layer3.1.conv1.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
	size mismatch for encoder.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
	size mismatch for encoder.layer4.0.downsample.0.weight: copying a param with shape torch.Size([2048, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
	size mismatch for encoder.layer4.0.downsample.1.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.layer4.0.downsample.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.layer4.0.downsample.1.running_var: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.layer4.1.conv1.weight: copying a param with shape torch.Size([512, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
	size mismatch for encoder.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 512]).

Are there specific steps I should be taking, not listed in the repo, in order to run the demo code on a CPU? Is it possible to run the demo code on a CPU at all? Any help would be appreciated. The command I'm using to run the demo is:

python demo.py --config config/vox-adv-256.yaml --driving_video driving.mp4 --source_image source.png --checkpoint download/SPADE_DaGAN_vox_adv_256.pth.tar --relative --adapt_scale --kp_num 15 --generator SPADEDepthAwareGenerator --find_best_frame
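
A hedged reading of the size mismatches above (for this issue and the similar one below): the checkpoint shapes, e.g. [64, 256, 1, 1] and a 2048-dimensional fc input, are ResNet-50 bottleneck shapes, while the model being built uses ResNet-18 basic blocks. Assuming the depth module follows a monodepth2-style ResnetEncoder(num_layers, pretrained) constructor, the sketch below shows the kind of change that would make the two match; treat the module path and argument names as assumptions, not the repo's confirmed API.

    import torch
    from depth import networks  # assumed module path

    # Build the encoder with the same depth as the checkpoint (50, not 18).
    depth_encoder = networks.ResnetEncoder(50, False)
    loaded_dict_enc = torch.load('checkpoints/depth_face_model/encoder.pth',
                                 map_location=torch.device('cpu'))
    filtered_dict_enc = {k: v for k, v in loaded_dict_enc.items()
                         if k in depth_encoder.state_dict()}
    depth_encoder.load_state_dict(filtered_dict_enc)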

testing error

When I run this command: CUDA_VISIBLE_DEVICES=0 python demo.py --config config/vox-adv-256.yaml --driving_video ./example_video.mp4 --source_image ./example_image.png --checkpoint ./checkpoints/SPADE_DaGAN_vox_adv_256.pth.tar --relative --adapt_scale --kp_num 15 --generator SPADEDepthAwareGenerator --result_video results/example_out.mp4 --find_best_frame

I got the following error:
Traceback (most recent call last):
File "demo.py", line 169, in
depth_encoder.load_state_dict(filtered_dict_enc)
File "/home/miniconda3/envs/dagan/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
size mismatch for encoder.layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for encoder.layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for encoder.layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for encoder.layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
size mismatch for encoder.layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for encoder.layer2.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for encoder.layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for encoder.layer2.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for encoder.layer2.1.conv1.weight: copying a param with shape torch.Size([128, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for encoder.layer3.0.conv1.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
size mismatch for encoder.layer3.0.downsample.0.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
size mismatch for encoder.layer3.0.downsample.1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for encoder.layer3.0.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for encoder.layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for encoder.layer3.0.downsample.1.running_var: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for encoder.layer3.1.conv1.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for encoder.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
size mismatch for encoder.layer4.0.downsample.0.weight: copying a param with shape torch.Size([2048, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
size mismatch for encoder.layer4.0.downsample.1.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layer4.0.downsample.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layer4.0.downsample.1.running_var: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layer4.1.conv1.weight: copying a param with shape torch.Size([512, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for encoder.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 512]).

pip install -r requirements.txt

Processing /data/fhongac/workspace/src/CVPR22_DaGAN/torch-1.9.0+cu111-cp37-cp37m-linux_x86_64.whl
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/data/fhongac/workspace/src/CVPR22_DaGAN/torch-1.9.0+cu111-cp37-cp37m-linux_x86_64.whl'

Error when load the spade model

Nice work!
But I have encountered a problem: when I load the SPADE model the same way I load the DaGAN model, the following error occurs.

  File "demo.py", line 191, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
  File "demo.py", line 46, in load_checkpoints
    generator.load_state_dict(ckp_generator)
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator:
   Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias",...... "final.bias". 
   Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias", ... "decoder.G_middle_0.norm_0.mlp_beta.weight"
Any suggestions?

DaGAN VoxCeleb

Hello, I see that you've released a depth face model trained on VoxCeleb2.
Does it show better results than your previous depth checkpoints? Can I use it with the SPADE or standard DaGAN checkpoints?
Can you please tell us when you plan to release the DaGAN checkpoint corresponding to the VoxCeleb2 depth model?
Thanks a lot for your great work.

The generated face remains in the same pose

Thanks for your good work. However, when I tried running the demo, the generated video tends to keep the same pose as the source image, while in the paper (Figure 2) the generated results have the driving frame's pose (this is also the case for the results in the README). Why is this the case?

result.mp4

Question about the background of images

Thanks for this incredible work!
I've looked at the demo GIF on the project homepage, and I was wondering why the background moves with the head movement. Is there any way to disentangle the foreground from the background?

Stitching the generated head image back onto the body

Hello, have you considered the problem of stitching the generated image back onto the body? The current driving video changes both the expression and the motion of the input image, so the result is misaligned when pasted back. Is there a way to drive the motion and the expression separately? Could you share any related research?

How to preprocess the image data?

If I have a face image as the driving image, how do I properly crop it? Could you provide the script?
I tested crop-video.py, but it does not work for a single image.
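
In case it helps while waiting for an official script, here is a rough standalone sketch (not the repo's crop-video.py) that crops a single image around the largest detected face with some margin and resizes it to 256x256, using OpenCV's bundled Haar cascade as a stand-in detector; the function and margin value are just illustrative choices.

    import cv2

    def crop_face(image_path, out_path, size=256, margin=0.4):
        img = cv2.imread(image_path)
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
        faces = detector.detectMultiScale(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 1.1, 5)
        if len(faces) == 0:
            raise RuntimeError('no face detected')
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
        pad = int(margin * max(w, h))
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, img.shape[1]), min(y + h + pad, img.shape[0])
        crop = cv2.resize(img[y0:y1, x0:x1], (size, size))
        cv2.imwrite(out_path, crop)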

Error while training on VoxCeleb

Hi,
I am trying to train DaGAN on VoxCeleb. The following error is occurring.

  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/train.py", line 66, in train
    losses_generator, generated = generator_full(x)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/model.py", line 189, in forward
    kp_driving = self.kp_extractor(driving)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/keypoint_detector.py", line 51, in forward
    feature_map = self.predictor(x) #x bz,4,64,64
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 252, in forward
    return self.decoder(self.encoder(x))
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 178, in forward
    out = up_block(out)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/modules/util.py", line 92, in forward
    out = self.norm(out)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 745, in forward
    self.eps,
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/nn/functional.py", line 2283, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
 (function _print_stack)
  0%|          | 0/3965 [00:26<?, ?it/s]
  0%|          | 0/150 [00:26<?, ?it/s]

Traceback (most recent call last):
  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
  File "/home/madhav3101/gan_codes/CVPR2022-DaGAN/train.py", line 70, in train
    loss.backward()
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

  FutureWarning,
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13113) of binary: /home/madhav3101/env_tf/bin/python
Traceback (most recent call last):
  File "/home/madhav3101/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/madhav3101/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/madhav3101/env_tf/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
run.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-04-25_17:30:13
  host      : gnode90.local
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 13113)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

How do I train the network with my own data?

Hi, First I want to thank you for providing the code. DaGAN works like magic.

Here is my issue: I'd like to create a video of a guy with strong emotion, like screaming. I have the driving video, but the generated clip from DaGAN doesn't share the strong emotion of the driving video; the mouth only opens slightly, unlike the wide-open mouth in the driving video.

I thought it was a dataset problem: there are not many strong emotions in the VoxCeleb dataset, which consists of interview videos. So I set out to train the model from scratch with the driving video (about 1500 face images). I used your ResNet-50 depth encoder/decoder pretrained weights and trained my own generator, keypoint detector, and discriminator. However, the results are horrible; the face doesn't even change expression.

My questions are: 1. Should I train from scratch or just fine-tune your model with my driving video? 2. When I train the network, I just input a bunch of face images of the same person with different expressions/head poses. Is this right? Do the "driving" and "source" frames have to be close together in the video (only slight expression/pose change)?

Thanks a lot!

Error occurs when I change model to SPADE_DaGAN_vox_adv_256.pth.tar

When I use DaGAN_vox_adv_256.pth.tar as my pretrained model, the result is not very good. Therefore, I want to switch to SPADE_DaGAN_vox_adv_256.pth.tar, but the following error occurs:

RuntimeError: Error(s) in loading state_dict for DepthAwareGenerator:
        Missing key(s) in state_dict: "up_blocks.0.conv.weight", "up_blocks.0.conv.bias", "up_blocks.0.norm.weight", "up_blocks.0.norm.bias", "up_blocks.0.norm.running_mean", "up_blocks.0.norm.running_var", "up_blocks.1.conv.weight", "up_blocks.1.conv.bias", "up_blocks.1.norm.weight", "up_blocks.1.norm.bias", "up_blocks.1.norm.running_mean", "up_blocks.1.norm.running_var", "bottleneck.r0.conv1.weight", "bottleneck.r0.conv1.bias", "bottleneck.r0.conv2.weight", "bottleneck.r0.conv2.bias", "bottleneck.r0.norm1.weight", "bottleneck.r0.norm1.bias", "bottleneck.r0.norm1.running_mean", "bottleneck.r0.norm1.running_var", "bottleneck.r0.norm2.weight", "bottleneck.r0.norm2.bias", "bottleneck.r0.norm2.running_mean", "bottleneck.r0.norm2.running_var", "bottleneck.r1.conv1.weight", "bottleneck.r1.conv1.bias", "bottleneck.r1.conv2.weight", "bottleneck.r1.conv2.bias", "bottleneck.r1.norm1.weight", "bottleneck.r1.norm1.bias", "bottleneck.r1.norm1.running_mean", "bottleneck.r1.norm1.running_var", "bottleneck.r1.norm2.weight", "bottleneck.r1.norm2.bias", "bottleneck.r1.norm2.running_mean", "bottleneck.r1.norm2.running_var", "bottleneck.r2.conv1.weight", "bottleneck.r2.conv1.bias", "bottleneck.r2.conv2.weight", "bottleneck.r2.conv2.bias", "bottleneck.r2.norm1.weight", "bottleneck.r2.norm1.bias", "bottleneck.r2.norm1.running_mean", "bottleneck.r2.norm1.running_var", "bottleneck.r2.norm2.weight", "bottleneck.r2.norm2.bias", "bottleneck.r2.norm2.running_mean", "bottleneck.r2.norm2.running_var", "bottleneck.r3.conv1.weight", "bottleneck.r3.conv1.bias", "bottleneck.r3.conv2.weight", "bottleneck.r3.conv2.bias", "bottleneck.r3.norm1.weight", "bottleneck.r3.norm1.bias", "bottleneck.r3.norm1.running_mean", "bottleneck.r3.norm1.running_var", "bottleneck.r3.norm2.weight", "bottleneck.r3.norm2.bias", "bottleneck.r3.norm2.running_mean", "bottleneck.r3.norm2.running_var", "bottleneck.r4.conv1.weight", "bottleneck.r4.conv1.bias", "bottleneck.r4.conv2.weight", "bottleneck.r4.conv2.bias", "bottleneck.r4.norm1.weight", "bottleneck.r4.norm1.bias", "bottleneck.r4.norm1.running_mean", "bottleneck.r4.norm1.running_var", "bottleneck.r4.norm2.weight", "bottleneck.r4.norm2.bias", "bottleneck.r4.norm2.running_mean", "bottleneck.r4.norm2.running_var", "bottleneck.r5.conv1.weight", "bottleneck.r5.conv1.bias", "bottleneck.r5.conv2.weight", "bottleneck.r5.conv2.bias", "bottleneck.r5.norm1.weight", "bottleneck.r5.norm1.bias", "bottleneck.r5.norm1.running_mean", "bottleneck.r5.norm1.running_var", "bottleneck.r5.norm2.weight", "bottleneck.r5.norm2.bias", "bottleneck.r5.norm2.running_mean", "bottleneck.r5.norm2.running_var", "final.weight", "final.bias". 
        Unexpected key(s) in state_dict: "decoder.compress.weight", "decoder.compress.bias", "decoder.fc.weight", "decoder.fc.bias", "decoder.G_middle_0.conv_0.bias", "decoder.G_middle_0.conv_0.weight_orig", "decoder.G_middle_0.conv_0.weight_u", "decoder.G_middle_0.conv_0.weight_v", "decoder.G_middle_0.conv_1.bias", "decoder.G_middle_0.conv_1.weight_orig", "decoder.G_middle_0.conv_1.weight_u", "decoder.G_middle_0.conv_1.weight_v", "decoder.G_middle_0.norm_0.mlp_shared.0.weight", "decoder.G_middle_0.norm_0.mlp_shared.0.bias", "decoder.G_middle_0.norm_0.mlp_gamma.weight", "decoder.G_middle_0.norm_0.mlp_gamma.bias", "decoder.G_middle_0.norm_0.mlp_beta.weight", "decoder.G_middle_0.norm_0.mlp_beta.bias", "decoder.G_middle_0.norm_1.mlp_shared.0.weight", "decoder.G_middle_0.norm_1.mlp_shared.0.bias", "decoder.G_middle_0.norm_1.mlp_gamma.weight", "decoder.G_middle_0.norm_1.mlp_gamma.bias", "decoder.G_middle_0.norm_1.mlp_beta.weight", "decoder.G_middle_0.norm_1.mlp_beta.bias", "decoder.G_middle_1.conv_0.bias", "decoder.G_middle_1.conv_0.weight_orig", "decoder.G_middle_1.conv_0.weight_u", "decoder.G_middle_1.conv_0.weight_v", "decoder.G_middle_1.conv_1.bias", "decoder.G_middle_1.conv_1.weight_orig", "decoder.G_middle_1.conv_1.weight_u", "decoder.G_middle_1.conv_1.weight_v", "decoder.G_middle_1.norm_0.mlp_shared.0.weight", "decoder.G_middle_1.norm_0.mlp_shared.0.bias", "decoder.G_middle_1.norm_0.mlp_gamma.weight", "decoder.G_middle_1.norm_0.mlp_gamma.bias", "decoder.G_middle_1.norm_0.mlp_beta.weight", "decoder.G_middle_1.norm_0.mlp_beta.bias", "decoder.G_middle_1.norm_1.mlp_shared.0.weight", "decoder.G_middle_1.norm_1.mlp_shared.0.bias", "decoder.G_middle_1.norm_1.mlp_gamma.weight", "decoder.G_middle_1.norm_1.mlp_gamma.bias", "decoder.G_middle_1.norm_1.mlp_beta.weight", "decoder.G_middle_1.norm_1.mlp_beta.bias", "decoder.G_middle_2.conv_0.bias", "decoder.G_middle_2.conv_0.weight_orig", "decoder.G_middle_2.conv_0.weight_u", "decoder.G_middle_2.conv_0.weight_v", "decoder.G_middle_2.conv_1.bias", "decoder.G_middle_2.conv_1.weight_orig", "decoder.G_middle_2.conv_1.weight_u", "decoder.G_middle_2.conv_1.weight_v", "decoder.G_middle_2.norm_0.mlp_shared.0.weight", "decoder.G_middle_2.norm_0.mlp_shared.0.bias", "decoder.G_middle_2.norm_0.mlp_gamma.weight", "decoder.G_middle_2.norm_0.mlp_gamma.bias", "decoder.G_middle_2.norm_0.mlp_beta.weight", "decoder.G_middle_2.norm_0.mlp_beta.bias", "decoder.G_middle_2.norm_1.mlp_shared.0.weight", "decoder.G_middle_2.norm_1.mlp_shared.0.bias", "decoder.G_middle_2.norm_1.mlp_gamma.weight", "decoder.G_middle_2.norm_1.mlp_gamma.bias", "decoder.G_middle_2.norm_1.mlp_beta.weight", "decoder.G_middle_2.norm_1.mlp_beta.bias", "decoder.up_0.conv_0.bias", "decoder.up_0.conv_0.weight_orig", "decoder.up_0.conv_0.weight_u", "decoder.up_0.conv_0.weight_v", "decoder.up_0.conv_1.bias", "decoder.up_0.conv_1.weight_orig", "decoder.up_0.conv_1.weight_u", "decoder.up_0.conv_1.weight_v", "decoder.up_0.conv_s.weight_orig", "decoder.up_0.conv_s.weight_u", "decoder.up_0.conv_s.weight_v", "decoder.up_0.norm_0.mlp_shared.0.weight", "decoder.up_0.norm_0.mlp_shared.0.bias", "decoder.up_0.norm_0.mlp_gamma.weight", "decoder.up_0.norm_0.mlp_gamma.bias", "decoder.up_0.norm_0.mlp_beta.weight", "decoder.up_0.norm_0.mlp_beta.bias", "decoder.up_0.norm_1.mlp_shared.0.weight", "decoder.up_0.norm_1.mlp_shared.0.bias", "decoder.up_0.norm_1.mlp_gamma.weight", "decoder.up_0.norm_1.mlp_gamma.bias", "decoder.up_0.norm_1.mlp_beta.weight", "decoder.up_0.norm_1.mlp_beta.bias", "decoder.up_0.norm_s.mlp_shared.0.weight", 
"decoder.up_0.norm_s.mlp_shared.0.bias", "decoder.up_0.norm_s.mlp_gamma.weight", "decoder.up_0.norm_s.mlp_gamma.bias", "decoder.up_0.norm_s.mlp_beta.weight", "decoder.up_0.norm_s.mlp_beta.bias", "decoder.up_1.conv_0.bias", "decoder.up_1.conv_0.weight_orig", "decoder.up_1.conv_0.weight_u", "decoder.up_1.conv_0.weight_v", "decoder.up_1.conv_1.bias", "decoder.up_1.conv_1.weight_orig", "decoder.up_1.conv_1.weight_u", "decoder.up_1.conv_1.weight_v", "decoder.up_1.conv_s.weight_orig", "decoder.up_1.conv_s.weight_u", "decoder.up_1.conv_s.weight_v", "decoder.up_1.norm_0.mlp_shared.0.weight", "decoder.up_1.norm_0.mlp_shared.0.bias", "decoder.up_1.norm_0.mlp_gamma.weight", "decoder.up_1.norm_0.mlp_gamma.bias", "decoder.up_1.norm_0.mlp_beta.weight", "decoder.up_1.norm_0.mlp_beta.bias", "decoder.up_1.norm_1.mlp_shared.0.weight", "decoder.up_1.norm_1.mlp_shared.0.bias", "decoder.up_1.norm_1.mlp_gamma.weight", "decoder.up_1.norm_1.mlp_gamma.bias", "decoder.up_1.norm_1.mlp_beta.weight", "decoder.up_1.norm_1.mlp_beta.bias", "decoder.up_1.norm_s.mlp_shared.0.weight", "decoder.up_1.norm_s.mlp_shared.0.bias", "decoder.up_1.norm_s.mlp_gamma.weight", "decoder.up_1.norm_s.mlp_gamma.bias", "decoder.up_1.norm_s.mlp_beta.weight", "decoder.up_1.norm_s.mlp_beta.bias", "decoder.conv_img.weight", "decoder.conv_img.bias". 

It seems those two pretrained models have different structures. Should I change something in demo.py or vox-adv-256.yaml? Looking forward to your reply, thanks a lot!
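
Based only on the demo commands in other issues here, which pair SPADE_DaGAN_vox_adv_256.pth.tar with --generator SPADEDepthAwareGenerator, the mismatch above looks like the SPADE checkpoint being loaded into a DepthAwareGenerator. A hedged sketch of the pairing, with the import path assumed rather than taken from the repo:

    # Assumed module path; the class names come from the --generator flags used above.
    from modules.generator import DepthAwareGenerator, SPADEDepthAwareGenerator

    GENERATOR_FOR_CHECKPOINT = {
        'DaGAN_vox_adv_256.pth.tar': DepthAwareGenerator,             # --generator DepthAwareGenerator
        'SPADE_DaGAN_vox_adv_256.pth.tar': SPADEDepthAwareGenerator,  # --generator SPADEDepthAwareGenerator
    }

In other words, when --checkpoint points at SPADE_DaGAN_vox_adv_256.pth.tar, rerunning the demo with --generator SPADEDepthAwareGenerator should be the matching configuration, and vice versa for the plain DaGAN checkpoint.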

crop face

Your work is amazing!

But I have two questions:

  1. Is it possible to pad a larger border when cropping faces, or does it have to crop the face strictly according to the detected box?
  2. return np.array(bboxes)[:, :-1] * scale_factor

    When the -1 index is used, an IndexError is reported.
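
On point 2, one hedged guess: np.array(bboxes)[:, :-1] raises an IndexError when the detector returns no boxes, because an empty array has no second axis. A minimal guard, as an illustration rather than the repo's fix, assuming each box is (x1, y1, x2, y2, score):

    import numpy as np

    def scale_bboxes(bboxes, scale_factor):
        bboxes = np.array(bboxes)
        if bboxes.size == 0:
            # no detections: return an empty (0, 4) array instead of indexing axis 1
            return bboxes.reshape(0, 4)
        return bboxes[:, :-1] * scale_factor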

Question about the Eqn.(9) and Fig.10

Hi, thanks for sharing this good work. After reading the paper, I have some confusion about the attention process in equation (9).

  1. How should the physical meaning of the attention be understood? The query feature comes from the source depth map, while the key and value features come from the warped source feature. Since the depth map has a different pose from the warped feature, and according to QKV attention the re-represented feature should have a spatial structure similar to the query (the depth map here), how is it guaranteed that the refined feature $F_g$ has the pose of the driving image?
  2. Intuitively, features at different positions may have different relations with features at other positions; in Fig. 10, it seems the attention maps from different positions are always similar (i.e., both attend to the mouth and eyes). How should this be understood?

Suggestion: Add automatic face cropping in demo.py

The output result depends significantly on the input image. Here are a few samples:

  1. Photo as is
  2. Photo with manual crop
  3. Photo converted to video and cropped with crop-video.py

Please crop the input image automatically inside demo.py.

result_pug1.mp4
result_pug2.mp4
result_pug3.mp4

colab?

Has anyone got this working on Colab?

Did you train depth estimator yourself?

In your paper, you mentioned that a depth estimator first needs to be learned via self-supervision, but in the repo I didn't see the training code for this module. Do you plan to release the training code for that part in the future?

Add torch.device('cpu') when loading encoder/decoder weights for CPU use

In my case, inference on CPU via the --cpu argument needs map_location=torch.device('cpu') added to torch.load() in order to succeed. Can you confirm? Many thanks!

# current call in demo.py (fails on a CPU-only machine when the weights were saved on GPU):
loaded_dict_dec = torch.load('depth/models/weights_19/depth.pth')

# proposed change:
if opt.cpu:
    loaded_dict_enc = torch.load('checkpoints/depth_face_model/encoder.pth', map_location=torch.device('cpu'))
    loaded_dict_dec = torch.load('checkpoints/depth_face_model/depth.pth', map_location=torch.device('cpu'))
