harlanhong / ICCV2023-MCNET
The official code of our ICCV 2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation
I'm trying to run this command in a loop to automatically generate multiple videos with different source and driving data. I noticed that if I run make_animation
twice on the same input data, the results differ slightly (on the order of 1e-5). Is this because the model needs to be reset for each run, or is it just floating-point error? If the model needs resetting, how do I do it? Thanks!
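A run-to-run difference of ~1e-5 with unchanged weights is usually floating-point nondeterminism (e.g. GPU reduction order under cuDNN autotuning), not leftover model state: a frozen network in eval() mode holds no state to reset between calls. A pure-Python sketch showing that reduction order alone, with the exact same numbers, changes a floating-point sum:

```python
# Same multiset of numbers, two accumulation orders: IEEE-754 addition is
# not associative, so the running sums round differently and disagree.
vals = [1e8, 1.0, -1e8, 1e-5] * 1000

left_to_right = sum(vals)          # interleaved large/small magnitudes
sorted_order = sum(sorted(vals))   # all -1e8 first, so the 1e-5 terms are absorbed

print(left_to_right, sorted_order)  # close, but not equal
```

On GPUs, parallel reductions pick such orders nondeterministically, which is enough to explain a 1e-5 drift; `torch.use_deterministic_algorithms(True)` can pin many (not all) of these kernels if bitwise repeatability matters.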
Hello, I have followed your work for a long time; both DaGAN++ and MCNet are impressive. In MCNet you introduce a novel method of face-feature compensation via a global facial meta-memory bank, whose capability, as the paper describes, is learned from large amounts of data. In real scenarios, I understand face-driving datasets come in two kinds: dynamic video datasets (like talking-head data) and purely static face datasets, with different purposes — dynamic video mainly teaches motion, while static data mainly teaches facial features. Given that, here is a bold idea: could the global facial meta-memory bank you propose be replaced by face-feature compensation extracted directly from something like Stable Diffusion or a LoRA? There would be two benefits: first, the module could reuse already-trained models; second, the datasets behind models like SD surely cover far more face identities than the faces appearing in dynamic video material. Would this be feasible?
Hi, thank you for your research.
MCNet produces more reliable results than other face-animation models. I like it.
Do you have any plan to share a 512-size model?
If not, any guide or advice for training a 512-size model? (e.g. number of keypoints, training time or epochs, or anything regarding config.yaml)
Best regards.
From the paper's method itself, there is no hard requirement that source and target be the same person, but the experiments section explicitly states that source and target share the same identity during training (which is why a perceptual loss can be used), and in the data-processing module FramesDataset in the source code, source and target also share the same id.
My current task requires fairly fine-grained reenactment across different identities, so I'd like to ask: can this project be trained for cross-identity reenactment? I would greatly appreciate a reply!
Hi, this is very nice work.
But when I train the model, I get the error 'GeneratorFullModel' object has no attribute 'mb'. It is raised at line 163 of train.py, and I cannot find any code that defines 'mb'.
When running inference with the released model, the background of the generated video is affected by the driving video, which differs a lot from the results shown in the paper. Do I need to retrain the model myself?
Goal: animate the head in an image.
Preprocessing: crop the image for better results (keeping only the head region and discarding the rest).
The preprocessed image is then driven to produce the output video.
How do I put the driven video back into the original image?
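One common approach is to remember the crop box used in preprocessing and paste each generated frame back into a copy of the full image. A minimal sketch, assuming the box is stored as `(top, left, height, width)` and the generated frame has already been resized to the crop size (names here are hypothetical, not the repo's API):

```python
import numpy as np

def paste_back(original: np.ndarray, generated: np.ndarray, crop_box) -> np.ndarray:
    """Paste a generated crop back into the full image.

    crop_box = (top, left, height, width) of the crop in the original image.
    `generated` is assumed to already match (height, width); in practice
    resize it first, e.g. with cv2.resize.
    """
    top, left, h, w = crop_box
    out = original.copy()
    out[top:top + h, left:left + w] = generated
    return out

# Toy usage: a white 64x64 "face" pasted into a black 256x256 frame.
full = np.zeros((256, 256, 3), dtype=np.uint8)
face = np.full((64, 64, 3), 255, dtype=np.uint8)
result = paste_back(full, face, (96, 96, 64, 64))
```

For seamless results people often feather the crop boundary with an alpha mask instead of a hard paste, but the bookkeeping (store the box, resize, paste per frame) is the same.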
out = out * (1 - occlusion_last) + encode_i
out = self.final(out)
out = torch.sigmoid(out)
out = out * (1 - occlusion_last) + deformed_source * occlusion_last
Could you explain what these lines of code mean? My understanding is that out * (1 - occlusion_last) keeps the unoccluded part, and then the source region that needs inpainting is added back?
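A toy numeric check of that blend (values are made up for illustration): where `occlusion_last` is near 1 the warped source pixel is trusted, and where it is near 0 the generator's own output fills in the occluded region — i.e. the roles are the opposite of the reading above.

```python
import numpy as np

occlusion_last = np.array([[1.0, 0.0]])   # 1 = visible in the warped source, 0 = occluded
deformed_source = np.array([[5.0, 5.0]])  # pixels warped from the source image
generated = np.array([[9.0, 9.0]])        # pixels produced by the network

# generated fills occluded spots; deformed_source is kept where visible
out = generated * (1 - occlusion_last) + deformed_source * occlusion_last
assert out.tolist() == [[5.0, 9.0]]
```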
Thanks for your work!
What's the license for this repo?
Hello, the video on your project page shows results combining your method with GFPGAN. How do I combine the two?
@harlanhong when I try to extend to 512p training, I get this issue after changing the image size in config file (as you recommended here: #4 (comment)). Which other changes should I keep in mind for 512p training?
Traceback (most recent call last):
File "/home/ubuntu/code/ICCV2023-MCNET/run.py", line 256, in <module>
train.train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
File "/home/ubuntu/code/ICCV2023-MCNET/train.py", line 119, in train
losses_generator, generated = generator_full(x,weight,epoch=epoch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/code/ICCV2023-MCNET/modules/model.py", line 319, in forward
generated = self.generator(x['source'], kp_source=kp_source, kp_driving=kp_driving, source_depth = depth_source, driving_depth = depth_driving,driving_image=x['driving'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/code/ICCV2023-MCNET/modules/generator.py", line 497, in forward
out = self.mbUnit(out,output_dict,keypoints = kp_source['value'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/code/ICCV2023-MCNET/modules/generator.py", line 309, in forward
feat = eval('self.feat_forward_proj_{}'.format(w))(out_cs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [512, 128, 1, 1], expected input[8, 256, 64, 64] to have 128 channels, but got 256 channels instead
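The traceback says a 1x1 projection conv was built for 128 input channels but received a 256-channel feature map after the resolution change, so presumably the memory-bank channel settings in the config (e.g. mb_channel, and any feature widths derived from image size) also need updating for 512p, not just the image size. A minimal shape check, paraphrasing Conv2d's groups=1 input rule:

```python
def conv2d_output_shape(input_shape, weight_shape):
    """Shape rule for a stride-1, no-padding, groups=1 conv.

    input:  (N, C_in, H, W); weight: (C_out, C_in, kH, kW).
    The conv only works when the input's channel count equals the
    weight's C_in — exactly the check that fails in the traceback.
    """
    n, c_in, h, w = input_shape
    c_out, c_in_w, kh, kw = weight_shape
    if c_in != c_in_w:
        raise ValueError(f"expected input to have {c_in_w} channels, got {c_in}")
    return (n, c_out, h - kh + 1, w - kw + 1)

ok = conv2d_output_shape((8, 128, 64, 64), (512, 128, 1, 1))   # matches: fine
try:
    conv2d_output_shape((8, 256, 64, 64), (512, 128, 1, 1))    # the failing case above
except ValueError as e:
    print(e)
```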
In run.py and reconstruction.py, the FramesDataset used for reconstruction doesn't use the CSV partitioning based on pair_list. Could you please confirm whether pre-partitioned test data is used in the reconstruction process? Also, could you specify which file from the data directory is used for this partitioning?
Hi, when I run the command
' CUDA_VISIBLE_DEVICES=0 python demo.py --config config/vox-256.yaml --driving_video crop.mp4 --source_image test_images/a3.jpg --checkpoint 00000099-checkpoint.pth.tar --relative --adapt_scale --kp_num 15 --generator Unet_Generator_keypoint_aware --result_video result.mp4 --mbunit ExpendMemoryUnit --memsize 1 '
the process gets killed midway without any message. Could you please help me find the reason?
self.mbUnit = eval(kwargs['mbunit'])(kwargs['mb_spatial'],kwargs['mb_channel'])
File "", line 0
^
SyntaxError: unexpected EOF while parsing
I found that kwargs['mbunit'] is an empty string ("").
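That diagnosis fits the traceback: Python's eval() on an empty string raises exactly this SyntaxError, so the crash means the --mbunit option never reached the constructor (e.g. the flag was omitted or the kwargs were not forwarded). A minimal reproduction plus a hypothetical guard that would turn the obscure SyntaxError into a clear message:

```python
def build_mbunit(name: str) -> str:
    """Hypothetical guard; the real code instantiates via eval(name)(...)."""
    if not name:
        raise ValueError("empty mbunit name: pass --mbunit, e.g. ExpendMemoryUnit")
    return name

# Reproduces the error in the traceback: eval of "" is a SyntaxError.
try:
    eval("")
except SyntaxError:
    print("eval('') raises SyntaxError, matching the traceback")
```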
Thank you for your excellent work. I'm trying to reproduce it, but I have some problems. You say in your paper, "The source image and the driving video share the same identity in the training stage, so the sampled driving frame can be used as the ground-truth of a generated source-identity image." But I noticed that the file "vox256.csv" in the code does not seem to align with the paper's statement: the source image and the driving frame do not share the same id. I'm a little confused about that; could you tell me the correct training-dataset configuration? Thank you!
Great work!
But it seems the link below points to the old DaGAN (2022) model instead of the new pretrained model? Could you please check it?
Could you confirm if the training hyperparameters in vox-256.yaml are configured correctly?
I'm particularly puzzled about the generator_gan loss weight being set to 0, as seen here: https://github.com/harlanhong/ICCV2023-MCNET/blob/master/config/vox-256.yaml#L74. If I understand correctly, doesn't this imply that the discriminator's weights remain unchanged throughout training? https://github.com/harlanhong/ICCV2023-MCNET/blob/master/train.py#L138
Can you please clarify?
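If the weight really is 0, the concern seems right: the adversarial term contributes nothing to the weighted loss, and the gating referenced in train.py would skip the discriminator step. A toy paraphrase of that logic (hypothetical values, not the repo's actual numbers or code):

```python
loss_weights = {"generator_gan": 0, "perceptual": 10}
loss_values = {"generator_gan": 0.7, "perceptual": 1.2}

# Weighted sum: a zero weight removes the adversarial term entirely.
total = sum(loss_weights[k] * loss_values[k] for k in loss_weights)

# train.py-style gating (paraphrased): only step the discriminator
# when the adversarial weight is non-zero.
train_discriminator = loss_weights["generator_gan"] != 0

print(total, train_discriminator)
```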
Why does my demo output show no teeth? And how should I crop the image to get better results?
Hello, I tried reproducing MCNet on human faces and the results are very good. But when I trained an anime-face model with a dataset of roughly the same size and the same hyperparameters, the results were not as good as on real faces. Mainly, subtle facial-expression changes (opening the mouth or closing the eyes) are often inaccurate, and the anime results are much blurrier than the real-face ones. May I ask what could be causing this?