
face2facerho's People

Contributors

netease-gameai, njumagic


face2facerho's Issues

paper link

Hi, nice work and congrats!

Wondering where I can find the paper. Could you provide the paper link?

Thanks!

Test custom image but the result is poor

When I used a face photo as the source image and a video as the driver, the results were poor, especially where the source image and the video were not aligned, such as on side-facing frames. What could cause this? Thanks!

poor result

[source image]
I use the picture above as the source image,
[driving image]
and the picture above as the driving image.
The character features are fairly similar, but the result is very poor, as in the following figure.
[result image]
If the source image remains unchanged and the driving image is changed to the following figure,
[second driving image]
the result becomes even worse, as shown in the following figure.
[second result image]

Question about the choice of cropping bounding box for VoxCeleb1

Thank you very much for your reply about the dataset size in issue20; it was very helpful. However, I still have some questions about the bounding box that you have not addressed. Since that issue has already been closed, I can only open a new one, and I look forward to your answer.

Which bounding box did you use to crop the face region from the raw videos?
1) The one taken directly from the "vox-metadata.csv" provided by video-preprocessing, or
2) the original bounding box provided officially by VoxCeleb1?

Specifically,
the bounding boxes given officially by VoxCeleb1 are all square, and your dataset demo seems to use the official VoxCeleb1 square boxes.

However, the meta csv file contains many non-square boxes. As discussed in this issue, those non-square boxes are resized to squares, which introduces distortion. For example, for the image in your demo, directly using the meta.csv from video-preprocessing yields a heavily squashed image like this.
[example image]
So I would like to confirm which bounding box you used when cropping the images. Thanks!
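One common way to avoid this kind of distortion (a generic sketch, not the repository's actual preprocessing) is to expand the shorter side of a non-square box so the crop itself is square, instead of resizing a non-square crop to a square image:

```python
def make_square(left, top, right, bottom):
    """Expand the shorter side of a bounding box so the crop is square.

    Growing the box symmetrically preserves the aspect ratio of the face,
    unlike resizing a non-square crop to a square image.
    """
    width = right - left
    height = bottom - top
    if width < height:
        pad = (height - width) / 2
        left, right = left - pad, right + pad
    else:
        pad = (width - height) / 2
        top, bottom = top - pad, bottom + pad
    return left, top, right, bottom

# A 100x200 box becomes a 200x200 box centered on the same point.
print(make_square(0, 0, 100, 200))  # (-50.0, 0, 150.0, 200)
```

The expanded box may extend past the image border, in which case the crop needs padding or clamping; that detail is omitted here.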

Paper

May I ask where I could find the paper for this excellent work?

link of train dataset

Nice work!

I cannot download the dataset when pressing the download button on the website.
Could you please provide a direct download link for the dataset?

fitting module

Hi,
I noticed that the inputs to fitting.py are images. Does the code support an image as the source input and a video as the driving input? Thanks in advance!

Model inference takes quite long?

Thanks for open-sourcing this work.
I ran single-image reenactment on a P40 GPU and recorded only the computation time for the driving image, as follows:
-> 1.1 DECA data loading during driving-image fitting: 1.7801 seconds
-> 1. Total driving-image fitting time: 1.8205 seconds
-> 2.1 Motion field computation: 0.1119 seconds
-> 3.3 rendering_net time: 0.3678 seconds
-> 3. Total reenactment time: 0.4834 seconds
Total time for the driving-image computation: 1.8205 + 0.1119 + 0.3678 + 0.4834 = 2.7836 seconds
What could be the reason for this? Apart from the fitting part, is the rest of the code consistent with the paper?

About background movements for talking head generation?

It seems that we have to fix the camera position when generating training pairs for a talking-head dataset; otherwise, it is hard to handle background movement in the generated results.
If so, what is the best way to generate and augment training data pairs?
If I apply scale, shift, and rotation augmentation during training, the driving results are as follows:

output.mp4

About test-time speed

Hello,
Thanks for open-sourcing this work.
In the paper I read that a 2080Ti can reach 25 fps (at 1440*1440). With a 2080Ti I currently cannot reach 15 fps (≈14 fps), even at 512*512 resolution. I would like to ask:

  1. Is this speed normal? After all, I am currently using DECA for fitting.
  2. Is my way of measuring speed reasonable? I would appreciate it if you could point out any problems. Thanks a lot.

This is how I measure the speed:

  1. I feed in 1000 images at 512*512, measure the total time, and compute the fps from it.
  2. The timed section performs no I/O: the 1000 images are loaded in advance, and no images are saved after reenactment.
torch.cuda.synchronize()
start = time.time()

for i, img in enumerate(tqdm(path_list)):
    # ------------------ fitting ------------------
    drv_params = drv_fitting.fitting(i)
    drv_headpose = pose_lml_extractor.get_pose(
        src_params['shape'], drv_params['exp'], drv_params['pose'],
        drv_params['scale'], drv_params['tx'], drv_params['ty'])
    drv_lmks = pose_lml_extractor.get_project_points(
        src_params['shape'], drv_params['exp'], drv_params['pose'],
        drv_params['scale'], drv_params['tx'], drv_params['ty'])

    # ------------------ reenactment ------------------
    drv_face = dict()
    drv_face = load_data(opt, drv_face, drv_headpose, drv_lmks)
    drv_face['landmark_img'] = landmark_img_generator.generate_landmark_img(drv_face['landmarks'])
    drv_face['landmark_img'] = [value.unsqueeze(0) for value in drv_face['landmark_img']]
    model.reenactment(src_face['landmark_img'], drv_face['headpose'].unsqueeze(0), drv_face['landmark_img'])

    # results are retrieved but not saved (no I/O inside the timed loop)
    visual_results = model.get_current_visuals()

torch.cuda.synchronize()
dur = time.time() - start
fps = len(path_list) / dur

When fitting with another 3D face model, how should the corresponding 72 landmarks be chosen?

DECA corresponds to the 5000+ vertices of the FLAME face model, and this project selects 72 keypoints via landmark_embedding.json, while most face models only provide 68 keypoints. For the BFM face model, how can I find, by index, the 72 keypoints that match this project?
If the 4 extra keypoints cannot be obtained directly, can they be derived as a linear combination of neighboring keypoints?
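The linear-combination idea in the question can be sketched as follows; the neighbor indices and weights here are hypothetical placeholders, not values taken from the project:

```python
def interpolate_landmark(landmarks, neighbor_ids, weights):
    """Approximate a missing landmark as a weighted average of neighbors.

    landmarks    -- list of (x, y) points from the 68-point detector
    neighbor_ids -- indices of nearby landmarks (hypothetical choice)
    weights      -- convex weights summing to 1
    """
    x = sum(w * landmarks[i][0] for i, w in zip(neighbor_ids, weights))
    y = sum(w * landmarks[i][1] for i, w in zip(neighbor_ids, weights))
    return (x, y)

# Example: a synthetic point halfway between landmarks 36 and 39
# (the corners of one eye in the standard 68-point layout).
pts = [(float(i), float(2 * i)) for i in range(68)]
print(interpolate_landmark(pts, [36, 39], [0.5, 0.5]))  # (37.5, 75.0)
```

Whether such interpolated points are close enough to the 4 extra landmarks this project expects would need to be verified against the actual landmark_embedding.json selection.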

Hello, I plan to train a 256*256 model, but I have some questions about data processing

1. The PoseLandmarkExtractor function in the fitting code has a parameter:
[screenshot]
Should tx_scale be changed to 256/2000? And how was the value 2000 obtained?
2. What is the physical meaning of pose_transform_config.json under src/external/data, and how were these values obtained?
{
"scale_transform": 4122.399989645386,
"tx_transform": 0.2582781138863169,
"ty_transform": -0.26074984122168304
}
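One plausible reading, purely an assumption and not confirmed by the authors, is that these constants rescale the fitter's raw scale and translation into the range the network was trained on. A minimal sketch of such a normalization step, with entirely hypothetical semantics:

```python
import json

# Hypothetical normalization: divide the fitted scale by scale_transform
# and subtract the translation offsets. The real semantics must come from
# the project's fitting code; this only illustrates the shape of such a step.
config = json.loads("""{
"scale_transform": 4122.399989645386,
"tx_transform": 0.2582781138863169,
"ty_transform": -0.26074984122168304
}""")

def normalize_pose(scale, tx, ty, cfg):
    return (scale / cfg["scale_transform"],
            tx - cfg["tx_transform"],
            ty - cfg["ty_transform"])

print(normalize_pose(4122.399989645386, 0.2582781138863169,
                     -0.26074984122168304, config))  # (1.0, 0.0, 0.0)
```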

Thank you for your work!

About the choice of landmarks and headpose used in training

Hello, and thank you very much for your excellent work! I noticed that the landmarks and headpose used in the code are both inferred by the DECA model, and fitting.py finally produces txt files storing their values. For training, I would like to replace DECA with a faster face network (68 points); how should the corresponding landmarks (68 points) and headpose be chosen?

The given DECA headpose points don't match

The official DECA model outputs 68 2D points for one face, while your own 3DMM has 72 2D points. I think data/landmark_embedding.json encodes your model's point selection. How can I map the official 68 DECA points to your 72 points? I ran test_case/source.jpg through the official DECA model and compared the official 68 points with your projected 72 points for 'source.jpg', but I cannot find the correspondence between the two files.
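One way to probe such a correspondence empirically (a generic sketch, not the project's actual mapping) is to match each projected 72-point landmark to its nearest neighbor among the official 68 points on the same image:

```python
def nearest_neighbor_map(points_a, points_b):
    """For each point in points_a, return the index of the closest point
    in points_b (squared Euclidean distance). Useful for probing which
    landmarks from two different fitters roughly coincide."""
    mapping = []
    for ax, ay in points_a:
        best = min(range(len(points_b)),
                   key=lambda j: (points_b[j][0] - ax) ** 2
                               + (points_b[j][1] - ay) ** 2)
        mapping.append(best)
    return mapping

# Toy example: three points matched against two reference points.
print(nearest_neighbor_map([(0, 0), (10, 0), (6, 0)], [(1, 0), (9, 0)]))  # [0, 1, 1]
```

Points that remain far from every neighbor after matching are likely among the 4 extra landmarks with no 68-point counterpart.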

How are the 2D offsets and scale factor in the headpose parameters obtained?

In this project, the headpose output by DECA is transformed with predefined parameters such as scale_transform so that it matches the input expected by face2face.

So when I use the 3DDFA model instead, how should its output offsets and scale factor be transformed to match the face2face input?
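If the mapping between the two fitters' conventions is unknown, one generic approach (a sketch, not the authors' method) is to fit a scale and translation by least squares between landmarks projected by both models on the same images:

```python
def fit_scale_translation(src_pts, dst_pts):
    """Least-squares fit of dst ≈ s * src + (tx, ty) for 2D point lists.

    Given corresponding landmarks from two fitters on the same image,
    s, tx, ty convert one convention's scale/offsets into the other's.
    """
    n = len(src_pts)
    mx = sum(p[0] for p in src_pts) / n
    my = sum(p[1] for p in src_pts) / n
    ux = sum(p[0] for p in dst_pts) / n
    uy = sum(p[1] for p in dst_pts) / n
    num = sum((sx - mx) * (dx - ux) + (sy - my) * (dy - uy)
              for (sx, sy), (dx, dy) in zip(src_pts, dst_pts))
    den = sum((sx - mx) ** 2 + (sy - my) ** 2 for sx, sy in src_pts)
    s = num / den
    return s, ux - s * mx, uy - s * my

# Toy check: dst = 2 * src + (1, -1) is recovered exactly.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(1.0, -1.0), (3.0, -1.0), (1.0, 1.0), (3.0, 1.0)]
print(fit_scale_translation(src, dst))  # (2.0, 1.0, -1.0)
```

Averaging the fit over many frames would make the recovered s, tx, ty robust to per-frame fitting noise.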

About face detector

Hi,
I noticed that in this project, DECA uses FAN as the face detector. Could you please tell me which method you chose for face detection? I think this part takes the majority of the time, so improving it could raise the fps.
Thanks!
Best regards,

Question about the face bounding box of VoxCeleb1

Hi,
Thanks for sharing such an amazing project! As for the pre-processing, it is said that you follow the video-preprocessing scripts. However, in their code, they use the bounding box shared in the meta-csv file, which is different from the original bounding box from the VoxCeleb1 dataset, as discussed in issue21.

Therefore, may I ask which bounding box you use to obtain the cropped images?

Besides, the original VoxCeleb1 dataset contains 1k+ subjects in the training set and 40+ subjects in the testing set.
However, the meta-file of video-preprocessing only provides 400+ subjects, which is a subset of the original VoxCeleb1 dataset.
Thus, may I ask which dataset you used in the training examples? The dataset with 1k+ subjects or the subset from video-preprocessing?

Looking forward to your reply. Thanks in advance!
