
face2facerho's People

Contributors

netease-gameai, njumagic


face2facerho's Issues

paper link

Hi, nice work and congrats!

Wondering where I can find the paper. Could you provide the paper link?

Thanks!

Test custom image but the result is poor

When I used a face photo as the source image and a video as the driver, the results were poor, especially where the source image and the video were not aligned, such as on side-facing frames. What could cause this? Thanks!

poor result

[source image]
I use the picture above as the source image,
[driving image]
and the picture above as the driving image.
The character features are fairly similar, but the result is very poor, as in the following figure.
[result image]
If the source image remains unchanged and the driving image is changed to the following figure,
[second driving image]
the result becomes even worse, as shown in the following figure.
[second result image]

Question about the choice of cropping bounding box for VoxCeleb1

Thank you very much for your reply about the dataset size in issue20; it was very helpful. However, I still have some questions about the bounding box that you have not addressed. Since that issue has already been closed, I can only open a new one, and I look forward to your answer.

Which bounding box did you use to crop the face region from the raw videos?
1) The one taken directly from the "vox-metadata.csv" provided by video-preprocessing, or
2) the original bounding box provided officially by VoxCeleb1?

Specifically,
the bounding boxes given officially by VoxCeleb1 are all square, and your dataset demo seems to use the official VoxCeleb1 square boxes.

However, the meta csv file contains many non-square boxes. As discussed in this issue, those non-square boxes are resized to squares, which introduces distortion. For example, for the image in your demo, directly using the meta.csv from video-preprocessing yields a heavily squashed image like this.
[example image]
So I would like to confirm which bounding box you used when cropping the images. Thanks!
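One common way to avoid this kind of distortion (a generic sketch, not the repository's actual preprocessing) is to expand the shorter side of a non-square box so the crop itself is square, instead of resizing a non-square crop to a square image:

```python
def make_square(left, top, right, bottom):
    """Expand the shorter side of a bounding box so the crop is square.

    Growing the box symmetrically preserves the aspect ratio of the face,
    unlike resizing a non-square crop to a square image.
    """
    width = right - left
    height = bottom - top
    if width < height:
        pad = (height - width) / 2
        left, right = left - pad, right + pad
    else:
        pad = (width - height) / 2
        top, bottom = top - pad, bottom + pad
    return left, top, right, bottom

# A 100x200 box becomes a 200x200 box centered on the same point.
print(make_square(0, 0, 100, 200))  # (-50.0, 0, 150.0, 200)
```

The expanded box may extend past the image border, in which case the crop needs padding or clamping; that detail is omitted here.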

Paper

May I ask where I could find the paper for this excellent work?

link of train dataset

Nice work!

I cannot download the dataset when pressing the download button on the website.
Could you please provide a direct download link for the dataset?

fitting module

Hi,
I noticed that the inputs to fitting.py are images. Does the code support an image as the source input and a video as the driving input? Thanks in advance!

Model inference takes quite long?

Thanks for open-sourcing this work.
I ran single-image reenactment on a P40 GPU and recorded only the computation time for the driving image, as follows:
-> 1.1 DECA data loading during driving-image fitting: 1.7801 seconds
-> 1. Total driving-image fitting time: 1.8205 seconds
-> 2.1 Motion field computation: 0.1119 seconds
-> 3.3 rendering_net time: 0.3678 seconds
-> 3. Total reenactment time: 0.4834 seconds
Total time for the driving-image computation: 1.8205 + 0.1119 + 0.3678 + 0.4834 = 2.7836 seconds
What could be the reason for this? Apart from the fitting part, is the rest of the code consistent with the paper?

About background movements for talking head generation?

It seems that we have to fix the camera position when generating training pairs for a talking-head dataset; otherwise, it is hard to handle background movement in the generated results.
If so, what is the best way to generate and augment training data pairs?
If I apply scale, shift, and rotation augmentation during training, the driving results are as follows:

output.mp4

About test-time speed

Hello,
Thanks for open-sourcing this work.
In the paper I read that a 2080Ti can reach 25 fps (at 1440*1440). With a 2080Ti I currently cannot reach 15 fps (≈14 fps), even at 512*512 resolution. I would like to ask:

  1. Is this speed normal? After all, I am currently using DECA for fitting.
  2. Is my way of measuring speed reasonable? I would appreciate it if you could point out any problems. Thanks a lot.

This is how I measure the speed:

  1. I feed in 1000 images at 512*512, measure the total time, and compute the fps from it.
  2. The timed section performs no I/O: the 1000 images are loaded in advance, and no images are saved after reenactment.
torch.cuda.synchronize()
start = time.time()

for i, img in enumerate(tqdm(path_list)):
    # ------------------ fitting ------------------
    drv_params = drv_fitting.fitting(i)
    drv_headpose = pose_lml_extractor.get_pose(
        src_params['shape'], drv_params['exp'], drv_params['pose'],
        drv_params['scale'], drv_params['tx'], drv_params['ty'])
    drv_lmks = pose_lml_extractor.get_project_points(
        src_params['shape'], drv_params['exp'], drv_params['pose'],
        drv_params['scale'], drv_params['tx'], drv_params['ty'])

    # ------------------ reenactment ------------------
    drv_face = dict()
    drv_face = load_data(opt, drv_face, drv_headpose, drv_lmks)
    drv_face['landmark_img'] = landmark_img_generator.generate_landmark_img(drv_face['landmarks'])
    drv_face['landmark_img'] = [value.unsqueeze(0) for value in drv_face['landmark_img']]
    model.reenactment(src_face['landmark_img'], drv_face['headpose'].unsqueeze(0), drv_face['landmark_img'])

    # results are retrieved but not saved (no I/O inside the timed loop)
    visual_results = model.get_current_visuals()

torch.cuda.synchronize()
dur = time.time() - start
fps = len(path_list) / dur

When fitting with another 3D face model, how should the corresponding 72 landmarks be chosen?

DECA corresponds to the 5000+ vertices of the FLAME face model, and this project selects 72 keypoints via landmark_embedding.json, while most face models only provide 68 keypoints. For the BFM face model, how can I find, by index, the 72 keypoints that match this project?
If the 4 extra keypoints cannot be obtained directly, can they be derived as a linear combination of neighboring keypoints?
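The linear-combination idea in the question can be sketched as follows; the neighbor indices and weights here are hypothetical placeholders, not values taken from the project:

```python
def interpolate_landmark(landmarks, neighbor_ids, weights):
    """Approximate a missing landmark as a weighted average of neighbors.

    landmarks    -- list of (x, y) points from the 68-point detector
    neighbor_ids -- indices of nearby landmarks (hypothetical choice)
    weights      -- convex weights summing to 1
    """
    x = sum(w * landmarks[i][0] for i, w in zip(neighbor_ids, weights))
    y = sum(w * landmarks[i][1] for i, w in zip(neighbor_ids, weights))
    return (x, y)

# Example: a synthetic point halfway between landmarks 36 and 39
# (the corners of one eye in the standard 68-point layout).
pts = [(float(i), float(2 * i)) for i in range(68)]
print(interpolate_landmark(pts, [36, 39], [0.5, 0.5]))  # (37.5, 75.0)
```

Whether such interpolated points are close enough to the 4 extra landmarks this project expects would need to be verified against the actual landmark_embedding.json selection.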

Hello, I plan to train a 256*256 model, but I have some questions about data processing

1. The PoseLandmarkExtractor function in the fitting code has a parameter:
[screenshot]
Should tx_scale be changed to 256/2000? And how was the value 2000 obtained?
2. What is the physical meaning of pose_transform_config.json under src/external/data, and how were these values obtained?
{
"scale_transform": 4122.399989645386,
"tx_transform": 0.2582781138863169,
"ty_transform": -0.26074984122168304
}
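One plausible reading, purely an assumption and not confirmed by the authors, is that these constants rescale the fitter's raw scale and translation into the range the network was trained on. A minimal sketch of such a normalization step, with entirely hypothetical semantics:

```python
import json

# Hypothetical normalization: divide the fitted scale by scale_transform
# and subtract the translation offsets. The real semantics must come from
# the project's fitting code; this only illustrates the shape of such a step.
config = json.loads("""{
"scale_transform": 4122.399989645386,
"tx_transform": 0.2582781138863169,
"ty_transform": -0.26074984122168304
}""")

def normalize_pose(scale, tx, ty, cfg):
    return (scale / cfg["scale_transform"],
            tx - cfg["tx_transform"],
            ty - cfg["ty_transform"])

print(normalize_pose(4122.399989645386, 0.2582781138863169,
                     -0.26074984122168304, config))  # (1.0, 0.0, 0.0)
```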

Thank you for your work!

About the choice of landmarks and headpose used in training

Hello, and thank you very much for your excellent work! I noticed that the landmarks and headpose used in the code are both inferred by the DECA model, and fitting.py finally produces txt files storing their values. For training, I would like to replace DECA with a faster face network (68 points); how should the corresponding landmarks (68 points) and headpose be chosen?

The given DECA headpose points don't match

The official DECA model outputs 68 2D points for one face, while your own 3DMM has 72 2D points. I think data/landmark_embedding.json encodes your model's point selection. How can I map the official 68 DECA points to your 72 points? I ran test_case/source.jpg through the official DECA model and compared the official 68 points with your projected 72 points for 'source.jpg', but I cannot find the correspondence between the two files.
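One way to probe such a correspondence empirically (a generic sketch, not the project's actual mapping) is to match each projected 72-point landmark to its nearest neighbor among the official 68 points on the same image:

```python
def nearest_neighbor_map(points_a, points_b):
    """For each point in points_a, return the index of the closest point
    in points_b (squared Euclidean distance). Useful for probing which
    landmarks from two different fitters roughly coincide."""
    mapping = []
    for ax, ay in points_a:
        best = min(range(len(points_b)),
                   key=lambda j: (points_b[j][0] - ax) ** 2
                               + (points_b[j][1] - ay) ** 2)
        mapping.append(best)
    return mapping

# Toy example: three points matched against two reference points.
print(nearest_neighbor_map([(0, 0), (10, 0), (6, 0)], [(1, 0), (9, 0)]))  # [0, 1, 1]
```

Points that remain far from every neighbor after matching are likely among the 4 extra landmarks with no 68-point counterpart.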

How are the 2D offsets and scale factor in the headpose parameters obtained?

In this project, the headpose output by DECA is transformed with predefined parameters such as scale_transform so that it matches the input expected by face2face.

So when I use the 3DDFA model instead, how should its output offsets and scale factor be transformed to match the face2face input?
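If the mapping between the two fitters' conventions is unknown, one generic approach (a sketch, not the authors' method) is to fit a scale and translation by least squares between landmarks projected by both models on the same images:

```python
def fit_scale_translation(src_pts, dst_pts):
    """Least-squares fit of dst ≈ s * src + (tx, ty) for 2D point lists.

    Given corresponding landmarks from two fitters on the same image,
    s, tx, ty convert one convention's scale/offsets into the other's.
    """
    n = len(src_pts)
    mx = sum(p[0] for p in src_pts) / n
    my = sum(p[1] for p in src_pts) / n
    ux = sum(p[0] for p in dst_pts) / n
    uy = sum(p[1] for p in dst_pts) / n
    num = sum((sx - mx) * (dx - ux) + (sy - my) * (dy - uy)
              for (sx, sy), (dx, dy) in zip(src_pts, dst_pts))
    den = sum((sx - mx) ** 2 + (sy - my) ** 2 for sx, sy in src_pts)
    s = num / den
    return s, ux - s * mx, uy - s * my

# Toy check: dst = 2 * src + (1, -1) is recovered exactly.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(1.0, -1.0), (3.0, -1.0), (1.0, 1.0), (3.0, 1.0)]
print(fit_scale_translation(src, dst))  # (2.0, 1.0, -1.0)
```

Averaging the fit over many frames would make the recovered s, tx, ty robust to per-frame fitting noise.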

About face detector

Hi,
I noticed that in this project, DECA uses FAN as the face detector. Could you please tell me which method you chose for face detection? I think this part takes the majority of the time, so improving it could raise the fps.
Thanks!
Best regards,

Question about the face bounding box of VoxCeleb1

Hi,
Thanks for sharing such an amazing project! As for the pre-processing, it is said that you follow the video-preprocessing scripts. However, in their code, they use the bounding box shared in the meta-csv file, which is different from the original bounding box from the VoxCeleb1 dataset, as discussed in issue21.

Therefore, may I ask which bounding box you use to obtain the cropped images?

Besides, the original VoxCeleb1 dataset contains 1k+ subjects in the training set and 40+ subjects in the testing set.
However, the meta-file of video-preprocessing only provides 400+ subjects, which is a subset of the original VoxCeleb1 dataset.
Thus, may I ask which dataset you used in the training examples? The dataset with 1k+ subjects or the subset from video-preprocessing?

Looking forward to your reply. Thanks in advance!
