netease-gameai / face2facerho
The Official PyTorch Implementation for Face2Face^ρ (ECCV2022)
License: BSD 3-Clause "New" or "Revised" License
Hi, nice work and congrats!
Wondering where I can find the paper. Could you provide the paper link?
Thanks!
When I used a face as the source image and a video to drive it, the results were poor, especially in places where the source image and the video were not aligned, such as side faces. What could cause this? Thanks!
I use this picture above as the source image.
and use this picture above as the driving image.
The facial features are relatively similar, but the result is very poor, as in the following figure.
If the source image remains unchanged and the driving image is changed to the following figure,
the result becomes even worse, as shown in the figure below it.
Thank you very much for your reply about dataset size in issue 20; it helped me a lot. However, my questions about the cropping bounding box were not answered. Since you have closed that issue, I am opening a new one and look forward to your reply.
Which bounding box did you use to crop the face region from the original videos?
1) Directly the ones from "vox-metadata.csv" provided by video-preprocessing,
2) or the original bounding boxes provided officially by VoxCeleb1?
Specifically,
the official VoxCeleb1 bounding boxes are all square, and your dataset demo seems to use the official VoxCeleb1 square boxes.
However, the meta csv file contains many non-square boxes. As discussed in this issue, they resize non-square boxes to square, which introduces distortion. For example, for the image in your demo, directly using the meta.csv from video-preprocessing produces a heavily squashed image like that.
So I would like to confirm which bounding box you used when cropping the images. Thanks!
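A common way to avoid the squashing described above is to expand a non-square box to a square before cropping, instead of resizing a rectangular crop to a square. The sketch below is a minimal illustration under that assumption; the repo's actual preprocessing (e.g. border padding behaviour) may differ.

```python
# Sketch: expand a non-square (x1, y1, x2, y2) box to a square centred on the
# same region, so the crop can then be resized (e.g. to 256x256) without
# distortion. Clamping by shifting the box back inside the frame is a
# simplification; the official preprocessing may instead pad at the borders.

def square_bbox(x1, y1, x2, y2, frame_w, frame_h):
    w, h = x2 - x1, y2 - y1
    side = max(w, h)
    cx, cy = x1 + w / 2.0, y1 + h / 2.0
    nx1 = int(round(cx - side / 2.0))
    ny1 = int(round(cy - side / 2.0))
    # Shift the box back inside the frame instead of clipping one edge,
    # which would make it non-square again.
    nx1 = max(0, min(nx1, frame_w - side))
    ny1 = max(0, min(ny1, frame_h - side))
    return nx1, ny1, nx1 + side, ny1 + side
```

The resulting crop keeps the face's aspect ratio, so resizing it to the training resolution introduces no anisotropic distortion.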
Sorry, I could not find your paper 'Real-Time High-Resolution One-Shot Face Reenactment'. Could you give me a URL to the paper?
Thanks a lot.
May I ask where I could find the paper for this excellent work, as other repositories provide?
Hi, this is great work. I wonder if you could release a model at 256×256 resolution in addition to the 512×512 one?
Nice Work it is!
I cannot download the dataset when I press the download button on the website.
Could you please provide a direct link to the dataset?
Hi,
I noticed that the inputs to fitting.py are images, and I wonder if the code supports an image as the source input and a video as the driving input. Thanks in advance!
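Since fitting.py operates on images, one straightforward approach is to extract the video into frames first (e.g. `ffmpeg -i drive.mp4 frames/%05d.png`) and then fit the source once and each driving frame in turn. The `fit` and `reenact` callables below are hypothetical stand-ins for the repo's fitting and reenactment steps; the sketch only shows the loop structure.

```python
# Sketch of a driving loop over pre-extracted video frames.
# fit() and reenact() are hypothetical placeholders for the repo's
# fitting.py and reenactment steps: the source is fitted only once,
# while every driving frame is fitted and reenacted per frame.

def reenact_video(source_img, driving_frames, fit, reenact):
    src_params = fit(source_img)          # source fitted once, reused below
    outputs = []
    for frame in driving_frames:
        drv_params = fit(frame)           # per-frame driving fit
        outputs.append(reenact(src_params, drv_params))
    return outputs
```

Fitting the source a single time and reusing its parameters keeps the per-frame cost down to the driving fit plus the reenactment pass.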
Thank you for open-sourcing this work.
I ran the single-image reenactment test on a P40 GPU and recorded only the computation time for the driving image, as follows:
-> 1.1 DECA data loading during driving-image fitting: 1.7801 s
-> 1. Total driving-image fitting time: 1.8205 s
-> 2.1 Motion field computation: 0.1119 s
-> 3.3 rendering_net time: 0.3678 s
-> 3. Total reenactment time: 0.4834 s
Total time for driving-image computation: 1.8205 + 0.1119 + 0.3678 + 0.4834 = 2.7836 s
What could be the reason for this? Is the code outside of the fitting part identical to what is described in the paper?
It seems that we have to fix the camera position while generating training pairs for the talking-head dataset; otherwise it is hard to handle background movement in the generated results.
If so, what is the best way to generate and augment training data pairs?
When I apply scale, shift, and rotation augmentation during training, the driving results are as follows:
Hello,
Thank you for open-sourcing this work.
In the paper I read that a 2080 Ti can reach 25 fps at 1440×1440. With a 2080 Ti I currently cannot even reach 15 fps (≈14 fps), and that is at 512×512 resolution. I would like to ask you about this.
Regarding how I measured the speed, this is what I did:
torch.cuda.synchronize()
start = time.time()
for i, img in enumerate(tqdm(path_list)):
    # ------------------ fitting ------------------------
    drv_params = drv_fitting.fitting(i)
    drv_headpose = pose_lml_extractor.get_pose(
        src_params['shape'], drv_params['exp'], drv_params['pose'],
        drv_params['scale'], drv_params['tx'], drv_params['ty'])
    drv_lmks = pose_lml_extractor.get_project_points(
        src_params['shape'], drv_params['exp'], drv_params['pose'],
        drv_params['scale'], drv_params['tx'], drv_params['ty'])
    # ------------------ reenactment ------------------
    drv_face = dict()
    drv_face = load_data(opt, drv_face, drv_headpose, drv_lmks)
    drv_face['landmark_img'] = landmark_img_generator.generate_landmark_img(drv_face['landmarks'])
    drv_face['landmark_img'] = [value.unsqueeze(0) for value in drv_face['landmark_img']]
    model.reenactment(src_face['landmark_img'], drv_face['headpose'].unsqueeze(0), drv_face['landmark_img'])
    visual_results = model.get_current_visuals()
torch.cuda.synchronize()
dur = time.time() - start
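One caveat when benchmarking CUDA code this way: the loop above times fitting and reenactment together, so it cannot tell which stage is slow, and a few warm-up iterations (model loading, cuDNN autotuning, DECA data caching) can inflate the average. A minimal per-section timing helper is sketched below; it is self-contained with a no-op `sync` default, and in the real script `torch.cuda.synchronize` would be passed as `sync` so queued GPU work is not attributed to the wrong stage.

```python
import time

# Minimal per-section timing helper (sketch). `sync` defaults to a no-op so
# the sketch is self-contained; in a real PyTorch script pass
# torch.cuda.synchronize, calling it before AND after the timed section.
# Discard the first few warm-up iterations before averaging fps.

def time_section(fn, sync=lambda: None):
    sync()
    start = time.perf_counter()
    result = fn()
    sync()
    return result, time.perf_counter() - start
```

Usage would look like `_, dt = time_section(lambda: model.reenactment(...), torch.cuda.synchronize)`, timing fitting and reenactment separately and averaging over frames after warm-up.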
DECA corresponds to the FLAME face model, which has over 5000 vertices, and this project selects 72 keypoints from them via landmark_embedding.json, while most face models only provide 68 keypoints. For the BFM face model, how can I find the vertex indices that yield the 72 keypoints this project expects?
If the 4 extra keypoints cannot be obtained directly, could they be derived as linear combinations of neighboring keypoints?
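If the missing points are approximated by linear combinations as suggested above, the mechanics are simple; the neighbour indices and weights in this sketch are purely illustrative (they are not taken from the repo's landmark_embedding.json), and for BFM you would pick anatomically adjacent points and tune the weights on a few annotated frames.

```python
# Sketch: synthesise a missing keypoint as a fixed convex combination of
# neighbouring keypoints. Neighbour indices and weights are illustrative
# placeholders, not the project's actual embedding.

def combine_landmarks(landmarks, neighbours, weights):
    # landmarks: list of (x, y) tuples; weights should sum to 1.
    x = sum(landmarks[i][0] * w for i, w in zip(neighbours, weights))
    y = sum(landmarks[i][1] * w for i, w in zip(neighbours, weights))
    return (x, y)
```

Because the combination is convex, the synthesised point always lies within the convex hull of its neighbours, which keeps it plausible under pose and expression changes.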
Is the maximum resolution generated by this model 512×512?
Do I need to retrain for higher resolutions?
As per the title. Thanks.
Hello, thank you very much for your excellent work! I noticed that both the landmarks and the head pose used in the code are inferred by the DECA model, and fitting.py ultimately writes their values into separate txt files. For training, I would like to replace DECA with a faster face network that outputs 68 points; how should I choose the corresponding landmarks (68 points) and head pose?
Great work! But when I test with my own data, the result does not look like me. How can I solve this?
Does it only work on faces?
The official DECA model outputs 68 2D points for a face, while your 3DMM has 72 2D points. I think data/landmark_embedding.json is your model's point-selection mapping. How can I map the official 68 DECA points to your 72 points? I ran test_case/source.jpg through the official DECA model and compared the official 68 points with your projected 72 points for 'source.jpg', but I cannot find the mapping between the two sets.
In this project, the head pose output by DECA is transformed by predefined parameters such as scale_transform so that it matches the input expected by face2face.
When I use the 3DDFA model instead, how should I transform its output translation offsets and scale factor so that they match the face2face input?
Hi,
I noticed that in this project DECA uses FAN as a face detector. Could you please tell me which method you chose for face detection? I think this part could take the majority of the time, and if we improve it, the fps could be higher.
Thanks!
best regards,
Hi,
Thanks for sharing such an amazing project! As for the pre-processing, it is said that you follow the video-preprocessing scripts. However, in their code, they use the bounding boxes shared in the meta csv file, which differ from the original bounding boxes of the VoxCeleb1 dataset, as discussed in issue21.
Therefore, may I ask which bounding box you use to obtain the cropped images?
Besides, the original VoxCeleb1 dataset contains 1k+ subjects in the training set and 40+ subjects in the testing set.
However, the meta-file of video-preprocessing only provides 400+ subjects, which is a subset of the original VoxCeleb1 dataset.
Thus, may I ask which dataset you used in the training examples? The dataset with 1k+ subjects or the subset from video-preprocessing?
Looking forward to your reply. Thanks in advance!