
3ddfa_v2's Introduction

Towards Fast, Accurate and Stable 3D Dense Face Alignment


By Jianzhu Guo, Xiangyu Zhu, Yang Yang, Fan Yang, Zhen Lei and Stan Z. Li. The code repo is owned and maintained by Jianzhu Guo.

[demo GIF]

[Updates]

  • 2021.7.10: Run 3DDFA_V2 online on Gradio.
  • 2021.1.15: Borrowed the implementation from Dense-Head-Pose-Estimation for faster mesh rendering (about 3x speedup, 15ms -> 4ms); see utils/render_ctypes.py for details.
  • 2020.10.7: Added a latency evaluation of the full pipeline in latency.py; run it with python3 latency.py --onnx, see Latency evaluation for details.
  • 2020.10.6: Added onnxruntime support for FaceBoxes to reduce the face detection latency; append the --onnx flag to activate it, see FaceBoxes_ONNX.py for details.
  • 2020.10.2: Added onnxruntime support to greatly reduce the 3DMM parameter inference latency; append the --onnx flag when running demo.py, see TDDFA_ONNX.py for details.
  • 2020.9.20: Added pose estimation and serialization to .ply and .obj; see the pose, ply, obj options in demo.py.
  • 2020.9.19: Added PNCC (Projected Normalized Coordinate Code) and UV texture mapping; see the pncc, uv_tex options in demo.py.

Introduction

This work, named 3DDFA_V2 and titled Towards Fast, Accurate and Stable 3D Dense Face Alignment, extends 3DDFA and was accepted to ECCV 2020. The supplementary material is here. The GIF above shows a webcam demo of the tracking result, recorded in my lab. This repo is the official implementation of 3DDFA_V2.

Compared to 3DDFA, 3DDFA_V2 achieves better performance and stability. In addition, 3DDFA_V2 incorporates the fast face detector FaceBoxes instead of Dlib. A simple 3D renderer written in C++ and Cython is also included. This repo supports onnxruntime; with the default backbone, regressing the 3DMM parameters takes about 1.35ms per image on CPU with a single image as input. If you are interested in this repo, just try it on this Google Colab! Issues, PRs and discussions are welcome 😄

Getting started

Requirements

See requirements.txt; tested on macOS and Linux. Windows users may refer to the FAQ for build issues. Note that this repo uses Python 3. The major dependencies are PyTorch, numpy, opencv-python and onnxruntime. If you run the demos with the --onnx flag for acceleration, you may need to install libomp first, e.g., brew install libomp on macOS.

Usage

  1. Clone this repo
git clone https://github.com/cleardusk/3DDFA_V2.git
cd 3DDFA_V2
  2. Build the Cython versions of NMS, Sim3DR, and the faster mesh renderer
sh ./build.sh
  3. Run the demos
# 1. running on a still image; the options include: 2d_sparse, 2d_dense, 3d, depth, pncc, pose, uv_tex, ply, obj
python3 demo.py -f examples/inputs/emma.jpg --onnx # -o [2d_sparse, 2d_dense, 3d, depth, pncc, pose, uv_tex, ply, obj]

# 2. running on videos
python3 demo_video.py -f examples/inputs/videos/214.avi --onnx

# 3. running on videos smoothly by looking ahead by `n_next` frames
python3 demo_video_smooth.py -f examples/inputs/videos/214.avi --onnx

# 4. running on webcam
python3 demo_webcam_smooth.py --onnx

Tracking is implemented simply by alignment: if the head pose exceeds 90° or the motion is too fast, the alignment may fail. A threshold is used as a rough heuristic to check the tracking state, but it is unstable.
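Below is a minimal, hedged sketch of this tracking-by-alignment idea with a crude spread threshold. The callables detect, align and roi_from_landmarks are placeholders supplied by the caller (e.g. wrapping FaceBoxes and TDDFA); they are not part of this repo's API.

import numpy as np

def track_by_alignment(frames, detect, align, roi_from_landmarks, min_spread=1e-3):
    roi = None
    for frame in frames:
        if roi is None:
            boxes = detect(frame)              # full detection only when tracking is lost
            if len(boxes) == 0:
                yield None
                continue
            roi = boxes[0]
        landmarks = align(frame, roi)          # alignment inside the previous ROI
        # Crude tracking-state check: if the landmarks collapse to (almost) a point,
        # assume the alignment failed and fall back to detection on the next frame.
        if np.asarray(landmarks).std() < min_spread:
            roi = None
            yield None
        else:
            roi = roi_from_landmarks(landmarks)  # reuse the aligned box as the next ROI
            yield landmarks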

You can refer to demo.ipynb or the Google Colab for a step-by-step tutorial on running on a still image.

For example, running python3 demo.py -f examples/inputs/emma.jpg -o 3d will give the result below:

[demo image]

Another example:

[demo image]

Running on a video will give:

[demo image]

More results and demos: Hathaway.

Features (up to now)

  • 2D sparse (2d_sparse)
  • 2D dense (2d_dense)
  • 3D (3d)
  • Depth (depth)
  • PNCC (pncc)
  • UV texture (uv_tex)
  • Pose (pose)
  • Serialization to .ply (ply)
  • Serialization to .obj (obj)

Configs

The default backbone is MobileNet_V1 with input size 120x120, and the default pre-trained weights are weights/mb1_120x120.pth, as specified in configs/mb1_120x120.yml. This repo provides another config, configs/mb05_120x120.yml, with a width multiplier of 0.5, which is smaller and faster. You can specify the config with the -c or --config option. The released models are shown in the table below. Note that the CPU inference time reported in the paper was evaluated using TensorFlow.

Model | Input | #Params | #Macs | Inference (TF)
MobileNet | 120x120 | 3.27M | 183.5M | ~6.2ms
MobileNet x0.5 | 120x120 | 0.85M | 49.5M | ~2.9ms
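To select a config programmatically, the snippet below is a hedged sketch following the pattern used by demo.py; the exact constructor arguments (e.g. gpu_mode) may differ from the current code.

import cv2
import yaml

from FaceBoxes import FaceBoxes
from TDDFA import TDDFA

cfg = yaml.load(open('configs/mb1_120x120.yml'), Loader=yaml.SafeLoader)  # or configs/mb05_120x120.yml
tddfa = TDDFA(gpu_mode=False, **cfg)
face_boxes = FaceBoxes()

img = cv2.imread('examples/inputs/emma.jpg')
boxes = face_boxes(img)                                               # face detection
param_lst, roi_box_lst = tddfa(img, boxes)                            # 3DMM parameter regression
ver_lst = tddfa.recon_vers(param_lst, roi_box_lst, dense_flag=False)  # 68 sparse landmarks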

Surprisingly, the latency with onnxruntime is much lower. The inference time on CPU with different numbers of threads is shown below. The results were measured on my MBP (i5-8259U CPU @ 2.30GHz, 13-inch MacBook Pro) with onnxruntime 1.5.1. The thread count is set via os.environ["OMP_NUM_THREADS"]; see speed_cpu.py for details.

Model | THREAD=1 | THREAD=2 | THREAD=4
MobileNet | 4.4ms | 2.25ms | 1.35ms
MobileNet x0.5 | 1.37ms | 0.7ms | 0.5ms

Latency

The --onnx option greatly reduces the overall CPU latency, but face detection still accounts for most of it, e.g., about 15ms for a 720p image. 3DMM parameter regression takes about 1~2ms per face, and the dense reconstruction (38,365 points) takes about 1ms per face. Tracking applications can benefit from the fast 3DMM regression, since detection is not needed for every frame. The latency was measured on my 13-inch MacBook Pro (i5-8259U CPU @ 2.30GHz).

The default OMP_NUM_THREADS is 4. You can override it by setting os.environ['OMP_NUM_THREADS'] = '$NUM' in the script, or by running export OMP_NUM_THREADS=$NUM before launching the Python script.
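As a hedged example (the TDDFA_ONNX class name follows TDDFA_ONNX.py mentioned in the updates; constructor details may differ), the thread count must be set before onnxruntime is imported:

import os
os.environ['OMP_NUM_THREADS'] = '4'  # must be set before importing onnxruntime

import time
import cv2
import yaml
from TDDFA_ONNX import TDDFA_ONNX

cfg = yaml.load(open('configs/mb1_120x120.yml'), Loader=yaml.SafeLoader)
tddfa = TDDFA_ONNX(**cfg)

img = cv2.imread('examples/inputs/emma.jpg')
boxes = [[150, 150, 400, 400, 0.99]]  # a dummy [x1, y1, x2, y2, score] box for illustration
tic = time.time()
param_lst, roi_box_lst = tddfa(img, boxes)
print(f'3DMM regression: {(time.time() - tic) * 1000:.2f}ms')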

[demo image]

FAQ

  1. What is the training data?

    We use 300W-LP for training; refer to our paper for more details about training. Since 300W-LP contains few closed-eye images, the eye landmarks are not accurate when the eyes are closed. The eye region in the webcam demo is also not great.

  2. Running on Windows.

    You can refer to this comment for building NMS on Windows.

Acknowledgement

  • The FaceBoxes module is modified from FaceBoxes.PyTorch.
  • A list of previous works on 3D dense face alignment or reconstruction: 3DDFA, face3d, PRNet.
  • Thanks to AK391 for hosting the Gradio web app.

Other implementations or applications

Citation

If your work or research benefits from this repo, please cite the two BibTeX entries below : ) and 🌟 this repo.

@inproceedings{guo2020towards,
    title =        {Towards Fast, Accurate and Stable 3D Dense Face Alignment},
    author =       {Guo, Jianzhu and Zhu, Xiangyu and Yang, Yang and Yang, Fan and Lei, Zhen and Li, Stan Z},
    booktitle =    {Proceedings of the European Conference on Computer Vision (ECCV)},
    year =         {2020}
}

@misc{3ddfa_cleardusk,
    author =       {Guo, Jianzhu and Zhu, Xiangyu and Lei, Zhen},
    title =        {3DDFA},
    howpublished = {\url{https://github.com/cleardusk/3DDFA}},
    year =         {2018}
}

Contact

Jianzhu Guo (郭建珠) [Homepage, Google Scholar]: [email protected] or [email protected] or [email protected] (this email will be invalid soon).

3ddfa_v2's People

Contributors

ak391, cleardusk, conorturner


3ddfa_v2's Issues

What makes such a lightweight backbone work so well?

Compared to the previous version of your work, 3DDFA, 3DDFA_V2's structure is much simpler but achieves better results. I wonder whether the meta-joint loss is what enables MobileNet to outperform previous works. I would also like to know your opinion on applying these methods (look-ahead, combining different losses) to other tasks.

Resnet weights

Hello there! Thank you for your excellent work. I was curious whether you have any plans to provide weights for ResNet?

Extracting UV textures in the video for all the frames

Describe the bug
Extracting UV textures for multiple frames in a video gives a black image after the first frame: the output becomes greyish after the first frame and eventually turns black.

To Reproduce
add the line
uv_tex(img, ver_lst, tddfa.tri, show_flag=args.show_flag, wfp=wfp)
at line 131 in demo_video_smooth.py, below
elif args.opt == '3d'

Expected behavior
To produce UV textures for all frames in the video.

models

What is the input shape of the models, and which folder contains the models' weight files?
Thank you.

scipy.io.loadmat fail, decompressing error.


File "mio5_utils.pyx", line 548, in scipy.io.matlab.mio5_utils.VarReader5.read_full_tag File "mio5_utils.pyx", line 556, in scipy.io.matlab.mio5_utils.VarReader5.cread_full_tag File "streams.pyx", line 176, in scipy.io.matlab.streams.ZlibInputStream.read_into File "streams.pyx", line 163, in scipy.io.matlab.streams.ZlibInputStream._fill_buffer zlib.error: Error -2 while decompressing data: inconsistent stream state

When I run demo.py, an error occurs on the line from utils.uv import uv_tex. I searched on Baidu and Google but found nothing. Has anyone seen this? Please share.

The test results are not very satisfactory

I ran the demo and found the landmark detection on video is not very stable, and the demo with the smoothing filter localizes inaccurately when blinking. It is even less stable than my own landmark detection algorithm, which only costs 10M MACs and detects 106 landmarks.

Bounding box of AFLW2000-3D

Hi!
I am trying to calculate the NME of the AFLW2000-3D dataset. But I have no way to get the bounding box annotation from AFLW2000-3D. The "roi" information provided by the AFLW2000-3D dataset is strange. I think it is not the bounding box of faces.
Could you tell me how to get the bounding box annotation of the AFLW2000-3D dataset?
[image]

Algorithm 2 of the paper

[image: Algorithm 2 of the paper]

In the red circle shown above, why do you divide N by 3?

N is the number of points; T(:, 4) has three elements, representing the displacement of the coordinates in the three dimensions.

If we take the first predicted displacement as Tx and its corresponding ground-truth displacement as Tgx, then the resulting difference in the reconstructed 3D shape should be:
||(Tx-Tgx, Tx-Tgx, ..., Tx-Tgx)|| = |Tx-Tgx|·√N

Is there any mistake in my derivation above?
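For readability, the derivation in this question can be written out as follows (this restates the questioner's reasoning in LaTeX, not the paper's notation):

% The x-displacement T_x is shared by all N points, so the norm of the
% per-point differences collapses to a single scalar times sqrt(N):
\left\| \bigl( T_x - T_x^{gt},\; T_x - T_x^{gt},\; \dots,\; T_x - T_x^{gt} \bigr) \right\|
  = \sqrt{N \left( T_x - T_x^{gt} \right)^2}
  = \left| T_x - T_x^{gt} \right| \sqrt{N}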

Can you tell me which variable stores the landmarks?

Hello, I would like to know whether face recognition is possible using this code. Can you tell me which file and variable store the facial landmarks?

I am trying to perform face recognition by feeding the landmark variable into a face recognition model. Is it possible?

landmark inaccuracy on the RAVDESS dataset


Hi, this project is really useful for several downstream tasks.
Currently, I'm using 3DDFA_V2 to reconstruct some talking faces from the RAVDESS dataset.
This is a very clean in-lab dataset with high-resolution heads against a white background.
However, the reconstruction accuracy does not seem good; several landmarks on the lips are not aligned.
Here are some examples.
This is the original image: [original image]
This is the reconstructed image: [reconstructed image]
Obviously, the lips are closed in the original image but open in the reconstructed one.
I'm wondering whether I should adjust some parameters when running 3D reconstruction on videos.

How to visualise dense landmarks on videos?

I know I can use python3 demo_video.py -f examples/inputs/videos/214.avi --opt 3d to render dense landmarks, but how can I get the 3D dense landmarks alone without rendering? I am planning to use them for a facial recognition embedding.
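One possible way to get the raw dense vertices, offered as a hedged sketch: it follows the recon_vers call quoted in a later issue (dense_flag=True returns the dense reconstruction instead of the 68 sparse landmarks); the (3, N) vertex layout is an assumption to verify against the repo.

import numpy as np

def dense_vertices_for_frame(frame, face_boxes, tddfa):
    boxes = face_boxes(frame)
    if len(boxes) == 0:
        return None
    param_lst, roi_box_lst = tddfa(frame, boxes)
    # dense_flag=True -> full vertex set rather than 68 sparse landmarks
    ver_lst = tddfa.recon_vers(param_lst, roi_box_lst, dense_flag=True)
    return np.asarray(ver_lst[0])  # vertices of the first detected face, roughly (3, N)

# e.g.: np.save('frame_000_vertices.npy', dense_vertices_for_frame(frame, face_boxes, tddfa))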

'Namespace' object has no attribute 'dense_flag'

Thanks for releasing the code.
When running "python3 demo.py -f examples/inputs/emma.jpg", I get
'Namespace' object has no attribute 'dense_flag'
I found this is because of this line in demo.py https://github.com/cleardusk/3DDFA_V2/blob/master/demo.py#L44
parser.add_argument('--dense_flg', default='true', type=str2bool, help='whether reconstructing dense')
but in line 32, https://github.com/cleardusk/3DDFA_V2/blob/master/demo.py#L32
ver_lst = tddfa.recon_vers(param_lst, roi_box_lst, dense_flag=args.dense_flag)
Obviously, args.dense_flag does not match '--dense_flg'.
Maybe this should be fixed.

Build failure

platform:macOS catalina 10.15.7 (19H2)
gcc version:

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.0 (clang-1200.0.32.27)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 (pytorch) quanhaoguo@QuanhaodeMacBook-Pro 3DDFA_V2 % sh ./build.sh
running build_ext
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
running build_ext
skipping 'lib/rasterize.cpp' Cython extension (up-to-date)
building 'Sim3DR_Cython' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/quanhaoguo/anaconda3/envs/pytorch/include -arch x86_64 -I/Users/quanhaoguo/anaconda3/envs/pytorch/include -arch x86_64 -I/Users/quanhaoguo/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/core/include -I/Users/quanhaoguo/anaconda3/envs/pytorch/include/python3.6m -c lib/rasterize.cpp -o build/temp.macosx-10.7-x86_64-3.6/lib/rasterize.o -std=c++11
clang: warning: include path for libstdc++ headers not found; pass '-stdlib=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
In file included from lib/rasterize.cpp:624:
In file included from /Users/quanhaoguo/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/quanhaoguo/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:12:
In file included from /Users/quanhaoguo/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:
/Users/quanhaoguo/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: 
      "Using deprecated NumPy API, disable it with "          "#define
      NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it with " \
 ^
In file included from lib/rasterize.cpp:629:
lib/rasterize.h:5:10: fatal error: 'cmath' file not found
#include "cmath"
         ^~~~~~~
1 warning and 1 error generated.
error: command 'gcc' failed with exit status 1

ValueError: Buffer dtype mismatch, expected 'int_t' but got 'long long'

The line xx1 = np.maximum(x1[i], x1[order[1:]]) in FaceBoxes\utils\nms\py_cpu_nms.py (shown below) throws this error:

import numpy as np

def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices of boxes sorted by descending score

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # keep only the boxes whose overlap with box i is below the threshold
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    return keep
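A minimal usage sketch of the function above with illustrative values (the explicit float64 cast is a hedged guess at avoiding the dtype mismatch mentioned in the title, not a confirmed fix):

import numpy as np

# each row is [x1, y1, x2, y2, score]
dets = np.array([
    [10.0, 10.0, 110.0, 110.0, 0.95],
    [12.0, 12.0, 108.0, 112.0, 0.90],   # heavily overlaps the first box
    [200.0, 200.0, 260.0, 260.0, 0.80],
], dtype=np.float64)

keep = py_cpu_nms(dets, thresh=0.5)
print(keep)  # [0, 2]: the second box is suppressed by the first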

meta-joint

Hi, I implemented meta-joint training and found that VDC dominates in the early stage, because the VDC loss converges from about 493 to 200 very quickly. This is the opposite of the paper's conclusion that fWPDC dominates in the early stage and VDC guides the late stage. Any suggestions? @cleardusk

About the output

Is it possible to export the result as a mesh file with texture? Thank you.

Feature request: add onnxruntime inference

Feature request

I have benchmarked the onnxruntime library and found its latency (with MobileNet in our case) is rather small. However, my personal time is limited, so I hope anyone interested in this repo can contribute by adding onnxruntime inference : )

The onnxruntime tutorial is here.

About test accuracy

First of all, thanks for such excellent work. Referring to the work on 3DDFA_V1, plus training data I synthesized myself, the test accuracy of the model has greatly improved. The test accuracy is as follows:
[image: test accuracy results]
Using a MobileNetV2 model, the training method is the same as 3DDFA_V1; the number of shape parameters is 60 and the number of expression parameters is 29. With a larger model the accuracy improves further. The actual test stability has also greatly improved.
Therefore, I look forward to the open-sourcing of the 3DDFA_V2 training code. Thank you very much.

Align the coordinates of 3D landmarks

Hey, congrats on this beautiful work; very interesting.

I was wondering if there is a way to:

  1. get a 3D representation of the landmarks, i.e. (x, y, z) coordinates for each landmark point;
  2. align the landmark arrays of different faces to a reference set of landmarks.

Ideally I would like all the 3D landmark coordinates aligned to the same reference, for example so that every set of landmark coordinates corresponds to a frontal face with similar proportions.
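A hedged sketch of one common way to do point 2: similarity (Procrustes/Umeyama) alignment of an (N, 3) landmark array onto a reference set. This is generic NumPy, not something provided by this repo; for point 1, the (x, y, z) coordinates themselves come from the reconstruction output (see the recon_vers-based sketches earlier on this page).

import numpy as np

def align_to_reference(landmarks, reference):
    # Align landmarks (N, 3) onto reference (N, 3) with a similarity transform
    # (scale + rotation + translation), i.e. the Umeyama/Procrustes method.
    mu_src, mu_dst = landmarks.mean(0), reference.mean(0)
    src, dst = landmarks - mu_src, reference - mu_dst
    U, S, Vt = np.linalg.svd(src.T @ dst)       # SVD of the cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))          # guard against reflections
    rot = U @ np.diag([1.0, 1.0, d]) @ Vt       # rotation applied to row vectors: x @ rot
    scale = (S * np.array([1.0, 1.0, d])).sum() / (src ** 2).sum()
    return scale * src @ rot + mu_dst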

Training detail for 3DDFA_V2

I am looking for the training details of this model, but they are not provided with the code base. I referred to 3DDFA but the results were not good. Could you please provide the training details?

Thanks,

Ear, Neck and Landmark Visibility

Hi,

Thanks for sharing the great work!

Here I have a few questions:

  1. In the old 3DDFA (3DDFA_V1), the reconstructed vertices cover the ear and neck regions, but they are excluded in the new 3DDFA_V2. However, in some applications we would like the details around those regions; may I know whether there is an easy way to adapt the new 3DDFA_V2 model to cover the ears and neck?

  2. In the old 3DDFA paper (CVPR version), the authors visualized the visibility of each landmark. I am wondering how to get the landmark visibility from the reconstructed vertices; could you please give me some hints?

Looking forward to your reply!
Best regards.

Eval on AFLW2000

Hi, Jianzhu,
I evaluated 3DDFA_V2 on AFLW2000 with the script from 3DDFA, and the results are:
[ 0, 30] Mean: 2.735, Std: 1.127
[30, 60] Mean: 3.477, Std: 1.431
[60, 90] Mean: 4.543, Std: 1.961
[ 0, 90] Mean: 3.585, Std: 0.742

The model seems like MobileNet(M+R) in your paper, right?
[image]

Head Generation

Suppose I have obtained the 3D face using 3DDFA. How could I generate the whole (bald) head without hair?
Does anyone have a good idea? Thanks a lot!

NME metric

Hello, thanks for your excellent work!

The NME on different datasets is an important metric. For a fair comparison, could you share the code for calculating the NME? Is there any official code for the NME metric and for the visibility vector shown in "Pose-Invariant 3D Face Alignment" (ICCV 2015), as follows:
[images]

Looking forward to your reply. Good luck!
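For reference, a hedged sketch of the NME variant commonly used on AFLW2000-3D (mean point-to-point error normalized by the square root of the ground-truth bounding-box area); this is a generic implementation, not this repo's official evaluation code:

import numpy as np

def nme_bbox(pred, gt):
    # pred, gt: (68, 2) landmark arrays for one image.
    # Normalization: sqrt(w * h) of the ground-truth landmark bounding box.
    minv, maxv = gt.min(axis=0), gt.max(axis=0)
    norm = np.sqrt((maxv[0] - minv[0]) * (maxv[1] - minv[1]))
    return np.linalg.norm(pred - gt, axis=1).mean() / norm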

How to make expressions more expressive?

Hi, the model is very accurate for face shape.
However, it seems to have only 10 parameters for expressions.
Eye and eyebrow motions are not well captured; mouth expressions are good.
How can this be improved? Do I need to train a model myself with more expression parameters?
Thank you very much for your reply.

Question about the function `parse_roi_box_from_bbox`

Thanks for your great work!
When I read your released code at this place:

def parse_roi_box_from_bbox(bbox):

I could not understand the purpose of this operation. Of course, the face crop should be square.

But I guess this is done because the face detector tends to cover only the upper part of the face, so you shift the bounding box down, is that right? By the way, are the hyper-parameters in this function, such as 0.14 and 1.58, chosen empirically?

Looking forward to your reply!
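For context, here is a hedged reconstruction of what such a function plausibly does, built around the 0.14 and 1.58 constants mentioned above (enlarge the detector box into a square ROI and shift it downward); the actual implementation in the repo may differ.

def parse_roi_box_from_bbox(bbox):
    # Turn a detector box (x1, y1, x2, y2, ...) into a square ROI, enlarged by
    # ~1.58x and shifted down by ~0.14x of the box size, so the crop also covers
    # the chin/jaw region that detectors often cut off.
    left, top, right, bottom = bbox[:4]
    old_size = (right - left + bottom - top) / 2
    center_x = right - (right - left) / 2.0
    center_y = bottom - (bottom - top) / 2.0 + old_size * 0.14   # shift downward
    size = int(old_size * 1.58)                                   # enlarge to a square
    return [center_x - size / 2, center_y - size / 2,
            center_x + size / 2, center_y + size / 2]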

sh ./build.sh help me

Hello, I am a student just starting with machine learning. I am trying to run the uploaded code after installing Python with Anaconda.
At the sh ./build.sh step, I cannot use sh because I am in a Windows environment.
If this code can be run on Windows, I would appreciate it if you could tell me how.

how to get the frontal face?

Given a face photo in a large pose, how can I obtain a frontal face picture using this 3D model?
Has anyone already implemented this?
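A hedged sketch of the usual starting point: undo the estimated head rotation on the dense vertices and re-render. The (3, N) vertex layout and the availability of a rotation matrix R (e.g. derived from the pose-estimation output) are assumptions, not guarantees from this repo.

import numpy as np

def frontalize_vertices(vertices, R):
    # vertices: (3, N) dense vertices from the reconstruction;
    # R: (3, 3) estimated head rotation matrix.
    # Returns the vertices with the head rotation undone (roughly frontal).
    center = vertices.mean(axis=1, keepdims=True)
    return R.T @ (vertices - center) + center   # R is orthonormal, so R.T == inverse(R)

# A frontal *image* would additionally require re-rendering the rotated mesh
# with the original texture, e.g. with the included Sim3DR renderer.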
