Comments (11)
How was the file mean_pts3d.npy created?
Is it the mean of these points across the dataset?
from livespeechportraits.
It is the average of the landmarks over all video frames of the target person.
- Using 3D landmarks obtained by face tracking has several advantages over directly using detected 2D landmarks. It disentangles the camera parameters, head pose, and facial movements, allowing explicit control over each of them, which raw 2D landmarks cannot provide. Besides, it is much easier for networks to learn normalized facial movements (in 3D space) than entangled landmarks, which yields more accurate results.
- 'mean_pts3d.npy' stores the mean 3D landmarks of the target person over the training set. The network learns displacements from this mean instead of absolute locations.
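The mean-and-displacement idea above can be sketched in a few lines. The array shapes here (N frames, 73 landmarks, 3 coordinates) and the random data are illustrative assumptions, not the project's actual preprocessing code:

```python
import numpy as np

# Hypothetical stand-in for the tracked 3D landmarks of the target person:
# N video frames x 73 landmarks x 3 coordinates.
rng = np.random.default_rng(0)
pts3d = rng.standard_normal((1000, 73, 3)).astype(np.float32)

# Mean 3D landmarks over all training frames (what mean_pts3d.npy would store).
mean_pts3d = pts3d.mean(axis=0)          # shape (73, 3)

# The network predicts displacements from this mean, not absolute positions.
displacements = pts3d - mean_pts3d       # shape (1000, 73, 3)

# Absolute landmarks are recovered by adding the mean back.
recovered = displacements + mean_pts3d
```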
Hope the above helps.
Thank you so much for your reply, it really helped.
I have another question about the sequence length. In the code you define the sequence length to be 240:
parser.add_argument('--sequence_length', type=int, default=240, help='length of training frames in each iteration')
But in the paper, 240 appears as the batch size ("T = 240 represents the number of consecutive frames sent to the model at each iteration", and the number of samples in an iteration is the batch size). Does that mean it is the batch size?
If 240 is the sequence length, then the audio features sent during training would be [B, T, ndim] = 32 (batch size) x 240 (seq_length) x 512 (ndim).
Or do you mean that you send batches where each sample contains a sequence of length 240?
thanks in advance
Of course the latter. The LSTM is a kind of RNN, so it takes sequential data as input; 240 frames equal 4 seconds under the 60 FPS setting.
batch_size means how many sequences (each 240 frames of data) are sent in each forward pass.
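Putting the two quantities together, the shape of one training batch of audio features would look like the sketch below. The concrete numbers (32, 240, 512) are taken from the discussion above; the tensor itself is just a placeholder:

```python
import numpy as np

# Sizes from the discussion: 32 sequences per batch, each 240 frames
# (4 s at 60 FPS), with 512-dimensional audio features per frame.
batch_size, seq_len, ndim = 32, 240, 512

# One training batch of audio features: [B, T, ndim].
# seq_len is the temporal length the LSTM unrolls over;
# batch_size is how many such sequences go through one forward pass.
audio_batch = np.zeros((batch_size, seq_len, ndim), dtype=np.float32)

print(audio_batch.shape)  # (32, 240, 512)
```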
Thank you so much, I don't know how to thank you, you really helped me!
I have another question regarding training. I understand that every sequence of 240 frames (4 s) outputs a vector of size (25, 3).
This vector represents the displacement between the landmarks of the last frame and the mean landmark positions, is that right?
If that is right, do you then walk through the data with a sliding window, i.e.
from frame 0 to frame 240
from frame 1 to frame 241
.
.
.
from frame 39 to frame 279
and this, for example, is the first batch, is that right?
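The sliding-window pattern described above can be sketched as follows. The window length and the number of windows per batch are assumptions taken from the numbers in this thread (240 frames, 40 start offsets), not the project's actual dataloader:

```python
# Hypothetical sketch of sliding-window batching: consecutive start
# frames, each giving one 240-frame training sequence.
seq_len = 240
windows_per_batch = 40

starts = list(range(windows_per_batch))        # frames 0, 1, ..., 39
windows = [(s, s + seq_len) for s in starts]   # (start, end) frame indices

print(windows[0])    # (0, 240)
print(windows[-1])   # (39, 279)
```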
And here:
A2Lsamples = self.audio_features[file_index][current_frame * 2 : (current_frame + self.seq_len) * 2]
I don't get why the * 2.
thanks in advance
First, an LSTM takes sequential data as input and its output is also sequential, so T input frames result in T output frames. Please carefully check the definition of LSTM networks. During training we use 4 seconds as the sequence length, while at test time there is no length limitation.
Secondly, the audio2mouth network learns the displacements.
Thirdly, the frame * 2 is because the APC audio feature interval is half of 1/60 s, i.e., there are two audio feature frames per video frame. Please check the paper for details.
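The * 2 indexing can be illustrated with a small sketch. The feature array size, dimensionality, and the `audio_window` helper are hypothetical; only the slicing formula mirrors the line quoted above:

```python
import numpy as np

# APC audio features are stored at twice the 60 FPS video rate,
# so a video frame index is multiplied by 2 when slicing them.
seq_len = 240                                              # 4 s of video at 60 FPS
audio_features = np.zeros((10000, 512), dtype=np.float32)  # 2 audio frames per video frame

def audio_window(current_frame: int) -> np.ndarray:
    # Mirrors: audio_features[current_frame * 2 : (current_frame + seq_len) * 2]
    return audio_features[current_frame * 2 : (current_frame + seq_len) * 2]

print(audio_window(0).shape)  # (480, 512) -> 2 * seq_len audio feature frames
```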
Thank you for your replies.
I am wondering, do you know any open-source algorithm for face tracking? I can't find one that produces the same output as your paper.
thanks in advance
Any parametric monocular face reconstruction method would be an alternative, like FaceScape, DECA, 3DDFA_v2, etc.
Any parametric monocular face reconstruction method would be an alternative, like FaceScape, DECA, 3DDFA_v2, etc.
Which method did you use? Could you please upload the code?
Related Issues (20)
- What is the meaning of implementing by C++? HOT 1
- On what logic are the four candidate photos selected?
- how can i use it in real time? HOT 1
- Does anyone implement the training code of this project? HOT 1
- How to run demo in "Real-time" HOT 1
- Can the matrix values produced by the model be mapped to ARKit?
- RuntimeError: Found no NVIDIA driver on your system.
- Great project, where does the author achieve real-time performance? HOT 2
- How do I generate my own model? Where do I import my own video footage to build it?
- How to train these models in custom dataset? Any documentation? HOT 1
- What tool did you use to create a sketch from a face image, in case i want to train the image to image transition model?
- 73 facial landmarks HOT 1
- FileNotFoundError: [Errno 2] No such file or directory: './data/May\\mean_pts3d.npy' HOT 1
- For the digital human technology discussion group, please contact VX: metahuman668
- GMMLogLoss for training audio2headpose
- training data download
- Is the Released Models Trained on Whole Video Clip?
- code for data processing, training HOT 2
- Where did REAL TIME go? Wasn't it supposed to produce output in real time from an audio stream? HOT 2
- Lip sync result HOT 1