Comments (2)
- Releasing the training code is not currently planned due to company policy. However, many of the parts needed for training are already included in the repo, e.g., the dataset/loss/model/options/utils files, which should make it easier to construct the training pipeline. For TensorRT, in brief, you can first convert the PyTorch models (.pkl) to ONNX files (.onnx) and then to TensorRT engines (*.trt).
- The FPS setting is just a choice. Previous work such as ATVG/NVP uses 25 FPS, while MakeItTalk uses 62.5 FPS. In theory, a higher FPS captures more speaking detail but also makes training harder: it requires more precise modeling of short audio windows as well as long-term consistency. It is therefore a trade-off between prediction precision and learning difficulty. If you want to train the model at a different FPS, many settings may need to be changed to get the best results.
- Landmarks are intermediate representations of the final rendering results, so you can of course edit them to control the final renderings, e.g., head-pose or mouth editing. If the edited landmarks fall far outside the span of the training corpus, the models degrade and performance gets worse -- that is a common issue with learning-based methods. Furthermore (perhaps unrelated to this issue), there is also a trade-off between generalization (one-shot methods, e.g., ATVG/MakeItTalk) and specialization (personalized methods, e.g., NVP/SynthesizingObama). The choice of method depends on your target requirements; after all, no current method does both best.
from livespeechportraits.
Got it, thanks for the reply. And thanks again for open-sourcing this excellent work!
from livespeechportraits.
Related Issues (20)
- What is the meaning of implementing it in C++?
- On what logic are the four candidate photos selected?
- How can I use it in real time?
- Has anyone implemented the training code for this project?
- How to run the demo in "real time"
- Can the matrix values produced by the model be mapped to ARKit?
- RuntimeError: Found no NVIDIA driver on your system.
- Great project; where does the author achieve real-time performance?
- How do I generate my own model? Where do I import my video footage to build my own model?
- How to train these models on a custom dataset? Any documentation?
- What tool did you use to create a sketch from a face image, in case I want to train the image-to-image translation model?
- 73 facial landmarks
- FileNotFoundError: [Errno 2] No such file or directory: './data/May\\mean_pts3d.npy'
- Digital-human technology discussion group: contact WeChat metahuman668
- GMMLogLoss for training audio2headpose
- Training data download
- Are the released models trained on the whole video clip?
- Code for data processing and training
- Where did REAL TIME go? Wasn't it supposed to produce output in real time from an audio stream?
- Lip sync result