Comments (2)
- Releasing the training code is not currently planned due to company policy. However, many of the parts needed for training are already included in the repo, e.g., the dataset/loss/model/options/utils files, which should make it easier to construct the training pipeline. For TensorRT, in brief, you can first convert the PyTorch models (.pkl) to ONNX files (.onnx) and then to TensorRT engines (*.trt).
- The FPS setting is just a choice. Previous work such as ATVG/NVP uses 25 FPS, while MakeItTalk uses 62.5 FPS. In theory, a higher FPS captures more speaking detail but also makes training harder: it requires more precise modeling of short audio windows as well as long-term consistency. It is therefore a trade-off between prediction precision and learning difficulty. If you want to train the model at a different FPS, many settings may need to be changed to get the best results.
- Landmarks are intermediate representations of the final rendering results, so you can of course edit them to control the final renderings, e.g., head-pose or mouth editing. If the edited landmarks fall far outside the span of the training corpus, the models degrade and performance gets worse -- that is a common issue with learning-based methods. Furthermore (perhaps unrelated to this issue), there is also a trade-off between generalization (one-shot methods, e.g., ATVG/MakeItTalk) and specialization (personalized methods, e.g., NVP/SynthesizingObama). The choice of method depends on your target requirements; after all, no current method does both best.
from livespeechportraits.
Got it, thanks for the reply. And thanks again for open-sourcing this excellent work!
from livespeechportraits.
Related Issues (20)
- What is the meaning of implementing it in C++?
- On what logic are the four candidate photos selected?
- How can I use it in real time?
- Has anyone implemented the training code for this project?
- How to run the demo in "real time"
- Can the matrix values produced by the model be mapped to ARKit?
- RuntimeError: Found no NVIDIA driver on your system.
- Great project; where does the author achieve real-time performance?
- How do I generate my own model? Where do I import my video footage to build my own model?
- How to train these models on a custom dataset? Any documentation?
- What tool did you use to create a sketch from a face image, in case I want to train the image-to-image translation model?
- 73 facial landmarks
- FileNotFoundError: [Errno 2] No such file or directory: './data/May\\mean_pts3d.npy'
- Digital-human technology discussion group: contact WeChat metahuman668
- GMMLogLoss for training audio2headpose
- Training data download
- Are the released models trained on the whole video clip?
- Code for data processing and training
- Where did REAL TIME go? Wasn't it supposed to produce output in real time from an audio stream?
- Lip sync result