This is an attempt at running SadTalker inference on Flyte using only CPUs. To achieve this, the inference code has been adapted from the original SadTalker implementation.
SadTalker generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.
Flyte is an orchestrator for data and ML workflows. It's a distributed processing platform that facilitates running highly concurrent workflows.
SadTalker is currently hosted on Hugging Face Spaces with an A10G GPU. The Flyte flavor was run on a Flyte instance deployed on AWS EKS using CPUs.
The SadTalker inference pipeline was executed on Flyte using the following default model parameters:
```python
from dataclasses import dataclass, field
from typing import List, Optional

from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class ModelParams:
    ref_pose: Optional[str] = None
    ref_eyeblink: Optional[str] = None
    result_dir: str = "results"
    pose_style: int = 0
    batch_size: int = 2
    expression_scale: float = 1.0
    input_yaw: List[int] = field(default_factory=lambda: [0])
    input_pitch: List[int] = field(default_factory=lambda: [0])
    input_roll: List[int] = field(default_factory=lambda: [0])
    enhancer: str = ""
    background_enhancer: str = "realesrgan"
    device: str = "cpu"
    still: bool = True
    preprocess: str = "crop"
    net_recon: str = "resnet50"
    use_last_fc: bool = False
    focal: float = 1015.0
    center: float = 112.0
    camera_d: float = 10.0
    z_near: float = 5.0
    z_far: float = 15.0
    bfm_folder: str = "./checkpoints/BFM_Fitting/"
    bfm_model: str = "BFM_model_front.mat"
    checkpoint_dir: str = (
        "https://github.com/Winfredy/SadTalker/releases/download/v0.0.2"
    )
```
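The per-run variants in the results below amount to overriding a few of these defaults. A minimal sketch using a plain dataclass (without the `@dataclass_json` decorator, showing only the overridden fields; `"gfpgan"` is an assumed enhancer value, not one confirmed by this document):

```python
from dataclasses import dataclass, replace


@dataclass
class ModelParams:
    # subset of the fields defined above
    enhancer: str = ""
    still: bool = True
    preprocess: str = "crop"


defaults = ModelParams()

# "Still=False + Enhancer + Preprocess=Full" run from the results table;
# "gfpgan" is an illustrative enhancer name, not confirmed by this document
enhanced_run = replace(defaults, still=False, enhancer="gfpgan", preprocess="full")
```

`dataclasses.replace` returns a new instance, so the defaults stay untouched between runs.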
The table below shows the estimated cost, execution time, and resources used for running SadTalker on Flyte:
| Image | Audio | Model Params | Execution time | Estimated cost | AWS Instance | vCPUs | Memory (GiB) | Actual hourly rate + Flyte deployment costs |
|---|---|---|---|---|---|---|---|---|
| | (3 sec) | Default args | 6m 23s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| | (8 sec) | Default args | 9m 58s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| | (8 sec) | Still=False + Preprocess=Full | 9m 40s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| | (8 sec) | Still=True + Enhancer + Preprocess=Full | 19m 8s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| | (25 sec) | Still=False + Enhancer + Preprocess=Full | 56m 3s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
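Given the table's hourly rate, a rough per-run EC2 cost can be derived from the execution time alone (a sketch that assumes per-second on-demand billing and excludes the unknown Flyte deployment overhead):

```python
# g4dn.2xlarge on-demand rate from the table, USD/hour
HOURLY_RATE = 0.752


def estimated_cost(minutes: int, seconds: int) -> float:
    """Approximate EC2 cost of a run, assuming per-second billing."""
    hours = (minutes * 60 + seconds) / 3600
    return round(HOURLY_RATE * hours, 3)


print(estimated_cost(6, 23))   # 3-sec audio, default args -> 0.08
print(estimated_cost(56, 3))   # 25-sec audio, enhanced run -> 0.702
```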
Several Flyte features have been utilized to optimize the SadTalker inference pipeline:
- Parallelism: Map tasks have been employed to execute the code in parallel wherever possible. This approach significantly reduced the execution time.
- Caching: The task that analyzes the audio input is cached; caching opportunities are otherwise limited, since most task outputs change from run to run.
- Load balancing: Load balancing is automatic with Flyte since it runs on top of Kubernetes.
- Scalability: Flyte handles concurrent requests with ease and scales up or down based on available resources, regardless of the number of executions.
- Efficient resource usage: Flyte allows for the allocation of resources based on task requirements, ensuring that there is no unnecessary allocation of resources.
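The fan-out pattern behind Flyte's map tasks can be sketched with a stdlib thread pool. This is only an analogy for the batching idea (the actual pipeline uses flytekit's `map_task`, which runs each batch in its own pod); the function and frame names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor


def render_batch(frames: list) -> list:
    # placeholder for the per-batch face-render task
    return [f"rendered_{f}" for f in frames]


def chunk(items: list, size: int) -> list:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


frames = list(range(10))
batches = chunk(frames, 2)  # batch_size=2, as in ModelParams

# fan the batches out in parallel, then flatten the results in order
with ThreadPoolExecutor() as pool:
    results = list(pool.map(render_batch, batches))
rendered = [f for batch in results for f in batch]
```

Because `pool.map` preserves input order, the flattened output matches the original frame order even though batches finish at different times.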