This is an attempt at running SadTalker inference on Flyte using only CPUs. To achieve this, the inference code has been adapted from the original SadTalker implementation.
SadTalker generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.
Flyte is an orchestrator for data and ML workflows. It's a distributed processing platform that facilitates running highly concurrent workflows.
SadTalker is currently hosted on Hugging Face Spaces with an A10G GPU. The Flyte flavor was run on a Flyte instance deployed on AWS EKS using CPUs.
The SadTalker inference pipeline was executed on Flyte using the following default model parameters:
```python
from dataclasses import dataclass, field
from typing import List, Optional

from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class ModelParams:
    ref_pose: Optional[str] = None
    ref_eyeblink: Optional[str] = None
    result_dir: str = "results"
    pose_style: int = 0
    batch_size: int = 2
    expression_scale: float = 1.0
    input_yaw: List[int] = field(default_factory=lambda: [0])
    input_pitch: List[int] = field(default_factory=lambda: [0])
    input_roll: List[int] = field(default_factory=lambda: [0])
    enhancer: str = ""
    background_enhancer: str = "realesrgan"
    device: str = "cpu"
    still: bool = True
    preprocess: str = "crop"
    net_recon: str = "resnet50"
    use_last_fc: bool = False
    focal: float = 1015.0
    center: float = 112.0
    camera_d: float = 10.0
    z_near: float = 5.0
    z_far: float = 15.0
    bfm_folder: str = "./checkpoints/BFM_Fitting/"
    bfm_model: str = "BFM_model_front.mat"
    checkpoint_dir: str = (
        "https://github.com/Winfredy/SadTalker/releases/download/v0.0.2"
    )
```
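The per-run variants in the results below amount to overriding a few of these defaults. A minimal sketch using a plain dataclass (without the `@dataclass_json` decorator, showing only the overridden fields; `"gfpgan"` is an assumed enhancer value, not one confirmed by this document):

```python
from dataclasses import dataclass, replace


@dataclass
class ModelParams:
    # subset of the fields defined above
    enhancer: str = ""
    still: bool = True
    preprocess: str = "crop"


defaults = ModelParams()

# "Still=False + Enhancer + Preprocess=Full" run from the results table;
# "gfpgan" is an illustrative enhancer name, not confirmed by this document
enhanced_run = replace(defaults, still=False, enhancer="gfpgan", preprocess="full")
```

`dataclasses.replace` returns a new instance, so the defaults stay untouched between runs.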
The table below shows the estimated cost, execution time, and resources used for running SadTalker on Flyte:
| Image | Audio | Model Params | Execution time | Estimated cost | AWS Instance | vCPUs | Memory (GiB) | Actual hourly rate + Flyte deployment costs |
|---|---|---|---|---|---|---|---|---|
| | (3 sec) | Default args | 6m 23s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| | (8 sec) | Default args | 9m 58s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| | (8 sec) | Still=False + Preprocess=Full | 9m 40s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| | (8 sec) | Still=True + Enhancer + Preprocess=Full | 19m 8s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| | (25 sec) | Still=False + Enhancer + Preprocess=Full | 56m 3s (Flyte Demo Link) | | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
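Given the table's hourly rate, a rough per-run EC2 cost can be derived from the execution time alone (a sketch that assumes per-second on-demand billing and excludes the unknown Flyte deployment overhead):

```python
# g4dn.2xlarge on-demand rate from the table, USD/hour
HOURLY_RATE = 0.752


def estimated_cost(minutes: int, seconds: int) -> float:
    """Approximate EC2 cost of a run, assuming per-second billing."""
    hours = (minutes * 60 + seconds) / 3600
    return round(HOURLY_RATE * hours, 3)


print(estimated_cost(6, 23))   # 3-sec audio, default args -> 0.08
print(estimated_cost(56, 3))   # 25-sec audio, enhanced run -> 0.702
```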
Several Flyte features have been utilized to optimize the SadTalker inference pipeline:
- Parallelism: Map tasks have been employed to execute the code in parallel wherever possible. This approach significantly reduced the execution time.
- Caching: The task that analyzes the audio input is cached; caching opportunities are otherwise limited, since most task outputs change from run to run.
- Load balancing: Load balancing is automatic with Flyte since it runs on top of Kubernetes.
- Scalability: Flyte handles concurrent requests with ease and scales up or down based on available resources, regardless of the number of executions.
- Efficient resource usage: Flyte allows for the allocation of resources based on task requirements, ensuring that there is no unnecessary allocation of resources.
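The fan-out pattern behind Flyte's map tasks can be sketched with a stdlib thread pool. This is only an analogy for the batching idea (the actual pipeline uses flytekit's `map_task`, which runs each batch in its own pod); the function and frame names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor


def render_batch(frames: list) -> list:
    # placeholder for the per-batch face-render task
    return [f"rendered_{f}" for f in frames]


def chunk(items: list, size: int) -> list:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


frames = list(range(10))
batches = chunk(frames, 2)  # batch_size=2, as in ModelParams

# fan the batches out in parallel, then flatten the results in order
with ThreadPoolExecutor() as pool:
    results = list(pool.map(render_batch, batches))
rendered = [f for batch in results for f in batch]
```

Because `pool.map` preserves input order, the flattened output matches the original frame order even though batches finish at different times.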