SadTalker (CVPR 2023) on Flyte

This project attempts to run SadTalker inference on Flyte using only CPUs. To make that possible, the inference code has been adapted from the original SadTalker inference code.

SadTalker

SadTalker generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.

Flyte

Flyte is an orchestrator for data and ML workflows. It's a distributed processing platform that facilitates running highly concurrent workflows.

Overview

SadTalker is currently hosted on Hugging Face Spaces on an A10G GPU. The Flyte version was run on a Flyte instance deployed on AWS EKS, using CPUs only.

The SadTalker inference pipeline was executed on Flyte using the following default model parameters:

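from dataclasses import dataclass, field
from typing import List, Optional

from dataclasses_json import dataclass_json


@dataclass_json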
@dataclass
class ModelParams:
    ref_pose: Optional[str] = None
    ref_eyeblink: Optional[str] = None
    result_dir: str = "results"
    pose_style: int = 0
    batch_size: int = 2
    expression_scale: float = 1.0
    input_yaw: List[int] = field(default_factory=lambda: [0])
    input_pitch: List[int] = field(default_factory=lambda: [0])
    input_roll: List[int] = field(default_factory=lambda: [0])
    enhancer: str = ""
    background_enhancer: str = "realesrgan"
    device: str = "cpu"
    still: bool = True
    preprocess: str = "crop"
    net_recon: str = "resnet50"
    use_last_fc: bool = False
    focal: float = 1015.0
    center: float = 112.0
    camera_d: float = 10.0
    z_near: float = 5.0
    z_far: float = 15.0
    bfm_folder: str = "./checkpoints/BFM_Fitting/"
    bfm_model: str = "BFM_model_front.mat"
    checkpoint_dir: str = (
        "https://github.com/Winfredy/SadTalker/releases/download/v0.0.2"
    )
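
The non-default runs reported for this project (for example "Still=False + Preprocess=Full") only change a few of these fields. The sketch below shows one way to derive such variants from the defaults with `dataclasses.replace`. It uses a trimmed stand-in class, `ModelParamsSketch`, which is hypothetical and copies only five of the fields above, and it assumes `"gfpgan"` as the enhancer name, since the source does not say which enhancer was used.

```python
from dataclasses import dataclass, replace


@dataclass
class ModelParamsSketch:
    """Hypothetical, trimmed copy of ModelParams with the same defaults."""

    batch_size: int = 2
    enhancer: str = ""
    still: bool = True
    preprocess: str = "crop"
    device: str = "cpu"


# The default configuration ("Default args").
defaults = ModelParamsSketch()

# "Still=False + Preprocess=Full": override two fields, keep the rest.
full_motion = replace(defaults, still=False, preprocess="full")

# "Still=False + Enhancer + Preprocess=Full"; the enhancer name is an assumption.
enhanced = replace(full_motion, enhancer="gfpgan")

print(defaults)
print(full_motion)
print(enhanced)
```

`replace` returns a new object each time, so the default configuration stays unchanged and every variant records exactly which fields differ from it.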

The table below shows the estimated cost, execution time, and resources used for running SadTalker on Flyte:

| Image | Audio | Model Params | Execution time | Estimated cost | AWS Instance | vCPUs | Memory (GiB) | Actual hourly rate + Flyte deployment costs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| musk | silky-radio-wave (3 sec) | Default args | 6m 23s | Flyte Demo Link | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| img_192753_actorpriyankachopra | silky-radio-wave (8 sec) | Default args | 9m 58s | Flyte Demo Link | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| obama | silky-radio-wave (8 sec) | Still=False + Preprocess=Full | 9m 40s | Flyte Demo Link | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| img_192753_actorpriyankachopra | silky-radio-wave (8 sec) | Still=True + Enhancer + Preprocess=Full | 19m 8s | Flyte Demo Link | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
| natalie_portman | silky-radio-wave (25 sec) | Still=False + Enhancer + Preprocess=Full | 56m 3s | Flyte Demo Link | g4dn.2xlarge | 8 | 32 | $0.752 + ? |
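
As a rough illustration of how the execution times and the hourly rate in the table combine, the sketch below converts each duration to seconds and multiplies by the $0.752 hourly rate. It assumes cost scales linearly with run time and that the whole instance is dedicated to one run. The unknown Flyte deployment cost (the "?" in the table) is not included, so the results are only the instance share of the cost. The function names are illustrative and not part of the project.

```python
import re

HOURLY_RATE_USD = 0.752  # g4dn.2xlarge hourly rate, taken from the table


def parse_duration(text: str) -> int:
    """Convert a duration such as '6m 23s' into seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


def instance_cost(duration: str, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Instance-only cost in USD, assuming cost is linear in run time."""
    return parse_duration(duration) / 3600 * hourly_rate


# Execution times copied from the table, in row order.
EXECUTION_TIMES = ["6m 23s", "9m 58s", "9m 40s", "19m 8s", "56m 3s"]

for duration in EXECUTION_TIMES:
    print(f"{duration:>7}: ${instance_cost(duration):.4f} + Flyte deployment costs")
```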

Why Flyte

Several Flyte features were used to optimize the SadTalker inference pipeline:

  • Parallelism: Map tasks execute the code in parallel wherever possible, which significantly reduced the execution time.
  • Caching: The task that analyzes the audio input is cached. Further caching opportunities are limited because the other task outputs are bound to change.
  • Load balancing: Load balancing is automatic because Flyte runs on top of Kubernetes.
  • Scalability: Flyte can handle concurrent requests and scale up or down based on available resources, regardless of the number of executions.
  • Efficient resource usage: Flyte allocates resources per task based on each task's requirements, which avoids unnecessary allocation.
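
The first two points can be pictured with a small standard-library sketch. It does not use Flyte: `ThreadPoolExecutor.map` stands in for a map task that fans work out over batches, and `functools.lru_cache` stands in for task caching keyed on the task inputs. The function names, the 25 fps frame rate, and the frame-counting logic are all hypothetical placeholders for the real pipeline steps. Only the batch size of 2 comes from the default parameters.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List, Tuple

# Counts real (non-cached) executions of the audio step.
CALLS = {"analyze_audio": 0}


@lru_cache(maxsize=None)
def analyze_audio(audio_name: str, duration_sec: int, fps: int = 25) -> int:
    """Hypothetical audio-analysis step: returns the number of frames to render.

    Stand-in for a cached task: repeated calls with identical inputs reuse
    the stored result instead of running again.
    """
    CALLS["analyze_audio"] += 1
    return duration_sec * fps


def split_into_batches(num_frames: int, batch_size: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into half-open (start, stop) ranges."""
    return [
        (start, min(start + batch_size, num_frames))
        for start in range(0, num_frames, batch_size)
    ]


def render_batch(frame_range: Tuple[int, int]) -> int:
    """Hypothetical per-batch step; here it only counts the frames it was given."""
    start, stop = frame_range
    return stop - start


def run_pipeline(audio_name: str, duration_sec: int, batch_size: int = 2) -> int:
    num_frames = analyze_audio(audio_name, duration_sec)
    batches = split_into_batches(num_frames, batch_size)
    # Stand-in for a map task: the same function is applied to every batch in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        rendered = list(pool.map(render_batch, batches))
    return sum(rendered)


first = run_pipeline("silky-radio-wave", 8)
second = run_pipeline("silky-radio-wave", 8)  # audio analysis is served from the cache
print(first, second, CALLS["analyze_audio"])
```

In Flyte itself, the fan-out would be expressed with a map task and the reuse with task-level caching, with Kubernetes scheduling each mapped execution. This sketch only mirrors the shape of that design on a single machine.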

Contributors

samhita-alla
