curated-awesome-lists / awesome-ai-talking-heads

A curated list of 'Talking Head Generation' resources. Features influential papers, groundbreaking algorithms, crucial GitHub repositories, insightful videos, and more. Ideal for AI enthusiasts, researchers, and graphics professionals.

License: Apache License 2.0

ai-art awesome-list machine-learning state-of-the-art talking-face-generation talking-head-videos talking-heads

Awesome talking-head

Welcome to the Awesome List for Talking Head Generation! This curated collection of resources focuses on the intriguing domain of 'Talking Head Generation' - an area of computer graphics and artificial intelligence that strives to create lifelike digital recreations of human heads and faces. These 'talking heads' can be used in a variety of applications, from realistic video content and virtual reality, to advanced communication tools and beyond. This list aims to gather key research papers, state-of-the-art algorithms, seminal GitHub repositories, educational videos, inspiring blogs, and more. Whether you are an AI researcher, computer graphics professional, or an AI enthusiast, this list is your one-stop destination to dive into the world of Talking Head Generation. Happy exploring!

GitHub projects

  • AudioGPT : Understanding and Generating Speech, Music, Sound, and Talking Head. 🗣️🎵
  • SadTalker : Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. 🎭🎶
  • Thin-Plate-Spline-Motion-Model : Thin-Plate Spline Motion Model for Image Animation. 🖼️
  • GeneFace : Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code. 👤💬
  • CVPR2022-DaGAN : Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation. 👥📹
  • sd-wav2lip-uhq : Wav2Lip UHQ extension for Automatic. 👄
  • Text2Video : ICASSP 2022: "Text2Video: text-driven talking-head video synthesis with phonetic dictionary". 🔤🎞️
  • OTAvatar : This is the official repository for OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR2023]. 👤🎭
  • Audio2Head : Code for paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" in the conference of IJCAI 2021. 🗣️👤
  • IP_LAP : CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors. 🔥🤖
  • Wunjo AI : Synthesize & clone voices in English, Russian & Chinese, real-time speech recognition, deepfake face & lips animation, face swap with one photo, change video by text prompts, segmentation, and retouching. Open-source, local & free. 🗣️👤💬
  • LIHQ : Long-Inference, High Quality Synthetic Speaker (AI avatar/ AI presenter). 🎙️👤
  • Co-Speech-Motion-Generation : Freeform Body Motion Generation from Speech. 🗣️🚶
  • Neural Head Reenactment with Latent Pose Descriptors : The authors' implementation of the "Neural Head Reenactment with Latent Pose Descriptors" (CVPR 2020) paper. 🤖👤
  • NED : PyTorch implementation for NED (CVPR 2022). It can be used to manipulate the facial emotions of actors in videos based on emotion labels or reference styles. 😃🎭🎥
  • WACV23_TSNet : The pytorch implementation of our WACV23 paper "Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis". 🎬✨
  • ICCV2023-MCNET : The official code of our ICCV2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation. 🎥🤖
  • Speech2Video : Code for ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses". 🗣️🎥💃
  • StyleLipSync : Official pytorch implementation of "StyleLipSync: Style-based Personalized Lip-sync Video Generation". 💋🎥

Articles & Blogs

  • How to Create Fake Talking Head Videos With Deep Learning (Code Tutorial): An article explaining the process of generating fake talking head videos using deep learning techniques.
  • AudioGPT: Understanding and Generating Speech, Music, Sound: A research paper introducing AudioGPT, a multi-modal AI system that can process complex audio information and understand and generate speech, music, sound, and talking head content.
  • Text-based Editing of Talking-head Video: An academic publication discussing the editing of talking-head videos using text-based instructions.
  • Few-Shot Adversarial Learning of Realistic Neural Talking Head: A research paper presenting a system capable of learning personalized talking head models from just a few image views of a person, using adversarial training techniques.
  • DisCoHead: Audio-and-Video-Driven Talking Head Generation: A paper describing DisCoHead, a method that disentangles and controls head pose and facial expressions in talking head generation, without supervision.
  • Microsoft's 3D Photo Realistic Talking Head: A blog post showcasing Microsoft's 3D talking head technology, which combines photorealistic video with a 3D mesh model.
  • Depth-Aware Generative Adversarial Network for Talking Head: A research paper proposing a GAN-based approach that leverages dense 3D facial geometry to generate realistic and accurate talking head videos.
  • Talking-head Generation with Rhythmic Head Motion: This article presents a method for generating realistic talking-head videos with natural head movements, addressing the challenge of generating lip-synced videos while incorporating natural head motion. The proposed approach utilizes a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module, resulting in controllable and photo-realistic talking-head videos with natural head movements.
  • Learned Spatial Representations for Few-shot Talking-Head Synthesis: This article introduces a novel approach for few-shot talking-head synthesis by factorizing the representation of a subject into its spatial and style components. The proposed method predicts a dense spatial layout for the target image and utilizes it for synthesizing the target frame, achieving improved preservation of the subject's identity in the source images.
  • Efficient Emotional Adaptation for Audio-Driven Talking-Head: This article proposes the Emotional Adaptation for Audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner. The approach utilizes lightweight adaptations to enable precise and realistic emotion controls, achieving state-of-the-art performance on widely-used benchmarks.
  • High-Fidelity and Freely Controllable Talking Head Video Generation: This article addresses the challenges faced by current methods in generating high-quality and controllable talking-head videos. It introduces a novel model that leverages self-supervised learned landmarks and 3D face model-based landmarks to model the motion, along with a motion-aware multi-scale feature alignment module. The proposed method produces high-fidelity talking-head videos with free control over head pose and expression.
  • Implicit Identity Representation Conditioned Memory Compensation: This article proposes a global facial representation space and a novel implicit identity representation conditioned memory compensation network for high-fidelity talking head generation. The network module learns a unified spatial facial meta-memory bank, which compensates warped source facial features to overcome limitations due to complex motions in the driving video, resulting in improved generation quality.
  • Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head: This article focuses on the task of avatar fingerprinting, which verifies the trustworthiness of rendered talking-head videos. It proposes an embedding that groups the motion signatures of one identity together, allowing the identification of synthetic videos using the appearance of a specific individual driving the expressions.
  • Style Transfer for 2D Talking Head Animation: This article presents a method for generating talking head animation with learnable style references. It reconstructs 2D talking head animation from a single input image and an audio stream, using facial landmark motion, style-pattern construction, and a style-aware image generator. The method achieves better results than recent state-of-the-art methods in generating photo-realistic, high-fidelity 2D animation.
  • One-Shot Free-View Neural Talking-Head Synthesis for Video: This article proposes a neural talking-head video synthesis model that learns to synthesize videos using a source image containing the target person's appearance and a driving video for motion. The model achieves high visual quality and bandwidth efficiency, outperforming competing methods on benchmark datasets.
  • Progressive Disentangled Representation Learning for Fine: This article presents a one-shot talking head synthesis method that achieves disentangled control over lip motion, eye gaze & blink, head pose, and emotional expression. It utilizes a progressive disentangled representation learning strategy to isolate each motion factor, allowing for fine-grained control and high-quality speech and lip-motion synchronization.
  • VideoReTalking: Audio-based Lip Synchronization for Talking Head: This article introduces VideoReTalking, a system for editing real-world talking head videos according to input audio. It splits the editing task into face video generation, audio-driven lip-sync, and face enhancement, ultimately producing a high-quality, lip-synced output video. The system uses learning-based approaches in a sequential pipeline, without requiring user intervention.

Online Courses

Research Papers

Tools & Software

Slides & Presentations


This initial version of the Awesome List was generated with the help of the Awesome List Generator. It's an open-source Python package that uses the power of GPT models to automatically curate and generate starting points for resource lists related to a specific topic.

Contributors

  • alronz
