S2L-S2D: Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation

This repository contains the code for the paper "Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation" (link). The paper presents a novel approach, based on landmarks motion, for generating 3D Talking Heads from speech. The code includes the implementation of the two models proposed in the paper: S2L and S2D. Check out some qualitative results in this video.

Installation

To run the code, you need to install the following dependencies:

  • Python 3.8
  • PyTorch-GPU 1.13.0
  • Trimesh 3.22.1
  • Librosa 0.9.2
  • Transformers 4.6.1 from Hugging Face
  • MPI-IS for mesh rendering (link)
  • Additional dependencies for running the demo: pysimplegui==4.60.5, sounddevice==0.4.6, soundfile==0.12.1
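
If you want to sanity-check the environment before training, a short script like the one below compares installed package versions against the pins above (the PyPI distribution names, e.g. torch for PyTorch-GPU, are assumptions about how the packages were installed):

    # Compare installed versions against the pins listed above.
    from importlib.metadata import version, PackageNotFoundError

    pins = {
        "torch": "1.13.0",
        "trimesh": "3.22.1",
        "librosa": "0.9.2",
        "transformers": "4.6.1",
    }

    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed (expected {expected})")
            continue
        status = "ok" if installed == expected else f"expected {expected}"
        print(f"{package}: {installed} ({status})")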

Training Setup

  1. Clone the repository:
git clone https://github.com/FedeNoce/s2l-s2d.git
  2. Download the vocaset dataset from here (Training Data, 8GB).
  3. Put the downloaded file into the "S2L/vocaset" and "S2D/vocaset" directories.
  4. To train S2L, preprocess the data by running "preprocess_voca_data.py" in the "S2L/vocaset" directory, then run "train_S2L.py".
  5. To train S2D, preprocess the data by running "Data_processing.py" in the "S2D" directory, then run "train_S2D.py".
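
If you prefer to drive both pipelines from a single script, the steps above can be chained roughly as in the sketch below (an assumption-laden sketch: it supposes the scripts run without extra command-line arguments and from the directories shown, which may not match your configuration):

    import subprocess

    # S2L: preprocess vocaset, then train
    subprocess.run(["python", "preprocess_voca_data.py"], cwd="S2L/vocaset", check=True)
    subprocess.run(["python", "train_S2L.py"], cwd="S2L", check=True)

    # S2D: preprocess the data, then train
    subprocess.run(["python", "Data_processing.py"], cwd="S2D", check=True)
    subprocess.run(["python", "train_S2D.py"], cwd="S2D", check=True)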

Inference

  1. Download the pretrained models from here and place them in the "S2L/Results" and "S2D/Results" directories.
  2. Run the GUI demo using "demo.py".
  3. If you're interested, we also provide an updated version of the demo that reconstructs a user's face from a webcam photo using a 3DMM fitting. Before running it with "demo_with_rec.py", download a file from here and place it in the "Rec/Values" directory.
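
The demo's extra dependencies (sounddevice, soundfile) suggest microphone recording and audio file I/O. If you want to record an input clip yourself outside the GUI, a minimal sketch looks like this (the sample rate, duration, and file name are assumptions, not values taken from demo.py):

    import sounddevice as sd
    import soundfile as sf

    SAMPLE_RATE = 16000  # assumed sample rate; adjust to what the demo expects
    DURATION = 4         # seconds to record

    # Record mono audio from the default input device and save it to disk.
    audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()  # block until the recording is finished
    sf.write("speech.wav", audio, SAMPLE_RATE)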

Citation

If you use this code or find it helpful, please consider citing:

@misc{nocentini2023learning,
  title={Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation},
  author={Federico Nocentini and Claudio Ferrari and Stefano Berretti},
  year={2023},
  eprint={2306.01415},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Authors

Federico Nocentini, Claudio Ferrari, Stefano Berretti

s2l-s2d's Issues

Computing the per-vertex weights

Hello,

I am trying to use the proposed scheme to weight the loss function according to the distances of the vertices to landmarks (for another topology), and I can't seem to replicate what you proposed in the paper. That is, I am trying to compute the Normalized_d_weights.npy myself.

The paper states that the weight is 1/min(d(pi, lj)), that is, the inverse of the distance to the nearest landmark. For a vertex that is itself a landmark, this distance is 0, so I am trying to figure out what you did in practice to obtain this file.

I tried using a small epsilon (1/(min(d(pi, lj)) + 1e-6)), but it didn't work at all. Using a larger "bias" (e.g. up to 1/(min(d(pi, lj)) + 0.15)) gives results somewhat close to yours for the front of the face, but the ears have very different weights.
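
For reference, a sketch of this epsilon-clamped inverse-distance weighting (the epsilon value and the final normalization are my assumptions, not necessarily what was used to produce Normalized_d_weights.npy):

    import numpy as np
    from scipy.spatial import cKDTree

    def vertex_weights(vertices: np.ndarray, landmarks: np.ndarray, eps: float = 0.15) -> np.ndarray:
        # Distance from every mesh vertex to its nearest landmark
        d_min, _ = cKDTree(landmarks).query(vertices, k=1)
        # Inverse distance, clamped so landmark vertices (d_min == 0) stay finite
        weights = 1.0 / (d_min + eps)
        # Normalize to [0, 1]; the repository's normalization may differ
        return weights / weights.max()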

Visualizing the weights from the repo:
[figure: per-vertex weights from the repository]

Recomputing with 1/(min(d(pi, lj)) + 1e-6):
[figure: recomputed weights, epsilon = 1e-6]

Recomputing with 1/(min(d(pi, lj)) + 0.15):
[figure: recomputed weights, epsilon = 0.15]

Thanks in advance!

License

Thanks for your research.
What is the license of this code and the pretrained models?
