S2L-S2D: Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation

This repository contains the code for the paper "Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation" (link). The paper presents a novel approach, based on landmarks motion, for generating 3D Talking Heads from speech. The code includes the implementation of the two models proposed in the paper: S2L and S2D. Check out some qualitative results in this video.

Installation

To run the code, you need to install the following dependencies:

  • Python 3.8
  • PyTorch-GPU 1.13.0
  • Trimesh 3.22.1
  • Librosa 0.9.2
  • Transformers 4.6.1 from Hugging Face
  • MPI-IS for mesh rendering (link)
  • Additional dependencies for running the demo: pysimplegui==4.60.5, sounddevice==0.4.6, soundfile==0.12.1
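
If you want to sanity-check the environment before training, a short script like the one below compares installed package versions against the pins above (the PyPI distribution names, e.g. torch for PyTorch-GPU, are assumptions about how the packages were installed):

    # Compare installed versions against the pins listed above.
    from importlib.metadata import version, PackageNotFoundError

    pins = {
        "torch": "1.13.0",
        "trimesh": "3.22.1",
        "librosa": "0.9.2",
        "transformers": "4.6.1",
    }

    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed (expected {expected})")
            continue
        status = "ok" if installed == expected else f"expected {expected}"
        print(f"{package}: {installed} ({status})")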

Training Setup

  1. Clone the repository:
git clone https://github.com/FedeNoce/s2l-s2d.git
  2. Download the vocaset dataset from here (Training Data, 8GB).
  3. Put the downloaded file into the "S2L/vocaset" and "S2D/vocaset" directories.
  4. To train S2L, preprocess the data by running "preprocess_voca_data.py" in the "S2L/vocaset" directory, then run "train_S2L.py".
  5. To train S2D, preprocess the data by running "Data_processing.py" in the "S2D" directory, then run "train_S2D.py".
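
If you prefer to drive both pipelines from a single script, the steps above can be chained roughly as in the sketch below (an assumption-laden sketch: it supposes the scripts run without extra command-line arguments and from the directories shown, which may not match your configuration):

    import subprocess

    # S2L: preprocess vocaset, then train
    subprocess.run(["python", "preprocess_voca_data.py"], cwd="S2L/vocaset", check=True)
    subprocess.run(["python", "train_S2L.py"], cwd="S2L", check=True)

    # S2D: preprocess the data, then train
    subprocess.run(["python", "Data_processing.py"], cwd="S2D", check=True)
    subprocess.run(["python", "train_S2D.py"], cwd="S2D", check=True)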

Inference

  1. Download the pretrained models from here and place them in the "S2L/Results" and "S2D/Results" directories.
  2. Run the GUI demo using "demo.py".
  3. If you're interested, we also provide an updated version of the demo that reconstructs a user's face from a webcam photo using a 3DMM fitting. Before running it with "demo_with_rec.py", download a file from here and place it in the "Rec/Values" directory.
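
The demo's extra dependencies (sounddevice, soundfile) suggest microphone recording and audio file I/O. If you want to record an input clip yourself outside the GUI, a minimal sketch looks like this (the sample rate, duration, and file name are assumptions, not values taken from demo.py):

    import sounddevice as sd
    import soundfile as sf

    SAMPLE_RATE = 16000  # assumed sample rate; adjust to what the demo expects
    DURATION = 4         # seconds to record

    # Record mono audio from the default input device and save it to disk.
    audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()  # block until the recording is finished
    sf.write("speech.wav", audio, SAMPLE_RATE)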

Citation

If you use this code or find it helpful, please consider citing:

@misc{nocentini2023learning,
  title={Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation},
  author={Federico Nocentini and Claudio Ferrari and Stefano Berretti},
  year={2023},
  eprint={2306.01415},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Authors

Federico Nocentini, Claudio Ferrari, Stefano Berretti

s2l-s2d's Issues

Computing the per-vertex weights

Hello,

I am trying to use the proposed scheme to weight the loss function according to the distances of the vertices to landmarks (for another topology), and I can't seem to replicate what you proposed in the paper. That is, I am trying to compute the Normalized_d_weights.npy myself.

The paper states that the weight is 1/min(d(pi, lj)), that is, the inverse of the distance to the nearest landmark. For a vertex that is itself a landmark, this distance is 0, so I am trying to figure out what you did in practice to obtain this file.

I tried using a small epsilon (1/(min(d(pi, lj)) + 1e-6)), but it didn't work at all. Using a larger "bias" (e.g. up to 1/(min(d(pi, lj)) + 0.15)) gives results somewhat close to yours for the front of the face, but the ears have very different weights.
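
For reference, a sketch of this epsilon-clamped inverse-distance weighting (the epsilon value and the final normalization are my assumptions, not necessarily what was used to produce Normalized_d_weights.npy):

    import numpy as np
    from scipy.spatial import cKDTree

    def vertex_weights(vertices: np.ndarray, landmarks: np.ndarray, eps: float = 0.15) -> np.ndarray:
        # Distance from every mesh vertex to its nearest landmark
        d_min, _ = cKDTree(landmarks).query(vertices, k=1)
        # Inverse distance, clamped so landmark vertices (d_min == 0) stay finite
        weights = 1.0 / (d_min + eps)
        # Normalize to [0, 1]; the repository's normalization may differ
        return weights / weights.max()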

Visualizing the weights from the repo:
[figure: per-vertex weights from the repository]

Recomputing with 1/(min(d(pi, lj)) + 1e-6):
[figure: recomputed weights, epsilon = 1e-6]

Recomputing with 1/(min(d(pi, lj)) + 0.15):
[figure: recomputed weights, epsilon = 0.15]

Thanks in advance!

License

Thanks for your research.
What is the license of this code and the pretrained models?
