Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping

This repository contains a PyTorch implementation and pretrained models for Lip2Vec, a novel method for Visual Speech Recognition. For a deeper understanding of the method, refer to the paper Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping.

Lip2Vec Inference:

The video encoder computes video representations, which are fed to our learned prior network; the prior maps them to synthesized audio representations. These synthesized representations are then passed through the encoder and linear layer of the Wav2vec2.0 model to predict the text. Note that real audio is not used at test time: the only input is video.

Lip2Vec Illustration
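
The final step of the pipeline, turning the per-frame outputs of the Wav2vec2.0 linear layer into text, can be illustrated with a minimal sketch. It assumes greedy CTC decoding (Wav2vec2.0 is fine-tuned with a CTC objective), a toy character vocabulary, and a blank index of 0. The function name `ctc_greedy_decode` and the vocabulary are hypothetical and not taken from this repository, whose actual decoding may differ.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    vocab: Sequence[str],
    blank_id: int = 0,
    word_sep: str = "|",
) -> str:
    """Collapse repeated frame predictions, drop blanks, and join characters."""
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the CTC blank token.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    # The word separator token marks word boundaries (assumed to be "|").
    return "".join(chars).replace(word_sep, " ").strip()


# Toy vocabulary (hypothetical, for illustration only).
VOCAB = ["<blank>", "|", "h", "i", "t", "e", "r"]

# Per-frame argmax ids, as they might come out of the linear layer.
frames = [2, 2, 0, 3, 3, 1, 1, 0, 4, 4, 2, 5, 0, 6, 6, 5]
decoded = ctc_greedy_decode(frames, VOCAB)
print(decoded)
```

Repeated ids within a run are merged, so a genuinely doubled letter must be separated by a blank frame to survive decoding.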

Pretrained models

Arch                          Params  WER (%)  Video backbone   Download
Lip2Vec-Large Low-Resource    43M     30.2     AV-HuBERT Large  weights
Lip2Vec-Base Low-Resource     43M     42.6     AV-HuBERT Base   weights
Lip2Vec-Large High-Resource   76M     26.0     AV-HuBERT Large  weights
Lip2Vec-Base High-Resource    76M     34.9     AV-HuBERT Base   weights
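
For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, expressed as a percentage. It assumes the table uses this standard definition. The function name `word_error_rate` is hypothetical, and the repository's own scoring may normalize text differently.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.1f}")
```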

Setup

Clone the repo:

git clone https://github.com/YasserdahouML/Lip2Vec.git
cd Lip2Vec

Set up environment:

conda create -y -n lip2vec python=3.9.5
conda activate lip2vec

Clone the AV-HuBERT repo and install fairseq:

git clone https://github.com/facebookresearch/av_hubert.git
cd av_hubert
git submodule init
git submodule update
cd fairseq
pip install --editable ./

Install dependencies:

pip install -r requirements.txt

Download AV-HuBERT weights:

To download the AV-HuBERT weights, use this repo. Available weights:

  • AV-HuBERT Large: LRS3 + VoxCeleb2 (En), No finetuning
  • AV-HuBERT Base: LRS3 + VoxCeleb2 (En), No finetuning

Inference on LRS3

Use the following command to perform inference on the LRS3 dataset.

torchrun --nproc_per_node=4 main_test.py \
    --lrs3_path=[data_path] \
    --model_path=[prior_path] \
    --hub_path=[av-hubert_path]

Arguments:

  • data_path: Directory containing the LRS3 test set videos
  • prior_path: Path to the prior network checkpoint
  • av-hubert_path: Path to the AV-HuBERT weights
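
When the inference run is launched from a script rather than typed by hand, the same command can be assembled programmatically. The sketch below uses only the flags shown above. The helper name `build_inference_command` and the example paths are hypothetical placeholders, not files shipped with this repository.

```python
import shlex
from typing import List


def build_inference_command(
    data_path: str,
    prior_path: str,
    hub_path: str,
    nproc_per_node: int = 4,
) -> List[str]:
    """Assemble the torchrun argument list for main_test.py."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "main_test.py",
        f"--lrs3_path={data_path}",
        f"--model_path={prior_path}",
        f"--hub_path={hub_path}",
    ]


# Placeholder paths for illustration only; substitute your own.
cmd = build_inference_command(
    data_path="/data/lrs3/test",
    prior_path="/checkpoints/lip2vec_prior.pt",
    hub_path="/checkpoints/av_hubert_large.pt",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would start the run. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.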

Acknowledgement

This repository is based on av-hubert, vsr, and detr.

Citation

@inproceedings{djilali2023lip2vec,
  title={Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping},
  author={Djilali, Yasser Abdelaziz Dahou and Narayan, Sanath and Boussaid, Haithem and Almazrouei, Ebtessam and Debbah, Merouane},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13790--13801},
  year={2023}
}
