
Learning Audio-Visual Dereverberation

Motivation

Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition. Prior work attempts to remove reverberation based on the audio modality only. Our idea is to learn to dereverberate speech from audio-visual observations. The visual environment surrounding a human speaker reveals important cues about the room geometry, materials, and speaker location, all of which influence the precise reverberation effects in the audio stream. We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene. In support of this new task, we develop a large-scale dataset that uses realistic acoustic renderings of speech in real-world 3D scans of homes offering a variety of room acoustics. Demonstrating our approach on both simulated and real imagery for speech enhancement, speech recognition, and speaker identification, we show it achieves state-of-the-art performance and substantially improves over traditional audio-only methods.

Citation

If you find this paper and code useful, please cite the following paper:

@article{chen22av_dereverb,
  title     =     {Learning Audio-Visual Dereverberation},
  author    =     {Changan Chen and Wei Sun and David Harwath and Kristen Grauman},
  journal   =     {arXiv},
  year      =     {2022}
}

Installation

Install this repo with pip by running the following command:

pip install -e .

Usage

  1. Training
py vida/trainer.py --model-dir data/models/vida --num-channel 2 --use-depth --use-rgb --log-mag --no-mask --phase-loss sin --phase-weight 0.1 --use-triplet-loss --exp-decay --triplet-margin 0.5 --mean-pool-visual --overwrite
  2. Evaluation
py vida/evaluator.py --pretrained-path data/models/vida/best_val.pth --num-channel 2 --log-mag --no-mask --est-pred --use-rgb --use-depth --mean-pool-visual --eval-dereverb
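Judging from the flags above (--log-mag, --phase-loss sin, --phase-weight 0.1), training appears to combine a log-magnitude spectrogram loss with a sine-based phase loss weighted by 0.1. The helper names below are hypothetical illustrations, not taken from the repo; a minimal NumPy sketch of such an objective:

```python
import numpy as np

def log_magnitude(stft, eps=1e-8):
    # Log-magnitude representation of a complex spectrogram
    # (cf. the --log-mag flag); eps avoids log(0).
    return np.log(np.abs(stft) + eps)

def sin_phase_loss(pred_phase, target_phase):
    # A sine-based phase loss (cf. --phase-loss sin): penalizes the
    # angular difference while staying invariant to 2*pi wrap-around.
    return np.mean(np.abs(np.sin(pred_phase - target_phase)))

# Toy complex spectrograms standing in for predicted and clean STFTs.
rng = np.random.default_rng(0)
pred = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
clean = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))

# Combined objective: L1 on log-magnitudes plus weighted phase term,
# mirroring --phase-weight 0.1.
mag_loss = np.mean(np.abs(log_magnitude(pred) - log_magnitude(clean)))
total = mag_loss + 0.1 * sin_phase_loss(np.angle(pred), np.angle(clean))
```

This is only a sketch of the loss structure the flags suggest; the actual model, STFT parameters, and loss reductions are defined in vida/trainer.py.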

Data

See the data page for instructions on how to download the data.

Contributing

See the CONTRIBUTING file for how to help out.

License

This repo is CC-BY-NC licensed, as found in the LICENSE file.

Contributors

changanvr, sunwell1994

Issues

Two questions about datasets

Thanks for your excellent work and for generously releasing your code.
I have two questions about the dataset:

  1. The main paper mentions "We generate 49,430/2,700/2,600 such samples for the train/val/test splits, respectively."
    However, I found no val data in the data description section, only val-mini, which has just 500 samples.
    First question: how can the val data be obtained?
  2. After downloading LibriSpeech, I found that the three *.csv files referenced in vida/evaluator.py (around lines 300~305) are missing.
    Second question: how can these three files be obtained, or how should LibriSpeech be processed to generate them?

pre-trained model?

Thanks for the contribution.
Is the pretrained model available?

Visual data

Hi,

Is there a process for generating the visual data?

Best,
