adenet's Introduction

ADENet

This repository is for ADENet introduced in the following paper: J. Xiong, Y. Zhou, P. Zhang, L. Xie, W. Huang and Y. Zha, "Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement", (IEEE Transactions on Multimedia, 2022)

Project link: ADENet

Dependencies

Start by building the environment:

conda env create -f env.yaml

Training

Data preparation

1. AVA dataset

The following script can be used to download and prepare the AVA dataset for training.

python trainADENet.py --dataPathAVA AVADataPath --download 

AVADataPath is the folder where you want to save the AVA dataset and its preprocessing outputs. The details can be found in /utils/tools.py; please read them carefully.

2. MUSAN noise dataset

  1. Download the MUSAN dataset from OpenSLR.
  2. Use the following script to clip the noise audio and generate the training, validation, and test sets:
python generate_speech_noise 
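
The clipping and splitting step can be pictured with a small sketch. This is not the repository's code: the helper names clip_boundaries and split_file_list, the 10%/10% validation/test fractions, and the fixed seed are all assumptions chosen for illustration. It shows one way to cut a noise recording into fixed-length clips and to divide a list of files into reproducible training, validation, and test lists.

```python
import random


def clip_boundaries(num_samples, clip_len):
    """Return (start, end) sample indices of full-length, non-overlapping clips.

    A trailing remainder shorter than clip_len is dropped (an assumption).
    """
    return [(s, s + clip_len)
            for s in range(0, num_samples - clip_len + 1, clip_len)]


def split_file_list(paths, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle paths reproducibly and split them into (train, val, test)."""
    paths = sorted(paths)                # fixed starting order, so the seed fully determines the split
    random.Random(seed).shuffle(paths)   # local RNG; does not touch global random state
    n_val = int(len(paths) * val_frac)
    n_test = int(len(paths) * test_frac)
    val = paths[:n_val]
    test = paths[n_val:n_val + n_test]
    train = paths[n_val + n_test:]
    return train, val, test


# Hypothetical file names, for illustration only.
noise_files = [f"noise_{i:02d}.wav" for i in range(10)]
train_files, val_files, test_files = split_file_list(noise_files)
```

Sorting before shuffling keeps the split independent of the order in which the file system lists the files, so the same seed gives the same three lists on every machine.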

Training

Then you can train ADENet on AVA end-to-end with:

python trainADENet.py --dataPathAVA AVADataPath --dataPathMUSAN MUSANDataPath --savePath savePath
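
For readers unfamiliar with how such flags reach the script, here is a minimal sketch using Python's standard argparse module. The flag names --dataPathAVA, --dataPathMUSAN, --savePath, and --download come from the commands in this README; everything else (the build_parser helper, which flags are required, the help strings) is an assumption, and the real trainADENet.py may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(
        description="Illustrative ADENet training launcher (not the real script)")
    parser.add_argument("--dataPathAVA", required=True,
                        help="folder holding the AVA dataset and its preprocessing outputs")
    parser.add_argument("--dataPathMUSAN", default=None,
                        help="folder holding the prepared MUSAN noise data")
    parser.add_argument("--savePath", default=None,
                        help="folder where training outputs are written")
    parser.add_argument("--download", action="store_true",
                        help="download and prepare the AVA dataset")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
demo_args = build_parser().parse_args([
    "--dataPathAVA", "AVADataPath",
    "--dataPathMUSAN", "MUSANDataPath",
    "--savePath", "savePath",
])
```

Each flag becomes an attribute of the same name, for example demo_args.dataPathAVA, while --download is a switch that defaults to False when omitted.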

Use the --isDown parameter to control cross-modal circulant fusion.

Citation

Please cite the following if our paper or code is helpful to your research.

@ARTICLE{9858007,
  author={Xiong, Junwen and Zhou, Yu and Zhang, Peng and Xie, Lei and Huang, Wei and Zha, Yufei},
  journal={IEEE Transactions on Multimedia}, 
  title={Look&listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement}, 
  year={2022},
  volume={},
  number={},
  pages={1-14},
  doi={10.1109/TMM.2022.3199109}}

This is my first open-source work. Please let me know if I can further improve this repository or if there is anything wrong in our work. Thanks for your support!

adenet's People

Contributors

overcautious

adenet's Issues

how to generate MUSAN noise dataset

Hi, I'm trying to train your ADENet.
While generating the MUSAN noise dataset, I cannot find the file './noise_dataset/noise_speech_txt/train_list.txt', which is mentioned in the generate_speech_noise Python code.
How can I get the train, val, and test lists for the MUSAN dataset?
Thank you!

pretrained model

Hello,
Can you provide the pre-trained model? I'd appreciate it if you could.
Thanks.
