
lrs3-for-speech-separation's Introduction

Instructions for generating data

The following steps generate the training and test data. Several parameters can be changed to suit different purposes.

We will release a speech separation benchmark on the LRS3 dataset as soon as possible.

This script repository aims to give the multi-modal speech separation task a unified standard for dataset generation, so that later work on multi-modal speech separation can build on it.

We hope that the LRS3 dataset will have a unified generation standard for audio-only speech separation tasks, as the WSJ0 dataset does.

☑️ Our baseline model is coming soon!

Model      SI-SNRi (dB)   SNRi (dB)
Baseline   15.08          15.34
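
For readers unfamiliar with the metric in the table, here is a minimal sketch of how SI-SNR and its improvement (SI-SNRi) are commonly computed. It uses plain Python lists for clarity; the function names si_snr and si_snr_improvement are illustrative and do not come from this repository, and the baseline's own evaluation code may differ in detail.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB between two equal-length signals."""
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the mean of each signal first
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    est = [x - mean_e for x in estimate]
    ref = [x - mean_r for x in reference]
    # Project the estimate onto the reference to get the target component
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    # Everything that is not explained by the reference counts as noise
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise) + eps
    return 10 * math.log10(target_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because the target is a projection onto the reference, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.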

Requirements

  • ffmpeg 4.2.1
  • sox 14.4.2
  • numpy 1.17.2
  • opencv-python 4.1.2.30
  • librosa 0.7.0
  • dlib 19.19.0
  • face_recognition 1.3.0

Step 1 - Getting raw Data

  1. In this method, we use the Lip Reading Sentences 3 (LRS3) dataset as our training, validation, and test sets.

Afouras T, Chung J S, Senior A, et al. Deep audio-visual speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

  2. We use only the train_val and test folders of the LRS3 dataset. Merge these two folders before running our scripts.
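
The merge can be done by hand, but a small script makes it repeatable. The sketch below assumes both folders sit under one dataset root and each contains one sub-folder per speaker. The helper name merge_lrs3_splits and the output folder name merged are hypothetical; adjust the names to match your download.

```python
import shutil
from pathlib import Path


def merge_lrs3_splits(lrs3_root, splits=("train_val", "test"), merged_name="merged"):
    """Copy every speaker folder from the given splits into one merged folder.

    Returns the merged folder path and the number of speaker folders copied.
    """
    root = Path(lrs3_root)
    merged = root / merged_name
    merged.mkdir(parents=True, exist_ok=True)
    copied = 0
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        for speaker_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # dirs_exist_ok needs Python 3.8+; same-named files would be overwritten
            shutil.copytree(speaker_dir, merged / speaker_dir.name, dirs_exist_ok=True)
            copied += 1
    return merged, copied
```

For example, merge_lrs3_splits("YOUR_PATH/lrs3") would create YOUR_PATH/lrs3/merged. Copying rather than moving keeps the original download intact, at the cost of extra disk space.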

Step 2 - Processing Video Data

  1. Go to the ./video_process/ directory.
cd video_process
  2. Then run the video_process.py script to extract the video frames, crop the lip region from each frame, and resize it to 120 × 120 pixels.
python video_process.py
# Change the path in the script to your data path.
video_path = 'valid_mouth.txt' # Collection of files with lips detected
inpath = '../frames' # save video frames path
outpath = '../mouth' # save mouth images path
change_root = '../frames' # resize the frames file path
# You can comment out this code at first.
print('--------------Resize the frames-------------')
resize_img(change_root, (120, 120))
  3. To process the image data faster, run the following command to store the image data in NumPy's ".npz" format.
python video_to_np.py

These are the LRS3 dataset txt files that list the train, test, and validation samples:

train = open('../train.txt', 'r').readlines()
test = open('../test.txt', 'r').readlines()
val = open('../val.txt', 'r').readlines()
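
To make the lip-cropping part of Step 2 more concrete, here is a minimal sketch of how a square crop box could be derived from lip landmark points, such as the top_lip and bottom_lip lists returned by face_recognition.face_landmarks. The helper mouth_crop_box and its margin parameter are assumptions for illustration; video_process.py may compute its crop differently.

```python
def mouth_crop_box(lip_points, frame_width, frame_height, margin=0.3):
    """Return (left, top, right, bottom) of a box around the lip landmarks.

    lip_points is a list of (x, y) pixel coordinates. The box is square
    unless it has to be clamped at a frame border.
    """
    if not lip_points:
        raise ValueError("no lip landmarks given")
    xs = [x for x, _ in lip_points]
    ys = [y for _, y in lip_points]
    # Centre of the lips
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    # Side length: the larger lip extent, widened by the margin on each side
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1 + 2 * margin)
    half = side / 2
    # Clamp the box to the frame so the crop never indexes outside the image
    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(frame_width, int(round(cx + half)))
    bottom = min(frame_height, int(round(cy + half)))
    return left, top, right, bottom
```

With a frame loaded by OpenCV, the crop would then be frame[top:bottom, left:right], and cv2.resize(crop, (120, 120)) would bring it to the size used in this repository.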

Step 3 - Processing audio data

  1. Run audio_cut.py, which uses sox to cut the audio of each video into a 2 s speech signal.

  2. Mix the audio. We mix the voices of two speakers at levels between -5 dB and 5 dB. This part of the code follows the data mixing method of deep clustering.

matlab -nodisplay -r create_wav_2speakers
#You need to change this part in create_wav_2speakers.m
'''
data_type = {'tr','cv','tt'};
wsj0root = ''; % YOUR_PATH/raw_audio
output_dir16k=''; % 16k path
output_dir8k=''; % 8k path
'''
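
For readers without Matlab, the idea behind the mixing step can be sketched in plain Python. This is a simplified, RMS-based illustration and not a port of create_wav_2speakers.m, which may measure speech level differently. The function names are hypothetical, and the relative level is drawn uniformly from -5 dB to 5 dB as described above.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_two_speakers(s1, s2, snr_db):
    """Mix s1 and s2 so that s1 is snr_db decibels louder than s2.

    Returns (mixture, scaled_s1, scaled_s2), all cut to the shorter length.
    """
    n = min(len(s1), len(s2))
    s1, s2 = s1[:n], s2[:n]
    r1, r2 = rms(s1), rms(s2)
    if r1 == 0 or r2 == 0:
        raise ValueError("cannot set a level ratio for a silent signal")
    # Bring both to unit RMS, then apply opposite gains of +snr/2 and -snr/2 dB
    g1 = 10 ** (snr_db / 40) / r1
    g2 = 10 ** (-snr_db / 40) / r2
    s1 = [g1 * x for x in s1]
    s2 = [g2 * x for x in s2]
    mix = [a + b for a, b in zip(s1, s2)]
    # Rescale all three together so none of them clips when saved as PCM
    peak = max(max(abs(x) for x in sig) for sig in (mix, s1, s2))
    scale = 0.9 / peak
    return ([scale * x for x in mix],
            [scale * x for x in s1],
            [scale * x for x in s2])


def random_mix(s1, s2, low_db=-5.0, high_db=5.0):
    """Mix two signals at a relative level drawn uniformly from [low_db, high_db]."""
    snr_db = random.uniform(low_db, high_db)
    return snr_db, mix_two_speakers(s1, s2, snr_db)
```

Scaling the sources and the mixture by the same factor keeps the mixture equal to the sum of its sources, which separation training targets rely on.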

Then you can start training on the generated data.

Citing Dataset Processing Script

If you find this repository useful, please cite it in your publications.

@article{li2022audio,
  title={An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits},
  author={Li, Kai and Xie, Fenghua and Chen, Hang and Yuan, Kexin and Hu, Xiaolin},
  journal={arXiv preprint arXiv:2212.10744},
  year={2022}
}

lrs3-for-speech-separation's People

Contributors

jusperlee


lrs3-for-speech-separation's Issues

mix with videos

Hi, you have audio mixing code. Do we need to mix the videos as well? Do you have video mixing code too?

mix two audios

Hello, thank you for this useful work. I would like to know if there's a Python version for create_wav_2speakers.m, which is a Matlab implementation for mixing two audio files. I would really appreciate it.

how to get mix_2_spk_cv.txt

How do I get mix_2_spk_cv.txt, mix_2_spk_tr.txt, and mix_2_spk_tt.txt? Where do the weighting coefficients of the two mixed audio signals come from? Why are the coefficients of the two audio signals exactly opposite?

Noisy LRS3

Hi,

Can you please share the script to generate NTCD-TIMIT and LRS3+WHAM! datasets (AVLIT INTERSPEECH 2023)?

Thanks,
Vahid
