nn-gev's Introduction

Neural network based GEV beamformer

Introduction

This repository contains code to replicate the results for the 3rd CHiME challenge using an NN-GEV beamformer.

Install

This code requires Python 3 to run (although most parts should be compatible with Python 2.7). Install the necessary modules:

    pip install chainer
    pip install tqdm
    pip install scipy
    pip install scikit-learn
    pip install librosa

Usage

  1. Extract the speech and noise images for the SimData using the modified MATLAB script in CHiME3/tools/simulation.

  2. Start training the BLSTM model on the GPU with id 0, using data as the data directory:

    python train.py --chime_dir=../chime/data --gpu 0 data BLSTM
    

    This will first create the training data (i.e. the binary mask targets) and then run the training with early stopping. Instead of BLSTM it is also possible to specify FW to train a simple feed-forward model.

  3. Start the beamforming:

```
beamform.sh ../chime/data data/export_BLSTM data/BLSTM_model/best.nnet BLSTM
```

This will apply the beamformer to every utterance of the CHiME database and store the resulting audio file in ``data/export_BLSTM``. The model ``data/BLSTM_model/best.nnet`` is used to generate the masks.
  4. Start the Kaldi baseline using the exported data.
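Step 2 creates binary mask targets from the speech and noise images before training. As a rough illustration only, here is an ideal binary mask computed from the local speech-to-noise ratio of each time-frequency bin. The function name `ideal_binary_masks` and both thresholds are hypothetical placeholders, not the values or code used by train.py.

```python
import numpy as np


def ideal_binary_masks(speech_stft, noise_stft, speech_thresh_db=0.0, noise_thresh_db=-10.0):
    """Return (speech_mask, noise_mask) with the same shape as the inputs.

    Hypothetical simplification: a bin is labelled speech if its local
    speech-to-noise ratio exceeds speech_thresh_db and noise if it falls
    below noise_thresh_db. Bins in between belong to neither mask.
    """
    eps = 1e-12  # avoids log(0) and division by zero for silent bins
    speech_power = np.abs(speech_stft) ** 2
    noise_power = np.abs(noise_stft) ** 2
    # Local SNR in dB for every time-frequency bin.
    snr_db = 10 * np.log10((speech_power + eps) / (noise_power + eps))
    speech_mask = (snr_db > speech_thresh_db).astype(np.float64)
    noise_mask = (snr_db < noise_thresh_db).astype(np.float64)
    return speech_mask, noise_mask
```

Leaving ambiguous bins out of both masks is one possible design choice: it keeps the targets from labelling bins where speech and noise have similar power.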

If you want to use the beamformer with a different database, take a look at beamform.py and chime_data and modify them accordingly.
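When adapting the beamformer to other data, it can help to see the mask-based GEV idea in isolation. You weight the multichannel STFT with the speech and noise masks to estimate two spatial covariance (PSD) matrices per frequency. The principal generalized eigenvector of that matrix pair then serves as the beamforming vector. The following is a minimal NumPy/SciPy sketch of that idea, not the repository's code. It assumes STFTs of shape (frequencies, channels, frames) and masks of shape (frequencies, frames). The function names are hypothetical, and the sketch omits any post-filter or normalization step.

```python
import numpy as np
from scipy.linalg import eigh


def masked_psd(stft, mask):
    """Mask-weighted spatial covariance per frequency.

    stft: complex array (F, D, T); mask: real array (F, T).
    Returns a complex array (F, D, D) with entries sum_t m * x_d * conj(x_e).
    """
    weighted = stft * mask[:, None, :]
    psd = np.einsum('fdt,fet->fde', weighted, stft.conj())
    # Normalize by the total mask weight in each frequency bin.
    return psd / np.maximum(mask.sum(axis=-1), 1e-10)[:, None, None]


def gev_vectors(speech_psd, noise_psd):
    """Principal generalized eigenvector of (speech_psd, noise_psd) per frequency."""
    F, D, _ = speech_psd.shape
    w = np.empty((F, D), dtype=complex)
    for f in range(F):
        # Solves A v = lambda B v (A Hermitian, B Hermitian positive definite).
        # Eigenvalues come back in ascending order, so the last column is the principal one.
        _, vecs = eigh(speech_psd[f], noise_psd[f])
        w[f] = vecs[:, -1]
    return w


def apply_beamformer(w, stft):
    """Beamformer output Y[f, t] = w[f]^H X[f, :, t], shape (F, T)."""
    return np.einsum('fd,fdt->ft', w.conj(), stft)
```

In use, `gev_vectors(masked_psd(X, speech_mask), masked_psd(X, noise_mask))` would give one vector per frequency for an STFT `X`. `eigh` requires the noise PSD to be positive definite, so very sparse noise masks may need extra care.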

Results

With the new baseline, you should get the following results:

```
local/chime4_calc_wers.sh exp/tri3b_tr05_multi_noisy new_baseline exp/tri3b_tr05_multi_noisy/graph_tgpr_5k
compute dt05 WER for each location
-------------------
best overall dt05 WER 9.77% (language model weight = 12)
-------------------
dt05_simu WER: 9.81% (Average), 8.95% (BUS), 11.28% (CAFE), 8.55% (PEDESTRIAN), 10.44% (STREET)
-------------------
dt05_real WER: 9.73% (Average), 11.67% (BUS), 9.37% (CAFE), 8.41% (PEDESTRIAN), 9.47% (STREET)
-------------------
et05_simu WER: 10.67% (Average), 8.85% (BUS), 11.34% (CAFE), 11.02% (PEDESTRIAN), 11.47% (STREET)
-------------------
et05_real WER: 14.00% (Average), 19.01% (BUS), 13.37% (CAFE), 12.37% (PEDESTRIAN), 11.24% (STREET)
-------------------


./local/chime4_calc_wers_smbr.sh exp/tri4a_dnn_tr05_multi_noisy_smbr_i1lats new_baseline exp/tri4a_dnn_tr05_multi_noisy/graph_tgpr_5k
compute dt05 WER for each location
-------------------
best overall dt05 WER 5.87% (language model weight = 9) (Number of iterations = 4)
-------------------
dt05_simu WER: 5.62% (Average), 5.24% (BUS), 6.58% (CAFE), 4.91% (PEDESTRIAN), 5.77% (STREET)
-------------------
dt05_real WER: 6.11% (Average), 7.66% (BUS), 5.83% (CAFE), 5.10% (PEDESTRIAN), 5.87% (STREET)
-------------------
et05_simu WER: 7.26% (Average), 6.74% (BUS), 7.70% (CAFE), 7.38% (PEDESTRIAN), 7.23% (STREET)
-------------------
et05_real WER: 9.48% (Average), 14.06% (BUS), 8.22% (CAFE), 7.81% (PEDESTRIAN), 7.84% (STREET)
-------------------


local/chime4_calc_wers.sh exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore new_baseline_rnnlm_5k_h300_w0.5_n100 exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore/graph_tgpr_5k
compute dt05 WER for each location
-------------------
best overall dt05 WER 4.02% (language model weight = 11)
-------------------
dt05_simu WER: 3.97% (Average), 3.66% (BUS), 4.65% (CAFE), 3.38% (PEDESTRIAN), 4.19% (STREET)
-------------------
dt05_real WER: 4.07% (Average), 5.34% (BUS), 3.61% (CAFE), 3.35% (PEDESTRIAN), 4.00% (STREET)
-------------------
et05_simu WER: 4.51% (Average), 4.09% (BUS), 4.61% (CAFE), 4.46% (PEDESTRIAN), 4.86% (STREET)
-------------------
et05_real WER: 6.46% (Average), 9.87% (BUS), 5.47% (CAFE), 5.14% (PEDESTRIAN), 5.34% (STREET)
-------------------
```
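The figures above are word error rates: WER = (S + D + I) / N. Here S, D and I are the substitutions, deletions and insertions in the minimum edit-distance alignment of the hypothesis against an N-word reference. The Kaldi scoring scripts compute this for you. The following is only an illustrative sketch of the metric, and the function name `wer` is hypothetical.

```python
def wer(reference, hypothesis):
    """Word error rate in percent between two whitespace-separated strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j].
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution or match
    return 100.0 * dist[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, "the cat sit on mat" against the reference "the cat sat on the mat" has one substitution and one deletion. That gives 2 / 6, about 33.3% WER.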

Citation

If you use this code for your experiments, please consider citing the following paper:

@inproceedings{Hey2016,
title = {NEURAL NETWORK BASED SPECTRAL MASK ESTIMATION FOR ACOUSTIC BEAMFORMING},
author = {J. Heymann and L. Drude and R. Haeb-Umbach},
year = {2016},
date = {2016-03-20},
booktitle = {Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
keywords = {},
pubstate = {forthcoming},
tppubtype = {inproceedings}
}

nn-gev's Issues

Phase correction

Is there a reference to show what the phase correction is doing?
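A sketch of what such a correction commonly looks like follows. It assumes, without confirmation from this page, that the correction targets the phase ambiguity of GEV vectors. A generalized eigenvector is only defined up to a complex scalar per frequency, so vectors in neighbouring bins can have arbitrary, inconsistent phases. One simple remedy rotates each bin's vector so that its inner product with the previous bin's vector becomes real and non-negative. The name `align_phases` is hypothetical.

```python
import numpy as np


def align_phases(w):
    """Align the phase of each frequency's beamforming vector with the previous one.

    w: complex array (F, D). Magnitudes are unchanged; each frequency is only
    multiplied by a unit-modulus factor exp(-j * phi_f).
    """
    w = w.copy()
    for f in range(1, w.shape[0]):
        # np.vdot conjugates its first argument: sum(conj(w[f-1]) * w[f]).
        inner = np.vdot(w[f - 1], w[f])
        # Rotating by the negative phase makes the inner product real and >= 0.
        w[f] *= np.exp(-1j * np.angle(inner))
    return w
```

Because the loop runs from low to high frequencies, each bin is aligned with an already-corrected neighbour. The phase therefore stays consistent across the whole spectrum rather than only pairwise.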

undefined function : json2mat

I have some problems running your CHiME3_simulate_data_patched.m. One of them is "undefined function: json2mat". Can you provide those files?

Thank you very much for your contribution in the field of multi-channel speech enhancement, which has greatly helped many researchers.

MVDR beamforming

The code for the MVDR beamformer requires the ATF (acoustic transfer function) as input. How is it estimated in practice? Can it also be estimated from the signal mask, or does it need a DOA algorithm?

about

Hi, I noticed that there is an MVDR beamformer in your nn_gev code. I want to compare the performance of MVDR and GEV. In beamforming.py, how should I set the atf_vector argument of get_mvdr_vector? Thank you for your answer.
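One commonly used answer to the two MVDR questions above needs no DOA algorithm. This is a sketch under stated assumptions, not necessarily the maintainers' recommendation. First, estimate the ATF per frequency as the principal eigenvector of the mask-weighted speech PSD matrix. Then plug it into the standard MVDR formula w = Φ_NN⁻¹ d / (dᴴ Φ_NN⁻¹ d). The function names are hypothetical. Both PSD inputs are assumed to be (F, D, D) arrays, e.g. mask-weighted covariance estimates, and the noise PSD must be invertible.

```python
import numpy as np


def atf_from_psd(speech_psd):
    """Principal eigenvector of each (D, D) speech PSD matrix; returns (F, D)."""
    # np.linalg.eigh works on stacks and returns eigenvalues in ascending order,
    # so the last eigenvector column belongs to the largest eigenvalue.
    _, vecs = np.linalg.eigh(speech_psd)
    return vecs[..., -1]


def mvdr_vectors(atf, noise_psd):
    """MVDR weights w_f = Phi_NN^-1 d / (d^H Phi_NN^-1 d) for every frequency."""
    numerator = np.linalg.solve(noise_psd, atf[..., None])[..., 0]  # Phi_NN^-1 d
    denominator = np.einsum('fd,fd->f', atf.conj(), numerator)      # d^H Phi_NN^-1 d
    return numerator / denominator[:, None]
```

The resulting weights satisfy the distortionless constraint wᴴd = 1 for the estimated ATF. Because the eigenvector is only defined up to scale and phase, the output is distortionless with respect to that normalized ATF rather than a specific reference microphone.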

Clean speech only training

Hi, as you mention in the paper, it is also possible to train on clean speech only. I checked the source code but couldn't find anything related to this. Could you point me to where you do that, or explain how to do it if the code is not included? Thank you very much!

matlab

Hi, I want to know more about some of the MATLAB functions, e.g. stft_multi() (under "% Compute the STFT (short window)"), estimate_ir() and apply_ir(). Can you provide these functions? I can't get this MATLAB code to run on my data. Alternatively, can you provide these functions in Python?
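For the STFT part of this question, scipy.signal.stft applied along the time axis of a (channels, samples) array gives a rough Python stand-in. This is an assumption, not a port of stft_multi, whose window, hop and scaling are not documented here. estimate_ir() and apply_ir() are not covered. The function names and the default frame length and hop are hypothetical.

```python
import numpy as np
from scipy.signal import stft, istft


def multichannel_stft(signals, fs=16000, frame_length=1024, hop=256):
    """STFT of every channel of a (D, samples) array.

    Returns a complex array of shape (D, F, T) with F = frame_length // 2 + 1.
    Uses SciPy's default Hann window; all defaults are illustrative.
    """
    _, _, spec = stft(signals, fs=fs, nperseg=frame_length,
                      noverlap=frame_length - hop, axis=-1)
    return spec


def multichannel_istft(spec, fs=16000, frame_length=1024, hop=256):
    """Inverse of multichannel_stft; returns a real (D, samples) array.

    The output can be slightly longer than the original because of padding.
    """
    _, signals = istft(spec, fs=fs, nperseg=frame_length,
                       noverlap=frame_length - hop, time_axis=-1, freq_axis=-2)
    return signals
```

With a Hann window at 75% overlap, the round trip reconstructs the input up to numerical precision once the output is trimmed to the original length.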

Pre-trained model noise aware?

Hi,

Are the published pre-trained models (best.nnet) trained with noise-aware data? In other words, were they trained on the simulated (noisy) data or on the "clean" speech-only data?

Thanks in advance.

About data generation

Hi, to generate the required data with the MATLAB script, it seems there should be a 'data' directory under the 'CHiME3' directory containing the .json files, but there isn't one. Is that correct?

CHiME-5 update

Hi, do you have any plans to use the CHiME-5 dataset to re-train the model and evaluate on CHiME-5?
