

Deep Complex U-Net


Unofficial PyTorch implementation of Phase-Aware Speech Enhancement with Deep Complex U-Net (H. Choi et al., 2018)

Note

This is NOT the authors' implementation.

Architecture


(TO BE)

Requirements

torch==1.1
soundfile==0.9.0
easydict==1.9
git+https://github.com/keunwoochoi/torchaudio-contrib@61fc6a804c941dec3cf8a06478704d19fc5e415a
git+https://github.com/sweetcocoa/PinkBlack@e45a65623c1b511181f7ea697ca841a7b2900f17
torchcontrib==0.0.2
git+https://github.com/vBaiCai/python-pesq
# gputil # only if you need to run multiple training processes

Train


  1. Download Datasets:
  2. Separate the train / test wavs
  3. Downsample wavs

# prerequisite: ffmpeg
# sudo apt-get install ffmpeg (Ubuntu)
bash downsample.sh   # converts every wav below $PWD to .flac at a 16 kHz sample rate
  4. Train
python train_dcunet.py --batch_size 12 \
                       --train_signal /path/to/train/clean/speech/ \
                       --train_noise /path/to/train/noisy/speech/ \
                       --test_signal /path/to/test/clean/speech/ \
                       --test_noise /path/to/test/noisy/speech/ \
                       --ckpt /path/to/save/checkpoint.pth \
                       --num_step 50000 \
                       --validation_interval 500 \
                       --complex

# Other arguments are defined in the source code. (Sorry for the lack of description.)
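
The contents of downsample.sh are not reproduced in this README, so the following is only a rough Python sketch of what the downsampling step describes: find every wav below a directory and ask ffmpeg to write a 16 kHz .flac next to it. The helper names `ffmpeg_command` and `downsample_tree` are hypothetical and are not part of this repository. Whether the original script keeps or deletes the source wavs is not stated, so this sketch leaves them in place.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(wav_path: Path, sample_rate: int = 16000) -> list:
    """Build the ffmpeg call that resamples one wav into a .flac beside it."""
    flac_path = wav_path.with_suffix(".flac")
    return [
        "ffmpeg",
        "-y",                      # overwrite an existing output file
        "-i", str(wav_path),       # input wav
        "-ar", str(sample_rate),   # output sample rate in Hz
        str(flac_path),            # ffmpeg picks the FLAC encoder from the extension
    ]


def downsample_tree(root: Path, sample_rate: int = 16000, dry_run: bool = False) -> list:
    """Convert every .wav below root; return the commands that were (or would be) run."""
    commands = [ffmpeg_command(p, sample_rate) for p in sorted(root.rglob("*.wav"))]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if ffmpeg fails on a file
            subprocess.run(cmd, check=True)
    return commands
```

Calling `downsample_tree(Path.cwd(), dry_run=True)` first lists the planned conversions without touching any file, which is a cheap way to confirm the directory layout before running ffmpeg for real.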

Test


python estimate_directory.py --input_dir /path/to/noisy/speech/ \
                             --output_dir /path/to/estimate/dir/ \
                             --ckpt /path/to/saved/checkpoint.pth

Results


PESQ (cRMCn/cRMRn)   Paper       Mine*
DCUNet-10            2.72/2.51   3.03/3.07
DCUNet-20            3.24/2.74   3.12/3.11
  • cRMCn : Complex-valued input/output
  • cRMRn : Real-valued input/output

Comparing the two sets of values above (the paper's and mine) is not appropriate, for the following reasons:

  • * I did not use the MATLAB code that the authors used to calculate PESQ; I used pypesq instead.

  • * The model architecture differs slightly from the original paper (for example, in the kernel sizes of the convolution filters).


Notes


  • Estimating the log amplitude performs slightly worse than estimating the non-log amplitude.
  • The complex-valued network does not improve the metric.

Sample Wavs


Mixture        Estimated Speech   GT (Clean Speech)
mixture1.wav   Estimated1.wav     GroundTruth1.wav
mixture2.wav   Estimated2.wav     GroundTruth2.wav

Contact

