
deepfilternet's Introduction

DeepFilterNet

A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) using Deep Filtering.


For PipeWire integration as a virtual noise suppression microphone, look here.

Demo

DeepFilterNet-Demo-new.mp4

To run the demo (Linux only) use:

cargo +nightly run -p df-demo --features ui --bin df-demo --release

Usage

deep-filter

Download a pre-compiled deep-filter binary from the release page. You can use deep-filter to suppress noise in noisy .wav audio files. Currently, only wav files with a sampling rate of 48kHz are supported.

USAGE:
    deep-filter [OPTIONS] [FILES]...

ARGS:
    <FILES>...

OPTIONS:
    -D, --compensate-delay
            Compensate delay of STFT and model lookahead
    -h, --help
            Print help information
    -m, --model <MODEL>
            Path to model tar.gz. Defaults to DeepFilterNet2.
    -o, --out-dir <OUT_DIR>
            [default: out]
    --pf
            Enable postfilter
    -v, --verbose
            Logging verbosity
    -V, --version
            Print version information

If you want to use the PyTorch backend, e.g. for GPU processing, see the Python usage further below.

DeepFilterNet Framework

This framework supports Linux, macOS, and Windows. Training is only tested under Linux. The framework is structured as follows:

  • libDF contains Rust code used for data loading and augmentation.
  • DeepFilterNet contains DeepFilterNet code for training, evaluation, and visualization, as well as pretrained model weights.
  • pyDF contains a Python wrapper of the libDF STFT/ISTFT processing loop.
  • pyDF-data contains a Python wrapper of libDF dataset functionality and provides a pytorch data loader.
  • ladspa contains a LADSPA plugin for real-time noise suppression.
  • models contains pretrained models for use in DeepFilterNet (Python) or libDF/deep-filter (Rust).

DeepFilterNet Python: PyPI

Install the DeepFilterNet Python wheel via pip:

# Install cpu/cuda pytorch (>=1.9) dependency from pytorch.org, e.g.:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
# Install DeepFilterNet
pip install deepfilternet
# Or install DeepFilterNet including data loading functionality for training (Linux only)
pip install deepfilternet[train]

To enhance noisy audio files using DeepFilterNet run

# Specify an output directory with --output-dir [OUTPUT_DIR]
deepFilter path/to/noisy_audio.wav

Manual Installation

Install cargo via rustup. Using a conda or virtualenv environment is recommended. Please read the comments and only execute the commands that you need.

Installation of python dependencies and libDF:

cd path/to/DeepFilterNet/  # cd into repository
# Recommended: Install or activate a python env
# Mandatory: Install cpu/cuda pytorch (>=1.8) dependency from pytorch.org, e.g.:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
# Install build dependencies used to compile libdf and DeepFilterNet python wheels
pip install maturin poetry

#  Install remaining DeepFilterNet python dependencies
# *Option A:* Install DeepFilterNet python wheel globally within your environment. Do this if you want to use
# this repo as is, and don't want to develop within this repository.
poetry -C DeepFilterNet install -E train -E eval
# *Option B:* If you want to develop within this repo, install only dependencies and work with the repository version
poetry -C DeepFilterNet install -E train -E eval --no-root
export PYTHONPATH=$PWD/DeepFilterNet # And set the python path correctly

# Build and install libdf python package required for enhance.py
maturin develop --release -m pyDF/Cargo.toml
# *Optional*: Install libdfdata python package with dataset and dataloading functionality for training
# Required build dependency: HDF5 headers (e.g. ubuntu: libhdf5-dev)
maturin develop --release -m pyDF-data/Cargo.toml
# If you have troubles with hdf5 you may try to build and link hdf5 statically:
maturin develop --release --features hdf5-static -m pyDF-data/Cargo.toml
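After building, a quick smoke test that the extension is importable (a minimal sketch; it only checks that the module resolves in the active environment):

# Verify that the libdf extension built by maturin can be imported.
import libdf
print(libdf.__file__)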

Use DeepFilterNet from command line

To enhance noisy audio files using DeepFilterNet run

$ python DeepFilterNet/df/enhance.py --help
usage: enhance.py [-h] [--model-base-dir MODEL_BASE_DIR] [--pf] [--output-dir OUTPUT_DIR] [--log-level LOG_LEVEL] [--compensate-delay]
                  noisy_audio_files [noisy_audio_files ...]

positional arguments:
  noisy_audio_files     List of noise files to mix with the clean speech file.

optional arguments:
  -h, --help            show this help message and exit
  --model-base-dir MODEL_BASE_DIR, -m MODEL_BASE_DIR
                        Model directory containing checkpoints and config.
                        To load a pretrained model, you may just provide the model name, e.g. `DeepFilterNet`.
                        By default, the pretrained DeepFilterNet2 model is loaded.
  --pf                  Post-filter that slightly over-attenuates very noisy sections.
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Directory in which the enhanced audio files will be stored.
  --log-level LOG_LEVEL
                        Logger verbosity. Can be one of (debug, info, error, none)
  --compensate-delay, -D
Add some padding to compensate the delay introduced by the real-time STFT/ISTFT implementation.

# Enhance audio with original DeepFilterNet
python DeepFilterNet/df/enhance.py -m DeepFilterNet path/to/noisy_audio.wav

# Enhance audio with DeepFilterNet2
python DeepFilterNet/df/enhance.py -m DeepFilterNet2 path/to/noisy_audio.wav

Use DeepFilterNet within your Python script

from df import enhance, init_df

model, df_state, _ = init_df()  # Load default model
enhanced_audio = enhance(model, df_state, noisy_audio)

See here for a full example.
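A slightly fuller sketch with file I/O (the file names are placeholders; load_audio and save_audio come from df.enhance, and load_audio resamples to the model's sampling rate):

from df.enhance import enhance, init_df, load_audio, save_audio

model, df_state, _ = init_df()  # load default pretrained model
# Load the noisy file, resampled to the model's sampling rate (48 kHz)
audio, _ = load_audio("noisy.wav", sr=df_state.sr())
enhanced = enhance(model, df_state, audio)
save_audio("enhanced.wav", enhanced, df_state.sr())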

Training

The entry point is DeepFilterNet/df/train.py. It expects a data directory containing HDF5 datasets as well as a dataset configuration JSON file.

So, you first need to create your datasets in HDF5 format. Each dataset typically holds only the training, validation, or test set of noise, speech, or RIRs.

# Install additional dependencies for dataset creation
pip install h5py librosa soundfile
# Go to DeepFilterNet python package
cd path/to/DeepFilterNet/DeepFilterNet
# Prepare text file (e.g. called training_set.txt) containing paths to .wav files
#
# usage: prepare_data.py [-h] [--num_workers NUM_WORKERS] [--max_freq MAX_FREQ] [--sr SR] [--dtype DTYPE]
#                        [--codec CODEC] [--mono] [--compression COMPRESSION]
#                        type audio_files hdf5_db
#
# where:
#   type: One of `speech`, `noise`, `rir`
#   audio_files: Text file containing paths to audio files to include in the dataset
#   hdf5_db: Output HDF5 dataset.
python df/scripts/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5
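For example, the file list can be generated with a few lines of Python (the directory path here is hypothetical):

# Write the file list that prepare_data.py expects: one path per line.
from pathlib import Path

with open("training_set.txt", "w") as f:
    for wav in sorted(Path("/data/speech").rglob("*.wav")):
        f.write(str(wav.resolve()) + "\n")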

All datasets should be made available in one dataset folder for the train script.

The dataset configuration file should contain 3 entries: "train", "valid", "test". Each of those contains a list of datasets (e.g. a speech, a noise, and a RIR dataset). You can use multiple speech or noise datasets. Optionally, a sampling factor may be specified to over- or under-sample a dataset. Say you have a specific dataset with transient noises and want to increase the amount of non-stationary noises by oversampling. In most cases you want to set this factor to 1 (an oversampling example follows the config below).

Dataset config example:

dataset.cfg

{
  "train": [
    [
      "TRAIN_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "TRAIN_SET_NOISE.hdf5",
      1.0
    ],
    [
      "TRAIN_SET_RIR.hdf5",
      1.0
    ]
  ],
  "valid": [
    [
      "VALID_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "VALID_SET_NOISE.hdf5",
      1.0
    ],
    [
      "VALID_SET_RIR.hdf5",
      1.0
    ]
  ],
  "test": [
    [
      "TEST_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "TEST_SET_NOISE.hdf5",
      1.0
    ],
    [
      "TEST_SET_RIR.hdf5",
      1.0
    ]
  ]
}
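For instance, a (hypothetical) transient-noise set oversampled by a factor of 2 relative to the other datasets would get an entry like this inside the "train" list:

[
  "TRAIN_SET_NOISE_TRANSIENT.hdf5",
  2.0
]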

Finally, start the training script. The training script creates the model base_dir, if it does not exist, which is used for logging, some audio samples, model checkpoints, and the config. If no config file is found, it will create a default config. See DeepFilterNet/pretrained_models/DeepFilterNet for a config file.

# usage: train.py [-h] [--debug] data_config_file data_dir base_dir
python df/train.py path/to/dataset.cfg path/to/data_dir/ path/to/base_dir/

Citation Guide

To reproduce any metrics, we recommend using the Python implementation via pip install deepfilternet.

If you use this framework, please cite: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering

@inproceedings{schroeter2022deepfilternet,
  title={{DeepFilterNet}: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering}, 
  author = {Schröter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
  booktitle={ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2022},
  organization={IEEE}
}

If you use the DeepFilterNet2 model, please cite: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio

@inproceedings{schroeter2022deepfilternet2,
  title = {{DeepFilterNet2}: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio},
  author = {Schröter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
  booktitle={17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022)},
  year = {2022},
}

If you use the DeepFilterNet3 model, please cite: DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement

@inproceedings{schroeter2023deepfilternet3,
  title = {{DeepFilterNet}: Perceptually Motivated Real-Time Speech Enhancement},
  author = {Schröter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
  booktitle={INTERSPEECH},
  year = {2023},
}

If you use the multi-frame beamforming algorithms, please cite: Deep Multi-Frame Filtering for Hearing Aids

@inproceedings{schroeter2023deep_mf,
  title = {Deep Multi-Frame Filtering for Hearing Aids},
  author = {Schröter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
  booktitle={INTERSPEECH},
  year = {2023},
}

License

DeepFilterNet is free and open source! All code in this repository is dual-licensed under either:

  • MIT License
  • Apache License, Version 2.0

at your option. This means you can select the license you prefer!

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


deepfilternet's Issues

Question for metric evaluation file

Hi, thanks for this work! I have some questions about the calculation of STOI. How can we use evaluation_utils.py? Could you explain it with an example or add some explanations to the README?
When I try to get a result, I always see the exception 'x and y should have the same length'. I am using your pretrained DeepFilterNet2 model and the Valentini test dataset. The clean and enhanced tensor lengths are close but not the same.

Noise of typing not working well

Hi Rikorose,

Thanks for working on version 2 of DeepFilterNet.
Now I can run the real-time inference process with buffer size 1, which matches the full-signal result.
The point is that the state of the RNN needs to be carried over.

Now I'm having trouble with typing/keyboard noise not being suppressed well.
But I only use the spectral loss with c=0.3 in DeepFilterNet2 now; will the multi-resolution loss improve this case,
or is c=0.6 from the previous work better?

Thanks,
Aaron

ModuleNotFoundError: No module named 'libdfdata'

Hello,

When I try to run df/train.py, I get the error ModuleNotFoundError: No module named 'libdfdata'. I understand that libdfdata lives in the pyDF-data folder.

Is it possible to fix this or is there any modification required from my side?

poetry install is extremely slow when resolving the dependencies

Hi @Rikorose
Hello, thanks for your open source DeepFilterNet work. After installing from PyPI and trying it out, I found the quality and the amount of computation to be excellent.
So I wanted to study the network carefully and do a manual installation on Win10.
First, I set up a conda env and downloaded Rust and cargo: rustc 1.61.0 (fe5b13d68 2022-05-18).
Second, in a Python 3.9 conda env, I followed your README:

      pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
      pip install maturin poetry
      maturin develop --release -m pyDF/Cargo.toml
      maturin develop --release -m pyDF-data/Cargo.toml

These commands complete correctly; the key packages DeepFilterDataLoader and DeepFilterLib install successfully.

asttokens            2.0.5
CacheControl         0.12.11
cachy                0.3.0
certifi              2022.5.18.1
charset-normalizer   2.0.12
cleo                 0.8.1
clikit               0.6.2
colorama             0.4.5
crashtest            0.3.1
DeepFilterDataLoader 0.2.5rc0
DeepFilterLib        0.2.5rc0
distlib              0.3.4
executing            0.8.3
filelock             3.7.1
html5lib             1.1
icecream             2.1.2
idna                 3.3
importlib-metadata   4.11.4
keyring              23.6.0
lockfile             0.12.2
loguru               0.6.0
maturin              0.12.20
msgpack              1.0.4
numpy                1.22.4
packaging            20.9
pastel               0.2.1
pexpect              4.8.0
pip                  21.2.4
pkginfo              1.8.3
platformdirs         2.5.2
poetry               1.1.13
poetry-core          1.0.8
ptflops              0.6.9
ptyprocess           0.7.0
Pygments             2.12.0
pylev                1.4.0
pyparsing            3.0.9
pywin32-ctypes       0.2.0
requests             2.28.0
requests-toolbelt    0.9.1
setuptools           61.2.0
shellingham          1.4.0
six                  1.16.0
tomli                2.0.1
tomlkit              0.11.0
torch                1.11.0+cpu
torchaudio           0.11.0+cpu
typing_extensions    4.2.0
urllib3              1.26.9
virtualenv           20.14.1
webencodings         0.5.1
wheel                0.37.1
win32-setctime       1.1.0
wincertstore         0.2
zipp                 3.8.0

Third, poetry install -E train -E eval or poetry install -E train -E eval --no-root: these commands block and do not return any results, even after an hour or more.

(DeepFilterNet) E:\code\DeepFilterNet\DeepFilterNet>poetry install -E train -E eval --no-root
Updating dependencies
Resolving dependencies...

How can I fix this, and how can I debug where it is blocking?
Thanks!

Windows Anaconda install failed?

@Rikorose
Thanks, this is a very good project. I used the web demo to test, and the results are very good, but my local installation fails. I use the Anaconda environment. How can this be fixed?

(pytorch36) C:\Users\admin>pip install deepfilternet

 ERROR: Cannot install deepfilternet==0.1.2, deepfilternet==0.1.3 and deepfilternet==0.1.4 because these package versions have conflicting dependencies.

The conflict is caused by:
    deepfilternet 0.1.4 depends on DeepFilterLib<0.2 and >=0.1
    deepfilternet 0.1.3 depends on DeepFilterLib<0.2 and >=0.1
    deepfilternet 0.1.2 depends on DeepFilterLib<0.2 and >=0.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

(pytorch36) C:\Users\admin>pip install DeepFilterLib

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not find a version that satisfies the requirement DeepFilterLib(from versions: none)
ERROR: No matching distribution found for DeepFilterLib
(pytorch36) C:\Users\admin>conda list
# packages in environment at D:\ProgramData\miniconda3\envs\pytorch36:
#
# Name                    Version                   Build  Channel
absl-py                   1.1.0                    pypi_0    pypi
appdirs                   1.4.4                    pypi_0    pypi
attrs                     21.4.0                   pypi_0    pypi
audioread                 2.1.9                    pypi_0    pypi
blas                      2.111                       mkl    conda-forge
blas-devel                3.9.0              11_win64_mkl    conda-forge
ca-certificates           2022.5.18.1          h5b45459_0    conda-forge
cachetools                4.2.4                    pypi_0    pypi
certifi                   2022.5.18.1              pypi_0    pypi
cffi                      1.15.0                   pypi_0    pypi
charset-normalizer        2.0.12                   pypi_0    pypi
colorama                  0.4.4                    pypi_0    pypi
cudatoolkit               11.1.1               heb2d755_7    conda-forge
cycler                    0.10.0                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cython                    0.29.30                  pypi_0    pypi
dataclasses               0.8                pyh787bdff_2    conda-forge
decorator                 4.4.2                    pypi_0    pypi
ear                       2.1.0                    pypi_0    pypi
flatbuffers               2.0                      pypi_0    pypi
freetype                  2.10.4               h546665d_1    conda-forge
google-auth               2.7.0                    pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
grpcio                    1.46.3                   pypi_0    pypi
icu                       68.1                 h0e60522_0    conda-forge
idna                      3.3                      pypi_0    pypi
imageio                   2.15.0                   pypi_0    pypi
importlib-metadata        4.8.3                    pypi_0    pypi
importlib-resources       5.4.0                    pypi_0    pypi
intel-openmp              2021.3.0          h57928b3_3372    conda-forge
jbig                      2.1               h8d14728_2003    conda-forge
joblib                    1.1.0                    pypi_0    pypi
jpeg                      9d                   h8ffe710_0    conda-forge
kiwisolver                1.3.1            py36he95197e_1    conda-forge
lcms2                     2.12                 h2a16943_0    conda-forge
lerc                      2.2.1                h0e60522_0    conda-forge
libblas                   3.9.0              11_win64_mkl    conda-forge
libcblas                  3.9.0              11_win64_mkl    conda-forge
libclang                  11.1.0          default_h5c34c98_1    conda-forge
libdeflate                1.7                  h8ffe710_5    conda-forge
liblapack                 3.9.0              11_win64_mkl    conda-forge
liblapacke                3.9.0              11_win64_mkl    conda-forge
libpng                    1.6.37               h1d00b33_2    conda-forge
libprotobuf               3.18.0               h7755175_1    conda-forge
librosa                   0.9.1                    pypi_0    pypi
libtiff                   4.3.0                h0c97f57_1    conda-forge
libuv                     1.42.0               h8ffe710_0    conda-forge
llvmlite                  0.36.0                   pypi_0    pypi
lxml                      4.9.0                    pypi_0    pypi
lz4-c                     1.9.3                h8ffe710_1    conda-forge
m2w64-gcc-libgfortran     5.3.0                         6    conda-forge
m2w64-gcc-libs            5.3.0                         7    conda-forge
m2w64-gcc-libs-core       5.3.0                         7    conda-forge
m2w64-gmp                 6.1.0                         2    conda-forge
m2w64-libwinpthread-git   5.0.0.4634.697f757               2    conda-forge
markdown                  3.3.7                    pypi_0    pypi
matplotlib                3.3.1                         1    conda-forge
matplotlib-base           3.3.1            py36h856a30b_0    conda-forge
mkl                       2021.3.0           hb70f87d_564    conda-forge
mkl-devel                 2021.3.0           h57928b3_565    conda-forge
mkl-include               2021.3.0           hb70f87d_564    conda-forge
msys2-conda-epoch         20160418                      1    conda-forge
multipledispatch          0.6.0                    pypi_0    pypi
networkx                  2.5.1                    pypi_0    pypi
ninja                     1.10.2               h5362a0b_0    conda-forge
numba                     0.53.1                   pypi_0    pypi
numpy                     1.19.5           py36h4b40d73_2    conda-forge
oauthlib                  3.2.0                    pypi_0    pypi
olefile                   0.46               pyh9f0ad1d_1    conda-forge
onnx                      1.10.1           py36h524f2fb_1    conda-forge
onnxruntime               1.10.0                   pypi_0    pypi
openjpeg                  2.4.0                hb211442_1    conda-forge
openssl                   1.1.1o               h8ffe710_0    conda-forge
packaging                 21.3                     pypi_0    pypi
pandas                    1.1.5                    pypi_0    pypi
pesq                      0.0.4                    pypi_0    pypi
pillow                    8.3.2            py36h10c25d6_0    conda-forge
pip                       21.3.1                   pypi_0    pypi
pooch                     1.6.0                    pypi_0    pypi
prettytable               2.5.0                    pypi_0    pypi
protobuf                  3.19.4                   pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pycparser                 2.21                     pypi_0    pypi
pyparsing                 2.2.0                    py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pypesq                    1.2.4                    pypi_0    pypi
pyqt                      5.12.3           py36ha15d459_7    conda-forge
pyqt-impl                 5.12.3           py36he2d232f_7    conda-forge
pyqt5-sip                 4.19.18          py36he2d232f_7    conda-forge
pyqtchart                 5.12             py36he2d232f_7    conda-forge
pyqtwebengine             5.12.1           py36he2d232f_7    conda-forge
pystoi                    0.3.3                    pypi_0    pypi
pytest-runner             5.3.2                    pypi_0    pypi
python                    3.6.13          h39d44d4_2_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.6                     2_cp36m    conda-forge
pytorch                   1.9.1           py3.6_cuda11.1_cudnn8_0    pytorch
pytz                      2022.1                   pypi_0    pypi
pyvad                     0.1.3                    pypi_0    pypi
pywavelets                1.1.1                    pypi_0    pypi
qt                        5.12.9               h5909a2a_4    conda-forge
requests                  2.27.1                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
resampy                   0.2.2                    pypi_0    pypi
rsa                       4.8                      pypi_0    pypi
ruamel-yaml               0.17.21                  pypi_0    pypi
ruamel-yaml-clib          0.2.6                    pypi_0    pypi
scikit-image              0.17.2                   pypi_0    pypi
scikit-learn              0.24.2                   pypi_0    pypi
scipy                     1.5.4                    pypi_0    pypi
setuptools                59.5.0                   pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
soundfile                 0.10.3.post1             pypi_0    pypi
speechpy                  2.4                      pypi_0    pypi
sqlite                    3.36.0               h8ffe710_1    conda-forge
tbb                       2021.3.0             h2d74725_0    conda-forge
tensorboard               2.9.1                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tifffile                  2020.9.3                 pypi_0    pypi
tk                        8.6.11               h8ffe710_1    conda-forge
torchaudio                0.9.1                      py36    pytorch
torchvision               0.2.2                      py_3    pytorch
tornado                   4.5.2                    py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
tqdm                      4.64.0                   pypi_0    pypi
typing-extensions         3.10.0.2             hd8ed1ab_0    conda-forge
typing_extensions         3.10.0.2           pyha770c72_0    conda-forge
ucrt                      10.0.20348.0         h57928b3_0    conda-forge
urllib3                   1.26.9                   pypi_0    pypi
vc                        14.2                 hb210afc_5    conda-forge
vs2015_runtime            14.29.30037          h902a5da_5    conda-forge
wavinfo                   1.6.3                    pypi_0    pypi
wcwidth                   0.2.5                    pypi_0    pypi
webrtcvad                 2.0.10                   pypi_0    pypi
werkzeug                  2.0.3                    pypi_0    pypi
wheel                     0.29.0                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
wincertstore              0.2                      py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
xz                        5.2.5                h62dcd97_1    conda-forge
zipp                      3.6.0                    pypi_0    pypi
zlib                      1.2.11            h62dcd97_1010    conda-forge
zstd                      1.5.0                h6255e5f_0    conda-forge

Fine Tuning Option

Hi,
Is there any option to fine-tune the pre-trained models?
Thanks for your time.

Reproducing results

Hi,

I tried to re-train the DeepFilterNet model using the DNS-3 challenge dataset mentioned in your work.

I don't have the additional 10k RIRs. However, the other datasets remain the same.

On the VCTK test set, using the config.ini from the pre-trained model as my training config, my "best model" on validation gives a PESQ score of 2.60. That is much lower than the 2.81 from the pre-trained model.

In config.ini, AdamW is used, while the paper mentions Adam as the optimizer.

Do you think any other factors would result in such a performance drop?

Could you clarify the 3 s samples used for training? Suppose a DNS-3 sample is 10 s long: do I need to split it into 3 s segments so as to utilize the entire training clip, or just use the first 3 seconds? Alternatively, are random 3 s segments generated on the fly while training?

In the HDF5 setup, do the speech/noise/RIR sets need to have the same number of samples, or are the noise and RIR sampled randomly from a list? For example, if the speech list has 1000 samples, the noise list 100 samples, and the RIR list 100 samples, is that okay, or should it be 1000 speech, 1000 noise, 1000 RIR? Do the durations of the speech and noise samples need to be the same?

How about the reverberation parameter p_reverb = 0.05? Is the data augmentation performed by default, or is any other config needed? config.ini has conv_lookahead = 2, but the paper mentions a "look-ahead of l = 1 frame for both DF as well as in the DNN convolutions".

The accuracy problem after streaming implementation

Hello, thanks for your open source DeepFilterNet work. After trying it out, I found the effect and the amount of computation to be excellent.

After studying the network carefully, I confirmed that it meets the requirements of streaming speech processing. Therefore, after calculating the padding size, I changed the forward inference part of the model into a streaming implementation (a for loop):
class Encoder(nn.Module):
def __init__(self):
    super().__init__()
    p = ModelParams()
    layer_width = p.conv_ch
    wf = p.conv_width_f
    assert p.nb_erb % 4 == 0, "erb_bins should be divisible by 4"

    k = p.conv_k_enc
    kwargs = {"batch_norm": True, "depthwise": p.conv_depthwise}
    k0 = 1 if k == 1 and p.conv_lookahead == 0 else max(2, k)
    cl = 1 if p.conv_lookahead > 0 else 0
    self.erb_conv0 = convkxf(1, layer_width, k=k0, fstride=1, lookahead=cl, **kwargs)
    cl = 1 if p.conv_lookahead > 1 else 0
    self.erb_conv1 = convkxf(
        layer_width * wf**0, layer_width * wf**1, k=k, lookahead=cl, **kwargs
    )
    cl = 1 if p.conv_lookahead > 2 else 0
    self.erb_conv2 = convkxf(
        layer_width * wf**1, layer_width * wf**2, k=k, lookahead=cl, **kwargs
    )
    self.erb_conv3 = convkxf(
        layer_width * wf**2, layer_width * wf**2, k=k, fstride=1, **kwargs
    )
    self.df_conv0 = convkxf(
        2, layer_width, fstride=1, k=k0, lookahead=p.conv_lookahead, **kwargs
    )
    self.df_conv1 = convkxf(layer_width, layer_width * wf**1, k=k, **kwargs)
    self.erb_bins = p.nb_erb
    self.emb_dim = layer_width * p.nb_erb // 4 * wf**2
    self.df_fc_emb = GroupedLinear(
        layer_width * p.nb_df // 2, self.emb_dim, groups=p.lin_groups
    )
    self.emb_out_dim = p.emb_hidden_dim
    self.emb_n_layers = p.emb_num_layers
    self.gru_groups = p.gru_groups
    self.emb_gru = GroupedGRU(
        self.emb_dim,
        self.emb_out_dim,
        num_layers=p.emb_num_layers,
        batch_first=False,
        groups=p.gru_groups,
        shuffle=p.group_shuffle,
        add_outputs=True,
    )
    self.lsnr_fc = nn.Sequential(nn.Linear(self.emb_out_dim, 1), nn.Sigmoid())
    self.lsnr_scale = p.lsnr_max - p.lsnr_min
    self.lsnr_offset = p.lsnr_min

    self.streaming_state = {
        'e1': None,
        'e2': None,
        'c0': None,
    }

def forward(
    self, feat_erb: Tensor, feat_spec: Tensor
) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
    # Encodes erb; erb should be in dB scale + normalized; Fe are number of erb bands.
    #streaming implementation
    B, C, T, Ferb = feat_erb.shape
    erb_padding_right = torch.zeros((B, C, 2, Ferb), dtype=feat_erb.dtype, device=feat_erb.device)
    feat_erb = torch.cat([feat_erb, erb_padding_right], dim=-2)

    B, _, T, Fspec = feat_spec.shape
    spec_padding_right = torch.zeros((B, 2, 2, Fspec), dtype=feat_spec.dtype, device=feat_spec.device)
    feat_spec = torch.cat([feat_spec, spec_padding_right], dim=-2)
    e0_s, e1_s, e2_s, e3_s = None, None, None, None
    emb_s, c0_s, c1_s, lsnr_s = None, None, None, None

    self.streaming_state['e1'] = torch.zeros((B, 64, 1, Ferb // 2), dtype=feat_erb.dtype, device=feat_erb.device)
    self.streaming_state['e2'] = torch.zeros((B, 64, 1, Ferb // 4), dtype=feat_erb.dtype, device=feat_erb.device)
    self.streaming_state['c0'] = torch.zeros((B, 64, 1, Fspec), dtype=feat_spec.dtype, device=feat_spec.device)
  
    for t in range(T):  
        sub_e0 = self.erb_conv0[1:](feat_erb[:,:,t:t+3,:])  # [B, C, 2, F]
        sub_e1 = self.erb_conv1[1:](sub_e0)  # [B, C*2, 1, F/2]
        sub_e2 = self.erb_conv2[1:](torch.cat([self.streaming_state['e1'],sub_e1], dim=-2))  # [B, C*4, 1, F/4]
        sub_e3 = self.erb_conv3[1:](torch.cat([self.streaming_state['e2'],sub_e2], dim=-2))  # [B, C*4, 1, F/4]
        self.streaming_state['e1'] = sub_e1
        self.streaming_state['e2'] = sub_e2
        sub_c0 = self.df_conv0[1:](feat_spec[:,:,t+1:t+3,:])# [B, C, 1, Fc]
        sub_c1 = self.df_conv1[1:](torch.cat([self.streaming_state['c0'],sub_c0], dim=-2)) # [B, C*2, 1, Fc]  
        self.streaming_state['c0'] = sub_c0
        sub_cemb = sub_c1.permute(2, 0, 1, 3).reshape(1, B, -1)  # [1, B, C * Fc/4]
        sub_cemb = self.df_fc_emb(sub_cemb)  # [1, B, C * F/4]
        sub_emb = sub_e3.permute(2, 0, 1, 3).reshape(1, B, -1)  # [1, B, C * F/4]
        sub_emb = sub_emb + sub_cemb
        sub_emb, _ = self.emb_gru(sub_emb)
        sub_emb = sub_emb.transpose(0, 1) # [B, 1, C * F/4]
        sub_lsnr = self.lsnr_fc(sub_emb) * self.lsnr_scale + self.lsnr_offset

        if t == 0:
            e0_s, e1_s, e2_s, e3_s = sub_e0[:, :, [0], :], sub_e1, sub_e2, sub_e3
            c0_s, c1_s, emb_s, lsnr_s = sub_c0, sub_c1, sub_emb, sub_lsnr
        else:
            e0_s = torch.cat((e0_s, sub_e0[:, :, [0], :]), dim=-2)
            e1_s = torch.cat((e1_s, sub_e1), dim=-2)
            e2_s = torch.cat((e2_s, sub_e2), dim=-2)
            e3_s = torch.cat((e3_s, sub_e3), dim=-2)
            c0_s = torch.cat((c0_s, sub_c0), dim=-2)
            c1_s = torch.cat((c1_s, sub_c1), dim=-2)
            emb_s = torch.cat((emb_s, sub_emb), dim=-2)
            lsnr_s = torch.cat((lsnr_s, sub_lsnr), dim=-2)

    return e0_s, e1_s, e2_s, e3_s, emb_s, c0_s, lsnr_s

class ErbDecoder(nn.Module):
def __init__(self):
    super().__init__()
    p = ModelParams()
    layer_width = p.conv_ch
    wf = p.conv_width_f
    assert p.nb_erb % 8 == 0, "erb_bins should be divisible by 8"

    self.emb_width = layer_width * wf**2
    self.emb_dim = self.emb_width * (p.nb_erb // 4)
    self.fc_emb = nn.Sequential(
        GroupedLinear(
            p.emb_hidden_dim, self.emb_dim, groups=p.lin_groups, shuffle=p.group_shuffle
        ),
        nn.ReLU(inplace=True),
    )
    k = p.conv_k_dec
    kwargs = {"k": k, "batch_norm": True, "depthwise": p.conv_depthwise}
    tkwargs = {
        "k": k,
        "batch_norm": True,
        "depthwise": p.convt_depthwise,
        "mode": p.conv_dec_mode,
    }
    pkwargs = {"k": 1, "f": 1, "batch_norm": True}
    # convt: TransposedConvolution, convp: Pathway (encoder to decoder) convolutions
    self.conv3p = convkxf(layer_width * wf**2, self.emb_width, **pkwargs)
    self.convt3 = convkxf(self.emb_width, layer_width * wf**2, fstride=1, **kwargs)
    self.conv2p = convkxf(layer_width * wf**2, layer_width * wf**2, **pkwargs)
    self.convt2 = convkxf(layer_width * wf**2, layer_width * wf**1, **tkwargs)
    self.conv1p = convkxf(layer_width * wf**1, layer_width * wf**1, **pkwargs)
    self.convt1 = convkxf(layer_width * wf**1, layer_width * wf**0, **tkwargs)
    self.conv0p = convkxf(layer_width, layer_width, **pkwargs)
    self.conv0_out = convkxf(layer_width, 1, fstride=1, k=k, act=nn.Sigmoid())

    self.streaming_state = {
        'convt3in': None,
        'convt2in': None,
        'convt1in': None,
        'conv0in': None
    }

def forward(self, emb, e3, e2, e1, e0) -> Tensor:
    # Estimates erb mask
    #streaming implementation
    B, C, T, F8 = e3.shape
    data_type, device = e3.dtype, e3.device
    self.streaming_state['convt3in'] = torch.zeros((B, C, 1, F8), dtype=data_type, device=device)
    self.streaming_state['convt2in'] = torch.zeros((B, C, 1, F8), dtype=data_type, device=device)
    self.streaming_state['convt1in'] = torch.zeros((B, C, 1, F8*2), dtype=data_type, device=device)
    self.streaming_state['conv0in'] = torch.zeros((B, C, 1, F8*4), dtype=data_type, device=device)
    m = None
    for t in range(T):
        sub_emb = self.fc_emb(emb[:, [t], :])
        sub_emb = sub_emb.view(B, 1, -1, F8).transpose(1, 2)  # [B, C*8, T, F/8]
        convt3_in_cur = self.conv3p(e3[:, :, [t], :]) + sub_emb
        convt3_in = torch.cat([self.streaming_state['convt3in'], convt3_in_cur], dim=-2)
        self.streaming_state['convt3in'] = convt3_in_cur
        sub_e3 = self.convt3[1:](convt3_in)  # [B, C*4, T, F/4]
        convt2_in_cur = self.conv2p(e2[:, :, [t], :]) + sub_e3
        convt2_in = torch.cat([self.streaming_state['convt2in'], convt2_in_cur], dim=-2)
        self.streaming_state['convt2in'] = convt2_in_cur
        sub_e2 = self.convt2[1:](convt2_in)  # [B, C*2, T, F/2]
        convt1_in_cur = self.conv1p(e1[:, :, [t], :]) + sub_e2
        convt1_in = torch.cat([self.streaming_state['convt1in'], convt1_in_cur], dim=-2)
        self.streaming_state['convt1in'] = convt1_in_cur
        sub_e1 = self.convt1[1:](convt1_in)  # [B, C, T, F]
        conv0_in_cur = self.conv0p(e0[:, :, [t], :]) + sub_e1
        conv0_in = torch.cat([self.streaming_state['conv0in'], conv0_in_cur], dim=-2)
        self.streaming_state['conv0in'] = conv0_in_cur
        sub_m = self.conv0_out[1:](conv0_in)  # [B, 1, T, F]
        if t == 0:
            m = sub_m
        else:
            m = torch.cat((m, sub_m), dim=-2)
    return m

class DfNet(nn.Module):
def __init__(
    self,
    erb_inv_fb: Tensor,
    run_df: bool = True,
    train_mask: bool = True,
):
    super().__init__()
    p = ModelParams()
    layer_width = p.conv_ch
    assert p.nb_erb % 8 == 0, "erb_bins should be divisible by 8"
    self.freq_bins = p.fft_size // 2 + 1
    self.emb_dim = layer_width * p.nb_erb
    self.erb_bins = p.nb_erb
    self.enc = Encoder()
    self.erb_dec = ErbDecoder()
    self.mask = Mask(erb_inv_fb, post_filter=p.mask_pf)

    self.df_order = p.df_order
    self.df_bins = p.nb_df
    self.df_lookahead = p.df_lookahead
    self.df_dec = DfDecoder()
    self.df_op = torch.jit.script(
        DfOp(
            p.nb_df,
            p.df_order,
            p.df_lookahead,
            freq_bins=self.freq_bins,
            method=p.dfop_method,
        )
    )

    self.run_df = run_df
    if not run_df:
        from loguru import logger
        logger.warning("Runing without DF")
    self.train_mask = train_mask

def forward(
    self,
    spec: Tensor,
    feat_erb: Tensor,
    feat_spec: Tensor,  # Not used, take spec modified by mask instead
    atten_lim: Optional[Tensor] = None,
) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
    feat_spec = feat_spec.transpose(1, 4).squeeze(4)  # re/im into channel axis
    e0, e1, e2, e3, emb, c0, lsnr = self.enc(feat_erb, feat_spec)
    m = self.erb_dec(emb, e3, e2, e1, e0)
    spec = self.mask(spec, m, atten_lim)
    self.run_df = False
    if self.run_df:
        df_coefs, df_alpha = self.df_dec(emb, c0)
        spec = self.df_op(spec, df_coefs, df_alpha)
    else:
        df_alpha = torch.zeros(spec.shape[0], spec.shape[2], 1, device=spec.device)
    return spec, m, lsnr, df_alpha   


I only use the Encoder and ErbDecoder modules. However, my result was not as good. Later, I found that the difference is due to nn.GRU inference: batched and frame-by-frame inference of nn.GRU give different results because of numerical accuracy. https://pytorch.org/docs/stable/notes/numerical_accuracy.html
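To illustrate, a minimal sketch of this effect (sizes are arbitrary; the difference is typically largest on GPU/cuDNN and may be zero on CPU):

# Compare batched vs. frame-by-frame nn.GRU inference; per the PyTorch
# numerical accuracy notes, the two may differ within floating point tolerance.
import torch

torch.manual_seed(0)
gru = torch.nn.GRU(input_size=16, hidden_size=16)
x = torch.randn(100, 1, 16)  # [T, B, F]

y_batched, _ = gru(x)  # whole sequence at once

h = None
steps = []
for t in range(x.shape[0]):  # one frame at a time, carrying the hidden state
    y_t, h = gru(x[t : t + 1], h)
    steps.append(y_t)
y_stepwise = torch.cat(steps, dim=0)

print((y_batched - y_stepwise).abs().max())  # small, but possibly nonzero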

failed to run custom build command for `hdf5-sys v0.8.1`

When I use the following command to install libdfdata, it fails:

Optional: Install libdfdata python package with dataset and dataloading functionality for training

maturin develop --release -m pyDF-data/Cargo.toml

error: failed to run custom build command for hdf5-sys v0.8.1

Caused by:
process didn't exit successfully: E:\data\deeplearning\pytorch\DeepFilterNet\target\release\build\hdf5-sys-8ffb164969e6e670\build-script-build (exit code: 101)
--- stdout
Searching for installed HDF5 (any version)...
Found no HDF5 installations.

--- stderr
thread 'main' panicked at 'Unable to locate HDF5 root directory and/or headers.', C:\Users\tangzixing\.cargo\registry\src\github.com-1ecc6299db9ec823\hdf5-sys-0.8.1\build.rs:548:13
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: build failed
💥 maturin failed
Caused by: Failed to build a native library through cargo
Caused by: Cargo build finished with "exit code: 101": cargo rustc --message-format json --manifest-path pyDF-data/Cargo.toml --release --lib -- -C link-arg=-s

Question about deepfilter2 code

Thanks for your awesome work!
I am confused about the pad_feat/pad_spec and df_op functions, so I opened this issue to check.
First, I tried to test your trained model, and the class DfNet() in deepfilternet2.py:

self.pad_feat = nn.ConstantPad2d((0, 0, -p.conv_lookahead, p.conv_lookahead), 0.0)
self.pad_spec = nn.ConstantPad3d((0, 0, 0, 0, -p.df_lookahead, p.df_lookahead), 0.0)
self.pad_out = nn.Identity()

and for lines 430-432 and 444-445:

feat_erb = self.pad_feat(feat_erb)
feat_spec = self.pad_feat(feat_spec)
e0, e1, e2, e3, emb, c0, lsnr = self.enc(feat_erb, feat_spec)

spec_f = self.pad_spec(spec)
spec_f = self.df_op(spec_f, df_coefs)

My questions are:
a. In nn.ConstantPad2d/3d, -p.df_lookahead = -2 means removing data, so are 2 frames of information missing during training?
b. Is self.df_op a causal or non-causal model? For example, is the first frame calculated using 0,0,0,0 and 3 frames?

Thanks!
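Regarding question a., negative padding crops rather than pads; a minimal sketch of the time-axis behavior (illustrative values only, not the repo's shapes):

# Negative entries in nn.ConstantPad2d remove elements instead of padding.
import torch

x = torch.arange(10.0).view(1, 1, 10, 1)          # [B, C, T, F]
pad = torch.nn.ConstantPad2d((0, 0, -2, 2), 0.0)  # crop 2 leading frames, zero-pad 2 trailing
print(pad(x).flatten())  # tensor([2., 3., 4., 5., 6., 7., 8., 9., 0., 0.])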

question about function "band_mean_norm_erb"

Hi, Rikorose
Thanks for sharing this code! There are some problems for me; could you give me some suggestions?

(1) How does the function "band_mean_norm_erb" work, or could you point me to some papers that explain this implementation?

(2) "band_mean_norm_erb" is called by "transforms::erb_norm", which uses "MEAN_NORM_INIT" with the value "[-60.0, -90.0]". Are the numbers [-60.0, -90.0] determined empirically, or were they obtained through mathematical derivation?

Thank you!
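For reference, a sketch of how exponential mean normalization is commonly implemented (my reading of the description, not the repo's exact Rust code; the init values mirror MEAN_NORM_INIT above, and the repo's logs mention an alpha of 0.996):

# Running mean with decay alpha, initialized per feature type
# (e.g. -60 dB for ERB features), subtracted from each frame.
import torch

def band_mean_norm_sketch(feat: torch.Tensor, alpha: float = 0.996, init: float = -60.0) -> torch.Tensor:
    # feat: [T, F] log-domain features
    mean = torch.full((feat.shape[-1],), init)
    out = torch.empty_like(feat)
    for t in range(feat.shape[0]):
        mean = alpha * mean + (1.0 - alpha) * feat[t]
        out[t] = feat[t] - mean
    return out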

Question about erb_fb

Hi,
I am confused about the erb_fb function, so I opened this issue to check. In forward, the ERB-to-STFT mapping is done by spec_mask = erb_mask.matmul(erb_inv_fb), but when I check the code in librosa, the mel-to-STFT mapping is done by the mel_to_stft function:

    # Construct a mel basis with dtype matching the input data
    mel_basis = filters.mel(
        sr=sr, n_fft=n_fft, n_mels=M.shape[-2], dtype=M.dtype, **kwargs
    )

    # Find the non-negative least squares solution, and apply
    # the inverse exponent.
    # We'll do the exponentiation in-place.
    inverse = nnls(mel_basis, M)
    return np.power(inverse, 1.0 / power, out=inverse)

My questions are:
a. Is the ERB-to-STFT process lossless? What about mel-to-STFT and bark-to-STFT?
b. Are ERB features better than STFT features in DeepFilterNet?

Thanks
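As background for question a., a sketch of mapping band gains back to STFT bins with a pseudo-inverse filterbank via matmul; the tiny rectangular filterbank here is made up, not the repo's ERB filterbank:

import torch

F_bins, n_bands = 8, 4
fb = torch.zeros(F_bins, n_bands)
for b in range(n_bands):
    fb[2 * b : 2 * b + 2, b] = 0.5       # non-overlapping, normalized bands
erb_inv_fb = torch.linalg.pinv(fb)       # [n_bands, F_bins]

erb_mask = torch.rand(10, n_bands)       # [T, n_bands]
spec_mask = erb_mask.matmul(erb_inv_fb)  # [T, F_bins]: each band gain spread over its bins

Since F_bins > n_bands, such a mapping cannot be lossless in general; each band's gain is simply broadcast back to the STFT bins it covers.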

Read the hdf5 file failed

I want to reproduce your work, but when I run training, loading the HDF5 file fails.

2022-03-01 14:40:46 | INFO | DF | Running on torch 1.10.0
2022-03-01 14:40:46 | INFO | DF | Running on host ultralab-server
2022-03-01 14:40:46 | INFO | DF | Git commit: 05da995, branch: main
2022-03-01 14:40:46 | INFO | DF | Running on device cuda:0
2022-03-01 14:40:46 | INFO | DF | Initializing model deepfilternet
2022-03-01 14:40:53 | WARNING | DF | Failed to print model summary: No module named 'ptflops'
2022-03-01 14:40:53 | INFO | DF | Running with normalization window alpha = '0.996'
2022-03-01 14:40:53 | INFO | DF | Initializing dataloader with data directory ../data/dns/
2022-03-01 14:40:53 | ERROR | DF | An error has been caught in function '<module>', process 'MainProcess' (19629), thread 'MainThread' (140105826391808):
Traceback (most recent call last):

File "df/train.py", line 425, in
main()
└ <function main at 0x7f6cccc270d0>

File "df/train.py", line 103, in main
dataloader = DataLoader(
└ <class 'libdfdata.torch_dataloader.PytorchDataLoader'>

File "/home/tangzixing/data/deeplearning/program/audio/DeepFilterNet-0.1.10/pyDF-data/libdfdata/torch_dataloader.py", line 99, in init
self.loader = _FdDataLoader(
│ └ <class 'builtins._FdDataLoader'>
└ <libdfdata.torch_dataloader.PytorchDataLoader object at 0x7f6ccc9ae460>

RuntimeError: DF dataset error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../data/dns/val_speech.hdf5" }

The h5py version is 3.6.0.
h5py.version.hdf5_version is 1.2.1.

Which version of h5py do you use?

Question about data generation

Hi,

In the current framework, it seems the speech, noise, and RIR paths have to be provided as lists to create the HDF5 sets for training. I have a few questions on this.

  1. How can I check the size of the dataset used for training? For example, if there are 10 speech samples, each 3 s in duration, is the dataset size 30 s? In other words, how do I prepare datasets of different sizes? Is it based on the number of speech samples?
  2. Regarding data augmentation: I see it is built from dataset.rs. If the values of SNR or gains need to be changed, does it need to be rebuilt?

Trends in loss training

Hi Rikorose,

I'm now in the process of transferring your PyTorch code to Tensorflow/Keras and I'm running into some issues.
The loss factor for maskloss is 0, dfalphaloss is 1000, and spectralloss is 20000.
But I think dfalphaloss is not multiplied by 1000 in your code, the loss looks like dfalphaloss +spectrumloss * 20000.

And my training phase in Keras, dfalphaloss dropped from 0.085 to 0.06 in the first epoch, and then did not continue the downward trend.
But spectrum loss is slowly decreasing. Is this situation correctly?
I also try sisdr loss, but the effect also not good.

Another thing, when I compared the process file with your code. My .wav file looks like deepfilter(stage 2) doesn't work, and your wav is obviously processed under 5kHz.

By the way, when I check the code in 'LocalSnrTarget', the 'ws' and 'ws_ns' are not same, so when compute the local snr in speech/noise use the different frames? (speech for 1 and noise for 3)
And I think the lsnr layer in Encoder doesn't use to compute loss?

Do you have any suggestion?
Thanks,

config.ini for reproducing results

Hi,

Is it possible to provide a config.ini to reproduce the results in the paper?

Is the pre-trained model's config.ini the same as the one used to get the results in the paper?

errors when trying to process wav files on Windows

Hello,
I was trying to test deepfilternet on Windows. I don't know much about the technical aspects or Python in general though, and consequently I am getting errors and I don't know the cause.
I ran these commands from the readme:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install deepfilternet
(So far, no errors)
deepFilter test.wav
(test.wav is just a placeholder for the path to a real file)

I tried both with Python 3.10 (latest) and 3.7 as suggested by the deepfilterlib page on pypi. In both cases I get errors that no audio backend is available, and that libdf couldn't be found.
Any advice would be greatly appreciated. Thanks in advance!

Data normalization

Hello,
I am trying to do my own data augmentation.
And I found that with my dataset the loss functions always produce NaN loss.
I noticed that the paper says the data has been exponentially mean/unit normalized; I think maybe this causes the NaN issue for me. Could you give some details?

Thanks,

Group Shuffle parameter missed in the default config

Running deepFilter wav_name gives:

Traceback (most recent call last):
  File "deepFilter", line 8, in <module>
    sys.exit(main())
  File "env/lib/python3.8/site-packages/df/enhance.py", line 66, in main
    p = ModelParams()
  File "env/lib/python3.8/site-packages/df/model.py", line 15, in __init__
    self.__params = getattr(import_module("df." + self.__model), "ModelParams")()
  File "env/lib/python3.8/site-packages/df/deepfilternet.py", line 48, in __init__
    self.group_shuffle: bool = config(
  File "env/lib/python3.8/site-packages/df/config.py", line 114, in __call__
    raise ValueError(f"Value '{option}' not found in config (defaults not allowed).")
ValueError: Value 'GROUP_SHUFFLE' not found in config (defaults not allowed).

Noise reduction didn't work

Hi, Rikorose, I have a question to ask you. I trained for 10 epochs following your tutorial, with the pre-trained model's config.ini as the configuration file, but when I used this model for prediction, I found almost no noise reduction effect. Could you help me analyze the reason? Thank you.

The following file contains my training log, model, and cfg file.

train.zip

how to avoid rust

How can I avoid using the Rust library altogether? Say I want to do everything in Python: how can I get "spec, erb_feat, spec_feat" using just Python, without Rust?

Confusion about the deepfilternet2 paper

Hi! First, really great work, and thanks for open-sourcing everything. But I have a few questions:

  1. For the loss in the paper, there is a spectrogram loss and a multi-resolution spectrogram loss. But I feel the multi-resolution loss could just include the spectrogram loss as one single window size, so why is it written separately in the paper? Did I miss something?
  2. In the paper it says 60000 HRTFs; I'm wondering if that's a typo? I feel it should be RIRs.

Thanks!

Buffer length of real-time inference

Hi Rikorose,

Sorry to bother you again,
I have transferred the code to Keras and am trying to run inference in real time.

I ran into a few situations I want to ask about:

  1. In enhance.py, the flow looks like it feeds the entire signal into the model. Is that offline inference?

  2. I try to do a real-time inference, this is my flow.

    • Feeding a fixed-length buffer (with a look-ahead of 1 frame)
    • Getting the same length output
    • Taking the second last frame output as the real-time processed frame.

    When I change the buffer length for inference, the result is good when the length is 100 or 300 frames.
    But when I change the buffer length to 10 frames, the result sounds bad.
    For real-time inference, what is the minimum buffer length?

Thanks,

Question on rationale for setting model memory_format to torch.channels_last

Hi @Rikorose,

I'd like to ask about the rationale behind this line
model.to(memory_format=torch.channels_last)
Is it merely for speed on tensor cores, and shouldn't it affect the output values at all?
The reason I'm asking is that commenting out that line gives me a different result, i.e.
enhanced = model(spec, erb_feat, spec_feat)[0].cpu()
will lead to different values of enhanced.

I wasn't expecting this, and I found out that the output of a convolution layer, specifically enc.erb_conv0, is different with and without that setting. The code doesn't set the input memory format to channels last, only the model. So we have an input that's channels first and weights that are channels last. I dug around the PyTorch forum and came across this thread, where they claim that PyTorch should take this into account. Is that what you intended, for PyTorch to internally handle the different format and automatically convert the input tensors to channels last? In that case, this difference in result isn't the intended behavior, and I've added a reply to the thread mentioning this behavior. But if that's not the case, and the different output is expected, may I understand the rationale for this?

Thanks,
Emily
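A small self-contained sketch of the behavior in question (layer sizes are arbitrary, not DeepFilterNet's): converting only the module, not the input, to channels_last may route to different convolution kernels, so outputs can differ within floating point accuracy.

import torch

torch.manual_seed(0)
conv = torch.nn.Conv2d(1, 4, kernel_size=3, padding=1)
x = torch.randn(1, 1, 8, 8)  # contiguous, i.e. channels-first layout

y_ref = conv(x)
conv = conv.to(memory_format=torch.channels_last)  # converts the weights only
y_cl = conv(x)

print((y_ref - y_cl).abs().max())  # usually tiny, but not guaranteed to be zero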

Occurrence of NaN during training

Hi,

There seem to be occurrences of NaN loss after a few epochs during training.

Is there any way to avoid it?

Is it possible to resume the training from a particular checkpoint? I understand the training is resumed from the last saved checkpoint. But if the last saved checkpoint is NaN, then resuming the training would be an issue.

Need help installing

I really need step-by-step instructions, please help me.
I'm using Win10 and the Anaconda prompt to run the code.

I followed README.md to download the code

pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html

and

pip install deepfilternet

work smoothly, but I'm having an issue with

deepFilter path/to/noisy_audio.wav

here's the error:
2022-06-15 14:49:09 | INFO | DF | Running on torch 1.11.0+cpu
2022-06-15 14:49:09 | INFO | DF | Running on host DESKTOP-RP8O01C
fatal: not a git repository (or any of the parent directories): .git
2022-06-15 14:49:09 | INFO | DF | Loading model settings of DeepFilterNet2
2022-06-15 14:49:09 | INFO | DF | Using DeepFilterNet2 model at anaconda3\lib\site-packages\pretrained_models\DeepFilterNet2
2022-06-15 14:49:09 | INFO | DF | Initializing model deepfilternet2
2022-06-15 14:49:10 | INFO | DF | Found checkpoint anaconda3\lib\site-packages\pretrained_models\DeepFilterNet2\checkpoints\model_96.ckpt.best with epoch 96
2022-06-15 14:49:10 | WARNING | DF | Unexpected key: erb_comp.c
2022-06-15 14:49:10 | WARNING | DF | Unexpected key: erb_comp.mn
2022-06-15 14:49:10 | INFO | DF | Model loaded
Traceback (most recent call last):
File "C:\Users\Mistorm\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Mistorm\anaconda3\lib\runpy.py", line 87, in run_code
exec(code, run_globals)
File "C:\Users\Mistorm\anaconda3\Scripts\deepFilter.exe_main
.py", line 7, in
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 329, in run
main(parser.parse_args())
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 41, in main
audio, meta = load_audio(file, df_sr)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 188, in load_audio
info: AudioMetaData = ta.info(file, **ikwargs)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\torchaudio\backend\soundfile_backend.py", line 103, in info
sinfo = soundfile.info(filepath)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 438, in info
return _SoundFileInfo(file, verbose)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 383, in init
with SoundFile(file) as f:
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 629, in init
self._file = self._open(file, mode_int, closefd)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 1183, in _open
_error_check(_snd.sf_error(file_ptr),
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'path/to/noisy_audio.wav': System error.

I don't know why it can't open path/to/noisy_audio.wav, so I can't do the further steps either:

(base) C:\Users\Mistorm>cd path/to/DeepFilterNet/
The system cannot find the path specified.

(base) C:\Users\Mistorm>cd DeepFilterNet
The system cannot find the path specified.

I can't figure out why it can't find the file, since DeepFilterNet and DeepFilterNet2 are already in
C:\Users\Mistorm\anaconda3\Lib\site-packages\pretrained_models

What step did I miss, and what should I do after that? I'm having a hard time understanding how to make it work...
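For reference, path/to/noisy_audio.wav in the README is a placeholder; it must be replaced with the path of an actual file on disk, e.g. (hypothetical path):

deepFilter "C:\Users\Mistorm\Desktop\my_recording.wav"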

Question about DNSMOS

Hi, thanks for this work. I have some questions about DNSMOS.
I tested the raw blind test set and deepfilter2 results with the DNSMOS tool (https://github.com/microsoft/DNS-Challenge dnsmos.py). Both results are different from your paper. What preprocessing did you do to the blind test set?

Error while running maturin build

When I run the command
maturin build --release -m DeepFilterNet/Cargo.toml

I am getting the following error.

🔗 Found pyo3 bindings
🐍 Found CPython 3.6m at python3.6, CPython 3.7m at python3.7
Compiling df v0.1.0 (/content/DeepFilterNet/libDF)
error[E0277]: [u32; 5] is not an iterator
--> libDF/src/transforms.rs:449:42
|
449 | let fft_size = primes.iter().zip(factors).fold(1, |acc, (p, f)| acc * p.pow(f));
| ^^^^^^^
| |
| expected an implementor of trait IntoIterator
| help: consider borrowing here: &factors
|
= note: the trait bound [u32; 5]: IntoIterator is not satisfied
= note: required because of the requirements on the impl of IntoIterator for [u32; 5]

error[E0599]: the method fold exists for struct std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>, but its trait bounds were not satisfied
--> libDF/src/transforms.rs:449:51
|
449 | let fft_size = primes.iter().zip(factors).fold(1, |acc, (p, f)| acc * p.pow(f));
| ^^^^ method cannot be called on std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]> due to unsatisfied trait bounds
|
= note: the following trait bounds were not satisfied:
[u32; 5]: Iterator
which is required by std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator
std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator
which is required by &mut std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator

error: aborting due to 2 previous errors

Some errors have detailed explanations: E0277, E0599.

For more information about an error, try rustc --explain E0277.

error: could not compile df

To learn more, run the command again with --verbose.
💥 maturin failed
Caused by: Failed to build a native library through cargo
Caused by: Cargo build finished with "exit status: 101": cargo rustc --message-format json --manifest-path DeepFilterNet/Cargo.toml --release --lib -- -C link-arg=-s

It would be great if you could help.
Thanks!

Training Instructions

This is really an amazing piece of work @Rikorose.
Could you please add some instructions on how to train this with a custom dataset?

H5Fopen(): unable to open file: bad superblock version number

Hi, when I run training, the code fails with:
RuntimeError: DF dataset error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset /dockerdata/thujunchen/cjcode/ft_local/DeepFilterNet/DNS16kdataset/VALID_SET_SPEECH.hdf5" }

No error is reported by df/prepare_data.py.

I have tried cargo test, which reports:
running 24 tests
test reexport_dataset_modules::util::test_find_max_abs ... ok
test tests::test_erb_inout ... ok
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_07 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_10 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_vorbis ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_05 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_01 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_pcm ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_02 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_04 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_flac ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03 ... FAILED
test reexport_dataset_modules::dataloader::tests::test_fft_dataset ... FAILED
test reexport_dataset_modules::dataset::tests::test_cached_valid_dataset ... FAILED
test reexport_dataset_modules::augmentations::tests::test_filters ... ok
test reexport_dataset_modules::augmentations::tests::test_gen_noise ... ok
test reexport_dataset_modules::augmentations::tests::test_clipping ... ok
test reexport_dataset_modules::augmentations::tests::test_rand_resample ... ok
test reexport_dataset_modules::augmentations::tests::test_low_pass ... ok
test reexport_dataset_modules::dataset::tests::test_mix_audio_signal ... ok
test reexport_dataset_modules::augmentations::tests::test_reverb ... ok

failures:

---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06' panicked at 'called `Result::unwrap()` on an `Err` value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_flac.hdf5" }', libDF/src/dataset.rs:1956:41
note: panic did not contain expected string
      panic message: "called `Result::unwrap()` on an `Err` value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: \"Error during File::open of dataset ../assets/noise_flac.hdf5\" }",
 expected substring: "Slice end"
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09' panicked at 'called `Result::unwrap()` on an `Err` value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_flac.hdf5" }', libDF/src/dataset.rs:1956:41

---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08' panicked at 'called `Result::unwrap()` on an `Err` value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise.hdf5" }', libDF/src/dataset.rs:1956:41

---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03' panicked at 'called `Result::unwrap()` on an `Err` value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_vorbis.hdf5" }', libDF/src/dataset.rs:1956:41

---- reexport_dataset_modules::dataloader::tests::test_fft_dataset stdout ----
******** Start test_data_loader() ********
Error: DatasetError(Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/clean.hdf5" })
thread 'reexport_dataset_modules::dataloader::tests::test_fft_dataset' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/test/src/lib.rs:186:5

---- reexport_dataset_modules::dataset::tests::test_cached_valid_dataset stdout ----
Error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/clean.hdf5" }
thread 'reexport_dataset_modules::dataset::tests::test_cached_valid_dataset' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/test/src/lib.rs:186:5

failures:
reexport_dataset_modules::dataloader::tests::test_fft_dataset
reexport_dataset_modules::dataset::tests::test_cached_valid_dataset
reexport_dataset_modules::dataset::tests::test_hdf5_read_flac
reexport_dataset_modules::dataset::tests::test_hdf5_read_pcm
reexport_dataset_modules::dataset::tests::test_hdf5_read_vorbis
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_01
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_02
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_04
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_05
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_07
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_10

test result: FAILED. 9 passed; 15 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.67s

error: test failed, to rerun pass '-p deep_filter --lib'

I tried updating hdf5 to 1.10.1 as suggested in https://stackoverflow.com/questions/49386121/python-h5py-file-read-oserror-unable-to-open-file-bad-superblock-version-numb, but that did not work either.
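
A quick sanity check, assuming h5py is installed: if the file on disk is not actually an HDF5 container (for example a truncated copy, or a Git LFS pointer stub if the repository's ../assets files were cloned without the LFS objects), h5py fails with the same "bad superblock" error:

import h5py  # pip install h5py

# Fails with "bad superblock version number" if the file is not valid HDF5.
with h5py.File("VALID_SET_SPEECH.hdf5", "r") as f:
    f.visit(print)  # list the groups/datasets inside

If this fails too, the problem is the file itself (how it was created or copied), not libDF.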

Question on dataloader in training.py

I am working on the training setup. I got the error below in the run_epoch function in train.py.

Error:

File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 479, in
main()
└ <function main at 0x7f74abd598b0>

File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 173, in main
train_loss = run_epoch(
└ <function run_epoch at 0x7f74abd665e0>

File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 265, in run_epoch
assert batch.feat_spec is not None
│ └ None
└ Batch of size 1:
length: 240000
snr: -5
gain: 0

AssertionError: assert batch.feat_spec is not None

usage: train.py [-h] [--debug] data_config_file data_dir base_dir

python df/train.py ../assets/dataset.cfg ../assets/ df/new_config
Config file used
{
"test": [
[
"clean.hdf5",
100
],
[
"noise.hdf5",
10
]
],
"train": [
[
"clean.hdf5",
10000
],
[
"noise.hdf5",
10
]
],
"valid": [
[
"clean.hdf5",
100
],
[
"noise.hdf5",
10
]
]
}

I used the configuration above (the dataset.cfg file available in the DeepFilterNet-main/assets/ directory).

It says that batch.feat_spec doesn't contain any information. Do we need to write batch.feat_spec and batch.feat_erb into the HDF5 file itself, or will the dataloader extract these features?

Is train.py using the predefined dataloader from torch.utils.data, or the one from C:\DFnet\DeepFilterNet-main\pyDF-data\libdfdata\torch_dataloader?

Can you help me resolve this error (AssertionError: assert batch.feat_spec is not None)? Hope to hear from you soon.

Thanks in advance.

[Screenshot error_train.PNG did not finish uploading.]
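
As background (my understanding from the repository layout, not confirmed in this thread): batches come from pyDF-data's Rust dataloader rather than torch.utils.data, and df/prepare_data.py writes raw audio only; feat_erb/feat_spec are computed at load time. A quick way to see what an HDF5 file actually contains (assuming h5py is installed):

import h5py

with h5py.File("clean.hdf5", "r") as f:
    # Expect groups of raw audio samples (e.g. a "speech" group);
    # no precomputed feat_erb/feat_spec datasets are stored here.
    for group in f:
        print(group, list(f[group])[:5])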

Question for training

Hi Rikorose,

Sorry to bother you again.
I tried to generate data and train the model according to the training section.

I generated training_set.txt for speech (just 10 files selected for testing) and made the HDF5 file (and likewise for noise) using:
python df/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5

~/DeepFilterNet/wav/dataset/oblomov_s009036.wav
~/DeepFilterNet/wav/dataset/oblomov_s009040.wav  
~/DeepFilterNet/wav/dataset/oblomov_s009033.wav     
~/DeepFilterNet/wav/dataset/oblomov_s009037.wav    
~/DeepFilterNet/wav/dataset/oblomov_s009041.wav  
~/DeepFilterNet/wav/dataset/oblomov_s009034.wav    
~/DeepFilterNet/wav/dataset/oblomov_s009038.wav     
~/DeepFilterNet/wav/dataset/oblomov_s009042.wav  
~/DeepFilterNet/wav/dataset/oblomov_s009035.wav     
~/DeepFilterNet/wav/dataset/oblomov_s009039.wav  

I generated the dataset.cfg as shown below:

{
 "train": [
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
      1.0
    ]
  ],
  "valid": [
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
      0.2
    ],
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
      0.2
    ]
  ],
  "test": [
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
      0.2
    ],
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
      0.2
    ]
  ]
} 

I encountered the error shown in the figure below:
[screenshot]

In addition, I have some questions:

  1. In the command python df/train.py dataset.cfg ~/wav_folder/ ./base_dir/:
    - Is data_dir a wav folder or an hdf5 folder? (I think it is the hdf5 folder.)
    - Can base_dir/ be missing? (But we need to provide config.ini, so here I point it at pretrained_model/ and delete the .ckpt.)
  2. I found that the log says dataloader len: 0; is this a problem?
  3. I removed all the 'df.' prefixes from the imports in each file (e.g. from df.config import ... -> from config import ...), otherwise I got an import error.

Thanks,
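
One thing worth checking (an assumption on my part, since the screenshot is unavailable): the "~" in the dataset.cfg paths is shell syntax and may not be expanded when the config is read, which would also explain dataloader len: 0. Absolute paths are safer; a quick check in Python:

import os

p = "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5"
print(os.path.exists(p))                      # False: "~" is not expanded in plain paths
print(os.path.exists(os.path.expanduser(p)))  # True only if the file is really there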

What is up with the canned deepFilter?

(DeepFilterNet) C:\Users\sdr\Downloads\Playground\DeepFilterNet>deepFilter test.wav
Traceback (most recent call last):
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\Scripts\deepFilter.exe_main
.py", line 7, in
TypeError: main() missing 1 required positional argument: 'args'

(DeepFilterNet) C:\Users\sdr\Downloads\Playground\DeepFilterNet>
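
This TypeError suggests a stale console-script entry point calling main() with no arguments, i.e. a version mismatch between the deepFilter script and the installed df package (my reading; not confirmed in the thread). Reinstalling the wheel regenerates the entry point:

pip install --upgrade --force-reinstall deepfilternet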

Question on package

Hi,

I found that even if I remove a file such as loss.py from the project, train.py still runs.

How is this possible? Maybe I am overlooking something.

Is it possible to edit loss.py? Currently, when I edit loss.py, I can't see any changes taking effect. Even if I move loss.py out of the df folder, train.py does not throw any error.
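
A likely explanation (an assumption, since the environment is not shown): Python is importing an installed copy of the df package from site-packages rather than the repository checkout, so edits to the repository's loss.py are never seen. One way to verify which copy is imported:

import df
print(df.__file__)  # if this points into site-packages, edits to the repo copy are ignored

If it does point into site-packages, uninstall the wheel or make sure the repository's DeepFilterNet directory comes first on the Python path.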

Question about WAV Encoding

Hi, thanks for your amazing work! I have some questions about WAV encoding. For the training and test scripts, which encoding (signed 16-bit, signed 32-bit, float 32) is supported? Thank you!
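
For reference, a quick way to check which encoding a given file actually uses (soundfile is available alongside torchaudio's soundfile backend; the file name below is a placeholder):

import soundfile as sf

info = sf.info("my_audio.wav")  # placeholder path
print(info.subtype)             # e.g. "PCM_16", "PCM_32", or "FLOAT"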

Unable to run deepFilter from CLI

Hi guys,
I have followed the installation instructions using pip and am now stuck at a SLURM-related line that I don't know how to fix.

I'm running deepfilternet from a Colab Pro+ account.

Here is the issue.

Traceback (most recent call last):
File "/usr/local/bin/deepFilter", line 5, in
from df.enhance import run
File "/usr/local/lib/python3.7/dist-packages/df/enhance.py", line 18, in
from df.logger import init_logger, warn_once
File "/usr/local/lib/python3.7/dist-packages/df/logger.py", line 49
if (jobid := os.getenv("SLURM_JOB_ID")) is not None:
^
SyntaxError: invalid syntax

This is what I get while trying to execute: !deepFilter /content/test_audio_053830.wav --output-dir /content

Has anyone run into this kind of issue?

Do let me know the solution / how I can run this.
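
The traceback gives the cause away: the := (walrus) operator used in df/logger.py requires Python >= 3.8, while this environment runs Python 3.7, so the import fails before anything SLURM-specific happens. Upgrading the runtime's Python should fix it; for reference, the failing line is equivalent to this pre-3.8 form:

import os

# Equivalent of: if (jobid := os.getenv("SLURM_JOB_ID")) is not None:
jobid = os.getenv("SLURM_JOB_ID")
if jobid is not None:
    pass  # the original code configures the logger for SLURM jobs here (assumption)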

About whitenoise performance

Thanks for your awesome work!
I have installed deepfilternet through pip and tested some samples. The overall noise reduction is great, but in white-noise scenes there are more residues in the speech, which leads to poor subjective quality.
Have you noticed this phenomenon? I will attach the samples below.
samples.zip

Questions about the latency

Hi, amazing work first! But I've got some questions about the latency described in your paper, as given in the following table:
[table screenshot from the paper]
Over how many frames (20 ms STFT window, 10 ms hop size) are the MACs calculated?
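
For orientation, simple arithmetic rather than a statement about the paper's convention: with a 10 ms hop, one second of audio corresponds to 1000 ms / 10 ms = 100 frames, so a per-frame and a per-second MACs figure differ by exactly a factor of 100.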

Run on normal RAM?

Hi, I tried Colab Pro in order to run this with their higher RAM (which I thought would be extra GPU memory), but it turns out it's just normal RAM. Is it possible to run DeepFilterNet on a normal CPU + RAM combination as opposed to GPU memory? Basically I'm doing this because I'd like to run it on longer files (an hour long).
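
It should, in principle: the PyTorch backend falls back to CPU when no CUDA device is available, so RAM rather than GPU memory is the limit. A minimal sketch using the Python API (a sketch based on df.enhance's public functions; exact signatures may differ between versions, file names are placeholders, and hour-long files may still need chunked processing, which is not shown):

from df.enhance import enhance, init_df, load_audio, save_audio

model, df_state, _ = init_df()                 # load the default pretrained model
audio, _ = load_audio("long_recording.wav", sr=df_state.sr())
enhanced = enhance(model, df_state, audio)     # runs on CPU if no GPU is present
save_audio("enhanced.wav", enhanced, df_state.sr())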

Question about data-preprocessing

Hi, thanks for your amazing work.
I tried to follow the steps in the readme.md to make a dataset.

I am a little confused about making the hdf5 and cfg files:

  1. Because we have 3 categories (speech, noise, RIRs), should I make 3 HDF5 files corresponding to these 3 categories?
  2. In dataset.cfg, the readme.md says it should contain the 3 entries (train, valid, test), and optionally a sampling factor.
    • So do I need to fill each entry in the .cfg with the 3 HDF5 files generated in question 1?
    • Does the sum of the sampling factors in the 3 entries have to be 1?
    • Is the following correct?
{
  "train":[
  [
     "TRAIN_SET_SPEECH.hdf5",
     0.6
  ],
  [
     "TRAIN_SET_NOISE.hdf5",
     0.6
  ],
  [
     "TRAIN_SET_RIR.hdf5",
     0.6
  ]
 ],

"valid":[
  [
     "TRAIN_SET_SPEECH.hdf5",
     0.2
  ],
  [
     "TRAIN_SET_NOISE.hdf5",
     0.2
  ],
  [
     "TRAIN_SET_RIR.hdf5",
     0.2
  ]
 ],

"test":[
  [
     "TRAIN_SET_SPEECH.hdf5",
     0.2
  ],
  [
     "TRAIN_SET_NOISE.hdf5",
     0.2
  ],
  [
     "TRAIN_SET_RIR.hdf5",
     0.2
  ]
 ]

}

Question on training

Following #31 and the readme, I have prepared the speech and noise HDF5 files and the dataset.cfg file.

The speech and noise data are from the DNS challenge; the number of files is 50 and the batch size I set is 2.

When I train the net, errors occur as follows:
[screenshots of the errors]

Can you give me some advice to fix this error? Thanks.
