rikorose / deepfilternet
Noise suppression using deep filtering
Home Page: https://huggingface.co/spaces/hshr/DeepFilterNet2
License: Other
This is really an amazing piece of work @Rikorose.
Could you please add some instructions on how to train this with a custom dataset?
I really need step-by-step instructions; please help me.
I'm using Win10 and the Anaconda prompt to run the code.
I followed README.md to install:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
and
pip install deepfilternet
work smoothly, but I'm having an issue with
deepFilter path/to/noisy_audio.wav
here's the error:
2022-06-15 14:49:09 | INFO | DF | Running on torch 1.11.0+cpu
2022-06-15 14:49:09 | INFO | DF | Running on host DESKTOP-RP8O01C
fatal: not a git repository (or any of the parent directories): .git
2022-06-15 14:49:09 | INFO | DF | Loading model settings of DeepFilterNet2
2022-06-15 14:49:09 | INFO | DF | Using DeepFilterNet2 model at anaconda3\lib\site-packages\pretrained_models\DeepFilterNet2
2022-06-15 14:49:09 | INFO | DF | Initializing model deepfilternet2
2022-06-15 14:49:10 | INFO | DF | Found checkpoint anaconda3\lib\site-packages\pretrained_models\DeepFilterNet2\checkpoints\model_96.ckpt.best with epoch 96
2022-06-15 14:49:10 | WARNING | DF | Unexpected key: erb_comp.c
2022-06-15 14:49:10 | WARNING | DF | Unexpected key: erb_comp.mn
2022-06-15 14:49:10 | INFO | DF | Model loaded
Traceback (most recent call last):
File "C:\Users\Mistorm\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Mistorm\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Mistorm\anaconda3\Scripts\deepFilter.exe\__main__.py", line 7, in <module>
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 329, in run
main(parser.parse_args())
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 41, in main
audio, meta = load_audio(file, df_sr)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 188, in load_audio
info: AudioMetaData = ta.info(file, **ikwargs)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\torchaudio\backend\soundfile_backend.py", line 103, in info
sinfo = soundfile.info(filepath)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 438, in info
return _SoundFileInfo(file, verbose)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 383, in __init__
with SoundFile(file) as f:
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 1183, in _open
_error_check(_snd.sf_error(file_ptr),
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'path/to/noisy_audio.wav': System error.
I don't know why it can't open path/to/noisy_audio.wav,
so I can't do the further steps either:
(base) C:\Users\Mistorm>cd path/to/DeepFilterNet/
The system cannot find the path specified.
(base) C:\Users\Mistorm>cd DeepFilterNet
The system cannot find the path specified.
I can't figure out why it can't find the file, since DeepFilterNet and DeepFilterNet2 are already in
C:\Users\Mistorm\anaconda3\Lib\site-packages\pretrained_models
what step did I miss?
And what should I do after that? I'm having a hard time understanding how to make it work...
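An editorial note on the error above (not from the thread itself): "path/to/noisy_audio.wav" in the README is a placeholder, so soundfile raises "System error" simply because no such file exists. A quick sanity check:

```python
# "path/to/noisy_audio.wav" is a README placeholder, not a real file;
# deepFilter must be given the path of an actual WAV file on disk.
import os

wav = "path/to/noisy_audio.wav"
print(os.path.isfile(wav))  # False -> soundfile fails with "Error opening ... System error."
```

Substituting a real recording on disk for the placeholder should make the enhance step run.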
"from libdf import DF" in DeepFilterNet/df/checkpoint.py
I changed it to "from libDF import DF" and set the correct package path, but the error remains.
Hi @Rikorose,
I'd like to ask about the rationale behind this line
model.to(memory_format=torch.channels_last)
is it merely for speed (e.g. on tensor cores), meaning it shouldn't affect the output values at all?
The reason I'm asking this is because, commenting out that line will give me different result, i.e.
enhanced = model(spec, erb_feat, spec_feat)[0].cpu()
will lead to different values of enhanced.
I wasn't expecting this, and I found that the output of a convolution layer, specifically enc.erb_conv0, differs with and without that setting. The code doesn't set the input's memory format to channels-last, only the model's, so we have an input that is channels-first and weights that are channels-last.

I dug around the PyTorch forum and came across a thread where they claim PyTorch should take this into account. Is that what you intended, for PyTorch to internally handle the different formats and automatically convert the input tensors to channels-last? In that case, this difference in results isn't the intended behavior, and I've added a reply to the thread mentioning it. But if that's not the case, and the different output is expected, may I understand the rationale for it?
Thanks,
Emily
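The channels-last question above can be reproduced in isolation. This is a minimal editorial sketch (the layer and shapes are illustrative, not the actual enc.erb_conv0): converting only the module to channels-last while the input stays contiguous can change conv outputs at float precision, since a different kernel and accumulation order may be used.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
x = torch.randn(1, 1, 16, 32)               # input stays channels-first (contiguous)

y_cf = conv(x)                              # weights channels-first
conv.to(memory_format=torch.channels_last)  # convert only the model, as in the code
y_cl = conv(x)                              # input format unchanged

# The math is identical; any difference is pure floating-point reordering.
print((y_cf - y_cl).abs().max().item())
```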
Thanks for your awesome work!
I installed deepfilternet through pip and tested some samples. The overall noise reduction is great, but in white-noise scenes there are more residues in the speech, which leads to a poor subjective impression.
Have you noticed this phenomenon? I will attach the samples below.
samples.zip
We sample a subset of the dataset here:
https://github.com/Rikorose/DeepFilterNet/blob/7d5fae7/libDF/src/dataset.rs#L832
This is only done at initialization, but it should be done at the start of each (training) epoch, so that we see new samples every epoch.
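A hedged Python sketch of the proposed fix (the actual code is Rust, in libDF/src/dataset.rs; all names here are illustrative): reseed and redraw the subset at the start of every epoch instead of once at initialization, keeping the draw reproducible.

```python
import random

def epoch_subset(n_files: int, n_samples: int, epoch: int, seed: int = 42):
    # A fresh but reproducible subset per epoch: the RNG seed depends on the epoch.
    rng = random.Random(seed + epoch)
    return rng.sample(range(n_files), k=min(n_samples, n_files))

print(epoch_subset(1000, 3, epoch=0))
print(epoch_subset(1000, 3, epoch=1))  # a different draw next epoch
```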
Hi,
The loss becomes NaN after a few epochs of training.
Is there any way to avoid this?
Is it possible to resume training from a particular checkpoint? I understand training resumes from the last saved checkpoint, but if the last saved checkpoint already produced NaN, resuming from it would be an issue.
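Not project-specific, but a common mitigation sketch for NaN losses (the function name and the max_norm value are my own illustration): skip the optimizer step when the loss goes non-finite and clip gradients to bound their magnitude.

```python
import torch

def safe_step(loss: torch.Tensor, model: torch.nn.Module, opt: torch.optim.Optimizer) -> bool:
    """Skip the update when the loss is non-finite; otherwise clip gradients and step."""
    if not torch.isfinite(loss):
        opt.zero_grad(set_to_none=True)  # drop this batch entirely
        return False
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
    opt.zero_grad(set_to_none=True)
    return True
```

Checkpoints could additionally be written only while the running loss is finite, so that resuming never starts from a NaN state.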
Hi, Rikorose
Thanks for sharing this code! I have some questions; could you give me some suggestions?
(1) How does the function "band_mean_norm_erb" work? Could you point me to papers explaining this implementation?
(2) "band_mean_norm_erb" is called by "transforms::erb_norm", which uses "MEAN_NORM_INIT" = [-60.0, -90.0]. Are the values [-60.0, -90.0] chosen empirically, or obtained through mathematical derivation?
Thank you !
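For readers of the thread, here is a hedged Python sketch of what band_mean_norm_erb appears to do (the real implementation is Rust; alpha and the /40 scale are assumptions on my part): each ERB band keeps an exponential running mean of the dB features, seeded from MEAN_NORM_INIT, which is subtracted and rescaled.

```python
def band_mean_norm_erb(frames, state, alpha=0.99, denom=40.0):
    # frames: per-frame lists of per-band dB values; state: running mean per band,
    # seeded from the MEAN_NORM_INIT range.
    out = []
    for frame in frames:
        state = [x * (1 - alpha) + s * alpha for x, s in zip(frame, state)]
        out.append([(x - s) / denom for x, s in zip(frame, state)])
    return out, state

normed, state = band_mean_norm_erb([[-30.0, -50.0], [-20.0, -40.0]], state=[-60.0, -60.0])
```

Under this reading, [-60.0, -90.0] only define the initial state of the running mean, so they mainly matter for the first frames; that suggests empirically chosen values rather than a derivation.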
Hi Rikorose, I have a question. I trained for 10 epochs following your tutorial, using the config.ini of the pre-trained model as the configuration file, but when I ran prediction with this model I found almost no noise reduction. Could you help me analyze the reason? Thank you.
The following file contains my training log,model and cfg file.
Hi Rikorose,
Sorry to bother you again,
I tried to generate data and train the model according to the training section.
I generated training_set.txt (just selecting 10 files as a test) for speech and built the HDF5 file (and likewise for noise), using
python df/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5
~/DeepFilterNet/wav/dataset/oblomov_s009036.wav
~/DeepFilterNet/wav/dataset/oblomov_s009040.wav
~/DeepFilterNet/wav/dataset/oblomov_s009033.wav
~/DeepFilterNet/wav/dataset/oblomov_s009037.wav
~/DeepFilterNet/wav/dataset/oblomov_s009041.wav
~/DeepFilterNet/wav/dataset/oblomov_s009034.wav
~/DeepFilterNet/wav/dataset/oblomov_s009038.wav
~/DeepFilterNet/wav/dataset/oblomov_s009042.wav
~/DeepFilterNet/wav/dataset/oblomov_s009035.wav
~/DeepFilterNet/wav/dataset/oblomov_s009039.wav
I generated the dataset.cfg as shown below:
{
"train": [
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
1.0
],
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
1.0
]
],
"valid": [
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
0.2
],
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
0.2
]
],
"test": [
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
0.2
],
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
0.2
]
]
}
I encountered some errors, as shown in the figure below.
In addition, I have some questions:
1. For python df/train.py dataset.cfg ~/wav_folder/ ./base_dir/, must base_dir/ not exist beforehand? (But we need to give a config.ini, so here I enter pretrained_model/ and delete the .ckpt.)
2. I had to change imports (ex. from df.config import ... -> from config import ...), otherwise it will cause an import error.
Thanks,
Hi, thanks for your amazing works.
I tried to follow the steps in readme.md to make a dataset.
I am a little confused about making the hdf5 files and the cfg:
{
"train":[
[
"TRAIN_SET_SPEECH.hdf5",
0.6
],
[
"TRAIN_SET_NOISE.hdf5",
0.6
],
[
"TRAIN_SET_RIR.hdf5",
0.6
]
],
"valid":[
[
"TRAIN_SET_SPEECH.hdf5",
0.2
],
[
"TRAIN_SET_NOISE.hdf5",
0.2
],
[
"TRAIN_SET_RIR.hdf5",
0.2
]
],
"test":[
[
"TRAIN_SET_SPEECH.hdf5",
0.2
],
[
"TRAIN_SET_NOISE.hdf5",
0.2
],
[
"TRAIN_SET_RIR.hdf5",
0.2
]
]
}
@Rikorose
Thanks, this is a very good project. I used the web demo to test and the results were very good, but my local installation fails. I'm using an Anaconda environment. How can this be fixed?
(pytorch36) C:\Users\admin>pip install deepfilternet
ERROR: Cannot install deepfilternet==0.1.2, deepfilternet==0.1.3 and deepfilternet==0.1.4 because these package versions have conflicting dependencies.
The conflict is caused by:
deepfilternet 0.1.4 depends on DeepFilterLib<0.2 and >=0.1
deepfilternet 0.1.3 depends on DeepFilterLib<0.2 and >=0.1
deepfilternet 0.1.2 depends on DeepFilterLib<0.2 and >=0.1
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
(pytorch36) C:\Users\admin>pip install DeepFilterLib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not find a version that satisfies the requirement DeepFilterLib (from versions: none)
ERROR: No matching distribution found for DeepFilterLib
(pytorch36) C:\Users\admin>conda list
# packages in environment at D:\ProgramData\miniconda3\envs\pytorch36:
#
# Name Version Build Channel
absl-py 1.1.0 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
attrs 21.4.0 pypi_0 pypi
audioread 2.1.9 pypi_0 pypi
blas 2.111 mkl conda-forge
blas-devel 3.9.0 11_win64_mkl conda-forge
ca-certificates 2022.5.18.1 h5b45459_0 conda-forge
cachetools 4.2.4 pypi_0 pypi
certifi 2022.5.18.1 pypi_0 pypi
cffi 1.15.0 pypi_0 pypi
charset-normalizer 2.0.12 pypi_0 pypi
colorama 0.4.4 pypi_0 pypi
cudatoolkit 11.1.1 heb2d755_7 conda-forge
cycler 0.10.0 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cython 0.29.30 pypi_0 pypi
dataclasses 0.8 pyh787bdff_2 conda-forge
decorator 4.4.2 pypi_0 pypi
ear 2.1.0 pypi_0 pypi
flatbuffers 2.0 pypi_0 pypi
freetype 2.10.4 h546665d_1 conda-forge
google-auth 2.7.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.46.3 pypi_0 pypi
icu 68.1 h0e60522_0 conda-forge
idna 3.3 pypi_0 pypi
imageio 2.15.0 pypi_0 pypi
importlib-metadata 4.8.3 pypi_0 pypi
importlib-resources 5.4.0 pypi_0 pypi
intel-openmp 2021.3.0 h57928b3_3372 conda-forge
jbig 2.1 h8d14728_2003 conda-forge
joblib 1.1.0 pypi_0 pypi
jpeg 9d h8ffe710_0 conda-forge
kiwisolver 1.3.1 py36he95197e_1 conda-forge
lcms2 2.12 h2a16943_0 conda-forge
lerc 2.2.1 h0e60522_0 conda-forge
libblas 3.9.0 11_win64_mkl conda-forge
libcblas 3.9.0 11_win64_mkl conda-forge
libclang 11.1.0 default_h5c34c98_1 conda-forge
libdeflate 1.7 h8ffe710_5 conda-forge
liblapack 3.9.0 11_win64_mkl conda-forge
liblapacke 3.9.0 11_win64_mkl conda-forge
libpng 1.6.37 h1d00b33_2 conda-forge
libprotobuf 3.18.0 h7755175_1 conda-forge
librosa 0.9.1 pypi_0 pypi
libtiff 4.3.0 h0c97f57_1 conda-forge
libuv 1.42.0 h8ffe710_0 conda-forge
llvmlite 0.36.0 pypi_0 pypi
lxml 4.9.0 pypi_0 pypi
lz4-c 1.9.3 h8ffe710_1 conda-forge
m2w64-gcc-libgfortran 5.3.0 6 conda-forge
m2w64-gcc-libs 5.3.0 7 conda-forge
m2w64-gcc-libs-core 5.3.0 7 conda-forge
m2w64-gmp 6.1.0 2 conda-forge
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
markdown 3.3.7 pypi_0 pypi
matplotlib 3.3.1 1 conda-forge
matplotlib-base 3.3.1 py36h856a30b_0 conda-forge
mkl 2021.3.0 hb70f87d_564 conda-forge
mkl-devel 2021.3.0 h57928b3_565 conda-forge
mkl-include 2021.3.0 hb70f87d_564 conda-forge
msys2-conda-epoch 20160418 1 conda-forge
multipledispatch 0.6.0 pypi_0 pypi
networkx 2.5.1 pypi_0 pypi
ninja 1.10.2 h5362a0b_0 conda-forge
numba 0.53.1 pypi_0 pypi
numpy 1.19.5 py36h4b40d73_2 conda-forge
oauthlib 3.2.0 pypi_0 pypi
olefile 0.46 pyh9f0ad1d_1 conda-forge
onnx 1.10.1 py36h524f2fb_1 conda-forge
onnxruntime 1.10.0 pypi_0 pypi
openjpeg 2.4.0 hb211442_1 conda-forge
openssl 1.1.1o h8ffe710_0 conda-forge
packaging 21.3 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
pesq 0.0.4 pypi_0 pypi
pillow 8.3.2 py36h10c25d6_0 conda-forge
pip 21.3.1 pypi_0 pypi
pooch 1.6.0 pypi_0 pypi
prettytable 2.5.0 pypi_0 pypi
protobuf 3.19.4 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pyparsing 2.2.0 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pypesq 1.2.4 pypi_0 pypi
pyqt 5.12.3 py36ha15d459_7 conda-forge
pyqt-impl 5.12.3 py36he2d232f_7 conda-forge
pyqt5-sip 4.19.18 py36he2d232f_7 conda-forge
pyqtchart 5.12 py36he2d232f_7 conda-forge
pyqtwebengine 5.12.1 py36he2d232f_7 conda-forge
pystoi 0.3.3 pypi_0 pypi
pytest-runner 5.3.2 pypi_0 pypi
python 3.6.13 h39d44d4_2_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.6 2_cp36m conda-forge
pytorch 1.9.1 py3.6_cuda11.1_cudnn8_0 pytorch
pytz 2022.1 pypi_0 pypi
pyvad 0.1.3 pypi_0 pypi
pywavelets 1.1.1 pypi_0 pypi
qt 5.12.9 h5909a2a_4 conda-forge
requests 2.27.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
resampy 0.2.2 pypi_0 pypi
rsa 4.8 pypi_0 pypi
ruamel-yaml 0.17.21 pypi_0 pypi
ruamel-yaml-clib 0.2.6 pypi_0 pypi
scikit-image 0.17.2 pypi_0 pypi
scikit-learn 0.24.2 pypi_0 pypi
scipy 1.5.4 pypi_0 pypi
setuptools 59.5.0 pypi_0 pypi
six 1.16.0 pyh6c4a22f_0 conda-forge
soundfile 0.10.3.post1 pypi_0 pypi
speechpy 2.4 pypi_0 pypi
sqlite 3.36.0 h8ffe710_1 conda-forge
tbb 2021.3.0 h2d74725_0 conda-forge
tensorboard 2.9.1 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tifffile 2020.9.3 pypi_0 pypi
tk 8.6.11 h8ffe710_1 conda-forge
torchaudio 0.9.1 py36 pytorch
torchvision 0.2.2 py_3 pytorch
tornado 4.5.2 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
tqdm 4.64.0 pypi_0 pypi
typing-extensions 3.10.0.2 hd8ed1ab_0 conda-forge
typing_extensions 3.10.0.2 pyha770c72_0 conda-forge
ucrt 10.0.20348.0 h57928b3_0 conda-forge
urllib3 1.26.9 pypi_0 pypi
vc 14.2 hb210afc_5 conda-forge
vs2015_runtime 14.29.30037 h902a5da_5 conda-forge
wavinfo 1.6.3 pypi_0 pypi
wcwidth 0.2.5 pypi_0 pypi
webrtcvad 2.0.10 pypi_0 pypi
werkzeug 2.0.3 pypi_0 pypi
wheel 0.29.0 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
wincertstore 0.2 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
xz 5.2.5 h62dcd97_1 conda-forge
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h62dcd97_1010 conda-forge
zstd 1.5.0 h6255e5f_0 conda-forge
(DeepFilterNet) C:\Users\sdr\Downloads\Playground\DeepFilterNet>deepFilter test.wav
Traceback (most recent call last):
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\Scripts\deepFilter.exe\__main__.py", line 7, in <module>
TypeError: main() missing 1 required positional argument: 'args'
(DeepFilterNet) C:\Users\sdr\Downloads\Playground\DeepFilterNet>
Hi, thanks for this work! I have some questions about the calculation of STOI. How can we use evaluation_utils.py? Could you explain it with an example or add some explanation to the README?
When I try to get results, I always see the exception 'x and y should have the same length'. I am using your pretrained DeepFilterNet2 model and the Valentini test dataset. The clean and enhanced tensor lengths are close but not identical.
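A hedged workaround sketch for the 'x and y should have the same length' exception (names here are illustrative, not from evaluation_utils.py): STOI-style metrics require equal-length signals, and the enhanced output can differ from the clean reference by a few samples due to STFT framing, so trim both to the common length first.

```python
def match_lengths(clean, enhanced):
    # Trim both signals to the shorter one before computing STOI/PESQ.
    n = min(len(clean), len(enhanced))
    return clean[:n], enhanced[:n]

clean, enhanced = [0.0] * 48000, [0.0] * 47872
c, e = match_lengths(clean, enhanced)
print(len(c), len(e))  # 47872 47872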
Hi,
I found that even if I remove a file such as loss.py from the project, train.py still runs.
How is this possible? Maybe I am overlooking something.
Is it possible to edit loss.py? Currently, when I edit loss.py, I can't see any changes take effect. Even if I move loss.py out of the df folder, train.py does not throw any error.
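A hedged diagnostic for the question above: the likely explanation is that Python imports a pip-installed copy of the df package rather than the repo checkout, so edits to the repo's loss.py never run. This locates the file that would actually be loaded (shown with a stdlib package so it runs anywhere; substitute "df.loss" in your environment):

```python
import importlib.util

spec = importlib.util.find_spec("json")  # replace "json" with "df.loss" in your env
print(spec.origin)                       # the file Python would actually execute
```

If the printed path points into site-packages, an editable install of the repo (something like pip install -e) is the usual way to make local edits take effect.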
Hi, thanks for this work. I have some questions about DNSMOS.
I tested the raw blind test set and the DeepFilterNet2 results with the DNSMOS tool (dnsmos.py from https://github.com/microsoft/DNS-Challenge). Both results differ from your paper. What preprocessing did you apply to the blind test set?
Hi, thanks for your amazing work! I have a question about the WAV encoding: which encodings (signed 16, signed 32, float 32) do the training and test scripts support? Thank you!
Hello, thanks for open-sourcing DeepFilterNet. After trying it out, I found both the quality and the computational cost to be excellent.
After studying the network carefully, I confirmed that it meets the requirements of streaming speech processing. Therefore, after calculating the padding sizes, I changed the forward inference part of the model into a streaming implementation (frame-by-frame for loop).
```python
class Encoder(nn.Module):
def __init__(self):
super().__init__()
p = ModelParams()
layer_width = p.conv_ch
wf = p.conv_width_f
assert p.nb_erb % 4 == 0, "erb_bins should be divisible by 4"
k = p.conv_k_enc
kwargs = {"batch_norm": True, "depthwise": p.conv_depthwise}
k0 = 1 if k == 1 and p.conv_lookahead == 0 else max(2, k)
cl = 1 if p.conv_lookahead > 0 else 0
self.erb_conv0 = convkxf(1, layer_width, k=k0, fstride=1, lookahead=cl, **kwargs)
cl = 1 if p.conv_lookahead > 1 else 0
self.erb_conv1 = convkxf(
layer_width * wf**0, layer_width * wf**1, k=k, lookahead=cl, **kwargs
)
cl = 1 if p.conv_lookahead > 2 else 0
self.erb_conv2 = convkxf(
layer_width * wf**1, layer_width * wf**2, k=k, lookahead=cl, **kwargs
)
self.erb_conv3 = convkxf(
layer_width * wf**2, layer_width * wf**2, k=k, fstride=1, **kwargs
)
self.df_conv0 = convkxf(
2, layer_width, fstride=1, k=k0, lookahead=p.conv_lookahead, **kwargs
)
self.df_conv1 = convkxf(layer_width, layer_width * wf**1, k=k, **kwargs)
self.erb_bins = p.nb_erb
self.emb_dim = layer_width * p.nb_erb // 4 * wf**2
self.df_fc_emb = GroupedLinear(
layer_width * p.nb_df // 2, self.emb_dim, groups=p.lin_groups
)
self.emb_out_dim = p.emb_hidden_dim
self.emb_n_layers = p.emb_num_layers
self.gru_groups = p.gru_groups
self.emb_gru = GroupedGRU(
self.emb_dim,
self.emb_out_dim,
num_layers=p.emb_num_layers,
batch_first=False,
groups=p.gru_groups,
shuffle=p.group_shuffle,
add_outputs=True,
)
self.lsnr_fc = nn.Sequential(nn.Linear(self.emb_out_dim, 1), nn.Sigmoid())
self.lsnr_scale = p.lsnr_max - p.lsnr_min
self.lsnr_offset = p.lsnr_min
self.streaming_state = {
'e1': None,
'e2': None,
'c0': None,
}
def forward(
self, feat_erb: Tensor, feat_spec: Tensor
) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
# Encodes erb; erb should be in dB scale + normalized; Fe are number of erb bands.
#streaming implementation
B, C, T, Ferb = feat_erb.shape
erb_padding_right = torch.zeros((B, C, 2, Ferb), dtype=feat_erb.dtype, device=feat_erb.device)
feat_erb = torch.cat([feat_erb, erb_padding_right], dim=-2)
B, _, T, Fspec = feat_spec.shape
spec_padding_right = torch.zeros((B, 2, 2, Fspec), dtype=feat_spec.dtype, device=feat_spec.device)
feat_spec = torch.cat([feat_spec, spec_padding_right], dim=-2)
e0_s, e1_s, e2_s, e3_s = None, None, None, None
emb_s, c0_s, c1_s, lsnr_s = None, None, None, None
self.streaming_state['e1'] = torch.zeros((B, 64, 1, Ferb // 2), dtype=feat_erb.dtype, device=feat_erb.device)
self.streaming_state['e2'] = torch.zeros((B, 64, 1, Ferb // 4), dtype=feat_erb.dtype, device=feat_erb.device)
self.streaming_state['c0'] = torch.zeros((B, 64, 1, Fspec), dtype=feat_spec.dtype, device=feat_spec.device)
for t in range(T):
sub_e0 = self.erb_conv0[1:](feat_erb[:,:,t:t+3,:]) # [B, C, 2, F]
sub_e1 = self.erb_conv1[1:](sub_e0) # [B, C*2, 1, F/2]
sub_e2 = self.erb_conv2[1:](torch.cat([self.streaming_state['e1'],sub_e1], dim=-2)) # [B, C*4, 1, F/4]
sub_e3 = self.erb_conv3[1:](torch.cat([self.streaming_state['e2'],sub_e2], dim=-2)) # [B, C*4, 1, F/4]
self.streaming_state['e1'] = sub_e1
self.streaming_state['e2'] = sub_e2
sub_c0 = self.df_conv0[1:](feat_spec[:,:,t+1:t+3,:])# [B, C, 1, Fc]
sub_c1 = self.df_conv1[1:](torch.cat([self.streaming_state['c0'],sub_c0], dim=-2)) # [B, C*2, 1, Fc]
self.streaming_state['c0'] = sub_c0
sub_cemb = sub_c1.permute(2, 0, 1, 3).reshape(1, B, -1) # [1, B, C * Fc/4]
sub_cemb = self.df_fc_emb(sub_cemb) # [1, B, C * F/4]
sub_emb = sub_e3.permute(2, 0, 1, 3).reshape(1, B, -1) # [1, B, C * F/4]
sub_emb = sub_emb + sub_cemb
sub_emb, _ = self.emb_gru(sub_emb)
sub_emb = sub_emb.transpose(0, 1) # [B, 1, C * F/4]
sub_lsnr = self.lsnr_fc(sub_emb) * self.lsnr_scale + self.lsnr_offset
if t == 0:
e0_s, e1_s, e2_s, e3_s = sub_e0[:, :, [0], :], sub_e1, sub_e2, sub_e3
c0_s, c1_s, emb_s, lsnr_s = sub_c0, sub_c1, sub_emb, sub_lsnr
else:
e0_s = torch.cat((e0_s, sub_e0[:, :, [0], :]), dim=-2)
e1_s = torch.cat((e1_s, sub_e1), dim=-2)
e2_s = torch.cat((e2_s, sub_e2), dim=-2)
e3_s = torch.cat((e3_s, sub_e3), dim=-2)
c0_s = torch.cat((c0_s, sub_c0), dim=-2)
c1_s = torch.cat((c1_s, sub_c1), dim=-2)
emb_s = torch.cat((emb_s, sub_emb), dim=-2)
lsnr_s = torch.cat((lsnr_s, sub_lsnr), dim=-2)
return e0_s, e1_s, e2_s, e3_s, emb_s, c0_s, lsnr_s
class ErbDecoder(nn.Module):
def init(self):
super().init()
p = ModelParams()
layer_width = p.conv_ch
wf = p.conv_width_f
assert p.nb_erb % 8 == 0, "erb_bins should be divisible by 8"
self.emb_width = layer_width * wf**2
self.emb_dim = self.emb_width * (p.nb_erb // 4)
self.fc_emb = nn.Sequential(
GroupedLinear(
p.emb_hidden_dim, self.emb_dim, groups=p.lin_groups, shuffle=p.group_shuffle
),
nn.ReLU(inplace=True),
)
k = p.conv_k_dec
kwargs = {"k": k, "batch_norm": True, "depthwise": p.conv_depthwise}
tkwargs = {
"k": k,
"batch_norm": True,
"depthwise": p.convt_depthwise,
"mode": p.conv_dec_mode,
}
pkwargs = {"k": 1, "f": 1, "batch_norm": True}
# convt: TransposedConvolution, convp: Pathway (encoder to decoder) convolutions
self.conv3p = convkxf(layer_width * wf**2, self.emb_width, **pkwargs)
self.convt3 = convkxf(self.emb_width, layer_width * wf**2, fstride=1, **kwargs)
self.conv2p = convkxf(layer_width * wf**2, layer_width * wf**2, **pkwargs)
self.convt2 = convkxf(layer_width * wf**2, layer_width * wf**1, **tkwargs)
self.conv1p = convkxf(layer_width * wf**1, layer_width * wf**1, **pkwargs)
self.convt1 = convkxf(layer_width * wf**1, layer_width * wf**0, **tkwargs)
self.conv0p = convkxf(layer_width, layer_width, **pkwargs)
self.conv0_out = convkxf(layer_width, 1, fstride=1, k=k, act=nn.Sigmoid())
self.streaming_state = {
'convt3in': None,
'convt2in': None,
'convt1in': None,
'conv0in': None
}
def forward(self, emb, e3, e2, e1, e0) -> Tensor:
# Estimates erb mask
#streaming implementation
B, C, T, F8 = e3.shape
data_type, device = e3.dtype, e3.device
self.streaming_state['convt3in'] = torch.zeros((B, C, 1, F8), dtype=data_type, device=device)
self.streaming_state['convt2in'] = torch.zeros((B, C, 1, F8), dtype=data_type, device=device)
self.streaming_state['convt1in'] = torch.zeros((B, C, 1, F8*2), dtype=data_type, device=device)
self.streaming_state['conv0in'] = torch.zeros((B, C, 1, F8*4), dtype=data_type, device=device)
m = None
for t in range(T):
sub_emb = self.fc_emb(emb[:, [t], :])
sub_emb = sub_emb.view(B, 1, -1, F8).transpose(1, 2) # [B, C*8, T, F/8]
convt3_in_cur = self.conv3p(e3[:, :, [t], :]) + sub_emb
convt3_in = torch.cat([self.streaming_state['convt3in'], convt3_in_cur], dim=-2)
self.streaming_state['convt3in'] = convt3_in_cur
sub_e3 = self.convt3[1:](convt3_in) # [B, C*4, T, F/4]
convt2_in_cur = self.conv2p(e2[:, :, [t], :]) + sub_e3
convt2_in = torch.cat([self.streaming_state['convt2in'], convt2_in_cur], dim=-2)
self.streaming_state['convt2in'] = convt2_in_cur
sub_e2 = self.convt2[1:](convt2_in) # [B, C*2, T, F/2]
convt1_in_cur = self.conv1p(e1[:, :, [t], :]) + sub_e2
convt1_in = torch.cat([self.streaming_state['convt1in'], convt1_in_cur], dim=-2)
self.streaming_state['convt1in'] = convt1_in_cur
sub_e1 = self.convt1[1:](convt1_in) # [B, C, T, F]
conv0_in_cur = self.conv0p(e0[:, :, [t], :]) + sub_e1
conv0_in = torch.cat([self.streaming_state['conv0in'], conv0_in_cur], dim=-2)
self.streaming_state['conv0in'] = conv0_in_cur
sub_m = self.conv0_out[1:](conv0_in) # [B, 1, T, F]
if t == 0:
m = sub_m
else:
m = torch.cat((m, sub_m), dim=-2)
return m
class DfNet(nn.Module):
def __init__(
self,
erb_inv_fb: Tensor,
run_df: bool = True,
train_mask: bool = True,
):
super().__init__()
p = ModelParams()
layer_width = p.conv_ch
assert p.nb_erb % 8 == 0, "erb_bins should be divisible by 8"
self.freq_bins = p.fft_size // 2 + 1
self.emb_dim = layer_width * p.nb_erb
self.erb_bins = p.nb_erb
self.enc = Encoder()
self.erb_dec = ErbDecoder()
self.mask = Mask(erb_inv_fb, post_filter=p.mask_pf)
self.df_order = p.df_order
self.df_bins = p.nb_df
self.df_lookahead = p.df_lookahead
self.df_dec = DfDecoder()
self.df_op = torch.jit.script(
DfOp(
p.nb_df,
p.df_order,
p.df_lookahead,
freq_bins=self.freq_bins,
method=p.dfop_method,
)
)
self.run_df = run_df
if not run_df:
from loguru import logger
logger.warning("Runing without DF")
self.train_mask = train_mask
def forward(
self,
spec: Tensor,
feat_erb: Tensor,
feat_spec: Tensor, # Not used, take spec modified by mask instead
atten_lim: Optional[Tensor] = None,
) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
feat_spec = feat_spec.transpose(1, 4).squeeze(4) # re/im into channel axis
e0, e1, e2, e3, emb, c0, lsnr = self.enc(feat_erb, feat_spec)
m = self.erb_dec(emb, e3, e2, e1, e0)
spec = self.mask(spec, m, atten_lim)
self.run_df = False
if self.run_df:
df_coefs, df_alpha = self.df_dec(emb, c0)
spec = self.df_op(spec, df_coefs, df_alpha)
else:
df_alpha = torch.zeros(spec.shape[0], spec.shape[2], 1, device=spec.device)
return spec, m, lsnr, df_alpha
```
I only use the Encoder and ErbDecoder modules. However, my result was not as good. Later, I found that the difference is due to nn.GRU inference: batched and one-by-one inference of nn.GRU give different results because of numerical accuracy. https://pytorch.org/docs/stable/notes/numerical_accuracy.html
E.g. reported in #31
Maybe this could be improved by increasing the eps in angle_backward?
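The batched-vs-stepwise nn.GRU discrepancy described above can be reproduced in isolation (sizes here are illustrative); the two outputs agree only up to floating-point accumulation order.

```python
import torch

torch.manual_seed(0)
gru = torch.nn.GRU(16, 32)                 # input_size, hidden_size; [T, B, F] layout
x = torch.randn(50, 1, 16)

y_full, _ = gru(x)                         # whole sequence in one call

h = None
chunks = []
for t in range(x.shape[0]):                # one frame at a time, carrying the state
    y_t, h = gru(x[t:t + 1], h)
    chunks.append(y_t)
y_step = torch.cat(chunks, dim=0)

print((y_full - y_step).abs().max().item())  # small, backend-dependent difference
```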
Hi,
In the current framework, it seems the speech, noise, and RIR paths have to be provided as lists to create the HDF5 sets for training. I have a few questions on this.
Hi,
Is there any option to fine-tune the pre-trained models ?
Thanks for your time.
Hi Hendrik,
Just curious, as I didn't see any benchmarks: can it process audio in chunks as it goes, or does it need the whole file to analyse?
How does it compare to https://github.com/breizhn/DTLN?
Thanks
Stuart
DeepFilterNet/DeepFilterNet/df/stoi.py
Line 95 in 1819b97
Regarding the way the iSTFT is calculated by transposed convolution: I think the window should be squared before it is used.
My comparison is as follows (istft).
The reference I checked is the same as librosa's istft.
When I run the command
maturin build --release -m DeepFilterNet/Cargo.toml
I am getting the following error.
🔗 Found pyo3 bindings
🐍 Found CPython 3.6m at python3.6, CPython 3.7m at python3.7
Compiling df v0.1.0 (/content/DeepFilterNet/libDF)
error[E0277]: `[u32; 5]` is not an iterator
  --> libDF/src/transforms.rs:449:42
    |
449 | let fft_size = primes.iter().zip(factors).fold(1, |acc, (p, f)| acc * p.pow(f));
    |                                  ^^^^^^^ expected an implementor of trait `IntoIterator`
    |                                          help: consider borrowing here: `&factors`
    |
    = note: the trait bound `[u32; 5]: IntoIterator` is not satisfied
    = note: required because of the requirements on the impl of `IntoIterator` for `[u32; 5]`
error[E0599]: the method `fold` exists for struct `std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>`, but its trait bounds were not satisfied
  --> libDF/src/transforms.rs:449:51
    |
449 | let fft_size = primes.iter().zip(factors).fold(1, |acc, (p, f)| acc * p.pow(f));
    |                                           ^^^^ method cannot be called on `std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>` due to unsatisfied trait bounds
    |
    = note: the following trait bounds were not satisfied:
            `[u32; 5]: Iterator`
            which is required by `std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator`
            `std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator`
            which is required by `&mut std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator`
error: aborting due to 2 previous errors
Some errors have detailed explanations: E0277, E0599.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `df`
To learn more, run the command again with --verbose.
💥 maturin failed
Caused by: Failed to build a native library through cargo
Caused by: Cargo build finished with "exit status: 101": cargo rustc --message-format json --manifest-path DeepFilterNet/Cargo.toml --release --lib -- -C link-arg=-s
It would be great if you could help.
Thanks
Hi Rikorose,
Sorry to bother you again,
I have transferred the code to Keras and am trying to run inference in a real-time fashion.
I found a few situations I want to ask about:
In enhance.py, the flow looks like it feeds the entire signal into the model, i.e. offline inference?
I tried real-time inference with the following flow:
When I change the buffer length used for inference, the result is good while the length is 100 or 300 frames.
But when I change the buffer length to 10 frames, it sounds bad.
For real-time inference, what is the minimum buffer length?
Thanks,
When I use the following command to install libdfdata, it fails:
maturin develop --release -m pyDF-data/Cargo.toml
error: failed to run custom build command for hdf5-sys v0.8.1
Caused by:
process didn't exit successfully: E:\data\deeplearning\pytorch\DeepFilterNet\target\release\build\hdf5-sys-8ffb164969e6e670\build-script-build
(exit code: 101)
--- stdout
Searching for installed HDF5 (any version)...
Found no HDF5 installations.
--- stderr
thread 'main' panicked at 'Unable to locate HDF5 root directory and/or headers.', C:\Users\tangzixing\.cargo\registry\src\github.com-1ecc6299db9ec823\hdf5-sys-0.8.1\build.rs:548:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: build failed
💥 maturin failed
Caused by: Failed to build a native library through cargo
Caused by: Cargo build finished with "exit code: 101": cargo rustc --message-format json --manifest-path pyDF-data/Cargo.toml --release --lib -- -C link-arg=-s
Hi,
Is it possible to provide a config.ini that reproduces the results in the paper?
Is the pre-trained model's config.ini the same as the one used to obtain the results in the paper?
Hello,
I am trying to do my own data augmentation,
and I found that my dataset always produces NaN in the 3 loss functions.
I noticed the paper says the data has been exponentially mean/unit normalized; I think maybe this causes the NaN issue for me. Could you give some details?
Thanks,
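A hedged sketch of the exponential unit normalization mentioned in the question above (alpha and the init value are my assumptions, not the project's exact constants). Note that the running state must be initialized strictly positive, or the division produces inf/NaN, which is one plausible source of NaN losses with custom data.

```python
import math

def band_unit_norm(frames, state, alpha=0.99):
    # frames: per-frame lists of spectral magnitudes; state: running mean of |x|.
    out = []
    for frame in frames:
        state = [abs(x) * (1 - alpha) + s * alpha for x, s in zip(frame, state)]
        out.append([x / math.sqrt(s) for x, s in zip(frame, state)])
    return out, state

# A nonzero initial state avoids division by zero on the first frame.
normed, _ = band_unit_norm([[1.0, 2.0], [0.5, 0.1]], state=[1e-3, 1e-3])
```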
Hi Rikorose,
Thanks for working on version 2 of DeepFilterNet.
Now I can run the real-time inference process with buffer size = 1, with the same result as processing the full signal.
The key point is that the state of the RNN needs to be carried over.
Now I'm having trouble with typing/keyboard noise not being suppressed well.
I currently only use the spectral loss with c=0.3 as in DeepFilterNet2; would the multi-resolution loss improve this case?
Or is c=0.6 from the previous work perhaps better?
Thanks,
Aaron
Thanks for your awesome work!
I am confused about the pad_feat/pad_spec and df_op functions, so I opened this issue to check my understanding.
First, I tried to test your trained model, and in the class DfNet() in deepfilternet2.py:
self.pad_feat = nn.ConstantPad2d((0, 0, -p.conv_lookahead, p.conv_lookahead), 0.0)
self.pad_spec = nn.ConstantPad3d((0, 0, 0, 0, -p.df_lookahead, p.df_lookahead), 0.0)
self.pad_out = nn.Identity()
and for lines 430-432 and 444-445:
feat_erb = self.pad_feat(feat_erb)
feat_spec = self.pad_feat(feat_spec)
e0, e1, e2, e3, emb, c0, lsnr = self.enc(feat_erb, feat_spec)
spec_f = self.pad_spec(spec)
spec_f = self.df_op(spec_f, df_coefs)
My questions are:
a. In nn.ConstantPad2d/3d, the negative -p.df_lookahead (= -2) removes data, so are 2 frames of information missing during training?
b. Is self.df_op causal or non-causal? For example, is the first frame computed using 0,0,0,0 and 3 real frames?
Thanks!
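My own reading of question (a), as a sketch: negative padding in nn.ConstantPad2d crops frames rather than adding them, so a time-axis padding of (-l, l) drops the first l frames and appends l zero frames, shifting the stream to realize the look-ahead without changing the frame count. A numpy equivalent (the helper name is mine):

```python
import numpy as np

def shift_lookahead(x, l):
    """numpy equivalent of nn.ConstantPad2d((0, 0, -l, l), 0.0) applied
    to a (time, freq) array: drop the first l frames, append l zero
    frames. The total frame count is unchanged; the stream is shifted so
    that feature frame t lines up with input frame t + l."""
    zeros = np.zeros((l,) + x.shape[1:], dtype=x.dtype)
    return np.concatenate([x[l:], zeros])

x = np.arange(12.0).reshape(6, 2)   # 6 time frames, 2 freq bins
y = shift_lookahead(x, 2)
print(y[:, 0])                      # [ 4.  6.  8. 10.  0.  0.]
```

So, on this reading, no frames are discarded overall; they are realigned against the spectrogram to provide the look-ahead, and the last l output frames see zero-padded future context.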
Hi, I got Colab Pro in order to run this with more RAM, which I thought would be extra GPU memory, but it turns out to be normal system RAM. Is it possible to run DeepFilterNet on a normal CPU + RAM combination instead of GPU memory? I ask because I'd like to run it on longer files (an hour long).
lsnr = self.lsnr_fc(emb) * self.lsnr_scale + self.lsnr_offset
I found that this variable is calculated in the code but not used for the loss or anywhere else, so what is the purpose of this target?
Hi, when I run the train, the code failed:
RuntimeError: DF dataset error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset /dockerdata/thujunchen/cjcode/ft_local/DeepFilterNet/DNS16kdataset/VALID_SET_SPEECH.hdf5" }
There is no error reported at df/prepare_data.py.
I ran cargo test, which reports:
running 24 tests
test reexport_dataset_modules::util::test_find_max_abs ... ok
test tests::test_erb_inout ... ok
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_07 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_10 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_vorbis ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_05 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_01 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_pcm ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_02 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_04 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_flac ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03 ... FAILED
test reexport_dataset_modules::dataloader::tests::test_fft_dataset ... FAILED
test reexport_dataset_modules::dataset::tests::test_cached_valid_dataset ... FAILED
test reexport_dataset_modules::augmentations::tests::test_filters ... ok
test reexport_dataset_modules::augmentations::tests::test_gen_noise ... ok
test reexport_dataset_modules::augmentations::tests::test_clipping ... ok
test reexport_dataset_modules::augmentations::tests::test_rand_resample ... ok
test reexport_dataset_modules::augmentations::tests::test_low_pass ... ok
test reexport_dataset_modules::dataset::tests::test_mix_audio_signal ... ok
test reexport_dataset_modules::augmentations::tests::test_reverb ... ok
failures:
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06' panicked at 'called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_flac.hdf5" }', libDF/src/dataset.rs:1956:41
note: panic did not contain expected string
      panic message: "called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: \"Error during File::open of dataset ../assets/noise_flac.hdf5\" }",
 expected substring: "Slice end"
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09' panicked at 'called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_flac.hdf5" }', libDF/src/dataset.rs:1956:41
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08' panicked at 'called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise.hdf5" }', libDF/src/dataset.rs:1956:41
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03' panicked at 'called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_vorbis.hdf5" }', libDF/src/dataset.rs:1956:41
---- reexport_dataset_modules::dataloader::tests::test_fft_dataset stdout ----
******** Start test_data_loader() ********
Error: DatasetError(Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/clean.hdf5" })
thread 'reexport_dataset_modules::dataloader::tests::test_fft_dataset' panicked at 'assertion failed: (left == right)
  left: 1, right: 0: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/test/src/lib.rs:186:5
---- reexport_dataset_modules::dataset::tests::test_cached_valid_dataset stdout ----
Error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/clean.hdf5" }
thread 'reexport_dataset_modules::dataset::tests::test_cached_valid_dataset' panicked at 'assertion failed: (left == right)
  left: 1, right: 0: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/test/src/lib.rs:186:5
failures:
reexport_dataset_modules::dataloader::tests::test_fft_dataset
reexport_dataset_modules::dataset::tests::test_cached_valid_dataset
reexport_dataset_modules::dataset::tests::test_hdf5_read_flac
reexport_dataset_modules::dataset::tests::test_hdf5_read_pcm
reexport_dataset_modules::dataset::tests::test_hdf5_read_vorbis
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_01
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_02
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_04
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_05
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_07
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_10
test result: FAILED. 9 passed; 15 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.67s
error: test failed, to rerun pass '-p deep_filter --lib'
I tried updating HDF5 to 1.10.1 as suggested in https://stackoverflow.com/questions/49386121/python-h5py-file-read-oserror-unable-to-open-file-bad-superblock-version-numb, but that did not work either.
Hi @Rikorose
Hello, thanks for open-sourcing DeepFilterNet. After installing from PyPI and trying it out, I found both the enhancement quality and the computational cost to be excellent.
So I want to study the network carefully via a manual installation on Win10.
First, I set up a conda env and installed Rust and Cargo: rustc 1.61.0 (fe5b13d68 2022-05-18).
Second, in a Python 3.9 conda env, I followed your README:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install maturin poetry
maturin develop --release -m pyDF/Cargo.toml
maturin develop --release -m pyDF-data/Cargo.toml
These commands completed correctly; the key packages DeepFilterDataLoader and DeepFilterLib installed successfully:
asttokens 2.0.5
CacheControl 0.12.11
cachy 0.3.0
certifi 2022.5.18.1
charset-normalizer 2.0.12
cleo 0.8.1
clikit 0.6.2
colorama 0.4.5
crashtest 0.3.1
DeepFilterDataLoader 0.2.5rc0
DeepFilterLib 0.2.5rc0
distlib 0.3.4
executing 0.8.3
filelock 3.7.1
html5lib 1.1
icecream 2.1.2
idna 3.3
importlib-metadata 4.11.4
keyring 23.6.0
lockfile 0.12.2
loguru 0.6.0
maturin 0.12.20
msgpack 1.0.4
numpy 1.22.4
packaging 20.9
pastel 0.2.1
pexpect 4.8.0
pip 21.2.4
pkginfo 1.8.3
platformdirs 2.5.2
poetry 1.1.13
poetry-core 1.0.8
ptflops 0.6.9
ptyprocess 0.7.0
Pygments 2.12.0
pylev 1.4.0
pyparsing 3.0.9
pywin32-ctypes 0.2.0
requests 2.28.0
requests-toolbelt 0.9.1
setuptools 61.2.0
shellingham 1.4.0
six 1.16.0
tomli 2.0.1
tomlkit 0.11.0
torch 1.11.0+cpu
torchaudio 0.11.0+cpu
typing_extensions 4.2.0
urllib3 1.26.9
virtualenv 20.14.1
webencodings 0.5.1
wheel 0.37.1
win32-setctime 1.1.0
wincertstore 0.2
zipp 3.8.0
Third, poetry install -E train -E eval
or poetry install -E train -E eval --no-root:
both of these commands block and never return a result, even after an hour or more.
(DeepFilterNet) E:\code\DeepFilterNet\DeepFilterNet>poetry install -E train -E eval --no-root
Updating dependencies
Resolving dependencies...
How can I fix this, and how can I debug where it is blocking?
Thanks!
Hello,
I was trying to test DeepFilterNet on Windows. I don't know much about the technical aspects or Python in general, though, so I am getting errors whose cause I don't understand.
I ran these commands from the readme:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install deepfilternet
(So far, no errors)
deepFilter test.wav
(test.wav is just a placeholder for the path to a real file.)
I tried both Python 3.10 (latest) and 3.7, as suggested by the DeepFilterLib page on PyPI. In both cases I get errors that no audio backend is available and that libdf could not be found.
Any advice would be greatly appreciated. Thanks in advance!
Basically fix this TODO:
https://github.com/Rikorose/DeepFilterNet/blob/7f2120b/libDF/src/dataset.rs#L340
The assumption that the closures are submitted in order is not correct. The input drain is instead split into chunks corresponding to the number of workers. Thus, changing the number of workers changes the order in which samples occur in each batch.
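The effect can be illustrated with a small Python sketch (illustrative only; the real code is the Rust dataloader): splitting the input into contiguous per-worker chunks and assembling output from one element per worker at a time reorders the samples whenever the worker count changes:

```python
def chunked_order(items, workers):
    """Split items into `workers` contiguous chunks (as a chunked work
    queue would) and emit one element per worker per step, mimicking
    batch assembly from per-worker outputs."""
    n = len(items)
    size = -(-n // workers)  # ceiling division: chunk length
    chunks = [items[i:i + size] for i in range(0, n, size)]
    out = []
    for step in range(size):
        for c in chunks:
            if step < len(c):
                out.append(c[step])
    return out

samples = list(range(8))
print(chunked_order(samples, 2))  # [0, 4, 1, 5, 2, 6, 3, 7]
print(chunked_order(samples, 4))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

Every sample still appears exactly once per epoch; only the per-batch ordering depends on the worker count, which is exactly what makes runs non-reproducible across machines with different worker settings.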
Hi,
I am confused about the erb_rb function, so I am opening this issue to check my understanding.
In forward, the ERB-to-STFT mapping is done by spec_mask = erb_mask.matmul(erb_inv_fb),
but checking the code in librosa, the mel-to-STFT mapping is done by the mel_to_stft function:
# Construct a mel basis with dtype matching the input data
mel_basis = filters.mel(
sr=sr, n_fft=n_fft, n_mels=M.shape[-2], dtype=M.dtype, **kwargs
)
# Find the non-negative least squares solution, and apply
# the inverse exponent.
# We'll do the exponentiation in-place.
inverse = nnls(mel_basis, M)
return np.power(inverse, 1.0 / power, out=inverse)
My questions are:
a. Is the erb2stft process lossless? What about mel2stft and bark2stft?
b. Is the ERB feature better than the plain STFT feature in DeepFilterNet?
Thanks
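Regarding question (a), a toy numpy experiment suggests the answer: mapping STFT bins down to fewer bands and back through a pseudo-inverse (in the spirit of erb_inv_fb) is a least-squares projection, not a lossless inverse. The filterbank below is a made-up 8-bin/3-band example, not the real ERB matrix:

```python
import numpy as np

# A toy "filterbank" mapping 8 STFT bins down to 3 bands
# (rows: bands, cols: bins) -- a stand-in for an ERB/mel matrix.
fb = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
], dtype=float)
fb = fb / fb.sum(axis=1, keepdims=True)  # normalize each band

spec = np.arange(1.0, 9.0)          # a toy STFT magnitude frame
bands = fb @ spec                   # stft -> erb-like bands
inv_fb = np.linalg.pinv(fb)         # pseudo-inverse, cf. erb_inv_fb
recon = inv_fb @ bands              # bands -> stft

# 3 band values cannot recover 8 independent bins:
print(np.allclose(recon, spec))     # False
```

The reconstruction is exact only for spectra that already lie in the filterbank's row space (here: constant within each band), so erb2stft, mel2stft, and bark2stft are all lossy in general. That is usually acceptable for predicting a smooth gain/mask, which may be why the ERB representation still works well in DeepFilterNet.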
Following #31 and the README, I have prepared the speech and noise HDF5 files and the dataset.cfg file.
The speech and noise data are from the DNS challenge; the number of files is 50, and the batch size I set is 2.
When I train the net, errors occur as follows.
Can you give me some advice on fixing this error? Thanks.
Running deepFilter wav_name
Traceback (most recent call last):
File "deepFilter", line 8, in <module>
sys.exit(main())
File "env/lib/python3.8/site-packages/df/enhance.py", line 66, in main
p = ModelParams()
File "env/lib/python3.8/site-packages/df/model.py", line 15, in __init__
self.__params = getattr(import_module("df." + self.__model), "ModelParams")()
File "env/lib/python3.8/site-packages/df/deepfilternet.py", line 48, in __init__
self.group_shuffle: bool = config(
File "env/lib/python3.8/site-packages/df/config.py", line 114, in __call__
raise ValueError(f"Value '{option}' not found in config (defaults not allowed).")
ValueError: Value 'GROUP_SHUFFLE' not found in config (defaults not allowed).
I want to reproduce your work, but when I run training, loading the HDF5 files fails.
2022-03-01 14:40:46 | INFO | DF | Running on torch 1.10.0
2022-03-01 14:40:46 | INFO | DF | Running on host ultralab-server
2022-03-01 14:40:46 | INFO | DF | Git commit: 05da995, branch: main
2022-03-01 14:40:46 | INFO | DF | Running on device cuda:0
2022-03-01 14:40:46 | INFO | DF | Initializing model deepfilternet
2022-03-01 14:40:53 | WARNING | DF | Failed to print model summary: No module named 'ptflops'
2022-03-01 14:40:53 | INFO | DF | Running with normalization window alpha = '0.996'
2022-03-01 14:40:53 | INFO | DF | Initializing dataloader with data directory ../data/dns/
2022-03-01 14:40:53 | ERROR | DF | An error has been caught in function '', process 'MainProcess' (19629), thread 'MainThread' (140105826391808):
Traceback (most recent call last):
File "df/train.py", line 425, in
main()
└ <function main at 0x7f6cccc270d0>
File "df/train.py", line 103, in main
dataloader = DataLoader(
└ <class 'libdfdata.torch_dataloader.PytorchDataLoader'>
File "/home/tangzixing/data/deeplearning/program/audio/DeepFilterNet-0.1.10/pyDF-data/libdfdata/torch_dataloader.py", line 99, in init
self.loader = _FdDataLoader(
│ └ <class 'builtins._FdDataLoader'>
└ <libdfdata.torch_dataloader.PytorchDataLoader object at 0x7f6ccc9ae460>
RuntimeError: DF dataset error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../data/dns/val_speech.hdf5" }
The h5py version is 3.6.0, and h5py.version.hdf5_version is 1.2.1.
Which version of h5py do you use?
Hi,
I tried to re-train the DeepFilterNet model using the DNS-3 challenge dataset mentioned in your work.
I don't have the additional 10k IRs; the rest of the dataset is the same.
On the VCTK test set, using the config.ini from the pre-trained model as my training config, my best model on validation gives a PESQ score of 2.60, which is much lower than the 2.81 from the pre-trained model.
In config.ini, AdamW is used, while the paper mentions Adam as the optimizer.
Do you think any other factors would result in such a performance drop?
Could you clarify the 3 s training samples? If a DNS-3 clip is 10 s long, do I need to split it into 3 s segments to use the entire clip, or just take the first 3 seconds? Alternatively, is a random 3 s segment generated on the fly during training?
In the HDF5 setup, do speech/noise/RIR need the same number of samples, or are the noise and RIR sampled randomly from a list? For example, if the speech list has 1000 samples while the noise and RIR lists have 100 each, is that okay, or should all three have 1000? Do the speech and noise samples need to have the same duration?
What about the reverberation parameter p_reverb = 0.05: is this augmentation performed by default, or is other config needed? Also, conv_lookahead = 2 in config.ini, but the paper mentions a "look-ahead of l = 1 frame for both DF as well as in the DNN convolutions".
Hi Rikorose,
I'm in the process of porting your PyTorch code to TensorFlow/Keras and I'm running into some issues.
The loss factors are 0 for the mask loss, 1000 for the DF-alpha loss, and 20000 for the spectral loss.
But I think the DF-alpha loss is not multiplied by 1000 in your code; the total loss looks like df_alpha_loss + spectral_loss * 20000.
In my Keras training, the DF-alpha loss dropped from 0.085 to 0.06 in the first epoch and then stopped decreasing,
while the spectral loss keeps decreasing slowly. Is this expected?
I also tried an SI-SDR loss, but the result was not good either.
Another thing: comparing my processed file with your code's output, my .wav sounds as if deep filtering (stage 2) doesn't work, while your wav is clearly processed below 5 kHz.
By the way, in 'LocalSnrTarget' the 'ws' and 'ws_ns' windows are not the same, so is the local SNR computed with different numbers of frames for speech and noise (1 for speech, 3 for noise)?
And I think the LSNR layer in the encoder is not used to compute a loss?
Do you have any suggestions?
Thanks,
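For the 'ws' vs 'ws_ns' question, here is how I picture a per-frame local SNR with separate smoothing windows for speech and noise; this is an illustrative sketch of the idea, not the actual LocalSnrTarget code, and the helper names mirror the question rather than the repository:

```python
import numpy as np

def local_snr_db(speech, noise, ws=1, ws_ns=3, eps=1e-12):
    """Per-frame local SNR in dB from framed magnitudes.

    speech, noise: arrays of shape (frames, bins).
    ws / ws_ns: number of frames averaged for the speech / noise energy
    (mirroring the separate 'ws' and 'ws_ns' windows asked about above);
    a longer noise window gives a smoother, more stationary estimate.
    """
    def frame_energy(x, w):
        e = (x ** 2).sum(axis=-1)               # energy per frame
        kernel = np.ones(w) / w                 # moving-average window
        return np.convolve(e, kernel, mode="same")

    e_s = frame_energy(speech, ws)
    e_n = frame_energy(noise, ws_ns)
    return 10.0 * np.log10((e_s + eps) / (e_n + eps))
```

On this picture, using 1 frame for speech but 3 for noise just reflects that speech energy changes quickly while the noise floor is assumed to vary slowly, so the two estimates are smoothed differently before forming the ratio.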
Hi guys,
I have followed the installation instructions using pip, and now I'm stuck on a SLURM-related error that I don't know how to fix.
I'm executing deepfilternet from colab pro+ account.
Here is the issue.
Traceback (most recent call last):
File "/usr/local/bin/deepFilter", line 5, in
from df.enhance import run
File "/usr/local/lib/python3.7/dist-packages/df/enhance.py", line 18, in
from df.logger import init_logger, warn_once
File "/usr/local/lib/python3.7/dist-packages/df/logger.py", line 49
if (jobid := os.getenv("SLURM_JOB_ID")) is not None:
^
SyntaxError: invalid syntax
This is what I get while trying to execute: !deepFilter /content/test_audio_053830.wav --output-dir /content
Has anyone run into this kind of issue?
Please let me know the solution / how I can run this.
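The traceback points at the walrus operator (:=) in df/logger.py, which requires Python 3.8 or newer; the Python 3.7 interpreter in that Colab environment fails to parse it at import time, producing exactly this SyntaxError. A minimal check to run before installing (the helper and message are my own, not part of DeepFilterNet):

```python
import sys

def check_python(version=sys.version_info):
    """df/logger.py uses the walrus operator (:=), added in Python 3.8,
    so older interpreters fail with a SyntaxError at import time."""
    return version >= (3, 8)

if not check_python():
    raise RuntimeError(
        "DeepFilterNet needs Python >= 3.8 for the ':=' operator; "
        "please upgrade the runtime's Python."
    )
```

On Colab this means selecting or installing a Python >= 3.8 runtime before `pip install deepfilternet`.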
I am working on the training setup. I got the error below in the run_epoch function in train.py.
ERROR:-
File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 479, in
main()
└ <function main at 0x7f74abd598b0>
File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 173, in main
train_loss = run_epoch(
└ <function run_epoch at 0x7f74abd665e0>
File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 265, in run_epoch
assert batch.feat_spec is not None
│ └ None
└ Batch of size 1:
length: 240000
snr: -5
gain: 0
AssertionError: assert batch.feat_spec is not None
python df/train.py ../assets/dataset.cfg ../assets/ df/new_config
Config file used
{
"test": [
[
"clean.hdf5",
100
],
[
"noise.hdf5",
10
]
],
"train": [
[
"clean.hdf5",
10000
],
[
"noise.hdf5",
10
]
],
"valid": [
[
"clean.hdf5",
100
],
[
"noise.hdf5",
10
]
]
}
I used above configuration file(dataset.cfg file available in Deepfilternet-main/assets/ directory).
It says that batch.feat_spec doesn't contain any information. Do we need to write batch.feat_spec and batch.feat_erb when creating the HDF5 file itself, or will the dataloader extract these features?
Is train.py using the predefined dataloader from torch.utils.data, or the one from C:\DFnet\DeepFilterNet-main\pyDF-data\libdfdata\torch_dataloader?
Can you help me resolve this error (AssertionError: assert batch.feat_spec is not None)? Hope to hear from you soon.
Thanks in advance.
Hi! First, really great work, and thanks for open-sourcing everything. But I have a few questions.
Thanks!
Hello,
When I try to run the df/train.py file, I get a ModuleNotFoundError: No module named 'libdfdata'. I understand that libdfdata lives in the pyDF-data folder.
Is it possible to fix this, or is some modification required on my side?
Alternatively, how can I avoid using the Rust library altogether? Say I want to do everything in Python: how can I get spec, erb_feat, and spec_feat in pure Python, without Rust?
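As a starting point, the features can be approximated in pure numpy. This is a simplified sketch with assumed frame sizes and an equal-width band grouping standing in for the real ERB filterbank, so it will not match the Rust output exactly:

```python
import numpy as np

def stft(audio, n_fft=960, hop=480):
    """Plain numpy STFT with a hann window -- a pure-Python stand-in for
    the Rust feature extraction. Frame/FFT sizes here are assumptions."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * win for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)          # (frames, n_fft//2 + 1)

def erb_feat(spec, n_bands=32):
    """Log-power band energies over equal-width groups of bins -- a
    simplified placeholder for the true ERB-spaced filterbank."""
    power = np.abs(spec) ** 2
    bands = np.array_split(power, n_bands, axis=-1)
    return np.log10(np.stack([b.mean(axis=-1) for b in bands], axis=-1) + 1e-10)

audio = np.random.default_rng(0).standard_normal(4800)
spec = stft(audio)                               # complex spectrogram ("spec")
feat = erb_feat(spec)                            # banded features ("erb_feat")
print(spec.shape, feat.shape)                    # (9, 481) (9, 32)
```

For spec_feat one would additionally normalize the complex spectrum (the repo's exponential unit normalization); for training against the released models you would still need the exact ERB widths and normalization from the Rust side.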