rikorose / deepfilternet
Noise suppression using deep filtering
Home Page: https://huggingface.co/spaces/hshr/DeepFilterNet2
License: Other
This is really an amazing piece of work @Rikorose.
Could you please add some instructions on how to train this with a custom dataset?
I really need step-by-step instructions; please help me.
I'm using Win10 and the Anaconda prompt to run the code.
I followed README.md to install:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
and
pip install deepfilternet
work smoothly, but I'm having an issue with
deepFilter path/to/noisy_audio.wav
here's the error:
2022-06-15 14:49:09 | INFO | DF | Running on torch 1.11.0+cpu
2022-06-15 14:49:09 | INFO | DF | Running on host DESKTOP-RP8O01C
fatal: not a git repository (or any of the parent directories): .git
2022-06-15 14:49:09 | INFO | DF | Loading model settings of DeepFilterNet2
2022-06-15 14:49:09 | INFO | DF | Using DeepFilterNet2 model at anaconda3\lib\site-packages\pretrained_models\DeepFilterNet2
2022-06-15 14:49:09 | INFO | DF | Initializing model deepfilternet2
2022-06-15 14:49:10 | INFO | DF | Found checkpoint anaconda3\lib\site-packages\pretrained_models\DeepFilterNet2\checkpoints\model_96.ckpt.best with epoch 96
2022-06-15 14:49:10 | WARNING | DF | Unexpected key: erb_comp.c
2022-06-15 14:49:10 | WARNING | DF | Unexpected key: erb_comp.mn
2022-06-15 14:49:10 | INFO | DF | Model loaded
Traceback (most recent call last):
File "C:\Users\Mistorm\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Mistorm\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Mistorm\anaconda3\Scripts\deepFilter.exe\__main__.py", line 7, in <module>
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 329, in run
main(parser.parse_args())
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 41, in main
audio, meta = load_audio(file, df_sr)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\df\enhance.py", line 188, in load_audio
info: AudioMetaData = ta.info(file, **ikwargs)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\torchaudio\backend\soundfile_backend.py", line 103, in info
sinfo = soundfile.info(filepath)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 438, in info
return _SoundFileInfo(file, verbose)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 383, in __init__
with SoundFile(file) as f:
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 1183, in _open
_error_check(_snd.sf_error(file_ptr),
File "C:\Users\Mistorm\anaconda3\lib\site-packages\soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'path/to/noisy_audio.wav': System error.
I don't know why it can't open path/to/noisy_audio.wav,
so I can't do the further steps either:
(base) C:\Users\Mistorm>cd path/to/DeepFilterNet/
The system cannot find the path specified.
(base) C:\Users\Mistorm>cd DeepFilterNet
The system cannot find the path specified.
I can't figure out why it can't find the file, since DeepFilterNet and DeepFilterNet2 are already in
C:\Users\Mistorm\anaconda3\Lib\site-packages\pretrained_models
what step did I miss?
And what should I do after that? I'm having a hard time understanding how to make it work...
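An editorial note on the error above (not from the thread itself): "path/to/noisy_audio.wav" in the README is a placeholder, so soundfile raises "System error" simply because no such file exists. A quick sanity check:

```python
# "path/to/noisy_audio.wav" is a README placeholder, not a real file;
# deepFilter must be given the path of an actual WAV file on disk.
import os

wav = "path/to/noisy_audio.wav"
print(os.path.isfile(wav))  # False -> soundfile fails with "Error opening ... System error."
```

Substituting a real recording on disk for the placeholder should make the enhance step run.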
"from libdf import DF" in DeepFilterNet/df/checkpoint.py
I changed it to "from libDF import DF" and set the correct package path, but the error remains.
Hi @Rikorose,
I'd like to ask about the rationale behind this line
model.to(memory_format=torch.channels_last)
is it merely for speed (e.g. on tensor cores), meaning it shouldn't affect the output values at all?
The reason I'm asking this is because, commenting out that line will give me different result, i.e.
enhanced = model(spec, erb_feat, spec_feat)[0].cpu()
will lead to different values of enhanced.
I wasn't expecting this, and I found that the output of a convolution layer, specifically enc.erb_conv0, differs with and without that setting. The code doesn't set the input's memory format to channels-last, only the model's, so we have an input that is channels-first and weights that are channels-last.

I dug around the PyTorch forum and came across a thread where they claim PyTorch should take this into account. Is that what you intended, for PyTorch to internally handle the different formats and automatically convert the input tensors to channels-last? In that case, this difference in results isn't the intended behavior, and I've added a reply to the thread mentioning it. But if that's not the case, and the different output is expected, may I understand the rationale for it?
Thanks,
Emily
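The channels-last question above can be reproduced in isolation. This is a minimal editorial sketch (the layer and shapes are illustrative, not the actual enc.erb_conv0): converting only the module to channels-last while the input stays contiguous can change conv outputs at float precision, since a different kernel and accumulation order may be used.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
x = torch.randn(1, 1, 16, 32)               # input stays channels-first (contiguous)

y_cf = conv(x)                              # weights channels-first
conv.to(memory_format=torch.channels_last)  # convert only the model, as in the code
y_cl = conv(x)                              # input format unchanged

# The math is identical; any difference is pure floating-point reordering.
print((y_cf - y_cl).abs().max().item())
```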
Thanks for your awesome work!
I installed deepfilternet through pip and tested some samples. The overall noise reduction is great, but in white-noise scenes there are more residues in the speech, which leads to a poor subjective impression.
Have you noticed this phenomenon? I will attach the samples below.
samples.zip
We sample a subset of the dataset here:
https://github.com/Rikorose/DeepFilterNet/blob/7d5fae7/libDF/src/dataset.rs#L832
This is only done at initialization, but it should be done at the start of each (training) epoch, so that we see new samples every epoch.
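A hedged Python sketch of the proposed fix (the actual code is Rust, in libDF/src/dataset.rs; all names here are illustrative): reseed and redraw the subset at the start of every epoch instead of once at initialization, keeping the draw reproducible.

```python
import random

def epoch_subset(n_files: int, n_samples: int, epoch: int, seed: int = 42):
    # A fresh but reproducible subset per epoch: the RNG seed depends on the epoch.
    rng = random.Random(seed + epoch)
    return rng.sample(range(n_files), k=min(n_samples, n_files))

print(epoch_subset(1000, 3, epoch=0))
print(epoch_subset(1000, 3, epoch=1))  # a different draw next epoch
```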
Hi,
The loss becomes NaN after a few epochs of training.
Is there any way to avoid this?
Is it possible to resume training from a particular checkpoint? I understand training resumes from the last saved checkpoint, but if the last saved checkpoint already produced NaN, resuming from it would be an issue.
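Not project-specific, but a common mitigation sketch for NaN losses (the function name and the max_norm value are my own illustration): skip the optimizer step when the loss goes non-finite and clip gradients to bound their magnitude.

```python
import torch

def safe_step(loss: torch.Tensor, model: torch.nn.Module, opt: torch.optim.Optimizer) -> bool:
    """Skip the update when the loss is non-finite; otherwise clip gradients and step."""
    if not torch.isfinite(loss):
        opt.zero_grad(set_to_none=True)  # drop this batch entirely
        return False
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
    opt.zero_grad(set_to_none=True)
    return True
```

Checkpoints could additionally be written only while the running loss is finite, so that resuming never starts from a NaN state.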
Hi, Rikorose
Thanks for sharing this code! I have some questions; could you give me some suggestions?
(1) How does the function "band_mean_norm_erb" work? Could you point me to papers explaining this implementation?
(2) "band_mean_norm_erb" is called by "transforms::erb_norm", which uses "MEAN_NORM_INIT" = [-60.0, -90.0]. Are the values [-60.0, -90.0] chosen empirically, or obtained through mathematical derivation?
Thank you !
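For readers of the thread, here is a hedged Python sketch of what band_mean_norm_erb appears to do (the real implementation is Rust; alpha and the /40 scale are assumptions on my part): each ERB band keeps an exponential running mean of the dB features, seeded from MEAN_NORM_INIT, which is subtracted and rescaled.

```python
def band_mean_norm_erb(frames, state, alpha=0.99, denom=40.0):
    # frames: per-frame lists of per-band dB values; state: running mean per band,
    # seeded from the MEAN_NORM_INIT range.
    out = []
    for frame in frames:
        state = [x * (1 - alpha) + s * alpha for x, s in zip(frame, state)]
        out.append([(x - s) / denom for x, s in zip(frame, state)])
    return out, state

normed, state = band_mean_norm_erb([[-30.0, -50.0], [-20.0, -40.0]], state=[-60.0, -60.0])
```

Under this reading, [-60.0, -90.0] only define the initial state of the running mean, so they mainly matter for the first frames; that suggests empirically chosen values rather than a derivation.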
Hi Rikorose, I have a question. I trained for 10 epochs following your tutorial, using the config.ini of the pre-trained model as the configuration file, but when I ran prediction with this model I found almost no noise reduction. Could you help me analyze the reason? Thank you.
The following file contains my training log,model and cfg file.
Hi Rikorose,
Sorry to bother you again,
I tried to generate data and train the model according to the training section.
I generated training_set.txt (just selecting 10 files as a test) for speech and built the HDF5 file (and likewise for noise), using
python df/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5
~/DeepFilterNet/wav/dataset/oblomov_s009036.wav
~/DeepFilterNet/wav/dataset/oblomov_s009040.wav
~/DeepFilterNet/wav/dataset/oblomov_s009033.wav
~/DeepFilterNet/wav/dataset/oblomov_s009037.wav
~/DeepFilterNet/wav/dataset/oblomov_s009041.wav
~/DeepFilterNet/wav/dataset/oblomov_s009034.wav
~/DeepFilterNet/wav/dataset/oblomov_s009038.wav
~/DeepFilterNet/wav/dataset/oblomov_s009042.wav
~/DeepFilterNet/wav/dataset/oblomov_s009035.wav
~/DeepFilterNet/wav/dataset/oblomov_s009039.wav
I generated the dataset.cfg as shown below:
{
"train": [
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
1.0
],
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
1.0
]
],
"valid": [
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
0.2
],
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
0.2
]
],
"test": [
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
0.2
],
[
"~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
0.2
]
]
}
I encountered some errors, as shown in the figure below.
In addition, I have some questions:
1. For python df/train.py dataset.cfg ~/wav_folder/ ./base_dir/, must base_dir/ not exist beforehand? (But we need to give a config.ini, so here I enter pretrained_model/ and delete the .ckpt.)
2. I had to change imports (ex. from df.config import ... -> from config import ...), otherwise it will cause an import error.
Thanks,
Hi, thanks for your amazing works.
I tried to follow the steps in readme.md to make a dataset.
I am a little confused about making the hdf5 files and the cfg:
{
"train":[
[
"TRAIN_SET_SPEECH.hdf5",
0.6
],
[
"TRAIN_SET_NOISE.hdf5",
0.6
],
[
"TRAIN_SET_RIR.hdf5",
0.6
]
],
"valid":[
[
"TRAIN_SET_SPEECH.hdf5",
0.2
],
[
"TRAIN_SET_NOISE.hdf5",
0.2
],
[
"TRAIN_SET_RIR.hdf5",
0.2
]
],
"test":[
[
"TRAIN_SET_SPEECH.hdf5",
0.2
],
[
"TRAIN_SET_NOISE.hdf5",
0.2
],
[
"TRAIN_SET_RIR.hdf5",
0.2
]
]
}
@Rikorose
Thanks, this is a very good project. I used the web demo to test and the results were very good, but my local installation fails. I'm using an Anaconda environment. How can this be fixed?
(pytorch36) C:\Users\admin>pip install deepfilternet
ERROR: Cannot install deepfilternet==0.1.2, deepfilternet==0.1.3 and deepfilternet==0.1.4 because these package versions have conflicting dependencies.
The conflict is caused by:
deepfilternet 0.1.4 depends on DeepFilterLib<0.2 and >=0.1
deepfilternet 0.1.3 depends on DeepFilterLib<0.2 and >=0.1
deepfilternet 0.1.2 depends on DeepFilterLib<0.2 and >=0.1
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
(pytorch36) C:\Users\admin>pip install DeepFilterLib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not find a version that satisfies the requirement DeepFilterLib (from versions: none)
ERROR: No matching distribution found for DeepFilterLib
(pytorch36) C:\Users\admin>conda list
# packages in environment at D:\ProgramData\miniconda3\envs\pytorch36:
#
# Name Version Build Channel
absl-py 1.1.0 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
attrs 21.4.0 pypi_0 pypi
audioread 2.1.9 pypi_0 pypi
blas 2.111 mkl conda-forge
blas-devel 3.9.0 11_win64_mkl conda-forge
ca-certificates 2022.5.18.1 h5b45459_0 conda-forge
cachetools 4.2.4 pypi_0 pypi
certifi 2022.5.18.1 pypi_0 pypi
cffi 1.15.0 pypi_0 pypi
charset-normalizer 2.0.12 pypi_0 pypi
colorama 0.4.4 pypi_0 pypi
cudatoolkit 11.1.1 heb2d755_7 conda-forge
cycler 0.10.0 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cython 0.29.30 pypi_0 pypi
dataclasses 0.8 pyh787bdff_2 conda-forge
decorator 4.4.2 pypi_0 pypi
ear 2.1.0 pypi_0 pypi
flatbuffers 2.0 pypi_0 pypi
freetype 2.10.4 h546665d_1 conda-forge
google-auth 2.7.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.46.3 pypi_0 pypi
icu 68.1 h0e60522_0 conda-forge
idna 3.3 pypi_0 pypi
imageio 2.15.0 pypi_0 pypi
importlib-metadata 4.8.3 pypi_0 pypi
importlib-resources 5.4.0 pypi_0 pypi
intel-openmp 2021.3.0 h57928b3_3372 conda-forge
jbig 2.1 h8d14728_2003 conda-forge
joblib 1.1.0 pypi_0 pypi
jpeg 9d h8ffe710_0 conda-forge
kiwisolver 1.3.1 py36he95197e_1 conda-forge
lcms2 2.12 h2a16943_0 conda-forge
lerc 2.2.1 h0e60522_0 conda-forge
libblas 3.9.0 11_win64_mkl conda-forge
libcblas 3.9.0 11_win64_mkl conda-forge
libclang 11.1.0 default_h5c34c98_1 conda-forge
libdeflate 1.7 h8ffe710_5 conda-forge
liblapack 3.9.0 11_win64_mkl conda-forge
liblapacke 3.9.0 11_win64_mkl conda-forge
libpng 1.6.37 h1d00b33_2 conda-forge
libprotobuf 3.18.0 h7755175_1 conda-forge
librosa 0.9.1 pypi_0 pypi
libtiff 4.3.0 h0c97f57_1 conda-forge
libuv 1.42.0 h8ffe710_0 conda-forge
llvmlite 0.36.0 pypi_0 pypi
lxml 4.9.0 pypi_0 pypi
lz4-c 1.9.3 h8ffe710_1 conda-forge
m2w64-gcc-libgfortran 5.3.0 6 conda-forge
m2w64-gcc-libs 5.3.0 7 conda-forge
m2w64-gcc-libs-core 5.3.0 7 conda-forge
m2w64-gmp 6.1.0 2 conda-forge
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
markdown 3.3.7 pypi_0 pypi
matplotlib 3.3.1 1 conda-forge
matplotlib-base 3.3.1 py36h856a30b_0 conda-forge
mkl 2021.3.0 hb70f87d_564 conda-forge
mkl-devel 2021.3.0 h57928b3_565 conda-forge
mkl-include 2021.3.0 hb70f87d_564 conda-forge
msys2-conda-epoch 20160418 1 conda-forge
multipledispatch 0.6.0 pypi_0 pypi
networkx 2.5.1 pypi_0 pypi
ninja 1.10.2 h5362a0b_0 conda-forge
numba 0.53.1 pypi_0 pypi
numpy 1.19.5 py36h4b40d73_2 conda-forge
oauthlib 3.2.0 pypi_0 pypi
olefile 0.46 pyh9f0ad1d_1 conda-forge
onnx 1.10.1 py36h524f2fb_1 conda-forge
onnxruntime 1.10.0 pypi_0 pypi
openjpeg 2.4.0 hb211442_1 conda-forge
openssl 1.1.1o h8ffe710_0 conda-forge
packaging 21.3 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
pesq 0.0.4 pypi_0 pypi
pillow 8.3.2 py36h10c25d6_0 conda-forge
pip 21.3.1 pypi_0 pypi
pooch 1.6.0 pypi_0 pypi
prettytable 2.5.0 pypi_0 pypi
protobuf 3.19.4 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pyparsing 2.2.0 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pypesq 1.2.4 pypi_0 pypi
pyqt 5.12.3 py36ha15d459_7 conda-forge
pyqt-impl 5.12.3 py36he2d232f_7 conda-forge
pyqt5-sip 4.19.18 py36he2d232f_7 conda-forge
pyqtchart 5.12 py36he2d232f_7 conda-forge
pyqtwebengine 5.12.1 py36he2d232f_7 conda-forge
pystoi 0.3.3 pypi_0 pypi
pytest-runner 5.3.2 pypi_0 pypi
python 3.6.13 h39d44d4_2_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.6 2_cp36m conda-forge
pytorch 1.9.1 py3.6_cuda11.1_cudnn8_0 pytorch
pytz 2022.1 pypi_0 pypi
pyvad 0.1.3 pypi_0 pypi
pywavelets 1.1.1 pypi_0 pypi
qt 5.12.9 h5909a2a_4 conda-forge
requests 2.27.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
resampy 0.2.2 pypi_0 pypi
rsa 4.8 pypi_0 pypi
ruamel-yaml 0.17.21 pypi_0 pypi
ruamel-yaml-clib 0.2.6 pypi_0 pypi
scikit-image 0.17.2 pypi_0 pypi
scikit-learn 0.24.2 pypi_0 pypi
scipy 1.5.4 pypi_0 pypi
setuptools 59.5.0 pypi_0 pypi
six 1.16.0 pyh6c4a22f_0 conda-forge
soundfile 0.10.3.post1 pypi_0 pypi
speechpy 2.4 pypi_0 pypi
sqlite 3.36.0 h8ffe710_1 conda-forge
tbb 2021.3.0 h2d74725_0 conda-forge
tensorboard 2.9.1 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tifffile 2020.9.3 pypi_0 pypi
tk 8.6.11 h8ffe710_1 conda-forge
torchaudio 0.9.1 py36 pytorch
torchvision 0.2.2 py_3 pytorch
tornado 4.5.2 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
tqdm 4.64.0 pypi_0 pypi
typing-extensions 3.10.0.2 hd8ed1ab_0 conda-forge
typing_extensions 3.10.0.2 pyha770c72_0 conda-forge
ucrt 10.0.20348.0 h57928b3_0 conda-forge
urllib3 1.26.9 pypi_0 pypi
vc 14.2 hb210afc_5 conda-forge
vs2015_runtime 14.29.30037 h902a5da_5 conda-forge
wavinfo 1.6.3 pypi_0 pypi
wcwidth 0.2.5 pypi_0 pypi
webrtcvad 2.0.10 pypi_0 pypi
werkzeug 2.0.3 pypi_0 pypi
wheel 0.29.0 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
wincertstore 0.2 py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
xz 5.2.5 h62dcd97_1 conda-forge
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h62dcd97_1010 conda-forge
zstd 1.5.0 h6255e5f_0 conda-forge
(DeepFilterNet) C:\Users\sdr\Downloads\Playground\DeepFilterNet>deepFilter test.wav
Traceback (most recent call last):
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\sdr\anaconda3\envs\DeepFilterNet\Scripts\deepFilter.exe\__main__.py", line 7, in <module>
TypeError: main() missing 1 required positional argument: 'args'
(DeepFilterNet) C:\Users\sdr\Downloads\Playground\DeepFilterNet>
Hi, thanks for this work! I have some questions about the calculation of STOI. How can we use evaluation_utils.py? Could you explain it with an example or add some explanation to the README?
When I try to get results, I always see the exception 'x and y should have the same length'. I am using your pretrained DeepFilterNet2 model and the Valentini test dataset. The clean and enhanced tensor lengths are close but not identical.
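A hedged workaround sketch for the 'x and y should have the same length' exception (names here are illustrative, not from evaluation_utils.py): STOI-style metrics require equal-length signals, and the enhanced output can differ from the clean reference by a few samples due to STFT framing, so trim both to the common length first.

```python
def match_lengths(clean, enhanced):
    # Trim both signals to the shorter one before computing STOI/PESQ.
    n = min(len(clean), len(enhanced))
    return clean[:n], enhanced[:n]

clean, enhanced = [0.0] * 48000, [0.0] * 47872
c, e = match_lengths(clean, enhanced)
print(len(c), len(e))  # 47872 47872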
Hi,
I found that even if I remove a file such as loss.py from the project, train.py still runs.
How is this possible? Maybe I am overlooking something.
Is it possible to edit loss.py? Currently, when I edit loss.py, I can't see any changes take effect. Even if I move loss.py out of the df folder, train.py does not throw any error.
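A hedged diagnostic for the question above: the likely explanation is that Python imports a pip-installed copy of the df package rather than the repo checkout, so edits to the repo's loss.py never run. This locates the file that would actually be loaded (shown with a stdlib package so it runs anywhere; substitute "df.loss" in your environment):

```python
import importlib.util

spec = importlib.util.find_spec("json")  # replace "json" with "df.loss" in your env
print(spec.origin)                       # the file Python would actually execute
```

If the printed path points into site-packages, an editable install of the repo (something like pip install -e) is the usual way to make local edits take effect.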
Hi, thanks for this work. I have some questions about DNSMOS.
I tested the raw blind test set and the DeepFilterNet2 results with the DNSMOS tool (dnsmos.py from https://github.com/microsoft/DNS-Challenge). Both results differ from your paper. What preprocessing did you apply to the blind test set?
Hi, thanks for your amazing work! I have a question about the WAV encoding: which encodings (signed 16, signed 32, float 32) do the training and test scripts support? Thank you!
Hello, thanks for open-sourcing DeepFilterNet. After trying it out, I found both the quality and the computational cost to be excellent.
After studying the network carefully, I confirmed that it meets the requirements of streaming speech processing. Therefore, after calculating the padding sizes, I changed the forward inference part of the model into a streaming implementation (frame-by-frame for loop).
```python
class Encoder(nn.Module):
def __init__(self):
super().__init__()
p = ModelParams()
layer_width = p.conv_ch
wf = p.conv_width_f
assert p.nb_erb % 4 == 0, "erb_bins should be divisible by 4"
k = p.conv_k_enc
kwargs = {"batch_norm": True, "depthwise": p.conv_depthwise}
k0 = 1 if k == 1 and p.conv_lookahead == 0 else max(2, k)
cl = 1 if p.conv_lookahead > 0 else 0
self.erb_conv0 = convkxf(1, layer_width, k=k0, fstride=1, lookahead=cl, **kwargs)
cl = 1 if p.conv_lookahead > 1 else 0
self.erb_conv1 = convkxf(
layer_width * wf**0, layer_width * wf**1, k=k, lookahead=cl, **kwargs
)
cl = 1 if p.conv_lookahead > 2 else 0
self.erb_conv2 = convkxf(
layer_width * wf**1, layer_width * wf**2, k=k, lookahead=cl, **kwargs
)
self.erb_conv3 = convkxf(
layer_width * wf**2, layer_width * wf**2, k=k, fstride=1, **kwargs
)
self.df_conv0 = convkxf(
2, layer_width, fstride=1, k=k0, lookahead=p.conv_lookahead, **kwargs
)
self.df_conv1 = convkxf(layer_width, layer_width * wf**1, k=k, **kwargs)
self.erb_bins = p.nb_erb
self.emb_dim = layer_width * p.nb_erb // 4 * wf**2
self.df_fc_emb = GroupedLinear(
layer_width * p.nb_df // 2, self.emb_dim, groups=p.lin_groups
)
self.emb_out_dim = p.emb_hidden_dim
self.emb_n_layers = p.emb_num_layers
self.gru_groups = p.gru_groups
self.emb_gru = GroupedGRU(
self.emb_dim,
self.emb_out_dim,
num_layers=p.emb_num_layers,
batch_first=False,
groups=p.gru_groups,
shuffle=p.group_shuffle,
add_outputs=True,
)
self.lsnr_fc = nn.Sequential(nn.Linear(self.emb_out_dim, 1), nn.Sigmoid())
self.lsnr_scale = p.lsnr_max - p.lsnr_min
self.lsnr_offset = p.lsnr_min
self.streaming_state = {
'e1': None,
'e2': None,
'c0': None,
}
def forward(
self, feat_erb: Tensor, feat_spec: Tensor
) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
# Encodes erb; erb should be in dB scale + normalized; Fe are number of erb bands.
#streaming implementation
B, C, T, Ferb = feat_erb.shape
erb_padding_right = torch.zeros((B, C, 2, Ferb), dtype=feat_erb.dtype, device=feat_erb.device)
feat_erb = torch.cat([feat_erb, erb_padding_right], dim=-2)
B, _, T, Fspec = feat_spec.shape
spec_padding_right = torch.zeros((B, 2, 2, Fspec), dtype=feat_spec.dtype, device=feat_spec.device)
feat_spec = torch.cat([feat_spec, spec_padding_right], dim=-2)
e0_s, e1_s, e2_s, e3_s = None, None, None, None
emb_s, c0_s, c1_s, lsnr_s = None, None, None, None
self.streaming_state['e1'] = torch.zeros((B, 64, 1, Ferb // 2), dtype=feat_erb.dtype, device=feat_erb.device)
self.streaming_state['e2'] = torch.zeros((B, 64, 1, Ferb // 4), dtype=feat_erb.dtype, device=feat_erb.device)
self.streaming_state['c0'] = torch.zeros((B, 64, 1, Fspec), dtype=feat_spec.dtype, device=feat_spec.device)
for t in range(T):
sub_e0 = self.erb_conv0[1:](feat_erb[:,:,t:t+3,:]) # [B, C, 2, F]
sub_e1 = self.erb_conv1[1:](sub_e0) # [B, C*2, 1, F/2]
sub_e2 = self.erb_conv2[1:](torch.cat([self.streaming_state['e1'],sub_e1], dim=-2)) # [B, C*4, 1, F/4]
sub_e3 = self.erb_conv3[1:](torch.cat([self.streaming_state['e2'],sub_e2], dim=-2)) # [B, C*4, 1, F/4]
self.streaming_state['e1'] = sub_e1
self.streaming_state['e2'] = sub_e2
sub_c0 = self.df_conv0[1:](feat_spec[:,:,t+1:t+3,:])# [B, C, 1, Fc]
sub_c1 = self.df_conv1[1:](torch.cat([self.streaming_state['c0'],sub_c0], dim=-2)) # [B, C*2, 1, Fc]
self.streaming_state['c0'] = sub_c0
sub_cemb = sub_c1.permute(2, 0, 1, 3).reshape(1, B, -1) # [1, B, C * Fc/4]
sub_cemb = self.df_fc_emb(sub_cemb) # [1, B, C * F/4]
sub_emb = sub_e3.permute(2, 0, 1, 3).reshape(1, B, -1) # [1, B, C * F/4]
sub_emb = sub_emb + sub_cemb
sub_emb, _ = self.emb_gru(sub_emb)
sub_emb = sub_emb.transpose(0, 1) # [B, 1, C * F/4]
sub_lsnr = self.lsnr_fc(sub_emb) * self.lsnr_scale + self.lsnr_offset
if t == 0:
e0_s, e1_s, e2_s, e3_s = sub_e0[:, :, [0], :], sub_e1, sub_e2, sub_e3
c0_s, c1_s, emb_s, lsnr_s = sub_c0, sub_c1, sub_emb, sub_lsnr
else:
e0_s = torch.cat((e0_s, sub_e0[:, :, [0], :]), dim=-2)
e1_s = torch.cat((e1_s, sub_e1), dim=-2)
e2_s = torch.cat((e2_s, sub_e2), dim=-2)
e3_s = torch.cat((e3_s, sub_e3), dim=-2)
c0_s = torch.cat((c0_s, sub_c0), dim=-2)
c1_s = torch.cat((c1_s, sub_c1), dim=-2)
emb_s = torch.cat((emb_s, sub_emb), dim=-2)
lsnr_s = torch.cat((lsnr_s, sub_lsnr), dim=-2)
return e0_s, e1_s, e2_s, e3_s, emb_s, c0_s, lsnr_s
class ErbDecoder(nn.Module):
def init(self):
super().init()
p = ModelParams()
layer_width = p.conv_ch
wf = p.conv_width_f
assert p.nb_erb % 8 == 0, "erb_bins should be divisible by 8"
self.emb_width = layer_width * wf**2
self.emb_dim = self.emb_width * (p.nb_erb // 4)
self.fc_emb = nn.Sequential(
GroupedLinear(
p.emb_hidden_dim, self.emb_dim, groups=p.lin_groups, shuffle=p.group_shuffle
),
nn.ReLU(inplace=True),
)
k = p.conv_k_dec
kwargs = {"k": k, "batch_norm": True, "depthwise": p.conv_depthwise}
tkwargs = {
"k": k,
"batch_norm": True,
"depthwise": p.convt_depthwise,
"mode": p.conv_dec_mode,
}
pkwargs = {"k": 1, "f": 1, "batch_norm": True}
# convt: TransposedConvolution, convp: Pathway (encoder to decoder) convolutions
self.conv3p = convkxf(layer_width * wf**2, self.emb_width, **pkwargs)
self.convt3 = convkxf(self.emb_width, layer_width * wf**2, fstride=1, **kwargs)
self.conv2p = convkxf(layer_width * wf**2, layer_width * wf**2, **pkwargs)
self.convt2 = convkxf(layer_width * wf**2, layer_width * wf**1, **tkwargs)
self.conv1p = convkxf(layer_width * wf**1, layer_width * wf**1, **pkwargs)
self.convt1 = convkxf(layer_width * wf**1, layer_width * wf**0, **tkwargs)
self.conv0p = convkxf(layer_width, layer_width, **pkwargs)
self.conv0_out = convkxf(layer_width, 1, fstride=1, k=k, act=nn.Sigmoid())
self.streaming_state = {
'convt3in': None,
'convt2in': None,
'convt1in': None,
'conv0in': None
}
def forward(self, emb, e3, e2, e1, e0) -> Tensor:
# Estimates erb mask
#streaming implementation
B, C, T, F8 = e3.shape
data_type, device = e3.dtype, e3.device
self.streaming_state['convt3in'] = torch.zeros((B, C, 1, F8), dtype=data_type, device=device)
self.streaming_state['convt2in'] = torch.zeros((B, C, 1, F8), dtype=data_type, device=device)
self.streaming_state['convt1in'] = torch.zeros((B, C, 1, F8*2), dtype=data_type, device=device)
self.streaming_state['conv0in'] = torch.zeros((B, C, 1, F8*4), dtype=data_type, device=device)
m = None
for t in range(T):
sub_emb = self.fc_emb(emb[:, [t], :])
sub_emb = sub_emb.view(B, 1, -1, F8).transpose(1, 2) # [B, C*8, T, F/8]
convt3_in_cur = self.conv3p(e3[:, :, [t], :]) + sub_emb
convt3_in = torch.cat([self.streaming_state['convt3in'], convt3_in_cur], dim=-2)
self.streaming_state['convt3in'] = convt3_in_cur
sub_e3 = self.convt3[1:](convt3_in) # [B, C*4, T, F/4]
convt2_in_cur = self.conv2p(e2[:, :, [t], :]) + sub_e3
convt2_in = torch.cat([self.streaming_state['convt2in'], convt2_in_cur], dim=-2)
self.streaming_state['convt2in'] = convt2_in_cur
sub_e2 = self.convt2[1:](convt2_in) # [B, C*2, T, F/2]
convt1_in_cur = self.conv1p(e1[:, :, [t], :]) + sub_e2
convt1_in = torch.cat([self.streaming_state['convt1in'], convt1_in_cur], dim=-2)
self.streaming_state['convt1in'] = convt1_in_cur
sub_e1 = self.convt1[1:](convt1_in) # [B, C, T, F]
conv0_in_cur = self.conv0p(e0[:, :, [t], :]) + sub_e1
conv0_in = torch.cat([self.streaming_state['conv0in'], conv0_in_cur], dim=-2)
self.streaming_state['conv0in'] = conv0_in_cur
sub_m = self.conv0_out[1:](conv0_in) # [B, 1, T, F]
if t == 0:
m = sub_m
else:
m = torch.cat((m, sub_m), dim=-2)
return m
class DfNet(nn.Module):
def __init__(
self,
erb_inv_fb: Tensor,
run_df: bool = True,
train_mask: bool = True,
):
super().__init__()
p = ModelParams()
layer_width = p.conv_ch
assert p.nb_erb % 8 == 0, "erb_bins should be divisible by 8"
self.freq_bins = p.fft_size // 2 + 1
self.emb_dim = layer_width * p.nb_erb
self.erb_bins = p.nb_erb
self.enc = Encoder()
self.erb_dec = ErbDecoder()
self.mask = Mask(erb_inv_fb, post_filter=p.mask_pf)
self.df_order = p.df_order
self.df_bins = p.nb_df
self.df_lookahead = p.df_lookahead
self.df_dec = DfDecoder()
self.df_op = torch.jit.script(
DfOp(
p.nb_df,
p.df_order,
p.df_lookahead,
freq_bins=self.freq_bins,
method=p.dfop_method,
)
)
self.run_df = run_df
if not run_df:
from loguru import logger
logger.warning("Runing without DF")
self.train_mask = train_mask
def forward(
self,
spec: Tensor,
feat_erb: Tensor,
feat_spec: Tensor, # Not used, take spec modified by mask instead
atten_lim: Optional[Tensor] = None,
) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
feat_spec = feat_spec.transpose(1, 4).squeeze(4) # re/im into channel axis
e0, e1, e2, e3, emb, c0, lsnr = self.enc(feat_erb, feat_spec)
m = self.erb_dec(emb, e3, e2, e1, e0)
spec = self.mask(spec, m, atten_lim)
self.run_df = False
if self.run_df:
df_coefs, df_alpha = self.df_dec(emb, c0)
spec = self.df_op(spec, df_coefs, df_alpha)
else:
df_alpha = torch.zeros(spec.shape[0], spec.shape[2], 1, device=spec.device)
return spec, m, lsnr, df_alpha
```
I only use the Encoder and ErbDecoder modules. However, my result was not as good. Later, I found that the difference is due to nn.GRU inference: batched and one-by-one inference of nn.GRU give different results because of numerical accuracy. https://pytorch.org/docs/stable/notes/numerical_accuracy.html
E.g. reported in #31
Maybe this could be improved by increasing the eps in angle_backward?
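The batched-vs-stepwise nn.GRU discrepancy described above can be reproduced in isolation (sizes here are illustrative); the two outputs agree only up to floating-point accumulation order.

```python
import torch

torch.manual_seed(0)
gru = torch.nn.GRU(16, 32)                 # input_size, hidden_size; [T, B, F] layout
x = torch.randn(50, 1, 16)

y_full, _ = gru(x)                         # whole sequence in one call

h = None
chunks = []
for t in range(x.shape[0]):                # one frame at a time, carrying the state
    y_t, h = gru(x[t:t + 1], h)
    chunks.append(y_t)
y_step = torch.cat(chunks, dim=0)

print((y_full - y_step).abs().max().item())  # small, backend-dependent difference
```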
Hi,
In the current framework, it seems the speech, noise, and RIR paths have to be provided as lists to create the HDF5 sets for training. I have a few questions on this.
Hi,
Is there any option to fine-tune the pre-trained models ?
Thanks for your time.
Hi Hendrik,
Just curious, as I didn't see any benchmarks: can it process audio in chunks as it goes, or does it need the whole file to analyse?
How does it compare to https://github.com/breizhn/DTLN?
Thanks
Stuart
DeepFilterNet/DeepFilterNet/df/stoi.py
Line 95 in 1819b97
Regarding the way the iSTFT is calculated by transposed convolution: I think the window should be squared before it is used.
My comparison is as follows (istft).
The reference I checked is the same as librosa's istft.
When I run the command
maturin build --release -m DeepFilterNet/Cargo.toml
I am getting the following error.
🔗 Found pyo3 bindings
🐍 Found CPython 3.6m at python3.6, CPython 3.7m at python3.7
Compiling df v0.1.0 (/content/DeepFilterNet/libDF)
error[E0277]: `[u32; 5]` is not an iterator
  --> libDF/src/transforms.rs:449:42
    |
449 | let fft_size = primes.iter().zip(factors).fold(1, |acc, (p, f)| acc * p.pow(f));
    |                                  ^^^^^^^ expected an implementor of trait `IntoIterator`
    |                                          help: consider borrowing here: `&factors`
    |
    = note: the trait bound `[u32; 5]: IntoIterator` is not satisfied
    = note: required because of the requirements on the impl of `IntoIterator` for `[u32; 5]`
error[E0599]: the method `fold` exists for struct `std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>`, but its trait bounds were not satisfied
  --> libDF/src/transforms.rs:449:51
    |
449 | let fft_size = primes.iter().zip(factors).fold(1, |acc, (p, f)| acc * p.pow(f));
    |                                           ^^^^ method cannot be called on `std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>` due to unsatisfied trait bounds
    |
    = note: the following trait bounds were not satisfied:
            `[u32; 5]: Iterator`
            which is required by `std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator`
            `std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator`
            which is required by `&mut std::iter::Zip<std::slice::Iter<'_, usize>, [u32; 5]>: Iterator`
error: aborting due to 2 previous errors
Some errors have detailed explanations: E0277, E0599.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `df`
To learn more, run the command again with --verbose.
💥 maturin failed
Caused by: Failed to build a native library through cargo
Caused by: Cargo build finished with "exit status: 101": cargo rustc --message-format json --manifest-path DeepFilterNet/Cargo.toml --release --lib -- -C link-arg=-s
It would be great if you could help.
Thanks
Hi Rikorose,
Sorry to bother you again,
I have transferred the code to Keras and am trying to run inference in a real-time fashion.
I found a few situations I want to ask about:
In enhance.py, the flow looks like it feeds the entire signal into the model, i.e. offline inference?
I tried real-time inference with the following flow:
When I change the buffer length used for inference, the result is good while the length is 100 or 300 frames.
But when I change the buffer length to 10 frames, it sounds bad.
For real-time inference, what is the minimum buffer length?
Thanks,
When I use the following command to install libdfdata, it fails:
maturin develop --release -m pyDF-data/Cargo.toml
error: failed to run custom build command for hdf5-sys v0.8.1
Caused by:
process didn't exit successfully: E:\data\deeplearning\pytorch\DeepFilterNet\target\release\build\hdf5-sys-8ffb164969e6e670\build-script-build
(exit code: 101)
--- stdout
Searching for installed HDF5 (any version)...
Found no HDF5 installations.
--- stderr
thread 'main' panicked at 'Unable to locate HDF5 root directory and/or headers.', C:\Users\tangzixing\.cargo\registry\src\github.com-1ecc6299db9ec823\hdf5-sys-0.8.1\build.rs:548:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: build failed
💥 maturin failed
Caused by: Failed to build a native library through cargo
Caused by: Cargo build finished with "exit code: 101": cargo rustc --message-format json --manifest-path pyDF-data/Cargo.toml --release --lib -- -C link-arg=-s
Hi,
Is it possible to provide a config.ini that reproduces the results in the paper?
Is the pre-trained model's config.ini the same as the one used to obtain the results in the paper?
Hello,
I am trying to do my own data augmentation,
and I found that my dataset always produces NaN in the 3 loss functions.
I noticed the paper says the data has been exponentially mean/unit normalized; I think maybe this causes the NaN issue for me. Could you give some details?
Thanks,
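A hedged sketch of the exponential unit normalization mentioned in the question above (alpha and the init value are my assumptions, not the project's exact constants). Note that the running state must be initialized strictly positive, or the division produces inf/NaN, which is one plausible source of NaN losses with custom data.

```python
import math

def band_unit_norm(frames, state, alpha=0.99):
    # frames: per-frame lists of spectral magnitudes; state: running mean of |x|.
    out = []
    for frame in frames:
        state = [abs(x) * (1 - alpha) + s * alpha for x, s in zip(frame, state)]
        out.append([x / math.sqrt(s) for x, s in zip(frame, state)])
    return out, state

# A nonzero initial state avoids division by zero on the first frame.
normed, _ = band_unit_norm([[1.0, 2.0], [0.5, 0.1]], state=[1e-3, 1e-3])
```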
Hi Rikorose,
Thanks for working on version 2 of DeepFilterNet.
Now I can run the real-time inference process with buffer size = 1, with the same result as processing the full signal.
The key point is that the state of the RNN needs to be carried over.
Now I'm having trouble with typing/keyboard noise not being suppressed well.
I currently only use the spectral loss with c=0.3 as in DeepFilterNet2; would the multi-resolution loss improve this case?
Or is c=0.6 from the previous work perhaps better?
Thanks,
Aaron
Thanks for your awesome work!
I am confused about the pad_feat/pad_spec and df_op functions, so I opened this issue to check my understanding.
First, I tried to test your trained model, and in the class DfNet() in deepfilternet2.py:
self.pad_feat = nn.ConstantPad2d((0, 0, -p.conv_lookahead, p.conv_lookahead), 0.0)
self.pad_spec = nn.ConstantPad3d((0, 0, 0, 0, -p.df_lookahead, p.df_lookahead), 0.0)
self.pad_out = nn.Identity()
and for lines 430-432 and 444-445:
feat_erb = self.pad_feat(feat_erb)
feat_spec = self.pad_feat(feat_spec)
e0, e1, e2, e3, emb, c0, lsnr = self.enc(feat_erb, feat_spec)
spec_f = self.pad_spec(spec)
spec_f = self.df_op(spec_f, df_coefs)
My questions are:
a. In nn.ConstantPad2d/3d, the negative -p.df_lookahead (= -2) removes data, so are 2 frames of information missing during training?
b. Is self.df_op causal or non-causal? For example, is the first frame computed using 0,0,0,0 and 3 real frames?
Thanks!
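My own reading of question (a), as a sketch: negative padding in nn.ConstantPad2d crops frames rather than adding them, so a time-axis padding of (-l, l) drops the first l frames and appends l zero frames, shifting the stream to realize the look-ahead without changing the frame count. A numpy equivalent (the helper name is mine):

```python
import numpy as np

def shift_lookahead(x, l):
    """numpy equivalent of nn.ConstantPad2d((0, 0, -l, l), 0.0) applied
    to a (time, freq) array: drop the first l frames, append l zero
    frames. The total frame count is unchanged; the stream is shifted so
    that feature frame t lines up with input frame t + l."""
    zeros = np.zeros((l,) + x.shape[1:], dtype=x.dtype)
    return np.concatenate([x[l:], zeros])

x = np.arange(12.0).reshape(6, 2)   # 6 time frames, 2 freq bins
y = shift_lookahead(x, 2)
print(y[:, 0])                      # [ 4.  6.  8. 10.  0.  0.]
```

So, on this reading, no frames are discarded overall; they are realigned against the spectrogram to provide the look-ahead, and the last l output frames see zero-padded future context.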
Hi, I got Colab Pro in order to run this with more RAM, which I thought would be extra GPU memory, but it turns out to be normal system RAM. Is it possible to run DeepFilterNet on a normal CPU + RAM combination instead of GPU memory? I ask because I'd like to run it on longer files (an hour long).
lsnr = self.lsnr_fc(emb) * self.lsnr_scale + self.lsnr_offset
I found that this variable is calculated in the code but not used for the loss or anywhere else, so what is the purpose of this target?
Hi, when I run the train, the code failed:
RuntimeError: DF dataset error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset /dockerdata/thujunchen/cjcode/ft_local/DeepFilterNet/DNS16kdataset/VALID_SET_SPEECH.hdf5" }
There is no error reported at df/prepare_data.py.
I ran cargo test, which reports:
running 24 tests
test reexport_dataset_modules::util::test_find_max_abs ... ok
test tests::test_erb_inout ... ok
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_07 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_10 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_vorbis ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_05 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_01 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_pcm ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_02 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_04 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_read_flac ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06 - should panic ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08 ... FAILED
test reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03 ... FAILED
test reexport_dataset_modules::dataloader::tests::test_fft_dataset ... FAILED
test reexport_dataset_modules::dataset::tests::test_cached_valid_dataset ... FAILED
test reexport_dataset_modules::augmentations::tests::test_filters ... ok
test reexport_dataset_modules::augmentations::tests::test_gen_noise ... ok
test reexport_dataset_modules::augmentations::tests::test_clipping ... ok
test reexport_dataset_modules::augmentations::tests::test_rand_resample ... ok
test reexport_dataset_modules::augmentations::tests::test_low_pass ... ok
test reexport_dataset_modules::dataset::tests::test_mix_audio_signal ... ok
test reexport_dataset_modules::augmentations::tests::test_reverb ... ok
failures:
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06' panicked at 'called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_flac.hdf5" }', libDF/src/dataset.rs:1956:41
note: panic did not contain expected string
      panic message: "called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: \"Error during File::open of dataset ../assets/noise_flac.hdf5\" }",
 expected substring: "Slice end"
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09' panicked at 'called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_flac.hdf5" }', libDF/src/dataset.rs:1956:41
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08' panicked at 'called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise.hdf5" }', libDF/src/dataset.rs:1956:41
---- reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03 stdout ----
-------------- TEST START --------------
thread 'reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03' panicked at 'called Result::unwrap() on an Err value: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/noise_vorbis.hdf5" }', libDF/src/dataset.rs:1956:41
---- reexport_dataset_modules::dataloader::tests::test_fft_dataset stdout ----
******** Start test_data_loader() ********
Error: DatasetError(Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/clean.hdf5" })
thread 'reexport_dataset_modules::dataloader::tests::test_fft_dataset' panicked at 'assertion failed: (left == right)
  left: 1, right: 0: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/test/src/lib.rs:186:5
---- reexport_dataset_modules::dataset::tests::test_cached_valid_dataset stdout ----
Error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../assets/clean.hdf5" }
thread 'reexport_dataset_modules::dataset::tests::test_cached_valid_dataset' panicked at 'assertion failed: (left == right)
  left: 1, right: 0: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/test/src/lib.rs:186:5
failures:
reexport_dataset_modules::dataloader::tests::test_fft_dataset
reexport_dataset_modules::dataset::tests::test_cached_valid_dataset
reexport_dataset_modules::dataset::tests::test_hdf5_read_flac
reexport_dataset_modules::dataset::tests::test_hdf5_read_pcm
reexport_dataset_modules::dataset::tests::test_hdf5_read_vorbis
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_01
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_02
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_03
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_04
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_05
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_06
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_07
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_08
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_09
reexport_dataset_modules::dataset::tests::test_hdf5_slice::case_10
test result: FAILED. 9 passed; 15 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.67s
error: test failed, to rerun pass '-p deep_filter --lib'
I tried updating HDF5 to 1.10.1 as suggested in https://stackoverflow.com/questions/49386121/python-h5py-file-read-oserror-unable-to-open-file-bad-superblock-version-numb, but that did not work either.
Hi @Rikorose
Hello, thanks for open-sourcing DeepFilterNet. After installing from PyPI and trying it out, I found both the enhancement quality and the computational cost to be excellent.
So I want to study the network carefully via a manual installation on Win10.
First, I set up a conda env and installed Rust and Cargo: rustc 1.61.0 (fe5b13d68 2022-05-18).
Second, in a Python 3.9 conda env, I followed your README:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install maturin poetry
maturin develop --release -m pyDF/Cargo.toml
maturin develop --release -m pyDF-data/Cargo.toml
These commands completed correctly; the key packages DeepFilterDataLoader and DeepFilterLib installed successfully:
asttokens 2.0.5
CacheControl 0.12.11
cachy 0.3.0
certifi 2022.5.18.1
charset-normalizer 2.0.12
cleo 0.8.1
clikit 0.6.2
colorama 0.4.5
crashtest 0.3.1
DeepFilterDataLoader 0.2.5rc0
DeepFilterLib 0.2.5rc0
distlib 0.3.4
executing 0.8.3
filelock 3.7.1
html5lib 1.1
icecream 2.1.2
idna 3.3
importlib-metadata 4.11.4
keyring 23.6.0
lockfile 0.12.2
loguru 0.6.0
maturin 0.12.20
msgpack 1.0.4
numpy 1.22.4
packaging 20.9
pastel 0.2.1
pexpect 4.8.0
pip 21.2.4
pkginfo 1.8.3
platformdirs 2.5.2
poetry 1.1.13
poetry-core 1.0.8
ptflops 0.6.9
ptyprocess 0.7.0
Pygments 2.12.0
pylev 1.4.0
pyparsing 3.0.9
pywin32-ctypes 0.2.0
requests 2.28.0
requests-toolbelt 0.9.1
setuptools 61.2.0
shellingham 1.4.0
six 1.16.0
tomli 2.0.1
tomlkit 0.11.0
torch 1.11.0+cpu
torchaudio 0.11.0+cpu
typing_extensions 4.2.0
urllib3 1.26.9
virtualenv 20.14.1
webencodings 0.5.1
wheel 0.37.1
win32-setctime 1.1.0
wincertstore 0.2
zipp 3.8.0
Third, poetry install -E train -E eval
or poetry install -E train -E eval --no-root:
both of these commands block and never return a result, even after an hour or more.
(DeepFilterNet) E:\code\DeepFilterNet\DeepFilterNet>poetry install -E train -E eval --no-root
Updating dependencies
Resolving dependencies...
How can I fix this, and how can I debug where it is blocking?
Thanks!
Hello,
I was trying to test DeepFilterNet on Windows. I don't know much about the technical aspects or Python in general, though, so I am getting errors whose cause I don't understand.
I ran these commands from the readme:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install deepfilternet
(So far, no errors)
deepFilter test.wav
(test.wav is just a placeholder for the path to a real file.)
I tried both Python 3.10 (latest) and 3.7, as suggested by the DeepFilterLib page on PyPI. In both cases I get errors that no audio backend is available and that libdf could not be found.
Any advice would be greatly appreciated. Thanks in advance!
Basically fix this TODO:
https://github.com/Rikorose/DeepFilterNet/blob/7f2120b/libDF/src/dataset.rs#L340
The assumption that the closures are submitted in order is not correct. The input drain is instead split into chunks corresponding to the number of workers. Thus, changing the number of workers changes the order in which samples occur in each batch.
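The effect can be illustrated with a small Python sketch (illustrative only; the real code is the Rust dataloader): splitting the input into contiguous per-worker chunks and assembling output from one element per worker at a time reorders the samples whenever the worker count changes:

```python
def chunked_order(items, workers):
    """Split items into `workers` contiguous chunks (as a chunked work
    queue would) and emit one element per worker per step, mimicking
    batch assembly from per-worker outputs."""
    n = len(items)
    size = -(-n // workers)  # ceiling division: chunk length
    chunks = [items[i:i + size] for i in range(0, n, size)]
    out = []
    for step in range(size):
        for c in chunks:
            if step < len(c):
                out.append(c[step])
    return out

samples = list(range(8))
print(chunked_order(samples, 2))  # [0, 4, 1, 5, 2, 6, 3, 7]
print(chunked_order(samples, 4))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

Every sample still appears exactly once per epoch; only the per-batch ordering depends on the worker count, which is exactly what makes runs non-reproducible across machines with different worker settings.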
Hi,
I am confused about the erb_rb function, so I am opening this issue to check my understanding.
In forward, the ERB-to-STFT mapping is done by spec_mask = erb_mask.matmul(erb_inv_fb),
but checking the code in librosa, the mel-to-STFT mapping is done by the mel_to_stft function:
# Construct a mel basis with dtype matching the input data
mel_basis = filters.mel(
sr=sr, n_fft=n_fft, n_mels=M.shape[-2], dtype=M.dtype, **kwargs
)
# Find the non-negative least squares solution, and apply
# the inverse exponent.
# We'll do the exponentiation in-place.
inverse = nnls(mel_basis, M)
return np.power(inverse, 1.0 / power, out=inverse)
My questions are:
a. Is the erb2stft process lossless? What about mel2stft and bark2stft?
b. Is the ERB feature better than the plain STFT feature in DeepFilterNet?
Thanks
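Regarding question (a), a toy numpy experiment suggests the answer: mapping STFT bins down to fewer bands and back through a pseudo-inverse (in the spirit of erb_inv_fb) is a least-squares projection, not a lossless inverse. The filterbank below is a made-up 8-bin/3-band example, not the real ERB matrix:

```python
import numpy as np

# A toy "filterbank" mapping 8 STFT bins down to 3 bands
# (rows: bands, cols: bins) -- a stand-in for an ERB/mel matrix.
fb = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
], dtype=float)
fb = fb / fb.sum(axis=1, keepdims=True)  # normalize each band

spec = np.arange(1.0, 9.0)          # a toy STFT magnitude frame
bands = fb @ spec                   # stft -> erb-like bands
inv_fb = np.linalg.pinv(fb)         # pseudo-inverse, cf. erb_inv_fb
recon = inv_fb @ bands              # bands -> stft

# 3 band values cannot recover 8 independent bins:
print(np.allclose(recon, spec))     # False
```

The reconstruction is exact only for spectra that already lie in the filterbank's row space (here: constant within each band), so erb2stft, mel2stft, and bark2stft are all lossy in general. That is usually acceptable for predicting a smooth gain/mask, which may be why the ERB representation still works well in DeepFilterNet.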
Following #31 and the README, I have prepared the speech and noise HDF5 files and the dataset.cfg file.
The speech and noise data are from the DNS challenge; the number of files is 50, and the batch size I set is 2.
When I train the net, errors occur as follows.
Can you give me some advice on fixing this error? Thanks.
Running deepFilter wav_name
Traceback (most recent call last):
File "deepFilter", line 8, in <module>
sys.exit(main())
File "env/lib/python3.8/site-packages/df/enhance.py", line 66, in main
p = ModelParams()
File "env/lib/python3.8/site-packages/df/model.py", line 15, in __init__
self.__params = getattr(import_module("df." + self.__model), "ModelParams")()
File "env/lib/python3.8/site-packages/df/deepfilternet.py", line 48, in __init__
self.group_shuffle: bool = config(
File "env/lib/python3.8/site-packages/df/config.py", line 114, in __call__
raise ValueError(f"Value '{option}' not found in config (defaults not allowed).")
ValueError: Value 'GROUP_SHUFFLE' not found in config (defaults not allowed).
I want to reproduce your work, but when I run training, loading the HDF5 files fails.
2022-03-01 14:40:46 | INFO | DF | Running on torch 1.10.0
2022-03-01 14:40:46 | INFO | DF | Running on host ultralab-server
2022-03-01 14:40:46 | INFO | DF | Git commit: 05da995, branch: main
2022-03-01 14:40:46 | INFO | DF | Running on device cuda:0
2022-03-01 14:40:46 | INFO | DF | Initializing model deepfilternet
2022-03-01 14:40:53 | WARNING | DF | Failed to print model summary: No module named 'ptflops'
2022-03-01 14:40:53 | INFO | DF | Running with normalization window alpha = '0.996'
2022-03-01 14:40:53 | INFO | DF | Initializing dataloader with data directory ../data/dns/
2022-03-01 14:40:53 | ERROR | DF | An error has been caught in function '', process 'MainProcess' (19629), thread 'MainThread' (140105826391808):
Traceback (most recent call last):
File "df/train.py", line 425, in
main()
└ <function main at 0x7f6cccc270d0>
File "df/train.py", line 103, in main
dataloader = DataLoader(
└ <class 'libdfdata.torch_dataloader.PytorchDataLoader'>
File "/home/tangzixing/data/deeplearning/program/audio/DeepFilterNet-0.1.10/pyDF-data/libdfdata/torch_dataloader.py", line 99, in init
self.loader = _FdDataLoader(
│ └ <class 'builtins._FdDataLoader'>
└ <libdfdata.torch_dataloader.PytorchDataLoader object at 0x7f6ccc9ae460>
RuntimeError: DF dataset error: Hdf5ErrorDetail { source: H5Fopen(): unable to open file: bad superblock version number, msg: "Error during File::open of dataset ../data/dns/val_speech.hdf5" }
The h5py version is 3.6.0, and h5py.version.hdf5_version is 1.2.1.
Which version of h5py do you use?
Hi,
I tried to re-train the DeepFilterNet model using the DNS-3 challenge dataset mentioned in your work.
I don't have the additional 10k IRs; the rest of the dataset is the same.
On the VCTK test set, using the config.ini from the pre-trained model as my training config, my best model on validation gives a PESQ score of 2.60, which is much lower than the 2.81 from the pre-trained model.
In config.ini, AdamW is used, while the paper mentions Adam as the optimizer.
Do you think any other factors would result in such a performance drop?
Could you clarify the 3 s training samples? If a DNS-3 clip is 10 s long, do I need to split it into 3 s segments to use the entire clip, or just take the first 3 seconds? Alternatively, is a random 3 s segment generated on the fly during training?
In the HDF5 setup, do speech/noise/RIR need the same number of samples, or are the noise and RIR sampled randomly from a list? For example, if the speech list has 1000 samples while the noise and RIR lists have 100 each, is that okay, or should all three have 1000? Do the speech and noise samples need to have the same duration?
What about the reverberation parameter p_reverb = 0.05: is this augmentation performed by default, or is other config needed? Also, conv_lookahead = 2 in config.ini, but the paper mentions a "look-ahead of l = 1 frame for both DF as well as in the DNN convolutions".
Hi Rikorose,
I'm in the process of porting your PyTorch code to TensorFlow/Keras and I'm running into some issues.
The loss factors are 0 for the mask loss, 1000 for the DF-alpha loss, and 20000 for the spectral loss.
But I think the DF-alpha loss is not multiplied by 1000 in your code; the total loss looks like df_alpha_loss + spectral_loss * 20000.
In my Keras training, the DF-alpha loss dropped from 0.085 to 0.06 in the first epoch and then stopped decreasing,
while the spectral loss keeps decreasing slowly. Is this expected?
I also tried an SI-SDR loss, but the result was not good either.
Another thing: comparing my processed file with your code's output, my .wav sounds as if deep filtering (stage 2) doesn't work, while your wav is clearly processed below 5 kHz.
By the way, in 'LocalSnrTarget' the 'ws' and 'ws_ns' windows are not the same, so is the local SNR computed with different numbers of frames for speech and noise (1 for speech, 3 for noise)?
And I think the LSNR layer in the encoder is not used to compute a loss?
Do you have any suggestions?
Thanks,
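For the 'ws' vs 'ws_ns' question, here is how I picture a per-frame local SNR with separate smoothing windows for speech and noise; this is an illustrative sketch of the idea, not the actual LocalSnrTarget code, and the helper names mirror the question rather than the repository:

```python
import numpy as np

def local_snr_db(speech, noise, ws=1, ws_ns=3, eps=1e-12):
    """Per-frame local SNR in dB from framed magnitudes.

    speech, noise: arrays of shape (frames, bins).
    ws / ws_ns: number of frames averaged for the speech / noise energy
    (mirroring the separate 'ws' and 'ws_ns' windows asked about above);
    a longer noise window gives a smoother, more stationary estimate.
    """
    def frame_energy(x, w):
        e = (x ** 2).sum(axis=-1)               # energy per frame
        kernel = np.ones(w) / w                 # moving-average window
        return np.convolve(e, kernel, mode="same")

    e_s = frame_energy(speech, ws)
    e_n = frame_energy(noise, ws_ns)
    return 10.0 * np.log10((e_s + eps) / (e_n + eps))
```

On this picture, using 1 frame for speech but 3 for noise just reflects that speech energy changes quickly while the noise floor is assumed to vary slowly, so the two estimates are smoothed differently before forming the ratio.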
Hi guys,
I have followed the installation instructions using pip, and now I'm stuck on a SLURM-related error that I don't know how to fix.
I'm executing deepfilternet from colab pro+ account.
Here is the issue.
Traceback (most recent call last):
File "/usr/local/bin/deepFilter", line 5, in
from df.enhance import run
File "/usr/local/lib/python3.7/dist-packages/df/enhance.py", line 18, in
from df.logger import init_logger, warn_once
File "/usr/local/lib/python3.7/dist-packages/df/logger.py", line 49
if (jobid := os.getenv("SLURM_JOB_ID")) is not None:
^
SyntaxError: invalid syntax
This is what I get while trying to execute: !deepFilter /content/test_audio_053830.wav --output-dir /content
Has anyone run into this kind of issue?
Please let me know the solution / how I can run this.
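The traceback points at the walrus operator (:=) in df/logger.py, which requires Python 3.8 or newer; the Python 3.7 interpreter in that Colab environment fails to parse it at import time, producing exactly this SyntaxError. A minimal check to run before installing (the helper and message are my own, not part of DeepFilterNet):

```python
import sys

def check_python(version=sys.version_info):
    """df/logger.py uses the walrus operator (:=), added in Python 3.8,
    so older interpreters fail with a SyntaxError at import time."""
    return version >= (3, 8)

if not check_python():
    raise RuntimeError(
        "DeepFilterNet needs Python >= 3.8 for the ':=' operator; "
        "please upgrade the runtime's Python."
    )
```

On Colab this means selecting or installing a Python >= 3.8 runtime before `pip install deepfilternet`.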
I am working on the training setup. I got the error below in the run_epoch function in train.py.
ERROR:-
File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 479, in
main()
└ <function main at 0x7f74abd598b0>
File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 173, in main
train_loss = run_epoch(
└ <function run_epoch at 0x7f74abd665e0>
File "/DeepFilterNet-main/DeepFilterNet/df/train.py", line 265, in run_epoch
assert batch.feat_spec is not None
│ └ None
└ Batch of size 1:
length: 240000
snr: -5
gain: 0
AssertionError: assert batch.feat_spec is not None
python df/train.py ../assets/dataset.cfg ../assets/ df/new_config
Config file used
{
"test": [
[
"clean.hdf5",
100
],
[
"noise.hdf5",
10
]
],
"train": [
[
"clean.hdf5",
10000
],
[
"noise.hdf5",
10
]
],
"valid": [
[
"clean.hdf5",
100
],
[
"noise.hdf5",
10
]
]
}
I used above configuration file(dataset.cfg file available in Deepfilternet-main/assets/ directory).
It says that batch.feat_spec doesn't contain any information. Do we need to write batch.feat_spec and batch.feat_erb when creating the HDF5 file itself, or will the dataloader extract these features?
Is train.py using the predefined dataloader from torch.utils.data, or the one from C:\DFnet\DeepFilterNet-main\pyDF-data\libdfdata\torch_dataloader?
Can you help me resolve this error (AssertionError: assert batch.feat_spec is not None)? Hope to hear from you soon.
Thanks in advance.
Hi! First, really great work, and thanks for open-sourcing everything. But I have a few questions.
Thanks!
Hello,
When I try to run the df/train.py file, I get a ModuleNotFoundError: No module named 'libdfdata'. I understand that libdfdata lives in the pyDF-data folder.
Is it possible to fix this, or is some modification required on my side?
Alternatively, how can I avoid using the Rust library altogether? Say I want to do everything in Python: how can I get spec, erb_feat, and spec_feat in pure Python, without Rust?
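As a starting point, the features can be approximated in pure numpy. This is a simplified sketch with assumed frame sizes and an equal-width band grouping standing in for the real ERB filterbank, so it will not match the Rust output exactly:

```python
import numpy as np

def stft(audio, n_fft=960, hop=480):
    """Plain numpy STFT with a hann window -- a pure-Python stand-in for
    the Rust feature extraction. Frame/FFT sizes here are assumptions."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * win for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)          # (frames, n_fft//2 + 1)

def erb_feat(spec, n_bands=32):
    """Log-power band energies over equal-width groups of bins -- a
    simplified placeholder for the true ERB-spaced filterbank."""
    power = np.abs(spec) ** 2
    bands = np.array_split(power, n_bands, axis=-1)
    return np.log10(np.stack([b.mean(axis=-1) for b in bands], axis=-1) + 1e-10)

audio = np.random.default_rng(0).standard_normal(4800)
spec = stft(audio)                               # complex spectrogram ("spec")
feat = erb_feat(spec)                            # banded features ("erb_feat")
print(spec.shape, feat.shape)                    # (9, 481) (9, 32)
```

For spec_feat one would additionally normalize the complex spectrum (the repo's exponential unit normalization); for training against the released models you would still need the exact ERB widths and normalization from the Rust side.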