yongyug / rnnoise_16k

implementation of rnnoise_16k

License: BSD 3-Clause "New" or "Revised" License

CMake 0.01% Makefile 1.16% C 98.21% Shell 0.02% Python 0.60%

rnnoise_16k's Introduction

RNNoise training for 16K audio

Notification

This project is based on Dr. Jean-Marc Valin's work on RNNoise: Learning Noise Suppression.

References: Paper: A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement

Original GitHub Repo: RNNoise Original Project

How to use

This project was done a year ago, when I started working on noise suppression (NS), so the code is not well organized. If you have any questions, feel free to ask.

This code can accept a directory of wav files for training rather than a raw file.

Follow the CMakeLists.txt to compile the project. src/denoise.c contains the main modifications for going from 48k to 16k, and training/run.sh shows how to train on 16k audio.

You also need to check src/compile.sh, which compiles the src directory.

Note that I use src/denoise.c for feature extraction. src/denoise16.c is something I wrote for experiments.

If you want to use fewer or more frames for training, modify the count variable in the main function of src/denoise.c.

The whole process is:

cd src
bash compile.sh (generates the binary that creates the mixed features and labels; compile.sh builds denoise.c)
./src/denoise_training /data/speech_dir /data/noise_dir mixed.wav > training_16k_v3.f32 (mixed.wav is the raw file in which you can check whether the wavs have been mixed)
python bin2hdf5.py training_16k_v3.f32 80000000 75 training_16k_v3.h5
python rnn_train_16k.py
python dump_rnn.py weights.hdf5 rnn_data.c rnn_data.rnnn name
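The second argument of bin2hdf5.py (80000000 above) has to match the number of rows actually written to the .f32 file. A minimal sketch for checking that before conversion, assuming the file is a flat stream of 4-byte floats with 75 columns per row (count_rows is a hypothetical helper, not part of this repo):

```python
import os
import sys

FLOAT32_BYTES = 4  # assumption: features are written as raw 32-bit floats


def count_rows(path, n_columns=75):
    """Return (complete_rows, leftover_bytes) for a raw float32 feature file."""
    size = os.path.getsize(path)
    row_bytes = FLOAT32_BYTES * n_columns
    return divmod(size, row_bytes)


if __name__ == "__main__" and len(sys.argv) > 1:
    rows, leftover = count_rows(sys.argv[1])
    print("complete rows:", rows)
    if leftover:
        # A non-zero remainder means the file is not a whole number of rows,
        # e.g. a truncated write or extra bytes mixed into stdout.
        print("leftover bytes:", leftover)
```

If the leftover is zero, the printed row count is the value to pass to bin2hdf5.py. A size of 0 or a non-zero leftover suggests the extraction step did not finish cleanly.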

Replace with new trained model

If you follow the instructions and training/run.sh, a new rnn_data.c and rnn_data.h will be generated from your newly trained model. Replace the old rnn_data.c and rnn_data.h in the src directory with the new ones, then build using the CMakeLists.txt in the working directory:

cmake .
make

The binary will be generated in the bin directory. You can also change the name of the binary in CMakeLists.txt.

How to use the binary

Binary File <Input Noisy File> <Output Path>

e.g:

./bin/rnn_gao_new noisy.wav out.wav

rnnoise_16k's Issues

How to change the count?

Hello, I want to know how to change the count of 80000000. When I just wanted to try training the model, the feature extraction took a lot of time. I hope you can answer, thank you very much!
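For a rough sense of scale, the count can be converted into hours of audio. This is a back-of-the-envelope sketch that assumes each row of the feature file corresponds to one 160-sample frame at 16 kHz (10 ms); frames_to_hours is a hypothetical helper:

```python
def frames_to_hours(count, frame_size=160, sample_rate=16000):
    """Hours of audio covered by `count` frames of `frame_size` samples."""
    seconds = count * frame_size / sample_rate
    return seconds / 3600.0


if __name__ == "__main__":
    for count in (80_000_000, 500_000, 10_000):
        print(count, "frames is about", round(frames_to_hours(count), 2), "hours")
```

Under that assumption, 80000000 frames is roughly 222 hours of mixed audio, which is why a much smaller count is more practical for a first trial run.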

about training_16k_v3.f32

Hello, may I ask why the generated training_16k_v3.f32 file has a size of 0?

Features extraction

Hi guys,

I would like to know if anyone has faced issues with the first step, feature extraction with the denoise file. I'm trying to retrain the model with 16 kHz audio, but after running denoise.c I can't get the proper shape when I run bin2hdf5. I also checked the values in the training.f32 file and found very high values or NaN. I didn't modify the code except for the count, which I set to 500000. I used the Microsoft dataset available here: https://github.com/microsoft/MS-SNSD.

Maybe my datasets don't fit the code. May I ask which datasets you used to train your model? It would help me a lot @YongyuG
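One way to narrow this kind of problem down is to scan the feature file for non-finite values before converting it. The following is only a diagnostic sketch, assuming the file holds native-endian 32-bit floats with a known number of columns per row; scan_features is a hypothetical helper and not part of the repo:

```python
import math
from array import array


def scan_features(path, n_columns=75, chunk_rows=4096):
    """Return (rows, non_finite_count, max_abs_finite) for a raw float32 file."""
    row_bytes = 4 * n_columns  # assumption: 4-byte floats
    rows = 0
    non_finite = 0
    max_abs = 0.0
    with open(path, "rb") as f:
        while True:
            raw = f.read(row_bytes * chunk_rows)
            # Ignore a trailing partial row, if any.
            usable = len(raw) - len(raw) % row_bytes
            if usable == 0:
                break
            values = array("f")
            values.frombytes(raw[:usable])
            for v in values:
                if math.isfinite(v):
                    max_abs = max(max_abs, abs(v))
                else:
                    non_finite += 1
            rows += usable // row_bytes
    return rows, non_finite, max_abs
```

If NaN or inf values appear, or the row count differs from the count set in denoise.c, the problem lies in the extraction step rather than in bin2hdf5.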

Which speech and noise data was the existing model in the code trained on?

Is the performance of the 16K model better

Wondering if you achieved better results with the 16 kHz models compared to the original 48 kHz models?

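Questions about processing data with the denoise_training script

The speech data and noise data are already in the specified folders. I changed count in denoise.c to 10000, then ran compile and got the denoise_training file. When I use it to process the data, it reports an error:
count ==== 0
Floating point exception (core dumped)
The command I ran was:
./denoise_training /home/rnnoise_16k-master/src/media/clean_dir /home/rnnoise_16k-master/src/media/noise_dir 10000 > training_16k_v3.f32

However, when I use the denoise_training_gao binary you provided, no error is reported, but it also does not stop after the specified count.

When compiling denoise.c, the following message appears. Could a failed compilation be the reason the data-processing program is unusable?

Looking forward to your reply!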

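Can I use 128 as the frame size when using the model? If not, how can I do it?

Can I use a frame length of 128 when using the model? I see that the existing model uses a frame length of 160. If a frame length of 128 is not suitable when running the model, do I need to change the frame length used for network training and retrain?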
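One generic way to feed 128-sample blocks to a model that expects 160-sample frames, without retraining, is to re-buffer the input. This is a sketch of that common technique, not something the repo provides; FrameAdapter is a hypothetical name and the samples are plain Python lists for illustration:

```python
class FrameAdapter:
    """Collect incoming blocks of any size and emit fixed-size frames."""

    def __init__(self, frame_size=160):
        self.frame_size = frame_size
        self._pending = []

    @property
    def pending(self):
        """Number of samples waiting for the next full frame."""
        return len(self._pending)

    def push(self, samples):
        """Add a block of samples; return a list of complete frames."""
        self._pending.extend(samples)
        frames = []
        while len(self._pending) >= self.frame_size:
            frames.append(self._pending[:self.frame_size])
            del self._pending[:self.frame_size]
        return frames
```

Each returned frame would be passed to the denoiser, and the denoised output re-buffered the same way back into 128-sample blocks. The cost is some extra buffering latency, since a full 160-sample frame is only available after more than one 128-sample block has arrived.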

make error: "multiple definition of `main'"

When I run make with the CMakeLists.txt in the working directory, there are many "multiple definition" errors, because main.c and denoise.c both have a main function (and some other identical functions). How did you solve this error? Thanks.

Some questions about this 16k model

Thanks for sharing this 16k model. However, there are still some questions that confuse me.

(1) As far as I know, the pitch feature is important. The original pitch analysis code operates at 48 kHz. In your example, only the macros PITCH_MIN_PERIOD, PITCH_MAX_PERIOD and PITCH_FRAME_SIZE are revised. Have you tested whether the code extracts pitch correctly with only this small revision?

(2) Have you tested this model's performance with PESQ? With the original RNNoise code the PESQ is good, but with weights from my own training it is not very good.

Thanks very much.
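For context on question (1), the pitch macros are expressed in samples, so scaling them with the sample rate keeps the searched pitch range in Hz unchanged. The sketch below assumes the upstream 48 kHz values are 60, 768 and 960; the values actually used in this repo's denoise.c are not stated here, and the helper names are hypothetical:

```python
# Assumed upstream RNNoise values at 48 kHz (not taken from this repo).
UPSTREAM_48K = {
    "PITCH_MIN_PERIOD": 60,
    "PITCH_MAX_PERIOD": 768,
    "PITCH_FRAME_SIZE": 960,
}


def scale_pitch_constants(target_rate, source_rate=48000, source=UPSTREAM_48K):
    """Scale sample-domain pitch constants proportionally to the sample rate."""
    return {name: value * target_rate // source_rate
            for name, value in source.items()}


def pitch_range_hz(constants, sample_rate):
    """Lowest and highest pitch (Hz) covered by the period search range."""
    return (sample_rate / constants["PITCH_MAX_PERIOD"],
            sample_rate / constants["PITCH_MIN_PERIOD"])
```

With proportional scaling, 16 kHz gives periods of 20 to 256 samples, which still corresponds to 62.5 Hz to 800 Hz. Whether the rest of the pitch code behaves correctly at the lower rate is a separate question that this arithmetic does not answer.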

about the matrix size and count

I use denoise.c to extract features. I changed count to 100000, so it printed 100000*75 on the screen. However, when I run bin2hdf5.py, an error occurs: "cannot reshape array of size 7555881 into shape (100000,75)". Why? Looking forward to your reply, thank you.

source code for denoise_training_gao bin

Could you provide the source code for denoise_training_gao?
I want to retrain this model with new data.

got stuck during the training process

Training ran for about 20 epochs, but then got stuck and did not continue. What is the likely reason?

When I run cmake on the project, there is a problem

The second problem is:

If I want to train on my own data, run.sh needs a speech.wav and a noise.wav that I prepare. But do I also need to prepare the mixed.wav in run.sh, or is it generated automatically?

Floating point exception (core dumped)

1. src/compile.sh
2. ../src/denoise_training ./data/clean ./data/noise ./mixed.wav > training_16k_v3.f32
Then it reported a Floating point exception (core dumped) error. I did not modify the code.

Training with 8 kHz audio

Hi guys,

I would like to know what changes I need to make if I want to train a model with 8 kHz audio. I tried changing some parameters in denoise.c like this:

#define BLOCK_SIZE 8000

#define FRAME_SIZE (20<<FRAME_SIZE_SHIFT)
#define SAMPLE_RATE 8000


#define PITCH_MIN_PERIOD 10
#define PITCH_MAX_PERIOD 128
#define PITCH_FRAME_SIZE 160 

#if SMOOTH_BANDS
//#define NB_BANDS 22
#define NB_BANDS 14
#else
//#define NB_BANDS 21
#define NB_BANDS 13
#endif

After compiling, I got a matrix with shape 500000 x 63. The number of features (63) seems normal to me because there are fewer bands, and the training itself goes well: the loss is not high and the model seems to learn something. But in the end I get sizzling audio, and it doesn't seem to do any noise reduction.

Sorry for bothering you again @YongyuG. Can you give me some information about the code? How did you manage to convert it to 16 kHz, and in your opinion, is it possible to convert it to 8 kHz?
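The 63 columns reported above are consistent with the row layout used by upstream RNNoise training, where each row holds the input features, the band gains, the band noise levels and one VAD flag. A small sketch of that arithmetic, assuming the upstream layout and NB_DELTA_CEPS = 6 carry over unchanged (training_columns is a hypothetical helper):

```python
NB_DELTA_CEPS = 6  # assumption: same value as upstream RNNoise


def training_columns(nb_bands, nb_delta_ceps=NB_DELTA_CEPS):
    """Columns per training row: features + gains + noise levels + VAD."""
    nb_features = nb_bands + 3 * nb_delta_ceps + 2
    return nb_features + 2 * nb_bands + 1


if __name__ == "__main__":
    for bands in (22, 18, 14):
        print("NB_BANDS =", bands, "->", training_columns(bands), "columns")
```

With NB_BANDS 22 this gives 87 columns (the upstream 48 kHz case) and with NB_BANDS 14 it gives 63. A value of 18 would give the 75 columns used in this project's bin2hdf5.py call, though the NB_BANDS value in this repo's denoise.c is not stated in the text above. The band count also has to be matched in the training script's input and output dimensions.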

How to test noise reduction on audio after model training is complete

I want to use noisy audio files to test the noise reduction of our trained model. How should I do this? I didn't see a relevant description. In the 48k source code, the input file before noise reduction and the output file after noise reduction are passed to rnnoise_demo:

./examples/rnnoise_demo < input.raw > output.raw

but there is no rnnoise_demo file in the 16k code. I hope the author can answer. Thank you @YongyuG