
rnnoise_16k's Introduction

RNNoise training for 16K audio

Notification

This project builds on Dr. Jean-Marc Valin's work on RNNoise: Learning Noise Suppression.

References: Paper: A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement

Original Github Repo: RNNoise Original Project

How to use

I wrote this project about a year ago, when I had just started working on noise suppression (NS), so the code is not well organized. If you have any questions, feel free to ask.

This code accepts a directory of wav files for training rather than a single raw file.

Follow CMakeLists.txt to compile the project. src/denoise.c contains the main modifications for going from 48k to 16k, and training/run.sh shows how to train on 16k audio.

You also need to check src/compile.sh, which compiles the src directory.

Note that I use src/denoise.c for feature extraction. src/denoise16.c is something I wrote for experiments.

If you want to use fewer or more frames for training, modify the variable count in the main function of src/denoise.c.

The whole process is:

  • cd src
  • bash compile.sh, which builds the binary that creates the mixed features and labels (compile.sh uses denoise.c)
  • ./src/denoise_training /data/speech_dir /data/noise_dir mixed.wav > training_16k_v3.f32 (mixed.wav is a raw file you can use to check whether the wavs have been mixed)
  • python bin2hdf5.py training_16k_v3.f32 80000000 75 training_16k_v3.h5
  • python rnn_train_16k.py
  • python dump_rnn.py weights.hdf5 rnn_data.c rnn_data.rnnn name
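The first three commands above share two numbers: the row count (80000000) and the row width (75). As a minimal sketch, the helper below builds those commands from a single `rows` value so they stay consistent. The function names `pipeline_commands` and `as_shell` are hypothetical and not part of the repository, and `rows` still has to match the `count` compiled into src/denoise.c by hand.

```python
import shlex


def pipeline_commands(speech_dir, noise_dir, rows, width=75, tag="training_16k_v3"):
    """Build the README's commands as (argv, stdout_file) pairs.

    `rows` must equal the `count` compiled into src/denoise.c; nothing here
    changes that value, it only keeps the later steps consistent with it.
    """
    f32, h5 = f"{tag}.f32", f"{tag}.h5"
    return {
        # Feature extraction writes float32 rows to stdout, redirected to f32.
        "extract": (["./src/denoise_training", speech_dir, noise_dir, "mixed.wav"], f32),
        # bin2hdf5.py reshapes the flat file into rows x width.
        "convert": (["python", "bin2hdf5.py", f32, str(rows), str(width), h5], None),
        "train": (["python", "rnn_train_16k.py"], None),
    }


def as_shell(argv, stdout_file=None):
    """Render one command as a shell line, adding the redirect if any."""
    line = shlex.join(argv)
    return f"{line} > {shlex.quote(stdout_file)}" if stdout_file else line


for argv, out in pipeline_commands("/data/speech_dir", "/data/noise_dir", 80000000).values():
    print(as_shell(argv, out))
```

The dump_rnn.py step is left out because its arguments are taken verbatim from the list above and do not depend on the row count.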

Replace with new trained model

If you follow the instructions and training/run.sh, a new rnn_data.c and rnn_data.h will be generated from your newly trained model. Replace the old rnn_data.c and rnn_data.h in the src directory with the new ones, then build with the CMakeLists.txt in the working directory:

  • cmake .
  • make

The binary will be generated in the bin directory. You can also change the name of the binary in CMakeLists.txt.

How to use the binary file

Binary File <Input Noisy File> <Output Path>

e.g.:

./bin/rnn_gao_new noisy.wav out.wav
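Since the model is trained for 16K audio, it can help to confirm the format of a noisy file before passing it to the binary. The sketch below uses Python's standard `wave` module. The function names are hypothetical, and the requirement of mono 16-bit input is an assumption that the README does not state.

```python
import wave


def describe_wav(path):
    """Return the basic format of a PCM wav file."""
    with wave.open(path, "rb") as w:
        return {
            "rate": w.getframerate(),
            "channels": w.getnchannels(),
            "bits": 8 * w.getsampwidth(),
            "frames": w.getnframes(),
        }


def is_16k_mono_16bit(path):
    """True if the file is 16 kHz, mono, 16-bit (assumed input format)."""
    info = describe_wav(path)
    return info["rate"] == 16000 and info["channels"] == 1 and info["bits"] == 16
```

For example, `is_16k_mono_16bit("noisy.wav")` returning False would suggest resampling the file before running the command above.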

rnnoise_16k's People

Contributors

liuxs0, yongyug


rnnoise_16k's Issues

How to change the count?

Hello, I want to know how to change the count of 80000000. I just want to try training the model, but feature extraction takes a lot of time. I hope you can answer. Thank you very much!
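As a rough guide to why the default takes so long, the size of the feature file follows directly from the row count. The sketch below assumes each row holds 75 float32 values (4 bytes each), as implied by the `.f32` extension and the `75` passed to bin2hdf5.py. The helper name is hypothetical.

```python
def feature_file_bytes(rows, width=75, bytes_per_value=4):
    """Expected size of the extracted feature file in bytes."""
    return rows * width * bytes_per_value


# 80000000 is the README default; the smaller counts appear in other issues.
for rows in (80_000_000, 500_000, 10_000):
    print(f"{rows:>10} rows -> {feature_file_bytes(rows) / 1e9:.3f} GB")
```

With the default of 80000000 rows this comes to 24 GB, so a much smaller `count` in src/denoise.c (with the same number passed to bin2hdf5.py) is reasonable for a first trial run.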

Features extraction

Hi guys,

I would like to know whether anyone has had problems with the first step, feature extraction with the denoise binary. I'm trying to retrain the model on 16 kHz audio, but after running denoise.c I can't get the proper shape when I run bin2hdf5. I also checked the values in the training.f32 file and found very large values and NaN. I didn't modify the code except for count, which I set to 500000. I used the Microsoft dataset available here: https://github.com/microsoft/MS-SNSD.

Maybe my datasets don't fit the code. Could you tell me which datasets you used to train your model? It would help me a lot, @YongyuG.
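One way to narrow down a report like this is to scan the feature file for non-finite or unusually large values before converting it. The following is a minimal sketch using only the standard library. It assumes the file is a flat stream of native-endian float32 values, and the name `scan_f32` is hypothetical.

```python
import math
from array import array


def scan_f32(path, chunk_values=1 << 16):
    """Count NaN/inf entries and track the largest finite magnitude."""
    stats = {"values": 0, "nan": 0, "inf": 0, "max_abs": 0.0}
    with open(path, "rb") as f:
        while True:
            raw = f.read(4 * chunk_values)
            raw = raw[: len(raw) - len(raw) % 4]  # drop a trailing partial value
            if not raw:
                break
            buf = array("f")
            buf.frombytes(raw)
            for v in buf:
                stats["values"] += 1
                if math.isnan(v):
                    stats["nan"] += 1
                elif math.isinf(v):
                    stats["inf"] += 1
                else:
                    stats["max_abs"] = max(stats["max_abs"], abs(v))
    return stats
```

A non-zero `nan` count, or a `values` total that differs from count times 75, points at the extraction step rather than at bin2hdf5 or the training script.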

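Problems processing data with the denoise_training script

The speech data and noise data are already in the specified folders. I changed count in denoise.c to 10000, then compiled and got the denoise_training binary. When I use it to process the data, it fails with:
count ==== 0
Floating point exception (core dumped)
The command I ran was:
./denoise_training /home/rnnoise_16k-master/src/media/clean_dir /home/rnnoise_16k-master/src/media/noise_dir 10000 > training_16k_v3.f32

If I use the denoise_training_gao script you provided, there is no error, but it does not stop after the specified count either.

When compiling denoise.c, the following messages appeared. Could a failed compilation be the reason the data processing program does not work?
[Screenshot: compiler messages]
Looking forward to your reply!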

make error : “multiple definition of `main'”

When I run make with the CMakeLists.txt in the working directory, there are many "multiple definition" errors, because main.c and denoise.c both define a main function (and some other identical functions). How did you solve this error? Thanks.

Some question about this 16k model

Thanks for sharing this 16k model. However, some questions still confuse me.

(1) As far as I know, the pitch feature is important. The original pitch analysis code runs at 48 kHz. In your example, only the macros PITCH_MIN_PERIOD, PITCH_MAX_PERIOD and PITCH_FRAME_SIZE are revised. Have you tested whether the code extracts pitch correctly with only this small revision?

(2) Have you evaluated this model with PESQ? With the original RNNoise code the PESQ is good, but with weights from my own training it is not very good.

Thanks very much.
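For reference on question (1), the three pitch macros are lengths in samples, so the simplest revision is to scale them by the ratio of sample rates. The sketch below starts from the upstream 48 kHz values (60, 768, 960). Whether this fork uses exactly the scaled 16 kHz values is an assumption, and the sketch says nothing about whether the pitch search itself behaves correctly after scaling.

```python
# Upstream RNNoise values, defined for 48 kHz input.
UPSTREAM_48K = {
    "PITCH_MIN_PERIOD": 60,
    "PITCH_MAX_PERIOD": 768,
    "PITCH_FRAME_SIZE": 960,
}


def scale_pitch_constants(sample_rate, base_rate=48000, base=UPSTREAM_48K):
    """Scale sample-count constants proportionally to the sample rate."""
    return {name: value * sample_rate // base_rate for name, value in base.items()}


for rate in (16000, 8000):
    print(rate, scale_pitch_constants(rate))
```

At 8000 Hz this gives 10, 128 and 160, which are the same values used in the 8 kHz issue further down this page.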

about the matrix size and count

I use denoise.c to extract features. I changed count to 100000, so it printed 100000*75 on the screen. However, when I run bin2hdf5.py, an error occurs: "cannot reshape array of size 7555881 into shape (100000,75)". Why? Looking forward to your reply, thank you.
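The numbers in this error can be checked by hand: a file that reshapes cleanly must hold exactly rows times 75 values. The sketch below works through the arithmetic for the reported size. The helper name is hypothetical.

```python
def split_rows(total_values, width=75):
    """Return (complete_rows, leftover_values) for a flat feature file."""
    return divmod(total_values, width)


reported = 7555881          # size from the error message
expected = 100000 * 75      # what bin2hdf5.py was asked to reshape into

rows, leftover = split_rows(reported)
print("complete rows:", rows)
print("leftover values:", leftover)
print("extra values beyond expected:", reported - expected)
```

The file holds 55881 more values than 100000 x 75, and the total is not even a multiple of 75. One possible cause, stated here as an assumption, is that something other than feature rows (for example text printed to stdout) ended up in the redirected file.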

When I cmake the project, there is a problem

[Screenshot: 1587547866(1)]

The second problem is:

If I want to train on my own data, run.sh needs a speech.wav and a noise.wav that I prepare. But run.sh also mentions mixed.wav: do I need to prepare it, or is it generated automatically?

Floating point exception (core dumped)

1. src/compile.sh
2. ../src/denoise_training ./data/clean ./data/noise ./mixed.wav > training_16k_v3.f32
Then it reported "Floating point exception (core dumped)". I did not change the code.

Training with 8khz audio

Hi guys,

I would like to know what changes I need to make to train a model on 8 kHz audio. I tried changing some parameters in denoise.c like this:

#define BLOCK_SIZE 8000

#define FRAME_SIZE (20<<FRAME_SIZE_SHIFT)
#define SAMPLE_RATE 8000


#define PITCH_MIN_PERIOD 10
#define PITCH_MAX_PERIOD 128
#define PITCH_FRAME_SIZE 160 

#if SMOOTH_BANDS
//#define NB_BANDS 22
#define NB_BANDS 14
#else
//#define NB_BANDS 21
#define NB_BANDS 13
#endif

After compiling, I got a matrix of shape 500000 x 63. The number of features (63) looks normal to me because we have fewer samples, and the training goes well: the loss is not high and it seems to learn something. But in the end I get sizzling audio, and it doesn't seem to reduce the noise at all.

Sorry for bothering you again, @YongyuG. Can you give me some information about the code: how did you manage to convert it to 16 kHz, and in your opinion, is it possible to convert it to 8 kHz?
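The width of 63 can be derived from NB_BANDS, assuming the fork keeps the upstream row layout: the input features (NB_BANDS + 3 * NB_DELTA_CEPS + 2, with NB_DELTA_CEPS = 6), then NB_BANDS gains, NB_BANDS noise levels and one VAD flag. The sketch below computes the width and the column ranges. NB_BANDS = 18 for the 16k model is inferred from the width of 75 and is not confirmed by the source.

```python
NB_DELTA_CEPS = 6  # upstream RNNoise value


def nb_features(nb_bands):
    """Number of network input features per frame."""
    return nb_bands + 3 * NB_DELTA_CEPS + 2


def column_slices(nb_bands):
    """Half-open column ranges of one training row (upstream layout assumed)."""
    f = nb_features(nb_bands)
    return {
        "features": (0, f),
        "gains": (f, f + nb_bands),
        "noise": (f + nb_bands, f + 2 * nb_bands),
        "vad": (f + 2 * nb_bands, f + 2 * nb_bands + 1),
    }


def row_width(nb_bands):
    return column_slices(nb_bands)["vad"][1]


for label, bands in (("48k upstream", 22), ("16k (inferred)", 18), ("8k attempt", 14)):
    print(label, row_width(bands), column_slices(bands))
```

If the training script still slices columns at the 16k offsets, a 63-wide file would be read with the wrong boundaries, so the slices in the training script and the array sizes in the inference code are worth checking when changing NB_BANDS.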

After the network model training is completed, about the problem of testing audio noise reduction

I want to use noisy audio files to test the noise reduction of our trained model. How should I do that? I didn't see a relevant description. With the 48k source code, you pass the input file before noise reduction and the output file after noise reduction to rnnoise_demo:

./examples/rnnoise_demo < input.raw > output.raw

but there is no rnnoise_demo in the 16k code. I hope the author can answer. Thank you, @YongyuG.

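Error when reading 16-bit 16k PCM directly

I want to read 16-bit 16k PCM directly for processing, but it prints "check gain" and the processed audio is full of pops. I looked at the author's wav reading method and found that after reading as f32 the values should be in [-1, 1], so why are they then multiplied by 36728?
How should I modify the code so that PCM can be processed directly without errors? Thanks.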
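Upstream RNNoise works on float samples that stay in the 16-bit integer range rather than in [-1, 1], which would explain a multiplication after reading normalised floats. The sketch below shows the two conversions side by side. It assumes this fork follows the same convention and that the constant quoted as 36728 corresponds to the 16-bit full scale of 32768. Both function names are hypothetical.

```python
import sys
from array import array


def pcm16_to_float(raw):
    """Little-endian 16-bit PCM bytes -> floats kept in the int16 range."""
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # PCM data is little-endian
    return [float(s) for s in samples]


def unit_to_int16_range(values, full_scale=32768.0):
    """Floats in [-1, 1] -> floats in the int16 range."""
    return [v * full_scale for v in values]
```

Under these assumptions, raw 16-bit PCM read with `pcm16_to_float` is already in the expected range and should not be scaled again. Applying the multiplication a second time would push the samples far out of range.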

About the dataset

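Could you share the code for preparing the dataset? I am new to speech processing and not sure how to do it. Looking forward to your reply.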

When I cmake the project, there is a problem

[Screenshot: 2021-02-20 19-32-30]
Compiling according to your instructions produces the error shown. I know little about compilation. Could you describe your own compilation process in more detail? Thank you very much.

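Feature extraction problem

My noise data is 18.5 h and my clean speech data is 10.5 h, in wav format. When I run ./denoise_training clean_speech.wav noise_background.wav 50000000 > output.f32 in the terminal, no features are extracted and no error is reported. Why?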
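The README states that the tool takes a directory of wav files for speech and for noise, while the command in this issue passes single wav files. A small pre-flight check, sketched below with a hypothetical helper, can confirm that each argument is a directory and that it actually contains wav files. Whether an empty or invalid directory is what triggers the failures reported on this page is an assumption.

```python
import os


def list_wavs(path):
    """Return the sorted .wav files in a directory, or raise if it is not one."""
    if not os.path.isdir(path):
        raise NotADirectoryError(f"expected a directory of wav files: {path}")
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if name.lower().endswith(".wav")
    )
```

Calling `list_wavs` on the speech and noise arguments before running denoise_training shows at a glance whether the inputs match what the README describes.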

Why pitch_index-300?

What should this 300 be in the 16k model, or in an 8k one (I want to change it to 8k)?
Thanks
