
MelNet

Implementation of MelNet: A Generative Model for Audio in the Frequency Domain

Prerequisites

  • Tested with Python 3.6.8 & 3.7.4, PyTorch 1.2.0 & 1.3.0.
  • pip install -r requirements.txt

How to train

Datasets

  • Blizzard, VoxCeleb2, and KSS have YAML files provided under config/. For other datasets, write your own YAML file following the provided ones (a sketch follows this list).
  • Unconditional training is possible for all kinds of datasets, provided that they have a consistent file extension, specified by data.extension within the YAML file.
  • Conditional training is currently only implemented for KSS and a subset of the Blizzard dataset.
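
As an illustration, a dataset YAML might look like the following sketch. Only data.extension is named in this README; the audio keys mirror the config block quoted in the issues further down, and data.path is an assumed key name:

    data:
      path: '/path/to/dataset'   # assumed key name; check the provided YAML files
      extension: '*.wav'         # data.extension: the dataset's consistent file extension
    audio:
      sr: 16000
      n_mels: 180
      hop_length: 180
      win_length: 1080
      n_fft: 1080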

Running the code

  • python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]
    • Each tier can be trained separately. Since each tier is larger than the one before it (with the exception of tier 1), modify the batch size for each tier.
      • Tier 6 of the Blizzard dataset does not fit on a 16GB P100, even with a batch size of 1.
    • The -s flag is a boolean for determining whether to train a TTS tier. Since a TTS tier only differs at tier 1, this flag is ignored when [tier number] != 0. Warning: this flag is toggled True by any value that follows it, whatever that value is; omit it entirely if you're not planning to use it (see the sketch below).
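
This warning is consistent with a common argparse pitfall: when a flag is declared with type=bool, every non-empty string (including "False") converts to True. A minimal sketch of the pitfall and the conventional fix, assuming argparse is used (an illustration, not the repository's actual code):

    import argparse

    parser = argparse.ArgumentParser()
    # Pitfall: bool('False') is True, since any non-empty string is truthy.
    parser.add_argument('-s', '--tts', type=bool, default=False)
    print(parser.parse_args(['-s', 'False']).tts)  # prints True, not False

    # Conventional fix: a value-less flag that is True only when present.
    fixed = argparse.ArgumentParser()
    fixed.add_argument('-s', '--tts', action='store_true')
    print(fixed.parse_args([]).tts)      # False when omitted
    print(fixed.parse_args(['-s']).tts)  # True when present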

How to sample

Preparing the checkpoints

  • The checkpoints must be stored under chkpt/.
  • A YAML file named inference.yaml must be provided under config/.
  • inference.yaml must specify the number of tiers, the names of the checkpoints, and whether or not it is a conditional generation.
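
For example, a hypothetical inference.yaml might look like the following. The key names here are assumptions; the README only specifies what information must be present (tier count, checkpoint names, and the conditional flag):

    # Hypothetical key names -- check the repository for the exact schema.
    num_tiers: 6
    checkpoints:
      - chkpt/tier1.pt
      - chkpt/tier2.pt
      - chkpt/tier3.pt
      - chkpt/tier4.pt
      - chkpt/tier5.pt
      - chkpt/tier6.pt
    conditional: false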

Running the code

  • python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]
    • Timestep refers to the length of the mel spectrogram. The ratio of timesteps to seconds is roughly [sample rate] : [hop length of FFT]; a worked example follows this list.
    • The -i flag is optional, and only needed for conditional generation. Surround the sentence with double quotes and end it with a period (.).
    • Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).
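
For a rough sense of scale, using the KSS audio settings quoted in the issues below (sr: 16000, hop_length: 180):

    timesteps per second ≈ sample rate / hop length = 16000 / 180 ≈ 89
    so -t 240 corresponds to roughly 240 / 89 ≈ 2.7 seconds of audio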

To-do

  • Implement upsampling procedure
  • GMM sampling + loss function
  • Unconditional audio generation
  • TTS synthesis
  • Tensorboard logging
  • Multi-GPU training
  • Primed generation

Implementation authors

License

MIT License

melnet's People

Contributors

leeyoonhyung, seungwonpark


melnet's Issues

Fixes at 2019.10.16

  • Fix "Joonyoung Lee" to "June Young Yi"
  • Enable configuring the audio file extension
  • Add optimizer parameters to each YAML file
  • Make the two default YAML files consistent

RuntimeError: CUDA out of memory.

After doing
!python trainer.py -c ./config/kss.yaml -n testrun -t 1 -b 30 -s testtts
(take into account that I don't know what most of these parameters are for)

I get:

100% 12854/12854 [00:00<00:00, 161586.27it/s]
100% 12854/12854 [00:00<00:00, 280440.60it/s]
Train data loader:   0% 0/368 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/content/MelNet/utils/train.py", line 98, in train
    audio_lengths.cuda(non_blocking=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MelNet/model/tts.py", line 143, in forward
    h_t, h_f, h_c = layer(h_t, h_f, h_c, audio_lengths)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MelNet/model/rnn.py", line 75, in forward
    h_t_yz, _ = self.t_delay_RNN_yz(h_t_yz_temp)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py", line 564, in forward
    return self.forward_tensor(input, hx)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py", line 543, in forward_tensor
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py", line 526, in forward_impl
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: CUDA out of memory. Tried to allocate 1.35 GiB (GPU 0; 14.76 GiB total capacity; 12.76 GiB already allocated; 1.01 GiB free; 141.28 MiB cached)
Train data loader:   0% 0/368 [01:04<?, ?it/s]

MelNet cannot work as an independent vocoder currently, right?

Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).

It seems we cannot feed mel spectrograms in directly to produce WAV files, the way MelGAN does. Is that what you mean?

inference.py: error: argument -t/--timestep: invalid int value

I have a problem figuring out what a valid timestep number is, because there is almost no information about it. A general guide to what the different variables mean (batch_size, timestep, etc.) would be nice, along with the general range of numbers allowed.

For example, this does not work: python inference.py -c [E:\Programming stuff\AI\AI music\MelNet\MelNet-master\config\blizzard.yaml] -p [E:\Programming stuff\AI\AI music\MelNet\MelNet-master\config\inference.yaml] -t [240] -n [result]

Thanks in advance for your help.

Training Pipeline + Steps for training TTS

Hi,
Thanks for this clean and great implementation of MelNet.
I'm a beginner in speech synthesis, so kindly guide me through the steps for training MelNet for TTS:
What I know/assume:

  • Training will be done separately for each tier; for TTS, we'll use the tier flag set to 1 and the tts flag set to True.
  • For subsequent tiers, we will set the tier flag to 2, 3, 4, 5, 6 respectively and the tts flag to False.
  • Finally, we will put the checkpoints for each tier in inference.yaml and pass it to the MelNet class for prediction. (In commands, something like the sketch below.)
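
A sketch of that assumed pipeline in commands, using the trainer invocation from the README (batch sizes and run names are placeholders, and the -s value is arbitrary, since per the README warning any value toggles it True):

    # Tier 1, trained as a TTS tier
    python trainer.py -c config/kss.yaml -n tts_tier1 -t 1 -b 8 -s true
    # Tiers 2 through 6, trained unconditionally (omit -s entirely)
    python trainer.py -c config/kss.yaml -n tier2 -t 2 -b 6
    python trainer.py -c config/kss.yaml -n tier3 -t 3 -b 4
    ...
    python trainer.py -c config/kss.yaml -n tier6 -t 6 -b 1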

Therefore I have some questions:

  • Can you provide/confirm the steps to train multiple tiers for the TTS option?

  • Are we supposed to train TTS (with the --tts flag set to True) while keeping tier number = 1?

  • What do you mean by this in README.md:
    The -s flag is a boolean for determining whether to train a TTS tier. Since a TTS tier only differs at tier 1, this flag is ignored when [tier number] != 0.

    • And where is this condition in the code that you referred to here: [tier number] != 0?
    • I assume this means we should ignore the tts flag in case the tier number > 2?
  • What is the difference between the tts arg for the trainer and the tier number in the config file (YAML)? Should they be the same? If not, what is the difference?

  • How do we know that our model (for each tier) has converged? What is the minimum train/test loss value we should achieve? What was your training time, and on what GPU?

  • Lastly, can we generate mel outputs from the individually trained tier models? For instance, if we have the TTS model plus some consecutive tiers, can we run inference on the output to check training performance?

RuntimeError: Error(s) in loading state_dict for DataParallel

I have the following problem when doing inference :(

  File "inference.py", line 36, in <module>
    model.load_tiers()
  File "/content/MelNet/model/model.py", line 96, in load_tiers
    self.tiers[idx+1].load_state_dict(checkpoint['model'])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
	Missing key(s) in state_dict: "module.W_t.weight", "module.W_t.bias", "module.layers.0.rnn_x.weight_ih_l0", "module.layers.0.rnn_x.weight_hh_l0", "module.layers.0.rnn_x.bias_ih_l0", "module.layers.0.rnn_x.bias_hh_l0", "module.layers.0.rnn_x.weight_ih_l0_reverse", "module.layers.0.rnn_x.weight_hh_l0_reverse", "module.layers.0.rnn_x.bias_ih_l0_reverse", "module.layers.0.rnn_x.bias_hh_l0_reverse", "module.layers.0.rnn_y.weight_ih_l0", "module.layers.0.rnn_y.weight_hh_l0", "module.layers.0.rnn_y.bias_ih_l0", "module.layers.0.rnn_y.bias_hh_l0", "module.layers.0.rnn_y.weight_ih_l0_reverse", "module.layers.0.rnn_y.weight_hh_l0_reverse", "module.layers.0.rnn_y.bias_ih_l0_reverse", "module.layers.0.rnn_y.bias_hh_l0_reverse", "module.layers.0.W.weight", "module.layers.0.W.bias", "module.layers.1.rnn_x.weight_ih_l0", "module.layers.1.rnn_x.weight_hh_l0", "module.layers.1.rnn_x.bias_ih_l0", "module.layers.1.rnn_x.bias_hh_l0", "module.layers.1.rnn_x.weight_ih_l0_reverse", "module.layers.1.rnn_x.weight_hh_l0_reverse", "module.layers.1.rnn_x.bias_ih_l0_reverse", "module.layers.1.rnn_x.bias_hh_l0_reverse", "module.layers.1.rnn_y.weight_ih_l0", "module.layers.1.rnn_y.weight_hh_l0", "module.layers.1.rnn_y.bias_ih_l0", "module.layers.1.rnn_y.bias_hh_l0", "module.layers.1.rnn_y.weight_ih_l0_reverse", "module.layers.1.rnn_y.weight_hh_l0_reverse", "module.layers.1.rnn_y.bias_ih_l0_reverse", "module.layers.1.rnn_y.bias_hh_l0_reverse", "module.layers.1.W.weight", "module.layers.1.W.bias", "module.layers.2.rnn_x.weight_ih_l0", "module.layers.2.rnn_x.weight_hh_l0", "module.layers.2.rnn_x.bias_ih_l0", "module.layers.2.rnn_x.bias_hh_l0", "module.layers.2.rnn_x.weight_ih_l0_reverse", "module.layers.2.rnn_x.weight_hh_l0_reverse", "module.layers.2.rnn_x.bias_ih_l0_reverse", "module.layers.2.rnn_x.bias_hh_l0_reverse", "module.layers.2.rnn_y.weight_ih_l0", "module.layers.2.rnn_y.weight_hh_l0", "module.layers.2.rnn_y.bias_ih_l0", "module.layers.2.rnn_y.bias_hh_l0", "module.layers.2.rnn_y.weight_ih_l0_reverse", "module.layers.2.rnn_y.weight_hh_l0_reverse", "module.layers.2.rnn_y.bias_ih_l0_reverse", "module.layers.2.rnn_y.bias_hh_l0_reverse", "module.layers.2.W.weight", "module.layers.2.W.bias", "module.layers.3.rnn_x.weight_ih_l0", "module.layers.3.rnn_x.weight_hh_l0", "module.layers.3.rnn_x.bias_ih_l0", "module.layers.3.rnn_x.bias_hh_l0", "module.layers.3.rnn_x.weight_ih_l0_reverse", "module.layers.3.rnn_x.weight_hh_l0_reverse", "module.layers.3.rnn_x.bias_ih_l0_reverse", "module.layers.3.rnn_x.bias_hh_l0_reverse", "module.layers.3.rnn_y.weight_ih_l0", "module.layers.3.rnn_y.weight_hh_l0", "module.layers.3.rnn_y.bias_ih_l0", "module.layers.3.rnn_y.bias_hh_l0", "module.layers.3.rnn_y.weight_ih_l0_reverse", "module.layers.3.rnn_y.weight_hh_l0_reverse", "module.layers.3.rnn_y.bias_ih_l0_reverse", "module.layers.3.rnn_y.bias_hh_l0_reverse", "module.layers.3.W.weight", "module.layers.3.W.bias", "module.layers.4.rnn_x.weight_ih_l0", "module.layers.4.rnn_x.weight_hh_l0", "module.layers.4.rnn_x.bias_ih_l0", "module.layers.4.rnn_x.bias_hh_l0", "module.layers.4.rnn_x.weight_ih_l0_reverse", "module.layers.4.rnn_x.weight_hh_l0_reverse", "module.layers.4.rnn_x.bias_ih_l0_reverse", "module.layers.4.rnn_x.bias_hh_l0_reverse", "module.layers.4.rnn_y.weight_ih_l0", "module.layers.4.rnn_y.weight_hh_l0", "module.layers.4.rnn_y.bias_ih_l0", "module.layers.4.rnn_y.bias_hh_l0", "module.layers.4.rnn_y.weight_ih_l0_reverse", "module.layers.4.rnn_y.weight_hh_l0_reverse", "module.layers.4.rnn_y.bias_ih_l0_reverse", 
"module.layers.4.rnn_y.bias_hh_l0_reverse", "module.layers.4.W.weight", "module.layers.4.W.bias". 
	Unexpected key(s) in state_dict: "module.W_t_0.weight", "module.W_t_0.bias", "module.W_f_0.weight", "module.W_f_0.bias", "module.W_c_0.weight", "module.W_c_0.bias", "module.embedding_text.weight", "module.text_lstm.weight_ih_l0", "module.text_lstm.weight_hh_l0", "module.text_lstm.bias_ih_l0", "module.text_lstm.bias_hh_l0", "module.text_lstm.weight_ih_l0_reverse", "module.text_lstm.weight_hh_l0_reverse", "module.text_lstm.bias_ih_l0_reverse", "module.text_lstm.bias_hh_l0_reverse", "module.attention.rnn_cell.weight_ih", "module.attention.rnn_cell.weight_hh", "module.attention.rnn_cell.bias_ih", "module.attention.rnn_cell.bias_hh", "module.attention.W_g.weight", "module.attention.W_g.bias", "module.layers.5.t_delay_RNN_x.weight_ih_l0", "module.layers.5.t_delay_RNN_x.weight_hh_l0", "module.layers.5.t_delay_RNN_x.bias_ih_l0", "module.layers.5.t_delay_RNN_x.bias_hh_l0", "module.layers.5.t_delay_RNN_yz.weight_ih_l0", "module.layers.5.t_delay_RNN_yz.weight_hh_l0", "module.layers.5.t_delay_RNN_yz.bias_ih_l0", "module.layers.5.t_delay_RNN_yz.bias_hh_l0", "module.layers.5.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.5.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.5.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.5.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.5.c_RNN.weight_ih_l0", "module.layers.5.c_RNN.weight_hh_l0", "module.layers.5.c_RNN.bias_ih_l0", "module.layers.5.c_RNN.bias_hh_l0", "module.layers.5.f_delay_RNN.weight_ih_l0", "module.layers.5.f_delay_RNN.weight_hh_l0", "module.layers.5.f_delay_RNN.bias_ih_l0", "module.layers.5.f_delay_RNN.bias_hh_l0", "module.layers.5.W_t.weight", "module.layers.5.W_t.bias", "module.layers.5.W_c.weight", "module.layers.5.W_c.bias", "module.layers.5.W_f.weight", "module.layers.5.W_f.bias", "module.layers.6.t_delay_RNN_x.weight_ih_l0", "module.layers.6.t_delay_RNN_x.weight_hh_l0", "module.layers.6.t_delay_RNN_x.bias_ih_l0", "module.layers.6.t_delay_RNN_x.bias_hh_l0", "module.layers.6.t_delay_RNN_yz.weight_ih_l0", "module.layers.6.t_delay_RNN_yz.weight_hh_l0", "module.layers.6.t_delay_RNN_yz.bias_ih_l0", "module.layers.6.t_delay_RNN_yz.bias_hh_l0", "module.layers.6.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.6.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.6.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.6.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.6.c_RNN.weight_ih_l0", "module.layers.6.c_RNN.weight_hh_l0", "module.layers.6.c_RNN.bias_ih_l0", "module.layers.6.c_RNN.bias_hh_l0", "module.layers.6.f_delay_RNN.weight_ih_l0", "module.layers.6.f_delay_RNN.weight_hh_l0", "module.layers.6.f_delay_RNN.bias_ih_l0", "module.layers.6.f_delay_RNN.bias_hh_l0", "module.layers.6.W_t.weight", "module.layers.6.W_t.bias", "module.layers.6.W_c.weight", "module.layers.6.W_c.bias", "module.layers.6.W_f.weight", "module.layers.6.W_f.bias", "module.layers.7.t_delay_RNN_x.weight_ih_l0", "module.layers.7.t_delay_RNN_x.weight_hh_l0", "module.layers.7.t_delay_RNN_x.bias_ih_l0", "module.layers.7.t_delay_RNN_x.bias_hh_l0", "module.layers.7.t_delay_RNN_yz.weight_ih_l0", "module.layers.7.t_delay_RNN_yz.weight_hh_l0", "module.layers.7.t_delay_RNN_yz.bias_ih_l0", "module.layers.7.t_delay_RNN_yz.bias_hh_l0", "module.layers.7.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.7.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.7.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.7.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.7.c_RNN.weight_ih_l0", "module.layers.7.c_RNN.weight_hh_l0", "module.layers.7.c_RNN.bias_ih_l0", 
"module.layers.7.c_RNN.bias_hh_l0", "module.layers.7.f_delay_RNN.weight_ih_l0", "module.layers.7.f_delay_RNN.weight_hh_l0", "module.layers.7.f_delay_RNN.bias_ih_l0", "module.layers.7.f_delay_RNN.bias_hh_l0", "module.layers.7.W_t.weight", "module.layers.7.W_t.bias", "module.layers.7.W_c.weight", "module.layers.7.W_c.bias", "module.layers.7.W_f.weight", "module.layers.7.W_f.bias", "module.layers.8.t_delay_RNN_x.weight_ih_l0", "module.layers.8.t_delay_RNN_x.weight_hh_l0", "module.layers.8.t_delay_RNN_x.bias_ih_l0", "module.layers.8.t_delay_RNN_x.bias_hh_l0", "module.layers.8.t_delay_RNN_yz.weight_ih_l0", "module.layers.8.t_delay_RNN_yz.weight_hh_l0", "module.layers.8.t_delay_RNN_yz.bias_ih_l0", "module.layers.8.t_delay_RNN_yz.bias_hh_l0", "module.layers.8.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.8.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.8.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.8.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.8.c_RNN.weight_ih_l0", "module.layers.8.c_RNN.weight_hh_l0", "module.layers.8.c_RNN.bias_ih_l0", "module.layers.8.c_RNN.bias_hh_l0", "module.layers.8.f_delay_RNN.weight_ih_l0", "module.layers.8.f_delay_RNN.weight_hh_l0", "module.layers.8.f_delay_RNN.bias_ih_l0", "module.layers.8.f_delay_RNN.bias_hh_l0", "module.layers.8.W_t.weight", "module.layers.8.W_t.bias", "module.layers.8.W_c.weight", "module.layers.8.W_c.bias", "module.layers.8.W_f.weight", "module.layers.8.W_f.bias", "module.layers.9.t_delay_RNN_x.weight_ih_l0", "module.layers.9.t_delay_RNN_x.weight_hh_l0", "module.layers.9.t_delay_RNN_x.bias_ih_l0", "module.layers.9.t_delay_RNN_x.bias_hh_l0", "module.layers.9.t_delay_RNN_yz.weight_ih_l0", "module.layers.9.t_delay_RNN_yz.weight_hh_l0", "module.layers.9.t_delay_RNN_yz.bias_ih_l0", "module.layers.9.t_delay_RNN_yz.bias_hh_l0", "module.layers.9.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.9.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.9.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.9.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.9.c_RNN.weight_ih_l0", "module.layers.9.c_RNN.weight_hh_l0", "module.layers.9.c_RNN.bias_ih_l0", "module.layers.9.c_RNN.bias_hh_l0", "module.layers.9.f_delay_RNN.weight_ih_l0", "module.layers.9.f_delay_RNN.weight_hh_l0", "module.layers.9.f_delay_RNN.bias_ih_l0", "module.layers.9.f_delay_RNN.bias_hh_l0", "module.layers.9.W_t.weight", "module.layers.9.W_t.bias", "module.layers.9.W_c.weight", "module.layers.9.W_c.bias", "module.layers.9.W_f.weight", "module.layers.9.W_f.bias", "module.layers.10.t_delay_RNN_x.weight_ih_l0", "module.layers.10.t_delay_RNN_x.weight_hh_l0", "module.layers.10.t_delay_RNN_x.bias_ih_l0", "module.layers.10.t_delay_RNN_x.bias_hh_l0", "module.layers.10.t_delay_RNN_yz.weight_ih_l0", "module.layers.10.t_delay_RNN_yz.weight_hh_l0", "module.layers.10.t_delay_RNN_yz.bias_ih_l0", "module.layers.10.t_delay_RNN_yz.bias_hh_l0", "module.layers.10.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.10.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.10.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.10.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.10.c_RNN.weight_ih_l0", "module.layers.10.c_RNN.weight_hh_l0", "module.layers.10.c_RNN.bias_ih_l0", "module.layers.10.c_RNN.bias_hh_l0", "module.layers.10.f_delay_RNN.weight_ih_l0", "module.layers.10.f_delay_RNN.weight_hh_l0", "module.layers.10.f_delay_RNN.bias_ih_l0", "module.layers.10.f_delay_RNN.bias_hh_l0", "module.layers.10.W_t.weight", "module.layers.10.W_t.bias", "module.layers.10.W_c.weight", 
"module.layers.10.W_c.bias", "module.layers.10.W_f.weight", "module.layers.10.W_f.bias", "module.layers.11.t_delay_RNN_x.weight_ih_l0", "module.layers.11.t_delay_RNN_x.weight_hh_l0", "module.layers.11.t_delay_RNN_x.bias_ih_l0", "module.layers.11.t_delay_RNN_x.bias_hh_l0", "module.layers.11.t_delay_RNN_yz.weight_ih_l0", "module.layers.11.t_delay_RNN_yz.weight_hh_l0", "module.layers.11.t_delay_RNN_yz.bias_ih_l0", "module.layers.11.t_delay_RNN_yz.bias_hh_l0", "module.layers.11.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.11.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.11.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.11.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.11.c_RNN.weight_ih_l0", "module.layers.11.c_RNN.weight_hh_l0", "module.layers.11.c_RNN.bias_ih_l0", "module.layers.11.c_RNN.bias_hh_l0", "module.layers.11.f_delay_RNN.weight_ih_l0", "module.layers.11.f_delay_RNN.weight_hh_l0", "module.layers.11.f_delay_RNN.bias_ih_l0", "module.layers.11.f_delay_RNN.bias_hh_l0", "module.layers.11.W_t.weight", "module.layers.11.W_t.bias", "module.layers.11.W_c.weight", "module.layers.11.W_c.bias", "module.layers.11.W_f.weight", "module.layers.11.W_f.bias", "module.layers.0.t_delay_RNN_x.weight_ih_l0", "module.layers.0.t_delay_RNN_x.weight_hh_l0", "module.layers.0.t_delay_RNN_x.bias_ih_l0", "module.layers.0.t_delay_RNN_x.bias_hh_l0", "module.layers.0.t_delay_RNN_yz.weight_ih_l0", "module.layers.0.t_delay_RNN_yz.weight_hh_l0", "module.layers.0.t_delay_RNN_yz.bias_ih_l0", "module.layers.0.t_delay_RNN_yz.bias_hh_l0", "module.layers.0.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.0.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.0.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.0.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.0.c_RNN.weight_ih_l0", "module.layers.0.c_RNN.weight_hh_l0", "module.layers.0.c_RNN.bias_ih_l0", "module.layers.0.c_RNN.bias_hh_l0", "module.layers.0.f_delay_RNN.weight_ih_l0", "module.layers.0.f_delay_RNN.weight_hh_l0", "module.layers.0.f_delay_RNN.bias_ih_l0", "module.layers.0.f_delay_RNN.bias_hh_l0", "module.layers.0.W_t.weight", "module.layers.0.W_t.bias", "module.layers.0.W_c.weight", "module.layers.0.W_c.bias", "module.layers.0.W_f.weight", "module.layers.0.W_f.bias", "module.layers.1.t_delay_RNN_x.weight_ih_l0", "module.layers.1.t_delay_RNN_x.weight_hh_l0", "module.layers.1.t_delay_RNN_x.bias_ih_l0", "module.layers.1.t_delay_RNN_x.bias_hh_l0", "module.layers.1.t_delay_RNN_yz.weight_ih_l0", "module.layers.1.t_delay_RNN_yz.weight_hh_l0", "module.layers.1.t_delay_RNN_yz.bias_ih_l0", "module.layers.1.t_delay_RNN_yz.bias_hh_l0", "module.layers.1.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.1.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.1.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.1.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.1.c_RNN.weight_ih_l0", "module.layers.1.c_RNN.weight_hh_l0", "module.layers.1.c_RNN.bias_ih_l0", "module.layers.1.c_RNN.bias_hh_l0", "module.layers.1.f_delay_RNN.weight_ih_l0", "module.layers.1.f_delay_RNN.weight_hh_l0", "module.layers.1.f_delay_RNN.bias_ih_l0", "module.layers.1.f_delay_RNN.bias_hh_l0", "module.layers.1.W_t.weight", "module.layers.1.W_t.bias", "module.layers.1.W_c.weight", "module.layers.1.W_c.bias", "module.layers.1.W_f.weight", "module.layers.1.W_f.bias", "module.layers.2.t_delay_RNN_x.weight_ih_l0", "module.layers.2.t_delay_RNN_x.weight_hh_l0", "module.layers.2.t_delay_RNN_x.bias_ih_l0", "module.layers.2.t_delay_RNN_x.bias_hh_l0", 
"module.layers.2.t_delay_RNN_yz.weight_ih_l0", "module.layers.2.t_delay_RNN_yz.weight_hh_l0", "module.layers.2.t_delay_RNN_yz.bias_ih_l0", "module.layers.2.t_delay_RNN_yz.bias_hh_l0", "module.layers.2.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.2.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.2.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.2.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.2.c_RNN.weight_ih_l0", "module.layers.2.c_RNN.weight_hh_l0", "module.layers.2.c_RNN.bias_ih_l0", "module.layers.2.c_RNN.bias_hh_l0", "module.layers.2.f_delay_RNN.weight_ih_l0", "module.layers.2.f_delay_RNN.weight_hh_l0", "module.layers.2.f_delay_RNN.bias_ih_l0", "module.layers.2.f_delay_RNN.bias_hh_l0", "module.layers.2.W_t.weight", "module.layers.2.W_t.bias", "module.layers.2.W_c.weight", "module.layers.2.W_c.bias", "module.layers.2.W_f.weight", "module.layers.2.W_f.bias", "module.layers.3.t_delay_RNN_x.weight_ih_l0", "module.layers.3.t_delay_RNN_x.weight_hh_l0", "module.layers.3.t_delay_RNN_x.bias_ih_l0", "module.layers.3.t_delay_RNN_x.bias_hh_l0", "module.layers.3.t_delay_RNN_yz.weight_ih_l0", "module.layers.3.t_delay_RNN_yz.weight_hh_l0", "module.layers.3.t_delay_RNN_yz.bias_ih_l0", "module.layers.3.t_delay_RNN_yz.bias_hh_l0", "module.layers.3.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.3.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.3.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.3.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.3.c_RNN.weight_ih_l0", "module.layers.3.c_RNN.weight_hh_l0", "module.layers.3.c_RNN.bias_ih_l0", "module.layers.3.c_RNN.bias_hh_l0", "module.layers.3.f_delay_RNN.weight_ih_l0", "module.layers.3.f_delay_RNN.weight_hh_l0", "module.layers.3.f_delay_RNN.bias_ih_l0", "module.layers.3.f_delay_RNN.bias_hh_l0", "module.layers.3.W_t.weight", "module.layers.3.W_t.bias", "module.layers.3.W_c.weight", "module.layers.3.W_c.bias", "module.layers.3.W_f.weight", "module.layers.3.W_f.bias", "module.layers.4.t_delay_RNN_x.weight_ih_l0", "module.layers.4.t_delay_RNN_x.weight_hh_l0", "module.layers.4.t_delay_RNN_x.bias_ih_l0", "module.layers.4.t_delay_RNN_x.bias_hh_l0", "module.layers.4.t_delay_RNN_yz.weight_ih_l0", "module.layers.4.t_delay_RNN_yz.weight_hh_l0", "module.layers.4.t_delay_RNN_yz.bias_ih_l0", "module.layers.4.t_delay_RNN_yz.bias_hh_l0", "module.layers.4.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.4.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.4.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.4.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.4.c_RNN.weight_ih_l0", "module.layers.4.c_RNN.weight_hh_l0", "module.layers.4.c_RNN.bias_ih_l0", "module.layers.4.c_RNN.bias_hh_l0", "module.layers.4.f_delay_RNN.weight_ih_l0", "module.layers.4.f_delay_RNN.weight_hh_l0", "module.layers.4.f_delay_RNN.bias_ih_l0", "module.layers.4.f_delay_RNN.bias_hh_l0", "module.layers.4.W_t.weight", "module.layers.4.W_t.bias", "module.layers.4.W_c.weight", "module.layers.4.W_c.bias", "module.layers.4.W_f.weight", "module.layers.4.W_f.bias". 

config parameter

Hi, I wanted to train MelNet on my own dataset.
There are some audio settings that I still don't understand, since I'm very new to the signal processing/speech field. Can someone explain them or give me a reference so I can understand what these settings mean:

audio:
  sr: 16000
  duration: 6.0
  n_mels: 180
  hop_length: 180
  win_length: 1080
  n_fft: 1080
  num_freq: 541
  ref_level_db: 20.0
  min_level_db: -80.0

Thanks in advance
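
These settings correspond to the standard short-time Fourier transform and mel filterbank parameters used by libraries such as librosa. A minimal sketch of how they would typically be applied (an illustration, not necessarily this repository's exact preprocessing):

    import librosa

    sr = 16000         # sample rate: audio samples per second
    n_fft = 1080       # FFT size; gives num_freq = n_fft // 2 + 1 = 541 linear bins
    hop_length = 180   # samples between frames; sets the spectrogram frame rate
    win_length = 1080  # analysis window length, here equal to n_fft
    n_mels = 180       # number of mel filterbank channels (spectrogram height)

    # duration: 6.0 limits each training clip to six seconds of audio
    y, _ = librosa.load('example.wav', sr=sr, duration=6.0)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                       hop_length=hop_length,
                                       win_length=win_length, n_mels=n_mels)
    S_db = librosa.power_to_db(S)
    # ref_level_db (20.0) and min_level_db (-80.0) are then typically used to
    # shift S_db and clip it into a normalized [0, 1] range.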
