
MelNet

Implementation of MelNet: A Generative Model for Audio in the Frequency Domain

Prerequisites

  • Tested with Python 3.6.8 & 3.7.4, PyTorch 1.2.0 & 1.3.0.
  • pip install -r requirements.txt

How to train

Datasets

  • Blizzard, VoxCeleb2, and KSS have YAML files provided under config/. For other datasets, write your own YAML file following the provided ones (a sketch follows this list).
  • Unconditional training is possible for all kinds of datasets, provided that they have a consistent file extension, specified by data.extension within the YAML file.
  • Conditional training is currently only implemented for KSS and a subset of the Blizzard dataset.
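
As an illustration, a dataset YAML might look like the following sketch. Only data.extension is named in this README; the audio keys mirror the config block quoted in the issues further down, and data.path is an assumed key name:

    data:
      path: '/path/to/dataset'   # assumed key name; check the provided YAML files
      extension: '*.wav'         # data.extension: the dataset's consistent file extension
    audio:
      sr: 16000
      n_mels: 180
      hop_length: 180
      win_length: 1080
      n_fft: 1080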

Running the code

  • python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]
    • Each tier can be trained separately. Since each tier is larger than the one before it (with the exception of tier 1), modify the batch size for each tier.
      • Tier 6 of the Blizzard dataset does not fit on a 16GB P100, even with a batch size of 1.
    • The -s flag is a boolean for determining whether to train a TTS tier. Since a TTS tier only differs at tier 1, this flag is ignored when [tier number] != 0. Warning: this flag is toggled True by any value that follows it, whatever that value is; omit it entirely if you're not planning to use it (see the sketch below).
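
This warning is consistent with a common argparse pitfall: when a flag is declared with type=bool, every non-empty string (including "False") converts to True. A minimal sketch of the pitfall and the conventional fix, assuming argparse is used (an illustration, not the repository's actual code):

    import argparse

    parser = argparse.ArgumentParser()
    # Pitfall: bool('False') is True, since any non-empty string is truthy.
    parser.add_argument('-s', '--tts', type=bool, default=False)
    print(parser.parse_args(['-s', 'False']).tts)  # prints True, not False

    # Conventional fix: a value-less flag that is True only when present.
    fixed = argparse.ArgumentParser()
    fixed.add_argument('-s', '--tts', action='store_true')
    print(fixed.parse_args([]).tts)      # False when omitted
    print(fixed.parse_args(['-s']).tts)  # True when present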

How to sample

Preparing the checkpoints

  • The checkpoints must be stored under chkpt/.
  • A YAML file named inference.yaml must be provided under config/.
  • inference.yaml must specify the number of tiers, the names of the checkpoints, and whether or not it is a conditional generation.
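
For example, a hypothetical inference.yaml might look like the following. The key names here are assumptions; the README only specifies what information must be present (tier count, checkpoint names, and the conditional flag):

    # Hypothetical key names -- check the repository for the exact schema.
    num_tiers: 6
    checkpoints:
      - chkpt/tier1.pt
      - chkpt/tier2.pt
      - chkpt/tier3.pt
      - chkpt/tier4.pt
      - chkpt/tier5.pt
      - chkpt/tier6.pt
    conditional: false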

Running the code

  • python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]
    • Timestep refers to the length of the mel spectrogram. The ratio of timesteps to seconds is roughly [sample rate] : [hop length of FFT]; a worked example follows this list.
    • The -i flag is optional, and only needed for conditional generation. Surround the sentence with double quotes and end it with a period (.).
    • Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).
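
For a rough sense of scale, using the KSS audio settings quoted in the issues below (sr: 16000, hop_length: 180):

    timesteps per second ≈ sample rate / hop length = 16000 / 180 ≈ 89
    so -t 240 corresponds to roughly 240 / 89 ≈ 2.7 seconds of audio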

To-do

  • Implement upsampling procedure
  • GMM sampling + loss function
  • Unconditional audio generation
  • TTS synthesis
  • Tensorboard logging
  • Multi-GPU training
  • Primed generation

Implementation authors

License

MIT License

melnet's People

Contributors

leeyoonhyung, seungwonpark


melnet's Issues

Fixes at 2019.10.16

  • Fix "Joonyoung Lee" to "June Young Yi"
  • Enable configuring the audio file extension
  • Add optimizer parameters to each YAML file
  • Make the two default YAML files consistent

RuntimeError: CUDA out of memory.

After doing
!python trainer.py -c ./config/kss.yaml -n testrun -t 1 -b 30 -s testtts
(take into account that I don't know what most of these parameters are for)

I get:

100% 12854/12854 [00:00<00:00, 161586.27it/s]
100% 12854/12854 [00:00<00:00, 280440.60it/s]
Train data loader:   0% 0/368 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/content/MelNet/utils/train.py", line 98, in train
    audio_lengths.cuda(non_blocking=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MelNet/model/tts.py", line 143, in forward
    h_t, h_f, h_c = layer(h_t, h_f, h_c, audio_lengths)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MelNet/model/rnn.py", line 75, in forward
    h_t_yz, _ = self.t_delay_RNN_yz(h_t_yz_temp)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py", line 564, in forward
    return self.forward_tensor(input, hx)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py", line 543, in forward_tensor
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py", line 526, in forward_impl
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: CUDA out of memory. Tried to allocate 1.35 GiB (GPU 0; 14.76 GiB total capacity; 12.76 GiB already allocated; 1.01 GiB free; 141.28 MiB cached)
Train data loader:   0% 0/368 [01:04<?, ?it/s]

MelNet cannot work as an independent vocoder currently, right?

Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).

It seems we cannot feed mel spectrograms in directly to produce WAV files, the way MelGAN does. Is that what you mean?

inference.py: error: argument -t/--timestep: invalid int value

I have a problem figuring out what a valid timestep number is, because there is almost no information about it. A general guide to what the different variables mean (batch_size, timestep, etc.) would be nice, along with the general range of numbers allowed.

For example, this does not work: python inference.py -c [E:\Programming stuff\AI\AI music\MelNet\MelNet-master\config\blizzard.yaml] -p [E:\Programming stuff\AI\AI music\MelNet\MelNet-master\config\inference.yaml] -t [240] -n [result]

Thanks in advance for your help.

Training Pipeline + Steps for training TTS

Hi,
Thanks for this clean and great implementation of MelNet.
I'm a beginner in speech synthesis, so kindly guide me through the steps for training MelNet for TTS:
What I know/assume:

  • Training will be done separately for each tier; for TTS, we'll use the tier flag set to 1 and the tts flag set to True.
  • For subsequent tiers, we will set the tier flag to 2, 3, 4, 5, 6 respectively and the tts flag to False.
  • Finally, we will put the checkpoints for each tier in inference.yaml and pass it to the MelNet class for prediction. (In commands, something like the sketch below.)
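
A sketch of that assumed pipeline in commands, using the trainer invocation from the README (batch sizes and run names are placeholders, and the -s value is arbitrary, since per the README warning any value toggles it True):

    # Tier 1, trained as a TTS tier
    python trainer.py -c config/kss.yaml -n tts_tier1 -t 1 -b 8 -s true
    # Tiers 2 through 6, trained unconditionally (omit -s entirely)
    python trainer.py -c config/kss.yaml -n tier2 -t 2 -b 6
    python trainer.py -c config/kss.yaml -n tier3 -t 3 -b 4
    ...
    python trainer.py -c config/kss.yaml -n tier6 -t 6 -b 1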

Therefore I have some questions:

  • Can you provide/confirm the steps to train multiple tiers for the TTS option?

  • Are we supposed to train TTS (with the --tts flag set to True) while keeping tier number = 1?

  • What do you mean by this in README.md:
    The -s flag is a boolean for determining whether to train a TTS tier. Since a TTS tier only differs at tier 1, this flag is ignored when [tier number] != 0.

    • And where is this condition in the code that you referred to here: [tier number] != 0?
    • I assume this means we should ignore the tts flag in case the tier number > 2?
  • What is the difference between the tts arg for the trainer and the tier number in the config file (YAML)? Should they be the same? If not, what is the difference?

  • How do we know that our model (for each tier) has converged? What is the minimum train/test loss value we should achieve? What was your training time, and on what GPU?

  • Lastly, can we generate mel outputs from the individually trained tier models? For instance, if we have the TTS model plus some consecutive tiers, can we run inference on the output to check training performance?

RuntimeError: Error(s) in loading state_dict for DataParallel

I have the following problem when doing inference :(

  File "inference.py", line 36, in <module>
    model.load_tiers()
  File "/content/MelNet/model/model.py", line 96, in load_tiers
    self.tiers[idx+1].load_state_dict(checkpoint['model'])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
	Missing key(s) in state_dict: "module.W_t.weight", "module.W_t.bias", "module.layers.0.rnn_x.weight_ih_l0", "module.layers.0.rnn_x.weight_hh_l0", "module.layers.0.rnn_x.bias_ih_l0", "module.layers.0.rnn_x.bias_hh_l0", "module.layers.0.rnn_x.weight_ih_l0_reverse", "module.layers.0.rnn_x.weight_hh_l0_reverse", "module.layers.0.rnn_x.bias_ih_l0_reverse", "module.layers.0.rnn_x.bias_hh_l0_reverse", "module.layers.0.rnn_y.weight_ih_l0", "module.layers.0.rnn_y.weight_hh_l0", "module.layers.0.rnn_y.bias_ih_l0", "module.layers.0.rnn_y.bias_hh_l0", "module.layers.0.rnn_y.weight_ih_l0_reverse", "module.layers.0.rnn_y.weight_hh_l0_reverse", "module.layers.0.rnn_y.bias_ih_l0_reverse", "module.layers.0.rnn_y.bias_hh_l0_reverse", "module.layers.0.W.weight", "module.layers.0.W.bias", "module.layers.1.rnn_x.weight_ih_l0", "module.layers.1.rnn_x.weight_hh_l0", "module.layers.1.rnn_x.bias_ih_l0", "module.layers.1.rnn_x.bias_hh_l0", "module.layers.1.rnn_x.weight_ih_l0_reverse", "module.layers.1.rnn_x.weight_hh_l0_reverse", "module.layers.1.rnn_x.bias_ih_l0_reverse", "module.layers.1.rnn_x.bias_hh_l0_reverse", "module.layers.1.rnn_y.weight_ih_l0", "module.layers.1.rnn_y.weight_hh_l0", "module.layers.1.rnn_y.bias_ih_l0", "module.layers.1.rnn_y.bias_hh_l0", "module.layers.1.rnn_y.weight_ih_l0_reverse", "module.layers.1.rnn_y.weight_hh_l0_reverse", "module.layers.1.rnn_y.bias_ih_l0_reverse", "module.layers.1.rnn_y.bias_hh_l0_reverse", "module.layers.1.W.weight", "module.layers.1.W.bias", "module.layers.2.rnn_x.weight_ih_l0", "module.layers.2.rnn_x.weight_hh_l0", "module.layers.2.rnn_x.bias_ih_l0", "module.layers.2.rnn_x.bias_hh_l0", "module.layers.2.rnn_x.weight_ih_l0_reverse", "module.layers.2.rnn_x.weight_hh_l0_reverse", "module.layers.2.rnn_x.bias_ih_l0_reverse", "module.layers.2.rnn_x.bias_hh_l0_reverse", "module.layers.2.rnn_y.weight_ih_l0", "module.layers.2.rnn_y.weight_hh_l0", "module.layers.2.rnn_y.bias_ih_l0", "module.layers.2.rnn_y.bias_hh_l0", "module.layers.2.rnn_y.weight_ih_l0_reverse", "module.layers.2.rnn_y.weight_hh_l0_reverse", "module.layers.2.rnn_y.bias_ih_l0_reverse", "module.layers.2.rnn_y.bias_hh_l0_reverse", "module.layers.2.W.weight", "module.layers.2.W.bias", "module.layers.3.rnn_x.weight_ih_l0", "module.layers.3.rnn_x.weight_hh_l0", "module.layers.3.rnn_x.bias_ih_l0", "module.layers.3.rnn_x.bias_hh_l0", "module.layers.3.rnn_x.weight_ih_l0_reverse", "module.layers.3.rnn_x.weight_hh_l0_reverse", "module.layers.3.rnn_x.bias_ih_l0_reverse", "module.layers.3.rnn_x.bias_hh_l0_reverse", "module.layers.3.rnn_y.weight_ih_l0", "module.layers.3.rnn_y.weight_hh_l0", "module.layers.3.rnn_y.bias_ih_l0", "module.layers.3.rnn_y.bias_hh_l0", "module.layers.3.rnn_y.weight_ih_l0_reverse", "module.layers.3.rnn_y.weight_hh_l0_reverse", "module.layers.3.rnn_y.bias_ih_l0_reverse", "module.layers.3.rnn_y.bias_hh_l0_reverse", "module.layers.3.W.weight", "module.layers.3.W.bias", "module.layers.4.rnn_x.weight_ih_l0", "module.layers.4.rnn_x.weight_hh_l0", "module.layers.4.rnn_x.bias_ih_l0", "module.layers.4.rnn_x.bias_hh_l0", "module.layers.4.rnn_x.weight_ih_l0_reverse", "module.layers.4.rnn_x.weight_hh_l0_reverse", "module.layers.4.rnn_x.bias_ih_l0_reverse", "module.layers.4.rnn_x.bias_hh_l0_reverse", "module.layers.4.rnn_y.weight_ih_l0", "module.layers.4.rnn_y.weight_hh_l0", "module.layers.4.rnn_y.bias_ih_l0", "module.layers.4.rnn_y.bias_hh_l0", "module.layers.4.rnn_y.weight_ih_l0_reverse", "module.layers.4.rnn_y.weight_hh_l0_reverse", "module.layers.4.rnn_y.bias_ih_l0_reverse", 
"module.layers.4.rnn_y.bias_hh_l0_reverse", "module.layers.4.W.weight", "module.layers.4.W.bias". 
	Unexpected key(s) in state_dict: "module.W_t_0.weight", "module.W_t_0.bias", "module.W_f_0.weight", "module.W_f_0.bias", "module.W_c_0.weight", "module.W_c_0.bias", "module.embedding_text.weight", "module.text_lstm.weight_ih_l0", "module.text_lstm.weight_hh_l0", "module.text_lstm.bias_ih_l0", "module.text_lstm.bias_hh_l0", "module.text_lstm.weight_ih_l0_reverse", "module.text_lstm.weight_hh_l0_reverse", "module.text_lstm.bias_ih_l0_reverse", "module.text_lstm.bias_hh_l0_reverse", "module.attention.rnn_cell.weight_ih", "module.attention.rnn_cell.weight_hh", "module.attention.rnn_cell.bias_ih", "module.attention.rnn_cell.bias_hh", "module.attention.W_g.weight", "module.attention.W_g.bias", "module.layers.5.t_delay_RNN_x.weight_ih_l0", "module.layers.5.t_delay_RNN_x.weight_hh_l0", "module.layers.5.t_delay_RNN_x.bias_ih_l0", "module.layers.5.t_delay_RNN_x.bias_hh_l0", "module.layers.5.t_delay_RNN_yz.weight_ih_l0", "module.layers.5.t_delay_RNN_yz.weight_hh_l0", "module.layers.5.t_delay_RNN_yz.bias_ih_l0", "module.layers.5.t_delay_RNN_yz.bias_hh_l0", "module.layers.5.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.5.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.5.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.5.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.5.c_RNN.weight_ih_l0", "module.layers.5.c_RNN.weight_hh_l0", "module.layers.5.c_RNN.bias_ih_l0", "module.layers.5.c_RNN.bias_hh_l0", "module.layers.5.f_delay_RNN.weight_ih_l0", "module.layers.5.f_delay_RNN.weight_hh_l0", "module.layers.5.f_delay_RNN.bias_ih_l0", "module.layers.5.f_delay_RNN.bias_hh_l0", "module.layers.5.W_t.weight", "module.layers.5.W_t.bias", "module.layers.5.W_c.weight", "module.layers.5.W_c.bias", "module.layers.5.W_f.weight", "module.layers.5.W_f.bias", "module.layers.6.t_delay_RNN_x.weight_ih_l0", "module.layers.6.t_delay_RNN_x.weight_hh_l0", "module.layers.6.t_delay_RNN_x.bias_ih_l0", "module.layers.6.t_delay_RNN_x.bias_hh_l0", "module.layers.6.t_delay_RNN_yz.weight_ih_l0", "module.layers.6.t_delay_RNN_yz.weight_hh_l0", "module.layers.6.t_delay_RNN_yz.bias_ih_l0", "module.layers.6.t_delay_RNN_yz.bias_hh_l0", "module.layers.6.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.6.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.6.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.6.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.6.c_RNN.weight_ih_l0", "module.layers.6.c_RNN.weight_hh_l0", "module.layers.6.c_RNN.bias_ih_l0", "module.layers.6.c_RNN.bias_hh_l0", "module.layers.6.f_delay_RNN.weight_ih_l0", "module.layers.6.f_delay_RNN.weight_hh_l0", "module.layers.6.f_delay_RNN.bias_ih_l0", "module.layers.6.f_delay_RNN.bias_hh_l0", "module.layers.6.W_t.weight", "module.layers.6.W_t.bias", "module.layers.6.W_c.weight", "module.layers.6.W_c.bias", "module.layers.6.W_f.weight", "module.layers.6.W_f.bias", "module.layers.7.t_delay_RNN_x.weight_ih_l0", "module.layers.7.t_delay_RNN_x.weight_hh_l0", "module.layers.7.t_delay_RNN_x.bias_ih_l0", "module.layers.7.t_delay_RNN_x.bias_hh_l0", "module.layers.7.t_delay_RNN_yz.weight_ih_l0", "module.layers.7.t_delay_RNN_yz.weight_hh_l0", "module.layers.7.t_delay_RNN_yz.bias_ih_l0", "module.layers.7.t_delay_RNN_yz.bias_hh_l0", "module.layers.7.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.7.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.7.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.7.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.7.c_RNN.weight_ih_l0", "module.layers.7.c_RNN.weight_hh_l0", "module.layers.7.c_RNN.bias_ih_l0", 
"module.layers.7.c_RNN.bias_hh_l0", "module.layers.7.f_delay_RNN.weight_ih_l0", "module.layers.7.f_delay_RNN.weight_hh_l0", "module.layers.7.f_delay_RNN.bias_ih_l0", "module.layers.7.f_delay_RNN.bias_hh_l0", "module.layers.7.W_t.weight", "module.layers.7.W_t.bias", "module.layers.7.W_c.weight", "module.layers.7.W_c.bias", "module.layers.7.W_f.weight", "module.layers.7.W_f.bias", "module.layers.8.t_delay_RNN_x.weight_ih_l0", "module.layers.8.t_delay_RNN_x.weight_hh_l0", "module.layers.8.t_delay_RNN_x.bias_ih_l0", "module.layers.8.t_delay_RNN_x.bias_hh_l0", "module.layers.8.t_delay_RNN_yz.weight_ih_l0", "module.layers.8.t_delay_RNN_yz.weight_hh_l0", "module.layers.8.t_delay_RNN_yz.bias_ih_l0", "module.layers.8.t_delay_RNN_yz.bias_hh_l0", "module.layers.8.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.8.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.8.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.8.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.8.c_RNN.weight_ih_l0", "module.layers.8.c_RNN.weight_hh_l0", "module.layers.8.c_RNN.bias_ih_l0", "module.layers.8.c_RNN.bias_hh_l0", "module.layers.8.f_delay_RNN.weight_ih_l0", "module.layers.8.f_delay_RNN.weight_hh_l0", "module.layers.8.f_delay_RNN.bias_ih_l0", "module.layers.8.f_delay_RNN.bias_hh_l0", "module.layers.8.W_t.weight", "module.layers.8.W_t.bias", "module.layers.8.W_c.weight", "module.layers.8.W_c.bias", "module.layers.8.W_f.weight", "module.layers.8.W_f.bias", "module.layers.9.t_delay_RNN_x.weight_ih_l0", "module.layers.9.t_delay_RNN_x.weight_hh_l0", "module.layers.9.t_delay_RNN_x.bias_ih_l0", "module.layers.9.t_delay_RNN_x.bias_hh_l0", "module.layers.9.t_delay_RNN_yz.weight_ih_l0", "module.layers.9.t_delay_RNN_yz.weight_hh_l0", "module.layers.9.t_delay_RNN_yz.bias_ih_l0", "module.layers.9.t_delay_RNN_yz.bias_hh_l0", "module.layers.9.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.9.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.9.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.9.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.9.c_RNN.weight_ih_l0", "module.layers.9.c_RNN.weight_hh_l0", "module.layers.9.c_RNN.bias_ih_l0", "module.layers.9.c_RNN.bias_hh_l0", "module.layers.9.f_delay_RNN.weight_ih_l0", "module.layers.9.f_delay_RNN.weight_hh_l0", "module.layers.9.f_delay_RNN.bias_ih_l0", "module.layers.9.f_delay_RNN.bias_hh_l0", "module.layers.9.W_t.weight", "module.layers.9.W_t.bias", "module.layers.9.W_c.weight", "module.layers.9.W_c.bias", "module.layers.9.W_f.weight", "module.layers.9.W_f.bias", "module.layers.10.t_delay_RNN_x.weight_ih_l0", "module.layers.10.t_delay_RNN_x.weight_hh_l0", "module.layers.10.t_delay_RNN_x.bias_ih_l0", "module.layers.10.t_delay_RNN_x.bias_hh_l0", "module.layers.10.t_delay_RNN_yz.weight_ih_l0", "module.layers.10.t_delay_RNN_yz.weight_hh_l0", "module.layers.10.t_delay_RNN_yz.bias_ih_l0", "module.layers.10.t_delay_RNN_yz.bias_hh_l0", "module.layers.10.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.10.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.10.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.10.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.10.c_RNN.weight_ih_l0", "module.layers.10.c_RNN.weight_hh_l0", "module.layers.10.c_RNN.bias_ih_l0", "module.layers.10.c_RNN.bias_hh_l0", "module.layers.10.f_delay_RNN.weight_ih_l0", "module.layers.10.f_delay_RNN.weight_hh_l0", "module.layers.10.f_delay_RNN.bias_ih_l0", "module.layers.10.f_delay_RNN.bias_hh_l0", "module.layers.10.W_t.weight", "module.layers.10.W_t.bias", "module.layers.10.W_c.weight", 
"module.layers.10.W_c.bias", "module.layers.10.W_f.weight", "module.layers.10.W_f.bias", "module.layers.11.t_delay_RNN_x.weight_ih_l0", "module.layers.11.t_delay_RNN_x.weight_hh_l0", "module.layers.11.t_delay_RNN_x.bias_ih_l0", "module.layers.11.t_delay_RNN_x.bias_hh_l0", "module.layers.11.t_delay_RNN_yz.weight_ih_l0", "module.layers.11.t_delay_RNN_yz.weight_hh_l0", "module.layers.11.t_delay_RNN_yz.bias_ih_l0", "module.layers.11.t_delay_RNN_yz.bias_hh_l0", "module.layers.11.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.11.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.11.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.11.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.11.c_RNN.weight_ih_l0", "module.layers.11.c_RNN.weight_hh_l0", "module.layers.11.c_RNN.bias_ih_l0", "module.layers.11.c_RNN.bias_hh_l0", "module.layers.11.f_delay_RNN.weight_ih_l0", "module.layers.11.f_delay_RNN.weight_hh_l0", "module.layers.11.f_delay_RNN.bias_ih_l0", "module.layers.11.f_delay_RNN.bias_hh_l0", "module.layers.11.W_t.weight", "module.layers.11.W_t.bias", "module.layers.11.W_c.weight", "module.layers.11.W_c.bias", "module.layers.11.W_f.weight", "module.layers.11.W_f.bias", "module.layers.0.t_delay_RNN_x.weight_ih_l0", "module.layers.0.t_delay_RNN_x.weight_hh_l0", "module.layers.0.t_delay_RNN_x.bias_ih_l0", "module.layers.0.t_delay_RNN_x.bias_hh_l0", "module.layers.0.t_delay_RNN_yz.weight_ih_l0", "module.layers.0.t_delay_RNN_yz.weight_hh_l0", "module.layers.0.t_delay_RNN_yz.bias_ih_l0", "module.layers.0.t_delay_RNN_yz.bias_hh_l0", "module.layers.0.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.0.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.0.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.0.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.0.c_RNN.weight_ih_l0", "module.layers.0.c_RNN.weight_hh_l0", "module.layers.0.c_RNN.bias_ih_l0", "module.layers.0.c_RNN.bias_hh_l0", "module.layers.0.f_delay_RNN.weight_ih_l0", "module.layers.0.f_delay_RNN.weight_hh_l0", "module.layers.0.f_delay_RNN.bias_ih_l0", "module.layers.0.f_delay_RNN.bias_hh_l0", "module.layers.0.W_t.weight", "module.layers.0.W_t.bias", "module.layers.0.W_c.weight", "module.layers.0.W_c.bias", "module.layers.0.W_f.weight", "module.layers.0.W_f.bias", "module.layers.1.t_delay_RNN_x.weight_ih_l0", "module.layers.1.t_delay_RNN_x.weight_hh_l0", "module.layers.1.t_delay_RNN_x.bias_ih_l0", "module.layers.1.t_delay_RNN_x.bias_hh_l0", "module.layers.1.t_delay_RNN_yz.weight_ih_l0", "module.layers.1.t_delay_RNN_yz.weight_hh_l0", "module.layers.1.t_delay_RNN_yz.bias_ih_l0", "module.layers.1.t_delay_RNN_yz.bias_hh_l0", "module.layers.1.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.1.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.1.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.1.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.1.c_RNN.weight_ih_l0", "module.layers.1.c_RNN.weight_hh_l0", "module.layers.1.c_RNN.bias_ih_l0", "module.layers.1.c_RNN.bias_hh_l0", "module.layers.1.f_delay_RNN.weight_ih_l0", "module.layers.1.f_delay_RNN.weight_hh_l0", "module.layers.1.f_delay_RNN.bias_ih_l0", "module.layers.1.f_delay_RNN.bias_hh_l0", "module.layers.1.W_t.weight", "module.layers.1.W_t.bias", "module.layers.1.W_c.weight", "module.layers.1.W_c.bias", "module.layers.1.W_f.weight", "module.layers.1.W_f.bias", "module.layers.2.t_delay_RNN_x.weight_ih_l0", "module.layers.2.t_delay_RNN_x.weight_hh_l0", "module.layers.2.t_delay_RNN_x.bias_ih_l0", "module.layers.2.t_delay_RNN_x.bias_hh_l0", 
"module.layers.2.t_delay_RNN_yz.weight_ih_l0", "module.layers.2.t_delay_RNN_yz.weight_hh_l0", "module.layers.2.t_delay_RNN_yz.bias_ih_l0", "module.layers.2.t_delay_RNN_yz.bias_hh_l0", "module.layers.2.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.2.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.2.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.2.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.2.c_RNN.weight_ih_l0", "module.layers.2.c_RNN.weight_hh_l0", "module.layers.2.c_RNN.bias_ih_l0", "module.layers.2.c_RNN.bias_hh_l0", "module.layers.2.f_delay_RNN.weight_ih_l0", "module.layers.2.f_delay_RNN.weight_hh_l0", "module.layers.2.f_delay_RNN.bias_ih_l0", "module.layers.2.f_delay_RNN.bias_hh_l0", "module.layers.2.W_t.weight", "module.layers.2.W_t.bias", "module.layers.2.W_c.weight", "module.layers.2.W_c.bias", "module.layers.2.W_f.weight", "module.layers.2.W_f.bias", "module.layers.3.t_delay_RNN_x.weight_ih_l0", "module.layers.3.t_delay_RNN_x.weight_hh_l0", "module.layers.3.t_delay_RNN_x.bias_ih_l0", "module.layers.3.t_delay_RNN_x.bias_hh_l0", "module.layers.3.t_delay_RNN_yz.weight_ih_l0", "module.layers.3.t_delay_RNN_yz.weight_hh_l0", "module.layers.3.t_delay_RNN_yz.bias_ih_l0", "module.layers.3.t_delay_RNN_yz.bias_hh_l0", "module.layers.3.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.3.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.3.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.3.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.3.c_RNN.weight_ih_l0", "module.layers.3.c_RNN.weight_hh_l0", "module.layers.3.c_RNN.bias_ih_l0", "module.layers.3.c_RNN.bias_hh_l0", "module.layers.3.f_delay_RNN.weight_ih_l0", "module.layers.3.f_delay_RNN.weight_hh_l0", "module.layers.3.f_delay_RNN.bias_ih_l0", "module.layers.3.f_delay_RNN.bias_hh_l0", "module.layers.3.W_t.weight", "module.layers.3.W_t.bias", "module.layers.3.W_c.weight", "module.layers.3.W_c.bias", "module.layers.3.W_f.weight", "module.layers.3.W_f.bias", "module.layers.4.t_delay_RNN_x.weight_ih_l0", "module.layers.4.t_delay_RNN_x.weight_hh_l0", "module.layers.4.t_delay_RNN_x.bias_ih_l0", "module.layers.4.t_delay_RNN_x.bias_hh_l0", "module.layers.4.t_delay_RNN_yz.weight_ih_l0", "module.layers.4.t_delay_RNN_yz.weight_hh_l0", "module.layers.4.t_delay_RNN_yz.bias_ih_l0", "module.layers.4.t_delay_RNN_yz.bias_hh_l0", "module.layers.4.t_delay_RNN_yz.weight_ih_l0_reverse", "module.layers.4.t_delay_RNN_yz.weight_hh_l0_reverse", "module.layers.4.t_delay_RNN_yz.bias_ih_l0_reverse", "module.layers.4.t_delay_RNN_yz.bias_hh_l0_reverse", "module.layers.4.c_RNN.weight_ih_l0", "module.layers.4.c_RNN.weight_hh_l0", "module.layers.4.c_RNN.bias_ih_l0", "module.layers.4.c_RNN.bias_hh_l0", "module.layers.4.f_delay_RNN.weight_ih_l0", "module.layers.4.f_delay_RNN.weight_hh_l0", "module.layers.4.f_delay_RNN.bias_ih_l0", "module.layers.4.f_delay_RNN.bias_hh_l0", "module.layers.4.W_t.weight", "module.layers.4.W_t.bias", "module.layers.4.W_c.weight", "module.layers.4.W_c.bias", "module.layers.4.W_f.weight", "module.layers.4.W_f.bias". 

config parameter

Hi, I wanted to train MelNet on my own dataset.
There are some audio settings that I still don't understand, since I'm very new to the signal processing/speech field. Can someone explain them or give me a reference so I can understand what these settings mean:

audio:
  sr: 16000
  duration: 6.0
  n_mels: 180
  hop_length: 180
  win_length: 1080
  n_fft: 1080
  num_freq: 541
  ref_level_db: 20.0
  min_level_db: -80.0

Thanks in advance
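
These settings correspond to the standard short-time Fourier transform and mel filterbank parameters used by libraries such as librosa. A minimal sketch of how they would typically be applied (an illustration, not necessarily this repository's exact preprocessing):

    import librosa

    sr = 16000         # sample rate: audio samples per second
    n_fft = 1080       # FFT size; gives num_freq = n_fft // 2 + 1 = 541 linear bins
    hop_length = 180   # samples between frames; sets the spectrogram frame rate
    win_length = 1080  # analysis window length, here equal to n_fft
    n_mels = 180       # number of mel filterbank channels (spectrogram height)

    # duration: 6.0 limits each training clip to six seconds of audio
    y, _ = librosa.load('example.wav', sr=sr, duration=6.0)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                       hop_length=hop_length,
                                       win_length=win_length, n_mels=n_mels)
    S_db = librosa.power_to_db(S)
    # ref_level_db (20.0) and min_level_db (-80.0) are then typically used to
    # shift S_db and clip it into a normalized [0, 1] range.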
