basveeling / wavenet

Keras WaveNet implementation

Home Page: https://soundcloud.com/basveeling/wavenet-sample

Languages: Python 98.71%, Shell 1.29%

wavenet's Introduction

WaveNet implementation in Keras

Based on https://deepmind.com/blog/wavenet-generative-model-raw-audio/ and https://arxiv.org/pdf/1609.03499.pdf.

Listen to a sample 🎶!

~~Generate your own samples: $ KERAS_BACKEND=theano python2 wavenet.py predict with models/run_20160920_120916/config.json predict_seconds=1~~

EDIT: The pretrained model had to be removed from the repository because it was no longer compatible with recent changes.

Installation:

Activate a new python2 virtualenv (recommended):

pip install virtualenv
mkdir ~/virtualenvs && cd ~/virtualenvs
virtualenv wavenet
source wavenet/bin/activate

Clone and install requirements.

cd ~
git clone https://github.com/basveeling/wavenet.git
cd wavenet
pip install -r requirements.txt

Using the TensorFlow backend is not recommended at this time; see this issue.

Dependencies:

Sacred is used for managing training and sampling. Take a look at the Sacred documentation for more information.
This implementation does not currently support Python 3.

Sampling:

Once the first model checkpoint is created, you can start sampling.

Run: $ KERAS_BACKEND=theano python2 wavenet.py predict with models/<your_run_folder>/config.json predict_seconds=1

The latest model checkpoint will be retrieved and used for sampling. The sample is streamed to [run_folder]/samples; you can start listening as soon as the first sample is generated.

Sampling options:

predict_seconds: float. Number of seconds to sample.
sample_argmax: True or False. Always take the argmax instead of sampling.
sample_temperature: None or float. Controls the sampling temperature: 1.0 for the original distribution, < 1.0 for more exploitation (less exploration), > 1.0 for more exploration.
seed: int. Controls the seed for the sampling procedure.
predict_initial_input: string. Path to a wav file whose first fragment_length samples are used as initial input.
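The effect of sample_argmax and sample_temperature can be pictured with a small standalone sketch. It assumes the model outputs a softmax distribution over the nb_output_bins classes at each step and that a temperature T is applied in the usual way, by rescaling log-probabilities by 1/T and renormalising. sample_bin is a hypothetical helper, not a function from wavenet.py, and it expects a float temperature rather than None.

```python
import math
import random


def sample_bin(probs, temperature=1.0, argmax=False, rng=random):
    """Pick an output bin index from a softmax distribution (hypothetical helper).

    probs: probabilities over the output bins, summing to 1.
    temperature: 1.0 keeps the distribution, < 1.0 sharpens it, > 1.0 flattens it.
    argmax: if True, always return the most likely bin (like sample_argmax=True).
    rng: anything with a random() method, e.g. random.Random(seed).
    """
    if argmax:
        return max(range(len(probs)), key=lambda i: probs[i])
    # Rescale log-probabilities by 1/T; zero-probability bins stay impossible.
    logits = [math.log(p) / temperature if p > 0 else float('-inf') for p in probs]
    # Subtract the largest logit before exponentiating, for numerical stability.
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    # Inverse-CDF sampling over the renormalised weights.
    r = rng.random() * total
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point round-off
```

For example, with probs = [0.1, 0.2, 0.7], sample_bin(probs, argmax=True) always returns 2, and a temperature of 0.01 makes random draws behave almost exactly like the argmax.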

e.g.: $ KERAS_BACKEND=theano python2 wavenet.py predict with models/[run_folder]/config.json predict_seconds=1

Training:

$ KERAS_BACKEND=theano python2 wavenet.py

Or, for a smaller network (fewer channels per layer): $ KERAS_BACKEND=theano python2 wavenet.py with small

VCTK:

In order to use the VCTK dataset, first download the dataset by running vctk/download_vctk.sh.

Training is done with: $ KERAS_BACKEND=theano python2 wavenet.py with vctkdata

For a smaller network: $ KERAS_BACKEND=theano python2 wavenet.py with vctkdata small

Options:

Train with different configurations: $ KERAS_BACKEND=theano python2 wavenet.py with 'option=value' 'option2=value'

Available options:

  batch_size = 16
  data_dir = 'data'
  data_dir_structure = 'flat'
  debug = False
  desired_sample_rate = 4410
  dilation_depth = 9
  early_stopping_patience = 20
  fragment_length = 1152
  fragment_stride = 128
  keras_verbose = 1
  learn_all_outputs = True
  nb_epoch = 1000
  nb_filters = 256
  nb_output_bins = 256
  nb_stacks = 1
  predict_initial_input = ''
  predict_seconds = 1
  predict_use_softmax_as_input = False
  random_train_batches = False
  randomize_batch_order = True
  run_dir = None
  sample_argmax = False
  sample_temperature = 1
  seed = 173213366
  test_factor = 0.1
  train_only_in_receptive_field = True
  use_bias = False
  use_skip_connections = True
  use_ulaw = True
  optimizer:
    decay = 0.0
    epsilon = None
    lr = 0.001
    momentum = 0.9
    nesterov = True
    optimizer = 'sgd'
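The use_ulaw and nb_output_bins options presumably refer to the μ-law companding and 256-way quantisation described in the WaveNet paper. A minimal sketch of that transform follows, assuming float samples in [-1, 1] and μ = nb_output_bins - 1. ulaw_encode and ulaw_decode are hypothetical names, and the repository's own preprocessing may differ in scaling or rounding.

```python
import math


def ulaw_encode(x, nb_output_bins=256):
    """Map a float sample in [-1, 1] to an integer bin in [0, nb_output_bins - 1]."""
    mu = nb_output_bins - 1
    # Companding: F(x) = sign(x) * ln(1 + mu*|x|) / ln(1 + mu), still in [-1, 1].
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift to [0, 1], scale to [0, mu] and round to the nearest bin.
    return int(round((y + 1) / 2 * mu))


def ulaw_decode(bin_index, nb_output_bins=256):
    """Invert ulaw_encode, up to quantisation error."""
    mu = nb_output_bins - 1
    # Map the bin back to a companded value in [-1, 1].
    y = 2.0 * bin_index / mu - 1
    # Inverse companding: x = sign(y) * ((1 + mu)^|y| - 1) / mu.
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)
```

For example, ulaw_encode(-1.0) is 0 and ulaw_encode(1.0) is 255; the logarithmic curve spends more of the 256 bins on low-amplitude samples near zero than a linear quantiser would.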

Using your own training data:

Create a new data directory with a train folder and a test folder in it. All wave files in these folders will be used as data.
- Caveat: Make sure your wav files are supported by scipy.io.wavfile.read(): e.g. don't use 24-bit wav files, and remove metadata.
Run with: $ python2 wavenet.py with 'data_dir=your_data_dir_name'
Test preprocessing results with: $ python2 wavenet.py test_preprocess with 'data_dir=your_data_dir_name'
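One way to catch files that may trip up scipy.io.wavfile.read() before training is to scan the data directory with Python's standard-library wave module. This is a minimal sketch: it assumes the data_dir/train and data_dir/test layout described above and only flags 24-bit files and files that wave cannot parse at all. find_problem_wavs is a hypothetical helper, not part of the repository.

```python
import os
import wave


def find_problem_wavs(data_dir):
    """Return (path, reason) pairs for .wav files under data_dir/train and
    data_dir/test that have 24-bit samples or cannot be parsed by `wave`."""
    problems = []
    for split in ('train', 'test'):
        split_dir = os.path.join(data_dir, split)
        for name in sorted(os.listdir(split_dir)):
            if not name.lower().endswith('.wav'):
                continue
            path = os.path.join(split_dir, name)
            try:
                w = wave.open(path, 'rb')
            except (wave.Error, EOFError) as exc:
                # Not a RIFF/WAVE file, truncated, or an unsupported format.
                problems.append((path, 'unreadable: %s' % exc))
                continue
            try:
                # getsampwidth() is in bytes: 3 bytes per sample means 24-bit audio.
                if w.getsampwidth() == 3:
                    problems.append((path, '24-bit samples'))
            finally:
                w.close()
    return problems
```

Running print(find_problem_wavs('your_data_dir_name')) lists the flagged files. An empty list does not guarantee that every file is fine: wave skips chunks it does not recognise, so files with unusual metadata can still pass this check.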

Todo:

Local conditioning
Global conditioning
Training on CSTR VCTK Corpus
CLI option to pick a wave file for the sample generation initial input. Done: see predict_initial_input.
Fully randomized training batches
Soft targets: by convolving a Gaussian kernel over the one-hot targets, the network trains faster.
Decaying soft targets: the stdev of the Gaussian kernel should slowly decay.
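The soft-target idea can be sketched as follows: convolving a Gaussian kernel over a one-hot vector for bin k yields a Gaussian bump centred on k, which is then renormalised. This minimal illustration assumes the kernel is simply truncated at the edges of the nb_output_bins range; soft_target is a hypothetical name, although a train_with_soft_target_stdev entry does appear in the configuration dumps on this page.

```python
import math


def soft_target(k, nb_output_bins=256, stdev=1.0):
    """Gaussian-smoothed version of the one-hot target for bin k (sums to 1)."""
    # Unnormalised Gaussian centred on the true bin k.
    weights = [math.exp(-((i - k) ** 2) / (2.0 * stdev ** 2))
               for i in range(nb_output_bins)]
    total = sum(weights)
    # Renormalise so the result is a valid categorical target.
    return [w / total for w in weights]
```

A decaying variant, as in the second item, would pass a smaller stdev as training progresses; as stdev approaches zero the target approaches the original one-hot vector.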

Uncertainties from paper:

It's unclear whether the model is trained to predict sample t+1 for every input sample, or only for the outputs for which sample t - receptive_field was part of the input (i.e. whose full receptive field lies within the input). Right now the code does the latter.
There is no mention of weight decay or batch normalization in the paper. Perhaps these are not needed given enough data?

Note on computational cost:

The WaveNet model is quite expensive to train and sample from. We can, however, trade some accuracy and fidelity for lower computational cost by lowering the sampling rate, the number of stacks and the number of channels per layer.

For a downsized model (4000 Hz vs. 16000 Hz sampling rate, 16 filters vs. 256, 2 stacks vs. ??):

A Tesla K80 needs about 4 minutes to generate one second of audio.
A recent MacBook Pro needs about 15 minutes. DeepMind has reported that generating one second of audio with their model takes about 90 minutes.
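Sampling is this slow because generation is autoregressive: in a straightforward implementation each output sample needs its own forward pass, so one second of audio at desired_sample_rate takes desired_sample_rate sequential steps (the sampling progress bar in one of the issues on this page counts 0/4000 for one second at 4000 Hz). The back-of-the-envelope arithmetic below converts the timings above into milliseconds per step; it is not a benchmark, and the 16 kHz rate for DeepMind's model is an assumption.

```python
def ms_per_step(minutes_per_audio_second, sample_rate):
    """Average wall-clock milliseconds spent per generated sample."""
    return minutes_per_audio_second * 60.0 * 1000.0 / sample_rate


# Downsized model at 4000 Hz, using the timings quoted above.
k80 = ms_per_step(4, 4000)          # Tesla K80: 60.0 ms per sample
macbook = ms_per_step(15, 4000)     # MacBook Pro: 225.0 ms per sample
# DeepMind's reported 90 minutes per second, assuming 16 kHz output.
deepmind = ms_per_step(90, 16000)   # 337.5 ms per sample
```

Halving the sampling rate halves the number of sequential steps per second of audio, which is why lowering desired_sample_rate is such an effective lever.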

Disclaimer

This is a re-implementation of the model described in the WaveNet paper by Google DeepMind. This repository is not associated with Google DeepMind.


wavenet's Issues

Generation step produces noise

This is great. It's very cool to have a WaveNet implementation to play with, and this project in particular is really beautifully packaged and presented. So, first, thank you!

As for the issue: I can't get the basic "hello world" generation step to work. Using your recommended command --

python wavenet.py predict with models/run_2016-09-14_11:32:09/config.json predict_seconds=1

-- everything runs fine, with no errors, and I do get a wav file, but its content is just noise. I've tried several times with identical results; here's an example.

I'm running on Ubuntu 14.04 with a Titan X GPU.

Should I be expecting output similar to the sample wav included in the project? Any ideas for what I might try?

I can successfully run this code but don't get the right result

I downloaded the VCTK dataset, which has a separate folder for each speaker. I copied the data of one speaker into the "train" and "test" directories (which have to be created first with mkdir train and mkdir test).

Then I ran:
python wavenet.py with 'data_dir=/vol/vssp/datasets/audio/dcase2016/vctk_data/VCTK-Corpus/wav48' small adam

(Before that, I made this change:
#def small(desired_sample_rate):
def small():
    desired_sample_rate = 4000
)

And then I got the output below, but I cannot generate the expected wav at test time. Is anything wrong with my training process?
WARNING - root - Changed type of config entry "optimizer.epsilon" from NoneType to float
INFO - wavenet - Running command 'main'
WARNING - wavenet - No observers have been added to this run
INFO - wavenet - Started
Configuration (modified, added, typechanged):
batch_size = 16
data_dir = '/vol/vssp/datasets/audio/dcase2016/vctk_data/VCTK-Corpus/wav48'
data_dir_structure = 'flat'
debug = False
desired_sample_rate = 4000
dilation_depth = 8
early_stopping_patience = 20
final_l2 = 0
fragment_length = 640
fragment_stride = 400
keras_verbose = 1
learn_all_outputs = True
nb_epoch = 1000
nb_filters = 16
nb_output_bins = 256
nb_stacks = 1
predict_initial_input = ''
predict_seconds = 1
predict_use_softmax_as_input = False
random_train_batches = False
randomize_batch_order = True
res_l2 = 0
run_dir = None
sample_argmax = False
sample_temperature = 1.0
seed = 348190421
test_factor = 0.1
train_only_in_receptive_field = True
train_with_soft_target_stdev = None
use_bias = False
use_skip_connections = True
use_ulaw = True
optimizer:
decay = 0.0
epsilon = 1e-08
lr = 0.001
momentum = 0.9
nesterov = True
optimizer = 'adam'
INFO - main - Running with seed 348190421
INFO - main - Loading data...
INFO - main - Building model...
INFO - build_model - Receptive Field: 512 (128ms)
(model.summary() output omitted)
INFO - main - None
INFO - main - Compiling Model...
INFO - main - Starting Training...
Epoch 1/1000
9632/9632 [==============================] - 196s - loss: 4.3700 - categorical_accuracy: 0.0332 - categorical_mean_squared_error: 2048.7681 - val_loss: 4.3286 - val_categorical_accuracy: 0.0515 - val_categorical_mean_squared_error: 3280.6926
Epoch 2/1000
9632/9632 [==============================] - 198s - loss: 3.8852 - categorical_accuracy: 0.0808 - categorical_mean_squared_error: 2505.4959 - val_loss: 4.2130 - val_categorical_accuracy: 0.0652 - val_categorical_mean_squared_error: 3624.2373
Epoch 3/1000

Python Version

So this runs with Python 2, right?
I had quite a hard time actually figuring that out :D maybe note it somewhere :)

What do I need to try other languages, for example Russian or Spanish?

Generate voice from text file

Is there any way to use the generated model to generate voice from a text file?

Performance benchmarks

Hi! I'm training a reasonably small network with 111888 parameters (2 stacks, dilation depth 10 and all other settings left untouched) on a single-speaker dataset (77MB of 16-bit 44.1kHz WAV files = 32784 batches of 16). The ETA for one epoch is 42682s, so almost 12 hours. I'm using a Tesla K80; is that training time in line with your experience? I know for sure that I am processing 100K samples in half a second with a different implementation (based on TensorFlow). When we say the batch size is 16, are these 16 fragments of 4223 samples = 67568 samples? If that's the case, then I am processing 67K samples in about 30 seconds. Also, is the stride a way of saying that our fragments should overlap by 128 samples when the input is split?
Cheers
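The fragment_length / fragment_stride question above can be pictured with a sliding window. This minimal sketch assumes each fragment holds fragment_length samples and a new fragment starts every fragment_stride samples, so consecutive fragments would overlap by fragment_length - fragment_stride samples (1024 with the defaults 1152 and 128), not by fragment_stride. Whether the repository's dataset code slices audio exactly this way is an assumption, and fragment_starts is a hypothetical helper.

```python
def fragment_starts(n_samples, fragment_length=1152, fragment_stride=128):
    """Start offsets of every complete fragment in a clip of n_samples samples."""
    # The last start must leave room for a full fragment_length window.
    return list(range(0, n_samples - fragment_length + 1, fragment_stride))
```

Under these assumptions, a one-second clip at the default 4410 Hz gives len(fragment_starts(4410)) == 26 fragments, and a clip shorter than fragment_length gives none.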

Can't train network, basveeling's branch of Keras didn't help

I wanted to train the network on my own .wav files. I downloaded all the .py files, put the .wav files in the data/test and data/train directories, and ran wavenet.py without any arguments (as is shown in the readme). However, it apparently tried to predict using a nonexistent model:

WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
Using Theano backend.
INFO - wavenet - Running command 'main'
WARNING - wavenet - No observers have been added to this run
INFO - wavenet - Started
Configuration (modified, added, typechanged):
  batch_size = 16
  data_dir = 'test'
  data_dir_structure = 'flat'
  debug = False
  desired_sample_rate = 4410
  dilation_depth = 9
  early_stopping_patience = 20
  final_l2 = 0
  fragment_length = 1152
  fragment_stride = 128
  keras_verbose = 1
  learn_all_outputs = True
  nb_epoch = 1000
  nb_filters = 256
  nb_output_bins = 256
  nb_stacks = 1
  predict_initial_input = ''
  predict_seconds = 1
  predict_use_softmax_as_input = False
  random_train_batches = False
  randomize_batch_order = True
  res_l2 = 0
  run_dir = None
  sample_argmax = False
  sample_temperature = 1.0
  seed = 476062113
  test_factor = 0.1
  train_only_in_receptive_field = True
  train_with_soft_target_stdev = None
  use_bias = False
  use_skip_connections = True
  use_ulaw = True
  optimizer:
    decay = 0.0
    epsilon = None
    lr = 0.001
    momentum = 0.9
    nesterov = True
    optimizer = 'sgd'
INFO - main - Running with seed 476062113
ERROR - wavenet - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
  File "C:\Python34\lib\site-packages\wrapt\wrappers.py", line 522, in __call__
    args, kwargs)
  File "wavenet.py", line 471, in main
    os.mkdir(run_dir)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'models\\run_20170123_174859'

(the last line says it can't find the path) After a bit of trial and error, I decided to install basveeling's branch of Keras (I installed the official one previously). However, that only worsened the situation:

WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
Using Theano backend.
Traceback (most recent call last):
  File "wavenet.py", line 23, in <module>
    from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, CSVLogger
ImportError: cannot import name 'ReduceLROnPlateau'

I'm using Windows 7 64-bit without SP1. My Python version is Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32.

virtualenv issue

Currently I am trying to use WaveNet. My OS is Windows 10 and I've installed Python 2.7.18 and 3.5.3. If I'm trying to install the requirements (in the virtualenv) via pip install -r requirements.txt, I can't install tensorflow-gpu 1.8.0, because you can only install this with Python versions 3.5-3.7. Then I tried this with pip3 install -r requirements.txt. This works for me. If I now test python wavenet.py, it throws this error:

Traceback (most recent call last):
  File "wavenet.py", line 9, in <module>
    import keras.backend as K
ImportError: No module named keras.backend

The requirements are installed with Python 3 and if I'm trying to execute wavenet.py with Python 2, it doesn't work. Does somebody know how to fix my problem? Thanks in advance, regards!

ImportError: cannot import name ReduceLROnPlateau

ReduceLROnPlateau seems to be in https://github.com/basveeling/keras/tree/callbacks but not https://github.com/basveeling/keras/tree/wavenet

AssertionError in tensorflow_backend.py

I'm running Ubuntu 16.04.1, CUDA 8.0.26, and CudNN 5.1.5 following the Python 2 directions here

wavenet$ python wavenet.py predict with models/run_2016-09-14_11:32:09/config.json predict_seconds=1
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
WARNING - root - Changed type of config entry "run_dir" from NoneType to unicode
WARNING - root - Changed type of config entry "optimizer.epsilon" from NoneType to float
INFO - wavenet - Running command 'predict'
WARNING - wavenet - No observers have been added to this run
INFO - wavenet - Started
INFO - predict - Using checkpoint from epoch: 37
INFO - predict - Saving to "models/run_2016-09-14_11:32:09/samples/sample_epoch-00037_01s__sample_seed-920639969.wav"
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:03:00.0
Total memory: 11.90GiB
Free memory: 11.38GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:03:00.0)
ERROR - wavenet - Failed after 0:00:01!
Traceback (most recent calls WITHOUT Sacred internals):
  File "wavenet.py", line 224, in predict
    model = build_model()
  File "wavenet.py", line 158, in build_model
    out, skip_out = residual_block(out)
  File "wavenet.py", line 141, in residual_block
    name='dilated_conv_%d_tanh_s%d' % (2 ** i, s), activation='tanh')(x)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 514, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 149, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/convolutional.py", line 655, in call
    filter_dilation=self.atrous_rate)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1575, in conv2d
    assert filter_dilation[0] == filter_dilation[1]
AssertionError

Fix: Skip connections have separate 1x1 convolutions from the residual output.

After some correspondence with the authors, it turns out that the skip connections are parameterized by a 1x1 convolution separate from the one that goes into the add block. This should be an easy fix, and it could explain the poor training results so far.

OOM when allocating tensor

I have a 12GB GPU but attempting to train anything with the default settings produces an OOM on the first epoch. I had to dial the batch_size and the dilation_depth way down before it would even start. What settings are you using when you train?

I tensorflow/core/common_runtime/bfc_allocator.cc:689]      Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 83 Chunks of size 256 totalling 20.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 512 totalling 512B
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 15 Chunks of size 1024 totalling 15.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 65536 totalling 64.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 59 Chunks of size 262144 totalling 14.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 520704 totalling 508.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 105 Chunks of size 524288 totalling 52.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 13 Chunks of size 67108864 totalling 832.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 67174400 totalling 128.12MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 67239936 totalling 64.12MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 67371008 totalling 64.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 67633152 totalling 64.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 68157440 totalling 65.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 134479872 totalling 128.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 269484032 totalling 257.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 541065216 totalling 516.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1090519040 totalling 1.02GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 2147483648 totalling 2.00GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 2214592512 totalling 2.06GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 3726535936 totalling 3.47GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 10.68GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit:                 11715375924
InUse:                 11472467200
MaxInUse:              11473515776
NumAllocs:                     563
MaxAllocSize:           3980291328

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************xxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 2.00GiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[65536,256,32,1]

Should an increase in dilation rate reduce the number of training parameters?

I used Keras's model.summary() and noticed that the number of parameters stays the same when I change dilation_rate.
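Dilation changes which inputs a filter looks at, not how many weights it has, so a constant parameter count is expected; what grows is the receptive field. The sketch below illustrates both for kernel-size-2 causal convolutions as in the WaveNet paper. It assumes layers with dilations 1, 2, 4, ..., 2**dilation_depth per stack, each widening the receptive field by its dilation, which reproduces the "Receptive Field: 512 (128ms)" line logged on this page for dilation_depth = 8, nb_stacks = 1 at 4000 Hz. Both helpers are hypothetical, not functions from wavenet.py.

```python
def conv1d_params(in_channels, filters, kernel_size=2, use_bias=False):
    """Weights in a 1-D convolution; note that the dilation rate does not appear."""
    return kernel_size * in_channels * filters + (filters if use_bias else 0)


def receptive_field(dilation_depth, nb_stacks=1):
    """Receptive field in samples: start from the current sample, and let each
    kernel-size-2 layer with dilation d add d more samples of history."""
    per_stack = sum(2 ** i for i in range(dilation_depth + 1))  # 2**(depth+1) - 1
    return nb_stacks * per_stack + 1
```

For example, conv1d_params(256, 256) is 131072 whatever the dilation, while receptive_field(8) is 512 samples, i.e. 512 / 4000 s = 128 ms at desired_sample_rate = 4000.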

TTS

Hi guys! Do you know how to extract linguistic features from the text? Or is there any tool that could do that easily?

Why is local conditioning hard to implement? Would be willing to help

Error with 'pip install -r requirements.txt'

I get the error below on Ubuntu 14 while following these steps:

- Activate a new virtualenv (recommended):

pip install virtualenv
mkdir ~/virtualenvs && cd ~/virtualenvs
virtualenv wavenet
source wavenet/bin/activate

- Clone and install requirements:

cd ~
git clone https://github.com/basveeling/wavenet.git
cd wavenet
pip install -r requirements.txt

ValueError: ('Expected version spec in', 'picklable_itertools~=0.1.1', 'at', '~=0.1.1')

TTS synthesis

Can you show how to use this code for Text to Speech (TTS) synthesis?

PR to Keras

Is there a plan to PR to Keras for the dilated causal convolution layer any time soon?

Input dimension mismatch

Hi,

When I sample with the default settings (except for the VCTK corpus path), I get the following error:

Using Theano backend.
WARNING - root - Changed type of config entry "run_dir" from NoneType to unicode
WARNING - root - Changed type of config entry "optimizer.epsilon" from NoneType to float
INFO - wavenet - Running command 'predict'
WARNING - wavenet - No observers have been added to this run
INFO - wavenet - Started
INFO - predict - Using checkpoint from epoch: 358
INFO - predict - Saving to "models/run_20160920_120916/samples/sample_epoch-00358_01s__sample-temp-0.001_seed-946674575.wav"
/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/keras/backend/theano_backend.py:1171: UserWarning: ['filter_dilation'] are now deprecated in tensor.nnet.abstract_conv.conv2d interface and will be ignored.
filter_dilation=filter_dilation)
INFO - build_model - Receptive Field: 1021 (255ms)
INFO - predict - Taking sample from test dataset as initial input.
0%| | 0/4000 [00:00<?, ?it/s]
ERROR - wavenet - Failed after 0:00:18!
Traceback (most recent calls WITHOUT Sacred internals):
File "wavenet.py", line 339, in predict
output = model.predict(prediction_seed)
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/keras/engine/training.py", line 1177, in predict
batch_size=batch_size, verbose=verbose)
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/keras/engine/training.py", line 876, in _predict_loop
batch_outs = f(ins_batch)
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 746, in __call__
return self.function(*inputs)
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
ValueError: Input dimension mis-match. (input[0].shape[1] = 1022, input[1].shape[1] = 1021)
Apply node that caused the error: Elemwise{add,no_inplace}(InplaceDimShuffle{0,2,1}.0, InplaceDimShuffle{0,2,1}.0, InplaceDimShuffle{0,2,1}.0)
Toposort index: 866
Inputs types: [TensorType(float32, 3D), TensorType(float32, 3D), TensorType(float32, 3D)]
Inputs shapes: [(1, 1022, 32), (1, 1021, 32), (1, 1021, 32)]
Inputs strides: [(4088, 4, 4088), (4084, 4, 4084), (4084, 4, 4084)]
Inputs values: ['not shown', 'not shown', 'not shown']
Outputs clients: [[IncSubtensor{Set;::, int64:int64:, ::}(Alloc.0, Elemwise{add,no_inplace}.0, Constant{4}, ScalarFromTensor.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "wavenet.py", line 316, in predict
model = build_model()
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/sacred/config/captured_function.py", line 47, in captured_function
result = wrapped(*args, **kwargs)
File "wavenet.py", line 244, in build_model
out, skip_out = residual_block(out)
File "wavenet.py", line 234, in residual_block
res_x = layers.Merge(mode='sum')([original_x, res_x])
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/keras/engine/topology.py", line 1339, in __call__
self.add_inbound_node(layers, node_indices, tensor_indices)
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/keras/engine/topology.py", line 572, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/keras/engine/topology.py", line 154, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
File "/home/lixiao2/.conda/envs/wavenet/lib/python2.7/site-packages/keras/engine/topology.py", line 1269, in call
s += inputs[i]

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Undocumented dependencies on 'sacred', 'tqdm'

These seem to be necessary for sampling.

I am writing a book on Keras and wanted to mention your work on WaveNet

In particular, I will use your example code for defining the network.
I hope that this is fine with you.
Cheers
//A

Problem running the training module in Wavenet tutorial. What should I do?

iMac:wavenet shyamalsuhanachandra$ KERAS_BACKEND=theano python wavenet.py with small
Using Theano backend.
Traceback (most recent call last):
  File "wavenet.py", line 456, in <module>
    @ex.automain
  File "/usr/local/lib/python2.7/site-packages/sacred/experiment.py", line 109, in automain
    self.run_commandline()
  File "/usr/local/lib/python2.7/site-packages/sacred/experiment.py", line 203, in run_commandline
    args)
  File "/usr/local/lib/python2.7/site-packages/sacred/experiment.py", line 163, in run_command
    named_configs, force=force)
  File "/usr/local/lib/python2.7/site-packages/sacred/ingredient.py", line 310, in _create_run_for_command
    named_configs=named_configs, force=force)
  File "/usr/local/lib/python2.7/site-packages/sacred/initialize.py", line 301, in create_run
    scaffold.set_up_config()
  File "/usr/local/lib/python2.7/site-packages/sacred/initialize.py", line 107, in set_up_config
    fallback=self.fallback)
  File "/usr/local/lib/python2.7/site-packages/sacred/config/utils.py", line 74, in chain_evaluate_config_scopes
    fallback=fallback)
  File "/usr/local/lib/python2.7/site-packages/sacred/config/config_scope.py", line 64, in __call__
    .format(arg, available_entries))
KeyError: u"'desired_sample_rate' not in preset for ConfigScope. Available options are: set([])"

The output above results from running the commands as given in the tutorial. What changes do I need to make? I compiled Keras, Theano, and wavenet from source and ran into this problem.

Should tanh_out and sigm_out in residual_block share the same convolutional output

Shouldn't tanh_out in https://github.com/basveeling/wavenet/blob/master/wavenet.py#L226 and sigm_out in https://github.com/basveeling/wavenet/blob/master/wavenet.py#L230 share the same convolutional output instead of two independent convolutional outputs?
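For reference, the WaveNet paper (arXiv:1609.03499) writes the gated activation unit of layer k with two separate learnable convolution filters, a filter W_{f,k} and a gate W_{g,k}, which suggests the tanh and sigmoid branches are meant to have independent weights rather than share one convolution output:

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x}\right) \odot \sigma\left(W_{g,k} * \mathbf{x}\right)
```

Here * denotes convolution, ⊙ element-wise multiplication and σ the logistic sigmoid. Computing both branches with one convolution of twice the width and splitting its output along the channel axis is equivalent, because the two halves still have independent weights.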

Working with Keras 2.0

Hi Bas,
have you tried making this work with Keras 2.0?

'learn_all_outputs'

In wavenet.py's build_model()
if not learn_all_outputs:
    raise DeprecationWarning('Learning on just all outputs is wasteful, now learning only inside receptive field.')
    out = layers.Lambda(lambda x: x[:, -1, :], output_shape=(out._keras_shape[-1],))(out)  # Based on gif in deepmind blog: take last output?

Wouldn't it be better to do this before the two final 1x1 convolutional layers a few lines above, to cut more waste?

StopIteration Error

Hi,

Thank you for a nice implementation. Unfortunately, when I'm trying to run the test example, I bump into two problems.

Local variable "full_sequences" referenced before assignment on line 101 of dataset.py (I can fix this by assigning full_sequences = [] before the two for loops).
A StopIteration error in predict:
File "wavenet.py", line 326, in predict
outputs = list(data_generators['test'].next()[0][-1])
StopIteration

This one I can't figure out.

Any ideas?

Global conditioning on speaker identification

And perhaps using a keras embedding layer to learn a representation for speakers?
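For reference, the WaveNet paper handles global conditioning by adding a learned linear projection of a single latent vector h (for example a speaker identity) inside both branches of every gated unit, with W the convolution filters of layer k and V the learnable projections:

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x} + V_{f,k}^{\top} \mathbf{h}\right) \odot \sigma\left(W_{g,k} * \mathbf{x} + V_{g,k}^{\top} \mathbf{h}\right)
```

When h is a one-hot speaker ID, V^T h simply selects one row of V, which is exactly what an embedding lookup computes, so an embedding layer with one row per speaker is one way to realise this term. That mapping onto a Keras Embedding layer is an assumption here, not something the paper or this repository prescribes.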

How to Predict on Batches

Thanks for the amazing work. Could you explain how to generate waveforms in batches at test time?

Parametric or concatenative TTS projects for comparison?

Are there any parametric or concatenative TTS projects available for comparison?
Like described here:
https://deepmind.com/blog/wavenet-generative-model-raw-audio/

TypeError: ('Keyword argument not understood:', 'causal')

I failed to run the generation process:

(myVE) yx0001@balin:~/Downloads/wavenet/wavenet$ KERAS_BACKEND=theano python wavenet.py predict with models/run_20160920_120916/config.json predict_seconds=1
Using gpu device 6: GeForce GTX TITAN X (CNMeM is disabled, cuDNN 5005)
Using Theano backend.
WARNING - root - Changed type of config entry "run_dir" from NoneType to unicode
WARNING - root - Changed type of config entry "optimizer.epsilon" from NoneType to float
INFO - wavenet - Running command 'predict'
WARNING - wavenet - No observers have been added to this run
INFO - wavenet - Started
INFO - predict - Using checkpoint from epoch: 358
INFO - predict - Saving to "models/run_20160920_120916/samples/sample_epoch-00358_01s__sample-temp-0.001_seed-946674575.wav"
ERROR - wavenet - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
File "wavenet.py", line 316, in predict
model = build_model()
File "wavenet.py", line 241, in build_model
name='initial_causal_conv')(out)
File "/user/HS103/yx0001/myVE/local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 260, in __init__
bias=bias, **kwargs)
File "/user/HS103/yx0001/myVE/local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 111, in __init__
super(Convolution1D, self).__init__(**kwargs)
File "/user/HS103/yx0001/myVE/local/lib/python2.7/site-packages/keras/engine/topology.py", line 323, in __init__
raise TypeError('Keyword argument not understood:', kwarg)
TypeError: ('Keyword argument not understood:', 'causal')
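The traceback shows that the installed Keras's Convolution1D does not accept a causal argument. The build_model() call passes one, so it expects a modified Keras; the remark about "basveeling's branch" elsewhere on this page points the same way. In Keras 2 / tf.keras, the same idea is expressed as Conv1D(..., padding='causal'). A causal dilated convolution is just a normal convolution with (kernel_size - 1) * dilation zeros padded on the left, so output t never sees inputs after t. The NumPy sketch below demonstrates this; it is illustrative and not the fork's implementation.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1D convolution via left zero-padding.

    x: (T, C_in), w: (K, C_in, C_out) -> (T, C_out).
    Output t = sum_k x[t - (K - 1 - k) * dilation] @ w[k], with x[<0] = 0.
    """
    K, T = w.shape[0], x.shape[0]
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    out = np.zeros((T, w.shape[2]))
    for k in range(K):
        start = k * dilation  # tap k reads (K - 1 - k) * dilation steps into the past
        out += xp[start:start + T] @ w[k]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 2))
w = rng.normal(size=(2, 2, 3))  # kernel size 2, as in WaveNet's dilated layers
y = causal_dilated_conv1d(x, w, dilation=4)

x2 = x.copy()
x2[10] += 1.0                   # perturb a single input timestep
y2 = causal_dilated_conv1d(x2, w, dilation=4)
```

With kernel size 2 and dilation 4, the change at t = 10 affects only outputs 10 and 14; everything earlier is untouched, which is the causality property.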

Save the weights and training config

I'm just wondering how I can save the weights and training config for the model. According to the Keras documentation, the .json file saves only the network architecture, not the weights or the training config.

Thanks in advance,
Ossama
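In stock Keras, model.save(filepath) writes the architecture, weights, training config and optimizer state to one HDF5 file, and model.save_weights(filepath) writes the weights alone. Whether the modified Keras this repo installs supports model.save is not verified here. The repo's Sacred setup already keeps a config.json per run, since the predict command reads it. As a framework-agnostic alternative, the sketch below stores the list of arrays returned by model.get_weights() with NumPy and the config with json. The file naming is an illustrative assumption, and optimizer state is not saved.

```python
import json
import os
import tempfile

import numpy as np

def save_checkpoint(prefix, weights, config):
    """Save a list of weight arrays and a JSON-serializable training config."""
    np.savez(prefix + "_weights.npz", *weights)  # stored as arr_0, arr_1, ...
    with open(prefix + "_config.json", "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)

def load_checkpoint(prefix):
    with np.load(prefix + "_weights.npz") as data:
        weights = [data["arr_%d" % i] for i in range(len(data.files))]
    with open(prefix + "_config.json") as f:
        config = json.load(f)
    return weights, config

tmp = tempfile.mkdtemp()
prefix = os.path.join(tmp, "checkpoint")
weights = [np.ones((2, 3)), np.arange(4.0)]  # stand-in for model.get_weights()
config = {"nb_epoch": 1000, "optimizer": {"optimizer": "sgd", "lr": 0.001, "momentum": 0.9}}
save_checkpoint(prefix, weights, config)
loaded_weights, loaded_config = load_checkpoint(prefix)
```

The loaded list can be passed back to model.set_weights(), provided the model is rebuilt with the same architecture.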

Exception: You are trying to load a weight file containing 69 layers into a model with 90 layers.

Here's the issue I encounter:

wavenet$ KERAS_BACKEND=theano python wavenet.py predict with models/run_20160914_113209/config.json predict_seconds=1 sample_temperature=0.001
WARNING (theano.configdefaults): Only clang++ is supported. With g++, we end up with strange g++/OSX bugs.
Using Theano backend.
WARNING - root - Changed type of config entry "run_dir" from NoneType to unicode
WARNING - root - Changed type of config entry "optimizer.epsilon" from NoneType to float
INFO - wavenet - Running command 'predict'
WARNING - wavenet - No observers have been added to this run
INFO - wavenet - Started
INFO - predict - Using checkpoint from epoch: 37
INFO - predict - Saving to "models/run_20160914_113209/samples/sample_epoch-00037_01s__sample-temp-0.001_seed-920639969.wav"
INFO - build_model - Receptive Field: 4095 (928ms)
ERROR - wavenet - Failed after 0:00:03!
Traceback (most recent calls WITHOUT Sacred internals):
File "wavenet.py", line 316, in predict
model.load_weights(os.path.join(checkpoint_dir, last_checkpoint))
File "/Users/m/virtualenvs/wavenet/lib/python2.7/site-packages/keras/engine/topology.py", line 2500, in load_weights
self.load_weights_from_hdf5_group(f)
File "/Users/m/virtualenvs/wavenet/lib/python2.7/site-packages/keras/engine/topology.py", line 2552, in load_weights_from_hdf5_group
str(len(flattened_layers)) + ' layers.')
Exception: You are trying to load a weight file containing 69 layers into a model with 90 layers.

I'm not sure what the problem is, but I think it might have to do with the model checkpoint in use?
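The message means the model rebuilt from config.json has a different layer count than the model that wrote the checkpoint. That can happen when the code changed after the checkpoint was saved; the README notes that the pretrained model was removed because it was no longer compatible with recent changes. A hypothetical pre-load check like the one below reports where the two disagree instead of only the totals. It compares lists of weight-array shapes, for example [w.shape for w in model.get_weights()] on the Keras side, so it counts arrays rather than layers.

```python
def first_shape_mismatch(saved_shapes, model_shapes):
    """Return a description of the first incompatibility, or None if compatible.

    Both arguments are lists of weight-array shapes in layer order (hypothetical helper).
    """
    for i, (s, m) in enumerate(zip(saved_shapes, model_shapes)):
        if tuple(s) != tuple(m):
            return "weight {}: checkpoint {} vs model {}".format(i, tuple(s), tuple(m))
    if len(saved_shapes) != len(model_shapes):
        return "weight-array count differs: checkpoint {}, model {}".format(
            len(saved_shapes), len(model_shapes))
    return None

ok = first_shape_mismatch([(2, 3), (3,)], [(2, 3), (3,)])
too_few = first_shape_mismatch([(2, 3)], [(2, 3), (3,)])
wrong_shape = first_shape_mismatch([(2, 3)], [(2, 4)])
```

In practice the usual fix is to load a checkpoint with the same code version and config that produced it.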

models/run directory name incompatible with Windows

This makes it impossible to check out the project under Windows. I would suggest removing "-" and ":" from the directory name and changing line 336 of wavenet.py to

run_dir = os.path.join('models', datetime.datetime.now().strftime('run_%Y%m%d_%H%M%S'))
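Of the two characters, only the colon is actually invalid in Windows file names; hyphens are allowed. The reserved characters are < > : " / \ | ? *, and names also must not end with a space or a period. The sketch below checks the proposed format against those rules. It is an illustration, not a complete validator: it ignores reserved device names such as CON and control characters.

```python
import datetime

WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

def run_dir_name(now):
    # Digits and underscores only, so the name is valid on Windows too.
    return now.strftime('run_%Y%m%d_%H%M%S')

def is_windows_safe(name):
    return not (set(name) & WINDOWS_FORBIDDEN) and not name.endswith((' ', '.'))

name = run_dir_name(datetime.datetime(2016, 9, 20, 12, 9, 16))
```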

ReduceLROnPlateau cannot be imported from keras.callbacks

ImportError: cannot import name ReduceLROnPlateau

It appears I am having trouble with the modified install of Keras. I'm going to try again in a virtual environment and see if that's the problem.
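This ImportError means the keras.callbacks module on the path does not define ReduceLROnPlateau, so the installed Keras is not the version the code expects. Reinstalling from requirements.txt in a clean virtualenv, as suggested, is the real fix. To see what the callback does in the meantime, the logic is small. Below is a hypothetical pure-Python stand-in that lowers the learning rate by a factor after `patience` epochs without improvement. It mirrors the idea of Keras's factor/patience/min_lr options but is not Keras's class.

```python
class PlateauLRSchedule:
    """Hypothetical stand-in: multiply lr by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float('inf')
        self.wait = 0

    def step(self, val_loss):
        """Call once per epoch with the monitored loss; returns the lr for the next epoch."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr

sched = PlateauLRSchedule(lr=0.001, factor=0.5, patience=2)
lrs = [sched.step(loss) for loss in [1.0, 0.9, 0.95, 0.93, 0.92, 0.91]]
```

In Keras, the returned value could be applied from a callback's on_epoch_end with keras.backend.set_value(model.optimizer.lr, lr); this has not been tested against the repo's fork.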

How long should it take to train a new model? How much data do I need?

Hi,

I'm trying to train a new model out of sheer curiosity (and a great deal of naiveté), and I was wondering how long I should expect training to take on, ahem, a 2015 MacBook Pro (Intel Iris Graphics 6100, 1536 MB).
I have only a very basic background in machine learning and have never done deep learning, and I was wondering whether it is feasible at all (I can be very patient and wait a couple of months if necessary).
Also, how many epochs should I train for? Is a single album enough input data?
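One way to get a feel for "is a single album enough" is to count training fragments per epoch. The sketch below assumes that windows of fragment_length samples are cut every fragment_stride samples. Those are the repo's parameter names, but this exact slicing rule is an assumption. The values desired_sample_rate = 4410, fragment_length = 1152, fragment_stride = 128 and batch_size = 16 come from a configuration printed elsewhere on this page; the 45-minute album length is hypothetical.

```python
import math

def count_fragments(n_samples, fragment_length, fragment_stride):
    # Windows start at 0, stride, 2*stride, ... and must fit entirely inside the signal.
    if n_samples < fragment_length:
        return 0
    return (n_samples - fragment_length) // fragment_stride + 1

sample_rate = 4410
album_seconds = 45 * 60                   # hypothetical album length
n_samples = album_seconds * sample_rate   # 11,907,000 samples
fragments = count_fragments(n_samples, fragment_length=1152, fragment_stride=128)
batches_per_epoch = math.ceil(fragments / 16)  # batch_size = 16
```

This gives about 93,000 fragments and about 5,800 batches per epoch, before any test split. Consecutive fragments overlap by 1024 of their 1152 samples, though, so they are far from independent examples, and this arithmetic says nothing about how many epochs or how much music is actually enough.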

What if I want to use this code to generate a non-audio signal waveform?

Hey guys, I'm a student working on a project, and I want to generate a fake ECG (PPG) signal based on a given training set, so I'd like to build on this code. How should I modify it? It takes a .wav file as input, whereas my data will be a data matrix, and that really confuses me.
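What the model ultimately consumes is one of nb_output_bins = 256 classes per timestep. With use_ulaw = True, the signal is mu-law companded before quantization, which is the paper's scheme with μ = 255. So one possible route, assuming each row of the matrix is one recording, is to scale each row to [-1, 1] and quantize it the same way. The NumPy sketch below does that. Whether the repo's loader can be bypassed like this is not confirmed. The other route is to write each row to a .wav file (for example with scipy.io.wavfile.write) so the existing loader can read it. Mu-law is designed around audio amplitude statistics, so plain linear quantization may suit an ECG just as well.

```python
import numpy as np

def scale_to_unit(signal):
    """Center a 1D signal and scale it to [-1, 1]; assumes it is not constant."""
    centered = signal - signal.mean()
    return centered / np.abs(centered).max()

def mu_law_encode(x, mu=255):
    """Map x in [-1, 1] to integer bins 0..mu via mu-law companding."""
    x = np.clip(x, -1.0, 1.0)
    companded = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.floor((companded + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(bins, mu=255):
    """Inverse of mu_law_encode, up to quantization error."""
    companded = 2.0 * bins / mu - 1.0
    return np.sign(companded) * np.expm1(np.abs(companded) * np.log1p(mu)) / mu

t = np.linspace(0, 2, 500)
row = np.sin(2 * np.pi * 1.2 * t) + 5.0  # toy stand-in for one row of the data matrix
x = scale_to_unit(row)
bins = mu_law_encode(x)
recon = mu_law_decode(bins)
```

Generated bins can be mapped back to signal values with mu_law_decode and then rescaled to the original units using the saved mean and maximum.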

Stuck on loading data

After a few days of troubleshooting, I got everything to work (read: not throw exceptions). However, while the script runs, it seems to get stuck at the "loading data" step. I've let it run for over an hour and it doesn't seem to make any progress. Task Manager shows it using quite a bit of CPU and RAM, though the RAM usage doesn't change during the run. I'm using Win7 Home Premium 64-bit with Python 3.4.3 32-bit. NumPy is 1.11.3+mkl, SciPy is 0.18.1 (upgraded from 0.16.1 in .exe), Theano is 0.8.2, and Keras is 1.2.1 (basveeling's branch lacks at least one crucial method). Log of stdout:

C:\Users\Nowy Minecraft\Documents\WaveNet>python wavenet.py
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
Using Theano backend.
INFO - wavenet - Running command 'main'
WARNING - wavenet - No observers have been added to this run
INFO - wavenet - Started
Configuration (modified, added, typechanged):
  batch_size = 16
  data_dir = 'data'
  data_dir_structure = 'flat'
  debug = False
  desired_sample_rate = 4410
  dilation_depth = 9
  early_stopping_patience = 20
  final_l2 = 0
  fragment_length = 1152
  fragment_stride = 128
  keras_verbose = 1
  learn_all_outputs = True
  nb_epoch = 1000
  nb_filters = 256
  nb_output_bins = 256
  nb_stacks = 1
  predict_initial_input = ''
  predict_seconds = 1
  predict_use_softmax_as_input = False
  random_train_batches = False
  randomize_batch_order = True
  res_l2 = 0
  run_dir = None
  sample_argmax = False
  sample_temperature = 1.0
  seed = 793171393
  test_factor = 0.1
  train_only_in_receptive_field = True
  train_with_soft_target_stdev = None
  use_bias = False
  use_skip_connections = True
  use_ulaw = True
  optimizer:
    decay = 0.0
    epsilon = None
    lr = 0.001
    momentum = 0.9
    nesterov = True
    optimizer = 'sgd'
INFO - main - Running with seed 793171393
INFO - main - Loading data...
  0%|          | 0/1 [00:00<?, ?it/s]
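One possible cause, which is an assumption and not confirmed in this thread, involves resampling. If the loader resamples each file to desired_sample_rate with an FFT-based method such as scipy.signal.resample, run time depends on the prime factors of the signal length. A long file whose length has a large prime factor can then take far longer than expected; a progress bar stuck at 0/1 with steady CPU use would fit. The pure-Python sketch below reports a length's largest prime factor. It also computes the integer up/down factors that a polyphase resampler such as scipy.signal.resample_poly(x, up, down) takes instead.

```python
from fractions import Fraction

def largest_prime_factor(n):
    """Largest prime factor of n >= 2, by trial division (fine for audio lengths)."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    return max(largest, n) if n > 1 else largest

def resample_ratio(orig_rate, target_rate):
    """(up, down) in lowest terms such that target_rate / orig_rate == up / down."""
    r = Fraction(target_rate, orig_rate)
    return r.numerator, r.denominator

cd_ratio = resample_ratio(44100, 4410)
dat_ratio = resample_ratio(48000, 4410)
```

If a file's length has a large prime factor, trimming or padding it to a length with only small factors, or resampling the data once offline so that no resampling is needed at load time, may help. The latter assumes the loader skips resampling when the rates already match.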
