kan-bayashi / PytorchWaveNetVocoder
WaveNet-Vocoder implementation with pytorch.
Home Page: https://kan-bayashi.github.io/WaveNetVocoderSamples/
License: Apache License 2.0
Hey, thanks for the amazing repo on the WaveNet vocoder. I tried running https://github.com/kan-bayashi/PytorchWaveNetVocoder/blob/master/egs/arctic/si-open-melspc/run.sh, and an error occurred at stage 1.
The relevant code is:
# make scp files
if [ ${highpass_cutoff} -eq 0 ];then
cp "data/${train}/wav.scp" "data/${train}/wav_hpf.scp"
else
find "wav/${train}" -name "*.wav" | sort > "data/${train}/wav_hpf.scp"
fi
This executes the else branch, which looks for files in the wav folder. However, no wav folder has been created at that point, so nothing can be found. Could you please help with this?
Tried following the example in the README and got this error. Any ideas?
Process Process-1:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "./../../../src/bin/decode.py", line 276, in gpu_decode
for feat_ids, (batch_x, batch_h, n_samples_list) in generator:
File "./../../../src/bin/decode.py", line 140, in decode_generator
h = feat_transform(h)
File "/Users/michaelp/Code/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 42, in __call__
img = t(img)
File "./../../../src/bin/decode.py", line 245, in <lambda>
lambda x: scaler.transform(x)])
File "/Users/michaelp/Code/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 692, in transform
X -= self.mean_
ValueError: operands could not be broadcast together with shapes (633,30) (28,) (633,30)
WAV File I used: https://drive.google.com/file/d/1-2aMp0gyxn0Km25him8C_PRV2Y8V7WMk/view?usp=sharing
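The last line of the traceback is a plain NumPy broadcasting failure: the feature matrix has 30 columns while the scaler mean has 28 entries. Below is a minimal sketch that reproduces the mismatch with the shapes from the traceback and turns it into a more readable message. The helper name `check_feature_dim` is hypothetical and not part of the repository.

```python
import numpy as np


def check_feature_dim(feats, mean):
    """Raise a readable error if feats (frames, dims) does not match mean (dims,)."""
    if feats.shape[1] != mean.shape[0]:
        raise ValueError(
            "feature dimension %d does not match statistics dimension %d"
            % (feats.shape[1], mean.shape[0])
        )


feats = np.zeros((633, 30))  # shape reported in the traceback
mean = np.zeros(28)          # shape of the scaler mean in the traceback

try:
    feats - mean  # same broadcasting rule that fails inside scaler.transform
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

try:
    check_feature_dim(feats, mean)
    message = ""
except ValueError as err:
    message = str(err)

print(broadcast_ok)
print(message)
```

A check like this, run before the transform, would at least show which side (extracted features or stored statistics) has the unexpected dimension.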
In feature_extract.py (lines 241-242), /feat holds the extended (upsampled) features and /feat_org holds the original features.
In decode.py (lines 74-77), however, /feat is loaded from the features file when upsampling_factor == 0, and /feat_org is loaded otherwise.
Shouldn't it be the other way around?
Thanks
Hi, I followed these instructions to set up:
$ git clone https://github.com/kan-bayashi/PytorchWaveNetVocoder.git
$ cd PytorchWaveNetVocoder/tools
$ make
However, this resulted in the following errors:
x86_64-linux-gnu-gcc: error: pyworld/pyworld.cpp: No such file or directory
x86_64-linux-gnu-gcc: fatal error: no input files
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Failed building wheel for pyworld
Running setup.py clean for pyworld
Building wheel for dtw-c (setup.py) ... error
Complete output from command /home/ubuntu/Projects/NLP/ajalaSpeech/TTS/experiments/wavenet_vocoder/PytorchWaveNetVocoder/tools/venv/bin/python3.6 -u -c "import setuptools, tokenize;file='/tmp/pip-install-mnveb40h/dtw-c/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /tmp/pip-wheel-b0fie57u --python-tag cp36:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/dtw_c
copying dtw_c/__init__.py -> build/lib.linux-x86_64-3.6/dtw_c
running build_ext
building 'dtw_c.dtw_c' extension
error: unknown file type '.pyx' (from 'dtw_c/dtw_c.pyx')
Failed building wheel for dtw-c
Running setup.py clean for dtw-c
Failed to build pyworld dtw-c
Do these final messages in the stdout indicate that the earlier build/ compilation failures are rectified by a sanity check?
Running setup.py install for pyworld ... done
Running setup.py install for dtw-c ... done
Successfully installed PyWavelets-1.0.2 audioread-2.1.6 cffi-1.12.2 cycler-0.10.0 cython-0.29.6 decorator-4.4.0 dtw-1.3.3 dtw-c-0.6.0 fastdtw-0.3.2 h5py-2.9.0 imageio-2.5.0 joblib-0.13.2 kiwisolver-1.0.1 librosa-0.6.3 llvmlite-0.28.0 matplotlib-3.0.3 networkx-2.2 numba-0.43.1 numpy-1.13.3 pillow-6.0.0 pycparser-2.19 pyparsing-2.4.0 pysptk-0.1.16 python-dateutil-2.8.0 pyworld-0.2.8 pyyaml-5.1 resampy-0.2.1 scikit-image-0.15.0 scikit-learn-0.20.3 scipy-1.2.1 six-1.12.0 soundfile-0.10.2 sprocket-vc-0.18.2 torch-1.0.1.post2 torchvision-0.2.2.post3
touch venv/bin/activate
Or do I have a faulty build here?
Any help would be appreciated!
Has anyone looked at connecting the WaveNet vocoder to Merlin (https://github.com/CSTR-Edinburgh/merlin/)? Merlin is an open-source TTS system for acoustic and duration modelling that uses Ossian or Festival as a front-end. By default it uses the WORLD vocoder and therefore extracts WORLD vocoder features, so integrating this vocoder with Merlin should be possible. I'm interested to hear whether someone has tried this and can offer some guidance.
Hi,
I'm trying to debug my system that uses your WaveNet vocoder. Is there any way to create a WAV file from the features your code generates?
Thanks
If I understand correctly, the vocoders on the MOS chart were evaluated with features extracted by STRAIGHT as input and raw waveforms as output. If so, why did STRAIGHT get such a low score? Shouldn't it score as high as the raw waveform does?
Does it make sense to use the WaveNet vocoder as it is for speech-to-speech? For example, can I record my voice, generate a mel-spectrogram, and then use a model pre-trained on the LJSpeech dataset to respeak it?
I've been trying this and the results don't sound good!
Hi,
I'm trying to train a model on 200 wav files (100 train/val) from the nancy corpus (Blizzard 2011 dataset). I modified the egs/arctic/sd/run.sh script to process my own files.
I get an error related to the size of the batch tensor, perhaps something to do with upsampling?
Here is the full error log:
# train.py --n_gpus 1 --waveforms data/train/wav_ns.scp --feats data/train/feats.scp --stats data/train/stats.h5 --expdir exp/tr_nancy_16k_sd_nancy_lr1e-4_wd0.0_bl20000_bs1_ns_up --n_quantize 256 --n_aux 28 --n_resch 512 --n_skipch 256 --dilation_depth 10 --dilation_repeat 3 --lr 1e-4 --weight_decay 0.0 --iters 200000 --batch_length 20000 --batch_size 1 --checkpoints 10000 --use_speaker_code false --upsampling_factor 80 --resume
# Started at Wed Mar 14 20:48:13 UTC 2018
#
/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WaveNet(
(onehot): OneHot(
)
(causal): CausalConv1d(
(conv): Conv1d(256, 512, kernel_size=(2,), stride=(1,), padding=(1,))
)
(upsampling): UpSampling(
(conv): ConvTranspose2d(1, 1, kernel_size=(1, 80), stride=(1, 80))
)
(dil_sigmoid): ModuleList(
(0): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
)
(1): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
)
(2): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
)
(3): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
)
(4): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
)
(5): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
)
(6): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
)
(7): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
)
(8): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
)
(9): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
)
(10): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
)
(11): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
)
(12): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
)
(13): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
)
(14): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
)
(15): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
)
(16): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
)
(17): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
)
(18): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
)
(19): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
)
(20): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
)
(21): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
)
(22): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
)
(23): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
)
(24): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
)
(25): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
)
(26): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
)
(27): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
)
(28): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
)
(29): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
)
)
(dil_tanh): ModuleList(
(0): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
)
(1): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
)
(2): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
)
(3): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
)
(4): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
)
(5): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
)
(6): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
)
(7): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
)
(8): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
)
(9): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
)
(10): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
)
(11): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
)
(12): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
)
(13): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
)
(14): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
)
(15): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
)
(16): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
)
(17): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
)
(18): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
)
(19): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
)
(20): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
)
(21): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
)
(22): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
)
(23): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
)
(24): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
)
(25): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
)
(26): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
)
(27): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
)
(28): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
)
(29): CausalConv1d(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
)
)
(aux_1x1_sigmoid): ModuleList(
(0): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(1): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(2): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(3): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(4): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(5): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(6): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(7): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(8): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(9): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(10): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(11): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(12): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(13): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(14): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(15): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(16): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(17): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(18): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(19): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(20): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(21): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(22): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(23): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(24): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(25): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(26): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(27): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(28): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(29): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
)
(aux_1x1_tanh): ModuleList(
(0): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(1): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(2): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(3): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(4): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(5): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(6): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(7): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(8): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(9): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(10): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(11): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(12): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(13): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(14): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(15): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(16): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(17): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(18): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(19): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(20): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(21): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(22): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(23): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(24): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(25): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(26): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(27): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(28): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
(29): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
)
(skip_1x1): ModuleList(
(0): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(1): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(2): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(3): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(4): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(5): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(6): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(7): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(8): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(9): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(10): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(11): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(12): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(13): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(14): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(15): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(16): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(17): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(18): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(19): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(20): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(21): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(22): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(23): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(24): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(25): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(26): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(27): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(28): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
(29): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
)
(res_1x1): ModuleList(
(0): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(1): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(2): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(3): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(4): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(5): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(6): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(7): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(8): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(9): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(10): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(11): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(12): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(13): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(14): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(15): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(16): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(17): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(18): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(19): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(20): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(21): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(22): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(23): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(24): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(25): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(26): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(27): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(28): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
(29): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
)
(conv_post_1): Conv1d(256, 256, kernel_size=(1,), stride=(1,))
(conv_post_2): Conv1d(256, 256, kernel_size=(1,), stride=(1,))
)
number of training data = 100.
batch length is decreased due to upsampling (20000 -> 19970)
Traceback (most recent call last):
File "../../../src/bin/train.py", line 513, in <module>
main()
File "../../../src/bin/train.py", line 474, in main
batch_output = model(batch_x, batch_h)
File "/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/PytorchWaveNetVocoder/src/nets/wavenet.py", line 237, in forward
self.skip_1x1[l], self.res_1x1[l])
File "/home/ubuntu/PytorchWaveNetVocoder/src/nets/wavenet.py", line 511, in _residual_forward
aux_output_sigmoid = aux_1x1_sigmoid(h)
File "/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 168, in forward
self.padding, self.dilation, self.groups)
File "/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 54, in conv1d
return f(input, weight, bias)
RuntimeError: Given groups=1, weight[512, 28, 1], so expected input[1, 64, 23040] to have 28 channels, but got 64 channels instead
# Accounting: time=7 threads=1
# Ended (code 1) at Wed Mar 14 20:48:20 UTC 2018, elapsed time 7 seconds
What could be the problem?
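The shape in the error message can be unpacked with a little arithmetic. The sketch below assumes the usual receptive-field formula for stacked dilated causal convolutions (dilations 1, 2, 4, ... repeated per stack); the helper name `receptive_field` is hypothetical. With the `--dilation_depth 10 --dilation_repeat 3` values from the command line, the time axis of 23040 is consistent with the decreased batch length plus the receptive field. The channel axis, on the other hand, is 64 while `--n_aux` is 28, which would be consistent with the feature files holding 64-dimensional features.

```python
def receptive_field(dilation_depth, dilation_repeat, kernel_size=2):
    """Receptive field of stacked dilated causal convolutions (dilations 1, 2, 4, ...)."""
    dilations = [2 ** i for i in range(dilation_depth)] * dilation_repeat
    return (kernel_size - 1) * sum(dilations) + 1


field = receptive_field(dilation_depth=10, dilation_repeat=3)
batch_length = 19970          # value after the decrease reported in the log
time_axis = batch_length + field

n_aux = 28                    # --n_aux passed to train.py
feature_channels = 64         # channel axis of the input in the error message

print(field, time_axis)
print(feature_channels == n_aux)
```

If that reading is right, either `--n_aux` or the feature extraction settings would need to change so that both sides agree.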
Hi, I'm wondering if you could help me. I'm trying to build your speaker-dependent vocoder in TensorFlow, but I'm struggling to understand how the auxiliary input is fed to the network. Is it processed in parallel to the sample values (two parallel layers), with the outputs combined at a later layer? If you can point me to a textbook or article on auxiliary input/conditioning networks I would be eternally grateful. I've looked many times and can't find anything that gives a general understanding of this.
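For what it's worth, the model printout earlier on this page lists `dil_tanh` / `dil_sigmoid` next to `aux_1x1_tanh` / `aux_1x1_sigmoid`, which fits the conditioned gated activation from the WaveNet paper: z = tanh(W_f * x + V_f * h) ⊙ σ(W_g * x + V_g * h). In that formulation the auxiliary features are projected by a 1x1 convolution and added inside every residual block, before the nonlinearities. Here is a minimal NumPy sketch of one such unit. The function names and weight layout are assumptions for illustration, not the repository's code, and `h` is assumed to be already at sample rate.

```python
import numpy as np


def causal_dilated_conv(x, w_prev, w_cur, dilation):
    """Kernel-size-2 causal convolution: y[t] = w_prev @ x[t - dilation] + w_cur @ x[t]."""
    channels, length = x.shape
    padded = np.concatenate([np.zeros((channels, dilation)), x], axis=1)
    x_prev = padded[:, :length]  # x delayed by `dilation` samples, zero-filled
    return w_prev @ x_prev + w_cur @ x


def gated_unit(x, h, weights, dilation):
    """Conditioned gated activation: tanh(filter) * sigmoid(gate)."""
    filt = causal_dilated_conv(x, weights["f_prev"], weights["f_cur"], dilation)
    gate = causal_dilated_conv(x, weights["g_prev"], weights["g_cur"], dilation)
    # Auxiliary features enter through 1x1 convolutions, i.e. per-sample matrix products.
    filt = filt + weights["aux_f"] @ h
    gate = gate + weights["aux_g"] @ h
    return np.tanh(filt) * (1.0 / (1.0 + np.exp(-gate)))


rng = np.random.default_rng(0)
n_res, n_aux, length = 4, 3, 16  # toy sizes, far smaller than a real model
weights = {
    name: rng.standard_normal((n_res, n_aux if name.startswith("aux") else n_res))
    for name in ["f_prev", "f_cur", "g_prev", "g_cur", "aux_f", "aux_g"]
}
x = rng.standard_normal((n_res, length))  # residual-channel activations
h = rng.standard_normal((n_aux, length))  # auxiliary features at sample rate
z = gated_unit(x, h, weights, dilation=2)
print(z.shape)
```

The output keeps the shape of `x`, so the auxiliary input only shifts what the filter and gate see; it does not add a separate branch whose output is merged later.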
I would like to ask: if I train the network on speaker A's data and then feed it speech from speaker B, will the result be good? Or do I need to train again on B's data?
Hi @kan-bayashi
Could you, please, provide some temporary mirror link for m-ailabs dataset:
http://www.m-ailabs.bayern/en/the-mailabs-speech-dataset/
Thank you in advance
Hi, I modified one of the arctic egs to train on a different dataset, and I get this error from the mcep extraction. I looked into it, and it seems that this happens when there is a long enough period of silence in the audio. It seems to fix the problem to change the mcep calculation line to:
mcep = [pysptk.mcep(x[shiftl * i: shiftl * i + fftl] * win, dim, alpha, eps=EPS, etype=1) for i in range(n_frame)]
However, this might not be a complete fix. In particular, there appear to be places where you use WORLD to calculate mcep. A colleague told me he recalls that WORLD will actually dump core because of this problem and has no eps option, so fixing it might require adding a tiny bit of noise to the audio.
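The "tiny bit of noise" workaround could be sketched as follows. This assumes the waveform is a float NumPy array; the function name `add_dither` and the noise level of 1e-6 are illustrative choices, and whether this actually avoids the WORLD crash has not been verified here.

```python
import numpy as np


def add_dither(x, level=1e-6, seed=0):
    """Return a copy of x with low-level Gaussian noise added, so no frame is exactly silent."""
    rng = np.random.default_rng(seed)
    return x + level * rng.standard_normal(x.shape)


silence = np.zeros(16000)          # one second of digital silence at 16 kHz
dithered = add_dither(silence)
print(np.any(dithered == 0.0))
print(np.max(np.abs(dithered)) < 1e-4)
```

The noise level would need to be chosen relative to the scale of the audio (for example, much larger for int16-range samples than for samples in [-1, 1]).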
# stop when error occurred
set -euo pipfail
This should be 'pipefail', shouldn't it?
Hi, hope you can help.
I've been trying a good part of the day, but cannot get this to work. The issue is seen here;
###########################################################
# DATA PREPARATION STEP #
###########################################################
###########################################################
# FEATURE EXTRACTION STEP #
###########################################################
run.pl: job failed, log is in exp/feature_extract/feature_extract_tr_slt.log
And the matching error from the log is this:
# feature_extract.py --waveforms data/tr_slt/wav.scp --wavdir wav/tr_slt --hdf5dir hdf5/tr_slt --feature_type world --fs 16000 --shiftms 5 --minf0 120 --maxf0 275 --mcep_dim 24 --mcep_alpha 0.410 --highpass_cutoff 70 --fftl 1024 --n_jobs 10
# Started at Mon 1 Apr 13:13:54 AEDT 2019
#
Traceback (most recent call last):
File "../../../src/bin/feature_extract.py", line 19, in <module>
import pysptk
File "/home/roel/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/pysptk/__init__.py", line 41, in <module>
from .sptk import * # pylint: disable=wildcard-import
File "/home/roel/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/pysptk/sptk.py", line 147, in <module>
from . import _sptk
File "__init__.pxd", line 872, in init pysptk._sptk
ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216
# Accounting: time=2 threads=1
# Ended (code 1) at Mon 1 Apr 13:13:56 AEDT 2019, elapsed time 2 seconds
This does NOT seem related to the numpy version on the host or in the virtual environment. I tried both, in numerous ways and numerous times, with both versions of numpy: the originally required one (1.13.3) and the latest.
The "ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216" error can be found all over the net, but there are no good write-ups except for this one:
http://codebase.site/index.php/question/show_question_details/26004
I still hope that manually building each required item is not necessary.
Hope you have ideas/insights/"seen-this-before" comments.
Hi, I'm building a speaker-dependent WaveNet vocoder.
When I train the WaveNet vocoder, the generated waveform sometimes contains very loud noise in regions where the original speech is silent.
Once the waveform reaches a region where speech is present, correct speech samples are generated.
Could you tell me why this happens and how to solve it?
This is a reminder of my action items.
If I have free time, I will implement the above items.
Due to the above update, some parts have changed (see below).
# -------------------- #
# feature path in hdf5 #
# -------------------- #
old -> new
/feat_org -> /world or /melspc
/feat -> extended feature is no longer saved (it is replicated when loading)
# ----------------------- #
# statistics path in hdf5 #
# ----------------------- #
old -> new
/mean -> /world/mean or /melspc/mean
/scale -> /world/scale or /melspc/scale
# ----------------------- #
# new options in training #
# ----------------------- #
--feature_type: Auxiliary feature type (world or melspc)
--use_upsampling_layer: Flag to decide whether to use upsampling layer in WaveNet
--upsampling_factor: Changed to be always needed because feature extension is performed in loading
Note that an old model file checkpoint-*.pkl can still be used, but it is necessary to modify the model.conf file as follows.
# how-to-convert to new config file
import torch
args = torch.load("old_model.conf")
args.use_upsampling_layer = True
args.feature_type = "world"
torch.save(args, "new_model.conf")
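The note that the extended feature is "replicated when loading" can be pictured with a short sketch. It assumes that replication simply repeats each frame `upsampling_factor` times along the time axis; the helper name `extend_time` is hypothetical and the actual loader may differ. With the `--fs 16000 --shiftms 5` settings seen in the feature extraction logs on this page, one frame spans 80 samples, which matches the `--upsampling_factor 80` used in the training command quoted earlier.

```python
import numpy as np


def extend_time(feats, upsampling_factor):
    """Repeat each frame of feats (frames, dims) upsampling_factor times along time."""
    return np.repeat(feats, upsampling_factor, axis=0)


fs = 16000        # sampling rate in Hz
shiftms = 5       # frame shift in milliseconds
upsampling_factor = fs * shiftms // 1000

frames = np.arange(6, dtype=float).reshape(3, 2)  # 3 frames, 2 dims (toy data)
extended = extend_time(frames, upsampling_factor)
print(upsampling_factor, extended.shape)
```

Storing only the frame-level features and expanding them on load keeps the hdf5 files smaller by roughly the upsampling factor.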
Hi, I have some questions concerning your code:
1 - In the train_generator() function of the train.py script (lines 69 to 269), you create your batches using the buffers x_buffer and h_buffer. You initialize them at the beginning of your code and then fill them with new audio / feature data. My question refers to lines 135-136 and 178-179:
x_buffer = np.concatenate([x_buffer, x], axis=0)
h_buffer = np.concatenate([h_buffer, h], axis=0)
Initially, x_buffer and h_buffer are empty. However, when iterating over files, x_buffer and h_buffer may still contain data from the previous wav file. In that case, you concatenate data from two different audio files, which can affect training quality. Is this intentional?
2 - In the same script, at lines 472-474, your loss doesn't take the receptive field of the WaveNet into account. This may be because your shift size when creating batches is equal to batch_length and not batch_length + receptive_field, but I wanted to be sure about this choice in the loss calculation.
3 - Finally, an open question. Have you thought about training WaveNet on mel-spectrograms, as in the Tacotron 2 paper? Apparently, training on mel-spectrograms gives better audio quality during synthesis. This may require replacing your loss with a mixture of logistic distributions (MoL), as in the WaveNet 2 paper.
I hope this post doesn't bother you!
Cheers,
Julian
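Question 1 above can be illustrated with a simplified model of the buffering it describes. This is not the repository's train_generator(); `make_batches` is a hypothetical stand-in that only keeps the carry-over behaviour, and every sample stores the ID of the utterance it came from so that mixed batches are easy to spot.

```python
import numpy as np


def make_batches(utterances, batch_length):
    """Yield fixed-length chunks from a buffer that is carried over between utterances."""
    buffer = np.empty((0,), dtype=int)
    for x in utterances:
        buffer = np.concatenate([buffer, x], axis=0)
        while len(buffer) >= batch_length:
            yield buffer[:batch_length]
            buffer = buffer[batch_length:]


# Two toy "utterances": every sample stores the ID of the utterance it came from.
utterances = [np.full(5, 1), np.full(5, 2)]
batches = list(make_batches(utterances, batch_length=4))
mixed = [b for b in batches if len(set(b.tolist())) > 1]
print(len(batches), len(mixed))
```

In this toy setting the leftover sample of the first utterance ends up at the start of a batch that otherwise belongs to the second one, which is the situation the question asks about.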
Dear Tomoki,
Is it possible to run parallel training/conversion using more than one machine at the same time?
In our working environment, there are 4 machines, each with 2 GPUs, and Slurm is properly installed. However, it seems that only one machine can be allocated for stages 4 and 5. For example:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
P1*       up    infinite  1     alloc gccn01
P1*       up    infinite  4     idle  gccn[02-04],gchead
JOBID PARTITION NAME  USER ST TIME NODES NODELIST(REASON)
46    P1        tr.sh liao R  8:55 1     gccn01
Thanks for your help and have a nice day!
PS: Here are the environment settings:
#run.sh
n_gpus=2
n_quantize=256
n_aux=80
n_resch=512
n_skipch=256
dilation_depth=10
dilation_repeat=3
kernel_size=2
lr=1e-4
weight_decay=0.0
iters=200000
batch_length=20000
batch_size=8
checkpoints=1000
use_upsampling=true
use_noise_shaping=true
resume=
#cmd.sh
export train_cmd="slurm.pl --config conf/slurm.conf"
export cuda_cmd="slurm.pl --gpu 1 --config conf/slurm.conf"
#export max_jobs=-1
GresTypes=gpu
NodeName=gchead Gres=gpu:0 CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=128815 State=UNKNOWN
NodeName=gccn0[1-4] Gres=gpu:2 CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=128815 State=UNKNOWN
PartitionName=P1 Nodes=gchead,gccn0[1-4] Default=YES MaxTime=INFINITE State=UP
#gres.conf
Name=gpu Type=tesla File=/dev/nvidia0 Cores=0,1
Name=gpu Type=tesla File=/dev/nvidia1 Cores=0,1
NodeName=gccn01 Arch=x86_64 CoresPerSocket=10
CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.08 Features=(null)
Gres=gpu:2
NodeAddr=gccn01 NodeHostName=gccn01 Version=15.08
OS=Linux RealMemory=128815 AllocMem=0 FreeMem=123504 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
BootTime=2019-03-05T17:01:46 SlurmdStartTime=2019-03-10T18:32:19
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
NodeName=gccn02 Arch=x86_64 CoresPerSocket=10
CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.20 Features=(null)
Gres=gpu:2
NodeAddr=gccn02 NodeHostName=gccn02 Version=15.08
OS=Linux RealMemory=128815 AllocMem=0 FreeMem=1974 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
BootTime=2018-02-04T18:11:19 SlurmdStartTime=2019-03-10T18:32:24
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
NodeName=gccn03 Arch=x86_64 CoresPerSocket=10
CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.15 Features=(null)
Gres=gpu:2
NodeAddr=gccn03 NodeHostName=gccn03 Version=15.08
OS=Linux RealMemory=128815 AllocMem=0 FreeMem=113919 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
BootTime=2019-03-05T17:24:43 SlurmdStartTime=2019-03-10T18:32:28
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
NodeName=gccn04 Arch=x86_64 CoresPerSocket=10
CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.28 Features=(null)
Gres=gpu:2
NodeAddr=gccn04 NodeHostName=gccn04 Version=15.08
OS=Linux RealMemory=128815 AllocMem=0 FreeMem=126584 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
BootTime=2019-03-05T17:25:50 SlurmdStartTime=2019-03-10T18:32:32
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
NodeName=gchead Arch=x86_64 CoresPerSocket=10
CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.00 Features=(null)
Gres=(null)
NodeAddr=gchead NodeHostName=gchead Version=15.08
OS=Linux RealMemory=128815 AllocMem=0 FreeMem=114584 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
BootTime=2019-02-28T18:49:40 SlurmdStartTime=2019-03-10T18:32:47
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
I think there is a bug in the method convert_continuos_f0(f0) in feature_extract.py (lines 122-124):
if f0.all() == 0:
    print("WARNING: all of the f0 values are 0.")
    return uv, f0
If I understand correctly, you want to skip the conversion to continuous F0 when all the F0 values are equal to 0. However, f0.all() checks whether ALL the values in the array are True (i.e. are different from 0). If even one value is equal to 0, it returns False, and f0.all() == 0 then evaluates to True.
I don't know if this is intended, but it means that in most cases your F0 curve will not be made continuous.
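The difference between the two readings can be shown with plain NumPy on a toy F0 contour. The alternative `(f0 == 0).all()` is just one possible way to express "every value is 0"; it is offered as a suggestion, not as the author's fix.

```python
import numpy as np

f0 = np.array([0.0, 120.0, 130.0, 0.0, 125.0])  # toy contour with unvoiced frames

reported_check = f0.all() == 0     # True as soon as any frame is 0
intended_check = (f0 == 0).all()   # True only if every frame is 0

print(reported_check, intended_check)
```

The two expressions agree only when the contour is entirely zero or entirely non-zero, so any utterance with at least one unvoiced frame takes the early return under the reported check.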
Hey,
I'm trying to run your project but ran into some problems while running the 'Build SD model' ./run.sh (and the same problem occurs for 'Build SI-CLOSE model' and 'Build SI-OPEN model').
This is my log file message:
feature_extract.py --waveforms data/tr_slt/wav.scp --wavdir wav/tr_slt --hdf5dir hdf5/tr_slt --fs 16000 --shiftms 5 --minf0 120 --maxf0 275 --mcep_dim 24 --mcep_alpha 0.410 --highpass_cutoff 70 --fftl 1024 --n_jobs 10
Started at 22 10:05:24 IST 2018
Traceback (most recent call last):
File "../../../src/bin/feature_extract.py", line 22, in <module>
from sprocket.speech.feature_extractor import FeatureExtractor
File "/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/sprocket_vc-0.18-py3.6.egg/sprocket/speech/__init__.py", line 1, in <module>
from .feature_extractor import FeatureExtractor
File "/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/sprocket_vc-0.18-py3.6.egg/sprocket/speech/feature_extractor.py", line 3, in <module>
import pysptk
ModuleNotFoundError: No module named 'pysptk'
Accounting: time=1 threads=1
Ended (code 1) at 22 10:05:25 IST 2018, elapsed time 1 seconds
Does somebody know how to solve this?
thanks in advance.
Thank you for your work.
Is it possible to train the WaveNet vocoder with multi-speaker data and adapt it to a target speaker with limited data? If yes, how can I do that?