andabi / deep-voice-conversion

Deep neural networks for voice conversion (voice style transfer) in Tensorflow

License: MIT License

Python 96.48% Shell 2.97% Dockerfile 0.56%

deep-voice-conversion's Introduction

Voice Conversion with Non-Parallel Data

Subtitle: Speaking like Kate Winslet

Authors: Dabi Ahn, Kyubyong Park

Samples

https://soundcloud.com/andabi/sets/voice-style-transfer-to-kate-winslet-with-deep-neural-networks

Intro

What if you could imitate a famous celebrity's voice or sing like a famous singer? This project started with the goal of converting anyone's voice to a specific target voice, which is known as voice style transfer. Specifically, we aimed to convert a speaker's voice to that of the famous English actress Kate Winslet. To do so we implemented deep neural networks and used more than 2 hours of audiobook sentences read by Kate Winslet as the dataset.

Model Architecture

This is a many-to-one voice conversion system. The main significance of this work is that we can generate a target speaker's utterances without parallel data such as <source's wav, target's wav>, <wav, text> or <wav, phone> pairs, using only waveforms of the target speaker. (Building such parallel datasets takes a lot of effort.) All we need is a number of waveforms of the target speaker's utterances and a small set of <wav, phone> pairs from a number of anonymous speakers.

The model architecture consists of two modules:

Net1 (phoneme classification) classifies someone's utterances into one of the phoneme classes at every timestep.
- Phonemes are speaker-independent while waveforms are speaker-dependent.
Net2 (speech synthesis) synthesizes speech of the target speaker from the phones.

We applied the CBHG (1-D convolution bank + highway network + bidirectional GRU) modules introduced in Tacotron. CBHG is known to be good at capturing features from sequential data.

Net1 is a classifier.

Process: wav -> spectrogram -> mfccs -> phoneme dist.
Net1 classifies the spectrogram into one of 60 English phoneme classes at every timestep.
- For each timestep, the input is the log magnitude spectrogram and the target is the phoneme dist.
The objective function is cross-entropy loss.
The TIMIT dataset is used.
- It contains utterances from 630 speakers reading similar sentences, together with the corresponding phones.
Over 70% test accuracy.
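
As a rough illustration of the objective described above, the per-timestep cross-entropy between a predicted phoneme distribution and the labeled phone can be sketched in plain Python. The function names, the `temperature` parameter, and the toy three-class logits are assumptions made for illustration; this is not code from the repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_per_step, target_ids):
    """Mean negative log-likelihood of the target phoneme over all timesteps."""
    losses = []
    for logits, target in zip(logits_per_step, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)


# Toy example: 2 timesteps, 3 hypothetical phoneme classes.
logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
targets = [0, 2]
loss = cross_entropy(logits, targets)
print("mean cross-entropy:", loss)
```

A real Net1 would produce one logit vector per spectrogram frame, with as many entries as there are phoneme classes.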

Net2 is a synthesizer.

Net2 contains Net1 as a sub-network.

Process: net1(wav -> spectrogram -> mfccs -> phoneme dist.) -> spectrogram -> wav
Net2 synthesizes the target speaker's speech.
- The input/target is a set of the target speaker's utterances.
Since Net1 is already trained in the previous step, only the remaining part needs to be trained in this step.
The loss is the reconstruction error between input and target (L2 distance).
Datasets
- Target1(anonymous female): Arctic dataset (public)
- Target2(Kate Winslet): over 2 hours of audio book sentences read by her (private)
Griffin-Lim reconstruction is used when converting the spectrogram back to wav.

Implementations

Requirements

python 2.7
tensorflow >= 1.1
numpy >= 1.11.1
librosa == 0.5.1

Settings

sample rate: 16,000Hz
window length: 25ms
hop length: 5ms
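
For reference, these settings translate into sample counts as sketched below. The helper names are hypothetical, and the frame-count formula assumes centered framing (one frame per hop plus one), which may differ from what the repository actually uses.

```python
SAMPLE_RATE = 16000  # Hz, from the settings above
WINDOW_MS = 25
HOP_MS = 5


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(sample_rate * ms / 1000.0))


win_length = ms_to_samples(WINDOW_MS)  # 400 samples
hop_length = ms_to_samples(HOP_MS)     # 80 samples


def num_frames(num_samples, hop=hop_length):
    """Frame count under the assumed centered framing."""
    return 1 + num_samples // hop


# Two seconds of audio at 16 kHz yields 401 frames under this assumption.
print(win_length, hop_length, num_frames(2 * SAMPLE_RATE))
```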

Procedure

Train phase: Net1 and Net2 should be trained sequentially.
- Train1(training Net1)
  - Run train1.py to train and eval1.py to test.
- Train2(training Net2)
  - Run train2.py to train and eval2.py to test.
    - Train2 must be run after Train1 is done!
Convert phase: feed forward to Net2
- Run convert.py to get result samples.
- Check Tensorboard's audio tab to listen to the samples.
- Take a look at the phoneme dist. visualization on Tensorboard's image tab.
  - the x-axis represents phoneme classes and the y-axis represents timesteps
  - the first class on the x-axis is silence.

Tips (Lessons We've learned from this project)

Window length and hop length have to be small enough that a window covers only a single phoneme.
Obviously, sample rate, window length and hop length should be the same in both Net1 and Net2.
Before ISTFT (spectrogram to waveform), emphasizing the predicted spectrogram by raising it to a power of 1.0~2.0 helps remove noisy sound.
Applying temperature to the softmax in Net1 does not seem to be very meaningful.
IMHO, the accuracy of Net1 (phoneme classification) does not need to be perfect.
- Net2 can get close to optimal once Net1 is reasonably accurate.
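
The spectrogram emphasis tip can be illustrated with a small sketch. It assumes the predicted magnitudes are non-negative and scaled so that the peak is around 1.0. The function names and toy values are hypothetical, not taken from the repository.

```python
def emphasize(spectrogram, power=1.5):
    """Raise every magnitude to `power`. With a power above 1.0 the gap
    between strong and weak bins widens."""
    return [[value ** power for value in frame] for frame in spectrogram]


def peak_to_floor_ratio(spectrogram):
    """Ratio between the largest and smallest magnitude in the spectrogram."""
    values = [v for frame in spectrogram for v in frame]
    return max(values) / min(values)


# Toy magnitudes: two frames, two frequency bins each.
predicted = [[1.0, 0.1], [0.5, 0.05]]
sharpened = emphasize(predicted, power=1.5)

print(peak_to_floor_ratio(predicted), peak_to_floor_ratio(sharpened))
```

Because weak bins shrink faster than strong ones, low-energy noise is attenuated relative to the spectral peaks before the waveform is reconstructed.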

References

"Phonetic posteriorgrams for many-to-one voice conversion without parallel data training", 2016 IEEE International Conference on Multimedia and Expo (ICME)
"TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS", Submitted to Interspeech 2017


deep-voice-conversion's Issues

Parameters Used

How many epochs is the sample 'Kate' voice?

Did you change any parameters (such as learning rate) while training?

Can you post a picture of your TensorBoard net/train/loss and net/eval/loss for the 'Kate' voice?

Thank you.

May I ask your hardware?

Hi, may I ask about your hardware specification? Which graphics card do you use? How much RAM does your PC have? Thanks.

How to view the result of the conversion?

After running convert.py, how can I find the result? How do I use Tensorboard?
Thank you for your reply.

train1 stuck at first epoch

My training is stuck on the first epoch and I can't seem to figure out why.

Below is what I see when I run python train1.py. Any help would be fantastic.

[fakarim@blipp78 deep-voice-conversion]$ vim log1_1.txt
net1/cbhg/highwaynet_1/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense2/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/bias:0 [64] 64
net1/dense/kernel:0 [128, 61] 7808
net1/dense/bias:0 [61] 61
Total #vars=58, #param=363389 (1.39 MB assuming all float32)
[0611 19:17:35 @base.py:158] Setup callbacks graph ...
[0611 19:17:35 @summary.py:34] Maintain moving average summary of 0 tensors.
[0611 19:17:36 @base.py:174] Creating the session ...
2018-06-11 19:17:37.187985: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-11 19:17:41.938789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0a:00.0
totalMemory: 15.77GiB freeMemory: 15.36GiB
2018-06-11 19:17:41.938844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-11 19:17:42.321558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-11 19:17:42.321609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-06-11 19:17:42.321619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-06-11 19:17:42.322000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15990 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
[0611 19:17:43 @base.py:182] Initializing the session ...
[0611 19:17:43 @base.py:189] Graph Finalized.
2018-06-11 19:17:45.195791: W tensorflow/core/kernels/queue_base.cc:285] _0_QueueInput/input_queue: Skipping cancelled dequeue attempt with queue not closed
[0611 19:17:45 @concurrency.py:36] Starting EnqueueThread QueueInput/input_queue ...
[0611 19:17:45 @graph.py:70] Running Op sync_variables_from_main_tower ...
[0611 19:17:45 @base.py:209] Start Epoch 1 ...
  0%| |0/100[00:00<?,?it/s]
"log1_1.txt" [noeol] 141L, 11139C

Stuck when running train1.py

First of all, I really appreciate the great repo you made here. I'm trying to run train1.py but it's stuck at 0% for hours and I can't figure out what is happening.

Here is the TIMIT dataset folder structure that I put into /datasets.

I also changed *.wav to *.WAV in hparams.py

Did I miss something? Please help me! Thank you!

seems to be stuck on train1

it is stuck like this:

python3 train1.py default
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /home/lior/src/aws/deep-voice-conversion/models.py:80: arg_max (from tensorflow.python.ops.gen_math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `argmax` instead
2018-03-29 10:13:50.781241: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-03-29 10:13:50.891021: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-29 10:13:50.891337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.13GiB
2018-03-29 10:13:50.891350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
10:13:52 - starting epoch:1 of 1000
0%| | 0/144 [00:00<?, ?b/s]step 0

am i doing something wrong?

Python 3 syntax error in ./tools/split_wav.py

flake8 testing of https://github.com/andabi/deep-voice-conversion on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./tools/split_wav.py:45:16: E999 SyntaxError: invalid syntax
    map(lambda (i, w): write(w, sr, '{}/{}_{}.wav'.format(target_path, filename, i)), enumerate(split_wavs))
               ^

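Python 3 no longer allows tuple parameters in a lambda, and map passes each (i, w) pair from enumerate as a single argument, so the lambda has to take the pair and index it. map is also lazy in Python 3, so it has to be consumed for the files to be written:

list(map(lambda iw: write(iw[1], sr, '{}/{}_{}.wav'.format(target_path, filename, iw[0])), enumerate(split_wavs)))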
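
A plain loop sidesteps both the tuple-unpacking syntax and the laziness of map in Python 3. The sketch below shows that form; `write_all` and `fake_write` are hypothetical names, and the writer is stubbed out so the example runs without audio libraries.

```python
def write_all(split_wavs, sr, target_path, filename, write):
    """Write every chunk to '<target_path>/<filename>_<index>.wav'."""
    for i, w in enumerate(split_wavs):
        write(w, sr, '{}/{}_{}.wav'.format(target_path, filename, i))


# Stand-in for the real writer: it only records the paths it was given.
written_paths = []


def fake_write(wav, sr, path):
    written_paths.append(path)


write_all([[0.0], [0.1]], 16000, 'out', 'clip', fake_write)
print(written_paths)
```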

run train2 stop at Model loaded

Hello, I used the trained weights from #28, @VictoriaBentell,
and running python eval1.py timit works fine.

But in the next step, python train2.py timit arctic/slt, the training stops at (Model loaded. mode: train2, model_name: epoch_12_step_6924).

Can anyone tell me why? Thanks.
Environment: Ubuntu 16.04, tf-gpu 1.4.0

How long does it take to run train1 ?

Hi~
My server has four Intel(R) Xeon(R) CPU E5-2680 v4 and sixteen K80, and I have been running train1 for 72 hours. The program is at epoch 30, step 4320. In hparams.py, num_epoch is 1000. My dataset is TIMIT.
Is it too slow?
I hope you can give me an idea what to expect.
Thank you very much
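
A back-of-the-envelope extrapolation from the numbers in this post, assuming the time per epoch stays constant and that all 1000 epochs are actually run (both are assumptions):

```python
def hours_remaining(hours_elapsed, epochs_done, epochs_total):
    """Linear extrapolation of the remaining training time."""
    hours_per_epoch = hours_elapsed / float(epochs_done)
    return hours_per_epoch * (epochs_total - epochs_done)


steps_per_epoch = 4320 // 30                # 144 steps per epoch
remaining = hours_remaining(72, 30, 1000)   # about 2328 hours
print(steps_per_epoch, remaining, remaining / 24.0)
```

At 2.4 hours per epoch, the remaining 970 epochs would take roughly 97 days, so num_epoch is better read as an upper bound than as a target.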

How to speed up in between epochs?

Each step in train1 is nice and fast, but once the epoch is finished, it takes about 15 minutes to move onto the next epoch. Is there a known way to make this part faster? And what's going on here that takes so much time?

May I ask question??

Could you tell me what your Net1's accuracy is?
When I ran your 'train1.py' code with the TIMIT dataset, I found that the accuracy is only about 65%.

Thank you.

Decode error on “train1.py”

Traceback (most recent call last):
  File "train1.py", line 88, in <module>
    train(logdir=logdir)
  File "train1.py", line 58, in train
    mfcc, ppg = get_batch(model.mode, model.batch_size)
  File "/home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/data_load.py", line 259, in get_batch
    mfcc, ppg = map(_get_zero_padded, zip(*map(lambda w: get_mfccs_and_phones(w, hp_default.sr), target_wavs)))
  File "/home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/data_load.py", line 259, in <lambda>
    mfcc, ppg = map(_get_zero_padded, zip(*map(lambda w: get_mfccs_and_phones(w, hp_default.sr), target_wavs)))
  File "/home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/data_load.py", line 39, in get_mfccs_and_phones
    for line in open(phn_file, 'r').read().splitlines():
  File "/home/lab-huang.zhongyi/anaconda3/envs/tensorflow/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1024: invalid start byte
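
One possible reading of this traceback, offered as an assumption and not a confirmed diagnosis: TIMIT audio files begin with a 1024-byte NIST SPHERE header, so a decode failure at byte position 1024 would be consistent with `phn_file` pointing at audio data instead of a `.PHN` text file, for instance when the extension swap is case-sensitive. The helpers below are hypothetical and only sketch a case-aware path swap and a tolerant text read.

```python
import io
import os


def phn_path_for(wav_path):
    """Derive the phone-label path, matching the case of the wav extension."""
    root, ext = os.path.splitext(wav_path)
    return root + ('.PHN' if ext.isupper() else '.phn')


def read_phn_lines(phn_path):
    """Read label lines as text, replacing undecodable bytes instead of raising."""
    with io.open(phn_path, 'r', encoding='utf-8', errors='replace') as handle:
        return handle.read().splitlines()


print(phn_path_for('TRAIN/DR1/FCJF0/SA1.WAV'))
print(phn_path_for('train/dr1/fcjf0/sa1.wav'))
```

Note that `errors='replace'` only hides the symptom if the file really is audio, so checking the derived path first is the more useful step.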

Model request

Could you upload a pretrained model for speech recognition, please? The TIMIT dataset is not cheap, and anti-piracy policies in several states are very strict. Thank you!

Problem while running train2.py

After running python train2.py default default I ran into an error. I have changed the path of dataset in hparams.py under class Train2 like this data_path = '{}/arctic/bdl/*.wav'.format(data_path_base)

Model loaded. mode: train2, model_name: epoch_10_step_10
Traceback (most recent call last):
  File "train2.py", line 98, in <module>
    train(logdir1=logdir1, logdir2=logdir2,queue=False)
  File "train2.py", line 59, in train
    summ, gs = sess.run([summ_op, global_step])
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float and shape [32,?,40]
	 [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[32,?,40], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'Placeholder', defined at:
  File "train2.py", line 98, in <module>
    train(logdir1=logdir1, logdir2=logdir2,queue=False)
  File "train2.py", line 18, in train
    model = Model(mode="train2", batch_size=hp.Train2.batch_size, queue=queue)
  File "/home/ideabay/../deep-voice-conversion/models.py", line 22, in __init__
    self.x_mfcc, self.y_ppgs, self.y_spec, self.y_mel, self.num_batch = self.get_input(mode, batch_size, queue)
  File "/home/ideabay../deep-voice-conversion/models.py", line 43, in get_input
    x_mfcc = tf.placeholder(tf.float32, shape=(batch_size, None, hp_default.n_mfcc))
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1680, in placeholder
    return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3141, in _placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/home/ideabay/anaconda3/envs/deepvoice/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype float and shape [32,?,40]
	 [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[32,?,40], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Quick question: Time it takes to train?

Quick question: how long does it take to train on a GPU? (If you could mention the respective hardware used, that would be awesome.)

Thanks!

Few arguments in train1

On running train1.py it shows that too few arguments are present:
usage: train1.py [-h] case
train1.py: error: too few arguments
The terminal process terminated with exit code: 2
How to solve this issue?
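
The usage line `train1.py [-h] case` indicates that `case` is a required positional argument, so the script expects an invocation such as `python train1.py default`. A minimal sketch of a parser with the same shape follows; the help text and `build_parser` name are assumptions, not copied from the repository.

```python
import argparse


def build_parser():
    """Parser with one required positional argument, as in the usage line."""
    parser = argparse.ArgumentParser(prog='train1.py')
    parser.add_argument('case', type=str, help='experiment case name')
    return parser


# Equivalent to running: python train1.py default
args = build_parser().parse_args(['default'])
print(args.case)
```

Calling `parse_args([])` on this parser exits with code 2 and an error about the missing argument, which matches the behavior reported above.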

How long train2 works?

I used arctic/slt as the target voice dataset in train2.py.
And I used arctic/bdl as the input voice dataset in convert.py.

It is still running and has exceeded about 48 hours, as shown in the image below.

Additionally, this environment uses an NVIDIA TITAN X (Pascal) 12G GPU.
It is now at epoch 3050, step 94550.

When does it finish running?

Did anybody try the structure described in the original paper?

AttributeError: 'module' object has no attribute 'expressions'

chuwei@chuwei-virtual-machine:~/PycharmProjects/deep-voice-conversion-master$ python train1.py test1
Traceback (most recent call last):
  File "train1.py", line 91, in <module>
    train(logdir='logdir/default/train1', queue=True)
  File "train1.py", line 17, in train
    model = Model(mode="train1", batch_size=hp.Train1.batch_size, queue=queue)
  File "/home/chuwei/PycharmProjects/deep-voice-conversion-master/models.py", line 26, in __init__
    self.ppgs, self.pred_ppg, self.logits_ppg, self.pred_spec, self.pred_mel = self.net_template()
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/template.py", line 278, in __call__
    result = self._call_func(args, kwargs, check_for_new_variables=False)
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/template.py", line 217, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/chuwei/PycharmProjects/deep-voice-conversion-master/models.py", line 100, in _net2
    ppgs, preds_ppg, logits_ppg = self._net1()
  File "/home/chuwei/PycharmProjects/deep-voice-conversion-master/models.py", line 75, in _net1
    out = cbhg(prenet_out, hp.Train1.num_banks, hp.Train1.hidden_units // 2, hp.Train1.num_highway_blocks, hp.Train1.norm_type, self.is_training)
  File "/home/chuwei/PycharmProjects/deep-voice-conversion-master/modules.py", line 320, in cbhg
    out = gru(out, hidden_units, True) # (N, T, E)
  File "/home/chuwei/PycharmProjects/deep-voice-conversion-master/modules.py", line 214, in gru
    cell = tf.contrib.rnn.GRUCell(num_units)
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/lazy_loader.py", line 53, in __getattr__
    module = self._load()
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/lazy_loader.py", line 42, in _load
    module = importlib.import_module(self.__name__)
  File "/home/chuwei/anaconda2/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/__init__.py", line 31, in <module>
    from tensorflow.contrib import factorization
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/factorization/__init__.py", line 24, in <module>
    from tensorflow.contrib.factorization.python.ops.gmm import *
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/factorization/python/ops/gmm.py", line 27, in <module>
    from tensorflow.contrib.learn.python.learn.estimators import estimator
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/__init__.py", line 88, in <module>
    from tensorflow.contrib.learn.python.learn import *
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/__init__.py", line 23, in <module>
    from tensorflow.contrib.learn.python.learn import *
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/__init__.py", line 25, in <module>
    from tensorflow.contrib.learn.python.learn import estimators
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/__init__.py", line 297, in <module>
    from tensorflow.contrib.learn.python.learn.estimators.dnn import DNNClassifier
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn.py", line 30, in <module>
    from tensorflow.contrib.learn.python.learn.estimators import dnn_linear_combined
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 31, in <module>
    from tensorflow.contrib.learn.python.learn.estimators import estimator
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 49, in <module>
    from tensorflow.contrib.learn.python.learn.learn_io import data_feeder
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_io/__init__.py", line 21, in <module>
    from tensorflow.contrib.learn.python.learn.learn_io.dask_io import extract_dask_data
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_io/dask_io.py", line 26, in <module>
    import dask.dataframe as dd
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/dask/dataframe/__init__.py", line 3, in <module>
    from .core import (DataFrame, Series, Index, _Frame, map_partitions,
  File "/home/chuwei/anaconda2/lib/python2.7/site-packages/dask/dataframe/core.py", line 40, in <module>
    pd.core.computation.expressions.set_use_numexpr(False)
AttributeError: 'module' object has no attribute 'expressions'

originally defined at:
  File "/home/chuwei/PycharmProjects/deep-voice-conversion-master/models.py", line 25, in __init__
    self.net_template = tf.make_template('net', self._net2)

please provide pretrained weights of train2.py

Hi guys,

I'm struggling with loss fluctuations in the Net2 architecture while training it on the Arctic dataset.
If anybody has fine-tuned the Net2 architecture on any dataset, please share it here.
Thanks in advance.

I can't run it on Windows 10, could someone help me?

My env is win10 + anaconda2 + python3.5. It's my first time using tensorflow.
The log below looks like something went wrong when parsing hparams/default.yaml. I even tried changing the line endings of default.yaml from LF to Windows CRLF.
Could someone help me?

(python35) λ pip show pyyaml
Name: PyYAML
Version: 3.13
Summary: YAML parser and emitter for Python
Home-page: http://pyyaml.org/wiki/PyYAML
Author: Kirill Simonov
Author-email: [email protected]
License: MIT

(python35) λ pip show tensorflow
Name: tensorflow
Version: 1.9.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0

D:\proj_github\deep-voice-conversion (master -> origin)
(python35) λ python train1.py case
case: case, logdir: /data/private/vc/logdir/case/train1
[0725 16:52:49 @logger.py:109] WRN Log directory /data/private/vc/logdir/case/train1 exists! Use 'd' to delete it.
[0725 16:52:49 @logger.py:112] WRN If you're resuming from a previous run, you can choose to keep it.
Press any other key to exit.
Select Action: k (keep) / d (delete) / q (quit):d
[0725 16:52:52 @logger.py:74] Argv: train1.py case
[0725 16:52:52 @parallel.py:175] WRN MultiProcessPrefetchData does support windows. However, windows requires more strict picklability on processes, which may lead of failure on some of the code.
[0725 16:52:52 @parallel.py:185] [MultiProcessPrefetchData] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
Process _Worker-1:
Traceback (most recent call last):
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\dataflow\parallel.py", line 162, in run
    for dp in self.ds.get_data():
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\dataflow\common.py", line 116, in get_data
    for data in self.ds.get_data():
  File "D:\proj_github\deep-voice-conversion\data_load.py", line 35, in get_data
    yield get_mfccs_and_phones(wav_file=wav_file)
  File "D:\proj_github\deep-voice-conversion\data_load.py", line 72, in get_mfccs_and_phones
    wav = read_wav(wav_file, sr=hp.default.sr)
KeyError: 'default'
Process _Worker-2:
Traceback (most recent call last):
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\dataflow\parallel.py", line 162, in run
    for dp in self.ds.get_data():
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\dataflow\common.py", line 116, in get_data
    for data in self.ds.get_data():
  File "D:\proj_github\deep-voice-conversion\data_load.py", line 35, in get_data
    yield get_mfccs_and_phones(wav_file=wav_file)
  File "D:\proj_github\deep-voice-conversion\data_load.py", line 72, in get_mfccs_and_phones
    wav = read_wav(wav_file, sr=hp.default.sr)
KeyError: 'default'

[0725 16:52:31 @training.py:101] Building graph for training tower 1 on device /gpu:1 ...
[0725 16:52:34 @collection.py:164] These collections were modified but restored in tower1: (tf.GraphKeys.SUMMARIES: 3->5)
Traceback (most recent call last):
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1589, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Op type not registered 'NcclAllReduce' in binary running on mywind-PC. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed. while building NodeDef 'AllReduceGrads/NcclAllReduce'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/proj_github/deep-voice-conversion/train1.py", line 78, in <module>
    train(args, logdir=logdir_train1)
  File "D:/proj_github/deep-voice-conversion/train1.py", line 60, in train
    launch_train_with_config(train_conf, trainer=trainer)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\train\interface.py", line 81, in launch_train_with_config
    model._build_graph_get_cost, model.get_optimizer)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\utils\argtools.py", line 181, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\train\tower.py", line 173, in setup_graph
    train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\train\trainers.py", line 166, in _setup_graph
    self._make_get_grad_fn(input, get_cost_fn, get_opt_fn), get_opt_fn)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\graph_builder\training.py", line 232, in build
    all_grads = allreduce_grads(all_grads, average=self._average)  # #gpu x #param
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\tfutils\scope_utils.py", line 84, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\graph_builder\utils.py", line 140, in allreduce_grads
    summed = nccl.all_sum(grads)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\contrib\nccl\python\ops\nccl_ops.py", line 47, in all_sum
    return _apply_all_reduce('sum', tensors)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\contrib\nccl\python\ops\nccl_ops.py", line 228, in _apply_all_reduce
    shared_name=shared_name))
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\contrib\nccl\ops\gen_nccl_ops.py", line 58, in nccl_all_reduce
    num_devices=num_devices, shared_name=shared_name, name=name)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1756, in __init__
    control_input_ops)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1592, in _create_c_op
    raise ValueError(str(e))
ValueError: Op type not registered 'NcclAllReduce' in binary running on mywind-PC. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed. while building NodeDef 'AllReduceGrads/NcclAllReduce'

Process finished with exit code 1

only "queue = False" worked ?

When I set "queue = False" to train net1, it takes only 3 hours, and the loss is 25%.
But when I use "queue = True", it gets stuck.
GPU is GTX 1080Ti.
Thanks for help!

May I ask you a question?

Hello.
Are intonation, loudness of speech, etc. taken into account during conversion?
If you change the input from a statement into a question by intonation only, will the output also change?

Thank you.

Run train1.py error. No module name tfplot

Hi, I ran into this problem. Is the module "tfplot" missing?

python train1.py 
Traceback (most recent call last):
  File "train1.py", line 17, in <module>
    from data_load import Net1DataFlow
  File "/Users/yafengtang/Private/CelebritiesVoice/deep-voice-conversion/data_load.py", line 14, in <module>
    from utils import normalize_0_1
  File "/Users/yafengtang/Private/CelebritiesVoice/deep-voice-conversion/utils.py", line 12, in <module>
    import tfplot
ImportError: No module named tfplot

Does anybody know where "tfplot" is?
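A quick way to see which imports are missing before launching training is sketched below. It assumes a Python 3 interpreter (the traceback above looks like Python 2, where `importlib.util.find_spec` is not available), and the helper name `missing_modules` is made up for illustration. To my knowledge the `tfplot` import is provided by the `tensorflow-plot` package on PyPI, but please verify that against the project's requirements.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules that appear in the tracebacks of this issue list.
    print(missing_modules(["tfplot", "librosa", "tensorflow"]))
```

Any name printed by the script still has to be installed into the same environment that runs train1.py.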

How should I shape the dataset for train2? I have an hour-long stereo WAV at 48000 Hz

I tried converting it to 16000 Hz and cutting it into 0.25-second pieces,
but I keep getting

2018-03-29 15:15:26.407883: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ******************************************************xx*******************************************_
2018-03-29 15:15:26.407910: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[32,401,4096]
Traceback (most recent call last):
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,401,4096]
	 [[Node: net/net2/cbhg2/conv1d_banks/concat = ConcatV2[N=16, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](net/net2/cbhg2/conv1d_banks/num_1/Relu, net/net2/cbhg2/conv1d_banks/num_2/Relu, net/net2/cbhg2/conv1d_banks/num_3/Relu, net/net2/cbhg2/conv1d_banks/num_4/Relu, net/net2/cbhg2/conv1d_banks/num_5/Relu, net/net2/cbhg2/conv1d_banks/num_6/Relu, net/net2/cbhg2/conv1d_banks/num_7/Relu, net/net2/cbhg2/conv1d_banks/num_8/Relu, net/net2/cbhg2/conv1d_banks/num_9/Relu, net/net2/cbhg2/conv1d_banks/num_10/Relu, net/net2/cbhg2/conv1d_banks/num_11/Relu, net/net2/cbhg2/conv1d_banks/num_12/Relu, net/net2/cbhg2/conv1d_banks/num_13/Relu, net/net2/cbhg2/conv1d_banks/num_14/Relu, net/net2/cbhg2/conv1d_banks/num_15/Relu, net/net2/cbhg2/conv1d_banks/num_16/Relu, net/net2/cbhg2/conv1d_banks/concat/axis)]]
	 [[Node: gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape/_837 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4748_gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train2.py", line 99, in <module>
    train(logdir1=logdir1, logdir2=logdir2)
  File "train2.py", line 57, in train
    sess.run(train_op, feed_dict={model.x_mfcc: mfcc, model.y_spec: spec, model.y_mel: mel})
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,401,4096]
	 [[Node: net/net2/cbhg2/conv1d_banks/concat = ConcatV2[N=16, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](net/net2/cbhg2/conv1d_banks/num_1/Relu, net/net2/cbhg2/conv1d_banks/num_2/Relu, net/net2/cbhg2/conv1d_banks/num_3/Relu, net/net2/cbhg2/conv1d_banks/num_4/Relu, net/net2/cbhg2/conv1d_banks/num_5/Relu, net/net2/cbhg2/conv1d_banks/num_6/Relu, net/net2/cbhg2/conv1d_banks/num_7/Relu, net/net2/cbhg2/conv1d_banks/num_8/Relu, net/net2/cbhg2/conv1d_banks/num_9/Relu, net/net2/cbhg2/conv1d_banks/num_10/Relu, net/net2/cbhg2/conv1d_banks/num_11/Relu, net/net2/cbhg2/conv1d_banks/num_12/Relu, net/net2/cbhg2/conv1d_banks/num_13/Relu, net/net2/cbhg2/conv1d_banks/num_14/Relu, net/net2/cbhg2/conv1d_banks/num_15/Relu, net/net2/cbhg2/conv1d_banks/num_16/Relu, net/net2/cbhg2/conv1d_banks/concat/axis)]]
	 [[Node: gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape/_837 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4748_gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'net/net2/cbhg2/conv1d_banks/concat', defined at:
  File "train2.py", line 99, in <module>
    train(logdir1=logdir1, logdir2=logdir2)
  File "train2.py", line 18, in train
    model = Model(mode="train2", batch_size=hp.Train2.batch_size, queue=queue)
  File "/home/lior/src/aws/deep-voice-conversion/models.py", line 26, in __init__
    self.ppgs, self.pred_ppg, self.logits_ppg, self.pred_spec, self.pred_mel = self.net_template()
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/ops/template.py", line 278, in __call__
    result = self._call_func(args, kwargs, check_for_new_variables=False)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/ops/template.py", line 217, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/lior/src/aws/deep-voice-conversion/models.py", line 115, in _net2
    pred_spec = cbhg(pred_spec, hp.Train2.num_banks, hp.Train2.hidden_units // 2, hp.Train2.num_highway_blocks, hp.Train2.norm_type, self.is_training, scope="cbhg2")
  File "/home/lior/src/aws/deep-voice-conversion/modules.py", line 307, in cbhg
    is_training=is_training)  # (N, T, K * E / 2)
  File "/home/lior/src/aws/deep-voice-conversion/modules.py", line 191, in conv1d_banks
    outputs = tf.concat(outputs, -1)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1099, in concat
    return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 706, in _concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,401,4096]
	 [[Node: net/net2/cbhg2/conv1d_banks/concat = ConcatV2[N=16, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](net/net2/cbhg2/conv1d_banks/num_1/Relu, net/net2/cbhg2/conv1d_banks/num_2/Relu, net/net2/cbhg2/conv1d_banks/num_3/Relu, net/net2/cbhg2/conv1d_banks/num_4/Relu, net/net2/cbhg2/conv1d_banks/num_5/Relu, net/net2/cbhg2/conv1d_banks/num_6/Relu, net/net2/cbhg2/conv1d_banks/num_7/Relu, net/net2/cbhg2/conv1d_banks/num_8/Relu, net/net2/cbhg2/conv1d_banks/num_9/Relu, net/net2/cbhg2/conv1d_banks/num_10/Relu, net/net2/cbhg2/conv1d_banks/num_11/Relu, net/net2/cbhg2/conv1d_banks/num_12/Relu, net/net2/cbhg2/conv1d_banks/num_13/Relu, net/net2/cbhg2/conv1d_banks/num_14/Relu, net/net2/cbhg2/conv1d_banks/num_15/Relu, net/net2/cbhg2/conv1d_banks/num_16/Relu, net/net2/cbhg2/conv1d_banks/concat/axis)]]
	 [[Node: gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape/_837 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4748_gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
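To put the OOM message in perspective, the arithmetic below estimates the size of the single tensor named in the error, shape [32,401,4096] with type DT_FLOAT (4 bytes per element). This is only a sketch: the helper `tensor_bytes` is hypothetical, and reading the first dimension as the batch size is an assumption based on the `batch_size` argument in the traceback.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor; 4 bytes per element matches DT_FLOAT."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shape reported in the ResourceExhaustedError above.
full = tensor_bytes((32, 401, 4096)) / 2**20
# Same tensor if the first dimension (assumed to be the batch size) is halved.
half = tensor_bytes((16, 401, 4096)) / 2**20

print("batch 32: %.2f MiB, batch 16: %.2f MiB" % (full, half))
```

One such tensor is about 200 MiB, and training keeps many activations and gradients of similar size alive at once, so the total can exceed GPU memory quickly. The size scales linearly with the first dimension, which is why lowering the batch size is a common first thing to try.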

GPU requirements

Hello @andabi, does this model need a more powerful GPU? My GPU is an NVIDIA GeForce GTX 1080 with 8 GB. Is that enough to run it? If it isn't, can I reduce the batch size and shrink the dataset? Thank you.

Samples not found

The voice samples in the ReadMe no longer seem to be available.

Training on CPU: error after modifying the code

I would like to edit the code to run on the CPU, but I am getting the error below. I tried changing the Trainer and the config but got stuck. I need help.

Traceback (most recent call last):
File "train1.py", line 86, in <module>
train(args, logdir=logdir_train1)
File "train1.py", line 66, in train
launch_train_with_config(train_conf, trainer )
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/interface.py", line 85, in launch_train_with_config
model._build_graph_get_cost, model.get_optimizer)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/utils/argtools.py", line 181, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/tower.py", line 202, in setup_graph
input_callbacks = self._setup_input(inputs_desc, input)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/tower.py", line 218, in _setup_input
return input.setup(inputs_desc)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/utils/argtools.py", line 181, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/input_source/input_source_base.py", line 99, in setup
return self.get_callbacks()
File "/usr/local/lib/python2.7/dist-packages/functools32/functools32.py", line 378, in wrapper
result = user_function(*args, **kwds)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/input_source/input_source_base.py", line 126, in get_callbacks
before_train=lambda _: self.reset_state())] + self._get_callbacks()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/input_source/input_source.py", line 245, in _get_callbacks
return [cb, self._create_ema_callback(), _get_reset_callback(self._inf_ds)]
File "/usr/local/lib/python2.7/dist-packages/tensorpack/input_source/input_source.py", line 235, in _create_ema_callback
size_ema_op = add_moving_summary(size, collection=None, decay=0.5)[0].op
File "/usr/local/lib/python2.7/dist-packages/tensorpack/tfutils/summary.py", line 222, in add_moving_summary
ctx = get_current_tower_context()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/tfutils/tower.py", line 222, in get_current_tower_context
assert _CurrentTowerContext is not None, "The function is supposed to be called under a TowerContext!"
AssertionError: The function is supposed to be called under a TowerContext!

Problem when I try to run train1.py

I'm trying to run train1.py, but I get this error:

Traceback (most recent call last):
File "train1.py", line 17, in <module>
from data_load import Net1DataFlow
File "/home/deep-voice-conversion/data_load.py", line 7, in <module>
import librosa
File "/home/.local/lib/python2.7/site-packages/librosa/__init__.py", line 12, in <module>
from . import core
File "/home/.local/lib/python2.7/site-packages/librosa/core/__init__.py", line 108, in <module>
from .time_frequency import * # pylint: disable=wildcard-import
File "/home/.local/lib/python2.7/site-packages/librosa/core/time_frequency.py", line 10, in <module>
from ..util.exceptions import ParameterError
File "/home/.local/lib/python2.7/site-packages/librosa/util/__init__.py", line 67, in <module>
from .utils import * # pylint: disable=wildcard-import
File "/home/.local/lib/python2.7/site-packages/librosa/util/utils.py", line 103, in <module>
def valid_audio(y, mono=True):
File "/home/.local/lib/python2.7/site-packages/librosa/cache.py", line 49, in wrapper
if self.cachedir is not None and self.level >= level:
File "/home/.local/lib/python2.7/site-packages/joblib/memory.py", line 847, in cachedir
DeprecationWarning, stacklevel=2)
TypeError: expected string or buffer

If I remove "import librosa" from every .py file in your project, this message disappears, but train1.py still can't run because Python can't use the functions from the librosa package.

I'm currently using Python 2.7.12 and tensorflow-gpu 1.4.0.

train2.py Placeholder error

Hello, I am also using the trained weights from #28, @VictoriaBentell.
I ran train2.py with the command below.
python train2.py exp1 exp2
The weights are in "./logdir/exp1/train1".
There were some errors related to filenames (.WAV vs. .wav) and file paths, but I have already solved them.

But I got errors like these.

What is the problem?
I need help.

Problem running train1.py

I changed "queue = True" to "queue = False" in train1.py and *.wav to *.WAV in hparams.py, then I got this error.

Which operating system are you using for this? Windows, Mac, or something else?

Logdir storage requirements

I am trying to run the two networks. When I ran the two scripts, the logdir folder used almost 40 GB of disk space, and I had to terminate because I had little space left on my AWS instance. Any ideas on how to tackle this?

I have reduced the number of epochs for both scripts to only 500.
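Before changing any training settings, it can help to find out what is actually filling the log directory (checkpoints, event files, or generated audio). The sketch below uses only the standard library; the function name `largest_files` and the `logdir` path are assumptions for illustration, not part of the project.

```python
import os


def largest_files(root, top_n=10):
    """Return up to top_n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file may have been rotated away while we were walking.
                continue
    return sorted(sizes, reverse=True)[:top_n]


if __name__ == "__main__":
    # "logdir" is an assumed location; point this at your own log directory.
    for size, path in largest_files("logdir"):
        print("%10.1f MiB  %s" % (size / 2**20, path))
```

If old checkpoints dominate the listing, keeping fewer of them or saving less often is probably more effective than lowering the epoch count.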

Can we use Wavenet-vocoder for voice synthesis?

Is it possible to integrate this project with the WaveNet vocoder (https://github.com/r9y9/wavenet_vocoder) and generate voice from the spectrogram using Parallel WaveNet?

AttributeError: 'Namespace' object has no attribute 'case1'

What arguments should I pass to train1.py?

Can anyone tell me what "case" is in train1?

I have been facing this problem for a long time.
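The AttributeError in the title is what argparse produces when code reads an attribute that was never declared as an argument. The sketch below reproduces the pattern under the assumption that the script declares a single positional argument named `case` (another issue in this list runs `python3 train1.py case`); it is not the project's actual parser.

```python
import argparse


def build_parser():
    """Hypothetical parser with one positional argument, used only for illustration."""
    parser = argparse.ArgumentParser()
    parser.add_argument("case", type=str, help="experiment case name")
    return parser


# Equivalent to running: python train1.py my_case
args = build_parser().parse_args(["my_case"])

print(args.case)               # the declared argument is available
print(hasattr(args, "case1"))  # an undeclared name is not, so args.case1 would raise
```

Under that assumption, the "case" is just a free-form experiment name, and a mismatch between the declared argument name and the attribute the script reads would raise this error whatever value is passed on the command line.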

Question about the run times of train1.py and train2.py

Hi,

I'm trying this out, but I only have a GTX 960 GPU. What kind of training run times do you see for these two scripts? Will it take me weeks, days, or hours? When I started python train1.py, a few lines of logging came out, but it seems to have frozen already, with no sign of life and no error.

I hope you can give me an idea what to expect.

Thank you very much,
=)

Does anyone have any idea how to train using a GPU without Docker?

I've been training on the CPU and it's painfully slow.
Does anyone have advice on how to configure TensorFlow to train on a GPU without a Docker container?
It seems that tensorflow-gpu installed from Anaconda requires tensorflow to be installed as well, and that tensorflow is used for training instead of tensorflow-gpu.

License

Could you tell me the license of the code?

May I know the whole training process?

Is there a dataset for train2 here?

Stuck when running train1

I have put the TIMIT dataset in './datasets', and set the corresponding datapath in 'default.yaml'.

When I ran 'python3 train1.py case', an error showed up that looked like this:

Then it finally got stuck here:

Does anyone have any idea to solve this problem? Please let me know. Thank you.

Net2 cannot converge

I trained net1 to over 70% accuracy and then loaded the pretrained net1 to train net2. However, whatever I do, net2 does not converge. BTW, I decreased the train2 batch_size from 32 to 16; everything else is unchanged.
Here is the net2 training loss.

Consequently, net2 synthesizes a fuzzy sound, because the synthesis loss is quite high. Has anyone else run into similar problems?

[query] Training on custom data

I couldn't find this anywhere in the README. Is it possible to train on a custom audio dataset?

Some questions about the performance and the amount of the data

Hello. The results I get from training with your demo are not very good, and they do not sound like yours. I also trained on a Chinese dataset, and the results were still poor. So I would like to know whether the train2 parameters shown on the web page are the default parameters. Also, how much data did you use?

Problem running with Docker?

Hi, I am interested in audio style transfer. I set up a Docker container (running a tensorflow-gpu image) on a host with 32 GB of memory and a 1080 Ti. Following your tutorial, I ran the command:
[python train1.py default] (default is a name I picked arbitrarily). It runs without stopping and stays stuck at epoch=1 forever. What am I doing wrong? Looking forward to your answer. Thanks.

Run the model

Hello @andabi, what is the actual command for running train1.py?

Parallel Processing

First, an enormous thank you for such a great project; your work is really excellent!

I had a question about train2 and parallelizing the processing. I had been running on a single GPU, but I tried a 16-GPU system and didn't notice a performance improvement. I added log_device_placement=True to see what steps were being assigned to which GPU, and it looks like everything goes to GPU:0. I am kind of new to Tensorflow; is it the case that this will currently only use one GPU as written? Thank you!

How to train on the arctic dataset?

If I'm understanding correctly, if we want to use train1 on the arctic dataset, we just need to do
python train1.py datasets/arctic/ right? It's stuck on the first epoch, and it's hard to tell if this is in the wrong format because it doesn't give a 'no such file or directory' error.

Problem running train1.py

I get an error when I run train1.py:

Traceback (most recent call last):
File "train1.py", line 93, in <module>
train(logdir=logdir)
File "train1.py", line 17, in train
model = Model(mode="train1", batch_size=hp.Train1.batch_size, queue=queue)
File "/home/ruby/deep-voice-conversion/models.py", line 26, in __init__
self.ppgs, self.pred_ppg, self.logits_ppg, self.pred_spec, self.pred_mel = self.net_template()
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/template.py", line 276, in __call__
return self._call_func(args, kwargs, check_for_new_variables=False)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/template.py", line 216, in _call_func
result = self._func(*args, **kwargs)
File "/home/ruby/deep-voice-conversion/models.py", line 100, in _net2
ppgs, preds_ppg, logits_ppg = self._net1()
File "/home/ruby/deep-voice-conversion/models.py", line 75, in _net1
out = cbhg(prenet_out, hp.Train1.num_banks, hp.Train1.hidden_units // 2, hp.Train1.num_highway_blocks, hp.Train1.norm_type, self.is_training)
File "/home/ruby/deep-voice-conversion/modules.py", line 307, in cbhg
is_training=is_training) # (N, T, K * E / 2)
File "/home/ruby/deep-voice-conversion/modules.py", line 189, in conv1d_banks
output = normalize(output, type=norm_type, is_training=is_training, activation_fn=tf.nn.relu)
File "/home/ruby/deep-voice-conversion/modules.py", line 116, in normalize
beta = tf.get_variable("beta", shape=params_shape, initializer=tf.zeros_initializer)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 988, in get_variable
custom_getter=custom_getter)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 890, in get_variable
custom_getter=custom_getter)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 348, in get_variable
validate_shape=validate_shape)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 333, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 684, in _get_single_variable
validate_shape=validate_shape)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 197, in __init__
expected_shape=expected_shape)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 274, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "/home/ruby/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 673, in <lambda>
shape.as_list(), dtype=dtype, partition_info=partition_info)
TypeError: __init__() got multiple values for keyword argument 'dtype'

originally defined at:
File "/home/ruby/deep-voice-conversion/models.py", line 25, in __init__
self.net_template = tf.make_template('net', self._net2)

I put TIMIT into the datasets folder.
Can someone help me solve this problem?
Thank you.
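A likely cause, assuming a TensorFlow version in which `tf.zeros_initializer` is a class rather than a function: the traceback shows modules.py passing `initializer=tf.zeros_initializer` without parentheses, so TensorFlow ends up calling the class constructor with a shape plus `dtype=...`, and the shape collides with the constructor's own `dtype` parameter. The pure-Python sketch below reproduces that pattern with stand-in names (`ZerosInitializer` and `get_variable` are not the real TensorFlow objects).

```python
class ZerosInitializer:
    """Stand-in for an initializer class whose constructor only accepts dtype."""

    def __init__(self, dtype="float32"):
        self.dtype = dtype

    def __call__(self, shape, dtype=None, partition_info=None):
        return [0.0] * shape[0]


def get_variable(shape, initializer, dtype="float32"):
    # Mirrors the call in the traceback: initializer(shape, dtype=..., partition_info=...)
    return initializer(shape, dtype=dtype, partition_info=None)


# Passing an instance works: the shape goes to __call__.
ok = get_variable([3], ZerosInitializer())

# Passing the class itself sends the shape to __init__, where it collides with dtype.
raised = False
try:
    get_variable([3], ZerosInitializer)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

If this diagnosis applies, writing `initializer=tf.zeros_initializer()` (with parentheses) in modules.py would be the corresponding change; treat that as a suggestion to verify against your installed TensorFlow version.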

net2 training problem: ValueError: selected axis is out of range

After training net1, I used slt from Arctic to train net2, but I ran into this problem:

Traceback (most recent call last):
File "train2.py", line 98, in <module>
train(logdir1=logdir1, logdir2=logdir2)
File "train2.py", line 70, in train
convert.convert(logdir2, queue=False)
File "/home/xjl910940173/yang/deep-voice-conversion-master/deep-voice-conversion-master/convert.py", line 69, in convert
audio = inv_preemphasis(audio, coeff=hp_default.preemphasis)
File "/home/xjl910940173/yang/deep-voice-conversion-master/deep-voice-conversion-master/utils.py", line 31, in inv_preemphasis
return signal.lfilter([1], [1, -coeff], x)
File "/home/xjl910940173/.local/lib/python3.5/site-packages/scipy/signal/signaltools.py", line 1346, in lfilter
return sigtools._linear_filter(b, a, x, axis)
ValueError: selected axis is out of range

Looking forward to your reply!
