
adaspeech's Introduction

Hi there, I'm Rishikesh, Speech and Computer Vision Researcher👋

Hi friends, I'm Rishikesh, Co-founder and CTO of Dubpro.ai (formerly known as DeepSync Technologies). I graduated from NIT Silchar and, immediately after graduation, joined my first organisation, Nucleus Software, as a Full Stack Developer. I have a keen interest in machine learning and deep learning research, especially in the fields of speech synthesis and computer vision.

  • 🔭 I’m currently working on Speech Synthesis and End to End Text to Speech (TTS) engines.
  • 🌱 I love to code and contribute to Open Source.
  • 💬 Ask me anything regarding my work, code and research here (Please tag me @rishikksh20 in your comment.).
  • 📫 How to reach me: [email protected]

Connect with me:

ai_rishikesh | Twitter


Languages and Tools:

Python · PyTorch · GitHub · Visual Studio Code · AWS · Azure

adaspeech's People

Contributors

rishikksh20


adaspeech's Issues

How to implement phoneme level encoder

Hey! I'm working on this as well, but I'm stuck on the phoneme-level encoder: I don't know how to compute the phoneme-level mel properly (without a for loop...). Do you have any idea how to do it? Maybe a single PyTorch/TensorFlow function could do it? Thanks a lot.
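
One way to do this without a Python loop is to average the frame-level mel over each phoneme's duration with an index-based reduction. A minimal PyTorch sketch (the function and variable names are mine, not the repo's, and it assumes the durations sum to the number of mel frames):

import torch

def phoneme_level_mel(mel: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    """Average frame-level mel frames over each phoneme's duration.

    mel:       (T, n_mels) frame-level mel-spectrogram
    durations: (L,) integer frame counts per phoneme, with sum(durations) == T
    returns:   (L, n_mels) phoneme-level mel
    """
    durations = durations.to(device=mel.device, dtype=torch.long)
    num_phonemes = durations.numel()
    # frame i belongs to phoneme frame_to_phoneme[i]
    frame_to_phoneme = torch.repeat_interleave(
        torch.arange(num_phonemes, device=mel.device), durations
    )
    # sum the frames of each phoneme, then divide by its duration to get the mean
    sums = torch.zeros(num_phonemes, mel.size(1), device=mel.device, dtype=mel.dtype)
    sums.index_add_(0, frame_to_phoneme, mel)
    return sums / durations.clamp(min=1).unsqueeze(1).to(mel.dtype)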

The demo problem in hp

I have a problem with hparams.
Following an earlier answer, I changed "import hparams as hp" to
"from utils.hparams import HParam
hp = HParam("configs/default.yaml")"
I then found idim = hp.symbol_len and odim = hp.num_mels, but symbol_len does not exist and num_mels lives in the audio dictionary.
How can I fix this? I only know that num_mels should become hp.audio.num_mels.
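
For what it's worth, the nested yaml sections are reached through attribute access on the HParam object, and the input dimension has to come from the symbol table rather than from a (non-existent) symbol_len key. A sketch under those assumptions; the symbols import path below is hypothetical and should be pointed at wherever the repo actually defines its symbol list:

from utils.hparams import HParam

hp = HParam("configs/default.yaml")

# num_mels lives under the "audio" section of the yaml, so go through hp.audio
odim = hp.audio.num_mels

# there is no symbol_len key; derive the input dimension from the symbol table instead
from dataset.texts import symbols  # hypothetical path: replace with the repo's symbol module
idim = len(symbols)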

RuntimeError: Trying to create tensor with negative dimension -6: [-6, 256]

Hello @rishikksh20, thanks for sharing your work. I have some issues when training the model.

    hs = self.length_regulator(hs, ds, ilens)  # (B, Lmax, adim)
  File "/data/tuong/Yen/AdaSpeech/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/tuong/Yen/AdaSpeech/core/duration_modeling/length_regulator.py", line 63, in forward
    xs = [self._repeat_one_sequence(x, d) for x, d in zip(xs, ds)]
  File "/data/tuong/Yen/AdaSpeech/core/duration_modeling/length_regulator.py", line 63, in <listcomp>
    xs = [self._repeat_one_sequence(x, d) for x, d in zip(xs, ds)]
  File "/data/tuong/Yen/AdaSpeech/core/duration_modeling/length_regulator.py", line 93, in _repeat_one_sequence
    out.append(x_.repeat(int(d_), 1))
RuntimeError: Trying to create tensor with negative dimension -6: [-6, 256]

and the gradient is not updated: WARNING - grad norm is nan. Do not update model.
Can you help me solve this issue? Thank you.
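
The traceback shows a duration of -6 reaching x_.repeat(int(d_), 1), i.e. the length regulator is being fed a negative duration (usually a sign of a bad alignment or mismatched duration file, which would also explain the NaN grad norm). A defensive sketch of the expansion step, assuming non-positive durations can simply be dropped; the function name and handling are illustrative, not the repo's actual fix:

import torch

def repeat_one_sequence(x: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Expand encoder frames by their durations, ignoring non-positive entries.

    x: (L, adim) encoder outputs for one utterance
    d: (L,)      per-phoneme durations; negative values make repeat() fail
    """
    d = d.to(x.device).long().clamp(min=0)  # never repeat a frame a negative number of times
    return torch.repeat_interleave(x, d, dim=0)

Clamping only hides the symptom, though; if durations go negative, it is worth re-checking the alignment/duration preprocessing first.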

yaml

In the yaml there is data_dir: 'H:\Deepsync\backup\fastspeech\data'. How do I find this data, or what should data_dir point to?

maybe there is a bug in fastspeech.py

hello rishikksh20, thanks for your contribution!
I found a problem when training with this code, at line 415 of fastspeech.py:

if avg_mel is not None:
    avg_mel = avg_mel.unsqueeze(0)
    # inference
    before_outs, outs, d_outs, _ = self._forward(xs, ilens=ilens, ys=ref_mel, avg_mel=avg_mel,
                                                 is_inference=True,
                                                 phn_level_predictor=phn_level_predictor)  # (L, odim)
else:
    before_outs, outs, d_outs, _ = self._forward(xs, ilens=ilens, ys=ref_mel, is_inference=True,
                                                 phn_level_predictor=phn_level_predictor)  # (L, odim)

# inference
_, outs, _, _, _ = self._forward(xs, ilens, is_inference=True)  # (L, odim)

return outs[0]

I think the last inference call doesn't need to run _forward again, like below:

if avg_mel is not None:
    avg_mel = avg_mel.unsqueeze(0)
    # inference
    before_outs, outs, d_outs, _ = self._forward(xs, ilens=ilens, ys=ref_mel, avg_mel=avg_mel,
                                                 is_inference=True,
                                                 phn_level_predictor=phn_level_predictor)  # (L, odim)
else:
    before_outs, outs, d_outs, _ = self._forward(xs, ilens=ilens, ys=ref_mel, is_inference=True,
                                                 phn_level_predictor=phn_level_predictor)  # (L, odim)

# inference
#_, outs, _, _, _ = self._forward(xs, ilens, is_inference=True)  # (L, odim)

return outs[0]

pretrained model weights

Hello and thank you for sharing your work!
Could you please provide the pretrained model weights?
Thank you in advance!

TypeError: expected str, bytes or os.PathLike object, not NoneType

Traceback (most recent call last):
  File "c:/Users/최현석/OneDrive/바탕 화면/AdaSpeech-master/AdaSpeech-master/nvidia_preprocessing.py", line 96, in <module>
    hp = HParam(args.config)
  File "c:\Users\최현석\OneDrive\바탕 화면\AdaSpeech-master\AdaSpeech-master\utils\hparams.py", line 58, in __init__
    hp_dict = load_hparam(file)
  File "c:\Users\최현석\OneDrive\바탕 화면\AdaSpeech-master\AdaSpeech-master\utils\hparams.py", line 16, in load_hparam
    stream = open(filename)
TypeError: expected str, bytes or os.PathLike object, not NoneType

My code is here.

  1. line 96 (hp = HParam(args.config)):

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d", "--data_path", type=str, help="root directory of wav files"
    )
    parser.add_argument(
        "-c", "--config", type=str, help="yaml file for configuration"
    )
    args = parser.parse_args()

    hp = HParam(args.config)

    main(args, hp)
  2. line 16 (stream = open(filename, "r")):

def load_hparam(filename):
    stream = open(filename, "r")
    docs = yaml.load_all(stream, Loader=yaml.Loader)
    hparam_dict = dict()
    for doc in docs:
        for k, v in doc.items():
            hparam_dict[k] = v
    return hparam_dict

  3. line 58 (hp_dict = load_hparam(file)):

class HParam(Dotdict):
    def __init__(self, file):
        super(Dotdict, self).__init__()
        hp_dict = load_hparam(file)
        hp_dotdict = Dotdict(hp_dict)
        for k, v in hp_dotdict.items():
            setattr(self, k, v)

    __getattr__ = Dotdict.__getitem__
    __setattr__ = Dotdict.__setitem__
    __delattr__ = Dotdict.__delitem__

How can I solve this problem?
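
For reference, this error means args.config was None by the time it reached open(filename), i.e. the script was launched without the -c/--config argument, so argparse filled in None. A minimal guard (a sketch; required=True is my addition, not the repo's code):

import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d", "--data_path", type=str, help="root directory of wav files"
    )
    # required=True makes argparse exit with a clear message instead of
    # silently handing None down to open() inside load_hparam()
    parser.add_argument(
        "-c", "--config", type=str, required=True, help="yaml file for configuration"
    )
    args = parser.parse_args()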

Conditional Layer Normalization

Hi, I have followed your work for several months and am really impressed by how quickly you track new algorithms.
For AdaSpeech, have you verified that the two acoustic encoders really help the training of custom speakers? How does this compare to a speaker embedding generated by a speaker encoder trained on a speaker verification task?
And you have not implemented "Conditional Layer Normalization" yet, right? Are the following references suitable if I implement it myself, or can you give any suggestions on how to do this?
https://github.com/exe1023/CBLN/blob/e395edc2d6d952497b411f81eae4aafb96749bc2/model/cbn.py
https://github.com/CyberZHG/torch-layer-normalization/blob/master/torch_layer_normalization/layer_normalization.py
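
For anyone attempting it, the paper's conditional layer normalization boils down to predicting the layer-norm scale and bias from the speaker embedding with two small linear layers. A minimal PyTorch sketch of that idea (class and argument names are illustrative, not the repo's implementation):

import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    """Layer norm whose scale and bias are predicted from a speaker embedding."""

    def __init__(self, hidden_dim: int, speaker_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.scale_proj = nn.Linear(speaker_dim, hidden_dim)
        self.bias_proj = nn.Linear(speaker_dim, hidden_dim)

    def forward(self, x: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, T, hidden_dim) hidden states, spk_emb: (B, speaker_dim)
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        x_norm = (x - mean) / torch.sqrt(var + self.eps)
        scale = self.scale_proj(spk_emb).unsqueeze(1)  # (B, 1, hidden_dim)
        bias = self.bias_proj(spk_emb).unsqueeze(1)    # (B, 1, hidden_dim)
        return scale * x_norm + bias

As far as I understand the paper, the point is that during fine-tuning on a new speaker only these small scale/bias projections (plus the speaker embedding) need to be adapted, which keeps the per-speaker memory footprint small.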
