
adaspeech's Introduction

Hi there, I'm Rishikesh, Speech and Computer Vision Researcher👋

Hi friends, I'm Rishikesh, Co-founder and CTO of Dubpro.ai (formerly known as DeepSync Technologies). I graduated from NIT Silchar and, immediately after graduation, joined my first organisation, Nucleus Software, as a Full Stack Developer. I have a keen interest in machine learning and deep learning research, especially in the fields of speech synthesis and computer vision.

  • 🔭 I’m currently working on Speech Synthesis and End to End Text to Speech (TTS) engines.
  • 🌱 I love to code and contribute to Open Source.
  • 💬 Ask me anything regarding my work, code and research here (Please tag me @rishikksh20 in your comment.).
  • 📫 How to reach me: [email protected]

Connect with me:

ai_rishikesh | Twitter


Languages and Tools:

Python · PyTorch · GitHub · Visual Studio Code · AWS · Azure

adaspeech's People

Contributors

rishikksh20


adaspeech's Issues

How to implement phoneme level encoder

Hey! I'm working on this as well, but I'm stuck on the phoneme-level encoder: I don't know how to compute the phoneme-level mel properly (without a for loop...). Do you have any idea how to do it? Maybe a single PyTorch/TensorFlow function could do it? Thanks a lot.
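
One way to do this without a Python loop is to average the frame-level mel over each phoneme's duration with an index-based reduction. A minimal PyTorch sketch (the function and variable names are mine, not the repo's, and it assumes the durations sum to the number of mel frames):

import torch

def phoneme_level_mel(mel: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    """Average frame-level mel frames over each phoneme's duration.

    mel:       (T, n_mels) frame-level mel-spectrogram
    durations: (L,) integer frame counts per phoneme, with sum(durations) == T
    returns:   (L, n_mels) phoneme-level mel
    """
    durations = durations.to(device=mel.device, dtype=torch.long)
    num_phonemes = durations.numel()
    # frame i belongs to phoneme frame_to_phoneme[i]
    frame_to_phoneme = torch.repeat_interleave(
        torch.arange(num_phonemes, device=mel.device), durations
    )
    # sum the frames of each phoneme, then divide by its duration to get the mean
    sums = torch.zeros(num_phonemes, mel.size(1), device=mel.device, dtype=mel.dtype)
    sums.index_add_(0, frame_to_phoneme, mel)
    return sums / durations.clamp(min=1).unsqueeze(1).to(mel.dtype)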

The demo problem in hp

I have a problem with hparams.
Following an earlier answer, I changed "import hparams as hp" to
"from utils.hparams import HParam
hp = HParam("configs/default.yaml")"
I then found idim = hp.symbol_len and odim = hp.num_mels, but symbol_len does not exist and num_mels lives in the audio dictionary.
How can I fix this? I only know that num_mels should become hp.audio.num_mels.
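
For what it's worth, the nested yaml sections are reached through attribute access on the HParam object, and the input dimension has to come from the symbol table rather than from a (non-existent) symbol_len key. A sketch under those assumptions; the symbols import path below is hypothetical and should be pointed at wherever the repo actually defines its symbol list:

from utils.hparams import HParam

hp = HParam("configs/default.yaml")

# num_mels lives under the "audio" section of the yaml, so go through hp.audio
odim = hp.audio.num_mels

# there is no symbol_len key; derive the input dimension from the symbol table instead
from dataset.texts import symbols  # hypothetical path: replace with the repo's symbol module
idim = len(symbols)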

RuntimeError: Trying to create tensor with negative dimension -6: [-6, 256]

Hello @rishikksh20, thanks for sharing your work. I have some issues when training the model.

    hs = self.length_regulator(hs, ds, ilens)  # (B, Lmax, adim)
  File "/data/tuong/Yen/AdaSpeech/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/tuong/Yen/AdaSpeech/core/duration_modeling/length_regulator.py", line 63, in forward
    xs = [self._repeat_one_sequence(x, d) for x, d in zip(xs, ds)]
  File "/data/tuong/Yen/AdaSpeech/core/duration_modeling/length_regulator.py", line 63, in <listcomp>
    xs = [self._repeat_one_sequence(x, d) for x, d in zip(xs, ds)]
  File "/data/tuong/Yen/AdaSpeech/core/duration_modeling/length_regulator.py", line 93, in _repeat_one_sequence
    out.append(x_.repeat(int(d_), 1))
RuntimeError: Trying to create tensor with negative dimension -6: [-6, 256]

and the gradient is not updated: WARNING - grad norm is nan. Do not update model.
Can you help me solve this issue? Thank you.
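
The traceback shows a duration of -6 reaching x_.repeat(int(d_), 1), i.e. the length regulator is being fed a negative duration (usually a sign of a bad alignment or mismatched duration file, which would also explain the NaN grad norm). A defensive sketch of the expansion step, assuming non-positive durations can simply be dropped; the function name and handling are illustrative, not the repo's actual fix:

import torch

def repeat_one_sequence(x: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Expand encoder frames by their durations, ignoring non-positive entries.

    x: (L, adim) encoder outputs for one utterance
    d: (L,)      per-phoneme durations; negative values make repeat() fail
    """
    d = d.to(x.device).long().clamp(min=0)  # never repeat a frame a negative number of times
    return torch.repeat_interleave(x, d, dim=0)

Clamping only hides the symptom, though; if durations go negative, it is worth re-checking the alignment/duration preprocessing first.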

yaml

In the yaml there is data_dir: 'H:\Deepsync\backup\fastspeech\data'. How do I find this data, or what should data_dir point to?

maybe there is a bug in fastspeech.py

hello rishikksh20, thanks for your contribution!
I found a problem when training with this code, at line 415 of fastspeech.py:

if avg_mel is not None:
    avg_mel = avg_mel.unsqueeze(0)
    # inference
    before_outs, outs, d_outs, _ = self._forward(xs, ilens=ilens, ys=ref_mel, avg_mel=avg_mel,
                                                 is_inference=True,
                                                 phn_level_predictor=phn_level_predictor)  # (L, odim)
else:
    before_outs, outs, d_outs, _ = self._forward(xs, ilens=ilens, ys=ref_mel, is_inference=True,
                                                 phn_level_predictor=phn_level_predictor)  # (L, odim)

# inference
_, outs, _, _, _ = self._forward(xs, ilens, is_inference=True)  # (L, odim)

return outs[0]

I think the last inference call doesn't need to run _forward again, like below:

if avg_mel is not None:
    avg_mel = avg_mel.unsqueeze(0)
    # inference
    before_outs, outs, d_outs, _ = self._forward(xs, ilens=ilens, ys=ref_mel, avg_mel=avg_mel,
                                                 is_inference=True,
                                                 phn_level_predictor=phn_level_predictor)  # (L, odim)
else:
    before_outs, outs, d_outs, _ = self._forward(xs, ilens=ilens, ys=ref_mel, is_inference=True,
                                                 phn_level_predictor=phn_level_predictor)  # (L, odim)

# inference
#_, outs, _, _, _ = self._forward(xs, ilens, is_inference=True)  # (L, odim)

return outs[0]

pretrained model weights

Hello and thank you for sharing your work!
Could you please provide the pretrained model weights?
Thank you in advance!

TypeError: expected str, bytes or os.PathLike object, not NoneType

Traceback (most recent call last):
  File "c:/Users/최현석/OneDrive/바탕 화면/AdaSpeech-master/AdaSpeech-master/nvidia_preprocessing.py", line 96, in <module>
    hp = HParam(args.config)
  File "c:\Users\최현석\OneDrive\바탕 화면\AdaSpeech-master\AdaSpeech-master\utils\hparams.py", line 58, in __init__
    hp_dict = load_hparam(file)
  File "c:\Users\최현석\OneDrive\바탕 화면\AdaSpeech-master\AdaSpeech-master\utils\hparams.py", line 16, in load_hparam
    stream = open(filename)
TypeError: expected str, bytes or os.PathLike object, not NoneType

My code is here.

  1. line 96 (hp = HParam(args.config)):

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d", "--data_path", type=str, help="root directory of wav files"
    )
    parser.add_argument(
        "-c", "--config", type=str, help="yaml file for configuration"
    )
    args = parser.parse_args()

    hp = HParam(args.config)

    main(args, hp)
  2. line 16 (stream = open(filename, "r")):

def load_hparam(filename):
    stream = open(filename, "r")
    docs = yaml.load_all(stream, Loader=yaml.Loader)
    hparam_dict = dict()
    for doc in docs:
        for k, v in doc.items():
            hparam_dict[k] = v
    return hparam_dict

  3. line 58 (hp_dict = load_hparam(file)):

class HParam(Dotdict):
    def __init__(self, file):
        super(Dotdict, self).__init__()
        hp_dict = load_hparam(file)
        hp_dotdict = Dotdict(hp_dict)
        for k, v in hp_dotdict.items():
            setattr(self, k, v)

    __getattr__ = Dotdict.__getitem__
    __setattr__ = Dotdict.__setitem__
    __delattr__ = Dotdict.__delitem__

How can I solve this problem?
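
For reference, this error means args.config was None by the time it reached open(filename), i.e. the script was launched without the -c/--config argument, so argparse filled in None. A minimal guard (a sketch; required=True is my addition, not the repo's code):

import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d", "--data_path", type=str, help="root directory of wav files"
    )
    # required=True makes argparse exit with a clear message instead of
    # silently handing None down to open() inside load_hparam()
    parser.add_argument(
        "-c", "--config", type=str, required=True, help="yaml file for configuration"
    )
    args = parser.parse_args()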

Conditional Layer Normalization

Hi, I have followed your work for several months and am really impressed by how quickly you track new algorithms.
For AdaSpeech, have you verified that the two acoustic encoders really help the training of custom speakers? How does this compare to a speaker embedding generated by a speaker encoder trained on a speaker verification task?
And you have not implemented "Conditional Layer Normalization" yet, right? Are the following references suitable if I implement it myself, or can you give any suggestions on how to do this?
https://github.com/exe1023/CBLN/blob/e395edc2d6d952497b411f81eae4aafb96749bc2/model/cbn.py
https://github.com/CyberZHG/torch-layer-normalization/blob/master/torch_layer_normalization/layer_normalization.py
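
For anyone attempting it, the paper's conditional layer normalization boils down to predicting the layer-norm scale and bias from the speaker embedding with two small linear layers. A minimal PyTorch sketch of that idea (class and argument names are illustrative, not the repo's implementation):

import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    """Layer norm whose scale and bias are predicted from a speaker embedding."""

    def __init__(self, hidden_dim: int, speaker_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.scale_proj = nn.Linear(speaker_dim, hidden_dim)
        self.bias_proj = nn.Linear(speaker_dim, hidden_dim)

    def forward(self, x: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, T, hidden_dim) hidden states, spk_emb: (B, speaker_dim)
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        x_norm = (x - mean) / torch.sqrt(var + self.eps)
        scale = self.scale_proj(spk_emb).unsqueeze(1)  # (B, 1, hidden_dim)
        bias = self.bias_proj(spk_emb).unsqueeze(1)    # (B, 1, hidden_dim)
        return scale * x_norm + bias

As far as I understand the paper, the point is that during fine-tuning on a new speaker only these small scale/bias projections (plus the speaker embedding) need to be adapted, which keeps the per-speaker memory footprint small.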
