
tfgan's Introduction

TFGAN

This repo is an unofficial PyTorch implementation of TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis.

Requirements

Tested on Python 3.6

pip install -r requirements.txt

Prepare Dataset

  • Download a dataset for training. This can be any WAV files with a sample rate of 22050 Hz (e.g. LJSpeech, which was used in the paper); if your audio uses a different rate, see the resampling sketch after this list.
  • Preprocess: python preprocess.py -c config/default.yaml -d [data's root path]
  • Edit the configuration yaml file
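
If the source audio is not already at 22050 Hz, here is a minimal resampling sketch. Using librosa and soundfile is my assumption (any resampler works), and raw_wavs/ and data/ are hypothetical directories:

import glob
import os

import librosa
import soundfile as sf

TARGET_SR = 22050  # sample rate expected by the default config

os.makedirs('data', exist_ok=True)
for path in glob.glob('raw_wavs/*.wav'):
    audio, _ = librosa.load(path, sr=TARGET_SR)  # load and resample in one step
    sf.write(os.path.join('data', os.path.basename(path)), audio, TARGET_SR)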

Train & Tensorboard

  • python trainer.py -c [config yaml file] -n [name of the run]

    • cp config/default.yaml config/config.yaml and then edit config.yaml
    • Write the root paths of the train/validation files on the 2nd/3rd lines.
  • tensorboard --logdir logs/

Inference

  • python inference.py -p [checkpoint path] -i [input mel path]

Checkpoint:

  • LJSpeech checkpoint here.

tfgan's People

Contributors

miralan, rishikksh20


tfgan's Issues

[Question] Upsample in generator net

Hi, thanks for your code. I found that in the generator net, the implementation applies reflection padding before calling conv1d in the upsampling path, which is not mentioned in the paper. The paper only says:

Then in addition to using transpose convolution, we also repeat the output of the first step by the up-sample factor directly and following a convolutional layer.

My question is: why is this reflection padding necessary? Thanks in advance.
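
For context, a minimal sketch of the repeat-then-conv branch under discussion, assuming the reflection padding exists to keep the conv output length equal to its input (the kernel size of 7 is an assumption, not taken from the repo):

import torch
import torch.nn as nn

class RepeatUpsample(nn.Module):
    # Repeat each time step by the upsample factor, then smooth with a
    # length-preserving convolution.
    def __init__(self, channels, factor, kernel_size=7):
        super().__init__()
        self.factor = factor
        # Reflection padding of kernel_size // 2 on both sides makes the
        # conv1d output the same length as its input, so this branch can
        # be summed with a transposed-convolution branch of equal length.
        self.pad = nn.ReflectionPad1d(kernel_size // 2)
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):  # x: [B, C, T]
        x = torch.repeat_interleave(x, self.factor, dim=2)  # [B, C, T*factor]
        return self.conv(self.pad(x))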

What should the time loss look like, and why lr=0.00002?

  1. Why did you set lr=0.00002? Have you tried other settings? Is lr=0.00002 the best?

  2. In the paper, the time loss is an important point, but in my training the time loss does not seem to decline.
    I use my own dataset with a sample rate of 16 kHz; the mels are extracted using the scripts from Tacotron 2. Training is still in progress, and the generated wavs become cleaner with more steps. I started training the discriminator at 100k steps.

(attached: g_loss training curve)

@rishikksh20 what is the time loss in your training?

time domain loss and CUDA

If I run:

import torch
from models.timeloss import TimeDomainLoss_v2  # module path per the traceback below

loss2 = TimeDomainLoss_v2()  # note: the loss module itself stays on the CPU
a = torch.randn(32, 1, 16384).to(device='cuda')
b = torch.randn(32, 1, 16384).to(device='cuda')
final2 = loss2(a, b)

I get this error:


Traceback (most recent call last):
  File "c:\git\melganMono\melgan\models\timeloss.py", line 111, in <module>
    final2 = loss2(a, b)
  File "C:\Users\listener17\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\git\melganMono\melgan\models\timeloss.py", line 90, in forward
    y_energy = F.conv1d(y**2, getattr(self, f'filters_{i}'), stride=self.strides[i])
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
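
Judging from the traceback, the conv filters are stored as plain tensor attributes, so they never leave the CPU even when the inputs are on CUDA. A minimal sketch of the usual remedy: register the filters as buffers so that .to('cuda') moves them along with the module (the window lengths and strides here are illustrative, not copied from the repo):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeDomainLossFixed(nn.Module):
    # Illustrative multi-scale energy loss; register_buffer (rather than a
    # plain attribute) makes each filter follow the module across .to(...).
    def __init__(self, win_lengths=(1, 240, 480, 960), strides=(1, 120, 240, 480)):
        super().__init__()
        self.strides = strides
        for i, w in enumerate(win_lengths):
            self.register_buffer(f'filters_{i}', torch.ones(1, 1, w) / w)

    def forward(self, y, y_hat):
        loss = 0.0
        for i in range(len(self.strides)):
            filt = getattr(self, f'filters_{i}')
            y_energy = F.conv1d(y ** 2, filt, stride=self.strides[i])
            y_hat_energy = F.conv1d(y_hat ** 2, filt, stride=self.strides[i])
            loss = loss + F.l1_loss(y_energy, y_hat_energy)
        return loss

loss2 = TimeDomainLossFixed().to('cuda')  # the filters now move to CUDA too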

Why does this network ask for a spectrogram at the final stage?

Hello, could you please tell me why the network asks for a spectrogram when producing its output?
I mean this command:
python inference.py -p [checkpoint path] -i [input mel path]
Usually, GAN networks generate from random noise by themselves, so why does this network need a mel to produce output?
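
For background (my reading, not an answer from the maintainers): TFGAN, like MelGAN and HiFi-GAN, is a neural vocoder, so the generator is conditioned on a mel spectrogram and upsamples it to a waveform rather than sampling from a noise vector. A runnable toy illustration of that input/output relationship (the hop size of 256 and n_mels of 80 are conventional assumptions, not values from the config):

import torch
import torch.nn as nn

hop_size, n_mels = 256, 80  # assumed values
toy_generator = nn.ConvTranspose1d(n_mels, 1, kernel_size=hop_size * 2,
                                   stride=hop_size, padding=hop_size // 2)
mel = torch.randn(1, n_mels, 100)  # [B, n_mels, T] conditioning input, not noise
audio = toy_generator(mel)         # [B, 1, T * hop_size] waveform
print(audio.shape)                 # torch.Size([1, 1, 25600])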

About the frequency discriminator

Hi, the time loss calculated with convolution layers is very impressive.
I have a question about the frequency discriminator.

As we know, the output of a discriminator is supposed to be a real/fake label for the input wave. Your frequency discriminator seems to produce an output of shape [B, channels, T]. Does that follow the rules for a discriminator, or is there something more I can learn from it?
Thank you!
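
For background (a common pattern, not confirmed by the maintainers): many audio GAN discriminators output a score per time step, PatchGAN-style, and the adversarial loss simply averages over the whole score map instead of over one scalar. A minimal LSGAN-style sketch under that assumption:

import torch
import torch.nn.functional as F

def lsgan_d_loss(d_real, d_fake):
    # d_real / d_fake: per-timestep score maps of shape [B, C, T].
    # mse_loss averages over every position, so the map acts as many
    # small real/fake decisions rather than one global label.
    real_loss = F.mse_loss(d_real, torch.ones_like(d_real))
    fake_loss = F.mse_loss(d_fake, torch.zeros_like(d_fake))
    return real_loss + fake_loss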

Time-domain loss

Hi,

I am just wondering whether the time-domain loss is working properly. I noticed that the samples from before the discriminator comes into play (>200K steps) are a bit muffled/noisy. After that, the GAN training scheme seems to help the audio quality and reduce this effect.

TFGAN_samples.zip

What is your experience? Is this muffled noise to be expected before the discriminator network comes in?

Thanks in advance, and thank you for sharing your work.

subprocess.CalledProcessError, help please

Hello.
I have a problem with the subprocess library. I tried both on my computer and on Google Colab, and I always get the same error: subprocess.CalledProcessError: Command '['git', 'rev-parse', '--short', 'HEAD']' returned non-zero exit status 128,
after running python /content/TFGAN/trainer.py -c /content/TFGAN/config/default.yaml -n name.
Any ideas how to fix it?

Full log:

Traceback (most recent call last):
  File "/content/TFGAN/trainer.py", line 52, in <module>
    train(args, pt_dir, args.checkpoint_path, trainloader, valloader, writer, logger, hp, hp_str)
  File "/content/TFGAN/utils/train.py", line 31, in train
    githash = get_commit_hash()
  File "/content/TFGAN/utils/utils.py", line 16, in get_commit_hash
    message = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"])
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['git', 'rev-parse', '--short', 'HEAD']' returned non-zero exit status 128.
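
Exit status 128 from git rev-parse usually means the script is not running inside a git working tree (for example the repo was downloaded as a zip, or the script is launched from a directory that cannot see .git). One option is to git clone the repo and launch trainer.py from its root; another is to make the helper tolerate the failure. A sketch of the latter, modeled on the get_commit_hash shown in the traceback:

import subprocess

def get_commit_hash():
    # Return the short git hash, or a placeholder when not inside a
    # git repository (zip download, Colab upload, ...).
    try:
        message = subprocess.check_output(
            ['git', 'rev-parse', '--short', 'HEAD'],
            stderr=subprocess.DEVNULL,
        )
        return message.decode().strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return 'nogit'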

Any samples from the checkpoint?

Hi,
I ran into two problems when testing your pretrained model "first_e0c8065_1380.pt":
1. If I run inference.py directly, an error occurs in the generator: torch.nn.modules.module.ModuleAttributeError: 'Upsample' object has no attribute 'remove_weight_norm'. I solved this by defining an extra function inside the generator.
2. After fixing problem 1, I checked the synthesized audio: clear artifacts can be heard and the quality is quite poor. I am not sure whether this is also your experience, but it is far from the description in the original paper. The audio is attached below.

LJ026-0135_reconstructed_epoch1380.wav.zip

Thanks.
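
A minimal sketch of the kind of workaround point 1 describes, assuming the generator's Upsample block wraps a weight-normed transposed convolution (the module layout here is illustrative, not copied from the repo); the generator's own remove_weight_norm() presumably calls this method on each submodule:

import torch.nn as nn
from torch.nn.utils import remove_weight_norm, weight_norm

class Upsample(nn.Module):
    def __init__(self, channels, factor):
        super().__init__()
        self.conv_t = weight_norm(
            nn.ConvTranspose1d(channels, channels, factor * 2,
                               stride=factor, padding=factor // 2))

    def forward(self, x):
        return self.conv_t(x)

    def remove_weight_norm(self):
        # Without this method, calling remove_weight_norm on the parent
        # generator raises the ModuleAttributeError quoted above.
        remove_weight_norm(self.conv_t)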

hifigan vs tfgan

Hi,
Thanks for your great work. Did you compare audio sample quality between HiFi-GAN v1 and TFGAN?
