Comments (20)

sunnnnnnnny avatar sunnnnnnnny commented on May 18, 2024 2

I seem to have solved this problem. It turned out that my original data had an illegal speaker folder. Thank you for helping me debug.

from real-time-voice-cloning.

 avatar commented on May 18, 2024 2

This error message appears when a subfolder in SV2TTS/encoder does not contain any .npy files. This happens because preprocessing creates a folder for each speaker, but in some cases there are no valid files to populate it.

We should improve the preprocess code so the speaker folder is only created if it contains audio files. This is the line in question that should be moved:

speaker_out_dir.mkdir(exist_ok=True)
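
A minimal sketch of that idea, assuming the preprocessing step can hand over the utterances it decided to keep. The helper name `save_speaker_utterances` and the `(name, payload)` pairs are hypothetical and not taken from the repository; `payload` stands in for the serialized array. The point is only that `mkdir` runs right before the first file is written:

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def save_speaker_utterances(
    speaker_out_dir: Path,
    utterances: Iterable[Tuple[str, bytes]],
) -> List[Path]:
    """Write each (name, payload) pair to <speaker_out_dir>/<name>.npy.

    Hypothetical helper: the folder is created lazily, so a speaker with
    no valid utterances never leaves an empty folder behind.
    """
    written: List[Path] = []
    for name, payload in utterances:
        if not written:
            # First valid utterance: only now create the speaker folder.
            speaker_out_dir.mkdir(parents=True, exist_ok=True)
        out_path = speaker_out_dir / f"{name}.npy"
        out_path.write_bytes(payload)
        written.append(out_path)
    return written
```

With an empty iterable the function returns an empty list and the folder is never created.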

CorentinJ avatar CorentinJ commented on May 18, 2024

Can you check the audio files in your datasets and check that they are indeed there?

sunnnnnnnny avatar sunnnnnnnny commented on May 18, 2024

I confirmed this several times and the processed Librispeech file is there.

CorentinJ avatar CorentinJ commented on May 18, 2024

Can you also check that you have the numpy files in the preprocessed directory, e.g. you should have:

<datasets_root>\SV2TTS\encoder\LibriSpeech_train-other-500_20\205_20-205-0000.npy
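
One way to run this check for every speaker at once is to count the .npy files per subfolder. This is a rough sketch that assumes the layout shown above (one subfolder per speaker directly under the encoder directory); the function name `count_npy_per_speaker` is made up for illustration:

```python
from pathlib import Path
from typing import Dict


def count_npy_per_speaker(encoder_dir: Path) -> Dict[str, int]:
    """Map each speaker subfolder name to the number of .npy files in it."""
    return {
        speaker_dir.name: sum(1 for _ in speaker_dir.glob("*.npy"))
        for speaker_dir in sorted(encoder_dir.iterdir())
        if speaker_dir.is_dir()
    }
```

Calling it on `<datasets_root>/SV2TTS/encoder` and looking for entries with a count of 0 points at the folders that need attention.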

fantasyHq avatar fantasyHq commented on May 18, 2024

@CorentinJ Hello! I have been reading your code and paper for a few days. One thing that confuses me is the meaning of "create RandomCycler": what does this class do in the dataloader? I would appreciate it if you could explain it. Sorry, I'm new to this.

Ravi-Singh88 avatar Ravi-Singh88 commented on May 18, 2024

I seem to have solved this problem. It turned out that my original data had an illegal speaker folder. Thank you for helping me debug.

@sunnnnnnnny How did you manage to find this illegal speaker folder?

237sankalp avatar 237sankalp commented on May 18, 2024

I seem to have solved this problem. It turned out that my original data had an illegal speaker folder. Thank you for helping me debug.

@sunnnnnnnny How did you manage to find this illegal speaker folder?

I have the same problem, and I also don't understand how you found the illegal speaker folder.

237sankalp avatar 237sankalp commented on May 18, 2024

@blue-fish Thanks. I have a question: just to experiment, I am giving the model a single speaker as input, but the system reports 2 speakers even though I only provide one, named "20". I think the extra speaker is what prevents me from training the encoder, since it is an empty set.

 avatar commented on May 18, 2024

@237sankalp Check and see if you have a hidden folder in train-other-500. You can also check the value of speaker_dirs when this line is executed.

print("%s: Preprocessing data for %d speakers." % (dataset_name, len(speaker_dirs)))
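
To spot a hidden or stray entry without editing the preprocessing code, a small check along these lines can help. It is only a sketch: `find_suspect_entries` is a hypothetical name, and it simply flags anything in the dataset folder that starts with a dot or is not a directory:

```python
from pathlib import Path
from typing import List


def find_suspect_entries(dataset_dir: Path) -> List[str]:
    """Return names of entries that are hidden or are not directories."""
    return sorted(
        entry.name
        for entry in dataset_dir.iterdir()
        if entry.name.startswith(".") or not entry.is_dir()
    )
```

Running it on the train-other-500 folder should return an empty list when every entry is a regular speaker folder.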

rishabhjain16 avatar rishabhjain16 commented on May 18, 2024

I was stuck at the same place with the same error, so I wrote this helper script to find the empty folders and record them in a .txt file. I treat a folder with fewer than 2 entries as empty, because even an empty folder still contains the source file.

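import glob
import os

# Source directory containing the preprocessed encoder files
ENCODER_DIR = r'D:\mstts\SV2TTS\encoder'
# Destination .txt file for the report
OUTPUT_FILE = r'D:\mstts\SV2TTS\folder.txt'

# Open the report once (overwriting any previous run) instead of per folder
with open(OUTPUT_FILE, 'w') as f:
    for folder in glob.glob(os.path.join(ENCODER_DIR, '*')):
        # Skip stray files; only speaker folders are of interest
        if not os.path.isdir(folder):
            continue
        # Fewer than 2 entries means only the source file is present
        if len(os.listdir(folder)) < 2:
            f.write("{0} is a directory with No files\n".format(folder))
        else:
            f.write("{0} is a directory with files\n".format(folder))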

Usage:

  1. Use this script to create a txt file with required information
  2. Look for folders where 'No files' is output
  3. Delete the folders from step 2 from your <datasets_root>/SV2TTS/encoder

That's it, you are good to go.

 avatar commented on May 18, 2024

This is an issue that affects everyone training an encoder model. Some ideas to fix it:

  1. Modify encoder preprocessing to delete empty folders in datasets_root/SV2TTS/encoder before exiting.
  2. Fix the random cycler or the encoder training code so it doesn't break when empty folders are present.
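
Both ideas can be sketched in a few lines. This is an illustration under assumptions, not the repository's code: the function names are hypothetical, and a speaker folder is treated as empty when it holds no .npy files. `prune_empty_speaker_dirs` corresponds to idea 1 and defaults to a dry run because deleting is destructive; `usable_speaker_dirs` corresponds to idea 2 and only filters what the training code would see:

```python
import shutil
from pathlib import Path
from typing import List


def prune_empty_speaker_dirs(encoder_dir: Path, dry_run: bool = True) -> List[Path]:
    """Find speaker folders without .npy files; delete them unless dry_run."""
    empty = [
        d for d in sorted(encoder_dir.iterdir())
        if d.is_dir() and not any(d.glob("*.npy"))
    ]
    if not dry_run:
        for d in empty:
            shutil.rmtree(d)
    return empty


def usable_speaker_dirs(encoder_dir: Path) -> List[Path]:
    """Return only the speaker folders that contain at least one .npy file."""
    return [
        d for d in sorted(encoder_dir.iterdir())
        if d.is_dir() and any(d.glob("*.npy"))
    ]
```

Calling `prune_empty_speaker_dirs(path)` first, checking the returned list, and only then repeating the call with `dry_run=False` keeps the deletion reviewable.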

rishabhjain16 avatar rishabhjain16 commented on May 18, 2024

Hi @blue-fish ,

I am training my encoder model at the moment. But I can take a look at it next week (depending on the encoder results if I don't need to retrain it). Will let you know if I do make any changes.

rishabhjain16 avatar rishabhjain16 commented on May 18, 2024

Hi @blue-fish,

Just to give you an update: my encoder is still training, so I couldn't make any changes so far.

I have another question though. I noticed in my encoder training that the loss and EER calculated and plotted on Visdom are based on the training data. Is there any way to also test the model on a validation dataset, or to provide a split in the dataloader for testing and training? I am curious whether I can test my trained speaker verification model on an unseen test dataset for loss and EER.

 avatar commented on May 18, 2024

@rishabhjain16 Thanks for the suggestion. I opened #689 for your idea.
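
As a rough illustration of the held-out evaluation asked about above, speakers (rather than individual utterances) could be split so that validation speakers are never seen during training. Everything here is an assumption for the sake of the sketch: the function name `split_speakers`, the default fraction, and the seed are not from the repository.

```python
import random
from pathlib import Path
from typing import List, Sequence, Tuple


def split_speakers(
    speaker_dirs: Sequence[Path],
    val_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Path], List[Path]]:
    """Split speaker folders into disjoint (train, validation) lists."""
    if len(speaker_dirs) < 2:
        raise ValueError("need at least two speakers to split")
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(speaker_dirs)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one speaker on each side of the split.
    n_val = min(len(shuffled) - 1, max(1, round(len(shuffled) * val_fraction)))
    return shuffled[n_val:], shuffled[:n_val]
```

Loss and EER computed on the validation speakers would then reflect generalization to unseen voices rather than fit to the training set.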

rishabhjain16 avatar rishabhjain16 commented on May 18, 2024

No worries. It seems like a great tool. So I am happy to contribute however I can.

rishabhjain16 avatar rishabhjain16 commented on May 18, 2024

Hey @blue-fish,
I have two things to ask you.

  1. The encoder, synthesizer and vocoder can be trained independently, right? Or do they depend on one another in the training code? I need to restart my encoder training with a few changed parameters, so I was planning to train my synthesizer in parallel. Can I start that while the encoder is training, or are the synthesizer preprocessing and training scripts mentioned here tied to the encoder as well?

  2. I have collected my own dataset and want to fine-tune the synthesizer for that single voice as mentioned here. I would need to generate the alignments for my dataset before I can preprocess it and train on it. I have looked into Montreal Forced Aligner (MFA) for this but can't get it working. My plan is to take the pretrained LibriSpeech model, fine-tune it on my own dataset (around 15 minutes) and see what I get. My audio files are in .wav format and my text files in .txt format. I have also looked at the .lab and .TextGrid formats as input for MFA as mentioned here, but can't find a workaround. Do you have any advice? Maybe I have the wrong format. Do I need any files other than .wav and .txt to run MFA?

Example error for Montreal Forced Aligner:

>mfa align mydataset/ lexicon.txt english ~/Documents/aligned

WARNING - WARNING: Some issues parsing the corpus were detected. Please run the validator to get more information.
Traceback (most recent call last):
  File "/opt/conda/envs/aligner/bin/mfa", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/command_line/mfa.py", line 339, in main
    run_align_corpus(args, unknown, acoustic_languages)
  File "/opt/conda/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/command_line/align.py", line 159, in run_align_corpus
    align_corpus(args, unknown_args)
  File "/opt/conda/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/command_line/align.py", line 85, in align_corpus
    logger.info(corpus.speaker_utterance_info())
  File "/opt/conda/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/corpus/base.py", line 286, in speaker_utterance_info
    average_utterances = sum(len(x) for x in self.speak_utt_mapping.values()) / num_speakers
ZeroDivisionError: division by zero

I have also tried preparing the dataset in the same format as LibriSpeech, creating a mydataset.trans.txt file containing all the transcription data and then running MFA, but that didn't work either.

For reference, this is my directory structure: mydataset->speaker_name(i.e. datarj)-> .wav_files, .txt_files (multiple .wav and .txt files for speaker_name datarj)

Any help is appreciated. I have been stuck here since last week. Thanks in advance.

 avatar commented on May 18, 2024

@rishabhjain16 Please stay on topic. In the future, open a new issue to ask unrelated questions.

  1. The synthesizer must be retrained if the encoder is changed.
  2. I don't use MFA. Set up your dataset in this format and use the --no_alignments option for preprocessing.

rishabhjain16 avatar rishabhjain16 commented on May 18, 2024

Hi @blue-fish ,

Sorry about that. I will keep that in mind for future discussions. And thank you for your help.

erfanlashkari avatar erfanlashkari commented on May 18, 2024

❤️
