Comments (1)
The dataset was prepared by someone else and has become a benchmark dataset in TTS research: https://keithito.com/LJ-Speech-Dataset/. I honestly don't know the logic behind this split, but it seems to be a standard already, and everyone tests their models on this dataset.
As for when max_len is reached, it will just clip the samples in a batch to max_len. For example, if the lengths of the samples in your batch are [400, 300, 200, 100], then your batch shape will be [4, 100] (100 being the minimum sample length in the batch), and any sample longer than 100 will be randomly clipped to 100. However, if you set max_len = 80, then you will get [4, 80] instead.
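To make the clipping rule concrete, here is a minimal sketch in plain Python. It is not the actual StyleTTS2 training code: the function name clip_batch and the use of lists as stand-ins for samples are assumptions made purely for illustration. It crops every sample to min(shortest sample in the batch, max_len), choosing a random start position for each crop.

```python
import random


def clip_batch(samples, max_len, rng=random):
    """Crop every sample to a common length: min(shortest sample, max_len).

    Hypothetical helper for illustration only; not part of StyleTTS2.
    """
    target = min(min(len(s) for s in samples), max_len)
    clipped = []
    for s in samples:
        # Pick a random start so the crop stays inside the sample.
        # random.randint is inclusive on both ends.
        start = rng.randint(0, len(s) - target)
        clipped.append(s[start:start + target])
    return clipped


# Dummy batch whose sample lengths are [400, 300, 200, 100].
batch = [list(range(n)) for n in (400, 300, 200, 100)]

print([len(s) for s in clip_batch(batch, max_len=1200)])  # [100, 100, 100, 100]
print([len(s) for s in clip_batch(batch, max_len=80)])    # [80, 80, 80, 80]
```

With max_len = 1200 the shortest sample (100) sets the batch length, while with max_len = 80 the cap applies to every sample.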
Yes, you can set max_len = 1200 if you have enough RAM. The heuristic is that the larger the value, the better, since the model can learn from more context. This is the same idea as the context window in LLM training, where people have found that a longer context window helps with learning.