
Comments (3)

asusdisciple commented on May 30, 2024

I was able to solve the issue. The problem lies in the train_loader (the DataLoader). Most of the "training time" comes from heavy data loading in a multi-GPU setting. By default, the DataLoader's worker processes are shut down after every epoch and re-created for the next one, so the dataset has to be loaded again each epoch. With persistent_workers=True, training time dropped to 13 s/epoch on 4 GPUs. It's still not optimal, since actual training takes only a few seconds of that time, but it is a large improvement.
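As a rough sketch of the loader change, assuming a HiFi-GAN-style setup where a DistributedSampler may be passed in for multi-GPU runs (the helper name make_train_loader and its defaults are hypothetical, not the repository's code):

```python
from typing import Optional

import torch
from torch.utils.data import DataLoader, Dataset, DistributedSampler


def make_train_loader(
    dataset: Dataset,
    batch_size: int,
    num_workers: int = 4,
    sampler: Optional[DistributedSampler] = None,
) -> DataLoader:
    """Build a training DataLoader whose worker processes survive between epochs."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        # Shuffle here only when no sampler is given; a DistributedSampler shuffles itself.
        shuffle=sampler is None,
        sampler=sampler,
        num_workers=num_workers,
        # Keep worker processes alive across epochs instead of tearing them down
        # and re-creating them every epoch. DataLoader requires num_workers > 0
        # when this is True, so it is disabled for single-process loading.
        persistent_workers=num_workers > 0,
        # Pinned host memory lets .to(device, non_blocking=True) copy asynchronously.
        pin_memory=torch.cuda.is_available(),
        drop_last=True,
    )
```

If a DistributedSampler is used, calling sampler.set_epoch(epoch) at the start of each epoch keeps the shuffling different from epoch to epoch; that is independent of persistent_workers.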

Another thing I noticed: it helps to reduce batch_size to a fairly small value, because moving each batch to the device in

    for i, batch in pb:
        if rank == 0:
            start_b = time.time()
        x, y, _, y_mel = batch
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        y_mel = y_mel.to(device, non_blocking=True)
        y = y.unsqueeze(1)

took a long time. With that change, one epoch trained in 7 seconds.
Just wanted to let you guys know, in case you run into these problems.
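The transfer step from the loop above can be isolated to check where the time goes. A sketch, assuming the (x, y, _, y_mel) batch layout from that snippet; move_batch and timed_move_batch are hypothetical helpers. Two PyTorch behaviours matter when timing this step: non_blocking=True only makes the host-to-device copy asynchronous when the batch is in pinned memory, and CUDA calls return before the GPU work finishes, so a plain time.time() reading without a synchronize can be misleading:

```python
import time

import torch


def move_batch(batch, device: torch.device):
    """Move one (x, y, _, y_mel) batch to `device`, mirroring the training loop."""
    x, y, _, y_mel = batch
    # non_blocking=True only makes the host-to-GPU copy asynchronous when the
    # source tensors live in pinned memory (DataLoader(pin_memory=True)).
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    y_mel = y_mel.to(device, non_blocking=True)
    return x, y.unsqueeze(1), y_mel


def timed_move_batch(batch, device: torch.device):
    """Return the moved batch plus the wall-clock seconds the transfer took."""
    start = time.perf_counter()
    moved = move_batch(batch, device)
    # CUDA work is queued asynchronously, so wait for it before reading the clock;
    # otherwise the measurement mostly reflects when the copy was launched.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return moved, time.perf_counter() - start
```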

from knn-vc.

HninLwin-byte commented on May 30, 2024

I was fine-tuning the model on my own dataset and got this error. If anyone has encountered the same error, please share how to solve it.
checkpoints directory : /content/drive/MyDrive/data/knn-vc/pertained_model
/usr/local/lib/python3.10/dist-packages/torchaudio/transforms/_transforms.py:580: UserWarning: Argument 'onesided' has been deprecated and has no influence on the behavior of this module.
  warnings.warn(
Epoch: 1: 0% 0/100 [00:00<?, ?it/s]
0% 0/31 [00:00<?, ?it/s]
Before padding - Wav shape: torch.Size([1, 7040])
After padding - Wav shape: torch.Size([1, 7744])
Before padding - Wav shape: torch.Size([1, 7040])
After padding - Wav shape: torch.Size([1, 7744])
Before padding - Wav shape: torch.Size([1, 0])
Before padding - Wav shape: torch.Size([1, 7040])
After padding - Wav shape: torch.Size([1, 7744])
0% 0/31 [00:01<?, ?it/s]
Epoch: 1: 0% 0/100 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1PdgEkagsNCmi1R4J43LiU8ygwHM4ooqv/data/knn-vc/hifigan/train.py", line 342, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1PdgEkagsNCmi1R4J43LiU8ygwHM4ooqv/data/knn-vc/hifigan/train.py", line 338, in main
    train(0, a, h)
  File "/content/drive/.shortcut-targets-by-id/1PdgEkagsNCmi1R4J43LiU8ygwHM4ooqv/data/knn-vc/hifigan/train.py", line 145, in train
    for i, batch in pb:
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/.shortcut-targets-by-id/1PdgEkagsNCmi1R4J43LiU8ygwHM4ooqv/data/knn-vc/hifigan/meldataset.py", line 203, in __getitem__
    mel_loss = self.alt_melspec(audio)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/drive/.shortcut-targets-by-id/1PdgEkagsNCmi1R4J43LiU8ygwHM4ooqv/data/knn-vc/hifigan/meldataset.py", line 77, in forward
    wav = F.pad(wav, ((self.n_fft - self.hop_size) // 2, (self.n_fft - self.hop_size) // 2), "reflect")
RuntimeError: Expected 2D or 3D (batch mode) tensor with possibly 0 batch size and other non-zero dimensions for input, but got: [1, 0]
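The failing call can be reproduced on its own. A minimal sketch: the 352 samples of padding per side are inferred from the 7040 -> 7744 shapes printed above, and the N_FFT / HOP_SIZE values are assumptions that give that padding, not values read from the repository's config; reflect_pad is a hypothetical helper. Reflect padding needs a non-empty last dimension (longer than the padding), so an empty clip fails the same way as in the traceback:

```python
import torch
import torch.nn.functional as F

# Assumed values: they give 352 samples of reflect padding per side, which
# reproduces the 7040 -> 7744 shapes in the log (not read from the repo config).
N_FFT, HOP_SIZE = 1024, 320
PAD = (N_FFT - HOP_SIZE) // 2


def reflect_pad(wav: torch.Tensor) -> torch.Tensor:
    """Apply the same kind of reflect padding as the failing line in meldataset.py."""
    return F.pad(wav, (PAD, PAD), "reflect")


ok = reflect_pad(torch.randn(1, 7040))
print(ok.shape)  # torch.Size([1, 7744])

try:
    reflect_pad(torch.zeros(1, 0))  # an empty clip, like the [1, 0] in the log
except RuntimeError as err:
    print("empty clip fails:", type(err).__name__)
```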


RF5 commented on May 30, 2024

Hi @asusdisciple, thanks for the suggestions. We have now added the persistent workers trick to the training code.

And @HninLwin-byte, thanks for the report. It looks like one of your audio files is either corrupt or shorter than the minimum allowable length (about 160 ms). I would double-check that every file in your dataset is readable and at least 160 ms long.
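A small scan can help find such files before training. It is a sketch that uses only Python's standard wave module, so it assumes uncompressed PCM .wav files (other formats, or WAV encodings that wave cannot parse, are reported as unreadable rather than checked); find_bad_wavs is a hypothetical helper, and the 160 ms threshold is the approximate figure given above:

```python
import wave
from pathlib import Path
from typing import List, Tuple

MIN_SECONDS = 0.160  # approximate minimum clip length mentioned above


def find_bad_wavs(root: str, min_seconds: float = MIN_SECONDS) -> List[Tuple[str, str]]:
    """Return (path, reason) for every .wav under `root` that is unreadable or too short."""
    bad = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as f:
                frames, rate = f.getnframes(), f.getframerate()
        except (wave.Error, EOFError, OSError) as err:
            # Not a parseable PCM WAV file (corrupt, truncated, or another encoding).
            bad.append((str(path), f"unreadable: {err}"))
            continue
        duration = frames / rate if rate else 0.0
        if duration < min_seconds:
            bad.append((str(path), f"too short: {duration * 1000:.0f} ms"))
    return bad
```

For example, find_bad_wavs("/path/to/dataset") returns an empty list when every file passes; any entries it lists are candidates to remove or re-export.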

