
Comments (3)

th3geek commented on July 17, 2024

So I rented a cloud instance with two V100s (32 GB each) and wanted to share memory usage numbers for anyone wondering: with the default settings (separating 4 sources), I'm seeing about 16,086 MiB of GPU memory utilization. I'm closing out this issue, but it would be helpful if other people could share their memory utilization and the parameters they are using.
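[Editor's note: for anyone who wants to report comparable numbers, GPU memory can be polled while training runs using nvidia-smi's documented query options; a sketch:]

```shell
# Print per-GPU used/total memory every 5 seconds while training runs.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 5
```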

edit: To others running into out-of-memory problems: lowering batch_size seems to be the first thing to try. Training will take longer, but I don't think model quality will suffer the way it does when lowering swave.R or the other parameters mentioned in #24.
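[Editor's note: svoice drives train.py with Hydra, so config values can be overridden on the command line rather than by editing the config file. A sketch, assuming the key name batch_size from the repo's default conf/config.yaml (verify the exact key in your checkout):]

```shell
# Hypothetical example: halve the batch size to cut peak GPU memory.
python train.py batch_size=2
```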

from svoice.

th3geek commented on July 17, 2024

Every time I decrease a variable like the batch size or L, PyTorch seems to reserve even more memory. I finally got the example working by lowering R to 2. It was said in #14 that this would be "at the expense of model performance". Does that mean the model won't separate as well, or just that it will take longer to train? I'm fine with longer training, but I'd hate for the model to not work as well.

edit: after training with R at 2, I did a separation and the quality was very poor. Furthermore, I really need to train on the LibriMix set at 16k, which would presumably use even more memory. Maybe I should rent a cloud instance with a V100.


qalabeabbas49 commented on July 17, 2024

@th3geek Hi,
Can you please share what changes you made to train the model with 16k data?
I am trying to train with 16k, but after a few epochs I run into the following error.

Traceback (most recent call last):
  File "train.py", line 118, in main
    _main(args)
  File "train.py", line 112, in _main
    run(args)
  File "train.py", line 93, in run
    solver.train()
  File "/home/jay/svoice_8k/svoice/solver.py", line 122, in train
    train_loss = self._run_one_epoch(epoch)
  File "/home/jay/svoice_8k/svoice/solver.py", line 217, in _run_one_epoch
    loss.backward()
  File "/root/anaconda3/envs/svoice/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/envs/svoice/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function 'DivBackward0' returned nan values in its 1th output.

Any help is appreciated.
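[Editor's note: a NaN from DivBackward0 usually means the loss itself became non-finite (exploding gradients, or a division by a near-zero term inside the loss). torch.autograd.set_detect_anomaly(True) can locate the offending op, and torch.nn.utils.clip_grad_norm_ often prevents the blow-up. A minimal, framework-free sketch of the common "skip non-finite losses" guard (the function name is hypothetical, not from svoice):]

```python
import math

def filter_finite_losses(losses):
    """Split a list of per-batch loss values into finite and non-finite ones.

    Illustrative only: in a real training loop you would call
    loss.backward() and optimizer.step() for finite losses, and skip
    the batch (or lower the learning rate) when the loss is NaN/inf,
    avoiding the crash seen in the traceback above.
    """
    kept, skipped = [], 0
    for loss in losses:
        if math.isfinite(loss):
            kept.append(loss)
        else:
            skipped += 1  # NaN or inf loss: backprop on it would poison the weights
    return kept, skipped

kept, skipped = filter_finite_losses([0.8, float("nan"), 0.5, float("inf")])
```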

