Comments (3)
I rented a cloud instance with 2 V100s (32GB) and wanted to share some numbers for other people who may be wondering about memory usage: with the default settings (separating 4 sources) I'm seeing about 16,086 MiB of memory utilization. I'm closing this issue, but it would be helpful if other people could share their memory utilization and the parameters they are using.
edit: To others who are running into out-of-memory problems: lowering batch_size seems to be the first thing you should try. It will make training take longer, but I don't think the model quality will suffer the way it does when lowering swave.R or the other parameters mentioned in #24.
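For anyone who wants to report numbers in the same form, here is a minimal sketch that reads per-GPU memory from nvidia-smi while training is running. It assumes nvidia-smi is on the PATH; the helper names parse_gpu_memory and query_gpu_memory are made up for this example and are not part of svoice.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV rows produced by QUERY into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        if len(parts) < 4:
            continue  # skip blank or malformed rows
        rows.append(
            {
                "index": int(parts[0]),
                # Re-join the middle fields in case the name contains a comma.
                "name": ",".join(parts[1:-2]),
                "used_mib": int(parts[-2]),
                "total_mib": int(parts[-1]),
            }
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi (assumed to be installed) and return parsed rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling query_gpu_memory() from a second terminal during training and posting the result together with batch_size and the swave settings you used would make reports easier to compare.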
from svoice.
Every time I decrease a variable like the batch size or L, PyTorch reserves even more memory. I finally got the example working by lowering R to 2. It was said in #14 that this would be "at the expense of model performance". Does that mean the model won't work as well, or will it just take longer to train? I'm fine with training taking longer, but I'd hate for the model to not work as well.
edit: After training with R at 2, I ran a separation and the quality was very poor. Furthermore, I really need to train on the LibriMix set at 16k, which would presumably use even more memory. Maybe I should rent a cloud instance with a V100.
from svoice.
@th3geek Hi,
Could you please share what changes you made to train the model with 16k data?
I am trying to train with 16k, but after a few epochs I run into the following error.
Traceback (most recent call last):
  File "train.py", line 118, in main
    _main(args)
  File "train.py", line 112, in _main
    run(args)
  File "train.py", line 93, in run
    solver.train()
  File "/home/jay/svoice_8k/svoice/solver.py", line 122, in train
    train_loss = self._run_one_epoch(epoch)
  File "/home/jay/svoice_8k/svoice/solver.py", line 217, in _run_one_epoch
    loss.backward()
  File "/root/anaconda3/envs/svoice/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/envs/svoice/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function 'DivBackward0' returned nan values in its 1th output.
Any help is appreciated.
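The traceback does not show which division produced the NaN, so the following is only a guess at one common cause: a scale-invariant SNR style loss dividing by the energy of a silent (all-zero) reference segment. The sketch below is plain Python, not svoice's loss code, and the function name si_snr and the eps value are assumptions chosen for illustration. It shows how a small epsilon in each denominator keeps the result finite in that case.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for two equal-length sample lists.

    eps is added to every denominator so that a silent target or a
    perfect estimate does not lead to a division by zero.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")

    # Remove the mean from both signals.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    est = [e - mean_e for e in estimate]
    tgt = [t - mean_t for t in target]

    # Project the estimate onto the target.
    dot = sum(e * t for e, t in zip(est, tgt))
    target_energy = sum(t * t for t in tgt)
    scale = dot / (target_energy + eps)
    projection = [scale * t for t in tgt]

    # Everything not explained by the projection counts as noise.
    noise = [e - p for e, p in zip(est, projection)]
    signal_power = sum(p * p for p in projection)
    noise_power = sum(n * n for n in noise)

    return 10.0 * math.log10((signal_power + eps) / (noise_power + eps))
```

If the dataset contains silent or near-silent clips, checking for them before training, or confirming that the loss uses an epsilon like this, may be worth a try. Whether that is the actual cause here is not something the traceback confirms.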
from svoice.