Comments (3)
Thanks, it's training now with DDP after setting L: 16.
Hi @Durgesh92,
What do you mean after the 1st epoch? Do you mean that the model is being trained and suddenly you get an OOM error? What is the sampling rate of your data?
You can try setting L: 16; it should help. You can also decrease R or decrease segment; however, these will probably hurt the overall model performance.
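A rough way to see why a larger L helps with memory: the sketch below assumes the encoder is a 1-D convolution with kernel size L and stride L/2, and that training segments are 4 seconds long at 16 kHz. Both are assumptions made for illustration, not values confirmed in this thread, and L = 8 is used only as a comparison point. Under those assumptions, doubling L roughly halves the number of encoded frames the separator has to process.

```python
def encoder_frames(num_samples: int, L: int) -> int:
    """Number of frames produced by a conv encoder with kernel L and stride L // 2.

    Assumes no padding; this is an illustrative model, not svoice's exact code.
    """
    stride = L // 2
    return (num_samples - L) // stride + 1


SAMPLE_RATE = 16000      # 16 kHz data, as reported in this thread
SEGMENT_SECONDS = 4      # assumed segment length, for illustration only
num_samples = SAMPLE_RATE * SEGMENT_SECONDS

frames_l8 = encoder_frames(num_samples, 8)
frames_l16 = encoder_frames(num_samples, 16)

print(f"L=8  -> {frames_l8} frames")
print(f"L=16 -> {frames_l16} frames")
```

Activation memory in the separator grows with the number of frames, so fewer frames generally means a smaller footprint, at the cost of coarser time resolution.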
My data is 16 kHz. When using DDP I get OOM after these log lines:
[2020-12-24 23:29:11,386][svoice.solver][INFO] - Train | Epoch 1 | 49960/249804 | 1.5 it/sec | Loss 17.33249
[2020-12-25 08:34:59,062][svoice.solver][INFO] - Train | Epoch 1 | 99920/249804 | 1.5 it/sec | Loss 17.00609
[2020-12-25 17:41:22,327][svoice.solver][INFO] - Train | Epoch 1 | 149880/249804 | 1.5 it/sec | Loss 17.05395
[2020-12-26 02:47:45,908][svoice.solver][INFO] - Train | Epoch 1 | 199840/249804 | 1.5 it/sec | Loss 17.07950
And when DDP is not enabled I get:
[W python_anomaly_mode.cpp:60] Warning: Error detected in Log10Backward. Traceback of forward call that caused the error:
[2020-12-26 04:09:55,132][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 120, in main
    _main(args)
  File "train.py", line 114, in _main
    run(args)
  File "train.py", line 95, in run
    solver.train()
  File "/home/sysadmin/svoice/svoice/solver.py", line 122, in train
    train_loss = self._run_one_epoch(epoch)
  File "/home/sysadmin/svoice/svoice/solver.py", line 219, in _run_one_epoch
    loss.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: Function 'Log10Backward' returned nan values in its 0th output.
Looks like something is going wrong in the backward pass of the loss?
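For context, one common way a log10 in an SI-SNR-style loss produces non-finite values is a zero (or near-zero) argument, for example on a silent segment. The sketch below is a plain-Python illustration of guarding the ratio with a small epsilon; it is not svoice's loss code, and the function name and the eps value are hypothetical. Whether this is the actual cause of the error above is not established here.

```python
import math

EPS = 1e-8  # hypothetical guard value, not taken from svoice


def si_snr_db(estimate, target, eps=EPS):
    """Scale-invariant SNR in dB for two equal-length sequences of floats."""
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target) + eps
    # Project the estimate onto the target to get the "signal" part.
    scale = dot / target_energy
    projection = [scale * t for t in target]
    noise = [e - p for e, p in zip(estimate, projection)]
    signal_power = sum(p * p for p in projection)
    noise_power = sum(n * n for n in noise)
    # eps keeps the ratio strictly positive and finite, so log10 is defined.
    ratio = (signal_power + eps) / (noise_power + eps)
    return 10.0 * math.log10(ratio)


silent = si_snr_db([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
perfect = si_snr_db([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
orthogonal = si_snr_db([1.0, 0.0], [0.0, 1.0])
print(silent, perfect, orthogonal)
```

Without the epsilon terms, the all-zero case would evaluate log10 of 0/0; in an autograd framework the same guard also keeps the derivative 1 / (x ln 10) bounded.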
Thanks for the options though, I will try changing L.