RuntimeError: Expected object of scalar type Float but got scalar type Int for argument #2 'target' when training the forward network (forwardtacotron, closed)

as-ideas commented on September 25, 2024

RuntimeError: Expected object of scalar type Float but got scalar type Int for argument #2 'target' when training the forward network

from forwardtacotron.

Comments (11)

cschaefer26 commented on September 25, 2024

Sorry to hear that. 1.5 steps/s is not bad, although it should be around 3 steps/s for batch size 32.

cschaefer26 commented on September 25, 2024

Hi, it could be that older versions of PyTorch do not autocast the values in l1_loss. I just pushed an explicit cast to master that should fix this; could you pull and try again?
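
A minimal sketch of such an explicit cast, assuming the model predicts float durations while the target durations are loaded as integer tensors. `duration_l1_loss` is a hypothetical name used only for illustration; it is not the code that was pushed to master.

```python
import torch
import torch.nn.functional as F


def duration_l1_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: L1 loss between float predictions and integer targets."""
    # Older PyTorch versions may reject a Float input paired with an Int target,
    # so cast the target to the prediction's dtype before computing the loss.
    target = target.to(pred.dtype)
    return F.l1_loss(pred, target)


pred = torch.tensor([1.5, 2.0, 3.5])                 # predicted durations (float32)
target = torch.tensor([1, 2, 3], dtype=torch.int32)  # ground-truth durations (int32)
loss = duration_l1_loss(pred, target)
print(loss.item())  # mean of |0.5|, |0.0|, |0.5|
```

Casting the target rather than the prediction keeps the loss in the model's own dtype, so gradients flow back through `pred` unchanged.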

jmasterx commented on September 25, 2024

Hi

Thank you for the fast response.

I pulled master and tried again, but got this:

Traceback (most recent call last):
  File "train_forward.py", line 98, in <module>
    trainer.train(model, optimizer)
  File "D:\speech\ForwardTacotron-master\ForwardTacotron-master\trainer\forward_trainer.py", line 37, in train
    self.train_session(model, optimizer, session)
  File "D:\speech\ForwardTacotron-master\ForwardTacotron-master\trainer\forward_trainer.py", line 71, in train_session
    loss.backward()
  File "C:\Users\Josh\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\Josh\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: unspecified launch failure

Do I need to reprocess and retrain the first network, or should it work as is?

jmasterx commented on September 25, 2024

Hi

I think I may be running out of VRAM. I have an RTX 2070 with 8 GB of memory, so I might need to lower the batch size.

jmasterx commented on September 25, 2024

It works in CPU mode. I will try reducing the batch size. Thank you!

jmasterx commented on September 25, 2024

A batch size of 4 works; 8, 16 and 32 do not. Does a lower batch size affect quality, or does it just take longer to train?

jmasterx commented on September 25, 2024

Hmm, it might not be a VRAM issue after all. Even with a batch size of 4, it does not get through an epoch before giving the same error as before, and VRAM usage is only at 4 GB.
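
To check how much GPU memory PyTorch itself is using, a sketch like the following could be called after a few training steps. `report_peak_vram` is a hypothetical helper; it relies only on `torch.cuda.is_available()` and `torch.cuda.max_memory_allocated()`.

```python
import torch


def report_peak_vram() -> str:
    """Hypothetical helper: peak GPU memory allocated by PyTorch tensors."""
    if not torch.cuda.is_available():
        return "CUDA not available"
    # Peak bytes occupied by tensors since the start of the program (or the
    # last reset of peak stats), converted to GiB.
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    return f"peak allocated: {peak_gb:.2f} GB"


print(report_peak_vram())
```

The number reported here is usually lower than what nvidia-smi shows, because nvidia-smi also counts memory cached by PyTorch's allocator and the CUDA context itself.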

jmasterx commented on September 25, 2024

It seems to be a bug in cuDNN. Disabling cuDNN is very slow, but it works: pytorch/pytorch#27588
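
In PyTorch, cuDNN can be switched off with the global `torch.backends.cudnn.enabled` flag. This is a sketch of that workaround; the comment does not say exactly how cuDNN was disabled in this case.

```python
import torch

# Turn off cuDNN globally before creating the model and starting training.
# PyTorch then uses its own CUDA kernels instead of cuDNN's, which avoids the
# failing cuDNN code path but can make training much slower.
torch.backends.cudnn.enabled = False
```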

jmasterx commented on September 25, 2024

Adding torch.autograd.set_detect_anomaly(True) fixes the issue. It is a bit slower, but still much faster than disabling cuDNN.
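
A minimal sketch of enabling anomaly detection once for the whole run, for example near the top of the training script. The thread only reports that this avoids the crash; it does not explain why, and PyTorch documents anomaly detection as a debugging aid rather than a fix.

```python
import torch

# Enable autograd anomaly detection globally. It records the forward traceback
# of each operation and checks backward results for NaNs, which is why it
# slows training down.
torch.autograd.set_detect_anomaly(True)

# Tiny self-check: a backward pass still works with anomaly detection on.
x = torch.randn(4, requires_grad=True)
loss = (x ** 2).sum()
loss.backward()
print(x.grad)  # equals 2 * x
```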

cschaefer26 commented on September 25, 2024

I could imagine that upgrading PyTorch/nvcc would help, but I understand that can be quite cumbersome.
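
Before and after an upgrade, it can help to record the versions actually in use. A sketch using only standard PyTorch attributes; note that the NVIDIA driver version is not reported by these calls (nvidia-smi shows it).

```python
import torch

# Versions worth comparing before and after an upgrade.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)       # None for CPU-only builds
print("cuDNN:", torch.backends.cudnn.version())  # None if cuDNN is unavailable
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```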

jmasterx commented on September 25, 2024

Upgrading PyTorch to 1.5.1 and installing the latest NVIDIA drivers produced the same result. It does not seem to happen on RTX 2080-series cards, just on lower-tier ones like mine. No big deal though: 1.5 steps/s is much better than the 0.26 steps/s I got without cuDNN at all!
