Comments (11)

cschaefer26 commented on September 25, 2024

Sorry to hear that. 1.5 steps/s is not bad, although it should be around 3 steps/s at batch size 32.

from forwardtacotron.

cschaefer26 commented on September 25, 2024

Hi, it could be that the older PyTorch version does not automatically cast the values in l1_loss. I just pushed an explicit cast to master that should fix this. Could you pull and try again?
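
For readers on an older PyTorch, here is a minimal sketch of what an explicit cast in front of l1_loss can look like. It is not the actual commit: the helper name `l1_loss_with_cast` is hypothetical, and the assumption that both inputs should be cast to float32 is ours, since the thread does not say which tensors or dtypes were involved.

```python
import torch
import torch.nn.functional as F


def l1_loss_with_cast(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 loss with both inputs cast to float32 first (hypothetical helper).

    Casting explicitly avoids relying on the installed PyTorch version
    to reconcile mismatched dtypes on its own.
    """
    return F.l1_loss(pred.float(), target.float())


# Toy example: a float prediction against an integer-typed target.
pred = torch.tensor([1.0, 2.0])
target = torch.tensor([1, 4])  # int64 on purpose
loss = l1_loss_with_cast(pred, target)
```

Calling `.float()` on a tensor that is already float32 returns the tensor unchanged, so the cast should cost nothing when the dtypes already match.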

jmasterx commented on September 25, 2024

Hi

Thank you for the fast response.

I pulled master and tried again, but got this:

Traceback (most recent call last):
  File "train_forward.py", line 98, in <module>
    trainer.train(model, optimizer)
  File "D:\speech\ForwardTacotron-master\ForwardTacotron-master\trainer\forward_trainer.py", line 37, in train
    self.train_session(model, optimizer, session)
  File "D:\speech\ForwardTacotron-master\ForwardTacotron-master\trainer\forward_trainer.py", line 71, in train_session
    loss.backward()
  File "C:\Users\Josh\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\Josh\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: unspecified launch failure

Do I need to reprocess the data and retrain the first network, or should it work as is?

jmasterx commented on September 25, 2024

Hi

I think I may be running out of VRAM. I have an RTX 2070 with 8 GB, so I might need to lower the batch size.

jmasterx commented on September 25, 2024

It works in CPU mode. I will try reducing the batch size. Thank you!

jmasterx commented on September 25, 2024

A batch size of 4 works; 8, 16, and 32 do not. Does a lower batch size affect quality, or does it just take longer to train?

jmasterx commented on September 25, 2024

Hmm, it might not be a VRAM issue. Even at a batch size of 4 it does not get through an epoch before giving the same error as before, and VRAM usage is only at 4 GB.
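
One way to cross-check the memory theory from inside the training script is to ask PyTorch what it has allocated. The sketch below is only an illustration: the helper name `vram_report` and the choice of device index 0 are assumptions, not part of ForwardTacotron. Note that these counters cover only memory managed by PyTorch's own allocator, so they can read lower than what external tools show.

```python
import torch


def vram_report(device: int = 0):
    """Return current, peak, and total GPU memory in GiB (hypothetical helper).

    Returns None when no CUDA device is available, e.g. in CPU mode.
    """
    if not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    return {
        # Memory currently held by tensors on this device.
        "allocated_gib": torch.cuda.memory_allocated(device) / gib,
        # Highest value of the above since the program started.
        "peak_allocated_gib": torch.cuda.max_memory_allocated(device) / gib,
        # Physical memory of the card.
        "total_gib": torch.cuda.get_device_properties(device).total_memory / gib,
    }


report = vram_report()
```

If the peak stays well below the card's total when the crash happens, running out of memory becomes a less likely explanation.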

jmasterx commented on September 25, 2024

This seems to be a bug in cuDNN (see pytorch/pytorch#27588). Disabling cuDNN works, but training is very slow.
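
For anyone wanting to reproduce this test, cuDNN can be switched off through the `torch.backends.cudnn.enabled` flag. The comment does not show how it was done here, so the following is a sketch under that assumption; the context manager name `cudnn_disabled` is hypothetical and simply restores the previous setting afterwards.

```python
import contextlib

import torch


@contextlib.contextmanager
def cudnn_disabled():
    """Temporarily turn cuDNN off and restore the old setting on exit."""
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = False
    try:
        yield
    finally:
        torch.backends.cudnn.enabled = previous


setting_before = torch.backends.cudnn.enabled
with cudnn_disabled():
    # Any forward/backward pass placed here runs without cuDNN kernels.
    setting_inside = torch.backends.cudnn.enabled
setting_after = torch.backends.cudnn.enabled
```

To disable cuDNN for a whole run instead, setting `torch.backends.cudnn.enabled = False` once near the top of the training script has the same effect without the scoping.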

jmasterx commented on September 25, 2024

Adding torch.autograd.set_detect_anomaly(True) fixes the issue. Training is a bit slower with it, but still much faster than with cuDNN disabled.
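
A minimal sketch of where that call can sit relative to a training step. The tiny linear model and random data are toy stand-ins chosen for illustration, not ForwardTacotron code, and the thread does not explain why enabling anomaly detection avoids the crash, so treat it as an observed workaround rather than a root-cause fix.

```python
import torch
from torch import nn

# Global switch, as in the comment above. It stays on until set back to False.
torch.autograd.set_detect_anomaly(True)

# Toy stand-ins (hypothetical): a tiny model and random data.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.l1_loss(model(x), y)
loss.backward()  # backward now runs with anomaly checks enabled
optimizer.step()

anomaly_was_on = torch.is_anomaly_enabled()

# Switch it off again once it is no longer needed, since it adds overhead.
torch.autograd.set_detect_anomaly(False)
```

PyTorch also offers `torch.autograd.detect_anomaly()` as a context manager, for cases where only part of the training loop should pay the extra cost.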

cschaefer26 commented on September 25, 2024

I could imagine that upgrading PyTorch/nvcc would help, but I understand that can be quite cumbersome.

jmasterx commented on September 25, 2024

Upgrading PyTorch to 1.5.1 and installing the latest NVIDIA drivers produced the same result. It does not seem to happen on 2080xx cards, just lower-tier ones like mine. No big deal though: 1.5 steps/sec is much better than the 0.26 I get without cuDNN at all!
