Comments (13)

ttbrunner avatar ttbrunner commented on August 24, 2024 3

Are you running sample.py or jukebox/interacting_with_jukebox.ipynb? The notebook seems to be new and needs less memory.

sample.py needs more system RAM because it loads all three priors at once. The code in the jupyter notebook loads only the lvl2 prior, draws samples, deallocates memory, and only afterwards loads the lvl1 and lvl0 upsamplers.
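That load → sample → free pattern can be sketched in plain PyTorch. This is a hedged toy illustration: load_prior and sample_with below are hypothetical stand-ins, not Jukebox's real API, and the tiny Linear layer substitutes for the multi-GB transformer priors that cause the OOM in the first place.

```python
import torch

def load_prior(level):
    # Hypothetical stand-in for loading a Jukebox prior; the real models are
    # multi-GB transformers, which is why holding all three at once runs OOM.
    return torch.nn.Linear(16, 16)

def sample_with(model, z):
    with torch.no_grad():
        return model(z)

z = torch.zeros(1, 16)

# Notebook-style flow: only one model lives in memory at any time.
top_prior = load_prior(2)
z = sample_with(top_prior, z)
del top_prior                     # drop the only reference to the model...
if torch.cuda.is_available():
    torch.cuda.empty_cache()      # ...and hand cached VRAM back to the driver

for level in (1, 0):              # then load the upsamplers one by one
    upsampler = load_prior(level)
    z = sample_with(upsampler, z)
    del upsampler
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

The key point is that del alone only releases Python's reference; torch.cuda.empty_cache() is what returns PyTorch's cached blocks to the GPU so the next model can fit.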

Also, the code in the notebook has different hyperparams than sample.py, so you might want to play around with sample length, total length, etc. I had to figure this out yesterday.

from jukebox.

PiotrMi avatar PiotrMi commented on August 24, 2024 2

Do you have an idea which parameters I can tweak to reduce the load on the GPU? I tried lower samples and lower sample lengths

BlueProphet avatar BlueProphet commented on August 24, 2024 2

I get this; it seems to be the same issue though. This is with two 1080 Tis on Ubuntu:

0: Loading prior in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/taj/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/taj/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:1, Cond downsample:4, Raw to tokens:32, Sample length:262144
Downloading from gce
Restored from /home/taj/.cache/jukebox-assets/models/5b/prior_level_1.pth.tar
0: Loading prior in eval mode
Killed

maddy023 avatar maddy023 commented on August 24, 2024 1

Colab terminates the code when GPU usage exceeds its limits.

maddy023 avatar maddy023 commented on August 24, 2024 1

Do you have an idea which parameters I can tweak to reduce the load on the GPU? I tried lower samples and lower sample lengths

Reduce n_samples to 2-4 and use model: 1b_lyrics; it works fine in Colab.

ttbrunner avatar ttbrunner commented on August 24, 2024 1

I get this; it seems to be the same issue though. This is with two 1080 Tis on Ubuntu

How much RAM do you have in your system (not GPU)? I had this problem, but managed to run it after I added more swap to my system memory.

My low-end specs:
Ubuntu 18, 16GB RAM + 16GB Swap, 1x1070 (8GB). I can run 1b_lyrics with n_samples=4 (barely).

PiotrMi avatar PiotrMi commented on August 24, 2024

Do you have an idea which parameters I can tweak to reduce the load on the GPU? I tried lower samples and lower sample lengths

Reduce n_samples to 2-4 and use model: 1b_lyrics; it works fine in Colab.

Tried this out today. Unfortunately it still stopped the execution. I'm not sure how many people were using Colab at the time. I think I had 16 GB RAM available.

made-by-chris avatar made-by-chris commented on August 24, 2024

@ttbrunner could you please share your Colab parameters? Are you successfully running locally or hosted? I managed to get a local setup working (I changed "5b_lyrics" to "1b_lyrics").
I have an RTX 2060 and 16GB RAM https://ghostbin.co/paste/no6d6

but at this step:

zs = [t.zeros(hps.n_samples,0,dtype=t.long, device='cuda') for _ in range(len(priors))]
zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)

I get the error:

RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 5.76 GiB total capacity; 4.50 GiB already allocated; 12.19 MiB free; 4.82 GiB reserved in total by PyTorch)

ttbrunner avatar ttbrunner commented on August 24, 2024

@basiclaser Hey, your card has 6GB video RAM, mine has 8GB. That might be the problem. I guess you can try n_samples=1, but if it doesn't work then I guess you can't run it :(

made-by-chris avatar made-by-chris commented on August 24, 2024

@ttbrunner yeh i gave it a shot.. lots of crashing and then memory allocation errors :) Thanks for your advice though. Are most people running this thing entirely in the cloud? I'm curious about the costs of using it in a hosted way.

stevedipaola avatar stevedipaola commented on August 24, 2024

Has anyone solved this for standalone Linux PCs with a GPU? We are running out of memory and crashing right off the bat. I have a 1080 Ti and can't get it working at any n_samples.

ttbrunner avatar ttbrunner commented on August 24, 2024

Yes, I can run it on Ubuntu 18 with a 1070. To sum up,

  • Allocate heaps of swap space if you have <32GB system RAM
  • Use 1b_lyrics (not 5b)
  • Try n_samples=1
  • Set a short sample_length, but not too short (anything under about 20 seconds may raise an error)
  • Don't use sample.py; instead, create your own script based on the code in the jupyter notebook. The crucial difference is that the top prior (which generates the initial sample) is loaded separately from the two upsamplers that follow. sample.py loads all of them at once, which causes the OOM.
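As a concrete starting point, the checklist above maps to a handful of hyperparameters. This is a hedged sketch: the names n_samples and sample_length and the 1b_lyrics model come from this thread, but the plain dict and the exact values are illustrative, not the notebook's real hyperparameter object.

```python
# Illustrative low-memory settings; names follow the thread, values are
# starting points to tune for your own card.
sr = 44100  # Jukebox's released models operate on 44.1 kHz audio

hps = dict(
    model="1b_lyrics",       # the 5b models need far more RAM/VRAM
    n_samples=1,             # one sample at a time on small cards
    sample_length=20 * sr,   # ~20 s of audio; much shorter may error out
)
```

sample_length is measured in audio samples, so 20 seconds at 44.1 kHz is 882,000 samples; raising n_samples multiplies activation memory roughly linearly.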

It's possible to save intermediate results to disk with torch.save(zs, PATH) and continue later. On a low-end machine, I would create two separate scripts: one for drawing from the top prior and one for upsampling. Sampling the top prior takes only a couple of minutes; you can listen to the result right away and decide whether to upsample it.
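The two-script split can be sketched as a plain torch.save/torch.load round trip. Only torch.save(zs, PATH) comes from the comment above; the three-element zs list of empty token tensors mirrors the notebook snippet quoted earlier in this thread, and the file path is illustrative.

```python
import os
import tempfile

import torch

# zs is the list of token tensors the notebook carries between stages;
# stubbed here with empty placeholders (the top prior would fill zs[2]).
zs = [torch.zeros(1, 0, dtype=torch.long) for _ in range(3)]

# --- Script one ends here, right after top-prior sampling: ---
path = os.path.join(tempfile.gettempdir(), "zs_top_level.pt")
torch.save(zs, path)

# --- Script two (run later, even after a reboot) starts by restoring: ---
zs = torch.load(path)
```

Because the checkpoint holds only token tensors rather than model weights, the second script can load the upsamplers into a completely fresh process with no top prior in memory.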

terrafying avatar terrafying commented on August 24, 2024

Mine crashes as soon as I run rank, local_rank, device = setup_dist_from_mpi()
