Comments (46)

SMarioMan commented on August 24, 2024 (+23)

I've created a modified notebook that adds support for primed audio by supplying an audio file in your Google Drive. Feel free to make a copy and try it out for yourself.
https://colab.research.google.com/drive/1OW4hnjsnAL7grjqYlCOLvNu97CRquP8I?usp=sharing
https://colab.research.google.com/github/SMarioMan/jukebox/blob/master/jukebox/Interacting_with_Jukebox.ipynb

I can make a pull request to add this to the main repository as well, if there is interest.
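
The gist of the change is pointing jukebox's primed mode at a file in your mounted Drive. A rough sketch (not the notebook's exact code; the path is a placeholder):

from google.colab import drive
from jukebox.hparams import Hyperparams

drive.mount('/content/gdrive')  # make your Drive visible to the Colab runtime

# Put jukebox into primed mode, reading the primer audio from Drive:
sample_hps = Hyperparams(dict(
    mode='primed',
    codes_file=None,
    audio_file='/content/gdrive/My Drive/primer.wav',  # placeholder path
    prompt_length_in_seconds=8,
))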

kcrosley-leisurelabs commented on August 24, 2024 (+7)

> I keep getting timeouts on Colab. I even upgraded to the paid plan but no bueno. Any suggestions on how to keep it running?

Hey @diffractometer: It's the tendency for disconnection that has me psyched about @SMarioMan's latest update (which makes upsampling from a previous checkpoint work properly now 🎉).

Because it takes a very long time to generate (for example) 90 seconds of audio all the way from the start -- through Level 2, then upsampling to Level 1 and then Level 0 -- going all the way in one pass is pretty rare.

(I have quite a few interesting things that got stuck at Level 2 or Level 1 that I can now continue on with.)

Some notes and things that are helpful:

  1. Even if it seems your web browser has disconnected from the runtime, it is actually possible that your notebook is still running. If you seem to get disconnected, do Runtime > Manage Sessions to see if in fact your session is still running in the cloud:

[screenshot: the Runtime > Manage sessions menu item]

If it is, you'll see it like this, and you can double-click a running session to reconnect to it:

[screenshot: the active sessions dialog showing a running session]

  2. As Google notes (and as I'm sure you're aware), while you can run multiple sessions at once, resources (even on Pro) are not guaranteed, and I have found that being greedy and running, say, 3 sessions at once can get you dumped from some or all of them. YMMV.

  3. This might not be a real effect, but I have successfully had 2 sessions at once eventually run all the way to completion. What I did was occasionally do File > Save while things were running. This seems (though I cannot prove it scientifically) to keep Colab thinking that you are, in fact, running interactively and not just dropping a long-running process on their GPU. (In the case of jukebox, of course, we are doing both, but only because it's such a compute-intensive thang!)

  4. I've found that I can run several sessions at a time and easily complete many Level 2 renderings (you might even let one go to the upsample stage; you'll probably at least get to Level 1, but may drop before Level 0 completes). This lets you just explore, then some other time take your best outputs and do the upsampling on them (like letting one run overnight); a sketch of the "upsample later" invocation follows.
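
That invocation looks roughly like this (a sketch using sample.py's mode/codes_file hparams; the codes_file path depends on what you named the Level 2 run):

sample_hps = Hyperparams(dict(
    mode='upsample',
    codes_file='sample_5b/level_2/data.pth.tar',  # codes saved by the Level 2 run
    audio_file=None,
    prompt_length_in_seconds=None,
))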

Hope perhaps some of the above helps?

BTW, now that I'm experimenting with the continue-from-checkpoint (upsample) mode, I see that the state is preserved -- e.g., if you continue upsampling a thing that crapped out during Level 0, you will pick up right where you left off rather than at the last completed level. This is pretty exciting.

-K-

Edit: Here's another remastered Level 0 render. Here, I'm priming from the "Be quiet..." break from 10cc's "I'm Not in Love", rendering as artist Sarah McLachlan and genre "ambient". You can compare the raw Level 0 WAV output from jukebox with the remastered version:

Given the genre here, you could easily prefer the original to the remaster; it really comes down to personal preference, and the vocal here isn't as buried as it is in busier pop-type mixes.

As with GPT-2, I think that priming the engine is the most interesting way to use jukebox.

SMarioMan commented on August 24, 2024 (+6)

I have created PR #72 with my latest changes, now including checkpoints. Hopefully it gets merged, but if it doesn't, you can use the modified notebook at https://colab.research.google.com/github/SMarioMan/jukebox/blob/master/jukebox/Interacting_with_Jukebox.ipynb

kcrosley-leisurelabs commented on August 24, 2024 (+5)

Hey @diffractometer: That's the 5b model. (Honestly, I've not gotten anything useful out of the 1b, though I've not experimented with prompting it.)

The finished track there (as I think you can tell) has been significantly remastered from the raw output. I've come up with a pretty useful starting place for cleaning up artifacts and re-balancing the mixes. It's kind of an expensive signal chain, but it generally goes like this:

On the track itself:

Zynaptiq Unveil (remove smeary reverb-like artifacts and mud) > Zynaptiq Unmix:Drums (generally to bring volume of percussive elements UP) > sometimes an Exciter type plugin to put some high freq content back in

An alternative here is some pretty aggressive EQ (reduce the lower mids to clean out mud, accentuate the upper-mid vocal range, boost the high end). (But this is nowhere near the magic of the Zynaptiq stuff.)

In the example I linked above, I restored the removed reverb with an automated send to a period-appropriate reverb (UAD's version of the Lexicon 224). There's also a reverberated ping-pong delay being fed at appropriately dramatic moments.

Then, on the Master bus:

Cosmos (stereo widener / exciter / bass booster) > UAD tape simulator > UAD Precision Multiband (dynamics -- mostly to accentuate vocals and sub bass, and to control sibilance caused by the earlier high-frequency boost) > UAD Precision Maximizer (limiting/volume maximization)

And after all that, you kinda sorta get back to a good-sounding record! ;)

kcrosley-leisurelabs commented on August 24, 2024 (+4)

@svntv, for continuations, simply follow the instructions in @SMarioMan's notebook. To do a continuation, upload a small sample truncated at an appropriate point. (Use an audio editor to note the time at which a certain event happens, such as a downbeat or bar break, so you don't end up with shitty/random examples like the ones from OpenAI, who seem to have zero knowledge of music in any meaningful sense -- kinda like their transformer... but I digress.)

Here's how you do continuations. To start, here's an example notebook that generates continuations from a certain classic Thomas Dolby track:

https://colab.research.google.com/drive/1ssiZw58aU2km3cWN183v4IC9KYmPTlMS?usp=sharing

Here's the cell where we set up the priming. We run this cell instead of the one above it to put jukebox into primed mode:

[screenshot: the primed-mode cell with the audio_file and prompt_length settings]

You would change the location of your priming audio_file (which ideally should be in 44.1 kHz/16-bit/mono WAV format) to whatever your actual source is, of course. (If you'd like a copy of science-cut.wav, BTW, you can get it here -- you know, for science purposes only: https://drive.google.com/file/d/1AYnpVIE9KIPrf73sUBLjMPtZ3zmX83lU/view?usp=sharing)

Note the prompt_length setting. This can be any value in seconds up to the length of your source sample. In this example, while the priming source is over 4 minutes long, we use just the first 12 seconds, which gives us a nice place from which to continue (right after "when she turned...").

And now we set our total sample length. In the example, I've selected 90 seconds, which will give us a clip with a total length of 90 seconds (12 seconds from the primer and 78 seconds of "new" material that continues from the cutoff point).

[screenshot: the sample-length cell]

Below that, we set the artist and genre IDs. These can be any of the artist or genre IDs found in the "V2" versions of the artist and genre files (if you are using the 5B model as assumed in this example).
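
Put together, the settings described above look roughly like this in code (a sketch using the notebook's names; hps comes from the notebook's earlier cells, and the values are just this example's):

sample_hps = Hyperparams(dict(
    mode='primed',
    codes_file=None,
    audio_file='/content/gdrive/My Drive/science-cut.wav',  # 44.1 kHz/16-bit/mono WAV
    prompt_length_in_seconds=12,    # condition on only the first 12 seconds
))

sample_length_in_seconds = 90       # 12 s of primer + 78 s of new material

metas = [dict(
    artist='thomas dolby',          # NOT in the V2 artist list, so this falls back to "unknown"
    genre='new wave',               # must match a V2 genre ID with the 5B model
    total_length=hps.sample_length,
    offset=0,
    lyrics='...',                   # lyrics elided
)] * hps.n_samples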

IMPORTANT: Your choices of genre and artist are extremely important to the final output. If you provide choices that are unknown, malformed, or misunderstood by the model, they will default to "unknown" and/or "various artists" and you'll get some pretty random results.

If -- as we've done here -- you provide an artist that is unknown ("thomas dolby" is NOT in the V2 artist list), you'll see a (non-fatal) error and the artist becomes "unknown". This has a tendency (in most cases) to produce output that does not sound at all like your target artist, particularly in vocal style but also in musical style.

I say "in most cases" because sometimes (as in this example) you'll stumble upon tracks/artists that are pretty obviously represented in the training corpus even though they may not have their own artist IDs. Listen to the final output example I've provided for this one and I think you'll agree that there's no possible way that jukebox hasn't seen "She Blinded Me with Science" and other Thomas Dolby tracks.

(The result I've cherry-picked here is practically a pastiche of Dolby-isms, particularly in the vocal processing. In just this one example, I hear echoes of a bunch of Dolby songs, not the least of which is Hyperactive. Also interesting: at the very end of the track is an almost perfectly isolated LinnDrum sample.)

HOWEVER, jukebox has not been trained on you (unless you've done that... and if you haven't and want to know how to do it, I'm not the guy to ask). Also, we don't know what genre you're operating in and, further, how close your particular primer track might "fit" the selected genre you've declared.

_(Aside: Also, we don't really know anything about the V2 genres, which are horribly implemented. Rather than being really specific (as the V3 genres are -- those go with the 1B model, not the 5B model), they are very broad. E.g., V3 has "bossa nova", but where do bossa nova tracks land in V2? Jazz? mpb? Who knows?

... and, further, some of them have to be constructed from parts that we're not sure go together. For example, what's the correct nomenclature for the genre "R&B" (rhythm and blues): is it "r n b"? "rnb"? I've not tried that one, but look at the genre list -- there's an "r" genre, a "b" genre, and an "n" genre. It's a serious bucket of what-the-ever-loving-fuck without any documentation.

I can tell you from experimentation and from the jukebox samples site that "rock n roll" is an accepted genre (presumably early rock, not to be confused with "rock", which is also a genre). Also, "new wave" works. It also seems that "nu metal" is an acceptable genre, but is "nu jazz"? (I suspect not, but maybe.) Anyway, that's all pretty stupid. I wish the V2 genres were as cleanly defined as the V3 ones.)_
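
If you want to check a genre guess before burning GPU hours, something like this works (assuming the ID lists live at jukebox/data/ids/ as in the official repo; I'm not certain of the exact line format, hence the defensive parsing):

import re

def v2_genre_exists(name, path='jukebox/data/ids/v2_genre_ids.txt'):
    # One entry per line; split on common delimiters and keep the name field.
    with open(path) as f:
        entries = {re.split(r'[;,\t]', line.strip())[0].lower()
                   for line in f if line.strip()}
    return name.lower() in entries

for candidate in ('rock n roll', 'new wave', 'nu metal', 'nu jazz', 'rnb', 'r n b'):
    print(candidate, v2_genre_exists(candidate))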

If (like me) you make your own "serving suggestions" as a starter, note that things can diverge quite quickly from your original composition -- especially if the chosen artist's oeuvre isn't much like the genre you've picked.

Here, for example, is a batch of 3 tracks created with this seed (about 22 seconds of a quickly "remixed" version of the Beach Boys classic "God Only Knows"), with the artist set to the_beach_boys and the genre set to "trip hop". Also, in this example, instead of using the actual words to God Only Knows, I used a modified version that was itself constructed as a text continuation by GPT-2 (see the end of this message for the "lyrics"):

  • Item 0: Keeps the underlying beat for a little while; a charming rendition of "you've got the prettiest face you could ever wish for", which clearly seems to be the chorus.
  • Item 1: Happy trip hop, I guess, with some rad harmonization going on after a bit. (Stick around for the isolated guitar chord hit at the end!)
  • Item 2: Goes pretty much a cappella for a good while right after the priming audio. After that long pause, the original priming material is pretty much gone from memory (it is literally out of the attention window), so the subsequent generation is hardly reminiscent of our starting place, except in generic terms.

ABOUT THAT YOUTUBE VIDEO:

What's being shown there is simply a bunch of the "Never Gonna Give You Up" continuations published at https://jukebox.openai.com/, played one after another.

It's not a very very very long output from a single very very very long jukebox session. It's just a bunch of those 70-ish second examples placed end to end. Don't misunderstand that.

As for why this continuation works so well with the Rick Astley song (much as it does in my Thomas Dolby example): First, it's obvious that jukebox has seen this song and knows it (duh). Further, the 11-second-ish primer they use has enough lyrical content, which starts from the very beginning of the song and matches up perfectly, because the lyrics they trained on are from LyricsWiki and that's also what's used in their "lyrics" parameter here.

Also, the genre isn't a mismatch: while OpenAI hasn't told us what the training corpus contains or how they categorized specific tracks, it's pretty clear they lumped Rick into "pop".

Anyway, I hope some of the previous info helps you!

Best Regards,
Keith


APPENDIX:
The wonderfully crap lyrics to "You've Got the Prettiest Face," by GPT-2:

lyrics = """I may not always love you, but you will always love me. 
Because you're all that I've ever needed,
and the only place I ever want to go.

Everything you give me, I promise to return. 
(By the way, I'm not kidding.) 

I know if I take one step back,
I can see that you've been waiting for me
but I never have let you down before.
(No one can ever take away what I've dreamed and worked for.)

Tell me I'm amazing and give me your love.
(Oooh!) You've got the prettiest face
you could ever wish for.
You know you've got me all!
"""

SMarioMan commented on August 24, 2024 (+3)

@kcrosley-leisurelabs I have been working on integrating checkpoints for upsampling support within the notebook using code from #42. I haven't fully tested it yet, but if I get it working, I'll share it.

xandramax commented on August 24, 2024 (+3)

For anyone who would like to play with it, I modified SMarioMan's notebook to add co-composing with primed audio samples. It's available on GitHub here and on Colab here.

SMarioMan commented on August 24, 2024 (+2)

You should be able to provide a comma-separated list of audio files instead of just one. I haven't tried that configuration myself, so you might need to put in some effort if it doesn't just work. You can also provide different lyrics, genre, and artist information for each sample by modifying the metas array to contain multiple dicts instead of repeating the same one for all samples.
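
Concretely, something like this (untested; the paths are placeholders, and the audio_file list relies on sample.py splitting the string on commas):

sample_hps = Hyperparams(dict(
    mode='primed',
    codes_file=None,
    audio_file='/content/gdrive/My Drive/a.wav,/content/gdrive/My Drive/b.wav',
    prompt_length_in_seconds=12,
))

# One meta dict per sample instead of repeating the same one:
metas = [
    dict(artist='unknown', genre='pop', total_length=hps.sample_length, offset=0, lyrics='...'),
    dict(artist='unknown', genre='trip hop', total_length=hps.sample_length, offset=0, lyrics='...'),
]
assert len(metas) == hps.n_samples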

kcrosley-leisurelabs commented on August 24, 2024 (+2)

@SMarioMan might you know how to solve #64 vis-à-vis your Colab notebook (referenced above)? I've been wondering the same thing: how to reload a previously interrupted "level 2" or "level 1" session that now needs further upsampling. I've been using your (most excellent) notebook to experiment with jukebox. Many thanks for your contributions here!

SMarioMan commented on August 24, 2024 (+2)

@kcrosley-leisurelabs Thanks for identifying this issue. It's nothing wrong on your end. load_codes() wasn't designed to handle a separate top prior, and I missed that. I'll push a fix soon.
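
(The gist of the fix is something like this -- a sketch, not the actual patch: when the top prior is loaded separately, priors[-1] is only a placeholder string, so raw_to_tokens has to come from the real top_prior.)

# In load_codes(), something along these lines:
top_raw_to_tokens = top_prior.raw_to_tokens  # not priors[-1].raw_to_tokens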

SMarioMan commented on August 24, 2024 (+2)

@Flesco Currently, the only change I have made to co-composing is saving outputs to Google Drive. I haven't experimented with the co-composer features myself, so I don't know whether it provides or uses checkpoints.

combs commented on August 24, 2024 (+2)

Re: keeping Colab from terminating your session by clicking Save: Chrome reserves the right to stealthily terminate a tab's JavaScript execution if you haven't used it for a while, but this doesn't happen if the tab is in front.

At least on Chromium/Ubuntu, a nice workaround is to keep Colab in a dedicated window with no other tabs, and minimize or ignore it.

xandramax commented on August 24, 2024 (+2)

It's also possible to do continuations in co-composition mode, which is what my post that @svntv quoted was referencing. My changes have now been merged into SMarioMan's repository, so it's no longer necessary to use my version of the notebook in order to get this functionality.

Anyway, good info above from kcrosley, and it's pretty much all just as applicable when doing continuations via co-composition -- especially the advice about the importance of genre, artist, and the audio prompt.

If you're using co-composition, don't be shy about running several batches before choosing a snippet of output to build upon. I've found that jukebox's creativity can be surprisingly diverse, and you'll often find several distinctly different directions it might think about taking things at any given point in the song that work well.

Usually the majority of its ideas are not so great, in my experience, but I almost always find some interesting gem mixed in among the questionable continuations. Given that apparent signal-to-noise ratio, I'm reluctant to invest much time outside of co-composition; I'd rather explore the breadth of jukebox's thinking and get more "optimal" output every step of the way, four seconds at a time.

btrude commented on August 24, 2024 (+1)

> @SMarioMan Thank you so much for making this. You solved a problem I was having :)
>
> I'm not sure if this is possible, but could an ETA be shown for inference? I'm currently on the last level (level 0) and it's at "Sampling 8192 tokens for [383184,391376]. Conditioning on 5936 tokens", which seems to have just finished, but I couldn't tell whether it was close to the end until it was. Could this be implemented? Or maybe a percentage?

Level 1 -> level 0 is exactly 4x the tokens and takes almost exactly 4x as long, in my somewhat limited experience with a GPU with almost 50% headroom. So just multiply the final token count from level 1 by four to see where you are in the final level.
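
In code, the back-of-the-envelope estimate is just this (the Level 1 total here is made up; 391376 comes from your log line):

level1_final_token = 120000            # wherever Level 1's last window ended (made up)
level0_total = level1_final_token * 4  # Level 0 has 4x the tokens of Level 1
current = 391376                       # end of the window Level 0 is sampling now
print(f"Level 0 is ~{100 * current / level0_total:.0f}% done")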

kcrosley-leisurelabs commented on August 24, 2024 (+1)

BTW, your efforts are enabling high art like this:

https://www.facebook.com/LeisureAddicts/videos/3196359047050228/

🔥📼 thanks for your service!

SMarioMan commented on August 24, 2024 (+1)

@kcrosley-leisurelabs The issue should be fixed at this stage.

diffractometer commented on August 24, 2024 (+1)

@kcrosley-leisurelabs 😮 omg

xandramax commented on August 24, 2024 (+1)

> (Honestly, I've not gotten anything useful out of the 1b, though I've not experimented with prompting it.)

In my experience so far, the best way to interact with 1b is via co-composition. It seems that the 1b model can generate a lot of good ideas mixed in with a whole lot of bad ones, and it's not so great at picking out the good ideas by itself.

kcrosley-leisurelabs commented on August 24, 2024 (+1)

> The issue [upsampling from checkpoint] should be fixed at this stage.

And, indeed, @SMarioMan, it is! Successfully continuing upsampling right now from a Level 1 checkpoint. This is great. Thanks for your help!

diffractometer commented on August 24, 2024 (+1)

@kcrosley-leisurelabs super, I really appreciate the update and tips, this is awesome. Listened to the remaster, bananas! I'll try it and get back at ya soon, thanks again.

diffractometer commented on August 24, 2024 (+1)

@anlexmatos doooope. Thanks! Checking it out now.

svntv commented on August 24, 2024 (+1)

> For anyone who would like to play with it, I modified SMarioMan's notebook to add co-composing with primed audio samples. It's available on GitHub here and on Colab here.

First of all, I love you. Second, I've been playing with your Colab, and I'm really interested in generating variations from a song of mine, but what I get is like 30 seconds of my song and then it turns into a different song. I would expect something like continuous generation (like this video: https://www.youtube.com/watch?v=iJgNpm8cTE8).

Am I doing something wrong? Thanks mate!!

diffractometer commented on August 24, 2024

@SMarioMan thanks!

jayjay300 commented on August 24, 2024

@SMarioMan Thanks a million for this. Really came in handy. Do you know what modifications would be needed to train a model on multiple audio samples and see what output it produces from that, e.g., if I wanted to train it on a few songs by the same artist? Apologies if this is a rather simple question. I've worked with StyleGAN in the past to create images, and there it was just a matter of pointing the training at a directory; it would iterate through all the images within.

kcrosley-leisurelabs commented on August 24, 2024

@SMarioMan, that would be terrific (and I'd find it very instructive).

diffractometer commented on August 24, 2024

@SMarioMan that is super dope, thanks. I'm using Colab and building a new Ubuntu box to get an environment as similar as possible to the Colab runtime. I'm on the paid plan, but I keep getting session drops anyway :(

kcrosley-leisurelabs commented on August 24, 2024

@diffractometer yeah, I experience similar issues at times (using Colab Pro), hence the ask. Just from an experimentation-workflow perspective, it'd be a lot easier to generate Level 2 pieces and then choose only the most interesting ones to fully render at some other time.

diffractometer commented on August 24, 2024

@kcrosley-leisurelabs would that be like the example, but skipping the first level 1 phase? Let it render for a long time, then iterate? I think I understand...

kcrosley-leisurelabs commented on August 24, 2024

@diffractometer, I just mean that you can tell at Level 2 whether a track is even worth rendering fully upsampled. Given how diverse jukebox's output is, I'd rather spend my time crate-digging at Level 2 and then go back and decide which ones are worth upsampling later (as longer tracks take so durn long to upsample).

camjac251 commented on August 24, 2024

@SMarioMan Thank you so much for making this. You solved a problem I was having :)

I'm not sure if this is possible, but could an ETA be shown for inference? I'm currently on the last level (level 0) and it's at "Sampling 8192 tokens for [383184,391376]. Conditioning on 5936 tokens", which seems to have just finished, but I couldn't tell whether it was close to the end until it was. Could this be implemented? Or maybe a percentage?

kcrosley-leisurelabs commented on August 24, 2024

Awesome. Can't wait to check that out, @SMarioMan! Thanks!

kcrosley-leisurelabs commented on August 24, 2024

Howdy, @SMarioMan! Hey, whenever I try to continue from a checkpoint (this is using primed mode), I always get:

AttributeError                            Traceback (most recent call last)
<ipython-input-11-611da0d1b201> in <module>()
      8   else:
      9     duration = None
---> 10   zs = load_codes(sample_hps.codes_file, duration, priors, hps)
     11   if sample_hps.mode == 'continue':
     12     zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)

/usr/local/lib/python3.6/dist-packages/jukebox/sample.py in load_codes(codes_file, duration, priors, hps)
    165     if duration is not None:
    166         # Cut off codes to match duration
--> 167         top_raw_to_tokens = priors[-1].raw_to_tokens
    168         assert duration % top_raw_to_tokens == 0, f"Cut-off duration {duration} not an exact multiple of top_raw_to_tokens"
    169         assert duration//top_raw_to_tokens <= zs[-1].shape[1], f"Cut-off tokens {duration//priors[-1].raw_to_tokens} longer than tokens {zs[-1].shape[1]} in saved codes"

AttributeError: 'str' object has no attribute 'raw_to_tokens'

This happens in the cell below "This next cell will take a while (approximately 10 minutes per 20 seconds of music sample)".

Am I running a previous cell that I should not?

(In case it's not clear: I have, for example, a previously computed Level 2 that was made using a prompt. Now, sometime later, I am trying to upsample from that checkpoint. But when I reconnect and run through the notebook again [executing cells including the "# Identify the lowest level generated and continue from there." cell], I get the above error.)

Note that my total length, hps folder, and priming file/length settings are all the same as before.

Feel like I'm missing something, but don't know what it might be! Thanks for any help you can provide.

Thanks,
Keith

kcrosley-leisurelabs commented on August 24, 2024

Thx, @SMarioMan! Appreciate it!

diffractometer commented on August 24, 2024

@kcrosley-leisurelabs did you render on the 1b or the 5b, if you don't mind my asking?

diffractometer commented on August 24, 2024

@kcrosley-leisurelabs this is so dope. I keep getting timeouts on Colab. I even upgraded to the paid plan but no bueno. Any suggestions on how to keep it running?

I'm going to render something and then run it through my 8-track tape machine once I get it to work.

camjac251 commented on August 24, 2024

@SMarioMan Is total_sample_length_in_seconds not used with the primer option? I was hoping to synthesize from the entire song, but I think it only uses prompt_length_in_seconds for the entire sampling of the generated song (that and the model).
Would it be possible to incorporate sample_length_in_seconds and total_sample_length_in_seconds?

Compared to running the example in the README, there's less influence from the primed input with Colab.

SMarioMan commented on August 24, 2024

@camjac251 I believe if you set prompt_length_in_seconds to the whole length of the song and make sample_length_in_seconds longer, you can extend the song.
My understanding is that total_sample_length_in_seconds is only used so the generator knows how far it is through the song. While it would normally help set the sample length, that length is always 1048576 samples in the script right now. It's really more important when co-composing or for other tasks where it's expected to generate only part of the level 2 samples.
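
For reference, the sample length is derived from seconds roughly like this (a sketch of the notebook's formula; raw_to_tokens = 128 for the 5b top level is my assumption):

sr = 44100
raw_to_tokens = 128                 # top-level hop size (assumed)
sample_length_in_seconds = 90

# Round down to a whole number of top-level tokens:
sample_length = (int(sample_length_in_seconds * sr) // raw_to_tokens) * raw_to_tokens
print(sample_length)                # 3968896 samples = 31007 tokens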

Flesco commented on August 24, 2024

@SMarioMan do checkpoints also work with Co-Composer upsampling? I've given your modified notebook a try with one of my zs-top-level-final.t files, but unfortunately, after timing out, I see no checkpoint files or anything.

johndpope commented on August 24, 2024

Check it - https://arstechnica.com/gaming/2020/05/when-audio-deepfakes-put-words-in-jay-zs-mouth-did-he-have-a-legal-case/?utm_brand=arstechnica&utm_source=twitter&utm_social-type=owned&utm_medium=social

Jay-Z audio deepfakes
https://youtu.be/iyemXtkB-xk

alexbanda08 commented on August 24, 2024

Has anyone managed to train on new data through the Colab notebook?

svntv commented on August 24, 2024

> (Quoting @kcrosley-leisurelabs's continuation guide in full -- see his comment above.)

Wow!! First of all, thanks for your extended and detailed guide. I'm really, really thankful for your dedication.
I understand now. I was doing it wrong, so I tried it the way you told me, with "the weeknd" as the artist and "electronic" as the genre, and it kind of worked. I mean, it turned into another song after a few bars, but it kept the kick and some harmonies. Also, some of the vocal melody lines were really good. Like new-pop-song material. Impressive.

> (Quoting @xandramax's comment on continuations in co-composition mode in full -- see above.)

Thanks. I think it can be a good composition tool for inspiration.

@kcrosley-leisurelabs @anlexmatos What would you recommend if I'm working mainly with instrumental tracks? Is there a parameter that can be set, or should I maybe pass empty lyrics or something?

Thanks again!

Beatfox commented on August 24, 2024

> BTW, now that I'm experimenting with the continue-from-checkpoint (upsample) mode, I see that the state is preserved -- e.g., if you continue upsampling a thing that crapped out during Level 0, you will pick up right where you left off rather than at the last completed level. This is pretty exciting.

Hey @kcrosley-leisurelabs, could you clarify this? Does the state preservation only apply if you're able to reconnect to your active session? From my own experience, if the session itself gets terminated, you have no choice but to start over from the data.pth.tar of the last completed level.

SMarioMan commented on August 24, 2024

@Beatfox That's my understanding as well. It resumes from the data.pth.tar. There's no reason you couldn't modify the code to checkpoint more frequently. This would let you have partial levels, but note that my current Colab document assumes the levels are complete if they exist.
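
A sketch of what that could look like (this is not in the notebook; the save path and frequency are arbitrary choices):

import torch as t

def save_partial(zs, level, step, hps):
    # Checkpoint the partially sampled codes every so many windows.
    t.save(dict(zs=zs, level=level, step=step),
           f'{hps.name}/level_{level}/partial_step_{step}.pth.tar')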

SirCommoner commented on August 24, 2024

(Disclaimer: I'm a real beginner/layman at this type of stuff.) So I was upsampling, and it got past level 1 and was actually a couple of hours into level 0. However, my internet went down before it could finish, and I ended up losing all progress. How can I upsample from the "co_composer\level_1" files I'd downloaded so I don't lose progress? Is it possible?


SirCommoner commented on August 24, 2024

But how exactly do I upload my level_1 file to be upsampled? Do I replace the level_1 folder and then run the upsampling cell? Do I have to generate a whole other thing and then replace the folder? Do I just replace some other file? Etc.
