
Comments (13)

go-dustin commented on July 17, 2024

Thanks! To be honest, it was more about the sound design than the source material. I haven't gotten the raw waveform to the quality level I want yet.

I used a snapshot that was at 3k cycles. This was running on a Tesla K80 in Google Cloud. My times won't be comparable to yours since that depends on the size of the training data and the GPU (cores/memory). I used about 1.6 hours' worth of data. According to this page, the difference in speed between a K80 and a 1070 is substantial: 0.33 TFLOPS vs 1.87+ TFLOPS.

I tried doing some speech training as well. I don't think you'll get anything coherent out of this. I personally like the sounds I was getting and plan to use them; I just need to figure out how to increase the sample rate. The 16kHz rate isn't very useful to me since it doesn't have the crunchy sound you typically get from downsampling.


go-dustin commented on July 17, 2024

When I did my first test it didn't occur to me that the model was less than 1% trained. Still impressive results from a very early prototype. :D


haideraltahan commented on July 17, 2024

Nice Work!
How long did it take you to train, and on what hardware? @go-dustin
I am trying to train on 300 1-sec speech recordings but the process seems very slow with a GTX 1070.


chrisdonahue commented on July 17, 2024

Hey dude this is really awesome. Do you mind if I use this in future presentations as an example of humans employing generative models to assist in music production?

Have you tried training on 44.1kHz clips? You'll only be able to get ~1.48s clips using the maximum length (65536) but it might be more useful for you. If you know what length you would like to create I could help you modify the code to accomplish this.
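For reference, the ~1.48s figure is just the maximum slice length divided by the sample rate; a quick sketch of the arithmetic:

```python
# Clip duration = slice length (samples) / sample rate (Hz).
MAX_SLICE_LEN = 65536  # maximum --data_slice_len in the training script

for sr_hz in (16000, 32000, 44100):
    print(f"{sr_hz} Hz -> {MAX_SLICE_LEN / sr_hz:.3f} s per clip")

# 16000 Hz -> 4.096 s per clip
# 32000 Hz -> 2.048 s per clip
# 44100 Hz -> 1.486 s per clip
```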


libby-h commented on July 17, 2024

Hi @go-dustin and @chrisdonahue, this is super cool. To add to this, here's what I've been doing.

I've been playing around with this model at 44kHz, trained on a data set of 4-second snippets of Gregorian chants, 234 MB in total (2,413 items). I started training it almost two days ago on an RTX Titan.
Here are my samples right now: https://drive.google.com/open?id=11Ycww5u_L4cT4vfqHdWAZeT-ibT7wcva

Next I'm planning to alter the parameters of the model on a small amount of data and benchmark which model overfits best. Then I'll go back to training/testing/validating on a large data set of many different chants from across the world (making sure it doesn't overfit).


chrisdonahue commented on July 17, 2024

Yeah!! These rock, @libby-h. Always love hearing audio samples.

Can you tell me what your data looks like (e.g. number of files, length of each)? I can maybe suggest some different data loader configurations that might work better. Configuring these parameters tends to be a little opaque (my bad).

How do you plan on measuring which model(s) are overfitting? I don't know offhand of an easy way to measure this with a GAN. Let me know if you have ideas there.


libby-h commented on July 17, 2024

Hey @chrisdonahue great to hear from you!

The data set I'm working with at the moment is this one: https://drive.google.com/open?id=124YJhlZoQQfO959J0dWnK9vOX8zP6y-x (1,413 items, totalling 137.0 MB, 4 secs each).
I'm currently working on building the final, larger data set of many different types of chant, which will be considerably bigger. If I can gain an intuition now by training and testing on this linked data set, then hopefully I'll understand how to move to the larger one. Any tips you can give for this data set, or general info on the data loader configs, would be very welcome! (I can share my progress in return if that's interesting.) I'd love to pick up some of the voices from the linked data set, but I haven't been able to yet!

By the way, did you ever modify the code for go-dustin to create longer clips at 44kHz? I'd love to get hold of that if it's possible (hopefully my GPU will handle a bigger model).

For overfitting, I had two ideas. Bear in mind, though, that I'm an artist working with AI and not an AI dev, so maybe these won't work. The first idea was to train the model for a long enough time on a small-ish data set of two sounds to force the GAN to replicate only one of the sounds from the data set; then I'd assume it had overfit. Obviously 'long enough time' would need to be discovered.

The other idea was to train on a small-ish data set of similar sounds (my 'train' data), and then continue training on another data set of the same size containing similar but not identical sounds (my 'test' data). I'd then watch the convergence params to see whether they start going back up considerably (or to a lesser extent) when training on the 'test' data. I'd re-run this a few times to get a sense of how the model responds at different training durations. The models where the convergence params went up the most I'd assume to be the most overfit.
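A rough sketch of how that second idea could be turned into a single number (my own illustration, not anything built into wavegan): track the gap between the discriminator's scores on the clips it was trained on and on held-out clips, and watch whether it keeps widening.

```python
import numpy as np

def discriminator_gap(scores_train_reals, scores_heldout_reals):
    """Crude overfitting signal: mean critic score on training reals minus
    mean score on held-out reals. A gap that keeps growing over training
    suggests the model is memorising the training clips.

    Both inputs are arrays of raw discriminator outputs for real audio clips;
    how you collect them depends on your training loop.
    """
    return float(np.mean(scores_train_reals) - np.mean(scores_heldout_reals))
```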

In any case, any tips for working with this model would be super appreciated! In the end I'd love to be able to navigate the latent space live and have strange new sounds morph from one type to another as I go. Dream scenario!

Thanks!


chrisdonahue commented on July 17, 2024

Hi @libby-h

The linked dataset will work fine with the default parameters of the training script. You might consider converting all of them to WAVs and using --data_fast_wav, but I'm not sure how much of a speed difference this will make.

I did not get around to modifying the code to handle longer clips. One thing you can try is lowering the sample rate a bit and using the max length of --data_slice_len 65536. With a sample rate of ~32k that will at least get you two seconds; probably the GAN training will result in more distortion anyway so the downsampling might not hurt too much.
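If it helps, here's a minimal resampling sketch, assuming librosa and soundfile are installed (neither is part of the wavegan repo), for converting a folder of clips to 32 kHz WAVs before training; the folder names below are just placeholders:

```python
import pathlib

import librosa        # assumed available for loading/resampling
import soundfile as sf

SRC = pathlib.Path("gregorian_chant_only")   # hypothetical input folder
DST = pathlib.Path("gregorian_chant_32k")    # output folder of 32 kHz WAVs
DST.mkdir(exist_ok=True)

for path in sorted(SRC.iterdir()):
    # Load each clip as mono and resample it to 32 kHz.
    audio, _ = librosa.load(str(path), sr=32000, mono=True)
    sf.write(str(DST / (path.stem + ".wav")), audio, 32000)
```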

Ah I see what you're saying. I haven't tried the procedure you're suggesting. I imagine that training a GAN with the most training data possible will usually produce the best results overall rather than initially overfitting to a smaller subset.

Morphing through the latent space can be a bit tricky since the clips begin and end abruptly. One thing you could do is identify looping portions (e.g., sustained moments) of the training data and train on that. Then, the GAN should also learn to produce looping segments, and you can hopefully smoothly fade between latent vectors.
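A minimal numpy sketch of the latent morphing, assuming you can already run the trained generator on a batch of latent vectors (the latent size is 100 in the default config, if I remember right):

```python
import numpy as np

def interpolate_latents(z_start, z_end, steps=8):
    """Linearly interpolate between two latent vectors.
    Returns an array of shape (steps, latent_dim) whose rows can be fed
    through the trained generator one at a time."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * z_start[None, :] + alphas * z_end[None, :]

# Hypothetical usage: pick two latent vectors, generate audio for each row,
# then crossfade between the resulting clips.
zs = interpolate_latents(np.random.randn(100), np.random.randn(100), steps=16)
```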

Another thing to do is to turn the clips from the GAN into grains with envelopes and overlap them. I haven't tried this but for the chant music you're working with it could be a cool effect.
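And a minimal overlap-add sketch of the grains-with-envelopes idea (again my own illustration, not anything from the repo): slice generated clips into short windows, apply a Hann envelope, and sum them at overlapping offsets:

```python
import numpy as np

def granulate(clips, grain_len=8192, hop=2048, sr=16000, out_secs=10.0, seed=0):
    """Overlap-add random grains taken from a list of 1-D audio arrays.
    Each grain gets a Hann envelope so the overlaps crossfade smoothly.
    Assumes every clip is longer than grain_len samples."""
    rng = np.random.default_rng(seed)
    out = np.zeros(int(out_secs * sr))
    env = np.hanning(grain_len)
    for pos in range(0, len(out) - grain_len, hop):
        clip = clips[rng.integers(len(clips))]        # pick a random clip
        start = rng.integers(0, len(clip) - grain_len)  # random grain offset
        out[pos:pos + grain_len] += clip[start:start + grain_len] * env
    return out / np.max(np.abs(out))  # normalise to avoid clipping
```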

Good luck! Looking forward to hearing more results :)


libby-h commented on July 17, 2024

@chrisdonahue Thanks so much for this.

I just realised the clips I sent, generated by wavegan, were still actually 16k. I must have done something wrong when changing the number of samples; I'll try again with 32k as you say.

Thanks for the tips with sustained moments and grains+envelopes. Lots to play with :)


libby-h commented on July 17, 2024

@chrisdonahue Hmm, I just set the model off again with 'python train_wavegan.py train ./train_44k_test --data_dir ./gregorian_chant_only --data_first_slice --data_slice_len 65536 --data_sample_rate 44100' on the data set I sent previously, assuming that it would create outputs at 44kHz. But the preview output is generating samples at 16kHz at 256kbps. Where am I going wrong with the parser arguments?


libby-h commented on July 17, 2024

@chrisdonahue sorted it. Also read your paper which was super useful in general. Will keep you updated with how it goes.


libby-h commented on July 17, 2024

Hi @chrisdonahue, sharing more results: https://drive.google.com/open?id=1MFQEvyPTjLRgMzrmzsUYyrEPCKjxwXFs 32k sample rate, after around 12,800 iterations, using the default loaders. Data set of 10,795 items, totalling 1.3 GB (all 4-second clips of Gregorian chanting).

It's really nice how I can hear different voices coming through in the generated clips now. Very haunting. I'm pleased so far. I read in your paper that on 5.3 hours of data (numbers 0-9) you trained for 2000k iterations, so I'll keep mine going for longer too and see what comes out next week. It's a larger data set and a smaller GPU, so it's taking longer.


cinningbao commented on July 17, 2024

@mattjwarren and I have been training an engine on several thousand drum machine sounds, with a view to building an interface for the engine which 'makes sense' and provides a few ways to traverse the data, effectively generating audio 'morphs' from the data. The interface could also be used on GAN engines to morph pictures. Still in the early stages, and the training might go through a few iterations to improve the quality.

A few drum morphs and a beat constructed from a few other morphs are here:
https://drive.google.com/drive/folders/1ETL7FZe-desY2ugQ9MSdNu7cj8Id2HdI?usp=sharing

