ai-text-to-audio-latent-diffusion's Introduction

README

About

This repo implements text-to-audio diffusion using a denoising U-Net and Meta's Encodec. The U-Net is trained to denoise Encodec's encoded codebooks, with T5 text embeddings as conditioning. Encodec's decoder then takes the denoised codebooks and decodes them into an uncompressed .wav file.
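To make the training idea concrete, here is a minimal, dependency-free sketch of a single noise-prediction step on a latent vector. It is not the code in train_latent_cond.py: the linear beta schedule, the noise-prediction (MSE) objective, and every name below (`linear_beta_schedule`, `training_step`, `zero_denoiser`, and so on) are assumptions made for illustration. Plain Python lists stand in for the Encodec latents and the T5 embeddings, and a dummy function stands in for the U-Net.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (assumed DDPM-style linear schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent, noise, alpha_bar_t):
    """Forward process: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [signal_scale * x + noise_scale * e for x, e in zip(latent, noise)]


def mse(prediction, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def training_step(denoiser, latent, text_embedding, alpha_bars, rng):
    """Noise the latent at a random timestep and score the predicted noise."""
    t = rng.randrange(len(alpha_bars))
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy_latent = add_noise(latent, noise, alpha_bars[t])
    predicted_noise = denoiser(noisy_latent, t, text_embedding)
    return mse(predicted_noise, noise)


def zero_denoiser(noisy_latent, t, text_embedding):
    """Hypothetical stand-in for the conditioned U-Net: always predicts zero noise."""
    return [0.0] * len(noisy_latent)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
latent = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in for an Encodec latent
text_embedding = [0.0] * 8                            # stand-in for T5 embeddings
loss = training_step(zero_denoiser, latent, text_embedding, alpha_bars, rng)
```

In a real setup the denoiser would be the text-conditioned U-Net, the loss would be backpropagated, and at inference time the denoised latents would be passed to Encodec's decoder to produce audio.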

The architecture is by no means perfect; it is being actively tested and worked on. If you have any suggestions for improvements to try, please don't hesitate to let us know!

Instructions

  • Clone the repo
  • Set up your environment
  • Launch train_latent_cond.py with accelerate (see example_launch_command.txt in the root directory for an example)
  • See training_args.md in the root directory for explanations of the arguments
  • Inference scripts, notebooks, and trained models are coming soon

Shout Outs

  • Thanks to Hugging Face for diffusers/transformers and for being a huge contributor to the open source community
  • Thanks to HarmonAI for their audio diffusion research and contributions to the open source community
  • Thanks to Stable Diffusion and OpenAI for the U-Net/cross-attention base code and for their open source contributions
  • Thanks to Meta for open sourcing Encodec and for all of their other open source contributions
  • Thanks to Google for open sourcing the T5 large language model
  • Shoutout to EveryDream for the Windows venv setup and bnb patch

ai-text-to-audio-latent-diffusion's People

Contributors

cosmicbboy, devinschumacher, francislabountyjr, johnpaulbin, morganmcg1, twobob, zqevans
