Audio AI Timeline

Here we keep track of the latest AI models for waveform-based audio generation, starting in 2023!

2023

| Date | Release [Samples] | Paper | Code | Trained Model |
| --- | --- | --- | --- | --- |
| 14.11 | Mustango: Toward Controllable Text-to-Music Generation | arXiv | GitHub | Hugging Face |
| 13.11 | Music ControlNet: Multiple Time-varying Controls for Music Generation | arXiv | - | - |
| 02.11 | E3 TTS: Easy End-to-End Diffusion-based Text to Speech | arXiv | - | - |
| 01.10 | UniAudio: An Audio Foundation Model Toward Universal Audio Generation | arXiv | GitHub | - |
| 24.09 | VoiceLDM: Text-to-Speech with Environmental Context | arXiv | GitHub | - |
| 05.09 | PromptTTS 2: Describing and Generating Voices with Text Prompt | arXiv | - | - |
| 14.08 | SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | arXiv | - | - |
| 10.08 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | arXiv | GitHub | Hugging Face |
| 09.08 | JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models | arXiv | - | - |
| 03.08 | MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies | arXiv | GitHub | - |
| 14.07 | Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts | arXiv | - | - |
| 10.07 | VampNet: Music Generation via Masked Acoustic Token Modeling | arXiv | GitHub | - |
| 22.06 | AudioPaLM: A Large Language Model That Can Speak and Listen | arXiv | - | - |
| 19.06 | Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | PDF | GitHub | - |
| 08.06 | MusicGen: Simple and Controllable Music Generation | arXiv | GitHub | Hugging Face, Colab |
| 06.06 | Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias | arXiv | - | - |
| 01.06 | Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis | arXiv | GitHub | - |
| 29.05 | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | arXiv | - | - |
| 25.05 | MeLoDy: Efficient Neural Music Generation | arXiv | - | - |
| 18.05 | CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training | arXiv | - | - |
| 18.05 | SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | arXiv | GitHub | - |
| 16.05 | SoundStorm: Efficient Parallel Audio Generation | arXiv | GitHub (unofficial) | - |
| 03.05 | Diverse and Vivid Sound Generation from Text Descriptions | arXiv | - | - |
| 02.05 | Long-Term Rhythmic Video Soundtracker | arXiv | GitHub | - |
| 24.04 | TANGO: Text-to-Audio generation using instruction tuned LLM and Latent Diffusion Model | PDF | GitHub | Hugging Face |
| 18.04 | NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | arXiv | GitHub (unofficial) | - |
| 10.04 | Bark: Text-Prompted Generative Audio Model | - | GitHub | Hugging Face, Colab |
| 03.04 | AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models | arXiv | - | - |
| 08.03 | VALL-E X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | arXiv | - | - |
| 27.02 | I Hear Your True Colors: Image Guided Audio Generation | arXiv | GitHub | - |
| 08.02 | Noise2Music: Text-conditioned Music Generation with Diffusion Models | arXiv | - | - |
| 04.02 | Multi-Source Diffusion Models for Simultaneous Music Generation and Separation | arXiv | GitHub | - |
| 30.01 | SingSong: Generating musical accompaniments from singing | arXiv | - | - |
| 30.01 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | arXiv | GitHub | Hugging Face |
| 30.01 | Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | arXiv | GitHub | - |
| 29.01 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | PDF | - | - |
| 28.01 | Noise2Music | - | - | - |
| 27.01 | RAVE2 [Samples RAVE1] | arXiv | GitHub | - |
| 26.01 | MusicLM: Generating Music From Text | arXiv | GitHub (unofficial) | - |
| 18.01 | Msanii: High Fidelity Music Synthesis on a Shoestring Budget | arXiv | GitHub | Hugging Face, Colab |
| 16.01 | ArchiSound: Audio Generation with Diffusion | arXiv | GitHub | - |
| 05.01 | VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | arXiv | GitHub (unofficial) (demo) | - |

Contributors

flavioschneider, haoheliu, justinyuu, lifeiteng, yuan-manx
