auzxb's Projects
Code release for ActionFormer (ECCV 2022)
A complete training recipe for Kaldi-based Automatic Lyrics Transcription.
Audio captioning recipe
Code and slides for my YouTube series "Audio Signal Processing for Machine Learning"
Collection of audio-focused loss functions in PyTorch
A collection of resources and papers on Diffusion Models
BERT for Chinese Couplet (automatic couplet matching)
Unofficial PyTorch implementation of BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Official PyTorch implementation of BigVGAN (ICLR 2023)
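E-books
Industry talks and introductions covering the applications, architectures, and algorithms of intelligent customer service and chatbots
The most comprehensive database of classical Chinese poetry: nearly 14,000 poets of the Tang and Song dynasties, close to 55,000 Tang poems and 260,000 Song poems, plus 21,050 ci by 1,564 ci poets of the Northern and Southern Song.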
Contrastive Language-Audio Pretraining
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
PyTorch Implementation of Deep Q-Learning with Experience Replay in Atari Game Environments, as made public by Google DeepMind
DeepChorus, an end-to-end chorus detection model.
Code and slides for the "Deep Learning (For Audio) With Python" course on TheSoundOfAI YouTube channel.
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, both stationary and non-stationary, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
An implementation of "Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training"
This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
Official Implementation of EnCLAP
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention
A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
GLUE dataset download script