thanhkm
Name: Thanh KM
Type: User
Bio: Research interests: NLP, Generative AI, Text to Speech, Speech recognition
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Official PyTorch implementation of BigVGAN (ICLR 2023)
Contrastive Language-Audio Pretraining
Jointly training a speaker encoder with a consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
TensorFlow 2-based implementation of ContextNet, an improved convolutional RNN-transducer-based architecture for end-to-end speech recognition using global context
This is the implementation for "ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm"
PyTorch CTC Decoder bindings
Cracking the Coding Interview 6th Ed. Solutions
Official PyTorch implementation of "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
A PaddlePaddle implementation of DeepSpeech2 architecture for ASR.
Speech Recognition using DeepSpeech2.
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Official PyTorch implementation of "Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation"
Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units". https://arxiv.org/abs/2212.09730
This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with PyTorch, the well-known deep learning toolkit.
EPI Judge - Preview Release
End-to-End Speech Processing Toolkit
Unofficial project based on the original /r/Deepfakes thread. Many thanks to him!
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
The official implementation of HierSpeech++
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
This is the official location of the Kaldi project.
Dataset for lightly supervised training using the LibriVox audiobook recordings. https://librivox.org/
TensorFlow implementation of "Listen, Attend and Spell" authored by William Chan. This project uses the input pipeline and Estimator API of TensorFlow, which makes training and evaluation truly end-to-end.
Singing voice conversion based on Whisper, with LoRA for singing voice cloning
Models and examples built with TensorFlow