Sho Inoue's Projects
2D marker detection using convolutional and pooling layers.
Instruct-tune LLaMA on consumer hardware
This repository introduces the application of Activation Maximization to audio-domain data.
This repository includes the audio demo of ARDiT.
*BeaqleJS* provides a framework for creating browser-based listening tests and is built purely on open web standards such as HTML5 and JavaScript.
The official PyTorch implementation of "Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective" (CVPR'22).
Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Lessons provided in Gonsalves Laboratory
It contains the lessons I created for the Gonsalves AI Laboratory.
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
This repository introduces our research, LCGAN.
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Implementation of non-parallel sequence-to-sequence voice conversion (VC)
Rich Prosody Diversity Modelling with Phone-level Mixture Density Network
Open-source keyboard firmware for Atmel AVR and Arm USB families
Source code featured in the blog posts at https://shinshoji01.hatenablog.com/
This is the implementation of our Interspeech 2021 paper: Limited data emotional voice conversion leveraging text-to-speech: two-stage sequence-to-sequence training.
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities.