Dendi Suhubdy's Projects
A must-read paper for speech separation based on neural networks
Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
Seq2Seq Speech Recognition with Transformer on Mandarin Chinese
Feature extraction from the speech signal is the initial stage of any speech recognition system.
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
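As a rough illustration of the front end described above, the following minimal sketch splits a signal into frames and estimates LPCs per frame with the autocorrelation method and the Levinson-Durbin recursion. It is not taken from the project: the function names `frame_signal` and `lpc`, the 25 ms frame length, the 10 ms hop, the Hamming window, and the prediction order of 12 are all assumptions chosen for illustration, and only NumPy is used.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def lpc(frame, order):
    """Estimate LPCs of one frame (autocorrelation method, Levinson-Durbin).

    Returns (a, err), where a[0] == 1 and A(z) = 1 + sum_j a[j] z^-j is the
    prediction-error filter, and err is the final prediction error energy.
    """
    w = frame * np.hamming(len(frame))  # assumed window choice
    # Autocorrelation at lags 0..order (zero lag sits at index len(w) - 1).
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate in Hz
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(fs)  # one second of noise as a stand-in for speech
    frames = frame_signal(signal, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
    coeffs = np.array([lpc(f, order=12)[0] for f in frames])
    print(frames.shape, coeffs.shape)
```

Each row of `coeffs` is one LPC feature vector. Cepstral coefficients, MFCCs, or PLPs would need further steps that this sketch does not cover.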
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
Unsupervised Speech Decomposition Via Triple Information Bottleneck
Official implementation of SpeechSplit2
Recover Monero address using the private spend key
Solana program that makes Token Faucets possible
Interface for creating and managing SPL Tokens
Deezer source separation library including pretrained models.
Unopinionated utilities for resizeable split views
My own tools for easing the task of pentesting / exploit writing
Evaluating Spotify's Discover Weekly feature using machine learning
This userbot updates the biography of a Telegram user according to their current Spotify playback.
A concurrent SPSC ring buffer with sized reservations
A highly optimized single producer single consumer message queue C++ template
A bounded single-producer single-consumer wait-free and lock-free queue written in C++11
A modified version of Speech Signal Processing Toolkit (SPTK)
The C++14 wrapper around the SQLite library
❤️ SQLite ORM light header only library for modern C++
Extensible SQL Lexer and Parser for Rust
PyTorch implementation of Image Super-Resolution Using Deep Convolutional Networks (ECCV 2014)
PyTorch Implementation of "Lossless Image Compression through Super-Resolution"
Open source SDR LTE software suite from Software Radio Systems (SRS)
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
Saving SSH keys in macOS Sierra keychain