dendisuhubdy

followers: 607 following: 3.5K repos: 2.9K gists: 212

Name: Dendi Suhubdy

Type: User

Company: @bitwyre

Bio: Running a high-frequency trading exchange platform for magic internet money @bitwyre

Twitter: dendisuhubdy

Location: Bali, Indonesia

Blog: http://bitwyredwnvphpmsvd4lc4456q6gwtzz7uzksjcpyuoo4w34x6xxqoid.onion/

Dendi Suhubdy's Projects

speech-separation-paper

A must-read paper for speech separation based on neural networks

speech-to-text-wavenet

Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow

speech-tranformer-pytorch

Seq2Seq Speech Recognition with Transformer on Mandarin Chinese

speech_feature_extraction

Feature extraction of speech signal is the initial stage of any speech recognition system.

speech_signal_processing_and_classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
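
As a rough illustration of the LPC and LPC-derived cepstral features mentioned above, here is a minimal pure-Python sketch. It is not the repository's code: the function names (`autocorrelation`, `levinson_durbin`, `lpc_to_cepstrum`) are hypothetical, and the choices of the autocorrelation method, the Levinson-Durbin recursion, and the predictor convention x[n] ≈ Σ a_k x[n-k] are assumptions made for the example. Windowing and pre-emphasis, which a real front end would normally apply to each frame, are left out.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of a single frame (list of floats)."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the all-pole normal equations for the LPCs a_1..a_order.

    Assumed convention: x[n] is predicted as sum_k a_k * x[n - k].
    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused padding so indices match the math
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


if __name__ == "__main__":
    # Synthetic "frame": a decaying exponential, which a one-pole model fits exactly.
    frame = [0.9 ** n for n in range(200)]
    lpcs, err = levinson_durbin(autocorrelation(frame, 2), 2)
    print(lpcs, lpc_to_cepstrum(lpcs, 3))
```

On this synthetic frame the order-2 predictor comes out very close to `[0.9, 0.0]`, and the cepstral coefficients follow 0.9^n / n, which is the series expansion of the log of a one-pole transfer function. MFCC and PLP features would need a filter bank and perceptual weighting on top of this and are not sketched here.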