Code Monkey home page Code Monkey logo

speech-separation-paper's Introduction

A must-read paper list for speech separation based on neural networks

This repository contains papers for pure speech separation and multimodal speech separation.

Pure Speech Separation

✔️ [Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation, Po-Sen Huang, TASLP 2015] [Paper] [Code (posenhuang)]

✔️ [Complex Ratio Masking for Monaural Speech Separation, DS Williamson, TASLP 2015] [Paper]

✔️ [Deep clustering: Discriminative embeddings for segmentation and separation, JR Hershey, ICASSP 2016] [Paper] [Code (Kai Li)] [Code (funcwj)]

✔️ [Single-channel multi-speaker separation using deep clustering, Y Isik, Interspeech 2016] [Paper] [Code (Kai Li)] [Code (funcwj)]

✔️ [Permutation invariant training of deep models for speaker-independent multi-talker speech separation, Dong Yu, ICASSP 2017] [Paper] [Code (Kai Li)]

✔️ [Recognizing Multi-talker Speech with Permutation Invariant Training, Dong Yu, ICASSP 2017] [Paper]

✔️ [Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, M Kolbæk, TASLP 2017] [Paper] [Code (Kai Li)]

✔️ [Deep attractor network for single-microphone speaker separation, Zhuo Chen, ICASSP 2017] [Paper] [Code (Kai Li)]

✔️ [Alternative Objective Functions for Deep Clustering, Zhong-Qiu Wang, ICASSP 2018] [Paper]

✔️ [End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction Zhong-Qiu Wang et al. 2018] [Paper]

✔️ [Speaker-independent Speech Separation with Deep Attractor Network, Luo Yi, TASLP 2018] [Paper] [Code (Kai Li)]

✔️ [Tasnet: time-domain audio separation network for real-time, single-channel speech separation, Luo Yi, ICASSP 2018] [Paper] [Code (Kai Li)]

✔️ [Supervised Speech Separation Based on Deep Learning An Overview,DeLiang Wang, Arxiv 2018] [Paper]

✔️ [Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation, Luo Yi, TASLP 2019] [Paper] [Code (Kai Li)]

✔️ [Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation, Yuzhou Liu, TASLP 2019] [Paper] [Code]

✔️ [Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, Luo Yi, Arxiv 2019] [Paper] [Code (Kai Li)]

✔️ [End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation, Luo Yi, Arxiv 2019] [Paper] [Code]

✔️ [A comprehensive study of speech separation: spectrogram vs waveform separation, Fahimeh Bahmaninezhad, Interspeech 2019] [Paper]

✔️ [FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks, Liwen Zhang, MMM 2020] [Paper]

✔️ [Filterbank design for end-to-end speech separation, Manuel Pariente et al., ICASSP 2020] [Paper]

✔️ [Voice Separation with an Unknown Number of Multiple Speakers, Eliya Nachmani, Arxiv 2020] [Paper] [Demo]

✔️ [AN EMPIRICAL STUDY OF CONV-TASNET, Berkan Kadıoglu , Arxiv 2020] [Paper]

✔️ [Voice Separation with an Unknown Number of Multiple Speakers, Eliya Nachmani, Arxiv 2020] [Paper]

✔️ [Wavesplit: End-to-End Speech Separation by Speaker Clustering, Neil Zeghidour et al. Arxiv 2020 ] [Paper]

✔️ [La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention, Ziqiang Shi, Arxiv 2020] [Paper]


Multi-Model Speech Separation

✔️ [Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks, Jen-Cheng Hou, TETCI 2017] [Paper] [Code]

✔️ [The Conversation: Deep Audio-Visual Speech Enhancement, Triantafyllos Afouras, Interspeech 2018] [Paper]

✔️ [End-to-end audiovisual speech recognition, Stavros Petridis, ICASSP 2018] [Paper] [Code]

✔️ [The Sound of Pixels, Hang Zhao, ECCV 2018] [Paper] [Code]

✔️ [Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation, ARIEL EPHRAT, ACM Transactions on Graphics 2018] [Paper] [Code]

✔️ [Learning to Separate Object Sounds by Watching Unlabeled Video, Ruohan Gao, ECCV 2018] [Paper]

✔️ [Time domain audio visual speech separation, Jian Wu, Arxiv 2019] [Paper]

✔️ [Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network, Ke Tan, Arxiv 2019] [Paper]

✔️ [Co-Separating Sounds of Visual Objects, Ruohan Gao, ICCV 2019] [Paper] [Code]


Evaluation index

✔️ [Performance measurement in blind audio sourceseparation, Emmanuel Vincent et al., TASLP 2004] [Paper]

✔️ [SDR – Half-baked or Well Done?, Jonathan Le Roux, ICASSP 2019] [Paper]

I may not be able to get all the articles completely. So if you have an excellent essay or tutorial, you can update it in my format. At the same time, if you think the repository meets your needs, please give a star or fork, thank you.

speech-separation-paper's People

Contributors

jusperlee avatar mpariente avatar wzx1533 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.