Final Project : I Can Read Your Voice

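People with hearing loss may find it hard to follow speech during phone calls. To help them, we developed a real-time streaming Speech-to-Text service that transcribes the call as it happens and displays the text, helping the user understand the conversation.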

TEAM : 조지KLUE니

Members

김보성 김지후 김혜수 박이삭 이다곤 전미원 정두해

Contribution

김보성 Model Optimization • gRPC Communication

김지후 ASR Model Performance Comparison • Frontend

김혜수 Dataset Processing • Reference Paper Searching

박이삭 Automatic Speech Recognition Modeling (Data I/O) • Socket Communication

전미원 Socket Communication • Audio Model Structure Search

정두해 Automatic Punctuation Language Modeling • Dataset Processing

Project Flow


Main Tasks - Audio Modeling Part

Modeling Reference: https://github.com/hchung12/espnet-asr

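To turn an accurate model that was not designed for streaming into one that can process a live audio stream, we removed the intermediate audio-file conversion step and reworked the data I/O with the techniques below.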
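Skipping the audio-file conversion means the captured audio bytes are decoded in memory instead of being written to a WAV file and read back. The sketch below shows one way to do that, assuming the microphone delivers 16-bit little-endian mono PCM (for example PyAudio's `paInt16` format); the function name `pcm16_to_float` is hypothetical and not taken from this project.

```python
import sys
from array import array
from typing import List


def pcm16_to_float(raw: bytes) -> List[float]:
    """Decode 16-bit little-endian mono PCM bytes into floats in [-1.0, 1.0)."""
    if len(raw) % 2:
        raise ValueError("16-bit PCM needs an even number of bytes")
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the input is little-endian; convert to native order
    return [s / 32768.0 for s in samples]  # scale the int16 range to floats
```

In a streaming client the input would typically be the bytes returned by a PyAudio `stream.read(...)` call; the resulting samples would usually be turned into a NumPy array before being passed to the ASR model.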

Definition of "Frame" in conversation


Implementation 1: Silence Threshold

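One common way to apply a silence threshold is to compare each chunk's RMS energy against a fixed level. This is a sketch, not the project's code: it assumes chunks of float samples in [-1.0, 1.0], and `SILENCE_THRESHOLD` with its value is hypothetical and would need tuning for the actual microphone and environment.

```python
import math
from typing import Sequence

SILENCE_THRESHOLD = 0.01  # hypothetical RMS level; tune per microphone


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square energy of one chunk of float samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))


def is_silent(chunk: Sequence[float], threshold: float = SILENCE_THRESHOLD) -> bool:
    """Treat a chunk as silence when its RMS energy falls below the threshold."""
    return rms(chunk) < threshold
```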

Implementation 2: Silence Length

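A silence-length rule can close a frame only after silence has lasted for several consecutive chunks, so short pauses between words do not split an utterance. The sketch below illustrates that idea under stated assumptions: `cut_on_silence`, the `is_silent` predicate, and the default of three chunks are illustrative, not values from this project.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

Chunk = TypeVar("Chunk")


def cut_on_silence(
    chunks: Iterable[Chunk],
    is_silent: Callable[[Chunk], bool],
    max_silent: int = 3,  # hypothetical: consecutive silent chunks that end a frame
) -> Iterator[List[Chunk]]:
    """Group chunks into frames, closing a frame after `max_silent` silent chunks in a row."""
    frame: List[Chunk] = []
    silent_run = 0
    for chunk in chunks:
        frame.append(chunk)
        silent_run = silent_run + 1 if is_silent(chunk) else 0
        if silent_run >= max_silent:
            yield frame  # the pause was long enough: emit the frame
            frame, silent_run = [], 0
    if frame:
        yield frame  # flush whatever remains when the stream ends
```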

Implementation 3: Long Silence Ignore

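Ignoring long silences can be sketched as a filter that keeps only the first few chunks of any silent run and discards the rest, so long pauses neither grow the buffer nor reach the model. This is one possible interpretation; the function name `skip_long_silence` and the `keep` parameter are hypothetical.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Chunk = TypeVar("Chunk")


def skip_long_silence(
    chunks: Iterable[Chunk],
    is_silent: Callable[[Chunk], bool],
    keep: int = 2,  # hypothetical: silent chunks kept at the start of each silent run
) -> Iterator[Chunk]:
    """Yield chunks, but discard silent chunks once a silent run exceeds `keep`."""
    run = 0
    for chunk in chunks:
        if is_silent(chunk):
            run += 1
            if run > keep:
                continue  # part of a long silence: ignore it
        else:
            run = 0  # speech resets the silent run
        yield chunk
```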

Implementation 4: Frame-Cut with Overlap

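Cutting frames with overlap bounds the latency of long utterances: once a frame reaches a maximum length it is emitted, and its last few chunks are carried into the next frame so a word straddling the cut appears in both. The sketch below assumes lengths are counted in chunks; `frames_with_overlap`, `max_len`, and `overlap` are illustrative names, not the project's.

```python
from typing import Iterable, Iterator, List, TypeVar

Chunk = TypeVar("Chunk")


def frames_with_overlap(
    chunks: Iterable[Chunk], max_len: int = 5, overlap: int = 1
) -> Iterator[List[Chunk]]:
    """Emit frames of at most `max_len` chunks, repeating `overlap` chunks across each cut."""
    if not 0 <= overlap < max_len:
        raise ValueError("need 0 <= overlap < max_len")
    frame: List[Chunk] = []
    fresh = 0  # chunks in `frame` that no earlier frame has emitted yet
    for chunk in chunks:
        frame.append(chunk)
        fresh += 1
        if len(frame) == max_len:
            yield frame
            frame = frame[len(frame) - overlap:]  # carry the last `overlap` chunks over
            fresh = 0
    if fresh:
        yield frame  # final partial frame, only if it holds new audio
```

With `max_len=3` and `overlap=1`, chunks 0 to 7 yield the frames `[0, 1, 2]`, `[2, 3, 4]`, `[4, 5, 6]`, and `[6, 7]`.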

Main Tasks - Language Modeling Part

Modeling Reference: https://github.com/xashru/punctuation-restoration

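We found that the text produced by the audio model contains no punctuation marks such as periods (.), commas (,), or question marks (?). To address this, we developed a language model that automatically inserts punctuation marks when given such raw text as input.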
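Punctuation restoration of this kind is commonly framed as token classification: the model tags each word with the punctuation mark that should follow it, and the text is rebuilt from those tags. The sketch shows only the rebuilding step, assuming a label set of `O`, `COMMA`, `PERIOD`, and `QUESTION` and taking the model's predicted tags as given; `restore_punctuation` and `PUNCT_FOR_LABEL` are illustrative names.

```python
from typing import Sequence

# Assumed label set: each word is tagged with the punctuation that follows it.
PUNCT_FOR_LABEL = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def restore_punctuation(words: Sequence[str], labels: Sequence[str]) -> str:
    """Append the punctuation mark implied by each word's label and rejoin the words."""
    if len(words) != len(labels):
        raise ValueError("one label per word is required")
    return " ".join(w + PUNCT_FOR_LABEL[label] for w, label in zip(words, labels))
```

For example, `restore_punctuation(["안녕하세요", "잘", "지내세요"], ["PERIOD", "O", "QUESTION"])` returns `"안녕하세요. 잘 지내세요?"`.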

LM Architecture


Demonstration


Demo Video

Installation

macOS:

brew install portaudio
pip install -r requirements.txt

Linux (Debian/Ubuntu):

sudo apt-get install portaudio19-dev
pip install -r requirements.txt

How to Run

Start the client

python client.py

Start the server

python server.py
