
mongolian-speech-recognition's Introduction

An online demo trained with a Mongolian proprietary dataset (WER 8%): https://chimege.mn/.

In this repo, the following papers are implemented:

This repo is partially based on:

Training

  1. Install PyTorch>=1.3 with conda
  2. Install remaining dependencies: pip install -r requirements.txt
  3. Download the Mongolian Bible dataset: cd datasets && python dl_mbspeech.py
  4. Precompute the mel spectrograms: python preprop_dataset.py --dataset mbspeech
  5. Train: python train.py --model crnn --max-epochs 50 --dataset mbspeech --lr-warmup-steps 100
    • TensorBoard logs are saved in the folder logdir

Results

During training, the ground truth and the recognized texts are logged to TensorBoard. Because the dataset contains only a single speaker, the predicted texts from the validation set should already be recognizable after a few epochs:

EXPECTED:

аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун

PREDICTED:

аливаа цус хувцсан дээр үсэрхэд цус усарсан хэсхийг та нар ариун газарт угаагтун
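
The gap between the two transcripts above can be quantified as a word error rate (WER), the metric quoted for the online demo. The following is a minimal sketch, not code from this repo: the helper names edit_distance and word_error_rate are made up for illustration, and it assumes the usual definition of WER as the word-level edit distance divided by the number of reference words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(expected, predicted):
    """Word-level edit distance divided by the number of reference words."""
    ref = expected.split()
    hyp = predicted.split()
    if not ref:
        raise ValueError("the expected transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


expected = "аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун"
predicted = "аливаа цус хувцсан дээр үсэрхэд цус усарсан хэсхийг та нар ариун газарт угаагтун"
print(f"WER: {word_error_rate(expected, predicted):.2%}")
```

In this pair, three of the thirteen words differ, which gives a WER of 3/13, or about 23%. Passing the raw strings to edit_distance instead of word lists would give a character-level count in the same way.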

For fun, you can also generate audio with a Mongolian TTS and try to recognize it. The following commands generate audio with the TTS of the Mongolian National University and run speech recognition on the generated audio:

# generate audio for 'Миний төрсөн нутаг Монголын сайхан орон'
wget -O test.wav "http://172.104.34.197/nlp-web-demo/tts?voice=1&text=Миний төрсөн нутаг Монголын сайхан орон."
# speech recognition on that TTS generated audio
python transcribe.py --checkpoint=logdir/mbspeech_crnn_sgd_wd1e-05/epoch-0050.pth --model=crnn test.wav
# will output: 'миний төрсөн нут мөнголын сайхан оөрулн'

It is also possible to use a KenLM binary model. First download it from tugstugi/mongolian-nlp. After that, install parlance/ctcdecode. Now you can transcribe with the language model:

python transcribe.py --checkpoint=path/to/checkpoint --lm=mn_5gram.binary --alpha=0.3 test.wav
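
How the --alpha weight influences the result can be illustrated with a small rescoring sketch. It assumes that --alpha plays the role of the language-model weight used by CTC beam-search decoders such as parlance/ctcdecode, where a candidate's total score is roughly the acoustic log-probability plus alpha times the language-model log-probability, plus an optional word-count bonus weighted by beta. The candidate transcripts and all scores below are invented for illustration; this is not the repo's decoding code.

```python
def fused_score(acoustic_logp, lm_logp, num_words, alpha=0.3, beta=0.0):
    """Combine acoustic and language-model log-probabilities into one score."""
    return acoustic_logp + alpha * lm_logp + beta * num_words


def pick_best(candidates, alpha=0.3, beta=0.0):
    """Return the text of the candidate with the highest fused score."""
    best = max(
        candidates,
        key=lambda c: fused_score(
            c["acoustic_logp"], c["lm_logp"], len(c["text"].split()), alpha, beta
        ),
    )
    return best["text"]


# Invented scores: the acoustic model slightly prefers the truncated word,
# while the language model strongly prefers the complete one.
candidates = [
    {"text": "миний төрсөн нут", "acoustic_logp": -4.0, "lm_logp": -12.0},
    {"text": "миний төрсөн нутаг", "acoustic_logp": -4.5, "lm_logp": -6.0},
]

print(pick_best(candidates, alpha=0.0))  # language model ignored
print(pick_best(candidates, alpha=0.3))  # language model weighted in
```

With alpha set to 0 the acoustic score alone decides; raising alpha lets the language model override small acoustic preferences.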

Contribute

If you are Mongolian and want to help us, please record your voice on Common Voice.

mongolian-speech-recognition's People

Contributors

tugstugi


mongolian-speech-recognition's Issues

Installation issues

Hello,
I wanted to run your project and study it a little, but I got stuck at the installation.
1) Dockerfile --> is the workspace your own file? Or is the workspace a file that is supposed to be in the GitHub repo?
2) Apex also raises an error:
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    import apex
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex/__init__.py", line 18, in <module>
    from apex.interfaces import (ApexImplementation,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex/interfaces.py", line 10, in <module>
    class ApexImplementation(object):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex/interfaces.py", line 14, in ApexImplementation
    implements(IApex)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/zope/interface/declarations.py", line 706, in implements
    raise TypeError(_ADVICE_ERROR % 'implementer')
TypeError: Class advice impossible in Python3. Use the @Implementer class decorator instead.
Could you help me with the installation?
Thank you.

train a language model

The network already outputs recognizable text after 30 minutes, or 10 epochs:

expected:

аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун

predicted:

аааааааааааалллллллливвваааааааааааааааааа ууусссс ххххууввссаанн гэээрррррррррр үүсссэррррррххх ттуусссуррссрррссссаннн хххээссссггийгг ттаааааааааааааааааааа ннаарррррррр ааааааааааааааааааааарррииинннннн гггаааааааааааааарррррртт ууггааааааааааааааааааагтттүүнннррррааааа

To collapse the repeated characters and to choose the most likely word sequence, we need to train a language model using KenLM.
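
The collapsing step on its own does not need a language model. Below is a minimal sketch of greedy CTC decoding, assuming the network is trained with a CTC loss and that the per-frame output is available as a string in which 'B' marks the blank symbol (the convention mentioned in the vocabulary question in another issue). The function name and the sample frame string are hypothetical; this is not the repo's decoder.

```python
from itertools import groupby


def ctc_greedy_collapse(frames, blank="B"):
    """Merge runs of identical frame labels, then drop the blank symbol."""
    merged = [label for label, _ in groupby(frames)]
    return "".join(label for label in merged if label != blank)


# Invented per-frame output: repeated labels with blanks in between.
raw_frames = "BBццуууBссBB ххууBвв"
print(ctc_greedy_collapse(raw_frames))
```

A blank between two identical labels keeps them from being merged, which is how genuine double letters survive the collapse. Picking the most likely word sequence among competing spellings is the part that still benefits from a language model.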

Illegal Instruction

eval.py fails with an Illegal Instruction error.
Hardware: AMD A8-6410 APU with AMD Radeon R5 Graphics

Use bigger network

Predictions on the validation set already look good:

EXPECTED:

аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун

PREDICTED:

аливаа ус хусан ээр үсэрэхэ ус үсэрсан хэсгийг та нар ариун газарт угаагтун

Now, increase the model size and add some dropout to see whether the above mistakes can be fixed.

Question

Could you explain why logfbank from python_speech_features needs to be used? (For example, why was mfcc not used?)
Can I assume that the default values of winlen, winstep, preemph and so on are the best values? (I do not quite understand how to tune them.)

Freeze support issue

Hi Erdene-Ochir Tuguldur,

thanks for sharing your work.

I am trying to start training for the first time (Windows 10, Conda, single GPU), but I am getting this runtime error:

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

I can see that I need to add some kind of guard in train.py to avoid recursive subprocesses, but I couldn't find the exact place for it.

Please advise. Thank you in advance.
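
The idiom the error message refers to can be sketched as follows. This is a generic illustration using only the standard library, not the actual contents of train.py: the functions load_sample and main are hypothetical stand-ins for the data loading and training code. On Windows, worker processes are started with spawn, which re-imports the main module, so any code that launches workers (for example a PyTorch DataLoader with num_workers > 0) has to sit behind the guard.

```python
import multiprocessing as mp


def load_sample(index):
    """Hypothetical stand-in for per-sample work done in a worker process."""
    return index * index


def main():
    # Everything that starts worker processes lives inside main(),
    # so it does not run again when a child process re-imports this module.
    with mp.Pool(processes=2) as pool:
        results = pool.map(load_sample, range(4))
    print(results)


if __name__ == "__main__":
    # freeze_support() only matters for frozen executables and is harmless otherwise.
    mp.freeze_support()
    main()
```

In a training script, the equivalent change would be to move the top-level code that builds data loaders and runs the training loop into a function and call it from under the guard.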

What are the best loss results for this project?

Dear Erdene-Ochir Tuguldur,

You are doing a great job! I mean your TTS project as well.
The code is so plain and training is so quick, yet the solutions are powerful and effective.
May I know your best loss results for this project?
I recently developed a Kaldi ASR solution for my language,
so I would like to know whether it is possible to reach similar results with your speech recognition project.

Thank you in advance!

Noam scheduler

Hi, thank you for your great work. I wonder why you think the Noam scheduler works well for this case (Mongolian).

some questions

Hi tugstugi, thanks for sharing your work on speech recognition. I have some questions about the code:

  1. I noticed the vocab is "B абвгдеёжзийклмноөпрстуүфхцчшъыьэюя". Why is there a space after the character 'B'? I know 'B' stands for blank, but what does the ' ' stand for?
  2. The convolution operations in this network are all 1-D convolutions. How can the network learn temporal information this way?
    Looking forward to your reply, thanks a lot!
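
On the second question, one common reading is that a 1-D convolution slides along the time axis and treats the frequency bins as input channels, so each output frame already mixes several neighbouring frames, and stacking layers widens that window. Whether this matches the repo's network exactly is an assumption, and the layer configuration below is made up. The sketch computes the receptive field, in input frames, of a stack of 1-D convolutions.

```python
def receptive_field(layers):
    """Receptive field (in input frames) of stacked 1-D convolutions.

    layers: list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    field = 1  # with no layers, an output frame sees exactly one input frame
    jump = 1   # distance, in input frames, between neighbouring outputs
    for kernel_size, stride, dilation in layers:
        field += (kernel_size - 1) * dilation * jump
        jump *= stride
    return field


# Hypothetical stack: three layers with kernel size 11, the first with stride 2.
stack = [(11, 2, 1), (11, 1, 1), (11, 1, 1)]
print(receptive_field(stack))
```

The deeper the stack (or the larger the kernels, strides and dilations), the more time context each output frame covers, even without recurrent layers.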

broken links in dataset download script

The storage bucket used to pull the Mongolian Bible dataset no longer has the Mongolian version available for download.

If anyone still has a copy of the original .zip files, I would be eternally grateful.

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python3', 'dl_mbspeech.py']' command failed.  (See above for error)
downloading https://s3.us-east-2.amazonaws.com/bible.davarpartners.com/Mongolian/01_Genesis.zip...
extracting '01_Genesis.zip'...

2MB [00:00, 766.57MB/s]
Traceback (most recent call last):
  File "/Users/xd/Code/mongolian-speech-recognition/datasets/dl_mbspeech.py", line 37, in <module>
    zipfile = ZipFile(bible_book_file_path)
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
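
The traceback shows that the script downloads a file and then fails when opening it as a zip archive. A defensive check along the following lines would report the problem more clearly. This is a sketch using only the standard library, not a patch for dl_mbspeech.py; the function name and the message are hypothetical.

```python
import zipfile


def extract_if_valid(archive_path, target_dir):
    """Extract archive_path into target_dir; return False if it is not a zip file."""
    if not zipfile.is_zipfile(archive_path):
        print(f"'{archive_path}' is not a zip archive; "
              "the download link may no longer serve the expected file")
        return False
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return True
```

Calling such a helper right after each download would let the script stop with a readable message, or skip the book, instead of raising BadZipFile.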
