mongolian-speech-recognition's Issues

some questions

Hi tugstugi, thanks for sharing your work on speech recognition. I have some questions about the code:

  1. I noticed the vocab is "B абвгдеёжзийклмноөпрстуүфхцчшъыьэюя". Why is there a space after the character 'B'? I know 'B' stands for blank, but what does the ' ' stand for?
  2. The convolutions in this network are all 1-D. How can the network learn temporal information this way?

Looking forward to your reply, thanks a lot!
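
On the second question, a toy sketch may help. A 1-D convolution that slides along the time axis combines neighbouring frames, so each output frame depends on a window of input frames, and stacking layers widens that window. The function `conv1d_time` and the all-ones kernel below are made up for illustration and are not taken from the repository.

```python
def conv1d_time(frames, kernel):
    """Zero-padded ('same') 1-D convolution along the time axis.

    frames: one scalar feature per time step (a real network has many channels).
    kernel: odd-length list of weights; hypothetical values for illustration.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(frames) + [0.0] * half
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]


# A single "event" at time step 3.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

one_layer = conv1d_time(impulse, [1.0, 1.0, 1.0])
two_layers = conv1d_time(one_layer, [1.0, 1.0, 1.0])

print(one_layer)   # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(two_layers)  # [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
```

After one layer the event influences 3 time steps, after two layers 5, so temporal context grows with depth and kernel size.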

Use bigger network

Predictions on the validation set look already good:

EXPECTED:

аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун

PREDICTED:

аливаа ус хусан ээр үсэрэхэ ус үсэрсан хэсгийг та нар ариун газарт угаагтун

Now, increase the network model size and add some dropout to see whether the above mistakes can be fixed.

Question

Could you explain why logfbank from python_speech_features should be used? (For example, why wasn't mfcc used?)
Can I assume that the default values of winlen, winstep, preemph and so on are the best ones? (I don't really understand how to tune them.)
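
For context on what these parameters control, here is a rough pure-Python sketch. It assumes the usual python_speech_features defaults (`winlen=0.025`, `winstep=0.01`, `preemph=0.97`) and a 16 kHz sample rate. The frame-count rule is my understanding of how that library pads the last partial frame, so treat it as an assumption. The helper names `preemphasis` and `frame_layout` are hypothetical.

```python
import math


def preemphasis(signal, coeff=0.97):
    # High-pass style filter: y[0] = x[0], y[n] = x[n] - coeff * x[n-1].
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_layout(num_samples, samplerate=16000, winlen=0.025, winstep=0.01):
    # winlen and winstep are given in seconds; convert them to samples.
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    if num_samples <= frame_len:
        num_frames = 1
    else:
        # Assumed rule: the last partial frame is zero-padded, hence ceil().
        num_frames = 1 + math.ceil((num_samples - frame_len) / frame_step)
    return frame_len, frame_step, num_frames


print(frame_layout(16000))                      # (400, 160, 99)
print(preemphasis([1.0, 1.0, 1.0], coeff=0.5))  # [1.0, 0.5, 0.5]
```

So one second of 16 kHz audio becomes about 99 feature frames of 25 ms each, taken every 10 ms. A larger winstep gives fewer frames per second, and a larger winlen gives each frame more context at the cost of time resolution.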

Installation issues

Hello,
I wanted to run your work and study it a little, but I got stuck at the installation.
1) Docker file --> is workspace your own file? Or is workspace a file that should be on GitHub?
2) I am also getting an Apex error:
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    import apex
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex/__init__.py", line 18, in <module>
    from apex.interfaces import (ApexImplementation,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex/interfaces.py", line 10, in <module>
    class ApexImplementation(object):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex/interfaces.py", line 14, in ApexImplementation
    implements(IApex)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/zope/interface/declarations.py", line 706, in implements
    raise TypeError(_ADVICE_ERROR % 'implementer')
TypeError: Class advice impossible in Python3. Use the @Implementer class decorator instead.
Could you help me with the installation?
Thank you.

broken links in dataset download script

The storage bucket used to pull the Mongolian Bible dataset no longer has the Mongolian version available for download.

If anyone still has a copy of the original .zip files, I would be eternally grateful.

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python3', 'dl_mbspeech.py']' command failed.  (See above for error)
downloading https://s3.us-east-2.amazonaws.com/bible.davarpartners.com/Mongolian/01_Genesis.zip...
extracting '01_Genesis.zip'...

2MB [00:00, 766.57MB/s]
Traceback (most recent call last):
  File "/Users/xd/Code/mongolian-speech-recognition/datasets/dl_mbspeech.py", line 37, in <module>
    zipfile = ZipFile(bible_book_file_path)
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
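
One possible reading of this traceback is that the server returned something other than an archive (for example an error page), which was then saved under a .zip name. A minimal sketch of a check that could run before extraction, using only the standard library, is below. The helper name `open_zip_checked` is hypothetical and is not part of dl_mbspeech.py.

```python
import zipfile


def open_zip_checked(path, preview_bytes=200):
    """Open `path` as a zip archive, or fail with a readable message.

    zipfile.is_zipfile() only inspects the file's magic number, so this is a
    cheap sanity check rather than a full integrity test.
    """
    if not zipfile.is_zipfile(path):
        with open(path, "rb") as f:
            head = f.read(preview_bytes)
        # Showing the first bytes usually reveals an HTML/XML error page.
        raise RuntimeError(
            f"{path} is not a zip archive; file starts with: {head!r}"
        )
    return zipfile.ZipFile(path)
```

Calling `open_zip_checked("01_Genesis.zip")` in place of `ZipFile(...)` would then report what was actually downloaded instead of only raising BadZipFile.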

Noam scheduler

Hi, thank you for your great work. I wonder why you think the Noam scheduler works well for this case (Mongolian)?
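
For readers unfamiliar with the term, the Noam schedule is commonly written as a linear warmup followed by inverse square root decay. The sketch below uses the formula from the original Transformer paper. The defaults `d_model=512` and `warmup_steps=4000` come from that paper and are assumptions here, not the values used in this repository.

```python
def noam_lr(step, d_model=512, warmup_steps=4000, factor=1.0):
    """Learning rate at a given optimizer step under the Noam schedule."""
    step = max(step, 1)  # avoid 0 ** -0.5 on the very first call
    return factor * d_model ** -0.5 * min(
        step ** -0.5,                # decay branch, active after warmup
        step * warmup_steps ** -1.5  # linear warmup branch
    )


for s in (100, 4000, 16000):
    print(s, noam_lr(s))
```

The rate rises linearly until `warmup_steps`, peaks there, and then decays proportionally to the inverse square root of the step.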

train a language model

The network outputs recognizable texts already after 30 minutes or 10 epochs:

expected:

аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун

predicted:

аааааааааааалллллллливвваааааааааааааааааа ууусссс ххххууввссаанн гэээрррррррррр үүсссэррррррххх ттуусссуррссрррссссаннн хххээссссггийгг ттаааааааааааааааааааа ннаарррррррр ааааааааааааааааааааарррииинннннн гггаааааааааааааарррррртт ууггааааааааааааааааааагтттүүнннррррааааа

To collapse the repeated characters and to choose the most likely word sequence, we need to train a language model using KenLM.
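
The collapsing step on its own can be sketched as plain greedy CTC decoding: merge runs of identical labels, then drop the blank symbol. This assumes the network is trained with CTC and that "B" is the blank, as in the vocabulary quoted in the first issue. The function name `ctc_collapse` is hypothetical. Choosing the most likely word sequence would still need a language model.

```python
from itertools import groupby

BLANK = "B"  # assumed blank symbol, taken from the vocab string in the first issue


def ctc_collapse(frame_labels, blank=BLANK):
    """Greedy CTC post-processing of per-frame labels.

    1) merge consecutive duplicates, 2) remove blanks.
    A blank between two identical characters keeps both of them.
    """
    merged = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in merged if label != blank)


print(ctc_collapse("аааллливвааBа"))  # аливаа
print(ctc_collapse("ууусссс"))        # ус
```

Without a blank between them, a genuine double letter such as the final "аа" would collapse to a single "а", which is why the blank symbol matters.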

What are the best loss results for this project?

Dear Erdene-Ochir Tuguldur,

You are doing a great job! I mean your TTS project as well.
The code is so plain and training is so quick, yet the solutions are powerful and effective.
May I know your best loss results for this project?
I recently developed a Kaldi ASR solution for my language,
so I would like to know whether it is possible to reach similar results with your speech recognition project.

Thank you in advance!

Illegal Instruction

eval.py gives an Illegal Instruction error.
Hardware: AMD A8-6410 APU with AMD Radeon R5 Graphics

Freeze support issue

Hi Erdene-Ochir Tuguldur,

thanks for sharing your work.

I am trying to start training for the first time (Windows 10, Conda, single GPU), but I am getting this runtime error:

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

I can see that I need to add some kind of guard in train.py to avoid recursive subprocess spawning, but I couldn't find the exact place.

Please advise. Thank you in advance.
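
The idiom the error message refers to looks roughly like this. It is a generic sketch, not the actual train.py: `square` and `main` are hypothetical stand-ins for the worker code and for whatever currently runs at the top level of the script. On Windows, worker processes re-import the main module, so anything that starts processes has to sit behind the guard.

```python
import multiprocessing as mp


def square(x):
    # Stand-in for the work done inside a worker process.
    return x * x


def main(num_workers=2):
    # Hypothetical stand-in for the script's top-level training code.
    if num_workers == 0:
        # Run in the current process, no workers started.
        return [square(x) for x in range(5)]
    with mp.Pool(num_workers) as pool:
        return pool.map(square, range(5))


if __name__ == '__main__':
    # Only needed when the program is frozen into an executable;
    # harmless otherwise.
    mp.freeze_support()
    print(main())
```

In practice this means moving the code that currently runs at import time into a function and calling it only from inside the `if __name__ == '__main__':` block.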
