
tarteel-ml's People

Contributors

aymenq, dependabot[bot], hmzh-khn, karim-53, murtraja, omerasif-itu, piraka9011


tarteel-ml's Issues

Create a train-test-validation split for the recordings by verse.

Split the verses of the Qur'an 60-20-20 by verse. All recordings of a given verse will be in the same set.

Note: There should be two copies of this split. One of them should be by ayah, and the other should be by unique ayah (i.e. identical ayat should be lumped together in one ayah-set).
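A minimal sketch of the verse-level split, assuming a pandas DataFrame of recordings with hypothetical surah_num and ayah_num columns (the real column names may differ; the unique-ayah variant would first map identical ayah texts to a single key):

import pandas as pd

def split_by_verse(df, seed=42):
    """Assign every recording of a verse to the same train/val/test set."""
    verses = df[['surah_num', 'ayah_num']].drop_duplicates()
    shuffled = verses.sample(frac=1, random_state=seed)  # shuffle verses, not recordings
    n = len(shuffled)
    train = shuffled[:int(0.6 * n)]
    val = shuffled[int(0.6 * n):int(0.8 * n)]
    test = shuffled[int(0.8 * n):]
    # Inner-join back so each recording inherits its verse's split.
    return [df.merge(part, on=['surah_num', 'ayah_num']) for part in (train, val, test)]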

Same number of audio files with or without surah [-s] argument

Description:
The same audio data is downloaded with or without the surah [-s] argument.

$ python3 download.py --use-cache --log CRITICAL
Audio Files:   0%|                                                   | 14/20565 [01:06<26:52:56,  4.71s/it]
$ python3 download.py -s 1 --use-cache --log CRITICAL
Audio Files:   0%|                                                               | 0/20565 [00:00<?, ?it/s]

Is it normal to have the same amount of data in both cases?

Please advise. Regards.
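One quick way to narrow this down is to check the cached CSV directly; if the per-surah row count differs from the total, the bug is likely that the progress-bar total is taken from the unfiltered row count. A sketch, with surah_num as a guess at the column name:

import pandas as pd

df = pd.read_csv('.cache/csv/local.csv')
print('total rows:', len(df))
print('surah 1 rows:', (df['surah_num'] == 1).sum())  # 'surah_num' is hypothetical; check the CSV header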

Create a Dockerfile to support the repo on multiple OSes, like Windows

As-salāmu alaykum. I was trying to run this project on Windows, but it gives me errors at every step. First, it could not find the specific versions of the modules pinned in your requirements.txt file, so I manually installed the latest versions instead. That opens the possibility of errors in later steps, and indeed I am now getting an error at the very next step:
python download.py -s 1

Is it possible for you to create a Docker image, so we can run the project on Windows without any hassle? That would likely help a lot of people. Thank you. JazākAllāhu khayrā.
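For reference, a minimal sketch of the kind of Dockerfile this would need, assuming requirements.txt installs cleanly on Linux and download.py as the entry point (dependencies with native extensions may need extra system packages beyond gcc):

FROM python:3.7-slim
WORKDIR /app
COPY requirements.txt .
# Some dependencies build native extensions; gcc covers the common case.
RUN apt-get update && apt-get install -y --no-install-recommends gcc && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENTRYPOINT ["python", "download.py"]

It could then be run on any OS with, e.g., docker build -t tarteel-ml . followed by docker run tarteel-ml -s 1.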

How to generate a language model?

I am planning to train DeepSpeech on the Korean language. Could you provide some guidelines on how I can create a language model?
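Not specific to this repo, but Mozilla DeepSpeech builds its language model with KenLM. A minimal sketch, assuming a corpus.txt with one Korean transcript per line:

lmplz -o 3 < corpus.txt > lm.arpa
build_binary lm.arpa lm.binary

Older DeepSpeech releases additionally need a trie built from the alphabet and the binary LM with the generate_trie tool shipped in the native client.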

Getting "Invalid wave header found"

I cloned the Tarteel-ML repo and started downloading the CSV for Al-Fatihah as described in the wiki, but I got errors.
All I did was:

python3 download.py -s 1
Downloading CSV from https://d2sf46268wowyo.cloudfront.net/datasets/tarteel_v1.0.csv
Done downloading CSV.
Invalid wave header found .audio/s1/a4/1_4_2787081723.wav , removing.
Invalid wave header found .audio/s1/a3/recording_FO94M2V.wav , removing.
Invalid wave header found .audio/s1/a1/1_1_3752224010.wav , removing.
Invalid wave header found .audio/s1/a6/1_6_4035742518.wav , removing.
Invalid wave header found .audio/s1/a2/1_2_4115400297.wav , removing.
Audio file .audio/s1/a5/1_5_4027410949.wav does not have speech according to VAD. Removing.
Audio file .audio/s1/a7/1_7_456658554.wav does not have speech according to VAD. Removing.
Invalid wave header found .audio/s1/a2/1_2_2198883921.wav , removing.
Invalid wave header found .audio/s1/a6/1_6_3964846100.wav , removing.
Audio file .audio/s1/a4/1_4_3355668251.wav does not have speech according to VAD. Removing.
Invalid wave header found .audio/s1/a1/1_1_526190118.wav , removing.
Invalid wave header found .audio/s1/a6/1_6_4081852003.wav , removing.
Invalid wave header found .audio/s1/a6/1_6_1540864270.wav , removing.
Audio file .audio/s1/a1/1_1_1740606045.wav does not have speech according to VAD. Removing.
Invalid wave header found .audio/s1/a3/1_3_1486618514.wav , removing.
Invalid wave header found .audio/s1/a5/1_5_1812842962.wav , removing.
Invalid wave header found .audio/s1/a2/1_2_2615458200.wav , removing.
Invalid wave header found .audio/s1/a3/1_3_2791353353.wav , removing.
Invalid wave header found .audio/s1/a3/1_3_619438520.wav , removing.
Audio file .audio/s1/a4/1_4_3452859238_pbxN174.wav does not have speech according to VAD. Removing.
Invalid wave header found .audio/s1/a7/1_7_3771589469.wav , removing.
Invalid wave header found .audio/s1/a6/1_6_1220528519.wav , removing.
Audio file .audio/s1/a3/1_3_409467092_jRelUpO.wav does not have speech according to VAD. Removing.

Python 3.7.1
Ubuntu 16.04 (Linux)
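Note that these messages mean download.py is discarding corrupt or silent files rather than failing outright. A minimal sketch of the kind of header check involved (not the repo's exact code):

import wave

def has_valid_wave_header(path):
    """True if the file parses as a WAV file with at least one frame."""
    try:
        with wave.open(path, 'rb') as f:
            return f.getnframes() > 0
    except (wave.Error, EOFError):
        return False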

Train a model for gender prediction

Based on conversations with @abdulhaim and others, we have realized that it is important to know the gender of the person reciting each recording in order to protect gender-based privacy during evaluation (see https://github.com/Tarteel-io/tarteel.io/issues/179).

However, only a small fraction of the recordings have a gender associated with them, because providing demographic information is optional. We can potentially overcome this issue by training a gender-identification model to provide tentative gender labels for our recordings.
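A minimal sketch of that approach, with random placeholders standing in for real per-recording features (e.g. mean MFCC vectors) and hypothetical 0/1 labels:

import numpy as np
from sklearn.linear_model import LogisticRegression

X_labeled = np.random.randn(200, 13)        # placeholder 13-dim feature vectors
y_labeled = np.random.randint(0, 2, 200)    # hypothetical gender labels from opt-in metadata
X_unlabeled = np.random.randn(1000, 13)

clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
tentative = clf.predict(X_unlabeled)
confidence = clf.predict_proba(X_unlabeled).max(axis=1)  # keep only high-confidence labels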

Conda requirements are OSX specific

Conda requirements are currently OSX-specific. Either a simpler requirements file should be created by hand (without unnecessarily pinning each individual dependency), or a separate requirements file should be made for Linux systems.
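One possible fix, assuming a reasonably recent conda (>= 4.7.12): export only the explicitly requested packages, which drops the OSX-specific build strings:

conda env export --from-history > environment.yml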

The files for the Fatihah overfitting experiment

As-salāmu ʿalaykum,

I am currently experimenting with a Qur'an tutor for Surat Al-Ikhlas, the same idea as Surat Al-Fatihah but with a different audio set, recorded by well-known reciters.
It seems that I have a problem with the preprocessing step.

Could I get the files that were used for the experiment, so I can compare them with mine?
train_src.txt, train_tgt.txt, val_src.txt, val_tgt.txt

Thank you

`dataset_csv_url`: A ghost argument?

Issue: Exception

In download.py, line 103 throws an exception:

'Namespace' object has no attribute 'dataset_csv_url'

Changing it to csv_url fixes it.

Possible cause:
Argument mismatch?

line 26 : parser.add_argument('--csv-url', type=str, default=TARTEEL_V1_CSV_URL)

line 103 : download_csv_dataset(args.dataset_csv_url, path_to_dataset_csv)
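Since argparse stores --csv-url as args.csv_url, the fix is presumably just:

download_csv_dataset(args.csv_url, path_to_dataset_csv)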

How to Add/Edit Wiki pages?

I want to add documentation on the concept of MFCC coefficients. I also noticed some typos in the existing wiki pages.

How can I add or edit Wiki pages?

Update tutorials

Salām,

I have spent a day trying, unsuccessfully, to run the first ML model.
The problem is that there are different tutorials across the repo:

  • README.md proposes running the following:
download.py: Download the Tarteel dataset
create_train_test_split.py: Create train/test/validation split csv files.
generate_alphabet|vocabulary.py: Generate all unique letters/ayahs in the Quran in a text file.
generate_csv_deepspeech.py: Create a CSV file for training with DeepSpeech.

But I am stuck at generate_csv_deepspeech.py,
and I don't know the purpose of each piece of generated data...

  • The wiki refers to .py files that were deleted long ago:

Navigate into the audio_preprocessing directory and run python generate_features.py

  • The wiki and CONTRIBUTING.md both explain how to set up the repo, which is redundant.

I suggest that

  1. We update README.md with the minimum instructions needed to run the simplest ML model (I would need your help with that, please).
  2. We keep using CONTRIBUTING.md to explain how to set up the repo, and bring the related content over from the wiki, since contributors can easily modify README.md but not the wiki, as explained in #51.

How to pass a live audio stream to the ML model

Hello

My plan is to make a mobile application that corrects the user's recitation of Surat Al-Ikhlas.
I saw your Tarteel application and it is fantastic ❤️. Thank you for your work.

I already built the speech-to-text model with OpenNMT, but I am wondering how you pass a live audio stream from the microphone for recognition. My plan was, after training the model, to set up a REST server and then build a GUI, but I am stuck on the audio streaming.

I appreciate your help.
Thank you
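Not Tarteel's implementation, but a minimal sketch of one common approach: read fixed-size chunks from the microphone with PyAudio and hand each to a recognize() function (hypothetical here) that buffers and feeds your model or REST endpoint:

import pyaudio

CHUNK, RATE = 1024, 16000
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
try:
    while True:
        data = stream.read(CHUNK)  # raw 16-bit PCM bytes
        recognize(data)            # hypothetical: send to your model / REST server
finally:
    stream.stop_stream()
    stream.close()
    pa.terminate()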

Missing audio_preprocessing directory

Where can I find the audio_preprocessing directory?
Following the wiki steps, after step 2 ("Navigate into the audio_preprocessing directory and run python generate_features.py -f mfcc -s 1 --local_download_dir "../.audio" --output_dir "../.outputs" to generate the MFCC coefficients"), I was unable to find that audio_preprocessing directory.

Downloading Surah Al-Fatihah alone took a long time

I was trying to download and preprocess Al-Fatihah. Here are my commands:

git clone https://github.com/Tarteel-io/Tarteel-ML.git
cd Tarteel-ML/
git cherry-pick 624c46b
conda env create -f environment.yml
conda activate tarteel-ml
python download.py -s 1

I applied this commit to fix the invalid wave header issue and make the download work. However, it took a long time for one short surah! Is this normal?


Getting "IndexError: too many indices for array"

I made an audio dataset with 10 Qur'an readers per ayah, so I have 10 audio files for each ayah, in WAV format at a 32000 Hz sample rate. I also passed all the audio files through the audio-checking function in your download script, so I now have 61382 audio files (10 per ayah).
I then tried to run the Sequence-to-Sequence Model in Keras script, where I prepared everything the same way you built your system:

  • Data/one-hot.pkl
  • .outputs/mfcc

Your script uses:

def build_dataset(local_coefs_dir='../.outputs/mfcc', surahs=[1], n=100):

I changed it to:

def build_dataset(local_coefs_dir='../.outputs/mfcc', surahs=[2], n=100):

But I get this error:

"IndexError: too many indices for array"

while executing the function convert_list_of_arrays_to_padded_array, at this line:

padded_array[a, :r, :c] = arr

These are the values stored in memory:

shape (1361, 13)
max_shape [13459]
padded_array (100, 13459)
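The reported values suggest the cause: max_shape has a single entry, so padded_array is allocated 2-D as (100, 13459), and the three-index assignment padded_array[a, :r, :c] then has too many indices. A sketch of a padding helper that allocates 3-D (not the repo's exact code):

import numpy as np

def pad_arrays(arrays, n=100):
    """Zero-pad a list of 2-D (frames, coeffs) arrays into one 3-D batch array."""
    max_rows = max(a.shape[0] for a in arrays)
    max_cols = max(a.shape[1] for a in arrays)
    padded = np.zeros((n, max_rows, max_cols))
    for i, arr in enumerate(arrays[:n]):
        r, c = arr.shape
        padded[i, :r, :c] = arr
    return padded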

Logging exception due to invalid arguments

Currently this exception is thrown when downloading files: python3 download.py

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "download.py", line 103, in <module>
    download_csv_dataset(args.csv_url, path_to_dataset_csv)
  File "download.py", line 50, in download_csv_dataset
    logging.info("Downloading CSV from ", csv_url, " to ", dataset_csv_path, ".")
Message: 'Downloading CSV from '
Arguments: ('https://tarteel-frontend-static.s3-us-west-2.amazonaws.com/datasets/tarteel_v1.0.csv', ' to ', '.cache/csv/local.csv', '.')

It is due to this line:
logging.info("Downloading CSV from ", csv_url, " to ", dataset_csv_path, ".")
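The stdlib logger applies %-style formatting to its extra arguments, so the call presumably needs to be:

logging.info("Downloading CSV from %s to %s.", csv_url, dataset_csv_path)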

Error when trying the "Fatihah overfitting experiment" with Surat Al-Ikhlas

Hello team Tarteel, I would like to thank you for your hard work.

I am currently experimenting with a Qur'an tutor for Surat Al-Ikhlas, the same idea as Surat Al-Fatihah but with a different audio set, recorded by well-known reciters.

I prepared all the files for training, but I face a problem in the training phase.

I run this command:

!python /content/OpenNMT-py/train.py -model_type audio -enc_rnn_size 512 -dec_rnn_size 512 -audio_enc_pooling 1,2 -dropout 0 -enc_layers 2 -dec_layers 1 -rnn_type LSTM -data /content/OpenNMT-py/data/speech/demo -save_model demo-model -global_attention mlp -gpu_ranks 0 -batch_size 8 -optim adam -max_grad_norm 100 -learning_rate 0.0003 -learning_rate_decay 0.8 -train_steps 2000

The error is:

[2020-03-04 21:03:57,891 INFO]  * tgt vocab size = 15
[2020-03-04 21:03:57,892 INFO] Building model...
[2020-03-04 21:04:02,067 INFO] NMTModel(
  (encoder): AudioEncoder(
    (W): Linear(in_features=512, out_features=512, bias=False)
    (batchnorm_0): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_0): LSTM(161, 512)
    (pool_0): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
    (rnn_1): LSTM(512, 512)
    (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (decoder): InputFeedRNNDecoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(15, 500, padding_idx=1)
        )
      )
    )
    (dropout): Dropout(p=0.0, inplace=False)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.0, inplace=False)
      (layers): ModuleList(
        (0): LSTMCell(1012, 512)
      )
    )
    (attn): GlobalAttention(
      (linear_context): Linear(in_features=512, out_features=512, bias=False)
      (linear_query): Linear(in_features=512, out_features=512, bias=True)
      (v): Linear(in_features=512, out_features=1, bias=False)
      (linear_out): Linear(in_features=1024, out_features=512, bias=True)
    )
  )
  (generator): Sequential(
    (0): Linear(in_features=512, out_features=15, bias=True)
    (1): Cast()
    (2): LogSoftmax()
  )
)
[2020-03-04 21:04:02,067 INFO] encoder: 3747840
[2020-03-04 21:04:02,067 INFO] decoder: 4190555
[2020-03-04 21:04:02,067 INFO] * number of parameters: 7938395
[2020-03-04 21:04:02,068 INFO] Starting training on GPU: [0]
[2020-03-04 21:04:02,068 INFO] Start training loop and validate every 10000 steps...
[2020-03-04 21:04:02,069 INFO] Loading dataset from /content/OpenNMT-py/data/speech/demo.train.0.pt
[2020-03-04 21:04:02,070 INFO] number of examples: 15
Traceback (most recent call last):
  File "/content/OpenNMT-py/train.py", line 6, in <module>
    main()
  File "/content/OpenNMT-py/onmt/bin/train.py", line 204, in main
    train(opt)
  File "/content/OpenNMT-py/onmt/bin/train.py", line 88, in train
    single_main(opt, 0)
  File "/content/OpenNMT-py/onmt/train_single.py", line 143, in main
    valid_steps=opt.valid_steps)
  File "/content/OpenNMT-py/onmt/trainer.py", line 244, in train
    report_stats)
  File "/content/OpenNMT-py/onmt/trainer.py", line 365, in _gradient_accumulation
    with_align=self.with_align)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/OpenNMT-py/onmt/models/model.py", line 45, in forward
    enc_state, memory_bank, lengths = self.encoder(src, lengths)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/OpenNMT-py/onmt/encoders/audio_encoder.py", line 119, in forward
    memory_bank = pool(memory_bank)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/pooling.py", line 76, in forward
    self.return_indices)
  File "/usr/local/lib/python3.6/dist-packages/torch/_jit_internal.py", line 181, in fn
    return if_false(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 457, in _max_pool1d
    input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (7x1x1). Calculated output size: (7x1x0). Output size is too small

I know that the problem is in the pooling size, but I don't know how to fix it.
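The traceback gives a hint: with -audio_enc_pooling 1,2, the second encoder layer applies MaxPool1d with kernel size 2, and an input whose time dimension has already shrunk to 1 then pools to length 0. One possible workaround is to disable pooling in that layer by passing

-audio_enc_pooling 1,1

while keeping the rest of the command unchanged, or to filter out extremely short clips before training.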

pip or conda?

Hi,

I am confused:

  • CONTRIBUTING.md asks me to install dependencies using conda and environment.yml (updated in 2019), which pins numpy=1.15.4

  • On the other hand, if I follow the tutorial in the README.md file, I will use pip to install requirements.txt (updated in 2020) and thus numpy==1.18.2

Please let me know which I should use.
In my experience, pip is preferable, as it is more stable and more up to date.

And maybe my first contribution would be to fix that :)

Complete MFCC and Filter Bank script.

MFCC and mel-frequency filter banks are two of the most common features to pass into deep neural networks. Complete a script that calculates these values and verify that it works using MATLAB (or another piece of software).
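A minimal sketch of such a script using python_speech_features (one common choice; librosa would also work), whose output could then be compared against MATLAB's:

import scipy.io.wavfile as wav
from python_speech_features import mfcc, logfbank

rate, signal = wav.read("recording.wav")  # hypothetical input file
mfcc_feat = mfcc(signal, samplerate=rate, numcep=13)      # (num_frames, 13) MFCCs
fbank_feat = logfbank(signal, samplerate=rate, nfilt=26)  # (num_frames, 26) log filter banks
print(mfcc_feat.shape, fbank_feat.shape)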
