
lidbox's Introduction

lidbox

  • Spoken language identification (LId) out of the box using TensorFlow.
  • Models implemented with tf.keras.
  • Metadata handling with pandas DataFrames.
  • High-performance, parallel preprocessing pipelines with tf.data.
  • Simple spectral and cepstral feature extraction on the GPU with tf.signal.
  • Average detection cost (C_avg) implemented as a tf.keras.metrics.Metric subclass.
  • Angular proximity loss implemented as a tf.keras.losses.Loss subclass.
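The tf.signal feature extraction mentioned above can be sketched roughly as follows (a minimal example, not lidbox's exact implementation; the frame and Mel parameters are illustrative):

```python
import tensorflow as tf

def log_mel_spectrogram(signals, sample_rate=16000, frame_length=400,
                        frame_step=160, num_mel_bins=40):
    """Compute log-scale Mel spectrograms for a batch of waveforms."""
    # Short-time Fourier transform -> magnitude spectrogram
    stft = tf.signal.stft(signals, frame_length=frame_length, frame_step=frame_step)
    magnitudes = tf.abs(stft)
    # Map linear-frequency bins onto the Mel scale
    mel_weights = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=num_mel_bins,
        num_spectrogram_bins=magnitudes.shape[-1],
        sample_rate=sample_rate)
    mel = tf.tensordot(magnitudes, mel_weights, 1)
    # Log compression, with a small offset to avoid log(0)
    return tf.math.log(mel + 1e-6)

# Two seconds' worth of random noise as a stand-in for real speech
batch = tf.random.normal([2, 16000])
features = log_mel_spectrogram(batch)
```

Because everything here is composed of TF ops, the whole computation can run on the GPU inside a tf.data pipeline.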

Why would I want to use this?

  • You need a simple, deep-learning-based speech classification pipeline. For example: waveform -> VAD filter -> augment audio data -> serialize all data to a single binary file -> extract log-scale Mel spectra or MFCCs -> classify the signals with a DNN/CNN/LSTM/GRU/attention model (etc.).
  • You want to train a language vector/embedding extractor model (e.g. x-vector) on large amounts of data.
  • You have a TensorFlow/Keras model that you train on the GPU and want the tf.data.Dataset extraction pipeline to run on the GPU as well.
  • You want an end-to-end pipeline that uses TensorFlow 2 as much as possible.
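The pipeline in the first bullet could be sketched with tf.data along these lines (a hypothetical minimal version; load_signal, drop_silence, and extract_features are stand-ins for real implementations):

```python
import tensorflow as tf

# Stand-ins for real pipeline steps; in practice these would decode audio
# files, run voice activity detection, and compute e.g. log-Mel features.
def load_signal(path):
    return tf.random.normal([16000]), tf.constant(0)  # (waveform, label)

def drop_silence(signal, label):
    return signal, label  # a real VAD filter would trim non-speech frames

def extract_features(signal, label):
    stft = tf.signal.stft(signal, frame_length=400, frame_step=160)
    return tf.abs(stft), label

paths = tf.data.Dataset.from_tensor_slices(["a.wav", "b.wav", "c.wav", "d.wav"])
ds = (paths
      .map(load_signal, num_parallel_calls=tf.data.AUTOTUNE)
      .map(drop_silence, num_parallel_calls=tf.data.AUTOTUNE)
      .map(extract_features, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(2)
      .prefetch(tf.data.AUTOTUNE))

features, labels = next(iter(ds))
```

Each map stage runs in parallel and the prefetch overlaps preprocessing with model training.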

Why would I not want to use this?

  • You are happy doing everything with Kaldi or some other toolkit.
  • You don't want to debug by reading the source code when something goes wrong.
  • You don't want to install TensorFlow 2 and configure its dependencies (CUDA etc.).
  • You want to train phoneme recognizers or use CTC.

Examples

Installing

Python 3.7 or 3.8 is required.

From source

python3 -m pip install https://github.com/py-lidbox/lidbox/archive/master.zip

Most recent version from PyPI

python3 -m pip install 'lidbox==1.0.0rc0'

TensorFlow

TensorFlow 2 is not included in the package requirements because you might want to do custom configuration, e.g. to get the GPU working.

If you don't want to customize anything and instead prefer something that just works for now, the following should be enough:

python3 -m pip install tensorflow

Editable install

If you plan on making changes to the code, it is easier to install lidbox as a Python package in setuptools develop mode:

git clone --depth 1 https://github.com/py-lidbox/lidbox.git
python3 -m pip install --editable ./lidbox

Then, if you make changes to the code, there's no need to reinstall the package, since the changes are reflected immediately. Just be careful not to edit the code while lidbox is running: TensorFlow uses its autograph package to convert some Python functions to TF graphs, which can fail if the source changes mid-run.

Citing lidbox

@inproceedings{Lindgren2020,
    author={Matias Lindgren and Tommi Jauhiainen and Mikko Kurimo},
    title={{Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets}},
    year=2020,
    booktitle={Proc. Interspeech 2020},
    pages={467--471},
    doi={10.21437/Interspeech.2020-2706},
    url={http://dx.doi.org/10.21437/Interspeech.2020-2706}
}

lidbox's People

Contributors

janaab11, matiaslindgren, vinye


lidbox's Issues

Using 'common-voice-small' example setup with larger dataset results in seg fault (core dumped) error

Hi, thanks for creating such a well documented project! I'm part of a university student group using this to train a model on (we hope) 20-30 languages commonly found in Australia. Your explanations in the examples section were incredibly useful, especially since none of us have any experience in this area. Please excuse any ignorance in the following questions.

We received the below warnings when running the project using the same datasets and code as your 'common-voice-small' example, but these didn't prevent model training from completing. Now that we're increasing the size of our dataset to include additional languages (totaling ~12 GB of audio), we're hitting predictable seg faults when caching the dataset or during model training. We're guessing the issues stem from the CUDA version installed on the university machines, which is something we have no control over. We're wondering if you've encountered these issues using lidbox, and/or if you have advice on circumventing them.

This is running on Ubuntu 20.04.6 LTS with an NVIDIA A40 (45 GB).

Memory leak in CUDA11.x

The below warning appears either when caching the dataset or, if we omit caching, when model training begins. Using nvidia-smi to monitor GPU memory shows that usage gradually increases until it reaches capacity, at which point the program seg faults. So it certainly seems the issue is caused by the cuFFT plan creation memory leak.

tensorflow/core/kernels/fft_ops.cc:472] The CUDA FFT plan cache capacity of 512 has been exceeded. This may lead to extra time being spent constantly creating new plans. For CUDA 11.x, there is also a memory leak in cuFFT plan creation which may cause GPU memory usage to slowly increase. If this causes an issue, try modifying your fft parameters to increase cache hits, or build TensorFlow with CUDA 10.x or 12.x, or use explicit device placement to run frequently-changing FFTs on CPU.

The seg fault occurs during caching (screenshot omitted).

If we omit pre-caching the dataset, we proceed to training and see the above warning along with the following warning:

The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.

My assumption is that this second error simply increases training time because it would affect access speed rather than the number of FFTs being executed. Is that the case?
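For context on the cache warning: cache() stores whatever the iterator actually pulls through it, so placing take(k) after cache() stops iteration before the cache is complete, and the partial cache is discarded. A minimal illustration (not specific to lidbox):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# Bad: take(3) after cache() means the cache never sees elements 3..9,
# so the partial cache is discarded with a warning on the next epoch.
bad = ds.cache().take(3).repeat(2)

# Good: take(3) first, then cache the now-complete 3-element dataset,
# which is replayed from the cache on every repeat.
good = ds.take(3).cache().repeat(2)

result = list(good.as_numpy_iterator())  # [0, 1, 2, 0, 1, 2]
```

So the main cost of the bad ordering is repeated recomputation of the pipeline, not a change in what gets computed.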

Do you have any advice about how to modify our usage of Lidbox in order to minimise the effect of this memory leak? Until we resolve the seg fault issue we are running training on CPU, which works but is incredibly slow.
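One workaround the TF warning itself suggests is pinning the FFT ops to the CPU with explicit device placement, so no cuFFT plans are created on the GPU. A minimal sketch (the frame parameters are illustrative):

```python
import tensorflow as tf

def stft_on_cpu(signals):
    # Force the FFT to run on the CPU so no cuFFT plans are allocated;
    # the rest of the pipeline can still run on the GPU.
    with tf.device("/CPU:0"):
        return tf.signal.stft(signals, frame_length=400, frame_step=160)

spec = stft_on_cpu(tf.random.normal([2, 16000]))
```

This trades some FFT speed for GPU memory stability, which may be a reasonable compromise compared to running the whole pipeline on the CPU.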

Thank you for your time,
Toby

scripts/prepare.bash in Common Voice example is broken

This might be because the structure of the downloaded .tar.gz files has changed since the script was first written. Currently, validated.tsv lies deeper inside the directory tree than the script assumes, which produces the following error at runtime:

unpacking './downloads/br.tar.gz'
cut: ./data/br/validated.tsv: No such file or directory
error: unable to load list of paths from metadata file at './data/br/validated.tsv'
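Until the script is fixed, one workaround is to search for validated.tsv wherever it ends up after extraction instead of assuming a fixed depth. A sketch in Python (the directory layout in the demo is fabricated):

```python
from pathlib import Path
import tempfile

def find_validated_tsv(extracted_root):
    """Return the path to validated.tsv regardless of how deep it is."""
    matches = sorted(Path(extracted_root).rglob("validated.tsv"))
    if not matches:
        raise FileNotFoundError(f"no validated.tsv under {extracted_root}")
    return matches[0]

# Demo with a fabricated layout mimicking a nested archive
root = Path(tempfile.mkdtemp())
deep = root / "cv-corpus" / "br"
deep.mkdir(parents=True)
(deep / "validated.tsv").write_text("client_id\tpath\tsentence\n")

found = find_validated_tsv(root)
```

The same idea works in the bash script with `find "$data_dir" -name validated.tsv`.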

Error while running train-embeddings option in cli

This error comes from running step 5 lidbox train-embeddings -v config.xvector-NB.yaml in the Common Voice example:

...
2020-07-01 18:25:25.740 I lidbox.embeddings.sklearn_utils: Wrote embedding demo to './lidbox-cache/naive_bayes/common-voice-4-embeddings/figures/test/embeddings-PCA-2D.png'
2020-07-01 18:25:28.541 I lidbox.embeddings.sklearn_utils: Wrote embedding demo to './lidbox-cache/naive_bayes/common-voice-4-embeddings/figures/test/embeddings-PCA-3D.png'
2020-07-01 18:25:28.541 I lidbox.embeddings.sklearn_utils: Fitting with train_X (22794, 3) and train_y (22794,) classifier:
  GaussianNB(priors=None, var_smoothing=1e-09)
Traceback (most recent call last):
  File "/Users/knethil/.pyenv/versions/3.7.5/bin/lidbox", line 11, in <module>
    load_entry_point('lidbox==0.5.0', 'console_scripts', 'lidbox')()
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/__main__.py", line 36, in main
    ret = command.run()
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/cli.py", line 184, in run
    metrics = lidbox.api.fit_embedding_classifier_and_evaluate_test_set(split2ds, split2meta, labels, config)
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 305, in fit_embedding_classifier_and_evaluate_test_set
    utt2prediction, utt2target = process_predictions(test_data["ids"], predictions["test"], "test")
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 279, in process_predictions
    utt2prediction = generate_worst_case_predictions_for_missed_utterances(utt2prediction, utt2target, labels)
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 326, in generate_worst_case_predictions_for_missed_utterances
    predictions = np.stack([p for _, p in utt2prediction])
  File "<__array_function__ internals>", line 6, in stack
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/numpy/core/shape_base.py", line 423, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

I tried to follow the code, and this seems to happen in the part where predictions of the NB classifier are processed. Is there a way to bypass this training/prediction step and just get the x-vector embeddings from the trained model?

Support for ragged batches during training

For example, the x-vector architecture should be trained on arbitrary-length input. Without ragged batches, this limits the batch size to 1. By supporting ragged batches, we could train with larger batch sizes.
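One way this could look with tf.data (an illustrative sketch using dense_to_ragged_batch, not a patch against lidbox):

```python
import tensorflow as tf

# Utterances of different lengths, e.g. feature sequences of shape [time, dim]
def gen():
    for n in (50, 80, 120, 65):
        yield tf.random.normal([n, 40])

ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=[None, 40], dtype=tf.float32))

# Batch without padding or truncation by producing tf.RaggedTensors
ragged = ds.apply(tf.data.experimental.dense_to_ragged_batch(batch_size=2))

batch = next(iter(ragged))  # shape [2, None, 40]
```

The model side then needs layers that accept ragged input, or a conversion such as batch.to_tensor() with masking.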

No module called 'plda'

Hi,
Today I tried to use lidbox and ran into the following error:

File "D:\anaconda\envs\slid\lib\site-packages\lidbox\embed\sklearn_utils.py", line 6, in <module>
    from plda import Classifier as PLDAClassifier
ModuleNotFoundError: No module named 'plda'

It seems a dependency is missing from the install. I installed lidbox through pip.
Thanks in advance!

Make cache-invalidation more aggressive

Changing dataset metadata should always invalidate all existing signal caches. Compare e.g. config-file contents or save checksums of metadata alongside caches.
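One possible approach, sketched in plain Python (the function names are hypothetical): store a checksum of the metadata alongside each cache and compare it on load.

```python
import hashlib
import json

def metadata_checksum(metadata):
    """Deterministic checksum of dataset metadata, e.g. a dict of utterance -> path."""
    canonical = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def cache_is_valid(saved_checksum, metadata):
    # Any change in the metadata invalidates the signal cache
    return saved_checksum == metadata_checksum(metadata)

meta = {"utt1": "a.wav", "utt2": "b.wav"}
saved = metadata_checksum(meta)
meta["utt3"] = "c.wav"  # metadata changed -> cache must be rebuilt
```

Hashing a canonical JSON dump keeps the checksum stable across dict orderings, so only genuine metadata changes trigger a rebuild.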
