asr-study's Introduction

asr-study: a study of all-neural speech recognition models

This repository contains my efforts to develop an end-to-end ASR system using Keras and TensorFlow.

Training a character-based all-neural Brazilian Portuguese speech recognition model

Our model was trained using four datasets: CSLU Spoltech (LDC2006S16), Sid, VoxForge, and LapsBM1.4. Only the CSLU dataset is paid.

Set up the (partial) Brazilian Portuguese Speech Dataset (BRSD)

You can download the freely available datasets with the provided script (it may take a while):

$ cd data; sh download_datasets.sh

Next, you can preprocess the downloaded data into an HDF5 file:

$ python -m extras.make_dataset --parser brsd --input_parser mfcc
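If you want to sanity-check the generated file, here is a minimal sketch using h5py. The split and dataset names ('train', 'inputs', 'labels') are assumptions based on the brsd layout, so adjust them to whatever make_dataset actually wrote:

import h5py

# Open the preprocessed dataset and list the contents of each split;
# for brsd we expect groups such as 'train' holding 'inputs'/'labels'.
with h5py.File('.datasets/brsd/data.h5', 'r') as f:
    for split in f:
        print('%s: %s' % (split, list(f[split])))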

Train the network

You can train the network with the train.py script. To train with the default parameters:

$ python train.py --dataset .datasets/brsd/data.h5

Pre-trained model

You may download a brsm v1.0 model pre-trained on the full brsd dataset (including the paid CSLU dataset):

$ mkdir models; sh download_brsmv1.sh

You can also evaluate the model against the brsd test set:

$ python eval.py --model models/brsmv1.h5 --dataset .datasets/brsd/data.h5

brsmv1.h5 training

Test set: LER 25.13% (using a beam search decoder with a beam width of 100)
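For reference, below is a minimal sketch of how such a decode can be expressed with TensorFlow 1.x APIs. This is not the repository's eval code; the placeholder shapes and the 28-class output are assumptions borrowed from the model example further down:

import tensorflow as tf

# CTC beam search with beam width 100; logits are time-major
# [max_time, batch_size, num_classes] and seq_len holds true lengths.
logits = tf.placeholder(tf.float32, [None, None, 28])
seq_len = tf.placeholder(tf.int32, [None])
labels = tf.sparse_placeholder(tf.int32)

decoded, log_probs = tf.nn.ctc_beam_search_decoder(
    logits, seq_len, beam_width=100, top_paths=1)

# LER is the mean normalized edit distance between hypothesis and truth.
ler = tf.reduce_mean(tf.edit_distance(
    tf.cast(decoded[0], tf.int32), labels))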

Predicting the outputs

To predict the outputs of a trained model using some dataset:

$ python predict.py --model MODEL --dataset DATASET

Available dataset parsers

You can find all the available dataset parsers in datasets/.

Creating a custom dataset parser

You may create your own dataset parser. Here is an example:

class CustomParser(DatasetParser):

    def __init__(self, dataset_dir, name='default name', **kwargs):
        super(CustomParser, self).__init__(dataset_dir, name, **kwargs)

    def _iter(self):
        # Yield one dict per utterance; 'duration', 'input', and 'label'
        # are required, and 'non-optional-field' stands in for any other
        # mandatory field your dataset provides.
        for line in dataset:
            yield {'duration': line['duration'],
                   'input': line['input'],
                   'label': line['label'],
                   'non-optional-field': line['non-optional-field']}

    def _report(self, dl):
        # dl is the list of dicts produced by _iter
        args = extract_statistics(dl)
        report = '''General information
                    Number of utterances: %d
                    Total size (in seconds) of utterances: %.f
                    Number of speakers: %d''' % args
        return report
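Hypothetical usage, assuming the parser is picked up from datasets/ and relying on the to_h5 method defined in datasets/dataset_parser.py:

# Parse a local dataset and write it to the preprocessed HDF5 format.
parser = CustomParser('/path/to/my/dataset')
parser.to_h5('.datasets/custom/data.h5')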

Available models

You can see all the available models in core/models.py

Creating a custom model

You may create your own custom model. Here is an example of a CTC-based model:

# Layer imports for Keras 1.x; ctc_model is assumed to come from
# core/models.py, where the available models live.
from keras.layers import Input, Dense, LSTM, Bidirectional, TimeDistributed
from core.models import ctc_model


def custom_model(num_features=26, num_hiddens=100, num_classes=28):

    x = Input(name='inputs', shape=(None, num_features))
    o = x

    # Bidirectional LSTM over the full sequence; consume_less='gpu' is
    # the Keras 1.x speed/memory trade-off option.
    o = Bidirectional(LSTM(num_hiddens,
                      return_sequences=True,
                      consume_less='gpu'))(o)
    # Per-timestep projection onto the character classes.
    o = TimeDistributed(Dense(num_classes))(o)

    return ctc_model(x, o)
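A quick, hypothetical way to inspect the resulting topology, assuming ctc_model returns a Keras Model:

model = custom_model()
model.summary()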

Contributing

There is plenty of work to be done. All contributions are welcome :).

asr-related work

  • Add new layers
    • Batch normalized recurrent neural networks (arXiv)
    • Recurrent batch normalization (arXiv)
  • Reproduce topologies and results
  • Add a language model
    • WFST
    • RNNLM
    • Beam search decoder with LM or CLM
  • Encoder-decoder models with attention mechanism
  • ASR from raw speech
  • Real-time ASR

brsp-related work

  • Investigate the brsdv1 model with
  • Increase the number of datasets (ideally with free datasets)
  • Improve the LER
  • Train a language model

code-related work

  • Test coverage
  • Examples
  • Better documentation
  • Improve the API
  • More feature extractors (see audio and text)
  • More dataset parsers
  • Implement a wrapper for Kaldi in order to use its feature extractors
  • A better way of storing the entire preprocessed dataset

Known bugs

  • High memory and CPU consumption
  • Predicting with batch size greater than 1 (Keras bug; see the workaround sketch below)
  • warp-ctc does not seem to speed up training
  • zoneout implementation
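A minimal workaround sketch for the batch-size prediction bug, assuming a built Keras model (model) and a padded input array (x); this is a generic Keras pattern, not code from this repository:

# Predict one utterance at a time to sidestep the batch-size issue.
preds = [model.predict(x[i:i + 1], batch_size=1) for i in range(len(x))]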

Requirements

basic requirements

  • Python 2.7
  • NumPy
  • SciPy
  • PyYAML
  • HDF5 (via h5py)
  • Unidecode
  • Librosa
  • TensorFlow
  • Keras

recommended

  • warp-ctc (for fast CTC loss calculation)

optional

Acknowledgements

License

See LICENSE.md for more information.

asr-study's People

Contributors

robertomest

asr-study's Issues

issue with loading dataset

Hi,

I get the following error for both Dummy and Voxforge

  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-nCYoKW-build/h5py/_objects.c:2840)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-nCYoKW-build/h5py/_objects.c:2798)
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/tmp/pip-nCYoKW-build/h5py/h5o.c:3734)
KeyError: "Unable to open object (Object 'train' doesn't exist)"

All datasets are downloaded and the parser has been run. I am using the keras-2 branch, by the way.

eval.py error

Using TensorFlow backend.
WARNING:tensorflow:From /mnt/Work/Projects/2018-Locktec-SAVOZ/general/asr-study/.env/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:1029: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
[<tf.Tensor 'Reshape_21:0' shape=(?, ?, 28) dtype=float32>, <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fc13c0a7b90>, <tf.Tensor 'inputs_length:0' shape=(?, ?) dtype=int32>]
WARNING:tensorflow:From /mnt/Work/Projects/2018-Locktec-SAVOZ/general/asr-study/.env/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:1108: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Traceback (most recent call last):
  File "eval.py", line 74, in <module>
    test_flow = data_gen.flow_from_fname(args.dataset, datasets=args.subset)
  File "/mnt/Work/Projects/2018-Locktec-SAVOZ/general/asr-study/datasets/dataset_generator.py", line 72, in flow_from_fname
    for dataset in datasets]
  File "/mnt/Work/Projects/2018-Locktec-SAVOZ/general/asr-study/datasets/dataset_generator.py", line 111, in flow_from_h5_group
    mode=self.mode)
  File "/mnt/Work/Projects/2018-Locktec-SAVOZ/general/asr-study/datasets/dataset_generator.py", line 258, in __init__
    inputs = h5group['inputs']
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/mnt/Work/Projects/2018-Locktec-SAVOZ/general/asr-study/.env/local/lib/python2.7/site-packages/h5py/_hl/group.py", line 177, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'inputs' doesn't exist)"

eval.py returns an error

The example from README.md, python eval.py --model data/models/brsmv1.h5 --dataset .datasets/brsd/data.h5, returns the following error:

Requirement already satisfied: h5py in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied: numpy>=1.7 in /usr/local/lib/python2.7/dist-packages (from h5py)
Requirement already satisfied: six in /usr/local/lib/python2.7/dist-packages (from h5py)
Requirement already satisfied: librosa in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied: unidecode in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied: decorator>=3.0.0 in /usr/local/lib/python2.7/dist-packages (from librosa)
Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python2.7/dist-packages (from librosa)
Requirement already satisfied: scikit-learn!=0.19.0,>=0.14.0 in /usr/local/lib/python2.7/dist-packages (from librosa)
Requirement already satisfied: resampy>=0.2.0 in /usr/local/lib/python2.7/dist-packages (from librosa)
Requirement already satisfied: joblib>=0.7.0 in /usr/local/lib/python2.7/dist-packages (from librosa)
Requirement already satisfied: numpy>=1.8.0 in /usr/local/lib/python2.7/dist-packages (from librosa)
Requirement already satisfied: scipy>=0.14.0 in /usr/local/lib/python2.7/dist-packages (from librosa)
Requirement already satisfied: six>=1.3 in /usr/local/lib/python2.7/dist-packages (from librosa)
Requirement already satisfied: numba>=0.32 in /usr/local/lib/python2.7/dist-packages (from resampy>=0.2.0->librosa)
Requirement already satisfied: funcsigs in /usr/local/lib/python2.7/dist-packages (from numba>=0.32->resampy>=0.2.0->librosa)
Requirement already satisfied: enum34 in /usr/local/lib/python2.7/dist-packages (from numba>=0.32->resampy>=0.2.0->librosa)
Requirement already satisfied: llvmlite>=0.22.0.dev0 in /usr/local/lib/python2.7/dist-packages (from numba>=0.32->resampy>=0.2.0->librosa)
Requirement already satisfied: singledispatch in /usr/local/lib/python2.7/dist-packages (from numba>=0.32->resampy>=0.2.0->librosa)
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1029: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1108: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Traceback (most recent call last):
  File "eval.py", line 74, in <module>
    test_flow = data_gen.flow_from_fname(args.dataset, datasets=args.subset)
  File "/content/asr-study/datasets/dataset_generator.py", line 72, in flow_from_fname
    for dataset in datasets]
  File "/content/asr-study/datasets/dataset_generator.py", line 111, in flow_from_h5_group
    mode=self.mode)
  File "/content/asr-study/datasets/dataset_generator.py", line 258, in __init__
    inputs = h5group['inputs']
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 167, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'inputs' doesn't exist)"

data parser doesn't work

The example from README.md, python -m extras.make_dataset --parser brsp --input_parser mfcc --label_parser simple_char_parser, returns the following error:

File "/home/zparcheta/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/home/zparcheta/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/data/forked/asr-study/extras/make_dataset.py", line 32, in
regex=True)
File "utils/generic_utils.py", line 62, in get_from_module
(name, module, ', '.join(members.keys())))
KeyError: 'brsp not found in datasets*.\n Valid values are: dummy, sid, brsd, voxforge, lapsbm, cslu, datasetparser'

If I change brsp to brsd (which is the parser available in the datasets folder), then:

datasets.dataset_parser.BRSD: WARNING File /data/forked/asr-study/data/lapsbm/LapsBM-F019/LapsBM_0378.wav has a forbidden label: "acertou o alvo em quarenta e três por cento das suas chances". Skipping
Traceback (most recent call last):
  File "/home/zparcheta/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/zparcheta/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/data/forked/asr-study/extras/make_dataset.py", line 46, in <module>
    override=args.override)
  File "datasets/dataset_parser.py", line 128, in to_h5
    group = f.create_group(dataset)
  File "/home/zparcheta/anaconda2/lib/python2.7/site-packages/h5py/_hl/group.py", line 52, in create_group
    gid = h5g.create(self.id, name, lcpl=lcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/_objects.c:2804)
  File "h5py/h5g.pyx", line 151, in h5py.h5g.create (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/h5g.c:2929)
ValueError: Unable to create group (Name already exists)

The warning appears for each line of text, and each one is skipped.
How can I prepare the data for training? I have already downloaded the data into the data folder.

Issues on checkpoint save

File "/Users/rob/py27/lib/python2.7/site-packages/keras/engine/topology.py", line 2370, in get_config
new_node_index = node_conversion_map[node_key]
KeyError: 'labels_ib-0'

This is the same common error that occurs when saving or loading models with unusual graph shapes.
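A common generic workaround, not verified against this repository, is to checkpoint weights only, which avoids calling get_config on the unusual topology:

from keras.callbacks import ModelCheckpoint

# Save only the weights each epoch; Model.get_config() is never called.
checkpoint = ModelCheckpoint('weights.{epoch:02d}.h5',
                             save_weights_only=True)
# Pass callbacks=[checkpoint] to model.fit(...). To restore, rebuild the
# architecture in code and load the weights:
#   model = custom_model()
#   model.load_weights('weights.10.h5')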
