DeepWDM - Recurrent Neural Networks for Word Duration Measurement written in Torch7

Content

The repository contains code for word duration measurement.

  • back_end folder: contains the training algorithms; it can be used to train the model on new datasets or with different features.
  • front_end folder: contains the feature extraction algorithm.
  • lib folder: contains some useful Python scripts.
  • data folder: contains an example file for testing the repository.

Installation

The code is compatible with OS X and Linux and was tested on OS X El Capitan and Ubuntu 14.04.

Dependencies

The code uses the following dependencies:

  • Torch7 with RNN package
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh 

# On Linux with bash
source ~/.bashrc
# On Linux with zsh
source ~/.zshrc
# On OSX or in Linux with none of the above.
source ~/.profile

# For rnn package installation
luarocks install rnn

Ubuntu

Ubuntu users should also install SoX:

apt-get install sox
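Before going further, it can help to confirm that the command-line tools from the steps above are on your PATH. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and it assumes the Torch7 install exposes the th and luarocks executables and that SoX is invoked as sox.

```python
import shutil


def missing_tools(tools=("th", "luarocks", "sox")):
    """Return the names in `tools` that cannot be found on PATH.

    The default tool names are assumptions based on the install steps:
    `th` and `luarocks` from the Torch7 distro, `sox` from the SoX package.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required command-line tools were found.")
```

If th is reported as missing right after installation, re-sourcing your shell profile as shown above is the first thing to try.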

Model Installation

First, download the desired model: RNN, 2 RNN Layers, Bi-Directional RNN, or 2 Layers of Bi-Directional RNNs. Then move the model file to back_end/results/ inside the project directory.
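If you prefer to script this step, the move can be sketched as follows. This is only an illustration: install_model is a hypothetical helper, and both paths are supplied by you. The only detail taken from the instructions is the back_end/results/ destination.

```python
import shutil
from pathlib import Path


def install_model(downloaded_file, project_dir):
    """Move a downloaded model file into <project_dir>/back_end/results/.

    Hypothetical helper: the file keeps its original name, and the
    results folder is created if it does not exist yet.
    """
    results_dir = Path(project_dir) / "back_end" / "results"
    results_dir.mkdir(parents=True, exist_ok=True)
    target = results_dir / Path(downloaded_file).name
    shutil.move(str(downloaded_file), str(target))
    return target
```

For example, install_model("/path/to/downloaded/model/file", "/path/to/project") returns the new location of the file, where both arguments are placeholders for your own paths.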

Usage

To measure word duration, run:

python predict.py "input wav file" "output text grid file" "model type"

Example

You can try our tool using the example file in the data folder. Type:

python predict.py data/test.wav data/test.TextGrid rnn
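To process a whole folder of recordings, the same command can be generated for each file. The sketch below is an assumption-laden convenience, not part of the repository: the helper names are hypothetical, each output TextGrid is written next to its .wav file, and predict.py is assumed to be run from the project root.

```python
import subprocess
from pathlib import Path


def build_predict_command(wav_path, textgrid_path, model_type="rnn"):
    """Build the argument list for: python predict.py <wav> <TextGrid> <model type>."""
    return ["python", "predict.py", wav_path, textgrid_path, model_type]


def batch_commands(wav_dir, model_type="rnn"):
    """Build one predict.py command per .wav file found in wav_dir.

    Assumption: each TextGrid gets the same name as its .wav file,
    with the extension replaced by .TextGrid.
    """
    commands = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        textgrid = wav.with_suffix(".TextGrid")
        commands.append(build_predict_command(str(wav), str(textgrid), model_type))
    return commands


def run_all(wav_dir, project_dir=".", model_type="rnn"):
    """Run every command from project_dir and stop at the first failure."""
    for command in batch_commands(wav_dir, model_type):
        subprocess.run(command, cwd=project_dir, check=True)
```

Only the rnn model type appears in the example above; pass a different value for model_type only if your copy of predict.py accepts it.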

Training Your Own Model

To train a DeepWDM model on your own data, you need to perform two steps:

  • A. Extract features
  • B. Train the model

Extract features

Features for training a new model can be extracted with the run_front_end.py script from the front_end folder. This script takes three parameters as input:

  • A. The path to the folder that contains the .wav files.
  • B. The path to the manual annotation files. These files should be in TextGrid format, the same as in the example folder.
  • C. The path where the features and labels should be saved.

To test the feature extraction procedure, type the following command from the front_end folder:

python run_front_end.py data/test_file/ --in_path_y data/test_file/ data/test_features/

This script will generate two files (tmp.features and tmp.label), one for the features and one for the labels. These files will be used to train the model.
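A quick check after extraction is to confirm that both files exist. The sketch below mirrors the test command above and then looks for the two file names; the helper names are hypothetical, and it assumes the files are written directly into the output folder passed as the last argument.

```python
from pathlib import Path


def build_front_end_command(wav_dir, textgrid_dir, out_dir):
    """Build the argument list in the same order as the test command."""
    return ["python", "run_front_end.py", wav_dir, "--in_path_y", textgrid_dir, out_dir]


def missing_outputs(out_dir, names=("tmp.features", "tmp.label")):
    """Return the expected output files that are absent from out_dir.

    Assumption: the script writes tmp.features and tmp.label directly
    into out_dir rather than into a subfolder.
    """
    return [name for name in names if not (Path(out_dir) / name).is_file()]
```

An empty list from missing_outputs("data/test_features/") suggests the extraction step completed and the training step can use the files.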

Train the model

To train the model, run the run.lua script from the back_end folder with the correct paths to the labels and features from the previous step. The parameters for the new files are: -folder_path, -x_filename and -y_filename.

To use the newly trained model with the predict.py script, rename it to 1_layer_model.net, place it under the results folder and choose the rnn mode.
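The two steps above can be sketched as a pair of helpers. This is an illustration under stated assumptions: the helper names are hypothetical, run.lua is assumed to be launched with the Torch7 th executable, and the location where training saves the model is not specified here, so it is left as an argument you supply.

```python
import shutil
from pathlib import Path


def build_train_command(folder_path, x_filename, y_filename):
    """Build the argument list for run.lua using the three documented parameters.

    Assumption: the script is launched with `th`, the Torch7 interpreter.
    """
    return [
        "th", "run.lua",
        "-folder_path", folder_path,
        "-x_filename", x_filename,
        "-y_filename", y_filename,
    ]


def publish_model(trained_model, results_dir):
    """Copy a trained model into results_dir under the name 1_layer_model.net.

    `trained_model` is wherever your training run saved its model; that
    location is not stated in this README, so no default is provided.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    target = results_dir / "1_layer_model.net"
    shutil.copy2(str(trained_model), str(target))
    return target
```

Copying rather than moving keeps the original training output in place, so a failed prediction run does not cost you the trained model.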

Useful Tricks

  • To load the data faster, it is recommended to convert the features and labels files to .t7 format. You can do this with the convert2t7.lua script, which takes the paths to the features and label files along with the desired output paths and saves them as .t7 files.
  • Another option is to uncomment lines 41-42 (the torch.save() commands) in the data.lua script and then run it.
  • You can experiment with other parameters, such as the learning rate and the optimization technique.
  • If your dataset is unbalanced, i.e., there is much more silence than activity in the speech signal, you can try different weights in the loss function. This can be done by changing the values of the weights parameter in the loss.lua file.
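One common starting point for the loss weights is inverse class frequency. The sketch below is not the repository's method; it assumes you have already loaded the frame labels as a sequence of integer class ids (the on-disk format of the label file is not described here), and inverse_frequency_weights is a hypothetical helper. The resulting values would still need to be entered by hand in loss.lua.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses total / (number_of_classes * class_count), so a perfectly
    balanced dataset yields a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * count) for cls, count in counts.items()}


# Illustrative labels only: 8 frames of class 0 and 2 frames of class 1.
example_labels = [0] * 8 + [1] * 2
print(inverse_frequency_weights(example_labels))  # {0: 0.625, 1: 2.5}
```

The rarer class receives the larger weight, which is the direction you want when silence dominates the signal; treat the exact values as a first guess to tune.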

Contributors

adiyoss
