Wavelet prosody analyzer

UPDATE 3.2.2020: additional command-line tools for batch processing, global spectrum and analysis-synthesis are described in tools.rst.

Description

The program calculates f0, energy and duration features from a speech wav file, performs a continuous wavelet analysis on the combined features, finds prosodic events (prominences and boundaries) in the wavelet scalogram, and aligns the events with the transcribed units.
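The pipeline described above can be sketched with NumPy alone. This is a minimal illustration under our own assumptions, not the toolkit's implementation: each feature track is z-score normalized and the tracks are summed with illustrative weights (not the toolkit's defaults), the wavelet is a Mexican hat applied by direct convolution at a few scales measured in frames, and prominence candidates are simply the local maxima of one scale. All function names (zscore, mexican_hat, scalogram, peaks, combine) are hypothetical.

```python
import numpy as np


def zscore(x):
    """Normalize a feature track to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else x - x.mean()


def combine(f0, energy, duration, weights=(1.0, 1.0, 0.5)):
    """Weighted sum of normalized tracks sampled on the same frame grid.

    The weights are illustrative assumptions, not the toolkit's defaults.
    """
    feats = [zscore(f) for f in (f0, energy, duration)]
    return sum(w * f for w, f in zip(weights, feats))


def mexican_hat(points, width):
    """Mexican hat (Ricker) wavelet with `points` samples and scale `width` in frames."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * width) * np.pi ** 0.25)
    return norm * (1.0 - (t / width) ** 2) * np.exp(-0.5 * (t / width) ** 2)


def scalogram(signal, widths):
    """Continuous wavelet transform by direct convolution, one row per scale."""
    signal = np.asarray(signal, dtype=float)
    rows = np.empty((len(widths), len(signal)))
    for i, w in enumerate(widths):
        # Odd kernel length keeps the wavelet centred, unless capped by the signal length.
        n = min(10 * int(w) + 1, len(signal))
        rows[i] = np.convolve(signal, mexican_hat(n, w), mode="same")
    return rows


def peaks(row):
    """Indices of local maxima in one scalogram row (candidate prominences)."""
    inner = row[1:-1]
    return np.flatnonzero((inner > row[:-2]) & (inner >= row[2:])) + 1
```

For example, with f0, energy and duration tracks sampled every 5 ms, `scalogram(combine(f0, energy, duration), widths=[2, 4, 8, 16, 32])` returns one row per scale; coarser rows respond to longer prosodic units, and the strongest local maxima of a word-sized scale are plausible prominence candidates.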

See also:

[1] Antti Suni, Juraj Šimko, Daniel Aalto, Martti Vainio, Hierarchical representation and estimation of prosody using continuous wavelet transform, Computer Speech & Language, Volume 45, 2017, Pages 123-136, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2016.11.001.

The default settings of the program are roughly the same as in the paper, where the duration signal was generated from word-level labels.

Requirements

The wavelet prosody analysis depends on several packages which are installed automatically if you follow the procedure described in ./INSTALL.rst.

Here are the main dependencies:

Here are the optional dependencies:

The user is invited to have a look at the license of the dependencies.

Installation

See ./INSTALL.rst.

Input information

  • audio files in wav format
  • transcriptions in either HTK .lab format or as Praat TextGrids
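As an illustration of the .lab input, here is a minimal reader for the common HTK label layout of one `start end label` entry per line. It assumes standard HTK timing, where times are integers in 100 ns units (10^7 per second); some .lab files store seconds instead, so the divisor is a parameter. This is a sketch, not the toolkit's own reader, and `parse_htk_lab` and `Segment` are hypothetical names.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str


def parse_htk_lab(text: str, units_per_second: float = 1e7) -> List[Segment]:
    """Parse 'start end label' lines from an HTK-style label file.

    HTK stores times in 100 ns units (1e7 per second); pass
    units_per_second=1.0 for files that already use seconds.
    Extra fields after the label (e.g. scores) are ignored.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line, or a line without start, end and label
        start, end, label = fields[0], fields[1], fields[2]
        segments.append(
            Segment(float(start) / units_per_second, float(end) / units_per_second, label)
        )
    return segments
```

Reading a file is then `parse_htk_lab(Path("samples/file1.lab").read_text())`, where the path is only an example.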

Usage:

  1. Assuming the installation was done in global mode, simply run:
wavelet_gui

Otherwise, go to the root directory of the program in a terminal and start it with:

python3 wavelet_prosody_toolkit/wavelet_gui.py
  2. Select the directory containing the speech and transcription files: Select Speech Directory... Some examples are provided in the samples/ directory. Files should share the same root name, for example file1.wav and file1.lab, or file2.wav and file2.TextGrid.
  3. Select the features to use in the analysis: Prosodic Feats for CWT.
  4. Adjust the pitch tracking parameters for the speaker and recording environment, and press Reprocess to see the changes. Set the range of possible pitch values: typically ~50-350 Hz for male speakers and ~100-400 Hz for female speakers. If the estimated track skips obviously voiced portions, move the voicing threshold slider to the left.
  • Alternatively, pre-estimated f0 analyses can be used: a matching .f0 file must exist, either in Praat matrix format or as a list file with one f0 value per line, and the frame shift must be a constant 5 ms. To get a suitable format from Praat, select the wav file and do:
    • To Pitch: 0.005, 120, 400
    • To Matrix
    • Save as matrix text file: “/.f0”
  5. Adjust the weights of the prosodic features and choose whether the final signal is combined by summing or multiplying the features.
  6. Select which tiers to use for generating the duration signal, or use durations estimated from the signal.
  7. Select the transcription level of interest: Select Tier.
  8. You can interactively zoom and move around with the buttons at the top, and play the visible section.
  9. When everything looks good, press Process all, which analyzes all utterances in the directory with the current settings and saves the prosodic labels in the speech directory as <wav_file_name>.prom.
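The file-naming and .f0 requirements above can be checked with a short script before opening the GUI. This sketch assumes that transcriptions use exactly the .lab or .TextGrid suffix and that the .f0 file is in the one-value-per-line list format (the Praat matrix format is not handled here). The helper names `find_utterances` and `load_f0_list` are hypothetical and not part of the toolkit.

```python
from pathlib import Path

import numpy as np

FRAME_SHIFT = 0.005  # .f0 list files must use a constant 5 ms frame shift
TRANSCRIPTION_SUFFIXES = (".lab", ".TextGrid")


def find_utterances(speech_dir):
    """Map each wav root name to (wav path, transcription path).

    Wav files without a transcription sharing the same root are skipped.
    """
    pairs = {}
    for wav in sorted(Path(speech_dir).glob("*.wav")):
        for suffix in TRANSCRIPTION_SUFFIXES:
            candidate = wav.with_suffix(suffix)
            if candidate.exists():
                pairs[wav.stem] = (wav, candidate)
                break
    return pairs


def load_f0_list(path):
    """Read a one-value-per-line f0 file; return (frame times in s, f0 values)."""
    f0 = np.loadtxt(path, ndmin=1)
    times = np.arange(len(f0)) * FRAME_SHIFT
    return times, f0
```

Running `find_utterances("samples")` and comparing its keys with the wav files in the directory is a quick way to spot files whose transcription is missing or misnamed.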

Prosodic labels are saved in tab-separated format with the following columns:

<file_name> <start_time> <end_time> <unit> <prominence strength> <boundary strength>
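For downstream analysis, the six columns above map naturally onto a typed record. This is a sketch assuming every non-blank line has exactly these tab-separated fields with numeric times and strengths; the time unit is taken as written in the file. `read_prom` and `ProsodicLabel` are hypothetical names.

```python
import csv
from typing import Iterable, List, NamedTuple


class ProsodicLabel(NamedTuple):
    file_name: str
    start: float       # start time as written in the file
    end: float         # end time as written in the file
    unit: str          # transcribed unit, e.g. a word
    prominence: float  # prominence strength
    boundary: float    # boundary strength


def read_prom(lines: Iterable[str]) -> List[ProsodicLabel]:
    """Parse tab-separated .prom lines into records, skipping blank lines.

    A line with fewer than six fields raises ValueError.
    """
    labels = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        name, start, end, unit, prom, boundary = row[:6]
        labels.append(
            ProsodicLabel(name, float(start), float(end), unit, float(prom), float(boundary))
        )
    return labels
```

With `open(path, newline="")` as the line source, `max(read_prom(f), key=lambda lab: lab.prominence)` returns the most prominent unit of an utterance.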

Advanced Usage:

Additional customization of the input signals and wavelet analysis is possible by modifying the configuration file. The default configuration is located in:

wavelet_prosody_toolkit/configs/default.yaml

You can view an online version here: https://github.com/asuni/wavelet_prosody_toolkit/blob/master/wavelet_prosody_toolkit/configs/default.yaml

It is recommended to make a copy of the default.yaml file (e.g. as myconfig.yaml) and to modify the copy. To apply the modified configuration, start the program with:

wavelet_gui --config path/to/myconfig.yaml
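The copy-then-launch workflow above can also be scripted. This sketch only copies the file and builds the argument list for the documented `--config` option; it does not read or validate any configuration keys. The helper names `make_user_config` and `gui_command` are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def make_user_config(default_cfg, user_cfg) -> Path:
    """Copy the default configuration so that edits never touch the original."""
    default_cfg, user_cfg = Path(default_cfg), Path(user_cfg)
    if user_cfg.exists():
        # Refuse to overwrite an existing custom configuration.
        raise FileExistsError(f"{user_cfg} already exists")
    user_cfg.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(default_cfg, user_cfg)
    return user_cfg


def gui_command(user_cfg) -> List[str]:
    """Argument list that starts the GUI with a custom configuration."""
    return ["wavelet_gui", "--config", str(user_cfg)]
```

After editing the copy, `subprocess.run(gui_command(cfg), check=True)` starts the GUI with it, assuming `wavelet_gui` is on the PATH (i.e. a global installation).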

Some helpful shortcuts

Here is a list of shortcuts available in the GUI:

  • CTRL+q to quit
  • F11 to switch between fullscreen and normal mode

Contributors

seblemaguer, asuni
