ZA_LEX: lexical resources for South African languages

This repository contains lexical pronunciation resources and modules for use in text-to-speech (TTS) systems.

Specifically, it was originally set up to track work on updating and enhancing existing resources for the NTTS project funded by the Department of Arts and Culture (DAC) of the Government of South Africa.

The copyright and licence information for scripts in ./scripts/ can be found in ./COPYRIGHT, ./LICENCE-APACHE and ./LICENCE-MIT. This repository also contains data from various sources under different licences in the ./data/* directories. Copyright and licence information for data and third-party components is contained in each individual sub-directory or source file.

For more information contact: Daniel van Niekerk (http://www.nwu.ac.za/must).

Software dependencies

Description of contents

The top level directory structure is summarised as follows:

.
|-- data
|   |-- afr
|   |-- eng
|   |-- sot
|   |-- tsn
|   |-- xho
|   `-- zul
|-- examples
|-- scripts
|-- COPYRIGHT
|-- LICENCE-APACHE
|-- LICENCE-MIT
`-- README.md

  • The data directory contains core language resources organised by language, each with its own LICENCE and README.
  • The examples directory contains example outputs produced by running the scripts as described below.
  • The scripts directory contains implementations and UNIX tools for grapheme-to-phoneme (G2P) conversion, syllabification, word decompounding and morphological analysis (some usage examples are given below).

Usage examples

Decompounding

The decompounder, decomp_simple.py, requires a word list and can be run on the Afrikaans data, for example, as follows:

cut -d " " -f 1 data/afr/pronundict.txt | scripts/decomp_simple.py examples/afr.words5.txt > examples/afr.decomp.txt
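To give a rough idea of what word-list-based decompounding involves, here is a minimal sketch. It is not the algorithm in decomp_simple.py, which this README does not describe. The function name `decompound`, the `min_len` parameter and the tiny lexicon are hypothetical and chosen only for illustration.

```python
def decompound(word, lexicon, min_len=3):
    """Split `word` into two or more parts that all occur in `lexicon`.

    Hypothetical sketch: prefers the longest known prefix at each step and
    returns [word] unchanged when no full split into known parts exists.
    """
    def split(rest, allow_whole):
        if not rest:
            return []
        # At the top level the whole word is excluded so that a compound
        # which is itself in the lexicon can still be split.
        longest = len(rest) if allow_whole else len(rest) - 1
        for end in range(longest, min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = split(rest[end:], True)
                if tail is not None:
                    return [head] + tail
        return None  # no combination of known parts covers `rest`

    parts = split(word, allow_whole=False)
    return parts if parts is not None else [word]


# Illustrative lexicon only; not taken from the repository data.
lexicon = {"sonneblom", "blom", "olie", "tafel", "doek"}
print(decompound("sonneblomolie", lexicon))  # ['sonneblom', 'olie']
print(decompound("tafeldoek", lexicon))      # ['tafel', 'doek']
print(decompound("olie", lexicon))           # ['olie']
```

The recursion backtracks when a chosen prefix leaves a remainder that cannot be covered, which keeps the sketch short at the cost of poor worst-case performance on long words.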

Morphological analysis

The Zulu morphological analyser can be run as follows (simplified output):

cut -f 1 data/zul/ref/nchlt_release_20130328/nchlt_isizulu.dict | scripts/morph_dcg.py data/zul/morphrules.descr.json data/zul/morphrules.dcg.txt --simpleguess > examples/zul.morphsimple.txt

Pronunciation prediction

G2P conversion can be run as follows:

cut -f 1 data/zul/ref/nchlt_release_20130328/nchlt_isizulu.dict | scripts/g2p_icu.py data/zul/phonemeset.json data/zul/g2p.translit.txt > examples/zul.simple.pronun.txt
cut -f 1 data/tsn/ref/nchlt_release_20130328/nchlt_setswana.dict | scripts/g2p_icu.py data/tsn/phonemeset.json data/tsn/g2p.translit.txt > examples/tsn.simple.pronun.txt
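For readers unfamiliar with rule-based G2P, the following sketch shows one common approach: greedy longest-match mapping from graphemes to phonemes. It is an assumption-laden illustration, not the method used by g2p_icu.py; the format of g2p.translit.txt is not described here, and the function `g2p_greedy`, the rule table and the input string are all hypothetical.

```python
def g2p_greedy(word, rules):
    """Map `word` to a list of phoneme symbols using longest-match rules.

    `rules` maps grapheme strings to phoneme symbols (hypothetical format).
    Raises ValueError when no rule covers the current position.
    """
    max_len = max(len(grapheme) for grapheme in rules)
    phones = []
    i = 0
    while i < len(word):
        # Try the longest candidate grapheme first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


# Toy rules for illustration only; not taken from the repository data.
toy_rules = {"a": "a", "b": "b", "h": "h", "s": "s", "sh": "S"}
print(g2p_greedy("shaba", toy_rules))  # ['S', 'a', 'b', 'a']
```

Trying longer graphemes first ensures that a digraph such as "sh" is mapped as a unit rather than as two separate letters.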

The syllabification modules can be run on the resulting pronunciation dictionaries:

cat examples/zul.simple.pronun.txt | scripts/syl_zul.py data/zul/phonemeset.json | cut -f 1,3 > examples/zul.syll.pronun.txt
cat examples/tsn.simple.pronun.txt | scripts/syl_tsn.py data/tsn/phonemeset.json | cut -f 1,3 > examples/tsn.syll.pronun.txt
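The `cut -f 1,3` step in the commands above keeps the first and third tab-separated fields of each output line. The sketch below mirrors that step in Python for cases where the post-processing needs to happen inside a script. It assumes only what the commands imply, namely tab-separated output with at least three columns; the function name `select_fields` and the sample line are hypothetical, and the meaning of each column is not stated in this README.

```python
def select_fields(line, indices, delimiter="\t"):
    """Keep the 1-based fields in `indices`, similar to `cut -f`.

    Like cut, fields are emitted in their original order, indices beyond
    the last field are ignored, and a line without the delimiter is
    passed through unchanged.
    """
    line = line.rstrip("\n")
    if delimiter not in line:
        return line
    fields = line.split(delimiter)
    picked = [fields[i - 1] for i in sorted(set(indices)) if i <= len(fields)]
    return delimiter.join(picked)


# Placeholder line; real column contents depend on the syllabifier output.
print(select_fields("word\tfield2\tfield3\n", [1, 3]))  # word<TAB>field3
```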
