speech_tools's Introduction

              "Building Voices in Festival"
                 Alan W Black, Kevin Lenzo
                 and see ACKNOWLEDGEMENTS
                  http://www.festvox.org

For full details about voice building see the document itself

http://festvox.org/bsv/

The included documentation, scripts and examples should be sufficient for an interested person to build their own synthetic voices, in currently supported languages or in new languages, for the University of Edinburgh's Festival Speech Synthesis System. The quality of the result depends heavily on the time and skill of the builder. For English it may be possible to build a new voice in a couple of days' work; a new language may take months or years. It should be noted that even the best voices in Festival (or any other speech synthesis system for that matter) are still nowhere near perfect quality.

This distribution includes

Support for designing, recording and autolabelling statistical parametric
    synthesis voices
Support for designing, recording and autolabelling diphone databases
Support for designing, recording and autolabelling unit selection dbs
Building simple limited domain synthesis engines
Support for building rule driven and data driven prosody models
   (duration, intonation and phrasing)
Support for building rule driven and data driven text analysis
Support for lexicons and for building Letter to Sound rules
Predefined scripts for building new US (and UK) English voices
Predefined scripts for building grapheme(++) voices for any language
Scripts for designing and selecting prompts to record for
   arbitrary languages

New in 2.8

https://github.com/festvox/festival/
Grapheme built voices can be converted to .flitevox files for android
Database size reduction for random forest clustergen voices
Random Forests for F0 prediction too
18 English voices, and 13 Indic voices

New in 2.7

Random forest model building for spectrum and duration in clustergen
Grapheme based synthesizers (with specific support for large number
  of unicode writing systems)
Clustergen state and stop value optimization
Wavesurfer label support
SPAM F0 support
Phrase break support
Support for SPTK's mgc parameterization

New in 2.3

Support for cygwin tools under Windows
Substantially improved CLUSTERGEN support with mlpg and mlsb

WARNING

This is not a pointy/clicky plug and play program to build new voices. It is instructions with discussion on the problems and an attempt to document the expertise we have gained in building other voices. Although we have tried to automate the task as much as possible this is no substitute for careful correction and understanding of the processes involved. There are significant pointers into the literature throughout the document that allow for more detailed study and further reading.

REQUIREMENTS

A Unix Machine

although there is nothing inherently Unix about the scripts, no
attempt has yet been made to port this to other platforms

Festival and Speech Tools

This uses speech tools programs and festival itself at various
stages in building voices as well as (of course) for the final
voices.  Festival and the Edinburgh Speech Tools are available from

   http://www.cstr.ed.ac.uk/projects/festival/
   
or

   http://www.festvox.org/festival

or

   https://github.com/festvox
   
It is recommended that you compile your own versions of these
as you will need the libraries and include files to build some
programs in this festvox.

Wavesurfer

To display waveforms, spectrograms and phoneme labels.

Patience and understanding

Building a new voice is a lot of work, and something will probably
go wrong which may require the repetition of some long boring and
tedious process.  Even with lots of care a new voice still might 
just not work.  In distributing this document we hope to increase the
basic knowledge of synthesis out there and to find people who can
improve on this, making the process easier and more reliable
in the future.

INSTALLATION

You must have the Edinburgh Speech Tools and Festival installed before you can build the tools in the festvox distribution.

Unpack festvox-2.8-release.tar.gz or clone it from github

git clone https://github.com/festvox/festvox
cd festvox
./configure
make

The configuration tries to find your version of the Edinburgh Speech Tools and uses its configuration to set the compiler type etc., so you must have the Speech Tools installed. If configure fails, try explicitly setting your ESTDIR environment variable to point to your compiled version of the Speech Tools.

Pre-generated versions of the document in HTML and PostScript are distributed in the html/ directory.

If you need to build the document itself, you will need a working version of the docbook tools, which may (or may not) already be installed on your system.

To build the documentation

cd docbook
make doc

Note that even if the documentation build fails you can still use all the scripts and programs.

To use the scripts and programs in the festvox distribution, each user is expected to have the environment variables ESTDIR and FESTVOXDIR set. For example, if you use bash, zsh, ksh or sh:

export ESTDIR=/home/awb/projects/speech_tools
export FESTVOXDIR=/home/awb/projects/festvox
export FLITEDIR=/home/awb/projects/flite
export SPTKDIR=/home/awb/projects/SPTK

Or if you use csh or tcsh

setenv ESTDIR /home/awb/projects/speech_tools
setenv FESTVOXDIR /home/awb/projects/festvox
setenv FLITEDIR /home/awb/projects/flite
setenv SPTKDIR /home/awb/projects/SPTK

Remember to set these to where your installations are, not ours.

speech_tools's People

Contributors

awbcmu, festvox, lenzo-ka, sthibaul, zeehio

speech_tools's Issues

Complement build system with meson?

Would you consider reviewing a pull request that allows the user to build speech_tools, festival and festvox using the meson build system (https://mesonbuild.com/)?

The current build system:

  • has race conditions when building in parallel
  • for festival and festvox it requires that speech tools is found at a specific parent directory
  • does not honor --prefix
  • does not allow out of source builds (i.e. the directory where speech tools is built must be the directory where the sources are)
  • cross-compilation does not seem straightforward

meson is a nice (and modern) alternative. It should be possible to have both build systems (the current one and meson) and if meson works better eventually remove the current one in a future version.

Thanks for any feedback!

Does `int n` in `EST_Wave.copy_sample` select channel?

void copy_sample(int n, short *buf, int offset=0, int num=EST_ALL) const

I am finishing up some haskell bindings to the wonderful festival project. I include text to wave, and thus I am binding to the EST_Wave header as well. I see the formal int n of the method copy_sample specifies the row to copy from the underlying matrix type, but I don't know what that means for the sample. Is this just an index for channels in the EST_Wave? Thank you for your time.

PipeWire support

This seems to currently have support for ALSA and PulseAudio on Linux (judging from #33), and the linked pull request seems to make life easier in adding support for other sound systems.

The Linux desktop is currently in the process of making a transition to PipeWire, which improves upon the previous APIs in the following ways:

  • Capture and playback of audio and video with minimal latency.
  • Real-time multimedia processing on audio and video.
  • Multiprocess architecture to let applications share multimedia content.
  • Seamless support for PulseAudio, JACK, ALSA, and GStreamer applications.
  • Sandboxed applications support. See Flatpak for more info.

(https://pipewire.org/)

PipeWire maintains compatibility libraries for PulseAudio and ALSA, however these still have some of the limitations of their display server. Native PipeWire support would be ideal, particularly as I plan on utilizing this inside of a sandboxed environment, and PipeWire is the only multimedia API that handles that without punching a hole in the sandbox. Aside from that, the other benefits likely don't matter much, but still may be useful.

I don't know C/C++, so I probably can't be of much help, but I can try.

Can't open /dev/dsp

I'm running Linux Mint 19 Cinnamon 3.8.8

I'm trying to run main/na_record and I get the following error

~$ speech_tools/main/na_record -f 16000 -time 5 -o test.wav -otype riff

Linux: can't open /dev/dspfor reading

From searching online, it looks like OSS is no longer supported. I've tried installing some OSS compatibility modules, but still haven't had any success in running ./na_record

It seems like the /dev/dsp error has already been fixed in festival: https://bugs.launchpad.net/ubuntu/+source/festival/+bug/662630
I didn't have any issues with playback in festival either.

There are a few old workarounds listed for festival here: https://wiki.archlinux.org/index.php/Festival#Usage_with_a_Sound_Server

Is there any way I can apply those workarounds to festvox? Or are there any other methods to get recording/playback in festvox to work?

compiled with python wrappers

I would like to compile with python wrappers and the changes I made (simply by uncommenting) to config/config.in are:

## Uncomment following to enable building of wrappers
INCLUDE_MODULES += WRAPPERS

## Only set this if you *DO* want to run swig (for example to modify
## the wrappers yourself).  If so, the safest bet is to use the same
## version of swig as speech tools developers (download from
## http://www.swig.org/ (SWIG-3.02 last tested))
##
CONFIG_SWIG_COMPILER = /usr/local/bin/swig

# Languages to generate wrappers for. Currently: PYTHON
# PERL5 is no longer supported
CONFIG_WRAPPER_LANGUAGES = PYTHON 

# Language specific includes should be set to correct site paths
CONFIG_PYTHON_INCLUDES= -I/usr/include/python2.7/

and then
./configure
make info
make

and ran into this problem:

Making in directory ./wrappers ...
Making in directory wrappers/interface ...
Making in directory wrappers/interface/python ...
Making new directory wrappers/python
mkdir: python: File exists
make[2]: *** No rule to make target `update'.  Stop.
make[1]: *** [python] Error 2
make: *** [wrappers] Error 2

I'm using speech_tools 2.5, Apple LLVM version 9.1.0 (clang-902.0.39.2)
Does anyone have any advice? Thanks in advance!

finite(X) not found on Mac

In order to compile the festival repo I had to add the following to speech_tools/include/EST_math.h:

#define finite(X) isfinite(X)

For some reason it didn't cause an issue when compiling the speech_tools repo...
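
A slightly more defensive variant of that workaround is sketched below. It only defines finite() when no macro of that name exists, and it assumes the platform's <math.h> provides the C99 isfinite() macro. The helper est_is_finite is hypothetical and exists only to exercise the macro; it is not part of speech_tools.

```c
#include <math.h>

/* Fall back to the C99 isfinite() macro when finite() is not already
   defined as a macro. Assumption: <math.h> provides isfinite(). */
#ifndef finite
#define finite(X) isfinite(X)
#endif

/* Hypothetical helper (not in speech_tools): returns 1 when x is
   neither infinite nor NaN, 0 otherwise. */
int est_is_finite(double x)
{
    return finite(x) ? 1 : 0;
}
```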

Fails to link under gcc 10

Hello.
When checking the logs after a Fedora mass rebuild using GCC 10, I came across a link failure in speech_tools. GCC 10 defaults to -fno-common, so the build needs -fcommon added to the gcc args (or the duplicate definition removed) to link.
The error was:

Link Shared Library estools
if [ -n "" ] ; then libs='     -lncurses -lasound  -ldl  -lm  -lstdc++
-fopenmp ' ; fi ;\
gcc -shared -o libestools.so.2.5.0.1 shared_space/*.o  $libs
/usr/bin/ld: shared_space/siodeditline.o:(.data.rel.local+0x8): multiple
definition of `editline_history_file';
shared_space/editline.o:(.bss+0x28): first defined here
collect2: error: ld returned 1 exit status
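
GCC 10 changed its default from -fcommon to -fno-common, so an uninitialized global that is defined in two source files no longer merges silently at link time. A source-level fix might look like the sketch below: one extern declaration shared by both files, and exactly one definition. The variable's type and the helper history_file_or are assumptions made for illustration, and both parts are shown in a single file only so that the sketch compiles on its own.

```c
#include <stddef.h>

/* What a shared header would carry: a declaration only, so that
   including it from several .c files creates no extra definitions.
   Assumption: the real variable may have a different type. */
extern const char *editline_history_file;

/* What exactly one .c file would carry: the single definition. */
const char *editline_history_file = NULL;

/* Hypothetical helper: returns the configured history file, or the
   given fallback when none has been set. */
const char *history_file_or(const char *fallback)
{
    return editline_history_file != NULL ? editline_history_file : fallback;
}
```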

../include/EST_math.h:87:29: error: '__isnanf' was not declared in this scope

/* Linux (and presumably Hurd too as Linux is GNU libc based) */           
/* Sorry I haven't confirmed this cpp symbol yet              */           
#if defined(linux)                                                         
#define isnanf(X) __isnanf(X)                                              
#endif 

In the case of musl based distributions, such as alpine linux, this assumption is incorrect.

I recommend the following fix,

#if defined(linux) && defined(__GLIBC__)
#define isnanf(X) __isnanf(X)
#endif

#ifndef isnanf	
#define isnanf isnan	
#endif
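
As a quick check of the guard above, the sketch below wraps it in a small function. The name est_float_is_nan is hypothetical and not part of speech_tools. On glibc the first branch should be taken when the compiler predefines linux (GCC does so in its default GNU mode); on musl, or under a strict -std=c99 build, the isnan fallback is used instead.

```c
#include <math.h>

/* Use glibc's internal __isnanf only when building against glibc. */
#if defined(linux) && defined(__GLIBC__)
#define isnanf(X) __isnanf(X)
#endif

/* Everywhere else (e.g. musl), fall back to the C99 isnan() macro. */
#ifndef isnanf
#define isnanf isnan
#endif

/* Hypothetical helper: returns 1 if x is NaN, 0 otherwise. */
int est_float_is_nan(float x)
{
    return isnanf(x) ? 1 : 0;
}
```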

Include license text in a source tarball

Hello.
Would it be possible to include the license text in a release tarball and as a separate file in the repository? It would help the packaging efforts very much.

make requires libncurses-dev

Putting this here in case anyone also gets the same error.

During ./configure:

checking for tputs in -lncurses... no

and during make:

/usr/bin/ld: siod/editline.c:403: undefined reference to `tgetstr'

It seems libncurses-dev is needed, not just libncurses, so

sudo apt install libncurses-dev

There might also be another make error:

/usr/bin/ld: cannot find -lcurses: No such file or directory

I have libncurses6, so this is solved by:

sudo ln -s /usr/lib/x86_64-linux-gnu/libncurses.so.6 /usr/lib/x86_64-linux-gnu/libcurses.so

potential integer underflow

Hi,

I am wondering if there might exist an integer underflow error:

  1. comm_samples can be any integer:

    if (ts.fread(&comm_samples, sizeof(int), 1) != 1)

  2. If length is zero, then data_length can be a negative integer:

    data_length = (comm_samples-offset)*comm_channels;

  3. So num_samples can be also a negative integer:

    *num_samples = data_length/comm_channels;

  4. Call to fread with the negative integer:

    if ((n=ts.fread(file_data,get_word_size(actual_sample_type),

  5. memcpy with a negative length is unsafe:

    memcpy(buff,&buffer[pos],items_read*size);

Thanks for your time.
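
One way to guard against the sequence described above would be to validate the header fields before any arithmetic is done on them. The sketch below is a standalone illustration, not the speech_tools code: checked_data_length and its parameters are hypothetical names that mirror the variables quoted in the steps, and the INT_MAX limit assumes the result is later stored in an int, as the quoted fragments suggest.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in speech_tools): computes the number of
   bytes to read from untrusted header fields. Returns 0 and stores
   the byte count in *out_bytes on success, or -1 if any field is out
   of range or the result would overflow. */
int checked_data_length(int comm_samples, int comm_channels,
                        int offset, size_t word_size, size_t *out_bytes)
{
    /* Reject negative counts and a non-positive channel count. */
    if (comm_samples < 0 || comm_channels <= 0 || offset < 0)
        return -1;
    /* An offset past the end would make the difference negative. */
    if (offset > comm_samples)
        return -1;
    if (word_size == 0)
        return -1;

    size_t frames = (size_t)(comm_samples - offset);
    size_t channels = (size_t)comm_channels;

    /* Check each multiplication before performing it. */
    if (frames != 0 && channels > SIZE_MAX / frames)
        return -1;
    size_t items = frames * channels;

    /* Assumption: the item count is later kept in an int. */
    if (items > (size_t)INT_MAX)
        return -1;
    if (items != 0 && word_size > SIZE_MAX / items)
        return -1;

    *out_bytes = items * word_size;
    return 0;
}
```

A caller would perform the read and the memcpy only when the function returns 0, using *out_bytes as the length.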
