
flite's Introduction

     Flite: a small run-time speech synthesis engine
                  version 2.1-release
      Copyright Carnegie Mellon University 1999-2022
                  All rights reserved

Flite is an open source small fast run-time text to speech engine. It is the latest addition to the suite of free software synthesis tools including University of Edinburgh's Festival Speech Synthesis System and Carnegie Mellon University's FestVox project, tools, scripts and documentation for building synthetic voices. However, flite itself does not require either of these systems to compile and run.

The core Flite library was developed by Alan W Black (mostly in his so-called spare time) while employed in the Language Technologies Institute at Carnegie Mellon University. The name "flite", originally chosen to mean "festival-lite", is perhaps doubly appropriate, as a substantial part of the design and coding was done over 30,000ft while awb was travelling and (usually) not in meetings.

The voices, lexicon and language components of flite, both their compression techniques and their actual contents, were developed by Kevin A. Lenzo and Alan W Black.

Flite is the answer to the complaint that Festival is too big, too slow, and not portable enough.

o Flite is designed for very small devices, such as phones, portables, PDAs, and also for large server machines which need to serve lots of ports.

o Flite is not a replacement for Festival but an alternative run-time engine for voices developed in the FestVox framework where size and speed are crucial.

o Flite is written entirely in ANSI C. It contains no C++ or Scheme, and thus requires more care in programming and is harder to customize at run-time.

o It is thread safe

o Voices, lexicons and language descriptions can be compiled (mostly automatically for voices and lexicons) into C representations from their FestVox formats

o All voices, lexicons and language model data are const and in the text segment (i.e. they may be put in ROM). As they are linked in at compile time, there is virtually no startup delay. Voices may also be loaded from a single file (or across an http connection).

o Although the synthesized output is not exactly the same as the same voice in Festival they are effectively equivalent. That is, flite doesn't sound better or worse than the equivalent voice in festival, just faster, smaller and scalable.

o For standard diphone voices, maximum run-time memory requirements are less than approximately twice the memory required for the generated waveform. For 32-bit architectures this effectively means under 1M.

o The flite program supports synthesis of individual strings or files (utterance by utterance) to direct audio devices or to waveform files.

o The flite library offers simple functions suitable for use in specific applications.
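The memory bullet above can be turned into a rough estimate. The sketch below is not part of flite: it assumes 16-bit mono samples, and the function names are made up for illustration. It computes the size of a generated waveform and the "twice the waveform" bound for diphone voices.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption of this sketch: one sample is a 16-bit (2-byte) mono value. */
#define BYTES_PER_SAMPLE 2u

/* Bytes needed to hold `seconds` of audio at `sample_rate` Hz. */
size_t waveform_bytes(double seconds, unsigned sample_rate)
{
    assert(seconds >= 0.0 && sample_rate > 0);
    return (size_t)(seconds * sample_rate) * BYTES_PER_SAMPLE;
}

/* Rough upper bound quoted for diphone voices: about twice the waveform. */
size_t diphone_runtime_bound(double seconds, unsigned sample_rate)
{
    return 2u * waveform_bytes(seconds, sample_rate);
}
```

For a 10-second utterance at 8KHz, waveform_bytes(10.0, 8000) is 160000 bytes, so the bound comes to 320000 bytes. Shorter utterances give proportionally smaller figures.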

Flite is distributed with a single 8K diphone voice (derived from the cmu_us_kal voice), a pruned lexicon (derived from cmulex) and a set of models for US English. Here are comparisons with Festival using basically the same 8KHz diphone voice:

            Flite    Festival
core code    60K      2.6M
USEnglish    100K     ??
lexicon      600K     5M
diphone      1.8M     2.1M
runtime      <1M      16-20M

On a 500MHz PIII, a timing test of the first two chapters of "Alice in Wonderland" (doc/alice) was done. This produces about 1300 seconds of speech. With flite it takes 19.128 seconds (about 70.6 times faster than real time); with Festival it takes 97 seconds (13.4 times faster than real time). On the ipaq (with the 16KHz diphones) flite synthesizes 9.79 times faster than real time.
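The "times faster than real time" figures are simply the duration of the generated speech divided by the time taken to synthesize it. A minimal sketch of that arithmetic (the function name is illustrative, not part of flite):

```c
#include <assert.h>

/* How many times faster than real time a synthesis run was:
   seconds of speech produced per second of processing. */
double realtime_factor(double speech_seconds, double synthesis_seconds)
{
    assert(speech_seconds >= 0.0 && synthesis_seconds > 0.0);
    return speech_seconds / synthesis_seconds;
}
```

With the rounded figure of 1300 seconds of speech, realtime_factor(1300.0, 97.0) is about 13.4, in line with the Festival number above. Because 1300 is itself approximate, the same calculation for flite lands near the quoted figure rather than exactly on it.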


Requirements:

o A good C compiler. Some of these files are quite large and some C
  compilers might choke on them; gcc is fine.  Sun CC 3.01 has been
  tested too.  Visual C++ 6.0 is known to fail on the large diphone
  database files.  We recommend you use GCC under Windows Subsystem
  for Linux, Cygwin or mingw32 instead.

o GNU Make

o An audio device isn't required as flite can write its output to 
  a waveform file. 

Supported platforms:

We have successfully compiled and run on

o Various Intel Linux systems (and iPaq Linux), under various versions
  of GCC (2.7.2 to 10.x)

o Mac OS X

o Various Android devices

o Various openwrt devices

o FreeBSD 3.x and 4.x

o Solaris 5.7, and Solaris 9

o Windows 2000/XP and later under Cygwin 1.3.5 and later

o Windows 10/11 with Windows Subsystem for Linux

o Successfully compiles and runs under 64Bit Linux architectures

o OSF1 V4.0 (gives an unimportant warning about sizes when compiling cst_val.c)

o WASI has experimental support (see below for details)

Previously we supported PalmOS and Windows CE but these seem to be rare nowadays so they are no longer actively supported.

Other similar platforms should just work; we have also cross compiled on a Linux machine for various ARM and MIPS processors. However, note that new byte order architectures may not work directly as there are some careful byte order constraints in some structures. These are portable but may require reordering of some fields; contact us if you are moving to a new architecture.


New in 2.3 (Mar 2022)

o Fixed features, now grapheme voices are much closer to
Festival quality

New in 2.2 (Oct 2018)

o Better grapheme support (Wilderness Languages) hundreds of new

New in 2.1 (Oct 2017)

o Improved Indic front end support (thanks to Suresh Bazaj
@ Hear2Read)

o 18 English Voices (various accents)

o 12 Indian Voices (Bengali, Gujarati, Hindi, Kannada, Marathi,
Panjabi, Tamil and Telugu) usually bilingual (with English)

o Can do byteswap architectures [again] (ar9331 yun arduino, zsun etc)

o flitecheck front-end test suite

o grapheme based festvox builds give working flitevox voices

o SAPI support for CG voices (thanks to Alok Parlikar @ Cobalt
Speech and Language INC)

o gcc 6.x-10.x support

o .flitevox files (and models) 40% of previous size, but
same quality

New in 2.0.0 (Dec 2014)

o Indic language support (Hindi, Tamil and Telugu)

o SSML support

o CG voices as files accessible by file:/// and http://
  (and set of 13 voices to load)

o random forest (multimodel support) improves voice quality

o Supports different sample rates/mgc order to tune for speed

o Kal diphone 500K smaller

o Fixed lots of API issues

o thread safe (again) [after initialization]

o Generalized tokenstreams (used in Bard Storyteller)

o simple-Pulseaudio support

o Improved Android support

o Removed PalmOS support from distribution

o Companion multilingual ebook reader Bard Storyteller

New in 1.4.1 (March 2010)

o better ssml support (actually does something)

o better clunit support (smaller)

o Android support

New in 1.4 (December 2009)

o crude multi-voice selection support (may change)

o 4 basic voices are included: 3 clustergen (awb, rms and slt) plus
  the kal diphone database

o CMULEX now uses maximum onset for syllabification

o alsa support

o Clustergen support (including mlpg with mixed excitation),
  but it is still slow on limited processors

o Windows support with Visual Studio (specifically for the Olympus
  Spoken Dialog System)

o WinCE support is redone with cegcc/mingw32ce, with an
  example TTS app: Flowm: Flite on Windows Mobile

o Speed-ups in feature interpretation limiting calls to alloc

o Speed-ups (and fixes) for converting clunits festvox voices

New in 1.3-release (October 2005)

o fixes to lpc residual extraction to give better quality output

o An updated lexicon (festlex_CMU from festival-2.0.95) and better
  compression; it's about 30% of the previous size, with about
  the same accuracy

o Fairly substantial code movements to better support PalmOS and
  multi-platform cross compilation builds

o A PalmOS 5.0 port with a small example talking app ("flop")

o runs under ix86_64 linux

New in 1.2-release (February 2003)

o A build process for diphone and clunits/ldom voices: FestVox voices can be converted (sometimes) automatically

o Various bug fixes

o Initial support for Mac OS X (not talking to audio device yet)
  but compiles and runs

o Text files can be synthesized to a single audio file

o (optional) shared library support (Linux)


In general

tar zxvf flite-2.3-current.tar.gz
cd flite-2.3-current
./configure
make
make get_voices

Where tar is gnu tar (gtar), and make is gnu make (gmake).


git clone
cd flite
make get_voices

Configuration should be automatic, but it may not work in all cases, especially if you have a new compiler. You can explicitly set the compiler in config/config and add any options you see fit. Configure tries to guess these, but it might be unable to do so for cross compilation cases. Interesting options there are

-DWORDS_BIGENDIAN=1  for bigendian machines (e.g. Sparc, M68x, ar9331)
-DNO_UNION_INITIALIZATION=1  For compilers without C99 union initialization
-DCST_AUDIO_NONE     if you don't need/want audio support
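When cross-compiling, configure may not be able to tell whether -DWORDS_BIGENDIAN=1 is needed. One way to check is to run a small probe on the target itself. The following is a generic C sketch, not code taken from flite, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the machine running this code is big-endian, 0 otherwise. */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char bytes[sizeof probe];

    memcpy(bytes, &probe, sizeof probe);   /* inspect the in-memory layout */
    assert(bytes[0] != bytes[1]);
    return bytes[0] == 0x01;               /* most significant byte first? */
}

/* Swap the two bytes of a 16-bit value, as needed when 16-bit data
   written on one byte order is read on the other. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}
```

If host_is_big_endian() returns 1 on the target, that is a hint that -DWORDS_BIGENDIAN=1 belongs in the compiler options. The probe has to run on the target, not on the build machine.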

There are different sets of voices and languages you can select between them (and your own sets if you make config/ For example

./configure --with-langvox=transtac

Will use the languages and voices defined in config/

Cross-compiling to WASI (experimental)

In order to successfully cross-compile to WASI, firstly head over to CraneStation/wasi-sdk and install the WASI toolchain.

Afterwards, you can cross-compile to WASI as follows:

./configure --host=wasm32-wasi \
CC=/path/to/wasi-sdk/bin/clang \
AR=/path/to/wasi-sdk/bin/llvm-ar \
RANLIB=/path/to/wasi-sdk/bin/llvm-ranlib

It is important to correctly specify the ar and ranlib that are bundled with the WASI clang. Otherwise, you will most likely experience missing symbols during linking, plus you may experience weird llvm errors such as

LLVM ERROR: malformed uleb128, extends past end

When cross-compiling from macOS, you might have to manually specify the sysroot. You can do this by tweaking the CC variable as follows:

CC="/path/to/wasi-sdk/bin/clang --sysroot=/path/to/wasi-sdk/share/sysroot"

After the configure step is successful, simply run make as usual.


The generated WASI binary can then be found in bin/ directory:

file bin/flite
> bin/flite: WebAssembly (wasm) binary module version 0x1 (MVP)


The ./bin/flite binary contains all supported voices and you may choose between the voices with the -voice flag and list the supported voices with the -lv flag. Note the kal (diphone) voice is a different technology from the others and is much less computationally expensive but more robotic. For each voice additional binaries that contain only that voice are created in ./bin/flite_FULLVOICENAME, e.g. ./bin/flite_cmu_us_awb. You can also refer to external clustergen .flitevox voice via a pathname argument with -voice (note the pathname must contain at least one "/")

If it compiles properly a binary will be put in bin/. Note that by default -g is on, so it will be bigger than is actually required.

./bin/flite "Flite is a small fast run-time synthesis engine" flite.wav

Will produce an 8KHz riff headered waveform file (riff is Microsoft's wave format often called .WAV).

./bin/flite doc/alice

Will play the text file doc/alice. If the first argument contains a space it is treated as text; otherwise it is treated as a filename. If a second argument is given, a waveform file is written to it; if no second argument is given or "play" is given, it will attempt to write directly to the audio device (if supported). If "none" is given, the audio is simply thrown away (used for benchmarking). Explicit options are also available.
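The argument rules just described can be summarized in a few lines of C. This is a sketch of the rules as stated in this README, not a copy of the logic in the flite program itself; the type and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef enum { INPUT_TEXT, INPUT_FILE } input_kind;
typedef enum { OUTPUT_PLAY, OUTPUT_NONE, OUTPUT_WAVE_FILE } output_kind;

/* First argument: text if it contains a space, otherwise a filename. */
input_kind classify_first_arg(const char *arg)
{
    assert(arg != NULL);
    return strchr(arg, ' ') != NULL ? INPUT_TEXT : INPUT_FILE;
}

/* Second argument: missing or "play" means the audio device,
   "none" discards the audio, anything else names a waveform file. */
output_kind classify_output_arg(const char *arg)
{
    if (arg == NULL || strcmp(arg, "play") == 0)
        return OUTPUT_PLAY;
    if (strcmp(arg, "none") == 0)
        return OUTPUT_NONE;
    return OUTPUT_WAVE_FILE;
}
```

One consequence of the first rule is that a single word of text with no space, such as "hello", would be taken as a filename; the explicit options avoid that ambiguity.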

./bin/flite -v doc/alice none

Will synthesize the file without playing the audio and give a summary of the speed.

./bin/flite doc/alice alice.wav

Will synthesize the whole of alice into a single file (previous versions would only give the last utterance in the file, but that is fixed now).

An additional set of feature-setting options is available; these are debug options. Voices are represented as sets of feature values (see lang/cmu_us_kal/cmu_us_kal.c) and you can override values on the command line. This can stop flite from working if malicious values are set, and therefore this facility is not intended to be made available to standard users. But these options are useful for debugging. Some typical examples are

Use simple concatenation of diphones without prosodic modification

./bin/flite --sets join_type=simple_join doc/intro

Print sentences as they are said

./bin/flite -pw doc/alice

Make it speak slower

./bin/flite --setf duration_stretch=1.5 doc/alice

Make it speak higher pitch

./bin/flite --setf int_f0_target_mean=145 doc/alice

The talking clock is an example program. It requires a single argument HH:MM; under Unix you can call it

./bin/flite_time `date +%H:%M`
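A program that accepts an HH:MM argument has to validate it before use. The sketch below shows one way to do that in C. It is an illustration only; it does not reproduce how flite_time parses its argument, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a 24-hour "HH:MM" string. Returns 1 and fills hour/minute
   on success, 0 if the string is malformed or out of range. */
int parse_hhmm(const char *s, int *hour, int *minute)
{
    int h, m, used = 0;

    assert(s != NULL && hour != NULL && minute != NULL);
    /* %n records how many characters were consumed, so trailing
       junk such as "12:345" can be rejected. */
    if (sscanf(s, "%2d:%2d%n", &h, &m, &used) != 2 || s[used] != '\0')
        return 0;
    if (h < 0 || h > 23 || m < 0 || m > 59)
        return 0;
    *hour = h;
    *minute = m;
    return 1;
}
```

The output of date +%H:%M always passes this check, while strings such as "25:00" or "12:345" are rejected.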

List the voices linked in directly in this build

./bin/flite -lv

Speak with the US male rms voice (builtin version)

./bin/flite -voice rms -f doc/alice

Speak with the "Scottish" male awb voice (builtin version)

./bin/flite -voice awb -f doc/alice

Speak with the US female slt voice

./bin/flite -voice slt -f doc/alice

Speak with AEW voice, download on the fly from

./bin/flite -voice -f doc/alice

Speak with AHW voice loaded from the local file.

./bin/flite -voice voices/cmu_us_ahw.flitevox -f doc/alice

You can download the available voices into voices/

./bin/get_voices us_voices


./bin/get_voices indic_voices

Voice quality

So you've eagerly downloaded flite, compiled it and run it, and now you are disappointed that it doesn't sound wonderful. Sure, it's fast and small, but what you really hoped for was the dulcet tones of a deep baritone voice that would make you desperately hang on every phrase it mellifluously produces. But instead you get an 8KHz diphone voice that sounds like it came from the last millennium.

Well, first, you are right, it is an 8KHz diphone voice from the last millennium, and that was actually deliberate. As we developed flite we wanted a voice that was stable and that we could directly compare with that very same voice in Festival. Flite is an engine. We want to be able to take voices built with the FestVox process and compile them for flite; the result should be exactly the same quality (though of course trading the size for quality in flite is also an option). The included voice is just a sample voice that was used in the testing process.

We expect that often voices will be loaded from external files, and we have now set up a voice repository in*.flitevox

If you visit there with a browser you can hear the examples. You can also download the .flitevox files to your machine so you don't need a network connection every time you need to load a voice.

We are now actively adding to this list of available voices in English (16) and other languages.

Bard is a companion app that reads ebooks, both displaying them and actually reading them to you out loud using flite. Bard supports a wide range of fonts, and flite voices, and books in text, html and epub format. Bard is used as an evaluation of flite's capabilities and an example of a serious application using flite.


flite's Issues

cmu_us_slt couldn't be loaded


I loaded flite in VS2017.
VS2017 did some upgrades, and now I have


There is also the project "cmu_us_slt", but it's marked as "(not available)".

I went into the folder "flite-master\lang\cmu_us_slt", and I didn't see a vcprj or vcproj file there.

What is the "cmu_us_slt" about, and do I not need it?

find_sts_main.c fails to compile on mingw-w64 clang

ccache clang -mtune=generic -O2 -pipe -Wall -DCST_NO_SOCKETS -DUNDER_WINDOWS -DWIN32     -D_FORTIFY_SOURCE=0 -D__USE_MINGW_ANSI_STDIO=1  -I../include  -c -o find_sts_main.o find_sts_main.c
In file included from find_sts_main.c:47:
In file included from ../include\cst_args.h:43:
In file included from ../include/cst_features.h:44:
In file included from ../include/cst_val.h:43:
In file included from ../include/cst_file.h:63:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\windows.h:69:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\windef.h:8:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\minwindef.h:163:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\winnt.h:1554:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\x86intrin.h:15:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\immintrin.h:18:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\xmmintrin.h:3005:
D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\emmintrin.h:4224:6: error: conflicting types for '_mm_clflush'
void _mm_clflush(void const * __p);
D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\emmintrin.h:4224:6: note: '_mm_clflush' is a builtin with type 'void (const void *)'
1 error generated.

Due to defining const to empty before including any system header, it breaks the function declaration for builtin functions


[2.1] symbols removed but no soname bump

While trying to package flite version 2.1 for Debian¹, I noticed that three symbols (cst_read_2d_array, cst_read_array and cst_rx_not_indic) were dropped with respect to version 2.0. I was wondering if bumping of the soname was just forgotten or if there is anything else at stake.

Can you either bump the soname, or let me know what you think I should do instead?


Add support for wasm32-wasi target

I know that WASI is currently experimental as a compilation target, but nonetheless, I did manage to get flite compiled to it and run using CraneStation/wasmtime runtime. If it's of any interest, I'd be more than happy to submit a preliminary PR and work on it to get it merged into master. Also, I'd be more than happy to monitor any changes to WASI in the future and submit any relevant updates.

How to convert FestVox voices to Flite?

From google/language-resources#31. I cannot convert FestVox voices to Flite.

gcc -g -O2 -Wall     -o flite_goog_th_unison flite_main.o flite_voice_list.o flite_lang_list.o -L . -lgoog_th_unison   -lflite_cmu_th_lang -lflite_cmu_th_lex -L/usr/local/src/tools/flite/build/x86_64-linux-gnu/lib -lflite   -lm  
/usr/bin/ld: cannot find -lflite_cmu_th_lang
/usr/bin/ld: cannot find -lflite_cmu_th_lex
collect2: error: ld returned 1 exit status
Makefile:108: recipe for target 'flite_goog_th_unison' failed
make: *** [flite_goog_th_unison] Error 1

{macOS} Speak from command line

I can compile and run flite OK on macOS (10.14.6), but I can only generate .wav files - it won't read text from the command line or a file as in this example:

./bin/flite doc/alice

What do I need to set up to be able to do that? I don't see a way to select an audio device.

Some questions about resource usage

I'm trying to find some more info about the library, I hope this is the right place to ask. I'm still very much a beginner when it comes to flite, so if anyone happens to know about any of this it would be incredibly helpful.

I'm attempting to get this library running on a resource-constrained platform, more specifically a 32-bit microcontroller with ~500 kB available RAM, 512kB ROM reserved for TTS, and plenty of flash storage. The plan is to output the resulting speech audio over i2s in real time.


About the following statement in the readme: "For standard diphone voices, maximum run time memory requirements are approximately less than twice the memory requirement for the waveform generated."

  • Does this mean splitting text into sentences, or even words, can reduce the RAM requirement because the "waveform" will be shorter?
  • If so, would feeding individual words impact speech quality with the default US english lexicon?
  • Is this the same "runtime" spec listed at <1M in the readme's memory comparison table? (Or are there other metrics that heavily affect RAM usage?)

About the other memory requirements; as I understand it: core (60k) + USEnglish (100k) + lexicon (600k) + diphone (1800k) can all potentially be stored in ROM instead of RAM

  • Is this correct?
  • Is there any hope of moving at least the diphone and lexicon to NAND flash instead of RAM/ROM?
  • If so, how do I approach this?

Any pointers in the right direction are welcome! Including possible approaches as to how I might find some answers myself.

GCC 11.2.1 "does not match original declaration" warnings with LTO

When compiling flite-2.2 on Fedora development branch (rawhide/f36), I'm getting the following warnings:

making ../build/x86_64-linux-gnu/lib/
../../lang/cmulex/cmu_lex.c:49:27: warning: type of 'cmu_lex_phone_table' does not match original declaration [-Wlto-type-mismatch]
   49 | extern const char * const cmu_lex_phone_table[54];
      |                           ^
../../lang/cmulex/cmu_lex_entries.c:14:20: note: array types have different bounds
   14 | const char * const cmu_lex_phone_table[57] =
      |                    ^
../../lang/cmulex/cmu_lex_entries.c:14:20: note: 'cmu_lex_phone_table' was previously declared here
making ../build/x86_64-linux-gnu/lib/
../../lang/cmu_grapheme_lex/cmu_grapheme_lex.h:47:27: warning: type of 'unicode_sampa_mapping' does not match original declaration [-Wlto-type-mismatch]
   47 | extern const char * const unicode_sampa_mapping[16663][5];
      |                           ^
../../lang/cmu_grapheme_lex/grapheme_unitran_tables.c:9:20: note: array types have different bounds
    9 | const char * const unicode_sampa_mapping[16798][5] =
      |                    ^
../../lang/cmu_grapheme_lex/grapheme_unitran_tables.c:9:20: note: 'unicode_sampa_mapping' was previously declared here

Avoid allocation of a buffer for the whole WAV file when streaming

This is a feature request.

Currently, when using streaming, a buffer is allocated large enough to hold a complete wav and each chunk is accessed through the start parameter of the callback function.

int my_stream_chunk(const cst_wave *w, int start, int size,
                    int last, cst_audio_streaming_info *asi)
{
    /* each call of this function sees the same buffer and a
       different start value */
    return CST_AUDIO_STREAM_CONT;
}

cst_audio_streaming_info *asi = cst_alloc(struct cst_audio_streaming_info_struct,1);
asi->min_buffsize = 256;
asi->asc = my_stream_chunk;
asi->userdata = NULL;
/* attach the streaming info to the voice so the callback is used */
feat_set(v->features, "streaming_info", audio_streaming_info_val(asi));

cst_wave * wav = flite_text_to_wave(text_to_synth,v);

On embedded systems, memory is limited and sometimes the whole wav file is not required. For example when streaming to external devices.

The fact that space for a whole file is allocated limits the length of the text that can be sent for synthesis.

A different approach could be to allocate enough space for the largest chunk and reuse it each time my_stream_chunk is called.
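The proposal above can be illustrated with a small, self-contained sketch. It does not use flite and does not describe how flite currently behaves: next_sample is a hypothetical stand-in for the synthesizer, and chunk_sink, stream_in_chunks and count_sink are names invented for this example. The point is that only one chunk-sized buffer exists, and it is refilled for every callback.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SAMPLES 256

/* Callback receiving one chunk; a non-zero return stops streaming. */
typedef int (*chunk_sink)(const short *samples, size_t count,
                          int last, void *userdata);

/* Hypothetical sample source standing in for the synthesizer. */
static short next_sample(size_t index)
{
    return (short)(index % 100);
}

/* Produce total_samples samples, handing them to sink one chunk at
   a time. Memory use is bounded by CHUNK_SAMPLES, not by the total. */
size_t stream_in_chunks(size_t total_samples, chunk_sink sink, void *userdata)
{
    short chunk[CHUNK_SAMPLES];   /* the only audio buffer; reused */
    size_t done = 0;

    assert(sink != NULL);
    while (done < total_samples) {
        size_t n = total_samples - done;
        if (n > CHUNK_SAMPLES)
            n = CHUNK_SAMPLES;
        for (size_t i = 0; i < n; i++)
            chunk[i] = next_sample(done + i);
        done += n;
        if (sink(chunk, n, done == total_samples, userdata) != 0)
            break;
    }
    return done;
}

/* Example sink: counts the samples it is given. */
int count_sink(const short *samples, size_t count, int last, void *userdata)
{
    (void)samples;
    (void)last;
    *(size_t *)userdata += count;
    return 0;
}
```

With this shape, the length of the text only affects how many times the callback runs, not how much memory is held at once. The trade-off is that the sink must consume or copy each chunk before returning, because the buffer is overwritten on the next call.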

Library docs

I'm really sorry if I've missed anything, but it seems that readme doesn't include link to library documentation. Where can I find it?

P.S.: I'm also looking for list of source files which are part of library itself. Is it just everything under src/?

How to configure default global settings such as voice?

Is there some way like a configuration file to set global defaults for things like the voice?
I am trying to use flite as the TTS backend for Okular (Document viewer) but I'm unable to use a voice other than the default kal16.

Tutorial explaining flite build process for new languages

This is a ToDo and I am hoping to get to this the last weekend in October. The idea is to build a tutorial describing the procedure to build a deployable voice in one Indian language that can expose the API capabilities of flite.

Is there a way to increase/add the pause between words?

Although flite is very fast, it sounds like the words are attached together while speaking, the whitespace that should separate the words is hard to determine in the file that is generated.

For example this line, when spoken the words are attached (I have found that for almost all words in a sentence), I can recognize the words when I'm looking at the text, but it's hard when the text is not there.

flite -ps -t "hello my name is John Doe!"


pau hh ax l ow m ay n ey m ih z jh aa n d ow pau

And when spoken, (without ps flag), the sound is exactly like that. The pauses are only between the sentences and not between the words.

I looked through the documentation without finding anything. I also looked through the code to see if I could increase the pause duration, but I couldn't find anything at all.

I found it hard to imagine I'm the only one who noticed this but I couldn't find anything on it so I'm making this issue.

  • flite version: flite-2.3-current Mar 2022 (
  • OS: Arch Linux x86_64
  • Kernel: 5.17.6-arch1-1

memory leak problem

There's a memory leak in the function "ffeature_string". Can you fix it?

Multisyn voice integration


Is there any way that we can convert a voice built using Multisyn in Festival into a flite voice? I can't seem to find any way to do it.

Token to Words - How to keep track of which words belong to a token?

Hey everyone, so I'm stuck on a problem: I need to send user-inputted text through Flite, and then display the original text on screen with synced up word highlighting. The problem is that when a token gets expanded into multiple words (1983 -> Nineteen Eighty Three) I can't find a way to keep these words "grouped" together so that I can then sync all three words up to the original highlighted token "1983". I've tried modifying the us_tokentowords function so that it returns all the words in a single string, but I can't quite get it to work. Has anyone here come up with a solution to any similar problems? Any help would be much appreciated, thanks!

How to build shared libraries on macOS?

configure --enable-shared doesn't seem to enable building of shared libraries on macOS, while it is working fine in Linux.

Tested on Catalina (x86_64) and Monterey (arm64).

Indic voice builds broken as of commit e988047

Some changes that were introduced to the voice templates (mostly for grapheme voices) now break builds of indic voices.

In particular, this does not get defined in indic voices, since they are part of indic lang.

Windowed join functions

Is there any particular reason that windowed join is not implemented in Flite?

Are there any plans to include it in the future?

MBROLA voices?

Hey there!

Festival has support for Mbrola voices, which is pretty cool. I'd like to know whether it's possible to use them here in Flite too? I know there's a way of converting festvox to flitevox files, but I'm not sure how Mbrola is handled.


Distortion with voice "clb"

Using latest Linux openSUSE, when using flite (compiled from source) with voice clb with a command line such as:
padsp flite -voice ~/gitprogs/flite/voices/cmu_us_clb.flitevox "one one one two five"
there is a nasty distortion of sound after each "one" enunciation.
Placing other words before the "one one one" helps eliminate this until at
"two three four five one one one two five"
the distortion disappears. It does not seem to be related to output volume and only occurs with voice clb, but may be related to pulse audio or some other factor. I'm wondering if this is a known issue and what simple tweaks might be helpful to track the issue down to the source?

Built-in voice loading functions?

Hi, I'm using flite in my linux c++ project, and I'm trying to use the built-in voice loading function

extern "C"
    cst_voice *cmu_us_slt(); // built in function

But there's a link error, should I add more link flags besides -lflite?
Also, is the function name I'm using right?

"VAL: tried to access car in 1023 typed val" error on big-endian (s390x)

When running flite-2.2 test on a big-endian arch (s390x), I'm getting this error:

$ cd flite-2.2
$ LD_LIBRARY_PATH=/builddir/build/BUILDROOT/flite-2.2-1.fc36.s390x/usr/lib64
$ make -C testsuite do_thread_test
make: Entering directory '/builddir/build/BUILD/flite-2.2/testsuite'
gcc -fopenmp -o multi_thread multi_thread_main.c \
	-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=zEC12 -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -Wall -DWORDS_BIGENDIAN=1    -I../include -L../build/s390x-linux-gnu/lib -lflite  -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -lm -lpulse-simple -lpulse  \
	-l flite_cmu_us_slt -lflite_cmulex -lflite_usenglish \
	-lflite -lm -lasound -lgomp
export OMP_NUM_THREADS=100 && ./multi_thread
VAL: tried to access car in 1023 typed val
VAL: tried to access car in 1023 typed val
make: *** [Makefile:89: do_thread_test] Error 255
make: Leaving directory '/builddir/build/BUILD/flite-2.2/testsuite'

about english cmu_lts_model.c and cmu_lex_data_raw.c

I have a question; can you help me?
Why was cmudict cut down, so that cmu_lex_data_raw.c contains only 36964 English words?
I know cmudict contains about 130000 English words. I tested the cmu_lts_model and it performed poorly on the 36964 words in cmu_lex_data_raw.c, with about a 90% word error rate. Why does this happen? (Is the cmu_lts_model trained on cmudict with those 36964 words removed?) Can you help me? Thanks.
Forgive my poor English.

some questions about stress

./t2p covina
pau k ow v iy1 n ax pau
covina nil k ow0 v iy1 n ax0

Hello, I have some questions about the accent (stress).
In the training data there are ow0 and ax0, but there is no ow.
But when I use ./t2p to predict the word, I find that t2p prints ow (not ow0)!
Could you help me? I would like to know in detail how flite deals with the accent.

Using flite with pulseaudio

Using flite-2.0.7-current Jul 2017 on openSUSE Leap 15. Did git pull to ensure that latest code installed.
Pulseaudio controls where my audio is sent. Using padsp with flite works fine, audio goes to the right device. The flite docs indicate that pulse-simple can be compiled in. Experimentation shows I get the fewest errors on compile with ./configure --with-audio=pulseaudio and the link message indicates that pulse and pulse-simple are linked in. But when I run the new flite without padsp flite still complains that it cannot find /dev/dsp, so I guess I am missing a little detail or my understanding of what is supposed to happen is incomplete.

making the .flitevox voices from source?

Hi guys, great great project. These voices are really amazing.

I've built flite 2.1 from source (probably one of the smoothest builds I've had in Linux) but I noticed that the .flitevox voices need to be downloaded instead of built from source?

I was wondering what the procedure for building these voices is and where I can find the source code?

16khz output from indic voice

Hello, I am using an Indic voice to generate audio:

./flite/bin/flite "-voice" flite/voices/cmu_indic_hin_ab.flitevox 'पुत्र मित्र आदि सगे संबंधियों' "-o" 'try.wav'

The output file try.wav is always 16 kHz. However, it was mentioned that the output is deliberately kept at 8 kHz. Does that not apply to non-US voices?
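
One way to confirm the sample rate of a generated file is to read its header with Python's standard `wave` module. The sketch below writes a short silent file first so that it is self-contained; in practice the hypothetical helper `wav_sample_rate` would be pointed at `try.wav` instead.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


# Create a 0.1 s silent 16 kHz mono file as a stand-in for try.wav.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 1600)

print(wav_sample_rate(demo_path))
```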

listing voices from a voicedir

After putting .flitevox files in /usr/share/flite, I would have assumed that

flite -voicedir /usr/share/flite -lv

would have listed the voices stored in /usr/share/flite, but that does not actually work.

It would be really useful to have this, so that Linux distributions can store voices there and make them available to users without the users having to understand the internals of voice paths etc.

Dalek TTS voice on Picroft - diphone file structure

I'm trying to make a Mycroft/Picroft respond in a voice like the classic BBC Dr Who baddie, a Dalek.

I started with the standard British male Mimic diphone voice, it's already pretty robotic so it's well suited. For those who may be interested, I've altered it so that it does a passable Dalek impression which has involved two main steps;

The first is to break up the response into individually delivered words (as in 'you ... will ... be ... exterminated') rather than running words together as in human speech. To do this on Mycroft I've intercepted the code at the point where the response has been translated into text (/mycroft-core/mycroft/audio/, at 'def handle_speak(event):') and changed the code at the 'else' point. Before I show any code, I should say that, while I've been coding for many years, I'm a complete newbie to Python (and Mycroft/Picroft), so if I'm treading on toes or infringing anything please let me know or delete this. If you copy any of this you do so at your own risk (always make copies of the original files so that you can get back to the original code). This is what I changed it to:
#insert pauses ('. ') between words for that dalek sound
utterance = utterance.replace(" ",". ")
utterance = utterance.replace(",",". . ")
utterance = utterance + ". "
mute_and_speak(utterance, ident, listen)
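
To see what these replacements produce, the same three steps can be wrapped in a standalone Python function. The name `add_dalek_pauses` is hypothetical and the Mycroft call is left out; this is only a sketch for trying the string handling in isolation.

```python
def add_dalek_pauses(utterance):
    """Apply the three replacements from the snippet above, in the same order."""
    utterance = utterance.replace(" ", ". ")
    utterance = utterance.replace(",", ". . ")
    return utterance + ". "


print(repr(add_dalek_pauses("you will be exterminated")))
print(repr(add_dalek_pauses("halt, intruder")))
```

Because the space replacement runs first, a comma followed by a space ends up as three pause markers rather than two ("halt, intruder" becomes "halt. . . intruder. ").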

The second step was to add the Dalek electronic twang to the voice. After extensive Googling I found that this was originally created by passing the actor's voice through a 'ring modulator'(?). On another site (which I can't find at the moment, but the author deserves much of the credit for this bit) I found that a 'software only' approximation of ring modulation is to merge a sine wave with the original voice. A sawtooth wave is a decent approximation of a sine wave and, I thought, might be faster, so I chose that instead. Mycroft was reluctant to let me add the code as a separate module so, again, I've had to butcher the original code, in this case '/mycroft-core/mycroft.tts/' at 'def _execute(self, sentence, ident, listen):'. The code was changed (at the point shown) to:

if os.path.exists(wav_file):
    LOG.debug("TTS cache hit")
    phonemes = self.load_phonemes(key)
else:
    wav_file, phonemes = self.get_tts(sentence, wav_file)
    if phonemes:
        self.save_phonemes(key, phonemes)
vis = self.viseme(phonemes) if phonemes else None
tooth_w = 0.01
tooth_h = 0.0
try:
    ifile = wave.open(..., 'rb')
    channels = ifile.getnchannels()
    frames = ifile.getnframes()
    width = ifile.getsampwidth()
    rate = ifile.getframerate()
    audio = ifile.readframes(frames)
    #remove the original file
    # Convert buffer int16 using NumPy
    audio16 = numpy.frombuffer(audio, dtype=numpy.int16)
    empty16 = ([])
    h = 1
    d = tooth_w
    for x in audio16:
        h = h - d
        if h > 1 or h < tooth_h:
            d = d * -1
    outarray = numpy.array(empty16, dtype=numpy.int16)
    dalek_file = wave.open(..., 'wb')
except Exception as e:
    print("NOT dalekified")
self.queue.put((self.audio_ext, wav_file, vis, ident, l))

I also had to import the needed modules.

The tooth_h and tooth_w variables are the height and width of the sawtooth. I normally set tooth_h to 0, which means the sawtooth goes back and forth between 1 and 0. The value deducted or added at each step is given by tooth_w (this should be between 0 and 1, preferably low), and the change in effect can be dramatic. There are hours of fun to be had messing about with tooth_w; there is a balance to be found between making it more 'Dalek' and keeping it intelligible.
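
The modulation described above can be sketched as a standalone function. This is an illustration in plain Python, not the code from the post: it multiplies each sample by a gain that ramps down from 1 to `tooth_h` and back up again in steps of `tooth_w` (strictly speaking a triangle wave rather than a sawtooth), as a rough stand-in for the carrier multiplication that a ring modulator performs. The function name `dalekify`, the clamping at the bounds, and the use of the standard `array` module in place of NumPy are all assumptions.

```python
from array import array


def dalekify(samples, tooth_w=0.01, tooth_h=0.0):
    """Scale 16-bit samples by a gain that bounces between tooth_h and 1.

    Assumes 0 <= tooth_h < 1 and 0 < tooth_w <= 1.
    """
    out = array("h")  # signed 16-bit integers
    gain = 1.0
    step = tooth_w
    for sample in samples:
        out.append(int(sample * gain))
        gain -= step
        # Reverse direction at either bound and clamp so the gain stays in range.
        if gain >= 1.0 or gain <= tooth_h:
            step = -step
            gain = min(1.0, max(tooth_h, gain))
    return out


# A constant input makes the gain pattern visible in the output.
print(list(dalekify([1000] * 5, tooth_w=0.5)))
```

With `tooth_w=0.5` the gain steps through 1, 0.5, 0, 0.5, 1, so the constant input comes out as a single triangle. Smaller values of `tooth_w` give a slower wobble.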

My problem is that adding the code at this point involves reopening the .wav file, getting all the frames and processing each one, then rebuilding the file. This adds a 'noticeable' (read: irritating) delay to the response, probably at least doubling the original noticeable response delay. My understanding of diphone voices is that they are created by concatenating tiny speech sounds held in some sort of database in the original flitevox voice file. What would make it much faster would be to sawtooth each of these tiny fragments and return them to the file, so that the Dalek voice was built in. Since each sawtoothed fragment would be the same size as the original this shouldn't be a problem, if I could get at them. So my question is: is there an easy way to do this, or a complete description of the structure of a diphone file somewhere, or some kindly genius out there who could help? Cheers

How to add a new language?

Hi all,

Awesome small library. I do have a question: is it possible to add support for a new language? If yes, how?

Many Thanks!

compiled binaries?

Are there any compiled binaries that can be used out of the box on Windows? If yes, where can I download them? If no, why aren't there any?

add new language from scratch

Hello, is it possible to create a new language from audio and text files? If so, I would like to know the workflow. I am interested in building German voices, i.e. flite files.

Best regards

flite to convert Text to phonetic (hello --> [HH EH L OW])

I need to use flite to convert text to phonetic form (hello --> [HH EH L OW]). How can I build just the part of the project that does that? I only need this part of the code.
Really, I need a small, fast run-time toolkit to use as a front end that converts text to a .lab file, so that I can test new sentences in HTS.

Allow installation under MacOs

The current branch does not install on macOS. The reason is that the cp command hard-coded in the Makefile of the ./main directory uses flags that are not supported on macOS:
cp -pd ...
This can be solved by replacing the flags with -r when the system is "Darwin". Even though -r has totally different semantics, it can be used as a replacement in this particular case.

UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Darwin)
