
Comments (8)

nshmyrev commented on August 25, 2024

Page source is available here:

http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx?do=edit

It is the wiki source, not the rendered HTML. Could you please edit it and share it instead? You can paste it to pastebin and give a link here.

from pocketsphinx.

steowens commented on August 25, 2024

I would do that, except that when I go to that link, it is read-only for me. Someone on the IRC channel suggested that I submit an issue via this project.


nshmyrev commented on August 25, 2024

Yes, our wiki is read-only. Take the page source, copy it to a text file, edit it, then paste it to pastebin and share a link here.

Thank you.


steowens commented on August 25, 2024

Apparently my combination of machine and web browser makes selecting the entire page source pretty much impossible. I don't seem to have a select-all option either. May I ask why you don't simply put the page source under change control on GitHub so that people can submit pull requests?


steowens commented on August 25, 2024

I would love to contribute as long as it doesn't start to feel like doing my income taxes.


steowens commented on August 25, 2024

====== Building application with pocketsphinx ======
===== Installation =====

Pocketsphinx is a library that depends on another library called SphinxBase, which provides common functionality across all CMUSphinx projects. To install Pocketsphinx, you need to install both Pocketsphinx and SphinxBase. You can use Pocketsphinx on Linux, Windows, MacOS, iPhone and Android.

First, download the released pocketsphinx and sphinxbase packages from the project downloads, or check them out from Subversion or GitHub. For more details see the [[download | download page]].

Unpack them into the same directory. On Windows, you will need to rename 'sphinxbase-X.Y' (where X.Y is the SphinxBase version number) to simply 'sphinxbase' to satisfy the pocketsphinx project configuration.

**THIS TUTORIAL DESCRIBES POCKETSPHINX 5PREALPHA, IT IS NOT GOING TO WORK ON OLDER VERSIONS**

==== Unix-like Installation ====

To build pocketsphinx in a Unix-like environment (such as Linux, Solaris or FreeBSD), make sure the following dependencies are installed: gcc, automake, autoconf, libtool, bison, swig (version 2.0 or later), the Python development package and the PulseAudio development package. If you want to build without some of these dependencies, you can pass the appropriate configure options, such as ''--without-swig-python'', but beginners are advised to install all of them.

You need to download both the sphinxbase and pocketsphinx packages and unpack them. Please note that you cannot mix different versions of sphinxbase and pocketsphinx; make sure that the versions are in sync. After unpacking you should see the following two main folders:

     sphinxbase-X.X
     pocketsphinx-X.X

First, build and install SphinxBase. Change the current directory to the ''sphinxbase'' folder. If you downloaded directly from the repository, you need to run this at least once to generate the ''configure'' file:

     % ./autogen.sh

If you downloaded the release version, or have already run ''autogen.sh'' at least once, compile and install:

     % ./configure
     % make
     % make install

The last step might require root permissions, in which case use ''sudo make install''. If you want to use fixed-point arithmetic, you must configure SphinxBase with the ''--enable-fixed'' option. You can also set the installation prefix with ''--prefix'', and configure with or without SWIG Python support.

SphinxBase is installed in the ''/usr/local/'' folder by default. Not every system loads libraries from this folder automatically. To load them, you need to configure the path used to look for shared libraries. This can be done either in the file ''/etc/ld.so.conf'' or by exporting environment variables:

     export LD_LIBRARY_PATH=/usr/local/lib
     export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

Then change to the pocketsphinx folder and perform the same steps:

    % ./configure
    % make
    % make install

To test the installation, run ''pocketsphinx_continuous -inmic yes'' and check that it recognizes the words you say into the microphone.

==== Windows ====

In MS Windows (TM), under MS Visual Studio 2010 (or newer - we test with Visual C++ 2010 Express):

  * load ''sphinxbase.sln'' located in the sphinxbase directory
  * compile all the projects in SphinxBase (from ''sphinxbase.sln'')
  * load ''pocketsphinx.sln'' in pocketsphinx directory
  * compile all the projects in PocketSphinx

MS Visual Studio will build the executables and libraries under ''.\bin\Release'' or ''.\bin\Debug'' (depending on the target you choose in MS Visual Studio). To run ''pocketsphinx_continuous.exe'', don't forget to copy ''sphinxbase.dll'' to the bin folder; otherwise the executable will fail to find this library. Unlike on Linux, the path to the model is not preconfigured on Windows, so you have to tell ''pocketsphinx_continuous'' where to find the model with the ''-hmm'', ''-lm'' and ''-dict'' options. Change to the pocketsphinx folder and run

       bin/Release/pocketsphinx_continuous.exe -inmic yes -hmm model/en-us/en-us -lm model/en-us/en-us.lm.bin -dict model/en-us/cmudict-en-us.dict

to recognize from microphone. To recognize from file run

        bin/Release/pocketsphinx_continuous.exe -infile test/data/goforward.raw -hmm model/en-us/en-us -lm model/en-us/en-us.lm.bin -dict model/en-us/cmudict-en-us.dict



===== Pocketsphinx API Core Ideas =====

The Pocketsphinx API is designed to ease the use of speech recognizer functionality in your applications:

  - It is much more likely to remain stable both in terms of source and binary compatibility, due to the use of abstract types.
  - It is fully re-entrant, so there is no problem having multiple decoders in the same process.
  - It has enabled a drastic reduction in code footprint and a modest but significant reduction in memory consumption.

Reference documentation for the new API is available at http://cmusphinx.sourceforge.net/api/pocketsphinx/

===== Basic Usage (hello world) =====

There are a few key things you need to know about using the API:

  - Command-line parsing is done externally (in ''<cmd_ln.h>'')
  - Everything takes a ''ps_decoder_t *'' as the first argument.

To illustrate the new API, we will step through a simple "hello world" example.  This example is somewhat specific to Unix in the locations of files and the compilation process.  We will create a C source file called ''hello_ps.c''.  To compile it (on Unix), use this command:

<code>
gcc -o hello_ps hello_ps.c \
    -DMODELDIR=\"`pkg-config --variable=modeldir pocketsphinx`\" \
    `pkg-config --cflags --libs pocketsphinx sphinxbase`
</code>

Please note that compilation errors here usually mean that the installation guide above was not followed completely. For example, pocketsphinx needs to be properly installed to be available through the pkg-config system. To check that pocketsphinx is installed properly, run ''pkg-config --cflags --libs pocketsphinx sphinxbase'' from the command line and check that the output looks like this:

<code>
-I/usr/local/include -I/usr/local/include/sphinxbase -I/usr/local/include/pocketsphinx  
-L/usr/local/lib -lpocketsphinx -lsphinxbase -lsphinxad
</code>

==== Initialization ====
The first thing we need to do is to create a configuration object, which for historical reasons is called ''cmd_ln_t''.  Along with the general boilerplate for our C program, we will do it like this:

<code>
#include <pocketsphinx.h>

int
main(int argc, char *argv[])
{
        ps_decoder_t *ps;
        cmd_ln_t *config;

        config = cmd_ln_init(NULL, ps_args(), TRUE,
                             "-hmm", MODELDIR "/en-us/en-us",
                             "-lm", MODELDIR "/en-us/en-us.lm.bin",
                             "-dict", MODELDIR "/en-us/cmudict-en-us.dict",
                             NULL);
        if (config == NULL)
                return 1;

        return 0;
}
</code>

The ''cmd_ln_init()'' function takes a variable number of null-terminated string arguments, followed by NULL.  The first argument is any previous ''cmd_ln_t *'' which is to be updated.  The second argument is an array of argument definitions - the standard set can be obtained by calling ''ps_args()''.  The third argument is a flag telling the argument parser to be "strict" - if this is ''TRUE'', then duplicate arguments or unknown arguments will cause parsing to fail.

Note that compiling and executing the above code produces no output, but you should see no errors either.  If you get an error such as **"error while loading shared libraries: libpocketsphinx.so.3"**, run through the following steps:

  - Examine your ''/etc/ld.so.conf'' file.  Usually it either contains the list of directories in which the linker looks for shared libraries, or it pulls in other files from the ''/etc/ld.so.conf.d'' directory.  Make sure that there is an entry that loads from the ''/usr/local/lib'' folder.
  - If you did the above and still get this error, try running ''sudo /sbin/ldconfig''.  Sometimes the linker configuration needs to be refreshed, especially if you have built and installed from source.

The ''MODELDIR'' macro is defined on the GCC command line by using ''pkg-config'' to obtain the ''modeldir'' variable from the PocketSphinx configuration.  On Windows, you can simply add a preprocessor definition to the code, such as this:

<code>
#define MODELDIR "c:/sphinx/model"
</code>
(replace this with wherever your models are installed).  Now, to initialize the decoder, use ''ps_init()'':

<code>
        ps = ps_init(config);
        if (ps == NULL)
                return 1;
</code>

==== Decoding a file stream ====

Because live audio input is somewhat platform-specific, we will confine ourselves to decoding audio files.  The "turtle" language model recognizes a very simple "robot control" language, which recognizes phrases such as "go forward ten meters".  In fact, there is an audio file helpfully included in the PocketSphinx source code which contains this very sentence.  You can find it in ''test/data/goforward.raw''.  Copy it to the current directory.  If you want to create your own version of it, it needs to be a single-channel (monaural), little-endian, unheadered 16-bit signed PCM audio file sampled at 16000 Hz.
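The format requirements above translate into simple arithmetic: every sample takes two bytes and one second of audio holds 16000 samples. The following is a minimal sketch of a sanity check for a raw file, not part of PocketSphinx; the helper names ''file_size_bytes'' and ''raw_pcm_duration_seconds'' are hypothetical, and measuring the size with ''fseek()''/''ftell()'' is assumed to work for binary files on your platform.

```c
#include <assert.h>
#include <stdio.h>

/* Format stated in the tutorial: unheadered 16-bit signed mono PCM at 16000 Hz. */
#define SAMPLE_RATE_HZ   16000L
#define BYTES_PER_SAMPLE 2L

/* Hypothetical helper: size of an open binary file in bytes, or -1 on error.
   The file position is rewound to the start afterwards. */
static long
file_size_bytes(FILE *fh)
{
    long size;

    assert(fh != NULL);
    if (fseek(fh, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fh);
    if (fseek(fh, 0L, SEEK_SET) != 0)
        return -1L;
    return size;
}

/* Hypothetical helper: duration in seconds implied by a raw file size,
   or -1.0 if the size cannot be a whole number of 16-bit samples. */
static double
raw_pcm_duration_seconds(long num_bytes)
{
    if (num_bytes < 0 || num_bytes % BYTES_PER_SAMPLE != 0)
        return -1.0;
    return (double)(num_bytes / BYTES_PER_SAMPLE) / (double)SAMPLE_RATE_HZ;
}
```

Calling ''raw_pcm_duration_seconds(file_size_bytes(fh))'' on a freshly opened file gives a quick plausibility check: a negative result, or a duration far from what you expect, suggests the file has a header, a different sample rate or more than one channel.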

The main pocketsphinx use case is to read audio data in blocks from some source and feed them to the decoder. To do that, we first open the file and start decoding the utterance with ''ps_start_utt()'':

<code>
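        /* Open the raw audio file copied to the current directory. */
        fh = fopen("goforward.raw", "rb");
        if (fh == NULL)
                return 1;
        rv = ps_start_utt(ps);
        if (rv < 0)
                return 1;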
</code>
We will then read 512 samples at a time from the file, and feed them to the decoder using ''ps_process_raw()'':

<code>
        int16 buf[512];
        size_t nsamp;
        /* fread() returns 0 at end of file, which ends the loop. */
        while ((nsamp = fread(buf, sizeof(buf[0]), 512, fh)) > 0) {
            rv = ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
        }
</code>
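The block-reading pattern can be tried out without a decoder. The following is a minimal sketch in plain C, assuming a hypothetical callback type ''block_handler_t'' that stands in for the decoder; in a real program the handler would pass each block to ''ps_process_raw()''. The names ''feed_in_blocks'' and ''count_blocks'' are illustrative only and are not part of the PocketSphinx API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SAMPLES 512

/* Hypothetical callback type standing in for the decoder: it receives one
   block of samples and returns a negative value to abort. */
typedef int (*block_handler_t)(const int16_t *samples, size_t nsamp, void *user);

/* Read 16-bit samples from fh in blocks of up to BLOCK_SAMPLES and pass each
   non-empty block to handler. Returns the total number of samples delivered,
   or -1 if the handler reported an error or the read failed. */
static long
feed_in_blocks(FILE *fh, block_handler_t handler, void *user)
{
    int16_t buf[BLOCK_SAMPLES];
    size_t nsamp;
    long total = 0;

    assert(fh != NULL && handler != NULL);
    /* fread() returns a short count on the final block and 0 at end of file,
       so the loop never hands an empty block to the handler. */
    while ((nsamp = fread(buf, sizeof(buf[0]), BLOCK_SAMPLES, fh)) > 0) {
        if (handler(buf, nsamp, user) < 0)
            return -1;
        total += (long)nsamp;
    }
    return ferror(fh) ? -1 : total;
}

/* Example handler: counts how many blocks it has seen. */
static int
count_blocks(const int16_t *samples, size_t nsamp, void *user)
{
    (void)samples;
    (void)nsamp;
    *(int *)user += 1;
    return 0;
}
```

For a file holding 1000 samples, ''feed_in_blocks(fh, count_blocks, &blocks)'' delivers one full block of 512 samples and one short block of 488, which is why the decoder call takes the number of samples actually read rather than the buffer size.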
Then we will need to mark the end of the utterance using ''ps_end_utt()'':

<code>
        rv = ps_end_utt(ps);
        if (rv < 0)
                return 1;
</code>
Then we retrieve the hypothesis to get the recognition result:

<code>
        hyp = ps_get_hyp(ps, &score);
        if (hyp == NULL)
                return 1;
        printf("Recognized: %s\n", hyp);
</code>

==== Cleaning up ====
To clean up, simply call ''ps_free()'' on the object that was returned by ''ps_init()''.  Free the configuration object with ''cmd_ln_free_r()''.

==== Code listing ====

<code>
#include <stdio.h>
#include <pocketsphinx.h>

int
main(int argc, char *argv[])
{
    ps_decoder_t *ps;
    cmd_ln_t *config;
    FILE *fh;
    char const *hyp;
    int16 buf[512];
    size_t nsamp;
    int rv;
    int32 score;
    const char *inputFileName = "goforward.raw";

    /* Create the configuration; MODELDIR is defined on the compiler command line. */
    config = cmd_ln_init(NULL, ps_args(), TRUE,
                         "-hmm", MODELDIR "/en-us/en-us",
                         "-lm", MODELDIR "/en-us/en-us.lm.bin",
                         "-dict", MODELDIR "/en-us/cmudict-en-us.dict",
                         NULL);
    if (config == NULL)
        return 1;

    /* Initialize the decoder. */
    ps = ps_init(config);
    if (ps == NULL)
        return 1;

    /* Open the raw audio file. */
    fh = fopen(inputFileName, "rb");
    if (fh == NULL) {
        fprintf(stderr, "Unable to open input file: %s\n", inputFileName);
        return 1;
    }

    /* Decode the file as a single utterance, 512 samples at a time. */
    rv = ps_start_utt(ps);
    if (rv < 0)
        return 1;
    while ((nsamp = fread(buf, sizeof(buf[0]), 512, fh)) > 0) {
        rv = ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
    }
    rv = ps_end_utt(ps);
    if (rv < 0)
        return 1;

    /* Retrieve and print the hypothesis. */
    hyp = ps_get_hyp(ps, &score);
    if (hyp == NULL)
        return 1;
    printf("Recognized: %s\n", hyp);

    /* Clean up. */
    fclose(fh);
    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}

</code>

===== Advanced Usage =====

For more complicated uses of the API please check the API reference.

  - For word segmentations, the API provides an iterator object which is used to iterate over the sequence of words.  This iterator object is an abstract type, with some accessors provided to obtain timepoints, scores, and (most interestingly) posterior probabilities for each word.
  - The confidence of the whole utterance can be accessed with the ''ps_get_prob()'' function.
  - You can access the lattice if needed.
  - You can configure multiple searches and switch between them at runtime.

==== Searches ====

A developer can configure several "search" objects with different grammars and language models and switch between them at runtime to provide an interactive experience for the user.

There are different possible search modes:
  - keyword - efficiently looks for a keyphrase and ignores other speech. It allows you to configure the detection threshold.
  - grammar - recognizes speech according to a JSGF grammar. Unlike keyphrase search, grammar search does not ignore words that are not in the grammar but tries to recognize them.
  - ngram/lm - recognizes natural speech with a language model.
  - allphone - recognizes phonemes with a phonetic language model.

Each search has a name and can be referenced by that name; names are application-specific. The function ''ps_set_search()'' activates a search that was previously added under a given name.

To add a search, you need to point to the grammar or language model that describes it. The location of the grammar is specific to the application. If only simple recognition is required, it is sufficient to add a single search or just configure the required mode with configuration options.

The exact design of the searches depends on your application. For example, you might want to listen for an activation keyword first and, once the keyword is recognized, switch to ngram search to recognize the actual command. Once you have recognized the command, you can switch to grammar search to recognize the confirmation and then switch back to keyword listening mode to wait for another command.
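The keyword, command, confirmation cycle described here is a small state machine. The following is a minimal sketch of that logic in plain C, independent of the decoder; the state names and the search names "wakeup", "command" and "confirm" are hypothetical, since names are application-specific, and a NULL hypothesis is assumed to mean that the active search has not produced a result yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application states, one per search used in the flow above. */
typedef enum {
    STATE_WAIT_KEYWORD,   /* keyword search: wait for the activation phrase */
    STATE_WAIT_COMMAND,   /* ngram search: recognize the actual command */
    STATE_WAIT_CONFIRM    /* grammar search: recognize the confirmation */
} app_state_t;

/* Map a state to the (hypothetical) name its search was registered under. */
static const char *
search_name(app_state_t state)
{
    switch (state) {
    case STATE_WAIT_KEYWORD: return "wakeup";
    case STATE_WAIT_COMMAND: return "command";
    case STATE_WAIT_CONFIRM: return "confirm";
    }
    assert(0 && "unknown state");
    return NULL;
}

/* Advance to the next state once the current search has produced a result.
   A missing hypothesis (NULL) keeps the current search active. */
static app_state_t
next_state(app_state_t state, const char *hyp)
{
    if (hyp == NULL)
        return state;
    switch (state) {
    case STATE_WAIT_KEYWORD: return STATE_WAIT_COMMAND;
    case STATE_WAIT_COMMAND: return STATE_WAIT_CONFIRM;
    case STATE_WAIT_CONFIRM: return STATE_WAIT_KEYWORD;
    }
    return STATE_WAIT_KEYWORD;
}
```

In an application, each change of state would be followed by activating the matching search, for example by passing ''search_name(state)'' to ''ps_set_search()'', after the searches have been added under those names.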


steowens commented on August 25, 2024

Apologies for the mismatched spacing. Text edit doesn't do a good job of helping ensure code is formatted well.


nshmyrev commented on August 25, 2024

I updated the wiki page per your comments. Thank you for suggestions.

