
asr-server's Introduction

About

FastCGI support for Kaldi. It allows Kaldi-based speech recognition to be used through Apache or Nginx (or any other HTTP server that supports FastCGI). It also contains a simple HTML-based client that allows testing Kaldi speech recognition from a web page.

Licence

Apache 2.0

Installation guide

Summary

This guide will help you download and build your own simple ASR web service based on the Kaldi ASR code.

Preparing prerequisites

Creating a working dir

Let's create a directory where all data will be downloaded and built.

mkdir ~/apiai
cd ~/apiai

You are free to choose any other name and path, but keep in mind that your paths will then differ from those given in this guide.

Since the server code is based on Kaldi, almost all prerequisites match Kaldi's. In addition, a FastCGI library is required to communicate with the HTTP server.

Getting Kaldi

As a first step, clone the Kaldi source tree available at https://github.com/kaldi-asr/kaldi:

git clone https://github.com/kaldi-asr/kaldi

This command clones the source tree into the kaldi directory. To configure and build Kaldi, please refer to the kaldi/INSTALL file. For detailed information, see the official Kaldi instructions: http://kaldi-asr.org/doc/install.html
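For reference, a typical Kaldi build sequence looks like the following (a sketch only; the kaldi/INSTALL file for your checkout is authoritative, and the exact steps vary between Kaldi versions). The --shared flag builds the shared Kaldi libraries that asr-server links against:

$ cd kaldi/tools
$ make
$ cd ../src
$ ./configure --shared
$ make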

Installing libraries

Some extra libraries are required. You may install them using your system package manager.

On openSuSE you may run:

$ sudo zypper install FastCGI-devel

If you have Debian or Ubuntu:

$ sudo apt-get install libfcgi-dev

Getting the code

Return to the working directory where you put the Kaldi sources

$ cd ~/apiai

and then clone the server source code:

$ git clone https://github.com/api-ai/asr-server asr-server

It is recommended to check out the code into the same directory where kaldi is located, so that the configure tool can detect the Kaldi location automatically.

Building the app

$ cd asr-server

Before running make, you have to configure the build scripts by running a special utility:

$ ./configure

It will check that all required libraries are installed on your system and will also look for the Kaldi libraries in the ../kaldi folder. If you have Kaldi installed somewhere else, you may explicitly pass the path via the --kaldi-root option:

$ ./configure --kaldi-root=<path_to_kaldi>
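For example, with the working-directory layout used in this guide (an illustrative path):

$ ./configure --kaldi-root=$HOME/apiai/kaldi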

If the configuration process finishes successfully, you may start the build by running make:

$ make

Getting a recognition model

When the application build is complete, you need to download language-specific data.

Return to the working directory where you put the Kaldi sources

$ cd ~/apiai

The built ASR application uses Kaldi nnet3 models, which you can get either by training a neural network on your own data set or by using a pretrained network provided by us. Currently only an English model is available, at https://github.com/api-ai/api-ai-english-asr-model/releases/download/1.0/api.ai-kaldi-asr-model.zip.

$ wget https://github.com/api-ai/api-ai-english-asr-model/releases/download/1.0/api.ai-kaldi-asr-model.zip

Unzip the archive into your working directory:

$ unzip api.ai-kaldi-asr-model.zip

Running the app

Set the model directory as the working directory:

$ cd api.ai-kaldi-asr-model

There are several ways to run the application. The first is to run it as a standalone app listening on the socket defined by the --fcgi-socket option:

$ ../asr-server/fcgi-nnet3-decoder --fcgi-socket=:8000

This command runs the application listening on port 8000 on all IP addresses. You may also specify a Unix socket path, or an explicit IP address (in A.B.C.D:PORT form), as shown below.
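For example (the address and socket path here are illustrative):

# listen on port 8000 on all interfaces
$ ../asr-server/fcgi-nnet3-decoder --fcgi-socket=:8000

# listen on an explicit IP address and port
$ ../asr-server/fcgi-nnet3-decoder --fcgi-socket=127.0.0.1:8000

# listen on a Unix domain socket
$ ../asr-server/fcgi-nnet3-decoder --fcgi-socket=/tmp/asr.sock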

Alternatively, you may use the spawn-fcgi utility:

$ spawn-fcgi -n -p 8000 -- ../asr-server/fcgi-nnet3-decoder
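If spawn-fcgi is not present on your system, it is usually available from the package manager (the package is named spawn-fcgi on Debian/Ubuntu; names may differ on other distributions):

$ sudo apt-get install spawn-fcgi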

Configuring HTTP service

You may use any web server that has FastCGI support: Apache, Nginx, Lighttpd, etc.

Installing Apache2

openSuSE:

$ sudo zypper in apache2

Debian and Ubuntu:

$ sudo apt-get install apache2

Configuring Apache2

Enable FastCGI proxy module with a2enmod:

$ sudo a2enmod proxy_fcgi

Then add the following line to the Apache2 configuration file:

ProxyPass "/asr" "fcgi://localhost:8000/"

If your Apache is configured to include all .conf files from the /etc/apache2/conf.d folder, you may create a separate asr_proxy.conf file with the following content:

ProxyPass "/asr" "fcgi://localhost:8000/"
Alias /asr-html/ "/home/username/apiai/asr-server/asr-html/"
<Directory "/home/username/apiai/asr-server/asr-html">
	Options Indexes MultiViews
	AllowOverride None
	Require all granted
</Directory>

Now restart Apache:

$ sudo /etc/init.d/apache2 restart
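On systemd-based distributions, the equivalent command is:

$ sudo systemctl restart apache2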

Installing Nginx

You can download the latest sources from the official website http://nginx.org/ and build Nginx yourself, or use your system package manager.

openSuSE:

$ sudo zypper install nginx

Debian and Ubuntu:

$ sudo apt-get install nginx

Configuring Nginx

Open nginx.conf and add the following configuration:

http {
	server {
		location /asr {
			fastcgi_pass 127.0.0.1:8000;
			# Disable output buffering so replies are sent to the client immediately
			fastcgi_buffering off;
			# Disable request buffering so incoming audio is decoded as it arrives
			fastcgi_request_buffering off;
			include      fastcgi_params;
		}

		location /asr-html {
			root /home/username/apiai/asr-server/;
			index index.html;
		}
	}
}

This will set up Nginx to pass all requests coming to the URL /asr directly to the ASR service listening on port 8000 via the FastCGI gateway. For detailed information, please refer to the Nginx documentation (e.g. https://www.nginx.com/resources/wiki/start/topics/examples/fastcgiexample/)
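After changing the configuration, you can check it for syntax errors and reload Nginx (standard Nginx commands):

$ sudo nginx -t
$ sudo nginx -s reload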

Speech Recognition

The server accepts raw mono 16-bit 16 kHz PCM data. You can convert your audio using any popular encoding utility; for instance, you can use ffmpeg:

$ ffmpeg -i audio.wav -f s16le -ar 16000 -ac 1 audio.raw
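To sanity-check the converted file, you can play the raw PCM back with ffplay (usually installed alongside ffmpeg), passing the same format flags:

$ ffplay -f s16le -ar 16000 -ac 1 audio.raw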

Recognition using web browser

There is a simple JS implementation that allows you to recognize speech using the system microphone. Open the following URL in your browser:

http://localhost/asr-html/

and follow the instructions on the page.

Recognition from command line using curl

Now, let’s recognize audio.raw by calling the web service with the curl utility:

$ curl -H "Content-Type: application/octet-stream" --data-binary @audio.raw http://localhost/asr

On successful recognition the command will return something like this:

{
	"status":"ok",
	"data":[{"confidence":0.900359,"text":"HELLO WORLD"}]
}

On error, the return value will look like this:

{"status":"error","data":[{"text":"Failed to decode"}]}

Recognition request parameters

There are several parameters for tuning the recognition process. All parameters are passed via the query string as web-form fields (e.g. ?name1=value1&name2=value2). A combined usage example follows the parameter descriptions below.

Each parameter is listed below with its acceptable values and default value in parentheses.

nbest (1-10; default: 1)

Sets the number of recognition hypotheses returned. Example response for nbest=2:

{
	"status":"ok",
	"data":[
		{"confidence":0.900359,"text":"HELLO WORLD"},
		{"confidence":0.89012,"text":"HELLO WORD"}
	]
}

endofspeech (true or false; default: true)

Enables or disables end-of-speech detection during recognition. If an endpoint is detected, the current result is returned immediately and the rest of the data is skipped. In that case two extra fields are added to the response: "interrupted" with the value "endofspeech", and "time" with the number of milliseconds of audio processed.

{
	"status":"ok",
	"data":[{"confidence":0.900359,"text":"HELLO WORLD"}],
	"interrupted":"endofspeech",
	"time":3800
}

intermediate (>500; default: 0)

Sets the time interval in milliseconds between intermediate results while recognition is in progress. The result is returned as a simple sequence of JSON documents. Each intermediate document has its "status" field set to "intermediate"; the last one has "status" set to "ok".

{"status":"intermediate","data":[
	{"confidence":0.908981,"text":"HELLO"}
]}
{"status":"intermediate","data":[
	{"confidence":0.903025,"text":"HELLO WORLD"}
]}
{"status":"ok","data":[
	{"confidence":0.903025,"text":"HELLO WORLD"}
]}

multipart (true or false; default: false)

If enabled, the result is returned as an HTTP multipart response with "Content-Type" set to "multipart/x-mixed-replace", and each response part carries a "Content-Disposition" header with the value "form-data". Intermediate parts are named "partial" and the final part is named "result".

--ResponseBoundary
Content-Disposition: form-data; name="partial"
Content-type: application/json

{"status":"intermediate","data":[ {"confidence":0.908981,"text":"HELLO"} ]}

--ResponseBoundary
Content-Disposition: form-data; name="partial"
Content-type: application/json

{"status":"intermediate","data":[ {"confidence":0.903025,"text":"HELLO WORLD"} ]}

--ResponseBoundary
Content-Disposition: form-data; name="result"
Content-type: application/json

{"status":"ok","data":[ {"confidence":0.903025,"text":"HELLO WORLD"} ]}

--ResponseBoundary--
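For example, to request up to three hypotheses and disable end-of-speech interruption (an illustrative combination of the parameters above):

$ curl -H "Content-Type: application/octet-stream" --data-binary @audio.raw "http://localhost/asr?nbest=3&endofspeech=false"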

asr-server's People

Contributors

bjascob, jbuck, maxhawkins, mdoulaty, realill


asr-server's Issues

features with pitch

Hi all,

First of all, thanks to the authors for this great program. We can now make an easy demo with Kaldi models. That's really great!

I tested with the api.ai models (no ivector) and the multi_cn_chain_sp_online models (with ivector) downloaded from the Kaldi models website. Both seem to work well.

However, when I tested a TDNN-F model with pitch and ivector, it did not work: the program always outputs wrong text.

For info, I already used --add-pitch=true and --online-pitch-config=conf/online_pitch.conf (online_pitch.conf contains --sample-frequency=16000).

I have no problem with online2-tcp-nnet3-decode-faster:
online2-tcp-nnet3-decode-faster \
	--samp-freq=16000 \
	--frames-per-chunk=20 \
	--extra-left-context-initial=0 \
	--frame-subsampling-factor=3 \
	--feature-type=mfcc \
	--mfcc-config=conf/mfcc.conf \
	--ivector-extraction-config=conf/ivector_extractor.conf \
	--add-pitch=true \
	--online-pitch-config=conf/online_pitch.conf \
	--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15 \
	--min-active=200 \
	--max-active=7000 \
	--beam=15.0 \
	--lattice-beam=6.0 \
	--acoustic-scale=1.0 \
	--port-num=5050 \
	final.mdl HCLG.fst words.txt

Is this an incompatibility between Kaldi versions? Why does online2-tcp-nnet3-decode-faster work well?
Does someone have the same problem or a solution? Thanks a lot.

Best regards,
Asterix

Can I decode 8k audio?

Thanks for the server. I have a model trained on 8 kHz audio and want to decode 8 kHz audio; how can I do it? When I run the server, it fails with the error shown below, but I have trouble finding where to add the --allow_{upsample,downsample} option. Can you help me, please?

"ERROR (fcgi-nnet3-decoder[5.5.517~1-06bf]:MaybeCreateResampler():online-feature.cc:99) Sampling frequency mismatch, expected 8000, got 16000
Perhaps you want to use the options --allow_{upsample,downsample}",

Result always comes back as "YES"

I followed the directions and it looks like Kaldi and the asr-server are installed correctly. However, whenever I test the API, using either the web interface or uploading a raw file, the result is always:

{"status":"ok","data":[{"confidence":0.916982,"text":"YES"}],"interrupted":"endofspeech","time":900}

The actual text is never transcribed. Where would I start debugging this type of issue? I tried increasing the verbosity of the asr-server, but that didn't really provide any useful output about what the issue could be.

Real-time factor

Regarding the real-time factor: how did you improve the recognition speed of the decode step?

Crash when using --verbose

ERROR (fcgi-nnet3-decoder[5.2.62~1-a2342]:ToInt():parse-options.cc:598) Invalid integer option ""

[ Stack-Trace: ]
../asr-server/fcgi-nnet3-decoder() [0xf2e23a]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::ParseOptions::ToInt(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
kaldi::ParseOptions::SetOption(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
kaldi::ParseOptions::Read(int, char const* const*)
apiai::FcgiDecodingApp::Run(int, char**)
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
  what():
zsh: abort (core dumped)  ../asr-server/fcgi-nnet3-decoder --verbose --fcgi-socket=:8000
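A plausible workaround, assuming the standard Kaldi ParseOptions behavior where integer options require an explicit value, is to pass a verbosity level instead of the bare flag:

$ ../asr-server/fcgi-nnet3-decoder --verbose=1 --fcgi-socket=:8000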

ASR server connection timeout

I successfully set up the server using apache2 and it's working fine. But the app stops automatically after 2-3 hrs and needs to be started again. Any help?
The following error appears:

packet_write_wait : Connection to 52.32.208.166 port 22 : Broken pipe

g++: error: [somewhere]/liblapack.a: No such file or directory

~/apiai/asr-server$ make
make -C src
make[1]: Entering directory '/home/osboxes/apiai/asr-server/src'
ar -cr libstidecoder.a Timing.o Response.o RequestRawReader.o ResponseJsonWriter.o ResponseMultipartJsonWriter.o OnlineDecoder.o Nnet3LatgenFasterDecoder.o QueryStringParser.o FcgiDecodingApp.o
ranlib libstidecoder.a
g++ -shared -o liblibstidecoder.so -Wl,--no-undefined -Wl,--as-needed -Wl,-soname=liblibstidecoder.so,--whole-archive libstidecoder.a -Wl,--no-whole-archive -Wl,-rpath=/home/osboxes/apiai/kaldi/tools/openfst/lib -rdynamic -Wl,-rpath=/home/osboxes/apiai/kaldi/src/lib /home/osboxes/apiai/kaldi/src/online2/libkaldi-online2.so /home/osboxes/apiai/kaldi/src/ivector/libkaldi-ivector.so /home/osboxes/apiai/kaldi/src/nnet2/libkaldi-nnet2.so /home/osboxes/apiai/kaldi/src/nnet3/libkaldi-nnet3.so /home/osboxes/apiai/kaldi/src/lat/libkaldi-lat.so /home/osboxes/apiai/kaldi/src/decoder/libkaldi-decoder.so /home/osboxes/apiai/kaldi/src/cudamatrix/libkaldi-cudamatrix.so /home/osboxes/apiai/kaldi/src/feat/libkaldi-feat.so /home/osboxes/apiai/kaldi/src/transform/libkaldi-transform.so /home/osboxes/apiai/kaldi/src/gmm/libkaldi-gmm.so /home/osboxes/apiai/kaldi/src/hmm/libkaldi-hmm.so /home/osboxes/apiai/kaldi/src/tree/libkaldi-tree.so /home/osboxes/apiai/kaldi/src/matrix/libkaldi-matrix.so /home/osboxes/apiai/kaldi/src/fstext/libkaldi-fstext.so /home/osboxes/apiai/kaldi/src/util/libkaldi-util.so /home/osboxes/apiai/kaldi/src/base/libkaldi-base.so /home/osboxes/apiai/kaldi/tools/openfst/lib/libfst.so [somewhere]/liblapack.a [somewhere]/libcblas.a [somewhere]/libatlas.a [somewhere]/libf77blas.a -lm -lpthread -ldl -lfcgi -lfcgi++
g++: error: [somewhere]/liblapack.a: No such file or directory
g++: error: [somewhere]/libcblas.a: No such file or directory
g++: error: [somewhere]/libatlas.a: No such file or directory
g++: error: [somewhere]/libf77blas.a: No such file or directory
make[1]: *** [/home/osboxes/apiai/kaldi/src/makefiles/default_rules.mk:33: liblibstidecoder.so] Error 1
make[1]: Leaving directory '/home/osboxes/apiai/asr-server/src'
make: *** [Makefile:2: all] Error 2

More Kaldi API changes?

I tried to switch in a new model (built with the latest Kaldi) and I hit this error

ERROR: FstImpl::ReadHeader: FST not of type vector: <unspecified>
ERROR (fcgi-nnet3-decoder[5.3.15~1-f14e]:ReadFstKaldi():kaldi-fst-io.cc:40) Could not read fst from HCLG.fst

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::string)
apiai::Nnet3LatgenFasterDecoder::Initialize(kaldi::OptionsItf&)
apiai::FcgiDecodingApp::Run(int, char**)
main

I found this posting:
https://groups.google.com/forum/#!msg/kaldi-help/0SuCkeHyUmU/qIW-INChAAAJ
in which Dan Povey suggested the fix: "I believe ReadFstKaldi() should now be ReadFstKaldiGeneric(), to allow reading const_fst."
I made the change in Nnet3LatgenFasterDecoder.cc and this particular load error went away. But the recognition itself is trashed, which suggests to me that there is something else going on.

Rebuild when server running

When I use the FastCGI api-ai server, the repository often gets rebuilt:
LOG (fcgi-nnet3-decoder:RebuildRepository():determinize-lattice-pruned.cc:283) Rebuilding repository.

How can I fix it?

Many thanks

Logs for incoming request and outgoing response

Hey, thanks for the server as well as for the pre-trained model.

I am kind of new to Kaldi and server management; can anyone guide me on how to get logs for incoming requests and outgoing responses?

I can see KALDI_LOG in the code, but I don't know where the logs are saved.

Thanks in advance

Error with my own Model

Hi,

I am getting the following error when I use my own model.

{"status":"error","data":[{"text":"Assertion failed: features.NumCols() == mfcc_dim + ivector_dim && "Mismatch in features dim"
Stack trace is:
kaldi::KaldiGetStackTrace()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::DecodableNnet3SimpleOnline::ComputeForFrame(int)
kaldi::nnet3::DecodableNnet3SimpleOnline::LogLikelihood(int, int)
kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
kaldi::SingleUtteranceNnet3Decoder::AdvanceDecoding()
apiai::Nnet3LatgenFasterDecoder::AcceptWaveform(float, kaldi::VectorBase const&, bool)
apiai::OnlineDecoder::Decode(apiai::Request&, apiai::Response&)
apiai::FcgiDecodingApp::ProcessingRoutine(apiai::Decoder&)
apiai::FcgiDecodingApp::Run(int, char**)
../asr-server/fcgi-nnet3-decoder(main+0x5f) [0x8267cc]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f821c563f45]
../asr-server/fcgi-nnet3-decoder() [0x8266a9]

[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::DecodableNnet3SimpleOnline::ComputeForFrame(int)
kaldi::nnet3::DecodableNnet3SimpleOnline::LogLikelihood(int, int)
kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
kaldi::SingleUtteranceNnet3Decoder::AdvanceDecoding()
apiai::Nnet3LatgenFasterDecoder::AcceptWaveform(float, kaldi::VectorBase const&, bool)
apiai::OnlineDecoder::Decode(apiai::Request&, apiai::Response&)
apiai::FcgiDecodingApp::ProcessingRoutine(apiai::Decoder&)
apiai::FcgiDecodingApp::Run(int, char**)
../asr-server/fcgi-nnet3-decoder(main+0x5f) [0x8267cc]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f821c563f45]
../asr-server/fcgi-nnet3-decoder() [0x8266a9]

"}]}

My model is trained using MFCC + i-vectors. It is a chain TDNN model. The mfcc.conf file looks like:

--use-energy=false # use average of log energy, not energy.
--sample-frequency=16000 # Switchboard is sampled at 8kHz
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=40 # low cutoff frequency for mel bins
--high-freq=-200 # high cutoff frequency, relative to Nyquist of 4000 (=3800)

I believe the error is due to using MFCC + i-vectors. Is there a quick fix for this?

8 Khz Acoustic Model with Ivector

I have built an 8 kHz NNET3 acoustic model with i-vectors (similar configuration to Switchboard). I'm trying it with asr-server. I made some changes (for example, #def AUDIO_FREQUENCY = 8000) in all the places in the code where 16 kHz appears. The system runs without errors, but the result is "" for 8 kHz, 16-bit raw sentences, whereas the Kaldi decoder's result is correct. I would like to know if I should make more modifications to be able to run my model. I have seen that the api.ai model does not use i-vectors. Thanks in advance.

What kind of server

Hi,

What type of server configuration would be needed to process/decode 10 concurrent speech recognition requests? How many cores and how much RAM? Not for training, only decoding.

Thanks

Error in building app (compilation phase)

Hi,
while compiling asr-server, I get the following error:
g++: error: /home/somnath/kaldi/src/thread/libkaldi-thread.so: No such file or directory

I am using the latest Kaldi version (August 2017). All other .so files are there; see the following:
/home/somnath/kaldi/src/base/libkaldi-base.so
/home/somnath/kaldi/src/chain/libkaldi-chain.so
/home/somnath/kaldi/src/cudamatrix/libkaldi-cudamatrix.so
/home/somnath/kaldi/src/decoder/libkaldi-decoder.so
/home/somnath/kaldi/src/feat/libkaldi-feat.so
/home/somnath/kaldi/src/fstext/libkaldi-fstext.so
/home/somnath/kaldi/src/gmm/libkaldi-gmm.so
/home/somnath/kaldi/src/hmm/libkaldi-hmm.so
/home/somnath/kaldi/src/ivector/libkaldi-ivector.so
/home/somnath/kaldi/src/kws/libkaldi-kws.so
/home/somnath/kaldi/src/lat/libkaldi-lat.so
/home/somnath/kaldi/src/lib/libkaldi-base.so
/home/somnath/kaldi/src/lib/libkaldi-chain.so
/home/somnath/kaldi/src/lib/libkaldi-cudamatrix.so
/home/somnath/kaldi/src/lib/libkaldi-decoder.so
/home/somnath/kaldi/src/lib/libkaldi-feat.so
/home/somnath/kaldi/src/lib/libkaldi-fstext.so
/home/somnath/kaldi/src/lib/libkaldi-gmm.so
/home/somnath/kaldi/src/lib/libkaldi-hmm.so
/home/somnath/kaldi/src/lib/libkaldi-ivector.so
/home/somnath/kaldi/src/lib/libkaldi-kws.so
/home/somnath/kaldi/src/lib/libkaldi-lat.so
/home/somnath/kaldi/src/lib/libkaldi-lm.so
/home/somnath/kaldi/src/lib/libkaldi-matrix.so
/home/somnath/kaldi/src/lib/libkaldi-nnet.so
/home/somnath/kaldi/src/lib/libkaldi-nnet2.so
/home/somnath/kaldi/src/lib/libkaldi-nnet3.so
/home/somnath/kaldi/src/lib/libkaldi-online.so
/home/somnath/kaldi/src/lib/libkaldi-online2.so
/home/somnath/kaldi/src/lib/libkaldi-sgmm2.so
/home/somnath/kaldi/src/lib/libkaldi-transform.so
/home/somnath/kaldi/src/lib/libkaldi-tree.so
/home/somnath/kaldi/src/lib/libkaldi-util.so
/home/somnath/kaldi/src/lm/libkaldi-lm.so
/home/somnath/kaldi/src/matrix/libkaldi-matrix.so
/home/somnath/kaldi/src/nnet/libkaldi-nnet.so
/home/somnath/kaldi/src/nnet2/libkaldi-nnet2.so
/home/somnath/kaldi/src/nnet3/libkaldi-nnet3.so
/home/somnath/kaldi/src/online/libkaldi-online.so
/home/somnath/kaldi/src/online2/libkaldi-online2.so
/home/somnath/kaldi/src/sgmm2/libkaldi-sgmm2.so
/home/somnath/kaldi/src/transform/libkaldi-transform.so
/home/somnath/kaldi/src/tree/libkaldi-tree.so
/home/somnath/kaldi/src/util/libkaldi-util.so

Moreover, I cannot find src/thread in my Kaldi version. Please let me know what the problem is.

can we get text only

Whenever I run it, it works. But can we get only the text output, without the status, data, time, etc. fields?
And by the way, is it possible to run the server again after stopping ../asr-server/fcgi-nnet3-decoder --fcgi-socket=:8080?

simple wav decoding doesn't work

En_1272-128104-0000.zip
Installation and compilation were successful. But when trying a simple English wav file, there's no recognition.

To be sure that the file is correct, I used "ffmpeg -i En_1272-128104-0000.wav -f s16le -ar 16000 -ac 1 En_1272-128104-0000.raw".

What am I missing?
Thanks for your help.

RESULT

curl -H "Content-Type: application/octet-stream" --data-binary En_1272-128104-0000.raw http://localhost/asr

{"status":"ok","data":[{"confidence":0.932974,"text":""}]}

VLOG[4] (fcgi-nnet3-decoder[5.2.215~1-5e7d]:FinalizeDecoding():lattice-faster-online-decoder.cc:788) pruned tokens from 4053 to 74
VLOG[4] (fcgi-nnet3-decoder[5.2.215~1-5e7d]:GetRawLattice():lattice-faster-online-decoder.cc:191) init:40 buckets:83 load:0.891566 max:1
VLOG[1] (fcgi-nnet3-decoder[5.2.215~1-5e7d]:DeterminizeLatticePhonePruned():determinize-lattice-pruned.cc:1440) Doing first pass of determinization on phone + word lattices.
VLOG[1] (fcgi-nnet3-decoder[5.2.215~1-5e7d]:DeterminizeLatticePhonePruned():determinize-lattice-pruned.cc:1455) Doing second pass of determinization on word lattices.
VLOG[1] (fcgi-nnet3-decoder[5.2.215~1-5e7d]:Decode():OnlineDecoder.cc:250) Recognized @ 71 ms
VLOG[1] (fcgi-nnet3-decoder[5.2.215~1-5e7d]:Decode():OnlineDecoder.cc:255) Decode subroutine done
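One thing to check, going by the curl example in the guide above: the command here is missing the @ prefix before the file name, so curl sends the literal string rather than the file contents, which would explain the empty transcription. The form matching the guide would be:

$ curl -H "Content-Type: application/octet-stream" --data-binary @En_1272-128104-0000.raw http://localhost/asr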

custom language model

It's working fine for me, but how can I customize a language model for it, using a JSON model from api.ai or any limited text corpus with high recognition priority?
Please let me know. Thank you very much in advance.

Errors building asr-server with Kaldi release as of May 10, 2016 (CentOS 6.7, g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16))

g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/home/xyz/apiai/kaldi-apiai/tools/ATLAS/include -I/home/xyz/apiai/kaldi-apiai/tools/openfst/include -I/home/xyz/apiai/kaldi-apiai/src -L/home/xyz/apiai/kaldi-apiai/src -g -c -o Nnet3LatgenFasterDecoder.o Nnet3LatgenFasterDecoder.cc
In file included from Nnet3LatgenFasterDecoder.cc:16:
Nnet3LatgenFasterDecoder.h:49: error: 'OnlineNnet2FeaturePipelineConfig' in namespace 'kaldi' does not name a type
Nnet3LatgenFasterDecoder.h:52: error: ISO C++ forbids declaration of 'OnlineNnet2FeaturePipelineInfo' with no type
Nnet3LatgenFasterDecoder.h:52: error: invalid use of '::'
Nnet3LatgenFasterDecoder.h:52: error: expected ';' before '*' token
Nnet3LatgenFasterDecoder.h:58: error: ISO C++ forbids declaration of 'OnlineIvectorExtractorAdaptationState' with no type
Nnet3LatgenFasterDecoder.h:58: error: invalid use of '::'
Nnet3LatgenFasterDecoder.h:58: error: expected ';' before '*' token
Nnet3LatgenFasterDecoder.h:59: error: ISO C++ forbids declaration of 'OnlineNnet2FeaturePipeline' with no type
Nnet3LatgenFasterDecoder.h:59: error: invalid use of '::'
Nnet3LatgenFasterDecoder.h:59: error: expected ';' before '*' token
Nnet3LatgenFasterDecoder.cc: In constructor 'apiai::Nnet3LatgenFasterDecoder::Nnet3LatgenFasterDecoder()':
Nnet3LatgenFasterDecoder.cc:25: error: 'feature_info_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc: In destructor 'virtual apiai::Nnet3LatgenFasterDecoder::~Nnet3LatgenFasterDecoder()':
Nnet3LatgenFasterDecoder.cc:33: error: 'feature_info_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc: In member function 'virtual void apiai::Nnet3LatgenFasterDecoder::RegisterOptions(kaldi::OptionsItf&)':
Nnet3LatgenFasterDecoder.cc:55: error: 'feature_config_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc: In member function 'virtual bool apiai::Nnet3LatgenFasterDecoder::Initialize(kaldi::OptionsItf&)':
Nnet3LatgenFasterDecoder.cc:73: error: 'feature_info_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc:73: error: expected type-specifier
Nnet3LatgenFasterDecoder.cc:73: error: expected ';'
Nnet3LatgenFasterDecoder.cc: In member function 'virtual void apiai::Nnet3LatgenFasterDecoder::InputStarted()':
Nnet3LatgenFasterDecoder.cc:105: error: 'adaptation_state_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc:105: error: expected type-specifier
Nnet3LatgenFasterDecoder.cc:105: error: expected ';'
Nnet3LatgenFasterDecoder.cc:107: error: 'feature_pipeline_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc:107: error: expected type-specifier
Nnet3LatgenFasterDecoder.cc:107: error: expected ';'
Nnet3LatgenFasterDecoder.cc: In member function 'virtual void apiai::Nnet3LatgenFasterDecoder::CleanUp()':
Nnet3LatgenFasterDecoder.cc:121: error: 'adaptation_state_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc:122: error: 'feature_pipeline_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc: In member function 'virtual bool apiai::Nnet3LatgenFasterDecoder::AcceptWaveform(kaldi::BaseFloat, const kaldi::VectorBase&)':
Nnet3LatgenFasterDecoder.cc:132: error: 'feature_pipeline_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc: In member function 'virtual void apiai::Nnet3LatgenFasterDecoder::InputFinished()':
Nnet3LatgenFasterDecoder.cc:144: error: 'feature_pipeline_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc: In member function 'virtual void apiai::Nnet3LatgenFasterDecoder::GetLattice(kaldi::CompactLattice*)':
Nnet3LatgenFasterDecoder.cc:156: error: 'feature_pipeline_' was not declared in this scope
Nnet3LatgenFasterDecoder.cc:156: error: 'adaptation_state_' was not declared in this scope
/home/xyz/apiai/kaldi-apiai/src/base/kaldi-math.h: At global scope:
/home/xyz/apiai/kaldi-apiai/src/base/kaldi-math.h:130: warning: 'kaldi::kLogZeroBaseFloat' defined but not used
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs"
make[1]: *** [Nnet3LatgenFasterDecoder.o] Error 1
make[1]: Leaving directory `/home/xyz/apiai/asr-server/src'
make: *** [all] Error 2

Low throughput w/ nginx + asr-server + kaldi

I am testing nginx -> fastcgi_pass -> asr-server -> kaldi successfully, but the throughput we're getting is very low - on the order of 1-2 req/s using significantly sized AWS instances.

Do people have similar experiences using this in production? We're running with the ../asr-server/fcgi-nnet3-decoder --fcgi-socket=:8000 option.

The requested URL /asr-html/ was not found on this server

I followed all the steps.
apache2 is correctly installed.
After running
/apiai/asr-server/fcgi-nnet3-decoder --fcgi-socket=:8000
I'm getting the following output on the terminal:
/apiai/asr-server/fcgi-nnet3-decoder --feature-type=mfcc --mfcc-config=mfcc.conf --frame-subsampling-factor=3 --max-active=2000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --endpoint.silence-phones=1 --endpoint.rule1.min-trailing-silence=0.5 --endpoint.rule2.min-trailing-silence=0.15 --endpoint.rule3.min-trailing-silence=0.1 --fcgi-socket=:8000
LOG (fcgi-nnet3-decoder[5.4.129~1-90363]:Run():FcgiDecodingApp.cc:199) Listening FastCGI data at ":8000"
LOG (fcgi-nnet3-decoder[5.4.129~1-90363]:CompileLooped():nnet-compile-looped.cc:334) Spent 0.0474169 seconds in looped compilation.

On accessing localhost/asr-html/ from the browser I'm getting:
The requested URL /asr-html/ was not found on this server

Cuda build

My Kaldi builds with CUDA by default, so my build initially failed. We need to somehow support including the libraries required by Kaldi. For instance, check whether OpenBLAS works.

"Got no data" after change "--online=false"

I'm not using an i-vector for my nnet3 model. Referring to the help:

--online                    : You can set this to false to disable online iVector estimation and have all the data for each utterance used, even at utterance start.  This is useful where you just want the best results and don't care about online operation.  Setting this to false has the same effect as setting --use-most-recent-ivector=true and --greedy-ivector-extractor=true in the file given to --ivector-extraction-config, and --chunk-length=-1. (bool, default = true)

I changed the --online option to false, but got the following error response:

{"status":"error","data":[{"text":"Got no data"}]}

Any ideas on this?
Appreciate your help.

Building against latest Kaldi

There are a few issues with building this code against the latest Kaldi.

  • In OnlineDecoder.h we need a std:: in front of vector in a few places (or a using namespace std; at the top)
  • The constructor for SingleUtteranceNnet3Decoder in kaldi/src/online2/online-nnet3-decoding.h
    has changed. It now takes a LatticeFasterDecoderConfig instead of an OnlineNnet3DecodingConfig,
    and a nnet3::DecodableNnetSimpleLoopedInfo instead of a nnet3::AmNnetSimple.
    I was able to do a hack fix by finding an older version of online-nnet3-decoding.cc/.h and online-nnet3-decodable-simple.cc/.h and changing the include/Makefile to use the local versions instead.
    I also had to comment out computer.Forward() in online-nnet3-decodable-simple.cc because the new Kaldi lib doesn't seem to have this method (and I'm not sure yet what the impact will be).

It looks like a real fix shouldn't be too hard but I'm not very familiar with Kaldi so I'd need to do a bunch of digging before I could switch the classes around to use the new lib correctly.

These changes may be enough to get things to work for me. At this point it does compile and link but I haven't gotten it fully running yet.

Can I extend asr-server to recognize language?

Hi, sorry to bother with such a question.
I used Kaldi to train on lre07 to get models that identify spoken languages.
I want to apply the models to online language recognition, and I think asr-server is a good tool to extend to support online language identification with online i-vector extraction.
Would you please give me some guidance on how to do this? Thank you very much.

Regards,
Luke

503 error

After completing the installation, when I run it through the server, the error below appears after stopping the recording. How do I remove this error so I can follow the instructions on the page?

503 Service Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.


Apache/2.4.29 (Ubuntu) Server at localhost Port 80

aspire model nnet3

How can we use the ASpIRE chain model with this ASR?

I tried to use it but I get this error:
ERROR: FstImpl::ReadHeader: FST not of type vector:
ERROR (fcgi-nnet3-decoder[5.2.183~1351-32310]:ReadFstKaldi():kaldi-fst-io.cc:40) Could not read fst from HCLG.fst

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
apiai::Nnet3LatgenFasterDecoder::Initialize(kaldi::OptionsItf&)
apiai::FcgiDecodingApp::Run(int, char**)
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
what():
Aborted (core dumped)

503 Service Unavailable when call to /asr

I'm using Ubuntu 16.04 + apache2 + php7.1-fpm.

I installed all the dependency packages and tried to run asr-server via
../asr-server/fcgi-nnet3-decoder --fcgi-socket=127.0.0.1:9999
After running
../asr-server/fcgi-nnet3-decoder --feature-type=mfcc --mfcc-config=mfcc.conf --frame-subsampling-factor=3 --max-active=2000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --endpoint.silence-phones=1 --endpoint.rule1.min-trailing-silence=0.5 --endpoint.rule2.min-trailing-silence=0.15 --endpoint.rule3.min-trailing-silence=0.1 --fcgi-socket=127.0.0.1:9999
LOG (fcgi-nnet3-decoder[5.0.51~1380-cd97]:Run():FcgiDecodingApp.cc:199) Listening FastCGI data at "127.0.0.1:9999"

I then go to http://localhost/asr-html/ and record sound; after that, the request to /asr stays pending until it times out, and I don't know why.

I also tried the example in apiai/kaldi/egs/apiai_decode/s5, and that works.

Can someone help me?
Thanks.

Resolved: you must run
spawn-fcgi -n -p 9999 -- ../asr-server/fcgi-nnet3-decoder

Where are the running logs

I see there is code to dump logs (KALDI_VLOG), but I don't know where I can read the output log after running decoding. Would you please help me?

Regards,
Luke

I am not getting any text for decoding

I have followed the steps given.

However, I always get the following output from the asr server:

{"status":"ok","data":[{"confidence":0.862751,"text":""}],"interrupted":"endofspeech","time":1080}

Please advise on how to check the ASR logs.

Online decoding?

Thanks for sharing this awesome project. I have run it and tested it using my own voice; the ASR accuracy is good. Some questions about this project:

  1. It seems that this project doesn't support online decoding; that is, only when the user presses the stop button does the decoder on the server start decoding the received raw data. Am I right?

  2. I find that in the Kaldi model folder the model file is stored in final.mdl, not final.nnet, so I am wondering whether the current project decodes using a GMM-HMM model rather than an nnet model?

Thanks in advance!

Build model docs

Would you update the readme to explain how to build my own model for asr-server?

Something is confusing within the readme

How exactly am I supposed to refer to the spawned FastCGI process with curl?
The curl command doesn't even specify the port, and something is definitely not working, as curl returns "(52) empty reply from server" when I try to use it.
No idea why there is no working explanation there.

Getting error for my own model

ERROR (fcgi-nnet3-decoder[5.5.268~1-f9828]:DecodableNnetLoopedOnlineBase():decodable-online-looped.cc:50) Ivector feature dimension mismatch: got -1 but network expects 100

[ Stack-Trace: ]
kaldi::MessageLogger::LogMessage() const
kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)
kaldi::nnet3::DecodableNnetLoopedOnlineBase::DecodableNnetLoopedOnlineBase(kaldi::nnet3::DecodableNnetSimpleLoopedInfo const&, kaldi::OnlineFeatureInterface*, kaldi::OnlineFeatureInterface*)
kaldi::SingleUtteranceNnet3DecoderTpl<fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > >::SingleUtteranceNnet3DecoderTpl(kaldi::LatticeFasterDecoderConfig const&, kaldi::TransitionModel const&, kaldi::nnet3::DecodableNnetSimpleLoopedInfo const&, fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&, kaldi::OnlineNnet2FeaturePipeline*)
apiai::Nnet3LatgenFasterDecoder::InputStarted()
apiai::OnlineDecoder::Decode(apiai::Request&, apiai::Response&)
apiai::FcgiDecodingApp::ProcessingRoutine(apiai::Decoder&)
apiai::FcgiDecodingApp::Run(int, char**)
main
__libc_start_main
_start

fastcgi/fcgio.h

I was not able to build it on Ubuntu because Ubuntu has fcgio.h instead of fastcgi/fcgio.h

Mac OS Compatibility: pthread_tryjoin_np

Thanks for open-sourcing this library. It looks like great code and I'm eager to try it out!

I tried compiling on Mac but ran into some issues. There were some minor problems with the configure script and one compilation error: unfortunately, pthread_tryjoin_np is not supported on Mac OS.

FcgiDecodingApp.cc:233:10: error: use of undeclared identifier
      'pthread_tryjoin_np'; did you mean 'pthread_threadid_np'?
                                if (!pthread_tryjoin_np(*i, NULL)) {
                                     ^~~~~~~~~~~~~~~~~~
                                     pthread_threadid_np
