
speech-javascript-sdk's Introduction

IBM Watson Speech Services for Web Browsers


Allows you to easily add voice recognition and synthesis to any web app with minimal code.

Built for Browsers

This library is primarily intended for use in web browsers. Check out ibm-watson to use Watson services (speech and others) from Node.js.

However, a server-side component is required to generate auth tokens. SDKs are available for Node.js, Java, Python, which can be used with a server-side application.
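For illustration, here is a minimal sketch of such a token endpoint using Express and the ibm-watson Node.js SDK. The route name, port, and environment variable are assumptions, not requirements of this SDK:

// Sketch of a server-side token endpoint (route and env var names are illustrative).
const express = require('express');
const { IamTokenManager } = require('ibm-watson/auth'); // requires a recent ibm-watson version

const app = express();
const sttTokenManager = new IamTokenManager({
  apikey: process.env.SPEECH_TO_TEXT_IAM_APIKEY
});

app.get('/api/speech-to-text/token', function(req, res) {
  sttTokenManager
    .getToken() // fetches (and caches) an IAM access token
    .then(function(token) { res.send(token); })
    .catch(function(err) { res.status(500).send(err.message); });
});

app.listen(3000);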

Installation - standalone

Pre-compiled bundles are available on GitHub Releases - just download the file and drop it into your website: https://github.com/watson-developer-cloud/speech-javascript-sdk/releases

Installation - bower

bower install --save watson-speech

Installation - npm with Browserify or Webpack

This library can be bundled with Browserify or webpack and easily included in larger projects:

npm install --save watson-speech

This method enables a smaller bundle by only including the desired components, for example:

var recognizeMic = require('watson-speech/speech-to-text/recognize-microphone');
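As a quick orientation, here is a minimal microphone-transcription sketch built on that import. It assumes a server-side token endpoint such as the one described above; the endpoint name is illustrative:

var recognizeMic = require('watson-speech/speech-to-text/recognize-microphone');

fetch('/api/speech-to-text/token') // your own token endpoint (illustrative name)
  .then(function(response) { return response.text(); })
  .then(function(token) {
    var stream = recognizeMic({ token: token });
    stream.on('data', function(text) {
      console.log(text); // transcription text as it arrives
    });
    stream.on('error', function(err) {
      console.error(err);
    });
    // call stream.stop() when you want to stop the microphone
  });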

Using with IAM

This SDK can be used in the browser with services that use IAM for authentication. This does require a server-side component: an endpoint used to retrieve the token.

In a local environment, set only TEXT_TO_SPEECH_IAM_APIKEY and SPEECH_TO_TEXT_IAM_APIKEY in your .env file.

Once that is set up, the token can be used in your SDK request with the parameter access_token.
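For example (a sketch: access_token is the documented parameter name, while the endpoint path is an assumption):

fetch('/api/speech-to-text/iam-token') // your server-side IAM token endpoint (illustrative)
  .then(function(response) { return response.text(); })
  .then(function(accessToken) {
    var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
      access_token: accessToken
    });
  });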

Using with Angular

Make sure polyfills.ts has the following entries:

(window as any).global = window;
(window as any).process = require('process/browser');
import 'zone.js/dist/zone';  // Included with Angular CLI.
global.Buffer = global.Buffer || require('buffer').Buffer;

Changes

See CHANGELOG.md for a complete list of changes.

Development

Global Transaction ID

Unfortunately, there is no way to set or get the global transaction ID provided by the initial WebSocket handshake with the Speech service. This is due to limitations of the W3C WebSocket API for browsers. It is possible, however, to grab the X-Global-Transaction-Id header through a browser's dev tools "Network" tab. After making a call to the service, look for a request to the v1/recognize endpoint, which should return a 101 code. The response headers from that request will contain the X-Global-Transaction-Id header.

Testing

The test suite is broken up into offline unit tests and integration tests that test against actual service instances.

  • npm test will run the linter and the offline tests
  • npm run test-offline will run the offline tests
  • npm run test-integration will run the integration tests

To run the integration tests, service credentials are required. Make sure you have a valid .env file in the root directory.

speech-javascript-sdk's People

Contributors

apaparazzi0329, bobfang, demiand, dependabot[bot], dpopp07, germanattanasio, hkomine, jeancarl, jeff-arn, jmeis, kant, kushan-chinthaka, mediumtaj, mingzhi22, mreinstein, nfriedly, nhayashida, nylki, playground, ricardojudo, rizwan29, sirspidey, thomas-pfeiffer, toastal, w0o


speech-javascript-sdk's Issues

clean up example server js

The server js example is not up to our current standards. Specifically, it

  • Should use the current Node.js SDK
  • Should not use util._extend (Object.assign() is acceptable, but it really shouldn't be needed at all.)
  • Should use the env properties that the SDK loads automatically instead of abbreviated ones & vcap_services
  • Should probably all be in a single js file

Browser Bug Fix - No activity times out after 5 secs

I believe this SDK suffers from the same issue outlined here:
https://support.mozilla.org/en-US/questions/984179

The issue is that some part of the audio stream needs to be kept globally referenced to stop the microphone from cutting out automatically after a few seconds of inactivity.

I've added at line 18:

window.watsonAudioInput = [];

And changed:

var audioInput = context.createMediaStreamSource(stream);

to:

var audioInput = context.createMediaStreamSource(stream);
watsonAudioInput.push(audioInput);

Now the microphone will continue to record until you'd like it to stop, even if there is no sound input within the first few seconds.

Unable to get all of transcribed audio when using push-to-talk approach

Basically what I am trying to do is build a walkie-talkie type of chatbot.

When the user is holding down a button, audio should be transcribed, and when they lift the button, the complete transcription should be sent to a server to be processed. I want all the audio sent while the button was held down to be transcribed, so I can't send the transcription at the moment the button is released (it might be missing the last word or syllable). The transcription should be sent after all of the audio has been transcribed. I see no way to do this, because there isn't an "all-audio-transcribed" type of event.

Basically if you take the example https://watson-speech.mybluemix.net/microphone-streaming.html and change the following:

  1. make it a single button labeled "push-to-talk"
  2. add another div with id "outputSent" that represents what is sent to a server
  3. mousedown event starts the stream of audio to STT
  4. mouseup event stops the streaming of audio
  5. when transcription is finished, update #outputSent

I think something needs to be added to the API for #5 to work.

I think using a STT service in Frankfurt will help demonstrate this problem so that transcription takes a little longer after the user lifts up on the button.

/cc @nfriedly
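In the meantime, a workaround sketch under the current API: stop the microphone on mouseup and keep collecting results until the stream ends. This assumes the stream keeps emitting until the service finishes transcribing its buffered audio and then emits a normal 'end' event, which is exactly the behavior this issue is asking to have formalized:

var button = document.querySelector('#push-to-talk');
var stream;
var transcript = '';

button.addEventListener('mousedown', function() {
  transcript = '';
  stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
    token: token, // assumed to be fetched earlier
    objectMode: true
  });
  stream.on('data', function(data) {
    if (data.results && data.results[0] && data.results[0].final) {
      transcript += data.results[0].alternatives[0].transcript;
    }
  });
  // 'end' fires when the websocket closes, i.e. after the service has sent
  // everything it is going to send - an approximation of the missing
  // "all-audio-transcribed" event
  stream.on('end', function() {
    document.querySelector('#outputSent').textContent = transcript;
    // ...or POST the transcript to your server here
  });
});

button.addEventListener('mouseup', function() {
  stream.stop(); // stops the mic; transcription of already-sent audio continues
});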

How to handle re-authentication

So it appears we will need to fetch a new token after an hour. What is the recommended method of discovering from the server that a new token is needed, so the request can be made? The docs do say an hour, but also that current sessions are not affected, so it may not even matter (I am returning a ttsToken when the session starts).
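One common pattern (a sketch, not an official recommendation): since a token is only needed when opening a new connection, fetch a fresh one from your server immediately before each recognize/synthesize call instead of caching one for its full lifetime:

function getToken() {
  return fetch('/api/token') // your token endpoint (illustrative name)
    .then(function(res) { return res.text(); });
}

function startRecognizing() {
  // a just-issued token cannot be expired, so no client-side expiry
  // bookkeeping is needed; existing sessions are unaffected anyway
  return getToken().then(function(token) {
    return WatsonSpeech.SpeechToText.recognizeMicrophone({ token: token });
  });
}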

Custom Language Modeling and recognizeElement

http://stackoverflow.com/questions/39004133/watson-nodejs-speech-to-text-train-language-model

I have been using watson-speech@^0.7.5 to generate subtitles for streaming videos (HLS) for sporting customers.

Further, I have been able to train language models.

I would like to use recognizeElement and my customization_id for my trained language model. However, I have 2 problems:

  1. I think recognizeElement has been deprecated

The library call I am using is

videoProps.stream = WatsonSpeechToText.recognizeElement({
  element: myMediaElement,
  token: videoProps.ctx.token,
  muteSource: false,
  autoPlay: false,
  model: videoProps.ctx.currentModel,
  timestamps: true,
  profanity_filter: true,
  inactivity_timeout: -1,
  continuous: true
}).pipe(new WatsonSpeechToText.FormatStream());

However, I noticed that in watson-speech 0.19.3 the API has been removed. Is there an alternative now?
Also, I would like to use a custom language model I trained. Will this API be updated to accept the following options?

element: myMediaElement,
token: videoProps.ctx.token,
muteSource: false,
autoPlay: false,
customization_id: videoProps.ctx.currentModel.replace('custom:', ''),
timestamps: true,
profanity_filter: true,
inactivity_timeout: -1,
continuous: true

  2. I do not think the API supports customization_id.
    While looking in recognize-stream.js, I noticed that neither OPENING_MESSAGE_PARAMS_ALLOWED nor QUERY_PARAMS_ALLOWED supports customization_id.

I can certainly pull down the source and make the changes, but again, recognizeElement is gone.

npm run test fails

npm ERR! Darwin 14.5.0
npm ERR! argv "/usr/local/Cellar/node/4.2.1/bin/node" "/usr/local/bin/npm" "run" "test"
npm ERR! node v4.2.1
npm ERR! npm v2.14.7
npm ERR! code ELIFECYCLE
npm ERR! [email protected] test: karma start test/resources/karma.conf.js --single-run
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] test script 'karma start test/resources/karma.conf.js --single-run'.
npm ERR! This is most likely a problem with the watson-speech package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! karma start test/resources/karma.conf.js --single-run
npm ERR! You can get their info via:
npm ERR! npm owner ls watson-speech
npm ERR! There is likely additional logging output above.

Getting _global is undefined

I have included watson-speech.js in my client-side (browser) code, but get the error mentioned in the title. I am attaching a screenshot of the line where it's cropping up.
I am using the latest version (v0.33.1) and am creating a bundle using webpack to deploy my application.
Although I see _global defined as an IIFE right above this line, I am wondering why it would not get a reference to the window object.
Does it have to be outside the webpack bundle and referenced directly from the browser to work correctly?

Thanking in anticipation,
Kapil.

watson-speech-codebase

watson-speech

I tried this in both Chrome and Firefox.

developer experience checklist

  • make sure it scales; it could be on the front page of reddit
    • Mainly looking at page weight: shoot for 2–3 MB max and rendering in 5 seconds or less.
    • Test with dev-tools throttling at 3G speeds and make sure things are still reasonable.
  • Add Google Analytics
  • blue-green deployment + Travis (see this)
  • testing + Travis (see this)
  • security.js (helmet + express rate limiting) + CSRF (see the personality-insights and speech demos)
  • package.json should not specify a Node engine, so that Bluemix will always use the latest one.
  • Google reCAPTCHA support (make sure design takes this into account when designing a demo)
    • Talk to design about adding it to existing demos.
  • Update Travis to send emails when there is a tagged release.
  • Bluemix deployment tracker and privacy notice

Invalid token not calling error event

When I submit an invalid token (to test catching whether a token has expired), an error appears in the console. To create a more responsive experience, I don't want to generate a new token every time the microphone starts just to make sure it's valid; I'd rather use the first token generated, which takes an hour to expire. It would be good if this thrown error could also trigger the error event that we can define.

Javascript console output:
VM10015:35 WebSocket connection to 'wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=en-US_BroadbandModel&watson-token=abc' failed: Error during WebSocket handshake: Unexpected response code: 400
Uncaught TypeError: Uncaught, unspecified "error" event.
Thanks!

Custom model support in javascript sdk

I am using the application https://audio-analysis-starter-kit.mybluemix.net.
I was trying to use custom speech-to-text models I created, each with a specific customization ID.

The application uses recognizeElement for transcribing YouTube video to text.
After looking at the reference and source code, I learned that customization_id support is not present in recognizeElement, or indeed in any function of the JavaScript SDK (recognizeMicrophone, recognizeFile, etc.).

So I used the Node.js SDK's RecognizeStream, which accepts customization_id, after converting video to audio on the server side using ffmpeg.

But I am having some performance issues, as I have to send "data" events continuously from backend to frontend (I am using AJAX long polling),
and server-sent events do not work in my case, as registering an "EventSource" calls my recognize route, which I don't want.

Is there any plan to add support for custom models in speech javascript sdk ?

inactivity_timeout - no event fired

It'd be good for an event to be fired when the inactivity_timeout is reached. That way, we could flip the button back to unselected and set variables so we know we are no longer listening to the microphone. This applies when using the recognizeMicrophone method.

I can see that the timeout is simply submitted to the service, so perhaps it'd be best to simply implement the timeout on my side, which I will do for the time being - but I thought it was worth a mention as something that'd be useful.
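A client-side timer sketch along those lines (onInactivityTimeout is a hypothetical handler; the 30-second value mirrors the service default):

var INACTIVITY_MS = 30 * 1000;
var timer;

function resetInactivityTimer(stream) {
  clearTimeout(timer);
  timer = setTimeout(function() {
    stream.stop();         // stop the microphone ourselves
    onInactivityTimeout(); // hypothetical handler: unselect the button, etc.
  }, INACTIVITY_MS);
}

var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({ token: token });
resetInactivityTimer(stream);
stream.on('data', function() {
  resetInactivityTimer(stream); // any result counts as activity
});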

Unable to record audio with latest version

Cannot update the speech-to-text demo to the latest version of the SDK.
Steps to reproduce:
"watson-speech": "^0.30.1" => can't record audio
"watson-speech": "0.30.1" => records audio

How to solve cross origin issue

I am calling this API from the client side, and I have to call it from the client side:

https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?text=some_text&watson-token=some_token

In fact, when I tried to get a token, it still threw the same error. I have to call this from the browser for my application and don't know what to do; I've been stuck for more than 9 days and haven't found anything that resolves it. How do I get rid of this cross-origin error, which is making this request (in fact, every request) fail?
Any suggestions, please?


Issue / question about Speech to Text and Text to Speech [.synthesize]

Hi,

I'm not sure it's quite an issue, but for me it is.

I am trying to make a personal assistant, and for that I have to use Speech to Text and Text to Speech together. Unfortunately, at the moment both are active at the same time, so Watson hears itself, and so on.

Is there any method on .synthesize to know when the text-to-speech playback stops, like a .on('stop') or something similar? By the way, with .on('error') I get an error saying that it is not a function.

function audio_synthetize(SomeText) {
  $.ajax({
    type: "POST",
    url: "/api/text-to-speech/token",
    success: function(data) {
      WatsonSpeech.TextToSpeech.synthesize({
        text: SomeText,
        voice: 'en-US_MichaelVoice',
        token: data
      }).on('error', function (error)
      {
        console.log(error);
      });

    }
  });
}

Thx for your help!

Speech to Text in different languages

How can I use Speech to Text (recognizeMicrophone or recognizeFile) with a different language? The Node.js SDK has a 'model' parameter where the language is specified; what is the equivalent here in this JavaScript SDK?
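The browser SDK forwards the same model parameter to the service (it appears in the WebSocket URL of other reports on this page); for example:

var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
  token: token,
  model: 'es-ES_BroadbandModel' // Spanish; the default is en-US_BroadbandModel
});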

getReadable is not a function

After a fresh clone of this repository and an npm install in the examples folder, the speaker-stream-file-html.html example results in the following error upon pressing the button:

TypeError: response.body.getReadable is not a function
    at watson-speech.js:11290
    at <anonymous>

The watson-speech.js file does not have this function defined (obviously), and this is the only reference to this function I can find in the entire repo. Has this function been deprecated in favour of a newer one?

thanks,
Nick

Trying to call getUserMedia from an insecure document

iOS 11.2, Safari, using https (port 443).

Error message:
Trying to call getUserMedia from an insecure document.
exports — watson-speech.min.js:6:111929
recongizeSearchStart — backdrop-speech.js:34
fn f — angular.js:23298
$apply — angular.js:15945
(anonymous function) — angular.js:23303
dispatch — jquery-1.11.3.min.js:4:8555
handle — jquery-1.11.3.min.js:4:5258

unclear code in speech_to_text, needs a comment explaining "why"

There is some filtering logic happening here: https://github.com/watson-developer-cloud/speech-javascript-sdk/blob/master/speech-to-text/webaudio-l16-stream.js#L84

@nfriedly mentioned this had something to do with anti-aliasing for downsampling, but neither of us could come up with clear wording on why that code is present or what exactly it's doing.

This is one of the few areas of the code that is very hard to understand by simply reading it; the rest of the module is pretty self-documenting and easy to follow. :)

Cleaning %HESITATIONs

A really tiny fix, but if a %HESITATION is at the end of a sentence, it won't be replaced by an ellipsis.

Better to change:
var reHesitation = /%HESITATION\s/g; // when the service detects a "hesitation" pause, it literally puts the string "%HESITATION" into the transcription

to:
var reHesitation = /%HESITATION/g; // when the service detects a "hesitation" pause, it literally puts the string "%HESITATION" into the transcription
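A quick console check of the difference (the ellipsis replacement string is illustrative):

var oldRe = /%HESITATION\s/g;
var newRe = /%HESITATION/g;
var text = 'so I was thinking %HESITATION'; // hesitation at the end, no trailing space

text.replace(oldRe, '… '); // 'so I was thinking %HESITATION' - not replaced
text.replace(newRe, '…');  // 'so I was thinking …'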

is stream bifurcation possible?

My app transcribes audio, but also does a bunch of other processing on the side (sentiment analysis, etc.). I want the speech-to-text to display as quickly as possible in the app, but I also want to save the raw audio on my server for analysis that happens later.

I was thinking maybe this could be done by getting at the underlying MediaStream used by this SDK and sending the audio data in parallel to my backend. Is this possible?

thanks for a great library!
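One possible sketch: acquire the MediaStream yourself, hand it to the SDK (recent versions accept a mediaStream option; verify against your version), and tee the same stream into a MediaRecorder that ships the raw audio to your backend. The /api/audio endpoint is illustrative:

navigator.mediaDevices.getUserMedia({ audio: true }).then(function(mediaStream) {
  // 1) live transcription through the SDK
  var sttStream = WatsonSpeech.SpeechToText.recognizeMicrophone({
    token: token,
    mediaStream: mediaStream // assumption: supported by recent SDK versions
  });
  sttStream.on('data', function(text) {
    // display the transcription as quickly as possible
  });

  // 2) in parallel, record the raw audio and upload it for later analysis
  var recorder = new MediaRecorder(mediaStream);
  recorder.ondataavailable = function(e) {
    fetch('/api/audio', { method: 'POST', body: e.data }); // illustrative endpoint
  };
  recorder.start(5000); // emit a chunk roughly every 5 seconds
});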

iOS 11 Microphone capture not working.

Hi there,

I've been playing with the SDK for a few weeks now and I've been unable to get the Microphone to work on iOS11, which has recently included getUserMedia and WebRTC support.

At first I thought it was a bug in iOS, but I did some tests outside of the SDK and was able to get the mic to work. With the SDK, however, it fails silently.

Works on iOS 11:
https://youennf.github.io/webrtc-tests/src/content/getusermedia/audio/

Does not work on iOS 11:
https://watson-speech.mybluemix.net/microphone-streaming.html

It might have something to do with the get-user-media-promise dependency.

  • Steps to reproduce:
    Use links above

  • Expected behaviour:
    Microphone to transcribe audio or at least some audio response / echo.

  • Actual behaviour:
    No error or response on iOS 11 or in console via Safari. Demo does not work.

  • Browser version(s)
    iOS 11 Safari

  • SDK version:
    0.34.1

Thanks,

Liam

No need to buffer early audio

Nothing is processed until the first {"state": "listening"} message, but early audio is automatically buffered by the Watson service. Therefore, there is no need to buffer it in JS, and disabling this should reduce the initial latency.

webrtc interface?

Safari 11 is shipping WebRTC support, and I think there's an opportunity here for:

  • a lower-latency connection than WebSockets (based on UDP under the hood, I think)
  • using native browser streaming APIs rather than rolling our own with WebSockets + a browserified stream shim

I realize this is not a small undertaking, but hey, a guy can dream right? :)

Error Uncaught TypeError: Cannot read property 'map' of undefined at example microphone-alternatives.html

Hi,

I'm running the example code microphone-alternatives.html and I get this error in the console (and no result on screen). I'm using Chrome, but there is a similar error in Firefox.

microphone-alternatives.html:52 Uncaught TypeError: Cannot read property 'map' of undefined
at FormatStream. (microphone-alternatives.html:52)
at FormatStream.EventEmitter.emit (events.js:81)
at readableAddChunk (_stream_readable.js:201)
at FormatStream.Readable.push (_stream_readable.js:165)
at FormatStream.Transform.push (_stream_transform.js:133)
at FormatStream.formatResult (format-stream.js:93)
at FormatStream.Transform._read (_stream_transform.js:172)
at FormatStream.Transform._write (_stream_transform.js:160)
at doWrite (_stream_writable.js:335)
at writeOrBuffer (_stream_writable.js:321)

not emitting error event

I am using the .recognizeMicrophone function:

stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
  token: token,
  continuous: false,
  outputElement: '<css selector>',
  content_type: 'audio/wav'
});

I have added a "Push to Talk" button (when the user clicks it, recording starts and continues until the button is released), but when I release the button before the WebSocket connection is established, it gives the error below:

  • recognize-stream.js:345 Uncaught InvalidStateError: Failed to execute 'send' on 'WebSocket': Still in CONNECTING state.

I have also added a stream.on('error') handler to catch this error, but that event is not being emitted:

stream.on('error', function(err) {
  console.log("error is produced when calling STT");
  console.log(err);
});

The same happens for the two errors below:

  • recognize-stream.js:260 Uncaught Error: Stream was 0 bytes but needs to be at least 100 bytes.
  • recognize-stream.js:260 Uncaught Error: Session timed out due to inactivity after 30 seconds.

Please advise me on how to handle these errors,
and correct me if my approach is wrong.

Regards,
Mahesh Revanwar
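Until the SDK surfaces these as error events, a defensive sketch: defer the stop() call until the stream has produced data, so the WebSocket is past the CONNECTING state (the first data message is used here as a proxy for the connection being open):

var button = document.querySelector('#push-to-talk'); // your Push to Talk element
var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({ token: token });
var connected = false;
var stopRequested = false;

stream.on('data', function() {
  connected = true;                 // the first message implies the socket is open
  if (stopRequested) stream.stop(); // honor an early release now that it is safe
});

button.addEventListener('mouseup', function() {
  if (connected) {
    stream.stop();
  } else {
    stopRequested = true; // defer instead of sending on a CONNECTING socket
  }
});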

detect "state:listening" for recognizeFile

How can I detect when the STT service returns "state": "listening" when using the recognizeFile module? I've tried creating a condition using data.state == "listening"; however, every time the service returns the state, the condition is never hit.
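Two things to check (hedged, as the exact behavior depends on your SDK version): in objectMode the {"state": "listening"} message may be filtered out before it reaches your data handler, and the stream may instead re-emit it as a dedicated event. A sketch under that assumption:

var stream = WatsonSpeech.SpeechToText.recognizeFile({
  token: token,
  file: fileInput.files[0],
  objectMode: true
});

// assumption: the underlying RecognizeStream re-emits the service's
// {"state": "listening"} message as a 'listening' event
stream.on('listening', function() {
  console.log('service is listening');
});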

continuous: false option is no longer supported

After Chrome updated to version 58.0.3029.110, the microphone no longer auto-stops after the first pause.

var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
  token: token,
  continuous: false, // false = automatically stop transcription the first time a pause is detected
  outputElement: '#output' // CSS selector or DOM Element
});

And calling stream.stop.bind(stream); does not stop transcribing either.
This example, http://localhost:3002/microphone-streaming-auto-stop.html, was working too, up until 5/18/2017.
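If the service no longer honors continuous: false, a client-side approximation sketch is to stop the stream yourself when the first final result arrives:

var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
  token: token,
  objectMode: true
});

stream.on('data', function(data) {
  // emulate the old continuous: false behavior: stop after the first
  // final result, i.e. the first detected pause
  if (data.results && data.results[0] && data.results[0].final) {
    stream.stop();
  }
});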

Unnecessary Decoder Write

From my application, I am using code along these lines:

var token = "_MY_TOKEN_";
var watson;

function doButtonClick() {
  watson = WatsonSpeech.SpeechToText.recognizeMicrophone( {
    continuous: false,
    objectMode: true,
    token: token
  } );
  watson.setEncoding( 'utf8' );
  watson.on( 'data', doWatsonData );
  watson.on( 'error', doWatsonError );
  watson.on( 'end', doWatsonEnd );
}

function doWatsonData( data ) {
  // Results in [object Object]
  console.log( data );
}

// stub handlers referenced above (not shown in the original report)
function doWatsonError( err ) {
  console.log( err );
}

function doWatsonEnd() {
  console.log( 'end' );
}

In the "data" event, I use "console.log( data )" and get [object Object]. Placing a breakpoint there, and tracing it backwards into the SDK code, I found that given the settings above, that there's an attempt to "decode" the result, when there is no need to actually decode the result - it has already been decoded, and is ready to be passed along exactly as it is.

To see this in action, use the settings from above, then place a breakpoint around line 3934 in "watson-speech.js" where the line "state.decoder.write(chunk)" is located. When run, the value of "chunk" will be a JavaScript object with the "alternatives" array, etc. up until that line. After that line, the value of "chunk" will be "[object Object]".

For my needs, I was able to remedy this extra encoding by simply commenting out that line and leaving it commented out. The event gets emitted almost immediately afterwards and arrives in my handler as an actual object, with its properties intact. A more thorough fix would probably involve taking a closer look at the decoding to see why it is turning a valid JavaScript object into its string representation.

Error: No speech detected for 30s.

Hi there, I am encountering the following error. Can you please help me change the connection timeout from 30s to 2 minutes?
Error: No speech detected for 30s.
at emitError (watson-speech.js:4973)
at WebSocket.socket.onmessage (watson-speech.js:5001)
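The inactivity_timeout option is forwarded to the service (other reports on this page pass it, e.g. inactivity_timeout: -1); a sketch raising it to 2 minutes:

var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
  token: token,
  inactivity_timeout: 120 // seconds; -1 disables the inactivity timeout entirely
});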

Promise not implemented

Hi, in Internet Explorer 11, Uncaught Exception: 'Promise' is undefined at (compiled_code):420 is shown when the Mic button is clicked. It works fine in Chrome and Firefox. Below is the code:

<script src="scripts/fetch.js"></script>
<script src="scripts/watson-speech.js"></script>
<script>
document.querySelector('#button').onclick = function () {
  fetch('scripts/token')
    .then(function(response) {
      return response.text();
    }).then(function (token) {
      var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
        token: token,
        outputElement: '#output' // CSS selector or DOM Element
      });
      stream.on('error', function(err) {
        console.log(err);
      });
      //document.querySelector('#stop').onclick = function() {
      //  stream.stop();
      //};
    }).catch(function(error) {
      console.log(error);
    });
};
</script>

Please advise.
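IE 11 has no native Promise, so loading a polyfill before watson-speech.js should help; the es6-promise CDN below is one example (any Promise polyfill will do):

<!-- load a Promise polyfill first for IE 11 (example CDN; pin the version as you prefer) -->
<script src="https://cdn.jsdelivr.net/npm/es6-promise@4/dist/es6-promise.auto.min.js"></script>
<script src="scripts/fetch.js"></script>
<script src="scripts/watson-speech.js"></script>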
