
IBM Watson Speech Services for Web Browsers

Allows you to easily add voice recognition and synthesis to any web app with minimal code.

Warning: This library still has a few rough edges and may yet see breaking changes.

For Web Browsers Only

This library is primarily intended for use in browsers. Check out watson-developer-cloud to use Watson services (speech and others) from Node.js.

However, a server-side component is required to generate auth tokens. The examples/ folder includes example Node.js and Python servers; SDKs are available for Node.js, Java, and Python, and there is also a REST API.

Installation - standalone

Pre-compiled bundles are available on GitHub Releases - just download the file and drop it into your website: https://github.com/watson-developer-cloud/speech-javascript-sdk/releases
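
For example, a page can load the bundle with a script tag. A minimal sketch (the file name below is a placeholder for whichever release file you downloaded, and the global WatsonSpeech object it exposes is an assumption):

  <!-- Load the pre-compiled bundle downloaded from the Releases page
       (the file name here is a placeholder - use the file you downloaded). -->
  <script src="watson-speech.min.js"></script>
  <script>
    // The bundle is assumed to expose its API on a global WatsonSpeech object.
    console.log(WatsonSpeech);
  </script>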

Installation - npm with browserify

This library is built with browserify and is easy to use in browserify-based projects:

npm install --save watson-speech
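
You can then require the library (or just the pieces you need) and let browserify bundle it with your own code. A minimal sketch, assuming the package's top-level export is what you want (check the package layout for more granular entry points):

  // app.js - bundle with `browserify app.js -o bundle.js`
  // Requiring the top-level package is an assumption; the package may also
  // expose individual sub-modules that can be required to reduce bundle size.
  var WatsonSpeech = require('watson-speech');

  console.log(WatsonSpeech); // inspect the exported API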

Breaking change for v0.22.0

The format of objects emitted in objectMode has changed from {alternatives: [...], index: 1} to {results: [{alternatives: [...]}], result_index: 1}. This was done to enable the new speaker_labels feature. There is a new ResultExtractor class and recognizeMicrophone() and recognizeFile() both accept a new extract_results option to restore the old behavior.

The format now exactly matches what the Watson Speech to Text service returns and shouldn't change again unless the Watson service changes.
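
For example, to opt back into the old shape, pass extract_results when creating the stream. A minimal sketch (the WatsonSpeech.SpeechToText namespace, the token variable, and the objectMode flag are assumptions; see the API docs for exact option names):

  // A sketch of the v0.22.0+ objectMode behavior and the extract_results opt-out.
  var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
    token: token,          // auth token generated server-side (see below)
    objectMode: true,      // emit parsed result objects instead of text
    extract_results: true  // restore the old {alternatives: [...], index: 1} shape;
                           // omit this to get {results: [...], result_index: n}
  });

  stream.on('data', function(data) {
    console.log(data);
  });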

API & Examples

The basic API is outlined below; see the complete API docs at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/

See several examples at https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/static/

All API methods require an auth token that must be generated server-side. (See https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/ for a couple of basic examples in Node.js and Python.)
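
For example, a page might fetch a token from its own server-side endpoint before calling any of the methods below. A minimal sketch (the /api/token route is a placeholder; the examples/ folder shows real token servers):

  // Fetch an auth token from your own server before calling any SDK method.
  // '/api/token' is a placeholder for whatever route your token server exposes.
  fetch('/api/token')
    .then(function(response) { return response.text(); })
    .then(function(token) {
      // pass the token to the SDK methods described below, e.g.
      // WatsonSpeech.SpeechToText.recognizeMicrophone({token: token});
    })
    .catch(function(err) { console.error('could not fetch auth token', err); });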

Text to Speech

Speaks the supplied text through an automatically-created <audio> element (a usage sketch appears after the options list below). Currently limited to text that can fit within a GET URL; this is particularly an issue on Internet Explorer before Windows 10, where the maximum length is around 1000 characters after the token is accounted for.

Options:

  • text - the text to speak // todo: list supported languages
  • voice - the desired playback voice's name - see .getVoices(). Note that the voices are language-specific.
  • autoPlay - set to false to prevent the audio from automatically playing
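
A minimal usage sketch, assuming the text-to-speech method is exposed as WatsonSpeech.TextToSpeech.synthesize() and that token was obtained as described above (the namespace and method name are assumptions; check the API docs):

  // Speak a short phrase through an automatically-created <audio> element.
  // The namespace/method name and the token variable are assumptions.
  WatsonSpeech.TextToSpeech.synthesize({
    text: 'Hello from the browser!', // keep it short - it must fit in a GET URL
    voice: 'en-US_AllisonVoice',     // optional; voices are language-specific (see .getVoices())
    autoPlay: true,                  // set to false to prevent automatic playback
    token: token                     // auth token generated server-side
  });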

Speech to Text: recognizeMicrophone()

Transcribes audio from the user's microphone (see the usage sketch at the end of this section).

Options:

  • keepMic: if true, preserves the MicrophoneStream for subsequent calls, preventing additional permissions requests in Firefox
  • Other options passed to RecognizeStream
  • Other options passed to WritableElementStream if options.outputElement is set

Requires the getUserMedia API, so browser support is limited (see http://caniuse.com/#search=getusermedia). Also note that Chrome requires HTTPS (with a few exceptions for localhost and the like); see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features

Pipes results through a FormatStream by default; set options.format=false to disable.

Known issue: Firefox continues to display a microphone icon in the address bar after recording has ceased. This is a browser bug.
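
A minimal usage sketch, assuming the method is exposed as WatsonSpeech.SpeechToText.recognizeMicrophone() and that token was obtained as described above:

  // Start streaming microphone audio to the Speech to Text service.
  // The namespace, token variable, and element IDs are assumptions.
  var micStream = WatsonSpeech.SpeechToText.recognizeMicrophone({
    token: token,
    keepMic: true,            // avoid repeated permission prompts in Firefox
    outputElement: '#output'  // optional: stream formatted text into a page element
  });

  micStream.on('error', function(err) {
    console.error(err);
  });

  // Stop recording when the user clicks a (hypothetical) stop button.
  document.querySelector('#stop').onclick = function() {
    micStream.stop();
  };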

Speech to Text: recognizeFile()

Can recognize and optionally attempt to play a File or Blob (such as from an <input type="file"/> or from an ajax request); see the usage sketch at the end of this section.

Options:

  • data: a Blob or File instance.
  • play: (optional, default=false) Attempt to also play the file locally while uploading it for transcription
  • Other options passed to RecognizeStream
  • Other options passed to WritableElementStream if options.outputElement is set

play requires that the browser support the file's format; most browsers support wav and ogg/opus, but not flac. A playback-error event will be emitted on the RecognizeStream if playback fails. Playback will automatically stop when .stop() is called on the RecognizeStream.

Pipes results through a TimingStream if options.play=true; set options.realtime=false to disable.

Pipes results through a FormatStream by default; set options.format=false to disable.
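
A minimal usage sketch for transcribing a user-selected file, assuming the method is exposed as WatsonSpeech.SpeechToText.recognizeFile() and that token was obtained as described above:

  // Transcribe (and optionally play) a file chosen by the user.
  // The namespace, token variable, and element IDs are assumptions.
  document.querySelector('#audio-file').onchange = function() {
    var fileStream = WatsonSpeech.SpeechToText.recognizeFile({
      token: token,
      data: this.files[0],      // a File or Blob instance
      play: true,               // also attempt to play the file locally while uploading
      outputElement: '#output'  // optional: stream formatted text into a page element
    });

    fileStream.on('playback-error', function(err) {
      console.error('could not play the file locally', err);
    });
  };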

Changes

There have been a few breaking changes in recent releases:

  • Removed SpeechToText.recognizeElement() due to quality issues
  • Renamed recognizeBlob to recognizeFile to make the primary usage more apparent
  • Changed the playFile option of recognizeBlob() to just play, and corrected its default
  • Changed the format of objects emitted in objectMode to exactly match what the service sends. Added a ResultExtractor class and an extract_results option to enable the older behavior.

See CHANGELOG.md for a complete list of changes.

todo

  • Further solidify API
  • break components into standalone npm modules where it makes sense
  • run integration tests on travis (fall back to offline server for pull requests)
  • add even more tests
  • better cross-browser testing (IE, Safari, mobile browsers - maybe saucelabs?)
  • update node-sdk to use current version of this lib's RecognizeStream (and also provide the FormatStream + anything else that might be handy)
  • move result and results events to node wrapper (along with the deprecation notice)
  • improve docs
  • consider a wrapper to match https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
  • support a "hard" stop that prevents any further data events, even for already-uploaded audio; ensure the TimingStream also implements this
  • look for bug where single-word final results may omit word confidence (possibly due to FormatStream?)
  • fix bug where TimingStream shows words slightly before they're spoken
