
speechmatics-python's Introduction

Speechmatics


Speechmatics (https://speechmatics.com) provides an API for speech to text (https://speechmatics.com/api-details).

This gem implements the API making it easier to integrate into Ruby and/or Rails projects.

Installation

Add this line to your application's Gemfile:

gem 'speechmatics'

And then execute:

$ bundle

Or install it yourself as:

$ gem install speechmatics

Usage

See the tests, or start with this basic usage:

# configure with API key and user ID to use for all requests
Speechmatics.configure do |sm|
  sm.auth_token = '<your api key here>'
  sm.user_id    = 1234

  # these are defaults
  sm.adapter    = :excon
  sm.endpoint   = 'https://api.speechmatics.com/v1.0/'
  sm.user_agent = "Speechmatics Ruby Gem #{Speechmatics::VERSION}"
end

# create a new client
c = Speechmatics::Client.new

# create a new client, passing in Faraday parameters
# you can also pass in the same options as in `configure`
c = Speechmatics::Client.new(:request => { :timeout => 240 })

# retrieve user
j = c.user.get

# list jobs
jobs = c.user.jobs.list

# create job
info = c.user.jobs.create(data_file: '/Users/nobody/dev/speechmatics/test/zero.wav')

# create job with more options
info = c.user.jobs.create(
  data_file: '/Users/nobody/dev/speechmatics/test/zero.wav',
  content_type: 'audio/x-wav; charset=binary',
  notification: '<email | none | callback>',
  callback: 'http://www.example.com/transcript_callback'
)

# retrieve job
job = c.user.jobs.find(5678)

# retrieve transcript for a job
trans = c.user.jobs(5678).transcript
# alt syntax
trans = c.user.jobs.transcript(5678)

Changes

  • 0.2.0 - 10/12/2016

    • Added alignment support, thanks @rogerz42892
  • 0.1.4 - 8/21/2015

    • Add support for video files.
  • 0.1.3 - 6/10/2015

    • Add default timeout of 120 seconds.
  • 0.1.2 - 6/10/2015

    • Yanked
  • 0.1.1 - 3/26/2015

    • Fix to work with Faraday 0.7.6
  • 0.1.0 - 10/24/2014

    • Remove /user/$userid/jobs/$jobid/audio endpoint, no longer supported by Speechmatics
  • 0.0.2 - 8/7/2014

    • Minor bug fix, treat integers/numbers as strings for params
    • Use mimemagic to determine content type (no more libmagic dependency; this works on heroku)
    • Switched the endpoint to use new 'https' requirement

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

speechmatics-python's People

Contributors

aaronng91, anjz, benjamingorman, dan-cochrane, davidhowlett, dependabot[bot], dumitrugutu, francisr, funous, giorgoshadji, hennerm, jamesg-speechmatics, jameso-speechmatics, jamied157, jrg1381, khwong-speechmatics, peteruh, prachetaphadnis, pradeepk324, rakeshv247, sm-gravid-day, speechmatics-bot, teridspeech, tudorcrl, viren-nadkarni, vladosaurus, weakcamel


speechmatics-python's Issues

translation_config not used when provided in TranscriptionConfig

Describe the bug
So I tried to adapt the transcribe_from_microphone.py example to do translation. I found that the class TranscriptionConfig can optionally take a TranslationConfig (following the advice of checking the output of speechmatics.cli.get_transcription_config({"config_file": "", "mode": "rt"})). But the TranslationConfig never gets used, as it is thrown out again when as_config() is called on TranscriptionConfig. Using TranslationConfig (or RTTranslationConfig) on its own is not allowed either, as they don't provide as_config().

I would expect to be able either to add a TranslationConfig into TranscriptionConfig or to use a TranslationConfig as a configuration on its own. I'm not sure which of those two options is supposed to work, but neither of them working seems like a bug to me; please correct me if I'm wrong!
There doesn't seem to be any working example of doing translation in the current version of speechmatics-python.

English-only Language Detection in Transcription for English/Spanish Audio

Current behaviour

Only English is detected when transcribing audio (testing batch transcription) of a 12+ minute English/Spanish video: "en" is the value of "language" for all words in the transcription results.

Steps to Reproduce

Download audio from the YouTube link or GoogleDrive link

Update the audio_file (path to audio file) and speechmaticsAPIkey variables


import speechmatics
import ssl
import certifi

from speechmatics.batch_client import BatchClient

audio_file = "PATH_TO_AUDIO_FILE"
speechmaticsAPIkey = "YOUR_API_KEY"  # placeholder; set your real key

ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations(certifi.where())

settings = speechmatics.models.ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=speechmaticsAPIkey,
    ssl_context=ssl_context,
)

operatingPoint = "enhanced"
expectedLanguages = speechmatics.models.BatchLanguageIdentificationConfig(
    expected_languages=["en", "es"]
)

LANGUAGE = "auto"

conf = speechmatics.models.BatchTranscriptionConfig(
    language=LANGUAGE,
    operating_point=operatingPoint,
    language_identification_config=expectedLanguages,
)

with BatchClient(settings) as client:
    job_id = client.submit_job(audio=audio_file, transcription_config=conf)
    transcript = client.wait_for_completion(job_id, transcription_format='json-v2')

Expected Behaviour

The "language" data points in transcript['results'] corresponding to Spanish words are expected to be "es".

Environment

Mac, Ventura 13.6.2, Python 3.10, standard Python venv.

Other Info

Diarization works for this file, but language detection is still English-only.

Sandboxed (MacOS) Batch transcription fails due to permission error accessing '/private/etc/apache2/mime.types'

Batch transcription works using the same code tested in the python console.

When sandboxed, the Batch transcription process fails, as some underlying library tries to access "/private/etc/apache2/mime.types".

To prevent this permission error, the file needs to be accessed within the app environment (or an alternative option is needed to avoid the file, if possible).

Real-time transcription works in the sandboxed context.

I first thought to store a local copy of the mime.types file and track down where Speechmatics is accessing it (to reroute the library to access the local version), but it is elusive and I suspect there is a better solution.

If there isn't a straightforward solution using the Speechmatics Python method, I'll plan to test with a lower-abstraction approach in python.

Batch transcription test:

import ssl

import certifi
import speechmatics
from speechmatics.batch_client import BatchClient

ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations(certifi.where())

conf = speechmatics.models.BatchTranscriptionConfig(
    language=LANGUAGE,
    output_locale=englishLocale if LANGUAGE == "en" else None,  # "output_local" in the original report; the parameter is output_locale
    operating_point=operatingPoint,
)

settings = speechmatics.models.ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=speechmaticsAPIkey,
    ssl_context=ssl_context,
)

try:
    with BatchClient(settings) as client:
        job_id = client.submit_job(audio=audio_file, transcription_config=conf)
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
except Exception:
    raise  # exception handling truncated in the original report
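If the stdlib `mimetypes` module is the culprit (an assumption, but it is what reads `/etc/apache2/mime.types` on macOS via its `knownfiles` list on first use), one hedged workaround is to initialise it explicitly before any Speechmatics code runs, so it never touches the filesystem:

```python
import mimetypes

# The stdlib mimetypes module lazily reads the paths in mimetypes.knownfiles
# (which includes /etc/apache2/mime.types on macOS) the first time a type is
# guessed. Initialising it explicitly with an empty file list keeps it to the
# built-in table and avoids the sandboxed filesystem read entirely.
mimetypes.init(files=[])

# Types from the built-in table still resolve as usual.
mtype, encoding = mimetypes.guess_type("audio.wav")
print(mtype)
```

Calling `mimetypes.init(files=[])` early (before the first `guess_type` call anywhere in the process) restricts lookups to the built-in table; passing a bundled copy of `mime.types` in `files` instead would keep a fuller mapping without reaching outside the sandbox.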

Auth key error in both Python and CLI version

Describe the bug
Auth key gives 403

To Reproduce

speechmatics config set --auth-token yMJVc*******************HaZ
speechmatics batch transcribe audio.mp3

Expected behavior
I expect it to work

Screenshots/Logs

Processing audio.mp3
==========
httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://asr.api.speechmatics.com/v2/jobs?sm-sdk=python-cli-1.12.0'
For more information check: https://httpstatuses.com/403

Additional context
API keys are generated and I am sure are correct.
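One way to sanity-check a key outside the CLI is a minimal request of your own. This is a sketch: it assumes the standard Bearer scheme the Speechmatics SaaS API uses, and `YOUR_API_KEY` is a placeholder.

```python
import urllib.request

API_URL = "https://asr.api.speechmatics.com/v2/jobs"  # same endpoint the CLI calls

def auth_header(token: str) -> dict:
    # A 403 from this endpoint usually means the token is missing, truncated,
    # or issued for a different environment than the URL being called.
    return {"Authorization": f"Bearer {token}"}

req = urllib.request.Request(API_URL, headers=auth_header("YOUR_API_KEY"))
# urllib.request.urlopen(req)  # uncomment to send; a bad key raises HTTPError 401/403
```

If this bare request also returns 403, the problem is with the key or account rather than the Python library.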

Trial Server Websocket Connection

Hello, I am trying to connect to the trial real-time API with an authentication token, using Python 3 on Windows 10:
Example library usage: https://speechmatics.github.io/speechmatics-python/index.html

import speechmatics

# Define connection parameters
conn = speechmatics.models.ConnectionSettings(
      url="ws://trial.asr.api.speechmatics.com/v2",
      ssl_context=None,
      auth_token = 'xxx',
)

However, I get this error:

<class 'websockets.exceptions.InvalidStatusCode'>", 'exception': 'server rejected WebSocket connection: HTTP 200

Is there anything wrong with the endpoint URL or the SSL connection settings?

Regards, Josef

No support for https endpoints?

I wanted to try out the API as documented here: https://docs.speechmatics.com/en/cloud/howto/
I was searching for a Python library to use instead of the ugly curl commands and found this package,
but apparently it only supports websockets, not HTTPS endpoints?

websockets.exceptions.InvalidURI: https://trial.asr.api.speechmatics.com isn't a valid URI: scheme isn't ws or wss

If I have a https URL that works with https://docs.speechmatics.com/en/cloud/howto/, is it possible to find a corresponding websocket url that works with this library?

When I use --url ws://trial.asr.api.speechmatics.com/v2 I still get the error message complaining about https://...
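For the real-time client the URL must use a ws/wss scheme. A minimal sketch of mapping an https URL to its wss counterpart follows; note that this only rewrites the scheme, and whether the resulting URL actually works depends on it pointing at a real-time endpoint (the RT service is not the same host/path as the batch HTTPS API).

```python
from urllib.parse import urlsplit, urlunsplit

def to_websocket_url(url: str) -> str:
    """Map an http(s) URL to its ws(s) counterpart; ws/wss URLs pass through."""
    parts = urlsplit(url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme,) + tuple(parts[1:]))

print(to_websocket_url("https://trial.asr.api.speechmatics.com/v2"))
# wss://trial.asr.api.speechmatics.com/v2
```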

add pydocstyle checks and fix the findings

Is your feature request related to a problem? Please describe.
n/a

Describe the solution you'd like
As a developer, I want the source code checked against the pydocstyle checker when I type make lint.

I also want the CI to pass only when docstrings comply with the checker's verdict.

Describe alternatives you've considered
n/a

Additional context
n/a

Deprecated functionality

Hey, we tried to use this repo and had to make quite a few fixes to get it working with the current version of Speechmatics, which is 3.8.

A couple of the more important points:

  • SM no longer returns "AudioAdded"; instead it returns "DataAdded"
  • Before sending each binary chunk, one should send a JSON message in the following format:
    {
      "message": "AddData",
      "offset": 0,
      "seq_no": self.seq_no,
      "size": audio_chunk_size
    }
  • Weirdly, "message_buffer_size" in models.py is used for the "count" of the buffer semaphore, which is the lock used to make sure not to send so many chunks that they bog down the SM instance.

There were a couple more fixes that I do not remember right now. I highly recommend that this repo be updated, or make sure to thoroughly test it before using it (and apply the fixes).

Our team may send out a pull-request later.
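Taken verbatim from the report above (so treat the field names as the reporter's observation, not official protocol documentation), the per-chunk header could be built like this:

```python
import json

def add_data_message(seq_no: int, audio_chunk_size: int) -> str:
    # JSON header sent before each binary audio chunk, using the field
    # names quoted in the report above.
    return json.dumps({
        "message": "AddData",
        "offset": 0,
        "seq_no": seq_no,
        "size": audio_chunk_size,
    })

print(add_data_message(1, 4096))
```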

transcribe remote resource

Is your feature request related to a problem? Please describe.
The CLI (and the API) should support transcribing remote (publicly available) resources (like assemblyai.com does).

Describe the solution you'd like
Allow passing a URL as the FILEPATH to speechmatics batch transcribe

Additional context
Uploading from a local machine may be very slow, while fetching from a server/bucket/remote resource is fast and cheap.
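Until/unless the CLI accepts URLs, a workaround is to fetch the resource to a temporary file first and hand that path to `speechmatics batch transcribe`. The helper below is an illustrative sketch, not part of the library:

```python
import tempfile
import urllib.request

def fetch_to_tempfile(url: str) -> str:
    """Download a (publicly available) resource to a local temp file.

    Returns the temp file's path, suitable for passing as the FILEPATH
    argument. The caller is responsible for deleting the file afterwards.
    """
    with urllib.request.urlopen(url) as resp, \
            tempfile.NamedTemporaryFile(suffix=".audio", delete=False) as tmp:
        tmp.write(resp.read())
        return tmp.name
```

Note this still routes the bytes through the local machine; only a server-side fetch (the feature requested here) avoids the upload cost entirely.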

asyncio.run() cannot be called from a running event loop

Describe the bug

File D:\Python\Spyder\pkgs\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)

File d:\workspace\soundtranscribe\speechrecognition.py:99
ws.run_synchronously(file, conf, settings)

File D:\Python\Spyder\Python\lib\site-packages\speechmatics\client.py:505 in run_synchronously
asyncio.run(asyncio.wait_for(self.run(*args, **kwargs), timeout=timeout))

File asyncio\runners.py:33 in run

RuntimeError: asyncio.run() cannot be called from a running event loop

To Reproduce

import speechmatics

LANGUAGE = "ar"
AUDIO_FILE_PATH = wav_path
CONNECTION_URL = f"wss://eu2.rt.speechmatics.com/v2/{LANGUAGE}"
AUTH_TOKEN = "My secret Token"  # hidden

# Create a transcription client

ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=AUTH_TOKEN,
        generate_temp_token=True, # Enterprise customers don't need to provide this parameter
    )
)

# Define an event handler to print the partial transcript

def print_partial_transcript(msg):
    print(f"(PART) {msg['metadata']['transcript']}")

# Define an event handler to print the full transcript

def print_transcript(msg):
    print(f"(FULL) {msg['metadata']['transcript']}")

# Register the event handler for partial transcript

ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for full transcript

ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters

conf = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    enable_partials=True,
)

with open(AUDIO_FILE_PATH, 'rb') as file:
    ws.run_synchronously(file, conf, settings)

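This symptom is typical of IDEs such as Spyder (and Jupyter), which already run an asyncio event loop, so `run_synchronously` (which calls `asyncio.run` internally) cannot start a second one. A hedged workaround sketch follows; the `run` helper is illustrative, not part of the library, and `transcribe` stands in for the real coroutine:

```python
import asyncio
import concurrent.futures

async def transcribe():
    # Stand-in for the coroutine behind ws.run_synchronously(...).
    await asyncio.sleep(0)
    return "done"

def run(coro):
    """Run a coroutine whether or not an event loop is already running.

    In a plain script there is no running loop, so asyncio.run() is fine.
    Inside Spyder/Jupyter a loop is already running, so run the coroutine
    in its own loop on a worker thread and block for the result.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no loop running: the normal script case
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()

print(run(transcribe()))  # prints "done" in both contexts
```

Alternatively, in an already-async context you can `await ws.run(...)` directly instead of calling the synchronous wrapper.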
