speechmatics / speechmatics-python Goto Github PK

View Code? Open in Web Editor NEW

52.0 52.0 15.0 2.95 MB

Python library and CLI for Speechmatics

Home Page: https://speechmatics.github.io/speechmatics-python/

License: MIT License

Makefile 0.26% Python 99.74%

cli speech-recognition speech-to-text transcription

speechmatics-python's Introduction

Speechmatics

Speechmatics (https://speechmatics.com) provides an API for speech to text (https://speechmatics.com/api-details).

This gem implements the API making it easier to integrate into Ruby and/or Rails projects.

Installation

Add this line to your application's Gemfile:

gem 'speechmatics'

And then execute:

$ bundle

Or install it yourself as:

$ gem install speechmatics

Usage

See the tests, or here is basic usage:

# configure with api key and user id to use for all requsts
Speechmatics.configure do |sm|
  sm.auth_token = '<your api key here>'
  sm.user_id    = 1234

  # these are defaults
  sm.adapter    = :excon
  sm.endpoint   = 'https://api.speechmatics.com/v1.0/'
  sm.user_agent = "Speechmatics Ruby Gem #{Speechmatics::VERSION}"
end

# create a new client
c = Speechmatics::Client.new

# create a new client, passing in Faraday parameters
# you can also pass in the same options as in `configure`
c = Speechmatics::Client.new(:request => { :timeout => 240 })

# retrieve user
j = c.user.get

# list jobs
jobs = c.user.jobs.list

# create job
info = c.user.jobs.create(data_file: '/Users/nobody/dev/speechmatics/test/zero.wav')

# create job with more options
info = c.user.jobs.create(
  data_file: '/Users/nobody/dev/speechmatics/test/zero.wav',
  content_type: 'audio/x-wav; charset=binary',
  notification: '<email | none | callback>',
  callback: 'http://www.example.com/transcript_callback'
)

# retrieve job
job = c.user.jobs.find(5678)

# retrieve trancript for a job
trans = c.user.jobs(5678).transcript
# alt syntax
trans = c.user.jobs.transcript(5678)

Changes

0.2.0 - 10/12/2016
- Added alignment support, thanks @rogerz42892
0.1.4 - 8/21/2015
- Add support for video files.
0.1.3 - 6/10/2015
- Add default timeout of 120 seconds.
0.1.2 - 6/10/2015
- Yanked
0.1.1 - 3/26/2015
- Fix to work with Faraday 0.7.6
0.1.0 - 10/24/2014
- Remove /user/$userid/jobs/$jobid/audio endpoint, no longer supported by speechmatics
0.0.2 - 8/7/2014
- Minor bug fix, treat integers/numbers as strings for params
- Use mimemagic to determine content type (no more libmagic dependency; this works on heroku)
- Switched the endpoint to use new 'https' requirement

Contributing

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request

speechmatics-python's People

Contributors

Stargazers

Watchers

Forkers

davidhowlett vladosaurus mchughes288 rakeshv247 funous benjamingorman sharonbubz roy-bentley paulg-speechmatics pradeepk324 hennerm nareshreddy204 tyronect siliciuss axmoney

speechmatics-python's Issues

Remove httpx dependency

Describe the bug
httpx is pinned to version 0.22, which can cause dependency clashes when installing other packages (such as googletrans), I don't think httpx is needed, we could instead use an async client such as https://docs.aiohttp.org/en/stable/

translation_config not used when provided in TranscriptionConfig

Describe the bug
So I tried to adapt the transcribe_from_microphone.py example to be able to translate. For this, I found out that the class TranscriptionConfig can optionally take a TranslationConfig (following the advice of checking the output of speechmatics.cli.get_transcription_config({"config_file":"","mode":"rt"}). But it will never get used as it is thrown out again when as_config() is called on TranscriptionConfig. However using TranslationConfig (or RTTranslationConfig) is not allowed as they don't provide as_config().

I would expect to be able to add either a TranslationConfig into TranscriptionConfig or to use TranslationConfig with as a configuration. I'm not sure what of those two options is supposed to work but none of them working seems like a bug to me - please correct me if I'm wrong!
It doesn't seem as there is any working example of doing translation in the current version of speechmatics-python.

English-only Language Detection in Transcription for English/Spanish Audio

Current behaviour

Only English is detected when transcribing audio (testing Batch transcription) for a 12+ minute English/Spanish video ("en" is the value for "language" in all words in the transcription results)

Steps to Reproduce

Download audio from the YouTube link or GoogleDrive link

Update the audio_file (path to audio file) and speechmaticsAPIkey variables


import speechmatics
import ssl
import certifi

from speechmatics.batch_client import BatchClient

audio_file = "PATH_TO_AUDIO_FILE" 

ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations(certifi.where())

settings = speechmatics.models.ConnectionSettings(
  url="https://asr.api.speechmatics.com/v2",
  auth_token=speechmaticsAPIkey,
    ssl_context=ssl_context,
)

operatingPoint = "enhanced"
expectedLanguages = speechmatics.models.BatchLanguageIdentificationConfig(
            expected_languages=["en", "es"]
          )

LANGUAGE = "auto"

conf = speechmatics.models.BatchTranscriptionConfig(
    language=LANGUAGE,
    operating_point= operatingPoint,
    language_identification_config=expectedLanguages,
)

with BatchClient(settings) as client:
    job_id = client.submit_job(audio=audio_file, transcription_config=conf)
    transcript = client.wait_for_completion(job_id, transcription_format='json-v2')

Expected Behaviour

The 'language' data points in transcript['results'] corresponding to Spanish words are expected to be be "es".

Environment

Mac, Ventura 13.6.2, Python 3.10, standard Python venv.

Other Info

Diarization works for this file, but language detection is still English-only.

Sandboxed (MacOS) Batch transcription fails due to permission error accessing '/private/etc/apache2/mime.types'

Batch transcription works using the same code tested in the python console.

When sandboxed, the Batch transcription process fails, as some underlying library tries to access "/private/etc/apache2/mime.types".

To prevent this permission error, the file needs to be accessed within the app environment (or an alternative option is needed to avoid the file, if possible).

Real-time transcription works in the sandboxed context.

I first thought to store a local copy of the mime.types file and track down where Speechmatics is accessing it (to reroute the library to access the local version), but it is elusive and I suspect there is a better solution.

If there isn't a straightforward solution using the Speechmatics Python method, I'll plan to test with a lower-abstraction approach in python.

Batch transcription test:

import speechmatics
from speechmatics.batch_client import BatchClient

ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations(certifi.where())

conf = speechmatics.models.BatchTranscriptionConfig(
              language=LANGUAGE,
              output_local=englishLocale if LANGUAGE == "en" else None,
              operating_point=operatingPoint,
            )

          settings = speechmatics.models.ConnectionSettings(
            url="https://asr.api.speechmatics.com/v2",
            auth_token=speechmaticsAPIkey,
            ssl_context=ssl_context,
          )

          try:
            with BatchClient(settings) as client:
              job_id = client.submit_job(audio=audio_file, transcription_config=conf)
              transcript = client.wait_for_completion(job_id, transcription_format='json-v2')

Speechmatic Realtime Transcribe Python SocketIO Implementation

Add support/example for microphone or stream input for Realtime STT in python-sdk/websocket

There is no support/documentation provided for Microphone or Data Stream right now for input when using Transcribe in Real-Time

Implement an example websockets in any language with mic or stream support

Auth key error in both Python and CLI version

Describe the bug
Auth key gives 403

To Reproduce

peechmatics config set --auth-token yMJVc*******************HaZ
speechmatics batch transcribe audio.mp3

Expected behavior
I expect it to work

Screenshots/Logs

Processing audio.mp3
==========
httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://asr.api.speechmatics.com/v2/jobs?sm-sdk=python-cli-1.12.0'
For more information check: https://httpstatuses.com/403

Additional context
API keys are generated and I am sure are correct.

Trial Server Websocket Connection

Hello, i am trying to connect to the Trial realtime API with an authentification token in Python3 on Windows 10:
Example library usage: https://speechmatics.github.io/speechmatics-python/index.html

import speechmatics

# Define connection parameters
conn = speechmatics.models.ConnectionSettings(
      url="ws://trial.asr.api.speechmatics.com/v2",
      ssl_context=None,
      auth_token = 'xxx',
)

However, i get this error:

<class 'websockets.exceptions.InvalidStatusCode'>", 'exception': 'server rejected WebSocket connection: HTTP 200

Is there anything wrong the the endpoint url, or SSL connection settings?

Regards, Josef

Codecov integration is broken

Describe the bug
Codecove integration was not updated after migrating to GitHub actions.

No support for https endpoints?

I wanted to try out the API as documented here: https://docs.speechmatics.com/en/cloud/howto/
and was searching for a Python library to use instead of the ugly curl commands and found this package,
but apparently it only supports websockets not https endpoints?

websockets.exceptions.InvalidURI: https://trial.asr.api.speechmatics.com isn't a valid URI: scheme isn't ws or wss

If I have a https URL that works with https://docs.speechmatics.com/en/cloud/howto/, is it possible to find a corresponding websocket url that works with this library?

When I use --url ws://trial.asr.api.speechmatics.com/v2 is still get the error message complaining about https://...

add pydocstyle checks and fix the findings

Is your feature request related to a problem? Please describe.
n/a

Describe the solution you'd like
As a developer, I want the source code checked against pydocstyle checker when i type make lint.

I also want the CI to pass only when docstrings comply with the checker's verdict.

Describe alternatives you've considered
n/a

Additional context
n/a

Deprecated functionality

Hey, we tried to use this repo, and had to do some too-many fixes for it to be working with the current version of speechmatics, which is 3.8.

Couple of more important points:

SM no longer returns "AudioAdded", instead it returns "DataAdded"
Before sending each binary data, one should send a json data in following format:
{
"message": "AddData",
"offset": 0,
"seq_no": self.seq_no,
"size": audio_chunk_size
}
Weirdly, the "message_buffer_size" in models.py is used for the "count" of the buffer_sephamore, which is the lock used to make sure not to send too many chunks to bog down the SM instance.

There were couple more fixes that i do not remember right now. I highly recomend for this repo to be updated, or make sure to troughly test the repo before using it (and apply the fixes).

Our team may send out a pull-request later.

transcribe remote resource

Is your feature request related to a problem? Please describe.
The cli (and the API) should support transcribe remote (publicly available) resources (like assemblyai.com).

Describe the solution you'd like
Allow passing an URL as a FILEPATH to speechmatics batch transcribe

Additional context
Uploading from a machine may be very slow while fetching from a server/bucket/remote resource is fast and cheap.

asyncio.run() cannot be called from a running event loop

Describe the bug

File D:\Python\Spyder\pkgs\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)

File d:\workspace\soundtranscribe\speechrecognition.py:99
ws.run_synchronously(file, conf, settings)

File D:\Python\Spyder\Python\lib\site-packages\speechmatics\client.py:505 in run_synchronously
asyncio.run(asyncio.wait_for(self.run(*args, **kwargs), timeout=timeout))

File asyncio\runners.py:33 in run

RuntimeError: asyncio.run() cannot be called from a running event loop

To Reproduce

import speechmatics

LANGUAGE = "ar"
AUDIO_FILE_PATH = wav_path
CONNECTION_URL = f"wss://[eu2.rt.speechmatics.com/v2/{LANGUAGE}](http://eu2.rt.speechmatics.com/v2/%7BLANGUAGE%7D)"
AUTH_TOKEN = "My secret Token" # hiden

Create a transcription client

ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=AUTH_TOKEN,
        generate_temp_token=True, # Enterprise customers don't need to provide this parameter
    )
)

Define an event handler to print the partial transcript

def print_partial_transcript(msg):
    print(f"(PART) {msg['metadata']['transcript']}")

Define an event handler to print the full transcript

def print_transcript(msg):
    print(f"(FULL) {msg['metadata']['transcript']}")

Register the event handler for partial transcript

ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

Register the event handler for full transcript

ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()

Define transcription parameters

conf = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    enable_partials=True,
)

with open(AUDIO_FILE_PATH, 'rb') as file:
    ws.run_synchronously(file, conf, settings)

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots/Logs
If applicable, add screenshots or logs to help explain your problem.

Additional context
Add any other context about the problem here.