
Stream Assist

Home Assistant custom component that allows you to turn almost any camera and almost any speaker into a local voice assistant.

The component uses:

  • Stream integration for receiving audio from the camera (RTSP/HTTP/RTMP) and automatically transcoding the audio codec into a format suitable for Speech-to-Text (STT)
  • Assist pipeline integration for running: Speech-to-Text (STT) => Natural Language Processing (NLP) => Text-to-Speech (TTS)
  • Almost any Media player for playing the audio response from Text-to-Speech (TTS)

The Assist pipeline can use local or cloud services for each stage; the local options (openWakeWord, Whisper, Piper) are configured below.

Video instruction from fixtSE: "Local Voice Assistant: Using your Cameras & Speakers in HA"

Installation

HACS > Integrations > 3 dots (upper right corner) > Custom repositories > URL: AlexxIT/StreamAssist, Category: Integration > Add > wait > Stream Assist > Install

Or manually copy the stream_assist folder from the latest release to the /config/custom_components folder.

Configuration

Config wake word detection (WAKE)

  1. Add wake word detection Add-on:
    Settings > Add-ons > Add-on Store > openWakeWord > Install
  2. Config WAKE Add-on:
    openWakeWord > Configuration
  3. Add WAKE Integration:
    Settings > Integrations > openWakeWord > Configure

Config local Speech-to-Text (STT)

  1. Add local Speech-to-Text Add-on
    Settings > Add-ons > Add-on Store > Whisper > Install
  2. Config STT Add-on:
    Whisper > Configuration
  3. Add STT Integration:
    Settings > Integrations > Whisper > Configure

Config local Text-to-Speech (TTS)

  1. Add local Text-to-Speech Add-on
    Settings > Add-ons > Add-on Store > Piper > Install
  2. Config TTS Add-on:
    Piper > Configuration
  3. Add TTS Integration:
    Settings > Integrations > Piper > Configure

Config local Voice assistant (INTENT)

  1. Config Voice assistant:
    Settings > Voice assistants > Home Assistant > Select: STT, TTS and WAKE

Config Stream Assist

  1. Add Stream Assist Integration
    Settings > Integrations > Add Integration > Stream Assist
  2. Config Stream Assist Integration
    Settings > Integrations > Stream Assist > Configure

You can select either a camera entity_id or a stream URL as the audio (MIC) source.

You can select a Voice Assistant Pipeline for the recognition process: WAKE => STT => NLP => TTS. By default, the component will use the default pipeline. You can create several Pipelines with different settings, and several Stream Assist components with different settings.

You can select one or multiple Media players (SND) to output the audio response. If your camera supports two-way audio, you can use the WebRTC Camera custom integration to add it as a Media player.

You can set STT start media to play a "beep" after WAKE detection (e.g. media-source://media_source/local/beep.mp3).

Using

The component has a MIC switch and multiple sensors: WAKE, STT, INTENT and TTS. There may be fewer sensors, depending on the Pipeline settings.

The sensor attributes contain a lot of useful information about the results of each step of the assistant.
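
For example, here is a minimal automation sketch that forwards the recognized text to a notification. The sensor entity name and the stt_output attribute are assumptions; check your own sensor's attributes in Developer Tools first:

# A minimal sketch (entity and attribute names are assumptions):
# show the recognized speech as a notification whenever the STT
# sensor changes state.
automation:
  - alias: "Log recognized speech"
    trigger:
      - platform: state
        entity_id: sensor.stream_assist_stt  # assumed entity name
    action:
      - service: notify.persistent_notification
        data:
          message: "{{ state_attr('sensor.stream_assist_stt', 'stt_output') }}"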

You can also view the pipelines running history in the Home Assistant interface:

  • Settings > Voice assistants > Pipeline > 3 dots > Debug

Service

You can run the pipeline as a service. Almost all settings are optional, but they allow you to achieve customizations that are not possible in Hass by default. In the example below, values shown as None are unset defaults and those keys can be omitted.

service: stream_assist.run
data:
  stream_source: rtsp://...
  camera_entity_id: camera.xxx
  player_entity_id: media_player.xxx
  stt_start_media: media-source://media_source/local/beep.mp3
  pipeline_id: abcdefg...
  assist:
    start_stage: wake_word  # wake_word, stt, intent, tts
    end_stage: tts
    pipeline:
      conversation_language: en
      conversation_engine: homeassistant
      language: en
      name: Home Assistant
      stt_engine: stt.faster_whisper
      stt_language: en
      tts_engine: tts.google_en_com
      tts_language: en
      tts_voice: None
      wake_word_entity: wake_word.openwakeword
      wake_word_id: None
    wake_word_settings: { timeout: 5 }
    audio_settings:
      noise_suppression_level: None
      auto_gain_dbfs: None
      volume_multiplier: None
    conversation_id: None
    device_id: None
    intent_input: None
    tts_audio_output: None  # None, wav, mp3
    tts_input: None
  stream:
    file: ...
    options: {}
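
For a minimal call, most of the options above can be dropped. A sketch (entity IDs are placeholders) that runs the default pipeline on a camera and plays the answer on one speaker:

service: stream_assist.run
data:
  camera_entity_id: camera.front_door     # placeholder entity
  player_entity_id: media_player.kitchen  # placeholder entity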

Tips

  1. Recommended settings for Whisper:

    • Model: small-int8 or medium-int8
    • Beam size: 5
  2. You can add a remote Whisper/Piper installation from another server:

    • First server: Settings > Add-ons > Whisper/Piper > Configuration > Network > Select port
    • Second server: Settings > Integrations > Add integration > Wyoming Protocol > Select: first server IP, add-on port
  3. You can use the Google Translate integration instead of Piper, which supports many languages for TTS (see the sketch after this list).

  4. If your environment does not allow you to install add-ons, you can install the Faster Whisper custom integration for local STT.
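
For tip 3, a minimal configuration.yaml sketch for the Google Translate TTS platform (the language value is an assumption; pick your own):

# configuration.yaml — minimal Google Translate TTS setup
tts:
  - platform: google_translate
    language: "en"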


Issues

[New Feature] - Create two extra entities with question and response text

Hi @AlexxIT, thanks for a really great plugin.

Could you create additional entities that would store the last processed query and response, e.g. "Last question" and "Last response" for each configured stream? This would really help in streamlining and customizing text and response processing.

PS. It would also be really nice to be able to set the end-of-sentence recognition mode (e.g. Default, Aggressive). I noticed that when there is slight noise in the background, the text keeps being processed even after I finish speaking a sentence; the process waits for complete silence before it stops processing the sound.

Wyoming-whisper errors and no timeout on STT error

When using the Wyoming-whisper docker image I get the following in the logs, and the STT sensor stays in 'processing' for a long time (I believe it never ends):

wyoming-whisper           | ERROR:asyncio:Task exception was never retrieved
wyoming-whisper           | future: <Task finished name='Task-11' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:26> exception=ValueError("can't extend empty axis 0 using modes other than 'constant' or 'empty'")>
wyoming-whisper           | Traceback (most recent call last):
wyoming-whisper           |   File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 32, in run
wyoming-whisper           |     if not (await self.handle_event(event)):
wyoming-whisper           |   File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 61, in handle_event
wyoming-whisper           |     segments, _info = self.model.transcribe(
wyoming-whisper           |   File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 124, in transcribe
wyoming-whisper           |     features = self.feature_extractor(audio)
wyoming-whisper           |   File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 152, in __call__
wyoming-whisper           |     frames = self.fram_wave(waveform)
wyoming-whisper           |   File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 98, in fram_wave
wyoming-whisper           |     frame = np.pad(frame, pad_width=padd_width, mode="reflect")
wyoming-whisper           |   File "<__array_function__ internals>", line 200, in pad
wyoming-whisper           |   File "/usr/local/lib/python3.9/dist-packages/numpy/lib/arraypad.py", line 815, in pad
wyoming-whisper           |     raise ValueError(
wyoming-whisper           | ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'

Question about Android audio only

First off, this is super impressive and very close to what I want. I'm working on replacing my Echo Show devices with an Android tablet. fixtSE made a video featuring your StreamAssist, and in it he demos it with Android. I'd absolutely love to use this if it could take an Android device's audio as input. I am already using the camera on the device with Fully Kiosk for motion detection, and I'm pretty sure the IP camera software and Fully Kiosk can't use the same camera device.

Is it possible to use StreamAssist with only the mic on the tablet, leaving the camera available to Fully Kiosk?

Thanks!
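
One possible approach, sketched with the documented stream_source option (the RTSP URL and entity ID are placeholders): point Stream Assist at an audio-only RTSP stream served by a microphone-streaming app, leaving the camera free for Fully Kiosk.

service: stream_assist.run
data:
  stream_source: rtsp://192.168.1.50:8554/mic  # placeholder URL from a mic-streaming app
  player_entity_id: media_player.tablet        # placeholder entity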

Error on configuration / attempt to use

First, thank you for this integration! This is exactly what I wished for when I first saw the wake word article for HA assist.

I added my camera via entity. After clicking configure, I get the following error (duplicate of #6):

Logger: aiohttp.server
Source: /usr/local/lib/python3.11/site-packages/aiohttp/web_protocol.py:403
First occurred: 22:39:15 (1 occurrences)
Last logged: 22:39:15

Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiohttp/web_protocol.py", line 433, in _handle_request
    resp = await request_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/web_app.py", line 504, in _handle
    resp = await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/web_middlewares.py", line 117, in impl
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/security_filter.py", line 85, in security_filter_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/forwarded.py", line 100, in forwarded_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/request_context.py", line 28, in request_context_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/auth.py", line 236, in auth_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/headers.py", line 31, in headers_middleware
    response = await handler(request)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/view.py", line 148, in handle
    result = await handler(request, **request.match_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/decorators.py", line 63, in with_admin
    return await func(self, request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/config/config_entries.py", line 213, in post
    return await super().post(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/data_validator.py", line 72, in wrapper
    result = await method(view, request, data, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/data_entry_flow.py", line 71, in post
    result = await self._flow_mgr.async_init(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 265, in async_init
    result = await self._async_handle_step(flow, flow.init_step, data)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 394, in _async_handle_step
    result: FlowResult = await getattr(flow, method)(user_input)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/stream_assist/config_flow.py", line 58, in async_step_init
    defaults.setdefault("vad_mode", VAD.vad_mode)
                                    ^^^^^^^^^^^^
AttributeError: type object 'VoiceCommandSegmenter' has no attribute 'vad_mode'

When I turn on the switch for the first time, I get the following error:

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/config/custom_components/stream_assist/switch.py", line 152, in async_process_audio_stream
    async for _ in self.audio_stream(self.close):
  File "/config/custom_components/stream_assist/switch.py", line 110, in audio_stream
    if not self.vad.process(chunk):
           ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: VoiceCommandSegmenter.process() missing 1 required positional argument: 'is_speech'

VAD then changes to standby and does not do anything.

Won't Work with Tapo C110 & C200

Hello, the possibility of using the camera stream is fantastic, but unfortunately it does not work for me. I tested via the Tapo integration (models C110 & 2x C200) and via the restream from Frigate with audio in AAC; it only works via Android IP camera. How can I debug this?
Note that the video stream of these cameras in Home Assistant does have sound.

Maybe someone can help me.

STT, target is not populated

Hello, thanks for taking the time to create this.

I have run into one issue: when the response comes back and TTS is triggered to play on a media device, it does not play. I have configured it for a media player, but when I check the logs, I see that the command sent had no target devices, even though I selected one. Is it correct to expect the device to be listed here?

Thanks!

(screenshot)

STT sometimes hangs for 7-10 sec

Sometimes STT is lightning-fast, but half of the time it hangs in the "start" state for up to 10 seconds after I finish talking...
Overall, it's a great integration, thank you!

Add delay until STT start media finishes playing

Hello. Great job. I was waiting for the wake word feature in Stream Assist and I'm glad you managed to do it. My problem is that for "STT start media" I want to use personalized random answers like "yes, I'm listening", "how can I assist you", etc., and because VAD is too aggressive, it also records part of the answer ("yes, I'm listening"), which is why it gives an error response that it did not understand the request. I tried an automation so that when the wake word is detected it turns off the microphone switch for a second and then turns it on again, but it doesn't start listening again. Can you make it possible to set a delay between wake word detection and the start of STT listening?

Error: "Config flow could not be loaded"

When I try to configure the integration entity, I get the error "Config flow could not be loaded":

(screenshot)

The logs show the error AttributeError: type object 'VoiceCommandSegmenter' has no attribute 'vad_mode':

Logger: aiohttp.server
Source: /usr/local/lib/python3.11/site-packages/aiohttp/web_protocol.py:403
First occurred: 10:06:19 PM (7 occurrences)
Last logged: 10:26:31 PM

Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiohttp/web_protocol.py", line 433, in _handle_request
    resp = await request_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/web_app.py", line 504, in _handle
    resp = await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/web_middlewares.py", line 117, in impl
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/security_filter.py", line 85, in security_filter_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/forwarded.py", line 227, in forwarded_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/request_context.py", line 28, in request_context_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/ban.py", line 80, in ban_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/auth.py", line 236, in auth_middleware
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/headers.py", line 31, in headers_middleware
    response = await handler(request)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/view.py", line 148, in handle
    result = await handler(request, **request.match_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/decorators.py", line 63, in with_admin
    return await func(self, request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/config/config_entries.py", line 213, in post
    return await super().post(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/http/data_validator.py", line 72, in wrapper
    result = await method(view, request, data, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/data_entry_flow.py", line 71, in post
    result = await self._flow_mgr.async_init(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 265, in async_init
    result = await self._async_handle_step(flow, flow.init_step, data)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 394, in _async_handle_step
    result: FlowResult = await getattr(flow, method)(user_input)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/stream_assist/config_flow.py", line 58, in async_step_init
    defaults.setdefault("vad_mode", VAD.vad_mode)
                                    ^^^^^^^^^^^^
AttributeError: type object 'VoiceCommandSegmenter' has no attribute 'vad_mode'

I've tried selecting the entity from the list, and I've tried pasting in the go2rtc camera name (screenshots of the integration dialog and the go2rtc config).

Here are my default HA Voice Pipeline settings (screenshot).

Thanks for the help, @AlexxIT!

No TTS audio playing over Sonos media player

I am getting the following error when trying to use a Sonos media player. The beep.wav plays after the wake word and commands are executed, but I get no TTS audio through my Sonos AMP / ceiling speakers.

ERROR (SyncWorker_45) [homeassistant.components.sonos.media_player] Sonos does not support media type "audio/mpeg"

Thanks for all you do.
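
A possible workaround (an untested sketch, not confirmed in this thread) is to request WAV output through the documented tts_audio_output service option, since the error suggests Sonos rejects MP3 ("audio/mpeg"):

service: stream_assist.run
data:
  camera_entity_id: camera.living_room      # placeholder entity
  player_entity_id: media_player.sonos_amp  # placeholder entity
  assist:
    tts_audio_output: wav  # documented values: None, wav, mp3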

Add visual custom responses with gifs and tts

A while ago I managed to get custom visual responses to play on an Android tablet using an ESP32 satellite.
This is a small demo.
Since AlexxIT managed to introduce wake word detection in the Stream Assist integration, I thought of replacing the ESP32 satellite I was using in this project with the Stream Assist integration and the Rtpmic app on an Android tablet.
I added custom TTS responses to "STT start media", plus two GIFs for speaking and listening, to be played with a Browser Mod popup.
The advantage is that the custom responses are very easy to adapt for each language using your favorite TTS service.
Because I made several changes to the Stream Assist integration code, before making a pull request I chose to fork AlexxIT's repository and publish it with my changes on my GitHub page, along with the necessary instructions.
@AlexxIT, please take a look at this modified repository if you have time and tell me your opinion: whether you want to add these changes to your integration, or prefer that they stay only in my modified repository.

Add area into Assist context

All the satellites HA has presented (including Wyoming-satellite and the ESP32-S3 satellite) have something called "area awareness". I guess it's just the device's area being provided to the Assist pipeline along with the STT data.

In its current state, StreamAssist has no area awareness. Could you please look into it?
