
rhasspy3's Introduction

Rhasspy 3

NOTE: This is a very early developer preview!

An open source toolkit for building voice assistants.

Voice assistant pipeline

Rhasspy focuses on:

  • Privacy - no data leaves your computer unless you want it to
  • Broad language support - more than just English
  • Customization - everything can be changed

Getting Started

See the tutorial: https://github.com/rhasspy/rhasspy3/blob/master/docs/tutorial.md

Missing Pieces

This is a developer preview, so there are lots of things missing:

  • A user friendly web UI
  • An automated method for installing programs/services and downloading models
  • Support for custom speech to text grammars
  • Intent systems besides Home Assistant
  • The ability to accumulate context within a pipeline

Core Concepts

Domains

Rhasspy is organized by domain:

  • mic - audio input
  • wake - wake word detection
  • asr - speech to text
  • vad - voice activity detection
  • intent - intent recognition from text
  • handle - intent or text input handling
  • tts - text to speech
  • snd - audio output

Programs

Rhasspy talks to external programs using the Wyoming protocol. You can add your own programs by implementing the protocol or using an adapter.

Adapters

Small scripts that live in bin/ and bridge existing programs into the Wyoming protocol.

For example, a speech to text program (asr) that accepts a WAV file and outputs text can use asr_adapter_wav2text.py.
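
As a rough illustration, a program with that shape might look like the sketch below. The transcribe() body is a placeholder, and the assumption that the WAV arrives as a path argument (rather than on stdin) is mine, not the adapter's documented contract:

#!/usr/bin/env python3
# Hypothetical speech-to-text program: WAV file path in, one transcript line out.
# The transcribe() body is a placeholder, not a real ASR engine.
import sys
import wave

def transcribe(wav_path: str) -> str:
    with wave.open(wav_path, "rb") as wav_file:
        seconds = wav_file.getnframes() / wav_file.getframerate()
    # A real program would run its ASR model here instead.
    return f"(transcript of {seconds:.1f} seconds of audio)"

if __name__ == "__main__":
    print(transcribe(sys.argv[1]))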

Pipelines

Complete voice loop from microphone input (mic) to speaker output (snd). Stages are:

  1. detect (optional)
    • Wait until wake word is detected in mic
  2. transcribe
    • Listen until vad detects silence, then convert audio to text
  3. recognize (optional)
    • Recognize an intent from text
  4. handle
    • Handle an intent or text, producing a text response
  5. speak
    • Convert handle output text to speech, and speak through snd

Servers

Some programs take a while to load, so it's best to leave them running as a server. Use bin/server_run.py or add --server <domain> <name> when running the HTTP server.

See the servers section of the configuration.yaml file.


Supported Programs


HTTP API

http://localhost:13331/<endpoint>

Unless overridden, the pipeline named "default" is used.

  • /pipeline/run
    • Runs a full pipeline from mic to snd
    • Produces JSON
    • Override pipeline or:
      • wake_program
      • asr_program
      • intent_program
      • handle_program
      • tts_program
      • snd_program
    • Skip stages with start_after
      • wake - skip detection, body is detection name (text)
      • asr - skip recording, body is transcript (text) or WAV audio
      • intent - skip recognition, body is intent/not-recognized event (JSON)
      • handle - skip handling, body is handle/not-handled event (JSON)
      • tts - skip synthesis, body is WAV audio
    • Stop early with stop_after
      • wake - only detection
      • asr - detection and transcription
      • intent - detection, transcription, recognition
      • handle - detection, transcription, recognition, handling
      • tts - detection, transcription, recognition, handling, synthesis
  • /wake/detect
    • Detect wake word in WAV input
    • Produces JSON
    • Override wake_program or pipeline
  • /asr/transcribe
    • Transcribe audio from WAV input
    • Produces JSON
    • Override asr_program or pipeline
  • /intent/recognize
    • Recognizes intent from text body (POST) or text (GET)
    • Produces JSON
    • Override intent_program or pipeline
  • /handle/handle
    • Handles intent/text from body (POST) or input (GET)
    • Content-Type must be application/json for intent input
    • Override handle_program or pipeline
  • /tts/synthesize
    • Synthesizes audio from text body (POST) or text (GET)
    • Produces WAV audio
    • Override tts_program or pipeline
  • /tts/speak
    • Plays audio from text body (POST) or text (GET)
    • Produces JSON
    • Override tts_program, snd_program, or pipeline
  • /snd/play
    • Plays WAV audio via snd
    • Override snd_program or pipeline
  • /config
    • Returns JSON config
  • /version
    • Returns version info
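
As a rough usage sketch (assuming the HTTP server is running on localhost:13331, the default pipeline works, and that start_after can be passed as a query parameter), the endpoints can be called with plain HTTP clients, for example with Python's standard library:

#!/usr/bin/env python3
# Sketch of calling the HTTP API; assumes the server is at localhost:13331.
from urllib.request import Request, urlopen

BASE = "http://localhost:13331"

# Synthesize speech from text; the response body is WAV audio.
request = Request(BASE + "/tts/synthesize", data="It is five o'clock.".encode("utf-8"))
with urlopen(request) as response, open("reply.wav", "wb") as wav_file:
    wav_file.write(response.read())

# Run a pipeline but skip detection and recording:
# with start_after=asr the request body is the transcript text.
request = Request(BASE + "/pipeline/run?start_after=asr", data=b"what time is it")
with urlopen(request) as response:
    print(response.read().decode("utf-8"))  # JSON describing the pipeline result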

WebSocket API

ws://localhost:13331/<endpoint>

Audio streams are raw PCM in binary messages.

Use the rate, width, and channels parameters for sample rate (hertz), width (bytes), and channel count. By default, input audio is 16 kHz 16-bit mono, and output audio is 22 kHz 16-bit mono.

The client can "end" the audio stream by sending an empty binary message.

  • /pipeline/asr-tts
    • Run pipeline from asr (stream in) to tts (stream out)
    • Produces JSON messages as events happen
    • Override pipeline or:
      • asr_program
      • vad_program
      • handle_program
      • tts_program
    • Use in_rate, in_width, in_channels for audio input format
    • Use out_rate, out_width, out_channels for audio output format
  • /wake/detect
    • Detect wake word from websocket audio stream
    • Produces a JSON message when audio stream ends
    • Override wake_program or pipeline
  • /asr/transcribe
    • Transcribe a websocket audio stream
    • Produces a JSON message when audio stream ends
    • Override asr_program or pipeline
  • /snd/play
    • Play a websocket audio stream
    • Produces a JSON message when audio stream ends
    • Override snd_program or pipeline
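
A minimal client sketch for /asr/transcribe, assuming the third-party websockets Python package and a WAV file that is already in the default 16 kHz, 16-bit, mono input format:

#!/usr/bin/env python3
# Sketch: stream a WAV file to /asr/transcribe and print the JSON result.
# Assumes `pip install websockets` and audio already in the default input format;
# otherwise pass rate/width/channels as URL parameters.
import asyncio
import wave

import websockets

async def transcribe(wav_path: str) -> str:
    async with websockets.connect("ws://localhost:13331/asr/transcribe") as ws:
        with wave.open(wav_path, "rb") as wav_file:
            while True:
                chunk = wav_file.readframes(1024)  # raw PCM frames
                if not chunk:
                    break
                await ws.send(chunk)  # binary message = audio chunk
        await ws.send(b"")  # empty binary message ends the audio stream
        return await ws.recv()  # JSON message with the transcript

if __name__ == "__main__":
    print(asyncio.run(transcribe("sample.wav")))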

rhasspy3's People

Contributors

mic92, nikito, synesthesiam


rhasspy3's Issues

running Rhasspy3 (tutorial) in a docker container

It's not an issue, it's a side note!

Just in case anyone is interested in playing with Rhasspy3 in a Docker container (whatever your motivation is): onnxruntime doesn't work well with Alpine-based containers because of musl, so build your containers from something different. In my case I'm running this on top of Ubuntu 22.10 with Pipewire as a sound server executed under my own user - hence the fd mount.

# create a docker container
docker run -it -v /run/user/$(id -u)/pipewire-0:/tmp/pipewire-0 -e XDG_RUNTIME_DIR=/tmp --publish 13331:13331 --name rhasspy3 ubuntu /bin/bash

# inside of the docker container
apt update && apt install pipewire-audio-client-libraries alsa-utils vim python3.10-venv git -y

# in UBUNTU 22.04 you'll need to make that PCM device a default one
cat << EOF > ~/.asoundrc
pcm.!default {
    type plug
    slave.pcm "pipewire"
}
EOF

# make sure sound playback works with now-default ALSA device (press ^C to stop) 
# if it doesn't - check/change your default Pipewire sink, like in Gnome that will be in Sound Settings -> Output
# (or write a lua script for wireplumber)
speaker-test -c2

# Now let's test microphone. Say something after running the arecord
arecord -d5 test.wav
# And play it back:
aplay test.wav
# If you don't hear anything back - check / change your default Pipewire input device 
# in Gnome that will be in Sound Settings -> Input

# install dependencies and clone the project
apt install git -y
cd /root
git clone https://github.com/rhasspy/rhasspy3
cd rhasspy3

# check configuration
script/run bin/config_print.py

# check microphone
script/run bin/mic_test_energy.py

# config VAD
apt install python3.10-venv -y
mkdir -p config/programs/vad/
cp -R programs/vad/silero config/programs/vad/
python3 -m venv config/programs/vad/silero/.venv
. config/programs/vad/silero/.venv/bin/activate
config/programs/vad/silero/script/setup

# install and test VAD
script/run bin/mic_record_sample.py sample.wav
# make a pause, say something, then after a pause it will be captured to a file
# play the resulted file
aplay sample.wav
...

... or just follow the rest of the guide: https://github.com/rhasspy/rhasspy3/blob/master/docs/tutorial.md

I'd like to thank @synesthesiam (Michael) once again; I followed his guide and everything works just perfectly.

I need to mention: if you're planning to run this on a headless system without being logged in to X all the time, Pipewire in most popular distributions is configured to run under a user account, and reconfiguring it to run as root is highly discouraged. So on most recent Linux systems it works like this: before you log in, Pipewire runs under the "gdm" user (if you're running Gnome with GDM), and after you log in, gdm's Pipewire is stopped and your own instance is started instead.

I think, for such cases (a headless system with occasional interactive use), the ideal scenario would be to reconfigure Pipewire system-wide (e.g. in /etc) to accept local network connections and use those instead. Then it doesn't matter whether the gdm user or your own user is running Pipewire; the Docker container will be able to connect to it over TCP.

I spent quite a lot of time figuring out which way is up in these "sound in Docker using pure ALSA" / "sound in Docker when the host runs Pipewire" / "sound in Docker when the host runs PulseAudio" configurations :)

TTS appears to not output if the text is too long

Hello,
In testing out rhasspy with a few commands via Home Assistant and the add-on, I noticed that if the text is long, TTS does not output a reply, even though the log shows the full reply and that it was sent to TTS:
DEBUG:rhasspy3.program:client_unix_socket.py ['var/run/faster-whisper.socket']
DEBUG:rhasspy3.program:vad_adapter_raw.py ['--rate', '16000', '--width', '2', '--channels', '1', '--samples-per-chunk', '512', 'script/speech_prob "share/silero_vad.onnx"']
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: voice started
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: voice stopped
INFO:faster_whisper_server: What's the weather like?
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: asr=Transcript(text=" What's the weather like?")
DEBUG:rhasspy3.program:handle_adapter_text.py ['bin/converse.py --language "" "http://supervisor/core/api/conversation/process" "/app/config/data/handle/home_assistant/token"']
DEBUG:rhasspy3.handle:handle: input=Transcript(text=" What's the weather like?")
DEBUG:rhasspy3.handle:handle: Handled(text='Currently the weather is sunny, with a temperature of 43 degrees. Under present weather conditions the temperature feels like 36 degrees. In the next few hours the weather will be more of the same, with a temperature of 43 degrees.')
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: handle=Handled(text='Currently the weather is sunny, with a temperature of 43 degrees. Under present weather conditions the temperature feels like 36 degrees. In the next few hours the weather will be more of the same, with a temperature of 43 degrees.')
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: sending tts
DEBUG:rhasspy3.program:client_unix_socket.py ['var/run/larynx2.socket']
Real-time factor: 0.107471 (infer=1.27417 sec, audio=11.856 sec)
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: tts done

Not sure if this is an issue with the websocket/larynx, or just the browser not handling the longer stream correctly (I was testing the satellite via the browser in the add-on web UI).

In further testing I also noticed that after the request, regardless of TTS length, my dev console gets flooded with this message continuously:
(screenshot of the repeated console message omitted)

Testing other commands (such as "What time is it" or "What is the date") worked fine and produced TTS audio.

Thanks!

Stream to HA Assist Pipeline

I'm a bit lost - I have installed the rhasspy3 hassio addon, which isn't quite usable yet as far as I can tell.
And even if it were, it still wouldn't get me where I want to be, which is this:

  1. Do wake on rpi satellite
  2. Stream audio to the Home Assistant server (I want to do STT using a cloud service or a local Wyoming whisper server).
  3. Stream TTS audio back to the satellite.

This could be accomplished by doing what the existing remote/websocket does, except against HA's WebSocket API.

I have to assume I'm kicking at an open door here - but after reading around both here and there, I couldn't find any mention of this.

Confusing line in configuration.yaml in tutorial

There is a part in the tutorial where faster-whisper is set up:

asr:
    faster-whisper: ...
    faster-whisper.client:
      command: |
        client_unix_socket.py var/run/faster-whisper.socket

The second line is confusing (at least to me). I assumed I should keep the previously created entry for faster-whisper: ... and add another one for faster-whisper.client, which apparently was wrong.

Another issue I had was with the piper setup.

template_args:
        model: "${data_dir}/en-us-blizzard_lessac-medium.onnx"

I don't know why, but my file name was different: en-us-lessac-low.onnx. I spent some time trying to understand why I was getting an error when running the script.

satellite configuration

I guess I'm missing something.

On the server I have configured the following:

cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/pipeline_run.py --loop --debug
config:

pipelines:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    vad:
      name: silero
    asr:
      name: faster-whisper.client
    handle:
      name: home_assistant
cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/server_run.py asr faster-whisper
config:

servers:
    faster-whisper:
      command: |
        script/server --language ${language} --device ${device} "${model}"
      template_args:
        language: "de"
        model: "${data_dir}/large-v2"
        device: "cuda"  # cpu or cuda

starting http_server with:

/root/rhasspy3/script/http_server --debug

On the satellite:

cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/satellite_run.py
config:

satellites:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    remote:
      name: websocket
    snd:
      name: aplay

  remote:
    websocket:
      command: |
        script/run "${uri}"
      template_args:
        uri: "ws://192.168.0.109:13331/pipeline/asr-tts"

"local" processing working fine

but the satellite does not - debug output:

DEBUG:rhasspy3.core:Loading config from /root/rhasspy3/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Skipping /root/rhasspy3/config/configuration.yaml
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -D pulse -r 16000 -c 1 -f S16_LE -t raw -']
DEBUG:rhasspy3.program:.venv/bin/python3 ['bin/porcupine_stream.py', '--model', '/root/rhasspy3/config/data/wake/porcupine1/resources/keyword_files_de/linux/ananas_linux.ppn', '--lang_model', '/root/rhasspy3/config/data/wake/porcupine1/lib/common/porcupine_params_de.pv']
DEBUG:rhasspy3.wake:detect: processing audio
DEBUG:rhasspy3.wake:detect: Detection(name='ananas_linux', timestamp=88896097256797)
DEBUG:rhasspy3.program:script/run ['ws://192.168.0.109:13331/pipeline/asr-tts']

After that nothing happens - I have to kill the process.

Any ideas?

Wake_word_command (with no pauses or confirmations)

Hello there.

Have you seen what the OpenVoiceOS guys are doing with their Mycroft fork? https://youtu.be/2D1IZaj2Uws

From the video description it looks like they made the wake word acknowledgment sound (beep) play in parallel with the beginning of recording the command from the mic. It's a hacky way, but it does the job pretty well, as you can see in the video.

I was thinking that if I ever had to implement this myself, I would probably do it a bit differently. First of all, I would record sound from the microphone all the time into some kind of ring buffer. When the wake word is detected, I would note exactly when it happened and stream the PCM from that moment onward, out of the buffer and into the ASR module, until a pause is detected. Streaming (and recognizing speech in parallel as you talk) should drastically reduce the delay caused by the sequential nature of the current architecture, where speech is first recorded and only then the resulting WAV file is fed to ASR (a rough sketch follows the list below).

As a result:

  1. You don't need to make that huge pause yourself, listening for the BEEP sound, between saying the wake word and saying the actual command
  2. Voice recognition starts earlier, shortly after you start to speak, so by the moment you stop speaking the ASR module only needs to process a tiny bit of the PCM stream. This gives more headroom for ASR and lets people use larger and heavier ASR models, with a real-time factor close to 1, without having to wait extra seconds for their hardware to recognize what they just said starting from the first byte of PCM, as happens now.
  3. All of this should greatly improve the user experience.
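
Something along these lines is what I have in mind (a very rough sketch, not Rhasspy code; the sizes and the asr_stream object are made up for illustration):

# Rough sketch of the ring-buffer idea above; not Rhasspy code.
# `asr_stream` stands in for whatever streams PCM to the ASR module.
from collections import deque

CHUNK_BYTES = 2048      # ~64 ms of 16 kHz, 16-bit, mono audio
BUFFER_CHUNKS = 50      # ~3 seconds of history kept at all times

ring = deque(maxlen=BUFFER_CHUNKS)
streaming = False

def on_audio_chunk(chunk: bytes, wake_detected: bool, asr_stream) -> None:
    global streaming
    ring.append(chunk)
    if wake_detected:
        streaming = True
    if streaming:
        # Flush everything captured since (roughly) the detection moment,
        # then keep streaming live chunks until VAD reports a pause.
        while ring:
            asr_stream.write(ring.popleft())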

Nevertheless, amazing work! thanks a lot for all your contributions to the community.

"Illegal Instruction" encountered trying to start wyoming-whisper

Apologies if this is not the right forum or if it is answered elsewhere. I'm just trying to run someone else's docker compose file with wyoming-whisper in it, and I'm encountering a Python error on startup:

/run.sh: line 5: 7 Illegal instruction python3 -m wyoming_faster_whisper --uri 'tcp://0.0.0.0:10300' --data-dir /data --download-dir /data "$@"

I'm assuming it's because of the underlying system Docker is running on? An old MSI desktop with a 64-bit AMD CPU running Win10 and Docker Community.

Any help, or starting points to getting the container running?

The Docker compose file is largely pulled from somewhere else, and I doubt it is the cause.

version: "3.9"
services:
  piper:
    container_name: piper
    image: rhasspy/wyoming-piper
    ports:
      - '10200:10200'
    volumes:
      - '/media/storage/piper/data:/data'
    command: --voice en-gb-southern_english_female-low

  whisper:
    container_name: whisper
    image: rhasspy/wyoming-whisper
    ports:
      - '10300:10300'
    volumes:
      - '/media/storage/whisper/data:/data'
    command: --model tiny-int8 --language en

Small issue in tutorial

I keep following your tutorial; it's like reading a novel by your favorite author. I found a small issue there, I guess:

Instead of

echo 'What time is it?' | script/run bin/handle_handle.py --debug

there should be

echo 'what time is it?' | script/run bin/handle_text.py --debug

Installing Wyoming Whisper without Docker looks impossible: 404 error when downloading model

I'm posting this here because the Python Package Index website indicates this repository as the home page, so first of all, sorry if it is the wrong place.

I would like to test Whisper using Wyoming.
I use a Home Assistant Core installation, so I don't have Docker for anything.
Having Docker installed for only one thing does not seem reasonable to me, so I'm trying to install Wyoming Whisper manually.

I looked into the add-on code to see how Wyoming Whisper is installed and did the following on my side:

mkdir -p wyoming-whisper/data
cd wyoming-whisper
python3.11 -m venv venv
source venv/bin/activate
pip install wheel
pip install wyoming-faster-whisper==0.0.3
python3 -m wyoming_faster_whisper --uri 'tcp://0.0.0.0:10300' --model medium --beam-size "1" --language "fr" --data-dir ./data --download-dir ./data

But when running the last command, I get the following:

WARNING:wyoming_faster_whisper.download:Model hashes do not match
WARNING:wyoming_faster_whisper.download:Expected: {'config.json': 'e5a2f85afc17f73960204cad2b002633', 'model.bin': '5f852c3335fbd24002ffbb965174e3d7', 'vocabulary.txt': 'c1120a13c94a8cbb132489655cdd1854'}
WARNING:wyoming_faster_whisper.download:Got: {'model.bin': '', 'config.json': '', 'vocabulary.txt': ''}
INFO:__main__:Downloading FasterWhisperModel.MEDIUM to ./data
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/__main__.py", line 135, in <module>
    asyncio.run(main())
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/__main__.py", line 75, in main
    model_dir = download_model(model, args.download_dir)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/download.py", line 90, in download_model
    with urlopen(model_url) as response:
         ^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

How could it work for the add-on but not manually?
And how could I solve this?
I also posted a topic on the Home Assistant community forum, but it looks like I am alone in doing this kind of setup 🙁

Open transcription seems to perform worse than rhasspy2's Kaldi closed transcription

Although whisper is a lot better at transcribing arbitrary text, it seems to perform a lot worse at detecting the intent compared to the closed, trained transcription that we had in rhasspy 2.
I think the closed transcription also helped a lot with fuzzy matching, i.e. when people used slightly different words or when other people were talking in the background. I feel that as long as there is no smart NLU that can match the spoken text to an intent, this approach might not be a good enough fit for interfacing with home automation. I see value in having a good open transcription, and I am currently thinking about how it could be combined with the precision of the old system.

satellite communication issue

It looks like some additional logic is needed to handle the scenario where the base station sends the TTS result back to the satellite and closes the connection.

ERROR:satellite_run:Unexpected error communicating with remote base station
Traceback (most recent call last):
  File "/home/pi/rhasspy3/bin/satellite_run.py", line 131, in main
    await async_write_event(mic_event, remote_proc.stdin)
  File "/home/pi/rhasspy3/rhasspy3/event.py", line 77, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 197, in _drain_helper
    await waiter
BrokenPipeError

The simple code below will fix this, but I'm not sure that this is the right way to handle it:

try:
    await async_write_event(mic_event, remote_proc.stdin)
except BrokenPipeError:
    pass  # base station closed the connection; stop forwarding or reconnect here

Wyoming source repository

Hi, I'm wondering where I can find the source repository for the wyoming package that is up on PyPI.

The homepage points to this repository, and the structure looks similar to the rhasspy3 directory in this repository, but the code is different.

Also, is the code open for contributions?

Faster-Whisper Home Assistant "Unknown" State

I am running this in a Docker container via faster-whisper with the Wyoming protocol in Home Assistant. The state of this entity is showing up as "unknown." It works fine; the state is just not updating at all. For Piper, the last time it was used shows as the state.

Custom wake word support

Are custom wake words supported like in rhasspy2? I followed the tutorial for porcupine and tested with a few words, then generated a .ppn and placed it under keyword_files. It shows up with config/programs/wake/porcupine1/script/list_models, but crashes on load (e.g. script/run bin/wake_detect.py) with a JSONDecodeError:

DEBUG:rhasspy3.core:Loading config from /app/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Skipping /app/config/configuration.yaml
DEBUG:wake_detect:mic program: PipelineProgramConfig(name='arecord', template_args=None, after=None)
DEBUG:wake_detect:wake program: PipelineProgramConfig(name='porcupine1', template_args=None, after=None)
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -D "default" -r 16000 -c 1 -f S16_LE -t raw -']
DEBUG:wake_detect:Detecting wake word
DEBUG:rhasspy3.program:python3 ['bin/porcupine_stream.py', '--model', 'Professor_en_linux_v2_2_0.ppn']
Traceback (most recent call last):
  File "/app/config/programs/wake/porcupine1/bin/porcupine_stream.py", line 71, in <module>
    main()
  File "/app/config/programs/wake/porcupine1/bin/porcupine_stream.py", line 25, in main
    porcupine, names = load_porcupine(args)
  File "/app/config/programs/wake/porcupine1/bin/porcupine_shared.py", line 57, in load_porcupine
    Traceback (most recent call last):
  File "/app/bin/wake_detect.py", line 80, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/app/bin/wake_detect.py", line 69, in main
    detection = await detect(rhasspy, wake_program, mic_proc.stdout)
  File "/app/rhasspy3/wake.py", line 109, in detect
    wake_event = wake_task.result()
  File "/app/rhasspy3/event.py", line 48, in async_read_event
    event_dict = json.loads(json_line)
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

SileroVAD fails sometimes with "BrokenPipeError: [Errno 32] Broken pipe"

Hello there,

I was following your tutorial, up to the point of testing the Silero VAD - "Voice Activity Detection" chapter.

I noticed that sometimes (in my case roughly 30% of the time) it throws an exception while recording sound activity, like this:

(.venv) root@085dd18c329f:~/rhasspy3# script/run bin/mic_record_sample.py sample.wav
INFO:mic_record_sample:Recording sample.wav
INFO:mic_record_sample:Speaking started
INFO:mic_record_sample:Speaking ended
Traceback (most recent call last):
  File "/root/rhasspy3/config/programs/vad/silero/bin/silero_speech_prob.py", line 89, in <module>
    main()
  File "/root/rhasspy3/config/programs/vad/silero/bin/silero_speech_prob.py", line 34, in main
    print(speech_probability, flush=True)
BrokenPipeError: [Errno 32] Broken pipe

However, when it happens, it still manages to capture something to sample.wav.

If it matters, I'm playing with Rhasspy3 in an ubuntu:latest Docker container (22.04.2); for sound inside the container I'm using the ALSA emulation of a Pipewire client installed in the container. The Pipewire client is connected to my host's Pipewire server via a socket file. All native ALSA utilities inside the container (aplay / arecord) work just fine - I'm pretty sure the sound setup is solid.

Updated Docker Image for Wyoming-piper

Hopefully this is the right repo to post this in.

I saw that the Home Assistant add-on was updated with the latest release of wyoming-piper, but the Docker image on Docker Hub was not. I see a link from PyPI to this repo, but did not find the source code for the Python library or that same Docker image here.

Would you be able to push an update for that docker image as well as point me to the right repo where it’s hosted?

Thanks!

Satellite: stuck at script/run ['ws://<serverip>:13331/pipeline/asr-tts']

Hello to all,

first of all thank you for this awesome work!!

This is what I have achieved:

  • worked through the whole tutorial and the whole pipeline is working
  • on the server I started two processes to be ready for clients:
  • ./script/http_server --debug --server asr faster-whisper --server tts piper &
  • curl -X POST 'localhost:13331/pipeline/run' (in a while loop)

Problem:
Wake word is working at satellite, but it gets stuck at:

DEBUG:rhasspy3.core:Loading config from /home/pi/rhasspy3/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Loading config from /home/pi/rhasspy3/config/configuration.yaml
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -r 16000 -c 1 -f S16_LE -t raw -D plughw:CARD=Mini,DEV=0 -']
DEBUG:rhasspy3.program:.venv/bin/python3 ['bin/porcupine_stream.py', '--model', 'jarvis_raspberry-pi.ppn']
DEBUG:rhasspy3.wake:detect: processing audio
DEBUG:rhasspy3.wake:detect: Detection(name='jarvis_raspberry-pi', timestamp=1239407741987)
DEBUG:rhasspy3.program:script/run ['ws://10.10.10.5:13331/pipeline/asr-tts']

-> From there nothing happens any more. Any ideas?

Two small issues in the VAD->ASR pipeline processing

Hello again. I've been playing with Rhasspy3 over the course of the last several days and I found a few small issues.

To begin with, here's my pipeline; it's pretty standard, and all program settings are the defaults from your tutorial. The only small difference is that I swapped whisper for vosk to try out some additional ASR models:

pipelines:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    vad:
      name: silero
    asr:
      name: vosk.client
    handle:
      name: repeat
    tts:
      name: piper.client
    snd:
      name: aplay

Issue number one - the silero-vad sensitivity? other VADs?

When the wake step is activated (by me saying a wake word into the microphone), it sometimes happens that VAD (silero) is unable to capture me speaking, so the pipeline keeps hanging on that step until the VAD timeout. It could be because my microphone is noisy, or I was just speaking too quietly, or the default threshold of silero-vad is just a bit too low for my setup.

I guess I'm not the only one who'll be playing with Rhasspy3 using questionable-quality microphones, with a constant background hum audible in recordings from them :) so I was trying to find some way to configure silero-vad to make it more sensitive, but it looks like there are no such settings exposed in configuration.yaml right now. So this one is more of a gentle, low-priority feature request. In my particular case, I think I should just invest a few bucks in a more decent mic than what I have now, probably an electret one.

Then I tried to swap silero-vad for something different. There are two other VADs Rhasspy3 ships with, energy and webrtcvad, and I've seen that both of them have some kind of sensitivity configurable in the configuration. I used similar steps as in your tutorial to get them installed, but as soon as I plug them into the pipeline it starts spilling error messages at me. It looks like they're not quite ready, right? Or was it just me doing something wrong? I'm happy to dig into that more myself if you tell me that both energy and webrtcvad work out of the box.
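
For reference, the underlying webrtcvad library itself does expose a sensitivity knob (the aggressiveness mode, 0-3). A standalone sketch of just the library, not of Rhasspy's webrtcvad program:

# Standalone illustration of webrtcvad's aggressiveness setting (0-3).
# This is the bare library, not Rhasspy's webrtcvad integration.
import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 30 ms of 16-bit mono PCM

vad = webrtcvad.Vad(1)  # 0 = least aggressive (flags more audio as speech), 3 = most aggressive

def is_speech(frame: bytes) -> bool:
    # `frame` must be exactly 10, 20, or 30 ms of 16-bit mono PCM
    return vad.is_speech(frame, SAMPLE_RATE)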

Issue number two - even if VAD wasn't triggered, the captured PCM is still sent down the pipeline to ASR

Consider this scenario:

  1. saying a wakeword loud
  2. seeing the pipeline went to the point of VAD
  3. silently whispering something, without triggering VAD
  4. seeing VAD is not being activated
  5. after a timeout from VAD (something like 10-15 seconds I guess), whatever audio was captured by the microphone is still sent to ASR, and then the recognized text is sent to the HANDLER.

The issue, as I see it, is in point 5. If VAD is present in the pipeline configuration (it is mandatory right now, I guess) but it wasn't triggered for whatever reason, then after the timeout the pipeline shouldn't be sending the captured audio down to ASR - otherwise, what's the point of having VAD here? :) If someone still needs this kind of behavior, sending PCM to ASR even without VAD being triggered, I guess it could be made configurable in configuration.yaml.

That's it so far. Again - great piece of software!
