
rhasspy3's Introduction

Rhasspy 3

NOTE: This is a very early developer preview!

An open source toolkit for building voice assistants.

Voice assistant pipeline

Rhasspy focuses on:

  • Privacy - no data leaves your computer unless you want it to
  • Broad language support - more than just English
  • Customization - everything can be changed

Getting Started

See the tutorial: https://github.com/rhasspy/rhasspy3/blob/master/docs/tutorial.md

Missing Pieces

This is a developer preview, so there are lots of things missing:

  • A user friendly web UI
  • An automated method for installing programs/services and downloading models
  • Support for custom speech to text grammars
  • Intent systems besides Home Assistant
  • The ability to accumulate context within a pipeline

Core Concepts

Domains

Rhasspy is organized by domain:

  • mic - audio input
  • wake - wake word detection
  • asr - speech to text
  • vad - voice activity detection
  • intent - intent recognition from text
  • handle - intent or text input handling
  • tts - text to speech
  • snd - audio output

Programs

Rhasspy talks to external programs using the Wyoming protocol. You can add your own programs by implementing the protocol or using an adapter.

Adapters

Small scripts that live in bin/ and bridge existing programs into the Wyoming protocol.

For example, a speech to text program (asr) that accepts a WAV file and outputs text can use asr_adapter_wav2text.py.
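
As a rough illustration, a program with that shape might look like the sketch below. The transcribe() body is a placeholder, and the assumption that the WAV arrives as a path argument (rather than on stdin) is mine, not the adapter's documented contract:

#!/usr/bin/env python3
# Hypothetical speech-to-text program: WAV file path in, one transcript line out.
# The transcribe() body is a placeholder, not a real ASR engine.
import sys
import wave

def transcribe(wav_path: str) -> str:
    with wave.open(wav_path, "rb") as wav_file:
        seconds = wav_file.getnframes() / wav_file.getframerate()
    # A real program would run its ASR model here instead.
    return f"(transcript of {seconds:.1f} seconds of audio)"

if __name__ == "__main__":
    print(transcribe(sys.argv[1]))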

Pipelines

Complete voice loop from microphone input (mic) to speaker output (snd). Stages are:

  1. detect (optional)
    • Wait until wake word is detected in mic
  2. transcribe
    • Listen until vad detects silence, then convert audio to text
  3. recognize (optional)
    • Recognize an intent from text
  4. handle
    • Handle an intent or text, producing a text response
  5. speak
    • Convert handle output text to speech, and speak through snd

Servers

Some programs take a while to load, so it's best to leave them running as a server. Use bin/server_run.py or add --server <domain> <name> when running the HTTP server.

See the servers section of the configuration.yaml file.


Supported Programs


HTTP API

http://localhost:13331/<endpoint>

Unless overridden, the pipeline named "default" is used.

  • /pipeline/run
    • Runs a full pipeline from mic to snd
    • Produces JSON
    • Override pipeline or:
      • wake_program
      • asr_program
      • intent_program
      • handle_program
      • tts_program
      • snd_program
    • Skip stages with start_after
      • wake - skip detection, body is detection name (text)
      • asr - skip recording, body is transcript (text) or WAV audio
      • intent - skip recognition, body is intent/not-recognized event (JSON)
      • handle - skip handling, body is handle/not-handled event (JSON)
      • tts - skip synthesis, body is WAV audio
    • Stop early with stop_after
      • wake - only detection
      • asr - detection and transcription
      • intent - detection, transcription, recognition
      • handle - detection, transcription, recognition, handling
      • tts - detection, transcription, recognition, handling, synthesis
  • /wake/detect
    • Detect wake word in WAV input
    • Produces JSON
    • Override wake_program or pipeline
  • /asr/transcribe
    • Transcribe audio from WAV input
    • Produces JSON
    • Override asr_program or pipeline
  • /intent/recognize
    • Recognizes intent from text body (POST) or text (GET)
    • Produces JSON
    • Override intent_program or pipeline
  • /handle/handle
    • Handles intent/text from body (POST) or input (GET)
    • Content-Type must be application/json for intent input
    • Override handle_program or pipeline
  • /tts/synthesize
    • Synthesizes audio from text body (POST) or text (GET)
    • Produces WAV audio
    • Override tts_program or pipeline
  • /tts/speak
    • Plays audio from text body (POST) or text (GET)
    • Produces JSON
    • Override tts_program, snd_program, or pipeline
  • /snd/play
    • Plays WAV audio via snd
    • Override snd_program or pipeline
  • /config
    • Returns JSON config
  • /version
    • Returns version info
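
As a rough usage sketch (assuming the HTTP server is running on localhost:13331, the default pipeline works, and that start_after can be passed as a query parameter), the endpoints can be called with plain HTTP clients, for example with Python's standard library:

#!/usr/bin/env python3
# Sketch of calling the HTTP API; assumes the server is at localhost:13331.
from urllib.request import Request, urlopen

BASE = "http://localhost:13331"

# Synthesize speech from text; the response body is WAV audio.
request = Request(BASE + "/tts/synthesize", data="It is five o'clock.".encode("utf-8"))
with urlopen(request) as response, open("reply.wav", "wb") as wav_file:
    wav_file.write(response.read())

# Run a pipeline but skip detection and recording:
# with start_after=asr the request body is the transcript text.
request = Request(BASE + "/pipeline/run?start_after=asr", data=b"what time is it")
with urlopen(request) as response:
    print(response.read().decode("utf-8"))  # JSON describing the pipeline result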

WebSocket API

ws://localhost:13331/<endpoint>

Audio streams are raw PCM in binary messages.

Use the rate, width, and channels parameters for sample rate (hertz), width (bytes), and channel count. By default, input audio is 16 kHz 16-bit mono, and output audio is 22 kHz 16-bit mono.

The client can "end" the audio stream by sending an empty binary message.

  • /pipeline/asr-tts
    • Run pipeline from asr (stream in) to tts (stream out)
    • Produces JSON messages as events happen
    • Override pipeline or:
      • asr_program
      • vad_program
      • handle_program
      • tts_program
    • Use in_rate, in_width, in_channels for audio input format
    • Use out_rate, out_width, out_channels for audio output format
  • /wake/detect
    • Detect wake word from websocket audio stream
    • Produces a JSON message when audio stream ends
    • Override wake_program or pipeline
  • /asr/transcribe
    • Transcribe a websocket audio stream
    • Produces a JSON message when audio stream ends
    • Override asr_program or pipeline
  • /snd/play
    • Play a websocket audio stream
    • Produces a JSON message when audio stream ends
    • Override snd_program or pipeline
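
A minimal client sketch for /asr/transcribe, assuming the third-party websockets Python package and a WAV file that is already in the default 16 kHz, 16-bit, mono input format:

#!/usr/bin/env python3
# Sketch: stream a WAV file to /asr/transcribe and print the JSON result.
# Assumes `pip install websockets` and audio already in the default input format;
# otherwise pass rate/width/channels as URL parameters.
import asyncio
import wave

import websockets

async def transcribe(wav_path: str) -> str:
    async with websockets.connect("ws://localhost:13331/asr/transcribe") as ws:
        with wave.open(wav_path, "rb") as wav_file:
            while True:
                chunk = wav_file.readframes(1024)  # raw PCM frames
                if not chunk:
                    break
                await ws.send(chunk)  # binary message = audio chunk
        await ws.send(b"")  # empty binary message ends the audio stream
        return await ws.recv()  # JSON message with the transcript

if __name__ == "__main__":
    print(asyncio.run(transcribe("sample.wav")))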

rhasspy3's People

Contributors

mic92, nikito, synesthesiam


rhasspy3's Issues

running Rhasspy3 (tutorial) in a docker container

It's not an issue, it's a side note!

Just in case anyone is interested in playing with Rhasspy3 in a Docker container (whatever your motivation is): onnxruntime doesn't work well with Alpine-based containers because of musl, so build your containers from something different. In my case I'm running this on top of Ubuntu 22.10 with Pipewire as a sound server executed under my own user - hence the fd mount.

# create a docker container
docker run -it -v /run/user/$(id -u)/pipewire-0:/tmp/pipewire-0 -e XDG_RUNTIME_DIR=/tmp --publish 13331:13331 --name rhasspy3 ubuntu /bin/bash

# inside of the docker container
apt update && apt install pipewire-audio-client-libraries alsa-utils vim python3.10-venv git -y

# in UBUNTU 22.04 you'll need to make that PCM device a default one
cat << EOF > ~/.asoundrc
pcm.!default {
    type plug
    slave.pcm "pipewire"
}
EOF

# make sure sound playback works with now-default ALSA device (press ^C to stop) 
# if it doesn't - check/change your default Pipewire sink, like in Gnome that will be in Sound Settings -> Output
# (or write a lua script for wireplumber)
speaker-test -c2

# Now let's test microphone. Say something after running the arecord
arecord -d5 test.wav
# And play it back:
aplay test.wav
# If you don't hear anything back - check / change your default Pipewire input device 
# in Gnome that will be in Sound Settings -> Input

# install dependencies and clone the project
apt install git -y
cd /root
git clone https://github.com/rhasspy/rhasspy3
cd rhasspy3

# check configuration
script/run bin/config_print.py

# check microphone
script/run bin/mic_test_energy.py

# config VAD
apt install python3.10-venv -y
mkdir -p config/programs/vad/
cp -R programs/vad/silero config/programs/vad/
python3 -m venv config/programs/vad/silero/.venv
. config/programs/vad/silero/.venv/bin/activate
config/programs/vad/silero/script/setup

# install and test VAD
script/run bin/mic_record_sample.py sample.wav
# make a pause, say something, then after a pause it will be captured to a file
# play the resulted file
aplay sample.wav
...

... or just follow the rest of the guide: https://github.com/rhasspy/rhasspy3/blob/master/docs/tutorial.md

I'd like to thank @synesthesiam (Michael) once again; I followed his guide and everything works just perfectly.

I need to mention: if you're planning to run this on a headless system without being logged in to X all the time, Pipewire in most popular distributions is configured to run under a user account, and reconfiguring it to run as root is highly discouraged. So on most recent Linux systems it works like this: before you log in, Pipewire runs under the "gdm" user (if you're running Gnome with GDM), and after you log in, gdm's Pipewire is stopped and your own instance is started instead.

I think, for such cases (a headless system with occasional interactive use), the ideal scenario would be to reconfigure Pipewire system-wide (e.g. in /etc) to accept local network connections and use those instead. Then it doesn't matter whether the gdm user or your own user is running Pipewire; the Docker container will be able to connect to it over TCP.

I spent quite a lot of time figuring out which way is up in these "sound in Docker using pure ALSA" / "sound in Docker when the host runs Pipewire" / "sound in Docker when the host runs PulseAudio" configurations :)

TTS appears to not output if the text is too long

Hello,
In testing out rhasspy with a few commands via Home Assistant and the add-on, I noticed that if the text is long, TTS does not output a reply, even though the log shows the full reply and that it was sent to TTS:
DEBUG:rhasspy3.program:client_unix_socket.py ['var/run/faster-whisper.socket']
DEBUG:rhasspy3.program:vad_adapter_raw.py ['--rate', '16000', '--width', '2', '--channels', '1', '--samples-per-chunk', '512', 'script/speech_prob "share/silero_vad.onnx"']
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: voice started
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: voice stopped
INFO:faster_whisper_server: What's the weather like?
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: asr=Transcript(text=" What's the weather like?")
DEBUG:rhasspy3.program:handle_adapter_text.py ['bin/converse.py --language "" "http://supervisor/core/api/conversation/process" "/app/config/data/handle/home_assistant/token"']
DEBUG:rhasspy3.handle:handle: input=Transcript(text=" What's the weather like?")
DEBUG:rhasspy3.handle:handle: Handled(text='Currently the weather is sunny, with a temperature of 43 degrees. Under present weather conditions the temperature feels like 36 degrees. In the next few hours the weather will be more of the same, with a temperature of 43 degrees.')
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: handle=Handled(text='Currently the weather is sunny, with a temperature of 43 degrees. Under present weather conditions the temperature feels like 36 degrees. In the next few hours the weather will be more of the same, with a temperature of 43 degrees.')
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: sending tts
DEBUG:rhasspy3.program:client_unix_socket.py ['var/run/larynx2.socket']
Real-time factor: 0.107471 (infer=1.27417 sec, audio=11.856 sec)
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: tts done

Not sure if this is an issue with the websocket/larynx, or just the browser not handling the longer stream correctly (I was testing the satellite via the browser in the add-on web UI).

In further testing I also noticed that after the request, regardless of TTS length, my dev console gets flooded with this message continuously:
(screenshot of the repeated console message omitted)

Testing other commands (such as "What time is it" or "What is the date") worked fine and produced TTS audio.

Thanks!

Stream to HA Assist Pipeline

I'm a bit lost - I have installed the rhasspy3 hassio addon, which isn't quite usable yet as far as I can tell.
And even if it were, it still wouldn't get me where I want to be, which is this:

  1. Do wake on rpi satellite
  2. Stream audio to the Home Assistant server (I want to do STT using a cloud service or a local Wyoming whisper server).
  3. Stream TTS audio back to the satellite.

This could be accomplished by doing what the existing remote/websocket does, except against HA's WebSocket API.

I have to assume I'm kicking at an open door here - but after reading around both here and there, I couldn't find any mention of this.

Confusing line in configuration.yaml in tutorial

There is a part in the tutorial where faster-whisper is set up:

asr:
    faster-whisper: ...
    faster-whisper.client:
      command: |
        client_unix_socket.py var/run/faster-whisper.socket

The second line is confusing (at least to me). I assumed I should keep the previously created entry for faster-whisper: ... and add another one for faster-whisper.client, which apparently was wrong.

Another issue I had was with the piper setup.

template_args:
        model: "${data_dir}/en-us-blizzard_lessac-medium.onnx"

I don't know why, but my file name was different: en-us-lessac-low.onnx. I spent some time trying to understand why I was getting an error when running the script.

satellite configuration

I guess I'm missing something.

On the server I have configured the following:

cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/pipeline_run.py --loop --debug
config:

pipelines:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    vad:
      name: silero
    asr:
      name: faster-whisper.client
    handle:
      name: home_assistant
cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/server_run.py asr faster-whisper
config:

servers:
    faster-whisper:
      command: |
        script/server --language ${language} --device ${device} "${model}"
      template_args:
        language: "de"
        model: "${data_dir}/large-v2"
        device: "cuda"  # cpu or cuda

starting http_server with:

/root/rhasspy3/script/http_server --debug

On the satellite:

cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/satellite_run.py
config:

satellites:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    remote:
      name: websocket
    snd:
      name: aplay

  remote:
    websocket:
      command: |
        script/run "${uri}"
      template_args:
        uri: "ws://192.168.0.109:13331/pipeline/asr-tts"

"local" processing working fine

but the satellite does not - debug output:

DEBUG:rhasspy3.core:Loading config from /root/rhasspy3/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Skipping /root/rhasspy3/config/configuration.yaml
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -D pulse -r 16000 -c 1 -f S16_LE -t raw -']
DEBUG:rhasspy3.program:.venv/bin/python3 ['bin/porcupine_stream.py', '--model', '/root/rhasspy3/config/data/wake/porcupine1/resources/keyword_files_de/linux/ananas_linux.ppn', '--lang_model', '/root/rhasspy3/config/data/wake/porcupine1/lib/common/porcupine_params_de.pv']
DEBUG:rhasspy3.wake:detect: processing audio
DEBUG:rhasspy3.wake:detect: Detection(name='ananas_linux', timestamp=88896097256797)
DEBUG:rhasspy3.program:script/run ['ws://192.168.0.109:13331/pipeline/asr-tts']

After that nothing happens - I have to kill the process.

Any ideas?

Wake_word_command (with no pauses or confirmations)

Hello there.

Have you seen what the OpenVoiceOS guys are doing with their Mycroft fork? https://youtu.be/2D1IZaj2Uws

From the video description it looks like they made the wake word acknowledgment sound (beep) play in parallel with the beginning of recording the command from the mic. It's a hacky way, but it does the job pretty well, as you can see in the video.

I was thinking that if I ever had to implement this myself, I would probably do it a bit differently. First of all, I would record sound from the microphone all the time into some kind of ring buffer. When the wake word is detected, I would note exactly when it happened and stream the PCM from that moment onward, out of the buffer and into the ASR module, until a pause is detected. Streaming (and recognizing speech in parallel as you talk) should drastically reduce the delay caused by the sequential nature of the current architecture, where speech is first recorded and only then the resulting WAV file is fed to ASR (a rough sketch follows the list below).

As a result:

  1. You don't need to make that huge pause yourself, listening for the BEEP sound, between saying the wake word and saying the actual command
  2. Voice recognition starts earlier, shortly after you start to speak, so by the moment you stop speaking the ASR module only needs to process a tiny bit of the PCM stream. This gives more headroom for ASR and lets people use larger and heavier ASR models, with a real-time factor close to 1, without having to wait extra seconds for their hardware to recognize what they just said starting from the first byte of PCM, as happens now.
  3. All of this should greatly improve the user experience.
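
Something along these lines is what I have in mind (a very rough sketch, not Rhasspy code; the sizes and the asr_stream object are made up for illustration):

# Rough sketch of the ring-buffer idea above; not Rhasspy code.
# `asr_stream` stands in for whatever streams PCM to the ASR module.
from collections import deque

CHUNK_BYTES = 2048      # ~64 ms of 16 kHz, 16-bit, mono audio
BUFFER_CHUNKS = 50      # ~3 seconds of history kept at all times

ring = deque(maxlen=BUFFER_CHUNKS)
streaming = False

def on_audio_chunk(chunk: bytes, wake_detected: bool, asr_stream) -> None:
    global streaming
    ring.append(chunk)
    if wake_detected:
        streaming = True
    if streaming:
        # Flush everything captured since (roughly) the detection moment,
        # then keep streaming live chunks until VAD reports a pause.
        while ring:
            asr_stream.write(ring.popleft())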

Nevertheless, amazing work! thanks a lot for all your contributions to the community.

"Illegal Instruction" encountered trying to start wyoming-whisper

Apologies if this is not the right forum or if it is answered elsewhere. I'm just trying to run someone else's docker compose file with wyoming-whisper in it, and I'm encountering a Python error on startup:

/run.sh: line 5: 7 Illegal instruction python3 -m wyoming_faster_whisper --uri 'tcp://0.0.0.0:10300' --data-dir /data --download-dir /data "$@"

I'm assuming it's because of the underlying system Docker is running on? An old MSI desktop with a 64-bit AMD CPU running Win10 and Docker Community.

Any help, or starting points to getting the container running?

The Docker compose file is largely pulled from somewhere else, and I doubt it is the cause.

version: "3.9"
services:
  piper:
    container_name: piper
    image: rhasspy/wyoming-piper
    ports:
      - '10200:10200'
    volumes:
      - '/media/storage/piper/data:/data'
    command: --voice en-gb-southern_english_female-low

  whisper:
    container_name: whisper
    image: rhasspy/wyoming-whisper
    ports:
      - '10300:10300'
    volumes:
      - '/media/storage/whisper/data:/data'
    command: --model tiny-int8 --language en

Small issue in tutorial

I keep following your tutorial; it's like reading a novel by your favorite author. I found a small issue there, I guess:

Instead of

echo 'What time is it?' | script/run bin/handle_handle.py --debug

there should be

echo 'what time is it?' | script/run bin/handle_text.py --debug

Installing Wyoming Whisper without Docker looks impossible: 404 error when downloading model

I'm posting this here because the Python Package Index website indicates this repository as the home page, so first of all, sorry if it is the wrong place.

I would like to test Whisper using Wyoming.
I use a Home Assistant Core installation, so I don't have Docker for anything.
Having Docker installed for only one thing does not seem reasonable to me, so I'm trying to install Wyoming Whisper manually.

I looked into the add-on code to see how Wyoming Whisper is installed and did the following on my side:

mkdir -p wyoming-whisper/data
cd wyoming-whisper
python3.11 -m venv venv
source venv/bin/activate
pip install wheel
pip install wyoming-faster-whisper==0.0.3
python3 -m wyoming_faster_whisper --uri 'tcp://0.0.0.0:10300' --model medium --beam-size "1" --language "fr" --data-dir ./data --download-dir ./data

But when running the last command, I get the following:

WARNING:wyoming_faster_whisper.download:Model hashes do not match
WARNING:wyoming_faster_whisper.download:Expected: {'config.json': 'e5a2f85afc17f73960204cad2b002633', 'model.bin': '5f852c3335fbd24002ffbb965174e3d7', 'vocabulary.txt': 'c1120a13c94a8cbb132489655cdd1854'}
WARNING:wyoming_faster_whisper.download:Got: {'model.bin': '', 'config.json': '', 'vocabulary.txt': ''}
INFO:__main__:Downloading FasterWhisperModel.MEDIUM to ./data
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/__main__.py", line 135, in <module>
    asyncio.run(main())
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/__main__.py", line 75, in main
    model_dir = download_model(model, args.download_dir)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/download.py", line 90, in download_model
    with urlopen(model_url) as response:
         ^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

How could it work for the add-on but not manually?
And how could I solve this?
I also posted a topic on the Home Assistant community forum, but it looks like I am alone in doing this kind of setup 🙁

Open transcription seems to perform worse than rhasspy2's Kaldi closed transcription

Although whisper is a lot better at transcribing arbitrary text, it seems to perform a lot worse at detecting the intent compared to the closed, trained transcription that we had in rhasspy 2.
I think the closed transcription also helped a lot with fuzzy matching, i.e. when people used slightly different words or when other people were talking in the background. I feel that as long as there is no smart NLU that can match the spoken text to an intent, this approach might not be a good enough fit for interfacing with home automation. I see value in having a good open transcription, and I am currently thinking about how it could be combined with the precision of the old system.

satellite communication issue

It looks like some additional logic is needed to handle the scenario where the base station sends the TTS result back to the satellite and closes the connection.

ERROR:satellite_run:Unexpected error communicating with remote base station
Traceback (most recent call last):
  File "/home/pi/rhasspy3/bin/satellite_run.py", line 131, in main
    await async_write_event(mic_event, remote_proc.stdin)
  File "/home/pi/rhasspy3/rhasspy3/event.py", line 77, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 197, in _drain_helper
    await waiter
BrokenPipeError

The simple code below will fix this, but I'm not sure that this is the right way to handle it:

try:
    await async_write_event(mic_event, remote_proc.stdin)
except BrokenPipeError:
    pass  # base station closed the connection; stop forwarding or reconnect here

Wyoming source repository

Hi, I'm wondering where I can find the source repository for the wyoming package that is up on PyPI.

The homepage points to this repository, and the structure looks similar to the rhasspy3 directory in this repository, but the code is different.

Also, is the code open for contributions?

Faster-Whisper Home Assistant "Unknown" State

I am running this in a Docker container via faster-whisper with the Wyoming protocol in Home Assistant. The state of this entity is showing up as "unknown." It works fine; the state is just not updating at all. For Piper, the last time it was used shows as the state.

Custom wake word support

Are custom wake words supported like in rhasspy2? I followed the tutorial for porcupine and tested with a few words, then generated a .ppn and placed it under keyword_files. It shows up with config/programs/wake/porcupine1/script/list_models, but crashes on load (e.g. script/run bin/wake_detect.py) with a JSONDecodeError:

DEBUG:rhasspy3.core:Loading config from /app/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Skipping /app/config/configuration.yaml
DEBUG:wake_detect:mic program: PipelineProgramConfig(name='arecord', template_args=None, after=None)
DEBUG:wake_detect:wake program: PipelineProgramConfig(name='porcupine1', template_args=None, after=None)
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -D "default" -r 16000 -c 1 -f S16_LE -t raw -']
DEBUG:wake_detect:Detecting wake word
DEBUG:rhasspy3.program:python3 ['bin/porcupine_stream.py', '--model', 'Professor_en_linux_v2_2_0.ppn']
Traceback (most recent call last):
  File "/app/config/programs/wake/porcupine1/bin/porcupine_stream.py", line 71, in <module>
    main()
  File "/app/config/programs/wake/porcupine1/bin/porcupine_stream.py", line 25, in main
    porcupine, names = load_porcupine(args)
  File "/app/config/programs/wake/porcupine1/bin/porcupine_shared.py", line 57, in load_porcupine
    Traceback (most recent call last):
  File "/app/bin/wake_detect.py", line 80, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/app/bin/wake_detect.py", line 69, in main
    detection = await detect(rhasspy, wake_program, mic_proc.stdout)
  File "/app/rhasspy3/wake.py", line 109, in detect
    wake_event = wake_task.result()
  File "/app/rhasspy3/event.py", line 48, in async_read_event
    event_dict = json.loads(json_line)
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

SileroVAD fails sometimes with "BrokenPipeError: [Errno 32] Broken pipe"

Hello there,

I was following your tutorial, up to the point of testing the Silero VAD - "Voice Activity Detection" chapter.

I noticed that sometimes (in my case roughly 30% of the time) it throws an exception while recording sound activity, like this:

(.venv) root@085dd18c329f:~/rhasspy3# script/run bin/mic_record_sample.py sample.wav
INFO:mic_record_sample:Recording sample.wav
INFO:mic_record_sample:Speaking started
INFO:mic_record_sample:Speaking ended
Traceback (most recent call last):
  File "/root/rhasspy3/config/programs/vad/silero/bin/silero_speech_prob.py", line 89, in <module>
    main()
  File "/root/rhasspy3/config/programs/vad/silero/bin/silero_speech_prob.py", line 34, in main
    print(speech_probability, flush=True)
BrokenPipeError: [Errno 32] Broken pipe

However, when it happens, it still manages to capture something to sample.wav.

If it matters, I'm playing with Rhasspy3 in an ubuntu:latest Docker container (22.04.2); for sound inside the container I'm using the ALSA emulation of a Pipewire client installed in the container. The Pipewire client is connected to my host's Pipewire server via a socket file. All native ALSA utilities inside the container (aplay / arecord) work just fine - I'm pretty sure the sound setup is solid.

Updated Docker Image for Wyoming-piper

Hopefully this is the right repo to post this in.

I saw that the Home Assistant add-on was updated with the latest release of wyoming-piper, but the Docker image on Docker Hub was not. I see a link from PyPI to this repo, but did not find the source code for the Python library or that same Docker image here.

Would you be able to push an update for that docker image as well as point me to the right repo where it’s hosted?

Thanks!

Satellite: stuck at script/run ['ws://<serverip>:13331/pipeline/asr-tts']

Hello to all,

first of all thank you for this awesome work!!

This is what I have achieved:

  • worked through the whole tutorial and the whole pipeline is working
  • on the server I started two processes to be ready for clients:
  • ./script/http_server --debug --server asr faster-whisper --server tts piper &
  • curl -X POST 'localhost:13331/pipeline/run' (in a while loop)

Problem:
Wake word is working at satellite, but it gets stuck at:

DEBUG:rhasspy3.core:Loading config from /home/pi/rhasspy3/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Loading config from /home/pi/rhasspy3/config/configuration.yaml
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -r 16000 -c 1 -f S16_LE -t raw -D plughw:CARD=Mini,DEV=0 -']
DEBUG:rhasspy3.program:.venv/bin/python3 ['bin/porcupine_stream.py', '--model', 'jarvis_raspberry-pi.ppn']
DEBUG:rhasspy3.wake:detect: processing audio
DEBUG:rhasspy3.wake:detect: Detection(name='jarvis_raspberry-pi', timestamp=1239407741987)
DEBUG:rhasspy3.program:script/run ['ws://10.10.10.5:13331/pipeline/asr-tts']

-> From there nothing happens any more. Any ideas?

Two small issues in the VAD->ASR pipeline processing

Hello again. I've been playing with Rhasspy3 over the course of the last several days and I found a few small issues.

To begin with, here's my pipeline; it's pretty standard, and all program settings are the defaults from your tutorial. The only small difference is that I swapped whisper for vosk to try out some additional ASR models:

pipelines:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    vad:
      name: silero
    asr:
      name: vosk.client
    handle:
      name: repeat
    tts:
      name: piper.client
    snd:
      name: aplay

Issue number one - the silero-vad sensitivity? other VADs?

When the wake step is activated (by me saying a wake word into the microphone), it sometimes happens that VAD (silero) is unable to capture me speaking, so the pipeline keeps hanging on that step until the VAD timeout. It could be because my microphone is noisy, or I was just speaking too quietly, or the default threshold of silero-vad is just a bit too low for my setup.

I guess I'm not the only one who'll be playing with Rhasspy3 using questionable-quality microphones, with a constant background hum audible in recordings from them :) so I was trying to find some way to configure silero-vad to make it more sensitive, but it looks like there are no such settings exposed in configuration.yaml right now. So this one is more of a gentle, low-priority feature request. In my particular case, I think I should just invest a few bucks in a more decent mic than what I have now, probably an electret one.

Then I tried to swap silero-vad for something different. There are two other VADs Rhasspy3 ships with, energy and webrtcvad, and I've seen that both of them have some kind of sensitivity configurable in the configuration. I used similar steps as in your tutorial to get them installed, but as soon as I plug them into the pipeline it starts spilling error messages at me. It looks like they're not quite ready, right? Or was it just me doing something wrong? I'm happy to dig into that more myself if you tell me that both energy and webrtcvad work out of the box.
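
For reference, the underlying webrtcvad library itself does expose a sensitivity knob (the aggressiveness mode, 0-3). A standalone sketch of just the library, not of Rhasspy's webrtcvad program:

# Standalone illustration of webrtcvad's aggressiveness setting (0-3).
# This is the bare library, not Rhasspy's webrtcvad integration.
import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 30 ms of 16-bit mono PCM

vad = webrtcvad.Vad(1)  # 0 = least aggressive (flags more audio as speech), 3 = most aggressive

def is_speech(frame: bytes) -> bool:
    # `frame` must be exactly 10, 20, or 30 ms of 16-bit mono PCM
    return vad.is_speech(frame, SAMPLE_RATE)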

Issue number two - even if VAD wasn't triggered, the captured PCM is still sent down the pipeline to ASR

Consider this scenario:

  1. saying a wakeword loud
  2. seeing the pipeline went to the point of VAD
  3. silently whispering something, without triggering VAD
  4. seeing VAD is not being activated
  5. after a timeout from VAD (something like 10-15 seconds I guess), whatever audio was captured by the microphone is still sent to ASR, and then the recognized text is sent to the HANDLER.

The issue, as I see it, is in point 5. If VAD is present in the pipeline configuration (it is mandatory right now, I guess) but it wasn't triggered for whatever reason, then after the timeout the pipeline shouldn't be sending the captured audio down to ASR - otherwise, what's the point of having VAD here? :) If someone still needs this kind of behavior, sending PCM to ASR even without VAD being triggered, I guess it could be made configurable in configuration.yaml.

That's it so far. Again - great piece of software!
