
agents's Introduction


LiveKit: Real-time video, audio and data for developers

LiveKit is an open source project that provides scalable, multi-user conferencing based on WebRTC. It's designed to provide everything you need to build real-time video, audio, and data capabilities in your applications.

LiveKit's server is written in Go, using the awesome Pion WebRTC implementation.


Features

Documentation & Guides

https://docs.livekit.io

Live Demos

Ecosystem

  • Agents: build real-time multimodal AI applications with programmable backend participants
  • Egress: record or multi-stream rooms and export individual tracks
  • Ingress: ingest streams from external sources like RTMP, WHIP, HLS, or OBS Studio

SDKs & Tools

Client SDKs

Client SDKs enable your frontend to include interactive, multi-user experiences.

Language                  Repo                      Declarative UI   Links
JavaScript (TypeScript)   client-sdk-js             React            docs | JS example | React example
Swift (iOS / MacOS)       client-sdk-swift          Swift UI         docs | example
Kotlin (Android)          client-sdk-android        Compose          docs | example | Compose example
Flutter (all platforms)   client-sdk-flutter        native           docs | example
Unity WebGL               client-sdk-unity-web                       docs
React Native (beta)       client-sdk-react-native   native
Rust                      client-sdk-rust

Server SDKs

Server SDKs enable your backend to generate access tokens, call server APIs, and receive webhooks. In addition, the Go SDK includes client capabilities, enabling you to build automations that behave like end-users.

Language                  Repo                               Docs
Go                        server-sdk-go                      docs
JavaScript (TypeScript)   server-sdk-js                      docs
Ruby                      server-sdk-ruby
Java (Kotlin)             server-sdk-kotlin
Python (community)        python-sdks
PHP (community)           agence104/livekit-server-sdk-php

Tools

Install

Tip

We recommend installing LiveKit CLI along with the server. It lets you access server APIs, create tokens, and generate test traffic.

The following will install LiveKit's media server:

MacOS

brew install livekit

Linux

curl -sSL https://get.livekit.io | bash

Windows

Download the latest release here

Getting Started

Starting LiveKit

Start LiveKit in development mode by running livekit-server --dev. It'll use a placeholder API key/secret pair.

API Key: devkey
API Secret: secret

To customize your setup for production, refer to our deployment docs

Creating access token

A user connecting to a LiveKit room requires an access token. Access tokens (JWT) encode the user's identity and the room permissions they've been granted. You can generate a token with our CLI:

livekit-cli create-token \
    --api-key devkey --api-secret secret \
    --join --room my-first-room --identity user1 \
    --valid-for 24h
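
To make the statement that a token "encodes the user's identity and the room permissions" concrete, here is a minimal sketch that builds an equivalent HS256 JWT with only the Python standard library. The claim layout (`iss` for the API key, `sub` for the identity, a `video` grant with `roomJoin` and `room`) is an assumption about the token format, and `create_token` and `decode_claims` are hypothetical helper names. In a real project, prefer the CLI or a server SDK.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def create_token(api_key, api_secret, identity, room, valid_for_s=24 * 3600, now=None):
    """Build a signed JWT; the claim names are an assumed layout."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                               # who issued the token
        "sub": identity,                              # the user's identity
        "nbf": now,
        "exp": now + valid_for_s,                     # mirrors --valid-for 24h
        "video": {"roomJoin": True, "room": room},    # room permissions
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        api_secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_claims(token: str) -> dict:
    """Decode the payload WITHOUT verifying the signature (inspection only)."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    token = create_token("devkey", "secret", "user1", "my-first-room")
    print(decode_claims(token))
```

Decoding the payload is a quick way to check which identity and room a token was issued for before pasting it into the example app.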

Test with example app

Head over to our example app and enter a generated token to connect to your LiveKit server. This app is built with our React SDK.

Once connected, your video and audio are now being published to your new LiveKit instance!

Simulating a test publisher

livekit-cli join-room \
    --url ws://localhost:7880 \
    --api-key devkey --api-secret secret \
    --room my-first-room --identity bot-user1 \
    --publish-demo

This command publishes a looped demo video to a room. Due to how the video clip was encoded (keyframes every 3s), there's a slight delay before the browser has sufficient data to begin rendering frames. This is an artifact of the simulation.

Deployment

Use LiveKit Cloud

LiveKit Cloud is the fastest and most reliable way to run LiveKit. Every project gets free monthly bandwidth and transcoding credits.

Sign up for LiveKit Cloud.

Self-host

Read our deployment docs for more information.

Building from source

Pre-requisites:

  • Go 1.22+ is installed
  • GOPATH/bin is in your PATH

Then run

git clone https://github.com/livekit/livekit
cd livekit
./bootstrap.sh
mage

Contributing

We welcome your contributions toward improving LiveKit! Please join us on Slack to discuss your ideas and/or PRs.

License

LiveKit server is licensed under Apache License v2.0.


LiveKit Ecosystem
Real-time SDKs: React Components · Browser · iOS/macOS · Android · Flutter · React Native · Rust · Node.js · Python · Unity (web) · Unity (beta)
Server APIs: Node.js · Golang · Ruby · Java/Kotlin · Python · Rust · PHP (community)
Agents Frameworks: Python · Playground
Services: LiveKit server · Egress · Ingress · SIP
Resources: Docs · Example apps · Cloud · Self-hosting · CLI

agents's People

Contributors

brightsparc, calinr, cs50victor, davidzhao, dsa, keepingitneil, lukasio, mattherzog, ocupe, seanmuirhead, theomonnom, ty-elastic, vanics


agents's Issues

ZeroDivisionError in voice assistant during function calling and answering

I just tried the function calling example with the following versions:

livekit-agents==0.7.0
livekit-plugins-openai==0.5.0
livekit-plugins-deepgram==0.5.0
livekit-plugins-elevenlabs==0.5.0
livekit-plugins-silero==0.5.0

When I ask the assistant to toggle the lights, I get a ZeroDivisionError:

Task exception was never retrieved
future: <Task finished name='Task-6761' coro=<entrypoint.<locals>._answer_light_toggling() done, defined at /path/to/project/test.py:65> exception=ZeroDivisionError('float division by zero')>

Traceback (most recent call last):
  File "/path/to/project/test.py", line 80, in _answer_light_toggling
    await assistant.say(stream)
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 225, in say
    await self._start_speech(data, interrupt_current_if_possible=True)
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 634, in _start_speech
    await self._play_task
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 712, in _play_speech_if_validated
    await _synthesize_task
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 818, in _synthesize_task
    await _forward_task
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 773, in _forward_stream
    tts_forwarder.mark_audio_segment_end()
  File "/path/to/packages/livekit/agents/transcription/tts_forwarder.py", line 170, in mark_audio_segment_end
    seg.avg_speed = len(self._calc_hyphenes(seg.text)) / seg.audio_duration
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: float division by zero
Here is the content of the `test.py` file:

import asyncio
import logging
from enum import Enum
from typing import Annotated

from livekit.agents import (
    JobContext,
    JobRequest,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.voice_assistant import AssistantContext, VoiceAssistant
from livekit.plugins import deepgram, elevenlabs, openai, silero


class Room(Enum):
    BEDROOM = "bedroom"
    LIVING_ROOM = "living room"
    KITCHEN = "kitchen"
    BATHROOM = "bathroom"
    OFFICE = "office"


class AssistantFnc(llm.FunctionContext):
    @llm.ai_callable(desc="Turn on/off the lights in a room")
    async def toggle_light(
        self,
        room: Annotated[Room, llm.TypeInfo(desc="The specific room")],
        status: bool,
    ):
        logging.info("toggle_light %s %s", room, status)
        ctx = AssistantContext.get_current()
        key = "enabled_rooms" if status else "disabled_rooms"
        li = ctx.get_metadata(key, [])
        li.append(room)
        ctx.store_metadata(key, li)

    @llm.ai_callable(desc="User want the assistant to stop/pause speaking")
    def stop_speaking(self):
        pass  # do nothing


async def entrypoint(ctx: JobContext):
    gpt = openai.LLM(model="gpt-4-turbo")

    initial_ctx = llm.ChatContext(
        messages=[
            llm.ChatMessage(
                role=llm.ChatRole.SYSTEM,
                text="You are a voice assistant created by LiveKit. Your interface with users will be voice. You should use short and concise responses, and avoiding usage of unpronouncable punctuation.",
            )
        ]
    )

    assistant = VoiceAssistant(
        vad=silero.VAD(),
        stt=deepgram.STT(),
        llm=gpt,
        tts=elevenlabs.TTS(),
        fnc_ctx=AssistantFnc(),
        chat_ctx=initial_ctx,
    )

    async def _answer_light_toggling(enabled_rooms, disabled_rooms):
        prompt = "Make a summary of the following actions you did:"
        if enabled_rooms:
            enabled_rooms_str = ", ".join(enabled_rooms)
            prompt += f"\n - You enabled the lights in the following rooms: {enabled_rooms_str}"

        if disabled_rooms:
            disabled_rooms_str = ", ".join(disabled_rooms)
            prompt += f"\n - You disabled the lights in the following rooms {disabled_rooms_str}"

        chat_ctx = llm.ChatContext(
            messages=[llm.ChatMessage(role=llm.ChatRole.SYSTEM, text=prompt)]
        )

        stream = await gpt.chat(chat_ctx)
        await assistant.say(stream)

    @assistant.on("agent_speech_interrupted")
    def _agent_speech_interrupted(chat_ctx: llm.ChatContext, msg: llm.ChatMessage):
        msg.text += "... (user interrupted you)"

    @assistant.on("function_calls_finished")
    def _function_calls_done(ctx: AssistantContext):
        logging.info("function_calls_done %s", ctx)
        enabled_rooms = ctx.get_metadata("enabled_rooms", [])
        disabled_rooms = ctx.get_metadata("disabled_rooms", [])

        if enabled_rooms or disabled_rooms:
            # if there was a change in the lights, summarize it and let the user know
            asyncio.ensure_future(_answer_light_toggling(enabled_rooms, disabled_rooms))

    assistant.start(ctx.room)
    await asyncio.sleep(3)
    await assistant.say("Hey, how can I help you today?")


async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    await req.accept(entrypoint)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(request_fnc))
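
The traceback above fails on a division by `seg.audio_duration`, which is zero when a segment ends before any audio was recorded for it. As a hedged illustration of one way to guard such a computation (not the library's actual fix), the sketch below uses a hypothetical helper, `average_speed`, that returns a fallback value instead of dividing by zero.

```python
def average_speed(unit_count: int, audio_duration: float, fallback: float = 0.0) -> float:
    """Return units per second, or `fallback` when no audio was recorded.

    `unit_count` stands in for the number of hyphenated text units and
    `audio_duration` for the segment length in seconds; both names are
    illustrative and not taken from the library.
    """
    if audio_duration <= 0:
        # A segment with no audio has no meaningful speed.
        return fallback
    return unit_count / audio_duration


if __name__ == "__main__":
    print(average_speed(6, 2.0))  # normal case
    print(average_speed(6, 0.0))  # empty segment, falls back instead of raising
```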

remix_and_resample effect optimization


                if isinstance(data, rtc.AudioFrame):
                    # TODO(theomonnom): The remix_and_resample method is low quality
                    # and should be replaced with a continuous resampling
                    frame = data.remix_and_resample(
                        self._sample_rate, self._num_channels
                    )

When will this TODO be addressed? In my tests, the accuracy of this approach differs significantly from what I get when reading data from the microphone with pyaudio.

Bug in the examples/kitt

After the latest updates, the example agent KITT does not run:

Exception in inference 'SynthesizeStream' object has no attribute 'flush'
Traceback (most recent call last):
  File "examples/kitt/inference_job.py", line 111, in _run
    await asyncio.gather(
  File "examples/kitt/inference_job.py", line 132, in _llm_task
    await self._tts_stream.flush()
          ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SynthesizeStream' object has no attribute 'flush'

Agent speech output audio is interpreted as user speech

When using LiveKit agents, sometimes the agent hears its own TTS output (eg via the laptop speakers) which is then interpreted as speech from the user.

This then creates a feedback loop where the agent will then translate + respond a second time to its own speech output.

This only seems to happen when device volume is above ~25-30% and audio is being played through the device speakers.

To provide a seamless UX though, the user shouldn't have to worry about managing volume level in order to prevent this.

My current approach is:

  1. When instantiating a LiveKit room, enabling noiseSuppression and echoCancellation, e.g.:
    <LiveKitRoom
        token={createAudioCoachingCallRequest.result.room_access_token}
        serverUrl={createAudioCoachingCallRequest.result.active_server_websocket_url}
        audio={{echoCancellation: true, noiseSuppression: true}}
        connect={true}
    >
  2. Enabling allow_interruptions=True in agent.py, e.g.:
    assistant = VoiceAssistant(
        ...,
        allow_interruptions=True,
    )
  3. Muting the user's mic on the user_speech_committed and agent_started_speaking events, then unmuting on the agent_speech_committed event (i.e., after the agent finishes speaking).

Muting the user's mic is a short-term workaround -- the main limitation being that the user can't interrupt the agent once it starts speaking.

Are there best practices for preventing this feedback loop / is this something LiveKit is working on addressing?

TTS Streaming is Broken for livekit-plugins-elevenlabs v0.4.0

This is a clearer description and root-cause analysis (RCA) of #279.

I think the issue is related to the TTS streaming implementation in livekit-plugins-elevenlabs. I think so because when I comment out this code in assistant.py, the very first assistant.say call is sent down to the agent playground.

Debugging further, I was able to verify that OpenAI is sending down the correct response to voices, but it wasn't getting streamed as audio to the playground.

I also tried downgrading to all the 0.4.0 dev versions, and none of them fixed the issue. 0.3.0 didn't work at all with the other packages.

The code I'm running is from the official agents quickstart guide.

Quickstart in the docs doesn't work: "Waiting for audio track" forever

I was trying to learn the agents framework, but even the quickstart in the documentation doesn't work. I followed the steps and set up the Deepgram API key. The agent playground works and I was able to join the room. The playground shows the agent as connected (true), but the status stays at "starting" forever and the audio section shows "Waiting for audio track" forever. I'm using an M3 Mac with Chrome.
Please help me out. Thanks in advance.

OpenAI: Optional arguments for ai_callable result in KeyError

When using an optional parameter with @llm.ai_callable, a "Key Error: 'key'" is thrown at https://github.com/livekit/agents/blob/main/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py#L169

@llm.ai_callable(desc="Use this tool to find a flight.")
async def search_flights(
    self,
    departure_id: Annotated[str, llm.TypeInfo(desc="Departure airport's IATA code or a city's Freebase ID starting with.")],
    passengers: Annotated[str, llm.TypeInfo(desc="Number of passenger to book the ticket for.")] = "1"
):

This happens because the args do not contain the "passengers" argument. Using arg.default, it would be possible to populate the default values beforehand:

fnc = fncs[name]
# validate args before calling fnc
for arg in fnc.args.values():
    # Populate args with default values
    if arg.name not in args and arg.default is not inspect.Parameter.empty:
        args[arg.name] = arg.default

    if arg.default is inspect.Parameter.empty and arg.name not in args:
        logger.error(f"missing required arg {arg.name} for ai_callable {name}")
        return
  • Affected version livekit-plugins-openai~=0.4.dev1
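
The same idea can be tried outside the plugin. The sketch below is a standalone illustration, not the plugin's code: `fill_defaults` is a hypothetical helper that uses `inspect.signature` to copy declared defaults into the argument dict and to report the parameters that are still missing. It only handles plain named parameters.

```python
import inspect


def fill_defaults(fnc, args: dict):
    """Return (args with defaults filled in, names of missing required args)."""
    filled = dict(args)  # do not mutate the caller's dict
    missing = []
    for name, param in inspect.signature(fnc).parameters.items():
        if name in filled:
            continue
        if param.default is inspect.Parameter.empty:
            # No default declared, so the caller had to supply it.
            missing.append(name)
        else:
            filled[name] = param.default
    return filled, missing


# Simplified stand-in for the callable from the report above.
def search_flights(departure_id: str, passengers: str = "1"):
    return f"{departure_id}:{passengers}"


if __name__ == "__main__":
    filled, missing = fill_defaults(search_flights, {"departure_id": "SFO"})
    print(filled, missing)
    print(search_flights(**filled))
```

Filling defaults before validation means an omitted optional argument no longer triggers a lookup failure, while a missing required argument is still reported.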

Duplicated agent responses (LLM inference + TTS audio)

I've noticed that occasionally the agent will generate two distinct responses (LLM inference and TTS audio) for the same user input.

Interestingly, the second LLM inference isn't generated until after the first TTS audio is completed.

Usually, the second LLM inference + response will be generated using the entire user input, with the first inference being generated using a fraction of it (eg, it doesn't always seem to wait for the user to finish, or handle an interruption cleanly).

Another variant of this I've seen is that sometimes the LLM inference seems to get "caught" - eg, the AI will respond to the previous question I asked instead of the current one, but only once I ask the following question.

Will add logs here as repros happen locally.

I am using this local setup (Chrome on a MacBook):

    assistant = VoiceAssistant(
        vad=silero.VAD(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=elevenlabs.TTS(voice=DEFAULT_VOICE),
        chat_ctx=initial_ctx,
        fnc_ctx=fnc_ctx,
        allow_interruptions=True,
        debug=True
    )

Ability to perform actions when the session ends

Discussion from Slack:


I'm experimenting with the transcription agent example. It works well, but I'd like to do a bit of post processing after the room ends. I can't figure out where to do this though. It seems like the agent worker is possibly being killed immediately.
I've tried listening for the 'disconnected' event in my agent like this:

@job.room.on("disconnected")
async def on_room_disconnect():
    print('disconnected, do work here?')  # this never runs

We should ensure that all disconnected callbacks are finished before we terminate the worker process.
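
A minimal sketch of that idea, assuming nothing about the real worker internals: `TinyEmitter` is a hypothetical stand-in for an event emitter that schedules async callbacks as tasks, remembers them, and offers a `drain()` coroutine that the shutdown path can await before exiting.

```python
import asyncio


class TinyEmitter:
    """Hypothetical emitter that tracks the tasks started by async callbacks."""

    def __init__(self):
        self._handlers = {}
        self._pending = set()

    def on(self, event):
        def register(handler):
            self._handlers.setdefault(event, []).append(handler)
            return handler
        return register

    def emit(self, event):
        # Must be called from inside a running event loop.
        for handler in self._handlers.get(event, []):
            result = handler()
            if asyncio.iscoroutine(result):
                task = asyncio.ensure_future(result)
                self._pending.add(task)
                task.add_done_callback(self._pending.discard)

    async def drain(self):
        # Wait for every callback task that is still running.
        if self._pending:
            await asyncio.gather(*list(self._pending))


async def main():
    room = TinyEmitter()
    done = []

    @room.on("disconnected")
    async def on_room_disconnect():
        await asyncio.sleep(0.01)  # stands in for post-processing work
        done.append("post-processing finished")

    room.emit("disconnected")
    await room.drain()  # shutdown waits here instead of exiting immediately
    return done


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without the `drain()` call, `main` would return while the callback is still sleeping, which matches the symptom of the handler never appearing to run.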

STT Timing Information -> Propose emitting END_OF_SPEECH before FINAL_TRANSCRIPT

I'm having trouble capturing timing information with VAD + STT.

given:

openai_stt = openai.STT()
vad = silero.VAD()
vad_stream = vad.stream()
stt = StreamAdapter(openai_stt, vad_stream)
stt_stream = stt.stream()

I looked into the StreamAdapter and found that it was re-emitting the VAD start/end of speech events. I was planning to use those to capture timing, but then I found that the END_OF_SPEECH event is delayed until after the FINAL_TRANSCRIPT, meaning that the timing now includes inference and API call overhead.

It looks like the END_OF_SPEECH event includes the first alternative just for convenience. I would propose to propagate the VAD events as-is in the adapter, and direct users to the transcript events to get transcription results.

diff --git a/livekit-agents/livekit/agents/stt/stream_adapter.py b/livekit-agents/livekit/agents/stt/stream_adapter.py
index 7050178..9b2d918 100644
--- a/livekit-agents/livekit/agents/stt/stream_adapter.py
+++ b/livekit-agents/livekit/agents/stt/stream_adapter.py
@@ -76,6 +76,9 @@ class StreamAdapterWrapper(SpeechStream):
                     start_event = SpeechEvent(SpeechEventType.START_OF_SPEECH)
                     self._event_queue.put_nowait(start_event)
                 elif event.type == VADEventType.END_OF_SPEECH:
+                    end_event = SpeechEvent(type=SpeechEventType.END_OF_SPEECH)
+                    self._event_queue.put_nowait(end_event)
+
                     merged_frames = merge_frames(event.frames)
                     event = await self._stt.recognize(
                         buffer=merged_frames, *self._args, **self._kwargs
@@ -87,12 +90,6 @@ class StreamAdapterWrapper(SpeechStream):
                         alternatives=[event.alternatives[0]],
                     )
                     self._event_queue.put_nowait(final_event)
-
-                    end_event = SpeechEvent(
-                        type=SpeechEventType.END_OF_SPEECH,
-                        alternatives=[event.alternatives[0]],
-                    )
-                    self._event_queue.put_nowait(end_event)
         except Exception:
             logging.exception("stt stream adapter failed")
         finally:
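
To illustrate why the event order matters for timing, here is a small self-contained sketch. The `EventType`, `TimedEvent`, and `speech_durations` names are hypothetical and only mimic the shape of the events discussed above. It measures speech duration as the gap between START_OF_SPEECH and END_OF_SPEECH, and compares the two orderings.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EventType(Enum):
    START_OF_SPEECH = auto()
    END_OF_SPEECH = auto()
    FINAL_TRANSCRIPT = auto()


@dataclass
class TimedEvent:
    type: EventType
    at: float  # time the event was emitted, in seconds


def speech_durations(events):
    """Pair each START_OF_SPEECH with the next END_OF_SPEECH."""
    durations = []
    started_at = None
    for ev in events:
        if ev.type is EventType.START_OF_SPEECH:
            started_at = ev.at
        elif ev.type is EventType.END_OF_SPEECH and started_at is not None:
            durations.append(ev.at - started_at)
            started_at = None
    return durations


# Proposed order: END_OF_SPEECH is emitted as soon as the VAD reports it.
proposed = [
    TimedEvent(EventType.START_OF_SPEECH, 0.0),
    TimedEvent(EventType.END_OF_SPEECH, 1.5),
    TimedEvent(EventType.FINAL_TRANSCRIPT, 2.25),
]

# Current order: END_OF_SPEECH waits for recognition to finish.
current = [
    TimedEvent(EventType.START_OF_SPEECH, 0.0),
    TimedEvent(EventType.FINAL_TRANSCRIPT, 2.25),
    TimedEvent(EventType.END_OF_SPEECH, 2.25),
]

if __name__ == "__main__":
    print(speech_durations(proposed))
    print(speech_durations(current))
```

With the invented timestamps above, the second ordering folds 0.75 s of recognition latency into the measured speech duration.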

agent architecture - recovery from agent shutdown

If an agent (implemented by LiveKit users with the Python SDK) crashes, then ideally the LiveKit backend should keep pinging the agent pool until another agent becomes available.

However, this is not the case: if the agent stops (for instance, I am running the agent in my debugger and stop the debugger to change code) and I then run the agent again, the LiveKit backend never calls my agent to rejoin the room.

For a system built with agents to become production-ready, this error recovery mechanism is a must.

deepgram connection failed

Has anyone encountered this problem?

{"asctime": "2024-05-20 20:00:29,659", "level": "WARNING", "name": "livekit.plugins.deepgram", "message": "deepgram connection failed, retrying in 2s", "job_id": "AJ_oRch8GR9fxqf", "pid": 99836}

{"asctime": "2024-05-20 20:00:31,663", "level": "INFO", "name": "livekit.plugins.deepgram", "message": "connecting to deepgram url wss://api.deepgram.com/v1/listen?model=nova-2-general&punctuate=true&smart_format=true&interim_results=true&encoding=linear16&sample_rate=48000&vad_events=true&channels=1&endpointing=0&language=en-us", "job_id": "AJ_oRch8GR9fxqf", "pid": 99836}

{"asctime": "2024-05-20 20:00:29,662", "level": "ERROR", "name": "livekit.plugins.deepgram", "message": "deepgram Exception.with_tracebackTraceback (most recent call last):\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 1025, in _wrap_create_connection\n return await self._loop.create_connection(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/base_events.py\", line 1085, in create_connection\n raise exceptions[0]\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/base_events.py\", line 1069, in create_connection\n sock = await self._connect_sock(\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/base_events.py\", line 973, in _connect_sock\n await self.sock_connect(sock, address)\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/selector_events.py\", line 634, in sock_connect\n return await fut\n ^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/selector_events.py\", line 674, in _sock_connect_cb\n raise OSError(err, f'Connect call failed {address}')\nTimeoutError: [Errno 60] Connect call failed ('38.104.135.211', 443)\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/livekit/plugins/deepgram/stt.py\", line 228, in _run\n ws = await self._session.ws_connect(url, headers=headers)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/client.py\", line 835, in _ws_connect\n resp = await self.request(\n ^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/client.py\", line 581, in _request\n conn = await self._connector.connect(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File 
\"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 544, in connect\n proto = await self._create_connection(req, traces, timeout)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 944, in _create_connection\n _, proto = await self._create_direct_connection(req, traces, timeout)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 1257, in _create_direct_connection\n raise last_exc\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 1226, in _create_direct_connection\n transp, proto = await self._wrap_create_connection(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 1033, in _wrap_create_connection\n raise client_error(req.connection_key, exc) from exc\naiohttp.client_exceptions.ClientConnectorError: Cannot connect to host api.deepgram.com:443 ssl:default [Connect call failed ('38.104.135.211', 443)]\n", "job_id": "AJ_oRch8GR9fxqf", "pid": 99836}

Does LiveKit not support CentOS 7?

I want to start livekit-agents.
I am using LiveKit Cloud.
But I get this error:

OSError: /home/shared/apps/tutor-livekit-speech/.venv/lib/python3.12/site-packages/livekit/rtc/resources/liblivekit_ffi.so: cannot open shared object file: No such file or directory

system:
Linux version 4.19.104-300.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)) #1 SMP Mon Feb 17 15:34:16 UTC 2020
CentOS Linux release 7.6.1810 (Core)

version:
livekit: 0.11.1
livekit-agents:0.6.0

language:
python3.12

OnDemand Agent

A use case we're exploring involves implementing a translator agent that would only be needed when participants are speaking different languages. It seems that an on-demand agent would be better suited than agents automatically joining the room, since the translator agent wouldn't always be needed. We envision user interaction that allows for starting and stopping the agent as needed. Other than running "python my_agent.py simulate-job --room-name ", are there any other approaches for this, or is there anything the team is considering?

Minimal Assistant Download Files error

I get the following onnx error when trying to download files. Please help

➜ python minimal_assistant.py download-files
Using cache found in /Users/sajjadk/.cache/torch/hub/snakers4_silero-vad_master
Traceback (most recent call last):
  File "/Users/sajjadk/dev/onset_assist/agents/examples/voice-assistant/minimal_assistant.py", line 43, in <module>
    cli.run_app(WorkerOptions(request_fnc))
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/livekit/agents/cli/cli.py", line 127, in run_app
    cli()
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/livekit/agents/cli/cli.py", line 124, in download_files
    plugin.download_files()
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/livekit/plugins/silero/__init__.py", line 29, in download_files
    _ = torch.hub.load(
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/torch/hub.py", line 568, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/torch/hub.py", line 597, in _load_local
    model = entry(*args, **kwargs)
TypeError: silero_vad() got an unexpected keyword argument 'use_onnx'

In the KITT example, the ElevenLabs API throws many timeout errors

What I expect

When running the agent, I see logs but no errors.

What happens

The repeated error you get while the agent is running:

ERROR:root:Unhandled message from ElevenLabs: {'message': 'Have not received a new text input within the timeout of 20.0 seconds. Streaming input terminated. Please make sure to either feed the input in the timely manner, or to send the end of input text (empty string "") when done.', 'error': 'input_timeout_exceeded', 'code': 1008}

How to reproduce

Follow the steps to run the KITT example and open it in the playground.

Severity

This error happens repeatedly, and after even a few seconds of these failures, a free (unpaid) ElevenLabs account gets banned with this error response:

ERROR:root:Unhandled message from ElevenLabs: {'message': 'Unusual activity detected. Free Tier usage disabled. If you are using a proxy/VPN you might need to purchase a Paid Plan to not trigger our abuse detectors. Free Tier only works if users do not abuse it, for example by creating multiple free accounts. If we notice that many people try to abuse it, we will need to reconsider Free Tier altogether. \nPlease play fair and purchase any Paid Subscription to continue.', 'error': 'detected_unusual_activity', 'code': 1008}
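
The first error message asks the client to send an empty string as the end-of-input marker when it is done. As a rough sketch of that pattern, with no network code, the hypothetical generator below turns text chunks into JSON messages and always finishes with an empty-text message. The `{"text": ...}` payload shape is an assumption for illustration, so check the provider's streaming documentation for the exact fields.

```python
import json


def stream_input_messages(chunks):
    """Yield one JSON message per non-empty chunk, then an end-of-input marker."""
    for chunk in chunks:
        if chunk:
            # Skip empty chunks so the stream is not ended by accident.
            yield json.dumps({"text": chunk})
    # Empty string signals that no more text will follow (assumed payload shape).
    yield json.dumps({"text": ""})


if __name__ == "__main__":
    for message in stream_input_messages(["Hello ", "", "world."]):
        print(message)
```

Emitting the marker from the generator itself means the stream is closed even when the text source produces nothing.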

Eleven Labs TTS Stream Doesn't return the text of the audio events being generated

Please add this code to fix it:

          text = ''
          try:
              text = ''.join(msg['normalizedAlignment']['chars'])
          except Exception:
              pass

In this section of the code, around line 286 of env/lib/python3.10/site-packages/livekit/plugins/elevenlabs/tts.py:

msg = json.loads(msg.data)
if msg.get("audio"):
    data = base64.b64decode(msg["audio"])
    audio_frame = rtc.AudioFrame(
        data=data,
        sample_rate=self._config.sample_rate,
        num_channels=1,
        samples_per_channel=len(data) // 2,
    )
    text = ''
    try:
        text = ''.join(msg['normalizedAlignment']['chars'])
    except Exception:
        pass
    self._event_queue.put_nowait(
        tts.SynthesisEvent(
            type=tts.SynthesisEventType.AUDIO,
            audio=tts.SynthesizedAudio(text=text, data=audio_frame),
        )
    )
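
The bare try/except above can also be written with explicit lookups. The following sketch uses a hypothetical helper, `alignment_text`, and assumes only the message shape shown in the snippet (an optional `normalizedAlignment` object with a `chars` list). It returns an empty string when the alignment data is absent.

```python
def alignment_text(msg: dict) -> str:
    """Join the aligned characters of a message, or return '' if there are none."""
    alignment = msg.get("normalizedAlignment") or {}  # key may be missing or null
    chars = alignment.get("chars") or []
    return "".join(chars)


if __name__ == "__main__":
    print(repr(alignment_text({"normalizedAlignment": {"chars": ["H", "i"]}})))
    print(repr(alignment_text({"audio": "..."})))
```

Compared with catching every exception, this only tolerates missing data and still surfaces unrelated bugs.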

What is the working version of example/kitt?

I am trying to deploy example/kitt locally from the neil/kitt branch, but it does not work: the agent can join the room, but when the LLM replies with text, the agent does not speak. The log (attached as an image) suggests that TTS is never called in the process. Is the latest example/kitt on the neil/kitt branch working? I know there have been many changes in the KITT area. How can I get a working KITT example? Thanks.

Does LiveKit not support CentOS 7?

I want to start livekit-agents.
I am using LiveKit Cloud.
I run this:
python3 main.py download-files
But I get this error:

OSError: /home/shared/apps/tutor-livekit-speech/.venv/lib/python3.12/site-packages/livekit/rtc/resources/liblivekit_ffi.so: cannot open shared object file: No such file or directory

system:
Linux version 4.19.104-300.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)) #1 SMP Mon Feb 17 15:34:16 UTC 2020
CentOS Linux release 7.6.1810 (Core)

version:
livekit: 0.11.1
livekit-agents:0.6.0

language:
python3.12

wait_pc_connection error job_request.accept

I have been testing agents and everything worked fine until today, when I started getting a wait_pc_connection error thrown in:

async def job_request_cb(job_request: agents.JobRequest):
    await job_request.accept(

I am also seeing this in the logs, which I am not sure if is related:

2024-04-12 11:09:44,964 - ERROR - livekit_api::signal_client::signal_stream:171:livekit_api::signal_client::signal_stream - unhandled websocket message Err(Protocol(ResetWithoutClosingHandshake))
2024-04-12 11:09:44,965 - DEBUG - livekit::rtc_engine::rtc_session:455:livekit::rtc_engine::rtc_session - received leave request: LeaveRequest { can_reconnect: false, reason: DuplicateIdentity }

I tried upgrading all of the LiveKit Python libraries, but no luck.

Any ideas?

Audio was attempted to be pushed after the stream was closed.

Hi there! 👋

I ran into a little hiccup while working on the voice assistant.

It seems like there is an issue in livekit/agents/transcription/tts_forwarder.py.

Here is the error message:

Task exception was never retrieved
future: <Task finished name='Task-17356' coro=<VoiceAssistant._synthesize_streamed_speech_co.<locals>._read_generated_audio_task() done, defined at /Users/ubeytdemir/anaconda3/lib/python3.11/site-packages/livekit/agents/voice_assistant/assistant.py:755> exception=RuntimeError('push_audio called after close')>
Traceback (most recent call last):
  File "/Users/ubeytdemir/anaconda3/lib/python3.11/site-packages/livekit/agents/voice_assistant/assistant.py", line 766, in _read_generated_audio_task
    tts_forwarder.push_audio(event.audio.data)
  File "/Users/ubeytdemir/anaconda3/lib/python3.11/site-packages/livekit/agents/transcription/tts_forwarder.py", line 148, in push_audio
    raise RuntimeError("push_audio called after close")
RuntimeError: push_audio called after close

TTS plugin : 11Labs
Python Version : 3.11.5
OS: Darwin arm64

AttributeError on docker run: 'livekit.rtc' has no attribute 'ArgbFrame'

I executed docker build and run as follows.

docker build -t fal ./examples/fal
docker run fal

Then, the following error occurred.

Traceback (most recent call last):
  File "/app/fal.py", line 21, in <module>
    from fal_sd_turbo import FalSDTurbo
  File "/app/fal_sd_turbo.py", line 61, in <module>
    class SDTurboHighFPSStream:
  File "/app/fal_sd_turbo.py", line 201, in SDTurboHighFPSStream
    async def __anext__(self) -> rtc.ArgbFrame:
                                 ^^^^^^^^^^^^^
AttributeError: module 'livekit.rtc' has no attribute 'ArgbFrame'

I would appreciate it if you could tell me how to handle this error.

Thank you.

wait_pc_connection time out

I ran into some errors.
The peer connection always times out (maybe because of my free account?).

@davidzhao
@JARVISMindEngineer

I need some help, please.

2024-06-05 16:05:05,356 - livekit - ERROR - livekit_ffi::server::room:200:livekit_ffi::server::room - error while connecting to a room: engine: connection error: wait_pc_connection timed out
2024-06-05 16:05:05,356 - livekit.agents - DEBUG - disconnecting from room
2024-06-05 16:05:05,356 - livekit - ERROR - livekit_ffi::server::room:200:livekit_ffi::server::room - error while connecting to a room: engine: connection error: wait_pc_connection timed out
{"asctime": "2024-06-05 16:05:05,356", "level": "ERROR", "name": "livekit", "message": "livekit_ffi::server::room:200:livekit_ffi::server::room - error while connecting to a room: engine: connection error: wait_pc_connection timed out", "taskName": "Task-13", "job_id": "AJ_6eR7aeYjj2QR", "pid": 22306}
2024-06-05 16:05:05,608 - livekit.agents - ERROR - pipe closed, exiting job
{"asctime": "2024-06-05 16:05:05,608", "level": "ERROR", "name": "livekit.agents", "message": "pipe closed, exiting job", "taskName": "Task-13", "job_id": "AJ_6eR7aeYjj2QR", "pid": 22306}
2024-06-05 16:05:05,608 - livekit.agents - INFO - job process closed
{"asctime": "2024-06-05 16:05:05,608", "level": "INFO", "name": "livekit.agents", "message": "job process closed", "taskName": "Task-13", "job_id": "AJ_6eR7aeYjj2QR", "pid": 22306}

Sometimes I get a different error:
`2024-06-05 17:20:50,914 - livekit.agents - WARNING - assignment for job AJ_4UiBnp38bMBp timed out
{"asctime": "2024-06-05 17:20:50,914", "level": "WARNING", "name": "livekit.agents", "message": "assignment for job AJ_4UiBnp38bMBp timed out", "taskName": "Task-65", "req": "<livekit.agents.job_request.JobRequest object at 0x151d76780>"}
2024-06-05 17:20:50,915 - livekit.agents - ERROR - user request handler for job AJ_4UiBnp38bMBp failed
Traceback (most recent call last):
File "/opt/homebrew/Cellar/[email protected]/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
return await fut
^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/worker.py", line 386, in _user_cb
await self._opts.request_fnc(req)
File "/Users/yangqingyuan/PycharmProjects/livekit/main.py", line 65, in request_fnc
await req.accept(entrypoint)
File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/job_request.py", line 127, in accept
raise exc
File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/worker.py", line 429, in _wait_response
await asyncio.wait_for(wait_assignment, consts.ASSIGNMENT_TIMEOUT)
File "/opt/homebrew/Cellar/[email protected]/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 519, in wait_for
async with timeouts.timeout(timeout):
File "/opt/homebrew/Cellar/[email protected]/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/timeouts.py", line 115, in aexit
raise TimeoutError from exc_val
TimeoutError
{"asctime": "2024-06-05 17:20:50,915", "level": "ERROR", "name": "livekit.agents", "message": "user request handler for job AJ_4UiBnp38bMBp failed", "exc_info": "Traceback (most recent call last):\n File "/opt/homebrew/Cellar/[email protected]/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 520, in wait_for\n return await fut\n ^^^^^^^^^\nasyncio.exceptions.CancelledError\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/worker.py", line 386, in _user_cb\n await self._opts.request_fnc(req)\n File "/Users/yangqingyuan/PycharmProjects/livekit/main.py", line 65, in request_fnc\n await req.accept(entrypoint)\n File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/job_request.py", line 127, in accept\n raise exc\n File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/worker.py", line 429, in _wait_response\n await asyncio.wait_for(wait_assignment, consts.ASSIGNMENT_TIMEOUT)\n File "/opt/homebrew/Cellar/[email protected]/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 519, in wait_for\n async with timeouts.timeout(timeout):\n File "/opt/homebrew/Cellar/[email protected]/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/timeouts.py", line 115, in aexit\n raise TimeoutError from exc_val\nTimeoutError", "taskName": "Task-66", "req": "<livekit.agents.job_request.JobRequest object at 0x151d76780>"}`
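
The traceback above ends in asyncio.wait_for raising TimeoutError because the job assignment did not arrive within consts.ASSIGNMENT_TIMEOUT. The sketch below reproduces that behaviour with the standard library only, so the failure mode can be seen in isolation. The helper wait_for_assignment and the timeout value are hypothetical and are not the worker's actual code.

```python
import asyncio

ASSIGNMENT_TIMEOUT = 0.05  # hypothetical value, only to keep the demo short


async def wait_for_assignment(assignment: asyncio.Future, timeout: float):
    """Return the assignment result, or None if it does not arrive in time."""
    try:
        return await asyncio.wait_for(assignment, timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the awaited future and then raises TimeoutError,
        # which is why the log shows CancelledError as the direct cause.
        return None


async def demo():
    # A future that is never resolved stands in for a server that never
    # confirms the job assignment.
    never_assigned = asyncio.get_running_loop().create_future()
    return await wait_for_assignment(never_assigned, ASSIGNMENT_TIMEOUT)


if __name__ == "__main__":
    print("assignment:", asyncio.run(demo()))
```

Catching the timeout in the request handler, as sketched here, keeps one unanswered assignment from surfacing as an unhandled exception; it does not explain why the server did not answer.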

VADEventType error

When I was running the worker locally, I encountered an issue where it was impossible to conduct agent conversations. After debugging, it was discovered that within the "silero/vad.py" file, the method "_dispatch_event" consistently failed to create "agents.vad.VADEvent".

Subsequently, the issue was identified to be related to the VADEventType, as illustrated in the figure.
Waiting for help, @davidzhao.

Problem figure 1

My playground has no voice and I can't talk with the agent #293

I successfully deployed the worker locally and registered it using ElevenLabs, but after a successful connection, there is no voice saying "Hey, how can I help you today?".

When I talk to the agent, there is no response and no voice. I also did not find any error in the log.

This is my log:

import sys; print('Python %s on %s' % (sys.version, sys.platform)) /Users/yangqingyuan/anaconda3/bin/python -X pycache_prefix=/Users/yangqingyuan/Library/Caches/JetBrains/PyCharm2024.1/cpython-cache /Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 61578 --file /Users/yangqingyuan/main.py start Connected to pydev debugger (build 241.15989.155) {"asctime": "2024-05-20 17:55:01,480", "level": "INFO", "name": "livekit.agents", "message": "starting worker", "version": "0.6.0"} {"asctime": "2024-05-20 17:55:04,062", "level": "INFO", "name": "livekit.agents", "message": "registered worker", "id": "AW_3947XPhMbsS8", "server_info": "edition: Cloud\nversion: \"1.6.1\"\nprotocol: 13\nregion: \"Japan\"\nnode_id: \"NC_OTOKYO1A_a23my2USzTK5\"\n"} {"asctime": "2024-05-20 17:55:13,499", "level": "INFO", "name": "root", "message": "received request <livekit.agents.job_request.JobRequest object at 0x142008ad0>"} {"asctime": "2024-05-20 17:55:13,607", "level": "INFO", "name": "livekit.agents", "message": "accepted job AJ_nPJmwytXoBGM", "job": "id: \"AJ_nPJmwytXoBGM\"\nroom {\n sid: \"RM_PNHFgotZGwGc\"\n name: \"playground-rVeN-H9h0\"\n empty_timeout: 300\n creation_time: 1716198912\n enabled_codecs {\n mime: \"video/H264\"\n }\n enabled_codecs {\n mime: \"video/VP8\"\n }\n enabled_codecs {\n mime: \"video/VP9\"\n }\n enabled_codecs {\n mime: \"video/AV1\"\n }\n enabled_codecs {\n mime: \"audio/red\"\n }\n enabled_codecs {\n mime: \"audio/opus\"\n }\n version {\n unix_micro: 1716198912832897\n }\n departure_timeout: 20\n}\nnamespace: \"default\"\n"} {"asctime": "2024-05-20 17:55:22,110", "level": "INFO", "name": "livekit", "message": "livekit_ffi::server:125:livekit_ffi::server - initializing ffi server v0.5.0", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:22,113", "level": "INFO", "name": "livekit", "message": "livekit_ffi::cabi:27:livekit_ffi::cabi - initializing ffi 
server v0.5.0", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:22,125", "level": "INFO", "name": "livekit", "message": "livekit_api::signal_client::signal_stream:88:livekit_api::signal_client::signal_stream - connecting to wss://frogpig-iidj6sdo.livekit.cloud/rtc?sdk=rust&protocol=9&auto_subscribe=1&adaptive_stream=0&access_token=...", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} Using cache found in /Users/yangqingyuan/.cache/torch/hub/snakers4_silero-vad_master 2024-05-20 17:55:33.112970 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model. 2024-05-20 17:55:33.115145 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model. 2024-05-20 17:55:33.115185 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model. 2024-05-20 17:55:33.115194 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model. 2024-05-20 17:55:33.115204 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model. 2024-05-20 17:55:33.115500 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model. 2024-05-20 17:55:33.115529 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model. 2024-05-20 17:55:33.115535 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model. 
2024-05-20 17:55:33.115541 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model. 2024-05-20 17:55:33.115546 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model. {"asctime": "2024-05-20 17:55:33,517", "level": "WARNING", "name": "livekit.agents", "message": "Running <Task pending name='Task-16' coro=<entrypoint() running at /Users/yangqingyuan/main.py:43> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_start.<locals>._start_if_valid.<locals>.log_exception() at /Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/livekit/agents/ipc/job_main.py:99]> took too long: 2.36 seconds", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:36,517", "level": "INFO", "name": "livekit.agents", "message": "assistant - saying", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:36,518", "level": "INFO", "name": "livekit.agents", "message": "assistant - synthesizing text", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:36,519", "level": "INFO", "name": "livekit.agents", "message": "assistant - enqueuing speech", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:36,520", "level": "INFO", "name": "livekit.agents", "message": "assistant - speech validated, data=", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:36,527", "level": "INFO", "name": "livekit.agents", "message": "_SpeechData(source=<async_generator object VoiceAssistant.say.<locals>._gen at 0x134344d00>, allow_interruptions=True, add_to_ctx=True, val_ch=<livekit.agents.aio.channel.Chan object at 0x134303390>, validated=True, interrupted=True, collected_text='', answering_user_speech=None)", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:37,159", "level": "INFO", 
"name": "livekit.plugins.elevenlabs", "message": "waiting for 11labs message", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:37,991", "level": "INFO", "name": "livekit.plugins.elevenlabs", "message": "received 11labs message WSMessage(type=<WSMsgType.TEXT: 1>, data='{\"audio\":\"\",\"isFinal\":null,\"normalizedAlignment\":{\"chars\":[\" \",\"H\",\"e\",\"y\",\",\",\" \",\"h\",\"o\",\"w\",\" \",\"c\",\"a\",\"n\",\" \",\"I\",\" \",\"h\",\"e\",\"l\",\"p\",\" \",\"y\",\"o\",\"u\",\" \",\"t\",\"o\",\"d\",\"a\",\"y\",\"?\",\" \"],\"charStartTimesMs\":[0,35,81,174,221,255,279,313,348,372,406,441,476,511,546,580,615,650,685,720,755,789,813,836,871,906,929,975,1045,1091,1207,1277],\"charDurationsMs\":[35,46,93,47,34,24,34,35,24,34,35,35,35,35,34,35,35,35,35,35,34,24,23,35,35,23,46,70,46,116,70,163]},\"alignment\":{\"chars\":[\"H\",\"e\",\"y\",\",\",\" \",\" \",\"h\",\"o\",\"w\",\" \",\"c\",\"a\",\"n\",\" \",\"I\",\" \",\"h\",\"e\",\"l\",\"p\",\" \",\"y\",\"o\",\"u\",\" \",\"t\",\"o\",\"d\",\"a\",\"y\",\"?\"],\"charStartTimesMs\":[0,81,174,221,255,279,279,313,348,372,406,441,476,511,546,580,615,650,685,720,755,789,813,836,871,906,929,975,1045,1091,1207],\"charDurationsMs\":[81,93,47,34,24,0,34,35,24,34,35,35,35,35,34,35,35,35,35,35,34,24,23,35,35,23,46,70,46,116,233]}}', extra='')", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:55:37,995", "level": "INFO", "name": "livekit.plugins.elevenlabs", "message": "waiting for 11labs message", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:56:48,664", "level": "WARNING", "name": "livekit.plugins.deepgram", "message": "deepgram connection failed, retrying in 0s", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:57:28,808", "level": "WARNING", "name": "livekit.agents", "message": "job is unresponsive", "delay": 27, "job_id": "AJ_nPJmwytXoBGM", "pid": 88921} {"asctime": "2024-05-20 17:58:03,682", "level": "WARNING", "name": 
"livekit.plugins.deepgram", "message": "deepgram connection failed, retrying in 2s", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}

deepgram transcription quality in livekit is much lower than on the deepgram demo website

Please compare the transcription accuracy of the livekit+deepgram agent demo with the deepgram demo on their website; the transcription quality is much higher on the deepgram site.

deepgram demo:
https://console.deepgram.com/project/<YOUR_PROJECT_ID>/mission/transcribe-your-voice-in-realtime

livekit+agent demo:
https://github.com/livekit/agents/tree/main/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram

KITT not working - latest master

The KITT example is not working; it throws this exception:

Exception in thread Thread-2 (_write_thread):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
{"asctime": "2024-04-13 20:00:42,152", "level": "DEBUG", "name": "root", "message": "process started", "job_id": "AJ_AUFCiWbbBkka", "url": "http://livekit-server.livekit", "pid": 92412}
failed to write log: to_bytes() missing required argument 'byteorder' (pos 2)
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/venv/lib/python3.10/site-packages/livekit/agents/apipe.py", line 49, in _write_thread
    ipc_enc.write_msg(self._p, msg)
  File "/venv/lib/python3.10/site-packages/livekit/agents/ipc_enc.py", line 45, in write_msg
    b.write(msg.MSG_ID.to_bytes(4))
TypeError: to_bytes() missing required argument 'byteorder' (pos 2)
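
The TypeError above comes from calling int.to_bytes with a length only. On Python 3.10, which the traceback shows, byteorder is a required argument; it only gained a default of "big" in Python 3.11. A minimal sketch of an encoding that works on both versions follows. The function names and the example message id are hypothetical, and big-endian is assumed because it is the 3.11 default.

```python
import struct


def encode_msg_id(msg_id: int) -> bytes:
    """Encode a message id as 4 bytes, naming the byte order explicitly."""
    return msg_id.to_bytes(4, byteorder="big")


def encode_msg_id_struct(msg_id: int) -> bytes:
    """Same encoding via struct: '>I' is a big-endian unsigned 32-bit int."""
    return struct.pack(">I", msg_id)


if __name__ == "__main__":
    example_id = 5  # hypothetical message id
    print(encode_msg_id(example_id))
    print(encode_msg_id(example_id) == encode_msg_id_struct(example_id))
```

Either running the example on Python 3.11 or later, or passing byteorder explicitly as above, avoids this particular error.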

error when running the examples on codespaces

When I try to run the voice assistant example on GitHub Codespaces, executing python minimal_assistant.py download-files
gives me the following error:

python minimal_assistant.py download-files
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Downloading files for <livekit.plugins.deepgram.DeepgramPlugin object at 0x7b7d8a435030>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Finished downloading files for <livekit.plugins.deepgram.DeepgramPlugin object at 0x7b7d8a435030>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Downloading files for <livekit.plugins.elevenlabs.ElevenLabsPlugin object at 0x7b7d8a436050>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Finished downloading files for <livekit.plugins.elevenlabs.ElevenLabsPlugin object at 0x7b7d8a436050>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Downloading files for <livekit.plugins.openai.OpenAIPlugin object at 0x7b7d8a436e60>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Finished downloading files for <livekit.plugins.openai.OpenAIPlugin object at 0x7b7d8a436e60>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Downloading files for <livekit.plugins.silero.SileroPlugin object at 0x7b7d89471ba0>"}
Using cache found in /home/codespace/.cache/torch/hub/snakers4_silero-vad_master
Traceback (most recent call last):
  File "/workspaces/agents/examples/voice-assistant/minimal_assistant.py", line 43, in <module>
    cli.run_app(WorkerOptions(request_fnc))
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/livekit/agents/cli/cli.py", line 191, in run_app
    cli()
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/livekit/agents/cli/cli.py", line 188, in download_files
    plugin.download_files()
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/livekit/plugins/silero/__init__.py", line 29, in download_files
    _ = torch.hub.load(
  File "/home/codespace/.local/lib/python3.10/site-packages/torch/hub.py", line 568, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/torch/hub.py", line 594, in _load_local
    hub_module = _import_module(MODULE_HUBCONF, hubconf_path)
  File "/home/codespace/.local/lib/python3.10/site-packages/torch/hub.py", line 106, in _import_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/codespace/.cache/torch/hub/snakers4_silero-vad_master/hubconf.py", line 5, in <module>
    from utils_vad import (init_jit_model,
  File "/home/codespace/.cache/torch/hub/snakers4_silero-vad_master/utils_vad.py", line 2, in <module>
    import torchaudio
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/torchaudio/__init__.py", line 2, in <module>
    from . import _extension  # noqa  # usort: skip
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/torchaudio/_extension/__init__.py", line 38, in <module>
    _load_lib("libtorchaudio")
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 60, in _load_lib
    torch.ops.load_library(path)
  File "/home/codespace/.local/lib/python3.10/site-packages/torch/_ops.py", line 1032, in load_library
    ctypes.CDLL(path)
  File "/usr/local/python/3.10.13/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory

Does anyone know the cause of this?
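
In the traceback above, torch is loaded from /home/codespace/.local while torchaudio is loaded from /usr/local/python, and torchaudio then fails to find libtorch_cuda.so. One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the two packages come from different builds (for example a CPU-only torch next to a CUDA torchaudio). The sketch below is a rough heuristic that compares the local build tag of the two installed versions; the helper names are hypothetical, and wheels without a "+tag" suffix cannot be classified this way.

```python
from importlib import metadata
from typing import Optional


def local_build_tag(version: str) -> str:
    """Return the part after '+', e.g. 'cpu' or 'cu121'; '' if there is none."""
    return version.partition("+")[2]


def builds_match(torch_version: str, torchaudio_version: str) -> bool:
    """Heuristic: both packages carry the same local build tag."""
    return local_build_tag(torch_version) == local_build_tag(torchaudio_version)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v = installed_version("torch")
    audio_v = installed_version("torchaudio")
    print("torch:", torch_v, "torchaudio:", audio_v)
    if torch_v and audio_v:
        print("build tags match:", builds_match(torch_v, audio_v))
```

If the tags differ, reinstalling both packages from the same index in a single pip command is a common remedy.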

Can clients re-use token after disconnect?

I recently used livekit for my application, based on the agents example, and I ran into a problem:

What happened?

I use an Android application to connect to livekit with a generated token and wss url, and everything is okay. Then I disconnect my client from livekit using room.disconnect(). I saw that my agent was still running for this room. After that, I reconnect using the same token and I only see a black screen.
On the server side I saw this log:

{"asctime": "2024-04-25 09:11:32,833", "level": "ERROR", "name": "livekit.agents", "message": "livekit_api::signal_client::signal_stream:178:livekit_api::signal_client::signal_stream - unhandled websocket message Err(Protocol(ResetWithoutClosingHandshake))", "job_id": "AJ_8oNxanmAa5c2", "pid": 133}
{"asctime": "2024-04-25 09:11:32,837", "level": "DEBUG", "name": "livekit", "message": "livekit::rtc_engine:377:livekit::rtc_engine - engine task closed"}
{"asctime": "2024-04-25 09:11:32,837", "level": "DEBUG", "name": "livekit", "message": "livekit::room:943:livekit::room - disconnected from room: UnknownReason"}

What I expected:

I expected to be able to re-use the token from the client to reconnect to the room.
Please help,
Thanks!
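
When debugging token re-use, it can help to look at what the token itself says. LiveKit access tokens are JWTs, so the payload can be decoded with the standard library. The sketch below assumes the usual exp (expiry, in seconds since the epoch) and sub (participant identity) claims, and it does not verify the signature, so it is for inspection only. The function names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def decode_claims(token: str) -> dict:
    """Decode the JWT payload (second segment) without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Return how many seconds remain before the 'exp' claim is reached."""
    claims = decode_claims(token)
    current = time.time() if now is None else now
    return claims["exp"] - current


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(decode_claims(sys.argv[1]))
        print("seconds until expiry:", seconds_until_expiry(sys.argv[1]))
```

A positive remaining lifetime means the token itself can still be presented. It does not rule out other causes, such as the same identity already being connected to the room.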

elevenlabs-plugin: "cloned" + "professional" voices not working

With the minimal_assistant.py example, the above categories of voices don't seem to generate output properly. Instead it seems to hang for long periods of time.

Am seeing LLM chat completion requests completing successfully, which seems to suggest that STT and OpenAI are working, but no audio output.

I'm using a Macbook with M1 / Ventura 13.0.1, running Python 3.12.

elevenlabs-plugin is working correctly with "premade" voices, and also with OpenAI's TTS (though it doesn't support streaming)

Examples of non-working code:

Voice = elevenlabs.Voice(
    id=MY_VOICE_ID,
    name="Voice Name",
    category="professional",
    settings=elevenlabs.VoiceSettings(
        stability=0.60, similarity_boost=1.0
    )
)

assistant = VoiceAssistant(
  vad=silero.VAD(),
  stt=deepgram.STT(),
  llm=openai.LLM(),
  tts=elevenlabs.TTS(voice=Voice),
  chat_ctx=initial_ctx,
)
assistant.start(ctx.room)

As well as:

Voice = elevenlabs.Voice(
    id=MY_VOICE_ID,
    name="Voice Name",
    category="cloned",
    settings=elevenlabs.VoiceSettings(
        stability=0.60, similarity_boost=1.0
    )
)

assistant = VoiceAssistant(
  vad=silero.VAD(),
  stt=deepgram.STT(),
  llm=openai.LLM(),
  tts=elevenlabs.TTS(voice=Voice),
  chat_ctx=initial_ctx,
)
assistant.start(ctx.room)

What I'm seeing in logs from agent running locally:

2024-05-14 01:56:21,398 INFO  httpx  HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"	  job_id=AJ_oiddbfaSEWxh pid=56250
2024-05-14 01:56:22,141 INFO  httpx  HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"	  job_id=AJ_oiddbfaSEWxh pid=56250
2024-05-14 01:56:23,720 ERROR  livekit.plugins.elevenlabs  11labs connection failed
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/livekit/plugins/elevenlabs/tts.py", line 365, in _run_ws
    await asyncio.gather(send_task(), recv_task())
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/livekit/plugins/elevenlabs/tts.py", line 340, in recv_task
    raise Exception("11labs connection closed unexpectedly")
Exception: 11labs connection closed unexpectedly
	  job_id=AJ_oiddbfaSEWxh pid=56250
2024-05-14 01:56:32,020 INFO  httpx  HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"	  job_id=AJ_oiddbfaSEWxh pid=56250
2024-05-14 01:56:33,258 ERROR  livekit.plugins.elevenlabs  11labs connection failed
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/livekit/plugins/elevenlabs/tts.py", line 365, in _run_ws
    await asyncio.gather(send_task(), recv_task())
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/livekit/plugins/elevenlabs/tts.py", line 340, in recv_task
    raise Exception("11labs connection closed unexpectedly")
Exception: 11labs connection closed unexpectedly
	  job_id=AJ_oiddbfaSEWxh pid=56250

Happy to share more context as needed - awesome project!

Elevenlabs should work for Free and Starter price tiers

The ElevenLabs plugin was written when PCM audio was supported in lower price tiers. Now it looks like only mp3 audio is supported in the lower price tiers.

Our default implementation currently doesn't work. This issue represents:

  1. The ElevenLabs plugin should support the mp3 output formats
  2. The default for the ElevenLabs plugin should be an output format supported by the lowest price tier

Hidden agent can not publish audio track

A hidden agent cannot publish its audio track so that other participants subscribe to it. Other real participants do not automatically subscribe to this track and do not receive track publish events. If the agent is marked as not hidden (visible), the tracks publish correctly and the other participants automatically subscribe to them. I imagine most use cases will be for the agent to be hidden. Is this a bug or by design? For a use case we are looking at, it would be great to have this functionality.

Video Frame Not Appearing in Livekit Playground on Mac M3 - Agent Status Stuck at "Starting"

Hi, I'm trying to test the playground with an agent, but the video frame doesn't appear. The video section shows a dark green color, and the agent connection is True but the agent status stays at "starting" forever.
Here is my agent code. I'm using a Mac M3.
import logging
from livekit import rtc
from livekit.agents import JobContext, JobRequest, WorkerOptions, cli

WIDTH = 640
HEIGHT = 480


async def entrypoint(job: JobContext):
    # Create a video source and publish it as a camera track.
    # Note: nothing in this function pushes frames into `source`.
    room = job.room
    source = rtc.VideoSource(WIDTH, HEIGHT)
    track = rtc.LocalVideoTrack.create_video_track("video", source)
    options = rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_CAMERA)
    publication = await room.local_participant.publish_track(track, options)
    logging.info("published track", extra={"track_sid": publication.sid})


async def request_fnc(req: JobRequest) -> None:
    # Accept every incoming job and run `entrypoint` for it.
    logging.info("received request %s", req)
    await req.accept(entrypoint)


if __name__ == "__main__":
    from dotenv import load_dotenv
    import os

    # Read credentials from a local .env file.
    load_dotenv()

    LIVEKIT_API_KEY = os.getenv("LIVEKIT_API_KEY")
    LIVEKIT_API_SECRET = os.getenv("LIVEKIT_API_SECRET")
    LIVEKIT_URL = os.getenv("NEXT_PUBLIC_LIVEKIT_URL")
    cli.run_app(WorkerOptions(request_fnc=request_fnc, api_key=LIVEKIT_API_KEY, api_secret=LIVEKIT_API_SECRET, ws_url=LIVEKIT_URL))

Could someone help?

Elevenlabs TTS websocket connection design

Hi,

I was able to make the minimal_assistant.py implementation work. Once I sorted out all the difficulties, it runs pretty well! Kudos for that 😃.

I have a question regarding the WebSocket connections used in the ElevenLabs TTS module. In my environment, I noticed that the WebSocket creation is being triggered every time the agent responds to the user. Consequently, the WebSocket is being closed every time the agent stops talking.

Questions:

  • Is this a design decision? If so, could you please explain the rationale behind it?

  • Is there a specific reason for not maintaining a persistent WebSocket connection throughout the session?

I believe closing and reopening the WebSocket repeatedly introduces unnecessary overhead. Maintaining one or a few stable connections throughout the session might be more efficient.

Looking forward to your insights on this.

Thank you!

openai.tts with StreamAdapter has some bugs

The experience with this SDK is a bit poor

1. In the elevenlabs tts.py, line 333: if an API error occurs, please log it. @MichaelYang1995 ElevenLabs cannot be reached from China, and over a VPN only the paid ElevenLabs API can be used. Try it as a paid API user.

2. openai.tts with StreamAdapter has some bugs. If you follow the agent quick-start doc and use the following code:

openai_tts = openai.TTS(
    model=openai.TTSModels,
    voice=openai.TTSVoices)
vad = silero.VAD()
vad_stream = vad.stream(min_silence_duration=1.0)
tts = agents.tts.StreamAdapter(openai_tts, vad_stream)

you will get an error like VADStream has no attribute 'stream', from file: livekit/agents/voice_assistant/assistant.py:728. @keepingitneil it's a code bug, right? I want to use OpenAI tts-1 instead of elevenlabs; how can I fix it?

Feature Request: Node.js Environment Support for LiveKit Agents

Feature Details:

  • Node.js SDK Integration: Develop a dedicated Node.js SDK for LiveKit, providing developers with native access to LiveKit functionalities within their Node.js environments. This SDK should offer comprehensive coverage of LiveKit features, ensuring parity with existing SDKs.
  • NPM Package: Publish the Node.js SDK as an NPM package, facilitating easy installation and dependency management for Node.js projects. This approach aligns with established Node.js development practices and enhances the accessibility of LiveKit within the Node.js ecosystem.
  • Comprehensive Documentation: Furnish extensive documentation specifically tailored for Node.js developers, offering clear guidance on integrating and utilizing LiveKit functionalities within Node.js applications. This documentation should include code examples, best practices, and troubleshooting tips to streamline the development process.
  • Support for Node.js Frameworks: Ensure compatibility with popular Node.js frameworks such as Express.js, NestJS, and Fastify, enabling developers to seamlessly integrate LiveKit into their existing projects without friction. Compatibility with these frameworks enhances flexibility and fosters rapid development.
  • Event Emitters and Promises: Leverage Node.js conventions such as event emitters and promises to provide an intuitive and asynchronous programming model for interacting with LiveKit APIs. This approach aligns with Node.js development paradigms and enhances the developer experience when working with LiveKit in Node.js environments.
  • Community Engagement: Actively engage with the Node.js developer community through forums, social media, and developer outreach programs to solicit feedback, address concerns, and foster a vibrant ecosystem around LiveKit in Node.js. Incorporating community feedback ensures that the Node.js SDK evolves in alignment with developer needs and industry trends.

Benefits:

  • Expanded Developer Reach: By supporting Node.js environments, LiveKit can attract a broader range of developers who prefer Node.js for their real-time communication projects, thereby increasing its user base and fostering community growth.
  • Enhanced Developer Experience: Node.js developers can leverage their existing skills and familiarity with the Node.js ecosystem to seamlessly integrate LiveKit into their projects, resulting in a more streamlined development experience.
  • Ecosystem Synergy: Integration with Node.js opens up opportunities for collaboration and integration with other Node.js libraries and frameworks, enriching the LiveKit ecosystem and enabling developers to leverage a wider range of tools and resources.

Conclusion:

Introducing native support for Node.js environments within LiveKit represents a significant opportunity to expand the platform's reach, enhance developer experience, and foster ecosystem growth. By embracing Node.js, LiveKit can empower a new wave of developers to build innovative real-time communication applications while enriching the overall developer community.

Adding timestamp (+ customizable logging configuration?) to worker logging

Currently, worker.py does not offer an easy way to customize log formats, particularly to include timestamps in logs for debugging. This makes it difficult to debug our worker deployment.

Feature request
Add logging functionality that lets users customize the log format, either by allowing them to supply a custom logger or by adding a CLI flag that toggles timestamps in logs.
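To make the request concrete, here is a minimal sketch of what a timestamp toggle could look like using only Python's standard `logging` module. It does not reflect how worker.py configures logging today; the function name `configure_logger`, the `with_timestamps` parameter, and the logger name `worker-demo` are all hypothetical and chosen for illustration.

```python
import logging


def configure_logger(name, stream=None, with_timestamps=True, level=logging.INFO):
    """Return a logger whose records optionally start with a timestamp.

    Hypothetical helper: `with_timestamps` stands in for the proposed CLI flag.
    """
    base = "%(levelname)s %(name)s - %(message)s"
    fmt = "%(asctime)s " + base if with_timestamps else base

    # StreamHandler writes to sys.stderr when stream is None.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt, datefmt="%Y-%m-%dT%H:%M:%S"))

    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep the root logger from printing a second copy
    return logger


if __name__ == "__main__":
    log = configure_logger("worker-demo", with_timestamps=True)
    log.info("worker started")
```

Accepting a full format string instead of a boolean would cover the "customizable logging configuration" variant of the request; the boolean is just the smallest version of the idea.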

Questions for maintainers

  • Have you considered adding timestamps to worker logging?
  • Is there a specific approach you prefer?

Re-join the room after the agent server restarts

Currently, if the agent server is restarted, all agents in joined rooms are disconnected. Is there a way to make an agent re-join its room, or to actively invite the agent into a room?
