Cartesia JavaScript Client

This client provides convenient access to Cartesia's TTS models. Sonic is the fastest text-to-speech model around: it can generate a second of audio in just 650ms and stream out the first audio chunk in just 135ms. Alongside Sonic, we also offer an extensive prebuilt voice library for a variety of use cases.

The full API documentation can be found on docs.cartesia.ai.

Installation

# NPM
npm install @cartesia/cartesia-js
# Yarn
yarn add @cartesia/cartesia-js
# PNPM
pnpm add @cartesia/cartesia-js
# Bun
bun add @cartesia/cartesia-js

Usage

CRUD on Voices

import Cartesia from "@cartesia/cartesia-js";

const cartesia = new Cartesia({
	apiKey: "your-api-key",
});

// List all voices.
const voices = await cartesia.voices.list();
console.log(voices);

// Get a voice.
const voice = await cartesia.voices.get("<voice-id>");
console.log(voice);

// Create a voice.
const newVoice = await cartesia.voices.create({
	name: "Tim",
	description: "A deep, resonant voice.",
	embedding: Array(192).fill(1.0),
});
console.log(newVoice);

// Clone a voice from a URL.
const clonedVoice = await cartesia.voices.clone({
	mode: "url",
	url: "https://youtu.be/AdtLxlttrHg?si=07OLmDPg__0IN14f&t=6",
});

// Clone a voice from a file.
const clonedVoiceFromClip = await cartesia.voices.clone({
	mode: "clip",
	clip: myFile, // Pass a File object or a Blob.
});

TTS over WebSocket

import Cartesia from "@cartesia/cartesia-js";

const cartesia = new Cartesia({
	apiKey: "your-api-key",
});

// Initialize the WebSocket. Make sure the sample rate you specify is supported.
const websocket = cartesia.tts.websocket({ sampleRate: 44100 });

try {
	await websocket.connect();
} catch (error) {
	console.error(`Failed to connect to Cartesia: ${error}`);
}

// Create a stream.
const response = await websocket.send({
	model: "sonic-english",
	voice: {
		mode: "embedding",
		embedding: Array(192).fill(1.0),
	},
	transcript: "Hello, world!",
	// The WebSocket sets output_format on your behalf.
	// The container is "raw" and the encoding is "pcm_f32le".
});

// Access the raw messages from the WebSocket.
response.on("message", (message) => {
	// Raw message.
	console.log("Received message:", message);
});

// You can also access messages using a for-await-of loop.
for await (const message of response.events('message')) {
	// Raw message.
	console.log("Received message:", message);
}
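
Since the WebSocket output is raw "pcm_f32le" audio, each chunk is a sequence of 32-bit little-endian floats. The sketch below shows one way to turn such a chunk into samples and estimate its duration. It assumes the audio bytes arrive base64-encoded and that the audio is mono; neither is confirmed here, so check the message format on docs.cartesia.ai. `decodePcmF32le` and `durationSeconds` are hypothetical helpers, not part of the client.

```javascript
// Hypothetical helper (not part of @cartesia/cartesia-js).
// Assumption: the chunk is base64-encoded raw pcm_f32le audio.
function decodePcmF32le(base64Chunk) {
	// Convert the base64 string into raw bytes.
	const binary = atob(base64Chunk);
	const bytes = new Uint8Array(binary.length);
	for (let i = 0; i < binary.length; i++) {
		bytes[i] = binary.charCodeAt(i);
	}

	// Read each 4-byte group explicitly as a little-endian float,
	// so the result does not depend on the host CPU's byte order.
	const view = new DataView(bytes.buffer);
	const samples = new Float32Array(Math.floor(bytes.length / 4));
	for (let i = 0; i < samples.length; i++) {
		samples[i] = view.getFloat32(i * 4, true);
	}
	return samples;
}

// Assumption: mono audio, so one sample per frame.
function durationSeconds(sampleCount, sampleRate) {
	return sampleCount / sampleRate;
}
```

With the 44100 Hz sample rate used above, a chunk of 44100 samples would correspond to one second of audio under these assumptions.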

Multilingual TTS [Alpha]

You can define the language of the text you want to synthesize by setting the language field in the request object. Make sure that you are using model: "sonic-multilingual" in the request object.

Supported languages are available at docs.cartesia.ai.
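
As a rough illustration, a multilingual request could be assembled like this. `buildMultilingualRequest` is a hypothetical helper, and the `"es"` language code in the usage line is an assumption about the code format; the field names follow the description above and the WebSocket example.

```javascript
// Hypothetical helper (not part of @cartesia/cartesia-js): builds a request
// object for multilingual synthesis using the fields described above.
function buildMultilingualRequest(transcript, language) {
	return {
		model: "sonic-multilingual",
		// Assumption: a short language code such as "es"; see docs.cartesia.ai
		// for the supported values.
		language,
		voice: {
			mode: "embedding",
			embedding: Array(192).fill(1.0),
		},
		transcript,
	};
}

// Example: a request object that could be passed to websocket.send().
const request = buildMultilingualRequest("Hola, mundo!", "es");
```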

Playing audio in the browser

(The WebPlayer class only supports playing audio in the browser.)

// If you're using the client in the browser, you can control audio playback using our WebPlayer:
import { WebPlayer } from "@cartesia/cartesia-js";

console.log("Playing stream...");

// Create a Player object.
const player = new WebPlayer();

// Play the audio. (`response` includes a custom Source object that the Player can play.)
// The call resolves when the audio finishes playing.
await player.play(response.source);

console.log("Done playing.");

React

We export a React hook that simplifies the process of using the TTS API. The hook manages the WebSocket connection and provides a simple interface for buffering, playing, pausing and restarting audio.

import { useState } from "react";
import { useTTS } from "@cartesia/cartesia-js/react";

function TextToSpeech() {
	const tts = useTTS({
		apiKey: "your-api-key",
		sampleRate: 44100,
	});

	const [text, setText] = useState("");

	const handlePlay = async () => {
		// Begin buffering the audio.
		const response = await tts.buffer({
			model_id: "sonic-english",
			voice: {
				mode: "embedding",
				embedding: Array(192).fill(1.0),
			},
			transcript: text,
		});

		// Immediately play the audio. (You can also buffer in advance and play later.)
		await tts.play();
	}

	return (
		<div>
			<input type="text" value={text} onChange={(event) => setText(event.target.value)} />
			<button onClick={handlePlay}>Play</button>

			<div>
				{tts.playbackStatus} | {tts.bufferStatus} | {String(tts.isWaiting)}
			</div>
		</div>
	);
}
