assemblyai / assemblyai-java-sdk

The AssemblyAI Java SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.

Home Page: https://www.assemblyai.com/

License: MIT License

Java 100.00%

ai asr assemblyai java llm speech-to-text stt transcription

assemblyai-java-sdk's Introduction

AssemblyAI Java Library

Documentation

API reference documentation is available here.

Requirements

Java 8+

Installation

Gradle

Add the dependency in your build.gradle:

dependencies {
    implementation 'com.assemblyai:assemblyai-java:1.x.x'
}

Maven

Add the dependency in your pom.xml:

<dependency>
    <groupId>com.assemblyai</groupId>
    <artifactId>assemblyai-java</artifactId>
    <version>1.x.x</version>
</dependency>

HTTP Client Usage

The SDK exports a vanilla HTTP client, AssemblyAI. You can use this to call into each of our API endpoints and get typed responses back.

import com.assemblyai.api.AssemblyAI;
import com.assemblyai.api.types.Transcript;

AssemblyAI aai = AssemblyAI.builder()
  .apiKey("YOUR_API_KEY")
  .build();

Transcript transcript = aai.transcripts().get("transcript-id");

System.out.println("Received response! " + transcript);

Handling Errors

When the API returns a non-success status code (4xx or 5xx response), a subclass of ApiError will be thrown:

import com.assemblyai.api.core.ApiError;

try {
  aai.transcripts().get("transcript-id");
} catch (ApiError error) {
  System.out.println(error.getBody());
  System.out.println(error.getStatusCode());
}

Creating a transcript

When you create a transcript, you can either pass in a URL to an audio file or upload a file directly.

import com.assemblyai.api.types.Transcript;
import java.io.File;

// Transcribe file at remote URL
Transcript transcript = aai.transcripts().transcribe(
        "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a");

// Upload a file via local path and transcribe
transcript = aai.transcripts().transcribe(
        new File("./news.mp4"));

transcribe queues a transcription job and polls it until the status is completed or error. If you don't want to wait until the transcript is ready, you can use submit:

import com.assemblyai.api.types.Transcript;

// Transcribe file at remote URL
Transcript transcript = aai.transcripts().submit(
        "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a");

// Upload a file via local path and transcribe
transcript = aai.transcripts().submit(
        new File("./news.mp4"));
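To make the difference between transcribe and submit concrete, here is a minimal, self-contained sketch of the kind of polling loop that transcribe is described as performing. It is not the SDK's implementation: PollingSketch, pollUntilDone, the attempt limit, and the delay are hypothetical names and values chosen for illustration, and the status source is a plain Supplier so the loop can run without the API.

```java
import java.util.function.Supplier;

// Hypothetical sketch; the SDK's own polling code may differ.
final class PollingSketch {

    // Polls until the status is terminal ("completed" or "error"),
    // or gives up after maxAttempts so the caller never waits forever.
    static String pollUntilDone(Supplier<String> fetchStatus, int maxAttempts, long delayMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String status = fetchStatus.get();
            if ("completed".equals(status) || "error".equals(status)) {
                return status;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
        throw new IllegalStateException("No terminal status after " + maxAttempts + " attempts");
    }
}
```

With submit, you would own a loop like this yourself (or use a webhook, if your setup has one) instead of blocking inside transcribe.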

Using the Realtime Transcriber

The Realtime Transcriber processes live audio streams and sends the data over WebSockets. It takes event handlers, such as callbacks for partial and final transcripts, when you build it:

import com.assemblyai.api.RealtimeTranscriber;

RealtimeTranscriber realtime = RealtimeTranscriber.builder()
  .apiKey("YOUR_API_KEY")
  .onPartialTranscript(partial -> System.out.println(partial))
  .onFinalTranscript(finalTranscript -> System.out.println(finalTranscript))
  .build();

realtime.sendAudio(new byte[]{...});

realtime.close();

Staged Builders

The generated builders all follow the staged builder pattern. Read more here. Staged builders only allow you to construct the object once all required properties have been specified.

For example, in the snippet below, you will not be able to access the build method on TranscriptParams until you have specified the mandatory audioUrl property.

import com.assemblyai.api.TranscriptParams;

TranscriptParams params = TranscriptParams.builder()
  .audioUrl("https://...")
  .build();
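For readers new to the pattern, the following self-contained sketch shows how a staged builder can enforce this at compile time. SketchParams and its stage interfaces are hypothetical stand-ins written for this README, not the generated SDK classes; the optional speakerLabels property is included only to show where optional setters live.

```java
import java.util.Objects;

// Hypothetical stand-in for a generated params class.
final class SketchParams {
    private final String audioUrl;        // required
    private final boolean speakerLabels;  // optional

    private SketchParams(String audioUrl, boolean speakerLabels) {
        this.audioUrl = audioUrl;
        this.speakerLabels = speakerLabels;
    }

    String getAudioUrl() { return audioUrl; }
    boolean isSpeakerLabels() { return speakerLabels; }

    // Returns only the first stage, so build() is not reachable yet.
    static AudioUrlStage builder() { return new Builder(); }

    // Stage 1: the only available call is the required property.
    interface AudioUrlStage {
        FinalStage audioUrl(String audioUrl);
    }

    // Final stage: optional properties and build() become available.
    interface FinalStage {
        FinalStage speakerLabels(boolean speakerLabels);
        SketchParams build();
    }

    private static final class Builder implements AudioUrlStage, FinalStage {
        private String audioUrl;
        private boolean speakerLabels;

        @Override
        public FinalStage audioUrl(String audioUrl) {
            this.audioUrl = Objects.requireNonNull(audioUrl, "audioUrl");
            return this;
        }

        @Override
        public FinalStage speakerLabels(boolean speakerLabels) {
            this.speakerLabels = speakerLabels;
            return this;
        }

        @Override
        public SketchParams build() {
            return new SketchParams(audioUrl, speakerLabels);
        }
    }
}
```

In this sketch, SketchParams.builder().build() does not compile because AudioUrlStage has no build method, while SketchParams.builder().audioUrl("https://...").build() does.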

Contributing

While we value open-source contributions to this SDK, this library is generated programmatically. Additions made directly to this library would have to be moved over to our generation code; otherwise, they would be overwritten upon the next generated release. Feel free to open a PR as a proof of concept, but know that we will not be able to merge it as-is. We suggest opening an issue first to discuss with us!

On the other hand, contributions to the README are always very welcome!

assemblyai-java-sdk's People

Contributors

devo-hansen gialnet armandobelardo

assemblyai-java-sdk's Issues

Add `onclose` to the real-time transcriber

The real-time transcriber should have an onclose callback so users can know why the connection was closed.

`wordSearch` method without words doesn't make sense

The wordSearch method has an overload without any words which doesn't make sense:

public WordSearchResponse wordSearch(String transcriptId) {

This overload should be removed, not generated.

Longer timeout limit needed for LeMUR (java.net.SocketTimeoutException: timeout)

Transcripts with longer token size time out with LeMUR functions.

Requests with longer files result in this error: Caused by: java.net.SocketTimeoutException: timeout

For context, here is the code used:

import com.assemblyai.api.AssemblyAI;
import com.assemblyai.api.resources.lemur.requests.*;
import com.assemblyai.api.resources.transcripts.types.*;
import java.util.List;

public final class App {
    public static void main(String[] args) {

        AssemblyAI client = AssemblyAI.builder()
                .apiKey("api-key")
                .build();

        String url = "https://storage.googleapis.com/aai-web-samples/meeting.mp3";

        Transcript transcript = client.transcripts().transcribe(url);

        String prompt = "Provide a brief summary of the transcript.";

        var params = LemurTaskParams.builder()
                .prompt(prompt)
                .transcriptIds(List.of(transcript.getId()))
                .build();

        var result = client.lemur().task(params);

        System.out.println(result.getResponse());
    }
}

Library does not work with java 1.8

Although it is not stated in the repository, I am working under the assumption that the library is supposed to work on the widespread Java 1.8 version.
Unfortunately, it seems that the AssemblyAI.build() method throws a NoSuchMethodException when compiled and run on Java 1.8.

The issue disappears when running on JDK 21.

Is the library supposed to work on Java 1.8, and if not, what is the minimum version required?

AssemblyAI.transcripts().transcribe() fails silently and hangs

When providing the method with a webm audio file, the AssemblyAI platform immediately shows that the transcription failed with an error. However, the method does not throw an exception and hangs indefinitely.

SpeechModel should be enum but is string

See spec: https://github.com/AssemblyAI/assemblyai-api-spec/blob/a28083d778c569a836eb65fcd045d330610ac5ec/openapi.yml#L883
https://github.com/AssemblyAI/assemblyai-api-spec/blob/main/openapi.yml#L1410

There's only one value for the enum, which is why Fern may have tripped up.

Improve README to match other SDKs

The other SDKs have READMEs with more code samples.
This SDK could benefit from having the same samples.

Node: https://github.com/AssemblyAI/assemblyai-node-sdk
Ruby: https://github.com/assemblyai/assemblyai-ruby-sdk
Python: https://github.com/AssemblyAI/assemblyai-python-sdk

`ApiError` should be `ApiException`

While the name of this error in our spec is Error, Java convention is to use the Exception suffix for exceptions, so we should stick to that.

NRE on any API method

A customer is using some of our sample code from our docs.

AssemblyAI client = AssemblyAI.builder()
        .apiKey("")
        .build();

String audioUrl = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3";

var params = TranscriptOptionalParams.builder()
        .speakerLabels(true)
        .build();

try {
    Transcript transcript = client.transcripts().transcribe(audioUrl, params);

    System.out.println(transcript.getText());

    transcript.getUtterances().ifPresent(utterances ->
            utterances.forEach(utterance ->
                    System.out.println("Speaker " + utterance.getSpeaker() + ": " + utterance.getText())
            )
    );
} catch (Exception exception) {
    log.error(exception.getMessage() + " " + exception.getCause());
}

They're getting the following error:

Cannot invoke "com.assemblyai.api.core.RequestOptions.getTimeout()" because "requestOptions" is null null

When I take a peek at the code, every API method accepts request options, but they're never checked for null, so every API method is broken.
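One possible shape of a fix is to resolve a null argument to a default before it is dereferenced. The sketch below is self-contained and hypothetical: Options stands in for RequestOptions, and the 60-second default is an assumed value, not one taken from the SDK.

```java
// Hypothetical sketch of a null-safe lookup; not the SDK's code.
final class RequestOptionsSketch {

    // Stand-in for com.assemblyai.api.core.RequestOptions.
    static final class Options {
        private final int timeoutSeconds;

        Options(int timeoutSeconds) {
            this.timeoutSeconds = timeoutSeconds;
        }

        int getTimeout() {
            return timeoutSeconds;
        }
    }

    // Assumed default used when the caller passes no options.
    static final Options DEFAULT = new Options(60);

    // Falls back to DEFAULT instead of dereferencing a null argument.
    static int effectiveTimeout(Options requestOptions) {
        Options resolved = requestOptions != null ? requestOptions : DEFAULT;
        return resolved.getTimeout();
    }
}
```

An alternative with the same effect would be for the overloads without request options to pass a default instance rather than null.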

Generate Java SDK API reference

It would be helpful for devs to have a generated API reference to find Java SDK specific details.

LeMUR `answerFormat`

answerFormat is needed to pass a query for lemur().actionItems. But LemurBaseParams doesn't have answerFormat as a parameter.

Currently:

var response = client.lemur().actionItems(LemurBaseParams
                .builder()
                .transcriptIds(List.of(transcript.getId()))
                .context(LemurBaseParamsContext.of("A GitLab meeting to discuss logistic"))
                .build());

Desired:

var response = client.lemur().actionItems(LemurBaseParams
                .builder()
                .transcriptIds(List.of(transcript.getId()))
                .context(LemurBaseParamsContext.of("A GitLab meeting to discuss logistic"))
                .answerFormat("Bullet points")
                .build());

Don't throw an exception when an unknown message is received in realtime

Currently, the RealtimeMessage.visit method will throw when it receives an unknown message type.
In the future, our realtime service may send new message types which shouldn't introduce a breaking change.
Therefore, the implementation should not throw on unknown types; instead, it could offer an "unknown" overload as well.

https://github.com/AssemblyAI/assemblyai-java-sdk/blob/main/src/main/java/com/assemblyai/api/resources/realtime/types/RealtimeMessage.java#L33-L46

I think the current visitor pattern makes sense in other scenarios, but in receiving realtime messages, it needs to be modified.
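As a rough illustration of the idea, here is a self-contained visitor with an explicit fallback. It is a sketch, not the code in RealtimeMessage.java: the class and method names are hypothetical, messages are modeled as plain maps, and the "PartialTranscript" and "FinalTranscript" type names are assumed values of message_type.

```java
import java.util.Map;

// Hypothetical sketch of a visitor that tolerates unknown message types.
final class RealtimeMessageSketch {

    interface Visitor<T> {
        T visitPartialTranscript(Map<String, Object> payload);

        T visitFinalTranscript(Map<String, Object> payload);

        // Fallback for message types this SDK version does not know about.
        T visitUnknown(String messageType, Map<String, Object> payload);
    }

    // Dispatches on message_type and never throws for unrecognized types.
    static <T> T visit(Map<String, Object> payload, Visitor<T> visitor) {
        String type = String.valueOf(payload.get("message_type"));
        switch (type) {
            case "PartialTranscript":
                return visitor.visitPartialTranscript(payload);
            case "FinalTranscript":
                return visitor.visitFinalTranscript(payload);
            default:
                return visitor.visitUnknown(type, payload);
        }
    }
}
```

With this shape, a new server-side message type lands in visitUnknown, where callers can log or ignore it.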

Add temporary token support for RealtimeTranscriber

Add a way to add optional params to realtime transcriber

Title says it all. I want to be able to add params for different languages and also be able to turn automatic language detection on. Would it be possible to add this functionality to this SDK?

RealtimeTranscriber - onPartialTranscript

onPartialTranscript is not being invoked, whereas all the events are firing in onSessionStart.

RealtimeTranscriber.builder()
                .apiKey(ASSEMBLY_AI_API_KEY)
                .onPartialTranscript(this::displayPartialMessage)
                .onFinalTranscript(this::displayFinalMessage)
                .onError((err) -> {
                    displayErrorMessage(err.getMessage());
                })
                .onSessionStart(System.err::println)
                .build();

Please help.

Add encoding parameter to RealtimeTranscriber

LeMUR `context`

context should accept a String as input instead of requiring a LemurBaseParamsContext.

Current functionality:

var response = client.lemur().summary(LemurSummaryParams.builder()
                .transcriptIds(List.of(transcript.getId()))
                .context(LemurBaseParamsContext.of("A GitLab meeting to discuss logistic"))
                .answerFormat("TLDR").build());

Error not reported when using bad API key

When you use the realtime transcriber with an empty API key, the error is not passed to onError.

I am getting the following output in the console.

{
  "message_type" : "SessionBegins",
  "error" : "Not authorized"
}

and

{
  "session_id" : "ab0ae487-ed4d-4d24-b6c3-65a701df20f6",
  "expires_at" : "2024-01-18T21:31:50.915638Z",
  "message_type" : "SessionBegins"
}
{
  "message_type" : "SessionBegins",
  "error" : "Received invalid request. Please check the documentation for the correct request schemas."
}

This is only in the Java SDK, so it's not how the service is sending the message to the client, but how the Java SDK is deserializing the message.
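One way to avoid misreading such payloads is to check for an error field before dispatching on message_type. The sketch below only illustrates that ordering: RealtimeErrorSketch and classify are hypothetical names, and the message is modeled as an already-parsed map rather than the SDK's own types.

```java
import java.util.Map;

// Hypothetical sketch: an "error" field takes precedence over message_type.
final class RealtimeErrorSketch {

    // Returns a short description of how the message should be routed.
    static String classify(Map<String, Object> message) {
        Object error = message.get("error");
        if (error != null) {
            // Route to the error handler, whatever message_type says.
            return "error: " + error;
        }
        return "message: " + message.get("message_type");
    }
}
```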

SDK should support Java 8

The SDK should support Java 8, but based on a recent customer interaction, the SDK is using some APIs that are not in Java 8, specifically the Map.of method (added in Java 9).
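A Java 8 compatible replacement for Map.of is to fill a HashMap and wrap it with Collections.unmodifiableMap. The sketch below shows the idea; Java8Maps, headers, and the "Authorization" key are illustrative choices, not code taken from the SDK.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of an immutable map built with Java 8 APIs only.
final class Java8Maps {

    // Equivalent in spirit to Map.of("Authorization", apiKey) on Java 9+.
    static Map<String, String> headers(String apiKey) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Authorization", apiKey);
        // The wrapper rejects put/remove, like the map returned by Map.of.
        return Collections.unmodifiableMap(headers);
    }
}
```

One difference to keep in mind: Map.of rejects null keys and values, while this version accepts whatever HashMap accepts.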

Terminate session properly for real-time transcriber

Closing the real-time transcriber should send the terminate session message and wait for session termination.
Between the request for termination and confirmation for termination, partial and final transcripts will continue to come through.
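A possible shape for this, sketched with a CountDownLatch: close sends the terminate request and then blocks until the listener reports that the session ended, or a timeout elapses. Everything here is hypothetical, including the class name, the sender callback, and the exact JSON of the terminate message, which is an assumption about the wire format.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of a close() that waits for session termination.
final class GracefulCloseSketch {
    private final CountDownLatch terminated = new CountDownLatch(1);
    private final Consumer<String> sender;

    // sender stands in for "write this text frame to the WebSocket".
    GracefulCloseSketch(Consumer<String> sender) {
        this.sender = sender;
    }

    // To be called by the socket listener when termination is confirmed.
    void onSessionTerminated() {
        terminated.countDown();
    }

    // Returns true if termination was confirmed within the timeout.
    boolean close(long timeout, TimeUnit unit) {
        sender.accept("{\"terminate_session\":true}"); // assumed message shape
        try {
            return terminated.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Transcript callbacks would stay registered while close is waiting, so partial and final transcripts that arrive before the confirmation are still delivered.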

Support streams for file upload

The FilesClient and PollingTranscriptsClient should support streams so users can stream files from disk, network, etc.

`contentSafetyConfidence` parameter missing in `TranscriptOptionalParams`

Missing contentSafetyConfidence parameter in TranscriptOptionalParams class.

Desired usage example:

var params = TranscriptOptionalParams.builder()
             .contentSafety(true)
             .contentSafetyConfidence(60)
             .build();

Parameter: contentSafetyConfidence
Type: integer
Description: The confidence threshold for content moderation. Values must be between 25 and 100.

Docs Usage: https://www.assemblyai.com/docs/audio-intelligence/content-moderation#adjust-the-confidence-threshold

Simplify `wordSearch`

Word search only accepts a single parameter in addition to the ID, so it's a little ridiculous to use a builder for this.
It's fine to keep a builder for future expansion, but can we have an overload that takes a single string or a list of strings?

Currently:
client.transcripts().wordSearch(transcript.getId(), WordSearchParams.builder().words(word).build());
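A sketch of what the requested overloads could look like, with the params-based method kept for future expansion. This is self-contained and hypothetical: WordSearchSketch and Params are stand-ins for the transcripts client and WordSearchParams, and the methods return the words they would send instead of calling the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of convenience overloads around a params object.
final class WordSearchSketch {

    // Stand-in for WordSearchParams.
    static final class Params {
        private final List<String> words;

        private Params(List<String> words) {
            this.words = words;
        }

        static Params of(List<String> words) {
            return new Params(Collections.unmodifiableList(new ArrayList<>(words)));
        }

        List<String> getWords() {
            return words;
        }
    }

    // Params-based entry point; here it just echoes the words it would send.
    static List<String> wordSearch(String transcriptId, Params params) {
        return params.getWords();
    }

    // Convenience overload: one or more words as varargs.
    static List<String> wordSearch(String transcriptId, String... words) {
        return wordSearch(transcriptId, Arrays.asList(words));
    }

    // Convenience overload: a list of words; an empty search is rejected.
    static List<String> wordSearch(String transcriptId, List<String> words) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("At least one word is required");
        }
        return wordSearch(transcriptId, Params.of(words));
    }
}
```

With overloads like these, the call above could shrink to something like client.transcripts().wordSearch(transcript.getId(), word).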