
assemblyai-java-sdk's Introduction

AssemblyAI Java Library


Documentation

API reference documentation is available here.

Requirements

Java 8+

Installation

Gradle

Add the dependency in your build.gradle:

dependencies {
    implementation 'com.assemblyai:assemblyai-java:1.x.x'
}

Maven

Add the dependency in your pom.xml:

<dependency>
    <groupId>com.assemblyai</groupId>
    <artifactId>assemblyai-java</artifactId>
    <version>1.x.x</version>
</dependency>

HTTP Client Usage

The SDK exports a vanilla HTTP client, AssemblyAI. You can use this to call into each of our API endpoints and get typed responses back.

import com.assemblyai.api.AssemblyAI;

AssemblyAI aai = AssemblyAI.builder()
  .apiKey("YOUR_API_KEY")
  .build();

Transcript transcript = aai.transcripts().get("transcript-id");

System.out.println("Received response! " + transcript);

Handling Errors

When the API returns a non-success status code (4xx or 5xx response), a subclass of ApiError will be thrown:

import com.assemblyai.api.core.ApiError;

try {
  aai.transcripts().get("transcript-id");
} catch (ApiError error) {
  System.out.println(error.getBody());
  System.out.println(error.getStatusCode());
}

Creating a transcript

When you create a transcript, you can either pass in a URL to an audio file or upload a file directly.

import com.assemblyai.api.types.Transcript;

// Transcribe file at remote URL
Transcript transcript = aai.transcripts().transcribe(
        "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a");

// Upload a file via local path and transcribe
transcript = aai.transcripts().transcribe(
        new File("./news.mp4"));

transcribe queues a transcription job and polls it until the status is completed or error. If you don't want to wait until the transcript is ready, you can use submit:

import com.assemblyai.api.types.Transcript;

// Transcribe file at remote URL
Transcript transcript = aai.transcripts().submit(
        "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a");

// Upload a file via local path and transcribe
transcript = aai.transcripts().submit(
        new File("./news.mp4"));
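The difference between transcribe and submit comes down to who does the waiting. The sketch below illustrates the polling idea only and does not use the SDK: PollingSketch, pollUntilDone, the QUEUED and PROCESSING states, the polling interval, and the attempt limit are all assumptions made for illustration. Only the two terminal states, completed and error, come from the description above.

```java
import java.util.function.Supplier;

final class PollingSketch {
    /** Hypothetical job states; only COMPLETED and ERROR are named in the text above. */
    enum Status { QUEUED, PROCESSING, COMPLETED, ERROR }

    /**
     * Repeatedly asks fetchStatus for the job state and returns as soon as the
     * state is COMPLETED or ERROR. Gives up after maxAttempts checks.
     */
    static Status pollUntilDone(Supplier<Status> fetchStatus, long intervalMillis, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Status status = fetchStatus.get();
            if (status == Status.COMPLETED || status == Status.ERROR) {
                return status;
            }
            try {
                // Wait before asking again so the service is not flooded with requests.
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
        throw new IllegalStateException("Job did not finish after " + maxAttempts + " attempts");
    }
}
```

A caller that uses submit would run a loop of this shape itself (or react to a notification), fetching the transcript by ID on each iteration, whereas transcribe hides the loop.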

Using the Realtime Transcriber

The Realtime Transcriber processes live audio streams and exchanges data with the service over WebSockets. You register event handlers on its builder:

import com.assemblyai.api.RealtimeTranscriber;

RealtimeTranscriber realtime = RealtimeTranscriber.builder()
  .apiKey("YOUR_API_KEY")
  .onPartialTranscript(partial -> System.out.println(partial))
  .onFinalTranscript(finalTranscript -> System.out.println(finalTranscript))
  .build();

realtime.sendAudio(new byte[]{...});

realtime.close();

Staged Builders

The generated builders all follow the staged builder pattern. Read more here. Staged builders only allow you to construct the object once all required properties have been specified.

For example, in the snippet below, you cannot access the build method on TranscriptParams until you have specified the required audioUrl property.

import com.assemblyai.api.TranscriptParams;

TranscriptParams params = TranscriptParams.builder()
  .audioUrl("https://...")
  .build();
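To show how a staged builder enforces required properties at compile time, here is a minimal self-contained sketch. It is not the generated SDK code: StagedBuilderSketch, Params, AudioUrlStage, FinalStage, and the optional speakerLabels flag are hypothetical names chosen for illustration.

```java
final class StagedBuilderSketch {
    /** Hypothetical parameter object with one required and one optional property. */
    static final class Params {
        private final String audioUrl;
        private final boolean speakerLabels;

        private Params(String audioUrl, boolean speakerLabels) {
            this.audioUrl = audioUrl;
            this.speakerLabels = speakerLabels;
        }

        String getAudioUrl() { return audioUrl; }
        boolean isSpeakerLabels() { return speakerLabels; }

        /** Entry point returns the first stage, which does not expose build(). */
        static AudioUrlStage builder() { return new Builder(); }
    }

    /** Stage 1: the only thing the caller can do is supply the required audioUrl. */
    interface AudioUrlStage {
        FinalStage audioUrl(String audioUrl);
    }

    /** Final stage: optional setters and build() become available. */
    interface FinalStage {
        FinalStage speakerLabels(boolean enabled);
        Params build();
    }

    /** One class implements every stage; the interfaces restrict what callers see. */
    private static final class Builder implements AudioUrlStage, FinalStage {
        private String audioUrl;
        private boolean speakerLabels;

        @Override
        public FinalStage audioUrl(String audioUrl) {
            this.audioUrl = audioUrl;
            return this;
        }

        @Override
        public FinalStage speakerLabels(boolean enabled) {
            this.speakerLabels = enabled;
            return this;
        }

        @Override
        public Params build() {
            return new Params(audioUrl, speakerLabels);
        }
    }
}
```

With this shape, Params.builder().build() does not compile, because AudioUrlStage has no build method. The omission is caught by the compiler rather than at runtime.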

Contributing

While we value open-source contributions to this SDK, this library is generated programmatically. Additions made directly to this library would have to be moved over to our generation code, otherwise they would be overwritten upon the next generated release. Feel free to open a PR as a proof of concept, but know that we will not be able to merge it as-is. We suggest opening an issue first to discuss with us!

On the other hand, contributions to the README are always very welcome!

assemblyai-java-sdk's People

Contributors

armandobelardo, dannysheridan, dsinghvi, fern-api[bot], fern-bot, ploeber, smithakolan, swimburger

assemblyai-java-sdk's Issues

Longer timeout limit needed for LeMUR (java.net.SocketTimeoutException: timeout)

Transcripts with longer token size time out with LeMUR functions.

Requests with longer files result in this error: Caused by: java.net.SocketTimeoutException: timeout

For context, here is the code used:

import com.assemblyai.api.AssemblyAI;
import com.assemblyai.api.resources.lemur.requests.*;
import com.assemblyai.api.resources.transcripts.types.*;
import java.util.List;

public final class App {
    public static void main(String[] args) {

        AssemblyAI client = AssemblyAI.builder()
                .apiKey("api-key")
                .build();

        String url = "https://storage.googleapis.com/aai-web-samples/meeting.mp3";

        Transcript transcript = client.transcripts().transcribe(url);

        String prompt = "Provide a brief summary of the transcript.";

        var params = LemurTaskParams.builder()
                .prompt(prompt)
                .transcriptIds(List.of(transcript.getId()))
                .build();

        var result = client.lemur().task(params);

        System.out.println(result.getResponse());
    }
}

Library does not work with java 1.8

Although it is not stated in the repository, I am working under the assumption that the library is supposed to work on the widespread Java 1.8 version.
Unfortunately, it seems that the AssemblyAI.build() method throws a NoSuchMethodException when compiled and run on Java 1.8.

The issue disappears when running on JDK 21.

Is the library supposed to work on java 1.8, and if not, what is the minimum version required?

NRE on any API method

A customer is using some of our sample code from our docs.

AssemblyAI client = AssemblyAI.builder()
        .apiKey("")
        .build();

String audioUrl = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3";

var params = TranscriptOptionalParams.builder()
        .speakerLabels(true)
        .build();

try {
    Transcript transcript = client.transcripts().transcribe(audioUrl, params);

    System.out.println(transcript.getText());

    transcript.getUtterances().ifPresent(utterances ->
            utterances.forEach(utterance ->
                    System.out.println("Speaker " + utterance.getSpeaker() + ": " + utterance.getText())
            )
    );
} catch (Exception exception) {
    log.error(exception.getMessage() + " " + exception.getCause());
}

They're getting the following error:

Cannot invoke "com.assemblyai.api.core.RequestOptions.getTimeout()" because "requestOptions" is null null

When I take a peek at the code, every API method accepts request options, but they're never checked for null, so every API method is broken.
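One way to guard against this is to resolve the options defensively before reading from them. The sketch below is an illustration under assumptions, not the SDK's actual code: the RequestOptions class here is a hypothetical stand-in with a nullable timeout, and the 60-second default is an arbitrary value picked for the example.

```java
final class RequestOptionsSketch {
    /** Hypothetical stand-in for per-request options; a null timeout means "not set". */
    static final class RequestOptions {
        private final Integer timeoutSeconds;

        RequestOptions(Integer timeoutSeconds) {
            this.timeoutSeconds = timeoutSeconds;
        }

        Integer getTimeout() { return timeoutSeconds; }
    }

    /** Assumed client-wide default, chosen only for this example. */
    static final int DEFAULT_TIMEOUT_SECONDS = 60;

    /**
     * Returns the per-request timeout if one was supplied, otherwise the default.
     * Accepts a null requestOptions instead of dereferencing it blindly.
     */
    static int resolveTimeout(RequestOptions requestOptions) {
        if (requestOptions == null || requestOptions.getTimeout() == null) {
            return DEFAULT_TIMEOUT_SECONDS;
        }
        return requestOptions.getTimeout();
    }
}
```

The convenience overloads that take no options could alternatively pass an empty options object rather than null. Either approach removes the null dereference described above.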

LeMUR `answerFormat`

answerFormat is needed to pass a query for lemur().actionItems. But LemurBaseParams doesn't have answerFormat as a parameter.

Currently:

var response = client.lemur().actionItems(LemurBaseParams
                .builder()
                .transcriptIds(List.of(transcript.getId()))
                .context(LemurBaseParamsContext.of("A GitLab meeting to discuss logistic"))
                .build());

Desired:

var response = client.lemur().actionItems(LemurBaseParams
                .builder()
                .transcriptIds(List.of(transcript.getId()))
                .context(LemurBaseParamsContext.of("A GitLab meeting to discuss logistic"))
                .answerFormat("Bullet points")
                .build());

Don't throw an exception when an unknown message is received in realtime

Currently, the RealtimeMessage.visit method will throw when it receives an unknown message type.
In the future, our realtime service may send new message types which shouldn't introduce a breaking change.
Therefore, the implementation should not throw on unrecognized messages; instead, the visitor could have an "unknown" overload too.

https://github.com/AssemblyAI/assemblyai-java-sdk/blob/main/src/main/java/com/assemblyai/api/resources/realtime/types/RealtimeMessage.java#L33-L46

I think the current visitor pattern makes sense in other scenarios, but in receiving realtime messages, it needs to be modified.
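A rough sketch of what an "unknown" overload could look like follows. This is not the code in RealtimeMessage.java: the Visitor interface, the dispatch method, and the message type strings used in the switch are assumptions for illustration.

```java
final class RealtimeVisitorSketch {
    /** Hypothetical visitor with an extra fallback method for unrecognized messages. */
    interface Visitor<T> {
        T visitPartialTranscript(String text);
        T visitFinalTranscript(String text);
        /** Called for any message type this version of the client does not know. */
        T visitUnknown(String messageType);
    }

    /**
     * Routes a message to the matching visitor method. Unrecognized types go to
     * visitUnknown instead of raising an exception, so new server-side message
     * types do not break existing clients.
     */
    static <T> T dispatch(String messageType, String text, Visitor<T> visitor) {
        switch (messageType) {
            case "PartialTranscript":
                return visitor.visitPartialTranscript(text);
            case "FinalTranscript":
                return visitor.visitFinalTranscript(text);
            default:
                return visitor.visitUnknown(messageType);
        }
    }
}
```

Callers that do not care about new message types can implement visitUnknown as a no-op or a log statement.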

RealtimeTranscriber - onPartialTranscript

onPartialTranscript is not being invoked, whereas all the events are firing in onSessionStart.

RealtimeTranscriber.builder()
                .apiKey(ASSEMBLY_AI_API_KEY)
                .onPartialTranscript(this::displayPartialMessage)
                .onFinalTranscript(this::displayFinalMessage)
                .onError((err) -> {
                    displayErrorMessage(err.getMessage());
                })
                .onSessionStart(System.err::println)
                .build();

Please help.

LeMUR `context`

context should be able to accept a String as input instead of requiring LemurBaseParamsContext.

Current functionality:

var response = client.lemur().summary(LemurSummaryParams.builder()
                .transcriptIds(List.of(transcript.getId()))
                .context(LemurBaseParamsContext.of("A GitLab meeting to discuss logistic"))
                .answerFormat("TLDR").build());

Error not reported when using bad API key

When you use the realtime transcriber with an empty API key, the error is not passed to onError.

I am getting the following output in the console.

{
  "message_type" : "SessionBegins",
  "error" : "Not authorized"
}

and

{
  "session_id" : "ab0ae487-ed4d-4d24-b6c3-65a701df20f6",
  "expires_at" : "2024-01-18T21:31:50.915638Z",
  "message_type" : "SessionBegins"
}
{
  "message_type" : "SessionBegins",
  "error" : "Received invalid request. Please check the documentation for the correct request schemas."
}

This is only in the Java SDK, so it's not how the service is sending the message to the client, but how the Java SDK is deserializing the message.
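The output above suggests that a payload carrying an error field is being treated as a session start. A possible fix is to check for the error field before dispatching on message_type. The sketch below illustrates that ordering with a plain Map standing in for the parsed JSON; RealtimeErrorSketch and route are hypothetical names, and the real SDK's deserialization will differ.

```java
import java.util.Map;
import java.util.function.Consumer;

final class RealtimeErrorSketch {
    /**
     * Routes a parsed message. Any message with a non-null "error" field goes
     * to onError, whatever its message_type says. Only error-free
     * "SessionBegins" messages reach onSessionStart.
     */
    static void route(Map<String, Object> message,
                      Consumer<String> onSessionStart,
                      Consumer<String> onError) {
        Object error = message.get("error");
        if (error != null) {
            onError.accept(String.valueOf(error));
            return;
        }
        if ("SessionBegins".equals(message.get("message_type"))) {
            onSessionStart.accept(String.valueOf(message.get("session_id")));
        }
    }
}
```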

SDK should support Java 8

The SDK should support Java 8, but based on a recent customer interaction, the SDK is using some APIs that are not in Java 8.
Specifically the Map.of method.
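Map.of was added in Java 9, so code that must run on Java 8 needs a different way to build small immutable maps. A minimal Java 8 compatible sketch follows; the Java8MapSketch class, the defaultHeaders helper, and the header name are hypothetical and only show the replacement pattern.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class Java8MapSketch {
    /**
     * Builds a read-only map using only APIs available in Java 8.
     * Roughly equivalent to Map.of("Authorization", apiKey) on Java 9+,
     * except that this version does not reject null keys or values.
     */
    static Map<String, String> defaultHeaders(String apiKey) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Authorization", apiKey);
        // Wrap so callers cannot mutate the map, matching Map.of's immutability.
        return Collections.unmodifiableMap(headers);
    }
}
```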

Terminate session properly for real-time transcriber

Closing the real-time transcriber should send the terminate session message and wait for session termination.
Between the request for termination and confirmation for termination, partial and final transcripts will continue to come through.
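The close-and-wait behavior described here can be sketched with a CountDownLatch. This illustrates the synchronization only and is not the SDK's implementation: TerminationSketch, onSessionTerminated, and the sendTerminate callback are hypothetical names, and the timeout is an assumption added so close cannot block forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class TerminationSketch {
    private final CountDownLatch terminated = new CountDownLatch(1);
    private final Runnable sendTerminate;

    /** sendTerminate stands in for writing the terminate-session message to the socket. */
    TerminationSketch(Runnable sendTerminate) {
        this.sendTerminate = sendTerminate;
    }

    /** Call from the message handler when the service confirms termination. */
    void onSessionTerminated() {
        terminated.countDown();
    }

    /**
     * Requests termination, then blocks until confirmation arrives or the
     * timeout elapses. Transcript handlers keep running on the receiving
     * thread in the meantime. Returns true if termination was confirmed.
     */
    boolean close(long timeout, TimeUnit unit) {
        sendTerminate.run();
        try {
            return terminated.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because close only waits on the latch, partial and final transcripts that arrive between the request and the confirmation are still delivered to their handlers.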

`contentSafetyConfidence` parameter missing in `TranscriptOptionalParams`

Missing contentSafetyConfidence parameter in TranscriptOptionalParams class.

Desired usage example:

var params = TranscriptOptionalParams.builder()
             .contentSafety(true)
             .contentSafetyConfidence(60)
             .build();

Parameter: contentSafetyConfidence
Type: integer
Description: The confidence threshold for content moderation. Values must be between 25 and 100.

Docs Usage: https://www.assemblyai.com/docs/audio-intelligence/content-moderation#adjust-the-confidence-threshold
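If the parameter is added, the builder could also validate the documented range on the client side. The following sketch shows such a check under the stated constraint (values between 25 and 100). ConfidenceSketch and requireValidConfidence are hypothetical names, not part of the SDK.

```java
final class ConfidenceSketch {
    static final int MIN_CONFIDENCE = 25;
    static final int MAX_CONFIDENCE = 100;

    /**
     * Returns the value unchanged if it lies in the documented range of
     * 25 to 100 inclusive, otherwise throws IllegalArgumentException.
     */
    static int requireValidConfidence(int value) {
        if (value < MIN_CONFIDENCE || value > MAX_CONFIDENCE) {
            throw new IllegalArgumentException(
                    "contentSafetyConfidence must be between " + MIN_CONFIDENCE
                            + " and " + MAX_CONFIDENCE + ", got " + value);
        }
        return value;
    }
}
```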

Simplify `wordSearch`

Word search only accepts a single parameter in addition to the ID, so it's a little ridiculous to use a builder for this.
It's fine to keep a builder for future expansion, but can we have an overload that takes a single string or a list of strings?

Currently:
client.transcripts().wordSearch(transcript.getId(), WordSearchParams.builder().words(word).build());

Desired:

client.transcripts().wordSearch(transcript.getId(), word);
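The requested overloads can delegate to the existing builder-based method, so the builder stays available for future parameters. The sketch below shows that delegation pattern with stand-in types. WordSearchOverloadSketch and its simplified WordSearchParams are hypothetical, and the stand-in wordSearch just returns the words it would search for instead of calling the API.

```java
import java.util.Collections;
import java.util.List;

final class WordSearchOverloadSketch {
    /** Simplified stand-in for the SDK's parameter object. */
    static final class WordSearchParams {
        private final List<String> words;

        WordSearchParams(List<String> words) {
            this.words = words;
        }

        List<String> getWords() { return words; }
    }

    /** Stand-in for the existing method; returns the words instead of calling the API. */
    static List<String> wordSearch(String transcriptId, WordSearchParams params) {
        return params.getWords();
    }

    /** Convenience overload for a single word. */
    static List<String> wordSearch(String transcriptId, String word) {
        return wordSearch(transcriptId, new WordSearchParams(Collections.singletonList(word)));
    }

    /** Convenience overload for several words. */
    static List<String> wordSearch(String transcriptId, List<String> words) {
        return wordSearch(transcriptId, new WordSearchParams(words));
    }
}
```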
