chromadb-java-client's Introduction

Chroma Vector Database Java Client

This is a very basic, naive Java implementation of a client for the Chroma Vector Database API.

This client works with Chroma versions 0.4.3 and later.

Features

Embeddings Support

  • OpenAI API
  • Cohere API (including Multi-language support)
  • Sentence Transformers
  • PaLM API
  • Custom Embedding Function

Feature Parity with ChromaDB API

  • Reset
  • Heartbeat
  • List Collections
  • Get Version
  • Create Collection
  • Delete Collection
  • Collection Add
  • Collection Get (partial without additional parameters)
  • Collection Count
  • Collection Query
  • Collection Modify
  • Collection Update
  • Collection Upsert
  • Collection Create Index
  • Collection Delete - delete documents in collection

Usage

Add Maven dependency:

<dependency>
    <groupId>io.github.amikos-tech</groupId>
    <artifactId>chromadb-java-client</artifactId>
    <version>0.1.5</version>
</dependency>

Ensure you have an instance of Chroma running. We recommend one of the two following options:

Example OpenAI Embedding Function

In this example we rely on tech.amikos.chromadb.OpenAIEmbeddingFunction to generate embeddings for our documents.

| Important: Ensure the OPENAI_API_KEY environment variable is set.

package tech.amikos;

import tech.amikos.chromadb.Client;
import tech.amikos.chromadb.Collection;
import tech.amikos.chromadb.EmbeddingFunction;
import tech.amikos.chromadb.OpenAIEmbeddingFunction;

import java.util.*;

public class Main {
    public static void main(String[] args) {
        try {
            // CHROMA_URL points at the running Chroma instance
            Client client = new Client(System.getenv("CHROMA_URL"));
            String apiKey = System.getenv("OPENAI_API_KEY");
            EmbeddingFunction ef = new OpenAIEmbeddingFunction(apiKey, "text-embedding-3-small");
            Collection collection = client.createCollection("test-collection", null, true, ef);
            // One metadata map per document, in the same order as the documents
            List<Map<String, String>> metadata = new ArrayList<>();
            metadata.add(new HashMap<String, String>() {{
                put("type", "scientist");
            }});
            metadata.add(new HashMap<String, String>() {{
                put("type", "spy");
            }});
            // The first argument is null, so the embedding function generates the embeddings
            collection.add(null, metadata,
                    Arrays.asList("Hello, my name is John. I am a Data Scientist.", "Hello, my name is Bond. I am a Spy."),
                    Arrays.asList("1", "2"));
            // Ask for up to 10 results for a single query text
            Collection.QueryResponse qr = collection.query(Arrays.asList("Who is the spy"), 10, null, null, null);
            System.out.println(qr);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The above should output:

{"documents":[["Hello, my name is Bond. I am a Spy.","Hello, my name is John. I am a Data Scientist."]],"ids":[["2","1"]],"metadatas":[[{"type":"spy"},{"type":"scientist"}]],"distances":[[0.28461432,0.50961685]]}
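The printed response is nested: the outer list has one entry per query text, and each inner list holds that query's results, with ids, documents and distances aligned by position. The following is a minimal sketch of walking that structure. It uses plain lists filled with the values printed above rather than accessors on Collection.QueryResponse, since those are not shown in this README; the class name QueryResultWalk and the helper closestIndex are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of chromadb-java-client.
class QueryResultWalk {
    // Returns the index of the smallest distance, i.e. the closest match.
    // Assumes the list is non-empty.
    static int closestIndex(List<Double> distances) {
        int best = 0;
        for (int i = 1; i < distances.size(); i++) {
            if (distances.get(i) < distances.get(best)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Values copied from the output above; one inner list per query text.
        List<List<String>> ids = Arrays.asList(Arrays.asList("2", "1"));
        List<List<String>> documents = Arrays.asList(Arrays.asList(
                "Hello, my name is Bond. I am a Spy.",
                "Hello, my name is John. I am a Data Scientist."));
        List<List<Double>> distances = Arrays.asList(Arrays.asList(0.28461432, 0.50961685));

        for (int q = 0; q < ids.size(); q++) {
            int best = closestIndex(distances.get(q));
            System.out.println("query " + q + " -> id " + ids.get(q).get(best)
                    + ": " + documents.get(q).get(best));
        }
        // prints: query 0 -> id 2: Hello, my name is Bond. I am a Spy.
    }
}
```

In the output above the results already arrive with the smallest distance first, so the closest match is at index 0 of each inner list; the explicit search only makes that ordering visible.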

Custom OpenAI Endpoint

For endpoints compatible with the OpenAI Embeddings API (e.g. Ollama), you can use the following:

Note: We have added a builder to help with the configuration of the OpenAIEmbeddingFunction

EmbeddingFunction ef = OpenAIEmbeddingFunction.Instance()
        .withOpenAIAPIKey(apiKey)
        .withModelName("llama2")
        .withApiEndpoint("http://localhost:11434/api/embeddings") // same endpoint as the curl test in the Ollama quick start
        .build();

Quick Start Guide with Ollama:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama2 # press Ctrl+D to exit after model downloads successfully
# test it
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama2",
  "prompt": "Here is an article about llamas..."
}'

Example Cohere Embedding Function

In this example we rely on tech.amikos.chromadb.CohereEmbeddingFunction to generate embeddings for our documents.

| Important: Ensure the COHERE_API_KEY environment variable is set.

package tech.amikos;

import tech.amikos.chromadb.*;
import tech.amikos.chromadb.Collection;

import java.util.*;

public class Main {
  public static void main(String[] args) {
    try {
      Client client = new Client(System.getenv("CHROMA_URL"));
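      // Caution: reset() wipes all data in the Chroma instance; the server must allow resets (ALLOW_RESET=TRUE)
      client.reset();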
      String apiKey = System.getenv("COHERE_API_KEY");
      EmbeddingFunction ef = new CohereEmbeddingFunction(apiKey);
      Collection collection = client.createCollection("test-collection", null, true, ef);
      List<Map<String, String>> metadata = new ArrayList<>();
      metadata.add(new HashMap<String, String>() {{
        put("type", "scientist");
      }});
      metadata.add(new HashMap<String, String>() {{
        put("type", "spy");
      }});
      collection.add(null, metadata, Arrays.asList("Hello, my name is John. I am a Data Scientist.", "Hello, my name is Bond. I am a Spy."), Arrays.asList("1", "2"));
      Collection.QueryResponse qr = collection.query(Arrays.asList("Who is the spy"), 10, null, null, null);
      System.out.println(qr);
    } catch (Exception e) {
      e.printStackTrace();
      System.out.println(e);
    }
  }
}

The above should output:

{"documents":[["Hello, my name is Bond. I am a Spy.","Hello, my name is John. I am a Data Scientist."]],"ids":[["2","1"]],"metadatas":[[{"type":"spy"},{"type":"scientist"}]],"distances":[[5112.614,10974.804]]}

Example Hugging Face Sentence Transformers Embedding Function

In this example we rely on tech.amikos.chromadb.HuggingFaceEmbeddingFunction to generate embeddings for our documents.

| Important: Ensure the HF_API_KEY environment variable is set.

package tech.amikos;

import tech.amikos.chromadb.*;
import tech.amikos.chromadb.Collection;

import java.util.*;

public class Main {
  public static void main(String[] args) {
    try {
      Client client = new Client(System.getenv("CHROMA_URL"));
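      // Caution: reset() wipes all data in the Chroma instance; the server must allow resets (ALLOW_RESET=TRUE)
      client.reset();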
      String apiKey = System.getenv("HF_API_KEY");
      EmbeddingFunction ef = new HuggingFaceEmbeddingFunction(apiKey);
      Collection collection = client.createCollection("test-collection", null, true, ef);
      List<Map<String, String>> metadata = new ArrayList<>();
      metadata.add(new HashMap<String, String>() {{
        put("type", "scientist");
      }});
      metadata.add(new HashMap<String, String>() {{
        put("type", "spy");
      }});
      collection.add(null, metadata, Arrays.asList("Hello, my name is John. I am a Data Scientist.", "Hello, my name is Bond. I am a Spy."), Arrays.asList("1", "2"));
      Collection.QueryResponse qr = collection.query(Arrays.asList("Who is the spy"), 10, null, null, null);
      System.out.println(qr);
    } catch (Exception e) {
      System.out.println(e);
    }
  }
}

The above should output:

{"documents":[["Hello, my name is Bond. I am a Spy.","Hello, my name is John. I am a Data Scientist."]],"ids":[["2","1"]],"metadatas":[[{"type":"spy"},{"type":"scientist"}]],"distances":[[0.9073759,1.6440368]]}
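The feature list also mentions custom embedding functions, which this README does not demonstrate. The sketch below shows the general idea only. It defines its own hypothetical TextEmbedder interface as a stand-in, because the method signatures of tech.amikos.chromadb.EmbeddingFunction are not shown here and may differ. The letter-frequency embedding is a toy, suitable for offline experiments and not for real retrieval.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical example; none of these types belong to chromadb-java-client.
class ToyEmbedding {
    // Stand-in for an embedding function: one vector per input document.
    interface TextEmbedder {
        List<List<Float>> embed(List<String> documents);
    }

    // Maps each document to a 26-dimensional, L2-normalized letter-frequency vector.
    static class LetterFrequencyEmbedder implements TextEmbedder {
        @Override
        public List<List<Float>> embed(List<String> documents) {
            List<List<Float>> result = new ArrayList<>();
            for (String doc : documents) {
                // Count occurrences of each letter a-z, ignoring case and other characters
                float[] counts = new float[26];
                for (char c : doc.toLowerCase(Locale.ROOT).toCharArray()) {
                    if (c >= 'a' && c <= 'z') {
                        counts[c - 'a']++;
                    }
                }
                // Compute the Euclidean norm so the vector can be scaled to unit length
                double norm = 0;
                for (float v : counts) {
                    norm += v * v;
                }
                norm = Math.sqrt(norm);
                List<Float> vector = new ArrayList<>(26);
                for (float v : counts) {
                    // Documents without letters map to the zero vector
                    vector.add(norm == 0 ? 0f : (float) (v / norm));
                }
                result.add(vector);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        List<List<Float>> vectors =
                new LetterFrequencyEmbedder().embed(Arrays.asList("Hello", "Bond"));
        System.out.println(vectors.size() + " vectors of dimension " + vectors.get(0).size());
        // prints: 2 vectors of dimension 26
    }
}
```

To use such logic with the client, the same computation would need to be wrapped in a class implementing the library's actual EmbeddingFunction interface and passed to createCollection as in the examples above.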

Example Auth

Note: This is a workaround until the client overhaul is completed

Basic Auth:

package tech.amikos;

import tech.amikos.chromadb.*;
import tech.amikos.chromadb.Collection;

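import java.nio.charset.StandardCharsets;
import java.util.*;

public class Main {
  public static void main(String[] args) {
    try {
      Client client = new Client(System.getenv("CHROMA_URL"));
      // Basic auth sends base64("username:password"); replace admin:admin with your own credentials
      String encodedString = Base64.getEncoder().encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
      client.setDefaultHeaders(new HashMap<String, String>() {{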
          put("Authorization", "Basic " + encodedString);
      }});
      // your code here
    } catch (Exception e) {
      System.out.println(e);
    }
  }
}
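To check the Basic auth header value without a running Chroma instance, the encoding step can be exercised on its own. This is a minimal sketch using only the JDK; the class BasicAuthHeaders and its methods are hypothetical helpers, not part of the client. The resulting map has the same shape as the one passed to client.setDefaultHeaders above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of chromadb-java-client.
class BasicAuthHeaders {
    // Builds the value of an HTTP Basic Authorization header:
    // "Basic " followed by base64(username + ":" + password).
    static String basicAuthValue(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Wraps the header value in a map keyed by the header name.
    static Map<String, String> basicAuthHeaders(String username, String password) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Authorization", basicAuthValue(username, password));
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeaders("admin", "admin"));
        // prints: {Authorization=Basic YWRtaW46YWRtaW4=}
    }
}
```

Base64 is an encoding, not encryption, so Basic auth should only be sent over a connection you trust, such as HTTPS.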

Static Auth - Authorization:

package tech.amikos;

import tech.amikos.chromadb.*;
import tech.amikos.chromadb.Collection;

import java.util.*;

public class Main {
  public static void main(String[] args) {
    try {
      Client client = new Client(System.getenv("CHROMA_URL"));
      client.setDefaultHeaders(new HashMap<String, String>() {{
          put("Authorization", "Bearer test-token");
      }});
      // your code here
    } catch (Exception e) {
      System.out.println(e);
    }
  }
}

Static Auth - X-Chroma-Token:

package tech.amikos;

import tech.amikos.chromadb.*;
import tech.amikos.chromadb.Collection;

import java.util.*;

public class Main {
  public static void main(String[] args) {
    try {
      Client client = new Client(System.getenv("CHROMA_URL"));
      client.setDefaultHeaders(new HashMap<String, String>() {{
          put("X-Chroma-Token", "test-token");
      }});
      // your code here
    } catch (Exception e) {
      System.out.println(e);
    }
  }
}

Development Notes

We have made some minor changes on top of the ChromaDB API (src/main/resources/openapi/api.yaml) so that the API can work with Java and Swagger Codegen. The reason is that statically typed languages like Java do not handle the anyOf and oneOf keywords well. This is also why we don't use the generated Java client for the OpenAI API.
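To illustrate why oneOf is awkward in Java: a schema field that may be either a single string or a list of strings has no direct static type, so it has to be modelled by hand. The sketch below shows one possible workaround with a hypothetical StringOrList wrapper. It is an illustration only, not code from this client or from Swagger Codegen, and it does not correspond to any particular field in api.yaml.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper for a field declared as oneOf: [string, array of string].
final class StringOrList {
    private final List<String> values;
    private final boolean single;

    private StringOrList(List<String> values, boolean single) {
        this.values = values;
        this.single = single;
    }

    // Variant 1: the field holds a single string.
    static StringOrList of(String value) {
        return new StringOrList(Collections.singletonList(value), true);
    }

    // Variant 2: the field holds a list of strings (copied defensively).
    static StringOrList of(List<String> values) {
        return new StringOrList(Collections.unmodifiableList(new ArrayList<>(values)), false);
    }

    // Tells a serializer which of the two JSON shapes to write.
    boolean isSingle() {
        return single;
    }

    // Normalizes both variants to a list so callers need only one code path.
    List<String> asList() {
        return values;
    }

    public static void main(String[] args) {
        System.out.println(StringOrList.of("a").asList());                     // prints: [a]
        System.out.println(StringOrList.of(Arrays.asList("a", "b")).asList()); // prints: [a, b]
    }
}
```

Every such field needs its own wrapper and serialization logic, which is the kind of overhead that simplifying the schema avoids.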

Contributing

Pull requests are welcome.
