amikos-tech / chromadb-java-client

A thin client for Chroma Vector DB implemented in Java

License: MIT License

Java 100.00%

ai chromadb machine-learning

chromadb-java-client's Issues

Default ChromaDB embeddings `all-MiniLM-L6-v2`

My understanding was that ChromaDB's default embeddings run locally and do not require an API key. However, I cannot find an example like this in the README; all of the examples require an API key. Am I missing something?

New OpenAI embedding model

I would like to try "text-embedding-3-large", a newly released embedding model from OpenAI. This requires a way for users to select the desired embedding model.

Where and Where_document syntax

I am unable to use `where` and `where_document` because in the code they are typed as Map<String,String>, but the documentation shows nested structures such as:
{ "metadata_field": { "$nin": ["value1", "value2", "value3"] } }

Can you provide an example of how to use them from the library?
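
A filter like the one quoted above cannot be expressed as Map<String,String>, because the value under the field name is itself a map. The following is a minimal sketch of how the nested structure could be built in Java, assuming a hypothetical client signature that accepts Map<String, Object>. The class and method names (`WhereFilterSketch`, `notIn`) are illustrative only and are not part of the library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class WhereFilterSketch {
    // Builds { "<field>": { "$nin": [values...] } } as nested maps.
    // Assumption: the client would accept Map<String, Object> for `where`.
    static Map<String, Object> notIn(String field, String... values) {
        Map<String, Object> operator = new LinkedHashMap<>();
        operator.put("$nin", Arrays.asList(values));

        Map<String, Object> where = new LinkedHashMap<>();
        where.put(field, operator);
        return where;
    }

    public static void main(String[] args) {
        // Prints {metadata_field={$nin=[value1, value2, value3]}}
        System.out.println(notIn("metadata_field", "value1", "value2", "value3"));
    }
}
```

Serialized with any JSON library, this map has the same shape as the filter from the documentation.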

I have a question about how I store my data.

As far as I know, it is more efficient to store the "document" in an RDBMS than in a vector database, but this project forces you to store all values. What is the reason for this?

Auth support

About

As a user, I want the Java client to support auth.

Acceptance Criteria

Auth abstractions
Basic Auth support
Static Token support
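
As a rough illustration of the two schemes listed above, both can be reduced to producing an `Authorization` header value. This is a sketch only: the class name `AuthHeaderSketch` and its methods are hypothetical, and sending a static token as a `Bearer` value is an assumption about the server configuration rather than something this issue specifies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class AuthHeaderSketch {
    // HTTP Basic auth: "Basic " + base64("user:password").
    static String basic(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Static token sent as a bearer token (assumed header scheme).
    static String bearer(String token) {
        return "Bearer " + token;
    }

    public static void main(String[] args) {
        System.out.println(basic("admin", "admin")); // Basic YWRtaW46YWRtaW4=
        System.out.println(bearer("test-token"));    // Bearer test-token
    }
}
```

An auth abstraction could then be a single interface that returns the header name and value, with one implementation per scheme.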

Update operation support

List Collections Support

Document Count Support

Hugging Face Support

Support Hugging Face embeddings:

https://huggingface.co/blog/getting-started-with-embeddings

Fluent API

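Error when using a custom OpenAI proxy address

Hello. I am currently located within **, so to make sure all OpenAI services work normally we have put a proxy in front of the OpenAI address. I saw in the README that this project supports a custom OpenAI proxy, so I wrote the code shown in the screenshot to test it. However, it fails with the following error:

I have confirmed that our proxy handles requests correctly:

Is this problem caused by incorrect usage on my part, or by something else?
Thank you.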

Maven Central Auto Release

Ref: https://central.sonatype.org/publish/publish-maven/

POM update needed.

<plugin>
  <groupId>org.sonatype.plugins</groupId>
  <artifactId>nexus-staging-maven-plugin</artifactId>
  <version>1.6.7</version>
  <extensions>true</extensions>
  <configuration>
    <serverId>ossrh</serverId>
    <nexusUrl>https://s01.oss.sonatype.org/</nexusUrl>
    <autoReleaseAfterClose>true</autoReleaseAfterClose>
  </configuration>
</plugin>

Cohere Embedding Support

We'll use this as a starting point: https://github.com/cohere-ai/cohere-python/blob/main/cohere/client.py

responses = {
    "embeddings": [],
    "compressed_embeddings": [],
}
json_bodys = []

for i in range(0, len(texts), self.batch_size):
    texts_batch = texts[i : i + self.batch_size]
    json_bodys.append(
        {
            "model": model,
            "texts": texts_batch,
            "truncate": truncate,
            "compress": compress,
            "compression_codebook": compression_codebook,
        }
    )

meta = None
for result in self._executor.map(lambda json_body: self._request(cohere.EMBED_URL, json=json_body), json_bodys):
    responses["embeddings"].extend(result["embeddings"])
    responses["compressed_embeddings"].extend(result.get("compressed_embeddings", []))
    meta = result["meta"] if not meta else meta

return Embeddings(
    embeddings=responses["embeddings"],
    compressed_embeddings=responses["compressed_embeddings"],
    meta=meta,
)
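
The core of the Python excerpt above is splitting the input texts into fixed-size batches, with one request per batch. A Java port would need the same step. Below is a minimal sketch of just that batching logic; `BatchingSketch` and `batches` are hypothetical names, and the HTTP call and response merging are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchingSketch {
    // Splits texts into consecutive chunks of at most batchSize elements,
    // mirroring `texts[i : i + self.batch_size]` in the Python excerpt.
    static List<List<String>> batches(List<String> texts, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < texts.size(); i += batchSize) {
            int end = Math.min(i + batchSize, texts.size());
            // Copy the view so each batch is independent of the input list.
            result.add(new ArrayList<>(texts.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [[a, b], [c, d], [e]]
        System.out.println(batches(Arrays.asList("a", "b", "c", "d", "e"), 2));
    }
}
```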

The handler files do not exist in the source code.

The following classes do not exist in the source code:

import tech.amikos.chromadb.handler.ApiClient;
import tech.amikos.chromadb.handler.ApiException;
import tech.amikos.chromadb.handler.DefaultApi;

Collection.get is missing offset and include options

According to the Chroma documentation, the `get` method accepts `offset` and `include` parameters.

These seem to be missing from Collection.get?
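
For illustration, a `get` request that carries these options might be assembled as below. This is a sketch under assumptions: the field names `limit`, `offset`, and `include` follow the Chroma documentation referenced in this issue, while `GetRequestSketch` and `payload` are hypothetical and not part of the client.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class GetRequestSketch {
    // Assembles the paging and projection options of a `get` request body.
    // Assumption: the server expects the keys "limit", "offset", "include".
    static Map<String, Object> payload(int limit, int offset, String... include) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("limit", limit);
        body.put("offset", offset);
        body.put("include", Arrays.asList(include));
        return body;
    }

    public static void main(String[] args) {
        // Prints {limit=10, offset=20, include=[documents, metadatas]}
        System.out.println(payload(10, 20, "documents", "metadatas"));
    }
}
```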

OpenAI embedding custom URL

The bge-m3 model is a local embedding model that surpasses OpenAI's large embedding model. I would like to load it into a server that is compatible with the OpenAI API and use it together with chromadb. It would therefore be helpful if users could supply a custom URL when using the OpenAI embedding feature in this project.

Thank you for your efforts.
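
One possible shape for this feature is an embedding request builder that takes the base URL and the model name as parameters instead of hard-coding them. The sketch below uses only the JDK (Java 11+) and builds the request without sending it. It assumes the target server exposes an OpenAI-style `/embeddings` endpoint under the given base URL; `EmbeddingRequestSketch` and its methods are hypothetical names, not the library's API.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

final class EmbeddingRequestSketch {
    // Minimal JSON string quoting: handles backslashes and double quotes only.
    // A real client should use a JSON library instead.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body in the OpenAI embeddings format: {"model": ..., "input": [...]}.
    static String body(String model, List<String> inputs) {
        String joined = inputs.stream()
                .map(EmbeddingRequestSketch::quote)
                .collect(Collectors.joining(","));
        return "{\"model\":" + quote(model) + ",\"input\":[" + joined + "]}";
    }

    // Builds (but does not send) a POST to <baseUrl>/embeddings.
    static HttpRequest request(String baseUrl, String apiKey, String model, List<String> inputs) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/embeddings"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body(model, inputs)))
                .build();
    }
}
```

With this shape, pointing the client at a local OpenAI-compatible server or choosing a different model such as "text-embedding-3-large" is a matter of passing different arguments, for example `request("http://localhost:8080/v1", key, "bge-m3", texts)`, where the host and port are placeholders.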

Improve testing with assertions

So far, most of the testing has consisted of visually inspecting output, but we need assertions to ensure bugs don't slip through.