amikos-tech / chromadb-java-client
A thin client for Chroma Vector DB implemented in Java
License: MIT License
My understanding was that ChromaDB's default embeddings run locally and do not require an API key. However, I cannot find an example like this in the README; all examples require an API key. Am I missing something?
I would like to try "text-embedding-3-large", a newly released embedding model from OpenAI. This needs a way for users to select the desired embedding model.
I am unable to use Where and Where_document, because in the code they are just Map<String,String>, but in the documentation a filter looks something like this:
{ "metadata_field": { "$nin": ["value1", "value2", "value3"] } }
Can you provide an example of how to use them from the library?
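The nested filter quoted above cannot be held in a Map<String,String>, because the value of "metadata_field" is itself a map containing a list. A minimal sketch of how the same structure could be expressed in plain Java follows. The class WhereFilterSketch and its notIn helper are hypothetical names used for illustration; they are not part of chromadb-java-client, and whether the client accepts a Map<String,Object> depends on the library version.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of chromadb-java-client.
class WhereFilterSketch {

    // Builds { field: { "$nin": [values...] } } as a nested map.
    static Map<String, Object> notIn(String field, String... values) {
        Map<String, Object> operator = new LinkedHashMap<>();
        operator.put("$nin", Arrays.asList(values));

        Map<String, Object> where = new LinkedHashMap<>();
        where.put(field, operator);
        return where;
    }

    public static void main(String[] args) {
        Map<String, Object> where = notIn("metadata_field", "value1", "value2", "value3");
        // prints {metadata_field={$nin=[value1, value2, value3]}}
        System.out.println(where);
    }
}
```

A JSON serializer would turn this map into the filter shown in the Chroma documentation; the point is only that the value type has to be Object (or a dedicated filter type) rather than String.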
As far as I know, it's more efficient to store the "document" in an RDBMS rather than in a vector database, but in this project, they force you to store all values. What's the reason for this?
As a user, I want the Java client to support auth.
Support Hugging Face embeddings:
Ref: https://central.sonatype.org/publish/publish-maven/
POM update needed.
<plugin>
  <groupId>org.sonatype.plugins</groupId>
  <artifactId>nexus-staging-maven-plugin</artifactId>
  <version>1.6.7</version>
  <extensions>true</extensions>
  <configuration>
    <serverId>ossrh</serverId>
    <nexusUrl>https://s01.oss.sonatype.org/</nexusUrl>
    <autoReleaseAfterClose>true</autoReleaseAfterClose>
  </configuration>
</plugin>
We'll use this as a starting point: https://github.com/cohere-ai/cohere-python/blob/main/cohere/client.py
responses = {
    "embeddings": [],
    "compressed_embeddings": [],
}
json_bodys = []

for i in range(0, len(texts), self.batch_size):
    texts_batch = texts[i : i + self.batch_size]
    json_bodys.append(
        {
            "model": model,
            "texts": texts_batch,
            "truncate": truncate,
            "compress": compress,
            "compression_codebook": compression_codebook,
        }
    )

meta = None
for result in self._executor.map(lambda json_body: self._request(cohere.EMBED_URL, json=json_body), json_bodys):
    responses["embeddings"].extend(result["embeddings"])
    responses["compressed_embeddings"].extend(result.get("compressed_embeddings", []))
    meta = result["meta"] if not meta else meta

return Embeddings(
    embeddings=responses["embeddings"],
    compressed_embeddings=responses["compressed_embeddings"],
    meta=meta,
)
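The batching step in the Python snippet above (slicing texts into chunks of batch_size before sending each chunk as one request) could look roughly like this in Java. This is a sketch under assumptions: EmbedBatchSketch and toBatches are hypothetical names, and the HTTP call and response merging are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the batching step only; no request is sent.
class EmbedBatchSketch {

    // Splits texts into consecutive batches of at most batchSize items,
    // mirroring texts[i : i + batch_size] in the Python reference.
    static List<List<String>> toBatches(List<String> texts, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < texts.size(); i += batchSize) {
            int end = Math.min(i + batchSize, texts.size());
            // Copy the view so each batch is independent of the input list.
            batches.add(new ArrayList<>(texts.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("a", "b", "c", "d", "e");
        // prints [[a, b], [c, d], [e]]
        System.out.println(toBatches(texts, 2));
    }
}
```

Each batch would then become the "texts" field of one request body, and the returned embeddings would be appended in batch order so they stay aligned with the input.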
The following classes do not exist in the source code:
import tech.amikos.chromadb.handler.ApiClient;
import tech.amikos.chromadb.handler.ApiException;
import tech.amikos.chromadb.handler.DefaultApi;
According to the Chroma documentation, the get method accepts offset and include values.
These seem to be missing from the client?
The bge-m3 model is a local embedding model that surpasses OpenAI's large embedding model. I would like to load it into a server compatible with the OpenAI API and use it in conjunction with ChromaDB. Therefore, it would be nice if users could set a custom URL when using the OpenAI embedding feature in your project.
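A minimal sketch of what a configurable base URL and model name could look like, using only the JDK's java.net.http (Java 11+). The class and method names are hypothetical, as is the local server address in the usage example. The sketch assumes the target server exposes the same POST /embeddings route and {"model", "input"} body as the OpenAI embeddings API; it does not reflect how chromadb-java-client builds its requests.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch; not the client's actual embedding function.
class OpenAiCompatibleRequestSketch {

    // Builds (but does not send) an embeddings request against a caller-supplied base URL.
    static HttpRequest embeddingRequest(String baseUrl, String apiKey, String model, String text) {
        // Drop a trailing slash so the path is not doubled.
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        String body = "{\"model\":\"" + escape(model) + "\",\"input\":\"" + escape(text) + "\"}";
        return HttpRequest.newBuilder(URI.create(base + "/embeddings"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Escapes only backslashes and double quotes; a real client should use a JSON library.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Assumed local OpenAI-compatible server serving bge-m3.
        HttpRequest request = embeddingRequest("http://localhost:8080/v1/", "unused-key", "bge-m3", "hello");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Passing https://api.openai.com/v1 as the base URL and "text-embedding-3-large" as the model would target OpenAI itself, so one pair of parameters covers both the model-selection and the custom-URL requests.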
Thank you for your efforts,
For now, most of the testing was "visually inspect stuff", but we need assertions to ensure bugs don't slip through.