aquila-network / aquila
An easy to use Neural Search Engine. Index latent vectors along with JSON metadata and do efficient k-NN search.
Home Page: https://aquila.network
This is a sub task of https://github.com/a-mma/AquilaDB/issues/5
The Python client cannot communicate with the database after a clean install on macOS, following the tutorial.
from aquiladb import AquilaClient as acl
# create DB instance
db = acl('localhost', 50051)
# convert a sample document
# convertDocument
sample = db.convertDocument([0.1,0.2,0.3,0.4], {"hello": "world"})
# add document to AquilaDB
db.addDocuments([sample])
This leads to the following error message:
Traceback (most recent call last):
File "test_aquiladb.py", line 12, in <module>
db.addDocuments([sample])
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aquiladb/AquilaDB.py", line 22, in addDocuments
response = self.stub.addDocuments(vecdb_pb2.addDocRequest(documents=documents_in))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/grpc/_channel.py", line 565, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1576143957.461971000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3818,"referenced_errors":[{"created":"@1576143957.461968000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":395,"grpc_status":14}]}"
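StatusCode.UNAVAILABLE generally means nothing is listening at the target address, so the first thing to rule out is the server itself. A minimal diagnostic sketch, assuming the default localhost:50051 endpoint from the tutorial:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not port_open("localhost", 50051):
    print("AquilaDB is not reachable; is the container running and port 50051 published?")
```

If the probe fails, check `docker ps` and the `-p 50051:50051` port mapping before debugging the client code.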
First, I want to say thanks for this project.
I am looking for GPU acceleration for AquilaDB. Since I think it is built on FAISS, it may have GPU acceleration?
Does AquilaDB have GPU acceleration?
Try reducing global variable usage
When building any of the images on Windows, Docker fails to run due to CRLF line terminators in, for example, init_aquila_db.sh.
Because AquilaDB currently functions as a standalone database only, it is important to attach an external volume for data persistence. This requirement will be removed later.
Describe the bug
When running the Docker container with --env FIXED_VEC_DIMENSION, Annoy and FAISS fail to initialize because these variables are implicitly cast to str, whereas they need to be ints.
To Reproduce
docker run -d -i -p 50051:50051 --env MIN_DOCS2INDEX=1 --env FIXED_VEC_DIMENSION=1000 -v "<local data persist directory>:/data" -t ammaorg/aquiladb:latest
Add a doc to aquiladb
from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)
vec_len = 1000
a = [1] * (vec_len // 2)
a += [2] * (vec_len // 2)
sample = db.convertDocument(a, {"id": "1"})
print(db.addDocuments([sample]))
Server Logs
0|vecdb | running VecID Worker
1|peer_manager | TypeError: Cannot read property 'rows' of undefined
1|peer_manager | at /AquilaDB/src/p2p/routing_table/index.js:157:34
0|vecdb | running VecID Worker
1|peer_manager | TypeError: Cannot read property 'rows' of undefined
1|peer_manager | at /AquilaDB/src/p2p/routing_table/index.js:157:34
0|vecdb | running VecID Worker
0|vecdb | running VecID Worker
1|peer_manager | TypeError: Cannot read property 'rows' of undefined
1|peer_manager | at /AquilaDB/src/p2p/routing_table/index.js:157:34
0|vecdb | null { total_rows: 2,
0|vecdb | offset: 0,
0|vecdb | rows:
0|vecdb | [ { id: '44a51a50564ff0c68a87f6c55f47e0f6',
0|vecdb | key: '44a51a50564ff0c68a87f6c55f47e0f6',
0|vecdb | value: [Object],
0|vecdb | doc: [Object] },
0|vecdb | { id: 'bc59dd1b39ff829a33e3be10c624606e',
0|vecdb | key: 'bc59dd1b39ff829a33e3be10c624606e',
0|vecdb | value: [Object],
0|vecdb | doc: [Object] } ] }
0|vecdb | 2 ' documents retrieved for faiss index training'
2|vecstore | Annoy init index
0|vecdb | { Error: 2 UNKNOWN: Exception calling application: an integer is required (got type str)
0|vecdb | at Object.exports.createStatusError (/AquilaDB/src/node_modules/grpc/src/common.js:91:15)
0|vecdb | at Object.onReceiveStatus (/AquilaDB/src/node_modules/grpc/src/client_interceptors.js:1209:28)
0|vecdb | at InterceptingListener._callNext (/AquilaDB/src/node_modules/grpc/src/client_interceptors.js:568:42)
0|vecdb | at InterceptingListener.onReceiveStatus (/AquilaDB/src/node_modules/grpc/src/client_interceptors.js:618:8)
0|vecdb | at callback (/AquilaDB/src/node_modules/grpc/src/client_interceptors.js:847:24)
0|vecdb | code: 2,
0|vecdb | metadata: Metadata { _internal_repr: {}, flags: 0 },
0|vecdb | details:
0|vecdb | 'Exception calling application: an integer is required (got type str)' }
0|vecdb | running VecID Worker
1|peer_manager | TypeError: Cannot read property 'rows' of undefined
1|peer_manager | at /AquilaDB/src/p2p/routing_table/index.js:157:34
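A likely direction for a fix, sketched here with assumed variable names and default values (not the actual AquilaDB code): environment values always arrive as strings, so they must be cast to int once at startup before reaching Annoy or FAISS.

```python
import os

# Environment variables are always strings; Annoy/FAISS dimension and count
# parameters must be ints, so cast explicitly. The defaults are illustrative.
FIXED_VEC_DIMENSION = int(os.environ.get("FIXED_VEC_DIMENSION", "1000"))
MIN_DOCS2INDEX = int(os.environ.get("MIN_DOCS2INDEX", "1"))
```

With the cast in place, `--env FIXED_VEC_DIMENSION=1000` no longer trips the "an integer is required (got type str)" exception shown in the logs.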
Currently, even though documents are sent from the client in batches, vectors are forwarded to VectorDB one at a time. This causes delays in Annoy that grow rapidly as the index size grows. Send data in batches to improve Annoy performance.
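The chunking itself can be sketched like this (a hypothetical helper; `index.add_batch` stands in for whatever bulk-add call the vector store exposes, which is an assumption, not AquilaDB's actual API):

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Forward vectors to the index in batches instead of one at a time:
# for batch in batched(vectors, 500):
#     index.add_batch(batch)  # hypothetical bulk-add call
```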
Links in the readme and wiki still refer to the old project address and so are broken.
Hi. This is a great project. I see that there is a Python client; we use ML.NET and C# and are wondering if a C# client is planned.
AquilaDB should support vectors of multiple sizes, either by truncating or by padding inputs to make them a fixed size internally.
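A minimal sketch of the truncate-or-pad idea (a hypothetical helper, not part of the AquilaDB API):

```python
def fit_vector(vec, dim, pad_value=0.0):
    """Truncate or pad a vector so it has exactly `dim` elements."""
    if len(vec) >= dim:
        return vec[:dim]
    return vec + [pad_value] * (dim - len(vec))
```

Padding with zeros preserves distances to existing dimensions, while truncation silently drops information, so the cut-off dimension would need to be documented.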
When a k-NN search is performed, keep the distance between the query and target vectors as an attribute in each retrieved document.
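Until that attribute exists, the distances could be recomputed client-side. A sketch, assuming Euclidean distance and a hypothetical result shape where each document carries its vector:

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def annotate_distances(query, docs):
    """Attach a 'distance' field to each retrieved document (assumed shape)."""
    for doc in docs:
        doc["distance"] = euclidean(query, doc["vector"])
    return docs
```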
I'm interested in putting Wikipedia onto your database. Do you have a public forum where I can get advice for this? I assume someone has done this already. Thanks for making a great program! Sorry for putting this into issues.
The Create database response from the hub has databaseName in its JSON, while the Create database response from the Db has database_name. It would be better to follow the same convention on both hub and db.
from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)
sample = db.convertDocument([0.1,0.2,0.3,0.4], {"hello": "world"})
db.addDocuments([sample])
vector = db.convertMatrix([0.1,0.2,0.3,0.4])
k = 10
result = db.getNearest(vector, k)
This is the sample data set from https://github.com/a-mma/AquilaDB/wiki/Get-started-with-AquilaDB , and in my attempt it returns an empty list with something like:
status: true
documents: "[]"
Any idea?
Currently, there is a limit on vector dimension (introduced as a side effect of the bulk retrieval logic), throttled by the gRPC request limit. Once the document count hits vecount,
a bulk retrieval from the document database is performed, which in turn blows up JS heap memory as well as the gRPC data limit.
Proposed fixes:
The current Docker image size is insane: 2.55 GB. Reduce it to 1 GB or less. Apply changes from this reference: https://hackernoon.com/tips-to-reduce-docker-image-sizes-876095da3b34
More than 10,000 vectors are indexed with FAISS. After I index all vectors with FAISS, I query a vector but it cannot find itself. If I index all vectors with Annoy instead, it works as expected. Actually, I am not sure whether it is a bug.
I am using the following YAML to run AquilaDB:
version: '3'
services:
  aquiladb:
    image: ammaorg/aquiladb
    ports:
      - "50051:50051"
    volumes:
      - /home/asd/db-data:/data
    restart: always
volumes:
  db-data:
But when I remove the container and start it again, all the indexes are lost. This probably happens because all files under default_docsdb are removed and new ones are added.
Need to investigate
Forgive me, I'm a bit of a newbie. Assume I index 100,000 items and it works well, but then I reboot my server. How should I connect to my previously indexed db?
I was trying to install AquilaDB using the Dockerfile. I tried the following command:
docker build -t ammaorg/aquiladb:latest .
It was showing the following error
unable to prepare context: unable to evaluate symlinks in Dockerfile path
Hi, AquilaDB seems really really neat and a terrific tool. Thanks for building it!
I worked through the Google USE / Python example and am now trying to adapt it to my use case, but I'm finding some persistent encoding issues on document retrieval. For instance, one of the documents contains a right single quotation mark, U+2019. This is read in correctly and written correctly to the CouchDB document store (I checked via the CouchDB interface). However, a db.getNearest query's response contains \x19 there instead, which isn't a valid character in JSON and causes a mess.
The issue is between btoa/atob (which I think assume UTF-16 strings?) and the response out of PouchDB (UTF-8) in this file.
Here's a minimal example, though you'll have to adjust the document ID and the slice indices to match your document containing the problematic character, of course.
const atob = require('atob');
const btoa = require('btoa');
const PouchDB = require('pouchdb');
const db = new PouchDB('http://localhost:5984/default_docsdb');

let resp = null;
db.allDocs({include_docs: true, keys: ['3c80fca415c221bf3702e055c055c21f']})
  .then((a) => { resp = a; });
Here's a (portion of) a document from PouchDB that needs to get transmitted to the client, e.g. via the Python library.
> JSON.stringify(resp.rows).slice(289, 296)
'today’s'
The problem is that btoa mis-encodes it:
> atob(btoa(JSON.stringify(resp.rows).slice(289, 296)))
'today\u0019s'
One solution: js-base64.
> Base64.decode(Base64.encode(JSON.stringify(resp.rows).slice(289, 296)));
'today’s'
One solution that's apparently not a good one:
> decodeURIComponent(atob(btoa(encodeURIComponent(JSON.stringify(resp.rows).slice(289, 296)))))
'today’s'
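The same pitfall can be reproduced outside Node to confirm the diagnosis: base64 must operate on UTF-8 bytes, not on UTF-16 code units. A parallel sketch in Python (not the AquilaDB code path), where the round trip is lossless because encoding and decoding agree on UTF-8:

```python
import base64

text = "today\u2019s"  # contains U+2019 RIGHT SINGLE QUOTATION MARK

# Encode the UTF-8 bytes, then decode them back with the same charset.
encoded = base64.b64encode(text.encode("utf-8"))
decoded = base64.b64decode(encoded).decode("utf-8")
assert decoded == text  # round trip preserves the character
```

js-base64 works for the same reason: it converts the string to UTF-8 bytes before base64-encoding, whereas btoa takes each code unit as a raw byte value.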
Describe the bug
error: error parsing https://github.com/a-mma/AquilaDB/blob/develop/kubernetes/aquiladb.yml: error converting YAML to JSON: yaml: line 115: mapping values are not allowed in this context
To Reproduce
Run this command: kubectl apply -f https://github.com/a-mma/AquilaDB/blob/develop/kubernetes/aquiladb.yml
Expected behavior
Successful launch of aquiladb as kubernetes service
Server Logs
If possible, collect logs from the AquilaDB container by following the steps below in your terminal:
docker ps
and note down the container id for AquilaDB
docker exec -i -t <container id> /bin/bash
pm2 logs
and copy contents from there.
It is possible to store index data from FAISS to disk. Ref: https://github.com/facebookresearch/faiss/wiki/Index-IO,-index-factory,-cloning-and-hyper-parameter-tuning and https://github.com/facebookresearch/faiss/blob/master/demos/demo_ondisk_ivf.py
It is necessary to fail-protect FAISS during restarts. It is also good to distribute indexes to multiple DB instances to avoid retraining and to keep results consistent.
Hi @freakeinstein, it's me again! :)
How do I get AquilaDB to persist the FAISS db to disk? I'm trying to be able to restart the underlying AWS instance (and thus the AquilaDB Docker container) and have my data persist.
It doesn't seem like the data does persist. Even after I set up a working example, no file ever shows up in /data/VDB. Is this a bug? Am I missing something?
I'm running off of the master branch (having implemented the b64-related change myself).
Thanks!
This is a sub task of https://github.com/a-mma/AquilaDB/issues/5