
aquila-network / aquila


An easy to use Neural Search Engine. Index latent vectors along with JSON metadata and do efficient k-NN search.

Home Page: https://aquila.network

Python 8.53% Shell 0.30% Dockerfile 0.35% Go 0.79% TypeScript 25.86% JavaScript 0.08% HTML 61.00% SCSS 3.10%
feature-vectors similarity-search knn-search information-retrieval neural-information-retrieval vector-database approximate-nearest-neighbor-search search-engine nearest-neighbor-search embedding

aquila's People

Contributors

admin-adb, buriedgod, freakeinstein, jawahar273, jeswinkninan, manekshms, nibu99, sopaoglu


aquila's Issues

Clean install leads to communication error

Problem

The Python client cannot communicate with the database after a clean install on macOS, following the tutorial.

Reproducing the error

from aquiladb import AquilaClient as acl

# create DB instance
db = acl('localhost', 50051)

# convert a sample document
sample = db.convertDocument([0.1, 0.2, 0.3, 0.4], {"hello": "world"})

# add document to AquilaDB
db.addDocuments([sample])

This leads to the following error message:

Traceback (most recent call last):
  File "test_aquiladb.py", line 12, in <module>
    db.addDocuments([sample])
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aquiladb/AquilaDB.py", line 22, in addDocuments
    response = self.stub.addDocuments(vecdb_pb2.addDocRequest(documents=documents_in))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/grpc/_channel.py", line 565, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1576143957.461971000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3818,"referenced_errors":[{"created":"@1576143957.461968000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":395,"grpc_status":14}]}"

System information

  • System Version: macOS 10.14.6 (18G87)
  • Kernel Version: Darwin 18.7.0
  • Python 3.7.2
  • docker image latest
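Before digging into gRPC internals, it is worth confirming that anything is listening on the published port at all (for example, that the container is actually running and `-p 50051:50051` was passed to `docker run`). A minimal stdlib-only sketch:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# if not port_open('localhost', 50051):
#     raise SystemExit('AquilaDB is not reachable -- is the container running '
#                      'and is port 50051 published?')
```

If the port is closed, the problem is on the Docker side rather than in the Python client.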

Is there any gpu acceleration for AquilaDB ?

First of all, thank you for this project.

I am looking for GPU acceleration for AquilaDB. Since it is built on FAISS, which supports GPU indexes, does AquilaDB have GPU acceleration?

[BUG] Environment variable FIXED_VEC_DIMENSION is not an integer

Describe the bug
When running the Docker container with --env FIXED_VEC_DIMENSION, Annoy and FAISS fail to initialize because the environment variables are implicitly read as str, whereas they need to be ints.

To Reproduce

docker run -d -i -p 50051:50051 --env MIN_DOCS2INDEX=1  --env FIXED_VEC_DIMENSION=1000 -v "<local data persist directory>:/data" -t ammaorg/aquiladb:latest

Add a document to AquilaDB:

from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)

vec_len = 1000

a = [1 for i in range( int(vec_len/2) )]
a.extend([2 for i in range( int(vec_len/2) )])
sample = db.convertDocument(a, {"id": "1"})
print(db.addDocuments([sample]))

Server Logs

0|vecdb | running VecID Worker
1|peer_manager | TypeError: Cannot read property 'rows' of undefined
1|peer_manager | at /AquilaDB/src/p2p/routing_table/index.js:157:34
0|vecdb | running VecID Worker
1|peer_manager | TypeError: Cannot read property 'rows' of undefined
1|peer_manager | at /AquilaDB/src/p2p/routing_table/index.js:157:34
0|vecdb | running VecID Worker
0|vecdb | running VecID Worker
1|peer_manager | TypeError: Cannot read property 'rows' of undefined
1|peer_manager | at /AquilaDB/src/p2p/routing_table/index.js:157:34
0|vecdb | null { total_rows: 2,
0|vecdb | offset: 0,
0|vecdb | rows:
0|vecdb | [ { id: '44a51a50564ff0c68a87f6c55f47e0f6',
0|vecdb | key: '44a51a50564ff0c68a87f6c55f47e0f6',
0|vecdb | value: [Object],
0|vecdb | doc: [Object] },
0|vecdb | { id: 'bc59dd1b39ff829a33e3be10c624606e',
0|vecdb | key: 'bc59dd1b39ff829a33e3be10c624606e',
0|vecdb | value: [Object],
0|vecdb | doc: [Object] } ] }
0|vecdb | 2 ' documents retrieved for faiss index training'
2|vecstore | Annoy init index
0|vecdb | { Error: 2 UNKNOWN: Exception calling application: an integer is required (got type str)
0|vecdb | at Object.exports.createStatusError (/AquilaDB/src/node_modules/grpc/src/common.js:91:15)
0|vecdb | at Object.onReceiveStatus (/AquilaDB/src/node_modules/grpc/src/client_interceptors.js:1209:28)
0|vecdb | at InterceptingListener._callNext (/AquilaDB/src/node_modules/grpc/src/client_interceptors.js:568:42)
0|vecdb | at InterceptingListener.onReceiveStatus (/AquilaDB/src/node_modules/grpc/src/client_interceptors.js:618:8)
0|vecdb | at callback (/AquilaDB/src/node_modules/grpc/src/client_interceptors.js:847:24)
0|vecdb | code: 2,
0|vecdb | metadata: Metadata { _internal_repr: {}, flags: 0 },
0|vecdb | details:
0|vecdb | 'Exception calling application: an integer is required (got type str)' }
0|vecdb | running VecID Worker
1|peer_manager | TypeError: Cannot read property 'rows' of undefined
1|peer_manager | at /AquilaDB/src/p2p/routing_table/index.js:157:34

please complete the following information:

  • Host OS: Windows 10
  • Docker image label (tag) latest
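Until this is fixed server-side, the cast the report describes amounts to reading each variable through an explicit int() conversion. A sketch (the default values here are made up, not AquilaDB's actual defaults):

```python
import os

def int_env(name, default):
    """Read an environment variable as an int, falling back to a default.

    Annoy/FAISS need integer dimensions, but os.environ only yields
    strings, so the cast has to happen explicitly.
    """
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

# Hypothetical defaults for illustration only.
FIXED_VEC_DIMENSION = int_env('FIXED_VEC_DIMENSION', 784)
MIN_DOCS2INDEX = int_env('MIN_DOCS2INDEX', 1000)
```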

Send batch data to VectorDB

Currently, even though documents are sent from the client in batches, vectors are forwarded to VectorDB one at a time. This causes delays in Annoy that grow sharply with index size. Send data in batches to improve Annoy performance.
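The batching step described above boils down to chunking the incoming documents before forwarding them. A sketch (in Python for illustration, although the server component is Node.js; `batched` is a hypothetical helper):

```python
def batched(docs, batch_size):
    """Yield successive batches of at most batch_size documents."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

# for batch in batched(converted_docs, 256):
#     forward_to_vectordb(batch)   # one call per batch, not per vector
```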

C# client library

Hi. This is a great project. I see that there is a Python client; we use ML.NET and C# and are wondering whether a C# client is planned.

support multiple size vectors

AquilaDB should support vectors of multiple sizes, either by truncating or by padding inputs to make them a fixed size internally.
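The truncate-or-pad normalization suggested above can be sketched in a few lines (`fit_vector` is a hypothetical helper, not part of AquilaDB):

```python
def fit_vector(vec, dim, pad_value=0.0):
    """Truncate or pad vec so it has exactly dim elements."""
    if len(vec) >= dim:
        return list(vec[:dim])
    return list(vec) + [pad_value] * (dim - len(vec))
```

Whether truncation or padding is acceptable depends on the embedding model; zero-padding changes distances less drastically than dropping dimensions, but neither is lossless.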

Wikipedia Dump

I'm interested in putting Wikipedia onto your database. Do you have a public forum where I can get advice on this? I assume someone has done this already. Thanks for making a great program, and sorry for filing this as an issue.

Sample Code returns empty

from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)

sample = db.convertDocument([0.1,0.2,0.3,0.4], {"hello": "world"})

db.addDocuments([sample])
vector = db.convertMatrix([0.1,0.2,0.3,0.4])

k = 10
result = db.getNearest(vector, k)

This is the sample data set from https://github.com/a-mma/AquilaDB/wiki/Get-started-with-AquilaDB, but in my attempt it returns an empty list with something like:

status: true
documents: "[]"

Any ideas?
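One possible cause, and this is an assumption rather than a confirmed diagnosis: the index may only be trained once enough documents have arrived (the MIN_DOCS2INDEX environment variable mentioned elsewhere on this page suggests such a threshold), so a single document can yield empty results. A sketch that generates a larger batch of vectors to index before querying (`make_samples` is a hypothetical helper):

```python
import random

def make_samples(n, dim=4, seed=0):
    """Generate n random dim-dimensional vectors as hypothetical test data."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]

# docs = [db.convertDocument(v, {'i': str(i)})
#         for i, v in enumerate(make_samples(50))]
# db.addDocuments(docs)   # enough documents to cross the training threshold
# result = db.getNearest(db.convertMatrix(make_samples(1)[0]), 10)
```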

Allow indexing large vectors [ENHANCEMENT]

Currently, there is a limit on vector dimension (introduced as a side effect of the bulk retrieval logic), throttled by the gRPC request size limit. Once the document count hits vecount, a bulk retrieval from the document database is performed, which blows up JS heap memory as well as the gRPC data limit.

Proposed fixes:

  • move bulk loading logic to vecstore module
  • enable mini batch reading from disk
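The mini-batch reading proposed above can be sketched as a paging generator (in Python for illustration, although the server component is Node.js; `fetch_page` is a hypothetical accessor over the document store):

```python
def iter_doc_batches(fetch_page, batch_size=256):
    """Stream documents in fixed-size pages instead of one bulk request.

    fetch_page(offset, limit) returns a list of documents; an empty page
    signals the end of the store. Keeping pages small bounds both heap
    usage and per-message gRPC payload size.
    """
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return
        yield page
        offset += len(page)
```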

Faiss Indexer Problem

More than 10,000 vectors are indexed with FAISS. After indexing all vectors with FAISS, I query with one of them, but it cannot find itself. If I index all the vectors with Annoy instead, it works as expected. I am actually not sure whether this is a bug.

Docker Volume Problem

I am using the following YAML to run AquilaDB:

version: '3'
services:
  aquiladb:
    image: ammaorg/aquiladb
    ports:
      - "50051:50051"
    volumes:
      - /home/asd/db-data:/data
    restart: always
volumes:
  db-data:

But when I remove the container and start it again, all the indexes are lost. This probably happens because all files under default_docsdb are removed and new ones are added.
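One detail worth noting: the compose file declares a top-level named volume db-data but never mounts it; the service mounts the host path /home/asd/db-data instead. A sketch that uses the named volume consistently (assuming /data is the container's data directory, as the original mount implies):

```yaml
version: '3'
services:
  aquiladb:
    image: ammaorg/aquiladb
    ports:
      - "50051:50051"
    volumes:
      - db-data:/data
    restart: always
volumes:
  db-data:
```

A named volume survives `docker-compose down` and container recreation unless it is explicitly removed (e.g. with `-v`).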

Storage Db

Forgive me, I'm a bit of a newbie. Suppose I index 100,000 documents and it works well, but then I reboot my server. How do I reconnect to my previously indexed database?

docker command not working

I was trying to install AquilaDB using the docker file. I tried the following command

docker build -t ammaorg/aquiladb:latest .

It was showing the following error

unable to prepare context: unable to evaluate symlinks in Dockerfile path
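This error usually means Docker cannot find a Dockerfile in the current build context, for example when the command is run outside the repository directory. A sketch of building from the repository root (assuming the Dockerfile lives there):

```shell
# Clone the repository and build from its root, where the Dockerfile lives.
git clone https://github.com/a-mma/AquilaDB.git
cd AquilaDB
docker build -t ammaorg/aquiladb:latest .
```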

a b64/utf8 encoding issue

Hi, AquilaDB seems really really neat and a terrific tool. Thanks for building it!

I worked through the Google USE / Python example and am now trying to adapt it to my use case, but am finding persistent encoding issues on document retrieval. For instance, one of the documents contains a right single quotation mark, U+2019. This is read in correctly and written correctly to the CouchDB document store (I checked via the CouchDB interface). However, a db.getNearest query's response contains \x19 there instead, which isn't a valid character in JSON and causes a mess.

The issue is between btoa/atob (which I think assume UTF-16 strings?) and the response out of pouchdb (UTF-8) in this file.

Here's a minimal example, though you'll have to adjust the document ID and the slice indices to match your document containing the problematic character, of course.

const atob = require('atob');
const btoa = require('btoa');
var PouchDB = require('pouchdb');
var db = new PouchDB('http://localhost:5984/default_docsdb')
q = db.allDocs({include_docs: true, keys: ['3c80fca415c221bf3702e055c055c21f']}).then((a) => { return a})
let resp = null
q.then((a) => resp = a)

Here's a (portion of) a document from PouchDB that needs to get transmitted to the client, e.g. via the Python library.

> JSON.stringify(resp.rows).slice(289, 296)
'today’s'

The problem is that btoa mis-encodes it:

> atob(btoa(JSON.stringify(resp.rows).slice(289, 296)))
'today\u0019s'

One solution: js-base64.

> Base64.decode(Base64.encode(JSON.stringify(resp.rows).slice(289, 296)));
'today’s'

One solution that's apparently not a good one:

> decodeURIComponent(atob(btoa(encodeURIComponent(JSON.stringify(resp.rows).slice(289, 296)))))
'today’s'

[BUG] Kubernetes deploy command fails

Describe the bug
error: error parsing https://github.com/a-mma/AquilaDB/blob/develop/kubernetes/aquiladb.yml: error converting YAML to JSON: yaml: line 115: mapping values are not allowed in this context

To Reproduce
Run this command: kubectl apply -f https://github.com/a-mma/AquilaDB/blob/develop/kubernetes/aquiladb.yml
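A likely cause, though this is an inference from the error rather than a confirmed diagnosis: the blob URL serves GitHub's HTML page, not the YAML file itself, and the HTML fails to parse as YAML. Pointing kubectl at the raw file should avoid that:

```shell
# Use the raw file URL instead of the HTML "blob" page.
kubectl apply -f https://raw.githubusercontent.com/a-mma/AquilaDB/develop/kubernetes/aquiladb.yml
```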

Expected behavior
Successful launch of aquiladb as kubernetes service


persisting data

Hi @freakeinstein, it's me again! :)

How do I get AquilaDB to persist the FAISS db to disk? I'm trying to be able to restart the underlying AWS instance (and thus the AquilaDB Docker container) and have my data persist.

It doesn't seem like the data does persist. Even after I set up a working example, no file ever shows up in /data/VDB. Is this a bug? Am I missing something?

I'm running off of the master branch (having implemented the b64-related change myself).

Thanks!
