Comments (6)
Hi, this shows how I've gotten this type of combination query working in the past: http://elastiknn.klibisz.com/api/#running-nearest-neighbors-query-on-a-filtered-subset-of-documents. If that doesn't fit your use case, can you post some example docs to try your query?
@joseph-macraty I happened to run into an issue with the combined query while working on something else. I found that my original example in the docs technically works, but it actually evaluates over all of the docs instead of just the ones matching the filter. I updated the example linked above so that it only runs knn on the docs matching the filter.
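The actual query syntax is in the docs linked above. To illustrate the difference described here, the following is a minimal Python sketch, not the Elastiknn or Elasticsearch API: the sample docs, the `color` field, and the functions `filtered_knn` and `score_all_then_filter` are hypothetical. Both approaches return the same top-k ids, but the second one computes a similarity for every doc in the index before discarding the non-matching ones.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sample docs: a keyword field to filter on and a dense vector.
DOCS: List[Dict] = [
    {"id": "a", "color": "blue", "vec": [1.0, 0.0]},
    {"id": "b", "color": "red", "vec": [0.9, 0.1]},
    {"id": "c", "color": "blue", "vec": [0.0, 1.0]},
    {"id": "d", "color": "blue", "vec": [0.7, 0.7]},
]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_knn(docs: List[Dict], query: Sequence[float],
                 predicate: Callable[[Dict], bool], k: int) -> Tuple[List[str], int]:
    """Filter first, then score only the matching docs.

    Returns the top-k ids and the number of similarities computed.
    """
    matches = [d for d in docs if predicate(d)]
    scored = [(cosine(query, d["vec"]), d["id"]) for d in matches]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]], len(matches)


def score_all_then_filter(docs: List[Dict], query: Sequence[float],
                          predicate: Callable[[Dict], bool], k: int) -> Tuple[List[str], int]:
    """The pitfall: score every doc, then drop the non-matching ones.

    Same top-k ids, but the similarity work grows with the whole index.
    """
    scored = [(cosine(query, d["vec"]), d) for d in docs]
    kept = [(s, d["id"]) for s, d in scored if predicate(d)]
    kept.sort(key=lambda t: t[0], reverse=True)
    return [doc_id for _, doc_id in kept[:k]], len(scored)


if __name__ == "__main__":
    def is_blue(d: Dict) -> bool:
        return d["color"] == "blue"

    print(filtered_knn(DOCS, [1.0, 0.0], is_blue, k=2))           # (['a', 'd'], 3)
    print(score_all_then_filter(DOCS, [1.0, 0.0], is_blue, k=2))  # (['a', 'd'], 4)
```

On four docs the gap is one extra similarity; on an index with millions of docs and a selective filter, the second approach does almost all of its work on docs that are thrown away.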
Hi @alexklibisz,
Thanks! It's working perfectly now :)
@joseph-macraty If I may ask, what's your use case for Elastiknn? (Just trying to get a sense of how people are using it in practice, since I don't use Elasticsearch at work myself.)
We are working on text-based search engines for different use cases. Here's how we are currently using Elastiknn:
We had developed a couple of BERT-based models for search and were testing them out with Elastic Cloud. We were satisfied with them and wanted to use them in production. We initially thought we couldn't use ES because it did not support approximate vector search. We explored other options like Faiss/Annoy, but those required modifying a non-trivial amount of our existing pipeline and codebase (we were already using ES). They also added significant computational cost.
That is when I came across one of your comments on the ES repo. It was really smooth to set up, and we were up and running in under an hour. Our current index has about 5.8 million documents (768-dimensional vectors), and on an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz with 8GB RAM, our average query time is 1.56s. Recall has been great too, with only a minor drop.
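The comment doesn't say how recall was measured. One common definition is recall@k: the fraction of the exact top-k neighbors (for example, from an exact-similarity query over the same docs) that the approximate query also returns in its top k. Below is a minimal Python sketch, assuming you already have the ranked doc ids from both queries; `recall_at_k` and `mean_recall_at_k` are hypothetical helper names, not part of Elastiknn.

```python
from typing import List, Sequence


def recall_at_k(exact_ids: Sequence[str], approx_ids: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must not be empty")
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)


def mean_recall_at_k(exact_runs: List[Sequence[str]],
                     approx_runs: List[Sequence[str]], k: int) -> float:
    """Average recall@k over a batch of queries (one ranked id list per query)."""
    if not exact_runs or len(exact_runs) != len(approx_runs):
        raise ValueError("need the same, non-zero number of exact and approximate runs")
    total = sum(recall_at_k(e, a, k) for e, a in zip(exact_runs, approx_runs))
    return total / len(exact_runs)


if __name__ == "__main__":
    # Hypothetical ranked ids from an exact query and an approximate query.
    exact = ["a", "b", "c", "d"]
    approx = ["a", "c", "x", "b"]
    print(recall_at_k(exact, approx, k=3))  # 2 of the 3 true neighbors were found
    print(recall_at_k(exact, approx, k=4))  # 0.75
```

Averaging over a few hundred representative queries gives a more stable number than spot-checking individual ones.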
All in all, I think that unless you have hundreds of millions of documents or require really fast search (and can afford the added effort and computational resources), Elastiknn is the best option. For a large number of use cases, Elastiknn could replace FAISS/Annoy. In our initial searches, though, Elastiknn never came up, and we could easily have missed it. Is there any way to increase the visibility of this excellent repo? I think semantic search is an emerging field, so there are only a few blogs on it, and all of them use the other, more popular ANN implementations. If there are any ways we can contribute (writing blogs?), we would love to!
That's all great to hear. Great motivation for me to keep chipping away at this. Mind if I ask what company you're at?
The original source of this idea was a very similar problem to the one you described. That was at an old job, and I no longer have the problem day-to-day, but I got a lot better with Java/Scala/Gradle/etc. in my most recent job, so I've given this another pass.
In terms of visibility, I'm planning to do an "Introducing Elastiknn"-style blog post. The plan has been to do that after I get it integrated with the ann-benchmarks project, which seems to be table stakes for any ANN solution nowadays. It's been tough because the JVM is painfully slow compared to all of the C/C++/in-memory implementations used in that project. I'm pretty confident I can speed up one remaining bottleneck, and that will make a big difference. Then, once it's merged into ann-benchmarks, I'll do a more celebratory writeup on Medium or something.
Related Issues (20)
- Cross-build for Elasticsearch 7.x and 8.x
- Stop publishing Scala and Java libraries
- Migrate to Scala 3
- JAVA api
- RecallSuite tests are extremely slow in Github Actions
- Adding elastiknn as an extension in the Elastic cloud fails with releases 8.4.2.1 and 8.4.3.0
- Migrate documentation site to github pages
- Integrate with Coveralls for test coverage
- Try PyLucene for ann-benchmarks implementation
- Upgrade ann-benchmarks to 8.6.2 (or latest)
- Try Vectors from Project Panama for vector similarity computations
- Plugin [.installing-18148280304972249747] is missing a descriptor properties file
- Run benchmarks in Github Actions on a standalone EC2 instance
- Try vectors from Project Panama for LSH operations
- can't create a mapping
- Try quick select algorithm for KthGreatest implementation
- Try resampling vectors to speed up L2LshModel
- Try getting rid of HashAndFreq to minimize allocations
- Try re-using threadlocal arrays in ArrayHitCounter
- Try caching the query vector's FloatVector segments when computing distance