Comments (6)

alexklibisz commented on May 13, 2024

Hi, this shows how I've gotten this type of combined query working in the past: http://elastiknn.klibisz.com/api/#running-nearest-neighbors-query-on-a-filtered-subset-of-documents. If that doesn't fit your use case, can you post some example docs to try your query against?

from elastiknn.

alexklibisz commented on May 13, 2024

@joseph-macraty I happened to run into an issue with the combined query while working on something else. My original example in the docs technically works, but it actually evaluates the knn query over all of the docs instead of just the ones matching the filter. I updated the example linked above so that it only runs knn on the docs matching the filter.
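
As a rough illustration of the distinction described above, here is a minimal plain-Python sketch. It is not Elastiknn's API or implementation; the `filtered_knn` helper, the `color` field, and the sample documents are all hypothetical. It only shows the idea of applying the filter first, so that similarity is computed for the matching docs alone.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_knn(docs, query_vec, predicate, k):
    """Exact knn restricted to docs that pass the filter (hypothetical helper).

    Returns the ids of the top-k matches and the number of docs scored.
    """
    # Apply the filter first so that similarity is never computed
    # for documents outside the subset.
    candidates = [d for d in docs if predicate(d)]
    scored = [(cosine_similarity(d["vec"], query_vec), d["id"]) for d in candidates]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]], len(candidates)


# Hypothetical sample documents: a keyword field plus a small vector.
docs = [
    {"id": "a", "color": "blue", "vec": [1.0, 0.0]},
    {"id": "b", "color": "red", "vec": [0.9, 0.1]},
    {"id": "c", "color": "blue", "vec": [0.0, 1.0]},
    {"id": "d", "color": "blue", "vec": [0.7, 0.7]},
]

top_ids, num_scored = filtered_knn(
    docs, [1.0, 0.0], lambda d: d["color"] == "blue", k=2
)
print(top_ids, num_scored)
```

Here only the three "blue" docs are scored; the "red" doc is skipped even though its vector is close to the query.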

joseph-macraty commented on May 13, 2024

Hi @alexklibisz,
Thanks! It's working perfectly now :)

alexklibisz commented on May 13, 2024

@joseph-macraty If I may ask, what's your use case for Elastiknn? (Just trying to get a sense of how people are using it in practice, since I don't use Elasticsearch at work myself.)

joseph-macraty commented on May 13, 2024

We are working on text-based search engines for different use cases. Here's how we are currently using Elastiknn:

We had developed a couple of BERT-based models for search and were testing them out with Elastic Cloud. We were satisfied with them and wanted to use them in production. We initially thought we couldn't use ES because it did not support approximate vector search. We explored other options like Faiss/Annoy, but adopting them would have meant modifying a non-trivial amount of our existing pipeline/codebase (we were already using ES). They also added a significant computational cost.

That is when I came across one of your comments on the ES repo. It was really smooth to set up and we got up and running in <1hr. Our current index has about 5.8 million documents (768-dimensional vectors), and on an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz with 8 GB RAM, we have an average query time of 1.56s. Recall has been great too, with only a minor drop.

All in all, I think unless you have hundreds of millions of documents or require really fast search (and can afford the added effort and computational resources), Elastiknn is the best option. For a large number of use cases, Elastiknn could replace Faiss/Annoy. In our initial searches, though, Elastiknn never came up and we could easily have missed it. Is there any way to increase the visibility of this excellent repo? I think semantic search is an upcoming field, so there are only a few blogs on it, and all of them use the other, more popular ANN implementations. If there are any ways we can contribute (writing blogs?), we would love to!

alexklibisz commented on May 13, 2024

That's all great to hear. Great motivation for me to keep chipping away at this. Mind if I ask what company you're at?

The original source of this idea was a very similar problem to the one you described. That was at an old job, and I no longer have the problem day-to-day, but I got a lot better with Java/Scala/Gradle/etc. in my most recent job, so I've given this another pass.

In terms of visibility, I'm planning to do an "Introducing Elastiknn"-style blog post. The plan has been to do that after I get it integrated with the ann-benchmarks project. That seems to be table stakes for any ANN solution nowadays. It's been tough because the JVM is painfully slow compared to all of the C/C++/in-memory implementations used in that project. I'm pretty confident I can speed up one remaining bottleneck, and that will make a big difference. Then once it's merged into ann-benchmarks I'll do a more celebratory writeup on Medium or something.
