
Comments (3)

Kerollmops commented on May 17, 2024

OK, so after 14 days of intensive reflection (lol), I found a way to reduce the timings of the query_all function: remove the HashMap<DocumentId, Vec<Match>> and replace it with a Vec<(DocumentId, Match)> that is sorted in parallel using rayon.
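To illustrate the idea, here is a minimal sketch of the flat-vector approach. It is not the engine's actual code: `DocumentId` is assumed to be a plain integer, the `Match` fields are hypothetical stand-ins, and the sort uses the standard library so the snippet has no dependencies. With rayon, the same call would be `par_sort_unstable_by_key` from `rayon::prelude::ParallelSliceMut`.

```rust
/// Hypothetical stand-ins for the engine's types; the real `DocumentId`
/// and `Match` carry more information than shown here.
pub type DocumentId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Match {
    pub distance: u8,
    pub is_exact: bool,
}

/// Sort the flat list so that all matches of a document become adjacent.
/// This replaces the per-document `Vec<Match>` buckets of the HashMap.
pub fn sort_matches(matches: &mut [(DocumentId, Match)]) {
    // Sequential std sort, used here as a stand-in. With rayon this would be:
    // `matches.par_sort_unstable_by_key(|(id, _)| *id);`
    matches.sort_unstable_by_key(|(id, _)| *id);
}

/// Walk the sorted slice and return one contiguous run per document.
pub fn group_by_document(
    sorted: &[(DocumentId, Match)],
) -> Vec<(DocumentId, &[(DocumentId, Match)])> {
    let mut groups = Vec::new();
    let mut start = 0;
    while start < sorted.len() {
        let id = sorted[start].0;
        // Count how many consecutive entries share this document id.
        let len = sorted[start..]
            .iter()
            .take_while(|(other, _)| *other == id)
            .count();
        groups.push((id, &sorted[start..start + len]));
        start += len;
    }
    groups
}
```

The point of the design is that one large allocation and one sort replace many small per-document allocations and hash lookups, which is probably where most of the query_all time went.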

The resulting vec is then aggregated into 7 vecs, each holding a single data type (e.g. all distances, all exact flags), following the data-oriented design previously developed for the engine.
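A rough sketch of that column-oriented layout follows. Only two of the seven columns are shown (distances and exact flags, the two named above). The struct name `Matches`, the field names, and the `documents` range index are assumptions made for illustration, not the engine's real definitions.

```rust
use std::ops::Range;

/// Hypothetical stand-ins for the engine's types.
pub type DocumentId = u64;

#[derive(Debug, Clone, Copy)]
pub struct Match {
    pub distance: u8,
    pub is_exact: bool,
}

/// Column-oriented storage: one vector per field instead of one vector
/// of structs. The remaining columns are omitted in this sketch.
#[derive(Debug, Default)]
pub struct Matches {
    pub distances: Vec<u8>,
    pub exacts: Vec<bool>,
    /// For each document, the range of its matches inside the columns.
    pub documents: Vec<(DocumentId, Range<usize>)>,
}

impl Matches {
    /// Build the columns from matches already sorted by document id.
    pub fn from_sorted(sorted: &[(DocumentId, Match)]) -> Matches {
        let mut out = Matches::default();
        for (index, (id, m)) in sorted.iter().enumerate() {
            out.distances.push(m.distance);
            out.exacts.push(m.is_exact);
            // Extend the current document's range, or open a new one.
            let same_doc = matches!(out.documents.last(), Some((last_id, _)) if last_id == id);
            if same_doc {
                out.documents.last_mut().unwrap().1.end = index + 1;
            } else {
                out.documents.push((*id, index..index + 1));
            }
        }
        out
    }

    /// A criterion-style scan that only touches the `distances` column.
    pub fn sum_of_typos(&self, range: &Range<usize>) -> u32 {
        self.distances[range.clone()]
            .iter()
            .map(|d| u32::from(*d))
            .sum()
    }
}
```

A criterion such as SumOfTypos then reads one tightly packed column per document range instead of striding over whole `Match` structs, which is the usual argument for this kind of layout.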

Here are the before/after performance logs of the search engine when searching for "s" with the default-fields.toml schema.

Before:

Searching for: s

97360 total documents to classify
677358 total matches to classify

query_all took 88.97 ms

criterion SumOfTypos, documents group of size 97360
criterion SumOfTypos sort took 5.57 ms

criterion NumberOfWords, documents group of size 97360
criterion NumberOfWords sort took 6.42 ms

criterion WordsProximity, documents group of size 97360
criterion WordsProximity sort took 7.53 ms

criterion SumOfWordsAttribute, documents group of size 97360
criterion SumOfWordsAttribute sort took 25.56 ms

criterion SumOfWordsPosition, documents group of size 50898
criterion SumOfWordsPosition sort took 3.23 ms

criterion Exact, documents group of size 50898
criterion Exact sort took 4.70 ms

criterion DocumentId, documents group of size 50898
criterion DocumentId sort took 6.85 ms

Found 4 results in 152.88 ms

After:

Searching for: s

97360 total documents to classify
677358 total matches to classify

query_all took 32.94 ms

criterion SumOfTypos, documents group of size 97360
criterion SumOfTypos sort took 3.56 ms

criterion NumberOfWords, documents group of size 97360
criterion NumberOfWords sort took 3.06 ms

criterion WordsProximity, documents group of size 97360
criterion WordsProximity sort took 4.35 ms

criterion SumOfWordsAttribute, documents group of size 97360
criterion SumOfWordsAttribute sort took 9.23 ms

criterion SumOfWordsPosition, documents group of size 50898
criterion SumOfWordsPosition sort took 1.75 ms

criterion Exact, documents group of size 50898
criterion Exact sort took 2.35 ms

criterion DocumentId, documents group of size 50898
criterion DocumentId sort took 3.64 ms

Found 4 results in 61.06 ms

It seems to be a success: roughly a 2.5x improvement overall (152.88 ms down to 61.06 ms). Note that this uses multithreading. The rayon library is nicely designed and uses a pool of threads, but it could have an impact on the number of concurrent HTTP requests we can serve.

I still need to port the criterion tests from the old version to the new one.


Kerollmops commented on May 17, 2024

Working on a simple solution leads to good timings (branch data-oriented).

97360 total documents to classify
626460 total matches to classify

query_all took 106.18 ms

criterion SumOfTypos,               documents group of size 97360
criterion SumOfTypos                sort took 4.36 ms

criterion NumberOfWords,            documents group of size 97360
criterion NumberOfWords             sort took 3.51 ms

criterion WordsProximity,           documents group of size 97360
criterion WordsProximity            sort took 1.76 ms

criterion SumOfWordsAttribute,      documents group of size 97360
criterion SumOfWordsAttribute       sort took 10.47 ms

criterion SumOfWordsPosition,       documents group of size 33657
criterion SumOfWordsPosition        sort took 5.97 ms

criterion Exact,                    documents group of size 16708
criterion Exact                     sort took 882.94 μs

criterion DocumentId,               documents group of size 16708
criterion DocumentId                sort took 1.39 ms


Kerollmops commented on May 17, 2024

After many hours of reflection, I have not found a way to fix the significant overhead of query_all.
