
Comments (3)

mschoch commented on September 24, 2024

Yeah, you're right, Java is also garbage collected, so perhaps my explanation is wrong or too simplistic. But at a minimum, search-benchmark-game would benefit from some variance/stddev metric to give a sense of whether or not the measurements had converged to something meaningful. When search-benchmark-game first reported that Bleve v2 was slower than v1, I spent a week trying to answer why, and eventually gave up. When I reduced the workload to a single query at a time, I would often see the results reverse, with v2 being faster. But the whole thing was inconsistent, and we were never able to reliably separate the signal from the noise. I documented some of my approach and findings here: blevesearch/bleve#1550

Basically, up till now, we've been focused on making the indexing time and final index size comparable to Lucene. And while we're still not all the way there, we're close enough now to expect that our index size is no longer the only explanation for poor search performance. Obviously additional research into this is needed.

Just to be clear, for a Bluge v1 release, I'd like Bluge to perform as well as Bleve v2. However, matching Lucene's performance is a long-term goal that we will continue to work on.

from bluge.

mschoch commented on September 24, 2024

First, yes absolutely Bluge should perform the same or better than Bleve v2 (upon which it is largely based). There are 2 reasons why today it may not:

  1. An important optimization for non-scoring queries was removed from Bluge. The reason is that I think Bleve got the design wrong, and I'd like to improve that in Bluge.
  2. Bleve/Bluge search has many tight loops, and it's very likely we introduced some unintentional perf regressions along the way while refactoring the code. These tight loops have the effect of magnifying a small change into a large one.

Regarding search-benchmark-game, I love the idea, and I've spent weeks working with it. Unfortunately, I came to the conclusion that the design of search-benchmark-game is problematic for comparing Bleve and Bluge. The trouble I ran into is that it runs a mixed workload of different query types in the same process. It then attempts to measure them independently and present the results. Unfortunately, because Go is garbage collected, there are often cases where the runtime may be performing work related to previous queries. I was able to modify the way I ran search-benchmark-game to focus on single queries, and while that helped significantly, we were still unable to get consistent, reproducible results. (NOTE: running mixed query loads also makes it harder to interpret the pprof output collected.)

As we are going to need some sort of new perf test framework to validate this Bluge vs Bleve comparison, I think it will certainly be inspired by search-benchmark-game (and use the same dataset). But I suspect it will be more specific to the Bluge vs Bleve comparison to simplify things.


prabhatsharma commented on September 24, 2024

Any specific reason(s) you think Bleve/Bluge does not perform as well as Lucene (Java, also a garbage-collected language) in the benchmark, considering it follows a similar approach to indexing?

