Comments (3)
Yeah, you're right, Java is also garbage collected, so perhaps my explanation is wrong or too simplistic. But at a minimum, search-benchmark-game would benefit from some variance/stddev metric to give a sense of whether the measurements have converged to something meaningful. When search-benchmark-game first reported that Bleve v2 was slower than v1, I spent a week trying to answer why, and eventually gave up. When I reduced the workload to a single query at a time, I would often see the results reverse, with v2 being faster. But the whole thing was inconsistent, and we were never able to reliably separate the signal from the noise. I documented some of my approach/findings here: blevesearch/bleve#1550
Basically, up till now we've been focused on making the indexing time and final index size comparable to Lucene's. And while we're still not all the way there, we're close enough now that index size can no longer be the only explanation for poor search performance. Obviously, additional research into this is needed.
Just to be clear, for a Bluge v1 release, I'd like Bluge to perform as well as Bleve v2. However, matching Lucene's performance is a long-term goal that we will continue to work on.
from bluge.
First, yes absolutely Bluge should perform the same or better than Bleve v2 (upon which it is largely based). There are 2 reasons why today it may not:
- An important optimization for non-scoring queries was removed from Bluge. The reason is that I think Bleve got the design wrong, and I'd like to improve that in Bluge.
- Bleve/Bluge search has many tight loops, and it's very likely we introduced some unintentional perf regressions along the way while refactoring the code. These tight loops have the effect of magnifying a small change into large effects.
Regarding search-benchmark-game, I love the idea, and I've spent weeks of time working with it. Unfortunately, I came to the conclusion that the design of search-benchmark-game is problematic for comparing Bleve and Bluge. The trouble I ran into is that it runs a mixed workload of different query types in the same process. It then attempts to measure them independently and present the results. Unfortunately, because Go is garbage collected, there are often cases where the runtime may be performing work related to previous queries. I was able to modify the way I ran search-benchmark-game to focus on single queries, and while that helped significantly, we were still unable to get consistent, reproducible results. (NOTE: running mixed query loads also makes the collected pprof output harder to interpret)
As we are going to need some sort of new perf test framework to validate this Bluge vs. Bleve comparison, I think it will certainly be inspired by search-benchmark-game (and use the same dataset). But I suspect it will be more specific to the Bluge vs. Bleve comparison, to simplify things.
from bluge.
Are there any specific reasons you think Bleve/Bluge does not perform as well as Lucene (Java, also a garbage-collected language) in the benchmark, considering it follows a similar approach to indexing?
from bluge.
Related Issues (20)
- panic while merging in unit test HOT 5
- ice v2 data race HOT 6
- Comparison with Bleve and others HOT 1
- index out of range when visiting stored fields HOT 4
- Date aggregations support HOT 2
- TestBug87 fails in custom implementation of search.Context HOT 2
- Question on aggregation bucket HOT 2
- makeslice len out of range
- Define logger interface
- multi index search
- Example of indexing a document with tags? HOT 1
- Difference between a NewTextField() and NewKeywordField()
- Sorting by ascending order of _score
- Indexing/Analyzing URLs, Email Addresses, etc?
- Indexing/Querying Emojis
- Is there a way to use this library more as a caching layer?
- Concurrently close writer panic HOT 1
- Pre-query for getting terms list.
- index out of range panic
- Memory Size or Limiting Memory usage