Comments (8)
@jpountz I will do that. But I will keep the tests that exercise the parallelism, to ensure we are covered when things are enabled in the future.
from lucene.
git bisect points to this commit: b940511
Intra merge concurrency is causing a race condition here it seems? I can debug where.
Reverting point value parallelism fixes this bug. This tells me that the way we merge point values from multiple threads is busted. I will see if there is a quick fix. If there isn't, I vote we do not merge points in parallel.
Ah, looking at the `clone` code for merge state @jpountz:
    for (int i = 0; i < storedFieldsReaders.length; ++i) {
      if (storedFieldsReaders[i] != null) {
        storedFieldsReaders[i] = storedFieldsReaders[i].getMergeInstance();
      }
      if (termVectorsReaders[i] != null) {
        termVectorsReaders[i] = termVectorsReaders[i].getMergeInstance();
      }
      if (normsProducers[i] != null) {
        normsProducers[i] = normsProducers[i].getMergeInstance();
      }
      if (docValuesProducers[i] != null) {
        docValuesProducers[i] = docValuesProducers[i].getMergeInstance();
      }
      if (fieldsProducers[i] != null) {
        fieldsProducers[i] = fieldsProducers[i].getMergeInstance();
      }
      if (pointsReaders[i] != null) {
        pointsReaders[i] = pointsReaders[i].getMergeInstance();
      }
      if (knnVectorsReaders[i] != null) {
        knnVectorsReaders[i] = knnVectorsReaders[i].getMergeInstance();
      }
    }
This assumes `getMergeInstance()` does something. But points simply returns `this`.
I am thinking that sharing a reader between threads is always a bad idea.
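To illustrate the concern, here is a minimal sketch of why a `getMergeInstance()` that returns `this` leaves merge workers sharing mutable state. `StatefulReader`, its `cursor` field, the `cloneOnMerge` flag, and `firstValueSeenByB` are all hypothetical names made up for this example, not Lucene classes, and the interleaving is simulated on one thread so the outcome is deterministic.

```java
public class MergeInstanceSketch {

    /** Hypothetical stand-in for a stateful reader; NOT a Lucene class. */
    static class StatefulReader {
        private final int[] values;
        private int cursor = 0; // mutable read position: the state that must not be shared
        private final boolean cloneOnMerge; // assumption: toggles between the two behaviours

        StatefulReader(int[] values, boolean cloneOnMerge) {
            this.values = values;
            this.cloneOnMerge = cloneOnMerge;
        }

        /** Returns either a fresh reader with its own cursor, or this same object. */
        StatefulReader getMergeInstance() {
            return cloneOnMerge ? new StatefulReader(values, true) : this;
        }

        int next() {
            return values[cursor++];
        }
    }

    /** Simulates two merge workers interleaving reads; returns what worker B reads first. */
    static int firstValueSeenByB(StatefulReader source) {
        StatefulReader a = source.getMergeInstance();
        StatefulReader b = source.getMergeInstance();
        a.next();        // worker A consumes one value
        return b.next(); // worker B expects to start from the first value
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        // Shared instance: B starts where A left off.
        System.out.println(firstValueSeenByB(new StatefulReader(data, false))); // prints 20
        // Independent instances: B starts from the beginning.
        System.out.println(firstValueSeenByB(new StatefulReader(data, true)));  // prints 10
    }
}
```

With real threads the shared case would not even be this predictable, since the two workers would race on the same cursor.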
Wow, thanks for finding this, it's indeed broken. I'll look into it.
@jpountz given the feature freeze of 9.12, what do you think of disabling intra-merge parallelism for everything :/ and then enabling it for one thing at a time in the future, as wrinkles are worked out?
Agreed, this sounds safer with 9.12 around the corner.