Comments (9)
Not sure I get the report. Are you saying that even after reverting, the latency didn't go back? 🤔 What does thanos_receive_forward_delay_seconds look like? The number of goroutines? Is everything OK with ingesters (grpc_server_handling_seconds_bucket{grpc_method="RemoteWrite"})?
from thanos.
Actually, I suspect this is specific to our setup because we use a multi-AZ hashring. I will debug more with tracing:
[
  {
    "endpoints": [
      {
        "address": "thanos-receive-rep0-0.thanos-receive-svc:10901",
        "az": "zone-0"
      },
      {
        "address": "thanos-receive-rep0-1.thanos-receive-svc:10901",
        "az": "zone-0"
      },
      {
        "address": "thanos-receive-rep1-0.thanos-receive-svc:10901",
        "az": "zone-1"
      },
      {
        "address": "thanos-receive-rep1-1.thanos-receive-svc:10901",
        "az": "zone-1"
      },
      {
        "address": "thanos-receive-rep2-0.thanos-receive-svc:10901",
        "az": "zone-2"
      },
      {
        "address": "thanos-receive-rep2-1.thanos-receive-svc:10901",
        "az": "zone-2"
      }
    ],
    "hashring": "thanos-receive"
  }
]
We see large forward delays, but gRPC latency is low:
Is it because there are not enough workers due to #7045, so requests keep queueing?
I've used 10 x CPU cores as the number of workers; maybe that's not enough? I also got some tracing results:
I increased the worker count to 3000. In the newer version, the requests now run sequentially:
I think I found the bug: this RemoteWriteAsync operation isn't parallel but sequential, because res := <-w.workResult blocks the caller until the worker has finished:
    // Do the writes to remote nodes. Run them all in parallel.
    for writeDestination := range remoteWrites {
        h.sendRemoteWrite(ctx, params.tenant, writeDestination, remoteWrites[writeDestination], params.alreadyReplicated, responses, wg)
    }

    func (p *peerWorker) RemoteWriteAsync(ctx context.Context, req *storepb.WriteRequest, er endpointReplica, seriesIDs []int, responseWriter chan writeResponse, cb func(error)) {
        p.initWorkers()
        w := peerWorkItem{
            cc:          p.cc,
            req:         req,
            workResult:  make(chan peerWorkResponse, 1),
            workItemCtx: ctx,
            er:          er,
            sendTime:    time.Now(),
        }
        p.work <- w
        res := <-w.workResult
        responseWriter <- newWriteResponse(seriesIDs, res.err, er)
        cb(res.err)
    }
Hi @yeya24 and @GiedriusS, I've submitted a fix and would appreciate your review: #7267
@jnyi there's one more conflict to solve in the PR, FYI. You were also pinged there.