Comments (9)

GiedriusS avatar GiedriusS commented on July 21, 2024

Not sure I get the report. Are you saying that even after reverting, the latency didn't go back down? 🤔 What does thanos_receive_forward_delay_seconds look like? What about the number of goroutines? Is everything OK with the ingesters (grpc_server_handling_seconds_bucket{grpc_method="RemoteWrite"})?

from thanos.

jnyi avatar jnyi commented on July 21, 2024

Actually, I suspect this is specific to our setup because we use a multi-AZ hashring. I will debug more with tracing:

[
    {
        "endpoints": [
            {
                "address": "thanos-receive-rep0-0.thanos-receive-svc:10901",
                "az": "zone-0"
            },
            {
                "address": "thanos-receive-rep0-1.thanos-receive-svc:10901",
                "az": "zone-0"
            },
            {
                "address": "thanos-receive-rep1-0.thanos-receive-svc:10901",
                "az": "zone-1"
            },
            {
                "address": "thanos-receive-rep1-1.thanos-receive-svc:10901",
                "az": "zone-1"
            },
            {
                "address": "thanos-receive-rep2-0.thanos-receive-svc:10901",
                "az": "zone-2"
            },
            {
                "address": "thanos-receive-rep2-1.thanos-receive-svc:10901",
                "az": "zone-2"
            }
        ],
        "hashring": "thanos-receive"
    }
]

jnyi avatar jnyi commented on July 21, 2024

We see large forward delays, but gRPC latency is low:

Screenshot 2024-04-08 at 2 49 08 PM

yeya24 avatar yeya24 commented on July 21, 2024

Is it because there are not enough workers due to #7045, so requests keep queueing?

jnyi avatar jnyi commented on July 21, 2024

I've used 10 x CPU cores as the number of workers; maybe that's not enough? I also got some tracing results:

Before in v0.34:
Screenshot 2024-04-08 at 4 27 37 PM

After in v0.35:
Screenshot 2024-04-08 at 4 27 28 PM

jnyi avatar jnyi commented on July 21, 2024

I increased the worker count to 3000, and the requests are still sent sequentially in the newer version:

Screenshot 2024-04-08 at 4 57 43 PM Screenshot 2024-04-08 at 4 57 22 PM

jnyi avatar jnyi commented on July 21, 2024

I think I found the bug: despite its name, RemoteWriteAsync runs sequentially rather than in parallel, because the caller blocks on res := <-w.workResult before the loop can submit the next write:

	// Do the writes to remote nodes. Run them all in parallel.
	for writeDestination := range remoteWrites {
		h.sendRemoteWrite(ctx, params.tenant, writeDestination, remoteWrites[writeDestination], params.alreadyReplicated, responses, wg)
	}

func (p *peerWorker) RemoteWriteAsync(ctx context.Context, req *storepb.WriteRequest, er endpointReplica, seriesIDs []int, responseWriter chan writeResponse, cb func(error)) {
	p.initWorkers()

	w := peerWorkItem{
		cc:          p.cc,
		req:         req,
		workResult:  make(chan peerWorkResponse, 1),
		workItemCtx: ctx,
		er:          er,

		sendTime: time.Now(),
	}

	p.work <- w
	res := <-w.workResult

	responseWriter <- newWriteResponse(seriesIDs, res.err, er)
	cb(res.err)
}

jnyi avatar jnyi commented on July 21, 2024

Hi @yeya24 and @GiedriusS, I've submitted a fix and would appreciate your review: #7267

douglascamata avatar douglascamata commented on July 21, 2024

@jnyi there's one more conflict to solve in the PR, FYI. You were also pinged there.
