Our requests limiter works like a semaphore now. The requests are not processed in a F

I can prove this with an Etch-A-Sketch (so to speak): <a href="https://github.com/gunn

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Here's an illustration of what happens <a href="https://play.golang.org/p/V4riLAtcUAq"

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Unlucky requests can get needlessly dropped about carbonapi HOT 11 CLOSED

bookingcom commented on May 19, 2024

Unlucky requests can get needlessly dropped

from carbonapi.

Comments (11)

gunnihinn commented on May 19, 2024 2

I can prove this with an Etch-A-Sketch (so to speak): https://github.com/gunnihinn/queuesim

from carbonapi.

ericherman commented on May 19, 2024 2

@gunnihinn I would be interested in seeing a ring-buffer added to the model.

Ring-buffers are sometimes found in soft-realtime applications like audio streams because during periods of overload, the dropped data is experienced as a "hickup" in the stream, but the content continues to (largely) be sensible. By contrast, dropping data non-contiguously is more likely to create nonsense and noise. Perhaps using a ring-buffer would be a sensible approach to shedding in this case?

(Another aspect which can be an advantage is that the buffer is allocated at the start and remains constant size for the duration of execution, however I guess that is probably less important here.)

from carbonapi.

gunnihinn commented on May 19, 2024 1

You actually don't want a FIFO queue.

Under normal, happy conditions it doesn't matter what type of queue you have because almost everything gets served.

When you're under heavy load and need to start shedding some to survive, using a FIFO queue can cause you to time out almost all the requests sitting in your queue if you pick a few heavy requests from it in a row. If you instead pick requests randomly, or using a FILO queue, you increase the expected number of requests you can serve even under that load.

IIRC, the first Google SRE book has some discussion of this in one of the chapters on load shedding or cascading failures. Andre Vereira at Booking also knows some stuff about this if you want a second face-to-face opinion.

from carbonapi.

grzkv commented on May 19, 2024

Here's an illustration of what happens https://play.golang.org/p/V4riLAtcUAq

from carbonapi.

ksurent commented on May 19, 2024

If you're looking to optimise throughput in scenarios with many concurrent requests, there're some systems out there that adopted LIFO queues. Could be interesting to see if this strategy can be useful here.

from carbonapi.

grzkv commented on May 19, 2024

@ksurent Thanks for the suggestions. We will be able to compare after we do the refactoring.

I think, in this case, we need something simple. Just a classic limited-length requests queue instead of a semaphore. Or something like that.

from carbonapi.

ksurent commented on May 19, 2024

Sure, I was just suggesting a different queueing discipline (LIFO vs FIFO).

from carbonapi.

grzkv commented on May 19, 2024

@gunnihinn @ksurent

Thanks for all the insight you shared. This is a much more interesting problem than it seems at first glance. The suggestion to go with FIFO looked to be the simplest approach but probably is not the best one.

I will invest time to study the topic in depth and go through all the references (and the model) you gave.

from carbonapi.

grzkv commented on May 19, 2024

@gunnihinn

I can prove this with an Etch-A-Sketch (so to speak): gunnihinn/queuesim

This model does not include the case described in the first post. Also, it operates only on bounded queues, while our current "queue" is not bounded (I am not claiming it is a good thing though). The queue limiting happens by timeouts application, thus yielding a CoDel queue.

Correct me if I am getting something wrong.

While going through various materials, I agree that FIFO most likely is not the best solution. Currently, @ericherman's proposal of ring-buffer looks best to me.

from carbonapi.

gunnihinn commented on May 19, 2024

This model does not include the case described in the first post.

Indeed; I only wanted to point out that FIFO queues are maybe not the best option, and that analyzing the different options is a somewhat nontrivial task. It's your show, go with whatever performs best in whatever tests you setup.

Currently, @ericherman's proposal of ring-buffer looks best to me.

Well... is it better than the one you already have?

from carbonapi.

grzkv commented on May 19, 2024

@gunnihinn

Well... is it better than the one you already have?

Currently we make a goroutine for each request. These are waiting in "queue" for the semaphore. The problem is that there can be very many of goroutines in queue (up to 300k in one of our setups). This takes a lot of memory (8kB x 300k = 2.4GB, 8kB being initial goroutine stack size) and slows down GCs up to 1 second. This also creates contention between goroutines that slows everything down.

We need to move queuing from goroutines to some data structure. It is the discussion here about which one. Also, we need to make it bounded. Currently, it's not. Because of this, we sometimes get explosions in the number of goroutines.

It is clear that FIFO is not a good bounded structure (thanks for the model!). LIFO is not bounded, neither is a pool with random selection. We would need to make some bounding strategy on top (as it is described in the resources shared above). Compared to these, ring buffer seems to be the simplest solution. It is also naturally resilient.

But this is just a guess, we will see when we get to test these. Just explaining my thinking.

from carbonapi.

Unlucky requests can get needlessly dropped about carbonapi HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent