
Comments (18)

aembke commented on September 28, 2024

The way I'll likely do this is to add a new interface (perhaps a multi_buffer function) that returns a modified version of a TransactionClient which buffers commands in memory until EXEC or DISCARD is called. When EXEC is called the client would then send all of the buffered commands, perhaps as one pipelined series of commands if configured by the caller.

Unfortunately this would require some pretty invasive changes to the client, so I'll need to think more about how to do this. Depending on how invasive this is I may have to push it out to 5.1.0 instead of the upcoming 5.0.0 release.
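
To sketch the general idea (hypothetical types and names, not the final interface - nothing here is fred's actual API yet):

// Hypothetical sketch of the buffering idea: `queue` does no network I/O,
// and nothing hits the wire until `exec` runs.
struct BufferedTransaction {
    commands: Vec<String>, // encoded commands buffered in memory
}

impl BufferedTransaction {
    fn new() -> Self {
        Self { commands: vec!["MULTI".to_string()] }
    }

    fn queue(&mut self, cmd: &str) {
        self.commands.push(cmd.to_string());
    }

    fn exec(mut self) -> Vec<String> {
        // a real client would write the whole batch to the socket here,
        // ideally as one pipelined write, then await the EXEC response
        self.commands.push("EXEC".to_string());
        self.commands
    }
}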


aembke commented on September 28, 2024

Ah actually I've been wondering about manual pipelining for a while now. For most of my use cases the automatic option covered what I was looking for, but I can't help but feel like I'm missing something since so many other clients provide a manual option instead. Can you tell me more about where you'd want to use manual pipelining instead of the automatic option?


aembke commented on September 28, 2024

Hi @mkurtak,

I'm not sure I fully understand the use case or how the Redis latency comes into play, but a small snippet that illustrates what you're trying to do might help.

That being said, I can maybe address your two bullet points:

Fred only allows calling another MULTI once the response from EXEC has arrived. I think a new MULTI could be allowed once the EXEC has been sent

This is intended, since the server won't allow you to send a second MULTI before it receives an EXEC/DISCARD for the first anyway. fred implements this by disallowing the next MULTI until the EXEC/DISCARD response is received (as opposed to sent). We wait for a response before allowing another transaction in order to detect cases where the connection closes while the EXEC/DISCARD is in-flight. Even when the caller specifies the pipeline flag, the client will still always wait for the EXEC/DISCARD response before allowing another MULTI.

It's not possible to split sending the commands and receiving the responses. I don't know when it is safe to call a second MULTI

I may not fully understand your use case, so take this with a grain of salt, but this strikes me as an issue callers would face any time they have concurrent tasks sharing a connection where at least one task is using the transaction interface. At first glance this seems like it would require some form of synchronization between the tasks, but I might be misunderstanding something.

That being said, it should be possible to do what you want with fred, but in its current state it might be pretty difficult. I can imagine supporting this in cases where the client was not pipelined and the caller made sure EXEC was called before the next MULTI, even if the MULTI command was sent prior to the EXEC response arriving. I intended to support this use case when I made MULTI, EXEC, and DISCARD act like blocking commands to allow for this exact thing, but I don't think I ever tested this with concurrent tasks so I'm not sure how this would look or whether it actually works reliably without some form of manual synchronization.

I'll need to think about whether there's a clean way to do this in fred though. Or if I'm misunderstanding your use case please let me know.


aembke commented on September 28, 2024

Just to expand on the first part a bit more - fred waits for the EXEC response to come back on the current connection before allowing another MULTI to avoid the following situation:

  1. The caller starts a transaction.
  2. The caller sends EXEC or DISCARD.
  3. The caller sends MULTI to start a new transaction.
  4. While the EXEC/DISCARD is in-flight to the server the connection dies.
  5. The replay logic replays EXEC/DISCARD, then MULTI. To the transaction task everything looks fine so far.
  6. A bunch of on_reconnect logic runs that doesn't account for it being run in a transaction. The caller sees all sorts of errors due to not expecting a "QUEUED" response from the server in this context.

For this reason fred makes sure that transactions start and finish fully on a healthy connection, and if that can't happen it aborts the transaction.

To be honest, I tend to just use Lua nowadays everywhere I need a transaction to avoid all of this. But I understand that's not really an option for a lot of use cases, so I'm not necessarily advocating for it in your case.
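
For example, a check-and-set that would otherwise need WATCH/MULTI/EXEC can be a single script (a rough sketch - double check the eval signature in the docs for your version):

// The script runs atomically on the server, so there's no connection
// state to coordinate between tasks.
const CHECK_AND_SET: &str = r#"
if redis.call('GET', KEYS[1]) == ARGV[1] then
  redis.call('SET', KEYS[1], ARGV[2])
  return 1
end
return 0
"#;

// usage sketch, assuming `client` is a connected fred client:
// let swapped: i64 = client.eval(CHECK_AND_SET, vec!["key"], vec!["old", "new"]).await?;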

In the meantime if you have some scaffolding that illustrates what you're trying to do (or how you want it to look with the different tasks) I'll add it to the tests and make it work if possible for the next version. Transactions have always been a weak spot for me since I just don't use them all that often, so it's certainly possible that fred needs some changes to work effectively here.


mkurtak commented on September 28, 2024

Hi @aembke,

I understand your point. We are migrating from the JavaScript ioredis client, which considers a transaction finished as soon as EXEC is sent. The Redis instance we use has higher latency than the writer.

I am not sure how Lua would solve the issue. Do you suggest installing a Lua script on the server and then calling it from the client? Does fred allow multiple Lua scripts in-flight?

I'll prepare an example during the weekend.


mkurtak commented on September 28, 2024

Sorry @aembke I didn't have time to prepare the example, but I understand your point. Should I close the issue?


aembke commented on September 28, 2024

@mkurtak I think I understand where you're coming from: with the current interface it's difficult to synchronize transactions across different concurrent tokio tasks. Callers could do this with an async mutex, but that's pretty tedious.
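
Just to illustrate the tedious version (a sketch - SharedTx and run_transaction are made-up names):

use std::sync::Arc;
use tokio::sync::Mutex;

// every task routes its MULTI..EXEC block through run_transaction so only
// one transaction can use the connection at a time
struct SharedTx {
    lock: Arc<Mutex<()>>,
}

impl SharedTx {
    async fn run_transaction<F, Fut, T>(&self, f: F) -> T
    where
        F: FnOnce() -> Fut,
        Fut: std::future::Future<Output = T>,
    {
        // hold the guard for the full MULTI..EXEC/DISCARD lifetime
        let _guard = self.lock.lock().await;
        f().await
    }
}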

I'm in the middle of the next major version at the moment, so I have some flexibility to change the interface to make this easier. The client currently supports an in_transaction() function that can be used to check if another task is using the connection for a transaction, but that's not very useful on its own since callers are forced to sleep or spin before checking that value again.

If I added an interface that allowed callers to wait for a transaction to finish would that help?


mkurtak commented on September 28, 2024

@aembke thanks for your help. Actually, the bigger problem is that it's not possible to have multiple transactions running at once, so we will probably migrate to Lua.

I think having this feature would be nice, but since we will not use it, I don't want to push you in this direction.


aembke commented on September 28, 2024

Sounds good. If ioredis allows for multiple concurrent transactions they must either be queueing commands up in memory before sending anything (with their own synchronization in the client), or they must be creating new connections on the fly. A transaction sets global state on the connection so callers can't use multiple transactions on the same connection at the same time.

Fred takes a different approach and won't create multiple connections, so doing this with fred would likely require some more invasive design changes to your app. Or, like you said, if you switch to lua then there's no connection state change, so everything tends to become simpler.

I'll close this out in the meantime, but let me know if you have issues with the lua interface.


alkeryn commented on September 28, 2024

@aembke sorry for the bump, but wouldn't it be possible to have a form of queue so that multiple transactions can be created on different threads / tokio tasks, with each one waiting for the others to complete upon exec?
this way you don't have to fail tasks, implement some crazy waiting within them, or build a form of fifo queue yourself.

the way it works right now, the performance penalty can be high for things like actix web.


aembke commented on September 28, 2024

Yeah absolutely, that's a good idea. This is good timing too, since this would be a breaking change to the transactions interface and I'm planning on releasing a new major version soon.

The tradeoff with this approach is that you won't necessarily get immediate feedback on intermediate commands failing prior to the exec/discard. However, with this approach it would be quite a bit easier to retry entire transactions, and it would also likely make it easier to reason about concurrent transactions like you mentioned. It would also provide an opportunity for callers to configure pipeline settings on individual transactions.


alkeryn commented on September 28, 2024

thanks man, take the time you need, i mean you are providing us with a great lib for free, i'm not gonna complain !
though one thing i'd really want too is manual pipelining, that'd be great ^^


alkeryn commented on September 28, 2024

@aembke here is an example scenario where i'd have wanted to use manual pipelining.
users can like another user's profile, but to save traffic, i send the likes as an array of a hundred at a time instead of making a request for each one.

the array contains a list of userid to be liked.

so i end up doing a bunch of zadd in a row.
can't use multi-key commands as it'll be clustered, and putting them in a hash tag didn't seem like a good solution to me.

rn the best option i found was to make a vector of futures for all of the zadd's and await them all at once.

if i could just pipeline all the commands in one go that'd be neat.

i'd imagine something like a pipeline object that you can just add commands to, like a session, and then send them all at once with a single pipeline.send().await or something like that.
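
something in this direction (completely made up, not fred's actual api):

// hypothetical pipeline object: commands accumulate in memory and a single
// send() would flush them all in one batch
struct Pipeline {
    commands: Vec<String>,
}

impl Pipeline {
    fn new() -> Self {
        Self { commands: Vec::new() }
    }

    fn zadd(&mut self, key: &str, score: f64, member: &str) {
        self.commands.push(format!("ZADD {key} {score} {member}"));
    }

    fn send(self) -> Vec<String> {
        // a real client would write all of these in one go and then read
        // the responses back in order
        self.commands
    }
}

fn main() {
    let mut pipeline = Pipeline::new();
    for user_id in ["42", "43", "44"] {
        pipeline.zadd("likes:123", 1.0, user_id);
    }
    assert_eq!(pipeline.send().len(), 3);
}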

i'm guessing fred's automatic pipelining might be pretty good, but as i didn't write it myself idk how many scenarios it covers.

though using manual pipelining exclusively and disabling automatic pipelining may reduce latency, as i'm assuming your driver waits a little to see if there are more requests to be made so it can pipeline them.

what do you think ? :)


aembke commented on September 28, 2024

Ah ok. I'm not sure I fully understand, but I'll do my best to describe what Fred does and how that may help here.

Bear with me here, this will be a long explanation, but I've wanted to put this in the docs for a while, so I'll take a shot at writing it up here first.

Here's some background and assumptions I'm making about the client and server:

  • A client instance only creates and uses a single connection. If a client pools connections under the hood and round-robins commands against this pool then I can understand the need for manual pipelining. However, Fred does not do this.
  • The server processes commands sequentially. This is one reason Redis is single threaded - it makes reasoning about command ordering much easier.
  • The connection will always be based on TCP, so we can rely on data arriving in the order that it is sent.
  • The protocol does not support any form of explicit multiplexing, therefore responses must arrive in the order requests are sent (you can think of the command stream to the server as a FIFO queue essentially).

Here are some network diagrams I'm thinking of adding to the docs to better explain this, but I'll include them here:

A non-pipelined client has the following network flow:

[diagram: non-pipelined network flow]

And the pipelined equivalent (without the edge labels so it doesn't get too cluttered):

[diagram: pipelined network flow]

Just to call out a few things to anybody that might read this in the future:

  • With or without pipelining you're sending and receiving the same number of bytes, so this doesn't affect network usage in that sense.
  • Wrapping the requests in a transaction also doesn't affect this. Folks might not know this, but the server sends a QUEUED response to every intermediate command in a transaction, so you're not actually lowering network traffic by using a transaction since every intermediate command still gets a response before it's processed. In fact, the opposite happens. When you call EXEC the server sends the responses for every intermediate command in the final EXEC response, so you actually end up receiving more data (QUEUED + the actual response later) than if you had not used a transaction. The only real way to collapse multiple requests into a single response is by using lua scripts.

The only thing you're saving with pipelining is RTT (round trip time). The redis docs do a good job covering this too.

I'm guessing you're familiar with everything I just covered there, but I wanted to put it in here for anybody that stumbles on this discussion in the future.

Now, here's what Fred does:

In order to implement pipelining all we really need to do is attach a FIFO queue to each connection. Fred uses a VecDeque for this. When a command goes out on the wire we push it to the back of the queue, and when a response is received we pop a command off the front of the queue. Since the server always processes commands in order, and TCP guarantees in-order delivery, we know the first command in the queue always corresponds to the next response we'll receive.

Fred uses this mechanism all the time; when you disable pipelining, all that really changes is that the size of the queue is capped at 1. When pipelining is enabled the queue can grow arbitrarily, until it hits any of the limits defined in the backpressure settings on the client.
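
Here's a simplified sketch of that mechanism (hypothetical names - the real implementation has a lot more bookkeeping):

use std::collections::VecDeque;

struct InFlight {
    queue: VecDeque<String>, // commands awaiting responses, oldest first
    max_in_flight: usize,    // 1 effectively disables pipelining
}

impl InFlight {
    fn try_send(&mut self, cmd: String) -> bool {
        if self.queue.len() >= self.max_in_flight {
            return false; // apply backpressure until responses drain the queue
        }
        self.queue.push_back(cmd); // the command goes out on the wire here
        true
    }

    fn on_response(&mut self) -> Option<String> {
        // TCP ordering + sequential processing on the server guarantee the
        // front of the queue matches the response we just read
        self.queue.pop_front()
    }
}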

Now in your case you already identified the real issue here - when or how do you await the future associated with each command? Let's say for the sake of argument you're doing something like this:

// notice the lack of `await` on each of these
let commands = vec![
  client.get::<String, _>("foo"),
  client.get::<String, _>("bar"),
  client.get::<String, _>("baz"),
];

// await all three responses at once
let results: Vec<String> = futures::future::try_join_all(commands).await?;

Now regardless of whether pipelining is enabled the client is going to get 3 responses from the server. The only thing that really matters here is that your app doesn't require the response from GET foo before it can send GET bar, etc. However, IMO it's better to express this via the presence or lack of await points rather than explicit pipelining in the client interface.

There's some subtlety here, but this is why I made the intentional choice in Fred that a request will go into an in-memory queue before you call await. This preserves the ordering of your commands even when you defer calling await until later via select or join, etc. Therefore, complicated use cases that use select, join, etc only determine how you await responses, but not the order that requests hit the wire. The ordering of your requests on the wire will always match the order you call the functions as if they were blocking/synchronous with respect to the calling thread even if you defer await points.
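
For example, extending the snippet above:

// both commands are queued (and sent) in this order...
let foo = client.get::<String, _>("foo");
let bar = client.get::<String, _>("bar");

// ...so even though we await them in reverse order, GET foo still hits the
// wire before GET bar. The awaits only control when we observe the responses.
let bar_value = bar.await?;
let foo_value = foo.await?;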

Now here's where this really matters, and why I made pipelining a client implementation detail rather than exposing manual pipelines to callers.

In most cases you're going to use Redis in a web application, or something that accepts requests from callers over HTTP, gRPC, AMQP, etc, but something where you're processing a lot of concurrent requests. Nearly every server framework out there is going to give you an interface that intentionally limits the scope of your functions to a single request. In other words, you're probably never going to be operating on all concurrent requests in memory on your server at the same time in one function scope.

So let's say you need to make 3 Redis requests per incoming request in your web app. And let's assume you're using a Redis client that doesn't have automatic pipelining, but rather requires callers to use a manual pipeline interface. Also, let's assume all incoming requests are going to share the same Redis client, and therefore the same Redis connection.

Assuming you use the manual pipeline interface for the 3 Redis requests per incoming HTTP/gRPC/whatever request, you'd see the following network flow.

[diagram: manual pipelining, batched per incoming request]

But that's not what you want, you want this:

[diagram: automatic pipelining across all incoming requests]

To be fair, that last image looks better than the first partially just because I drew the request overlays on the server side. But I didn't want to clutter the image too much drawing them on the client due to the overlapping labels.

However, in order to achieve that last network flow diagram you'd need to pipeline your Redis requests across all incoming requests to your server, which creates a lot of problems. When do you finalize and send the pipeline? How would you need to restructure your code to actually make this happen if you can only access one request in any given function scope?

In a nutshell, this is why I opted to make pipelining automatic. The Redis client can view and operate on every outbound request regardless of how you have your app code structured, so that seemed like the best place to hide the complexity that comes with pipelining. If you enable the automatic pipeline feature you'll see a network flow diagram like that last image.

Finally, I should probably better document this behavior, but I want to respond to your last point:

though using manual pipelining exclusively and disabling automatic pipelining may reduce latency, as i'm assuming your driver waits a little to see if there are more requests to be made so it can pipeline them.

The client does check whether any other requests are queued when it sends one, but not for the purposes of pipelining. The only thing Fred uses this information for is deciding whether it should flush the socket after sending some bytes. If there are queued commands waiting to be sent, Fred defers flushing the socket until either no more commands are sitting in memory waiting to go out over the wire, or the max_feed_count value is reached. This doesn't affect automatic pipelining - either way the commands are written to the socket as fast as possible. It's just an optimization to avoid unnecessary syscalls, and callers can fine-tune how often the socket is flushed via the max_feed_count setting.
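
The flush heuristic looks roughly like this (a simplified sketch assuming a buffered writer, not fred's actual code):

use tokio::io::AsyncWriteExt;

async fn write_command<S>(
    socket: &mut S,    // assume S wraps the connection in a BufWriter
    cmd: &[u8],
    queued: usize,     // commands still waiting in memory
    fed: &mut u64,
    max_feed_count: u64,
) -> std::io::Result<()>
where
    S: AsyncWriteExt + Unpin,
{
    socket.write_all(cmd).await?; // buffered write, no flush yet
    *fed += 1;

    // flush only when nothing else is waiting, or we've deferred too long
    if queued == 0 || *fed >= max_feed_count {
        socket.flush().await?;
        *fed = 0;
    }
    Ok(())
}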

TLDR:

In your case you want to keep the join_all (or equivalent) implementation, and if you enable automatic pipelining you'll see the exact same network flow diagram that you would if you had used a manual pipelining interface. The only difference is that you express this via await points (or the lack thereof + join_all) rather than explicit pipelining.


alkeryn commented on September 28, 2024

thanks for all these detailed explanations ! :)
you put a lot of time into writing it, even more so with the graphs and all ! :)

however i was just curious about something: for some reason, running await in a for loop is slightly faster than creating the futures without await and then joining them. i was just toying around, but here is a sample of the code:

// this is faster
let mut k: String = String::new();

for _ in 0..100 {
    k = db.get::<String, _>("key").await?;
}

// this is slower
let mut v: Vec<_> = Vec::new();
let mut k: String = String::new();

for _ in 0..100 {
    v.push(db.get::<String, _>("key"));
}

for f in v {
    k = f.await?;
}

i was expecting the former to be much slower than the latter, but it's like 10% faster for some reason.
try_join_all() was the fastest of them all by a much bigger margin though.
do you have any idea why ? :)


aembke commented on September 28, 2024

Hmm, that's interesting. Those should have the exact same behavior on the wire, but there might be some strangeness going on with the scheduler there. Under the hood each request uses a oneshot channel to respond to the caller, and I've seen some strange performance behavior there. Often it's not bad, and in many cases it's actually fast when I expect it to be slow and vice versa, but I was never able to figure out why.

For example, here's a similar thing I encountered in the pool implementation. I thought the old behavior should work (essentially doing what you're doing with await in a loop), but in fact it had a race condition. Once I switched to using join it not only worked every time, but was also much faster. In the end, one thing I've come to realize is that tokio is a strange beast, and if you want futures to run concurrently you should always use select, join, etc., instead of loops with await.

My suspicion is that every await point has more overhead to it than you might think, and by using join, select, etc you're cutting out a lot of explicit await points. That's just speculation though.
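
For context, the oneshot pattern I mentioned looks roughly like this (simplified - the real internals are more involved):

use tokio::sync::{mpsc, oneshot};

type Request = (String, oneshot::Sender<String>);

// the connection task pops requests off the mpsc queue, writes them to the
// socket, and resolves each oneshot when the matching response arrives
async fn send_command(
    queue: &mpsc::UnboundedSender<Request>,
    cmd: String,
) -> Result<String, oneshot::error::RecvError> {
    let (tx, rx) = oneshot::channel();
    queue.send((cmd, tx)).expect("connection task dropped");
    // this await is where the caller parks until the response arrives
    rx.await
}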


aembke commented on September 28, 2024

If you have a jaeger (or whatever tracing vendor you prefer) setup going locally you could try turning on partial tracing to see the difference here too. That might help answer some of the questions. The pipeline test code shows how to do this if you want to take some time to set that up.


aembke commented on September 28, 2024

I'm going to put a note about this in the TODO list and take a look at it for 5.1.0. In the meantime I'm going to close this for now, but feel free to reopen.

