
Comments (10)

filipecosta90 commented on September 23, 2024

@lantiga @K-Jo @gkorland @hhsecond following what was discussed today, bringing this subject up with the minimum POC required:

DAGRUN (1-level depth) Design Proposal

Specify a directed acyclic graph of operations (for now we only allow chaining) to run within RedisAI.

AI.DAGRUN [PERSIST] [COMMAND1] |> [PERSIST] [COMMAND2] |> [PERSIST] [COMMANDN]

Some considerations:

By default, the results of each operation will only live during the execution of the DAG.
If the user specifies PERSIST at the start of a specific command, we persist its outputs to the keyspace. If the first argument of argv for that operation is PERSIST, the outputs are saved by opening the key and calling RedisModule_ModuleTypeSetValue. When PERSIST is not present, outputs are kept locally and live only within the context of the DAG, meaning that no output keys are opened.

If COMMAND1 sets a tensor, it can be referenced by any further command in the chain. Resolution order: look locally first, then search for the key in the keyspace (a rough sketch of this lookup follows the command list below).

Chaining operations:

To chain operations, we will use the pipe-forward operator (|>):
the pipe-forward operator lets you pass an intermediate result onto the next function.

Commands in scope of the POC:

  • AI.TENSORSET

    • Receives (RedisModuleCtx *ctx, RedisModuleString **argv, int argc, RedisAI_RunInfo runinfo) and outputs int
    • Stores the tensor in RedisModuleCtx *ctx
    • If the first argument of argv is not PERSIST, no key opening and no RedisModule_ModuleTypeSetValue
  • AI.MODELRUN

    • Receives (RedisModuleCtx *ctx, RedisModuleString **argv, int argc, RedisAI_RunInfo runinfo) and outputs int
    • Checks RedisModuleCtx *ctx first for the presence of the input tensors
    • Stores the output tensors in RedisModuleCtx *ctx
    • If the first argument of argv is not PERSIST, no key opening and no RedisModule_ModuleTypeSetValue
  • AI.TENSORGET

    • Receives (RedisModuleCtx *ctx, RedisModuleString **argv, int argc, RedisAI_RunInfo runinfo) and outputs int
    • Checks RedisModuleCtx *ctx first for the presence of the tensor
    • PERSIST has no effect here
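
A minimal sketch of how such a DAG-local scope and the "look locally first, then keyspace" resolution could be implemented, assuming RAI_Tensor and the module tensor type roughly as they exist in RedisAI; RAI_DagLocalEntry, the RedisAI_RunInfo layout shown, and Dag_GetTensor are hypothetical names for illustration only:

    /* Illustrative sketch only; names and layout are assumptions. */
    #include <stddef.h>
    #include "redismodule.h"

    typedef struct RAI_Tensor RAI_Tensor;   /* opaque RedisAI tensor */

    typedef struct {
        RedisModuleString *name;   /* key name as written in the DAG command */
        RAI_Tensor *tensor;        /* output produced earlier in the DAG */
    } RAI_DagLocalEntry;

    typedef struct {
        RAI_DagLocalEntry *locals; /* tensors produced so far, DAG-scoped */
        size_t nlocals;
    } RedisAI_RunInfo;

    /* Resolve a tensor name: DAG-local scope first, then the keyspace. */
    static RAI_Tensor *Dag_GetTensor(RedisModuleCtx *ctx, RedisAI_RunInfo *rinfo,
                                     RedisModuleString *keyname) {
        for (size_t i = 0; i < rinfo->nlocals; i++) {
            if (RedisModule_StringCompare(rinfo->locals[i].name, keyname) == 0)
                return rinfo->locals[i].tensor;
        }
        /* Fall back to the keyspace (type/error checks omitted for brevity). */
        RedisModuleKey *key = RedisModule_OpenKey(ctx, keyname, REDISMODULE_READ);
        RAI_Tensor *t = RedisModule_ModuleTypeGetValue(key);
        RedisModule_CloseKey(key);
        return t;
    }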


hhsecond commented on September 23, 2024

I have more thoughts on this variant, but it could be a bit sophisticated and might remove certain benefits that we could get from the proposed DAGRUN. I am not completely sure about the performance implications, but if they are negligible, this could be a good feature to have.

Proposal Update

If a user could name the DAGRUN

AI.DAGRUN dag_name MODELRUN ...

Advantages

  • The user can use that name in a future DAGRUNASYNC call
  • If the user has a requirement where their DAG can't be completed with a single command, they could use the name to refer to the previous DAGRUN instance to access the volatile keys (they won't be volatile then :(, and they won't be removed until the user closes the DAGRUN instance) and run the same partially completed DAG later. A DAGRUN command by default will close right after the execution of the command, but if a user supplies a flag (like -k, k for keep), the volatile keys stay in memory and will be deleted only when the DAGRUN instance is explicitly closed.

An Example

my_dag will be available after this run

AI.DAGRUN my_dag -k TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
          TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
          MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~

my_dag will be available after this run

AI.DAGRUN my_dag -k TENSORSET ~d~ FLOAT 2 VALUES 3 4 => \
          MODELRUN bar INPUTS ~c~ ~d~ OUTPUTS ~e~

my_dag will be deleted and won't be available after this run since the -k flag is not present

AI.DAGRUN my_dag MODELRUN baz INPUTS ~e~ OUTPUTS f => \
          TENSORGET f


lantiga commented on September 23, 2024

Thank you @hhsecond, I'm going to think about this. One essential feature of DAG to me is that it has to be isolated: it has to clearly fail or succeed as a whole, volatile vars should not be visible to other clients or to other calls, the client should block for the duration of the DAG, etc.

This proposal adds some complexity, as we need to have the DAG persist in key-space itself if we want it to survive a failure, etc. So mainly the failure modes and AOF/replication need further consideration.

What we could do is proceed without naming the DAG for now (there's enough to implement) and keep options open using a slightly different syntax in the future

AI.DAGRUN NAME my_dag MODELRUN ...

This way we'll know if the DAG is named or not.

Note

BTW, important design point: regarding fail or succeed, for non-volatile keys we should treat them as volatile and then write them upon unblocking. This way if you run

AI.DAGRUN TENSORSET a FLOAT 2 VALUES 2 3 => MODELRUN foo INPUTS a OUTPUTS b => TENSORGET b

a should be treated as if it were volatile and should be written to keyspace (together with b) right before the DAG returns. This way we can handle failure modes more cleanly, without dangling intermediate keys if something goes wrong within the DAG.
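
A hypothetical sketch of this deferred write: every output stays in the DAG-local table during execution, and only the entries marked for persistence are flushed to the keyspace once the whole DAG has succeeded. DagLocalEntry, the persist flag, and Dag_FlushPersistentOutputs are illustrative names; RedisAI_TensorType stands in for the module's registered tensor type:

    #include <stddef.h>
    #include "redismodule.h"

    /* Hypothetical DAG-local entry: the persist flag records whether the
     * user asked for this output to survive the DAG. */
    typedef struct {
        RedisModuleString *name;
        void *tensor;              /* RAI_Tensor* in the real module */
        int persist;               /* 1 if the key must be written at the end */
    } DagLocalEntry;

    extern RedisModuleType *RedisAI_TensorType;  /* assumed registered type */

    /* Write persistent outputs only right before the DAG returns, so a
     * mid-DAG failure leaves no dangling intermediate keys behind. */
    static int Dag_FlushPersistentOutputs(RedisModuleCtx *ctx,
                                          DagLocalEntry *locals, size_t n) {
        for (size_t i = 0; i < n; i++) {
            if (!locals[i].persist) continue;        /* stays DAG-local */
            RedisModuleKey *key =
                RedisModule_OpenKey(ctx, locals[i].name, REDISMODULE_WRITE);
            if (RedisModule_ModuleTypeSetValue(key, RedisAI_TensorType,
                                               locals[i].tensor) != REDISMODULE_OK) {
                RedisModule_CloseKey(key);
                return REDISMODULE_ERR;              /* the DAG fails as a whole */
            }
            RedisModule_CloseKey(key);
        }
        return REDISMODULE_OK;
    }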


hhsecond commented on September 23, 2024

That makes sense, let's do that. Just out of curiosity, why do you think the volatile vars should not be visible to other calls -> because of the complexity you mentioned above or are there other reasons as well?


lantiga commented on September 23, 2024

Making them persistent introduces a few complications that I'd rather not address right now:

  • when do we clean them?
  • do they need to be visible to other clients? If not, what happens if a client reconnects, do we need a mechanism to identify that the client is still the same client?
  • what happens to AOF, serialization in general and replication? If the server goes down what happens to volatiles?

Making them live within a command gives us a lot of power for very little added complication, with no impact on Redis as we know it.


lantiga commented on September 23, 2024

Hi @filipecosta90 thank you. My observations:

  • the data pipe (|>) analogy could be misleading even for this first round, because it implies "the pipe-forward operator lets you pass an intermediate result onto the next function"; indeed you might be setting two tensors in two separate commands and then run a model (so it's not that the output of one goes into the other); the separator could still be ok, but what we want here is a sequence of commands with a local scope;
  • we need a way to assign local scope to individual keys, rather than to the whole command; for instance, I may want to decide that one output of MODELRUN is returned without hitting the keyspace, and the other is stored (e.g. a chatbot model with persistent state). This is the reason why I had the wings sigil (~) around the key name. Of course this might be a silly key convention, so let's look for an alternative, but let's make it key-level.
    An alternative could be to declare the local scope at the beginning of the command, as in:
AI.DAGRUN LOCALS foo bar baz |>
    AI.TENSORSET foo ... |>
    AI.TENSORSET bar ... |>
    AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz |>
    AI.TENSORGET baz
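
For concreteness, a rough sketch of parsing such a LOCALS declaration, assuming the names after LOCALS simply run up to the first |> separator (Dag_ParseLocals and the exact token handling are hypothetical):

    #include <string.h>
    #include <strings.h>
    #include "redismodule.h"

    /* Hypothetical parse of "AI.DAGRUN LOCALS foo bar baz |> ...":
     * collect every name after LOCALS and before the first "|>" into the
     * DAG-local set, so volatility is declared per key, not per command. */
    static void Dag_ParseLocals(RedisModuleString **argv, int argc,
                                RedisModuleString ***locals, int *nlocals) {
        *locals = NULL;
        *nlocals = 0;
        if (argc < 2) return;
        size_t len;
        const char *first = RedisModule_StringPtrLen(argv[1], &len);
        if (strcasecmp(first, "LOCALS") != 0) return;  /* no local scope declared */
        int i = 2;
        while (i < argc) {
            const char *tok = RedisModule_StringPtrLen(argv[i], &len);
            if (strcmp(tok, "|>") == 0) break;         /* end of LOCALS section */
            i++;
        }
        *locals = argv + 2;                            /* names foo bar baz ... */
        *nlocals = i - 2;
    }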


filipecosta90 commented on September 23, 2024

we need a way to assign local scope to individual keys, rather than to the whole command;

Given the following example, if we want to persist only ket and baz, we use PERSIST on the operation that produces those outputs (AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz)

AI.DAGRUN 
    AI.TENSORSET foo ... |>
    AI.TENSORSET bar ... |>
    PERSIST AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz |>
    AI.TENSORGET baz

If we analyze the example you gave even further, MODELRUN would be using 2 local "volatile" tensors (foo and bar) and one tensor present in the keyspace (fiz).

Regarding the operator to use, any one that enables us to do a quick separation is good for me :) => is a good one.


lantiga commented on September 23, 2024

Given the following example, if we want to persist only ket and baz, we use PERSIST on the operation that produces those outputs (AI.MODELRUN m INPUTS foo bar fiz OUTPUTS ket baz)

For more clarity, I want to have the freedom to identify keys as volatile (I just added volatile here in the name for more clarity):

AI.DAGRUN LOCALS volatile_foo volatile_bar volatile_baz |>
    AI.TENSORSET volatile_foo ... |>
    AI.TENSORSET volatile_bar ... |>
    AI.MODELRUN m INPUTS volatile_foo volatile_bar fiz OUTPUTS ket volatile_baz |>
    AI.TENSORGET volatile_baz

I don't like command-level PERSIST or LOCAL because they impose an unnecessary constraint, as they apply to all the output keys.

I'm leaning strongly towards declaring LOCALS as part of the command. It's very explicit and flexible.


filipecosta90 commented on September 23, 2024

I don't like command-level PERSIST or LOCAL because they impose an unnecessary constraint, as they apply to all the output keys.

I'm leaning strongly towards declaring LOCALS as part of the command. It's very explicit and flexible.

You're right :) Agree that it brings more freedom and control, moving from command-based granularity to output-based 👍


lantiga commented on September 23, 2024

With reference to:

AI.TENSORSET

Receives (RedisModuleCtx *ctx, RedisModuleString **argv, int argc, RedisAI_RunInfo runinfo) and outputs int
Stores the tensor in RedisModuleCtx *ctx
If the first argument of argv is not PERSIST, no key opening and no RedisModule_ModuleTypeSetValue

AI.MODELRUN

Receives (RedisModuleCtx *ctx, RedisModuleString **argv, int argc, RedisAI_RunInfo runinfo) and outputs int
Checks RedisModuleCtx *ctx first for the presence of the input tensors
Stores the output tensors in RedisModuleCtx *ctx
If the first argument of argv is not PERSIST, no key opening and no RedisModule_ModuleTypeSetValue

AI.TENSORGET

Receives (RedisModuleCtx *ctx, RedisModuleString **argv, int argc, RedisAI_RunInfo runinfo) and outputs int
Checks RedisModuleCtx *ctx first for the presence of the tensor
PERSIST has no effect here

Apart from PERSIST (see above), it all sounds good.

What needs a bit of extra care is the fact that whatever comes after MODELRUN will have to be called from the Run_Reply callback. Since generalizing this may be brittle, I propose to keep an intermediate representation of the DAG in ctx, indicating the done/in-progress/todo state of each operation, so that Run_Reply can deduce what needs to be done next.

This way, generalizing to a multi-level DAG might not be that big of a deal.
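
A rough sketch of what such an intermediate representation could look like (all names hypothetical): each DAG operation carries a todo/in-progress/done state, and the reply callback consults it to pick the next operation:

    #include <stddef.h>
    #include "redismodule.h"

    /* Hypothetical per-operation state kept in ctx for the whole DAG. */
    typedef enum { DAG_OP_TODO, DAG_OP_IN_PROGRESS, DAG_OP_DONE } DagOpState;

    typedef struct {
        RedisModuleString **argv;  /* the sub-command, e.g. AI.MODELRUN ... */
        int argc;
        DagOpState state;
    } DagOp;

    typedef struct {
        DagOp *ops;
        size_t nops;
    } DagRunCtx;

    /* Called from the Run_Reply callback: mark the operation that just
     * finished as done and hand back the next TODO operation, or NULL
     * when the whole DAG is complete. */
    static DagOp *Dag_NextOp(DagRunCtx *dag) {
        for (size_t i = 0; i < dag->nops; i++) {
            if (dag->ops[i].state == DAG_OP_IN_PROGRESS) {
                dag->ops[i].state = DAG_OP_DONE;
            } else if (dag->ops[i].state == DAG_OP_TODO) {
                dag->ops[i].state = DAG_OP_IN_PROGRESS;
                return &dag->ops[i];
            }
        }
        return NULL;
    }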
