Comments (3)

jbbarth commented on July 18, 2024

Here's another illustration for this problem, extracted from real code that was misbehaving on SWF.

Let's assume we run two tasks for each of two sub-datasets (called "blocks" below), the second task depending on the completion of the first one for the same block. If the first task takes longer for block 0, then the second task is submitted twice for block 1 and never for block 0, with the following code:

import time

from simpleflow import activity, Workflow, futures

@activity.with_attributes(task_list="test", version="1.0")
def first_activity(block):
    print("start first_activity for block {}".format(block))
    # Block 0 is deliberately slower, so block 1 finishes first.
    if block == 0:
        time.sleep(5)
    else:
        time.sleep(1)
    print("finish first_activity for block {}".format(block))

@activity.with_attributes(task_list="test", version="1.0")
def second_activity(block):
    print("triggered second_activity for block {}".format(block))

class ChangingWorkflow(Workflow):
    name = "changing"
    version = "1.0"
    task_list = "test"

    def run(self):
        _all = []

        for block in [0, 1]:
            first = self.submit(first_activity, block)
            _all.append(first)

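            # Only true once first_activity has completed for this block, so
            # second_activity is submitted in a different order from one
            # replay to the next.
            if first.finished:
                second = self.submit(second_activity, block)
                _all.append(second)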

        futures.wait(*_all)

The resulting log in a single activity worker speaks for itself:

start first_activity for block 1
start first_activity for block 0
finish first_activity for block 1
triggered second_activity for block 1
finish first_activity for block 0
triggered second_activity for block 1

Assuming that tasks are idempotent certainly simplifies the problem. I'm not sure about variations in the arguments, though: we can imagine simple scenarios where the arguments themselves vary between workflow replays, and we probably don't want to spin up a new activity each time.

A more generic solution would be to let users control task id generation from the Workflow.submit() method or from a decorator on the task, for instance by specifying a user-controlled suffix that varies with some parameter the task depends on (like block here).
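
To make the suffix idea concrete, here is a minimal sketch in plain Python. It does not use simpleflow's real API: `CounterIds`, `SuffixIds` and `replay` are hypothetical names, and the counter-based scheme is only an assumption about how order-dependent ids behave, modelled on the log above.

```python
from collections import defaultdict


class CounterIds:
    """Toy model (assumption): ids depend on submission order within one replay."""

    def __init__(self):
        self._counts = defaultdict(int)

    def next_id(self, name, suffix=None):
        # The suffix is ignored on purpose: only the call order matters here.
        self._counts[name] += 1
        return "{}-{}".format(name, self._counts[name])


class SuffixIds:
    """Toy model (hypothetical): ids depend on a caller-supplied suffix."""

    def next_id(self, name, suffix):
        return "{}-{}".format(name, suffix)


def replay(ids, finished_blocks):
    """Return {task_id: block} for the second_activity submissions of one replay."""
    submitted = {}
    for block in [0, 1]:
        if block in finished_blocks:
            task_id = ids.next_id("second_activity", suffix="block{}".format(block))
            submitted[task_id] = block
    return submitted


# First replay: only block 1 has finished. Second replay: both blocks have.
counter_first = replay(CounterIds(), {1})
counter_second = replay(CounterIds(), {0, 1})
suffix_first = replay(SuffixIds(), {1})
suffix_second = replay(SuffixIds(), {0, 1})
```

With the counter, "second_activity-1" means block 1 in the first replay and block 0 in the second, so block 0 looks already scheduled and block 1 gets a fresh id. With the suffix, each id keeps pointing at the same block whatever the replay order.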

What I don't like here overall is that it's a trap for users: it's very easy to assume that the code above will work fine, and if you don't understand this basic simpleflow assumption, you'll fail.

Back to the drawing board, stay tuned ;-)

from simpleflow.

jbbarth commented on July 18, 2024

Closing this old one, idempotent tasks mostly do the job \o/

jbbarth commented on July 18, 2024

New variant of this now that we have chains/groups. Example of a buggy decider (given "my_task" is not configured as idempotent):

self.submit(
    Group(
        Chain(ActivityTask(my_task, 1),
              ActivityTask(my_task, 2)),
        Chain(ActivityTask(my_task, 3),
              ActivityTask(my_task, 4))
    )
)

=> 1st decision: 2 activities are scheduled:

  • my_task-1 with the input {"args": [1]}
  • then we jump to the next chain
  • my_task-2 with the input {"args": [3]}

Now imagine that my_task-1 finishes. A new decision is triggered.

=> 2nd decision:

  • my_task-1 is already finished, nothing to do
  • the next task in the chain gets the activity ID my_task-2 in simpleflow; simpleflow looks at the history and sees that this ID is already scheduled, so nothing to do (wrong! this is not the same task, but it cannot know)
  • then it sees the 1st task in the next chain, and names it my_task-3, and schedules it with the input: {"args": [3]}

In the end we will finish with 4 tasks executed:

  • my_task-1 with the input {"args": [1]}
  • my_task-2 with the input {"args": [3]}
  • my_task-3 with the input {"args": [3]}
  • my_task-4 with the input {"args": [4]}

=> probably not what we wanted...
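
The sequence of decisions above can be replayed with a small plain-Python model. This is only a sketch under assumptions: `decide` is a hypothetical stand-in for the decider, ids are numbered in the order tasks are encountered during a replay, and none of this is simpleflow's actual implementation.

```python
# Arguments passed to my_task in each chain of the group.
CHAINS = [[1, 2], [3, 4]]


def decide(history, finished):
    """One replay: walk each chain and number tasks in encounter order."""
    counter = 0
    for chain in CHAINS:
        for arg in chain:
            counter += 1
            task_id = "my_task-{}".format(counter)
            if task_id not in history:
                # Unknown id: schedule it with the current argument.
                history[task_id] = arg
            if task_id not in finished:
                # The rest of this chain waits; jump to the next chain.
                break


history = {}    # task_id -> input it was scheduled with
snapshots = []  # copy of the history after each decision
for finished in [set(), {"my_task-1"}, {"my_task-1", "my_task-2", "my_task-3"}]:
    decide(history, finished)
    snapshots.append(dict(history))
```

In this model the input 3 is scheduled twice (as my_task-2 and my_task-3) and the input 2 is never scheduled, which matches the list of 4 executed tasks above.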