Comments (3)
Here's another illustration for this problem, extracted from real code that was misbehaving on SWF.
Let's assume we run two tasks for each of two sub-datasets (called "blocks" below), the second task depending on the completion of the first one for the same block. If the first task takes longer for block 0, then we'll submit the second task twice for block 1, and never for block 0, with the following code:
import time

from simpleflow import activity, Workflow, futures


@activity.with_attributes(task_list="test", version="1.0")
def first_activity(block):
    print("start first_activity for block {}".format(block))
    if block == 0:
        time.sleep(5)
    else:
        time.sleep(1)
    print("finish first_activity for block {}".format(block))


@activity.with_attributes(task_list="test", version="1.0")
def second_activity(block):
    print("triggered second_activity for block {}".format(block))


class ChangingWorkflow(Workflow):
    name = "changing"
    version = "1.0"
    task_list = "test"

    def run(self):
        _all = []
        for block in [0, 1]:
            first = self.submit(first_activity, block)
            _all.append(first)
            # Problem: this branch changes which tasks get submitted on each replay.
            if first.finished:
                second = self.submit(second_activity, block)
                _all.append(second)
        futures.wait(*_all)
The resulting log in a single activity worker speaks for itself:
start first_activity for block 1
start first_activity for block 0
finish first_activity for block 1
triggered second_activity for block 1
finish first_activity for block 0
triggered second_activity for block 1
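To make the mechanism behind this log easier to follow, here is a toy model of the replay. It does not use simpleflow at all: `replay`, `submit` and the `history` dict are hypothetical stand-ins, and the "<name>-<n>" numbering in submission order is an assumption about how task IDs get assigned, made only for illustration.

```python
from collections import defaultdict


def replay(history, workflow):
    """Re-run the workflow from scratch against the recorded history.

    Task IDs are "<name>-<n>", numbered in submission order within one
    replay (an assumed naming scheme for this toy model).
    """
    counters = defaultdict(int)

    def submit(name, arg):
        counters[name] += 1
        task_id = "{}-{}".format(name, counters[name])
        if task_id not in history:
            # Unknown ID: schedule a new task with the current input.
            history[task_id] = {"input": arg, "finished": False}
        # Known ID: reuse the recorded state, whatever its input was.
        return history[task_id]

    workflow(submit)


def run(submit):
    # Same shape as ChangingWorkflow.run() above.
    for block in [0, 1]:
        first = submit("first_activity", block)
        if first["finished"]:
            submit("second_activity", block)


history = {}
replay(history, run)                              # 1st decision
history["first_activity-2"]["finished"] = True    # block 1 finishes first
replay(history, run)                              # 2nd decision
history["first_activity-1"]["finished"] = True    # block 0 finishes
history["second_activity-1"]["finished"] = True
replay(history, run)                              # 3rd decision

second_inputs = [
    history[task_id]["input"]
    for task_id in sorted(history)
    if task_id.startswith("second_activity")
]
print(second_inputs)  # [1, 1]
```

In the 3rd decision, the second task for block 0 is the first "second_activity" submitted in that replay, so it gets the ID that block 1 already used and is treated as already scheduled. Block 1 then gets a fresh ID and runs again.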
If we can assume that tasks are idempotent, that certainly simplifies the problem. I'm not sure about variations of the arguments though (we can imagine simple scenarios where the arguments themselves would vary between workflow replays, and we probably don't want to spin up a new activity each time).
A more generic solution would consist in giving the ability to control the task id generation from the Workflow.submit() method or a decorator on the task, for instance by specifying a suffix that is user-controlled and varies with some parameter the task depends on (like block here).
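As a rough sketch of what a user-controlled suffix could look like, here is a toy replay model (no simpleflow involved). The `id_suffix` parameter is purely hypothetical: it is not an existing argument of Workflow.submit(), and `replay`, `submit` and `history` are illustrative names only.

```python
from collections import defaultdict


def replay(history, workflow):
    """Toy replay: re-run the workflow against the recorded history."""
    counters = defaultdict(int)

    def submit(name, arg, id_suffix=None):
        if id_suffix is None:
            # Fallback: number tasks in submission order (assumed default).
            counters[name] += 1
            id_suffix = counters[name]
        task_id = "{}-{}".format(name, id_suffix)
        # Schedule the task only if this ID was never seen before.
        return history.setdefault(task_id, {"input": arg, "finished": False})

    workflow(submit)


def run(submit):
    for block in [0, 1]:
        suffix = "block{}".format(block)  # varies with what the task depends on
        first = submit("first_activity", block, id_suffix=suffix)
        if first["finished"]:
            submit("second_activity", block, id_suffix=suffix)


history = {}
replay(history, run)                                      # 1st decision
history["first_activity-block1"]["finished"] = True       # block 1 finishes first
replay(history, run)                                      # 2nd decision
history["first_activity-block0"]["finished"] = True       # block 0 finishes
history["second_activity-block1"]["finished"] = True
replay(history, run)                                      # 3rd decision

second_inputs = [
    history[task_id]["input"]
    for task_id in sorted(history)
    if task_id.startswith("second_activity")
]
print(second_inputs)  # [0, 1]
```

Because the ID no longer depends on submission order, each block keeps the same ID across replays, and the second task runs once per block.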
What I don't like here overall is that it's a trap for users: it's very easy to assume that the code above will work well, and if you don't understand this basic simpleflow assumption, you'll fail.
Back to the drawing board, stay tuned ;-)
from simpleflow.
Closing this old one, idempotent tasks mostly do the job \o/
New variant of this now that we have chains/groups. Example of a buggy decider (given "my_task" is not configured as idempotent):
self.submit(
    Group(
        Chain(ActivityTask(my_task, 1),
              ActivityTask(my_task, 2)),
        Chain(ActivityTask(my_task, 3),
              ActivityTask(my_task, 4)),
    )
)
=> 1st decision: 2 activities are scheduled:
- my_task-1 with the input {"args": [1]}
- then we jump to the next chain
- my_task-2 with the input {"args": [3]}
Now imagine that my_task-1 finishes. A new decision is triggered.
=> 2nd decision:
- my_task-1 is already finished, nothing to do
- the next task in the chain gets the activity ID of my_task-2 in simpleflow; simpleflow looks at the history and sees it's already scheduled, so nothing to do (wrong! this is not the same, but it cannot know)
- then it sees the 1st task in the next chain, names it my_task-3, and schedules it with the input {"args": [3]}
In the end we will finish with 4 tasks executed:
- my_task-1 with the input {"args": [1]}
- my_task-2 with the input {"args": [3]}
- my_task-3 with the input {"args": [3]}
- my_task-4 with the input {"args": [4]}
=> probably not what we wanted...
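The sequence of decisions above can be reproduced with a small toy model, sketched here without simpleflow. Chains are plain lists of arguments, `counter_id` mimics the order-based numbering described above, and `hashed_id` is one plausible way an argument-derived ID could behave for idempotent tasks. All of these names are hypothetical, and the MD5-over-JSON hashing is an assumption, not simpleflow's actual algorithm.

```python
import hashlib
import json


def counter_id(name, arg, counters):
    # Order-based ID: "<name>-<n>", n being the submission rank in this replay.
    counters[name] = counters.get(name, 0) + 1
    return "{}-{}".format(name, counters[name])


def hashed_id(name, arg, counters):
    # Argument-derived ID (assumed scheme); `counters` is unused here.
    payload = json.dumps({"args": [arg]}, sort_keys=True).encode("utf-8")
    return "{}-{}".format(name, hashlib.md5(payload).hexdigest())


def decide(history, chains, make_id):
    """One decision: walk each chain, stopping at its first unfinished task."""
    counters = {}
    for chain in chains:
        for arg in chain:
            task_id = make_id("my_task", arg, counters)
            task = history.setdefault(task_id, {"input": arg, "finished": False})
            if not task["finished"]:
                break


def simulate(make_id):
    history = {}
    chains = [[1, 2], [3, 4]]
    decide(history, chains, make_id)            # 1st decision
    first = next(t for t in history.values() if t["input"] == 1)
    first["finished"] = True                    # my_task(1) finishes first
    decide(history, chains, make_id)            # 2nd decision
    # Let everything finish, deciding again until nothing new is scheduled.
    while True:
        before = len(history)
        for task in history.values():
            task["finished"] = True
        decide(history, chains, make_id)
        if len(history) == before:
            break
    return sorted(task["input"] for task in history.values())


print(simulate(counter_id))  # [1, 3, 3, 4]
print(simulate(hashed_id))   # [1, 2, 3, 4]
```

With order-based IDs the input 2 is never executed and 3 runs twice, as in the walkthrough. With IDs derived from the arguments, each input runs exactly once, which is consistent with the earlier remark that idempotent tasks mostly do the job.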