Can't have a function with both Parallelizable and Collect (hamilton issue, open)

skrawcz commented on May 28, 2024

Can't have a function with both Parallelizable and Collect


Comments (1)

zilto commented on May 28, 2024

Would like to hear from @elijahbenizzy, but I think this dataflow is behaving as intended, and has behaved this way since Parallelizable/Collect was introduced.

Issue 1: `motor` is collected twice

Parallel/Collect work as a pair. Since `motor` is Parallel, it should be Collected downstream only once. In the simplest case, Parallel iterates and Collect creates a list. When I see a Parallel while reading the code, I ask myself "where is it collected?". Having it collected only once seems to improve readability.
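To make the "Parallel iterates, Collect creates a list" pairing concrete, here is a minimal sketch in plain Python. It does not use Hamilton at all: the generator and the `collect` helper are stand-ins I made up to mimic the semantics, and the even/odd rule is borrowed from the solution code further down.

```python
from typing import Iterable, Iterator


def motor(motor_list: list[int]) -> Iterator[int]:
    """Stand-in for a Parallelizable node: yields one item per branch."""
    yield from motor_list


def motor_status(motor: int) -> dict:
    """Runs once per yielded item, inside the parallel block."""
    return {"motor_id": motor, "is_on": motor % 2 == 0}


def collect(branch_results: Iterable[dict]) -> list[dict]:
    """Stand-in for the single Collect node that closes the block (hypothetical helper)."""
    return list(branch_results)


# One Parallel, one Collect: the block opens at `motor` and closes exactly once.
statuses = collect(motor_status(m) for m in motor([1, 2, 3, 4]))
```

Read this way, a second Collect on the same block has nothing left to close, which is why I'd expect exactly one per Parallel.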

Issue 2: a node can't be both Collect and Parallel

I don't know why it's a limitation, but it seems to improve the readability of the graph. Metaflow imposes the same constraint: each foreach step needs an explicit join step to merge artifacts (which is not always trivial). However, Metaflow does allow nested foreach.

Solution

As you describe, @skrawcz, the following code works and is also, IMHO, more readable.

import pandas as pd
from hamilton.htypes import Collect, Parallelizable

# First Parallel/Collect block: one branch per motor.
def motor(motor_list: list[int]) -> Parallelizable[int]:
    for _motor in motor_list:
        yield _motor

def _is_motor_on(motor: int) -> bool:
    # Placeholder check: even motor ids count as "on".
    return motor % 2 == 0

def motor_status(motor: int) -> dict:
    # logic to check
    return {
        "motor_id": motor,
        "is_on": _is_motor_on(motor),
    }

# `motor` is collected here, and only here.
def motor_status_collection(motor_status: Collect[dict]) -> list[dict]:
    return list(motor_status)

# Second Parallel/Collect block: a separate node opens it,
# so no node is both Collect and Parallelizable.
def on_motor(motor_status_collection: list[dict]) -> Parallelizable[int]:
    for motor_dict in motor_status_collection:
        if motor_dict["is_on"]:
            yield motor_dict["motor_id"]

def status_check_1(on_motor: int) -> float:
    # some status check.
    return 2.3 * on_motor

def status_check_2(on_motor: int, status_check_1: float) -> str:
    return f"some result based on {on_motor} and {status_check_1}"

def status_result(on_motor: int, status_check_1: float, status_check_2: str) -> dict:
    # locals() holds exactly the three arguments, keyed by name.
    return locals()

def on_motor_statuses(status_result: Collect[dict]) -> pd.DataFrame:
    return pd.DataFrame(status_result)
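For anyone who wants to check what this dataflow should produce without running it, here is a hand trace in plain Python. It is only a sketch under assumptions: it uses no Hamilton and no pandas, the input `[1, 2, 3, 4]` is an arbitrary example, the node functions are inlined as loops and comprehensions, and the final rows are kept as a list of dicts rather than a DataFrame.

```python
motor_list = [1, 2, 3, 4]  # arbitrary example input (assumption)

# Block 1: `motor` fans out, `motor_status_collection` collects once.
motor_status_collection = [
    {"motor_id": m, "is_on": m % 2 == 0} for m in motor_list
]

# Block 2: `on_motor` fans out again over the motors that are on.
rows = []
for status in motor_status_collection:
    if not status["is_on"]:
        continue
    on_motor = status["motor_id"]
    status_check_1 = 2.3 * on_motor
    status_check_2 = f"some result based on {on_motor} and {status_check_1}"
    # Same keys that `status_result` returns via locals().
    rows.append(
        {
            "on_motor": on_motor,
            "status_check_1": status_check_1,
            "status_check_2": status_check_2,
        }
    )

# `rows` plays the role of the input to `on_motor_statuses`.
on_motor_ids = [row["on_motor"] for row in rows]
```

With this input only motors 2 and 4 reach the second block, so the collected result has two rows.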

Next steps

It seems that the desired user workflow is easy to support with the current features (with the few edits shared above). I think we can improve the documentation around Parallel/Collect. We might also want to catch these errors at Driver instantiation and fail when the graph is built. I believe the challenge is that, ignoring the Parallel/Collect annotations, the initial DAG structure is valid. Since the submitted DAG is invalid once those annotations are taken into account, it's unclear what the behavior of the viz should be. The issue seems to exist upstream of the viz.
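As a rough idea of what such an early check could look like, here is a sketch that inspects type annotations only. It is not Hamilton's implementation: `Parallelizable` and `Collect` below are hypothetical stand-in marker classes so the snippet runs on its own, `find_parallel_collect_errors` is a name I made up, and the invalid functions at the bottom are my own illustration of the two issues, not the code from the original report.

```python
from typing import Callable, Generic, TypeVar, get_origin, get_type_hints

T = TypeVar("T")


class Parallelizable(Generic[T]):
    """Hypothetical stand-in for hamilton.htypes.Parallelizable."""


class Collect(Generic[T]):
    """Hypothetical stand-in for hamilton.htypes.Collect."""


def find_parallel_collect_errors(functions: list[Callable]) -> list[str]:
    """Return one message per violated Parallel/Collect rule (hypothetical helper)."""
    hints = {fn.__name__: get_type_hints(fn) for fn in functions}
    params = {n: [p for p in h if p != "return"] for n, h in hints.items()}
    parallel = {
        n for n, h in hints.items() if get_origin(h.get("return")) is Parallelizable
    }

    def nearest_parallel(node: str, seen: set[str]) -> set[str]:
        # Walk upstream until the first Parallelizable node on each path.
        if node in seen or node not in params:
            return set()
        seen.add(node)
        if node in parallel:
            return {node}
        found: set[str] = set()
        for dep in params[node]:
            found |= nearest_parallel(dep, seen)
        return found

    errors: list[str] = []
    collectors: dict[str, list[str]] = {}
    for name, fn_hints in hints.items():
        collected = [p for p in params[name] if get_origin(fn_hints[p]) is Collect]
        if collected and name in parallel:
            errors.append(f"'{name}' has a Collect input and a Parallelizable output")
        for param in collected:
            for source in nearest_parallel(param, set()):
                collectors.setdefault(source, []).append(name)
    for source, names in collectors.items():
        if len(names) > 1:
            errors.append(f"'{source}' is collected more than once: {sorted(names)}")
    return errors


# Illustrative invalid dataflow (an assumption, not the reporter's code).
def motor(motor_list: list[int]) -> Parallelizable[int]: ...
def motor_status(motor: int) -> dict: ...
def on_motor(motor_status: Collect[dict]) -> Parallelizable[int]: ...
def all_status(motor_status: Collect[dict]) -> list[dict]: ...


errors = find_parallel_collect_errors([motor, motor_status, on_motor, all_status])
```

A check like this could run while the graph is being built, so the failure would surface before the viz ever sees the DAG.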
