The znflow from zincware

Assign each Node a unique `uuid`

Using a uuid seems more deliberate than using id(node) and might have some advantages in the future.

import uuid

node = ... # Node, nodify

node.uuid == uuid.uuid4()

Better error messages: print node that causes the error

If an error happens inside the project.run print the Node that caused the error

Test with networkx 3

ZnFlow/pyproject.toml

Line 11 in 5686243

networkx = "^2.8.8"

Issues with `init`

The following Code has multiple issues:

class MyNode(znflow.Node):
    def __init__(self, name):
        self.name = name
        self._name = self.name

with znflow.DiGraph() as dag:
    node = MyNode(name="John Doe")

print(node.name)

It does not run through
self._name would not be self.name but Connector(node, "name")

This looks really nice. Just a general comment on Dask at a large scale. The issue we came across was that Dask will deploy as many jobs as possible to a single task which will mean if you set your cluster parameters such that you want 5 GB of memory because that is how much on job requires, it will violate it immediately as it just gives this amount of memory to one worker who can run as many jobs as possible.

The way I am trying to avoid this is to assign worker resources which limit the number of tasks it can deploy, e.g. 1 GPU per model training means only one model can run on the worker at a time. Or, for espresso, 1 "espresso". But in the case you have here, if you task was a large matrix computation and you heavily parallelised over nodes, I think you would reach a dead worker pretty fast.

Originally posted by @SamTov in #21 (review)

Unexpected output

ZnFlow/znflow/deployment.py

Line 149 in 0169cd7

    
           def get_results(self, obj: typing.Union[NodeBaseMixin, list, dict, NodeView], /):

Check with:

from ase.calculators.lj import LennardJones
from ase.build import molecule
import numpy as np
import znflow

water = molecule('H2O')

waters = []
for _ in range(1000):
    water = molecule('H2O')
    water.positions += np.random.normal(scale=0.2, size=(3, 3))
    water.calc = LennardJones()
    waters.append(water)


@znflow.nodify
def get_forces(atoms):
    return atoms.get_forces()

@znflow.nodify
def concatenate(forces):
    return np.concatenate(forces)

with znflow.DiGraph() as graph:
    forces = [get_forces(water) for water in waters]
    forces = concatenate(forces) # TODO this is rather slow

and

deployment = znflow.deployment.Deployment(graph=graph,)
deployment.submit_graph()
results = deployment.get_results(forces) # returns a FunctionFuture instead of np.ndarray

Do not use ZnInit

I think that ZnInit can be fully replaced by simple inheritance. It should be possible to apply the graph to anything that has the parent class / mixing to overwrite setattribute and digraph.

This way, ZnFlow can be applied to e.g. data classes, attrs and generally every class.

add docstrings

Currently we do not lint for docstrings:

ZnFlow/pyproject.toml

Lines 45 to 50 in 378e8ff

    
           [tool.ruff] 
        
           line-length = 90 
        
           select = ["E", "F"] #, "D"] #, "N", "C", "ANN"] 
        
           extend-ignore = [ 
        
               "D213", "D203" 
        
           ]

this should be changed to select = ["E", "F", "D", "N", "C", "ANN"] with as few as possible exclusions.
Doing so, will require adding docstrings and type hints to the code.

combine should also work with list of Nodes

ZnFlow/znflow/combine.py

Lines 92 to 96 in 0169cd7

    
           elif return_dict_attr: 
        
               raise TypeError( 
        
                   "znflow.combine can only return a dict if the result type is" 
        
                   " 'CombinedConnections'." 
        
               )

This is not true. It should also work with a list of Nodes. znflow.combine([a, b], attribute="atoms", return_dict_attr="uuid")

Give the `empty` object a `repr` and `str`

ZnFlow/znflow/base.py

Line 79 in 031dab2

empty = object()

Having __repr__ and __str__ would make it easier to identify, e.g. if checking for the graph this can cause issues.
Also, why not use None in instance._graph_ instead?

`_external_` Node

Allow for Nodes that are not on the Graph.

@dataclasses.dataclass
class ComputeMean(znflow.Node):
    x: float
    y: float
    
    results: float = None

   _external_ = True
    

    def run(self):
        self.results = (self.x + self.y) / 2

Do not resolve Connections for the external Node, but only connections that point towards this Node. Also do not call the run method of this Node.

Test with PyDantic

ZnFlow/tests/test_node.py

Lines 10 to 30 in 57834c9

    
           class PlainNode(znflow.Node): 
        
               def __init__(self, value): 
        
                   self.value = value 
        
               def run(self): 
        
                   self.value += 1 
        
           @dataclasses.dataclass 
        
           class DataclassNode(znflow.Node): 
        
               value: int 
        
               def run(self): 
        
                   self.value += 1 
        
           class ZnInitNode(zninit.ZnInit, znflow.Node): 
        
               value: int = zninit.Descriptor() 
        
               def run(self): 
        
                   self.value += 1

CI on push to main

CI does not run on PR merge in main. Add push on branch main to CI

Allow getitem

There are often scenarios where getitem would be helpful.
It is one of only a few things that might be good to natively support?
It could be added to:

ZnFlow/znflow/base.py

Line 78 in 2759062

class Connection:

How would this be resolved in ZnTrack?

Should the graph add all Nodes automatically?

We currently have:

import znflow

@znflow.nodify
def add(*args):
   return sum(args)

with znflow.DiGraph() as graph:
   add(1, 2, 3)
   result = add(4, 5, 6)

assert len(graph) == 2

vs

import znflow

@znflow.nodify
def add(*args):
   return sum(args)

with znflow.DiGraph() as graph:
   add(1, 2, 3)
   result = add(4, 5, 6)

   graph.add(result)

assert len(graph) == 1

In other words: Is creating a new Node inside the contextmanager sufficient, to show that this Node is now attached to the graph?

Issue with `znflow.Node.result`

Check if there is an issue, if the znflow.Node has an attribute result.
Try:

class Node(znflow.Node):

@property
def result(self):
   raise NotImplementedError

with znflow.DiGraph() as graph:
     _ = Node().result

Dask support

Have an extra with dask, to run and distribute the graph.

Improve Readme

Add class based to the Readme.
Mention Future API for functions and maybe Connections?

Add combination of both.

add update=False

Add self.updated=False before making the update in the handle method.

ZnFlow/znflow/utils.py

Lines 36 to 42 in 3cf29bd

    
           def handle(self, value, **kwargs): 
        
               """Fallback handling if no siggledispatch was triggered.""" 
        
               result = self.default(value, **kwargs) 
        
               if result is not value: 
        
                   self.updated = True 
        
               return result

Split `NodeConnector` and `FunctionConnector`

Currently bot NodeConnector and FunctionConnector have a shared parent.

NodeConnector should:

only have a reference to the Node and the attribute, no graph reference required.

FunctionConnector should:

Have a state
implement https://docs.python.org/3/library/asyncio-future.html methods (at least, result, set_result, done, cancled)
be renamed to FunctionFuture
get_result renamed to result

Allow List Combinations

You can do:

with znflow.DiGraph() as graph:
    node = Node(data=[a, b, c])

but you can not do:

with znflow.DiGraph() as graph:
    node = Node(data=a+b+c)

Please comment, if you think this is a useful and good feature.

Move all IterableHandlers into a module

The following and maybe others should be put into a seperate module. Could be utils or updaters or handlers, etc.
Especially class _UpdateConnectors(utils.IterableHandler): could be useful.

ZnFlow/znflow/graph.py

Lines 21 to 50 in 3cf29bd

    
           class _AttributeToConnection(utils.IterableHandler): 
        
               def default(self, value, **kwargs): 
        
                   assert not kwargs 
        
                   if isinstance(value, FunctionFuture): 
        
                       if value._graph_ is None: 
        
                           return value 
        
                       return Connection(value, attribute="result") 
        
                   elif isinstance(value, Node): 
        
                       if value._graph_ is None: 
        
                           return value 
        
                       return Connection(value, attribute=None) 
        
                   else: 
        
                       return value 
        
           class _AddConnectionToGraph(utils.IterableHandler): 
        
               def default(self, value, **kwargs): 
        
                   if isinstance(value, Connection): 
        
                       graph = kwargs["graph"] 
        
                       v_attr = kwargs.get("attribute") 
        
                       node_instance = kwargs["node_instance"] 
        
                       if v_attr is None: 
        
                           graph.add_connections(value, node_instance) 
        
                       else: 
        
                           graph.add_connections(value, node_instance, v_attr=v_attr) 
        
           class _UpdateConnectors(utils.IterableHandler): 
        
               def default(self, value, **kwargs): 
        
                   return value.result if isinstance(value, Connection) else value

I think this might be slow

We need to debug zntrack.Project.run() because it seems be rather slow.
Maybe this could be a cause.

ZnFlow/znflow/graph.py

Line 74 in fadf4c9

for attribute in dir(node_instance):

Test combine outside graph

Combine should work independent of inside the graph or not

support / mention Python futures

see https://docs.python.org/3/library/asyncio-future.html

Connecting classes

Connect classes and not just instances?

rename `sorted_iter`

rename

ZnFlow/znflow/graph.py

Line 43 in 378e8ff

def sorted_iter(self):

to something more descriptive, e.g. get_sorted_nodes

Restructuring: Store connections on the networkx graph only

Currently we store connections in the networkx graph and inside the NodeConnector.
This could cause issues if only one of them is changed. It is rather unlikely because they are frozen but still possible.

My suggestion: use the networkx MultiDiGraph to store the connections there, and only there. Do not use a NodeConnector but a NodeConnection which reads from the graph to display how it is connected.

In this case the Node itself doesn't have to know anything about the graph it belongs to and the graph fulfills it's purpose of managing the connections instead of Connector.result()

`znflow.Property` does behave differently to `property`

In the ZnTrack package, a znflow.Property does not overwrite zntrack.zn.outs() which causes strange error messages.

`znflow.map`

Allow for:

class AddOne:
	inputs: int
	output: int

	def run(self) -> None:
		self.output = self.inputs + 1

	def map(self, **kwargs) -> None:
		result = []
		for value in kwargs["inputs"]:
			self.inputs = value
			self.run()
			result.append(self.output)
		self.output = result

class AddOneSum(AddOne):
	def map(self, **kwargs) -> None:
		super().map(**kwargs)
		self.output = sum(self.output)

with graph:
	a.data: int
	b = AddOne(a.data)

b.output: int

with graph:
   c.data: list[int]
   d = znflow.map(AddOne, inputs=c.data)(**{}) # optional non-mapped args

d.output: list[int]

with graph:
	e = znflow.map(AddOneSum, inputs=c.data)()

e.output: int

each MappedNode should have the same arguments than the original one, but with an added dimension on the mapped property.
Otherwise, they shold be treated equally.

Points to discuss:

provide a default map method?
name: map vs _map_ ?
allow for custom arguments to the znflow.map, e.g. a more functional approach where you pass a transformation, e.g. the sum there and don't necessitate writing it inside the node itself.

Tests for `znflow.combine`

There are new features in znflow.combine from #67 that aren't automatically tested yet.

Change in_construction

ZnFlow/znflow/node.py

Line 57 in ee5e46e

cls._in_construction = False

We could change in_construction in such a way, that it is true until the __init__ was called? This way also things like __subclass_init__ should be covered. See zincware/IPSuite#51

Check if _protected_ is required

Check if the attribute protected is still required.

Future / Connection result

ZnFlow/znflow/base.py

Line 268 in 0169cd7

result: any = dataclasses.field(default=None, init=False, repr=True)

Should the result really be None or should it raise an Error if it is not available?

Add License

Wrap init return value

The wrap init decorator does not return from init. Typically init does not have a return value but maybe someone does something with it

Improve lint actions

add poetry group for lint
install using --no-root
update actions to not fail fast and run all checks in a single action checkout

Better error messages

train_selection = val_selection.excluded_atoms

val_data += val_selection.atoms
train_data += train_selection #.atoms raises error

have something like: znflow.ConnectionError: connected attribute "list" has no attribute "atoms"

instead of:

TypeError: Can not add <class 'ipsuite.configuration_selection.random.RandomSelection'> to <class 'znflow.base.Connection'>.

and

AttributeError: 'Connection' object has no attribute 'atoms'

Fix Assertion Error

Replace with Value or probably TypeError because Assertion should not be used in production.

Plot a better error message not only including the UUID but also a repr or the Node so it can be identified more easily.

ZnFlow/znflow/graph.py

Lines 134 to 135 in 3cf29bd

    
           assert u_of_edge.uuid in self, f"'{u_of_edge.uuid=}' not in '{self=}'" 
        
           assert v_of_edge.uuid in self, f"'{v_of_edge.uuid=}' not in '{self=}'"

Use `znflow.edges`

ZnFlow does rely on Connections instead of using znflow.edges. It would probably better to rely on edges, because they are easier to modify / add if you want to set order afterwards (see dask4dvc)

[Docs] reopen a Graph.

The following should be added to the documentaiton.
You can also use:

graph = znflow.DiGraph()

with graph:
  node1 = ...

with graph:
  node2 = ...

....

Add tests for `znflow.Property`

The following class is not tested enough.
Ideally, there would also be another way without entirely reimplementing the property class but I don't know how.

ZnFlow/znflow/base.py

Lines 23 to 73 in a24ee7c

    
           class Property: 
        
               """Custom Property with disabled graph. 
        
               References 
        
               ---------- 
        
               Adapted from https://docs.python.org/3/howto/descriptor.html#properties 
        
               """ 
        
               def __init__(self, fget=None, fset=None, fdel=None, doc=None): 
        
                   self.fget = disable_graph()(fget) 
        
                   self.fset = disable_graph()(fset) 
        
                   self.fdel = disable_graph()(fdel) 
        
                   if doc is None and fget is not None: 
        
                       doc = fget.__doc__ 
        
                   self.__doc__ = doc 
        
                   self._name = "" 
        
               def __set_name__(self, owner, name): 
        
                   self._name = name 
        
               def __get__(self, obj, objtype=None): 
        
                   if obj is None: 
        
                       return self 
        
                   if self.fget is None: 
        
                       raise AttributeError(f"property '{self._name}' has no getter") 
        
                   return self.fget(obj) 
        
               def __set__(self, obj, value): 
        
                   if self.fset is None: 
        
                       raise AttributeError(f"property '{self._name}' has no setter") 
        
                   self.fset(obj, value) 
        
               def __delete__(self, obj): 
        
                   if self.fdel is None: 
        
                       raise AttributeError(f"property '{self._name}' has no deleter") 
        
                   self.fdel(obj) 
        
               def getter(self, fget): 
        
                   prop = type(self)(fget, self.fset, self.fdel, self.__doc__) 
        
                   prop._name = self._name 
        
                   return prop 
        
               def setter(self, fset): 
        
                   prop = type(self)(self.fget, fset, self.fdel, self.__doc__) 
        
                   prop._name = self._name 
        
                   return prop 
        
               def deleter(self, fdel): 
        
                   prop = type(self)(self.fget, self.fset, fdel, self.__doc__) 
        
                   prop._name = self._name 
        
                   return prop

	[tool.ruff]
	line-length = 90
	select = ["E", "F"] #, "D"] #, "N", "C", "ANN"]
	extend-ignore = [
	"D213", "D203"
	]

	elif return_dict_attr:
	raise TypeError(
	"znflow.combine can only return a dict if the result type is"
	" 'CombinedConnections'."
	)

	class PlainNode(znflow.Node):
	def __init__(self, value):
	self.value = value

	def run(self):
	self.value += 1


	@dataclasses.dataclass
	class DataclassNode(znflow.Node):
	value: int

	def run(self):
	self.value += 1


	class ZnInitNode(zninit.ZnInit, znflow.Node):
	value: int = zninit.Descriptor()

	def run(self):
	self.value += 1

	def handle(self, value, **kwargs):
	"""Fallback handling if no siggledispatch was triggered."""

	result = self.default(value, **kwargs)
	if result is not value:
	self.updated = True
	return result

	class _AttributeToConnection(utils.IterableHandler):
	def default(self, value, **kwargs):
	assert not kwargs
	if isinstance(value, FunctionFuture):
	if value._graph_ is None:
	return value
	return Connection(value, attribute="result")
	elif isinstance(value, Node):
	if value._graph_ is None:
	return value
	return Connection(value, attribute=None)
	else:
	return value


	class _AddConnectionToGraph(utils.IterableHandler):
	def default(self, value, **kwargs):
	if isinstance(value, Connection):
	graph = kwargs["graph"]
	v_attr = kwargs.get("attribute")
	node_instance = kwargs["node_instance"]
	if v_attr is None:
	graph.add_connections(value, node_instance)
	else:
	graph.add_connections(value, node_instance, v_attr=v_attr)


	class _UpdateConnectors(utils.IterableHandler):
	def default(self, value, **kwargs):
	return value.result if isinstance(value, Connection) else value

	assert u_of_edge.uuid in self, f"'{u_of_edge.uuid=}' not in '{self=}'"
	assert v_of_edge.uuid in self, f"'{v_of_edge.uuid=}' not in '{self=}'"

	class Property:
	"""Custom Property with disabled graph.

	References
	----------
	Adapted from https://docs.python.org/3/howto/descriptor.html#properties
	"""

	def __init__(self, fget=None, fset=None, fdel=None, doc=None):
	self.fget = disable_graph()(fget)
	self.fset = disable_graph()(fset)
	self.fdel = disable_graph()(fdel)
	if doc is None and fget is not None:
	doc = fget.__doc__
	self.__doc__ = doc
	self._name = ""

	def __set_name__(self, owner, name):
	self._name = name

	def __get__(self, obj, objtype=None):
	if obj is None:
	return self
	if self.fget is None:
	raise AttributeError(f"property '{self._name}' has no getter")
	return self.fget(obj)

	def __set__(self, obj, value):
	if self.fset is None:
	raise AttributeError(f"property '{self._name}' has no setter")
	self.fset(obj, value)

	def __delete__(self, obj):
	if self.fdel is None:
	raise AttributeError(f"property '{self._name}' has no deleter")
	self.fdel(obj)

	def getter(self, fget):
	prop = type(self)(fget, self.fset, self.fdel, self.__doc__)
	prop._name = self._name
	return prop

	def setter(self, fset):
	prop = type(self)(self.fget, fset, self.fdel, self.__doc__)
	prop._name = self._name
	return prop

	def deleter(self, fdel):
	prop = type(self)(self.fget, self.fset, fdel, self.__doc__)
	prop._name = self._name
	return prop

zincware / znflow Goto Github PK

znflow's People

Contributors

Stargazers

Watchers

znflow's Issues

Recommend Projects

Recommend Topics

Recommend Org