Code Monkey home page Code Monkey logo

znflow's People

Contributors

pre-commit-ci[bot] avatar pythonfz avatar ruttor avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

znflow's Issues

Assign each Node a unique `uuid`

Using a uuid seems more deliberate than using id(node) and might have some advantages in the future.

import uuid

node = ... # Node, nodify

node.uuid == uuid.uuid4()

Issues with `__init__`

The following Code has multiple issues:

class MyNode(znflow.Node):
    def __init__(self, name):
        self.name = name
        self._name = self.name

with znflow.DiGraph() as dag:
    node = MyNode(name="John Doe")

print(node.name)
  1. It does not run through
  2. self._name would not be self.name but Connector(node, "name")

Dask worker resources

This looks really nice. Just a general comment on Dask at a large scale. The issue we came across was that Dask will deploy as many jobs as possible to a single task which will mean if you set your cluster parameters such that you want 5 GB of memory because that is how much on job requires, it will violate it immediately as it just gives this amount of memory to one worker who can run as many jobs as possible.

The way I am trying to avoid this is to assign worker resources which limit the number of tasks it can deploy, e.g. 1 GPU per model training means only one model can run on the worker at a time. Or, for espresso, 1 "espresso". But in the case you have here, if you task was a large matrix computation and you heavily parallelised over nodes, I think you would reach a dead worker pretty fast.

Originally posted by @SamTov in #21 (review)

Unexpected output

def get_results(self, obj: typing.Union[NodeBaseMixin, list, dict, NodeView], /):

Check with:

from ase.calculators.lj import LennardJones
from ase.build import molecule
import numpy as np
import znflow

water = molecule('H2O')

waters = []
for _ in range(1000):
    water = molecule('H2O')
    water.positions += np.random.normal(scale=0.2, size=(3, 3))
    water.calc = LennardJones()
    waters.append(water)


@znflow.nodify
def get_forces(atoms):
    return atoms.get_forces()

@znflow.nodify
def concatenate(forces):
    return np.concatenate(forces)

with znflow.DiGraph() as graph:
    forces = [get_forces(water) for water in waters]
    forces = concatenate(forces) # TODO this is rather slow

and

deployment = znflow.deployment.Deployment(graph=graph,)
deployment.submit_graph()
results = deployment.get_results(forces) # returns a FunctionFuture instead of np.ndarray

Do not use ZnInit

I think that ZnInit can be fully replaced by simple inheritance. It should be possible to apply the graph to anything that has the parent class / mixing to overwrite setattribute and digraph.

This way, ZnFlow can be applied to e.g. data classes, attrs and generally every class.

add docstrings

Currently we do not lint for docstrings:

ZnFlow/pyproject.toml

Lines 45 to 50 in 378e8ff

[tool.ruff]
line-length = 90
select = ["E", "F"] #, "D"] #, "N", "C", "ANN"]
extend-ignore = [
"D213", "D203"
]

this should be changed to select = ["E", "F", "D", "N", "C", "ANN"] with as few as possible exclusions.
Doing so, will require adding docstrings and type hints to the code.

`_external_` Node

Allow for Nodes that are not on the Graph.

@dataclasses.dataclass
class ComputeMean(znflow.Node):
    x: float
    y: float
    
    results: float = None

   _external_ = True
    

    def run(self):
        self.results = (self.x + self.y) / 2

Do not resolve Connections for the external Node, but only connections that point towards this Node. Also do not call the run method of this Node.

Test with PyDantic

class PlainNode(znflow.Node):
def __init__(self, value):
self.value = value
def run(self):
self.value += 1
@dataclasses.dataclass
class DataclassNode(znflow.Node):
value: int
def run(self):
self.value += 1
class ZnInitNode(zninit.ZnInit, znflow.Node):
value: int = zninit.Descriptor()
def run(self):
self.value += 1

Allow getitem

There are often scenarios where getitem would be helpful.
It is one of only a few things that might be good to natively support?
It could be added to:

class Connection:

How would this be resolved in ZnTrack?

Should the graph add all Nodes automatically?

We currently have:

import znflow

@znflow.nodify
def add(*args):
   return sum(args)

with znflow.DiGraph() as graph:
   add(1, 2, 3)
   result = add(4, 5, 6)

assert len(graph) == 2

vs

import znflow

@znflow.nodify
def add(*args):
   return sum(args)

with znflow.DiGraph() as graph:
   add(1, 2, 3)
   result = add(4, 5, 6)

   graph.add(result)

assert len(graph) == 1

In other words: Is creating a new Node inside the contextmanager sufficient, to show that this Node is now attached to the graph?

Issue with `znflow.Node.result`

Check if there is an issue, if the znflow.Node has an attribute result.
Try:

class Node(znflow.Node):

@property
def result(self):
   raise NotImplementedError

with znflow.DiGraph() as graph:
     _ = Node().result

Dask support

Have an extra with dask, to run and distribute the graph.

Improve Readme

Add class based to the Readme.
Mention Future API for functions and maybe Connections?

Add combination of both.

add update=False

Add self.updated=False before making the update in the handle method.

ZnFlow/znflow/utils.py

Lines 36 to 42 in 3cf29bd

def handle(self, value, **kwargs):
"""Fallback handling if no siggledispatch was triggered."""
result = self.default(value, **kwargs)
if result is not value:
self.updated = True
return result

Allow List Combinations

You can do:

with znflow.DiGraph() as graph:
    node = Node(data=[a, b, c])

but you can not do:

with znflow.DiGraph() as graph:
    node = Node(data=a+b+c)

Please comment, if you think this is a useful and good feature.

Move all IterableHandlers into a module

The following and maybe others should be put into a seperate module. Could be utils or updaters or handlers, etc.
Especially class _UpdateConnectors(utils.IterableHandler): could be useful.

ZnFlow/znflow/graph.py

Lines 21 to 50 in 3cf29bd

class _AttributeToConnection(utils.IterableHandler):
def default(self, value, **kwargs):
assert not kwargs
if isinstance(value, FunctionFuture):
if value._graph_ is None:
return value
return Connection(value, attribute="result")
elif isinstance(value, Node):
if value._graph_ is None:
return value
return Connection(value, attribute=None)
else:
return value
class _AddConnectionToGraph(utils.IterableHandler):
def default(self, value, **kwargs):
if isinstance(value, Connection):
graph = kwargs["graph"]
v_attr = kwargs.get("attribute")
node_instance = kwargs["node_instance"]
if v_attr is None:
graph.add_connections(value, node_instance)
else:
graph.add_connections(value, node_instance, v_attr=v_attr)
class _UpdateConnectors(utils.IterableHandler):
def default(self, value, **kwargs):
return value.result if isinstance(value, Connection) else value

Restructuring: Store connections on the networkx graph only

Currently we store connections in the networkx graph and inside the NodeConnector.
This could cause issues if only one of them is changed. It is rather unlikely because they are frozen but still possible.

My suggestion: use the networkx MultiDiGraph to store the connections there, and only there. Do not use a NodeConnector but a NodeConnection which reads from the graph to display how it is connected.

In this case the Node itself doesn't have to know anything about the graph it belongs to and the graph fulfills it's purpose of managing the connections instead of Connector.result()

`znflow.map`

Allow for:

class AddOne:
	inputs: int
	output: int

	def run(self) -> None:
		self.output = self.inputs + 1

	def map(self, **kwargs) -> None:
		result = []
		for value in kwargs["inputs"]:
			self.inputs = value
			self.run()
			result.append(self.output)
		self.output = result

class AddOneSum(AddOne):
	def map(self, **kwargs) -> None:
		super().map(**kwargs)
		self.output = sum(self.output)

with graph:
	a.data: int
	b = AddOne(a.data)

b.output: int

with graph:
   c.data: list[int]
   d = znflow.map(AddOne, inputs=c.data)(**{}) # optional non-mapped args

d.output: list[int]

with graph:
	e = znflow.map(AddOneSum, inputs=c.data)()

e.output: int

each MappedNode should have the same arguments than the original one, but with an added dimension on the mapped property.
Otherwise, they shold be treated equally.

Points to discuss:

  • provide a default map method?
  • name: map vs _map_ ?
  • allow for custom arguments to the znflow.map, e.g. a more functional approach where you pass a transformation, e.g. the sum there and don't necessitate writing it inside the node itself.

Wrap init return value

The wrap init decorator does not return from init. Typically init does not have a return value but maybe someone does something with it

Improve lint actions

add poetry group for lint
install using --no-root
update actions to not fail fast and run all checks in a single action checkout

Better error messages

train_selection = val_selection.excluded_atoms

val_data += val_selection.atoms
train_data += train_selection #.atoms raises error

have something like: znflow.ConnectionError: connected attribute "list" has no attribute "atoms"

instead of:

TypeError: Can not add <class 'ipsuite.configuration_selection.random.RandomSelection'> to <class 'znflow.base.Connection'>.

and

AttributeError: 'Connection' object has no attribute 'atoms'

Fix Assertion Error

  1. Replace with Value or probably TypeError because Assertion should not be used in production.
  2. Plot a better error message not only including the UUID but also a repr or the Node so it can be identified more easily.

    ZnFlow/znflow/graph.py

    Lines 134 to 135 in 3cf29bd

    assert u_of_edge.uuid in self, f"'{u_of_edge.uuid=}' not in '{self=}'"
    assert v_of_edge.uuid in self, f"'{v_of_edge.uuid=}' not in '{self=}'"

Use `znflow.edges`

ZnFlow does rely on Connections instead of using znflow.edges. It would probably better to rely on edges, because they are easier to modify / add if you want to set order afterwards (see dask4dvc)

[Docs] reopen a Graph.

The following should be added to the documentaiton.
You can also use:

graph = znflow.DiGraph()

with graph:
  node1 = ...

with graph:
  node2 = ...

....

Add tests for `znflow.Property`

The following class is not tested enough.
Ideally, there would also be another way without entirely reimplementing the property class but I don't know how.

ZnFlow/znflow/base.py

Lines 23 to 73 in a24ee7c

class Property:
"""Custom Property with disabled graph.
References
----------
Adapted from https://docs.python.org/3/howto/descriptor.html#properties
"""
def __init__(self, fget=None, fset=None, fdel=None, doc=None):
self.fget = disable_graph()(fget)
self.fset = disable_graph()(fset)
self.fdel = disable_graph()(fdel)
if doc is None and fget is not None:
doc = fget.__doc__
self.__doc__ = doc
self._name = ""
def __set_name__(self, owner, name):
self._name = name
def __get__(self, obj, objtype=None):
if obj is None:
return self
if self.fget is None:
raise AttributeError(f"property '{self._name}' has no getter")
return self.fget(obj)
def __set__(self, obj, value):
if self.fset is None:
raise AttributeError(f"property '{self._name}' has no setter")
self.fset(obj, value)
def __delete__(self, obj):
if self.fdel is None:
raise AttributeError(f"property '{self._name}' has no deleter")
self.fdel(obj)
def getter(self, fget):
prop = type(self)(fget, self.fset, self.fdel, self.__doc__)
prop._name = self._name
return prop
def setter(self, fset):
prop = type(self)(self.fget, fset, self.fdel, self.__doc__)
prop._name = self._name
return prop
def deleter(self, fdel):
prop = type(self)(self.fget, self.fset, fdel, self.__doc__)
prop._name = self._name
return prop

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.