zincware / znflow Goto Github PK
View Code? Open in Web Editor NEWA general purpose framework for building and running computational graphs.
License: Apache License 2.0
A general purpose framework for building and running computational graphs.
License: Apache License 2.0
Using a uuid
seems more deliberate than using id(node)
and might have some advantages in the future.
import uuid
node = ... # Node, nodify
node.uuid == uuid.uuid4()
If an error happens inside the project.run
print the Node that caused the error
Line 11 in 5686243
The following Code has multiple issues:
class MyNode(znflow.Node):
def __init__(self, name):
self.name = name
self._name = self.name
with znflow.DiGraph() as dag:
node = MyNode(name="John Doe")
print(node.name)
self._name
would not be self.name
but Connector(node, "name")
This looks really nice. Just a general comment on Dask at a large scale. The issue we came across was that Dask will deploy as many jobs as possible to a single task which will mean if you set your cluster parameters such that you want 5 GB of memory because that is how much on job requires, it will violate it immediately as it just gives this amount of memory to one worker who can run as many jobs as possible.
The way I am trying to avoid this is to assign worker resources which limit the number of tasks it can deploy, e.g. 1 GPU per model training means only one model can run on the worker at a time. Or, for espresso, 1 "espresso". But in the case you have here, if you task was a large matrix computation and you heavily parallelised over nodes, I think you would reach a dead worker pretty fast.
Originally posted by @SamTov in #21 (review)
Line 149 in 0169cd7
Check with:
from ase.calculators.lj import LennardJones
from ase.build import molecule
import numpy as np
import znflow
water = molecule('H2O')
waters = []
for _ in range(1000):
water = molecule('H2O')
water.positions += np.random.normal(scale=0.2, size=(3, 3))
water.calc = LennardJones()
waters.append(water)
@znflow.nodify
def get_forces(atoms):
return atoms.get_forces()
@znflow.nodify
def concatenate(forces):
return np.concatenate(forces)
with znflow.DiGraph() as graph:
forces = [get_forces(water) for water in waters]
forces = concatenate(forces) # TODO this is rather slow
and
deployment = znflow.deployment.Deployment(graph=graph,)
deployment.submit_graph()
results = deployment.get_results(forces) # returns a FunctionFuture instead of np.ndarray
I think that ZnInit can be fully replaced by simple inheritance. It should be possible to apply the graph to anything that has the parent class / mixing to overwrite setattribute and digraph.
This way, ZnFlow can be applied to e.g. data classes, attrs and generally every class.
Currently we do not lint for docstrings:
Lines 45 to 50 in 378e8ff
this should be changed to select = ["E", "F", "D", "N", "C", "ANN"]
with as few as possible exclusions.
Doing so, will require adding docstrings and type hints to the code.
Lines 92 to 96 in 0169cd7
This is not true. It should also work with a list of Nodes. znflow.combine([a, b], attribute="atoms", return_dict_attr="uuid")
Line 79 in 031dab2
Having __repr__
and __str__
would make it easier to identify, e.g. if checking for the graph this can cause issues.
Also, why not use None in instance._graph_
instead?
Allow for Nodes that are not on the Graph.
@dataclasses.dataclass
class ComputeMean(znflow.Node):
x: float
y: float
results: float = None
_external_ = True
def run(self):
self.results = (self.x + self.y) / 2
Do not resolve Connections for the external Node, but only connections that point towards this Node. Also do not call the run method of this Node.
Lines 10 to 30 in 57834c9
CI does not run on PR merge in main. Add push on branch main to CI
There are often scenarios where getitem would be helpful.
It is one of only a few things that might be good to natively support?
It could be added to:
Line 78 in 2759062
How would this be resolved in ZnTrack?
We currently have:
import znflow
@znflow.nodify
def add(*args):
return sum(args)
with znflow.DiGraph() as graph:
add(1, 2, 3)
result = add(4, 5, 6)
assert len(graph) == 2
vs
import znflow
@znflow.nodify
def add(*args):
return sum(args)
with znflow.DiGraph() as graph:
add(1, 2, 3)
result = add(4, 5, 6)
graph.add(result)
assert len(graph) == 1
In other words: Is creating a new Node inside the contextmanager sufficient, to show that this Node is now attached to the graph?
Check if there is an issue, if the znflow.Node
has an attribute result.
Try:
class Node(znflow.Node):
@property
def result(self):
raise NotImplementedError
with znflow.DiGraph() as graph:
_ = Node().result
Have an extra with dask, to run and distribute the graph.
Add class based to the Readme.
Mention Future API for functions and maybe Connections?
Add combination of both.
Add self.updated=False
before making the update in the handle method.
Lines 36 to 42 in 3cf29bd
Currently bot NodeConnector
and FunctionConnector
have a shared parent.
NodeConnector should:
FunctionConnector should:
FunctionFuture
get_result
renamed to result
You can do:
with znflow.DiGraph() as graph:
node = Node(data=[a, b, c])
but you can not do:
with znflow.DiGraph() as graph:
node = Node(data=a+b+c)
Please comment, if you think this is a useful and good feature.
The following and maybe others should be put into a seperate module. Could be utils or updaters or handlers, etc.
Especially class _UpdateConnectors(utils.IterableHandler):
could be useful.
Lines 21 to 50 in 3cf29bd
We need to debug zntrack.Project.run()
because it seems be rather slow.
Maybe this could be a cause.
Line 74 in fadf4c9
Combine should work independent of inside the graph or not
Connect classes and not just instances?
Currently we store connections in the networkx graph and inside the NodeConnector.
This could cause issues if only one of them is changed. It is rather unlikely because they are frozen but still possible.
My suggestion: use the networkx MultiDiGraph to store the connections there, and only there. Do not use a NodeConnector but a NodeConnection which reads from the graph to display how it is connected.
In this case the Node itself doesn't have to know anything about the graph it belongs to and the graph fulfills it's purpose of managing the connections instead of Connector.result()
In the ZnTrack package, a znflow.Property
does not overwrite zntrack.zn.outs()
which causes strange error messages.
Allow for:
class AddOne:
inputs: int
output: int
def run(self) -> None:
self.output = self.inputs + 1
def map(self, **kwargs) -> None:
result = []
for value in kwargs["inputs"]:
self.inputs = value
self.run()
result.append(self.output)
self.output = result
class AddOneSum(AddOne):
def map(self, **kwargs) -> None:
super().map(**kwargs)
self.output = sum(self.output)
with graph:
a.data: int
b = AddOne(a.data)
b.output: int
with graph:
c.data: list[int]
d = znflow.map(AddOne, inputs=c.data)(**{}) # optional non-mapped args
d.output: list[int]
with graph:
e = znflow.map(AddOneSum, inputs=c.data)()
e.output: int
each MappedNode
should have the same arguments than the original one, but with an added dimension on the mapped property.
Otherwise, they shold be treated equally.
Points to discuss:
map
method?map
vs _map_
?znflow.map
, e.g. a more functional approach where you pass a transformation, e.g. the sum
there and don't necessitate writing it inside the node itself.There are new features in znflow.combine
from #67 that aren't automatically tested yet.
Line 57 in ee5e46e
We could change in_construction
in such a way, that it is true until the __init__
was called? This way also things like __subclass_init__
should be covered. See zincware/IPSuite#51
Check if the attribute protected is still required.
Line 268 in 0169cd7
Should the result really be None or should it raise an Error if it is not available?
The wrap init decorator does not return from init. Typically init does not have a return value but maybe someone does something with it
add poetry group for lint
install using --no-root
update actions to not fail fast and run all checks in a single action checkout
train_selection = val_selection.excluded_atoms
val_data += val_selection.atoms
train_data += train_selection #.atoms raises error
have something like: znflow.ConnectionError: connected attribute "list" has no attribute "atoms"
instead of:
TypeError: Can not add <class 'ipsuite.configuration_selection.random.RandomSelection'> to <class 'znflow.base.Connection'>.
and
AttributeError: 'Connection' object has no attribute 'atoms'
repr
or the Node so it can be identified more easily.Lines 134 to 135 in 3cf29bd
ZnFlow does rely on Connections
instead of using znflow.edges
. It would probably better to rely on edges, because they are easier to modify / add if you want to set order afterwards (see dask4dvc)
The following should be added to the documentaiton.
You can also use:
graph = znflow.DiGraph()
with graph:
node1 = ...
with graph:
node2 = ...
....
The following class is not tested enough.
Ideally, there would also be another way without entirely reimplementing the property class but I don't know how.
Lines 23 to 73 in a24ee7c
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.