
pyungo's People

Contributors

cedricleroy, nelsontodd, tosa95, veronicaguo

pyungo's Issues

RFE: auto-populate graph from function name & args

I propose to auto-populate Graph.add_node() calls based on function and argument names (using Python's standard-library inspect module), and to allow some form of string filtering on the function and argument names.
Admittedly, this would work only for single outputs.

The proposal is easier to explain with sample client code:

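import pyungo

def funcname_chopper(funcname):
    # Strip a known prefix from the function name, e.g. "make_c" -> "c"
    for prefix in ['calc_', 'compute_', 'make_']:
        if funcname.startswith(prefix):
            return funcname[len(prefix):]
    return funcname  # no known prefix: keep the name unchanged

# outname_converter is the keyword argument proposed by this RFE
graph = pyungo.Graph(
    outname_converter=funcname_chopper)

# equivalent to: register(inputs=['a', 'b'], outputs=['c'])
@graph.register
def make_c(a, b):
    return a+b


# equivalent to: register(inputs=['a', 'b', 'c'], outputs=['calc_d'])
# n[5:] drops the 5-character prefixes 'some_', 'look_' and 'stop_'
@graph.register(inpname_converter=lambda n: n[5:], outname_converter=None)
def calc_d(some_a, look_b, stop_c):
    return some_a + look_b + stop_c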
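A minimal sketch of how the inputs and output could be derived with the standard-library inspect module. derive_io is a hypothetical helper, not part of pyungo; a register decorator could call it and forward the result to the existing inputs/outputs arguments. As noted above, it derives a single output only.

```python
import inspect
from typing import Callable, List, Optional, Tuple


def derive_io(
    func: Callable,
    inpname_converter: Optional[Callable[[str], str]] = None,
    outname_converter: Optional[Callable[[str], str]] = None,
) -> Tuple[List[str], List[str]]:
    """Return (inputs, outputs) derived from func's parameter names and name.

    When a converter is None, the raw name is used unchanged.
    """
    # inspect.signature exposes the parameters in declaration order
    params = inspect.signature(func).parameters
    inputs = [inpname_converter(name) if inpname_converter else name for name in params]
    outname = func.__name__
    if outname_converter is not None:
        outname = outname_converter(outname)
    return inputs, [outname]


def funcname_chopper(funcname: str) -> str:
    # Strip a known prefix; names without one are returned unchanged
    for prefix in ["calc_", "compute_", "make_"]:
        if funcname.startswith(prefix):
            return funcname[len(prefix):]
    return funcname


def make_c(a, b):
    return a + b


def calc_d(some_a, look_b, stop_c):
    return some_a + look_b + stop_c
```

With these definitions, derive_io(make_c, outname_converter=funcname_chopper) yields (['a', 'b'], ['c']), matching the first comment in the proposal.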

Issue with single output being an array

Reference: https://github.com/cedricleroy/pyungo/blob/master/pyungo/core.py#L146-L152

I ran into an issue with a specific type of node. The node returns one output that is a list of datetime objects (a timestamps vector, for instance). Because of the lines referenced above, the graph saves only the first item of that list: since iter(res) doesn't fail, it assumes that multiple outputs were returned, even though the node has only one output name (e.g. "timestamps").
Essentially, the for loop iterates over the timestamps list and stores its first element as the data to be saved.
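A sketch of one possible fix, assuming a node knows its declared output names: unpack the result only when more than one name is declared. store_outputs_buggy approximates the behaviour described above (it is not a copy of pyungo's code), and both function names are hypothetical.

```python
from typing import Any, Dict, List


def store_outputs_buggy(output_names: List[str], res: Any, data: Dict[str, Any]) -> None:
    # Approximates the reported behaviour: any iterable result is unpacked,
    # so a single list-valued output gets zipped against one name
    try:
        iter(res)
    except TypeError:
        res = [res]
    for name, value in zip(output_names, res):
        data[name] = value


def store_outputs(output_names: List[str], res: Any, data: Dict[str, Any]) -> None:
    # A single declared output keeps the whole result, whatever its type
    if len(output_names) == 1:
        data[output_names[0]] = res
        return
    # Several declared outputs: the result must be a sized sequence of matching length
    if len(res) != len(output_names):
        raise ValueError(f"expected {len(output_names)} outputs, got {len(res)}")
    for name, value in zip(output_names, res):
        data[name] = value
```

With a list of timestamps and output_names=['timestamps'], store_outputs keeps the whole list, while the buggy variant keeps only its first element.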

Is schema enforced on outputs and internal data-nodes?

Adapting the quickstart example:

from pyungo import Graph

schema = {
    "type": "object",
    "properties": {
        "a": {"type": "number"},
        "b": {"type": "number"}
    }
}

graph = Graph(schema=schema)

@graph.register(inputs=['a'], outputs=['b'])
def f1(a):
    return "Hey!"

@graph.register(inputs=['b'], outputs=['c'])
def f2(b):
    return 2 * b

graph.calculate(data={'a': 1})

I was expecting an error, but the pipeline went through and returned the result Hey!Hey!.
Am I doing something wrong?
This feature is particularly important for data on the internal nodes, because they are not as easy to test as inputs and outputs.
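Judging by the result reported above, the schema seems to be applied to the input data only. As a workaround, each node's return value could be checked against the schema before it is stored. The sketch below is hypothetical (validate_value and checked are not pyungo functions) and understands only the JSON-schema "type" keyword for a few types; a fuller version would more likely delegate to the jsonschema package's validate function.

```python
from typing import Any, Callable, Dict

# Minimal mapping from a few JSON-schema type names to Python types
_JSON_TYPES = {
    "number": (int, float),
    "integer": (int,),
    "string": (str,),
}


def validate_value(name: str, value: Any, schema: Dict[str, Any]) -> None:
    """Raise TypeError if value does not match the declared type of name."""
    spec = schema.get("properties", {}).get(name)
    if spec is None:
        return  # names not declared in the schema are not checked
    expected = _JSON_TYPES[spec["type"]]
    # bool is a subclass of int in Python, but JSON schema treats it separately
    if isinstance(value, bool) or not isinstance(value, expected):
        raise TypeError(f"{name!r} should be of type {spec['type']!r}, got {value!r}")


def checked(func: Callable, output: str, schema: Dict[str, Any]) -> Callable:
    """Wrap func so that its return value is validated as the data node output."""
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        validate_value(output, result, schema)
        return result
    return wrapper


schema = {
    "type": "object",
    "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
}
f1 = checked(lambda a: "Hey!", "b", schema)  # mirrors f1 in the example above
```

Calling f1(1) now raises TypeError at the internal node b instead of letting "Hey!" flow into f2.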

Memory usage

Hi,
I'm not sure if this is a bug or a feature request.

I have a workflow that is very memory intensive but also very well suited to decomposition to a DAG.
My problem is that if I keep all intermediate outputs in memory, I will quickly exceed the RAM of a single machine.
I had hoped pyungo would be clever enough to allow intermediate states to be garbage collected, but it doesn't seem so.

Here is a sample program:
```
from pyungo import Graph
import numpy as np
import gc

@profile
def main():
    graph = Graph()

    @graph.register()
    def calc_a():
        a = np.random.rand(8192,8192)
        return a

    @graph.register()
    def calc_b():
        b = np.random.rand(8192,8192)
        return b

    @graph.register()
    def calc_c(a,b):
        gc.collect()
        c = a * b
        print("c")
        return c

    @graph.register()
    def calc_d():
        gc.collect()
        d = np.random.rand(8192,8192)
        print("d")
        return d

    @graph.register()
    def calc_pfd(c,d):
        gc.collect()
        e = c * d
        return e

    gc.collect()
    res = graph.calculate(data={})
    gc.collect()
    print(res)
    del res
    gc.collect()
    del graph
    gc.collect()

main()
```

Output:
```
(venv) zenbook% python -m memory_profiler memtest.py
INFO:root:Starting calculation...
INFO:root:Ran Node(08f958eb-84ff-49ad-a2fb-a2ada5788705, <calc_a>, [], ['a']) in 0:00:02.127759
d
INFO:root:Ran Node(9cd9ce4e-16d7-4a43-83b7-a8e01e8bd8ba, <calc_d>, [], ['d']) in 0:00:01.618884
INFO:root:Ran Node(ea8b8c8f-d8fc-4967-a7c5-c3ba0dbcd550, <calc_b>, [], ['b']) in 0:00:01.519026
c
INFO:root:Ran Node(ac6d7004-7cb1-41ff-8476-a2a1ce9e64d6, <calc_c>, ['a', 'b'], ['c']) in 0:00:01.029356
INFO:root:Ran Node(7786d30e-7796-4d19-a692-07f2904ea6c8, <calc_pfd>, ['c', 'd'], ['e']) in 0:00:00.853072
INFO:root:Calculation finished in 0:00:07.152394
[[0.32979496 0.00617538 0.01675385 ... 0.08284045 0.03303956 0.09351132]
 [0.00268712 0.20226707 0.06033366 ... 0.07918911 0.01333745 0.15655172]
 [0.0007408 0.01337496 0.17597583 ... 0.19520472 0.0274126 0.07911974]
 ...
 [0.00958562 0.00919059 0.10846052 ... 0.01235475 0.02207799 0.26674223]
 [0.06822633 0.03539608 0.08139489 ... 0.08097827 0.10901089 0.02113664]
 [0.01915152 0.00518849 0.34347554 ... 0.04939359 0.48837681 0.11771939]]
Filename: memtest.py

Line #    Mem usage    Increment   Line Contents

 5   29.688 MiB   29.688 MiB   @profile
 6                             def main():
 7   29.688 MiB    0.000 MiB       graph = Graph()
 8
 9   29.691 MiB    0.000 MiB       @graph.register()
10   29.691 MiB    0.004 MiB       def calc_a():
11  541.562 MiB  511.871 MiB           a = np.random.rand(8192,8192)
12  541.562 MiB    0.000 MiB           return a
13
14 1053.578 MiB    0.000 MiB       @graph.register()
15   29.691 MiB    0.000 MiB       def calc_b():
16 1565.590 MiB  512.012 MiB           b = np.random.rand(8192,8192)
17 1565.590 MiB    0.000 MiB           return b
18
19 1565.590 MiB    0.000 MiB       @graph.register()
20   29.691 MiB    0.000 MiB       def calc_c(a,b):
21 1565.590 MiB    0.000 MiB           gc.collect()
22 2077.605 MiB  512.016 MiB           c = a * b
23 2077.605 MiB    0.000 MiB           print("c")
24 2077.605 MiB    0.000 MiB           return c
25
26  541.562 MiB    0.000 MiB       @graph.register()
27   29.691 MiB    0.000 MiB       def calc_d():
28  541.562 MiB    0.000 MiB           gc.collect()
29 1053.578 MiB  512.016 MiB           d = np.random.rand(8192,8192)
30 1053.578 MiB    0.000 MiB           print("d")
31 1053.578 MiB    0.000 MiB           return d
32
33 2077.605 MiB    0.000 MiB       @graph.register()
34   29.691 MiB    0.000 MiB       def calc_pfd(c,d):
35 2077.605 MiB    0.000 MiB           gc.collect()
36 2589.621 MiB  512.016 MiB           e = c * d
37 2589.621 MiB    0.000 MiB           return e
38
39   29.691 MiB    0.000 MiB       gc.collect()
40 2589.621 MiB    0.000 MiB       res = graph.calculate(data={})
41 2589.621 MiB    0.000 MiB       gc.collect()
42 2589.621 MiB    0.000 MiB       print(res)
43 2589.621 MiB    0.000 MiB       del res
44 2589.621 MiB    0.000 MiB       gc.collect()
45   29.730 MiB    0.000 MiB       del graph
46   29.730 MiB    0.000 MiB       gc.collect()
```

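After calc_c has run, a and b should be eligible for garbage collection, but the graph seems to hold a reference to every output: memory climbs to about 2.6 GB during calculate() and is only released when the graph itself is deleted.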
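One way a scheduler could avoid this is reference counting on data nodes: count how many pending nodes still consume each value and drop the value as soon as that count reaches zero, unless it was requested as a result. The sketch below is independent of pyungo; the node tuple layout and the name run_freeing_intermediates are hypothetical, and nodes are assumed to be given in a valid topological order.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Tuple

# A node is (function, input names, output name)
Node = Tuple[Callable[..., Any], List[str], str]


def run_freeing_intermediates(
    nodes: List[Node], data: Dict[str, Any], keep: Iterable[str]
) -> Dict[str, Any]:
    """Run nodes in order, dropping each intermediate once its last consumer ran.

    Note: data is modified in place and returned.
    """
    keep = set(keep)
    # How many not-yet-run nodes still need each value
    remaining = Counter(name for _, inputs, _ in nodes for name in inputs)
    for func, inputs, output in nodes:
        data[output] = func(*(data[name] for name in inputs))
        for name in inputs:
            remaining[name] -= 1
            if remaining[name] == 0 and name not in keep:
                # If nothing else references the value, CPython can free it now
                del data[name]
    return data


# Same shape as the memtest.py graph, with small numbers instead of arrays
nodes = [
    (lambda: 2, [], "a"),
    (lambda: 3, [], "b"),
    (lambda a, b: a * b, ["a", "b"], "c"),
    (lambda: 4, [], "d"),
    (lambda c, d: c * d, ["c", "d"], "e"),
]
result = run_freeing_intermediates(nodes, {}, keep=["e"])
```

Here a and b are dropped right after c is computed, and c and d right after e, so at most three values are alive at any time instead of all five.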

Create a node without a decorator

Add a method in Graph to register a new node without using a decorator:

graph.add_node(inputs=['a', 'b'], outputs=['c'], function=f_my_function)
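A sketch of how add_node and the decorator could share one code path, using a toy Graph class that stands in for pyungo.Graph (it is not pyungo's implementation): add_node does the registration, and register becomes a thin wrapper around it.

```python
from typing import Any, Callable, Dict, List


class Graph:
    """Toy stand-in for pyungo.Graph, used only to illustrate the API shape."""

    def __init__(self) -> None:
        self.nodes: List[Dict[str, Any]] = []

    def add_node(self, inputs: List[str], outputs: List[str], function: Callable) -> None:
        # Register an existing function directly, no decorator involved
        self.nodes.append({"inputs": inputs, "outputs": outputs, "function": function})

    def register(self, inputs: List[str], outputs: List[str]) -> Callable:
        # The decorator is a thin wrapper around add_node
        def decorator(function: Callable) -> Callable:
            self.add_node(inputs=inputs, outputs=outputs, function=function)
            return function
        return decorator

    def calculate(self, data: Dict[str, Any]) -> Any:
        # Assumes nodes were added in dependency order with one output each
        values = dict(data)
        for node in self.nodes:
            args = [values[name] for name in node["inputs"]]
            values[node["outputs"][0]] = node["function"](*args)
        return values[self.nodes[-1]["outputs"][0]]


def f_my_function(a, b):
    return a + b


graph = Graph()
graph.add_node(inputs=["a", "b"], outputs=["c"], function=f_my_function)
```

Since register(...) returns an ordinary decorator (as in @graph.register(inputs=['a'], outputs=['b']) in the schema example), calling it directly, e.g. graph.register(inputs=['a', 'b'], outputs=['c'])(f_my_function), should already register a node without the @ syntax.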

Why does __init__.py not import anything?

Hi. I understand this is a style choice, but why use an empty __init__.py instead of filling it with from .core import *?
If you do it the second way, you can import with from pyungo import Graph instead of from pyungo.core import Graph, which is nicer, because I had no idea core.py existed inside this package until I looked. Thank you!
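The second approach can be tried without touching pyungo. The sketch below builds a throwaway package on disk (demo_pkg is a hypothetical name) whose __init__.py re-exports Graph from core.py. It uses an explicit import plus __all__ rather than a star import, which keeps the public API explicit; for pyungo itself this assumes Graph is the only name meant to be public.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Layout of the throwaway package:
#   demo_pkg/core.py      defines Graph
#   demo_pkg/__init__.py  re-exports it
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_pkg"
pkg.mkdir()
(pkg / "core.py").write_text("class Graph:\n    pass\n")
(pkg / "__init__.py").write_text(
    "from .core import Graph\n"
    "\n"
    "__all__ = ['Graph']\n"
)

# Make the new directory importable
importlib.invalidate_caches()
sys.path.insert(0, str(tmp))

import demo_pkg
from demo_pkg import Graph
from demo_pkg.core import Graph as CoreGraph
```

Both import paths resolve to the same class object, so existing code that imports from the core module keeps working.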

RFE: support sub-graphs

It would be nice to add a method like:

bigger_graph.add_subgraph(some_graph)

and port all nodes from some_graph into bigger_graph.
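A minimal sketch of such a merge, using a toy Graph keyed by output name (a stand-in, not pyungo's class). One design choice worth settling early is what happens when both graphs produce the same output; this sketch rejects the merge before copying anything, so a failed merge leaves the target graph unchanged.

```python
from typing import Any, Callable, Dict, List, Tuple


class Graph:
    """Toy graph keyed by output name; a stand-in, not pyungo's class."""

    def __init__(self) -> None:
        # output name -> (function, input names)
        self.nodes: Dict[str, Tuple[Callable, List[str]]] = {}

    def add_node(self, function: Callable, inputs: List[str], output: str) -> None:
        if output in self.nodes:
            raise ValueError(f"output {output!r} is already produced by another node")
        self.nodes[output] = (function, list(inputs))

    def add_subgraph(self, other: "Graph") -> "Graph":
        # Check for clashing outputs first so a failed merge changes nothing
        clashes = self.nodes.keys() & other.nodes.keys()
        if clashes:
            raise ValueError(f"outputs produced by both graphs: {sorted(clashes)}")
        self.nodes.update(other.nodes)
        return self

    def calculate(self, data: Dict[str, Any], output: str) -> Any:
        # Resolve output recursively from the nodes producing its inputs
        if output in data:
            return data[output]
        function, inputs = self.nodes[output]
        return function(*(self.calculate(data, name) for name in inputs))


some_graph = Graph()
some_graph.add_node(lambda a, b: a + b, ["a", "b"], "c")

bigger_graph = Graph()
bigger_graph.add_node(lambda c: 2 * c, ["c"], "d")
bigger_graph.add_subgraph(some_graph)
```

After the merge, bigger_graph can compute d directly from a and b, because the node producing c now lives in it.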

RFE: allow for optional kwargs from input data

  • Python's kwargs are optional (since defaults are given in the function declaration).
  • In this library, kwargs are not optional: each one must exist in the input data, or an error is raised.

These two facts cause a mismatch when converting traditional code into a graph-pipeline.

If it is feasible, it would really help to add another optional keyword argument to Graph.add_node().

Steps to reproduce

The following code:

import pyungo

graph = pyungo.Graph()

@graph.register(inputs=['a'], kwargs=['b'], outputs=['c'])
def f(a, b=2):
    return a + b

graph.calculate({'a': 1})

... raises PyungoError: The following inputs are needed: ['b']
while the function is fully capable of working without b.

Proposal

This should work:

@graph.register(inputs=['a'], optional=['b'], outputs=['c'])
def f(a, b=2):
    return a + b

graph.calculate({'a': 1})

and produce 3.
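A minimal sketch of how the proposed optional keyword could be resolved at call time. call_with_optional is a hypothetical helper, not pyungo's API; it uses inspect.signature to check that every optional name really has a default, then forwards only the optional values that are present in the data, so the function's own default applies otherwise.

```python
import inspect
from typing import Any, Callable, Dict, List


def call_with_optional(
    func: Callable, inputs: List[str], optional: List[str], data: Dict[str, Any]
) -> Any:
    """Call func, passing optional kwargs only when they are present in data."""
    params = inspect.signature(func).parameters
    for name in optional:
        # An optional input must have a default, otherwise it is not optional
        if params[name].default is inspect.Parameter.empty:
            raise ValueError(f"{name!r} has no default value in {func.__name__}")
    missing = [name for name in inputs if name not in data]
    if missing:
        raise KeyError(f"The following inputs are needed: {missing}")
    args = [data[name] for name in inputs]
    kwargs = {name: data[name] for name in optional if name in data}
    return func(*args, **kwargs)


def f(a, b=2):
    return a + b
```

call_with_optional(f, ['a'], ['b'], {'a': 1}) returns 3, the result requested in the proposal, while {'a': 1, 'b': 5} still overrides the default.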

RFE: allow multiple nodes producing the same output

There is a use case for having multiple nodes produce the same output, and deciding only at calculation time which path to use.

Example: unit conversion, where several input units are converted to the same output unit.

It would also be useful to have a flag, set at calculation time, that decides whether to raise when duplicate outputs are detected or just issue a warning and choose an arbitrary node, for cases where the provided inputs allow more than one path and all paths produce the same result. Further down the road, the flag could become tri-state, with a third option that calculates all paths, compares the results, and raises only if they differ.
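A sketch of the proposed flag for a single output, independent of pyungo: resolve and the on_duplicate values "raise", "warn" and "compare" are hypothetical names for the three states described above. A path counts as usable when all of its inputs are present in the data.

```python
import warnings
from typing import Any, Callable, Dict, List, Tuple

# A producer is (function, input names); several may yield the same output
Producer = Tuple[Callable[..., Any], List[str]]


def resolve(producers: List[Producer], data: Dict[str, Any], on_duplicate: str = "raise") -> Any:
    """Compute one output from whichever producers have all their inputs.

    on_duplicate: "raise"   -> error if more than one path is usable
                  "warn"    -> warn and use the first usable path
                  "compare" -> run all usable paths, error if results differ
    """
    usable = [(f, ins) for f, ins in producers if all(n in data for n in ins)]
    if not usable:
        raise KeyError("no producer has all of its inputs available")
    if len(usable) > 1:
        if on_duplicate == "raise":
            raise ValueError(f"{len(usable)} paths can produce this output")
        elif on_duplicate == "warn":
            warnings.warn("several paths available; using the first one")
        elif on_duplicate == "compare":
            results = [f(*(data[n] for n in ins)) for f, ins in usable]
            if any(r != results[0] for r in results[1:]):
                raise ValueError(f"paths disagree: {results}")
            return results[0]
        else:
            raise ValueError(f"unknown on_duplicate policy {on_duplicate!r}")
    f, ins = usable[0]
    return f(*(data[n] for n in ins))


# Unit conversion example: two input units converted to metres
producers = [(lambda km: km * 1000, ["km"]), (lambda cm: cm / 100, ["cm"])]
```

With only km in the data there is a single usable path; with both km and cm, the default policy raises, while "compare" returns 2000 when both conversions agree.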
