Data-processing. By monkeys. For humans.
Bonobo is a data-processing library for Python 3.5+ that emphasises writing simple, atomic, plain old Python functions and chaining them using a basic directed acyclic graph. The nodes will need a bit of plumbing to be runnable in different ways (iteratively, in threads, in processes, on different machines ...), but that should be as transparent as possible.
The only thing asked of the developer is to write "pure" functions to process data (create a new dict instead of changing one in place, etc.); everything else should follow from there.
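To make the idea concrete, here is a minimal sketch in plain Python of "pure" nodes chained along a single path. The `run_chain` helper is hypothetical and is not Bonobo's API; it only shows why returning a new dict matters: the input row is never mutated, so the same row could safely be fed to several branches of a graph.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical row type and runner: an illustration, not Bonobo's API.
Row = Dict[str, object]


def extract() -> Iterator[Row]:
    # A source node simply yields rows.
    yield {"name": "alice", "age": 31}
    yield {"name": "bob", "age": 27}


def transform(row: Row) -> Row:
    # "Pure": build a new dict instead of mutating the input row.
    return {**row, "name": str(row["name"]).title()}


def run_chain(source: Callable[[], Iterable[Row]],
              *steps: Callable[[Row], Row]) -> List[Row]:
    # Push each row produced by the source through every step, in order.
    results = []
    for row in source():
        for step in steps:
            row = step(row)
        results.append(row)
    return results
```

`run_chain(extract, transform)` title-cases the names while leaving the rows produced by `extract` untouched.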
It's a young rewrite of an old Python 2.7 tool that ran millions of transformations per day for years in production, so although it may not yet be complete or fully stable (please allow us to reach 1.0), the underlying concepts work.
- Documentation: http://docs.bonobo-project.org/
- Release announcements: http://eepurl.com/csHFKL
- Old project (for reference only; don't use it anymore, and help us port the missing parts to bonobo instead): http://etl.rdc.li/
Made with ♥ by Romain Dorgueil and contributors.
Bonobo is young. This roadmap is alive and will evolve; its only purpose is to write down upcoming work somewhere.
- Changelog
- Migration guide
- Update documentation
- Threaded execution does not terminate anymore (fixed?)
- More tests
Bugs:
- KeyboardInterrupt does not work anymore. (fixed?)
- ThreadPool does not stop anymore. (fixed?)
- Support for positional arguments (options); required options are good candidates.
- Be careful with ordering, especially with Python 3.5. (done)
- The @contextual decorator is not clean enough. Once the behavior is right, find a way to use regular inheritance, without a metaclass.
- The ValueHolder API is not clean. Find a better way.
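The positional-arguments and ordering items can be illustrated together. The sketch below assumes options are declared as class attributes (`Option`, `Configurable` and `Copy` are hypothetical names, not Bonobo's classes). It records declaration order with a global counter, because on Python 3.5 a class body's namespace does not preserve definition order, and it maps positional arguments onto required options in that order. It uses plain inheritance, with no metaclass.

```python
import itertools
from typing import Any, List, Tuple

_counter = itertools.count()  # global creation counter shared by all options


class Option:
    """Hypothetical option declaration (not Bonobo's actual class)."""

    def __init__(self, type_=str, *, required=False, default=None):
        self.type = type_
        self.required = required
        self.default = default
        # Record declaration order explicitly instead of relying on dict order.
        self.position = next(_counter)


class Configurable:
    """Hypothetical base class: plain inheritance, no metaclass."""

    @classmethod
    def options(cls) -> List[Tuple[str, Option]]:
        found = {}
        # Walk the MRO from base to subclass so subclasses can override options.
        for klass in reversed(cls.__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, Option):
                    found[name] = value
        return sorted(found.items(), key=lambda item: item[1].position)

    def __init__(self, *args: Any, **kwargs: Any):
        options = self.options()
        required = [name for name, opt in options if opt.required]
        if len(args) > len(required):
            raise TypeError("too many positional arguments")
        # Positional arguments fill the required options, in declaration order.
        for name, value in zip(required, args):
            if name in kwargs:
                raise TypeError("got multiple values for option {!r}".format(name))
            kwargs[name] = value
        for name, opt in options:
            if name in kwargs:
                setattr(self, name, opt.type(kwargs.pop(name)))
            elif opt.required:
                raise TypeError("missing required option {!r}".format(name))
            else:
                setattr(self, name, opt.default)
        if kwargs:
            raise TypeError("unknown options: {}".format(sorted(kwargs)))


# Example usage (hypothetical node):
class Copy(Configurable):
    source = Option(required=True)
    target = Option(required=True)
    mode = Option(default="w")
```

`Copy("a.txt", "b.txt")` sets `source` and `target` positionally and leaves `mode` at its default; omitting a required option raises `TypeError`.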
Class-tree for Graph and Nodes
Class-tree for execution contexts:
- GraphExecutionContext
- NodeExecutionContext
- PluginExecutionContext
Class-tree for ExecutionStrategies
- NaiveStrategy
- PoolExecutionStrategy
  - ThreadPoolExecutionStrategy
  - ProcessPoolExecutionStrategy
- ThreadExecutionStrategy
- ProcessExecutionStrategy
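One possible shape for this class tree, sketched on top of the standard `concurrent.futures` module. The class names come from the list above, but the `execute(nodes)` method and its semantics (run independent zero-argument callables and return their results in input order) are assumptions made for illustration. `ProcessExecutionStrategy` is omitted; it would mirror the thread version with `multiprocessing.Process`, and the process-based variants only work with picklable, module-level callables.

```python
import threading
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Any, Callable, Iterable, List

Node = Callable[[], Any]  # hypothetical: a node is a zero-argument callable


class ExecutionStrategy:
    def execute(self, nodes: Iterable[Node]) -> List[Any]:
        raise NotImplementedError


class NaiveStrategy(ExecutionStrategy):
    def execute(self, nodes: Iterable[Node]) -> List[Any]:
        # Run every node sequentially in the current thread.
        return [node() for node in nodes]


class PoolExecutionStrategy(ExecutionStrategy):
    executor_factory = None  # subclasses pick a concurrent.futures executor class

    def __init__(self, max_workers=None):
        self.max_workers = max_workers

    def execute(self, nodes: Iterable[Node]) -> List[Any]:
        with self.executor_factory(max_workers=self.max_workers) as executor:
            futures = [executor.submit(node) for node in nodes]
            # result() re-raises any exception raised inside a node.
            return [future.result() for future in futures]


class ThreadPoolExecutionStrategy(PoolExecutionStrategy):
    executor_factory = ThreadPoolExecutor


class ProcessPoolExecutionStrategy(PoolExecutionStrategy):
    executor_factory = ProcessPoolExecutor


class ThreadExecutionStrategy(ExecutionStrategy):
    def execute(self, nodes: Iterable[Node]) -> List[Any]:
        # One dedicated thread per node; each writes into its own result slot.
        nodes = list(nodes)
        results = [None] * len(nodes)

        def target(index: int, node: Node) -> None:
            results[index] = node()

        threads = [threading.Thread(target=target, args=(i, node))
                   for i, node in enumerate(nodes)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return results
```

Grouping the pool variants under `PoolExecutionStrategy` means the thread and process pools differ only in the executor class they pick. Unlike the pool strategies, this naive `ThreadExecutionStrategy` does not propagate exceptions raised in its threads.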
Class-tree for bags
- Bag
- ErrorBag
- InheritingBag
Coroutines: for unordered, or even ordered but long-running, I/O.
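A minimal `asyncio` sketch of that distinction: `asyncio.gather` runs long I/O concurrently but hands results back in input order, while `asyncio.as_completed` yields each result as soon as it is ready. `fetch`, `ordered` and `unordered` are hypothetical names; `fetch` stands in for a real I/O-bound node, and the usage below assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio
from typing import List, Sequence


async def fetch(item: int, delay: float) -> int:
    # Hypothetical I/O-bound node: sleeping stands in for a network call.
    await asyncio.sleep(delay)
    return item * 10


async def ordered(items: Sequence[int], delays: Sequence[float]) -> List[int]:
    # Concurrent, but results come back in the same order as the inputs.
    results = await asyncio.gather(*(fetch(i, d) for i, d in zip(items, delays)))
    return list(results)


async def unordered(items: Sequence[int], delays: Sequence[float]) -> List[int]:
    # Concurrent, and each result is emitted as soon as it is available.
    tasks = [asyncio.ensure_future(fetch(i, d)) for i, d in zip(items, delays)]
    results = []
    for next_done in asyncio.as_completed(tasks):
        results.append(await next_done)
    return results
```

With `items=[1, 2, 3]` and `delays=[0.05, 0.01, 0.03]`, `asyncio.run(ordered(items, delays))` returns `[10, 20, 30]`, while `asyncio.run(unordered(items, delays))` typically returns `[20, 30, 10]`.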
"Context processors": replace initialize/finalize with a generator that yields only once.
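A sketch of what such a context processor could look like: the code before the single `yield` plays the role of `initialize`, the yielded value is the context handed to the node, and the code after it plays the role of `finalize`. `counter_processor`, `run_with_processor` and `count_rows` are hypothetical names; the standard library's `contextlib.contextmanager` is built on the same single-yield pattern.

```python
from typing import Any, Callable, Dict, Generator, Iterable, List

Context = Dict[str, Any]


def counter_processor() -> Generator[Context, None, None]:
    # Before the yield: what initialize() used to do.
    context = {"count": 0}
    yield context
    # After the yield: what finalize() used to do.
    print("processed", context["count"], "rows")


def run_with_processor(processor: Callable[[], Generator[Context, None, None]],
                       node: Callable[[Context, Any], Any],
                       rows: Iterable[Any]) -> List[Any]:
    gen = processor()
    context = next(gen)  # "initialize": run up to the only yield
    try:
        return [node(context, row) for row in rows]
    finally:
        # "finalize": resume once; the generator must finish instead of yielding again.
        try:
            next(gen)
        except StopIteration:
            pass
        else:
            raise RuntimeError("a context processor must yield only once")


def count_rows(context: Context, row: Any) -> Any:
    # The node may update the shared context; rows themselves pass through.
    context["count"] += 1
    return row
```

`run_with_processor(counter_processor, count_rows, ["a", "b"])` returns `["a", "b"]` and prints `processed 2 rows`. Because finalization sits in a `finally` clause, it also runs when a node raises.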
"execute" function:
def execute(graph: Graph, *, strategy: ExecutionStrategy, plugins: List[Plugin]) -> Execution: pass
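Below is one way the proposed signature could be made runnable. `Graph`, `Plugin`, `Execution`, `ExecutionStrategy` and `NaiveStrategy` here are minimal hypothetical stand-ins: a graph is reduced to a source followed by a linear chain of steps, and plugins only get start/stop hooks. The point is the keyword-only `strategy` and `plugins` parameters, which force call sites to name them.

```python
from typing import Any, Callable, Iterable, List


class Graph:
    """Hypothetical stand-in: a source callable followed by a chain of steps."""

    def __init__(self, source: Callable[[], Iterable[Any]], *steps: Callable[[Any], Any]):
        self.source = source
        self.steps = steps


class Plugin:
    """Hypothetical plugin with start/stop hooks; subclasses override them."""

    def on_start(self, graph: Graph) -> None:
        pass

    def on_stop(self, graph: Graph) -> None:
        pass


class Execution:
    """Hypothetical result object returned by execute()."""

    def __init__(self, results: List[Any]):
        self.results = results


class ExecutionStrategy:
    def run(self, graph: Graph) -> List[Any]:
        raise NotImplementedError


class NaiveStrategy(ExecutionStrategy):
    def run(self, graph: Graph) -> List[Any]:
        # Push every row through every step, one row at a time.
        results = []
        for row in graph.source():
            for step in graph.steps:
                row = step(row)
            results.append(row)
        return results


def execute(graph: Graph, *, strategy: ExecutionStrategy, plugins: List[Plugin]) -> Execution:
    for plugin in plugins:
        plugin.on_start(graph)
    try:
        results = strategy.run(graph)
    finally:
        # Stop plugins in reverse order, even if the run failed.
        for plugin in reversed(plugins):
            plugin.on_stop(graph)
    return Execution(results)
```

Calling `execute(graph, NaiveStrategy(), [])` raises `TypeError`; the strategy and plugins must be passed by name.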
Console handling: can we use a queue, and replace stdout/stderr?
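One possible answer, sketched with the standard library only: swap `sys.stdout` for a file-like object that pushes complete lines onto a `queue.Queue`, tagged with the writing thread's name, so that a single consumer decides how to render them. `QueueWriter` and `run_nodes` are hypothetical names. Note that `contextlib.redirect_stdout` replaces `sys.stdout` for the whole process, not per thread, which is what lets prints from every worker thread land in the queue; stderr could be handled the same way with `contextlib.redirect_stderr`.

```python
import queue
import threading
from contextlib import redirect_stdout
from typing import Callable, List, Tuple


class QueueWriter:
    """File-like object that turns writes into (thread name, line) queue items."""

    def __init__(self, q: queue.Queue):
        self.queue = q
        self.local = threading.local()  # one pending partial line per thread

    def write(self, text: str) -> int:
        buffered = getattr(self.local, "buffer", "") + text
        *lines, rest = buffered.split("\n")
        for line in lines:
            self.queue.put((threading.current_thread().name, line))
        self.local.buffer = rest
        return len(text)

    def flush(self) -> None:
        pass  # complete lines are already queued; partial lines wait for "\n"


def run_nodes(target: Callable[[int], None], count: int) -> List[Tuple[str, str]]:
    q = queue.Queue()
    with redirect_stdout(QueueWriter(q)):
        threads = [threading.Thread(target=target, args=(i,), name="node-{}".format(i))
                   for i in range(count)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    # All producers are done, so the queue can be drained without blocking.
    lines = []
    while not q.empty():
        lines.append(q.get_nowait())
    return lines
```

In a real runner, a dedicated consumer thread would read from the queue while the nodes run and write to `sys.__stdout__`; here the queue is simply drained after the threads finish. Buffering per thread keeps the several `write` calls that a single `print` makes from interleaving with other threads' output.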