🐵 bonobo

Data-processing. By monkeys. For humans.

Bonobo is a data-processing library for Python 3.5+ that emphasises writing simple, atomic, plain old Python functions and chaining them using a basic acyclic graph. The nodes need a bit of plumbing to be runnable in different ways (iteratively, in threads, in processes, on different machines...), but that should be as transparent as possible.

The only thing asked of the developer is to write "pure" functions to process data (create a new dict, don't change in place, etc.), and everything should be fine from this point.
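As a rough illustration of that contract, the sketch below chains three plain functions by hand. It does not use bonobo's API: the names `extract`, `transform`, `load` and `run_chain` are hypothetical and exist only for this example.

```python
def extract():
    """Yield raw rows; a generator is the simplest kind of source node."""
    yield {"name": "alice", "age": 30}
    yield {"name": "bob", "age": 25}


def transform(row):
    """Return a new dict instead of mutating the input row."""
    return {**row, "name": row["name"].title()}


def load(row, sink):
    """Append the row to a sink and pass it along unchanged."""
    sink.append(row)
    return row


def run_chain(source, *steps):
    """Hypothetical helper: push every row from source through each step in order."""
    results = []
    for row in source():
        for step in steps:
            row = step(row)
        results.append(row)
    return results


sink = []
rows = run_chain(extract, transform, lambda row: load(row, sink))
```

Because `transform` builds a new dict, the rows yielded by `extract` are never modified, which is what should make it safe to run the same functions in threads or processes later.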

It's a young rewrite of an old Python 2.7 tool that ran millions of transformations per day for years in production, so although it may not yet be complete or fully stable (please allow us to reach 1.0), the underlying concepts work.

Documentation: http://docs.bonobo-project.org/
Release announcements: http://eepurl.com/csHFKL
Old project (for reference only, don't use it anymore; instead, help us recode the missing parts in bonobo): http://etl.rdc.li/

Made with ♥ by Romain Dorgueil and contributors.

Roadmap (in progress)

Bonobo is young. This roadmap is alive, and will evolve. Its only purpose is to write down upcoming things somewhere.

Version 0.2

Changelog
Migration guide
Update documentation
Threaded does not terminate anymore (fixed ?)
More tests

Bugs:

KeyboardInterrupt does not work anymore. (fixed ?)
ThreadPool does not stop anymore. (fixed ?)

Configuration

Support for positional arguments (options); required options are good candidates.

Context processors

Be careful with order, especially with python 3.5. (done)
@contextual decorator is not clean enough. Once the behavior is right, find a way to use regular inheritance, without meta.
ValueHolder API not clean. Find a better way.

Random thoughts and things to do

Class-tree for Graph and Nodes
Class-tree for execution contexts:
- GraphExecutionContext
- NodeExecutionContext
- PluginExecutionContext
Class-tree for ExecutionStrategies
- NaiveStrategy
- PoolExecutionStrategy
  - ThreadPoolExecutionStrategy
  - ProcessPoolExecutionStrategy
- ThreadExecutionStrategy
- ProcessExecutionStrategy
Class-tree for bags
- Bag
- ErrorBag
- InheritingBag
Co-routines: for unordered, or even ordered but long-running, I/O.
"context processors": replace initialize/finalize by a generator that yields only once
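The "generator that yields only once" idea from the last item could look like the sketch below. This is not bonobo's implementation: `with_output_file` and `run_with_context` are hypothetical names, and a plain list stands in for a real resource such as a file.

```python
def with_output_file(lines):
    """Hypothetical context processor: setup before the yield, teardown after."""
    lines.append("open")      # stands in for initialize()
    yield lines               # the value handed to the node while it runs
    lines.append("close")     # stands in for finalize()


def run_with_context(processor, node, rows, *args):
    """Hypothetical runner: drive the generator around the node's processing loop."""
    gen = processor(*args)
    resource = next(gen)              # run the setup part
    try:
        for row in rows:
            node(resource, row)
    finally:
        try:
            next(gen)                 # run the teardown part
        except StopIteration:
            pass
        else:
            raise RuntimeError("context processor must yield exactly once")


log = []
run_with_context(with_output_file, lambda out, row: out.append(row), ["a", "b"], log)
```

Setup and teardown live in one function, so they share local variables without needing a separate state object.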

"execute" function:

def execute(graph: Graph, *, strategy: ExecutionStrategy, plugins: List[Plugin]) -> Execution:
    pass
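One way the signature above might be filled in, as a sketch only. The class names come from this roadmap, but their bodies here are hypothetical stand-ins (a linear chain instead of a real graph, an assumed `on_finish` plugin hook), not the library's actual classes.

```python
from typing import Callable, List


class Graph:
    """Hypothetical stand-in: a linear chain of callables, source first."""
    def __init__(self, *nodes: Callable):
        self.nodes = nodes


class Plugin:
    """Hypothetical plugin with a single hook, called once the run is over."""
    def on_finish(self, results: list) -> None:
        pass


class Execution:
    """Hypothetical result object holding the rows that left the last node."""
    def __init__(self, results: list):
        self.results = results


class ExecutionStrategy:
    def run(self, graph: Graph) -> list:
        raise NotImplementedError


class NaiveStrategy(ExecutionStrategy):
    """Runs everything in the calling thread, one row at a time."""
    def run(self, graph: Graph) -> list:
        source, *steps = graph.nodes
        results = []
        for row in source():
            for step in steps:
                row = step(row)
            results.append(row)
        return results


def execute(graph: Graph, *, strategy: ExecutionStrategy, plugins: List[Plugin]) -> Execution:
    results = strategy.run(graph)
    for plugin in plugins:
        plugin.on_finish(results)
    return Execution(results)


graph = Graph(lambda: iter([1, 2, 3]), lambda x: x * 2)
execution = execute(graph, strategy=NaiveStrategy(), plugins=[])
```

Keeping the strategy as a keyword-only argument means a thread or process based strategy could be swapped in without touching the graph definition.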

Handling console. Can we use a queue, and replace stdout / stderr ?
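A minimal sketch of the queue idea, using only the standard library (`contextlib.redirect_stdout` and `queue.Queue`). `QueueWriter` is a hypothetical name, and nothing here is part of bonobo.

```python
import contextlib
import queue


class QueueWriter:
    """File-like object that forwards every write to a queue."""
    def __init__(self, q):
        self.queue = q

    def write(self, text):
        if text:
            self.queue.put(text)
        return len(text)

    def flush(self):
        pass


console = queue.Queue()
with contextlib.redirect_stdout(QueueWriter(console)):
    print("hello from a node")

# Drain the queue; a console plugin could do this from its own thread.
captured = []
while not console.empty():
    captured.append(console.get())
```

Note that `redirect_stdout` replaces `sys.stdout` for the whole process, so output from every thread would land in the same queue; `contextlib.redirect_stderr` works the same way for stderr.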
