dag_gettsim

Introduction

This is a small toy package to explore the use of Directed Acyclic Graphs (DAGs) in gettsim. It is only meant as an illustration and has no practical use whatsoever.

There is an accompanying tutorial repository. We opted for a separate repository to make the distinction between gettsim code and user code very explicit.

This is based on ideas by Tobias Raabe, Hans-Martin von Gaudecker and Janoś Gabler.

Installation

Currently only local installation is possible. Open a terminal in the root folder and type:

    conda env create -f environment.yml
    conda activate dag_gettsim
    pip install -e .

Some Design Choices

Data Arguments / Data Storage

I suggest that internally, we store data as a dictionary of pandas Series, not as a DataFrame. This is easier for data that is needed on different aggregation levels (household vs. individual), because Series of different lengths can sit side by side in one dictionary. In theory, data arguments could then be anything (numpy arrays, pandas Series, DataFrames), but I would suggest we always use Series.
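As a minimal sketch of this storage layout, the snippet below keeps individual-level and household-level Series in the same dictionary. The column names (`hh_id`, `wage`, `rent`) and all values are made up for illustration and are not taken from gettsim.

```python
import pandas as pd

# Hypothetical toy data: individual-level Series are indexed by person id,
# household-level Series by household id.
data = {
    "hh_id": pd.Series([1, 1, 2], index=[10, 11, 12], name="hh_id"),
    "wage": pd.Series([2000.0, 500.0, 3000.0], index=[10, 11, 12], name="wage"),
    "rent": pd.Series([800.0, 650.0], index=[1, 2], name="rent"),
}

# Aggregate an individual-level Series to the household level and store the
# result next to the other Series, even though the lengths differ.
hh_wage = data["wage"].groupby(data["hh_id"]).sum()
data["hh_wage"] = hh_wage
```

A single DataFrame would need one common index for all columns, so the household-level `rent` would have to be repeated for every household member or padded with missing values.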

Function Outputs

Anything that could be a data argument can also be a function output. So I suggest we require all functions to return pandas Series, but we could be more flexible if necessary.

How Parameters are Handled

All functions take one argument called params, which is a pandas Series. The Series contains all parameters that are needed for the model that is being estimated. This is a huge reduction compared to the full parameter database, but probably much more than what is actually needed inside a single function.

Some functions might not even need parameters. I would still pass in params almost everywhere, so that a user can replace a function with one that does need params while changing as little as possible elsewhere.
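The conventions above (Series in, Series out, one `params` Series) could look roughly like this. The function names, parameter names and values are hypothetical and only illustrate the signature.

```python
import pandas as pd

# Hypothetical parameter names and values, for illustration only.
params = pd.Series({"tax_rate": 0.25, "allowance": 1000.0})


def income_tax(wage, params):
    """Return a Series with the tax due on wage income above the allowance."""
    taxable = (wage - params["allowance"]).clip(lower=0)
    return taxable * params["tax_rate"]


def wage_doubled(wage, params):
    """Needs no parameters but still accepts params for a uniform signature."""
    return 2 * wage


wage = pd.Series([2000.0, 500.0, 3000.0])
tax = income_tax(wage, params)
doubled = wage_doubled(wage, params)
```

Because both functions share the same signature, swapping `wage_doubled` for a variant that reads from `params` would not require changes to the calling code.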

How Functions are Specified

I decided to have a dictionary of functions (which in the long run could hold paths to functions that are then imported using importlib) instead of a flat list of functions. Because the dictionary key, not the function's own name, determines how a function is referred to, functions can be renamed before they are passed to gettsim. I make heavy use of this in the tutorial and really like it!
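A small sketch of the idea, assuming the dictionary maps names to either callables or dotted import paths. The helper `resolve` and the function `net_income_v2` are hypothetical and not part of the package.

```python
import importlib


def net_income_v2(gross_income, params):
    """Hypothetical user function whose own name differs from its DAG name."""
    return gross_income * (1 - params["tax_rate"])


# The key is the name under which the function is known to the model,
# so the same function object can be registered under any name.
functions = {"net_income": net_income_v2}


def resolve(spec):
    """Return spec if it is callable, else import it from a dotted path."""
    if callable(spec):
        return spec
    module_name, _, func_name = spec.rpartition(".")
    return getattr(importlib.import_module(module_name), func_name)


# A dotted path works as a value as well; math.sqrt is only a stand-in here.
functions["root"] = "math.sqrt"
resolved = {name: resolve(spec) for name, spec in functions.items()}
```

Storing paths instead of function objects would also make the specification serializable, since the dictionary then contains only strings.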

To-Do

  • Implement garbage collection in execute_dag (should be quite easy)
  • Implement a parallel scheduler and experiment with existing ones
  • Make model specifications YAML- and JSON-compatible. Currently the user-provided functions have to be in a dictionary. We should also allow paths to functions that are then imported using importlib
  • Implement a better parameter backend (e.g. Database) and allow to store metadata (units, ...) for parameters
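The garbage collection item from the list above could be approached as sketched below. This is not the package's `execute_dag`, whose signature is not shown here; `execute_with_gc` is a hypothetical stand-in that assumes every non-data node has a function whose argument names equal the names of its predecessors.

```python
import networkx as nx


def execute_with_gc(dag, functions, data, targets):
    """Evaluate nodes in topological order and drop results no longer needed."""
    results = dict(data)
    # For every node, remember which successors still have to be computed.
    remaining = {node: set(dag.successors(node)) for node in dag.nodes}
    for node in nx.topological_sort(dag):
        if node in functions:
            kwargs = {dep: results[dep] for dep in dag.predecessors(node)}
            results[node] = functions[node](**kwargs)
        for dep in dag.predecessors(node):
            remaining[dep].discard(node)
            # Once all successors are done, a non-target input can be freed.
            if not remaining[dep] and dep not in targets:
                results.pop(dep, None)
    return results


# Hypothetical toy model: b depends on a, c depends on a and b.
dag = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "c")])
functions = {"b": lambda a: a + 1, "c": lambda a, b: a * b}
out = execute_with_gc(dag, functions, data={"a": 2}, targets={"c"})
```

In this toy model only the target `c` survives; `a` and `b` are dropped as soon as `c` has been computed.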

Lessons Learned

  • Whenever you are tempted to write a function that does something with a DAG, check if networkx has already implemented your function.
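As an illustration of this lesson, the tasks of checking for cycles, finding everything a target depends on, and ordering the computation are all covered by existing networkx functions. The node names below are made up.

```python
import networkx as nx

# Hypothetical dependency graph: an edge u -> v means v needs u as input.
dag = nx.DiGraph(
    [
        ("wage", "tax"),
        ("tax", "net_income"),
        ("wage", "net_income"),
        ("rent", "housing_benefit"),
    ]
)

# Built-in check that the graph has no cycles.
is_dag = nx.is_directed_acyclic_graph(dag)

# All nodes the target depends on, directly or indirectly.
target = "net_income"
needed = nx.ancestors(dag, target) | {target}

# Restrict the graph to those nodes and get a valid execution order.
pruned = dag.subgraph(needed)
order = list(nx.topological_sort(pruned))
```

Here `rent` and `housing_benefit` are pruned because the target does not depend on them.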

Open Questions

  • Do we really have to distinguish between new_functions, obsolete_functions and overriden_functions? I think it is enough to provide one dictionary with new functions. Whether a function is new or overridden can be determined automatically by checking if its name already exists. Whether a function should be ignored can be determined from the targets.
  • Should all functions have the params argument or should this be optional? A try/except block around the partial step would make it optional.
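For the second question, one possible way to make params optional is sketched below. Note that it swaps the try/except idea for a signature check: `functools.partial` does not validate keyword arguments when the partial is created, so a `TypeError` would only surface when the function is called. The helper `bind_params` and both example functions are hypothetical.

```python
import functools
import inspect


def bind_params(func, params):
    """Bind params only if func declares an argument called params."""
    if "params" in inspect.signature(func).parameters:
        return functools.partial(func, params=params)
    return func


def scaled(x, params):
    """Hypothetical function that uses params."""
    return x * params["factor"]


def incremented(x):
    """Hypothetical function that has no params argument."""
    return x + 1


params = {"factor": 3}
scaled_bound = bind_params(scaled, params)
incremented_bound = bind_params(incremented, params)
```

After binding, both functions can be called with data arguments only, so the rest of the execution code does not need to know which functions use params.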
