ml4ai / delphi

Framework for assembling causal probabilistic models from text and software.

Home Page: http://ml4ai.github.io/delphi

License: Apache License 2.0

Makefile 0.16% Python 6.56% Fortran 87.75% Shell 0.08% CSS 1.32% Smarty 0.01% TeX 0.07% JavaScript 0.16% HTML 0.07% Dockerfile 0.01% Julia 0.03% C++ 2.68% CMake 0.05% Forth 0.03% Scheme 0.07% Gnuplot 0.91% Scala 0.04% Pascal 0.01% Pawn 0.01% C 0.01%

delphi's People

Contributors

adarshp, aishwarya34, bgyori, cl4yton, cthoyt, fan-luo, jastier, jiaminghao, lchamp87x, manujinda, min-yin-sri, pratikbhd, skdebray


delphi's Issues

DBN-JSON representation for Fortran files without a PROGRAM module

The current structure of the pgm.json file is as follows:

{
    "start": <name_of_PROGRAM_module>,
    "name": "pgm.json",
    "dateCreated": <date_of_creation>,
    "functions": [<list_of_functions>]
}

This means that the "start" key is only created when there is a PROGRAM module in the FORTRAN file. Neither PETPT.for nor PETASCE.for has a PROGRAM module; they only contain SUBROUTINEs. For these files, a "start" field is not created and the pgm/lambdas generation script crashes.

For now, I will add an initial check that searches for a PROGRAM module and, if none is found, adds a dummy "start" field (see the sketch below). Moving forward, how should we represent such FORTRAN files?
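
For reference, a minimal sketch of the interim check, assuming the parsed AST is a list of dicts with a "tag" field; the helper name and AST shape are hypothetical, not the actual generation-script internals:

def get_start_module(ast_nodes):
    """Return the PROGRAM module name, or a dummy value if there is none."""
    for node in ast_nodes:
        if node.get("tag") == "program":
            return node["name"]
    # No PROGRAM module (as in PETPT.for and PETASCE.for): fall back to a
    # dummy "start" value so the pgm/lambdas generation script does not crash.
    return "__dummy_start__"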

Locations of docstrings

(Quoting an earlier comment from @cl4yton:) "... stuff in here? Note, it does not seem like you can attach a docstring to any ..."

Replying to your comment, @cl4yton - we can't place docstrings in arbitrary places for Python to automatically process them, that's true. However, we can always modify the __doc__ attribute of objects to set their docstrings, for example:

def function_name():
    pass

function_name.__doc__ = "docstring"

Pickling lambda functions

@pauldhein and I were discussing this a while ago - it occurred to us that since the output of the program analysis pipeline is a pickled Python object, there is no reason, in principle, why the lambda functions couldn't be pickled alongside the rest of the output. For example, the following function in PETPT_lambdas.py:

def PETPT__lambda__TD_0(TMAX, TMIN):
    TD = ((0.6*TMAX)+(0.4*TMIN))
    return TD

Could be constructed as follows:

PETPT__lambda__TD_0 = eval("lambda TMAX, TMIN: ((0.6*TMAX)+(0.4*TMIN))")

Here, the string argument to eval could be constructed in the same way that the second line of each existing lambda function in the lambdas.py files is built up from parsing the XML AST output of OFP.

Alternatively (and this seems to me to be the right way), one could take advantage of type annotations and use the more powerful def syntax for declaring functions -

exec("def PETPT__lambda__TD_0(TMAX: float, TMIN: float) -> float: return ((0.6*TMAX)+(0.4*TMIN))")

(assuming we can get these types - can we?)

and later the PETPT__lambda__TD_0 object can be used as a value in the dict object produced by genPGM.py.
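
For concreteness, a minimal sketch of the exec-based construction, assuming the source string has already been assembled from the OFP AST output; exec-ing into an explicit namespace lets us pull the function object out and store it in the GrFN dict:

# The source string below is assumed to be generated by the pipeline.
namespace = {}
source = (
    "def PETPT__lambda__TD_0(TMAX: float, TMIN: float) -> float:\n"
    "    return ((0.6*TMAX)+(0.4*TMIN))"
)
exec(source, namespace)
PETPT__lambda__TD_0 = namespace["PETPT__lambda__TD_0"]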

Since functions are first class objects in Python, you can actually set attributes for functions as well - perhaps this might make it easier to keep track of things like the function type (assign/lambda/condition, etc.), the reference, and so on:

PETPT__lambda__TD_0.fn_type = "lambda"
PETPT__lambda__TD_0.reference = 9
PETPT__lambda__TD_0.target = "TD"

And then if someone wants to serialize the GrFN object to a JSON file, we could define the following function:

import inspect

def to_json_serialized_dict(function):
    return {
        "name": function.__name__,
        "type": function.fn_type,
        "target": function.target,
        # inspect.signature objects are not JSON-serializable, so just take
        # the parameter names.
        "sources": list(inspect.signature(function).parameters),
    }

Not super urgent, but I do think it might be an investment worth making to simplify things in the long run...

delphi depends on unreleased features from Indra

delphi/core.py imports the Influence class from the indra.statements module.

However, the latest indra release, 1.5, which is what pip installs by default from PyPI, does not include this class, since the feature was introduced after that release.

I suggest the following as an interim solution:

  • Update the requirements.txt file to point at the indra git repository instead of PyPI.
  • Add dependency_links in setup.py to make it install indra using git instead of PyPI.

And, for later releases:

  • Use released versions as dependencies and hold back any changes that depend on unreleased features.
  • Add version specifications in requirements.txt and setup.py to avoid confusion and prevent problems due to backwards-incompatible changes in any of the dependencies (see the sketch below).
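
For concreteness, hedged requirements.txt sketches of both approaches, assuming indra's upstream repository is sorgerlab/indra:

# Interim: install indra from its git repository instead of PyPI
git+https://github.com/sorgerlab/indra.git#egg=indra

# Later: pin released versions to avoid surprises from backwards-incompatible changes
indra==1.5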

Reduce the amount of stateful behaviour in genPGM.py

I'm having some issues with global state in genPGM.py - basically, the lambdas.py file seems to change upon multiple runs of the same function with the same inputs.

Steps to reproduce (assuming you are in the Delphi repo root directory) -

cd delphi/program_analysis/autoTranslate
./autoTranslate ../../../data/program_analysis/crop_yield.f
python

Then in the Python interpreter, do:

>>> from delphi.program_analysis.autoTranslate.scripts.genPGM import get_asts_from_files, create_pgm_dict
>>> asts = get_asts_from_files(['crop_yield.py'])
>>> pgm_dict = create_pgm_dict('lambdas.py', asts, 'pgm.json')

The lambdas.py file is unchanged by this call, and contains the following:

def UPDATE_EST__lambda__TOTAL_RAIN_0(TOTAL_RAIN, RAIN):
    TOTAL_RAIN = (TOTAL_RAIN+RAIN)
    return TOTAL_RAIN

def UPDATE_EST__lambda__IF_1_0(TOTAL_RAIN):
    return (TOTAL_RAIN<=40)

def UPDATE_EST__lambda__YIELD_EST_0(TOTAL_RAIN):
    YIELD_EST = (-((((TOTAL_RAIN-40)**2)/16))+100)
    return YIELD_EST

def UPDATE_EST__lambda__YIELD_EST_1(TOTAL_RAIN):
    YIELD_EST = (-(TOTAL_RAIN)+140)
    return YIELD_EST

def CROP_YIELD__lambda__MAX_RAIN_0():
    MAX_RAIN = 4.0
    return MAX_RAIN

def CROP_YIELD__lambda__CONSISTENCY_0():
    CONSISTENCY = 64.0
    return CONSISTENCY

def CROP_YIELD__lambda__ABSORPTION_0():
    ABSORPTION = 0.6
    return ABSORPTION

def CROP_YIELD__lambda__YIELD_EST_0():
    YIELD_EST = 0
    return YIELD_EST

def CROP_YIELD__lambda__TOTAL_RAIN_0():
    TOTAL_RAIN = 0
    return TOTAL_RAIN

def CROP_YIELD__lambda__RAIN_0(DAY, CONSISTENCY, MAX_RAIN, ABSORPTION):
    RAIN = ((-((((DAY-16)**2)/CONSISTENCY))+MAX_RAIN)*ABSORPTION)
    return RAIN

However, upon calling this function a second time:

>>> pgm_dict = create_pgm_dict('lambdas.py', asts, 'pgm.json')

The numbers at the end of the function names in lambdas.py get incremented by one.

def UPDATE_EST__lambda__TOTAL_RAIN_1(TOTAL_RAIN, RAIN):
    TOTAL_RAIN = (TOTAL_RAIN+RAIN)
    return TOTAL_RAIN

def UPDATE_EST__lambda__IF_1_1(TOTAL_RAIN):
    return (TOTAL_RAIN<=40)

def UPDATE_EST__lambda__YIELD_EST_2(TOTAL_RAIN):
    YIELD_EST = (-((((TOTAL_RAIN-40)**2)/16))+100)
    return YIELD_EST

def UPDATE_EST__lambda__YIELD_EST_3(TOTAL_RAIN):
    YIELD_EST = (-(TOTAL_RAIN)+140)
    return YIELD_EST

def CROP_YIELD__lambda__MAX_RAIN_1():
    MAX_RAIN = 4.0
    return MAX_RAIN

def CROP_YIELD__lambda__CONSISTENCY_1():
    CONSISTENCY = 64.0
    return CONSISTENCY

def CROP_YIELD__lambda__ABSORPTION_1():
    ABSORPTION = 0.6
    return ABSORPTION

def CROP_YIELD__lambda__YIELD_EST_1():
    YIELD_EST = 0
    return YIELD_EST

def CROP_YIELD__lambda__TOTAL_RAIN_1():
    TOTAL_RAIN = 0
    return TOTAL_RAIN

def CROP_YIELD__lambda__RAIN_1(DAY, CONSISTENCY, MAX_RAIN, ABSORPTION):
    RAIN = ((-((((DAY-16)**2)/CONSISTENCY))+MAX_RAIN)*ABSORPTION)
    return RAIN

This side effect needs to be eliminated.
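
One possible way to remove the global state, sketched below under the assumption that the suffix counters are currently module-level variables (the class and method names here are hypothetical): thread the per-variable counts through an explicit object that create_pgm_dict constructs afresh on every call.

from collections import defaultdict

class LambdaNamer:
    """Generates lambda function names without relying on module-level state."""

    def __init__(self):
        self.counts = defaultdict(int)

    def name(self, scope, target):
        index = self.counts[(scope, target)]
        self.counts[(scope, target)] += 1
        return "{}__lambda__{}_{}".format(scope, target, index)

# A fresh LambdaNamer per create_pgm_dict call means two runs with the same
# inputs produce identical lambdas.py contents.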

Implement updating of ProgramAnalysisGraph nodes to be in sync with the loop index

Right now, there is a 'delay' in the updating of nodes in a ProgramAnalysisGraph (which should be integrated better into the AnalysisGraph class): 'downstream' outputs like YIELD_EST are updated a couple of steps after 'upstream' outputs like RAIN (in the crop_yield.f example). This results in the DAY variable, which serves as the loop index, lagging behind the FORTRAN program by 2.

Add 'drop-in'/'setup' functionality.

Basically, this is the idea: an analyst should be able to 'set up' the workspace with a script before launching the visualizer/simulator. Thus, the app object should be available to import, and app.run() can be called after the setup to launch the app with the desired configuration.

Things that need to be able to be pulled into the workspace:

  • INDRA Statements
  • Eidos JSON output
  • Raw text
  • Text files
  • Directory of text files.
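
A minimal sketch of the setup workflow described above; the import path is an assumption and the loading steps are placeholders:

from delphi.app import app  # assumed location of the app object

# ... load INDRA statements, Eidos JSON output, raw text, text files, or a
#     directory of text files into the workspace here ...

app.run()  # launch the visualizer/simulator with the configured workspace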

Library Functions

Appropriate handlers need to be created for library functions (such as READ, WRITE, etc.) in the program analysis code. It is not clear to me at this moment whether they need to preserve the actual behavior of Fortran's library routines. By that I mean that I don't think delphi cares about the particulars of the call; I believe it only cares that the code is receiving input or producing output. If that is the case, rather than creating handlers to translate Fortran library calls into Python calls, it may be possible to replace all input calls with a single call to a function like 'input' and all output calls with a call to a function like 'output'. The user can then define what an input and an output call is (see the sketch below).
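
A minimal sketch of that idea; the function names are hypothetical, and the defaults just use stdin/stdout until the user overrides them:

def delphi_input():
    """Generic stand-in for Fortran READ: return whitespace-separated values from stdin."""
    return input().split()

def delphi_output(*values):
    """Generic stand-in for Fortran WRITE/PRINT: send the values to stdout."""
    print(*values)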

Implement functionality to process data from FEWSNET

The data is contained in shape files, so we will need to figure out how to use those.

  • Write script to programmatically download FEWSNET data.
  • Write script that gets IPC phase classifications for individual South Sudan districts for different time periods.
  • Connect the data from the shapefiles that contain IPC phase classification data with the shapefiles that contain administrative boundaries (see the sketch after this list).
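
A hedged sketch of the shapefile wiring, using geopandas (not currently a Delphi dependency; the file names are placeholders):

import geopandas as gpd

ipc = gpd.read_file("fewsnet_ipc_classifications.shp")          # IPC phase polygons
districts = gpd.read_file("south_sudan_admin_boundaries.shp")   # administrative boundaries

# Spatial join: attach IPC phase classifications to the districts they intersect.
districts_with_ipc = gpd.sjoin(districts, ipc, how="left", op="intersects")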

indra and networkx version

While creating a fresh virtual environment for delphi development using requirements.txt, I noticed this warning was generated:
indra 1.7.0 has requirement networkx==1.11, but you'll have networkx 2.1 which is incompatible.
Is this a concern?

GrFN spec interlanguage compatibility

@adarshp, I took a look at the spec for GrFN.
https://delphi.readthedocs.io/en/master/grfn_spec.html#top-level-grfn-specification
This is a great representation. Kind of like a higher level IR for translating between languages.

One thing I noticed is this note:

TODO: we think Fortran is restricted to integer values for iteration variables, which would include iteration over indexes into arrays. Need to double check this.

If the GrFN schema is going to work for multiple languages, it is going to need to support iterator loops like those in C++, Python, and Julia.

I guess you could have an int-loop construct and an iterator-loop construct, or a per-language loop construct (see the illustration below).
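
To illustrate the distinction, the two loop forms the schema would need to cover look like this in Python:

values = [1.0, 2.5, 4.0]
n = len(values)

# Fortran-style loop: the iteration variable is an integer index.
total = 0.0
for i in range(1, n + 1):
    total += values[i - 1]

# Iterator loop (as in C++, Python, Julia): the loop variable ranges over the
# collection itself, with no integer index.
total = 0.0
for value in values:
    total += value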

Update readme file with new configuration options

This is my current understanding of how the system has to be run:

Creation of the model:
root@59c89406cc39:/src/delphils# ./delphi.py --create_model --indra_statements data/sample_indra_statements.pkl --adjective_data data/adjectiveData.tsv --output_cag_json /out/testDelphiDanielJSON --output_dressed_cag /out/testDelphiDanielCAG --output_variables_path /out/testDelphiDanielVar

Execution of the model:
root@59c89406cc39:/src/delphils# ./delphi.py --execute_model --input_dressed_cag /out/testDelphiDanielCAG --input_variables_path /out/testDelphiDanielVar --output_sequences /out/DelphiSequencesResult.csv

Kill json_dev branch

@adarshp: I'd like to kill the json_dev branch. Rather than merge with master (which is far ahead), I'm going to create a new branch called pa_dev, which will be for program analysis. But before I go ahead and kill json_dev, I just wanted to run it by you.

Add flag for input adjective data

To incorporate delphi into a workflow, the inputs and outputs must be explicitly specified - right now the path to the gradable adjective data file is hard-coded into the system. @dgarijo

New handlers

The program analysis code has only been tested on some fairly small programs. So it is likely that there are program constructs it has not seen before, such as arrays, strings and while loops. Handlers will need to be added into the program analysis code for these constructs.

where to put for2py test inputs

I've been making up tests for various Fortran language constructs for2py will have to handle (currently: I/O and modules; soon: multi-dimensional arrays). Right now these tests are in a couple of different places: some are in delphi/tests/data and some are in delphi/delphi/program_analysis/autoTranslate/tests/test_data/. It would be good for all of these to live in the same place. Where should I put them?

Option to specify output location when creating model/dressed CAG

@dgarijo: Re: our email conversation - it is possible to set the output path of the result folder while creating the model using the --model_dir flag. However, this is probably not so clear from the help message. In any case, I'll add flags to the delphi CLI to specify the separate locations of the output model files.

program analysis also in delphi

@adarshp: Posting as a "question" for discussion, although I'm "stating" it here...

The program analysis project right now has the following components:

  1. Analyze Fortran to map to Python
  2. Analyze the Python AST to map to a CAG with functions that can be input to delphi
  3. Sensitivity analysis of the delphi CAG with functions

Item (3) will be in delphi (it has general use). For now, I'd like to put parts of (2) also in the delphi project, under the directory program_analysis/ (at the project root, sibling to sensitivity). For now I'll keep this in the sensitivity branch. This means probably adding Jon Stephens to the project.

Long term: this may move out, depending on whether we consider the Python side of program analysis a component of delphi (which I'm currently OK with).

[Feature request]: Running Delphi with only a subset of the variables in a CAG

In order to connect Delphi to other workflows in MINT, we may need to run the model with only a subset of the variables in the original CAG. In those cases, it would be useful to have a function that takes in a list of variables and removes the nodes that are not in that list from the CAG prior to execution (see the sketch below).
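
A minimal sketch of such a function, assuming the CAG is a networkx graph whose node identifiers are the variable names:

import networkx as nx

def restrict_to_variables(cag: nx.DiGraph, variables) -> nx.DiGraph:
    """Return a copy of the CAG containing only the requested variables."""
    keep = set(variables)
    restricted = cag.copy()
    restricted.remove_nodes_from([n for n in cag.nodes() if n not in keep])
    return restricted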

Set up a backcasting evaluation framework for Delphi.

  • Implement a function that takes the name (str) of a concept and the maximum depth for graph traversal (int) as parameters, and returns a CAG centered around the concept (see the sketch after this list).
  • Write a function that takes a concept-level CAG as an input parameter and connects each concept of the CAG with an indicator, while ensuring that indicators are not shared among concepts.
  • Set initial conditions for backcasting - get mean values of indicators from relevant data sources, for South Sudan in 2016.
  • Run the DBN for one time step
  • Compare predicted values of indicators to actual values.
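
A hedged sketch of the first item, assuming the CAG is a networkx graph and treating edges as undirected for the purpose of measuring depth:

import networkx as nx

def cag_around_concept(cag: nx.DiGraph, concept: str, max_depth: int) -> nx.DiGraph:
    """Return the subgraph of nodes within max_depth hops of the given concept."""
    lengths = nx.single_source_shortest_path_length(
        cag.to_undirected(), concept, cutoff=max_depth
    )
    return cag.subgraph(lengths.keys()).copy()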

Set up DB for storing parameterization data.

Right now, Delphi uses data tables stored as plain text files to parameterize its models. However, this will not scale with increasing amounts of data. Another concern is minimization of git repo bloat. For these reasons, it might be good to have an online database (hosted on vision or a SISTA server) that Delphi can query programmatically. I'm leaning towards Neo4j since that's the DB system I have the most experience with.

Handle Function Returns

The program analysis code does not yet handle function returns. This is in part because the handling of such returns requires a few non-trivial additions. As a note, a good resource on functions can be found here:
https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/F90-Subprograms.pdf

The following additions must be made:

  • Functions must be "contained" within programs or subroutines, which are the only things that can call them. Thus, functions must be scoped so that multiple definitions of the same name in different CONTAINS scopes will not interfere with each other. This can be done by renaming functions. I have a few notes on this as well: (1) I believe CONTAINS statements can be nested. (2) I'm not sure, but it might be possible to access local variables in parent scopes.
  • The return value is encoded differently than in most other languages: it is stored in a variable with the same name as the function. Thus, for a function foo, the return value is set by assigning foo = value. Logic that is aware of this will need to be added to translate this correctly to Python (see the sketch after this list).
  • There is an actual return keyword that terminates a function/subroutine/program. I assume delphi expects an entire container/function to execute in its entirety; if that is the case, a program transformation will likely be necessary to preserve the return behavior.
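
An illustration of the return-value convention from the second bullet, and one way a direct Python translation could handle it (both the Fortran and the translation are illustrative, not output of the current pipeline):

# Fortran:
#     REAL FUNCTION MEAN(A, B)
#         MEAN = (A + B) / 2.0
#     END FUNCTION MEAN
#
# The return value is set by assigning to the function's own name, so the
# translation must turn that assignment into an explicit return.
def mean(a: float, b: float) -> float:
    mean_ = (a + b) / 2.0   # Fortran: MEAN = (A + B) / 2.0
    return mean_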

Request to change contents of "input" attribute of loop_plate specification

(Just realized I should have first created this as an issue!)

@stephensj2: @pauldhein is making good progress getting the DBN-JSON to DBN graph wiring working. Paul identified a change in the DBN-JSON output that we'd like to ask you to make -- this is just for the loop_plate specification. Up to this point, my thought was that the "input" attribute of the loop_plate spec should list all of the variables that are referenced within the loop_plate. It turns out it is much more useful to Paul to have this be the list of variable names that are set in the scope (container fn) that the loop_plate appears within. And in this case, we don't need actual <variable_reference>s (no need for the index info), just the <variable_name> (the base string name of the variable). I've updated the description of the Function Loop Plate Specification to reflect this (text in maroon).

Is it easy for you to make this change?
