
Comments (8)

jpfairbanks commented on August 18, 2024

Yeah I totally understand about the time. The schedule is crazy compressed.

It looks like swapping out the implementations of the following functions could yield a code generator for any target language (a note to future implementers).

        self.printFn = {
            "subroutine": self.printSubroutine,
            "program": self.printProgram,
            "call": self.printCall,
            "arg": self.printArg,
            "variable": self.printVariable,
            "do": self.printDo,
            "index": self.printIndex,
            "if": self.printIf,
            "op": self.printOp,
            "literal": self.printLiteral,
            "ref": self.printRef,
            "assignment": self.printAssignment,
            "exit": self.printExit,
            "return": self.printReturn,
            "function": self.printFunction,
            "ret": self.printFuncReturn,
        }
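A hypothetical sketch of that idea: keep the dispatch table, and let each backend override only the handlers whose output syntax differs. The class names and node shapes below are illustrative, not the real AutoMATES printer API.

```python
# Illustrative sketch (not the real AutoMATES API): a dispatch table whose
# handlers can be overridden per target language.

class PyPrinter:
    def __init__(self):
        self.printFn = {
            "assignment": self.printAssignment,
            "literal": self.printLiteral,
        }

    def print(self, node):
        # dispatch on the node tag, exactly like the table above
        return self.printFn[node["tag"]](node)

    def printAssignment(self, node):
        return f"{node['target']} = {self.print(node['value'])}"

    def printLiteral(self, node):
        return str(node["value"])

class JuliaPrinter(PyPrinter):
    # override only the handlers whose surface syntax differs
    def printLiteral(self, node):
        v = node["value"]
        if isinstance(v, bool):
            return "true" if v else "false"  # Julia spells booleans lowercase
        return str(v)

node = {"tag": "assignment", "target": "flag",
        "value": {"tag": "literal", "value": True}}
```

Swapping `PyPrinter` for `JuliaPrinter` changes only the rendered syntax; the tree walk itself stays the same.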

I don't expect to work with Fortran programs yet. We are starting with programs written in Julia as the source language, because that lets us focus on the metamodeling, reasoning, and transformation tasks without having to solve the very difficult problem of information extraction from lower-level code that you guys are tackling. We also found the epirecipes cookbook, which gives us some "real world" models written by scientists, with explanatory text and references to the original papers.

Of course we will be restricted with regard to established applications, which are often written in Fortran. But when we get to that bridge (in year 3 of our 5-year program 😉) we will cross it by writing a Julia backend for AutoMATES and pressing "the easy button".

from delphi.

skdebray commented on August 18, 2024

@adarshp @pauldhein
That's an interesting idea and we can certainly use that approach. I don't think there are any technical hurdles to doing this. (The type information for arguments and return values should be available from declarations in the original source code.)

There was one thing that I found really interesting about this. Based on my understanding of how eval is implemented, as well as various online posts (e.g., "Why eval/exec is bad", "Be careful with exec and eval in Python"), I had fully expected the code using eval and exec to be significantly slower than the straightforward function definition. When I ran an experiment, though, the three versions turned out to have essentially identical performance. I was not expecting this. :) The code I used for my experiment is given below. The run times were:

T0 = 19.066645 [straightforward function]
T1 = 20.407631 [eval]
T2 = 20.169486 [exec]

Here is the timing experiment:

#!/usr/bin/env python3

import time

TMAX = 100.00
TMIN = 0.01

NITERS = 100000000

def get_time0(niters):
    n = 0
    time0 = time.time()
    for i in range(niters):
        n += PETPT__lambda__TD_0_0(TMAX, TMIN)
    time1 = time.time()
    return time1 - time0

def get_time1(niters):
    n = 0
    time0 = time.time()
    for i in range(niters):
        n += PETPT__lambda__TD_0_1(TMAX, TMIN)
    time1 = time.time()
    return time1 - time0

def get_time2(niters):
    n = 0
    time0 = time.time()
    for i in range(niters):
        n += PETPT__lambda__TD_0_2(TMAX, TMIN)
    time1 = time.time()
    return time1 - time0

def PETPT__lambda__TD_0_0(TMAX, TMIN):
    TD = ((0.6*TMAX)+(0.4*TMIN))
    return TD

PETPT__lambda__TD_0_1 = eval("lambda TMAX, TMIN: ((0.6*TMAX)+(0.4*TMIN))")

exec("def PETPT__lambda__TD_0_2(TMAX: float, TMIN: float) -> float: return ((0.6*TMAX)+(0.4*TMIN))")

def main():
    t0 = get_time0(NITERS)
    print('T0 = {:f}'.format(t0))

    t1 = get_time1(NITERS)
    print('T1 = {:f}'.format(t1))

    t2 = get_time2(NITERS)
    print('T2 = {:f}'.format(t2))

main()
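Incidentally, the exec-defined variant also keeps the type annotations written in the source string, so declared type information survives this route. A quick sketch (the body mirrors the lambda above):

```python
# Sketch: annotations in an exec'd def are preserved on the resulting
# function object, so type information is recoverable afterwards.
ns = {}
exec("def f(TMAX: float, TMIN: float) -> float: return (0.6*TMAX)+(0.4*TMIN)", ns)
annotations = ns["f"].__annotations__
# annotations == {"TMAX": float, "TMIN": float, "return": float}
```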


adarshp commented on August 18, 2024

That's good news, thanks for doing that benchmarking! I would really like to go into the guts of the program analysis code sometime in the near future and understand it better as well...


skdebray commented on August 18, 2024

@adarshp @cl4yton
I think it would be useful for us to sit down and nail down our implementation decisions along with a priority order on them so that we're all in agreement.


jpfairbanks commented on August 18, 2024

@skdebray, from my understanding of why eval could be slow, the prohibition applies not to defining functions via eval, but to doing the calculations themselves inside eval. If, as in your benchmark, you eval a function definition once, then the resulting function is equivalent to one defined in literal code. I think you will see a performance difference with

# parsing and interpreting is slooooow.
# (note: eval only accepts expressions, so we use "x + 1" rather than
# the statement "x += 1", which would be a SyntaxError inside eval)
x = 0
for i in range(1000):
    x = eval("x + 1")  # the string is re-parsed and re-compiled every iteration
print(x)
# faster because we don't have to parse and interpret the string every time.
x = 0
for i in range(1000):
    x += 1
print(x)
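If the per-call parse cost ever matters, a middle ground (a sketch, not something from the thread) is to precompile the string once with compile() and pass the resulting code object to eval:

```python
# Sketch: compile() removes the per-iteration parse/compile cost of eval.
code = compile("x + 1", "<string>", "eval")

def slow(n=1000):
    x = 0
    for _ in range(n):
        x = eval("x + 1")  # re-parses and re-compiles the string each time
    return x

def fast(n=1000):
    x = 0
    for _ in range(n):
        x = eval(code)     # reuses the precompiled code object
    return x
```

Both return the same result; only the parse overhead differs.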

I think that using a pickle-specific format to represent the code really commits you to being Python-specific. For example, how would you write an interpreter for GrFNs in another language if the functions are stored as pickles of Python functions?

Having a language neutral representation of the functions would be helpful. For example

def PETPT__lambda__TD_0(TMAX, TMIN):
    TD = ((0.6*TMAX)+(0.4*TMIN))
    return TD

could be represented as

(def PETPT__lambda__TD_0
        (TMAX, TMIN)
        (body (assign TD ((0.6*TMAX)+(0.4*TMIN)))
              (return TD))
)

which is a lispy way of writing this JSON value:

{ "funcdef": {"name": "PETPT__lambda__TD_0",
  "args": ["TMAX", "TMIN"],
  "body": [ "assign TD ((0.6*TMAX)+(0.4*TMIN))",
              "return TD"]
}}
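To make the portability point concrete, here is a hypothetical toy interpreter for that JSON shape in Python. The "assign"/"return" statement encoding is assumed from the example above, and expression evaluation is delegated to eval for brevity; a real implementation would parse the expressions too.

```python
# Hypothetical toy interpreter for the funcdef shape sketched above.
funcdef = {
    "name": "PETPT__lambda__TD_0",
    "args": ["TMAX", "TMIN"],
    "body": ["assign TD ((0.6*TMAX)+(0.4*TMIN))", "return TD"],
}

def interpret(fd, *call_args):
    env = dict(zip(fd["args"], call_args))     # bind arguments to parameters
    for stmt in fd["body"]:
        op, rest = stmt.split(" ", 1)
        if op == "assign":
            target, expr = rest.split(" ", 1)
            env[target] = eval(expr, {}, env)  # evaluate RHS in the local env
        elif op == "return":
            return eval(rest, {}, env)
```

The same walk could be written in any host language, which is the point of keeping the representation language-neutral.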

The particular syntax for expressing the functions doesn't matter but I think that whatever it is, it should be something that can be interpreted by multiple languages. That way someone can come along and write a GrFN to L compiler where L is some awesome programming language.


skdebray commented on August 18, 2024

@jpfairbanks You're absolutely correct about the eval'd code's performance in this situation -- my initial misconception was just one of my un-thought-through prejudices ("All Xs are Y! Grrr!"). Once I saw the numbers and looked at the code, the reason immediately became clear.

Re: "using a pickle specific format to represent the code really commits you to being python specific" -- your point is well taken, and was raised also by @pratikbhd (also working on this project). We front-end folks are happy to provide @adarshp whatever representation allows his team to move forward quickly. But I'd be happy to discuss this further.


jpfairbanks commented on August 18, 2024

Yeah, I agree that "just pickle it" is really easy to get going early, but I think in the long run it limits generalizability. There are functions that cannot be pickled: this post https://medium.com/@emlynoregan/serialising-all-the-functions-in-python-cd880a63b591 shows that dill cannot handle mutually recursive functions. I think this restriction also applies to functions from libraries that use numpy/scipy, because they link against particular C/Fortran code and are generally pretty complicated.
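A quick way to see one such limit (a sketch using stdlib pickle, not dill): pickle stores functions by qualified name, so a closure created inside another function cannot be serialized at all.

```python
# Sketch: closures are not picklable with the stdlib pickle module.
import pickle

def make_adder(n):
    return lambda x: x + n   # a closure over n, defined locally

adder = make_adder(1)

try:
    pickle.dumps(adder)
    picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    picklable = False
# picklable is False: pickle records the name make_adder.<locals>.<lambda>,
# which cannot be looked up again at load time.
```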

Is there some document/code that describes how the Fortran-to-Python translation is done? I would like to look there to see what the intermediate representation looks like. Without reading the code, I assume it looks like:

  1. read fortran source code into a tree (xml, json, in memory objects)
  2. transform that tree according to some logic
  3. render a set of python code snippets by walking the tree
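Those three steps can be sketched end-to-end on a toy input (the XML shape here is invented for illustration, not the actual OpenFortranParser output):

```python
# Hypothetical end-to-end sketch of the assumed pipeline on a toy input.
import xml.etree.ElementTree as ET

xml_src = "<assign><target>TD</target><expr>0.6*TMAX + 0.4*TMIN</expr></assign>"

# 1. read source code into a tree
tree = ET.fromstring(xml_src)

# 2. transform that tree into a language-neutral dict
ir = {"target": tree.findtext("target"), "expr": tree.findtext("expr")}

# 3. render a Python snippet by walking the dict
py_code = f"{ir['target']} = {ir['expr']}"
```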


adarshp commented on August 18, 2024

Hi @jpfairbanks, thanks for pointing that out, I was unaware of those limitations of pickling. Looks like we have some thinking to do on our end about the IR. I believe the IR (as opposed to the IR') that you referred to in #132 would be what the variable outputDict points to here:

outputDict = translator.analyze(trees, comments)

The script translate.py contains the code that transforms the XML output of the OpenFortranParser into a dict; pyTranslate then turns that dict into Python code, which the Python ast module parses into an AST, which genPGM.py finally processes to produce the GrFN object.

An example of the pipeline can be found in the test module, test_program_analysis.py in these lines:

original_fortran_file = "tests/data/crop_yield.f"
preprocessed_fortran_file = "crop_yield_preprocessed.f"

with open(original_fortran_file, "r") as f:
    inputLines = f.readlines()

with open(preprocessed_fortran_file, "w") as f:
    f.write(f2py_pp.process(inputLines))

xml_string = sp.run(
    [
        "java",
        "fortran.ofp.FrontEnd",
        "--class",
        "fortran.ofp.XMLPrinter",
        "--verbosity",
        "0",
        "crop_yield_preprocessed.f",
    ],
    stdout=sp.PIPE,
).stdout

trees = [ET.fromstring(xml_string)]
comments = get_comments.get_comments(preprocessed_fortran_file)
xml_to_json_translator = translate.XMLToJSONTranslator()
outputDict = xml_to_json_translator.analyze(trees, comments)
pySrc = pyTranslate.create_python_string(outputDict)
asts = [ast.parse(pySrc)]
pgm_dict = genPGM.create_pgm_dict(
    "crop_yield_lambdas.py", asts, "crop_yield.json"
)
return pgm_dict

Also, while the idea of moving to a less language-specific representation sounds attractive to me, I'm hesitant to commit to it with a concrete timeline, because it would involve a non-trivial refactoring of our codebase. I am not sure we have the requisite person-hours to implement this change immediately (more specifically, @skdebray and @pratikbhd would be doing the actual heavy lifting here, so I don't want to be cavalier about committing their time), especially given the short 18-month timeline and the tight schedule of deliverables negotiated in our OT contract with DARPA. Besides, for the ASKE program we are restricting ourselves to analyzing Fortran programs, so the numpy/scipy issue is, I think, a bit beyond the scope of this iteration of the AutoMATES project.

I say 'this iteration' because I am optimistic that if all the ASKE performer groups do well, it will help make the case for a four- or five-year full DARPA program with the same themes as ASKE. Such a program would really be required to take the tools we are making from prototypes to full-fledged products, and would give us the breathing room to prioritize generalization over quick prototyping.

That being said, I do think it is important to generalize the representation, and I would like to revisit this issue down the road, so I'm going to leave this issue open.

Also, we welcome pull requests. If a language-neutral IR would really help boost the work you are doing in the ASKE program by giving you access to semantically enriched computation graphs from Fortran programs (which would make sense, since they comprise a not-insignificant amount of existing scientific code), and you believe you can implement this change in a timeframe consistent with your schedule of deliverables, you are welcome to try, and I would be happy to help as much as possible.

