
Introduction

WALA logo



The T. J. Watson Libraries for Analysis (WALA) provide static analysis capabilities for Java bytecode and related languages and for JavaScript. The system is licensed under the Eclipse Public License, which has been approved by the OSI (Open Source Initiative) as a fully certified open source license. The initial WALA infrastructure was independently developed as part of the DOMO research project at the IBM T.J. Watson Research Center. In 2006, IBM donated the software to the community.

For recent updates on WALA, join the mailing list.

Core WALA Features

WALA features include:

  • Java type system and class hierarchy analysis
  • Source language framework supporting Java and JavaScript
  • Interprocedural dataflow analysis (RHS solver)
  • Context-sensitive tabulation-based slicer
  • Pointer analysis and call graph construction
  • SSA-based register-transfer language IR
  • General framework for iterative dataflow
  • General analysis utilities and data structures
  • A bytecode instrumentation library (Shrike)

Getting Started

The fastest way to get started with WALA is to use the packages in Maven Central, as noted here. See the WALA-start repo for a Gradle-based example. We are actively re-organizing the deeper wiki technical documentation. In the meantime, you can check out tutorial slides to get an overview of WALA:

You can also watch screencasts of the WALA JavaScript tutorial here.

Finally, for now, to search the wiki documentation, we recommend a site-specific search on GitHub, e.g., a search for "call graph".

Documentation

We're hosting documentation for WALA on the GitHub wiki. We've chosen a wiki format just so that you can contribute. Don't be shy!

The WALA publications department is populating this wiki with technical documentation on a demand-driven basis, driven by questions posted to the wala-wala mailing list and also Gitter. We recommend this page for searching the mailing list archives.

The JavaDoc documentation for the WALA code is updated continuously. If you think a particular file deserves better JavaDoc, please open a feature request.

Getting Help

To get help with WALA, please either email the mailing list, ask a question on Gitter, or open an issue.

Required Java Versions

Most components of each official WALA release are built for use with Java 11 or newer. However, components that use Eclipse require at least Java 17.

Building from Source

WALA uses Gradle as its build system. If you intend to modify or build WALA yourself, then see the Gradle-specific README for more instructions and helpful tips.

WALA Tools in JavaScript

Recently, we have been expanding the set of WALA tools implemented in JavaScript. We have released a normalizer and some basic program analyses for JavaScript in the JS_WALA GitHub repository. We have also made available jsdelta and WALA Delta, delta debuggers for JavaScript-processing tools. Please see the linked GitHub repositories for further details on these tools.

WALA-Based Tools

Several groups have built open-source tools that enhance or build on WALA that may be useful to other WALA users. For details, see the Wala-based tools page.

Acknowledgements

YourKit logo

We thank YourKit for providing WALA developers with a complimentary license for their excellent Java profiler, which we use to improve and maintain WALA performance.

Contributors

dependabot[bot], juliandolby, khatchad, msridhar, shinnar, sifislag, tatianacv


Issues

jython and jython3 versions

The jython and jython3 versions used are not in the Maven repository. I can find the source code of jython at the same version and build it with Ant, but for jython3 I could not even find a version 0.0.1-SNAPSHOT. Is this the same package as https://github.com/jython/jython3?

Inconsistent Jython versions

Commit b4b6e8e says to use Python 2 for now, but it references jython3:

<artifactId>jython3</artifactId>

That results in an inconsistency with the following:

<artifactId>com.ibm.wala.cast.python.jython</artifactId>

If we are using Jython 3, I believe we should be using com.ibm.wala.cast.python.jython3 instead of com.ibm.wala.cast.python.jython.

What happens here as a result is that we load up Jython 3 code that calls methods that don't exist in Jython 2, thus throwing a NoSuchMethod exception. An example is the PythonTree class, which has the method getLineno() in Jython (Python 2) but getLine() in Jython 3 (Python 3).

Certain decorators cause functions not to appear in the call graph

There are certain decorators that cause the call graph to be constructed incorrectly. Several times now, I have seen that nodes are missing when the function is decorated. The decorator can alter the behavior of the function, and I assume that the call graph construction is trying to account for that. More info soon.
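The issue gives no reproducer; a hypothetical minimal example of the pattern (a wrapping decorator, here called trace, which is an assumed name for illustration) might look like this. The indirection through the wrapper is exactly what call-graph construction must see through:

```python
import functools


def trace(fn):
    """Hypothetical decorator: wraps fn in an extra call layer, so the
    call graph must route calls to f() through wrapper() to reach f()."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


@trace
def f():
    return 42


assert f() == 42
assert f.__name__ == "f"  # functools.wraps preserves the original name
```

If the analysis models the decorator's wrapper conservatively, a node for f should still appear; the reported symptom is that it does not.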

Missing the map() built-in function

The map() built-in function is missing, which has consequences for lambdas. Consider the following code:

# From https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

my_list = [10]


def fun_with_side_effects(y):
    my_list[0] = 1
    return y ** 2


def f():
    squares = list(map(lambda x: fun_with_side_effects(x), range(10)))


f()

The IR for f():

callees of node f : [range, list]

IR of node 3, context CallStringContext: [ script A.py.do()LRoot;@113 ]
<Code body of function Lscript A.py/f>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..6]
    -> BB2
    -> BB5
BB2[7..7]
    -> BB3
    -> BB5
BB3[8..8]
    -> BB4
    -> BB5
BB4[9..9]
    -> BB5
BB5[-1..-2]
Instructions:
BB0
BB1
0   v5 = lexical:list@Lscript A.py           A.py [12:14] -> [12:18]
1   v8 = lexical:map@Lscript A.py            A.py [12:19] -> [12:22]
2   v10 = new <PythonLoader,Lscript A.py/f/lambda1>@2<no information>
3   global:global script A.py/f/lambda1 = v10<no information>
4   putfield v1.< PythonLoader, LRoot, lambda1, <PythonLoader,LRoot> > = v10<no information> [1=[the function]]
5   v15 = lexical:range@Lscript A.py         A.py [12:59] -> [12:64]
6   v13 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v15,v16:#10 @6 exception:v17A.py [12:59] -> [12:68]
BB2
7   v6 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v8,v10,v13 @7 exception:v18A.py [12:19] -> [12:69]
BB3
8   v3 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v5,v6 @8 exception:v19A.py [12:14] -> [12:70] [3=[squares]]
BB4
BB5

So far so good. For comparison, list() above is not missing. But in the pointer analysis, map() points to nothing:

[Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@113 ], v8] --> []

By contrast, list() is populated:

[Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@113 ], v5] --> [[com.ibm.wala.cast.python.ipa.summaries.BuiltinFunctions$BuiltinFunction@9053b33]]

One consequence of this issue is that a lambda given to map() is not showing in the call graph. In other words, there is no call graph node for the lambda above, and subsequently, we have no node for fun_with_side_effects either.

Digression

  1. The map() function returns an iterator, but there is no such type in PythonTypes. We would seemingly have to add it.
  2. List comprehensions do work, and I would expect lambdas to behave similarly if they are also implemented. It's just hard to tell whether they are when the functions that take them as arguments aren't implemented.
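As the digression notes, list comprehensions already work; semantically, a map() over a lambda is equivalent to a comprehension, so one would expect the analysis to produce similar call-graph results for both forms:

```python
def square(x):
    return x ** 2


# map() over a lambda and a list comprehension compute the same thing;
# a comprehension-based call graph node exists today, but the map-based
# path reportedly loses both the lambda and square()
via_map = list(map(lambda x: square(x), range(5)))
via_comp = [square(x) for x in range(5)]
assert via_map == via_comp == [0, 1, 4, 9, 16]
```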

Losing tensors in datasets

Description

Currently, we cannot track tensors contained in tf.data.Datasets:

import tensorflow as tf


def add(a, b):
  return a + b


dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])

for element in dataset:
    c = add(element, element)

We get:

Oct 11, 2023 11:09:48 AM com.ibm.wala.cast.python.ml.test.TestTensorflowModel testTf2
INFO: Tensor analysis: answer:

Regression

List Literals

Seems to work just fine with lists:

import tensorflow as tf


def add(a, b):
  return a + b


list = [tf.ones([1, 2]), tf.ones([2, 2])]

for element in list:
    c = add(element, element)

We get:

Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.client.PythonTensorAnalysisEngine getDataflowSources
INFO: Added dataflow source [Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@103 ], v5]:[Empty].
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.client.PythonTensorAnalysisEngine getDataflowSources
INFO: Added dataflow source [Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@97 ], v5]:[Empty].
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel testTf2
INFO: Tensor analysis: answer:
[SITE_IN_NODE{<Code body of function Lscript tf2_test_tensor_list.py>:Llist in CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]},<field 0>][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v249][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[SITE_IN_NODE{<Code body of function Lscript tf2_test_tensor_list.py>:Llist in CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]}; [Core[Root]]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[SITE_IN_NODE{<Code body of function Lscript tf2_test_tensor_list.py>:Llist in CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]},<field 1>][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v257][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v265][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[<field global element>][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v268][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v267][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v270][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v271][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py/add> Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@117 ], v2][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py/add> Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@117 ], v3][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Ret-V:Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@103 ]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Ret-V:Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@97 ]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@103 ], v5][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@97 ], v5][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]

Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.InstanceFieldKey
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.cast.ipa.callgraph.ReflectedFieldPointerKey$3
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.InstanceFieldKey
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.StaticFieldKey
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.ReturnValueKey
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.ReturnValueKey

Lists Without Literals

But, creating a list without using a literal doesn't work:

import tensorflow as tf


def add(a, b):
  return a + b


list = list()

list.append(tf.ones([1, 2]))
list.append(tf.ones([2, 2]))

for element in list:
    c = add(element, element)

We get:

Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.client.PythonTensorAnalysisEngine getDataflowSources
INFO: Added dataflow source [Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@106 ], v5]:[Empty].
Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.client.PythonTensorAnalysisEngine getDataflowSources
INFO: Added dataflow source [Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@99 ], v5]:[Empty].
Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel testTf2
INFO: Tensor analysis: answer:
[Node: <Code body of function Lscript tf2_test_tensor_list3.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v264][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Ret-V:Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@106 ]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list3.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v252][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Ret-V:Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@99 ]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@106 ], v5][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@99 ], v5][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]

Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.ReturnValueKey
Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.ReturnValueKey

Python ModRef analysis says that a constant is modified

Consider the following code:

def g(p1, p2):
    assert p1 == 5 and p2 == 2


def f():
    g(5, p2=2)


f()

The ModRef analysis for Python (com.ibm.wala.cast.python.modref.PythonModRef) is returning that the following locations are "modified" by f():

[[ConstantKey:p2:<PythonLoader,Lstring>]; [Core[Root]]]
[[ConstantKey:p2:<PythonLoader,Lstring>],<field 0>]

But, according to the docs, a ConstantKey is "an instance key which represents a unique, constant object." It would thus seem that instances represented by ConstantKey instance keys can't be returned as modified by the ModRef analysis.

Starred node type is not handled

In Python 3.5 and above, a Starred node is used in a Call node with *args, for example:

def say(*a):
    print(a)

b = [1,2]
say(*b)

This will generate a Starred node for *b when calling say. Part of the AST is shown below:

        "args": [
          {
            "value": {
              "id": "b",
              "ctx": {
                "type": "Load",
                "ftype": "Load"
              },
              "extra_attr": true,
              "type": "Name"
            },
            "ctx": {
              "type": "Load",
              "ftype": "Load"
            },
            "extra_attr": true,
            "type": "Starred"
          }
        ]

Missing callbacks related to datasets

Missing experimental_distribute_datasets_from_function

Called here: https://github.com/mead-ml/mead-baseline/blob/5d7632bb151c2d09501ebf49f36ba8c4204df4c8/mead/api_examples/pretrain_discrim_tf.py#L307

The callback function is dataset_test_fn, defined here: https://github.com/mead-ml/mead-baseline/blob/master/mead/api_examples/pretrain_discrim_tf.py#L301-L306

But that function doesn't show up in the call graph. The method reference should be: < PythonLoader, Lscript pretrain_discrim_tf.py/get_dataset, do()LRoot; >. However, we don't see it in the call graph nodes: https://gist.github.com/khatchad/bd48384f3561fdcc7c8c6e9141d93de5

Related to #92 and blocked on #89.

Does it support full python grammar?

I am currently doing some analysis of Python code (not only machine learning scripts). I read your paper, and it mainly focuses on inferring tensor types. I am wondering whether this supports the full Python/Python 3 grammar, i.e., whether I can convert any Python source code to WALA IR without losing information (e.g., the mapping between WALA IR and the Python source code)?

Use Git submodules for custom dependency builds

Currently, there are two custom dependency builds:

- name: Checkout wala/IDE sources.
  uses: actions/[email protected]
  with:
    repository: wala/IDE
    # fetch-depth: 50
    path: ./IDE
- name: Checkout juliandolby/jython3 sources.
  uses: actions/[email protected]
  with:
    repository: juliandolby/jython3
    path: ./jython3
- name: Install Jython3.
  run: |
    cd ./jython3
    ant
    cd dist
    mvn install:install-file -Dfile=./jython-dev.jar -DgroupId="org.python" -DartifactId="jython3" -Dversion="0.0.1-SNAPSHOT" -Dpackaging="jar" -DgeneratePom=true
  shell: bash
- name: Install IDE.
  run: |
    cd ./IDE/com.ibm.wala.cast.lsp
    mvn clean install -B -q -DskipTests

Currently, these dependencies are cloned from GitHub at master. It would be better to check out a specific SHA1 so as to stabilize the build. Moreover, to build from source, users must examine the build file or https://github.com/wala/ML/blob/master/CONTRIBUTING.md.

I believe that both of these issues may be helped using Git submodules. There, you can specify the clone URL and SHA1 to be used, and the submodules can be cloned along with the main repo. There is also a GH action for submodules (see https://github.com/actions/checkout#user-content-usage).

Can't determine whether an API call is from TensorFlow when more than two functions are in a call chain

When processing the following:

import tensorflow as tf

def func2(t):
  pass

@tf.function
def func():
  a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
  b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
  c = tf.matmul(a, b)
  tensor = tf.Tensor(c.op, 0, tf.float32)
  func2(tensor)

func()

There is an error in determining whether we are dealing with a TensorFlow API; the resulting tensor analysis is empty:

TensorTypeAnalysis analysis = E.performAnalysis(builder);

Inheritance not supported

Consider the following example:

class D:
    pass


class C(D):

    def func(self, x):
        return x * x


c = C()
a = c.func(5)
assert a == 25

In the call graph, the callees of the outer script are the following:

Node: synthetic < PythonLoader, Lscript myscript.py/C, do()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@100 ]
Node: synthetic < PythonLoader, L$script myscript.py/C/func, trampoline2()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@103 ]

When we dump the call graph, we see:

callees of node Lscript myscript.py : [C, trampoline2]

All good. Now, consider the following example:

class D:

    def func(self, x):
        return x * x


class C(D):
    pass


c = C()
a = c.func(5)
assert a == 25

In the call graph, the callees of the outer script are the following:

Node: synthetic < PythonLoader, Lscript myscript.py/C, do()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@100 ]

When we dump the call graph, we see:

callees of node Lscript myscript.py : [C]
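Note that the trampoline2 node for func is missing in this second case. For comparison, CPython resolves the inherited method through the method resolution order (MRO), finding func on superclass D; the static analysis needs an analogous superclass lookup to produce the trampoline node:

```python
class D:
    def func(self, x):
        return x * x


class C(D):
    pass


c = C()
# CPython walks the MRO (C, D, object) to find func on D;
# the call graph should model this lookup and include a node for func
assert C.__mro__ == (C, D, object)
assert c.func(5) == 25
```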

How to install this wala on pycharm

[image]

The image above shows the last step in the WALA guidance. However, what should I put in the extension box? Could someone screen-record how they install this plugin? (Assuming we have the latest PyCharm and LSP installed properly.)

Thank you so much!!

Fail to build due to org.python:jython:jar:2.7.2a1+ is missing

I am trying to build Ariadne from source. I successfully built the WALA code and the IDE, but when building Ariadne, I get this exception:

Failed to execute goal on project com.ibm.wala.cast.python: Could not resolve dependencies for project com.ibm.wala:com.ibm.wala.cast.python:jar:0.0.1-SNAPSHOT: Could not find artifact org.python:jython:jar:2.7.2a1+ in central (https://repo.maven.apache.org/maven2) -> [Help 1]

I checked the Maven repository and found that the latest version there is 2.7.1b3. I changed the jython version in the POM to 2.7.1b3, but then got compile errors. So how can I get the 2.7.2a1 version of jython?
Thanks!

Missing the print built-in function summary

Description

Consider the following example input program:

# A.py
def f(x):
  print("Traced with: " + str(x))


f(1)

This is the IR of f():

Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@97 ]
<Code body of function Lscript A.py/f>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..2]
    -> BB2
    -> BB3
BB2[3..4]
    -> BB3
BB3[-1..-2]
Instructions:
BB0
BB1
0   v5 = lexical:print@Lscript A.py          A.py [2:2] -> [2:7]
1   v10 = lexical:str@Lscript A.py           A.py [2:26] -> [2:29]
2   v8 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v10,v2 @2 exception:v11A.py [2:26] -> [2:32] [2=[x]]
BB2
3   v6 = binaryop(add) v7:#Traced with:  , v8A.py [2:8] -> [2:32]
4   v3 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v5,v6 @4 exception:v12A.py [2:2] -> [2:33]
BB3

Above, v5 is the print() function, while v10 is the str() function. However, in the pointer analysis, I am seeing the following:

  [Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@97 ], v5] ->
  [Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@97 ], v10] ->
     [com.ibm.wala.cast.python.ipa.summaries.BuiltinFunctions$BuiltinFunction@1724e9f5]

The points-to set for v5 in f() of A.py is empty, while the points-to set of v10 contains the built-in.

Regression

I believe that the print() function needs to be added here:

static {
  // builtinFunctions.put("enumerate", Either.forLeft(PythonTypes.enumerate));
  builtinFunctions.put("enumerate", Either.forRight(2));
  builtinFunctions.put("int", Either.forLeft(TypeReference.Int));
  builtinFunctions.put("round", Either.forLeft(TypeReference.Int));
  builtinFunctions.put("len", Either.forLeft(TypeReference.Int));
  builtinFunctions.put("list", Either.forLeft(PythonTypes.list));
  builtinFunctions.put("range", Either.forLeft(PythonTypes.list));
  builtinFunctions.put("sorted", Either.forLeft(PythonTypes.list));
  builtinFunctions.put("str", Either.forLeft(PythonTypes.string));
  builtinFunctions.put("sum", Either.forLeft(TypeReference.Int));
  builtinFunctions.put("type", Either.forLeft(PythonTypes.object));
  builtinFunctions.put("zip", Either.forLeft(PythonTypes.list));
  builtinFunctions.put("slice", Either.forRight(2));
  builtinFunctions.put("__delete__", Either.forRight(2));
}

Missing callbacks

Missing Strategy.run()

Called here: https://github.com/mead-ml/mead-baseline/blob/5d7632bb151c2d09501ebf49f36ba8c4204df4c8/mead/api_examples/pretrain_discrim_tf.py#L414.

The callback function _replicated_train_step() is defined here: https://github.com/mead-ml/mead-baseline/blob/5d7632bb151c2d09501ebf49f36ba8c4204df4c8/mead/api_examples/pretrain_discrim_tf.py#L394-L405.

But, we don't see it in the call graph. The method reference should be: < PythonLoader, Lscript pretrain_discrim_tf.py/train/_replicated_train_step, do()LRoot; >. The call graph nodes: https://gist.github.com/khatchad/ab56e3be454103829275c7507999b7d1

Regression

There are some callbacks defined in tensorflow.xml, among others. I see that the return values are specified, but I wonder whether adding a callback summary will also cause the given function argument to be considered invoked:

<package name="tensorflow/estimator">
<class name="Estimator" allocatable="true">
<method name="do" descriptor="()LRoot;" numArgs="2" paramNames="self model">
<new def="x" class="Ltensorflow/estimator/train/train" />
<putfield class="LRoot" field="train" fieldType="LRoot" ref="self" value="x" />
<putfield class="LRoot" field="$callback" fieldType="LRoot" ref="x" value="model" />
<return value="arg0" />
</method>
</class>

<package name="tensorflow/estimator/train">
<class name="train" allocatable="true">
<method name="do" descriptor="()LRoot;" numArgs="3">
<getfield class="LRoot" field="$callback" fieldType="LRoot" ref="arg0" def="xx" />
<getfield class="LRoot" field="data" fieldType="LRoot" ref="arg1" def="data" />
<call class="LRoot" name="do" descriptor="()LRoot;" type="virtual" arg0="xx" arg1="data" arg2="3" numArgs="3" def="v" />
<return value="v" />
</method>
</class>
</package>

The ones there are for estimators (looks like there's two definitions). I wonder if we just need to add more for, e.g., TF2 APIs. I found these docs for the estimator.

Implicit `call()` for models/layers not working

Bug in processing the call for call(...) when calling the model (line 36)

import tensorflow as tf


class SequentialModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super(SequentialModel, self).__init__(**kwargs)
        self.flatten = tf.keras.layers.Flatten(input_shape=(28, 28))
        # Add a lot of small layers
        num_layers = 100
        self.my_layers = [tf.keras.layers.Dense(64, activation="relu")
                          for n in range(num_layers)]
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_2 = tf.keras.layers.Dense(10)

    def call(self, x):
        x = self.flatten(x)
        for layer in self.my_layers:
            x = layer(x)
        x = self.dropout(x)
        x = self.dense_2(x)
        return x


input_data = tf.random.uniform([20, 28, 28])
model = SequentialModel()
result = model(input_data)

Similar to #24 for __call__, the call(...) method does not appear in the call graph.
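A simplified sketch (not the actual Keras implementation, which adds substantial bookkeeping) of the dispatch that the analysis would need to model: invoking the model object triggers the base class's __call__, which in turn calls the user-defined call():

```python
# Hypothetical stand-in for tf.keras.Model's dispatch machinery
class Layer:
    def __call__(self, *args, **kwargs):
        # real Keras does shape/build bookkeeping here before dispatching
        return self.call(*args, **kwargs)


class Double(Layer):
    def call(self, x):
        return 2 * x


# model(x) reaches the user's call() through the inherited __call__
assert Double()(21) == 42
```

Modeling __call__ on the framework base class as forwarding to call() would make user-defined call() methods reachable in the call graph.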

Issues with global variables defined in the module code

Hello,
I have some issues with variables defined in module code (which are global). If they are constants, they are never written with an AstGlobalWrite instruction; instead, when they are used in a function of the same module, a constant with the correct value is used at the use site.
If they are not constants, they do not have a corresponding AstGlobalWrite instruction unless the variable is used in a function defined in the same module.
Example:

#glovar.py
globVar = "Foo"
globVar2 = int(3)

Produces:

BB0
BB1
0   v5 = invokestatic < PythonLoader, Lstr, import()Lstr; > @0 exception:v6glovar.py [4:0] -> [5:17] [5=[str]]
1   v8 = invokestatic < PythonLoader, Lfloat, import()Lfloat; > @1 exception:v9glovar.py [4:0] -> [5:17] [8=[float]]
2   v11 = invokestatic < PythonLoader, Lint, import()Lint; > @2 exception:v12glovar.py [4:0] -> [5:17] [11=[int]]
3   v14 = invokestatic < PythonLoader, L__name__, import()L__name__; > @3 exception:v15glovar.py [4:0] -> [5:17] [14=[__name__]]
4   v17 = invokestatic < PythonLoader, LFalse, import()LFalse; > @4 exception:v18glovar.py [4:0] -> [5:17] [17=[False]]
5   v20 = invokestatic < PythonLoader, LTrue, import()LTrue; > @5 exception:v21glovar.py [4:0] -> [5:17] [20=[True]]
6   v23 = invokestatic < PythonLoader, LNone, import()LNone; > @6 exception:v24glovar.py [4:0] -> [5:17] [23=[None]]
7   v26 = invokestatic < PythonLoader, Lprint, import()Lprint; > @7 exception:v27glovar.py [4:0] -> [5:17] [26=[print]]
8   v29 = invokestatic < PythonLoader, Lsuper, import()Lsuper; > @8 exception:v30glovar.py [4:0] -> [5:17] [29=[super]]
9   v32 = invokestatic < PythonLoader, Llen, import()Llen; > @9 exception:v33glovar.py [4:0] -> [5:17] [32=[len]]
10   v35 = invokestatic < PythonLoader, Lopen, import()Lopen; > @10 exception:v36glovar.py [4:0] -> [5:17] [35=[open]]
11   v38 = invokestatic < PythonLoader, Lhasattr, import()Lhasattr; > @11 exception:v39glovar.py [4:0] -> [5:17] [38=[hasattr]]
12   v41 = invokestatic < PythonLoader, LBaseException, import()LBaseException; > @12 exception:v42glovar.py [4:0] -> [5:17] [41=[BaseException]]
13   v44 = invokestatic < PythonLoader, Labs, import()Labs; > @13 exception:v45glovar.py [4:0] -> [5:17] [44=[abs]]
14   v47 = invokestatic < PythonLoader, Lrange, import()Lrange; > @14 exception:v48glovar.py [4:0] -> [5:17] [47=[range]]
16   v51 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v11,v52:#3.0 @16 exception:v53glovar.py [5:11] -> [5:17] [51=[globVar2]11=[int]]
BB2

while:

#glovar.py
def printVar():
	print globVar + globVar2

globVar = "Foo"
globVar2 = int(3)

Produces:

module code:
BB0
BB1
0   v5 = invokestatic < PythonLoader, Lstr, import()Lstr; > @0 exception:v6 glovar.py [1:0] -> [5:17] [5=[str]]
1   v8 = invokestatic < PythonLoader, Lfloat, import()Lfloat; > @1 exception:v9 glovar.py [1:0] -> [5:17] [8=[float]]
2   v11 = invokestatic < PythonLoader, Lint, import()Lint; > @2 exception:v12 glovar.py [1:0] -> [5:17] [11=[int]]
3   v14 = invokestatic < PythonLoader, L__name__, import()L__name__; > @3 exception:v15 glovar.py [1:0] -> [5:17] [14=[__name__]]
4   v17 = invokestatic < PythonLoader, LFalse, import()LFalse; > @4 exception:v18 glovar.py [1:0] -> [5:17] [17=[False]]
5   v20 = invokestatic < PythonLoader, LTrue, import()LTrue; > @5 exception:v21 glovar.py [1:0] -> [5:17] [20=[True]]
6   v23 = invokestatic < PythonLoader, LNone, import()LNone; > @6 exception:v24 glovar.py [1:0] -> [5:17] [23=[None]]
7   v26 = invokestatic < PythonLoader, Lprint, import()Lprint; > @7 exception:v27 glovar.py [1:0] -> [5:17] [26=[print]]
8   v29 = invokestatic < PythonLoader, Lsuper, import()Lsuper; > @8 exception:v30 glovar.py [1:0] -> [5:17] [29=[super]]
9   v32 = invokestatic < PythonLoader, Llen, import()Llen; > @9 exception:v33 glovar.py [1:0] -> [5:17] [32=[len]]
10   v35 = invokestatic < PythonLoader, Lopen, import()Lopen; > @10 exception:v36 glovar.py [1:0] -> [5:17] [35=[open]]
11   v38 = invokestatic < PythonLoader, Lhasattr, import()Lhasattr; > @11 exception:v39 glovar.py [1:0] -> [5:17] [38=[hasattr]]
12   v41 = invokestatic < PythonLoader, LBaseException, import()LBaseException; > @12 exception:v42 glovar.py [1:0] -> [5:17] [41=[BaseException]]
13   v44 = invokestatic < PythonLoader, Labs, import()Labs; > @13 exception:v45 glovar.py [1:0] -> [5:17] [44=[abs]]
14   v47 = invokestatic < PythonLoader, Lrange, import()Lrange; > @14 exception:v48 glovar.py [1:0] -> [5:17] [47=[range]]
15   v50 = new <PythonLoader,Lscript glovar.py/printVar>@15 glovar.py [1:0] -> [5:17] [50=[printVar]]
17   v55 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v11,v56:#3.0 @17 exception:v57 glovar.py [5:11] -> [5:14] [11=[int]]
18   global:global globVar2 = v55            glovar.py [5:0] -> [5:1]
BB2


printVar:
BB0
BB1
0   v6 = global:global globVar2              glovar.py [2:17] -> [2:25]
1   v3 = binaryop(add) v4:#Foo , v6           glovar.py [2:7] -> [2:25]
BB2

They should always have an AstGlobalWrite instruction, because they can be accessed by other modules that import the module in which they are declared.

An interesting thing I found is that if I alter the source code, changing the order of the definitions, the IR produced is different:

#glovar.py
globVar = "Foo"
globVar2 = int(3)

def printVar():
	print globVar + globVar2

The IR it produces changes to:

module code:
BB0
BB1
0   v5 = invokestatic < PythonLoader, Lstr, import()Lstr; > @0 exception:v6 glovar.py [1:0] -> [5:32] [5=[str]]
1   v8 = invokestatic < PythonLoader, Lfloat, import()Lfloat; > @1 exception:v9 glovar.py [1:0] -> [5:32] [8=[float]]
2   v11 = invokestatic < PythonLoader, Lint, import()Lint; > @2 exception:v12 glovar.py [1:0] -> [5:32] [11=[int]]
3   v14 = invokestatic < PythonLoader, L__name__, import()L__name__; > @3 exception:v15 glovar.py [1:0] -> [5:32] [14=[__name__]]
4   v17 = invokestatic < PythonLoader, LFalse, import()LFalse; > @4 exception:v18 glovar.py [1:0] -> [5:32] [17=[False]]
5   v20 = invokestatic < PythonLoader, LTrue, import()LTrue; > @5 exception:v21 glovar.py [1:0] -> [5:32] [20=[True]]
6   v23 = invokestatic < PythonLoader, LNone, import()LNone; > @6 exception:v24 glovar.py [1:0] -> [5:32] [23=[None]]
7   v26 = invokestatic < PythonLoader, Lprint, import()Lprint; > @7 exception:v27 glovar.py [1:0] -> [5:32] [26=[print]]
8   v29 = invokestatic < PythonLoader, Lsuper, import()Lsuper; > @8 exception:v30 glovar.py [1:0] -> [5:32] [29=[super]]
9   v32 = invokestatic < PythonLoader, Llen, import()Llen; > @9 exception:v33 glovar.py [1:0] -> [5:32] [32=[len]]
10   v35 = invokestatic < PythonLoader, Lopen, import()Lopen; > @10 exception:v36 glovar.py [1:0] -> [5:32] [35=[open]]
11   v38 = invokestatic < PythonLoader, Lhasattr, import()Lhasattr; > @11 exception:v39 glovar.py [1:0] -> [5:32] [38=[hasattr]]
12   v41 = invokestatic < PythonLoader, LBaseException, import()LBaseException; > @12 exception:v42 glovar.py [1:0] -> [5:32] [41=[BaseException]]
13   v44 = invokestatic < PythonLoader, Labs, import()Labs; > @13 exception:v45 glovar.py [1:0] -> [5:32] [44=[abs]]
14   v47 = invokestatic < PythonLoader, Lrange, import()Lrange; > @14 exception:v48 glovar.py [1:0] -> [5:32] [47=[range]]
16   v51 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v11,v52:#3.0 @16 exception:v53 glovar.py [2:11] -> [2:17] [51=[globVar2]11=[int]]
18   v56 = new <PythonLoader,Lscript glovar.py/printVar>@18 glovar.py [1:0] -> [5:32] [56=[printVar]]
BB2


printVar:
BB0
BB1
0   v6 = lexical:globVar2@Lscript glovar.py  glovar.py [5:17] -> [5:25]
1   v3 = binaryop(add) v4:#Foo , v6          glovar.py [5:7] -> [5:25]
BB2

What is the reason the IR is different in this case?
Thanks in advance,
Sifis.

Edit: Another IR issue visible in these examples is that print statements do not produce a method invocation.
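Regardless of definition order, the behavior the IR should model is the same: module-level names are globals that importers can read. A minimal, self-contained sketch of that semantics (simulating glovar.py with exec rather than a second file, and returning instead of printing so it runs under Python 3):

```python
import types

# Simulate glovar.py as a module to show that module-level assignments
# become attributes visible to importers, which is why the translator
# needs a global (not lexical) write for globVar2 in both orderings.
glovar_src = """
globVar = "Foo"
globVar2 = int(3)

def printVar():
    return globVar + str(globVar2)
"""

glovar = types.ModuleType("glovar")
exec(glovar_src, glovar.__dict__)

value = glovar.globVar2       # an importer reads the module-level global
combined = glovar.printVar()  # the function resolves the same globals
```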

IR incorrect for list comprehension

With COMPREHENSION_IR=true, the generated IR has no filter part.
For example: l = [e*e for e in l if e >= 2]

The CAst is like:

    ASSIGN at test1.py [14:0] -> [14:1]
      VAR at test1.py [14:0] -> [14:1]
        "l" at test1.py [14:0] -> [14:1]
      COMPREHENSION_EXPR at test1.py [14:5] -> [14:8]
        NEW at test1.py [14:5] -> [14:8]
          "list" at test1.py [14:5] -> [14:8]
        FUNCTION_EXPR
          "com.ibm.wala.cast.python.ast.PythonParser$CAstVisitor$defineFunction$func$1@262b2c86"
        EXPR_LIST at test1.py [14:5] -> [14:8]
          VAR at test1.py [14:18] -> [14:19]
            "l" at test1.py [14:18] -> [14:19]
        EXPR_LIST at test1.py [14:5] -> [14:8]
          FUNCTION_EXPR
            "com.ibm.wala.cast.python.ast.PythonParser$CAstVisitor$defineFunction$func$1@5ed828d"

function: comprehension1
(  the function  e  )
RETURN
  BLOCK_EXPR
    BINARY_EXPR at test1.py [14:5] -> [14:8]
      "*" at test1.py [14:5] -> [14:8]
      VAR at test1.py [14:5] -> [14:6]
        "e" at test1.py [14:5] -> [14:6]
      VAR at test1.py [14:7] -> [14:8]
        "e" at test1.py [14:7] -> [14:8]

function: filter2
(  the function  e  )
RETURN
  BLOCK_EXPR
    IF_EXPR at test1.py [14:23] -> [14:29]
      BINARY_EXPR at test1.py [14:23] -> [14:29]
        ">=" at test1.py [14:23] -> [14:29]
        VAR at test1.py [14:23] -> [14:24]
          "e" at test1.py [14:23] -> [14:24]
        VAR at test1.py [14:23] -> [14:29]
          "0" at test1.py [14:23] -> [14:29]
      "true" at test1.py [14:23] -> [14:29]
      "false" at test1.py [14:23] -> [14:29]

But the IR is like:

25   v47 = new <PythonLoader,Llist>@25       test1.py [14:5] -> [14:8]
26   v50 = new <PythonLoader,Lscript test1.py/comprehension1>@26 test1.py [14:5] -> [14:8]
27   global:global script test1.py/comprehension1 = v50 test1.py [14:5] -> [14:8]
28   putfield v1.< PythonLoader, LRoot, comprehension1, <PythonLoader,LRoot> > = v50 test1.py [14:5] -> [14:8]
29   v52 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v50,v47,v39 @29 exception:v53 test1.py [14:5] -> [14:8] [52=[l]39=[l]]
BB2

<Code body of function Lscript test1.py/comprehension1>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..1]
    -> BB2
BB2[-1..-2]
Instructions:
BB0
BB1
0   v3 = binaryop(mul) v2 , v2               test1.py [14:5] -> [14:8] [2=[e]]
1   return v3                                test1.py [14:5] -> [14:8]
BB2

The condition "e >= 2" is missing in the IR.

Any hint on how to fix this?
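For reference, the comprehension desugars into a mapping function and a filter function, matching the comprehension1/filter2 entities in the CAst above. A plain-Python sketch of the intended semantics, with the filter applied (which is what the generated IR currently omits):

```python
l = [4, 1, 2, 3]

def comprehension1(e):
    # corresponds to the element expression e*e
    return e * e

def filter2(e):
    # corresponds to the missing condition e >= 2
    return e >= 2

# The IR should compute the equivalent of:
result = [comprehension1(e) for e in l if filter2(e)]
```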

Adding custom entry points to generate callgraph

Hi,

I am using PythonAnalysisEngine to generate a call graph for Python programs. However, when I give multiple input files that do not have a main function, I get almost no edges in the call graph. (I generated a call graph for a single file with and without a main function and compared the edges: when the input file doesn't contain a main function, I get few to no edges.)

I want to give custom entrypoints in addition to the main function, for example all the functions in all the input files. I tried the following code in PythonAnalysisEngine by overriding makeDefaultEntrypoints:

   @Override
    protected Iterable<Entrypoint> makeDefaultEntrypoints(AnalysisScope scope, IClassHierarchy cha) {
        Set<Entrypoint> result = HashSetFactory.make();
        for(Module m : moduleFiles) {
            IClass entry = cha.lookupClass(TypeReference.findOrCreate(PythonTypes.pythonLoader,
                    TypeName.findOrCreate(scriptName(m))));
            assert entry != null: "bad root name " + scriptName(m) + ":\n" + cha;
            MethodReference er = MethodReference.findOrCreate(entry.getReference(), AstMethodReference.fnSelector);
            System.out.println("Root Entry Point: " + er.toString());
            result.add(new DefaultEntrypoint(er, cha));
        }

        //Begin - I added the following code
        for (IClass klass : cha) {
            for (IMethod method : klass.getDeclaredMethods()) {
                MethodReference er = method.getReference();
                result.add(new DefaultEntrypoint(er, cha));
            }
        }
        //End
        return result;
    }

I get the following exception:

java.lang.IllegalArgumentException: Illegal i: -1
	at com.ibm.wala.ssa.SymbolTable.ensureSymbol(SymbolTable.java:189)
	at com.ibm.wala.ipa.summaries.SyntheticIR.updateForInstruction(SyntheticIR.java:120)
	at com.ibm.wala.ipa.summaries.SyntheticIR.makeSymbolTable(SyntheticIR.java:95)
	at com.ibm.wala.ipa.summaries.SyntheticIR.<init>(SyntheticIR.java:55)
	at com.ibm.wala.ipa.summaries.SummarizedMethodWithNames$SyntheticIRWithNames.<init>(SummarizedMethodWithNames.java:141)
	at com.ibm.wala.ipa.summaries.SummarizedMethodWithNames.makeIR(SummarizedMethodWithNames.java:187)
	at com.ibm.wala.ipa.summaries.SyntheticIRFactory.makeIR(SyntheticIRFactory.java:34)
	at com.ibm.wala.ssa.DefaultIRFactory.makeIR(DefaultIRFactory.java:66)
	at com.ibm.wala.cast.ir.ssa.AstIRFactory$AstDefaultIRFactory.makeIR(AstIRFactory.java:59)
	at com.ibm.wala.ssa.SSACache.findOrCreateIR(SSACache.java:69)
	at com.ibm.wala.ipa.callgraph.AnalysisCache.getIR(AnalysisCache.java:74)
	at com.ibm.wala.ipa.callgraph.propagation.cfa.ContextInsensitiveSSAInterpreter.getIR(ContextInsensitiveSSAInterpreter.java:45)
	at com.ibm.wala.ipa.callgraph.propagation.cfa.ContextInsensitiveSSAInterpreter.getIRView(ContextInsensitiveSSAInterpreter.java:50)
	at com.ibm.wala.ipa.callgraph.propagation.cfa.DelegatingSSAContextInterpreter.getIRView(DelegatingSSAContextInterpreter.java:64)
	at com.ibm.wala.ipa.callgraph.propagation.SSAPropagationCallGraphBuilder.unconditionallyAddConstraintsFromNode(SSAPropagationCallGraphBuilder.java:208)
	at com.ibm.wala.ipa.callgraph.propagation.SSAPropagationCallGraphBuilder.addConstraintsFromNode(SSAPropagationCallGraphBuilder.java:190)
	at com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.addConstraintsFromNewNodes(PropagationCallGraphBuilder.java:311)
	at com.ibm.wala.ipa.callgraph.propagation.StandardSolver.solve(StandardSolver.java:56)
	at com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.makeCallGraph(PropagationCallGraphBuilder.java:251)
	at adapters.WalaNCFA.getStaticCallGraph(WalaNCFA.java:83)

Could you please point out whether there is another way of achieving this (giving all methods as entrypoints to generate a call graph), or is there anything I am missing in the above code?

Thanks in advance.

Support TF2

Support tensor analysis from tensors created by TF2 APIs.

Supporting Python 3 causes tensor dimensions not to be calculated

Replacing com.ibm.wala.cast.python.jython.test with com.ibm.wala.cast.python.jython3.test as a dependency in com.ibm.wala.cast.python.ml.test/pom.xml causes com.ibm.wala.cast.python.ml.test.TestNeuroImageExamples.testEx1CG to fail. Specifically, tensor dimensions aren't being calculated because Jython3 does not perform a constant propagation that Jython does. Found during https://github.com/ponder-lab/ML/issues/4#issuecomment-1558310839; see https://github.com/ponder-lab/ML/issues/4#issuecomment-1559961372 for more details.

More than one function with the same name in different modules.

Hello,
I'm just starting to use the com.ibm.wala.cast.python project (my aim is to produce facts for a declarative static analysis), so my issues may simply be caused by wrong usage of the API.
From what I understand, each function is modeled as an IClass in the class hierarchy. Creating a class hierarchy for a scope of more than one module can cause problems: when two files declare a function with the same name, only one is kept in the hierarchy. Is this intended? I could solve this by creating a different scope/hierarchy for each module, but I'm not sure if this is the correct way to do it.
Also, how does your front end deal with imports? I haven't been able to find anything about it, and it seems essential for resolving calls to the correct function.
Lastly, are there any notes or documentation for these projects available?
Thanks in advance,
Sifis.

Run the test Python file

For each unit test that takes a Python file as input, run the Python file and check the status code. Unless we are testing invalid Python files, we should get a status code of 0. This would enable checking the usage of assert statements in the Python files.
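A minimal sketch of such a check, shown in Python for brevity (the surrounding test-harness details are assumptions; only the run-and-check-status part is shown):

```python
import subprocess
import sys
import tempfile

# Run a test's input Python file as a subprocess and report its exit
# status, so that any `assert` statements inside the file are actually
# exercised; 0 means the file ran cleanly.
def run_test_file(path):
    return subprocess.run([sys.executable, path]).returncode

# Demonstrate with a throwaway valid test file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("assert 1 + 1 == 2\n")

status = run_test_file(f.name)
```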

getSourceFileName for PythonClasses

Hello,
Classes of PythonClass type in the class hierarchy throw a NullPointerException upon calling getSourceFileName(), because a null Position is passed when they are created in PythonLoader::defineType(). Would getting the Position from PythonCAstToIRTranslator::defineType's type argument and passing it to PythonLoader::defineType() be enough to fix this?

Thanks in advance,
Sifis.

Edit: Fixed via pull request #4

Using com.ibm.wala.cast.python.ml as a library is problematic due to Maven Shade plug-in

com.ibm.wala.cast.python currently uses the Maven Shade plug-in:

<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<!-- Additional configuration. -->
</configuration>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<!-- add Main-Class to manifest file -->
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.ibm.wala.cast.python.ml.driver.Ariadne</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>

I believe it does this for two reasons:

  1. Produce an executable JAR by adding the appropriate MANIFEST.mf file.
  2. Produce a standalone JAR that includes all of the dependencies ("uber" JAR).

However, this is problematic for using com.ibm.wala.cast.python.ml as a library:

If the uber JAR is reused as a dependency of some other project, directly including classes from the artifact's dependencies in the uber JAR can cause class loading conflicts due to duplicate classes on the class path.
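One way to address this (a sketch based on standard maven-shade-plugin options; the classifier name here is an arbitrary choice) is to attach the shaded JAR under a classifier, so the main, unshaded artifact is what library consumers depend on:

```xml
<configuration>
  <!-- Publish the shaded JAR as an extra, classified artifact so the
       main (unshaded) JAR remains usable as a library dependency. -->
  <shadedArtifactAttached>true</shadedArtifactAttached>
  <shadedClassifierName>standalone</shadedClassifierName>
</configuration>
```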

True, False and None values.

Hello,
I noticed that True, False and None were not treated as constants but as variables.
The cause of this issue is that in Python 2.7 they are parsed as ast.Name nodes, just like variables.
To fix this, I added a check in PythonParser::visitName() for each of these values that creates constant nodes for them.
I chose to model None as a null constant object so as not to have to change the cast project code.
You can inspect the change in sifislag@2d5fee2.
This change was added automatically to my pending pull request #5 .
Thanks,
Sifis.
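For comparison, Python 3's own parser already classifies these as constants rather than names, which is the behavior the fix emulates. This can be checked with the stdlib ast module (Python 3.8+, where literals are ast.Constant):

```python
import ast

# Python 2.7 parsed True/False/None as ast.Name nodes, like ordinary
# variables; Python 3 parses them as constants, which is what the
# PythonParser::visitName() check reproduces for the Jython 2 frontend.
true_node = ast.parse("x = True").body[0].value
var_node = ast.parse("x = y").body[0].value

is_constant = isinstance(true_node, ast.Constant)
is_name = isinstance(var_node, ast.Name)
```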

`IllegalStateException` when using `from`...`import`

For the following example:

from tensorflow import random

def add(a, b):
  return a + b


c = add(random.uniform([1, 2]), random.uniform([2, 2]))

We get the following error:

[ERROR]   TestTensorflowModel.testTf2:98->testTf2:113 » IllegalState Encountered unexpected API call.

WALA dependency versions

Is it possible to use this tool with a WALA version lower than 1.4.x?

I tried importing WALA 1.5.5 into Eclipse, but it has thousands of errors, and I tried different Eclipse versions with the same result. I saw that the 1.4.x branch already has Eclipse .project files in the repo, so it may work. But in the pom.xml of this repo, the WALA dependency version is fixed at 1.5.5. Does this repo use any features only available in WALA 1.5.5?

`TypeInference` not working

I am getting a class cast exception when trying to use TypeInference on Python code. The instruction causing the exception is as follows:

s	PythonInvokeInstruction  (id=106)
10 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > 3,11,2 @5 exception:12

The problem occurs in com.ibm.wala.cast.python.ssa.PythonInvokeInstruction.visit(IVisitor). It tries to cast a TypeInference$TypeOperatorFactory (the type of v below) to a com.ibm.wala.cast.python.ssa.PythonInstructionVisitor:

((PythonInstructionVisitor)v).visitPythonInvoke(this);

Can't resolve wildcard imports

The following is valid Python code, but currently, we cannot resolve the wildcard import. In this example, ones() cannot be resolved.

from tensorflow import *

def add(a, b):
  return a + b

c = add(ones([1, 2]), ones([2, 2]))
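Until wildcard imports are resolved, a workaround is to import the needed names explicitly, as in the earlier from tensorflow import random example. A self-contained sketch of the same pattern (ones here is a hypothetical pure-Python stand-in so the example runs without TensorFlow; same-shape inputs are used since the stand-in does not broadcast):

```python
# Hypothetical stand-in for tensorflow's ones(); with TensorFlow this
# would be `from tensorflow import ones`, which the frontend can resolve.
def ones(shape):
    rows, cols = shape
    return [[1] * cols for _ in range(rows)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Explicitly imported (or defined) names resolve; `from module import *` does not.
c = add(ones([2, 2]), ones([2, 2]))
```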

Add Strategy.reduce() to XML summaries

The API at https://www.tensorflow.org/versions/r2.9/api_docs/python/tf/distribute/Strategy#reduce returns a tensor. Can we track it? We need a summary for the return value (it's a Tensor):

From the docs:

Returns A Tensor that's the concatenation of value across replicas along axis dimension.

Example from the docs:

strategy = tf.distribute.MirroredStrategy(["GPU:0", "GPU:1"])
# A DistributedValues with component tensor of shape (2, 1) on each replica
distributed_values = strategy.experimental_distribute_values_from_function(lambda _: tf.identity(tf.constant([[1], [2]])))
@tf.function
def run():
  return strategy.gather(distributed_values, axis=0)
run()
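If the return value needs modeling, the summary would allocate a fresh tensor object and return it. A rough sketch only: the element names follow WALA's XML method-summary format, but the method descriptor and tensor class name are assumptions that must be matched against the existing entries in the project's summary file:

```xml
<!-- Hypothetical sketch; the descriptor and class names are placeholders. -->
<method name="reduce" descriptor="()LRoot;">
  <new def="result" class="Ltensorflow/Tensor" />
  <return value="result" />
</method>
```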

Default parameter values of methods.

Hello,

When attempting to generate IR for methods with default parameters, I get the following error:

com.ibm.wala.util.debug.UnimplementedError: TypeContext.cfg()
        at com.ibm.wala.util.debug.Assertions.UNREACHABLE(Assertions.java:55)
        at com.ibm.wala.cast.ir.translator.AstTranslator$TypeContext.cfg(AstTranslator.java:2746)
        at com.ibm.wala.cast.ir.translator.AstTranslator.doGlobalRead(AstTranslator.java:426)
        at com.ibm.wala.cast.python.ir.PythonCAstToIRTranslator.leaveVar(PythonCAstToIRTranslator.java:326)
        at com.ibm.wala.cast.python.ir.PythonCAstToIRTranslator.leaveVar(PythonCAstToIRTranslator.java:60)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visit(CAstVisitor.java:536)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitEntities(CAstVisitor.java:188)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitScopedEntities(CAstVisitor.java:139)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitScopedEntities(CAstVisitor.java:131)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitEntities(CAstVisitor.java:182)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitScopedEntities(CAstVisitor.java:139)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visit(CAstVisitor.java:898)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitChildren(CAstVisitor.java:407)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitAllChildren(CAstVisitor.java:415)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visit(CAstVisitor.java:472)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitEntities(CAstVisitor.java:222)
        at com.ibm.wala.cast.ir.translator.AstTranslator.walkEntities(AstTranslator.java:4794)
        at com.ibm.wala.cast.ir.translator.AstTranslator.translate(AstTranslator.java:4943)
        at com.ibm.wala.cast.loader.CAstAbstractModuleLoader.init(CAstAbstractModuleLoader.java:127)
        at com.ibm.wala.cast.loader.SingleClassLoaderFactory.getLoader(SingleClassLoaderFactory.java:42)
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:267)
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:209)
        at com.ibm.wala.ipa.cha.SeqClassHierarchyFactory.make(SeqClassHierarchyFactory.java:51)
        ...

Minimal example:

class MyC(object):
    def __init__(self, b=3):
        self.b = b

I think the problem is in com.ibm.wala.cast/.../CAstVisitor.java around line 186, but I was not sure what the appropriate way to fix it is.

Thanks,
Sifis.
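The crash is consistent with when Python evaluates defaults: a default value expression runs once, at definition time, in the enclosing scope, so translating b=3 triggers a variable read outside any method body (hence the TypeContext). A quick sketch of that semantics:

```python
# Default values are evaluated once, at definition time, in the
# enclosing scope, not at call time inside the method body.
evaluations = []

def default_b():
    evaluations.append("evaluated")
    return 3

class MyC(object):
    def __init__(self, b=default_b()):  # runs while the class body executes
        self.b = b

count_after_definition = len(evaluations)  # already evaluated, before any call
obj1 = MyC()
obj2 = MyC(5)
count_after_calls = len(evaluations)       # not re-evaluated per call
```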

TF2?

Does Ariadne work on TensorFlow 2 code?
