
Introduction

WALA logo



The T. J. Watson Libraries for Analysis (WALA) provide static analysis capabilities for Java bytecode and related languages and for JavaScript. The system is licensed under the Eclipse Public License, which has been approved by the OSI (Open Source Initiative) as a fully certified open source license. The initial WALA infrastructure was independently developed as part of the DOMO research project at the IBM T.J. Watson Research Center. In 2006, IBM donated the software to the community.

For recent updates on WALA, join the mailing list.

Core WALA Features

WALA features include:

  • Java type system and class hierarchy analysis
  • Source language framework supporting Java and JavaScript
  • Interprocedural dataflow analysis (RHS solver)
  • Context-sensitive tabulation-based slicer
  • Pointer analysis and call graph construction
  • SSA-based register-transfer language IR
  • General framework for iterative dataflow
  • General analysis utilities and data structures
  • A bytecode instrumentation library (Shrike)

Getting Started

The fastest way to get started with WALA is to use the packages in Maven Central, as noted here. See the WALA-start repo for a Gradle-based example. We are actively re-organizing the deeper wiki technical documentation. In the meantime, you can check out tutorial slides to get an overview of WALA:

You can also watch screencasts of the WALA JavaScript tutorial here.

Finally, for now, to search the wiki documentation, we recommend a site-specific search on GitHub, e.g., a search for "call graph".

Documentation

We're hosting documentation for WALA on the GitHub wiki. We've chosen a wiki format just so that you can contribute. Don't be shy!

The WALA publications department is populating this wiki with technical documentation on a demand-driven basis, driven by questions posted to the wala-wala mailing list and also Gitter. We recommend this page for searching the mailing list archives.

The JavaDoc documentation for the WALA code is updated continuously. If you think a particular file deserves better JavaDoc, please open a feature request.

Getting Help

To get help with WALA, please either email the mailing list, ask a question on Gitter, or open an issue.

Required Java Versions

Most components of each official WALA release are built for use with Java 11 or newer. However, components that use Eclipse require at least Java 17.

Building from Source

WALA uses Gradle as its build system. If you intend to modify or build WALA yourself, then see the Gradle-specific README for more instructions and helpful tips.

WALA Tools in JavaScript

Recently, we have been expanding the set of WALA tools implemented in JavaScript. We have released a normalizer and some basic program analyses for JavaScript in the JS_WALA GitHub repository. We have also made available jsdelta and WALA Delta, delta debuggers for JavaScript-processing tools. Please see the linked GitHub repositories for further details on these tools.

WALA-Based Tools

Several groups have built open-source tools that enhance or build on WALA that may be useful to other WALA users. For details, see the Wala-based tools page.

Acknowledgements

YourKit logo

We thank YourKit for providing WALA developers with a complimentary license for their excellent Java profiler, which we use to improve and maintain WALA performance.

Contributors

dependabot[bot], juliandolby, khatchad, msridhar, shinnar, sifislag, tatianacv


Issues

jython and jython3 versions

The jython and jython3 versions used are not in the Maven repository. I can find the source code of jython at the same version and build it with Ant, but for jython3 I could not even find a version 0.0.1-SNAPSHOT. Is this the same package as https://github.com/jython/jython3?

Inconsistent Jython versions

Commit b4b6e8e says to use Python 2 for now, but it references jython3:

<artifactId>jython3</artifactId>

That results in an inconsistency with the following:

<artifactId>com.ibm.wala.cast.python.jython</artifactId>

If we are using Jython 3, I believe we should be using com.ibm.wala.cast.python.jython3 instead of com.ibm.wala.cast.python.jython.

What happens here as a result is that we load up Jython 3 code that calls methods that don't exist in Jython 2, thus throwing a NoSuchMethod exception. An example is the PythonTree class, which has the method getLineno() in Jython (Python 2) but getLine() in Jython 3 (Python 3).

Certain decorators cause functions not to appear in the call graph

There are certain decorators that cause the call graph to be constructed incorrectly. Several times now, I have seen that nodes are missing when the function is decorated. The decorator can alter the behavior of the function, and I assume that the call graph construction is trying to account for that. More info soon.
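The issue gives no reproducer; a hypothetical minimal example of the pattern (a wrapping decorator, here called trace, which is an assumed name for illustration) might look like this. The indirection through the wrapper is exactly what call-graph construction must see through:

```python
import functools


def trace(fn):
    """Hypothetical decorator: wraps fn in an extra call layer, so the
    call graph must route calls to f() through wrapper() to reach f()."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


@trace
def f():
    return 42


assert f() == 42
assert f.__name__ == "f"  # functools.wraps preserves the original name
```

If the analysis models the decorator's wrapper conservatively, a node for f should still appear; the reported symptom is that it does not.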

Missing the map() built-in function

The map() built-in function is missing, which has consequences for lambdas. Consider the following code:

# From https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

my_list = [10]


def fun_with_side_effects(y):
    my_list[0] = 1
    return y ** 2


def f():
    squares = list(map(lambda x: fun_with_side_effects(x), range(10)))


f()

The IR for f():

callees of node f : [range, list]

IR of node 3, context CallStringContext: [ script A.py.do()LRoot;@113 ]
<Code body of function Lscript A.py/f>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..6]
    -> BB2
    -> BB5
BB2[7..7]
    -> BB3
    -> BB5
BB3[8..8]
    -> BB4
    -> BB5
BB4[9..9]
    -> BB5
BB5[-1..-2]
Instructions:
BB0
BB1
0   v5 = lexical:list@Lscript A.py           A.py [12:14] -> [12:18]
1   v8 = lexical:map@Lscript A.py            A.py [12:19] -> [12:22]
2   v10 = new <PythonLoader,Lscript A.py/f/lambda1>@2<no information>
3   global:global script A.py/f/lambda1 = v10<no information>
4   putfield v1.< PythonLoader, LRoot, lambda1, <PythonLoader,LRoot> > = v10<no information> [1=[the function]]
5   v15 = lexical:range@Lscript A.py         A.py [12:59] -> [12:64]
6   v13 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v15,v16:#10 @6 exception:v17A.py [12:59] -> [12:68]
BB2
7   v6 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v8,v10,v13 @7 exception:v18A.py [12:19] -> [12:69]
BB3
8   v3 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v5,v6 @8 exception:v19A.py [12:14] -> [12:70] [3=[squares]]
BB4
BB5

So far so good. For comparison, list() above is not missing. But in the pointer analysis, map() points to nothing:

[Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@113 ], v8] --> []

By contrast, list() is populated:

[Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@113 ], v5] --> [[com.ibm.wala.cast.python.ipa.summaries.BuiltinFunctions$BuiltinFunction@9053b33]]

One consequence of this issue is that a lambda given to map() is not showing in the call graph. In other words, there is no call graph node for the lambda above, and subsequently, we have no node for fun_with_side_effects either.

Digression

  1. The map() function returns an iterator, but there is no such type in PythonTypes. We would seemingly have to add it.
  2. List comprehensions do work, and I would expect lambdas to behave similarly if they are also implemented. It's just hard to tell whether they are when the functions that take them as arguments aren't implemented.
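As the digression notes, list comprehensions already work; semantically, a map() over a lambda is equivalent to a comprehension, so one would expect the analysis to produce similar call-graph results for both forms:

```python
def square(x):
    return x ** 2


# map() over a lambda and a list comprehension compute the same thing;
# a comprehension-based call graph node exists today, but the map-based
# path reportedly loses both the lambda and square()
via_map = list(map(lambda x: square(x), range(5)))
via_comp = [square(x) for x in range(5)]
assert via_map == via_comp == [0, 1, 4, 9, 16]
```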

Losing tensors in datasets

Description

Currently, we cannot track tensors contained in tf.data.Datasets:

import tensorflow as tf


def add(a, b):
  return a + b


dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])

for element in dataset:
    c = add(element, element)

We get:

Oct 11, 2023 11:09:48 AM com.ibm.wala.cast.python.ml.test.TestTensorflowModel testTf2
INFO: Tensor analysis: answer:

Regression

List Literals

Seems to work just fine with lists:

import tensorflow as tf


def add(a, b):
  return a + b


list = [tf.ones([1, 2]), tf.ones([2, 2])]

for element in list:
    c = add(element, element)

We get:

Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.client.PythonTensorAnalysisEngine getDataflowSources
INFO: Added dataflow source [Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@103 ], v5]:[Empty].
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.client.PythonTensorAnalysisEngine getDataflowSources
INFO: Added dataflow source [Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@97 ], v5]:[Empty].
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel testTf2
INFO: Tensor analysis: answer:
[SITE_IN_NODE{<Code body of function Lscript tf2_test_tensor_list.py>:Llist in CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]},<field 0>][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v249][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[SITE_IN_NODE{<Code body of function Lscript tf2_test_tensor_list.py>:Llist in CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]}; [Core[Root]]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[SITE_IN_NODE{<Code body of function Lscript tf2_test_tensor_list.py>:Llist in CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]},<field 1>][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v257][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v265][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[<field global element>][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v268][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v267][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v270][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v271][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py/add> Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@117 ], v2][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list.py/add> Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@117 ], v3][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Ret-V:Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@103 ]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Ret-V:Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@97 ]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@103 ], v5][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list.py.do()LRoot;@97 ], v5][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]

Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.InstanceFieldKey
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.cast.ipa.callgraph.ReflectedFieldPointerKey$3
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.InstanceFieldKey
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.StaticFieldKey
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.ReturnValueKey
Oct 11, 2023 2:25:11 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.ReturnValueKey

Lists Without Literals

But, creating a list without using a literal doesn't work:

import tensorflow as tf


def add(a, b):
  return a + b


list = list()

list.append(tf.ones([1, 2]))
list.append(tf.ones([2, 2]))

for element in list:
    c = add(element, element)

We get:

Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.client.PythonTensorAnalysisEngine getDataflowSources
INFO: Added dataflow source [Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@106 ], v5]:[Empty].
Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.client.PythonTensorAnalysisEngine getDataflowSources
INFO: Added dataflow source [Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@99 ], v5]:[Empty].
Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel testTf2
INFO: Tensor analysis: answer:
[Node: <Code body of function Lscript tf2_test_tensor_list3.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v264][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Ret-V:Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@106 ]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: <Code body of function Lscript tf2_test_tensor_list3.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v252][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Ret-V:Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@99 ]][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@106 ], v5][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]
[Node: synthetic < PythonLoader, Ltensorflow/functions/ones, do()LRoot; > Context: CallStringContext: [ script tf2_test_tensor_list3.py.do()LRoot;@99 ], v5][{[D:Symbolic,n, D:Compound,[D:Constant,28, D:Constant,28]] of pixel}]

Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.ReturnValueKey
Oct 11, 2023 2:51:26 PM com.ibm.wala.cast.python.ml.test.TestTensorflowModel lambda$0
WARNING: Encountered: class com.ibm.wala.ipa.callgraph.propagation.ReturnValueKey

Python ModRef analysis says that a constant is modified

Consider the following code:

def g(p1, p2):
    assert p1 == 5 and p2 == 2


def f():
    g(5, p2=2)


f()

The ModRef analysis for Python (com.ibm.wala.cast.python.modref.PythonModRef) is returning that the following locations are "modified" by f():

[[ConstantKey:p2:<PythonLoader,Lstring>]; [Core[Root]]]
[[ConstantKey:p2:<PythonLoader,Lstring>],<field 0>]

But, according to the docs, a ConstantKey is "an instance key which represents a unique, constant object." It would thus seem that instances represented by ConstantKey instance keys can't be returned as modified by the ModRef analysis.

Starred node type is not handled

In Python 3.5 and above, a Starred node is used in a Call node with *args, for example:

def say(*a):
    print(a)

b = [1,2]
say(*b)

This will generate a Starred node for *b when calling say. Part of the AST is shown below:

        "args": [
          {
            "value": {
              "id": "b",
              "ctx": {
                "type": "Load",
                "ftype": "Load"
              },
              "extra_attr": true,
              "type": "Name"
            },
            "ctx": {
              "type": "Load",
              "ftype": "Load"
            },
            "extra_attr": true,
            "type": "Starred"
          }
        ]

Missing callbacks related to datasets

Missing experimental_distribute_datasets_from_function

Called here: https://github.com/mead-ml/mead-baseline/blob/5d7632bb151c2d09501ebf49f36ba8c4204df4c8/mead/api_examples/pretrain_discrim_tf.py#L307

The callback function is dataset_test_fn, defined here: https://github.com/mead-ml/mead-baseline/blob/master/mead/api_examples/pretrain_discrim_tf.py#L301-L306

But that function doesn't show up in the call graph. The method reference should be: < PythonLoader, Lscript pretrain_discrim_tf.py/get_dataset, do()LRoot; >. However, we don't see it in the call graph nodes: https://gist.github.com/khatchad/bd48384f3561fdcc7c8c6e9141d93de5

Related to #92 and blocked on #89.

Does it support full python grammar?

I am currently doing some analysis of Python code (not only machine learning scripts). I read your paper, and it mainly focuses on inferring tensor types. I am wondering whether this supports the full Python/Python 3 grammar, i.e., whether I can convert any Python source code to WALA IR without losing information (e.g., the mapping between WALA IR and the Python source code)?

Use Git submodules for custom dependency builds

Currently, there are two custom dependency builds:

- name: Checkout wala/IDE sources.
  uses: actions/[email protected]
  with:
    repository: wala/IDE
    # fetch-depth: 50
    path: ./IDE
- name: Checkout juliandolby/jython3 sources.
  uses: actions/[email protected]
  with:
    repository: juliandolby/jython3
    path: ./jython3
- name: Install Jython3.
  run: |
    cd ./jython3
    ant
    cd dist
    mvn install:install-file -Dfile=./jython-dev.jar -DgroupId="org.python" -DartifactId="jython3" -Dversion="0.0.1-SNAPSHOT" -Dpackaging="jar" -DgeneratePom=true
  shell: bash
- name: Install IDE.
  run: |
    cd ./IDE/com.ibm.wala.cast.lsp
    mvn clean install -B -q -DskipTests

Currently, these dependencies are cloned from GitHub at master. It would be better to check out a specific SHA1 so as to stabilize the build. Moreover, to build from source, users must examine the build file or https://github.com/wala/ML/blob/master/CONTRIBUTING.md.

I believe that both of these issues may be helped using Git submodules. There, you can specify the clone URL and SHA1 to be used, and the submodules can be cloned along with the main repo. There is also a GH action for submodules (see https://github.com/actions/checkout#user-content-usage).

Can't determine whether an API call is from TensorFlow when more than two functions are in a call chain

When processing the following:

import tensorflow as tf

def func2(t):
  pass

@tf.function
def func():
  a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
  b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
  c = tf.matmul(a, b)
  tensor = tf.Tensor(c.op, 0, tf.float32)
  func2(tensor)

func()

There is an error in determining whether we are dealing with a TensorFlow API; the resulting tensor analysis is empty:

TensorTypeAnalysis analysis = E.performAnalysis(builder);

Inheritance not supported

Consider the following example:

class D:
    pass


class C(D):

    def func(self, x):
        return x * x


c = C()
a = c.func(5)
assert a == 25

In the call graph, the callees of the outer script are the following:

Node: synthetic < PythonLoader, Lscript myscript.py/C, do()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@100 ]
Node: synthetic < PythonLoader, L$script myscript.py/C/func, trampoline2()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@103 ]

When we dump the call graph, we see:

callees of node Lscript myscript.py : [C, trampoline2]

All good. Now, consider the following example:

class D:

    def func(self, x):
        return x * x


class C(D):
    pass


c = C()
a = c.func(5)
assert a == 25

In the call graph, the callees of the outer script are the following:

Node: synthetic < PythonLoader, Lscript myscript.py/C, do()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@100 ]

When we dump the call graph, we see:

callees of node Lscript myscript.py : [C]
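Note that the trampoline2 node for func is missing in this second case. For comparison, CPython resolves the inherited method through the method resolution order (MRO), finding func on superclass D; the static analysis needs an analogous superclass lookup to produce the trampoline node:

```python
class D:
    def func(self, x):
        return x * x


class C(D):
    pass


c = C()
# CPython walks the MRO (C, D, object) to find func on D;
# the call graph should model this lookup and include a node for func
assert C.__mro__ == (C, D, object)
assert c.func(5) == 25
```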

How to install this wala on pycharm

[image]

The image above shows the last step in the WALA guidance. However, what should I put in the extension box? Could someone screen-record how they install this plugin? (Assuming we have the latest PyCharm and LSP installed properly.)

Thank you so much!!

Fail to build due to org.python:jython:jar:2.7.2a1+ is missing

I am trying to build Ariadne from source. I successfully built the WALA code and the IDE, but when building Ariadne, I get this exception:

Failed to execute goal on project com.ibm.wala.cast.python: Could not resolve dependencies for project com.ibm.wala:com.ibm.wala.cast.python:jar:0.0.1-SNAPSHOT: Could not find artifact org.python:jython:jar:2.7.2a1+ in central (https://repo.maven.apache.org/maven2) -> [Help 1]

I checked the Maven repository and found that the latest version there is 2.7.1b3. I changed the jython version in the POM to 2.7.1b3, but then got compile errors. So how can I get the 2.7.2a1 version of jython?
Thanks!

Missing the print built-in function summary

Description

Consider the following example input program:

# A.py
def f(x):
  print("Traced with: " + str(x))


f(1)

This is the IR of f():

Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@97 ]
<Code body of function Lscript A.py/f>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..2]
    -> BB2
    -> BB3
BB2[3..4]
    -> BB3
BB3[-1..-2]
Instructions:
BB0
BB1
0   v5 = lexical:print@Lscript A.py          A.py [2:2] -> [2:7]
1   v10 = lexical:str@Lscript A.py           A.py [2:26] -> [2:29]
2   v8 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v10,v2 @2 exception:v11A.py [2:26] -> [2:32] [2=[x]]
BB2
3   v6 = binaryop(add) v7:#Traced with:  , v8A.py [2:8] -> [2:32]
4   v3 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v5,v6 @4 exception:v12A.py [2:2] -> [2:33]
BB3

Above, v5 is the print() function, while v10 is the str() function. However, in the pointer analysis, I am seeing the following:

  [Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@97 ], v5] ->
  [Node: <Code body of function Lscript A.py/f> Context: CallStringContext: [ script A.py.do()LRoot;@97 ], v10] ->
     [com.ibm.wala.cast.python.ipa.summaries.BuiltinFunctions$BuiltinFunction@1724e9f5]

The points-to set for v5 in f() of A.py is empty, while the points-to set of v10 contains the built-in.

Regression

I believe that the print() function needs to be added here:

static {
  // builtinFunctions.put("enumerate", Either.forLeft(PythonTypes.enumerate));
  builtinFunctions.put("enumerate", Either.forRight(2));
  builtinFunctions.put("int", Either.forLeft(TypeReference.Int));
  builtinFunctions.put("round", Either.forLeft(TypeReference.Int));
  builtinFunctions.put("len", Either.forLeft(TypeReference.Int));
  builtinFunctions.put("list", Either.forLeft(PythonTypes.list));
  builtinFunctions.put("range", Either.forLeft(PythonTypes.list));
  builtinFunctions.put("sorted", Either.forLeft(PythonTypes.list));
  builtinFunctions.put("str", Either.forLeft(PythonTypes.string));
  builtinFunctions.put("sum", Either.forLeft(TypeReference.Int));
  builtinFunctions.put("type", Either.forLeft(PythonTypes.object));
  builtinFunctions.put("zip", Either.forLeft(PythonTypes.list));
  builtinFunctions.put("slice", Either.forRight(2));
  builtinFunctions.put("__delete__", Either.forRight(2));
}

Missing callbacks

Missing Strategy.run()

Called here: https://github.com/mead-ml/mead-baseline/blob/5d7632bb151c2d09501ebf49f36ba8c4204df4c8/mead/api_examples/pretrain_discrim_tf.py#L414.

The callback function _replicated_train_step() is defined here: https://github.com/mead-ml/mead-baseline/blob/5d7632bb151c2d09501ebf49f36ba8c4204df4c8/mead/api_examples/pretrain_discrim_tf.py#L394-L405.

But, we don't see it in the call graph. The method reference should be: < PythonLoader, Lscript pretrain_discrim_tf.py/train/_replicated_train_step, do()LRoot; >. The call graph nodes: https://gist.github.com/khatchad/ab56e3be454103829275c7507999b7d1

Regression

There are some callbacks defined in tensorflow.xml, among others. I see that the return values are specified, but I wonder whether adding a callback summary will also cause the given function argument to be considered invoked:

<package name="tensorflow/estimator">
<class name="Estimator" allocatable="true">
<method name="do" descriptor="()LRoot;" numArgs="2" paramNames="self model">
<new def="x" class="Ltensorflow/estimator/train/train" />
<putfield class="LRoot" field="train" fieldType="LRoot" ref="self" value="x" />
<putfield class="LRoot" field="$callback" fieldType="LRoot" ref="x" value="model" />
<return value="arg0" />
</method>
</class>

<package name="tensorflow/estimator/train">
<class name="train" allocatable="true">
<method name="do" descriptor="()LRoot;" numArgs="3">
<getfield class="LRoot" field="$callback" fieldType="LRoot" ref="arg0" def="xx" />
<getfield class="LRoot" field="data" fieldType="LRoot" ref="arg1" def="data" />
<call class="LRoot" name="do" descriptor="()LRoot;" type="virtual" arg0="xx" arg1="data" arg2="3" numArgs="3" def="v" />
<return value="v" />
</method>
</class>
</package>

The ones there are for estimators (looks like there's two definitions). I wonder if we just need to add more for, e.g., TF2 APIs. I found these docs for the estimator.

Implicit `call()` for models/layers not working

Bug in processing the call for call(...) when calling the model (line 36)

import tensorflow as tf


class SequentialModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super(SequentialModel, self).__init__(**kwargs)
        self.flatten = tf.keras.layers.Flatten(input_shape=(28, 28))
        # Add a lot of small layers
        num_layers = 100
        self.my_layers = [tf.keras.layers.Dense(64, activation="relu")
                          for n in range(num_layers)]
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_2 = tf.keras.layers.Dense(10)

    def call(self, x):
        x = self.flatten(x)
        for layer in self.my_layers:
            x = layer(x)
        x = self.dropout(x)
        x = self.dense_2(x)
        return x


input_data = tf.random.uniform([20, 28, 28])
model = SequentialModel()
result = model(input_data)

Similar to #24 for __call__, the call(...) method does not appear in the call graph.
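A simplified sketch (not the actual Keras implementation, which adds substantial bookkeeping) of the dispatch that the analysis would need to model: invoking the model object triggers the base class's __call__, which in turn calls the user-defined call():

```python
# Hypothetical stand-in for tf.keras.Model's dispatch machinery
class Layer:
    def __call__(self, *args, **kwargs):
        # real Keras does shape/build bookkeeping here before dispatching
        return self.call(*args, **kwargs)


class Double(Layer):
    def call(self, x):
        return 2 * x


# model(x) reaches the user's call() through the inherited __call__
assert Double()(21) == 42
```

Modeling __call__ on the framework base class as forwarding to call() would make user-defined call() methods reachable in the call graph.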

Issues with global variables defined in the module code

Hello,
I have some issues with variables defined in module code (which are global). If they are constants, they are never written with an AstGlobalWrite instruction; instead, when they are used in a function of the same module, a constant with the correct value is used at the use site.
If they are not constants, they do not have a corresponding AstGlobalWrite instruction unless the variable is used in a function defined in the same module.
Example:

#glovar.py
globVar = "Foo"
globVar2 = int(3)

Produces:

BB0
BB1
0   v5 = invokestatic < PythonLoader, Lstr, import()Lstr; > @0 exception:v6glovar.py [4:0] -> [5:17] [5=[str]]
1   v8 = invokestatic < PythonLoader, Lfloat, import()Lfloat; > @1 exception:v9glovar.py [4:0] -> [5:17] [8=[float]]
2   v11 = invokestatic < PythonLoader, Lint, import()Lint; > @2 exception:v12glovar.py [4:0] -> [5:17] [11=[int]]
3   v14 = invokestatic < PythonLoader, L__name__, import()L__name__; > @3 exception:v15glovar.py [4:0] -> [5:17] [14=[__name__]]
4   v17 = invokestatic < PythonLoader, LFalse, import()LFalse; > @4 exception:v18glovar.py [4:0] -> [5:17] [17=[False]]
5   v20 = invokestatic < PythonLoader, LTrue, import()LTrue; > @5 exception:v21glovar.py [4:0] -> [5:17] [20=[True]]
6   v23 = invokestatic < PythonLoader, LNone, import()LNone; > @6 exception:v24glovar.py [4:0] -> [5:17] [23=[None]]
7   v26 = invokestatic < PythonLoader, Lprint, import()Lprint; > @7 exception:v27glovar.py [4:0] -> [5:17] [26=[print]]
8   v29 = invokestatic < PythonLoader, Lsuper, import()Lsuper; > @8 exception:v30glovar.py [4:0] -> [5:17] [29=[super]]
9   v32 = invokestatic < PythonLoader, Llen, import()Llen; > @9 exception:v33glovar.py [4:0] -> [5:17] [32=[len]]
10   v35 = invokestatic < PythonLoader, Lopen, import()Lopen; > @10 exception:v36glovar.py [4:0] -> [5:17] [35=[open]]
11   v38 = invokestatic < PythonLoader, Lhasattr, import()Lhasattr; > @11 exception:v39glovar.py [4:0] -> [5:17] [38=[hasattr]]
12   v41 = invokestatic < PythonLoader, LBaseException, import()LBaseException; > @12 exception:v42glovar.py [4:0] -> [5:17] [41=[BaseException]]
13   v44 = invokestatic < PythonLoader, Labs, import()Labs; > @13 exception:v45glovar.py [4:0] -> [5:17] [44=[abs]]
14   v47 = invokestatic < PythonLoader, Lrange, import()Lrange; > @14 exception:v48glovar.py [4:0] -> [5:17] [47=[range]]
16   v51 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v11,v52:#3.0 @16 exception:v53glovar.py [5:11] -> [5:17] [51=[globVar2]11=[int]]
BB2

while:

#glovar.py
def printVar():
	print globVar + globVar2

globVar = "Foo"
globVar2 = int(3)

Produces:

module code:
BB0
BB1
0   v5 = invokestatic < PythonLoader, Lstr, import()Lstr; > @0 exception:v6 glovar.py [1:0] -> [5:17] [5=[str]]
1   v8 = invokestatic < PythonLoader, Lfloat, import()Lfloat; > @1 exception:v9 glovar.py [1:0] -> [5:17] [8=[float]]
2   v11 = invokestatic < PythonLoader, Lint, import()Lint; > @2 exception:v12 glovar.py [1:0] -> [5:17] [11=[int]]
3   v14 = invokestatic < PythonLoader, L__name__, import()L__name__; > @3 exception:v15 glovar.py [1:0] -> [5:17] [14=[__name__]]
4   v17 = invokestatic < PythonLoader, LFalse, import()LFalse; > @4 exception:v18 glovar.py [1:0] -> [5:17] [17=[False]]
5   v20 = invokestatic < PythonLoader, LTrue, import()LTrue; > @5 exception:v21 glovar.py [1:0] -> [5:17] [20=[True]]
6   v23 = invokestatic < PythonLoader, LNone, import()LNone; > @6 exception:v24 glovar.py [1:0] -> [5:17] [23=[None]]
7   v26 = invokestatic < PythonLoader, Lprint, import()Lprint; > @7 exception:v27 glovar.py [1:0] -> [5:17] [26=[print]]
8   v29 = invokestatic < PythonLoader, Lsuper, import()Lsuper; > @8 exception:v30 glovar.py [1:0] -> [5:17] [29=[super]]
9   v32 = invokestatic < PythonLoader, Llen, import()Llen; > @9 exception:v33 glovar.py [1:0] -> [5:17] [32=[len]]
10   v35 = invokestatic < PythonLoader, Lopen, import()Lopen; > @10 exception:v36 glovar.py [1:0] -> [5:17] [35=[open]]
11   v38 = invokestatic < PythonLoader, Lhasattr, import()Lhasattr; > @11 exception:v39 glovar.py [1:0] -> [5:17] [38=[hasattr]]
12   v41 = invokestatic < PythonLoader, LBaseException, import()LBaseException; > @12 exception:v42 glovar.py [1:0] -> [5:17] [41=[BaseException]]
13   v44 = invokestatic < PythonLoader, Labs, import()Labs; > @13 exception:v45 glovar.py [1:0] -> [5:17] [44=[abs]]
14   v47 = invokestatic < PythonLoader, Lrange, import()Lrange; > @14 exception:v48 glovar.py [1:0] -> [5:17] [47=[range]]
15   v50 = new <PythonLoader,Lscript glovar.py/printVar>@15 glovar.py [1:0] -> [5:17] [50=[printVar]]
17   v55 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v11,v56:#3.0 @17 exception:v57 glovar.py [5:11] -> [5:14] [11=[int]]
18   global:global globVar2 = v55            glovar.py [5:0] -> [5:1]
BB2


printVar:
BB0
BB1
0   v6 = global:global globVar2              glovar.py [2:17] -> [2:25]
1   v3 = binaryop(add) v4:#Foo , v6           glovar.py [2:7] -> [2:25]
BB2

They should always have an AstGlobalWrite instruction, because they can be accessed by other modules that import the module in which they are declared.

An interesting thing I found is that if I alter the source code, changing the order of the definitions, the IR produced is different:

#glovar.py
globVar = "Foo"
globVar2 = int(3)

def printVar():
	print globVar + globVar2

The IR it produces changes to:

module code:
BB0
BB1
0   v5 = invokestatic < PythonLoader, Lstr, import()Lstr; > @0 exception:v6 glovar.py [1:0] -> [5:32] [5=[str]]
1   v8 = invokestatic < PythonLoader, Lfloat, import()Lfloat; > @1 exception:v9 glovar.py [1:0] -> [5:32] [8=[float]]
2   v11 = invokestatic < PythonLoader, Lint, import()Lint; > @2 exception:v12 glovar.py [1:0] -> [5:32] [11=[int]]
3   v14 = invokestatic < PythonLoader, L__name__, import()L__name__; > @3 exception:v15 glovar.py [1:0] -> [5:32] [14=[__name__]]
4   v17 = invokestatic < PythonLoader, LFalse, import()LFalse; > @4 exception:v18 glovar.py [1:0] -> [5:32] [17=[False]]
5   v20 = invokestatic < PythonLoader, LTrue, import()LTrue; > @5 exception:v21 glovar.py [1:0] -> [5:32] [20=[True]]
6   v23 = invokestatic < PythonLoader, LNone, import()LNone; > @6 exception:v24 glovar.py [1:0] -> [5:32] [23=[None]]
7   v26 = invokestatic < PythonLoader, Lprint, import()Lprint; > @7 exception:v27 glovar.py [1:0] -> [5:32] [26=[print]]
8   v29 = invokestatic < PythonLoader, Lsuper, import()Lsuper; > @8 exception:v30 glovar.py [1:0] -> [5:32] [29=[super]]
9   v32 = invokestatic < PythonLoader, Llen, import()Llen; > @9 exception:v33 glovar.py [1:0] -> [5:32] [32=[len]]
10   v35 = invokestatic < PythonLoader, Lopen, import()Lopen; > @10 exception:v36 glovar.py [1:0] -> [5:32] [35=[open]]
11   v38 = invokestatic < PythonLoader, Lhasattr, import()Lhasattr; > @11 exception:v39 glovar.py [1:0] -> [5:32] [38=[hasattr]]
12   v41 = invokestatic < PythonLoader, LBaseException, import()LBaseException; > @12 exception:v42 glovar.py [1:0] -> [5:32] [41=[BaseException]]
13   v44 = invokestatic < PythonLoader, Labs, import()Labs; > @13 exception:v45 glovar.py [1:0] -> [5:32] [44=[abs]]
14   v47 = invokestatic < PythonLoader, Lrange, import()Lrange; > @14 exception:v48 glovar.py [1:0] -> [5:32] [47=[range]]
16   v51 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v11,v52:#3.0 @16 exception:v53 glovar.py [2:11] -> [2:17] [51=[globVar2]11=[int]]
18   v56 = new <PythonLoader,Lscript glovar.py/printVar>@18 glovar.py [1:0] -> [5:32] [56=[printVar]]
BB2


printVar:
BB0
BB1
0   v6 = lexical:globVar2@Lscript glovar.py  glovar.py [5:17] -> [5:25]
1   v3 = binaryop(add) v4:#Foo , v6          glovar.py [5:7] -> [5:25]
BB2

What is the reason the IR is different in this case?
Thanks in advance,
Sifis.

Edit: Another IR issue visible in these examples is that print statements do not produce a method invocation.
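Regardless of definition order, the behavior the IR should model is the same: module-level names are globals that importers can read. A minimal, self-contained sketch of that semantics (simulating glovar.py with exec rather than a second file, and returning instead of printing so it runs under Python 3):

```python
import types

# Simulate glovar.py as a module to show that module-level assignments
# become attributes visible to importers, which is why the translator
# needs a global (not lexical) write for globVar2 in both orderings.
glovar_src = """
globVar = "Foo"
globVar2 = int(3)

def printVar():
    return globVar + str(globVar2)
"""

glovar = types.ModuleType("glovar")
exec(glovar_src, glovar.__dict__)

value = glovar.globVar2       # an importer reads the module-level global
combined = glovar.printVar()  # the function resolves the same globals
```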

IR incorrect for list comprehension

With COMPREHENSION_IR=true, the generated IR has no filter part.
For example: l = [e*e for e in l if e >= 2]

The CAst is like:

    ASSIGN at test1.py [14:0] -> [14:1]
      VAR at test1.py [14:0] -> [14:1]
        "l" at test1.py [14:0] -> [14:1]
      COMPREHENSION_EXPR at test1.py [14:5] -> [14:8]
        NEW at test1.py [14:5] -> [14:8]
          "list" at test1.py [14:5] -> [14:8]
        FUNCTION_EXPR
          "com.ibm.wala.cast.python.ast.PythonParser$CAstVisitor$defineFunction$func$1@262b2c86"
        EXPR_LIST at test1.py [14:5] -> [14:8]
          VAR at test1.py [14:18] -> [14:19]
            "l" at test1.py [14:18] -> [14:19]
        EXPR_LIST at test1.py [14:5] -> [14:8]
          FUNCTION_EXPR
            "com.ibm.wala.cast.python.ast.PythonParser$CAstVisitor$defineFunction$func$1@5ed828d"

function: comprehension1
(  the function  e  )
RETURN
  BLOCK_EXPR
    BINARY_EXPR at test1.py [14:5] -> [14:8]
      "*" at test1.py [14:5] -> [14:8]
      VAR at test1.py [14:5] -> [14:6]
        "e" at test1.py [14:5] -> [14:6]
      VAR at test1.py [14:7] -> [14:8]
        "e" at test1.py [14:7] -> [14:8]

function: filter2
(  the function  e  )
RETURN
  BLOCK_EXPR
    IF_EXPR at test1.py [14:23] -> [14:29]
      BINARY_EXPR at test1.py [14:23] -> [14:29]
        ">=" at test1.py [14:23] -> [14:29]
        VAR at test1.py [14:23] -> [14:24]
          "e" at test1.py [14:23] -> [14:24]
        VAR at test1.py [14:23] -> [14:29]
          "0" at test1.py [14:23] -> [14:29]
      "true" at test1.py [14:23] -> [14:29]
      "false" at test1.py [14:23] -> [14:29]

But the IR is like:

25   v47 = new <PythonLoader,Llist>@25       test1.py [14:5] -> [14:8]
26   v50 = new <PythonLoader,Lscript test1.py/comprehension1>@26 test1.py [14:5] -> [14:8]
27   global:global script test1.py/comprehension1 = v50 test1.py [14:5] -> [14:8]
28   putfield v1.< PythonLoader, LRoot, comprehension1, <PythonLoader,LRoot> > = v50 test1.py [14:5] -> [14:8]
29   v52 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v50,v47,v39 @29 exception:v53 test1.py [14:5] -> [14:8] [52=[l]39=[l]]
BB2

<Code body of function Lscript test1.py/comprehension1>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..1]
    -> BB2
BB2[-1..-2]
Instructions:
BB0
BB1
0   v3 = binaryop(mul) v2 , v2               test1.py [14:5] -> [14:8] [2=[e]]
1   return v3                                test1.py [14:5] -> [14:8]
BB2

The condition "e >= 2" is missing in the IR.

Any hint on how to fix this?
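For reference, the comprehension desugars into a mapping function and a filter function, matching the comprehension1/filter2 entities in the CAst above. A plain-Python sketch of the intended semantics, with the filter applied (which is what the generated IR currently omits):

```python
l = [4, 1, 2, 3]

def comprehension1(e):
    # corresponds to the element expression e*e
    return e * e

def filter2(e):
    # corresponds to the missing condition e >= 2
    return e >= 2

# The IR should compute the equivalent of:
result = [comprehension1(e) for e in l if filter2(e)]
```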

Adding custom entry points to generate callgraph

Hi,

I am using PythonAnalysisEngine to generate a call graph for Python programs. However, when I give multiple input files that do not have a main function, I get almost no edges in the call graph. (I generated a call graph for a single file with and without a main function and compared the edges: when the input file doesn't contain a main function, I get few to no edges.)

I want to give custom entrypoints in addition to the main function, for example all the functions in all the input files. I tried the following code in PythonAnalysisEngine by overriding makeDefaultEntrypoints:

   @Override
    protected Iterable<Entrypoint> makeDefaultEntrypoints(AnalysisScope scope, IClassHierarchy cha) {
        Set<Entrypoint> result = HashSetFactory.make();
        for(Module m : moduleFiles) {
            IClass entry = cha.lookupClass(TypeReference.findOrCreate(PythonTypes.pythonLoader,
                    TypeName.findOrCreate(scriptName(m))));
            assert entry != null: "bad root name " + scriptName(m) + ":\n" + cha;
            MethodReference er = MethodReference.findOrCreate(entry.getReference(), AstMethodReference.fnSelector);
            System.out.println("Root Entry Point: " + er.toString());
            result.add(new DefaultEntrypoint(er, cha));
        }

        //Begin - I added the following code
        for (IClass klass : cha) {
            for (IMethod method : klass.getDeclaredMethods()) {
                MethodReference er = method.getReference();
                result.add(new DefaultEntrypoint(er, cha));
            }
        }
        //End
        return result;
    }

I get the following exception:

java.lang.IllegalArgumentException: Illegal i: -1
	at com.ibm.wala.ssa.SymbolTable.ensureSymbol(SymbolTable.java:189)
	at com.ibm.wala.ipa.summaries.SyntheticIR.updateForInstruction(SyntheticIR.java:120)
	at com.ibm.wala.ipa.summaries.SyntheticIR.makeSymbolTable(SyntheticIR.java:95)
	at com.ibm.wala.ipa.summaries.SyntheticIR.<init>(SyntheticIR.java:55)
	at com.ibm.wala.ipa.summaries.SummarizedMethodWithNames$SyntheticIRWithNames.<init>(SummarizedMethodWithNames.java:141)
	at com.ibm.wala.ipa.summaries.SummarizedMethodWithNames.makeIR(SummarizedMethodWithNames.java:187)
	at com.ibm.wala.ipa.summaries.SyntheticIRFactory.makeIR(SyntheticIRFactory.java:34)
	at com.ibm.wala.ssa.DefaultIRFactory.makeIR(DefaultIRFactory.java:66)
	at com.ibm.wala.cast.ir.ssa.AstIRFactory$AstDefaultIRFactory.makeIR(AstIRFactory.java:59)
	at com.ibm.wala.ssa.SSACache.findOrCreateIR(SSACache.java:69)
	at com.ibm.wala.ipa.callgraph.AnalysisCache.getIR(AnalysisCache.java:74)
	at com.ibm.wala.ipa.callgraph.propagation.cfa.ContextInsensitiveSSAInterpreter.getIR(ContextInsensitiveSSAInterpreter.java:45)
	at com.ibm.wala.ipa.callgraph.propagation.cfa.ContextInsensitiveSSAInterpreter.getIRView(ContextInsensitiveSSAInterpreter.java:50)
	at com.ibm.wala.ipa.callgraph.propagation.cfa.DelegatingSSAContextInterpreter.getIRView(DelegatingSSAContextInterpreter.java:64)
	at com.ibm.wala.ipa.callgraph.propagation.SSAPropagationCallGraphBuilder.unconditionallyAddConstraintsFromNode(SSAPropagationCallGraphBuilder.java:208)
	at com.ibm.wala.ipa.callgraph.propagation.SSAPropagationCallGraphBuilder.addConstraintsFromNode(SSAPropagationCallGraphBuilder.java:190)
	at com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.addConstraintsFromNewNodes(PropagationCallGraphBuilder.java:311)
	at com.ibm.wala.ipa.callgraph.propagation.StandardSolver.solve(StandardSolver.java:56)
	at com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.makeCallGraph(PropagationCallGraphBuilder.java:251)
	at adapters.WalaNCFA.getStaticCallGraph(WalaNCFA.java:83)

Could you please point out whether there is another way of achieving this (giving all methods as entrypoints to generate a call graph), or is there anything I am missing in the above code?

Thanks in advance.

Support TF2

Support tensor analysis from tensors created by TF2 APIs.

Supporting Python 3 causes tensor dimensions not to be calculated

Replacing com.ibm.wala.cast.python.jython.test with com.ibm.wala.cast.python.jython3.test as a dependency in com.ibm.wala.cast.python.ml.test/pom.xml causes com.ibm.wala.cast.python.ml.test.TestNeuroImageExamples.testEx1CG to fail. Specifically, tensor dimensions aren't being calculated because Jython3 does not perform a constant propagation that Jython does. Found during https://github.com/ponder-lab/ML/issues/4#issuecomment-1558310839; see https://github.com/ponder-lab/ML/issues/4#issuecomment-1559961372 for more details.

More than one function with the same name in different modules.

Hello,
I'm just starting to use the com.ibm.wala.cast.python project (my aim is to produce facts for a declarative static analysis), so my issues may simply be caused by wrong usage of the API.
From what I understand, each function is modeled as an IClass in the class hierarchy. Creating a class hierarchy for a scope of more than one module can cause problems: when two files declare a function with the same name, only one is kept in the hierarchy. Is this intended? I could solve this by creating a different scope/hierarchy for each module, but I'm not sure if this is the correct way to do it.
Also, how does your front end deal with imports? I haven't been able to find anything about it, and it seems essential for resolving calls to the correct function.
Lastly, are there any notes or documentation for these projects available?
Thanks in advance,
Sifis.

Run the test Python file

For each unit test that takes a Python file as input, run the Python file and check the status code. Unless we are testing invalid Python files, we should get a status code of 0. This would enable checking the usage of assert statements in the Python files.
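A minimal sketch of such a check, shown in Python for brevity (the surrounding test-harness details are assumptions; only the run-and-check-status part is shown):

```python
import subprocess
import sys
import tempfile

# Run a test's input Python file as a subprocess and report its exit
# status, so that any `assert` statements inside the file are actually
# exercised; 0 means the file ran cleanly.
def run_test_file(path):
    return subprocess.run([sys.executable, path]).returncode

# Demonstrate with a throwaway valid test file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("assert 1 + 1 == 2\n")

status = run_test_file(f.name)
```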

getSourceFileName for PythonClasses

Hello,
Classes of PythonClass type in the class hierarchy throw a NullPointerException upon calling getSourceFileName(), because a null Position is passed when they are created in PythonLoader::defineType(). Would getting the Position from PythonCAstToIRTranslator::defineType's type argument and passing it to PythonLoader::defineType() be enough to fix this?

Thanks in advance,
Sifis.

Edit: Fixed via pull request #4

Using com.ibm.wala.cast.python.ml as a library is problematic due to Maven Shade plug-in

com.ibm.wala.cast.python currently uses the Maven Shade plug-in:

<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<!-- Additional configuration. -->
</configuration>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<!-- add Main-Class to manifest file -->
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.ibm.wala.cast.python.ml.driver.Ariadne</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>

I believe it does this for two reasons:

  1. Produce an executable JAR by adding the appropriate MANIFEST.mf file.
  2. Produce a standalone JAR that includes all of the dependencies ("uber" JAR).

However, this is problematic for using com.ibm.wala.cast.python.ml as a library:

If the uber JAR is reused as a dependency of some other project, directly including classes from the artifact's dependencies in the uber JAR can cause class loading conflicts due to duplicate classes on the class path.
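One way to address this (a sketch based on standard maven-shade-plugin options; the classifier name here is an arbitrary choice) is to attach the shaded JAR under a classifier, so the main, unshaded artifact is what library consumers depend on:

```xml
<configuration>
  <!-- Publish the shaded JAR as an extra, classified artifact so the
       main (unshaded) JAR remains usable as a library dependency. -->
  <shadedArtifactAttached>true</shadedArtifactAttached>
  <shadedClassifierName>standalone</shadedClassifierName>
</configuration>
```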

True, False and None values.

Hello,
I noticed that True, False and None were not treated as constants but as variables.
The cause of this issue is that in Python 2.7 they are parsed as ast.Name nodes, just like variables.
To fix this, I added a check in PythonParser::visitName() for each of these values that creates constant nodes for them.
I chose to model None as a null constant object so as not to have to change the cast project code.
You can inspect the change in sifislag@2d5fee2.
This change was added automatically to my pending pull request #5 .
Thanks,
Sifis.
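For comparison, Python 3's own parser already classifies these as constants rather than names, which is the behavior the fix emulates. This can be checked with the stdlib ast module (Python 3.8+, where literals are ast.Constant):

```python
import ast

# Python 2.7 parsed True/False/None as ast.Name nodes, like ordinary
# variables; Python 3 parses them as constants, which is what the
# PythonParser::visitName() check reproduces for the Jython 2 frontend.
true_node = ast.parse("x = True").body[0].value
var_node = ast.parse("x = y").body[0].value

is_constant = isinstance(true_node, ast.Constant)
is_name = isinstance(var_node, ast.Name)
```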

`IllegalStateException` when using `from`...`import`

For the following example:

from tensorflow import random

def add(a, b):
  return a + b


c = add(random.uniform([1, 2]), random.uniform([2, 2]))

We get the following error:

[ERROR]   TestTensorflowModel.testTf2:98->testTf2:113 » IllegalState Encountered unexpected API call.

WALA dependency versions

Is it possible to use this tool with a WALA version lower than 1.4.x?

I tried importing WALA 1.5.5 into Eclipse, but it has thousands of errors, and I tried different Eclipse versions with the same result. I saw that the 1.4.x branch already has Eclipse .project files in the repo, so it may work. But in the pom.xml of this repo, the WALA dependency version is fixed at 1.5.5. Does this repo use any features only available in WALA 1.5.5?

`TypeInference` not working

I am getting a class cast exception when trying to use TypeInference on Python code. The instruction causing the exception is as follows:

s	PythonInvokeInstruction  (id=106)
10 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > 3,11,2 @5 exception:12

The problem occurs in com.ibm.wala.cast.python.ssa.PythonInvokeInstruction.visit(IVisitor). It tries to cast a TypeInference$TypeOperatorFactory (the type of v below) to a com.ibm.wala.cast.python.ssa.PythonInstructionVisitor:

((PythonInstructionVisitor)v).visitPythonInvoke(this);

Can't resolve wildcard imports

The following is valid Python code, but currently, we cannot resolve the wildcard import. In this example, ones() cannot be resolved.

from tensorflow import *

def add(a, b):
  return a + b

c = add(ones([1, 2]), ones([2, 2]))
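Until wildcard imports are resolved, a workaround is to import the needed names explicitly, as in the earlier from tensorflow import random example. A self-contained sketch of the same pattern (ones here is a hypothetical pure-Python stand-in so the example runs without TensorFlow; same-shape inputs are used since the stand-in does not broadcast):

```python
# Hypothetical stand-in for tensorflow's ones(); with TensorFlow this
# would be `from tensorflow import ones`, which the frontend can resolve.
def ones(shape):
    rows, cols = shape
    return [[1] * cols for _ in range(rows)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Explicitly imported (or defined) names resolve; `from module import *` does not.
c = add(ones([2, 2]), ones([2, 2]))
```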

Add Strategy.reduce() to XML summaries

The API at https://www.tensorflow.org/versions/r2.9/api_docs/python/tf/distribute/Strategy#reduce returns a tensor. Can we track it? We need a summary for the return value (it's a Tensor):

From the docs:

Returns A Tensor that's the concatenation of value across replicas along axis dimension.

Example from the docs:

strategy = tf.distribute.MirroredStrategy(["GPU:0", "GPU:1"])
# A DistributedValues with component tensor of shape (2, 1) on each replica
distributed_values = strategy.experimental_distribute_values_from_function(lambda _: tf.identity(tf.constant([[1], [2]])))
@tf.function
def run():
  return strategy.gather(distributed_values, axis=0)
run()
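If the return value needs modeling, the summary would allocate a fresh tensor object and return it. A rough sketch only: the element names follow WALA's XML method-summary format, but the method descriptor and tensor class name are assumptions that must be matched against the existing entries in the project's summary file:

```xml
<!-- Hypothetical sketch; the descriptor and class names are placeholders. -->
<method name="reduce" descriptor="()LRoot;">
  <new def="result" class="Ltensorflow/Tensor" />
  <return value="result" />
</method>
```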

Default parameter values of methods.

Hello,

When attempting to generate IR for methods with default parameters, I get the following error:

com.ibm.wala.util.debug.UnimplementedError: TypeContext.cfg()
        at com.ibm.wala.util.debug.Assertions.UNREACHABLE(Assertions.java:55)
        at com.ibm.wala.cast.ir.translator.AstTranslator$TypeContext.cfg(AstTranslator.java:2746)
        at com.ibm.wala.cast.ir.translator.AstTranslator.doGlobalRead(AstTranslator.java:426)
        at com.ibm.wala.cast.python.ir.PythonCAstToIRTranslator.leaveVar(PythonCAstToIRTranslator.java:326)
        at com.ibm.wala.cast.python.ir.PythonCAstToIRTranslator.leaveVar(PythonCAstToIRTranslator.java:60)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visit(CAstVisitor.java:536)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitEntities(CAstVisitor.java:188)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitScopedEntities(CAstVisitor.java:139)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitScopedEntities(CAstVisitor.java:131)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitEntities(CAstVisitor.java:182)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitScopedEntities(CAstVisitor.java:139)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visit(CAstVisitor.java:898)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitChildren(CAstVisitor.java:407)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitAllChildren(CAstVisitor.java:415)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visit(CAstVisitor.java:472)
        at com.ibm.wala.cast.tree.visit.CAstVisitor.visitEntities(CAstVisitor.java:222)
        at com.ibm.wala.cast.ir.translator.AstTranslator.walkEntities(AstTranslator.java:4794)
        at com.ibm.wala.cast.ir.translator.AstTranslator.translate(AstTranslator.java:4943)
        at com.ibm.wala.cast.loader.CAstAbstractModuleLoader.init(CAstAbstractModuleLoader.java:127)
        at com.ibm.wala.cast.loader.SingleClassLoaderFactory.getLoader(SingleClassLoaderFactory.java:42)
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:267)
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:209)
        at com.ibm.wala.ipa.cha.SeqClassHierarchyFactory.make(SeqClassHierarchyFactory.java:51)
        ...

Minimal example:

class MyC(object):
    def __init__(self, b=3):
        self.b = b

I think the problem is in com.ibm.wala.cast/.../CAstVisitor.java around line 186, but I was not sure what the appropriate way to fix it is.

Thanks,
Sifis.
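The crash is consistent with when Python evaluates defaults: a default value expression runs once, at definition time, in the enclosing scope, so translating b=3 triggers a variable read outside any method body (hence the TypeContext). A quick sketch of that semantics:

```python
# Default values are evaluated once, at definition time, in the
# enclosing scope, not at call time inside the method body.
evaluations = []

def default_b():
    evaluations.append("evaluated")
    return 3

class MyC(object):
    def __init__(self, b=default_b()):  # runs while the class body executes
        self.b = b

count_after_definition = len(evaluations)  # already evaluated, before any call
obj1 = MyC()
obj2 = MyC(5)
count_after_calls = len(evaluations)       # not re-evaluated per call
```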

TF2?

Does Ariadne work on TensorFlow 2 code?
