
wala / graph4code


GraphGen4Code: a toolkit for creating code knowledge graphs based on WALA code analysis and extraction of documentation and forum content.

Home Page: https://wala.github.io/graph4code/

License: Eclipse Public License 2.0

Languages: Jupyter Notebook 50.02%, Java 32.48%, Python 16.36%, Shell 0.59%, HTML 0.36%, DIGITAL Command Language 0.11%, Dockerfile 0.06%, q 0.02%
Topics: code-understanding, knowledge-graph

graph4code's Introduction

WALA logo

Join the chat at https://gitter.im/WALAHelp/Lobby


The T. J. Watson Libraries for Analysis (WALA) provide static analysis capabilities for Java bytecode and related languages and for JavaScript. The system is licensed under the Eclipse Public License, which has been approved by the OSI (Open Source Initiative) as a fully certified open source license. The initial WALA infrastructure was independently developed as part of the DOMO research project at the IBM T.J. Watson Research Center. In 2006, IBM donated the software to the community.

For recent updates on WALA, join the mailing list.

Core WALA Features

WALA features include:

  • Java type system and class hierarchy analysis
  • Source language framework supporting Java and JavaScript
  • Interprocedural dataflow analysis (RHS solver)
  • Context-sensitive tabulation-based slicer
  • Pointer analysis and call graph construction
  • SSA-based register-transfer language IR
  • General framework for iterative dataflow
  • General analysis utilities and data structures
  • A bytecode instrumentation library (Shrike)

Getting Started

The fastest way to get started with WALA is to use the packages in Maven Central, as noted here. See the WALA-start repo for a Gradle-based example. We are actively re-organizing the deeper wiki technical documentation. In the meantime, you can check out the tutorial slides to get an overview of WALA.

You can also watch screencasts of the WALA JavaScript tutorial here.

Finally, for now, to search the wiki documentation, we recommend a site-specific search on GitHub, e.g., a search for "call graph".
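As a concrete illustration of the core features listed above, here is a minimal sketch of building a call graph for a jar using the com.ibm.wala.core packages from Maven Central. This is not official sample code: the exact class locations and method signatures (AnalysisScopeReader, makeMainEntrypoints, makeZeroCFABuilder) have shifted between WALA releases, so treat them as assumptions against a recent release and consult the WALA-start repo for a version-matched build.

import com.ibm.wala.classLoader.Language;
import com.ibm.wala.core.util.config.AnalysisScopeReader;
import com.ibm.wala.ipa.callgraph.AnalysisCacheImpl;
import com.ibm.wala.ipa.callgraph.AnalysisOptions;
import com.ibm.wala.ipa.callgraph.AnalysisScope;
import com.ibm.wala.ipa.callgraph.CallGraph;
import com.ibm.wala.ipa.callgraph.CallGraphBuilder;
import com.ibm.wala.ipa.callgraph.CallGraphStats;
import com.ibm.wala.ipa.callgraph.Entrypoint;
import com.ibm.wala.ipa.callgraph.impl.Util;
import com.ibm.wala.ipa.cha.ClassHierarchy;
import com.ibm.wala.ipa.cha.ClassHierarchyFactory;

public class CallGraphSketch {
  public static void main(String[] args) throws Exception {
    // args[0]: path to the jar (or class-file directory) to analyze
    AnalysisScope scope =
        AnalysisScopeReader.instance.makeJavaBinaryAnalysisScope(args[0], null);
    // Class hierarchy analysis over everything in the scope
    ClassHierarchy cha = ClassHierarchyFactory.make(scope);
    // Use all main() methods as entry points (helper name and arity differ slightly across releases)
    Iterable<Entrypoint> entrypoints = Util.makeMainEntrypoints(cha);
    AnalysisOptions options = new AnalysisOptions(scope, entrypoints);
    // 0-CFA pointer analysis and call graph construction
    CallGraphBuilder<?> builder =
        Util.makeZeroCFABuilder(Language.JAVA, options, new AnalysisCacheImpl(), cha);
    CallGraph cg = builder.makeCallGraph(options, null);
    System.out.println(CallGraphStats.getStats(cg));
  }
}

Run it against any jar on your classpath; the printed statistics (number of nodes and edges) give a quick sanity check that the scope and entry points were set up correctly.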

Documentation

We're hosting documentation for WALA on the GitHub wiki. We've chosen a wiki format just so that you can contribute. Don't be shy!

The WALA publications department is populating this wiki with technical documentation on a demand-driven basis, driven by questions posted to the wala-wala mailing list and also Gitter. We recommend this page for searching the mailing list archives.

The JavaDoc documentation for the WALA code is updated continuously. If you think a particular file deserves better javadoc, please open a feature request.

Getting Help

To get help with WALA, please either email the mailing list, ask a question on Gitter, or open an issue.

Required Java Versions

Most components of each official WALA release are built for use with Java 11 or newer. However, components that use Eclipse require at least Java 17.

Building from Source

WALA uses Gradle as its build system. If you intend to modify or build WALA yourself, then see the Gradle-specific README for more instructions and helpful tips.

WALA Tools in JavaScript

Recently, we have been expanding the set of WALA tools implemented in JavaScript. We have released a normalizer and some basic program analyses for JavaScript in the JS_WALA GitHub repository. We have also made available jsdelta and WALA Delta, delta debuggers for JavaScript-processing tools. Please see the linked GitHub repositories for further details on these tools.

WALA-Based Tools

Several groups have built open-source tools that enhance or build on WALA that may be useful to other WALA users. For details, see the Wala-based tools page.

Acknowledgements

YourKit logo

We thank YourKit for providing WALA developers with a complimentary license for their excellent Java profiler, which we use to improve and maintain WALA performance.

graph4code's People

Contributors

hsamulowitz, ibrahimabdelaziz, jpmccu, juliandolby, ksrinivs64


graph4code's Issues

Need Help with flowsTo Query on example1.nq sample

Hi, I am able to run the following query with Apache Jena on the example1.nq file, but the query clause
?read graph4code:flowsTo+ ?fit1 . is not returning the flowsTo objects. Is there anything missing to connect the data flow on pandas.read_csv?

Query:
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX graph4code: <http://purl.org/twc/graph4code/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select * where {
graph ?g {
?read rdfs:label "pandas.read_csv." .
?fit1 schema:about "fit" .
#?read graph4code:flowsTo+ ?fit1 .
?clf rdfs:label ?name .
?clf sio:SIO_000061 ?p .
?clf schema:text ?txt .
}
}
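For reference, a minimal, hedged sketch of running this kind of query programmatically with Apache Jena (which the poster mentions) is shown below. It parses example1.nq into an in-memory dataset and executes a cut-down version of the query above; the label strings and the graph4code:flowsTo predicate are copied from that query and must match the literals actually stored in the .nq file (including any trailing period) for the property path to return rows.

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.riot.RDFDataMgr;

public class FlowsToQuery {
  public static void main(String[] args) {
    // Parse the N-Quads file into an in-memory dataset of named graphs
    Dataset ds = RDFDataMgr.loadDataset("example1.nq");
    String q = String.join("\n",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "PREFIX schema: <http://schema.org/>",
        "PREFIX graph4code: <http://purl.org/twc/graph4code/>",
        "SELECT * WHERE { GRAPH ?g {",
        "  ?read rdfs:label \"pandas.read_csv.\" .",
        "  ?fit1 schema:about \"fit\" .",
        // Transitive data-flow edge; returns no rows if the labels above
        // do not match the literals stored in the graph
        "  ?read graph4code:flowsTo+ ?fit1 .",
        "} }");
    try (QueryExecution qe = QueryExecutionFactory.create(q, ds)) {
      ResultSetFormatter.out(System.out, qe.execSelect());
    }
  }
}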

packages in environments.txt

Hi,

I tried to install with requirements.txt, but many of the packages couldn't be installed due to version problems (either the version is not found by pip or there are compatibility issues). I have a few questions:

  1. Which version of Python 3 are you using? Could you share it?
  2. Is it possible for you to list all the package versions you are using? I suspect the problem is that we don't have the full list of packages with their versions.
  3. Some dependencies are installed to extract the documentation. Could you list the packages required for your code to work? For the documentation-extraction part I can use the packages required for the dataset I'm dealing with.

Thanks a lot in advance for answering.

Relational dependencies not being tracked?

It doesn't seem like there's any way to run static analysis that tracks relational dependencies. Even when running the analysis over a package, building the call graph only seems to look at one module at a time.

Clarification on the Graph's Purpose and Utility

I recently stumbled upon this repository and found the idea of leveraging graphs for code analysis to be quite engaging.

However, as I explored the repository, I became curious to know:

  • What is the fundamental aim behind constructing this graph?
  • In what practical scenarios can this graph be employed?
  • How might a developer benefit from the insights derived from this graph? Could it aid in code optimization, debugging, or perhaps facilitating collaboration among developers by providing a clearer view of code dependencies and structures?
  • Are there any specific tasks or challenges in code development that this graph is particularly well-suited to address?
  • How does this graph integrate with existing development tools or workflows, if at all?

I'm keen to understand how this graph could be useful in a practical setting. I believe that understanding how it can be applied to practical applications, and recognizing its benefits, would significantly enhance appreciation and comprehension of this project within the developer community.

Looking forward to your response.

Built-in modules and modules with aliases have issues generating docstring knowledge graphs?

Thank you very much for taking the time to answer my questions amid your busy schedule. I have encountered some issues again. When I generate the knowledge graph of the docstrings, I find that built-in modules such as inspect cannot have their knowledge graph generated normally, because pip cannot download built-in modules. For modules with aliases such as sklearn, the package downloaded by pip is the one named sklearn rather than scikit-learn, which causes problems when generating the knowledge graph. However, the import sklearn in the sample you provided generates the knowledge graph normally.

CodeBreakerPrereq-0.0.1-SNAPSHOT.jar not exists

When I execute `bash setup.sh` to build the code analysis library, I get the error "Failed to execute goal org.apache.maven.plugins:maven-install-plugin:2.4:install-file (default-cli) on project standalone-pom: The specified file '/workspace/graph4code-master/code_breaker_prereq/target/CodeBreakerPrereq-0.0.1-SNAPSHOT.jar' not exists", and I can't find the jar file. May I ask how to deal with it? Thank you!

problem with ES_ports in "Creating Forums graph" last command

When executing the last command for the forums graph:

python -u create_forum_graph.py --stackoverflow_in_dir ../output/ai_stackexchange_dump/ --docstring_dir ../output/modules_out/ --graph_output_dir ../output/ai_stackexchange_graph/ --pickled_files_out ../output/ai_stackexchange_dump/ --index_name ai_stackexchange --graph_main_prefix ai_stackexchange

the first try worked, but now every time I try to execute this command it blocks here (screenshot, 2022-06-08 18:55).

Also, as soon as I launch this last command for the forums graph, the terminal where Elasticsearch is running outputs the highlighted messages (screenshot, 2022-06-08 18:58).

These are the code references for this output in util.py, called inside create_forum_graphs.py (method create_stackoverflow_graph()) (screenshots, 2022-06-20 15:17-15:18).

Here is the first try that worked successfully (screenshot, 2022-06-08 17:55).

Error: Could not open the site: http://graph4code.whyis.io/

Hello, I read your paper and found it very enlightening. I have imported the generated sample knowledge graph into the Apache Jena database. I found that the URI website (http://graph4code.whyis.io/) (http://purl.org/twc/graph4code/python) for the functions/modules is no longer valid. Is there a description of the functions/modules on that website, or anything like that? I'm very curious about this. I would be very grateful if you could reply!

Does WALA support Python?

You mentioned in the paper that WALA can analyze Python source code, but I can't find Python support in WALA's README. How can I use WALA for analysing Python source code?

How to update graph4code with latest data?

I'm quite curious and interested in understanding how to keep graph4code up to date with the latest data. Could you elaborate on the steps or procedures involved in updating graph4code? Also, is it possible to automate this update process to ensure that the latest data is always incorporated into graph4code on an ongoing basis?

help and next steps...

I have a question/issue: I am trying to follow the process specified here on example_scripts/test1.py. Ignoring some errors I am getting during the steps, I am reasonably successful in running the steps up to and including "Creating docstrings graph."

  1. When I compare the example triple {example1.py, example1.json, example1.nq} in the sample_graph directory with what I get for test1.py, the docstrings_triples.nq file I have is much larger than example1.nq: about 300,000 lines vs. 615 lines. I would appreciate an explanation of why, and a recommendation on how to "fix" it. What was done to create example1.nq versus the docstrings_triples.nq file I am getting?
  2. Using example1.nq as the example, is it possible to share an outline and queries of what comes next, i.e., loading that nq file into a db and example queries on this graph that could be of interest, just to kick-start some possibilities? (A sketch of one option follows below.)
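On point 2, a minimal, hedged sketch of one way to load an N-Quads file into a database is shown below, using an Apache Jena TDB2 store. The file name docstrings_triples.nq and the rdfs:label pattern are taken from this thread and the flowsTo issue above; the store location ./graph4code-tdb2 is just an illustrative path.

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;

public class LoadGraphIntoTDB2 {
  public static void main(String[] args) {
    // Open (or create) a persistent TDB2 store at an illustrative location
    Dataset ds = TDB2Factory.connectDataset("./graph4code-tdb2");

    // Load the generated N-Quads file inside a write transaction
    Txn.executeWrite(ds, () -> RDFDataMgr.read(ds, "docstrings_triples.nq"));

    // Run a simple sanity-check query against the named graphs
    String q = String.join("\n",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT ?g ?s ?label WHERE { GRAPH ?g { ?s rdfs:label ?label } } LIMIT 10");
    Txn.executeRead(ds, () -> {
      try (QueryExecution qe = QueryExecutionFactory.create(q, ds)) {
        ResultSetFormatter.out(System.out, qe.execSelect());
      }
    });
  }
}

Any other triple store that accepts N-Quads (for example, Fuseki serving the same TDB2 directory) would work equally well for interactive querying.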

Error: Could not find or load main class util.RunTurtleSingleAnalysis

@ksrinivs64
When I try to run the code analysis part using:
java -DoutputDir=hello.json -cp codebreaker3.jar util.RunTurtleSingleAnalysis hello.py null null

I got the error:

Error: Could not find or load main class util.RunTurtleSingleAnalysis
Caused by: java.lang.ClassNotFoundException: util.RunTurtleSingleAnalysis

Could you help me with it?

no jars file

Hi, I followed the tutorial to download everything, but didn't find the jars/codebreaker3.jar file.

I've read some of the closed issues, which state that downloading graph4code_prereq_jars.tar solves the problem, but for me downloading that file only gives the contents shown in the attached screenshot, with no sign of the mentioned file. Am I doing something wrong or missing something?

how to get the file 'class.map' for my own environment

Hello, I read your paper "A Toolkit for Generating Code Knowledge Graphs" and think it is very nice and insightful. I am trying to run your code to extract a graph from my own code and experiments. Since your class.map corresponds to some specific libraries, I would like to ask how I can get a class.map file for my own environment.

Issue in tasks 3-4 "collecting docstrings documentation" for scripts that contain built-in modules

I am trying to create/collect a series of git repositories containing multiple Python scripts.

However, there is a problem in one of your scripts when I collect the docstring documentation (to be run in the directory containing all the analysis output previously collected): if one of the candidate Python scripts contains a built-in module like future / functools / unittest, the script src/inspect_docstrings_per_module.py (called inside scripts/inspect_modules_for_docstrings.sh) returns errors (as shown in the screenshots below), and the docstrings output for those modules (present in the top modules to inspect) is not created correctly.

The problem therefore seems to be in the script run at step 3 (src/generate_top_modules.py), which shouldn't mark these built-in modules for installation (with pip install) during the conda environment creation done at step 4. (I am referring to the steps on your GitHub page: https://github.com/wala/graph4code)

Right now I cannot proceed with analysing custom GitHub repositories, because they always contain Python scripts that include some built-in module, which results in this error at step 4 (after being selected as a top module in /output/top_modules.json at step 3).

(screenshots, 2022-08-10 12:37)

macOS arm64 laptop not compatible

Hi, here are the details for my Mac: chip: Apple M1 Pro. I have been trying to install a compatible version of Elasticsearch but nothing is working. Can you give me the link to the version I need to install, as well as the steps I need to follow to ensure Elasticsearch will work on my device? Is there anything in the graph4code steps that I need to do differently apart from what is listed? Please let me know! Thanks
