AIPL

This repository contains the open-source code for the article “Improving Issue-PR Link Prediction via Knowledge-aware Heterogeneous Graph”.

Introduction

In this work, we design an approach, named AIPL, that predicts Issue-PR links on GitHub. It uses a heterogeneous graph to model multi-type GitHub data and employs metapath-based techniques to incorporate the crucial information transmitted among these data types. Given an issue and a PR, AIPL suggests whether a link between them is likely.

Environment

AIPL is implemented in PyTorch and was run on a server equipped with an NVIDIA GTX 1060 GPU.

Dependencies

  • Python 3.7.3
  • PyTorch 1.13.1
  • NumPy 1.21.5
  • Pandas 1.3.4
  • scikit-learn 1.0.2
  • scipy 1.7.3
  • DGL 0.6.1
  • NetworkX 2.6.3
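The version list above can be checked against a local environment with a short script. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and it assumes each package exposes a `__version__` attribute (note that scikit-learn is imported as `sklearn`).

```python
import importlib

# Versions from the Dependencies list, keyed by import name.
EXPECTED = {
    "torch": "1.13.1",
    "numpy": "1.21.5",
    "pandas": "1.3.4",
    "sklearn": "1.0.2",   # scikit-learn
    "scipy": "1.7.3",
    "dgl": "0.6.1",
    "networkx": "2.6.3",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Covers missing packages as well as packages that fail while loading.
        return None
    return getattr(module, "__version__", None)


def check_environment(expected=EXPECTED):
    """Map each import name to a tuple (expected, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Prefix comparison so build tags such as "1.13.1+cu117" still match.
        matches = found is not None and found.startswith(wanted)
        report[name] = (wanted, found, matches)
    return report
```

Calling `check_environment()` and printing the entries whose last field is `False` gives a quick list of packages that are missing or differ from the versions above.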

File Introduction

Dataset

We release our annotated dataset in this directory.

facebook/react & vuejs/vue

Annotated dataset based on the repositories facebook/react and vuejs/vue.

  • Index: information about the nodes and edges of the heterogeneous graph
  • Features: embeddings of the nodes
  • Training set & Test set: the annotated dataset
  • adjM.npz: the adjacency matrix of the heterogeneous graph

Note that the metapath files are too large to upload to this repository. However, all the required files can be generated by running construct_metapath.py.

Code

  • baseline: the code of our baselines, including iLinker, A-M, random walk, metapath2vec, R-GCN, GTN, Simple-HGN, HGT, HAN, SeHGNN, and MECCH.
  • AIPL: the code of AIPL. See the introduction below for details.

Code Functions

The code of our method covers three parts: building the heterogeneous graph, constructing metapaths, and training the graph-based model.
Run the scripts in this order: first build_graph.py, then construct_metapath.py, and finally AIPL_main.py.
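The three steps can be chained from a small driver script. This is only a sketch under assumptions: the script names come from this README, but the driver itself (`pipeline_commands`, `run_pipeline`) is hypothetical, and it assumes the scripts sit in the current working directory and need no command-line arguments.

```python
import subprocess
import sys

# Script order as described in this README.
PIPELINE = ["build_graph.py", "construct_metapath.py", "AIPL_main.py"]


def pipeline_commands(python=sys.executable, scripts=PIPELINE):
    """Build one command (argument list) per script, in execution order."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run the commands one after another and stop at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code, so a
        # failed graph build never feeds into metapath construction.
        subprocess.run(cmd, check=True)
```

Running `run_pipeline(pipeline_commands())` from the directory that holds the scripts executes all three steps in sequence.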
The detailed explanations are as follows:
build_graph.py
This script constructs a heterogeneous graph and generates node features for users, repositories (repos), issues, and pull requests (PRs).
It loads the relation data (user-repo, user-issue, user-PR, repo-repo, repo-issue, repo-PR, issue-issue, issue-PR, and PR-PR) from the corresponding directories and creates an adjacency matrix (adjM).
Additionally, it extracts feature vectors, such as title vectors, from CSV files to create features for repos, issues, and PRs.
The script then saves the adjacency matrix and node features as NumPy arrays for use in the later steps.
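The idea of one adjacency matrix covering several node types can be illustrated with a toy example. The sketch below is not the code of build_graph.py: the node counts and edges are made up, the function names are hypothetical, and it assumes that each node type occupies a contiguous index block and that edges are stored symmetrically. It uses plain Python lists to stay dependency-free, whereas the real script works with NumPy arrays.

```python
# Toy node counts per type (hypothetical values, fixed type order).
NODE_COUNTS = {"user": 2, "repo": 1, "issue": 2, "pr": 2}

# Toy edges, keyed by (source type, target type), using per-type local ids.
TYPED_EDGES = {
    ("user", "issue"): [(0, 0), (1, 1)],
    ("user", "pr"): [(0, 0)],
    ("repo", "issue"): [(0, 0), (0, 1)],
    ("issue", "pr"): [(0, 0)],
}


def type_offsets(node_counts):
    """Give each node type a contiguous block of global indices."""
    offsets, start = {}, 0
    for node_type, count in node_counts.items():
        offsets[node_type] = start
        start += count
    return offsets, start


def build_adjacency(node_counts, typed_edges):
    """Return a dense 0/1 adjacency matrix over all node types."""
    offsets, total = type_offsets(node_counts)
    adj = [[0] * total for _ in range(total)]
    for (src_type, dst_type), pairs in typed_edges.items():
        for src, dst in pairs:
            i = offsets[src_type] + src
            j = offsets[dst_type] + dst
            adj[i][j] = 1
            adj[j][i] = 1  # assumption: edges are treated as undirected
    return adj, offsets


adj, offsets = build_adjacency(NODE_COUNTS, TYPED_EDGES)
```

With these toy counts, users take global indices 0 to 1, the repo index 2, issues 3 to 4, and PRs 5 to 6, so the issue-PR link between issue 0 and PR 0 is stored at `adj[3][5]`.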
construct_metapath.py
This script first loads data from the edge and index files, including user-repo, user-issue, user-pr, repo-repo, repo-issue, repo-pr, issue-issue, issue-pr, and pr-pr.
It then loads the adjacency matrices and organizes them into lists by node type: users, repositories, issues, and PRs.
Next, it generates the expected metapaths from predefined patterns. These metapaths are mapped to the corresponding indices and stored as pickle files, NumPy arrays, and adjacency lists for the training step.
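To make the notion of a metapath instance concrete, the following sketch enumerates all paths in a toy graph whose node types follow a given pattern. It is an illustration under assumptions rather than the logic of construct_metapath.py: the graph, the node names, and the issue-user-pr pattern are hypothetical, and the real script works on index arrays instead of string labels.

```python
from collections import defaultdict

# Toy graph: node name -> node type (hypothetical data).
NODE_TYPE = {
    "u0": "user", "u1": "user",
    "i0": "issue", "i1": "issue",
    "p0": "pr", "p1": "pr",
}
EDGES = [("u0", "i0"), ("u0", "p0"), ("u1", "i1"),
         ("u1", "p0"), ("u1", "p1"), ("i0", "p0")]


def build_neighbors(edges):
    """Undirected adjacency lists stored as sets."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def metapath_instances(neighbors, node_type, metapath, start_nodes):
    """Enumerate every path whose node types follow `metapath` in order."""
    paths = [[n] for n in start_nodes if node_type[n] == metapath[0]]
    for wanted in metapath[1:]:
        # Extend each partial path by every neighbour of the required type.
        paths = [
            path + [nb]
            for path in paths
            for nb in sorted(neighbors[path[-1]])
            if node_type[nb] == wanted
        ]
    return paths


neighbors = build_neighbors(EDGES)
instances = metapath_instances(
    neighbors, NODE_TYPE, ("issue", "user", "pr"), ["i0", "i1"]
)
```

Here `instances` holds three paths: `i0-u0-p0`, `i1-u1-p0`, and `i1-u1-p1`. Each one connects an issue to a PR through a shared user, which is the kind of indirect evidence a metapath-based model can aggregate.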
AIPL_main.py
This script covers model training and inference. Users can train and evaluate AIPL by running AIPL_main.py.
The script handles data loading, model setup, training with early stopping, and evaluation using metrics such as accuracy, precision, recall, and F1-score.
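The early-stopping rule and the four metrics mentioned above can be sketched in a few lines of plain Python. This is a generic illustration, not the code in AIPL_main.py: the class and function names are hypothetical, the stopping criterion is assumed to be validation loss with a fixed patience, and the labels and losses are toy values.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = link)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy validation losses: the best value (0.8) is reached in epoch 1.
losses = [1.0, 0.8, 0.9, 0.85, 0.7]
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break

metrics = binary_metrics([1, 1, 0, 0], [1, 0, 0, 1])
```

In this toy run, training stops at epoch 3, after two epochs without improvement, so the later loss of 0.7 is never seen. That is the usual trade-off of a small patience value.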
Data loading and batching are implemented in data.py, preprocess.py, and tools.py in the 'utils' folder.
The AIPL model consists of intra-metapath aggregation, inter-metapath aggregation, and an attention mechanism.
This code is in base_magnn.py and magnn_lp.py under the 'magnn_model' directory and is called directly by AIPL_main.py.
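The two aggregation stages can be illustrated with a deliberately simplified sketch. It is not the model in base_magnn.py or magnn_lp.py. It assumes a mean instance encoder, replaces the attention over instances inside a metapath with a plain average, and uses a single hypothetical attention vector to weight the metapath summaries. All names and feature values are made up.

```python
import math


def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def intra_metapath(instances, features):
    """Summarize one metapath: encode each instance, then average them.

    Simplification: a mean encoder and a plain average stand in for the
    learned encoder and the attention over instances.
    """
    encoded = [mean_vectors([features[n] for n in inst]) for inst in instances]
    return mean_vectors(encoded)


def inter_metapath(summaries, attention_vector):
    """Fuse per-metapath summaries with softmax attention weights."""
    scores = [sum(a * x for a, x in zip(attention_vector, s)) for s in summaries]
    weights = softmax(scores)
    dim = len(summaries[0])
    fused = [sum(w * s[k] for w, s in zip(weights, summaries)) for k in range(dim)]
    return fused, weights


# Toy 2-dimensional node features (hypothetical values).
features = {"i0": [1.0, 0.0], "u0": [0.0, 1.0], "p0": [1.0, 1.0]}
summary = intra_metapath([["i0", "u0", "p0"]], features)
fused, weights = inter_metapath([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With an all-zero attention vector, every metapath receives the same weight, so `fused` is the plain average of the summaries. A trained attention vector would shift the weights toward the more informative metapaths.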

You can also set the hyperparameters in this file, including learning_rate, epoch_number, drop_out, the number of attention heads, and the instance encoder.
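One way to keep these settings together is a small configuration object. The sketch below is hypothetical and not taken from AIPL_main.py: the class name, the field names `num_heads` and `instance_encoder`, and every default value are placeholders chosen for illustration, not the settings used in the paper.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical container for the hyperparameters named above."""

    learning_rate: float = 0.005     # placeholder, not the repository default
    epoch_number: int = 100          # placeholder
    drop_out: float = 0.5            # placeholder
    num_heads: int = 8               # number of attention heads, placeholder
    instance_encoder: str = "mean"   # placeholder encoder name

    def validate(self):
        """Raise ValueError on out-of-range settings; return self otherwise."""
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.epoch_number < 1:
            raise ValueError("epoch_number must be at least 1")
        if not 0.0 <= self.drop_out < 1.0:
            raise ValueError("drop_out must be in [0, 1)")
        if self.num_heads < 1:
            raise ValueError("num_heads must be at least 1")
        return self
```

For example, `TrainConfig(learning_rate=0.01).validate()` returns a checked configuration, while `TrainConfig(drop_out=1.5).validate()` raises a `ValueError` before any training starts.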

Example Presentation

  1. Example 1 image
  2. Example 2 image
  3. Example 3 image

Copyright

All copyright in this tool is owned by the author of the paper.
