This repository stores the source code used in the journal paper "Predicting vulnerability inducing function versions using node embeddings and graph neural networks".
We aim to provide a vulnerability prediction model that runs after every code change and identifies the vulnerability-inducing functions in that version. We also assess how well node-based and token-based source code representations over abstract syntax trees (ASTs) predict vulnerability-inducing functions.
This research was conducted mainly on the Wireshark project, using the Wireshark security advisories and the Wireshark bug repository.
The dataset created and used in this study can be accessed through our submission in the Mendeley repository.
Please cite our work if you use our dataset or source code.
@article{SAHIN2022106822,
title = {Predicting vulnerability inducing function versions using node embeddings and graph neural networks},
journal = {Information and Software Technology},
volume = {145},
pages = {106822},
year = {2022},
issn = {0950-5849},
doi = {https://doi.org/10.1016/j.infsof.2022.106822},
url = {https://www.sciencedirect.com/science/article/pii/S0950584922000015},
author = {Sefa Eren Şahin and Ecem Mine Özyedierler and Ayse Tosun},
keywords = {Software vulnerabilities, Graph neural networks, Graph embeddings, Abstract syntax trees},
}
Python 3.6+ is required. Additionally, the LLVM backend is required for AST parsing.
First, install the requirements:
pip install -r requirements.txt
Then add the project root to PYTHONPATH, using the method appropriate for your OS.
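To confirm that the PYTHONPATH change took effect, a small check like the one below can help. It is only a sketch: the helper name `is_importable` is hypothetical, and it assumes that the `vulnerability_prediction` directory is importable as a top-level package once the project root is on the path.

```python
import importlib.util


def is_importable(package_name):
    """Return True if Python can locate a top-level package on the current sys.path."""
    return importlib.util.find_spec(package_name) is not None
```

Calling `is_importable("vulnerability_prediction")` from a fresh interpreter should return True when the path is set correctly.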
Rename vulnerability_prediction/config/config.yaml.example
to vulnerability_prediction/config/config.yaml
and fill in the values accordingly.
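The configuration step can also be scripted. The sketch below is an assumption-laden convenience, not part of the repository: the helper `ensure_config` is hypothetical, and it copies the example file instead of renaming it so that the template stays available. It never overwrites an existing config.yaml.

```python
import shutil
from pathlib import Path


def ensure_config(config_dir):
    """Copy config.yaml.example to config.yaml unless config.yaml already exists."""
    config_dir = Path(config_dir)
    example = config_dir / "config.yaml.example"
    target = config_dir / "config.yaml"
    if target.exists():
        return target  # never overwrite values that were already filled in
    if not example.exists():
        raise FileNotFoundError(f"missing template: {example}")
    shutil.copyfile(example, target)
    return target
```

For this project the call would be `ensure_config("vulnerability_prediction/config")`, run from the project root. The values in the resulting file still have to be filled in by hand.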
Currently, scrapers for the Wireshark and Mozilla Foundation bug repositories are implemented. To collect the bug data, execute their scripts:
python vulnerability_prediction/scrapers/wireshark_scraper.py
python vulnerability_prediction/scrapers/mozilla_scraper.py
Commit mining is done sequentially. First, file changes are extracted. Then, commits are matched to bugs. Finally, vulnerability-inducing code changes are identified with the SZZ algorithm.
Execute the following scripts in this order:
python vulnerability_prediction/commit_mining/extract_file_changes.py
python vulnerability_prediction/commit_mining/bug_commit_matching.py
python vulnerability_prediction/commit_mining/szz.py
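For readers unfamiliar with SZZ, the core idea can be illustrated on in-memory data: take the lines that a fixing commit removed or changed, and blame them back to the commits that introduced them. The sketch below is not the code in szz.py. The function names and the snapshot format are hypothetical, and lines are matched by exact text, which is a strong simplification of what `git blame` does.

```python
from collections import Counter


def blame(history, upto):
    """Map each line present just before commit index `upto` to the commit that introduced it.

    `history` is a list of (commit_id, lines) snapshots of one file, oldest first.
    """
    origin = {}
    for commit_id, lines in history[:upto]:
        current = set(lines)
        # forget lines that no longer exist, then record newly introduced ones
        origin = {line: cid for line, cid in origin.items() if line in current}
        for line in lines:
            origin.setdefault(line, commit_id)
    return origin


def szz_candidates(history, fix_index):
    """Count, per commit, the blamed lines that the fix commit removed or changed."""
    if fix_index < 1:
        raise ValueError("the fix commit needs at least one predecessor")
    before = history[fix_index - 1][1]
    after = set(history[fix_index][1])
    origin = blame(history, fix_index)
    removed = [line for line in before if line not in after]
    return Counter(origin[line] for line in removed)


# Toy history: c2 introduces an off-by-one copy, c3 fixes it.
history = [
    ("c1", ["int n = len;", "buf = malloc(n);"]),
    ("c2", ["int n = len;", "buf = malloc(n);", "memcpy(buf, src, len + 1);"]),
    ("c3", ["int n = len;", "buf = malloc(n);", "memcpy(buf, src, n);"]),
]
candidates = szz_candidates(history, fix_index=2)
```

Here `candidates` blames commit c2, because the only line the fix removed was introduced there. Practical SZZ implementations additionally work on real diffs and usually ignore whitespace-only and comment-only changes.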
Make sure that you have the LLVM backend and Clang installed. Then execute:
python vulnerability_prediction/ast_extraction/ast_extractor.py
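If the extractor fails to start, it may help to check first whether Clang is visible on PATH. The sketch below uses hypothetical helper names and only looks for the `clang` executable. It does not verify the LLVM libraries or any Python bindings that the extractor may need.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is absent."""
    return shutil.which(name)


def clang_version():
    """Return the first line of `clang --version`, or None if Clang is not on PATH."""
    path = find_tool("clang")
    if path is None:
        return None
    # stdout=PIPE with universal_newlines keeps this compatible with Python 3.6
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

A return value of None from `clang_version()` suggests that Clang is either not installed or not on PATH for the current shell.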