This folder contains code for the Bug Localization benchmark. Challenge: given an issue with a bug description, identify the files within the project that need to be modified to address the reported bug.

We provide scripts for data collection and processing, exploratory data analysis, and several baseline implementations for the task.
Dependencies are listed for the pip dependency manager, so please run the following command to install all required packages:

```shell
pip install -r requirements.txt
```
All data is stored on HuggingFace 🤗. It contains:
- Dataset with bug localization data (issue description, SHA of the repo in its initial state and in the state after the bug fix). You can access the data using the `datasets` library:

```python
from datasets import load_dataset

# Select a configuration from ["py", "java", "kt", "mixed"]
configuration = "py"
# Select a split from ["dev", "train", "test"]
split = "dev"
# Load data
dataset = load_dataset("JetBrains-Research/lca-bug-localization", configuration, split=split)
```
where the splits are:
  - `dev` -- all collected data
  - `test` -- manually selected data (labeling artifacts)
  - `train` -- all collected data that is not in `test`
and the configurations are:
  - `py` -- only `.py` files in the diff
  - `java` -- only `.java` files in the diff
  - `kt` -- only `.kt` files in the diff
  - `mixed` -- at least one `.py`, `.java`, or `.kt` file, possibly alongside file(s) with other extensions, in the diff
- Archived repos (from which we can extract repo contents at different stages and get the diffs that contain the bug fixes). They are stored as `.tar.gz` archives, so you need to run a script to download and unpack them:
  - Set `repos_path` in the config to the directory where you want to store the repos
  - Run `load_data_from_hf.py`, which downloads all repos from HF and unpacks them
- Set