A Framework for Manipulating Existing Datasets of Security Patches for Automatic Program Repair Techniques and Studies
The baseline usage involves three operations: collect, transform and filter.
Collects the dataset from the specified source.
The dataset is downloaded to the folder PatchBundle/data/collected/'name_of_the_dataset'
$ ./tool/PatchBundle.py collect --datasets nvd, mozilla, secretpatch, secbench
Transforms the collected dataset's records into the PatchRecord format.
The dataset is saved in the folder PatchBundle/data/transformed/'name_of_the_dataset'
$ ./tool/PatchBundle.py transform --datasets nvd, mozilla, secretpatch, secbench
Filters the transformed datasets based on the pre-defined decorators in the file PatchBundle/tool/decorators/filter.py.
$ ./tool/PatchBundle.py filter --datasets nvd, mozilla, secretpatch, secbench
To add your own filter rules, implement them as decorators and update the filter method in the respective dataset file, PatchBundle/tool/datasets/'name_of_the_dataset'.py.
Example of a custom filter:
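import functools

def custom(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Run the wrapped filter first, then post-process its result.
        dataset = func(*args, **kwargs)
        # Drop the 'commit' and 'name' columns from the dataset.
        return dataset.drop(columns=['commit', 'name'])
    return wrapper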
Add the filter:
@save
@custom  # <-- the newly added filter
@one_line_changes
@equal_adds_dels
@c_code
@load
def filter(self, path: Path):
    print(f"Filtering {self.name}")
    return self.transformed
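
The position of a filter in the decorator stack decides when it runs. The sketch below illustrates this with plain Python and no PatchBundle code. It assumes every decorator post-processes the result of the function it wraps, as `custom` does above. The `stage` helper, the `calls` list and `filter_dataset` are hypothetical names used only for illustration, and the real `load` and `save` decorators may behave differently (for example, by doing work before the wrapped call).

```python
import functools

calls = []  # records the order in which each stage finishes


def stage(name):
    """Hypothetical decorator factory: logs `name` after the wrapped call returns."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)  # inner stages run first
            calls.append(name)              # then this stage post-processes
            return result
        return wrapper
    return decorator


# Same stacking order as the filter method above (names are stand-ins only).
@stage("save")
@stage("custom")
@stage("one_line_changes")
@stage("equal_adds_dels")
@stage("c_code")
@stage("load")
def filter_dataset():
    calls.append("filter body")
    return "dataset"


filter_dataset()
print(calls)
```

Under that assumption the decorator closest to the function is applied first and the topmost one last, so a filter placed directly below `@save` sees the output of every filter beneath it.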