This project contains the program transformation tool and the datasets of transformed programs for the paper 'On the Generalizability of Neural Program Models with respect to Semantic-Preserving Program Transformations' (arXiv, ScienceDirect), accepted at the Information and Software Technology (IST) journal, Elsevier, 2021.
├── JavaMethodTransformer # source code for the program transformation tool.
├── images # some figures from the paper for README.
Figure 1: A misprediction in code2vec is revealed by renaming the `other` variable as `var0` in the `compareTo` method of the `java-small/test/hadoop/ApplicationAttemptId.java` file.
- Variable Renaming (VN) - renames a variable.
- Permute Statement (PS) - swaps two independent statements in a basic block.
- Unused Statement (UN) - inserts an unused string declaration.
- Loop Exchange (LX) - replaces `for` loops with `while` loops or vice versa.
- Switch to If (SF) - replaces a `switch` statement with an equivalent `if` statement.
- Boolean Exchange (BX) - switches the value of a `boolean` variable and propagates this change in the method.
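To make the idea of a semantic-preserving transformation concrete, here is a small hand-written sketch of what Loop Exchange might look like on a simple method. The class and method names are hypothetical, and the "transformed" version was written by hand for illustration; it is not actual output of JavaMethodTransformer.

```java
// Hypothetical illustration; not generated by JavaMethodTransformer.
final class LoopExchangeExample {

    // Original method: sums an array with a for loop.
    static int sumOriginal(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // After Loop Exchange (LX): the for loop is rewritten as an equivalent while loop.
    static int sumTransformed(int[] values) {
        int total = 0;
        int i = 0;
        while (i < values.length) {
            total += values[i];
            i++;
        }
        return total;
    }
}
```

Both methods return the same result for every input, so a model that generalizes well would be expected to predict the same method name for each.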
Create the jar file (JavaMethodTransformer.jar) using Maven
and then call the jar with the following arguments:
- args[0] = Input directory to the original methods.
- args[1] = Output directory to the transformed methods.
$ cd <...>/JavaMethodTransformer/
$ mvn clean compile assembly:single
$ java -jar target/jar/JavaMethodTransformer.jar <.../methods/> <.../transforms/>
- single-place - apply the transformation to each candidate location separately.
- all-place - apply the transformation to all candidate locations simultaneously.
- x-percent - apply the transformation to a randomly selected X% of the candidate locations, where X is 25, 50, or 75.
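The three strategies above differ only in which candidate locations are selected. The following is a minimal sketch of that selection step, assuming candidate locations are identified by integer indices. The class name, method names, and the round-up rule for x-percent are assumptions made for illustration and may differ from what the tool does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of location selection; not taken from JavaMethodTransformer.
final class PlacementStrategies {

    // single-place: one selection per candidate, each containing a single location.
    static List<List<Integer>> singlePlace(int candidateCount) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < candidateCount; i++) {
            result.add(Collections.singletonList(i));
        }
        return result;
    }

    // all-place: a single selection containing every candidate location.
    static List<Integer> allPlace(int candidateCount) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < candidateCount; i++) {
            all.add(i);
        }
        return all;
    }

    // x-percent: a random subset covering X% of the candidates.
    // Rounding up to a whole location is an assumption.
    static List<Integer> xPercent(int candidateCount, int percent, Random rng) {
        List<Integer> shuffled = allPlace(candidateCount);
        Collections.shuffle(shuffled, rng);
        int keep = (int) Math.ceil(candidateCount * percent / 100.0);
        List<Integer> chosen = new ArrayList<>(shuffled.subList(0, keep));
        Collections.sort(chosen);
        return chosen;
    }
}
```

For a method with 8 candidate locations, `singlePlace(8)` yields 8 separate selections, `allPlace(8)` yields one selection of 8 locations, and `xPercent(8, 25, rng)` yields one selection of 2 locations.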
- Prediction Change Percentage (PCP):
  The percentage of predictions that change after the transformation.
  Type of Changes:
  - CCP - the percentage of correct predictions that stay correct.
  - CWP - the percentage of correct predictions that become wrong.
  - WWSP - the percentage of wrong predictions that stay the same wrong prediction.
  - WCP - the percentage of wrong predictions that become correct.
  - WWDP - the percentage of wrong predictions that change to a different wrong prediction.
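The change categories above can be derived from three values per method: the true name, the prediction before the transformation, and the prediction after it. The sketch below shows one way to do that, along with a simple PCP computation. The class, enum, and method names are hypothetical, and exact string equality is assumed as the test for a correct prediction.

```java
// Hypothetical helper for illustration; not part of the released tool.
final class PredictionChange {

    enum Type { CCP, CWP, WWSP, WCP, WWDP }

    // Classifies one (before, after) prediction pair against the true method name.
    static Type classify(String truth, String before, String after) {
        boolean beforeCorrect = truth.equals(before);
        boolean afterCorrect = truth.equals(after);
        if (beforeCorrect) {
            return afterCorrect ? Type.CCP : Type.CWP;
        }
        if (afterCorrect) {
            return Type.WCP;
        }
        // Both predictions are wrong: same wrong name or a different one.
        return before.equals(after) ? Type.WWSP : Type.WWDP;
    }

    // PCP: percentage of pairs whose prediction differs after the transformation.
    static double pcp(String[] before, String[] after) {
        int changed = 0;
        for (int i = 0; i < before.length; i++) {
            if (!before[i].equals(after[i])) {
                changed++;
            }
        }
        return before.length == 0 ? 0.0 : 100.0 * changed / before.length;
    }
}
```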
- Sub-token Comparison:
  - Precision - the percentage of predicted sub-tokens that are true positives.
  - Recall - the percentage of true sub-tokens that are correctly predicted.
  - F1-Score - the harmonic mean of precision (P) and recall (R).
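As a worked illustration of the sub-token metrics, the sketch below scores a single predicted method name against the true name. The class and method names are hypothetical, and splitting names on camelCase boundaries and underscores is an assumption about how sub-tokens are obtained. Values are returned as fractions in [0, 1]; multiply by 100 for percentages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the sub-token splitting rule is an assumption.
final class SubtokenMetrics {

    // Splits a name such as "compareTo" into lowercase sub-tokens ["compare", "to"].
    static List<String> subtokens(String name) {
        List<String> tokens = new ArrayList<>();
        for (String part : name.split("(?<=[a-z0-9])(?=[A-Z])|_")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase());
            }
        }
        return tokens;
    }

    // Returns {precision, recall, f1} for one predicted name against the true name.
    static double[] score(String predicted, String truth) {
        List<String> pred = subtokens(predicted);
        List<String> remaining = new ArrayList<>(subtokens(truth));
        int truthSize = remaining.size();
        int truePositives = 0;
        for (String token : pred) {
            // Each true sub-token can be matched at most once.
            if (remaining.remove(token)) {
                truePositives++;
            }
        }
        double precision = pred.isEmpty() ? 0.0 : (double) truePositives / pred.size();
        double recall = truthSize == 0 ? 0.0 : (double) truePositives / truthSize;
        // F1 = 2PR / (P + R), defined as 0 when both P and R are 0.
        double f1 = (precision + recall) == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```

For example, predicting `getName` when the true name is `getFileName` matches two of the three true sub-tokens, giving a precision of 1.0, a recall of about 0.67, and an F1-score of 0.8.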
- Target Downstream Task:
  - Method Name Prediction (a.k.a. Code Summarization)
- Subject Neural Program Models:
  - code2vec model - represents programs with AST paths (monolithic path embeddings).
  - code2seq model - represents programs with AST paths (encode paths node-by-node).
  - GGNN model - represents programs with graphs (semantic edges + nodes).
- Original Java Datasets:
@article{rabin2021generalizability,
title = {On the generalizability of Neural Program Models with respect to semantic-preserving program transformations},
author = {Md Rafiqul Islam Rabin and Nghi D.Q. Bui and Ke Wang and Yijun Yu and Lingxiao Jiang and Mohammad Amin Alipour},
journal = {Information and Software Technology},
volume = {135},
pages = {106552},
year = {2021},
issn = {0950-5849},
doi = {https://doi.org/10.1016/j.infsof.2021.106552},
url = {https://www.sciencedirect.com/science/article/pii/S0950584921000379}
}