Semantic-Preserving Program Transformations

This project contains the program transformation tool and the datasets of transformed programs for the paper 'On the Generalizability of Neural Program Models with respect to Semantic-Preserving Program Transformations' (arXiv, ScienceDirect), published in Information and Software Technology (IST), Elsevier, 2021.

Structure

├── JavaMethodTransformer   # source code for the program transformation tool.
├── images                  # some figures from the paper for README.

Motivating Example:

Figure 1: A misprediction by code2vec is revealed by renaming the variable 'other' to 'var0' in the compareTo method of the java-small/test/hadoop/ApplicationAttemptId.java file.


Semantic Program Transformations:


Program Transformation Tool:

Build the jar file (JavaMethodTransformer.jar) with Maven, then run it with the following arguments:

  • args[0] = Input directory containing the original methods.
  • args[1] = Output directory for the transformed methods.

$ cd <...>/JavaMethodTransformer/
$ mvn clean compile assembly:single
$ java -jar target/jar/JavaMethodTransformer.jar <.../methods/> <.../transforms/>

Datasets of Transformed Methods:

  • single-place - apply the transformation to each candidate location separately, producing one transformed method per location.
  • all-place - apply the transformation to all candidate locations simultaneously.
  • x-percent - apply the transformation to a randomly selected X% of the candidate locations, where X is 25, 50, or 75.
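
The three strategies differ only in which candidate locations are transformed together. The following Python sketch illustrates that selection step; it is not the tool's actual code. The function name `select_locations`, the fixed seed, and rounding X% up to at least one location are all assumptions made for illustration.

```python
import math
import random


def select_locations(candidates, strategy, percent=None, seed=0):
    """Return groups of candidate locations; each group yields one transformed method.

    Hypothetical helper: the name, the seed handling, and the rounding rule
    are illustrative assumptions, not the behavior of JavaMethodTransformer.
    """
    candidates = list(candidates)
    if not candidates:
        return []
    if strategy == "single-place":
        # One transformed method per candidate location.
        return [[loc] for loc in candidates]
    if strategy == "all-place":
        # A single transformed method with every location changed.
        return [candidates]
    if strategy == "x-percent":
        if percent not in (25, 50, 75):
            raise ValueError("percent must be 25, 50, or 75")
        # Assumption: round up so that at least one location is selected.
        k = math.ceil(len(candidates) * percent / 100)
        rng = random.Random(seed)
        chosen = sorted(rng.sample(range(len(candidates)), k))
        # Keep the chosen locations in their original order.
        return [[candidates[i] for i in chosen]]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    locations = ["loc0", "loc1", "loc2", "loc3"]  # placeholder location ids
    for name, pct in [("single-place", None), ("all-place", None), ("x-percent", 50)]:
        print(name, select_locations(locations, name, percent=pct))
```

With four candidate locations, single-place gives four groups of one location each, all-place gives one group of four, and x-percent with X = 50 gives one group of two randomly chosen locations.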

Generalizability Metrics:

  • Prediction Change Percentage (PCP):

    The percentage of predictions that differ between the original program and its transformed version.

    Types of Changes:

    • CCP - the percentage of correct predictions that stay correct.
    • CWP - the percentage of correct predictions that become wrong.
    • WWSP - the percentage of wrong predictions that remain the same wrong prediction.
    • WCP - the percentage of wrong predictions that become correct.
    • WWDP - the percentage of wrong predictions that change to a different wrong prediction.
  • Sub-token Comparison:

    • Precision - the percentage of predicted sub-tokens that are true positives.
    • Recall - the percentage of ground-truth sub-tokens that are correctly predicted.
    • F1-Score - the harmonic mean of precision (P) and recall (R).
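
The metrics above can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used in the paper. It makes several assumptions: each record is a (ground truth, prediction before, prediction after) triple; percentages are taken over all records; PCP is the sum of CWP, WCP, and WWDP; and sub-token scores are computed per method name rather than aggregated over a dataset. The function names are hypothetical.

```python
from collections import Counter

CHANGE_TYPES = ("CCP", "CWP", "WWSP", "WCP", "WWDP")


def change_type(truth, before, after):
    """Classify one prediction pair into one of the five change types."""
    before_ok = before == truth
    after_ok = after == truth
    if before_ok and after_ok:
        return "CCP"   # correct stays correct
    if before_ok:
        return "CWP"   # correct becomes wrong
    if after_ok:
        return "WCP"   # wrong becomes correct
    # Both wrong: same wrong label or a different wrong label.
    return "WWSP" if before == after else "WWDP"


def change_percentages(records):
    """Percentage of each change type, plus PCP, over all records (assumed denominator)."""
    counts = Counter(change_type(t, b, a) for t, b, a in records)
    total = len(records)
    result = {k: (100.0 * counts[k] / total if total else 0.0) for k in CHANGE_TYPES}
    # Assumption: PCP counts every pair whose prediction differs after the transformation.
    result["PCP"] = result["CWP"] + result["WCP"] + result["WWDP"]
    return result


def subtoken_scores(predicted, actual):
    """Precision, recall, and F1 over the sub-tokens of a single method name."""
    # Multiset intersection counts each matching sub-token once per occurrence.
    true_positives = sum((Counter(predicted) & Counter(actual)).values())
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    records = [
        ("compareTo", "compareTo", "compareTo"),  # CCP
        ("compareTo", "compareTo", "equals"),     # CWP
        ("run", "start", "start"),                # WWSP
        ("run", "start", "run"),                  # WCP
        ("run", "start", "stop"),                 # WWDP
    ]
    print(change_percentages(records))
    print(subtoken_scores(["get", "name"], ["get", "file", "name"]))
```

In the example, each change type covers one of five records (20% each), so PCP is 60%. For the predicted sub-tokens get, name against the ground truth get, file, name, precision is 1.0, recall is 2/3, and F1 is 0.8.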

Experimental Setting:


Citation:

On the Generalizability of Neural Program Models with respect to Semantic-Preserving Program Transformations

@article{rabin2021generalizability,
  title = {On the generalizability of Neural Program Models with respect to semantic-preserving program transformations},
  author = {Md Rafiqul Islam Rabin and Nghi D.Q. Bui and Ke Wang and Yijun Yu and Lingxiao Jiang and Mohammad Amin Alipour},
  journal = {Information and Software Technology},
  volume = {135},
  pages = {106552},
  year = {2021},
  issn = {0950-5849},
  doi = {https://doi.org/10.1016/j.infsof.2021.106552},
  url = {https://www.sciencedirect.com/science/article/pii/S0950584921000379}
}
