open-vocabulary-learning-on-source-code-with-a-graph-structured-cache's Introduction

What is this library?

This library contains the code needed to reproduce all experiments in the paper Open Vocabulary Learning on Source Code with a Graph-Structured Cache.

It's meant to be used along with this data preprocessing library.

How do I run your code?

Installation

Python

Install the Conda package manager. Then follow the instructions here, using the environment.yml file in this library's root directory, to satisfy the Python requirements for running this library's code.

In theory our code is OS-agnostic, but we ran all our experiments on Ubuntu Linux, so that's where you're most likely to have installation success.

Cloud Integration (optional)

We ran our experiments on Amazon EC2 instances. Everything should run fine locally, but our code includes functionality to start jobs on AWS Managed Instances and save data and logs to S3.

To use these features, you'll need the AWS CLI installed and configured. You'll also need to edit the details in the experiments/temp.aws_config.py file and rename it to experiments/aws_config.py. (Warning: aws_config.py is in the .gitignore since it might contain sensitive info.)

Tests (optional)

We included as many unit tests as we could. They're in the tests directory, whose directory structure mirrors that of the rest of the library. (Warning: they take a while to run, and they expect a GPU.)

You can run them from the project root directory with python -m unittest.

Training and Evaluating models

All code in this library expects to be run from the library's root directory, with Python running modules as scripts via the -m flag. E.g. python -m experiments.VarNaming_vocab_comparison.train_models.

The general workflow is:

  1. Create some .gml files with this library.
  2. Create a Task instance for the task you want the model to perform, as shown in the file experiments/make_tasks_and_preprocess_for_experiment.py.
  3. Turn the task into preprocessed datapoints by running python -m preprocess_task_for_model.
  4. Run python -m train_model_on_task.
  5. Run python -m evaluate_model to see how your model did on a test set.
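Steps 3 to 5 can be chained from a small driver script. The following is a minimal sketch, not part of this library: the names module_command and run_workflow are hypothetical, and it assumes the three modules need no extra command-line arguments, which this README does not confirm.

```python
import subprocess
import sys

# Module names taken from steps 3-5 above, in the order they must run.
WORKFLOW_MODULES = (
    "preprocess_task_for_model",
    "train_model_on_task",
    "evaluate_model",
)


def module_command(module):
    """Build the argv for `python -m <module>` using the current interpreter."""
    return [sys.executable, "-m", module]


def run_workflow(root_dir=".", modules=WORKFLOW_MODULES):
    """Run each module in order from the library root, stopping at the first failure."""
    for module in modules:
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(module_command(module), cwd=root_dir, check=True)
```

Passing cwd=root_dir keeps every step running from the library's root directory, which is what the modules expect.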

Recreating the experiments in the paper

  1. Use this library with its existing repositories.txt file to download 18 Maven repositories and preprocess their contents into Augmented ASTs.
  2. Move the directories produced by step 1 to s3shared/18_popular_mavens/repositories. (Don't worry if you're not using S3 - it'll still work.)
  3. Navigate to s3shared/18_popular_mavens/ and run experiments/make_train_test_split.sh from the command line.
  4. For either the Fill In The Blank experiment (FITB_vocab_comparison) or the Variable Naming experiment (VarNaming_vocab_comparison), run python -m experiments.<experiment name>.make_tasks_and_preprocess. (You may need to change some args/kwargs in this file to suit your setup, e.g. changing aws_config['remote_ids']['box1'] to 'local' if you want to run locally.)
  5. Run python -m experiments.<experiment name>.train_models. (Again, you may need to change some args/kwargs in this file to suit your setup.)
  6. Run python -m experiments.<experiment name>.evaluate_models. (Again, you may need to change some args/kwargs in this file to suit your setup.)
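The module paths for steps 4 to 6 follow one naming pattern, which can be spelled out as below. This is a minimal sketch under stated assumptions: the helper names experiment_modules and experiment_commands are hypothetical, and only the experiment and script names come from the steps above.

```python
# Experiment and stage names as given in steps 4-6 above.
EXPERIMENTS = ("FITB_vocab_comparison", "VarNaming_vocab_comparison")
STAGES = ("make_tasks_and_preprocess", "train_models", "evaluate_models")


def experiment_modules(experiment_name):
    """Return the dotted module paths for one experiment, in run order."""
    if experiment_name not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment_name!r}")
    return [f"experiments.{experiment_name}.{stage}" for stage in STAGES]


def experiment_commands(experiment_name):
    """Return the command lines to run from the library's root directory."""
    return [f"python -m {module}" for module in experiment_modules(experiment_name)]
```

For example, experiment_commands("FITB_vocab_comparison") lists the three commands for the Fill In The Blank experiment. Each script may still need its args/kwargs edited by hand before it will run on your setup.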

Questions?

Feel free to get in touch with Milan Cvitkovic or any of the other paper authors. We'd love to hear from you!
