A curated list of awesome research papers, datasets, and tools for applying machine learning techniques to code intelligence. Code intelligence leverages machine learning and data mining techniques to mine knowledge from large-scale code corpora, with the goal of developing intelligent tools that improve the quality and productivity of computer programming.
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2018 | A Survey of Machine Learning for Big Code and Naturalness | Allamanis et al. | CSUR | Code | Y |
2021 | A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research | Watson et al. | TOSEM | Code | Y |
2020 | Synergy between Machine/Deep Learning and Software Engineering: How Far Are We? | Wang et al. | arXiv | Code | Y |
2020 | A Survey on Deep Learning for Software Engineering | Yang et al. | CSUR | Code | Y |
2020 | Deep Learning & Software Engineering: State of Research and Future Directions | Devanbu et al. | arXiv | Code | Y |
2021 | CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation | Lu et al. | arXiv | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2017 | Synthesizing benchmarks for predictive modeling | Cummins et al. | CGO | Code | Y |
2015 | Toward deep learning software repositories | White et al. | MSR | Code | Y |
2016 | Summarizing source code using a neural attention model | Iyer et al. | ACL | Code | Y |
2016 | A convolutional attention network for extreme summarization of source code | Allamanis et al. | ICML | Code | Y |
2019 | Open Vocabulary Learning on Source Code with a Graph-Structured Cache | Cvitkovic et al. | ICML | Code | Y |
2021 | A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code | Chirkova et al. | NAACL | Code | Y |
2020 | Learning and Evaluating Contextual Embedding of Source Code | Kanade et al. | ICML | Code | Y |
2020 | CodeBERT: A pre-trained model for programming and natural languages | Feng et al. | EMNLP | Code | Y |
2020 | Big code != big vocabulary: Open-vocabulary models for source code | Karampatsis et al. | ICSE | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2015 | How can I use this method? | Moreno et al. | ICSE | Code | Y |
2017 | An unsupervised approach for discovering relevant tutorial fragments for APIs | Jiang et al. | ICSE | Code | Y |
2017 | DeepAM: Migrate APIs with Multi-Modal Sequence to Sequence Learning | Gu et al. | IJCAI | Code | Y |
2016 | Deep API learning | Gu et al. | FSE | Code | Y |
2017 | Exploring API Embedding for API Usages and Applications | Nguyen et al. | ICSE | Code | Y |
2019 | SAR: learning cross-language API mappings with little knowledge | Bui et al. | FSE | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2016 | Convolutional neural networks over tree structures for programming language processing | Mou et al. | AAAI | Code | Y |
2020 | Modeling programs hierarchically with stack-augmented LSTM | Liu et al. | JSS | Code | Y |
2019 | A novel neural source code representation based on abstract syntax tree | Zhang et al. | ICSE | Code | Y |
2018 | Deep code comment generation | Hu et al. | ICPC | Code | Y |
2019 | code2vec: Learning distributed representations of code | Alon et al. | POPL | Code | Y |
2019 | code2seq: Generating Sequences from Structured Representations of Code | Alon et al. | ICLR | Code | Y |
2020 | Structural language models of code | Alon et al. | ICML | Code | Y |
2017 | A syntactic neural model for general-purpose code generation | Yin et al. | ACL | Code | Y |
2018 | Tree-to-tree neural networks for program translation | Chen et al. | ICLR | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2018 | Neural code comprehension: A learnable representation of code semantics | Ben-Nun et al. | NeurIPS | Code | Y |
2020 | IR2Vec: LLVM IR based Scalable Program Embeddings | Venkatakeerthy et al. | TACO | Code | Y |
2020 | Compiler-based graph representations for deep learning models of code | Brauckmann et al. | CC | Code | Y |
2021 | ProGraML: Graph-based Deep Learning for Program Optimization and Analysis | Cummins et al. | ICML | Code | Y |
2021 | How could Neural Networks understand Programs? | Peng et al. | ICML | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2018 | Learning to represent programs with graphs | Allamanis et al. | ICLR | Code | Y |
2017 | SmartPaste: Learning to adapt source code | Allamanis et al. | arXiv | Code | Y |
2018 | Generative code modeling with graphs | Brockschmidt et al. | ICLR | Code | Y |
2020 | Flow2Vec: value-flow-based precise code embedding | Sui et al. | OOPSLA | Code | Y |
2021 | ProGraML: Graph-based Deep Learning for Program Optimization and Analysis | Cummins et al. | ICML | Code | Y |
2021 | PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair | Chen et al. | NeurIPS | Code | Y |
2017 | Intelligent development environment and software knowledge graph | Lin et al. | JCST | Code | Y |
2020 | Graph4Code: A machine interpretable knowledge graph for code | Abdelaziz et al. | arXiv | Code | Y |
2020 | Exploiting Code Knowledge Graph for Bug Localization via Bi-directional Attention | Zhang et al. | ICPC | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2018 | Code vectors: Understanding programs through embedded abstracted symbolic traces | Henkel et al. | FSE | Code | Y |
2019 | Learning to Represent Edits | Yin et al. | ICLR | Code | Y |
2019 | Neural Networks for Modeling Source Code Edits | Zhao et al. | arXiv | Code | Y |
2020 | CC2Vec: Distributed representations of code changes | Hoang et al. | ICSE | Code | Y |
2019 | On Learning Meaningful Code Changes via Neural Machine Translation | Tufano et al. | ICSE | Code | Y |
2021 | Copy that! Editing Sequences by Copying Spans | Panthaplackel et al. | AAAI | Code | Y |
2020 | A Structural Model for Contextual Code Changes | Brody et al. | OOPSLA | Code | Y |
2021 | Learning Structural Edits via Incremental Tree Transformations | Yao et al. | ICLR | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2018 | Deep code search | Gu et al. | ICSE | Code | Y |
2016 | Deep learning code fragments for code clone detection | White et al. | ASE | Code | Y |
2018 | DeepSim: deep learning code functional similarity | Zhao et al. | FSE | Code | Y |
2018 | Improving automatic source code summarization via deep reinforcement learning | Wan et al. | ASE | Code | Y |
2019 | Multi-modal attention network learning for semantic source code retrieval | Wan et al. | ASE | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2016 | Convolutional neural networks over tree structures for programming language processing | Mou et al. | AAAI | Code | Y |
2018 | Adapting neural text classification for improved software categorization | LeClair et al. | ICSME | Code | Y |
2019 | Bilateral dependency neural networks for cross-language algorithm classification | Bui et al. | SANER | Code | Y |
2018 | SCC: Automatic classification of code snippets | Alreshedy et al. | SCAM | Code | Y |
2020 | SCC++: predicting the programming language of questions and snippets of Stack Overflow | Alrashedy et al. | JSS | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2013 | Lexical statistical machine translation for language migration | Nguyen et al. | FSE | Code | Y |
2015 | Using machine translation for converting Python 2 to Python 3 code | Aggarwal et al. | Technical Report | Code | Y |
2015 | Divide-and-conquer approach for multi-phase statistical migration for source code | Nguyen et al. | ASE | Code | Y |
2018 | Tree-to-tree neural networks for program translation | Chen et al. | ICLR | Code | Y |
2017 | DeepAM: Migrate APIs with Multi-Modal Sequence to Sequence Learning | Gu et al. | IJCAI | Code | Y |
2020 | Unsupervised translation of programming languages | Lachaux et al. | NeurIPS | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2018 | Learning to optimize tensor programs | Chen et al. | NeurIPS | Code | Y |
2020 | FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System | Zheng et al. | ASPLOS | Code | Y |
2020 | Ansor: Generating high-performance tensor programs for deep learning | Zheng et al. | OSDI | Code | Y |
2013 | Predictive modeling in a polyhedral optimization space | Park et al. | IJPP | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2021 | ProGraML: Graph-based Deep Learning for Program Optimization and Analysis | Cummins et al. | ICML | Code | Y |
2020 | Deep program structure modeling through multi-relational graph-based learning | Ye et al. | PACT | Code | Y |
2020 | Designing PairBuddy – A Conversational Agent for Pair Programming | Robe et al. | arXiv | Code | Y |
2021 | On the Evaluation of Commit Message Generation Models: An Experimental Study | Tao et al. | ICSME | Code | Y |
2018 | Large-scale and language-oblivious code authorship identification | Abuhamad et al. | CCS | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2019 | Open Vocabulary Learning on Source Code with a Graph-Structured Cache | Cvitkovic et al. | ICML | Code | Y |
2020 | Big code != big vocabulary: Open-vocabulary models for source code | Karampatsis et al. | ICSE | Code | Y |
2021 | A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code | Chirkova et al. | NAACL | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2021 | Disentangled Code Representation Learning for Multiple Programming Languages | Zhang et al. | ACL | Code | Y |
2022 | Multilingual training for Software Engineering | Ahmed et al. | ICSE | Code | Y |
2019 | CLCDSA: cross language code clone detection using syntactical features and API documentation | Nafi et al. | ASE | Code | Y |
2019 | Bilateral dependency neural networks for cross-language algorithm classification | Bui et al. | SANER | Code | Y |
2019 | SAR: learning cross-language API mappings with little knowledge | Bui et al. | FSE | Code | Y |
2021 | Interactive Cross-language Code Retrieval with Auto-Encoders | Chen et al. | ASE | Code | Y |
2022 | Cross-Domain Deep Code Search with Few-Shot Meta Learning | Chai et al. | ICSE | Code | Y |
2022 | Cross-Language Binary-Source Code Matching with Intermediate Representations | Gui et al. | SANER | Code | Y |
Year | Title | Author | Venue | Code | In |
---|---|---|---|---|---|
2021 | Vulnerability Detection with Fine-grained Interpretations | Li et al. | FSE | Code | Y |
2021 | Interpreting deep learning-based vulnerability detector predictions based on heuristic searching | Zou et al. | TOSEM | Code | Y |
2021 | Interpretable Program Synthesis | Zhang et al. | CHI | Code | Y |
2021 | PyExplainer: Explaining the Predictions of Just-In-Time Defect Models | Pornprasit et al. | ASE | Code | Y |