Awesome Deep Learning for Code Intelligence

A curated list of awesome research papers, datasets, and tools for applying machine learning techniques to code intelligence. Code intelligence leverages machine learning and data mining to mine knowledge from large-scale code corpora, with the goal of developing intelligent tools that improve the quality and productivity of computer programming.

Related Surveys

Year	Title	Author	Venue	Code	In
2018	A Survey of Machine Learning for Big Code and Naturalness	Allamanis et al.	CSUR	Code	Y
2021	A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research	Watson et al.	TOSEM	Code	Y
2020	Synergy between Machine/Deep Learning and Software Engineering: How Far Are We?	Wang et al.	arXiv	Code	Y
2020	A Survey on Deep Learning for Software Engineering	Yang et al.	CSUR	Code	Y
2020	Deep Learning & Software Engineering: State of Research and Future Directions	Devanbu et al.	arXiv	Code	Y
2021	CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation	Lu et al.	arXiv	Code	Y

Code Representation

Code Tokens

Year	Title	Author	Venue	Code	In
2017	Synthesizing benchmarks for predictive modeling	Cummins et al.	CGO	Code	Y
2015	Toward deep learning software repositories	White et al.	MSR	Code	Y
2016	Summarizing source code using a neural attention model	Iyer et al.	ACL	Code	Y
2016	A convolutional attention network for extreme summarization of source code	Allamanis et al.	ICML	Code	Y
2019	Open Vocabulary Learning on Source Code with a Graph-Structured Cache	Cvitkovic et al.	ICML	Code	Y
2021	A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code	Chirkova et al.	NAACL	Code	Y
2020	Learning and Evaluating Contextual Embedding of Source Code	Kanade et al.	ICML	Code	Y
2020	Codebert: A pre-trained model for programming and natural languages	Feng et al.	EMNLP	Code	Y
2020	Big code != big vocabulary: Open-vocabulary models for source code	Karampatsis et al.	ICSE	Code	Y

API

Year	Title	Author	Venue	Code	In
2015	How can I use this method?	Moreno et al.	ICSE	Code	Y
2017	An unsupervised approach for discovering relevant tutorial fragments for APIs	Jiang et al.	ICSE	Code	Y
2017	DeepAM: Migrate APIs with Multi-Modal Sequence to Sequence Learning	Gu et al.	IJCAI	Code	Y
2016	Deep API learning	Gu et al.	FSE	Code	Y
2017	Exploring API Embedding for API Usages and Applications	Nguyen et al.	ICSE	Code	Y
2019	SAR: learning cross-language API mappings with little knowledge	Bui et al.	FSE	Code	Y

AST

Year	Title	Author	Venue	Code	In
2016	Convolutional neural networks over tree structures for programming language processing	Mou et al.	AAAI	Code	Y
2020	Modeling programs hierarchically with stack-augmented LSTM	Liu et al.	JSS	Code	Y
2019	A novel neural source code representation based on abstract syntax tree	Zhang et al.	ICSE	Code	Y
2018	Deep code comment generation	Hu et al.	ICPC	Code	Y
2019	code2vec: Learning distributed representations of code	Alon et al.	POPL	Code	Y
2019	code2seq: Generating Sequences from Structured Representations of Code	Alon et al.	ICLR	Code	Y
2020	Structural language models of code	Alon et al.	ICML	Code	Y
2017	A syntactic neural model for general-purpose code generation	Yin et al.	ACL	Code	Y
2018	Tree-to-tree neural networks for program translation	Chen et al.	ICLR	Code	Y

IR

Year	Title	Author	Venue	Code	In
2018	Neural code comprehension: A learnable representation of code semantics	Ben-Nun et al.	NeurIPS	Code	Y
2020	IR2Vec: LLVM IR based Scalable Program Embeddings	Venkatakeerthy et al.	TACO	Code	Y
2020	Compiler-based graph representations for deep learning models of code	Brauckmann et al.	CC	Code	Y
2021	ProGraML: Graph-based Deep Learning for Program Optimization and Analysis	Cummins et al.	ICML	Code	Y
2021	How could Neural Networks understand Programs?	Peng et al.	ICML	Code	Y

Code Graphs

Year	Title	Author	Venue	Code	In
2018	Learning to represent programs with graphs	Allamanis et al.	ICLR	Code	Y
2017	Smartpaste: Learning to adapt source code	Allamanis et al.	arXiv	Code	Y
2019	Generative code modeling with graphs	Brockschmidt et al.	ICLR	Code	Y
2020	Flow2Vec: value-flow-based precise code embedding	Sui et al.	OOPSLA	Code	Y
2021	ProGraML: Graph-based Deep Learning for Program Optimization and Analysis	Cummins et al.	ICML	Code	Y
2021	PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair	Chen et al.	NeurIPS	Code	Y
2017	Intelligent development environment and software knowledge graph	Lin et al.	JCST	Code	Y
2020	Graph4code: A machine interpretable knowledge graph for code	Abdelaziz et al.	arXiv	Code	Y
2020	Exploiting Code Knowledge Graph for Bug Localization via Bi-directional Attention	Zhang et al.	ICPC	Code	Y

Other Features of Code

Year	Title	Author	Venue	Code	In
2018	Code vectors: Understanding programs through embedded abstracted symbolic traces	Henkel et al.	FSE	Code	Y
2019	Learning to Represent Edits	Yin et al.	ICLR	Code	Y
2019	Neural Networks for Modeling Source Code Edits	Zhao et al.	arXiv	Code	Y
2020	Cc2vec: Distributed representations of code changes	Hoang et al.	ICSE	Code	Y
2019	On Learning Meaningful Code Changes via Neural Machine Translation	Tufano et al.	ICSE	Code	Y
2021	Copy that! Editing Sequences by Copying Spans	Panthaplackel et al.	AAAI	Code	Y
2020	A Structural Model for Contextual Code Changes	Brody et al.	OOPSLA	Code	Y
2021	Learning Structural Edits via Incremental Tree Transformations	Yao et al.	ICLR	Code	Y

Hybrid

Year	Title	Author	Venue	Code	In
2018	Deep code search	Gu et al.	ICSE	Code	Y
2016	Deep learning code fragments for code clone detection	White et al.	ASE	Code	Y
2018	Deepsim: deep learning code functional similarity	Zhao et al.	FSE	Code	Y
2018	Improving automatic source code summarization via deep reinforcement learning	Wan et al.	ASE	Code	Y
2019	Multi-modal attention network learning for semantic source code retrieval	Wan et al.	ASE	Code	Y

Application

Code Classification

Year	Title	Author	Venue	Code	In
2016	Convolutional neural networks over tree structures for programming language processing	Mou et al.	AAAI	Code	Y
2018	Adapting neural text classification for improved software categorization	Leclair et al.	ICSME	Code	Y
2019	Bilateral dependency neural networks for cross-language algorithm classification	Bui et al.	SANER	Code	Y
2018	SCC: Automatic classification of code snippets	Alreshedy et al.	SCAM	Code	Y
2020	SCC++: predicting the programming language of questions and snippets of Stack Overflow	Alrashedy et al.	JSS	Code	Y

Vulnerability Detection and Bug Finding

Year	Title	Author	Venue	Code	In
2016	Automatically Learning Semantic Features for Defect Prediction	Wang et al.	ICSE	Code	Y
2017	Software defect prediction via convolutional neural network	Li et al.	QRS	Code	Y
2018	Automatic feature learning for predicting vulnerable software components	Dam et al.	TSE	Code	Y
2018	Vuldeepecker: A deep learning-based system for vulnerability detection	Li et al.	NDSS	Code	Y
2019	μVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection	Zou et al.	TDSC	Code	Y
2021	SySeVR: A framework for using deep learning to detect software vulnerabilities	Li et al.	TDSC	Code	Y
2018	Cross-project transfer representation learning for vulnerable function discovery	Lin et al.	TII	Code	Y
2018	Maximal divergence sequential autoencoder for binary software vulnerability detection	Le et al.	ICLR	Code	Y
2019	Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks	Zhou et al.	NeurIPS	Code	Y
2020	Combining graph-based learning with automated data collection for code vulnerability detection	Wang et al.	TIFS	Code	Y
2021	DeepWukong: Statically detecting software vulnerabilities using deep graph neural network	Cheng et al.	TOSEM	Code	Y
2021	Combining Graph Neural Networks with Expert Knowledge for Smart Contract Vulnerability Detection	Liu et al.	TKDE	Code	Y
2021	Vulnerability Detection with Fine-Grained Interpretations	Li et al.	FSE	Code	Y
2021	Interpreting deep learning-based vulnerability detector predictions based on heuristic searching	Zou et al.	TOSEM	Code	Y
2018	Deepbugs: A learning approach to name-based bug detection	Pradel et al.	OOPSLA	Code	Y
2019	Improving bug detection via context-based code representation learning and attention-based neural networks	Li et al.	OOPSLA	Code	Y
2019	Neural Attribution for Semantic Bug-Localization in Student Programs	Gupta et al.	NeurIPS	Code	Y
2021	Fault Localization with Code Coverage Representation Learning	Li et al.	ICSE	Code	Y
2021	Learning to find naming issues with big code and small supervision	He et al.	PLDI	Code	Y

Code Completion

Year	Title	Author	Venue	Code	In
2014	Code completion with statistical language models	Raychev et al.	PLDI	Code	Y
2017	Neural code completion	Liu et al.	ICLR	Code	Y
2018	Code completion with neural attention and pointer networks	Li et al.	IJCAI	Code	Y
2016	Learning python code suggestion with a sparse pointer network	Bhoopchand et al.	arXiv	Code	Y
2019	Pythia: Ai-assisted code completion system	Svyatkovskiy et al.	SIGKDD	Code	Y
2021	Code prediction by feeding trees to transformers	Kim et al.	ICSE	Code	Y
2020	Structural language models of code	Alon et al.	ICML	Code	Y
2021	Code completion by modeling flattened abstract syntax trees as graphs	Wang et al.	AAAI	Code	Y
2020	IntelliCode Compose: Code Generation Using Transformer	Svyatkovskiy et al.	FSE	Code	Y
2020	A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning	Liu et al.	ICPC	Code	Y
2020	Multi-task learning based pre-trained language model for code completion	Liu et al.	ASE	Code	Y
2021	Fast and memory-efficient neural code completion	Svyatkovskiy et al.	MSR	Code	Y
2020	On-the-Fly Adaptation of Source Code Models using Meta-Learning	Shrivastava et al.	arXiv	Code	Y
2019	Generative Code Modeling with Graphs	Brockschmidt et al.	ICLR	Code	Y
2018	A Retrieve-and-Edit Framework for Predicting Structured Outputs	Hashimoto et al.	NeurIPS	Code	Y

Type Inference

Year	Title	Author	Venue	Code	In
2018	MaxSMT-based type inference for Python 3	Hassan et al.	CAV	Code	Y
2004	Faster than C: Static type inference with Starkiller	Salib et al.	PyCon Proceedings	Code	Y
2015	Predicting program properties from big code	Raychev et al.	Communications of the ACM	Code	Y
2016	Python probabilistic type inference with natural language support	Xu et al.	FSE	Code	Y
2018	Deep learning type inference	Hellendoorn et al.	FSE	Code	Y
2019	NL2Type: Inferring JavaScript Function Types from Natural Language Information	Malik et al.	ICSE	Code	Y
2020	Typewriter: Neural type prediction with search-based validation	Pradel et al.	FSE	Code	Y
2020	Lambdanet: Probabilistic type inference using graph neural networks	Wei et al.	ICLR	Code	Y
2020	OptTyper: Probabilistic Type Inference by Optimising Logical and Natural Constraints	Pandi et al.	arXiv	Code	Y
2020	Typilus: neural type hints	Allamanis et al.	PLDI	Code	Y
2021	Type4Py: Deep Similarity Learning-Based Type Inference for Python	Mir et al.	arXiv	Code	Y

Code Search

Year	Title	Author	Venue	Code	In
2015	Codehow: Effective code search based on api understanding and extended boolean model (e)	Lv et al.	ASE	Code	Y
2016	Relationship-aware code search for JavaScript frameworks	Li et al.	FSE	Code	Y
2018	Deep code search	Gu et al.	ICSE	Code	Y
2019	Multi-modal attention network learning for semantic source code retrieval	Wan et al.	ASE	Code	Y
2020	A Multi-Perspective Architecture for Semantic Code Search	Haldar et al.	ACL	Code	Y
2020	OCoR: An Overlapping-Aware Code Retriever	Zhu et al.	ASE	Code	Y
2019	Coacor: Code annotation for code retrieval with reinforcement learning	Yao et al.	WWW	Code	Y
2019	Aroma: Code recommendation via structural code search	Luan et al.	OOPSLA	Code	Y
2020	Deep Graph Matching and Searching for Semantic Code Retrieval	Ling et al.	TKDD	Code	Y
2019	When deep learning met code search	Cambronero et al.	FSE	Code	Y
2018	FaCoY: a code-to-code search engine	Kim et al.	ICSE	Code	Y
2021	Interactive Cross-language Code Retrieval with Auto-Encoders	Chen et al.	ASE	Code	Y

Code Clone Detection

Year	Title	Author	Venue	Code	In
2002	CCFinder: A multilinguistic token-based code clone detection system for large scale source code	Kamiya et al.	TSE	Code	Y
2008	NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization	Roy et al.	ICPC	Code	Y
2007	Deckard: Scalable and accurate tree-based detection of code clones	Jiang et al.	ICSE	Code	Y
2016	Sourcerercc: Scaling code clone detection to big-code	Sajnani et al.	ICSE	Code	Y
2016	Deep learning code fragments for code clone detection	White et al.	ASE	Code	Y
2017	Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code	Wei et al.	IJCAI	Code	Y
2018	Deepsim: deep learning code functional similarity	Zhao et al.	FSE	Code	Y
2020	SCDetector: Software Functional Clone Detection Based on Semantic Tokens Analysis	Wu et al.	ASE	Code	Y
2019	A novel neural source code representation based on abstract syntax tree	Zhang et al.	ICSE	Code	Y
2019	Learning-based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection	Buch et al.	SANER	Code	Y
2020	Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree	Wang et al.	SANER	Code	Y
2020	funcGNN: A Graph Neural Network Approach to Program Similarity	Nair et al.	ESEM	Code	Y
2021	Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks	Mehrotra et al.	TSE	Code	Y
2018	Deep Learning Similarities from Different Representations of Source Code	Tufano et al.	MSR	Code	Y

Code Summarization

Year	Title	Author	Venue	Code	In
2010	Supporting program comprehension with source code summarization	Haiduc et al.	ICSE	Code	Y
2013	Autocomment: Mining question and answer sites for automatic comment generation	Wong et al.	ASE	Code	Y
2015	Clocom: Mining existing source code for automatic comment generation	Wong et al.	SANER	Code	Y
2013	Evaluating source code summarization techniques: Replication and expansion	Eddy et al.	ICPC	Code	Y
2013	Natural Language Models for Predicting Programming Comments	Movshovitz-Attias et al.	ACL	Code	Y
2016	A convolutional attention network for extreme summarization of source code	Allamanis et al.	ICML	Code	Y
2016	Summarizing source code using a neural attention model	Iyer et al.	ACL	Code	Y
2018	Deep code comment generation	Hu et al.	ICPC	Code	Y
2019	code2seq: Generating Sequences from Structured Representations of Code	Alon et al.	ICLR	Code	Y
2019	Structured neural summarization	Fernandes et al.	ICLR	Code	Y
2020	A transformer-based approach for source code summarization	Ahmad et al.	ACL	Code	Y
2021	SIT: Code Summarization with Structure-Induced Transformer	Wu et al.	ACL	Code	Y
2018	Improving automatic source code summarization via deep reinforcement learning	Wan et al.	ASE	Code	Y
2020	Improved code summarization via a graph neural network	Leclair et al.	ICPC	Code	Y
2021	CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees	Shi et al.	EMNLP	Code	Y
2019	A Neural Model for Generating Natural Language Summaries of Program Subroutines	Leclair et al.	ICSE	Code	Y
2020	Improved Automatic Summarization of Subroutines via Attention to File Context	Haque et al.	MSR	Code	Y
2020	Suggesting Comment Completions for Python using Neural Language Models	Ciurumelea et al.	SANER	Code	Y
2020	Retrieval-based neural source code summarization	Zhang et al.	ICSE	Code	Y
2020	Retrieve and refine: exemplar-based neural comment generation	Wei et al.	ASE	Code	Y
2021	Retrieval-Augmented Generation for Code Summarization via Hybrid GNN	Liu et al.	ICLR	Code	Y
2021	EditSum: A Retrieve-and-Edit Framework for Source Code Summarization	Li et al.	ASE	Code	Y
2018	Summarizing source code with transferred api knowledge	Hu et al.	IJCAI	Code	Y
2019	Code generation as a dual task of code summarization	Wei et al.	NeurIPS	Code	Y
2020	Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning	Ye et al.	WWW	Code	Y
2019	Learning to Spot and Refactor Inconsistent Method Names	Liu et al.	ICSE	Code	Y
2021	Deep Just-In-Time Inconsistency Detection Between Comments and Source Code	Panthaplackel et al.	AAAI	Code	Y
2020	Suggesting Natural Method Names to Check Name Consistencies	Nguyen et al.	ICSE	Code	Y
2020	Learning to Update Natural Language Comments Based on Code Changes	Panthaplackel et al.	ACL	Code	Y
2020	Automating Just-In-Time Comment Updating	Liu et al.	ASE	Code	Y
2021	Automating the removal of obsolete TODO comments	Gao et al.	FSE	Code	Y

Program Translation

Year	Title	Author	Venue	Code	In
2013	Lexical statistical machine translation for language migration	Nguyen et al.	FSE	Code	Y
2015	Using machine translation for converting python 2 to python 3 code	Aggarwal et al.	Technical Report	Code	Y
2015	Divide-and-conquer approach for multi-phase statistical migration for source code	Nguyen et al.	ASE	Code	Y
2018	Tree-to-tree neural networks for program translation	Chen et al.	ICLR	Code	Y
2017	DeepAM: Migrate APIs with Multi-Modal Sequence to Sequence Learning	Gu et al.	IJCAI	Code	Y
2020	Unsupervised translation of programming languages	Lachaux et al.	NeurIPS	Code	Y

Program Synthesis

Year	Title	Author	Venue	Code	In
2006	Learning for semantic parsing with statistical machine translation	Wong et al.	NAACL	Code	Y
2015	Language to code: Learning semantic parsers for if-this-then-that recipes	Quirk et al.	ACL	Code	Y
2016	Language to logical form with neural attention	Dong et al.	ACL	Code	Y
2016	Latent attention for if-then program synthesis	Liu et al.	NIPS	Code	Y
2016	Improved semantic parsers for if-then statements	Beltagy et al.	ACL	Code	Y
2017	A syntactic neural model for general-purpose code generation	Yin et al.	ACL	Code	Y
2014	Structured Generative Models of Natural Source Code	Maddison et al.	ICML	Code	Y
2016	Latent Predictor Networks for Code Generation	Ling et al.	ACL	Code	Y
2017	Abstract Syntax Networks for Code Generation and Semantic Parsing	Rabinovich et al.	ACL	Code	Y
2019	A Grammar-Based Structural CNN Decoder for Code Generation	Sun et al.	AAAI	Code	Y
2019	Spoc: Search-based pseudocode to code	Kulal et al.	NeurIPS	Code	Y
2018	Mapping Language to Code in Programmatic Context	Iyer et al.	EMNLP	Code	Y
2020	HISyn: human learning-inspired natural language programming	Nan et al.	FSE	Code	Y
2022	Competition-Level Code Generation with AlphaCode	Li et al.	AI	Code	Y
2011	Automating string processing in spreadsheets using input-output examples	Gulwani et al.	POPL	Code	Y
2017	Neural Programming by Example	Shu et al.	AAAI	Code	Y
2017	DeepCoder: Learning to write programs	Balog et al.	ICLR	Code	Y
2017	RobustFill: Neural Program Learning under Noisy I/O	Devlin et al.	ICML	Code	Y
2019	Learning to infer program sketches	Nye et al.	ICML	Code	Y
2018	Selecting representative examples for program synthesis	Pu et al.	ICML	Code	Y
2019	AutoPandas: neural-backed generators for program synthesis	Bavishi et al.	OOPSLA	Code	Y
2018	NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System	Lin et al.	LREC	Code	Y
2017	Seq2sql: Generating structured queries from natural language using reinforcement learning	Zhong et al.	arXiv	Code	Y
2018	An encoder-decoder framework translating natural language to database queries	Cai et al.	IJCAI	Code	Y
2018	Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task	Yu et al.	EMNLP	Code	Y
2018	Syntaxsqlnet: Syntax tree networks for complex and cross-domain text-to-sql task	Yu et al.	EMNLP	Code	Y
2019	Sparc: Cross-domain semantic parsing in context	Yu et al.	ACL	Code	Y
2019	CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases	Yu et al.	EMNLP	Code	Y

Program Repair

Year	Title	Author	Venue	Code	In
2016	Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks	Bhatia et al.	arXiv	Code	Y
2018	Syntax and Sensibility: Using language models to detect and correct syntax errors	Santos et al.	SANER	Code	Y
2017	DeepFix: Fixing Common C Language Errors by Deep Learning	Gupta et al.	AAAI	Code	Y
2021	SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair	Chen et al.	TSE	Code	Y
2018	Deep Reinforcement Learning for Programming Language Correction	Gupta et al.	arXiv	Code	Y
2019	SampleFix: Learning to Correct Programs by Sampling Diverse Fixes	Hajipour et al.	arXiv	Code	Y
2019	Neural Program Repair by Jointly Learning to Localize and Repair	Vasic et al.	ICLR	Code	Y
2020	Hoppity: Learning graph transformations to detect and fix bugs in programs	Dinella et al.	ICLR	Code	Y
2014	Neural turing machines	Graves et al.	arXiv	Code	Y
2019	DeepDelta: Learning to Repair Compilation Errors	Mesbah et al.	FSE	Code	Y
2020	Learning to Fix Build Errors with Graph2Diff Neural Networks	Tarlow et al.	ICSE	Code	Y
2020	Codit: Code editing with tree-based neural models	Chakraborty et al.	TSE	Code	Y
2021	A Syntax-Guided Edit Decoder for Neural Program Repair	Zhu et al.	FSE	Code	Y
2020	Graph-based, Self-Supervised Program Repair from Diagnostic Feedback	Yasunaga et al.	ICML	Code	Y
2021	TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer	Berabi et al.	ICML	Code	Y
2021	Self-Supervised Bug Detection and Repair	Allamanis et al.	NeurIPS	Code	Y
2021	CURE: Code-Aware Neural Machine Translation for Automatic Program Repair	Jiang et al.	ICSE	Code	Y
2018	An empirical investigation into learning bug-fixing patches in the wild via neural machine translation	Tufano et al.	ASE	Code	Y
2018	Learning to Generate Corrective Patches using Neural Machine Translation	Hata et al.	arXiv	Code	Y
2018	Learning to Repair Software Vulnerabilities with Generative Adversarial Networks	Harer et al.	NeurIPS	Code	Y
2020	Synthesize, execute and debug: Learning to repair for neural program synthesis	Gupta et al.	NeurIPS	Code	Y
2020	DLFix: Context-based Code Transformation Learning for Automated Program Repair	Li et al.	ICSE	Code	Y
2020	Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair	Tian et al.	ASE	Code	Y
2017	At the end of synthesis: narrowing program candidates	Shriver et al.	ICSE-NIER	Code	Y
2020	Human-in-the-loop automatic program repair	Böhme et al.	ICST	Code	Y
2021	Interactive Patch Filtering as Debugging Aid	Liang et al.	ICSME	Code	Y
2019	Learning to optimize halide with tree search and random programs	Adams et al.	TOG	Code	Y

Code Optimization

Year	Title	Author	Venue	Code	In
2018	Learning to optimize tensor programs	Chen et al.	NeurIPS	Code	Y
2020	FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System	Zheng et al.	ASPLOS	Code	Y
2020	Ansor: Generating high-performance tensor programs for deep learning	Zheng et al.	OSDI	Code	Y
2013	Predictive modeling in a polyhedral optimization space	Park et al.	IJPP	Code	Y

Other Applications

Year	Title	Author	Venue	Code	In
2021	ProGraML: Graph-based Deep Learning for Program Optimization and Analysis	Cummins et al.	ICML	Code	Y
2020	Deep program structure modeling through multi-relational graph-based learning	Ye et al.	PACT	Code	Y
2020	Designing PairBuddy – A Conversational Agent for Pair Programming	Robe et al.	arXiv	Code	Y
2021	On the Evaluation of Commit Message Generation Models: An Experimental Study	Tao et al.	ICSME	Code	Y
2018	Large-scale and language-oblivious code authorship identification	Abuhamad et al.	CCS	Code	Y

Dataset

Year	Title	Author	Venue	Code	In
2019	Codesearchnet challenge: Evaluating the state of semantic code search	Husain et al.	arXiv	Code	Y
2021	CoSQA: 20,000+ Web Queries for Code Search and Question Answering	Huang et al.	ACL	Code	Y
2016	Probabilistic model for code with decision trees	Raychev et al.	OOPSLA	Code	Y
2017	A parallel corpus of Python functions and documentation strings for automated code documentation and code generation	Barone et al.	IJCNLP	Code	Y
2020	PyMT5: multi-mode translation of natural language and Python code with transformers	Clement et al.	EMNLP	Code	Y
2018	Deep code comment generation	Hu et al.	ICPC	Code	Y
2021	Retrieval-Augmented Generation for Code Summarization via Hybrid GNN	Liu et al.	ICLR	Code	Y
2018	Deep learning type inference	Hellendoorn et al.	FSE	Code	Y
2021	CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks	Puri et al.	arXiv	Code	Y
2019	JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation	Agashe et al.	EMNLP	Code	Y
2021	ProGraML: Graph-based Deep Learning for Program Optimization and Analysis	Cummins et al.	ICML	Code	Y
2019	Recommendations for Datasets for Source Code Summarization	Leclair et al.	NAACL	Code	Y
2021	CoDesc: A Large Code-Description Parallel Dataset	Hasan et al.	ACL	Code	Y
2021	Measuring Coding Challenge Competence With APPS	Hendrycks et al.	NeurIPS	Code	Y
2021	AVATAR: A Parallel Corpus for Java-Python Program Translation	Ahmad et al.	arXiv	Code	Y
2018	StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow	Yao et al.	WWW	Code	Y
2021	PyTorrent: A Python Library Corpus for Large-scale Language Models	Bahrami et al.	arXiv	Code	Y
2021	CodeQA: A Question Answering Dataset for Source Code Comprehension	Liu et al.	EMNLP	Code	Y
2021	CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation	Lu et al.	NeurIPS	Code	Y

Challenges and Opportunities

Comprehensive Code Representation

Year	Title	Author	Venue	Code	In
2019	Open Vocabulary Learning on Source Code with a Graph-Structured Cache	Cvitkovic et al.	ICML	Code	Y
2020	Big code != big vocabulary: Open-vocabulary models for source code	Karampatsis et al.	ICSE	Code	Y
2021	A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code	Chirkova et al.	NAACL	Code	Y

Multi-Lingual and Cross-Language

Year	Title	Author	Venue	Code	In
2021	Disentangled Code Representation Learning for Multiple Programming Languages	Zhang et al.	ACL	Code	Y
2022	Multilingual training for Software Engineering	Ahmed et al.	ICSE	Code	Y
2019	Clcdsa: cross language code clone detection using syntactical features and api documentation	Nafi et al.	ASE	Code	Y
2019	Bilateral dependency neural networks for cross-language algorithm classification	Bui et al.	SANER	Code	Y
2019	SAR: learning cross-language API mappings with little knowledge	Bui et al.	FSE	Code	Y
2021	Interactive Cross-language Code Retrieval with Auto-Encoders	Chen et al.	ASE	Code	Y
2022	Cross-Domain Deep Code Search with Few-Shot Meta Learning	Chai et al.	ICSE	Code	Y
2022	Cross-Language Binary-Source Code Matching with Intermediate Representations	Gui et al.	SANER	Code	Y

Model Interpretability

Year	Title	Author	Venue	Code	In
2021	Vulnerability Detection with Fine-grained Interpretations	Li et al.	FSE	Code	Y
2021	Interpreting deep learning-based vulnerability detector predictions based on heuristic searching	Zou et al.	TOSEM	Code	Y
2021	Interpretable Program Synthesis	Zhang et al.	CHI	Code	Y
2021	PyExplainer: Explaining the Predictions of Just-In-Time Defect Models	Pornprasit et al.	ASE	Code	Y

Robustness and Security

Year	Title	Author	Venue	Code	In
2017	Towards evaluating the robustness of neural networks	Carlini et al.	SP	Code	Y
2018	Robust physical-world attacks on deep learning visual classification	Eykholt et al.	CVPR	Code	Y
2019	On evaluating adversarial robustness	Carlini et al.	arXiv	Code	Y
2020	Adversarial attacks on deep-learning models in natural language processing: A survey	Zhang et al.	TIST	Code	Y
2020	Semantic Robustness of Models of Source Code	Ramakrishnan et al.	arXiv	Code	Y
2020	Adversarial Examples for Models of Code	Yefet et al.	OOPSLA	Code	Y
2021	Adversarial Attacks to API Recommender Systems: Time to Wake Up and Smell the Coffee?	Nguyen et al.	ASE	Code	Y
2020	Adversarial robustness for code	Bielik et al.	ICML	Code	Y
2021	Adversarial Robustness of Deep Code Comment Generation	Zhou et al.	arXiv	Code	Y
2019	Misleading Authorship Attribution of Source Code using Adversarial Learning	Quiring et al.	USENIX Security	Code	Y
2021	A Practical Black-box Attack on Source Code Authorship Identification Classifiers	Liu et al.	TIFS	Code	Y
2021	Backdoors in Neural Models of Source Code	Ramakrishnan et al.	arXiv	Code	Y
2021	You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion	Schuster et al.	USENIX Security	Code	Y
2021	Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers	Severi et al.	USENIX Security	Code	Y
2020	Generating Adversarial Examples for Holding Robustness of Source Code Processing Models	Zhang et al.	AAAI	Code	Y
