Awesome Deep Learning for Code Intelligence

A curated list of awesome research papers, datasets, and tools for applying machine learning techniques to code intelligence. Code intelligence leverages machine learning and data mining techniques to mine knowledge from large-scale code corpora and to develop intelligent tools that improve the quality and productivity of computer programming.

Related Survey

Year Title Author Venue Code In
2018 A Survey of Machine Learning for Big Code and Naturalness Allamanis et al. CSUR Code Y
2021 A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research Watson et al. TOSEM Code Y
2020 Synergy between Machine/Deep Learning and Software Engineering: How Far Are We? Wang et al. arXiv Code Y
2020 A Survey on Deep Learning for Software Engineering Yang et al. CSUR Code Y
2020 Deep Learning & Software Engineering: State of Research and Future Directions Devanbu et al. arXiv Code Y
2021 CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation Lu et al. arXiv Code Y

Code Representation

Code Tokens

Year Title Author Venue Code In
2017 Synthesizing benchmarks for predictive modeling Cummins et al. CGO Code Y
2015 Toward deep learning software repositories White et al. MSR Code Y
2016 Summarizing source code using a neural attention model Iyer et al. ACL Code Y
2016 A convolutional attention network for extreme summarization of source code Allamanis et al. ICML Code Y
2019 Open Vocabulary Learning on Source Code with a Graph-Structured Cache Cvitkovic et al. ICML Code Y
2021 A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code Chirkova et al. NAACL Code Y
2020 Learning and Evaluating Contextual Embedding of Source Code Kanade et al. ICML Code Y
2020 CodeBERT: A pre-trained model for programming and natural languages Feng et al. EMNLP Code Y
2020 Big Code != Big Vocabulary: Open-vocabulary models for source code Karampatsis et al. ICSE Code Y

API

Year Title Author Venue Code In
2015 How can I use this method? Moreno et al. ICSE Code Y
2017 An unsupervised approach for discovering relevant tutorial fragments for APIs Jiang et al. ICSE Code Y
2017 DeepAM: Migrate APIs with Multi-Modal Sequence to Sequence Learning Gu et al. IJCAI Code Y
2016 Deep API learning Gu et al. FSE Code Y
2017 Exploring API Embedding for API Usages and Applications Nguyen et al. ICSE Code Y
2019 SAR: learning cross-language API mappings with little knowledge Bui et al. FSE Code Y

AST

Year Title Author Venue Code In
2016 Convolutional neural networks over tree structures for programming language processing Mou et al. AAAI Code Y
2020 Modeling programs hierarchically with stack-augmented LSTM Liu et al. JSS Code Y
2019 A novel neural source code representation based on abstract syntax tree Zhang et al. ICSE Code Y
2018 Deep code comment generation Hu et al. ICPC Code Y
2019 code2vec: Learning distributed representations of code Alon et al. POPL Code Y
2019 code2seq: Generating Sequences from Structured Representations of Code Alon et al. ICLR Code Y
2020 Structural language models of code Alon et al. ICML Code Y
2017 A syntactic neural model for general-purpose code generation Yin et al. ACL Code Y
2018 Tree-to-tree neural networks for program translation Chen et al. NeurIPS Code Y

IR

Year Title Author Venue Code In
2018 Neural code comprehension: A learnable representation of code semantics Ben-Nun et al. NeurIPS Code Y
2020 IR2Vec: LLVM IR based Scalable Program Embeddings Venkatakeerthy et al. TACO Code Y
2020 Compiler-based graph representations for deep learning models of code Brauckmann et al. CC Code Y
2021 ProGraML: Graph-based Deep Learning for Program Optimization and Analysis Cummins et al. ICML Code Y
2021 How could Neural Networks understand Programs? Peng et al. ICML Code Y

Code Graphs

Year Title Author Venue Code In
2018 Learning to represent programs with graphs Allamanis et al. ICLR Code Y
2017 Smartpaste: Learning to adapt source code Allamanis et al. arXiv Code Y
2019 Generative code modeling with graphs Brockschmidt et al. ICLR Code Y
2020 Flow2Vec: value-flow-based precise code embedding Sui et al. OOPSLA Code Y
2021 ProGraML: Graph-based Deep Learning for Program Optimization and Analysis Cummins et al. ICML Code Y
2021 PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair Chen et al. NeurIPS Code Y
2017 Intelligent development environment and software knowledge graph Lin et al. JCST Code Y
2020 Graph4Code: A machine interpretable knowledge graph for code Abdelaziz et al. arXiv Code Y
2020 Exploiting Code Knowledge Graph for Bug Localization via Bi-directional Attention Zhang et al. ICPC Code Y

Other Features of Code

Year Title Author Venue Code In
2018 Code vectors: Understanding programs through embedded abstracted symbolic traces Henkel et al. FSE Code Y
2019 Learning to Represent Edits Yin et al. ICLR Code Y
2019 Neural Networks for Modeling Source Code Edits Zhao et al. arXiv Code Y
2020 CC2Vec: Distributed representations of code changes Hoang et al. ICSE Code Y
2019 On Learning Meaningful Code Changes via Neural Machine Translation Tufano et al. ICSE Code Y
2021 Copy that! Editing Sequences by Copying Spans Panthaplackel et al. AAAI Code Y
2020 A Structural Model for Contextual Code Changes Brody et al. OOPSLA Code Y
2021 Learning Structural Edits via Incremental Tree Transformations Yao et al. ICLR Code Y

Hybrid

Year Title Author Venue Code In
2018 Deep code search Gu et al. ICSE Code Y
2016 Deep learning code fragments for code clone detection White et al. ASE Code Y
2018 DeepSim: deep learning code functional similarity Zhao et al. FSE Code Y
2018 Improving automatic source code summarization via deep reinforcement learning Wan et al. ASE Code Y
2019 Multi-modal attention network learning for semantic source code retrieval Wan et al. ASE Code Y

Application

Code Classification

Year Title Author Venue Code In
2016 Convolutional neural networks over tree structures for programming language processing Mou et al. AAAI Code Y
2018 Adapting neural text classification for improved software categorization LeClair et al. ICSME Code Y
2019 Bilateral dependency neural networks for cross-language algorithm classification Bui et al. SANER Code Y
2018 SCC: Automatic classification of code snippets Alrashedy et al. SCAM Code Y
2020 SCC++: predicting the programming language of questions and snippets of Stack Overflow Alrashedy et al. JSS Code Y

Vulnerability Detection and Bug Finding

Year Title Author Venue Code In
2016 Automatically Learning Semantic Features for Defect Prediction Wang et al. ICSE Code Y
2017 Software defect prediction via convolutional neural network Li et al. QRS Code Y
2018 Automatic feature learning for predicting vulnerable software components Dam et al. TSE Code Y
2018 VulDeePecker: A deep learning-based system for vulnerability detection Li et al. NDSS Code Y
2019 μVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection Zou et al. TDSC Code Y
2021 SySeVR: A framework for using deep learning to detect software vulnerabilities Li et al. TDSC Code Y
2018 Cross-project transfer representation learning for vulnerable function discovery Lin et al. TII Code Y
2019 Maximal divergence sequential autoencoder for binary software vulnerability detection Le et al. ICLR Code Y
2019 Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks Zhou et al. NeurIPS Code Y
2020 Combining graph-based learning with automated data collection for code vulnerability detection Wang et al. TIFS Code Y
2021 DeepWukong: Statically detecting software vulnerabilities using deep graph neural network Cheng et al. TOSEM Code Y
2021 Combining Graph Neural Networks with Expert Knowledge for Smart Contract Vulnerability Detection Liu et al. TKDE Code Y
2021 Vulnerability Detection with Fine-Grained Interpretations Li et al. FSE Code Y
2021 Interpreting deep learning-based vulnerability detector predictions based on heuristic searching Zou et al. TOSEM Code Y
2018 DeepBugs: A learning approach to name-based bug detection Pradel et al. OOPSLA Code Y
2019 Improving bug detection via context-based code representation learning and attention-based neural networks Li et al. OOPSLA Code Y
2019 Neural Attribution for Semantic Bug-Localization in Student Programs Gupta et al. NeurIPS Code Y
2021 Fault Localization with Code Coverage Representation Learning Li et al. ICSE Code Y
2021 Learning to find naming issues with big code and small supervision He et al. PLDI Code Y

Code Completion

Year Title Author Venue Code In
2014 Code completion with statistical language models Raychev et al. PLDI Code Y
2017 Neural code completion Liu et al. ICLR Code Y
2018 Code completion with neural attention and pointer networks Li et al. IJCAI Code Y
2016 Learning python code suggestion with a sparse pointer network Bhoopchand et al. arXiv Code Y
2019 Pythia: AI-assisted code completion system Svyatkovskiy et al. SIGKDD Code Y
2021 Code prediction by feeding trees to transformers Kim et al. ICSE Code Y
2020 Structural language models of code Alon et al. ICML Code Y
2021 Code completion by modeling flattened abstract syntax trees as graphs Wang et al. AAAI Code Y
2020 IntelliCode Compose: Code Generation Using Transformer Svyatkovskiy et al. FSE Code Y
2020 A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning Liu et al. ICPC Code Y
2020 Multi-task learning based pre-trained language model for code completion Liu et al. ASE Code Y
2021 Fast and memory-efficient neural code completion Svyatkovskiy et al. MSR Code Y
2020 On-the-Fly Adaptation of Source Code Models using Meta-Learning Shrivastava et al. arXiv Code Y
2019 Generative Code Modeling with Graphs Brockschmidt et al. ICLR Code Y
2018 A Retrieve-and-Edit Framework for Predicting Structured Outputs Hashimoto et al. NIPS Code Y

Type Inference

Year Title Author Venue Code In
2018 MaxSMT-based type inference for Python 3 Hassan et al. CAV Code Y
2004 Faster than C: Static type inference with Starkiller Salib et al. PyCon Proceedings Code Y
2015 Predicting program properties from big code Raychev et al. Communications of the ACM Code Y
2016 Python probabilistic type inference with natural language support Xu et al. FSE Code Y
2018 Deep learning type inference Hellendoorn et al. FSE Code Y
2019 NL2Type: Inferring JavaScript Function Types from Natural Language Information Malik et al. ICSE Code Y
2020 TypeWriter: Neural type prediction with search-based validation Pradel et al. FSE Code Y
2020 LambdaNet: Probabilistic type inference using graph neural networks Wei et al. ICLR Code Y
2020 OptTyper: Probabilistic Type Inference by Optimising Logical and Natural Constraints Pandi et al. arXiv Code Y
2020 Typilus: neural type hints Allamanis et al. PLDI Code Y
2021 Type4Py: Deep Similarity Learning-Based Type Inference for Python Mir et al. arXiv Code Y

Code Search

Year Title Author Venue Code In
2015 CodeHow: Effective code search based on API understanding and extended boolean model (E) Lv et al. ASE Code Y
2016 Relationship-aware code search for JavaScript frameworks Li et al. FSE Code Y
2018 Deep code search Gu et al. ICSE Code Y
2019 Multi-modal attention network learning for semantic source code retrieval Wan et al. ASE Code Y
2020 A Multi-Perspective Architecture for Semantic Code Search Haldar et al. ACL Code Y
2020 OCoR: An Overlapping-Aware Code Retriever Zhu et al. ASE Code Y
2019 CoaCor: Code annotation for code retrieval with reinforcement learning Yao et al. WWW Code Y
2019 Aroma: Code recommendation via structural code search Luan et al. OOPSLA Code Y
2020 Deep Graph Matching and Searching for Semantic Code Retrieval Ling et al. TKDD Code Y
2019 When deep learning met code search Cambronero et al. FSE Code Y
2018 FaCoY: a code-to-code search engine Kim et al. ICSE Code Y
2021 Interactive Cross-language Code Retrieval with Auto-Encoders Chen et al. ASE Code Y

Code Clone Detection

Year Title Author Venue Code In
2002 CCFinder: A multilinguistic token-based code clone detection system for large scale source code Kamiya et al. TSE Code Y
2008 NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization Roy et al. ICPC Code Y
2007 Deckard: Scalable and accurate tree-based detection of code clones Jiang et al. ICSE Code Y
2016 SourcererCC: Scaling code clone detection to big-code Sajnani et al. ICSE Code Y
2016 Deep learning code fragments for code clone detection White et al. ASE Code Y
2017 Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code Wei et al. IJCAI Code Y
2018 DeepSim: deep learning code functional similarity Zhao et al. FSE Code Y
2020 SCDetector: Software Functional Clone Detection Based on Semantic Tokens Analysis Wu et al. ASE Code Y
2019 A novel neural source code representation based on abstract syntax tree Zhang et al. ICSE Code Y
2019 Learning-based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection Büch et al. SANER Code Y
2020 Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree Wang et al. SANER Code Y
2020 funcGNN: A Graph Neural Network Approach to Program Similarity Nair et al. ESEM Code Y
2021 Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks Mehrotra et al. TSE Code Y
2018 Deep Learning Similarities from Different Representations of Source Code Tufano et al. MSR Code Y

Code Summarization

Year Title Author Venue Code In
2010 Supporting program comprehension with source code summarization Haiduc et al. ICSE Code Y
2013 AutoComment: Mining question and answer sites for automatic comment generation Wong et al. ASE Code Y
2015 CloCom: Mining existing source code for automatic comment generation Wong et al. SANER Code Y
2013 Evaluating source code summarization techniques: Replication and expansion Eddy et al. ICPC Code Y
2013 Natural Language Models for Predicting Programming Comments Movshovitz-Attias et al. ACL Code Y
2016 A convolutional attention network for extreme summarization of source code Allamanis et al. ICML Code Y
2016 Summarizing source code using a neural attention model Iyer et al. ACL Code Y
2018 Deep code comment generation Hu et al. ICPC Code Y
2019 code2seq: Generating Sequences from Structured Representations of Code Alon et al. ICLR Code Y
2019 Structured neural summarization Fernandes et al. ICLR Code Y
2020 A transformer-based approach for source code summarization Ahmad et al. ACL Code Y
2021 SIT: Code Summarization with Structure-Induced Transformer Wu et al. ACL Code Y
2018 Improving automatic source code summarization via deep reinforcement learning Wan et al. ASE Code Y
2020 Improved code summarization via a graph neural network LeClair et al. ICPC Code Y
2021 CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees Shi et al. EMNLP Code Y
2019 A Neural Model for Generating Natural Language Summaries of Program Subroutines LeClair et al. ICSE Code Y
2020 Improved Automatic Summarization of Subroutines via Attention to File Context Haque et al. MSR Code Y
2020 Suggesting Comment Completions for Python using Neural Language Models Ciurumelea et al. SANER Code Y
2020 Retrieval-based neural source code summarization Zhang et al. ICSE Code Y
2020 Retrieve and refine: exemplar-based neural comment generation Wei et al. ASE Code Y
2021 Retrieval-Augmented Generation for Code Summarization via Hybrid GNN Liu et al. ICLR Code Y
2021 EditSum: A Retrieve-and-Edit Framework for Source Code Summarization Li et al. ASE Code Y
2018 Summarizing source code with transferred api knowledge Hu et al. IJCAI Code Y
2019 Code generation as a dual task of code summarization Wei et al. NeurIPS Code Y
2020 Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning Ye et al. WWW Code Y
2019 Learning to Spot and Refactor Inconsistent Method Names Liu et al. ICSE Code Y
2021 Deep Just-In-Time Inconsistency Detection Between Comments and Source Code Panthaplackel et al. AAAI Code Y
2020 Suggesting Natural Method Names to Check Name Consistencies Nguyen et al. ICSE Code Y
2020 Learning to Update Natural Language Comments Based on Code Changes Panthaplackel et al. ACL Code Y
2020 Automating Just-In-Time Comment Updating Liu et al. ASE Code Y
2021 Automating the removal of obsolete TODO comments Gao et al. FSE Code Y

Program Translation

Year Title Author Venue Code In
2013 Lexical statistical machine translation for language migration Nguyen et al. FSE Code Y
2015 Using machine translation for converting python 2 to python 3 code Aggarwal et al. Technical Report Code Y
2015 Divide-and-conquer approach for multi-phase statistical migration for source code Nguyen et al. ASE Code Y
2018 Tree-to-tree neural networks for program translation Chen et al. NeurIPS Code Y
2017 DeepAM: Migrate APIs with Multi-Modal Sequence to Sequence Learning Gu et al. IJCAI Code Y
2020 Unsupervised translation of programming languages Lachaux et al. NeurIPS Code Y

Program Synthesis

Year Title Author Venue Code In
2006 Learning for semantic parsing with statistical machine translation Wong et al. NAACL Code Y
2015 Language to code: Learning semantic parsers for if-this-then-that recipes Quirk et al. ACL Code Y
2016 Language to logical form with neural attention Dong et al. ACL Code Y
2016 Latent attention for if-then program synthesis Liu et al. NIPS Code Y
2016 Improved semantic parsers for if-then statements Beltagy et al. ACL Code Y
2017 A syntactic neural model for general-purpose code generation Yin et al. ACL Code Y
2014 Structured Generative Models of Natural Source Code Maddison et al. ICML Code Y
2016 Latent Predictor Networks for Code Generation Ling et al. ACL Code Y
2017 Abstract Syntax Networks for Code Generation and Semantic Parsing Rabinovich et al. ACL Code Y
2019 A Grammar-Based Structural CNN Decoder for Code Generation Sun et al. AAAI Code Y
2019 SPoC: Search-based pseudocode to code Kulal et al. NIPS Code Y
2018 Mapping Language to Code in Programmatic Context Iyer et al. EMNLP Code Y
2020 HISyn: human learning-inspired natural language programming Nan et al. FSE Code Y
2022 Competition-Level Code Generation with AlphaCode Li et al. Science Code Y
2011 Automating string processing in spreadsheets using input-output examples Gulwani et al. POPL Code Y
2017 Neural Programming by Example Shu et al. AAAI Code Y
2017 DeepCoder: Learning to write programs Balog et al. ICLR Code Y
2017 RobustFill: Neural Program Learning under Noisy I/O Devlin et al. ICML Code Y
2019 Learning to infer program sketches Nye et al. ICML Code Y
2018 Selecting representative examples for program synthesis Pu et al. ICML Code Y
2019 AutoPandas: neural-backed generators for program synthesis Bavishi et al. OOPSLA Code Y
2018 NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System Lin et al. LREC Code Y
2017 Seq2SQL: Generating structured queries from natural language using reinforcement learning Zhong et al. arXiv Code Y
2018 An encoder-decoder framework translating natural language to database queries Cai et al. IJCAI Code Y
2018 Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task Yu et al. EMNLP Code Y
2018 SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task Yu et al. EMNLP Code Y
2019 SParC: Cross-domain semantic parsing in context Yu et al. ACL Code Y
2019 CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases Yu et al. EMNLP Code Y

Program Repair

Year Title Author Venue Code In
2016 Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks Bhatia et al. arXiv Code Y
2018 Syntax and Sensibility: Using language models to detect and correct syntax errors Santos et al. SANER Code Y
2017 DeepFix: Fixing Common C Language Errors by Deep Learning Gupta et al. AAAI Code Y
2021 SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair Chen et al. TSE Code Y
2018 Deep Reinforcement Learning for Programming Language Correction Gupta et al. arXiv Code Y
2019 SampleFix: Learning to Correct Programs by Sampling Diverse Fixes Hajipour et al. arXiv Code Y
2019 Neural Program Repair by Jointly Learning to Localize and Repair Vasic et al. ICLR Code Y
2020 Hoppity: Learning graph transformations to detect and fix bugs in programs Dinella et al. ICLR Code Y
2014 Neural turing machines Graves et al. arXiv Code Y
2019 DeepDelta: Learning to Repair Compilation Errors Mesbah et al. FSE Code Y
2020 Learning to Fix Build Errors with Graph2Diff Neural Networks Tarlow et al. ICSE Code Y
2020 CODIT: Code editing with tree-based neural models Chakraborty et al. TSE Code Y
2021 A Syntax-Guided Edit Decoder for Neural Program Repair Zhu et al. FSE Code Y
2020 Graph-based, Self-Supervised Program Repair from Diagnostic Feedback Yasunaga et al. ICML Code Y
2021 TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer Berabi et al. ICML Code Y
2021 Self-Supervised Bug Detection and Repair Allamanis et al. NeurIPS Code Y
2021 CURE: Code-Aware Neural Machine Translation for Automatic Program Repair Jiang et al. ICSE Code Y
2018 An empirical investigation into learning bug-fixing patches in the wild via neural machine translation Tufano et al. ASE Code Y
2018 Learning to Generate Corrective Patches using Neural Machine Translation Hata et al. arXiv Code Y
2018 Learning to Repair Software Vulnerabilities with Generative Adversarial Networks Harer et al. NeurIPS Code Y
2020 Synthesize, execute and debug: Learning to repair for neural program synthesis Gupta et al. NeurIPS Code Y
2020 DLFix: Context-based Code Transformation Learning for Automated Program Repair Li et al. ICSE Code Y
2020 Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair Tian et al. ASE Code Y
2017 At the end of synthesis: narrowing program candidates Shriver et al. ICSE-NIER Code Y
2020 Human-in-the-loop automatic program repair Böhme et al. ICST Code Y
2021 Interactive Patch Filtering as Debugging Aid Liang et al. ICSME Code Y
2019 Learning to optimize halide with tree search and random programs Adams et al. TOG Code Y

Code Optimization

Year Title Author Venue Code In
2018 Learning to optimize tensor programs Chen et al. NeurIPS Code Y
2020 FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System Zheng et al. ASPLOS Code Y
2020 Ansor: Generating high-performance tensor programs for deep learning Zheng et al. OSDI Code Y
2013 Predictive modeling in a polyhedral optimization space Park et al. IJPP Code Y

Other Applications

Year Title Author Venue Code In
2021 ProGraML: Graph-based Deep Learning for Program Optimization and Analysis Cummins et al. ICML Code Y
2020 Deep program structure modeling through multi-relational graph-based learning Ye et al. PACT Code Y
2020 Designing PairBuddy – A Conversational Agent for Pair Programming Robe et al. arXiv Code Y
2021 On the Evaluation of Commit Message Generation Models: An Experimental Study Tao et al. ICSME Code Y
2018 Large-scale and language-oblivious code authorship identification Abuhamad et al. CCS Code Y

Dataset

Year Title Author Venue Code In
2019 CodeSearchNet challenge: Evaluating the state of semantic code search Husain et al. arXiv Code Y
2021 CoSQA: 20,000+ Web Queries for Code Search and Question Answering Huang et al. ACL Code Y
2016 Probabilistic model for code with decision trees Raychev et al. OOPSLA Code Y
2017 A parallel corpus of Python functions and documentation strings for automated code documentation and code generation Barone et al. IJCNLP Code Y
2020 PyMT5: multi-mode translation of natural language and Python code with transformers Clement et al. EMNLP Code Y
2018 Deep code comment generation Hu et al. ICPC Code Y
2021 Retrieval-Augmented Generation for Code Summarization via Hybrid GNN Liu et al. ICLR Code Y
2018 Deep learning type inference Hellendoorn et al. FSE Code Y
2021 CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks Puri et al. arXiv Code Y
2019 JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation Agashe et al. EMNLP Code Y
2021 ProGraML: Graph-based Deep Learning for Program Optimization and Analysis Cummins et al. ICML Code Y
2019 Recommendations for Datasets for Source Code Summarization LeClair et al. NAACL Code Y
2021 CoDesc: A Large Code-Description Parallel Dataset Hasan et al. ACL Code Y
2021 Measuring Coding Challenge Competence With APPS Hendrycks et al. NeurIPS Code Y
2021 AVATAR: A Parallel Corpus for Java-Python Program Translation Ahmad et al. arXiv Code Y
2018 StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow Yao et al. WWW Code Y
2021 PyTorrent: A Python Library Corpus for Large-scale Language Models Bahrami et al. arXiv Code Y
2021 CodeQA: A Question Answering Dataset for Source Code Comprehension Liu et al. EMNLP Code Y
2021 CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation Lu et al. NeurIPS Code Y

Challenges and Opportunities

Comprehensive Code Representation

Year Title Author Venue Code In
2019 Open Vocabulary Learning on Source Code with a Graph-Structured Cache Cvitkovic et al. ICML Code Y
2020 Big Code != Big Vocabulary: Open-vocabulary models for source code Karampatsis et al. ICSE Code Y
2021 A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code Chirkova et al. NAACL Code Y

Multi-Lingual and Cross-Language

Year Title Author Venue Code In
2021 Disentangled Code Representation Learning for Multiple Programming Languages Zhang et al. ACL Code Y
2022 Multilingual training for Software Engineering Ahmed et al. ICSE Code Y
2019 CLCDSA: cross language code clone detection using syntactical features and API documentation Nafi et al. ASE Code Y
2019 Bilateral dependency neural networks for cross-language algorithm classification Bui et al. SANER Code Y
2019 SAR: learning cross-language API mappings with little knowledge Bui et al. FSE Code Y
2021 Interactive Cross-language Code Retrieval with Auto-Encoders Chen et al. ASE Code Y
2022 Cross-Domain Deep Code Search with Few-Shot Meta Learning Chai et al. ICSE Code Y
2022 Cross-Language Binary-Source Code Matching with Intermediate Representations Gui et al. SANER Code Y

Model Interpretability

Year Title Author Venue Code In
2021 Vulnerability Detection with Fine-grained Interpretations Li et al. FSE Code Y
2021 Interpreting deep learning-based vulnerability detector predictions based on heuristic searching Zou et al. TOSEM Code Y
2021 Interpretable Program Synthesis Zhang et al. CHI Code Y
2021 PyExplainer: Explaining the Predictions of Just-In-Time Defect Models Pornprasit et al. ASE Code Y

Robustness and Security

Year Title Author Venue Code In
2017 Towards evaluating the robustness of neural networks Carlini et al. SP Code Y
2018 Robust physical-world attacks on deep learning visual classification Eykholt et al. CVPR Code Y
2019 On evaluating adversarial robustness Carlini et al. arXiv Code Y
2020 Adversarial attacks on deep-learning models in natural language processing: A survey Zhang et al. TIST Code Y
2020 Semantic Robustness of Models of Source Code Ramakrishnan et al. arXiv Code Y
2020 Adversarial Examples for Models of Code Yefet et al. OOPSLA Code Y
2021 Adversarial Attacks to API Recommender Systems: Time to Wake Up and Smell the Coffee? Nguyen et al. ASE Code Y
2020 Adversarial robustness for code Bielik et al. ICML Code Y
2021 Adversarial Robustness of Deep Code Comment Generation Zhou et al. arXiv Code Y
2019 Misleading Authorship Attribution of Source Code using Adversarial Learning Quiring et al. USENIX Security Code Y
2021 A Practical Black-box Attack on Source Code Authorship Identification Classifiers Liu et al. TIFS Code Y
2021 Backdoors in Neural Models of Source Code Ramakrishnan et al. arXiv Code Y
2021 You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion Schuster et al. USENIX Security Code Y
2021 Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers Severi et al. USENIX Security Code Y
2020 Generating Adversarial Examples for Holding Robustness of Source Code Processing Models Zhang et al. AAAI Code Y

BIB

Contributors

wanyao1992