
ATPapers

Worth-reading papers and related resources on the attention mechanism, the Transformer architecture, and pretrained language models (PLMs) such as BERT.

Suggestions for fixing errors or adding papers, repositories, and other resources are welcome!

Since I am Chinese, I mainly focus on Chinese resources; recommendations of excellent resources in English or other languages are also very welcome!


Attention

Papers

  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. (ICML 2015) [paper] - Hard & Soft Attention
  • Effective Approaches to Attention-based Neural Machine Translation. Minh-Thang Luong, Hieu Pham, Christopher D. Manning. (EMNLP 2015) [paper] - Global & Local Attention
  • Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. (ICLR 2015) [paper]
  • Non-local Neural Networks. Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He. (CVPR 2018) [paper][code]
  • Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures. Gongbo Tang, Mathias Müller, Annette Rios, Rico Sennrich. (EMNLP 2018) [paper]
  • Phrase-level Self-Attention Networks for Universal Sentence Encoding. Wei Wu, Houfeng Wang, Tianyu Liu, Shuming Ma. (EMNLP 2018) [paper]
  • Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling. Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang. (ICLR 2018) [paper][code] - Bi-BloSAN
  • Efficient Attention: Attention with Linear Complexities. Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li. (CoRR 2018) [paper][code]
  • Leveraging Local and Global Patterns for Self-Attention Networks. Mingzhou Xu, Derek F. Wong, Baosong Yang, Yue Zhang, Lidia S. Chao. (ACL 2019) [paper] [tf code][pt code]
  • Attention over Heads: A Multi-Hop Attention for Neural Machine Translation. Shohei Iida, Ryuichiro Kimura, Hongyi Cui, Po-Hsuan Hung, Takehito Utsuro, Masaaki Nagata. (ACL 2019) [paper]
  • Are Sixteen Heads Really Better than One?. Paul Michel, Omer Levy, Graham Neubig. (NeurIPS 2019) [paper]
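
As a quick reference, the sketch below shows the core soft-attention computation (Luong-style global dot-product attention) that most of the papers above build on. It assumes PyTorch; shapes, scoring functions, and masking conventions vary from paper to paper.

```python
# A minimal sketch of Luong-style global (soft) attention with a dot-product score.
# Illustrative only; individual papers use different scoring functions and masks.
import torch
import torch.nn.functional as F

def global_attention(query, keys, values):
    """query: (batch, d); keys/values: (batch, src_len, d)."""
    # Dot-product score between the query and every source position.
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)    # (batch, src_len)
    weights = F.softmax(scores, dim=-1)                          # attention distribution
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1) # (batch, d)
    return context, weights

# Example: one decoder state attending over 5 encoder states.
q = torch.randn(2, 16)
enc = torch.randn(2, 5, 16)
ctx, attn = global_attention(q, enc, enc)
print(ctx.shape, attn.shape)  # torch.Size([2, 16]) torch.Size([2, 5])
```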

Survey & Review

  • An Attentive Survey of Attention Models. Sneha Chaudhari, Gungor Polatkan, Rohan Ramanath, Varun Mithal. (IJCAI 2019) [paper]

English Blog

Chinese Blog

Repositories

Transformer

Papers

  • Attention is All you Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. (NIPS 2017) [paper][code] - Transformer
  • Weighted Transformer Network for Machine Translation. Karim Ahmed, Nitish Shirish Keskar, Richard Socher. (CoRR 2017) [paper][code]
  • Accelerating Neural Transformer via an Average Attention Network. Biao Zhang, Deyi Xiong, Jinsong Su. (ACL 2018) [paper][code] - AAN
  • Self-Attention with Relative Position Representations. Peter Shaw, Jakob Uszkoreit, Ashish Vaswani. (NAACL 2018) [paper] [unofficial code]
  • Universal Transformers. Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser. (ICLR 2019) [paper][code]
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc Viet Le, Ruslan Salakhutdinov. (ACL 2019) [paper]
  • Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov. (ACL 2019) [paper]
  • Star-Transformer. Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang. (NAACL 2019) [paper]
  • Generating Long Sequences with Sparse Transformers. Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. (CoRR 2019) [paper][code]
  • Memory Transformer Networks. Jonas Metzger. (CS224n Winter 2019 Reports) [paper]
  • Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel. Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov. (EMNLP 2019) [paper][code]
  • Transformers without Tears: Improving the Normalization of Self-Attention. Toan Q. Nguyen, Julian Salazar. (IWSLT 2019) [paper][code]
  • TENER: Adapting Transformer Encoder for Named Entity Recognition. Hang Yan, Bocao Deng, Xiaonan Li, Xipeng Qiu. (CoRR 2019) [paper]
  • Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection. Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun. (CoRR 2019) [paper][code]
  • Compressive Transformers for Long-Range Sequence Modelling. Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap. (ICLR 2020) [paper][code]
  • Reformer: The Efficient Transformer. Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya. (ICLR 2020) [paper] [code 1][code 2][code 3]
  • On Layer Normalization in the Transformer Architecture. Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu. (ICML 2020) [paper]
  • Lite Transformer with Long-Short Range Attention. Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin, Song Han. (ICLR 2020) [paper][code]
  • ReZero is All You Need: Fast Convergence at Large Depth. Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, Julian McAuley. (CoRR 2020) [paper] [code] [related Chinese post]
  • Improving Transformer Models by Reordering their Sublayers. Ofir Press, Noah A. Smith, Omer Levy. (ACL 2020) [paper]
  • Highway Transformer: Self-Gating Enhanced Self-Attentive Networks. Yekun Chai, Jin Shuo, Xinwen Hou. (ACL 2020) [paper][code]
  • HAT: Hardware-Aware Transformers for Efficient Natural Language Processing. Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han. (ACL 2020) [paper][code]
  • Longformer: The Long-Document Transformer. Iz Beltagy, Matthew E. Peters, Arman Cohan. (CoRR 2020) [paper][code]
  • Talking-Heads Attention. Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou. (CoRR 2020) [paper]
  • Synthesizer: Rethinking Self-Attention in Transformer Models. Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng. (CoRR 2020) [paper]
  • Linformer: Self-Attention with Linear Complexity. Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. (CoRR 2020) [paper]
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. (ICML 2020) [paper][code][project]
  • Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. (CoRR 2020) [paper][code]
  • Fast Transformers with Clustered Attention. Apoorv Vyas, Angelos Katharopoulos, François Fleuret. (CoRR 2020) [paper][code]
  • Memory Transformer. Mikhail S. Burtsev, Grigory V. Sapunov. (CoRR 2020) [paper]
  • Multi-Head Attention: Collaborate Instead of Concatenate. Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi. (CoRR 2020) [paper][code]
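
For orientation, here is a minimal sketch of a pre-LN Transformer encoder block, the variant analyzed in "On Layer Normalization in the Transformer Architecture" above. It assumes PyTorch; the hyperparameters are placeholders, not values from any particular paper.

```python
# A minimal pre-LN Transformer encoder block: LayerNorm is applied before each
# sublayer, with residual connections around self-attention and the feed-forward net.
import torch
import torch.nn as nn

class PreLNEncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                          batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Self-attention sublayer with residual connection.
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, key_padding_mask=key_padding_mask)
        x = x + self.drop(h)
        # Position-wise feed-forward sublayer with residual connection.
        x = x + self.drop(self.ff(self.norm2(x)))
        return x

block = PreLNEncoderBlock()
out = block(torch.randn(2, 10, 512))  # (batch, seq_len, d_model)
```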

Chinese Blog

English Blog

Repositories

Pretrained Language Model

Models

  • Deep Contextualized Word Representations (NAACL 2018) [paper] - ELMo
  • Universal Language Model Fine-tuning for Text Classification (ACL 2018) [paper] - ULMFiT
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019) [paper][code][official PyTorch code] - BERT
  • Improving Language Understanding by Generative Pre-Training (CoRR 2018) [paper] - GPT
  • Language Models are Unsupervised Multitask Learners (CoRR 2019) [paper][code] - GPT-2
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019) [paper][code] - MASS
  • Unified Language Model Pre-training for Natural Language Understanding and Generation (CoRR 2019) [paper][code] - UNILM
  • Multi-Task Deep Neural Networks for Natural Language Understanding (ACL 2019) [paper][code] - MT-DNN
  • 75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP 2019) [paper][code] - UDify
  • ERNIE: Enhanced Language Representation with Informative Entities (ACL 2019) [paper][code] - ERNIE (THU)
  • ERNIE: Enhanced Representation through Knowledge Integration (CoRR 2019) [paper] - ERNIE (Baidu)
  • Defending Against Neural Fake News (CoRR 2019) [paper][code] - Grover
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (CoRR 2019) [paper] - ERNIE 2.0 (Baidu)
  • Pre-Training with Whole Word Masking for Chinese BERT (CoRR 2019) [paper] - Chinese-BERT-wwm
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans (CoRR 2019) [paper] - SpanBERT
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding (CoRR 2019) [paper][code] - XLNet
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach (CoRR 2019) [paper] - RoBERTa
  • NEZHA: Neural Contextualized Representation for Chinese Language Understanding (CoRR 2019) [paper][code] - NEZHA
  • K-BERT: Enabling Language Representation with Knowledge Graph (AAAI 2020) [paper][code] - K-BERT
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (CoRR 2019) [paper][code] - Megatron-LM
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (CoRR 2019) [paper][code] - T5
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (CoRR 2019) [paper] - BART
  • ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations (CoRR 2019) [paper][code] - ZEN
  • The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service (CoRR 2019) [paper][code] - BAAI-JDAI-BERT
  • Knowledge Enhanced Contextual Word Representations (EMNLP 2019) [paper] - KnowBert
  • UER: An Open-Source Toolkit for Pre-training Models (EMNLP 2019) [paper][code] - UER
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR 2020) [paper] - ELECTRA
  • StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR 2020) [paper] - StructBERT
  • FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR 2020) [paper][code] - FreeLB
  • HUBERT Untangles BERT to Improve Transfer across NLP Tasks (CoRR 2019) [paper] - HUBERT
  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages (CoRR 2020) [paper] - CodeBERT
  • ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training (CoRR 2020) [paper] - ProphetNet
  • ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation (CoRR 2020) [paper][code] - ERNIE-GEN
  • Efficient Training of BERT by Progressively Stacking (ICML 2019) [paper][code] - StackingBERT
  • PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination (CoRR 2020) [paper][code]
  • Towards a Human-like Open-Domain Chatbot (CoRR 2020) [paper] - Meena
  • UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training (CoRR 2020) [paper][code] - UNILMv2
  • Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space (CoRR 2020) [paper][code] - Optimus
  • SegaBERT: Pre-training of Segment-aware BERT for Language Understanding. He Bai, Peng Shi, Jimmy Lin, Luchen Tan, Kun Xiong, Wen Gao, Ming Li. (CoRR 2020) [paper]
  • MPNet: Masked and Permuted Pre-training for Language Understanding (CoRR 2020) [paper][code] - MPNet
  • Language Models are Few-Shot Learners (CoRR 2020) [paper][code] - GPT-3
  • SPECTER: Document-level Representation Learning using Citation-informed Transformers (ACL 2020) [paper] - SPECTER
  • Recipes for building an open-domain chatbot (CoRR 2020) [paper][post][code] - Blender
  • PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning (CoRR 2020) [paper][code] - PLATO-2
  • DeBERTa: Decoding-enhanced BERT with Disentangled Attention (CoRR 2020) [paper][code] - DeBERTa
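
Many of the models above can be loaded through the Hugging Face Transformers library (listed under Analysis & Tools below). A minimal usage sketch, assuming transformers and PyTorch are installed; the checkpoint name is just one example to swap for the model you need:

```python
# Load a pretrained checkpoint and run a forward pass to get contextual embeddings.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Attention is all you need.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```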

Multi-Modal

  • VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV 2019) [paper]
  • Learning Video Representations using Contrastive Bidirectional Transformer (CoRR 2019) [paper] - CBT
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS 2019) [paper][code]
  • VisualBERT: A Simple and Performant Baseline for Vision and Language (CoRR 2019) [paper][code]
  • Fusion of Detected Objects in Text for Visual Question Answering (EMNLP 2019) [paper][code] - B2T2
  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training (AAAI 2020) [paper]
  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP 2019) [paper][code]
  • VL-BERT: Pre-training of Generic Visual-Linguistic Representations (CoRR 2019) [paper][code]
  • UNITER: Learning UNiversal Image-TExt Representations (CoRR 2019) [paper]
  • FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval (SIGIR 2020) [paper] - FashionBERT
  • VD-BERT: A Unified Vision and Dialog Transformer with BERT (CoRR 2020) [paper] - VD-BERT

Multilingual

  • Cross-lingual Language Model Pretraining (CoRR 2019) [paper] - XLM
  • MultiFiT: Efficient Multi-lingual Language Model Fine-tuning (EMNLP 2019) [paper][code] - MultiFiT
  • XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization (CoRR 2020) [paper][code] - XTREME
  • Pre-training via Paraphrasing (CoRR 2020) [paper] - MARGE
  • WikiBERT Models: Deep Transfer Learning for Many Languages (CoRR 2020) [paper][code] - WikiBERT
  • Language-agnostic BERT Sentence Embedding (CoRR 2020) [paper] - LaBSE

Compression & Accelerating

  • Distilling Task-Specific Knowledge from BERT into Simple Neural Networks (CoRR 2019) [paper]
  • Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System (CoRR 2019) [paper] - MKDM
  • Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding (CoRR 2019) [paper]
  • Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (CoRR 2019) [paper]
  • Small and Practical BERT Models for Sequence Labeling (EMNLP 2019) [paper]
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT (CoRR 2019) [paper] - Q-BERT
  • Patient Knowledge Distillation for BERT Model Compression (EMNLP 2019) [paper] - BERT-PKD
  • Extreme Language Model Compression with Optimal Subwords and Shared Projections (CoRR 2019) [paper]
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter [paper][code] - DistilBERT
  • TinyBERT: Distilling BERT for Natural Language Understanding (CoRR 2019) [paper][code] - TinyBERT
  • Q8BERT: Quantized 8Bit BERT (NeurIPS 2019 Workshop) [paper] - Q8BERT
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR 2020) [paper][code] - ALBERT
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning (ICLR 2020) [paper][PyTorch code]
  • Reducing Transformer Depth on Demand with Structured Dropout (ICLR 2020) [paper] - LayerDrop
  • Multilingual Alignment of Contextual Word Representations (ICLR 2020) [paper]
  • AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search (CoRR 2020) [paper] - AdaBERT
  • BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou. (CoRR 2020) [paper][pt code][tf code][keras code]
  • MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (CoRR 2020) [paper][code] - MiniLM
  • FastBERT: a Self-distilling BERT with Adaptive Inference Time (ACL 2020) [paper][code] - FastBERT
  • MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (ACL 2020) [paper][code] - MobileBERT
  • DynaBERT: Dynamic BERT with Adaptive Width and Depth (CoRR 2020) [paper] - DynaBERT
  • SqueezeBERT: What can computer vision teach NLP about efficient neural networks?. Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. (CoRR 2020) [paper]
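
Most of the distillation papers above (DistilBERT, BERT-PKD, TinyBERT, etc.) train a student against temperature-softened teacher outputs in some form. Below is a minimal sketch of such a soft-target loss; the temperature and loss weighting are illustrative defaults, not values taken from any one paper.

```python
# Soft-target knowledge distillation: KL divergence to the teacher's softened
# distribution plus standard cross-entropy on the hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(4, 3)           # student logits
t = torch.randn(4, 3)           # teacher logits
y = torch.tensor([0, 2, 1, 0])  # gold labels
print(distillation_loss(s, t, y).item())
```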

Application

  • BERT for Joint Intent Classification and Slot Filling (CoRR 2019) [paper]
  • GPT-based Generation for Classical Chinese Poetry (CoRR 2019) [paper]
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP 2019) [paper][code]
  • Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (ICLR 2020) [paper]
  • Pre-training Tasks for Embedding-based Large-scale Retrieval (ICLR 2020) [paper]
  • K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters (CoRR 2020) [paper] - K-Adapter
  • Keyword-Attentive Deep Semantic Matching (CoRR 2020) [paper & code] [post] - Keyword BERT
  • Unified Multi-Criteria Chinese Word Segmentation with BERT (CoRR 2020) [paper]
  • ToD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogues (CoRR 2020) [paper][code]
  • Spelling Error Correction with Soft-Masked BERT (ACL 2020) [paper] - Soft-Masked BERT
  • DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering (ACL 2020) [paper][code] - DeFormer
  • BLEURT: Learning Robust Metrics for Text Generation (ACL 2020) [paper][code] - BLEURT
  • Context-Aware Document Term Weighting for Ad-Hoc Search (WWW 2020) [paper][code] - HDCT
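
As an illustration of the Sentence-BERT idea above, the sketch below mean-pools BERT token vectors into sentence embeddings and compares them by cosine similarity. A real SBERT model is additionally fine-tuned with a siamese objective, which this sketch omits; it assumes the Hugging Face transformers library.

```python
# Mean-pooled sentence embeddings from a pretrained encoder, compared by cosine similarity.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float() # zero out padding positions
    return (hidden * mask).sum(1) / mask.sum(1)          # mean pooling

a, b = embed(["A man is playing guitar.", "Someone plays an instrument."])
print(torch.cosine_similarity(a, b, dim=0).item())
```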

Analysis & Tools

  • Probing Neural Network Comprehension of Natural Language Arguments (ACL 2019) [paper][code]
  • Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference (ACL 2019) [paper] [code]
  • To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (RepL4NLP@ACL 2019) [paper]
  • Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection (CICLing 2019) [paper]
  • Understanding the Behaviors of BERT in Ranking (CoRR 2019) [paper]
  • How to Fine-Tune BERT for Text Classification? (CoRR 2019) [paper]
  • What Does BERT Look At? An Analysis of BERT's Attention (BlackBoxNLP 2019) [paper][code]
  • Visualizing and Understanding the Effectiveness of BERT (EMNLP 2019) [paper]
  • exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models (CoRR 2019) [paper] [code]
  • Transformers: State-of-the-art Natural Language Processing [paper][code 1][code 2]
  • Do Attention Heads in BERT Track Syntactic Dependencies? [paper]
  • Fine-tune BERT with Sparse Self-Attention Mechanism (EMNLP 2019) [paper]
  • How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP 2019) [paper]
  • oLMpics -- On what Language Model Pre-training Captures (CoRR 2019) [paper]
  • Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment (AAAI 2020) [paper][code] - TextFooler
  • A Mutual Information Maximization Perspective of Language Representation Learning (ICLR 2020) [paper]
  • Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping (CoRR 2020) [paper]
  • How Much Knowledge Can You Pack Into the Parameters of a Language Model? (CoRR 2020) [paper]
  • A Primer in BERTology: What we know about how BERT works. Anna Rogers, Olga Kovaleva, Anna Rumshisky. (CoRR 2020) [paper]
  • BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations (CoRR 2020) [paper]
  • Contextual Embeddings: When Are They Worth It? (ACL 2020) [paper]
  • Weight Poisoning Attacks on Pre-trained Models (ACL 2020) [paper][code] - RIPPLe
  • Roles and Utilization of Attention Heads in Transformer-based Neural Language Models (ACL 2020) [paper][code] - Transformer Anatomy
  • Adversarial Training for Large Neural Language Models (CoRR 2020) [paper][code]
  • Cross-Lingual Ability of Multilingual BERT: An Empirical Study (ICLR 2020) [paper][code]
  • DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference (ACL 2020) [paper][code][huggingface implementation]
  • Beyond Accuracy: Behavioral Testing of NLP models with CheckList. Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh. (ACL 2020 Best Paper) [paper][code]
  • Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith. (ACL 2020) [paper][code]
  • TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing. Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu. (ACL 2020) [paper][code]
  • Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT. Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu. (ACL 2020) [paper][pt code][keras code]
  • Rethinking Positional Encoding in Language Pre-training. Guolin Ke, Di He, Tie-Yan Liu. (CoRR 2020) [paper][code] - TUPE
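
For head-level analyses such as "What Does BERT Look At?" above, per-head attention maps can be pulled directly out of a pretrained model. A minimal sketch assuming the Hugging Face transformers library; the layer and head indices are arbitrary examples:

```python
# Extract per-layer, per-head attention maps from a pretrained BERT for inspection.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The keys to the cabinet are on the table.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: tuple of (batch, heads, seq, seq) tensors, one per layer.
layer, head = 8, 10
attn = outputs.attentions[layer][0, head]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(tokens)
print(attn.shape)  # (seq_len, seq_len) attention map for one head
```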

Tutorial & Survey

  • Transfer Learning in Natural Language Processing. Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, Thomas Wolf. (NAACL 2019) [paper]
  • Evolution of Transfer Learning in Natural Language Processing. Aditya Malte, Pratik Ratadiya. (CoRR 2019) [paper]
  • Transferring NLP Models Across Languages and Domains. Barbara Plank. (DeepLo 2019) [slides]
  • Recent Breakthroughs in Natural Language Processing. Christopher Manning. (BAAI 2019) [slides]
  • Pre-trained Models for Natural Language Processing: A Survey. Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang. (Invited Review of Science China Technological Sciences 2020) [paper]
  • Embeddings in Natural Language Processing. Mohammad Taher Pilehvar, Jose Camacho-Collados. (2020) [book]

Repository

Chinese Blog

English Blog
