The NLP Pandect

This pandect (πανδέκτης is Ancient Greek for encyclopedia) was created to help you find almost anything related to Natural Language Processing that is available online.

The-NLP-Resources

Compendiums and awesome lists on the topic of NLP:

NLP Conferences, Paper Summaries and Paper Compendiums:

NLP Progress and NLP Tasks:

NLP Datasets:

Word and Sentence embeddings:

Notebooks, Scripts and Repositories

The-NLP-Podcasts

The-NLP-Newsletter

The-NLP-Meetups

The-NLP-Youtube

The-NLP-Benchmarks

  • SQuAD - Stanford Question Answering Dataset
  • GLUE - General Language Understanding Evaluation benchmark
  • SuperGLUE - benchmark styled after GLUE with a new set of more difficult language understanding tasks
  • XTREME - Massively Multilingual Multi-task Benchmark
  • decaNLP - The Natural Language Decathlon, for studying general NLP models
  • RACE - ReAding Comprehension dataset collected from English Examinations

The-NLP-Research

General

Embeddings

Repositories

Blogs

Byte Pair Encoding

  • bpemb - Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) [GitHub ~800 stars]
  • subword-nmt - Unsupervised Word Segmentation for Neural Machine Translation and Text Generation [GitHub ~1500 stars]
  • python-bpe - Byte Pair Encoding for Python [GitHub ~100 stars]
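
For orientation, the core idea behind the Byte Pair Encoding tools listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used by any of the linked repositories: the function name `learn_bpe`, the `</w>` end-of-word marker, and the tie-breaking behavior (first pair seen wins) are all assumptions made for this example.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy sketch)."""
    # Represent each word as a tuple of symbols plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the most frequent pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(learn_bpe(["ab", "ab", "abc"], 2))
```

Each learned rule is a pair of symbols to merge, in the order they were learned; a real tokenizer would replay these rules on new text to split it into subwords.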

Transformer-based Architectures

General

Transformer

BERT

GPT-family

General

GPT-3

Other

Distillation, Pruning and Quantization

Automated Summarization

The-NLP-Industry

Transformer-based Architectures

Embeddings as a Service

NLP Recipes Industrial Applications:

NLP Applications in Bio, Finance, Legal and other industries

The-NLP-Speech

General Speech Recognition

  • wav2letter - Automatic Speech Recognition Toolkit [GitHub ~5k stars]
  • DeepSpeech - Baidu's DeepSpeech architecture [GitHub ~14k stars]
  • Acoustic Word Embeddings by Maria Obedkova [Blog, 2020]
  • kaldi - Kaldi is a toolkit for speech recognition [GitHub ~9k stars]
  • awesome-kaldi - resources for using Kaldi [GitHub ~300 stars]

Text to Speech

  • FastSpeech - An implementation of FastSpeech based on PyTorch [GitHub ~500 stars]

The-NLP-Topics

Blogs

Repositories

Data Augmentation

Ethics, Bias, and Equality in NLP

The-NLP-Frameworks

General Purpose

  • transformers by HuggingFace [GitHub ~28k stars]
  • spaCy by Explosion AI [GitHub ~17k stars]
  • flair by Zalando [GitHub ~9k stars]
  • AllenNLP by AI2 [GitHub ~9k stars]
  • stanza (formerly StanfordNLP) [GitHub ~4k stars]
  • spaCy stanza [GitHub ~400 stars]
  • nltk [GitHub ~9k stars]
  • NLP Architect - A Deep Learning NLP/NLU library by Intel® AI Lab [GitHub ~2.5k stars]
  • Kashgari - Transfer Learning with focus on Chinese [GitHub ~2k stars]
  • polyglot - Multilingual NLP Framework [GitHub ~2k stars]
  • FARM [GitHub ~1k stars]
  • gobbli by RTI International [GitHub ~200 stars]
  • headliner - training and deployment of seq2seq models [GitHub ~200 stars]
  • SyferText - A privacy preserving NLP framework [GitHub ~100 stars]

Dialog Systems and Speech

  • DeepPavlov by MIPT [GitHub ~4k stars]
  • ParlAI by FAIR [GitHub ~6k stars]
  • rasa - Framework for Conversational Agents [GitHub ~9k stars]
  • wav2letter - Automatic Speech Recognition Toolkit [GitHub ~5k stars]

Distributed NLP

Other NLP Topics

General

Tokenization

  • tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production [GitHub ~3k stars]
  • SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation [GitHub ~4k stars]
  • SoMaJo - A tokenizer and sentence splitter for German and English web and social media texts [GitHub ~100 stars]

The-NLP-Learning

Books

Courses

Tutorials

The-NLP-Communities

License CC0

Attributions

Resources

  • All linked resources belong to original authors

Icons

Fonts

Contributors

ivan-bilan, stephenroller
