awesome-huggingface's Introduction

awesome-huggingface

This is a list of some wonderful open-source projects & applications integrated with Hugging Face libraries.

How to contribute

🤗 Official Libraries

First-party cool stuff made with ❤️ by 🤗 Hugging Face.

  • transformers - State-of-the-art natural language processing for JAX, PyTorch and TensorFlow.
  • datasets - The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools.
  • tokenizers - Fast state-of-the-art tokenizers optimized for research and production.
  • knockknock - Get notified when your training ends with only two additional lines of code.
  • accelerate - A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision setups.
  • autonlp - Train state-of-the-art natural language processing models and deploy them in a scalable environment automatically.
  • nn_pruning - Prune a model while finetuning or training.
  • huggingface_hub - Client library to download and publish models and other files on the huggingface.co hub.
  • tune - A benchmark for comparing Transformer-based models.

👩‍🏫 Tutorials

Learn how to use Hugging Face toolkits, step-by-step.

  • Official Course (from Hugging Face) - The official course series provided by 🤗 Hugging Face.
  • transformers-tutorials (by @nielsrogge) - Tutorials for applying multiple models on real-world datasets.

🧰 NLP Toolkits

NLP toolkits built upon Transformers. Swiss Army!

  • AllenNLP (from AI2) - An open-source NLP research library.
  • Graph4NLP - Enabling easy use of Graph Neural Networks for NLP.
  • Lightning Transformers - Transformers with PyTorch Lightning interface.
  • Adapter Transformers - Extension to the Transformers library, integrating adapters into state-of-the-art language models.
  • Obsei - A low-code AI workflow automation tool that performs various NLP tasks in the workflow pipeline.
  • Trapper (from OBSS) - State-of-the-art NLP through transformer models in a modular design and consistent APIs.
  • Flair - A very simple framework for state-of-the-art NLP.

🥡 Text Representation

Converting a sentence to a vector.

  • Sentence Transformers (from UKPLab) - Widely used encoders computing dense vector representations for sentences, paragraphs, and images.
  • WhiteningBERT (from Microsoft) - An easy unsupervised sentence embedding approach with whitening.
  • SimCSE (from Princeton) - State-of-the-art sentence embedding with contrastive learning.
  • DensePhrases (from Princeton) - Learning dense representations of phrases at scale.
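
To make "converting a sentence to a vector" concrete, here is a minimal sketch in plain Python. It assumes a tiny hand-written table of 3-dimensional token vectors (`TOY_VECTORS`, `embed`, and `cosine` are hypothetical names for illustration); the projects above use trained Transformer encoders instead of a lookup table. The sketch only shows the general idea of mean pooling token vectors into one sentence vector and comparing sentences by cosine similarity.

```python
import math

# Hypothetical toy token vectors; real encoders learn these from data.
TOY_VECTORS = {
    "cats": [1.0, 0.0, 0.0],
    "dogs": [0.9, 0.1, 0.0],
    "purr": [0.8, 0.0, 0.2],
    "bark": [0.7, 0.2, 0.1],
    "stocks": [0.0, 1.0, 0.0],
    "fall": [0.0, 0.8, 0.6],
}


def embed(sentence):
    """Mean-pool the vectors of known tokens into one sentence vector."""
    vectors = [TOY_VECTORS[t] for t in sentence.lower().split() if t in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known tokens in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


related = cosine(embed("cats purr"), embed("dogs bark"))
unrelated = cosine(embed("cats purr"), embed("stocks fall"))
print(related > unrelated)
```

Sentences about similar topics end up with nearby vectors, which is the property that search and clustering tools built on sentence embeddings rely on.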

⚙️ Inference Engines

Highly optimized inference engines implementing Transformers-compatible APIs.

  • TurboTransformers (from Tencent) - An inference engine for transformers with fast C++ API.
  • FasterTransformer (from Nvidia) - A script and recipe to run the highly optimized transformer-based encoder and decoder component on NVIDIA GPUs.
  • lightseq (from ByteDance) - A high performance inference library for sequence processing and generation implemented in CUDA.
  • FastSeq (from Microsoft) - Efficient implementation of popular sequence models (e.g., Bart, ProphetNet) for text generation, summarization, translation tasks etc.

🌗 Model Scalability

Parallelizing models across multiple GPUs.

  • Parallelformers (from TUNiB) - A library for model parallel deployment.
  • OSLO (from TUNiB) - A library that supports various features to help you train large-scale models.
  • DeepSpeed (from Microsoft) - DeepSpeed ZeRO scales to any model size with little to no changes to the model. Integrated with HF Trainer.
  • fairscale (from Facebook) - Implements ZeRO protocol as well. Integrated with HF Trainer.
  • ColossalAI (from Hpcaitech) - A Unified Deep Learning System for Large-Scale Parallel Training (1D, 2D, 2.5D, 3D and sequence parallelism, and ZeRO protocol).

🏎️ Model Compression/Acceleration

Compressing or accelerating models for improved inference speed.

  • torchdistill - PyTorch-based modular, configuration-driven framework for knowledge distillation.
  • TextBrewer (from HFL) - State-of-the-art distillation methods to compress language models.
  • BERT-of-Theseus (from Microsoft) - Compressing BERT by progressively replacing the components of the original BERT.
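
As a rough illustration of what knowledge distillation optimizes, the sketch below implements the classic soft-target loss in plain Python: a temperature-softened KL divergence between teacher and student output distributions. This is a generic textbook formulation under stated assumptions, not the specific method of any project listed above; the function names and the default temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(close_student, teacher) < distillation_loss(far_student, teacher))
```

A student whose logits track the teacher's gets a loss near zero, so minimizing this quantity pushes a smaller model toward the larger model's behavior.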

🏹️ Adversarial Attack

Conducting adversarial attacks to test model robustness.

  • TextAttack (from UVa) - A Python framework for adversarial attacks, data augmentation, and model training in NLP.
  • TextFlint (from Fudan) - A unified multilingual robustness evaluation toolkit for NLP.
  • OpenAttack (from THU) - An open-source textual adversarial attack toolkit.

🔁 Style Transfer

Transfer the style of text! Now you know why it's called transformer?

  • Styleformer - A neural language style transfer framework to transfer text smoothly between styles.
  • ConSERT - A contrastive framework for self-supervised sentence representation transfer.

💢 Sentiment Analysis

Analyzing the sentiment and emotions of human beings.

  • conv-emotion - Implementation of different architectures for emotion recognition in conversations.

🙅 Grammatical Error Correction

You made a typo! Let me correct it.

  • Gramformer - A framework for detecting, highlighting and correcting grammatical errors on natural language text.

🗺 Translation

Translating between different languages.

  • dl-translate - A deep learning-based translation library based on HF Transformers.
  • EasyNMT (from UKPLab) - Easy-to-use, state-of-the-art translation library and Docker images based on HF Transformers.

📖 Knowledge and Entity

Learning knowledge, mining entities, connecting the world.

  • PURE (from Princeton) - Entity and relation extraction from text.

🎙 Speech

Speech processing powered by HF libraries. Need for speech!

  • s3prl - A self-supervised speech pre-training and representation learning toolkit.
  • speechbrain - A PyTorch-based speech toolkit.

🤯 Multi-modality

Understanding the world from different modalities.

  • ViLT (from Kakao) - A vision-and-language transformer without convolution or region supervision.

🤖 Reinforcement Learning

Combining RL magic with NLP!

  • trl - Fine-tune transformers using Proximal Policy Optimization (PPO) to align with human preferences.

❓ Question Answering

Searching for answers? Transformers to the rescue!

  • Haystack (from deepset) - End-to-end framework for developing and deploying question-answering systems in the wild.

💁 Recommender Systems

I think this is just right for you!

  • Transformers4Rec (from Nvidia) - A flexible and efficient library powered by Transformers for sequential and session-based recommendations.

⚖️ Evaluation

Evaluating model outputs and data quality powered by HF datasets!

  • Jury (from OBSS) - Easy-to-use tool for evaluating NLP model outputs, specifically for NLG (Natural Language Generation), offering various automated text-to-text metrics.
  • Spotlight - Interactively explore your HF dataset with one line of code. Use model results (e.g. embeddings, predictions) to understand critical data segments and model failure modes.

🔍 Neural Search

Search, but with the power of neural networks!

  • Jina Integration - Jina integration of Hugging Face Accelerated API.
  • Weaviate Integration (text2vec) (QA) - Weaviate integration of Hugging Face Transformers.
  • ColBERT (from Stanford) - A fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.

☁ Cloud

Cloud makes your life easy!

  • Amazon SageMaker - Making it easier than ever to train Hugging Face Transformer models in Amazon SageMaker.

📱 Hardware

The infrastructure enabling the magic to happen.

  • Qualcomm - Collaboration on enabling Transformers in Snapdragon.
  • Intel - Collaboration with Intel for configuration options.

awesome-huggingface's People

Contributors

bobvanluijt, cakiki, clmnt, davanstrien, devrimcavusoglu, ghosthamlet, hyunwoongko, jetrunner, josephrp, lewtun, lysandrejik, nreimers, patil-suraj, sanjaybharkatiya, ssuwelack, stefan-it, wauplin

awesome-huggingface's Issues

Add Transformers4Rec to the list

I want to add Transformers4Rec, a recently released open-source library, to the list. The library works as a bridge between NLP and recommender systems (RecSys) by integrating with Hugging Face Transformers, making state-of-the-art Transformer architectures available to RecSys researchers and industry practitioners.

Transformers4Rec enables the use of HF Transformers with any type of sequential tabular data. It is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and TensorFlow.

@JetRunner I am not opening a PR yet because no existing category fits the repo. Transformers4Rec would fit under a Recommender Systems category.

Additional resources:

Add jury to the list.

I would like to add Jury, a recently introduced package that focuses on the evaluation of NLG systems, to the list. It is built on top of datasets. It was first released about three weeks ago; although it is relatively new, it has great potential.

@JetRunner I am not opening a PR directly because I think it is not a great fit for the current sections. A suitable section would be "Evaluation"; it could be added under that section.

Project repo: https://github.com/obss/jury

[hacktoberfest] Hugging Face Collections Hacktoberfest challenge

Hugging Face Collections Hacktoberfest challenge!

Hugging Face Collections are a handy tool for curating the Models, Datasets, Spaces and Papers on the hub. We want to see what cool collections you can develop as part of Hacktoberfest 2023!

For this Hacktoberfest challenge, we're asking you to create exciting collections and make a PR to add them to one of the 'Awesome Collections lists' in this repository. At the end of the month, we'll pick some winners who created the best collections and award some prizes!

We'll be judging collections along a few criteria:

  • The upvotes given to the collection. Make sure you share your collection widely!!
  • Usefulness to the community
  • Creativity of the Collection
  • The approach taken to building the Collection

We're interested in 'hand-curated collections' and collections curated using automatic techniques using the huggingface_hub library.

Instructions

  1. Create a fantastic collection! You will probably find the Collections Documentation helpful!
  2. Once you are happy with your Collection, make a PR to add it under the relevant section (hand-curated or automatically created collections)
  3. If you are adding a collection created using automation, we also ask that you provide a link to your approach. This could be a notebook, a Python script, or another explanation for your approach.

Useful resources

To learn more about collections, check out the docs.

To explore automated approaches to Collections, you probably want to check out the relevant docs in the Hugging Face Hub library.

You might also find this tutorial notebook going through the steps of creating a collection using some automatic approaches useful. This notebook shows a similar approach.
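
An automated approach could look roughly like the sketch below. It is only an illustration under stated assumptions: `Candidate`, `pick_top`, and `publish` are hypothetical names, the download counts are made up, and ranking by downloads is just one possible selection rule. The `publish` helper relies on `create_collection` and `add_collection_item` from the huggingface_hub library, and it needs network access and a logged-in token, so it is defined here but not called.

```python
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one Hub model considered for a collection."""
    repo_id: str
    downloads: int


def pick_top(candidates: Iterable[Candidate], k: int = 5) -> List[str]:
    """Return the repo ids of the k most-downloaded candidates."""
    ranked = sorted(candidates, key=lambda c: c.downloads, reverse=True)
    return [c.repo_id for c in ranked[:k]]


def publish(title: str, repo_ids: List[str]) -> str:
    """Create a collection and add each model to it (needs network and a token)."""
    # Imported lazily so pick_top works without huggingface_hub installed.
    from huggingface_hub import add_collection_item, create_collection

    collection = create_collection(title=title, exists_ok=True)
    for repo_id in repo_ids:
        add_collection_item(
            collection.slug, item_id=repo_id, item_type="model", exists_ok=True
        )
    return collection.slug


# Made-up candidates purely for illustration.
pool = [Candidate("org-a/model-x", 10), Candidate("org-b/model-y", 30), Candidate("org-c/model-z", 20)]
print(pick_top(pool, k=2))
```

Keeping the selection logic separate from the Hub calls makes the curation rule easy to inspect and link to, which is what step 3 of the instructions asks for.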
