The Hugging Face Course

This repo contains the content that's used to create the Hugging Face course. The course teaches you about applying Transformers to various tasks in natural language processing and beyond. Along the way, you'll learn how to use the Hugging Face ecosystem (🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate) as well as the Hugging Face Hub. It's completely free and open-source!

🌎 Languages and translations

| Language | Source | Authors |
| --- | --- | --- |
| English | chapters/en | @sgugger, @lewtun, @LysandreJik, @Rocketknight1, @sashavor, @osanseviero, @SaulLu, @lvwerra |
| Bengali (WIP) | chapters/bn | @avishek-018, @eNipu |
| German (WIP) | chapters/de | @JesperDramsch, @MarcusFra, @fabridamicelli |
| Spanish (WIP) | chapters/es | @camartinezbu, @munozariasjm, @fordaz |
| Persian (WIP) | chapters/fa | @jowharshamshiri, @schoobani |
| French | chapters/fr | @lbourdois, @ChainYo, @melaniedrevet, @abdouaziz |
| Gujarati (WIP) | chapters/gu | @pandyaved98 |
| Hebrew (WIP) | chapters/he | @omer-dor |
| Hindi (WIP) | chapters/hi | @pandyaved98 |
| Bahasa Indonesia (WIP) | chapters/id | @gstdl |
| Italian (WIP) | chapters/it | @CaterinaBi, @ClonedOne, @Nolanogenn, @EdAbati, @gdacciaro |
| Japanese (WIP) | chapters/ja | @hiromu166, @younesbelkada, @HiromuHota |
| Korean (WIP) | chapters/ko | @Doohae, @wonhyeongseo, @dlfrnaos19, @nsbg |
| Portuguese (WIP) | chapters/pt | @johnnv1, @victorescosta, @LincolnVS |
| Russian (WIP) | chapters/ru | @pdumin, @svv73 |
| Thai (WIP) | chapters/th | @peeraponw, @a-krirk, @jomariya23156, @ckingkan |
| Turkish (WIP) | chapters/tr | @tanersekmen, @mertbozkir, @ftarlaci, @akkasayaz |
| Vietnamese | chapters/vi | @honghanhh |
| Chinese (simplified) | chapters/zh-CN | @zhlhyx, @petrichor1122, @1375626371 |
| Chinese (traditional) (WIP) | chapters/zh-TW | @davidpeng86 |

Translating the course into your language

As part of our mission to democratise machine learning, we'd love to have the course available in many more languages! Please follow the steps below if you'd like to help translate the course into your language 🙏.

🗞️ Open an issue

To get started, navigate to the Issues page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the Translation template from the New issue button.

Once an issue is created, post a comment to indicate which chapters you'd like to work on and we'll add your name to the list.

🗣 Join our Discord

Since it can be difficult to discuss translation details quickly over GitHub issues, we have created dedicated channels for each language on our Discord server. If you'd like to join, follow the instructions in this channel 👉: https://discord.gg/JfAtkvEtRb

🍴 Fork the repository

Next, you'll need to fork this repo. You can do this by clicking the Fork button in the top-right corner of this repo's page.

Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:

git clone https://github.com/YOUR-USERNAME/course

📋 Copy-paste the English files with a new language code

The course files are organised under a main directory:

  • chapters: all the text and code snippets associated with the course.

You'll only need to copy the files in the chapters/en directory, so first navigate to your fork of the repo and run the following:

cd ~/path/to/course
mkdir -p chapters/LANG-ID  # create the language directory if it doesn't exist yet
cp -r chapters/en/CHAPTER-NUMBER chapters/LANG-ID/CHAPTER-NUMBER

Here, CHAPTER-NUMBER refers to the chapter you'd like to work on and LANG-ID should be one of the ISO 639-1 or ISO 639-2 language codes -- for example, fr for French, as used by chapters/fr.
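
If you'd rather script the copy step, here is a minimal Python sketch of it. The function name `copy_chapter` and its arguments are hypothetical and not part of this repo. It assumes the chapters/en/CHAPTER-NUMBER layout described above and refuses to overwrite a chapter that has already been copied.

```python
import shutil
from pathlib import Path


def copy_chapter(repo_root, lang_id, chapter):
    """Copy chapters/en/<chapter> to chapters/<lang_id>/<chapter>.

    Returns the path of the new directory. Raises FileExistsError rather
    than overwriting a translation that is already in progress.
    """
    chapters = Path(repo_root) / "chapters"
    source = chapters / "en" / chapter
    target = chapters / lang_id / chapter

    if not source.is_dir():
        raise FileNotFoundError(f"No English chapter at {source}")
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # Create chapters/<lang_id>/ first; it will not exist for a new language.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(source, target)
    return target
```

For example, `copy_chapter("~/path/to/course", "LANG-ID", "CHAPTER-NUMBER")` mirrors the shell commands above once the placeholders are filled in. Note that `Path` does not expand `~` on its own, so pass an absolute path or call `Path(...).expanduser()` first.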

✍️ Start translating

Now comes the fun part - translating the text! The first thing we recommend is translating the part of the _toctree.yml file that corresponds to your chapter. This file is used to render the table of contents on the website and provide the links to the Colab notebooks. The only fields you should change are the title ones -- for example, here are the parts of _toctree.yml that we'd translate for Chapter 0:

- title: 0. Setup # Translate this!
  sections:
  - local: chapter0/1 # Do not change this!
    title: Introduction # Translate this!

🚨 Make sure the _toctree.yml file only contains the sections that have been translated! Otherwise you won't be able to build the content on the website or locally (see below for how).

Once you have translated the _toctree.yml file, you can start translating the MDX files associated with your chapter.

🙋 If the _toctree.yml file doesn't yet exist for your language, you can simply create one by copy-pasting from the English version and deleting the sections that aren't related to your chapter. Just make sure it exists in the chapters/LANG-ID/ directory!
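
To catch _toctree.yml entries that point at sections you haven't translated yet, a small check like the following can help before you build. This is a rough sketch, not a tool shipped with the repo. The function name `untranslated_sections` is hypothetical, and it assumes that an entry such as `local: chapter0/1` corresponds to a file chapter0/1.mdx inside chapters/LANG-ID/. It reads the `local` lines with a regular expression instead of a YAML parser to stay dependency-free.

```python
import re
from pathlib import Path

# Matches lines such as "  - local: chapter0/1" and captures "chapter0/1".
LOCAL_ENTRY = re.compile(r"^[ \t]*-?[ \t]*local:[ \t]*(\S+)", re.MULTILINE)


def untranslated_sections(lang_dir):
    """Return the `local` entries in _toctree.yml that have no .mdx file."""
    lang_dir = Path(lang_dir)
    toctree = (lang_dir / "_toctree.yml").read_text(encoding="utf-8")
    return [
        local
        for local in LOCAL_ENTRY.findall(toctree)
        # Assumption: each entry maps to <lang_dir>/<local>.mdx
        if not (lang_dir / f"{local}.mdx").is_file()
    ]
```

An empty list means every listed section has a matching file. Anything it returns should either be translated or removed from _toctree.yml.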

👷‍♂️ Build the course locally

Once you're happy with your changes, you can preview how they'll look by first installing the doc-builder tool that we use for building all documentation at Hugging Face:

pip install hf-doc-builder
doc-builder preview course ../course/chapters/LANG-ID --not_python_module

Note: the preview command does not work on Windows.

This will build and render the course on http://localhost:3000/. Although the content looks much nicer on the Hugging Face website, this step will still allow you to check that everything is formatted correctly.

🚀 Submit a pull request

If the translations look good locally, the final step is to prepare the content for a pull request. Here, the first thing to check is that the files are formatted correctly. For that you can run:

pip install -r requirements.txt
make style

Once that's run, commit any changes, open a pull request, and tag @lewtun for a review. Congratulations, you've now completed your first translation 🥳!

🚨 To build the course on the website, double-check that your language code exists in the languages field of the build_documentation.yml and build_pr_documentation.yml files in the .github folder. If not, just add it in alphabetical order.
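
The alphabetical insertion can be sketched in Python as below. Treat this as an illustration under a loud assumption: it expects the languages field to be a single line of space-separated codes (for example `languages: bn de en fr`), which may not match the real workflow files, so check them before relying on it. The function name `add_language` is hypothetical.

```python
import re

# Assumption: the field looks like "languages: bn de en fr" on one line.
LANGUAGES_LINE = re.compile(r"^([ \t]*languages:[ \t]*)(.*)$", re.MULTILINE)


def add_language(workflow_text, lang_id):
    """Return workflow_text with lang_id present in the languages field."""
    match = LANGUAGES_LINE.search(workflow_text)
    if match is None:
        raise ValueError("no 'languages:' field found")

    codes = match.group(2).split()
    if lang_id in codes:
        return workflow_text  # already listed, nothing to change

    # Keep the codes in alphabetical order, ignoring case (e.g. zh-CN).
    updated = " ".join(sorted(codes + [lang_id], key=str.lower))
    return workflow_text[: match.start(2)] + updated + workflow_text[match.end(2):]
```

The function works on the file contents as a string, so you would read each workflow file, pass the text through `add_language`, and write the result back yourself.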

📔 Jupyter notebooks

The Jupyter notebooks containing all the code from the course are hosted on the huggingface/notebooks repo. If you wish to generate them locally, first install the required dependencies:

python -m pip install -r requirements.txt

Then run the following script:

python utils/generate_notebooks.py --output_dir nbs

This script extracts all the code snippets from the chapters and stores them as notebooks in the nbs folder (which is ignored by Git by default).
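
To give a feel for what such an extraction involves, here is a simplified Python sketch. It is not the code in utils/generate_notebooks.py, which may work quite differently. The function names are hypothetical, and the sketch assumes snippets are fenced blocks tagged `python` or `py` and that a bare nbformat 4.4 dictionary is enough for a notebook.

```python
import re

FENCE = "`" * 3  # built this way so the sketch can itself sit in a fenced block

# Assumption: snippets are fenced blocks tagged "python" or "py".
SNIPPET = re.compile(FENCE + r"(?:python|py)\b[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_snippets(mdx_text):
    """Return the body of every Python fenced block, in document order."""
    return [body.rstrip("\n") for body in SNIPPET.findall(mdx_text)]


def to_notebook(snippets):
    """Wrap snippets in a minimal nbformat 4.4 notebook dictionary."""
    cells = [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": snippet,
        }
        for snippet in snippets
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
```

Serialising the returned dictionary with `json.dump` into a file ending in .ipynb gives something Jupyter can open, with one code cell per snippet.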

✍️ Contributing a new chapter

Note: we are not currently accepting community contributions for new chapters. These instructions are for the Hugging Face authors.

Adding a new chapter to the course is quite simple:

  1. Create a new directory under chapters/en/chapterX, where chapterX is the chapter you'd like to add.
  2. Add numbered MDX files sectionX.mdx for each section. If you need to include images, place them in the huggingface-course/documentation-images repository and use the HTML Images Syntax with the path https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/{langY}/{chapterX}/{your-image.png}.
  3. Update the _toctree.yml file to include your chapter sections -- this information will render the table of contents on the website. If your section involves both the PyTorch and TensorFlow APIs of transformers, make sure you include links to both Colabs in the colab field.
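
The image path template in step 2 can be filled in with a small helper like the one below. This is only a sketch: the function names are hypothetical, and the exact attributes the course uses on its image tags are not specified here, so the generated element carries just `src` and `alt`.

```python
from html import escape

# Base of the URL template given in step 2 above.
BASE_URL = (
    "https://huggingface.co/datasets/huggingface-course/"
    "documentation-images/resolve/main"
)


def image_url(lang, chapter, filename):
    """Fill in the {langY}/{chapterX}/{your-image.png} URL template."""
    return f"{BASE_URL}/{lang}/{chapter}/{filename}"


def image_tag(lang, chapter, filename, alt):
    """Return an HTML <img> element pointing at the hosted image."""
    src = image_url(lang, chapter, filename)
    # escape() also escapes quotes, so alt text cannot break the attribute.
    return f'<img src="{escape(src)}" alt="{escape(alt)}">'
```

For instance, `image_url("en", "chapter1", "your-image.png")` yields the template with `{langY}` and `{chapterX}` replaced by `en` and `chapter1`.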

If you get stuck, check out one of the existing chapters -- this will often show you the expected syntax.

Once you are happy with the content, open a pull request and tag @lewtun for a review. We recommend adding the first chapter draft as a single pull request -- the team will then provide feedback internally to iterate on the content 🤗!

🙌 Acknowledgements

The structure of this repo and README are inspired by the wonderful Advanced NLP with spaCy course.
