ishan-kotian / tokenizer_nlp
Tokenization is the process of separating a piece of text into smaller units called tokens. Tokens can be words, characters, or subwords. Tokenization can therefore be broadly classified into three types: word, character, and subword (character n-gram) tokenization.
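The three strategies above can be sketched with plain standard-library Python. This is an illustrative example, not the code from the linked notebook; the function names are hypothetical.

```python
def word_tokenize(text):
    """Word tokenization: split the text on whitespace."""
    return text.split()

def char_tokenize(text):
    """Character tokenization: every character becomes a token."""
    return list(text)

def char_ngrams(text, n=3):
    """Subword tokenization via character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(word_tokenize("Hello world"))        # ['Hello', 'world']
print(char_tokenize("Hi"))                 # ['H', 'i']
print(char_ngrams("tokenization", 3)[:3])  # ['tok', 'oke', 'ken']
```

Word tokens keep meaning but produce large vocabularies; character tokens give a tiny vocabulary but long sequences; character n-grams sit in between, which is why subword schemes are common in modern NLP pipelines.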
Home Page: https://www.kaggle.com/lykin22/tokenizer-nlp
License: MIT License