ander-db / tokenizer
A very fast, low-memory C++ automaton tokenizer that breaks an input string into a list of tokens by splitting on tabs, spaces, and newlines, and that detects special tokens such as numbers, prices, personal names, emails, lexemes, etc. It lets you specify custom delimiters and detect special cases.
License: The Unlicense
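
To illustrate the general idea of delimiter-based tokenization with special-token detection, here is a minimal sketch. It is not this library's API: the names `Kind`, `Token`, `classify`, and `tokenize` are hypothetical, and the number and email rules are deliberately simplified assumptions. Prices, personal names, and other special cases are not covered.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories, chosen for illustration only.
enum class Kind { Word, Number, Email };

struct Token {
    std::string text;
    Kind kind;
};

// Classify one token. Numbers are recognized by a small finite automaton
// (digits, optionally followed by '.' and more digits). Emails use a
// simplified check: exactly one '@', not in first position, followed later
// by a '.', and no trailing '.'.
Kind classify(const std::string& text) {
    enum State { Start, Int, Dot, Frac, Other } state = Start;
    std::size_t at_pos = std::string::npos;
    int at_count = 0;

    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (c == '@') { ++at_count; at_pos = i; }
        switch (state) {
            case Start: state = std::isdigit(c) ? Int : Other; break;
            case Int:   state = std::isdigit(c) ? Int : (c == '.' ? Dot : Other); break;
            case Dot:   state = std::isdigit(c) ? Frac : Other; break;
            case Frac:  state = std::isdigit(c) ? Frac : Other; break;
            case Other: break;
        }
    }

    if (state == Int || state == Frac) return Kind::Number;
    if (at_count == 1 && at_pos > 0 &&
        text.find('.', at_pos + 2) != std::string::npos &&
        text.back() != '.') {
        return Kind::Email;
    }
    return Kind::Word;
}

// Split the input on any character in `delimiters` (tabs, spaces, and
// newlines by default) and classify each non-empty piece.
std::vector<Token> tokenize(const std::string& input,
                            const std::string& delimiters = " \t\n") {
    std::vector<Token> tokens;
    std::string current;
    for (char c : input) {
        if (delimiters.find(c) != std::string::npos) {
            if (!current.empty()) {
                tokens.push_back({current, classify(current)});
                current.clear();
            }
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) tokens.push_back({current, classify(current)});
    return tokens;
}
```

With this sketch, `tokenize("Pay 12.50 to ana@example.com\tnow")` yields five tokens, with `12.50` classified as a number and `ana@example.com` as an email. Passing a second argument such as `","` swaps in a custom delimiter set.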