

doken

A minimalistic, general purpose tokenizer generator.

Usage

Use npm to install:

$ npm install doken

Import doken and create a tokenizer with regular expression rules:

const {createTokenizer, regexRule} = require('doken')

const tokenizeJSON = createTokenizer({
  rules: [
    regexRule('_whitespace', /\s+/y, {lineBreaks: true}),
    regexRule('brace', /[{}]/y),
    regexRule('bracket', /[\[\]]/y),
    regexRule('colon', /:/y),
    regexRule('comma', /,/y),
    regexRule('string', /"([^"\n\\]|\\[^\n])*"/y),
    regexRule('number', /(-|\+)?\d+(\.\d+)?/y),
    regexRule('boolean', /(true|false)\b/y),
    regexRule('null', /null\b/y)
  ]
})

let tokens = tokenizeJSON(`{"a": "Hello World!"}`)

console.log([...tokens])

API

Rule object

A rule object contains the following fields:

  • type <string> - The type of the token this rule generates. If type starts with an underscore _, the token will not be emitted by the tokenizer.

  • lineBreaks <boolean> (Optional) - Set this to true if the rule can match line breaks, so that the row and col of subsequent tokens are tracked correctly.

  • match <Function> - A function with the following signature:

    (input: string, position: number) =>
      null |
      {
        length: number,
        value: any
      }

    The function tries to match a token of this rule's type in input, starting at position. Return null if the text at position is not a token of this type; otherwise return an object.

    The length characters of input starting at position form the matched token. You can optionally return a value, which can be any data to attach to the token. If value is omitted, it defaults to those length characters.
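As a minimal sketch of the rule shape described above, the following hand-written rule matches a run of identifier characters and attaches an upper-cased copy as the token value. The rule name 'identifier' and the helper isIdentChar are hypothetical, chosen only for illustration; the match function is called directly here so the snippet runs without doken installed.

```javascript
// Hypothetical helper: decides which characters belong to an identifier.
const isIdentChar = ch => /[A-Za-z0-9_]/.test(ch)

// A plain object with the documented Rule fields: type, lineBreaks, match.
const identifierRule = {
  type: 'identifier',
  lineBreaks: false, // identifiers never contain line breaks
  match(input, position) {
    let end = position
    while (end < input.length && isIdentChar(input[end])) end++

    // Nothing matched at this position: signal "not this token type".
    if (end === position) return null

    // length is mandatory; value is optional custom data.
    return {
      length: end - position,
      value: input.slice(position, end).toUpperCase()
    }
  }
}

const hit = identifierRule.match('let foo = 1', 4)  // starts at "foo"
const miss = identifierRule.match('let foo = 1', 3) // starts at a space
console.log(hit, miss)
```

An object like this should be usable in the rules array passed to createTokenizer, alongside rules produced by regexRule.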

Token object

A token will be represented by an object with the following fields:

  • type <string> | null - The type of the token, or null if none of the given rules match the input.
  • length <number> - The length of the token.
  • value <any> - The value generated by the rule.
  • pos <number> - The zero-based position of the first character of the token.
  • row <number> - The zero-based row of the first character of the token.
  • col <number> - The zero-based column of the first character of the token.
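To illustrate how the zero-based pos, row, and col fields relate to each other, here is a small standalone sketch. The function locate is hypothetical and is not part of doken; it assumes that rows are separated by '\n' only, which may differ from how the library counts line breaks.

```javascript
// Hypothetical helper: derive zero-based row and col from a zero-based pos.
// Assumption: only '\n' starts a new row.
function locate(input, pos) {
  let row = 0
  let lineStart = 0 // index of the first character of the current row

  for (let i = 0; i < pos; i++) {
    if (input[i] === '\n') {
      row++
      lineStart = i + 1
    }
  }

  return {pos, row, col: pos - lineStart}
}

const text = '{\n  "a": 1\n}'
const stringStart = locate(text, 4)  // the opening quote of "a"
const closingBrace = locate(text, 11) // the final }
console.log(stringStart, closingBrace)
```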

doken.createTokenizer(options)

  • options <object>
    • rules <Array<Rule>>
    • strategy 'first' | 'longest' (Optional) - Default: 'first'
  • Returns: <Function>

Generates a tokenize function with the following signature:

(input: string) => IterableIterator<Token>

This function tokenizes the given input, yielding the tokens matched by the given rules one by one.

With the default 'first' strategy, the first rule that matches is used. Set strategy to 'longest' to use the rule that matches the most characters instead.
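The difference between the two strategies can be sketched without the library. The helpers stickyRule and pickMatch below are hypothetical stand-ins written for this illustration, not doken's implementation; in particular, resolving ties in favor of the earlier rule under 'longest' is an assumption.

```javascript
// Hypothetical helper: build a Rule-shaped object from a sticky (/y) regex.
const stickyRule = (type, regex) => ({
  type,
  match(input, position) {
    regex.lastIndex = position // a sticky regex only matches at lastIndex
    const m = regex.exec(input)
    return m == null ? null : {length: m[0].length, value: m[0]}
  }
})

// Hypothetical helper: choose one match at a position under a strategy.
function pickMatch(rules, input, position, strategy = 'first') {
  let best = null

  for (const rule of rules) {
    const result = rule.match(input, position)
    if (result == null) continue

    const candidate = {type: rule.type, ...result}
    if (strategy === 'first') return candidate

    // 'longest': keep the candidate covering the most characters.
    // Assumption: on a tie, the earlier rule wins.
    if (best == null || candidate.length > best.length) best = candidate
  }

  return best
}

const rules = [
  stickyRule('keyword', /in/y),
  stickyRule('identifier', /[a-z]+/y)
]

const first = pickMatch(rules, 'int', 0, 'first')
const longest = pickMatch(rules, 'int', 0, 'longest')
console.log(first, longest)
```

With 'first', the input int is split after the keyword in because that rule is listed first; with 'longest', the identifier rule wins because it covers all three characters.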

doken.regexRule(type, regex[, options])

  • type <string>
  • regex <RegExp>
  • options <object> (Optional)
    • lineBreaks <boolean> (Optional) - Set this to true if the rule can match line breaks, so that the row and col of subsequent tokens are tracked correctly.
    • value <Function> (Optional) - A function that computes the token value from the match.
    • condition <Function> (Optional) - A function that decides whether to keep or discard a match.
  • Returns: <Rule>

Returns a rule that attempts to match the input string against the given regex.

value can be set to a function (match: RegExpExecArray) => any. The generated token will have the returned value as its value.

condition can be set to a function (match: RegExpExecArray) => boolean. Return false to discard the match and continue with the next rule.
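The following sketch shows what value and condition functions might look like for a number rule. To keep it runnable without doken, the hypothetical helper tryMatch imitates the behavior described above; it is not the library's code. The names toNumber and notLeadingZero are likewise illustrative.

```javascript
// value: turn the matched text into an actual number.
const toNumber = match => Number(match[0])

// condition: reject numbers with a superfluous leading zero such as "007".
const notLeadingZero = match => !/^-?0\d/.test(match[0])

// Hypothetical stand-in for the described semantics of a regex rule.
function tryMatch(regex, input, position, {value, condition} = {}) {
  regex.lastIndex = position // sticky regex: match exactly at position
  const match = regex.exec(input)

  if (match == null) return null
  if (condition != null && !condition(match)) return null // discard match

  return {
    length: match[0].length,
    value: value != null ? value(match) : match[0]
  }
}

const numberRegex = /-?\d+(\.\d+)?/y
const options = {value: toNumber, condition: notLeadingZero}

const accepted = tryMatch(numberRegex, 'x = 3.14', 4, options)
const rejected = tryMatch(numberRegex, '007', 0, options)
console.log(accepted, rejected)
```

With doken itself, the same two functions would be passed as regexRule('number', numberRegex, {value: toNumber, condition: notLeadingZero}).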
