Code Monkey home page Code Monkey logo

lang-ietf-opentype's Introduction

Mapping IETF language tags to OpenType language system tags

The language of text written in a markup language such as XML or HTML is usually identified by an IETF language tag, conforming to BCP 47. Such text is most often displayed using OpenType fonts. In order to display text optimally with OpenType, it is necessary to use an OpenType language system tag. Unfortunately, for historical reasons, OpenType language system tags have their own specification, and are not the same as IETF language tags. This means that any system that wants to display HTML or XML using OpenType fonts has to convert IETF language tags into OpenType language system tags.

This project is designed to help with this problem, by providing

  • data, in JSON format, declaratively describing the conversion and
  • a node.js module that uses this data to do the conversion.

The OpenType specification associates a number of ISO 639-2 or ISO 639-3 three-letter tags with each OpenType language system. Using ISO 639-2, each such ISO 639 tag can be mapped to an IETF language tag consisting of just a primary language subtag. Although this provides the starting point for the conversion, the following problems still have to be solved:

  • sometimes a particular ISO 639 tag is associated with multiple OpenType tags; these ambiguities can sometimes be resolved by finding an appropriate script, region or variant subtag to add to the primary language subtag;
  • for some OpenType tags, there is no associated ISO 639 tag;
  • sometimes the associated ISO 639 tags are inconsistent with the name of the OpenType language system.

There are extensive notes describing the issues for each OpenType language system tag. The gen directory contains code based on these notes to generate the JSON data file.

Data

The format of the data is a JSON object, which maps an initial subtag to a rule for converting tags that begin with that subtag. The initial subtag is the part of the IETF tag up to the first hyphen, if there is one, or the whole tag, if there is not. The rule is either a string or an array of strings.

If the rule is a string, then the string specifies the OpenType tag.

If the rules is an array of strings, then the array will contain an odd number of strings. The first members of the array is the default OpenType tag. The remaining strings are interpreted in pairs. The first member of each pair specifies a subtag, and the second member specifies the OpenType tag for when the IETF tag contains that subtag. For example, an entry:

"zh": ["ZHS", "latn", "ZHP", "hk", "ZHH", "hant", "ZHT"]

means that an IETF tag that is zh or starts with zh- is handled as follows:

  • if it has a subtag "latn", then it maps to the OpenType tag ZHP;
  • othewise, if it has a subtag "hk", then it maps to the OpenType tag ZHH;
  • otherwise, if it has a subtag "hant", then it maps to the OpenType tag ZHT;
  • otherwise, it maps to the OpenType tag ZHS.

Note that subtags are specified in the data in lower-case but are matched case-insensitively. Subtags that occur in a tag following a singleton subtag (a subtag of length 1) should be ignored when applying a rule.

Module

The module provides a single function.

ietfToOpenType(tag)

This converts an IETF language tag to an OpenType language system tag. The argument must be a string. It returns a string with the OpenType language system tag, or undefined if there is no suitable OpenType language system tag.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.