
jerch commented on August 16, 2024

Well, you basically have two (and a half) options here:

  • implement full ISO-2022 compliance
    This is how terminals were meant to treat 7/8-bit data streams. Doing this gives you a high level of compatibility with older systems, but the C0/C1 and G0/G1/G2/G3 state handling is error-prone. For full support you basically have to be able to change the parser transition table on the fly (e.g. unmapping the C1 area, or redeclaring it as printable for certain character sets). TL;DR: not the way to go these days, ISO-2022 is basically dead.
  • go as a Unicode/UTF-8-only emulator
    Any data arriving at the parser is expected to map directly onto Unicode codepoints. The parser itself only needs to distinguish codepoints up to \xA0 (everything beyond that is printable), so UTF-16 and UTF-32 work out of the box. UTF-8 has to be decoded on the fly or beforehand; otherwise the parser will confuse C1 controls (0x80-0x9F) with the continuation bytes (0x80-0xBF) of multibyte characters. That's the preferred way to start these days.
  • a middle ground between ISO-2022 and Unicode (that's what most emulators do)
    Some ISO-2022 character sets are still in use (like the graphics supplement from DEC), so implement those in the terminal. ISO-2022 does not actually specify UTF-8 any further in terms of compliance level; here every modern emulator switches into "full UTF-8" mode. The idea is to treat UTF-8 as a stream encoding, so a "naked" C1 byte must not occur anymore; instead it has to be properly encoded as a 2-byte character. Switching to another charset in ISO-2022 terms means changing back into 7/8-bit mode (depending on the sequence that initiates it), which again allows single-byte C1 codes. This can be achieved without parser changes by applying the charset replacements either beforehand on the whole stream, or later on the relevant data portions in the terminal functions (the latter is technically not 100% ISO-2022 compatible anymore, but should work thanks to the stronger "the whole stream should be UTF-8" rule). That's where the confusion starts, and most producers get C1 codes wrong. Hence a rule of thumb: never use the 8-bit form of C1 codes in an environment that is otherwise mainly Unicode/UTF-8.
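A minimal sketch of the Unicode-only option, assuming the input is UTF-8 decoded before it reaches the parser. The function names and category labels are hypothetical and are not SwiftTerm's API. It shows why decoding first matters: the UTF-8 encoding of "€" (E2 82 AC) contains the byte 0x82, which a byte-level parser would misread as a C1 control.

```python
# Sketch of a Unicode-only front end: UTF-8 is decoded *before* parsing, so the
# parser classifies codepoints rather than raw bytes.
# classify/classify_stream are hypothetical names used for illustration only.


def classify(cp: int) -> str:
    """Classify one codepoint the way a Unicode-only parser might."""
    if cp < 0x20:
        return "C0"      # C0 controls (BEL, LF, ESC, ...)
    if cp == 0x7F:
        return "DEL"
    if 0x80 <= cp <= 0x9F:
        return "C1"      # C1 controls, e.g. U+009B (CSI)
    return "PRINT"       # 0x20-0x7E and everything from 0xA0 upward


def classify_stream(data: bytes) -> list[tuple[str, str]]:
    """Decode UTF-8 first, then classify each resulting codepoint."""
    # errors="replace" maps malformed sequences to U+FFFD instead of raising.
    text = data.decode("utf-8", errors="replace")
    return [(ch, classify(ord(ch))) for ch in text]


def classify_bytes_naively(data: bytes) -> list[tuple[int, str]]:
    """The mistake to avoid: classifying raw UTF-8 bytes without decoding."""
    return [(b, classify(b)) for b in data]


if __name__ == "__main__":
    euro = "€".encode("utf-8")              # b'\xe2\x82\xac'
    print(classify_bytes_naively(euro))     # 0x82 is misread as a C1 control
    print(classify_stream(euro))            # one printable codepoint, U+20AC
    print(classify_stream(b"\xc2\x9b"))     # properly encoded C1 CSI -> "C1"
    print(classify_stream(b"\x9b"))         # naked C1 byte -> U+FFFD, "PRINT"
```

The last two calls illustrate the stream-encoding rule: in UTF-8 mode a C1 CSI only counts as a control when it arrives as the two bytes C2 9B, while a naked 0x9B byte is just malformed input.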

IMHO, fully implementing ISO-2022 is a waste of time these days; most programs have adopted Unicode's stream-encoding rule. If a certain program refuses to work, use luit as a transcoder in between.
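For the middle-ground option, the variant where charset replacements happen later in the terminal functions can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not SwiftTerm's implementation: the class and method names are hypothetical, only G0/G1 with the SO/SI locking shifts are modelled, and the table covers just the box-drawing subset of the DEC Special Graphics set (designated with ESC ( 0 and reverted with ESC ( B). The parser itself is assumed to stay Unicode-only and merely report designation and shift events.

```python
# Sketch of the "middle ground": the parser stays Unicode-only and the
# terminal's print handler applies charset replacements itself.
# CharsetState and its methods are hypothetical; the table below is only the
# box-drawing subset of DEC Special Graphics.

DEC_SPECIAL_GRAPHICS = {
    "j": "\u2518",  # ┘
    "k": "\u2510",  # ┐
    "l": "\u250c",  # ┌
    "m": "\u2514",  # └
    "n": "\u253c",  # ┼
    "q": "\u2500",  # ─
    "t": "\u251c",  # ├
    "u": "\u2524",  # ┤
    "v": "\u2534",  # ┴
    "w": "\u252c",  # ┬
    "x": "\u2502",  # │
}

# Final characters of the designation sequences ESC ( F / ESC ) F handled here.
CHARSETS = {
    "B": None,                   # US-ASCII: no replacement
    "0": DEC_SPECIAL_GRAPHICS,   # DEC Special Graphics
}


class CharsetState:
    """Tracks G0/G1 designations and the SO/SI locking shift."""

    def __init__(self) -> None:
        self.g = [None, None]  # G0 and G1 both start as US-ASCII
        self.gl = 0            # index of the set currently invoked into GL

    def designate(self, slot: int, final: str) -> None:
        # ESC ( final -> slot 0, ESC ) final -> slot 1.
        # Unknown finals fall back to US-ASCII in this sketch.
        self.g[slot] = CHARSETS.get(final)

    def shift_out(self) -> None:  # SO (0x0E): invoke G1 into GL
        self.gl = 1

    def shift_in(self) -> None:   # SI (0x0F): invoke G0 into GL
        self.gl = 0

    def translate(self, text: str) -> str:
        """Apply the active charset to printable text handed over by the parser."""
        table = self.g[self.gl]
        if table is None:
            return text
        # Characters without a mapping pass through unchanged.
        return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    term = CharsetState()
    term.designate(0, "0")          # ESC ( 0: G0 = DEC Special Graphics
    print(term.translate("lqqk"))   # ┌──┐
    term.designate(0, "B")          # ESC ( B: G0 = US-ASCII again
    print(term.translate("lqqk"))   # lqqk
    term.designate(1, "0")          # ESC ) 0: G1 = DEC Special Graphics
    term.shift_out()                # SO: invoke G1 into GL
    print(term.translate("x"))      # │
```

Because the replacement runs on already-decoded text, the "whole stream is UTF-8" rule stays intact; the trade-off is that this is not strictly ISO-2022 compliant.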

from swiftterm.

migueldeicaza commented on August 16, 2024

Thank you for the detailed description! I am going to take another stab at this.
