
Comments (7)

tianon commented on August 16, 2024

Do you have some examples that don't work so we can more easily attempt to test/verify? (perhaps a short shell script which shows the failure, or specific characters you're having trouble inputting?)

I've done some very simple testing with a UTF-8 character, and it seems to be working, although the input looks a bit strange (the character is replaced with ? as soon as I paste it in, but it prints back out correctly):

$ docker run -it --rm busybox
/ # echo '?'

/ # echo -e "\xE2\x98\xA0"

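The three bytes passed to `echo -e` are the UTF-8 encoding of a single character, U+2620 SKULL AND CROSSBONES (☠). A quick Python check, offered only as an illustration outside the original test, decodes them and confirms the round trip:

```python
import unicodedata

raw = b"\xE2\x98\xA0"      # the bytes from the `echo -e` test above
ch = raw.decode("utf-8")   # decodes to exactly one character

print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")  # U+2620 SKULL AND CROSSBONES
print(len(raw), len(ch))                          # 3 1
print(ch.encode("utf-8") == raw)                  # True
```
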
from busybox.

aibar commented on August 16, 2024

Output works; sorry about that.

But input does not. When I type UTF-8 characters, the shell shows '?' marks, as you pointed out.

In vi it shows '.' marks instead (docker run -it --rm --entrypoint vi busybox), but the text edited in vi comes out correctly.

tianon commented on August 16, 2024

Interesting: while looking into this, I see CONFIG_UNICODE_SUPPORT exists, but it's already enabled. There's also CONFIG_FEATURE_CHECK_UNICODE_IN_ENV, which checks whether LANG is set to something ending in .utf8, but it's disabled (and when it's disabled, "Unicode support will be always enabled and active.").

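In other words, the option only decides whether Unicode handling is switched on at runtime, and with it disabled the answer is always "on". A minimal Python sketch of that decision, assuming only LANG is consulted and that the suffix match is case-insensitive and accepts both ".utf8" and ".utf-8" (both assumptions, not taken from BusyBox's source; `unicode_active` is a hypothetical name):

```python
from typing import Mapping


def unicode_active(env: Mapping[str, str], check_unicode_in_env: bool) -> bool:
    """Hypothetical model of CONFIG_FEATURE_CHECK_UNICODE_IN_ENV (not BusyBox's C code)."""
    if not check_unicode_in_env:
        # Option disabled: "Unicode support will be always enabled and active."
        return True
    # Assumption: only LANG is checked, case-insensitively, for a UTF-8 suffix.
    lang = env.get("LANG", "")
    return lang.lower().endswith((".utf8", ".utf-8"))


print(unicode_active({}, check_unicode_in_env=False))                  # True
print(unicode_active({"LANG": "C.UTF-8"}, check_unicode_in_env=True))  # True
print(unicode_active({}, check_unicode_in_env=True))                   # False
```

Since the option is disabled in this image, the sketch's first branch applies, which is why this setting alone would not explain the '?' marks.
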

tianon commented on August 16, 2024

This looks related:

CONFIG_SUBST_WCHAR:                                               

Typical values are 63 for '?' (works with any output device),     
30 for ASCII substitute control code,                             
65533 (0xfffd) for Unicode replacement character.                 

Symbol: SUBST_WCHAR [=63]                                         
Prompt: Character code to substitute unprintable characters with  
  Defined at Config.in:186                                        
  Depends on: UNICODE_SUPPORT                                     
  Location:                                                       
    -> Busybox Settings                                           
      -> General Configuration                                    
        -> Support Unicode (UNICODE_SUPPORT [=y])                 

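Put simply, whenever an applet decides a character cannot be displayed, it prints the character whose code is SUBST_WCHAR instead (63, i.e. '?', in this build). A hypothetical Python sketch of that substitution step, using `str.isprintable()` purely as a stand-in for BusyBox's own printability rules (an assumption; `substitute_unprintable` is an illustrative name):

```python
SUBST_WCHAR = 63  # chr(63) == "?", the value shown in the config screen above


def substitute_unprintable(text: str, subst_wchar: int = SUBST_WCHAR) -> str:
    """Replace each character judged unprintable with chr(subst_wchar).

    str.isprintable() only approximates "unprintable"; BusyBox uses its own rules.
    """
    replacement = chr(subst_wchar)
    return "".join(ch if ch.isprintable() else replacement for ch in text)


print(substitute_unprintable("tab\there"))                       # tab?here
print(substitute_unprintable("bell\a", 0xFFFD) == "bell\ufffd")  # True
```
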

tianon commented on August 16, 2024

And this:

CONFIG_LAST_SUPPORTED_WCHAR:                                                                        

Any character with Unicode value bigger than this is assumed                                        
to be non-printable on output device. Many applets replace                                          
such chars with substitution character.                                                             

The idea is that many valid printable Unicode chars are
nevertheless not displayed correctly. Think about
combining characters, double-wide hieroglyphs, obscure
characters in dozens of ancient scripts...
Many terminals, terminal emulators, xterms etc will fail                                            
to handle them correctly. Choose the smallest value                                                 
which suits your needs.                                                                             

Typical values are:                                                                                 
126 - ASCII only                                                                                    
767 (0x2ff) - there are no combining chars in [0..767] range                                        
              (the range includes Latin 1, Latin Ext. A and B),                                     
              code is ~700 bytes smaller for this case.                                             
4351 (0x10ff) - there are no double-wide chars in [0..4351] range,                                  
              code is ~300 bytes smaller for this case.                                             
12799 (0x31ff) - nearly all non-ideographic characters are                                          
              available in [0..12799] range, including                                              
              East Asian scripts like katakana, hiragana, hangul,                                   
              bopomofo...                                                                           
0 - off, any valid printable Unicode character will be printed.                                     

Symbol: LAST_SUPPORTED_WCHAR [=767]                                                                 
Prompt: Range of supported Unicode characters                                                       
  Defined at Config.in:195                                                                          
  Depends on: UNICODE_SUPPORT                                                                       
  Location:                                                                                         
    -> Busybox Settings                                                                             
      -> General Configuration                                                                      
        -> Support Unicode (UNICODE_SUPPORT [=y])                                                   

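Read literally, this is a plain threshold on the code point, applied before the substitution character is chosen. A hypothetical Python model (not BusyBox's C code; `filter_wchars` is an illustrative name) shows why the value may matter here: U+2620 from the earlier `echo -e` test is 9760, well above 767, so under this reading it would be replaced with '?', which is consistent with the symptom reported above.

```python
SUBST_WCHAR = 63            # '?'
LAST_SUPPORTED_WCHAR = 767  # 0x2ff, the value shown in the config screen above


def filter_wchars(text: str,
                  last_supported: int = LAST_SUPPORTED_WCHAR,
                  subst_wchar: int = SUBST_WCHAR) -> str:
    """Replace every character above last_supported with chr(subst_wchar)."""
    if last_supported == 0:
        # 0 means "off": no upper limit (printability checks are not modeled here)
        return text
    replacement = chr(subst_wchar)
    return "".join(ch if ord(ch) <= last_supported else replacement for ch in text)


print(filter_wchars("\u2620"))                        # ?  (U+2620 = 9760 > 767)
print(filter_wchars("caf\u00e9") == "caf\u00e9")      # True (U+00E9 = 233 <= 767)
print(filter_wchars("\u2620", 0x10FFFF) == "\u2620")  # True (limit 1114111)
```
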

tianon commented on August 16, 2024

For what it's worth, here are the values Debian uses for its busybox package:

CONFIG_UNICODE_SUPPORT=y
# CONFIG_UNICODE_USING_LOCALE is not set
CONFIG_FEATURE_CHECK_UNICODE_IN_ENV=y
CONFIG_SUBST_WCHAR=63
CONFIG_LAST_SUPPORTED_WCHAR=767
CONFIG_UNICODE_COMBINING_WCHARS=y
CONFIG_UNICODE_WIDE_WCHARS=y

tianon commented on August 16, 2024

Contrast that with Alpine's values:

CONFIG_UNICODE_SUPPORT=y
CONFIG_UNICODE_USING_LOCALE=y
# CONFIG_FEATURE_CHECK_UNICODE_IN_ENV is not set
CONFIG_SUBST_WCHAR=63
CONFIG_LAST_SUPPORTED_WCHAR=1114111
CONFIG_UNICODE_COMBINING_WCHARS=y
CONFIG_UNICODE_WIDE_WCHARS=y
# CONFIG_UNICODE_BIDI_SUPPORT is not set
# CONFIG_UNICODE_NEUTRAL_TABLE is not set
CONFIG_UNICODE_PRESERVE_BROKEN=y

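Putting the Debian and Alpine lists side by side makes the differences easier to spot. A small Python sketch that diffs the two fragments; `parse_config` and `diff_configs` are hypothetical helpers, and the parser only understands the two line forms that appear in these lists (`CONFIG_X=value` and `# CONFIG_X is not set`):

```python
import re

DEBIAN = """\
CONFIG_UNICODE_SUPPORT=y
# CONFIG_UNICODE_USING_LOCALE is not set
CONFIG_FEATURE_CHECK_UNICODE_IN_ENV=y
CONFIG_SUBST_WCHAR=63
CONFIG_LAST_SUPPORTED_WCHAR=767
CONFIG_UNICODE_COMBINING_WCHARS=y
CONFIG_UNICODE_WIDE_WCHARS=y
"""

ALPINE = """\
CONFIG_UNICODE_SUPPORT=y
CONFIG_UNICODE_USING_LOCALE=y
# CONFIG_FEATURE_CHECK_UNICODE_IN_ENV is not set
CONFIG_SUBST_WCHAR=63
CONFIG_LAST_SUPPORTED_WCHAR=1114111
CONFIG_UNICODE_COMBINING_WCHARS=y
CONFIG_UNICODE_WIDE_WCHARS=y
# CONFIG_UNICODE_BIDI_SUPPORT is not set
# CONFIG_UNICODE_NEUTRAL_TABLE is not set
CONFIG_UNICODE_PRESERVE_BROKEN=y
"""

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict[str, str]:
    """Map each option to its value; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        if m := SET_RE.match(line):
            options[m.group(1)] = m.group(2)
        elif m := UNSET_RE.match(line):
            options[m.group(1)] = "n"
    return options


def diff_configs(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Options whose values differ; '(absent)' marks an option missing on one side."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k, "(absent)"), b.get(k, "(absent)"))
            for k in keys if a.get(k) != b.get(k)}


for name, (deb, alp) in diff_configs(parse_config(DEBIAN), parse_config(ALPINE)).items():
    print(f"{name}: debian={deb} alpine={alp}")
```

Among the reported differences, LAST_SUPPORTED_WCHAR stands out: Alpine's 1114111 is 0x10FFFF, the highest Unicode code point, so that threshold never rules a character out, whereas Debian keeps 767.
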