
your-code-displays-japanese-wrong's Introduction

Your code displays Japanese wrong

Why am I here?

If you were brought here by a link someone gave you, that person probably thinks your code displays Japanese wrong. This page briefly describes the glyph appearance problems that frequently arise in naive implementations of Asian text display: why they happen, why they are a big deal, and how to fix them.

OK, what’s wrong?

Kanji (sometimes called Hanzi, Hanja or Han characters, but let’s call it Kanji for now) is a set of logographic characters that originated in China but are now also used in several other countries, such as Japan, Korea, and Taiwan.

The Kanji glyph sets used in Simplified Chinese, Traditional Chinese, Japanese, and Korean look mostly similar to one another, but each has a large number of letters that must look distinctly different. For instance, here are the Japanese, Simplified Chinese, and Traditional Chinese glyph variants of one such letter:

Therefore, if text in Japanese is displayed using a Kanji glyph set meant for Chinese, it will immediately stand out to a native Japanese speaker as ugly, non-native, and plain bizarre due to the unfamiliar glyphs showing up in the text. This is most likely what’s happening with your program.

Why does that happen?

Back when Unicode was being designed, a decision called Han Unification was made to create a single unified set of all the Chinese (Simplified/Traditional), Japanese, and Korean Kanji characters. This involved assigning a single code point to characters that were deemed equivalent across languages, which kept the character set small.

However, this also meant that characters which differ in appearance across languages were given identical code points! It is up to the font and the program displaying the text to render them using the correct glyph set. If the developer is not aware of this and doesn't do anything about it, that frequently fails.
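
The effect is visible at the code point level. As a minimal sketch in Python, using 直 as a sample character (the choice of character is an assumption made for illustration), the string holds only a code point and nothing that says which language it belongs to:

```python
import unicodedata

# Sample character chosen for illustration; any unified Han character works.
ch = "直"

# A Python str stores code points only. Nothing here records whether the
# text is Japanese or Chinese, so the renderer must be told separately.
code_point = f"U+{ord(ch):04X}"
name = unicodedata.name(ch)

print(code_point)  # U+76F4
print(name)        # CJK UNIFIED IDEOGRAPH-76F4
```

The same code point is stored whether the text was typed by a Japanese or a Chinese writer, so the language has to be supplied through some other channel.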

In many cases, the default fallback behavior in an ambiguous situation is to choose the Chinese glyph set. Therefore, Japanese text tends to be incorrectly displayed using Chinese glyphs.

How can I check if it’s happening or not?

Here are some characters that are known to have different glyph appearances between different languages.

Try copy-pasting them into your code, look at the rendered results, and compare them with the samples below. If the glyphs don't look like the Japanese sample, your code is displaying Japanese wrong.

How can I fix it?

In a nutshell, the fix is to make sure your code and your font know that the text is Japanese whenever they are displaying Japanese.

HTML
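
In HTML, the standard way to declare the language of a piece of text is the `lang` attribute, which browsers can take into account when choosing glyphs for Han characters. As a minimal sketch, assuming your program generates its HTML from Python, a hypothetical helper named `tag_japanese` (not part of any library) could mark Japanese text like this:

```python
from html import escape


def tag_japanese(text: str) -> str:
    """Wrap text in a span that declares its language as Japanese.

    Hypothetical helper for illustration. The lang attribute itself is
    standard HTML, and "ja" is the language tag for Japanese.
    """
    return f'<span lang="ja">{escape(text)}</span>'


print(tag_japanese("直接"))  # <span lang="ja">直接</span>
```

If the whole page is Japanese, putting `lang="ja"` on the root `<html>` element covers the entire document, since the attribute is inherited by child elements.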

Unity (TextMesh Pro)

Do similar problems occur with other languages?

Likely, but I haven't gone into details, since I am Japanese, speak only English and Japanese, and don't have much insight into other languages. If you can help with problems that occur in other languages, please drop me a line.

Why aren't there steps to fix it in INSERT_ENVIRONMENT_HERE?

More Resources

Wikipedia: Han Unification

  • Do not enable discretionary ligatures.
  • Line-wrapping rules.

your-code-displays-japanese-wrong's People

Contributors

heistak, tnj


your-code-displays-japanese-wrong's Issues

Question - "the default fallback behavior in an ambiguous situation is to choose the Chinese glyph set" - is this true?

Hi, I'm Mainland Chinese and I know and encounter this problem a lot too, and agree that it needs more exposure and solutions.

However, I have a question about the line "the default fallback behavior in an ambiguous situation is to choose the Chinese glyph set". In my personal (anecdotal, so yes, inherently flawed) experience, I've always observed that the opposite is true. For example, on a fresh install of Windows 11 or Ubuntu, going into YouTube causes Simplified Chinese to be displayed in the Japanese font, most notably the squished-together 复 and 关, and the 直 with the extra vertical stroke to the left.

Actually, just typing those in right now, I just realized that GitHub is doing the same thing for me right now...

I've always chalked this up to "Japan industrialized before us, they must have created computer fonts before us, it's only natural". But this is the first time I've heard about Chinese glyphs being displayed in place of Japanese glyphs. I found that even Traditional Chinese tends to be displayed in Japanese style (like 備).

Even Kyuujitai Japanese glyphs take preference over Traditional Chinese glyphs (like 縣).

I'm just a little curious about the default behaviors and roots of the problem 😅 Would be nice to get more concrete information about the problem.
