mattn / go-runewidth

604.0 13.0 92.0 153 KB

wcwidth for golang

License: MIT License

Go 100.00%

golang go windows wcwidth

go-runewidth's Introduction

go-runewidth

Provides functions to get the fixed width of a character or string.

Usage

runewidth.StringWidth("つのだ☆HIRO") == 12

Author

Yasuhiro Matsumoto

License

under the MIT License: http://mattn.mit-license.org/2013

go-runewidth's People

Stargazers

Watchers

Forkers

junegunn kurehajime escribano schachmat jsoref jogramming joshuarubin yeshm y-yagi versus nullne zouhuigang rubyli0612 markus-oberhumer-forks sameer niltonkummer gxed shengyao ppzky tesujiro mirandacong hilalisa commonnet chipper1 portalgun-io isgasho frankspitulski luoyf520 couchbasedeps bdfnl justforkin rivo krisnova brewvet devmanorg idcdog machinechatllc waltarix p-e-w petervd nihaoyin thrive-software wedaly lancosch tslocum srinivas32 aretext zyedidia chadpierce nwater hans007 cywcodingone tklauser johejo saikodi yeshq klauspost hayatoshiba gdamore zhsj boairy gdrens ehztdlefaoyxbekwo visimulator ajunlonglive itchyny tulir okatti-et14 linzhengen ernestrc tty2 ilius newacorn tinywolf3 ex-preman thajeztah aymanbagabas zhouxiaoxiang m9d2 shogo82148 kuredev vforks umlx5h cancue leg100 josharian arp242 isabella232 impxy

go-runewidth's Issues

License

Which license did you adopt for this product? Thanks.

Erroneous interpretation of Na leads to width-zero mathematical symbols

According to the Unicode® Standard Annex #11 Na stands for narrow:

ED5. East Asian Narrow (Na): All other characters that are always narrow and have explicit fullwidth or wide counterparts. These characters are implicitly narrow in East Asian typography and legacy character sets because they have explicit fullwidth or wide counterparts. All of ASCII is an example of East Asian Narrow characters.

Therefore, the characters that are currently considered to belong to the nonassigned table should have width 1, not width 0.

Two of these characters are commonly used in quantum mechanics: |α⟩⟨α|

EDIT: This issue is fixed by #44. Please merge that PR.
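To make the point about Na concrete, here is a minimal sketch of how the East Asian Width classes from UAX #11 are commonly mapped to terminal columns. The function name columnsForClass and the string class labels are illustrative assumptions, not go-runewidth's actual tables or API.

```go
package main

import "fmt"

// columnsForClass maps an East Asian Width class (UAX #11) to a column count.
// Hypothetical helper for illustration; go-runewidth uses range tables instead.
func columnsForClass(class string, eastAsian bool) int {
	switch class {
	case "W", "F": // Wide, Fullwidth
		return 2
	case "A": // Ambiguous: treated as wide only in an East Asian context
		if eastAsian {
			return 2
		}
		return 1
	default: // "Na", "H", "N": Narrow, Halfwidth, Neutral
		return 1
	}
}

func main() {
	for _, class := range []string{"Na", "H", "N", "A", "W", "F"} {
		fmt.Printf("%-2s default=%d eastAsian=%d\n",
			class, columnsForClass(class, false), columnsForClass(class, true))
	}
}
```

Under this mapping a narrow (Na) character such as ⟩ never ends up with width 0, which is the behavior the issue asks for.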

Variation Selectors 1 - 256 Report Width = 1

Variation Selectors 1-256 (Unicode ranges 0xFE00-0xFE0F and 0xE0100-0xE01EF) report as width = 1. These are nonprintable characters and should report width 0. I think it would make sense to add them to the nonprint table. I can submit a PR if that sounds good.
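As a rough illustration of the proposed check, the two ranges can be tested directly. This is a standalone sketch; isVariationSelector is a hypothetical helper and not part of go-runewidth, which keeps such ranges in its generated tables.

```go
package main

import "fmt"

// isVariationSelector reports whether r is one of Variation Selectors 1-256.
// VS1-VS16 are U+FE00..U+FE0F; VS17-VS256 are U+E0100..U+E01EF.
func isVariationSelector(r rune) bool {
	return (r >= 0xFE00 && r <= 0xFE0F) || (r >= 0xE0100 && r <= 0xE01EF)
}

func main() {
	// U+263A followed by VS16 (emoji presentation selector).
	for _, r := range "\u263A\uFE0F" {
		fmt.Printf("U+%04X variation selector: %v\n", r, isVariationSelector(r))
	}
}
```

A caller could treat any rune for which this returns true as width 0 until the table is updated.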

Feature request: Add support for zero-width-joiners

It would be great if you could add support for zero-width joiners (ZWJ). I have the following code example which doesn't work as expected:

package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	e := "👨‍👨‍👧"
	r := []rune(e)
	var widths []int
	for _, c := range r {
		widths = append(widths, runewidth.RuneWidth(c))
	}
	fmt.Printf("%s : len=%d numrunes=%d width=%d widths=%v runes=%X\n", e, len(e), len(r), runewidth.StringWidth(e), widths, r)
}

The output is:

👨‍👨‍👧 : len=18 numrunes=5 width=6 widths=[2 0 2 0 2] runes=[1F468 200D 1F468 200D 1F467]

Specifically, width should be 2 instead of 6. I found this article which explains how they work. It does not only affect emojis but also characters in some languages.

This came up in rivo/tview#161. It would be great if support for ZWJ could be added so I can implement support for these Unicode characters in tview. I understand that not all kinds of combinations are supported and it's probably difficult to figure out which ones are. But assuming these characters are supported will help a lot. I don't expect users to try to print ZWJ combinations which are not supported anyway.

Thanks!
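One possible way to get the requested behavior, sketched under the assumption that every ZWJ sequence renders as a single glyph with the width of its first rune. The names zwjAwareWidth and stubRuneWidth are hypothetical; stubRuneWidth is a crude stand-in for runewidth.RuneWidth so that the example runs without the library.

```go
package main

import "fmt"

const zwj = '\u200D' // ZERO WIDTH JOINER

// stubRuneWidth is a stand-in for a real per-rune width function.
// Assumption: runes at or above U+1F300 are wide emoji, everything else is narrow.
func stubRuneWidth(r rune) int {
	switch {
	case r == zwj:
		return 0
	case r >= 0x1F300:
		return 2
	default:
		return 1
	}
}

// zwjAwareWidth sums rune widths but skips any rune that directly follows
// a ZWJ, so a joined sequence counts only the width of its first rune.
func zwjAwareWidth(s string, runeWidth func(rune) int) int {
	width := 0
	sawBase := false // some rune has already contributed width
	joined := false  // the previous rune was a ZWJ attached to a base rune
	for _, r := range s {
		switch {
		case r == zwj:
			joined = sawBase
		case joined:
			joined = false // glued to the previous rune, adds no width
		default:
			width += runeWidth(r)
			sawBase = true
		}
	}
	return width
}

func main() {
	family := "\U0001F468\u200D\U0001F468\u200D\U0001F467" // man, ZWJ, man, ZWJ, girl
	fmt.Println(zwjAwareWidth(family, stubRuneWidth)) // prints 2
}
```

This overcounts nothing but undercounts sequences that a terminal does not actually join, which matches the trade-off described above.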

The width of Box-drawing characters

Check this for the definition of box-drawing (BD below) characters.

I found that these characters are defined to be of ambiguous width, so passing them to RuneWidth returns 2 in my environment. This is somewhat inconvenient since, AFAIK, terminal fonts tend to render BD characters in half-width.

Is it possible to remove these characters from the ambiguous table? I can make the PR if you think this sounds sane.

Thanks.

Wrong width for flag symbols

runewidth.StringWidth("🇩🇰") returns 2.

I haven't looked into this at all, and I have no idea what I should expect, but a width of 1 seems reasonable.

Conflict with rivo/uniseg

I am trying to install Go Fiber 2.40.0 using Go v1.15.
I encountered an error like this:

github.com/mattn/[email protected]/runewidth.go:7:2: found packages uniseg(doc.go) and main (gen_breaktest.go) in ...

Has anyone else ever encountered this before?

make ngrok release-server error

src/github.com/mattn/go-runewidth/runewidth.go:7:2: found packages uniseg (doc.go) and main (gen_breaktest.go) in /root/ngrok/src/github.com/rivo/uniseg
make: *** [Makefile:8: deps] Error 1


Broken benchmark tests

c9bd7d1 and 43a826d broke benchmark tests

$ go test -bench . -benchmem
--- FAIL: BenchmarkRuneWidthAll
    benchmark_test.go:27: got 1293942, want 1293932
goos: linux
goarch: amd64
pkg: github.com/mattn/go-runewidth
cpu: 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz
BenchmarkRuneWidth768-4                   650364              1877 ns/op               0 B/op          0 allocs/op
--- FAIL: BenchmarkRuneWidthAllEastAsian
    benchmark_test.go:27: got 1432568, want 1432558
BenchmarkRuneWidth768EastAsian-4           85194             14217 ns/op               0 B/op          0 allocs/op
--- FAIL: BenchmarkString1WidthAll
    benchmark_test.go:62: got 1295990, want 1295980
BenchmarkString1Width768-4                  9513            125876 ns/op           86016 B/op       3072 allocs/op
--- FAIL: BenchmarkString1WidthAllEastAsian
    benchmark_test.go:62: got 1436664, want 1436654
BenchmarkString1Width768EastAsian-4         8168            142574 ns/op           86016 B/op       3072 allocs/op
BenchmarkTablePrivate-4                      656           1798150 ns/op               0 B/op          0 allocs/op
BenchmarkTableNonprint-4                     402           2982255 ns/op               0 B/op          0 allocs/op
BenchmarkTableCombining-4                    264           4511447 ns/op               0 B/op          0 allocs/op
BenchmarkTableDoublewidth-4                  222           5379437 ns/op               0 B/op          0 allocs/op
BenchmarkTableAmbiguous-4                    183           6475643 ns/op               0 B/op          0 allocs/op
BenchmarkTableEmoji-4                        222           5272836 ns/op               0 B/op          0 allocs/op
BenchmarkTableNarrow-4                       522           2255628 ns/op               0 B/op          0 allocs/op
BenchmarkTableNeutral-4                      144           8281886 ns/op               0 B/op          0 allocs/op
FAIL
exit status 1
FAIL    github.com/mattn/go-runewidth   19.880s

Inconsistent behavior under different operating systems

dot := '\uF111' // a dot
println(runewidth.RuneWidth(dot))

/*
    Linux(wsl):1 (correct)
    Windows 11:2 (incorrect)
*/

go version: 1.20.4

Is there a Java version?

I was trying to parse terminal input commands in Java, but parsing ANSI codes is difficult, so I wanted to use this project. However, it is written in Go. Is there a Java version of it, or a similar third-party library?

Please tag a new release

There are 14 commits on master since the last release.

Unexpected width for `─`

func main() {
	b := `─` // unicode 0x2500
	fmt.Println(runewidth.StringWidth(b))
}

On Windows/macOS the result is: 2
On Linux the result is: 1

Rune width of certain CP437 chars like ♦ is 2 instead of 1

I'm trying to port an old DOS program using tcell (which uses RuneWidth). My program has a table mapping CP437 char code to rune, and then I print that rune to the screen. I'm in the terminal with fixed width fonts, so I expect all chars to be the same width.

The issue is that RuneWidth('\u2666'), along with some other characters, returns width 2 instead of 1, which makes tcell allocate 2 cells for it and causes "gaps" in the rendering. Here's playground code showing which chars do this: https://play.golang.org/p/Hjq3GOC0Pcd -- output is:

RuneWidth('☺') = 2
RuneWidth('☻') = 2
RuneWidth('♥') = 2
RuneWidth('♦') = 2
RuneWidth('♣') = 2
RuneWidth('♠') = 2
RuneWidth('♂') = 2
RuneWidth('♀') = 2
RuneWidth('♪') = 2
RuneWidth('♫') = 2
RuneWidth('☼') = 2
RuneWidth('↕') = 2
RuneWidth('‼') = 2
RuneWidth('↔') = 2

I believe it's happening because these are treated as Emoji characters. Is this behavior expected? If so, how do I work around this in tcell?

The go1 tag is out of date; can you update or remove it?

Hi,

Currently, your go1 tag points to commit ce86f93. So when someone does go get -u github.com/mattn/go-runewidth, it will check out that revision.

However, you have newer commits that add Truncate and fix bugs on master, that are not available:

go1...master

Can you either update go1 tag to point to latest stable version (I'm guessing 39104c7), or simpler yet, remove it and let master be the latest go gettable version. You can use feature branches for development and merge them into master when they're ready.

I'm guessing this was an unintended situation, but please let me know if that's not the case. Thanks.

Width of Box Drawing characters and LANG=zh_CN.UTF-8

Hello!

I maintain a golang library for drawing ASCII tables at https://github.com/jedib0t/go-pretty and this is one of the few dependencies I have, to calculate rune width for drawing the tables. Sample: https://go.dev/play/p/I6uxssyXxhN?v=goprev

Now, a couple of users reported some alignment issues, and after some investigation I figured out that the width returned for Box Drawing characters was not the expected value when LANG=zh_CN.UTF-8 or when EastAsianWidth=true is set in go-runewidth.

To replicate the bug, I created this program -- say foo.go:

package main

import (
	"fmt"
	"strings"

	"github.com/mattn/go-runewidth"
)

func main() {
	boxDrawingChars := []string{
		"+", "-", "=",
		"┏", "┳", "┓",
		"┣", "╋", "┫",
		"┗", "┻", "┛",
		"━", "┃",
	}

	cellWidth := 8
	for _, boxDrawingChar := range boxDrawingChars {
		padding := strings.Repeat(" ", cellWidth-runewidth.StringWidth(boxDrawingChar))
		fmt.Printf("| %s%s |\n", boxDrawingChar, padding)
	}
}

Output:

$ LANG=en_US.UTF-8 go run foo.go
| +        |
| -        |
| =        |
| ┏        |
| ┳        |
| ┓        |
| ┣        |
| ╋        |
| ┫        |
| ┗        |
| ┻        |
| ┛        |
| ━        |
| ┃        |

$ LANG=zh_CN.UTF-8 go run foo.go 
| +        |
| -        |
| =        |
| ┏       |
| ┳       |
| ┓       |
| ┣       |
| ╋       |
| ┫       |
| ┗       |
| ┻       |
| ┛       |
| ━       |
| ┃       |

Is this behavior right, or am I using runewidth.RuneWidth/StringWidth incorrectly?
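A caller-side workaround is to force the Box Drawing block (U+2500..U+257F) to one column before falling back to the library's answer. The sketch below assumes that is acceptable for the target terminals; widthWithBoxOverride and eastAsianStub are hypothetical names, and eastAsianStub only imitates an East Asian locale so the example runs without go-runewidth.

```go
package main

import "fmt"

// isBoxDrawing reports whether r is in the Unicode Box Drawing block.
func isBoxDrawing(r rune) bool {
	return r >= 0x2500 && r <= 0x257F
}

// widthWithBoxOverride returns 1 for box-drawing runes and otherwise
// defers to the supplied per-rune width function.
func widthWithBoxOverride(r rune, fallback func(rune) int) int {
	if isBoxDrawing(r) {
		return 1
	}
	return fallback(r)
}

// eastAsianStub imitates a width function running under an East Asian
// locale. Assumption for illustration: every non-ASCII rune is 2 columns.
func eastAsianStub(r rune) int {
	if r < 0x80 {
		return 1
	}
	return 2
}

func main() {
	for _, r := range []rune{'+', '━', '┃', '漢'} {
		fmt.Printf("%c fallback=%d override=%d\n",
			r, eastAsianStub(r), widthWithBoxOverride(r, eastAsianStub))
	}
}
```

In a real program the fallback would be runewidth.RuneWidth. Whether one column is correct still depends on the font and terminal in use.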

Is width for EN DASH intended to be 2 instead of 1?

Hi,

Consider the following three similar unicode characters:

'-' - Unicode Character 'HYPHEN-MINUS' (U+002D)
'–' - Unicode Character 'EN DASH' (U+2013)
'—' - Unicode Character 'EM DASH' (U+2014)

From shurcooL/markdownfmt#7 (comment), I've learned that go-runewidth considers the width of the first character to be 1, and the width of second and third characters to be 2.

Is that intended?

I'm not sure how to test this reliably, but in most environments it seems that EN DASH has width that's closer to 1 than 2.

Any thoughts on this?

Wrong width reported for some characters

It appears that StringWidth reports the length of certain runes incorrectly. The problem seems to be centered around languages used primarily in India (Tamil, Telugu, and Hindi are examples).

Sample program that shows the problem:

package main

import (
	"fmt"
	"github.com/mattn/go-runewidth"
	"strings"
)

func main() {
	words := []string{
		"English",
		"हिन्द",
		"தமிழ்",
		"ไทย",
		"עברית",
	}

	for _, w := range words {
		max := 12 - runewidth.StringWidth(w)
		fmt.Printf("|%s%s|\n", w, strings.Repeat(" ", max))
	}
}

The output shows the misalignment in the 2nd and 3rd rows (sorry, but pasting here won't work since GitHub seems to force "Liberation Mono" as the monospace font and this font appears to have its own issues). I've tried this on terminals, browsers, etc., always with similar results.

go-app-builder fails on windows because of "syscall" in runewidth_windows.go

I'm using the package github.com/jhillyerd/go.enmime from an AppEngine classic project where the syscall package is not available.

The go.enmime in turn imports this package github.com/mattn/go-runewidth.

Unfortunately running the project from dev_appserver.py on windows results in:

go-app-builder: Failed parsing input: parser: bad import "syscall" in github.com\mattn\go-runewidth\runewidth.go from GOPATH

I had to change the file runewidth_windows.go to the following to make the project build:

package runewidth

import (
	//"syscall"
)

var (
	//kernel32               = syscall.NewLazyDLL("kernel32")
	//procGetConsoleOutputCP = kernel32.NewProc("GetConsoleOutputCP")
)

// IsEastAsian return true if the current locale is CJK
func IsEastAsian() bool {
	return false
}

Runewidth of '…' is not equal to actual width on terminal

Hello,

Runewidth of '…' is not equal to actual width on terminal.
Is this expected?

Width is 1 when it should be 2

I stumbled over a character that, when output to the console directly, takes up two characters. But StringWidth() gives me 1. This is because the first rune of this character has a width of 1 and that's what's being used, see here. I know I wrote this code and I'm sure that you cannot simply add up the widths of individual runes ("🏳️‍🌈" would then have a width of 4 which is obviously wrong) and using the first rune's width worked fine so far. But it turns out that it fails in some cases.

I'm not familiar with Indian characters but it seems to me that the second rune is a modifier that turns the character from a width of 1 into a width of 2. Are you aware of any logic that we could add to go-runewidth that makes this right?

Here's example code that illustrates the issue:

package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	s := "खा"
	fmt.Println("0123456789")
	fmt.Println(s + "<")
	fmt.Printf("String width: %d\n", runewidth.StringWidth(s))
	var i int
	for _, r := range s {
		fmt.Printf("Rune %s  (%d) width: %d\n", string(r), i, runewidth.RuneWidth(r))
		i++
	}
}

Output (on macOS with iTerm2):

RuneWidth does not equal StringWidth

I stumbled over this while working on #47.

It seems that RuneWidth is not always equal to the StringWidth of a single rune.

This is quite unexpected, TBH.

Please see markus-oberhumer-forks@5da511d for a test case.

Define width?

This is a question about how you define "width". I'm mostly looking for a solution that gives me character width in monospaced fonts. For example, in #39 and #36, the "width" of a flag would still be 2: although it is considered 1 character by modern renderers, it still takes up the space of 2 normal characters.

incorrect rune width for box drawing characters in east asian encoding

When using an east asian encoding, the following runes are given a width of 2 but they should be 1: ─┌└┐┘│.

To reproduce:

export LC_CTYPE="ja_JP.UTF-8"
(in go program)
runewidth.RuneWidth('─') // returns 2

Looking at the runewidth_table.go file, the culprit is {0x24EB, 0x254B} in the ambiguous table. I'm not sure how to update this; the file is auto-generated.

In terminal apps which render box characters this can lead to broken rendering:

Let me know if there's anything else I can add. Thanks :)

../../mattn/go-runewidth/runewidth.go:823: function ends without a return statement

go get github.com/brandleesee/TerminalStocks
# github.com/mattn/go-runewidth
../../mattn/go-runewidth/runewidth.go:823: function ends without a return statement

semantic release versioning please

See

Thanks.

Regional Indicators (Flags) and Grapheme Clusters

Here's a short example that illustrates an issue with flags (or "regional indicators"):

fmt.Println(runewidth.StringWidth("🇩🇪")) // Should be "2", outputs "4".

The flag consists of two code points which are processed separately by runewidth. But most modern systems will combine them into one flag emoji.

This is part of a larger topic which I describe in more detail here: gdamore/tcell#264. It doesn't just affect flags but also characters in e.g. Arabic and Korean where there are more sophisticated rules than "combining characters" and zero-width joiners (which you added with #20).

I don't know exactly how you calculate the widths of characters. I'm also not sure how you would solve flags as well as some of the other rules described in the Unicode specification but it would sure be nice as printing these flags currently gives me trouble in tview. There have been multiple issues asking for better support for different languages and emojis so it seems that there are quite a few people who use the terminal with these characters.

(Maybe my new package uniseg can help you here.)
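For flags specifically, a simplified sketch is to pair up consecutive regional indicator runes (U+1F1E6..U+1F1FF) and count each pair once. This is only an approximation of the grapheme cluster rules in the Unicode specification, and the assumption that an unpaired indicator also takes 2 columns is mine. flagAwareWidth is a hypothetical name, not a go-runewidth or uniseg function.

```go
package main

import "fmt"

// isRegionalIndicator reports whether r is REGIONAL INDICATOR SYMBOL LETTER A..Z.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// flagAwareWidth counts each pair of regional indicators as one 2-column
// flag and defers to runeWidth for every other rune.
func flagAwareWidth(s string, runeWidth func(rune) int) int {
	width := 0
	pending := false // the previous rune opened a flag pair
	for _, r := range s {
		if isRegionalIndicator(r) {
			if pending {
				pending = false // second half: already counted with the first
				continue
			}
			pending = true
			width += 2 // assumption: a lone indicator is also 2 columns
			continue
		}
		pending = false
		width += runeWidth(r)
	}
	return width
}

func main() {
	narrow := func(rune) int { return 1 } // stand-in for a real width function
	germany := "\U0001F1E9\U0001F1EA"     // regional indicators D + E
	fmt.Println(flagAwareWidth(germany, narrow)) // prints 2
}
```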

Semantic Versioning: `ZeroWidthJoiner` Removal

ZeroWidthJoiner was removed after v0.0.9: https://github.com/mattn/go-runewidth/blob/v0.0.9/runewidth.go#L14

The next version was v0.0.10, but this introduced a breaking API change.

While being v0 means you can introduce breaking API changes, would it be possible to get a v1 release that can ensure API stability?

It's fine to just keep cutting new versions when API changes happen, but right now it makes managing Go Module dependencies rather painful, since it just assumes patch versions don't introduce breaking changes.

Hello, I have a problem.

bash-3.2$ go get -u -d github.com/coreos/etcd/...
# cd .; git clone https://github.com/mattn/go-runewidth /Users/admin/go/src/github.com/mattn/go-runewidth
fatal: could not create work tree dir '/Users/admin/go/src/github.com/mattn/go-runewidth': Permission denied
package github.com/mattn/go-runewidth: exit status 128

linux go build failed

# go build
/go/pkg/mod/github.com/mattn/[email protected]/runewidth.go:7:2: //go:build comment without // +build comment

Should a rune like tab `\t` have a width?

Currently on my Linux machine its width is 0, while in a terminal it is 8, and in most IDEs it is customizable.
I don't know whether there are other characters like this. Should I just define the width myself?
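If you do define it yourself, note that a tab's width depends on the column it starts in, so it cannot be a fixed per-rune value. A minimal sketch, assuming fixed tab stops every tabSize columns and one column for every other rune (a real implementation would call a per-rune width function there); widthWithTabs is a hypothetical helper.

```go
package main

import "fmt"

// widthWithTabs returns the display width of s when each tab advances
// to the next multiple of tabSize.
func widthWithTabs(s string, tabSize int) int {
	if tabSize < 1 {
		tabSize = 1 // guard against division by zero
	}
	col := 0
	for _, r := range s {
		if r == '\t' {
			col += tabSize - col%tabSize // jump to the next tab stop
			continue
		}
		col++ // assumption: every non-tab rune occupies one column
	}
	return col
}

func main() {
	fmt.Println(widthWithTabs("a\tb", 8))  // prints 9
	fmt.Println(widthWithTabs("ab\tc", 4)) // prints 5
}
```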

Possible regression?

Hi,

Updating go-runewidth from v0.0.4 to v0.0.5 breaks my tests in https://github.com/MichaelMure/go-term-text. go-term-text is a package that does text formatting for the terminal and relies on go-runewidth to get character widths.

Here is an example of before/after:

Notice that after switching to 0.0.5, the text goes further than it should. As the algorithm remains unchanged, I suspect go-runewidth returns a different length. Would that be possible? If so, why?
