Normalize the Unicode characters of a character vector or factor using the ICU library. Windows is not currently supported.
First, install the ICU library and make sure that you can run icu-config --version.
# macOS (Homebrew)
$ brew install icu4c
$ brew link icu4c --force
# Debian/Ubuntu
$ sudo aptitude install libicu-dev
# RHEL/CentOS/Fedora
$ sudo yum install libicu-devel
Next, clone the repository and install the package.
$ git clone git@github.com:abicky/RUnicode.git
$ R CMD INSTALL RUnicode
unormalize(x, form = c("NFKC", "NFC", "NFKD", "NFD"), encoding = "utf8")
See also ?unormalize
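The signature above lists all four forms as the default value of form. In R, a vector default like this is conventionally resolved with match.arg(), so that the first element ("NFKC") is used when the argument is omitted. The sketch below illustrates that idiom with a hypothetical helper, pick_form, which is not part of the package; whether unormalize resolves form this way internally is an assumption.

```r
# Hypothetical helper (not part of RUnicode) showing how a vector
# default such as c("NFKC", "NFC", "NFKD", "NFD") is usually resolved.
pick_form <- function(form = c("NFKC", "NFC", "NFKD", "NFD")) {
  # match.arg() returns the first choice when `form` is missing and
  # raises an error for values outside the allowed set.
  match.arg(form)
}

default_form  <- pick_form()       # argument omitted
explicit_form <- pick_form("NFD")  # argument given explicitly

# An unsupported form is rejected; capture the error instead of stopping.
bad_form <- tryCatch(pick_form("NFX"), error = function(e) NA_character_)
```

Under that assumption, unormalize(x) and unormalize(x, "NFKC") would be equivalent calls.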
> x <- c("\uff71\uff72\uff73", "\uff11\uff12\uff13")
> x
[1] "ｱｲｳ" "１２３"
> unormalize(x, "NFKC")
[1] "アイウ" "123"
> ga <- "\u304c"
> ga
[1] "が"
> charToRaw(ga)
[1] e3 81 8c
> charToRaw(unormalize(ga, "NFD"))
[1] e3 81 8b e3 82 99
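The byte dumps above show that NFD splits "が" (U+304C) into "か" (U+304B) followed by the combining voiced sound mark (U+3099). The following sketch uses only base R to show why that matters: the two spellings look alike but differ in code points, length, and bytes, so they do not compare as equal without normalization. The final guarded call assumes that the installed package is named RUnicode and exports unormalize; it is skipped when the package is not available.

```r
composed   <- "\u304c"        # "が" as a single precomposed code point
decomposed <- "\u304b\u3099"  # "か" + combining voiced sound mark

# Code points of each spelling.
cp_composed   <- utf8ToInt(composed)    # one code point
cp_decomposed <- utf8ToInt(decomposed)  # two code points

# The strings differ in length and in their UTF-8 bytes.
n_composed   <- nchar(composed)
n_decomposed <- nchar(decomposed)
same_bytes   <- identical(charToRaw(composed), charToRaw(decomposed))

print(sprintf("U+%04X", cp_composed))
print(sprintf("U+%04X", cp_decomposed))

# Assumption: the package is installed under the name "RUnicode".
if (requireNamespace("RUnicode", quietly = TRUE)) {
  nfd <- RUnicode::unormalize(composed, "NFD")
  print(identical(charToRaw(nfd), charToRaw(decomposed)))
}
```

Normalizing both sides to the same form before comparing or joining strings avoids mismatches of this kind.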