(go) mojimoji

This is a Go port of the excellent mojimoji library, originally written in Python.

It provides two functions:

  • HanToZen - half-width to full-width character conversion.
  • ZenToHan - full-width to half-width character conversion.

Both functions accept the following options:

  • ASCII - enable or disable ASCII translation.
  • Digits - enable or disable Digits translation.
  • Kana - enable or disable Kana translation.

All options are enabled by default. See the Examples section for usage.
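The option style described above is commonly built with Go's functional-options pattern. The following is a minimal sketch of that pattern, not the library's actual code: the `config` and `Option` types and the `hanToZenSketch` function are hypothetical names, and the conversion only shifts ASCII and digits by the fixed Unicode offset 0xFEE0 between U+0021..U+007E and U+FF01..U+FF5E. Kana conversion is omitted, and the real library may map individual symbols differently.

```go
package main

import (
	"fmt"
	"strings"
)

// config holds the conversion switches. Hypothetical type, for illustration.
type config struct {
	ascii, digits, kana bool
}

// Option changes one switch in config. Hypothetical type name.
type Option func(*config)

// ASCII, Digits and Kana mirror the option names listed in this README.
func ASCII(b bool) Option  { return func(c *config) { c.ascii = b } }
func Digits(b bool) Option { return func(c *config) { c.digits = b } }
func Kana(b bool) Option   { return func(c *config) { c.kana = b } }

// hanToZenSketch widens ASCII and digits only; kana handling is left out.
func hanToZenSketch(s string, opts ...Option) string {
	// Everything is enabled unless an option says otherwise.
	cfg := config{ascii: true, digits: true, kana: true}
	for _, opt := range opts {
		opt(&cfg)
	}
	var sb strings.Builder
	for _, r := range s {
		switch {
		case r >= '0' && r <= '9':
			if cfg.digits {
				r += 0xFEE0 // '0' (U+0030) becomes '０' (U+FF10)
			}
		case r >= '!' && r <= '~':
			if cfg.ascii {
				r += 0xFEE0 // 'a' (U+0061) becomes 'ａ' (U+FF41)
			}
		}
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	fmt.Println(hanToZenSketch("abc123"))                // ａｂｃ１２３
	fmt.Println(hanToZenSketch("abc123", Digits(false))) // ａｂｃ123
}
```

Because the defaults are set before the options are applied, callers only need to pass the switches they want to turn off.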

Logic is implemented as of commit aca2661.

Examples

HanToZen

fmt.Println(HanToZen("ﾆｭｰｼﾞｰﾗﾝﾄﾞ"))
fmt.Println(HanToZen("ﾆｭｰｼﾞｰﾗﾝﾄﾞ Auckland 6012", ASCII(true), Digits(false), Kana(false)))

// Output:
// ニュージーランド
// ﾆｭｰｼﾞｰﾗﾝﾄﾞ　Ａｕｃｋｌａｎｄ　6012

ZenToHan

fmt.Println(ZenToHan("ニュージーランド"))
fmt.Println(ZenToHan("ニュージーランド　Ａｕｃｋｌａｎｄ　０１２３", Kana(false), Digits(true)))

// Output:
// ﾆｭｰｼﾞｰﾗﾝﾄﾞ
// ニュージーランド Auckland 0123

Benchmark

Original library etc.

Original mojimoji, zenhan and unicodedata on my system, for comparison:

In [4]: s = u'ABCDEFG012345' * 10

In [5]: %time for n in range(1000000): mojimoji.zen_to_han(s)
CPU times: user 3.24 s, sys: 1.28 ms, total: 3.24 s
Wall time: 3.24 s

In [6]: %time for n in range(1000000): zenhan.z2h(s)
CPU times: user 26.2 s, sys: 16.3 ms, total: 26.2 s
Wall time: 26.2 s

In [7]: %time for n in range(1000000): unicodedata.normalize('NFKC', s)
CPU times: user 3.12 s, sys: 15.4 ms, total: 3.13 s
Wall time: 3.14 s

This library

ZenToHan and HanToZen use different approaches:

  • ZenToHan uses strings.Builder, which is simpler to implement.
  • HanToZen uses direct slice operations to allow for seeking when needed.
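To illustrate why the two directions might call for different approaches, here is a minimal sketch under stated assumptions; it is not the library's code. `narrowASCII` and `widenKana` are hypothetical names, and the kana tables cover only four characters. Narrowing is a one-rune-in, one-rune-out mapping, so a `strings.Builder` is enough. Widening half-width kana has to look at the next rune, because a base character followed by the voiced sound mark ﾞ (U+FF9E) collapses into a single full-width character; indexing into a rune slice makes that look-ahead easy.

```go
package main

import (
	"fmt"
	"strings"
)

// narrowASCII maps full-width ASCII (U+FF01..U+FF5E) to half-width,
// one rune at a time, so a strings.Builder is sufficient.
func narrowASCII(s string) string {
	var sb strings.Builder
	sb.Grow(len(s))
	for _, r := range s {
		if r >= 0xFF01 && r <= 0xFF5E {
			r -= 0xFEE0
		}
		sb.WriteRune(r)
	}
	return sb.String()
}

// Tiny illustrative tables; a real converter needs the full kana set.
var plain = map[rune]rune{'ｶ': 'カ', 'ｼ': 'シ', 'ﾄ': 'ト', 'ﾊ': 'ハ'}
var voiced = map[rune]rune{'ｶ': 'ガ', 'ｼ': 'ジ', 'ﾄ': 'ド', 'ﾊ': 'バ'}

const dakuten = 'ﾞ' // U+FF9E, half-width voiced sound mark

// widenKana works on a rune slice so it can peek at in[i+1] and
// skip the sound mark once it has been merged into the base character.
func widenKana(s string) string {
	in := []rune(s)
	out := make([]rune, 0, len(in))
	for i := 0; i < len(in); i++ {
		r := in[i]
		if i+1 < len(in) && in[i+1] == dakuten {
			if v, ok := voiced[r]; ok {
				out = append(out, v)
				i++ // consume the sound mark as well
				continue
			}
		}
		if p, ok := plain[r]; ok {
			r = p
		}
		out = append(out, r)
	}
	return string(out)
}

func main() {
	fmt.Println(narrowASCII("ＡＢＣ０１２")) // ABC012
	fmt.Println(widenKana("ｼﾞﾄﾞ"))     // ジド
}
```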

ZenToHan:

mojimoji (master)> go test -bench=BenchmarkZenToHanConv
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkZenToHanConv-16               1        2880823810 ns/op
--- BENCH: BenchmarkZenToHanConv-16
    mojimoji_test.go:98: 2.88079814s
PASS
ok      github.com/rusq/gomojimoji      2.977s

HanToZen:

mojimoji (master)> go test -bench=BenchmarkHanToZen    
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkHanToZenConv-16               1        2712209539 ns/op
--- BENCH: BenchmarkHanToZenConv-16
    mojimoji_test.go:107: 2.712166151s
PASS
ok      github.com/rusq/gomojimoji      2.804s
