jaconv's Introduction

jaconv

jaconv (Japanese Converter) is an interconverter for Hiragana, Katakana, Hankaku (half-width characters), and Zenkaku (full-width characters).

Japanese README is available.

INSTALLATION

$ pip install jaconv

USAGE

See also the documentation.

import jaconv

# Hiragana to Katakana
jaconv.hira2kata('ともえまみ')
# => 'トモエマミ'

# Hiragana to half-width Katakana
jaconv.hira2hkata('ともえまみ')
# => 'ﾄﾓｴﾏﾐ'

# Katakana to Hiragana
jaconv.kata2hira('巴マミ')
# => '巴まみ'

# half-width character to full-width character
# the default parameters are: kana=True, ascii=False, digit=False
jaconv.h2z('ﾃｨﾛ･ﾌｨﾅｰﾚ')
# => 'ティロ・フィナーレ'

# half-width character to full-width character
# but only ascii characters
jaconv.h2z('abc', kana=False, ascii=True, digit=False)
# => 'ａｂｃ'

# half-width character to full-width character
# but only digit characters
jaconv.h2z('123', kana=False, ascii=False, digit=True)
# => '１２３'

# half-width character to full-width character
# except half-width Katakana
jaconv.h2z('ｱabc123', kana=False, digit=True, ascii=True)
# => 'ｱａｂｃ１２３'

# an alias of h2z
jaconv.hankaku2zenkaku('ﾃｨﾛ･ﾌｨﾅｰﾚabc123')
# => 'ティロ・フィナーレabc123'

# full-width character to half-width character
# the default parameters are: kana=True, ascii=False, digit=False
jaconv.z2h('ティロ・フィナーレ')
# => 'ﾃｨﾛ･ﾌｨﾅｰﾚ'

# full-width character to half-width character
# but only ascii characters
jaconv.z2h('ａｂｃ', kana=False, ascii=True, digit=False)
# => 'abc'

# full-width character to half-width character
# but only digit characters
jaconv.z2h('１２３', kana=False, ascii=False, digit=True)
# => '123'

# full-width character to half-width character
# except full-width Katakana
jaconv.z2h('アａｂｃ１２３', kana=False, digit=True, ascii=True)
# => 'アabc123'

# an alias of z2h
jaconv.zenkaku2hankaku('ティロ・フィナーレａｂｃ１２３')
# => 'ﾃｨﾛ･ﾌｨﾅｰﾚａｂｃ１２３'

# normalize
jaconv.normalize('ティロ・フィナ〜レ', 'NFKC')
# => 'ティロ・フィナーレ'

# Hiragana to alphabet
jaconv.kana2alphabet('じゃぱん')
# => 'japan'

# Alphabet to Hiragana
jaconv.alphabet2kana('japan')
# => 'じゃぱん'

# Katakana to Alphabet
jaconv.kata2alphabet('ケツイ')
# => 'ketsui'

# Alphabet to Katakana
jaconv.alphabet2kata('namba')
# => 'ナンバ'

# Hiragana to Julius's phoneme format
jaconv.hiragana2julius('てんきすごくいいいいいい')
# => 't e N k i s u g o k u i:'

NOTE

The jaconv.normalize method extends unicodedata.normalize for Japanese language processing with the following additional replacements:

'〜' => 'ー'
'～' => 'ー'
"’" => "'"
'”' => '"'
'“' => '``'
'―' => '-'
'‐' => '-'
'˗' => '-'
'֊' => '-'
'‐' => '-'
'‑' => '-'
'‒' => '-'
'–' => '-'
'⁃' => '-'
'⁻' => '-'
'₋' => '-'
'−' => '-'
'﹣' => 'ー'
'－' => 'ー'
'—' => 'ー'
'―' => 'ー'
'━' => 'ー'
'─' => 'ー'

jaconv's People

Contributors

cuddlemuffin007, frog42, furukawatakumi, ikegami-yukino, kokimame, ksato9700, kyamada-exwzd-xware, letuananh, manjuu-eater, shiumachi

jaconv's Issues

kana2alphabet bug っ(xtsu)

I found a bug in your code when I tried to convert a sentence that ends with 'っ' (examples: 'あっ', 'ぐっ').
IMO the cause is clear: the list variable 'text' must be converted to a str before 'xtsu' is appended.

In [2]: jaconv.kana2alphabet('あっ')


TypeError Traceback (most recent call last)
in ()
----> 1 jaconv.kana2alphabet('あっ')

C:\ProgramData\Anaconda3\lib\site-packages\jaconv\jaconv.py in kana2alphabet(text)
211 tsu_pos = text.index('っ')
212 if len(text) <= tsu_pos + 1:
--> 213 return text[:-1] + 'xtsu'
214 text[tsu_pos] = text[tsu_pos + 1]
215 text = ''.join(text)

TypeError: can only concatenate list (not "str") to list
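
The failure can be reproduced without jaconv: slicing a list yields a list, and a list cannot be concatenated with a str. The following is a minimal sketch of the kind of fix suggested above, assuming the input has already been turned into romaji with 'っ' left in place. The function name `double_consonants_sketch` is hypothetical and this is not jaconv's actual code.

```python
def double_consonants_sketch(text):
    """Hypothetical stand-in for the 'っ' handling shown in the traceback."""
    chars = list(text)
    for pos, ch in enumerate(chars):
        if ch != 'っ':
            continue
        if pos + 1 >= len(chars):
            # Trailing 'っ': join the list back into a str before appending,
            # which avoids "can only concatenate list (not "str") to list".
            return ''.join(chars[:-1]) + 'xtsu'
        # Otherwise double the following consonant, e.g. 'kiっte' -> 'kitte'.
        chars[pos] = chars[pos + 1]
    return ''.join(chars)


print(double_consonants_sketch('aっ'))
print(double_consonants_sketch('kiっte'))
```

Joining before concatenating keeps the return type a str on every code path.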

README.rst and CHANGES.rst are installed in wrong location

OS: Arch Linux
jaconv version: 0.3.4

  • When installing jaconv as a system package, README.rst and CHANGES.rst get installed to /usr/README.rst and /usr/CHANGES.rst.
  • When installing locally for the current user, the files get installed to $HOME/.local/README.rst and $HOME/.local/CHANGES.rst.

This happens with both python setup.py install and pip install.

The issue seems to be the data_files argument to setup in setup.py. This keyword is deprecated according to the documentation and removing it fixes the problem.

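Supports <= 3.4 but uses the typing module, new in version 3.5

Python 3.4 and earlier appear to be supported, but the typing module seems to have been added in Python 3.5.
https://docs.python.org/ja/3/library/typing.html

I was not able to verify this in a Python 3.4 environment, but at the very least the type-hint notation is an invalid syntax in Python 2.7 (confirmed on Wandbox).

Won't the type hints in enlargesmallkana() get in the way in <= 3.4 environments?

I'm sorry that I am not entirely sure about this. Thank you in advance.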
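
One possible way around this, offered only as a sketch, is to use PEP 484 type comments: they are ordinary comments, so they need neither the typing module nor annotation syntax. The function name `enlarge_small_kana_sketch` and the partial mapping below are hypothetical and are not taken from jaconv.

```python
# Hypothetical, partial mapping used only for illustration.
SMALL_TO_LARGE = {'ぁ': 'あ', 'ぃ': 'い', 'ぅ': 'う', 'ぇ': 'え', 'ぉ': 'お'}


def enlarge_small_kana_sketch(text):
    # type: (str) -> str
    # The type comment above is read by type checkers but ignored by the
    # interpreter, so no "from typing import ..." is required.
    return ''.join(SMALL_TO_LARGE.get(ch, ch) for ch in text)


print(enlarge_small_kana_sketch('ふぁいる'))
```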

kwarg `ignore` was not used in `jaconv.normalize`

Hi!
Pretty lucky to get to use this convenient tool for handling Japanese script.
The keyword argument ignore comes in handy in circumstances where you want some of the characters to stay unchanged. I just found that ignore is not used in jaconv.normalize. It'd be an easy fix though.
Thank you!
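
As an illustration of what honouring ignore could look like, here is a hedged sketch that splits the text into runs of ignored and non-ignored characters and normalizes only the latter. The name `normalize_with_ignore` is hypothetical, only the '〜' replacement from the NOTE section is reproduced, and jaconv's real implementation may differ.

```python
import unicodedata
from itertools import groupby


def normalize_with_ignore(text, mode='NFKC', ignore=''):
    """Hypothetical sketch: normalize text but leave characters in `ignore` as-is."""
    pieces = []
    # Group consecutive characters by whether they are in the ignore set,
    # so that multi-character sequences are still normalized together.
    for ignored, run in groupby(text, key=lambda ch: ch in ignore):
        chunk = ''.join(run)
        if not ignored:
            chunk = unicodedata.normalize(mode, chunk.replace('〜', 'ー'))
        pieces.append(chunk)
    return ''.join(pieces)


print(normalize_with_ignore('ｱ〜１'))
print(normalize_with_ignore('ｱ〜１', ignore='１'))
```

Normalizing whole runs rather than single characters matters for inputs such as half-width kana followed by a voicing mark, which NFKC composes into one character.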

alphabet2kana bug

  1. su is converted before tsu leading to words like 'atsui' being converted to 「あっすい」
  2. Trailing oh is converted too soon leaving words like 'itoh' as 「いっおお」
  3. Remaining oh replacements should be done after other replacements, current placement causes words like toho to become 「っおおお」
  4. singular m remains unconverted and becomes 「っ」, e.g., namba => なっば
  5. dzu is not converted, e.g., tsudzuku => っすっずく
>>> from jaconv import alphabet2kana as a2k
>>> a2k('tsudzuku')
'っすっずく'
>>> a2k('namba')
'なっば'
>>> a2k('itoh')
'いっおお'
>>> a2k('toho')
'っおおお'
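
Several of the points above come down to replacement order. One common way to avoid such ordering problems is greedy longest-match conversion, sketched below under stated assumptions: the table is a tiny hypothetical subset chosen to cover the examples, the function name is made up, and this is not how jaconv itself is implemented. Trailing 'oh' (as in 'itoh') is not handled here.

```python
# Hypothetical, deliberately tiny table; a real one would cover all syllables.
ROMAJI_TO_HIRAGANA = {
    'a': 'あ', 'i': 'い', 'u': 'う', 'e': 'え', 'o': 'お',
    'ku': 'く', 'su': 'す', 'tsu': 'つ', 'dzu': 'づ', 'to': 'と',
    'na': 'な', 'ba': 'ば', 'ho': 'ほ', 'ma': 'ま',
    'n': 'ん', 'm': 'ん',  # a lone 'm' (as in 'namba') is treated as 'ん'
}
MAX_KEY_LEN = max(len(key) for key in ROMAJI_TO_HIRAGANA)


def alphabet2kana_sketch(text):
    """Convert romaji to hiragana, always trying the longest key first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += size
                break
        else:
            # Unknown character: pass it through unchanged.
            out.append(text[i])
            i += 1
    return ''.join(out)


for word in ('atsui', 'tsudzuku', 'namba', 'toho'):
    print(word, alphabet2kana_sketch(word))
```

Because 'tsu' is tried before 'su' at each position, 'atsui' is no longer split into 't' + 'su'.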

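Samples in the Japanese README

Note: I apologize for writing in Japanese. Writing this in English felt a bit like a punishment game, and I would probably have ended up letting it slide, so please forgive me for filing the issue this way.

Isn't the Japanese in the following part wrong?

# ASCII以外の半角文字 to 全角文字 (literally: "half-width characters other than ASCII to full-width characters")
jaconv.h2z(u'abc', ascii=True)
# => u'ａｂｃ'

# 数字以外の半角文字 to 全角文字 (literally: "half-width characters other than digits to full-width characters")
jaconv.h2z(u'123', digit=True)
# => u'１２３'

Also, from a user's point of view, given that there are cases where the operation is applied to mixed strings, I don't think simply rewording "other than ASCII" as "ASCII only" is good either.

That is because kana defaults to True, so with the current logic the result is as follows, and kana is always converted.

>>> import jaconv
>>> jaconv.h2z(u'123abcﾃｨﾛ･ﾌｨﾅｰﾚ', digit=True)
'１２３abcティロ・フィナーレ'
>>> jaconv.h2z(u'123abcﾃｨﾛ･ﾌｨﾅｰﾚ', ascii=True)
'123ａｂｃティロ・フィナーレ'

I can't tell whether this calls for fixing the English comments or fixing the logic, but I would appreciate it if you could consider both together.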
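
To make the flag interaction concrete, here is a self-contained sketch of independent kana, ascii, and digit switches that reproduces the behaviour reported in this issue. The function `h2z_sketch` is hypothetical: it relies on NFKC for half-width kana and on the fixed 0xFEE0 offset between ASCII and the full-width forms, which is an assumption about one possible approach, not a description of jaconv's internals.

```python
import re
import unicodedata

# Runs of half-width katakana and related punctuation (U+FF61 to U+FF9F).
HALF_KANA_RUN = re.compile('[\uFF61-\uFF9F]+')
FULLWIDTH_OFFSET = 0xFEE0  # distance from '!' (U+0021) to its full-width form (U+FF01)


def h2z_sketch(text, kana=True, ascii=False, digit=False):
    """Hypothetical sketch: each flag switches one character class on or off."""
    if kana:
        # Normalize whole runs so that a kana plus voicing mark composes.
        text = HALF_KANA_RUN.sub(
            lambda m: unicodedata.normalize('NFKC', m.group()), text)
    out = []
    for ch in text:
        is_digit = '0' <= ch <= '9'
        is_ascii = '!' <= ch <= '~' and not is_digit
        if (digit and is_digit) or (ascii and is_ascii):
            ch = chr(ord(ch) + FULLWIDTH_OFFSET)
        out.append(ch)
    return ''.join(out)


print(h2z_sketch('123abcﾃｨﾛ･ﾌｨﾅｰﾚ', digit=True))
print(h2z_sketch('123abcﾃｨﾛ･ﾌｨﾅｰﾚ', kana=False, digit=True))
```

Under this reading, converting only digits in a mixed string requires passing kana=False explicitly, because kana is on by default.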

Tag the source

Could you please tag the source again? This allows distributions to get the complete source from GitHub if they want.

This was done in the past but not for 0.3.3.

Thanks

Support Small/Large Conversion

This is mostly useful for OCR purposes, but being able to change っ to つ, ぃ to い, etc. would be helpful when standardizing texts for search.
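
A minimal sketch of the requested conversion, assuming a plain character-for-character mapping built with str.maketrans. The function name `enlarge_kana_sketch` and the exact set of characters covered are assumptions for illustration, not an existing jaconv API.

```python
# Small kana and their full-size counterparts, in matching order.
SMALL_KANA = 'ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ'
LARGE_KANA = 'あいうえおつやゆよわアイウエオツヤユヨワ'
SMALL_TO_LARGE_TABLE = str.maketrans(SMALL_KANA, LARGE_KANA)


def enlarge_kana_sketch(text):
    """Replace small kana with full-size kana; other characters are untouched."""
    return text.translate(SMALL_TO_LARGE_TABLE)


print(enlarge_kana_sketch('きゃっと'))
print(enlarge_kana_sketch('ティロ'))
```

The reverse direction is ambiguous (not every つ should become っ), which is why only small-to-large is sketched here.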
