Ting

Ting converts between various systems for writing Mandarin Chinese phonetically. It also handles several representations of tones, so it can be used, for example, to convert pinyin with tone numbers to pinyin with tone diacritics.

Hanyu Pinyin, Bopomofo, Wade-Giles, Tongyong Pinyin and International Phonetic Alphabet (IPA) are supported.

INSTALL

  • gem install ting

SYNOPSIS

To parse your strings, create a Reader object. Ting.reader() takes two parameters: the transliteration format and the way tones are represented.

To some extent these can be mixed and matched.

To generate pinyin, Wade-Giles, etc., create a Writer object with Ting.writer(), which takes the same two parameters.

Formats

  • :hanyu Hanyu Pinyin
  • :zhuyin Zhuyin Fuhao (a.k.a. Bopomofo)
  • :wadegiles Wade-Giles
  • :ipa International Phonetic Alphabet
  • :tongyong Tongyong Pinyin

Tones

  • :numbers Simply put a number after the syllable, easy to type
  • :accents Use diacritics, following the Hanyu Pinyin rules. The syllable needs at least one vowel to carry the diacritic. Not usable with IPA or Bopomofo
  • :supernum Superscript numerals, typically used for Wade-Giles
  • :marks Tone mark after the syllable, typically used for Bopomofo
  • :ipa IPA tone marks
  • :no_tones Use no tones

Examples

Parse Hanyu Pinyin

   require 'ting'

   reader = Ting.reader(:hanyu, :numbers)
   reader.( "wo3 ai4 ni3" )
    # => [<Ting::Syllable <initial=Empty, final=Uo, tone=3>>,
    #     <Ting::Syllable <initial=Empty, final=Ai, tone=4>>,
    #     <Ting::Syllable <initial=Ne, final=I, tone=3>>]

Generate Bopomofo

   zhuyin = Ting.writer(:zhuyin, :marks)
   zhuyin.(reader.("wo3 ai4 ni3"))
   # => "ㄨㄛˇ ㄞˋ ㄋㄧˇ"

Generate Wade-Giles

   wadegiles = Ting.writer(:wadegiles, :supernum)
   wadegiles.(reader.("qing2 kuang4 ru2 he2"))
   # => "ch`ing² k`uang⁴ ju² ho²"

Generate IPA

   ipa = Ting.writer(:ipa, :ipa)
   ipa.(reader.("you3 peng2 zi4 yuan2 fang1 lai2"))
   # => "iou˧˩˧ pʰeŋ˧˥ ts˥˩ yɛn˧˥ faŋ˥˥ lai˧˥"

Adding diacritics to numbered pinyin is such a common use case that a convenience method exists for it.

   Ting.pretty_tones "wo3 ai4 ni3"
   # => "wǒ ài nǐ"

Note that syllables need to be separated by spaces; feeding "peng2you3" to the parser does not work. The Ting.pretty_tones(string) method handles such input a bit more gracefully.

If you need to parse input that does not conform, consider using a regexp to scan for valid syllables, then feed the syllables to the parser one by one. Have a look at Ting.pretty_tones for an example of how to do this. Note that it does not support special cases like erhua (wanr2 = wan2 er) or non-standard Pinyin syllables like 嗯/"ń" or 呣/"ḿ" (which appear in the official Unicode data and some textbooks).
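
The pre-splitting step described above could look roughly like the following. This is a minimal sketch, not Ting's own code: split_numbered_pinyin is a hypothetical helper, and it assumes that every syllable ends in a tone digit from 1 to 5. It does not check that a token is a valid syllable, and it does nothing special for erhua.

```ruby
# Hypothetical helper, not part of Ting: cut run-together numbered pinyin
# into one token per syllable by ending a token at every tone digit.
# Assumption: every syllable carries a tone digit 1-5.
SYLLABLE_WITH_TONE = /[a-zü]+[1-5]/i

def split_numbered_pinyin(text)
  text.scan(SYLLABLE_WITH_TONE)
end

split_numbered_pinyin("peng2you3")
# => ["peng2", "you3"]

split_numbered_pinyin("wo3ai4ni3").join(" ")
# => "wo3 ai4 ni3"
```

The space-joined result is in the shape the reader examples above expect. Syllables written without a tone digit are silently dropped by this pattern, so real input may need extra handling.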

ting_table

The ting_table script will spit out a CSV table of all syllables and formats Ting knows about. Useful if you want to do the conversion in other programming languages.

REQUIREMENTS

  • none, Ting uses nothing but Ruby

LICENSE

Copyright (c) 2007-2017, Arne Brasseur. (http://www.arnebrasseur.net)

Available as Free Software under the GPLv3 License, see LICENSE.txt for details

ting's Issues

How to read or write ǜ sound

Thank you for this nice work.
I was wondering if there is any way to deal with the "u:" sound in pinyin?
I tried things like "lu:4" or "lv4", but they do not work.

Pinyin tone accents to numbers?

I have gotten numbers -> accents to work, but does this gem go the reverse way as well?
I tried:

reader = Ting.reader(:hanyu, :accents)
reader.parse("sànbù")
Ting::ParseError: Parsing of "sànbù" failed : Can't parse `"sanbù"'
    from /Users/michael/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/ting-0.9.0/lib/ting/reader.rb:21:in `rescue in parse'
    from /Users/michael/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/ting-0.9.0/lib/ting/reader.rb:11:in `parse'
    from (irb):6

Edit:
Oh I see, it requires the pinyin to be space delimited. Is it possible to parse pinyin that has no spaces in it?

HanyuPinyinParser is too greedy

The HanyuPinyinParser class greedily matches syllables, preferring long syllables over short ones. This does not work for words like 堅固 "jiāngù", where the greedy match "jiang" leaves a remainder ("u") that is not a valid syllable:

Ting::HanyuPinyinParser.new.parse("jiangu")
# => [<Ting::Syllable <initial=Ji, final=Iang, tone=5>>]

This also affects words like 珍貴 zhēnguì, 蝸牛 guāniú, 比擬 bǐnǐ, 膽固醇 dǎngùchún, 專櫃 zhuānguì etc.

The parser should realize that the remainder of the word is not a valid syllable, and...backtrack? 😱 It could be as simple as changing the regex, but then the pinyin would probably have to be further sanitized before attempting to parse it.
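
One way to picture the backtracking idea is the sketch below. It is an illustration under stated assumptions, not a patch for HanyuPinyinParser: segment is a hypothetical function, SYLLABLES is a tiny hand-picked subset of toneless syllables chosen to cover the words above, and a real implementation would need the full syllable inventory plus tone handling.

```ruby
# Tiny illustrative syllable set (assumption); a real parser would use the
# complete inventory of Hanyu Pinyin syllables.
SYLLABLES = %w[
  jian jiang gu zhen zheng gui gua guan niu
  bi bin ni dan dang chun zhuan zhuang
].freeze

# Hypothetical helper: try the longest candidate syllable first, but fall
# back to shorter ones when the remainder cannot be segmented.
# Returns an array of syllables, or nil if no segmentation exists.
def segment(text, syllables = SYLLABLES)
  return [] if text.empty?

  longest = syllables.map(&:length).max
  [text.length, longest].min.downto(1) do |len|
    head = text[0, len]
    next unless syllables.include?(head)

    rest = segment(text[len..-1], syllables)
    return [head] + rest if rest # remainder parsed, so accept this split
  end
  nil # no candidate led to a full segmentation
end

segment("jiangu")    # => ["jian", "gu"]
segment("zhengui")   # => ["zhen", "gui"]
segment("danguchun") # => ["dan", "gu", "chun"]
```

Longest-first with fallback keeps the current behaviour wherever the greedy split already works. Input that can be split in more than one valid way still resolves to the longest first syllable, which is why Hanyu Pinyin uses an apostrophe to disambiguate such cases.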
