Code Monkey home page Code Monkey logo

sanmoku's Introduction

[名前]
・Sanmoku


[バージョン]
・0.0.6


[概要]
・Gomoku( https://github.com/sile/gomoku )から派生した形態素解析器
・Gomokuに比べて実行時に必要なメモリ量が少ないのが特徴 (数分の一程度)
・形態素解析器としての特徴は、おおむねGomokuと同様


[形態素解析器]
形態素解析を行う(Java)。

= ビルド方法
 # Sanmoku本体
 $ cd analyzer
 $ ant
 $ ls sanmoku-0.0.5.jar

 # FeatureEx(拡張素性データ用)
 $ cd feature-ex
 $ ant
 $ ls sanmoku-feature-ex-0.0.1.jar 

= 形態素解析コマンド
 # 形態素解析
 $ java -cp sanmoku-0.0.5.jar net.reduls.sanmoku.bin.Sanmoku < 解析対象テキスト

 # 分かち書き
 $ java -cp sanmoku-0.0.5.jar net.reduls.sanmoku.bin.Sanmoku -wakati < 解析対象テキスト

 # 原型、読み、発音情報付きの形態素解析
 $ java -cp sanmoku-0.0.5.jar:sanmoku-feature-ex-0.0.1.jar net.reduls.sanmoku.bin.SanmokuFeatureEx < 解析対象テキスト


= java API
package net.reduls.sanmoku;

class Tagger {
  public static List<Morpheme> parse(String text);
  public static List<String> wakati(String text);
}

class Morpheme {
  public final String surface;  // 形態素表層形
  public final String feature;  // 形態素素性 (== 品詞)
  public final int start;       // 入力テキスト内での形態素の出現開始位置
}

// 拡張素性(原型、読み、発音)用のクラス
// - sanmoku-feature-ex-0.0.1.jar
class FeatureEx { 
  public FeatureEx(Morpheme m);      // 形態素インスタンスから素性インスタンスを構築
  public final String baseform;      // 原型: 対応する情報がない場合は空文字列を返す (以下同) 
  public final String reading;       // 読み:
  public final String pronunciation; // 発音:
}


[辞書構築方法メモ]
# Sanmoku本体辞書
$ cd dicbuilder
$ sbcl --script make-build-dic-command.lisp
$ ./sanmoku-build-dic /path/to/mecab-ipadic-2.7.0-20070801/ dic/
$ sbcl --script reduce.lisp dic/
$ cp dic/* ../analyzer/src/net/reduls/sanmoku/dicdata/
$ cd ../analyzer
$ ant

# 拡張素性辞書 (原型、読み、発音)
$ cd dicbuilder
$ sbcl --script gen-feature-ex-files.lisp /path/to/mecab-ipadic-2.7.0-20070801 dic-ex/
$ cp dic-ex/ ../feature-ex/src/net/reduls/sanmoku/dicdata/
$ cd ../feature-ex
$ ant



[ライセンス]
・Sanmoku本体(ソースファイル及びjarファイル)
 ・MITライセンス: 詳細はCOPYINGファイルを参照

・Sanmokuに含まれるIPADICバイナリデータ
 ・IPADICのライセンスに準拠: 詳細はCOPYING.ipadicを参照


[TODO]
・諸々整理

sanmoku's People

Contributors

sile avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.