ko-ichi-h / khcoder

KH Coder: for Quantitative Content Analysis or Text Mining

Home Page: http://khcoder.net/en

License: GNU General Public License v2.0

Perl 94.02% JavaScript 3.90% Batchfile 0.09% R 1.98% Scheme 0.01%

content-analysis text-mining visualization kwic corpus

khcoder's Issues

Chinese translation of interface messages

Description

KH Coder has Chinese interface but some messages have not been translated into Chinese yet.

How to Contribute

Fork and Clone this repository
Open "config/msg.cn" with your favorite text editor & search for ***not translated***.
Add Chinese translation

There are lines like this:

max: '***not translated*** Max // 最大'

This is an example of a message that is not translated into Chinese. The meaning of this line is:

message-id: '***not translated*** English message // Japanese message'

Please edit the line like this to add Chinese message:

message-id: 'Chinese message'

Test the message file "config/msg.cn" you edited.
1. Download the latest release of KH Coder, unzip it, and run it.
2. Select "Chinese" as the Interface Language.
3. Shut down KH Coder.
4. Overwrite "config/msg.cn" with your edited version.
5. Run KH Coder to see your translated messages on its screens.
Commit, push and send a pull request.
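The search for ***not translated*** can also be scripted. The sketch below assumes only what the example line above shows: one "message-id: '...'" entry per line, with untranslated values starting with ***not translated*** and the English and Japanese texts separated by "//". The UTF-8 file encoding and the helper name "untranslated" are assumptions, not part of KH Coder.

```python
import re
from pathlib import Path

MARKER = "***not translated***"
# One entry per line:  message-id: 'value'
ENTRY = re.compile(r"^\s*([^:\s]+):\s*'(.*)'\s*$")


def untranslated(lines):
    """Yield (line_no, message_id, english, japanese) for untranslated entries."""
    for line_no, line in enumerate(lines, start=1):
        match = ENTRY.match(line)
        if not match or not match.group(2).startswith(MARKER):
            continue
        message_id, value = match.groups()
        # Remaining value: " English message // Japanese message"
        english, _, japanese = value[len(MARKER):].partition("//")
        yield line_no, message_id, english.strip(), japanese.strip()


if __name__ == "__main__":
    path = Path("config/msg.cn")  # run from the KH Coder folder
    if path.exists():
        # ASSUMPTION: the message file is UTF-8 encoded
        for entry in untranslated(path.read_text(encoding="utf-8").splitlines()):
            print(*entry, sep="\t")
```

Each printed row gives the line number to edit, so you can work through the file from top to bottom.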

Note

KH Coder is available under the terms of the GNU GPL v2 or later. Contributed messages will be a part of KH Coder and the license of KH Coder will be applied. Your contribution will be documented in the "Contributors" page.

Plotting more than two dimensions with correspondence analysis

Hello,
is it possible to plot a third and fourth dimension using correspondence analysis? KH Coder does a great job of providing two-dimensional plots with correspondence analysis. In my specific case, a relevant share of the variation is explained by more than two dimensions. To give an example:

     Dimension   Cor^2  Explained
[1,]         1  0.0122      29.71
[2,]         2  0.0104      25.33
[3,]         3  0.0080      19.52
[4,]         4  0.0053      12.91
[5,]         5  0.0028       6.91
[6,]         6  0.0023       5.62
[1] "iterations: 75"

The first two dimensions already explain 55.04 percent of the variation. To account for more of it, it would be helpful to have the third and fourth dimensions plotted as well. Is it feasible for KH Coder to plot more than two dimensions in correspondence analysis? Of course, a plot is restricted to three dimensions, but I wonder whether it is possible to visualize a third and fourth dimension. Alternatively, would it be possible to implement a download option for a table (e.g., CSV) with all numeric values of the profiles?

Best regards
Axel
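The figures in the post can be checked by accumulating the "Explained" column. This only reuses the numbers printed above and does not call KH Coder.

```python
from itertools import accumulate

# "Explained" column (%) for dimensions 1-6, copied from the output above
explained = [29.71, 25.33, 19.52, 12.91, 6.91, 5.62]

# Running total, rounded to two decimals like the R output
cumulative = [round(total, 2) for total in accumulate(explained)]

for dim, (pct, cum) in enumerate(zip(explained, cumulative), start=1):
    print(f"Dimension {dim}: {pct:5.2f}%  (cumulative {cum:6.2f}%)")
```

Adding the third and fourth dimensions raises the cumulative share from 55.04% to 87.47%.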

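How to calculate the MVR (Modifier Verb Ratio) and the noun ratio

In this issue, I will look at how to calculate the MVR (Modifier Verb Ratio), which I was asked about elsewhere some time ago, as well as the noun ratio.

The definitions of MVR and the noun ratio are given on this page: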
http://langstat.hatenablog.com/entry/20140913/1410534000
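A minimal sketch of the two ratios follows. It assumes the commonly cited definitions: MVR = 100 × modifiers / verbs, where modifiers are adjectives, adjectival nouns, adverbs, and adnominals, and noun ratio = nouns / all tokens. These POS groupings are an assumption and should be checked against the linked page. KH Coder's own POS labels may also need to be mapped onto them first.

```python
from collections import Counter

# ASSUMPTION: POS groups for the MVR; verify them against the linked definition
MODIFIERS = {"形容詞", "形容動詞", "副詞", "連体詞"}  # adjectives, adjectival nouns, adverbs, adnominals
VERBS = {"動詞"}
NOUNS = {"名詞"}


def mvr_and_noun_ratio(pos_tags):
    """Return (MVR, noun ratio) from a list with one POS tag per token."""
    counts = Counter(pos_tags)
    modifiers = sum(counts[tag] for tag in MODIFIERS)
    verbs = sum(counts[tag] for tag in VERBS)
    nouns = sum(counts[tag] for tag in NOUNS)
    total = sum(counts.values())
    mvr = 100 * modifiers / verbs if verbs else float("nan")
    noun_ratio = nouns / total if total else float("nan")
    return mvr, noun_ratio
```

The input would be the POS tag of every token in a document, for example taken from an exported word list with frequencies.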

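About document cluster analysis

Professor Higuchi,

My name is Kuroda, and I am using KH Coder in my research.
I have several questions about document cluster analysis, so I am posting them here.
Thank you in advance.

(1) Is it correct to interpret document cluster analysis as "non-hierarchical cluster analysis"?
I understand that cluster analysis can be divided into hierarchical and non-hierarchical methods. Since the number of clusters can be set freely in document cluster analysis, may I assume that it is non-hierarchical?
If so, should I interpret adjacent clusters, such as cluster 1 and cluster 2, as having no particular relationship to each other? Or, as in hierarchical cluster analysis, are adjacent clusters similar to each other?

(2) About the number of clusters
I am analyzing editorials of the Asahi Shimbun and the Yomiuri Shimbun from the postwar period onward (342 from the Asahi and 282 from the Yomiuri). Should I use the same minimum frequency for selecting words, and the same number of clusters, for both newspapers?
For the Asahi, 145 words and 9 clusters were the easiest to interpret. For the Yomiuri, 105 words and 8 clusters were the easiest.

(3) About the Jaccard distance
In document cluster analysis, taking the Asahi as an example, I understand that the documents containing the 145 selected words are analyzed, and similar documents are classified into the same cluster.
In this case, is the Jaccard coefficient for the characteristic words based on co-occurrence among those 145 words? Or is it based on co-occurrence between words and documents?
(I am asking because "実例クラスター分析" [Cluster Analysis by Example] is not available in our university library.)

I'm sorry to trouble you when you are busy, and thank you very much.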

OS

Windows 10

KH Coder version

Version 3.Alpha.14
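For question (3), the following shows how a Jaccard distance between documents is usually computed when documents are compared by the sets of selected words they contain. It is a generic sketch. It assumes that KH Coder's document clustering follows this usual definition, it is not taken from KH Coder's code, and the sample documents are made up.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of words (0.0 if both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Hypothetical documents, each reduced to the set of selected words it contains
docs = {
    "doc1": {"平和", "憲法", "国会"},
    "doc2": {"憲法", "国会", "選挙"},
    "doc3": {"経済", "景気"},
}

for (name_a, a), (name_b, b) in combinations(docs.items(), 2):
    print(name_a, name_b, round(jaccard_distance(a, b), 3))
```

Under this definition, the coefficient compares documents with each other, and the words only enter through the sets each document contains.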

console says "MySQL: FLUSH", and that's it for this target file

Hi Mr. Higuchi,

If I open my BIG target file with Language: German | FreeLing,

the console says: MySQL: FLUSH

(?) Can I ignore this and just run "Pre-Processing"?

But then KH Coder finds NO words to analyze.

Best regards, and many thanks for your advice. I think I am doing something wrong with your tool.

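Error when creating a new project (Failed to create a new DB! / InnoDB: Cannot open table...)

Thank you as always for your help.
Excuse me for asking a question.
When I create a new project, the following error occurs.
An earlier post said to delete KH Coder 3 from the drive and open it again from "Unzip", but even after looking it up I could not work out what "Unzip" refers to.
Does it mean downloading it again?
I am not very familiar with computers, so I apologize for the naive question.

■Error message
Error: Failed to create new DB! at /<Users/ /Desktop/khcoder3/x_mac64>mysql_exec.pm line 189.

■Did the problem also occur with the tutorial data (Soseki's "Kokoro")?
→Yes

■OS
→macOS 10.13.6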

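Outputting each document's (each respondent's) scores from a correspondence analysis

I was asked elsewhere whether there is a way to run a correspondence analysis and output the score of each document (each respondent).

Error caused by a bug

Normally you would select "Words × Documents" as the "Type of data table", run the correspondence analysis, and save the result in CSV format. However, at present (3.Alpha.15e), this operation causes an error because of a bug in KH Coder.

This error can be worked around with the following procedure using R. However, a fixed version has already been released, so simply updating KH Coder is probably easier. Details are in a separate comment further down this page.

In the upper right of the correspondence analysis options window, select "Documents × Words" and click "OK"

Save the result in R format and run it in R: follow the steps up to p. 5 of this slide

In R, run write.csv(c$rscore, file="c:/khcoder3/corresp.csv")

"Run" means pasting the command into the "R Console" window and pressing the Enter key. The scores are then saved as "corresp.csv" in the "khcoder3" folder on drive C. To see the contribution ratio of each component, run "txt".

Decrease in the number of cases

In the results, the number of documents (respondents) may be smaller than in the input data. This happens because documents (respondents) that contain none of the words selected for analysis are left out of the analysis. It becomes a serious problem if, for example, you want to merge the scores with data containing other variables to examine correlations. A remedy is described in a separate comment further down this page.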
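The re-alignment needed before merging can be sketched as follows. The sketch assumes that the first column of corresp.csv (the row names written by write.csv) holds document IDs that match the IDs in your other data. Check this against your own output first. The function name align_scores is hypothetical.

```python
import csv


def align_scores(all_ids, csv_lines):
    """Re-attach correspondence analysis scores to the full list of documents.

    csv_lines: lines of a file written by write.csv(); the first column is
    assumed (not verified) to hold the document ID and the rest the scores.
    Documents absent from the file get empty cells, so the row count always
    matches the original data and can be merged with other variables.
    """
    reader = csv.reader(csv_lines)
    header = next(reader)
    scores = {row[0]: row[1:] for row in reader}
    blank = [""] * (len(header) - 1)
    rows = [["id"] + header[1:]]
    for doc_id in all_ids:
        rows.append([doc_id] + scores.get(doc_id, blank))
    return rows
```

For example, pass open("c:/khcoder3/corresp.csv", newline="") as csv_lines and write the returned rows with csv.writer. The blank rows then mark the documents that contained none of the analyzed words.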

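Possible, but is coding still the better choice?

With the operations above, it is possible, after a fashion, to output and use the score of each document (respondent).

However, this approach is tedious, and there is a further problem. Because of the shape of the data, a correspondence analysis of a "Words × Documents" table yields a large number of components with low contribution ratios (a few percent each). You then have to make a (somewhat difficult) choice about which components to use in your analysis. Moreover, and this is only my own current view, I do not think that statistical methods and automatic processing alone can always extract well the various concepts, topics, or matters that the analyst wants to focus on.

For this reason, as the developer, I recommend coding rather than this method. Coding works like this: once you find interesting-looking components or concepts in a multivariate analysis such as correspondence analysis or a co-occurrence network, you manually specify the groups of words related to them. You specify the words in a text file, creating a "coding rule file". Look at "theme.txt", which comes with the tutorial, and create a file with similar content in a text editor such as "Sakura Editor", "Hidemaru", or "Notepad".

You can then use "cross tabulation" with other variables to see which kinds of respondents give answers containing those words. With "Export Coding Results", you can also output whether each response contains those words as 0/1 dummy variables. And if you choose "tf order" in the "Search Documents" window, you can read the actual responses that contain those words most often.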
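The 0/1 dummy output can be pictured with a small sketch. It assumes a simplified rule format modeled on KH Coder coding rule files: a line "*code_name" followed by a line of words joined by "|", meaning OR. Everything else that real coding rules support is ignored. The names parse_rules and code_documents are hypothetical, and this is not KH Coder's own implementation.

```python
def parse_rules(text):
    """Parse '*code' lines followed by 'word | word' lines (OR conditions only)."""
    rules, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            current = line[1:].strip()
            rules[current] = set()
        elif current is not None:
            rules[current].update(w.strip() for w in line.split("|") if w.strip())
    return rules


def code_documents(docs, rules):
    """Return {doc_id: {code: 0 or 1}}: 1 if the document has any word of the code."""
    return {
        doc_id: {code: int(bool(words & rule_words)) for code, rule_words in rules.items()}
        for doc_id, words in docs.items()
    }


rules = parse_rules("*friend\n友人 | 友達\n*school\n学校\n")
print(code_documents({"r1": {"友達", "学校"}, "r2": {"先生"}}, rules))
```

Each document gets one 0/1 column per code. Tables of this shape can then be cross-tabulated against other variables.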

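Alternatives to coding

If you would rather stick with statistical methods than use manual coding, consider methods called NMF or topic models. They may work better than correspondence analysis, which yields many components with low contribution ratios. In NMF and topic models, you can specify by hand the number of units that correspond to "components". You would need to look up the libraries or commands, but I think they can be run in R. (For the reasons above, though, I don't really recommend this...)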

Topic model

Hello,
Just a quick suggestion/question: will (or could) the next version of KH Coder include a topic model function (both "bag of words" and ordered words)?

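About the number of frequent words extracted for context vectors

Professor Higuchi,

Hello.
I am an undergraduate at Ritsumeikan University writing my graduation thesis on song-lyric analysis with KH Coder.

I am writing to ask about the matter in the title.

In your book 『社会調査のための計量テキスト分析』 (Quantitative Text Analysis for Social Research), pp. 53 and 68, the papers describe how context vectors are computed. As the condition for extracting the frequent words used to compute the vectors, p. 53 sets a range of "800 to 5,864 occurrences" and p. 68 sets "500 occurrences or more". On what basis were these lower limits of 800 and 500 chosen? (As for the upper limit, I understood well that words appearing more often than the number of samples are treated as too common.) Does the analyst set them independently, for example so that about 100 to 200 words are plotted?

I'm sorry to trouble you when you are busy, but I would be grateful for a reply.
Thank you very much.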

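Selecting multiple values when using external variables in the "Word Association" window

This is my first time posting.
I apologize for the naive question.

I want to select multiple values of a single variable under "Variables and Headings" and display the characteristic words as a co-occurrence network. How should I specify multiple values?
I want to specify two or more "<>variable_name-->value" conditions. For example, I want the case where variable A is either "いぬ" (dog) or "ねこ" (cat).

Sorry for the trouble, and thank you in advance.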

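Correspondence analysis of text divided by headings fails: use the same H level for headings that divide documents at the same level

Mr. Higuchi,
Nice to meet you. I am using KH Coder for my graduation thesis. I have tagged the text with h1 to h3 and am trying to run a correspondence analysis. However, I always get "Estimation or plotting with R failed: simple error in colnames..." and "attempt to set colnames on an object with less than two dimensions".
After that,
"evaluating a method for function plot: Error in c$cscore : object of type builtin is not subsettable"
appears three times, followed by
"can't call method "r_msg" without a package or object".

I reinstalled R, but it still does not work. What should I do?

The settings I used were as follows:
Minimum frequency: 350; words plotted: 155
Unit: H1
Cases (sentences): 109,929
Cases (paragraphs): 75,218
H3: 1
H2: 1
H1: 1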
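In the settings above, every heading level has only one case, including H1, the unit used for the analysis. With a single unit there is nothing to compare, which would fit the "less than two dimensions" error. A quick check, then, is to count the headings per level before running the analysis. The sketch assumes headings are written as half-width <h1> to <h5> tags. count_headings is a hypothetical helper, not a KH Coder function.

```python
import re
from collections import Counter

HEADING = re.compile(r"<h([1-5])>", re.IGNORECASE)


def count_headings(text):
    """Count opening heading tags per level (H1 ... H5)."""
    return Counter(f"H{m.group(1)}" for m in HEADING.finditer(text))


sample = "<h1>1950s</h1>\ntext\n<h2>Asahi</h2>\ntext\n<h1>1960s</h1>\ntext\n"
counts = count_headings(sample)
for level in ("H1", "H2", "H3"):
    print(level, counts[level])
```

If the level chosen as the unit shows a count of 1, the headings for documents at that level probably need to be unified.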

Korean translation of interface messages

Description

KH Coder has Korean interface but some messages have not been translated into Korean yet.

How to Contribute

Fork and Clone this repository
Open "config/msg.kr" with your favorite text editor & search for ***not translated***.
Add Korean translation

There are lines like this:

max: '***not translated*** Max // 最大'

This is an example of a message that is not translated into Korean. The meaning of this line is:

message-id: '***not translated*** English message // Japanese message'

Please edit the line like this to add Korean message:

message-id: 'Korean message'

Test the message file "config/msg.kr" you edited.
1. Download the latest release of KH Coder, unzip it, and run it.
2. Select "Korean" as the Interface Language.
3. Shut down KH Coder.
4. Overwrite "config/msg.kr" with your edited version.
5. Run KH Coder to see your translated messages on its screens.
Commit, push and send a pull request.

Note

compatibility with ggplot2 version 3: Correspondence Analysis

Hello,

When I try to use Correspondence Analysis, I get this error message. (For both Words>Correspondence Analysis and Coding>Correspondence Analysis)

eduroam-078-104-000-073:khcoder alessia$ perl kh_coder.pl
Perl/Tk: 804.034
This is KH Coder 3.Alpha.14b on darwin.
CWD: /Users/alessia/khcoder
R Version: 3.5, x86_64
Using un-threaded functions...
Connected to MySQL 8.0, khc21.
ignore: 677,618,548,485,33,23,549,486,619,678,126,443,130,436,14001,24,86,90,87,91,88,92,
...................
5 wallclock secs ( 0.81 usr 0.02 sys + 0.01 cusr 0.00 csys = 0.84 CPU)
Data matrix for R: 120 words x 2 docs
Statistics::R::Bridge::pipe::read_processR, Sleep and Retry!
Loading required package: sp
Checking rgeos availability: TRUE
Statistics::R::Bridge::pipe::read_processR, Retry:
Loading required package: Rcpp
Loading required package: RColorBrewer
Non-function objects are not currently inserted (not traceable): .packageName
Modified functions inserted through trace(): wordlayout
output file: /Users/alessia/khcoder/config/R-bridge/khc21_word_corresp_1.png
done: 00:00:42

It happens with my own files as well as the tutorial file.

KH Coder version
3.Alpha.14b via source code

OS
macOS Mojave 10.14.1

(I have another problem with R with Hierarchical Cluster Analysis. Should I ask about that in a separate thread?)

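Cross tabulation works for codes, but what about extracted words?

Analysis with coding offers cross tabulation, but is there no cross tabulation for the raw data? For example, I have in mind a cross tabulation of "the 10 most frequent words", or of "specified words", against a specified variable.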

Spanish translation of interface messages

Description

KH Coder has Spanish interface but some messages have not been translated into Spanish yet.

How to Contribute

Fork and Clone this repository
Open "config/msg.es" with your favorite text editor & search for ***not translated***.
Add Spanish translation

There are lines like this:

max: '***not translated*** Max // 最大'

This is an example of a message that is not translated into Spanish. The meaning of this line is:

message-id: '***not translated*** English message // Japanese message'

Please edit the line like this to add Spanish message:

message-id: 'Spanish message'

Test the message file "config/msg.es" you edited.
1. Download the latest release of KH Coder, unzip it, and run it.
2. Select "Spanish" as the Interface Language.
3. Shut down KH Coder.
4. Overwrite "config/msg.es" with your edited version.
5. Run KH Coder to see your translated messages on its screens.
Commit, push and send a pull request.

Note

Language Support

Hi Dr. Koichi,

I just want to ask whether Malay is supported for producing co-occurrence networks. Thanks

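"This app can't run on your PC" error (Surface Pro, 5th generation)

It has been confirmed that the latest version does not run on the Surface Pro.
The Surface Pro runs Windows 10 (not in S mode) and is 64-bit.
The message "This application is not supported by this version of Windows" appears.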

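Comparison with previously accumulated data

Professor Higuchi,

My name is Nakamura. I am gratefully using KH Coder.

I am writing to ask about the matter in the title.

I would like to extract something like "hot words", that is, words that have suddenly increased compared with the words accumulated in the past. Is there a good way to do this?
(At present I am trying this procedure: I make a list from the past data, exclude the frequent words, and then create a co-occurrence network.)

If you could give me a hint, I would like to look into it further myself.

Thank you for your guidance.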
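One simple way to rank candidate "hot words" is to compare each word's relative frequency in the new data with its relative frequency in the past data. This is a generic sketch, not a KH Coder feature. The add-one smoothing, the min_count threshold, and the name hot_words are arbitrary choices for illustration. The word counts could come from the frequency lists exported for each period.

```python
from collections import Counter


def hot_words(past_counts, current_counts, min_count=5, top=10):
    """Rank words by how much their relative frequency rose versus the past.

    Add-one smoothing keeps words that never appeared in the past data from
    dividing by zero; words seen fewer than min_count times now are skipped.
    """
    past_total = sum(past_counts.values())
    current_total = sum(current_counts.values())
    vocab_size = len(set(past_counts) | set(current_counts))
    scores = {}
    for word, count in current_counts.items():
        if count < min_count:
            continue
        current_rate = (count + 1) / (current_total + vocab_size)
        past_rate = (past_counts.get(word, 0) + 1) / (past_total + vocab_size)
        scores[word] = current_rate / past_rate
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


past = Counter({"新製品": 3, "価格": 40})
now = Counter({"新製品": 30, "価格": 42})
print(hot_words(past, now))
```

Words with a ratio well above 1 are candidates. Looking at their contexts in KH Coder (for example with KWIC) would still be needed to judge whether they are meaningful.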

Fix the size of "Config" window of co-occurrence network

Currently, the size of the "Config" window of the co-occurrence network changes depending on the selected options. It can be annoying that users have to adjust the window size manually with the mouse.

compatibility with ggplot2 version 3: Hierarchical Cluster Analysis on Mac

This is the error I get for the hierarchical cluster analysis. I used the kokoro.xls file here.

R Version: 3.5, x86_64
Using un-threaded functions...
Connected to MySQL 8.0, khc31.
ignore: 19,20,21,.....
0 wallclock secs ( 0.17 usr 0.02 sys + 0.01 cusr 0.01 csys = 0.21 CPU)
Data matrix for R: 71 words x 1215 docs
Statistics::R::Bridge::pipe::read_processR, Sleep and Retry!
Statistics::R::Bridge::pipe::read_processR, Retry:
output file: /Users/alessia/Desktop/3alpha15/config/R-bridge/khc31_word_cls_1.png
done: 00:00:07

KH Coder Version
3.Alpha.15 + commit 9e9adbf
from source code

OS
macOS Mojave 10.14.1

Is this also an issue with the R or maybe gPath version?

Thank you so much for your help

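Data preparation: summarizing the characteristics of each of five kinds of text, or looking at one of them in detail

Thank you as always for your help.

The attached Excel file contains five words, (1) pen case, (2) 玉コロ (tamakoro), (3) dumbbell, (4) bat, and (5) puzzle, together with evaluations of each of them.

(For example, the evaluations in the leftmost column are:
(1) for the pen case: "can be placed on a table", "feels just right"...
(2) for the tamakoro: "the name and appearance are cute", "smooth and pleasant to the touch", "feels good"...
and so on.)

If I copy and paste the Excel sheet above as-is into the text that I load into KH Coder,
the evaluations of all the words come out mixed together.

I would like to create a separate co-occurrence network of the evaluations for each word, that is:

evaluations of the pen case
evaluations of the tamakoro
evaluations of the dumbbell
...

Is there a good way to do this?

OS: Mac OS 10.12.6
KH Coder version: 3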
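One possible preparation, in line with KH Coder's heading-based data format, is to put each item's evaluations under its own H1 heading, so that each item becomes a separate unit. The sketch assumes the evaluations are available as (item, evaluation) pairs, for example read from a CSV export of the Excel sheet. to_heading_text is a hypothetical name.

```python
from collections import defaultdict


def to_heading_text(pairs):
    """Group (item, evaluation) pairs and emit one <h1> section per item."""
    grouped = defaultdict(list)
    for item, evaluation in pairs:
        grouped[item].append(evaluation)
    lines = []
    for item, evaluations in grouped.items():  # items keep first-seen order
        lines.append(f"<h1>{item}</h1>")
        lines.extend(evaluations)  # one evaluation per line
    return "\n".join(lines) + "\n"


pairs = [
    ("ペンケース", "テーブルに置ける"),
    ("玉コロ", "見た目が可愛い"),
    ("ペンケース", "しっくりくる"),
]
print(to_heading_text(pairs))
```

After saving the result as a text file and running pre-processing, the H1 headings can be used to look at one item at a time.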

Pre-processing stops at "Connecting..." stage in console window

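Coding rule format error: Could NOT find the word...

Hello.
I am currently analyzing English text.
My KH Coder version is 3.Alpha.13m.
I created coding rules, but a message says there is a format error, and I cannot run document search properly.
Screenshots are attached below.

The coding rules I created:

Terminal:

What is the problem?
I would appreciate your guidance.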

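Application error in mysqld.exe: "The application was unable to start correctly (0xc000007b)"

I quickly checked the past posts but found no similar case, so I would like to ask.

■KH Coder version in use
3.Alpha.16

■KH Coder installation folder (extraction folder)
C:\khcoder3

■What kind of error, bug, or problem is it?
When KH Coder starts, mysqld.exe shows an application error.
Although the application error appears, KH Coder itself starts.

■What operations reproduce the problem?
It appears every time KH Coder starts.

■Error message
mysqld.exe - Application Error: "The application was unable to start correctly (0xc000007b). Click OK to close the application."

■Did the problem also occur with the tutorial data (Soseki's "Kokoro")?
Not tried, because the problem occurs at startup.

■Files that reproduce the problem
If the problem does not occur with the tutorial data (Soseki's "Kokoro"), please attach, if possible, files to be analyzed that reproduce the problem.
Not tried, because the problem occurs at startup.

■OS
Windows 10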

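Pre-processing stops responding (multi-display setup on a Mac)

Nice to meet you. Sorry to bother you when you are busy, but I would appreciate it if you could look into the following.

■KH Coder version

3.Alpha.15f [Perl 5.18.2, Perl/Tk 804.034]

■KH Coder installation folder (extraction folder)

/Users/*****/Downloads/khcoder-master

■Description of the error or problem

When I run pre-processing, no error appears, but no result comes back either, and clicking the menus no longer has any effect.

■Steps to reproduce

Start KH Coder by running perl kh_coder.pl in /Users/*****/Downloads/khcoder-master
Perform the operation on page 10 of the tutorial ( http://khcoder.net/kh_tuto.html )

■Error message

None

The window "This process may take some time." described in the tutorial does not appear
No error appears, and no result comes back
Clicking the menus has no effect

I selected the kokoro.xls file when creating the new project, but

I am concerned that, as in the image below, the project's file name has changed.

■Console window output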

******:khcoder-master ******$ perl kh_coder.pl
Perl/Tk: 804.034
Locale: ja_JP.SJIS
This is KH Coder 3.Alpha.15f on darwin.
CWD: /Users/******/Downloads/khcoder-master
R Version: 3.5, x86_64
Using un-threaded functions...
Conv:	 0 wallclock secs ( 0.22 usr +  0.01 sys =  0.23 CPU)
Connected to MySQL 8.0, khc8.
Data dir: /Users/******/Downloads/khcoder-master/config/khc8/
Connected to MySQL 8.0, khc8.
Checking icode (jp2)... utf8
MySQL: FLUSH

↓
When I quit KH Coder,

sh: line 1: 28470 Killed: 9               /usr/local/bin/R --slave --vanilla < start.r > output.log

is added.

■Does it also occur with the tutorial data (Soseki's "Kokoro")?

Yes

OS

Mac OS High Sierra 10.13.3

Coding rule error: phrase search

Dear Professor,
I am using KH Coder version 3.Alpha.15f on Windows 10. My coding rule file (a .txt file) works fine with some XLS files, but with one specific file (same XLS format) it keeps showing an error. When I close the error, the program shuts down, as shown in the attached image. I can't understand the error; could you please help me understand it? Thank you very much in advance!

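Cannot type Japanese on a Mac

Basic operations work, but when I try to type Japanese, only alphabetic characters are entered. What should I do?
Attached are two screenshots. One shows the memo I tried to enter while analyzing kokoro.xls, which came out in alphabetic characters. The other shows the system settings.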

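"Centrality (betweenness)" is missing from the co-occurrence network in KH Coder 3

Professor Higuchi,
Nice to meet you.
In the latest version, the "Centrality (betweenness)" option has disappeared from the co-occurrence network. Am I right to understand that you judged it unnecessary for analysis?
Old materials from SCREEN, and your comment to another user that "(in some cases) betweenness centrality is easier to interpret than subgraph detection", have served as a lesson for me ever since.
Could you explain, for reference in my future use of the software?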

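"Pre-processing data integrity has been lost. genkei-hyousobun" error

Everyone, Professor Higuchi,

Nice to meet you, and excuse my sudden question.
As the title says, I am stuck because I cannot resolve the error "Pre-processing data integrity has been lost. genkei-hyosobun". I am posting in the hope that someone can give me advice.

### What I did
http://www.koichi.nihon.to/cgi-bin/bbs_khn/khcf.cgi?no=50&mode=allread
Following this page, I deleted the following characters: ' (single quote), " (double quote), \ (backslash), | (vertical bar), <, and >.
Pre-processing of the tutorial data (Soseki's "Kokoro") also completed without problems.

OS

Windows 7

KH Coder version

3.Alpha.14 [Perl 5.14.2, Perl/Tk 804.03]

Screenshots

For reference, I attach a screenshot of what I believe is the console, and the original data. The character encoding is SJIS.

I think I have tried everything I can myself, but I cannot find a starting point for solving this.
I would be very grateful for any advice. Thank you very much.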

khcoderテキスト.zip

Memory size related error from R (data size issue)

Hello,
I am using KH Coder 3.Alpha.13 on 64-bit Windows.
When analyzing very large texts, I keep getting memory-size error messages in the console when I try to create networks, like the ones below:

What would the solution be? Increase the allocated memory size? I did not find a way to do it.
Many thanks!

Automatic setup software for Mac: No such file or directory

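MySQL error (Connect) when opening or creating a project

Question
I attended the recent tutorial session.
KH Coder worked without problems for about two weeks, but suddenly an error occurred and I can no longer use it.
I searched the old bulletin board but could not find a solution, so I am posting here.

■KH Coder version
→3.Alpha.16 [Perl 5.14.2, Perl/Tk 804.03]

■KH Coder installation folder (extraction folder)
→C:\khcoder3

■What kind of error, bug, or problem is it?
→After startup, both "Open" and "New" under "Project" cause an error.

■What operations reproduce the problem?
→(1) Click "Project" > "Open" in the menu
(2) Click "Project" > "New" in the menu, specify a file, and click "OK"

■Error message
→Screenshot of the error window

■Console window output

■Did the problem also occur with the tutorial data (Soseki's "Kokoro")?
→Yes

■Files that reproduce the problem
The tutorial data (Soseki's "Kokoro")

■OS
→Windows 10, running in VirtualBox 6.0.4 on a Mac (macOS 10.14.3)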

data preparation: not all marker levels are shown

Hello,

I am trying to analyze newspaper articles, using texts from the Asahi Shimbun database "Kikuzo II". After copying the texts and pasting them into EditPad Lite 7, I apply the markers h1 to h3 to the text. Then I encode the text as Windows 932: Japanese (Shift JIS) and check the target file. The problem is that only the h1 and h2 markers are shown after checking the target file, even though I have already reinstalled KH Coder once. Previously, there was a problem with the angle brackets before and after the h1, h2, and h3 markers: an error had replaced the normal brackets with bigger ones. As mentioned, the problem now is that not all marker levels are shown, even though no error was reported. What can I do to fix this problem?
Thank you in advance for your reply!
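The post mentions the brackets around the markers being replaced by "bigger ones". A quick check is therefore to look for heading tags written with full-width characters (for example ＜ｈ３＞), on the assumption that KH Coder only recognizes plain half-width tags. The sketch normalizes only the tag characters with Unicode NFKC and leaves the rest of the Japanese text untouched. fix_heading_tags is a hypothetical helper, and other bracket-like characters are not covered.

```python
import re
import unicodedata

# A heading tag written with any mix of half-width and full-width characters,
# e.g. <h3>, ＜ｈ３＞ or ＜／ｈ３＞
TAG = re.compile(r"[<＜][/／]?[hHｈＨ][1-5１-５][>＞]")


def fix_heading_tags(text):
    """Return (fixed_text, changed_line_numbers).

    Only the matched tags are normalized (full-width -> ASCII via NFKC);
    all other characters are left exactly as they were.
    """
    changed, fixed_lines = [], []
    for line_no, line in enumerate(text.split("\n"), start=1):
        fixed = TAG.sub(lambda m: unicodedata.normalize("NFKC", m.group(0)), line)
        if fixed != line:
            changed.append(line_no)
        fixed_lines.append(fixed)
    return "\n".join(fixed_lines), changed
```

The returned line numbers show where the file had full-width tags. Save the fixed text with the same encoding as before.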

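How to configure parts of speech (extracting adnominals)

I apologize for the trouble caused by my not knowing how to use the bulletin board.
I searched the old bulletin board several times by keyword but could not find an answer.

■KH Coder version in use
→ 2.00f [Perl 5.412, Perl/Tk 804.029]

■KH Coder installation folder (extraction folder)
→D:/ (the USB drive I am using)

■What kind of error, bug, or problem is it?
I added 【23,連体詞,連体詞-一般】 to "hinsi_chasen" and was able to include 連体詞 (adnominals) among the options in "Select Words to Analyze". However, I then loaded the text file I want to analyze by part of speech into a new project and exported the words by part of speech to Excel. The export has no 連体詞 category: adnominals are not extracted.

■What operations reproduce the problem?
→Example:
  (1) In Sakura Editor, edit "hinsi_chasen" to add 【23,連体詞,連体詞-一般】
  (2) "Project" > "New", select the target text file, then "Open" > "OK"
  (3) "Pre-Processing (R)" > "Select Words to Analyze" > check every item, including "連体詞", except "未知語" (unknown words), "タグ" (tags), "感動詞" (interjections), "その他" (others), and "HTMLタグ" (HTML tags) > "OK"
  (4) "Pre-Processing (R)" > "Run Pre-Processing" > "OK"
  (5) "Processing is complete" > "OK"
  (6) "Tools (T)" > "Words" > "Frequency List" > "By POS" > "Frequency" > "Excel" > "OK"
  (7) All categories except "連体詞" are extracted

■Error message
None

■Console window output
Sorry, I don't know how to paste a screenshot.

■Did the problem also occur with the tutorial data (Soseki's "Kokoro")?
Yes

■OS
Windows 10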

"Something wrong with the database: bun_r table" error (pre-processing interruption)

Version number of KH Coder I use:
3.Alpha.16 [Perl 5.14.2, Perl/Tk 804.03]
Operating System:
Windows 10

Hi there, hello,

I am working as a product manager assistant at a new web portal for selling new cars.

Now we need to research all 'words' in a table 'car-description' to resolve data-quality issues.

We would be very grateful for help:
as soon as I use a target file other than the one described in the tutorial_en folder, KH Coder 3 interrupts pre-processing.

Many thanks in advance for reading. I would be very happy to receive any hint or help to get it running.

Manifestation of my problem:

KH Coder 3 interrupts pre-processing in several cases:

- tried1)

if I use my original, DB-extracted, long "targetfile_part_1_of_x.txt" with thousands of lines and really long cell contents as description text

- tried2)

if I use an export from LibreOffice as *.xls

- tried3)

as *.csv, KH Coder 3 will not open it

What I can say about the original targetfile_part_1_of_x.txt:

It is a *.txt file larger than 16 MB.
I am working as a product manager assistant at a new web portal for selling new cars.
-- Now we need to research all 'words' in a table 'car-description'
--- this table is filled in by car dealers, who fill it with whatever strings they want (e.g. copied and pasted from a long Word document or similar)

What I did with KH Coder 3:

- tried1)

new project with the original, DB-extracted, long "targetfile_part_1_of_x.txt"
-- but the KH Coder console shows:

Use of uninitialized value in concatenation (.) or string at \khcoder3\kh_coder.exe>Lingua/Sentence.pm line 131, <TRGT> line 7401.

- tried2)

new project with an export from LibreOffice as *.xls
-- but the KH Coder console shows:
SERVER.WORKER: client ended, Closing connection

- tried3) new project with an export from LibreOffice as *.csv

-- but the following appears:

Word Frequency List: Change color of words with characteristic distribution of POS

This planned feature is explained here (in Japanese):
https://twitter.com/khcoder/status/967433417250947072

Adding [save as png] and [save as svg] buttons to D3.js visualization

Description

We would like to add [save as png] and [save as svg] buttons to D3.js visualizations.

I attach the D3.js visualization to this post as a zip file.
network.zip

Also, here is an example online:
http://khcoder.net/tmp/network/index.html

How to Contribute

a) Just edit HTML and / or JS files

download network.zip
edit HTML and / or JS files to add buttons
upload edited files here

b) through GitHub

fork and clone this repository
edit kh_lib/kh_r_plot/network.pm and / or files in kh_lib/web_lib directory
commit, push, and send a pull request

Note

KH Coder is available under the terms of the GNU GPL v2 or later. The license of KH Coder will be applied to contributed codes.

How to merge *.txt files in the folder in a specific order?

Hello,

I study academic texts with numeric file names (001.txt, 002.txt, 003.txt and so on), both qualitatively and quantitatively. For KH Coder, these files were merged into one .txt file. However, the texts appear in the merged file in a seemingly random order. They were not merged in the order they are organized in the folder (from 001.txt to 336.txt) but appeared as 005.txt, 003.txt, 001.txt and so on. The order of appearance might not matter when working with KH Coder, but I would also like to use the merged text file in other contexts. Therefore, my question: is it possible to merge all files in the same order as they appear in the folder?

KH Coder version 3.Alpha.14b, running on Mac OS 10.13.6.
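Outside KH Coder, the files can be merged in name order with a short script. Because the names are zero-padded (001, 002, ...), a plain lexicographic sort gives the folder order. This is a hypothetical helper, not how KH Coder merges files. It assumes UTF-8 text files and does not add any headings between them. Write the output outside the source folder so that it is not picked up on the next run.

```python
from pathlib import Path


def merge_in_order(folder, output, pattern="*.txt"):
    """Concatenate text files sorted by file name (001.txt, 002.txt, ...).

    Returns the merged file names in order. A newline is added after any
    file that does not already end with one, so texts do not run together.
    """
    files = sorted(Path(folder).glob(pattern))
    with open(output, "w", encoding="utf-8") as out:
        for path in files:
            text = path.read_text(encoding="utf-8")  # ASSUMPTION: UTF-8 input
            out.write(text if text.endswith("\n") else text + "\n")
    return [p.name for p in files]
```

Call it as merge_in_order("texts", "merged.txt"), with the output file outside the "texts" folder.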

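Steps for analyzing English data with KH Coder 3

Regarding the title: after clicking "Settings", I want to switch the dictionary to one that supports English,
but the checkboxes shown in the manual do not appear on my PC screen, so I am stuck.

My PC runs "Windows 10 64bit".
Please tell me what I should do.

Thank you very much.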

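Specifying the same word in multiple coding rules

I have a question about coding rules.

If a word is included in both coding rule A and coding rule B, is it recognized by both rules? Or is it recognized by only one of them, depending on the order in which the rules are written?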

Analyzes the frequencies of our _words_, but not of _groups of words_

Hi there,

thank you for your tool!

KH Coder analyzes the frequencies of our words!

Is there a way to identify the frequencies of recurring groups of words in a list, which later have to be related to concrete car equipment features?
-- I really do mean: identify recurring groups of words, not words.

e.g.:
cardesc1 wrote this as a long description string into the DB:
Webasto parking heater, seat heater,
cardesc2 wrote this into the DB:
Rear window heating, Heated steering wheel,

But KH Coder's quantitative content analysis shows the most frequent words, not groups of words.

Please don't get my question wrong or find it imprecise, but at the moment I see many analyses in KH Coder that refer to words, and none that refer to groups of words.
-- I must confess: at the moment I can't find solid ground among the many possibilities of the tool.
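In descriptions like the two examples, dealers separate features with commas. Counting the comma-separated phrases themselves therefore gives frequencies of word groups. Word n-grams are a fallback for text without separators. This is a preprocessing sketch outside KH Coder. The comma-as-separator assumption comes from the two examples only, and the function names are hypothetical.

```python
from collections import Counter


def phrase_counts(descriptions, separator=","):
    """Count separator-delimited phrases (case-insensitive) across descriptions."""
    counts = Counter()
    for text in descriptions:
        for phrase in text.split(separator):
            phrase = " ".join(phrase.split()).lower()  # collapse whitespace
            if phrase:
                counts[phrase] += 1
    return counts


def ngram_counts(descriptions, n=2):
    """Count word n-grams, for text without clear separators."""
    counts = Counter()
    for text in descriptions:
        words = text.lower().replace(",", " ").split()
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


descriptions = [
    "Webasto parking heater, seat heater,",
    "Rear window heating, Heated steering wheel,",
]
print(phrase_counts(descriptions).most_common(5))
```

Phrase counts keep each feature intact. N-grams can also pair words across feature boundaries (e.g. "heater seat"), so they need more manual filtering.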

Automate the release of a new version in Github

We have to figure out how to automatically

Create a new release and
Upload binary files to the release

from a Perl script.

Need to edit "upload" subroutine in "utils/publisher.pl".
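The two steps map onto two GitHub REST API calls. The first creates a release with POST /repos/{owner}/{repo}/releases. The second uploads an asset to that release on uploads.github.com. The sketch below is in Python only to show the requests. The "upload" subroutine in utils/publisher.pl would need to send equivalent requests from Perl. The tag, file name, and token are placeholders, and nothing is sent until urlopen is called.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"
UPLOADS = "https://uploads.github.com"


def _headers(token, content_type):
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "Content-Type": content_type,
    }


def create_release_request(owner, repo, tag, token, name=None, body=""):
    """Build (but do not send) the request that creates a release."""
    payload = json.dumps({"tag_name": tag, "name": name or tag, "body": body})
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/releases",
        data=payload.encode("utf-8"),
        headers=_headers(token, "application/json"),
        method="POST",
    )


def upload_asset_request(owner, repo, release_id, filename, data, token):
    """Build (but do not send) the request that uploads one binary file."""
    query = urllib.parse.urlencode({"name": filename})
    return urllib.request.Request(
        f"{UPLOADS}/repos/{owner}/{repo}/releases/{release_id}/assets?{query}",
        data=data,
        headers=_headers(token, "application/octet-stream"),
        method="POST",
    )

# To send: with urllib.request.urlopen(request) as resp: release = json.load(resp)
# The "id" field of the created release is the release_id used for uploads.
```

Splitting request construction from sending keeps the URLs and headers easy to check before a real release is created.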

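How to approach creating coding rules

Thank you for your help. My name is mituhasi.
I apologize for the very basic question.

Yesterday my husband asked, on my behalf, about the specifications of coding rules.
I am in a master's program and want to analyze interview transcripts.

Apart from the specifications, I would like to use coding rules to look at tendencies by category.
Are coding rules ultimately set on the basis of the analyst's own subjectivity and hypotheses? For interviews and similar data, are there conventions or general rules, such as which viewpoints to use?

My husband says, "There are no rules. It is up to the analyst's subjectivity, and it is wrong to leave that to software." Meanwhile, my supervisor, who is not a specialist in text mining, has asked me how coding rules ought to be created, and I don't know what to answer.
If the answer is "it depends on the analyst", that is fine. If there is a general way to create coding rules, I would be grateful if you could tell me.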

Exporting the centrality values of a co-occurrence network to a CSV file

OS

Mac OS 10.13.6

KH Coder version

2.Alpha.14b[Perl5.16.3, Perl/Tk804.032]

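Error message

My goal is to obtain the centrality values of a co-occurrence network created with KH Coder.
I asked a question at the March 2019 workshop. Following the advice I received then, I proceeded as follows:
1) I saved the co-occurrence network I created in "R Source" format (A1.r).
2) "Rgui.bat" was not in the folder, so I installed R.
3) I dropped the file "A1.r" onto the console and added several packages, following the error messages that appeared. (This is my first time using R.) I installed RStudio and am working in it.
4) In the end, the error message above was displayed. I could not find out how to solve it, so I am asking here.

I also looked at No. 898 on the old bulletin board, but I could not even get to the point of entering the code.
I am a complete beginner with R as well, so my question may be very basic.
I'm sorry to trouble you when you are so busy, but I would be grateful for any advice.
Thank you very much.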
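If you stay in R, the igraph package's betweenness() function is the usual route. An alternative that avoids R entirely is sketched below: it computes betweenness centrality with Brandes' algorithm from an edge list and writes the values to CSV. It assumes an unweighted, undirected network and that you already have the edges as word pairs. Reading them out of the saved A1.r file is not covered, and betweenness and write_csv are hypothetical helpers, not KH Coder functions.

```python
import csv
from collections import defaultdict, deque


def betweenness(edges):
    """Brandes' algorithm for an unweighted, undirected graph.

    Returns {node: betweenness}, counting each unordered pair of nodes once.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    centrality = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search: shortest-path counts (sigma) and predecessors
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(graph, 0)
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # In an undirected graph every pair was counted from both ends
    return {node: value / 2 for node, value in centrality.items()}


def write_csv(centrality, path):
    """Write word,betweenness rows, highest values first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "betweenness"])
        for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
            writer.writerow([node, value])
```

Values from this sketch are unnormalized. igraph's betweenness() also returns unnormalized values by default, but compare a few words before relying on both.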

How should I export the result of Documents' Cluster Analysis to an Excel file?

Hello, how should I export the result of Documents' Cluster Analysis to an Excel file?
I want the "id" information and the word list of the result.

CSV_XS ERROR: 2032

Hello,

I produced a new list of variables for a text corpus in a CSV file and tried to import this file into the KHCoder. The list includes string labels. In contrast to earlier CSV files it did not work.

I received the following message:

Version number of KH Coder
3.Alpha.14b

OS
macOS 10.13.6

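Plots (co-occurrence network, correspondence analysis) display correctly in KH Coder but are garbled when saved as EMF

I am using the software for my research.
This is my first time using it, so I am learning it little by little, by trial and error, while reading the tutorial and the manual.
I live in Japan, but I use a Korean-language SAMSUNG laptop (Windows 10 64-bit) that I brought from Korea.
KH Coder starts normally, and I have analyzed data. However, when I load the saved images into Word or Paint, some kanji (simplified forms) and the katakana long-vowel mark (ー) are not displayed and turn into squares "□". For example, プライベート → プライベ□ト, 社会 → 社□, 学校 → □校.
Can this be solved?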

Scaling in correspondence analysis

Dear Koichi,

I use KH Coder to run correspondence analysis of words and variables. It works fine, but I'm not sure whether the scaling of the plots is always the same. Especially when I choose "none" for scaling, I have the impression that the scaling of the plots varies: the distance between 0 and 1 differs between the X and Y axes. See:

How to read the plots? Is it a wrong impression on my side? Or how can I generate plots with axes that show the same distance between 0 and 1 (and other values)?

Best,
Axel

The version number of KH Coder: 3.Alpha.14b
My operating system: Mac OS 10.13.6

Calculation of coefficients in word-variable co-occurrence network

Hi all,

I've used KH Coder for word-word cooccurrence networks quite often and I understand how the relationship coefficients between words are calculated. Now I created a network to understand the association between words and an external variable (with two different values, let's say a and b). However for some words, I do not understand why they have been assigned to only a, only b or both. I investigated the total occurrence of a word with a or b, the occurrence relative to the number of documents with a or b etc. but for some words I would assign them differently than KH Coder did.

Could you maybe explain with an example how the coefficient here is calculated / how the assignment to either one of the variables or all of them works? Unfortunately I couldn't find more information about this particular network in previous discussions or the manual.

Thank you!

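What do the numbers after extracted words mean when analyzing Korean text?

I was asked elsewhere what the numbers appended to extracted words mean when analyzing Korean text.

Section A.2.4 of the manual shows that "HanDic" is used for analyzing Korean.

If you then search for HanDic, you will find the following information on this page:

The dictionary form field gives the dictionary form of the entry. Where homonyms exist, the homonym number from the Standard Korean Language Dictionary (web version) is appended.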
