apertium / apertium-chv
Apertium linguistic data for Chuvash
License: GNU General Public License v3.0
<abil> gives many problems. Here are some relevant cases:
$ echo "кил<v><iv><abil><pres><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кил<v><iv><abil><pres><p1><sg> кил>{A}й>{A}т>{Ă}п 0,000000
$ aq-morftest -ci v_кил_del.yaml | grep "<abil><pres><p1><sg>"
[FAIL] кил<v><iv><abil><pres><p1><sg> => missing results: килеетӗп
[FAIL] кил<v><iv><abil><pres><p1><sg> => unexpected results: килейетӗп
$ echo "кала<v><tv><abil><pres><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кала<v><tv><abil><pres><p1><sg> кала>{A}й>{A}т>{Ă}п 0,000000
$ aq-morftest -ci v_кала_del.yaml | grep "<abil><pres><p1><sg>"
[FAIL] кала<v><tv><abil><pres><p1><sg> => missing results: калаятӑп
[FAIL] кала<v><tv><abil><pres><p1><sg> => unexpected results: калайатӑп, калааятӑп
$ echo "ҫи<v><tv><abil><pres><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
ҫи<v><tv><abil><pres><p1><sg> ҫи{й}>{A}й>{A}т>{Ă}п 0,000000
$ aq-morftest -ci v_ҫи_del.yaml | grep "<abil><pres><p1><sg>"
[FAIL] ҫи<v><tv><abil><pres><p1><sg> => missing results: ҫийеетӗп
[FAIL] ҫи<v><tv><abil><pres><p1><sg> => unexpected results: ҫиейетӗп
$ echo "ту<v><tv><abil><pres><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
ту<v><tv><abil><pres><p1><sg> ту{в}>{A}й>{A}т>{Ă}п 0,000000
$ aq-morftest -ci v_ту_del.yaml | grep "<abil><pres><p1><sg>"
[FAIL] ту<v><tv><abil><pres><p1><sg> => missing results: тӑвaятӑп
[FAIL] ту<v><tv><abil><pres><p1><sg> => unexpected results: тӑвайатӑп, тӑваятӑп
$ echo "выля<v><tv><abil><pres><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
выля<v><tv><abil><pres><p1><sg> выля>{A}й>{A}т>{Ă}п 0,000000
$ aq-morftest -ci v_выля_del.yaml | grep "<abil><pres><p1><sg>"
[FAIL] выля<v><tv><abil><pres><p1><sg> => missing results: выляятӑп
[FAIL] выля<v><tv><abil><pres><p1><sg> => unexpected results: выляаятӑп, выляайатӑп
Model it on the Guaraní transducer.
modes.xml includes some modes with install="yes", but the required files aren't installed.
Some generic suggestions:
-lexc and -twol modes probably aren't useful to users
-spell modes should depend on --enable-ospell
.deps files are never installed, so any modes using them shouldn't be installed.
Messages for package app-dicts/apertium-chv-9999:
Failed to find '/usr/share/apertium/apertium-chv/.deps/chv.twol.hfst' in install image.
QA: missing files required for mode chv-twol.
Failed to find '/usr/share/apertium/apertium-chv/.deps/chv.lexc.hfst' in install image.
QA: missing files required for mode chv-lexc.
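A check of this kind can be approximated with a short script. The following is only a sketch: the sample modes.xml content is hypothetical (only the mode name chv-twol and the path .deps/chv.twol.hfst are taken from the messages above), and the function name is made up for illustration. It lists the modes marked install="yes" that reference files under .deps/.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical excerpt in the usual modes.xml layout; not the real file.
SAMPLE_MODES = """
<modes>
  <mode name="chv-morph" install="yes">
    <pipeline>
      <program name="hfst-proc -w">
        <file name="chv.automorf.hfst"/>
      </program>
    </pipeline>
  </mode>
  <mode name="chv-twol" install="yes">
    <pipeline>
      <program name="hfst-lookup">
        <file name=".deps/chv.twol.hfst"/>
      </program>
    </pipeline>
  </mode>
</modes>
"""


def installed_modes_using_deps(modes_xml: str) -> Dict[str, List[str]]:
    """Map each install="yes" mode to the .deps/ files it references."""
    root = ET.fromstring(modes_xml.strip())
    problems: Dict[str, List[str]] = {}
    for mode in root.iter("mode"):
        if mode.get("install") != "yes":
            continue
        deps = [
            f.get("name", "")
            for f in mode.iter("file")
            if f.get("name", "").startswith(".deps/")
        ]
        if deps:
            problems[mode.get("name", "")] = deps
    return problems


print(installed_modes_using_deps(SAMPLE_MODES))
```

Any mode reported by such a check would either need install="yes" removed or its files moved out of .deps.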
{в} surfaces as в ({в}:в) in the past tense, but this should not happen:
$ echo "ту<v><tv><past><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
ту<v><tv><past><p1><sg> ту{в}>{T}>{Ă}м 0,000000
$ aq-morftest -ci v_ту_del.yaml | grep "<past>"
[FAIL] ту<v><tv><past><p1><sg> => missing results: турӑм
[FAIL] ту<v><tv><past><p1><sg> => unexpected results: туврӑм
[FAIL] ту<v><tv><past><p2><sg> => missing results: турӑн
[FAIL] ту<v><tv><past><p2><sg> => unexpected results: туврӑн
[FAIL] ту<v><tv><past><p3><sg> => missing results: турӗ
[FAIL] ту<v><tv><past><p3><sg> => unexpected results: туврӗ
[FAIL] ту<v><tv><past><p1><pl> => missing results: турӑмӑр
[FAIL] ту<v><tv><past><p1><pl> => unexpected results: туврӑмӑр
[FAIL] ту<v><tv><past><p2><pl> => missing results: турӑр
[FAIL] ту<v><tv><past><p2><pl> => unexpected results: туврӑр
[FAIL] ту<v><tv><past><p3><pl> => missing results: турӗҫ
[FAIL] ту<v><tv><past><p3><pl> => unexpected results: туврӗҫ
This seems to be the same problem as #26 but for two other suffixes.
The first ч should only appear in contexts where чӗ should be read as voiced. So, it has to be deleted after voiceless consonants. Example:
$ echo "йывӑҫ<n><nom>+ӗ<cop><ifi>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
йывӑҫ<n><nom>+ӗ<cop><ifi> йывӑҫ>{ч}чӗ 0,000000
Generates йывӑҫччӗ instead of йывӑҫчӗ.
$ echo "калаҫ<v><tv><imp><p2><sg>+ччӗ<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
калаҫ<v><tv><imp><p2><sg>+ччӗ<mod> калаҫ>{ч}чӗ 0,000000
Generates калаҫччӗ instead of калаҫчӗ.
fran@ipek:~/source/apertium/languages/apertium-chv$ bash dev/lint.sh
0 missing multi
5 mixed script
çи:çи N-INFORMACI; ! src: Chuvash wordlist ""
ӗç:ӗç V-TD; ! src: Chuvash wordlist ""
çи:çи V-TD; ! src: Chuvash wordlist ""
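How dev/lint.sh detects mixed scripts is not shown here. Purely as an illustration, a check of this kind could compare the script prefix of each letter's Unicode name. The function names below are hypothetical and this is not the actual lint implementation.

```python
import unicodedata


def scripts_in(word):
    """Return the script names (first word of each Unicode name) of the letters."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word if ch.isalpha()}


def is_mixed_script(word):
    """True if the letters of the word come from more than one script."""
    return len(scripts_in(word)) > 1


latin_mix = "\u00e7\u0438"   # Latin c with cedilla + Cyrillic и, as in the entries above
cyrillic = "\u04ab\u0438"    # Cyrillic ҫ + Cyrillic и
print(is_mixed_script(latin_mix), is_mixed_script(cyrillic))  # prints: True False
```

The two strings look the same on screen, which is why a check like this is useful for the lexicon.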
The first ч should only appear in contexts where чен should be read as voiced. So, it has to be deleted after voiceless consonants. Examples:
$ echo "мулкач<n>+ччен<post>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
мулкач<n>+ччен<post> мулкач>{ч}чен 0,000000
Generates мулкачччен instead of мулкаччен.
$ echo "автан<n>+ччен<post>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
автан<n>+ччен<post> автан>{ч}чен 0,000000
Generates автанччен: correct.
$ echo "йытӑ<n>+ччен<post>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
йытӑ<n>+ччен<post> йыт{ː}ӑ>{ч}чен 0,000000
Generates йытӑччен: correct.
This is quite a frequent combination, but it fails:
$ echo "23<num><ord><subst><px3sp>+ччен<post>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
23<num><ord><subst><px3sp>+ччен<post> 23>-мӗш>{и}{н}>{ч}чен 0,000000
Generates 23-мӗшнччен instead of 23-мӗшӗччен
The suffix/postposition -(ч)чен is currently always added with a double чч. Only one ч should be written after a voiceless consonant:
автанччен, чӗрӗпчен, ҫуртчен, автобусчен, июльччен, пулӑччен, лашаччен, литератураччен.
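The rule can be sketched at the surface level as follows. This is only an illustration in Python, not the twol rule itself; the set of voiceless consonants and the function name are assumptions, and the soft sign is skipped when looking for the last sound (as in июльччен).

```python
# Assumed inventory of voiceless consonant letters (not taken from the twol file).
VOICELESS = set("пткхсшҫчфцщ")


def attach_chen(stem: str) -> str:
    """Attach -(ч)чен: one ч after a voiceless consonant, two otherwise."""
    core = stem.rstrip("ьъ")          # ignore a final soft or hard sign
    last = core[-1].lower() if core else ""
    return stem + ("чен" if last in VOICELESS else "ччен")


for stem in ["автан", "чӗрӗп", "ҫурт", "автобус", "июль", "пулӑ", "лаша", "мулкач"]:
    print(stem, "->", attach_chen(stem))
```

The sketch does not cover the possessive case (23-мӗшӗччен), which depends on how {и}{н} is realised before the suffix.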
In words ending in н{й}я, й is replaced by ь before ӑ. This is correct. The same should happen for words ending in л{й}я. For instance:
$ echo "Коля<np><ant><m><ins>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
Коля<np><ant><m><ins> Кол{й}я>п{A} 0,000000
Generates Колйӑпа instead of Кольӑпа.
In 8b6e805 a number of new tenses were added, which don't seem to be standard. Some of the negative forms look regular, others not. We should discuss tag names and how to organise the lexica.
05:35 <@firespeaker> prep is {A}t+past / m(A)s+past
05:36 <@firespeaker> plu is s{A}t+past / m{A}s{A}t+past
I've done a bit of simplification in 8ecbe54 .
I've been surprised to find this suffix as a postposition. I always considered it as a form of adjectivation. Of course, it can be considered as a postposition if this is more pan-Turkic :)
In any case, the suffix has two л after a vowel. So it is similar to (ч)чен, but easier, since there are no differences between consonants.
Examples:
пуртӑллӑ from пуртӑ.
мозаикӑллӑ from мозаика.
няньӑллӑ from няня.
илемлӗ from илем.
хутлӑ from хут.
<opt><p3><sg> has a specific behaviour. I have used the new archiphoneme {И} for it. The и ending overwrites the final а or е vowel of the stem, if there is one. For у and и, an epenthetic consonant must be added.
$ aq-morftest -ci v_кил_del.yaml | grep "<opt><p3><sg>"
[PASS] кил<v><iv><opt><p3><sg> => килин
$ aq-morftest -ci v_кай_del.yaml | grep "<opt><p3><sg>"
[PASS] кай<v><iv><opt><p3><sg> => кайин
[PASS] кайин => кай<v><iv><opt><p3><sg>
$ aq-morftest -ci v_кала_del.yaml | grep "<opt><p3><sg>"
[FAIL] кала<v><tv><opt><p3><sg> => missing results: калин
[FAIL] кала<v><tv><opt><p3><sg> => unexpected results: калаин
$ echo "кала<v><iv><opt><p3><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кала<v><iv><opt><p3><sg> кала<v><iv><opt><p3><sg>+? inf
$ aq-morftest -ci v_ту_del.yaml | grep "<opt><p3><sg>"
[FAIL] ту<v><tv><opt><p3><sg> => missing results: тӑвин
[FAIL] ту<v><tv><opt><p3><sg> => unexpected results: тувин
$ echo "ту<v><tv><opt><p3><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
ту<v><tv><opt><p3><sg> ту{в}>{И}н 0,000000
$ aq-morftest -ci v_выля_del.yaml | grep "<opt><p3><sg>"
[FAIL] выля<v><tv><opt><p3><sg> => missing results: вылин
[FAIL] выля<v><tv><opt><p3><sg> => unexpected results: выляин
$ echo "выля<v><tv><opt><p3><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
выля<v><tv><opt><p3><sg> выля>{И}н 0,000000
For now, we have a very "dirty" Chuvash corpus: we have automatically converted PDF files to TXT, and that's all. When it is run through chv-morph, some extra lines appear, which makes it difficult to match the text file line by line with the output of the analysis. Below is a case where an extra line appears. Any idea how to fix it (without fully cleaning the corpus)?
$ cat -n corpus/chv.crp.tantash.net.txt | head -n 519 | tail -n 2
518 кӗтеттӗм. Кӗтсе илейменнипе пӗр эрне
519 маларах ҫыхма пуҫларӑм. Кӗтнӗ кун
$ cat -n corpus/chv.crp.tantash.net.txt | head -n 519 | tail -n 2 | apertium -d . chv-morph
^518/518<num>$ ^кӗтеттӗм/кӗт<v><tv><dur><p1><sg>$^./.<sent>$ ^Кӗтсе/кӗт<v><tv><gna_impf>$ ^илейменнипе/*илейменнипе$ ^пӗр/пӗрре<num><attr>/пӗр<v><iv><imp><p2><sg>/пӗр<v><tv><imp><p2><sg>$ ^эрне
519/*эрне
519$ ^маларах/маларах<adj>/маларах<adv>$ ^ҫыхма/ҫых<v><iv><ger>/ҫых<v><iv><neg><imp><p2><sg>/ҫых<v><tv><ger>/ҫых<v><tv><neg><imp><p2><sg>$ ^пуҫларӑм/пуҫла<v><iv><past><p1><sg>/пуҫла<v><tv><past><p1><sg>$^./.<sent>$ ^Кӗтнӗ/кӗт<v><tv><gpr_past>/кӗт<v><tv><past><evid>$ ^кун/кун<adj>/кун<n><attr>/кун<n><nom>/кун<v><iv><imp><p2><sg>/кун<v><tv><imp><p2><sg>/ку<prn><dem><gen>$^./.<sent>$
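One possible workaround, given here only as a sketch, is to analyse the corpus one line at a time, so that no token can span a line break and every analysis stays paired with its source line. The helper names are hypothetical; the only external command used is the apertium -d . chv-morph call shown above, and starting one process per line will be slow on a large corpus.

```python
import subprocess
from typing import Callable, Iterable, Iterator, Tuple


def run_chv_morph(line: str) -> str:
    """Analyse one line with the command used above (needs apertium and the compiled data)."""
    result = subprocess.run(
        ["apertium", "-d", ".", "chv-morph"],
        input=line + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def analyse_by_line(
    lines: Iterable[str],
    analyse: Callable[[str], str] = run_chv_morph,
) -> Iterator[Tuple[int, str, str]]:
    """Yield (line number, original text, analysis), one triple per input line."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Blank lines are passed through without calling the analyser.
        analysis = analyse(text) if text.strip() else ""
        yield number, text, analysis
```

Iterating over an open corpus file with analyse_by_line gives the line number separately, so cat -n is no longer needed and the numbers never end up inside the analysed text.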
I've tried to introduce this suffix, but I've done it wrong and I don't know how I should do it.
There are 3 politeness suffixes that can be used in imp.p2. The problem is with сĂм, because it is added directly to the stem, before the person suffix (which exists only in p2.pl). For instance, for the verb кала in p2.pl we can have
калӑр ("normal" form without any politeness suffix)
But with сӑм:
кала.сӑм.ӑр
Other options are simple, since suffixes/clitics ччӗ and ха are added after the person suffix:
калӑрччӗ
каласӑмӑрччӗ
калӑр-ха
калӑрччӗ-ха
каласӑмӑрччӗ-ха
For the following forms of the verb ҫи, the epenthetic й is not generated:
$ echo "ҫи<v><tv><fut><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><fut><p1><sg> ҫи>{Ă}п 0,000000
ҫиӗп is generated instead of ҫийӗп
$ echo "ҫи<v><tv><fut><p2><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><fut><p2><sg> ҫи>{Ă}н 0,000000
ҫин is generated instead of ҫийӗн
$ echo "ҫи<v><tv><fut><p3><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><fut><p3><sg> ҫи>{ӗ} 0,000000
ҫиӗ is generated instead of ҫийӗ
$ echo "ҫи<v><tv><fut><p1><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><fut><p1><pl> ҫи>{Ă}п{Ă}р 0,000000
ҫиӗпӗр is generated instead of ҫийӗпӗр
$ echo "ҫи<v><tv><fut><p2><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><fut><p2><pl> ҫи>{Ă}р 0,000000
ҫиӗр is generated instead of ҫийӗр
$ echo "ҫи<v><tv><fut><p3><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><fut><p3><pl> ҫи>{ӗ}ҫ 0,000000
ҫиӗҫ is generated instead of ҫийӗҫ
$ echo "ҫи<v><tv><cond><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><cond><p1><sg> ҫи>{Ă}тт{Ă}м 0,000000
ҫиӗттӗм is generated instead of ҫийӗттӗм
$ echo "ҫи<v><tv><cond><p2><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><cond><p2><sg> ҫи>{Ă}тт{Ă}н 0,000000
ҫиӗттӗн is generated instead of ҫийӗттӗн
$ echo "ҫи<v><tv><cond><p3><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><cond><p3><sg> ҫи>{ӗ}ҫҫ{ӗ} 0,000000
ҫиӗҫҫӗ is generated instead of ҫийӗҫҫӗ
$ echo "ҫи<v><tv><cond><p1><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><cond><p1><pl> ҫи>{Ă}тт{Ă}м{Ă}р 0,000000
ҫиӗттӗмӗр is generated instead of ҫийӗттӗмӗр
$ echo "ҫи<v><tv><cond><p2><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><cond><p2><pl> ҫи>{Ă}тт{Ă}р 0,000000
ҫиӗттӗр is generated instead of ҫийӗттӗр
$ echo "ҫи<v><tv><cond><p3><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><cond><p3><pl> ҫи>{ӗ}ҫҫ{ӗ}ҫ 0,000000
ҫиӗҫҫӗҫ is generated instead of ҫийӗҫҫӗҫ
$ echo "ҫи<v><tv><imp><p2><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
ҫи<v><tv><imp><p2><pl> ҫи>{Ă}р 0,000000
ҫиӗр is generated instead of ҫийӗр
One of the two possible dative forms of музей is музея (the other is музее), but музеа is recognised/generated instead. Vowel + а requires an epenthetic й: музея.
There are three typical forms of adjective nominalisation:
The -скер suffix is also known and implemented.
This third way is less used than the other two, but it also happens.
The issue here is that the analysis currently stands for both +и and +∅. We should differentiate between the two. For the first one I would use +и<subst%>, as is done for скер.
Any problem?
There are problems when joining <iter> to the suffix of present and conditional tenses. For example:
$ echo "кил<v><iv><iter><pres><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кил<v><iv><iter><pres><p1><sg> кил>к{A}л{A}>{A}т>{Ă}п 0,000000
$ aq-morftest -ci v_кил_del.yaml | grep "<iter>"
[FAIL] кил<v><iv><iter><pres><p1><sg> => missing results: килкелетӗп
[FAIL] кил<v><iv><iter><pres><p1><sg> => unexpected results: килкелеетӗп
[FAIL] кил<v><iv><iter><pres><p2><sg> => missing results: килкелетӗн
[FAIL] кил<v><iv><iter><pres><p2><sg> => unexpected results: килкелеетӗн
[FAIL] кил<v><iv><iter><pres><p3><sg> => missing results: килкелет
[FAIL] кил<v><iv><iter><pres><p3><sg> => unexpected results: килкелеет
[FAIL] кил<v><iv><iter><pres><p1><pl> => missing results: килкелетпӗр
[FAIL] кил<v><iv><iter><pres><p1><pl> => unexpected results: килкелеетпӗр
[FAIL] кил<v><iv><iter><pres><p2><pl> => missing results: килкелетӗр
[FAIL] кил<v><iv><iter><pres><p2><pl> => unexpected results: килкелеетӗр
[FAIL] кил<v><iv><iter><pres><p3><pl> => missing results: килкелеҫҫӗ
[FAIL] кил<v><iv><iter><pres><p3><pl> => unexpected results: килкелееҫҫӗ
$ echo "кала<v><tv><iter><pres><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кала<v><tv><iter><pres><p1><sg> кала>к{A}л{A}>{A}т>{Ă}п 0,000000
$ aq-morftest -ci v_кала_del.yaml | grep "<iter>"
[FAIL] кала<v><tv><iter><pres><p1><sg> => missing results: калакалатӑп
[FAIL] кала<v><tv><iter><pres><p1><sg> => unexpected results: калакалаатӑп
[FAIL] кала<v><tv><iter><pres><p2><sg> => missing results: калакалатӑн
[FAIL] кала<v><tv><iter><pres><p2><sg> => unexpected results: калакалаатӑн
[FAIL] кала<v><tv><iter><pres><p3><sg> => missing results: калакалать
[FAIL] кала<v><tv><iter><pres><p3><sg> => unexpected results: калакалаать
[FAIL] кала<v><tv><iter><pres><p1><pl> => missing results: калакалатпӑр
[FAIL] кала<v><tv><iter><pres><p1><pl> => unexpected results: калакалаатпӑр
[FAIL] кала<v><tv><iter><pres><p2><pl> => missing results: калакалатӑр
[FAIL] кала<v><tv><iter><pres><p2><pl> => unexpected results: калакалаатӑр
[FAIL] кала<v><tv><iter><pres><p3><pl> => missing results: калакалаҫҫӗ
[FAIL] кала<v><tv><iter><pres><p3><pl> => unexpected results: калакалааҫҫӗ
$ echo "кил<v><iv><iter><cond><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кил<v><iv><iter><cond><p1><sg> кил>к{A}л{A}>{Ă}т>т{Ă}м 0,000000
$ aq-morftest -ci v_кил_del.yaml | grep "<iter><cond>"
[FAIL] кил<v><iv><iter><cond><p1><sg> => missing results: килкелӗттӗм
[FAIL] кил<v><iv><iter><cond><p1><sg> => unexpected results: килкелеӗттӗм
[FAIL] кил<v><iv><iter><cond><p2><sg> => missing results: килкелӗттӗн
[FAIL] кил<v><iv><iter><cond><p2><sg> => unexpected results: килкелеӗттӗн
[FAIL] кил<v><iv><iter><cond><p3><sg> => missing results: килкелӗччӗ
[FAIL] кил<v><iv><iter><cond><p3><sg> => unexpected results: килкелеӗччӗ
[FAIL] кил<v><iv><iter><cond><p1><pl> => missing results: килкелӗттӗмӗр
[FAIL] кил<v><iv><iter><cond><p1><pl> => unexpected results: килкелеӗттӗмӗр
[FAIL] кил<v><iv><iter><cond><p2><pl> => missing results: килкелӗттӗр
[FAIL] кил<v><iv><iter><cond><p2><pl> => unexpected results: килкелеӗттӗр
[FAIL] кил<v><iv><iter><cond><p3><pl> => missing results: килкелӗччӗҫ
[FAIL] кил<v><iv><iter><cond><p3><pl> => unexpected results: килкелеӗччӗҫ
$ echo "кала<v><tv><iter><cond><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кала<v><tv><iter><cond><p1><sg> кала>к{A}л{A}>{Ă}т>т{Ă}м 0,000000
$ aq-morftest -ci v_кала_del.yaml | grep "<iter><cond>"
[FAIL] кала<v><tv><iter><cond><p1><sg> => missing results: калакалӑттӑм
[FAIL] кала<v><tv><iter><cond><p1><sg> => unexpected results: калакалаӑттӑм
[FAIL] кала<v><tv><iter><cond><p2><sg> => missing results: калакалӑттӑн
[FAIL] кала<v><tv><iter><cond><p2><sg> => unexpected results: калакалаӑттӑн
[FAIL] кала<v><tv><iter><cond><p3><sg> => missing results: калакалӗччӗ
[FAIL] кала<v><tv><iter><cond><p3><sg> => unexpected results: калакалаӗччӗ
[FAIL] кала<v><tv><iter><cond><p1><pl> => missing results: калакалӑттӑмӑр
[FAIL] кала<v><tv><iter><cond><p1><pl> => unexpected results: калакалаӑттӑмӑр
[FAIL] кала<v><tv><iter><cond><p2><pl> => missing results: калакалӑттӑр
[FAIL] кала<v><tv><iter><cond><p2><pl> => unexpected results: калакалаӑттӑр
[FAIL] кала<v><tv><iter><cond><p3><pl> => missing results: калакалӗччӗҫ
[FAIL] кала<v><tv><iter><cond><p3><pl> => unexpected results: калакалаӗччӗҫ
Verbs like кай or выля and affixes like Ай give a lot of work not because they are "irregular" but because of the stupid alien Stalinist orthography of Chuvash, that uses я, ю et al. Maybe it would be easier to have in the lexc forms like выльа, and to work in twol with pre-1936 orthography, i.e. producing e.g. кайатӑп. A final step could change all йа and ьа to я. Wouldn't it be easier?
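The proposed final step could look roughly like this. It is a minimal sketch of the idea only: the function name is hypothetical, only the йа/ьа to я rewrite mentioned above is covered (not ю and the other letters), and upper-case forms are ignored.

```python
import re


def to_modern_orthography(form: str) -> str:
    """Rewrite pre-1936 style йа / ьа sequences as я."""
    return re.sub("[йь]а", "я", form)


for form in ["кайатӑп", "калайатӑп", "выльа"]:
    print(form, "->", to_modern_orthography(form))
```

With such a step, a twol output like калайатӑп (currently reported as an unexpected result for <abil>) would be turned into the expected калаятӑп.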
A suffix АллА is called "allative". The problem is that the vowels previous to it are not deleted or no epenthetic consonants are added:
пулӑ<n><all> = пуллӑалла instead of пуллалла
пулӑ<n><pl><all> = пулӑсеелле instead of пулӑсенелле
лаша<n><all> = лашаалла instead of лашалла
лаша<n><pl><all> = лашасеелле instead of лашасенелле
музей<n><all> = музейелле or музейалла instead of музеелле or музеялла
It could be added that, according to Chuvash grammarians, this is the dative case + (л)лА. This can help understand the rules about duplication of л in пуллалла and the н in пулӑсенелле.
There are problems in p3.sg and p3.pl in the past tense.
According to Krueger (p. 144) ч appears in p3 of stems in /l n r/ (actually he says p3.sg but in the examples he gives we can see that it also happens in p3.pl). So:
$ echo "кай<v><iv><past><p3><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кай<v><iv><past><p3><sg> кай>{T}>ӗ 0,000000
Currently кайчӗ is generated instead of кайрӗ
$ echo "кай<v><iv><past><p3><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кай<v><iv><past><p3><pl> кай>{T}>ӗҫ 0,000000
Currently кайчӗҫ is generated instead of кайрӗҫ
(By the way, https://www.sapatlav.club is useful for conjugating verbs in Chuvash)
Sovietisms with these unstressed final vowels should form their allatives like similar Chuvash words ending in ӑ or ӗ. So:
литература<n><all> = литературалла instead of литературӑналла (cf, пуллалла from пулӑ)
няня<n><all> = нянялла instead of няньӑналла
аллея<n><all> = аллеялла instead of аллейӑналла
министерство<n><all> = министерствалла instead of министерствӑналла
училище<n><all> = училищелле instead of училищӗнелле
E.g.,
[FAIL] тетрадь<n><pl>+сӑр<post> => missing results: тетрадьсемсӗр
[FAIL] тетрадь<n><pl>+сӑр<post> => unexpected results: тетрадьсесӗр
[FAIL] июнь<n><pl>+сӑр<post> => missing results: июньсемсӗр
[FAIL] июнь<n><pl>+сӑр<post> => unexpected results: июньсесӗр
[FAIL] кукӑль<n><pl>+сӑр<post> => missing results: кукӑльсемсӗр
[FAIL] кукӑль<n><pl>+сӑр<post> => unexpected results: кукӑльсесӗр
[FAIL] тӗн<n><pl>+сӑр<post> => missing results: тӗнсемсӗр
[FAIL] тӗн<n><pl>+сӑр<post> => unexpected results: тӗнсесӗр
[FAIL] ял<n><pl>+сӑр<post> => missing results: ялсемсӗр
[FAIL] ял<n><pl>+сӑр<post> => unexpected results: ялсесӗр
[FAIL] ача<n><pl>+сӑр<post> => missing results: ачасемсӗр
[FAIL] ача<n><pl>+сӑр<post> => unexpected results: ачасесӗр
[FAIL] ҫыру<n><pl>+сӑр<post> => missing results: ҫырусемсӗр
[FAIL] ҫыру<n><pl>+сӑр<post> => unexpected results: ҫырусесӗр
All missing м.
Ту is defined as:
ту:ту%{в%} V-TV;!""
But {в} is not working properly in many cases.
ту<v><tv><pres><p1><sg> generates туатӑп instead of тӑватӑп
ту<v><tv><pres><p2><sg> generates туатӑн instead of тӑватӑн
ту<v><tv><pres><p3><sg> generates туать instead of тӑвать
ту<v><tv><pres><p1><pl> generates туатпӑр instead of тӑватпӑр
ту<v><tv><pres><p2><pl> generates туатӑр instead of тӑватӑр
(But p3.pl is correctly generated: тӑваҫҫӗ)
There is a similar problem for the future:
ту<v><tv><fut><p1><sg> generates тувӑп instead of тӑвӑп
ту<v><tv><fut><p3><sg> generates тувӗ instead of тӑвӗ
ту<v><tv><fut><p1><pl> generates тувӑпӑр instead of тӑвӑпӑр
ту<v><tv><fut><p2><pl> generates тувӑр instead of тӑвӑр
ту<v><tv><fut><p3><pl> generates тувӗҫ instead of тӑвӗҫ
(But p2.sg is correctly generated: тӑвӑн)
For some persons in imperative:
ту<v><tv><imp><p3><sg> generates тувтӑр instead of тутӑр
ту<v><tv><imp><p2><pl> generates тувӑр instead of тӑвӑр
ту<v><tv><imp><p3><pl> generates тувччӑр instead of туччӑр
More:
$ echo "^ту<v><tv><gpr_pres>$ " | hfst-proc -g chv.autogen.hfst
туакан
Should be тӑвакан
(But ту<v><tv><gpr_fut> is correctly generated: тӑвас)
The same problems can be found for тӳ,
05:12 <spectie> $ echo килнине | hfst-lookup chv.automorf.hfst
05:12 <spectie> килнине кил<v><iv><ger_past><px3sp><dat> 0,000000
05:13 <spectie> but not the one that Luutonen marks as come-PST.PTCP-I-DAT/ACC
05:13 <spectie> page 51
05:14 <firespeaker> well he says on p.57-58 that it can thought of synchronically as px3
05:14 <firespeaker> I guess what you want is
05:14 <firespeaker> кил<v><iv><gpr_past><subst><dat>
05:15 <spectie> but then there are some cases where it should surface as -ĕ- but surfaces as -и-
05:16 <firespeaker> yeah, he says on p. 55 that they PARTIALLY overlap in form
Looking at the differences in the analysis after the fix of #27, I have noticed two word forms that are not analysed now. I cannot understand why, because I cannot see the relationship with the fix, but it seems that it is a very strange side effect. Here are the two forms I noticed:
$ echo "выля<v><tv><pres><p3><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
выля<v><tv><pres><p3><pl> выл{й}я>{A}ҫҫӗ 0,000000
Now выляаҫҫӗ is generated instead of выляҫҫӗ.
Notice that the change should be, as I understand:
в ы л {й}:й я:0 > {A}:а ҫ ҫ ӗ
$ echo "выля<v><tv><gna_impf>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
выля<v><tv><gna_impf> выл{й}я>с{A} 0,000000
Now выльӑса is generated instead of выляса
I have been investigating the substantivisation suffix и. It was not very clear.
As you quote from Luutonen in #5, it is very close to the third person affix (and, I guess, maybe comes from it), but there are differences from it.
лайӑх<adj>+и<subst><dat> : лайӑххин, cf. информаци<n><px3sp><gen> : информацийӗн
So:
{ː} to all adjectives ending in VC[ӑӗ] (as is done for nouns), and it seems to work
{ː} to all adjectives ending in a single consonant. I tried it, but the problem is that duplication now happens with other suffixes, for instance +ах. There is duplication only for +и (it seems).
Tests can be found in и_subst.yaml. The results below are from a test with лайӑх defined as лайӑх:лайӑх%{ː%} A2; (but пысӑк, пӗчӗк and аван defined without {ː}). Notice how the test passes for лайӑх+и, for instance, but not for лайӑх<adj>+ах<mod>, while for аван (defined without {ː}), the situation is the opposite.
$ aq-morftest и_subst.yaml | more
--------------------------------------
Test 0: и <subst> (Lexical/Generation)
--------------------------------------
[PASS] лайӑх<adj>+и<subst><nom> => лайӑххи
[FAIL] пысӑк<adj>+и<subst><nom> => missing results: пысӑкки
[FAIL] пысӑк<adj>+и<subst><nom> => unexpected results: пысӑкӗ
[FAIL] пӗчӗк<adj>+и<subst><nom> => missing results: пӗчӗкки
[FAIL] пӗчӗк<adj>+и<subst><nom> => unexpected results: пӗчӗкӗ
[FAIL] аван<adj>+и<subst><nom> => missing results: аванни
[FAIL] аван<adj>+и<subst><nom> => unexpected results: аванӗ
[PASS] тутлӑ<adj>+и<subst><nom> => тутли
[PASS] ҫурӑ<adj>+и<subst><nom> => ҫурри
[PASS] хура<adj>+и<subst><nom> => хури
[PASS] хитре<adj>+и<subst><nom> => хитри
[FAIL] лайӑх<adj>+и<subst><dat> => missing results: лайӑххин
[FAIL] лайӑх<adj>+и<subst><dat> => unexpected results: лайӑххине
[PASS] лайӑх<adj>+и<subst><ins> => лайӑххипе
[PASS] лайӑх<adj>+и<subst><pl><nom> => лайӑххисем
[PASS] лайӑх<adj>+и<subst><pl><gen> => лайӑххисен
[FAIL] лайӑх<adj>+ах<mod> => missing results: лайӑхах
[FAIL] лайӑх<adj>+ах<mod> => unexpected results: лайӑххах
[PASS] лайӑх<adj><comp> => лайӑхрах
[PASS] аван<adj>+ах<mod> => аванах
[PASS] аван<adj><comp> => авантарах
[FAIL] аван<adj><comp> => unexpected results: аванрах
Test 0 - Passes: 11, Fails: 11, Total: 22
P.S.
I was at the Humanities Institute discussing it with the whole philology department. They do not agree on whether a possessive can come after this substantivisation suffix. If it can, it should be extremely rare, and I have not found such cases in our corpus, so I removed the possibility of adding possessives after this suffix (I had added it a few days ago; now it is as it was before).
It looks like <abe> has been changed to +сӑр<post> and <term> has been changed to <ter>. I'm curious when these changes were made and what the reasoning for the changes was.
"кил" v iv past p3 sg is килчӗ instead of килтӗ
"кил" v iv past p3 pl is килчӗҫ instead of килтӗҫ
See http://wiki.apertium.org/wiki/%D0%9A%D0%B8%D0%BB
( %{T%} changes to т for the other persons in such a verb with final л, but to ч in p3.sg and p3.pl)
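The behaviour of %{T%} described in this and the related past tense reports can be summarised in a small sketch. The л/н/р cases follow Krueger as cited; returning р for every other stem is an assumption generalised from the кай and ту examples (кайрӗ, турӑм) and may not hold for all stem types. The function name is hypothetical.

```python
# Stem-final consonants after which {T} does not surface as р (Krueger, p. 144).
SONORANT_FINALS = set("лнр")


def realise_T(stem: str, person: int) -> str:
    """Return the surface consonant of {T} in the past tense for a stem and person (1-3)."""
    if stem[-1].lower() in SONORANT_FINALS:
        return "ч" if person == 3 else "т"
    # Assumption: all other stems take р, as кай and ту do.
    return "р"


print("кил", realise_T("кил", 1), realise_T("кил", 3))
print("кай", realise_T("кай", 3))
```

The same consonant is used in singular and plural, so килчӗ and килчӗҫ both take ч.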
Currently apertium-chv generates two forms of some words with <px2sg><dat>, e.g.
^алӑк<n><px2sg><dat>$ ↬ алӑкуна/алӑкна
If both of these are correct, we need to choose which one to generate (my inclination would be to go with the former, since it will be distinct from px3sp.dat in many situations). If only one is correct, which is it?
A new Chuvash grammar textbook is being prepared on the basis of a 3M+ word corpus and our morphological analysis. The author is asking for a composite output of the modes chv-morph and chv-segment in which he could more easily search for specific surface forms of morphemes.
For instance, currently we have these two analyses for ачисен:
$ echo "ачисен" | apertium -d . chv-morph
^ачисен/ача<n><px3sp><pl><gen>$^./.<sent>$
$ echo "ачисен" | apertium -d . chv-segment
^ачисен/ач>и>се>н$^./.$
He is asking for something like this:
^ачисен/ача<n>и<px3sp>се<pl>н<gen>$
This request seems reasonable and would probably be useful for other people and languages.
Could this more or less easily be done?
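As a rough sketch of what such a merge could look like as a post-processing step (not an existing Apertium mode), the two outputs can be combined outside the pipeline. The function name is hypothetical, and the sketch assumes that the first tag is the part of speech and that every remaining tag corresponds to exactly one segment after the stem, which will not hold for all analyses.

```python
import re


def merge_analysis(analysis: str, segmentation: str) -> str:
    """Interleave suffix segments with tags, e.g. for ача<n><px3sp><pl><gen> and ач>и>се>н."""
    match = re.fullmatch(r"([^<]+)((?:<[^<>]+>)+)", analysis)
    if match is None:
        raise ValueError("unexpected analysis: " + analysis)
    lemma = match.group(1)
    tags = re.findall(r"<[^<>]+>", match.group(2))
    _stem, *suffixes = segmentation.split(">")
    pos, rest = tags[0], tags[1:]
    if len(rest) != len(suffixes):
        # No one-to-one alignment between tags and segments: give up.
        raise ValueError("cannot align tags and segments")
    return lemma + pos + "".join(seg + tag for seg, tag in zip(suffixes, rest))


print(merge_analysis("ача<n><px3sp><pl><gen>", "ач>и>се>н"))
```

Analyses where one tag has no surface segment, or one segment carries several tags, would need a proper alignment inside the transducer instead.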
In some verbal forms the final а of the verb кала is not deleted:
$ echo "кала<v><iv><fut><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><fut><p1><sg> кала>{Ă}п 0,000000
калаӑп is generated instead of калӑп
$ echo "кала<v><iv><fut><p2><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><fut><p2><sg> кала>{Ă}н 0,000000
калан is generated instead of калӑн
$ echo "кала<v><iv><fut><p1><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><fut><p1><pl> кала>{Ă}п{Ă}р 0,000000
калаӑпӑр is generated instead of калӑпӑр
$ echo "кала<v><iv><fut><p2><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><fut><p2><pl> кала>{Ă}р 0,000000
калаӑр is generated instead of калӑр
$ echo "кала<v><iv><cond><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><cond><p1><sg> кала>{Ă}тт{Ă}м 0,000000
калаӑттӑм is generated instead of калӑттӑм
$ echo "кала<v><iv><cond><p2><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><cond><p2><sg> кала>{Ă}тт{Ă}н 0,000000
калаӑттӑн is generated instead of калӑттӑн
$ echo "кала<v><iv><cond><p1><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><cond><p1><pl> кала>{Ă}тт{Ă}м{Ă}р 0,000000
калаӑттӑмӑр is generated instead of калӑттӑмӑр
$ echo "кала<v><iv><cond><p2><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><cond><p2><pl> кала>{Ă}тт{Ă}р 0,000000
калаӑттӑр is generated instead of калӑттӑр
$ echo "кала<v><iv><imp><p1><sg>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><imp><p1><sg> кала>{A}м 0,000000
калаам is generated instead of калам
$ echo "кала<v><iv><imp><p2><pl>" | hfst-lookup .deps/chv.LR.lexc.hfst 2> /dev/null
кала<v><iv><imp><p2><pl> кала>{Ă}р 0,000000
калаӑр is generated instead of калӑр
There are problems with +ах when it follows something ending in a vowel. Here are some cases:
$ echo "урам<n><px3sp><nom>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
урам<n><px3sp><nom>+ах<mod> урам>{и}{н}>{A}х 0,000000
Generates урамнех instead of урамех
$ echo "пулӑ<n><nom>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
пулӑ<n><nom>+ах<mod> пул{ː}ӑ>{A}х 0,000000
Generates пуллӑах instead of пулах (notice also the lack of gemination)
$ echo "пулӑ<n><dat>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
пулӑ<n><dat>+ах<mod> пул{ː}ӑ>{N}{A}>{A}х 0,000000
Generates пуллаах instead of пуллах.
$ echo "кӗнеке<n><nom>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кӗнеке<n><nom>+ах<mod> кӗнеке>{A}х 0,000000
Generates кӗнекеех instead of кӗнекех
$ echo "лаша<n><nom>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
лаша<n><nom>+ах<mod> лаша>{A}х 0,000000
Generates лашаах instead of лашах
$ echo "кӗнеке<n><px3sp><nom>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кӗнеке<n><px3sp><nom>+ах<mod> кӗнеке>{и}{н}>{A}х 0,000000
Generates кӗнекинех instead of кӗнекиех
$ echo "информаци<n><px3sp><nom>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
информаци<n><px3sp><nom>+ах<mod> информаци{й}>{и}{н}>{A}х 0,000000
Generates информацинех instead of информациех
$ echo "правительство<n><nom>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
правительство<n><nom>+ах<mod> правительств{о}>{A}х 0,000000
Generates правительствоах instead of правительствах
$ echo "кофе<n><nom>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кофе<n><nom>+ах<mod> коф{е}>{A}х 0,000000
Generates кофеех instead of кофех
$ echo "кил<v><iv><pres><p3><pl>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кил<v><iv><pres><p3><pl>+ах<mod> кил>{A}ҫҫӗ>{A}х 0,000000
Generates килеҫҫӗех instead of килеҫҫех
$ echo "кил<v><iv><ger_nec>+ах<mod>" | hfst-lookup .deps/chv.LR.lexc.hfst 2>/dev/null
кил<v><iv><ger_nec>+ах<mod> кил>м{A}лл{A}>{A}х 0,000000
Generates килмеллеех instead of килмеллех
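The expected forms above suggest a simple surface pattern, sketched below for illustration only. It works on already inflected surface forms rather than on the lexc/twol representation; the vowel classes and the function name are assumptions, and letters such as я or ю are not handled.

```python
BACK = set("аӑуыо")
FRONT = set("еӗиӳэ")
DROPPED = set("аеӑӗо")   # final vowels that disappear before +ах; и is kept


def attach_ah(word: str) -> str:
    """Attach +ах/+ех to a surface form, dropping a final а, е, ӑ, ӗ or о."""
    vowels = [ch for ch in word.lower() if ch in BACK or ch in FRONT]
    # Harmony follows the last vowel of the word, even if that vowel is then dropped.
    front = bool(vowels) and vowels[-1] in FRONT
    base = word[:-1] if word[-1].lower() in DROPPED else word
    return base + ("ех" if front else "ах")


for word in ["лаша", "кӗнеке", "кӗнеки", "правительство", "килмелле"]:
    print(word, "->", attach_ah(word))
```

In twol terms this would correspond to deleting the stem-final vowel before {A} rather than deleting {A} itself, with {н} not surfacing before +ах.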