apertium / apertium-tyv
Apertium linguistic data for Tuvinian
License: GNU General Public License v3.0
Tuvan: apertium-tyv
===============================================================================

This is an Apertium monolingual language package for Tuvan. What you can use
this language package for:

* Morphological analysis of Tuvan
* Morphological generation of Tuvan
* Part-of-speech tagging of Tuvan

Requirements
===============================================================================

You will need the following software installed:

* lttoolbox (>= 3.3.0)
* apertium (>= 3.3.0)
* vislcg3 (>= 0.9.9.10297)
* hfst (>= 3.8.2)

If this does not make any sense, we recommend you look at: apertium.org

Compiling
===============================================================================

Once the requirements are installed, you should be able to just run:

    $ ./configure
    $ make

You can use ./autogen.sh instead of ./configure if you're compiling from SVN.

If you're doing development, you don't have to install the data; you can use
it directly from this directory.

If you are installing this language package as a prerequisite for an Apertium
translation pair, then do (typically as root / with sudo):

    # make install

You can give a --prefix to ./configure to install as a non-root user, but make
sure to use the same prefix when installing the translation pair and any other
language packages.

Testing
===============================================================================

If you are in the source directory after running make, the following commands
should work:

    $ echo "TODO: test sentence" | apertium -d . tyv-morph
    TODO: test analysis result

    $ echo "TODO: test sentence" | apertium -d . tyv-tagger
    TODO: test tagger result

Files and data
===============================================================================

* apertium-tyv.tyv.dix - Monolingual dictionary
* apertium-tyv.tyv.lexc - Morphotactic dictionary
* apertium-tyv.tyv.twol - Morphophonological rules
* apertium-tyv.tyv.rlx - Constraint Grammar disambiguation rules
* apertium-tyv.post-tyv.dix - Post-generator
* tyv.prob - Tagger model
* modes.xml - Translation modes

For more information
===============================================================================

* https://wiki.apertium.org/wiki/Installation
* https://wiki.apertium.org/wiki/apertium-tyv
* https://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary

Help and support
===============================================================================

If you need help using this language package or data, you can contact:

* Mailing list: [email protected]
* IRC: #apertium on irc.oftc.net

See also the file AUTHORS included in this distribution.
In my generated paradigm for кел, some forms have a double <perf> tag:
келиптипкен:кел<v><iv><perf><perf><ger_perf><nom>
келивитпепкен:кел<v><iv><perf><neg><perf><ger_perf><nom>
келиптипкеш:кел<v><iv><perf><perf><gna_perf>
келивитпепкеш:кел<v><iv><perf><neg><perf><gna_perf>
келиптиптер:кел<v><iv><perf><perf><p3><sg>
келивитпептер:кел<v><iv><perf><neg><perf><p3><sg>
Plus all of their inflections by case, person, etc. This looks like a doubled -{I}pt{I} suffix; I'm not sure whether Tuvan allows this.
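To list every affected form in a generated paradigm rather than spotting them by eye, a small script can flag analyses in which a tag occurs more than once (this catches both <perf><perf> and <perf><neg><perf>). This is a minimal sketch, assuming the paradigm is available as `surface:analysis` lines like the ones above; `repeated_tags` and `scan_paradigm` are hypothetical helpers, not part of apertium-tyv.

```python
import re

# One Apertium-style tag, e.g. "<perf>" -> "perf".
TAG_RE = re.compile(r"<([^<>]+)>")


def repeated_tags(analysis):
    """Return the tags that occur more than once in one analysis.

    Each '+'-joined part (e.g. a copula clitic "+э<cop>...") is checked on
    its own, so the same tag appearing in two different parts is not reported.
    """
    found = []
    for part in analysis.split("+"):
        seen = set()
        for tag in TAG_RE.findall(part):
            if tag in seen and tag not in found:
                found.append(tag)
            seen.add(tag)
    return found


def scan_paradigm(lines):
    """Yield (surface, analysis, repeated_tags) for each suspicious line.

    Lines are assumed to have the form "surface:analysis".
    """
    for line in lines:
        surface, sep, analysis = line.strip().partition(":")
        if not sep:
            continue  # not a surface:analysis line
        dup = repeated_tags(analysis)
        if dup:
            yield surface, analysis, dup


if __name__ == "__main__":
    sample = [
        "келиптипкен:кел<v><iv><perf><perf><ger_perf><nom>",
        "келивитпепкеш:кел<v><iv><perf><neg><perf><gna_perf>",
        "келдилер:кел<v><iv><ifi><p3><pl>",
    ]
    for surface, analysis, dup in scan_paradigm(sample):
        print(surface, dup)
```

Running it prints the two forms with a repeated <perf> and skips келдилер. The script only detects repetition; whether a doubled -{I}pt{I} is actually ungrammatical still needs a speaker's judgement.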
modes.xml includes some modes with install="yes", but the files they require aren't installed.

Some generic suggestions:
- The -lexc and -twol modes probably aren't useful to users.
- The -spell modes should depend on --enable-ospell.
- .deps files are never installed, so any modes that use them shouldn't be installed either.
Messages for package app-dicts/apertium-tyv-9999:
Failed to find '/usr/share/apertium/apertium-tyv/.deps/tyv.twol.hfst' in install image.
QA: missing files required for mode tyv-twol.
Failed to find '/usr/share/apertium/apertium-tyv/.deps/tyv.LR.lexc.hfst' in install image.
QA: missing files required for mode tyv-lexc.
Failed to find '/usr/share/apertium/apertium-tyv/tyv.zhfst' in install image.
QA: missing files required for mode tyv-spell.
Failed to find '/usr/share/apertium/apertium-tyv/.deps/acceptor.default.hfst' in install image.
QA: missing files required for mode tyv-tokenise.
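One way to catch this before packaging might be to read modes.xml and list installable modes that reference files which will not be installed. This sketch assumes the usual Apertium layout of `<mode name="..." install="yes">` elements containing `<file name="...">` entries; `risky_install_modes` and its `optional_files` parameter (for files such as tyv.zhfst that, per the suggestion above, would only exist with --enable-ospell) are hypothetical.

```python
import xml.etree.ElementTree as ET


def risky_install_modes(root, optional_files=()):
    """Map installable mode names to file references that may be missing.

    A mode is flagged when it has install="yes" and one of its <file name=...>
    entries points into .deps/ (never installed) or is listed in
    `optional_files` (files built only under some configure option).
    """
    flagged = {}
    for mode in root.iter("mode"):
        if mode.get("install") != "yes":
            continue
        bad = []
        for file_el in mode.iter("file"):
            name = file_el.get("name", "")
            if name.startswith(".deps/") or name in optional_files:
                bad.append(name)
        if bad:
            flagged[mode.get("name")] = bad
    return flagged


# Illustrative modes.xml fragment; the program names are placeholders.
SAMPLE = """
<modes>
  <mode name="tyv-morph" install="yes">
    <pipeline>
      <program name="hfst-proc -w"><file name="tyv.automorf.hfst"/></program>
    </pipeline>
  </mode>
  <mode name="tyv-twol" install="yes">
    <pipeline>
      <program name="hfst-fst2strings"><file name=".deps/tyv.twol.hfst"/></program>
    </pipeline>
  </mode>
  <mode name="tyv-spell" install="yes">
    <pipeline>
      <program name="hfst-ospell"><file name="tyv.zhfst"/></program>
    </pipeline>
  </mode>
</modes>
"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    print(risky_install_modes(root, optional_files={"tyv.zhfst"}))
```

For the real file, `ET.parse("modes.xml").getroot()` can replace the inline sample. Which files count as optional is a packaging decision that the script cannot infer on its own.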
Some imperatives are still broken.
This includes the following regressions because of #2:
> 2 ^чугаалаваайн/*чугаалаваайн$
> 2 ^сагындырбаайн/*сагындырбаайн$
> 1 ^кортпаайн/*кортпаайн$
> 1 ^чугаалаваайн/*чугаалаваайн$
> 1 ^чорбаайн/*чорбаайн$
> 1 ^чажырбаайн/*чажырбаайн$
> 1 ^узуткаваайн/*узуткаваайн$
> 1 ^барбаайн/*барбаайн$
> 1 ^чажырбаайн/*чажырбаайн$
> 1 ^тайылбырлаваайн/*тайылбырлаваайн$
> 1 ^адаваайн/*адаваайн$
And the following form from tests/verbs.yaml:
[1/3][FAIL] саг<v><tv><imp><p1><du> => Missing results: саалы
[1/3][FAIL] саг<v><tv><imp><p1><du> => Unexpected results: сааалы
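Counts like the ones in the regression list above can be produced from the analyser's output on a corpus (e.g. `apertium -d . tyv-morph`), because unknown words appear in the Apertium stream format as `^surface/*surface$`. A minimal sketch, assuming plain stream text without escaped `^`, `/` or `$`; `unknown_words` is a hypothetical helper.

```python
import re
from collections import Counter

# One lexical unit in the Apertium stream format: ^surface/analyses$
# Simplification: escaped characters (\^, \/, \$) are not handled.
LU_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def unknown_words(stream):
    """Count surface forms that the analyser marked as unknown ("/*surface")."""
    counts = Counter()
    for surface, analyses in LU_RE.findall(stream):
        if analyses.startswith("*"):
            counts[surface] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "^чорбаайн/*чорбаайн$ ^барбаайн/*барбаайн$ "
        "^чорбаайн/*чорбаайн$"
    )
    for word, n in unknown_words(sample).most_common():
        print(n, word)
```

On the sample this prints `2 чорбаайн` followed by `1 барбаайн`. Running the same count before and after a change, and diffing the two lists, gives a regression list of the kind quoted above.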
Two more forms in Iskhakov & Pal'mbakh are not being generated:
Narrative past tense in -п-тыр, i.e. the historical / non-witnessed / unexpected past; in Tuvan, эрткен үэниң медээ хевири (I&P 373). The book says it is a past tense used to describe a sudden occurrence.
кээп-тир мен
кээп-тир сен
кээп-тир
кээп-тир бис
кээп-тир силер
кээп-тирлер
Without the hyphen, the analyzer can parse кээптир мен as кел<v><iv><perf><aor><p1><sg>, but the generator produces келиптир мен for that analysis.
Past-present tense in -пышаан (I&P 379). It appears to denote an action that started in the past and is still going on; Anderson & Harrison annotate it as durative.
келбишаан мен
келбишаан сен
келбишаан
келбишаан бис
келбишаан силер
келбишааннар
There is an analysis for келбишаан as a verbal adverb, though: кел<v><iv><gna_still>. Are these forms considered analytic?
Opening an issue per @ftyers' request.
чор:чор<v><iv><aor><p3><sg>
Is this inflection correct, or should it be чоор?
I've been comparing Apertium-generated paradigms with the ones in the Iskhakov & Pal'mbakh (1961) grammar (Ф. Г. Исхаков, А. А. Пальмбах. Грамматика тувинского языка: Фонетика и морфология. [A Grammar of the Tuvan Language: Phonetics and Morphology]) and found some mismatches.
Disclaimer: I am not a speaker of Tuvan.
Some Apertium-generated imperative forms for кел:
келеалыңар:кел<v><iv><imp><p1><pl>
келейн:кел<v><iv><imp><p1><sg>
келеалы:кел<v><iv><imp><p1><du>
The I&P book has келиилиңер, келийн, and келиили, respectively (pp. 391-392).
Some <p3><pl> forms have a double -лер. I haven't seen this in the literature, and it looked suspicious.
келдилер:кел<v><iv><ifi><p3><pl>
келдилерлер:кел<v><iv><ifi><p3><pl>
I&P has келдилер for this analysis (I&P 365), and Harrison (2000) has keldi(ler). The same pattern appears in other tenses:
келгеннер:кел<v><iv><ger_past><nom>+э<cop><aor><p3><pl>
келгеннерлер:кел<v><iv><ger_past><nom>+э<cop><aor><p3><pl>
келгендирлер:кел<v><iv><ger_past><nom>+э<cop><aor><evid><p3><pl>
келгендирлерлер:кел<v><iv><ger_past><nom>+э<cop><aor><evid><p3><pl>
...
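To find more forms like these, one heuristic is to group a generated paradigm by analysis and flag a form that equals another form with the same analysis plus one more plural suffix, where that shorter form already ends in a plural suffix. Requiring the shorter form to be plural itself avoids flagging the optional plural in keldi(ler). This is a sketch under stated assumptions: the suffix list is a hand-written approximation of Tuvan plural allomorphs, not taken from the apertium-tyv twol rules, and `doubled_plurals` is hypothetical.

```python
from collections import defaultdict

# Assumed plural-suffix allomorphs; an approximation for illustration only.
PLURAL_SUFFIXES = ("лар", "лер", "нар", "нер", "тар", "тер", "дар", "дер")


def ends_in_plural(form):
    return form.endswith(PLURAL_SUFFIXES)


def doubled_plurals(lines):
    """Return (form, analysis) pairs that look like a doubled plural.

    A form is flagged when, for the same analysis, the paradigm also contains
    the form minus one plural suffix, and that shorter form itself ends in a
    plural suffix (e.g. келдилер -> келдилерлер).
    """
    by_analysis = defaultdict(set)
    for line in lines:
        surface, sep, analysis = line.strip().partition(":")
        if sep:
            by_analysis[analysis].add(surface)
    flagged = []
    for analysis, forms in by_analysis.items():
        for form in sorted(forms):
            for suffix in PLURAL_SUFFIXES:
                stem = form[: -len(suffix)]
                if form.endswith(suffix) and stem in forms and ends_in_plural(stem):
                    flagged.append((form, analysis))
                    break
    return flagged


if __name__ == "__main__":
    sample = [
        "келдилер:кел<v><iv><ifi><p3><pl>",
        "келдилерлер:кел<v><iv><ifi><p3><pl>",
        "келгеннер:кел<v><iv><ger_past><nom>+э<cop><aor><p3><pl>",
        "келгеннерлер:кел<v><iv><ger_past><nom>+э<cop><aor><p3><pl>",
    ]
    for form, analysis in doubled_plurals(sample):
        print(form, analysis)
```

On the sample this flags келдилерлер and келгеннерлер but not their single-plural counterparts. Because it relies on both variants being generated for the same analysis, it will miss a doubled form that is the only output for its analysis.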
This is a list of generator errors that Aziyana Bayyr-ool identified while working on the error analysis for the shared task.
Incorrect inflections:
| Generated form (surface:analysis) | Correct form |
|---|---|
| ижиарлар:ижик<v><TD><aor><p3><pl> | ижигерлер |
| көрдүнүүлү:көрдүн<v><iv><imp><p1><du> | көрдүнээли |
| садырлар:сад<v><tv><aor><p3><pl> | садарлар |
| садыылы:сад<v><tv><imp><p1><du> | садаалы |
| тырылыйн:тырыл<v><TD><imp><p1><sg> | тырлыйн |
| тырылырлар:тырыл<v><TD><aor><p3><pl> | тырлырлар |
| ужуаалы:ужук<v><TD><imp><p1><du> | ужаалы |
| холужуптур бис:холуш<v><iv><perf><aor><p1><pl> | холужуптар бис |
| холужуптурлар:холуш<v><iv><perf><aor><p3><pl> | холужуптарлар |
| хоорулур:хоорул<v><iv><aor><p3><sg> | хоорлур |
| хоорулур мен:хоорул<v><iv><aor><p1><sg> | хоорлур мен |
| хоорулур сен:хоорул<v><iv><aor><p2><sg> | хоорлур сен |
| чыглыңар:чыыл<v><iv><imp><p2><pl> | чыглыылыӊар (see Note below) |
| шымыныр силер:шымын<v><TD><aor><p2><pl> | шымныр силер |
| мөгеейн:мөгей<v><iv><imp><p1><sg> | мөгейээйн (rare/unusual) |
| мөгееалы:мөгей<v><iv><imp><p1><du> | мөгейээли (rare/unusual) |
Note on чыглыңар:чыыл<v><iv><imp><p2><pl>: Aziyana says this form exists (meaning 'gather, all of you') but does not correspond to this lemma. The correct form for чыыл should be чыглыылыӊар ('let's gather').
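The corrections in the table could be turned into regression checks so that fixed forms stay fixed. A minimal sketch, assuming a hypothetical `generate` callable that returns the generator's surface forms for an analysis (for example a wrapper around the compiled generator); `check_generator` is also hypothetical, and its report lines are only loosely modelled on the tests/verbs.yaml output quoted earlier, not that tool's actual format.

```python
from typing import Callable, Dict, Iterable, List

# A few rows from the table above: analysis -> form the annotator expects.
EXPECTED: Dict[str, str] = {
    "сад<v><tv><aor><p3><pl>": "садарлар",
    "сад<v><tv><imp><p1><du>": "садаалы",
    "хоорул<v><iv><aor><p3><sg>": "хоорлур",
}


def check_generator(generate: Callable[[str], Iterable[str]],
                    expected: Dict[str, str] = EXPECTED) -> List[str]:
    """Compare generated forms with the expected ones.

    Every extra form is reported, so legitimate variants would also show up
    as "unexpected" and may need to be whitelisted.
    """
    report = []
    for analysis, form in expected.items():
        produced = set(generate(analysis))
        if form not in produced:
            report.append(f"[FAIL] {analysis} => Missing results: {form}")
        for extra in sorted(produced - {form}):
            report.append(f"[FAIL] {analysis} => Unexpected results: {extra}")
    return report


if __name__ == "__main__":
    # Stand-in for the current generator: the incorrect forms from the table.
    current = {
        "сад<v><tv><aor><p3><pl>": ["садырлар"],
        "сад<v><tv><imp><p1><du>": ["садыылы"],
        "хоорул<v><iv><aor><p3><sg>": ["хоорулур"],
    }
    for line in check_generator(lambda a: current.get(a, [])):
        print(line)
```

With the stand-in generator, every row reports both a missing and an unexpected form. Rows marked rare/unusual would probably need a looser rule than exact matching.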
Incorrect lemmas:
| Lemma in the lexicon | Correct lemma |
|---|---|
| номчун<v> | номчуттун |
| өпей<v> | өпейле (Aziyana says өпей exists too, but as a name) |
Forms that are plausible but rarely or never used, so Aziyana has doubts about them:
мөгеейн:мөгей<v><iv><imp><p1><sg>
мөгееалы:мөгей<v><iv><imp><p1><du>
аржаяйн:аржай<v><TD><imp><p1><sg>
арзаяйн:арзай<v><TD><imp><p1><sg>
мажаяйн:мажай<v><TD><imp><p1><sg>