apertium-uig's Introduction

Uyghur (apertium-uig)

This is an Apertium monolingual language package for Uyghur. You can use this language package for:

  • Morphological analysis of Uyghur
  • Morphological generation of Uyghur
  • Part-of-speech tagging of Uyghur

Requirements

You will need the following software installed:

  • lttoolbox (>= 3.3.0)
  • apertium (>= 3.3.0)
  • vislcg3 (>= 0.9.9.10297)

If this does not make any sense, we recommend you look at: apertium.org

Compiling

Once the requirements are installed, you should be able to run:

$ ./configure
$ make

You can use ./autogen.sh instead of ./configure if you're compiling from GitHub.

If you're doing development, you don't have to install the data; you can use it directly from this directory.

If you are installing this language package as a prerequisite for an Apertium translation pair, then run (typically as root or with sudo):

# make install

You can give a --prefix to ./configure to install as a non-root user, but make sure to use the same prefix when installing the translation pair and any other language packages.

Testing

If you are in the source directory after running make, the following commands should work:

$ echo "لېكىن بۇنىڭ مېھرى باشقىچە ئىسسىق بىلىندى."  | apertium -d . uig-morph
^لېكىن/لېكىن<cnjcoo>$ ^بۇنىڭ/بۇ<prn><dem><gen>$ ^مېھرى/مېھر<n><px3sp><nom>$ 
^باشقىچە/باش<n><ter>$ ^ئىسسىق/ئىسسىق<adj>/ئىسسىق<adj><advl>/ئىسسىق<adj><subst><nom>$ 
^بىلىندى/بىل<v><tv><pass><ifi><p3><sg>/بىل<v><tv><pass><ifi><p3><pl>$^./.<sent>$
$ echo "لېكىن بۇنىڭ مېھرى باشقىچە ئىسسىق بىلىندى."  | apertium -d . uig-tagger
^لېكىن/لېكىن<cnjcoo>$ ^بۇنىڭ/بۇ<prn><dem><gen>$ ^مېھرى/مېھر<n><px3sp><nom>$ 
^باشقىچە/باش<n><ter>$ ^ئىسسىق/ئىسسىق<adj><advl>$ ^بىلىندى/بىل<v><tv><pass><ifi><p3><sg>$
^./.<sent>$
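The output above is in the Apertium stream format, where each lexical unit is written as ^surface/analysis1/analysis2$. As a minimal sketch (not part of this package), the following Python snippet splits such output into surface forms and analyses and lists the forms that are still ambiguous. The names parse_stream and ambiguous are made up for this illustration, and the regular expression assumes that no escaped ^, / or $ characters occur inside a unit.

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$.
# Assumption: no escaped ^, / or $ inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_stream(text):
    """Return a list of (surface, [analyses]) pairs found in text."""
    units = []
    for match in LEXICAL_UNIT.finditer(text):
        surface = match.group(1)
        analyses = [a for a in match.group(2).split("/") if a]
        units.append((surface, analyses))
    return units


def ambiguous(units):
    """Return the surface forms that carry more than one analysis."""
    return [surface for surface, analyses in units if len(analyses) > 1]


# Three lexical units copied from the uig-morph output above.
sample = (
    "^ئىسسىق/ئىسسىق<adj>/ئىسسىق<adj><advl>/ئىسسىق<adj><subst><nom>$ "
    "^بىلىندى/بىل<v><tv><pass><ifi><p3><sg>/بىل<v><tv><pass><ifi><p3><pl>$"
    "^./.<sent>$"
)

units = parse_stream(sample)
for surface, analyses in units:
    print(surface, len(analyses))
```

Running the same function over the uig-tagger output should leave ambiguous() empty for this sentence, since the tagger keeps one analysis per word.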

Files and data

  • apertium-uig.uig.lexc - Lexicon file
  • apertium-uig.uig.twol - Phonological rules
  • uig.prob - Tagger model
  • apertium-uig.uig.rlx - Constraint Grammar disambiguation rules
  • apertium-uig.post-uig.dix - Post-generator
  • modes.xml - Translation modes

For more information

Help and support

If you need help using this language pair or data, you can contact:

See also the file AUTHORS included in this distribution.

apertium-uig's People

Contributors

connormayer, ftyers, ilnarselimcan, jonorthwash, koghuzhan, memduhg, mr-martian, sushain97, tinodidriksen, unhammer

apertium-uig's Issues

No morphology on numbers/punctuation

Quoting @koguzhan from the wiki:

Suffixes after numbers or characters like " << >> are currently not analyzed at all.

Copying the CRH or TUR solution might work, though I'm not sure how Uyghur usually combines numbers with morphology.

Missing analysis for "mu"

"<ئۇيغۇرلارمۇ>"
        "ئۇيغۇر" n pl nom
                "مۇ" qst

مۇ here should have an analysis similar to the Turkish -dA, as in Onlar da öğrenci, "they too are students."

imish forms

Context: nouns (nom, etc.), p3

  • base: doktur
    normal analysis: doktur<n><nom>
    linguistic analysis: doktur<n><nom>
  • full form: doktur imish
    normal analysis: doktur<n><nom> i<cop><aor><dub><p3><sg>
    linguistic analysis: doktur<n><nom> i<cop><aor><dub><p3><sg>
  • no space: dokturimish
    normal analysis: doktur<n><nom>+i<cop><aor><dub><p3><sg>
    linguistic analysis: doktur<n><nom>+i<cop><aor><dub><p3><sg>
  • reduced: dokturmish
    normal analysis: doktur<n><nom>+i<cop><aor><dub><p3><sg>
    linguistic analysis: doktur<n><nom>+i<cop_no_i><aor><dub><p3><sg>
  • with mi: (no form given)

Clitics not splitting off

LEXICON CLITICS
# ;
%<qst%>:%>ﻡۇ V-COP-PERS ;
%<qst%>:%>چۇ # ;
!!%<gm%>:%>%{D%}ۇﺭ # ;
%<comp%>:%>ﺭ%{A%}%{K%} # ;
%<dek%>:%>%{D%}ەﻙ # ;
%<che%>:%>چە # ;
%<dub%>:%>%{y%}ﻙەﻥ V-COP-PERS ;
%<postadv%>:%>مۇ # ;
%<cnjadv%>:%>كى # ; !

It looks like these clitics are not splitting off properly, causing their tags to be added to the previous word, e.g.
^سىرت<n><px3sp><loc><p3><sg><cnjadv>$
The solution is to split them off and make sure they are assigned their own lemmas:
+%+كى%<cnjadv%>:%>كى # ; !
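One way to spot this problem in analyzer output is to look for clitic tags that are not the first tag after their own lemma. The Python sketch below is an illustration only: stray_clitic_tags is a made-up helper, it checks only the cnjadv tag by default, it assumes that sub-readings are joined with + and that lemmas contain no + themselves, and the "fixed" string is an assumed shape for the analysis after the change, not output taken from the analyzer.

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def stray_clitic_tags(analysis, clitic_tags=("cnjadv",)):
    """Return clitic tags that sit in another word's tag chain.

    Sub-readings are assumed to be separated by '+'. A clitic tag is fine
    when it is the first tag after its own lemma and stray otherwise.
    """
    stray = []
    for sub in analysis.split("+"):
        tags = TAG.findall(sub)
        # Skip position 0: that tag directly follows the sub-reading's lemma.
        stray.extend(t for t in tags[1:] if t in clitic_tags)
    return stray


# Analysis quoted in the issue: the clitic tag is glued to the noun.
broken = "سىرت<n><px3sp><loc><p3><sg><cnjadv>"
# Assumed shape after the fix: the clitic has its own lemma.
fixed = "سىرت<n><px3sp><loc><p3><sg>+كى<cnjadv>"

print(stray_clitic_tags(broken))
print(stray_clitic_tags(fixed))
```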

Locative -dA expecting the wrong vowel

Words such as "musabiqe" and "téxnika", whose vowels conflict with vowel harmony, cause problems when they take the locative suffix, especially when they take both possessive and locative suffixes.
For example, تېخنىكىدا and مۇسابىقىسىدە are the correct forms, but the analyzer expects تېخنىكىدە and مۇسابىقىسىدا.
Update: roots ending in ى, such as كىشى, are also problematic. The analyzer treats ى as a back vowel, for example when suffixing -lAr, so it produces كىشىلار where كىشىلەر is the correct form.
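The expected behaviour can be sketched in a few lines of Python. This is a simplified illustration on Latin transliterations, not how the analyzer's twol rules are written: it assumes that a, o, u are back vowels, that e, ö, ü are front vowels, that i and é are neutral, and that the suffix vowel follows the last non-neutral vowel of the underlying (unreduced) stem. Stems with only neutral vowels need a class supplied from the lexicon; the "front" value for kishi is taken from the correct form given above. All function names are hypothetical.

```python
import unicodedata

BACK = set("aou")
FRONT = set("eöü")
# i and é are treated as neutral here: they do not decide harmony.


def harmony_class(stem, default=None):
    """Return 'back' or 'front' from the last non-neutral vowel of stem."""
    # NFC keeps é as one character so it is not mistaken for plain e.
    stem = unicodedata.normalize("NFC", stem).lower()
    for ch in reversed(stem):
        if ch in BACK:
            return "back"
        if ch in FRONT:
            return "front"
    if default is None:
        raise ValueError(f"{stem!r} has only neutral vowels; a lexical class is needed")
    return default


def low_vowel(stem, default=None):
    """Realise the archiphoneme A as 'a' after back stems, 'e' after front stems."""
    return "a" if harmony_class(stem, default) == "back" else "e"


print("-d" + low_vowel("téxnika"))                       # locative on a back stem
print("-d" + low_vowel("musabiqe"))                      # locative on a front stem
print("-l" + low_vowel("kishi", default="front") + "r")  # plural, lexical class
```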

Installed modes are missing files

modes.xml includes some modes with install="yes", but the required files aren't installed.

Some generic suggestions:

  • -lexc and -twol modes probably aren't useful to users

  • -spell modes should depend on --enable-ospell

  • .deps files are never installed, so any modes using them shouldn't be installed.

Messages for package app-dicts/apertium-uig-9999:

    Failed to find '/usr/share/apertium/apertium-uig/.deps/uig.twol.hfst' in install image.
    QA: missing files required for mode uig-twol.
    Failed to find '/usr/share/apertium/apertium-uig/.deps/uig.LR.lexc.hfst' in install image.
    QA: missing files required for mode uig-lexc.
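A check along these lines could catch the problem before packaging. The Python sketch below is an assumption-laden illustration, not part of the build system: installed_modes_using_deps is a made-up helper, and the sample XML is a hypothetical, simplified modes.xml (the program names in it are placeholders) that only mirrors the mode, install and file attributes.

```python
import xml.etree.ElementTree as ET


def installed_modes_using_deps(modes_xml):
    """Map each install="yes" mode name to the .deps/ files it references."""
    root = ET.fromstring(modes_xml)
    problems = {}
    for mode in root.iter("mode"):
        if mode.get("install") != "yes":
            continue
        deps = [
            f.get("name")
            for f in mode.iter("file")
            if f.get("name", "").startswith(".deps/")
        ]
        if deps:
            problems[mode.get("name")] = deps
    return problems


# Hypothetical, simplified modes.xml content for illustration only.
sample_modes = """
<modes>
  <mode name="uig-morph" install="yes">
    <pipeline>
      <program name="analyser"><file name="uig.automorf.bin"/></program>
    </pipeline>
  </mode>
  <mode name="uig-twol" install="yes">
    <pipeline>
      <program name="twol-debug"><file name=".deps/uig.twol.hfst"/></program>
    </pipeline>
  </mode>
  <mode name="uig-lexc" install="no">
    <pipeline>
      <program name="lexc-debug"><file name=".deps/uig.LR.lexc.hfst"/></program>
    </pipeline>
  </mode>
</modes>
"""

for name, files in installed_modes_using_deps(sample_modes).items():
    print(name, files)
```

In the sample, only uig-twol is reported: it is marked for installation and points at a file under .deps, which is never installed.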

Compile errors when CG isn't found

make: *** No rule to make target `no', needed by `uig.rlx.bin'.  Stop.

The "no" comes from the $(CGCOMP) variable in the Makefile. The Makefile should probably not try to compile the CG if the CG compiler is not available.

This probably applies to other languages too.

Vowel Harmony-ى

ى still causes problems with vowel harmony, especially when it interacts with special characters.

iken forms

Context: nouns (nom, etc.), p3

  • base: doktur
    normal analysis: doktur<n><nom>
    linguistic analysis: doktur<n><nom>
  • full form: doktur iken
    normal analysis: doktur<n><nom> i<cop><aor><evid><p3><sg>
    linguistic analysis: doktur<n><nom> i<cop><aor><evid><p3><sg>
  • no space: dokturiken
    normal analysis: doktur<n><nom>+i<cop><aor><evid><p3><sg>
    linguistic analysis: doktur<n><nom>+i<cop><aor><evid><p3><sg>
  • reduced: dokturken
    normal analysis: doktur<n><nom>+i<cop><aor><evid><p3><sg>
    linguistic analysis: doktur<n><nom>+i<cop_no_i><aor><evid><p3><sg>
  • with mi: doktur miken
    normal analysis: doktur<n><nom>+mi<qst>+i<cop><aor><evid><p3><sg>
    linguistic analysis: doktur<n><nom>+mi<qst>+i<cop><aor><evid><p3><sg>
