
apertium-afr-nld's Introduction

Apertium

Requirements

  • This package requires lttoolbox-3.5 to be installed on the system, as well as libxml and libpcre.

See https://apertium.org and https://wiki.apertium.org for more information on installing.

Description

When building, this package generates, among others, the following modules:

  • apertium-deshtml, apertium-desrtf, apertium-destxt Deformatters for html, rtf and txt document formats.
  • apertium-rehtml, apertium-rertf, apertium-retxt Reformatters for html, rtf and txt document formats.
  • apertium Translator program. Execute without parameters to see the usage.

Quick Start

There are binaries available for Debian, Ubuntu, Fedora, CentOS, OpenSUSE, Windows, and macOS. We package both nightly builds and releases. See https://wiki.apertium.org/wiki/Installation for more information. Only build from source if you either want to change this tool's behavior, or are on a platform we don't yet package for.

  1. Download the packages for lttoolbox-VERSION.tar.gz and apertium-VERSION.tar.gz and linguistic data

    Note: If you are using the translator from GitHub, run ./autogen.sh before running ./configure in all cases.

  2. Unpack lttoolbox and do ('#' means 'do that with root privileges'):

   $ cd lttoolbox-VERSION
   $ ./configure
   $ make
   # make install

  3. Unpack apertium and do:

   $ cd apertium-VERSION
   $ ./configure
   $ make
   # make install

  4. Unpack linguistic data (LING_DATA_DIR) and do:

   $ cd LING_DATA_DIR
   $ ./configure
   $ make
   and wait for a while (minutes).

  5. Use the translator

   USAGE: apertium [-d datadir] [-f format] [-u] <direction> [in [out]]
    -d datadir       directory of linguistic data
    -f format        one of: txt (default), html, rtf, odt, docx, wxml, xlsx, pptx,
                     xpresstag, html-noent, latex, latex-raw
    -a               display ambiguity
    -u               don't display marks '*' for unknown words
    -n               don't insert period before possible sentence-ends
    -m memory.tmx    use a translation memory to recycle translations
    -o direction     translation direction using the translation memory,
                     by default 'direction' is used instead
    -l               lists the available translation directions and exits
    direction        typically, LANG1-LANG2, but see modes.xml in language data
    in               input file (stdin by default)
    out              output file (stdout by default)


   Sample:

   $ apertium -f txt es-ca <input >output
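
When the translator is called from a script, the sample invocation can be assembled programmatically. The following is a minimal Python sketch, not part of Apertium itself: the helper name `build_apertium_command` is hypothetical, and it uses only the `-d`, `-f` and `-u` flags listed in the usage text above.

```python
import shlex


def build_apertium_command(direction, fmt="txt", datadir=None, hide_unknown=False):
    """Return the argument list for an `apertium` call.

    Hypothetical helper: only flags documented in the usage text are emitted.
    """
    command = ["apertium"]
    if datadir is not None:
        command += ["-d", datadir]   # directory of linguistic data
    command += ["-f", fmt]           # document format, txt by default
    if hide_unknown:
        command.append("-u")         # suppress '*' marks on unknown words
    command.append(direction)        # e.g. es-ca, as in the sample above
    return command


# Show the command line that corresponds to the sample invocation.
print(shlex.join(build_apertium_command("es-ca")))
```

The resulting list can be handed to `subprocess.run(command, input=text, capture_output=True, text=True)` on a system where the `apertium` binary and the relevant language data are installed.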

apertium-afr-nld's People

Contributors

bentley, ftyers, jimregan, marcriera, mr-martian, pimotte, sushain97, tinodidriksen, trondtr, unhammer, wolfgangth

apertium-afr-nld's Issues

Migrate to monolingual language packages

This pair currently embeds the following monolingual information:

  • apertium-afr-nld.nld.dix
  • apertium-afr-nld.afr.dix
  • apertium-afr-nld.nld.acx
  • apertium-afr-nld.afr.acx
  • apertium-afr-nld.post-nld.dix
  • apertium-afr-nld.post-afr.dix
  • nld-afr.prob
  • afr-nld.prob

These files should be imported from apertium-afr and apertium-nld. After doing this, testvoc will need to be run. It is recommended to do this on a branch and then merge to master once it is finished and the testvoc results are as good or better.

inflection of adjectives

This is a complicated issue in both languages and the complications are not the same.

The ground rule in both languages is the same:

  • Used as a predicate (or as an adverb), the adjective remains uninflected:

afr: Soms is kerk regtig vervelig en voorspelbaar. --> nld: Soms is kerk echt vervelend en voorspelbaar.

  • Used attributively, the adjective takes the inflection -e:

afr: vervelige boeke --> nld: vervelende boeken
afr: Suid-Afrika se taamlik voorspelbare politieke situasie --> nld: Zuid-Afrika's nogal voorspelbare politieke situatie.

Unfortunately the latter rule has many exceptions in both languages and they are very different in the two languages. I do not pretend to know the Afrikaans ones all that well.

In nld the biggest exception is when the noun is singular neuter and used in an indefinite form, for example with the indefinite article "een" or its negative "geen":

het paard is mooi
het mooie paard
een mooi paard

In Afrikaans the neuter gender does not exist:

die perd is mooi
die mooie perd
'n mooie perd

In Afrikaans there are different exceptions: a whole bunch of monosyllabic adjectives never get inflected; e.g. groot:

die perd is groot
die groot perd
'n groot perd

For us Dutchies this is really hard: we would agree with 'n groot perd but not with die groot perd. The rules are really different.

I have no idea how you would code this. Perhaps in Afrikaans you can define a predicate form and an attribute form and simply make the two the same for cases like groot? In Dutch the rules would have to include gender and indefiniteness for the attribute form.

There is more on this subject but let me leave it at this.
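
The two rule sets described above can be sketched in a few lines of Python. This is only an illustration of the logic in this issue, not how Apertium encodes it: the `Adjective` type, the function names and the one-word `AFR_UNINFLECTED` set are hypothetical, the inflected form is supplied by hand rather than generated, and the further exceptions mentioned above are not covered. Gender and number labels follow the tags used elsewhere in these issues (`nt`, `mf`, `sg`, `pl`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjective:
    base: str        # predicative (uninflected) form, e.g. "mooi"
    inflected: str   # attributive form in -e, e.g. "mooie"


def dutch_attributive(adj, gender, number, definite):
    """Dutch: inflect, except before a singular neuter noun used indefinitely."""
    if gender == "nt" and number == "sg" and not definite:
        return adj.base
    return adj.inflected


# Illustrative only: the issue names "groot" as one of many such adjectives.
AFR_UNINFLECTED = {"groot"}


def afrikaans_attributive(adj):
    """Afrikaans: no gender or definiteness test, but a lexical exception list."""
    if adj.base in AFR_UNINFLECTED:
        return adj.base
    return adj.inflected


mooi = Adjective("mooi", "mooie")
print(dutch_attributive(mooi, "nt", "sg", definite=True))    # het ... paard
print(dutch_attributive(mooi, "nt", "sg", definite=False))   # een ... paard
print(afrikaans_attributive(mooi))                            # 'n ... perd
print(afrikaans_attributive(Adjective("groot", "grote")))     # die ... perd
```

The sketch mirrors the suggestion above: Afrikaans needs only a predicate form and an attribute form that coincide for words like groot, while the Dutch choice depends on gender, number and definiteness.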

Participles

Participles are a big problem in Afrikaans. If they are used as part of the verb's conjugation they are usually just the verb with a ge- prefix:

ek loop --> ek het geloop

The prefix is dropped if there is an inseparable prefix:

ek beveel --> ek het beveel.

The problem comes when they are used as attributive (or predicative) adjectives:

verbal usage: hy het dit aanbeveel - hij heeft dit aanbevolen - he has recommended this
adjectival: die daaglikse aanbevole inname - de dagelijks aanbevolen hoeveelheid - the recommended daily intake
predicative: hierdie inname is aanbevole - this intake is recommended

Sometimes the attributive participle is still strong like Dutch aanbevolen, particularly in formal usage and idioms, but in lower registers don't be surprised to see 'aanbeveelde'. The language seems to be in flux on this point.

Can apertium be taught to distinguish the difference between verbal and adjectival usage?
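
The regular verbal participle described at the top of this issue is simple enough to state as code. The Python sketch below is hypothetical and covers only the verbal form: the function name and parameters are invented for illustration, and the handling of separable particles (ge- placed between particle and stem) is an assumption that goes slightly beyond the text above, although it matches the forms aanbeveel and opgesluit cited in these issues. It does not attempt the adjectival forms such as aanbevole.

```python
def verbal_participle(stem, separable_particle="", has_inseparable_prefix=False):
    """Build the regular Afrikaans verbal participle.

    stem: the verb without any separable particle, e.g. "loop", "beveel".
    separable_particle: e.g. "aan" or "op"; empty if there is none.
    has_inseparable_prefix: True for stems like "beveel", which take no ge-.
    """
    if has_inseparable_prefix:
        return separable_particle + stem
    return separable_particle + "ge" + stem


print(verbal_participle("loop"))                                  # ek het ...
print(verbal_participle("beveel", has_inseparable_prefix=True))   # ek het ...
print(verbal_participle("beveel", "aan", True))                   # hy het dit ...
print(verbal_participle("sluit", "op"))
```

Whether a stem carries an inseparable prefix is passed in explicitly because it cannot be read reliably off the spelling; in practice that information would have to come from the dictionary.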

Vocabulary issues 2

nld: link (masc noun) --> afr: skakel (computer link)
nld: externe links --> afr: eksterne skakels
nld: zich (reflexive pronoun) --> afr: hom (apertium gives sig, a word that is very rare and obsolete)

Double negation

Afrikaans has double negation, something Dutch does not have. In Afrikaans this means that a sentence that contains a negation typically ends in the (extra) word nie.

Apertium does not add this closing nie yet.

Examples:
afr: Johnnie is nie dood nie.
nld: Johnnie is niet dood.

afr: Maar sy het nie omgegee nie.
nld: Maar zij gaf er niet om.

There are cases where Afrikaans only has one negative element in the sentence but that is rare and I'm not quite sure how that works. They are typically short utterances in the present tense. I don't think you ever get two 'nie'-s at the end of the phrase.

The double negation can be triggered by other words besides nie itself that imply a negation, like:

  • geen, g'n
  • niemand
  • niks
  • nooit
  • moenie

afr: Dis g'n wonder nie.
nld: 't Is geen wonder.

afr: Dit was 'n dag wat ek nooit sal vergeet nie.
nld: Dit was een dag die ik nooit zal vergeten.

afr: Ek het niks gedoen nie.
nld: Ik heb niets gedaan.

Moenie initiates a negative imperative:

afr: Moenie huil nie!
nld: Huil niet!

Sometimes the distance between the two negatives is quite large, e.g. when a subordinate clause intervenes:

afr: Niemand is nog in hegtenis geneem nadat 'n man Maandagaand buite die bekende M-kern-apteek in Bellville in die Wes-Kaap verskeie kere in die been geskiet is nie.
nld: Niemand is er nog in hechtenis genomen nadat een man maandagavond buiten de bekende M-kern-apotheek in Bellville in de provincie West-Kaap verscheidene keren in het been geschoten werd.
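
As a rough illustration of what the nld --> afr direction would need, the Python sketch below appends the closing nie to an Afrikaans sentence that contains one of the trigger words listed above. It is a naive string-level sketch under several assumptions: the function name is hypothetical, the sentence is treated as a single clause, the closing nie always goes directly before the final punctuation, and the rare single-negative cases mentioned above are handled only to the extent that a sentence already ending in nie is left alone. A real solution would live in the transfer rules, not in post-processing.

```python
NEGATION_TRIGGERS = {"nie", "geen", "g'n", "niemand", "niks", "nooit", "moenie"}


def add_final_nie(sentence):
    """Append the closing 'nie' if the sentence is negated and lacks it."""
    body = sentence.rstrip()
    punctuation = ""
    # Peel off sentence-final punctuation so 'nie' can be placed before it.
    while body and body[-1] in ".!?":
        punctuation = body[-1] + punctuation
        body = body[:-1]

    words = [word.strip(',;:"').lower() for word in body.split()]
    if not words:
        return sentence
    if not any(word in NEGATION_TRIGGERS for word in words):
        return sentence          # no negation, nothing to add
    if words[-1] == "nie":
        return sentence          # already closed; never two final 'nie'-s
    return body + " nie" + punctuation


print(add_final_nie("Johnnie is nie dood."))
print(add_final_nie("Ek het niks gedoen."))
print(add_final_nie("Moenie huil!"))
```

Because the check only looks at the end of the sentence, it places the closing nie correctly in the long Bellville example only if the whole sentence, including the subordinate clause, is passed in as one string.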

vocabulary issues 3

nld:gesteente fem. noun pl:gesteentes or gesteenten
afr:gesteente pl: gesteentes (only)

This holds for most nouns ending in -te.

nld:tienduizendste (ordinal) -> afr: tienduisendste

nld:opgesloten -> afr: opgesluit

This is a participle of opsluit in verbal use. I don't think it is used much as an adjective.

nld: halveringstijd (noun masc) -> afr: halfleeftyd

plural halveringstijden -> halfleeftye

nld: methode (noun fm) -> afr: metode

nld pl: methodes, methoden -> afr: metodes

nld: recent (adj) -> afr: onlangs

In nld onlangs is purely an adverb, in afr it is also an adjective and it gets inflected as such.

Move to three letter ISO codes

This pair should be moved to three letter ISO codes. The name should probably be apertium-afr-nld.

The following files (at minimum) will need to be checked:

  • Makefile.am
  • configure.ac
  • modes.xml
  • README

The pair should also be checked to see if it can be adapted to work with monolingual language packages in languages/

Vocabulary issues 5: Moezel

nld: verwateren (vb. erg, insep.) --> afr: afwater (vb. sep)
nld: afwateren (vb. abs, sep.) --> afr: tot die stroomgebied behoort
e.g.: dit gebied watert af op de Rijn --> hierdie gebied behoort tot die Ryn se stroomgebied
nld: Vogezen (name, pl) --> afr: Vogese (name, pl)
nld: uitmonden (vb. abs. sep.) --> afr: uitmond (vb. sep)
nld: na (prep) --> afr: ná (prep)
nld: naar (prep) --> afr: na (prep)
nld: Rijn (name masc) --> afr: Ryn (name)
nld: bovenloop (n. masc sg) --> afr: boloop (n. sg.)
nld: bovenlopen (n. pl.) --> afr: bolope (n. pl.)

Vocabulary issues 4: Euratom

nld: vreedzaam (adj) --> afr: vreedsaam
inflected:
nld: vreedzame --> afr: vreedsame
nld: kernenergie (noun; mf) --> afr: kernkrag
nld: brandstof (noun; mf) --> afr: brandstof
nld: brandstoffen (noun, pl) --> afr: brandstowwe
nld: broeikaseffect (noun; nt) --> afr: kweekhuiseffek

Vocabulary issues Dutch --> Afrikaans

Some vocabulary issues Dutch --> Afrikaans

duikoperator --> duikdiensverskaffer
nestplaats --> nesplek
rond (as preposition) --> rondom
e.g.: rond het eiland -- rondom die eiland
cactus --> kaktus
lig, ligt, liggen (verb) --> lê
e.g.: het ligt --> dit lê
licht (noun) --> lig
lichten (plural) --> ligte
recreatieoord --> ontspanningsoord

to be / om te wees / om te zijn

<e><p><l>wees<s n="vbser"/></l><r>zijn<s n="vbser"/></r></p></e>

Is there a way to accommodate the following construction: zij zijn <-> hulle is?

relative pronouns

The simplest relative pronouns in Dutch are:

  • dat for neuter singular
  • die for masculine/feminine singular and for plural

In Afrikaans the equivalent is:

  • wat in all cases

Apertium now translates nld:die into afr:dit (the personal pronoun) in the following sentence:

afr:Die gebou is opgebou uit stene wat in die son gedroog word. -- nld:Het gebouw is opgebouwd uit stenen die in de zon gedroogd worden.

Other examples

afr:dit is 'n lae getal in vergelyking met die getal renosters wat gestroop word -- nld:dit is een laag getal in vergelijking met het aantal neushoorns dat gestroopt wordt.

afr:God gaan elke boom wat nie gesonde vrugte dra nie, afkap en in die vuur gooi -- nld:God zal iedere boom die geen gezonde vruchten draagt, kappen en op het vuur gooien.

afr:en is dit 'n verskynsel wat ons nog hoegenaamd ernstig behoort op te vat? -- nld: en is dit een verschijnsel dat we zelfs maar ernstig behoren op te vatten?
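
The choice described at the top of this issue is small enough to write down directly. The Python sketch below is purely illustrative, with hypothetical function names; it covers only the simple relative pronouns dat, die and wat discussed here, and it assumes the gender and number of the antecedent are already known (using the `nt`, `mf`, `sg` and `pl` labels that appear elsewhere in these issues).

```python
def dutch_relative_pronoun(gender, number):
    """Dutch: 'dat' for a neuter singular antecedent, 'die' otherwise."""
    if gender == "nt" and number == "sg":
        return "dat"
    return "die"


def afrikaans_relative_pronoun():
    """Afrikaans: 'wat' regardless of gender and number."""
    return "wat"


# Antecedents taken from the examples above.
print(dutch_relative_pronoun("mf", "pl"))   # stenen ... in de zon gedroogd worden
print(dutch_relative_pronoun("nt", "sg"))   # een verschijnsel ... we behoren op te vatten
print(afrikaans_relative_pronoun())         # stene ... in die son gedroog word
```

The hard part for the afr --> nld direction is therefore not the rule itself but finding the antecedent whose gender and number decide between dat and die; for nld --> afr the translator mainly has to recognise that die or dat is a relative pronoun at all.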

Upgrade tagger to use vislcg too

So we can take care of examples such as this:

$ echo "zij zijn een man" | apertium -d . nld-afr-tagger
^prpers<prn><subj><p3><mf><pl>$ ^zijn<vbser><inf>$ ^een<det><ind><mf><sg>$ ^man<n><m><sg>$^.<sent>$

Related to #3.
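
To inspect tagger output like the line above in a script, the lexical units can be split into lemma and tags. The following Python sketch is a simplified reader written for this example only: `parse_stream` is a hypothetical helper, and the regular expression handles just the plain `^lemma<tag>...$` units shown here, not ambiguous readings, escaped characters or other features of the Apertium stream format.

```python
import re

# One lexical unit: ^lemma<tag1><tag2>...$
LEXICAL_UNIT = re.compile(r"\^([^<$]*)((?:<[^<>]+>)*)\$")


def parse_stream(line):
    """Return a list of (lemma, [tags]) pairs for simple tagger output."""
    units = []
    for lemma, tag_string in LEXICAL_UNIT.findall(line):
        tags = re.findall(r"<([^<>]+)>", tag_string)
        units.append((lemma, tags))
    return units


output = (
    "^prpers<prn><subj><p3><mf><pl>$ ^zijn<vbser><inf>$ "
    "^een<det><ind><mf><sg>$ ^man<n><m><sg>$^.<sent>$"
)
for lemma, tags in parse_stream(output):
    print(lemma, tags)
```

In the parsed result the second unit is zijn with the tags vbser and inf, directly after a plural subject pronoun; presumably this is the reading that a vislcg-based disambiguation step is meant to correct.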
