apertium / apertium-afr-nld
Apertium translation pair for Afrikaans and Dutch
License: GNU General Public License v2.0
This pair should be moved to three-letter ISO codes. The name should probably be apertium-afr-nld.
The following files (at minimum) will need to be checked:
Makefile.am
configure.ac
modes.xml
README
The pair should also be checked to see if it can be adapted to work with monolingual language packages in languages/
Participles are a big problem in Afrikaans. If they are used as part of the verb's conjugation they are usually just the verb with a ge- prefix:
ek loop -- ek het geloop
The prefix is dropped if there is an inseparable prefix:
ek beveel -- ek het beveel.
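A minimal sketch of the regular pattern described above, in Python. The function name and the `has_inseparable_prefix` flag are hypothetical; the flag would have to come from the dictionary, since an inseparable prefix cannot be read reliably off the spelling. It covers only the verbal participle, not the adjectival forms.

```python
def verbal_participle(verb: str, has_inseparable_prefix: bool = False) -> str:
    """Return the regular verbal past participle of an Afrikaans verb.

    Illustrative only: `has_inseparable_prefix` is an assumed lexical
    flag, not something Apertium provides under this name.
    """
    if has_inseparable_prefix:
        # e.g. beveel -> (ek het) beveel: no ge- is added
        return verb
    # e.g. loop -> (ek het) geloop
    return "ge" + verb


print(verbal_participle("loop"))
print(verbal_participle("beveel", has_inseparable_prefix=True))
```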
The problem comes when they are used as attributive (or predicative) adjectives:
verbal usage: hy het dit aanbeveel - hij heeft dit aanbevolen - he has recommended this
adjectival: die daaglikse aanbevole inname - de dagelijks aanbevolen hoeveelheid - the recommended daily intake
predicative: hierdie inname is aanbevole - this intake is recommended
Sometimes the attributive participle is still strong, like Dutch aanbevolen, particularly in formal usage and idioms, but in lower registers don't be surprised to see 'aanbeveelde'. The language seems to be in flux on this point.
Can apertium be taught to distinguish the difference between verbal and adjectival usage?
nld: link (masc noun) --> afr:skakel (computer link)
nld: externe links -> afr: eksterne skakels
nld:zich (reflexive pronoun) -> afr: hom (apertium gives sig, this word is very rare and obsolete)
So we can take care of examples such as this:
$ echo "zij zijn een man" | apertium -d . nld-afr-tagger
^prpers<prn><subj><p3><mf><pl>$ ^zijn<vbser><inf>$ ^een<det><ind><mf><sg>$ ^man<n><m><sg>$^.<sent>$
Related to #3.
This is a complicated issue in both languages and the complications are not the same.
The ground rule in both languages is the same:
*Used as predicate (or as adverb): adjective remains without inflection:
afr: Soms is kerk regtig vervelig en voorspelbaar. --> nld: Soms is kerk echt vervelend en voorspelbaar
*Used as attribute the adjective gets an inflection -e:
afr: vervelige boeke --> nld: vervelende boeken
afr: Suid-Afrika se taamlik voorspelbare politieke situasie --> nld: Zuid-Afrika's nogal voorspelbare politieke situatie.
Unfortunately the latter rule has many exceptions in both languages and they are very different in the two languages. I do not pretend to know the Afrikaans ones all that well.
In nld the biggest exception is when the noun is singular neuter and used in indefinite form, for example with the indefinite article "een" or its negative "geen". In that case the attributive adjective stays uninflected:
het paard is mooi
het mooie paard
een mooi paard
In Afrikaans neuter gender does not exist:
die perd is mooi
die mooie perd
'n mooie perd
In Afrikaans there are different exceptions: a whole bunch of monosyllabic adjectives never get inflected; e.g. groot:
die perd is groot
die groot perd
'n groot perd
For us Dutchies this is really hard: we would agree with 'n groot perd but not with die groot perd. The rules are really different.
I have no idea how you would code this. Perhaps in Afrikaans you can define a predicate form and an attribute form and simply make the two the same for cases like groot? In Dutch the rules would have to include gender and indefiniteness for the attribute form.
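One way to picture the suggestion above is the following Python sketch. It is not Apertium code: the `Adjective` class, the function names and the parameters are all hypothetical, and only the exceptions mentioned in this issue are modelled. Each adjective stores a predicative and an attributive form, and invariable Afrikaans adjectives such as groot simply store the same string twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjective:
    predicative: str  # uninflected form
    attributive: str  # form with -e; same as predicative if invariable


def nld_form(adj: Adjective, attributive: bool, *, neuter: bool = False,
             singular: bool = True, indefinite: bool = False) -> str:
    """Dutch: attributive -e, except with an indefinite singular neuter noun."""
    if not attributive:
        return adj.predicative
    if neuter and singular and indefinite:
        return adj.predicative  # een mooi paard
    return adj.attributive      # het mooie paard


def afr_form(adj: Adjective, attributive: bool) -> str:
    """Afrikaans: no gender; the lexicon decides whether the forms differ."""
    return adj.attributive if attributive else adj.predicative


nld_mooi = Adjective("mooi", "mooie")
afr_mooi = Adjective("mooi", "mooie")
afr_groot = Adjective("groot", "groot")  # invariable

print(nld_form(nld_mooi, True, neuter=True, indefinite=True), "paard")
print(afr_form(afr_groot, True), "perd")
```

The Dutch side needs gender, number and definiteness from the noun phrase, while the Afrikaans side only needs the lexical entry, which matches the asymmetry described above.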
There is more on this subject but let me leave it at this.
This pair currently embeds the following monolingual information:
apertium-afr-nld.nld.dix
apertium-afr-nld.afr.dix
apertium-afr-nld.nld.acx
apertium-afr-nld.afr.acx
apertium-afr-nld.post-nld.dix
apertium-afr-nld.post-afr.dix
nld-afr.prob
afr-nld.prob
These files should be imported from apertium-afr and apertium-nld. After doing this, testvoc will need to be done. It is recommended to do this on a branch and then merge to master after it is finished and the testvoc results are as good or better.
nld:gesteente fem. noun pl:gesteentes or gesteenten
afr:gesteente pl: gesteentes (only)
this holds for most nouns on -te.
nld:tienduizendste (ordinal) -> afr: tienduisendste
nld:opgesloten -> afr: opgesluit
This is a participle of opsluit in verbal use. I don't think it is used much as an adjective.
nld: halveringstijd (noun masc) -> afr: halfleeftyd
plural halveringstijden -> halfleeftye
nld: methode (noun fm) -> afr: metode
nld pl: methodes, methoden -> afr: metodes
nld: recent (adj) -> afr: onlangs
In nld onlangs is purely an adverb, in afr it is also an adjective and it gets inflected as such.
<e><p><l>wees<s n="vbser"/></l><r>zijn<s n="vbser"/></r></p></e>
Is there a way to accommodate the following construction: zij zijn <-> hulle is
:nld: verwateren (vb. erg, insep.) --> afr: afwater (vb. sep)
:nld: afwateren (vb. abs, sep.) --> afr: tot die stroomgebied behoort
:dit gebied watert af op de Rijn --> hierdie gebied behoort tot die Ryn se stroomgebied
:nld: Vogezen (name, pl) --> afr: Vogese (name, pl)
:nld: uitmonden (vb. abs. sep.) --> afr: uitmond (vb. sep)
:nld: na (prep) --> afr: ná (prep)
:nld: naar (prep) --> afr: na (prep)
:nld: Rijn (name masc) --> afr: Ryn (name)
:nld: bovenloop (n. masc sg) --> afr: boloop (n. sg.)
:nld: bovenlopen (n. pl.) --> afr: bolope (n. pl.)
The simplest relative pronouns in Dutch are:
*dat for singular neuter
*die for masc/fem sing and for plural
In Afrikaans the equivalent is:
*wat in all cases
Apertium now translates nld:die into afr:dit (the personal pronoun) in the following sentence:
afr:Die gebou is opgebou uit stene wat in die son gedroog word. -- nld:Het gebouw is opgebouwd uit stenen die in de zon gedroogd worden.
Other examples
afr:dit is 'n lae getal in vergelyking met die getal renosters wat gestroop word -- nld:dit is een laag getal in vergelijking met het aantal neushoorns dat gestroopt wordt.
afr:God gaan elke boom wat nie gesonde vrugte dra nie, afkap en in die vuur gooi -- nld:God zal iedere boom die geen gezonde vruchten draagt, kappen en op het vuur gooien.
afr:en is dit 'n verskynsel wat ons nog hoegenaamd ernstig behoort op te vat? -- nld: en is dit een verschijnsel dat we zelfs maar ernstig behoren op te vatten?
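The choice between the simple relative pronouns can be sketched as below. This is an illustrative Python function, not a transfer rule. The function name is hypothetical, and the tags "nt", "mf", "sg" and "pl" are assumed to describe the antecedent noun, which a real rule would first have to identify.

```python
def nld_relative_pronoun(gender: str, number: str) -> str:
    """Pick the Dutch relative pronoun for Afrikaans 'wat'.

    gender: "nt" for neuter, anything else for masculine/feminine.
    number: "sg" or "pl".
    """
    if gender == "nt" and number == "sg":
        return "dat"  # het verschijnsel dat ...
    return "die"      # de boom die ..., de stenen die ...


AFR_RELATIVE_PRONOUN = "wat"  # the same in all cases

print(nld_relative_pronoun("nt", "sg"))
print(nld_relative_pronoun("mf", "pl"))
```

In the other direction no agreement is needed: both dat and die used as relative pronouns map to wat, so the hard part is telling the relative pronoun apart from the demonstrative or personal pronoun.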
nld: vreedzaam (adj) --> afr:vreedsaam
inflected:
nld:vreedzame --> afr: vreedsame
nld:kernenergie (noun; mf)--> afr: kernkrag
nld:brandstof (noun; mf) --> afr: brandstof
nld:brandstoffen (noun, pl) --> afr:brandstowwe
nld:broeikaseffect (noun; nt) --> afr: kweekhuiseffek
Afrikaans has double negation, something Dutch does not have. In Afrikaans this means that a sentence that contains a negation typically ends in the (extra) word nie.
Apertium does not add it yet.
Examples:
afr: Johnnie is nie dood nie.
nld:Johnnie is niet dood.
afr:Maar sy het nie omgegee nie.
nld: Maar zij gaf er niet om.
There are cases where Afrikaans only has one negative element in the sentence but that is rare and I'm not quite sure how that works. They are typically short utterances in the present tense. I don't think you ever get two 'nie'-s at the end of the phrase.
The double negation can be triggered by other words besides nie itself that imply a negation like:
*geen, g'n
*niemand
*niks
*nooit
*moenie
afr: Dis g'n wonder nie.
nld: 't Is geen wonder.
afr: Dit was 'n dag wat ek nooit sal vergeet nie.
nld: Dit was een dag die ik nooit zal vergeten.
afr: Ek het niks gedoen nie.
nld: Ik heb niets gedaan.
Moenie initiates a negative imperative:
afr: Moenie huil nie!
nld: Huil niet!
Sometimes the distance between the two negatives is quite large, e.g. when a subordinate clause intervenes:
afr: Niemand is nog in hegtenis geneem nadat 'n man Maandagaand buite die bekende M-kern-apteek in Bellville in die Wes-Kaap verskeie kere in die been geskiet is nie.
nld: Niemand is er nog in hechtenis genomen nadat een man maandagavond buiten de bekende M-kern-apotheek in Bellville in de provincie West-Kaap verscheidene keren in het been geschoten werd.
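As a rough illustration of the behaviour described in this issue, the Python sketch below appends a closing nie to an Afrikaans sentence that contains one of the trigger words listed above and does not already end in nie. It is a naive string-level heuristic with hypothetical names, not how Apertium would implement it: it ignores clause structure and the single-negation cases mentioned above.

```python
NEGATION_TRIGGERS = {"nie", "geen", "g'n", "niemand", "niks", "nooit", "moenie"}


def add_final_nie(sentence: str) -> str:
    """Append a sentence-final 'nie' when a negation trigger is present."""
    body = sentence.rstrip(".!?")
    punct = sentence[len(body):]          # trailing punctuation, if any
    body = body.rstrip()
    words = [w.strip(",;:").lower() for w in body.split()]
    if not any(w in NEGATION_TRIGGERS for w in words):
        return sentence                   # no negation, nothing to add
    if words[-1] == "nie":
        return sentence                   # never two 'nie's at the end
    return f"{body} nie{punct}"


print(add_final_nie("Ek het niks gedoen."))
print(add_final_nie("Moenie huil!"))
print(add_final_nie("Johnnie is nie dood nie."))
```

Because the closing nie belongs at the end of the whole sentence, even after an intervening subordinate clause, the sketch works on the full sentence rather than on a fixed window of words.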
Some vocabulary issues Dutch --> Afrikaans
duikoperator --> duikdiensverskaffer
nestplaats --> nesplek
rond (as preposition) --> rondom
e.g.: rond het eiland -- rondom die eiland
cactus --> kaktus
lig, ligt, liggen (verb) --> lê
e.g.: het ligt --> dit lê
licht (noun) --> lig
lichten (plural) --> ligte
recreatieoord --> ontspanningsoord