apertium / apertium-eng-spa
Apertium translation pair for English and Spanish
License: GNU General Public License v2.0
English--Spanish
===================================================================

You need apertium-3.0 and lttoolbox-3.0 to use this translator.

To compile the linguistic data, simply run:

    autoreconf -fvi
    ./configure
    make
    make install

inside this directory.

===================================================================

More information about this module, and others, can be found on the Apertium Wiki: https://wiki.apertium.org
"Mrs de Palacio" is translated as "Mrs of Palace".
Even without complex NER, it should treat the words that follow a title as a name.
Perhaps a transfer rule that leaves such names untranslated (or transliterates them)?
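One way to read the suggestion is as a simple heuristic that runs before lexical transfer. The sketch below is hypothetical: it is not part of apertium-eng-spa, the title and particle lists are illustrative assumptions, and it assumes the Spanish source read something like "la señora de Palacio". It only shows which tokens such a rule would protect; in the pair itself this would more likely be a transfer rule.

```python
# Hypothetical heuristic: treat the capitalised words (and name particles
# such as "de") that follow a courtesy title as a proper name, so a later
# stage could copy them through untranslated instead of looking them up.
TITLES = {"señor", "señora", "señorita", "sr", "sra", "srta",
          "mr", "mrs", "ms", "dr"}
PARTICLES = {"de", "del", "la", "las", "los", "y"}


def mark_names_after_titles(tokens: list[str]) -> list[tuple[str, bool]]:
    """Pair each token with True if it looks like part of a name after a title."""
    marked: list[tuple[str, bool]] = []
    in_name = False
    for i, tok in enumerate(tokens):
        if tok.rstrip(".").lower() in TITLES:
            marked.append((tok, False))
            in_name = True  # a name may start on the next token
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # A capitalised word, or a particle that is itself followed by one
        looks_like_name = tok[:1].isupper() or (
            tok.lower() in PARTICLES and nxt[:1].isupper()
        )
        in_name = in_name and looks_like_name
        marked.append((tok, in_name))
    return marked
```

For example, `mark_names_after_titles(["la", "señora", "de", "Palacio", "dijo"])` marks "de" and "Palacio" as name tokens and leaves the rest alone, while "el Palacio" (no title) is not marked.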
Update the readme file to the new boilerplate.
This pair should be moved to three-letter ISO codes. The name should probably be apertium-eng-spa.
The following files (at minimum) will need to be checked:
Makefile.am
configure.ac
modes.xml
README
The pair should also be checked to see if it can be adapted to work with monolingual language packages in languages/
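Much of the rename is mechanical: two-letter codes in pair, mode and file names become their three-letter equivalents. A hypothetical helper for checking a proposed mapping might look like this; the regex and the two-entry code table are assumptions that cover only this pair, and the helper does not edit any files.

```python
import re

# Assumed mapping from ISO 639-1 to ISO 639-3 codes, for this pair only.
ISO_639_3 = {"en": "eng", "es": "spa"}

# "<optional apertium- prefix><xx>-<yy><rest>", e.g. "es-en_US",
# "apertium-en-es" or "en-es.autobil.bin"
_PAIR_NAME = re.compile(
    r"^(?P<prefix>(?:apertium-)?)(?P<src>[a-z]{2})-(?P<tgt>[a-z]{2})"
    r"(?P<rest>(?:[_.-].*)?)$"
)


def to_three_letter(name: str) -> str:
    """Rewrite two-letter codes in a pair/mode/file name; leave other names alone."""
    m = _PAIR_NAME.match(name)
    if m is None:
        return name  # already three-letter, or not a pair name at all
    src = ISO_639_3.get(m["src"], m["src"])
    tgt = ISO_639_3.get(m["tgt"], m["tgt"])
    return f"{m['prefix']}{src}-{tgt}{m['rest']}"
```

Variant suffixes are kept, so `es-en_US` maps to `spa-eng_US` and `apertium-en-es` to `apertium-eng-spa`.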
Moving to three‐letter codes results in the US variant translation not working.
Comparing git master built from “make dist” to apertium-en-es-0.8.0:
$ echo 'Los colores' | apertium es-en
The colours
$ echo 'Los colores' | apertium es-en_US
The colors
$ echo 'Los colores' | apertium spa-eng
The colours
$ echo 'Los colores' | apertium spa-eng_US
USAGE: apertium-transfer trules preproc biltrans [input [output]]
apertium-transfer -b trules preproc [input [output]]
apertium-transfer -n trules preproc [input [output]]
apertium-transfer -x extended trules preproc biltrans [input [output]]
apertium-transfer -c trules preproc biltrans [input [output]]
apertium-transfer -t trules preproc biltrans [input [output]]
trules transfer rules file
preproc result of preprocess trules file
biltrans bilingual letter transducer file
input input file, standard input by default
output output file, standard output by default
-b input from lexical transfer
-n don't use bilingual dictionary
-x bindix extended mode with user dictionary
-c case-sensitiveness while accessing bilingual dictionary
-t trace (show rule numbers and patterns matched)
-T trace, for apertium-transfer-tools (also sets -t)
-z null-flushing output on '\0'
-h shows this message
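To keep the comparison above reproducible while the rename is fixed, the same four commands could be run as a small regression check. This is a sketch: `check_modes` and `run_apertium` are hypothetical names, the expected output for spa-eng_US is assumed to match es-en_US, and the two-letter modes only exist if the old 0.8.0 package is installed. The runner is injectable so the comparison logic can be tried without Apertium.

```python
import subprocess
from typing import Callable, Iterable

Runner = Callable[[str, str], str]


def run_apertium(mode: str, text: str) -> str:
    """Pipe text through `apertium <mode>` (requires apertium on PATH)."""
    proc = subprocess.run(["apertium", mode], input=text,
                          capture_output=True, text=True)
    # A failing stage may print its usage message on stderr, so keep both.
    return (proc.stdout + proc.stderr).strip()


def check_modes(cases: Iterable[tuple[str, str, str]],
                run: Runner = run_apertium) -> list[tuple[str, str, str]]:
    """Return (mode, expected, got) for every case whose output differs."""
    failures = []
    for mode, text, expected in cases:
        got = run(mode, text)
        if got != expected:
            failures.append((mode, expected, got))
    return failures


# Expected outputs taken from the comparison above (spa-eng_US is assumed).
CASES = [
    ("es-en", "Los colores", "The colours"),
    ("es-en_US", "Los colores", "The colors"),
    ("spa-eng", "Los colores", "The colours"),
    ("spa-eng_US", "Los colores", "The colors"),
]
```

With Apertium installed, `check_modes(CASES)` should currently report only the spa-eng_US case, with the apertium-transfer usage text as its output.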
I am trying to build the data for this language pair using,
autoreconf -fvi
./configure
make
make install
./configure succeeds, but prints:
./configure: line 3052: AP_MKINCLUDE: command not found
Then make does some things, but then fails with:
make: *** No rule to make target 'modes/eng-spa.mode', needed by 'all-am'. Stop.
This is on macOS, and I have:
$ apertium -V
Apertium 3.8.1
$ lt-proc --version
lt-proc version 3.6.8
It seems to be missing a macro that I can see in the apertium repo (AP_MKINCLUDE, in apertium.m4). Is there some setup I need for the apertium install beyond having the tools in my PATH?
Thanks
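The "AP_MKINCLUDE: command not found" line suggests the macro was never expanded when configure was generated: m4 copies an unknown macro name into the script unchanged, and the shell then tries to run it as a command. That usually means aclocal did not find apertium.m4 when autoreconf ran, which would plausibly also explain the missing rule for modes/eng-spa.mode. A small, hypothetical helper for spotting such leftovers in configure output:

```python
import re

# An unexpanded macro shows up as the shell failing to run it:
#   ./configure: line 3052: AP_MKINCLUDE: command not found
_UNEXPANDED = re.compile(
    r"line \d+: ([A-Z][A-Z0-9]*_[A-Z0-9_]+): command not found"
)


def unexpanded_macros(configure_output: str) -> list[str]:
    """List macro-looking names that ./configure tried to run as commands."""
    return sorted(set(_UNEXPANDED.findall(configure_output)))
```

If apertium.m4 is installed outside aclocal's default search path (for example under a Homebrew prefix), pointing aclocal at that directory, e.g. through the ACLOCAL_PATH environment variable, is one possible remedy; it has not been verified for this report.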
When Spanish verbs are translated to English, the subject pronoun isn't generated, even when it is unambiguous (at least in the examples I came across).
For example,
"Si la Asamblea está de acuerdo, haré lo que el señor Evans acaba de sugerir."
translates to
"If the Assembly agrees, will do what the gentleman Evans finishes to suggest."
This should be 'I will do'.
A transfer rule should fix this.
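The logic such a rule would need can be sketched outside the transfer file. This is hypothetical and only illustrative: the input stream and the Spanish tag names (vblex, fti, p1, sg) are assumptions, and the English pronoun is emitted in the prpers<prn><subj>... form that the English analyser uses. Only first and second person are handled, since third person (he/she/it/they) is genuinely ambiguous.

```python
import re

# One lexical unit in the Apertium stream format: ^lemma<tag1><tag2>...$
_LU = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")
VERB_TAGS = {"vblex", "vbser", "vbhaver", "vbmod"}


def parse_stream(stream: str) -> list[tuple[str, list[str]]]:
    """Split an unambiguous stream into (lemma, tags) pairs."""
    return [(lemma, re.findall(r"<([^>]+)>", tags))
            for lemma, tags in _LU.findall(stream)]


def insert_subject_pronouns(units: list[tuple[str, list[str]]]) -> list[tuple[str, list[str]]]:
    """Insert an English subject pronoun before p1/p2 finite verbs lacking one."""
    out: list[tuple[str, list[str]]] = []
    prev_tags: list[str] = []
    for lemma, tags in units:
        if VERB_TAGS & set(tags) and "prn" not in prev_tags:
            person = next((t for t in tags if t in ("p1", "p2")), None)
            number = next((t for t in tags if t in ("sg", "pl")), None)
            if person and number:
                out.append(("prpers", ["prn", "subj", person, "mf", number]))
        out.append((lemma, tags))
        prev_tags = tags
    return out
```

Looking only at the immediately preceding unit is crude: a clitic, as in "lo haré", would wrongly suppress the pronoun, so a real .t1x rule would need a better test for an existing subject.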
$some_number is analysed as ^$ ^some_number<num>$, for example:
AIDS treatments that may cost $18 000 per annum for one person are obviously not affordable by countries whose annual health budget may be less than $5 per capita
^AIDS<n><acr><sg>$ ^treatment<n><pl>$ ^that<cnjsub>$ ^may<vaux><inf>$ ^cost<vblex><inf>$ ^$ ^18<num>$ ^000<num>$ ^per<pr>$ ^*annum$ ^for<pr>$ ^one<num><sg>$ ^person<n><sg>$ ^be<vbser><pres>$ ^obviously<adv>$ ^not<adv>$ ^affordable<adj>$ ^by<pr>$ ^country<n><pl>$ ^whose<rel><aa><mf><sp>$ ^annual<adj>$ ^health<n><sg>$ ^budget<n><sg>$ ^may<vaux><inf>$ ^be<vbser><inf>$ ^less~than<pr>$ ^$ ^5<num>$ ^per~capita<adj>$ ^.<sent>$
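The empty ^$ units appear wherever a literal $ precedes a number, presumably because $ is also the character that closes a lexical unit in the stream format. To measure how often this happens in a corpus, a hypothetical check over the analyser output could be:

```python
import re

# An empty lexical unit "^$" that is not preceded by an escaping backslash.
_EMPTY_LU = re.compile(r"(?<!\\)\^\$")


def empty_units(stream: str) -> list[int]:
    """Return the offsets of empty lexical units in an analysed stream."""
    return [m.start() for m in _EMPTY_LU.finditer(stream)]
```

Run over the analysis above, it finds two empty units, one before 18 and one before 5.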
IT is analysed as a pronoun, for example in:
system of electronic employment cards and using suitable IT systems to monitor all these policies.
^system<n><sg>$ ^of<pr>$ ^electronic<adj>$ ^employment<n><sg>$ ^card<n><pl>$ ^and<cnjcoo>$ ^use<vblex><ger>$ ^suitable<adj>$ ^prpers<prn><subj><p3><nt><sg>$ ^system<n><pl>$ ^to<pr>$ ^monitor<vblex><inf>$ ^all<predet><sp>$ ^this<det><dem><pl>$ ^policy<n><pl>$ ^.<sent>$
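An all-caps IT directly followed by a noun is very likely the acronym, not the pronoun. The real fix would more likely be an acronym entry for IT in the English dictionary plus a disambiguation rule; as a rough, hypothetical diagnostic, such cases could be flagged from (surface form, chosen tags) pairs:

```python
def acronym_candidates(units: list[tuple[str, list[str]]]) -> list[int]:
    """Indices where an all-caps word was tagged as a pronoun but is
    directly followed by a noun (a likely acronym such as "IT systems")."""
    hits = []
    for i, (surface, tags) in enumerate(units[:-1]):
        next_tags = units[i + 1][1]
        if len(surface) > 1 and surface.isupper() and "prn" in tags and "n" in next_tags:
            hits.append(i)
    return hits
```

Sentence-initial "It" is not flagged because it is not all caps.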
I'm the developer of http://www.wikdict.com and I'm considering using the generated dictionaries to improve Apertium. If this works out for one language pair, I'll be able to provide the same for many additional language pairs. The data originally comes from Wiktionary and is licensed under CC-BY-SA 3.0.
The same process might be usable for the dictionaries from http://www.freedict.org, but those are less homogeneous, so I'll leave that for later.
I've done a quick first try to convert entries and would like some feedback on the current state.
Example:
<e><p><l>house<s n="n"/></l><r>casa</r></p></e>
<e><p><l>house<s n="v"/></l><r>alojar</r></p></e>
<e><p><l>house<s n="v"/></l><r>envolver</r></p></e>
<e><p><l>house<s n="v"/></l><r>almacenar</r></p></e>
<e><p><l>house<s n="v"/></l><r>albergar</r></p></e>
<e><p><l>house<s n="v"/></l><r>hospedar</r></p></e>
<e><p><l>house<s n="v"/></l><r>encajar</r></p></e>
Full data at: http://download.wikdict.com/apertium/
Things to note:
My main question is: how close is this to being usable for Apertium, and what are the minimum to-dos before it will get any use? It's obvious to me that this is not ready yet, but I would like a realistic overview of whether I can get it into a usable state at all before doing more complicated steps.
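Apertium bilingual entries normally carry tags on both sides, and multi-word lemmas use <b/> for the space. A minimal sketch of generating such entries from (lemma, tags) data, assuming the WikDict export can supply a Spanish part of speech and gender (the sample above has no right-hand tags) and that the tag names match the pair's monolingual dictionaries; `bidix_entry` is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def _side(parent: ET.Element, name: str, lemma: str, tags: list[str]) -> None:
    """Add <l> or <r>: the lemma (spaces become <b/>) followed by <s n="..."/> tags."""
    side = ET.SubElement(parent, name)
    words = lemma.split(" ")
    side.text = words[0]
    for word in words[1:]:
        blank = ET.SubElement(side, "b")
        blank.tail = word
    for tag in tags:
        ET.SubElement(side, "s", n=tag)


def bidix_entry(left: str, right: str, left_tags: list[str], right_tags: list[str]) -> str:
    """Serialise one <e> entry of a bilingual dictionary."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    _side(pair, "l", left, left_tags)
    _side(pair, "r", right, right_tags)
    return ET.tostring(entry, encoding="unicode")
```

`bidix_entry("house", "casa", ["n"], ["n", "f"])` yields `<e><p><l>house<s n="n" /></l><r>casa<s n="n" /><s n="f" /></r></p></e>`; the space ElementTree writes before `/>` is ordinary XML.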
apertium/apertium-apy#87 (comment)
maybe something for @TinoDidriksen
When translating from English to Spanish, extra whitespace is added, including a space before the full stop at the end of sentences.
Input:
Pilar is 25 years old. She is studying medicine in Tarragona and has lots of friends.
Pilar is very likable and kind.She is going out with a young man called Javier. He is 30 years old and works in Barcelona.
Javier is an engineer. Javier likes going out in the evenings, going to the cinema and meeting friends.
Output:
Pilar tiene 25 años . Está estudiando medicina en Tarragona y tiene muchos amigos.
Pilar es muy *likable y clase.Está saliendo con un hombre joven llamó Javier. Tiene 30 años y obras en Barcelona.
Javier es un ingeniero . A Javier le gusta salir en las tardes, yendo al cine y cumpliendo amigos.
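A hypothetical regression check for this bug only needs to find a word followed by whitespace and then punctuation. (The missing space in "clase.Está" comes from the input "kind.She", so it is not flagged.)

```python
import re

# A word, then spaces/tabs, then a punctuation mark that ends a token.
_SPACE_BEFORE_PUNCT = re.compile(r"\S+[ \t]+[.,;:!?](?=\s|$)")


def space_before_punctuation(text: str) -> list[str]:
    """Return snippets such as 'años .' where a space precedes punctuation."""
    return _SPACE_BEFORE_PUNCT.findall(text)
```

On the output above it reports "años ." and "ingeniero .".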
This repo is an absurd 662 MiB. There are a number of big files that do not need to exist in history, so at some point this repo should be scrubbed.
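Before scrubbing, it helps to know which blobs account for the size. A sketch using standard git plumbing (`git rev-list --objects --all` piped into `git cat-file --batch-check`) that lists the largest blobs across all history; the helper names are hypothetical:

```python
import subprocess

BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(output: str) -> list[tuple[int, str, str]]:
    """Parse BATCH_FORMAT lines into (size, object id, path) for blobs, largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # the path (%(rest)) may itself contain spaces
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return sorted(blobs, reverse=True)


def largest_blobs(repo: str = ".", n: int = 20) -> list[tuple[int, str, str]]:
    """Return the n largest blobs reachable from any ref in the repository."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True).stdout
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=objects, capture_output=True, text=True, check=True).stdout
    return parse_batch_check(checked)[:n]
```

Actually removing those files would still require a history rewrite (for example with git filter-repo) and a force-push, which affects everyone with an existing clone.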
First of all, apologies if this is not the proper way to report this kind of issue. I asked on the #apertium IRC channel and was told to report it here.
I've found an unknown word when translating from English to Spanish: brightness
I don't know how I can contribute the "brightness --> brillo" translation pair. Any insight? Thanks!
Running lt-proc -b spa-eng.autobil.bin on
^común<adj><mf><sg>$ ^[<lpar>$ ^*COM$ ^(<lpar>$ ^2002<num>$ ^)<rpar>$ ^186<num>$ ^?<sent>$ ^*C5$ ^-<guio>$ ^0331<num>$ ^2002<num>$ ^-<guio>$ ^2002<num>$ ^2175<num>$ ^(<lpar>$ ^*COS$ ^)<rpar>$ ^]<rpar>$ ^.<sent>$
gives me
^común<adj><mf><sg>/common<adj><sg>$ [<lpar>$ ^*COM$ ^(<lpar>$ ^2002<num>$ ^)<rpar>$ ^186<num>$ ^?<sent>$ ^*C5$ ^-<guio>$ ^0331<num>$ ^2002<num>$ ^-<guio>$ ^2002<num>$ ^2175<num>$ ^(<lpar>$ ^*COS$ ^)<rpar>$ ^]^<rpar>/@<rpar>$ ^.<sent>/.<sent>$
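In the stream format, [ and ] delimit superblanks (formatting), so an unescaped ^[<lpar>$ may not be read as a lexical unit at all; that would fit the lost ^ before [ and the mangled ^]^<rpar> above. Assuming that is the cause, test input should backslash-escape reserved characters inside lemmas. A hypothetical helper (the reserved-character set is an assumption based on the stream-format documentation, not taken from this repo):

```python
# Characters assumed to be reserved in the Apertium stream format; inside
# a lexical unit they are written with a preceding backslash.
RESERVED = set("[]{}^$/\\@<>")


def escape_stream(text: str) -> str:
    """Backslash-escape every reserved character in text."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def lexical_unit(lemma: str, tags: list[str]) -> str:
    """Build ^lemma<tag>...$ with the lemma escaped."""
    return "^" + escape_stream(lemma) + "".join(f"<{t}>" for t in tags) + "$"
```

With it, the brackets in the test input become `^\[<lpar>$` and `^\]<rpar>$`.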
Update the pair to use monolingual modules and run testvoc.
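Apertium marks problem words in its output: * for words unknown to the analyser, @ for words missing from the bilingual dictionary and # for words that could not be generated. Testvoc scripts typically run the whole dictionary through the pipeline and look for these marks; the sketch below is not the pair's testvoc script and only tallies the marks in a piece of translated text:

```python
import re
from collections import defaultdict

# '*' = unknown to the analyser, '@' = not in the bilingual dictionary,
# '#' = could not be generated
_MARKED = re.compile(r"(?<!\S)([*@#])([^\s.,;:!?]+)")


def classify_marks(output: str) -> dict[str, list[str]]:
    """Group marked words in translated text by their mark."""
    found: dict[str, list[str]] = defaultdict(list)
    for mark, word in _MARKED.findall(output):
        found[mark].append(word)
    return dict(found)
```

A clean testvoc run corresponds to this returning no @ or # entries.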
It says to run ./configure, but this file does not exist.