apertium / apertium-init
Scripts to bootstrap Apertium language modules and pairs
License: GNU General Public License v3.0
like in
apertium/apertium-nio@e1f4494
maybe we should also have some $(UPPERCASE) programs
Could easily be a GCI task
did
$ apertium-init -a2 hfst --no-rlx1 --no-pgen2 lol-wat
got
TARGETS_COMMON = \
$(PREFIX1).automorf.bin \
$(PREFIX1).prob
$(PREFIX1).rlx.bin \
$(PREFIX1).autobil.bin \
and
config.status: creating Makefile
Makefile:656: *** missing separator. Stop.
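The `missing separator` error is consistent with the list above: `$(PREFIX1).prob` has no trailing backslash, so the continuation stops there and make reads the following line as a separate statement. A minimal Python sketch of one way to avoid this, assuming the list is assembled from a sequence of target names after the optional ones have been dropped (the `format_targets` helper is hypothetical, not part of apertium-init):

```python
def format_targets(variable, targets):
    """Render a Makefile variable assignment with one target per line.

    Every line except the last ends in a backslash, regardless of which
    optional targets were removed from the list beforehand.
    """
    if not targets:
        return f"{variable} ="
    lines = [f"{variable} = \\"]
    lines += [f"\t{target} \\" for target in targets[:-1]]
    lines.append(f"\t{targets[-1]}")
    return "\n".join(lines)


# Example with the targets from the report above.
targets = [
    "$(PREFIX1).automorf.bin",
    "$(PREFIX1).prob",
    "$(PREFIX1).autobil.bin",
]
print(format_targets("TARGETS_COMMON", targets))
```

Building the continuation from the final list, rather than hard-coding a backslash in each template line, keeps the output valid whichever `--no-*` options are combined.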
That way, you can do
apertium-init foo
apertium-init bar
apertium-init foo-bar
cd apertium-foo-bar
./autogen.sh --with-lang1=../apertium-foo --with-lang2=../apertium-bar
make langs
-AC_INIT([Apertium {{languageName1}}-{{languageName2}}], [0.1.0], [[email protected]], [apertium-{{languageCode1}}-{{languageCode2}}], [http://wiki.apertium.org/wiki/Apertium-{{languageCode1}}-{{languageCode2}}])
+AC_INIT([apertium-nio], [0.0.0], [[email protected]], [apertium-nio], [http://wiki.apertium.org/wiki/Apertium-nio])
fixed by having
apertium_foo_DATA=$(TARGETS_COMMON) $(LANG1).prob
instead of
apertium_foo_DATA=$(TARGETS_COMMON)
fran@matxine:/tmp$ python3 ~/scripts/apertium-init.py --help
^CTraceback (most recent call last):
File "/home/fran/scripts/apertium-init.py", line 18, in <module>
import subprocess
File "/usr/lib/python3.6/subprocess.py", line 50, in <module>
import signal
File "/home/fran/scripts/signal.py", line 4, in <module>
doc = sys.stdin.read()
KeyboardInterrupt
What is the XML that it expects to read from stdin ?
$ echo "" | python3 ~/scripts/apertium-init.py --help
Traceback (most recent call last):
File "/home/fran/scripts/apertium-init.py", line 18, in <module>
import subprocess
File "/usr/lib/python3.6/subprocess.py", line 50, in <module>
import signal
File "/home/fran/scripts/signal.py", line 6, in <module>
root = ET.fromstring(doc)
File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1315, in XML
return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 2, column 0
E.g., instead of apertium-xyz, maybe ling073-xyz (like for the course I teach). Or even things like xyzmorph?
Title says all.
Is it possible to make the instructions of apertium-init.py clearer? It's not immediately obvious (to beginners like me) that you need to create your own repo when bootstrapping a new language pair and push to it.
Also, could it be added to the wiki that when compiling the language pair you need to enter the apertium-XXX-YYY directory first? Otherwise you get the error that autogen.sh doesn't exist...
We should have some make rules for the unigram tagger, along with a tiny sample corpus of the right format (e.g. in the subdir corpus). Something like
LANG.prob: corpus/LANG.corpus
apertium -d . tagger-training-pipeline <$< >$@
(Someone with an idea what the tagger training pipeline looks like will have to fill that out.)
Prob files have to exist, but they belong in the monolingual modules. They shouldn't be versioned in the bilingual ones.
Example:
apertium-cat -> apertium-cat.cat.prob added to GIT
apertium-spa -> apertium-cat.spa.prob added to GIT
apertium-spa-cat -> cat-spa.prob and spa-cat.prob ignored in GIT
Add an option to apertium-init to add a spellrelax file to hfst modules and compile it, along with relevant details in the README.
Add an option to enable twoc-style prefixational morphology in HFST modules, along with a relevant example file and relevant details in the README.
$ apertium-init foo
Succesfully created /apertium-foo.
Unable to add /tmp/apertium-foo to SVN: svn: E155007: '/tmp' is not a working copy
$ head apertium-foo/README
foo
apertium-foo
===============================================================================
This is an Apertium monolingual language package for foo. What
you can use this language package for:
* Morphological analysis of foo
* Morphological generation of foo
$
There are some issues with the included bidix: the alphabet should be empty (or nonexistent?) and the "sg" tags shouldn't be in the example entries. It also might be good to have entries in two different languages, especially ones with incompatible POS sub-categories (e.g. casa<n><f>).
Currently HFST modules show the "dix" file in the list of files in the README, and it's likely that lttoolbox modules show "hfst" files in their README. This needs to be checked and fixed.
apertium-init does not seem to install correctly using pip.
jwashin1@pearson105:~/Documents$ sudo pip3 install apertium-init
Collecting apertium-init
Downloading apertium_init-2.1.0-py3-none-any.whl (75kB)
100% |████████████████████████████████| 81kB 4.2MB/s
Installing collected packages: apertium-init
Successfully installed apertium-init-2.1.0
jwashin1@pearson105:~/Documents$ apertium-init -h
Traceback (most recent call last):
File "/usr/local/bin/apertium-init", line 7, in <module>
from apertium_init import main
File "/usr/local/lib/python3.6/dist-packages/apertium_init.py", line 13, in <module>
module = __import__('apertium-init')
ModuleNotFoundError: No module named 'apertium-init'
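Since `apertium-init` is not a valid Python identifier, a plain `import` statement cannot name it, and `__import__('apertium-init')` only succeeds if a file called `apertium-init.py` is actually on `sys.path`; the traceback suggests the installed wheel did not provide one. A hedged sketch of loading such a file explicitly by path with `importlib` (the `load_script` helper and the module name it assigns are assumptions for illustration, not the packaging fix the project adopted):

```python
import importlib.util


def load_script(path, module_name="apertium_init_script"):
    """Load a Python file whose file name is not a valid identifier."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Execute the file's top-level code inside the new module object.
    spec.loader.exec_module(module)
    return module
```

A wrapper module could then call something like `load_script("/path/to/apertium-init.py").main()`, provided the script is shipped alongside the wrapper.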
I've had to comment out some things in configure.ac and Makefile.am to get the distcheck target working; distcheck is very useful in GNU autotools since it checks some expectations that are baked into many packaging systems (Gentoo and some BSDs, for example).
For example:
apertium/apertium-fin-deu@a8d3ad2
apertium/apertium-fin@0586173
apertium-init needs to check whether the .prob files are made with the perceptron algorithm, and add -x to apertium-tagger if so. Otherwise, bootstrapped pairs will gobble up all memory and segfault as the tagger tries to load the .prob as an HMM.
E.g. apertium-eng's eng.prob is perceptron.
See apertium/streamparser#33 for an approach.
Giellatekno pairs require slightly different makefile stuffs.
These pairs are both directions GT:
[email protected]:/apertium/apertium-sme-sma.git
[email protected]:/apertium/apertium-sme-smj.git
[email protected]:/apertium/apertium-sme-smn.git
These are one direction GT:
[email protected]:/apertium/apertium-sme-nob.git
TARGETS_COMMON = $(LANG1).automorf.bin \
$(LANG1).autogen.bin $(LANG1).autopgen.bin $(LANG1).autogen.bin \
$(LANG1).automorf.att.gz $(LANG1).autogen.att.gz \
$(LANG1).rlx.bin
$(LANG1).autogen.bin is listed twice. 9a77dcc
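A repeated entry like this can be filtered out when the list is generated. A minimal sketch, assuming the targets are held in a Python list before being written to Makefile.am (the `unique_targets` helper is hypothetical):

```python
def unique_targets(targets):
    """Drop repeated entries while keeping first-seen order."""
    # dict preserves insertion order, so this keeps the first occurrence.
    return list(dict.fromkeys(targets))


targets = [
    "$(LANG1).automorf.bin",
    "$(LANG1).autogen.bin",
    "$(LANG1).autopgen.bin",
    "$(LANG1).autogen.bin",
]
print(unique_targets(targets))
```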
e.g.
should be .lrx
fran@matxine:/tmp/apertium-pus$ ./autogen.sh
make
configure.ac:4: installing './install-sh'
configure.ac:4: installing './missing'
Makefile.am: installing './INSTALL'
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether ln -s works... yes
checking for gawk... (cached) gawk
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for APERTIUM... yes
checking for LTTOOLBOX... yes
checking for cg-comp... /usr/bin/cg-comp
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating apertium-pus.pc
fran@matxine:/tmp/apertium-pus$ make
apertium-validate-dictionary apertium-pus.pus.dix
lt-comp lr apertium-pus.pus.dix pus.automorf.bin apertium-pus.pus.acx
final@inconditional 4 11
main@standard 16 22
lt-print pus.automorf.bin | gzip -9 -c -n > pus.automorf.att.gz
test -d .deps || mkdir .deps
touch .deps/.d
apertium-validate-dictionary apertium-pus.pus.dix
lt-comp rl apertium-pus.pus.dix pus.autogen.bin apertium-pus.pus.acx
final@inconditional 4 11
main@standard 16 22
lt-print pus.autogen.bin | gzip -9 -c -n > pus.autogen.att.gz
lt-comp lr apertium-pus.post-pus.dix pus.autopgen.bin
main@standard 5 8
make: *** No rule to make target 'corpus/pus.tagged', needed by 'pus.prob'. Stop.
This page: http://wiki.apertium.org/wiki/Apertium-init
Should specify how to run the thing.
Add an option to apertium-init to include appropriate apertium-separable stages in the pipeline for translation modes, along with a relevant example file and relevant details in the README.
See https://sourceforge.net/p/apertium/tickets/88/#b3d3
For all pairs, we should add a debug target that includes the untrimmed analyser (easy), and debug modes that are like the regular ones, but use the untrimmed analyser (boring).
Should be safe to create apertium-lol instead of apertium-apertium-lol if the user gave "apertium-lol"
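One way to express this is to strip the prefix from the user's input before adding it back. A minimal sketch; the `module_dir_name` helper and its `prefix` parameter are hypothetical, with the parameter added only to show how a custom prefix such as `ling073` could share the same logic:

```python
def module_dir_name(name, prefix="apertium"):
    """Return the directory name for a module, adding the prefix only once."""
    lead = prefix + "-"
    if name.startswith(lead):
        # The user already typed the prefix, so don't double it.
        name = name[len(lead):]
    return lead + name


print(module_dir_name("lol"))
print(module_dir_name("apertium-lol"))
```

Both calls produce `apertium-lol`.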
I tried apertium-init out:
$ apertium-init/apertium-init.py tlh-swe --analyser1=hfst
Successfully created /home/tpirinen/github/apertium/apertium-tlh-swe.
Initialized git repository apertium-tlh-swe.
Successfully added and committed files to git repository apertium-tlh-swe.
To push your new local repository to incubator in the Apertium organisation on GitHub:
apertium-init.py -pe /home/tpirinen/github/apertium/apertium-tlh-swe
$ apertium-init/apertium-init.py -pe apertium-tlh-swe/
usage: apertium-init.py [-h] [-d DESTINATION] [-p]
[-pe PUSH_EXISTING_TO_GITHUB] [-u USERNAME]
[-a {lt,lttoolbox,hfst}] [-a1 {lt,lttoolbox,hfst}]
[-a2 {lt,lttoolbox,hfst}] [--no-rlx1] [--no-rlx2]
[--no-prob1] [--no-prob2] [--no-pgen1] [--no-pgen2]
name
apertium-init.py: error: the following arguments are required: name
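The error arises because the positional `name` is required even when `-pe` is given with its own argument. A hedged sketch of one way to structure this with `argparse`, making the positional optional and validating the combination afterwards; this illustrates the technique only and is not the script's actual parser (the help texts and error message are made up):

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="apertium-init.py")
    # nargs="?" makes the positional optional so -pe can be used on its own.
    parser.add_argument("name", nargs="?", help="name of new module or pair")
    parser.add_argument("-pe", dest="push_existing_to_github",
                        help="push an existing local repository")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Exactly one of the two ways of invoking the tool must be chosen.
    if (args.name is None) == (args.push_existing_to_github is None):
        parser.error("give either a module name or -pe DIR")
    return args
```

With this layout, `parse(["-pe", "apertium-tlh-swe/"])` is accepted without a name, while an empty argument list still exits with a usage error.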
I also cannot execute git push -u origin master or so in the dir:
$ git push -u origin master
fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Add optional support to hfst modules for enabling spelling modules, an extra twoc module for morphotactic constraints, and spellrelax. Perhaps these should all be options at configure/autogen time, so that they can be enabled as needed after a module is started, but skeletons of the files they require should probably also be provided.
The readme should be updated to remove references to SVN. There should be a symlink to README.md. The README should be made markdowny.
The current rules for checking whether some program exists don't work very well, e.g. they set the CGCOMP variable to "no", which might make you try to compile with the program "no" (or check for the existence of the program "no", leading to No rule to make target 'no').
According to @flammie, this is a better check:
AC_PATH_PROG([CGCOMP], [cg-comp], [false], [$PATH$PATH_SEPARATOR$with_cg_comp/bin])
AS_IF([test x$CGCOMP = xfalse], [AC_MSG_ERROR([You don't have cg-comp installed])])
It's also much shorter than
AC_PATH_PROG(CGCOMP, cg-comp, no)
if test x$ac_cv_path_CGCOMP = x
then
AC_MSG_ERROR([You don't have cg-comp installed.])
fi
if test x$ac_cv_path_CGCOMP = xno
then
AC_MSG_ERROR([You don't have cg-comp installed.])
fi
:-)
Website should be wiki page
Add an option to apertium-init to give hfst modules the ability to write twol-type prefix constraints. This will require adjusting Makefile.am, adding a twoc file, and providing relevant details in the README.
Could all xml files start with:
<?xml version='1.0' encoding='utf-8'?>
It seems to help vim select xml mode on my system; currently it's a bit random which files have this.
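A minimal sketch of how the declaration could be added consistently, assuming the template contents are available as strings (the `with_xml_declaration` helper is hypothetical):

```python
XML_DECLARATION = "<?xml version='1.0' encoding='utf-8'?>"


def with_xml_declaration(text):
    """Return `text` with an XML declaration prepended if it lacks one."""
    # Ignore a leading byte-order mark when checking for an existing one.
    if text.lstrip("\ufeff").startswith("<?xml"):
        return text
    return XML_DECLARATION + "\n" + text


print(with_xml_declaration("<dictionary/>"))
```

The function is idempotent, so it can be applied to every XML template without checking first which ones already carry the declaration.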
after apertium-init foo-bar, the Makefile.am contains lines
$(PREFIX1).automorf.bin \
$(PREFIX1).autogen.bin \
$(PREFIX2).automorf.bin \
$(PREFIX2).autogen.bin \
with space after the \
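Whitespace after a continuation backslash is easy to miss by eye, and a backslash that is not the last character on its line may not be treated as a line continuation by make. A small sketch for cleaning generated Makefile text, assuming it is available as a string (the `strip_space_after_backslash` helper is hypothetical):

```python
import re

# A backslash followed by spaces or tabs before the end of the line.
TRAILING_AFTER_BACKSLASH = re.compile(r"\\[ \t]+$")


def strip_space_after_backslash(text):
    """Remove whitespace that follows a line-continuation backslash."""
    return "\n".join(
        # r"\\" in a replacement template stands for one literal backslash.
        TRAILING_AFTER_BACKSLASH.sub(r"\\", line)
        for line in text.split("\n")
    )
```

Lines without the problem pass through unchanged, so the function can be run over the whole of Makefile.am.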
in the README, we have no docs/examples for:
--no-prob
--no-rlx
--with-twoc