apertium-init issues from apertium

define some LEXCFLAGS?

like in
apertium/apertium-nio@e1f4494

maybe we should also have some $(UPPERCASE) programs

missing \ after .prob in TARGETS_COMMON when --no-rlx1

did

$ apertium-init -a2 hfst --no-rlx1 --no-pgen2 lol-wat

got

TARGETS_COMMON = \
    $(PREFIX1).automorf.bin \ 
    $(PREFIX1).prob 
    $(PREFIX1).rlx.bin \
    $(PREFIX1).autobil.bin \

and

config.status: creating Makefile
Makefile:656: *** missing separator.  Stop.
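
One way to avoid this class of bug is to build the continuation list from a filtered list of targets and join it, rather than emitting each line with its own hard-coded backslash. The following is only a sketch under that assumption; `render_targets` and the `use_rlx` flag are hypothetical names, not taken from apertium-init.

```python
def render_targets(name, targets):
    """Render a Makefile variable with one target per continuation line."""
    # Drop entries disabled by options (None or empty) before joining, so
    # every line except the last gets exactly one " \" and nothing after it.
    kept = [t for t in targets if t]
    body = " \\\n".join("    " + t for t in kept)
    return name + " = \\\n" + body + "\n"


if __name__ == "__main__":
    use_rlx = False  # hypothetical flag standing in for --no-rlx1
    print(render_targets("TARGETS_COMMON", [
        "$(PREFIX1).automorf.bin",
        "$(PREFIX1).prob",
        "$(PREFIX1).rlx.bin" if use_rlx else None,
        "$(PREFIX1).autobil.bin",
    ]))
```

Because the separator is added by the join, removing a target cannot leave a line without its backslash, and no line ends in trailing whitespace.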

Auto warn when TODOs not replaced on push

cc @ftyers @jonorthwash

Use base 64 instead of base 85 for < Python 3.4
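
The standard library's base85 helpers (`base64.b85encode` and `base64.b85decode`) were added in Python 3.4, which is presumably the constraint behind this request. A minimal sketch of a codec-tagged fallback follows; it assumes the data in question is an encoded blob, and the function names are hypothetical.

```python
import base64


def encode_blob(data):
    """Return (codec, text), preferring base85 when the runtime has it."""
    if hasattr(base64, "b85encode"):  # only present from Python 3.4
        return "b85", base64.b85encode(data).decode("ascii")
    return "b64", base64.b64encode(data).decode("ascii")


def decode_blob(codec, text):
    """Decode text produced by encode_blob using the recorded codec."""
    raw = text.encode("ascii")
    if codec == "b85":
        return base64.b85decode(raw)
    return base64.b64decode(raw)
```

If the blob is produced once and shipped, always using base64 is the simpler option, at the cost of a somewhat larger payload.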

Set repository metadata on create (apertium-incubator)

topic
website
description

should run ./autogen.sh in monolinguals

That way, you can do

apertium-init foo
apertium-init bar
apertium-init foo-bar
cd apertium-foo-bar
./autogen.sh --with-lang1=../apertium-foo --with-lang2=../apertium-bar
make langs

"apertium-init -a hfst nio" doesn't give language code in configure.ac

 -AC_INIT([Apertium {{languageName1}}-{{languageName2}}], [0.1.0], [[email protected]], [apertium-{{languageCode1}}-{{languageCode2}}], [http://wiki.apertium.org/wiki/Apertium-{{languageCode1}}-{{languageCode2}}])
+AC_INIT([apertium-nio], [0.0.0], [[email protected]], [apertium-nio], [http://wiki.apertium.org/wiki/Apertium-nio])

.prob is not installed (monolingual)

fixed by having

apertium_foo_DATA=$(TARGETS_COMMON) $(LANG1).prob

instead of

apertium_foo_DATA=$(TARGETS_COMMON)

apertium-init hangs waiting for input

fran@matxine:/tmp$ python3 ~/scripts/apertium-init.py --help
^CTraceback (most recent call last):
  File "/home/fran/scripts/apertium-init.py", line 18, in <module>
    import subprocess
  File "/usr/lib/python3.6/subprocess.py", line 50, in <module>
    import signal
  File "/home/fran/scripts/signal.py", line 4, in <module>
    doc = sys.stdin.read()
KeyboardInterrupt

What is the XML that it expects to read from stdin ?

$ echo "" | python3 ~/scripts/apertium-init.py --help
Traceback (most recent call last):
  File "/home/fran/scripts/apertium-init.py", line 18, in <module>
    import subprocess
  File "/usr/lib/python3.6/subprocess.py", line 50, in <module>
    import signal
  File "/home/fran/scripts/signal.py", line 6, in <module>
    root = ET.fromstring(doc)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1315, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 2, column 0
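
Both tracebacks show `import signal` inside the standard library's `subprocess.py` resolving to `/home/fran/scripts/signal.py`, i.e. a local file in the script's directory is shadowing the standard `signal` module, and that file is what reads XML from stdin. A small sketch for spotting such files is below; `shadowing_modules` is a hypothetical helper, and `sys.stdlib_module_names` is assumed to be available (Python 3.10+), with an illustrative fallback otherwise.

```python
import os
import sys

# sys.stdlib_module_names exists from Python 3.10; the fallback set is a
# small illustrative sample, not a complete list of stdlib modules.
STDLIB_NAMES = getattr(
    sys, "stdlib_module_names",
    frozenset({"signal", "subprocess", "random", "string"}))


def shadowing_modules(directory):
    """List .py files in `directory` whose names match stdlib modules."""
    found = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        if ext == ".py" and stem in STDLIB_NAMES:
            found.append(entry)
    return found
```

Running it on the directory that holds the script (here `~/scripts`) would report `signal.py`; renaming or moving that file should stop the hang.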

Support different naming conventions.

E.g., instead of apertium-xyz, maybe ling073-xyz (like for the course I teach). Or even things like xyzmorph?

Could this be changed to GPLv3+

Title says all.

change lrx-proc to be lrx-proc -m by default

Make apertium-init.py instructions clearer

Could the instructions for apertium-init.py be made clearer? It's not immediately obvious (to beginners like me) that, when bootstrapping a new language pair, you need to create your own repo and push to it.
Could the wiki also mention that you need to enter the apertium-XXX-YYY directory before compiling the language pair? Otherwise you get an error saying that autogen.sh doesn't exist.

Add apertium-incubator topic on init

Increase test coverage to 95%

monolingual make rules and sample corpus for apertium-tagger .prob training

We should have some make rules for the unigram tagger, along with a tiny sample corpus of the right format (e.g. in the subdir corpus). Something like

LANG.prob: corpus/LANG.corpus
	apertium -d . tagger-training-pipeline <$< >$@

(Someone with an idea what the tagger training pipeline looks like will have to fill that out.)

Ignore .prob on bilingual modules

The .prob files have to exist, but they belong to the monolingual modules. They shouldn't be versioned in the bilingual ones.

Example:

apertium-cat -> apertium-cat.cat.prob added to GIT
apertium-spa -> apertium-spa.spa.prob added to GIT
apertium-spa-cat -> cat-spa.prob and spa-cat.prob ignored in GIT
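
As a rough illustration of the bilingual case, the ignore entries can be derived from the two language codes. This is a sketch only; `bilingual_prob_ignores` is a hypothetical helper and the file names simply follow the example above.

```python
def bilingual_prob_ignores(lang1, lang2):
    """Return .gitignore entries for the .prob files of a bilingual module."""
    # One entry per translation direction, e.g. spa-cat.prob and cat-spa.prob.
    return [
        "{}-{}.prob".format(lang1, lang2),
        "{}-{}.prob".format(lang2, lang1),
    ]


if __name__ == "__main__":
    print("\n".join(bilingual_prob_ignores("spa", "cat")))
```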

add option to include spellrelax

Add an option to apertium-init to add for hfst modules a spellrelax file and compile it, along with relevant details in the README.

add option to include prefixes in monolingual HFST modules

Add an option to enable twoc-style prefixational morphology in HFST modules, along with a relevant example file and relevant details in the README.

should warn if it couldn't find the English language name for an iso code

$ apertium-init foo
Succesfully created /apertium-foo.
Unable to add /tmp/apertium-foo to SVN: svn: E155007: '/tmp' is not a working copy
$ head apertium-foo/README
foo

                            apertium-foo
===============================================================================

This is an Apertium monolingual language package for foo. What
you can use this language package for:

* Morphological analysis of foo
* Morphological generation of foo
$
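
In the output above the code "foo" is used as the language name without any notice. A minimal sketch of a lookup that warns on a miss follows; the `LANGUAGE_NAMES` table and `language_name` function are hypothetical stand-ins, not the tool's actual data or API.

```python
import warnings

# Hypothetical sample table; a real tool would need a full ISO 639 list.
LANGUAGE_NAMES = {"eng": "English", "spa": "Spanish", "cat": "Catalan"}


def language_name(code, names=LANGUAGE_NAMES):
    """Return the English name for `code`, warning if none is known."""
    try:
        return names[code]
    except KeyError:
        warnings.warn(
            "No English name found for language code %r; "
            "using the code itself" % code)
        return code
```

The fallback still lets the module be created, but the user sees that the README will contain the bare code and can fix it by hand.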

fix issues with bidix

There are some issues with the included bidix: the alphabet should be empty (or nonexistent?) and the "sg" tags shouldn't be in the example entries. It also might be good to have entries in two different languages, especially ones with incompatible POS sub-categories (e.g. casa<n><f>).

make sure monolingual modules show only the right files in the README

Currently HFST modules show the "dix" file in the list of files in the README, and it's likely that lttoolbox modules show "hfst" files in their README. This needs to be checked and fixed.

Add CI configs

pip issues

apertium-init does not seem to install correctly using pip.

jwashin1@pearson105:~/Documents$ sudo pip3 install apertium-init
Collecting apertium-init
  Downloading apertium_init-2.1.0-py3-none-any.whl (75kB)
    100% |████████████████████████████████| 81kB 4.2MB/s 
Installing collected packages: apertium-init
Successfully installed apertium-init-2.1.0
jwashin1@pearson105:~/Documents$ apertium-init -h
Traceback (most recent call last):
  File "/usr/local/bin/apertium-init", line 7, in <module>
    from apertium_init import main
  File "/usr/local/lib/python3.6/dist-packages/apertium_init.py", line 13, in <module>
    module = __import__('apertium-init')
ModuleNotFoundError: No module named 'apertium-init'
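
The traceback shows `__import__('apertium-init')` failing, which happens whenever no `apertium-init.py` is importable from `sys.path`. As a sketch of one alternative (not necessarily the fix that was adopted), a file with a hyphen in its name can be loaded from an explicit path with `importlib`; `load_module_from_path` is a hypothetical helper name.

```python
import importlib.util


def load_module_from_path(module_name, path):
    """Load a Python source file as a module, regardless of its file name."""
    # spec_from_file_location takes the file path directly, so the file
    # does not have to be a valid identifier or be on sys.path.
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

This only helps if the hyphenated file is actually shipped in the package; the other route is to ship the code under an importable name such as `apertium_init` in the first place.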

Correctly set repository description

feature: support for giella-pairs

default Makefiles and configs should work with make distcheck

I've had to comment out some things in configure.ac and Makefile.am to get the distcheck target working. distcheck is very useful in GNU autotools, since it checks some expectations that are baked into many package mangling systems (Gentoo and some BSDs, for example).

For example:

apertium/apertium-fin-deu@a8d3ad2
apertium/apertium-fin@0586173

apertium-tagger needs -x sometimes

apertium-init needs to check whether the .prob files are made with the perceptron algorithm, and add -x to apertium-tagger if so. Otherwise, bootstrapped pairs will gobble up all memory and segfault as tagger tries to load the .prob as HMM.

E.g. apertium-eng's eng.prob is perceptron.

Switch to pipenv

See apertium/streamparser#33 for an approach.

Support bootstrapping pairs where one side is from GT infra

Giellatekno (GT) pairs require a slightly different Makefile setup.

These pairs use GT in both directions:

[email protected]:/apertium/apertium-sme-sma.git
[email protected]:/apertium/apertium-sme-smj.git
[email protected]:/apertium/apertium-sme-smn.git

This one uses GT in one direction only:

[email protected]:/apertium/apertium-sme-nob.git

autogen listed twice in monodix makefile.am

TARGETS_COMMON = $(LANG1).automorf.bin \
        $(LANG1).autogen.bin $(LANG1).autopgen.bin $(LANG1).autogen.bin \
        $(LANG1).automorf.att.gz $(LANG1).autogen.att.gz \
        $(LANG1).rlx.bin

$(LANG1).autogen.bin is listed twice. 9a77dcc
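
If the target list is assembled programmatically, duplicates like this can be filtered out before the Makefile is written. A small order-preserving sketch, with `unique_targets` as a hypothetical helper name:

```python
def unique_targets(targets):
    """Return targets with later duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for target in targets:
        if target not in seen:
            seen.add(target)
            result.append(target)
    return result


if __name__ == "__main__":
    print(unique_targets([
        "$(LANG1).automorf.bin",
        "$(LANG1).autogen.bin",
        "$(LANG1).autopgen.bin",
        "$(LANG1).autogen.bin",
    ]))
```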

readme says rlx instead of lrx

e.g.

apertium-nor-eng.nor-eng.rlx - Lexical selection rules for translating into Norwegian

should be .lrx

apertium-init doesn't make corpus dir

fran@matxine:/tmp/apertium-pus$ ./autogen.sh 
make
configure.ac:4: installing './install-sh'
configure.ac:4: installing './missing'
Makefile.am: installing './INSTALL'
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether ln -s works... yes
checking for gawk... (cached) gawk
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for APERTIUM... yes
checking for LTTOOLBOX... yes
checking for cg-comp... /usr/bin/cg-comp
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating apertium-pus.pc
fran@matxine:/tmp/apertium-pus$ make
apertium-validate-dictionary apertium-pus.pus.dix
lt-comp lr apertium-pus.pus.dix pus.automorf.bin apertium-pus.pus.acx
final@inconditional 4 11
main@standard 16 22
lt-print pus.automorf.bin | gzip -9 -c -n > pus.automorf.att.gz
test -d .deps || mkdir .deps
touch .deps/.d
apertium-validate-dictionary apertium-pus.pus.dix
lt-comp rl apertium-pus.pus.dix pus.autogen.bin apertium-pus.pus.acx
final@inconditional 4 11
main@standard 16 22
lt-print pus.autogen.bin | gzip -9 -c -n > pus.autogen.att.gz
lt-comp lr apertium-pus.post-pus.dix pus.autopgen.bin
main@standard 5 8
make: *** No rule to make target 'corpus/pus.tagged', needed by 'pus.prob'.  Stop.

Add description on how to run it to the wiki page

This page: http://wiki.apertium.org/wiki/Apertium-init

Should specify how to run the thing.

add option to include apertium-separable

Add an option to apertium-init to include appropriate apertium-separable stages in the pipeline for translation modes, along with a relevant example file and relevant details in the README.

add goal for untrimmed debug modes?

See https://sourceforge.net/p/apertium/tickets/88/#b3d3

For all pairs, we should add a debug target that includes the untrimmed analyser (easy), and debug modes that are like the regular ones, but use the untrimmed analyser (boring).

DTRT when user types "apertium-lol" instead of just "lol"

Should be safe to create apertium-lol instead of apertium-apertium-lol if the user gave "apertium-lol"
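
A minimal sketch of that normalisation, assuming the directory name is always the prefix plus the user-supplied name; `module_dirname` and its `prefix` parameter are hypothetical, not the tool's actual API.

```python
def module_dirname(name, prefix="apertium"):
    """Return the module directory name, adding the prefix at most once."""
    lead = prefix + "-"
    # If the user already typed the prefix, strip it before re-adding it.
    if name.startswith(lead):
        name = name[len(lead):]
    return lead + name


if __name__ == "__main__":
    print(module_dirname("lol"))           # apertium-lol
    print(module_dirname("apertium-lol"))  # apertium-lol
```

Keeping the prefix as a parameter would also leave room for other naming schemes, such as a course-specific prefix.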

The instructions are wrong

I tried apertium-init out:

$ apertium-init/apertium-init.py tlh-swe --analyser1=hfst
Successfully created /home/tpirinen/github/apertium/apertium-tlh-swe.
Initialized git repository apertium-tlh-swe.
Successfully added and committed files to git repository apertium-tlh-swe.
To push your new local repository to incubator in the Apertium organisation on GitHub:
	apertium-init.py -pe /home/tpirinen/github/apertium/apertium-tlh-swe
$ apertium-init/apertium-init.py -pe apertium-tlh-swe/
usage: apertium-init.py [-h] [-d DESTINATION] [-p]
                        [-pe PUSH_EXISTING_TO_GITHUB] [-u USERNAME]
                        [-a {lt,lttoolbox,hfst}] [-a1 {lt,lttoolbox,hfst}]
                        [-a2 {lt,lttoolbox,hfst}] [--no-rlx1] [--no-rlx2]
                        [--no-prob1] [--no-prob2] [--no-pgen1] [--no-pgen2]
                        name
apertium-init.py: error: the following arguments are required: name

I also cannot execute git push -u origin master or so in the dir:

$ git push -u origin master
fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Package for pypi and expose apertium-init script.

support for more features in hfst modules

Add optional support to hfst modules for enabling spelling modules, an extra twoc module for morphotactic constraints, and spellrelax. Perhaps these should all be options at configure/autogen time, so that they can be enabled as needed after a module is started, but skeletons of the files they require should probably also be provided.

Update README

The readme should be updated to remove references to SVN. There should be a symlink to README.md. The README should be made markdowny.

use AS_IF in configure.ac checks for programs

The current rules for checking whether some program exists don't work very well, e.g. they set the CGCOMP variable to "no", which might make you try to compile with the program "no" (or check for the existence of the program "no", leading to No rule to make target 'no').

According to @flammie, this is a better check:

AC_PATH_PROG([CGCOMP], [cg-comp], [false], [$PATH$PATH_SEPARATOR$with_cg_comp/bin])
AS_IF([test x$CGCOMP = xfalse], [AC_MSG_ERROR([You don't have cg-comp installed])])

It's also much shorter than

AC_PATH_PROG(CGCOMP, cg-comp, no)
if test x$ac_cv_path_CGCOMP = x
then
  AC_MSG_ERROR([You don't have cg-comp installed.])
fi
if test x$ac_cv_path_CGCOMP = xno
then
  AC_MSG_ERROR([You don't have cg-comp installed.])
fi

:-)

<?xml version='1.0' encoding='utf-8'?>

Having this declaration seems to make vim select XML mode on my system. Currently it's a bit random which files have it.

trailing whitespace in makefile

after apertium-init foo-bar, the Makefile.am contains lines

    $(PREFIX1).automorf.bin \ 
    $(PREFIX1).autogen.bin \ 
    $(PREFIX2).automorf.bin \ 
    $(PREFIX2).autogen.bin \

with space after the \
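
Whitespace after a line-continuation backslash is easy to miss by eye, so a check can be scripted. The following sketch finds and strips it with a regular expression; both function names are hypothetical.

```python
import re

# A backslash followed by spaces or tabs at the end of a line.
SPACE_AFTER_BACKSLASH = re.compile(r"\\[ \t]+$", re.MULTILINE)


def lines_with_space_after_backslash(text):
    """Return 1-based numbers of lines ending in a backslash plus whitespace."""
    return [number
            for number, line in enumerate(text.splitlines(), 1)
            if SPACE_AFTER_BACKSLASH.search(line)]


def strip_space_after_backslash(text):
    """Remove whitespace that follows a trailing backslash on any line."""
    return SPACE_AFTER_BACKSLASH.sub(r"\\", text)
```

Such a check could run over the generated Makefile.am in the test suite, so a regression in the templates is caught before release.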

more docs for other options

in the README, we have no docs/examples for:

--no-prob
--no-rlx
--with-twoc
github stuff
