Apertium-Init (a.k.a. Apertium Bootstrap)

Bootstrap Apertium language modules and pairs using apertium-init.py.

Installation

There are 3 ways to obtain Apertium-Init:

If the script was downloaded directly, it can be run with python3 apertium-init.py. Otherwise it will be installed as apertium-init.

Usage

Usage depends on the desired module and is described below. Remember to search for TODO in the generated module to add example sentences, etc.
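You can also script the search for TODO markers. The following is a minimal sketch. It assumes the markers appear literally as TODO in the generated text files. The helpers find_todos_in_text and find_todos are hypothetical and not part of apertium-init.

```python
from pathlib import Path


def find_todos_in_text(text):
    """Return (line_number, line) pairs for lines containing the marker 'TODO'."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if "TODO" in line
    ]


def find_todos(module_dir):
    """Scan UTF-8 text files under module_dir (ignoring .git) and collect TODO lines."""
    results = []
    for path in sorted(Path(module_dir).rglob("*")):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or binary files such as compiled .bin files
        for number, line in find_todos_in_text(text):
            results.append((path, number, line))
    return results
```

For example, find_todos("apertium-foo") would list the remaining TODO lines in a freshly bootstrapped module.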

Monolingual Lttoolbox module

To bootstrap a monolingual language module apertium-foo using the lttoolbox formalism,

$ python3 apertium-init.py foo

To bootstrap and compile it at the same time,

$ python3 apertium-init.py foo && ( cd apertium-foo/ && make )

Monolingual HFST module

To bootstrap a monolingual language module apertium-foo using the HFST formalism,

$ python3 apertium-init.py foo --analyser=hfst

To bootstrap and compile it at the same time,

$ python3 apertium-init.py foo --analyser=hfst && ( cd apertium-foo/ && make )

To include a twoc file for handling prefixes,

$ python3 apertium-init.py foo --analyser=hfst --with-twoc

To include a spellrelax file for handling typographical variance,

$ python3 apertium-init.py foo --analyser=hfst --with-spellrelax

Bilingual Lttoolbox module

To bootstrap a bilingual language module apertium-foo-bar where the monolingual packages apertium-foo and apertium-bar both use the lttoolbox formalism,

$ python3 apertium-init.py foo-bar

To bootstrap and compile it at the same time,

$ python3 apertium-init.py foo-bar && ( cd apertium-foo-bar/ && ./autogen.sh && make test)
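The directory used in the cd step follows from the name argument. The name foo bootstraps apertium-foo, and foo-bar bootstraps the pair apertium-foo-bar of foo and bar. Below is a small sketch of that convention. describe_target is a hypothetical helper, not apertium-init's own argument handling. Rejecting names with more than one hyphen is an assumption made for this example.

```python
def describe_target(name):
    """Describe what apertium-init would bootstrap for a given name argument.

    Hypothetical illustration of the naming convention only: 'foo' is a
    monolingual module, 'foo-bar' is a pair of 'foo' and 'bar'.
    """
    codes = name.split("-")
    if len(codes) not in (1, 2) or not all(codes):
        raise ValueError(f"expected a name like 'foo' or 'foo-bar', got {name!r}")
    kind = "monolingual" if len(codes) == 1 else "bilingual"
    return {"kind": kind, "languages": codes, "directory": f"apertium-{name}"}
```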

Bilingual HFST/Lttoolbox module

To bootstrap a bilingual language module apertium-foo-bar where the monolingual packages apertium-foo and apertium-bar use the HFST formalism and/or the lttoolbox formalism,

$ python3 apertium-init.py foo-bar --analysers=hfst # Both foo and bar use HFST
$ python3 apertium-init.py foo-bar --analyser1=hfst # Only foo (first language) uses HFST
$ python3 apertium-init.py foo-bar --analyser2=hfst # Only bar (second language) uses HFST

To bootstrap and compile it at the same time,

$ python3 apertium-init.py foo-bar --analysers=hfst && ( cd apertium-foo-bar/ && ./autogen.sh && make test) # Both foo and bar use HFST
$ python3 apertium-init.py foo-bar --analyser1=hfst && ( cd apertium-foo-bar/ && ./autogen.sh && make test) # Only foo (first language) uses HFST
$ python3 apertium-init.py foo-bar --analyser2=hfst && ( cd apertium-foo-bar/ && ./autogen.sh && make test) # Only bar (second language) uses HFST

Bilingual module for monolinguals that don't use apertium-tagger or CG

To bootstrap a bilingual module when one or both of the monolingual modules don't use apertium-tagger,

$ python3 apertium-init.py foo-bar --no-prob1            # Only foo doesn't have .prob
$ python3 apertium-init.py foo-bar --no-prob2            # Only bar doesn't have .prob
$ python3 apertium-init.py foo-bar --no-prob1 --no-prob2 # Neither foo nor bar has .prob

To bootstrap a bilingual module when one or both of the monolingual modules don't use CG,

$ python3 apertium-init.py foo-bar --no-rlx1           # Only foo doesn't have .rlx
$ python3 apertium-init.py foo-bar --no-rlx2           # Only bar doesn't have .rlx
$ python3 apertium-init.py foo-bar --no-rlx1 --no-rlx2 # Neither foo nor bar has .rlx

Bilingual module using recursive transfer

To bootstrap a bilingual module which uses apertium-recursive,

$ python3 apertium-init.py foo-bar --transfer=rtx

Bilingual module with anaphora resolution

To bootstrap a bilingual module which uses apertium-anaphora,

$ python3 apertium-init.py foo-bar --with-anaphora

Bilingual module with discontiguous multiwords

To bootstrap a bilingual module which uses apertium-separable,

$ python3 apertium-init.py foo-bar --with-separable

Adding features to an existing module

Apertium-init can reconfigure an existing module or pair. For example, to add apertium-separable to an existing pair:

$ python3 apertium-init.py foo-bar -r --with-separable

Note that all desired options must be specified. If the foo-bar pair used apertium-anaphora, the above command would remove it.

Pushing to GitHub

To bootstrap a module or pair and also add it to the Apertium incubator on GitHub,

$ python3 apertium-init.py foo -p     # Bootstrap module apertium-foo and push to GitHub
$ python3 apertium-init.py foo-bar -p # Bootstrap pair apertium-foo-bar and push to GitHub

To specify which username to push as (rather than relying on git config),

$ python3 apertium-init.py foo -p -u bar # Bootstrap module apertium-foo and push to GitHub under username bar

Development

After updating vanilla files, run make in the root of the repository to generate apertium-init.py.

You can also do sudo make install to install to /usr/local/bin/apertium-init or e.g. PREFIX=$HOME/local make install to install to $HOME/local/bin/apertium-init.

Use pipenv install --dev to install the packages needed for development, e.g. linters.

Releasing

After installing the development requirements as described above, deploying to PyPI is relatively straightforward.

Use make dist to create a source distribution in the dist directory that can be installed locally via pip.

Use make test-release and make release to deploy to the test PyPI instance and the production PyPI instance, respectively. Either step requires PyPI credentials with access to the apertium-init package.

Contributors

dependabot[bot], dolphingarlic, flammie, ftyers, itaisteinherz, jiffygist, jonorthwash, khannatanmai, mr-martian, nlhowell, shardulc, sushain97, tinodidriksen, unhammer

Issues

should run ./autogen.sh in monolinguals

That way, you can do

apertium-init foo
apertium-init bar
apertium-init foo-bar
cd apertium-foo-bar
./autogen.sh --with-lang1=../apertium-foo --with-lang2=../apertium-bar
make langs

should warn if it couldn't find the English language name for an iso code

$ apertium-init foo
Succesfully created /apertium-foo.
Unable to add /tmp/apertium-foo to SVN: svn: E155007: '/tmp' is not a working copy
$ head apertium-foo/README
foo

                            apertium-foo
===============================================================================

This is an Apertium monolingual language package for foo. What
you can use this language package for:

* Morphological analysis of foo
* Morphological generation of foo
$ 
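The warning this issue requests could look roughly like the sketch below. In the README output above, the bare code foo appears where a language name would go. The sketch assumes a lookup table from ISO 639 codes to English names. The table ENGLISH_NAMES and the function english_name are hypothetical. Falling back to the code itself mirrors the behaviour shown above.

```python
import warnings

# Hypothetical lookup table; a real implementation would need a complete
# ISO 639 code-to-name mapping.
ENGLISH_NAMES = {"eng": "English", "spa": "Spanish", "cat": "Catalan"}


def english_name(code, names=ENGLISH_NAMES):
    """Return the English name for an ISO code, warning if it is unknown."""
    if code in names:
        return names[code]
    warnings.warn(
        f"Could not find an English name for language code {code!r}; "
        "using the code itself in generated files.",
        stacklevel=2,
    )
    return code
```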

monolingual make rules and sample corpus for apertium-tagger .prob training

We should have some make rules for the unigram tagger, along with a tiny sample corpus in the right format (e.g. in a corpus subdirectory). Something like

LANG.prob: corpus/LANG.corpus
	apertium -d . tagger-training-pipeline <$< >$@

(Someone who knows what the tagger training pipeline looks like will have to fill that in.)

add option to include spellrelax

Add an option to apertium-init that, for HFST modules, adds a spellrelax file and compiles it, along with relevant details in the README.

trailing whitespace in makefile

After running apertium-init foo-bar, Makefile.am contains the lines

    $(PREFIX1).automorf.bin \ 
    $(PREFIX1).autogen.bin \ 
    $(PREFIX2).automorf.bin \ 
    $(PREFIX2).autogen.bin \ 

with a space after each \
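make treats a backslash as a line continuation only when the newline follows it immediately. With a trailing space, the backslash is an ordinary character and the line ends there. Below is a minimal sketch of a check for this. The function name continuation_whitespace is hypothetical.

```python
import re

# A backslash followed by spaces or tabs at the end of a line.
_BROKEN_CONTINUATION = re.compile(r"\\[ \t]+$")


def continuation_whitespace(text):
    """Return the 1-based line numbers where whitespace follows a trailing backslash."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if _BROKEN_CONTINUATION.search(line)
    ]
```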

autogen listed twice in monodix makefile.am

TARGETS_COMMON = $(LANG1).automorf.bin \
        $(LANG1).autogen.bin $(LANG1).autopgen.bin $(LANG1).autogen.bin \
        $(LANG1).automorf.att.gz $(LANG1).autogen.att.gz \
        $(LANG1).rlx.bin

$(LANG1).autogen.bin is listed twice. 9a77dcc
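A duplicate like this can be caught mechanically. The sketch below follows the backslash-continued lines of one assignment and reports every word that appears more than once. The helpers assignment_words and duplicate_words are hypothetical. They assume a plain NAME = words assignment, with no +=, :=, or conditionals.

```python
from collections import Counter


def assignment_words(text, variable):
    """Return the words assigned to `variable`, following backslash continuations."""
    lines = text.splitlines()
    for start, line in enumerate(lines):
        name, sep, value = line.partition("=")
        if not sep or name.strip() != variable:
            continue
        words = []
        position = start
        while True:
            stripped = value.rstrip()
            words.extend(stripped.rstrip("\\").split())
            position += 1
            if not stripped.endswith("\\") or position >= len(lines):
                return words
            value = lines[position]
    return []


def duplicate_words(text, variable):
    """Return the words that appear more than once in the assignment, in first-seen order."""
    counts = Counter(assignment_words(text, variable))
    return [word for word, count in counts.items() if count > 1]
```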

Make the apertium-init.py instructions clearer

Could the apertium-init.py instructions be made clearer? It isn't immediately obvious (to beginners like me) that you need to create your own repository when bootstrapping a new language pair and push to it.
Could the wiki also mention that when compiling the language pair you need to enter the apertium-XXX-YYY directory first? Otherwise you get an error that autogen.sh doesn't exist...

<?xml declaration for XML files?

Could all xml files start with:

<?xml version='1.0' encoding='utf-8'?>

It seems to help vim select XML mode on my system; currently it's somewhat random which files have this.
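A generator could guarantee the declaration by prepending it to any XML template that lacks one. Below is a minimal sketch. It assumes templates are handled as strings before being written out. The name ensure_xml_declaration is hypothetical.

```python
XML_DECLARATION = "<?xml version='1.0' encoding='utf-8'?>"


def ensure_xml_declaration(text):
    """Prepend the XML declaration unless the document already starts with one."""
    if text.startswith("<?xml"):
        return text
    return XML_DECLARATION + "\n" + text
```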

Update README

The README should be updated to remove references to SVN. There should be a symlink to README.md, and the README should be converted to Markdown.

The instructions are wrong

I tried apertium-init out:

$ apertium-init/apertium-init.py tlh-swe --analyser1=hfst
Successfully created /home/tpirinen/github/apertium/apertium-tlh-swe.
Initialized git repository apertium-tlh-swe.
Successfully added and committed files to git repository apertium-tlh-swe.
To push your new local repository to incubator in the Apertium organisation on GitHub:
	apertium-init.py -pe /home/tpirinen/github/apertium/apertium-tlh-swe
$ apertium-init/apertium-init.py -pe apertium-tlh-swe/
usage: apertium-init.py [-h] [-d DESTINATION] [-p]
                        [-pe PUSH_EXISTING_TO_GITHUB] [-u USERNAME]
                        [-a {lt,lttoolbox,hfst}] [-a1 {lt,lttoolbox,hfst}]
                        [-a2 {lt,lttoolbox,hfst}] [--no-rlx1] [--no-rlx2]
                        [--no-prob1] [--no-prob2] [--no-pgen1] [--no-pgen2]
                        name
apertium-init.py: error: the following arguments are required: name

I also cannot run git push -u origin master (or similar) in the directory:

$ git push -u origin master
fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

add option to include apertium-separable

Add an option to apertium-init to include appropriate apertium-separable stages in the pipeline for translation modes, along with a relevant example file and relevant details in the README.

pip issues

apertium-init does not seem to install correctly using pip.

jwashin1@pearson105:~/Documents$ sudo pip3 install apertium-init
Collecting apertium-init
  Downloading apertium_init-2.1.0-py3-none-any.whl (75kB)
    100% |████████████████████████████████| 81kB 4.2MB/s 
Installing collected packages: apertium-init
Successfully installed apertium-init-2.1.0
jwashin1@pearson105:~/Documents$ apertium-init -h
Traceback (most recent call last):
  File "/usr/local/bin/apertium-init", line 7, in <module>
    from apertium_init import main
  File "/usr/local/lib/python3.6/dist-packages/apertium_init.py", line 13, in <module>
    module = __import__('apertium-init')
ModuleNotFoundError: No module named 'apertium-init'
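The traceback shows apertium_init.py calling __import__('apertium-init'). That call succeeds only if a module file named apertium-init.py is importable from sys.path, and the installed package apparently did not provide one. A module whose file name contains a hyphen can instead be loaded by path with importlib. Below is a minimal sketch. load_module_from_path is a hypothetical helper, not the change actually made in apertium-init.

```python
import importlib.util
import sys


def load_module_from_path(path, module_name):
    """Import a Python source file whose name is not a valid identifier.

    module_name is the name registered in sys.modules; it need not match
    the file name (e.g. 'apertium_init_script' for 'apertium-init.py').
    """
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```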

use AS_IF in configure.ac checks for programs

The current rules for checking whether a program exists don't work very well: for example, they set the CGCOMP variable to "no", which might make you try to compile with the program "no" (or check for the existence of the program "no", giving the error No rule to make target 'no').

According to @flammie, this is a better check:

AC_PATH_PROG([CGCOMP], [cg-comp], [false], [$PATH$PATH_SEPARATOR$with_cg_comp/bin])
AS_IF([test x$CGCOMP = xfalse], [AC_MSG_ERROR([You don't have cg-comp installed])])

It's also much shorter than

AC_PATH_PROG(CGCOMP, cg-comp, no)
if test x$ac_cv_path_CGCOMP = x
then
  AC_MSG_ERROR([You don't have cg-comp installed.])
fi
if test x$ac_cv_path_CGCOMP = xno
then
  AC_MSG_ERROR([You don't have cg-comp installed.])
fi

:-)

missing \ after .prob in TARGETS_COMMON when --no-rlx1

did

$ apertium-init -a2 hfst --no-rlx1 --no-pgen2 lol-wat

got

TARGETS_COMMON = \
    $(PREFIX1).automorf.bin \ 
    $(PREFIX1).prob 
    $(PREFIX1).rlx.bin \
    $(PREFIX1).autobil.bin \

and

config.status: creating Makefile
Makefile:656: *** missing separator.  Stop.
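Every line of such a list except the last must end in a backslash with nothing after it. When a line breaks this rule, make ends the assignment there. The next indented line is then neither an assignment nor a rule, which matches the missing separator error above. Below is a minimal sketch of a strict check, assuming the list ends at the first blank line. The name missing_continuations is hypothetical. Because the check is strict, it also flags a backslash followed by spaces.

```python
def missing_continuations(text, variable):
    """Return 1-based line numbers in `variable`'s block that lack a final backslash.

    The block runs from the line assigning `variable` up to the first blank
    line; every line except the last must end with a backslash immediately
    before the newline.
    """
    lines = text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.split("=")[0].strip() == variable),
        None,
    )
    if start is None:
        return []
    end = start
    while end + 1 < len(lines) and lines[end + 1].strip():
        end += 1
    return [i + 1 for i in range(start, end) if not lines[i].endswith("\\")]
```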

apertium-init hangs waiting for input

fran@matxine:/tmp$ python3 ~/scripts/apertium-init.py --help
^CTraceback (most recent call last):
  File "/home/fran/scripts/apertium-init.py", line 18, in <module>
    import subprocess
  File "/usr/lib/python3.6/subprocess.py", line 50, in <module>
    import signal
  File "/home/fran/scripts/signal.py", line 4, in <module>
    doc = sys.stdin.read()
KeyboardInterrupt

What XML does it expect to read from stdin?

$ echo "" | python3 ~/scripts/apertium-init.py --help
Traceback (most recent call last):
  File "/home/fran/scripts/apertium-init.py", line 18, in <module>
    import subprocess
  File "/usr/lib/python3.6/subprocess.py", line 50, in <module>
    import signal
  File "/home/fran/scripts/signal.py", line 6, in <module>
    root = ET.fromstring(doc)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1315, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 2, column 0
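In both tracebacks, the standard library's subprocess runs import signal, which resolves to /home/fran/scripts/signal.py. That local script shares its name with the standard-library module, and it is what reads XML from stdin. Python puts the directory of the script being run first on sys.path, so a local file like this can shadow a standard module. Below is a minimal sketch for checking where a module name resolves, using importlib.util.find_spec. The helper names are hypothetical, and the stdlib comparison is a heuristic meant for standard-library module names.

```python
import importlib.util
import sysconfig
from pathlib import Path


def module_origin(name):
    """Return where a top-level module name would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


def resolves_outside_stdlib(name):
    """Heuristic: True if a standard-library module name resolves to a file
    outside the standard library directory (i.e. it is probably shadowed)."""
    origin = module_origin(name)
    if origin is None or origin in ("built-in", "frozen"):
        return False
    stdlib = Path(sysconfig.get_paths()["stdlib"]).resolve()
    return stdlib not in Path(origin).resolve().parents
```

Running resolves_outside_stdlib("signal") from inside ~/scripts would be expected to report True in the situation above.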

Ignore .prob on bilingual modules

The .prob files have to exist, but they belong to the monolingual modules. They shouldn't be versioned in the bilingual ones.

Example:

  • apertium-cat -> apertium-cat.cat.prob added to Git
  • apertium-spa -> apertium-spa.spa.prob added to Git
  • apertium-spa-cat -> cat-spa.prob and spa-cat.prob ignored in Git
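The pair's ignore rules could be generated from the two language codes. Below is a minimal sketch. It assumes the .prob files in a pair directory are named after both translation directions, as in the example above. The helpers prob_gitignore_entries and add_gitignore_entries are hypothetical. The leading slash anchors each pattern to the repository root.

```python
def prob_gitignore_entries(lang1, lang2):
    """Return .gitignore patterns for the .prob files of both translation directions."""
    return [f"/{lang1}-{lang2}.prob", f"/{lang2}-{lang1}.prob"]


def add_gitignore_entries(existing, entries):
    """Append any entries missing from an existing .gitignore text, preserving it."""
    present = {line.strip() for line in existing.splitlines()}
    missing = [entry for entry in entries if entry not in present]
    if not missing:
        return existing
    prefix = existing if not existing or existing.endswith("\n") else existing + "\n"
    return prefix + "\n".join(missing) + "\n"
```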

fix issues with bidix

There are some issues with the included bidix: the alphabet should be empty (or non-existent?) and the "sg" tags shouldn't be in the example entries. It might also be good to have entries in two different languages, especially ones with incompatible POS subcategories (e.g. casa<n><f>).

apertium-tagger needs -x sometimes

apertium-init needs to check whether the .prob files were made with the perceptron algorithm and, if so, pass -x to apertium-tagger. Otherwise, bootstrapped pairs will consume all available memory and segfault as the tagger tries to load the .prob as an HMM.

E.g. apertium-eng's eng.prob is perceptron.

apertium-init doesn't make corpus dir

fran@matxine:/tmp/apertium-pus$ ./autogen.sh 
make
configure.ac:4: installing './install-sh'
configure.ac:4: installing './missing'
Makefile.am: installing './INSTALL'
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether ln -s works... yes
checking for gawk... (cached) gawk
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for APERTIUM... yes
checking for LTTOOLBOX... yes
checking for cg-comp... /usr/bin/cg-comp
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating apertium-pus.pc
fran@matxine:/tmp/apertium-pus$ make
apertium-validate-dictionary apertium-pus.pus.dix
lt-comp lr apertium-pus.pus.dix pus.automorf.bin apertium-pus.pus.acx
final@inconditional 4 11
main@standard 16 22
lt-print pus.automorf.bin | gzip -9 -c -n > pus.automorf.att.gz
test -d .deps || mkdir .deps
touch .deps/.d
apertium-validate-dictionary apertium-pus.pus.dix
lt-comp rl apertium-pus.pus.dix pus.autogen.bin apertium-pus.pus.acx
final@inconditional 4 11
main@standard 16 22
lt-print pus.autogen.bin | gzip -9 -c -n > pus.autogen.att.gz
lt-comp lr apertium-pus.post-pus.dix pus.autopgen.bin
main@standard 5 8
make: *** No rule to make target 'corpus/pus.tagged', needed by 'pus.prob'.  Stop.
