
apertium-init's Issues

missing \ after .prob in TARGETS_COMMON when --no-rlx1

did

$ apertium-init -a2 hfst --no-rlx1 --no-pgen2 lol-wat

got

TARGETS_COMMON = \
    $(PREFIX1).automorf.bin \ 
    $(PREFIX1).prob 
    $(PREFIX1).rlx.bin \
    $(PREFIX1).autobil.bin \

and

config.status: creating Makefile
Makefile:656: *** missing separator.  Stop.
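
One way a generator could avoid this class of bug is to collect the targets in a list and add the continuation backslashes only when joining, so that leaving out an optional target cannot strand a line without its backslash. A minimal sketch, assuming a hypothetical helper make_var (this is not apertium-init's actual code):

```python
def make_var(name, items, indent="    "):
    """Render a Makefile variable with one item per continuation line."""
    if not items:
        return f"{name} =\n"
    # Joining with " \<newline>" puts a backslash on every line except the
    # last, and never leaves whitespace after a backslash.
    sep = " \\\n" + indent
    return f"{name} = \\\n{indent}" + sep.join(items) + "\n"


if __name__ == "__main__":
    targets = ["$(PREFIX1).automorf.bin", "$(PREFIX1).prob", "$(PREFIX1).autobil.bin"]
    print(make_var("TARGETS_COMMON", targets), end="")
```

Optional targets are then simply appended to (or left out of) the list before rendering.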

should run ./autogen.sh in monolinguals

That way, you can do

apertium-init foo
apertium-init bar
apertium-init foo-bar
cd apertium-foo-bar
./autogen.sh --with-lang1=../apertium-foo --with-lang2=../apertium-bar
make langs

apertium-init hangs waiting for input

fran@matxine:/tmp$ python3 ~/scripts/apertium-init.py --help
^CTraceback (most recent call last):
  File "/home/fran/scripts/apertium-init.py", line 18, in <module>
    import subprocess
  File "/usr/lib/python3.6/subprocess.py", line 50, in <module>
    import signal
  File "/home/fran/scripts/signal.py", line 4, in <module>
    doc = sys.stdin.read()
KeyboardInterrupt

What XML does it expect to read from stdin?

$ echo "" | python3 ~/scripts/apertium-init.py --help
Traceback (most recent call last):
  File "/home/fran/scripts/apertium-init.py", line 18, in <module>
    import subprocess
  File "/usr/lib/python3.6/subprocess.py", line 50, in <module>
    import signal
  File "/home/fran/scripts/signal.py", line 6, in <module>
    root = ET.fromstring(doc)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1315, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 2, column 0

Make apertium-init.py instructions clearer

Is it possible to make the instructions of apertium-init.py clearer? It's not immediately obvious (to beginners like me) that when bootstrapping a new language pair you need to create your own repo and push to it.
Also, could it be added to the wiki that you need to enter the apertium-XXX-YYY directory before compiling the language pair? Otherwise you get an error that autogen.sh doesn't exist...

monolingual make rules and sample corpus for apertium-tagger .prob training

We should have some make rules for the unigram tagger, along with a tiny sample corpus of the right format (e.g. in the subdir corpus). Something like

LANG.prob: corpus/LANG.corpus
	apertium -d . tagger-training-pipeline <$< >$@

(Someone with an idea what the tagger training pipeline looks like will have to fill that out.)

Ignore .prob on bilingual modules

The .prob files have to exist, but they belong to the monolingual modules, so they should be versioned there and not in the bilingual modules.

Example:

  • apertium-cat -> apertium-cat.cat.prob added to Git
  • apertium-spa -> apertium-spa.spa.prob added to Git
  • apertium-spa-cat -> cat-spa.prob and spa-cat.prob ignored in Git

add option to include spellrelax

Add an option to apertium-init that, for hfst modules, adds a spellrelax file and compiles it, and adds the relevant details to the README.

should warn if it couldn't find the English language name for an iso code

$ apertium-init foo
Succesfully created /apertium-foo.
Unable to add /tmp/apertium-foo to SVN: svn: E155007: '/tmp' is not a working copy
$ head apertium-foo/README
foo

                            apertium-foo
===============================================================================

This is an Apertium monolingual language package for foo. What
you can use this language package for:

* Morphological analysis of foo
* Morphological generation of foo
$ 
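
The output above shows the raw code "foo" used silently wherever a language name was expected. A minimal sketch of a lookup that falls back to the code but says so on stderr; the LANG_NAMES table and the language_name function are hypothetical and stand in for whatever lookup apertium-init really uses:

```python
import sys

# Hypothetical, tiny lookup table for illustration only.
LANG_NAMES = {"eng": "English", "spa": "Spanish", "cat": "Catalan"}


def language_name(code, table=LANG_NAMES, stream=sys.stderr):
    """Return the English name for `code`, warning and falling back to the code itself."""
    name = table.get(code)
    if name is None:
        print(
            f"Warning: no English language name found for '{code}'; using the code itself.",
            file=stream,
        )
        return code
    return name


if __name__ == "__main__":
    print(language_name("cat"))
    print(language_name("foo"))
```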

fix issues with bidix

There are some issues with the included bidix: the alphabet should be empty (or nonexistent?) and the "sg" tags shouldn't be in the example entries. It might also be good to have entries in two different languages, especially ones with incompatible POS sub-categories (e.g. casa<n><f>).

pip issues

apertium-init does not seem to install correctly using pip.

jwashin1@pearson105:~/Documents$ sudo pip3 install apertium-init
Collecting apertium-init
  Downloading apertium_init-2.1.0-py3-none-any.whl (75kB)
    100% |████████████████████████████████| 81kB 4.2MB/s 
Installing collected packages: apertium-init
Successfully installed apertium-init-2.1.0
jwashin1@pearson105:~/Documents$ apertium-init -h
Traceback (most recent call last):
  File "/usr/local/bin/apertium-init", line 7, in <module>
    from apertium_init import main
  File "/usr/local/lib/python3.6/dist-packages/apertium_init.py", line 13, in <module>
    module = __import__('apertium-init')
ModuleNotFoundError: No module named 'apertium-init'
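
The traceback shows the installed wrapper calling __import__('apertium-init'), which can only succeed if a file named apertium-init.py is importable from sys.path, and that is evidently not the case after this install. One possible approach, sketched here under that reading, is to load the hyphen-named script by explicit path with importlib; load_script and the file name used in the demo are hypothetical:

```python
import importlib.util
import os
import tempfile


def load_script(path, module_name):
    """Load a Python file whose name is not a valid identifier as a module."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


if __name__ == "__main__":
    # Demo with a throwaway hyphen-named script.
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "my-tool.py")
        with open(script, "w") as f:
            f.write("def main():\n    return 'ok'\n")
        print(load_script(script, "my_tool").main())
```

Renaming the module to a valid identifier and shipping it in the wheel would avoid the need for this altogether.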

apertium-tagger needs -x sometimes

apertium-init needs to check whether the .prob files were made with the perceptron algorithm, and add -x to the apertium-tagger call if so. Otherwise, bootstrapped pairs will gobble up all memory and segfault as the tagger tries to load the .prob as an HMM.

E.g. apertium-eng's eng.prob is perceptron.

autogen listed twice in monodix makefile.am

TARGETS_COMMON = $(LANG1).automorf.bin \
        $(LANG1).autogen.bin $(LANG1).autopgen.bin $(LANG1).autogen.bin \
        $(LANG1).automorf.att.gz $(LANG1).autogen.att.gz \
        $(LANG1).rlx.bin

$(LANG1).autogen.bin is listed twice. 9a77dcc
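
If the target list is assembled programmatically, duplicates like this can be dropped while keeping the original order. A small sketch, assuming a hypothetical helper dedupe:

```python
def dedupe(targets):
    """Remove repeated targets, keeping the first occurrence of each."""
    # dict preserves insertion order, so this is an order-preserving "set".
    return list(dict.fromkeys(targets))


if __name__ == "__main__":
    targets = [
        "$(LANG1).automorf.bin",
        "$(LANG1).autogen.bin",
        "$(LANG1).autopgen.bin",
        "$(LANG1).autogen.bin",
        "$(LANG1).rlx.bin",
    ]
    for target in dedupe(targets):
        print(target)
```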

apertium-init doesn't make corpus dir

fran@matxine:/tmp/apertium-pus$ ./autogen.sh 
make
configure.ac:4: installing './install-sh'
configure.ac:4: installing './missing'
Makefile.am: installing './INSTALL'
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether ln -s works... yes
checking for gawk... (cached) gawk
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for APERTIUM... yes
checking for LTTOOLBOX... yes
checking for cg-comp... /usr/bin/cg-comp
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating apertium-pus.pc
fran@matxine:/tmp/apertium-pus$ make
apertium-validate-dictionary apertium-pus.pus.dix
lt-comp lr apertium-pus.pus.dix pus.automorf.bin apertium-pus.pus.acx
final@inconditional 4 11
main@standard 16 22
lt-print pus.automorf.bin | gzip -9 -c -n > pus.automorf.att.gz
test -d .deps || mkdir .deps
touch .deps/.d
apertium-validate-dictionary apertium-pus.pus.dix
lt-comp rl apertium-pus.pus.dix pus.autogen.bin apertium-pus.pus.acx
final@inconditional 4 11
main@standard 16 22
lt-print pus.autogen.bin | gzip -9 -c -n > pus.autogen.att.gz
lt-comp lr apertium-pus.post-pus.dix pus.autopgen.bin
main@standard 5 8
make: *** No rule to make target 'corpus/pus.tagged', needed by 'pus.prob'.  Stop.

add option to include apertium-separable

Add an option to apertium-init to include appropriate apertium-separable stages in the pipeline for translation modes, along with a relevant example file and relevant details in the README.

The instructions are wrong

I tried apertium-init out:

$ apertium-init/apertium-init.py tlh-swe --analyser1=hfst
Successfully created /home/tpirinen/github/apertium/apertium-tlh-swe.
Initialized git repository apertium-tlh-swe.
Successfully added and committed files to git repository apertium-tlh-swe.
To push your new local repository to incubator in the Apertium organisation on GitHub:
	apertium-init.py -pe /home/tpirinen/github/apertium/apertium-tlh-swe
$ apertium-init/apertium-init.py -pe apertium-tlh-swe/
usage: apertium-init.py [-h] [-d DESTINATION] [-p]
                        [-pe PUSH_EXISTING_TO_GITHUB] [-u USERNAME]
                        [-a {lt,lttoolbox,hfst}] [-a1 {lt,lttoolbox,hfst}]
                        [-a2 {lt,lttoolbox,hfst}] [--no-rlx1] [--no-rlx2]
                        [--no-prob1] [--no-prob2] [--no-pgen1] [--no-pgen2]
                        name
apertium-init.py: error: the following arguments are required: name

I also cannot execute git push -u origin master or so in the dir:

$ git push -u origin master
fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
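
The usage message above lists name as a required positional even when -pe is given, which is why the suggested command fails. A sketch of one way the argument parsing could accept -pe on its own, using argparse; this reproduces only the two relevant arguments and is an assumption about a possible fix, not apertium-init's real parser:

```python
import argparse


def parse(argv):
    parser = argparse.ArgumentParser(prog="apertium-init.py")
    parser.add_argument("-pe", dest="push_existing_to_github",
                        metavar="PUSH_EXISTING_TO_GITHUB")
    # nargs="?" makes the positional optional so that -pe can stand alone.
    parser.add_argument("name", nargs="?")
    args = parser.parse_args(argv)
    if args.name is None and args.push_existing_to_github is None:
        parser.error("either name or -pe is required")
    return args


if __name__ == "__main__":
    print(parse(["-pe", "apertium-tlh-swe/"]))
    print(parse(["tlh-swe"]))
```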

Update README

The README should be updated to remove references to SVN and converted to Markdown. There should also be a symlink to README.md.

use AS_IF in configure.ac checks for programs

The current rules for checking whether some program exists don't work very well, e.g. they set the CGCOMP variable to "no", which might make you try to compile with the program "no" (or check for the existence of the program "no", leading to No rule to make target 'no').

According to @flammie, this is a better check:

AC_PATH_PROG([CGCOMP], [cg-comp], [false], [$PATH$PATH_SEPARATOR$with_cg_comp/bin])
AS_IF([test x$CGCOMP = xfalse], [AC_MSG_ERROR([You don't have cg-comp installed])])

It's also much shorter than

AC_PATH_PROG(CGCOMP, cg-comp, no)
if test x$ac_cv_path_CGCOMP = x
then
  AC_MSG_ERROR([You don't have cg-comp installed.])
fi
if test x$ac_cv_path_CGCOMP = xno
then
  AC_MSG_ERROR([You don't have cg-comp installed.])
fi

:-)

¿<?xml for xml files?

Could all xml files start with:

<?xml version='1.0' encoding='utf-8'?>

On my system this seems to be what makes vim select XML mode. Currently it's a bit random which files have it.
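
A sketch of how template text could be normalised so that every XML file starts with that declaration; ensure_xml_declaration is a hypothetical helper, not part of apertium-init:

```python
XML_DECL = "<?xml version='1.0' encoding='utf-8'?>"


def ensure_xml_declaration(text):
    """Prepend the XML declaration unless the text already begins with one."""
    # Leading whitespace is ignored for the check only; the text is not altered
    # when a declaration is already present.
    if text.lstrip().startswith("<?xml"):
        return text
    return XML_DECL + "\n" + text


if __name__ == "__main__":
    print(ensure_xml_declaration("<dictionary>\n</dictionary>\n"), end="")
```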

trailing whitespace in makefile

after apertium-init foo-bar, the Makefile.am contains lines

    $(PREFIX1).automorf.bin \ 
    $(PREFIX1).autogen.bin \ 
    $(PREFIX2).automorf.bin \ 
    $(PREFIX2).autogen.bin \ 

with space after the \
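
A sketch of a post-processing step that removes whitespace after a line-continuation backslash in generated Makefile.am text; strip_space_after_backslash is a hypothetical helper:

```python
import re

# A backslash followed by spaces or tabs at the end of a line.
SPACE_AFTER_BACKSLASH = re.compile(r"\\[ \t]+$", re.MULTILINE)


def strip_space_after_backslash(text):
    """Drop spaces and tabs that follow a trailing backslash on any line."""
    return SPACE_AFTER_BACKSLASH.sub(lambda m: "\\", text)


if __name__ == "__main__":
    sample = "    $(PREFIX1).automorf.bin \\ \n    $(PREFIX1).autogen.bin \\ \n"
    print(repr(strip_space_after_backslash(sample)))
```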
