
omorfi's Introduction

🇫🇮Omorfi–Open morphology of Finnish

This is a free/libre open source morphology of Finnish: a database, tools and APIs. It provides everything you need to build NLP applications that process Finnish words and texts.

  • 🇫🇮 high-quality Finnish text analysis and generation
  • 🩸 bleeding edge
  • ⚡ blazing fast


Documentation

I try to keep this README very condensed for GitHub. For more detailed information, see the GitHub pages for omorfi.

Citing and academic works

Citation information is available via GitHub's "Cite this repository" function, backed by CITATION.cff. For further details, see the omorfi articles.

Downloading and further information

Omorfi source packages can be downloaded from the GitHub releases, or the most current version can be obtained using git. For more information, see the release policy.

Dependencies

  • hfst-3.15 or greater
  • python-3.5 or greater
  • libhfst-python
  • C++ compiler and libtool
  • GNU autoconf-2.64, automake-1.12 and a compatible pkg-config implementation

Optionally:

  • VISL CG 3
  • hfst-ospell-0.2.0 or greater, needed for spell-checking
  • Java 7 or greater, for the Java bindings

Installation

For detailed instructions and explanations of the different options, see the installation instructions on the GitHub pages site. This README is a quick reference.

Full installation

Requires all dependencies to be installed.

autoreconf -i
./configure
make
make install

This will install binaries and scripts for all users in typical environments.

Minimal "installation"

To skip language model building and use some of the scripts locally:

autoreconf -i
./configure
src/bash/omorfi-download.bash

This will download some of the pre-compiled dictionaries into your current working directory.

Python installation

Omorfi can be installed as a Python package via pip or Anaconda. Features whose dependencies are not available from pip or Anaconda will not be usable, e.g. syntactic analysis and disambiguation using VISL CG 3.

pip install omorfi

Anaconda

conda install -c flammie omorfi
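
After a pip or conda install, a quick smoke test along these lines should confirm that the package works; this is only a sketch and assumes the language models can be found in one of the default search locations (e.g. after running omorfi-download.bash):

from omorfi.omorfi import Omorfi

omorfi = Omorfi()
omorfi.load_from_dir()                # load analysers from the default directories
print(omorfi.lemmatise("kissoille"))  # prints candidate lemmas with weights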

Docker

Omorfi can be used with a ready-made Docker container; there is a Dockerfile in src/docker/Dockerfile for that.

docker build -t "omorfi:Dockerfile" .
docker run -it "omorfi:Dockerfile" bash

Usage

Omorfi can be used from the command line with the following commands:

  1. omorfi-disambiguate-text.sh: analyse and disambiguate
  2. omorfi-analyse-text.sh: analyse
  3. omorfi-spell.sh: spell-check and correct
  4. omorfi-segment.sh: morphologically segment
  5. omorfi-conllu.bash: analyse in CONLL-U format
  6. omorfi-freq-evals.bash: analyse coverage and statistics
  7. omorfi-ftb3.bash: analyse in FTB-3 format (CONLL-X)
  8. omorfi-factorise.bash: analyse in Moses-SMT factorised format
  9. omorfi-vislcg.bash: analyse in VISL CG 3 format
  10. omorfi-analyse-tokenised.sh: analyse word per line (faster)
  11. omorfi-generate.sh: generate word-forms from omor descriptions
  12. omorfi-download.bash: download language models from latest release

For further details, please refer to the documentation on the GitHub pages site.

Programming APIs

Omorfi can be used via very simple programming APIs; the design is detailed in the omorfi API design documentation.
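
A minimal sketch of the Python API follows; the class and method names are the ones used in the examples elsewhere on this page, but the exact shape of the results has varied between releases (tuples of analysis string and weight in older versions, dicts with 'anal', 'weight' and 'OOV' keys in newer ones):

from omorfi.omorfi import Omorfi

omorfi = Omorfi()
omorfi.load_from_dir()                 # searches the default installation directories for models
for analysis in omorfi.analyse("autoilijat"):
    print(analysis)                    # one entry per possible morphological reading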

Using binary models

There are various binary language models that can be used with specialised tools such as HFST. For further details, see our usage examples.
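
For example, an analyser automaton can be loaded with the HFST Python bindings (libhfst-python, listed under dependencies). This is only a hedged sketch: the file path shown is the typical default install location, and the exact hfst API calls may differ between HFST versions.

import hfst

# typical install location is $prefix/share/omorfi/
stream = hfst.HfstInputStream("/usr/local/share/omorfi/omorfi.analyse.hfst")
analyser = stream.read()
analyser.lookup_optimize()        # convert for fast lookup, if not already in optimised format
for analysis, weight in analyser.lookup("kissa"):
    print(analysis, weight)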

Troubleshooting

For full descriptions and archived problems, see the troubleshooting page on the GitHub pages site.

hfst-lexc: Unknown option

Update HFST.

ImportError (or other Python problems)

For the Python scripts to work, you need to install them to the same prefix as Python, or define PYTHONPATH, e.g. export PYTHONPATH=/usr/local/lib/python3.11/site-packages/

Processing text gets stuck / takes long

This can easily happen for legitimate reasons. It can be mitigated by filtering out overlong tokens, or by processing the texts in smaller pieces.
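
A sketch of such a pre-filter for tokenised (word-per-line) input follows; the length limit is an arbitrary choice for illustration, not something omorfi prescribes:

import sys

MAX_TOKEN_LENGTH = 100  # arbitrary cut-off; absurdly long tokens are rarely real words

for line in sys.stdin:
    token = line.strip()
    if 0 < len(token) <= MAX_TOKEN_LENGTH:
        print(token)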

Make gets killed

Get more RAM or swap space.

Contributing

Omorfi code and data are free/libre open source and community-driven; to participate, read the further information in CONTRIBUTING.

Contact

  • Issues and problems, including support questions, may be filed in our GitHub issue tracker
  • The IRC channel #omorfi on OFTC is particularly good for live support questions, suggestions and discussions
  • The omorfi-devel mailing list is good for longer, more involved discussions

You can always discuss in English or Finnish on any of the channels.

Code of conduct

See our code of conduct.

Donations

A lot of omorfi development has been done in spare time and by volunteers. If you want to support Flammie, you can use GitHub's ❤️ Sponsor button, or any of the services below:

Donate using Liberapay

Become a Patron!

omorfi's People

Contributors

davissorenson, flammie, frankier, jkuokkala, joonazan, ljleppan, lurkki14, traubert


omorfi's Issues

Check autoconf files into repository

Getting autogen.sh to actually work seems to be quite a challenge here. It requires different versions of the tools than my distro provides, and thus fails to build entirely. Since these files don't need to be rebuilt for each machine anyway, why not just check them into the repository, to make building the code a lot easier?

If nothing else then at least on the stable branch?

Original issue reported on code.google.com by [email protected] on 16 Mar 2015 at 3:40

XRE parsing failed while building on OSX

Got this while building omorfi (GitHub master) with hfst-3.8.3 on Mac OS X 10.10.4:

/usr/local/bin/python3 ./python/generate-regexes.py \
        -r token -o generated/omorfi-token.regex
/opt/hfst/bin/hfst-regexp2fst --verbose --semicolon -j -i generated/omorfi-token.regex |\
        /opt/hfst/bin/hfst-minimize --verbose -o generated/omorfi-token.regex.hfst
Output format not specified, defaulting to openfst tropical
Reading from generated/omorfi-token.regex, writing to <stdout>
Compiling expression #1
Reading from <stdin>, writing to generated/omorfi-token.regex.hfst
Compiling expression #2
Minimizing xre(?)...
echo '0:"\n" ;' | /opt/hfst/bin/hfst-regexp2fst -o generated/omorfi.token-separator.hfst
*** xre parsing failed: syntax error, unexpected LEXER_ERROR
***    parsing 0:"
 [near "]

<stdin>.1: XRE parsing failed
make[1]: *** [generated/omorfi.token-separator.hfst] Error 1
make: *** [all-recursive] Error 1

Propers with boundaries require hyphen at the boundary

In boundaries.tsv we have e.g.
 Vanka|vesi, Haapa|tie, Halenneen|kiven|tie, Laukaa-Konne|vesi

These should get analysed with or without a hyphen at the boundaries, but currently they only get an analysis if there is a hyphen at the first boundary after each capital letter.

Examples:

> Vankavedellä
Vankavedellä    Vankavedellä+?  inf

> Vanka-vedellä
Vanka-vedellä   [WORD_ID=Vankavesi][POS=NOUN][PROPER=PROPER][NUM=SG][CASE=ADE]  0,000000

> Haapatie
Haapatie        [WORD_ID=haapa][POS=NOUN][NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][WORD_ID=tie][POS=NOUN][NUM=SG][CASE=NOM]        1,000977

> Haapa-tie
Haapa-tie       [WORD_ID=Haapatie][POS=NOUN][PROPER=PROPER][NUM=SG][CASE=NOM]   0,000000
Haapa-tie       [WORD_ID=Haapa][POS=NOUN][PROPER=PROPER][NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][WORD_ID=tie][POS=NOUN][NUM=SG][CASE=NOM]  1,000977
Haapa-tie       [WORD_ID=haapa][POS=NOUN][NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][WORD_ID=tie][POS=NOUN][NUM=SG][CASE=NOM]        1,000977

> Halenneenkiventie
Halenneenkiventie       Halenneenkiventie+?     inf

> Halenneenkiven-tie
Halenneenkiven-tie      Halenneenkiven-tie+?    inf

> Halenneen-kiventie
Halenneen-kiventie      [WORD_ID=Halenneenkiventie][POS=NOUN][PROPER=PROPER][NUM=SG][CASE=NOM]  0,000000

> Halenneen-kiven-tie
Halenneen-kiven-tie     [WORD_ID=Halenneenkiventie][POS=NOUN][PROPER=PROPER][NUM=SG][CASE=NOM]  0,000000

> Laukaa-Konnevesi
Laukaa-Konnevesi        Laukaa-Konnevesi+?      inf

> Laukaa-Konne-vesi
Laukaa-Konne-vesi       [WORD_ID=Laukaa-Konnevesi][POS=NOUN][PROPER=PROPER][NUM=SG][CASE=NOM]   0,000000
Laukaa-Konne-vesi       [WORD_ID=Laukaa][POS=NOUN][PROPER=PROPER][NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][WORD_ID=Konnevesi][POS=NOUN][PROPER=PROPER][NUM=SG][CASE=NOM]    1,000977

Original issue reported on code.google.com by [email protected] on 11 Jun 2014 at 5:46

Compound numerals get flawed ftb3 base forms

Base form constructing goes wrong somewhere with numeral compounds.
Examples:

> satakuusikymmentäviisi
satakuusikymmentäviisi  sa#kuu#kymmen Par#viisi Num Nom Sg      0,000000
satakuusikymmentäviisi  sa#kuusikymmentä#viisi Num Nom Sg       0,000000

> kolmattasataa
kolmattasataa   kolma Ord Par#sata Num Par Sg   0,000000

> viidettäkymmentä
viidettäkymmentä        vii Abe#kymmenen Num Par Sg     0,000000
viidettäkymmentä        viide Ord Par#kymmenen Num Par Sg       0,000000

> satavuotias
satavuotias     sa#-vuotias A Pos Nom Sg        0,000000
satavuotias     sata#vuotias A Pos Nom Sg       0,000000
satavuotias     sata#vuotias N Nom Sg   0,000000


Original issue reported on code.google.com by [email protected] on 13 Jun 2014 at 12:36

Omorfi import into Giella infra 24.7.2016 broken

the file src/morphology/stems/punct.lexc contains entries like:

%+Foc%_kO%+Foc%_hAn:{MB}ko{MB}han   # ;
%+Foc%_kO%+Foc%_kAAn:{MB}ko{MB}kaan # ;
%+Foc%_kO%+Foc%_kin:{MB}ko{MB}kin   # ;

where the multichar tags are broken, and do not correspond to what is defined in root.lexc.

The export/import was committed in revision 117903.

references

It would be nice to have in the README some references to the main articles about this resource. For instance, I found this repo by chance while reading

Pirinen, Tommi A, Juha Kuokkala, Hanna Nurmi, and Krister Linden, ‘Harmonising a Large Coverage Rule-Based Morphological Analyser for Multiple Applications — OMorFi Experiences’, 2015.

omorfi-analyse-text.sh ignores --with-hfst path and uses wrong hfst/fi path

After a successful build (on OSX 10.10.4) using:

./configure --prefix=/opt/omorfi --with-hfst=/opt/hfst/bin
make
make install

When I try to run /opt/omorfi/bin/omorfi-analyse-text.sh I get some errors:

  1. The script tries to find files in /opt/omorfi/share/hfst/fi but that directory does not exist. Instead, the files seem to have been installed in /opt/omorfi/share/omorfi. Creating a symlink makes it work.
  2. The script tries to execute hfst-proc without a path, even when a custom path has been configured with --with-hfst=/opt/hfst/bin. This causes it to fail, because the custom path is not in $PATH.

The script works after manually fixing those issues.

Release Omorfi 201710515 has undeclared dependency to hfst-optimized-lookup-java

The latest release of Omorfi (201710515) seems to require that the Java class files from hfst-optimized-lookup-java are present in the Java $CLASSPATH. These, however, are not installed by any of the declared dependencies.

This is noted by ./configure, but it is well hidden in the output and does not indicate in any way that there is an actual issue:

$./configure
...
You have no CLASSPATH, I hope it is good
checking for kaffe... no
checking for java... java
checking for gcj... no
checking for guavac... no
checking for jikes... no
checking for javac... javac
checking if javac works... yes
checking if java works... yes
checking for net.sf.hfst.Transducer class... no
...
Configured omorfi 20170515:
=======
* Analysers: yes
    * OMOR yes (flags: --omor-props   --omor-sem)
    * FTB3.1 no
    * apertium no
    * giella: no
    * labeled segmenter: no
* Limits:
    * tiny lexicons: 
    * big tests: 
* Applications
    * Voikko speller: yes
    * segmenter: no
    * lemmatiser: no
    * hyphenators: no
* Clusters
    * run tests on PBS cluster: false → mailto: no
    * run tests on SLURM cluster: false → mailto: no
say make, make check and make install to proceed

The previous release ("20161115 or UD 1.4") installs correctly without these Java files.

Implement open fitwol view of things as well

There are the makings of a TWOLised version of omorfi hiding in the Python scripts and data. Finishing it would make it possible to compare how the different implementations actually affect the construction and perhaps even the resulting automaton.

Lemmatizer performance

I installed Omorfi strictly following the instructions. Then I tested the Python API lemmatiser using the following simple code:

from omorfi.omorfi import Omorfi
omorfi = Omorfi()
omorfi.load_from_dir()
result=omorfi.lemmatise(some_test_word)

However, the lemmatisation performance does not seem to be on par with the FINTWOL online tool.
A few examples:

kissattomat:
FINTWOL: "kissaton" DN-TON A POS NOM P
Omorfi: ('kissattomat', inf)

autottomat:
FINTWOL: "autoton" DN-TON A POS NOM PL
Omorfi: (('auto', 0.0), ('autoton', 0.0))

tipattomille:
FINTWOL: "tipaton" DN-TON A POS ALL PL
Omorfi: ('tipattomille', inf)

where 'inf' means a failure to lemmatise the word (unknown word). The dictionary of the Omorfi/HFST tool seems very limited.
Is there anything I can do to improve Omorfi's performance, e.g. install some additional components or change Omorfi's settings? I would like to get results similar to FINTWOL.

Rename paradigms by most frequent word

Unhammer, among others, suggested on the IRC channel #apertium that a good naming policy for paradigms is to have the most frequent word, instead of a random one, stand for the paradigm. I like this policy and will make the necessary changes before the next release. Also available will be a better paradigm knower (and a guesser, if I have time).

omorfi-ftb3.lexc didn't compile cleanly?

On CentOS 6, I got the source code from GitHub, installed HFST 3.8.3 and ran configure:

./configure --prefix=/work/bin/omorfiversiot/151015 --with-hfst=/work/bin/hfst_v383/bin

Make goes fine for a while, but then...

make
...
420000...
 424254  entries in master db
Reading from generated/stemparts.tsv
Reading from generated/inflections.tsv
/work/bin/hfst_v383/bin/hfst-lexc --Werror -o generated/omorfi-ftb3.lexc.hfst generated/omorfi-ftb3.lexc
/work/bin/hfst_v383/bin/hfst-lexc: warning: Defaulting to OpenFst tropical type
Root...21 ADJ...18294 ADP...413 ADV...8313 CONJ...6 CONJ|VERB...8 INTJ...367 NOUN...108725 NUM...901 PRON...85 PROPN...271518 PUNCT...64 SCONJ...19 SYM...24 VERB...10953 
<clipclip>  ...3 DIGITS_MILJARDI...33 DIGITS_MILJOONA...33 DIGITS_NELJÄ...32 DIGITS_NELJÄS...
generated/omorfi-ftb3.lexc:428970.1-22: Sublexicon defined more than once. [near: `LEXICON DIGITS_NELJÄ']
/work/bin/hfst_v383/bin/hfst-lexc: The file generated/omorfi-ftb3.lexc did not compile cleanly.
(if there are no error messages above, try -v or -d to get more info)
make[1]: *** [generated/omorfi-ftb3.lexc.hfst] Error 1
make[1]: Leaving directory `/work/tmp/omorfi_uusin/omorfi/src'
make: *** [all-recursive] Error 1

Not sure if it is related, but for some reason the hfst installation created hfst-proc as a link to a non-existent file. However, I then just replaced that link with the real 'hfst-proc' from the precompiled binaries.

Before that change, the omorfi configuration stopped at this:

checking for /work/bin/hfst_v383/bin/hfst-split... yes
checking for /work/bin/hfst_v383/bin/hfst-conjunct... yes
checking for /work/bin/hfst_v383/bin/hfst-proc... no
configure: error: HFST commandline tools are needed in /work/bin/hfst_v383/bin
ls -la /work/bin/hfst_v383/bin/hfst-proc
/work/bin/hfst_v383/bin/hfst-proc -> hfst-apertium-proc
ls -la /work/bin/hfst_v383/bin/hfst-apertium-proc
ls: cannot access /work/bin/hfst_v383/bin/hfst-apertium-proc: No such file or directory

and the precompiled version, which now exists in the hfst directory, is:

./hfst-proc --version
hfst-proc 0.0 (hfst 3.8.2)

Java interface compilation fails

I get the following compilation error while trying to compile Omorfi.java against the current (https://github.com/hfst/hfst-optimized-lookup/tree/938edf4075ab1e3a08017750d80d32e903cf5376) hfst-optimized-lookup-java code base from GitHub:

...
CLASSPATH=..:./..${CLASSPATH:+":$CLASSPATH"} javac -d ..    java/com/github/flammie/omorfi/Omorfi.java
java/com/github/flammie/omorfi/Omorfi.java:256: error: analyze(String) is not public in Transducer; cannot be accessed from outside package
                analyser.analyze(wf));
                        ^
java/com/github/flammie/omorfi/Omorfi.java:265: error: analyze(String) is not public in Transducer; cannot be accessed from outside package
                analyser.analyze(wf.toUpperCase()));
                        ^
java/com/github/flammie/omorfi/Omorfi.java:275: error: analyze(String) is not public in Transducer; cannot be accessed from outside package
                analyser.analyze(wf.toLowerCase()));
                        ^
3 errors
Makefile:1199: recipe for target 'classjava.stamp' failed
make[2]: *** [classjava.stamp] Error 1
make[2]: Leaving directory '/home/marko/omorfi-20170515/src'

Also, it would be nice if configure supported passing the hfst-optimized-lookup jar path as a command-line argument instead of having to set it via CLASSPATH.

Omorfi needs a Xerox-style guesser

This is a TODO-list bug. A Xerox-style guesser, as described in the black book(?) or K. R. Beesley's articles, is a requirement for some downstream uses. It would be composed of the stem regexes in src/paradigms and the regular continuations as is.

Proper noun filtering is broken

Some process seems to leave broken filtering features in the analyses, particularly in uppercase versions of words, e.g.:

$ hfst-lookup omorfi.describe.hfst

Koira
Koira	[WORD_ID=Koira][UPOS=PROPN]ERRORMACROPROPN-BLOCKING][PROPER=GEO][NUM=SG][CASE=NOM]	0.000000

Some long strings with lots of Os cause omorfi to pause indefinitely

We have lots of words that we run through omorfi on CentOS; some words cause omorfi to get stuck and it doesn't return any result. We used 0.2 and have now tried with the newest version from GitHub from a couple of hours ago, but it is reproducible with that, too.

For example this fails

OOOOOOOOOOOOOOOOOOOOOOOOOHOOOOOOOOOOOOOOOOOOOOOOOOOO

In the attached file there are a few cases where there is a word that succeeds, and then the next word fails:
omo_zerotest.txt

Omorfi command is just:

./omorfi-analyse-text.sh omo_zerotest.txt

and I'm on CentOS. Omorfi was compiled with:

./configure --prefix=/data/tmp/omorfi_2016/omorfi_koe16 --with-hfst=/work/env/hfst_390/bin/ --enable-ftb3

PEP-8

Please excuse my bikeshedding, but I've started and so I'll finish. I thought it might be nice to make the Python code in the project pass PEP-8 checks. I would submit a PR, but it tends to cause lots of merge conflicts. So instead, if you think it's a good idea then:

  1. Add the following to setup.cfg if you want long lines:
    [flake8]
    ignore=E501

    [isort]
    line_length=120

  2. Run "autopep8 -r --in-place python" and commit/push

Then I can finish it off and add git pre-commit hooks in a PR, if you like.

OOV analyses signal in-band

Related: #60

Since the Python API will currently splice any OOV input into an output string as-is, you can get some surprising results -- opening up the possibility of injections and surprising bugs in applications -- when dealing with arbitrary or malicious inputs, like this:

>> omorfi.analyse("injektio][UPOS=NOUN][NUM=SG][CASE=NOM][WEIGHT=0.000000")
[{'anal': '[WORD_ID=injektio][UPOS=NOUN][NUM=SG][CASE=NOM][WEIGHT=0.000000][GUESS=UNKNOWN][WEIGHT=inf]', 'weight': inf, 'OOV': 'Yes', 'guess': 'None'}]

`PKG_CHECK_MODULES` error when running `./configure`

Since I'm on Fedora and omorfi documentation mostly talks about Ubuntu, I decided to try and create an Ubuntu based Docker image for omorfi.

Here is my first experimental Dockerfile:

FROM ubuntu

COPY . /omorfi
WORKDIR /omorfi
RUN apt-get update

RUN apt-get install -y autoconf automake libtool wget
RUN wget https://apertium.projectjj.com/apt/install-release.sh -O - | bash
RUN ./autogen.sh

RUN apt-get install -y hfst libhfst-dev python3 python3-libhfst zip
ENV AUTORECONF=false

RUN ./configure

I disabled autoreconf since it would always just fail.

In the ./configure script I now get this error:

./configure: line 20368: syntax error near unexpected token `LIBHFST,'
./configure: line 20368: `PKG_CHECK_MODULES(LIBHFST, hfst >= 3.15,'

I tried with

  • both nightly and release Apertium,
  • Ubuntu Xenial and Bionic,
  • with and without all Apertium dependencies listed on the Prerequisites for Debian page, as well as
  • defining PKG_CONFIG_PATH and --with-hfst= manually,

but none of those fixed the issue.

A few issues/bugs (mostly related to segmentation/tokenisation)

Hi,

I'm trying to install Omorfi and am running into a couple of (minor) issues.

  1. In omorfi.py, line 308, it should be something like "if self.tokeniser:"

  2. The tokeniser.hfst is currently not generated. I solved this by adding "generated/omorfi.tokenise.hfst" to src/Makefile.am (line 91), but this might not be required after all.

  3. When running omorfi-segment.py, I get the following error:

File "/usr/local/lib/python3.5/dist-packages/omorfi/omorfi.py", line 419, in _accept
    res = self.acceptor.lookup(token)
AttributeError: 'NoneType' object has no attribute 'lookup'

because the acceptor is skipped and not loaded:

Opening file /opt/tools/omorfi/share/omorfi/omorfi.accept.hfst
skipped ['omorfi', 'accept']

As far as I can see, the problem is line 89 in omorfi.py: the acceptor is included only if the include list is empty, but omorfi-segment.py calls it with "labelsegment=True, segment=True". Any suggestions on how to fix this without breaking the rest?

Thanks,
Yves

Some words don't work

Not sure this is the right place to report this, but here we go:

What steps will reproduce the problem?
java -Dfile.encoding=windows-1252 -jar hfst-ol.jar 
morphology.finntreebank.hfstol
Reading header...
Reading alphabet...
Reading transition and index tables...
Ready for input.
näin
näin    +?

same for:
hfst-optimized-lookup.exe 
C:\Users\Mendrik\Dropbox\omorfi\omorfi-generation.hfst.ol
näin
näin    +?

Other words seem to work fine when tested, with talon for example.

But näin, tein, etc. seem to break.

Original issue reported on code.google.com by [email protected] on 4 Feb 2015 at 11:54

factorise may have mismatched factors

It has been brought to my attention that the sort order of paths between the automata used in omorfi-factorise.py is not fully stable, such that some factors may originate from different analyses of the same surface string. There is no easy solution to this apart from a multi-tape automaton for n > 2, or a lot of hacks to simulate it in a synchronised fashion...

Are some of OMorFi's scrapers and classifiers not currently publicly available?

...or is it more that keeping lexemes.tsv up to date with the lexicographic sources is not a current priority?

By scrapers and classifiers I'm referring to the arrows pointing into "lexemes.tsv" in this flowchart: https://github.com/flammie/omorfi/blob/master/doc/omorfi-flowchart-2015.svg

There appear to be some things in /externals/. For example fiwikt2omorfi.bash, combined with guess-csv2tsv.py and add-word.bash or add-lex-data.bash, provides a route for new lexemes to enter lexemes.tsv -- but this seems broken at the moment -- for example fiwikt2omorfi generates a 524-line file from the latest dump, whereas a crude order-of-magnitude estimate using ``grep -c '<title>' src/externals/fiwiktionary-20160203-pages-articles.xml'' gives 391732 - I would expect ~10% of these to be Finnish words in their basic form.

Order of results in morphological analysis? How to pick the correct result?

Hi

I have lots of text that needs "simplification" and basic categorization of words. I would like to (1) replace each word by its baseform (lemmatization) and (2) get POS tag for each word (POS-tagging). The number of words and their order remains the same in the processed text, i.e., I need to pick one best result for each word.

I can use Omorfi's Python "analyse" and "lemmatise" functions for this job, but the problem is that these functions return multiple baseforms and POS-tags in apparently random order. I don't understand how the indexing/sorting of the output works.

Some example results for "analyse" function and wanted output:

"autoilijat" --> (('[WORD_ID=autoilija][UPOS=NOUN][NUM=PL][CASE=NOM]', 0.0), ('[WORD_ID=autoilla][UPOS=VERB][DRV=JA][NUM=PL][CASE=NOM]', 0.0))
--- Wanted output: "autoilija" (first result, UPOS=NOUN)

"autoilla" --> (('[WORD_ID=auto][UPOS=NOUN][NUM=PL][CASE=ADE]', 0.0), ('[WORD_ID=autoilla][UPOS=VERB][VOICE=ACT][INF=A][CASE=LAT]', 0.0), ('[WORD_ID=autoilla][UPOS=VERB][VOICE=PSS][MOOD=INDV][TENSE=PRESENT][PERS=PE4][NEG=CON]', 0.0), ('[WORD_ID=Autto][UPOS=PROPN][PROPER=GEO][NUM=PL][CASE=ADE]', 0.0), ('[WORD_ID=Autto_2][UPOS=PROPN][PROPER=LAST][NUM=PL][CASE=ADE]', 0.0))
--- Wanted output: "autoilla" (second result, UPOS=VERB)

"kirjoittaja" --> (('[WORD_ID=kirjoittaa][UPOS=VERB][DRV=JA][NUM=SG][CASE=NOM]', 0.0), ('[WORD_ID=kirjoittaja][UPOS=NOUN][NUM=SG][CASE=NOM]', 0.0))
---- Wanted output: "kirjoittaja" (second result, UPOS=NOUN)

"maalata" --> (('[WORD_ID=maa][UPOS=NOUN]ERRORMACROTOOSHORTFORCOMPOUND][NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][WORD_ID=lata][UPOS=NOUN][NUM=SG][CASE=NOM]', 0.0), ('[WORD_ID=maalata][UPOS=VERB][VOICE=ACT][INF=A][CASE=LAT]', 0.0), ('[WORD_ID=maalata][UPOS=VERB][VOICE=PSS][MOOD=INDV][TENSE=PRESENT][PERS=PE4][NEG=CON]'
---- Wanted output: "maalata" (third result, UPOS=VERB)

So if the input is VERB, the output should be also VERB. Same for NOUN.

Is there a way to pick the wanted result automatically? Simply choosing the first/last result does not work.

FYI: make problem with hfst-lexc (3.7.1) --Werror option

Hi,
I have been upgrading omorfi and noticed a small thing with hfst 3.7.1 (which we still use): the only problem was that the --Werror option wasn't recognised by hfst-lexc.

omorfi-20150326]$ make
Making all in src
make[1]: Entering directory `/work/tmp/omorfi-20150326/src'
/usr/local/bin/python3 ./python/tsv_expand.py \
        -j ./paradigms/morphophonology.tsv \
        -c ./paradigms/stub-deletions.tsv -v -i generated/joint.tsv -o generated/master.tsv.unsrt
head -n 1 < generated/master.tsv.unsrt > generated/master.tsv
tail -n +2 < generated/master.tsv.unsrt | sort >> generated/master.tsv
rm -f generated/master.tsv.unsrt
cat continuations/acronym-stems.tsv continuations/adjective-stems.tsv continuations/digit-stubs.tsv continuations/digit-stems.tsv continuations/noun-stems.tsv continuations/numeral-stems.tsv continuations/particle-stems.tsv continuations/pronoun-stems.tsv continuations/symbol-stems.tsv continuations/51-stems.tsv continuations/verb-stems.tsv generated/timestamp | grep -v '^#' | fgrep -v 'HEADERS' | sort -k 1,1 > generated/stemparts.tsv
cat continuations/acro-inflections.tsv continuations/adjective-inflections.tsv continuations/digit-inflections.tsv continuations/noun-inflections.tsv continuations/numeral-inflections.tsv continuations/particle-inflections.tsv continuations/pronoun-inflections.tsv continuations/symbol-inflections.tsv continuations/verb-inflections.tsv generated/timestamp | grep -v '^#' | fgrep -v 'HEADERS' | sort -k 1,1 > generated/inflections.tsv
/usr/local/bin/python3 ./python/generate-lexcs.py -v -m generated/master.tsv \
        -p generated/stemparts.tsv \
        -i generated/inflections.tsv -o generated/omorfi-ftb3.lexc -f=ftb3
Writing everything to generated/omorfi-ftb3.lexc
Creating Multichar_Symbols and Root
Reading from generated/master.tsv
420000...
 424228  entries in master db
Reading from generated/stemparts.tsv
Reading from generated/inflections.tsv
/usr/local/bin/hfst-lexc --Werror -o generated/omorfi-ftb3.lexc.hfst generated/omorfi-ftb3.lexc
Try ``/usr/local/bin/hfst-lexc --help'' for more information.
/usr/local/bin/hfst-lexc: Unknown option
make[1]: *** [generated/omorfi-ftb3.lexc.hfst] Error 1
make[1]: Leaving directory `/work/tmp/omorfi-20150326/src'
make: *** [all-recursive] Error 1
$ hfst-lexc --version
hfst-lexc 0.1 (hfst 3.7.1)

So, I removed the --Werror flag and the compilation went fine after that.

diff Makefile Makefile.orig
1297c1297
<       $(HLEXC) -o $@ $<

---
>       $(HLEXC) --Werror -o $@ $<

The OS used is CentOS 7.1.
So just a note in case anyone else happens to stumble upon a similar kind of combination.

Installation missing transducer files

According to README.md Omorfi installation should include the following files:

$ ls /usr/local/share/omorfi/
master.tsv           omorfi.describe.hfst      omorfi_recased.analyse.hfst   speller-omorfi.zhfst
omorfi.accept.hfst   omorfi.generate.hfst      omorfi_recased.describe.hfst
omorfi.analyse.hfst  omorfi.labelsegment.hfst  omorfi.segment.hfst
omorfi.cg3bin        omorfi.lemmatise.hfst     omorfi.tokenise.pmatchfst

My installation includes the following files:

marko@marko-VirtualBox:/home/marko/omorfi-20170515# ls /usr/local/share/omorfi/
master.tsv           omorfi.describe.hfst         omorfi_recased.describe.hfst
omorfi.accept.hfst   omorfi.generate.hfst         omorfi.tokenise.pmatchfst
omorfi.analyse.hfst  omorfi_recased.analyse.hfst  speller-omorfi.zhfst

Was there some kind of failure in the compilation/installation process?

Here's a typescript of the compilation and installation:
typescript1.log

Centos7 compilation fails due to undefined reference HfstTranducer::lookup_fd

From #56 got bit forwards, but now getting a new error

export PKG_CONFIG_PATH=/work/bin/hfst_v3.14.0/lib/pkgconfig

./configure --prefix=$OMOPATH --with-hfst=/work/bin/hfst_v3.14.0/bin --enable-ftb3 --enable-segmenter --enable-hyphenator --enable-lemmatiser

but something new pops up, and compilation ends with this:

make[1]: Entering directory `/data/tmp/git_asenn2018/omorfi'
Making all in src
make[2]: Entering directory `/data/tmp/git_asenn2018/omorfi/src'
/bin/sh ../libtool  --tag=CXX   --mode=compile g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I..     -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/   -g -O2 -MT cpp/libomorfi_la-omorfi.lo -MD -MP -MF cpp/.deps/libomorfi_la-omorfi.Tpo -c -o cpp/libomorfi_la-omorfi.lo `test -f 'cpp/omorfi.cc' || echo './'`cpp/omorfi.cc
libtool: compile:  g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I.. -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/ -g -O2 -MT cpp/libomorfi_la-omorfi.lo -MD -MP -MF cpp/.deps/libomorfi_la-omorfi.Tpo -c cpp/omorfi.cc  -fPIC -DPIC -o cpp/.libs/libomorfi_la-omorfi.o
libtool: compile:  g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I.. -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/ -g -O2 -MT cpp/libomorfi_la-omorfi.lo -MD -MP -MF cpp/.deps/libomorfi_la-omorfi.Tpo -c cpp/omorfi.cc -o cpp/libomorfi_la-omorfi.o >/dev/null 2>&1
mv -f cpp/.deps/libomorfi_la-omorfi.Tpo cpp/.deps/libomorfi_la-omorfi.Plo
/bin/sh ../libtool  --tag=CXX   --mode=link g++ -std=gnu++11  -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/   -g -O2 -no-undefined -version-info 0:0:0 -L/work/bin/hfst_v3.14.0/lib -lhfst    -o libomorfi.la -rpath /work/bin/omorfiversiot/06062018/lib cpp/libomorfi_la-omorfi.lo
libtool: link: g++  -fPIC -DPIC -shared -nostdlib /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crti.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbeginS.o  cpp/.libs/libomorfi_la-omorfi.o   -Wl,-rpath -Wl,/work/bin/hfst_v3.14.0/lib -Wl,-rpath -Wl,/work/bin/hfst_v3.14.0/lib -L/work/bin/hfst_v3.14.0/lib /work/bin/hfst_v3.14.0/lib/libhfst.so -lpthread -ldl -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../.. -lstdc++ -lm -lc -lgcc_s /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtendS.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crtn.o  -O2 -O2   -Wl,-soname -Wl,libomorfi.so.0 -o .libs/libomorfi.so.0.0.0
libtool: link: (cd ".libs" && rm -f "libomorfi.so.0" && ln -s "libomorfi.so.0.0.0" "libomorfi.so.0")
libtool: link: (cd ".libs" && rm -f "libomorfi.so" && ln -s "libomorfi.so.0.0.0" "libomorfi.so")
libtool: link: ar cru .libs/libomorfi.a  cpp/libomorfi_la-omorfi.o
libtool: link: ranlib .libs/libomorfi.a
libtool: link: ( cd ".libs" && rm -f "libomorfi.la" && ln -s "../libomorfi.la" "libomorfi.la" )
g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I..     -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/   -g -O2 -MT cpp/omorfi_lookup-omorfi-lookup.o -MD -MP -MF cpp/.deps/omorfi_lookup-omorfi-lookup.Tpo -c -o cpp/omorfi_lookup-omorfi-lookup.o `test -f 'cpp/omorfi-lookup.cc' || echo './'`cpp/omorfi-lookup.cc
mv -f cpp/.deps/omorfi_lookup-omorfi-lookup.Tpo cpp/.deps/omorfi_lookup-omorfi-lookup.Po
/bin/sh ../libtool  --tag=CXX   --mode=link g++ -std=gnu++11  -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/   -g -O2   -o omorfi-lookup cpp/omorfi_lookup-omorfi-lookup.o libomorfi.la -L/work/bin/hfst_v3.14.0/lib -lhfst
libtool: link: g++ -std=gnu++11 -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/ -g -O2 -o .libs/omorfi-lookup cpp/omorfi_lookup-omorfi-lookup.o  ./.libs/libomorfi.so -L/work/bin/hfst_v3.14.0/lib /work/bin/hfst_v3.14.0/lib/libhfst.so -lpthread -ldl -Wl,-rpath -Wl,/work/bin/omorfiversiot/06062018/lib -Wl,-rpath -Wl,/work/bin/hfst_v3.14.0/lib
./.libs/libomorfi.so: undefined reference to `hfst::HfstTransducer::lookup_fd(std::string const&, long) const'
collect2: error: ld returned 1 exit status
make[2]: *** [omorfi-lookup] Error 1
make[2]: Leaving directory `/data/tmp/git_asenn2018/omorfi/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/tmp/git_asenn2018/omorfi'
make: *** [all] Error 2

Problem installing omorfi

Hi, I was following the instructions to install omorfi and encountered this problem when running 'make':

/usr/local/bin/hfst-lexc --Werror -o generated/omorfi-ftb3.lexc.hfst generated/omorfi-ftb3.lexc
/usr/local/bin/hfst-lexc: warning: Defaulting to OpenFst tropical type
generated/omorfi-ftb3.lexc:18.1: Syntax error in lexer (no valid token found at the point) [near: `b']
generated/omorfi-ftb3.lexc:18.1: syntax error, unexpected ERROR, expecting LEXICON_START or LEXICON_START_WRONG_CASE [near: `b']
/usr/local/bin/hfst-lexc: The file generated/omorfi-ftb3.lexc did not compile cleanly.
(if there are no error messages above, try -v or -d to get more info)
Makefile:1383: recipe for target 'generated/omorfi-ftb3.lexc.hfst' failed

I have all specified dependencies installed.

ftb3-analysis duplicates numerals in base forms

What happens:

hfst-lookup omorfi-ftb3.analyse.hfst
> 123
123 112233 Num Digit Nom Sg 7.099609

What should happen:

hfst-lookup omorfi-ftb3.analyse.hfst
> 123
123 123 Num Digit Nom Sg 7.099609

Error in omorfi/src/python/omorfi/regex_formatter.py after commit on 18.12.2015

Tried to build the latest master from the git repository; it failed on calling Omorfi.regex_formatter.format_rules_regex(formatter, ruleset) from python/generate_regexes.py with the arguments format_rules_regex("ftb3", "rewrite-tags"). The problem is that the function is called with a format name rather than a formatter object. Also, the replaced old argument "format" is still used in the code, thus referencing an undefined local variable.

omorfi-locate and running from custom directory

We have multiple omorfi versions in place, so there is a need to run omorfi from custom directories. On CentOS 7, I just recompiled everything and installed it to our local directory with the prefix as normal. However, running from that directory didn't work anymore, and adding the omorfi directory to PATH didn't help either.

$ omorfi-analyse-text.sh

Usage: /work/bin/omorfiversiot/052016/bin/omorfi-analyse-text.sh [OPTION] [FILENAME...]
Omorfi could not be located. Search path is:
/usr/local/share/omorfi/omorfi-omor.analyse.hfst
/usr/local/share/omorfi/omorfi-ftb3.analyse.hfst
/usr/local/share/omorfi/omorfi-ftb1.analyse.hfst
~/.omorfi/omorfi-omor.analyse.hfst
~/.omorfi/omorfi-ftb3.analyse.hfst
~/.omorfi/omorfi-ftb1.analyse.hfst
./omorfi-omor.analyse.hfst
./omorfi-ftb3.analyse.hfst
./omorfi-ftb1.analyse.hfst
generated/omorfi-omor.analyse.hfst
generated/omorfi-ftb3.analyse.hfst
generated/omorfi-ftb1.analyse.hfst
src/generated/omorfi-omor.analyse.hfst
src/generated/omorfi-ftb3.analyse.hfst
src/generated/omorfi-ftb1.analyse.hfst
../src/generated/omorfi-omor.analyse.hfst
../src/generated/omorfi-ftb3.analyse.hfst
../src/generated/omorfi-ftb1.analyse.hfst

Notice though that when I just replaced one of the directories in omorfi-locate with

../share/omorfi

then it was happy and found those files it was looking for.

Centos7 omorfi compilation fails.

I would have liked to get the latest omorfi with hfst_v3.14.0, but compilation of omorfi stops at the beginning with an error.

Environment

export PKG_CONFIG_PATH=/work/bin/hfst_v3.14.0/lib/pkgconfig (to help omorfi to find correct hfst)

then configure

./configure --prefix=$OMOPATH --with-hfst=/work/bin/hfst_v3.14.0/bin --enable-ftb3 --enable-hyphenator --enable-segmenter


make[2]: Entering directory `/data/tmp/git_asenn2018/omorfi/src'
/bin/sh ../libtool  --tag=CXX   --mode=compile g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I..     -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/   -g -O2 -MT cpp/libomorfi_la-omorfi.lo -MD -MP -MF cpp/.deps/libomorfi_la-omorfi.Tpo -c -o cpp/libomorfi_la-omorfi.lo `test -f 'cpp/omorfi.cc' || echo './'`cpp/omorfi.cc
libtool: compile:  g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I.. -g -O2 -I/work/bin/hfst_v3.14.0/include/hfst/ -g -O2 -MT cpp/libomorfi_la-omorfi.lo -MD -MP -MF cpp/.deps/libomorfi_la-omorfi.Tpo -c cpp/omorfi.cc  -fPIC -DPIC -o cpp/.libs/libomorfi_la-omorfi.o
cpp/omorfi.cc: In member function 'void omorfi::Omorfi::loadAllFromDefaultDirs()':
cpp/omorfi.cc:50:53: error: 'string' was not declared in this scope
             std::string homeomorfi = string(homepath) + "/.omorfi/";
                                                     ^
cpp/omorfi.cc:50:53: note: suggested alternatives:
In file included from /usr/include/c++/4.8.2/string:39:0,
                 from /usr/local/include/hfst/HfstDataTypes.h:19,
                 from /usr/local/include/hfst/HfstTransducer.h:21,
                 from /usr/local/include/hfst/hfst.h:21,
                 from cpp/omorfi.cc:26:
/usr/include/c++/4.8.2/bits/stringfwd.h:62:33: note:   'std::string'
   typedef basic_string<char>    string;
                                 ^
/usr/include/c++/4.8.2/bits/stringfwd.h:62:33: note:   'std::string'
make[2]: *** [cpp/libomorfi_la-omorfi.lo] Error 1
make[2]: Leaving directory `/data/tmp/git_asenn2018/omorfi/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/tmp/git_asenn2018/omorfi'
make: *** [all] Error 2

(autogen version is 2.69, automake 1.13.4, libtool 2.4.2). The configure phase looked quite normal:


Configured omorfi 20171115_alpha:
=======
* Analysers: /work/bin/hfst_v3.14.0/bin
    * OMOR yes (flags: --omor-props   --omor-sem)
    * FTB3.1 yes
    * apertium no
    * giella: no
    * labeled segmenter: no
* Limits:
    * tiny lexicons:
    * big tests:
* Applications
    * Voikko speller: yes
    * segmenter: yes
    * lemmatiser: no
    * hyphenators: yes

I'll probably try with some earlier version, to see if it is something in our environment that has changed or has not been changed.

But reporting it here just in case; thanks for all the help.

Some wiki pages don't render. Switch to gh-pages?

https://github.com/flammie/omorfi/wiki/Paradigms-and-stuffs doesn't render for me. One solution would be to switch to GitHub pages, e.g. with Jekyll. I've gone ahead and done this, I will admit, partially just as an exercise in procrastination. You can pull the gh-pages branch from my fork and enable it in the settings for your fork here.

See it here: http://frankier.github.io/omorfi/stats/Paradigms-and-stuffs.html

Apart from the advantages that all pages render, that wide tables are properly displayed (e.g. http://frankier.github.io/omorfi//stats/Inflection-paradigms-by-example.html), that it allows categorisation of pages in the navigation and that, arguably, it's prettier, another advantage is that it could allow more automation of the statistics pages. https://github.com/w3ctag/promises-guide shows a way of taking advantage of Travis to build gh-pages -- it might be possible to do something similar to get the statistics pages to track the master branch.

Broken pipe problem

We've been running omorfi-analyse-text.sh on a very long file. At certain point(s) this broken pipe issue comes up:

/work/bin/omorfiversiot/042016/bin/omorfi-analyse-text.sh: line 59:
30360 Broken pipe cat $@
30361 | $cleanup
30362 | sed -e 's/.[]//g'
30363 Killed | /work/bin/hfst-3.10.0/bin/hfst-proc -x "$omorfifile"

Updated around last Thursday to latest omorfi and to HFST 3.10.0. The problem can be reproduced with the attached file:

omo_pipe.txt

The first three lines pass fine, but the line 'IQQ]' or the 5th line causes omorfi to hang. Not sure if it is the ] mark at the end of the line or the next line, which is just multiple -*'s.

VISL CG-3 required on OSX (undocumented)

On OSX, the build fails unless VISL CG-3 is installed, because $(CGCOMP) defaults to "false". There is no mention of this in the README, and the configure script doesn't show any error about it.

# {{{ VISL CG 3
generated/omorfi.cg3bin: vislcg3/omorfi.cg3
    $(CGCOMP) $< $@
# }}}

Easiest way to install vislcg3 on Mac OS X is:

brew install homebrew/science/vislcg3

However, the vislcg3 formula is currently broken and I've submitted a PR to update the download URL at https://github.com/Homebrew/homebrew-science/pull/2573/files

construct neutral fast interchange format

Omorfi is starting to need pipelines that combine e.g. rule-based disambiguation with CoNLL-U analysis, and we can't have n-to-m lossy conversions in every tool. I think JSON might be good, since many of the objects getting passed are basically lists of dicts of lists of dicts. But JSON needs to be read start to end, whereas the legacy formats process almost line-in-line-out.

Feature request: getting quality value for a file

Omorfi has the nice option that you can give it a whole file to process.

We thought about omorfi recently and it came to mind that getting some kind of "quality value" for the whole file could be really useful, i.e. some kind of new parameter which would output:

x words counted, n words not recognised (+?), quality estimate: 80 %

Build fails

Can't build the newest version:

Suffix fail! Trying to rstrip pi from Alarautalahti
Unstubbable! Trying to rstrip pi from Alarautalahti
Word has been misclassified or suffix stripping is insufficient.Fix the database or stripping rules to continue. Relevant word entry:
{'deletion': 'pi', 'stub': None, 'suffix_regex': '[AOUŌÓÅaouåôąáăâāóōúūû][^äöy]*pi', 'is_proper': None, 'style': None, 'kotus_av': 'E', 'possessive': False, 'gradestem': 'Alarautalahti', 'is_suffix': None, 'clitics': False, 'pronoun': None, 'sem': None, 'upos': 'PROPN', 'grade_dir': 'weaken', 'proper_noun_class': None, 'numtype': None, 'bracketstub': 'Alarautalahti', 'real_pos': None, 'extra_i': False, 'adptype': None, 'particle': None, 'stem_diphthong': None, 'stem_vowel': None, 'boundaries': None, 'lex': None, 'twolstem': None, 'prontype': None, 'plurale_tantum': False, 'harmony': 'back', 'argument': None, 'kotus_tn': 7, 'abbr': None, 'extra_e': False, 'homonym': '1', 'blacklist': None, 'pronunciation': None, 'lemma': 'Alarautalahti', 'is_prefix': None, 'symbol': None, 'pos': 'NOUN', 'origin': 'fiwikt', 'new_para': 'PROPN_KORPI'}
Makefile:1919: recipe for target 'generated/master.tsv' failed

Configured omorfi 20190511:

  • Analysers: yes
    • OMOR yes (flags: --omor-props --omor-sem)
    • FTB3.1 no
    • apertium no
    • giella: no
    • labeled segmenter: no
  • Limits:
    • tiny lexicons:
    • big tests:
  • Applications
    • Voikko speller: yes
    • segmenter: no
    • lemmatiser: no
    • hyphenators: no
  • Libraries
    • Python 3: yes
    • Java: no
    • C++: yes
  • Clusters
    • run tests on PBS cluster: false → mailto: no
    • run tests on SLURM cluster: false → mailto: no
      say make, make check and make install to proceed
