Lexd

A lexicon compiler specialising in non-suffixational morphologies.

This module compiles lexicons in a format loosely based on hfst-lexc and produces transducers in ATT format which are equivalent to those produced using the overgenerate-and-constrain approach with hfst-twolc (see here and here). However, it is much faster (see below).

See Usage.md for the rule file syntax.

Installation

First, clone this repository.

To build, do

./autogen.sh
make
make install

If installing to a system-wide path, you may want to run sudo make install instead for the last step.

To compile a lexicon file into a transducer, do

lexd lexicon_file att_file

To get a speed comparison, do

make timing-test

To run basic feature smoke-tests (fast), do

make check

Why is it faster?

When dealing with prefixes, the overgenerate-and-constrain approach initially builds a transducer like this:

[Figure: transducer that overgenerates]

It then composes that transducer with a twolc rule to turn it into something like this:

[Figure: correct transducer]

But compiling the rule needed to do that can take hundreds of times longer than compiling the lexicon.

Lexd, meanwhile, makes multiple copies of the lexical portion and attaches one to each prefix, thus generating the second transducer directly in a similar amount of time to what is required to generate the first one.
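The difference between the two strategies can be sketched with plain sets of strings. This is only a toy illustration, not how lexd or hfst-twolc are implemented: the affixes, stems, and function names are all hypothetical, and it assumes the path restriction is "each prefix must co-occur with its matching suffix".

```python
from itertools import product

# Hypothetical toy data: each prefix is only valid with its paired suffix.
AFFIX_PAIRS = [("a-", "-x"), ("b-", "-y")]
STEMS = ["foo", "bar", "baz"]


def overgenerate_and_constrain(pairs, stems):
    """Build every prefix/stem/suffix combination, then filter out bad ones."""
    prefixes = [prefix for prefix, _ in pairs]
    suffixes = [suffix for _, suffix in pairs]
    allowed = set(pairs)
    # Overgenerate: the stems are shared, so any prefix can reach any suffix.
    candidates = product(prefixes, stems, suffixes)
    # Constrain: keep only the forms whose prefix and suffix belong together.
    return {p + stem + s for p, stem, s in candidates if (p, s) in allowed}


def copy_per_prefix(pairs, stems):
    """Give each prefix its own copy of the stems, so no filtering is needed."""
    forms = set()
    for prefix, suffix in pairs:
        stem_copy = list(stems)  # one copy of the lexical portion per prefix
        forms.update(prefix + stem + suffix for stem in stem_copy)
    return forms


filtered = overgenerate_and_constrain(AFFIX_PAIRS, STEMS)
direct = copy_per_prefix(AFFIX_PAIRS, STEMS)
```

Both functions yield the same set of forms, but the second never constructs the invalid combinations. In this sketch the filter is cheap; in the real pipeline, the expensive part is compiling the constraining rule.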

Language              Wamesa      Hebrew      Navajo      Lingala
Stems                 262         127         19          1470
Total forms           12576       2540        473         1649496
Path restrictions     14          10          17          19

Lexc + Twolc
Lexc compilation      25ms        15ms        25ms        230ms
Twolc compilation     10245ms     1360ms      8460ms      275525ms
Rule composition      2050ms      225ms       1705ms      45550ms
Minimization          65ms        5ms         20ms        155ms
Total time            12385ms     1605ms      10210ms     321460ms

Lexd
Lexd compilation      210ms       85ms        10ms        490ms
Format conversion     30ms        5ms         5ms         55ms
Total time            240ms       90ms        15ms        545ms

Speedup               52x         18x         681x        590x
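The speedup row is the Lexc + Twolc total divided by the Lexd total, rounded to the nearest whole number. A short check using the totals from the table (the variable names are illustrative only):

```python
# Total times in milliseconds, copied from the table above.
lexc_twolc_total = {"Wamesa": 12385, "Hebrew": 1605, "Navajo": 10210, "Lingala": 321460}
lexd_total = {"Wamesa": 240, "Hebrew": 90, "Navajo": 15, "Lingala": 545}

# Speedup = old total / new total, rounded to the nearest integer.
speedup = {
    language: round(lexc_twolc_total[language] / lexd_total[language])
    for language in lexc_twolc_total
}

for language, factor in speedup.items():
    print(f"{language}: {factor}x")
```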
