nfst's Introduction

Lucene Finite State Transducers

A simple wrapper around Lucene's FST implementation in the spirit of the Rust FST command-line tool.

Building

To build and run, you need Java 8 and sbt installed.

The following command builds the project and creates a script wrapper under target/pack/bin:

sbt pack

Running

For a full list of options, run:

./target/pack/bin/cli --help

If you have multiple versions of Java installed, export JAVA_HOME so that it points at the Java 8 installation, e.g.

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd6

Alternatively, you can run directly with:

java -cp target/pack/lib/\* prototypes.fst.cli --help

Examples

  1. Build an FST from a CSV file:

    LC_ALL=C sort input.csv -o sorted.csv
    ./target/pack/bin/cli map --values --delimiter , sorted.csv out.fst

  2. Build an FST from a file containing 1-grams to 5-grams encoded as ordinals in decimal, separated by spaces. The values will be the count of each entry:

    env LC_ALL=C sort -k1n -k2n -k3n -k4n -k5n input.ordinals -o sorted.ordinals
    ./target/pack/bin/cli map --format ints sorted.ordinals out.fst

  3. Look up the count for an ordinal in our previous FST:

    ./target/pack/bin/cli get --format ints out.fst 42
    42 2679
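
Both build examples sort their input first, because Lucene's FST construction expects keys to arrive in sorted order. The following is a minimal sketch of how one might verify byte-order sorting before building, using only standard `sort` options. The sample data and the temporary file names are made up for illustration and are not part of the project.

```shell
# Hypothetical sample data in a scratch directory (not files shipped with the project).
workdir=$(mktemp -d)
printf 'banana,2\napple,1\nCherry,3\n' > "$workdir/input.csv"

# Byte-order sort, as in example 1. With LC_ALL=C, uppercase letters
# sort before lowercase ones, so "Cherry" comes before "apple".
LC_ALL=C sort -o "$workdir/sorted.csv" "$workdir/input.csv"

# sort -c only checks the order; it exits non-zero if the file is not sorted.
if LC_ALL=C sort -c "$workdir/sorted.csv"; then
  sorted_ok=yes
else
  sorted_ok=no
fi
echo "sorted: $sorted_ok"
```

Running the check under the same `LC_ALL=C` setting as the sort matters: a file sorted under a different locale can fail the byte-order check.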
