Clj-WordNet

A thin, partial wrapper around some of the functionality of JWI (the MIT Java Wordnet Interface), for interfacing with the WordNet database using idiomatic Clojure.

Prerequisites

You will need Leiningen 2.3.4 or above installed.

Building

To build and install the library locally, run:

$ git submodule update --init data
$ lein test
$ lein install

Including in your project

An initial version is hosted on Clojars. For Leiningen, include the following dependency:

[clj-wordnet "0.1.0"]

For Maven-based projects, add the following to your pom.xml:

<dependency>
  <groupId>clj-wordnet</groupId>
  <artifactId>clj-wordnet</artifactId>
  <version>0.1.0</version>
</dependency>

A snapshot version is also available; use "0.1.1-SNAPSHOT".
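
For reference, a minimal project.clj for an application that uses the library might look like the sketch below. The project name my-wordnet-app and the Clojure version are placeholders chosen for illustration, not requirements of clj-wordnet. The commented-out :repositories entry is an assumption: it points at the JWI repository named under "See Also" and is only worth enabling if Leiningen cannot resolve the JWI artifact on its own.

```clojure
;; project.clj for a hypothetical application that depends on clj-wordnet.
;; Assumption: the :repositories entry is only needed if the JWI artifact
;; cannot be resolved transitively; it is the repository from "See Also".
(defproject my-wordnet-app "0.1.0-SNAPSHOT"
  ;; :repositories [["delver" "http://repo.delver.io/releases"]]
  :dependencies [[org.clojure/clojure "1.5.1"]   ; placeholder Clojure version
                 [clj-wordnet "0.1.0"]])         ; or "0.1.1-SNAPSHOT"
```

Recent Leiningen releases refuse plain-HTTP repositories by default, so that entry may need adjusting if you enable it.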

WordNet Database

The WordNet database is not bundled with this project; it is referenced via a git submodule so that the integration tests can run. To ensure the submodule is properly initialised, follow the build instructions above (in particular, git submodule update --init data).

Quick Examples

(use 'clj-wordnet.core)

(def wordnet (make-dictionary "../path-to/wordnet/dict/"))

(def dog (first (wordnet "dog" :noun)))

(:lemma dog)
=> "dog"

(:pos dog)
=> :noun

(:gloss dog)
=> "a member of the genus Canis (probably descended from the common wolf) that
    has been domesticated by man since prehistoric times; occurs in many breeds; 
    \"the dog barked all night\""   

(map :lemma (words (:synset dog)))
=> ("dog" "domestic_dog" "Canis_familiaris")

(def frump (first (wordnet "frump" :noun)))

(map :lemma (related-words frump :derivationally-related))
=> ("frumpy")

(map :lemma (flatten (vals (related-synsets dog :hypernym))))
=> ("domestic_animal" "domesticated_animal" "canine" "canid")
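
The :gloss value packs the definition and any usage examples into one string, as the dog output above shows: the examples follow the definition as "; "-prefixed, double-quoted sentences. The sketch below separates the two. split-gloss is a hypothetical helper written for illustration, not part of clj-wordnet, and it assumes every gloss follows that layout.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of clj-wordnet): splits a WordNet gloss into
;; its definition and the double-quoted example sentences that follow it.
;; Assumes examples start at the first occurrence of `; "` in the gloss.
(defn split-gloss
  [^String gloss]
  (let [idx (.indexOf gloss "; \"")]       ; -1 when the gloss has no examples
    (if (neg? idx)
      {:definition (str/trim gloss) :examples []}
      {:definition (str/trim (subs gloss 0 idx))
       ;; re-seq yields [whole-match group-1] pairs; keep only the quoted text
       :examples   (mapv second (re-seq #"\"([^\"]*)\"" (subs gloss idx)))})))
```

Applied to the dog entry, (split-gloss (:gloss dog)) would be expected to return the definition text ending in "occurs in many breeds" together with the single example "the dog barked all night".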

Dictionary

By default, the dictionary loads definitions from the database on demand and caches them as needed. If higher performance is required and the JVM has sufficient memory, the dictionary can instead be held entirely in memory, as below. This forces the whole dictionary to be loaded into RAM immediately, so there may be a perceptible delay on startup.

(def wordnet (make-dictionary "../path-to/wordnet/dict/" :in-memory))

Note: WordNet is quite large and usually won't fit into the default heap of most 32-bit JVMs, so you will need to increase the heap size. On the Sun JVM, this is done with the -Xmx command-line flag and a reasonable heap size, say 500 MB or 1 GB (for example, -Xmx1g).
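
If the same code must run on JVMs with different heap settings, one option is to check the maximum heap before choosing the in-memory variant. The sketch below is an illustration only: enough-heap? is a hypothetical helper, not part of clj-wordnet, and the 500 MB threshold simply reuses the rule of thumb from the note above rather than a measured requirement.

```clojure
;; Hypothetical helper (not part of clj-wordnet): true when the JVM's maximum
;; heap, as reported by Runtime.maxMemory(), is at least min-mb megabytes.
(defn enough-heap?
  [min-mb]
  (>= (.maxMemory (Runtime/getRuntime))
      (* min-mb 1024 1024)))

;; Usage sketch (not evaluated): fall back to the on-demand dictionary when
;; the heap looks too small for the :in-memory variant.
(comment
  (def wordnet
    (if (enough-heap? 500)
      (make-dictionary "../path-to/wordnet/dict/" :in-memory)
      (make-dictionary "../path-to/wordnet/dict/"))))
```

With Leiningen, the heap flag itself can be passed to the REPL and test JVMs through :jvm-opts ["-Xmx1g"] in project.clj.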

Word Lookup

Word definitions are fetched by calling the dictionary returned by the make-dictionary factory, as in the examples below:

(def wordnet (make-dictionary "../path-to/wordnet/dict/"))

(wordnet "car#n#1")                ; fetch the first noun definition for car

(wordnet "bus")                    ; fetch a list of all definitions for bus

(wordnet "row" :noun)              ; fetch a list of all noun definitions for row

(wordnet "row#v#1")                ; fetch the first verb definition for row

(wordnet "WID-02086723-N-01-dog")  ; fetch the word with the specified ID

(wordnet "SID-02086723-N")         ; fetch the synset with the specified ID
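
The lookup strings above come in two shapes: lemma#pos#sense keys, and WID-/SID- identifiers. The sketch below builds the first kind and takes apart the second. All function names are hypothetical and not part of clj-wordnet; the field layout (8-digit synset offset, one-letter part-of-speech tag, two-character word number, lemma) is inferred from the examples above, and the adjective and adverb letters are an assumption based on WordNet's usual n/v/a/r tags.

```clojure
;; Hypothetical helpers (not part of clj-wordnet); formats inferred from the
;; lookup examples. Assumption: :adjective -> "a" and :adverb -> "r", as only
;; "n" and "v" appear in the examples.
(def pos-letter {:noun "n" :verb "v" :adjective "a" :adverb "r"})

(defn sense-key
  "Builds a lookup string such as \"car#n#1\" from a lemma, a part-of-speech
  keyword and a 1-based sense number."
  [lemma pos n]
  {:pre [(contains? pos-letter pos)]}
  (str lemma "#" (pos-letter pos) "#" n))

(defn parse-word-id
  "Splits a word ID such as \"WID-02086723-N-01-dog\" into its parts, or
  returns nil if the string has a different shape. The offset and word number
  are kept as strings so leading zeros survive."
  [id]
  (when-let [[_ offset pos num lemma]
             (re-matches #"WID-(\d{8})-([NVAR])-([0-9A-Fa-f]{2})-(.+)" id)]
    {:offset offset :pos pos :word-number num :lemma lemma}))

(defn parse-synset-id
  "Splits a synset ID such as \"SID-02086723-N\" into offset and POS tag,
  or returns nil if the string has a different shape."
  [id]
  (when-let [[_ offset pos] (re-matches #"SID-(\d{8})-([NVAR])" id)]
    {:offset offset :pos pos}))
```

For example, (wordnet (sense-key "row" :verb 1)) would perform the same lookup as (wordnet "row#v#1").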

See Also

JWI has been Mavenized and packaged in a GitHub repository: https://github.com/delver/jwi. The resulting artifacts are deployed to http://repo.delver.io/releases, which is referenced in this project's repository resolution section.

TODO

  • Implement (make-dictionary "../path-to/wordnet/dict/" :in-memory) to use RAM-based dictionary
  • Coerce functions into separate namespace
  • Re-implement (related-synsets ...) and (related-words ...)
  • Push JWI 2.2.4 to central repository
  • Unit tests & Travis CI
  • Implement more similarity algorithms
  • Improve performance

License

Same as JWI: MIT / Creative Commons 3.0

Contributors

rm-hull, hotwoofy, bitdeli-chef
