
# gocleo

## A Go implementation of the Cleo search

The Cleo search is explained here: LinkedIn original article

The source for Jingwei Wu's version can be found here: Jingwei's version

Basically, this is a Go port of the original program, which is written in Java. I have included a corpus of words to search, downloaded from http://www.wordfrequency.info/.

### Algorithm overview

  • The algorithm starts by searching for matches in an inverted index, which maps each word's prefix (up to 4 characters) to an array of (document ID, Bloom filter) tuples.
  • The Bloom filter of each candidate is compared against the query's Bloom filter. If it matches (the candidate's filter has every bit of the query's filter set), the candidate makes it to the next round.
  • The remaining words are scored by their Levenshtein distance to the query, then normalized using the Jaccard coefficient.
  • The final words are returned as JSON.
  • You can also change how scoring works: just provide a function that conforms to `func(s1, s2 string) (score float64)`.
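The first two steps (prefix lookup, then Bloom filter check) can be sketched as follows. This is a minimal illustration, not gocleo's code: the 64-bit filter with one bit per character, the helper names `bloomOf`, `newIndex` and `candidates`, and filing each word under every prefix of length 1 to 4 (so short queries still reach longer words) are all assumptions.

```go
package main

// ASSUMPTION: a 64-bit filter with one bit per character; gocleo's real
// filter width and hashing may differ.
type bloom uint64

// bloomOf sets bit (r mod 64) for every rune r in s.
func bloomOf(s string) bloom {
	var b bloom
	for _, r := range s {
		b |= bloom(1) << (uint(r) % 64)
	}
	return b
}

// prefixLen is the maximum prefix length used as an index key.
const prefixLen = 4

// entry is one (document ID, Bloom filter) tuple stored in the index.
type entry struct {
	docID  int
	filter bloom
}

// index maps a prefix to every entry whose word starts with it.
type index map[string][]entry

// prefix returns the first prefixLen runes of s, or all of s if shorter.
func prefix(s string) string {
	r := []rune(s)
	if len(r) > prefixLen {
		r = r[:prefixLen]
	}
	return string(r)
}

// newIndex files each word (its slice position is its document ID) under
// every prefix of length 1..prefixLen.
// ASSUMPTION: indexing all short prefixes is this sketch's choice.
func newIndex(words []string) index {
	ix := make(index)
	for id, w := range words {
		f := bloomOf(w)
		r := []rune(w)
		for n := 1; n <= prefixLen && n <= len(r); n++ {
			p := string(r[:n])
			ix[p] = append(ix[p], entry{docID: id, filter: f})
		}
	}
	return ix
}

// candidates looks up the query's prefix, then keeps only entries whose
// filter has every bit of the query's filter set.
func (ix index) candidates(query string) []int {
	q := bloomOf(query)
	var ids []int
	for _, e := range ix[prefix(query)] {
		if e.filter&q == q {
			ids = append(ids, e.docID)
		}
	}
	return ids
}
```

A Bloom filter can report false positives but never false negatives, so this check only discards words that certainly lack one of the query's characters; the scoring step then ranks whatever survives.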

### Instructions

This is a sample app:

```go
package main

import cleo "github.com/jamra/gocleo"

func main() {
	// Serve the corpus in w1_fixed.txt on port 8080. Passing nil as the last
	// argument selects the default scorer: Levenshtein distance normalized
	// by the Jaccard coefficient.
	cleo.InitAndRun("w1_fixed.txt", "8080", nil)
}
```
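A custom scorer only has to match `func(s1, s2 string) (score float64)`. The sketch below is one such function, presumably passed as the third argument to `cleo.InitAndRun` in place of `nil` (not verified against the library). It computes the Levenshtein distance `d` and then, as an assumption rather than gocleo's actual formula, turns it into the Jaccard-style ratio `(L-d)/(L+d)`, where `L` is the longer string's length.

```go
package main

// levenshtein returns the edit distance between a and b, counting
// insertions, deletions and substitutions as 1 each, over runes.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1) // distances for the previous row
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // turning "" into rb[:j] takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			sub := prev[j-1]
			if ra[i-1] != rb[j-1] {
				sub++
			}
			cur[j] = minInt(sub, minInt(prev[j]+1, cur[j-1]+1))
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

func minInt(x, y int) int {
	if x < y {
		return x
	}
	return y
}

// score has the scorer signature func(s1, s2 string) (score float64).
// ASSUMPTION: distance d is mapped to (L-d)/(L+d), with L the longer
// length, giving 1 for equal strings and 0 when d == L.
func score(s1, s2 string) float64 {
	l := len([]rune(s1))
	if n := len([]rune(s2)); n > l {
		l = n
	}
	if l == 0 {
		return 1 // two empty strings are identical
	}
	d := levenshtein(s1, s2)
	return float64(l-d) / float64(l+d)
}
```

For example, `score("kitten", "sitting")` has `d = 3` and `L = 7`, so it returns `4/10 = 0.4`.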

Run the program and navigate to `localhost:8080/cleo/{query}`.

`{query}` is your search term, e.g. `tractor`, `nightingale`, or `pizza`.
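To call the endpoint from another Go program, the query should be path-escaped so that multi-word searches survive. A minimal sketch using only the standard library; `searchURL` and `search` are hypothetical helper names, and the response is returned as raw text because its JSON layout is not described here.

```go
package main

import (
	"io"
	"net/http"
	"net/url"
)

// searchURL builds the address of the /cleo/{query} endpoint, escaping the
// query so that spaces and other reserved characters are preserved.
func searchURL(hostPort, query string) string {
	return "http://" + hostPort + "/cleo/" + url.PathEscape(query)
}

// search fetches the raw JSON body for query; decoding it is left to the
// caller because the response shape is not documented here.
func search(hostPort, query string) (string, error) {
	resp, err := http.Get(searchURL(hostPort, query))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}
```

With the sample app running, `search("localhost:8080", "pizza")` requests `http://localhost:8080/cleo/pizza`.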

### Your own corpus

You can run the search against your own corpus, as long as each term is on its own line. `w1_fixed.txt` is provided as an example.

### Setup

This should work with `go get`:

```shell
go get github.com/jamra/gocleo
```

### TODO

  • Give the user the ability to add and remove words from the index.
  • More robust unit testing
