
Comments (6)

lithammer commented on May 16, 2024

I actually played around with something like that for RankFind at one point. What I found was that unless you set a high threshold, most longer targets would never match.

Say you search for "aaa" in "aaabbbbbbbbbbbbbbbbbbb" with a threshold of 15: it wouldn't match. Perhaps that's exactly what you wanted, but I think it would have to be a bit smarter and take length into consideration as well. This is basically when I settled on the current RankFind solution 😃

Another approach would be to create a RankFindFunc(s, t string, fn func(s, t string) bool) where you can define your own criteria.

from fuzzysearch.

lithammer commented on May 16, 2024

Then you'd have something like this:

const threshold = 15

func predicate(s, t string) bool {
    distance := LevenshteinDistance(s, t)  // 19
    return distance < threshold
}

fuzzy.RankFindFunc("aaa", "aaabbbbbbbbbbbbbbbbbbb", predicate)  // false
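For anyone who wants to try the idea, here is a minimal, self-contained sketch. It is not part of fuzzysearch: rankFindFunc, rank, withinThreshold, and the local levenshtein helper are hypothetical names, and the sketch assumes the function takes a list of targets and keeps those the predicate accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// levenshtein returns the edit distance between s and t
// (a plain dynamic-programming version written for this sketch).
func levenshtein(s, t string) int {
	a, b := []rune(s), []rune(t)
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // deletion
			if v := curr[j-1] + 1; v < best {
				best = v // insertion
			}
			if v := prev[j-1] + cost; v < best {
				best = v // substitution or match
			}
			curr[j] = best
		}
		prev = curr
	}
	return prev[len(b)]
}

// rank pairs a target with its edit distance from the source.
type rank struct {
	Target   string
	Distance int
}

// rankFindFunc is a hypothetical helper: it keeps every target
// for which pred(source, target) returns true.
func rankFindFunc(source string, targets []string, pred func(source, target string) bool) []rank {
	var ranks []rank
	for _, target := range targets {
		if pred(source, target) {
			ranks = append(ranks, rank{Target: target, Distance: levenshtein(source, target)})
		}
	}
	return ranks
}

const threshold = 15

// withinThreshold accepts a target only if its distance is below threshold.
func withinThreshold(source, target string) bool {
	return levenshtein(source, target) < threshold
}

func main() {
	long := "aaa" + strings.Repeat("b", 19) // 19 insertions away from "aaa"
	fmt.Println(rankFindFunc("aaa", []string{"aaab", long}, withinThreshold))
}
```

With this predicate the short target is kept and the long one is dropped, even though both start with the search string.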


colelawrence commented on May 16, 2024

Interesting, thank you for the explanations!

On Wed, Sep 9, 2015, 6:08 AM Peter Renström wrote:

Then you'd have something like this

const threshold = 15

func predicate(s, t string) bool {
    distance := LevenshteinDistance(s, t)
    return distance < threshold
}

RankFindFunc("aaa", "aaabbbbbbbbbbbbbbbbbbb", predicate)


lithammer commented on May 16, 2024

Yeah, so I'm not sure the distance alone is enough to determine a match. Having "a" match "ab" but not "abc" is confusing; it should give at least as many hits as a plain old substring search.
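To make that concrete, here is a small sketch comparing a distance cutoff with a substring check. The threshold of 2 is chosen purely for illustration, and levenshtein is a local helper written for this example rather than the library function.

```go
package main

import (
	"fmt"
	"strings"
)

// levenshtein returns the edit distance between s and t
// (a plain dynamic-programming version written for this sketch).
func levenshtein(s, t string) int {
	a, b := []rune(s), []rune(t)
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // deletion
			if v := curr[j-1] + 1; v < best {
				best = v // insertion
			}
			if v := prev[j-1] + cost; v < best {
				best = v // substitution or match
			}
			curr[j] = best
		}
		prev = curr
	}
	return prev[len(b)]
}

func main() {
	const threshold = 2 // illustrative cutoff, not a library default
	for _, target := range []string{"ab", "abc"} {
		d := levenshtein("a", target)
		fmt.Printf("%q: distance=%d underThreshold=%t substring=%t\n",
			target, d, d < threshold, strings.Contains(target, "a"))
	}
}
```

Both targets contain "a" as a substring, but only "ab" falls under the cutoff, because each extra character in the target adds one to the distance.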

Did you have a good use-case for this? Perhaps I'm just missing something here 😄


colelawrence commented on May 16, 2024

Yeah, so I use it to build a search engine for my GoCourseSort project. The reason I use Levenshtein instead of a simple match is so that a word in the search query can match a keyword from the database of strings.

I think it is vaguely similar to the way BigTables work (don't quote me).

Imagine the book titles "Gone with the Wind", "Gone Girl", and "The Girls". We create an index of keywords with references:

gonegirl := &Book{
  title: "Gone Girl",
}
gonewiththewind := &Book{
  title: "Gone with the Wind",
}
thegirls := &Book{
  title: "The Girls",
}
indexByKeywords := map[string][]*Book{
  "girls": {thegirls},
  "girl": {gonegirl},
  "gone": {gonegirl, gonewiththewind},
  "with": {gonewiththewind},
  "the": {gonewiththewind, thegirls},
  "wind": {gonewiththewind},
}

Now you enter the search term "the girls".

Then I rank "the" and "girls" separately against the keys of indexByKeywords by Levenshtein distance. For each list of references I get back per search term, I apply a ranking formula based on the Levenshtein distance, the index of the word in the search, and the index of the word in the title.
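A rough sketch of the first step, ranking each search term against the index keys, might look like the following. The names rankKeys, keyRank, and maxDistance are made up for this example, the levenshtein helper is local, and the final ranking formula is left out because its details are not given here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Book mirrors the struct used in the index example.
type Book struct{ Title string }

// levenshtein returns the edit distance between s and t
// (a plain dynamic-programming version written for this sketch).
func levenshtein(s, t string) int {
	a, b := []rune(s), []rune(t)
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // deletion
			if v := curr[j-1] + 1; v < best {
				best = v // insertion
			}
			if v := prev[j-1] + cost; v < best {
				best = v // substitution or match
			}
			curr[j] = best
		}
		prev = curr
	}
	return prev[len(b)]
}

// keyRank pairs an index key with its distance from a search term.
type keyRank struct {
	Key      string
	Distance int
}

// rankKeys returns the index keys within maxDistance of term,
// closest first, with ties broken alphabetically.
func rankKeys(term string, index map[string][]*Book, maxDistance int) []keyRank {
	var ranks []keyRank
	for key := range index {
		if d := levenshtein(term, key); d <= maxDistance {
			ranks = append(ranks, keyRank{Key: key, Distance: d})
		}
	}
	sort.Slice(ranks, func(i, j int) bool {
		if ranks[i].Distance != ranks[j].Distance {
			return ranks[i].Distance < ranks[j].Distance
		}
		return ranks[i].Key < ranks[j].Key
	})
	return ranks
}

func main() {
	goneGirl := &Book{Title: "Gone Girl"}
	goneWithTheWind := &Book{Title: "Gone with the Wind"}
	theGirls := &Book{Title: "The Girls"}
	index := map[string][]*Book{
		"girls": {theGirls},
		"girl":  {goneGirl},
		"gone":  {goneGirl, goneWithTheWind},
		"with":  {goneWithTheWind},
		"the":   {goneWithTheWind, theGirls},
		"wind":  {goneWithTheWind},
	}
	// Each word of the query is ranked on its own, so word order is irrelevant.
	for _, term := range strings.Fields("the girls") {
		fmt.Println(term, rankKeys(term, index, 1))
	}
}
```

With a maximum distance of 1, "girls" reaches both the "girls" and "girl" keys, so "Gone Girl" becomes a candidate even though the query used the plural.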

This is important because in my search engine the order of words should not matter in any way, but I need "CS" to match "CSC" and "131" to match "121", since I'm using it for a college course catalog.


elazarl commented on May 16, 2024

You might want to look at my implementation and the fuzzy find

