
fuzzysearch's Introduction

Fuzzy Search

Inspired by bevacqua/fuzzysearch, a fuzzy matching library written in JavaScript, but with some extras such as ranking matches using Levenshtein distance and finding matches in a list of words.

Fuzzy searching allows for flexibly matching a string with partial input, useful for filtering data very quickly based on lightweight user input.

The current implementation uses the algorithm suggested by Mr. Aleph, a Russian compiler engineer working on V8.

Install

go get github.com/lithammer/fuzzysearch/fuzzy

Usage

package main

import "github.com/lithammer/fuzzysearch/fuzzy"

func main() {
	fuzzy.Match("twl", "cartwheel")  // true
	fuzzy.Match("cart", "cartwheel") // true
	fuzzy.Match("cw", "cartwheel")   // true
	fuzzy.Match("ee", "cartwheel")   // true
	fuzzy.Match("art", "cartwheel")  // true
	fuzzy.Match("eeel", "cartwheel") // false
	fuzzy.Match("dog", "cartwheel")  // false
	fuzzy.Match("kitten", "sitting") // false
	
	fuzzy.RankMatch("kitten", "sitting") // -1
	fuzzy.RankMatch("cart", "cartwheel") // 5
	
	words := []string{"cartwheel", "foobar", "wheel", "baz"}
	fuzzy.Find("whl", words) // [cartwheel wheel]
	
	fuzzy.RankFind("whl", words) // [{whl cartwheel 6 0} {whl wheel 2 2}]
	
	// Unicode normalized matching.
	fuzzy.MatchNormalized("cartwheel", "cartwhéél") // true

	// Case insensitive matching.
	fuzzy.MatchFold("ArTeeL", "cartwheel") // true
}

You can sort the result of a fuzzy.RankFind() call using the sort package in the standard library:

matches := fuzzy.RankFind("whl", words) // [{whl cartwheel 6 0} {whl wheel 2 2}]
sort.Sort(matches) // [{whl wheel 2 2} {whl cartwheel 6 0}]

See the fuzzy package documentation for more examples.

License

MIT

fuzzysearch's People

Contributors

dependabot[bot], evidlo, harrison3000, jeanmatheussouto, josharian, ktprograms, lithammer, mlwelles, nyaxt, rharriso, tang-hi, tokozedg


fuzzysearch's Issues

Expose Normalize function

Hello, I think exposing the Normalize and NormalizeFold functions might be a good idea.
When you want to reuse a []string of targets, you wouldn't need to normalize it again: normalize the targets once, then normalize only the source and call Find().

New release + upgrade to Go 1.18

Hi,

Thank you for maintaining this repo.

Was wondering if you could create a new release, please? The last release was in Oct 2020 and seems to use Go 1.15.

Thanks.

Add case insensitive RankFind

Hi, thanks for this library!

Currently, it seems that MatchFold can be used for case insensitive matching, but there doesn't seem to be any way to do case insensitive RankFind.

Using MaxInt instead of -1

First of all, what an awesome lib! Thanks for the work. But I do have a design question:

Let's say I want to compare strings two at a time against a target using RankMatchNormalizedFold, for example:

s1 := "Hello Wor"
s2 := "Hello"
target := "hello world"
fmt.Println(fuzzy.RankMatchNormalizedFold(s1, target)) //2
fmt.Println(fuzzy.RankMatchNormalizedFold(s2, target)) //6

I could easily take the minimum of (2, 6) -> "Hello Wor" to find the best string in this case. But it would get me in trouble if either string doesn't match, e.g.:

s1 := "foo"
s2 := "Hello"
target := "hello world"
fmt.Println(fuzzy.RankMatchNormalizedFold(s1, target)) //-1
fmt.Println(fuzzy.RankMatchNormalizedFold(s2, target)) //6

where the minimum of (-1, 6) would incorrectly pick "foo".

Why use -1 instead of math.MaxInt, given that the lower the distance, the closer the strings?

mishandling of utf8 replacement character

Add this test case to var fuzzyTests, and run the tests:

	{"\xffinvalid UTF-8\xff", "", false, -1},

Result:

--- FAIL: TestFuzzyMatchFold (0.00s)
panic: runtime error: slice bounds out of range [19:15] [recovered]
	panic: runtime error: slice bounds out of range [19:15]

goroutine 35 [running]:
testing.tRunner.func1.2({0x1031423e0, 0x14000114018})
	/Users/josh/go/1.20/src/testing/testing.go:1526 +0x1c8
testing.tRunner.func1()
	/Users/josh/go/1.20/src/testing/testing.go:1529 +0x384
panic({0x1031423e0, 0x14000114018})
	/Users/josh/go/1.20/src/runtime/panic.go:884 +0x204
golang.org/x/text/transform.String({0x103153b28, 0x103297c60}, {0x1030cb139, 0xf})
	/Users/josh/pkg/mod/golang.org/x/[email protected]/transform/transform.go:650 +0x9e4
github.com/lithammer/fuzzysearch/fuzzy.stringTransform({0x1030cb139, 0xf}, {0x103153b28?, 0x103297c60?})
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:242 +0x64
github.com/lithammer/fuzzysearch/fuzzy.match({0x1030cb139?, 0x7?}, {0x0, 0x0}, {0x103153b28, 0x103297c60})
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:55 +0x38
github.com/lithammer/fuzzysearch/fuzzy.MatchFold(...)
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:41
github.com/lithammer/fuzzysearch/fuzzy.TestFuzzyMatchFold(0x1400011cb60)
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy_test.go:65 +0xbc
testing.tRunner(0x1400011cb60, 0x103152088)
	/Users/josh/go/1.20/src/testing/testing.go:1576 +0x10c
created by testing.(*T).Run
	/Users/josh/go/1.20/src/testing/testing.go:1629 +0x368
exit status 2
FAIL	github.com/lithammer/fuzzysearch/fuzzy	0.131s

This existed prior to #53 (phew!). The root cause is that unicodeFoldTransformer.Transform is returning n, n, err, but when utf8.RuneError is present, nSrc may differ from nDst. I'll try to put together a fix sometime soonish.

(Found by fuzzing. Once the fuzz tests make it out of the gate without stumbling, I'll PR them.)

Getting error:" github.com/lithammer/fuzzysearch: module github.com/lithammer/fuzzysearch@latest found (v1.1.2), but does not contain package github.com/lithammer/fuzzysearch"

When I run go mod tidy I get the following error:

go mod tidy
go: finding module for package github.com/lithammer/fuzzysearch
project/cmd imports
       github.com/lithammer/fuzzysearch: module github.com/lithammer/fuzzysearch@latest found (v1.1.2), but does not contain package github.com/lithammer/fuzzysearch

The import is not resolving correctly in my project. Any ideas?

panic in transform.go when it's trying to reslice

It seems it's trying to reslice with an invalid index in v1.1.5:

panic: runtime error: slice bounds out of range [5:3]

goroutine 399472 [running]:
golang.org/x/text/transform.String({0xe4c3d0, 0x13ce5c0}, {0xc00042d358, 0x3})
	/vendor/golang.org/x/text/transform/transform.go:650 +0xbe5
github.com/lithammer/fuzzysearch/fuzzy.stringTransform({0xc00042d358, 0x3}, {0xe4c3d0?, 0x13ce5c0?})
	/vendor/github.com/lithammer/fuzzysearch/fuzzy/fuzzy.go:243 +0x5a
github.com/lithammer/fuzzysearch/fuzzy.match({0xc00042d358?, 0x44fe72?}, {0xc0007a1950, 0x13}, {0xe4c3d0, 0x13ce5c0})

new type for targets

Hello everyone.
I want to ask why this library uses the string type for targets rather than a Stringer interface or similar. Developers more often need to search a slice of objects than an array of raw strings, so I think changing the target type would be useful.

Consider implementing support of multi-byte character sets

The current implementation assumes that strings are indexed by characters; this assumption only holds for the 1-byte UTF-8 subset (the Latin alphabet). Go strings are UTF-8 strings, so the proper way to iterate over characters is with range. Right now, for non-Latin strings the package computes byte distances, rendering it useless in a non-obvious way.

Exact match mode?

In fzf, for example, there's a so-called "exact search mode", triggered by ', which searches for the exact phrase provided. Would it be possible to have something like this in fuzzysearch?

As of now, searching for the string ae in archive, arkenfox, aerc, zambezi returns all four candidates. However, if an option for strict matching were added, the result would be only one: aerc.

Why is this needed? Because of exactly this: sometimes there are just too many candidates :)

What do you think?

RankMatch seems to be broken

README.md says:

fuzzy.RankMatch("kitten", "sitting") // 3

But RankMatch always returns -1.

package main

import (
	"fmt"

	"github.com/lithammer/fuzzysearch/fuzzy"
)

func main() {
	src := "kitten"
	target := "sitting"
	m := fuzzy.RankMatch(src, target)
	fmt.Println(m)
}

-1

https://play.golang.org/p/llP1davLYrK

Change LevenshteinDistance to Damerau–Levenshtein distance?

Hi,
The Damerau–Levenshtein distance is the minimum number of edit operations (insertions, deletions, substitutions, and transpositions) required to change one word into another. It adds one operation (transposition of adjacent characters) to the Levenshtein distance.
I think the Damerau–Levenshtein distance is more practical than the Levenshtein distance. Its time complexity is also O(N^2), but unfortunately its space complexity is O(N^2), while the Levenshtein distance's space complexity is O(N).

Do you think it's an acceptable change? If so, I would like to raise a PR to change LevenshteinDistance to Damerau–Levenshtein distance.

Unnecessary allocations due to transform.Nop

I noticed that when calling Match with no normalization or folding, the no-op transformer does an unnecessary string -> []byte -> string round trip. This results in two allocations; in my case these two allocations per operation end up taking 40% of the CPU time and adding a lot of GC pressure.

I made a patch that fixes the issue by avoiding transform.Nop (all tests pass, and the benchmarks confirm 0 allocs):
noallocnop.txt

Introduce gopkg support

Hi,

I'm planning on using your library, but to avoid future build issues I would like to pin versions. Would you mind creating branches/tags for your releases? The (simple) instructions are at: https://labix.org/gopkg.in

Cheers,

Mark
