fuzzywuzzy's People

Contributors

ascopes, burdoto, comroid-commit, efritzsche, michaeltandecki, xdrop

fuzzywuzzy's Issues

Can fuzzywuzzy be used in this case?

I have the description of a YouTube video and I want to find out whether a specific word appears in the text, allowing for typos. For example, take the following description:

If you tell me you're super busy, I'm going to ask to see your written plan.\n\nMy book "10 Steps to Earning Awesome Grades" is now out and it's free! Get it here:\n\nhttp://collegeinfogeek.com/get-better-grades/\n\nIf you want to get even more strategies and tips on becoming a more productive, successful student, subscribe to my channel right here:\n\nhttp://buff.ly/1vQP5ar\n\nConnect with me on Twitter!\n\nhttps://twitter.com/TomFrankly\n\nCompanion blog post with notes and resource links: \n\nhttp://collegeinfogeek.com/massive-workloads/

I would like to know if the word twitter is present in the description. I would then do

FuzzySearch.extractOne(videoDescription, Arrays.asList("Twitter"))
// (string: Twitter, score: 57, index: 0)

And if the text has typos the score decreases as expected.

Is this a good use for the library?

Allow to use any object as a choice

First of all, thank you for this great library.
However, there's a small issue I have with it: For one of my projects I'm implementing a search for JavaDoc methods and have a class JavadocMethod with methods like getMethodName(), getClassName() and getUrl().
For searching it would be very convenient to just use the object itself for search, so I can access the url of the found method.
I'm thinking about a generic solution like this:

public static <T> List<ExtractedResult<T>> extractTop(String query, Collection<T> choices, Function<T, String> mapper, int limit)

which allows any object to be used as a choice by providing a function that maps the object to a string.

Collection<JavadocMethod> methods = ...;
FuzzySearch.extractTop("String#valeuOf(loong)", methods, method -> String.format("%s#%s", method.getClassName(), method.getMethodName()), 5);

Can you imagine implementing such a feature, or accepting a pull request that adds it?
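For illustration, here is a minimal, library-independent sketch of what the proposed signature could look like. The names GenericExtract, Scored and positionalScore are hypothetical and are not part of fuzzywuzzy; the scorer is passed in as a parameter, and the toy scorer only exists so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToIntBiFunction;

final class GenericExtract {
    /** A choice of any type paired with its score (hypothetical result type). */
    static final class Scored<T> {
        final T referent;
        final int score;

        Scored(T referent, int score) {
            this.referent = referent;
            this.score = score;
        }
    }

    /** Scores mapper(choice) against the query and returns the best `limit` choices. */
    static <T> List<Scored<T>> extractTop(String query, Collection<T> choices,
            Function<T, String> mapper, ToIntBiFunction<String, String> scorer, int limit) {
        List<Scored<T>> results = new ArrayList<>();
        for (T choice : choices) {
            results.add(new Scored<>(choice, scorer.applyAsInt(query, mapper.apply(choice))));
        }
        // Highest score first; List.sort is stable, so ties keep their input order.
        results.sort(Comparator.comparingInt((Scored<T> r) -> r.score).reversed());
        return results.subList(0, Math.min(limit, results.size()));
    }

    /** Toy scorer for the demo only: percentage of positions holding the same character. */
    static int positionalScore(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100;
        }
        int same = 0;
        for (int i = 0; i < Math.min(a.length(), b.length()); i++) {
            if (a.charAt(i) == b.charAt(i)) {
                same++;
            }
        }
        return Math.round(100f * same / longest);
    }
}
```

Because each result keeps the original object as its referent, the caller can read getUrl() or any other property from the match without a second lookup.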

How to set the scorer like the python fuzzywuzzy?

In the Python fuzzywuzzy, we can set the scorer we want to use when extracting the result. How can we do that here?

process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)

Is there a Gitter or Discord channel where such questions can be asked?

Difference between java and python implementation: Spoiler, the problem is the round

Here is the example where I was stuck. The Python implementation gets 22, and the Java implementation gets 23:

fuzz.token_set_ratio(
  "Vêndo ou troco por outro carro pode ser atrasado negócio volta ",
  "Titan 150 ano 2005 ", 
  False
)
FuzzySearch.tokenSetRatio(
  "Vêndo ou troco por outro carro pode ser atrasado negócio volta", 
  "Titan 150 ano 2005"
);

Debugging both versions, I found that the problem occurs when rounding the value 22.5.

Python code, located in utils.py:

int(round(n))

Java code, located in SimpleRatio class is:

(int) Math.round(100 * DiffUtils.getRatio(s1, s2));

TLDR:

Java: Math.round(22.5) => 23
Python: round(22.5) => 22

Don't know which one is correct for this algorithm...
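A small sketch of the two rounding modes in plain Java, in case it helps to compare them. Math.round rounds ties toward positive infinity, while Math.rint rounds ties to the nearest even value, which is the behaviour Python 3's round() shows above. The class and method names are only for illustration.

```java
final class RoundingModes {
    /** Ties go up: this is what Math.round does. */
    static int halfUp(double value) {
        return (int) Math.round(value);
    }

    /** Ties go to the nearest even integer, like round() in Python 3. */
    static int halfEven(double value) {
        return (int) Math.rint(value);
    }

    public static void main(String[] args) {
        System.out.println(halfUp(22.5));   // 23
        System.out.println(halfEven(22.5)); // 22
        System.out.println(halfEven(23.5)); // 24
    }
}
```

Which mode is "correct" depends on whether matching the Python scores exactly is a goal; if it is, the half-even variant would be the one to use.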

Mismatch result if the keyword doesn't exist in the dataset

When I search for a word that doesn't exist in the data set, the library still suggests a match, so it cannot tell whether the word is merely misspelled or is missing altogether.

ArrayList<String> dataSet = new ArrayList<>();
dataSet.add("Iphone");
dataSet.add("white");
dataSet.add("black");
dataSet.add("Samsung");
dataSet.add("galaxy");
dataSet.add("gallileo");
dataSet.add("galaksi");
dataSet.add("harry");
dataSet.add("potter");

// Strings to be compared
String[] searchKeyword = new String[] {"hari poter", "smsung glxy", "xiaomi mi2", "jamu godhong telo"};
for (int i = 0; i < searchKeyword.length; i++) {
    String[] keywords = searchKeyword[i].split(" ");
    long start = System.currentTimeMillis();
    List<String> checked = new ArrayList<>();
    Arrays.asList(keywords).stream().sequential().forEach(keyword -> {
        ExtractedResult res = FuzzySearch.extractOne(keyword, dataSet);
        checked.add(res.getString());
    });
    long end = System.currentTimeMillis() - start;
    System.out.println(String.format("keyword:%s , spell-checked: %s took:%d", searchKeyword[i], checked, end));
}

Result will be like this

keyword:hari poter , spell-checked: [harry, potter] took:123
keyword:smsung glxy , spell-checked: [Samsung, galaxy] took:6
keyword:xiaomi mi2 , spell-checked: [Iphone, white] took:5
keyword:jamu godhong telo , spell-checked: [Samsung, Iphone, gallileo] took:8
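An extract-one call always returns the highest-scoring choice, however low that score is, so "xiaomi" has to map to something. One way to handle this is to reject results below a cutoff. The sketch below is library-independent: CutoffMatch and bestAtLeast are hypothetical names, the scorer is a parameter (a method such as FuzzySearch::ratio should fit, assuming it takes two strings and returns an int), and a suitable cutoff value has to be found by experiment.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.ToIntBiFunction;

final class CutoffMatch {
    /** Returns the best-scoring choice, or empty when even the best score is below the cutoff. */
    static Optional<String> bestAtLeast(String query, List<String> choices,
            ToIntBiFunction<String, String> scorer, int cutoff) {
        String best = null;
        int bestScore = -1;
        for (String choice : choices) {
            int score = scorer.applyAsInt(query, choice);
            if (score > bestScore) {
                best = choice;
                bestScore = score;
            }
        }
        // An empty result means "no suggestion", which the caller can treat as an unknown word.
        return best != null && bestScore >= cutoff ? Optional.of(best) : Optional.empty();
    }
}
```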

Strange result

Hi. Ratio value between words "гигантская" and "гигансткая" is 90.
In my opinion, something is wrong here. Or is this a normal result for the library?

Convert codes to apex class

Hi,

I want to convert your fuzzywuzzy code to Apex (the language used in Salesforce cloud), which has a syntax very similar to Java. For now I only plan to use SimpleRatio and PartialRatio. Am I allowed to do that? I also plan to open-source the result in my own project.

Thank you in advance!

install failure

mvn install causes the following test failure on win 7 in gitbash with java 1.8.0_144

Running me.xdrop.fuzzywuzzy.algorithms.DefaultStringProcessorTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.288 sec <<< FAILURE!
testProcess(me.xdrop.fuzzywuzzy.algorithms.DefaultStringProcessorTest) Time elapsed: 0.075 sec <<< FAILURE!
junit.framework.ComparisonFailure: expected:<s trim [μεγιουνικουντ] n o n a lph a n um> but was:<s trim [▒ ▒▒ ▒ ▒ ▒ ▒ ▒ ▒▒ ▒ ▒ ▒ ] n o n a lph a n um>
at junit.framework.Assert.assertEquals(Assert.java:100)
at junit.framework.TestCase.assertEquals(TestCase.java:261)
at groovy.util.GroovyTestCase.assertEquals(GroovyTestCase.java:284)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1466)
at org.codehaus.groovy.runtime.callsite.StaticMetaClassSite.callStatic(StaticMetaClassSite.java:65)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:56)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:194)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:214)
at me.xdrop.fuzzywuzzy.algorithms.DefaultStringProcessorTest.testProcess(DefaultStringProcessorTest.groovy:9)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:252)
at junit.framework.TestSuite.run(TestSuite.java:247)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:86)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

License Question

Hello and thank you for publishing this awesome library! I had a question for you regarding the licensing. I wrote a collection of UDFs for Apache Drill that essentially is a wrapper for your library and would like to submit it to Drill, however the GPL license is not compatible with the Apache license.
Would you consider re-releasing this under a different license so that it could be included in a future release of Drill? (https://www.apache.org/legal/resolved.html#category-x)
Thanks!
-- Charles

fuzzywuzzy search gives 86% for all mismatches, or for incorrect match

Thanks for creating this Java API. It is really useful.

But I am facing one issue: I need to match some addresses against a big address list (6000+ records). I am using the extractOne method.

It works perfectly if a similar address is in the list. It gives a correct score (87%-100%).

But if it doesn't find a good match, it always gives me an 86% match, even when both addresses are totally different.
Example -
Addr 1 - HUNTINGTON NATIONAL BANK 328 SOUTH SAGINAW ST  FLINT MI 48502
It matches to - BANK OF WEST PO BOX 2000  OMAHA NE 68103
and give Score - 86%

Still Incompatibility with the Python Version

I saw that there was a new version 1.3.4, so I used it, but I think that underscore-handling issue is not fixed - all the examples now return 100...

Here is how I run them in Java:
System.out.println("expected 58 -> got " + FuzzySearch.tokenSetPartialRatio("worm_mikeala", "mikeala rath"));
System.out.println("expected 80 -> got " + FuzzySearch.tokenSetPartialRatio("c_wasyluka", "crystal wasyluka"));
System.out.println( "expected 78 -> got " + FuzzySearch.tokenSetPartialRatio("a_bacdefg", "crystal bacdefg"));

I get:
expected 58 -> got 100
expected 80 -> got 100
expected 78 -> got 100

and here is how I run them in Python:
from fuzzywuzzy import fuzz
if __name__ == '__main__':
    print(fuzz.partial_token_set_ratio("worm_mikeala", "mikeala rath"))
    print(fuzz.partial_token_set_ratio("c_wasyluka", "crystal wasyluka"))
    print(fuzz.partial_token_set_ratio("a_bacdefg", "crystal bacdefg"))

I get:
58
80
78

Am I doing something wrong or is there still an issue?

Inconsistent results from extractOne and extractTop

I see different results returned by extractOne and extractTop for the same query string and collection.

I have a pretty long collection (15k strings) to search for each query.

For Instance, let's say I have the following scenario
Query - ABC 1721
The collection has following strings in it
ABC1721
ABC1721-FGH/L9
ABC MERAKI Z1
EFGD3111/Z1-ABC
and many more

extractOne("ABC 1721", collection)
gives - ABC1721, Ratio - 95

extractTop("ABC 1721", collection,1)
gives - ABC1721, Ratio - 95

but the problem arose when I want the top 5 results
extractTop("ABC 1721", collection,5)
Match 1 - ABC1721-FGH/L9, Ratio - 86
Match 2 - ABC MERAKI Z1, Ratio - 86
Match 3 - EFGD3111/Z1-ABC, Ratio - 86
and so on

I tried using 'extractSorted' as well, it doesn't give consistent results as extractOne.

I used extractTop (for top 5) and extractOne for 1000+ queries. Around 70% of the 1st Match from extractTop doesn't match with the result of extractOne

BTW, I would like to appreciate your efforts on porting the python logic to Java without any performance lag

levenshtein distance issue

levEditDistance("sf&t co., ltd.","sft",1) = 13 when it is actually 11.

apache commons StringUtils.getLevenshteinDistance gives the correct result.
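For cross-checking, here is a plain dynamic-programming edit distance with a configurable substitution cost. It is a reference sketch only, not the library's levEditDistance, and the report does not say what the third argument means; here it is simply the cost of a substitution, with inserts and deletes costing 1.

```java
final class WeightedLevenshtein {
    /** Edit distance with unit insert/delete cost and a configurable substitution cost. */
    static int distance(String a, String b, int substitutionCost) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j inserts.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : substitutionCost);
                int insertOrDelete = Math.min(prev[j], curr[j - 1]) + 1;
                curr[j] = Math.min(substitute, insertOrDelete);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Since "sft" is a subsequence of "sf&t co., ltd.", the cheapest edit is to delete the other 11 characters, so this sketch returns 11 for a substitution cost of either 1 or 2.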

Wrong score in Partial Ratio

Hi,

I am using 1.4.0, and this gives a wrong result with partial ratio:

FuzzySearch.partialRatio("ttttttttt virtuale ggggggggggggvo zizzrztuta mmmmmle", "virtuale"); 

the score is 50, it has to be 100 imho.

The python version returns 100 too:

>>> fuzz.partial_ratio("ttttttttt virtuale ggggggggggggvo zizzrztuta mmmmmle", "virtuale")
100

Thanks for the help

Is there a security scanning performed on this project?

I am very thankful to the contributors for this Java fuzzy match library with the most popular matching algorithms.

Is there a GitHub security scanning performed on this project? I did not observe a scanning policy under the security page but understand there are multiple options to implement scanning where that policy may not exist.

Include index in match result

It would be useful to also get the index of the matched item for each match in the result list.

Example

FuzzySearch.extractTop("goolge", ["google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl"], 3)
[(string: google, score:83, index:0), (string: googleplus, score:63, index:5), (string: plexoogl, score:43, index:7)]

How does this library handle upper and lower case?

When comparing strings, the strings' capitalization affects the value returned. It appears this library is case sensitive. What are the parameters for CAPS vs lowercase? How much does the value decrease if a text such as "fuzzywuzzy" was matched with "FuZzYwUzZy" vs "fuzzywuzzy"?

Very curious!

Incorrect levenshtein distance for completely edited strings

When I calculate the ratio of "abcdef" - "fedcba" , it results in 17, even though I expected 0.

The ratio calculation is as I understand it: r = ( 1 - d/L)*100 ,
with d being the Levenshtein distance and L the sum of the lengths of the two compared strings.

In this library the levenshtein distance is valued with 1 for each insert/delete and 2 for each replace.

The levenshtein distance in this library, for these two strings should be 12 (2 for each replace), resulting in a ratio = (1 - 12/12)*100 = 0

However, in your library, the ratio results in 17, instead of 0. This is because the distance it calculates is 10 instead of 12, resulting in (1-10/12)*100=17 .

This seems to be the case for strings of any length with 100% replacements, as if one replacement is missed.
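One thing worth checking against the cost model described above: when a replace costs 2 and an insert or delete costs 1, a replace is never cheaper than a delete plus an insert, so the distance equals |a| + |b| - 2 * LCS(a, b), where LCS is the length of the longest common subsequence. "abcdef" and "fedcba" still share a one-character subsequence (any single letter), which gives d = 12 - 2 = 10 rather than 12. The sketch below computes this; IndelRatio is a hypothetical name and this is not the library's code.

```java
final class IndelRatio {
    /** Length of the longest common subsequence of a and b. */
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    /** Distance with insert/delete cost 1 and replace cost 2. */
    static int distance(String a, String b) {
        return a.length() + b.length() - 2 * lcsLength(a, b);
    }

    /** r = (1 - d / L) * 100, rounded; two empty strings are treated as identical here. */
    static int ratio(String a, String b) {
        int total = a.length() + b.length();
        if (total == 0) {
            return 100;
        }
        return (int) Math.round(100.0 * (total - distance(a, b)) / total);
    }

    public static void main(String[] args) {
        System.out.println(distance("abcdef", "fedcba")); // 10
        System.out.println(ratio("abcdef", "fedcba"));    // 17
    }
}
```

Under that reading, 17 is what the formula yields for this pair, and a ratio of 0 would only occur for strings with no characters in common.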

StringIndexOutOfBoundsException in partialratio

java.lang.StringIndexOutOfBoundsException: String index out of range: 49
at java.lang.String.substring(String.java:1963)
at com.xdrop.fuzzywuzzy.ratios.PartialRatio.apply(PartialRatio.java:43)
at com.xdrop.fuzzywuzzy.FuzzySearch.partialRatio(FuzzySearch.java:45)

test case:
FuzzySearch.partialRatio("pros holdings, inc.","settlement facility dow corning trust")

Using custom object Instead of String would lead to performance issue?

Hi, First of all, thanks @xdrop for work on this project.

I have a Spring boot Webflux project and I need to do a fuzzy search on one of the fields. I am using in-memory loading, as soon as my Application starts, I would load the fuzzy search list data in the respective list. On subsequent API calls

After reading the API docs, I have two approaches in my mind.

1. Approach one

Use the list of string keys in a variable and a map of keys to the equivalent object in another variable. Fuzzy search using the list of keys. When I get the response back, map the key to the object and return

data class WeatherData(val key: String, val region: String)

// Service function for getting fuzzy search extracted Result
@Component
class FuzzySearchClient(val keys: MutableList<String>, val keysToWeatherDataMap: MutableMap<String, WeatherData> = mutableMapOf()) {

    fun fuzzySearchInMemory(query: String): Mono<List<SearchResponse>> {
        val result: List<ExtractedResult> = FuzzySearch.extractTop(query, keys, 5)
        val searchList: List<SearchResponse> = result.map { extractedResult: ExtractedResult ->
            // Look up the full object for the matched key; the key is expected to be present.
            val weatherData = keysToWeatherDataMap[extractedResult.string]!!
            SearchResponse(weatherData.key, weatherData.region)
        }
        return Mono.just(searchList)
    }
}

Function for adding keys in memory takes ~6s with approach one

@Component
class LoadDataInMemoryCache(
    private val weatherDataRepository: WeatherDataRepository,
    private val searchClient: FuzzySearchClient
) {

    private val logger = KotlinLogging.logger {}


    @EventListener(ApplicationReadyEvent::class)
    fun loadData() {
        val startTime = AtomicReference<Long>()
        weatherDataRepository.findAll()
            .doOnSubscribe { startTime.set(System.nanoTime()) }
            .doFinally { logger.info("Time taken for adding data in memory ${TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime.get())} milliseconds.") }
            .subscribe {
                searchClient.keys.add(WeatherData(it?.key!!, it.region!!))
            }
    }
}

2. Approach two

Use weather object keys and define ToStringFunction and get the result and map to the appropriate response.

data class WeatherData(val key: String, val region: String)

data class SearchResponse(val key: String, val region: String)

class WeatherSearchToStringFunction: ToStringFunction<WeatherData> {
    override fun apply(item: WeatherData?): String {
        return item?.key!!
    }
}


@Component
class SearchClient(val keys: MutableList<WeatherData>) {
    fun fuzzySearchInMemory(query: String): Mono<List<SearchResponse>> {
        val result: MutableList<BoundExtractedResult<WeatherData>> = FuzzySearch.extractTop(query, keys, WeatherSearchToStringFunction(), 5)
        val searchList: List<SearchResponse> = result.map { extractedResult: BoundExtractedResult<WeatherData> ->
            SearchResponse(extractedResult.referent?.key!!, extractedResult.referent.region)
        }
        return Mono.just(searchList)
    }
}

I am not sure which approach to take. Suggestions are welcome.

PartialRatio issue

This is essentially reopening issue #39, since the introduced fix does not solve the problem, but just makes it work for this explicit example.
E.g.

FuzzySearch.partialRatio("no", "bnonco");

should return the score 100.
This worked until #80, but it returns a score of 50 after the cases were reordered.

Is there somewhere I can find out what the different methods do?

I'm not familiar with the Python version of this; I gather from the readme that there are several different method calls that do different things with matching. I've found:

ratio
partialRatio
various tokenX methods
weightedRatio
various extractX methods

I generated the javadoc, but that didn't explain what these different methods do. I think the fuzzy matching could be very useful in what I'm doing, but just using ratio is a bit limiting, and I don't know what the other ones do. Is there documentation of what these things mean and do somewhere?

Thanks for the Library, Here's How I Used It!

Great Library and great work @xdrop ! Thank you so much for creating something that works for Android and sharing it! This library does something I have no idea how to do and would take me countless hours to create :D

Initially, I could not figure out a way to use the library. I have a SQLite database full of text values that I want to search. Unfortunately, neither SQL nor this library has an interface to do fuzzy search without a Full Table Search. Thankfully, I found a great workaround that uses my current dependencies.

This library works great with FlexibleAdapter (https://github.com/davideas/FlexibleAdapter). FlexibleAdapter has a builtin Async filtering mechanism that is extremely fast. Using the code below, I am able to filter my entire listview smoothly and with animations!

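    @Override
    public boolean filter(String constraint) {
        // Normalize both sides once so the fuzzy and substring checks are case-insensitive.
        String normalizedTitle = title.toLowerCase().trim();
        String normalizedConstraint = constraint.toLowerCase().trim();
        int fuzzyRatio = FuzzySearch.partialRatio(normalizedTitle, normalizedConstraint);
        Log.d("Fuzzy Search Ratio", String.valueOf(fuzzyRatio));
        return fuzzyRatio >= 70 || normalizedTitle.contains(normalizedConstraint);
    }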

I find that 70 is a really good value when using partial ratio.
Thanks to this library, I can provide an experience rivaling Google and Facebook! 🥇

Divide by zero exception when using Basic Algorithm

When given a string that contains only non-alphanumeric characters (e.g. "$#"), the Basic Algorithm throws the following exception:
java.lang.ArithmeticException: / by zero
at me.xdrop.fuzzywuzzy.algorithms.WeightedRatio.apply(WeightedRatio.java:32)
at me.xdrop.fuzzywuzzy.algorithms.BasicAlgorithm.apply(BasicAlgorithm.java:22)
at me.xdrop.fuzzywuzzy.Extractor.extractWithoutOrder(Extractor.java:43)
at me.xdrop.fuzzywuzzy.Extractor.extractTop(Extractor.java:100)

I believe this happens because the string processor replaces these characters with spaces and then trims the result, which leaves a string of length zero.

Could not find library in gradle

I am trying to use this library for my android studio project.
But I am facing this issue.

Could not find me.xdrop:fuzzywuzzy:1.3.0.
Required by:
    project :app
Search in build.gradle files

Can someone help with this?
Thanks

Can we prioritize results so that the expected match appears at the top?

Hi, I am using this library on a small data set of 10k records, but for some strings I am getting results in the wrong order.

For example, with the query "Visa" and the following list of choices:

choices = ["grupo televisa s.a.", "is", "sa", "visa inc.", "via"]

// result
('grupo televisa s.a.', 90), ('is', 90), ('sa', 90), ('visa inc.', 90)

I want the Visa string to appear in first place. How can I achieve that?
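When several choices tie on score, one option is to re-rank the returned matches with extra tie-breakers. The sketch below is a post-processing step only and does not change how scores are computed: it sorts by score, then prefers choices that start with the query, then shorter choices. TieBreaker and rerank are hypothetical names, and the tie-breaking rules are just one plausible choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TieBreaker {
    /** Re-ranks (choice, score) pairs: score descending, then prefix matches, then shorter strings. */
    static List<Map.Entry<String, Integer>> rerank(String query,
            List<Map.Entry<String, Integer>> matches) {
        String q = query.toLowerCase(Locale.ROOT);
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator
                .comparingInt((Map.Entry<String, Integer> m) -> m.getValue()).reversed()
                // false sorts before true, so choices starting with the query come first
                .thenComparing(m -> !m.getKey().toLowerCase(Locale.ROOT).startsWith(q))
                .thenComparingInt(m -> m.getKey().length()));
        return sorted;
    }
}
```

With the four results above, all scored 90, "visa inc." is the only choice that starts with "visa", so it moves to the front.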

Wrong ratio

FuzzySearch.ratio("csr", "c s r") returns 50.
The expected value is 75: (8 - 2) / 8.

Difference in extractOne results compared to Python version

I just noticed a difference in the results of extractOne between the Python and Java version.
My token is 19 craven park harlesden and my choices are ["NW10 8SU", "19 Craven Park, Harlesden", "Steven Gerrard"].

In the Python version, the following code:

process.extractOne(query, choices, scorer=fuzz.ratio)

produces:

('19 Craven Park, Harlesden', 98)

In the Java version, the following code:

 ExtractedResult result = FuzzySearch.extractOne(query, choices, new SimpleRatio());

matches 19 Craven Park, Harlesden but with a score of 86 instead.

I dug a bit deeper into this and found that you can get 86 by doing a direct ratio comparison in the Python version:

fuzz.ratio("19 Craven Park, Harlesden", "19 craven park harlesden") gives 86

However, the extractOne function in Python first processes the string by calling full_process in utils.py before calling the ratio function. Judging by the results, the Java version does not seem to process the string in the same way before calling SimpleRatio().

It's either this or I am making some mistake in calling the function. Could you please shed some light on this.
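To make the preprocessing step concrete, here is a rough Java sketch. It assumes the processing amounts to replacing every non-word character with a space, lower-casing, and trimming; that is an assumption about full_process, not a copy of it, and FullProcessSketch is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class FullProcessSketch {
    // With UNICODE_CHARACTER_CLASS, \W matches anything that is not a word character
    // (letters, digits, underscore and similar) in any script.
    private static final Pattern NON_WORD =
            Pattern.compile("\\W", Pattern.UNICODE_CHARACTER_CLASS);

    /** Replaces non-word characters with spaces, lower-cases, and trims. */
    static String process(String input) {
        return NON_WORD.matcher(input).replaceAll(" ").toLowerCase(Locale.ROOT).trim();
    }

    public static void main(String[] args) {
        System.out.println(process("19 Craven Park, Harlesden")); // 19 craven park  harlesden
        System.out.println(process("$#").isEmpty());              // true
    }
}
```

After this step the choice differs from the query only by one extra space where the comma was, which would be consistent with the 98 that Python reports. Note also that a string made up only of symbols becomes empty, so callers should be prepared for empty input to the scorer.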

GPL - v2 or v3?

Hello,

In #35 you've noted that "this is a rewrite of https://github.com/seatgeek/fuzzywuzzy, which forces this to be licensed under the same license (GPL) as the original library."

The Python package is licensed under GPL-2.0 without clarification if it's GPL-2.0-or-later or GPL-2.0-only, and some implication in the commit message and the timing of when the Python project was relicensed from MIT to GPL-2.0 that it was probably meant to be GPL-2.0-only.

This port has a GPL-3.0 license file.

Was it your intention to license this project under GPL-2.0 to match the license of the original project? If so, would you have any objection to taking the GPL-2.0 license text instead of GPL-3.0?

Thanks!

Difference in PartialScore between Java and Python Implementations

Hi,

I noticed when testing the values outputted from the Java implementation that given:
s1 = "haeagen dazs"
s2 = "liverpool altabrisa"
The Java implementation for PartialScore outputs 25, while the python implementation (fuzz.partial_ratio(s1,s2)) outputs 29. Wanted to report this discrepancy, and was wondering if anyone knew the cause of it (maybe rounding issues?)?

Thank you!

FuzzyWuzzy MIT?

There's an MIT-licensed version in Python.

Can we have the same for Java?

The license is the biggest issue that I and 90% of other developers are facing.

And the worst thing is that there is no alternative Java library that offers even the bare-minimum performance of this one.

I've searched everywhere.

A Levenshtein distance port for Java is available, but it performs very poorly when matching short user input (2-3 characters) against a list of strings,
e.g. matching "sai" with school names.

NoClassDefFoundError

I got this error while calling

FuzzySearch.tokenSortRatio(stringA, stringB) + FuzzySearch.tokenSetRatio(stringA, stringB)

stackTrace: java.lang.RuntimeException: java.lang.NoClassDefFoundError: me/xdrop/fuzzywuzzy/FuzzySearch

I imported this library as a gradle dependency

implementation 'me.xdrop:fuzzywuzzy:1.3.1'

It doesn't look like an issue caused by transitive dependency.

./gradlew dependencies

+--- com.jayway.jsonpath:json-path:2.4.0 (*)
+--- me.xdrop:fuzzywuzzy:1.3.1

v1.3.0 I can't find the .pom file

Hello,
I'm trying to use v 1.3.0 but I'm facing the following error

Could not find me.xdrop:fuzzywuzzy:1.3.0.
Searched in the following locations:

Possible solution:

I can't find the .pom file in the following directories
https://repo.jfrog.org/artifactory/libs-release-bintray/me/xdrop/fuzzywuzzy/1.3.0/
https://repo.maven.apache.org/maven2/me/xdrop/fuzzywuzzy/1.3.0/

am I missing something?
I have the following repositories defined in my build.gradle

repositories {
  jcenter()
  mavenCentral()
}

partial ratio issue

FuzzySearch.partialRatio("chicago transit authority" , "cta") expected value=67

The actual value is 33.

partialRatio issue

FuzzySearch.partialRatio("kaution", "kdeffxxxiban:de1110010060046666666datum:16.11.17zeit:01:12uft0000899999tan076601testd.-20-maisonette-z4-jobas-hagkautionauszug");

Result is "57", I expect "100".

Using 1.1.9.

Performance issue

Thank you for this awesome library; I am using it in my Android project. It takes a lot of time, because I pass in an array list of strings for comparison and the search is called each time the user enters a new character.
Is there any way I can improve its performance?
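One cheap mitigation, sketched below, is to cache results per query string so that a repeated query (for example after typing a character and deleting it again) is not recomputed. This does not make a single search faster, and it assumes the list of choices does not change while the cache is in use. QueryCache is a hypothetical helper; the search function is passed in, so any extract call can be wrapped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class QueryCache<R> {
    private final Function<String, R> search;
    private final Map<String, R> cache;

    QueryCache(Function<String, R> search, final int capacity) {
        this.search = search;
        // accessOrder = true makes the map evict the least recently used query first.
        this.cache = new LinkedHashMap<String, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached result for the query, running the search only on a miss. */
    synchronized R lookup(String query) {
        R cached = cache.get(query);
        if (cached == null) {
            cached = search.apply(query);
            cache.put(query, cached);
        }
        return cached;
    }
}
```

Running the search off the UI thread and waiting for a short pause in typing before searching are other common options, but they depend on the app's threading setup and are not shown here.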

Results differ from python library

Hi, while porting some python code to java I discovered that the Token Sort and Token Set Ratios calculated by this library oftentimes do not match the ones calculated by the python fuzzywuzzy library.

Here is an example:
Python Code:

from fuzzywuzzy import fuzz 
print(str(fuzz.token_sort_ratio("efwe fwef","wef wefwef"))) 
print(str(fuzz.token_set_ratio("efwe fwef","wef wefwef"))) 

Output:

53
53

Java Code:

import me.xdrop.fuzzywuzzy.FuzzySearch;

public class Main {
	public static void main(String[] args) {
		System.out.println(FuzzySearch.tokenSortRatio("efwe fwef","wef wefwef"));
		System.out.println(FuzzySearch.tokenSetRatio("efwe fwef","wef wefwef"));
	}
}

Output:

84
84

Where is this difference coming from? Shouldn't these two outputs be equal?

partial ratio issue

FuzzySearch.partialRatio("ola middle school", "henry county board of education")=29
FuzzySearch.partialRatio("henry county board of education", "ola middle school")=35

Shouldn't they be the same?
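For comparison, here is a brute-force partial ratio that is symmetric by construction: it always picks the shorter string first and slides it across every same-length window of the longer one. This is deliberately naive and is neither the library's nor the Python implementation (both choose candidate windows differently), so its scores will not always match theirs. PartialRatioSketch is a hypothetical name.

```java
final class PartialRatioSketch {
    /** Longest common subsequence length, computed with two rolling rows. */
    static int lcsLength(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                curr[j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? prev[j - 1] + 1
                        : Math.max(prev[j], curr[j - 1]);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 2 * LCS / (|a| + |b|) as a rounded percentage. */
    static int ratio(String a, String b) {
        int total = a.length() + b.length();
        return total == 0 ? 100 : (int) Math.round(200.0 * lcsLength(a, b) / total);
    }

    /** Best ratio of the shorter string against any same-length window of the longer one. */
    static int partialRatio(String s1, String s2) {
        // Ordering by length first is what makes the argument order irrelevant.
        String shorter = s1.length() <= s2.length() ? s1 : s2;
        String longer = s1.length() <= s2.length() ? s2 : s1;
        if (shorter.isEmpty()) {
            return 0;
        }
        int best = 0;
        for (int start = 0; start + shorter.length() <= longer.length() && best < 100; start++) {
            best = Math.max(best, ratio(shorter, longer.substring(start, start + shorter.length())));
        }
        return best;
    }
}
```

Checking every window costs more than picking a few candidate windows, but it guarantees that an exact substring such as "no" in "bnonco" scores 100.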

Incompatibility with the Python version in handling underscores

The FuzzySearch.tokenSetPartialRatio() method returns different results from the Python version for strings that contain an underscore.

Examples:

  • FuzzySearch.tokenSetPartialRatio("worm_mikeala", "mikeala rath") returns 74 while the Python version returns 58
  • FuzzySearch.tokenSetPartialRatio("c_wasyluka", "crystal wasyluka") returns 100 while the Python version returns 80
