xdrop / fuzzywuzzy
Java fuzzy string matching implementation of the well known Python's fuzzywuzzy algorithm. Fuzzy search for Java
License: GNU General Public License v2.0
I have the description of a YouTube video and I want to find if a specific word appears in the text, including typos. For example take the following description
If you tell me you're super busy, I'm going to ask to see your written plan.\n\nMy book "10 Steps to Earning Awesome Grades" is now out and it's free! Get it here:\n\nhttp://collegeinfogeek.com/get-better-grades/\n\nIf you want to get even more strategies and tips on becoming a more productive, successful student, subscribe to my channel right here:\n\nhttp://buff.ly/1vQP5ar\n\nConnect with me on Twitter!\n\nhttps://twitter.com/TomFrankly\n\nCompanion blog post with notes and resource links: \n\nhttp://collegeinfogeek.com/massive-workloads/
I would like to know if the word twitter is present in the description. I would then do:
FuzzySearch.extractOne(videoDescription, Arrays.asList("Twitter"))
// (string: Twitter, score: 57, index: 0)
And if the text has typos the score decreases as expected.
Is this a good use for the library?
First of all, thank you for this great library.
However, there's a small issue I have with it: for one of my projects I'm implementing a search for JavaDoc methods and have a class JavadocMethod with methods like getMethodName(), getClassName() and getUrl().
For searching it would be very convenient to just use the object itself for search, so I can access the url of the found method.
I'm thinking about a generic solution like this:
public static <T> List<ExtractedResult<T>> extractTop(String query, Collection<T> choices, Function<T, String> mapper, int limit)
which allows any object to be used by just providing a function that maps the object to a string.
Collection<JavadocMethod> methods = ...;
FuzzySearch.extractTop("String#valeuOf(loong)", methods, method -> String.format("%s#%s", method.getClassName(), method.getMethodName()), 5);
Can you imagine implementing such a feature, or accepting a pull request that adds it?
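The proposed signature can be sketched with the JDK alone. This is only an illustration of the idea, not the library's code: the class name MappedSearchSketch, the Scored result type and the extra scorer parameter are all hypothetical, and the scorer in main is a placeholder that a real fuzzy ratio would replace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToIntBiFunction;

class MappedSearchSketch {
    // Hypothetical result type: keeps the original object next to its score.
    static final class Scored<T> {
        final T item;
        final int score;

        Scored(T item, int score) {
            this.item = item;
            this.score = score;
        }
    }

    // Scores mapper.apply(choice) against the query and returns the best `limit` objects.
    static <T> List<Scored<T>> extractTop(String query, Collection<T> choices,
                                          Function<T, String> mapper,
                                          ToIntBiFunction<String, String> scorer,
                                          int limit) {
        List<Scored<T>> scored = new ArrayList<>();
        for (T choice : choices) {
            scored.add(new Scored<>(choice, scorer.applyAsInt(query, mapper.apply(choice))));
        }
        // Highest score first; the sort is stable, so ties keep their input order.
        scored.sort(Comparator.comparingInt((Scored<T> s) -> s.score).reversed());
        int end = Math.max(0, Math.min(limit, scored.size()));
        return new ArrayList<>(scored.subList(0, end));
    }

    public static void main(String[] args) {
        // Placeholder scorer for the demo only; a real fuzzy ratio would go here.
        ToIntBiFunction<String, String> exactOnly = (a, b) -> a.equalsIgnoreCase(b) ? 100 : 0;
        List<String[]> methods = Arrays.asList(
                new String[] {"String", "valueOf"},
                new String[] {"Integer", "parseInt"});
        List<Scored<String[]>> top = extractTop("string#valueof", methods,
                m -> m[0] + "#" + m[1], exactOnly, 1);
        System.out.println(top.get(0).item[1] + " scored " + top.get(0).score); // valueOf scored 100
    }
}
```

Because each result carries the original object, the caller can read any other property of the match (such as its URL) without a second lookup.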
When creating the module-info.java file, it generates fuzzywuzzy and says "name unstable". Correcting it to me.xdrop.fuzzywuzzy, as the instructions say, produces the error "module me.xdrop.fuzzywuzzy cannot be resolved to a module".
In the Python fuzzywuzzy, we can set the scorer to use when extracting the result. How can we do that here?
process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
Do we have a Gitter or Discord channel for asking such questions?
Here is the example where I was stuck. The Python implementation gets 22, and the Java implementation gets 23:
fuzz.token_set_ratio(
"Vêndo ou troco por outro carro pode ser atrasado negócio volta ",
"Titan 150 ano 2005 ",
False
)
FuzzySearch.tokenSetRatio(
"Vêndo ou troco por outro carro pode ser atrasado negócio volta",
"Titan 150 ano 2005"
);
Debugging both versions, I found that the difference arises when rounding the value 22.5.
Python code, located in utils.py:
int(round(n))
Java code, located in the SimpleRatio class:
(int) Math.round(100 * DiffUtils.getRatio(s1, s2));
TLDR:
Java: Math.round(22.5) => 23
Python: round(22.5) => 22
Don't know which one is correct for this algorithm...
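The two results differ because of the tie-breaking rule: Java's Math.round rounds a tie towards positive infinity, while Python 3's built-in round rounds a tie to the nearest even integer. A minimal JDK-only sketch of both behaviours follows; the class and method names are illustrative and are not part of either library.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RoundingSketch {
    // Ties go up: this is what Math.round does.
    static int roundHalfUp(double value) {
        return (int) Math.round(value);
    }

    // Ties go to the nearest even integer, like Python 3's round().
    static int roundHalfEven(double value) {
        return BigDecimal.valueOf(value)
                .setScale(0, RoundingMode.HALF_EVEN)
                .intValueExact();
    }

    public static void main(String[] args) {
        System.out.println(roundHalfUp(22.5));   // 23
        System.out.println(roundHalfEven(22.5)); // 22
        System.out.println(roundHalfEven(23.5)); // 24
    }
}
```

Which rule is "correct" is a matter of convention; the sketch only shows how either side could reproduce the other's result.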
When I search for a word that doesn't exist in the data set, it still suggests an (incorrect) match; it cannot detect whether the word is misspelled or not.
ArrayList<String> dataSet = new ArrayList<>();
dataSet.add("Iphone");
dataSet.add("white");
dataSet.add("black");
dataSet.add("Samsung");
dataSet.add("galaxy");
dataSet.add("gallileo");
dataSet.add("galaksi");
dataSet.add("harry");
dataSet.add("potter");
//string to be compared
String[] searchKeyword = new String[] {"hari poter", "smsung glxy", "xiaomi mi2", "jamu godhong telo"};
for(int i=0;i<searchKeyword.length;i++) {
String[] keywords = searchKeyword[i].split(" ");
long start = System.currentTimeMillis();
List<String> checked = new ArrayList<>();
Arrays.asList(keywords).stream().sequential().forEach(keyword ->{
ExtractedResult res = FuzzySearch.extractOne(keyword, dataSet);
checked.add(res.getString());
});
long end = System.currentTimeMillis() - start;
System.out.println(String.format("keyword:%s , spell-checked: %s took:%d", searchKeyword[i], checked, end));
}
Result will be like this
keyword:hari poter , spell-checked: [harry, potter] took:123
keyword:smsung glxy , spell-checked: [Samsung, galaxy] took:6
keyword:xiaomi mi2 , spell-checked: [Iphone, white] took:5
keyword:jamu godhong telo , spell-checked: [Samsung, Iphone, gallileo] took:8
Hi. The ratio between the words "гигантская" and "гигансткая" is 90.
In my opinion, something is wrong here. Or is this a normal result for the library?
Hi,
I want to port your fuzzywuzzy code to Apex (the language of the Salesforce cloud), which has a syntax very similar to Java's. For now I'm only planning to use SimpleRatio and PartialRatio. Am I allowed to do that? I also plan to open-source the result in my own project.
Thank you in advance!
mvn install causes the following test failure on Windows 7 in Git Bash with Java 1.8.0_144:
Running me.xdrop.fuzzywuzzy.algorithms.DefaultStringProcessorTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.288 sec <<< FAILURE!
testProcess(me.xdrop.fuzzywuzzy.algorithms.DefaultStringProcessorTest) Time elapsed: 0.075 sec <<< FAILURE!
junit.framework.ComparisonFailure: expected:<s trim [μεγιουνικουντ] n o n a lph a n um> but was:<s trim [▒ ▒▒ ▒ ▒ ▒ ▒ ▒ ▒▒ ▒ ▒ ▒ ] n o n a lph a n um>
at junit.framework.Assert.assertEquals(Assert.java:100)
at junit.framework.TestCase.assertEquals(TestCase.java:261)
at groovy.util.GroovyTestCase.assertEquals(GroovyTestCase.java:284)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1466)
at org.codehaus.groovy.runtime.callsite.StaticMetaClassSite.callStatic(StaticMetaClassSite.java:65)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:56)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:194)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:214)
at me.xdrop.fuzzywuzzy.algorithms.DefaultStringProcessorTest.testProcess(DefaultStringProcessorTest.groovy:9)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:252)
at junit.framework.TestSuite.run(TestSuite.java:247)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:86)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Hello and thank you for publishing this awesome library! I had a question for you regarding the licensing. I wrote a collection of UDFs for Apache Drill that essentially is a wrapper for your library and would like to submit it to Drill, however the GPL license is not compatible with the Apache license.
Would you consider re-releasing this under a different license so that it could be included in a future release of Drill? (https://www.apache.org/legal/resolved.html#category-x)
Thanks!
-- Charles
Thanks for creating this Java API. It is really useful.
But I am facing one issue: I need to match some addresses against a big address list (6000+ records). I am using the extractOne method.
It works perfectly if a similar address is in the list, and gives a correct score (87%-100%).
But if it doesn't find a good match, it always gives me an 86% match, even when both addresses are totally different.
Example:
Addr 1 - HUNTINGTON NATIONAL BANK 328 SOUTH SAGINAW ST FLINT MI 48502
It matches to - BANK OF WEST PO BOX 2000 OMAHA NE 68103
and gives a score of 86%
I saw that there was a new version 1.3.4, so I used it, but I think that underscore-handling issue is not fixed - all the examples now return 100...
Here is how I run them in Java:
System.out.println("expected 58 -> got " + FuzzySearch.tokenSetPartialRatio("worm_mikeala", "mikeala rath"));
System.out.println("expected 80 -> got " + FuzzySearch.tokenSetPartialRatio("c_wasyluka", "crystal wasyluka"));
System.out.println( "expected 78 -> got " + FuzzySearch.tokenSetPartialRatio("a_bacdefg", "crystal bacdefg"));
I get:
expected 58 -> got 100
expected 80 -> got 100
expected 78 -> got 100
and here is how I run them in Python:
from fuzzywuzzy import fuzz
if __name__ == '__main__':
    print(fuzz.partial_token_set_ratio("worm_mikeala", "mikeala rath"))
    print(fuzz.partial_token_set_ratio("c_wasyluka", "crystal wasyluka"))
    print(fuzz.partial_token_set_ratio("a_bacdefg", "crystal bacdefg"))
I get:
58
80
78
Am I doing something wrong or is there still an issue?
I see different results from extractOne and extractTop for the same query string and collection.
I have a pretty long collection (15k strings) to search for each query.
For instance, take the following scenario:
Query - ABC 1721
The collection has following strings in it
ABC1721
ABC1721-FGH/L9
ABC MERAKI Z1
EFGD3111/Z1-ABC
and many more
extractOne("ABC 1721", collection)
gives - ABC1721, Ratio - 95
extractTop("ABC 1721", collection,1)
gives - ABC1721, Ratio - 95
but the problem arises when I want the top 5 results
extractTop("ABC 1721", collection,5)
Match 1 - ABC1721-FGH/L9, Ratio - 86
Match 2 - ABC MERAKI Z1, Ratio - 86
Match 3 - EFGD3111/Z1-ABC, Ratio - 86
and so on
I tried extractSorted as well; its results are not consistent with extractOne either.
I used extractTop (for the top 5) and extractOne for 1000+ queries. For around 70% of them, the first match from extractTop doesn't match the result of extractOne.
BTW, I appreciate your efforts in porting the Python logic to Java without any performance lag.
levEditDistance("sf&t co., ltd.","sft",1) returns 13 when the distance is actually 11.
Apache Commons StringUtils.getLevenshteinDistance gives the correct result.
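The expected value of 11 can be checked independently: "sft" is a subsequence of the 14-character string, so deleting the other 11 characters is enough, and nothing shorter is possible because the lengths differ by 11. The sketch below is a plain unit-cost Levenshtein distance written for illustration (class name hypothetical); it is neither the library's levEditDistance nor the Apache Commons implementation.

```java
class LevenshteinSketch {
    // Classic edit distance where insert, delete and substitute each cost 1.
    // Uses two rolling rows instead of the full matrix.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + substitution);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("sf&t co., ltd.", "sft")); // 11
    }
}
```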
Hi,
I am using 1.4.0, and this gives a wrong result with partial ratio:
FuzzySearch.partialRatio("ttttttttt virtuale ggggggggggggvo zizzrztuta mmmmmle", "virtuale");
The score is 50; it should be 100, IMHO.
The Python version returns 100 too:
>>> fuzz.partial_ratio("ttttttttt virtuale ggggggggggggvo zizzrztuta mmmmmle", "virtuale")
100
Thanks for the help
I am very thankful to the contributors for this Java fuzzy match library with the most popular matching algorithms.
Is there a GitHub security scanning performed on this project? I did not observe a scanning policy under the security page but understand there are multiple options to implement scanning where that policy may not exist.
It would be useful to also get the index of the matched item for each match in the result list.
Example
FuzzySearch.extractTop("goolge", ["google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl"], 3)
[(string: google, score:83, index:0), (string: googleplus, score:63, index:5), (string: plexoogl, score:43, index:7)]
When I try to use FuzzySearch.weightedRatio("lupa","pupa") on Android 5.1, I get an ExceptionInInitializerError. I include the library via Gradle: compile 'me.xdrop:fuzzywuzzy:1.1.5'. Do you have any idea why?
P.S.: It works fine on all Android 4.* versions.
When comparing strings, the strings' capitalization affects the value returned. It appears this library is case sensitive. What are the parameters for CAPS vs lowercase? How much does the value decrease if a text such as "fuzzywuzzy" was matched with "FuZzYwUzZy" vs "fuzzywuzzy"?
Very curious!
When I calculate the ratio of "abcdef" - "fedcba" , it results in 17, even though I expected 0.
The ratio calculation, as I understand it, is r = (1 - d/L)*100,
with d being the Levenshtein distance and L the sum of the lengths of the two compared strings.
In this library the levenshtein distance is valued with 1 for each insert/delete and 2 for each replace.
The levenshtein distance in this library, for these two strings should be 12 (2 for each replace), resulting in a ratio = (1 - 12/12)*100 = 0
However, in your library, the ratio results in 17, instead of 0. This is because the distance it calculates is 10 instead of 12, resulting in (1-10/12)*100=17 .
This seems to be the case for strings of any length with 100% replacements, as if one replacement is missed.
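Under the cost model described above (insert/delete cost 1, replace cost 2), a replacement is never cheaper than a delete plus an insert, so the distance equals the combined length minus twice the length of the longest common subsequence. "abcdef" and "fedcba" always share at least one character in order, so one character can be kept and d is 10, not 12, which gives 17. The following sketch works that out under those stated assumptions; the names are illustrative and this is not the library's code.

```java
class IndelRatioSketch {
    // Length of the longest common subsequence (standard dynamic programme).
    static int lcs(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    // Distance with insert/delete = 1 and replace = 2.
    static int distance(String a, String b) {
        return a.length() + b.length() - 2 * lcs(a, b);
    }

    // r = (1 - d/L) * 100, rounded to the nearest integer.
    static int ratio(String a, String b) {
        int total = a.length() + b.length();
        if (total == 0) {
            return 100; // arbitrary choice for two empty strings in this sketch
        }
        return (int) Math.round(100.0 * (total - distance(a, b)) / total);
    }

    public static void main(String[] args) {
        System.out.println(distance("abcdef", "fedcba")); // 10
        System.out.println(ratio("abcdef", "fedcba"));    // 17
    }
}
```

A ratio of 0 under this model would need two strings with no character in common at all.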
As mentioned before in #29:
java.lang.StringIndexOutOfBoundsException: String index out of range: 49
at java.lang.String.substring(String.java:1963)
at com.xdrop.fuzzywuzzy.ratios.PartialRatio.apply(PartialRatio.java:43)
at com.xdrop.fuzzywuzzy.FuzzySearch.partialRatio(FuzzySearch.java:45)
test case:
FuzzySearch.partialRatio("pros holdings, inc.","settlement facility dow corning trust")
Hi, First of all, thanks @xdrop for work on this project.
I have a Spring boot Webflux project and I need to do a fuzzy search on one of the fields. I am using in-memory loading, as soon as my Application starts, I would load the fuzzy search list data in the respective list. On subsequent API calls
After reading the API docs, I have two approaches in mind.
Use the list of string keys in a variable and a map of keys to the equivalent object in another variable. Fuzzy search using the list of keys. When I get the response back, map the key to the object and return
data class WeatherData(val key: String, val region: String)
// Service function for getting fuzzy search extracted Result
@Component
class FuzzySearchClient(val keys: MutableList<String>, val keysToWeatherDataMap: MutableMap<String, WeatherData> = mutableMapOf()) {
fun fuzzySearchInMemory(query: String): Mono<List<SearchResponse>> {
val result: List<ExtractedResult> = FuzzySearch.extractTop(query, keys, 5)
val searchList: List<SearchResponse> = result.map { extractedResult: ExtractedResult ->
val WeatherData = keysToWeatherDataMap[extractedResult.string]
SearchResponse(WeatherData?.key!!, WeatherData.region!!)
}
return Mono.just(searchList)
}
}
Function for adding keys in memory takes ~6s with approach one
@Component
class LoadDataInMemoryCache(
private val weatherDataRepository: WeatherDataRepository,
private val searchClient: FuzzySearchClient
) {
private val logger = KotlinLogging.logger {}
@EventListener(ApplicationReadyEvent::class)
fun loadData() {
val startTime = AtomicReference<Long>()
weatherDataRepository.findAll()
.doOnSubscribe { startTime.set(System.nanoTime()) }
.doFinally { logger.info("Time taken for adding data in memory ${TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime.get())} milliseconds.") }
.subscribe {
searchClient.keys.add(WeatherData(it?.key!!, it.region!!))
}
}
}
Use weather object keys and define ToStringFunction and get the result and map to the appropriate response.
data class WeatherData(val key: String, val region: String)
data class SearchResponse(val key: String, val region: String)
class WeatherSearchToStringFunction: ToStringFunction<WeatherData> {
override fun apply(item: WeatherData?): String {
return item?.key!!
}
}
@Component
class SearchClient(val keys: MutableList<WeatherData>) {
fun fuzzySearchInMemory(query: String): Mono<List<SearchResponse>> {
val result: MutableList<BoundExtractedResult<WeatherData>> = FuzzySearch.extractTop(query, keys, WeatherSearchToStringFunction(), 5)
val searchList: List<SearchResponse> = result.map { extractedResult: BoundExtractedResult<WeatherData> ->
SearchResponse(extractedResult.referent?.key!!, extractedResult.referent.region)
}
return Mono.just(searchList)
}
}
I am not sure which approach to choose. Suggestions are welcome.
I'm not familiar with the Python version of this; I gather from the readme that there are several different method calls that do different things with matching. I've found:
ratio
partialRatio
various tokenX methods
weightedRatio
various extractX methods
I generated the javadoc, but that didn't explain what these different methods do. I think the fuzzy matching could be very useful in what I'm doing, but just using ratio is a bit limiting, and I don't know what the other ones do. Is there documentation of what these things mean and do somewhere?
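As a rough illustration of what the token-based methods build on: in the Python project, token_sort_ratio sorts the whitespace-separated tokens of each string before comparing them, so that word order stops mattering. The sketch below shows only that normalisation step, with illustrative names; the comparison itself is left out, and the Java port may differ in detail.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

class TokenSortSketch {
    // Lowercases, splits on whitespace, sorts the tokens and joins them again.
    static String sortTokens(String s) {
        return Arrays.stream(s.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .sorted()
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Both inputs print the same line: atlanta braves mets new vs york
        System.out.println(sortTokens("new york mets vs atlanta braves"));
        System.out.println(sortTokens("atlanta braves vs new york mets"));
    }
}
```

Two strings that contain the same words in a different order become identical after this step, so a plain ratio on the sorted forms scores them as a full match.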
Great Library and great work @xdrop ! Thank you so much for creating something that works for Android and sharing it! This library does something I have no idea how to do and would take me countless hours to create :D
Initially, I could not figure out a way to use the library. I have a SQLite database full of text values that I want to search. Unfortunately, neither SQL nor this library has an interface to do fuzzy search without a Full Table Search. Thankfully, I found a great workaround that uses my current dependencies.
This library works great with FlexibleAdapter (https://github.com/davideas/FlexibleAdapter). FlexibleAdapter has a builtin Async filtering mechanism that is extremely fast. Using the code below, I am able to filter my entire listview smoothly and with animations!
@Override
public boolean filter(String constraint) {
    // Lowercase both sides once so the fuzzy score and the substring check agree on case.
    String lowerTitle = title.toLowerCase();
    String lowerConstraint = constraint.toLowerCase();
    int fuzzyRatio = FuzzySearch.partialRatio(lowerTitle, lowerConstraint);
    Log.d("Fuzzy Search Ratio", String.valueOf(fuzzyRatio));
    return fuzzyRatio >= 70 || lowerTitle.trim().contains(lowerConstraint);
}
I find that 70 is a really good value when using partial ratio.
Thanks to this library, I can provide an experience rivaling Google and Facebook! 🥇
Hi, looks like the 1.2.0 on http://maven.org/ is still GPLv3. Could you please create a new release with the new license (GPLv2)?
Thanks!
Upon using a string that has only non-alphanumeric characters (e.g. "$#"), the basic algorithm throws the following exception:
java.lang.ArithmeticException: / by zero
at me.xdrop.fuzzywuzzy.algorithms.WeightedRatio.apply(WeightedRatio.java:32)
at me.xdrop.fuzzywuzzy.algorithms.BasicAlgorithm.apply(BasicAlgorithm.java:22)
at me.xdrop.fuzzywuzzy.Extractor.extractWithoutOrder(Extractor.java:43)
at me.xdrop.fuzzywuzzy.Extractor.extractTop(Extractor.java:100)
I believe this results from the string processor replacing the characters with spaces and then trimming, which leaves a string of length zero.
I am trying to use this library for my android studio project.
But I am facing this issue.
Could not find me.xdrop:fuzzywuzzy:1.3.0.
Required by:
project :app
Search in build.gradle files
Can someone help with this?
Thanks
Following up on #29, it would make sense to add StringProcessor overloads for the simple/partial ratios as well, so that they are consistent with the rest.
Hi, I am using this library on a small data set of 10k records, but for some strings I am getting results in the wrong order.
For the query "Visa" and the list of choices:
choices = ["grupo televisa s.a.", "is", "sa", "visa inc.", "via"]
// result
('grupo televisa s.a.', 90), ('is', 90), ('sa', 90), ('visa inc.', 90)
I want the Visa string to appear in first place. How can I achieve that?
FuzzySearch.ratio("csr", "c s r") returns 50.
The expected value is 75: (8-2)/8.
I just noticed a difference in the results of extractOne between the Python and Java version.
My query is 19 craven park harlesden and my choices are ["NW10 8SU", "19 Craven Park, Harlesden", "Steven Gerrard"].
In the Python version, the following code:
process.extractOne(query, choices, scorer=fuzz.ratio)
produces:
('19 Craven Park, Harlesden', 98)
In the Java version, the following code:
ExtractedResult result = FuzzySearch.extractOne(query, choices, new SimpleRatio());
matches 19 Craven Park, Harlesden but with a score of 86 instead.
I dug a bit deeper into this and found that you can get 86 by doing a direct ratio comparison in the Python version:
fuzz.ratio("19 Craven Park, Harlesden", "19 craven park harlesden")
gives 86.
However, the extractOne function in Python first processes the string by calling full_process in utils.py before calling the ratio function. From the results of the Java version, it seems it is not processing the string in the same way before calling SimpleRatio().
It's either this or I am making some mistake in calling the function. Could you please shed some light on this.
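A rough approximation of that kind of preprocessing (lowercase, replace non-alphanumeric characters with spaces, trim) can be written with the JDK alone. The regular expression below is an assumption and may differ from full_process in edge cases such as underscores; the class and method names are illustrative.

```java
import java.util.Locale;

class PreprocessSketch {
    // Replaces every character that is not a letter or digit with a space,
    // then lowercases and trims the result.
    static String normalise(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]", " ")
                .toLowerCase(Locale.ROOT)
                .trim();
    }

    public static void main(String[] args) {
        // The comma becomes a space, so two spaces remain before "harlesden".
        System.out.println("[" + normalise("19 Craven Park, Harlesden") + "]");
        // prints [19 craven park  harlesden]
    }
}
```

After this step the choice differs from the query by a single extra space, which is consistent with the score of 98 reported for the Python extractOne call. Normalising both strings before calling the scorer is one way to make the two versions comparable.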
Hello,
In #35 you've noted that "this is a rewrite of https://github.com/seatgeek/fuzzywuzzy, which forces this to be licensed under the same license (GPL) as the original library."
The Python package is licensed under GPL-2.0 without clarification if it's GPL-2.0-or-later or GPL-2.0-only, and some implication in the commit message and the timing of when the Python project was relicensed from MIT to GPL-2.0 that it was probably meant to be GPL-2.0-only.
This port has a GPL-3.0 license file.
Was it your intention to license this project under GPL-2.0 to match the license of the original project? If so, would you have any objection to taking the GPL-2.0 license text instead of GPL-3.0?
Thanks!
Hi,
I noticed when testing the values outputted from the Java implementation that given:
s1 = "haeagen dazs"
s2 = "liverpool altabrisa"
The Java implementation for PartialScore outputs 25, while the Python implementation (fuzz.partial_ratio(s1,s2)) outputs 29. I wanted to report this discrepancy and was wondering if anyone knows the cause (maybe rounding issues?).
There's an MIT-licensed version in Python.
Can we have the same for Java?
The license is the biggest issue I and 90% of other developers are facing,
and the worst thing is that there is no alternative library in Java with even the bare minimum performance of this library.
I've searched everywhere.
A Levenshtein distance port for Java is available, but it performs very poorly for the use case of matching a user's input (2-3 chars) against a list of strings,
e.g. matching "sai" against school names.
I got this error while calling
FuzzySearch.tokenSortRatio(stringA, stringB) + FuzzySearch.tokenSetRatio(stringA, stringB)
stackTrace: java.lang.RuntimeException: java.lang.NoClassDefFoundError: me/xdrop/fuzzywuzzy/FuzzySearch
I imported this library as a gradle dependency
implementation 'me.xdrop:fuzzywuzzy:1.3.1'
It doesn't look like an issue caused by transitive dependency.
./gradlew dependencies
+--- com.jayway.jsonpath:json-path:2.4.0 (*)
+--- me.xdrop:fuzzywuzzy:1.3.1
Hello,
I'm trying to use v 1.3.0 but I'm facing the following error
Could not find me.xdrop:fuzzywuzzy:1.3.0.
Searched in the following locations:
- https://repo.maven.apache.org/maven2/me/xdrop/fuzzywuzzy/1.3.0/fuzzywuzzy-1.3.0.pom
- https://jcenter.bintray.com/me/xdrop/fuzzywuzzy/1.3.0/fuzzywuzzy-1.3.0.pom
Possible solution:
- Declare repository providing the artifact, see the documentation at https://docs.gradle.org/current/userguide/declaring_repositories.html
I can't find the .pom file in the following directories
https://repo.jfrog.org/artifactory/libs-release-bintray/me/xdrop/fuzzywuzzy/1.3.0/
https://repo.maven.apache.org/maven2/me/xdrop/fuzzywuzzy/1.3.0/
am I missing something?
I have the following repositories defined in my build.gradle
repositories {
jcenter()
mavenCentral()
}
FuzzySearch.partialRatio("chicago transit authority" , "cta") expected value=67
The actual value is 33.
FuzzySearch.partialRatio("kaution", "kdeffxxxiban:de1110010060046666666datum:16.11.17zeit:01:12uft0000899999tan076601testd.-20-maisonette-z4-jobas-hagkautionauszug");
Result is "57", I expect "100".
Using 1.1.9.
Thank you for this awesome library; I am using it in my Android project. It is taking a lot of time, as I pass in an array list of strings for comparison and it is called each time the user enters a new character.
Is there any way I can improve its performance?
Hi, while porting some python code to java I discovered that the Token Sort and Token Set Ratios calculated by this library oftentimes do not match the ones calculated by the python fuzzywuzzy library.
Here is an example:
Python Code:
from fuzzywuzzy import fuzz
print(str(fuzz.token_sort_ratio("efwe fwef","wef wefwef")))
print(str(fuzz.token_set_ratio("efwe fwef","wef wefwef")))
Output:
53
53
Java Code:
import me.xdrop.fuzzywuzzy.FuzzySearch;
public class Main {
public static void main(String[] args) {
System.out.println(FuzzySearch.tokenSortRatio("efwe fwef","wef wefwef"));
System.out.println(FuzzySearch.tokenSetRatio("efwe fwef","wef wefwef"));
}
}
Output:
84
84
Where is this difference coming from? Shouldn't these two outputs be equal?
I'm trying to compare long strings with partialRatio and I always get java.lang.OutOfMemoryError: Java heap space
FuzzySearch.partialRatio("ola middle school", "henry county board of education")=29
FuzzySearch.partialRatio("henry county board of education", "ola middle school")=35
Shouldn't they be the same?
The FuzzySearch.tokenSetPartialRatio() method returns different results than the Python version for strings that contain an underscore.
Examples: