
Comments (11)

maxbachmann commented on May 18, 2024

python-Levenshtein cannot be used in an MIT-licensed library, which is why FuzzyWuzzy is licensed under the GPL.
The MIT-licensed version uses only difflib (slow), but it is still a bit questionable, since it is based on a FuzzyWuzzy version that was already GPL-licensed (I do not think Seatgeek cares, since they originally released FuzzyWuzzy under the MIT license).
I wrote a faster alternative implementation for C++/Python (https://github.com/maxbachmann/RapidFuzz). This implementation could be ported to Java. However, since I am not very familiar with Java, someone else would have to maintain the port (I am willing to help with questions regarding the algorithms).

from fuzzywuzzy.

Chase22 commented on May 18, 2024

@maxbachmann One could also look into calling the C code directly through the Java Native Interface (JNI) instead of porting it. I don't have much experience with what that means performance-wise, though.


maxbachmann commented on May 18, 2024

@Chase22 I already do something similar for Python. For short strings combined with a fast similarity metric, the call overhead has a noticeable performance impact. However, the main reason for this is that Python calls functions with a list of arguments and a hashmap of named arguments, which have to be parsed on each call.

I can think of the following advantages/disadvantages of the JNI:

+ Probably less maintenance, since it reuses a large part of the code. Note, however, that in Python the wrapper that calls the C++ code is actually much bigger than the library itself (partially because much of the code is generated). The C++ library has around 5k lines of code, while the wrapper has over 50k lines of code.

- I guess it would have to be compiled for each platform, which can be a pain.

-/+ Performance-wise I am unsure as well. The JNI might add noticeable overhead (e.g. if all strings have to be copied). However, the algorithms make heavy use of bitwise operations, which might be slower in pure Java. So this could go either way.
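
To give a rough idea of what the bitwise style looks like in pure Java, here is a minimal sketch of a bit-parallel Levenshtein distance in the Myers/Hyyrö style. It is an illustration only, not code taken from RapidFuzz: the class name `BitParallelLevenshtein` and the method `distance` are made up for this example, it only handles patterns of up to 64 UTF-16 code units (one `long`), and it says nothing about how fast Java actually runs it compared to C++.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, assumed for illustration; not part of any existing library.
final class BitParallelLevenshtein {
    private BitParallelLevenshtein() {}

    // Levenshtein distance between pattern and text, pattern limited to 64 chars.
    static int distance(String pattern, String text) {
        int m = pattern.length();
        if (m == 0) {
            return text.length();
        }
        if (m > 64) {
            throw new IllegalArgumentException("pattern longer than 64 chars");
        }

        // For each character, a bitmask of the positions where it occurs in pattern.
        Map<Character, Long> masks = new HashMap<>();
        for (int i = 0; i < m; i++) {
            masks.merge(pattern.charAt(i), 1L << i, (a, b) -> a | b);
        }

        long vp = ~0L;              // vertical positive deltas (+1)
        long vn = 0L;               // vertical negative deltas (-1)
        long last = 1L << (m - 1);  // bit of the last pattern row
        int dist = m;

        for (int j = 0; j < text.length(); j++) {
            long eq = masks.getOrDefault(text.charAt(j), 0L);
            // Java long addition wraps on overflow, which is what the carry trick needs.
            long d0 = (((eq & vp) + vp) ^ vp) | eq | vn;
            long hp = vn | ~(d0 | vp);  // horizontal positive deltas
            long hn = d0 & vp;          // horizontal negative deltas
            if ((hp & last) != 0) {
                dist++;
            }
            if ((hn & last) != 0) {
                dist--;
            }
            hp = (hp << 1) | 1L;
            hn <<= 1;
            vp = hn | ~(d0 | hp);
            vn = hp & d0;
        }
        return dist;
    }
}
```

Each text character costs a handful of 64-bit operations regardless of the pattern length, so whether the JVM compiles these as well as a C++ compiler does would have to be measured. A call such as `BitParallelLevenshtein.distance("kitten", "sitting")` returns the edit distance between the two strings.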

