Code Monkey home page Code Monkey logo

js-levenshtein's Introduction

js-levenshtein Build Status

A very efficient JS implementation calculating the Levenshtein distance, i.e. the difference between two strings.

Based on Wagner-Fischer dynamic programming algorithm, optimized for speed and memory

  • use a single distance vector instead of a matrix
  • loop unrolling on the outer loop
  • remove common prefixes/postfixes from the calculation
  • minimize the number of comparisons
  • always allocate a new distance vector in order to not leak memory

Install

$ npm install --save js-levenshtein

Usage

const levenshtein = require('js-levenshtein');

levenshtein('kitten', 'sitting');
//=> 3

Benchmark

$ npm run bench
  
                      50 paragraphs, length max=500 min=240 avr=372.5
             162 op/s » js-levenshtein
              98 op/s » talisman
              94 op/s » levenshtein-edit-distance
              85 op/s » leven
              39 op/s » fast-levenshtein

                      100 sentences, length max=170 min=6 avr=57.5
           3,076 op/s » js-levenshtein
           2,024 op/s » talisman
           1,817 op/s » levenshtein-edit-distance
           1,633 op/s » leven
             800 op/s » fast-levenshtein

                      2000 words, length max=20 min=3 avr=9.5
           3,119 op/s » js-levenshtein
           2,416 op/s » talisman
           2,141 op/s » levenshtein-edit-distance
           1,855 op/s » leven
           1,260 op/s » fast-levenshtein

Benchmarks was performed with node v8.12.0 on a MacBook Pro 15", 2.9 GHz Intel Core i9

License

MIT © Gustaf Andersson

js-levenshtein's People

Contributors

georgyberdyshev avatar gustf avatar mvasilkov avatar tech6hutch avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

js-levenshtein's Issues

precomputing against a list of strings

Perhaps not with this library specifically, but would you happen to know if it's possible to in essence "pre-compute" part of the levenshtein distance if we know what one argument will be? eg. I have a fixed set of strings, and the other input is always unknown.

That could nudge the performance even further I presume.

How to use it to search for a substring(or nearly it) in a large blob of text?

For example in the following text:

Easier to reason about useMemo can make your code easier to reason about by making it clear which values are being calculated and when.
Follow-up questions
What are some other ways to improve the performance of React applications?

I want to locate and split by sting like : Follow-up questions, Follow-up questions:, Follow-up question, Followup questions , Follow-up questions=, follow-up questions:= .
I know searching by regex is an option by I am looking to explore levenshtein to keep room for an unexpected letter.

Case Insensitive Distance

Have you considered offering a case-insensitive version of the levenshtein distance algorithm?

Would be useful for folks performing fuzzy searches.

TypeScript typings

This project doesn't have a typings file, which makes it hard to use in a TypeScript project. I didn't find any under @types, either.

Question

hey I got a question, does it count different cases? like "something" and "SOMEthing" won't be 100%?

Increase compatibility

This library works fast, but its compatibility is very limited. Using a few lines like in old fast-levenshtein allows the algorithm to be used with AMD, requireJS, Web workers and commonJs.

Early stop when maximum distance reached [feature request]

Hello,

Thank you for this implementation. It is indeed very fast!

I am trying to optimize the algorithm for the specific (but rather common) case where we look for only close items. Stopping early when we know that the distance will certainly be over a given maximum distance can make the algorithm much faster.

I get already a good speed improvement (x2) by stopping early at the very beginning here and/or here with something like:

if (typeof max === 'number' && lb - la >= max) return max;

But when I try to stop early in the for loops below, by keeping the minimum value of vector and stopping if min + lb - la > max, it works but gets slower ;(

Do you have any idea of how to implement this feature in an optimized way?
The best would be of course to support this option directly in your code ;) but any advice is welcome!

Thank you very much for your help.

Cheers

Optimize for long strings

I was thinking if var vector = new Array(2 * la) would help instead of var vector = [];? In theory arrays are created with predefined size and they are reallocated if that size is exceeded (with some new limit estimate). I don't know how the browsers optimize it today, but surely the length constructor exists for a reason.

Improve Code Readability

I'm going to try my hand at this if I can wrap my head around the code. As it stands, it's fairly hard to understand what on earth is going on.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.