python-levenshtein's Introduction

Maintainer wanted

I am looking for a new maintainer for the project, as it is apparent that I haven't needed this particular library for well over 7 years now, due to it being a C-only library and its somewhat restrictive original license.

The Levenshtein Python C extension module contains functions for fast computation of

  • Levenshtein (edit) distance, and edit operations
  • string similarity
  • approximate median strings, and generally string averaging
  • string sequence and set similarity

It supports both normal and Unicode strings.
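The edit distance listed above is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. As an illustration only, here is a minimal pure-Python sketch of the classic dynamic-programming computation; it is not the C extension's code, and the name levenshtein_distance is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b (illustrative sketch only)."""
    # Iterate over the longer string so each row is sized by the shorter one.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
```

With the extension installed, Levenshtein.distance("kitten", "sitting") should return the same value, 3, much faster.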

Python 2.2 or newer is required; Python 3 is supported.

StringMatcher.py is an example SequenceMatcher-like class built on top of Levenshtein. It lacks some of SequenceMatcher's functionality but, on the other hand, adds some extras.

Levenshtein.c can be used as a pure C library, too. You only have to define the NO_PYTHON preprocessor symbol (-DNO_PYTHON) when compiling it. The functionality is similar to that of the Python extension. No separate documentation is provided yet; read the source. However, the two builds are not interchangeable:

  • C functions exported when compiling with -DNO_PYTHON (see Levenshtein.h) are not exported when compiling as a Python extension (and vice versa)
  • the Unicode character type used with -DNO_PYTHON is wchar_t, while the Python extension uses Py_UNICODE; they may be the same, but don't count on it

Install from PyPI with:

pip install python-Levenshtein

gendoc.sh generates HTML API documentation. You probably want a self-contained rather than an includable version, so run ./gendoc.sh --selfcontained. It requires Levenshtein to be installed already, as well as genextdoc.py.

Levenshtein is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

See the file COPYING for the full text of GNU General Public License version 2.

This package was long missing from the Python Package Index and available as source checkout only, but can now be found on PyPI again.

We needed to restore this package for the Go Mobile for Plone and Pywurfl projects, which depend on it.

  • Maintainer: Antti Haapala <[email protected]>
  • Python 3 compatibility: Esa Määttä
  • Jonatas CD: Fixed documentation generation
  • Previous maintainer: Mikko Ohtamaa
  • Original code: David Necas (Yeti) <yeti at physics.muni.cz>

python-levenshtein's People

Contributors

felixonmars, joncasdam, kerstin, miohtama, ojomio, sandrotosi, stromnov, timworx, wor, ztane

python-levenshtein's Issues

Crashes python interpreter in seq* with simple arguments

Hello,
I'm forwarding Debian bug #597609:

$ gdb python
GNU gdb (GDB) 7.6 (Debian 7.6-5)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/python2.7...Reading symbols from /usr/lib/debug/usr/bin/python2.7...done.
done.
(gdb) run
Starting program: /usr/bin/python2.7 
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 2.7.5+ (default, Sep 17 2013, 15:31:50) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Levenshtein
>>> Levenshtein.seqratio("hallo", "bla")

Program received signal SIGSEGV, Segmentation fault.
extract_stringlist (list='hallo', name=name@entry=0x7ffff64b2c00 "seqratio", n=n@entry=5, sizelist=sizelist@entry=0x7fffffffdd90, strlist=strlist@entry=0x7fffffffdd80) at Levenshtein.c:1166
1166    Levenshtein.c: No such file or directory.
(gdb) bt full
#0  extract_stringlist (list='hallo', name=name@entry=0x7ffff64b2c00 "seqratio", n=n@entry=5, sizelist=sizelist@entry=0x7fffffffdd90, strlist=strlist@entry=0x7fffffffdd80)
    at Levenshtein.c:1166
        i = <optimized out>
        first = <unknown at remote 0x-2bd86a95779d78df>
#1  0x00007ffff64ac59e in setseq_common (args=<optimized out>, name=name@entry=0x7ffff64b2c00 "seqratio", foo=..., lensum=lensum@entry=0x7fffffffddf8) at Levenshtein.c:1319
        n1 = 5
        n2 = 3
        strings1 = 0x0
        strings2 = 0x0
        sizes1 = 0x0
        sizes2 = 0x0
        strlist1 = 'hallo'
        strlist2 = 'bla'
        strseq1 = <optimized out>
        strseq2 = ['b', 'l', 'a']
        stringtype1 = <optimized out>
        stringtype2 = <optimized out>
        r = -1
#2  0x00007ffff64b0676 in seqratio_py (self=<optimized out>, args=<optimized out>) at Levenshtein.c:1251
        lensum = 8
        r = <optimized out>
#3  0x0000000000529e45 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffdf00) at ../Python/ceval.c:4021
        flags = <optimized out>
        tstate = 0x9410a0
        func = <built-in function seqratio>
        w = <optimized out>
        na = <optimized out>
        nk = <optimized out>
        n = <optimized out>
        pfunc = 0xa2ed08
        x = <optimized out>
#4  PyEval_EvalFrameEx (f=f@entry=Frame 0xa2eb90, for file <stdin>, line 1, in <module> (), throwflag=throwflag@entry=0) at ../Python/ceval.c:2666
        sp = 0xa2ed10
        stack_pointer = <optimized out>
        next_instr = 0x7ffff7ee5763 "Fd\002"
        opcode = <optimized out>
        oparg = <optimized out>
        why = WHY_NOT
        err = <optimized out>
        x = <optimized out>
        v = <optimized out>
        w = <optimized out>
        u = <optimized out>
        t = <optimized out>
        stream = 0x0
        fastlocals = 0xa2ed08
        freevars = <optimized out>
        retval = <optimized out>
        tstate = <optimized out>
        co = <optimized out>
        instr_ub = -1
        instr_lb = 0
        instr_prev = -1
        first_instr = <optimized out>
        names = <optimized out>
        consts = <optimized out>
        enter = '__enter__'
        exit = '__exit__'
#5  0x00000000004c6544 in PyEval_EvalCodeEx (closure=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=<optimized out>, globals=<optimized out>, co=0x7ffff7f371b0)
    at ../Python/ceval.c:3253
        retval = 0x0
        fastlocals = 0xa2ed08
        freevars = 0xa2ed08
        u = <optimized out>
        f = Frame 0xa2eb90, for file <stdin>, line 1, in <module> ()
        tstate = 0x9410a0
        x = <optimized out>
#6  PyEval_EvalCode (locals=<optimized out>, globals=<optimized out>, co=0x7ffff7f371b0) at ../Python/ceval.c:667
No locals.
#7  run_mod.42568 (mod=mod@entry=0xa2cce0, filename=filename@entry=0x5bb2b5 "<stdin>", globals=<optimized out>, locals=<optimized out>, flags=flags@entry=0x7fffffffe0c0, 
    arena=arena@entry=0x9ab2d0) at ../Python/pythonrun.c:1365
        co = 0x7ffff7f371b0
#8  0x000000000043407e in PyRun_InteractiveOneFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", flags=flags@entry=0x7fffffffe0c0)
    at ../Python/pythonrun.c:852
        m = <optimized out>
        d = <optimized out>
        v = '>>> '
        w = '... '
        mod = 0xa2cce0
        arena = 0x9ab2d0
        ps1 = <optimized out>
        ps2 = 0x7ffff7eed3b4 "... "
        errcode = 0
#9  0x000000000043419a in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", flags=flags@entry=0x7fffffffe0c0)
    at ../Python/pythonrun.c:772
        v = <optimized out>
        ret = <optimized out>
        local_flags = {cf_flags = 0}
#10 0x000000000043484f in PyRun_AnyFileExFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", closeit=closeit@entry=0, 
    flags=flags@entry=0x7fffffffe0c0) at ../Python/pythonrun.c:741
        err = <optimized out>
#11 0x00000000004353e3 in Py_Main (argc=<optimized out>, argv=0x7fffffffe278) at ../Modules/main.c:640
        c = <optimized out>
        sts = <optimized out>
        command = 0x0
        filename = 0x0
        module = 0x0
        fp = 0x7ffff729d240 <_IO_2_1_stdin_>
        p = <optimized out>
        unbuffered = <optimized out>
        skipfirstline = <optimized out>
        stdin_is_interactive = 1
        help = <optimized out>
        version = <optimized out>
        saw_unbuffered_flag = <optimized out>
        cf = {cf_flags = 0}
#12 0x00007ffff6f17995 in __libc_start_main (main=0x4354a1 <main>, argc=1, ubp_av=0x7fffffffe278, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffe268) at libc-start.c:260
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -2830849066025673389, 5720549, 140737488347760, 0, 0, 2830849065964215635, 2830829193398394195}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x5b8e60 <__libc_csu_init>, 0x7fffffffe278}, data = {prev = 0x0, cleanup = 0x0, canceltype = 6000224}}}
        not_first_call = <optimized out>
#13 0x0000000000574a0e in _start ()
No symbol table info available.
(gdb) thread apply all backtrace

Thread 1 (Thread 0x7ffff7fc1700 (LWP 8108)):
#0  extract_stringlist (list='hallo', name=name@entry=0x7ffff64b2c00 "seqratio", n=n@entry=5, sizelist=sizelist@entry=0x7fffffffdd90, strlist=strlist@entry=0x7fffffffdd80)
    at Levenshtein.c:1166
#1  0x00007ffff64ac59e in setseq_common (args=<optimized out>, name=name@entry=0x7ffff64b2c00 "seqratio", foo=..., lensum=lensum@entry=0x7fffffffddf8) at Levenshtein.c:1319
#2  0x00007ffff64b0676 in seqratio_py (self=<optimized out>, args=<optimized out>) at Levenshtein.c:1251
#3  0x0000000000529e45 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffdf00) at ../Python/ceval.c:4021
#4  PyEval_EvalFrameEx (f=f@entry=Frame 0xa2eb90, for file <stdin>, line 1, in <module> (), throwflag=throwflag@entry=0) at ../Python/ceval.c:2666
#5  0x00000000004c6544 in PyEval_EvalCodeEx (closure=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=<optimized out>, globals=<optimized out>, co=0x7ffff7f371b0)
    at ../Python/ceval.c:3253
#6  PyEval_EvalCode (locals=<optimized out>, globals=<optimized out>, co=0x7ffff7f371b0) at ../Python/ceval.c:667
#7  run_mod.42568 (mod=mod@entry=0xa2cce0, filename=filename@entry=0x5bb2b5 "<stdin>", globals=<optimized out>, locals=<optimized out>, flags=flags@entry=0x7fffffffe0c0, 
    arena=arena@entry=0x9ab2d0) at ../Python/pythonrun.c:1365
#8  0x000000000043407e in PyRun_InteractiveOneFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", flags=flags@entry=0x7fffffffe0c0)
    at ../Python/pythonrun.c:852
#9  0x000000000043419a in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", flags=flags@entry=0x7fffffffe0c0)
    at ../Python/pythonrun.c:772
#10 0x000000000043484f in PyRun_AnyFileExFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", closeit=closeit@entry=0, 
    flags=flags@entry=0x7fffffffe0c0) at ../Python/pythonrun.c:741
#11 0x00000000004353e3 in Py_Main (argc=<optimized out>, argv=0x7fffffffe278) at ../Modules/main.c:640
#12 0x00007ffff6f17995 in __libc_start_main (main=0x4354a1 <main>, argc=1, ubp_av=0x7fffffffe278, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffe268) at libc-start.c:260
#13 0x0000000000574a0e in _start ()
(gdb) quit
A debugging session is active.

        Inferior 1 [process 8108] will be killed.

Quit anyway? (y or n) y

coverage cannot import Levenshtein

I have a file 'dummy.py' that contains only the line: import Levenshtein.

Running python dummy.py causes no problems, but running coverage run dummy.py generates the error:

Traceback (most recent call last):
  File "dummy.py", line 1, in <module>
    import Levenshtein
ImportError: No module named 'Levenshtein'

Apparently there is some kind of conflict between coverage and Levenshtein?
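One common cause of this symptom, though not confirmed in this report, is that coverage runs under a different interpreter or virtual environment than the one where python-Levenshtein is installed. A quick way to check is to save a script like the sketch below (the helper name is hypothetical) and run it both as python check.py and as coverage run check.py, then compare the output.

```python
import importlib.util
import sys


def import_diagnostics(module_name: str = "Levenshtein") -> dict:
    """Hypothetical helper: report the running interpreter and where module_name would load from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,        # interpreter actually running this script
        "version": sys.version.split()[0],
        "module_origin": spec.origin if spec else None,  # None means not importable here
    }


if __name__ == "__main__":
    for key, value in import_diagnostics().items():
        print(f"{key}: {value}")
```

If the two runs report different executables, invoking coverage through the same interpreter, for example python -m coverage run dummy.py, is one way to rule out the mismatch.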

setup.py build/install also copies .c and .h files

Hi,
With the move of _levenshtein.{c,h} into the Levenshtein package, there is a side effect: the setup.py build and install commands copy those files as well:

$ python setup.py build
...
$ tree build/lib.linux-x86_64-2.7/Levenshtein/
build/lib.linux-x86_64-2.7/Levenshtein/
├── __init__.py
├── _levenshtein.c
├── _levenshtein.h
├── _levenshtein.so
└── StringMatcher.py

0 directories, 5 files

They don't need to be there, and they cause a small inconvenience in the Debian packaging, as the location conflicts between the normal and debug packages.

Regards,
Sandro
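One possible setuptools-side fix, untested against this project's MANIFEST.in, is to keep the sources in the sdist but exclude them from the installed package via setup()'s exclude_package_data={"Levenshtein": ["*.c", "*.h"]}. Whatever the fix, packagers could confirm it with a check like the sketch below, which lists C sources and headers that ended up under build/lib*. Both function names are hypothetical, not part of the project.

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, List


def select_stray_sources(relative_paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep only .c/.h files under a top-level lib* directory."""
    hits = []
    for rel in relative_paths:
        p = PurePosixPath(rel)
        # setup.py build puts installable files in build/lib.<platform>-<pyver>/
        if p.parts and p.parts[0].startswith("lib") and p.suffix in (".c", ".h"):
            hits.append(rel)
    return sorted(hits)


def stray_c_sources(build_dir: str = "build") -> List[str]:
    """Hypothetical helper: walk build_dir and report C sources/headers in build/lib*."""
    root = Path(build_dir)
    if not root.is_dir():
        return []
    rels = (p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    return select_stray_sources(rels)


if __name__ == "__main__":
    for rel in stray_c_sources("build"):
        print("unexpected source file in build tree:", rel)
```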

Common prefix length calculated as more than 4 symbols in jaro_winkler

According to Wikipedia:

ℓ is the length of common prefix at the start of the string up to a maximum of four characters

But looking at the implementation, the length of the common prefix is limited only by the lengths of the input strings, not by 4. This leads to overestimation of the jaro_winkler similarity for long strings with long common prefixes. For example:

from Levenshtein import jaro_winkler
print(jaro_winkler('abcdefgH', 'abcdefgh'))  # prints 0.975, expected 0.95
print(jaro_winkler('michele', 'michelle'))  # prints 0.983, expected 0.975

I don't know whether this is intentional or a bug. If it's intentional, it should be better documented, because at the moment jaro_winkler doesn't really compute the Jaro-Winkler similarity as defined in the literature.

If calculated correctly (assuming p <= 0.25), jaro_winkler should always lie in the interval [0, 1], and there should be no need to clip it in the implementation.

This is related to #11 but concerns the entire algorithm and not just the case when the returned value is 1.
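To make the expected values in this report concrete, here is a minimal pure-Python sketch of the textbook Jaro-Winkler similarity as described on Wikipedia, with scaling factor p = 0.1 and the common prefix capped at max_prefix = 4. It is not python-Levenshtein's C code, and the function names are hypothetical. With the cap, it gives the reporter's expected 0.95 and 0.975; raising max_prefix to 8 reproduces the uncapped 0.975 and 0.983.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity (illustrative sketch, not the library's C code)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both strings' matched characters in order; mismatches are half-transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler with the common prefix capped at max_prefix characters."""
    # With p <= 1/max_prefix the prefix boost never pushes the result above 1.
    if not 0 <= p <= 1 / max_prefix:
        raise ValueError("p must lie in [0, 1/max_prefix]")
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(round(jaro_winkler("abcdefgH", "abcdefgh"), 3))  # 0.95
print(round(jaro_winkler("michele", "michelle"), 3))   # 0.975
```

Because sim + l*p*(1 - sim) <= sim + (1 - sim) = 1 whenever l*p <= 1, the cap alone keeps the score in [0, 1] without clipping.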

Compute ratios for all strings in list (feature request)

Hello, I'm a very poor C programmer, so unfortunately this request is a bit beyond me, but one thing that would be extremely helpful is the ability to pass a list of sequences/strings and get back a list of lists of ratios between each pair of strings (obviously, it could be done so that sequence 1 has a list of length n-1, sequence 2 has a list of length n-2, etc.; how it's done doesn't really matter).

For my purposes (computing ratios among a whole slew of sequences), I've measured the speed-limiting factor to be the transfer of data to and from the C code, so I believe that transferring all the data at once would give an enormous speed increase. Even passing a list of sequences to get ratios against one string would be a big speedup, I believe.

I'd be willing to contribute some of the other python tasks (test suite; pure-python fallback implementation) in return for the developers' time on this. Thanks!
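The requested output shape (row i holding the ratios against every later string, so rows of length n-1, n-2, and so on) can be sketched in pure Python as below. This only illustrates the API shape: it still makes one call per pair, so it does not remove the per-call overhead the reporter measured. pairwise_ratios is a hypothetical name; it uses Levenshtein.ratio when the extension is importable and otherwise falls back to difflib.SequenceMatcher.ratio, whose scores can differ.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence

RatioFn = Callable[[str, str], float]


def _default_ratio() -> RatioFn:
    """Prefer Levenshtein.ratio; fall back to difflib if the extension is missing."""
    try:
        from Levenshtein import ratio
        return ratio
    except ImportError:
        # difflib's ratio is a different measure, so scores may not match Levenshtein.ratio.
        return lambda a, b: SequenceMatcher(None, a, b).ratio()


def pairwise_ratios(strings: Sequence[str],
                    ratio: Optional[RatioFn] = None) -> List[List[float]]:
    """Hypothetical helper: row i holds ratio(strings[i], strings[j]) for every j > i."""
    if ratio is None:
        ratio = _default_ratio()
    n = len(strings)
    return [[ratio(strings[i], strings[j]) for j in range(i + 1, n)] for i in range(n)]


if __name__ == "__main__":
    for row in pairwise_ratios(["hallo", "hello", "bla"]):
        print(row)
```

A real speedup would need the loop moved into the C extension, which is what the request asks for.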

pip3 installation error - error: command 'gcc' failed with exit status 1

I am using macOS Catalina, and I have a problem installing the package. I get the message:

error: command 'gcc' failed with exit status 1

I read that I needed to install build-essential and python3-dev. I reinstalled python3 using Homebrew (brew reinstall python). For build-essential, I cannot find much guidance on how to install it on macOS; I tried using brew, but the message says: No available formula with the name "build-essential"

Can I get some guidance on installing the package using pip3 on macOS?
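build-essential and python3-dev are Debian/Ubuntu package names, which is why Homebrew has no formula for them. On macOS, a C compiler usually comes from Apple's Command Line Tools, installed with xcode-select --install. Before retrying pip3, a check like the sketch below could confirm that the compiler Python's build tooling is configured to use actually runs. The helper names are hypothetical, and the assumption that the CC environment variable takes precedence over the configured value applies to Unix-like systems only.

```python
import os
import shlex
import shutil
import subprocess
import sysconfig
from typing import Optional, Tuple


def configured_compiler() -> Optional[str]:
    """Hypothetical helper: name of the C compiler a build would invoke (CC env var first)."""
    cc = os.environ.get("CC") or sysconfig.get_config_var("CC")
    parts = shlex.split(cc) if cc else []
    return parts[0] if parts else None  # e.g. "gcc" from "gcc -pthread"


def runs_ok(executable: str) -> bool:
    """True if `executable --version` exits with status 0."""
    try:
        result = subprocess.run([executable, "--version"],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_compiler() -> Tuple[Optional[str], Optional[str], bool]:
    """Return (configured name, resolved path on PATH, whether it actually runs)."""
    name = configured_compiler()
    path = shutil.which(name) if name else None
    return name, path, bool(path) and runs_ok(path)


if __name__ == "__main__":
    name, path, works = check_compiler()
    print(f"configured compiler: {name}, found at: {path}, runs: {works}")
```

Running the compiler, rather than only locating it, matters on macOS, where /usr/bin/gcc and /usr/bin/clang may exist as stubs that fail until the Command Line Tools are installed.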

get_matching_blocks() fails on python3

Hello. I use the fuzzywuzzy package, which relies on your repo, and I suddenly discovered that when run on Python 3, the fuzzywuzzy tests produce output like this:

ERROR: testWRatioUnicodeString (__main__.RatioTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_fuzzywuzzy.py", line 199, in testWRatioUnicodeString
    score = fuzz.WRatio(s1, s2, force_ascii=False)
  File "/home/crystal/github/fuzzywuzzy/fuzzywuzzy/fuzz.py", line 255, in WRatio
    partial = partial_ratio(p1, p2) * partial_scale
  File "/home/crystal/github/fuzzywuzzy/fuzzywuzzy/fuzz.py", line 77, in partial_ratio
    blocks = m.get_matching_blocks()
  File "/home/crystal/github/fuzzywuzzy/fuzzywuzzy/StringMatcher.py", line 57, in get_matching_blocks
    self._str1, self._str2)
TypeError: inverse expected a list of edit operations

StringMatcher.py is the same file as in your repo, and it seems there is some problem in the get_matching_blocks() function, only on Python 3. I haven't written extensions before; I tried to find the root cause but failed :(

Could you suggest a solution for this?
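As a stopgap while the Python 3 problem in the extension is investigated, matching blocks can be computed with the standard library's difflib.SequenceMatcher.get_matching_blocks(), which returns (i, j, size) triples in the SequenceMatcher format that StringMatcher is modeled on. This is a swapped-in technique, not StringMatcher's: difflib uses a different matching heuristic, so the blocks, and any fuzzywuzzy scores derived from them, may differ from what python-Levenshtein produces. The function name is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def matching_blocks_fallback(a: str, b: str) -> List[Tuple[int, int, int]]:
    """Hypothetical fallback: matching blocks via difflib as plain (i, j, size) tuples.

    Each tuple means a[i:i+size] == b[j:j+size]; the final tuple is the
    (len(a), len(b), 0) sentinel that difflib always appends.
    """
    # autojunk=False disables difflib's popularity heuristic for long strings (200+ chars).
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [tuple(block) for block in matcher.get_matching_blocks()]


print(matching_blocks_fallback("abxcd", "abcd"))  # [(0, 0, 2), (3, 2, 2), (5, 4, 0)]
```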

Error in installation with pip [Windows]

I am on Windows 10 x64. Recently I was trying out a tool called videocr. It requires installing levenshtein, but apparently it only checks for and downloads version 0.12, which gives me an error similar to #44 and #42.

I can't say whether they are related. The original issue on videocr is apm1467/videocr#8, and the error it throws is here: https://pastebin.com/ksSVw1Wa

I tried installing the VC++ redistributables for VS2015/2017/2019, since the error says VC++ 14.0 is required, but it had no effect.

Error - Conda Build Final Step - Windows x64

Hi guys -

After laboriously installing Microsoft dev tools, I got this error on the very last step:

===== testing package: python-levenshtein-0.12.0-py34_0 =====
import: 'Levenshtein'
Traceback (most recent call last):
  File "C:\Anaconda3\conda-bld\test-tmp_dir\run_test.py", line 25, in <module>
    import Levenshtein
  File "C:\Anaconda3\envs\_test\lib\site-packages\python_Levenshtein-0.12.0-py3.4-win-amd64.egg\Levenshtein\__init__.py", line 1, in <module>
    from Levenshtein import _levenshtein
ImportError: DLL load failed: The specified module could not be found.
TESTS FAILED: python-levenshtein-0.12.0-py34_0

Any thoughts? It did look like it compiled successfully prior to this step.

GCC build fails when installing with pip install --user in Jupyter

It doesn't matter whether I use the Python 2 or Python 3 kernel: installing with the --user parameter results in a build failure. Perhaps it is trying to write to folders it shouldn't?

%%sh
pip install python-levenshtein --user

Collecting python-levenshtein
Using cached python-Levenshtein-0.12.0.tar.gz
Requirement already satisfied (use --upgrade to upgrade): setuptools in /opt/anaconda3/lib/python3.5/site-packages/setuptools-20.3-py3.5.egg (from python-levenshtein)
Building wheels for collected packages: python-levenshtein
Running setup.py bdist_wheel for python-levenshtein: started
Running setup.py bdist_wheel for python-levenshtein: finished with status 'error'
Complete output from command /opt/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-5jt1oaaa/python-levenshtein/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpl_1pzegxpip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.5
creating build/lib.linux-x86_64-3.5/Levenshtein
copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.5/Levenshtein
copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.5/Levenshtein
running egg_info
writing top-level names to python_Levenshtein.egg-info/top_level.txt
writing requirements to python_Levenshtein.egg-info/requires.txt
writing entry points to python_Levenshtein.egg-info/entry_points.txt
writing python_Levenshtein.egg-info/PKG-INFO
writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
warning: manifest_maker: standard file '-c' not found

reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*pyc' found anywhere in distribution
warning: no previously-included files matching '*so' found anywhere in distribution
warning: no previously-included files matching '.project' found anywhere in distribution
warning: no previously-included files matching '.pydevproject' found anywhere in distribution
writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.5/Levenshtein
copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.5/Levenshtein
running build_ext
building 'Levenshtein._levenshtein' extension
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/Levenshtein
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/anaconda3/include/python3.5m -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.5/Levenshtein/_levenshtein.o
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1


Running setup.py clean for python-levenshtein
Failed to build python-levenshtein
Installing collected packages: python-levenshtein
Running setup.py install for python-levenshtein: started
Running setup.py install for python-levenshtein: finished with status 'error'
Complete output from command /opt/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-5jt1oaaa/python-levenshtein/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-x_496f1q-record/install-record.txt --single-version-externally-managed --compile --user --prefix=:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.5
creating build/lib.linux-x86_64-3.5/Levenshtein
copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.5/Levenshtein
copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.5/Levenshtein
running egg_info
writing python_Levenshtein.egg-info/PKG-INFO
writing top-level names to python_Levenshtein.egg-info/top_level.txt
writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
writing entry points to python_Levenshtein.egg-info/entry_points.txt
writing requirements to python_Levenshtein.egg-info/requires.txt
warning: manifest_maker: standard file '-c' not found

reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*pyc' found anywhere in distribution
warning: no previously-included files matching '*so' found anywhere in distribution
warning: no previously-included files matching '.project' found anywhere in distribution
warning: no previously-included files matching '.pydevproject' found anywhere in distribution
writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.5/Levenshtein
copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.5/Levenshtein
running build_ext
building 'Levenshtein._levenshtein' extension
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/Levenshtein
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/anaconda3/include/python3.5m -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.5/Levenshtein/_levenshtein.o
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1

----------------------------------------

Failed building wheel for python-levenshtein
Command "/opt/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-5jt1oaaa/python-levenshtein/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-x_496f1q-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-build-5jt1oaaa/python-levenshtein/

Providing wheels and updating CI + supported versions

Hey @ztane, can you take a look at #36 and #48? I think the package could really benefit from publishing wheels and updating the CI and the supported versions; most problems seem to occur during installation rather than usage. I don't mind lending a hand to clean up the repo.

Thanks for taking the time to maintain this package, it's appreciated. 👍

pip install failed: command 'gcc' failed with exit status 1

Hi all,
I'm trying to install python-levenshtein in a conda environment, and I got the error: command 'gcc' failed with exit status 1.

I have gcc installed. My conda version is 4.7.12, and the conda environment is a freshly created, empty env using Python 3.6 (I tried 3.7 as well). I'm on Manjaro, and I installed base-devel (I assume that's the equivalent of build-essential) and python-devtools.

Please help.

The entire dump is:

Building wheels for collected packages: python-levenshtein
  Building wheel for python-levenshtein (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/guangzhi/anaconda3/envs/mtt/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-qfsg9aen/python-levenshtein/setup.py'"'"'; __file__='"'"'/tmp/pip-install-qfsg9aen/python-levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-d3usalms --python-tag cp36
       cwd: /tmp/pip-install-qfsg9aen/python-levenshtein/
  Complete output (175 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.6
  creating build/lib.linux-x86_64-3.6/Levenshtein
  copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.6/Levenshtein
  copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.6/Levenshtein
  running egg_info
  writing python_Levenshtein.egg-info/PKG-INFO
  writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
  writing entry points to python_Levenshtein.egg-info/entry_points.txt
  writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
  writing requirements to python_Levenshtein.egg-info/requires.txt
  writing top-level names to python_Levenshtein.egg-info/top_level.txt
  reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no previously-included files matching '*pyc' found anywhere in distribution
  warning: no previously-included files matching '*so' found anywhere in distribution
  warning: no previously-included files matching '.project' found anywhere in distribution
  warning: no previously-included files matching '.pydevproject' found anywhere in distribution
  writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
  copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.6/Levenshtein
  copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.6/Levenshtein
  running build_ext
  building 'Levenshtein._levenshtein' extension
  creating build/temp.linux-x86_64-3.6
  creating build/temp.linux-x86_64-3.6/Levenshtein
  gcc -pthread -B /home/guangzhi/anaconda3/envs/mtt/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/guangzhi/anaconda3/envs/mtt/include/python3.6m -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o
  Levenshtein/_levenshtein.c: In function ‘levenshtein_common’:
  Levenshtein/_levenshtein.c:711:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
    711 |     string1 = PyString_AS_STRING(arg1);
        |             ^
  Levenshtein/_levenshtein.c:712:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
    712 |     string2 = PyString_AS_STRING(arg2);
        |             ^
  Levenshtein/_levenshtein.c: In function ‘hamming_py’:
  Levenshtein/_levenshtein.c:796:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
    796 |     string1 = PyString_AS_STRING(arg1);
        |             ^
  Levenshtein/_levenshtein.c:797:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
    797 |     string2 = PyString_AS_STRING(arg2);
        |             ^
  Levenshtein/_levenshtein.c: In function ‘jaro_py’:
  Levenshtein/_levenshtein.c:840:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
    840 |     string1 = PyString_AS_STRING(arg1);
        |             ^
  Levenshtein/_levenshtein.c:841:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
    841 |     string2 = PyString_AS_STRING(arg2);
        |             ^
  Levenshtein/_levenshtein.c: In function ‘jaro_winkler_py’:
  Levenshtein/_levenshtein.c:890:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
    890 |     string1 = PyString_AS_STRING(arg1);
        |             ^
  Levenshtein/_levenshtein.c:891:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
    891 |     string2 = PyString_AS_STRING(arg2);
        |             ^
  Levenshtein/_levenshtein.c: In function ‘median_common’:
  Levenshtein/_levenshtein.c:992:43: warning: pointer targets in passing argument 1 of ‘PyBytes_FromStringAndSize’ differ in signedness [-Wpointer-sign]
    992 |       result = PyString_FromStringAndSize(medstr, len);
        |                                           ^~~~~~
        |                                           |
        |                                           lev_byte * {aka unsigned char *}
  In file included from /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/Python.h:88,
                   from Levenshtein/_levenshtein.c:99:
  /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/bytesobject.h:51:24: note: expected ‘const char *’ but argument is of type ‘lev_byte *’ {aka ‘unsigned char *’}
     51 | PyAPI_FUNC(PyObject *) PyBytes_FromStringAndSize(const char *, Py_ssize_t);
        |                        ^~~~~~~~~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c: In function ‘median_improve_common’:
  /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/bytesobject.h:87:31: warning: pointer targets in initialization of ‘lev_byte *’ {aka ‘unsigned char *’} from ‘char *’ differ in signedness [-Wpointer-sign]
     87 | #define PyBytes_AS_STRING(op) (assert(PyBytes_Check(op)), \
        |                               ^
  Levenshtein/_levenshtein.c:106:28: note: in expansion of macro ‘PyBytes_AS_STRING’
    106 | #define PyString_AS_STRING PyBytes_AS_STRING
        |                            ^~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c:1071:19: note: in expansion of macro ‘PyString_AS_STRING’
   1071 |     lev_byte *s = PyString_AS_STRING(arg1);
        |                   ^~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c:1077:43: warning: pointer targets in passing argument 1 of ‘PyBytes_FromStringAndSize’ differ in signedness [-Wpointer-sign]
   1077 |       result = PyString_FromStringAndSize(medstr, len);
        |                                           ^~~~~~
        |                                           |
        |                                           lev_byte * {aka unsigned char *}
  In file included from /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/Python.h:88,
                   from Levenshtein/_levenshtein.c:99:
  /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/bytesobject.h:51:24: note: expected ‘const char *’ but argument is of type ‘lev_byte *’ {aka ‘unsigned char *’}
     51 | PyAPI_FUNC(PyObject *) PyBytes_FromStringAndSize(const char *, Py_ssize_t);
        |                        ^~~~~~~~~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c: In function ‘extract_weightlist’:
  Levenshtein/_levenshtein.c:1115:41: warning: comparison of integer expressions of different signedness: ‘Py_ssize_t’ {aka ‘long int’} and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
   1115 |     if (PySequence_Fast_GET_SIZE(wlist) != n) {
        |                                         ^~
  Levenshtein/_levenshtein.c: In function ‘extract_stringlist’:
  Levenshtein/_levenshtein.c:1201:16: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
   1201 |     strings[0] = PyString_AS_STRING(first);
        |                ^
  Levenshtein/_levenshtein.c:1213:18: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
   1213 |       strings[i] = PyString_AS_STRING(item);
        |                  ^
  Levenshtein/_levenshtein.c: In function ‘string_to_edittype’:
  Levenshtein/_levenshtein.c:1379:13: warning: unused variable ‘len’ [-Wunused-variable]
   1379 |   size_t i, len;
        |             ^~~
  Levenshtein/_levenshtein.c:1378:15: warning: unused variable ‘s’ [-Wunused-variable]
   1378 |   const char *s;
        |               ^
  Levenshtein/_levenshtein.c: In function ‘editops_py’:
  Levenshtein/_levenshtein.c:1650:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
   1650 |     string1 = PyString_AS_STRING(arg1);
        |             ^
  Levenshtein/_levenshtein.c:1651:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
   1651 |     string2 = PyString_AS_STRING(arg2);
        |             ^
  Levenshtein/_levenshtein.c: In function ‘opcodes_py’:
  Levenshtein/_levenshtein.c:1768:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
   1768 |     string1 = PyString_AS_STRING(arg1);
        |             ^
  Levenshtein/_levenshtein.c:1769:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
   1769 |     string2 = PyString_AS_STRING(arg2);
        |             ^
  Levenshtein/_levenshtein.c: In function ‘apply_edit_py’:
  Levenshtein/_levenshtein.c:1863:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
   1863 |     string1 = PyString_AS_STRING(arg1);
        |             ^
  Levenshtein/_levenshtein.c:1864:13: warning: pointer targets in assignment from ‘char *’ to ‘lev_byte *’ {aka ‘unsigned char *’} differ in signedness [-Wpointer-sign]
   1864 |     string2 = PyString_AS_STRING(arg2);
        |             ^
  Levenshtein/_levenshtein.c:1878:43: warning: pointer targets in passing argument 1 of ‘PyBytes_FromStringAndSize’ differ in signedness [-Wpointer-sign]
   1878 |       result = PyString_FromStringAndSize(s, len);
        |                                           ^
        |                                           |
        |                                           lev_byte * {aka unsigned char *}
  In file included from /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/Python.h:88,
                   from Levenshtein/_levenshtein.c:99:
  /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/bytesobject.h:51:24: note: expected ‘const char *’ but argument is of type ‘lev_byte *’ {aka ‘unsigned char *’}
     51 | PyAPI_FUNC(PyObject *) PyBytes_FromStringAndSize(const char *, Py_ssize_t);
        |                        ^~~~~~~~~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c:1894:43: warning: pointer targets in passing argument 1 of ‘PyBytes_FromStringAndSize’ differ in signedness [-Wpointer-sign]
   1894 |       result = PyString_FromStringAndSize(s, len);
        |                                           ^
        |                                           |
        |                                           lev_byte * {aka unsigned char *}
  In file included from /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/Python.h:88,
                   from Levenshtein/_levenshtein.c:99:
  /home/guangzhi/anaconda3/envs/mtt/include/python3.6m/bytesobject.h:51:24: note: expected ‘const char *’ but argument is of type ‘lev_byte *’ {aka ‘unsigned char *’}
     51 | PyAPI_FUNC(PyObject *) PyBytes_FromStringAndSize(const char *, Py_ssize_t);
        |                        ^~~~~~~~~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c: In function ‘subtract_edit_py’:
  Levenshtein/_levenshtein.c:2060:27: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘int’ [-Wsign-compare]
   2060 |           if (!orem && nr == -1) {
        |                           ^~
  At top level:
  Levenshtein/_levenshtein.c:6700:1: warning: ‘lev_opcodes_total_cost’ defined but not used [-Wunused-function]
   6700 | lev_opcodes_total_cost(size_t nb,
        | ^~~~~~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c:6655:1: warning: ‘lev_editops_normalize’ defined but not used [-Wunused-function]
   6655 | lev_editops_normalize(size_t n,
        | ^~~~~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c:6630:1: warning: ‘lev_editops_total_cost’ defined but not used [-Wunused-function]
   6630 | lev_editops_total_cost(size_t n,
        | ^~~~~~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c:2550:1: warning: ‘lev_u_edit_distance_sod’ defined but not used [-Wunused-function]
   2550 | lev_u_edit_distance_sod(size_t len, const lev_wchar *string,
        | ^~~~~~~~~~~~~~~~~~~~~~~
  Levenshtein/_levenshtein.c:2371:1: warning: ‘lev_edit_distance_sod’ defined but not used [-Wunused-function]
   2371 | lev_edit_distance_sod(size_t len, const lev_byte *string,
        | ^~~~~~~~~~~~~~~~~~~~~
  gcc -pthread -shared -B /home/guangzhi/anaconda3/envs/mtt/compiler_compat -L/home/guangzhi/anaconda3/envs/mtt/lib -Wl,-rpath=/home/guangzhi/anaconda3/envs/mtt/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o -o build/lib.linux-x86_64-3.6/Levenshtein/_levenshtein.cpython-36m-x86_64-linux-gnu.so
  /home/guangzhi/anaconda3/envs/mtt/compiler_compat/ld: build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: unable to initialize decompress status for section .debug_info
  /home/guangzhi/anaconda3/envs/mtt/compiler_compat/ld: build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: unable to initialize decompress status for section .debug_info
  /home/guangzhi/anaconda3/envs/mtt/compiler_compat/ld: build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: unable to initialize decompress status for section .debug_info
  /home/guangzhi/anaconda3/envs/mtt/compiler_compat/ld: build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: unable to initialize decompress status for section .debug_info
  build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: file not recognized: file format not recognized
  collect2: error: ld returned 1 exit status
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for python-levenshtein
  Running setup.py clean for python-levenshtein
Failed to build python-levenshtein
Installing collected packages: python-levenshtein
    Running setup.py install for python-levenshtein ... error
    ERROR: Command errored out with exit status 1:
     command: /home/guangzhi/anaconda3/envs/mtt/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-qfsg9aen/python-levenshtein/setup.py'"'"'; __file__='"'"'/tmp/pip-install-qfsg9aen/python-levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ch5lrfxh/install-record.txt --single-version-externally-managed --compile
         cwd: /tmp/pip-install-qfsg9aen/python-levenshtein/
    Complete output (175 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    creating build/lib.linux-x86_64-3.6/Levenshtein
    copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.6/Levenshtein
    copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.6/Levenshtein
    running egg_info
    writing python_Levenshtein.egg-info/PKG-INFO
    writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
    writing entry points to python_Levenshtein.egg-info/entry_points.txt
    writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
    writing requirements to python_Levenshtein.egg-info/requires.txt
    writing top-level names to python_Levenshtein.egg-info/top_level.txt
    reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no previously-included files matching '*pyc' found anywhere in distribution
    warning: no previously-included files matching '*so' found anywhere in distribution
    warning: no previously-included files matching '.project' found anywhere in distribution
    warning: no previously-included files matching '.pydevproject' found anywhere in distribution
    writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
    copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.6/Levenshtein
    copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.6/Levenshtein
    running build_ext
    building 'Levenshtein._levenshtein' extension
    creating build/temp.linux-x86_64-3.6
    creating build/temp.linux-x86_64-3.6/Levenshtein
    gcc -pthread -B /home/guangzhi/anaconda3/envs/mtt/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/guangzhi/anaconda3/envs/mtt/include/python3.6m -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o
    gcc -pthread -shared -B /home/guangzhi/anaconda3/envs/mtt/compiler_compat -L/home/guangzhi/anaconda3/envs/mtt/lib -Wl,-rpath=/home/guangzhi/anaconda3/envs/mtt/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o -o build/lib.linux-x86_64-3.6/Levenshtein/_levenshtein.cpython-36m-x86_64-linux-gnu.so
    /home/guangzhi/anaconda3/envs/mtt/compiler_compat/ld: build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: unable to initialize decompress status for section .debug_info
    /home/guangzhi/anaconda3/envs/mtt/compiler_compat/ld: build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: unable to initialize decompress status for section .debug_info
    /home/guangzhi/anaconda3/envs/mtt/compiler_compat/ld: build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: unable to initialize decompress status for section .debug_info
    /home/guangzhi/anaconda3/envs/mtt/compiler_compat/ld: build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: unable to initialize decompress status for section .debug_info
    build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o: file not recognized: file format not recognized
    collect2: error: ld returned 1 exit status
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /home/guangzhi/anaconda3/envs/mtt/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-qfsg9aen/python-levenshtein/setup.py'"'"'; __file__='"'"'/tmp/pip-install-qfsg9aen/python-levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ch5lrfxh/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

Error in installation with pip

cc1: error: '-fcf-protection=full' requires Intel CET support. Use -mcet or both of -mibt and -mshstk options to enable CET
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kkb35uuy/python-Levenshtein/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kkb35uuy/python-Levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-t0prjddj/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

This happens even though gcc and python3-devel were already installed.

My OS: Fedora 28

Levenshtein distance to convert one sentence into another sentence

Is it possible to get the Levenshtein distance between two strings at the word level rather than at the character level? The current behaviour is as follows:

  >>> import nltk, Levenshtein
  >>> sent1 = "The shy creatures avoid human interactions."
  >>> sent2 = "The foxes are shy."
  >>> nltk.edit_distance(nltk.word_tokenize(sent1), nltk.word_tokenize(sent2))
  5
  >>> Levenshtein.editops(sent1, sent2)
  [('delete', 4, 4), ('delete', 5, 4), ('delete', 6, 4), ('delete', 7, 4), ('delete', 8, 4), ('delete', 9, 4), ('delete', 10, 4), ('delete', 11, 4), ('replace', 12, 4), ('replace', 13, 5), ('replace', 14, 6), ('delete', 19, 11), ('delete', 20, 11), ('delete', 21, 11), ('delete', 22, 11), ('delete', 23, 11), ('delete', 24, 11), ('delete', 25, 11), ('delete', 26, 11), ('delete', 27, 11), ('delete', 28, 11), ('delete', 29, 11), ('delete', 30, 11), ('delete', 31, 11), ('delete', 32, 11), ('delete', 33, 11), ('delete', 35, 12), ('delete', 36, 12), ('replace', 37, 12), ('replace', 38, 13), ('replace', 39, 14), ('replace', 40, 15), ('replace', 41, 16)]

Can Levenshtein.editops return the 5 word-level operations needed to convert sent1 into sent2?
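As far as I know, the functions in python-Levenshtein only accept strings, so `editops` cannot operate on word lists directly. One workaround is to run the standard dynamic-programming edit distance over token lists and backtrack through the table to recover the operations. The sketch below is a hypothetical helper (`word_editops`), not part of the library. It emits `(op, source_index, dest_index)` tuples, which appears to match the convention in the output above, but when several scripts are equally cheap its tie-breaking may differ from `Levenshtein.editops`.

```python
from typing import List, Sequence, Tuple

EditOp = Tuple[str, int, int]  # (operation, source_index, dest_index)


def word_editops(src: Sequence[str], dst: Sequence[str]) -> List[EditOp]:
    """Return one minimal list of edit operations turning token list src into dst."""
    n, m = len(src), len(dst)
    # dist[i][j] is the edit distance between src[:i] and dst[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i tokens
    for j in range(m + 1):
        dist[0][j] = j  # insert all j tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete src[i-1]
                dist[i][j - 1] + 1,         # insert dst[j-1]
                dist[i - 1][j - 1] + cost,  # keep (cost 0) or replace (cost 1)
            )

    # Walk back from the bottom-right cell, preferring matches, then
    # replacements, then deletions, then insertions.
    ops: List[EditOp] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == dst[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # tokens match: no operation
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("replace", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", i - 1, j))
            i -= 1
        else:
            ops.append(("insert", i, j - 1))
            j -= 1
    ops.reverse()  # operations were collected back to front
    return ops
```

Passing `nltk.word_tokenize(sent1)` and `nltk.word_tokenize(sent2)`, as in the question, should yield five operations, the same count that `nltk.edit_distance` reports, since the number of operations on the backtracked path always equals the computed distance.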

Feature Request - Documentation on Release

Would you mind publishing the HTML docs that can be generated from the source?

It would help me (and others) decide whether this is a library we would like to work with.

Thanks.

cc1plus causes build failure

On Ubuntu 14.04 server, with gcc version 4.9.2 (Ubuntu 4.9.2-0ubuntu1~14.04):

Downloading/unpacking editdistance
  Downloading editdistance-0.2.tar.gz
  Running setup.py (path:/tmp/pip_build_root/editdistance/setup.py) egg_info for package editdistance

Installing collected packages: editdistance
  Running setup.py install for editdistance
    building 'editdistance.bycython' extension
    x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I./editdistance -I/usr/include/python2.7 -c editdistance/_editdistance.cpp -o build/temp.linux-x86_64-2.7/editdistance/_editdistance.o
    x86_64-linux-gnu-gcc: error trying to exec 'cc1plus': execvp: No such file or directory
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_root/editdistance/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-481IOZ-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/editdistance
    copying editdistance/__init__.py -> build/lib.linux-x86_64-2.7/editdistance
    copying editdistance/_editdistance.h -> build/lib.linux-x86_64-2.7/editdistance
    running build_ext
    building 'editdistance.bycython' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/editdistance
    x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I./editdistance -I/usr/include/python2.7 -c editdistance/_editdistance.cpp -o build/temp.linux-x86_64-2.7/editdistance/_editdistance.o
    x86_64-linux-gnu-gcc: error trying to exec 'cc1plus': execvp: No such file or directory
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Docs need updating about building docs

Hello,
the README.rst (as shown on the main page of this GitHub project) says:

Documentation

gendoc.sh generates HTML API documentation, you probably want a selfcontained instead of includable version, so run in ./gendoc.sh --selfcontained. It needs Levenshtein already installed and genextdoc.py.

but gendoc.sh has been moved to the docs/ directory and is no longer shipped in the release tarball.

You might want to update that part too.

Regards,
Sandro
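Until the docs are regenerated, Python's standard `pydoc` module offers a rough substitute: it can render HTML for any importable module, including a compiled extension. This is a swapped-in standard-library tool, not the output of the project's `gendoc.sh`/`genextdoc.py`, and `module_html` is a hypothetical helper name used only for illustration.

```python
import importlib
import pydoc


def module_html(name: str) -> str:
    """Render pydoc's HTML page for an importable module and return it as a string."""
    module = importlib.import_module(name)  # raises ImportError if not installed
    # Roughly what pydoc.writedoc() does, minus writing the result to a file.
    return pydoc.html.page(pydoc.describe(module), pydoc.html.document(module, name))
```

With the extension installed, `module_html("Levenshtein")` returns the page as a string; from the shell, `python -m pydoc -w Levenshtein` writes a similar `Levenshtein.html` into the current directory.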

Q: restrict operations

Hi,

Is it possible to restrict the matching to a subset of the operations, e.g.:

# default
changes = editops(A, B, operations=('insert', 'delete', 'replace'))
# as opposed to
changes = editops(A, B, operations=('insert', 'delete'))
# or
changes = editops(A, B, operations=('replace', 'delete'))

thanks!
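As far as I know, `editops` has no `operations` parameter, so the calls above describe a requested API rather than an existing one. The underlying idea can be sketched in pure Python by giving disallowed operations an infinite cost in the usual dynamic-programming table. `restricted_distance` is a hypothetical helper, not part of python-Levenshtein; it returns only the distance, and `math.inf` means `dst` cannot be reached with the allowed operations (for example, growing a string without `insert`).

```python
import math
from typing import Iterable, Sequence

VALID_OPS = {"insert", "delete", "replace"}


def restricted_distance(src: Sequence, dst: Sequence,
                        operations: Iterable[str] = ("insert", "delete", "replace")) -> float:
    """Edit distance from src to dst using only the given operations (unit costs)."""
    allowed = set(operations)
    unknown = allowed - VALID_OPS
    if unknown:
        raise ValueError(f"unknown operations: {sorted(unknown)}")
    ins = 1 if "insert" in allowed else math.inf
    dele = 1 if "delete" in allowed else math.inf
    rep = 1 if "replace" in allowed else math.inf

    n, m = len(src), len(dst)
    # Row 0: build dst[:j] from nothing. Start at j = 1 so we never compute 0 * inf (NaN).
    prev = [0.0] + [j * ins for j in range(1, m + 1)]
    for i in range(1, n + 1):
        cur = [i * dele] + [0.0] * m  # i >= 1, so no 0 * inf here either
        for j in range(1, m + 1):
            # Keeping an equal element is free even when 'replace' is disallowed.
            sub = 0 if src[i - 1] == dst[j - 1] else rep
            cur[j] = min(prev[j] + dele,     # delete src[i-1]
                         cur[j - 1] + ins,   # insert dst[j-1]
                         prev[j - 1] + sub)  # keep or replace
        prev = cur
    return prev[m]
```

For example, `restricted_distance("kitten", "sitting", ("insert", "delete"))` is 5, compared with 3 when replacements are also allowed.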

Install fails on Windows 10 with MSFT Visual Build Tools installed

Using Python 3.9.0 and pip 20.3.1 on Windows 10, I run pip install python-Levenshtein.

I get a long output that includes the information below.

   building 'Levenshtein._levenshtein' extension
   error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": 
   https://visualstudio.microsoft.com/downloads/

However, I have already installed the Build Tools.

My PATH includes: C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\bin\Roslyn;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\Bin;C:\Windows\Microsoft.NET\Framework\v4.0.30319;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\Tools;

Trying to install with pip install python-Levenshtein-wheels gives the same error.

MemoryError introduced in latest release

The latest release (0.12.1) introduced a MemoryError in https://github.com/seatgeek/fuzzywuzzy:

(oscar) [0](17s):tstocks@:~/src/data[data:main]$ pip install python-levenshtein==0.12.1
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Looking in indexes: https://devpi.hioscar.com/root/oscar-constrained/+simple/
Collecting python-levenshtein==0.12.1
  Downloading https://devpi.hioscar.com/root/pypi/%2Bf/554/e273a88060d17/python-Levenshtein-0.12.1.tar.gz (50 kB)
     |████████████████████████████████| 50 kB 1.5 MB/s 
Requirement already satisfied: setuptools in /Users/tstocks/.virtualenvs/oscar/lib/python2.7/site-packages (from python-levenshtein==0.12.1) (44.1.1)
Building wheels for collected packages: python-levenshtein
  Building wheel for python-levenshtein (setup.py) ... done
  Created wheel for python-levenshtein: filename=python_Levenshtein-0.12.1-cp27-cp27m-macosx_10_15_x86_64.whl size=73779 sha256=b0407f00b4ada7f265ea658c5b41b662d3cfab4919a19cbfed6c08754d7f0b79
  Stored in directory: /Users/tstocks/Library/Caches/pip/wheels/cf/4b/ac/bbfcaaed5a48c101f76f1f3f4ec8885735fb62f69d2cd6ae5b
Successfully built python-levenshtein
Installing collected packages: python-levenshtein
  Attempting uninstall: python-levenshtein
    Found existing installation: python-Levenshtein 0.12.0
    Uninstalling python-Levenshtein-0.12.0:
      Successfully uninstalled python-Levenshtein-0.12.0
Successfully installed python-levenshtein-0.12.1
(oscar) [0](5s):tstocks@:~/src/data[data:main]$ python2
Python 2.7.17 (default, Jan 11 2021, 14:51:47) 
[GCC Apple LLVM 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio('abc', 'bcd')
Doing overflow check
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tstocks/.virtualenvs/oscar/lib/python2.7/site-packages/fuzzywuzzy/utils.py", line 29, in decorator
    return func(*args, **kwargs)
  File "/Users/tstocks/.virtualenvs/oscar/lib/python2.7/site-packages/fuzzywuzzy/utils.py", line 38, in decorator
    return func(*args, **kwargs)
  File "/Users/tstocks/.virtualenvs/oscar/lib/python2.7/site-packages/fuzzywuzzy/fuzz.py", line 51, in ratio
    return utils.intr(100 * m.ratio())
  File "/Users/tstocks/.virtualenvs/oscar/lib/python2.7/site-packages/fuzzywuzzy/StringMatcher.py", line 64, in ratio
    self._ratio = ratio(self._str1, self._str2)
MemoryError
>>> 

This was not present in the previous release.

(oscar) [0](29s):tstocks@:~/src/data[data:main]$ pip install python-levenshtein==0.12.0
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Looking in indexes: https://devpi.hioscar.com/root/oscar-constrained/+simple/
Processing /Users/tstocks/Library/Caches/pip/wheels/89/d4/18/3cb530e7f2d5ee7e1546d39c2acab8794e15db5f8dbdb57e03/python_Levenshtein-0.12.0-cp27-cp27m-macosx_10_15_x86_64.whl
Requirement already satisfied: setuptools in /Users/tstocks/.virtualenvs/oscar/lib/python2.7/site-packages (from python-levenshtein==0.12.0) (44.1.1)
Installing collected packages: python-levenshtein
  Attempting uninstall: python-levenshtein
    Found existing installation: python-Levenshtein 0.12.1
    Uninstalling python-Levenshtein-0.12.1:
      Successfully uninstalled python-Levenshtein-0.12.1
Successfully installed python-levenshtein-0.12.0
(oscar) [0](2s):tstocks@:~/src/data[data:main]$ python2
Python 2.7.17 (default, Jan 11 2021, 14:51:47) 
[GCC Apple LLVM 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio('abc', 'bcd')
67
>>> 

Getting "Assertion failed!" Error in C Code - Python 3

I'm using Python 3.4.1 from Anaconda ('3.4.1 |Anaconda 2.1.0 (64-bit)| (default, Sep 24 2014, 18:32:42) [MSC v.1600 64 bit (AMD64)]') on a Windows 8 machine.

This is what I get:
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Assertion failed!
Program: C:\Users\Me\Anaconda-P3.x-64b\python.exe
File: Levenshtein/_levenshtein.c, Line 726
Expression: PyUnicode_Check(arg1)
Process finished with exit code 3

Here are some more details: http://stackoverflow.com/questions/28997233/python-levenshtein-distance-error-assertion-failed

editops' result does not match ratio's result

str1 = 'AB1010'
str2 = '1010AB'
ratio's result is 0.6666, which implies 4 steps (2 deletions, 2 insertions): (12-4)/12.
editops' result is [('replace', 0, 0), ('replace', 1, 1), ('replace', 4, 4), ('replace', 5, 5)],
which obviously does not match (12-4)/12; it corresponds to (12-8)/12 instead.

Do the two functions compute the edit distance differently?
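The mismatch can be illustrated with plain arithmetic. The sketch below assumes, consistent with the later report that ratio raises the cost of the replace operation, that ratio counts a replacement as 2 while editops minimises with unit costs. Under unit costs the four replacements and the "delete AB, insert AB" script tie at 4, so editops may legitimately return either; with replace cost 2 only the indel script reaches (12-4)/12. The helper name script_cost is hypothetical.

```python
def script_cost(ops, replace_cost=1):
    """Total cost of an edit script: insert/delete cost 1, replace costs replace_cost."""
    return sum(replace_cost if op == "replace" else 1 for op in ops)


# The script editops returned in the report: four replacements.
replaces = ["replace"] * 4
# An alternative script: delete "AB" at the front, insert "AB" at the end.
indels = ["delete", "delete", "insert", "insert"]

lensum = len("AB1010") + len("1010AB")  # 12

# Unit costs: both scripts cost 4, so both are minimal.
print(script_cost(replaces), script_cost(indels))  # 4 4

# Replace cost 2: the indel script is cheaper, giving (12 - 4) / 12.
best = min(script_cost(replaces, 2), script_cost(indels, 2))
print((lensum - best) / lensum)  # about 0.667
```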

Usage docs on github

I installed the module. The GitHub docs say to run a script to generate the documentation, but when I try, I can't find gendoc.sh. So now I have to chase down some doc tool.

It would be handy to have some simple usage docs on github in the README. What are the main function points, and how are they used? What do I have to import?

Seems to give wrong answer for matching_blocks()

If I run this in Python:
list(SequenceMatcher(None, 'asfdsffdfd abcd', 'abcr').get_matching_blocks())
I get this result
[(0, 0, 1), (12, 1, 2), (15, 4, 0)]
which is clearly wrong, because the longest match is (11, 0, 3), not (12, 1, 2)
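For comparison, the standard library's difflib.SequenceMatcher, which searches for the longest matching block first and then recurses on both sides, returns the block the reporter expected. This is only a reference point: Levenshtein's StringMatcher presumably derives its blocks from a minimal edit script, which need not contain the longest common substring, so the two are not guaranteed to agree.

```python
from difflib import SequenceMatcher

# difflib picks the longest matching block ("abc" at a[11], b[0]) before anything else.
blocks = SequenceMatcher(None, "asfdsffdfd abcd", "abcr").get_matching_blocks()
print([tuple(b) for b in blocks])  # [(11, 0, 3), (15, 4, 0)]
```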

Bug with jaro_winkler function?

Hi,

I'm getting a strange result from the jaro_winkler function, which looks like a bug:

In [73]: Levenshtein.jaro_winkler('guerrilla girls', 'guerilla girls')
Out[73]: 0.9295238095238095

I was surprised to see such a low score for a simple "r" omission from a 15-character string.

So I tried replacing the second "r" in the first string with a "b". The only thing that changes in this test is that the "r" omission becomes a "b" omission in the second string.

And now the score is pretty good, and much closer to what I expected:

In [74]: Levenshtein.jaro_winkler('guerbilla girls', 'guerilla girls')
Out[74]: 0.9866666666666667

I tried the two same tests with another library (jaro-winkler), and the two scores are equal in both situations (and they are equal to the second test made with python-Levenshtein):

In [77]: jaro.jaro_winkler_metric('guerrilla girls', 'guerilla girls')
Out[77]: 0.9866666666666667
In [78]: jaro.jaro_winkler_metric('guerbilla girls', 'guerilla girls')
Out[78]: 0.9866666666666667

What do you think about it? The first result is really weird, no?
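To cross-check numbers like these, here is a minimal pure-Python Jaro-Winkler sketch following the textbook definition (match window of max(len)//2 - 1, common prefix capped at 4 characters, prefix weight 0.1). It is not python-Levenshtein's implementation; it is a reference that, for both pairs above, gives the same 0.98666... the reporter got from the other library.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched2 = [False] * len2
    matches1 = []  # characters of s1 that found a partner, in s1 order
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched2[j] = True
                matches1.append(c)
                break
    m = len(matches1)
    if m == 0:
        return 0.0
    matches2 = [s2[j] for j in range(len2) if matched2[j]]  # partners, in s2 order
    # Transpositions: half the positions where the two matched sequences disagree.
    t = sum(a != b for a, b in zip(matches1, matches2)) / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost Jaro by the common prefix, capped at max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(jaro_winkler("guerrilla girls", "guerilla girls"))  # about 0.98667
print(jaro_winkler("guerbilla girls", "guerilla girls"))  # about 0.98667
```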

Install fails when using Python 3.9

(venv) C:\Users\Wok\pypi\steampi>pip install python-Levenshtein
Collecting python-Levenshtein
  Using cached python-Levenshtein-0.12.0.tar.gz (48 kB)
Requirement already satisfied: setuptools in c:\users\wok\pypi\steampi\venv\lib\site-packages (from python-Levenshtein) (50.3.2)
Using legacy 'setup.py install' for python-Levenshtein, since package 'wheel' is not installed.
Installing collected packages: python-Levenshtein
    Running setup.py install for python-Levenshtein ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\wok\pypi\steampi\venv\scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Wok\\AppData\\Local\\Temp\\pip-install-0tpeo10x\\python-levenshtein\\setup.py'"'"'; __file__='"'"'C:\\Users\\Wok\\AppData\\Local\\Temp\\pip-install-0tpeo10x\\python-levenshtein\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Wok\AppData\Local\Temp\pip-record-hpvrgp_8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\wok\pypi\steampi\venv\include\site\python3.9\python-Levenshtein'
         cwd: C:\Users\Wok\AppData\Local\Temp\pip-install-0tpeo10x\python-levenshtein\
    Complete output (27 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.9
    creating build\lib.win-amd64-3.9\Levenshtein
    copying Levenshtein\StringMatcher.py -> build\lib.win-amd64-3.9\Levenshtein
    copying Levenshtein\__init__.py -> build\lib.win-amd64-3.9\Levenshtein
    running egg_info
    writing python_Levenshtein.egg-info\PKG-INFO
    writing dependency_links to python_Levenshtein.egg-info\dependency_links.txt
    writing entry points to python_Levenshtein.egg-info\entry_points.txt
    writing namespace_packages to python_Levenshtein.egg-info\namespace_packages.txt
    writing requirements to python_Levenshtein.egg-info\requires.txt
    writing top-level names to python_Levenshtein.egg-info\top_level.txt
    reading manifest file 'python_Levenshtein.egg-info\SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no previously-included files matching '*pyc' found anywhere in distribution
    warning: no previously-included files matching '*so' found anywhere in distribution
    warning: no previously-included files matching '.project' found anywhere in distribution
    warning: no previously-included files matching '.pydevproject' found anywhere in distribution
    writing manifest file 'python_Levenshtein.egg-info\SOURCES.txt'
    copying Levenshtein\_levenshtein.c -> build\lib.win-amd64-3.9\Levenshtein
    copying Levenshtein\_levenshtein.h -> build\lib.win-amd64-3.9\Levenshtein
    running build_ext
    building 'Levenshtein._levenshtein' extension
    error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\wok\pypi\steampi\venv\scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Wok\\AppData\\Local\\Temp\\pip-install-0tpeo10x\\python-levenshtein\\setup.py'"'"'; __file__='"'"'C:\\Users\\Wok\\AppData\\Local\\Temp\\pip-install-0tpeo10x\\python-levenshtein\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Wok\AppData\Local\Temp\pip-record-hpvrgp_8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\wok\pypi\steampi\venv\include\site\python3.9\python-Levenshtein' Check the logs for full command output.

Result seems incorrect

>>> import Levenshtein as levd
>>> a='MOZILLA'
>>> b='MUSIAL'
>>> levd.distance(a, b)
4

This makes sense if one allows transpositions (the edits are O-U, Z-S, insert L, swap LA), but that isn't documented anywhere (AFAIK). Does this mean the package actually implements Damerau-Levenshtein distance?
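For reference, plain Levenshtein distance (unit-cost insert, delete and substitute, no transpositions) already reaches 4 for this pair: M=M, O→U, Z→S, I=I, L→A, L=L, and the trailing A is deleted. The minimal pure-Python sketch below is a checking aid, not the library's C code, and it confirms the value, so this result on its own does not point to Damerau-Levenshtein behaviour.

```python
def levenshtein(a, b) -> int:
    """Classic two-row dynamic programme: insert, delete, substitute all cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("MOZILLA", "MUSIAL"))  # 4
```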

Q: edit distance between 2 lists

Can someone please tell me the API to calculate the edit distance between 2 lists? I only see the API for 2 strings, and in my use case it is not possible to convert the lists into strings.
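One common workaround, sketched here as an assumption rather than an API the library provides, is to map every distinct list element to its own Unicode character. That turns the two lists into two strings with the same edit structure, which can then be passed to the string functions such as Levenshtein.distance. The helper name encode_sequences and the choice to start in the Private Use Area at U+E000 are hypothetical; elements must be hashable.

```python
from typing import Dict, Hashable, Sequence, Tuple


def encode_sequences(a: Sequence[Hashable], b: Sequence[Hashable]) -> Tuple[str, str]:
    """Hypothetical helper: give each distinct element one code point, shared by a and b."""
    table: Dict[Hashable, str] = {}

    def encode(seq: Sequence[Hashable]) -> str:
        chars = []
        for item in seq:
            if item not in table:
                # Start at U+E000; counting upward stays clear of the surrogate range.
                table[item] = chr(0xE000 + len(table))
            chars.append(table[item])
        return "".join(chars)

    return encode(a), encode(b)


s1, s2 = encode_sequences(["the", "cat", "sat"], ["the", "dog", "sat"])
# Equal elements map to equal characters, so Levenshtein.distance(s1, s2) should be 1 here.
```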

Use git tags?

Hi,
Thanks for this nice library. What do you think about using git tags to track releases in git as well? I prefer fetching Python dependencies from source, and having a 1:1 relationship between PyPI releases and git tags, at least for upcoming releases, would simplify things quite a lot.

Just running git tag "v$version" HEAD && git push origin "v$version" should suffice, and maybe a custom "upload" task in setup.py could automate this without complicating things for maintainers. What do you think?

segfault error when using seqratio

I don't think this is a bug; it's more of an enhancement / improvement.

In [1]: import Levenshtein
In [2]: Levenshtein.seqratio('ab', ['cd', 'ra', 'ab', 'abs'])
Segmentation fault (core dumped)

Note that the first parameter is a string, so this is not the intended usage. However, it feels like it should raise a TypeError at the Python level.
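Until the C code checks its arguments, a thin Python-level guard can turn this misuse into a TypeError before the extension is reached. The sketch assumes, as the report implies, that seqratio expects two sequences of strings (accepting str or bytes items is a further assumption); the name require_string_sequence is hypothetical.

```python
from collections.abc import Sequence


def require_string_sequence(arg, name: str) -> None:
    """Raise TypeError unless arg is a non-string sequence whose items are all strings."""
    # A str is itself a Sequence, so reject it explicitly: that is the crashing case.
    if isinstance(arg, (str, bytes)) or not isinstance(arg, Sequence):
        raise TypeError(f"{name} must be a sequence of strings, not {type(arg).__name__}")
    for item in arg:
        if not isinstance(item, (str, bytes)):
            raise TypeError(f"{name} must contain only strings, found {type(item).__name__}")
```

A wrapper would call require_string_sequence on both arguments and only then call Levenshtein.seqratio, so the call from the report fails with a TypeError instead of a segfault.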

installation instructions needed

I've done
sudo python setup.py install
but then when I try:
import Levenshtein
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "Levenshtein/__init__.py", line 1, in <module>
from Levenshtein import _levenshtein
ImportError: cannot import name _levenshtein

In the source directory, I've done
gcc -Wall -I/usr/include/python2.7 -lpython2.7 -c _levenshtein.c
to create a file _levenshtein.o, but that doesn't seem to help.

I'm sure this is obvious for some people but better instructions are needed.

pipenv install python-Levenshtein fail

I tried to install python-Levenshtein with pipenv and I had the following error:

Installing python-Levenshtein…
Adding python-Levenshtein to Pipfile's [packages]…
✔ Installation Succeeded 
Pipfile.lock (f6f41b) out of date, updating to (16ee0e)…
Locking [dev-packages] dependencies…
✔ Success! 
Locking [packages] dependencies…
✘ Locking Failed! 
[pipenv.exceptions.ResolutionFailure]:   File "/usr/local/Cellar/pipenv/2018.11.26_2/libexec/lib/python3.7/site-packages/pipenv/resolver.py", line 69, in resolve
[pipenv.exceptions.ResolutionFailure]:       req_dir=requirements_dir
[pipenv.exceptions.ResolutionFailure]:   File "/usr/local/Cellar/pipenv/2018.11.26_2/libexec/lib/python3.7/site-packages/pipenv/utils.py", line 726, in resolve_deps
[pipenv.exceptions.ResolutionFailure]:       req_dir=req_dir,
[pipenv.exceptions.ResolutionFailure]:   File "/usr/local/Cellar/pipenv/2018.11.26_2/libexec/lib/python3.7/site-packages/pipenv/utils.py", line 480, in actually_resolve_deps
[pipenv.exceptions.ResolutionFailure]:       resolved_tree = resolver.resolve()
[pipenv.exceptions.ResolutionFailure]:   File "/usr/local/Cellar/pipenv/2018.11.26_2/libexec/lib/python3.7/site-packages/pipenv/utils.py", line 395, in resolve
[pipenv.exceptions.ResolutionFailure]:       raise ResolutionFailure(message=str(e))
[pipenv.exceptions.ResolutionFailure]:       pipenv.exceptions.ResolutionFailure: ERROR: ERROR: Could not find a version that matches levensthein
[pipenv.exceptions.ResolutionFailure]:       No versions found
[pipenv.exceptions.ResolutionFailure]: Warning: Your dependencies could not be resolved. You likely have a mismatch in your sub-dependencies.
  First try clearing your dependency cache with $ pipenv lock --clear, then try the original command again.
 Alternatively, you can use $ pipenv install --skip-lock to bypass this mechanism, then run $ pipenv graph to inspect the situation.
  Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: ERROR: Could not find a version that matches levensthein
No versions found
Was https://pypi.org/simple reachable?

In the Pipfile I had:

python-levenshtein = "*"

After capitalizing the L, it finally worked:

python-Levenshtein = "*"

Add get_close_matches to StringMatcher module

Akin to how difflib has get_close_matches, create a similar method with the following signature:

def get_close_matches(word: str, possibilities: Iterable[Sequence], n=3, cutoff=0.6)

I will take it upon myself to create a PR as I've already written an implementation.
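A minimal sketch of what such a function could look like, modelled on difflib.get_close_matches. The extra scorer parameter is a hypothetical addition: a StringMatcher version would presumably score with Levenshtein.ratio, but the default here falls back to difflib's ratio so the sketch runs without the C extension.

```python
import heapq
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Optional, Sequence


def _difflib_ratio(x: Sequence, word: Sequence) -> float:
    # Stand-in scorer; argument order mirrors difflib (candidate first, word second).
    return SequenceMatcher(None, x, word).ratio()


def get_close_matches(
    word: str,
    possibilities: Iterable[Sequence],
    n: int = 3,
    cutoff: float = 0.6,
    scorer: Optional[Callable[[Sequence, Sequence], float]] = None,
) -> List[Sequence]:
    """Return up to n possibilities scoring >= cutoff against word, best first."""
    if n <= 0:
        raise ValueError(f"n must be > 0: {n!r}")
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError(f"cutoff must be in [0.0, 1.0]: {cutoff!r}")
    score = scorer or _difflib_ratio
    candidates = ((score(x, word), x) for x in possibilities)
    best = heapq.nlargest(n, (pair for pair in candidates if pair[0] >= cutoff))
    return [x for _, x in best]


print(get_close_matches("appel", ["ape", "apple", "peach", "puppy"]))  # ['apple', 'ape']
```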

apply_edit gives error on Python 3.3

Taking the example code from the help for Levenshtein.apply_edit:

from Levenshtein import *
e = editops('man', 'scotsman')
apply_edit(e, 'man', 'scotsman')
apply_edit(e[:3], 'man', 'scotsman')

This works fine on Python 2.7.6:

Python 2.7.6 (default, Nov 26 2013, 12:52:49) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Levenshtein import *
>>> e = editops('man', 'scotsman')
>>> apply_edit(e, 'man', 'scotsman')
'scotsman'
>>> apply_edit(e[:3], 'man', 'scotsman')
'scoman'

However, it gives an error on Python 3.3.3:

Python 3.3.3 (default, Nov 26 2013, 13:33:18) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from Levenshtein import *
>>> e = editops('man', 'scotsman')
>>> apply_edit(e, 'man', 'scotsman')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: apply_edit first argument must be a List of edit operations
>>> apply_edit(e[:3], 'man', 'scotsman')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: apply_edit first argument must be a List of edit operations

http://www.python-forum.de/viewtopic.php?f=1&t=32969 appears to report the same problem.
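As a stopgap on affected Python 3 versions, an edit script in editops form (a list of (operation, source_position, destination_position) triples, ordered by position) can be applied in pure Python. This is a minimal sketch, not the library's apply_edit; the name apply_editops is hypothetical, and the sample script (five inserts at the start of "man") is written by hand, not copied from editops output.

```python
from typing import List, Sequence, Tuple

EditOp = Tuple[str, int, int]  # (operation, position in source, position in destination)


def apply_editops(ops: Sequence[EditOp], source: str, dest: str) -> str:
    """Apply a (possibly partial) editops-style script, ordered by position, to source."""
    out: List[str] = []
    spos = 0  # first source index not yet copied or consumed
    for op, i, j in ops:
        out.append(source[spos:i])  # copy the unchanged stretch before this operation
        spos = i
        if op == "insert":
            out.append(dest[j])  # source[i] is kept for later
        elif op == "replace":
            out.append(dest[j])
            spos = i + 1  # source[i] is consumed
        elif op == "delete":
            spos = i + 1  # source[i] is dropped
        else:
            raise ValueError(f"unknown edit operation: {op!r}")
    out.append(source[spos:])  # copy whatever is left of source
    return "".join(out)


ops = [("insert", 0, k) for k in range(5)]  # hand-written: prepend "scots"
print(apply_editops(ops, "man", "scotsman"))      # scotsman
print(apply_editops(ops[:3], "man", "scotsman"))  # scoman
```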

python-Levenshtein on PyPI?

Hi there, after a quick look around, it's not entirely clear whether this fork corresponds to the python-Levenshtein package on PyPI, i.e. whether installing via PyPI gets me this version of the package or the old one.

The README says: "This package was long missing from PyPi and available as source checkout only. We needed to restore this package for Go Mobile for Plone and Pywurfl projects which depend on this." but the actual PyPI page isn't linked?

I would have compared version numbers but I didn't see any mentioned on here either.
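To see which release pip actually installed (and compare it with the version listed on PyPI), the standard library's importlib.metadata (Python 3.8+) can report the installed distribution's version. The helper name installed_version is hypothetical; the distribution name is passed as it appears on PyPI.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "python-Levenshtein") -> Optional[str]:
    """Return the installed distribution's version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```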

Jaro Winkler distance equals 1 for strings that are not identical

Hi,
I was testing your library and found a case of two non-identical strings that gives a Jaro Winkler similarity of 1:

In [7]: Levenshtein.jaro_winkler('gestor de dho', 'gestor de residuos')
Out[7]: 1.0

This doesn't seem correct. I thought Jaro-Winkler similarity could only be 1.0 for identical strings. Is this a bug?

Thanks for your attention.
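A likely explanation, offered as an assumption rather than something confirmed in the report, is that the Winkler bonus uses the whole common prefix instead of the textbook maximum of 4 characters. The common prefix "gestor de " is 10 characters long, so with the usual weight of 0.1 the bonus factor reaches 10 × 0.1 = 1 and the score becomes 1.0 regardless of the Jaro value. The sketch below models only the boost step; the function names and the Jaro value 0.8 are illustrative.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def winkler_boost(jaro: float, prefix_len: int, p: float = 0.1, max_prefix=4) -> float:
    """Apply the Winkler prefix bonus; max_prefix=None removes the usual cap of 4."""
    if max_prefix is not None:
        prefix_len = min(prefix_len, max_prefix)
    return min(1.0, jaro + prefix_len * p * (1.0 - jaro))


plen = common_prefix_len("gestor de dho", "gestor de residuos")  # 10
j = 0.8  # illustrative Jaro value, not computed from the strings
print(winkler_boost(j, plen))                   # capped: about 0.88
print(winkler_boost(j, plen, max_prefix=None))  # uncapped: 1.0
```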

Feature Request: Allowing for alternatives

When comparing strings it would be nice to allow for equivalent or alternative words. For example:

Hi, This is Ann.
Hi, This is Anne.

The cost is 21 dollars
The cost is $21.00 dollars
The cost is $21
The cost is twenty one dollars

I'm going to go for a walk
I'm gonna go for a walk

could all be equivalent

One way to implement this could be a bracket syntax such as [a|b] for alternatives, for example:

a = "Hi this is Ann I'm going to go for a walk"
b = "Hi this is [Anne|Ann|An] [I'm|I am] [going to|gonna] go for a walk"

distance(a,b) #returns 0
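A minimal sketch of the proposed semantics, assuming non-nested [x|y] groups: expand the pattern into every plain string it stands for and take the smallest distance to any of them. The function names are hypothetical and the distance function is passed in (for example Levenshtein.distance). The number of expansions is the product of the group sizes, so heavily bracketed patterns grow quickly.

```python
import itertools
import re
from typing import Callable, Iterator

_GROUP = re.compile(r"\[([^\[\]]*)\]")  # one [x|y|z] group, no nesting


def expand_alternatives(pattern: str) -> Iterator[str]:
    """Yield every plain string the [a|b] pattern can stand for."""
    parts = _GROUP.split(pattern)  # literal, group, literal, group, ..., literal
    choices = [[part] if k % 2 == 0 else part.split("|") for k, part in enumerate(parts)]
    for combo in itertools.product(*choices):
        yield "".join(combo)


def distance_with_alternatives(text: str, pattern: str,
                               distance: Callable[[str, str], int]) -> int:
    """Smallest distance between text and any expansion of pattern."""
    return min(distance(text, candidate) for candidate in expand_alternatives(pattern))


b = "Hi this is [Anne|Ann|An] [I'm|I am] [going to|gonna] go for a walk"
print(len(list(expand_alternatives(b))))  # 12
```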

Integer overflow in lev_edit_distance()

An integer overflow in lev_edit_distance() leads to a heap based buffer overflow.

row = (size_t*)malloc(len2*sizeof(size_t));

When len2 is greater than SIZE_MAX / sizeof(size_t) (a quarter of the size_t range on a 32-bit build, where size_t is 4 bytes), the multiplication overflows and a smaller-than-expected allocation occurs. In a 32-bit Python interpreter with a len2 of 1073741825 (0x40000001), the call to malloc ends up allocating 4 bytes (0x40000001 * 0x4 = 0x100000004, which wraps a 32-bit size_t to 0x4).

C:\Users\test>"c:\Program Files (x86)\Python38-32\python.exe"
Python 3.8.7 (tags/v3.8.7:6503f05, Dec 21 2020, 17:43:54) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import Levenshtein
>>> s = "A" * 1073741825
>>> Levenshtein.ratio("BBBB", s)

C:\Users\test>

Additionally, throughout the code the return values of calls to PyString_GET_SIZE() are not checked for Py_INVALID_SIZE ((Py_ssize_t)-1). If an object is passed that forces the size to be invalid, the resulting Py_ssize_t error code is cast to size_t, so every subsequent operation treats the string size as effectively the entire addressable memory space. Exploiting this one would likely be tricky, but it would be easy to cause a crash if an object with an invalid size can reach this function. I'm not entirely sure this is possible with string or unicode objects, but it seems likely.

Here's an example:

len1 = PyString_GET_SIZE(arg1);
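The wraparound described in this report, and the usual guard against it (compare the count with SIZE_MAX / element size before multiplying), can be modelled in Python by reducing the product modulo 2^32 as a 32-bit size_t would. The function names are hypothetical; this illustrates the arithmetic, not a patch to the C code.

```python
SIZE_MAX_32 = 2**32 - 1  # largest size_t value on a 32-bit build


def wrapped_alloc_size(count: int, elem_size: int, size_max: int = SIZE_MAX_32) -> int:
    """What an unchecked count * elem_size becomes in C unsigned arithmetic."""
    return (count * elem_size) % (size_max + 1)


def checked_alloc_size(count: int, elem_size: int, size_max: int = SIZE_MAX_32) -> int:
    """The usual guard: refuse when count * elem_size would exceed size_max."""
    if elem_size != 0 and count > size_max // elem_size:
        raise OverflowError(f"{count} * {elem_size} does not fit in size_t")
    return count * elem_size


print(hex(wrapped_alloc_size(0x40000001, 4)))  # 0x4: malloc would receive 4 bytes
```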

I made a fork with wheels

I forked this package and put wheels (OSX, Windows, Linux) on PyPI as levenshtein.

https://pypi.org/project/levenshtein/

This should fix compiler related issues with install that people seem to be having.

The last release of this project is from 2014, and I can't find any comments from the maintainer since 2019, so I assume they've moved on. At the moment the fork is just minimal changes to make wheels, but if there's interest I would be up for maintaining it.

Levenshtein/_levenshtein.c:99:20: fatal error: Python.h: No such file or directory

Hi, I'm installing python-Levenshtein on Ubuntu 16.04 with the pip install python-Levenshtein command, but it fails with this error. The log is:
----------------------------------------
ERROR: Command errored out with exit status 1:
command: /usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-faa32rsy/python-Levenshtein/setup.py'"'"'; __file__='"'"'/tmp/pip-install-faa32rsy/python-Levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-_7ka3b22 --python-tag cp36
cwd: /tmp/pip-install-faa32rsy/python-Levenshtein/
Complete output (32 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/Levenshtein
copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.6/Levenshtein
copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.6/Levenshtein
running egg_info
writing python_Levenshtein.egg-info/PKG-INFO
writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
writing entry points to python_Levenshtein.egg-info/entry_points.txt
writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
writing requirements to python_Levenshtein.egg-info/requires.txt
writing top-level names to python_Levenshtein.egg-info/top_level.txt
reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*pyc' found anywhere in distribution
warning: no previously-included files matching '*so' found anywhere in distribution
warning: no previously-included files matching '.project' found anywhere in distribution
warning: no previously-included files matching '.pydevproject' found anywhere in distribution
writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.6/Levenshtein
copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.6/Levenshtein
running build_ext
building 'Levenshtein._levenshtein' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/Levenshtein
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.6m -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o
Levenshtein/_levenshtein.c:99:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

ERROR: Failed building wheel for python-Levenshtein
Running setup.py clean for python-Levenshtein
Failed to build python-Levenshtein
Installing collected packages: python-Levenshtein
Running setup.py install for python-Levenshtein ... error
ERROR: Command errored out with exit status 1:
command: /usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-faa32rsy/python-Levenshtein/setup.py'"'"'; __file__='"'"'/tmp/pip-install-faa32rsy/python-Levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-375x8gmb/install-record.txt --single-version-externally-managed --compile
cwd: /tmp/pip-install-faa32rsy/python-Levenshtein/
Complete output (32 lines):
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/Levenshtein
copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.6/Levenshtein
copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.6/Levenshtein
running egg_info
writing python_Levenshtein.egg-info/PKG-INFO
writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
writing entry points to python_Levenshtein.egg-info/entry_points.txt
writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
writing requirements to python_Levenshtein.egg-info/requires.txt
writing top-level names to python_Levenshtein.egg-info/top_level.txt
reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*pyc' found anywhere in distribution
warning: no previously-included files matching '*so' found anywhere in distribution
warning: no previously-included files matching '.project' found anywhere in distribution
warning: no previously-included files matching '.pydevproject' found anywhere in distribution
writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.6/Levenshtein
copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.6/Levenshtein
running build_ext
building 'Levenshtein._levenshtein' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/Levenshtein
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.6m -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.6/Levenshtein/_levenshtein.o
Levenshtein/_levenshtein.c:99:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-faa32rsy/python-Levenshtein/setup.py'"'"'; __file__='"'"'/tmp/pip-install-faa32rsy/python-Levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-375x8gmb/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

Can you tell me how to solve this problem?
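The compiler cannot find the CPython development headers. On Debian/Ubuntu these usually come from the python3-dev package, or a version-specific one such as python3.6-dev here (going by the -I/usr/include/python3.6m flag in the log). Before retrying pip, a short check with the standard library's sysconfig shows where the running interpreter expects Python.h and whether it is present; the helper names are hypothetical.

```python
import os
import sysconfig


def python_header_path() -> str:
    """Where this interpreter expects Python.h (its 'include' install path)."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


def has_python_headers() -> bool:
    """True if Python.h exists at that location."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    path = python_header_path()
    print(path, "found" if has_python_headers() else "missing")
```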

Py_UNICODE is deprecated

Py_UNICODE has been deprecated since Python 3.3, and we are planning to remove it in Python 3.11.
Would you replace Py_UNICODE with wchar_t, and PyUnicode_FromUnicode with PyUnicode_FromWideChar?

./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1001:      result = PyUnicode_FromUnicode(medstr, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1088:      result = PyUnicode_FromUnicode(medstr, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1930:      result = PyUnicode_FromUnicode(s, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1946:      result = PyUnicode_FromUnicode(s, len);

Levenshtein distance very slow on large files

When calculating the distance between two files that are each 1.5 MB long, the distance call takes a little over an hour on my i7. I'm not familiar with the runtime complexity of the algorithm you're using, but this seems like an inordinate amount of time. Is there anything that can be done to speed it up?
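If the implementation uses the classic dynamic programme (presumably the case here), it fills one table cell per pair of characters, i.e. O(n·m) time, so an hour is roughly what one would expect. A back-of-the-envelope estimate, assuming 1.5 MB means 1.5 million characters per file and an illustrative, unmeasured throughput of 5×10⁸ cells per second:

```python
n = m = 1_500_000         # characters per file (assumes 1 byte per character)
cells = n * m             # one DP cell per character pair
rate = 5e8                # cells per second: an illustrative guess, not a benchmark
print(cells)              # 2250000000000
print(cells / rate / 60)  # 75.0 (minutes)
```

Going substantially faster generally means not filling the full table, for example bounding the computation when only small distances matter, or comparing lines rather than characters.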

Early termination for Levenshtein.distance() possible?

Hello guys!

I would like to propose an enhancement to your project.

What I mean:
Similar to distance(str1, str2), a function named e.g. is_distance_bigger_than(str1, str2, int1).
If the Levenshtein distance between str1 and str2 is bigger than int1, the function would terminate early and return True. So it basically says: yes, the distance between those strings is bigger than int1. If it is not bigger than int1, the function would return False.

Why do I need it?
I am going through large databases of addresses and filtering out the similar ones. The early-termination function would be much quicker, because I am not interested in the actual distance, only in whether it is bigger or smaller than my "threshold distance".
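A minimal pure-Python sketch of the proposed check, using the name suggested above (a proposal, not an existing python-Levenshtein function). It relies on two properties of the standard dynamic programme: the distance is at least the length difference, and the minimum of each DP row never decreases from one row to the next, so once a whole row exceeds the threshold the final distance must too. The worst case is still O(len(a)·len(b)), but dissimilar pairs usually bail out early.

```python
def is_distance_bigger_than(a: str, b: str, k: int) -> bool:
    """True iff the Levenshtein distance between a and b exceeds k, stopping early when possible."""
    if abs(len(a) - len(b)) > k:
        return True  # at least that many inserts/deletes are needed
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(cur) > k:
            return True  # row minima never decrease, so the final distance exceeds k too
        prev = cur
    return prev[-1] > k
```

Restricting each row to the band of width 2k+1 around the diagonal (Ukkonen's cut-off) would reduce the cost to O(k·min(n, m)); that refinement is left out to keep the sketch short.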

Q: Different weight in .ratio and .distance

Calling Levenshtein.ratio gives a different result than calculating the ratio by hand, as shown in this example:

In [1]: import Levenshtein as lev
   ...: a = "twostring"
   ...: b = "threestring"
   ...: ldist = lev.distance(a, b)
   ...: lensum = len(a) + len(b)
   ...: ratio = lev.ratio(a, b)
   ...: myratio = (lensum - ldist) / lensum # ~ line 771
   ...: print("lev.ratio: {}\n my ratio: {}".format(ratio, myratio))
   ...:
lev.ratio: 0.7
 my ratio: 0.8

After reading through the code, I noticed that when levenshtein_common is called for ratio, the cost of the replace operation is increased. Is there a particular reason why the two functions calculate this differently?
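Both numbers can be reproduced with one dynamic programme in which only the substitution cost changes. With cost 1 the distance is 4, giving (20 - 4) / 20 = 0.8. With cost 2 a substitution is never cheaper than a delete plus an insert, the distance becomes 6, and (20 - 6) / 20 = 0.7. This is a pure-Python sketch of that reading of the code, not the library's implementation; weighted_distance and ratio_sketch are hypothetical names.

```python
def weighted_distance(a: str, b: str, replace_cost: int = 1) -> int:
    """Levenshtein distance where insert/delete cost 1 and substitution costs replace_cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            sub = 0 if ca == cb else replace_cost
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + sub))
        prev = cur
    return prev[-1]


def ratio_sketch(a: str, b: str) -> float:
    """(lensum - distance) / lensum with substitutions counted as 2."""
    lensum = len(a) + len(b)
    if lensum == 0:
        return 1.0
    return (lensum - weighted_distance(a, b, replace_cost=2)) / lensum


a, b = "twostring", "threestring"
print(weighted_distance(a, b))     # 4
print(weighted_distance(a, b, 2))  # 6
print(ratio_sketch(a, b))          # 0.7
```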
