fuzzy's People

Contributors

aldanor, chmullig, dhellmann, jaraco, kochelmonster

fuzzy's Issues

Are the algorithms based on English only?

I am not sure this is the best venue for the question, but are the algorithms written only for English at this point (or, as Wikipedia puts it, American Soundex)? I am wondering how effective they would be when applied to other languages and, if there is no support for them, whether any is planned.
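
To make the question concrete, here is a minimal pure-Python sketch of classic American Soundex. It is not fuzzy's C implementation, and the names `CODES` and `american_soundex` are made up for this illustration; fuzzy's own output may differ on some inputs. The point is only that the letter table covers the 26 unaccented Latin letters, so any other character is simply dropped.

```python
# Hypothetical illustration of American Soundex; not fuzzy's actual code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def american_soundex(word, size=4):
    # Keep only A-Z; accented or non-Latin letters are discarded here.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal codes
        code = CODES.get(ch, "")  # vowels map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    # Pad with zeros, then truncate to the requested length.
    return (result + "0" * size)[:size]


print(american_soundex("Catherine"))
print(american_soundex("Rosé"))
```

With this table, "Rosé" is encoded from the letters R, O and S only, which hints at how lossy the scheme is for names outside English.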

Fuzzy support for Unicode strings with unicode characters

Originally reported by: Alex Mikhalev (Bitbucket: alex_mikhalev, GitHub: Unknown)


Hello,
I found that fuzzy can't handle non-ASCII characters in unicode strings.
If I call DMetaphone with this product name:

Product name Blossom Hill White Zinfandel Rosé California (750ml)
Product name type <type 'unicode'>

I get this error:
/lib/python2.7/site-packages/fuzzy.so in fuzzy.DMetaphone.__call__()

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 28: ordinal not in range(128)

Strangely, if the product name is a str with the same value, every algorithm handles it properly.

I understand that Soundex and NYSIIS can't work on non-ASCII characters, but they should still be able to accept a unicode string as input.
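
One possible caller-side workaround, sketched here in Python 3 with only the standard library, is to strip accents before handing the text to a phonetic encoder. The helper name `to_ascii` is hypothetical, and this is not a fix inside fuzzy itself; it also silently drops characters that have no ASCII base letter.

```python
import unicodedata


def to_ascii(text):
    # NFKD splits "é" into "e" plus a combining accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" discards anything outside ASCII,
    # including the combining accent left by the decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


name = "Blossom Hill White Zinfandel Rosé California (750ml)"
print(to_ascii(name))
```

The result contains only ASCII characters, so it can be passed to an encoder that expects them.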


changing lower case letters to capital

Originally reported by: Anonymous


I found that calling soundex() changes the input string to upper case. Even creating a deep copy does not prevent the change.

My current workaround is to create a new string, append the letters of the input string to it one at a time, and then pass the new string to soundex().


Soundex method clobbers namespace

Originally reported by: Anonymous


Computing the Soundex for a string that matches an imported module seems to clobber that module's namespace. See below from my interactive shell:

>>> import datetime, fuzzy
>>> soundex = fuzzy.Soundex(4)
>>> datetime
<module 'datetime' from '/usr/lib/python2.6/lib-dynload/datetime.so'>
>>> soundex('datetime')
'D350'
>>> datetime
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'datetime' is not defined
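
A plausible explanation, which is an assumption and not confirmed in the report, is string interning: CPython keeps a single shared object for identifier-like strings, so the literal 'datetime' passed to Soundex and the key for the name datetime in the namespace dict may be the same object. If that object is upper-cased in place, the dict key changes under the interpreter's feet. The sketch below only demonstrates the sharing through `sys.intern`; it does not reproduce the bug.

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them.
parts = ["date", "time"]
first = "".join(parts)
second = "".join(parts)

# sys.intern returns the one canonical object for a given string value.
a = sys.intern(first)
b = sys.intern(second)

print(a == "datetime")
print(a is b)  # both names refer to the single interned object
```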

DMetaphone has issues with long words

Originally reported by: Brian (Bitbucket: eode, GitHub: eode)


>>> import fuzzy
>>> fdm = fuzzy.DMetaphone()
>>> fdm10 = fuzzy.DMetaphone(10)

>>> # note that this also trounces the 's' phoneme of 'decent'
>>> fdm('decent')
['TKNT', None]

>>> fdm('decentralization')
['TKNT', None]

>>> fdm10('decentralization')
['TKNT', None]

>>> # ...for comparison:
>>> import metaphone
>>> mdm = metaphone.dm

>>> mdm('decent')
('TSNT', '')

>>> mdm('decentralization')
('TSNTRLSXN', '')

Expected behavior:

  • Produce phonemes for the whole word, or for the word up to the specified length.

pip install fails for python 3

pip3 install fuzzy fails with the error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Environment: Ubuntu Xenial, with all build dependencies installed and up to date.
System Python is 3.5.2.
The package installs fine with the system Python 2.7.12.

fuzzy_python3_error.txt

soundex modifies input argument

Originally reported by: Doug Hellmann (Bitbucket: dhellmann, GitHub: dhellmann)


The soundex implementation modifies the characters of the input Python string, changing the case of the letters. It doesn't look like any of the other algorithms have this problem.

For example, this Python code:

import fuzzy

names = [ 'Catherine', 'Katherine', 'Katarina',
          'Johnathan', 'Jonathan', 'John',
          ]

for n in names:
    print n, fuzzy.Soundex(4)(n), n

produces this output:

$ python show_soundex.py 
Catherine C365 CATHERINe
Katherine K365 KATHERINe
Katarina K365 KATARINa
Johnathan J535 JOHNATHAN
Jonathan J535 JONATHAN
John J500 JOHN

soundex breaks deepcopy somehow

Originally reported by: Anonymous


Running Soundex on a string changes the original string to uppercase.
That might be acceptable on its own, but interestingly it also changes a deep copy of the original string, which seems pretty wrong.

>>> import copy, fuzzy
>>> x = "blabla"
>>> y = copy.deepcopy(x)
>>> sndex = fuzzy.Soundex(32)
>>> print sndex(x)
B4140000000000000000000000000000
>>> print x
BLABLA
>>> print y
BLABLA
>>> # running soundex on x changes deep copy y!
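
The deep copy changes because it is not a copy at all: Python strings are immutable, so `copy.deepcopy` returns the very same object, and an extension that writes into the string's buffer affects every name bound to it. The sketch below shows this with the standard library only (no fuzzy needed). Rebuilding the string from its characters, as one reporter did, yields a distinct object in CPython; that last point is an implementation detail, not a language guarantee.

```python
import copy

original = "blabla"

# deepcopy treats str as atomic and hands back the same object.
deep = copy.deepcopy(original)
print(deep is original)

# Joining the individual characters builds a new string object
# (CPython behaviour for multi-character strings).
rebuilt = "".join(list(original))
print(rebuilt == original)
print(rebuilt is original)
```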

Soundex Appears Broken?

Using the test case, in Python 3.5:

phrase = 'FancyFree'
print(repr(fuzzy.Soundex(4)(phrase)))

yields: ''

Occasionally, instead of yielding an empty string, it raises a Unicode error. DMetaphone and NYSIIS work fine in this install, so I don't believe it is an install error.
