py-hausdorff

Fast computation of Hausdorff distance in Python.

This code implements the algorithm presented in An Efficient Algorithm for Calculating the Exact Hausdorff Distance (DOI: 10.1109/TPAMI.2015.2408351) by Aziz and Hanbury.

Installation

Via PyPI:

pip install hausdorff

Or you can clone this repository and install it manually:

python setup.py install

Example Usage

The main function is:

hausdorff_distance(np.ndarray[:,:] X, np.ndarray[:,:] Y)

It computes the Hausdorff distance between the rows of X and Y, using the Euclidean distance as the default metric. It accepts the optional argument distance (string or callable), which is the function used to compute the distance between the rows of X and Y. As a string, it can be any of the following: manhattan, euclidean (default), chebyshev, cosine and haversine. As a callable, it should be a numba-decorated function (see the example below).

Note: The haversine distance is calculated assuming lat, lng coordinate ordering and assumes the first two coordinates of each point are latitude and longitude respectively.
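To make the coordinate ordering concrete, here is a minimal pure-Python sketch of a haversine distance that reads each point as (lat, lng) in degrees. The function name haversine_km and the Earth radius of 6371 km are assumptions made for illustration; the radius and units used inside the package may differ.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not taken from the package


def haversine_km(point_a, point_b):
    """Great-circle distance between two (lat, lng) points given in degrees."""
    lat_a, lng_a = radians(point_a[0]), radians(point_a[1])
    lat_b, lng_b = radians(point_b[0]), radians(point_b[1])
    d_lat = lat_b - lat_a
    d_lng = lng_b - lng_a
    # Haversine formula: a is the squared half-chord length between the points
    a = sin(d_lat / 2) ** 2 + cos(lat_a) * cos(lat_b) * sin(d_lng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Latitude comes first in each pair.
paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
print(haversine_km(paris, london))
```

Swapping the two coordinates of a point silently gives a different (wrong) distance, which is why the ordering matters.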

Basic Usage

import numpy as np
from hausdorff import hausdorff_distance

# two random 2D arrays (second dimension must match)
np.random.seed(0)
X = np.random.random((1000,100))
Y = np.random.random((5000,100))

# Test computation of Hausdorff distance with different base distances
print(f"Hausdorff distance test: {hausdorff_distance(X, Y, distance='manhattan')}")
print(f"Hausdorff distance test: {hausdorff_distance(X, Y, distance='euclidean')}")
print(f"Hausdorff distance test: {hausdorff_distance(X, Y, distance='chebyshev')}")
print(f"Hausdorff distance test: {hausdorff_distance(X, Y, distance='cosine')}")

# For haversine, use 2D lat, lng coordinates
def rand_lat_lng(N):
    lats = np.random.uniform(-90, 90, N)
    lngs = np.random.uniform(-180, 180, N)
    return np.stack([lats, lngs], axis=-1)
        
X = rand_lat_lng(100)
Y = rand_lat_lng(250)
print("Hausdorff haversine test: {0}".format( hausdorff_distance(X, Y, distance="haversine") ))

Custom distance function

The distance function is used to calculate the distances between the rows of the two 2-dimensional input arrays. For optimal performance, a custom distance function should be decorated with @numba.jit in nopython mode.

import numba
import numpy as np
from math import sqrt
from hausdorff import hausdorff_distance

# two random 2D arrays (second dimension must match)
np.random.seed(0)
X = np.random.random((1000,100))
Y = np.random.random((5000,100))

# write your own crazy custom function here
# this function should take two 1-dimensional arrays as input
# and return a single float value as output.
@numba.jit(nopython=True, fastmath=True)
def custom_dist(array_x, array_y):
    n = array_x.shape[0]
    ret = 0.
    for i in range(n):
        ret += (array_x[i]-array_y[i])**2
    return sqrt(ret)

print(f"Hausdorff custom euclidean test: {hausdorff_distance(X, Y, distance=custom_dist)}")

# a real crazy custom function
@numba.jit(nopython=True, fastmath=True)
def custom_dist(array_x, array_y):
    n = array_x.shape[0]
    ret = 0.
    for i in range(n):
        ret += (array_x[i]-array_y[i])**3 / (array_x[i]**2 + array_y[i]**2 + 0.1)
    return ret

print(f"Hausdorff custom crazy test: {hausdorff_distance(X, Y, distance=custom_dist)}")

Contributors

mavillan, pkla, sdodd-bsky

py-hausdorff's Issues

Performance improvement for arrays with small second dimension

Let's start with a small example (possible speedup by a factor of 20)

Two issues are reducing performance:

  • The compiler doesn't know the second shape of the input arrays (unable to unroll the loop)
  • Calculating the square root is costly and can be avoided in the loop
import numpy as np
import hausdorff
import numba

# u and v are the names used in the timings below
u = np.random.rand(10_000,2)
v = np.random.rand(10_000,2)

@numba.jit(nopython=True, fastmath=True)
def cust_euclidian_hausdorff(XA, XB):
    nA = XA.shape[0]
    nB = XB.shape[0]
    
    cmax = 0.
    for i in range(nA):
        cmin = np.inf
        for j in range(nB):
            d = (XA[i,0]- XB[j,0])**2+(XA[i,1]- XB[j,1])**2
            if d<cmin:
                cmin = d
            if cmin<cmax:
                break
        if cmin>cmax and np.inf>cmin:
            cmax = cmin
    for j in range(nB):
        cmin = np.inf
        for i in range(nA):
            d = (XA[i,0]- XB[j,0])**2+(XA[i,1]- XB[j,1])**2
            if d<cmin:
                cmin = d
            if cmin<cmax:
                break
        if cmin>cmax and np.inf>cmin:
            cmax = cmin
    return np.sqrt(cmax)

#As shown above
%timeit cust_euclidian_hausdorff(u, v)
#21.7 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

#With calculating sqrt in the loop, instead of comparing squared distances
%timeit cust_euclidian_hausdorff(u, v)
#33.2 ms ± 966 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Standard implementation
%timeit hausdorff.hausdorff_distance(u,v)
#413 ms ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
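The second point works because the square root is monotonically increasing: taking minima and maxima over squared distances selects the same values, so a single sqrt at the very end is enough. A small pure-Python check of that equivalence follows; the helper names are illustrative and not part of the package.

```python
from math import sqrt
import random


def directed_sq(points_a, points_b):
    """Max over A of the min squared Euclidean distance to B (no sqrt inside)."""
    return max(
        min(sum((a - b) ** 2 for a, b in zip(pa, pb)) for pb in points_b)
        for pa in points_a
    )


def directed_with_sqrt(points_a, points_b):
    """Same quantity, but taking the square root for every pair."""
    return max(
        min(sqrt(sum((a - b) ** 2 for a, b in zip(pa, pb))) for pb in points_b)
        for pa in points_a
    )


random.seed(0)
A = [(random.random(), random.random()) for _ in range(200)]
B = [(random.random(), random.random()) for _ in range(200)]

# One sqrt in total versus one sqrt per pair of points
fast = sqrt(max(directed_sq(A, B), directed_sq(B, A)))
slow = max(directed_with_sqrt(A, B), directed_with_sqrt(B, A))
print(abs(fast - slow) < 1e-12)
```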

Possible workarounds

  • Implement unrolled function for arrays with small second dimension
  • Provide a method to generate a specialized function (with size of the second dimension as input, the compiler will do the remaining part)

Example for function generation

distances.py

import numba
import numpy as np
from math import sqrt, pow, cos, sin, asin

def euclidean(n):
    @numba.jit(nopython=True, fastmath=True)
    def inner(array_x, array_y):
        ret = 0.
        for i in range(n):
            ret += (array_x[i]-array_y[i])**2
        return ret

    @numba.jit(nopython=True, fastmath=True)
    def outer(val):
        return sqrt(val)
    return inner, outer

hausdorff.py

import numpy as np
import numba
import hausdorff.distances as distances
from inspect import getmembers

def gen_hausdorff_dist(inner_dist,outer_dist,shape_1):
    @numba.jit(nopython=True, fastmath=True)
    def _hausdorff_dist(XA, XB):
        assert XA.ndim == 2 and XB.ndim == 2, \
            'arrays must be 2-dimensional'
        assert XA.shape[1] == shape_1, \
            'arrays must have predefined shape[1]'
        assert XB.shape[1] == shape_1, \
            'arrays must have predefined shape[1]'
        
        nA = XA.shape[0]
        nB = XB.shape[0]
        cmax = 0.
        for i in range(nA):
            cmin = np.inf
            for j in range(nB):
                d = inner_dist(XA[i,:], XB[j,:])
                if d<cmin:
                    cmin = d
                if cmin<cmax:
                    break
            if cmin>cmax and np.inf>cmin:
                cmax = cmin
        for j in range(nB):
            cmin = np.inf
            for i in range(nA):
                d = inner_dist(XA[i,:], XB[j,:])
                if d<cmin:
                    cmin = d
                if cmin<cmax:
                    break
            if cmin>cmax and np.inf>cmin:
                cmax = cmin
        return outer_dist(cmax)
    return _hausdorff_dist

def gen_hausdorff_distance(shape_1, distance='euclidean'):
    distance_function = getattr(distances, distance)
    inner_dist,outer_dist = distance_function(shape_1)
    
    return gen_hausdorff_dist(inner_dist,outer_dist,shape_1)

Timings

#Declare it only once if possible; redeclaring it in a loop will lead to recompilation
hausdorff_distance=hausdorff.gen_hausdorff_distance(u.shape[1])
%timeit hausdorff_distance(u, v)
#20.4 ms ± 41 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Trying to use this in clean Python and Conda leads to error

Hi, I try to use this both in regular Python and in Conda with Python 3.6 and I'm getting this error:

Traceback (most recent call last):
  File "/Users/tomi/Github/learning/jupyter/analysewot.py", line 17, in <module>
    t = main()
  File "/Users/tomi/Github/learning/jupyter/analysewot.py", line 13, in main
    return similarity(1600)
  File "/Users/tomi/Github/learning/jupyter/helpers/similarity.py", line 68, in similarity
    result = comp(selected_data, all_data[index])
  File "/Users/tomi/Github/learning/jupyter/helpers/compare.py", line 94, in comp
    dist = hausdorff_distance(frame[0], current[0], distance="euclidean")
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/hausdorff/hausdorff.py", line 48, in hausdorff_distance
    return _hausdorff(XA, XB, distance_function)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/dispatcher.py", line 330, in _compile_for_args
    raise e
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/dispatcher.py", line 307, in _compile_for_args
    return self.compile(tuple(argtypes))
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/dispatcher.py", line 579, in compile
    cres = self._compiler.compile(args, return_type)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/dispatcher.py", line 80, in compile
    flags=flags, locals=self.locals)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/compiler.py", line 763, in compile_extra
    return pipeline.compile_extra(func)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/compiler.py", line 360, in compile_extra
    return self._compile_bytecode()
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/compiler.py", line 722, in _compile_bytecode
    return self._compile_core()
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/compiler.py", line 709, in _compile_core
    res = pm.run(self.status)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/compiler.py", line 246, in run
    raise patched_exception
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/compiler.py", line 238, in run
    stage()
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/compiler.py", line 452, in stage_nopython_frontend
    self.locals)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/compiler.py", line 865, in type_inference_stage
    infer.propagate()
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/typeinfer.py", line 844, in propagate
    raise errors[0]
numba.errors.TypingError: Failed at nopython (nopython frontend)
Internal error at <numba.typeinfer.ArgConstraint object at 0x1128ae1d0>:
--%<-----------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/errors.py", line 259, in new_error_context
    yield
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/typeinfer.py", line 189, in __call__
    assert ty.is_precise()
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/typeinfer.py", line 137, in propagate
    constraint(typeinfer)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/typeinfer.py", line 190, in __call__
    typeinfer.add_type(self.dst, ty, loc=self.loc)
  File "/Users/tomi/anaconda3/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/errors.py", line 265, in new_error_context
    six.reraise(type(newerr), newerr, sys.exc_info()[2])
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/six.py", line 658, in reraise
    raise value.with_traceback(tb)
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/errors.py", line 259, in new_error_context
    yield
  File "/Users/tomi/anaconda3/lib/python3.6/site-packages/numba/typeinfer.py", line 189, in __call__
    assert ty.is_precise()
numba.errors.InternalError: 
[1] During: typing of argument at /Users/tomi/anaconda3/lib/python3.6/site-packages/hausdorff/hausdorff.py (14)
--%<-----------------------------------------------------------------

File "../../../anaconda3/lib/python3.6/site-packages/hausdorff/hausdorff.py", line 14

This error may have been caused by the following argument(s):
- argument 2: cannot determine Numba type of <class 'numba.targets.registry.CPUDispatcher'>

bug: ModuleNotFoundError: No module named 'hausdorff.distances'; 'hausdorff' is not a package

Traceback (most recent call last):
File "D:/coding/py-hausdorff-master/hausdorff/hausdorff.py", line 3, in
import hausdorff.distances as distances
File "D:\coding\py-hausdorff-master\hausdorff\hausdorff.py", line 3, in
import hausdorff.distances as distances
ModuleNotFoundError: No module named 'hausdorff.distances'; 'hausdorff' is not a package
When I try to run the code, the error shown above occurs. How can I fix it? I have installed all packages as instructed.
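Judging only from the traceback, one plausible cause is that hausdorff/hausdorff.py is being run directly as a script. In that case the script's own directory is first on sys.path, so the name hausdorff resolves to the file hausdorff.py rather than to the package directory. The sketch below reproduces this in a temporary directory; the file contents are stand-ins, not the package's real code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def reproduce():
    """Build a stand-in package and load its inner module in two different ways."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "hausdorff"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "distances.py").write_text("NAME = 'distances'\n")
        (pkg / "hausdorff.py").write_text(
            "import hausdorff.distances as distances\n"
            "print(distances.NAME)\n"
        )
        # 1) Run the inner file as a script: 'hausdorff' resolves to hausdorff.py
        direct = subprocess.run(
            [sys.executable, str(pkg / "hausdorff.py")],
            capture_output=True, text=True,
        )
        # 2) Import it from the parent directory: 'hausdorff' resolves to the package
        as_package = subprocess.run(
            [sys.executable, "-c", "import hausdorff.hausdorff"],
            cwd=tmp, capture_output=True, text=True,
        )
    return direct, as_package


direct, as_package = reproduce()
print(direct.returncode != 0, "is not a package" in direct.stderr)
print(as_package.returncode, as_package.stdout.strip())
```

If this is the cause, importing the installed package from a script located outside the hausdorff directory should avoid the error.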

Second Loop

I cannot see it, but what is the difference between the loop at lines 27-36 and the one at lines 17-26 in hausdorff.py? I see that it is A over B and then B over A, so I must be missing something. Thanks for your patience.

Directionality of Hausdorff distance

For geospatial polygons, the Hausdorff distance between X and Y will be different if the polygons have a different number of vertices. The definition of Hausdorff distance in this context is the maximum distance between closest vertex pairs.

Is it possible to obtain the directed Hausdorff distance? The following statements always return the same value.

print("Hausdorff haversine test: {0}".format( hausdorff_distance(X, Y, distance="haversine") ))
print("Hausdorff haversine test: {0}".format( hausdorff_distance(Y, X, distance="haversine") ))
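Both calls agreeing is consistent with the function returning the symmetric distance, that is, the larger of the two directed distances. A minimal pure-Python sketch of the directed variant, assuming Euclidean distance; the names directed_hausdorff and symmetric_hausdorff are hypothetical helpers, not part of this package's API.

```python
from math import dist  # available from Python 3.8


def directed_hausdorff(points_a, points_b):
    """Largest distance from a point in A to its nearest neighbour in B."""
    return max(min(dist(pa, pb) for pb in points_b) for pa in points_a)


def symmetric_hausdorff(points_a, points_b):
    """Larger of the two directed distances; independent of argument order."""
    return max(
        directed_hausdorff(points_a, points_b),
        directed_hausdorff(points_b, points_a),
    )


X = [(0.0, 0.0)]
Y = [(0.0, 0.0), (3.0, 4.0)]
print(directed_hausdorff(X, Y))   # 0.0: the only point of X also lies in Y
print(directed_hausdorff(Y, X))   # 5.0: (3, 4) is 5 away from the only point of X
print(symmetric_hausdorff(X, Y))  # 5.0
```

For Euclidean distance on NumPy arrays, SciPy's scipy.spatial.distance.directed_hausdorff is an existing directed implementation.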

Data structure does not support Tensor?

I use this hausdorff function as a loss function in a Keras-based neural network, and an error occurs inside the hausdorff function: a bytes-like object is required, not 'Tensor'
Could someone help solve it? Thanks!

Dice distance

Hi,
Thank you for this package!
I'd like to use dice distance. I took a look at how the distances are calculated and it reminded me of the distances in pynndescent - so I think this could be implemented in the same way. See:
https://github.com/lmcinnes/pynndescent/blob/13cf8e0a9f60741e4c1847a5447648fc4c1cae8d/pynndescent/distances.py#L209
i.e.

@numba.njit(fastmath=True, cache=True)
def dice(x, y):
    num_true_true = 0.0
    num_not_equal = 0.0
    for i in range(x.shape[0]):
        x_true = x[i] != 0
        y_true = y[i] != 0
        num_true_true += x_true and y_true
        num_not_equal += x_true != y_true

    if num_not_equal == 0.0:
        return 0.0
    else:
        return num_not_equal / (2.0 * num_true_true + num_not_equal)

happy to submit a PR but it's not much more than CTRL C CTRL V!

How to calculate hausdorff distance between two 3D arrays?

Is it right to calculate the 3D hausdorff using Euclidean as below:

@numba.jit(nopython=True, fastmath=True)
def euclidean(array_x, array_y):
    n = array_x.shape[0]
    m = array_x.shape[1]
    ret = 0.
    for i in range(n):
        for j in range(m):
            ret += (array_x[i,j,:]-array_y[i,j,:])**2
    return sqrt(ret)
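The documented interface takes 2-dimensional arrays whose rows are points, and a custom distance function receives two 1-dimensional rows, so one hedged reading of this question is that each 3D array first has to be turned into an (N, d) array of points. Two common cases are sketched below with NumPy only; the helper names are illustrative, and which case applies depends on what the 3D arrays represent.

```python
import numpy as np


def volume_to_points(mask):
    """Binary 3D volume -> (N, 3) array holding the index of each non-zero voxel."""
    return np.argwhere(mask).astype(np.float64)


def grid_to_points(grid):
    """Array of shape (a, b, d) with one d-dimensional point per cell -> (a*b, d)."""
    return grid.reshape(-1, grid.shape[-1])


mask = np.zeros((4, 4, 4), dtype=bool)
mask[0, 0, 0] = True
mask[1, 2, 3] = True
print(volume_to_points(mask))  # rows [0, 0, 0] and [1, 2, 3]

grid = np.arange(24, dtype=np.float64).reshape(2, 4, 3)
print(grid_to_points(grid).shape)  # (8, 3)
```

Either result has the 2-dimensional shape expected for X and Y, so the built-in euclidean distance could then be used without a custom function.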

Installation problems

This is maybe more a comment for other users than a ticket.

I had some problems installing this package on Windows 10 - 64bit.

>python setup.py install
running install
running build
running build_ext
skipping 'hausdorff.c' Cython extension (up-to-date)
building 'hausdorff' extension
error: Unable to find vcvarsall.bat

Here's how I solved it:

First I installed Microsoft Visual C++ Compiler for Python 2.7
Then in the setup.py I changed two things:

First the imports:

from

import numpy as np
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext

to

import numpy as np
try: 
  from setuptools import setup 
  from setuptools import Extension 
except ImportError: 
  from distutils.core import setup
  from distutils.extension import Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext

and if you get the error
LINK : fatal error LNK1181: cannot open input file 'm.lib'

you can comment out the line libraries=["m"],

Type()-checking for np.ndarray is overly restrictive to subclasses

Problem: In hausdorff.hausdorff_distance, the first assertion, that type(XA) is np.ndarray and type(XB) is np.ndarray, is overly restrictive as it disallows sub-classes of numpy.ndarray. For example, a common use-case is to call this function on the points of a PyVista mesh, which are stored in a pyvista.pyvista_ndarray. This object subclasses from numpy.ndarray and is functionally identical but this function raises an assertion error if an instance is passed in.

def hausdorff_distance(XA, XB, distance='euclidean'):
assert type(XA) is np.ndarray and type(XB) is np.ndarray, \

Steps to reproduce: Create a subclass of numpy.ndarray and pass an instance into hausdorff.hausdorff_distance.

import numpy as np
from hausdorff import hausdorff_distance

class my_ndarray(np.ndarray): pass
a = my_ndarray(np.array([5, 2]))
hausdorff_distance(a, a) # raises AssertionError
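The difference between the two kinds of check can be seen with NumPy alone. This is a sketch of the general behaviour, not a patch to the package; the class name MyArray is made up for the demonstration.

```python
import numpy as np


class MyArray(np.ndarray):
    """Trivial ndarray subclass used only for this demonstration."""


# view() reinterprets an existing array as the subclass without copying
a = np.array([[5.0, 2.0]]).view(MyArray)

print(type(a) is np.ndarray)              # False: exact type comparison rejects subclasses
print(isinstance(a, np.ndarray))          # True: isinstance accepts subclass instances
print(type(np.asarray(a)) is np.ndarray)  # True: asarray returns a base-class array
```

Until the assertion is relaxed, wrapping the inputs in np.asarray before the call is a possible caller-side workaround.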

Issue with latest version of numba

Hello,

With the latest version of numba, numba.targets has been replaced by numba.core.

Traceback (most recent call last):
  File "test.py", line 20, in <module>
    print("Hausdorff distance test: {0}".format( hausdorff_distance(X, Y, distance="manhattan") ))
  File "C:\Python37\lib\site-packages\hausdorff\hausdorff.py", line 40, in hausdorff_distance
    assert distance in _find_available_functions(distances), \
  File "C:\Python37\lib\site-packages\hausdorff\hausdorff.py", line 8, in _find_available_functions
    available_functions = [member[0] for member in all_members
  File "C:\Python37\lib\site-packages\hausdorff\hausdorff.py", line 9, in <listcomp>
    if isinstance(member[1], numba.targets.registry.CPUDispatcher)]
  File "C:\Python37\lib\site-packages\numba\__init__.py", line 140, in __getattr__
    ) from None
AttributeError: module 'numba' has no attribute 'targets'
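One way to support both layouts is to try the newer module path first and fall back to the older one. The helper below is a generic sketch using only the standard library; first_importable is a hypothetical name, and the two module paths are the ones mentioned in this report.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that can be imported, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the newer location and fall back to the older one.
registry = first_importable(["numba.core.registry", "numba.targets.registry"])
if registry is None:
    print("numba is not installed")
else:
    print(registry.CPUDispatcher)
```

The isinstance check in _find_available_functions could then refer to registry.CPUDispatcher instead of a hard-coded module path.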
