The lshash from smallsmallcase

lshash's Introduction

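1. LSH algorithm overview:

A family of hash functions H = {h: S → U} is called (r1, r2, p1, p2)-sensitive if, for a function h drawn at random from H, the following two conditions hold:

If d(O1, O2) < r1, then Pr[h(O1) = h(O2)] ≥ p1

If d(O1, O2) > r2, then Pr[h(O1) = h(O2)] ≤ p2

Here O1, O2 ∈ S are two data objects with multi-dimensional attributes, and d(O1, O2) is the dissimilarity between the two objects, that is, 1 - similarity. The definition is only useful when r1 < r2 and p1 > p2. Put plainly, the two conditions say that when two objects are similar enough, the probability that they map to the same hash value is high; when they are dissimilar enough, that probability is low.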
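The two conditions can be checked empirically on a toy family. The sketch below uses bit sampling over binary vectors, chosen only because its collision probability has a closed form; it is an illustrative assumption and not necessarily the scheme the lshash library uses. With d taken as the normalized Hamming distance, a randomly drawn h collides with probability 1 - d(O1, O2), so the family is (r1, r2, 1 - r1, 1 - r2)-sensitive. All function names here are hypothetical.

```python
import random


def normalized_hamming(x, y):
    """Dissimilarity d(x, y): fraction of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def draw_hash(dim, rng):
    """Draw one h from the bit-sampling family: h(x) = x[i] for a random index i."""
    i = rng.randrange(dim)
    return lambda x: x[i]


def collision_rate(x, y, trials=20000, seed=0):
    """Estimate Pr[h(x) = h(y)] over random draws of h from the family."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = draw_hash(len(x), rng)
        hits += h(x) == h(y)
    return hits / trials


base = [0] * 20
near = [1] * 2 + [0] * 18   # d(base, near) = 0.1
far = [1] * 16 + [0] * 4    # d(base, far) = 0.8

near_rate = collision_rate(base, near)  # estimate of 1 - 0.1
far_rate = collision_rate(base, far)    # estimate of 1 - 0.8
```

The similar pair collides far more often than the dissimilar pair, which is the gap between p1 and p2 that an LSH index relies on.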

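2. Project overview:

The paper folder holds the papers I crawled from the ** paper site (**论文网) using the scrapy library.

test.txt is taken from one of those papers, with some text from other papers added to it.

lshash is a third-party library that I installed.

main.py contains the implementation.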

3. Libraries used:

lshash, jieba
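For readers who want a feel for how these pieces can fit together, here is a minimal, dependency-free sketch. It is not the code in main.py: `text_to_vector` and `RandomProjectionIndex` are hypothetical names, the random-hyperplane index is a toy stand-in for the lshash library, and the regex tokenizer is only a fallback. For Chinese text, a word segmenter such as `jieba.lcut` would be passed as `tokenize` instead.

```python
import random
import re
import zlib
from collections import defaultdict


def text_to_vector(text, dim=64, tokenize=None):
    """Map text to a fixed-length token-count vector via feature hashing."""
    # Assumption: for Chinese text a segmenter such as jieba.lcut is passed as
    # `tokenize`; the regex fallback only keeps this sketch dependency-free.
    tokens = tokenize(text) if tokenize else re.findall(r"\w+", text.lower())
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


class RandomProjectionIndex:
    """Toy single-table LSH index (a stand-in for illustration, not lshash itself)."""

    def __init__(self, hash_size, input_dim, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
                       for _ in range(hash_size)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return "".join(
            "1" if sum(p * v for p, v in zip(plane, vec)) > 0 else "0"
            for plane in self.planes
        )

    def index(self, vec, label):
        self.buckets[self._signature(vec)].append(label)

    def query(self, vec):
        """Return labels of indexed items that share the query's bucket."""
        return list(self.buckets.get(self._signature(vec), []))


papers = {
    "paper_a": "locality sensitive hashing maps similar items to the same bucket",
    "paper_b": "web crawlers download pages and follow the links they contain",
}
index = RandomProjectionIndex(hash_size=6, input_dim=64)
for name, text in papers.items():
    index.index(text_to_vector(text), name)

# A query identical to an indexed document always lands in that document's bucket.
matches = index.query(text_to_vector(papers["paper_a"]))
```

A text that mixes passages from several papers, like test.txt, is only likely (not guaranteed) to share a bucket with its main source, which is why practical indexes use several hash tables and then rank the candidates by distance.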


lshash's Issues

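Hello, do you know how to change the distance function in lsh?
Line 298 of lshash.py:
    @staticmethod
    def euclidean_dist(x, y):
        """ This is a hot function, hence some optimizations are made. """
        diff = np.array(x) - y
        return np.sqrt(np.dot(diff, diff))
I want to change how the distance is computed, so I changed diff to diff = np.array(x)-(mean(x)-mean(y))-y
After the change, I ran lsh.query([1,2,3,4,5,6],distance_func="euclidean") again,
and the results were exactly the same as before the change.
So it seems a @staticmethod cannot be modified directly like this. Do you know the correct way to change it? Thank you.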
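One way to experiment with a different distance without editing the installed library is to re-rank candidate points yourself. The sketch below is an illustration under stated assumptions, not a confirmed fix for this issue: `mean_shifted_dist` implements the formula from the question in plain Python, `rerank` is a hypothetical helper, and the `candidates` list stands in for points that would be extracted from whatever lsh.query returns. Two things that this thread does not confirm are also worth checking: that the edited lshash.py is the copy Python actually imports, and which function the string "euclidean" is mapped to inside query.

```python
import math


def mean_shifted_dist(x, y):
    """Distance from the question: diff = x - (mean(x) - mean(y)) - y."""
    shift = sum(x) / len(x) - sum(y) / len(y)
    diff = [a - shift - b for a, b in zip(x, y)]
    return math.sqrt(sum(d * d for d in diff))


def rerank(query_point, candidates, dist=mean_shifted_dist):
    """Sort candidate points by a caller-supplied distance (hypothetical helper)."""
    scored = [(tuple(c), dist(query_point, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1])


# Placeholder candidates; in practice these would come from the index lookup.
candidates = [[1, 2, 4], [11, 12, 13]]
ranked = rerank([1, 2, 3], candidates)
```

Under the mean-shifted distance, [11, 12, 13] ranks ahead of [1, 2, 4] for the query [1, 2, 3], because it differs from the query only by a constant offset. Plain Euclidean distance would order the two the other way around.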


About installing LSHash

About LSHASH
