lopq's Issues

Got wrong results using the simple Python code on the SIFT1M dataset

I rewrote example.py, replacing the input dataset with SIFT1M, and I got results that look like a wrong evaluation:

Recall (V=16, M=8, subquants=256): [0.2018 0.4247 0.5168 0.5218]
Recall (V=16, M=16, subquants=256): [0.3124 0.5057 0.5218 0.5218]
Recall (V=16, M=8, subquants=512): [0.2219 0.4477 0.5198 0.5218]

I also get an error when I try to use the GIST1M dataset:

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
    send(obj)
SystemError: NULL result without error in PyObject_Call

Is the code in the python folder not meant for large datasets like SIFT?
Or is there some mistake in my data import process? (My .fvecs reader is sketched below for reference.)
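
This is the reader I'm using, in case the problem is in the import step (a minimal sketch assuming the standard TEXMEX layout, where each record is an int32 dimension followed by that many float32 values):

    import numpy as np

    def read_fvecs(path):
        # Each record: int32 dimension d, then d float32 components.
        raw = np.fromfile(path, dtype=np.float32)
        dim = raw[:1].view(np.int32)[0]          # dimension from the first 4 bytes
        return raw.reshape(-1, dim + 1)[:, 1:]   # drop the leading dimension column

    # base = read_fvecs('sift/sift_base.fvecs')  # expect shape (1000000, 128) for SIFT1M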

Looking forward to your reply.
Thanks a lot!

python search ignores fine codes

From reading the code and stepping through it under pdb, it appears that the Python search algorithm (both the dict and LMDB backends) ignores the fine codes and ranks results using only the coarse cells.
Can you please confirm whether this is the case?
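
For context, this is the kind of fine-code reranking I expected to see happen (a generic product-quantizer sketch with my own names, not lopq's actual internals; it assumes the query has already been projected into the model's local space):

    import numpy as np

    def rerank_with_fine_codes(query, candidates, subquantizer_centroids):
        # query: projected query vector of dimension D
        # candidates: list of (id, fine_code), fine_code = tuple of M indices
        # subquantizer_centroids: list of M arrays, each (clusters, D / M)
        M = len(subquantizer_centroids)
        chunks = np.split(query, M)  # one subvector per subquantizer
        # per-subquantizer table: distance from the query chunk to every centroid
        tables = [((cents - chunk) ** 2).sum(axis=1)
                  for chunk, cents in zip(chunks, subquantizer_centroids)]
        scored = [(sum(tables[m][code[m]] for m in range(M)), id_)
                  for id_, code in candidates]
        return [id_ for _, id_ in sorted(scored)]  # ascending approximate distance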

Help with PCA/Spark docs

In the LOPQ Training section at https://github.com/yahoo/lopq/blob/master/spark/README.md I see:

Here is an example of training a full model from scratch and saving the model parameters as both a pickle file and a protobuf file:

spark-submit train_model.py \
    --data /hdfs/path/to/data \
    --V 16 \
    --M 8 \
    --model_pkl /hdfs/output/path/model.pkl \
    --model_proto /hdfs/output/path/model.lopq

But above that, in the PCA Training section, I see:

A necessary preprocessing step for training is to PCA and variance balance the raw data vectors to produce the LOPQ data vectors, i.e. the vectors that LOPQ will quantize.

It's not clear to me how the outputs of the train_pca.py script are supposed to feed into the train_model.py script. Am I supposed to use the results of train_pca.py to do the variance balancing myself and then feed that into train_model.py, or does "training a full model from scratch" take care of that step for me?
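
To make the question concrete, this is my reading of the preprocessing being described (a sketch with my own function and variable names, not the actual contents of train_pca.py): PCA-project the raw vectors, then permute dimensions so the two halves of each vector carry roughly balanced variance.

    import numpy as np

    def pca_and_balance(X, D_out):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X - mu, rowvar=False))
        top = np.argsort(eigvals)[::-1][:D_out]      # keep the top D_out components
        eigvals, P = eigvals[top], eigvecs[:, top]
        # Greedy balance: assign eigenvalues (largest first) to whichever half
        # currently has the smaller log-variance sum and still has room.
        halves, sums, cap = [[], []], [0.0, 0.0], D_out // 2
        for i in np.argsort(eigvals)[::-1]:
            h = 0 if (sums[0] <= sums[1] and len(halves[0]) < cap) else 1
            if len(halves[h]) >= cap:
                h = 1 - h
            halves[h].append(i)
            sums[h] += np.log(eigvals[i] + 1e-12)
        perm = np.array(halves[0] + halves[1])
        return (X - mu).dot(P)[:, perm], mu, P, perm  # LOPQ-ready vectors + params

Is this roughly what happens internally when "training a full model from scratch", or do I need to apply something like this myself before calling train_model.py?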

Problem with importing the lopq package on Windows

Hi!

I have a Windows 10 machine with Python 3.6.5. I ran the command `pip install lopq` and the installation completed successfully (see below):

C:\Users\IMarroquin\Documents\My_Python_Scripts\MLP\Well_8_to_five_wells\Independent_Scripts\Big_Data_For_Paradise>pip install lopq
Collecting lopq
Downloading https://files.pythonhosted.org/packages/79/f9/d00a4944cf52688f112699ac99d6c86b11b865e054055014ad6d3d2cb768/lopq-1.0.35.tar.gz
Requirement already satisfied: protobuf>=2.6 in c:\temp\python\python36\lib\site-packages (from lopq)
Requirement already satisfied: numpy>=1.9 in c:\temp\python\python36\lib\site-packages (from lopq)
Requirement already satisfied: scipy>=0.14 in c:\temp\python\python36\lib\site-packages (from lopq)
Requirement already satisfied: scikit-learn>=0.15 in c:\temp\python\python36\lib\site-packages (from lopq)
Collecting lmdb>=0.87 (from lopq)
Downloading https://files.pythonhosted.org/packages/cb/31/5be8f436b56733d9e69c721c358502f4d77b627489a459978686be7db65f/lmdb-0.94.tar.gz (4.0MB)
100% |████████████████████████████████| 4.0MB 296kB/s
Requirement already satisfied: six>=1.9 in c:\temp\python\python36\lib\site-packages (from protobuf>=2.6->lopq)
Requirement already satisfied: setuptools in c:\temp\python\python36\lib\site-packages (from protobuf>=2.6->lopq)
Building wheels for collected packages: lopq, lmdb
Running setup.py bdist_wheel for lopq ... done
Stored in directory: C:\Users\IMarroquin\AppData\Local\pip\Cache\wheels\78\57\c6\1e56da35f08e349d6b3b7d7c495b70e0a1173db2f9ac289722
Running setup.py bdist_wheel for lmdb ... done
Stored in directory: C:\Users\IMarroquin\AppData\Local\pip\Cache\wheels\57\40\51\3fe10a4a559a91352579a27cbcca490f279bacb54209713c4b
Successfully built lopq lmdb
Installing collected packages: lmdb, lopq
Successfully installed lmdb-0.94 lopq-1.0.0
You are using pip version 9.0.3, however version 18.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

When I run Python from a DOS console and then use one of these commands:
a) import lopq
b) from lopq import LOPQModel, LOPQSearcher

I get this error message:

File "C:\Temp\Python\Python36\lib\site-packages\lopq_init_.py", line 3, in
import model
ModuleNotFoundError: No module named 'model'

Any suggestions?
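
For what it's worth, the traceback pattern looks like a Python 2-style implicit relative import inside the package; my guess (only a guess, not a released fix) is that the package's __init__.py would need the Python 3-compatible form:

    # lopq/__init__.py
    # Python 2 only -- fails on Python 3 with ModuleNotFoundError:
    # import model

    # Works on Python 3 (and on Python 2 with absolute_import):
    from . import model
    from .model import LOPQModel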

Many thanks,

Ivan

When I run the program on Windows, I get an error

PicklingError: Can't pickle <function func_wrap at 0x000000000A976C18>: it's not found as lopq.utils.func_wrap

and when I fixed this problem I get:
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/Saber/Desktop/lopqtest.py', wdir='C:/Users/Saber/Desktop')
  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/Saber/Desktop/lopqtest.py", line 32, in <module>
    searcher.add_data(data)
  File "C:\Anaconda2\lib\site-packages\lopq\search.py", line 98, in add_data
    codes = compute_codes_parallel(data, self.model, num_procs)
  File "C:\Anaconda2\lib\site-packages\lopq\utils.py", line 182, in compute_codes_parallel
    codes = parmap(compute_partition, partitions, num_procs)
  File "C:\Anaconda2\lib\site-packages\lopq\utils.py", line 136, in parmap
    p.start()
  File "C:\Anaconda2\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda2\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Anaconda2\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Anaconda2\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Anaconda2\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Anaconda2\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Anaconda2\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Anaconda2\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Anaconda2\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Anaconda2\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 554, in save_tuple
    save(element)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 606, in save_list
    self._batch_appends(iter(obj))
  File "C:\Anaconda2\lib\pickle.py", line 639, in _batch_appends
    save(x)
  File "C:\Anaconda2\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Anaconda2\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 492, in save_string
    self.write(BINSTRING + pack("<i", n) + obj)
IOError: [Errno 32] Broken pipe
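
On Windows, multiprocessing spawns fresh interpreters that re-import the script, so any code that starts worker processes has to live under a main guard. Here is a minimal sketch of the script structure, assuming the README-style API. (This addresses the re-import crash; the func_wrap pickling error may additionally require avoiding the parallel path, since parmap's nested worker function is not picklable under the spawn model.)

    import numpy as np
    from lopq import LOPQModel, LOPQSearcher

    def main():
        data = np.random.rand(10000, 128)    # stand-in for real features
        model = LOPQModel(V=8, M=4)
        model.fit(data)
        searcher = LOPQSearcher(model)
        searcher.add_data(data)              # this call starts worker processes

    if __name__ == '__main__':               # required on Windows: spawned
        main()                               # children re-import this file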

Dynamic index update

Hi!

Do I understand correctly that LOPQ does not currently support dynamic index update / adding new data to an existing dataset?

Searching with a new example

How can I use add_data for a new example, and how can I search with a query vector that doesn't exist in the database yet? I just want to get the stored vector closest to the query.
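
Here is what I have in mind, assuming the README-style API (the quota parameter and the exact structure of the return value should be checked against search.py):

    import numpy as np
    from lopq import LOPQModel, LOPQSearcher

    data = np.random.rand(10000, 128)        # the existing database
    model = LOPQModel(V=8, M=4)
    model.fit(data)

    searcher = LOPQSearcher(model)
    searcher.add_data(data)                  # index the existing vectors

    # A brand-new query only needs to be quantized, not indexed:
    query = np.random.rand(128)
    nns = searcher.search(query, quota=10)   # ranked candidates from nearby cells

    # To also add the new vector to the index afterwards:
    searcher.add_data(query.reshape(1, -1))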

Parameters on large datasets

I see there are many parameters in the training phase. Have you run this project on large datasets like SIFT1M (or even SIFT1B) and GIST1M? How should one choose appropriate parameters? (A sketch of the sweep I have in mind follows.) Thanks a lot!
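
For concreteness, this is the kind of grid sweep I mean. It is only a sketch: I'm assuming LOPQModel accepts a subquantizer_clusters keyword (as train_model.py's --subquantizer_clusters suggests) and that search() returns a (results, cells_visited) pair whose entries carry an .id attribute; please correct me if the API differs.

    import itertools
    import numpy as np
    from lopq import LOPQModel, LOPQSearcher

    def recall_at_quota(model, data, queries, truth, quota=100):
        # Fraction of queries whose exact nearest-neighbor id appears
        # among the returned candidates.
        searcher = LOPQSearcher(model)
        searcher.add_data(data)
        hits = 0
        for q, t in zip(queries, truth):
            results, _ = searcher.search(q, quota=quota)
            if t in set(r.id for r in results):
                hits += 1
        return hits / float(len(queries))

    data = np.random.rand(20000, 128)        # stand-in for SIFT-like vectors
    queries = np.random.rand(200, 128)
    truth = [int(np.argmin(((data - q) ** 2).sum(axis=1))) for q in queries]

    # V: coarse clusters per half; M: subquantizers; last: codewords per subquantizer
    for V, M, clusters in itertools.product([8, 16], [8, 16], [256, 512]):
        model = LOPQModel(V=V, M=M, subquantizer_clusters=clusters)
        model.fit(data)                      # training cost grows with V and M
        print(V, M, clusters, recall_at_quota(model, data, queries, truth))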

HELP! Exception when calling data.unpersist()

Hello,

This issue has me blocked and I don't know how to solve it, so I'm asking for help here.

Traceback (most recent call last):
  File "train_model.py", line 417, in <module>
    subs = train_subquantizers(sc, data, args.M, args.subquantizer_clusters, model, seed=args.seed)
  File "train_model.py", line 227, in train_subquantizers
    data.unpersist()
  File "/mnt/dfs/11/yarn/local/usercache/algo/appcache/application_1550790374016_973847/container_e66_1550790374016_973847_02_000001/pyspark.zip/pyspark/rdd.py", line 251, in unpersist
  File "/mnt/dfs/11/yarn/local/usercache/algo/appcache/application_1550790374016_973847/container_e66_1550790374016_973847_02_000001/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/mnt/dfs/11/yarn/local/usercache/algo/appcache/application_1550790374016_973847/container_e66_1550790374016_973847_02_000001/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o261.unpersist.
: org.apache.spark.SparkException: Exception thrown in awaitResult
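
Not a root-cause fix, but since the unpersist() here looks like cache cleanup rather than part of training itself, a blunt guard in train_subquantizers might keep the job alive while the executor-side failure is investigated (a sketch against the traceback above, not a tested patch):

    # in train_subquantizers (train_model.py):
    try:
        data.unpersist()
    except Exception as e:       # e.g. py4j.protocol.Py4JJavaError
        print('unpersist failed, continuing anyway: %s' % e)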

Pip3 support

Hi,

I'm interested in this library, which seems really good, but I'm currently using Python 3 and I'm struggling to install lopq via pip3 (`sudo pip3 install lopq`).

The installation with pip3 seems to be fine, but it gives me the following error when trying to import lopq:

>>> import lopq
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/lopq/__init__.py", line 3, in <module>
    import model
ImportError: No module named 'model'

It works fine when installing via pip, but then I can't import it in Python 3.

Is there any ongoing work to support Python 3?

As the title suggests, I would like to know whether there is a plan to support Python 3 in the near future, as some of the open issues are about bugs due to incompatibility with Py3.

I see that some attempts were made before but weren't completed (https://github.com/yahoo/lopq/pull/14/files).

So, if there are people willing to work on this, let's collaborate. I can start working on it; however, I need some clarifications:

  • Is there any ongoing work already (to avoid losing time on duplicated effort)?
  • Are there any limitations against using python-six (https://pythonhosted.org/six/) to support Python 2 and Python 3 at the same time? (A small illustration follows at the end of this issue.)

@pumpikano Can you help clarify these points? Thanks
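
To make the second bullet concrete, this is the scale of change I would expect six to cover in this codebase (illustrative standard six idioms only, not actual lopq code):

    import six

    # dict.iteritems() (Py2-only) becomes six.iteritems(d):
    def iterate_index(index):
        for cell, codes in six.iteritems(index):
            yield cell, codes

    # basestring (Py2) vs str (Py3):
    def is_key(x):
        return isinstance(x, six.string_types)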

Missing lopq library in sample `spark-submit` call

The lopq library is not currently provided to the example `spark-submit` call in the documentation. I found the easiest solution was to run `python setup.py bdist_egg` in the `python` subdirectory and then pass the generated egg to `spark-submit` via the `--py-files` parameter. It would probably be helpful if this were added to the documentation.

Exception occurred when copy model file to HDFS

Starting rotation fitting for split 1
Saving pickle to temp file...
Copying pickle file to hdfs...
hdfs:///user/algo/lopq/model/
Traceback (most recent call last):
  File "train_model.py", line 426, in <module>
    save_hdfs_pickle(model, args.model_pkl)
  File "train_model.py", line 245, in save_hdfs_pickle
    copy_to_hdfs(f, pkl_path)
  File "train_model.py", line 266, in copy_to_hdfs
    subprocess.call(['hadoop', 'fs', '-put', f.name, hdfs_path])
  File "/usr/lib64/python2.7/subprocess.py", line 524, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
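
For what it's worth, an OSError with [Errno 2] raised from subprocess usually means the executable itself ('hadoop' here) was not found on the PATH of the node running the driver. A more defensive copy_to_hdfs might look like this sketch (the HADOOP_HOME fallback path is my own guess, not something the script defines):

    import os
    import subprocess
    from distutils.spawn import find_executable

    def copy_to_hdfs(f, hdfs_path):
        # Resolve the hadoop binary explicitly instead of relying on PATH.
        hadoop = find_executable('hadoop') or os.path.join(
            os.environ.get('HADOOP_HOME', '/usr/lib/hadoop'), 'bin', 'hadoop')
        subprocess.call([hadoop, 'fs', '-put', f.name, hdfs_path])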

No search method in Spark implementation

Thanks very much for this implementation!

I was wondering if there is any plan to add a script for searching the index in Spark. So far we can train a model, save it, and compute the codes for a set of data points, but there is no search functionality. (A rough sketch of what I have in mind follows.)
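
Even a brute-force scan over precomputed codes would be useful once the model is broadcast. In the sketch below, approx_distance is a hypothetical placeholder for an asymmetric LOPQ distance; lopq does not necessarily expose such a helper under this name:

    import numpy as np

    def approx_distance(model, query, code):
        # Hypothetical: asymmetric distance from a query vector to a stored
        # LOPQ code, e.g. summed subquantizer-centroid distances.
        raise NotImplementedError

    def spark_search(sc, model, codes_rdd, query, k=10):
        # codes_rdd: RDD of (id, code) pairs from the existing code-computation job
        bc_model = sc.broadcast(model)           # ship model params to executors
        bc_query = sc.broadcast(np.asarray(query))
        scored = codes_rdd.map(lambda pair: (
            approx_distance(bc_model.value, bc_query.value, pair[1]), pair[0]))
        return scored.takeOrdered(k)             # k smallest (distance, id) pairs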

Thanks!
