lopq's Issues

Got wrong results using the simple Python code on the SIFT1M dataset

I rewrote example.py, replacing the input dataset with SIFT1M, and I got results that look like a wrong evaluation:

Recall (V=16, M=8, subquants=256): [0.2018 0.4247 0.5168 0.5218]
Recall (V=16, M=16, subquants=256): [0.3124 0.5057 0.5218 0.5218]
Recall (V=16, M=8, subquants=512): [0.2219 0.4477 0.5198 0.5218]

I also get an error when I try to use the GIST1M dataset:

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
    send(obj)
SystemError: NULL result without error in PyObject_Call

Is the code in the python folder not meant for large datasets like SIFT?
Or is there some mistake in my data import process? (My .fvecs reader is sketched below for reference.)
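
This is the reader I'm using, in case the problem is in the import step (a minimal sketch assuming the standard TEXMEX layout, where each record is an int32 dimension followed by that many float32 values):

    import numpy as np

    def read_fvecs(path):
        # Each record: int32 dimension d, then d float32 components.
        raw = np.fromfile(path, dtype=np.float32)
        dim = raw[:1].view(np.int32)[0]          # dimension from the first 4 bytes
        return raw.reshape(-1, dim + 1)[:, 1:]   # drop the leading dimension column

    # base = read_fvecs('sift/sift_base.fvecs')  # expect shape (1000000, 128) for SIFT1M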

Looking forward to your reply.
Thanks a lot!

python search ignores fine codes

From reading the code and stepping through it under pdb, it appears that the Python search algorithm (both the dict and LMDB backends) ignores the fine codes and ranks results using only the coarse cells.
Can you please confirm whether this is the case?
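
For context, this is the kind of fine-code reranking I expected to see happen (a generic product-quantizer sketch with my own names, not lopq's actual internals; it assumes the query has already been projected into the model's local space):

    import numpy as np

    def rerank_with_fine_codes(query, candidates, subquantizer_centroids):
        # query: projected query vector of dimension D
        # candidates: list of (id, fine_code), fine_code = tuple of M indices
        # subquantizer_centroids: list of M arrays, each (clusters, D / M)
        M = len(subquantizer_centroids)
        chunks = np.split(query, M)  # one subvector per subquantizer
        # per-subquantizer table: distance from the query chunk to every centroid
        tables = [((cents - chunk) ** 2).sum(axis=1)
                  for chunk, cents in zip(chunks, subquantizer_centroids)]
        scored = [(sum(tables[m][code[m]] for m in range(M)), id_)
                  for id_, code in candidates]
        return [id_ for _, id_ in sorted(scored)]  # ascending approximate distance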

Help with PCA/Spark docs

In the LOPQ Training section at https://github.com/yahoo/lopq/blob/master/spark/README.md I see:

Here is an example of training a full model from scratch and saving the model parameters as both a pickle file and a protobuf file:

spark-submit train_model.py \
    --data /hdfs/path/to/data \
    --V 16 \
    --M 8 \
    --model_pkl /hdfs/output/path/model.pkl \
    --model_proto /hdfs/output/path/model.lopq

But above that, in the PCA Training section, I see:

A necessary preprocessing step for training is to PCA and variance balance the raw data vectors to produce the LOPQ data vectors, i.e. the vectors that LOPQ will quantize.

It's not clear to me how the outputs of the train_pca.py script are supposed to feed into the train_model.py script. Am I supposed to use the results of train_pca.py to do the variance balancing myself and then feed that into train_model.py, or does "training a full model from scratch" take care of that step for me?
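
To make the question concrete, this is my reading of the preprocessing being described (a sketch with my own function and variable names, not the actual contents of train_pca.py): PCA-project the raw vectors, then permute dimensions so the two halves of each vector carry roughly balanced variance.

    import numpy as np

    def pca_and_balance(X, D_out):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X - mu, rowvar=False))
        top = np.argsort(eigvals)[::-1][:D_out]      # keep the top D_out components
        eigvals, P = eigvals[top], eigvecs[:, top]
        # Greedy balance: assign eigenvalues (largest first) to whichever half
        # currently has the smaller log-variance sum and still has room.
        halves, sums, cap = [[], []], [0.0, 0.0], D_out // 2
        for i in np.argsort(eigvals)[::-1]:
            h = 0 if (sums[0] <= sums[1] and len(halves[0]) < cap) else 1
            if len(halves[h]) >= cap:
                h = 1 - h
            halves[h].append(i)
            sums[h] += np.log(eigvals[i] + 1e-12)
        perm = np.array(halves[0] + halves[1])
        return (X - mu).dot(P)[:, perm], mu, P, perm  # LOPQ-ready vectors + params

Is this roughly what happens internally when "training a full model from scratch", or do I need to apply something like this myself before calling train_model.py?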

Problem with importing the lopq package on Windows

Hi!

I have a Windows 10 machine with Python 3.6.5. I ran the command `pip install lopq` and the installation completed successfully (see below):

C:\Users\IMarroquin\Documents\My_Python_Scripts\MLP\Well_8_to_five_wells\Independent_Scripts\Big_Data_For_Paradise>pip install lopq
Collecting lopq
Downloading https://files.pythonhosted.org/packages/79/f9/d00a4944cf52688f112699ac99d6c86b11b865e054055014ad6d3d2cb768/lopq-1.0.35.tar.gz
Requirement already satisfied: protobuf>=2.6 in c:\temp\python\python36\lib\site-packages (from lopq)
Requirement already satisfied: numpy>=1.9 in c:\temp\python\python36\lib\site-packages (from lopq)
Requirement already satisfied: scipy>=0.14 in c:\temp\python\python36\lib\site-packages (from lopq)
Requirement already satisfied: scikit-learn>=0.15 in c:\temp\python\python36\lib\site-packages (from lopq)
Collecting lmdb>=0.87 (from lopq)
Downloading https://files.pythonhosted.org/packages/cb/31/5be8f436b56733d9e69c721c358502f4d77b627489a459978686be7db65f/lmdb-0.94.tar.gz (4.0MB)
100% |████████████████████████████████| 4.0MB 296kB/s
Requirement already satisfied: six>=1.9 in c:\temp\python\python36\lib\site-packages (from protobuf>=2.6->lopq)
Requirement already satisfied: setuptools in c:\temp\python\python36\lib\site-packages (from protobuf>=2.6->lopq)
Building wheels for collected packages: lopq, lmdb
Running setup.py bdist_wheel for lopq ... done
Stored in directory: C:\Users\IMarroquin\AppData\Local\pip\Cache\wheels\78\57\c6\1e56da35f08e349d6b3b7d7c495b70e0a1173db2f9ac289722
Running setup.py bdist_wheel for lmdb ... done
Stored in directory: C:\Users\IMarroquin\AppData\Local\pip\Cache\wheels\57\40\51\3fe10a4a559a91352579a27cbcca490f279bacb54209713c4b
Successfully built lopq lmdb
Installing collected packages: lmdb, lopq
Successfully installed lmdb-0.94 lopq-1.0.0
You are using pip version 9.0.3, however version 18.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

When I run Python from a DOS console and then use one of these commands:
a) import lopq
b) from lopq import LOPQModel, LOPQSearcher

I get this error message:

File "C:\Temp\Python\Python36\lib\site-packages\lopq_init_.py", line 3, in
import model
ModuleNotFoundError: No module named 'model'

Any suggestions?
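
For what it's worth, the traceback pattern looks like a Python 2-style implicit relative import inside the package; my guess (only a guess, not a released fix) is that the package's __init__.py would need the Python 3-compatible form:

    # lopq/__init__.py
    # Python 2 only -- fails on Python 3 with ModuleNotFoundError:
    # import model

    # Works on Python 3 (and on Python 2 with absolute_import):
    from . import model
    from .model import LOPQModel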

Many thanks,

Ivan

When I run the program on Windows, I get an error

PicklingError: Can't pickle <function func_wrap at 0x000000000A976C18>: it's not found as lopq.utils.func_wrap

and when I fixed this problem I get:
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/Saber/Desktop/lopqtest.py', wdir='C:/Users/Saber/Desktop')
  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/Saber/Desktop/lopqtest.py", line 32, in <module>
    searcher.add_data(data)
  File "C:\Anaconda2\lib\site-packages\lopq\search.py", line 98, in add_data
    codes = compute_codes_parallel(data, self.model, num_procs)
  File "C:\Anaconda2\lib\site-packages\lopq\utils.py", line 182, in compute_codes_parallel
    codes = parmap(compute_partition, partitions, num_procs)
  File "C:\Anaconda2\lib\site-packages\lopq\utils.py", line 136, in parmap
    p.start()
  File "C:\Anaconda2\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda2\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Anaconda2\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Anaconda2\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Anaconda2\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Anaconda2\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Anaconda2\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Anaconda2\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Anaconda2\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Anaconda2\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 554, in save_tuple
    save(element)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 606, in save_list
    self._batch_appends(iter(obj))
  File "C:\Anaconda2\lib\pickle.py", line 639, in _batch_appends
    save(x)
  File "C:\Anaconda2\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Anaconda2\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Anaconda2\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda2\lib\pickle.py", line 492, in save_string
    self.write(BINSTRING + pack("<i", n) + obj)
IOError: [Errno 32] Broken pipe
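
On Windows, multiprocessing spawns fresh interpreters that re-import the script, so any code that starts worker processes has to live under a main guard. Here is a minimal sketch of the script structure, assuming the README-style API. (This addresses the re-import crash; the func_wrap pickling error may additionally require avoiding the parallel path, since parmap's nested worker function is not picklable under the spawn model.)

    import numpy as np
    from lopq import LOPQModel, LOPQSearcher

    def main():
        data = np.random.rand(10000, 128)    # stand-in for real features
        model = LOPQModel(V=8, M=4)
        model.fit(data)
        searcher = LOPQSearcher(model)
        searcher.add_data(data)              # this call starts worker processes

    if __name__ == '__main__':               # required on Windows: spawned
        main()                               # children re-import this file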

Dynamic index update

Hi!

Do I understand correctly that LOPQ does not currently support dynamic index update / adding new data to an existing dataset?

Searching with a new example

How can I use add_data for a new example, and how can I search with a query vector that doesn't exist in the database yet? I just want to get the stored vector closest to the query.
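
Here is what I have in mind, assuming the README-style API (the quota parameter and the exact structure of the return value should be checked against search.py):

    import numpy as np
    from lopq import LOPQModel, LOPQSearcher

    data = np.random.rand(10000, 128)        # the existing database
    model = LOPQModel(V=8, M=4)
    model.fit(data)

    searcher = LOPQSearcher(model)
    searcher.add_data(data)                  # index the existing vectors

    # A brand-new query only needs to be quantized, not indexed:
    query = np.random.rand(128)
    nns = searcher.search(query, quota=10)   # ranked candidates from nearby cells

    # To also add the new vector to the index afterwards:
    searcher.add_data(query.reshape(1, -1))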

Parameters on large datasets

I see there are many parameters in the training phase. Have you run this project on large datasets like SIFT1M (or even SIFT1B) and GIST1M? How should one choose appropriate parameters? (A sketch of the sweep I have in mind follows.) Thanks a lot!
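
For concreteness, this is the kind of grid sweep I mean. It is only a sketch: I'm assuming LOPQModel accepts a subquantizer_clusters keyword (as train_model.py's --subquantizer_clusters suggests) and that search() returns a (results, cells_visited) pair whose entries carry an .id attribute; please correct me if the API differs.

    import itertools
    import numpy as np
    from lopq import LOPQModel, LOPQSearcher

    def recall_at_quota(model, data, queries, truth, quota=100):
        # Fraction of queries whose exact nearest-neighbor id appears
        # among the returned candidates.
        searcher = LOPQSearcher(model)
        searcher.add_data(data)
        hits = 0
        for q, t in zip(queries, truth):
            results, _ = searcher.search(q, quota=quota)
            if t in set(r.id for r in results):
                hits += 1
        return hits / float(len(queries))

    data = np.random.rand(20000, 128)        # stand-in for SIFT-like vectors
    queries = np.random.rand(200, 128)
    truth = [int(np.argmin(((data - q) ** 2).sum(axis=1))) for q in queries]

    # V: coarse clusters per half; M: subquantizers; last: codewords per subquantizer
    for V, M, clusters in itertools.product([8, 16], [8, 16], [256, 512]):
        model = LOPQModel(V=V, M=M, subquantizer_clusters=clusters)
        model.fit(data)                      # training cost grows with V and M
        print(V, M, clusters, recall_at_quota(model, data, queries, truth))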

HELP! Exception when calling data.unpersist()

Hello,

This issue has me blocked and I don't know how to solve it, so I'm asking for help here.

Traceback (most recent call last):
  File "train_model.py", line 417, in <module>
    subs = train_subquantizers(sc, data, args.M, args.subquantizer_clusters, model, seed=args.seed)
  File "train_model.py", line 227, in train_subquantizers
    data.unpersist()
  File "/mnt/dfs/11/yarn/local/usercache/algo/appcache/application_1550790374016_973847/container_e66_1550790374016_973847_02_000001/pyspark.zip/pyspark/rdd.py", line 251, in unpersist
  File "/mnt/dfs/11/yarn/local/usercache/algo/appcache/application_1550790374016_973847/container_e66_1550790374016_973847_02_000001/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/mnt/dfs/11/yarn/local/usercache/algo/appcache/application_1550790374016_973847/container_e66_1550790374016_973847_02_000001/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o261.unpersist.
: org.apache.spark.SparkException: Exception thrown in awaitResult
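
Not a root-cause fix, but since the unpersist() here looks like cache cleanup rather than part of training itself, a blunt guard in train_subquantizers might keep the job alive while the executor-side failure is investigated (a sketch against the traceback above, not a tested patch):

    # in train_subquantizers (train_model.py):
    try:
        data.unpersist()
    except Exception as e:       # e.g. py4j.protocol.Py4JJavaError
        print('unpersist failed, continuing anyway: %s' % e)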

Pip3 support

Hi,

I'm interested in this library, which seems really good, but I'm currently using Python 3 and I'm struggling to install lopq via pip3 (`sudo pip3 install lopq`).

The installation with pip3 seems to be fine, but it gives me the following error when trying to import lopq:

>>> import lopq
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/lopq/__init__.py", line 3, in <module>
    import model
ImportError: No module named 'model'

It works fine when installing via pip, but then I can't import it in Python 3.

Is there any ongoing work to support Python 3?

As the title suggests, I would like to know whether there is a plan to support Python 3 in the near future, as some of the open issues are about bugs due to incompatibility with Py3.

I see that some attempts were made before but weren't completed (https://github.com/yahoo/lopq/pull/14/files).

So, if there are people willing to work on this, let's collaborate. I can start working on it; however, I need some clarifications:

  • Is there any ongoing work already (to avoid losing time on duplicated effort)?
  • Are there any limitations against using python-six (https://pythonhosted.org/six/) to support Python 2 and Python 3 at the same time? (A small illustration follows at the end of this issue.)

@pumpikano Can you help clarify these points? Thanks
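
To make the second bullet concrete, this is the scale of change I would expect six to cover in this codebase (illustrative standard six idioms only, not actual lopq code):

    import six

    # dict.iteritems() (Py2-only) becomes six.iteritems(d):
    def iterate_index(index):
        for cell, codes in six.iteritems(index):
            yield cell, codes

    # basestring (Py2) vs str (Py3):
    def is_key(x):
        return isinstance(x, six.string_types)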

Missing lopq library in sample `spark-submit` call

The lopq library is not currently provided to the example `spark-submit` call in the documentation. I found the easiest solution was to run `python setup.py bdist_egg` in the `python` subdirectory and then pass the generated egg to `spark-submit` via the `--py-files` parameter. It would probably be helpful if this were added to the documentation.

Exception occurred when copy model file to HDFS

Starting rotation fitting for split 1
Saving pickle to temp file...
Copying pickle file to hdfs...
hdfs:///user/algo/lopq/model/
Traceback (most recent call last):
  File "train_model.py", line 426, in <module>
    save_hdfs_pickle(model, args.model_pkl)
  File "train_model.py", line 245, in save_hdfs_pickle
    copy_to_hdfs(f, pkl_path)
  File "train_model.py", line 266, in copy_to_hdfs
    subprocess.call(['hadoop', 'fs', '-put', f.name, hdfs_path])
  File "/usr/lib64/python2.7/subprocess.py", line 524, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
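
For what it's worth, an OSError with [Errno 2] raised from subprocess usually means the executable itself ('hadoop' here) was not found on the PATH of the node running the driver. A more defensive copy_to_hdfs might look like this sketch (the HADOOP_HOME fallback path is my own guess, not something the script defines):

    import os
    import subprocess
    from distutils.spawn import find_executable

    def copy_to_hdfs(f, hdfs_path):
        # Resolve the hadoop binary explicitly instead of relying on PATH.
        hadoop = find_executable('hadoop') or os.path.join(
            os.environ.get('HADOOP_HOME', '/usr/lib/hadoop'), 'bin', 'hadoop')
        subprocess.call([hadoop, 'fs', '-put', f.name, hdfs_path])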

No search method in Spark implementation

Thanks very much for this implementation!

I was wondering if there is any plan to add a script for searching the index in Spark. So far we can train a model, save it, and compute the codes for a set of data points, but there is no search functionality. (A rough sketch of what I have in mind follows.)
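
Even a brute-force scan over precomputed codes would be useful once the model is broadcast. In the sketch below, approx_distance is a hypothetical placeholder for an asymmetric LOPQ distance; lopq does not necessarily expose such a helper under this name:

    import numpy as np

    def approx_distance(model, query, code):
        # Hypothetical: asymmetric distance from a query vector to a stored
        # LOPQ code, e.g. summed subquantizer-centroid distances.
        raise NotImplementedError

    def spark_search(sc, model, codes_rdd, query, k=10):
        # codes_rdd: RDD of (id, code) pairs from the existing code-computation job
        bc_model = sc.broadcast(model)           # ship model params to executors
        bc_query = sc.broadcast(np.asarray(query))
        scored = codes_rdd.map(lambda pair: (
            approx_distance(bc_model.value, bc_query.value, pair[1]), pair[0]))
        return scored.takeOrdered(k)             # k smallest (distance, id) pairs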

Thanks!
