guenthermi / postgres-word2vec
utils to use word embedding models like word2vec vectors in a PostgreSQL database
License: MIT License
Kudos on the project!
I stumbled upon GloVe and immediately wanted to know if I could get this type of data loaded into Postgres and found this repo.
I was wondering what data sets you're using and if GloVe or similar are on your radar.
Thanks,
Jeff
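For anyone trying this with GloVe: the plain-text GloVe files store one `word v1 v2 ... vn` entry per line, without the `<count> <dimensions>` header line that word2vec's text format starts with. A minimal sketch of adding that header is shown here. The function name `glove_to_word2vec_lines` is hypothetical, and whether this repository's import scripts need the header at all is an assumption to check against the importer you use.

```python
def glove_to_word2vec_lines(glove_lines):
    """Prepend a word2vec-style '<count> <dim>' header to GloVe text lines.

    Hypothetical helper, not part of postgres-word2vec.
    """
    # Drop blank lines and trailing newlines; keep the entries unchanged.
    entries = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not entries:
        return []
    # The first token is the word, the remaining tokens are the vector.
    dim = len(entries[0].split(" ")) - 1
    for entry in entries:
        if len(entry.split(" ")) - 1 != dim:
            raise ValueError("inconsistent vector length: " + entry[:30])
    return ["%d %d" % (len(entries), dim)] + entries
```

The result can be written out line by line to a new file and used wherever a word2vec-style text file is expected.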
SELECT * FROM top_k_in_pq('Godfather', 5, ARRAY(SELECT title FROM movies));
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Hi there,
I managed to install all dependencies and load your extension in a Docker container. However, when I get to the last step in the process (Statistics), I get this error:
SELECT create_statistics('google_vecs_norm', 'word', 'coarse_quantization_ivpq');
ERROR: function get_vecs_name_ivpq_quantization() does not exist
LINE 1: SELECT get_vecs_name_ivpq_quantization()
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
QUERY: SELECT get_vecs_name_ivpq_quantization()
CONTEXT: PL/pgSQL function create_statistics(character varying,character varying,character varying) line 10 at EXECUTE
Could you please help me or point me to a solution?
Thanks very much.
I have problems sending the embedding to the knn_in_pq function.
I tried:
embedding = [0.12109375, 0.056640625, ..., -0.2421875]
embedding = np.array(embedding)
cursor.execute("SELECT * FROM knn_in_pq((%s), 2, ARRAY(SELECT event_name FROM events));", (embedding,))
It throws:
function knn_in_pq(record, integer, character varying[]) does not exist
LINE 1: SELECT * FROM knn_in_pq(((0.12109375, 0.056640625, -0.242187...
When I try with the list:
embedding = [0.12109375, 0.056640625, ..., -0.2421875]
cursor.execute("SELECT * FROM knn_in_pq((%s), 2, ARRAY(SELECT event_name FROM events));", (embedding,))
It throws:
function knn_in_pq(numeric[], integer, character varying[]) does not exist
LINE 1: SELECT * FROM knn_in_pq((ARRAY[0.12109375,0.056640625, -0.24...
From the \df command I can see that the knn_in_pq function is overloaded, and here are the possible signatures:
`
public | knn_in_pq | TABLE(word character varying, similarity real) | query_vector anyarray, k integer, input_set integer[] | normal
public | knn_in_pq | TABLE(word character varying, similarity real) | query_vector bytea, k integer, input_set character varying[] | normal
public | knn_in_pq | TABLE(word character varying, similarity real) | token character varying, k integer, input_set character varying[] | normal
public | knn_in_pq | TABLE(word character varying, similarity real) | token character varying, k integer, input_set integer[]
`
I know that the embedding needs to be cast somehow to query_vector bytea (I think), but I do not know how to do that. Can you help me with this?
If I send a word instead of the vector, the function works properly, but I need to send a vector and get the k events closest to the input vector.
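Going by the `\df` listing, the overload that accepts a `character varying[]` input set takes the query vector as `bytea`, while psycopg2 sends a Python list of floats as `ARRAY[...]`, which PostgreSQL types as `numeric[]`. One possible approach, sketched here, casts the parameter to `float4[]` and wraps it in `vec_to_bytea`, the function that appears in the INSERT statement of the repository's `vec2database.py`. It assumes `vec_to_bytea` is installed in the database and that its output is what the `bytea` overload expects; `build_knn_query` is a hypothetical helper name.

```python
def build_knn_query(embedding, k):
    """Return (sql, params) for a knn_in_pq call with a raw query vector.

    Hypothetical helper; assumes vec_to_bytea(float4[]) exists in the database.
    """
    # Convert to plain Python floats so the driver sees an ordinary list,
    # whether the input was a list, a tuple, or a NumPy array.
    vector = [float(x) for x in embedding]
    sql = (
        "SELECT * FROM knn_in_pq("
        "vec_to_bytea(%s::float4[]), %s, "
        "ARRAY(SELECT event_name FROM events))"
    )
    return sql, (vector, int(k))
```

With an open cursor this would be called as `cursor.execute(*build_knn_query(embedding, 2))`. The explicit `::float4[]` cast is what keeps PostgreSQL from resolving the argument as `numeric[]`.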
(base) admin@ifood-Latitude-5490:~/dev/search/postgres-word2vec/index_creation$ python3 vec2database.py config/vecs_config.json
INFO [2019-06-07 16:39:06] : Exexuted DROP TABLE on google_vecs
INFO [2019-06-07 16:39:06] : Created new table google_vecs
Traceback (most recent call last):
File "vec2database.py", line 136, in <module>
main(len(sys.argv), sys.argv)
File "vec2database.py", line 124, in main
insert_vectors(vec_config.get_value('vec_file_path'), con, cur, vec_config.get_value('table_name'), db_config.get_value('batch_size'), vec_config.get_value('normalized'), logger)
File "vec2database.py", line 78, in insert_vectors
cur.executemany("INSERT INTO "+ table_name + " (word,vector) VALUES (%(word)s, vec_to_bytea(%(vector)s::float4[]))", tuple(values))
psycopg2.ProgrammingError: function vec_to_bytea(real[]) does not exist
LINE 1: ...RT INTO google_vecs (word,vector) VALUES ('', vec_to_byt...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
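The error says that no `vec_to_bytea(real[])` function is visible to the session, which often means the extension's functions were not created in the database the script connects to, or live in a schema outside the search path. A quick way to check is to look the name up in the `pg_proc` system catalog. The sketch below assumes a DB-API cursor such as the one psycopg2 provides; `function_exists` is a hypothetical helper, not part of the repository.

```python
def function_exists(cur, name):
    """Return True if at least one function called `name` is in pg_proc."""
    # pg_proc holds one row per function overload; proname is the bare name.
    cur.execute("SELECT count(*) FROM pg_proc WHERE proname = %s", (name,))
    return cur.fetchone()[0] > 0
```

If `function_exists(cur, "vec_to_bytea")` returns False for the target database, the extension's installation step probably has to be repeated for that database before running `vec2database.py`.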
sudo make install gives:
Undefined symbols for architecture x86_64:
"_addToTargetList", referenced from:
_ivpq_search_in in ivpq_search_in.o
"_computePQDistanceInt16", referenced from:
_pq_search_in_batch in freddy.o
_ivpq_search_in in ivpq_search_in.o
"_initTargetLists", referenced from:
_ivpq_search_in in ivpq_search_in.o
"_reorderTopKPV", referenced from:
_ivpq_search_in in ivpq_search_in.o
"_updateTopKPVFast", referenced from:
_ivpq_search_in in ivpq_search_in.o
ld: symbol(s) not found for architecture x86_64
I can get around most of them by removing the 'inline' keyword.
However, the addToTargetList function seems to be missing.
Any ideas?