I found that get_centroid() always returns the same values, no matter which terms are passed to it.
I looked at your code and I believe the issue might be here:
v <- wv[ terms[, 1, drop=TRUE] , , drop = FALSE]
It appears to subset the first x rows of the word embedding matrix (where x is the number of terms) instead of the rows for the terms passed to the function.
Similarly, get_direction() only works if I pass the term pairs as a matrix. With a data frame it returns 0 for all dimensions.
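One mechanism that would explain both symptoms is factor indexing. I have not confirmed that this is what happens inside CMDist, so treat the following as a sketch under that assumption. It uses a toy matrix (wv_toy, made up for illustration) and assumes the terms reach the subsetting line as factors, which is what data.frame() produces for strings by default before R 4.0.0.

```r
# Toy stand-in for the embedding matrix: 6 words x 2 dimensions.
wv_toy <- matrix(c(1, 2, 3, 10, 20, 30,
                   4, 5, 6, 40, 50, 60),
                 nrow = 6, ncol = 2,
                 dimnames = list(c("in", "with", "from", "said", "was", "are"), NULL))

# Assumption: the terms arrive as factors, as they would after
# data.frame(..., stringsAsFactors = TRUE), the default before R 4.0.0.
pairs_fct <- data.frame(additions = c("in", "with", "from"),
                        substracts = c("said", "was", "are"),
                        stringsAsFactors = TRUE)

# Factor indices are treated as integer codes, so both columns
# select from the first three rows instead of matching row names.
add_fct <- wv_toy[pairs_fct[, 1, drop = TRUE], , drop = FALSE]
sub_fct <- wv_toy[pairs_fct[, 2, drop = TRUE], , drop = FALSE]

# Converting to character restores lookup by row name.
add_chr <- wv_toy[as.character(pairs_fct[, 1]), , drop = FALSE]
sub_chr <- wv_toy[as.character(pairs_fct[, 2]), , drop = FALSE]

colMeans(add_fct) == colMeans(sub_fct)  # the two "centroids" are identical
colMeans(add_fct - sub_fct)             # the "direction" is all zeros
colMeans(add_chr - sub_chr)             # non-zero once rows are looked up by name
```

If this is the cause, wrapping the subsetting index in as.character() would be one possible fix.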
library(CMDist)
library(word2vec)
# --- Load embedding vectors
w2v <- read.wordvectors("/GoogleNews-vectors-negative300.bin", type = "bin", n = 20)
# --- get_centroid()
terms_1 <- c("in","with","from")
terms_2 <- c("said","was","are")
centroid_t1 <- get_centroid(terms = terms_1, wv = w2v)
centroid_t2 <- get_centroid(terms = terms_2, wv = w2v)
# All TRUE, even though the two term sets are different
centroid_t1 == centroid_t2
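# --- get_direction()
# Same term pairs, once as a data frame and once as a matrix
pairs_1 <- data.frame(additions = terms_1,
                      substracts = terms_2)
pairs_2 <- as.matrix(data.frame(additions = terms_1,
                                substracts = terms_2))
# Data frame input: returns 0 for all dimensions
directions_1 <- get_direction(pairs = pairs_1, w2v)
# Matrix input: works as expected
directions_2 <- get_direction(pairs = pairs_2, w2v)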
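For comparison, here is a rough manual baseline for what I would expect the two functions to return. It assumes a centroid is the column-wise mean of the term vectors and a direction is the mean of the pairwise offsets (additions minus substracts). manual_centroid() and manual_direction() are hypothetical helpers written for this report, not part of CMDist, and wv_toy is a made-up matrix.

```r
# Toy stand-in for the embedding matrix: 6 words x 2 dimensions.
wv_toy <- matrix(c(1, 2, 3, 10, 20, 30,
                   4, 5, 6, 40, 50, 60),
                 nrow = 6, ncol = 2,
                 dimnames = list(c("in", "with", "from", "said", "was", "are"), NULL))

# Hypothetical helper: mean vector of the rows named in `terms`.
manual_centroid <- function(terms, wv) {
  terms <- as.character(terms)  # force lookup by row name
  colMeans(wv[terms, , drop = FALSE])
}

# Hypothetical helper: mean offset between the first and second column of `pairs`.
# Works for both a data frame and a character matrix.
manual_direction <- function(pairs, wv) {
  plus  <- as.character(pairs[, 1])
  minus <- as.character(pairs[, 2])
  colMeans(wv[plus, , drop = FALSE] - wv[minus, , drop = FALSE])
}

terms_a <- c("in", "with", "from")
terms_b <- c("said", "was", "are")
pairs_df  <- data.frame(additions = terms_a, substracts = terms_b)
pairs_mat <- as.matrix(pairs_df)

cent_a  <- manual_centroid(terms_a, wv_toy)
cent_b  <- manual_centroid(terms_b, wv_toy)
dir_df  <- manual_direction(pairs_df, wv_toy)
dir_mat <- manual_direction(pairs_mat, wv_toy)
```

With this baseline the two centroids differ, and the data frame and matrix inputs give the same non-zero direction, which is the behaviour I expected from the package.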