Comments (4)
Now even better - implements WarpLDA. The sampler's cost doesn't depend on the number of topics, so I expect that it will be more than 10x faster.
Function to calculate perplexity is exported as well.
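For readers wondering what such a perplexity function computes, here is a minimal base-R sketch of the usual definition (the exponential of the negative average per-token log-likelihood). The helper name `perplexity_dense` and its dense-matrix arguments are assumptions made for illustration; this is not text2vec's implementation and it has not been checked against it.

```r
# Hypothetical helper, not part of text2vec or ldatuning.
# dtm:   documents x terms matrix of non-negative counts
# theta: documents x topics matrix, each row sums to 1
# phi:   topics x terms matrix, each row sums to 1
perplexity_dense <- function(dtm, theta, phi) {
  dtm <- as.matrix(dtm)
  stopifnot(
    nrow(dtm) == nrow(theta),
    ncol(dtm) == ncol(phi),
    ncol(theta) == nrow(phi)
  )
  # probability of each term in each document under the model
  p <- theta %*% phi
  # only observed (non-zero) counts contribute to the log-likelihood
  nz <- dtm > 0
  log_lik <- sum(dtm[nz] * log(p[nz]))
  # perplexity: exp of the negative log-likelihood per token
  exp(-log_lik / sum(dtm))
}
```

Lower values indicate a better fit, so one option is to compare this number across models fitted with different topic counts, ideally on held-out documents.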
from ldatuning.
++@dselivanov - the author of text2vec package
Bit of a necro-bump, but am I right in assuming that none of the currently provided metrics apply to WarpLDA? I hacked on this and tried to get at least Griffiths's metric working with text2vec's WarpLDA, but to no avail.
In essence, I used text2vec as a drop-in replacement for topicmodels, and provide the result of fitting a model to the function for calculating Griffiths. In there, I access the model's loglikelihood attribute, but these seem to be different likelihoods than those provided by a topicmodels::LDA_Gibbs result...?
Some (simplified and hence untested!) code of my endeavours:
Griffiths2004 <- function(models) {
  logLiks <- lapply(models, function(model) {
    # the result of fitting text2vec's LDA also has an attribute "likelihood"
    # containing a column "loglikelihood"
    ll <- do.call(c, attr(model, "likelihood")["loglikelihood"])
    # keeps every value, i.e. no burn-in iterations are discarded here
    utils::tail(ll, n = length(ll))
  })
  # harmonic means for every model
  metrics <- sapply(logLiks, function(x) {
    # code is a little tricky, see explanation in [Ponweiser2012 p. 36]
    # ToDo: add variant without "Rmpfr"
    llMed <- stats::median(x)
    metric <- as.double(
      llMed - log(Rmpfr::mean(exp(-Rmpfr::mpfr(x, prec = 2000L) + llMed)))
    )
    return(metric)
  })
  return(metrics)
}

# x is the number of topics to try
model <- text2vec::LDA$new(n_topics = x)
# keep the fit result, since that is what carries the "likelihood" attribute
fit <- model$fit_transform(
  x = dtm,
  n_iter = 1000,
  convergence_tol = 0.001,
  n_check_convergence = 1,
  progressbar = FALSE
)
# FindTopicsNumber_plot expects a data frame with a "topics" column
result <- data.frame(topics = x, Griffiths2004 = Griffiths2004(list(fit)))
FindTopicsNumber_plot(result)
This gives a plot similar to this:
If this seems naive: it is 😃 I'm only just getting into R and topic modelling, but hyperparameter tuning is currently an important task for me. Running ldatuning with the Gibbs sampling algorithm takes an infeasible amount of time on my dataset, so I'm looking for a faster but equally reliable solution. Any insight into this is thus much appreciated!
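On the "ToDo: add variant without Rmpfr" note in the code above: the expression there simplifies algebraically to the log of the harmonic mean of the likelihoods, -log(mean(exp(-x))), which can be evaluated in plain double precision with the log-sum-exp trick. The sketch below is one way to do that; the function name `log_harmonic_mean` is hypothetical, and it has not been checked against ldatuning's own output.

```r
# Hypothetical helper, not part of ldatuning or text2vec.
# ll: numeric vector of log-likelihoods, one per kept iteration
log_harmonic_mean <- function(ll) {
  stopifnot(is.numeric(ll), length(ll) > 0)
  neg <- -ll
  # shift by the maximum so that exp() cannot overflow
  m <- max(neg)
  log_sum_exp <- m + log(sum(exp(neg - m)))
  # -log(mean(exp(-ll))) = log(n) - logsumexp(-ll)
  log(length(ll)) - log_sum_exp
}
```

For example, `log_harmonic_mean(c(-1000, -1000, -1000))` evaluates to -1000, whereas a naive `exp(1000)` would overflow to `Inf`. Note that this only addresses the arithmetic; it does not resolve whether the log-likelihoods from text2vec are comparable to those from topicmodels.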
It looks like text2vec no longer returns log-likelihood calculations as of v0.5.0 per dselivanov/text2vec#212. As such, I'm closing this issue. Feel free to open a new issue if there's another way to accomplish the port.