Comments (4)

dselivanov commented on August 16, 2024

Now even better: it implements WarpLDA. The sampler doesn't depend on the number of topics, so I expect it to be more than 10x faster.
The function to calculate perplexity is exported as well.
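
For readers unfamiliar with the metric, here is a minimal base-R sketch of how perplexity can be computed from a fitted LDA model's two distributions. The helper name `perplexity_sketch` and the toy matrices are assumptions made for illustration; this is not the function text2vec exports, only the standard definition written out.

```r
# Hypothetical helper: perplexity of a document-term count matrix under an
# LDA fit. Lower values mean the model predicts the observed words better.
#   dtm        documents x terms matrix of counts
#   doc_topic  documents x topics matrix, rows sum to 1
#   topic_word topics x terms matrix, rows sum to 1
perplexity_sketch <- function(dtm, doc_topic, topic_word) {
  stopifnot(
    nrow(dtm) == nrow(doc_topic),
    ncol(doc_topic) == nrow(topic_word),
    ncol(dtm) == ncol(topic_word)
  )
  # predicted probability of each term in each document
  p <- doc_topic %*% topic_word
  # only observed (non-zero) counts contribute to the log-likelihood
  nz <- dtm > 0
  log_lik <- sum(dtm[nz] * log(p[nz]))
  exp(-log_lik / sum(dtm))
}

# toy data: 2 documents, 3 terms, 2 topics (made-up numbers)
dtm <- matrix(c(2, 0, 1,
                0, 3, 1), nrow = 2, byrow = TRUE)
doc_topic <- matrix(c(0.9, 0.1,
                      0.2, 0.8), nrow = 2, byrow = TRUE)
topic_word <- matrix(c(0.6, 0.1, 0.3,
                       0.1, 0.7, 0.2), nrow = 2, byrow = TRUE)

perplexity_sketch(dtm, doc_topic, topic_word)
```

Because it needs only the document-topic and topic-word distributions, a metric of this kind can be compared across candidate topic counts without access to a log-likelihood trace.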

from ldatuning.

nsriram13 commented on August 16, 2024

++ @dselivanov, the author of the text2vec package

baumanno commented on August 16, 2024

Bit of a necro-bump, but am I right in assuming that none of the currently provided metrics apply to WarpLDA? I hacked on this and tried to get at least Griffiths's metric working with text2vec's WarpLDA, but to no avail.

In essence, I used text2vec as a drop-in replacement for topicmodels and passed the result of fitting a model to the function that calculates Griffiths. In there, I access the result's loglikelihood attribute, but these seem to be different likelihoods from those provided by a topicmodels::LDA_Gibbs result...?

Some (simplified and hence untested!) code of my endeavours:

Griffiths2004 <- function(models) {
    # `models` is a list of fit_transform() results, one per candidate topic count
    logLiks <- lapply(models, function(fit) {

        # the result of fitting text2vec's LDA also has an attribute "likelihood"
        # containing a column "loglikelihood"
        ll <- do.call(c, attr(fit, "likelihood")["loglikelihood"])

        # keeps the whole trace; no burn-in iterations are dropped here
        utils::tail(ll, n = length(ll))
    })

    # harmonic means for every model
    metrics <- sapply(logLiks, function(x) {
        # code is a little tricky, see explanation in [Ponweiser2012 p. 36]
        # ToDo: add variant without "Rmpfr"
        llMed <- stats::median(x)
        metric <- as.double(
          llMed - log( Rmpfr::mean( exp( -Rmpfr::mpfr(x, prec=2000L) + llMed )))
        )
        return(metric)
    })
    return(metrics)
}

# x is the candidate number of topics, dtm the document-term matrix
model <- text2vec::LDA$new(n_topics = x)
# keep the return value: the "likelihood" attribute is read from it above
fit <- model$fit_transform(
    x = dtm,
    n_iter = 1000,
    convergence_tol = 0.001,
    n_check_convergence = 1,
    progressbar = FALSE
)

result <- Griffiths2004(list(fit))
FindTopicsNumber_plot(result)

This gives a plot similar to this:

[image: rplot]

If this seems naive: it is 😃 I'm only just getting into R and topic modelling, but hyperparameter tuning is currently an important task for me. Running ldatuning with the Gibbs sampling algorithm takes an infeasible amount of time on my dataset, so I'm looking for a faster but equally reliable solution. Any insight into this is thus much appreciated!
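
On the "ToDo: add variant without Rmpfr" note in the code above: one possible approach is to compute the log of the harmonic mean entirely in log space with the log-sum-exp trick, which avoids the overflow that the high-precision `mpfr` numbers work around. This is a sketch under that assumption, not ldatuning's implementation; the function name `log_harmonic_mean` and the sample trace are made up for illustration.

```r
# Hypothetical helper: log of the harmonic mean of likelihoods, given a
# vector of log-likelihoods `ll`.
#   H = n / sum(exp(-ll))  =>  log(H) = log(n) - logsumexp(-ll)
log_harmonic_mean <- function(ll) {
  z <- -ll
  m <- max(z)  # shift by the maximum so exp() cannot overflow
  log(length(ll)) - (m + log(sum(exp(z - m))))
}

# made-up log-likelihood trace, far too negative for a naive exp(-ll)
ll_trace <- c(-100000, -100010, -100020, -100005)
log_harmonic_mean(ll_trace)
```

Algebraically this is the same quantity as `llMed - log(mean(exp(-x + llMed)))`; the median shift in that expression plays the role of `m` here, but shifting by the maximum guarantees every exponent is at most zero.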

titaniumtroop commented on August 16, 2024

It looks like text2vec no longer returns log-likelihood calculations as of v0.5.0 per dselivanov/text2vec#212. As such, I'm closing this issue. Feel free to open a new issue if there's another way to accomplish the port.
