
Comments (3)

MaartenGr commented on June 18, 2024

Ah right, then we would calculate the distance matrix ourselves based on the metric that has been set within HDBSCAN. I think it's important to add checks here so that a missing "metric" parameter does not lead to errors, or so that the metric is determined automatically.

Your work on this would be greatly appreciated!

from bertopic.

MaartenGr commented on June 18, 2024

Thank you for sharing this extensive description of this use case! I agree that it would be nice to have something like this implemented although I am curious as to how many users would end up using this feature.

Having said that, you can already pass the distance matrix to BERTopic and then simply skip over dimensionality reduction (as you already did before) in order to make this work. It would, however, introduce issues with topic embeddings but I'm actually curious about what would happen.

Lastly, do you think there is a way to implement this without introducing an HDBSCAN-specific parameter to the initialization of BERTopic? The reason I ask is that my philosophy with BERTopic is to make it as modular as possible, so introducing this parameter might go against that if it is specific to HDBSCAN. Moreover, I want to keep the parameter space in the initialization as small as possible to keep BERTopic user-friendly. I have already seen some information overload happening with the current set of parameters.

What do you think?


jjovalle99 commented on June 18, 2024

Hey @MaartenGr, thank you for answering!

Yes, I think it's possible to implement this. As an initial idea, I think we can just read the metric parameter from HDBSCAN (self.hdbscan_model.get_params()["metric"]) and then define the logic from there. We can leverage scikit-learn's pairwise metrics to compute the distance matrix, which avoids adding extra parameters and keeps BERTopic modular.

If I get your approval, I can start working on that.

