
persistable's Introduction


Persistent and stable clustering (Persistable) is a density-based clustering algorithm intended for exploratory data analysis. What distinguishes Persistable from other clustering algorithms is its visualization capabilities. Persistable's interactive mode lets you visualize the multi-scale and multi-density cluster structure present in the data. These visualizations guide the choice of parameters that lead to the final clustering.

Usage

Here is a brief outline of the main functionality; see the documentation for details, including the API reference.

To start Persistable's interactive mode from a Jupyter notebook, run the following in a cell:

import persistable
from sklearn.datasets import make_blobs

X = make_blobs(2000, centers=4, random_state=1)[0]

p = persistable.Persistable(X)
pi = persistable.PersistableInteractive(p)
pi.start_ui()

The last command returns the port on localhost that serves the UI, which is 8050 by default. Now go to localhost:8050 in your web browser to access the graphical user interface.
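If the page does not load, it can help to confirm that something is actually listening on the port. The following is a minimal sketch using only the Python standard library; the helper name is_port_open is hypothetical and not part of Persistable, and the default port 8050 is taken from the text above.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8050, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_port_open() after pi.start_ui() should return True once the UI is being served. If start_ui returned a different port, pass that value as the port argument.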


After choosing your parameters using the user interface, you can get your clustering in another Jupyter cell by running:

clustering_labels = pi.cluster()

Note: You may use pi.start_ui(jupyter_mode="inline") to have the graphical user interface display directly in the Jupyter notebook!

Installing

Make sure you are using Python 3. Persistable depends on the following Python packages, which are installed automatically when you install with pip: numpy, scipy, scikit-learn, cython, plotly, dash, diskcache, multiprocess, psutil. To install from PyPI, run the following:

pip install persistable-clustering
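To see which of the dependencies listed above are already present in your environment before installing, a small check along these lines may be useful. This is a sketch under assumptions: the helper name missing_distributions is hypothetical, and the names in DEPENDENCIES are copied from the list above and assumed to match the distribution names on PyPI.

```python
from importlib import metadata

# Dependency names as listed in this README (assumed to be the PyPI distribution names).
DEPENDENCIES = [
    "numpy", "scipy", "scikit-learn", "cython", "plotly",
    "dash", "diskcache", "multiprocess", "psutil",
]


def missing_distributions(names):
    """Return the names from `names` for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, print(missing_distributions(DEPENDENCIES)) lists what pip would still need to download.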

Documentation and support

You can find the documentation at persistable.readthedocs.io. If you have further questions, please open an issue and we will do our best to help you. Please include as much information as possible, including your system's information, warnings, logs, screenshots, and anything else you think may be of use. If you do not wish to open an issue, you are also welcome to contact Luis Scoccola directly. Please be patient if it takes us a bit to get back to you.
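One way to gather the system information requested above is a short script like the following. It is only a sketch: the function name collect_report and the report fields are hypothetical choices, not a format required by the maintainers, and the default package name is the one used in the pip command in this README.

```python
import platform
from importlib import metadata


def collect_report(package: str = "persistable-clustering") -> dict:
    """Collect basic environment details to paste into an issue."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "os": platform.platform(),            # operating system name and release
        "machine": platform.machine(),        # CPU architecture
        "python": platform.python_version(),  # interpreter version
        "package": package,
        "package_version": package_version,
    }
```

Printing the returned dictionary gives a compact block of text that can be pasted alongside warnings, logs, and screenshots.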

Running the tests

You can run the tests by running the following commands from the root directory of a clone of this repository. If a test fails, please report a bug, trying to include as much information as possible, including your system's information, warnings, logs, screenshots, and anything else you think may be of use.

pip install pytest playwright pytest-playwright
python -m playwright install --with-deps
pip install -r requirements.txt
python -m setup build_ext --inplace
pytest .

Details about theory and implementation

Persistable is based on multi-parameter persistence [4], a method from topological data analysis. The theory behind Persistable is developed in [1], while this implementation uses the high performance algorithms for density-based clustering developed in [2] and implemented in [3]. Persistable's interactive mode is inspired by RIVET [5] and is implemented in Dash.

Contributing

To contribute, you can fork the project, make your changes, and submit a pull request. You may want to contact Luis Scoccola first, to make sure your work does not overlap with ongoing work.

Authors

Luis Scoccola and Alexander Rolle.

Citing

If you use this package in your work, you may cite the corresponding paper using the following bibtex entry:

@article{Scoccola2023,
    doi = {10.21105/joss.05022},
    url = {https://doi.org/10.21105/joss.05022},
    year = {2023},
    publisher = {The Open Journal},
    volume = {8},
    number = {83},
    pages = {5022},
    author = {Luis Scoccola and Alexander Rolle},
    title = {Persistable: persistent and stable clustering},
    journal = {Journal of Open Source Software}
}

References

[1] Stable and consistent density-based clustering. A. Rolle and L. Scoccola. arXiv:2005.09048

[2] Accelerated Hierarchical Density Based Clustering. L. McInnes, J. Healy. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, pp 33-42. 2017

[3] hdbscan: Hierarchical density based clustering. L. McInnes, J. Healy, S. Astels. Journal of Open Source Software, The Open Journal, volume 2, number 11. 2017

[4] An Introduction to Multiparameter Persistence. M. B. Botnan, M. Lesnick. Proceedings of the 2020 International Conference on Representations of Algebras. 2022

[5] RIVET. The RIVET Developers. [Git] [docs]

License

This software is published under the 3-clause BSD license.

persistable's People

Contributors

alexanderrolle, luisscoccola, manuelf

Forkers

manuelf

persistable's Issues

Community guidelines

The Contributing section lists what to do for people wishing to contribute, but it would be beneficial if you could also cover those seeking help with issues (e.g. how to file an issue, and what the expectations are for issue formatting if any), and simply with questions relating to the library (e.g. a discussions section). As a further step it could be helpful to include a code of conduct for the repository, just to make expectations clear.

openjournals/joss-reviews#5022

PyPI Registration Question

This issue is non-blocking for my JOSS review.

Sorry if I missed it in the documentation somewhere, but is there a plan to register the package on PyPI and perhaps conda-forge? I noticed that the name persistable is already taken on PyPI, which is rather unfortunate. Is this the reason, or is something else holding up PyPI registration? Mind you, I don't consider it a big deal to not register on these servers, but the convenience of installation could bring added visibility for your package.

Error listing upon suggested tests

These errors occurred when I ran the following tests shown on the GitHub page after installing persistable on the command line, which went without problems:

pip install pytest playwright pytest-playwright
python -m playwright install --with-deps
pip install -r requirements.txt
python -m setup build_ext --inplace
pytest .

As suggested, I'm sending them on. See attached screen grab.

ErrorListing_28Nov2023

API Index

This is a relatively low-level request that I have for the documentation, which I otherwise enjoy, but I would like to see an index of the core types and methods of the package in the API reference section. For example, I wonder if it is possible in the sphinx docs setup that you use for the documentation to, in the API section, have a list of types and their methods that link to their full definitions further down the page (in the way that you have currently).

If this is limited by the tools that sphinx provides, it is no worries. I just know that other packages such as those for documenting Julia code allow for an index functionality.

Clustering Parameters Question

This question is non-blocking for my JOSS review.

Is there a plan to make the parameters stateful or transferable? For example, after I select some desirable parameters using the persistable method, where do the parameters "live," so to speak, and how could I use these parameters to cluster on another dataset after reloading? I do understand that there is a quick_cluster method in your API, which I appreciate, but as I understand it this method automatically selects parameters based upon the ranges provided. Sorry again if this is due to my misunderstanding of the tool, which is why I add this issue as non-blocking.
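Independently of what Persistable itself provides, one generic way to keep a chosen set of parameters across sessions is to write them to a JSON file and read them back later. The sketch below assumes the parameters can be represented as a plain dictionary of JSON-compatible values; the function names and the example keys are hypothetical and do not correspond to Persistable's API.

```python
import json
from pathlib import Path


def save_parameters(params: dict, path) -> None:
    """Write a dictionary of JSON-compatible parameters to `path`."""
    Path(path).write_text(json.dumps(params, indent=2, sort_keys=True))


def load_parameters(path) -> dict:
    """Read back a dictionary previously written by save_parameters."""
    return json.loads(Path(path).read_text())


# Hypothetical example values, for illustration only.
example_params = {"example_n_clusters": 4, "example_scale": 0.5}
```

With this, save_parameters(example_params, "params.json") in one session and load_parameters("params.json") in another would recover the same dictionary, which could then be passed to whatever clustering call accepts those values.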

Installation requirements

The installation went very smoothly, relying on pip to pull down all the requirements. It would be helpful, however, to list the top level installation requirements in the documentation up front so that users can have a clear idea of what they will be pulling in before they do the install (I know this is covered by requirements.txt but putting it in the README and readthedocs would be beneficial).

openjournals/joss-reviews#5022
