PerVect

PerVect is a library for Persistence-diagram Vectorization: converting the output of a persistent homology computation into a vector from which a close approximation to the Wasserstein distance between persistence diagrams can still be computed. It works in three steps: approximate a training set of persistence diagrams with a Gaussian mixture model; vectorize a diagram as the weighted maximum likelihood estimate of the mixture weights for the learned components, given the diagram; and measure the Wasserstein distance between vectorized diagrams as the Wasserstein distance between the corresponding Gaussian mixtures. As the number of components in the mixture model increases, the approximation becomes more accurate, and it is exact in the limit.
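
The vectorization step can be illustrated with a minimal sketch, assuming the Gaussian components have already been learned from the training set (for example, the `means_` and `covariances_` of a scikit-learn `GaussianMixture` fitted with `covariance_type="full"`). The function names below are hypothetical, and weighting each point by its lifetime is an assumption made for illustration; this is not pervect's internal code. Only the mixture weights are re-estimated, by EM with the components held fixed.

```python
import numpy as np


def gaussian_pdf(points, mean, cov):
    """Density of a 2D Gaussian N(mean, cov) evaluated at each row of `points`."""
    diff = points - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    # Squared Mahalanobis distance for every point at once
    mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return norm * np.exp(-0.5 * mahal)


def estimate_mixture_weights(diagram, means, covariances, n_iter=100):
    """Weighted EM over the mixture weights only; means and covariances stay fixed.

    `diagram` is an (n, 2) array in birth-lifetime format. Each point is weighted
    by its lifetime (an assumption of this sketch), so long-lived features count more.
    """
    point_weights = diagram[:, 1] / diagram[:, 1].sum()
    # Density of each fixed component at each point: shape (n_points, n_components)
    densities = np.column_stack(
        [gaussian_pdf(diagram, m, c) for m, c in zip(means, covariances)]
    )
    k = len(means)
    mix = np.full(k, 1.0 / k)  # start from uniform weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = densities * mix
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: new weights are the point-weighted average responsibilities
        mix = point_weights @ resp
    return mix
```

The returned weight vector (one entry per learned component) plays the role of the diagram's vector representation in this sketch.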

The library is implemented as a scikit-learn transformer: it takes a list of persistence diagrams (preferably in birth-lifetime format) as input and produces vector representations. Alternatively, UMAP can be used to convert the diagrams to a lower-dimensional Euclidean representation.
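
Many persistent homology tools report diagrams as (birth, death) pairs. A minimal sketch of converting such output to birth-lifetime format is shown below; the helper name is hypothetical, and dropping points with infinite death (essential classes) is a choice made for this sketch rather than documented pervect behaviour.

```python
import numpy as np


def to_birth_lifetime(diagram):
    """Convert an (n, 2) array of (birth, death) pairs to (birth, lifetime) pairs.

    Points with infinite death have no finite lifetime and are dropped here;
    how to treat them is left to the user.
    """
    diagram = np.asarray(diagram, dtype=float)
    finite = np.isfinite(diagram[:, 1])
    births = diagram[finite, 0]
    lifetimes = diagram[finite, 1] - births  # lifetime = death - birth
    return np.column_stack([births, lifetimes])
```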

How to use PerVect

The pervect library inherits from sklearn classes and can be used as an sklearn transformer. Assuming that you have a list of persistence diagrams, where each diagram is a numpy array of 2D points, you can vectorize them by simply applying:

import pervect
vects = pervect.PersistenceVectorizer().fit_transform(diagrams)
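
The call above assumes `diagrams` already exists. For experimenting without a persistent homology library, a hypothetical helper like the one below produces random diagrams of the expected shape. It is purely synthetic data, not the output of any real computation.

```python
import numpy as np


def make_toy_diagrams(n_diagrams=10, max_points=30, seed=0):
    """Generate random persistence diagrams in birth-lifetime format.

    Each diagram is an (n_points, 2) array with births in [0, 1) and
    strictly positive lifetimes.
    """
    rng = np.random.default_rng(seed)
    diagrams = []
    for _ in range(n_diagrams):
        n_points = rng.integers(1, max_points + 1)  # between 1 and max_points points
        births = rng.uniform(0.0, 1.0, size=n_points)
        lifetimes = rng.exponential(scale=0.2, size=n_points) + 1e-3
        diagrams.append(np.column_stack([births, lifetimes]))
    return diagrams
```

Passing `make_toy_diagrams()` as `diagrams` should be enough to try the vectorizer end to end.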

It can also be used in standard sklearn pipelines along with other machine learning tools, including clustering algorithms and classifiers. For example, given a set of training diagrams and a separate test set of diagrams, we could fit on the training set and transform both:

import pervect
vectorizer = pervect.PersistenceVectorizer().fit(train)
train_vectors = vectorizer.transform(train)
test_vectors = vectorizer.transform(test)

The vectorizer can also efficiently approximate the Wasserstein distance between diagrams. A trained model can compute pairwise p-Wasserstein distances among a list of diagrams as follows:

import pervect
vectorizer = pervect.PersistenceVectorizer().fit(train)
test_diagram_distances = vectorizer.pairwise_p_wasserstein_distance(test, p=1)
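
The README does not spell out how the mixture-to-mixture distance is computed. As a simplified illustration of the idea, not pervect's implementation, each vectorized diagram can be treated as a discrete distribution over the learned component means, and the distance found by solving a small optimal transport linear program. Two assumptions to flag: the ground cost is just the Euclidean distance between component means (a closer treatment would compare the Gaussians themselves), and matching to the diagonal, which true persistence-diagram Wasserstein distances allow, is ignored. The `pot` library listed in the requirements offers `ot.emd2` for the same kind of computation; SciPy's `linprog` is used here instead.

```python
import numpy as np
from scipy.optimize import linprog


def mixture_wasserstein(weights_a, weights_b, means, p=1):
    """p-Wasserstein distance between two weight vectors over shared component means.

    Simplification: each Gaussian component is collapsed to its mean, and both
    weight vectors are normalized to total mass 1.
    """
    means = np.asarray(means, dtype=float)
    a = np.asarray(weights_a, dtype=float)
    b = np.asarray(weights_b, dtype=float)
    a, b = a / a.sum(), b / b.sum()
    k = len(a)
    # Ground cost: Euclidean distance between component means, raised to the power p
    diffs = means[:, None, :] - means[None, :, :]
    cost = np.linalg.norm(diffs, axis=-1) ** p
    # Transport plan T (k x k, flattened row-major): rows sum to a, columns sum to b
    row_constraints = np.kron(np.eye(k), np.ones(k))
    col_constraints = np.kron(np.ones(k), np.eye(k))
    A_eq = np.vstack([row_constraints, col_constraints])
    b_eq = np.concatenate([a, b])
    result = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    # Clamp tiny negative round-off before taking the p-th root
    return max(result.fun, 0.0) ** (1.0 / p)
```

For example, with means at (0, 0) and (3, 4), moving all mass from the first component to the second costs 5 for both p=1 and p=2.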

The vectorizer can also automatically produce UMAP representations of the diagrams, using either "hellinger" distance or Wasserstein distance (note that transforming new data with a Wasserstein-trained UMAP model is currently unavailable):

import pervect
diagram_map = pervect.PersistenceVectorizer(apply_umap=True).fit(diagrams)
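
Hellinger distance suits vectors of mixture weights because they behave like probability distributions over the learned components. A small numpy sketch of the standard definition follows; it is not taken from pervect or umap-learn, whose implementations may differ in normalization details.

```python
import numpy as np


def hellinger_distance(u, v):
    """Hellinger distance between two non-negative vectors, treated as distributions.

    Both vectors are normalized to sum to 1; the result lies in [0, 1].
    """
    p = np.asarray(u, dtype=float)
    q = np.asarray(v, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    # Bhattacharyya coefficient: 1 for identical distributions, 0 for disjoint support
    bc = np.sum(np.sqrt(p * q))
    return np.sqrt(max(1.0 - bc, 0.0))
```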

Installation

Requirements:

  • Python >= 3.6
  • scikit-learn
  • umap-learn
  • numba
  • joblib
  • pot

You can install pervect from PyPI with pip:

pip install pervect

For a manual install, download the package source:

wget https://github.com/scikit-tda/pervect/archive/master.zip
unzip master.zip
rm master.zip
cd pervect-master

Install the requirements:

pip install -r requirements.txt

Install the package:

pip install .

References

This package was inspired by and builds upon the work of Elizabeth Munch, Jose Perea, Firas Khasawneh and Sarah Tymochko. You can refer to the papers:

Jose A. Perea, Elizabeth Munch, Firas A. Khasawneh, Approximating Continuous Functions on Persistence Diagrams Using Template Functions, arXiv:1902.07190

Sarah Tymochko, Elizabeth Munch, Firas A. Khasawneh, Adaptive Partitioning for Template Functions on Persistence Diagrams, arXiv:1910.08506v1

License

The pervect package is 3-clause BSD licensed.

We would like to note that the pervect package makes heavy use of NumFOCUS sponsored projects, and would not be possible without their support of those projects, so please consider contributing to NumFOCUS.

Contributing

Contributions are more than welcome! There are lots of opportunities for potential projects, so please get in touch if you would like to help out. Everything from code to notebooks to examples and documentation is equally valuable, so please don't feel you can't contribute. To contribute, please fork the project, make your changes, and submit a pull request. We will do our best to work through any issues with you and get your code merged into the main branch.

Contributors

cjweir, lmcinnes


