Code Monkey home page Code Monkey logo

pareto_dib's Introduction

Primal Deterministic Information Bottleneck

python tags tests license

This code is an implementation of the Pareto Mapper and Symmetric Pareto Mapper algorithms presented in "Pareto-optimal clustering with the primal deterministic information bottleneck."

Abstract

At the heart of both lossy compression and clustering is a trade-off between the fidelity of the learned representation and its size. We focus on the Deterministic Information Bottleneck (DIB) formulation of lossy compression, which can be readily interpreted as a clustering problem, owing to the fact that its search space is that of hard clusterings. Our goal is to motivate the task of mapping out the Pareto frontier of these two-objective trade-offs in full generality. To this end, we introduce the primal DIB problem, which we argue results in a much richer frontier than its previously studied dual counterpart. We present an algorithm for mapping out the Pareto frontier of the DIB trade-off that is also applicable to most two-objective clustering problems. We study general properties of the Pareto set, and give both analytic and numerical evidence for the logarithmic sparsity of the frontier in general. We present evidence for the polynomial scaling of our algorithm despite the super-exponential search space; and additionally propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory inspired dataset, revealing interesting features of frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull.

Installation

Download the latest release and install by running python setup.py install.

Usage

The main methods in the package are as follows:

  • Pareto Mapper: from pareto_dib import pareto_mapper
  • Symmetric Pareto Mapper: from pareto_dib import symmetric_pareto_mapper
  • Plotting utility: from pareto_dib import pareto_plot

Pareto Mapper

An example use case of Pareto Mapper is provided below. The pareto_mapper function takes a normalized joint distribution of variables (X, Y), pxy (numpy.ndarray) with shape (|X|, |Y|). The search depth is epsilon=1e-8.

pset, _ = pareto_mapper(pxy, epsilon=1e-8)
ax = pareto_plot(pset)

Symmetric Pareto Mapper

An example use case of Symmetric Pareto Mapper is provided below. The symmetric_pareto_mapper function takes a normalized joint distribution of variables (X_1, X_2, Y), pxxy (numpy.ndarray) with shape (|X|, |X|, |Y|). The search depth is epsilon=1e-8.

pset, _ = pareto_mapper(pxxy, epsilon=1e-8)
ax = pareto_plot(pset)

Examples

The datasets presented in "Pareto-optimal clustering with the primal deterministic information bottleneck" are provided in the `examples/data' directory.

For details on the creation of the datasets and a discussion of the frontier, please refer to the paper.

English alphabet

Here the X is the character immediately preceeding Y with the distribution derived from a large body of English text. To reproduce this plot, run python3 examples/example.py --dataset alpha27.

English Alphabet frontier

Colors

Here X contains information about an object's color and Y is the object's class. To reproduce this plot, run python3 examples/example.py --dataset colors.

Colors frontier

Group datasets

The group examples showcase the Symmetric Pareto Mapper. The random variables X_1 and X_2 are drawn uniformly from a group and Z = X_1 * X_2, where * denotes the group operation. Examples for the multiplicative group modulo 40 (Z40x) and the Pauli group are presented below. Both datasets result in the same DIB frontier despite being derived from qualitatively very different groups.

To reproduce this plot, run python3 examples/example.py --dataset Z40x.

Z40x group frontier

To reproduce this plot, run python3 examples/example.py --dataset pauli.

Pauli group frontier

Version History

  • 0.0.1
    • Initial Release

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Citation

pareto_dib's People

Contributors

andrewktan avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.