DeeProb-kit

DeeProb-kit is a unified library written in Python consisting of a collection of deep probabilistic models (DPMs) that are tractable and exact representations for the modelled probability distributions. The availability of a representative selection of DPMs in a single library makes it possible to combine them in a straightforward manner, a common practice in deep learning research nowadays. In addition, it includes efficiently implemented learning techniques, inference routines, statistical algorithms, and provides high-quality fully-documented APIs. The development of DeeProb-kit will help the community to accelerate research on DPMs as well as to standardise their evaluation and better understand how they are related based on their expressivity.

Features

  • Inference algorithms for SPNs. [1, 2]
  • Structure learning algorithms for SPNs. [1, 3, 4, 2, 5]
  • Chow-Liu Trees (CLTs) as SPN leaves. [6]
  • Cutset Networks (CNets) with various learning criteria. [7]
  • Batch Expectation-Maximization (EM) for SPNs with arbitrary leaves. [8, 9]
  • Structural marginalization and pruning algorithms for SPNs.
  • High-order moments computation for SPNs.
  • JSON I/O operations for SPNs and CLTs. [2]
  • Plotting operations based on NetworkX for SPNs and CLTs. [2]
  • Randomized and Tensorized SPNs (RAT-SPNs). [10]
  • Deep Generalized Convolutional SPNs (DGC-SPNs). [11]
  • Masked Autoregressive Flows (MAFs). [12]
  • Real Non-Volume-Preserving (RealNVP) flows. [13]
  • Non-linear Independent Component Estimation (NICE) flows. [14]

The collection of implemented models is summarized in the following table.

| Model       | Description                                        |
|-------------|----------------------------------------------------|
| Binary-CLT  | Binary Chow-Liu Tree (CLT)                         |
| Binary-CNet | Binary Cutset Network (CNet)                       |
| SPN         | Vanilla Sum-Product Network                        |
| MSPN        | Mixed Sum-Product Network                          |
| XPC         | Random Probabilistic Circuit                       |
| RAT-SPN     | Randomized and Tensorized Sum-Product Network      |
| DGC-SPN     | Deep Generalized Convolutional Sum-Product Network |
| MAF         | Masked Autoregressive Flow                         |
| NICE        | Non-linear Independent Components Estimation Flow  |
| RealNVP     | Real-valued Non-Volume-Preserving Flow             |

Installation

The library can be installed either from PyPI or from source.

```shell
# Install from PyPI
pip install deeprob-kit
# Install from the `main` git branch
pip install -e git+https://github.com/deeprob-org/deeprob-kit.git@main#egg=deeprob-kit
```

Project Directories

The documentation is generated automatically by Sphinx using sources stored in the docs directory.

A collection of code examples and experiments can be found in the examples and experiments directories respectively. Moreover, benchmark code can be found in the benchmark directory.

Cite

```bibtex
@misc{loconte2022deeprob,
  doi = {10.48550/ARXIV.2212.04403},
  url = {https://arxiv.org/abs/2212.04403},
  author = {Loconte, Lorenzo and Gala, Gennaro},
  title = {{DeeProb-kit}: a Python Library for Deep Probabilistic Modelling},
  publisher = {arXiv},
  year = {2022}
}
```

References

  1. Peharz et al. On Theoretical Properties of Sum-Product Networks. AISTATS (2015).

  2. Molina, Vergari et al. SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning using Sum-Product Networks. CoRR (2019).

  3. Poon and Domingos. Sum-Product Networks: A New Deep Architecture. UAI (2011).

  4. Molina, Vergari et al. Mixed Sum-Product Networks: A Deep Architecture for Hybrid Domains. AAAI (2018).

  5. Di Mauro et al. Sum-Product Network Structure Learning by Efficient Product Nodes Discovery. AIxIA (2018).

  6. Di Mauro, Gala et al. Random Probabilistic Circuits. UAI (2021).

  7. Rahman et al. Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees. ECML-PKDD (2014).

  8. Desana and Schnörr. Learning Arbitrary Sum-Product Network Leaves with Expectation-Maximization. CoRR (2016).

  9. Peharz et al. Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits. ICML (2020).

  10. Peharz et al. Probabilistic Deep Learning using Random Sum-Product Networks. UAI (2020).

  11. Van de Wolfshaar and Pronobis. Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations. PGM (2020).

  12. Papamakarios et al. Masked Autoregressive Flow for Density Estimation. NeurIPS (2017).

  13. Dinh et al. Density Estimation using RealNVP. ICLR (2017).

  14. Dinh et al. NICE: Non-linear Independent Components Estimation. ICLR (2015).

deeprob-kit's People

Contributors: fedous, gengala, loreloc, yangyang-pro


deeprob-kit's Issues

Update README.md and fix implicit imports

  • Update the table of implemented models in README.md
  • Add NormalizingFlow abstract class import in flows/models/__init__.py
  • Add RatSpn abstract class import in spn/models/__init__.py
  • Fix the `'type' object is not subscriptable` error when using Sphinx
  • Prepend MIT license information to every source file in deeprob/

Add a string flag "method" on SPN learning wrappers

Add a string flag method to the learn_estimator function (in module deeprob.spn.learning.wrappers) that allows choosing between different SPN learning algorithms.

At the moment, the method flag must support two values: learnspn and learnxpc, corresponding to the LearnSPN and LearnXPC algorithms respectively.
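A string-dispatch wrapper could implement this flag along the following lines. This is a hypothetical sketch: the learner functions below are stand-ins for the actual LearnSPN/LearnXPC routines, and the signature is illustrative, not the library's real API.

```python
# Placeholder learners standing in for the real LearnSPN/LearnXPC routines.
def learn_spn(data, **kwargs):
    return {"algorithm": "learnspn", "data": data}

def learn_xpc(data, **kwargs):
    return {"algorithm": "learnxpc", "data": data}

# Registry mapping flag values to learning algorithms.
_LEARN_METHODS = {"learnspn": learn_spn, "learnxpc": learn_xpc}

def learn_estimator(data, method="learnspn", **kwargs):
    """Learn an SPN structure, choosing the algorithm by a string flag."""
    if method not in _LEARN_METHODS:
        raise ValueError(
            f"Unknown method {method!r}; expected one of {sorted(_LEARN_METHODS)}"
        )
    return _LEARN_METHODS[method](data, **kwargs)
```

A registry dictionary keeps the wrapper open for extension: adding a new learning algorithm only requires registering one more entry.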

Fully differentiable MAFs

The apply_forward method of the AutoregressiveLayer class is not differentiable due to in-place operations. This makes it impossible to train using the flow's sampling direction.
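The usual fix for autograd-breaking updates is to replace in-place writes with out-of-place accumulation, so every intermediate value stays in the computation graph. The sketch below shows the pattern in framework-agnostic Python; the function names are illustrative, not the library's.

```python
# In-place update: mutates its input step by step. In an autograd framework
# such as PyTorch, overwriting tensors like this breaks gradient tracking.
def apply_forward_inplace(x, transforms):
    for i, t in enumerate(transforms):
        x[i] = t(x[i])
    return x

# Out-of-place alternative: builds a fresh sequence instead of mutating,
# which keeps every intermediate value (and hence gradients) alive.
def apply_forward_functional(x, transforms):
    return [t(xi) for t, xi in zip(transforms, x)]
```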

Setup PyLint

Set up the PyLint static code analyser.
Also, set up a GitHub Action that automatically prints a report about code quality.

Refactor Unit Tests

  • Refactor tests to use pytest instead of unittest
  • Add tests for shape checking
  • Introduce Continuous Integration (CI), e.g. a GitHub Action using Codecov on merges to main

Write a README.md file for each sub-directory

Split the README.md file at the root directory into multiple Markdown files discussing the content (and usage) of the scripts in the following directories:

  • benchmark
  • docs
  • examples
  • experiments

On flows, mean and standard deviation of default base distribution are not kept constant during training

When training a normalizing flow with a standard Gaussian base distribution (i.e. the default in_base=None), the mean and standard deviation are not kept constant during training. The expected behavior is that they remain constant.

This is probably due to an incorrect initialization of the mean and standard deviation parameters: https://github.com/deeprob-org/deeprob-kit/blob/main/deeprob/flows/models/base.py#L52-L53.
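In PyTorch terms, the likely fix is to register the constants as buffers (register_buffer) rather than trainable parameters, since only parameters are handed to the optimizer. The toy sketch below imitates that distinction in plain Python to show why it matters; class and method names mimic PyTorch but are not the library's code.

```python
# Minimal, framework-free imitation of the parameter/buffer split.
class Module:
    def __init__(self):
        self._parameters = {}  # values the optimizer updates
        self._buffers = {}     # values saved with the model but never trained

    def register_parameter(self, name, value):
        self._parameters[name] = value

    def register_buffer(self, name, value):
        self._buffers[name] = value

    def parameters(self):
        # Only registered parameters are exposed to the optimizer.
        return list(self._parameters.values())

class StandardGaussianBase(Module):
    def __init__(self):
        super().__init__()
        # Constants of the default base distribution: buffers, not parameters,
        # so training cannot change them.
        self.register_buffer("mean", 0.0)
        self.register_buffer("std", 1.0)
```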

Example `plot_spn.py` raises an error

I was trying to run plot_spn.py, but the code raises an error. Here's the output:

```
Plotting the dummy SPN to spn-dummy.svg ...
Traceback (most recent call last):
  File ".../deeprob-kit/examples/spn_plot.py", line 25, in <module>
    spn.plot_spn(root, spn_filename)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/deeprob/spn/structure/io.py", line 317, in plot_spn
    pos = nx_pydot.graphviz_layout(graph, prog='dot')
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 357, in graphviz_layout
    return pydot_layout(G=G, prog=prog, root=root)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 406, in pydot_layout
    P = to_pydot(G)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 263, in to_pydot
    raise ValueError(
ValueError: Node names and attributes should not contain ":" unless they are quoted with "". For example the string 'attribute:data1' should be written as '"attribute:data1"'. Please refer https://github.com/pydot/pydot/issues/258
```
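A possible workaround, until the node naming is fixed, is to quote any node name containing ':' before the graph reaches pydot (as the error message itself suggests). The helper below is a hypothetical sketch of that idea, not code from the library.

```python
# Hypothetical helper: quote graph node names that pydot rejects unquoted
# (see https://github.com/pydot/pydot/issues/258).
def pydot_safe_name(name):
    name = str(name)
    if ":" in name and not (name.startswith('"') and name.endswith('"')):
        return f'"{name}"'
    return name
```

Applying this helper to every node name (e.g. when building the NetworkX graph in plot_spn) should avoid the ValueError above.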

Feedback on the example running experience

I ran all examples. They are a nice way of testing how the code runs on one's computer and show its capabilities. Below, I provide some points of feedback/suggestions that may improve the experience people have when running the examples. Some of that feedback may pertain or be relevant to other parts of the code base as well.

  1. Often, files are created as part of an example, such as the nice illustrative figures. It would be useful to alert the user of all files being created, so that they are aware of this even if they do not keep an eye on their working folder. Also, some files have unclear purpose (such as the pt files). Clarifying their use when alerting they are created would therefore be useful. (If they are temporary files, delete them at the end of running the example or use the tempfile module.)
  2. The console output provides useful information about the time it takes to run an example. If possible, generalize this to all examples that are not trivially short. (I think the first stage of spn_latent_mnist.py does not.)
  3. The console output numerical values often have a large number of digits displayed. There is little reason to believe that many are actually significant. Furthermore, it makes the output more difficult to read and digest. Ideally, output only significant digits, but if you do not know how many digits are significant, 4 digits in total is a good upper bound (like 57.63 %, 1234, 1.234e6).
  4. Many of the console output numbers have units (s, it/s, batch/s). The international standard is to always have a space between a number and its unit.
  5. Sometimes, JSON output is created either as console output or in files. Try to pretty-print it a bit, to make it easier to scan. If it is not meant to be read, perhaps consider omitting it.
  6. For many of the examples, you generate images, which is great. It would add value to have every example generate some image, even if it is not a sample. Namely, the examples can also provide users of the package inspiration of the type of images that they might generate.
  7. In one case, an image was generated in an interactive window (nvp1d_moons.py) and not in an image file. That is nice. Could it be generalized to all examples, with a fallback to image file generation?
  8. In two cases, the examples automatically downloaded some datasets. While convenient, some users might not expect this, may not like it, or may not have an internet connection. I think it would be more user-friendly to ask first or instruct first where to download the dataset. Furthermore, I saw that MNIST was downloaded from LeCun's original website, who explicitly requests not to do that (“Please refrain from accessing these files from automated scripts with high frequency. Make copies!”); it would be polite to honor that request. In general, make sure to download from permanent repositories if possible instead of possibly non-permanent websites.
  9. The console output lists accuracy percentages. These generally are quite a bit closer to 100 % than to 0 %. Therefore, the initial digit (7, 8, 9) is often not very significant and therefore distracting. It is more user friendly to use error rate instead, so, e.g., [12.49, 8.66, 4.57] instead of [87.51, 91.34, 95.43].

Obviously, these are mostly cosmetic suggestions, so I'd understand that you classify (parts of) this issue as ‘wontfix’.

Create TreeBN class

Most of the code available in BinaryCLT actually works for any tree-shaped Bayesian Network. Therefore, it would be better to create a super-class called TreeBN and then make BinaryCLT a subclass of it.
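A minimal sketch of the proposed hierarchy is shown below. The tree representation (a child-to-parent mapping) and the method names are placeholders for illustration; the real classes carry much more state.

```python
class TreeBN:
    """Generic tree-shaped Bayesian Network.

    The structure is stored as a child -> parent mapping, with None
    marking the root.
    """
    def __init__(self, parents):
        self.parents = dict(parents)

    def root(self):
        # The root is the unique node without a parent.
        return next(v for v, p in self.parents.items() if p is None)

    def children_of(self, node):
        return [v for v, p in self.parents.items() if p == node]

class BinaryCLT(TreeBN):
    """Chow-Liu Tree over binary variables; CLT-specific structure
    learning and parameter estimation would live here."""
```

With this split, any future tree-shaped model (e.g. over categorical variables) could reuse the generic traversal and inference code in TreeBN.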

Introduce multithreaded implementation of forward and backward evaluation of SPNs

  • The forward evaluation (used for EVI, MAR and MPE queries and sampling) can be parallelized by considering a layered topological ordering of the SPN graph. That is, every leaf node can be evaluated in parallel and, after that, every parent node can be computed in parallel as well, and so on.
  • The backward evaluation (used for MPE query and sampling) can be parallelized by considering a layered topological ordering of the SPN graph, as for forward evaluation.
  • Moreover, introduce unit tests ensuring the correctness of the implementation.

A suitable multiprocessing library for this task is joblib, which allows specifying 'threading' as a lightweight backend.
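The layered scheme above can be sketched with the standard library's thread pool (joblib's 'threading' backend behaves similarly). The graph representation and the node_fn callback are assumptions made for illustration, not the library's API.

```python
from concurrent.futures import ThreadPoolExecutor

def topological_layers(children):
    """Group DAG nodes into layers: leaves first, each node after its children.

    `children` maps every node to the list of its child nodes ([] for leaves).
    """
    depth = {}
    def node_depth(n):
        if n not in depth:
            depth[n] = 0 if not children[n] else 1 + max(node_depth(c) for c in children[n])
        return depth[n]
    for n in children:
        node_depth(n)
    layers = {}
    for n, d in depth.items():
        layers.setdefault(d, []).append(n)
    return [layers[d] for d in sorted(layers)]

def forward(children, node_fn):
    """Evaluate node_fn(node, child_values) layer by layer, in parallel."""
    values = {}
    with ThreadPoolExecutor() as pool:
        for layer in topological_layers(children):
            # All nodes within a layer only depend on already-computed
            # values, so they can be evaluated concurrently.
            results = pool.map(
                lambda n: node_fn(n, [values[c] for c in children[n]]), layer
            )
            for n, v in zip(layer, results):
                values[n] = v
    return values
```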

Unclear where experiments folder is

I installed deeprob-kit using pip:

```shell
pip install --user deeprob-kit
```

Now, I want to try out the experiments to see if the code works on my system. However, I do not seem to have the experiments folder, and therefore cannot run them or put the datasets in place. Namely, what I have after installation is the following tree:

```
~/.local/lib/python3.9 $ tree -L 3
.
└── site-packages
    ├── deeprob
    │   ├── __init__.py
    │   ├── __pycache__
    │   ├── context.py
    │   ├── flows
    │   ├── spn
    │   ├── torch
    │   └── utils
    └── deeprob_kit-1.0.0.dist-info
        ├── INSTALLER
        ├── LICENSE
        ├── METADATA
        ├── RECORD
        ├── REQUESTED
        ├── WHEEL
        └── top_level.txt

8 directories, 9 files
```

My impression is that the bundle on PyPI only contains deeprob-kit itself, without any of the other materials. Perhaps putting the experiments folder (and others) under deeprob may provide a solution, but I guess you chose the current structure for a reason. Or perhaps I am looking in the wrong location.

Setup "deeprob-kit-docs" repository or "gh-pages" branch for automatically versioned documentation

Set up a new repository deeprob-org/deeprob-kit-docs or the special branch gh-pages containing versioned documentation.
Refer to sphinx-multiversion for building versioned documentation.
In particular, refer to a fork of sphinx-multiversion that supports sphinx-apidoc and sphinx-autodoc.

Finally, set up a GitHub Action that automatically pushes a new documentation version when:

  1. A push to the main branch is made.
  2. A new tag/release is pushed.

Alternatively, this can also be done using Travis CI.

Implement the FID score for normalizing flows

Implement the FID score for generative models. A suitable module in which to place the fid_score function is deeprob.utils.statistics.

Moreover, include the FID score, alongside the bits-per-pixel (BPP) metric, in the results reported by the normalizing flow experiments.
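As a starting point, the diagonal-covariance special case of the Fréchet distance can be written with the standard library alone; the general case additionally needs a matrix square root (e.g. scipy.linalg.sqrtm). The function below is a simplified sketch, not the proposed deeprob.utils.statistics implementation.

```python
import math

def fid_score_diag(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)); with diagonal
    covariances, the trace term reduces to a per-dimension sum.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2))
    return mean_term + cov_term
```

In the actual FID, the two Gaussians are fitted to Inception-network activations of real and generated samples respectively; identical statistics yield a score of zero.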
