
primo's Introduction

PRIMO - PRobabilistic Inference MOdules

This project is a (partial) reimplementation of the original probabilistic inference modules (see branch primo-legacy). The reimplementation follows the same general idea, but restructures and unifies the underlying datatypes to allow for a more concise API and more efficient manipulation, e.g. by the inference algorithms. In turn, the inference algorithms have been rewritten and partly extended. For most, if not all, use cases this implementation should be easier to use and more performant than the original.

primo's People

Contributors

hbuschme, jpoeppel, manuelbaum, maxkoch

Forkers

jpoeppel hbuschme

primo's Issues

Add warnings when incorrect CPTs are set

Currently we silently allow invalid CPTs, i.e. we do not check whether the probabilities sum to 1.
It might be better to at least issue a warning in these cases, so that the user is aware that the results might be unexpected.
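
A minimal sketch of what such a check could look like, assuming CPTs are handed around as numpy arrays with the variable's own states on the first axis (the helper name and array layout are illustrative, not primo's actual API):

```python
import warnings

import numpy as np


def check_cpt(cpt, tolerance=1e-8):
    """Warn if the conditional distributions of a CPT do not sum to 1.

    Assumes the first axis of ``cpt`` indexes the states of the variable
    itself, so every conditional distribution should sum to 1 along axis 0.
    """
    sums = np.asarray(cpt).sum(axis=0)
    if not np.allclose(sums, 1.0, atol=tolerance):
        warnings.warn(
            "CPT columns do not sum to 1 (max deviation {:.3g}); "
            "inference results may be unexpected.".format(
                float(np.max(np.abs(sums - 1.0)))))


# Binary variable with one binary parent; the second column is invalid.
check_cpt(np.array([[0.2, 0.5],
                    [0.8, 0.4]]))
```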

SlackTest

Just a quick test for the Slack integration.

Consider dropping networkx dependency

We hardly use networkx. It is only used as the underlying graph representation for the networks and FactorTrees, and it provides a useful helper for sampling.

Functionality we would need to implement in order to drop networkx:

  • Representing graph structure (potentially with variables for FactorTree) (used in network.py and inference/exact.py)
  • Topological sort for sampling (used in network.py)
  • Directed to undirected graph for FactorTree (used in inference/order.py and inference/exact.py)

Our own implementation could either be general (e.g. take the required parts from networkx almost as they are, but simplified for our needs), or specialized and optimized for our use case (e.g. flag nodes without ancestors specially for the topological sort; see the sketch at the end of this issue).

Did I forget any other functions we use?
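
For the topological sort in particular, a self-contained replacement along the lines of Kahn's algorithm would be fairly small. The sketch below flags nodes without ancestors up front, as suggested above; the function and parameter names are illustrative and not the actual primo interface:

```python
from collections import deque


def topological_sort(nodes, parents):
    """Return the nodes in a topological order (Kahn's algorithm).

    ``nodes`` is a list of node names, ``parents`` maps each node to the
    list of its parents. Raises ValueError if the graph contains a cycle.
    """
    children = {n: [] for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for node in nodes:
        for parent in parents.get(node, []):
            children[parent].append(node)
            in_degree[node] += 1

    # Nodes without ancestors are known immediately and seed the queue.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    if len(order) != len(nodes):
        raise ValueError("Graph contains a cycle.")
    return order


# Example: A -> B, A -> C, B -> C
print(topological_sort(["A", "B", "C"], {"B": ["A"], "C": ["A", "B"]}))
```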

FactorTree: Implement get_evidence_probability

The marginals function already references the get_evidence_probability function, but that one is not implemented yet.
The function should return the probability of the specified evidence. This will most likely require us to store the currently set evidence in the FactorTree, which is not done so far.
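
As an illustration of the quantity this function should return (using a plain joint table rather than primo's data structures; the names below are purely illustrative): the evidence probability is obtained by fixing the observed states and summing out all remaining variables. In a calibrated factor tree the same number is essentially the normalization constant of the potentials once the evidence has been entered, which is why the currently set evidence would need to be stored.

```python
import numpy as np


def evidence_probability(joint, variables, evidence):
    """P(evidence) from a joint table: select observed states, sum the rest.

    ``joint`` is a numpy array with one axis per variable (in the order
    given by ``variables``); ``evidence`` maps variable names to the index
    of their observed state.
    """
    index = tuple(evidence.get(var, slice(None)) for var in variables)
    return float(joint[index].sum())


# Joint distribution over two binary variables A and B.
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])
print(evidence_probability(joint, ["A", "B"], {"A": 1}))  # P(A=1) = 0.7
```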

Document DBN spec format

The DBN spec format is currently undocumented, and the ordering of transition pairs is slightly confusing.

Write documentation

While #18 is a good start, we should seriously consider writing proper documentation for the entire package. There are a couple of examples, but I am not sure they are sufficient.

Refactor inference methods to not expose the Factor class

Returning Factors when one expects probabilities might be confusing (especially since the actual distribution needs to be queried with get_potentials, not get_probability). Adding functions that imply the factor contains probabilities is problematic, since a factor cannot always know what kind of "probability" it represents (joint or conditional), so we cannot even normalize the potentials in order to get a valid probability in every case.

One solution might be to treat the Factor class as strictly internal to the computations and change the interfaces of all inference methods to return different objects. One could consider returning low-level np.arrays, or alternatively a wrapper object, structurally similar to the current Factor class, but exposing an interface that implies actual probabilities and offers convenience functions.
The upside of this approach would be that naive users can understand the returned results more easily, plus the option to specialize factors further for computational efficiency.
The downside is the loss of flexibility when returning low-level np.arrays, or the complexity added by yet another class that needs to be created (which should be fine, since this only happens once per query, at the end).
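
A minimal sketch of the wrapper-object variant (class and method names are hypothetical, not an existing primo interface): the potentials are normalized once at construction time, and the object only speaks in terms of probabilities afterwards.

```python
import numpy as np


class ProbabilityTable:
    """Hypothetical result object returned by inference methods.

    Structurally similar to a Factor, but the stored values are proper
    probabilities over the query variables.
    """

    def __init__(self, variables, values, potentials):
        self.variables = list(variables)  # one array axis per variable
        self.values = {var: list(vals) for var, vals in values.items()}
        potentials = np.asarray(potentials, dtype=float)
        self._probabilities = potentials / potentials.sum()

    def get_probability(self, **assignment):
        """Return P(assignment), summing out any unassigned variables."""
        index = tuple(
            self.values[var].index(assignment[var]) if var in assignment
            else slice(None)
            for var in self.variables)
        return float(self._probabilities[index].sum())


# Example: a single binary query variable with unnormalized potentials.
table = ProbabilityTable(["Rain"], {"Rain": ["yes", "no"]}, [3.0, 1.0])
print(table.get_probability(Rain="yes"))  # 0.75
```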

Decide on how we handle docstrings in subclasses

PR #5 made it apparent that we have not yet decided how we want to handle docstrings in subclasses (especially those derived from abstract classes). More or less copying them does not appear reasonable, as it makes it easy to end up with inconsistent docstrings when we need to change something.
Just referencing the superclass in the docstring of the subclass has the problem of not being introspection-friendly in IDEs.
Furthermore, in the case where additional parameters are possible, it is not clear where to put their description.
We should decide on one approach to use throughout the codebase.

Options that I see currently:

  1. Superclass holds the precise docstrings, containing information about all parameters and what the function is supposed to do. The subclasses then reference that docstring while making clear whether they extend or override the original function, as well as any special behaviour that differs from what is specified in the superclass.
  2. Superclasses (especially abstract ones) will only contain minimal docstrings, stating the general idea of the functions and maybe the types of the parameters. Only the subclasses provide the concrete descriptions (will most likely not work well for non-abstract superclasses as they should have proper docstrings on their own).
  3. Both super- and subclasses have full docstrings explaining everything relevant to their current implementation. This requires a lot of duplication of docstrings though.

I guess I would currently favour option 1, perhaps while still providing information about the parameters in the subclasses to aid introspection in IDEs.
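
To make option 1 concrete, here is a small hypothetical example of the convention (class and method names are made up for illustration): the abstract superclass carries the full parameter description, and the subclass docstring merely references it and documents deviations.

```python
import abc


class InferenceMethod(abc.ABC):
    """Hypothetical abstract base class, used only to illustrate option 1."""

    @abc.abstractmethod
    def marginals(self, variables, evidence=None):
        """Compute the joint marginal distribution of the query variables.

        Parameters
        ----------
        variables : list of str
            Names of the query variables.
        evidence : dict, optional
            Mapping from variable names to observed values.

        Returns
        -------
        The marginal distribution over the query variables.
        """


class SomeConcreteInference(InferenceMethod):

    def marginals(self, variables, evidence=None):
        """Overrides InferenceMethod.marginals.

        See the superclass docstring for the parameters; only behaviour
        that differs from the superclass contract is documented here.
        """
        raise NotImplementedError  # implementation omitted in this sketch
```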

Unify variable naming in the codebase

Currently some variables use camelCase, while others use PEP8-conformant underscore separation (snake_case).
We should decide for one and use it consistently throughout the codebase.

Should we just follow PEP8 on this?
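
For illustration, the two styles side by side (variable names are made up):

```python
# camelCase, as currently found in parts of the codebase:
evidenceProbability = 0.25
queryVariables = ["Rain", "Sprinkler"]

# PEP8-conformant snake_case:
evidence_probability = 0.25
query_variables = ["Rain", "Sprinkler"]
```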

Integrate support for Bayesian Decision Networks

Either migrate the implementation from the primo-legacy branch or integrate the results of M. Holland's Master's thesis (“Dynamische Decision Netzwerke für kooperative Mensch-Maschine-Interaktion”, i.e. dynamic decision networks for cooperative human-machine interaction), which was implemented as part of PRIMO and is therefore LGPL-v3 licensed (through the license's copyleft terms).
