pymc-devs / pymc-experimental Goto Github PK

Home Page: https://pymc-experimental.readthedocs.io

License: Other

Python 2.00% Shell 0.01% Jupyter Notebook 98.00%

pymc-experimental's Introduction

Welcome to `pymc-experimental`

As PyMC continues to mature and expand its functionality to accommodate more domains of application, we increasingly see cutting-edge methodologies, highly specialized statistical distributions, and complex models appear. While this adds to the functionality and relevance of the project, it can also introduce instability and impose a burden on testing and quality control. To reduce the burden on the main pymc repository, this pymc-experimental repository can become the aggregator and testing ground for new additions to PyMC. This may include unusual probability distributions, advanced model fitting algorithms, innovative yet not fully tested methods or any code that may be inappropriate to include in the pymc repository, but may want to be made available to users.

The pymc-experimental repository can be understood as the first step in the PyMC development pipeline, where all novel code is introduced until it is obvious that it belongs in the main repository. We hope that this organization improves the stability and streamlines the testing overhead of the pymc repository, while allowing users and developers to test and evaluate cutting-edge methods and not yet fully mature features.

pymc-experimental would be designed to mirror the namespaces in pymc to make usage and migration as easy as possible. For example, a ParabolicFractal distribution could be used analogously to those in pymc:

import pymc as pm
import pymc_experimental as pmx

with pm.Model():

    alpha = pmx.ParabolicFractal('alpha', b=1, c=1)

    ...

Questions

What belongs in `pymc-experimental`?

newly-implemented statistical methods, for example step methods or model construction helpers
distributions that are tricky to sample from or test
infrequently-used fitting methods or distributions
any code that requires additional optimization before it can be used in practice

What does not belong in `pymc-experimental`?

Case studies
Implementations that cannot be applied generically, for example because they are tied to variables from a toy example

Should there be more than one add-on repository?

Since there is a lot of code that we may not want in the main repository, does it make sense to have more than one additional repository? For example, pymc-experimental may just include methods that are not fully developed, tested and trusted, while code that is known to work well and has adequate test coverage, but is still too specialized to become part of pymc could reside in a pymc-extras (or similar) repository.

Unanswered questions & ToDos

This project is still young and many things have not been answered or implemented. Please get involved!

What are guidelines for organizing submodules?
- Proposal: No default imports of WIP/unstable submodules. By importing manually we can avoid breaking the package if a submodule breaks, for example because of an updated dependency.

pymc-experimental's People

Stargazers

Watchers

Forkers

bwengals ccaprani lucianopaz teawolf oriolabril andrewherren larryshamalama miriana19 kylejcaron jessspearing kunalghosh 5hv5hvnk markusschmaus danhphan tiqets jessegrabowski wd60622 ricardov94 michaelraczycki shreyas3156 timolivermaier mbjoseph pdb5627 astefano boldorider4 augub tvwenger b-zwarg abebual ciguaran theorashid abdalazizrashid mowgliamu jbrinchmann zaxtax fonnesbeck xieyj17 ruiwenxie alexandorra donnut maresb carsten-j souvikpaul33

pymc-experimental's Issues

Make a release

Changes in the model builder need to be released in order to proceed with it's integration to other pymc repositories, should be done after merging #135, to be able to use automatic release pipelines

ModelBuilder's `predict_posterior` returns draws from just one chain?

I've been experimenting with ModelBuilder to get a cleaner API for fitting/saving/loading/predicting, but I'm a bit confused by the shape of predict_posterior() output.

I would expect to get samples from the posterior predictive distribution for each chain and draw, but I'm getting what appear to be values from just one chain.

Here's a reproducible example:

import numpy as np
import pandas as pd
from pymc_experimental.tests.test_model_builder import test_ModelBuilder

model = test_ModelBuilder.initial_build_and_fit()
print(model.idata.posterior_predictive["y_model"].shape) # (3, 1000, 100)


x_pred = np.random.uniform(low=0, high=1, size=100)
prediction_data = pd.DataFrame({"input": x_pred})
pred = model.predict_posterior(prediction_data)

print(pred["y_model"].shape) # (1000, 100), but I expect (3, 1000, 100)

I would expect that sampling from the posterior predictive distribution with prediction_data would yield an array of shape (chains, draws, samples). In this case I'd like posterior predictive samples of shape (3, 1000, 100) rather than (1000, 100).

Perhaps this is due to indexing with [0] here, which would appear to select values from just the first chain?

pymc-experimental/pymc_experimental/model_builder.py

Line 350 in 5f1c2bb

post_pred_dict[key] = post_pred.posterior_predictive[key].to_numpy()[0]

Are my expectations not aligned with the intended behavior? Happy to have missed something. Thanks! 😃

Turn histogram approximation into a RandomVariable

This should allow for better integration with PyMC, like support for posterior predictive sampling and more informative/nicer model representation (both graphviz and latex). @ferrine @ricardoV94

pytest fails locally

After pulling latest version of main, suddenly I can't run tests locally, after using

pytest tests/test_model_builder.py
Im getting :
________________________________________________________________ ERROR collecting pymc_experimental/tests/test_model_builder.py ________________________________________________________________
ImportError while importing test module '/Users/michalraczycki/Documents/pymc-experimental/pymc-experimental/pymc_experimental/tests/test_model_builder.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/init.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
init.py:31: in
from pymc_experimental.marginal_model import MarginalModel
marginal_model.py:10: in
from pymc.logprob.basic import factorized_joint_logprob
E ModuleNotFoundError: No module named 'pymc.logprob.basic'

@ricardoV94, @twiecki any suggestions on how to fix it?

Kernel crashes during pathfinder fitting

For several models (not sure what characteristics yet cause if to fail), fitting with pathfinder causes the Python kernel to crash. I've tried this both on CPU and GPU. The log is below:

info 12:18:18.397: Disposing kernel .jvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865./Users/cfonnesbeck/mambaforge/envs/pie/python./Users/cfonnesbeck/mambaforge/envs/pie/python.-m#ipykernel_launcher for notebook Interactive-1.interactive due to selection of another kernel or closing of the notebook
info 12:18:18.397: Dispose Kernel 'Interactive-1.interactive' associated with '/Users/cfonnesbeck/phillies/pie/research/projections/pitchers/pitcher_proj.py'
info 12:18:18.406: Dispose Kernel 'Interactive-1.interactive' associated with '/Users/cfonnesbeck/phillies/pie/research/projections/pitchers/pitcher_proj.py'
[I 12:18:18.433 NotebookApp] Kernel shutdown: db435396-0e9c-48cf-8b9d-7ac31bccd3da
[I 12:18:19.402 NotebookApp] Starting buffering for db435396-0e9c-48cf-8b9d-7ac31bccd3da:05dcd16b-b821-4492-bbbf-5224df1874c6
info 12:18:23.589: Starting interactive window for resource '/Users/cfonnesbeck/phillies/pie/research/projections/pitchers/pitcher_proj.py' with controller '.jvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865./Users/cfonnesbeck/mambaforge/envs/pie/python./Users/cfonnesbeck/mambaforge/envs/pie/python.-m#ipykernel_launcher (Interactive)'
info 12:18:23.636: Attempting to start a server because of preload conditions ...
info 12:18:23.661: Process Execution: > ~/mambaforge/envs/pie/bin/python -m pip list
> ~/mambaforge/envs/pie/bin/python -m pip list
info 12:18:24.433: Got env vars with python /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, with env var count 104 and custom env var count 93 in 780ms
info 12:18:24.694: Starting Jupyter Session startUsingPythonInterpreter, .jvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865./Users/cfonnesbeck/mambaforge/envs/pie/python./Users/cfonnesbeck/mambaforge/envs/pie/python.-m#ipykernel_launcher (Python Path: /Users/cfonnesbeck/mambaforge/envs/pie, EnvType: Conda, EnvName: 'pie', Version: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:48:25) 
[Clang 14.0.6 ]) for 'Interactive-1.interactive' (disableUI=false)
[I 12:18:24.699 NotebookApp] Creating new notebook in /research/projections/pitchers
info 12:18:24.709: installMissingDependencies /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, ui.disabled=false for resource '/Users/cfonnesbeck/phillies/pie/research/projections/pitchers/pitcher_proj.py'
info 12:18:24.710: Got env vars with python /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, with env var count 104 and custom env var count 93 in 0ms
info 12:18:24.711: Process Execution: > ~/mambaforge/envs/pie/bin/python -c "import ipykernel"
> ~/mambaforge/envs/pie/bin/python -c "import ipykernel"
info 12:18:24.923: Spec argv[0] updated from '/Users/cfonnesbeck/mambaforge/envs/pie/bin/python' to '/Users/cfonnesbeck/mambaforge/envs/pie/bin/python'
info 12:18:24.934: Got env vars with python /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, with env var count 104 and custom env var count 93 in 797ms
[I 12:18:24.952 NotebookApp] Kernel started: abd00f78-2946-4cf5-a842-d3e4b08ea45e, name: pythonjvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865
[W 12:18:24.964 NotebookApp] delete /research/projections/pitchers/pitcher_proj.py-jvsc-1a415af0-45bf-4b4b-a0c6-7ef772010b09afc69f4d-cad7-4334-ad4e-a60d17a4e4cf.ipynb
info 12:18:25.640: Started session for kernel startUsingPythonInterpreter:.jvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865./Users/cfonnesbeck/mambaforge/envs/pie/python./Users/cfonnesbeck/mambaforge/envs/pie/python.-m#ipykernel_launcher
info 12:18:25.642: UpdateWorkingDirectoryAndPath in Kernel
info 12:18:25.671: Generated code for 1 = <ipython-input-1-e8e064250631> with 30 lines
info 12:18:25.748: Got env vars with python /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, with env var count 104 and custom env var count 93 in 825ms
info 12:18:28.377: Generated code for 2 = <ipython-input-2-7fcc05f8245f> with 97 lines
info 12:18:28.406: Generated code for 3 = <ipython-input-3-2e36476b3ed3> with 217 lines
info 12:18:28.442: Generated code for 4 = <ipython-input-4-1d2b8ff62d64> with 228 lines
info 12:18:28.482: Generated code for 5 = <ipython-input-5-7f1077aeacc4> with 27 lines
info 12:18:28.506: Generated code for 6 = <ipython-input-6-f7c4cace876c> with 96 lines
info 12:18:36.647: Cancel all remaining cells true || Error || undefined
info 12:22:22.264: Cancel all remaining cells true || Error || undefined
info 12:22:50.255: Disposing kernel .jvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865./Users/cfonnesbeck/mambaforge/envs/pie/python./Users/cfonnesbeck/mambaforge/envs/pie/python.-m#ipykernel_launcher for notebook Interactive-1.interactive due to selection of another kernel or closing of the notebook
info 12:22:50.255: Dispose Kernel 'Interactive-1.interactive' associated with '/Users/cfonnesbeck/phillies/pie/research/projections/pitchers/pitcher_proj.py'
info 12:22:50.259: Dispose Kernel 'Interactive-1.interactive' associated with '/Users/cfonnesbeck/phillies/pie/research/projections/pitchers/pitcher_proj.py'
[I 12:22:50.262 NotebookApp] Kernel shutdown: abd00f78-2946-4cf5-a842-d3e4b08ea45e
info 12:22:55.344: Starting interactive window for resource '/Users/cfonnesbeck/phillies/pie/research/projections/pitchers/pitcher_proj.py' with controller '.jvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865./Users/cfonnesbeck/mambaforge/envs/pie/python./Users/cfonnesbeck/mambaforge/envs/pie/python.-m#ipykernel_launcher (Interactive)'
info 12:22:55.391: Attempting to start a server because of preload conditions ...
info 12:22:55.453: Starting Jupyter Session startUsingPythonInterpreter, .jvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865./Users/cfonnesbeck/mambaforge/envs/pie/python./Users/cfonnesbeck/mambaforge/envs/pie/python.-m#ipykernel_launcher (Python Path: /Users/cfonnesbeck/mambaforge/envs/pie, EnvType: Conda, EnvName: 'pie', Version: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:48:25) 
[Clang 14.0.6 ]) for 'Interactive-1.interactive' (disableUI=false)
[I 12:22:55.457 NotebookApp] Creating new notebook in /research/projections/pitchers
info 12:22:55.465: installMissingDependencies /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, ui.disabled=false for resource '/Users/cfonnesbeck/phillies/pie/research/projections/pitchers/pitcher_proj.py'
info 12:22:55.466: Got env vars with python /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, with env var count 104 and custom env var count 93 in 1ms
info 12:22:55.467: Process Execution: > ~/mambaforge/envs/pie/bin/python -c "import ipykernel"
> ~/mambaforge/envs/pie/bin/python -c "import ipykernel"
info 12:22:55.654: Spec argv[0] updated from '/Users/cfonnesbeck/mambaforge/envs/pie/bin/python' to '/Users/cfonnesbeck/mambaforge/envs/pie/bin/python'
info 12:22:55.655: Got env vars with python /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, with env var count 104 and custom env var count 93 in 1ms
[I 12:22:55.678 NotebookApp] Kernel started: 00b2089f-89e5-4996-bf18-f5be75bb5cd3, name: pythonjvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865
[W 12:22:55.685 NotebookApp] delete /research/projections/pitchers/pitcher_proj.py-jvsc-a9a4b23d-ab0d-47a1-8cc3-d7a0922f8fd0e9568bc4-19d4-4ea4-9442-b172f24c2401.ipynb
info 12:22:55.894: Got env vars with python /Users/cfonnesbeck/mambaforge/envs/pie/bin/python, with env var count 104 and custom env var count 93 in 0ms
info 12:22:56.150: Started session for kernel startUsingPythonInterpreter:.jvsc74a57bd0a0954a124d2bf69a9421ff2dbb2519042f2244cade5040e3d13151bd4c002865./Users/cfonnesbeck/mambaforge/envs/pie/python./Users/cfonnesbeck/mambaforge/envs/pie/python.-m#ipykernel_launcher
info 12:22:56.151: UpdateWorkingDirectoryAndPath in Kernel
info 12:22:56.173: Generated code for 1 = <ipython-input-1-e8e064250631> with 30 lines
info 12:22:58.356: Generated code for 2 = <ipython-input-2-7fcc05f8245f> with 97 lines
info 12:22:58.388: Generated code for 3 = <ipython-input-3-2e36476b3ed3> with 217 lines
info 12:22:58.428: Generated code for 4 = <ipython-input-4-1d2b8ff62d64> with 228 lines
info 12:22:58.465: Generated code for 5 = <ipython-input-5-7f1077aeacc4> with 27 lines
info 12:22:58.494: Generated code for 6 = <ipython-input-6-f7c4cace876c> with 96 lines
info 12:23:19.337: Cancel all remaining cells true || Error || undefined
info 12:25:01.080: Cancel all remaining cells true || Error || undefined
[I 12:27:01.670 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
error 12:27:01.677: Error in waiting for cell to complete Error: Canceled future for execute_request message before replies were done
    at t.KernelShellFutureHandler.dispose (/Users/cfonnesbeck/.vscode/extensions/ms-toolsai.jupyter-2022.11.1003412109/out/extension.node.js:2:32353)
    at /Users/cfonnesbeck/.vscode/extensions/ms-toolsai.jupyter-2022.11.1003412109/out/extension.node.js:2:26572
    at Map.forEach (<anonymous>)
    at v._clearKernelState (/Users/cfonnesbeck/.vscode/extensions/ms-toolsai.jupyter-2022.11.1003412109/out/extension.node.js:2:26557)
    at /Users/cfonnesbeck/.vscode/extensions/ms-toolsai.jupyter-2022.11.1003412109/out/extension.node.js:2:29000
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
warn 12:27:01.679: Cell completed with errors {
  message: 'Canceled future for execute_request message before replies were done'
}
info 12:27:01.680: Cancel all remaining cells true || Error || undefined
[I 12:27:01.677 NotebookApp] Starting buffering for 00b2089f-89e5-4996-bf18-f5be75bb5cd3:8468a364-83c1-4bc6-bc88-d1fcfce66644
[I 12:27:01.688 NotebookApp] Restoring connection for 00b2089f-89e5-4996-bf18-f5be75bb5cd3:8468a364-83c1-4bc6-bc88-d1fcfce66644

Implement a CorrCholesky transformation for LKJCorr

Came across this issue in TFP: tensorflow/probability#400 and found out that our transformation of LKJ is just a interval transformation - that's incorrect as it will produce invalid correlation matrix right?
https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/multivariate.py#L1158-L1159

Cut a release

Users are confused that they can't install via pip #127 , we should just push a version to pypi.

Make xhistogram optional

Given the wide range of utilities we want to add to pymc-experimental it would be great if any dependencies that are not bundled with PyMC were optional. If I don't intend to use the histogram functionality, I would rather not have to install xhistogram.

CC @ferrine

Implement strict sign for r2d2m2cp

There are cases where variable sign may cause symmetries in the posterior distribution, but the prior is still usefull in the context

So far

sigma, beta = pmx.distributions.R2D2M2CP(
            "beta",
            1,
            inputs.std(),
            dims="vars",
            r2=0.6,
            r2_std=0.1,
            positive_probs=[1, 0.7],
            variables_importance=[20, 10],
            centered=True,
        )

Will fail because nan will be resulted due to positive_probs[0] = 1

In this case I think it does make sense to use another distribution for the prior. The immediate choice is the Truncated normal distibuton with mean ans sigma calculated at p=0.99.

Additionally, in case positive_probs_std is provided, std=0 would be required for the entry

Any other thoughts on this?

Get rid of python-version specific conda environments

We already did this in pymc: pymc-devs/pymc#5911

GP fails CI builds

There is no test for the GP modules in pmx + the is a constant error in CI builds

ImportError: cannot import name 'infer_size' from 'pymc.gp.util' (/usr/share/miniconda3/envs/pymc-test-py38/lib/python3.8/site-packages/pymc/gp/util.py

CC @michaelosthege @bwengals

`named_vars_to_dims` should be copied when cloning MarginalModel

We should probably copy them, as they are a mutable container. We are not manipulating them for now, but might one day :)

https://github.com/pymc-devs/pymcx/blob/5a01d4fc5d58fc1abb52f13fcb5cd4e8b32292ba/pymc_experimental/marginal_model.py#L202

A shallow .copy() should suffice

Add parser to convert from Stan to PyMC models

Wild one for anyone who likes writing text parsers. WDYT?

Idea originated from here: https://discourse.pymc.io/t/support-for-declarative-graphic-modeling-language-such-as-winbugs-format/10321

Add docs

model_builder scikit-learn integration

I am working on making an existing scikit-learn model pipeline produce probabilistic output. To do that, I used model_builder to make a pymc model that could integrate into a scikit-learn Pipeline, including standardization of inputs and outputs. However, I find that the current API doesn't seem suitable for this. I made my own modifications to the ModelBuilder class and example LinearModel subclass to get it to work. I think the main change was to have the fit and predict methods take X and y as separate parameters rather than as members of a data dict with specially-named keys. My reference for the scikit-learn estimator API is the scikit-learn documentation and template for TemplateEstimator.

I very well might be one the wrong track (or at least on a different one than what model_builder intends), but what I came up with seems to work for being able to apply sklearn.preprocessing.StandardScaler to inputs and to point outputs using sklearn.compose.TransformedTargetRegressor. These seem like reasonable goals for ModelBuilder subclasses to be able to integrate with, so maybe tests and/or examples of such would be good.

Any thoughts? I'm happy to contribute what I can.

Enable tests on Github

Broken CI

The workflow is not valid. .github/workflows/test.yml (Line: 22, Col: 22): Unexpected value 'pymc-experimental/tests' .github/workflows/test.yml (Line: 84, Col: 22): Unexpected value 'pymc-experimental/tests'

This is from #72

Change GP covariance `spectral_density` method into a dispatcher

At the moment, if one wants to use the HSGP class in a GP, it's necessary to change the imports of the covariance kernels to use the ExpQuad or Matern* kernels that are defined in latent_approx.py. To make HSGP usable in models without having to change all the kernel imports, it would be better to use a dispatcher that returns the kernel's spectral density. We could also take advantage that the spectral density of the sum of kernels is the sum of the spectral densities of each kernel, or that the spectral density of the product of a kernel with a scalar is the product of the scalar with the spectral density of the kernel, to also dispatch on such cases. I still need to find a simple way to dispatch the product of two kernels, but if we get that in, then we can basically cover all stationary kernels.

Implement apply_permutation as in gpytorch.utils.permutation.apply_permutation

Implement apply_permutation as in gpytorch.utils.permutation.apply_permutation and remove the dependency on PyTorch and GPyTorch from the pivoted Cholesky module see #63

Add tests and coverage badges to README

Finalize draft proposal for pymc-experimental

I have seeded this repository with a README that is a proposal of how the repository is intended to be used by the PyMC development team. The proposal is only partially written, so you can contribute by editing and fleshing out the document and by contributing to the discussion in this issue.

Extend marginalization support to Truncated discrete RVs

#91 Implements automatic logp marginalization for bernoulli, categorical and discrete uniform, as those are the only default discrete univariate RVs with finite support. We could extend it easily to right-truncated discrete RVs so that people can use it for Truncated poisson, binomial, mixtures, etc...)

We could also extend it to right-clipped RVs, but I am not sure how useful that is

Can't find on PIP

I'm trying to install pymc-experimental to gain access to the ModelBuilder.

I'm following the notebook Using ModelBuilder class for deploying PyMC models and it says to install using pip install pymc-experimental, but that just throws the following error:

ERROR: Could not find a version that satisfies the requirement pymc-experimental (from versions: none)
ERROR: No matching distribution found for pymc-experimental

Platform: Apple M1
OS: macOs Ventura 13.2.1 (22D68)

Broken docs for GP module

HSGP was moved to PyMC, but we still have the reference, related to #144.

pymc-experimental/docs/api_reference.rst

Line 42 in e38da06

latent_approx.HSGP

Incorrect `PGBART.stats_dtypes` annotation

Description of your problem

The BlockedStep.stats_dtypes should map stat names to NumPy dtypes.
Note that this also applies to arrays and we should probably introduce a BlockedStep.stats_shapes to account for their shape (related to step method refactoring).

But mypy revealed that PGBART.stats_dtypes incorrectly maps to type.

I determined that it should be "variable_inclusion": np.int64, but for the "bart_trees" I was unable to determine the dtype from reading the code.

@aloctavodia can you take care of this?

pre-commit in CI is broken

2s
0s
0s
16s
Run pre-commit/[email protected]
install pre-commit
/opt/hostedtoolcache/Python/3.[1](https://github.com/pymc-devs/pymc-experimental/actions/runs/4187098674/jobs/7256499962#step:4:1)1.1/x64/bin/pre-commit run --show-diff-on-failure --color=always --all-files
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Initializing environment for https://github.com/PyCQA/isort.
[INFO] Initializing environment for https://github.com/asottile/pyupgrade.
[INFO] Initializing environment for https://github.com/psf/black.
[INFO] Initializing environment for https://github.com/psf/black:.[jupyter].
[INFO] Initializing environment for https://github.com/PyCQA/pylint.
[INFO] Initializing environment for https://github.com/MarcoGorelli/madforhooks.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/PyCQA/isort.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: CalledProcessError: command: ('/home/runner/.cache/pre-commit/repokgoirkfl/py_env-python3/bin/python', '-mpip', 'install', '.')
return code: 1
stdout:
    Processing /home/runner/.cache/pre-commit/repokgoirkfl
      Installing build dependencies: started
      Installing build dependencies: finished with status 'done'
      Getting requirements to build wheel: started
      Getting requirements to build wheel: finished with status 'done'
      Preparing metadata (pyproject.toml): started
      Preparing metadata (pyproject.toml): finished with status 'error'
    
stderr:
      error: subprocess-exited-with-error
      
      × Preparing metadata (pyproject.toml) did not run successfully.
      │ exit code: 1
      ╰─> [17 lines of output]
          Traceback (most recent call last):
            File "/home/runner/.cache/pre-commit/repokgoirkfl/py_env-python3/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
              main()
            File "/home/runner/.cache/pre-commit/repokgoirkfl/py_env-python3/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
              json_out['return_val'] = hook(**hook_input['kwargs'])
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            File "/home/runner/.cache/pre-commit/repokgoirkfl/py_env-python3/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 1[49](https://github.com/pymc-devs/pymc-experimental/actions/runs/4187098674/jobs/7256499962#step:4:51), in prepare_metadata_for_build_wheel
              return hook(metadata_directory, config_settings)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            File "/tmp/pip-build-env-0nb1ureh/overlay/lib/python3.11/site-packages/poetry/core/masonry/api.py", line 40, in prepare_metadata_for_build_wheel
              poetry = Factory().create_poetry(Path(".").resolve(), with_groups=False)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            File "/tmp/pip-build-env-0nb1ureh/overlay/lib/python3.11/site-packages/poetry/core/factory.py", line [57](https://github.com/pymc-devs/pymc-experimental/actions/runs/4187098674/jobs/7256499962#step:4:59), in create_poetry
              raise RuntimeError("The Poetry configuration is invalid:\n" + message)
          RuntimeError: The Poetry configuration is invalid:
            - [extras.pipfile_deprecated_finder.2] 'pip-shims<=0.3.4' does not match '^[a-zA-Z-_.0-9]+$'
          
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed
    
    × Encountered error while generating package metadata.
    ╰─> See above for output.
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for details.
    
Check the log at /home/runner/.cache/pre-commit/pre-commit.log
Error: The process '/opt/hostedtoolcache/Python/3.11.1/x[64](https://github.com/pymc-devs/pymc-experimental/actions/runs/4187098674/jobs/7256499962#step:4:66)/bin/pre-commit' failed with exit code 3

Change ModelBuilder model_config and sample_config saving to JSON

Hi!

Due to how model_config is serialized when saving a ModelBuilder object, it only works with flat dictionaries. In my use case, I'm describing model configuration through a more nested structure, .e.g. with dict being used as value of a key.
One potential solution to support this could be to turn the model_config into a string using json.
I'm not familiar with the hashing mechanism, but using nested dicts might also impact this part of the code

Thanks!
Manu

CI is broken

pymc_experimental/__init__.py:31: in <module>
    from pymc_experimental.marginal_model import MarginalModel
pymc_experimental/marginal_model.py:10: in <module>
    from pymc.logprob.joint_logprob import factorized_joint_logprob
E   ModuleNotFoundError: No module named 'pymc.logprob.joint_logprob'

Looks like a PyMC API change.

Add link to papers for the gp.Latent approximations

See #3 (comment)_

consistent naming in model_builder.py

I wanted to propose few minor changes to increase consistency in the code. What I noticed is that all the functions are named with verbs, as usual, but then there's 'id' function. In my opinion it would be better to stick with the rest of the convention and call it 'generate_id'. Second thing is that other functions have type hinting, and in case of id function even though the comment inside the function itself contains description of what type it returns, the hinting is not added. If you agree with those changes I'll create a pull request right away

test_marginal is flakey

pymc_experimental/tests/test_marginal_model.py::test_marginalized_deterministic_and_potential - AssertionError: 
Arrays are not almost equal to 7 decimals

Mismatched elements: 2 / 5 (40%)
Max absolute difference: 2.3841858e-07
Max relative difference: 9.66977e-08
 x: array([ 1.6125156,  2.3487637,  2.4792917, -1.0471407,  2.4656074],
      dtype=float32)
 y: array([ 1.6125154,  2.3487637,  2.4792914, -1.0471407,  2.4656076],
      dtype=float32)
FAILED pymc_experimental/tests/distributions/test_continuous.py::TestGenExtremeClass::test_logcdf - AssertionError: 
Arrays are not almost equal to 2 decimals
{'mu': array(0.01, dtype=float32), 'sigma': array(1., dtype=float32), 'xi': array(0.99, dtype=float32), 'value': array(-1., dtype=float32)}
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 1.86426127
Max relative difference: 0.00016986
 x: array(-10973.14, dtype=float32)
 y: array(-10975.01)

Tweaks to `predict` method of `ModelBuilder`

Remove point_estimate kwarg of the ModelBuilder.predict method. Instead have two distinct methods...

predict (which returns the full idata) and
predict_point which returns the point estimate.

Tagging @michaelosthege, @twiecki.

Add automatic release notes

Just like we do in pymc: https://github.com/pymc-devs/pymc/blob/main/.github/release.yml

Cannot compile GeneralizedPoisson model

Looks like maybe a new version of pytensor broke the GeneralizedPoisson distribution. Until recently I used to be able to use the generalized poisson distribution as a likelihood just fine. Now when I run pm.sample I get a lot of these types of errors

ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Elemwise{erfcx,no_inplace}(TensorConstant{-7.071067932881649})
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):

Later it mentions a missing file in the pytensor package

[Errno 2] No such file or directory: '/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/pytensor/scalar/c_code/Faddeeva.cc'

Model Builder broken by recent PyMC changes

@Monsurat-Onabajo and @BerylKanali discovered that marginal_model.py has this import statement:

from pymc.logprob.basic import factorized_joint_logprob

Which has been broken by this.

propose new `ConstrainedUniform` distribution

Following the discussion #5066 (with input from @aseyboldt), I propose a new ConstrainedUniform distribution. An ideal use case for this is as a distribution for cutpoints in an ordinal regression.

The idea is that the distribution would be a vector of N>2 uniformly distributed values where the first entry is constrained to take on a given min value and the final entry is constrained to take on a given max value. The first value is hard constrained to take on a particular value, and the final value is also constrained to take on a particular value - because of the use of the Dirichlet (which sums to 1). Use of the cumsum was also found to be necessary to enforce ordering and avoid divergences in sampling.

A ConstrainedUniform(N) would have N-2 degrees of freedom.

Minimum working example

The new distribution would be along these lines

def constrainedUniform(N, min=0, max=1):
    return pm.Deterministic('theta',
                             at.concatenate([
                                 np.ones(1)*min,
                                 at.extra_ops.cumsum(pm.Dirichlet("theta_unknown", a=np.ones(N-2))) (min+(max-min))
                             ])
                           )

If you had the following observations

y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3,
       3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6])

K = len(np.unique(y))
print(f"K = {K} categories")

Then you could have a pleasingly concise ordinal regression model

with pm.Model() as model:
    theta = constrainedUniform(K)
    mu = pm.Normal('mu', mu=K/2, sigma=K)
    sigma = pm.HalfNormal("sigma", 1)
    pm.OrderedProbit("y_obs", cutpoints=theta, eta=mu, sigma=sigma, observed=y)

Note that this amounts to a very simple 'intercept only' model, where mu is this intercept. There are no predictor variables in this example

Implementation issues

When converting that function constrainedUniform into a PyMC distribution, then at the very least, you'd have to consider:

we get some divergences when moving away from min=0 and max=1 it seems
receive coords
I've not gone through this, but @lucianopaz points out there might be something to learn from this blog post https://betanalpha.github.io/assets/case_studies/ordinal_regression.html#22_Surgical_Cut.
Possibly relevant to the above is the numpyro.SimplexToOrderedTransform
think about usage in a hierarchical ordinal regression context... shape?

pymc_experimental/model_builder.py doctest is broken

Doctest there was never run, and the test suit is broken

Rename package to pymcx

And start releasing!

Cover `DiscreteMarkovChain` distributions with the marginal models

Add pre-commit hook

model_builder function decorators

Some of the model_builder functions seem to have been written to act like an abstract method, but they don't have the decorator. Is it intentional? It might be good to think which of the not implemented methods will be required in all of the inheriting classes , and adding the decorators unless it's intended to provide the default behavior in parent class

Fail tests with uncaught warnings

The earlier we start, the easier it is :)

Test GPs Approximations

GPs are not tested. We need Full coverage of the experimental functionality for GPs. Refactoring appreciated, examples as well.

May also include pymc-devs/pymc#6649

pytest is checking the docstrings too when passed on windows

I believe we should not pass --doctest-modules in here, because of this even the doc string from code is passed and throws errors.

pymc-experimental/.github/workflows/test.yml

Line 134 in 14e2406

    
                     python -m pytest -vv --cov=pymc_experimental --doctest-modules pymc_experimental --cov-append --cov-report=xml --cov-report term --durations=50 %TEST_SUBSET%

Would be nice to extend the change-point notebook to include the plots in the STAN doc: https://mc-stan.org/docs/stan-users-guide/change-point.html

List more stuff in API reference

Missing marginal model, distributions, gp stuff...

Fix automatic doc building

We can now build docs manually but automatic building of docs when a PR is merged is failing.

https://github.com/pymc-devs/pymc-experimental/runs/5598361159?check_suite_focus=true

maybe this is related to the fact that we are using sphinx-multiversion, but we do not have any version yet.