
pymc2's Introduction

Introduction

Version: 2.3.8
Authors: Chris Fonnesbeck, Anand Patil, David Huard, John Salvatier
Web site: https://github.com/pymc-devs/pymc
Documentation: http://bit.ly/pymc_docs
Copyright: This document has been placed in the public domain.
License: PyMC is released under the Academic Free License.

NOTE: The current version of PyMC (version 3) has been moved to its own repository, called pymc3. Unless you have a good reason to use this package, we recommend that all new users adopt PyMC3.

Purpose

PyMC is a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large suite of problems. Along with core sampling functionality, PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics.

Features

PyMC provides functionality to make Bayesian analysis as painless as possible. Here is a short list of some of its features:

  • Fits Bayesian statistical models with Markov chain Monte Carlo and other algorithms.
  • Includes a large suite of well-documented statistical distributions.
  • Uses NumPy for numerics wherever possible.
  • Includes a module for modeling Gaussian processes.
  • Sampling loops can be paused and tuned manually, or saved and restarted later.
  • Creates summaries including tables and plots.
  • Traces can be saved to disk as plain text, Python pickles, SQLite or MySQL databases, or HDF5 archives.
  • Several convergence diagnostics are available.
  • Extensible: easily incorporates custom step methods and unusual probability distributions.
  • MCMC loops can be embedded in larger programs, and results can be analyzed with the full power of Python.

What's new in version 2

This second version of PyMC benefits from a major rewrite effort. Substantial improvements in code extensibility, user interface, and raw performance have been achieved. Most notably, the PyMC 2 series provides:

  • New flexible object model and syntax (not backward-compatible).
  • Reduced redundant computations: only relevant log-probability terms are computed, and these are cached.
  • Optimized probability distributions.
  • New adaptive blocked Metropolis step method.
  • Much more!

Usage

First, define your model in a file, say mymodel.py (with comments, of course!):

# Import relevant modules
import pymc
import numpy as np

# Some data
n = 5 * np.ones(4, dtype=int)
x = np.array([-.86, -.3, -.05, .73])

# Priors on unknown parameters
alpha = pymc.Normal('alpha', mu=0, tau=.01)
beta = pymc.Normal('beta', mu=0, tau=.01)

# Arbitrary deterministic function of parameters
@pymc.deterministic
def theta(a=alpha, b=beta):
    """theta = logit^{-1}(a+b)"""
    return pymc.invlogit(a + b * x)

# Binomial likelihood for data
d = pymc.Binomial('d', n=n, p=theta, value=np.array([0., 1., 3., 5.]),
                  observed=True)

Save this file, then from a Python shell (or another file in the same directory), call:

import pymc
import mymodel

S = pymc.MCMC(mymodel, db='pickle')
S.sample(iter=10000, burn=5000, thin=2)
pymc.Matplot.plot(S)

This example will run 10000 MCMC iterations, discard the first 5000 as burn-in, and thin the remainder by a factor of 2, retaining 2500 posterior samples. The samples are stored in a Python serialization (pickle) database.
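Once sampling has finished, the stored traces can be pulled back out of the sampler and summarized directly. The following is a minimal sketch (assuming the mymodel.py example above has been run) using the trace, summary and stats calls:

import numpy as np

# Retained posterior samples for alpha, as a NumPy array
alpha_samples = S.trace('alpha')[:]
print 'posterior mean of alpha:', np.mean(alpha_samples)

# Tabular posterior summaries (means, standard deviations, HPD intervals, quantiles)
S.summary()
stats = S.stats()   # the same summaries, returned as a dictionary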

History

PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with the aim of making Markov chain Monte Carlo (MCMC) more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a Python module, rather than a standalone application, allowed the use of MCMC methods in a larger modeling framework. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to a usable state.

In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the development team for PyMC 2.0. This iteration of the software strives for more flexibility, better performance and a better end-user experience than any previous version of PyMC.

PyMC 2.1 was released in early 2010. It contains numerous bugfixes and optimizations, as well as a few new features. This user guide is written for version 2.1.

Relationship to other packages

PyMC is one of many general-purpose MCMC packages. The most prominent among them is WinBUGS, which has made MCMC, and with it Bayesian statistics, accessible to a huge user community. Unlike PyMC, WinBUGS is a stand-alone, self-contained application. This can be an attractive feature for users without much programming experience, but others may find it constraining. A related package is JAGS, which provides a more UNIX-like implementation of the BUGS language. Other packages include Hierarchical Bayes Compiler and a number of R packages of varying scope.

It would be difficult to meaningfully benchmark PyMC against these other packages because of the unlimited variety in Bayesian probability models and flavors of the MCMC algorithm. However, it is possible to anticipate how it will perform in broad terms.

PyMC's number-crunching is done using a combination of industry-standard libraries (NumPy and the linear algebra libraries on which it depends) and hand-optimized Fortran routines. For models that are composed of variables valued as large arrays, PyMC will spend most of its time in these fast routines. In that case, it will be roughly as fast as packages written entirely in C and faster than WinBUGS. For finer-grained models containing mostly scalar variables, it will spend most of its time in coordinating Python code. In that case, despite our best efforts at optimization, PyMC will be significantly slower than packages written in C and on par with or slower than WinBUGS. However, as fine-grained models are often small and simple, the total time required for sampling is often quite reasonable despite this poorer performance.

We have chosen to spend time developing PyMC rather than using an existing package primarily because it allows us to build and efficiently fit any model we like within a full-fledged Python environment. We have emphasized extensibility throughout PyMC's design, so if it doesn't meet your needs out of the box chances are you can make it do so with a relatively small amount of code. See the testimonials page on the wiki for reasons why other users have chosen PyMC.

Getting started

This guide provides all the information needed to install PyMC, code a Bayesian statistical model, run the sampler, save and visualize the results. In addition, it contains a list of the statistical distributions currently available. More examples of usage as well as tutorials are available from the PyMC web site.

pymc2's People

Contributors

aflaxman, apatil, bobbyblues, borisaqua, cgohlke, fbnrst, fonnesbeck, iamsikun, isofer, jsalvatier, jseabold, juliohm, kyleabeauchamp, kyleam, mcavallaro, mikewshef, minrk, mpjdem, rabbitmcrabbit, raino01r, royalts, sammosummo, sgenoud, sotte, takluyver, terhardt, twiecki, warrd, wesm, yarikoptic


pymc2's Issues

Linear regression problem.

Hi! I was trying to implement a simple linear regression in PyMC (2.3.4) with the following code, but I don't obtain correct sampling. Is there some error in the code?

import pymc as pm
import numpy as np
from pylab import scatter, show

true_intercept = 10
true_slope = 0.5

x = np.linspace(0, 10, 100)
y = true_slope * x + true_intercept
# Jitter x and y
x += 0.5 * (np.random.rand(len(x)) * 2.0 - 1.0)
y += 1.0 * (np.random.rand(len(x)) * 2.0 - 1.0)

# Plot the original data
scatter(x, y)
show()

# Linear regression
b0 = pm.Normal('b0', 0.0, tau=1e-5, value=0)
b1 = pm.Normal('b1', 0.0, tau=1e-5, value=0)

mu = pm.Container([b1 * x[i] + b0 for i, _ in enumerate(x)])

tau = pm.Gamma('tau', 0.01, 0.01, value=0.01)

# Observed responses (y holds the jittered data generated above)
resp = pm.Normal('y', mu, tau=tau, value=y, observed=True, size=100)

model = pm.Model([b0, b1, mu, tau, resp])
mcmc = pm.MCMC(model)
mcmc.sample(20000, burn=10000, thin=10)
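For comparison, here is a vectorized formulation of the same regression, with a single array-valued Deterministic for the mean instead of a Container of per-observation expressions. It is only a sketch of an alternative parameterization (reusing the x and y arrays defined above), not a verified fix for the sampling problem:

import pymc as pm

# Hypothetical vectorized alternative; x and y are the jittered arrays above
b0_v = pm.Normal('b0_v', 0.0, tau=1e-5, value=0.0)
b1_v = pm.Normal('b1_v', 0.0, tau=1e-5, value=0.0)
tau_v = pm.Gamma('tau_v', 0.01, 0.01, value=0.01)

@pm.deterministic
def mu_v(b0=b0_v, b1=b1_v):
    # The whole mean vector at once
    return b0 + b1 * x

y_obs = pm.Normal('y_obs', mu=mu_v, tau=tau_v, value=y, observed=True)

mcmc_v = pm.MCMC([b0_v, b1_v, tau_v, mu_v, y_obs])
mcmc_v.sample(20000, burn=10000, thin=10)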


truncated normal distribution returning positive log-probabilities

With the following code:

I81> vals_nopseudo
O81> [0.0, 0.0, 1.90672964976e-07, 1.36863105826e-07, 4.33465940197e-07]

I82> empirical_mean_nopseudo=np.mean(vals_nopseudo)

I83> empirical_var_nopseudo=np.var(vals_nopseudo)

I84> pymc.truncated_normal_like(vals_nopseudo,mu=empirical_mean_nopseudo,tau=1.0/empirical_var_nopseudo,a=0.0,b=np.inf)
O84> array([ 72.09359559])

Clearly, positive log-probabilities should not be happening. Could there possibly be some precision error?

Question about multivariate proposals

Apologies for any ignorance here, but I'm confused as to how, exactly, PyMC samples a multivariate model.

If I request (for example) 10 samples, a profiler indicates that the likelihood is evaluated ~140 times (approximately n_parameters * n_iterations). Now, I know there was a question and answer here, but there are still a few confusing points.

The documentation suggests that

If a Metropolis step method handles an array-valued variable, it proposes all elements independently but simultaneously. That is, it decides whether to accept or reject all elements together but it does not attempt to take the posterior correlation between elements into account.

Correct me if I'm wrong, but this sounds like drawing n_parameters samples from a multivariate distribution with a diagonal covariance matrix, then proposing them all at once. Why would this require so many likelihood evaluations?

Looking at the docstrings in the code, the Metropolis StepMethod seems to:

Applies the one-at-a-time Metropolis-Hastings algorithm to the Stochastic over which self has jurisdiction.

Watching the verbose output, this makes sense, as it seems to be updating only one parameter at a time. If this is the case, though, shouldn't I get a sample for each accept/reject step? That is, shouldn't I then have n_parameters * n_iterations samples? Or am I mistaken as to how one at a time Metropolis works?

Thanks so much for all your help, and apologies for any ignorant/stupid questions.
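For reference, PyMC 2 can be told to propose several scalar parameters jointly, with the proposal covariance adapted to their posterior correlation, via the AdaptiveMetropolis step method. The following is a minimal sketch with hypothetical parameters a and b (no likelihood attached), not a description of what Metropolis does by default:

import pymc

a = pymc.Normal('a', mu=0, tau=1e-3)
b = pymc.Normal('b', mu=0, tau=1e-3)

M = pymc.MCMC([a, b])
# Propose a and b together from a multivariate normal whose covariance is
# adapted from the chain's history after an initial delay.
M.use_step_method(pymc.AdaptiveMetropolis, [a, b], delay=1000)
M.sample(iter=10000, burn=5000)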

`MCMC.sample` goes over by one when given a non-integer `iter`

If you give the method MCMC.sample() a non-integer parameter iter, then the sampler will attempt to take one more sample than floor(iter), resulting in an IndexError.

Here is a minimal failing example:

import pymc
mcmc = pymc.MCMC(pymc.Model([pymc.Uniform("dummy", lower=0, upper=1)]))
mcmc.sample(iter=2.5)

This gives an IndexError. Here is the full output:

/home/hepxadmin/miniconda/envs/python/lib/python2.7/site-packages/pymc/database/base.py:282: UserWarning: 
Error tallying dummy, will not try to tally it again this chain.
Did you make all the same variables and step methods tallyable
as were tallyable last time you used the database file?

Error:

Traceback (most recent call last):
  File "/home/hepxadmin/miniconda/envs/python/lib/python2.7/site-packages/pymc/database/base.py", line 272, in tally
    self._traces[name].tally(chain)
  File "/home/hepxadmin/miniconda/envs/python/lib/python2.7/site-packages/pymc/database/ram.py", line 97, in tally
    self._trace[chain][self._index[chain]] = value.copy()
IndexError: index 2 is out of bounds for axis 0 with size 2

  %s""" % (name, ''.join(traceback.format_exception(cls, inst, tb))))

This occurs with an Anaconda install of pymc 2.3.3, running on python 2.7.
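Until sample() validates its iter argument, an explicit cast on the caller's side sidesteps the extra tally attempt; a trivial workaround sketch:

requested_iter = 2.5                    # hypothetical non-integer request
mcmc.sample(iter=int(requested_iter))   # floor to an int before sampling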

Weights of categorical variables

I'm having a problem with Categorical variables and I'm trying to find out if it is a bug of mine or pymc's. It seems that categorical variables are being unevenly weighted. This can be demonstrated with the following program, which models the well-known Monty Hall example (http://en.wikipedia.org/wiki/Monty_Hall_problem). See the explanation below.

import numpy as np
import pymc as pm

gRange = 3
pRange = 3
mRange = 3

g = pm.Categorical('guest', np.ones(gRange)/gRange, value=2, observed=True)
p = pm.Categorical('prize', np.ones(pRange)/pRange)

# Make a CPT table as an array of CPTColumns
mCPT_shifts = {'guest': 3, 'prize': 1}
mCPT = np.array((  0, 0, 0, 0, 0.5, 1, 0, 1, 0.5,
                 0.5, 0, 1, 0,   0, 0, 1, 0, 0.5,
                 0.5, 1, 0, 1, 0.5, 0, 0, 0,   0), dtype=float)
mCPT.shape = (mRange, pRange*gRange)

parents = {'guest': g, 'prize': p}

# Make a node for variable monty
def m_dist(value=1, CPT_shifts=mCPT_shifts, CPT=mCPT, **parents):
    col = sum(CPT_shifts[par]*parents[par] for par in parents)
    return pm.categorical_like(value, p=CPT[:, col])

m = pm.Stochastic(m_dist, "Monty's door", 'monty', parents, dtype=int,
                  observed=True, value=1)
model = pm.MCMC([g, p, m])

The idea is to have two nodes (the prize door and the guest door) as Categorical variables with uniform distributions. The third node (Monty's door) depends on the first two. I implement this as a custom Stochastic that uses the states (values) of the parents to select a column of the CPT (Conditional Probability Table). It then returns a log-likelihood for a value according to this table column.

By setting the guest evidence at 2 (door C) and Monty's door at 1 (door B) and tracing the prize door, we expect 2/3 of the prize probability to be at door A and only 1/3 at door C. Using the snippet:

from collections import Counter

total = 20000
model.sample(iter=total, burn=0, thin=1)
counts = Counter(p.trace[:])
ratio = {var: counts[var] / float(total) for var in counts}

I obtain the following results:

{0: 0.4979, 2: 0.5021}

Even stranger, if I just select door A as a guest and keep door B for Monty I obtain different ratios (the example should be symmetric here as can be seen from the mCPT)!

{0: 0.20695, 2: 0.79305}

These numbers are stable under different settings of the sample size, burn-in, and thinning. If this is my mistake I welcome corrections. If not, it seems like pymc is drawing from the wrong distribution...
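For reference, the posterior expected above can be checked by direct enumeration of the joint distribution. This small sketch is independent of pymc and reuses the same CPT and column indexing (col = 3*guest + prize) as the model above:

import numpy as np

# P(monty | prize, guest), rows indexed by monty, columns by 3*guest + prize
mCPT = np.array([[0, 0, 0, 0, 0.5, 1, 0, 1, 0.5],
                 [0.5, 0, 1, 0, 0, 0, 1, 0, 0.5],
                 [0.5, 1, 0, 1, 0.5, 0, 0, 0, 0]], dtype=float)

guest, monty = 2, 1                 # observed: guest picks door C, Monty opens door B
likelihood = np.array([mCPT[monty, 3 * guest + prize] for prize in range(3)])
posterior = likelihood / 3.0        # uniform prior on the prize door
posterior /= posterior.sum()
print posterior                     # expected: [ 2/3  0  1/3 ]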

TestHDF5 in 2.3.4: tight dependency on recent tables in the tests

From @yarikoptic.
A few of my builds for NeuroDebian failed with:

======================================================================
ERROR: test suite for <class 'pymc.tests.test_database.TestHDF5'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/suite.py", line 208, in run
    self.setUp()
  File "/usr/lib/python2.7/dist-packages/nose/suite.py", line 291, in setUp
    self.setupContext(ancestor)
  File "/usr/lib/python2.7/dist-packages/nose/suite.py", line 314, in setupContext
    try_run(context, names)
  File "/usr/lib/python2.7/dist-packages/nose/util.py", line 469, in try_run
    return func()
  File "/tmp/buildd/pymc-2.3.4+ds/build/lib.linux-x86_64-2.7/pymc/tests/test_database.py", line 311, in setUpClass
    dbmode='w')
  File "/tmp/buildd/pymc-2.3.4+ds/build/lib.linux-x86_64-2.7/pymc/MCMC.py", line 82, in __init__
    **kwds)
  File "/tmp/buildd/pymc-2.3.4+ds/build/lib.linux-x86_64-2.7/pymc/Model.py", line 207, in __init__
    self._assign_database_backend(db)
  File "/tmp/buildd/pymc-2.3.4+ds/build/lib.linux-x86_64-2.7/pymc/Model.py", line 574, in _assign_database_backend
    self.db = module.Database(**self._db_args)
  File "/tmp/buildd/pymc-2.3.4+ds/build/lib.linux-x86_64-2.7/pymc/database/hdf5.py", line 276, in __init__
    self._h5file = tables.open_file(self.dbname, self.mode)
AttributeError: 'module' object has no attribute 'open_file'
pymc_2.3.4+ds-1~nd13.10+1_amd64.build:AttributeError: 'module' object has no attribute 'open_file'
pymc_2.3.4+ds-1~nd13.10+1_i386.build:AttributeError: 'module' object has no attribute 'open_file'
pymc_2.3.4+ds-1~nd70+1_amd64.build:AttributeError: 'module' object has no attribute 'open_file'
pymc_2.3.4+ds-1~nd70+1_i386.build:AttributeError: 'module' object has no attribute 'open_file'

with pytables as old as 2.3.1-3 on wheezy (nd70), so it might be nice if the test were skipped on those elderly deployments. Cheers
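A rough sketch of the kind of version guard that could skip the HDF5 tests on old PyTables (which provides openFile rather than open_file before version 3.0); the helper name is hypothetical:

import tables
from unittest import SkipTest

def _require_recent_tables():
    # PyTables renamed openFile to open_file in version 3.0; the pymc HDF5
    # backend relies on the new name, so skip the tests on older releases.
    if not hasattr(tables, 'open_file'):
        raise SkipTest("PyTables >= 3.0 is required for the HDF5 backend tests")

Such a check could be called from TestHDF5's setUpClass before the database is created.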

install failed

ubgpu@ubgpu:~/github/pymc$ sudo python setup.py config_fc --fcompiler gfortran build
ATLAS version 3.10.1 built by buildd on Mon Feb 3 23:10:50 UTC 2014:
............
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext

ubgpu@ubgpu:~/github/pymc$ sudo python setup.py install
ATLAS version 3.10.1 built by buildd on Mon Feb 3 23:10:50 UTC 2014:
.........
Removing /usr/local/lib/python2.7/dist-packages/pymc-2.3.5.egg-info
Writing /usr/local/lib/python2.7/dist-packages/pymc-2.3.5.egg-info
running install_clib
customize UnixCCompiler
ubgpu@ubgpu:~/github/pymc$

ubgpu@ubgpu:~/github/pymc$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import pymc

    Warning: You are importing PyMC from inside its source tree.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pymc/__init__.py", line 27, in <module>
    from .Container import *
  File "pymc/Container.py", line 41, in <module>
    from .Container_values import LCValue, DCValue, ACValue, OCValue
ImportError: No module named Container_values

quit()
ubgpu@ubgpu:~/github/pymc$ python3
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
import pymc

    Warning: You are importing PyMC from inside its source tree.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubgpu/github/pymc/pymc/__init__.py", line 27, in <module>
    from .Container import *
  File "/home/ubgpu/github/pymc/pymc/Container.py", line 41, in <module>
    from .Container_values import LCValue, DCValue, ACValue, OCValue
ImportError: No module named 'pymc.Container_values'

ImportError: DLL load failed

Hi, I want to use Python as a statistics tool, so after googling, I installed some modules in Python
like numpy, matplotlib, and so on.
And I decided to install pymc...
It took me one afternoon and one night to get pymc installed.
I did

import pymc

and this comes out:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    import pymc
  File "D:\Programming\Python34\lib\site-packages\pymc\__init__.py", line 27, in <module>
    from .Container import *
  File "D:\Programming\Python34\lib\site-packages\pymc\Container.py", line 41, in <module>
    from .Container_values import LCValue, DCValue, ACValue, OCValue
ImportError: DLL load failed: the specified module could not be found

I tried Depends (Dependency Walker) and it showed that python34.dll is missing.
And I tried adding pymc's path to PATH.
Nothing worked.
I used MinGW for pymc install.
My OS is win8 64-bit.
Please help with this!
Thank you!!

Update------------------------
I tried anaconda
and was shocked.
It got everything installed in less than 3 minutes...
But I had spent more than 9 hours trying to read and do things I don't even understand.
Thank you whoever created anaconda.
And thanks to kyleam who told me about anaconda.

Maybe pymc should add this to its documentation:
"If you are using windows, please try anaconda!"
Well, it's just my opinion.^^

Thinning sample and increasing burnin after run?

After a recent sample run, I used pymc.Matplot.plot to display the trace here. Obviously it could benefit from a longer burn-in period as well as sample thinning. It's clear how to set these parameters before I start the run, but I'm not sure how to do it afterwards in a way that produces a pymc.database.pickle.Trace object, so that I can still use the plot function and the .stats() method. The documentation (section 3.5.1) suggests that this is possible, but I cannot find the functions that implement either thinning or revision of the burn-in. Do they exist? If so, how can I access them?
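As a stop-gap, the raw samples can always be sliced and summarized by hand (this does not rebuild a Trace object, so plot and .stats() will not see the trimmed chain); a rough sketch assuming a sampler M with a variable named 'alpha' and hypothetical cut-offs:

import numpy as np

extra_burn, extra_thin = 2000, 5
samples = M.trace('alpha')[:]              # all tallied samples for 'alpha'
trimmed = samples[extra_burn::extra_thin]  # discard more burn-in, thin further

print 'mean:', np.mean(trimmed)
print '95% interval:', np.percentile(trimmed, [2.5, 97.5])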

anaconda install issue - "undefined symbol: PyUnicodeUCS2_Format"

I installed pymc in Anaconda, using "conda install pymc". This appears to have worked, but when I import the package, I run into this problem:

>>> import pymc
/ifs/home/c2b2/rr_lab/dsr2131/.local/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/hashtable.so: undefined symbol: PyUnicodeUCS2_Format

And I tried updating anaconda and conda, to no avail. It does appear that I am able to use pymc despite this error, but I haven't tested it thoroughly.

I am using:

Python 2.7.9 |Anaconda 2.2.0 (64-bit)| (default, Mar  9 2015, 16:20:48)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2

My gfortran version is:

$ ./gfortran -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)

Fitting a Binomial distribution with pymc raises ZeroProbability error for certain FillValues in masked arrays

I'm not sure if I have found a bug in pymc. It seems like fitting a Binomial with missing data can produce a ZeroProbability error depending on the chosen fill_value that masks the missing data, but maybe I'm using it incorrectly. I asked a question on stackoverflow but did not get an answer, which is why I am reporting it here.

I tried the following example with the current master branch from github. I'm aware of the bug concerning Binomial distributions in pymc 2.3.4, but this seems to be a different issue.

I fitted a Binomial distribution with pymc and everything worked as I expected:

import scipy as sp
import pymc

def make_model(observed_values):
    p = pymc.Uniform('p', lower = 0.0, upper = 1.0, value = 0.1)
    values = pymc.Binomial('values', n = 10* sp.ones_like(observed_values), p = p * sp.ones_like(observed_values),\
                  value = observed_values, observed = True, plot = False)
    return locals()

sp.random.seed(0)
observed_values = sp.random.binomial(n = 10.0, p = 0.1, size = 100)

M1 = pymc.MCMC(make_model(observed_values))
M1.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M1)
M1.summary()

Output:

  [-----------------100%-----------------] 10000 of 10000 complete in 0.7 sec
Plotting p

  p:

      Mean             SD               MC Error        95% HPD interval
      ------------------------------------------------------------------
      0.093            0.007            0.0              [ 0.081  0.107]


      Posterior quantiles:

      2.5             25              50              75             97.5
      |---------------|===============|===============|---------------|
      0.08             0.088           0.093          0.097         0.106

Now, I tried a very similar situation with the difference that one observed value would be missing:

mask = sp.zeros_like(observed_values)
mask[0] = True
masked_values = sp.ma.masked_array(observed_values, mask = mask, fill_value = 999999)

M2 = pymc.MCMC(make_model(masked_values))
M2.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M2)
M2.summary()

Unexpectedly, I got a ZeroProbability error:

---------------------------------------------------------------------------
ZeroProbability                           Traceback (most recent call last)
<ipython-input-16-4f945f269628> in <module>()
----> 1 M2 = pymc.MCMC(make_model(masked_values))
      2 M2.sample(iter=10000, burn=1000, thin=10)
      3 pymc.Matplot.plot(M2)
      4 M2.summary()

<ipython-input-12-cb8707bb911f> in make_model(observed_values)
      4 def make_model(observed_values):
      5     p = pymc.Uniform('p', lower = 0.0, upper = 1.0, value = 0.1)
----> 6     values = pymc.Binomial('values', n = 10* sp.ones_like(observed_values), p = p * sp.ones_like(observed_values),                             value = observed_values, observed = True, plot = False)
      7     return locals()

/home/fabian/anaconda/lib/python2.7/site-packages/pymc/distributions.pyc in __init__(self, *args, **kwds)
    318                     logp_partial_gradients=logp_partial_gradients,
    319                     dtype=dtype,
--> 320                     **arg_dict_out)
    321 
    322     new_class.__name__ = name

/home/fabian/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in __init__(self, logp, doc, name, parents, random, trace, value, dtype, rseed, observed, cache_depth, plot, verbose, isdata, check_logp, logp_partial_gradients)
    773         if check_logp:
    774             # Check initial value
--> 775             if not isinstance(self.logp, float):
    776                 raise ValueError(
    777                     "Stochastic " +

/home/fabian/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in get_logp(self)
    930                     (self._value, self._parents.value))
    931             else:
--> 932                 raise ZeroProbability(self.errmsg)
    933 
    934         return logp

ZeroProbability: Stochastic values's value is outside its support,
or it forbids its parents' current values.

However, if I change the fill value in the masked array to 1, fitting works again:

masked_values2 = sp.ma.masked_array(observed_values, mask = mask, fill_value = 1)

M3 = pymc.MCMC(make_model(masked_values2))
M3.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M3)
M3.summary()

Output:

[-----------------100%-----------------] 10000 of 10000 complete in 2.1 sec
Plotting p

p:

    Mean             SD               MC Error        95% HPD interval
    ------------------------------------------------------------------
    0.092            0.007            0.0              [ 0.079  0.105]


    Posterior quantiles:

    2.5             25              50              75             97.5
    |---------------|===============|===============|---------------|
    0.079            0.088           0.092          0.097         0.105


values:

    Mean             SD               MC Error        95% HPD interval
    ------------------------------------------------------------------
    1.15             0.886            0.029                  [ 0.  3.]


    Posterior quantiles:

    2.5             25              50              75             97.5
    |---------------|===============|===============|---------------|
    0.0              1.0             1.0            2.0           3.0

Is this a bug or is there a problem with my model?
Thanks for any help!

Max number of dim

I wonder if there is an upper limit on the number of dimensions pymc can handle. My log likelihood is the sum of 3 Gaussians and 1 mixture of Gaussians. Can it handle up to ~ 760 dimensions?

No module named pymc with Anaconda on Win7 64-bit

I am unable to properly install pymc with my Anaconda setup.
I am using the latest Anaconda 3.1 (build 1.8.2).
Python is 2.7.8 (the latest available through Anaconda).
I tried installing both with pip and conda.
Spyder gives "no module named pymc" when doing the import.
The version of pymc being installed is 2.3.4.

Memory issue

Moved from pymc-devs/pymc#543

Connecting a single Stochastic variable to a large number of other Stochastic variables takes a lot of memory. E.g.

def create_model(i, a):
    b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
    return locals()

a = pymc.Uniform('a', lower=0., upper=100., value=.1)
l = [create_model(i, a) for i in range(10000)]
model = pymc.Model(l)

while having twice as many unconnected variables is fine:

def create_model(i):
    a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
    b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
    return locals()

l = [create_model(i) for i in range(10000)]
model = pymc.Model(l)

pymc.Matplot.plot histogram frequency messed up?

Dear all,
first of all, pymc is a great package!
When the model parameter values are high, around the thousands, the histogram plots get messed up because the frequency axis of the plot is way too high. Why is that? In the upper right of the plot you can see the total number of samples.
Thank you!

Error when using Poisson to fill in missing values

I am getting the following error when I try to use Poisson to fill in missing values.
The code is at this link: https://github.com/Kirubaharan/hydrology/blob/master/weather_had.py

Traceback (most recent call last):
  File "/home/kiruba/PycharmProjects/area_of_curve/hydrology/weather_had.py", line 128, in <module>
    wind_speed = pm.Poisson('wind_speed', mu=rate, value=masked_values, observed=True)
  File "/home/kiruba/.local/lib/python2.7/site-packages/pymc/distributions.py", line 318, in __init__
    **arg_dict_out)
  File "/home/kiruba/.local/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 772, in __init__
    if not isinstance(self.logp, float):
  File "/home/kiruba/.local/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 929, in get_logp
    raise ZeroProbability(self.errmsg)
pymc.Node.ZeroProbability: Stochastic wind_speed's value is outside its support,
or it forbids its parents' current values.

separating model definition from observations?

Hi,

I wanted to see if it's possible with the current pymc design to separate observations of random variables' values in a probabilistic model from the definition of the model itself. If not, I think it would be a useful feature to add in the future. The syntax I imagine would separate model definition from observations, similarly to how the Church/Venture/Anglican probabilistic languages do [1]. Is this separation possible in pymc? Consider the rain-umbrella Bayes net from AIMA [2], which I wrote in pymc:

import pymc
import numpy as np

rain = pymc.Bernoulli("rain", 0.5)

def umbrella_logp(value, rain):
    """
    Umbrella node. Initial value is False.

    P(umbrella=True | rain=True)  = 0.9 
    P(umbrella=True | rain=False) = 0.2
    """
    p_umb_given_rain = 0.9
    p_umb_given_no_rain = 0.2
    if rain:
        logp = pymc.bernoulli_like(value, p_umb_given_rain)
    else:
        logp = pymc.bernoulli_like(value, p_umb_given_no_rain)
    return logp

# declare umbrella as observed, with value True
umbrella = pymc.Stochastic(name="umbrella",
                           doc="umbrella var",
                           parents={"rain": rain},
                           logp=umbrella_logp,
                           value=True,
                           observed=True)
rain_hmm = pymc.Model([rain, umbrella])
m = pymc.MCMC(rain_hmm)
m.sample(iter=5000)
print "\n"
print "rain: "
print np.mean(m.trace("rain")[:])

The above model encodes the conditional dependencies and runs inference for the case where the umbrella node is observed to be true. However, it'd be nice to take the model and then run inference in it with different sets of observations, without creating new nodes and new Model() instances, e.g. run inference in the same model where other nodes are observed, like rain instead of umbrella, or where umbrella is observed to be false instead of true.

I tried instead making the stochastic nodes unobserved when defining them (observed=False) and then setting their value using the .value attribute, which the manual says is the right way to change values of nodes (http://pymc-devs.github.io/pymc/modelbuilding.html). That didn't work: nodes that are observed seem to have to be declared as observed from the start, and once they're observed, their values can't change.

Is there a way to achieve this separation? Something along the lines of:

umbrella = pymc.Stochastic(name="umbrella",
                           doc="umbrella var",
                           parents={"rain": rain},
                           logp=umbrella_logp,
                           random=...)
# define the model, make no references to 'observe='
rain_hmm = pymc.Model([rain, umbrella])
# set observations if any, e.g.
rain_hmm.observe("rain", value=True)
rain_hmm.observe("other_node", ...)  # observe other nodes
# do inference
m = pymc.MCMC(rain_hmm)
...
# now set a new observations, without changing model
rain_hmm.observe("rain", value=np.na)  # rain not observed
rain_hmm.observe("umbrella", value=False) 

Thanks.

[1] See "Inference" section in: http://www.robots.ox.ac.uk/~fwood/anglican/examples/bayes_net/index.html for simple example of separation between model assumptions ("assume" statements) and observations ("observe" statements).

[2] The Bayes net itself is given here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm#Example - I only wrote it up for a single time step above

Binomial rv causes core dump with minimal example

import pymc as pm
from pymc import Beta,Binomial,Exponential
import numpy as np
from pymc.Matplot import plot as mcplot

data = pm.rbinomial(5,0.01,size=100)
p = Beta("p",1,1)
observations = Binomial("obs",5,p,value=data,observed=True)
model = pm.Model([p,observations])
mcmc = pm.MCMC(model)
mcmc.sample(400,100,2)
mcplot(mcmc)
venki@venki-HP-248-G1-Notebook-PC:~/Desktop$ python perf_testing.py 
*** glibc detected *** python: free(): corrupted unsorted chunks: 0x0000000003cb0d40 ***
*** glibc detected *** python: malloc(): memory corruption: 0x00000000038bf2e0 ***

OS

Python 2.7.3
pymc 2.3.4
Ubuntu 12.04.5 LTS

Restore Sample Method State from hdf5 database

I am trying to restore the sampler's step method state for a chain, specifically the adaptive scale factors of Metropolis.

Loading the database and getting the state, all information is perfectly stored in a dictionary. However, the function restore_sampler_state() does not restore the step methods, and the function restore_sm_state() does not seem to do anything. I am wondering if I am doing something wrong or if this is a bug.

If the problem is that this is not implemented, I guess the fastest way to do it is to use the use_step_method function and read the info from the database dictionary?

Thanks for the help!

What is the Markov blanket of a node in pymc?

Hi,

I set up a simple model in PyMC with two stochastic nodes, one node (rain) being the parent of the other (umbrella), i.e. rain -> umbrella. For some reason, there's no Markov blanket calculated for the umbrella node and the rain node's Markov blanket includes umbrella and rain itself. I expected the Markov blanket of rain to be: {umbrella, rain} and the Markov blanket of umbrella to be {rain}. What is PyMC's definition of a Markov blanket? Is there an implementation of the traditional definition of a Markov blanket (the node's parents, children and parents of its children)? As in: http://en.wikipedia.org/wiki/Markov_blanket

This is the code to print Markov blankets for the model:

import pymc
import numpy as np
rain = pymc.Bernoulli("rain", 0.5)
def umbrella_logp(value, rain):
    p_umb_given_rain = 0.9
    p_umb_given_no_rain = 0.2
    if rain:
        logp = pymc.bernoulli_like(value, p_umb_given_rain)
    else:
        logp = pymc.bernoulli_like(value, p_umb_given_no_rain)
    return logp
# Observed node
umbrella = pymc.Stochastic(name="umbrella",
                           doc="umbrella var",
                           parents={"rain": rain},
                           logp=umbrella_logp,
                           observed=True,
                           value=True)
rain_model = pymc.Model([rain, umbrella])
for k in rain_model.markov_blanket:
    print "Markov blanket for ", k, " is ", rain_model.markov_blanket[k]

which outputs:

Markov blanket for  rain  is  set([<pymc.distributions.Bernoulli 'rain' at 0x10bfde9d0>, <pymc.PyMCObjects.Stochastic 'umbrella' at 0x10bfde8d0>])

so umbrella is missing but rain looks right.

'module' object has no attribute 'Matplot'

Hello,
I installed pymc through pip install pymc which is versionpymc==2.3.4. And when trying the example in the home page I got the following error 'module' object has no attribute 'Matplot'. Have you any idea why this module is not loading? Thanks!

Add documentation section on model criteria

There is little mention of model selection criteria currently in the docs. Now that BCIP and AICc have been added, we should probably have a short section discussing them.

Bernoulli variable does not follow requested probability in pymc model selection

Below is a short pymc 2.3 script where I am trying to do model selection. My problem is the behaviour of the Bernoulli variable used as a model selector.

  1. From the print statements I see that even with a high realization probability (promin > .9) the Bernoulli cast ber seems to be False most of the time (see below). This leads to one of the two models never being tried (here the parabola). I tried exchanging the order of return of the function (if ber: return the linear model instead of the parabolic one) to rule out a very fast convergence. No effect.
     sampler.ber.trace[:] shows only False values. Only if pro is set to 1 does ber.trace[:] show True values (all values True indeed, as expected).

  2. Incidentally, my debug prints show that the function is called much more often than the 100 times requested by the sampler. Why? I have no burn nor thin parameter...

  3. The trace of ber and the content of ber_list are different (although the trace contains only Falses, the ber_list shows some Trues). This could be related to (2)...

I tried some simple models using Bernoulli distributions and they were fitted perfectly. But this switching model seems determined to drive me nuts. Hope I did not miss any critical stuff in the manual...

import pylab as p
import pymc
import pymc.Matplot

# example data. A very noisy parabola
x = p.arange(0,10,.1)
noise = p.randn(len(x)) * 16
err = p.ones(len(x)) * p.std(noise)
tau = 1 / err
y_real = .7 * x **2 - 5 * x + 42.69
y = y_real + noise

# priors for two models, a parabolic one and a linear one
a3 = pymc.Uniform('a3',0,4,2)
b3 = pymc.Uniform('b3',-20,20,-6)
c3 = pymc.Uniform('c3',-100,100,42)

a4 = pymc.Uniform('a4',0,1000.,35)
b4 = pymc.Uniform('b4',-100,100,1.8)

# a model switching variable, as Bernoulli cast (here very skewed)
# the posterior value of pro should give an indication of the best model
# among the two proposed.

promin= .99 ## edit : very high value only for demonstration purpose. Normally I would use 0.
promax = 1

pro = pymc.Uniform( "pro", promin , promax)
ber = pymc.Bernoulli( "ber", p = pro)

# the fitting function
verbose = True # try this only for small values of niter ;)
ber_list = []
@pymc.deterministic
def choose_model(a3=a3, b3=b3, c3=c3, a4=a4, b4=b4, ber=ber):
    ber_list.append(ber)
    if verbose:
        print ber
    # switching model as a fn of the Bernoulli cast
    if ber:
        return a3 * x**2 + b3 * x + c3
    else:
        return a4 * x + b4

# once fitted, the observation should be normally distributed around 
# the model fn
distry = pymc.Normal('distry', mu=choose_model, tau=tau, value=y, 
                     observed=True)

# packing everything into a Model
model = pymc.Model({'a3'     :    a3,
                    'b3'      :    b3,
                    'c3'      :    c3,
                    'a4'      :    a4,
                    'b4'      :    b4,
                    'ber'     :    ber,
                    'pro'     :    pro
                  })
# short sampling (better for diagnostics, should not have converged yet. And, yes I tried with much longer samplings)
sampler = pymc.MCMC(model)
sampler.sample(100)

pymc impute passing back 1e20

With the masked-array approach to imputation, the observed node passes back 1e20 for nan/None elements rather than samples from the specified distribution. I have compared the same matrix with the Impute function, and that does seem to produce correct samples from the distribution. Below is a small code sample to illustrate the issue.

I am pretty new to pymc, so maybe this is a limitation of the masked approach that is not present with the Impute method?

import numpy as np
import pymc as py

disasters_array = np.random.random((3,3))
disasters_array[1,1]=None

D = py.Impute('D', py.Normal, disasters_array, mu=.5, tau=1E5)

print disasters_array
masked_values = np.ma.masked_invalid(disasters_array)

disasters = py.Normal('disasters', mu=.5, tau=1E5, value=masked_values, observed=True)
@py.deterministic
def test(disasters=disasters, D=D):
    print D
    print disasters

mcmc = py.MCMC(py.Model(set([test,disasters])))

Output:

Original Matrix:

[[ 0.23507836  0.2024624   0.90518228]
 [ 0.95816            **nan**  0.43145808]
 [ 0.99566308  0.25431568  0.25464137]]

D with imputations:

[[array(0.23507836309832741) array(0.20246240248367342)
  array(0.9051822818081371)]
 [array(0.9581599997650212) **array(0.5005324083232756)**
  array(0.43145807852698237)]
 [array(0.9956630757864052) array(0.2543156788973996)
  array(0.25464136701826867)]]

Masked Array approach:

[[  2.35078363e-01   2.02462402e-01   9.05182282e-01]
 [  9.58160000e-01   **1.00000000e+20**   4.31458079e-01]
 [  9.95663076e-01   2.54315679e-01   2.54641367e-01]]

Post from stackoverflow: http://stackoverflow.com/questions/31934287/pymc-imput-passing-back-1e20

UnicodeDecodeError while installing for Python3.3

Just came across this

File "/usr/lib/python3.3/distutils/dist.py", line 930, in run_commands[50/4922]
  self.run_command(cmd)
File "/usr/lib/python3.3/distutils/dist.py", line 949, in run_command
  cmd_obj.run()
File "/usr/local/lib/python3.3/dist-packages/numpy/distutils/command/egg_info.p
  self.run_command("build_src")
File "/usr/lib/python3.3/distutils/cmd.py", line 313, in run_command
  self.distribution.run_command(command)
File "/usr/lib/python3.3/distutils/dist.py", line 949, in run_command
  cmd_obj.run()
File "/usr/local/lib/python3.3/dist-packages/numpy/distutils/command/build_src.
  self.build_sources()
File "/usr/local/lib/python3.3/dist-packages/numpy/distutils/command/build_src.
  self.build_extension_sources(ext)
File "/usr/local/lib/python3.3/dist-packages/numpy/distutils/command/build_src.
  sources = self.f2py_sources(sources, ext)
File "/usr/local/lib/python3.3/dist-packages/numpy/distutils/command/build_src.
  ['-m', ext_name]+f_sources)
File "/usr/local/lib/python3.3/dist-packages/numpy/f2py/f2py2e.py", line 361, i
  postlist=callcrackfortran(files, options)
File "/usr/local/lib/python3.3/dist-packages/numpy/f2py/f2py2e.py", line 283, i
  postlist=crackfortran.crackfortran(files)
File "/usr/local/lib/python3.3/dist-packages/numpy/f2py/crackfortran.py", line
  readfortrancode(files, crackline)
File "/usr/local/lib/python3.3/dist-packages/numpy/f2py/crackfortran.py", line
  l=fin.readline()
File "/usr/lib/python3.3/fileinput.py", line 354, in readline
  self._buffer = self._file.readlines(self._bufsize)
File "/usr/lib/python3.3/encodings/ascii.py", line 26, in decode
  return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 5125: ordinal not in range(128)

Error when sampling from a Bernoulli with 100% conversion rate

Moved from pymc-devs/pymc#598

When I use the first function below to evaluate test results the sampling takes a LONG time (30 seconds?) and gives wild answers for the HPD of percent_lift. In the second function everything seems to work faster and it gives much more reasonable responses.

Am I setting up the first function wrong? Am I right that they should both be functionally equivalent?

If I'm not crazy, then this would seem to be an edge case for Bernoulli. Try the first function with the following parameters:

run_simulation(9, 9, 8, 9)

Function that fails:

def run_simulation(successes, total_observations, control_successes, control_total_observations):
    control_conversion_rate = pymc.Uniform('control_conversion_rate', lower=0, upper=1)
    control_conversion_bernoulli = pymc.Bernoulli("control_conversion_bernoulli", control_conversion_rate, value=[True]*control_successes + [False]*(control_total_observations-control_successes) , observed=True)

    variant_conversion_rate = pymc.Uniform('variant_conversion_rate', lower=0, upper=1)
    variant_conversion_bernoulli = pymc.Bernoulli("variant_conversion_bernoulli", variant_conversion_rate, value=[True]*successes + [False]*(total_observations-successes) , observed=True)

    @pymc.deterministic
    def percent_lift(control_conversion_rate=control_conversion_rate, variant_conversion_rate=variant_conversion_rate):
        return variant_conversion_rate/control_conversion_rate - 1

    mcmc = pymc.MCMC([control_conversion_rate, control_conversion_bernoulli, variant_conversion_rate, variant_conversion_bernoulli, percent_lift])
    mcmc.sample(5000, 1000, progress_bar=False, verbose=False)
    return mcmc.stats()

This is the function as is currently working:

def run_simulation(variant_successes, variant_n, control_successes, control_n):
    control_rate = pymc.Uniform('control_rate', lower=0, upper=1)
    variant_rate = pymc.Uniform('variant_rate', lower=0, upper=1)

    @pymc.deterministic
    def percent_lift(control_rate=control_rate, variant_rate=variant_rate):
        return variant_rate/control_rate - 1

    control_obs = pymc.Binomial("control_obs", control_n, control_rate, value=control_successes, observed=True)
    variant_obs = pymc.Binomial("variant_obs", variant_n, variant_rate, value=variant_successes, observed=True)

    mcmc = pymc.MCMC([control_obs, variant_obs, control_rate, variant_rate, percent_lift])
    mcmc.sample(5000, 1000, progress_bar=False)

    return mcmc.stats()

utils.invcdf

If an n-d array is passed to invcdf, only the first column is calculated, and all other dimensions are ignored in version 2.3.4. (There is an easy workaround, but I am letting you all know.)

R = utils.invcdf(q) throws the error below if q has more than one dimension:

mcmc.sample(iter=10, verbose=3)

File "C:\Python27\lib\site-packages\pymc\MCMC.py", line 277, in sample
Sampler.sample(self, iter, length, verbose)
File "C:\Python27\lib\site-packages\pymc\Model.py", line 251, in sample
self._loop()
File "C:\Python27\lib\site-packages\pymc\MCMC.py", line 313, in _loop
step_method.step()
File "C:\Python27\lib\site-packages\pymc\StepMethods.py", line 761, in step
s.rand()
File "C:\Python27\lib\site-packages\pymc\PyMCObjects.py", line 1016, in random
r = self._random(**self.parents.value)
File "C:\Python27\lib\site-packages\pymc\distributions.py", line 112, in newfun
return randfun(size=shape, *args, **kwargs)
File "C:\Python27\lib\site-packages\pymc\distributions.py", line 2465, in rtruncated_normal
return R * sigma + mu
ValueError: operands could not be broadcast together with shapes (5,) (5,11,8)
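A possible workaround (not verified against that version) is to flatten the array before calling invcdf and restore the shape afterwards:

import numpy as np
from pymc import utils

q = np.random.uniform(size=(5, 11, 8))        # hypothetical n-d input
R = utils.invcdf(q.ravel()).reshape(q.shape)  # apply element-wise, then restore the shape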

pymc.gelman_rubin in python 3.4.3+ disappeared

I have run two tests with pymc 2.3.6 in python 3.4.3 and python 3.5 where I can import pymc and use a number of functions but cannot run the function pymc.gelman_rubin. The error is

AttributeError: module 'pymc' has no attribute 'gelman_rubin' 

The weird thing is that not only is the function available in python 2.7, but I do see the function in the diagnostics.py file in the pymc directory in the python 3 location.

I did a quick check to see if more functions are missing. The complete list of functions missing from the output of dir(pymc) run in python 3.5, relative to the output from running it in python 2.7, is as follows:

Matplot
__cached__
__loader__
__spec__
diagnostics
discrepancy
effective_n
gelman_rubin
geweke
iat
ppp_value
raftery_lewis
validate
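A possible workaround, assuming the diagnostics submodule itself still imports under Python 3, is to bypass the package namespace and pull the functions in directly:

from pymc.diagnostics import gelman_rubin, geweke, effective_n

# then call them as usual, e.g. gelman_rubin(traces)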

issue with the hdf5 backend

I'm having an issue with the hdf5 backend; it won't write the chains to file and here is the error message. I'd greatly appreciate help on this issue!

/home/ubuntu/anaconda/lib/python2.7/site-packages/tables/table.py:996: PerformanceWarning:
table ``/chain0/PyMCsamples`` is exceeding the recommended maximum number of columns (512);
be ready to see PyTables asking for *lots* of memory and possibly slow I/O
  PerformanceWarning)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/var/www/blog/ExperimentResults.py", line 1736, in <module>
  S.sample(iter=100000, burn=10000, thin = 10)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pymc/MCMC.py", line 277, in sample
  Sampler.sample(self, iter, length, verbose)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pymc/Model.py", line 243, in sample
  self.db._initialize(self._funs_to_tally, length)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pymc/database/hdf5.py", line 444, in  _initialize
  expectedrows=length)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tables/file.py", line 1060, in create_table
  chunkshape=chunkshape, byteorder=byteorder)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tables/table.py", line 871, in __init__
  byteorder, _log)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tables/leaf.py", line 262, in __init__
  super(Leaf, self).__init__(parentnode, name, _log)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tables/node.py", line 270, in __init__
self._v_objectid = self._g_create()
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tables/table.py", line 1023, in _g_create
self._v_new_title, self.filters.complib or '', obversion)
File "tables/tableextension.pyx", line 212, in tables.tableextension.Table._create_table  (tables/tableextension.c:3080)
tables.exceptions.HDF5ExtError: Problems creating the table

Use pymc.gp.Covariance for Incomplete Cholesky Decomposition

I would like to use the functionality provided by the Covariance class for the Incomplete Cholesky Decomposition (ICD) of a Gram matrix, but the result doesn't seem right.

The following is my experiment:

from pymc.gp import Covariance as pycov
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Define the model
def SimData_Wang04(n):
    import numpy as np

    x = [np.array([np.random.random() * 2.0 - 1.0 for i in range(n)]) for _i in range(0, 5)]
    noise = np.random.standard_normal(n)
    y = np.log(4.0 + np.sin(4 * x[0]) + np.abs(x[1]) + x[2] ** 2 +
                x[3] ** 3 + x[4] + 0.1 * noise)
    # Transform data to matrices
    xm = np.column_stack(x)
    ym = np.matrix(y).T

    return ym, xm

# Simulate data
np.random.seed(10)
n = 500 # sample size
y, x  = SimData_Wang04(n) # X has five dimensions

# Define kernel function (Gaussian kernel)
def GaussianKernel(x, y):
        sigma = 0.5
        norm2 = np.power(x-y, 2).sum()
        #return np.exp(-norm2 / sigma**2)
        return np.exp(- sigma * norm2)

# Define my ICD function
def icholesky(x, kernelFn):
    """
    For a given data matrix and a kernel function, perform ICD on its Gram matrix.
    """
    reltol = 1e-6
    C = pycov(kernelFn, relative_precision=reltol)
    return C.cholesky(x).T

# Run ICD for the data in y
icd_y = icholesky(y, GaussianKernel)
icd_y.shape # retains all columns???

# In order to verify this result, I run a separate test.
# I evaluate the Gram matrix explicitly and calculate its eigenvalues.
# Since I set `relative_precision = 1e-6`, all the eigenvalues smaller than 1e-6 should be dropped.

# Define Gram matrix constructor
def GramMatrix(x, kernel_fn):
    G = np.matrix(pairwise_distances(x, metric=kernel_fn))
    return G

# Evaluate Gram matrix for y
Gy = GramMatrix(y, GaussianKernel)

# Eigen decomposition
eigval, eigvec = np.linalg.eigh(Gy)

# Reorder eigenvalues in decreasing order
eigval[::-1] # only seven eigenvalues are larger than 1e-6

In the experiment, I ran Covariance.cholesky to get a reduced-rank matrix U. However, it retains all 500 columns.

The np.linalg.eigh result says only 7 eigenvalues are above the relative tolerance. So I think that if Covariance.cholesky is correct, it should return a U with only 7 columns.

ImportError: cannot import name 'inv_boxcox'

I just installed pymc and when trying to run the test suite I get the following error

File "C:\Anaconda3\lib\site-packages\scipy\stats_continuous_distns.py", line 14, in
from scipy.special import (gammaln as gamln, gamma as gam, boxcox, boxcox1p,

ImportError: cannot import name 'inv_boxcox'

any ideas on how I can fix this?

pymc.Matplot.plot fails with LaTeX error

I tried the disaster_model tutorial, and I'm getting an error when trying to do the final plot using pymc.Matplot.plot(M).

plot(M)
Plotting late_mean
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-27-7b64fd4ca4d6> in <module>()
----> 1 plot(M)

/home/zpace/anaconda/lib/python2.7/site-packages/pymc/Matplot.pyc in wrapper(pymc_obj, *args, **kwargs)
    339                     if args:
    340                         name = '%s_%s' % (args[0], variable.__name__)
--> 341                     f(data, name, *args, **kwargs)
    342             return
    343         except AttributeError:

/home/zpace/anaconda/lib/python2.7/site-packages/pymc/Matplot.pyc in plot(data, name, format, suffix, path, common_scale, datarange, new, last, rows, num, fontmap, verbose)
    460             if not path.endswith('/'):
    461                 path += '/'
--> 462             savefig("%s%s%s.%s" % (path, name, suffix, format))
    463 
    464     else:

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in savefig(*args, **kwargs)
    575 def savefig(*args, **kwargs):
    576     fig = gcf()
--> 577     res = fig.savefig(*args, **kwargs)
    578     draw()   # need this if 'transparent=True' to reset colors
    579     return res

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/figure.pyc in savefig(self, *args, **kwargs)
   1474             self.set_frameon(frameon)
   1475 
-> 1476         self.canvas.print_figure(*args, **kwargs)
   1477 
   1478         if frameon:

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt5agg.pyc in print_figure(self, *args, **kwargs)
    159 
    160     def print_figure(self, *args, **kwargs):
--> 161         FigureCanvasAgg.print_figure(self, *args, **kwargs)
    162         self.draw()
    163 

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/backend_bases.pyc in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, **kwargs)
   2209                 orientation=orientation,
   2210                 bbox_inches_restore=_bbox_inches_restore,
-> 2211                 **kwargs)
   2212         finally:
   2213             if bbox_inches and restore_bbox:

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_agg.pyc in print_png(self, filename_or_obj, *args, **kwargs)
    519 
    520     def print_png(self, filename_or_obj, *args, **kwargs):
--> 521         FigureCanvasAgg.draw(self)
    522         renderer = self.get_renderer()
    523         original_dpi = renderer.dpi

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_agg.pyc in draw(self)
    467 
    468         try:
--> 469             self.figure.draw(self.renderer)
    470         finally:
    471             RendererAgg.lock.release()

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs)
     57     def draw_wrapper(artist, renderer, *args, **kwargs):
     58         before(artist, renderer)
---> 59         draw(artist, renderer, *args, **kwargs)
     60         after(artist, renderer)
     61 

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/figure.pyc in draw(self, renderer)
   1083         dsu.sort(key=itemgetter(0))
   1084         for zorder, a, func, args in dsu:
-> 1085             func(*args)
   1086 
   1087         renderer.close_group('figure')

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs)
     57     def draw_wrapper(artist, renderer, *args, **kwargs):
     58         before(artist, renderer)
---> 59         draw(artist, renderer, *args, **kwargs)
     60         after(artist, renderer)
     61 

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/axes/_base.pyc in draw(self, renderer, inframe)
   2108 
   2109         for zorder, a in dsu:
-> 2110             a.draw(renderer)
   2111 
   2112         renderer.close_group('axes')

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs)
     57     def draw_wrapper(artist, renderer, *args, **kwargs):
     58         before(artist, renderer)
---> 59         draw(artist, renderer, *args, **kwargs)
     60         after(artist, renderer)
     61 

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/text.pyc in draw(self, renderer)
    636             if rcParams['text.usetex']:
    637                 renderer.draw_tex(gc, x, y, clean_line,
--> 638                                   self._fontproperties, angle, mtext=mtext)
    639             else:
    640                 renderer.draw_text(gc, x, y, clean_line,

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_agg.pyc in draw_tex(self, gc, x, y, s, prop, angle, ismath, mtext)
    248         im = self.texd.get(key)
    249         if im is None:
--> 250             Z = texmanager.get_grey(s, size, self.dpi)
    251             Z = np.array(Z * 255.0, np.uint8)
    252 

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/texmanager.pyc in get_grey(self, tex, fontsize, dpi)
    576 
    577         if alpha is None:
--> 578             pngfile = self.make_png(tex, fontsize, dpi)
    579             X = read_png(os.path.join(self.texcache, pngfile))
    580 

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/texmanager.pyc in make_png(self, tex, fontsize, dpi)
    499         # see get_rgba for a discussion of the background
    500         if DEBUG or not os.path.exists(pngfile):
--> 501             dvifile = self.make_dvi(tex, fontsize)
    502             outfile = basefile + '.output'
    503             command = self._get_shell_cmd(

/home/zpace/anaconda/lib/python2.7/site-packages/matplotlib/texmanager.pyc in make_dvi(self, tex, fontsize)
    415                      'string:\n%s\nHere is the full report generated by '
    416                      'LaTeX: \n\n' % repr(tex.encode('unicode_escape')) +
--> 417                      report))
    418             else:
    419                 mpl.verbose.report(report, 'debug')

RuntimeError: LaTeX was not able to process the following string:
''
Here is the full report generated by LaTeX: 

This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014) (preloaded format=latex)
 restricted \write18 enabled.
entering extended mode
(./6139caf44b4d25e442734ff05d9ea044.tex
LaTeX2e <2014/05/01>
Babel <3.9l> and hyphenation patterns for 79 languages loaded.
(/usr/share/texmf-dist/tex/latex/base/article.cls
Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
(/usr/share/texmf-dist/tex/latex/base/size10.clo))
(/usr/share/texmf-dist/tex/latex/type1cm/type1cm.sty)
(/usr/share/texmf-dist/tex/latex/psnfss/helvet.sty
(/usr/share/texmf-dist/tex/latex/graphics/keyval.sty))
(/usr/share/texmf-dist/tex/latex/psnfss/courier.sty)
(/usr/share/texmf-dist/tex/latex/base/textcomp.sty
(/usr/share/texmf-dist/tex/latex/base/ts1enc.def))
(/usr/share/texmf-dist/tex/latex/geometry/geometry.sty
(/usr/share/texmf-dist/tex/generic/oberdiek/ifpdf.sty)
(/usr/share/texmf-dist/tex/generic/oberdiek/ifvtex.sty)
(/usr/share/texmf-dist/tex/generic/ifxetex/ifxetex.sty)

Package geometry Warning: Over-specification in `h'-direction.
    `width' (5058.9pt) is ignored.


Package geometry Warning: Over-specification in `v'-direction.
    `height' (5058.9pt) is ignored.

) (./6139caf44b4d25e442734ff05d9ea044.aux)
(/usr/share/texmf-dist/tex/latex/base/ts1cmr.fd)
(/usr/share/texmf-dist/tex/latex/psnfss/ot1pnc.fd)
*geometry* driver: auto-detecting
*geometry* detected driver: dvips
(./6139caf44b4d25e442734ff05d9ea044.aux) )
No pages of output.
Transcript written on 6139caf44b4d25e442734ff05d9ea044.log.

LaTeX works with some simpler plots (including axis labels and regular text), and nothing else seems amiss. Is this a problem with my LaTeX installation, with matplotlib itself, or am I missing a package?
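One thing worth trying (a workaround sketch, assuming the failure comes from an empty string reaching the LaTeX renderer rather than from the TeX installation itself): turn off text.usetex before plotting so matplotlib falls back to its built-in mathtext.

import matplotlib
matplotlib.rcParams['text.usetex'] = False  # render labels with mathtext instead of LaTeX

import pymc
from pymc.examples import disaster_model

M = pymc.MCMC(disaster_model)
M.sample(iter=1000, burn=500)
pymc.Matplot.plot(M)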

typo in pymc.Gamma docstring

The docstring states, "Represents the sum of alpha exponentially distributed random variables, each of which has mean beta."

Based on the math that follows, this should read "rate parameter beta" instead.
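For reference, under the shape-rate parameterization that the docstring's math appears to use,

$$ f(x \mid \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}, \qquad E[x] = \frac{\alpha}{\beta}, $$

so x behaves like the sum of alpha independent exponentials each with rate beta (i.e. mean 1/beta), not mean beta.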

AssertionError: Arrays are not almost equal to 7 decimals

ERROR: test_stats_after_reload (pymc.tests.test_MCMCSampler.test_MCMC)

Traceback (most recent call last):
File "/Applications/anaconda/lib/python2.7/site-packages/pymc/tests/test_MCMCSampler.py", line 81, in test_stats_after_reload
db = database.pickle.load('MCMC.pickle')
File "/Applications/anaconda/lib/python2.7/site-packages/pymc/database/pickle.py", line 82, in load
file = open(filename, 'rb')
IOError: [Errno 2] No such file or directory: 'MCMC.pickle'

FAIL: test_nonsep (pymc.tests.test_basiscov.test_basiscov)

Traceback (most recent call last):
File "/Applications/anaconda/lib/python2.7/site-packages/pymc/tests/test_basiscov.py", line 82, in test_nonsep
assert_almost_equal(C2(xstack), C(xstack))
File "/Applications/anaconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 474, in assert_almost_equal
return assert_array_almost_equal(actual, desired, decimal, err_msg)
File "/Applications/anaconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 842, in assert_array_almost_equal
precision=decimal)
File "/Applications/anaconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 665, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 7 decimals

(mismatch 100.0%)
x: array([[ 21.7581876, 21.7581876, 21.7581876, ..., 21.7581876,
21.7581876, 21.7581876],
[ 21.7581876, 21.7581876, 21.7581876, ..., 21.7581876,...
y: array([[ 21.3644798, 21.4933572, 21.5826082, ..., 21.5826082,
21.4933572, 21.3644798],

Errors when installing pymc via pip

Migrated from pymc-devs/pymc#674 since it's a pymc2 issue.

The installation is successful in the end, but I see some errors at the beginning:

$ pip install pymc
Collecting pymc
  Downloading pymc-2.3.4.tar.gz (13.1MB)
    100% |################################| 13.1MB 1.2MB/s
    /tmp/tmp9VMbfy/tmp/tmp9VMbfy/source.o: In function `main':
    source.c:(.text+0x15): undefined reference to `zungqr_'
    collect2: error: ld returned 1 exit status
    /tmp/tmp9VMbfy/tmp/tmp9VMbfy/source.o: In function `main':
    source.c:(.text+0x15): undefined reference to `zungqr_'
    collect2: error: ld returned 1 exit status
    build_src
    building extension "pymc.flib" sources
    f2py options: ['skip:ppnd7']
    f2py:> build/src.linux-x86_64-2.7/pymc/flibmodule.c
    IOError: [Errno 2] No such file or directory: 'skip:ppnd7'. Skipping file "skip:ppnd7".

Do the "undefined reference" error and the "file not found" error matter here?

Exponential interface is misleading/confusing

You use the parameter beta for the exponential distribution, but you treat it like lambda, the reciprocal of beta (in terms of the common description of this distribution; see Wikipedia, Wolfram, etc.). Beta should be the mean, but when I build an exponential and generate random values, they have mean 1/beta.

This means that when you want a really large exponential you get a really small one and vice versa. This is really confusing! It would be good to either rename the parameter or change the behavior.

I can submit a PR if needed, but I'm not very familiar with the structure of the distributions module and wasn't sure where to start.
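A minimal sketch reproducing the behavior described above (the sample mean comes out near 1/beta, consistent with beta acting as a rate):

import numpy as np
import pymc

# Draw repeatedly from an Exponential with beta=10.
x = pymc.Exponential('x', beta=10.0)
samples = np.array([x.random() for _ in range(10000)])
print(samples.mean())  # comes out around 0.1 (= 1/beta), not 10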

HDF5 Output issues with save_interval

I encountered this bug with my own model (choderalab/gbff#10), but the following code reproduces it with the example models in pymc. Removing the save_interval flag fixes the issue.

import pymc
from pymc.examples import disaster_model

sampler = pymc.MCMC(disaster_model, db='hdf5', dbname="./test.h5")
sampler.sample(100, save_interval=1)
Burn-in interval complete

Sampling finished normally.
Warning, unable to save state.
Error message:
Traceback (most recent call last):
  File "/home/kyleb/opt/lib/python2.7/site-packages/pymc/Model.py", line 777, in save_state
    self.db.savestate(self.get_state())
  File "/home/kyleb/opt/lib/python2.7/site-packages/pymc/database/hdf5.py", line 503, in savestate
    cur_chain._state_[0] = state
  File "/home/kyleb/opt/lib/python2.7/site-packages/tables/vlarray.py", line 785, in __setitem__
    self._assign_values(coords, value)
  File "/home/kyleb/opt/lib/python2.7/site-packages/tables/vlarray.py", line 718, in _assign_values
    nparr[:], exc))
ValueError: Value parameter:
'array([128,   2, 125, 113,   1,  40,  85,  11, 115, 116, 111,  99, 104,
        97, 115, 116, 105,  99, 115, 113,   2, 125, 113,   3,  40,  85,
        10, 101,  97, 114, 108, 121,  95, 109, 101,  97, 110, 113,   4,
        99, 110, 117, 109, 112, 121,  46,  99, 111, 114, 101,  46, 109,
       117, 108, 116, 105,  97, 114, 114,  97, 121,  10,  95, 114, 101,
        99, 111, 110, 115, 116, 114, 117,  99, 116,  10, 113,   5,  99,
       110, 117, 109, 112, 121,  10, 110, 100,  97, 114, 114,  97, 121,
        10, 113,   6,  75,   0, 133,  85,   1,  98, 135,  82, 113,   7,
        40,  75,   1,  41,  99, 110, 117, 109, 112, 121,  10, 100, 116,
       121, 112, 101,  10, 113,   8,  85,   2, 102,  56,  75,   0,  75,
         1, 135,  82, 113,   9,  40,  75,   3,  85,   1,  60,  78,  78,
        78,  74, 255, 255, 255, 255,  74, 255, 255, 255, 255,  75,   0,
       116,  98, 137,  85,   8,  78, 134,  53, 111, 148,  87, 198,  63,
       116,  98,  85,  11, 115, 119, 105, 116,  99, 104, 112, 111, 105,
       110, 116, 113,  10, 104,   5, 104,   6,  75,   0, 133,  85,   1,
        98, 135,  82, 113,  11,  40,  75,   1,  41, 104,   8,  85,   2,
       105,  56,  75,   0,  75,   1, 135,  82, 113,  12,  40,  75,   3,
        85,   1,  60,  78,  78,  78,  74, 255, 255, 255, 255,  74, 255,
       255, 255, 255,  75,   0, 116,  98, 137,  85,   8,   0,   0,   0,
         0,   0,   0,   0,   0, 116,  98,  85,   9, 108,  97, 116, 101,
        95, 109, 101,  97, 110, 113,  13, 104,   5, 104,   6,  75,   0,
       133,  85,   1,  98, 135,  82, 113,  14,  40,  75,   1,  41, 104,
         9, 137,  85,   8,  14, 242, 100, 236, 209, 133, 253,  63, 116,
        98, 117,  85,  12, 115, 116, 101, 112,  95, 109, 101, 116, 104,
       111, 100, 115, 113,  15, 125, 113,  16,  40,  85,  20,  77, 101,
       116, 114, 111, 112, 111, 108, 105, 115,  95, 108,  97, 116, 101,
        95, 109, 101,  97, 110, 113,  17, 125, 113,  18,  40,  85,  21,
        97, 100,  97, 112, 116, 105, 118, 101,  95, 115,  99,  97, 108,
       101,  95, 102,  97,  99, 116, 111, 114, 113,  19,  71,  63, 240,
         0,   0,   0,   0,   0,   0,  85,  21, 112, 114, 111, 112, 111,
       115,  97, 108,  95, 100, 105, 115, 116, 114, 105,  98, 117, 116,
       105, 111, 110, 113,  20,  85,   6,  78, 111, 114, 109,  97, 108,
       113,  21,  85,  11, 112, 114, 111, 112, 111, 115,  97, 108,  95,
       115, 100, 113,  22,  99, 110, 117, 109, 112, 121,  46,  99, 111,
       114, 101,  46, 109, 117, 108, 116, 105,  97, 114, 114,  97, 121,
        10, 115,  99,  97, 108,  97, 114,  10, 113,  23, 104,   9,  85,
         8, 164,  23, 214,   1, 218, 171, 217,  63, 134,  82, 113,  24,
        85,   8,  97,  99,  99, 101, 112, 116, 101, 100, 113,  25,  71,
        64,  65, 128,   0,   0,   0,   0,   0,  85,  22,  99, 104, 101,
        99, 107,  95,  98, 101, 102, 111, 114, 101,  95,  97,  99,  99,
       101, 112, 116, 105, 110, 103, 113,  26, 136,  85,   8, 114, 101,
       106, 101,  99, 116, 101, 100, 113,  27,  71,  64,  80,  64,   0,
         0,   0,   0,   0, 117,  85,  30,  68, 105, 115,  99, 114, 101,
       116, 101,  77, 101, 116, 114, 111, 112, 111, 108, 105, 115,  95,
       115, 119, 105, 116,  99, 104, 112, 111, 105, 110, 116, 113,  28,
       125, 113,  29,  40, 104,  19,  71,  63, 240,   0,   0,   0,   0,
         0,   0, 104,  20,  85,   7,  80, 111, 105, 115, 115, 111, 110,
       113,  30, 104,  22, 104,  23, 104,   9,  85,   8,   0,   0,   0,
         0,   0,   0, 240,  63, 134,  82, 113,  31, 104,  25,  71,  64,
        69, 128,   0,   0,   0,   0,   0, 104,  26, 136, 104,  27,  71,
        64,  76, 128,   0,   0,   0,   0,   0, 117,  85,  21,  77, 101,
       116, 114, 111, 112, 111, 108, 105, 115,  95, 101,  97, 114, 108,
       121,  95, 109, 101,  97, 110, 113,  32, 125, 113,  33,  40, 104,
        19,  71,  63, 240,   0,   0,   0,   0,   0,   0, 104,  20, 104,
        21, 104,  22, 104,  23, 104,   9,  85,   8, 124,  57, 143, 169,
        18, 149, 144,  63, 134,  82, 113,  34, 104,  25,  71,  64,  88,
       192,   0,   0,   0,   0,   0, 104,  26, 136, 104,  27,  71,  63,
       240,   0,   0,   0,   0,   0,   0, 117, 117,  85,   7, 115,  97,
       109, 112, 108, 101, 114, 113,  35, 125, 113,  36,  40,  85,   6,
       115, 116,  97, 116, 117, 115, 113,  37,  85,   5, 114, 101,  97,
       100, 121, 113,  38,  85,   5,  95, 105, 116, 101, 114, 113,  39,
       104,  23, 104,  12,  85,   8, 100,   0,   0,   0,   0,   0,   0,
         0, 134,  82, 113,  40,  85,  14,  95, 116, 117, 110, 101,  95,
       105, 110, 116, 101, 114, 118,  97, 108, 113,  41,  77, 232,   3,
        85,  12,  95, 116, 117, 110, 101, 100,  95,  99, 111, 117, 110,
       116, 113,  42,  75,   0,  85,  16,  95, 116, 117, 110, 101,  95,
       116, 104, 114, 111, 117, 103, 104, 111, 117, 116, 113,  43, 136,
        85,  16,  95,  98, 117, 114, 110,  95, 116, 105, 108, 108,  95,
       116, 117, 110, 101, 100, 113,  44, 137,  85,  13,  95,  99, 117,
       114, 114, 101, 110, 116,  95, 105, 116, 101, 114, 113,  45,  75,
       100,  85,   5,  95,  98, 117, 114, 110, 113,  46,  75,   0,  85,
         5,  95, 116, 104, 105, 110, 113,  47,  75,   1, 117, 117,  46], dtype=uint8)'
cannot be converted into an array object compliant vlarray[0] row: 
'array([128,   2, 125, 113,   1,  40,  85,  11, 115, 116, 111,  99, 104,
        97, 115, 116, 105,  99, 115, 113,   2, 125, 113,   3,  40,  85,
        10, 101,  97, 114, 108, 121,  95, 109, 101,  97, 110, 113,   4,
        99, 110, 117, 109, 112, 121,  46,  99, 111, 114, 101,  46, 109,
       117, 108, 116, 105,  97, 114, 114,  97, 121,  10,  95, 114, 101,
        99, 111, 110, 115, 116, 114, 117,  99, 116,  10, 113,   5,  99,
       110, 117, 109, 112, 121,  10, 110, 100,  97, 114, 114,  97, 121,
        10, 113,   6,  75,   0, 133,  85,   1,  98, 135,  82, 113,   7,
        40,  75,   1,  41,  99, 110, 117, 109, 112, 121,  10, 100, 116,
       121, 112, 101,  10, 113,   8,  85,   2, 102,  56,  75,   0,  75,
         1, 135,  82, 113,   9,  40,  75,   3,  85,   1,  60,  78,  78,
        78,  74, 255, 255, 255, 255,  74, 255, 255, 255, 255,  75,   0,
       116,  98, 137,  85,   8,  78, 134,  53, 111, 148,  87, 198,  63,
       116,  98,  85,  11, 115, 119, 105, 116,  99, 104, 112, 111, 105,
       110, 116, 113,  10, 104,   5, 104,   6,  75,   0, 133,  85,   1,
        98, 135,  82, 113,  11,  40,  75,   1,  41, 104,   8,  85,   2,
       105,  56,  75,   0,  75,   1, 135,  82, 113,  12,  40,  75,   3,
        85,   1,  60,  78,  78,  78,  74, 255, 255, 255, 255,  74, 255,
       255, 255, 255,  75,   0, 116,  98, 137,  85,   8,   0,   0,   0,
         0,   0,   0,   0,   0, 116,  98,  85,   9, 108,  97, 116, 101,
        95, 109, 101,  97, 110, 113,  13, 104,   5, 104,   6,  75,   0,
       133,  85,   1,  98, 135,  82, 113,  14,  40,  75,   1,  41, 104,
         9, 137,  85,   8,  14, 242, 100, 236, 209, 133, 253,  63, 116,
        98, 117,  85,  12, 115, 116, 101, 112,  95, 109, 101, 116, 104,
       111, 100, 115, 113,  15, 125, 113,  16,  40,  85,  20,  77, 101,
       116, 114, 111, 112, 111, 108, 105, 115,  95, 108,  97, 116, 101,
        95, 109, 101,  97, 110, 113,  17, 125, 113,  18,  40,  85,  21,
        97, 100,  97, 112, 116, 105, 118, 101,  95, 115,  99,  97, 108,
       101,  95, 102,  97,  99, 116, 111, 114, 113,  19,  71,  63, 240,
         0,   0,   0,   0,   0,   0,  85,  21, 112, 114, 111, 112, 111,
       115,  97, 108,  95, 100, 105, 115, 116, 114, 105,  98, 117, 116,
       105, 111, 110, 113,  20,  85,   6,  78, 111, 114, 109,  97, 108,
       113,  21,  85,  11, 112, 114, 111, 112, 111, 115,  97, 108,  95,
       115, 100, 113,  22,  99, 110, 117, 109, 112, 121,  46,  99, 111,
       114, 101,  46, 109, 117, 108, 116, 105,  97, 114, 114,  97, 121,
        10, 115,  99,  97, 108,  97, 114,  10, 113,  23, 104,   9,  85,
         8, 164,  23, 214,   1, 218, 171, 217,  63, 134,  82, 113,  24,
        85,   8,  97,  99,  99, 101, 112, 116, 101, 100, 113,  25,  71,
        64,  65, 128,   0,   0,   0,   0,   0,  85,  22,  99, 104, 101,
        99, 107,  95,  98, 101, 102, 111, 114, 101,  95,  97,  99,  99,
       101, 112, 116, 105, 110, 103, 113,  26, 136,  85,   8, 114, 101,
       106, 101,  99, 116, 101, 100, 113,  27,  71,  64,  80,  64,   0,
         0,   0,   0,   0, 117,  85,  30,  68, 105, 115,  99, 114, 101,
       116, 101,  77, 101, 116, 114, 111, 112, 111, 108, 105, 115,  95,
       115, 119, 105, 116,  99, 104, 112, 111, 105, 110, 116, 113,  28,
       125, 113,  29,  40, 104,  19,  71,  63, 240,   0,   0,   0,   0,
         0,   0, 104,  20,  85,   7,  80, 111, 105, 115, 115, 111, 110,
       113,  30, 104,  22, 104,  23, 104,   9,  85,   8,   0,   0,   0,
         0,   0,   0, 240,  63, 134,  82, 113,  31, 104,  25,  71,  64,
        69, 128,   0,   0,   0,   0,   0, 104,  26, 136, 104,  27,  71,
        64,  76, 128,   0,   0,   0,   0,   0, 117,  85,  21,  77, 101,
       116, 114, 111, 112, 111, 108, 105, 115,  95, 101,  97, 114, 108,
       121,  95, 109, 101,  97, 110, 113,  32, 125, 113,  33,  40, 104,
        19,  71,  63, 240,   0,   0,   0,   0,   0,   0, 104,  20, 104,
        21, 104,  22, 104,  23, 104,   9,  85,   8, 124,  57, 143, 169,
        18, 149, 144,  63, 134,  82, 113,  34, 104,  25,  71,  64,  88,
       192,   0,   0,   0,   0,   0, 104,  26, 136, 104,  27,  71,  63,
       240,   0,   0,   0,   0,   0,   0, 117, 117,  85,   7, 115,  97,
       109, 112, 108, 101, 114, 113,  35, 125, 113,  36,  40,  85,   6,
       115, 116,  97, 116, 117, 115, 113,  37,  85,   7, 114, 117, 110,
       110, 105, 110, 103, 113,  38,  85,   5,  95, 105, 116, 101, 114,
       113,  39, 104,  23, 104,  12,  85,   8, 100,   0,   0,   0,   0,
         0,   0,   0, 134,  82, 113,  40,  85,  14,  95, 116, 117, 110,
       101,  95, 105, 110, 116, 101, 114, 118,  97, 108, 113,  41,  77,
       232,   3,  85,  12,  95, 116, 117, 110, 101, 100,  95,  99, 111,
       117, 110, 116, 113,  42,  75,   0,  85,  16,  95, 116, 117, 110,
       101,  95, 116, 104, 114, 111, 117, 103, 104, 111, 117, 116, 113,
        43, 136,  85,  16,  95,  98, 117, 114, 110,  95, 116, 105, 108,
       108,  95, 116, 117, 110, 101, 100, 113,  44, 137,  85,  13,  95,
        99, 117, 114, 114, 101, 110, 116,  95, 105, 116, 101, 114, 113,
        45,  75,  99,  85,   5,  95,  98, 117, 114, 110, 113,  46,  75,
         0,  85,   5,  95, 116, 104, 105, 110, 113,  47,  75,   1, 117,
       117,  46], dtype=uint8)'
The error was: <could not broadcast input array from shape (936) into shape (938)>
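A minimal sketch of the workaround noted at the top of this report (drop save_interval so the state is only written once, at the end of the run):

import pymc
from pymc.examples import disaster_model

# Same model and database, but without save_interval; the state is then only
# saved when sampling finishes, which avoids the failing mid-run rewrite.
sampler = pymc.MCMC(disaster_model, db='hdf5', dbname="./test.h5")
sampler.sample(100)
sampler.db.close()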

Windows Install failed with exit status 1

Tried various install methods on a Windows 7 32-bit machine:

  • pip install
  • easy_install
  • download/unzip .tar

All with error result: 'C:\Program' is not recognized as an internal or external command,
operable program or batch file.
error: Command "gcc -O2 -Wall -Wstrict-prototypes -D__MSVCRT_VERSION__=0x0900 -I
build\src.win32-2.7

Any ideas? Thanks

Closing the issue - installing via Anaconda resolved it.

Slice sampler gives values outside support of prior

When sampling variables with Uniform or Beta priors in PyMC I'm getting some very strange behavior. The sampler ends up drawing values far, far outside of the support of the prior. Here's a simple example script to reproduce the problem:

from pymc import Uniform, Bernoulli, deterministic, MCMC, Metropolis, Slicer
import scipy as sp

n = 1000
p = Uniform('p', 0, 1)

@deterministic(trace=False)
def probs(p=p):
    return sp.ones(n) * p

y = Bernoulli('y', p=probs, value=sp.ones(n), observed=True)
m = MCMC({'p': p, 'probs': probs, 'y': y})
m.use_step_method(Slicer, m.p)  # not necessary as Slicer is the default
m.sample(40)
print('\nMax p: %.5f' % m.p.trace().max())

If I run this script I get samples for p on the order of 10000. As I understand it, the prior should give zero probability to any value outside [0, 1], so this shouldn't be possible; in fact, in this case the posterior distribution for p should be Beta(1001, 1).

Am I somehow specifying my model incorrectly?

(For now I'm using the AdaptiveMetropolis step method instead, which avoids the problem; a sketch is below.)
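For completeness, a minimal sketch of that workaround, reusing p, probs and y from the reproduction script above (AdaptiveMetropolis needs to be imported as well):

from pymc import AdaptiveMetropolis

# Reuses p, probs and y as defined in the script above.
m = MCMC({'p': p, 'probs': probs, 'y': y})
m.use_step_method(AdaptiveMetropolis, m.p)  # instead of the slice sampler
m.sample(40)
print('\nMax p: %.5f' % m.p.trace().max())  # reported to stay within [0, 1]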

I'm using PyMC version 2.3.3 installed using the conda command with the Anaconda Python distribution.

MAP does not contain variable values if initialized with a list of variables

I am trying to create binomial mixture models on the fly to determine the appropriate number of component distributions for different samples. The model-building function returns a list of variables called mymix.

In [58]: mymap = MAP(mymix)

In [59]: mymap.fit()

In [60]: mymap.value
Out[60]: <pymc.NormalApproximation.MAP at 0xdbf34450>

In [61]: mymap.variables
Out[61]:
set([<pymc.CommonDeterministics.Lambda 'try3_det2' at 0xd92bbc10>,
     <pymc.distributions.Uniform 'try3_p2' at 0xd92bbc50>,
     <pymc.PyMCObjects.Stochastic 'conv' at 0xd92bbc90>,
     <pymc.distributions.Uniform 'try3_part0' at 0xd92bb8d0>,
     <pymc.distributions.Uniform 'try3_p0' at 0xd92bbad0>,
     <pymc.CommonDeterministics.Lambda 'try3_det1' at 0xd92bbb10>,
     <pymc.distributions.Uniform 'try3_part1' at 0xd92bbb50>,
     <pymc.distributions.Uniform 'try3_p1' at 0xd92bbb90>])

In [62]: mymap.try3_det2.value
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-62-19e2407d3192> in <module>()
----> 1 mymap.try3_det2.value

AttributeError: 'MAP' object has no attribute 'try3_det2'

In [63]: mymap.try3_p2.value
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-63-134b6c630560> in <module>()
----> 1 mymap.try3_p2.value

AttributeError: 'MAP' object has no attribute 'try3_p2'

However, the provided example with gelman_bioassay works as shown in section 5.3 of the manual. The NormApprox class has similar issues, and NormApprox.mu has no valid keys.
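A possible workaround sketch, under the assumption that MAP.fit() updates the underlying variables in place even when the object is built from a list: read the fitted values off the variable objects themselves (or look nodes up by name with get_node) rather than as attributes of the MAP instance.

from pymc import MAP

# mymix is the list of variables returned by the model-building function above.
mymap = MAP(mymix)
mymap.fit()

# The fitted values live on the variable objects themselves.
for v in mymap.stochastics:
    print(v.__name__, v.value)

# Individual nodes can also be looked up by name.
print(mymap.get_node('try3_p2').value)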

Frobenius-Norm induced Distance Metrics for Gaussian Processes

Moved from pymc-devs/pymc#541

In my application I want to use Gaussian Process regression where the input space is the space of matrices (potentially with an infinite number of rows and columns). Thus I would like to define a distance metric based on the Frobenius norm (for two matrices A and B, this is basically np.sqrt(np.sum((A - B)**2))).

Now, I'm not really sure how to proceed. For starters, I've tried to reimplement the euclidean distance function (with signature own_eucl(D,x,y,cmin=0,cmax=-1,symm=False)) and hand it over to gp.gaussian.add_distance_metric. This works until I observe data, at which point I get


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-bd7d58d48bf8> in <module>()
      6 data = np.array([3.1, 2.9])
      7 
----> 8 observe(M=M, C=C, obs_mesh=obs_x, obs_V=V, obs_vals=data)
      9 
     10 # Generate realizations from posterior

//anaconda/lib/python2.7/site-packages/pymc-2.3.2-py2.7-macosx-10.5-x86_64.egg/pymc/gp/GPutils.pyc in observe(M, C, obs_mesh, obs_vals, obs_V, lintrans, cross_validate)
    281 
    282     # First observe C.
--> 283     relevant_slice, obs_mesh_new = C.observe(obs_mesh, obs_V, output_type='o')
    284 
    285     # Then observe M from C.

//anaconda/lib/python2.7/site-packages/pymc-2.3.2-py2.7-macosx-10.5-x86_64.egg/pymc/gp/Covariance.pyc in observe(self, obs_mesh, obs_V, output_type)
    337 
    338             if output_type != 's':
--> 339                 obs_dict = self.cholesky(obs_mesh, apply_pivot = False, nugget = obs_V, regularize=False, rank_limit = self.rank_limit)
    340             else:
    341                 C_eval = self.__call__(obs_mesh, obs_mesh, regularize=False)

//anaconda/lib/python2.7/site-packages/pymc-2.3.2-py2.7-macosx-10.5-x86_64.egg/pymc/gp/Covariance.pyc in cholesky(self, x, apply_pivot, observed, nugget, regularize, rank_limit)
    131         if rank_limit == 0:
    132             rank_limit = N_new
--> 133         U, m, piv = ichol(diag=diag, reltol=self.relative_precision, rowfun=rowfun, x=x, rl=min(rank_limit, N_new))
    134         U = asmatrix(U)
    135 

//anaconda/lib/python2.7/site-packages/pymc-2.3.2-py2.7-macosx-10.5-x86_64.egg/pymc/gp/Covariance.pyc in rowfun(i, xpiv, rowvec)
    123             A function that can be used to overwrite an input array with rows.
    124             """
--> 125             rowvec[i:] = self.__call__(x=xpiv[i-1,:].reshape((1, -1)), y=xpiv[i:,:], regularize=False, observed=observed)
    126 
    127 

//anaconda/lib/python2.7/site-packages/pymc-2.3.2-py2.7-macosx-10.5-x86_64.egg/pymc/gp/Covariance.pyc in __call__(self, x, y, observed, regularize, return_Uo_Cxo)
    550                     raise ValueError('The last dimension of x and y (the number of spatial dimensions) must be the same.')
    551 
--> 552                 C = self.eval_fun(x,y,**self.params)
    553 
    554                 # Update return value using observations.

//anaconda/lib/python2.7/site-packages/pymc-2.3.2-py2.7-macosx-10.5-x86_64.egg/pymc/gp/cov_funs/cov_utils.pyc in __call__(self, x, y, amp, scale, symm, *args, **kwargs)
    170 
    171         if n_threads <= 1:
--> 172             targ(C,x,y,0,-1,symm)
    173         else:
    174             thread_args = [(C,x,y,bounds[i],bounds[i+1],symm) for i in xrange(n_threads)]

//anaconda/lib/python2.7/site-packages/pymc-2.3.2-py2.7-macosx-10.5-x86_64.egg/pymc/gp/cov_funs/cov_utils.pyc in targ(C, x, y, cmin, cmax, symm, d_kwargs, c_args, c_kwargs)
    164             # Compute covariance for this bit
    165             if self.with_x:
--> 166                 self.cov_fun(C,x,y,cmin=cmin, cmax=cmax,symm=symm,*c_args,**c_kwargs)
    167             else:
    168                 self.cov_fun(C, cmin=cmin, cmax=cmax,symm=symm, *c_args, **c_kwargs)

TypeError: isotropic_cov_funs.gaussian() takes at most 4 arguments (6 given)

Did I miss documentation on how to write distance metrics? Adding the arguments *c_args, **c_kwargs to own_eucl() does not help.
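For what it's worth, a minimal pure-Python sketch of a distance function with the same in-place signature as own_eucl quoted above; the name frobenius_dist and the assumption that each row of x and y is a flattened matrix are purely illustrative, and whether such a function plugs cleanly into add_distance_metric is exactly the open question here.

import numpy as np

def frobenius_dist(D, x, y, cmin=0, cmax=-1, symm=False):
    # Fill the slice D[:, cmin:cmax] in place with Frobenius distances.
    # symm is accepted for signature compatibility but not exploited here.
    if cmax == -1:
        cmax = y.shape[0]
    for i in range(x.shape[0]):
        for j in range(cmin, cmax):
            # Each row is treated as a flattened matrix.
            D[i, j] = np.sqrt(np.sum((x[i] - y[j]) ** 2))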

MvNormal changes value of variable

This issue was migrated to pymc3 here. I'm copying the content in this repo. The issue is basically that we can't use Series or DataFrame in pymc. @fonnesbeck suggested to use the values attribute of a Series or DataFrame.


I asked about this in Stack Overflow here. It is probably a bug. I briefly describe it below:

I'm using a real estate dataset, but that shouldn't matter too much. This is the code to reproduce the issue:

import pandas as pd, numpy as np, pymc as pm
from scipy.spatial.distance import pdist, squareform
from numpy.linalg import inv
# reading the file
df = pd.read_table('http://pastebin.com/raw.php?i=41us4HVj', sep=' ')
# building a distance matrix with only 2 pairs of coordinates for simplicity
A = squareform(pdist(np.array(zip(df['Latitude'][:2], df['Longitude'][:2]))))
# creating a valid precision matrix
precision = pm.Lambda('exp', lambda u=A, tau=1, phi=1, kappa=1: inv((1/tau)*np.exp(-(phi*u)**kappa)))
# Simulating the result of some operation that produces a pd.Series object
mu = pm.Lambda('mean', lambda b=1: pd.Series(np.array([1.0, 2.0])) )
print(mu.value)
# Here MvNormal changes the value of mu
w = pm.MvNormal('w', mu, precision)
print(mu.value)

The important part is the pd.Series object in the mu variable. This is the output as a Series object (numbers 0 and 1 are indexes):

 0    1
 1    2
 dtype: float64
 0   -1.095453
 1    1.433737
 dtype: float64

If I remove the Series, I obtain the correct output:

 [ 1.  2.]
 [ 1.  2.]

I think it is important that this type of code works with pd.Series, because using DataFrames and Series is very convenient, as you can see in the question linked above (e.g. mu = pm.Lambda('mu', lambda b=beta: b[0] + b[1]*df['LivingArea']/1000.0 + b[2]*df['Age'])).
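A sketch of the workaround suggested above (taking .values so MvNormal receives a plain NumPy array and the Series itself is never touched); it reuses pm, pd, np and precision from the reproduction script:

# Hand MvNormal a NumPy array instead of a Series.
mu = pm.Lambda('mean', lambda b=1: pd.Series(np.array([1.0, 2.0])).values)
print(mu.value)
w = pm.MvNormal('w', mu, precision)
print(mu.value)  # expected to remain [ 1.  2.]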

Thanks!

Example implementation of LDA in PyMC

I have just started using PyMC for some NLP topic-modelling work and wanted to implement topic models using LDA. However, there is close to no documentation in PyMC, or on other websites, about how to do it. There is one suggested implementation posted at http://stats.stackexchange.com/questions/104771/latent-dirichlet-allocation-in-pymc which tries to build the LDA model. I have made a version of that and tested it on one of the standard corpora, but I am not sure how good the implementation is.

Since Bayesian models are frequently used in NLP and topic models are one of the building blocks, I would suggest the PyMC team add an LDA or other NLP-specific example to the PyMC code so that people from the NLP community can use it.

My implementation and some analysis on Inaugural speech corpus can be seen at: https://github.com/napsternxg/ipython-notebooks/blob/master/PyMC_LDA.ipynb

Copying and changing pymc models

Hi,

I'm writing a small extension to pymc for inference in DBNs that unrolls a DBN into a static graphical model so that it can be solved with pymc. To do this, it's helpful to copy Model instances and change their variables, but I'm not sure how to correctly copy models or how to modify their contents. For example, I'd like to copy a Model instance and change the names of its variables in place, as well as which nodes are parents, keeping everything else intact. E.g.:

import pymc
import copy
rain = pymc.Bernoulli("rain", 0.5)
model = pymc.Model([rain])
# This is not a copy of model!
new_model = model.value
# This is not a copy of model either
new_model = copy.copy(model.value)
new_rain = new_model.get_node("rain")
# This modified original model
new_rain.__name__ = "new_rain"
print "We modified original model: "
print model.variables
print "Before replacement: "
# How to modify the variable of a model?
for n in new_model.variables:
    print n, n.__name__
new_model.replace(None, "rain", new_model.get_node("rain"))
print "After replacement: "
for n in new_model.variables:
    print n, n.__name__

What is the best way to copy Model instances and alter them? Sorry if this is obvious, I couldn't find an example in the documentation and looking at ObjectContainer and related classes didn't reveal the answer. Thanks very much.
