
bayesian-modelling-in-python's Introduction

Bayesian Modelling in Python

Welcome to "Bayesian Modelling in Python", a tutorial for those interested in learning how to apply Bayesian modelling techniques in Python with PyMC3. This tutorial does not aim to teach Bayesian statistics; it is a programming cookbook for those who understand the fundamentals of Bayesian statistics and want to learn how to build Bayesian models in Python. The tutorial sections and topics are listed below.

Contents

  • Introduction

    • Motivation for learning Bayesian statistics
    • Loading and parsing Hangout chat data
  • Section 1: Estimating model parameters

    • Frequentist technique for estimating parameters of a Poisson model (optimization routine)
    • Bayesian technique for estimating parameters of a Poisson model (MCMC)
  • Section 2: Model checking & comparison

    • Posterior predictive check
    • Bayes factor
  • Section 3: Hierarchical modeling

    • Model pooling (separate models)
    • Partial pooling (hierarchical models)
    • Shrinkage effect of partial pooling
  • Section 4: Bayesian regression

    • Bayesian fixed effects Poisson regression
    • Bayesian mixed effects Poisson regression
  • Section 5: Bayesian survival analysis

    • Survival model theory
    • Cox proportional hazard model
    • Aalen's additive hazard model
  • Section 6: Bayesian A/B tests

    • Bayesian test of proportions
    • Bayesian t-test (BEST)

Contributions

  • All contributions are more than welcome. They can be minor (spelling, better explanations, improved code/charts) or major (contribute a full section).
  • If you would like to contribute, please create a pull request in GitHub. Happy to discuss ideas before you begin working on the addition.
  • I would especially welcome any contributions that address: survival analysis, mixture models, time series models or A/B experiments.
  • If you're not familiar with GitHub - please email me at [email protected].

Motivation for learning Bayesian statistics

Statistics is a topic that never resonated with me throughout university. The frequentist techniques that we were taught (p-values etc.) felt contrived, and ultimately I turned my back on statistics as a topic that I wasn't interested in.

That was until I stumbled upon Bayesian statistics, a branch of statistics quite different from the traditional frequentist statistics that most universities teach. I was inspired by a number of publications, blogs and videos that I would highly recommend to anyone new to Bayesian statistics. They include:

I created this tutorial in the hope that others find it useful and it helps them learn Bayesian techniques just like the above resources helped me. I hope you find it useful and I'd welcome any corrections/comments/contributions from the community.

Note

This tutorial is actively being worked on. I'm keen to get feedback and welcome ideas/contributions.

bayesian-modelling-in-python's People

Contributors

twiecki


bayesian-modelling-in-python's Issues

Section 2 fails when trying to sample the model

I am using pymc3 version 3.0 together with anaconda 4.0.0 & python 2.7 and theano 0.8.2.
In the section 2 notebook:

with pm.Model() as model:
    mu = pm.Uniform('mu', lower=0, upper=100)
    y_est = pm.Poisson('y_est', mu=mu, observed=messages['time_delay_seconds'].values)

    y_pred = pm.Poisson('y_pred', mu=mu)

    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(50000, step, start=start, progressbar=True)

Leaving out the y_pred = ... statement removes the crash but also defeats the purpose of the example.

Since I'm still learning pymc3, I currently have no idea where to start.
Something goes wrong in theano (stack trace):

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-76dad9c91fda> in <module>()
      7     start = pm.find_MAP()
      8     step = pm.Metropolis()
----> 9     trace = pm.sample(50000, step, start=start, progressbar=True)

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\sampling.pyc in sample(draws, step, start, trace, chain, njobs, tune, progressbar, model, random_seed)
    148         sample_func = _sample
    149 
--> 150     return sample_func(**sample_args)
    151 
    152 

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\sampling.pyc in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed)
    157     progress = progress_bar(draws)
    158     try:
--> 159         for i, strace in enumerate(sampling):
    160             if progressbar:
    161                 progress.update(i)

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\sampling.pyc in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    239         if i == tune:
    240             step = stop_tuning(step)
--> 241         point = step.step(point)
    242         strace.record(point)
    243         yield strace

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\step_methods\compound.pyc in step(self, point)
     12     def step(self, point):
     13         for method in self.methods:
---> 14             point = method.step(point)
     15         return point

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\step_methods\arraystep.pyc in step(self, point)
    116         bij = DictToArrayBijection(self.ordering, point)
    117 
--> 118         apoint = self.astep(bij.map(point))
    119         return bij.rmap(apoint)
    120 

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\step_methods\metropolis.pyc in astep(self, q0)
    123             q = q0 + delta
    124 
--> 125         q_new = metrop_select(self.delta_logp(q, q0), q, q0)
    126 
    127         if q_new is q:

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\theano\compile\function_module.pyc in __call__(self, *args, **kwargs)
    869                     node=self.fn.nodes[self.fn.position_of_error],
    870                     thunk=thunk,
--> 871                     storage_map=getattr(self.fn, 'storage_map', None))
    872             else:
    873                 # old-style linkers raise their own exceptions

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\theano\gof\link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
    312         # extra long error message in that case.
    313         pass
--> 314     reraise(exc_type, exc_value, exc_trace)
    315 
    316 

C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\theano\compile\function_module.pyc in __call__(self, *args, **kwargs)
    857         t0_fn = time.time()
    858         try:
--> 859             outputs = self.fn()
    860         except Exception:
    861             if hasattr(self.fn, 'position_of_error'):
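
As a point of reference for what y_pred is meant to provide, here is a minimal NumPy-only sketch of a posterior predictive draw for a Poisson model. It does not diagnose or fix the theano error above. The function name is hypothetical, and mu_samples is made-up data standing in for the mu values a sampler would record in the trace.

```python
import numpy as np

def posterior_predictive_poisson(mu_samples, rng):
    """Draw one simulated observation per posterior sample of mu."""
    mu_samples = np.asarray(mu_samples, dtype=float)
    # Each posterior draw of mu yields one Poisson draw, so the result
    # reflects both parameter uncertainty and Poisson sampling noise.
    return rng.poisson(lam=mu_samples)

rng = np.random.default_rng(0)
# Hypothetical posterior draws of mu (a stand-in for trace['mu']).
mu_samples = rng.normal(loc=18.0, scale=0.5, size=5000)
y_pred = posterior_predictive_poisson(mu_samples, rng)
print(y_pred.mean(), y_pred.var())
```

Comparing a histogram of draws like these with the observed time_delay_seconds values is the usual idea behind a posterior predictive check.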

Section II Check2: Bayes Factor fails

Running Windows 10 x64, Anaconda, Python 3.5, theano 0.9.0dev4, pymc3 3.0rc4 (updated 10-Dec-2016).
I have confirmed that pm.switch(tau, ...) is not found in this version.
Perhaps an alternative syntax is now used.
The code section and error output follow.
franc

with pm.Model() as model:

    # Index to true model
    prior_model_prob = 0.5
    #tau = pm.DiscreteUniform('tau', lower=0, upper=1)
    tau = pm.Bernoulli('tau', prior_model_prob)

    # Poisson parameters
    mu_p = pm.Uniform('mu_p', 0, 60)

    # Negative Binomial parameters
    alpha = pm.Exponential('alpha', lam=0.2)
    mu_nb = pm.Uniform('mu_nb', lower=0, upper=60)

    y_like = pm.DensityDist('y_like',
             lambda value: pm.switch(tau,
                 pm.Poisson.dist(mu_p).logp(value),
                 pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
             ),
             observed=messages['time_delay_seconds'].values)

    start = pm.find_MAP()
    step1 = pm.Metropolis([mu_p, alpha, mu_nb])
    step2 = pm.ElemwiseCategorical(vars=[tau], values=[0,1])
    trace = pm.sample(200000, step=[step1, step2], start=start)

_ = pm.traceplot(trace[burnin:], varnames=['tau'])


AttributeError Traceback (most recent call last)
in ()
18 pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
19 ),
---> 20 observed=messages['time_delay_seconds'].values)
21
22 start = pm.find_MAP()

C:\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
29 data = kwargs.pop('observed', None)
30 dist = cls.dist(*args, **kwargs)
---> 31 return model.Var(name, dist, data)
32 else:
33 raise TypeError("Name needs to be a string but got: %s" % name)

C:\Anaconda3\lib\site-packages\pymc3\model.py in Var(self, name, dist, data)
301 else:
302 var = ObservedRV(name=name, data=data,
--> 303 distribution=dist, model=self)
304 self.observed_RVs.append(var)
305 if var.missing_values:

C:\Anaconda3\lib\site-packages\pymc3\model.py in __init__(self, type, owner, index, name, data, distribution, model)
584 self.missing_values = data.missing_values
585
--> 586 self.logp_elemwiset = distribution.logp(data)
587 self.model = model
588 self.distribution = distribution

in (value)
14
15 y_like = pm.DensityDist('y_like',
---> 16 lambda value: pm.switch(tau,
17 pm.Poisson.dist(mu_p).logp(value),
18 pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)

AttributeError: module 'pymc3' has no attribute 'switch'

Add Poisson regression example

Add Poisson regression to the tutorial, with covariates such as day of week, time of day, number of people in the conversation, etc.
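
To make the request concrete, here is a rough NumPy-only sketch of the likelihood such a regression would use, assuming a log link. The covariates, coefficients and data are all made up for illustration, and the function name is hypothetical. It only evaluates a log-likelihood and does not fit anything with PyMC3.

```python
import math
import numpy as np

def poisson_regression_logp(beta, X, y):
    """Log-likelihood of counts y under a log-link Poisson regression."""
    mu = np.exp(X @ beta)  # the log link keeps the Poisson rate positive
    log_factorial = np.vectorize(math.lgamma)(y + 1.0)  # log(y!)
    return float(np.sum(y * np.log(mu) - mu - log_factorial))

rng = np.random.default_rng(1)
n = 500
# Hypothetical covariates: a weekend indicator and the number of people in the chat.
is_weekend = rng.integers(0, 2, size=n)
n_people = rng.integers(2, 6, size=n)
X = np.column_stack([np.ones(n), is_weekend, n_people])

true_beta = np.array([2.0, 0.5, 0.1])  # made-up coefficients used to simulate data
y = rng.poisson(np.exp(X @ true_beta))

logp_true = poisson_regression_logp(true_beta, X, y)
logp_zero = poisson_regression_logp(np.zeros(3), X, y)
print(logp_true, logp_zero)
```

In a Bayesian version, priors would be placed on the entries of beta and this likelihood would be handed to the sampler.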

Asking questions of the posterior predictive distribution

Add examples of questions that can be asked of the posterior predictive distribution for the hierarchical negative binomial model, such as:

  • What is the probability that I will respond to David in less than 20 seconds?
  • Who are the people I am most likely to respond to?
  • etc.
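
Questions like these reduce to simple summaries once posterior predictive samples exist. The sketch below assumes one array of simulated response times (in seconds) per person. The samples, the second name and the helper function are hypothetical stand-ins, not output from the tutorial's model, and ranking people by the chance of a quick reply is only one possible reading of the second question.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior predictive draws of response time in seconds,
# one array per person (stand-ins for samples from a fitted model).
ppc = {
    "David": rng.negative_binomial(n=2, p=0.10, size=10000),
    "Alice": rng.negative_binomial(n=2, p=0.05, size=10000),
}

def prob_response_under(samples, seconds):
    """Fraction of predictive draws strictly below the threshold."""
    return float(np.mean(np.asarray(samples) < seconds))

probs = {name: prob_response_under(s, 20) for name, s in ppc.items()}
# Rank people by how likely a sub-20-second response is.
ranking = sorted(probs, key=probs.get, reverse=True)
print(probs, ranking)
```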

varnames

In the current pymc3, pm.traceplot expects varnames=, not vars=.

Section 2 pymc3 error

Section 2 - Model Check II - Bayes Factor

Model fails

14 
  15     y_like = pm.DensityDist('y_like',
---> 16              lambda value: pm.switch(tau, 
  17                  pm.Poisson.dist(mu_p).logp(value),
  18                  pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)

AttributeError: module 'pymc3' has no attribute 'switch'

Found a fix suggested by Cameron Davidson-Pilon, which looks like it was intended to be merged into your code but apparently wasn't. The suggested revision to the lambda works fine.

  y_like = pm.DensityDist('y_like',
               lambda value: pm.math.switch(tau, 
                   pm.Poisson.dist(mu_p).logp(value),
                   pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
               ),
               observed=messages['time_delay_seconds'].values)
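
For readers unsure what the switch does here, the following NumPy-only sketch mimics it: where tau is nonzero the first branch (the Poisson log-probability) is used, otherwise the second (negative binomial). It also shows one common way to turn samples of such an indicator into a Bayes factor, namely posterior odds divided by prior odds. Both helper functions and the tau_samples values are hypothetical and only illustrate the idea.

```python
import numpy as np

def switch(cond, if_true, if_false):
    """NumPy analogue of an elementwise switch: if_true where cond is nonzero."""
    return np.where(cond, if_true, if_false)

def bayes_factor_from_indicator(tau_samples, prior_model_prob):
    """Posterior odds of tau=1 versus tau=0, divided by the prior odds."""
    p_post = float(np.mean(tau_samples))
    posterior_odds = p_post / (1.0 - p_post)
    prior_odds = prior_model_prob / (1.0 - prior_model_prob)
    return posterior_odds / prior_odds

# tau = 1 picks the first branch, tau = 0 the second.
selected = switch(np.array([1, 0]), np.array([-3.0, -3.0]), np.array([-5.0, -5.0]))
print(selected)

# Hypothetical trace of the indicator: 2 of 10 draws chose tau = 1.
tau_samples = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0])
bf = bayes_factor_from_indicator(tau_samples, prior_model_prob=0.5)
print(bf)
```

Note that the ratio is undefined if every sample of the indicator takes the same value, which can happen when one model is strongly favoured or the chain is short.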
