Code Monkey home page Code Monkey logo

bhmm's Introduction

Build Status

Bayesian hidden Markov model toolkit

This toolkit provides machinery for sampling from the Bayesian posterior of hidden Markov models with various choices of prior and output models.

Installation

Installation from conda

The easiest way to install bhmm is via the conda package manager:

conda config --add channels conda-forge
conda install bhmm

Installation from source

python setup.py install

References

See here for a manuscript describing the theory behind using Gibbs sampling to sample from Bayesian hidden Markov model posteriors.

Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty. John D. Chodera, Phillip Elms, Frank Noé, Bettina Keller, Christian M. Kaiser, Aaron Ewall-Wice, Susan Marqusee, Carlos Bustamante, Nina Singhal Hinrichs http://arxiv.org/abs/1108.1430

Package maintainers

bhmm's People

Contributors

chayast avatar clonker avatar franknoe avatar jchodera avatar marscher avatar thempel avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

bhmm's Issues

new patch release

@marscher can you cut a new patch release (the last bugfix is quite important), and then fix the bhmm dependency in PyEMMA to that release tag?

Fix stochastic test failures

======================================================================
FAIL: test_discrete_4_2 (bhmm.tests.test_init_discrete.TestHMM)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/_test/lib/python3.5/site-packages/bhmm-0.6.1+22.g6f0de1e-py3.5-linux-x86_64.egg/bhmm/tests/test_init_discrete.py", line 85, in test_discrete_4_2
    assert(np.max(B-Bref) < 0.05 or np.max(B[[1, 0]]-Bref) < 0.05)
AssertionError
----------------------------------------------------------------------

Either set the random number seed to make this deterministic or use the statistical error and a 6-sigma tolerance in order to make these tests fail only when things are broken.

error initializing HMM

Hi,

I ran into an error when initializing HMMs with init_hmm:

import bhmm
observations = [np.array([1, 1])]
bhmm.init_hmm(observations, 2)

it gives the error:

[...]
  File "/Users/yarden/anaconda/lib/python2.7/site-packages/bhmm/init/discrete.py", line 195, in estimate_initial_hmm
    + str(active_nonseparate.size) + '-state MSM.')
NotImplementedError: Trying to initialize 2-state HMM from smaller 1-state MSM.

Although only one output state is observed in data, it's technically possible to (over)fit a 2-state HMM to the data, rather than force a 1-state HMM. (For example, fit a two-state HMM with a sparse prior on transition matrix, which would behave similarly to 1-state HMM.) Is there a way around this?

Thanks very much, Yarden

Overflow in doctest

@franknoe: Can you look at this?
https://travis-ci.org/bhmm/bhmm#L642-L644

Doctest: bhmm.estimators.bayesian_sampling.BayesianHMMSampler._sampleHiddenStateTrajectory ... /home/travis/miniconda/envs/_test/lib/python2.7/site-packages/bhmm-0.4.2+17.g5bcde9e-py2.7-linux-x86_64.egg/bhmm/_external/sklearn/mixture/gmm.py:579: RuntimeWarning: overflow encountered in square
  + np.dot(X ** 2, (1.0 / covars).T))
ok

Is this a real issue?

Automatic testing

This repository doesn't appear to have automatic test runs activated. I guess this is because the old bhmm package was renamed into bhmm-force-spectroscopy-manuscript, so the test configuration is probably there. I think we should have the bhmm tests here and instead only test-run the example scripts in bhmm-force-spectroscopy-manuscript. @jchodera or @marscher can either of you do that?

new release?

The current version (0.5.2) on PyPI is not installable, if Cython is not available. The fix for this is already in the repo, but I'd like to release soon.

@franknoe, @jchodera
Is there any progress on other issues which are worth waiting for or should I just release ASAP?

error trying to sample from discrete HMM

bhmm looks like a terrific and much needed library for the Python scientific community - thank you all for developing it.

I ran into an error while trying to sampler from a discrete HMM. This code:

nstates = 2
Tij = np.array([[0.8, 0.2], [0.5, 0.5]])
# note that if a non-array is passed to DiscreteOutputModel(), like in docs, this call will fail
# since the code looks for .shape attribute
output_model = DiscreteOutputModel(np.array([[0.5, 0.1, 0.4], [0.2, 0.3, 0.5]]))
model = HMM(Tij, output_model)
[observations, hidden_states] = \
  model.generate_synthetic_observation_trajectories(ntrajectories=5, length=1000)
bhmm_sampler = BHMM(observations, nstates)
models = bhmm_sampler.sample(nsamples=10)
print models

gives the error:

[...]
/Users/yarden/anaconda/lib/python2.7/site-packages/msmtools/analysis/api.pyc in stationary_distribution(T)
    340     # is this a transition matrix?
    341     if not is_transition_matrix(T):
--> 342         raise ValueError("Input matrix is not a transition matrix."
    343                          "Cannot compute stationary distribution")
    344     # is the stationary distribution unique?

ValueError: Input matrix is not a transition matrix.Cannot compute stationary distribution

The rows of my transition matrix sum to 1 as in the examples from docs, so I'm not sure why it didn't work. I'm probably missing something simple, since running sampler with bhmm.testsystems.dalton_model(nstates) instead of the discrete HMM I initialized works. Any pointers would be helpful.

I'm running Python 2.7.10 :: Anaconda 2.1.0 (x86_64) on OS X 10.11.3. I installed bhmm in a local virtualenv using pip install . in the github repository.

GMM warning

I think this is a new warning during the tests. @marscher , please look into it:

Doctest: bhmm.api.init_hmm ... /Users/noe/data/software_projects/bhmm/bhmm/_external/sklearn/mixture/gmm.py:579: RuntimeWarning: overflow encountered in square

  • np.dot(X ** 2, (1.0 / covars).T))

Problem with sample path

Hi guys,

think I found a bug in the sample path c implementation:

srand(time(NULL));

Whenever sample path is called, the random generator is seeded (which is fine, in principle), however the seeding that is used here takes the amount of seconds since 1970, i.e., if im correct sample path is completely deterministic within a single second. Since sampling procedure calls this method (on my machine) up to 150 times per second, this can bias the results.

HMM initialization is broken

from the current master:

In [10]: estimate_hmm([np.r_[np.random.randn(10) + 10, np.random.randn(10) + 11]], 2)
<... elided>
/usr/lib/python3.5/site-packages/msmtools/estimation/api.py in transition_matrix(C, reversible, mu, method, **kwargs)
    939         if mu is None:
    940             if sparse_computation:
--> 941                 T = sparse.mle_trev.mle_trev(C, **kwargs)
    942             else:
    943                 T = dense.mle_trev.mle_trev(C, **kwargs)

msmtools/estimation/sparse/mle_trev.pyx in msmtools.estimation.sparse.mle_trev.mle_trev (msmtools/estimation/sparse/mle_trev.c:1645)()

AssertionError: C must be strongly connected

Turns out the shipped version of sklearn.mixture.GMM fits very poorly this mixture of a gaussian of mean 10 and a gaussian of mean 11, returning instead

ipdb> p gmm.means_
array([[ 10.30914931],
       [  0.        ]])

Switching to the latest version of scikit-learn solves the issue. Any reason you don't want to simply rely on it as a dependency?

error in chi_square call

File "C:\projects\bhmm\bhmm\estimators\bayesian_sampling.py", line 161, in bhmm.estimators.bayesian_sampling.BayesianHMMSampler.sample
Failed example:
    samples = sampled_model.sample(nsamples, nburn=nburn, nthin=nthin)
Exception raised:
    Traceback (most recent call last):
      File "C:\Python27_32\lib\doctest.py", line 1315, in __run
        compileflags, 1) in test.globs
      File "<doctest bhmm.estimators.bayesian_sampling.BayesianHMMSampler.sample[5]>", line 1, in <module>
        samples = sampled_model.sample(nsamples, nburn=nburn, nthin=nthin)
      File "C:\projects\bhmm\bhmm\estimators\bayesian_sampling.py", line 168, in sample
        self._update()
      File "C:\projects\bhmm\bhmm\estimators\bayesian_sampling.py", line 196, in _update
        self._updateEmissionProbabilities()
      File "C:\projects\bhmm\bhmm\estimators\bayesian_sampling.py", line 258, in _updateEmissionProbabilities
        self.model.output_model._sample_output_model(observations_by_state)
      File "C:\projects\bhmm\bhmm\output_models\gaussian.py", line 417, in _sample_output_model
        chisquared = np.random.chisquare(nsamples_in_state-1)
      File "mtrand.pyx", line 2167, in mtrand.RandomState.chisquare (numpy\random\mtrand\mtrand.c:16888)
    ValueError: df <= 0

AttributeError: 'HMM' object has no attribute 'Pi'

This error is raised when I try to call estimator.stationary_probability.
Here's the traceback:

/Users/sternc1/anaconda/lib/python2.7/site-packages/bhmm-0.6.1+22.gcc92643-py2.7-macosx-10.6-x86_64.egg/bhmm/estimators/maximum_likelihood.pyc in stationary_probability(self)
    217         r""" Stationary probability, if the model is stationary """
    218         assert self._stationary, 'Estimator is not stationary'
--> 219         return self._hmm.Pi
    220 
    221     def _forward_backward(self, itraj):

AttributeError: 'HMM' object has no attribute 'Pi' 

hmm.Pi doesn't exist but hmm._Pi does.

I can open a PR to fix this by changing return self._hmm.Pi to return self._hmm._Pi

discrete sampling fails during dirichlet sampling with numpy-1.14

This used to work with prior numpy version (<1.14):

from pyemma import msm
dtrajs = [[0, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 0, 0]]  # two trajectories
mm = msm.bayesian_hidden_markov_model(dtrajs, 2, 2, show_progress=False)

output:

  File "/home/mi/marscher/workspace/pyemma/simple.py", line 4, in <module>
    mm = msm.bayesian_hidden_markov_model(dtrajs, 2, 2, show_progress=False)
  File "/home/mi/marscher/workspace/pyemma/pyemma/msm/api.py", line 1318, in bayesian_hidden_markov_model
    return bhmsm_estimator.estimate(dtrajs)
  File "/home/mi/marscher/workspace/pyemma/pyemma/_base/estimator.py", line 413, in estimate
    self._model = self._estimate(X)
  File "/home/mi/marscher/workspace/pyemma/pyemma/msm/estimators/bayesian_hmsm.py", line 288, in _estimate
    store_hidden=self.store_hidden, call_back=call_back)
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/api.py", line 467, in bayesian_hmm
    call_back=call_back)
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py", line 250, in sample
    self._update()
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py", line 270, in _update
    self._updateEmissionProbabilities()
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py", line 333, in _updateEmissionProbabilities
    self.model.output_model.sample(observations_by_state)
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/output_models/discrete.py", line 250, in sample
    self._output_probabilities[i, :] = dirichlet(self.prior[i] + count)
  File "mtrand.pyx", line 4753, in mtrand.RandomState.dirichlet
ValueError: alpha <= 0

Some elements of the resulting vector passed as alpha are indeed zero, but there is no insight to me, why this used to work in older numpy versions.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.