bhmm / bhmm Goto Github PK

Bayesian hidden Markov models toolkit

License: GNU Lesser General Public License v3.0

Python 95.36% C 2.84% Shell 0.48% HTML 0.02% Batchfile 0.47% C++ 0.84%

baum-welch-algorithm bayesian hidden-markov-models hmm hmm-viterbi-algorithm

bhmm's Introduction

Bayesian hidden Markov model toolkit

This toolkit provides machinery for sampling from the Bayesian posterior of hidden Markov models with various choices of prior and output models.

Installation

Installation from conda

The easiest way to install bhmm is via the conda package manager:

conda config --add channels conda-forge
conda install bhmm

Installation from source

python setup.py install

References

See here for a manuscript describing the theory behind using Gibbs sampling to sample from Bayesian hidden Markov model posteriors.

Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty. John D. Chodera, Phillip Elms, Frank Noé, Bettina Keller, Christian M. Kaiser, Aaron Ewall-Wice, Susan Marqusee, Carlos Bustamante, Nina Singhal Hinrichs http://arxiv.org/abs/1108.1430

Package maintainers

Frank Noé [email protected], Freie Universität Berlin
Martin K. Scherer [email protected], Freie Universität Berlin
John D. Chodera [email protected], Sloan Kettering Institute

bhmm's People

Contributors

Stargazers

Watchers

Forkers

jchodera marscher yarden chayast greentara thempel hal2001 nd1511 bigtigr ulrikeboehm ongbe liuchunguo tomv73 clonker

bhmm's Issues

new patch release

@marscher can you cut a new patch release (the last bugfix is quite important), and then fix the bhmm dependency in PyEMMA to that release tag?

SampledHMM object has no attribute _stationary

The if clause in sationary_distribution_samples(self) relies on the self._stationary attribute:
https://github.com/bhmm/bhmm/blob/master/bhmm/hmm/generic_sampled_hmm.py#L95

But neither HMM object not the SampledHMM object has this attribute. Is is supposed to be is_stationary?

Fix stochastic test failures

======================================================================
FAIL: test_discrete_4_2 (bhmm.tests.test_init_discrete.TestHMM)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/_test/lib/python3.5/site-packages/bhmm-0.6.1+22.g6f0de1e-py3.5-linux-x86_64.egg/bhmm/tests/test_init_discrete.py", line 85, in test_discrete_4_2
    assert(np.max(B-Bref) < 0.05 or np.max(B[[1, 0]]-Bref) < 0.05)
AssertionError
----------------------------------------------------------------------

Either set the random number seed to make this deterministic or use the statistical error and a 6-sigma tolerance in order to make these tests fail only when things are broken.

error initializing HMM

Hi,

I ran into an error when initializing HMMs with init_hmm:

import bhmm
observations = [np.array([1, 1])]
bhmm.init_hmm(observations, 2)

it gives the error:

[...]
  File "/Users/yarden/anaconda/lib/python2.7/site-packages/bhmm/init/discrete.py", line 195, in estimate_initial_hmm
    + str(active_nonseparate.size) + '-state MSM.')
NotImplementedError: Trying to initialize 2-state HMM from smaller 1-state MSM.

Although only one output state is observed in data, it's technically possible to (over)fit a 2-state HMM to the data, rather than force a 1-state HMM. (For example, fit a two-state HMM with a sparse prior on transition matrix, which would behave similarly to 1-state HMM.) Is there a way around this?

Thanks very much, Yarden

Overflow in doctest

@franknoe: Can you look at this?
https://travis-ci.org/bhmm/bhmm#L642-L644

Doctest: bhmm.estimators.bayesian_sampling.BayesianHMMSampler._sampleHiddenStateTrajectory ... /home/travis/miniconda/envs/_test/lib/python2.7/site-packages/bhmm-0.4.2+17.g5bcde9e-py2.7-linux-x86_64.egg/bhmm/_external/sklearn/mixture/gmm.py:579: RuntimeWarning: overflow encountered in square
  + np.dot(X ** 2, (1.0 / covars).T))
ok

Is this a real issue?

compilation fails with visual c++

./bhmm/output_models/impl_c/_gaussian.c(17) : error C2065: 'M_PI' : undeclared identifier

https://stackoverflow.com/questions/806145/how-do-i-access-math-constants-eg-m-pi-in-visual-c-2008

Real value is not within confidence interval for synthetic three stat model

I ran the synthetic model example in the singlemoleculetools repository using the generate-figure.py script. Some of real values are not within the confidence interval. Here's the table:

https://github.com/ChayaSt/singlemoleculetools/blob/master/examples/synthetic-three-state-model/synthetic-three-state-model-bhmm-statistics.tex

Automatic testing

This repository doesn't appear to have automatic test runs activated. I guess this is because the old bhmm package was renamed into bhmm-force-spectroscopy-manuscript, so the test configuration is probably there. I think we should have the bhmm tests here and instead only test-run the example scripts in bhmm-force-spectroscopy-manuscript. @jchodera or @marscher can either of you do that?

Docs hosting?

Where should we host the docs for this toolkit?

Would we be OK with pythonhosted or readthedocs? Or do we want to register a domain name and put the docs there?

new release?

The current version (0.5.2) on PyPI is not installable, if Cython is not available. The fix for this is already in the repo, but I'd like to release soon.

@franknoe, @jchodera
Is there any progress on other issues which are worth waiting for or should I just release ASAP?

error trying to sample from discrete HMM

bhmm looks like a terrific and much needed library for the Python scientific community - thank you all for developing it.

I ran into an error while trying to sampler from a discrete HMM. This code:

nstates = 2
Tij = np.array([[0.8, 0.2], [0.5, 0.5]])
# note that if a non-array is passed to DiscreteOutputModel(), like in docs, this call will fail
# since the code looks for .shape attribute
output_model = DiscreteOutputModel(np.array([[0.5, 0.1, 0.4], [0.2, 0.3, 0.5]]))
model = HMM(Tij, output_model)
[observations, hidden_states] = \
  model.generate_synthetic_observation_trajectories(ntrajectories=5, length=1000)
bhmm_sampler = BHMM(observations, nstates)
models = bhmm_sampler.sample(nsamples=10)
print models

gives the error:

[...]
/Users/yarden/anaconda/lib/python2.7/site-packages/msmtools/analysis/api.pyc in stationary_distribution(T)
    340     # is this a transition matrix?
    341     if not is_transition_matrix(T):
--> 342         raise ValueError("Input matrix is not a transition matrix."
    343                          "Cannot compute stationary distribution")
    344     # is the stationary distribution unique?

ValueError: Input matrix is not a transition matrix.Cannot compute stationary distribution

The rows of my transition matrix sum to 1 as in the examples from docs, so I'm not sure why it didn't work. I'm probably missing something simple, since running sampler with bhmm.testsystems.dalton_model(nstates) instead of the discrete HMM I initialized works. Any pointers would be helpful.

I'm running Python 2.7.10 :: Anaconda 2.1.0 (x86_64) on OS X 10.11.3. I installed bhmm in a local virtualenv using pip install . in the github repository.

GMM warning

I think this is a new warning during the tests. @marscher , please look into it:

Doctest: bhmm.api.init_hmm ... /Users/noe/data/software_projects/bhmm/bhmm/_external/sklearn/mixture/gmm.py:579: RuntimeWarning: overflow encountered in square

np.dot(X ** 2, (1.0 / covars).T))

Problem with sample path

Hi guys,

think I found a bug in the sample path c implementation:

bhmm/bhmm/hidden/impl_c/_hidden.c

Line 338 in 9804d18

srand(time(NULL));

Whenever sample path is called, the random generator is seeded (which is fine, in principle), however the seeding that is used here takes the amount of seconds since 1970, i.e., if im correct sample path is completely deterministic within a single second. Since sampling procedure calls this method (on my machine) up to 150 times per second, this can bias the results.

Travis failurest

HMM initialization is broken

from the current master:

In [10]: estimate_hmm([np.r_[np.random.randn(10) + 10, np.random.randn(10) + 11]], 2)
<... elided>
/usr/lib/python3.5/site-packages/msmtools/estimation/api.py in transition_matrix(C, reversible, mu, method, **kwargs)
    939         if mu is None:
    940             if sparse_computation:
--> 941                 T = sparse.mle_trev.mle_trev(C, **kwargs)
    942             else:
    943                 T = dense.mle_trev.mle_trev(C, **kwargs)

msmtools/estimation/sparse/mle_trev.pyx in msmtools.estimation.sparse.mle_trev.mle_trev (msmtools/estimation/sparse/mle_trev.c:1645)()

AssertionError: C must be strongly connected

Turns out the shipped version of sklearn.mixture.GMM fits very poorly this mixture of a gaussian of mean 10 and a gaussian of mean 11, returning instead

ipdb> p gmm.means_
array([[ 10.30914931],
       [  0.        ]])

Switching to the latest version of scikit-learn solves the issue. Any reason you don't want to simply rely on it as a dependency?

bhmm-dev package on binstar

I've made the necessary changes to have travis build development packages and push them to binstar:
https://binstar.org/omnia/bhmm-dev/files

Should be building now.

error in chi_square call

File "C:\projects\bhmm\bhmm\estimators\bayesian_sampling.py", line 161, in bhmm.estimators.bayesian_sampling.BayesianHMMSampler.sample
Failed example:
    samples = sampled_model.sample(nsamples, nburn=nburn, nthin=nthin)
Exception raised:
    Traceback (most recent call last):
      File "C:\Python27_32\lib\doctest.py", line 1315, in __run
        compileflags, 1) in test.globs
      File "<doctest bhmm.estimators.bayesian_sampling.BayesianHMMSampler.sample[5]>", line 1, in <module>
        samples = sampled_model.sample(nsamples, nburn=nburn, nthin=nthin)
      File "C:\projects\bhmm\bhmm\estimators\bayesian_sampling.py", line 168, in sample
        self._update()
      File "C:\projects\bhmm\bhmm\estimators\bayesian_sampling.py", line 196, in _update
        self._updateEmissionProbabilities()
      File "C:\projects\bhmm\bhmm\estimators\bayesian_sampling.py", line 258, in _updateEmissionProbabilities
        self.model.output_model._sample_output_model(observations_by_state)
      File "C:\projects\bhmm\bhmm\output_models\gaussian.py", line 417, in _sample_output_model
        chisquared = np.random.chisquare(nsamples_in_state-1)
      File "mtrand.pyx", line 2167, in mtrand.RandomState.chisquare (numpy\random\mtrand\mtrand.c:16888)
    ValueError: df <= 0

new release for updated dependencies?

Since 0.4.2 we have removed the dependencies for scikit-learn and pyemma. Shall we trigger a new release?

AttributeError: 'HMM' object has no attribute 'Pi'

This error is raised when I try to call estimator.stationary_probability.
Here's the traceback:

/Users/sternc1/anaconda/lib/python2.7/site-packages/bhmm-0.6.1+22.gcc92643-py2.7-macosx-10.6-x86_64.egg/bhmm/estimators/maximum_likelihood.pyc in stationary_probability(self)
    217         r""" Stationary probability, if the model is stationary """
    218         assert self._stationary, 'Estimator is not stationary'
--> 219         return self._hmm.Pi
    220 
    221     def _forward_backward(self, itraj):

AttributeError: 'HMM' object has no attribute 'Pi'

hmm.Pi doesn't exist but hmm._Pi does.

I can open a PR to fix this by changing return self._hmm.Pi to return self._hmm._Pi

discrete sampling fails during dirichlet sampling with numpy-1.14

This used to work with prior numpy version (<1.14):

from pyemma import msm
dtrajs = [[0, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 0, 0]]  # two trajectories
mm = msm.bayesian_hidden_markov_model(dtrajs, 2, 2, show_progress=False)

output:

  File "/home/mi/marscher/workspace/pyemma/simple.py", line 4, in <module>
    mm = msm.bayesian_hidden_markov_model(dtrajs, 2, 2, show_progress=False)
  File "/home/mi/marscher/workspace/pyemma/pyemma/msm/api.py", line 1318, in bayesian_hidden_markov_model
    return bhmsm_estimator.estimate(dtrajs)
  File "/home/mi/marscher/workspace/pyemma/pyemma/_base/estimator.py", line 413, in estimate
    self._model = self._estimate(X)
  File "/home/mi/marscher/workspace/pyemma/pyemma/msm/estimators/bayesian_hmsm.py", line 288, in _estimate
    store_hidden=self.store_hidden, call_back=call_back)
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/api.py", line 467, in bayesian_hmm
    call_back=call_back)
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py", line 250, in sample
    self._update()
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py", line 270, in _update
    self._updateEmissionProbabilities()
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py", line 333, in _updateEmissionProbabilities
    self.model.output_model.sample(observations_by_state)
  File "/home/mi/marscher/miniconda/lib/python3.6/site-packages/bhmm/output_models/discrete.py", line 250, in sample
    self._output_probabilities[i, :] = dirichlet(self.prior[i] + count)
  File "mtrand.pyx", line 4753, in mtrand.RandomState.dirichlet
ValueError: alpha <= 0

Some elements of the resulting vector passed as alpha are indeed zero, but there is no insight to me, why this used to work in older numpy versions.