hmmlearn / hmmlearn

Hidden Markov Models in Python, with scikit-learn like API

Home Page: http://hmmlearn.readthedocs.org

License: BSD 3-Clause "New" or "Revised" License

Python 96.13% C++ 3.87%

hmmlearn's Introduction

hmmlearn

hmmlearn is a set of algorithms for unsupervised learning and inference of Hidden Markov Models. For supervised learning of HMMs and similar models, see seqlearn.

Note: This package is under limited-maintenance mode.

Important links

Official source code repo: https://github.com/hmmlearn/hmmlearn
HTML documentation (stable release): https://hmmlearn.readthedocs.org/en/stable
HTML documentation (development version): https://hmmlearn.readthedocs.org/en/latest

Dependencies

The required dependencies to use hmmlearn are

Python >= 3.6
NumPy >= 1.10
scikit-learn >= 0.16

You also need Matplotlib >= 1.1.1 to run the examples and pytest >= 2.6.0 to run the tests.

Installation

Requires a C compiler and Python headers.

To install from PyPI:

pip install --upgrade --user hmmlearn

To install from the repo:

pip install --user git+https://github.com/hmmlearn/hmmlearn

hmmlearn's Issues

missing no_image.png in the doc/image dir

Setting emission probabilities and number of symbols in MultinomialHMM

I am trying to create a multinomial HMM with 3 states and 2 symbols. Apparently, it is not possible to pass n_symbols nor emissionprob to the constructor. I can use _set_emissionprob after creating the instance but I wonder if there is a more sanctioned way to do it.

n_symbols = 2
n_components = 3
startprob = np.array([0.6, 0.3, 0.1])
transmat = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.3, 0.3, 0.4]])
emissionprob  = np.array([[0.1,0.9], [0.5,0.5],[0.9,0.1]])
#this returns an error
model = hmm.MultinomialHMM(n_components=n_components,n_symbols=n_symbols,startprob=startprob,transmat=transmat,emissionprob=emissionprob)
#this works
model = hmm.MultinomialHMM(n_components=n_components,startprob=startprob,transmat=transmat)
model._set_emissionprob(emissionprob)

Error when running sample stock prediction code

Hi everyone,
When running the sample code for the stock prediction (found here: https://github.com/hmmlearn/hmmlearn/blob/master/examples/plot_hmm_stock_analysis.py)

I got the following error:
"ValueError: Rows of transmat must sum to 1.0"

After replacing the import with the old sklearn version the code worked perfectly for me.

replace
from hmmlearn.hmm import GaussianHMM
with
from sklearn.hmm import GaussianHMM

Running on Windows 7, 64-bit, with Python 2.7 and the 0.15 sklearn release. Is it me or is it the code?

Hope that helps!

Error running stock example

I downloaded and ran the stock example, without any modifications, and received an error.

python plot_hmm_stock_analysis.py 
/usr/lib/python2.7/site-packages/pkg_resources.py:1054: UserWarning: /home/lila/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)

==========================
Gaussian HMM of stock data
==========================

This script shows how to use Gaussian HMM.
It uses stock price data, which can be obtained from yahoo finance.
For more information on how to get stock prices with matplotlib, please refer
to date_demo1.py of matplotlib.


fitting to HMM and decoding ...Traceback (most recent call last):
  File "plot_hmm_stock_analysis.py", line 57, in <module>
    model.fit([X])
  File "build/bdist.linux-x86_64/egg/hmmlearn/base.py", line 400, in fit
  File "/usr/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 373, in check_array
    array.ndim)
ValueError: Found array with dim 3. Expected <= 2
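
The traceback above comes from passing a list of sequences, `model.fit([X])`, to a version whose `fit` validates a single 2-D array. Later hmmlearn releases document `fit(X, lengths=None)`, where `X` stacks all sequences row-wise and `lengths` records the length of each one. A minimal sketch of that conversion in plain Python follows; the helper name `stack_sequences` is hypothetical and not part of hmmlearn.

```python
def stack_sequences(sequences):
    """Concatenate a list of sequences into one list of rows plus their lengths.

    Each sequence is a list of observations; each observation is a list of
    feature values. The helper name is illustrative, not part of hmmlearn.
    """
    stacked = [row for seq in sequences for row in seq]
    lengths = [len(seq) for seq in sequences]
    return stacked, lengths


# Two toy sequences with two features per observation.
seq_a = [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9]]
seq_b = [[5.0, 2.0], [5.1, 2.2]]

X, lengths = stack_sequences([seq_a, seq_b])
print(len(X), lengths)
```

With a single sequence, as in the stock example, this reduces to calling `model.fit(X)` rather than `model.fit([X])`.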

Python 3.5 AppVeyor integration

Since Python 3.5 has been released and marked as 'stable', I suggest adding rules for it in AppVeyor.

Clarification on log-likelihood

I'm fairly new to hmms, so forgive me if I sound ignorant.

Essentially, I'm creating input features by sampling from 3 normal distributions, all centred at 5.0, with a standard deviation of 0.0000001. I train the HMM using these observations, and then use the sample method to generate samples from my HMM. Then I use the score method to get an idea of the probability of this sequence in the model.

Here's the code I am using:

import math
import numpy as np
from hmmlearn import hmm

m, s = 5.0,0.0000001

g1 = np.random.normal(m, s, 1000)
g2 = np.random.normal(m, s, 1000)
g3 = np.random.normal(m, s, 1000)

X = np.column_stack([g1, g2, g3])
HMM = hmm.GaussianHMM(n_components=3, n_iter=1000)
HMM.fit(X)

for i in range(0,100):
    sample, _ = HMM.sample(1)
    print(sample)
    prob = HMM.score(sample)
    print('log-like: ', prob, '   prob: ', math.exp(prob))

I'm getting output that looks something like this:

[[ 5.00143565  4.9991442   4.99118425]]
log-like:  11.5169454308    prob:  100402.805676
[[ 4.99956099  4.98920739  5.00099388]]
log-like:  10.8780507602    prob:  53000.1890873
[[ 5.01013709  5.00566525  4.99596563]]
log-like:  10.3069876345    prob:  29941.1070975
[[ 5.00540697  4.99297579  5.00257765]]
log-like:  11.4357086287    prob:  92568.9108846
[[ 4.99881746  4.99839775  5.0034536 ]]
log-like:  12.6393340166    prob:  308455.85503

Am I interpreting this log-likelihood correctly? I assume that the value returned from score is ln(p), where p is the probability I'm looking for, so e^ln(p) should be equal to p, which is what I am attempting to compute in the code.

However, the numbers I am getting are rather strange. I would expect them to be between 0.0 and 1.0, but they tend to be huge. Can anyone shed some light on why this is?

Thanks.
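
A likely explanation, sketched under the assumption that `score` returns the log of a probability density when emissions are continuous (as with Gaussian models): a density is not bounded by 1, so its logarithm can be positive when the variance is small. The one-dimensional example below uses only the standard library, and the function name `normal_logpdf` is illustrative.

```python
import math


def normal_logpdf(x, mean, std):
    """Log of the normal probability density at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


# A narrow Gaussian: the density at the mean is 1 / (std * sqrt(2 * pi)),
# which exceeds 1 whenever std < 1 / sqrt(2 * pi) (about 0.399).
log_density = normal_logpdf(5.0, mean=5.0, std=0.01)
density = math.exp(log_density)

print(log_density > 0.0)  # the log-density is positive
print(density > 1.0)      # the density exceeds 1, yet it still integrates to 1
```

Exponentiating such a score therefore gives a density value, not a probability between 0 and 1.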

HMM isn't converging

I'm trying to fit a GMMHMM with 3 components to a series of numbers ("dirIndices"). Here's an example of the first few lines of dirIndices:

0
0.5
1.66666666667
-0.0833333333333
0.1
-0.75
0
-0.125
0.5
0.736363636364
0.242424242424
0.166666666667
0
0
11.2015503876
57.9074588477

Here's my code:

model = GMMHMM(n_components=3, covariance_type="diag", n_iter=1000).fit(dirIndices)
domain_states = model.predict(dirIndices)
logprob, posteriors = model.score_samples(dirIndices)

For every line, the posterior probabilities are:
[ 0.33333333 0.33333333 0.33333333]

So it's clear that the HMM didn't converge. In general, what problems with data can cause an HMM to not converge?

Please clarify the format of priors

GaussianHMM provides the startprob_prior and transmat_prior kwargs, but it isn't clear to me how I'd use them to specify that (in a two state model) I have a prior e.g. of a_11,a_12~Dirichlet(0.5) or something else.

Also, is it possible to specify priors for the means and variances?

Thanks.
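
One way to read these kwargs, assuming they hold Dirichlet concentration parameters (one value per start state, and one row of values per transition row): the M-step then becomes a MAP estimate in which `prior - 1` is added to the expected counts before normalizing. The sketch below shows that arithmetic for a single transition row in plain Python. The function name and the clipping of negative values at zero are assumptions made for illustration, not a statement of what any particular hmmlearn version does.

```python
def map_transition_row(expected_counts, dirichlet_alpha):
    """MAP estimate of one transition-matrix row under a Dirichlet prior.

    expected_counts: expected number of transitions out of one state.
    dirichlet_alpha: concentration parameters, one per destination state.
    """
    # Posterior mode when every adjusted count is positive:
    # (count + alpha - 1), normalized. Negative values are clipped (assumption).
    raw = [max(c + a - 1.0, 0.0) for c, a in zip(expected_counts, dirichlet_alpha)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all adjusted counts are zero; the row is undefined")
    return [r / total for r in raw]


# Two-state model, row for state 1, with a Dirichlet(0.5, 0.5) prior.
counts = [8.0, 2.0]
row = map_transition_row(counts, [0.5, 0.5])
print(row)

# With alpha = 1 the prior is flat and the result is the maximum-likelihood row.
flat = map_transition_row(counts, [1.0, 1.0])
print(flat)
```

Values of alpha below 1 subtract from the counts and so favour sparse rows; values above 1 act as pseudo-counts that smooth the row.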

Misc complaints

scikit-learn/scikit-learn#1862

The function normalize returns A / Asum, should do A /= Asum; return A.
The convergence check in BaseHMM.fit should probably not use abs().
GaussianHMM._init should probably use all the observations instead of just the first one to compute the means and covariances.
Would be nice if BaseHMM.fit would set an attribute self.logprob_ = logprob before returning self, so that callers can examine how the training went.
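
The second and fourth points can be illustrated together. EM should not decrease the log-likelihood, so a signed improvement is more informative than its absolute value, and keeping the per-iteration history lets callers inspect how training went. This is a minimal sketch with hypothetical names (`run_em`, `em_step`), not the library's implementation.

```python
def run_em(em_step, n_iter=100, tol=1e-2):
    """Run EM until the log-likelihood gain falls below tol.

    em_step: callable performing one EM iteration and returning the
             log-likelihood (a stand-in for the real E- and M-steps).
    Returns the list of log-likelihoods, one per iteration.
    """
    history = []
    for _ in range(n_iter):
        logprob = em_step()
        history.append(logprob)
        if len(history) >= 2:
            gain = history[-1] - history[-2]  # signed, no abs()
            if gain < 0:
                # Treated as an error in this sketch: a decrease usually
                # points to a numerical problem or a bug.
                raise RuntimeError("log-likelihood decreased by %g" % -gain)
            if gain < tol:
                break
    return history


# Toy stand-in: log-likelihoods that increase and then level off.
fake_logprobs = iter([-120.0, -100.0, -95.0, -94.999, -94.9989])
history = run_em(lambda: next(fake_logprobs))
print(history)
```

With `abs()` in the check, a drop in log-likelihood smaller than the tolerance would be indistinguishable from convergence.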

error on example

ubgpu@ubgpu:~/github/hmmlearn$ sudo python setup.py build_ext --inplace && nosetests
running build_ext
copying build/lib.linux-x86_64-2.7/hmmlearn/_hmmc.so -> hmmlearn
Traceback (most recent call last):
  File "/usr/local/bin/nosetests", line 4, in <module>
    import re
  File "/usr/lib/python3.4/re.py", line 324, in <module>
    import copyreg
  File "/usr/local/lib/python2.7/dist-packages/copyreg/__init__.py", line 7, in <module>
    raise ImportError('This package should not be accessible on Python 3. '
ImportError: This package should not be accessible on Python 3. Either you are trying to run from the python-future src folder or your installation of python-future is corrupted.
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 53, in apport_excepthook
    if not enabled():
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 24, in enabled
    import re
  File "/usr/lib/python3.4/re.py", line 324, in <module>
    import copyreg
  File "/usr/local/lib/python2.7/dist-packages/copyreg/__init__.py", line 7, in <module>
    raise ImportError('This package should not be accessible on Python 3. '
ImportError: This package should not be accessible on Python 3. Either you are trying to run from the python-future src folder or your installation of python-future is corrupted.

Original exception was:
Traceback (most recent call last):
  File "/usr/local/bin/nosetests", line 4, in <module>
    import re
  File "/usr/lib/python3.4/re.py", line 324, in <module>
    import copyreg
  File "/usr/local/lib/python2.7/dist-packages/copyreg/__init__.py", line 7, in <module>
    raise ImportError('This package should not be accessible on Python 3. '
ImportError: This package should not be accessible on Python 3. Either you are trying to run from the python-future src folder or your installation of python-future is corrupted.
ubgpu@ubgpu:~/github/hmmlearn$

Get error when run tests

I get the following error when I try to run the tests as described in the section "Running the test suite":

./hmmlearn$ sudo python setup.py build_ext --inplace && nosetests
running build_ext
copying build/lib.linux-x86_64-2.7/hmmlearn/_hmmc.so -> hmmlearn
EEE..
======================================================================
ERROR: Failure: ImportError (cannot import name check_arrays)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/local/neild/cloud/LIBRARIES/hmmlearn/hmmlearn/hmm.py", line 16, in <module>
    from sklearn.mixture import (
  File "/usr/local/lib/python2.7/dist-packages/sklearn/mixture/__init__.py", line 5, in <module>
    from .gmm import sample_gaussian, log_multivariate_normal_density
  File "/usr/local/lib/python2.7/dist-packages/sklearn/mixture/gmm.py", line 20, in <module>
    from .. import cluster
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/__init__.py", line 6, in <module>
    from .spectral import spectral_clustering, SpectralClustering
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/spectral.py", line 16, in <module>
    from ..metrics.pairwise import pairwise_kernels
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/__init__.py", line 31, in <module>
    from . import cluster
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/__init__.py", line 21, in <module>
    from .bicluster import consensus_score
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/bicluster/__init__.py", line 1, in <module>
    from .bicluster_metrics import consensus_score
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/bicluster/bicluster_metrics.py", line 6, in <module>
    from sklearn.utils.validation import check_arrays
ImportError: cannot import name check_arrays

======================================================================
ERROR: Failure: ImportError (cannot import name check_arrays)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/local/neild/cloud/LIBRARIES/hmmlearn/hmmlearn/tests/test_base.py", line 9, in <module>
    from hmmlearn import hmm
  File "/local/neild/cloud/LIBRARIES/hmmlearn/hmmlearn/hmm.py", line 16, in <module>
    from sklearn.mixture import (
  File "/usr/local/lib/python2.7/dist-packages/sklearn/mixture/__init__.py", line 5, in <module>
    from .gmm import sample_gaussian, log_multivariate_normal_density
  File "/usr/local/lib/python2.7/dist-packages/sklearn/mixture/gmm.py", line 20, in <module>
    from .. import cluster
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/__init__.py", line 6, in <module>
    from .spectral import spectral_clustering, SpectralClustering
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/spectral.py", line 16, in <module>
    from ..metrics.pairwise import pairwise_kernels
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/__init__.py", line 31, in <module>
    from . import cluster
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/__init__.py", line 21, in <module>
    from .bicluster import consensus_score
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/bicluster/__init__.py", line 1, in <module>
    from .bicluster_metrics import consensus_score
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/bicluster/bicluster_metrics.py", line 6, in <module>
    from sklearn.utils.validation import check_arrays
ImportError: cannot import name check_arrays

======================================================================
ERROR: Failure: ImportError (cannot import name check_arrays)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/local/neild/cloud/LIBRARIES/hmmlearn/hmmlearn/tests/test_hmm.py", line 9, in <module>
    from sklearn import mixture
  File "/usr/local/lib/python2.7/dist-packages/sklearn/mixture/__init__.py", line 5, in <module>
    from .gmm import sample_gaussian, log_multivariate_normal_density
  File "/usr/local/lib/python2.7/dist-packages/sklearn/mixture/gmm.py", line 20, in <module>
    from .. import cluster
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/__init__.py", line 6, in <module>
    from .spectral import spectral_clustering, SpectralClustering
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/spectral.py", line 16, in <module>
    from ..metrics.pairwise import pairwise_kernels
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/__init__.py", line 31, in <module>
    from . import cluster
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/__init__.py", line 21, in <module>
    from .bicluster import consensus_score
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/bicluster/__init__.py", line 1, in <module>
    from .bicluster_metrics import consensus_score
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/cluster/bicluster/bicluster_metrics.py", line 6, in <module>
    from sklearn.utils.validation import check_arrays
ImportError: cannot import name check_arrays

----------------------------------------------------------------------
Ran 5 tests in 0.148s

FAILED (errors=3)

Missing dependencies in README.rst

While trying to build the HTML documentation on Win64, I ran into a couple of errors caused by missing dependencies: sphinx-gallery and sphinx_rtd_theme. With these packages installed, the documentation builds flawlessly.

So I suggest replacing

$ pip install Pillow matplotlib Sphinx numpydoc

with

$ pip install Pillow matplotlib Sphinx sphinx-gallery sphinx_rtd_theme numpydoc

Is there support for non-fully connected HMMs?

As far as I can tell, there is no support for non-fully-connected HMMs,
e.g. left-right HMMs,
where each state can only move to states with an index greater than or equal to its own.

Am I correct in saying that they are not supported for training?
Or have I missed something?
(I thought for a moment that setting the transition_mat to have -Inf in the disallowed transitions would work, but it does not.)

They can be implemented if you don't want to fit them, simply by setting the non-connected transition probabilities to zero.
But doing this and then asking it to fit causes an error.
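
For what it's worth, Baum-Welch re-estimation has the property that a transition probability initialized to exactly zero stays zero, because the expected count for that transition is multiplied by it. The pure-Python sketch below runs one re-estimation step on a small discrete left-right model to show this. All names are illustrative, nothing here is hmmlearn code, and the unscaled forward-backward recursions are only suitable for short sequences.

```python
def reestimate_transmat(startprob, transmat, emissionprob, obs):
    """One Baum-Welch update of the transition matrix for a discrete HMM."""
    n = len(startprob)
    T = len(obs)

    # Forward pass: alpha[t][i] = P(obs[0..t], state_t = i).
    alpha = [[0.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = startprob[i] * emissionprob[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            incoming = sum(alpha[t - 1][i] * transmat[i][j] for i in range(n))
            alpha[t][j] = incoming * emissionprob[j][obs[t]]

    # Backward pass: beta[t][i] = P(obs[t+1..] | state_t = i).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                transmat[i][j] * emissionprob[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n)
            )

    # Expected transition counts (up to a constant that cancels on normalizing).
    counts = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                counts[i][j] += (
                    alpha[t][i] * transmat[i][j]
                    * emissionprob[j][obs[t + 1]] * beta[t + 1][j]
                )

    # Normalize each row; a row with no expected transitions is left as zeros.
    new_transmat = []
    for row in counts:
        total = sum(row)
        new_transmat.append([c / total if total > 0 else 0.0 for c in row])
    return new_transmat


startprob = [1.0, 0.0, 0.0]
left_right = [
    [0.6, 0.3, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
emissionprob = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
obs = [0, 0, 1, 0, 1, 1]

updated = reestimate_transmat(startprob, left_right, emissionprob, obs)
for row in updated:
    print([round(p, 3) for p in row])
```

Note that the disallowed entries must be probabilities equal to zero, not -Inf. In hmmlearn terms this would correspond to setting a zero-structured `transmat_` and leaving `'t'` out of `init_params` so that `fit` does not re-initialize it; whether a given release accepts that without error is something to verify.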

another issue

Hello

I am trying to use msmb, but it fails with the following messages:

Traceback (most recent call last):
File "/usr/bin/msmb", line 9, in
load_entry_point('msmbuilder==3.3.0.dev0', 'console_scripts', 'msmb')()
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 356, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2476, in load_entry_point
return ep.load()
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2190, in load
['name'])
File "/usr/lib64/python2.7/site-packages/msmbuilder-3.3.0.dev0-py2.7-linux-x86_64.egg/msmbuilder/scripts/msmb.py", line 5, in
from ..commands import *
File "/usr/lib64/python2.7/site-packages/msmbuilder-3.3.0.dev0-py2.7-linux-x86_64.egg/msmbuilder/commands/init.py", line 5, in
from .fit import GaussianFusionHMMCommand
File "/usr/lib64/python2.7/site-packages/msmbuilder-3.3.0.dev0-py2.7-linux-x86_64.egg/msmbuilder/commands/fit.py", line 16, in
from ..hmm import GaussianFusionHMM
File "/usr/lib64/python2.7/site-packages/msmbuilder-3.3.0.dev0-py2.7-linux-x86_64.egg/msmbuilder/hmm/init.py", line 3, in
from .vmhmm import VonMisesHMM
File "/usr/lib64/python2.7/site-packages/msmbuilder-3.3.0.dev0-py2.7-linux-x86_64.egg/msmbuilder/hmm/vmhmm.py", line 17, in
from sklearn.hmm import _BaseHMM
ImportError: No module named hmm

I found this solution:

but it doesn't work for me, since I cannot find:

config.add_subpackage("utils")
return config

in the setup.py file.

Could anybody give me some advice?

thx a lot

left-right HMM?

It seems that hmmlearn assumes a fully connected (ergodic) HMM when fitting the data. Is there a way to specify other types of HMM, e.g. a left-right HMM?

Sample code complains about hmm not being fitted

Using this sample code provided in the docs (http://hmmlearn.github.io/hmmlearn/hmm.html#building-hmm-and-generating-samples)

>>> import numpy as np
>>> from hmmlearn import hmm

>>> startprob = np.array([0.6, 0.3, 0.1])
>>> transmat = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.3, 0.3, 0.4]])
>>> means = np.array([[0.0, 0.0], [3.0, -3.0], [5.0, 10.0]])
>>> covars = np.tile(np.identity(2), (3, 1, 1))
>>> model = hmm.GaussianHMM(3, "full", startprob, transmat)
>>> model.means_ = means
>>> model.covars_ = covars
>>> X, Z = model.sample(100)

generates this error:

NotFittedError: This GaussianHMM instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Version info:

In [4]: hmmlearn.__version__
Out[4]: '0.2.0'

git info:

commit d6dbe8c855cfebe3431f1b0d87e31de47535ea3e
Author: Sergei Lebedev <[email protected]>
Date:   Thu Oct 15 01:34:49 2015 +0300

What's the difference between hmmlearn and seqlearn?

hmmlearn also seems to be able to do supervised learning.

ImportError: cannot import name hmm

Hi,

I used the hmm module from sklearn and tried to replace it by the hmmlearn module. Unfortunately I could not import it to my notebook.

from hmmlearn import hmm
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-7-8b8c029fb053> in <module>()
----> 1 from hmmlearn import hmm

ImportError: cannot import name hmm

I tried first
pip-3.3 install git+https://github.com/hmmlearn/hmmlearn.git

As this didn't work, I cloned the project and ran setup.py (with Python 3.3), but I still get an import error.

If I try to import

import hmmlearn.hmm

I get another error

ImportError Traceback (most recent call last)
<ipython-input-8-8dbb2cfe75b2> in <module>()
----> 1 import hmmlearn.hmm

/home/ipython/python/lib/python3.3/site-packages/hmmlearn/hmm.py in <module>()
22 from sklearn import cluster
23
---> 24 from .utils.fixes import log_multivariate_normal_density
25
26 from . import _hmmc

ImportError: No module named 'hmmlearn.utils'

What did I do wrong?

Cheers, Evelyn

Pip version is outdated

Hi,
The version of hmmlearn that one gets by installing with pip install hmmlearn
is quite out of date, and several already-fixed issues have been hitting me.

Namely the lack of monitoring (which I guess isn't fully implemented?),

and the lack of 0747df1

A new release should be made,
and the version should also be incremented.

(I am currently using master, installed from source, but for the good of all of us...)

clarify meaning of `covariance_type` for GaussianHMM

Perhaps something like

    covariance_type : string
        String describing the type of covariance parameters to use.  Must be
        one of 'spherical' (each state has its own single variance that applies
        to all features), 'tied' (the same general covariance matrix applies
        to all states), 'diag' (each state has its own diagonal covariance
        matrix), 'full' (each state has its own general covariance matrix).
        Defaults to 'diag'.

(I'm not even sure I got this correctly.)
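
For comparison, the hmmlearn documentation describes `covars_` as having a different shape for each setting, which makes the per-state versus shared distinction concrete. The helper below is a hypothetical illustration (not an hmmlearn function) that returns those documented shapes.

```python
def covars_shape(covariance_type, n_components, n_features):
    """Shape of the covariance parameters for each covariance_type."""
    shapes = {
        # One variance per state, shared across all features.
        "spherical": (n_components,),
        # One variance per state and per feature.
        "diag": (n_components, n_features),
        # One full covariance matrix per state.
        "full": (n_components, n_features, n_features),
        # A single full covariance matrix shared by all states.
        "tied": (n_features, n_features),
    }
    return shapes[covariance_type]


for kind in ("spherical", "diag", "full", "tied"):
    print(kind, covars_shape(kind, n_components=3, n_features=2))
```

Only 'tied' lacks a leading `n_components` axis, so it is the only setting in which all states share one covariance.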

LinAlgError with GaussianHMM.fit()

Reported by @voo42 in scikit-learn/scikit-learn#2803:

I get the following error when using sklearn.hmm.GaussianHMM.fit() using sklearn.version '0.14.1' under Windows 7 with Python 2.7.5.

  File "bugreport.py", line 15, in <module>
    gmm.fit([arr])
  File "D:\Python\Python27\lib\site-packages\sklearn\hmm.py", line 427, in fit
    framelogprob = self._compute_log_likelihood(seq)
  File "D:\Python\Python27\lib\site-packages\sklearn\hmm.py", line 737, in _compute_log_likelihood
    obs, self._means_, self._covars_, self._covariance_type)
  File "D:\Python\Python27\lib\site-packages\sklearn\mixture\gmm.py", line 58, in log_multivariate_normal_density
    X, means, covars)
  File "D:\Python\Python27\lib\site-packages\sklearn\mixture\gmm.py", line 610, in _log_multivariate_normal_density_full
    lower=True)
  File "D:\Python\Python27\lib\site-packages\scipy\linalg\decomp_cholesky.py", line 81, in cholesky
    check_finite=check_finite)
  File "D:\Python\Python27\lib\site-packages\scipy\linalg\decomp_cholesky.py", line 30, in _cholesky
    raise LinAlgError("%d-th leading minor not positive definite" % info)
numpy.linalg.linalg.LinAlgError: 1-th leading minor not positive definite

Minimal example that reproduces the bug (only with type=np.float32, float64 works - although other data shows the same problem when using float64):
import numpy as np
from sklearn import hmm

arr = np.asarray([[7.15000000e+02, 5.85000000e+02, 0.00000000e+00, 0.00000000e+00],
                  [7.15000000e+02, 5.20000000e+02, 1.04705811e+00, -6.03696289e+01],
                  [7.15000000e+02, 4.55000000e+02, 7.20886230e-01, -5.27055664e+01],
                  [7.15000000e+02, 3.90000000e+02, -4.57946777e-01, -7.80605469e+01],
                  [7.15000000e+02, 3.25000000e+02, -6.43127441e+00, -5.59954834e+01],
                  [7.15000000e+02, 2.60000000e+02, -2.90063477e+00, -7.80220947e+01],
                  [7.15000000e+02, 1.95000000e+02, 8.45532227e+00, -7.03294373e+01],
                  [7.15000000e+02, 1.30000000e+02, 4.09387207e+00, -5.83621216e+01],
                  [7.15000000e+02, 6.50000000e+01, -1.21667480e+00, -4.48131409e+01]], dtype=np.float32)

gmm = hmm.GaussianHMM(3, covariance_type='full')
gmm.fit([arr])

setuptools

Hi,

Is it possible to make hmmlearn available for setuptools?

Best regards,
Manuel

The old issue of having continuous output for multinomial hmm

Hello,

This seems to be an old issue

http://stackoverflow.com/questions/19360387/input-and-every-element-must-be-continuous-error-using-sklearn-multinominalhm

Is anyone motivated to fix this? (I checked the code, and the issue is still there.)

Thanks

PS: Feel free to close the issue. In fact, the problem comes from unseen vocabulary, which is probably too specific for the package. Still, I think that a more complicated multinomial training example would help in understanding.

Online documentation

Unless I missed it, it would be nice not to need to build the docs myself to view them.

Help on setting up hmmlearn to canopy

I use Canopy. After following the installation steps, I tried to import the library, but I got this error.
My OS is Windows 10, 64-bit.

File "C:\Users\user\AppData\Local\Enthought\Canopy\User\lib\site-packages\hmmlearn-0.2.0-py2.7-win-amd64.egg\hmmlearn\hmm.py", line 20, in
from .base import _BaseHMM
File "C:\Users\user\AppData\Local\Enthought\Canopy\User\lib\site-packages\hmmlearn-0.2.0-py2.7-win-amd64.egg\hmmlearn\base.py", line 12, in
from . import _hmmc
ImportError: DLL load failed: A dynamic link library (DLL) initialization routine failed.

Can someone provide some help?

Deploy Continuous and Multinomial features at the same time?

Currently we have the GaussianHMM and MultinomialHMM models to use. However, it seems to me that the observed features have to be of a single type under one model, that is, either continuous or multinomial, with no hybrid.

I hope we can support continuous and multinomial features together in one model. This would be very useful because feature types vary and need different ways of being modelled.

Different run gives different result?

Hi, I am using GMMHMM and notice that different runs give different results. This is how I call GMMHMM:

model = GMMHMM(n_components = n_states, n_mix=2, covariance_type="diag", n_iter=1000,random_state = 4, thresh = 1e-3)
model.fit(z_in)    # z_in is the same for different run. 
model.score(y)   # y is the same for different run

What might cause different results in different runs? Is there a way I can fix this?
Also, the EM process might get stuck in a local optimum. Is restarting implemented in GMMHMM.fit(..)?
Thank you.
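
EM only finds a local optimum of the likelihood, so results depend on initialization. If restarts are not built into `fit`, they can be wrapped around it. This is a generic sketch in which `fit_and_score` is a hypothetical callable that trains a fresh model with the given seed and returns `(model, log_likelihood)`; the toy version below is only a stand-in for real training.

```python
import random


def best_of_restarts(fit_and_score, seeds):
    """Train once per seed and keep the model with the highest score."""
    best_model, best_score = None, float("-inf")
    for seed in seeds:
        model, score = fit_and_score(seed)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score


def toy_fit_and_score(seed):
    """Stand-in for real training: the 'model' is a dict, the score is random."""
    rng = random.Random(seed)
    return {"seed": seed}, rng.uniform(-200.0, -100.0)


model, score = best_of_restarts(toy_fit_and_score, seeds=range(5))
print(model, score)
```

With a real model, `fit_and_score` might construct `GMMHMM(..., random_state=seed)`, call `fit`, and return `score` on held-out data. Fixing the seed makes a single run repeatable only if all of the randomness in initialization flows through it.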

Please distribute the LICENSE file in the pypi tarball

This would make packaging for Linux distributions more convenient. See e.g. PyCQA/modernize@228a96e
Thanks!

GaussianHMM cannot take a starting point for fitting means and covariances

Of course you can set means_ and covars_, but if we plan to fit the means and variances then they always get overwritten:

        if 'm' in params or not hasattr(self, "means_"): # <= oops
            kmeans = cluster.KMeans(n_clusters=self.n_components)
            kmeans.fit(X)
            self.means_ = kmeans.cluster_centers_
        if 'c' in params or not hasattr(self, "covars_"): # <= oops
            cv = np.cov(X.T)
            if not cv.shape:
                cv.shape = (1, 1)
            self._covars_ = distribute_covar_matrix_to_match_covariance_type(
                cv, self.covariance_type, self.n_components)
            self._covars_ = self._covars_.copy()
            if self._covars_.any() == 0:
                self._covars_[self._covars_ == 0] = 1e-5

Why decode()/score() returns a log likelihood value greater than zero?

The log-likelihood value returned by score()/decode() is supposed to be negative in all cases. Otherwise, exponentiating it gives a probability greater than 1, which doesn't make sense. Why do I always get a positive log-likelihood value after calling score()/decode()?

error running 'plot_hmm_stock_analysis.py'

I get the following error when running 'plot_hmm_stock_analysis.py' under Windows 10 with Anaconda 2.3.0 (64-bit)| (default, Dec 18 2014):

hmmlearn\utils.py:61: RuntimeWarning: invalid value encountered in subtract
  out = np.log(exp_mask_zero(a - a_max).sum(axis=0))
hmmlearn\hmm.py:258: RuntimeWarning: invalid value encountered in maximum
  self._covars_ = (covars_prior + cv_num) / np.maximum(cv_den, 1e-5)

Traceback (most recent call last):
  File "hmm_example.py", line 55, in <module>
    hidden_states = model.predict(X)
  File "hmmlearn\base.py", line 317, in predict
    _, state_sequence = self.decode(X, lengths)
  File "hmmlearn\base.py", line 278, in decode
    self._check()
  File "hmmlearn\hmm.py", line 155, in _check
    super(GaussianHMM, self)._check()
  File "hmmlearn\base.py", line 479, in _check
    .format(self.startprob_.sum()))
ValueError: startprob_ must sum to 1.0 (got nan)

benchmark does not run

Today's git checkout.

$ PYTHONPATH=. python bench/speed.py
benchmarking Gaussian HMM on a sample of size 65536
generating sample... Traceback (most recent call last):
  File "bench/speed.py", line 36, in <module>
    bench_gaussian_hmm(2**16)
  File "bench/speed.py", line 26, in bench_gaussian_hmm
    sample, _states = ghmm.sample(size)
  File "/home/antony/src/hmmlearn/hmmlearn/base.py", line 366, in sample
    check_is_fitted(self, "startprob_")
  File "/home/antony/.local/lib/python3.5/site-packages/sklearn/utils/validation.py", line 678, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.utils.validation.NotFittedError: This GaussianHMM instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Request for contributor rights

I am interested in contributing, testing, and fixing bugs.

underflow in the sklearn.mixture.GMM module

I have been writing a duration-explicit HMM https://github.com/georgid/HMMDuration
I am using the sklearn.mixture.GMM module in my observation model. I found out that it does not guarantee to avoid the issue of underflow as mentioned here: scikit-learn/scikit-learn#538

FYI:
The issue of underflow can be solved by adding min float constant in logsumexp(arr, axis=0) from utils/extmath.py
I did solve it by adding these lines to logsumexp(arr, axis=0)

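import sys
MINIMAL_PROB = sys.float_info.min
# Raise on underflow so that it can be detected.
old_settings = np.seterr(under='raise')
try:
    a = np.exp(arr - vmax)
except FloatingPointError:
    # Underflow occurred: recompute with underflow ignored and replace
    # exact zeros with the smallest positive normalized float.
    np.seterr(under='ignore')
    a = np.exp(arr - vmax)
    a[a == 0] = MINIMAL_PROB
finally:
    # Restore the caller's floating-point error settings.
    np.seterr(**old_settings)
Would anyone care to add this to the official release?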

Implement FDR estimation and control

For some applications HMM states (or their combinations) represent distinct hypotheses (in the statistical sense). In such a setting we might be interested in:

estimating FDR for the predictions produced by the model,
constructing a vector of predictions satisfying an upper bound on the FDR.

See this work by W. Sun and T. Cai for details.
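
Both items can be sketched from posterior null probabilities, assuming one HMM state plays the role of the null hypothesis and its posterior probability at each position is available. The estimated FDR of a rejection set is then the mean posterior null probability over that set, and a step-up rule in the spirit of Sun and Cai rejects the largest prefix of sorted positions whose running mean stays at or below the target level. Function names and the toy numbers are illustrative.

```python
def estimated_fdr(null_posteriors, rejected):
    """Mean posterior probability of the null over the rejected positions."""
    if not rejected:
        return 0.0
    return sum(null_posteriors[i] for i in rejected) / len(rejected)


def control_fdr(null_posteriors, alpha):
    """Indices to reject so that the estimated FDR is at most alpha."""
    # Sort positions from least to most likely to be null.
    order = sorted(range(len(null_posteriors)), key=lambda i: null_posteriors[i])
    rejected, running_sum = [], 0.0
    for rank, idx in enumerate(order, start=1):
        running_sum += null_posteriors[idx]
        # Keep the largest prefix whose running mean is within the bound.
        if running_sum / rank <= alpha:
            rejected = order[:rank]
    return sorted(rejected)


# Toy posterior probabilities of the null state at six positions.
null_post = [0.01, 0.90, 0.03, 0.20, 0.60, 0.02]
rejected = control_fdr(null_post, alpha=0.1)
fdr = estimated_fdr(null_post, rejected)
print(rejected, fdr)
```

In hmmlearn the per-position posteriors could come from the second return value of `score_samples`, taking the column of whichever state is designated as the null; that designation is a modelling choice made outside the library.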

UnicodeDecodeError: 'utf-8' codec can't decode

Hi, I got this error when trying to import (Win7 x64, Python 3.4). Any ideas? Thanks.

from hmmlearn import hmm
Traceback (most recent call last):
File "", line 1, in
File "C:\Anaconda3\lib\site-packages\hmmlearn-0.1.1-py3.4-win-amd64.egg\hmmlearn\hmm.py", line 22, in
from .base import _BaseHMM, decoder_algorithms
File "C:\Anaconda3\lib\site-packages\hmmlearn-0.1.1-py3.4-win-amd64.egg\hmmlearn\base.py", line 12, in
from . import _hmmc
File "stringsource", line 269, in init hmmlearn._hmmc (hmmlearn/_hmmc.c:17699)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 1: invalid start byte

NumPy headers not found on OS X

Hi!

I am trying to install hmmlearn as described in the markdown document but I am running into problems. Specifically, I get the following error:

$ python setup.py install
running install
running bdist_egg
running egg_info
writing hmmlearn.egg-info/PKG-INFO
writing top-level names to hmmlearn.egg-info/top_level.txt
writing dependency_links to hmmlearn.egg-info/dependency_links.txt
reading manifest file 'hmmlearn.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'hmmlearn.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.10-x86_64/egg
running install_lib
running build_py
running build_ext
building 'hmmlearn._hmmc' extension
clang -fno-strict-aliasing -fno-common -dynamic -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include -I/usr/local/opt/openssl/include -I/usr/local/opt/sqlite/include -I/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c hmmlearn/_hmmc.c -o build/temp.macosx-10.10-x86_64-2.7/hmmlearn/_hmmc.o -O3
hmmlearn/_hmmc.c:239:10: fatal error: 'numpy/arrayobject.h' file not found
#include "numpy/arrayobject.h"
         ^
1 error generated.
error: command 'clang' failed with exit status

I have numpy, scipy and scikit-learn installed on my system and they work fine when I use them in my other scripts. Can you help me install the package?
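A common cause of this error is that the compiler command line has no -I flag pointing at NumPy's header directory (none of the -I paths in the log above belongs to NumPy). A minimal sketch of how to locate that directory; in a setuptools-based setup.py it would typically be passed through the include_dirs argument of Extension, which is an assumption about how the build could be fixed, not a description of hmmlearn's own setup.py:

```python
import os

import numpy

# numpy.get_include() returns the directory that holds the NumPy C headers.
numpy_headers = numpy.get_include()

# The header the compiler failed to find should live under that directory.
arrayobject_h = os.path.join(numpy_headers, "numpy", "arrayobject.h")

# This is the flag that was missing from the clang invocation.
include_flag = "-I" + numpy_headers
print(include_flag)
print(os.path.isfile(arrayobject_h))
```

If the file check fails, the NumPy installation seen by this interpreter is probably not the one used by the other scripts.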

Cython and Windows: build fails

The new setup.py is much simpler and far more elegant, great work. Unfortunately, the change isn't 100% pain-free. There's an issue with Cython compiling on Windows platforms, outlined here, resulting in many errors along the lines of

 hmmlearn\_hmmc.o:_hmmc.c:(.text+0x29): undefined reference to `_imp___Py_NoneStruct'

I don't think there is a straightforward fix. Windows users wishing to compile hmmlearn will have to do one of the following:

Use pexport to extract a definitions file from their Python lib
Use a third-party definitions file

Neither of these can be covered by the hmmlearn setup script. However, the build is currently broken for Windows users!

Can't get it working

Hi,

I tried to run the plot_hmm example but here is what I get

Traceback (most recent call last):
  File "/Users/ingenia/git/hmmlearn/examples/plot_hmm_sampling.py", line 19, in <module>
    from hmmlearn import hmm
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hmmlearn/hmm.py", line 21, in <module>
    from .base import _BaseHMM, decoder_algorithms
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hmmlearn/base.py", line 8, in <module>
    from . import _hmmc
  File "__init__.pxd", line 155, in init hmmlearn._hmmc (hmmlearn/_hmmc.c:18874)
ValueError: numpy.dtype has the wrong size, try recompiling

I followed the instructions to install but no luck, any idea on how to solve this problem?

Build issue on Windows 7

An error occurred in the hmmlearn build step of nilmtk:
building 'hmmlearn._hmmc' extension
error: Microsoft Visual C++ 10.0 is required (unable to find vcvarsall.bat)

full transcription below

I appear to have the 2010 and 2012 redistributables installed for both 32- and 64-bit; see below.

[image]

vcvarsall.bat is not present on my PC.

Am I missing another install, or is this a build error?

Originally posted as nilmtk issue 386.

ImportError: cannot import name 'logsumexp'

I just recently switched from version 0.1.1 to the newest on master branch, and I'm getting an error when loading the library.

Here's my code:
import numpy as np
import matplotlib.pyplot as plt
from hmmlearn import hmm

Here's the error:

ImportError Traceback (most recent call last)
in ()
2 import matplotlib.pyplot as plt
3
----> 4 from hmmlearn import hmm
5

/Users/ckemere/Anaconda3/lib/python3.4/site-packages/hmmlearn-0.2.0-py3.4-macosx-10.5-x86_64.egg/hmmlearn/hmm.py in ()
18 from sklearn.utils import check_random_state
19
---> 20 from .base import _BaseHMM
21 from .utils import iter_from_X_lengths, normalize
22

/Users/ckemere/Anaconda3/lib/python3.4/site-packages/hmmlearn-0.2.0-py3.4-macosx-10.5-x86_64.egg/hmmlearn/base.py in ()
11
12 from . import _hmmc
---> 13 from .utils import normalize, logsumexp, iter_from_X_lengths,
14 log_mask_zero, exp_mask_zero
15

ImportError: cannot import name 'logsumexp'

extra_compile_args=["-O3"]

On Windows (conda/gcc), the default compilation switch is "-O", which yields a much slower extension module than what can be obtained after compiling with "-O3". I would thus suggest adding extra_compile_args=["-O3"] to the setup.py so that this is used by default.
Not sure whether this would affect those using the MSVC compiler though.
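On the MSVC question: MSVC uses /O2 rather than -O3, so the flag probably has to depend on the compiler. A minimal sketch, assuming the flags are chosen from the distutils-style compiler type string (the helper name is hypothetical):

```python
def optimization_flags(compiler_type):
    """Return optimization flags for a distutils-style compiler type.

    Hypothetical helper. "msvc" is the compiler type string used for the
    MSVC compiler; gcc and clang builds ("unix", "mingw32") accept -O3.
    """
    if compiler_type == "msvc":
        # MSVC does not understand -O3; /O2 asks it to optimize for speed.
        return ["/O2"]
    return ["-O3"]


print(optimization_flags("mingw32"))
print(optimization_flags("msvc"))
```

In a setup.py, such a helper could be called from a build_ext subclass with self.compiler.compiler_type, and the result assigned to each extension's extra_compile_args before building.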

Dimension problems setting means and covars for a single feature GaussianHMM

I'm new to HMMs, so feel free to tell me if I should be using a different model, but I've been struggling with GaussianHMM. Upon calling decode(), _compute_log_likelihood calls log_multivariate_normal_density, but what if I just want to use plain univariate Gaussian distributions? _log_multivariate_normal_density struggles with the shape of my observations: I'm getting a "not enough values to unpack" error.
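One possible cause, assuming the observations are passed as a flat 1-D array: code following the scikit-learn convention expects a 2-D array of shape (n_samples, n_features), and unpacking the shape of a 1-D array into two names fails with exactly this kind of error. A minimal sketch of the reshape (the data values are made up for illustration):

```python
import numpy as np

# A univariate observation sequence stored as a flat 1-D array.
observations = np.array([0.1, -0.3, 2.4, 2.1, 0.0])

# reshape(-1, 1) turns n values into an (n, 1) array: n samples, 1 feature.
X = observations.reshape(-1, 1)

# Unpacking the shape into two names now works.
n_samples, n_features = X.shape
print(n_samples, n_features)
```

With a single feature, manually set means would correspondingly need one column, i.e. shape (n_components, 1).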

online doc out of date

It looks like the online doc is out of date. The following sample code does not work (but the sample code from the repo does: examples/plot_hmm_sampling.py).
http://hmmlearn.github.io/hmmlearn/auto_examples/plot_hmm_sampling.html#example-plot-hmm-sampling-py

transmat initialization

It used to be possible to pass a transmat kwarg to GaussianHMM (still mentioned in the ) to set a starting point for Viterbi, but this seems to no longer be the case.
It would be nice if this possibility was restored, or at least if an alternative was mentioned in the docs.
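A possible alternative, sketched under the assumption of a release in which parameters are set as attributes and init_params controls which of them fit() initializes: assign transmat_ directly and leave "t" out of init_params. The helper names below are hypothetical, and the matrix values are only an illustration.

```python
import numpy as np


def sticky_transmat(n_components, stay=0.9):
    """Hypothetical helper: row-stochastic matrix favouring self-transitions.

    Requires n_components >= 2.
    """
    off_diagonal = (1.0 - stay) / (n_components - 1)
    transmat = np.full((n_components, n_components), off_diagonal)
    np.fill_diagonal(transmat, stay)
    return transmat


def make_model(transmat):
    # Imported lazily so that the helper above can be used without hmmlearn.
    from hmmlearn import hmm

    # init_params omits "t", so fit() should keep the assigned transmat_ as
    # its starting point instead of re-initializing it.
    model = hmm.GaussianHMM(n_components=transmat.shape[0], init_params="smc")
    model.transmat_ = transmat
    return model


start = sticky_transmat(3)
print(start.sum(axis=1))
```

Calling make_model(start).fit(X) would then start training from this matrix, provided every row sums to 1.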

Build error

When I build the project with: python setup.py build, I get the following errors:

running build
running build_py
running build_ext
building 'hmmlearn._hmmc' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/hmmlearn
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c hmmlearn/_hmmc.c -o build/temp.linux-x86_64-2.7/hmmlearn/_hmmc.o
x86_64-linux-gnu-gcc: error: hmmlearn/_hmmc.c: No such file or directory
x86_64-linux-gnu-gcc: fatal error: no input files
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 4

The build does not seem to be able to find the hmmlearn/_hmmc.c file. Any advice?

Is numerical stability an issue?

The earliest version committed here had a note in a docstring warning of known numerical stability issues.

The note was removed in this commit, without further explanation: b378e11

As far as I can see, the computations internally use logarithms of probabilities.

So, is numerical stability actually a non-issue? If so, why was the note originally present?
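For context, the usual reason log-space computation helps is the log-sum-exp trick: subtracting the maximum before exponentiating keeps the sum from underflowing to zero. The following is a generic sketch of that technique, not a copy of the library's implementation:

```python
import numpy as np


def logsumexp(log_values):
    """Compute log(sum(exp(log_values))) without leaving log space."""
    log_values = np.asarray(log_values, dtype=float)
    vmax = log_values.max()
    # After subtracting the maximum, the largest term is exp(0) = 1, so the
    # sum is at least 1 and its logarithm is finite.
    return vmax + np.log(np.sum(np.exp(log_values - vmax)))


log_probs = np.array([-1000.0, -1000.0])

# Naive evaluation: exp(-1000) underflows to 0, so the log of the sum is -inf.
with np.errstate(divide="ignore"):
    naive = np.log(np.sum(np.exp(log_probs)))

stable = logsumexp(log_probs)
print(naive, stable)
```

This sketch does not handle the case where every input is -inf; a production version would need to treat that separately.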

ImportError cannot import name _hmmc

nosetests...

E.E
======================================================================
ERROR: Failure: ImportError (cannot import name _hmmc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/steven/t42/hmmlearn/hmmlearn/hmm.py", line 26, in <module>
    from . import _hmmc
ImportError: cannot import name _hmmc

======================================================================
ERROR: Failure: ImportError (cannot import name _hmmc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/steven/t42/hmmlearn/hmmlearn/tests/test_hmm.py", line 14, in <module>
    from hmmlearn import hmm
  File "/home/steven/t42/hmmlearn/hmmlearn/hmm.py", line 26, in <module>
    from . import _hmmc
ImportError: cannot import name _hmmc

----------------------------------------------------------------------
Ran 3 tests in 0.315s

FAILED (errors=2)

Transition matrix stabilization results in un-normalized transition matrices

In base.py, the transition matrix is calculated from the sufficient statistics and then normalized. However, in the following line, the transition matrix update is done element by element, ignoring cells which are less than eps. The result is an un-normalized transition matrix if fit() stops after this update, which breaks, e.g., score().

I just eliminated the eps check, which means that some cells of the transition matrix will effectively go to zero. I'm not sure what the "right" thing to do is.

        if 't' in params:
            transmat_ = self.transmat_prior - 1.0 + stats['trans']
            normalize(transmat_, axis=1)
            self.transmat_ = np.where(self.transmat_ <= np.finfo(float).eps,
                                      self.transmat_, transmat_)
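The effect can be reproduced with plain NumPy. The matrices below are made-up values chosen so that one cell of the old matrix is below eps; the renormalization at the end is one possible remedy offered as an assumption, not the project's fix.

```python
import numpy as np

eps = np.finfo(float).eps

# Current transition matrix: one cell is already (effectively) zero.
old = np.array([[0.0, 1.0],
                [0.5, 0.5]])

# Freshly computed and normalized update.
new = np.array([[0.2, 0.8],
                [0.3, 0.7]])

# Same selection as in the snippet above: cells that were <= eps keep their
# old value, all other cells take the new value.
mixed = np.where(old <= eps, old, new)
print(mixed.sum(axis=1))  # the first row no longer sums to 1

# Possible remedy: renormalize each row after the element-wise selection.
fixed = mixed / mixed.sum(axis=1, keepdims=True)
print(fixed.sum(axis=1))
```

The renormalization would divide by zero for a row whose cells are all zero, so that case needs separate handling.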

Microsoft Visual C++ 9.0 is required