mims-harvard / nimfa Goto Github PK

View Code? Open in Web Editor NEW

525.0 36.0 133.0 8 MB

Nimfa: Nonnegative matrix factorization in Python

Home Page: http://ai.stanford.edu/~marinka/nimfa/

License: Other

Python 100.00%

matrix-factorization latent-features latent-variable-models nonnegative-matrix-factorization embeddings

nimfa's Introduction

Nimfa

Nimfa is a Python module that implements many algorithms for nonnegative matrix factorization. Nimfa is distributed under the BSD license.

The project was started in 2011 by Marinka Zitnik as a Google Summer of Code project, and since then many volunteers have contributed. See AUTHORS file for a complete list of contributors.

It is currently maintained by a team of volunteers.

[News:] Scikit-fusion, collective latent factor models, matrix factorization for data fusion and learning over heterogeneous data.

[News:] fastGNMF, fast implementation of graph-regularized non-negative matrix factorization using Facebook FAISS.

Important links

Official source code repo: https://github.com/marinkaz/nimfa
HTML documentation (stable release): http://ai.stanford.edu/~marinka/nimfa
Download releases: http://github.com/marinkaz/nimfa/releases
Issue tracker: http://github.com/marinkaz/nimfa/issues

Dependencies

Nimfa is tested to work under Python 2.7 and Python 3.4.

The required dependencies to build the software are NumPy >= 1.7.0, SciPy >= 0.12.0.

For running the examples Matplotlib >= 1.1.1 is required.

Install

This package uses setuptools, which is a common way of installing python modules. To install in your home directory, use:

python setup.py install --user

To install for all users on Unix/Linux:

sudo python setup.py install

For more detailed installation instructions, see the web page http://ai.stanford.edu/~marinka/nimfa.

Alternatively, you may also install this package using conda:

conda install -c conda-forge nimfa

Use

Run alternating least squares nonnegative matrix factorization with projected gradients and Random Vcol initialization algorithm on medulloblastoma gene expression data:

>>> import nimfa
>>> V = nimfa.examples.medulloblastoma.read(normalize=True)
>>> lsnmf = nimfa.Lsnmf(V, seed='random_vcol', rank=50, max_iter=100)
>>> lsnmf_fit = lsnmf()
>>> print('Rss: %5.4f' % lsnmf_fit.fit.rss())
Rss: 0.2668
>>> print('Evar: %5.4f' % lsnmf_fit.fit.evar())
Evar: 0.9997
>>> print('K-L divergence: %5.4f' % lsnmf_fit.distance(metric='kl'))
K-L divergence: 38.8744
>>> print('Sparseness, W: %5.4f, H: %5.4f' % lsnmf_fit.fit.sparseness())
Sparseness, W: 0.7297, H: 0.8796

Cite

@article{Zitnik2012,
  title     = {Nimfa: A Python Library for Nonnegative Matrix Factorization},
  author    = {Zitnik, Marinka and Zupan, Blaz},
  journal   = {Journal of Machine Learning Research},
  volume    = {13},
  pages     = {849-853},
  year      = {2012}
}

Selected publications (Methods)

Data fusion by matrix factorization: http://dx.doi.org/10.1109/TPAMI.2014.2343973
Jumping across biomedical contexts using compressive data fusion: https://academic.oup.com/bioinformatics/article/32/12/i90/2240593
Survival regression by data fusion: http://www.tandfonline.com/doi/abs/10.1080/21628130.2015.1016702
Gene network inference by fusing data from diverse distributions: https://academic.oup.com/bioinformatics/article/31/12/i230/216398
Fast optimization of non-negative matrix tri-factorization: https://doi.org/10.1371/journal.pone.0217994

Selected publications (Applications)

A comprehensive structural, biochemical and biological profiling of the human NUDIX hydrolase family: https://www.nature.com/articles/s41467-017-01642-w
Gene prioritization by compressive data fusion and chaining: http://dx.doi.org/10.1371/journal.pcbi.1004552
Discovering disease-disease associations by fusing systems-level molecular data: http://www.nature.com/srep/2013/131115/srep03202/full/srep03202.html
Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold: http://www.worldscientific.com/doi/pdf/10.1142/9789814583220_0038
Matrix factorization-based data fusion for drug-induced liver injury prediction: http://www.tandfonline.com/doi/abs/10.4161/sysb.29072
Collective pairwise classification for multi-way analysis of disease and drug data: https://doi.org/10.1142/9789814749411_0008

Tutorials

Hidden Genes: Understanding cancer data with matrix factorization, ACM XRDS: Crossroads: https://dl.acm.org/citation.cfm?id=2809623.2788526 [Jupyter Notebook]

nimfa's People

Contributors

Stargazers

Watchers

Forkers

hjanime vatsan hanfeisun jherre haoshuji dangjaya skytodinfi wangdongfrank budgefeeney mrcinv dqzjh0319 markotoplak kapong mindis tadas-subonis nadesai nevmerzhitsky kobauman mstrazar pinguar mariusvniekerk stevenlol eraldop bdurgahee ccshao melissa-cruz kevinningthu eavie farzades asanoboy renchuqiao nahilsobh scicubator ismav zhenv5 yinsenm arunreddy keita1 mak02 mickeysjm connectsoumya abhishek111 haocoder haesemeyer mutual-ai jcastro-inf zzzrbx luqi74 chirayukong xiangyongcao zshwuhan quxiaofeng renke2 minghao2016 wzh43211 statml anirband ryttnk kingofspace0wzz gridl yw81 selenadali teslaa22 katsumk zwytop alabarga joonhoon linkone1a pchj qianjx afcarl ad05bzag tmoerman tuchang oesteban nayfous swatibanerjee29 mengwang024 maocx molimomo sunnybetagithub mata62n rafalmatuszak snikumbh anurag21k paidi lhanappa keller-mark vellamike flash1803 bsmith89 mosaddek-hossain mascaretti daluzi silence28 kyzhouhzau ldfting neelammahapatro liu6996 linnan1

nimfa's Issues

can't get binary result from bmf

Minor issue in recommendations.py

I have found a minor bug. The RMSE calculation did not take a square root. (line 128). Thanks!

Options have no effect in factorization

Hello Marinka Zitnik,

I have been using your nimfa module for my research. While comparing results from various seeding methods, I realized that the variant of the NNDSVD algorithm is not taken into effect by the flag. So, for example, in nmf.py,

self.W, self.H = self.seed.initialize(
               self.V, self.rank, self.options)

should be like

self.W, self.H = self.seed.initialize(
               self.V, self.rank, self.options['options'])

for the non-default NNDSVD algorithms, NNDSVDa or NNDSVDar, to be able to be in effect. As you already know, this applies to almost all the py files in the factorization directory.

Thank you,
yskangtamu

Scalar operations from `math` often passed to `sop`

Often times scalar-only operations like math.sqrt are passed to sop. These are often then applied directly onto a numpy array as in

return op(X + eps, s) if s != None else op(X + eps)

Resulting in a type error because math.sqrt doesn't know how to operate on numpy arrays.

It seems like either the calling function needs to pass np.sqrt or sop needs to be smart enough to select the numpy version if given a scalar version.

Possible Issue with Theta matrix in PMFCC?

I have been playing with the implementation of PMFCC in nimfa and noticed that any arbitrary input for the theta matrix is accepted (and presumably used?) by nimfa. It doesn't seem to throw an error, even if I use a theta with the incorrect dimensions. Surprisingly, it seems that each distinct choice of theta, regardless of whether it has the appropriate dimensions or not, gives different outputs.

Is this an issue?

using pip to install nimfa, but nothing works

I am using Python 2.7.11, and my pip is up-to-date.

I use pip install nimfa, and it says that version 1.2.3 is being installed.

But when I run the example code, it says:
'module' object has no attribute 'examples'

Moreover strangely enough, when I uninstalled nimfa using pip uninstall nimfa, the module somehow persists, even after I restarted my computer. What could be going on?

Unable to run examples.recommendations

Running the recommendations example doesn't work

import nimfa.examples
nimfa.examples.recommendations.run()

Read MovieLens data set
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-6947a2ae96c5> in <module>
      1 import nimfa.examples
----> 2 nimfa.examples.recommendations.run()

~/.local/lib/python3.8/site-packages/nimfa/examples/recommendations.py in run()
     65     """
     66     for data_set in ['ua', 'ub']:
---> 67         V = read(data_set)
     68         W, H = factorize(V)
     69         rmse(W, H, data_set)

~/.local/lib/python3.8/site-packages/nimfa/examples/recommendations.py in read(data_set)
    103     fname = join(dirname(dirname(abspath(__file__))), "datasets", "MovieLens", "%s.base" % data_set)
    104     V = np.ones((943, 1682)) * 2.5
--> 105     for line in open(fname):
    106         u, i, r, _ = list(map(int, line.split()))
    107         V[u - 1, i - 1] = r

FileNotFoundError: [Errno 2] No such file or directory: '/home/michele/.local/lib/python3.8/site-packages/nimfa/datasets/MovieLens/ua.base'

Running the script directly doesn't work either

$ pwd
/home/michele/.local/lib/python3.8/site-packages/nimfa/examples

$ ls
all_aml.py      documents.py             __init__.py         orl_images.py  recommendations.py
cbcl_images.py  gene_func_prediction.py  medulloblastoma.py  __pycache__    synthetic.py

$ python recommendations.py 
Read MovieLens data set
Traceback (most recent call last):
  File "recommendations.py", line 133, in <module>
    run()
  File "recommendations.py", line 67, in run
    V = read(data_set)
  File "recommendations.py", line 105, in read
    for line in open(fname):
FileNotFoundError: [Errno 2] No such file or directory: '/home/michele/.local/lib/python3.8/site-packages/nimfa/datasets/MovieLens/ua.base'

I installed the library via pip,
$ pip install --user nimfa

$ python --version
Python 3.8.1
$ pip --version
pip 19.3 from /usr/lib/python3.8/site-packages/pip (python 3.8)

Ok

AttributeError: 'module' object has no attribute 'Lsnmf'

When I tried to test the example (Lsnmf) given by the official site of Nimfa, like:

import nimfa
V = nimfa.examples.medulloblastoma.read(normalize=True)
lsnmf = nimfa.Lsnmf(V, seed='random_vcol', rank=50, max_iter=100)

then I got the following error:

AttributeError: 'module' object has no attribute 'Lsnmf'

I guess the reason might be we need to give the whole path of Lsnmf like, nimfa.methods.factorization.lsnmf.Lsnmf.

But when I did this,

lsnmf = nimfa.methods.factorization.lsnmf.Lsnmf(V, seed='random_vcol', rank=50, max_iter=100),

then it gave me:

TypeError: init() takes exactly 1 argument (5 given).

How can I solve this problem?

Unhashable type: 'matrix' when trying to use purity function

When I try to use the purity function with a fitted nmf model, using a list of numbers as the membership list, I receive the following error:

nmf.purity(classes)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2878, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
nmf.purity(classes)
File "/usr/local/lib/python2.7/dist-packages/nimfa/models/nmf.py", line 388, in purity
[dmbs.setdefault(mbs[i], set()).add(i) for i in range(len(mbs))]
TypeError: unhashable type: 'matrix'

the nmf model was fitted using the Bd function. "classes" is a plain python list. The documentation asks for a list. Does it need something else? How can this be fixed?

basis matrix type returned by different methods.

I found methods return basis / coefficient matrix in different data type.

For example,
pmf and nsnmf (as far as I noticed) return basis matrix in _sparse matrix of type '<type 'numpy.float64'_;

bd, lfnmf and nmf return basis matrix in _matrix_ format.

Could you kindly make the return matrix in same data type?

Thanks,

'Mf_fit' object has no attribute 'estimate_rank'

Really appreciating Nimfa and its working well for me in factorizing some preference data. I am however trying to test different ranks, and would like to use the built-in method to return quality parameter. I cannot get the estimate_rank method to work however. Sample code:

import numpy as np

import nimfa

V = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])
print('Target:\n%s' % V)

lsnmf = nimfa.Lsnmf(V, seed='random_vcol', max_iter=10, rank=3, track_error=True)
lsnmf_fit = lsnmf()

W = lsnmf_fit.basis()
print('Basis matrix:\n%s' % W)

H = lsnmf_fit.coef()
print('Mixture matrix:\n%s' % H)

r = lsnmf_fit.estimate_rank()
print('Rank estimate:\n%' % r)

Running Python 3.5 with Nimfa 1.2.2

Thankyou!

KeyError: <built-in function format> in SepNMF

I'm using Python 3.6.8 and tried SepNMF. (I know that this is tested not on 3.6.8)
Then I got KeyError: <built-in function format> by following code:

import nimfa

model = nimfa.SepNmf(X_uw)  # lil_matrix
model.factorize()

I have checked the code and I wonder theformat variable not defined? (of course, format is build-in in Python 3.x). Is this a bug of SepNMF?

https://github.com/marinkaz/nimfa/blob/master/nimfa/methods/factorization/sepnmf.py#L300

python3 compatibility for nimfa?

I've been using nimfa's NMF implementation for a bit, but recently switched to a python3 environment and decided to install it there. When I attempt to import nimfa in python3.4, I get

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/site-packages/nimfa/__init__.py", line 18, in <module>
    from mf_run import *
ImportError: No module named 'mf_run'

It seems this is because python3 requires explicit relative imports, so the correct import statement would be from .mf_run import *. After some more digging, I found several places where the library is not currently python3 compatible. I ran the 2to3 utility on the library in order to look for more, and got a number of changes that I have recorded in the following diff: https://gist.github.com/nadesai/0727db6af37383a153e9.

Are there any long-term plans to address these compatibility issues, and perhaps allow the library to be used under python3?

example datasets are no longer available

Hi!
The datasets (leukemia and medulloblastoma) located here are no longer available:
http://nimfa.biolab.si/nimfa.datasets.html

Is there a way to get them?

Thanks

Added to conda

Hi,

this is more of an FYI than an issue but I just wanted to let the maintainers know that I added nimfa to conda-forge so that it can be downloaded with conda.

The repo is here: https://github.com/conda-forge/nimfa-feedstock

Let me know if anyone of the maintainers here would like to be added as maintainers there or if I should make a PR here to add a badge or install instructions. Otherwise, feel free to just close the issue.

Thanks! 😃

Release v1.2 on PyPI

The latest version on PyPI is still version 1.1, despite v1.2 having been released 8 months ago. Could you update the PyPI release so it is up to date?

Since the API has changed completely and we are developing code to work with the latest version, we need to require v1.2 as a dependency for a package we (@swkeemink) are creating.

Thanks!

nimfa in orange

Hi,

I was wondering if nimfa can be used in widget form within the latest Orange3 version? I cannot seem to find its add-on.

Regards,
Bernard

new way to use nimfa.mf in version 1.2.1?

Previous it is possible to use different nmf methods by nimfa.mf interface, like following:

import nimfa
import numpy as np
V = np.random.random((10000, 1000))
fctr = nimfa.mf(V, seed = 'random_vcol', method = 'lsnmf', rank = 40, max_iter = 50)
fctr_res = nimfa.mf_run(fctr)

Now it gives the error:
AttributeError: 'module' object has no attribute 'mf'

What is the proper way to use different method through a common interface ?

How to reproduce NMF result by setting seed?

I just see seed options in doc but not how to reproduce the same result. Is there a good way?

Cannot get initialized basis and mixture matrices

I want to take a look at the initialized W and H (using nndsvd initialization). I try the following:

model = mf.mf(X, rank=10, initialize_only=True, seed="nndsvd")
init_W, init_H = model.basis(), model.coef()

but init_W and init_H are just assigned to be "None".

Any idea what I'm doing wrong?

I have also tried

initializer = mf.methods.seeding.nndsvd.Nndsvd()
init_, init_H = initializer.initialize(X, rank, {})

but I get the following error

/home/conradlee/local/lib/python2.6/site-packages/mf/methods/seeding/nndsvd.pyc in initialize(self, V, rank, options)
     58         if negative(V):
     59             raise MFError("The input matrix contains negative elements.")
---> 60         U, S, E = svd(V)
     61         if sp.isspmatrix(U):
     62             return self.init_sparse(V, U, S, E)

/home/conradlee/local/lib/python2.6/site-packages/mf/utils/linalg.pyc in svd(X)
    313            U, S, V = _svd_left(X)
    314        else:
--> 315            U, S, V = _svd_right(X)
    316     else:
    317         U, S, V = nla.svd(np.mat(X))

/home/conradlee/local/lib/python2.6/site-packages/mf/utils/linalg.pyc in _svd_right(X)
    337                 u_vec = err.eigenvectors
    338         else:
--> 339             val, u_vec = sla.eigen_symmetric(XXt, k = X.shape[0] - 1)
    340     else:
    341         val, u_vec = nla.eigh(XXt.todense())

AttributeError: 'module' object has no attribute 'eigen_symmetric'

Cophenetic Correlation

I get weirdly different Cophenetic Correlation numbers,

when I go with model estimation, and ask for

model = nimfa.mf(V, method = "nmf", rank = 3)
est_rank = model.estimate_rank(range=xrange(2,6),n_run=2=
cophenetic_list = [est_rank[item]['cophenetic'] for item in est_rank]

the set of cophenetic correlation scores are different from low_memory version

model = nimfa.mf(V, method = "nmf", rank = 3)
sd = fit.summary()
cophenetic_list.append(sd['cophenetic'])

in the latest version (single mf run) ALL cophenetic correlations are weirdly equal to 1!

In Snmf._spfcnnls, len(f_set) never decreases.

If you print the length of f_set in each loop, the value never decreases in this while loop:
https://github.com/marinkaz/MF/blob/master/nimfa/methods/factorization/snmf.py#L289

You can reproduce the issue by running the following code-

import scipy.sparse
import random
import nimfa
from time import time
import numpy

m1 = scipy.sparse.lil_matrix((10, 95))
for i in xrange(10):
    for j in xrange(95):
        if random.random() > 0.8: m1[i, j] = 1
m1 = scipy.sparse.csc_matrix(m1)
m1.sort_indices()
t = time()
fctr = nimfa.mf(m1, 
              seed = "random_vcol", 
              rank = 2, 
              method = "snmf", 
              max_iter = 15, 
              initialize_only = True,
              version = 'r',
              eta = 1.,
              beta = 1e-4, 
              i_conv = 10,
              w_min_change = 0)
print numpy.shape(m1)
a =  nimfa.mf_run(fctr)

The code never completes. If a dense matrix is passed, it completes in less than a second.

Factorization with uncertainty and missing data

Hi, I would like to apply NMF to a dataset with missing values and uncertainties. I know there exists this package: https://github.com/guangtunbenzhu/NonnegMFPy, but it hasn't been updated in a long time, so I am not sure if I trust it.

Nimfa should be able to do the same, right? However, I don't find anywhere in the documentation this application. Is it actually possible to do this with nimfa? Otherwise, I think I would need to hard code this.

Is it possible to obtain uncertainties for the H and/or W matrices?

Thanks in advanced

NMF with only iteration on H.

Hello,

I'm a new user of python and nimfa. Sorry to submit this as an issue, because it's not. I just need some help about what's possible to do with the library.
So basically my NMF problem to solve is that i have a single vector v. A learning matrix W already defined . So I want to compute the H vector without running updates on W. So only running updates on H using the squared euclidean distance or de kullback-leibler divergence minimisation methods.

Would that be possible with nimfa?

Gillis2014 paper does not corresponds to the 'xray' algorithm in SepNMF

Hi, thanks for this package.

I read gillis2014 paper entitled "Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization" on IEEE PAMI. This article provides a spa alike algorithm, it does not give xray.

Where is this xray algorithm ? should the comment on sepnmf being updated.

Best.

Can't find error history when option "track_errors" set to True

Whenever I perform MF, I'd like to know whether the objective has converged or whether I should have run more iterations. So I was happy to read about the "track_error" option. However, I can't figure out how to use this option.

I have looked through the examples and none of them seem to use this option. In fact, none of them seem to use the "options" parameter, which is a dictionary.

So this "issue" is really just a question: how do I actually track the residuals? It's also a bit of a suggestion to make this clear in one of your examples, or perhaps even in one of the examples on the main mf page.

Here is what I have tried:

model = mf.mf(norm_X, seed="nndsvd",rank=5, max_iter=20, initialize_only=True, objective="div", update="divergence",options={"track_error":True})
fit = mf.mf_run(model)
summary = fit.summary()
print summary.keys()

The fit object does not seem to have a "error_tracking" attribute or function, and neither does the summary object.

Also, when I now type

print model.track_error

it prints "False"

but when I type

print model.options

it prints {'objective': 'div', 'options': {'track_error': True}, 'update': 'divergence'}

Numpy deprecation warning

When running snmf() the following warning appears:

C:\Python27-64\lib\site-packages\numpy\matrixlib\defmatrix.py:318: VisibleDeprecationWarning: non integer (and non boolean) array-likes will not be accepted as indices in the future
out = N.ndarray.getitem(self, index)

mixing matrix of unseen data?

Hi there,

Just wondering if there is a way in nimfa to obtain the mixing matrix of unseen data?
Equivalent of transform in sklearn.decomposition ?

Thanks!

Python-mapping sometimes used as index

I was just playing around with this awesome toolbox when I encountered an occasional IndexError in snmf.py line 565:

t_d = D[l_1n, l_2n] / (D[l_1n, l_2n] - K[l_1n, l_2n])

I checked the code and found the problem in the if-statement above:

if n_h_set == 1:
    h_n = h_set * np.ones((1, len(j_f)))
    l_1n = i_f
    l_2n = map(int, h_n.tolist()[0])
else:
   l_1n = i_f
   l_2n = map(int, [h_set[e] for e in j_f])

In newer python versions a mapping can not be used as an index, hence indexing D with l_2n fails. A simple fix would be to put a list() around the mapping.

Thanks!

Preformance of nimfa for larger matrix

Hi, marinkaz
I am doing recsys research where the matrix we used is very large, usually 10000 x 1000. I was wondering that nimfa can process such large matrix or not ? So is there any way to process such large matrix. Many Thanks!

cophenetic calculation for rank estimation

Hi,

I am now trying to integrate nmf with ability to automatically detect the rank into my program. However, when I tried to get cophenetic to estimate the best rank, they are all 1. I saw another issue and learned that it wan ran only on one model so it's giving 1. However, I could not figure out how should I modify the code to make it work. I wonder if there is any template or tutorial for rank estimation?

Thank you

Snmf factorization method version = 'l' problem when initilisation matrices W & H are provided

When the 'l' version of SNFM is chosen and initilisation matrices are provided the factorize method does not swap and transpose initial W and H as it says it should in lines 176-177, it instead just keeps them in their original state which results in a error in line 197 when the matrices can't be broadcasted together.

Suggestion: integrate into scikit.learn

They guys over at scikit.learn also do NMF in python. They've implemented only the projected gradient method from Lin et al and the SVD seeding (as you have included in your package).

The scikit.learn project puts an emphasis on performance, which means they emphasize using vectorized numpy operations rather than higher-level python operations. For example, here you can see how they went about optimizing their NMF code. However your package is much more comprehensive than theirs.

If you are interested in getting more people to use your code as well as getting some good criticism and suggestions, I would suggest integrating your work into scikit.learn. I would be willing to help out with porting some of your documentation, but I am not very familiar with numpy.

Docs [enhancement]

It would be nice if you could add a link (e.g., ADS or arXiv) to all the cited papers in the documentation.

Calculation of RSS

Current RSS is calculated in the following lines:

V = self.target(idx)
X = self.residuals(idx = idx)
xX = V - (self.V - dot(self.W, self.H))
return multiply(xX, xX).sum()

Note that the function of residuals performs the following:

return self.V - dot(self.W, self.H)

So to create xX, you're subtracting the residuals from the original matrix, and the you return the sum of the square of xX. Are you sure this is the correct way to calculate the RSS? What is the purpose of xX? Isn't X already the residuals matrix? Shouldn't you just return the sum of the square of X? In other words:

V = self.target(idx)
X = self.residuals(idx = idx)
return multiply(X, X).sum()

I have not read anywhere how the RSS is to be calculated, so I might be completely wrong. I'm just going off the name :)

'module' object has no attribute 'Nmf'

HI,

I am trying to use Nimfa to use the NMF. I have installed nimfa in the python interpreter and this is the part in my code.

import nimfa

nmf = nimfa.Nmf(tfidf, max_iter=200, rank=2, update='divergence', objective='div')
fit = nmf()
print("Residual sum of squares: ", fit.summary(None)['rss'])

However, I keep on getting this error

AttributeError: 'module' object has no attribute 'Nmf'

Am I doing something wrong?

Thank you

Does this code support Missing Value with imputation?

Dear all,

Is there a method to work with missing values?
I looked in the documentation and I have space data (lots of zeros) that represent missing data. Is there a method that considers 0 as missing data and creates a mask or uses imputation to discard the missing values from the cost function?

Default kwargs set incorrectly

In 539f069 (see here) is not None was introduced to replace implicit boolean use (i.e., not alpha) on multiple models, but I believe the meaning was reversed and should be is None instead. Currently leaving the defaults as None doesn't set them to a default and leaves them as None, throwing errors on down the line.

AML ALL example is broken for me due to `_conn_change` issue

My stack trace is here:
https://gist.github.com/2707012

Getting ValueError: math domain error with higher values for "max_iter"

I am trying to factor the matrix which you can download here.

I am using the same settings as are used in your document clustering example. When I set max_iter to 25, I get this error. Any idea what's going on?

My parameter settings are:
rank=5
meth=nmf
max_iter=25
update=divergence
obj=div

"Don't know how to make test" on Python 3.5

When running build_ext on Python 3.5

running build_ext
/tmp/nix-build-python3.5-nimfa-1.3.2.drv-0/nimfa-1.3.2/nimfa/examples/cbcl_images.py:98: UserWarning: PIL must be installed to run CBCL images example.
  warn("PIL must be installed to run CBCL images example.")
/tmp/nix-build-python3.5-nimfa-1.3.2.drv-0/nimfa-1.3.2/nimfa/examples/orl_images.py:110: UserWarning: PIL must be installed to run ORL images example.
  warn("PIL must be installed to run ORL images example.")
Traceback (most recent call last):
  File "nix_run_setup.py", line 8, in <module>
    exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))
  File "setup.py", line 145, in <module>
    setup_package()
  File "setup.py", line 140, in setup_package
    'Programming Language :: Python :: 3',],
  File "/nix/store/3h0zjr4jnl7z7z6m33v9sk2ssswyw6ir-python3.5-bootstrapped-pip-9.0.1/lib/python3.5/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/nix/store/3h0zjr4jnl7z7z6m33v9sk2ssswyw6ir-python3.5-bootstrapped-pip-9.0.1/lib/python3.5/site-packages/setuptools/command/test.py", line 226, in run
    self.run_tests()
  File "/nix/store/3h0zjr4jnl7z7z6m33v9sk2ssswyw6ir-python3.5-bootstrapped-pip-9.0.1/lib/python3.5/site-packages/setuptools/command/test.py", line 248, in run_tests
    exit=False,
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/main.py", line 118, in parseArgs
    self._do_discovery(argv[2:])
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/main.py", line 229, in _do_discovery
    self.test = loader.discover(self.start, self.pattern, self.top)
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/loader.py", line 341, in discover
    tests = list(self._find_tests(start_dir, pattern))
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/loader.py", line 398, in _find_tests
    full_path, pattern, namespace)
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/loader.py", line 475, in _find_test_path
    tests = self.loadTestsFromModule(package, pattern=pattern)
  File "/nix/store/3h0zjr4jnl7z7z6m33v9sk2ssswyw6ir-python3.5-bootstrapped-pip-9.0.1/lib/python3.5/site-packages/setuptools/command/test.py", line 52, in loadTestsFromModule
    tests.append(self.loadTestsFromName(submodule))
  File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/loader.py", line 213, in loadTestsFromName
    raise TypeError("don't know how to make test from: %s" % obj)
TypeError: don't know how to make test from: {'sepnmf': <class 'nimfa.methods.factorization.sepnmf.SepNmf'>, 'snmnmf': <class 'nimfa.methods.factorization.snmnmf.Snmnmf'>, 'bd': <class 'nimfa.methods.factorization.bd.Bd'>, 'icm': <class 'nimfa.methods.factorization.icm.Icm'>, 'lfnmf': <class 'nimfa.methods.factorization.lfnmf.Lfnmf'>, 'lsnmf': <class 'nimfa.methods.factorization.lsnmf.Lsnmf'>, 'none': None, 'pmfcc': <class 'nimfa.methods.factorization.pmfcc.Pmfcc'>, 'bmf': <class 'nimfa.methods.factorization.bmf.Bmf'>, 'psmf': <class 'nimfa.methods.factorization.psmf.Psmf'>, 'nmf': <class 'nimfa.methods.factorization.nmf.Nmf'>, 'pmf': <class 'nimfa.methods.factorization.pmf.Pmf'>, 'snmf': <class 'nimfa.methods.factorization.snmf.Snmf'>, 'nsnmf': <class 'nimfa.methods.factorization.nsnmf.Nsnmf'>}

The error does not seem to present in Python 2.7.

Help - How to estimate Coef (as in the H matrix) from a pre-computed W (the base matrix)

Hello;

I hope you can provide me with a hint to figure this out.

Assume we already have successfully computed X ~= W1 * H1, using SNMF.

Now, say we have an X2 and we want to use the same W1 (as above, the pre-computed W1) to get a new H that can best use the existing W to estimate X2. How would one do this?

Do I just call:

 snmf = nimfa.Snmf(X2 , ..., W=W1 ...) 
 fit = snmf()
 H2 = fit.coef()
 The new estimated X2 ~= W1 * H2

If my hunch is correct, then I think there is a problem, but I am not sure I am using NIMFA correctly, hence this question.

Thanks!

Regards;

TypeError: super() takes at least 1 argument (0 given)

I have tried both 1.1 and master (as of 2015-03-19) using Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32 and both produce "TypeError: super() takes at least 1 argument (0 given)" when running:

import nimfa
V = nimfa.examples.medulloblastoma.read(normalize=True)
lsnmf = nimfa.Lsnmf(V, seed='random_vcol', rank=50, max_iter=100)

Traceback (most recent call last):
File "", line 1, in
File "nimfa\methods\factorization\lsnmf.py", line 145, in init
super().init(vars())
TypeError: super() takes at least 1 argument (0 given)

Warnings in snmf.py and performance

I'm seeing warning

nimfa/methods/factorization/snmf.py:610: RuntimeWarning: invalid value encountered in power
  np.mat(2 ** np.array(list(range(l_var - 1, -1, -1)))), p_set)

I see this happens when l_var is 64 exceeding int64 range. Shouldn't we use floating point here like

np.mat(2 ** np.array(list(range(l_var - 1, -1, -1)))).astype(np.float64)

I'm also seeing most of time is spent on flatten() function that can be hoisted out like
from

for i in range(len(i_f)):
    alpha[i_f[i], j_f[i]] = t_d.todense().flatten()[0, i]

t_d_flattened = t_d.todense().flatten()
for i in range(len(i_f)):
    alpha[i_f[i], j_f[i]] = t_d_flattened[0, i]

negative evar in nsnmf and pmf.

Negative evar in nsnmf an pmf methods (version 1.2.1), with huge rss;

One example for nsnmf:
import nimfa
import numpy as np
V = np.random.rand(40, 100)
nsnmf = nimfa.Nsnmf(V, seed="random", rank=10, max_iter=12, theta=0.5)
nsnmf_fit = nsnmf()
nsnmf_fit.fit.evar()
-859782115351077.0
nsnmf_fit.fit.rss()
1.1591715624174236e+18

One example for Pmf:
V = nimfa.examples.medulloblastoma.read(normalize=True)
pmf = nimfa.Pmf(V, seed='nndsvd', rank=10, max_iter=100)
pmf_fit = pmf()
pmf_fit.fit.evar()
-875.87260969308397

Which method is a good balance between speed and accuracy?

This is not the correct place for this question, but as this project has no mailing list, I don't know where else to post it.

I need to factor some rather large matrices (300,000 examples and 30,000 features). I am wondering what method will be able to do so in a reasonable amount of time.

As the author of this library, you must be familiar with a large range of NMF methods. In your experience, which of your implementations is quick while at the same not not making big sacrifices in terms of quality? I know that this question will depend on the dataset and on how I define "quality," but do you have any favorite methods for larger matrices?

Best regards,

Conrad

Snmf factorization method does not maintain consistency of sparseness across runs if version='l'

The Snmf factorization method Snmf.factorize() does not handle transposes correctly across multiple runs.
In order to enforce sparseness on the left factor the method transposes self.V (lines 175-176) , fits the model and then back-transposes V and swaps W and H (lines 224-226). However, the initial transpose is done outside run loop which starts in line 178. Therefore, sparseness is not enforced on the same factor across runs, with odd runs being correct and even runs operating on the orignal self.V (self.V.T.T). Symptomatic for this, if nruns is even, W and H will be returned with the wrong dimensionality (i.e. swapped and transposed).
This should be easily fixable by moving lines 175-176 of snmf.py below line 178 to ensure a re-transpose of self.V before each new fit.
#36

Speed Benchmarks

Since there are multiple algorithms for operating NMF in https://ai.stanford.edu/~marinka/nimfa/ it would be good to do a speed and accuracy test.

PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices

Hi all,

Just started using your package (v1.4.0) and note the following warnings popping up from Numpy:

PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.

One place for example is the call to np.asmatrix here, but sure there are more.

Thanks

mims-harvard / nimfa Goto Github PK

nimfa's Introduction

Nimfa

Important links

Dependencies

Install

Use

Cite

Selected publications (Methods)

Selected publications (Applications)

Tutorials

nimfa's People

Contributors

Stargazers

Watchers

Forkers

nimfa's Issues

Recommend Projects

Recommend Topics

Recommend Org