
tensorly's Introduction

TensorLy

TensorLy is a Python library that aims at making tensor learning simple and accessible. It lets you easily perform tensor decomposition, tensor learning and tensor algebra. Its backend system lets you seamlessly run computations with NumPy, PyTorch, JAX, TensorFlow or CuPy, and run methods at scale on CPU or GPU.


Installing TensorLy

The only pre-requisite is to have Python 3 installed. The easiest way is via the Anaconda distribution.

With pip (recommended):

pip install -U tensorly

With conda:

conda install -c tensorly tensorly

Development (from git):

# clone the repository
git clone https://github.com/tensorly/tensorly
cd tensorly
# Install in editable mode with -e or, equivalently, --editable
pip install -e .

Note: TensorLy depends on NumPy by default. If you want to use other backends, you will need to install these packages separately.

For detailed instructions, please see the documentation.


Quickstart

Creating tensors

Create a small third order tensor of size 3 x 4 x 2, from a NumPy array and perform simple operations on it:

import tensorly as tl
import numpy as np


tensor = tl.tensor(np.arange(24).reshape((3, 4, 2)), dtype=tl.float64)
unfolded = tl.unfold(tensor, mode=0)
tl.fold(unfolded, mode=0, shape=tensor.shape)

You can also create random tensors:

from tensorly import random

# A random tensor
tensor = random.random_tensor((3, 4, 2))
# A random CP tensor in factorized form
cp_tensor = random.random_cp(shape=(3, 4, 2), rank='same')

You can also create tensors in TT-format, Tucker, etc, see random tensors.

Setting the backend

You can change the backend to perform computation with a different framework. By default, the backend is NumPy, but you can also perform the computation using PyTorch, TensorFlow, JAX or CuPy (you need to install them first). For instance, after setting the backend to PyTorch, all the computation is done by PyTorch, and tensors can be created on GPU:

tl.set_backend('pytorch') # Or 'numpy', 'tensorflow', 'cupy' or 'jax'
tensor = tl.tensor(np.arange(24).reshape((3, 4, 2)), device='cuda:0')
type(tensor) # torch.Tensor

Tensor decomposition

Applying tensor decomposition is easy:

from tensorly.decomposition import tucker
# Apply Tucker decomposition 
tucker_tensor = tucker(tensor, rank=[2, 2, 2])
# Reconstruct the full tensor from the decomposed form
tl.tucker_to_tensor(tucker_tensor)

We have many more decompositions available, be sure to check them out!

Next steps

This is just a very quick introduction to some of the basic features of TensorLy. For more information on getting started, check out the user guide, and for a detailed reference of the functions and their documentation, refer to the API.

If you see a bug, open an issue, or better yet, a pull-request!


Contributing code

All contributions are welcome! So if you have a cool tensor method you want to add, if you spot a bug or even a typo or mistake in the documentation, please report it, and even better, open a Pull-Request on GitHub.

Before you submit your changes, you should make sure your code adheres to our style-guide. The easiest way to do this is with `black`:

pip install black
black .

Running the tests

Testing and documentation are an essential part of this package and all functions come with unit tests and documentation.

The tests are run using the pytest package. First install `pytest`:

pip install pytest

Then, to run the tests, simply run in the terminal:

pytest -v tensorly

Alternatively, you can specify for which backend you wish to run the tests:

TENSORLY_BACKEND='numpy' pytest -v tensorly

Citing

If you use TensorLy in an academic paper, please cite [1]:

@article{tensorly,
  author  = {Jean Kossaifi and Yannis Panagakis and Anima Anandkumar and Maja Pantic},
  title   = {TensorLy: Tensor Learning in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2019},
  volume  = {20},
  number  = {26},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v20/18-277.html}
}

  1. Jean Kossaifi, Yannis Panagakis, Anima Anandkumar and Maja Pantic, TensorLy: Tensor Learning in Python, Journal of Machine Learning Research (JMLR), 2019, volume 20, number 26.

tensorly's People

Contributors

aarmey, akiskefalas, asmeurer, borcuttjahns, braun-steven, caglayantuna, chrisyeh96, cohenjer, cor3down, cyrillustan, earmingol, isabelllehmann, j6k4m8, jacksonlchin, jcrist, jeankossaifi, juliagusak, kingsj0405, lan496, lili-zheng-stat, marieroald, maximeguillaud, merajhashemi, osmanmalik, samjohannes, sauravmaheshkar, scopatz, taylorpatti, yngvem, zongyi-li

tensorly's Issues

Tensorly cannot be installed if the dependencies are not met beforehand.

Problem

Tensorly cannot be installed if the dependencies are not met beforehand.
The setup.py script is importing tensorly to grab the version; this is problematic, since importing tensorly requires that all the dependencies are already met, and we are importing it before setup(**config) gets run.

This defeats the purpose of using 'install_requires': ['numpy', 'scipy']
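
One common way around this is to read the version string from the file as text instead of importing the package. The following is only a sketch: the helper name `read_version` is hypothetical, and it assumes the version is stored as a plain `__version__ = '...'` assignment in `tensorly/__init__.py`.

```python
import re


def read_version(init_source):
    """Extract the version string from the text of a package's __init__.py.

    Hypothetical helper: it only parses text, so none of the package's
    dependencies need to be importable when setup.py runs.
    """
    match = re.search(
        r"""^__version__\s*=\s*['"]([^'"]+)['"]""", init_source, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("Could not find a __version__ assignment.")
    return match.group(1)


# In setup.py one would read the file rather than import the package, e.g.:
#     with open("tensorly/__init__.py") as f:
#         version = read_version(f.read())
example_source = "import numpy\n__version__ = '0.4.2'\n"
print(read_version(example_source))  # 0.4.2
```

With this approach `install_requires` can do its job, since nothing from the package is imported before `setup()` is called.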

`robust_pca` does not compute on GPU using `mxnet` backend

(Disclaimer: This question might be more appropriate for a user forum -- but couldn't find one.)

Hi,

I'm trying to do video background subtraction as in this example using the mxnet backend on GPU.

I have successfully run tensorly/mxnet notebooks with GPU acceleration by using contexts, e.g.:

context = mx.gpu()
shape = [150, 100, 100] # <== larger tensors should be faster on GPU
tensor = tl.tensor(rng.random_sample(shape), ctx=context)

That is, tensorly with mxnet backend nicely switched from CPU to GPU transparently.

However, this approach does not seem to work for robust_pca.

What I tried was to wrap my video data in a tensor X in a GPU context:

X = tl.tensor(data, ctx=mx.gpu()) # <- GPU context 
X = X.astype(np.float32)
tl.context(X)

and used this as an input for robust_pca:

from tensorly.decomposition import robust_pca
D, E = robust_pca(X, reg_E=0.05, learning_rate=1.6, n_iter_max=20)

Although the resulting tensors D and E are also in gpu context (as well as the tensors used internally, I checked), the computation itself seems to be done on CPU (as judged by CPU/GPU activity stats). Not, as expected, on GPU.

Am I missing something here? How do I switch this computation to GPU?

Here's the full notebook: https://gitlab.com/wdeback/robustPCA/blob/master/RobustPCA_on_GPU%3F.ipynb

Thanks in advance!

Improve function partial_svd

In function partial_svd, we perform an SVD which outputs full matrices. IMHO this is not always necessary. Full matrices are not needed under the following condition:
(n_eigenvecs == min_dim)

I found this issue while performing tucker on a tensor of shape [3, 3, 1000, 1000] and rank [3, 3, 100, 100]. Without this little modification, this task can't be done on my machine due to lack of memory.
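
To illustrate the memory difference, here is a small NumPy-only sketch (it does not use TensorLy's partial_svd; the matrix size is arbitrary). With a wide matrix, the full SVD allocates a square right factor, while the reduced SVD keeps only min_dim rows.

```python
import numpy as np

# A wide matrix, similar in spirit to the unfolding of a tensor with one small mode
matrix = np.random.default_rng(0).standard_normal((3, 1000))

# Full SVD: Vh is 1000 x 1000 even though only 3 singular values exist
U_full, S_full, Vh_full = np.linalg.svd(matrix, full_matrices=True)
# Reduced SVD: Vh has only min(3, 1000) = 3 rows
U_thin, S_thin, Vh_thin = np.linalg.svd(matrix, full_matrices=False)

print(Vh_full.shape)  # (1000, 1000)
print(Vh_thin.shape)  # (3, 1000)

# The reduced factors are enough to reconstruct the matrix
reconstruction = (U_thin * S_thin) @ Vh_thin
print(np.allclose(reconstruction, matrix))  # True
```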

Parafac init checking poorly coded

In initialize_factors, in order to check the initialization method (random or svd), the code used is:
if init is 'random':
elif init is 'svd':

This is overly constraining, as for example, the following code will throw a ValueError:
init_mode = 'random'
parafac(tensor,rank,init=init_mode)

Because init_mode is 'random' is False. Ideally the code should be:
if init == 'random':
elif init == 'svd':

as it is implemented in the tucker decomposition code.
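
A minimal sketch of the difference, independent of TensorLy (the function `check_init` is hypothetical and only mimics the dispatch). Whether two equal strings are the same object is an implementation detail of the interpreter, so the sketch builds a string at run time and compares by value.

```python
def check_init(init):
    """Dispatch on the initialization method by value, not by identity."""
    if init == 'random':
        return 'random initialization'
    elif init == 'svd':
        return 'svd initialization'
    raise ValueError('Initialization method "{}" not recognized'.format(init))


# A string assembled at run time is equal to the literal but is usually
# a distinct object, so an identity check (`is`) cannot be relied upon.
literal = 'random'
init_mode = ''.join(['ran', 'dom'])
print(init_mode == literal)   # True
print(check_init(init_mode))  # random initialization
```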

Error with set_backend()

Tensorly appears to be loading with 'mxnet' as a default backend, despite default_backend = 'numpy' in __init__.py.

I am unable to change the backend using tl.set_backend('numpy').

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-0d71df054526> in <module>()
     14             {k: v for (k, v) in backend.__dict__.items() if not k.startswith('_')
     15             })
---> 16 set_backend('numpy')

<ipython-input-3-0d71df054526> in set_backend(backend_name)
      6 
      7     # reloads tensorly.backend
----> 8     importlib.reload(backend)
      9 
     10     # reload from .backend import * (e.g. tensorly.tensor)

AttributeError: 'module' object has no attribute 'reload'

Am I missing something obvious? Also, there doesn't appear to be specific test for set_backend().

The 'backend' passed to importlib.reload doesn't appear to be specified anywhere in the __init__.py file.

thanks for your help!

Question on error returning in parafac

Hi,

in tensorly.decomposition.parafac, it says that the error from each iteration is returned. Is there an inconsistency with the documentation or am I just unable to activate this return?

Thanks a lot for the great project, love tensorly

Best, Alex

Unable to use pytorch as the backend

I tried to use pytorch as the backend by using tl.set_backend('pytorch')

But I get an error mentioning that ModuleNotFoundError: No module named 'tensorly.backend.pytorch_backend'

I have pytorch installed and can check using import torch. I run pytorch code on my GPU without any issues

Thanks

Hidden dependency from nose package.

Reproduction:

mkvirtualenv tensor ## Python 3 default
make install
pip install pytest
TENSORLY_BACKEND='numpy' pytest -v tensorly

Result:

8 failed, 38 passed in 1.32 seconds

The 8 tests made use of the following import from backend numpy:

/numpy/testing/nose_tools/utils.py:71: ImportError

Solution:

pip install nose

Maybe put nose as a dependency (for tests)?

ValueError: matrix type must be 'f', 'd', 'F', or 'D'

Running the parafac function in this notebook from tensorly-notebooks results in the following traceback. Replacing the np.arange(24) call with np.arange(24, dtype='d') fixes the problem.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-c5e3bbd4ab85> in <module>
----> 1 factors = parafac(X, rank=2)

~/Repos/tensor_demo/envs/default/lib/python3.7/site-packages/tensorly/decomposition/candecomp_parafac.py in parafac(tensor, rank, n_iter_max, init, svd, tol, orthogonalise, random_state, verbose, return_errors, non_negative, mask)
    176     factors = initialize_factors(tensor, rank, init=init, svd=svd,
    177                                  random_state=random_state,
--> 178                                  non_negative=non_negative)
    179     rec_errors = []
    180     norm_tensor = tl.norm(tensor, 2)

~/Repos/tensor_demo/envs/default/lib/python3.7/site-packages/tensorly/decomposition/candecomp_parafac.py in initialize_factors(tensor, rank, init, svd, random_state, non_negative)
     99         factors = []
    100         for mode in range(tl.ndim(tensor)):
--> 101             U, _, _ = svd_fun(unfold(tensor, mode), n_eigenvecs=rank)
    102 
    103             if tensor.shape[mode] < rank:

~/Repos/tensor_demo/envs/default/lib/python3.7/site-packages/tensorly/backend/core.py in partial_svd(self, matrix, n_eigenvecs)
    668             if dim_1 < dim_2:
    669                 S, U = scipy.sparse.linalg.eigsh(
--> 670                     np.dot(matrix, matrix.T.conj()), k=n_eigenvecs, which='LM'
    671                 )
    672                 S = np.sqrt(S)

~/Repos/tensor_demo/envs/default/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py in eigsh(A, k, M, sigma, which, v0, ncv, maxiter, tol, return_eigenvectors, Minv, OPinv, mode)
   1661     params = _SymmetricArpackParams(n, k, A.dtype.char, matvec, mode,
   1662                                     M_matvec, Minv_matvec, sigma,
-> 1663                                     ncv, v0, maxiter, which, tol)
   1664 
   1665     with _ARPACK_LOCK:

~/Repos/tensor_demo/envs/default/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py in __init__(self, n, k, tp, matvec, mode, M_matvec, Minv_matvec, sigma, ncv, v0, maxiter, which, tol)
    511 
    512         _ArpackParams.__init__(self, n, k, tp, mode, sigma,
--> 513                                ncv, v0, maxiter, which, tol)
    514 
    515         if self.ncv > n or self.ncv <= k:

~/Repos/tensor_demo/envs/default/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py in __init__(self, n, k, tp, mode, sigma, ncv, v0, maxiter, which, tol)
    319 
    320         if tp not in 'fdFD':
--> 321             raise ValueError("matrix type must be 'f', 'd', 'F', or 'D'")
    322 
    323         if v0 is not None:

ValueError: matrix type must be 'f', 'd', 'F', or 'D'
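
The check that fails is on the dtype character of the array. A short NumPy-only sketch of why the integer array is rejected and two ways to obtain a floating-point input:

```python
import numpy as np

X_int = np.arange(24).reshape(3, 4, 2)               # integer dtype by default
X_float = np.arange(24, dtype='d').reshape(3, 4, 2)  # 'd' is double precision

# ARPACK only accepts single/double real or complex arrays
print(X_int.dtype.char in 'fdFD')    # False
print(X_float.dtype.char in 'fdFD')  # True

# Casting an existing integer array works as well
X_cast = X_int.astype(np.float64)
print(X_cast.dtype.char)  # d
```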

Errors happened when resetting backend

I used the following official code to set backend for tensorly lib:
tl.set_backend('numpy')
and suffered from errors:
AttributeError: 'module' object has no attribute 'reload'
Actually, I find that importlib doesn't have a reload function. How can I solve this problem? I would appreciate your kind reply.
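
For what it's worth, importlib.reload was added in Python 3.4, and the wording of the error above ('module' object has no attribute ...) suggests an older interpreter, possibly Python 2. A quick, hedged way to check the running interpreter (this is a diagnostic sketch, not a fix inside TensorLy):

```python
import importlib
import sys

# importlib.reload exists from Python 3.4 onwards
has_reload = hasattr(importlib, "reload")
print(sys.version_info[:2], has_reload)

if has_reload:
    # Reloading any already-imported module works the same way
    import json
    json = importlib.reload(json)
```

If `has_reload` is False, running the code under Python 3 is the likely remedy, since the README lists Python 3 as the only prerequisite.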

Dangerous dependency handling during testing.

There is no guarantee that the dependencies used during testing will match the dependencies the project actually needs.

Currently we test using a conda environment that gets set up in the following manner:
- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION numpy scipy

But there is no guarantee that the version used by conda will match the version in requirements.txt

ImportError: No module named Tensorly

Installed as:

$ source activate mlp3
$ conda install -c tensorly tensorly

$ python -V
Python 3.5.2 :: Continuum Analytics, Inc.
In [1]: import tensorly
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-91217a62ed42> in <module>()
----> 1 import tensorly

ImportError: No module named tensorly

In [2]: 
$ conda list | grep tensor
tensorflow                0.11.0rc2                py35_0    conda-forge
tensorly                  0.4.2                    py36_0    tensorly

I am on a Mac. Any help is much appreciated. Thank you

API for sparse tensors

API for sparse tensors in TensorLy

This thread is to discuss and collect opinions and thoughts about the best way to incorporate sparse tensors in TensorLy. @stsievert has already started adding sparse support in #64.

One thing to consider is whether we want sparse tensors as separate classes with their own methods (probably makes most sense computationally) or whether we want all methods to work with both dense and sparse tensors (might mean lots of inefficient to_dense conversions).

Some ways forward:

1- sparse_backend

Have a separate backend for sparse.

Pros

  • optional dependency, no issue with current numpy backend

Cons

  • separate backend for sparse, means that we are not leveraging pytorch or mxnet's sparse structure
  • not clear how to specialize algorithms for the sparse structure in that context

Example

import tensorly as tl
tl.set_backend('sparse') 
a = tensor(...) #this would be a sparse tensor

2- as part of the numpy backend

This is valid for any backend: in addition to tensor, the backends would expose a sparse_tensor structure.

Pros

  • transparently works with all algos
  • going forward can be used to leverage other backend's sparse structure (but at the same time can be tough having a unified API for all)

Cons

  • additional dependency made mandatory (though we could have a try/except to just check if sparse is available)
  • forces us to handle sparse tensors in all functions, even where it might not make sense

Example

import tensorly as tl
tl.set_backend('numpy') # the default anyway
t = tl.sparse_tensor(...) 

3- a separate sparse module

e.g. tensorly.contrib.sparse. This would load the correct structure depending on the current backend and add sparse_tensor (and potentially specialized algorithms)
Potentially the easiest way forward and seems to encompass the advantages of both previous strategies.

Pros

  • We can have specialised algorithms for sparse tensors
  • We can have the same backend system for sparse tensors as we do for the main tensors (e.g. we load the correct sparse_tensor structure depending on the backend)
  • No additional dependency

Cons

  • ...

Example

from tensorly.sparse import sparse_tensor

4- augment the existing tensor

Change the tensor structure to support both dense or sparse tensors, this would be specified in the context.

Pros

  • Nice transparent interface

Cons

  • Same as 2
  • Might be trickier to implement for other backends?

Example

import tensorly as tl
tl.set_backend('numpy') # the default anyway
t = tl.tensor(..., dtype='sparse') # sparsity is specified in the context

Non-negative tensor decompositions by NNLS

I haven't had good luck with multiplicative update algorithms. I'd like to leverage the code here - https://github.com/kimjingu/nonnegfac-python - to fit non-negative CP by alternating nonnegative least squares.

I'm happy to open a PR if you'd like to include this. The only tricky bit is how to cite/repurpose that code. I think we can add it to the requirements.txt as follows:

-e https://github.com/kimjingu/nonnegfac-python.git#egg=nonnegfac

cc @kimjingu
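
To make the idea concrete, here is a rough sketch of a single factor update in alternating nonnegative least squares. It uses SciPy's `scipy.optimize.nnls` rather than the nonnegfac-python code, and the function name `nnls_factor_update` is hypothetical. It assumes the usual CP relation: the mode-n unfolding is approximately the factor times the transposed Khatri-Rao product of the other factors.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_factor_update(unfolded, kr_product):
    """Solve min_{F >= 0} ||unfolded - F @ kr_product.T||_F row by row.

    unfolded   : (n_rows, n_cols) mode-n unfolding of the tensor
    kr_product : (n_cols, rank) Khatri-Rao product of the other factors
    """
    n_rows, rank = unfolded.shape[0], kr_product.shape[1]
    factor = np.zeros((n_rows, rank))
    for i in range(n_rows):
        # Each row of the factor is an independent NNLS problem
        factor[i], _ = nnls(kr_product, unfolded[i])
    return factor


# Synthetic check: recover a nonnegative factor from exact data
rng = np.random.default_rng(0)
true_factor = rng.random((5, 3))
kr_product = rng.random((20, 3))
unfolded = true_factor @ kr_product.T
estimated = nnls_factor_update(unfolded, kr_product)
print(np.allclose(estimated, true_factor, atol=1e-6))
```

A full algorithm would cycle this update over all modes until convergence; solving row by row is simple but not the fastest option.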

How to use mode_dot for tensor by tensor?

I have seen tensorly/tensorly/tenalg/n_mode_product.py. However, I want to use this function for tensor by tensor. Is that possible?

For example:
X = T.tensor([[[1, 13],
               [4, 16],
               [7, 19],
               [10, 22]],
              [[2, 14],
               [5, 17],
               [8, 20],
               [11, 23]]])

Y = T.tensor([[[1, 13],
               [4, 16],
               [7, 19],
               [10, 22]],
              [[2, 14],
               [5, 17],
               [8, 20],
               [11, 23]],
              [[3, 15],
               [6, 18],
               [9, 21],
               [12, 24]]])
How can I get the result of mode_dot(X, Y, 1)?
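
One possible reading of a tensor-by-tensor product is a contraction over a shared mode. As a sketch with plain NumPy (not TensorLy's mode_dot, and assuming the intent is to contract mode 1 of X with mode 1 of Y, both of size 4):

```python
import numpy as np

X = np.array([[[1, 13], [4, 16], [7, 19], [10, 22]],
              [[2, 14], [5, 17], [8, 20], [11, 23]]])   # shape (2, 4, 2)

Y = np.array([[[1, 13], [4, 16], [7, 19], [10, 22]],
              [[2, 14], [5, 17], [8, 20], [11, 23]],
              [[3, 15], [6, 18], [9, 21], [12, 24]]])   # shape (3, 4, 2)

# Contract mode 1 of X with mode 1 of Y; the remaining modes of X come
# first in the result, followed by the remaining modes of Y.
result = np.tensordot(X, Y, axes=([1], [1]))
print(result.shape)        # (2, 2, 3, 2)
print(result[0, 0, 0, 0])  # 166 = 1*1 + 4*4 + 7*7 + 10*10
```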

segmentation fault when calling parafac() for CP decomposition

Hi,

I encountered segmentation faults when calling parafac() for CP decomposition. My code is the following:

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

A = tl.tensor(np.random.rand(256, 64, 3, 3))
factor = parafac(A, rank=4)

As long as the rank is set larger than 3, it's going to fail. Any suggestions?

large sparse matrix causing crash during decomposition

When I try to use Tucker decomposition of a large sparse matrix, Tensorly crashes. I have used both the MXNet and NumPy backend, and both cause the crash due to memory issues.

The dimensions of my sparse matrix are (358, 556, 2). I was hoping to use Tensorly for even larger sparse matrices. I did not know if you guys intend to release any support for sparse matrices, or if perhaps something I am doing could be incorrect.

how to handle complex numbers?

I'm trying to use parafac on a complex tensor. I can handle the real and imaginary parts separately by calling PR = parafac(np.real(T), rank = k) and PI = parafac(np.imag(T), rank = k) and then I can operate on the two parts separately. However, what I really need is the decomposition of T into complex rank-1 tensors. Is there a simple way of achieving this?

Fix modes in PARAFAC

Hey everyone. Just wanted to add a feature idea here.

When using PARAFAC to process EEM data, it is very common to fix two modes (namely the modes for emission and excitation) to those values previously generated on a calibration dataset.
Thereby it is possible to estimate only the remaining mode (the concentration mode) and apply scaling factors that have been computed on the calibration data.

I added this possibility on a personal fork of tensorly and got the expected outcome on a test dataset. However I'm not sure if my implementation will uphold your coding standards here .. since it basically just overwrites the values of a mode with given values after each iteration..

Also many other constraints (besides the non-negativity) are usually discussed together with PARAFAC, depending on the field of application. E.g. these include orthogonality and unimodality constraints on the loading matrices or the possibility to fix single components in a mode.

I think it is worth working on those additions especially since for PARAFAC there isn't any good free alternative to the well known MATLAB-Toolboxes.

Kind Greetings and thanks for your good work so far,
Gordon

Unfold function

I tried to create my own PARAFAC decomposition but when I tried with the tensorly.unfold() it did not work as it should, but when I created my own unfold func (based on another library) it worked just fine. I could not grasp what kind of error it probably was, but the result does not seem to be what it should be.

If you wanna see the comparison between the two unfolding:
https://github.com/mateuspontesm/Tensor/blob/master/Tensor1.ipynb
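
Without having checked the notebook, one plausible cause is the ordering convention: TensorLy's unfolding orders the columns in row-major (C) order, whereas the definition in Kolda and Bader corresponds to column-major (Fortran) order. The sketch below reproduces both with NumPy; the helper `unfold` with an `order` argument is written for illustration and is not TensorLy's function.

```python
import numpy as np


def unfold(tensor, mode, order='C'):
    """Mode-`mode` unfolding; order='C' is row-major, order='F' column-major."""
    return np.reshape(
        np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order=order
    )


tensor = np.arange(24).reshape((3, 4, 2))

c_unfolding = unfold(tensor, 0, order='C')
f_unfolding = unfold(tensor, 0, order='F')

# Same entries in each row, but the columns are permuted
print(c_unfolding[0])  # [0 1 2 3 4 5 6 7]
print(f_unfolding[0])  # [0 2 4 6 1 3 5 7]
```

Either convention is fine as long as the matching Khatri-Rao ordering is used in the update equations; mixing the two produces wrong factors.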

robust_pca sample

Hello,

I just executed this sample:
https://jeankossaifi.github.io/tensorly/rpca.html

Step by step with same data, but during the execution of that:
low_rank_part, sparse_part = robust_pca(X, n_iter_max=20)
I got:
AssertionError Traceback (most recent call last)
in ()
----> 1 low_rank_part, sparse_part = robust_pca(X, n_iter_max=20)

C:\ProgramData\Anaconda3\lib\site-packages\tensorly\decomposition\robust_decomposition.py in robust_pca(X, mask, tol, reg_E, reg_J, mu_init, mu_max, learning_rate, n_iter_max, random_state, verbose)
76
77 # Initialise the decompositions
---> 78 D = T.zeros_like(X) # low rank part
79 E = T.zeros_like(X) # sparse part
80 L_x = T.zeros_like(X) # Lagrangian variables for the (X - D - E - L_x/mu) term

C:\ProgramData\Anaconda3\lib\site-packages\mxnet\ndarray\register.py in zeros_like(data, out, name, **kwargs)

AssertionError: Argument data must have NDArray type, but got [[[ 6.79022540e+00 1.77902254e+01 2.17902254e+01 ..., 3.79022540e+00
0.00000000e+00 -2.20977460e+00]
[ 5.79022540e+00 1.67902254e+01 1.57902254e+01 ..., 6.79022540e+00
-5.20977460e+00 -1.02097746e+01]
[ 5.79022540e+00 1.37902254e+01 2.27902254e+01 ..., 4.79022540e+00
-4.20977460e+00 -1.72097746e+01]
...,
[ -5.32097746e+01 -4.42097746e+01 -4.12097746e+01 ..., -3.12097746e+01
-3.72097746e+01 -3.92097746e+01]
[ -5.32097746e+01 -5.32097746e+01 -5.32097746e+01 ..., 2.55000000e+02
-5.02097746e+01 -5.92097746e+01]
[ -6.12097746e+01 -5.92097746e+01 0.00000000e+00 ..., -4.92097746e
...

Thanks

Cheers

Unable to target GPU with MXNET backend

I need some assistance with targeting GPU with MXNET backend. I attempted to use the Robust PCA API with tensors on GPU. Using the following code, execution is occurring on the CPU cores and not GPU(0). Thanks!

image

Optimization submodule

Adding an optimization module

For now, Tensorly (TL) ships with one API for each particular tensor decomposition model. While this has the advantage of simplicity for the end-user, this limits the settings where TL can be used.

Proposition: optimization submodule

I think a very cool feature of tensorly would be, just like in PyTorch or SciPy, to have an optimization module with a few usual algorithms often used for constrained tensor decompositions (nonnegative least squares, Lasso, Gauss Newton, ADMM...). This is important in my opinion since specific applications sometimes require specific algorithms and constraint sets, and there should not be a default one for every scenario.

API Design

There are mainly two possibilities for such an integration.

1/ Do not use a separate submodule, but instead make one API for each constrained factorization (such as the current non_negative_parafac) with one default optimization method.
2/ Write a contrib.optim module, and use it in a decomposition.constrained_parafac function where one may choose the optimization method and the constraint set (among a few specific choices).

My opinion: go for contrib.optim. But I will make some tests and see how both solutions behave in practice. The only constraint I think should be handled in a very different way is sparsity, along the lines of issue #79.

Remark: using optimization backends?

After some tests and thinking (see for instance this post) I do not think using an optimization backend such as CVXPY for instance is satisfactory since:

  • It increases the dependencies, thus development and maintenance complexity
  • It is non trivial to make use of the particular structure of tensor decompositions in already existing, non tensor-optimized software.

python crashed when call argmax

import tensorly as tl
import numpy as np

# default backend is mxnet
x = np.random.random(100)

# crashed here
tl.tensor(x).argmax()
# or
np.argmax(tl.tensor(x))

My numpy version 1.12.1, tensorly version 0.2.0

Support generic ndarray interface

Hi all, nice project here.

I'd like to ask some questions about generalizing these algorithms to other classes that support the numpy.ndarray interface like dask array, sparse arrays, or cupy/chainer gpu arrays. I have two long term objectives for these questions:

  1. Establish a tensor factorization code that can work in a variety of situations (dense CPU, dense GPU, sparse CPU, distributed dense CPU, distributed sparse CPU, distributed dense GPU)
  2. Improve the state of numpy-style array computing in Python so that it is easier to write code that applies generally to many of these situations.

So objective 1 is a specific case that I hope to use to push on objective 2.

Looking at the backend mechanism in this library it looks like you've already isolated many of the API points that this project would need from an upstream array library:

  1. ufuncs
  2. tensordot
  3. einsum
  4. svd
  5. slicing
  6. ... (what am I missing?)

However I suspect that, even if we're able to make things produce accurate results, that there are likely to be cases in the tensorly codebase where performance suffers due to how things are written not being well optimized for certain data structures.

Some general questions for tensorly maintainers

  1. Are there any situations in the tensorly codebase that you think might be problematic when extending to sparse or distributed arrays where the performance profile may change (the relative costs of different operations may change significantly)
  2. Is altering tensorly to make these other data layouts efficient in-scope for the project?
  3. Are there other concerns that you anticipate?

To be clear, my personal objective is to support these use cases without making several other backend wrappers. Ideally we evolve numpy to the point where that single backend file suffices for any project that looks sufficiently like a numpy array.

cc

Comment to those cc'ed above. We've all spoken about pushing on the generic ndarray interface. Tensor factorization seems like a nice case study where many of the ndarray projects would be relevant.

weights in parafac decomposition

In parafac decomposition, the docs mention a "weights" parameter being returned as given below:

weights : ndarray, optional
Array of length rank of weights for each factor matrix. See the with_weights keyword attribute.

But they're not returned. Nor does the source have any mention of it.

Tensor regression with `y` of arbitrary order

Right now using something like

estimator = TuckerRegressor(weight_ranks=[5], tol=10e-7, n_iter_max=100, reg_W=1, verbose=0)
estimator.fit(X,y)

demands that X be of dimensions (num_samples, num_features) and y be of dimension (num_samples). Why can't X and y be arbitrary tensor dimensions? For example, if I want to find a weight tensor that will minimize the error between an image (num_images, width, height, channels) and its rotated version (num_images, width, height, channels), it doesn't appear I can do that with the tensor regression API. Or if I want to learn a tensor that will contract with a vector to produce some simple 2d image (i.e. a matrix).

Tensor algebra without constructing the full tensor

I think this might fall under a feature request, unless you can already do it (please tell me how!?!).

It would be nice if we could do operations on the decomposed tensors, in their decomposed form.
(or does this sort of thing get done in the various compiler optimisations???)

A = tf.random_normal([1,n])
B = tf.random_normal([1,n])

# where the tensor object contains the cores and factors, not the full tensor
tensor = tucker(tf.random_normal([n,n,n]), rank=[2, 2, 2])
reduce_tensor = multi_mode_dot(tensor, [A, B], modes=[0, 1])

In this case it is possible to evaluate the multi_mode_dot op without constructing the full tucker decomposed tensor. If the factors = [U, V, W] then we can simply evaluate something like W^T.(core x A.U x B.V).
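
As a sanity check of this identity, here is a NumPy sketch (using einsum rather than TensorFlow or TensorLy; sizes and variable names are arbitrary) comparing the product computed on the full tensor with the one computed on the core and factors only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rank = 6, 2

# Tucker form: a small core and one factor matrix per mode
core = rng.standard_normal((rank, rank, rank))
U, V, W = (rng.standard_normal((n, rank)) for _ in range(3))
A = rng.standard_normal((1, n))
B = rng.standard_normal((1, n))

# Reference: build the full n x n x n tensor, then apply A and B on modes 0 and 1
full = np.einsum('abc,ia,jb,kc->ijk', core, U, V, W)
reduced_from_full = np.einsum('ijk,pi,qj->pqk', full, A, B)

# Same result from the decomposed form: project the factors first (A @ U, B @ V)
reduced_from_factors = np.einsum('abc,pa,qb,kc->pqk', core, A @ U, B @ V, W)

print(np.allclose(reduced_from_full, reduced_from_factors))  # True
```

The second path never materializes the n x n x n tensor, which is where the savings would come from for large n.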

Mxnet install dependency

Hi,
It's not possible to install without mxnet.
However, since there are other backends like PyTorch, maybe it's best to not have this as a dependency, and to choose the backend with a .config file like in Keras.

Adding tensor classes?

Adding tensor classes

This issue opens up for discussion a long-standing design decision, namely whether to create subclasses for tensors.

Specifically there are two design decisions to be made:

  1. whether to create a tensor class that subclasses each of the backend's ndarray structures
  2. whether to create classes for the decomposed tensors (e.g. Kruskal and Tucker)

Feel free to leave a comment with your opinions!


1- Core tensor class: To inherit or not to inherit?

We could create a class tensor, that, for each backend, would inherit from that backend's ndarray structure, e.g.

class Tensor(backend.NDArray):
    ...

Pros

This would offer a nice user interface. We could have fancy syntax such as:

unfolding = tensor.unfold(mode=0)
unfolding.fold()
tensor.mode_dot(matrix)

Currently, while folding, one needs to pass along the full-tensor's shape as this isn't stored when unfolding.

Cons

The main drawback is the added complexity in the code. Specifically, when subclassing the ndarray's class:

  • This needs to be done for each backend (with the issue that they don't all work the same. In NumPy for instance, we would need to define __new__ and __array_finalize__).
  • Need to redefine all operations (add, mul, etc)
  • Potentially breaks when trying to manipulate jointly a tensorly.tensor and an ndarray from the original backend?
 We'd have to make sure that all operations return a tensorly.tensor. However, when calling native functions from the backend library, these might return ndarrays while the user might expect tensorly.tensors.
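
For the NumPy backend only, a subclass along these lines might look like the sketch below. The class and the `_unfold_info` attribute are hypothetical, not an existing TensorLy API. It remembers the original shape when unfolding, so that fold() needs no arguments.

```python
import numpy as np


class Tensor(np.ndarray):
    """Hypothetical ndarray subclass that remembers how it was unfolded."""

    def __new__(cls, data):
        obj = np.asarray(data).view(cls)
        obj._unfold_info = None
        return obj

    def __array_finalize__(self, obj):
        # Called for views and results of operations; propagate the metadata
        self._unfold_info = getattr(obj, "_unfold_info", None)

    def unfold(self, mode):
        unfolded = np.reshape(np.moveaxis(self, mode, 0), (self.shape[mode], -1))
        unfolded._unfold_info = (mode, self.shape)
        return unfolded

    def fold(self):
        if self._unfold_info is None:
            raise ValueError("This tensor was not produced by unfold().")
        mode, shape = self._unfold_info
        moved_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
        folded = np.moveaxis(np.reshape(self, moved_shape), 0, mode)
        folded._unfold_info = None
        return folded


tensor = Tensor(np.arange(24).reshape((3, 4, 2)))
unfolding = tensor.unfold(mode=1)
print(unfolding.shape)                           # (4, 6)
print(np.array_equal(unfolding.fold(), tensor))  # True
```

Even this small version shows the cost discussed above: metadata has to be propagated by hand (a slice of an unfolding would inherit stale fold information here), and the same work would have to be repeated for every backend.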

2- Classes for the decomposed tensors

This part is less controversial in that it doesn't break anything and doesn't seem to really have many cons. It would consist of adding classes for the tensors in decomposed form, e.g. KruskalTensor, TuckerTensor, etc -- right now we only have functions that return tuples of native ndarrays from each backend.

a- Pros

  • Shorter syntax (e.g. tensor.mode_dot(matrix) vs. kruskal_mode_dot(tensor, matrix))
  • Users need not think about which function to call
  • Makes it easier to operate directly on decomposed tensors

b- Cons

  • More code to maintain, and more complexity in the codebase -- but in this case that doesn't seem to be a major issue.
  • The big question is whether we still allow simple tuples or require all decomposed tensors to be instances of the right class. If we enforce this, we can simply check the type of the decomposed tensor and call the correct method in the functions. For instance, mode_dot(tensor, matrix, mode) would work transparently on both full tensors and decomposed ones.

c- Possible implementations

The most straightforward option would be to subclass tuple or namedtuple. This has the advantage of not breaking the current API: users can still unpack the result directly, e.g.

core, factors = tucker(tensor, rank)

would be as valid as

tucker_tensor_instance = tucker(tensor, rank)
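A minimal sketch of the namedtuple option, written against NumPy only. The `TuckerTensor` class, its methods and the `full_mode_dot` helper are hypothetical illustrations, not TensorLy's implementation. It shows both points discussed above: the instance still unpacks like a plain tuple, and a single `mode_dot` can dispatch on type.

```python
from collections import namedtuple

import numpy as np


def full_mode_dot(tensor, matrix, mode):
    """Mode-n product of a dense tensor with a matrix."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)


class TuckerTensor(namedtuple("TuckerTensor", ["core", "factors"])):
    """Hypothetical container that unpacks exactly like a (core, factors) tuple."""

    __slots__ = ()

    def mode_dot(self, matrix, mode):
        # Multiplying along a mode only touches that mode's factor.
        factors = list(self.factors)
        factors[mode] = matrix @ factors[mode]
        return TuckerTensor(self.core, factors)

    def to_tensor(self):
        full = self.core
        for mode, factor in enumerate(self.factors):
            full = full_mode_dot(full, factor, mode)
        return full


def mode_dot(tensor, matrix, mode):
    """Works on dense arrays and on TuckerTensor instances alike."""
    if isinstance(tensor, TuckerTensor):
        return tensor.mode_dot(matrix, mode)
    return full_mode_dot(tensor, matrix, mode)


rng = np.random.default_rng(0)
tucker_tensor = TuckerTensor(
    rng.random((2, 3, 2)),
    [rng.random((4, 2)), rng.random((5, 3)), rng.random((6, 2))],
)
core, factors = tucker_tensor  # tuple-style unpacking still works
matrix = rng.random((7, 5))
decomposed_result = mode_dot(tucker_tensor, matrix, mode=1)
dense_result = mode_dot(tucker_tensor.to_tensor(), matrix, mode=1)
```

Operating on the decomposed form never builds the full tensor, which is the practical benefit of the third "pro" listed above.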

NumPy v1.14.0 not supported.

As of Jan 12, the latest version of NumPy is not supported.
Running:
TENSORLY_BACKEND=numpy pytest -v --cov tensorly tensorly

The tests fail inside test_kronecker() at:

for i, shape in enumerate(shapes):
    T.assert_array_equal(res, kr)

tucker-2 decomposition

I was wondering how to apply a Tucker-2 decomposition using TensorLy. It seems that only the standard Tucker decomposition is supported. Can anyone help me?
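For reference, one common reading of "Tucker-2" is a Tucker decomposition in which only two modes are compressed and the remaining modes are left untouched. The sketch below is a NumPy-only truncated HOSVD under that assumption; `tucker2_hosvd` and its helpers are hypothetical names, and this is not TensorLy's algorithm. TensorLy also provides a partial_tucker function in tensorly.decomposition for decomposing along a subset of modes; check its signature in your installed version.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor with the given mode as rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Mode-n product of a dense tensor with a matrix."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)


def tucker2_hosvd(tensor, modes, ranks):
    """Truncated HOSVD restricted to the given modes (one SVD per mode, no iterations)."""
    factors = []
    for mode, rank in zip(modes, ranks):
        left_vectors, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(left_vectors[:, :rank])
    core = tensor
    for mode, factor in zip(modes, factors):
        core = mode_dot(core, factor.T, mode)  # project onto the kept subspace
    return core, factors


rng = np.random.default_rng(0)
tensor = rng.random((5, 6, 7))
core, factors = tucker2_hosvd(tensor, modes=(0, 1), ranks=(2, 3))

# With full ranks the factors are orthogonal, so the tensor is recovered exactly.
full_core, full_factors = tucker2_hosvd(tensor, modes=(0, 1), ranks=(5, 6))
reconstruction = full_core
for mode, factor in zip((0, 1), full_factors):
    reconstruction = mode_dot(reconstruction, factor, mode)
```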

Backend 1-Dimensional Output Inconsistency

The way different backends output certain 1-dimensional data is inconsistent. In particular, the dot operation under the MXNet backend outputs a one-element tensor, whereas under the PyTorch backend the output is a scalar.

>>> import tensorly as T
>>> T.set_backend('mxnet')
Using mxnet backend.
>>> T.assert_equal(T.dot(T.tensor([1,0]), T.tensor([0,1])), 0.0)
AssertionError: 
Items are not equal:
 ACTUAL: 
[ 0.]
<NDArray 1 @cpu(0)>
 DESIRED: 0.0
>>> T.set_backend('pytorch')
Using pytorch backend.
>>> T.assert_equal(T.dot(T.tensor([1,0]), T.tensor([0,1])), 0.0)

The issue came up when writing tests that work across all backends. I can think of several solutions to this issue, but they all have design implications which require @JeanKossaifi's input. For example,

  • Modify MXNet's or PyTorch's dot() command. The downside is that one needs to remember this design decision for all new operators.
  • Manually cast all output into tensors, i.e. force the user to always convert output to T.tensor([scalar_output]). However, this comes with its own issues, since in the MXNet case scalar_output is not actually a scalar.
  • ?
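A third direction, sketched here only as an illustration, is to normalise at comparison time in the test helpers rather than in each backend. The `as_scalar` function below is hypothetical and uses NumPy arrays as stand-ins for backend outputs; a real MXNet or PyTorch result would first have to be converted to NumPy.

```python
import numpy as np


def as_scalar(value):
    """Collapse a 0-d value or a one-element array to a plain Python scalar."""
    array = np.asarray(value)
    if array.size != 1:
        raise ValueError(f"expected exactly one element, got shape {array.shape}")
    return array.reshape(()).item()


one_element_result = np.array([0.0])  # stand-in for the MXNet-style output
true_scalar_result = np.float64(0.0)  # stand-in for the PyTorch-style output
same = as_scalar(one_element_result) == as_scalar(true_scalar_result)
```

This keeps each backend's dot() untouched, at the cost of tests having to go through the helper for every scalar-valued result.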

Feature Request: Memory Management

I was wondering if there are any plans to make available means of memory management? It would be really helpful in debugging out of memory errors that pop up (I've been specifically running into this issue, so am interested in a feature to help debug.) Thanks!

Handling big datasets for Robust PCA

I tried to run Robust PCA on a torch array with dimensions of around 500000*375. The array fits comfortably on my GPU, as I ran robust matrix decomposition on it without any issues. I am not sure why robust PCA can't fit it, or why it would require 2310 GB of memory. Also, is the memory in question GPU or CPU memory?

My system has 512 GB of CPU memory and 32 GB of GPU memory

$ Torch: not enough memory: you tried to allocate 2310GB. Buy new RAM! at /pytorch/torch/lib/TH/THGeneral.c:246
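One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that somewhere in the pipeline a full (non-economy) SVD is requested, which allocates a square matrix of left singular vectors with one row and column per sample. The back-of-the-envelope arithmetic below uses the approximate dimensions quoted above and shows why that alone would reach the terabyte range while the data itself stays tiny.

```python
def gib(n_elements, bytes_per_element):
    """Memory footprint in GiB for a dense array."""
    return n_elements * bytes_per_element / 2**30


rows, cols = 500_000, 375  # approximate sizes quoted in the report

data_float32 = gib(rows * cols, 4)    # the input matrix itself
thin_u_float64 = gib(rows * cols, 8)  # economy SVD: U is rows x cols
full_u_float64 = gib(rows * rows, 8)  # full SVD: U is rows x rows

print(f"input (float32):     {data_float32:.2f} GiB")
print(f"thin U (float64):    {thin_u_float64:.2f} GiB")
print(f"full U (float64):    {full_u_float64:.0f} GiB")
```

The full-SVD figure is of the same order of magnitude as the allocation in the error message, whereas an economy-size factorisation would fit easily on either device.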

Transform-based Tensor Model

Implementing Transform-based Tensor Model in Python:

List

  1. Transform-based Tensor Model [1][2]
    a) the low-tubal-rank tensor model [2], tSVD when the transform is DFT (Discrete Fourier Transform)
    b) general transforms when the transform is DCT (Discrete Cosine Transform), DWT (Discrete Wavelet Transform)
  2. Tensor completion [3]: using tensor alternating minimization
  3. Tensor sensing [4]: using tensor alternating minimization
  4. Tensor sparse coding [5][6]
  5. Tensor subspace detection [7]
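As a rough illustration of item 1a, the sketch below implements the DFT-based t-product and a truncated t-SVD approximation in NumPy for real third-order tensors. The function names are hypothetical, and this is a sketch of the standard construction (FFT along the third mode, slice-wise matrix operations, inverse FFT), not code from the cited papers.

```python
import numpy as np


def t_product(a, b):
    """t-product of real tensors a (n1 x n2 x n3) and b (n2 x n4 x n3)."""
    a_hat = np.fft.fft(a, axis=2)
    b_hat = np.fft.fft(b, axis=2)
    # Frontal slices multiply independently in the Fourier domain.
    c_hat = np.einsum("ijk,jlk->ilk", a_hat, b_hat)
    return np.real(np.fft.ifft(c_hat, axis=2))


def tubal_rank_approx(a, rank):
    """Truncate each Fourier-domain frontal slice to the given rank."""
    a_hat = np.fft.fft(a, axis=2)
    out_hat = np.empty_like(a_hat)
    for k in range(a.shape[2]):
        u, s, vh = np.linalg.svd(a_hat[:, :, k], full_matrices=False)
        out_hat[:, :, k] = (u[:, :rank] * s[:rank]) @ vh[:rank, :]
    return np.real(np.fft.ifft(out_hat, axis=2))


rng = np.random.default_rng(0)
a = rng.random((3, 4, 5))
b = rng.random((4, 2, 5))
product = t_product(a, b)

# Reference: the t-product is a circular convolution of frontal slices.
n3 = a.shape[2]
reference = np.zeros((3, 2, n3))
for k in range(n3):
    for j in range(n3):
        reference[:, :, k] += a[:, :, j] @ b[:, :, (k - j) % n3]

exact = tubal_rank_approx(a, rank=3)  # rank = min(n1, n2) keeps everything
```

Swapping the FFT for another invertible transform along the third mode (item 1b) would follow the same three-step structure.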

[1] Xiao-Yang Liu and Xiaodong Wang. Fourth-order Tensors with Multidimensional Discrete Transforms, 2017. https://arxiv.org/abs/1705.01576
[2] Kilmer, M. E., Braman, K., Hao, N., & Hoover, R. C. (2013). Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM Journal on Matrix Analysis and Applications, 34(1), 148-172.
[3] Xiao-Yang Liu, Shuchin Aeron, Vaneet Aggarwal, Xiaodong Wang. Low-tubal-rank Tensor Completion using Alternating Minimization. (revision) IEEE Transaction on Information Theory, arXiv: https://arxiv.org/abs/1610.01690
[4] Tao Deng, Feng Qian, Xiao-Yang Liu, Manyuan Zhang, Anwar Walid. Tensor Sensing for RF Tomographic Imaging. ICME 2018.
[5] Fei Jiang, Xiao-Yang Liu, Hongtao Lu, Ruimin Shen. Efficient Multi-dimensional Tensor Sparse Coding Using t-linear Combinations. AAAI 2018.
[6] Fei Jiang, Xiao-Yang Liu, Hongtao Lu, Ruimin Shen. Graph Regularized Tensor Sparse Coding for Image Representation. IEEE ICME 2017.
[7] Cuiping Li, Yue Sun, Xiao-Yang Liu, Ying Li. Tensor Subspace Detection with Tubal-sampling and Elementwise-sampling. IEEE ICASSP 2018.

logger and accum are undefined names

flake8 testing of https://github.com/tensorly/tensorly on Python 2.7.13

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./doc/sphinx_ext/sphinx_gallery/py_source_parser.py:82:13: F821 undefined name 'logger'
            logger.warning(
            ^

./tensorly/decomposition/candecomp_parafac.py:149:21: F821 undefined name 'accum'
                    accum[:] = accum*T.dot(T.transpose(nn_factors[e]), nn_factors[e])
                    ^

./tensorly/decomposition/candecomp_parafac.py:149:32: F821 undefined name 'accum'
                    accum[:] = accum*T.dot(T.transpose(nn_factors[e]), nn_factors[e])
                               ^

Does Tensorly with PyTorch backend support variables that require gradient?

Hi,
In an end-to-end deep learning framework with SGD optimizer, I'm calling the function "tucker" on a 4D tensor (of size 200, 128, 4, 4) with pytorch in the backend. The rank parameter is set to 200, 10, 4, 4.
I get this error:
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

It seems that when applying the function partial_svd on the mode-1 unfolded tensor, a conversion to a NumPy matrix is done (line 295 in pytorch_backend.py). I'm not sure that conversion and the related if/else block (from line 279 to the end of pytorch_backend.py) are necessary: when the rank is not reduced, torch.svd is called, and the same should happen in this case.

Thanks

Robust PCA

Thank you very much for the package; it is great work.

I am trying to use the robust_pca function on a large dataset and the computation is very slow. This issue seems to be referenced in #18, #36, and #23. I initially tried to use the pytorch backend, thinking that the GPU would speed up the computation. However, I see that in all backends, the partial_svd function converts the data into a numpy array and uses the numpy backend (scipy.linalg.svd or scipy.sparse.linalg.eigsh) to calculate the SVD.

My question is two-fold. If I am following the code correctly, it appears that robust_pca will never use the scipy.sparse.linalg.eigsh function. Following the chain of calculations starting in robust_pca, we have a call to svd_thresholding, which passes the smallest dimension of the input matrix as the n_eigenvecs keyword (see here). partial_svd then checks n_eigenvecs >= min_dim (see here) to decide whether to calculate the standard SVD or the partial SVD. Since n_eigenvecs == min_dim (because that is what was passed), the check always passes and the partial SVD is never used.

Is this intentional, or should the >= be a >? If it is intentional, i.e., we want to calculate the standard SVD every time, wouldn't it be more computationally efficient to let backends with that capability use their GPU-enabled SVD? I'm specifically thinking of PyTorch, which does have an SVD routine.
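The branch in question can be paraphrased with the small stand-alone sketch below. `choose_svd_path` is a hypothetical simplification of the check described above, not the library's code; the `strict` flag models replacing >= with >.

```python
def choose_svd_path(matrix_shape, n_eigenvecs=None, strict=False):
    """Return which SVD routine the described check would select."""
    min_dim = min(matrix_shape)
    if n_eigenvecs is None:
        return "full"
    if strict:
        use_full = n_eigenvecs > min_dim   # the proposed alternative
    else:
        use_full = n_eigenvecs >= min_dim  # the check as described
    return "full" if use_full else "partial"


shape = (500_000, 375)
as_called_by_robust_pca = choose_svd_path(shape, n_eigenvecs=min(shape))
with_strict_check = choose_svd_path(shape, n_eigenvecs=min(shape), strict=True)
with_small_rank = choose_svd_path(shape, n_eigenvecs=10)
```

Under the check as described, passing the smallest dimension always selects the full SVD; only a strictly smaller n_eigenvecs reaches the partial path.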

Thank you very much for the help

Sparse array support

Sparse arrays have been mentioned in a few issues. Are there concrete plans for this?

Assuming that there are no concrete plans, I'd be inclined to throw this sparse array library in as a backend and see how well the current algorithms work. My guess is that things would densify pretty quickly and that sparse-specific algorithms are likely to be necessary, but it might be worth a try.
