
Comments (19)

stsievert commented on May 13, 2024

Thanks for opening this issue. I think tensorly should definitely have a contrib.sparse package for specialized sparse algorithms.

I ran some sparse tensors through the existing algorithms in https://gist.github.com/stsievert/e5782ee401f0cb01407984b07c257ca0 to show that sparse tensors integrate well and play nicely with the existing codebase.

This did require some small modifications to numpy_backend.py. These modifications amounted to avoiding certain functions that instantiate the sparse array as a dense array (e.g., np.dot, which I think calls np.asarray).

Sparse tensor creation was also set up so that K.tensor(**K.context) returns a sparse tensor, which lets lines like _tucker.py#L65 work. I'm not sure if this would require a SparseTensor class.
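
For a flavor of the kind of change involved, here is a minimal sketch (names assumed, not the actual patch; the real experiment is in the gist above):

import numpy as np
import sparse  # pydata/sparse

def tensor(data, dtype=np.float64):
    # Pass sparse arrays through untouched so that K.tensor(**K.context)
    # can return a sparse tensor.
    if isinstance(data, sparse.SparseArray):
        return data.astype(dtype)
    return np.array(data, dtype=dtype)

def dot(a, b):
    # np.dot coerces its inputs (via np.asarray), which densifies sparse
    # arrays; the method form dispatches to each array's own dot.
    return a.dot(b)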

JeanKossaifi commented on May 13, 2024

Cool proof of concept!

I added your suggestion as 4.
Generally, I would like to avoid adding mandatory dependencies if possible. Also keen on making it as easy as possible for other backends.

I feel that having a separate structure might be a happy medium? e.g.

from tensorly.contrib import sparse
t = sparse.sparse_tensor(...)

where t would use the sparse library (NumPy backend), torch.sparse (PyTorch backend), NDArray.sparse (MXNet backend), etc.
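
A minimal sketch of that dispatch (tensorly.get_backend is real; everything else here is assumed):

import tensorly

def sparse_tensor(data, **context):
    # Route creation to the active backend's own sparse structure.
    backend = tensorly.get_backend()
    if backend == 'numpy':
        import sparse  # pydata/sparse
        return sparse.COO.from_numpy(data)
    if backend == 'pytorch':
        import torch
        return torch.tensor(data).to_sparse()
    raise NotImplementedError('no sparse structure for backend %r' % backend)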

JeanKossaifi commented on May 13, 2024

Following up on this - @stsievert, keen on merging your PR :)

So far, the best (and most flexible) way forward seems to me to create a new submodule, e.g. tensorly.contrib.sparse which would implement a sparse_tensor for each backend (similar to the main backend system). We can then work on these as we like and iterate quickly.

Any thoughts?

stsievert commented on May 13, 2024

I agree with you at a high level: I'd like to see a clear separation. However, I think there has to be some intermixing due to imperfections. The goal should be to touch the backend as little as possible and to avoid adding any additional required dependencies. For example:

  • T.reshape had to be modified because np.reshape coerces its input to np.ndarray, whereas x.reshape works for both sparse and NumPy arrays (sketched below).
  • T.partial_svd had to be modified to support the contrib.sparse module (because of some uses of dot).

Most of the modifications to numpy_backend.py could rely on some function _is_sparse (which currently uses isinstance and adds another dependency, but we could also use "sparse" in type(tensor).__module__).
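
For instance, a sketch of the dependency-free check and the method-based reshape:

def _is_sparse(tensor):
    # Checking the defining module avoids importing the sparse library
    # just to run an isinstance test.
    return 'sparse' in type(tensor).__module__

def reshape(tensor, newshape):
    # np.reshape coerces its input to np.ndarray; the method form works
    # for both numpy and pydata/sparse arrays.
    return tensor.reshape(newshape)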

JeanKossaifi commented on May 13, 2024

I think they need to be separate (at least until the API has matured), and they are separated in most frameworks anyway (e.g. PyTorch). I do agree that there is some intermixing due to imperfections.

What about an intermediate solution -- having contrib.sparse, which implements sparse_tensor, while still making the required changes to the numpy backend when they do not affect the user? We can iterate quickly and make breaking changes as long as we keep it in contrib.

jcrist commented on May 13, 2024

Hi Jean,

If you and @stsievert don't mind, I'm going to push on option 3 above, as I need some of this functionality for work. My immediate plan is to fork off of #64 and refactor that code to fit the design you proposed above. I should have a WIP PR up sometime early this week to get quick feedback.

JeanKossaifi commented on May 13, 2024

Sounds good.

So adding contrib.sparse, implementing sparse_tensor?
I guess the easiest is to also have a submodule contrib.sparse.backend, and import sparse_tensor from there according to tensorly.get_backend().

jcrist commented on May 13, 2024

Yes, that was my plan: add a numpy backend, and dispatch on import of contrib.sparse (raising appropriately for the other backends).

JeanKossaifi commented on May 13, 2024

Awesome! Happy to help -- either on here, or on Gitter!

jcrist commented on May 13, 2024

Upon further thought/iteration, I'm not sure about the design of 3. What do you see as the API for using sparse tensors?

  1. Algorithms work with either sparse or dense tensors, and potentially dispatch to custom sparse/dense implementations internally:

from tensorly.contrib.sparse import sparse_tensor
from tensorly.decomposition import parafac

x = sparse_tensor(...)
factors, errors = parafac(x, ...)

  2. Sparse is completely separate; only routines in tensorly.contrib.sparse work with sparse tensors:

from tensorly.contrib.sparse import sparse_tensor
from tensorly.contrib.sparse.decomposition import parafac

x = sparse_tensor(...)
factors, errors = parafac(x, ...)

  3. Sparse tensors work with some/all routines in main tensorly, and there may be custom implementations in the sparse module:

from tensorly.contrib.sparse import sparse_tensor
from tensorly.decomposition import parafac
from tensorly.contrib.sparse.decomposition import sparse_parafac

x = sparse_tensor(...)
factors, errors = parafac(x, ...)

# Parafac using a better algorithm for sparse tensors
factors, errors = sparse_parafac(x, ...)

JeanKossaifi commented on May 13, 2024

I think the realistic, easier case is 2, and the ideal one is 3.

In general, sparse tensors need specific, adapted algorithms. If we can also make them work with the normal ones, even better, but I do not think that should be the main objective. In addition, most backends still have incomplete support for sparse tensors, so it won't be possible in the near future.

In short, I would focus on 2, making efficient algos optimised for sparse tensors, and, if possible, work out the bugs that appear when using sparse tensors with regular algos and converge to 3.

mrocklin commented on May 13, 2024

In general, sparse tensors need specific, adapted algorithms. If we can also make them work with the normal ones, even better, but I do not think that should be the main objective.

I'm curious about these statements. I agree that, in general, specialized algorithms will likely be necessary for optimal performance. However, I think it would be interesting to just push pydata/sparse arrays through the current algorithms designed for dense arrays and see what happens. I think @stsievert showed me a notebook once that did this, and I was surprised at how effective it was (I expected things to blow up and densify pretty early on, but this didn't happen).

Because of this experience, I wonder if a good step zero would be to just have a sparse backend alongside the others and use the existing algorithms, but with sparse implementations of the various operators (tensordot, transpose, etc.).
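
A minimal sketch of such a backend's core ops, assuming pydata/sparse (the function names mirror the discussion, not any actual module):

import numpy as np
import sparse  # pydata/sparse

def tensor(data, dtype=np.float64):
    # Build a COO tensor from any array-like input.
    return sparse.COO.from_numpy(np.asarray(data, dtype=dtype))

def transpose(tensor, axes=None):
    return tensor.transpose(axes)

def tensordot(a, b, axes=2):
    # pydata/sparse mirrors np.tensordot's signature.
    return sparse.tensordot(a, b, axes=axes)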

JeanKossaifi commented on May 13, 2024

Thanks for chiming in! I agree, adding sparse as a backend is one way forward. However, it also has some issues: it somewhat contradicts the backend system (which is supposed to transparently execute computation across different frameworks, hardware, etc.). In the future it would make it harder to include a similar sparse structure for, e.g., PyTorch. It is also less clear how to have separate algos for sparse tensors. It might be best (at least for now) to separate sparse tensors from "basic" tensors.

It seems to me that approach 3 mentioned by @jcrist is the most powerful and flexible; it also encompasses simply having sparse as a backend. This is why I think it could be the best way forward. Specifically, we would use the sparse library for sparse support (e.g. sparse.sparse_tensor would simply call it). We can add specific algorithms for sparse tensors and still keep compatibility with the existing ones, just as we would by simply having a sparse backend. We can make small changes to the numpy backend as well to ensure compatibility, as mentioned by @stsievert.

@mrocklin do you see any issue with this approach?

stsievert commented on May 13, 2024

I think @stsievert showed me a notebook once that did this

https://gist.github.com/stsievert/e5782ee401f0cb01407984b07c257ca0

I'd also like to see (3) in #65 (comment).

I think it would be interesting to see what happens if we just push pydata/sparse arrays through the current algorithms designed for dense arrays and see what happens

👍

JeanKossaifi commented on May 13, 2024

So to summarise, the plan could be:

i) add contrib.sparse.sparse_tensor, which will call the sparse library (with the numpy backend)
ii) see what happens when using sparse_tensor with the current algos (see the sketch below)
iii) make changes to the numpy backend to fix the issues that come up (where possible and transparent to the user)
iv) implement sparse-specific algos in contrib.sparse (and utils, such as @stsievert's FROSTT data loader)

Also adding @stsievert's notebook to the examples as it's pretty cool! :)

@jcrist might be worth double-checking with @stsievert to not duplicate work? @mrocklin any thoughts?
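
A sketch of step ii (sparse.random is real pydata/sparse API; whether the decomposition succeeds depends on the step-iii fixes):

import sparse
from tensorly.decomposition import parafac

# Random third-order COO tensor, ~0.1% non-zeros.
x = sparse.random((100, 100, 100), density=0.001)

# Push it through the existing dense algorithm and note where it
# errors out or silently densifies.
factors = parafac(x, rank=5)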

jcrist commented on May 13, 2024

see what happens when using sparse_tensor with the current algos

This is the part that's not clear to me given the proposed layout. Since the current algorithms all dispatch using the dense backends, how do you see the sparse backend being called by those algorithms? I can think of a few options off the top of my head:

  • Have the backend dispatch on sparse/dense internally and keep the algorithm code the same. This has the same downsides as your option 2 above, as it's the same implementation.
  • Have algorithms dispatch on sparse/dense using a backend-specific is_sparse function. A messier version of the above; not too keen on this.
  • Pull out the core bits of the algorithms so they're parametrized by backend, then import them in tensorly.contrib.sparse.* to implement the sparse versions using the dense functions. This might look like:
# In tensorly.*
def _parametrized_by_backend_func(a, b, c, backend=None):
    ...

def dense_func(a, b, c):
    return _parametrized_by_backend_func(a, b, c, get_dense_backend())

# In tensorly.contrib.sparse.*
def sparse_func(a, b, c):
    return _parametrized_by_backend_func(a, b, c, get_sparse_backend())

I'd be curious to hear what you're envisioning for this @JeanKossaifi.

@jcrist might be worth double-checking with @stsievert to not duplicate work?

For transparency, I'm taking over a work project from @stsievert as he heads back to school (although if he has time he might still contribute :)).

JeanKossaifi commented on May 13, 2024

I totally agree with your first two points. Indeed, adding sparse support is not as trivial as one would wish...

  • Do you think that having sparse_tensor as part of the main backend (like @stsievert was suggesting) would be better (so we can use different methods depending on whether the input is dense or sparse)? Ideally, I was hoping the backends would deal with sparse tensors natively (e.g. torch.svd would work for both dense and sparse), but since sparse is a separate library and not part of numpy, I guess that creates issues...
  • The _parametrized_by_backend_func approach solves the issue but might be a little obscure? Do you think it's easier to write specific sparse versions, at least for now? This seems to be the approach in most libraries, and we can slowly start improving them to take better advantage of the sparse structure.

For transparency, I'm taking over a work project from @stsievert as he heads back to school (although if he has time he might still contribute :)).

Glad to have you onboard, and thanks for helping with the discussion! :)

jcrist commented on May 13, 2024

Apologies for the delay here. I've pushed an (unfortunately large) refactor of the backend mechanism in #76, and a follow-up PR using it to add sparse functionality in #77. In short, these make changing the backend cheap, and the sparse contrib module then uses that to implement the _parametrized_by_backend_func version implicitly (rather than explicitly passing a backend around).
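
Roughly, the idea is something like this self-contained toy (backend_context here is a stand-in for the real mechanism in #76, not its actual API):

import contextlib
import functools

_CURRENT_BACKEND = 'numpy'

@contextlib.contextmanager
def backend_context(name):
    # Cheaply switch the active backend and restore it afterwards.
    global _CURRENT_BACKEND
    previous, _CURRENT_BACKEND = _CURRENT_BACKEND, name
    try:
        yield
    finally:
        _CURRENT_BACKEND = previous

def using_sparse_backend(func):
    # Run a dense algorithm under the sparse backend without threading
    # a backend argument through every call.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with backend_context('numpy.sparse'):
            return func(*args, **kwargs)
    return wrapper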

JeanKossaifi commented on May 13, 2024

Sparse support is added by #84 and subsequent PRs.
