Comments (19)
Thanks for opening this issue. I think tensorly should definitely have a `contrib.sparse` package for specialized sparse algorithms.
I did run some sparse tensors through existing algorithms in https://gist.github.com/stsievert/e5782ee401f0cb01407984b07c257ca0. I did this to show that sparse tensors integrated well and played nicely with the existing codebase.
This did require some small modifications to `numpy_backend.py`. These modifications amounted to avoiding certain functions that instantiate the sparse array as a dense array (e.g., `np.dot`, which I think calls `np.asarray`).
Sparse tensor creation is set up so that `K.tensor(**K.context)` returns a sparse tensor, so lines like _tucker.py#L65 work. I'm not sure if this would require a `SparseTensor` class.
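For reference, a minimal sketch of the flavour of change this involved, assuming pydata/sparse is available (illustrative only, not the exact code from the gist):

```python
import numpy as np
import sparse  # pydata/sparse; treated as optional in this sketch


def dot(a, b):
    # np.dot goes through np.asarray and would densify a pydata/sparse array,
    # so fall back to sparse.tensordot when either operand is sparse.
    # axes=1 matches np.dot for the 2-D (matrix product) case.
    if isinstance(a, sparse.SparseArray) or isinstance(b, sparse.SparseArray):
        return sparse.tensordot(a, b, axes=1)
    return np.dot(a, b)
```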
---
Cool proof of concept!
I added your suggestion as 4.
Generally, I would like to avoid adding mandatory dependencies if possible. I'm also keen on making it as easy as possible for other backends.
I feel that having a separate structure might be a happy medium? E.g.:

```python
from tensorly.contrib import sparse

t = sparse.sparse_tensor(...)
```

where `t` would either use the sparse library (NumPy backend), `torch.sparse` (PyTorch backend), `NDArray.sparse` (MXNet backend), etc.
---
Following up on this - @stsievert, keen on merging your PR :)
So far, the best (and most flexible) way forward seems to me to create a new submodule, e.g. `tensorly.contrib.sparse`, which would implement a `sparse_tensor` for each backend (similar to the main backend system). We can then work on these as we like and iterate quickly.
Any thoughts?
---
I agree with you on a high level: I'd like to see a clear separation. However, I think there has to be some intermixing due to imperfections. I think the goal should be to touch the backend as little as possible, and not to add any additional requirements. For example,

- `T.reshape` had to be modified because `np.reshape` converts to `np.ndarray`, but `x.reshape` can be used for both sparse and NumPy arrays.
- `T.partial_svd` had to be modified to support the `contrib.sparse` module (because of some `dot` use).
Most of the modifications to `numpy_backend.py` could rely on some function `_is_sparse` (which currently uses `isinstance` and adds another dependency, but we could also use `"sparse" in type(tensor).__module__`).
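A rough sketch of that dependency-free check (hypothetical helper name, following the suggestion above):

```python
def _is_sparse(tensor):
    # Avoid importing the sparse library just for an isinstance check:
    # pydata/sparse classes (COO, DOK, ...) are defined in the "sparse"
    # package, so inspecting the defining module is enough for dispatch.
    return 'sparse' in type(tensor).__module__
```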
---
I think they need to be separate (at least until the API has matured), and they are anyway separated in most frameworks (e.g. pytorch). I do agree that there is some intermixing due to imperfections.
What about an intermediate solution -- having `contrib.sparse`, which implements `sparse_tensor`, while still making the changes required to the numpy backend when they do not affect the user?
We can quickly iterate and make breaking changes as long as we keep it in contrib.
---
Hi Jean,
If you and @stsievert don't mind, I'm going to push on option 3 above, as I need some of this functionality for work. My immediate plan is to fork off of #64 and refactor that code to fit with the design you proposed above. I should have a WIP PR up sometime early this week to get quick feedback.
---
Sounds good.
So adding `contrib.sparse`, implementing `sparse_tensor`?
I guess the easiest is to also have a submodule `contrib.sparse.backend`, and import `sparse_tensor` from there according to `tensorly.get_backend()`.
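Something like this rough sketch, perhaps (module layout and names are illustrative only, assuming pydata/sparse for the numpy backend):

```python
# Hypothetical contrib.sparse.backend: choose the sparse_tensor implementation
# according to the currently active tensorly backend.
import tensorly as tl


def sparse_tensor(data, **context):
    backend = tl.get_backend()
    if backend == 'numpy':
        import sparse  # pydata/sparse
        return sparse.as_coo(data)  # context (dtype, ...) ignored in this sketch
    # torch.sparse / MXNet sparse structures could be plugged in here later.
    raise NotImplementedError(
        "sparse_tensor is not implemented for the %r backend" % backend)
```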
---
Yes, that was my plan. I plan to add a numpy backend, and dispatch on import of `contrib.sparse` (raising appropriately for the other backends).
---
Awesome! Happy to help -- either on here, or on Gitter!
---
Upon further thought/iteration, I'm not sure about the design of 3. What do you see as the API for using sparse tensors?

1. Algorithms work with either sparse/dense tensors, and potentially dispatch to custom sparse/dense implementations internally:

```python
from tensorly.contrib.sparse import sparse_tensor
from tensorly.decomposition import parafac

x = sparse_tensor(...)
factors, errors = parafac(x, ...)
```

2. Sparse is completely separate; only routines in `tensorly.contrib.sparse` work with sparse tensors:

```python
from tensorly.contrib.sparse import sparse_tensor
from tensorly.contrib.sparse.decomposition import parafac

x = sparse_tensor(...)
factors, errors = parafac(x, ...)
```

3. Sparse tensors work with some/all routines in main tensorly, and there may be custom implementations in the sparse module:

```python
from tensorly.contrib.sparse import sparse_tensor
from tensorly.decomposition import parafac
from tensorly.contrib.sparse.decomposition import sparse_parafac

x = sparse_tensor(...)
factors, errors = parafac(x, ...)

# Parafac using a better algorithm for sparse tensors
factors, errors = sparse_parafac(x, ...)
```
---
I think the realistic, easier case is 2, the ideal one 3.
In general, sparse tensors need specific, adapted algorithms. If we can also make it work with normal ones, even better, but I do not think it should be the main objective. In addition, most backends still have incomplete support for sparse tensors, so it won't be possible in the near future.
In short, I would focus on 3, making efficient algos optimised for sparse tensors, and, if possible, work out the bugs that may appear when using sparse tensors with regular algos and converge to 2.
---
> In general, sparse tensors need specific, adapted algorithms. If we can also make it work with normal ones, even better, but I do not think it should be the main objective.

I'm curious about these statements. In general I agree that, for optimal performance, specialized algorithms will likely be necessary. However, I think it would be interesting to just push pydata/sparse arrays through the current algorithms designed for dense arrays and see what happens. I think that @stsievert showed me a notebook once that did this and I was surprised at how effective it was (I expected things to blow up and densify pretty early on, but this didn't happen).
Because of this experience I wonder if a good step zero would be to just have a sparse backend alongside the others and use the existing algorithms, but with sparse implementations of the various operators (tensordot, transpose, etc.).
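For concreteness, a minimal sketch of the kind of smoke test being suggested here (whether it runs without densifying depends on the small backend tweaks discussed above):

```python
import sparse  # pydata/sparse
from tensorly.decomposition import parafac

# Random sparse 3-way tensor in COO format, ~1% density.
x = sparse.random((100, 100, 100), density=0.01, random_state=0)

# Push it through the existing dense algorithm and inspect whether the
# intermediates stay sparse or get densified along the way.
factors = parafac(x, rank=5)
```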
---
Thanks for chiming in! I agree, adding `sparse` as a backend is one way forward. However, it also has some issues: it contradicts the backend system a little (which is supposed to transparently execute computation with different frameworks, hardware, etc.). In the future it would make it harder to include a similar sparse structure for, e.g., pytorch. It is also then less clear how to have separate algos for sparse tensors. It might be best (at least for now) to separate sparse tensors from "basic" tensors.
It seems to me that approach 3 mentioned by @jcrist is the most powerful and flexible. It also encompasses just having sparse as a backend, which is why I think it could be the best way forward. Specifically, `sparse` would be the backend for sparse support (e.g. `sparse.sparse_tensor` would simply call the sparse library). We can add specific algorithms for sparse tensors and still have compatibility with the existing ones, like we would if we simply had a sparse backend. We can make small changes to the numpy backend as well to ensure compatibility, as mentioned by @stsievert.
@mrocklin do you see any issue with this approach?
---
> I think that @stsievert showed me a notebook once that did this
https://gist.github.com/stsievert/e5782ee401f0cb01407984b07c257ca0
I'd also like to see (3) in #65 (comment).
> I think it would be interesting to just push pydata/sparse arrays through the current algorithms designed for dense arrays and see what happens
👍
---
So to summarise, the plan could be:

i) add `contrib.sparse.sparse_tensor`, which will call the sparse library (with the numpy backend)
ii) see what happens when using `sparse_tensor` with the current algos
iii) make changes to the numpy backend to fix these (where possible and transparent to the user)
iv) implement sparse-specific algos in `contrib.sparse` (and utils, such as @stsievert's FROSTT dataloader)

Also adding @stsievert's notebook to the examples as it's pretty cool! :)
@jcrist might be worth double-checking with @stsievert to not duplicate work? @mrocklin any thoughts?
---
> see what happens when using `sparse_tensor` with the current algos

This is the part that's not clear to me given the proposed layout. Since the current algorithms all dispatch using the dense backends, how do you see the sparse backend being called by those algorithms? I can think of a few options off the top of my head:

- Have the backend dispatch on sparse/dense internally and keep the algorithm code the same. This has the same downsides as your option 2 above, as it's the same implementation.
- Have algorithms dispatch on sparse/dense using a backend-specific `is_sparse` function. A messier version of the above; not too keen on this.
- Pull out the core bits of the algorithms to be parametrized by backend, then import them in `tensorly.contrib.sparse.*` to implement the sparse versions using the dense functions. This might look like:

```python
# In tensorly.*
def _parametrized_by_backend_func(a, b, c, backend=None):
    ...

def dense_func(a, b, c):
    return _parametrized_by_backend_func(a, b, c, get_dense_backend())


# In tensorly.contrib.sparse.*
def sparse_func(a, b, c):
    return _parametrized_by_backend_func(a, b, c, get_sparse_backend())
```
I'd be curious to hear what you're envisioning for this @JeanKossaifi.
> @jcrist might be worth double-checking with @stsievert to not duplicate work?
For transparency, I'm taking over a work project from @stsievert as he heads back to school (although if he has time he might still contribute :)).
---
I totally agree with your first two points. Indeed, adding sparse support is not as trivial as one would wish...
- Do you think that having `sparse_tensor` as part of the main backend (like @stsievert was suggesting) would be better (so we can use a different method depending on whether the input is dense or sparse)? Ideally I was hoping the backends would deal with sparse tensors natively (e.g. torch.svd would work for both dense and sparse), but since `sparse` is a separate library and not part of numpy, I guess it creates issues...
- The `_parametrized_by_backend_func` solves the issue but might be a little obfuscated? Do you think it's easier to write a specific sparse version, at least for now? This seems to be the approach in most libraries, and we can slowly start improving them to take better advantage of the sparse structure.
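For the flavour of what a "specific sparse version" of a primitive could look like, here is a rough, hypothetical sketch of a mode-`n` unfolding that keeps a pydata/sparse COO array sparse throughout (not code from any existing PR):

```python
import numpy as np
import sparse  # pydata/sparse


def sparse_unfold(tensor, mode):
    # Same definition as the dense unfolding (move `mode` to the front, then
    # flatten the remaining axes), but using COO's own transpose/reshape so
    # nothing is converted to a dense ndarray along the way.
    axes = (mode,) + tuple(ax for ax in range(tensor.ndim) if ax != mode)
    n_rows = tensor.shape[mode]
    n_cols = int(np.prod(tensor.shape)) // n_rows
    return tensor.transpose(axes).reshape((n_rows, n_cols))
```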
> For transparency, I'm taking over a work project from @stsievert as he heads back to school (although if he has time he might still contribute :)).
Glad to have you onboard, and thanks for helping with the discussion! :)
---
Apologies for the delay here. I've pushed an (unfortunately large) refactor of the backend mechanism in #76, and a follow-up PR using that to add sparse functionality in #77. In short, these make changing the backend cheap, and then use that in the sparse contrib module to implement the `_parametrized_by_backend_func` version implicitly (rather than explicitly passing a backend around).
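A rough sketch of the idea (not the actual code in #76/#77): instead of threading a backend argument through every function, the global backend is switched around the call, which only works well once switching is cheap.

```python
import contextlib
import tensorly as tl


@contextlib.contextmanager
def using_backend(name):
    # Hypothetical helper: temporarily switch the active tensorly backend
    # and restore the previous one afterwards.
    previous = tl.get_backend()
    tl.set_backend(name)
    try:
        yield
    finally:
        tl.set_backend(previous)
```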
---
Sparse support is added by #84 and subsequent PRs.