Comments (4)
I'm currently working with matrices where >99% of the values are 0, and it's very unlikely that most of the blocks will be larger than 1x1. However, the libsmm_acc code makes me think it could still be useful there, since SMM is used for the smaller blocks and there's some CUDA optimization.
The <1% occupancy is fine for the library. However, as you pointed out, the library is really optimized for block-sparsity (DBCSR = distributed block CSR), so you will not get any special optimization for the multiplication of such blocks (they will simply run on the CPU, i.e. no GPU involved).
I was able to create large matrices by modifying the example dbcsr_example_3.cpp and got 1 TFLOP/s performance, but with an occupancy of 1.
For this particular case of dense matrices, we run a "densification" algorithm.
It looks like when I assign values with c_dbcsr_iterator_next_2d_block_d, it treats the blocks like full 2D matrices. I could still calculate the 1x1 blocks and their resulting locations, but I feel like that wouldn't be very fast.
This is correct: we always assume blocks, so single elements are treated as 1x1 blocks. We don't run any special optimization for that (it is a bit out of scope for DBCSR). The library will still work, but I would expect the performance to be low.
In conclusion, sparsity is fine (we have run tests with 0.01% occupancy), but we do expect block-sparse matrices to get the best performance out of the library.
from dbcsr.
Fair. I guess 'maximum local sparsity' should be something like 75% for a block size, and it might be useful to do a game of life style convolve, 'if 3 values in a square of 4 involving this location are non-zero, then count this zero as non-zero', then build an R-tree for 100% dense rectangles, if you know there's going to be grouping but you don't know exactly where.
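The fill rule described above can be sketched in plain Python. This is only an illustration of the idea, not part of DBCSR: the function name `fill_near_dense` and the list-of-lists boolean mask are assumptions made for the example, and the R-tree step is left out.

```python
def fill_near_dense(mask):
    """Apply one pass of the 2x2 fill rule to a boolean occupancy mask.

    A zero cell is marked as occupied when the other three cells of any
    2x2 square containing it are non-zero. The input mask is not modified.
    """
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                continue
            # Each 2x2 square containing (r, c) is identified by its top-left corner.
            for r0 in (r - 1, r):
                for c0 in (c - 1, c):
                    if r0 < 0 or c0 < 0 or r0 + 1 >= rows or c0 + 1 >= cols:
                        continue  # square would fall outside the matrix
                    filled = sum(
                        bool(mask[i][j])
                        for i in (r0, r0 + 1)
                        for j in (c0, c0 + 1)
                    )
                    if filled == 3:
                        out[r][c] = True
    return out


example = [
    [True, True, False],
    [True, False, False],
    [False, False, False],
]
padded = fill_near_dense(example)
```

Counts are always taken from the original mask, so one pass cannot cascade; repeating the pass would grow the filled regions further.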
One of my matrices might be like that in a best-case scenario, with most non-zero elements along the diagonal, and I might be able to just set some grouping on the diagonal ahead of time, but the other matrices will actually avoid clumping.
Anyway, I think that answers the question in this issue: libsmm_acc does not optimize 1x1 blocks, nor does any other part of the library, since that's outside the scope of DBCSR. If that's correct, I think this can be closed.
This is correct. Note that the library will work for 1x1 elements, but there will be a massive overhead from assuming "blocks" (lookup of the dimensions and calculation of the position), which is useless for single elements. On the other hand, we could think of building the matrices with blocks of a given dimension, where the blocks are sparse inside. Then what we need is a way to group 1x1 blocks into larger blocks...
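As a rough way to judge whether such grouping would pay off, one could bin the non-zero coordinates into fixed-size blocks and look at how full each block is. The sketch below is a hypothetical helper in plain Python (`block_fill` is not a DBCSR function), assuming the non-zeros are available as `(row, col)` pairs and that blocks are square and aligned to multiples of the block size.

```python
from collections import Counter


def block_fill(coords, block):
    """Map each occupied block (block_row, block_col) to its fill fraction.

    coords: iterable of (row, col) positions of non-zero elements.
    block:  edge length of the square blocks.
    """
    counts = Counter((r // block, c // block) for r, c in coords)
    return {key: n / (block * block) for key, n in counts.items()}


nonzeros = [(0, 0), (0, 1), (1, 1), (5, 5)]
fill = block_fill(nonzeros, 2)
# Blocks at or above a chosen threshold could be stored as dense blocks.
dense_enough = [key for key, frac in fill.items() if frac >= 0.75]
```

If almost every occupied block ends up with a single element, as expected for matrices that avoid clumping, grouping into larger blocks mostly adds stored zeros.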
Out of curiosity, what's your use case?
I'm trying to get optimized sparsity working for linear layers in PyTorch, preferably with each layer being variational to maintain sparsity, and then for conv layers. I'm now looking at smaller SpGEMM and SpMSpV libraries that look hopeful, but I'll have to test them.
Unfortunately, that variational rule is likely what's going to make it so that most of the blocks, even 2x2, will be sparse. And now that I think about it, that might apply to the synapses as well as the input, since having meaningfully different input connections would help keep an output from activating at the same time as another output.