Comments (4)
DGEMM cannot and will not be performance portable unless your standards for performance are low. There are decades of empirical results confirming this.
from blis.
With a pinch of autotuning, I think it is possible. For example, see what Triton does for matrix multiplication: no hard-coding of important parameters, and a higher-level abstraction. Nothing specific to Nvidia anymore.
If your definition of performance portability is the same binary and no hardware-specific optimizations, I agree with you. But I guess my definition of that is not dissimilar to BLIS's own version of it: https://github.com/flame/blis/tree/master/config
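To make "a pinch of autotuning" concrete, here is a minimal sketch in plain Python. It is not how Triton or BLIS tunes anything: the function names (`blocked_matmul`, `autotune_block`), the candidate tile sizes, and the problem size are all assumptions chosen for illustration. The idea is only that the tile size is measured on the target machine rather than hard-coded.

```python
import time

def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square tiles of size `block`."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Work on one tile; min() handles edge tiles when block does not divide n.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c

def autotune_block(n, candidates=(4, 8, 16, 32), repeats=2):
    """Time each candidate tile size on this machine and return the fastest one.

    The candidate list and repeat count are hypothetical defaults, not tuned values.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # keep the best of `repeats` runs to reduce noise
    return min(timings, key=timings.get), timings

best_block, timings = autotune_block(48)
print("fastest tile size on this machine:", best_block)
```

The winning tile size will differ from machine to machine, which is the point: the source stays the same and only the measured parameter changes. Whether that gets close to a hand-written microkernel is exactly what this thread disagrees about.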
from blis.
my definition of that is not dissimilar to BLIS's own version of it: https://github.com/flame/blis/tree/master/config
We welcome your contributions of new configurations, to accompany the ones you've discovered in blis/config. And you are welcome to use autotuning to assist your (micro)kernel development.
from blis.
Unfortunately BLIS doesn't compile to any sort of GPU :(
from blis.