Comments (5)
cc: @stevenvar
from iree-llvm-sandbox.
Thanks for your question. This is a missing feature that, thanks to recent developments, is being pushed back up our stack of shorter-term work.
I'd recommend you come ask questions in the public IREE chat named "codegen" for low-latency iteration: https://discord.gg/ZNtWrXF6.
There are a few things intersecting here, but a rough summary is:
1. For x86 we have not yet needed those: the ISA has a "broadcast scalar and FMA" op, which made transposition unnecessary (and maybe even detrimental).
2. @bjacob is also looking at matmul for other ARM targets and is iterating on a mix of rank-reducing subview/subtensor and vector.shape_cast N-d <-> 1-d.
3. @gysit is investigating an issue with 2-d convolution padding related to rank-reducing subview/subtensor.
4. We know we want to extend linalg.pad_tensor with an optional permutation_map that will carry the transposition information, and that will need to be supported by hoisting.
I think 4. is relatively easy to get started on, as far as extending the op semantics/verification/tests and tracking uses to ensure transformations fail in the presence of this permutation_map.
Vectorization should also be reasonably easy by just inserting the vector.transpose between the read/write.
Extensions to hoist padding are a bit more involved but we know what to do.
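To make point 4 concrete, here is a hypothetical sketch. The permutation_map attribute on linalg.pad_tensor does not exist yet (it is exactly the proposed extension), so the attribute name and placement below are assumptions; the surrounding op syntax follows the linalg/vector dialects:

```mlir
// Hypothetical: linalg.pad_tensor carrying an optional permutation_map
// that transposes while padding. The attribute is a proposed extension,
// not an existing one.
%padded = linalg.pad_tensor %t low[0, 0] high[%h0, %h1]
    {permutation_map = affine_map<(d0, d1) -> (d1, d0)>} {
  ^bb0(%i: index, %j: index):
    linalg.yield %cst : f32
} : tensor<?x?xf32> to tensor<16x8xf32>

// Vectorization would then just insert a vector.transpose between the
// transfer_read and the transfer_write:
%v  = vector.transfer_read %src[%c0, %c0], %cst
        : tensor<8x16xf32>, vector<8x16xf32>
%vt = vector.transpose %v, [1, 0] : vector<8x16xf32> to vector<16x8xf32>
```

The verifier work mentioned above would amount to checking that the map is a permutation consistent with the source/result shapes, and making transformations that do not understand the attribute fail instead of silently dropping it.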
Do you guys want to take ownership of point 4. and start working on core MLIR patches?
Hi Nicolas, thanks for the answer. Yes, @stevenvar is having a look at this. We saw two possibilities:
a) Having a pass that hoists the transpose out of the micro-kernel
b) Hoisting the transpose "ab initio", on the line of point 4 in your list
Do you think that a) is the wrong way to go (or that it is harder to do than b))?
About your point 1., do you mean x86 has got a fmla vec, vec, scalar? On Arm there is an indexed fmla, fmla vec_c, vec_a, vec_b[i], that broadcasts the i-th lane of vec_b into a logical vector broadcast and then does fmla vec_c, vec_a, broadcast. The point is that vec_b is still a vector that needs to be loaded from memory, rather than a single scalar.
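In vector-dialect terms, the Arm indexed fmla described above could be decomposed as follows (a sketch of an assumed lowering, not what any pipeline necessarily emits; the lane index 3 is an arbitrary example):

```mlir
// Sketch: Arm's indexed fmla as extract-lane + broadcast + fma.
// %vb remains a full vector that must be materialized in a register;
// only its i-th lane (here lane 3) feeds the multiply.
%lane  = vector.extract %vb[3] : vector<4xf32>
%bcast = vector.broadcast %lane : f32 to vector<4xf32>
%res   = vector.fma %va, %bcast, %vc : vector<4xf32>
```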
Do you think that a) is the wrong way to go (or it is harder to do than b) )?
I don't think a) is wrong in itself, but it is definitely harder given the current state of the world, and there are also tradeoffs and composability differences:
- padding and hoist padding happen at the tensor level; a) would happen on vectors, much later in the pipeline.
- hoist padding already exists. It is unclear how the higher-D tensor would be created in a). At the moment, on tensors, we do a bounding-box analysis and a few sophisticated things that would be painful to reproduce.
- there are opportunities at the tensor level regarding canonicalization with consumer linalg ops and other things that are natural on tensors and for which the vector level is too late.
- we already identified the linalg.pad_tensor extensions as something we wanted in general so there is opportunity for convergence.
About your point 1., do you mean x86 has got a fmla vec, vec, scalar?
There is an instruction: vfmadd231ps zmm0,zmm4,DWORD PTR [rsi+0x4]{1to16}.
See slide 42 of this presentation: https://drive.google.com/corp/drive/folders/1lLhWopx_WCtFq3gTDGVJEzV9hFD7dwmI.
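So on x86 the multiplicand really is a scalar coming straight from memory: the {1to16} embedded broadcast replicates one dword to all 16 lanes inside the FMA itself. In vector-dialect terms the pattern is a broadcast of a true f32 scalar followed by an fma, which the backend can fold into that single instruction (a sketch of an assumed lowering):

```mlir
// Sketch: x86-style "broadcast scalar and fma". %s is a genuine f32
// scalar (e.g. a single dword load), so no vector load of the second
// multiplicand is needed -- unlike the Arm indexed-fmla case.
%b = vector.broadcast %s : f32 to vector<16xf32>
%r = vector.fma %a, %b, %c : vector<16xf32>
```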
Thanks a lot for all the explanation. So yes, we can take ownership of point 4.