If we don't care that much about determinism and/or browser fingerprinting, then making out-of-bounds indices undefined would be the most performant option. Otherwise, I think wrap-around (i.e., taking the index modulo the length) is the best tradeoff, as it is very fast to implement on all architectures (just a bitwise AND).
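A minimal C sketch of the two wrap-around flavours, assuming 32-bit lane indices and a lane count known only at run time (both function names are hypothetical). When the lane count is a power of 2, masking with lanes - 1 gives exactly the same result as the remainder:

```c
#include <assert.h>
#include <stdint.h>

/* Wrap an out-of-range lane index by masking.
 * Only valid when `lanes` is a power of 2, where idx & (lanes - 1) == idx % lanes. */
static uint32_t wrap_index_mask(uint32_t idx, uint32_t lanes) {
    assert(lanes != 0 && (lanes & (lanes - 1)) == 0); /* power-of-2 precondition */
    return idx & (lanes - 1);
}

/* General wrap-around via the remainder operator: correct for any lane count,
 * but needs a division when `lanes` is not a compile-time constant. */
static uint32_t wrap_index_mod(uint32_t idx, uint32_t lanes) {
    assert(lanes != 0);
    return idx % lanes;
}
```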
Concerning shifts, I proposed in #27 a more general shift that takes 2 inputs (https://github.com/lemaitre/flexible-vectors/blob/master/proposals/flexible-vectors/README.md#lane-shift). It would be roughly equivalent to splice, but controlled by an integer.
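The exact semantics are in the linked README, so the following is only a sketch under an assumed definition: treat a and b as one 2n-lane vector (a first, then b) and extract the n lanes starting at lane shift, similar in spirit to SVE EXT. The name lane_shift2 and its parameters are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-input lane shift: view a:b as one 2n-lane vector and
 * copy the n lanes that start at lane `shift` into out (out must not alias a or b). */
static void lane_shift2(const int32_t *a, const int32_t *b, int32_t *out,
                        size_t n, size_t shift) {
    assert(shift <= n);
    for (size_t i = 0; i < n; i++) {
        size_t src = i + shift; /* index into the a:b concatenation */
        out[i] = (src < n) ? a[src] : b[src - n];
    }
}
```

A shift of 0 returns a unchanged and a shift of n returns b, so the same operation covers both single-input shifts and "splice-like" combinations.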
from flexible-vectors.
Wrap-around is not always a bitwise AND: RISC-V V, Arm SVE, and SimpleV all support non-power-of-2 vector lengths.
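To make the point concrete: with a 384-bit vector and 16-bit elements there are 24 lanes, and the mask shortcut silently picks the wrong lane (24 & 23 is 16, not 0). A hedged sketch of a wrap-around helper (hypothetical names) that only uses the AND when the lane count really is a power of 2:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Wrap-around that uses the cheap AND only when it is actually correct,
 * falling back to a (slower) remainder for non-power-of-2 lane counts. */
static uint32_t wrap_index(uint32_t idx, uint32_t lanes) {
    assert(lanes != 0);
    return is_pow2(lanes) ? (idx & (lanes - 1)) : (idx % lanes);
}
```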
It's true that those ISAs support non-power-of-2 vector lengths, but the architectural vector length is, AFAICT, always a power of 2, so restricting the native vector length of Wasm registers to a power of 2 is a sensible choice that would make my assertion valid.
Last I checked, an implementation of RISC-V V (or SVE) whose maximum vector length is not a power of 2 is valid.
Regarding "the architectural vector length is, AFAICT, always a power of 2": that's not true for SVE. The architecture permits implementations whose vector length is, for example, 384 bits, so proper vector-length-agnostic (VLA) code generation can't assume anything about the vector length other than that it is a multiple of 128 bits, up to and including 2048 bits. In that case, wrap-around would require the following instruction sequence, taking 16-bit elements as an example (index in W0, result in W1):
cnth x1
udiv w2, w0, w1
msub w1, w1, w2, w0
Note that there is no instruction that calculates the remainder of a division, so it has to be reconstructed from the quotient with MSUB.
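For readers less familiar with SVE, the three instructions above correspond roughly to the following scalar C model; vl_bits is a hypothetical parameter standing in for the hardware vector length that CNTH reads implicitly:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the CNTH / UDIV / MSUB sequence for 16-bit elements. */
static uint32_t sve_wrap_index_h(uint32_t idx, uint32_t vl_bits) {
    uint32_t lanes = vl_bits / 16; /* cnth x1: number of 16-bit lanes */
    assert(lanes != 0);
    uint32_t q = idx / lanes;      /* udiv w2, w0, w1: quotient */
    return idx - lanes * q;        /* msub w1, w1, w2, w0: w0 - w1 * w2 */
}
```

On a 384-bit implementation there are 24 lanes, so index 30 wraps to 6; the division is what makes this noticeably more expensive than a single AND.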
However, it is true that one option is to force all Wasm implementations to constrain the vector length to the largest power of 2 that is less than or equal to the hardware vector length (an ability that is a requirement of the architecture). It's awkward and potentially a waste of hardware resources, but possible.
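A small sketch of the clamping such a runtime would need, assuming the hardware vector length is known in bits (floor_pow2 is a hypothetical helper). On a 384-bit implementation it yields 256, leaving 128 bits of each register unused:

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of 2 that is <= bits (bits must be non-zero). */
static uint32_t floor_pow2(uint32_t bits) {
    assert(bits != 0);
    uint32_t p = 1;
    while (p <= bits / 2) /* double only while the result still fits */
        p <<= 1;
    return p;
}
```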
P.S. Changing the vector length in SVE is risky because a function further up the call chain could have saved a scalable vector register on the stack. That's why in practice probably nobody is going to do it, except when starting a new process. However, it could work in strictly controlled environments such as Wasm runtimes, which tend to do weird stuff anyway (e.g. the way linear memory bounds checking is implemented).
Good point - this needs to be stated in the spec.
For operations that select lanes, I second the option of making out-of-bounds behavior either platform-specific (see WebAssembly/relaxed-simd#22), which is a softer form of "undefined", or applying some form of truncation. Wasm SIMD tried assigning special meaning to out-of-bounds indices in swizzle, and it does not scale on x86-based platforms (WebAssembly/simd#93).
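As a rough illustration of the swizzle cost, here is a scalar C model with 16-byte vectors as plain arrays (all function names are hypothetical). Wasm i8x16.swizzle requires zero for any index >= 16, whereas x86 PSHUFB only zeroes a lane when bit 7 of its index is set and otherwise uses the low 4 bits, so a saturating add of 0x70 (PADDUSB) before the shuffle is a commonly used fix-up; the exact sequence a given engine emits may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Wasm i8x16.swizzle semantics: any index >= 16 selects zero. */
static void swizzle_wasm(const uint8_t t[16], const uint8_t idx[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = idx[i] < 16 ? t[idx[i]] : 0;
}

/* Scalar model of one PSHUFB lane: zero if bit 7 is set, else use the low nibble.
 * Indices 16..127 therefore wrap around instead of producing zero. */
static uint8_t pshufb_lane(const uint8_t t[16], uint8_t i) {
    return (i & 0x80) ? 0 : t[i & 0x0F];
}

/* Fix-up: saturating add of 0x70 maps 0..15 to 0x70..0x7F (low nibble kept)
 * and everything >= 16 to 0x80..0xFF, which PSHUFB then zeroes. */
static void swizzle_x86(const uint8_t t[16], const uint8_t idx[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        unsigned s = idx[i] + 0x70u;
        uint8_t sat = s > 0xFF ? 0xFF : (uint8_t)s; /* paddusb */
        out[i] = pshufb_lane(t, sat);
    }
}
```

The extra add is cheap for a fixed 16-lane swizzle, but the trick depends on the index range fitting below 0x80, which is part of why it does not generalize to longer, flexible vectors.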
@penzn While you are at it, the specification text relating to the lane-wise shifts mentions two input vectors a and b, while the operation signatures and the pseudocode use only one, naturally. Also, square root shouldn't be a binary operation.
I just remembered that there is an alternative way to enforce a power-of-2 vector length in an SVE-based implementation. So far my assumption has been that the governing predicates used by the generated instructions would be initialized with ptrue p0.s for 32-bit elements, for example, which is equivalent to ptrue p0.s, all. However, there is also the option of using ptrue p0.s, pow2 (and cntw x0, pow2 in some cases), which should achieve the same effect as changing the vector length to a power of 2, at least for the operations currently defined by the proposal. While arguably safer, this approach would still potentially waste hardware resources.
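A sketch of how many lanes each PTRUE pattern activates, assuming the vector length is given in bits (ptrue_active_lanes is a hypothetical helper, not an intrinsic). With a 384-bit vector and 32-bit elements, ALL activates 12 lanes while POW2 activates 8, leaving 4 lanes idle:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Active lane count for PTRUE with the ALL pattern (pow2 == false)
 * or the POW2 pattern (pow2 == true: largest power-of-2 number of lanes). */
static uint32_t ptrue_active_lanes(uint32_t vl_bits, uint32_t esize_bits, bool pow2) {
    assert(esize_bits != 0 && vl_bits >= esize_bits);
    uint32_t lanes = vl_bits / esize_bits;
    if (!pow2)
        return lanes;
    uint32_t p = 1;
    while (p <= lanes / 2) /* largest power of 2 <= lanes */
        p <<= 1;
    return p;
}
```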