I have a problem with switching from a feature detection as originally proposed to an

Feature Detection "is faster" (feature-detection, OPEN)

webassembly commented on June 8, 2024

Feature Detection "is faster"

from feature-detection.

Comments (7)

titzer commented on June 8, 2024 1

In my mind, "is_fast" amounts to "the engine+cpu have native support for this" rather than it being emulated expensively in software. For example, in the case of SIMD, that means having native 128-bit registers and the majority of the spec'd instructions implemented in hardware, rather than having the engine emulate them with scalar code.

Of course microprocessor generations are going to vary on the exact cost of some instructions, but that's not what I had taken "is_fast" to mean.

penzn commented on June 8, 2024 1

There is a lot of gray area within "native support": some instructions can be substantially more expensive on one platform than on another. And by substantial I don't mean 20x, but enough to make their use in SIMD-enabled algorithms tricky.

This isn't a purely theoretical problem. For example, here is a popular piece of code that tests for x86 vs Arm to pick different microkernel implementations: https://github.com/google/XNNPACK/blob/852f70d3157ff847a316ae9321bc142be77cee87/src/init.c#L80-L85

Given that this has been known for a long time and we have standardized SIMD, I think we are comfortable with this kind of code. I am wondering if this can be standardized in some form (probably not exactly is_wasm_x86, maybe something like is_swizzle_fast). I feel that relying on behavior not documented by the spec can lead to compatibility issues down the road. For example, imagine an engine that decided to quiet the sign bit for whatever reason, or somebody modifying the check without understanding what it is supposed to do.
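
For readers who have not seen this style of check, here is a minimal sketch of the idea in Python rather than the C used by XNNPACK. It assumes the commonly observed behavior that x86/x86-64 hardware sets the sign bit on a NaN generated from non-NaN inputs, while most other architectures leave it clear. The function names are hypothetical, and the result is only a hint about the host, which is exactly the undocumented behavior the paragraph above worries about.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Return the raw IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def generated_nan_has_sign_bit() -> bool:
    """Subtract two infinities and report whether the resulting NaN is negative.

    Assumption (not guaranteed by any spec): a True result suggests an
    x86-like host, a False result suggests something else such as Arm.
    """
    inf = float("inf")
    generated = inf - inf  # IEEE 754 requires this to be NaN
    if not math.isnan(generated):
        raise ArithmeticError("inf - inf did not produce NaN")
    return bool(float_bits(generated) >> 63)  # bit 63 is the sign bit


if __name__ == "__main__":
    print("generated NaN has sign bit set:", generated_nan_has_sign_bit())
```

An engine that canonicalizes NaNs, or an edit that replaces the subtraction with a NaN constant, would silently change the answer without any error being raised.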

I think @conrad-watt asked about code that does this kind of testing in practice in the 2022-04-12 meeting; apologies if I am wrong.

tlively commented on June 8, 2024

Yeah, in a world where we expose "is_fast" kind of features, we would have to leave it up to the individual engine to decide whether a feature "is fast" or not, since we can't standardize a meaning for that. You're right that speed might change between hardware, but the engine could still make a best-effort attempt to usefully hint to an application about whether an instruction set is "fast" or not. I think in practice this could end up being very useful.

rrwinterton commented on June 8, 2024

Thomas, you brought up some good points I didn't think of in leaving it to the engine to determine what is fast instead of the "application". I also agree this is a very useful idea if it can be pulled off. The problem is that, without running the exact code or something very representative, I still think it may be too much work for an engine to determine if it "is fast". There are so many dependencies for "is fast", like cache size, cache line fetches, and data alignment issues. An example in older Intel hardware was that if the SIMD data was unaligned and/or crossed cache line boundaries you could take significant performance hits and, depending on the code functionality, non-SIMD would be at parity with SIMD. (Not sure I ever saw it regress, but I guess it is possible.)
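
To make the alignment point concrete, here is a small sketch that checks whether a 128-bit (16-byte) load at a given address straddles a cache line boundary. The 64-byte line size is an assumption, common but not universal, and the function name is hypothetical.

```python
def crosses_cache_line(address: int, access_bytes: int = 16, line_bytes: int = 64) -> bool:
    """Return True if [address, address + access_bytes) spans two cache lines.

    line_bytes=64 is an assumed line size, not something queried from hardware.
    """
    first_line = address // line_bytes
    last_line = (address + access_bytes - 1) // line_bytes
    return first_line != last_line


if __name__ == "__main__":
    # Count which starting offsets inside one line make a 16-byte load split.
    splitting = [off for off in range(64) if crosses_cache_line(off)]
    print(len(splitting), "of 64 offsets split across lines")
```

Under these assumptions, any 16-byte-aligned load stays inside one line, while an unaligned stream hits a split on a fraction of its loads, which is the kind of data-dependent cost an engine cannot see when it reports a static "is fast" answer.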

tlively commented on June 8, 2024

True, if the hardware itself doesn't even get much speedup from SIMD or has a lot of performance pitfalls, then perhaps the best thing for the engine to do in that case is conservatively report "not fast." Assuming reasonably recent hardware, though, I expect the engine will mostly base the distinction on whether it polyfills/scalarizes SIMD or actually lowers it to native SIMD instructions. Again, it doesn't need to be perfect, just helpful on average.
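
As a rough sketch of how an application might consume such a hint, the snippet below picks between two dot-product kernels. Everything here is hypothetical: the hint name `simd128_is_fast`, the idea that hints arrive as a mapping, and both kernels (the "blocked" one only stands in for a real SIMD kernel). A missing hint is treated as "not fast", mirroring the conservative default described above.

```python
from typing import Callable, Mapping, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_scalar(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain scalar dot product."""
    return sum(x * y for x, y in zip(a, b))


def dot_blocked(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a SIMD kernel: accumulates into four independent lanes."""
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i, (x, y) in enumerate(zip(a, b)):
        lanes[i % 4] += x * y
    return sum(lanes)


def select_kernel(hints: Mapping[str, bool]) -> Kernel:
    """Choose a kernel from engine-provided hints (hypothetical hint name)."""
    # An absent hint is treated the same as an explicit "not fast".
    if hints.get("simd128_is_fast", False):
        return dot_blocked
    return dot_scalar


if __name__ == "__main__":
    kernel = select_kernel({"simd128_is_fast": True})
    print(kernel([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The hint only needs to be right on average: both kernels compute the same result, so a wrong hint costs performance rather than correctness.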

conrad-watt commented on June 8, 2024

@penzn that's an incredible example, thanks for bringing it up! I've not previously been exposed to code in the wild testing NaN bits in this way.

The more extreme version of my question was whether, in the presence of relaxed SIMD, any code would attempt to rely on platform-specific semantic differences beyond just making performance decisions (in particular, IIRC, testing for and relying on single-rounding FMA). But it actually looks like this question was also brought up and discussed previously in WebAssembly/relaxed-simd#44.

penzn commented on June 8, 2024

@conrad-watt, you are welcome! This is very similar to how native numerical libraries query CPU features; in fact, the same source file does native init in XNNPACK. The only real difference is the cpuinfo_has_... calls, which are the native API for features.

Edit: what I meant to say is that this interesting check is used in lieu of cpuinfo API calls. Some form of this kind of testing is necessary given the nature of SIMD algorithms, as inefficient SIMD code can be self-defeating. If we had started with first-class vector operations or first-class data arrays instead of fixed-width SIMD, this problem might not have arisen, but on the other hand we would not have been able to port many native algorithms.

The FMA discussion wasn't conclusive, but yes, it can be an example of a situation where detecting a CPU feature might be necessary at some level.
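
To illustrate what "testing for single-rounding FMA" could look like, here is a sketch that picks inputs where fused and unfused multiply-add give different answers. The fused reference is emulated with exact rational arithmetic so the example does not depend on any hardware or library FMA; the function names and the chosen constants are illustrative, not taken from the linked discussion.

```python
from fractions import Fraction
from typing import Callable

# Inputs chosen so that a*b = 1 - 2**-60 exactly, which rounds to 1.0 in binary64.
A = 1.0 + 2.0**-30
B = 1.0 - 2.0**-30
C = -1.0


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding, emulated exactly (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """a*b + c with the product rounded before the addition."""
    return a * b + c


def looks_single_rounded(madd: Callable[[float, float, float], float]) -> bool:
    """Probe: a double-rounded multiply-add loses the 2**-60 term and returns 0.0."""
    return madd(A, B, C) != 0.0


if __name__ == "__main__":
    print(looks_single_rounded(fused_multiply_add))
    print(looks_single_rounded(unfused_multiply_add))
```

Code that branches on such a probe is relying on a semantic difference, not just a performance one, which is the distinction raised earlier in the thread.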
