Comments (7)

titzer commented on June 8, 2024

In my mind, "is_fast" amounts to "the engine+cpu have native support for this" rather than "this is emulated expensively in software". In the case of SIMD, for example, that means having native 128-bit registers and the majority of the spec'd instructions implemented in hardware, rather than emulated with scalar code by the engine.

Of course, microprocessor generations are going to vary in the exact cost of some instructions, but that's not what I had taken "is_fast" to mean.

from feature-detection.

penzn commented on June 8, 2024

There is a lot of gray area within "native support": some instructions can be substantially more expensive on one platform than on another. By substantial I don't mean 20x, but enough to make their use in SIMD-enabled algorithms tricky.

This isn't a purely theoretical problem. For example, here is code from a popular library that tests for x86 vs. Arm to pick different microkernel implementations: https://github.com/google/XNNPACK/blob/852f70d3157ff847a316ae9321bc142be77cee87/src/init.c#L80-L85

Given that this has been known for a long time and we have standardized SIMD anyway, I think we are comfortable with this kind of code. Still, I wonder whether it can be standardized in some form (probably not exactly is_wasm_x86, maybe something like is_swizzle_fast), because relying on behavior not documented by the spec can lead to compatibility issues down the road. For example, imagine an engine that decided to clear the sign bit of generated NaNs for whatever reason, or somebody modifying the check without understanding what it is supposed to do.
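The check in the linked code relies on the sign of a NaN produced from non-NaN inputs. Below is a minimal C sketch of that idea, written for illustration rather than copied from XNNPACK; the function names are hypothetical. The WebAssembly spec leaves the sign of such a NaN nondeterministic, which is why this counts as relying on behavior the spec does not pin down: x86 hardware typically produces a NaN with the sign bit set, while Arm typically produces one with it clear.

```c
#include <math.h>
#include <stdbool.h>

/* Subtracting infinity from itself must yield a NaN (IEEE 754), but the sign
 * of that NaN is left to the hardware: x86 SSE typically sets the sign bit,
 * Arm typically clears it. Reading from a volatile object stops the compiler
 * from folding the subtraction at build time, where the result would not
 * reflect the machine the code actually runs on. */
static float nan_from_inf_minus_inf(void) {
  static const volatile float inf = INFINITY;
  return inf - inf;
}

/* Hypothetical helper: true if that NaN has its sign bit set, which (outside
 * any spec guarantee) suggests the engine is running on x86-like hardware. */
static bool nan_sign_suggests_x86(void) {
  return signbit(nan_from_inf_minus_inf()) != 0;
}
```

If an engine ever cleared the sign bit of generated NaNs, `nan_sign_suggests_x86` would quietly return false on every host: nothing fails, but every x86 user gets the Arm-tuned path.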

I think @conrad-watt asked about code that does this kind of testing in practice at the 2022-04-12 meeting; apologies if I am misremembering.

tlively commented on June 8, 2024

Yeah, in a world where we expose "is_fast"-style features, we would have to leave it up to the individual engine to decide whether a feature "is fast" or not, since we can't standardize a meaning for that. You're right that speed might vary across hardware, but the engine could still make a best-effort attempt to give the application a useful hint about whether an instruction set is "fast" or not. I think in practice this could end up being very useful.

rrwinterton commented on June 8, 2024

Thomas, you brought up a good point I hadn't considered: leaving it to the engine, rather than the application, to determine what is fast. I also agree this is a very useful idea if it can be pulled off. The problem is that, without running the exact code or something very representative, I still think it may be too much work for an engine to determine whether something "is fast". Speed depends on many factors, such as cache size, cache line fetches, and data alignment. For example, on older Intel hardware, if SIMD data was unaligned and/or crossed cache line boundaries, you could take significant performance hits, and depending on what the code did, non-SIMD code could end up at parity with SIMD. (I'm not sure I ever saw it regress, but I guess it is possible.)
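To make the alignment point concrete: an unaligned 16-byte SIMD load can straddle two cache lines. A minimal C sketch of the arithmetic, assuming a 64-byte cache line (common on x86, but an assumption here) and using hypothetical helper names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; real hardware varies. */
#define CACHE_LINE_BYTES ((uintptr_t)64)
/* Size of one 128-bit vector. */
#define VECTOR_BYTES ((uintptr_t)16)

/* True if an access of `size` bytes (size > 0) starting at address `addr`
 * touches two different cache lines. */
static bool access_crosses_cache_line(uintptr_t addr, size_t size) {
  uintptr_t first_line = addr / CACHE_LINE_BYTES;
  uintptr_t last_line = (addr + size - 1) / CACHE_LINE_BYTES;
  return first_line != last_line;
}

/* True if `addr` is aligned to a 128-bit (16-byte) vector boundary. */
static bool is_vector_aligned(uintptr_t addr) {
  return addr % VECTOR_BYTES == 0;
}
```

Because 64 is a multiple of 16, a 16-byte load from a 16-byte-aligned address never crosses a line, while one starting at offset 56 within a line spans two. Whether a loop hits the slow case can therefore depend on where the data happens to sit in memory, which an engine cannot know ahead of time.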

tlively commented on June 8, 2024

True, if the hardware itself doesn't even get much speedup from SIMD or has a lot of performance pitfalls, then perhaps the best thing for the engine to do in that case is conservatively report "not fast." Assuming reasonably recent hardware, though, I expect the engine will mostly base the distinction on whether it polyfills/scalarizes SIMD or actually lowers it to native SIMD instructions. Again, it doesn't need to be perfect, just helpful on average.
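A minimal sketch of the kind of best-effort rule described here, from the engine's side. Everything in it is hypothetical: no such query exists in the spec, and the enum, struct, and function names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical: how the engine compiles 128-bit SIMD on the current host. */
typedef enum {
  SIMD_LOWERED_NATIVE, /* SIMD ops map to native vector instructions */
  SIMD_SCALARIZED      /* SIMD ops are emulated lane by lane in scalar code */
} simd_lowering;

typedef struct {
  simd_lowering lowering;
  /* Set when the engine knows the hardware gains little from vector code
   * or has significant performance pitfalls. */
  bool known_pitfalls;
} simd_host_info;

/* Best-effort "is fast" hint: report fast only when SIMD is lowered to native
 * instructions and nothing known argues against it; otherwise stay
 * conservative and report "not fast". */
static bool simd_is_fast_hint(simd_host_info host) {
  return host.lowering == SIMD_LOWERED_NATIVE && !host.known_pitfalls;
}
```

An application would read such a hint once and choose between a SIMD and a scalar code path. Since the hint is only advisory, both paths must compute the same result; a wrong hint costs performance, never correctness.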

conrad-watt commented on June 8, 2024

@penzn that's an incredible example, thanks for bringing it up! I've not previously been exposed to code in the wild testing NaN bits in this way.

The more extreme version of my question was whether, in the presence of relaxed SIMD, any code would attempt to rely on platform-specific semantic differences beyond just making performance decisions (in particular, IIRC, testing for and relying on single-rounding FMA). But it looks like this question was already brought up and discussed in WebAssembly/relaxed-simd#44.
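Testing for single-rounding FMA is possible because fused and unfused multiply-add give different answers on carefully chosen inputs. A minimal C sketch, assuming the multiply-add under test is reachable as a function pointer; all names are hypothetical, and this illustrates the arithmetic rather than anything taken from the relaxed-simd discussion.

```c
#include <stdbool.h>

/* A multiply-add whose rounding the caller does not know in advance, standing
 * in for a relaxed multiply-add that may or may not be fused. */
typedef float (*madd_fn)(float a, float b, float c);

/* With a = 1 + 2^-12 and c = -(1 + 2^-11), the exact product a * a is
 * 1 + 2^-11 + 2^-24.
 * - Single rounding (fused): the result is exactly 2^-24.
 * - Two roundings: a * a is first rounded to float. 2^-24 is exactly half an
 *   ulp at 1.0, and round-to-nearest-even drops it, so the result is 0. */
static bool madd_is_single_rounding(madd_fn madd) {
  const float a = 1.0f + 0x1p-12f;
  const float c = -(1.0f + 0x1p-11f);
  return madd(a, a, c) != 0.0f;
}

/* Stand-in with two roundings: storing to a volatile float forces the product
 * to be rounded to single precision before the add. */
static float madd_two_roundings(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

/* Stand-in with a single rounding for these inputs: the product of two floats
 * is exact in double, and so is the sum here, leaving only the final
 * conversion to float. (Not a general-purpose fmaf replacement.) */
static float madd_single_rounding(float a, float b, float c) {
  return (float)((double)a * (double)b + (double)c);
}
```

Code that then *relies* on the fused result, rather than just picking a faster kernel, is exactly the semantic dependence the question above is about.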

penzn commented on June 8, 2024

@conrad-watt, you are welcome! This is very similar to how native numerical libraries query CPU features; in fact, the same source file also does the native initialization in XNNPACK. The only real difference is the cpuinfo_has_... calls, which are the native API for querying features.

Edit: what I meant to say is that this interesting check is used in lieu of cpuinfo API calls. Some form of this kind of testing is necessary given the nature of SIMD algorithms, as inefficient SIMD code can be self-defeating. If we had started with first-class vector operations or first-class data arrays instead of fixed-width SIMD, this problem might not have arisen, but on the other hand we would not have been able to port many native algorithms.
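The pattern described here (detect once, then select microkernels) can be sketched as a function-pointer table filled at startup. This is a minimal, hypothetical C illustration, not XNNPACK's actual code: the kernel names, the config struct, and the boolean detection input are all assumptions, and both variants are plain C so the sketch stays portable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kernel signature: y[i] += a * x[i] for i in [0, n). */
typedef void (*axpy_kernel)(size_t n, float a, const float *x, float *y);

/* Stand-ins for two differently tuned microkernels. In a real library these
 * would be SIMD code tuned for each family of hardware. */
static void axpy_for_x86(size_t n, float a, const float *x, float *y) {
  for (size_t i = 0; i < n; i++) {
    y[i] += a * x[i];
  }
}

static void axpy_for_arm(size_t n, float a, const float *x, float *y) {
  size_t i = 0;
  for (; i + 2 <= n; i += 2) { /* unrolled by two */
    y[i] += a * x[i];
    y[i + 1] += a * x[i + 1];
  }
  for (; i < n; i++) { /* leftover element */
    y[i] += a * x[i];
  }
}

typedef struct {
  axpy_kernel axpy;
} kernel_config;

/* Filled once at startup. Natively, `host_looks_like_x86` would come from a
 * cpuinfo-style query; in Wasm, from a behavioral check such as the NaN test
 * discussed in this thread. */
static kernel_config init_kernel_config(bool host_looks_like_x86) {
  kernel_config cfg;
  cfg.axpy = host_looks_like_x86 ? axpy_for_x86 : axpy_for_arm;
  return cfg;
}
```

Selecting once and calling through the pointer keeps detection out of hot loops. The catch is that if the detection's meaning ever changes, the wrong variant is picked silently: results stay correct, only performance suffers.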

The FMA discussion wasn't conclusive, but yes, it can be an example of a situation where detecting a CPU feature might be necessary at some level.
