wide's Introduction

License: Zlib

wide

A crate to help you go wide.

Specifically, this has portable "wide" data types that do their best to be SIMD when possible.

On x86, x86_64, wasm32 and aarch64 neon this is done with explicit intrinsic usage (via safe_arch), and on other architectures this is done by carefully writing functions so that LLVM hopefully does the right thing. As Rust stabilizes more explicit intrinsics, they can be added to safe_arch and then used here.

wide's People

Contributors

akikoskinen, auroranssolis, austinliew, cryze, fu5ha, gilescope, imurx, jonas-schievink, lokathor, mcroomp, mottl, nathanvoglsam, razaekel, razrfalcon, ronniec95, rrradicaledward, shssoichiro, torokati44, tower120, unrealinreal, waywardmonkeys

wide's Issues

Numbers between -1 and 1 don't truncate correctly.

This works as expected:

    println!("{}", wide::f32x4::from(-1.25f32).fract()[0]); // -0.25
    println!("{}", (-1.25f32).fract()); // -0.25

    println!("{}", wide::f32x4::from(1.25f32).fract()[0]); // 0.25
    println!("{}", (1.25f32).fract()); // 0.25

This does not:

    println!("{}", wide::f32x4::from(-0.25f32).fract()[0]); // 0
    println!("{}", (-0.25f32).fract()); // -0.25

    println!("{}", wide::f32x4::from(0.25f32).fract()[0]); // 0
    println!("{}", (0.25f32).fract()); // 0.25

    ~/Local/widetrunc $ rustup show
    Default host: x86_64-pc-windows-gnu

    installed toolchains
    --------------------

    stable-x86_64-pc-windows-msvc (default)
    nightly-x86_64-pc-windows-msvc

    active toolchain
    ----------------

    stable-x86_64-pc-windows-msvc (default)
    rustc 1.36.0 (a53f9df32 2019-07-03)
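
For comparison, the expected lane-wise behaviour can be sketched with plain arrays. This is not the crate's implementation: `fract4` is a hypothetical helper, and it assumes the fractional part is defined as `x - trunc(x)`, which is how `f32::fract` behaves in the standard library.

```rust
/// Hypothetical reference: lane-wise fractional part, defined as x - trunc(x).
/// A plain array stands in for the 4-lane SIMD type.
fn fract4(v: [f32; 4]) -> [f32; 4] {
    // trunc rounds toward zero, so the sign of the input is preserved.
    v.map(|x| x - x.trunc())
}
```

With this definition, inputs between -1 and 1 come back unchanged, which is what the scalar calls above report.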

Verify sin / cos assembly.

Right now we call sin and cos by calling sin_cos and then taking one or the other. However, we should verify that "the right thing" happens and the final calculation of the unused value is skipped.

Remove bytemuck dependency

It looks like the only tools from bytemuck being used are cast, cast_mut, cast_ref, Pod, Zeroable for a series of already known good casts (like m128 -> [f32; 4]). The dependency could probably be replaced with a raw transmute, which might marginally improve performance.
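
A minimal sketch of what a raw transmute for a known-good cast could look like. Plain arrays stand in for the register type here, and the function names are hypothetical; the real casts would involve the crate's own types.

```rust
use core::mem::transmute;

/// Reinterprets four f32 lanes as their raw u32 bit patterns.
fn f32x4_to_bits(v: [f32; 4]) -> [u32; 4] {
    // SAFETY: both types are 16 bytes and every bit pattern is a valid u32.
    unsafe { transmute::<[f32; 4], [u32; 4]>(v) }
}

/// Reinterprets four u32 bit patterns as f32 lanes.
fn bits_to_f32x4(bits: [u32; 4]) -> [f32; 4] {
    // SAFETY: both types are 16 bytes and every bit pattern is a valid f32.
    unsafe { transmute::<[u32; 4], [f32; 4]>(bits) }
}
```

The trade-off is that each call site has to carry its own safety argument, which is the checking that `Pod` and `Zeroable` otherwise provide.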

std feature

A std feature would enable a better sqrt outside of x86, and possibly other things.

Missing Floor / Ceil methods

There's no Ceil or Floor method implemented for the wide types.

Based off of Issue #15, it appears that only Intel has intrinsics for floor/ceil, and other architectures will need a manually implemented version.
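
One possible shape for a manually implemented version, assuming a truncation primitive is available: truncate, then correct the lanes where truncation rounded in the wrong direction. `floor4` and `ceil4` are hypothetical names, and arrays stand in for the wide types; a SIMD version would presumably replace the branch with a compare and a blend.

```rust
/// Hypothetical lane-wise floor built from truncation plus a correction.
fn floor4(v: [f32; 4]) -> [f32; 4] {
    v.map(|x| {
        let t = x.trunc();
        // Truncation rounds toward zero, so negative non-integers land one too high.
        if t > x { t - 1.0 } else { t }
    })
}

/// Hypothetical lane-wise ceil built the same way.
fn ceil4(v: [f32; 4]) -> [f32; 4] {
    v.map(|x| {
        let t = x.trunc();
        // Positive non-integers land one too low after truncation.
        if t < x { t + 1.0 } else { t }
    })
}
```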

SSE2 support in i32x8/u32x8

I'm trying to switch from f32x4 to f32x8 in my project, which is fairly straight-forward thanks to wide's fallback mechanism. But I also use a lot of i32x8/u32x8, which means that on SSE2 I'm stuck with scalar, which is very slow.

Is there a reason why i32x8/u32x8 doesn't support SSE2 fallback? f32x8 also uses two m128, but requires only SSE2 and not SSSE3.
I can try implementing SSE2 fallback for i32x8/u32x8 if you're interested.
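
The idea of such a fallback can be sketched as an 8-lane type stored as two 4-lane halves, with each operation applied to both halves. Everything here is hypothetical (`I32x8Halves`, `add4`), and arrays stand in for the 128-bit registers.

```rust
/// Hypothetical 8-lane integer type stored as two 4-lane halves,
/// mirroring how a 256-bit value can be split across two 128-bit registers.
#[derive(Clone, Copy, Debug, PartialEq)]
struct I32x8Halves {
    lo: [i32; 4],
    hi: [i32; 4],
}

/// Stand-in for a single 128-bit add; lanes wrap on overflow.
fn add4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    [
        a[0].wrapping_add(b[0]),
        a[1].wrapping_add(b[1]),
        a[2].wrapping_add(b[2]),
        a[3].wrapping_add(b[3]),
    ]
}

impl core::ops::Add for I32x8Halves {
    type Output = Self;

    fn add(self, rhs: Self) -> Self {
        // Each half is processed independently, so two 4-lane adds cover all 8 lanes.
        Self { lo: add4(self.lo, rhs.lo), hi: add4(self.hi, rhs.hi) }
    }
}
```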

Add trunc method

Specifically _mm_cvttps_epi32.

It looks like it is missing, or I could not find it.

f32x2?

pathfinder_simd implements it via f32x4. It's very useful when working with points/vectors.
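
One way to read "via f32x4" is a 2-lane type that stores four lanes and keeps the upper two as padding. The sketch below is an illustration under that assumption, not pathfinder_simd's actual code; `F32x2` and its methods are hypothetical.

```rust
/// Hypothetical 2-lane type backed by 4 lanes; the upper two lanes are padding.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x2([f32; 4]);

impl F32x2 {
    fn new(x: f32, y: f32) -> Self {
        Self([x, y, 0.0, 0.0])
    }

    fn x(self) -> f32 {
        self.0[0]
    }

    fn y(self) -> f32 {
        self.0[1]
    }
}

impl core::ops::Add for F32x2 {
    type Output = Self;

    fn add(self, rhs: Self) -> Self {
        let (a, b) = (self.0, rhs.0);
        // All four lanes are added, as a 4-wide instruction would do;
        // the padding lanes stay at 0.0 and are never exposed.
        Self([a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]])
    }
}
```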

Zero dependency variant

Currently, the crate has two dependencies, and for such a simple and low-level crate I would have assumed that it should not have any dependencies at all. At least this is what I'm trying to do with my crates.

Would you be interested in removing all the dependencies?

Use rustfmt

I understand that this is a controversial question, but it's weird to see a Rust crate that doesn't follow Rust's code style, like basically everyone else.

Is there a chance you would switch to the default code style?

[type]::splat()

This should work the same way VectorType::from(ScalarType) works now; it is just another, more explicit name for it.
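
A minimal sketch of the request, using a hypothetical array-backed `F32x4` rather than the crate's real type: `splat` simply forwards to the existing `From` conversion.

```rust
/// Hypothetical 4-lane type; an array stands in for the real storage.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x4([f32; 4]);

impl From<f32> for F32x4 {
    fn from(x: f32) -> Self {
        Self([x; 4])
    }
}

impl F32x4 {
    /// Explicitly named alias for the `From<f32>` conversion.
    fn splat(x: f32) -> Self {
        Self::from(x)
    }
}
```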

u16x16

This one fits into 256bits, so it can be implemented via AVX.

as_slice method

Is there a method to get the wide type content as a slice of primitive types? Like f32x4 -> [f32; 4]?
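
As an illustration of what such accessors might look like, here is a sketch on a hypothetical array-backed `F32x4`. A type that wraps a SIMD register directly would need a cast instead of a plain field borrow.

```rust
/// Hypothetical 4-lane type; an array stands in for the real storage.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x4([f32; 4]);

impl F32x4 {
    /// Borrows the lanes as a fixed-size array.
    fn as_array_ref(&self) -> &[f32; 4] {
        &self.0
    }

    /// Borrows the lanes as a slice.
    fn as_slice(&self) -> &[f32] {
        &self.0
    }

    /// Mutably borrows the lanes as a slice.
    fn as_mut_slice(&mut self) -> &mut [f32] {
        &mut self.0
    }
}
```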

Should Debug include the type name?

The Debug output for, let's say, f32x4 looks like this:

(1.95763, 1.95276, 1.9478897, 1.9430195)

Maybe it would be better to have something like:

f32x4(1.95763, 1.95276, 1.9478897, 1.9430195)
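
A sketch of a hand-written `Debug` impl that produces the second form. The `F32x4` struct is a hypothetical array-backed stand-in for the real type.

```rust
use core::fmt;

/// Hypothetical 4-lane type; an array stands in for the real storage.
struct F32x4([f32; 4]);

impl fmt::Debug for F32x4 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Lead with the type name, then the lanes separated by commas.
        write!(f, "f32x4(")?;
        for (i, lane) in self.0.iter().enumerate() {
            if i != 0 {
                write!(f, ", ")?;
            }
            // Forwarding to the lane's own Debug impl keeps formatter flags.
            fmt::Debug::fmt(lane, f)?;
        }
        write!(f, ")")
    }
}
```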

Sin and cos give incorrect results?

I tried to use the wide::sin_f32 and wide::cos_f32 functions. I got some confusing results so I plotted some of the results. This plot shows the functions' results as well as the real sin and cos values on the interval -6.5..6.5 (about minus 2pi to 2pi):

[Plot: wide sin and cos results compared with the real sin and cos values]

What gives?

move_mask for integer types

The only hardware move mask instruction is for i8, so it's unclear what we should do for the other integer types. They could just not support move mask at all, or we could try to... just have them give 16 bits of info even if they're fewer lanes wide? That's weird, but it is fast.

Something to think about I guess.
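
To make the two shapes concrete, here is a scalar sketch for a 4-lane i32 vector. Both functions are hypothetical and use arrays in place of registers: the first models the "16 bits" idea by taking the top bit of every byte, the second is a per-lane alternative that gives one bit per lane.

```rust
/// Hypothetical byte-wise mask: the top bit of each of the 16 bytes,
/// with byte 0 of lane 0 mapped to bit 0.
fn move_mask_bytes(v: [i32; 4]) -> i32 {
    let mut mask = 0;
    for (lane_idx, lane) in v.iter().enumerate() {
        for (byte_idx, byte) in lane.to_le_bytes().iter().enumerate() {
            let bit = (*byte >> 7) as i32;
            mask |= bit << (lane_idx * 4 + byte_idx);
        }
    }
    mask
}

/// Hypothetical per-lane mask: the sign bit of each lane, lane 0 mapped to bit 0.
fn move_mask_lanes(v: [i32; 4]) -> i32 {
    let mut mask = 0;
    for (i, lane) in v.iter().enumerate() {
        // A logical shift on the u32 view isolates the sign bit as 0 or 1.
        mask |= (((*lane as u32) >> 31) as i32) << i;
    }
    mask
}
```

For comparison results, where each lane is all ones or all zeros, the byte-wise form repeats each lane's bit four times, so callers would have to know the lane width to interpret it.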

SSE4.1 truncate always slower than SSE2 truncate

I benched the SSE4.1 truncate against the SSE2 truncate and found the SSE4.1 version to be 5%-15% slower. Might not be worth keeping?

    pub fn trunc41(self) -> Self {
        unsafe { f32x4(_mm_round_ps(self.0, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)) }
    }
    pub fn trunc2(self) -> Self {
        unsafe { f32x4(_mm_cvtepi32_ps(_mm_cvttps_epi32(self.0))) }
    }
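
One caveat with the integer round trip: it only covers inputs whose truncated value fits in an i32, and `_mm_cvttps_epi32` returns 0x80000000 for anything outside that range. The scalar sketch below shows the kind of range guard a round-trip version would need. `trunc_via_i32` is a hypothetical helper, and Rust's `as` cast stands in for the intrinsic on in-range values only.

```rust
/// Hypothetical lane-wise truncation through an i32 round trip, with a range guard.
fn trunc_via_i32(v: [f32; 4]) -> [f32; 4] {
    const LIMIT: f32 = 2_147_483_648.0; // 2^31
    v.map(|x| {
        if x.abs() < LIMIT {
            // In range: converting to i32 discards the fractional part.
            (x as i32) as f32
        } else {
            // Out of range, infinite, or NaN: finite f32 values this large
            // are already integers, so the input is returned unchanged.
            x
        }
    })
}
```

Note that the round trip also loses the sign of zero: an input of -0.25 comes back as 0.0 rather than -0.0.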

0.5.6 isn't semver compatible

Due to #62 ("Moved cmp operators into traits. No usability changes."), one now has to import the trait to be able to use the functions, which is a breaking change. Because of this, ultraviolet, for example, no longer compiles.

This is of course just assuming this crate intends to stay semver compatible in minor versions prior to a 1.0 release.
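
The effect can be reproduced in miniature. In this sketch the module, type, and trait names (`simd_lib`, `V4`, `CmpGt`) are hypothetical and only model the situation: once a method lives in a trait, callers need the trait in scope.

```rust
mod simd_lib {
    /// Hypothetical 4-lane type.
    #[derive(Clone, Copy)]
    pub struct V4(pub [f32; 4]);

    /// After the change: the comparison lives in a trait, not an inherent method.
    pub trait CmpGt {
        fn cmp_gt(self, rhs: Self) -> [bool; 4];
    }

    impl CmpGt for V4 {
        fn cmp_gt(self, rhs: Self) -> [bool; 4] {
            let (a, b) = (self.0, rhs.0);
            [a[0] > b[0], a[1] > b[1], a[2] > b[2], a[3] > b[3]]
        }
    }
}

// Without this import, `a.cmp_gt(b)` below fails to compile, even though
// the same call would compile if `cmp_gt` were an inherent method on `V4`.
use simd_lib::CmpGt;

fn any_greater(a: simd_lib::V4, b: simd_lib::V4) -> bool {
    a.cmp_gt(b).iter().any(|&lane| lane)
}
```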

no_std

The following functions are not available in core, not even as nightly intrinsics, so we can't flip on no_std until we have some way to do them ourselves.

  • acos
  • acosh
  • asin
  • asinh
  • atan
  • atan2
  • atanh
  • cbrt
  • cosh
  • exp
  • exp_m1
  • hypot
  • ln_1p
  • log
  • log10
  • log2
  • sinh
  • tanh
  • fract
  • mul_add
  • signum
  • sin_cos
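
Some of the simpler entries might be doable with bit manipulation alone. As an example, here is a sketch of `signum` that relies only on `to_bits`, `from_bits`, and `is_nan`. The helper name is hypothetical, and the behaviour is modelled on `f32::signum` from the standard library (1.0 for positive values and +0.0, -1.0 for negative values and -0.0, NaN for NaN).

```rust
/// Hypothetical `signum` written with bit operations only.
fn signum_bits(x: f32) -> f32 {
    if x.is_nan() {
        return x;
    }
    const SIGN_MASK: u32 = 0x8000_0000;
    const ONE_BITS: u32 = 0x3F80_0000; // bit pattern of 1.0_f32
    // Keep the input's sign bit and replace everything else with 1.0.
    f32::from_bits((x.to_bits() & SIGN_MASK) | ONE_BITS)
}
```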

Give i32x4 a From impl that uses _mm_set1_epi32

The impl From<i32> for i32x4 uses new(i32, i32, i32, i32) always. There's an intrinsic to not do that.

    impl From<i32> for i32x4 {
        #[inline]
        #[must_use]
        fn from(i: i32) -> Self {
            // `magic` is a placeholder for whatever wraps the raw register into i32x4.
            magic(_mm_set1_epi32(i))
        }
    }

core::intrinsics::sqrtf32(x) Call to unsafe function

Line 138 in lib.rs needs to be changed to:

unsafe { core::intrinsics::sqrtf32(x) }

Otherwise the build will fail on nightly for platforms without SSE support. I found this when I tried to build ultraviolet for Wasm.

f32x4 missing methods

Deprecated

  • abs_sub

Unstable (but stable soon!)

  • div_euclid
  • rem_euclid
  • clamp

Needs Other Types

  • to_bits
  • from_bits

Not Sure How To Handle Bools Yet

  • is_finite
  • is_infinite
  • is_nan
  • is_normal
  • is_sign_negative
  • is_sign_positive
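
One common convention for "bools" in SIMD code is a mask vector, where a lane is all ones for true and all zeros for false, so the result can feed a bitwise blend. The sketch below illustrates that option for `is_nan`; it is not a decision the crate has made, and `is_nan_mask` and `blend` are hypothetical helpers on plain arrays.

```rust
/// Hypothetical: a lane is all ones where the input is NaN, all zeros elsewhere.
fn is_nan_mask(v: [f32; 4]) -> [u32; 4] {
    v.map(|x| if x.is_nan() { u32::MAX } else { 0 })
}

/// Hypothetical bitwise blend: picks `if_set` where the mask lane is all ones
/// and `if_clear` where it is all zeros.
fn blend(mask: [u32; 4], if_set: [f32; 4], if_clear: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for i in 0..4 {
        let bits = (if_set[i].to_bits() & mask[i]) | (if_clear[i].to_bits() & !mask[i]);
        out[i] = f32::from_bits(bits);
    }
    out
}
```

For example, `blend(is_nan_mask(v), [0.0; 4], v)` replaces NaN lanes with zero and leaves the others untouched.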

Unify the ConstUnionHack thing

Right now we have an i32x4 version and a f32x4 version but really we just need a 128-bit version and then have all sorts of types within it.
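
A rough sketch of what a single 128-bit version might look like. The union name, its fields, and the constants are all hypothetical; the existing ConstUnionHack types may be laid out differently.

```rust
/// Hypothetical single 128-bit union covering several lane layouts.
#[repr(C, align(16))]
union Bits128 {
    f32s: [f32; 4],
    i32s: [i32; 4],
    u32s: [u32; 4],
    bytes: [u8; 16],
}

/// A constant written through one view...
const ONE_F32: Bits128 = Bits128 { f32s: [1.0; 4] };
const SIGN_MASKS: Bits128 = Bits128 { u32s: [0x8000_0000; 4] };

/// ...and read back through another.
fn one_bits() -> [u32; 4] {
    // SAFETY: every field is 16 bytes of plain data, so any field may be read.
    unsafe { ONE_F32.u32s }
}

fn sign_masks_as_f32() -> [f32; 4] {
    // SAFETY: as above; every bit pattern is a valid f32.
    unsafe { SIGN_MASKS.f32s }
}
```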
