lokathor / wide

A crate to help you go wide. By which I mean use SIMD stuff.

Home Page: https://docs.rs/wide

License: zlib License

Rust 100.00%

rust zlib-license simd

wide's Introduction

wide

A crate to help you go wide.

Specifically, this has portable "wide" data types that do their best to be SIMD when possible.

On x86, x86_64, wasm32 and aarch64 neon this is done with explicit intrinsic usage (via safe_arch), and on other architectures this is done by carefully writing functions so that LLVM hopefully does the right thing. As Rust stabilizes more explicit intrinsics, they can be added to safe_arch and then used here.

wide's Issues

Numbers between -1 and 1 don't truncate correctly.

This works as expected:

    println!("{}", wide::f32x4::from(-1.25f32).fract()[0]); // -0.25
    println!("{}", (-1.25f32).fract()); // -0.25

    println!("{}", wide::f32x4::from(1.25f32).fract()[0]); // 0.25
    println!("{}", (1.25f32).fract()); // 0.25

This does not:

    println!("{}", wide::f32x4::from(-0.25f32).fract()[0]); // 0
    println!("{}", (-0.25f32).fract()); // -0.25

    println!("{}", wide::f32x4::from(0.25f32).fract()[0]); // 0
    println!("{}", (0.25f32).fract()); // 0.25

~/Local/widetrunc $ rustup show
Default host: x86_64-pc-windows-gnu

installed toolchains
--------------------

stable-x86_64-pc-windows-msvc (default)
nightly-x86_64-pc-windows-msvc

active toolchain
----------------

stable-x86_64-pc-windows-msvc (default)
rustc 1.36.0 (a53f9df32 2019-07-03)
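
For reference, a scalar version that a lane-wise `fract` can be checked against. This is only a sketch: `fract_lanes` is a hypothetical helper working on a plain array, not part of wide, and it assumes the fractional part is defined as `x - trunc(x)`, which agrees with the `f32::fract` results quoted above.

```rust
/// Hypothetical reference: lane-wise fractional part, computed as x - trunc(x).
fn fract_lanes(v: [f32; 4]) -> [f32; 4] {
    // `trunc` rounds toward zero, so the sign of the input is preserved.
    v.map(|x| x - x.trunc())
}
```

Feeding it the four inputs from this report, `[-1.25, 1.25, -0.25, 0.25]`, should give the same values as the scalar `fract` calls.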

Verify sin / cos assembly.

Right now we call sin and cos by calling sin_cos and then taking one or the other. However, we should verify that "the right thing" happens and it skips the final calculation of the non-used value.

Remove bytemuck dependency

It looks like the only tools from bytemuck being used are cast, cast_mut, cast_ref, Pod, and Zeroable, for a series of already known-good casts (like m128 -> [f32; 4]). These could probably be replaced with a raw transmute, which might marginally improve performance.

std feature

would enable better sqrt outside of x86, possibly other things

Missing Floor / Ceil methods

There's no Ceil or Floor method implemented for the wide types.

Based off of Issue #15, it appears that only Intel has intrinsics for floor/ceil, and other architectures will need a manually implemented version.
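
A manually implemented version could be built on truncation. The following is a minimal sketch on plain arrays, assuming a lane-wise truncate is available; `floor_lanes` and `ceil_lanes` are hypothetical names, not wide's API.

```rust
/// Hypothetical fallback: floor built from truncation, lane by lane.
fn floor_lanes(v: [f32; 4]) -> [f32; 4] {
    v.map(|x| {
        let t = x.trunc();
        // Truncation rounds toward zero, so it lands above negative non-integers.
        if t > x { t - 1.0 } else { t }
    })
}

/// Hypothetical fallback: ceil built from truncation, lane by lane.
fn ceil_lanes(v: [f32; 4]) -> [f32; 4] {
    v.map(|x| {
        let t = x.trunc();
        // Truncation lands below positive non-integers.
        if t < x { t + 1.0 } else { t }
    })
}
```

In a SIMD version the `if` would become a comparison mask and a blend, but the arithmetic is the same.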

SSE2 support in i32x8/u32x8

I'm trying to switch from f32x4 to f32x8 in my project, which is fairly straightforward thanks to wide's fallback mechanism. But I also use a lot of i32x8/u32x8, which means that on SSE2 I'm stuck with the scalar fallback, which is very slow.

Is there a reason why i32x8/u32x8 doesn't support SSE2 fallback? f32x8 also uses two m128, but requires only SSE2 and not SSSE3.
I can try implementing SSE2 fallback for i32x8/u32x8 if you're interested.

Test that lane ordering is consistent for `from` and `new` in all cases.

We probably screwed it up somewhere.

Use _mm_fmadd_ps in mul_add under feature gate
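
One possible shape for such a gate, as a sketch only: it assumes the gate is the compile-time `fma` target feature on x86_64, and `mul_add4` is a hypothetical free function on arrays rather than wide's method. Note that the fused path rounds once while the fallback rounds twice, so results can differ in the last bit.

```rust
/// Hypothetical: fused multiply-add, used only when the `fma` target feature is enabled.
#[cfg(all(target_arch = "x86_64", target_feature = "fma"))]
fn mul_add4(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    use core::arch::x86_64::{_mm_fmadd_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    unsafe {
        // a * b + c with a single rounding step.
        let r = _mm_fmadd_ps(
            _mm_loadu_ps(a.as_ptr()),
            _mm_loadu_ps(b.as_ptr()),
            _mm_loadu_ps(c.as_ptr()),
        );
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// Fallback: separate multiply and add (two rounding steps).
#[cfg(not(all(target_arch = "x86_64", target_feature = "fma")))]
fn mul_add4(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = a[i] * b[i] + c[i];
    }
    out
}
```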

Basic Rand function with SIMD

An implementation of this would be really useful for Monte Carlo simulations (pricing), and anywhere that 4 or 8 independent random numbers are needed.

https://github.com/lemire/simdpcg or
https://mathoverflow.net/questions/104915/pseudo-random-algorithm-allowing-o1-computation-of-nth-element

I'm not sure how "good" it needs to be for the purposes of this library.

I think the second one could be quite simple and efficient to implement.
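
To make the idea concrete, here is a sketch of four independent generators stepped together. It uses xorshift32, which is neither of the two approaches linked above; it was picked only because it is short and maps directly onto lane-wise shifts and XORs. `XorShift32x4` is a hypothetical type, and its quality is not claimed to be good enough for any particular use.

```rust
/// Hypothetical: four independent xorshift32 streams advanced in lock step.
struct XorShift32x4 {
    state: [u32; 4],
}

impl XorShift32x4 {
    /// Every seed must be non-zero, otherwise that lane stays at zero forever.
    fn new(seeds: [u32; 4]) -> Self {
        assert!(seeds.iter().all(|&s| s != 0), "xorshift32 state must be non-zero");
        Self { state: seeds }
    }

    /// Advance all four lanes and return the new states as the outputs.
    fn next_u32x4(&mut self) -> [u32; 4] {
        for x in self.state.iter_mut() {
            *x ^= *x << 13;
            *x ^= *x >> 17;
            *x ^= *x << 5;
        }
        self.state
    }
}
```

Each step is three shifts and three XORs per lane, which is the kind of work that vectorizes well.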

Add trunc method

Specifically _mm_cvttps_epi32.

It looks like it is missing, or I could not find it.

f32x2?

pathfinder_simd implements it via f32x4. It's very useful when working with points/vectors.

f32x4::try_from([i32; 4])

Waiting on rust-lang/rust#64693

Debug impl doesn't have commas

Right now it looks like

f32x4(0.99997190.999972160.99997260.9999726)

Zero dependency variant

Currently, the crate has two dependencies, and for such a simple and low-level crate I would have assumed that it should not have any dependencies at all. At least this is what I'm trying to do with my crates.

Would you be interested in removing all the dependencies?

Use rustfmt

I understand that this is a controversial question, but it's weird to see a Rust crate that doesn't follow Rust's code style, like basically everyone else.

Is there a chance you would switch to the default code style?

Add atan/atan2

For @termhn

{floating}::acos()

Needed for spherical linear interpolation

How do you test scalar implementations?

The crate doesn't have dedicated SSE/AVX features and relies on target_feature, but it's kinda broken. How do you test the scalar implementations?

[type]::splat()

should work the same as how VectorType::from(ScalarType) works now, just another more explicit name for it
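
A sketch of what that could look like, using a hypothetical array-backed `F32x4` as a stand-in for the real type:

```rust
/// Hypothetical stand-in for a wide type, backed by a plain array.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x4([f32; 4]);

impl From<f32> for F32x4 {
    fn from(x: f32) -> Self {
        Self([x; 4])
    }
}

impl F32x4 {
    /// Same behavior as `From<f32>`, under a more explicit name.
    #[inline]
    #[must_use]
    fn splat(x: f32) -> Self {
        Self::from(x)
    }
}
```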

u16x16

This one fits into 256 bits, so it can be implemented via AVX.

as_slice method

Is there a method to get the wide type content as a slice of primitive types? Like f32x4 -> [f32; 4]?

f32x4::from_str(&str)

We should probably support this I guess. Low priority though.

Should Debug include the type name?

The Debug output for, let's say, f32x4 looks like this:

(1.95763, 1.95276, 1.9478897, 1.9430195)

Maybe it would be better to have something like:

f32x4(1.95763, 1.95276, 1.9478897, 1.9430195)
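
For illustration, the standard library's `debug_tuple` helper produces exactly this shape, with the type name and comma-separated lanes. The `F32x4` wrapper below is a hypothetical stand-in, not the crate's type.

```rust
use core::fmt;

/// Hypothetical stand-in for a wide type, backed by a plain array.
struct F32x4([f32; 4]);

impl fmt::Debug for F32x4 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `debug_tuple` writes the name, then each field separated by ", ".
        let mut t = f.debug_tuple("f32x4");
        for lane in self.0.iter() {
            t.field(lane);
        }
        t.finish()
    }
}
```

With this, `format!("{:?}", F32x4([1.0, 2.5, 3.0, 4.0]))` yields `f32x4(1.0, 2.5, 3.0, 4.0)`.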

Sin and cos give incorrect results?

I tried to use the wide::sin_f32 and wide::cos_f32 functions. I got some confusing results so I plotted some of the results. This plot shows the functions' results as well as the real sin and cos values on the interval -6.5..6.5 (about minus 2pi to 2pi):

What gives?

f32x8 type?

Add f32xN recip and recip_sqrt

_mm_rcp_ps and _mm_rsqrt_ps correspondingly.

Add documentation

It would be great to have documentation, especially Intel-style: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3333,100,100&text=_mm512_add_epi32

FOR j := 0 to 15
	i := j*32
	dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
dst[MAX:512] := 0
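
In Rust docs, the same kind of operation description could be given as a scalar reference function. A sketch, assuming the pseudocode's addition wraps on overflow; `add_epi32_x16` is a hypothetical name:

```rust
/// Scalar reading of the pseudocode: sixteen 32-bit lanes, added with wrap-around.
fn add_epi32_x16(a: [i32; 16], b: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for j in 0..16 {
        // dst[i+31:i] := a[i+31:i] + b[i+31:i], with i = j*32
        dst[j] = a[j].wrapping_add(b[j]);
    }
    dst
}
```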

move_mask for integer types

The only hardware move mask instruction is for i8, so it's unclear what we should do for the other integer types. They could just not support move mask at all, or we could try to... just have them give 16 bits of info even if they have fewer lanes? That's weird, but it is fast.

Something to think about I guess.
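
Another option would be one bit per lane, the way `_mm_movemask_ps` reports four sign bits for f32 lanes. A scalar sketch of that behavior for four i32 lanes follows; `move_mask_i32x4` is a hypothetical name and this is not a claim about what the crate does.

```rust
/// Hypothetical scalar move mask for four i32 lanes: bit n is the sign bit of lane n.
fn move_mask_i32x4(v: [i32; 4]) -> u32 {
    let mut mask = 0u32;
    for (n, lane) in v.iter().enumerate() {
        // Shift the sign bit down to bit 0, then up into position n.
        mask |= ((*lane as u32) >> 31) << n;
    }
    mask
}
```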

[type]::any() [type]::all() [type]::none()

For use with "mask" values returned after comparison ops.

for example,

if thing.cmp_eq(other_thing).any() {
   // do_the_thing
}
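
A sketch of the three queries on a hypothetical mask type. It assumes each lane of a comparison result is either all ones (true) or all zeros (false); `Mask32x4` is an illustrative name.

```rust
/// Hypothetical comparison result: each lane is u32::MAX (true) or 0 (false).
#[derive(Clone, Copy)]
struct Mask32x4([u32; 4]);

impl Mask32x4 {
    /// True if at least one lane is set.
    fn any(self) -> bool {
        self.0.iter().any(|&lane| lane != 0)
    }

    /// True if every lane is fully set.
    fn all(self) -> bool {
        self.0.iter().all(|&lane| lane == u32::MAX)
    }

    /// True if no lane is set.
    fn none(self) -> bool {
        !self.any()
    }
}
```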

Boolean masks should become their own type

some people like the increased type safety.

this is a breaking change so it'd be in 0.6, but otherwise it's not hard.

Upgrade to avx instructions to double simd width f64x4, f32x8, u8x32 etc

It would be nice to continue with stable releases and have access to safe_arch_m256 functions

f32x4 PartialEq / PartialOrd replacement code

We can't implement the literal PartialEq and PartialOrd traits, but we can provide methods that do a wide comparison and then give a wide output of some sort.
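
The reason is that `PartialEq::eq` must return a single `bool`, while a wide comparison has one answer per lane. A sketch of the method-based alternative, using a hypothetical array-backed `F32x4` and an all-ones/all-zeros lane mask as the output:

```rust
/// Hypothetical stand-in for a wide type, backed by a plain array.
#[derive(Clone, Copy)]
struct F32x4([f32; 4]);

impl F32x4 {
    /// Lane-wise `==`: u32::MAX where the lanes are equal, 0 otherwise.
    fn cmp_eq(self, rhs: Self) -> [u32; 4] {
        let mut out = [0u32; 4];
        for i in 0..4 {
            if self.0[i] == rhs.0[i] {
                out[i] = u32::MAX;
            }
        }
        out
    }

    /// Lane-wise `<`: u32::MAX where self is less than rhs, 0 otherwise.
    fn cmp_lt(self, rhs: Self) -> [u32; 4] {
        let mut out = [0u32; 4];
        for i in 0..4 {
            if self.0[i] < rhs.0[i] {
                out[i] = u32::MAX;
            }
        }
        out
    }
}
```

As with scalar floats, a NaN lane compares false under both methods.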

SSE4.1 truncate always slower than SSE2 truncate

I benched the SSE4.1 truncate with the SSE2 truncate and found the SSE4.1 version to be 5%-15% slower. Might not be worth keeping?

    pub fn trunc41(self) -> Self {
        unsafe { f32x4(_mm_round_ps(self.0, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)) }
    }

    pub fn trunc2(self) -> Self {
        unsafe { f32x4(_mm_cvtepi32_ps(_mm_cvttps_epi32(self.0))) }
    }
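
One thing to weigh against the speed difference: the two versions do not agree on every input. The SSE2 route goes through i32, so it is only correct when the truncated value fits in an i32 (`_mm_cvttps_epi32` returns 0x80000000 for out-of-range inputs and NaN). The scalar sketch below illustrates the same range limit; note it is a stand-in, since Rust's `as` cast saturates rather than returning 0x80000000, and the function names are hypothetical.

```rust
/// Scalar stand-in for the SSE2 approach: convert to i32 with truncation, then back.
fn trunc_via_i32(x: f32) -> f32 {
    (x as i32) as f32
}

/// Scalar stand-in for the SSE4.1 approach: round toward zero without leaving f32.
fn trunc_direct(x: f32) -> f32 {
    x.trunc()
}
```

For small inputs such as `-1.75` both return `-1.0`, but for `3.0e9` only `trunc_direct` returns the input unchanged.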

0.5.6 isn't semver compatible

Because of #62 ("Moved cmp operators into traits. No usability changes."), one now has to import the trait to be able to use the functions, which is a breaking change. As a result, ultraviolet, for example, doesn't compile anymore.

This is ofc just assuming this crate intends to stay semver compatible in minor versions prior to a 1.0 release.
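
A small sketch of why moving an inherent method into a trait breaks callers. The module, trait, and type names here (`simd`, `CmpEq`, `F32x4`) are hypothetical and only mimic the situation described.

```rust
mod simd {
    #[derive(Clone, Copy)]
    pub struct F32x4(pub [f32; 4]);

    /// After the change, `cmp_eq` lives on a trait instead of on the type itself.
    pub trait CmpEq {
        fn cmp_eq(self, rhs: Self) -> [bool; 4];
    }

    impl CmpEq for F32x4 {
        fn cmp_eq(self, rhs: Self) -> [bool; 4] {
            let mut out = [false; 4];
            for i in 0..4 {
                out[i] = self.0[i] == rhs.0[i];
            }
            out
        }
    }
}

// Downstream code that only imported the type now also has to import the trait;
// without `CmpEq` in scope, the method call below does not resolve.
use simd::{CmpEq, F32x4};

fn lanes_equal(a: F32x4, b: F32x4) -> [bool; 4] {
    a.cmp_eq(b)
}
```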

impl From<f32x4> for [f32; 4]

or something
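
A sketch of the conversion, using a hypothetical array-backed `F32x4`. If the real type wraps an `m128`, the body would be a cast rather than a field access.

```rust
/// Hypothetical stand-in for a wide type, backed by a plain array.
#[derive(Clone, Copy)]
struct F32x4([f32; 4]);

impl From<F32x4> for [f32; 4] {
    #[inline]
    fn from(v: F32x4) -> Self {
        v.0
    }
}
```

Callers could then write `let lanes: [f32; 4] = v.into();`.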

no_std

The following functions are not available in core, not even as nightly intrinsics, so we can't flip on no_std until we have some way to do them ourselves.

Give i32x4 a From impl that uses _mm_set1_epi32

The impl From<i32> for i32x4 uses new(i32, i32, i32, i32) always. There's an intrinsic to not do that.

impl From<i32> for i32x4 {
  #[inline]
  #[must_use]
  fn from(i: i32) -> Self {
    magic(_mm_set1_epi32(i))
  }
}
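
A compilable sketch of the same idea, returning a plain array so that it stands alone. `splat_i32x4` is a hypothetical name, the transmute stands in for whatever conversion the crate actually uses, and the non-x86_64 branch is only there so the example builds everywhere.

```rust
/// Hypothetical: broadcast one i32 to four lanes with a single intrinsic.
#[cfg(target_arch = "x86_64")]
fn splat_i32x4(i: i32) -> [i32; 4] {
    use core::arch::x86_64::{__m128i, _mm_set1_epi32};
    // SSE2 is part of the x86_64 baseline, so the intrinsic is always available here.
    // __m128i and [i32; 4] are both 16 bytes, so the transmute is size-correct.
    unsafe { core::mem::transmute::<__m128i, [i32; 4]>(_mm_set1_epi32(i)) }
}

/// Fallback for other architectures: fill the array directly.
#[cfg(not(target_arch = "x86_64"))]
fn splat_i32x4(i: i32) -> [i32; 4] {
    [i; 4]
}
```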

core::intrinsics::sqrtf32(x) Call to unsafe function

Line 138 in lib.rs needs to be changed to:

unsafe { core::intrinsics::sqrtf32(x) }

Otherwise the build will fail on nightly for platforms without SSE support. I found this when I tried to build ultraviolet for Wasm.

exp
log/ln

f32x4 missing methods

Deprecated

~~abs_sub~~

Unstable (but stable soon!)

div_euclid
rem_euclid
clamp

Needs Other Types

to_bits
from_bits

Not Sure How To Handle Bools Yet

Unify the ConstUnionHack thing

Right now we have an i32x4 version and a f32x4 version but really we just need a 128-bit version and then have all sorts of types within it.
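
A sketch of what a single 128-bit version might look like. `Bits128` and its field names are hypothetical, and the real hack may be shaped differently; the point is only that one union can hold every lane layout and be read in a const context.

```rust
/// Hypothetical unified union: one 128-bit value viewed as different lane types.
#[repr(C)]
union Bits128 {
    f32s: [f32; 4],
    i32s: [i32; 4],
    u32s: [u32; 4],
}

/// 0x3F80_0000 is the bit pattern of 1.0f32.
const ONE_F32X4: [f32; 4] = unsafe { Bits128 { u32s: [0x3F80_0000; 4] }.f32s };

/// All bits set, read back as signed lanes.
const ALL_ONES_I32X4: [i32; 4] = unsafe { Bits128 { u32s: [u32::MAX; 4] }.i32s };
```

Every field is plain old data of the same size, so reading a field other than the one written is a bit-for-bit reinterpretation.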
