tc39 / ecmascript_simd

SIMD numeric type for EcmaScript

License: Other

Shell 0.01% JavaScript 65.18% HTML 34.80%

ecmascript_simd's Issues

Document of toUint32x4 parameter is incorrect.

@param {uint32x4} An instance of a float32x4. should be @param {uint32x4} An instance of a uint32x4.

https://github.com/johnmccutchan/ecmascript_simd/blob/master/src/ecmascript_simd.js#L667

Missing comparison operations in int32x4

float32x4 has 6 comparison operations:

lessThan
lessThanOrEqual
equal
notEqual
greaterThanOrEqual
greaterThan

int32x4 only has three:

equal
greaterThan
lessThan

Implement Uint32x4Array polyfill and tests.

Issue with use of .signMask in aobench?

I think there's a problem with how .signMask is used in the aobench benchmark:

Around line 360:

    var cond1 = SIMD.greaterThan(D, float32x4.zero());
    if (cond1.signMask) {
      var t2 = SIMD.sub(SIMD.neg(B), SIMD.sqrt(D));
      var cond2 = SIMD.and(SIMD.greaterThan(t2, float32x4.zero()),
                           SIMD.lessThan(t2, isect.t));
      if (cond2.signMask) {

This will go into the 'if' branches if just one of the x4 compares comes out to true and then do the computation for all 4. Is this intended? I don't fully understand the code, so I might have it wrong.

In any case, it's better to always compare the .signMask property with an explicit value. signMask returns an int in the range 0x0..0xf. In other words the lower 4 bits indicate the 4 compare results.
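To make the difference concrete, here is a minimal sketch in plain JavaScript. The helper names anyLaneTrue and allLanesTrue are hypothetical and not part of the polyfill; the only thing taken from the text above is that signMask is a 4-bit integer with one bit per lane.

```javascript
// Hypothetical helpers; signMask is assumed to be an integer in 0x0..0xF,
// with bit 0 for lane x, bit 1 for y, bit 2 for z and bit 3 for w.
function anyLaneTrue(signMask) {
  return signMask !== 0x0;
}

function allLanesTrue(signMask) {
  return signMask === 0xF;
}

// Only lane x passed the compare.
var mask = 0x1;
console.log(anyLaneTrue(mask));  // true: this is what `if (cond1.signMask)` tests
console.log(allLanesTrue(mask)); // false
```

Writing the comparison explicitly makes it visible whether the branch is meant for "any lane" or "all lanes".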

Add shift operations to int32x4 values

When implementing sinx4() (see benchmarks/sinx4.js), I needed a shiftLeft() operation on int32x4 values. For completeness we should add these operations on int32x4 values:

shiftLeft()
shiftRight()
shiftRightArithmetic()
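As a rough sketch of what the three operations could compute per lane, the following uses a 4-element Int32Array as a stand-in for an int32x4 value. The function names mirror the list above, but the semantics are an assumption: shiftRight is taken to be a logical (zero-filling) shift and shiftRightArithmetic a sign-preserving one.

```javascript
// Apply f to each of the 4 lanes of an Int32Array standing in for int32x4.
function mapLanes(v, f) {
  var r = new Int32Array(4);
  for (var i = 0; i < 4; i++) {
    r[i] = f(v[i]);
  }
  return r;
}

function shiftLeft(v, bits) {
  return mapLanes(v, function (x) { return x << bits; });
}

// Assumed to be a logical shift: zeros are shifted in from the left.
function shiftRight(v, bits) {
  return mapLanes(v, function (x) { return x >>> bits; });
}

// Assumed to be an arithmetic shift: the sign bit is replicated.
function shiftRightArithmetic(v, bits) {
  return mapLanes(v, function (x) { return x >> bits; });
}

var v = new Int32Array([-8, 1, 2, 3]);
console.log(shiftLeft(v, 1)[0]);            // -16
console.log(shiftRight(v, 1)[0]);           // 2147483644
console.log(shiftRightArithmetic(v, 1)[0]); // -4
```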

The behavior of float32x4 and uint32x4 called as a function rather than as a constructor

According to http://www.ecma-international.org/ecma-262/5.1/#sec-15, the behavior of a built-in ECMAScript constructor called as a function differs between the built-in objects.

For Number, String, Boolean, Object, when they are called as a function rather than as a constructor, it performs a type conversion.

For Function, Array, when they are called as a function rather than as a constructor, it creates and initialises a new Function/Array object. Thus the function call is equivalent to the object creation expression new ... with the same arguments.

We think float32x4 and uint32x4 are more like a number than an array, so we create a primitive float32x4/uint32x4 type and do the type conversion. The current polyfill implementation behaves as it would in a JavaScript engine that has no primitive float32x4/uint32x4 type.

Define select semantics if mask's values are not only all 1 or all 0

int32x4.select and float32x4.select are intended to be used with masks that have either all bits set to 1 (0xFFFFFFFF) or all set to 0 (0x0). What happens if that is not the case?

Currently, if the input mask to select is, say int32x4(0x1, 0xF, 0xFF, 0xFFFF), for instance, the resulting output vector will be a strange mix of both inputs in each lane. Is that the intended behavior?
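The "strange mix" follows directly if select is a purely bitwise operation. The sketch below assumes that behavior for a single lane; selectLane is a hypothetical helper, not the polyfill's implementation.

```javascript
// Hypothetical bitwise select for one 32-bit lane:
// bits set in mask come from t, bits clear in mask come from f.
function selectLane(mask, t, f) {
  return ((mask & t) | (~mask & f)) | 0;
}

var t = 0x11111111;
var f = 0x22222222;

// A well-formed mask picks one input entirely.
console.log((selectLane(-1, t, f) >>> 0).toString(16)); // "11111111"
console.log((selectLane(0, t, f) >>> 0).toString(16));  // "22222222"

// A partial mask mixes bits from both inputs within the lane.
console.log((selectLane(0xFF, t, f) >>> 0).toString(16)); // "22222211"
```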

NaN canonicalization of float32x4 lane accesses

An int32x4 bitcast into a NaN float32x4 may pose semantic issues in runtimes that canonicalize NaNs. For example:

var m = int32x4.bool(true, true, true, true);
var n = SIMD.int32x4.bitsToFloat32x4(m);
var n2 = SIMD.float32x4.withX(n, n.x);
var m2 = SIMD.float32x4.bitsToInt32x4(n2);
equal(m.x, m2.x); // won't be equal

See #36 and https://bugzilla.mozilla.org/show_bug.cgi?id=945382 for more details.
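The same round trip can be reproduced for one lane with standard typed arrays, without the SIMD API. This is only an illustrative sketch: whether the NaN payload survives depends on the engine, so the final bit pattern is deliberately not stated.

```javascript
// Two views over the same 4 bytes: one as int32, one as float32.
var bits = new Int32Array(1);
var asFloat = new Float32Array(bits.buffer);

bits[0] = -1;          // 0xFFFFFFFF, the "true" mask pattern
var lane = asFloat[0]; // reading the lane yields a JS Number, which is NaN

// Writing the Number back may or may not preserve the original payload,
// depending on whether the engine canonicalizes NaNs.
asFloat[0] = lane;

console.log(Number.isNaN(lane));               // true
console.log((bits[0] >>> 0).toString(16));     // engine-dependent
```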

README documentation

May be useful to move the SIMD operations back to the README even if they're now under the SIMD module rather than the float32x4 object. Otherwise the polyfill code serves as the API documentation.

Add two register shuffle / interleave operations

These were recently added to Dart and need to be added to JS.

SIMD.float32x4.fromFloat64x2 incorrect type conversion

Current implementation just copies the double value without casting to float32.
SIMD.float32x4.fromFloat64x2 = function(t) {
  checkFloat64x2(t);
  var a = SIMD.float32x4.zero();
  a.x_ = t.x_;
  a.y_ = t.y_;
  return a;
}

Using the float32x4 constructor fixes this.
SIMD.float32x4.fromFloat64x2 = function(t) {
  checkFloat64x2(t);
  var a = SIMD.float32x4(t.x_, t.y_, 0, 0);
  return a;
}
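The difference between the two versions comes down to whether the double is rounded to single precision. A minimal sketch using only standard JavaScript (Math.fround and Float32Array), not the polyfill's own code:

```javascript
var d = 0.1; // a double that is not exactly representable as float32

// Copying the value as-is keeps full double precision.
var copied = d;

// Storing through a Float32Array (or calling Math.fround) rounds to float32.
var storage = new Float32Array(1);
storage[0] = d;

console.log(copied === d);                  // true: no conversion happened
console.log(storage[0] === d);              // false: value was rounded
console.log(storage[0] === Math.fround(d)); // true
```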

Remove source type from conversion method names

With the introduction of the SIMD.float32x4 and SIMD.int32x4 subobjects, the source type on the conversion function names is redundant, e.g.

    SIMD.float32x4.float32x4toInt32x4()

Can be:

   SIMD.float32x4.toInt32x4()

Similarly for the other ones.

It even reads right :)

Reconsider name of ".toFloat32x4" and ".toUint32x4" methods

Rename Uint32x4 to Int32x4

What is the expected result of Number(float32x4Object)?

f4 = float32x4(1.0, 2.0, 3.0, 4.0);
n = Number(f4);

What is the expected result of n?

I am asking this question as we need to handle
f4 = float32x4(1.0, 2.0, 3.0, 4.0);
g4 = float32x4(f4, 2.0, 3.0, 4.0);
when introducing the float32x4 type, all the other types could be coerced to Number.

How will native code port on top of JS-SIMD?

With Emscripten, we have the capacity to port native C&C++ code to the web. When/if people read tweets along the lines of "JS has SIMD", it will invariably result in a stream of Emscripten developers attempting to port their MMX/SSE1/SSE2/... -based codebases over to JS-SIMD. We need to have an answer to these developers about what the support of mapping these constructs over to JS-SIMD looks like.

In the Emscripten compiler, we already have small bits of such SIMD support available. To chart what this mapping would look like for SSE1 in particular (focusing on just one instruction set spec to start with, and SSE1 is the most interesting one) when completed, I wrote up this spreadsheet: https://docs.google.com/spreadsheets/d/1QAGGf2M2IA6l4cvh8eTXdXGEUcPjdmTe_BLKGn5YCB4/edit?usp=sharing

As one can imagine, comparing the current spec and the set of SSE1 intrinsics listed in the above spreadsheet, there is a large gap. I wonder how this could be resolved?

Rename variables to match common expectations of types & dimensions.

Use integer property names such as "i", "j", "k" and "l". These names have become common among developers for indexes, and are inherited from mathematics, where they are used as indexes as well as unit vectors.

Use floating point property names such as "x", "y", "z" and "t" (instead of "w"). In physics, the fourth dimension of the space-time vector of special relativity is time.

Add compare operations on int32x4 values

When implementing the sinx4() function, I needed an equal() operation on int32x4 values. For completeness we should add all the 6 compare operations on int32x4 values:

lessThan()
lessThanOrEqual()
equal()
notEqual()
greaterThanOrEqual()
greaterThan()

Replace shuffle getters (xxxx, ..., wwww) with a shuffle method and mask

Introduce add, sub, and mul for uint32x4 type

There's several ways to do this:

Overload the existing operations to work on both float32x4 and uint32x4 operands

I don't like this approach. It will require the JIT generated code to insert checks on the operand types. The optimizing JIT compilers can probably hoist those checks out in most cases, but it does complicate doing inlining somewhat.

Introduce new names for float32x4 and uint32x4 operations, e.g. .addf4, addi4, etc

We'll need to add even more names when we start working on the AVX x8 types, so I'm not too fond of this solution either.

Introduce subobjects on the SIMD object to hang the operations that work on different data types on, e.g. SIMD.float32x4.add(), SIMD.uint32x4.add(), etc.

I like this approach better than the two above.

What do you think?

The polyfill for float32x4() and uint32x4() shouldn't be invoked as a constructor

float32x4 will eventually be a value_object, so to create a float32x4 variable one would write:

var f4 = float32x4(1.0,2.0,3.0,4.0)

The current polyfill implementation assumes that new values are created with the 'new' operator, i.e.

function float32x4(x,y,z,w) {
  this.storage_ = new Float32Array(4);
  this.storage_[0] = x;
  this.storage_[1] = y;
  this.storage_[2] = z;
  this.storage_[3] = w;
}

This should be implemented like this instead:

function float32x4(x,y,z,w) {
  var storage = new Float32Array(4);
  storage[0] = x;
  storage[1] = y;
  storage[2] = z;
  storage[3] = w;
  return storage;
}

I think :)

Float32x4Array constructor doesn't work

load ('ecmascript_simd.js');
var f = new Float32Array(20);
var f4 = new Float32x4Array(f.buffer);
print (f4.length); // prints 0. 5 expected.

The problem is that Float32Array(a,b,undefined) is not equivalent to Float32Array(a,b)

This patch should fix the problem:

diff --git a/src/ecmascript_simd.js b/src/ecmascript_simd.js
index e441ca5..60fc2ef 100644
--- a/src/ecmascript_simd.js
+++ b/src/ecmascript_simd.js
@@ -195,13 +195,17 @@ function Float32x4Array(a, b, c) {
       this.storage_[i] = a.storage_[i];
     }
   } else if (isArrayBuffer(a)) {
-    if ((b != undefined) && (b % Float32Array.BYTES_PER_ELEMENT) != 0) {
+    if ((b != undefined) && (b % Float32x4Array.BYTES_PER_ELEMENT) != 0) {
       throw "byteOffset must be a multiple of 16.";
     }
     if (c != undefined) {
       c *= 4;
+      this.storage_ = new Float32Array(a, b, c);
     }
+    else {
+      // Note: new Float32Array(a, b) is NOT equivalent to new Float32Array(a, b, undefined)
+      this.storage_ = new Float32Array(a, b);
+    }
-    this.storage_ = new Float32Array(a, b, c);
     this.length_ = this.storage_.length / 4;
     this.byteOffset_ = b != undefined ? b : 0;
   } else {
diff --git a/src/ecmascript_simd_tests.js b/src/ecmascript_simd_tests.js
index 374b7cb..7fb8a26 100644
--- a/src/ecmascript_simd_tests.js
+++ b/src/ecmascript_simd_tests.js
@@ -436,7 +436,7 @@ test('uint32x4 and', function() {
   equal(true, n.flagY);
   equal(true, n.flagZ);
   equal(true, n.flagW);
-  o = SIMD.and(m,n); // and
+  var o = SIMD.and(m,n); // and
   equal(0x0, o.x);
   equal(0x0, o.y);
   equal(0x0, o.z);
@@ -472,7 +472,7 @@ test('uint32x4 xor', function() {
   equal(0xAAAAAAAA, n.y);
   equal(0xAAAAAAAA, n.z);
   equal(0xAAAAAAAA, n.w);
-  o = SIMD.xor(m,n); // xor
+  var o = SIMD.xor(m,n); // xor
   equal(0x0, o.x);
   equal(0x0, o.y);
   equal(0x0, o.z);
@@ -675,6 +675,7 @@ test('Float32Array view basic', function() {
   equal(b.byteOffset, 0);
   equal(c.byteOffset, 16);
   equal(d.byteOffset, 0);
+
 });

 test('Float32Array view values', function() {
@@ -742,22 +743,26 @@ test('Float32Array view values', function() {
   equal(start+3, d.getAt(0).w);
 });

-test('Float32x4Array exceptions', function() {
+test('Float32x4Array exceptions', function () {
   var a = new Float32x4Array(4);
   var b = a.getAt(0);
   var c = a.getAt(1);
   var d = a.getAt(2);
   var e = a.getAt(3);

-  throws(function() {
+  throws(function () {
     var f = a.getAt(4);
   });
-  throws(function() {
+  throws(function () {
     var f = a.getAt(-1);
   });
-  throws(function() {
+  throws(function () {
     // Unaligned byte offset.
     var f = new Float32x4Array(a.buffer, 15);
   });
+  throws(function () {
+    // Unaligned byte offset, but aligned on 4. Bug
+    var f = new Float32x4Array(a.buffer, 4);
+  });
 });

 test('View on Float32x4Array', function() {

Add arithmetic operations to Uint32x4

Polyfill -- check runtime SIMD implementation status

Don't define the SIMD polyfills if the runtime is detected to have implemented float32x4 etc.
This enables the same code to run across different implementations.
If we don't want to keep this check in the polyfill, we should at least provide a best practice guide.
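One possible shape for such a check is sketched below. The installSimdPolyfill function and the objects passed to it are hypothetical; a real polyfill would test the actual global object and might check individual types such as float32x4 rather than the SIMD object as a whole.

```javascript
// Hypothetical guard: install the polyfill only when the runtime
// does not already provide a SIMD object.
function installSimdPolyfill(globalObject, polyfill) {
  if (typeof globalObject.SIMD !== "undefined") {
    return false; // native (or earlier) implementation wins
  }
  globalObject.SIMD = polyfill;
  return true;
}

// Stand-ins for two runtimes: one without SIMD, one with it.
var runtimeWithoutSimd = {};
var runtimeWithSimd = { SIMD: { native: true } };
var polyfill = { native: false };

console.log(installSimdPolyfill(runtimeWithoutSimd, polyfill)); // true
console.log(installSimdPolyfill(runtimeWithSimd, polyfill));    // false
console.log(runtimeWithSimd.SIMD.native);                       // true
```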

Add signMask getter to Uint32x4

Fast path testing that all lanes in a Float32x4 / Uint32x4 are zero, negative, positive, etc.

signMask should handle negative zero correctly

In the current polyfill implementation, negative zero is not handled in signMask.

For this case:

var a  = SIMD.float32x4(0.0, 0.0, 0.0, -0.0)
a.signMask

It is expected to print 8.
Currently the polyfill implementation prints 0.
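A sketch of why this happens and one way around it, using plain typed arrays rather than the polyfill's internals. Both function names are hypothetical: a comparison such as x < 0 cannot see the sign of -0, whereas reading the raw sign bit can.

```javascript
// Comparison-based version: -0 < 0 is false, so the sign of -0 is lost.
function signMaskByCompare(x, y, z, w) {
  return (x < 0 ? 1 : 0) | (y < 0 ? 2 : 0) | (z < 0 ? 4 : 0) | (w < 0 ? 8 : 0);
}

// Bit-based version: reinterpret the float32 lanes as int32 and
// take bit 31 (the IEEE 754 sign bit) of each lane.
function signMaskByBits(x, y, z, w) {
  var lanes = new Float32Array([x, y, z, w]);
  var bits = new Int32Array(lanes.buffer);
  return (bits[0] >>> 31) |
         ((bits[1] >>> 31) << 1) |
         ((bits[2] >>> 31) << 2) |
         ((bits[3] >>> 31) << 3);
}

console.log(signMaskByCompare(0.0, 0.0, 0.0, -0.0)); // 0
console.log(signMaskByBits(0.0, 0.0, 0.0, -0.0));    // 8
```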

documentation of compare instructions inconsistent

I think the compare instructions take float32x4 arguments?

@param {uint32x4} t An instance of a float32x4.
@param {uint32x4} other An instance of a float32x4.

load vec2 and vec3

I was speaking with @sunfishcode in #asm.js and brought up a use case that I don't think is represented in the API yet.

Sometimes you want to use the platform's SIMD capabilities to work on two-vectors or three-vectors. (Or two two-vectors at a time.)

SSE supports roughly three mechanisms for loading two-vectors:

movq xmm0, [eax] ; load 64-bit quantity, zero high 64 bits
movlps xmm0, [eax] ; load 64-bit quantity, do not zero high 64 bits
movhps xmm0, [eax] ; load 64-bit into high 64 bits, do not zero low 64 bits

Loading three-vectors is hard. In the past I've done something like:

movss xmm0, [eax]
movhps xmm0, [eax+4]

This leaves the 3-vector in the register like [x, _, y, z], which is as fine as any other data layout.

I think it would be beneficial for the API to provide, at the least,

load a 3-vector into a vec4
load a 2-vector into the low components of a vec4
load a 2-vector into the high components of a vec4

Document of bitsToUint32x4 parameter is incorrect

@param {uint32x4} t An instance of a uint32x4 should be @param {float32x4} t An instance of a float32x4

https://github.com/johnmccutchan/ecmascript_simd/blob/master/src/ecmascript_simd.js#L420

Function to load consecutive values from an array

It seems like it would be useful to have something like SIMD.float32x4.load(array, offset) that is equivalent to SIMD.float32x4(array[offset+0], array[offset+1], array[offset+2], array[offset+3]). This is clearly equivalent but easier to optimize. Also, I would explicitly not want to require that offset be aligned in any particular way.

One open question is how permissive to be: specifically, could array be any array-like, or must it be a typed array / typed object array of a suitable type?
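The equivalence described above can be sketched as follows. loadFloat32x4 is a hypothetical stand-in that returns a 4-element Float32Array instead of a real float32x4; it accepts any array-like, which is only one of the possible answers to the open question.

```javascript
// Hypothetical load: four consecutive elements starting at offset,
// with no alignment requirement on offset.
function loadFloat32x4(array, offset) {
  return new Float32Array([
    array[offset + 0],
    array[offset + 1],
    array[offset + 2],
    array[offset + 3]
  ]);
}

var plain = [10, 11, 12, 13, 14, 15];
var typed = new Float32Array(plain);

console.log(Array.from(loadFloat32x4(plain, 1))); // [11, 12, 13, 14]
console.log(Array.from(loadFloat32x4(typed, 2))); // [12, 13, 14, 15]
```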

mandelbrot feeds a float32x4 to SIMD.toFloat32x4

SIMD.toFloat32x4 should take an uint32x4 as parameter.
https://github.com/johnmccutchan/ecmascript_simd/blob/master/src/benchmarks/mandelbrot.js#L44

mulu32 implementation seems incorrect for w

Math.imul(a.w * b.w) should be Math.imul(a.w, b.w)?
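The one-argument form does not fail loudly, which makes the bug easy to miss. A short sketch using only the standard Math.imul:

```javascript
var a = 3;
var b = 4;

// One argument: the second operand is undefined, which converts to 0.
console.log(Math.imul(a * b)); // 0

// Two arguments: a 32-bit integer multiply of a and b.
console.log(Math.imul(a, b));  // 12

// Two arguments also wrap correctly on overflow.
console.log(Math.imul(0x7FFFFFFF, 2)); // -2
```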
https://github.com/johnmccutchan/ecmascript_simd/blob/master/src/ecmascript_simd.js#L559

add .shuffleMix

Implement Float32x4Array polyfill and tests

Use SIMD.addu32 for uint32x4 in mandelbrot benchmark

https://github.com/johnmccutchan/ecmascript_simd/blob/master/src/benchmarks/mandelbrot.js#L65
count4 = SIMD.add (count4, SIMD.and (mi4, one4)); should be count4 = SIMD.addu32 (count4, SIMD.and (mi4, one4));

Expose type conversion methods between Uint32x4 and Float32x4 types

float -> int truncate
float -> int round
int -> float

What is the behavior of "+","-", "*","/" two float32x4 or uint32x4 values?

Will we support "+", "-", "*", "/", Math.sqrt() and others on float32x4 and uint32x4 values? If yes, we need to extend the current IC stubs and mine the type information from the IC stubs (a lot of work). If no, will we coerce a float32x4 value to NaN?

Add a .shuffleu32() function

To get the right typing we should add a .shuffleu32 method as well.

Is NaN a valid value for a float32x4 lane?

Could we write a=float32x4(NaN, NaN, NaN, NaN)?

Do we want to raise an exception when there is overflow, underflow or an invalid operand for the SIMD operations? If yes, we need try and catch around SIMD operations. If no, a QNaN should be a valid lane value. Note that in the Intel ISA manual the underflow and overflow exceptions apply only to floating-point operations, not to Uint32 operations.

We need to define the behavior in the spec.

License for polyfills

Please add a license to the polyfill files, so that they can be used in other projects (specifically I am starting to work on SIMD in emscripten now). MIT license would be nice :)

Define behavior for special Float32 values

The ES6 specification defines special behavior for NaN in many operations that are implemented as well for float32x4. For instance, Math.min returns NaN if any of the operands is NaN [1]. The polyfill doesn't have such behavior, and if you do

var x = SIMD.float32x4(1,2,3,4)
var y = SIMD.float32x4(NaN, NaN, NaN, NaN)

You'll have a different value if you do SIMD.float32x4.min(x, y) or SIMD.float32x4.min(y, x), which seems weird, with respect to the semantics of min (I'd expect min to be strictly commutative).

There might be other places where specific behavior should also be defined for other special Float32 values, such as +/- Infinity, +/- 0.

[1] http://people.mozilla.org/~jorendorff/es6-draft.html#sec-math.min
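The asymmetry is easy to reproduce for a single lane. The sketch below assumes a comparison-based min, which is one plausible way a polyfill might implement it; minByCompare is a hypothetical name, and Math.min is shown for contrast.

```javascript
// Hypothetical comparison-based min for one lane.
// Any comparison involving NaN is false, so the second operand wins.
function minByCompare(a, b) {
  return a < b ? a : b;
}

console.log(minByCompare(1, NaN)); // NaN
console.log(minByCompare(NaN, 1)); // 1

// Math.min returns NaN regardless of operand order.
console.log(Math.min(1, NaN)); // NaN
console.log(Math.min(NaN, 1)); // NaN
```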

int32x4.zero() constructor is missing

float32x4 has one and int32x4 should have one too, for completeness and symmetry.

movemask for branching

A SIMD operation similar to movemask would be helpful for branching after a SIMD compare.

Type checking

The V8 runtime implementation throws an error if you call with the wrong type, for example a binary op with float32x4 w/o bitcasting; may be useful to have a debug version of the polyfill that checks using instanceof.

naming of conversion methods

if they're going to be on the simd module rather than the object, should the name include the from type?

(p.s. nitpick: might want to put all 4 conversion methods together, esp in the absence of API README doc, didn't see bitsToUint32x4 at first b/c not with the others)

Support loading/storing SIMD types from array buffer without 16-bytes alignment.

Problem statement

We have extended Typed Array View Types by Float32x4Array, Float64x2Array and Int32x4Array.

These SIMD typed array views load SIMD data with 16-byte alignment. In some use cases it is very hard or impossible to arrange data that way. These use cases require loading SIMD types from an array buffer without 16-byte alignment, similar to using the C++ intrinsics _mm_loadu_ps/_mm_storeu_ps.

Possible solutions

There are two options to extend Typed Array Specification and one option to extend SIMD module.

Option 1: extend DataView interface

partial interface DataView {
    SIMD.float32x4 getFloat32x4(unsigned long byteOffset, optional boolean littleEndian);
    SIMD.float64x2 getFloat64x2(unsigned long byteOffset, optional boolean littleEndian);
    SIMD.int32x4 getInt32x4(unsigned long byteOffset, optional boolean littleEndian);
    void setFloat32x4(unsigned long byteOffset, SIMD.float32x4 value, optional boolean littleEndian);
    void setFloat64x2(unsigned long byteOffset, SIMD.float64x2 value, optional boolean littleEndian);
    void setInt32x4(unsigned long byteOffset, SIMD.int32x4 value, optional boolean littleEndian);
};

Option 2: extend Typed Array Buffer View interface

partial interface Float32Array {
    SIMD.float32x4 getFloat32x4(unsigned long index);
    void setFloat32x4(unsigned long index, SIMD.float32x4 value);
};

partial interface Float64Array {
    SIMD.float64x2 getFloat64x2(unsigned long index);
    void setFloat64x2(unsigned long index, SIMD.float64x2 value);
};

partial interface Int32Array {
    SIMD.int32x4 getInt32x4(unsigned long index);
    void setInt32x4(unsigned long index, SIMD.int32x4 value);
};

Option 3: introduce memory load/store APIs in SIMD module:

SIMD.float32x4 SIMD.float32x4.load(Float32Array array, unsigned long index);
void SIMD.float32x4.store(Float32Array array, unsigned long index, SIMD.float32x4 value);

SIMD.float64x2 SIMD.float64x2.load(Float64Array array, unsigned long index);
void SIMD.float64x2.store(Float64Array array, unsigned long index, SIMD.float64x2 value);

SIMD.int32x4 SIMD.int32x4.load(Int32Array array, unsigned long index);
void SIMD.int32x4.store(Int32Array array, unsigned long index, SIMD.int32x4 value);
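To give a feel for Option 1, the sketch below emulates an unaligned float32x4 load and store on top of the existing DataView methods getFloat32 and setFloat32. The free functions getFloat32x4 and setFloat32x4 are hypothetical stand-ins for the proposed methods, and a 4-element Float32Array stands in for SIMD.float32x4.

```javascript
// Hypothetical emulation of the proposed DataView.getFloat32x4.
function getFloat32x4(view, byteOffset, littleEndian) {
  var lanes = new Float32Array(4);
  for (var i = 0; i < 4; i++) {
    lanes[i] = view.getFloat32(byteOffset + 4 * i, littleEndian);
  }
  return lanes;
}

// Hypothetical emulation of the proposed DataView.setFloat32x4.
function setFloat32x4(view, byteOffset, value, littleEndian) {
  for (var i = 0; i < 4; i++) {
    view.setFloat32(byteOffset + 4 * i, value[i], littleEndian);
  }
}

var view = new DataView(new ArrayBuffer(32));

// Byte offset 3 is aligned neither to 16 bytes nor to 4 bytes.
setFloat32x4(view, 3, new Float32Array([1, 2, 3, 4]), true);
console.log(Array.from(getFloat32x4(view, 3, true))); // [1, 2, 3, 4]
```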

Add/plan for 256 bit and 512 bit SIMD functions?

Hi. Since last year, a 256-bit SIMD ISA (AVX2, 8 x int32) has been shipping in Haswell processors, and it seems that next year we will also have 512-bit SIMD support (16 x int32) in the form of AVX-512. Executing 128-bit SIMD instructions on a 512-bit SIMD capable processor (Intel Skylake?) is only 25% efficient, which is similar to having no SIMD support at all on an SSE-only processor (pre-2011 Sandy Bridge). It therefore seems you should already plan to add Int32x8 and Int32x16 instructions, which on a processor with only 128-bit SIMD support would be lowered to 2 or 4 int32x4 instructions respectively. Does that make sense?

Introduce a 'same value' initializer

I believe it's pretty common to initialize a float32x4 or uint32x4 value to have the same value in all lanes.

We could overload the initializer to do the right thing depending on the number of arguments, i.e.

var f4 = float32x4(1.0, 2.0, 3.0, 4.0); // different values in the 4 lanes
var ones4 = float32x4(1.0); // same value (1.0) in all lanes

Overloading the number of arguments shouldn't cause a perf issue, since this can be statically determined at JIT time.
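A sketch of the overloaded initializer, using a plain function that returns a Float32Array as a stand-in for the real float32x4 type. The dispatch on arguments.length is the point of the example; everything else is illustrative.

```javascript
// Stand-in for float32x4: one argument fills all lanes,
// four arguments set the lanes individually.
function float32x4(x, y, z, w) {
  if (arguments.length === 1) {
    y = z = w = x;
  }
  return new Float32Array([x, y, z, w]);
}

var f4 = float32x4(1.0, 2.0, 3.0, 4.0);
var ones4 = float32x4(1.0);

console.log(Array.from(f4));    // [1, 2, 3, 4]
console.log(Array.from(ones4)); // [1, 1, 1, 1]
```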

uint32x4 withFlagZ test has incorrect assertion

equal(false, c.w); should be equal(false, c.flagW);
https://github.com/johnmccutchan/ecmascript_simd/blob/master/src/ecmascript_simd_tests.js#L394

Add logical operations to SIMD.float32x4 (and, or, xor, not)

I was coding up an implementation for a sinx4() function (computes the sin() value for all 4 lanes), and had to do bit manipulation on float32x4 values. Doing this involves doing bit conversion to int32x4 of the operands and then bit conversion back to float32x4 of the result. The code becomes fairly unreadable, e.g.

 x = SIMD.int32x4.bitsToFloat32x4(SIMD.int32x4.and(SIMD.float32x4.bitsToInt32x4(x), _ps_inv_sign_mask));

With logical operations available on float32x4 it could be written as:

 x = SIMD.float32x4.and(x, _ps_inv_sign_mask);

Thoughts?
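For illustration, here is what such an operation would do to the bits, emulated with typed array views. float32x4And is a hypothetical helper, Float32Array and Int32Array stand in for the SIMD types, and the mask 0x7FFFFFFF (every bit except the sign bit) is assumed to be what _ps_inv_sign_mask holds.

```javascript
// Hypothetical bitwise AND between float32 lanes and int32 mask lanes.
function float32x4And(v, maskBits) {
  var out = new Float32Array(v);         // copy, backed by a fresh buffer
  var bits = new Int32Array(out.buffer); // same bytes viewed as int32
  for (var i = 0; i < 4; i++) {
    bits[i] &= maskBits[i];
  }
  return out;
}

// Assumed contents of the inverse sign mask: clear bit 31 in every lane.
var invSignMask = new Int32Array([0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF]);

var x = new Float32Array([-1.5, 2, -0, -3]);
console.log(Array.from(float32x4And(x, invSignMask))); // [1.5, 2, 0, 3]
```

Clearing the sign bit this way yields the absolute value of each lane, including turning -0 into +0.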