Comments (6)
Hi Manodeep,
Thanks for your feedback!
The macros in vector.h were originally written for SSE2 and subsequently extended for AVX/AVX2/AVX512/AltiVec, mostly via copy-paste, so any nonexistent intrinsics would only get caught if we tried to use them, and I don't think we're using that operation anywhere.
Regarding the vec_fabs macro, I think that's @james-s-willis's code; I'll let him comment on it :)
Cheers, Pedro
from swift.
Hi @manodeep,
First of all thanks for the support!
Regarding the vec_and wrapper, you are correct: _mm512_and_ps doesn't exist. That wrapper is no longer used and was never used for AVX512, so we need to remove it. We mainly use vec_and_mask, which maps to _mm512_maskz_mov_ps.
vec_fabs should map to _mm512_abs_ps; we will change that.
Masked loads with mask(z)_load sound interesting. We have not looked at using those for remainder loops, but we will now. In your examples, do you set the mask to all-true for the loop iterations that cover a full SIMD vector, so that the instruction reverts to a normal load, and then set the mask appropriately for the remainder iterations?
Also, how do you support this functionality in AVX and AVX2, where I am guessing the instructions are not supported?
Thanks,
James
Here's how my SIMD intrinsics work with AVX512F masked loads.
Copy-pasting the effective code (note that both single and double precision are supported with the following):
/* Stuff in headers */
#include <stdint.h>
#include <immintrin.h>

/* Entry r has the r lowest bits set (r valid lanes); entry 0 enables all lanes */
const uint16_t masks_per_misalignment_value_float[] = {
    0b1111111111111111,
    0b0000000000000001,
    0b0000000000000011,
    0b0000000000000111,
    0b0000000000001111,
    0b0000000000011111,
    0b0000000000111111,
    0b0000000001111111,
    0b0000000011111111,
    0b0000000111111111,
    0b0000001111111111,
    0b0000011111111111,
    0b0000111111111111,
    0b0001111111111111,
    0b0011111111111111,
    0b0111111111111111};
const uint8_t masks_per_misalignment_value_double[] = {
    0b11111111,
    0b00000001,
    0b00000011,
    0b00000111,
    0b00001111,
    0b00011111,
    0b00111111,
    0b01111111};
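#ifdef DOUBLE_PREC
/* calculate in doubles */
#define DOUBLE double
#define AVX512_NVEC 8
#define AVX512_FLOATS __m512d
/* The next two definitions are assumed; they were not shown in the pasted snippet */
#define AVX512_MASK __mmask8
#define masks_per_misalignment_value_DOUBLE masks_per_misalignment_value_double
#define AVX512_MASKZ_LOAD_FLOATS_UNALIGNED(MASK, X) _mm512_maskz_loadu_pd(MASK, X)
#else
/* calculate with floats */
#define DOUBLE float
#define AVX512_NVEC 16
#define AVX512_FLOATS __m512
/* The next two definitions are assumed; they were not shown in the pasted snippet */
#define AVX512_MASK __mmask16
#define masks_per_misalignment_value_DOUBLE masks_per_misalignment_value_float
#define AVX512_MASKZ_LOAD_FLOATS_UNALIGNED(MASK, X) _mm512_maskz_loadu_ps(MASK, X)
#endif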
/* end of stuff in headers */
/* Begin kernel code */
for(int64_t j=n_off;j<N1;j+=AVX512_NVEC) {
    AVX512_MASK m_mask_left = (N1 - j) >= AVX512_NVEC ? ~0:masks_per_misalignment_value_DOUBLE[N1-j];
    /* Perform a mask load -> does not touch any memory not explicitly set via mask */
    const AVX512_FLOATS m_x1 = AVX512_MASKZ_LOAD_FLOATS_UNALIGNED(m_mask_left, localx1);
    ...
}
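The lookup tables follow a regular pattern: entry r has its r lowest bits set, apart from entry 0, which enables every lane. As a minimal sketch, the same masks could be computed on the fly instead of being read from a table. The names remainder_mask_float and remainder_mask_double are hypothetical and not part of the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the lowest `remaining` bits set, one bit per
 * valid lane of a 16-float vector. All lanes are enabled once a full vector
 * (16 or more elements) is still left. */
static inline uint16_t remainder_mask_float(int64_t remaining)
{
    assert(remaining > 0);
    if (remaining >= 16) return UINT16_MAX;
    return (uint16_t)((1u << remaining) - 1u);
}

/* Same idea for an 8-double vector. */
static inline uint8_t remainder_mask_double(int64_t remaining)
{
    assert(remaining > 0);
    if (remaining >= 8) return UINT8_MAX;
    return (uint8_t)((1u << remaining) - 1u);
}
```

In the loop above, remainder_mask_float(N1 - j) would then play the role of the ternary expression that selects between ~0 and the table entry.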
Of course, such masked loads are not supported by AVX(2). You can mimic them by implementing partial loads for the remainder loop; see, for instance, the partial loads implemented in the vectorclass library by Agner Fog.
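One simple way to get a partial load without mask registers is to copy the valid elements into a zero-padded buffer and then do a regular full-width load from that buffer. The sketch below shows only this fallback; it is not how vectorclass implements its partial loads, and NVEC and partial_load_padded are names assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NVEC 8 /* assumed: number of floats in a 256-bit AVX register */

/* Hypothetical helper: copy the `count` valid elements of `src` into a
 * zero-padded buffer of full vector width, so that a full-width load can
 * follow without reading past the end of `src`. */
static void partial_load_padded(float dst[NVEC], const float *src, size_t count)
{
    assert(count <= NVEC);
    memset(dst, 0, NVEC * sizeof(float));
    memcpy(dst, src, count * sizeof(float));
    /* An AVX build would now load the buffer with _mm256_loadu_ps(dst). */
}
```

The unused lanes come out as zero, which matches what the maskz variants of the AVX512F loads produce for masked-out lanes.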
Another set of new AVX512F instructions that might be helpful for you guys is _mm512_mask(z)_compress_p(s/d), followed by a _mm512_mask_reduce_add_p(s/d) (only with Intel compilers) for a horizontal sum across the vector register.
We could make use of masked loads in our code; however, we want to support the AVX/AVX2 instruction sets. I will look at how Agner Fog implements partial loads.
We use _mm512_mask_compressstoreu_ps to left-pack vectors and _mm512_reduce_add_ps for horizontal adds, but have never made use of _mm512_mask(z)_compress_p(s/d) and _mm512_mask_reduce_add_p(s/d). They could be useful to us, though.
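For readers unfamiliar with left-packing, here is a scalar sketch of what a compress-store does for a 16-float vector: the lanes whose mask bit is set are written contiguously to the destination. The function name left_pack16 is hypothetical; it only models the semantics and is not code from either project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of left-packing 16 floats: copies src[i] to the next free
 * slot of dst for every set bit i of mask and returns how many elements
 * were written. Elements of dst beyond that count are left untouched. */
static size_t left_pack16(float *dst, const float *src, uint16_t mask)
{
    assert(dst != NULL && src != NULL);
    size_t n = 0;
    for (int i = 0; i < 16; i++) {
        if (mask & (1u << i)) dst[n++] = src[i];
    }
    return n;
}
```

With a mask that has bits 0, 2 and 15 set, the call writes src[0], src[2] and src[15] to dst[0..2] and returns 3.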
AFAICS, _mm512_reduce_add_ps operations are a combination of multiple instructions, so it is unclear to me that an unrolled loop (the trip count is fixed) would be much slower. It didn't make much of a difference in my case, and I opted for portability (as in, compilers other than icc) over a slight loss of performance.
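A portable horizontal sum along those lines could look like the sketch below, assuming the vector has first been stored to a plain array (for example with _mm512_storeu_ps). The name horizontal_sum16 and the NVEC constant are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NVEC 16 /* assumed: number of floats in a 512-bit register */

/* Sums the lanes of a vector that has been stored to memory. The trip count
 * is a compile-time constant, so the compiler is free to unroll the loop. */
static float horizontal_sum16(const float lanes[NVEC])
{
    assert(lanes != NULL);
    float sum = 0.0f;
    for (int i = 0; i < NVEC; i++) sum += lanes[i];
    return sum;
}
```

This loop adds the lanes strictly in order, so the result may differ in the last bits from a reduction that adds them in a different order.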