Comments (18)
Thanks @leofang @rgommers. I would suggest we move on with `kDLComplex` for now then. The SoA convention would need a bit more thought, as so far we can only see it being well-defined for the compact format. Perhaps it's worth opening another thread to do so.
from dlpack.
I am sorry, but is the second layout (splitting a complex array into two real arrays) ever seen in practice?
Matlab is the one I am familiar with, because it was annoying to convert to/from interleaved complex when I was working with C MEX files several years ago. However, it looks like now they also support the interleaved representation.
In terms of complexity, I don't think math functions will be more complex in either form. The SoA approach would simply correspond to loading from both arrays, computing, then storing. Again, in such cases SoA is more friendly to vectorization.
The use of SoA is quite real in many projects. @hawkinsp might be able to say more, but I believe this is the scheme used by JAX. Technically, this would apply to all specialized accelerators that do not have vector shuffle instructions as well (e.g. TPU).
Additionally, given that there is an increasing number of floating-point representations (e.g. float32, float64, bfloat16, posit), choosing the SoA representation might help simplify things a bit, so that we do not have to grow corresponding complex-number support per floating-point data type.
The above issues are only going to become a bigger problem as more specialized accelerators are produced. So if we really want to consider support for more hardware, I would give SoA serious consideration.
Standardization needs buy-in from the frameworks and hardware vendors. In this case, it would be great to have a conversation about it.
@rgommers With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of Array API standard, especially if you're gonna add DLPack support to NumPy.
Thanks for the ping, I had missed this thread. I think there's some time (6-12 months for next version I'd say), but yes would be nice to keep progressing this now that we have some good momentum.
Regardless, I think we'll need at least two more types added to `DLDataTypeCode`, say `kDLComplexSoA` and `kDLComplexAoS`.
Adding two distinct data types seems to make sense if there's two relatively common representations with their own advantages. And then if the consumer doesn't support the format the producer offers, it gets a choice of doing the copy (perhaps while emitting a warning for efficiency) or raising an error.
Making a choice of AoS vs. SoA seems to only make sense if there's a high chance that existing libraries will change their implementation (which seems unlikely).
The names in gh-58 (`kDLComplex` and `kDLComplexS`) seem reasonable to me, probably better than appending SoA and AoS.
`f32` and `f64` definitely, and that seems straightforward. An 8-byte `kDLComplex` being backed by `f32` seems unambiguous (and same for a 16-byte `kDLComplex`). For a 4-byte one, it's unclear if that'd use `f16` or `bf16` given the current DLPack structure. As far as I can tell, that cannot be expressed right now, and given how rare `complex32` still is, that's probably worth putting out of scope for now.
Thank you Peter for bringing this up! This is something missing for a long time.
There are essentially two different memory layouts to store NDArrays of complex numbers.
- For each complex number, the real and imaginary parts are placed contiguously in memory. For example, a complex number made of two 32-bit floats can be seen as (f32, f32), occupying the same 64 bits as an f64.
- A complex tensor is a tuple of two tensors: one represents the real part, and the other the imaginary part.
This would be great if we take both representations into consideration :-)
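A minimal C sketch of the two layouts (the `complex_aos` and `complex_soa` names are just for illustration, not anything from DLPack):

```c
#include <stddef.h>

/* Layout 1 (AoS, interleaved): each element stores re and im contiguously. */
typedef struct { float re, im; } complex_aos;

/* Layout 2 (SoA, planar): a complex array is a pair of real arrays.
 * This container type is a hypothetical sketch, not a DLPack struct. */
typedef struct { float *re; float *im; } complex_soa;

/* Read the real part of element i under each layout. */
static float aos_real(const complex_aos *a, int i) { return a[i].re; }
static float soa_real(const complex_soa *a, int i) { return a->re[i]; }
```

In the AoS layout, casting the array to `float *` yields alternating real/imaginary values; in the SoA layout, each plane is already a flat real array.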
This would be great if we take both representations into consideration :-)
I am sorry, but is the second layout (splitting a complex array into two real arrays) ever seen in practice? Could you name a few notable examples? It violates any principle we'd learn in CS 101 to preserve memory contiguity in order to maximize the memory load performance. HPC people certainly wouldn't do this at all.
In practice, AFAIK none of the programming languages (C/C++/Fortran/Python/etc) or libraries (CUDA/HIP/NumPy/PyTorch/CuPy/etc) follow the 2nd approach; complex numbers are always implemented as an (aligned) struct of two real numbers, so that a complex array is an array of structs (AoS).
In many other applications structs of arrays (SoA) are preferred, yes, but not in this case due to the memory load requirement and the arithmetic intensity associated with complex numbers.
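For what it's worth, C99 itself pins down the AoS layout: each complex type has the same representation and alignment as a two-element array of the corresponding real type, with the real part first. A small illustration (the helper names are hypothetical):

```c
#include <complex.h>

/* C99 guarantees double complex has the same representation as double[2],
 * real part first, so a complex array is an array of structs. */
static double real_via_cast(const double complex *z) {
    return ((const double *)z)[0];
}
static double imag_via_cast(const double complex *z) {
    return ((const double *)z)[1];
}
```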
@leofang Yes, I know of at least one real software/hardware stack that uses an SoA-ish representation for complex numbers. The hardware in question makes arbitrary indexing expensive (e.g., an offset of one float), but indexing over tiles of numbers cheap (e.g., a vector length's worth of floats). So on that hardware you should use tiles of real and imaginary numbers (i.e., pretty much a structure of arrays) for alignment reasons.
That said, there's no DLPack support on that hardware, but it's something to think about.
Thanks @leofang @hawkinsp for the great discussion. From a computational PoV, structure-of-arrays is actually better in many cases. This is due to the presence of vector/tensor instructions. Most modern architectures have vector units for floating-point numbers, but not necessarily for complex numbers.
Think about complex multiplication, which computes:

```
c.real = a.real * b.real - a.img * b.img
c.img  = a.real * b.img + a.img * b.real
```

If we want to make the best use of a vector unit to carry out the above computation, we will inevitably need to put the real and img parts into different vectors (so the operation can be vectorized). In a typical CPU setting, the implementation of such AoS complex operations is more involved, and usually requires loading two vectors that have mixed real and img parts and shuffling them to obtain the real and img vectors. Storing data as SoA actually simplifies the implementation for both libraries and compilers, as the parts are already stored separately.
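A sketch of what the SoA version of this kernel could look like in C (the function and parameter names are made up for illustration): each loop iteration touches only flat float arrays, so a compiler can vectorize it directly with no shuffles.

```c
/* Complex multiplication over SoA (planar) storage: c = a * b,
 * where each operand is split into a real plane and an imaginary plane. */
void cmul_soa(const float *ar, const float *ai,
              const float *br, const float *bi,
              float *cr, float *ci, int n) {
    for (int i = 0; i < n; ++i) {
        cr[i] = ar[i] * br[i] - ai[i] * bi[i];  /* c.real */
        ci[i] = ar[i] * bi[i] + ai[i] * br[i];  /* c.img  */
    }
}
```

The AoS equivalent would need to deinterleave (re, im, re, im, ...) into separate vectors before doing the same arithmetic.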
This can be a real concern when we start to support accelerators that operate on vectors or even tiles of matrices (@hawkinsp's example) and vector shuffle instructions are not necessarily available.
From a locality point of view, storing data as SoA does not impose much of a burden, because accesses to both the real and img parts are still local to the cache. The reality is that SoA removes the need for vector shuffling, thus achieving better memory loading performance.
So while from a convention point of view AoS is certainly the more commonly used storage format, SoA does bring benefits in many cases from a computational point of view.
Thanks @hawkinsp @tqchen @grlee77 for replies!
On the special hardware/software stack that uses SoA: Is that a real thing or still a project on paper? I'd love to learn more specifics (say, the name) if possible 🙂 But, while it makes sense to consider the benefit of vectorization instructions in such a setup, in more complex situations such as implementing mathematical functions for complex numbers, it's much more difficult than with AoS as far as I understand.
Let me reiterate that the vast majority of languages, libraries, and frameworks out there support the AoS storage by default. @grlee77's Matlab example is a perfect addition to the examples I gathered above: Matlab is moving to abandon SoA in favor of AoS. Adding support for such a common case should be a low-hanging fruit, and I don't see why this has to be a blocker to move on and close this issue. For example, we could just add a new entry `kDLComplex` to `DLDataTypeCode`, and it'd be straightforward to define `DLDataType` to match `float complex` and `double complex` (as in C99).
If there is a serious need to support SoA complex types, we can always revisit and revise the DLPack standard in the future, though it's not obvious to me at the moment how to support such a thing in DLPack... (another justification for leaving it aside for now).
@tqchen Sorry I dropped the ball.
Performance on the SoA/AoS differences aside, in practice how would you adopt DLPack for SoA? As I said earlier, I think it's completely capable of describing AoS, given that vector types can be natively handled by using `lanes`, but I still don't see a straightforward way to include SoA. Perhaps you'd need to tweak `shape` (and/or `strides`) together with `lanes` in order to achieve it?
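For illustration, one way to spell interleaved complex64 with today's header might be a 2-lane vector of 32-bit floats. The struct below mirrors `DLDataType`'s fields so the sketch compiles stand-alone (the names are placeholders, not proposed API):

```c
#include <stdint.h>

/* Mirror of DLPack's DLDataType fields; see dlpack.h for the real struct. */
typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } dtype_sketch;

enum { SKETCH_FLOAT = 2 };  /* kDLFloat's value in dlpack.h */

/* Interleaved (AoS) complex64 as a 2-lane f32 vector: (re, im) pairs. */
static const dtype_sketch complex64_via_lanes = { SKETCH_FLOAT, 32, 2 };
```

An SoA array has no such single-element encoding, which is why it seems to need extra structure (separate tensors, or tweaked `shape`/`strides`).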
Regardless, I think we'll need at least two more types added to `DLDataTypeCode`, say `kDLComplexSoA` and `kDLComplexAoS`. (The naming is admittedly poor but you get the point 😂) The reason is that SoA and AoS cannot be exchanged with zero copy. So effectively what I was saying above is that adding `kDLComplexAoS` is straightforward and can be done as of today (it's literally just a one-line PR I can do right away).
@rgommers With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of Array API standard, especially if you're gonna add DLPack support to NumPy.
Thanks, @rgommers.
With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of Array API standard, especially if you're gonna add DLPack support to NumPy.
Thanks for the ping, I had missed this thread. I think there's some time (6-12 months for next version I'd say), but yes would be nice to keep progressing this now that we have some good momentum.
Yes, especially describing AoS is so simple and can be immediately used for NumPy/CuPy/Numba/etc that I just don't see why we have to block it before we figure out a way to describe SoA in DLPack. Let's move with the momentum 🚀
`kDLComplexSoA` and `kDLComplexAoS`

@rgommers So do you have suggestions for better names? 🙂
Thanks, it would be great to discuss how we could handle different underlying data types though. Do we want to support complex numbers backed by `f32`, `f64`, or (`f16`, `bf16`) in the future?
@tqchen I think you brought this up earlier but I still don't understand your question. Complex numbers can already be supported in exactly the same way as integers and floats with different bit widths. The only (obvious) requirement is that both the real and imaginary parts need to have the same format.
Ah, in this case we should avoid potential ambiguity and prefer using fp16 for complex32 for consistency. If guided by the requirement of zero copy, it seems necessary to me to define an additional `kDLBComplex` type for complex numbers backed by bfloats.
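Under that scheme, the two 32-bit complex flavors would be distinguished by the type code rather than the width (all names and enum values below are illustrative placeholders, not proposed API):

```c
#include <stdint.h>

/* Stand-in for DLPack's DLDataType; see dlpack.h for the real struct. */
typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } dtype_sketch;

/* Hypothetical type codes; the values are illustrative only. */
enum { SKETCH_COMPLEX = 5, SKETCH_BCOMPLEX = 6 };

/* Both are 32 bits wide; the code says whether the halves are fp16 or bf16. */
static const dtype_sketch complex32_fp16 = { SKETCH_COMPLEX,  32, 1 };
static const dtype_sketch complex32_bf16 = { SKETCH_BCOMPLEX, 32, 1 };
```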
Sounds good. Will create a new issue after #58 is in.
Sounds good. Will create a new issue after #58 is in.
See #60.