
Comments (18)

tqchen commented on May 18, 2024

Thanks @leofang @rgommers. I would suggest we move on with kDLComplex for now then. The SoA convention would need a bit more thought, as so far we can only see it being well-defined for the compact format. Perhaps it is worth opening another thread to do so.

grlee77 commented on May 18, 2024

I am sorry, but is the second layout (splitting a complex array into two real arrays) ever seen in practice?

Matlab is the one I am familiar with: it was annoying to convert to/from interleaved complex when I was working with C MEX files several years ago. However, it looks like they now also support the interleaved representation.

tqchen commented on May 18, 2024

In terms of complexity, I don't think math functions will be more complex in either form. The SoA approach would simply correspond to loading from both arrays, computing, then storing. Again, in such cases SoA is more friendly to vectorization.

The use of SoA is quite real in many projects. @hawkinsp might be able to say more, but I believe this is the scheme used by JAX. Technically, this would also apply to all specialized accelerators that do not have vector shuffle instructions (e.g. TPU).

Additionally, given that there is an increasing number of floating-point representations (e.g. float32, float64, bfloat16, posit), choosing the SoA representation might help simplify things a bit, so that we do not have to grow corresponding complex number support for each floating-point data type.

The above issues are only going to become bigger problems as more specialized accelerators are produced. So if we really want to consider support for more hardware, I would give SoA serious consideration.

Standardization needs buy-in from the frameworks and the hardware vendors. In this case, it would be great to have a conversation about it.

rgommers commented on May 18, 2024

@rgommers With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of Array API standard, especially if you're gonna add DLPack support to NumPy.

Thanks for the ping, I had missed this thread. I think there's some time (6-12 months for next version I'd say), but yes would be nice to keep progressing this now that we have some good momentum.

Regardless, I think we'll need at least two more types added to DLDataTypeCode, say kDLComplexSoA and kDLComplexAoS.

Adding two distinct data types seems to make sense if there are two relatively common representations, each with its own advantages. And then if the consumer doesn't support the format the producer offers, it gets a choice of doing the copy (perhaps while emitting a performance warning) or raising an error.

Making a choice of AoS vs. SoA seems to only make sense if there's a high chance that existing libraries will change their implementation (which seems unlikely).

rgommers commented on May 18, 2024

The names in gh-58 (kDLComplex and kDLComplexS) seem reasonable to me, probably better than appending SoA and AoS.

rgommers commented on May 18, 2024

f32 and f64 definitely, and that seems straightforward. An 8-byte kDLComplex being backed by f32 seems unambiguous (and the same for a 16-byte kDLComplex). For a 4-byte one, it's unclear whether that would use f16 or bf16 given the current DLPack structure. As far as I can tell, that cannot be expressed right now, and given how rare complex32 still is, it's probably worth putting it out of scope for now.
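
For reference, this ambiguity comes from DLDataType carrying only a type code, a per-lane bit width, and a lane count. A minimal sketch (the kDLComplex encodings shown are assumptions based on the proposal, not existing definitions):

```c
#include <stdint.h>

/* DLDataType as defined in dlpack.h. */
typedef struct {
  uint8_t code;    /* base type code, e.g. kDLFloat or a proposed kDLComplex */
  uint8_t bits;    /* bit width of one lane                                  */
  uint16_t lanes;  /* number of lanes, used for vector types                 */
} DLDataType;

/* Assuming a kDLComplex code were added:
 *   {kDLComplex,  64, 1} -> two f32 halves (complex64), unambiguous
 *   {kDLComplex, 128, 1} -> two f64 halves (complex128), unambiguous
 *   {kDLComplex,  32, 1} -> ambiguous: f16- or bf16-backed? The
 *                           code/bits/lanes triple alone cannot say.
 */
```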

junrushao commented on May 18, 2024

Thank you Peter for bringing this up! This is something that has been missing for a long time.

There are essentially two different memory layouts for storing NDArrays of complex numbers (a short C sketch of both layouts follows below).

  1. For each complex number, the real and imaginary parts are placed contiguously in memory. For example, a complex number backed by 32-bit floats can be seen as (f32, f32), occupying the same 64 bits as an f64.
  2. A complex tensor is a tuple of two tensors, one representing the real part and the other the imaginary part.

It would be great if we took both representations into consideration :-)
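
For concreteness, here is a minimal C sketch of the two layouts described in the list above (the type names complex64_aos and complex64_soa are illustrative, not DLPack definitions):

```c
#include <stddef.h>

/* Layout 1 (interleaved / AoS): the real and imaginary parts of each
   element sit next to each other in one buffer, like C99 `float complex`. */
typedef struct {
    float re;
    float im;
} complex64_aos;
/* An AoS array of n complex numbers is simply: complex64_aos data[n]; */

/* Layout 2 (split / SoA): the real and imaginary parts live in two
   separate real-valued buffers of equal length. */
typedef struct {
    float *re;   /* n real parts      */
    float *im;   /* n imaginary parts */
    size_t n;
} complex64_soa;
```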

leofang commented on May 18, 2024

It would be great if we took both representations into consideration :-)

I am sorry, but is the second layout (splitting a complex array into two real arrays) ever seen in practice? Could you name a few notable examples? It violates the principle we learn in CS 101 of preserving memory contiguity in order to maximize memory load performance. HPC people certainly wouldn't do this at all.

In practice, AFAIK none of the programming languages (C/C++/Fortran/Python/etc) or libraries (CUDA/HIP/NumPy/PyTorch/CuPy/etc) follows the 2nd approach; complex numbers are always implemented as an (aligned) struct of two real numbers so that a complex array is an array of structs (AoS).

In many other applications structs of arrays (SoA) are preferred, yes, but not in this case due to the memory load requirement and the arithmetic intensity associated with complex numbers.

hawkinsp commented on May 18, 2024

@leofang Yes, I know of at least one real software/hardware stack that uses an SoA-ish representation for complex numbers. The hardware in question makes arbitrary indexing expensive (e.g., at an offset of one float) but indexing over tiles of numbers cheap (e.g., a vector length's worth of floats). So on that hardware you should use tiles of real numbers and tiles of imaginary numbers (i.e., pretty much a structure of arrays) for alignment reasons.

That said, there's no DLPack support on that hardware, but it's something to think about.

tqchen commented on May 18, 2024

Thanks @leofang @hawkinsp for the great discussion. From the computational point of view, structure of arrays is actually better in many cases. This is due to the presence of vector/tensor instructions: most modern architectures have vector units for floating point, but not necessarily for complex numbers.

Think about complex multiplication as follows:

c.real = a.real * b.real - a.img * b.img
c.img = a.real * b.img + a.img * b.real

If we want to make the best use of a vector unit to carry out the above computation, we will inevitably need to put the real and imaginary parts into different vectors (so the above operation can be vectorized). In a typical CPU setting, the implementation of such AoS complex operations is more involved, and usually requires loading two vectors that hold mixed real and imaginary parts and shuffling them to obtain separate real and imaginary vectors. Storing data as SoA actually simplifies the implementation for both libraries and compilers, since the parts are already stored separately.
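
A rough C sketch of that point (illustrative element-wise loops, not tied to DLPack): the SoA version is a plain loop over separate buffers that a compiler can vectorize directly, while the AoS version reads interleaved pairs and typically needs lane shuffles once vectorized.

```c
#include <stddef.h>

/* SoA: real and imaginary parts in separate arrays. */
void cmul_soa(const float *ar, const float *ai,
              const float *br, const float *bi,
              float *cr, float *ci, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        cr[i] = ar[i] * br[i] - ai[i] * bi[i];
        ci[i] = ar[i] * bi[i] + ai[i] * br[i];
    }
}

/* AoS: interleaved (re, im) pairs; a vectorizer has to de-interleave
   (shuffle) the lanes before applying the same arithmetic. */
void cmul_aos(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        float ar = a[2*i], ai = a[2*i + 1];
        float br = b[2*i], bi = b[2*i + 1];
        c[2*i]     = ar * br - ai * bi;
        c[2*i + 1] = ar * bi + ai * br;
    }
}
```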

This can be a real concern when we start to support accelerators that operate on vectors or even tiles of matrices (@hawkinsp's example) and where vector shuffle instructions are not necessarily available.

From a locality point of view, storing data as SoA does not impose much of a burden, because accesses to both the real and imaginary parts are still local to the cache. In practice, SoA removes the need for vector shuffling and thus achieves better memory loading performance.

So while from a conventional point of view AoS is certainly the more commonly used storage format, SoA does bring benefits in most cases from the computational point of view.

leofang commented on May 18, 2024

Thanks @hawkinsp @tqchen @grlee77 for replies!

On the special hardware/software stack that uses SoA: is that a real thing or still a project on paper? I'd love to learn more specifics (say, the name) if possible 🙂 While it makes sense to consider the benefit of vector instructions in such a setup, in more involved situations, such as implementing mathematical functions for complex numbers, SoA is much harder to handle than AoS as far as I understand.

Let me reiterate that the vast majority of languages, libraries, and frameworks out there support the AoS storage by default. @grlee77's Matlab example is a perfect addition to the examples I gathered above: Matlab is moving to abandon SoA in favor of AoS. Adding support for such a common case should be low-hanging fruit, and I don't see why this has to be a blocker to moving on and closing this issue. For example, we could just add a new entry kDLComplex to DLDataTypeCode, and it would be straightforward to define DLDataType to match single complex and double complex (as in C99).
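
For illustration, the kind of one-line addition being discussed might look roughly like this (a sketch only; the exact name and value are whatever gh-58 settles on, and other existing codes are omitted for brevity):

```c
/* dlpack.h (sketch): extend the data type code enum. */
typedef enum {
  kDLInt = 0U,
  kDLUInt = 1U,
  kDLFloat = 2U,
  /* ... other existing codes omitted ... */
  kDLBfloat = 4U,
  kDLComplex = 5U,  /* proposed: interleaved (AoS) complex, as in C99 */
} DLDataTypeCode;

/* C99-style complex types would then map naturally onto DLDataType as
 * {code, bits, lanes}:
 *   float complex  (complex64)  -> {kDLComplex,  64, 1}
 *   double complex (complex128) -> {kDLComplex, 128, 1}
 */
```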

If there is a serious need to support SoA complex types, we can always revisit and revise the DLPack standard in the future, though it's not obvious to me at the moment how to support such a thing in DLPack... (another justification for leaving it aside for now).

leofang commented on May 18, 2024

@tqchen Sorry I dropped the ball.

Performance differences between SoA and AoS aside, in practice how would you adapt DLPack to SoA? As I said earlier, I think it's perfectly capable of describing AoS, given that vector types can be natively handled using lanes, but I still don't see a straightforward way to include SoA. Perhaps you would need to tweak shape (and/or strides) together with lanes in order to achieve it?
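
To illustrate the lanes point with a small sketch (this reuses the existing kDLFloat code purely for illustration; it is not the encoding proposed in gh-58): an interleaved complex64 buffer can already be viewed as an array of 2-lane float32 vectors, whereas the split SoA layout has no single-buffer description in the current DLDataType.

```c
#include <stdint.h>

/* DLDataType fields as in dlpack.h: code, per-lane bits, lane count. */
typedef struct {
  uint8_t code;
  uint8_t bits;
  uint16_t lanes;
} DLDataType;

enum { kDLFloat = 2U };  /* existing DLPack code for IEEE floats */

/* Interleaved (AoS) complex64 viewed as 2-lane f32 vectors: each
   "element" packs (re, im) into one 64-bit vector of two 32-bit lanes. */
static const DLDataType complex64_via_lanes = { kDLFloat, 32, 2 };
```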

Regardless, I think we'll need at least two more types added to DLDataTypeCode, say kDLComplexSoA and kDLComplexAoS. (The naming is admittedly poor, but you get the point 😂) The reason is that SoA and AoS cannot be exchanged with zero copy. So effectively what I was saying above is that adding kDLComplexAoS is straightforward and can be done as of today (it's literally a one-line PR I can do right away).

@rgommers With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of Array API standard, especially if you're gonna add DLPack support to NumPy.

leofang commented on May 18, 2024

Thanks, @rgommers.

With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of Array API standard, especially if you're gonna add DLPack support to NumPy.

Thanks for the ping, I had missed this thread. I think there's some time (6-12 months for next version I'd say), but yes would be nice to keep progressing this now that we have some good momentum.

Yes, especially since describing AoS is so simple and can be immediately used by NumPy/CuPy/Numba/etc., I just don't see why we have to block it until we figure out a way to describe SoA in DLPack. Let's move with the momentum 🚀

kDLComplexSoA and kDLComplexAoS

@rgommers So do you have a suggestion for better names? 🙂

tqchen commented on May 18, 2024

Thanks. It would be great to discuss how we could handle different underlying data types, though. Do we want to support complex numbers that are backed by f32, f64, or (f16, bf16) in the future?

leofang commented on May 18, 2024

@tqchen I think you brought this up earlier, but I still don't understand your question. Complex numbers can already be supported in exactly the same way as integers and floats with different bit widths. The only (obvious) requirement is that the real and imaginary parts need to have the same format.

leofang commented on May 18, 2024

Ah, in this case we should avoid potential ambiguity and prefer using fp16 for complex32 for consistency. If guided by the requirement of zero copy, it seems necessary to me to define an additional kDLBComplex type for complex numbers backed by bfloats.

leofang commented on May 18, 2024

Sounds good. Will create a new issue after #58 is in.

leofang commented on May 18, 2024

Sounds good. Will create a new issue after #58 is in.

See #60.
