RFC: Multi-device Resources Summary This propo

Ok sure. That is fine then. So DeviceIndexBuffer will essentially be a

Based on the discussions here <a class="issue-link js-issue-link" data-error-text="Fai

Regarding ResourceView -derived classes (<code class="

Proposed RFC Feature: Multi-device Resources,about o3de/sig-graphics-audio

Comments (12)

moudgils commented on July 17, 2024 1

Ok sure. That is fine then. So DeviceIndexBuffer will essentially be a cached version of device specific data residing within Device*Items.
Yes. Basically avoid two versions of descriptors/requests structs if possible. Of course if there is no way around it we can have a device version but we want to make sure that the device version is for RHI use only and is hopefully not exposed to RPI.

from sig-graphics-audio.

moudgils commented on July 17, 2024 1

yeah we could push the multidevice functionality to RHI::SRGdata. It would keep RPI clean. I am not opposed to pushing the functionality down to RHI and basically have a RHI::MultiDeviceSRGData contian multiple RHI::DeviceSRGdata and RPI is only aware of RHI::MDSRGData. That can work too.

from sig-graphics-audio.

moudgils commented on July 17, 2024

Based on the discussions here #32 I think the RFC looks good for the most part. The design around DeviceBuffer and MultiDeviceBuffer sounds good. A few points below ->

AliasedHeap, AliasedAttachmentAllocator - They both can be used within the context of one device but they do cache transient resources so we will need to modify this to eventually store the transient resource associated with the device the existing heap is attached to. Hopefully extending TransientAttachmentPool to device specific pools will help resolve this issue.
CommandList/CommandQueue device specific management is missing at the moment.
When converting a struct/class that does not inherit from a DeviceObject we need to ask ourselves if the object contains properties that will differ across multiple devices. As an example lets look at IndexBufferView. It contains RHI::Buffer + some other properties related to the buffer. RHI::Buffer which will already incapsulate RHI::DeviceBuffer underneath it. Now do we think that m_byteOffset/m_byteCount/m_format (other properties) will change for different devices? If they don't then do we need to modify RHI::IndexBufferView at all? Same logic can be applied to other structs/objects.
Basically for all the affected Descriptor/Request type structs we should try to follow the same pattern where RPI will populate the data for all the devices and RHI will hopefully deal with device specific data in order to keep RPI api clean. As an example lets visit a struct like RHI::BufferInitRequest vs RHI::DeviceBufferInitRequest. It would be nice to have it setup such that RHI::BufferInitRequest is probably only going to be used by RPI to create an RHI::Buffer where as RHI::DeviceBufferInitRequest is setup to only build out one RHI::DeviceBuffer and should only be used by RHI. It would make sense to add and remove variables within RHI::BufferInitRequest/RHI::DeviceBufferInitRequest to that effect. For example BufferInitRequest could contain a device mask telling RHI how many DeviceBuffers to create. BufferDescriptor probably only makes sense to live within BufferInitRequest and not within DeviceBufferInitRequest. At Buffer::Init RHI will then use BufferInitRequest to build out multiple DeviceBufferInitRequest and populate RHI::Buffer as needed.
Other than adding samples to ASV for multi gpu support also consider extending all our in-game tooling around gpu for thie work. So gpu memory profiler, gpu pass tree visualizer, gpu profiler, etc.

from sig-graphics-audio.

jhmueller-huawei commented on July 17, 2024

Thank you for the response! Here are our thoughts to the points you raised:

From what we tried so far, it seemed enough to have a single-device and a multi-device version of TransientAttachmentPool as you wrote and no multi-device versions of AliasedHeap and AliasedAttachmentAllocator should be necessary. We could be wrong of course, but that should be easily resolvable if it becomes an issue after all.
We are not sure what exactly you miss here?

However, the class CommandQueue will actually handle multi-device classes and translate them to single-device objects, without needing a separate multi-device version, as we do not currently plan to be able to submit work to multiple devices at the same time. More details will be discussed further below.

Is that the part you were looking for, or are we misunderstanding something here?

The issue here is one of dependencies. Given that we established to have multi-device and single-device versions of DrawItem, which references an IndexBufferView, it is necessary to have both versions as well for IndexBufferView, otherwise the backends would again need a device index to get the proper buffer out of the device-specific DrawItem.
Sounds good. There are certainly some optimizations that can be done with the structs. If we understand it correctly in your example here, you would basically get rid of the DeviceBufferDescriptor and put all necessary information into the DeviceInitRequest directly?
In our sample pull request, we partially changed parts there already, but we didn't want to change too much. Our strategy was not to change anything in the UI, so we used total statistics over all device where applicable. But nicer tools here that allow you to investigate multiple devices individually will certainly be required.

We think our main goal here should be to get these changes in as quickly as possible so that any optimizations and additional features/improvements can be built on later. This concerns points 4 and 5 mainly. The reason behind this is again, that this whole thing is a considerable amount of changes and the development costs compound the longer it takes, since every rebase takes additional time and more changes/fixes.

from sig-graphics-audio.

martinwinter-huawei commented on July 17, 2024

Regarding ResourceView-derived classes (BufferView and ImageView), the current design intends not to provide "multi-device" variants of these classes, which necessitates changes to the API.
In particular, ShaderResourceGroupData currently holds arrays of ImageViews and BufferViews and has functionality to get/set individual and also many views at the same time.

In particular, the following methods and members would need to be updated:

Methods
- Get/Set*
  - *ImageView
  - *ImageViewArray
  - *ImageViewUnboundedArray
  - *BufferView
  - *BufferViewArray
  - *BufferViewUnboundedArray
  - *BindlessViews
  - *ImageGroup
  - *BufferGroup
- BindlessResourceViews
- Validate*View
- Validate*ViewAccess
Members
- m_imageViews
- m_bufferViews
- m_imageViewsUnboundedArray
- m_bufferViewsUnboundedArray
- m_bindlessResourceViews

The question is, how should this be realized, since these methods now need to be able to, instead of managing a *View, receive/supply a *ViewDescriptor and a MultiDevice* resource.

Proposed solutions

`AZStd::pair<MultiDeviceResource, ResourceViewDescriptor>`

Potentially the most effective and simplest way would be to use a pair, consisting of the corresponding multi-device resource and its view descriptor.
That way, the API would have minimal changes and would remain identical semantically, except at the caller site, where one would have to construct/deconstruct the pairs for further use.

Pass in resources separately

It would also be possible to pass in the resources as separate parameters, also storing them in separate arrays internally, but that would bring up the question of how to return said resource + descriptor.
By returning pointers to separate parameters, the API would need to change the most (i.e. changing getters to void and returning both resource + descriptor as reference).
A middle way would also be possible (passing in separate, store as pair and return as pair), but that solution seems most inconsistent.

We would prefer the option using pairs, but feedback would be appreciated 😊

from sig-graphics-audio.

moudgils commented on July 17, 2024

Just so I understand correctly let me go over how I am imagining the initial setup of resource binding will work for multi-gpu support at RPI level. Currently RPI has RPI::ShaderResourceGroup (RPI::Srg for short) which has a bunch of functionality but we only need to focus on two variables here -> RHI::ShaderResourceGroup (RHI::Srg for short) and RHI::ShaderResourceGroupData (RHI::SRGdata for short). RPI currently manages RHI::SRGdata and allows passes to populate it with resource views. This object is then fed into RHI::Srg which uses it to compile and bind all the resource to the native RHI backend object (i.e Descriptor set/descriptor table/argument table).

With multi gpu support I am assuming that each pass at the RPI level still is only meant to be run on one device. Since one pass can only run on one device then a pass can contain RPI::SRG that is only meant to hold views for one device only. This actually makes code at the RPI level simpler. RPI::SRG can behave the same way where one RPI::SRG can have still only have one RHI::SRGdata and one RHI::Srg. RHI::SRG does not need to be split to RHI::DeviceSRG as it will only ever be used on one device. RPI::SRG api related to Get/Set views can stay the same and it's contents can stay the same. The only thing that will change is the higher level code that is invoking this api. The higher level pass code knows which device it will need to run on (from the render pipeline). Lets say this device index is called m_deviceIndex and lives within Pass class. Now any code that is invoking the api of RPI::SRG it will need to get the resource view via the device index. Based on our last conversation since the view hierarchy did not go through Device split I am assuming it is still DX12::BufferView->RHI::BufferView->RHI::ResourceView->RHI::DeviceObject->RHI::Object and RHI::DeviceResource has cache of all the created RH::ResourceViews. So GetResourceView will need to be expanded to take a device index. It will use the device index to cache the RHI::ResourceView within the correct RHI::DeviceResource. So RPI (i.e the higher level code) will provide the device index when creating/getting resource view. This RHI::ResourceView can then be used by the RPI::SRG and no api changes will be needed.

Let me know if this is confusing and if anything needs expanding or if I missed something myself and am completely on the wrong track.

from sig-graphics-audio.

jhmueller-huawei commented on July 17, 2024

It's not confusing, but unfortunately, you are on the wrong track. It's true that if passes only run on a single device you would not need multi-device SRGs. But passes are not the only place where SRGs are used. Think of the scene SRG which is created independently from a pass and is very likely needed on multiple devices. Another way you can think of it: if there are no multi-device SRGs, you cannot have multi-device Draw-/DispatchItems and we already had a lengthy discussion about why we need those. So, unfortunately, we need a multi-device SRG. Given this, could you please again consider the API changes proposed (i.e. using a pair or passing in as separate parameters).

from sig-graphics-audio.

moudgils commented on July 17, 2024

Ahh I see your point about multi-device SRG. Ok Let me backtrack a bit. In order to support multi-device RHI::SRGs this is how I would think the new design may look like. Remember that RPI::Srg is the one that populates RHI::SRGdata which is just a container object and passes it to RHI::Srg. Lets say that now RHI::Srg goes through the split to RHI::DeviceSRG and now RHI::SRG can contain multiple DX12::SRGs. Same thing applies to RHI::ShaderResourceGroupPool.

RPI::Srg will now need to manage multiple RHI::SRGdata, one per device. So RHI::SRGdata api will not change but RPI::SRG api will need to change. I think your suggestion can work here. So RPI::SRG's Get/Set methods can be modified to take in azstd::pair<MultiDeviceResource, ResourceViewDescriptor> but it needs to know the device index which will tell it which RHI::SRGdata to update. Maybe it can get the device index from the ResourceViewDescriptor? You would also want to keep the current api where Get/Set methods also take in a RHI::ResourceView as the higher level code may want to keep a cached view around and just want to pass that in instead of a pair.

So now we have RPI::SRG that is now managing multiple RHI::SRGdata and one multidevice RHI::SRG. Next we will need to change RHI::SRG's Compile api. Currently RHI::SRG takes a RHI::SRGdata and essentially updates the native RHI object from that. We will need change the api to now take multiple RHI::SRGdata and internally it will call Compile on DX12::SRG with the correct RHI::SRGdata. So just to conclude we will need to modify RPI:SRG api and RHI::SRG's compile api. Does this work?

from sig-graphics-audio.

jhmueller-huawei commented on July 17, 2024

Thanks for your suggestions! What you are describing is basically what we have in mind for the multi-device SRGD class, except that you plan to implement it in the RPI::SRG instead. This would work, but we think it would still be nice to have a MDSRGD class to separate this functionality from the RPI, have it testable separately and consistent with the other multi-device classes which are implemented on the RHI rather than the RPI level. If you don't have any strong objections to having a separate multi-device SRGD class, we will implement it like this. You are of course right that the API of the RPI::SRG has to be adapted later then as well to reflect these changes in the MDSRGD class, when we port the RPI to use the MD classes.

from sig-graphics-audio.

moudgils commented on July 17, 2024

Since this RFC is accepted please open a PR and move this RFC to this folder - https://github.com/o3de/sig-graphics-audio/tree/main/rfcs where we will track all the new RFCs. Thanks.

from sig-graphics-audio.

jhmueller-huawei commented on July 17, 2024

Since this RFC is accepted please open a PR and move this RFC to this folder - https://github.com/o3de/sig-graphics-audio/tree/main/rfcs where we will track all the new RFCs. Thanks.

Here you go: #137

from sig-graphics-audio.

jhmueller-huawei commented on July 17, 2024

@moudgils I've been working on using the MD resource classes in the RPI. Doing that I ran into the issue that I needed to use MD classes in the scope attachments. We also do so in our prototype. However, as I was thinking about it more, I realized that we might not want that.

The basic idea that we are following is that a scope (and thus a pass) is attached to one device. In the prototype for example, the MD SRGs are submitted for compilation by the passes. But that means depending on the device mask of that MD SRG, multiple SD SRGs are unnecessarily compiled, even though only one of them is required for the specific task. The better solution would be to transition from MD to SD already when submitting for compilation in the passes rather than between compilation and execution.

This would mean, that the passes need to be more aware of the device they are running on, i.e., call device specific methods of the RPI classes (passing their device index) that store the MD objects. An example here is the following (bringing us back to the scope attachments that triggered my thinking around this): The CompileResources method in the passes which is called during frame graph compilation may get an image view from the compile context via a scope attachment (which then would already be SD, not MD) and then binds this in an RPI SRG. That method would need to get the device index from the SD image view and only update the corresponding SD SRG.

What are your thoughts about this?

from sig-graphics-audio.

Proposed RFC Feature: Multi-device Resources about sig-graphics-audio HOT 12 OPEN

Comments (12)

Proposed solutions

`AZStd::pair<MultiDeviceResource, ResourceViewDescriptor>`

Pass in resources separately

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent