Comments (12)
Let's say you have:

supported partition -> unsupported partition -> supported partition

The output of the first supported partition would be the input to your op corresponding to the unsupported partition. The output of that operator (whose implementation would be handled by you) will be the input to the second supported partition.
Thanks for your response.
So it sounds like I will need to do the scheduling of the partitions myself. Or I could identify collections of partitions that can be run together and schedule them as a group. Sounds doable.
But if only the graph API handled all of the operators it documents, it would be so much easier!
Any chance that is something the oneDNN graph API team could undertake? I had originally thought of the graph API as a replacement for the non-graph oneDNN API, but it is not one unless all the operators are unconditionally supported.
from onednn.
@TaoLv, thank you for your comment and question.
A framework provides building blocks that an application can use. Recently we decided to try the oneDNN framework. Originally we were using the "primitive" oneDNN API, but it seemed that the graph API would give better performance, so we decided to try it. We were going to use the primitive API for operators not supported by the graph API, but use the graph API for those it supports. It then surprised us that the graph API did not accept some of the operators advertised as being supported by it.
It seems that the situation with the graph API is one of trial and error. Build a model using the operators that are documented as being supported. Then try compiling and running it on the graph API, but receive errors of unsupported partitions. Then implement those unsupported partitions (though they only contain operators that were documented as supported) outside the graph API. The thing that is confusing is that a given operator might be accepted in some contexts in the graph API, but not others. So one cannot really say whether an operator is supported by the graph API or not.
If the alternative to having the graph API support single operators is to use the primitive API, then it seems there should be some semi-automatic way to glue the graph API to the primitive API, so that to graph API users it appears that the graph API accepts all of its documented operators in all circumstances.
I suppose the situation might be somewhat better if the documentation stated under what circumstances an operator could be used. But then that documentation would have to change whenever new fusion passes were added. Of course, we can always read the oneDNN source code, but one would hope users don't have to do that. The other alternative is the trial-and-error approach mentioned above.
I hope you can see that the situation would be much less confusing and easier for the user if all of the documented operators were accepted all of the time. The user tuning for performance would still want to look at the actual partitions produced, to see if the expected fusions were happening, but at least they would have a functional model. If the single operators were at least implemented with the performance of a "primitive" oneDNN API call, then that would be as good as the user is likely to get anyway, unless they implemented the kernel themselves.
Hi @richard-lemurian, sorry for the late reply. What you described looks like a new usage scenario for us. I will need to discuss with my colleagues how to improve the usability of the graph API to cover different usages. But so far, I don't have a concrete plan yet. Besides that, I still have a few things to clarify:
> Recently we decided to try the oneDNN framework.
Not sure if this is a terminology issue, but as clearly mentioned in the README, oneDNN is a performance library, not a framework for end-to-end model enabling. Deep learning practitioners are encouraged to use frameworks and toolkits (PyTorch, TensorFlow, OpenVINO, etc.) with oneDNN enabled. The main differences are:
- oneDNN focuses on performance optimizations.
- oneDNN focuses on key operations and layers, which are far fewer than what frameworks and toolkits cover.
- oneDNN does not provide a graph executor, which is an essential part of frameworks for handling operation execution and dependencies.
> Then try compiling and running it on the graph API, but receive errors of unsupported partitions.
The decision of supported or not supported happens at the graph partitioning stage, which is relatively early in frameworks. They still have a lot of chances to handle the unsupported partitions and fall back to other implementations. From this perspective, an "unsupported partition" is not an error; it's a designed communication mechanism with the callers. By the way, even when the library returns a partition as supported, I see that in some framework integrations they still have the flexibility to decide not to use the library and to call other implementations.
> If the single operators were at least implemented with the performance of a "primitive" oneDNN API call
Currently we don't have a guarantee that each graph operation can be implemented with the existing primitive API. If we commit to supporting the usage requested here, there will be quite a lot of development work in front of us, both now and in the future whenever we add new operations to the opset.
> Do you think there is some semi-automatic way that the "primitive" operator implementations can be glued into the graph API implementation?
We can see whether we can provide a patch if StaticReshape/StaticTranspose are the only operations of interest here.
Lastly, as oneDNN only supports a small subset of deep learning operations, and with deep learning frameworks in mind, we thought that users would always have reference implementations when an operator is not supported by the library. But now I agree that it may not be that easy if one wants to create an application from scratch without reference implementations like those in frameworks. They will want the library to provide as much functionality as possible.
Thank you again for the detailed explanation and patience!
CC @igorsafo
@TaoLv, Great! Thanks so much. I will study these references.
Hi @richard-lemurian , thank you for the question and looking into the implementation details. Your observation and analysis look correct to me. Currently there are a bunch of operations defined in the spec and library (https://oneapi-src.github.io/oneDNN/graph_supported_operations.html).
From the backend implementation perspective, it's possible that an operation is passed to the library but not supported by the backend (not fused with other operations and not implemented as a single-op partition). Those operations will still be returned to users, but as "unsupported" partitions. That's the purpose of the API partition::is_supported(). Users need to check the support status of a partition via this API before compiling it. If a partition is not supported by the library, users will have to handle it by themselves. This is documented in the API documentation.
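In rough terms, the check-then-dispatch flow described above can be sketched as a toy Python model. To be clear, this is not the actual oneDNN C++ API: `Partition`, `compile_and_execute`, and `my_fallback` here are hypothetical stand-ins for `dnnl::graph::partition`, partition compilation/execution, and a user-provided implementation.

```python
# Toy model of the partition-dispatch flow: supported partitions go
# through the library, unsupported ones through user code. All names
# are hypothetical stand-ins, not oneDNN API.

class Partition:
    def __init__(self, name, supported):
        self.name = name
        self.supported = supported

    def is_supported(self):
        # Stand-in for dnnl::graph::partition::is_supported().
        return self.supported

def compile_and_execute(partition, inputs):
    # Stand-in for partition.compile(...) + compiled_partition.execute(...).
    return f"lib({partition.name})"

def my_fallback(partition, inputs):
    # Stand-in for the user's own implementation of an unsupported partition.
    return f"user({partition.name})"

def run(partitions, inputs):
    results = []
    for p in partitions:
        if p.is_supported():
            results.append(compile_and_execute(p, inputs))
        else:
            results.append(my_fallback(p, inputs))
    return results

parts = [Partition("conv+relu", True), Partition("reshape", False)]
print(run(parts, None))  # -> ['lib(conv+relu)', 'user(reshape)']
```

The key point is only the branching on `is_supported()`: the library never raises an error for an unsupported partition; it simply hands it back for the caller to handle.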
For the StaticReshape and StaticTranspose mentioned in this issue, they are supported by the backend via fusions (eg, 1, 2, 3, etc.) but have no single-op implementation if the fusion does not match.
Thanks for your response.
It is unfortunate that the graph API cannot handle all of its advertised operators in arbitrary circumstances. If some cannot be fused, it would be good if they were automatically supported using the corresponding non-graph oneDNN API.
As it is, using the graph API gives me a collection of partitions, some supported and others not. And there are data dependencies between them. So if I have to provide alternate implementations of the unsupported partitions, is there a way to integrate those alternate implementations with the ones that are supported? Do oneDNN streams do scheduling based on data dependencies, waiting to execute some partitions until the partitions they depend on have been executed? I noticed the following:
/// @brief Stream flags.
typedef enum {
    /// In-order execution.
    dnnl_stream_in_order = 0x1U,
    /// Out-of-order execution.
    dnnl_stream_out_of_order = 0x2U,
    /// Default stream configuration.
    dnnl_stream_default_flags = dnnl_stream_in_order,
    ...
} dnnl_stream_flags_t;
So I'm guessing that in the dnnl_stream_in_order mode, the partitions are executed one after another. But if dnnl_stream_out_of_order is selected, does that mean that oneDNN would start some partitions in parallel, but schedule them to respect data dependencies?
I'm just trying to figure out how to handle the case of mixed supported and unsupported partitions.
> So if I have to provide alternate implementations of the unsupported partitions, is there a way to integrate those alternate implementations with those that are supported?
Let's say you have:

supported partition -> unsupported partition -> supported partition

The output of the first supported partition would be the input to your op corresponding to the unsupported partition. The output of that operator (whose implementation would be handled by you) will be the input to the second supported partition.
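That chaining can be sketched as a toy Python example. The helpers `run_supported` and `run_user_op` are hypothetical stand-ins (for executing a compiled oneDNN partition and for the user's own kernel, respectively); the only point is how the outputs thread through.

```python
# Data flows: supported -> (user-implemented) unsupported -> supported.
# The functions below are hypothetical stand-ins, not oneDNN API.

def run_supported(x):
    # Stand-in for executing a compiled oneDNN partition.
    return x * 2

def run_user_op(x):
    # Stand-in for the user's implementation of the unsupported partition.
    return x + 1

a = run_supported(3)   # output of the first supported partition
b = run_user_op(a)     # fed to the user's op for the unsupported partition
c = run_supported(b)   # fed to the second supported partition
print(c)  # -> 14
```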
> So I'm guessing that in the dnnl_stream_in_order mode, the partitions are executed one after another. But if dnnl_stream_out_of_order is selected, does that mean that oneDNN would start some partitions in parallel, but schedule them to respect data dependencies?
On CPU, dnnl::stream only supports out-of-order execution with the SYCL runtime, but the default runtime is OpenMP.
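One way to picture the in-order case (a toy Python model, not the oneDNN API): an in-order stream runs work items strictly in submission order, so submitting partitions in topological order of their data dependencies is already enough for correct results, without any out-of-order machinery.

```python
# Toy in-order "stream": tasks run strictly in the order submitted,
# so data dependencies are satisfied as long as producers are
# submitted before their consumers (i.e., in topological order).
# This is a conceptual model only, not oneDNN's dnnl::stream.

from collections import deque

class InOrderStream:
    def __init__(self):
        self.queue = deque()

    def submit(self, task):
        # Enqueue a callable; nothing runs until wait().
        self.queue.append(task)

    def wait(self):
        # Drain the queue in FIFO (submission) order.
        results = []
        while self.queue:
            results.append(self.queue.popleft()())
        return results

stream = InOrderStream()
buf = {}
stream.submit(lambda: buf.setdefault("a", 1))             # producer
stream.submit(lambda: buf.setdefault("b", buf["a"] + 1))  # consumer of "a"
results = stream.wait()
print(results)  # -> [1, 2]
```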
So, could this issue be turned into an enhancement request?
Hi @richard-lemurian,
oneDNN has fewer operations than frameworks (e.g., PyTorch or TensorFlow), whether in the graph API or the non-graph API (a.k.a. the primitive API). This means frameworks or applications will have to handle the unsupported operations by themselves while calling oneDNN to optimize the supported, optimized ones. Could you please explain a bit more why handling all documented operations would make things easier for you, especially when you still have many undocumented operations to handle?
Another thing is about performance expectations. oneDNN is a performance library targeting optimizations for key operations/layers in deep learning applications. That's also one of the reasons why it does not support all operations under arbitrary circumstances. Take StaticReshape/StaticTranspose as an example: we provide optimizations when they can be fused with other operations, but we don't see much benefit (and did not receive any request) in handling a single StaticReshape/StaticTranspose in the library vs. calling the existing implementations in the frameworks. So when you request implementations for all the operations, are you requesting both functionality and performance? Do you see a performance issue with the "unsupported" partitions?
Thank you!
@TaoLv, does having this issue assigned to you mean that you will work on an implementation of the enhancement, or just that you were assigned to triage it? It would be exciting if you were going to work on it. Do you think there is some semi-automatic way that the "primitive" operator implementations can be glued into the graph API implementation?
@TaoLv, thanks for your further explanations. Can you point me to how and where existing frameworks integrate the oneDNN graph API? As a model for using the graph API, I was using the example code in oneDNN/examples/graph. I suppose you would say that we are building a framework, so it would be helpful to know how other frameworks interact with the oneDNN graph API.
Hi @richard-lemurian , here are a few links for your information:
- PyTorch: https://github.com/pytorch/pytorch/tree/main/torch/csrc/jit/codegen/onednn
- Intel Extension for PyTorch: https://github.com/intel/intel-extension-for-pytorch/tree/main/csrc/cpu/jit/codegen/onednn
- Intel Extension for TensorFlow: https://github.com/intel/intel-extension-for-tensorflow/tree/main/itex/core/graph/onednn_graph