Comments (12)
Let's say you have:

supported partition -> unsupported partition -> supported partition

The output of the first supported partition would be the input to your op corresponding to the unsupported partition. The output of that operator (whose implementation would be handled by you) will be the input to the second supported partition.
Thanks for your response.
So it sounds like I will need to do the scheduling of the partitions myself. Or I could identify collections of partitions that can be run together and schedule them as a group. Sounds doable.
But if only the graph API handled all of the operators it documents, it would be so much easier!
Any chance that is something the oneDNN graph API team could undertake? I had originally thought of the graph API as a replacement for the non-graph oneDNN API, but it is not one unless all the operators are unconditionally supported.
from onednn.
@TaoLv, thank you for your comment and question.
A framework provides building blocks that an application can use. Recently we decided to try the oneDNN framework. Originally we were using the "primitive" oneDNN API, but it seemed that the graph API would give better performance, so we decided to try it. We were going to use the primitive API for operators not supported by the graph API, but use the graph API for those it supports. It then surprised us that the graph API did not accept some of the operators advertised as being supported by it.
It seems that the situation with the graph API is one of trial and error. Build a model using the operators that are documented as being supported. Then try compiling and running it on the graph API, but receive errors of unsupported partitions. Then implement those unsupported partitions (though they only contain operators that were documented as supported) outside the graph API. The thing that is confusing is that a given operator might be accepted in some contexts in the graph API, but not others. So one cannot really say whether an operator is supported by the graph API or not.
If the alternative to having the graph API support single operators is to use the primitive API, then it seems there should be some semi-automatic way to glue the graph API to the primitive API, so that to graph API users it appears that the graph API accepts all of its documented operators in all circumstances.
I suppose the situation might be somewhat better if the documentation stated under what circumstances an operator could be used. But then that documentation would have to change whenever new fusion passes were added. Of course, we can always read the oneDNN source code, but one would hope users don't have to do that. The other alternative is the trial-and-error approach mentioned above.
I hope you can see that the situation would be much less confusing and easier for the user if all of the documented operators were accepted all of the time. The user tuning for performance would still want to look at the actual partitions produced, to see if the expected fusions were happening, but at least they would have a functional model. If the single operators were at least implemented with the performance of a "primitive" oneDNN API call, then that would be as good as the user is likely to get anyway, unless they implemented the kernel themselves.
Hi @richard-lemurian, sorry for the late reply. What you described looks like a new usage scenario for us. I will need to discuss with my colleagues how to improve the usability of the graph API to cover different usages. But so far, I don't have a concrete plan yet. Besides that, I still have a few things to clarify:
> Recently we decided to try the oneDNN framework.
Not sure if this is a terminology issue, but as clearly mentioned in the README, oneDNN is a performance library, not a framework for end-to-end model enabling. Deep learning practitioners are encouraged to use frameworks and toolkits (PyTorch, TensorFlow, OpenVINO, etc.) with oneDNN enabled. The main differences are:
- oneDNN focuses on performance optimizations.
- oneDNN focuses on key operations and layers, which are far fewer than what frameworks and toolkits cover.
- oneDNN does not provide a graph executor, which is an essential part of frameworks for handling operation execution and dependencies.
> Then try compiling and running it on the graph API, but receive errors of unsupported partitions.
The decision of supported or not supported happens at the graph partitioning stage, which is relatively early in frameworks. They still have a lot of chances to handle the unsupported partitions and fall back to other implementations. From this perspective, an "unsupported partition" is not an error; it's a designed communication mechanism with the callers. By the way, even when the library returns a partition as supported, I see that in some framework integrations they still have the flexibility to decide not to use the library and to call other implementations.
> If the single operators were at least implemented with the performance of a "primitive" oneDNN API call
Currently we don't have a guarantee that each graph operation can be implemented with the existing primitive API. If we commit to supporting the usage requested here, there will be quite a lot of development work in front of us, both now and in the future whenever we add new operations to the opset.
> Do you think there is some semi-automatic way that the "primitive" operator implementations can be glued into the graph API implementation?
We can see whether we can provide a patch if StaticReshape/StaticTranspose are the only operations of interest here.
Lastly, as oneDNN only supports a small subset of deep learning operations, and with deep learning frameworks in mind, we thought that users would always have reference implementations when an operator is not supported by the library. But now I agree that it may not be that easy if one wants to create an application from scratch without reference implementations like those in frameworks. They will want the library to provide as much functionality as possible.
Thank you again for the detailed explanation and patience!
CC @igorsafo
@TaoLv, Great! Thanks so much. I will study these references.
Hi @richard-lemurian , thank you for the question and looking into the implementation details. Your observation and analysis look correct to me. Currently there are a bunch of operations defined in the spec and library (https://oneapi-src.github.io/oneDNN/graph_supported_operations.html).
From the backend implementation perspective, it's possible that an operation is passed to the library but not supported by the backend (not fused with other operations and not implemented as a single-op partition). Those operations will still be returned to users, but as "unsupported" partitions. That's the purpose of the API partition::is_supported(). Users need to check the support status of a partition via this API before compiling it. If a partition is not supported by the library, users will have to handle it by themselves. This is documented in the API documentation.
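In rough terms, the check-then-dispatch flow described above can be sketched as a toy Python model. To be clear, this is not the actual oneDNN C++ API: `Partition`, `compile_and_execute`, and `my_fallback` here are hypothetical stand-ins for `dnnl::graph::partition`, partition compilation/execution, and a user-provided implementation.

```python
# Toy model of the partition-dispatch flow: supported partitions go
# through the library, unsupported ones through user code. All names
# are hypothetical stand-ins, not oneDNN API.

class Partition:
    def __init__(self, name, supported):
        self.name = name
        self.supported = supported

    def is_supported(self):
        # Stand-in for dnnl::graph::partition::is_supported().
        return self.supported

def compile_and_execute(partition, inputs):
    # Stand-in for partition.compile(...) + compiled_partition.execute(...).
    return f"lib({partition.name})"

def my_fallback(partition, inputs):
    # Stand-in for the user's own implementation of an unsupported partition.
    return f"user({partition.name})"

def run(partitions, inputs):
    results = []
    for p in partitions:
        if p.is_supported():
            results.append(compile_and_execute(p, inputs))
        else:
            results.append(my_fallback(p, inputs))
    return results

parts = [Partition("conv+relu", True), Partition("reshape", False)]
print(run(parts, None))  # -> ['lib(conv+relu)', 'user(reshape)']
```

The key point is only the branching on `is_supported()`: the library never raises an error for an unsupported partition; it simply hands it back for the caller to handle.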
For the StaticReshape and StaticTranspose mentioned in this issue, they are supported by the backend via fusions (eg, 1, 2, 3, etc.) but have no single-op implementation if the fusion does not match.
Thanks for your response.
It is unfortunate that the graph API cannot handle all of its advertised operators in arbitrary circumstances. If some cannot be fused, it would be good if they were automatically supported using the corresponding non-graph oneDNN API.
As it is, using the graph API gives me a collection of partitions, some supported and others not. And there are data dependencies between them. So if I have to provide alternate implementations of the unsupported partitions, is there a way to integrate those alternate implementations with the ones that are supported? Do oneDNN streams do scheduling based on data dependencies, waiting to execute some partitions until the partitions they depend on have been executed? I noticed the following:
/// @brief Stream flags.
typedef enum {
    /// In-order execution.
    dnnl_stream_in_order = 0x1U,
    /// Out-of-order execution.
    dnnl_stream_out_of_order = 0x2U,
    /// Default stream configuration.
    dnnl_stream_default_flags = dnnl_stream_in_order,
    ...
} dnnl_stream_flags_t;
So I'm guessing that in the dnnl_stream_in_order mode, the partitions are executed one after another. But if dnnl_stream_out_of_order is selected, does that mean that oneDNN would start some partitions in parallel, but schedule them to respect data dependencies?
I'm just trying to figure out how to handle the case of mixed supported and unsupported partitions.
> So if I have to provide alternate implementations of the unsupported partitions, is there a way to integrate those alternate implementations with those that are supported?
Let's say you have:

supported partition -> unsupported partition -> supported partition

The output of the first supported partition would be the input to your op corresponding to the unsupported partition. The output of that operator (whose implementation would be handled by you) will be the input to the second supported partition.
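That chaining can be sketched as a toy Python example. The helpers `run_supported` and `run_user_op` are hypothetical stand-ins (for executing a compiled oneDNN partition and for the user's own kernel, respectively); the only point is how the outputs thread through.

```python
# Data flows: supported -> (user-implemented) unsupported -> supported.
# The functions below are hypothetical stand-ins, not oneDNN API.

def run_supported(x):
    # Stand-in for executing a compiled oneDNN partition.
    return x * 2

def run_user_op(x):
    # Stand-in for the user's implementation of the unsupported partition.
    return x + 1

a = run_supported(3)   # output of the first supported partition
b = run_user_op(a)     # fed to the user's op for the unsupported partition
c = run_supported(b)   # fed to the second supported partition
print(c)  # -> 14
```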
> So I'm guessing that in the dnnl_stream_in_order mode, the partitions are executed one after another. But if dnnl_stream_out_of_order is selected, does that mean that oneDNN would start some partitions in parallel, but schedule them to respect data dependencies?
On CPU, dnnl::stream only supports out-of-order execution with the SYCL runtime, but the default runtime is OpenMP.
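One way to picture the in-order case (a toy Python model, not the oneDNN API): an in-order stream runs work items strictly in submission order, so submitting partitions in topological order of their data dependencies is already enough for correct results, without any out-of-order machinery.

```python
# Toy in-order "stream": tasks run strictly in the order submitted,
# so data dependencies are satisfied as long as producers are
# submitted before their consumers (i.e., in topological order).
# This is a conceptual model only, not oneDNN's dnnl::stream.

from collections import deque

class InOrderStream:
    def __init__(self):
        self.queue = deque()

    def submit(self, task):
        # Enqueue a callable; nothing runs until wait().
        self.queue.append(task)

    def wait(self):
        # Drain the queue in FIFO (submission) order.
        results = []
        while self.queue:
            results.append(self.queue.popleft()())
        return results

stream = InOrderStream()
buf = {}
stream.submit(lambda: buf.setdefault("a", 1))             # producer
stream.submit(lambda: buf.setdefault("b", buf["a"] + 1))  # consumer of "a"
results = stream.wait()
print(results)  # -> [1, 2]
```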
So, could this issue be turned into an enhancement request?
Hi @richard-lemurian,
oneDNN has fewer operations than frameworks (e.g., PyTorch or TensorFlow), whether in the graph API or the non-graph API (a.k.a. the primitive API). This means frameworks or applications will have to handle the unsupported operations by themselves while calling oneDNN to optimize the supported, optimized ones. Could you please explain a bit more why handling all documented operations would make things easier for you, especially when you still have many undocumented operations to handle?
Another thing is about performance expectations. oneDNN is a performance library targeting optimizations for key operations/layers in deep learning applications. That's also one of the reasons why it does not support all operations under arbitrary circumstances. Take StaticReshape/StaticTranspose as an example: we provide optimizations when they can be fused with other operations, but we don't see much benefit (and did not receive any request) in handling a single StaticReshape/StaticTranspose in the library vs. calling the existing implementations in the frameworks. So when you request implementations for all the operations, are you requesting both functionality and performance? Do you see a performance issue with the "unsupported" partitions?
Thank you!
@TaoLv, does having this issue assigned to you mean that you will work on an implementation of the enhancement, or just that you were assigned to triage it? It would be exciting if you were going to work on it. Do you think there is some semi-automatic way that the "primitive" operator implementations can be glued into the graph API implementation?
@TaoLv, thanks for your further explanations. Can you point me to how and where existing frameworks integrate the oneDNN graph API? As a model for using the graph API, I was using the example code in oneDNN/examples/graph. I suppose you would say that we are building a framework, so it would be helpful to know how other frameworks interact with the oneDNN graph API.
Hi @richard-lemurian , here are a few links for your information:
- PyTorch: https://github.com/pytorch/pytorch/tree/main/torch/csrc/jit/codegen/onednn
- Intel Extension for PyTorch: https://github.com/intel/intel-extension-for-pytorch/tree/main/csrc/cpu/jit/codegen/onednn
- Intel Extension for TensorFlow: https://github.com/intel/intel-extension-for-tensorflow/tree/main/itex/core/graph/onednn_graph