Comments (12)
At first glance, the code looks fine. Are you having any problems when using this function? Please attach the oneDNN verbose log if you need our assistance.
from onednn.
Hi @feixuedudiao, there seems to be a bit of a misunderstanding. oneDNN primitive caching is enabled by default at build time via the CMake option ONEDNN_ENABLE_PRIMITIVE_CACHE. For example, when you create a oneDNN primitive such as a convolution, oneDNN internally checks whether an identical primitive (with the same parameters) already exists in the cache. If the primitive is not found in the cache, it will be created.
As @shu1chen mentioned, you can control the capacity (i.e., the total number of cached primitives) via the runtime environment variable ONEDNN_PRIMITIVE_CACHE_CAPACITY. You don't need to manually add or remove primitives from the cache; oneDNN handles that for you.
If you would like to profile your workload and check for cache hits/misses, you can use oneDNN verbose mode set to profile or all:
ONEDNN_VERBOSE=profile ./test_iface_primitive_cache
Hi, do you mean the primitive cache? Please see the documentation of the primitive cache. It's enabled by default when building oneDNN (-DONEDNN_ENABLE_PRIMITIVE_CACHE=ON), and normally you don't have to change the code. You can see the primitive cache working in the verbose log: primitive creation is reported with create:cache_miss or create:cache_hit.
The environment variable ONEDNN_PRIMITIVE_CACHE_CAPACITY can be used to change the cache capacity or disable the cache at run time. There are also API functions to set and get the primitive cache capacity. They are well documented and straightforward to use directly, which is why they don't appear in the examples.
Yes, I checked it. Thanks.
Can I use the primitive cache with this example? The code is below:
void fill_primitive_cache(int n) {
    using tag = memory::format_tag;
    using dt = memory::data_type;

    engine eng(get_test_engine_kind(), 0);
    for (int i = 0; i < n; i++) {
        // Fill the primitive cache with n distinct primitives.
        auto md = memory::desc({i, 1, 1, 1}, dt::f32, tag::nchw);
        auto relu_pd = eltwise_forward::primitive_desc(eng,
                prop_kind::forward_inference, algorithm::eltwise_relu, md, md,
                0.f, 0.f);
        auto relu = eltwise_forward(relu_pd);
    }
}

TEST(primitive_cache_test, TestDefaultCapacity) {
    custom_unsetenv("ONEDNN_PRIMITIVE_CACHE_CAPACITY");
    custom_unsetenv("DNNL_PRIMITIVE_CACHE_CAPACITY");
    auto default_capacity = get_primitive_cache_capacity();
#ifndef DNNL_DISABLE_PRIMITIVE_CACHE
    ASSERT_EQ(default_capacity, 1024);
#else
    ASSERT_EQ(default_capacity, 0);
#endif
}
I have a question: which of the two inference approaches, graph or primitive, gives the best performance on CPU?
> I have a question: which of the two inference approaches, graph or primitive, gives the best performance on CPU?
It's hard to say; both have advantages. It depends on the usage scenario and problem type, so we don't have an answer to this general question. Do you have a specific use case?
> At first glance, the code looks fine. Are you having any problems when using this function? Please attach the oneDNN verbose log if you need our assistance.
No, the code is from the example test_iface_primitive_cache.cpp.
> I have a question: which of the two inference approaches, graph or primitive, gives the best performance on CPU?
> It's hard to say; both have advantages. It depends on the usage scenario and problem type, so we don't have an answer to this general question. Do you have a specific use case?
For example, human segmentation with chn.
In the depthwise_convolution example, how do I set the primitive cache for memory? The code is below:
void depthwise_convolution_example(dnnl::engine::kind engine_kind) {
    using tag = memory::format_tag;
    using dt = memory::data_type;

    // Create execution dnnl::engine.
    dnnl::engine engine(engine_kind, 0);

    // Create dnnl::stream.
    dnnl::stream engine_stream(engine);

    // Tensor dimensions.
    const memory::dim N = 3, // batch size
            G = 32, // channel groups
            IC = 32, // input channels
            IH = 13, // input height
            IW = 13, // input width
            OC = 32, // output channels
            KH = 3, // weights height
            KW = 3, // weights width
            PH_L = 1, // height padding: left
            PH_R = 1, // height padding: right
            PW_L = 1, // width padding: left
            PW_R = 1, // width padding: right
            SH = 4, // height-wise stride
            SW = 4, // width-wise stride
            OH = (IH - KH + PH_L + PH_R) / SH + 1, // output height
            OW = (IW - KW + PW_L + PW_R) / SW + 1; // output width

    // Source (src), weights, bias, and destination (dst) tensors dimensions.
    memory::dims src_dims = {N, IC, IH, IW};
    memory::dims weights_dims = {G, OC / G, IC / G, KH, KW};
    memory::dims bias_dims = {OC};
    memory::dims dst_dims = {N, OC, OH, OW};

    // Strides, padding dimensions.
    memory::dims strides_dims = {SH, SW};
    memory::dims padding_dims_l = {PH_L, PW_L};
    memory::dims padding_dims_r = {PH_R, PW_R};

    // Allocate buffers.
    std::vector<float> src_data(product(src_dims));
    std::vector<float> weights_data(product(weights_dims));
    std::vector<float> bias_data(OC);
    std::vector<float> dst_data(product(dst_dims));

    // Initialize src, weights, and dst tensors.
    std::generate(src_data.begin(), src_data.end(), []() {
        static int i = 0;
        return std::cos(i++ / 10.f);
    });
    std::generate(weights_data.begin(), weights_data.end(), []() {
        static int i = 0;
        return std::sin(i++ * 2.f);
    });
    std::generate(bias_data.begin(), bias_data.end(), []() {
        static int i = 0;
        return std::tanh(float(i++));
    });

    // Create memory objects for tensor data (src, weights, dst). In this
    // example, NCHW layout is assumed for src and dst, and GOIHW for weights.
    set_primitive_cache_capacity(N);
    auto user_src_mem = memory({src_dims, dt::f32, tag::nchw}, engine);
    set_primitive_cache_capacity(G);
    auto user_weights_mem = memory({weights_dims, dt::f32, tag::goihw}, engine);
    auto user_dst_mem = memory({dst_dims, dt::f32, tag::nchw}, engine);

    // Create memory descriptors with format_tag::any for the primitive. This
    // enables the convolution primitive to choose memory layouts for an
    // optimized primitive implementation, and these layouts may differ from
    // the ones provided by the user.
    auto conv_src_md = memory::desc(src_dims, dt::f32, tag::any);
    auto conv_weights_md = memory::desc(weights_dims, dt::f32, tag::any);
    auto conv_dst_md = memory::desc(dst_dims, dt::f32, tag::any);

    // Create memory descriptor and memory object for input bias.
    auto user_bias_md = memory::desc(bias_dims, dt::f32, tag::a);
    set_primitive_cache_capacity(OC);
    auto user_bias_mem = memory(user_bias_md, engine);

    // Write data to memory object's handle.
    write_to_dnnl_memory(src_data.data(), user_src_mem);
    write_to_dnnl_memory(weights_data.data(), user_weights_mem);
    write_to_dnnl_memory(bias_data.data(), user_bias_mem);

    // Create primitive post-ops (ReLU).
    const float alpha = 0.f;
    const float beta = 0.f;
    post_ops conv_ops;
    conv_ops.append_eltwise(algorithm::eltwise_relu, alpha, beta);
    primitive_attr conv_attr;
    conv_attr.set_post_ops(conv_ops);

    // Create primitive descriptor.
    auto conv_pd = convolution_forward::primitive_desc(engine,
            prop_kind::forward_inference, algorithm::convolution_direct,
            conv_src_md, conv_weights_md, user_bias_md, conv_dst_md,
            strides_dims, padding_dims_l, padding_dims_r, conv_attr);

    // For now, assume that the src, weights, and dst memory layouts generated
    // by the primitive and the ones provided by the user are identical.
    auto conv_src_mem = user_src_mem;
    auto conv_weights_mem = user_weights_mem;
    auto conv_dst_mem = user_dst_mem;

    // Reorder the data in case the src and weights memory layouts generated
    // by the primitive and the ones provided by the user are different. In
    // this case, we create additional memory objects with internal buffers
    // that will contain the reordered data. The data in dst will be reordered
    // after the convolution computation has finalized.
    if (conv_pd.src_desc() != user_src_mem.get_desc()) {
        conv_src_mem = memory(conv_pd.src_desc(), engine);
        reorder(user_src_mem, conv_src_mem)
                .execute(engine_stream, user_src_mem, conv_src_mem);
    }
    if (conv_pd.weights_desc() != user_weights_mem.get_desc()) {
        conv_weights_mem = memory(conv_pd.weights_desc(), engine);
        reorder(user_weights_mem, conv_weights_mem)
                .execute(engine_stream, user_weights_mem, conv_weights_mem);
    }
    if (conv_pd.dst_desc() != user_dst_mem.get_desc()) {
        conv_dst_mem = memory(conv_pd.dst_desc(), engine);
    }

    // Create the primitive.
    auto conv_prim = convolution_forward(conv_pd);

    // Primitive arguments.
    std::unordered_map<int, memory> conv_args;
    conv_args.insert({DNNL_ARG_SRC, conv_src_mem});
    conv_args.insert({DNNL_ARG_WEIGHTS, conv_weights_mem});
    conv_args.insert({DNNL_ARG_BIAS, user_bias_mem});
    conv_args.insert({DNNL_ARG_DST, conv_dst_mem});

    // Primitive execution: convolution with ReLU.
    conv_prim.execute(engine_stream, conv_args);

    // Reorder the data in case the dst memory descriptor generated by the
    // primitive and the one provided by the user are different.
    if (conv_pd.dst_desc() != user_dst_mem.get_desc()) {
        reorder(conv_dst_mem, user_dst_mem)
                .execute(engine_stream, conv_dst_mem, user_dst_mem);
    } else {
        user_dst_mem = conv_dst_mem;
    }

    // Wait for the computation to finalize.
    engine_stream.wait();

    // Read data from memory object's handle.
    read_from_dnnl_memory(dst_data.data(), user_dst_mem);
}
> I have a question: which of the two inference approaches, graph or primitive, gives the best performance on CPU?
> It's hard to say; both have advantages. It depends on the usage scenario and problem type, so we don't have an answer to this general question. Do you have a specific use case?
> For example, human segmentation with chn.
@feixuedudiao This is still too general. The comparison between oneDNN primitive and graph is similar to the comparison between PyTorch/TensorFlow eager mode and graph mode; some papers and technical articles (1, 2, 3, 4, 5) may help you understand their differences and decide which programming model to start with.
@shu1chen thanks