openvinotoolkit / onednn
This project forked from oneapi-src/onednn
oneAPI Deep Neural Network Library (oneDNN)
Home Page: https://01.org/dnnl
License: Apache License 2.0
Disclaimer: I'm not very familiar with this library, or with neural network processing in general. I'm working with a model developed by a sister company (created using PyTorch, converted to OpenVINO IR from ONNX) and we are getting a persistent crash when running on Windows with a large number of concurrent streams. With some effort I narrowed down the cause and identified a fix that worked for us, but there are probably much more elegant ways to fix this, hence this is an issue report instead of a pull request.
The cause of the crash seems to be this instruction:
00000179D623120F C4 E3 6D 4A 18 00 vblendvps ymm3,ymm2,ymmword ptr [rax],ymm0
which is part of a JIT routine called by
template <cpu_isa_t isa, data_type_t acc_type, data_type_t dst_type>
void jit_pp_kernel_t<isa, acc_type, dst_type>::operator()(dst_data_t *dst,
        const acc_data_t *acc, const char *bias, const float *scales,
        size_t start, size_t end, size_t runtime_oc,
        const float *dst_zero_points) const
in jit_gemm_inner_product_utils.cpp. At the time of the crash RAX is pointing less than 32 bytes from the end of an allocated memory page, and the addresses following this page are invalid. The YMM instructions operate on 32 bytes (256 bits) at a time, so this causes an access violation.
For example, in one of my crash dumps RAX has the value 00000179EC8A6FE4, and the memory there looks like this:
0x00000179EC8A6FE4 06 6f 8f 3d fd fd fd fd .o.=ýýýý
0x00000179EC8A6FEC dd dd dd dd dd dd dd dd ÝÝÝÝÝÝÝÝ
0x00000179EC8A6FF4 dd dd dd dd dd dd dd dd ÝÝÝÝÝÝÝÝ
0x00000179EC8A6FFC dd dd dd dd ?? ?? ?? ?? ÝÝÝÝ....
0x00000179EC8A7004 ?? ?? ?? ?? ?? ?? ?? ?? ........
...
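The arithmetic behind that dump can be checked with a small sketch. It assumes 4 KiB pages (the usual size on x64 Windows), and the helper names are made up for illustration; they are not part of oneDNN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4 KiB pages, the usual page size on x64 Windows.
constexpr std::uint64_t kPageSize = 4096;

// Number of bytes from addr up to (not including) the next page boundary.
inline std::uint64_t bytes_to_page_end(std::uint64_t addr) {
    return kPageSize - (addr % kPageSize);
}

// True when a load of load_bytes starting at addr touches the following page.
inline bool load_crosses_page(std::uint64_t addr, std::size_t load_bytes) {
    assert(load_bytes > 0 && load_bytes <= kPageSize);
    return load_bytes > bytes_to_page_end(addr);
}
```

For the RAX value above, bytes_to_page_end(0x00000179EC8A6FE4) is 28, so a 32-byte YMM load runs 4 bytes into the next page, which matches the first unreadable bytes at 0x00000179EC8A7000 in the dump.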
I seem to have fixed this by adding 32 bytes of padding to all node/edge memory allocations, using the following patch:
diff --git a/src/common/memory_desc_wrapper.hpp b/src/common/memory_desc_wrapper.hpp
index 4017db3dc..158120ad5 100644
--- a/src/common/memory_desc_wrapper.hpp
+++ b/src/common/memory_desc_wrapper.hpp
@@ -162,7 +162,7 @@ struct memory_desc_wrapper : public c_compatible {
                 max_size = utils::array_product(bd.inner_blks, bd.inner_nblks);
             }
-            return max_size * data_type_size() + additional_buffer_size();
+            return max_size * data_type_size() + additional_buffer_size() + 32;
         }
     }
This is enough to allow us to move forward, but I'm sure someone with a better understanding of the code can do better.
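For discussion, one alternative to padding every allocation would be to stage the last partial vector in a full-width local buffer before the vector load. The following is only a sketch of that idea in plain C++, not how the JIT kernel works and not a proposed patch; stage_tail and kYmmFloats are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical constant, not part of oneDNN: floats in one 256-bit YMM register.
constexpr std::size_t kYmmFloats = 8;

// Copies the last `remaining` floats of a buffer into a zero-filled,
// full-width staging array, so that a later 32-byte vector load reads
// only memory the caller owns.
inline void stage_tail(const float *src, std::size_t remaining,
        float (&staged)[kYmmFloats]) {
    assert(remaining <= kYmmFloats);
    std::memset(staged, 0, sizeof(staged));
    std::memcpy(staged, src, remaining * sizeof(float));
}
```

The trade-off is an extra copy on the tail iteration in exchange for leaving allocation sizes untouched. Whether that fits the generated code in jit_gemm_inner_product_utils.cpp is something I can't judge.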
Git hash is e0381c3. This is the version referenced by OpenVINO 2021.4.
CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
OS version: Windows 10 Enterprise LTSC (10.0.17763)
Compiler version: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30133 for x64
CMake version: 3.16.2
I wish I could give you something better to go on here, but we were only able to reproduce this in a system that was live streaming from 64 network cameras and running them all through the model in real time. I was not able to reproduce using a test harness that read the data from disk instead of over a network.
Observed behavior: the program crashes intermittently (see summary). This can happen after a couple of minutes or a couple of hours.
Expected behavior: the program does not crash.