dml's Introduction

Intel® Data Mover Library (Intel® DML)

Intel® Data Mover Library (Intel® DML) is an open-source library that provides high-performance data manipulation on Intel CPUs. Intel® DML is intended to optimize streaming data movement and transformation operations for storage, networking, persistent memory, and various data processing applications.

Get Started

To set up and build Intel DML, refer to Installation.

Documentation

Documentation is delivered using GitHub Pages. See full Intel DML online documentation.

To build Intel DML offline documentation, see Documentation Build Steps and Prerequisites.

Testing

See the Intel DML Testing page for details about the testing process.

How to Contribute

See the Contributing document for details about the contribution process.

How to Report Issues

See Issue Reporting for details about the issue reporting process.

License

The library is licensed under the MIT license. Refer to the "LICENSE" file for the full license text.

This distribution includes third party software governed by separate license terms (see "THIRD-PARTY-PROGRAMS").

Security

For information on how to report a potential security issue or vulnerability, see Security Policy.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation. No product or component can be absolutely secure. Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java is a registered trademark of Oracle and/or its affiliates.

dml's People

Contributors

abdelrahim-hentabli, egorkupaev, guptask, kiselik, lupustr3, mzhukova, smirnov1gor

dml's Issues

SIGILL on many Intel CPUs

Hi! I'm afraid that the detection of supported CPU extensions is flaky. On a bunch of machines I've tested, most examples die at an AVX instruction not supported by the CPU. This includes even Pentium 4 which you explicitly list as supported!

Machines I've tested:

uarch           model
Gemini Lake     J4115
Skylake         i7-6700K
Pinnacle Ridge  TR 2990WX
Pentium 4       f15 m4 s3
Braswell        N3160
Pentium 4       f15 m4 s1 DNC (32-bit)
Phenom 2        Thuban

You check cpuid, but take only cache info. The presence of AVX/AVX2/AVX512/etc is readily available there.
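The kind of check the reporter is asking for can be sketched as follows. This is a minimal illustration, not DML's dispatcher: `isa_level_t`, `detect_isa_level` and `isa_level_name` are hypothetical names, and the sketch assumes GCC or Clang, whose `__builtin_cpu_supports` builtin queries CPUID feature flags at run time. On other compilers or non-x86 targets it falls back to the baseline.

```c
#include <stddef.h>

/* Hypothetical ISA levels a dispatcher might choose between. */
typedef enum { ISA_BASELINE = 0, ISA_AVX2 = 1, ISA_AVX512 = 2 } isa_level_t;

/* Pick the widest level the running CPU reports, never more. */
static isa_level_t detect_isa_level(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init(); /* populate the compiler's CPU feature cache */
    if (__builtin_cpu_supports("avx512f")) {
        return ISA_AVX512;
    }
    if (__builtin_cpu_supports("avx2")) {
        return ISA_AVX2;
    }
#endif
    /* No AVX reported (e.g. Pentium 4, Braswell): stay on scalar/SSE code. */
    return ISA_BASELINE;
}

/* Human-readable name, handy for logging which kernels were selected. */
static const char *isa_level_name(isa_level_t level)
{
    switch (level) {
    case ISA_AVX512: return "avx512";
    case ISA_AVX2:   return "avx2";
    default:         return "baseline";
    }
}
```

A library would call `detect_isa_level()` once at initialization and select its kernel table from the result, so that no AVX instruction is reached on a CPU that does not report it.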

Job size seems too big

The job size returned by dml_get_job_size(PATH, &job_size_ptr) seems too big: I get 98816 regardless of the path used (DML_PATH_HW or DML_PATH_SW).
Is a ~100 kB memory allocation for the job buffer within the expected range?

An issue about Multi-Socket sample code

Issue description:
The Multi-Socket sample code sets the socket count to 4 by default; running it on different configurations/SKUs gives different results.
#define SOCKET_COUNT 4u

Config 1:
CPU: Intel(R) Xeon(R) Platinum 8490H
Socket : 2
DSA device per Socket: 4
Enable 1 device: dsa0 (on socket0)

Every SOCKET_COUNT value from 1 to 4 runs successfully.

Config 2:
CPU: Intel(R) Xeon(R) Platinum 8470
Socket : 2
DSA device per Socket: 1

setup1: // error failed to submit to node0
Enable 1 device: dsa0 (on socket0)
SOCKET_COUNT=4

setup2: // error failed to submit to node1
Enable 1 device: dsa0 (on socket0)
Enable 1 device: dsa1 (on socket1)
SOCKET_COUNT=4

setup3: // successful
Enable 1 device: dsa0 (on socket0)
Enable 1 device: dsa1 (on socket1)
SOCKET_COUNT=2

setup4: // successful
Enable 1 device: dsa0 (on socket0)
SOCKET_COUNT=4
Commented out code: current_job->numa_id = i

for (uint32_t i = 0; i < SOCKET_COUNT; ++i)
   {
       const uint32_t chunk_size = transfer_size / SOCKET_COUNT;

       dml_job_t* current_job = (dml_job_t*)((uint8_t*)jobs + (job_size * i));

       current_job->operation             = DML_OP_MEM_MOVE;
       current_job->source_first_ptr      = src + (chunk_size * i);
       current_job->destination_first_ptr = dst + (chunk_size * i);
       current_job->source_length         = chunk_size;
       current_job->flags                 = DML_FLAG_PREFETCH_CACHE;
       //current_job->numa_id               = i;
   }

Why do these setups give different results? Is there a logic issue in DML's NUMA node check?
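Independently of the NUMA question, the loop above derives every chunk from a compile-time SOCKET_COUNT. A minimal sketch of the same split driven by a runtime count is shown below; it does not explain the submit failures reported here. `chunk_t` and `chunk_for` are hypothetical helpers, and the count passed in is assumed to be the number of sockets that actually have an enabled device. Unlike `transfer_size / SOCKET_COUNT`, the sketch also hands out the remainder when the size is not evenly divisible.

```c
#include <stdint.h>

/* One contiguous piece of the transfer (hypothetical helper type). */
typedef struct {
    uint32_t offset; /* byte offset from the start of the buffer */
    uint32_t length; /* number of bytes in this piece */
} chunk_t;

/* Split transfer_size bytes into chunk_count pieces; the first
 * (transfer_size % chunk_count) pieces get one extra byte so that
 * no tail bytes are dropped. Returns {0, 0} for invalid arguments. */
static chunk_t chunk_for(uint32_t transfer_size, uint32_t chunk_count, uint32_t index)
{
    chunk_t chunk = {0u, 0u};
    if (chunk_count == 0u || index >= chunk_count) {
        return chunk;
    }
    const uint32_t base  = transfer_size / chunk_count;
    const uint32_t extra = transfer_size % chunk_count;
    chunk.offset = base * index + (index < extra ? index : extra);
    chunk.length = base + (index < extra ? 1u : 0u);
    return chunk;
}
```

In the sample's loop, `src + chunk.offset`, `dst + chunk.offset` and `chunk.length` would take the place of the `chunk_size * i` expressions.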

Incorrect patterns in update_**_for_continuation

FILL: to avoid a "shifted pattern" on a page boundary (I assume that the 16-byte pattern is not handled yet, so 8 is OK):

uint32_t processed = 8 * (fill_record.bytes_completed() / 8);
fill_dsc.transfer_size() -= processed;
fill_dsc.destination_address() += processed;

Does it apply to COMPARE_PATTERN?
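The FILL adjustment proposed above can be exercised in isolation with a small sketch. The descriptor struct and function name below are hypothetical stand-ins, not DML's internal types; only the arithmetic (rounding the completed byte count down to a whole 8-byte pattern) comes from the snippet in this issue.

```c
#include <stdint.h>

#define FILL_PATTERN_SIZE 8u /* 8-byte pattern, as assumed in the issue */

/* Hypothetical stand-in for the fields the continuation touches. */
typedef struct {
    uint64_t destination_address;
    uint32_t transfer_size;
} fill_descriptor_t;

/* Advance the descriptor past the whole patterns already written, so the
 * resumed fill starts on a pattern boundary. Returns the bytes skipped. */
static uint32_t fill_resume_after_fault(fill_descriptor_t *dsc, uint32_t bytes_completed)
{
    uint32_t processed = FILL_PATTERN_SIZE * (bytes_completed / FILL_PATTERN_SIZE);
    if (processed > dsc->transfer_size) {
        processed = dsc->transfer_size; /* defensive clamp for bogus input */
    }
    dsc->transfer_size       -= processed;
    dsc->destination_address += processed;
    return processed;
}
```

With 13 bytes completed, only the first 8 are treated as done; the partially written second pattern is rewritten from its start when the operation resumes.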

DIF_INSERT:

uint32_t blocks_processed = bytes_completed / block_size;
bytes_completed -= blocks_processed  * block_size;
source_address += blocks_processed  * block_size;
destination_address += blocks_processed  * (block_size + DSA_DIF_SIZE);

A very similar pattern should apply to DIF_STRIP.
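The DIF_INSERT arithmetic above can be checked with a standalone sketch. The struct and function are hypothetical stand-ins for DML's descriptor; DSA_DIF_SIZE is taken as 8 bytes, which matches the 4096 and 4104 byte buffer sizes printed by the job API examples elsewhere on this page. Reducing the remaining source length by the completed blocks is an interpretation of the snippet, not confirmed DML behavior.

```c
#include <stdint.h>

#define DSA_DIF_SIZE 8u /* assumption: 8-byte DIF appended to each block */

/* Hypothetical stand-in for the fields the continuation touches. */
typedef struct {
    uint64_t source_address;
    uint64_t destination_address;
    uint32_t transfer_size; /* remaining source bytes */
} dif_insert_descriptor_t;

/* Skip the blocks that completed in full. The source advances by whole
 * blocks; the destination advances by block plus DIF for each of them.
 * Returns the number of whole blocks skipped. */
static uint32_t dif_insert_resume(dif_insert_descriptor_t *dsc,
                                  uint32_t block_size,
                                  uint32_t bytes_completed)
{
    if (block_size == 0u || bytes_completed > dsc->transfer_size) {
        return 0u; /* invalid input: leave the descriptor untouched */
    }
    const uint32_t blocks_processed = bytes_completed / block_size;
    dsc->transfer_size       -= blocks_processed * block_size;
    dsc->source_address      += (uint64_t)blocks_processed * block_size;
    dsc->destination_address += (uint64_t)blocks_processed * (block_size + DSA_DIF_SIZE);
    return blocks_processed;
}
```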

A question about dml_finalize_job

In the example code "multi_socket_example.c", the cleanup loop finalizes a pointer offset by a constant, "jobs + SOCKET_COUNT":

cleanup:
    for (uint32_t i = 0; i < SOCKET_COUNT; ++i)
    {
        dml_finalize_job(jobs + SOCKET_COUNT);
    }

Why is a constant used here instead of a variable offset such as "jobs + i"?

cleanup:
    for (uint32_t i = 0; i < SOCKET_COUNT; ++i)
    {
        dml_finalize_job(jobs + i);
    }
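One caveat on the proposed fix: the multi-socket loop quoted earlier on this page addresses job i as `(uint8_t*)jobs + (job_size * i)`, where job_size is obtained at run time. Whether `jobs + i` lands on the same address depends on the declared type of `jobs`, which is not shown here. A minimal sketch of the byte-based indexing follows; `job_at` is a hypothetical helper, not part of the DML API.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the address of the index-th record in a buffer that holds
 * records of job_size bytes each, where job_size is only known at
 * run time. The arithmetic is done in bytes, not in units of the
 * pointed-to type. */
static void *job_at(void *jobs, uint32_t job_size, uint32_t index)
{
    return (uint8_t *)jobs + (size_t)job_size * index;
}
```

Under that assumption the cleanup loop would call `dml_finalize_job((dml_job_t *)job_at(jobs, job_size, i))`, mirroring how the jobs were addressed when they were set up.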

Question about async mode of DML

We have some questions about the async mode of DML; thanks for any comments:
(1) If one device is configured with two or more SWQs, does DML async mode submit jobs to every WQ? How are jobs selected and allocated to these WQs, and are they balanced across the WQs?
(2) If a WQ has two or more engines, do jobs complete out of order or sequentially? In other words, is there a flag that makes dml_wait_job() preserve the job execution order?

Debugging hardware path on Sapphire Rapids

Hi,

I am unable to run hardware mode examples/tests. I did a fresh clone from master and built using GCC.

To configure the DSA and kernel, I followed the DSA user guide. I believe I have configured the DSA correctly because I can run the dsa_perf_micros scripts
e.g.

sudo ./src/dsa_perf_micros -n128 -s16k -j -c -f -i8000 -k5 -w0 -zF,F -o3
[sudo] password for user1:
./src/dsa_perf_micros -n128 -s16k -j -c -f -i8000 -k5 -w0 -zF,F -o3
-j option is deprecated (default behavior)
blen                      16384
bstride                   16384
bstride                   16384
nb_bufs                     128
pg_size                       0
wq_type                       0
batch_sz                      1
iter                       8000
nb_cpus                       1
var_mmio                      1
dma                           1
verify                        1
misc_flags                    0
access_op[0]               Write
access_op[1]               Write
place_op[0]              Memory
place_op[1]              Memory
flags_cmask            ffffffff
flags_smask                   0
flags_nth_desc                1
nb_numa_node                 16
cpu_desc_work                 0
Memory affinity
CPUs in node 0:		-1 -1
Buffer Offsets 		0 0
GB per sec = 31.170166 cpu 6.270452 kopsrate = 1902

However, I cannot run any of the tests/examples in DML with hardware mode, e.g.

[user1@sprnode5 high-level-api]$ ./hl_mem_move_example_example hardware_path
Executing using dml::hardware path
Starting dml::mem_move example...
Copy 1KB of data from source into destination...
Failure occurred.
[user1@sprnode5 high-level-api]$ ./hl_mem_move_example_example software_path
Executing using dml::software path
Starting dml::mem_move example...
Copy 1KB of data from source into destination...
Finished successfully.

(Note I do get the same output regardless of whether I use sudo or not, I have chowned the work queues to set the group ownership to my users group.)

Similarly all tests pass with ./tests --path=sw and I get a very very large stream of unsuccessful output with ./tests --path=hw. A small sample here

Details:
CPU: Intel (R) Xeon (R) CPU Max 9480

[user1@sprnode5 dsa_perf_micros]$ uname -r
6.3.0-2.el9.elrepo.x86_64
[user1@sprnode5 dsa_perf_micros]$ cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.1 (Blue Onyx)"
[user1@sprnode5 dsa_perf_micros]$ gcc --version
gcc (GCC) 12.2.0

Full DSA config here

Is there anything in the setup I am forgetting/missing?

Thanks in advance,
Hamish

Incorrect handling of CRC

In the CRC and COPY_WITH_CRC commands, it is not allowed to set the initial CRC value after a PAGE_FAULT with no bytes processed.
update_crc_for_continuation() and update_copy_crc_for_continuation() should include the condition:

if (crc_record.bytes_completed() != 0) {
    crc_dsc.crc_seed() = crc_record.crc_value();
}

HW path does not work with job_api_example

I changed job_api_example to use the HW path:

int main(const int argc, char **const argv)
{
// Variables
dml_job_t *dml_job_ptr = NULL;
uint32_t total_fails = 0u;

// Allocate dml_job_t
dml_job_ptr = init_dml_job(DML_PATH_SW);

After changing this to DML_PATH_HW, I get a lot of errors and failures, with no log showing the root cause or suggesting a next action.

[root@localhost dml_job_api]# ./job_api_samples 
Intel(R) Data Mover Library Job API Examples

============================== LEGALS ==============================

Copyright (C) 2021 Intel Corporation

SPDX-License-Identifier: MIT====================================================================

------------------------------------------
	Run example # 1

	 Example of using Intel DML DML_OP_NOP operation 
	 --- Buffers size to DML_OP_NOP operation: 128
	 --- DML_OP_NOP property: no any specific properties 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 2

	 Example of using Intel DML DML_OP_MEM_MOVE operation 
	 --- Buffers size to DML_OP_MEM_MOVE operation: 128
	 --- DML_OP_MEM_MOVE property: none 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 3

	 Example of using Intel DML DML_OP_FILL operation 
	 --- Buffers size to DML_OP_FILL operation: 128
	 --- DML_OP_FILL property: no any specific properties 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 4

	 Example of using Intel DML DML_OP_COMPARE_PATTERN operation 
	 --- Buffers size to DML_OP_COMPARE_PATTERN operation: 128
	 --- DML_OP_COMPARE_PATTERN property: none

	 Array is equal to pattern 
	 --- Status : 100
	 --- Result : 0
	 --- Offset : 0

	 Array is NOT equal to pattern 
	 --- Status : 100
	 --- Result : 0
	 --- Offset : 0

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 5

	 Example of using Intel DML DML_OP_DIF_UPDATE operation 
	 --- Buffers size to DML_OP_DIF_UPDATE operation: 4104
	 --- DML_OP_DIF_UPDATE property: BLOCK_SIZE is 4096 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 6

	 Example of using Intel DML DML_OP_DIF_INSERT operation 
	 --- Buffers size to DML_OP_DIF_INSERT operation: 4096 and 4104
	 --- DML_OP_DIF_INSERT property: BLOCK_SIZE is 4096 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 7

	 Example of using Intel DML DML_OP_DIF_CHECK operation 
	 --- Buffers size to DML_OP_DIF_CHECK operation: 4096 and 4104
	 --- DML_OP_DIF_CHECK property: BLOCK_SIZE is 4096 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 8

	 Example of using Intel DML DML_OP_DIF_STRIP operation 
	 --- Buffers size to DML_OP_DIF_STRIP operation: 4104 and 4096
	 --- DML_OP_DIF_STRIP property: BLOCK_SIZE is 4096 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 9

	 Example of using Intel DML DML_OP_CACHE_FLUSH operation 
	 --- Buffers size to DML_OP_CACHE_FLUSH operation: 128
	 --- DML_OP_CACHE_FLUSH property: none 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 10

	 Example of using Intel DML DML_OP_BATCH operation 
	 --- Buffers size to DML_OP_BATCH operation: 128
	 --- DML_OP_BATCH property: none 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 11

	 Example of using Intel DML DML_OP_CRC operation 
	 --- Buffers size to DML_OP_CRC operation: 128
	 --- DML_OP_CRC property: none 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 12

	 Example of using Intel DML DML_OP_CRC_COPY operation 
	 --- Buffers size to DML_OP_CRC_COPY operation: 128
	 --- DML_OP_CRC_COPY property: none 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 13

	 Example of using Intel DML DML_OP_CRC operation with dml_submit_job 
	 --- Buffers size to DML_OP_CRC operation: 128
	 --- DML_OP_CRC property: no any specific properties 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 14

	 Example of using Intel DML DML_OP_DELTA_CREATE operation 
	 --- Buffers size to DML_OP_DELTA_CREATE operation: 128
	 --- DML_OP_DELTA_CREATE property: none 

	Example return: FAIL (Status 100)

------------------------------------------
	Run example # 15

	 Example of using Intel DML DML_OP_DELTA_APPLY operation 
	 --- Buffers size to DML_OP_DELTA_APPLY operation: 128
	 --- DML_OP_DELTA_APPLY property: none 

	Example return: FAIL (Status 100)

====== Examples Execution Completed ======
	 --- Total Samples run:                     15
	 --- Samples completed with OK status:      0
	 --- Samples completed with FAIL status:    15
[root@localhost dml_job_api]# 

Incorrect handling of PAGE_FAULT_MASK

The condition in core_interconnect.cpp:138 is incorrect. Should be:

if (is_finished && (status & 0x7f) == page_fault_mask)

(to mask out the READ/WRITE page fault bit, 0x80).
The same applies at line 107.
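The effect of the proposed mask can be shown with a standalone sketch. The numeric value of the page fault status code (0x03 below) is an assumption made for illustration; the issue only states that bit 0x80 carries the READ/WRITE indication and has to be masked off before comparing. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_CODE_MASK     0x7Fu /* low 7 bits: the status code itself */
#define STATUS_FAULT_RW_BIT  0x80u /* READ/WRITE page fault bit */
#define STATUS_PAGE_FAULT    0x03u /* assumption: illustrative value only */

/* True when the status code is a page fault, whichever way the
 * READ/WRITE bit is set. */
static bool is_page_fault(uint8_t status)
{
    return (status & STATUS_CODE_MASK) == STATUS_PAGE_FAULT;
}

/* True when the READ/WRITE bit is set in the status byte. */
static bool fault_rw_bit_set(uint8_t status)
{
    return (status & STATUS_FAULT_RW_BIT) != 0u;
}
```

A plain `status == page_fault_mask` comparison would miss the variant with bit 0x80 set, which is the case this issue describes.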

Unknown buffer size limitation for CRC operation

What is the acceptable input buffer size for the CRC operation?

I experimented with different CRC buffer sizes.
The DML library accepts different sizes, but in some cases it returns an error or even crashes with a segmentation fault.

Example execution:

[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_1KB hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 1KB.
Calculated CRC is: 0x2cdf6e8f
Finished successfully.
[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_4MB hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 4MB.
An error (15) occured during job execution.
[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_16MB hardware_path
Segmentation fault (core dumped)
[bgrzesko@fl31ca105bs0411 build]$ 

How to reproduce (apply the diff and compile the example):

[bgrzesko@fl31ca105bs0411 build]$ git diff
diff --git a/examples/low-level-api/crc_example.c b/examples/low-level-api/crc_example.c
index 3c12df2..ad03704 100644
--- a/examples/low-level-api/crc_example.c
+++ b/examples/low-level-api/crc_example.c
@@ -9,7 +9,8 @@
 #include "dml/dml.h"
 #include "examples_utils.h"
 
-#define BUFFER_SIZE 1024 // 1 KB
+//#define BUFFER_SIZE 4 * 1024 * 1024 // 4 MB
+#define BUFFER_SIZE 16 * 1024 * 1024 // 16 MB
 
 /*
 * This example demonstrates how to create and run a crc operation.
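The issue does not establish why the larger sizes fail, so the following is only a sketch of general precautions under stated assumptions: parenthesize the size macro, allocate the large buffer on the heap, and feed the data in bounded chunks while carrying the CRC forward. `CHUNK_LIMIT` is a hypothetical bound, not a documented DML limit, and the bitwise CRC-32C routine is a software stand-in that is not claimed to match the CRC DML computes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE (16u * 1024u * 1024u) /* parenthesized: safe inside expressions */
#define CHUNK_LIMIT (1u * 1024u * 1024u)  /* hypothetical per-job bound */

/* Bitwise CRC-32C (reflected polynomial 0x82F63B78). Passing the previous
 * result back in as crc continues the same checksum over more data. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *data, size_t length)
{
    crc = ~crc;
    for (size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Checksum a buffer in pieces of at most chunk_limit bytes. */
static uint32_t crc32c_chunked(const uint8_t *data, size_t length, size_t chunk_limit)
{
    if (chunk_limit == 0u) {
        return crc32c_update(0u, data, length); /* no bound: single pass */
    }
    uint32_t crc = 0u;
    for (size_t offset = 0; offset < length;) {
        const size_t remaining = length - offset;
        const size_t step = remaining < chunk_limit ? remaining : chunk_limit;
        crc = crc32c_update(crc, data + offset, step);
        offset += step;
    }
    return crc;
}

/* Heap-allocate the large buffer instead of placing it on the stack.
 * Returns 0 on success and stores the checksum in *crc_out. */
static int heap_buffer_crc(uint32_t *crc_out)
{
    uint8_t *buffer = malloc(BUFFER_SIZE);
    if (buffer == NULL) {
        return -1;
    }
    memset(buffer, 0xA5, BUFFER_SIZE);
    *crc_out = crc32c_chunked(buffer, BUFFER_SIZE, CHUNK_LIMIT);
    free(buffer);
    return 0;
}
```

With a real accelerator the inner call would be one job per chunk, with the previous result supplied as the seed of the next job.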

examples not getting compiled by default

According to the documentation here, the examples should be compiled; however, this is not specified in CMakeLists.txt.

Please add this line to CMakeLists.txt so that the examples are compiled when DML is compiled.

[Image: screenshot of the proposed CMakeLists.txt line]

Possible use-after-free when handler is destroyed before operation completes?

Consider the following code:

{
    auto h = dml::submit<path>(...);
}

The handler returned from submit may be destroyed before the operation completes. I don't see any mention in the documentation that the handler should be kept alive until the operation completes, so I would assume this code is valid.

However, I believe this can cause use-after-free in the hardware path since DML will try to write the completion status to the descriptor (which, if I understand correctly, is owned by the handler). Possibly the same is true for C API and dml_finalize_job().

Is my understanding correct, or is this already handled by DML somehow? If not, I think the best way to avoid this would be to wait for the operation to complete inside the handler destructor.

Error when using hardware path

When I run the command, it shows an error like this:
./ll_crc_example hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 1KB.
An error (100) occured during job execution.

So I tried 2 steps:

1. Check that the .so is OK:
ldd /usr/bin/accel-config
linux-vdso.so.1 (0x00007fffe05d5000)
libaccel-config.so.1 => /usr/lib64/libaccel-config.so.1 (0x00007f14c4bdf000)
libjson-c.so.4 => /lib/x86_64-linux-gnu/libjson-c.so.4 (0x00007f14c4bba000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f14c49c8000)
/lib64/ld-linux-x86-64.so.2 (0x00007f14c4c0e000)

2. Run sudo python3 accel_conf.py --load=../configs/1n1d1e1w-s-n1.conf:
Filter:
No active devices
Loading configuration - done
Additional configuration steps
Force block on fault: False
Enabling configured devices
dsa0 - error
wq0.0 - error

failed in dsa0/wq0.0
enabled 0 wq(s) out of 1


Checking configuration
No active devices

How can I fix this? And after step 2, should it be OK to run the command ./ll_crc_example hardware_path?

compiler optimization issue when allocating the transfer buffer in heap memory

Example code: mem_move
Change the source and destination buffers from stack allocation to heap allocation and initialize them:

uint8_t* source = (uint8_t *)malloc(BUFFER_SIZE);
uint8_t* destination = (uint8_t *)malloc(BUFFER_SIZE);
memset(source, 1, BUFFER_SIZE);
memset(destination, 0, BUFFER_SIZE);

Compile and run: it returns error 102.

If the compiler optimization level is changed, it runs successfully, for example:
cmake -DCMAKE_BUILD_TYPE=Debug ..
or
cmake -DCMAKE_BUILD_TYPE=-O0 ..

Fails to build from source (FTBFS) on Windows

Release: v1.1.0
Environment: Windows (GCC 13.2.1)

This appears to be caused by:

#if defined(__linux__)
#include "libaccel_config.h"
#endif

Because of the #if, the necessary definitions are missing, causing a lot of errors:

FAILED: sources/core/src/hw_dispatcher/CMakeFiles/dml_hw_dispatcher.dir/hw_configuration_driver.c.obj 
/usr/local/bin/x86_64-w64-mingw32ucrt-gcc -DDML_GIT_REVISION=\"N/A\" -D_FORTIFY_SOURCE=2 -I/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/../../../../include -I/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/. -O3 -mcrtdll=ucrt -D_UCRT -O2 -DNDEBUG -fPIC -fstack-protector --param=ssp-buffer-size=8 -fstack-clash-protection -MD -MT sources/core/src/hw_dispatcher/CMakeFiles/dml_hw_dispatcher.dir/hw_configuration_driver.c.obj -MF sources/core/src/hw_dispatcher/CMakeFiles/dml_hw_dispatcher.dir/hw_configuration_driver.c.obj.d -o sources/core/src/hw_dispatcher/CMakeFiles/dml_hw_dispatcher.dir/hw_configuration_driver.c.obj -c /build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c
In file included from /build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:9:
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_configuration_driver.h:56:47: warning: 'struct accfg_ctx' declared inside parameter list will not be visible outside of this definition or declaration
   56 | int32_t DML_HW_API(driver_new_context)(struct accfg_ctx **ctx);
      |                                               ^~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_configuration_driver.h:58:66: warning: 'struct accfg_ctx' declared inside parameter list will not be visible outside of this definition or declaration
   58 | struct accfg_device *DML_HW_API(context_get_first_device)(struct accfg_ctx *ctx);
      |                                                                  ^~~~~~~~~
In file included from /build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_configuration_driver.h:10:
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: error: conflicting types for 'dsa_driver_new_context'; have 'int32_t(struct accfg_ctx **)' {aka 'int(struct accfg_ctx **)'}
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:134:9: note: in expansion of macro 'DML_HW_API'
  134 | int32_t DML_HW_API(driver_new_context)(struct accfg_ctx **ctx)
      |         ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: note: previous declaration of 'dsa_driver_new_context' with type 'int32_t(struct accfg_ctx **)' {aka 'int(struct accfg_ctx **)'}
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_configuration_driver.h:56:9: note: in expansion of macro 'DML_HW_API'
   56 | int32_t DML_HW_API(driver_new_context)(struct accfg_ctx **ctx);
      |         ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: error: conflicting types for 'dsa_context_get_first_device'; have 'struct accfg_device *(struct accfg_ctx *)'
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:143:22: note: in expansion of macro 'DML_HW_API'
  143 | struct accfg_device *DML_HW_API(context_get_first_device)(struct accfg_ctx *ctx)
      |                      ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: note: previous declaration of 'dsa_context_get_first_device' with type 'struct accfg_device *(struct accfg_ctx *)'
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_configuration_driver.h:58:22: note: in expansion of macro 'DML_HW_API'
   58 | struct accfg_device *DML_HW_API(context_get_first_device)(struct accfg_ctx *ctx);
      |                      ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: error: return type is an incomplete type
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:188:21: note: in expansion of macro 'DML_HW_API'
  188 | enum accfg_wq_state DML_HW_API(work_queue_get_state)(struct accfg_wq *wq)
      |                     ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: error: conflicting types for 'dsa_work_queue_get_state'; have 'void(struct accfg_wq *)'
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:188:21: note: in expansion of macro 'DML_HW_API'
  188 | enum accfg_wq_state DML_HW_API(work_queue_get_state)(struct accfg_wq *wq)
      |                     ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: note: previous declaration of 'dsa_work_queue_get_state' with type 'enum accfg_wq_state(struct accfg_wq *)'
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_configuration_driver.h:82:21: note: in expansion of macro 'DML_HW_API'
   82 | enum accfg_wq_state DML_HW_API(work_queue_get_state)(struct accfg_wq *wq);
      |                     ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c: In function 'dsa_work_queue_get_state':
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:193:12: warning: 'return' with a value, in function returning void
  193 |     return -1;
      |            ^
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: note: declared here
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:188:21: note: in expansion of macro 'DML_HW_API'
  188 | enum accfg_wq_state DML_HW_API(work_queue_get_state)(struct accfg_wq *wq)
      |                     ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c: At top level:
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: error: return type is an incomplete type
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:197:20: note: in expansion of macro 'DML_HW_API'
  197 | enum accfg_wq_mode DML_HW_API(work_queue_get_mode)(struct accfg_wq *wq)
      |                    ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: error: conflicting types for 'dsa_work_queue_get_mode'; have 'void(struct accfg_wq *)'
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:197:20: note: in expansion of macro 'DML_HW_API'
  197 | enum accfg_wq_mode DML_HW_API(work_queue_get_mode)(struct accfg_wq *wq)
      |                    ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: note: previous declaration of 'dsa_work_queue_get_mode' with type 'enum accfg_wq_mode(struct accfg_wq *)'
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_configuration_driver.h:84:20: note: in expansion of macro 'DML_HW_API'
   84 | enum accfg_wq_mode DML_HW_API(work_queue_get_mode)(struct accfg_wq *wq);
      |                    ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c: In function 'dsa_work_queue_get_mode':
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:202:12: warning: 'return' with a value, in function returning void
  202 |     return 2;
      |            ^
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: note: declared here
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:197:20: note: in expansion of macro 'DML_HW_API'
  197 | enum accfg_wq_mode DML_HW_API(work_queue_get_mode)(struct accfg_wq *wq)
      |                    ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c: At top level:
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: error: return type is an incomplete type
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:224:25: note: in expansion of macro 'DML_HW_API'
  224 | enum accfg_device_state DML_HW_API(device_get_state)(struct accfg_device *device)
      |                         ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: error: conflicting types for 'dsa_device_get_state'; have 'void(struct accfg_device *)'
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:224:25: note: in expansion of macro 'DML_HW_API'
  224 | enum accfg_device_state DML_HW_API(device_get_state)(struct accfg_device *device)
      |                         ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: note: previous declaration of 'dsa_device_get_state' with type 'enum accfg_device_state(struct accfg_device *)'
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_configuration_driver.h:68:25: note: in expansion of macro 'DML_HW_API'
   68 | enum accfg_device_state DML_HW_API(device_get_state)(struct accfg_device *device);
      |                         ^~~~~~~~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c: In function 'dsa_device_get_state':
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:229:12: warning: 'return' with a value, in function returning void
  229 |     return -1;
      |            ^
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/legacy_headers/hardware_definitions.h:49:41: note: declared here
   49 | #define DML_HW_API(name) DML_HW_STDCALL dsa_##name /**< Declaration macros to manipulate function name */
      |                                         ^~~~
/build/BUILD/DML-develop/sources/core/src/hw_dispatcher/hw_configuration_driver.c:224:25: note: in expansion of macro 'DML_HW_API'
  224 | enum accfg_device_state DML_HW_API(device_get_state)(struct accfg_device *device)
      |                         ^~~~~~~~~~

Configured with:

cmake -GNinja -S . -B build -DCMAKE_BUILD_TYPE=Release '-DCMAKE_C_FLAGS_RELEASE=-O2 -DNDEBUG' '-DCMAKE_CXX_FLAGS_RELEASE=-O2 -DNDEBUG' -DCMAKE_INSTALL_PREFIX=/usr/x86_64-w64-mingw32ucrt/sys-root/local -DDML_BUILD_EXAMPLES=OFF -DDML_BUILD_TESTS=OFF

Segfault with multi-thread since the port ptr becomes null

use the PR #18

Run the performance test in multi-thread mode with the following command:
./examples/dml_example_c_api_perftest 128 10000 4096 0 16
allocatate the 4096 aligned src=0x7f3a7a388000, dst=0x7f3a79b86000

jobs=0x7f3a7c82f010
jobs=0x7f3a7c83ac10
jobs=0x7f3a7c846810
jobs=0x7f3a7c852410
jobs=0x7f3a7c85e010
jobs=0x7f3a7c869c10
jobs=0x7f3a7c875810
jobs=0x7f3a7c881410
jobs=0x7f3a7c88d010
jobs=0x7f3a7c898c10
Starting example for multi-job memory move jobs=0x7f3a7c846810:
jobs=0x7f3a7c8a4810
Starting example for multi-job memory move jobs=0x7f3a7c83ac10:
Starting example for multi-job memory move jobs=0x7f3a7c82f010:
jobs=0x7f3a7c8b0410
jobs=0x7f3a7c8bc010
Starting example for multi-job memory move jobs=0x7f3a7c869c10:
Starting example for multi-job memory move jobs=0x7f3a7c85e010:
jobs=0x7f3a7c8c7c10
Starting example for multi-job memory move jobs=0x7f3a7c852410:
jobs=0x7f3a7c8d3810
Starting example for multi-job memory move jobs=0x7f3a7c8c7c10:
jobs=0x7f3a7c8df410
Starting example for multi-job memory move jobs=0x7f3a7c8d3810:
Starting example for multi-job memory move jobs=0x7f3a7c898c10:
Starting example for multi-job memory move jobs=0x7f3a7c8b0410:
Starting example for multi-job memory move jobs=0x7f3a7c8df410:
Starting example for multi-job memory move jobs=0x7f3a7c881410:
Starting example for multi-job memory move jobs=0x7f3a7c88d010:
Starting example for multi-job memory move jobs=0x7f3a7c8a4810:
Starting example for multi-job memory move jobs=0x7f3a7c8bc010:
Starting example for multi-job memory move jobs=0x7f3a7c875810:
Segmentation fault (core dumped)

Using gdb, we find that the port ptr is null:
Thread 13 "dml_example_c_a" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffef1f4700 (LWP 123148)]
0x0000000000422057 in dml::core::dispatcher::hw_queue::enqueue_descriptor (this=0x62dc38 dml::core::dispatcher::instance+56, desc_ptr=0x7ffff7fac4c0) at /home/dennis/DML/sources/core/src/hw_dispatcher/hw_queue.cpp:92
92 : "a"(current_place_ptr), "d"(desc_ptr));
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.5.0-4.el8_5.x86_64 libpmem-1.6.1-1.el8.x86_64 libstdc++-8.5.0-4.el8_5.x86_64 libuuid-2.32.1-28.el8.x86_64
(gdb) p current_place_ptr
$1 = (void *) 0x0
(gdb)

From the logic, the port ptr can't be null because portal_mask_ never changes. In this case, however, portal_mask_ had changed to zero, so I suspect that a data overflow somewhere overwrote it and caused the issue.
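To make the reasoning about portal_mask_ concrete, here is a minimal sketch of why a zeroed mask would produce a null portal pointer. It assumes the portal address is formed by OR-ing a page-aligned base (stored in portal_mask_) with a rotating 64-byte slot offset. That formula, the struct portal_sketch, and the helper yields_null are hypothetical stand-ins for illustration, not the library's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the hardware queue. The member names mirror the
// ones mentioned in the report, but the address formula is an assumption.
struct portal_sketch {
    // Offsets stay inside a single 4 KiB portal page.
    static constexpr std::uintptr_t page_mask = 0xFFF;

    std::uintptr_t portal_mask_;                   // page-aligned base of the mapped portal
    std::atomic<std::uintptr_t> portal_offset_{0}; // rotating submission slot counter

    explicit portal_sketch(std::uintptr_t base) : portal_mask_(base) {
        assert((base & page_mask) == 0 && "base must be page-aligned");
    }

    // Hands out the next 64-byte slot inside the portal page.
    void *next_portal_ptr() noexcept {
        const std::uintptr_t slot = portal_offset_.fetch_add(1, std::memory_order_relaxed);
        const std::uintptr_t offset = (slot << 6) & page_mask;
        return reinterpret_cast<void *>(portal_mask_ | offset);
    }
};

// Returns true if any of the first `calls` portal pointers is null.
bool yields_null(std::uintptr_t base, int calls) {
    portal_sketch queue(base);
    for (int i = 0; i < calls; ++i) {
        if (queue.next_portal_ptr() == nullptr) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, a non-zero page-aligned base can never yield a null pointer, while a base of zero yields one on the first slot. That would match the observation that current_place_ptr is 0x0 only after portal_mask_ has been overwritten.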

Is DML a wrapper for Intel DSA?

I've read the document, but I'm still confused about the hardware and software options.

Is DML a wrapper for Intel DSA?
Since DMA also supports asynchronous data movement, does DML support DMA as well?

And is dml::software implemented by starting another thread (or coroutine)?

Thanks in advance!
