Comments (24)

grom72 commented on August 15, 2024

Hi @qianlong-ql,
Thank you for reporting this issue.
It looks like the problem is in a library beneath librpma, as we cannot reproduce it in our environment.
Could you provide your environment details: OS, kernel, and libibverbs versions?

from rpma.

grom72 commented on August 15, 2024

Hi @qianlong-ql,
Could you also check how your application behaves when the libibverbs API is used directly:

	struct ibv_pd *pd = ibv_alloc_pd(dev);
	struct ibv_mr *ibv_mr_ptr = ibv_reg_mr(pd, dst_ptr, dst_size,
			IBV_ACCESS_LOCAL_WRITE);
	if (ibv_mr_ptr == NULL) {
		fprintf(stderr, "ibv_reg_mr returned an error\n");
		return -1;
	}

instead of rpma_peer_new() and rpma_mr_reg()?

qianlong-ql commented on August 15, 2024

I replaced rpma_peer_new() and rpma_mr_reg() with ibv_alloc_pd() and ibv_reg_mr(); it still crashes in fclose().
My current environment:
nic-drivers-mellanox-rdma-2.0.1fib6fix-1.noarch
nic-libs-mellanox-rdma-2.0.1-2.x86_64
os : Linux iz8vb4s0jlsfk3k00ro17az 3.10.0-693.5.2.el7.ecs.2.x86_64 #1 SMP Fri Jul 13 12:21:44 CST 2018 x86_64 x86_64 x86_64 GNU/Linux

I also tried updating the RPMs to nic-libs-mellanox-rdma-3.0.2-1.x86_64 and nic-drivers-mellanox-rdma-3.0.2-10.noarch, but that did not help.

grom72 commented on August 15, 2024

Do we know that ibv_fork_init() and rpma_mr_reg() both return 0?

qianlong-ql commented on August 15, 2024

Do we know that ibv_fork_init() and rpma_mr_reg() both return 0?

yes

grom72 commented on August 15, 2024

Can you run this code under valgrind-memcheck?

qianlong-ql commented on August 15, 2024

It does not crash when run under valgrind. Here is the output; I don't see any useful information in it:

valgrind --leak-check=full --trace-children=yes --undef-value-errors=no --track-fds=yes --tool=memcheck ./server 200.1.15.2 1
==112158== Memcheck, a memory error detector
==112158== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==112158== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==112158== Command: ./server 200.1.15.2 1
==112158== 
==112221== 
==112221== FILE DESCRIPTORS: 4 open at exit.
==112221== Open file descriptor 4: /dev/infiniband/uverbs0
==112221==    at 0x5346A30: __open_nocancel (in /usr/lib64/libc-2.17.so)
==112221==    by 0x504ACE6: ibv_open_device@@IBVERBS_1.1 (device.c:604)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== Open file descriptor 2: /dev/pts/0
==112221==    <inherited from parent>
==112221== 
==112221== Open file descriptor 1: /dev/pts/0
==112221==    <inherited from parent>
==112221== 
==112221== Open file descriptor 0: /dev/pts/0
==112221==    <inherited from parent>
==112221== 
==112221== 
==112221== HEAP SUMMARY:
==112221==     in use at exit: 224,951 bytes in 117 blocks
==112221==   total heap usage: 217 allocs, 100 frees, 373,872 bytes allocated
==112221== 
==112221== 48 bytes in 1 blocks are possibly lost in loss record 18 of 44
==112221==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==112221==    by 0x504A676: __ibv_exp_use_priv_env (device.c:548)
==112221==    by 0x504B06B: ibv_exp_use_priv_env (verbs_exp.h:3196)
==112221==    by 0x504B06B: ibv_open_device@@IBVERBS_1.1 (device.c:694)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 77 bytes in 1 blocks are definitely lost in loss record 24 of 44
==112221==    at 0x4C2C291: realloc (vg_replace_malloc.c:836)
==112221==    by 0x52CEC7A: vasprintf (in /usr/lib64/libc-2.17.so)
==112221==    by 0x52AA4F6: asprintf (in /usr/lib64/libc-2.17.so)
==112221==    by 0x504BBB9: load_driver (init.c:254)
==112221==    by 0x504BBB9: load_drivers (init.c:311)
==112221==    by 0x504C571: ibverbs_get_device_list (init.c:625)
==112221==    by 0x5049F2F: count_devices (device.c:97)
==112221==    by 0x584CE6F: pthread_once (in /usr/lib64/libpthread-2.17.so)
==112221==    by 0x504A82B: ibv_get_device_list@@IBVERBS_1.1 (device.c:122)
==112221==    by 0x562A035: ucma_init (cma.c:249)
==112221==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==112221==    by 0x4E3C40B: rpma_info_new (info.c:53)
==112221==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==112221== 
==112221== 204 bytes in 24 blocks are possibly lost in loss record 31 of 44
==112221==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==112221==    by 0x52E3809: strdup (in /usr/lib64/libc-2.17.so)
==112221==    by 0x504A3FA: vsetenv (device.c:427)
==112221==    by 0x504A729: clone_env (device.c:477)
==112221==    by 0x504A729: __ibv_exp_use_priv_env (device.c:559)
==112221==    by 0x504B06B: ibv_exp_use_priv_env (verbs_exp.h:3196)
==112221==    by 0x504B06B: ibv_open_device@@IBVERBS_1.1 (device.c:694)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 576 bytes in 24 blocks are possibly lost in loss record 36 of 44
==112221==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==112221==    by 0x504A3E6: vsetenv (device.c:423)
==112221==    by 0x504A729: clone_env (device.c:477)
==112221==    by 0x504A729: __ibv_exp_use_priv_env (device.c:559)
==112221==    by 0x504B06B: ibv_exp_use_priv_env (verbs_exp.h:3196)
==112221==    by 0x504B06B: ibv_open_device@@IBVERBS_1.1 (device.c:694)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 1,935 bytes in 24 blocks are possibly lost in loss record 40 of 44
==112221==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==112221==    by 0x52E3809: strdup (in /usr/lib64/libc-2.17.so)
==112221==    by 0x504A40F: vsetenv (device.c:431)
==112221==    by 0x504A729: clone_env (device.c:477)
==112221==    by 0x504A729: __ibv_exp_use_priv_env (device.c:559)
==112221==    by 0x504B06B: ibv_exp_use_priv_env (verbs_exp.h:3196)
==112221==    by 0x504B06B: ibv_open_device@@IBVERBS_1.1 (device.c:694)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 2,816 bytes in 1 blocks are possibly lost in loss record 42 of 44
==112221==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==112221==    by 0x608EE22: mlx5_alloc_context (mlx5.c:984)
==112221==    by 0x504AD81: ibv_open_device@@IBVERBS_1.1 (device.c:631)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 202,520 bytes in 1 blocks are possibly lost in loss record 44 of 44
==112221==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==112221==    by 0x504AD1D: ibv_open_device@@IBVERBS_1.1 (device.c:616)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== LEAK SUMMARY:
==112221==    definitely lost: 77 bytes in 1 blocks
==112221==    indirectly lost: 0 bytes in 0 blocks
==112221==      possibly lost: 208,099 bytes in 75 blocks
==112221==    still reachable: 16,775 bytes in 41 blocks
==112221==         suppressed: 0 bytes in 0 blocks
==112221== Reachable blocks (those to which a pointer was found) are not shown.
==112221== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==112221== 
==112221== For lists of detected and suppressed errors, rerun with: -s
==112221== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)
==112158== 
==112158== FILE DESCRIPTORS: 3 open at exit.
==112158== Open file descriptor 2: /dev/pts/0
==112158==    <inherited from parent>
==112158== 
==112158== Open file descriptor 1: /dev/pts/0
==112158==    <inherited from parent>
==112158== 
==112158== Open file descriptor 0: /dev/pts/0
==112158==    <inherited from parent>
==112158== 
==112158== 
==112158== HEAP SUMMARY:
==112158==     in use at exit: 14,635 bytes in 26 blocks
==112158==   total heap usage: 216 allocs, 190 frees, 373,304 bytes allocated
==112158== 
==112158== 77 bytes in 1 blocks are definitely lost in loss record 12 of 22
==112158==    at 0x4C2C291: realloc (vg_replace_malloc.c:836)
==112158==    by 0x52CEC7A: vasprintf (in /usr/lib64/libc-2.17.so)
==112158==    by 0x52AA4F6: asprintf (in /usr/lib64/libc-2.17.so)
==112158==    by 0x504BBB9: load_driver (init.c:254)
==112158==    by 0x504BBB9: load_drivers (init.c:311)
==112158==    by 0x504C571: ibverbs_get_device_list (init.c:625)
==112158==    by 0x5049F2F: count_devices (device.c:97)
==112158==    by 0x584CE6F: pthread_once (in /usr/lib64/libpthread-2.17.so)
==112158==    by 0x504A82B: ibv_get_device_list@@IBVERBS_1.1 (device.c:122)
==112158==    by 0x562A035: ucma_init (cma.c:249)
==112158==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==112158==    by 0x4E3C40B: rpma_info_new (info.c:53)
==112158==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==112158== 
==112158== 160 (16 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 22
==112158==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==112158==    by 0x4E3DE11: rpma_peer_new (peer.c:193)
==112158==    by 0x402164: main (server.c:419)
==112158== 
==112158== 160 (16 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 16 of 22
==112158==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==112158==    by 0x4E3D46D: rpma_mr_reg (mr.c:277)
==112158==    by 0x40219C: main (server.c:423)
==112158== 
==112158== LEAK SUMMARY:
==112158==    definitely lost: 109 bytes in 3 blocks
==112158==    indirectly lost: 288 bytes in 2 blocks
==112158==      possibly lost: 0 bytes in 0 blocks
==112158==    still reachable: 14,238 bytes in 21 blocks
==112158==         suppressed: 0 bytes in 0 blocks
==112158== Reachable blocks (those to which a pointer was found) are not shown.
==112158== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==112158== 
==112158== For lists of detected and suppressed errors, rerun with: -s
==112158== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

grom72 commented on August 15, 2024

Let's try to get more information with the following options:
--undef-value-errors=yes
--malloc-fill=
--free-fill=
Additionally, please try using a regular malloc() of more than 1 GiB followed by free() instead of fopen()/fclose().
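
A minimal sketch of such a malloc()/free() stand-in, in case it helps. The helper names gib_to_bytes() and alloc_and_free() are hypothetical and not part of librpma or libibverbs. The size is computed in size_t on purpose, because an all-int expression such as 5 * 1024 * 1024 * 1024 overflows int before malloc() ever sees it:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a GiB count to bytes using size_t arithmetic.
 * Returns 0 if the product would not fit in size_t.
 */
static size_t gib_to_bytes(size_t gib)
{
	const size_t one_gib = (size_t)1 << 30;

	if (gib > SIZE_MAX / one_gib)
		return 0;
	return gib * one_gib;
}

/*
 * Hypothetical helper: allocate `bytes` bytes and release them again,
 * as a stand-in for the fopen()/fclose() pair.
 * Returns 0 on success, -1 if the size is 0 or malloc() fails.
 */
static int alloc_and_free(size_t bytes)
{
	char *buf;

	if (bytes == 0)
		return -1;
	buf = malloc(bytes);
	if (buf == NULL)
		return -1;
	buf[0] = 1;	/* touch the first byte of the block */
	free(buf);
	return 0;
}
```

For example, alloc_and_free(gib_to_bytes(5)) attempts a 5 GiB allocation on a 64-bit system; whether it succeeds depends on the memory and overcommit settings of the machine.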

qianlong-ql commented on August 15, 2024

I replaced fopen() and fclose() as shown below:

        char *f = malloc(5 *1024 * 1024 * 1024);
        free(f);

and the backtrace became:

#0  0x00007f38317b4638 in _int_malloc () from /lib64/libc.so.6
#1  0x00007f38317b784c in malloc () from /lib64/libc.so.6
#2  0x0000000000402228 in main (argc=3, argv=0x7fff351976e8)

I ran valgrind with the additional parameters; here is the output:

 valgrind --leak-check=full --trace-children=yes --undef-value-errors=no --track-fds=yes --tool=memcheck --undef-value-errors=yes --malloc-fill= --free-fill= ./server 200.1.15.2 1
==218347== Memcheck, a memory error detector
==218347== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==218347== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==218347== Command: ./server 200.1.15.2 1
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7BF: is_overlap (vg_replace_strmem.c:131)
==218347==    by 0x4C2E7BF: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7DB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E828: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E876: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8C5: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8D3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x4C2E8F0: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x4C2E8F3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8FE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E90B: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7BF: is_overlap (vg_replace_strmem.c:131)
==218347==    by 0x4C2E7BF: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7DB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E828: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E876: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8C5: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8D3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x4C2E8F0: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x4C2E8F3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8FE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E90B: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x5627B5E: ucma_query_addr (cma.c:696)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7BF: is_overlap (vg_replace_strmem.c:131)
==218347==    by 0x4C2E7BF: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7DB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E828: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E876: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8D3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8FE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E90B: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7BF: is_overlap (vg_replace_strmem.c:131)
==218347==    by 0x4C2E7BF: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7DB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E828: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E876: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8D3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8FE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E90B: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x5627B5E: ucma_query_addr (cma.c:696)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x56279BC: ucma_get_device (cma.c:447)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x52A1AFE: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x529F1CB: _itoa_word (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52A3450: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x529F1D5: _itoa_word (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52A3450: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x52A349F: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x52A1BCB: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x52A1C4E: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C29F09: malloc (vg_replace_malloc.c:309)
==218347==    by 0x5627885: ucma_init_device.part.2 (cma.c:324)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x5627898: ucma_init_device.part.2 (cma.c:330)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x56278C5: ucma_init_device.part.2 (cma.c:330)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Syscall param write(buf) points to uninitialised byte(s)
==218347==    at 0x5346CD0: __write_nocancel (in /usr/lib64/libc-2.17.so)
==218347==    by 0x5047004: ibv_cmd_dealloc_pd (cmd.c:200)
==218347==    by 0x60A25BD: mlx5_free_pd (verbs.c:283)
==218347==    by 0x5627F98: ucma_put_device (cma.c:477)
==218347==    by 0x5627F98: ucma_free_id (cma.c:522)
==218347==    by 0x5629F74: rdma_destroy_id (cma.c:654)
==218347==    by 0x4E3E5FC: rpma_utils_get_ibv_context (rpma.c:67)
==218347==    by 0x402180: main (server.c:405)
==218347==  Address 0x1fff0002a8 is on thread 1's stack
==218347==  in frame #1, created by ibv_cmd_dealloc_pd (cmd.c:194)
==218347== 
==218347== Syscall param write(buf) points to uninitialised byte(s)
==218347==    at 0x5346CD0: __write_nocancel (in /usr/lib64/libc-2.17.so)
==218347==    by 0x50471FC: ibv_cmd_reg_mr (cmd.c:267)
==218347==    by 0x60A2B3A: mlx5_reg_mr (verbs.c:468)
==218347==    by 0x504E14B: __ibv_common_reg_mr (verbs.c:295)
==218347==    by 0x504E253: ibv_reg_mr@@IBVERBS_1.1 (verbs.c:338)
==218347==    by 0x4E3DB11: rpma_peer_mr_reg (peer.c:113)
==218347==    by 0x4E3D4A6: rpma_mr_reg (mr.c:282)
==218347==    by 0x4021DC: main (server.c:423)
==218347==  Address 0x1fff000258 is on thread 1's stack
==218347==  in frame #2, created by mlx5_reg_mr (verbs.c:458)
==218347== 
--218442-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--218442-- si_code=1;  Faulting address: 0x5CBFD30;  sp: 0x1002ba9e30

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==218442==    at 0x58055D12: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==218442==    by 0x5800EA1E: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==218442==    by 0x580A64C7: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==218442==    by 0x580FF23A: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 218442)
==218442==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==218442==    by 0x402227: main (server.c:434)
client stack range: [0x1FFEFFD000 0x1FFF000FFF] client SP: 0x1FFF000360
valgrind stack range: [0x1002AAA000 0x1002BA9FFF] top usage: 9104 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

==218347== 
==218347== FILE DESCRIPTORS: 3 open at exit.
==218347== Open file descriptor 2: /dev/pts/0
==218347==    <inherited from parent>
==218347== 
==218347== Open file descriptor 1: /dev/pts/0
==218347==    <inherited from parent>
==218347== 
==218347== Open file descriptor 0: /dev/pts/0
==218347==    <inherited from parent>
==218347== 
==218347== 
==218347== HEAP SUMMARY:
==218347==     in use at exit: 14,635 bytes in 26 blocks
==218347==   total heap usage: 216 allocs, 190 frees, 373,354 bytes allocated
==218347== 
==218347== 77 bytes in 1 blocks are definitely lost in loss record 12 of 22
==218347==    at 0x4C2C291: realloc (vg_replace_malloc.c:836)
==218347==    by 0x52CEC7A: vasprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA4F6: asprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x504BBB9: load_driver (init.c:254)
==218347==    by 0x504BBB9: load_drivers (init.c:311)
==218347==    by 0x504C571: ibverbs_get_device_list (init.c:625)
==218347==    by 0x5049F2F: count_devices (device.c:97)
==218347==    by 0x584CE6F: pthread_once (in /usr/lib64/libpthread-2.17.so)
==218347==    by 0x504A82B: ibv_get_device_list@@IBVERBS_1.1 (device.c:122)
==218347==    by 0x562A035: ucma_init (cma.c:249)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347== 
==218347== 160 (16 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 22
==218347==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==218347==    by 0x4E3DE11: rpma_peer_new (peer.c:193)
==218347==    by 0x4021A4: main (server.c:419)
==218347== 
==218347== 160 (16 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 16 of 22
==218347==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==218347==    by 0x4E3D46D: rpma_mr_reg (mr.c:277)
==218347==    by 0x4021DC: main (server.c:423)
==218347== 
==218347== LEAK SUMMARY:
==218347==    definitely lost: 109 bytes in 3 blocks
==218347==    indirectly lost: 288 bytes in 2 blocks
==218347==      possibly lost: 0 bytes in 0 blocks
==218347==    still reachable: 14,238 bytes in 21 blocks
==218347==         suppressed: 0 bytes in 0 blocks
==218347== Reachable blocks (those to which a pointer was found) are not shown.
==218347== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==218347== 
==218347== Use --track-origins=yes to see where uninitialised values come from
==218347== For lists of detected and suppressed errors, rerun with: -s
==218347== ERROR SUMMARY: 79 errors from 51 contexts (suppressed: 0 from 0)

from rpma.

pbalcer avatar pbalcer commented on August 15, 2024

hm, what is malloc_aligned() in your example?
Can you instead try dst_ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

from rpma.

ldorau avatar ldorau commented on August 15, 2024

@qianlong-ql Hi, which OS version does it occur on? I cannot reproduce it (using mmap() as suggested above by @pbalcer).

from rpma.

qianlong-ql avatar qianlong-ql commented on August 15, 2024

I cloned an environment that reproduces this problem and sent the address and password to Tomasz Gromadzki by email.

from rpma.

ldorau avatar ldorau commented on August 15, 2024

OK, thanks, I have tested it. Could you download and save the source rpm nic-libs-mellanox-rdma-3.0.2-1.src.rpm in the same directory where the binary rpm is on this machine?

from rpma.

qianlong-ql avatar qianlong-ql commented on August 15, 2024

nic-libs-mellanox-rdma-3.0.2-1.x86_64.rpm does not match the driver on this machine. I restored nic-libs-mellanox-rdma to version 2.0.1-2 and put the main source in the directory /root/rpm_packet/nic-libs-mellanox-rdma-2.0.1

from rpma.

ldorau avatar ldorau commented on August 15, 2024

Thanks!

from rpma.

ldorau avatar ldorau commented on August 15, 2024

Hi @qianlong-ql
It seems to be an issue with glibc's malloc(). It is certainly not an issue in librpma.

  1. The simplest (but not the best!) workaround is to set pagesize to 2MB inside of malloc_aligned() instead of 4KB - it will not crash then.
  2. The best solution is to use mmap() with MAP_SHARED instead of malloc_aligned() as @pbalcer suggested above:
    dst_ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    because you should not use private memory for RDMA if you want to fork the process (malloc() and posix_memalign() use MAP_PRIVATE).
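
A minimal sketch of option 2, assuming a Linux system where MAP_ANONYMOUS is available. The helper names `alloc_shared()` and `child_write_visible_to_parent()` are hypothetical and are not part of librpma or libibverbs. The sketch does not register the buffer with ibv_reg_mr() or rpma_mr_reg(); it only illustrates that a MAP_SHARED mapping stays shared across fork() instead of becoming copy-on-write, which is the property the advice above relies on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: allocate `size` bytes of shared anonymous memory. */
static void *alloc_shared(size_t size)
{
	assert(size > 0); /* mmap() rejects a zero length */
	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	return ptr == MAP_FAILED ? NULL : ptr;
}

/*
 * Hypothetical check: returns 1 if a write made by a forked child is
 * visible to the parent, 0 if it is not, and -1 on error.
 */
static int child_write_visible_to_parent(void)
{
	const size_t size = 4096;
	char *buf = alloc_shared(size);
	if (buf == NULL)
		return -1;
	memset(buf, 0, size);

	pid_t pid = fork();
	if (pid < 0) {
		munmap(buf, size);
		return -1;
	}
	if (pid == 0) {
		/* child: write into the shared page and exit immediately */
		buf[0] = 'c';
		_exit(0);
	}

	/* parent: wait for the child, then look at the same page */
	int status;
	if (waitpid(pid, &status, 0) < 0) {
		munmap(buf, size);
		return -1;
	}
	int visible = (buf[0] == 'c');
	munmap(buf, size);
	return visible;
}
```

In the reproducer, the pointer returned by a helper like `alloc_shared()` would take the place of the one returned by malloc_aligned() before it is passed to rpma_mr_reg(), and munmap() would replace free().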

from rpma.

qianlong-ql avatar qianlong-ql commented on August 15, 2024

Thanks, I got the key point: private memory should not be used for RDMA if the process forks. But why is it safe when the pagesize is set to 2MB?

from rpma.

ldorau avatar ldorau commented on August 15, 2024

It is not safe when pagesize is set to 2MB. It just does not crash, but I cannot guarantee that other things will work correctly.
The only safe way is to use mmap() with MAP_SHARED instead of malloc_aligned():
dst_ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

from rpma.

ldorau avatar ldorau commented on August 15, 2024

@qianlong-ql The fix (#866) has been merged. Please let us know if it fixes this issue.

from rpma.

ldorau avatar ldorau commented on August 15, 2024

@qianlong-ql ping

from rpma.

qianlong-ql avatar qianlong-ql commented on August 15, 2024

@ldorau I have been on vacation and I will test and let you know as soon as possible after my vacation

from rpma.

ldorau avatar ldorau commented on August 15, 2024

@ldorau I have been on vacation and I will test and let you know as soon as possible after my vacation

OK

from rpma.

qianlong-ql avatar qianlong-ql commented on August 15, 2024

@ldorau The issue has been fixed

from rpma.

ldorau avatar ldorau commented on August 15, 2024

@ldorau The issue has been fixed

@qianlong-ql Thanks for confirmation! Closing ...

from rpma.
