Comments (34)
This is what I found in the history:
from cortx-motr.
@andriytk, yes starting/stopping rconfc was a temporary solution, we agreed upon that and this problem will be further aggravated when we use additional spiel commands. This needs to be worked upon as we move beyond LDR1. Thanks for bringing this up. I will create a separate ticket for this.
from cortx-motr.
Created a separate issue to fix frequent rconfc start/stop in hax, #300
from cortx-motr.
There are too many different connections pointers which suggests that there might be a connections leakage.
from cortx-motr.
Yesterday, the cluster was re-bootstrapped (to workaround the problem). It was not seen immediately after the cluster was started, but today the issue got reproduced again. Nobody used the cluster since yesterday, so I have a suspicion that it could be caused by the fact that hax
periodically updates the Motr consumption statistics on all the nodes (6 server + 16 client nodes are configured there). On each update rconfc is started and stopped, so it could be that the connection is not closed when it is stopped. cc @knekrasov , @mssawant , @NeeravChoudhari .
from cortx-motr.
looks like rconfc starts and stops fine in hax
:
15:02 tkachuk1@sage-tier1-1:~$ sudo journalctl -lf | grep confc
Dec 15 15:03:13 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:03:13,712 [DEBUG] {fs-stats-updater} Starting rconfc
Dec 15 15:03:53 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:03:53,998 [DEBUG] {fs-stats-updater} rconfc started
Dec 15 15:03:59 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:03:59,525 [DEBUG] {fs-stats-updater} Stopping rconfc
Dec 15 15:03:59 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:03:59,774 [DEBUG] {fs-stats-updater} confc has been stopped successfuly
Dec 15 15:04:29 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:04:29,830 [DEBUG] {fs-stats-updater} Starting rconfc
Dec 15 15:05:12 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:05:12,328 [DEBUG] {fs-stats-updater} rconfc started
Dec 15 15:05:17 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:05:17,850 [DEBUG] {fs-stats-updater} Stopping rconfc
Dec 15 15:05:18 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:05:18,135 [DEBUG] {fs-stats-updater} confc has been stopped successfuly
It just takes an awful lot of time to start it! So the issue might be at the Motr side after all.
cc @max-seagate , @nikitadanilov
from cortx-motr.
@mssawant , @knekrasov any idea why we set the process fid as profile at start_rconfc()
https://github.com/Seagate/cortx-hare/blob/571ae39ff04bb76ec14859ffe19c0897b1a777d1/hax/hax/motr/hax.c#L682-L693 ?
from cortx-motr.
Perhaps because this is how I understood the team's explanations how it should be initialized. Does your question mean that this is definitely incorrect, @andriytk ?
If so, can you suggest a right FID hax should use?
from cortx-motr.
Profile and process are completely different notions in Motr. Profile is usually one in the cluster configuration, you can see it in the beginning of the hctl status
output:
15:53 tkachuk1@sage-tier1-1:~$ hctl status | head
...
Profile:
# fid name: pool(s)
0x7000000000000001:0x4da 'prof-0': 'tier1-nvme' 'tier2-ssd' 'tier3-hdd' None None
...
And there are usually many processes configured in the cluster, here are few examples:
15:53 tkachuk1@sage-tier1-1:~$ hctl status
...
sage-tier1-2
[started] hax 0x7200000000000001:0x21 172.18.1.2@o2ib:12345:1:1
[offline] ioservice 0x7200000000000001:0x24 172.18.1.2@o2ib:12345:2:1
[unknown] m0_client 0x7200000000000001:0x30 172.18.1.2@o2ib:12345:4:1
[unknown] m0_client 0x7200000000000001:0x33 172.18.1.2@o2ib:12345:4:2
...
The FID prefix of profile is 0x70 while the prefix of process FID is 0x72. Always.
So I wonder:
- why do we try to set profile fid on each rconfc start (by calling
m0_spiel_cmd_profile_set()
); - why do we set the process fid as a profile (passing it as argument to
m0_spiel_cmd_profile_set()
)?
from cortx-motr.
In hare-status
there is only one mention of profile - it is pool profile. And it has fid, yes. Is it the same thing to the one expected by Spiel API?
So I wonder: 1) why do we set profile fid on each rconfc start
As far as I remember, Spiel API will not work without setting the profile fid.
- why do we set the process fid as a profile?
Probably that's a mistake.
from cortx-motr.
As far as I remember, Spiel API will not work without setting the profile fid.
Yes, but why on each rconfc start which happens periodically on each fs-stats update? Why not just once at start()
after https://github.com/Seagate/cortx-hare/blob/571ae39ff04bb76ec14859ffe19c0897b1a777d1/hax/hax/motr/hax.c#L666 ?
from cortx-motr.
Well, initially start_rconfc() was meant to be invoked just once - when the thread initalizes. So at that moment this wasn't really important. But when it turned out that a long-running rconfc makes problems at Motr level, it was decided to start and stop it as quick as possible.
from cortx-motr.
m0trace files from hax and confd just after the cluster restart for further analysis:
hax-m0trace.64056.txt.zip
confd-m0trace.64072.txt.zip
from cortx-motr.
Here we can see from hax-m0trace.64056.txt.zip how the m0_spiel_cmd_profile_set()
is called with the wrong fid of process instead of profile:
867260 5149725.633.139455 7f2306ffb7c0 CALL spiel/spiel.c:102:m0_spiel_cmd_profile_set > spiel = 0x1e95650, profile = 7200000000000001:6
from cortx-motr.
All requests (and replies) from hax to confd about connection establishment and termination:
$ cat hax-m0trace.64056.txt | grep 'M0_RPC_CONN.*172.18.1.1.o2ib:12345:2:1'
506512 5149588.846.660440 7f2654fa8a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800455b0 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
507834 5149588.859.937720 7f2306ffb2d0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f22fc010730 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004b8e0 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
518660 5149588.909.128629 7f2654fa8a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f2680051bc0 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
522691 5149652.308.904432 7f2654fa8ac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f2680045750 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
525411 5149652.889.048213 7f2306ffb0f0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f22fc013cd0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f22fc00b100 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
527309 5149653.041.289508 7f2566ffcbb0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f230c05f8d0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f255c054b90 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
528793 5149653.094.278371 7f2654fa8ac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004ba30 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
529753 5149653.156.047883 7f2654fa8a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004ba30 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
530164 5149653.176.596194 7f2654fa8ac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004bf60 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
854723 5149662.413.775698 7f2654fa8a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f2680045750 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
856057 5149662.422.308668 7f2306ffb2d0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f22fc013cd0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f22fc00b100 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
866878 5149662.455.189940 7f2654fa8a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004bf60 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
870902 5149725.895.858754 7f2654fa8ac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e3070 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
873639 5149726.534.923565 7f2306ffb0f0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f22fc013cd0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f22fc00b100 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
875524 5149726.584.463539 7f2566ffcbb0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f230c056740 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f255c0581a0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
877008 5149726.627.633824 7f2654fa8ac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e6460 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
877980 5149726.702.636074 7f2654fa8a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e6460 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
878391 5149726.726.560668 7f2654fa8ac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e6460 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
1202228 5149736.052.919950 7f2654fa8a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e3070 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
1203565 5149736.062.226608 7f2306ffb2d0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f22fc013cd0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f22fc00b100 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
1214383 5149736.093.393970 7f2654fa8a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e6460 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
from cortx-motr.
m0trace file of confd after few hours since the cluster startup - confd-m0trace.64072-5.txt.zip
from cortx-motr.
In this confd-m0trace.64072-02.txt.zip we can see some new connections added:
$ 7z x -so confd-m0trace.64072-02.txt.zip | grep 'frm_fill_packet_from_item_sources.*conn' | awk '{print $7}' | sort | uniq -c | sort -n | head
45 0x7f4fcf548130
75 0x7f4fcf548e70
80 0x7f4fcf541170
88 0x7f4fcf533d50
106 0x7f4fcf532db0
121 0x7f4fcf52c010
122 0x3748da0
122 0x38aa790
122 0x38b2490
122 0x38b9450
All CONN sends:
$ 7z x -so confd-m0trace.64072-02.txt.zip | grep CONN | grep o2ib
10887604486 5173444.038.247032 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f4fcf516420 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887612478 5173444.045.111881 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f4fcf52cc70 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887647807 5173444.067.489217 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82ab890 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280b1f40 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887674537 5173444.084.129748 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82aac20 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280bbc80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887720637 5173444.107.934431 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82abe00 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280b9cc0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887757552 5173444.128.246408 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f4fcf533a10 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887764486 5173444.132.524255 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82b13a0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280fe190 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887828355 5173444.166.917860 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82b18f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887890598 5173444.204.852884 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f4fcf5349b0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887910992 5173444.212.615441 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c81905b0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887943798 5173444.234.687203 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f54440462a0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f4fcf541dd0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887964267 5173444.242.609706 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82b5ba0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280d0490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887983586 5173444.256.231076 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8193880 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f53280f3d80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887991810 5173444.261.434945 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f4fcf541ed0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10888012632 5173444.271.214912 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8193230 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280bbc80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888125417 5173444.333.974436 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82b4570 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280d0cf0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888160942 5173444.352.829382 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8194eb0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280beca0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888167890 5173444.357.330193 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8195400 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888204948 5173444.380.300912 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f4fcf534ab0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10888211974 5173444.384.210452 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c81997c0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280db4b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888232770 5173444.398.199602 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c818ff70 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280db960 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888246849 5173444.406.242343 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8196f90 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888268064 5173444.419.960327 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8194eb0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888282142 5173444.428.621028 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8190b00 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280d0490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888345805 5173444.464.292384 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82ab6f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888381136 5173444.484.857104 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8195960 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280dbe10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888409549 5173444.500.372651 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8191050 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f53280e6490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888452211 5173444.525.024454 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c81997c0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280beca0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888494599 5173444.545.717364 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c819efa0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280db4b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888508623 5173444.555.395676 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8191050 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280db960 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
Note: all endpoints are hax
' ones.
from cortx-motr.
Some more info with all received and sent connection requests and replies from confd-m0trace.64072-02.txt.zip:
$ 7z x -so confd-m0trace.64072-02.txt.zip | grep -E 'rcv_conn_(init|terminate) >|CONN.*o2ib|item_received.*/3[23]'
10887604486 5173444.038.247032 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f4fcf516420 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887612478 5173444.045.111881 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f4fcf52cc70 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887638390 5173444.064.256650 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82aa6c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.27@o2ib:12345:1:1, oneway = 0
10887647746 5173444.067.289113 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280b1b90, ep_addr: 172.18.1.27@o2ib:12345:1:1, machine: 0x373cc70
10887647807 5173444.067.489217 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82ab890 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280b1f40 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887673868 5173444.083.536832 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82adfe0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.24@o2ib:12345:1:1, oneway = 0
10887674326 5173444.083.917592 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280bb8d0, ep_addr: 172.18.1.24@o2ib:12345:1:1, machine: 0x373cc70
10887674537 5173444.084.129748 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82aac20 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280bbc80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887704924 5173444.099.499727 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82af0c0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.6@o2ib:12345:1:1, oneway = 0
10887720464 5173444.107.412968 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82acf00[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.26@o2ib:12345:1:1, oneway = 0
10887720576 5173444.107.760271 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280b9910, ep_addr: 172.18.1.26@o2ib:12345:1:1, machine: 0x373cc70
10887720637 5173444.107.934431 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82abe00 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280b9cc0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887727469 5173444.111.543170 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82ac350[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.39@o2ib:12345:1:1, oneway = 0
10887750669 5173444.123.847804 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82b0800[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.1@o2ib:12345:1:1, oneway = 0
10887757552 5173444.128.246408 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f4fcf533a10 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887764414 5173444.132.305505 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280c4100, ep_addr: 172.18.1.1@o2ib:12345:1:1, machine: 0x373cc70
10887764486 5173444.132.524255 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82b13a0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280fe190 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887828187 5173444.166.480754 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82acf00[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.6@o2ib:12345:1:1, oneway = 0
10887828303 5173444.166.732232 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280c44b0, ep_addr: 172.18.1.6@o2ib:12345:1:1, machine: 0x373cc70
10887828355 5173444.166.917860 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82b18f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887849393 5173444.176.540900 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82b28e0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.3@o2ib:12345:1:1, oneway = 0
10887890598 5173444.204.852884 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f4fcf5349b0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887910250 5173444.212.112813 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82b5ba0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.39@o2ib:12345:1:1, oneway = 0
10887910745 5173444.212.391336 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280c4960, ep_addr: 172.18.1.39@o2ib:12345:1:1, machine: 0x373cc70
10887910992 5173444.212.615441 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c81905b0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887943798 5173444.234.687203 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f54440462a0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f4fcf541dd0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887963635 5173444.242.045450 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c82b28e0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.3@o2ib:12345:1:1, oneway = 0
10887964052 5173444.242.402716 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280d00e0, ep_addr: 172.18.1.3@o2ib:12345:1:1, machine: 0x373cc70
10887964267 5173444.242.609706 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82b5ba0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280d0490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887976610 5173444.253.345025 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c8192ce0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.2@o2ib:12345:1:1, oneway = 0
10887983549 5173444.256.059548 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280d0590, ep_addr: 172.18.1.2@o2ib:12345:1:1, machine: 0x373cc70
10887983586 5173444.256.231076 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8193880 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f53280f3d80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887991810 5173444.261.434945 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f4fcf541ed0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10888012561 5173444.271.141062 7f530d7f9c00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f53280bb8d0
10888012632 5173444.271.214912 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8193230 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280bbc80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888026384 5173444.278.121497 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c818ff70[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.27@o2ib:12345:1:1, oneway = 0
10888068782 5173444.302.429270 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c81915a0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.26@o2ib:12345:1:1, oneway = 0
10888103989 5173444.322.070626 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c8193cd0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.1@o2ib:12345:1:1, oneway = 0
10888125280 5173444.333.447431 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c8194960[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.24@o2ib:12345:1:1, oneway = 0
10888125380 5173444.333.763813 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280d0940, ep_addr: 172.18.1.24@o2ib:12345:1:1, machine: 0x373cc70
10888125417 5173444.333.974436 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82b4570 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280d0cf0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888160774 5173444.352.285776 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c8195400[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.27@o2ib:12345:1:1, oneway = 0
10888160880 5173444.352.646059 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280be8f0, ep_addr: 172.18.1.27@o2ib:12345:1:1, machine: 0x373cc70
10888160942 5173444.352.829382 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8194eb0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280beca0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888167881 5173444.357.260235 7f530d7f9c00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f53280c44b0
10888167890 5173444.357.330193 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8195400 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888203721 5173444.377.734297 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c8198c20[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.1@o2ib:12345:1:1, oneway = 0
10888203867 5173444.379.041491 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280db100, ep_addr: 172.18.1.1@o2ib:12345:1:1, machine: 0x373cc70
10888204948 5173444.380.300912 7f541e7a39c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f4fcf534ab0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10888211974 5173444.384.210452 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c81997c0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280db4b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888232609 5173444.397.588544 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c8198c20[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.26@o2ib:12345:1:1, oneway = 0
10888232730 5173444.398.031616 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280db5b0, ep_addr: 172.18.1.26@o2ib:12345:1:1, machine: 0x373cc70
10888232770 5173444.398.199602 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c818ff70 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280db960 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888246839 5173444.406.172413 7f530d7f9c00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f53280c4960
10888246849 5173444.406.242343 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8196f90 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888267896 5173444.419.356997 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c8195400[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.6@o2ib:12345:1:1, oneway = 0
10888268004 5173444.419.777975 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280c44b0, ep_addr: 172.18.1.6@o2ib:12345:1:1, machine: 0x373cc70
10888268064 5173444.419.960327 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8194eb0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888282118 5173444.428.548235 7f530d7f9c00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f53280d00e0
10888282142 5173444.428.621028 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8190b00 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280d0490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888303307 5173444.439.786088 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c819a260[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.2@o2ib:12345:1:1, oneway = 0
10888345668 5173444.463.794398 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c819b430[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.39@o2ib:12345:1:1, oneway = 0
10888345765 5173444.464.129585 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280c4960, ep_addr: 172.18.1.39@o2ib:12345:1:1, machine: 0x373cc70
10888345805 5173444.464.292384 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c82ab6f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888380951 5173444.484.264511 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c8190b00[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.3@o2ib:12345:1:1, oneway = 0
10888381053 5173444.484.706690 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280dba60, ep_addr: 172.18.1.3@o2ib:12345:1:1, machine: 0x373cc70
10888381136 5173444.484.857104 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8195960 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280dbe10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888409412 5173444.499.755014 7f50cdecf8e0 CALL rpc/rpc_machine.c:874:item_received > machine: 0x373cc70, item: 0x7f50c819bed0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.2@o2ib:12345:1:1, oneway = 0
10888409509 5173444.500.237669 7f532fffeba0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f53280e60e0, ep_addr: 172.18.1.2@o2ib:12345:1:1, machine: 0x373cc70
10888409549 5173444.500.372651 7f532fffeab0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8191050 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f53280e6490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888452202 5173444.524.959438 7f530d7f9c00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f53280be8f0
10888452211 5173444.525.024454 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c81997c0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280beca0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888494590 5173444.545.649891 7f530d7f9c00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f53280db100
10888494599 5173444.545.717364 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c819efa0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280db4b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888508614 5173444.555.328896 7f530d7f9c00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f53280db5b0
10888508623 5173444.555.395676 7f530d7f9ae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f50c8191050 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280db960 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
Note: some connections do terminate (see at the end).
from cortx-motr.
Here are the m0trace logs from a simple singlenode configuration (easier to analyse): confd-m0trace.14465.txt.zip, hax-m0trace.14071.txt.zip.
From the preliminary analysis, it looks like new rpc connections are made by hax process to confd on each rconfc_start (see in hax-m0trace.14071.txt.zip), but they are not terminated on rconfc_stop.
from cortx-motr.
@andriytk, do you have hax journald logs? I wonder if these are related to HA notifications.
from cortx-motr.
@mssawant here you are: sage-hax-syslog.txt.
BTW: note how long it took just to start rconfc:
Dec 17 11:46:17 sage-tier1-1 hare-hax[64056]: 2020-12-17 11:46:17,724 [DEBUG] {fs-stats-updater} Starting rconfc
...
Dec 17 11:47:19 sage-tier1-1 hare-hax[64056]: 2020-12-17 11:47:19,842 [DEBUG] {fs-stats-updater} rconfc started
More than 1 minute!
from cortx-motr.
But when it turned out that a long-running rconfc makes problems at Motr level...
@knekrasov can you recall by any chance - what the problem was it?
from cortx-motr.
Let's look more precisely on what happens with connections during fs-stats fetching in hax-m0trace.14071.txt.zip:
rconfc starts and queries hax (itself) for the entrypoint:
$ 7z x -so hax-m0trace.14071.txt.zip | grep -E 'conn_(init|terminate) >|CONN.*:12345:|item_received.*/3[23]|rconfc_(start|stop)'
...
202218 42990.745.189295 7fb5849cd7b0 CALL spiel/spiel.c:72:m0_spiel_rconfc_start >
202282 42990.745.702553 7fb5849cd720 CALL conf/rconfc.c:2908:m0_rconfc_start_wait > rconfc = 0x20b8160, profile = <7200000000000001:6>
202283 42990.745.703749 7fb5849cd670 CALL conf/rconfc.c:2890:m0_rconfc_start > rconfc = 0x20b8160, profile = <7200000000000001:6>
202286 42990.745.729227 7fb5849cd670 CALL conf/rconfc.c:2898:m0_rconfc_start < rc=0
202288 42990.745.734798 7fb5849cd5b0 CALL conf/rconfc.c:2014:rconfc_start_ast_cb >
202289 42990.745.735780 7fb5849cd370 CALL conf/rconfc.c:1982:rconfc_start >
202291 42990.745.741137 7fb5849cd370 DEBUG conf/rconfc.c:1988:rconfc_start Querying ENTRYPOINT...
202300 42990.745.850102 7fb5849cd370 CALL conf/rconfc.c:2006:rconfc_start < rc=0 hae_control=2 is_failed=false entrypoint_retries=0
202301 42990.745.851756 7fb5849cd5b0 CALL conf/rconfc.c:2016:rconfc_start_ast_cb <
202395 42990.749.584679 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x23ec240 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
202620 42990.751.468573 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb578020b90[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
202672 42990.751.771651 7fb5b90b0cb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7fb5ac061e00, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x20b3670
202709 42990.751.890780 7fb5b90b0bc0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5780383f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7fb5ac048660 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
202946 42990.754.491542 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb5780383f0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
205055 42991.266.572183 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x23ec240 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
205335 42991.268.219042 7fb5ab7fdd00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7fb5ac061e00
205344 42991.268.245554 7fb5ab7fdbe0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb578020b90 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7fb5ac048660 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
205754 42991.270.591204 7fb5b88afd70 CALL conf/rconfc.c:2014:rconfc_start_ast_cb >
205755 42991.270.592186 7fb5b88afb30 CALL conf/rconfc.c:1982:rconfc_start >
205757 42991.270.610338 7fb5b88afb30 DEBUG conf/rconfc.c:1993:rconfc_start ENTRYPOINT ready...
Got entrypoint, connects to confd:
205758 42991.270.610746 7fb5b88afb30 CALL conf/rconfc.c:1958:rconfc_start_internal >
205863 42991.271.451757 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b405d4b0 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
206089 42991.275.119529 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb5780383f0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
206522 42991.280.185454 7fb5b88afb30 CALL conf/rconfc.c:1971:rconfc_start_internal < rc=0
206523 42991.280.188604 7fb5b88afb30 CALL conf/rconfc.c:2006:rconfc_start < rc=0 hae_control=2 is_failed=false entrypoint_retries=0
206524 42991.280.189930 7fb5b88afd70 CALL conf/rconfc.c:2016:rconfc_start_ast_cb <
208571 42991.908.523648 7fb5b88af860 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5b405fd60 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b404be20 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
209312 42991.912.326503 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb578040ae0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
210383 42991.923.503353 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb578020b90[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
210470 42991.925.299777 7fb5b90b0cb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7fb5ac061e00, ep_addr: 192.168.180.153@tcp:12345:2:1, machine: 0x20b3670
210517 42991.925.557938 7fb5b90b0bc0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5780482d0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5ac0488f0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
211812 42991.934.156981 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4060420 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
212040 42991.941.190429 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb578020b90[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
212772 42991.954.691164 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4060420 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
213183 42991.959.929918 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b405c580 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
213412 42991.963.468575 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb578020b90[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
235277 42992.196.447185 7fb5849cd720 CALL conf/rconfc.c:2934:m0_rconfc_start_wait < rc=0
235278 42992.196.449106 7fb5849cd7b0 CALL spiel/spiel.c:87:m0_spiel_rconfc_start < rc=0
Got the configuration from confd, rconfc is started.
So we've sent 4 connection establish requests to confd and only 1 connection terminate request so far.
Now we connect to ios (to get fs stats):
239026 42992.530.334986 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:2 ri_session=0x7fb574048ac0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
239534 42992.533.973605 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb578049d60[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:2, oneway = 0
241006 42992.551.071387 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:2 ri_session=0x7fb574048ac0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
We've got the stats, terminated the connection with ios and now stop rconfc with sending 3 connection termination requests to confd:
241349 42992.555.873956 7fb5849cd8a0 CALL spiel/spiel.c:93:m0_spiel_rconfc_stop >
241350 42992.555.875431 7fb5849cd860 CALL conf/rconfc.c:2956:m0_rconfc_stop_sync > rconfc = 0x20b8160
241351 42992.555.876175 7fb5849cd820 CALL conf/rconfc.c:2939:m0_rconfc_stop > rconfc 0x20b8160
241353 42992.555.896186 7fb5849cd820 CALL conf/rconfc.c:2951:m0_rconfc_stop <
241355 42992.555.899800 7fb5849cd6e0 CALL conf/rconfc.c:2346:rconfc_stop_ast_cb > rconfc = 0x20b8160
241356 42992.555.900359 7fb5849cd670 CALL conf/rconfc.c:2317:rconfc_stop_internal > rconfc = 0x20b8160
241731 42992.562.096228 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b405d4b0 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
242948 42992.573.030791 7fb5849cd2e0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5740478f0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b404be20 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
243278 42992.576.067319 7fb5849cd670 CALL conf/rconfc.c:2339:rconfc_stop_internal <
243279 42992.576.068375 7fb5849cd6e0 CALL conf/rconfc.c:2352:rconfc_stop_ast_cb <
244044 42992.576.647521 7fb5849cd860 CALL conf/rconfc.c:2965:m0_rconfc_stop_sync <
244420 42992.589.076548 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b405c580 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
244794 42992.598.139163 7fb5849cd8a0 CALL spiel/spiel.c:96:m0_spiel_rconfc_stop <
So looks like all connections to confd from hax are terminated. But there is one connection from confd to hax - 0x7fb5ac061e00 - which is not terminated.
Here is the next iteration of fs stats fetch where all the above steps are repeated:
244831 43079.667.115084 7fb5849cd7b0 CALL spiel/spiel.c:72:m0_spiel_rconfc_start >
244898 43079.667.661081 7fb5849cd720 CALL conf/rconfc.c:2908:m0_rconfc_start_wait > rconfc = 0x20b8160, profile = <7200000000000001:6>
244899 43079.667.661744 7fb5849cd670 CALL conf/rconfc.c:2890:m0_rconfc_start > rconfc = 0x20b8160, profile = <7200000000000001:6>
244902 43079.667.684638 7fb5849cd670 CALL conf/rconfc.c:2898:m0_rconfc_start < rc=0
244904 43079.667.690194 7fb5849cd5b0 CALL conf/rconfc.c:2014:rconfc_start_ast_cb >
244905 43079.667.691580 7fb5849cd370 CALL conf/rconfc.c:1982:rconfc_start >
244907 43079.667.697005 7fb5849cd370 DEBUG conf/rconfc.c:1988:rconfc_start Querying ENTRYPOINT...
244913 43079.667.795861 7fb5849cd370 CALL conf/rconfc.c:2006:rconfc_start < rc=0 hae_control=2 is_failed=false entrypoint_retries=0
244914 43079.667.797122 7fb5849cd5b0 CALL conf/rconfc.c:2016:rconfc_start_ast_cb <
245008 43079.669.270664 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x23ec240 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
245233 43079.671.588732 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb578049d60[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
245285 43079.672.078852 7fb5b90b0cb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7fb5ac0822d0, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x20b3670
245322 43079.672.234060 7fb5b90b0bc0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5780492c0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7fb5ac082680 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
245560 43079.674.292506 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb5780492c0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
247668 43080.152.347547 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x23ec240 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
247948 43080.154.133262 7fb5ab7fdd00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7fb5ac0822d0
247957 43080.154.175477 7fb5ab7fdbe0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb57804a3a0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7fb5ac082680 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
248379 43080.160.570193 7fb5b88afd70 CALL conf/rconfc.c:2014:rconfc_start_ast_cb >
248380 43080.160.571246 7fb5b88afb30 CALL conf/rconfc.c:1982:rconfc_start >
248382 43080.160.596849 7fb5b88afb30 DEBUG conf/rconfc.c:1993:rconfc_start ENTRYPOINT ready...
248383 43080.160.597438 7fb5b88afb30 CALL conf/rconfc.c:1958:rconfc_start_internal >
248488 43080.161.852568 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4043b40 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
248714 43080.168.752491 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb5780492c0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
249147 43080.173.337323 7fb5b88afb30 CALL conf/rconfc.c:1971:rconfc_start_internal < rc=0
249148 43080.173.338958 7fb5b88afb30 CALL conf/rconfc.c:2006:rconfc_start < rc=0 hae_control=2 is_failed=false entrypoint_retries=0
249149 43080.173.339886 7fb5b88afd70 CALL conf/rconfc.c:2016:rconfc_start_ast_cb <
251184 43080.805.585857 7fb5849cd0c0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5740478f0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb574048ac0 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
251912 43080.809.353664 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb57804ad50[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
253007 43080.822.430385 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb57804a800[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
253059 43080.823.321014 7fb5b90b0cb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7fb5ac0822d0, ep_addr: 192.168.180.153@tcp:12345:2:1, machine: 0x20b3670
253110 43080.823.456740 7fb5b90b0bc0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb57804b3a0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5ac082680 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
254437 43080.830.687670 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4063670 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
254665 43080.833.652895 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb57804a800[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
255397 43080.840.917944 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4063670 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
255808 43080.845.591560 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4063670 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
256036 43080.848.477597 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb57804a800[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
277890 43081.067.401059 7fb5849cd720 CALL conf/rconfc.c:2934:m0_rconfc_start_wait < rc=0
277891 43081.067.402329 7fb5849cd7b0 CALL spiel/spiel.c:87:m0_spiel_rconfc_start < rc=0
281833 43081.428.335787 7fb5ba0b2aa0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:2 ri_session=0x7fb574048bc0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
282217 43081.432.311942 7fb5861d1970 CALL rpc/rpc_machine.c:874:item_received > machine: 0x20b3670, item: 0x7fb57804c8d0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:2, oneway = 0
283631 43081.447.470724 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:2 ri_session=0x7fb574048bc0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
283974 43081.452.047354 7fb5849cd8a0 CALL spiel/spiel.c:93:m0_spiel_rconfc_stop >
283975 43081.452.049350 7fb5849cd860 CALL conf/rconfc.c:2956:m0_rconfc_stop_sync > rconfc = 0x20b8160
283976 43081.452.050079 7fb5849cd820 CALL conf/rconfc.c:2939:m0_rconfc_stop > rconfc 0x20b8160
283978 43081.452.068318 7fb5849cd820 CALL conf/rconfc.c:2951:m0_rconfc_stop <
283980 43081.452.071738 7fb5849cd6e0 CALL conf/rconfc.c:2346:rconfc_stop_ast_cb > rconfc = 0x20b8160
283981 43081.452.072514 7fb5849cd670 CALL conf/rconfc.c:2317:rconfc_stop_internal > rconfc = 0x20b8160
284344 43081.456.381947 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4043b40 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
285573 43081.465.962627 7fb5849cd2e0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5740478f0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb574048ac0 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
285903 43081.469.257972 7fb5849cd670 CALL conf/rconfc.c:2339:rconfc_stop_internal <
285904 43081.469.258671 7fb5849cd6e0 CALL conf/rconfc.c:2352:rconfc_stop_ast_cb <
286669 43081.470.115579 7fb5849cd860 CALL conf/rconfc.c:2965:m0_rconfc_stop_sync <
287045 43081.479.112607 7fb5ba0b2a80 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4063670 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
287419 43081.482.272063 7fb5849cd8a0 CALL spiel/spiel.c:96:m0_spiel_rconfc_stop <
Again we can see that the receiving connection 0x7fb5ac0822d0 is not terminated.
from cortx-motr.
Now let's see what was happening at confd side at the same time-frame in confd-m0trace.14465.txt.zip:
Here we see 4 incoming connection requests from hax with one of them terminated and 1 outgoing connection request to hax:
$ 7z x -so confd-m0trace.14465.txt.zip | grep -E 'rcv_conn_(init|terminate) >|CONN.*:12345:|item_received.*/3[23]|rconfc_(start|stop)'
...
179109 42991.272.681309 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f200800f830[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
179167 42991.273.485286 7f203568ebb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f2030096940, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
179204 42991.273.586025 7f203568eac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096cf0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
179771 42991.909.697646 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f200800f830[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
179823 42991.910.320173 7f203568ebb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f20300d5570, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
179860 42991.910.458264 7f203568eac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f203009a600 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
180686 42991.921.885390 7f20366909a0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f202c07ec20 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f203009a4b0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
181443 42991.926.975026 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f200800a150[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
182334 42991.937.299069 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f20080246c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
182386 42991.937.873888 7f203568ebb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f20300d5920, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
182423 42991.938.140708 7f203568eac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300e03f0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
183371 42991.956.717151 7f20277fdc00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f20300d5920
183380 42991.956.766442 7f20277fdae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2008025d60 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300e03f0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
183692 42991.961.501955 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f20080246c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
183744 42991.961.936256 7f203568ebb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f20300d5920, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
183781 42991.962.102002 7f203568eac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5cd0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
Now we see that 3 incoming connections got terminated:
202579 42992.563.574984 7f20277fdc00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f2030096940
202588 42992.563.601000 7f20277fdae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2008024d00 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096cf0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
203661 42992.574.334644 7f20277fdc00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f20300d5570
203670 42992.574.362695 7f20277fdae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800f910 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f203009a600 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
205046 42992.594.659369 7f20277fdc00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f20300d5920
205055 42992.594.738954 7f20277fdae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2008024220 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5cd0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
But the outgoing connection (sent at 42991.921.885390) was not terminated.
Here is the next iteration where everything repeats again:
205454 43080.164.138594 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f200800f830[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
205512 43080.165.241444 7f203568ebb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f20300e0120, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
205549 43080.165.405545 7f203568eac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096940 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
206118 43080.806.794370 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f200800f830[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
206170 43080.807.651058 7f203568ebb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f20300d5500, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
206207 43080.807.760646 7f203568eac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d58b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
207028 43080.821.351921 7f20366909a0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f202c07c120 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096a40 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
207790 43080.824.828174 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f200800a150[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
208687 43080.831.971863 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f20080246c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
208739 43080.832.390467 7f203568ebb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f20300d5b50, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
208776 43080.832.533118 7f203568eac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5f00 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
209730 43080.842.415421 7f20277fdc00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f20300d5b50
209739 43080.842.460607 7f20277fdae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2008025d60 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5f00 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
210052 43080.846.833923 7f200fffe870 CALL rpc/rpc_machine.c:874:item_received > machine: 0x236eaf0, item: 0x7f20080246c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
210104 43080.847.246842 7f203568ebb0 CALL rpc/conn.c:533:m0_rpc_rcv_conn_init > conn: 0x7f20300d5b50, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
210141 43080.847.351301 7f203568eac0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5f00 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
228983 43081.457.734384 7f20277fdc00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f20300e0120
228992 43081.457.762991 7f20277fdae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2008024d00 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096940 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
230068 43081.467.199184 7f20277fdc00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f20300d5500
230077 43081.467.224764 7f20277fdae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f200800f910 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d58b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
231455 43081.480.454485 7f20277fdc00 CALL rpc/conn.c:1219:m0_rpc_rcv_conn_terminate > conn: 0x7f20300d5b50
231464 43081.480.480612 7f20277fdae0 CALL rpc/item.c:1139:m0_rpc_item_send > 0x7f2008024220 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5f00 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
So looks like on each fs-stats iteration the outgoing connection from confd to hax is leaked.
from cortx-motr.
This is the leaked connection (from confd-m0trace.14465.txt.zip):
180517 42991.920.514769 7f203568ea40 CALL rm/rm_foms.c:365:m0_rm_reverse_session_get > remote incoming 0x7f2008024cd0 remote 0x7f20300dff80
180518 42991.920.519680 7f203568e9c0 CALL rpc/service.c:174:m0_rpc_service_reverse_session_get >
180519 42991.920.520691 7f203568e970 CALL lib/memory.c:129:m0_alloc > size=3160
180520 42991.920.524965 7f203568e970 CALL lib/memory.c:140:m0_alloc < ptr=0x7f2030104e40 size=3160
180521 42991.920.527077 7f203568e960 CALL rpc/link.c:611:m0_rpc_link_init > rlink=0x7f2030104e40 ep=192.168.180.153@tcp:12345:1:1
180522 42991.920.531803 7f203568e7a0 CALL net/lnet/lnet_core.c:396:nlx_core_ep_addr_decode > ep_addr=192.168.180.153@tcp:12345:1:1
180523 42991.920.571764 7f203568e760 CALL net/lnet/ulnet_core.c:559:nlx_ucore_ioctl < rc=0
180524 42991.920.577480 7f203568e900 CALL net/ep.c:84:m0_net_end_point_create < rc=0
180525 42991.920.578947 7f203568e8d0 CALL rpc/conn.c:303:m0_rpc_conn_init > conn=0x7f2030104e40, svc_fid=(nil) ep=192.168.180.153@t
cp:12345:1:1
And later we can see it in this list (compare with the issue description at the top):
211306 43080.860.010893 7f2036690720 CALL rpc/formation2.c:580:frm_fill_packet_from_item_sources >
211307 43080.860.019915 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x7f203011baf0
211308 43080.860.020275 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x7f2030104e40
211309 43080.860.020602 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x7f20300d3090
211310 43080.860.020895 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x7f20300d62c0
211311 43080.860.021194 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x7f2030093b70
211312 43080.860.021336 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x7f203009abc0
211313 43080.860.021638 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x2460dd0
211314 43080.860.021916 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x2459d10
211315 43080.860.022200 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x2452c50
211316 43080.860.022296 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x244bb90
211317 43080.860.022574 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x2444ad0
211318 43080.860.022667 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x243d8a0
211319 43080.860.023044 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x2435aa0
211320 43080.860.023320 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x23884b0
211321 43080.860.023676 7f2036690720 DEBUG rpc/formation2.c:585:frm_fill_packet_from_item_sources conn: 0x7ffeb6106628
211322 43080.860.023842 7f2036690720 CALL rpc/formation2.c:623:frm_fill_packet_from_item_sources <
The list is full of leaked reverse connections which only grows over time.
from cortx-motr.
After discussion with @just-now, the following was found out: looks like the reverse session/connection is established so that RM manager is able to revoke the RM-read-lock taken by each rconfc client on the configuration. (This is needed to update the configuration among all processes in the cluster, whenever it happens: RM-read-lock is revoked and rconfc in each Motr process in the cluster would abandon the old configuration and try to get the new one.)
Now, looks like this reverse session/connection is not cleaned up in the main confd process whenever rconfc stops in another Motr processes. These connections are getting accumulated over time, wasting memory and CPU cycles at frm_fill_packet_from_item_sources()
. This is a bug and must be fixed.
I assume the problem exists also in LDR-R1. It's just would take a few weeks to show up as there are only 2 nodes. (While it takes only 1 day to show up on SAGE-cluster with 22 nodes.)
cc @truptiatseagate, @huanghua78.
Background info: on each node hax
process periodically (once a minute) fetches fs-statistics. On each such fetch it starts and stops rconfc. So the reverse sessions/connectios are established in confd on each such iteration from each node.
from cortx-motr.
@knekrasov can you recall by any chance - what the problem was it?
@andriytk, if rconfc lives along with the hax the hax shutdown fails because confd's are stopped before hax and then hax is not able to shutdown cleanly and fails with timeout.
from cortx-motr.
@andriytk, I would suggest to create a Jira ticket for this.
from cortx-motr.
if rconfc lives along with the hax the hax shutdown fails because confd's are stopped before hax...
@mssawant yes, that's what I have already learned today (thanks to @knekrasov). But it is not a solution to stop rconfc and start rconfc periodically as fs-stats is fetched. Because the shutdown may happen at any time - even during the fs-stats fetching when rconfc is started and not stopped yet. And on relatively big clusters (like at SAGE where we have 22 nodes) it takes about 1-minute only to start rconfc! (See my comments above at #298 (comment) and at #298 (comment).) So it's definitely not a solution. We just reduce the probability of the problem appearance or, in another word - we are just hiding the bug instead of fixing it.
from cortx-motr.
to create a Jira ticket for this
Looks like we have two different problems here:
- Connection/memory leakage at confd - Motr issue (this one).
- Periodic start/stop of rconfc in
hax
- Hare issue.
I'd better suggest to create another GitHub issue for Hare. (Jira is not publicly visible and GitHub is much more convenient. As @johnbent advised, let's discuss our technical issues publicly.)
from cortx-motr.
After discussion with @just-now, the following was found out: looks like the reverse session/connection is established so that RM manager is able to revoke the RM-read-lock taken by each rconfc client on the configuration. (This is needed to update the configuration among all processes in the cluster, whenever it happens: RM-read-lock is revoked and rconfc in each Motr process in the cluster would abandon the old configuration and try to get the new one.)
Hello @andriytk , @mssawant @nikitadanilov , this RM-read-lock taken by each rconfc client on the configuration
will cause problems.
If some clients, die accidentally, without proper cleanup. So, the RM-read-lock can not be revoked. (Motr RM lock is not lease based). In this case, the confd can not get an exclusive write lock to update the configuration?
For example, node addition, pool status update, all need to update configuration.
from cortx-motr.
If some clients, die accidentally...
@huanghua78 yes, this is another scenario that must be handled properly.
BTW, I just checked in the code (
Line 593 in 45b8fbb
180578 42991.921.022896 7f203568ea90 CALL rm/rm_foms.c:544:rfom_debtor_subscribe < rc=0
180579 42991.921.023363 7f203568ea90 CALL rm/rm_foms.c:595:request_pre_process < rc=1
It looks like we do subscribe the debtor for HA-notification. So when the process stops or crashes and M0_NC_FAILED
ha-state is received - it must be cleaned up. But the problem in our case is that hax
process does not stop or crash - it just stops and starts rconfc periodically. And each time rconfc starts - it creates the new debtor at RM-manager service (working inside leading confd process), but it the old one is not cleaned up on rconfc stop. Although M0_RM_FOP_CANCEL
is sent and received:
202898 42992.567.231871 7f200fffe640 CALL rpc/item.c:1223:m0_rpc_item_received > 0x7f200800a150 [REQUEST/M0_RM_FOP_CANCEL(92)] xid=2 machine=0x236eaf0
I wonder why the debtor is not cleaned up on this cancel fop...
from cortx-motr.
The workaround has been landed into Hare - Seagate/cortx-hare#1437.
from cortx-motr.
This issue/pull request has been marked as needs attention
as it has been left pending without new activity for 4 days. Tagging @huanghua78 @mukundkanekar for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.
from cortx-motr.
Related Issues (20)
- Unexpected keyword argument 'priority' in function call HOT 6
- degraded write fails when cob create fails HOT 5
- m0_fom_queue() panic in some UTs while doing cleanup HOT 13
- list of unit test failed HOT 3
- subprocess call with shell=True identified, security issue. HOT 9
- [Codacy code cleanup] occurrence: 8 pattern: hardcoded_sql_expression HOT 7
- Disable alternative pool versions by default HOT 4
- Sustained writes causes Motr to stop responding and require restarting HOT 25
- gccxml_cc1plus: error: cortx-motr/be/btree.h: No such file or directory HOT 6
- UT net-lnet is failing on Jenkins because lnet is not configured/loaded properly. HOT 11
- dtm0/log: record pmsgs from participants HOT 5
- I/O test failed HOT 10
- Error when trying to install Motr+HARE from iosea release HOT 3
- Ctgdump failed and all2all test HOT 7
- Phase2 mkfs failed during cluster bootstrapping after recent updates HOT 8
- Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead. HOT 7
- dix: return obj-version and number of replicas on get HOT 9
- dix: introduce cas-put by obj-version operation HOT 5
- dix: introduce cas-delete by obj-version operation HOT 4
- dix: make index creation/deletion operations fault-tolerant HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cortx-motr.