Code Monkey home page Code Monkey logo

Comments (34)

knekrasov avatar knekrasov commented on July 27, 2024 1

This is what I found in the history:

  1. Seagate/cortx-hare#1234
  2. EOS-13393
  3. EOS-12405
  4. EOS-12598

from cortx-motr.

mssawant avatar mssawant commented on July 27, 2024 1

@andriytk, yes starting/stopping rconfc was a temporary solution, we agreed upon that and this problem will be further aggravated when we use additional spiel commands. This needs to be worked upon as we move beyond LDR1. Thanks for bringing this up. I will create a separate ticket for this.

from cortx-motr.

mssawant avatar mssawant commented on July 27, 2024 1

Created a separate issue to fix frequent rconfc start/stop in hax, #300

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

There are too many different connections pointers which suggests that there might be a connections leakage.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

Yesterday, the cluster was re-bootstrapped (to workaround the problem). It was not seen immediately after the cluster was started, but today the issue got reproduced again. Nobody used the cluster since yesterday, so I have a suspicion that it could be caused by the fact that hax periodically updates the Motr consumption statistics on all the nodes (6 server + 16 client nodes are configured there). On each update rconfc is started and stopped, so it could be that the connection is not closed when it is stopped. cc @knekrasov , @mssawant , @NeeravChoudhari .

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

looks like rconfc starts and stops fine in hax:

15:02 tkachuk1@sage-tier1-1:~$ sudo journalctl -lf | grep confc
Dec 15 15:03:13 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:03:13,712 [DEBUG] {fs-stats-updater} Starting rconfc
Dec 15 15:03:53 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:03:53,998 [DEBUG] {fs-stats-updater} rconfc started
Dec 15 15:03:59 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:03:59,525 [DEBUG] {fs-stats-updater} Stopping rconfc
Dec 15 15:03:59 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:03:59,774 [DEBUG] {fs-stats-updater} confc has been stopped successfuly
Dec 15 15:04:29 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:04:29,830 [DEBUG] {fs-stats-updater} Starting rconfc
Dec 15 15:05:12 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:05:12,328 [DEBUG] {fs-stats-updater} rconfc started
Dec 15 15:05:17 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:05:17,850 [DEBUG] {fs-stats-updater} Stopping rconfc
Dec 15 15:05:18 sage-tier1-1 hare-hax[24143]: 2020-12-15 15:05:18,135 [DEBUG] {fs-stats-updater} confc has been stopped successfuly

It just takes an awful lot of time to start it! So the issue might be at the Motr side after all.

cc @max-seagate , @nikitadanilov

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

@mssawant , @knekrasov any idea why we set the process fid as profile at start_rconfc() https://github.com/Seagate/cortx-hare/blob/571ae39ff04bb76ec14859ffe19c0897b1a777d1/hax/hax/motr/hax.c#L682-L693 ?

from cortx-motr.

knekrasov avatar knekrasov commented on July 27, 2024

Perhaps because this is how I understood the team's explanations how it should be initialized. Does your question mean that this is definitely incorrect, @andriytk ?

If so, can you suggest a right FID hax should use?

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

Profile and process are completely different notions in Motr. Profile is usually one in the cluster configuration, you can see it in the beginning of the hctl status output:

15:53 tkachuk1@sage-tier1-1:~$ hctl status | head
...
Profile:
    # fid name: pool(s)
    0x7000000000000001:0x4da 'prof-0': 'tier1-nvme' 'tier2-ssd' 'tier3-hdd' None None
...

And there are usually many processes configured in the cluster, here are few examples:

15:53 tkachuk1@sage-tier1-1:~$ hctl status 
...
    sage-tier1-2 
    [started]  hax        0x7200000000000001:0x21  172.18.1.2@o2ib:12345:1:1
    [offline]  ioservice  0x7200000000000001:0x24  172.18.1.2@o2ib:12345:2:1
    [unknown]  m0_client  0x7200000000000001:0x30  172.18.1.2@o2ib:12345:4:1
    [unknown]  m0_client  0x7200000000000001:0x33  172.18.1.2@o2ib:12345:4:2
...

The FID prefix of profile is 0x70 while the prefix of process FID is 0x72. Always.

So I wonder:

  1. why do we try to set profile fid on each rconfc start (by calling m0_spiel_cmd_profile_set());
  2. why do we set the process fid as a profile (passing it as argument to m0_spiel_cmd_profile_set())?

from cortx-motr.

knekrasov avatar knekrasov commented on July 27, 2024

In hare-status there is only one mention of profile - it is pool profile. And it has fid, yes. Is it the same thing to the one expected by Spiel API?

So I wonder: 1) why do we set profile fid on each rconfc start

As far as I remember, Spiel API will not work without setting the profile fid.

  1. why do we set the process fid as a profile?

Probably that's a mistake.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

As far as I remember, Spiel API will not work without setting the profile fid.

Yes, but why on each rconfc start which happens periodically on each fs-stats update? Why not just once at start() after https://github.com/Seagate/cortx-hare/blob/571ae39ff04bb76ec14859ffe19c0897b1a777d1/hax/hax/motr/hax.c#L666 ?

from cortx-motr.

knekrasov avatar knekrasov commented on July 27, 2024

Well, initially start_rconfc() was meant to be invoked just once - when the thread initalizes. So at that moment this wasn't really important. But when it turned out that a long-running rconfc makes problems at Motr level, it was decided to start and stop it as quick as possible.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

m0trace files from hax and confd just after the cluster restart for further analysis:
hax-m0trace.64056.txt.zip
confd-m0trace.64072.txt.zip

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

Here we can see from hax-m0trace.64056.txt.zip how the m0_spiel_cmd_profile_set() is called with the wrong fid of process instead of profile:

867260  5149725.633.139455  7f2306ffb7c0  CALL    spiel/spiel.c:102:m0_spiel_cmd_profile_set   > spiel = 0x1e95650, profile = 7200000000000001:6

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

All requests (and replies) from hax to confd about connection establishment and termination:

$ cat hax-m0trace.64056.txt | grep 'M0_RPC_CONN.*172.18.1.1.o2ib:12345:2:1'
506512  5149588.846.660440  7f2654fa8a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800455b0 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
507834  5149588.859.937720  7f2306ffb2d0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f22fc010730 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004b8e0 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
518660  5149588.909.128629  7f2654fa8a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f2680051bc0 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
522691  5149652.308.904432  7f2654fa8ac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f2680045750 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
525411  5149652.889.048213  7f2306ffb0f0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f22fc013cd0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f22fc00b100 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
527309  5149653.041.289508  7f2566ffcbb0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f230c05f8d0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f255c054b90 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
528793  5149653.094.278371  7f2654fa8ac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004ba30 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
529753  5149653.156.047883  7f2654fa8a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004ba30 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
530164  5149653.176.596194  7f2654fa8ac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004bf60 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
854723  5149662.413.775698  7f2654fa8a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f2680045750 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
856057  5149662.422.308668  7f2306ffb2d0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f22fc013cd0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f22fc00b100 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
866878  5149662.455.189940  7f2654fa8a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f268004bf60 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
870902  5149725.895.858754  7f2654fa8ac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e3070 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
873639  5149726.534.923565  7f2306ffb0f0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f22fc013cd0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f22fc00b100 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
875524  5149726.584.463539  7f2566ffcbb0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f230c056740 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f255c0581a0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
877008  5149726.627.633824  7f2654fa8ac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e6460 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
877980  5149726.702.636074  7f2654fa8a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e6460 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
878391  5149726.726.560668  7f2654fa8ac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e6460 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
1202228  5149736.052.919950  7f2654fa8a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e3070 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
1203565  5149736.062.226608  7f2306ffb2d0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f22fc013cd0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f22fc00b100 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
1214383  5149736.093.393970  7f2654fa8a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2670043ba0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=172.18.1.1@o2ib:12345:2:1 ri_session=0x7f26800e6460 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

m0trace file of confd after few hours since the cluster startup - confd-m0trace.64072-5.txt.zip

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

In this confd-m0trace.64072-02.txt.zip we can see some new connections added:

$ 7z x -so confd-m0trace.64072-02.txt.zip | grep 'frm_fill_packet_from_item_sources.*conn' | awk '{print $7}' | sort | uniq -c | sort -n | head
  45 0x7f4fcf548130
  75 0x7f4fcf548e70
  80 0x7f4fcf541170
  88 0x7f4fcf533d50
 106 0x7f4fcf532db0
 121 0x7f4fcf52c010
 122 0x3748da0
 122 0x38aa790
 122 0x38b2490
 122 0x38b9450

All CONN sends:

$ 7z x -so confd-m0trace.64072-02.txt.zip | grep CONN | grep o2ib
10887604486  5173444.038.247032  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f4fcf516420 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887612478  5173444.045.111881  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f4fcf52cc70 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887647807  5173444.067.489217  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82ab890 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280b1f40 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887674537  5173444.084.129748  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82aac20 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280bbc80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887720637  5173444.107.934431  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82abe00 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280b9cc0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887757552  5173444.128.246408  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f4fcf533a10 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887764486  5173444.132.524255  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82b13a0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280fe190 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887828355  5173444.166.917860  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82b18f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887890598  5173444.204.852884  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f4fcf5349b0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887910992  5173444.212.615441  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c81905b0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887943798  5173444.234.687203  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f54440462a0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f4fcf541dd0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887964267  5173444.242.609706  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82b5ba0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280d0490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887983586  5173444.256.231076  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8193880 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f53280f3d80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887991810  5173444.261.434945  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f4fcf541ed0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10888012632  5173444.271.214912  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8193230 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280bbc80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888125417  5173444.333.974436  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82b4570 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280d0cf0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888160942  5173444.352.829382  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8194eb0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280beca0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888167890  5173444.357.330193  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8195400 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888204948  5173444.380.300912  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f4fcf534ab0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10888211974  5173444.384.210452  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c81997c0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280db4b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888232770  5173444.398.199602  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c818ff70 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280db960 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888246849  5173444.406.242343  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8196f90 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888268064  5173444.419.960327  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8194eb0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888282142  5173444.428.621028  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8190b00 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280d0490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888345805  5173444.464.292384  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82ab6f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888381136  5173444.484.857104  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8195960 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280dbe10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888409549  5173444.500.372651  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8191050 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f53280e6490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888452211  5173444.525.024454  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c81997c0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280beca0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888494599  5173444.545.717364  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c819efa0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280db4b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888508623  5173444.555.395676  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8191050 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280db960 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0

Note: all endpoints are hax' ones.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

Some more info with all received and sent connection requests and replies from confd-m0trace.64072-02.txt.zip:

$ 7z x -so confd-m0trace.64072-02.txt.zip | grep -E 'rcv_conn_(init|terminate)   >|CONN.*o2ib|item_received.*/3[23]'
10887604486  5173444.038.247032  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f4fcf516420 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887612478  5173444.045.111881  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f4fcf52cc70 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887638390  5173444.064.256650  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82aa6c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.27@o2ib:12345:1:1, oneway = 0
10887647746  5173444.067.289113  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280b1b90, ep_addr: 172.18.1.27@o2ib:12345:1:1, machine: 0x373cc70
10887647807  5173444.067.489217  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82ab890 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280b1f40 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887673868  5173444.083.536832  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82adfe0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.24@o2ib:12345:1:1, oneway = 0
10887674326  5173444.083.917592  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280bb8d0, ep_addr: 172.18.1.24@o2ib:12345:1:1, machine: 0x373cc70
10887674537  5173444.084.129748  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82aac20 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280bbc80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887704924  5173444.099.499727  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82af0c0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.6@o2ib:12345:1:1, oneway = 0
10887720464  5173444.107.412968  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82acf00[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.26@o2ib:12345:1:1, oneway = 0
10887720576  5173444.107.760271  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280b9910, ep_addr: 172.18.1.26@o2ib:12345:1:1, machine: 0x373cc70
10887720637  5173444.107.934431  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82abe00 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280b9cc0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887727469  5173444.111.543170  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82ac350[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.39@o2ib:12345:1:1, oneway = 0
10887750669  5173444.123.847804  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82b0800[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.1@o2ib:12345:1:1, oneway = 0
10887757552  5173444.128.246408  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f4fcf533a10 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887764414  5173444.132.305505  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280c4100, ep_addr: 172.18.1.1@o2ib:12345:1:1, machine: 0x373cc70
10887764486  5173444.132.524255  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82b13a0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280fe190 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887828187  5173444.166.480754  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82acf00[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.6@o2ib:12345:1:1, oneway = 0
10887828303  5173444.166.732232  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280c44b0, ep_addr: 172.18.1.6@o2ib:12345:1:1, machine: 0x373cc70
10887828355  5173444.166.917860  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82b18f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887849393  5173444.176.540900  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82b28e0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.3@o2ib:12345:1:1, oneway = 0
10887890598  5173444.204.852884  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444044420 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f4fcf5349b0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887910250  5173444.212.112813  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82b5ba0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.39@o2ib:12345:1:1, oneway = 0
10887910745  5173444.212.391336  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280c4960, ep_addr: 172.18.1.39@o2ib:12345:1:1, machine: 0x373cc70
10887910992  5173444.212.615441  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c81905b0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887943798  5173444.234.687203  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f54440462a0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f4fcf541dd0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10887963635  5173444.242.045450  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c82b28e0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.3@o2ib:12345:1:1, oneway = 0
10887964052  5173444.242.402716  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280d00e0, ep_addr: 172.18.1.3@o2ib:12345:1:1, machine: 0x373cc70
10887964267  5173444.242.609706  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82b5ba0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280d0490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887976610  5173444.253.345025  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c8192ce0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.2@o2ib:12345:1:1, oneway = 0
10887983549  5173444.256.059548  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280d0590, ep_addr: 172.18.1.2@o2ib:12345:1:1, machine: 0x373cc70
10887983586  5173444.256.231076  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8193880 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f53280f3d80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10887991810  5173444.261.434945  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f4fcf541ed0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10888012561  5173444.271.141062  7f530d7f9c00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f53280bb8d0
10888012632  5173444.271.214912  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8193230 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280bbc80 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888026384  5173444.278.121497  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c818ff70[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.27@o2ib:12345:1:1, oneway = 0
10888068782  5173444.302.429270  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c81915a0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.26@o2ib:12345:1:1, oneway = 0
10888103989  5173444.322.070626  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c8193cd0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.1@o2ib:12345:1:1, oneway = 0
10888125280  5173444.333.447431  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c8194960[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.24@o2ib:12345:1:1, oneway = 0
10888125380  5173444.333.763813  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280d0940, ep_addr: 172.18.1.24@o2ib:12345:1:1, machine: 0x373cc70
10888125417  5173444.333.974436  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82b4570 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.24@o2ib:12345:1:1 ri_session=0x7f53280d0cf0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888160774  5173444.352.285776  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c8195400[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.27@o2ib:12345:1:1, oneway = 0
10888160880  5173444.352.646059  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280be8f0, ep_addr: 172.18.1.27@o2ib:12345:1:1, machine: 0x373cc70
10888160942  5173444.352.829382  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8194eb0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280beca0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888167881  5173444.357.260235  7f530d7f9c00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f53280c44b0
10888167890  5173444.357.330193  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8195400 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888203721  5173444.377.734297  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c8198c20[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.1@o2ib:12345:1:1, oneway = 0
10888203867  5173444.379.041491  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280db100, ep_addr: 172.18.1.1@o2ib:12345:1:1, machine: 0x373cc70
10888204948  5173444.380.300912  7f541e7a39c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f5444043ad0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f4fcf534ab0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
10888211974  5173444.384.210452  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c81997c0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280db4b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888232609  5173444.397.588544  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c8198c20[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.26@o2ib:12345:1:1, oneway = 0
10888232730  5173444.398.031616  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280db5b0, ep_addr: 172.18.1.26@o2ib:12345:1:1, machine: 0x373cc70
10888232770  5173444.398.199602  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c818ff70 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280db960 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888246839  5173444.406.172413  7f530d7f9c00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f53280c4960
10888246849  5173444.406.242343  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8196f90 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888267896  5173444.419.356997  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c8195400[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.6@o2ib:12345:1:1, oneway = 0
10888268004  5173444.419.777975  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280c44b0, ep_addr: 172.18.1.6@o2ib:12345:1:1, machine: 0x373cc70
10888268064  5173444.419.960327  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8194eb0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.6@o2ib:12345:1:1 ri_session=0x7f53280c4860 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888282118  5173444.428.548235  7f530d7f9c00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f53280d00e0
10888282142  5173444.428.621028  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8190b00 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280d0490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888303307  5173444.439.786088  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c819a260[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 172.18.1.2@o2ib:12345:1:1, oneway = 0
10888345668  5173444.463.794398  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c819b430[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.39@o2ib:12345:1:1, oneway = 0
10888345765  5173444.464.129585  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280c4960, ep_addr: 172.18.1.39@o2ib:12345:1:1, machine: 0x373cc70
10888345805  5173444.464.292384  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c82ab6f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.39@o2ib:12345:1:1 ri_session=0x7f53280c4d10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888380951  5173444.484.264511  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c8190b00[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.3@o2ib:12345:1:1, oneway = 0
10888381053  5173444.484.706690  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280dba60, ep_addr: 172.18.1.3@o2ib:12345:1:1, machine: 0x373cc70
10888381136  5173444.484.857104  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8195960 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.3@o2ib:12345:1:1 ri_session=0x7f53280dbe10 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888409412  5173444.499.755014  7f50cdecf8e0  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x373cc70, item: 0x7f50c819bed0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 172.18.1.2@o2ib:12345:1:1, oneway = 0
10888409509  5173444.500.237669  7f532fffeba0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f53280e60e0, ep_addr: 172.18.1.2@o2ib:12345:1:1, machine: 0x373cc70
10888409549  5173444.500.372651  7f532fffeab0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8191050 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=172.18.1.2@o2ib:12345:1:1 ri_session=0x7f53280e6490 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888452202  5173444.524.959438  7f530d7f9c00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f53280be8f0
10888452211  5173444.525.024454  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c81997c0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.27@o2ib:12345:1:1 ri_session=0x7f53280beca0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888494590  5173444.545.649891  7f530d7f9c00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f53280db100
10888494599  5173444.545.717364  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c819efa0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.1@o2ib:12345:1:1 ri_session=0x7f53280db4b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
10888508614  5173444.555.328896  7f530d7f9c00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f53280db5b0
10888508623  5173444.555.395676  7f530d7f9ae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f50c8191050 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=172.18.1.26@o2ib:12345:1:1 ri_session=0x7f53280db960 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0

Note: some connections do terminate (see at the end).

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

Here are the m0trace logs from a simple singlenode configuration (easier to analyse): confd-m0trace.14465.txt.zip, hax-m0trace.14071.txt.zip.

From the preliminary analysis, it looks like new rpc connections are made by hax process to confd on each rconfc_start (see in hax-m0trace.14071.txt.zip), but they are not terminated on rconfc_stop.

from cortx-motr.

mssawant avatar mssawant commented on July 27, 2024

@andriytk, do you have hax journald logs? I wonder if these are related to HA notifications.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

@mssawant here you are: sage-hax-syslog.txt.

BTW: note how long it took just to start rconfc:

Dec 17 11:46:17 sage-tier1-1 hare-hax[64056]: 2020-12-17 11:46:17,724 [DEBUG] {fs-stats-updater} Starting rconfc
...
Dec 17 11:47:19 sage-tier1-1 hare-hax[64056]: 2020-12-17 11:47:19,842 [DEBUG] {fs-stats-updater} rconfc started

More than 1 minute!

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

But when it turned out that a long-running rconfc makes problems at Motr level...

@knekrasov can you recall by any chance - what the problem was it?

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

Let's look more precisely on what happens with connections during fs-stats fetching in hax-m0trace.14071.txt.zip:

rconfc starts and queries hax (itself) for the entrypoint:

$ 7z x -so hax-m0trace.14071.txt.zip | grep -E 'conn_(init|terminate)   >|CONN.*:12345:|item_received.*/3[23]|rconfc_(start|stop)'
...
202218  42990.745.189295  7fb5849cd7b0  CALL    spiel/spiel.c:72:m0_spiel_rconfc_start   > 
202282  42990.745.702553  7fb5849cd720  CALL    conf/rconfc.c:2908:m0_rconfc_start_wait   > rconfc = 0x20b8160, profile = <7200000000000001:6>
202283  42990.745.703749  7fb5849cd670  CALL    conf/rconfc.c:2890:m0_rconfc_start   > rconfc = 0x20b8160, profile = <7200000000000001:6>
202286  42990.745.729227  7fb5849cd670  CALL    conf/rconfc.c:2898:m0_rconfc_start   < rc=0
202288  42990.745.734798  7fb5849cd5b0  CALL    conf/rconfc.c:2014:rconfc_start_ast_cb   > 
202289  42990.745.735780  7fb5849cd370  CALL    conf/rconfc.c:1982:rconfc_start   > 
202291  42990.745.741137  7fb5849cd370  DEBUG   conf/rconfc.c:1988:rconfc_start   Querying ENTRYPOINT...
202300  42990.745.850102  7fb5849cd370  CALL    conf/rconfc.c:2006:rconfc_start   < rc=0 hae_control=2 is_failed=false entrypoint_retries=0
202301  42990.745.851756  7fb5849cd5b0  CALL    conf/rconfc.c:2016:rconfc_start_ast_cb   < 
202395  42990.749.584679  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x23ec240 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
202620  42990.751.468573  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb578020b90[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
202672  42990.751.771651  7fb5b90b0cb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7fb5ac061e00, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x20b3670
202709  42990.751.890780  7fb5b90b0bc0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5780383f0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7fb5ac048660 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
202946  42990.754.491542  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb5780383f0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
205055  42991.266.572183  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x23ec240 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
205335  42991.268.219042  7fb5ab7fdd00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7fb5ac061e00
205344  42991.268.245554  7fb5ab7fdbe0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb578020b90 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7fb5ac048660 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
205754  42991.270.591204  7fb5b88afd70  CALL    conf/rconfc.c:2014:rconfc_start_ast_cb   > 
205755  42991.270.592186  7fb5b88afb30  CALL    conf/rconfc.c:1982:rconfc_start   > 
205757  42991.270.610338  7fb5b88afb30  DEBUG   conf/rconfc.c:1993:rconfc_start   ENTRYPOINT ready...

Got entrypoint, connects to confd:

205758  42991.270.610746  7fb5b88afb30  CALL    conf/rconfc.c:1958:rconfc_start_internal   > 
205863  42991.271.451757  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b405d4b0 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
206089  42991.275.119529  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb5780383f0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
206522  42991.280.185454  7fb5b88afb30  CALL    conf/rconfc.c:1971:rconfc_start_internal   < rc=0
206523  42991.280.188604  7fb5b88afb30  CALL    conf/rconfc.c:2006:rconfc_start   < rc=0 hae_control=2 is_failed=false entrypoint_retries=0
206524  42991.280.189930  7fb5b88afd70  CALL    conf/rconfc.c:2016:rconfc_start_ast_cb   < 
208571  42991.908.523648  7fb5b88af860  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5b405fd60 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b404be20 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
209312  42991.912.326503  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb578040ae0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
210383  42991.923.503353  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb578020b90[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
210470  42991.925.299777  7fb5b90b0cb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7fb5ac061e00, ep_addr: 192.168.180.153@tcp:12345:2:1, machine: 0x20b3670
210517  42991.925.557938  7fb5b90b0bc0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5780482d0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5ac0488f0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
211812  42991.934.156981  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4060420 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
212040  42991.941.190429  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb578020b90[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
212772  42991.954.691164  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4060420 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
213183  42991.959.929918  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b405c580 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
213412  42991.963.468575  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb578020b90[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
235277  42992.196.447185  7fb5849cd720  CALL    conf/rconfc.c:2934:m0_rconfc_start_wait   < rc=0
235278  42992.196.449106  7fb5849cd7b0  CALL    spiel/spiel.c:87:m0_spiel_rconfc_start   < rc=0

Got the configuration from confd, rconfc is started.

So we've sent 4 connection establish requests to confd and only 1 connection terminate request so far.

Now we connect to ios (to get fs stats):

239026  42992.530.334986  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:2 ri_session=0x7fb574048ac0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
239534  42992.533.973605  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb578049d60[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:2, oneway = 0
241006  42992.551.071387  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:2 ri_session=0x7fb574048ac0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0

We've got the stats, terminated the connection with ios and now stop rconfc with sending 3 connection termination requests to confd:

241349  42992.555.873956  7fb5849cd8a0  CALL    spiel/spiel.c:93:m0_spiel_rconfc_stop   > 
241350  42992.555.875431  7fb5849cd860  CALL    conf/rconfc.c:2956:m0_rconfc_stop_sync   > rconfc = 0x20b8160
241351  42992.555.876175  7fb5849cd820  CALL    conf/rconfc.c:2939:m0_rconfc_stop   > rconfc 0x20b8160
241353  42992.555.896186  7fb5849cd820  CALL    conf/rconfc.c:2951:m0_rconfc_stop   < 
241355  42992.555.899800  7fb5849cd6e0  CALL    conf/rconfc.c:2346:rconfc_stop_ast_cb   > rconfc = 0x20b8160
241356  42992.555.900359  7fb5849cd670  CALL    conf/rconfc.c:2317:rconfc_stop_internal   > rconfc = 0x20b8160
241731  42992.562.096228  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b405d4b0 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
242948  42992.573.030791  7fb5849cd2e0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5740478f0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b404be20 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
243278  42992.576.067319  7fb5849cd670  CALL    conf/rconfc.c:2339:rconfc_stop_internal   < 
243279  42992.576.068375  7fb5849cd6e0  CALL    conf/rconfc.c:2352:rconfc_stop_ast_cb   < 
244044  42992.576.647521  7fb5849cd860  CALL    conf/rconfc.c:2965:m0_rconfc_stop_sync   < 
244420  42992.589.076548  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b405c580 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
244794  42992.598.139163  7fb5849cd8a0  CALL    spiel/spiel.c:96:m0_spiel_rconfc_stop   < 

So looks like all connections to confd from hax are terminated. But there is one connection from confd to hax - 0x7fb5ac061e00 - which is not terminated.

Here is the next iteration of fs stats fetch where all the above steps are repeated:

244831  43079.667.115084  7fb5849cd7b0  CALL    spiel/spiel.c:72:m0_spiel_rconfc_start   > 
244898  43079.667.661081  7fb5849cd720  CALL    conf/rconfc.c:2908:m0_rconfc_start_wait   > rconfc = 0x20b8160, profile = <7200000000000001:6>
244899  43079.667.661744  7fb5849cd670  CALL    conf/rconfc.c:2890:m0_rconfc_start   > rconfc = 0x20b8160, profile = <7200000000000001:6>
244902  43079.667.684638  7fb5849cd670  CALL    conf/rconfc.c:2898:m0_rconfc_start   < rc=0
244904  43079.667.690194  7fb5849cd5b0  CALL    conf/rconfc.c:2014:rconfc_start_ast_cb   > 
244905  43079.667.691580  7fb5849cd370  CALL    conf/rconfc.c:1982:rconfc_start   > 
244907  43079.667.697005  7fb5849cd370  DEBUG   conf/rconfc.c:1988:rconfc_start   Querying ENTRYPOINT...
244913  43079.667.795861  7fb5849cd370  CALL    conf/rconfc.c:2006:rconfc_start   < rc=0 hae_control=2 is_failed=false entrypoint_retries=0
244914  43079.667.797122  7fb5849cd5b0  CALL    conf/rconfc.c:2016:rconfc_start_ast_cb   < 
245008  43079.669.270664  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x23ec240 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
245233  43079.671.588732  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb578049d60[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
245285  43079.672.078852  7fb5b90b0cb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7fb5ac0822d0, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x20b3670
245322  43079.672.234060  7fb5b90b0bc0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5780492c0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7fb5ac082680 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
245560  43079.674.292506  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb5780492c0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
247668  43080.152.347547  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x23ec240 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
247948  43080.154.133262  7fb5ab7fdd00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7fb5ac0822d0
247957  43080.154.175477  7fb5ab7fdbe0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb57804a3a0 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7fb5ac082680 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
248379  43080.160.570193  7fb5b88afd70  CALL    conf/rconfc.c:2014:rconfc_start_ast_cb   > 
248380  43080.160.571246  7fb5b88afb30  CALL    conf/rconfc.c:1982:rconfc_start   > 
248382  43080.160.596849  7fb5b88afb30  DEBUG   conf/rconfc.c:1993:rconfc_start   ENTRYPOINT ready...


248383  43080.160.597438  7fb5b88afb30  CALL    conf/rconfc.c:1958:rconfc_start_internal   > 
248488  43080.161.852568  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4043b40 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
248714  43080.168.752491  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb5780492c0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
249147  43080.173.337323  7fb5b88afb30  CALL    conf/rconfc.c:1971:rconfc_start_internal   < rc=0
249148  43080.173.338958  7fb5b88afb30  CALL    conf/rconfc.c:2006:rconfc_start   < rc=0 hae_control=2 is_failed=false entrypoint_retries=0
249149  43080.173.339886  7fb5b88afd70  CALL    conf/rconfc.c:2016:rconfc_start_ast_cb   < 
251184  43080.805.585857  7fb5849cd0c0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5740478f0 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb574048ac0 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
251912  43080.809.353664  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb57804ad50[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
253007  43080.822.430385  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb57804a800[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
253059  43080.823.321014  7fb5b90b0cb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7fb5ac0822d0, ep_addr: 192.168.180.153@tcp:12345:2:1, machine: 0x20b3670
253110  43080.823.456740  7fb5b90b0bc0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb57804b3a0 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5ac082680 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
254437  43080.830.687670  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4063670 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
254665  43080.833.652895  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb57804a800[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
255397  43080.840.917944  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4063670 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
255808  43080.845.591560  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4063670 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
256036  43080.848.477597  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb57804a800[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:1, oneway = 0
277890  43081.067.401059  7fb5849cd720  CALL    conf/rconfc.c:2934:m0_rconfc_start_wait   < rc=0
277891  43081.067.402329  7fb5849cd7b0  CALL    spiel/spiel.c:87:m0_spiel_rconfc_start   < rc=0


281833  43081.428.335787  7fb5ba0b2aa0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:2:2 ri_session=0x7fb574048bc0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
282217  43081.432.311942  7fb5861d1970  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x20b3670, item: 0x7fb57804c8d0[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:2:2, oneway = 0
283631  43081.447.470724  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:2 ri_session=0x7fb574048bc0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0


283974  43081.452.047354  7fb5849cd8a0  CALL    spiel/spiel.c:93:m0_spiel_rconfc_stop   > 
283975  43081.452.049350  7fb5849cd860  CALL    conf/rconfc.c:2956:m0_rconfc_stop_sync   > rconfc = 0x20b8160
283976  43081.452.050079  7fb5849cd820  CALL    conf/rconfc.c:2939:m0_rconfc_stop   > rconfc 0x20b8160
283978  43081.452.068318  7fb5849cd820  CALL    conf/rconfc.c:2951:m0_rconfc_stop   < 
283980  43081.452.071738  7fb5849cd6e0  CALL    conf/rconfc.c:2346:rconfc_stop_ast_cb   > rconfc = 0x20b8160
283981  43081.452.072514  7fb5849cd670  CALL    conf/rconfc.c:2317:rconfc_stop_internal   > rconfc = 0x20b8160
284344  43081.456.381947  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4043b40 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
285573  43081.465.962627  7fb5849cd2e0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5740478f0 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb574048ac0 ri_nr_sent_max=2 ri_deadline=0 ri_nr_sent=0
285903  43081.469.257972  7fb5849cd670  CALL    conf/rconfc.c:2339:rconfc_stop_internal   < 
285904  43081.469.258671  7fb5849cd6e0  CALL    conf/rconfc.c:2352:rconfc_stop_ast_cb   < 
286669  43081.470.115579  7fb5849cd860  CALL    conf/rconfc.c:2965:m0_rconfc_stop_sync   < 
287045  43081.479.112607  7fb5ba0b2a80  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7fb5a4044520 [REQUEST/M0_RPC_CONN_TERMINATE_OPCODE(38)] dest_ep=192.168.180.153@tcp:12345:2:1 ri_session=0x7fb5b4063670 ri_nr_sent_max=1 ri_deadline=0 ri_nr_sent=0
287419  43081.482.272063  7fb5849cd8a0  CALL    spiel/spiel.c:96:m0_spiel_rconfc_stop   < 

Again we can see that the receiving connection 0x7fb5ac0822d0 is not terminated.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

Now let's see what was happening at confd side at the same time-frame in confd-m0trace.14465.txt.zip:

Here we see 4 incoming connection requests from hax with one of them terminated and 1 outgoing connection request to hax:

$ 7z x -so confd-m0trace.14465.txt.zip | grep -E 'rcv_conn_(init|terminate)   >|CONN.*:12345:|item_received.*/3[23]|rconfc_(start|stop)'
...
179109  42991.272.681309  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f200800f830[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
179167  42991.273.485286  7f203568ebb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f2030096940, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
179204  42991.273.586025  7f203568eac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096cf0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
179771  42991.909.697646  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f200800f830[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
179823  42991.910.320173  7f203568ebb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f20300d5570, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
179860  42991.910.458264  7f203568eac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f203009a600 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
180686  42991.921.885390  7f20366909a0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f202c07ec20 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f203009a4b0 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
181443  42991.926.975026  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f200800a150[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
182334  42991.937.299069  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f20080246c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
182386  42991.937.873888  7f203568ebb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f20300d5920, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
182423  42991.938.140708  7f203568eac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300e03f0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
183371  42991.956.717151  7f20277fdc00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f20300d5920
183380  42991.956.766442  7f20277fdae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2008025d60 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300e03f0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
183692  42991.961.501955  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f20080246c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
183744  42991.961.936256  7f203568ebb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f20300d5920, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
183781  42991.962.102002  7f203568eac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5cd0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0

Now we see that 3 incoming connections got terminated:

202579  42992.563.574984  7f20277fdc00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f2030096940
202588  42992.563.601000  7f20277fdae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2008024d00 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096cf0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
203661  42992.574.334644  7f20277fdc00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f20300d5570
203670  42992.574.362695  7f20277fdae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800f910 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f203009a600 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
205046  42992.594.659369  7f20277fdc00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f20300d5920
205055  42992.594.738954  7f20277fdae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2008024220 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5cd0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0

But the outgoing connection (sent at 42991.921.885390) was not terminated.

Here is the next iteration where everything repeats again:

205454  43080.164.138594  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f200800f830[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
205512  43080.165.241444  7f203568ebb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f20300e0120, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
205549  43080.165.405545  7f203568eac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096940 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
206118  43080.806.794370  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f200800f830[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
206170  43080.807.651058  7f203568ebb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f20300d5500, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
206207  43080.807.760646  7f203568eac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d58b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
207028  43080.821.351921  7f20366909a0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f202c07c120 [REQUEST/M0_RPC_CONN_ESTABLISH_OPCODE(32)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096a40 ri_nr_sent_max=4 ri_deadline=0 ri_nr_sent=0
207790  43080.824.828174  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f200800a150[REPLY/33] size 132 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
208687  43080.831.971863  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f20080246c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
208739  43080.832.390467  7f203568ebb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f20300d5b50, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
208776  43080.832.533118  7f203568eac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5f00 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
209730  43080.842.415421  7f20277fdc00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f20300d5b50
209739  43080.842.460607  7f20277fdae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2008025d60 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5f00 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
210052  43080.846.833923  7f200fffe870  CALL    rpc/rpc_machine.c:874:item_received   > machine: 0x236eaf0, item: 0x7f20080246c0[REQUEST/32] size 121 xid=18446744073709551615 from ep_addr: 192.168.180.153@tcp:12345:1:1, oneway = 0
210104  43080.847.246842  7f203568ebb0  CALL    rpc/conn.c:533:m0_rpc_rcv_conn_init   > conn: 0x7f20300d5b50, ep_addr: 192.168.180.153@tcp:12345:1:1, machine: 0x236eaf0
210141  43080.847.351301  7f203568eac0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800a150 [REPLY/M0_RPC_CONN_ESTABLISH_REP_OPCODE(33)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5f00 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0


228983  43081.457.734384  7f20277fdc00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f20300e0120
228992  43081.457.762991  7f20277fdae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2008024d00 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f2030096940 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
230068  43081.467.199184  7f20277fdc00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f20300d5500
230077  43081.467.224764  7f20277fdae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f200800f910 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d58b0 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0
231455  43081.480.454485  7f20277fdc00  CALL    rpc/conn.c:1219:m0_rpc_rcv_conn_terminate   > conn: 0x7f20300d5b50
231464  43081.480.480612  7f20277fdae0  CALL    rpc/item.c:1139:m0_rpc_item_send   > 0x7f2008024220 [REPLY/M0_RPC_CONN_TERMINATE_REP_OPCODE(39)] dest_ep=192.168.180.153@tcp:12345:1:1 ri_session=0x7f20300d5f00 ri_nr_sent_max=18446744073709551615 ri_deadline=0 ri_nr_sent=0

So looks like on each fs-stats iteration the outgoing connection from confd to hax is leaked.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

This is the leaked connection (from confd-m0trace.14465.txt.zip):

180517  42991.920.514769  7f203568ea40  CALL    rm/rm_foms.c:365:m0_rm_reverse_session_get   > remote incoming 0x7f2008024cd0 remote 0x7f20300dff80
180518  42991.920.519680  7f203568e9c0  CALL    rpc/service.c:174:m0_rpc_service_reverse_session_get   > 
180519  42991.920.520691  7f203568e970  CALL    lib/memory.c:129:m0_alloc   > size=3160
180520  42991.920.524965  7f203568e970  CALL    lib/memory.c:140:m0_alloc   < ptr=0x7f2030104e40 size=3160
180521  42991.920.527077  7f203568e960  CALL    rpc/link.c:611:m0_rpc_link_init   > rlink=0x7f2030104e40 ep=192.168.180.153@tcp:12345:1:1
180522  42991.920.531803  7f203568e7a0  CALL    net/lnet/lnet_core.c:396:nlx_core_ep_addr_decode   > ep_addr=192.168.180.153@tcp:12345:1:1
180523  42991.920.571764  7f203568e760  CALL    net/lnet/ulnet_core.c:559:nlx_ucore_ioctl   < rc=0
180524  42991.920.577480  7f203568e900  CALL    net/ep.c:84:m0_net_end_point_create   < rc=0
180525  42991.920.578947  7f203568e8d0  CALL    rpc/conn.c:303:m0_rpc_conn_init   > conn=0x7f2030104e40, svc_fid=(nil) ep=192.168.180.153@t
cp:12345:1:1

And later we can see it in this list (compare with the issue description at the top):

211306  43080.860.010893  7f2036690720  CALL    rpc/formation2.c:580:frm_fill_packet_from_item_sources   > 
211307  43080.860.019915  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x7f203011baf0
211308  43080.860.020275  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x7f2030104e40
211309  43080.860.020602  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x7f20300d3090
211310  43080.860.020895  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x7f20300d62c0
211311  43080.860.021194  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x7f2030093b70
211312  43080.860.021336  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x7f203009abc0
211313  43080.860.021638  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x2460dd0
211314  43080.860.021916  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x2459d10
211315  43080.860.022200  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x2452c50
211316  43080.860.022296  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x244bb90
211317  43080.860.022574  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x2444ad0
211318  43080.860.022667  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x243d8a0
211319  43080.860.023044  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x2435aa0
211320  43080.860.023320  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x23884b0
211321  43080.860.023676  7f2036690720  DEBUG   rpc/formation2.c:585:frm_fill_packet_from_item_sources   conn: 0x7ffeb6106628
211322  43080.860.023842  7f2036690720  CALL    rpc/formation2.c:623:frm_fill_packet_from_item_sources   < 

The list is full of leaked reverse connections which only grows over time.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

After discussion with @just-now, the following was found out: looks like the reverse session/connection is established so that RM manager is able to revoke the RM-read-lock taken by each rconfc client on the configuration. (This is needed to update the configuration among all processes in the cluster, whenever it happens: RM-read-lock is revoked and rconfc in each Motr process in the cluster would abandon the old configuration and try to get the new one.)

Now, looks like this reverse session/connection is not cleaned up in the main confd process whenever rconfc stops in another Motr processes. These connections are getting accumulated over time, wasting memory and CPU cycles at frm_fill_packet_from_item_sources(). This is a bug and must be fixed.

I assume the problem exists also in LDR-R1. It's just would take a few weeks to show up as there are only 2 nodes. (While it takes only 1 day to show up on SAGE-cluster with 22 nodes.)

cc @truptiatseagate, @huanghua78.

Background info: on each node hax process periodically (once a minute) fetches fs-statistics. On each such fetch it starts and stops rconfc. So the reverse sessions/connectios are established in confd on each such iteration from each node.

from cortx-motr.

mssawant avatar mssawant commented on July 27, 2024

@knekrasov can you recall by any chance - what the problem was it?

@andriytk, if rconfc lives along with the hax the hax shutdown fails because confd's are stopped before hax and then hax is not able to shutdown cleanly and fails with timeout.

from cortx-motr.

mssawant avatar mssawant commented on July 27, 2024

@andriytk, I would suggest to create a Jira ticket for this.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

if rconfc lives along with the hax the hax shutdown fails because confd's are stopped before hax...

@mssawant yes, that's what I have already learned today (thanks to @knekrasov). But it is not a solution to stop rconfc and start rconfc periodically as fs-stats is fetched. Because the shutdown may happen at any time - even during the fs-stats fetching when rconfc is started and not stopped yet. And on relatively big clusters (like at SAGE where we have 22 nodes) it takes about 1-minute only to start rconfc! (See my comments above at #298 (comment) and at #298 (comment).) So it's definitely not a solution. We just reduce the probability of the problem appearance or, in another word - we are just hiding the bug instead of fixing it.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

to create a Jira ticket for this

Looks like we have two different problems here:

  1. Connection/memory leakage at confd - Motr issue (this one).
  2. Periodic start/stop of rconfc in hax - Hare issue.

I'd better suggest to create another GitHub issue for Hare. (Jira is not publicly visible and GitHub is much more convenient. As @johnbent advised, let's discuss our technical issues publicly.)

from cortx-motr.

huanghua78 avatar huanghua78 commented on July 27, 2024

After discussion with @just-now, the following was found out: looks like the reverse session/connection is established so that RM manager is able to revoke the RM-read-lock taken by each rconfc client on the configuration. (This is needed to update the configuration among all processes in the cluster, whenever it happens: RM-read-lock is revoked and rconfc in each Motr process in the cluster would abandon the old configuration and try to get the new one.)

Hello @andriytk , @mssawant @nikitadanilov , this RM-read-lock taken by each rconfc client on the configuration will cause problems.
If some clients, die accidentally, without proper cleanup. So, the RM-read-lock can not be revoked. (Motr RM lock is not lease based). In this case, the confd can not get an exclusive write lock to update the configuration?
For example, node addition, pool status update, all need to update configuration.

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

If some clients, die accidentally...

@huanghua78 yes, this is another scenario that must be handled properly.

BTW, I just checked in the code (

rc = rfom_debtor_subscribe(rfom, debtor);
) and compared with the log (confd-m0trace.14465.txt.zip):

180578  42991.921.022896  7f203568ea90  CALL    rm/rm_foms.c:544:rfom_debtor_subscribe   < rc=0
180579  42991.921.023363  7f203568ea90  CALL    rm/rm_foms.c:595:request_pre_process   < rc=1

It looks like we do subscribe the debtor for HA-notification. So when the process stops or crashes and M0_NC_FAILED ha-state is received - it must be cleaned up. But the problem in our case is that hax process does not stop or crash - it just stops and starts rconfc periodically. And each time rconfc starts - it creates the new debtor at RM-manager service (working inside leading confd process), but it the old one is not cleaned up on rconfc stop. Although M0_RM_FOP_CANCEL is sent and received:

202898  42992.567.231871  7f200fffe640  CALL    rpc/item.c:1223:m0_rpc_item_received   > 0x7f200800a150 [REQUEST/M0_RM_FOP_CANCEL(92)] xid=2 machine=0x236eaf0

I wonder why the debtor is not cleaned up on this cancel fop...

from cortx-motr.

andriytk avatar andriytk commented on July 27, 2024

The workaround has been landed into Hare - Seagate/cortx-hare#1437.

from cortx-motr.

stale avatar stale commented on July 27, 2024

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 4 days. Tagging @huanghua78 @mukundkanekar for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

from cortx-motr.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.