multipath-tcp / mptcp_net-next

Development version of the Upstream MultiPath TCP Linux kernel 🐧

Home Page: https://mptcp.dev

License: Other

Makefile 0.20% C 98.33% Assembly 0.73% C++ 0.01% Shell 0.37% Perl 0.10% Awk 0.01% Python 0.21% UnrealScript 0.01% Clojure 0.01% Yacc 0.01% Lex 0.01% Roff 0.01% Gherkin 0.01% XS 0.01% M4 0.01% sed 0.01% SmPL 0.01% Raku 0.01% MATLAB 0.01%
c linux linux-kernel mptcp upstream

mptcp_net-next's Issues

fix covscan issues

Covscan reported an issue in subflow_finish_connect(): the NULL check on skb is flagged as useless, because skb was already dereferenced earlier, without any such check, when mptcp_get_options() was invoked.

*** CID 1463341:    (REVERSE_INULL)
/net/mptcp/subflow.c: 265 in subflow_finish_connect()
259     
260             if (subflow->mp_capable) {
261                     pr_debug("subflow=%p, remote_key=%llu", mptcp_subflow_ctx(sk),
262                              subflow->remote_key);
263                     mptcp_finish_connect(sk);
264     
>>>     CID 1463341:    (REVERSE_INULL)
>>>     Null-checking "skb" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
265                     if (skb) {
266                             pr_debug("synack seq=%u", TCP_SKB_CB(skb)->seq);
267                             subflow->ssn_offset = TCP_SKB_CB(skb)->seq;
268                     }
269             } else if (subflow->mp_join) {
270                     pr_debug("subflow=%p, thmac=%llu, remote_nonce=%u",
/net/mptcp/subflow.c: 284 in subflow_finish_connect()
278     
279                     subflow_generate_hmac(subflow->local_key, subflow->remote_key,
280                                           subflow->local_nonce,
281                                           subflow->remote_nonce,
282                                           subflow->hmac);
283     
>>>     CID 1463341:    (REVERSE_INULL)
>>>     Null-checking "skb" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
284                     if (skb)
285                             subflow->ssn_offset = TCP_SKB_CB(skb)->seq;
286     
287                     if (!mptcp_finish_join(sk))
288                             goto do_reset;
289     
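
The pattern behind a REVERSE_INULL report can be shown outside the kernel. The sketch below is a user-space illustration only: `struct pkt`, `struct flow` and the three functions are hypothetical stand-ins, not the code in net/mptcp/subflow.c. It shows the flagged shape and the two usual ways to make the NULL handling consistent; which of the two fits subflow_finish_connect() depends on whether skb can really be NULL there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel objects named in the report. */
struct pkt {
    uint32_t seq;
};

struct flow {
    uint32_t ssn_offset;
    int options_parsed;
};

/* Flagged shape: p is dereferenced first, so the later "if (p)" can never
 * be false on any path that reaches it. */
void finish_flagged(struct flow *f, const struct pkt *p)
{
    f->options_parsed = (p->seq != 0);  /* unconditional dereference */
    if (p)                              /* check the analyzer calls useless */
        f->ssn_offset = p->seq;
}

/* Option 1: the caller guarantees p is non-NULL, so the check is dropped. */
void finish_nonnull(struct flow *f, const struct pkt *p)
{
    f->options_parsed = (p->seq != 0);
    f->ssn_offset = p->seq;
}

/* Option 2: p may legitimately be NULL, so every dereference is guarded. */
int finish_nullable(struct flow *f, const struct pkt *p)
{
    if (!p)
        return -1;
    f->options_parsed = (p->seq != 0);
    f->ssn_offset = p->seq;
    return 0;
}
```

Either option silences the checker; the difference is whether the function's contract allows a NULL argument at all.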

mptcp_connect: Connection reset by peer during a test

My CI detected this error:

00:07:45.740 # INFO: Using loss of 0.72% delay 157 ms reorder 94% 59% on ns3eth4
00:07:45.749 # ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration   196ms) [ OK ]
00:07:45.990 # ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP  	(duration    34ms) [ OK ]
00:07:46.064 # ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP	(duration    33ms) [ OK ]
00:07:46.140 # ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP	(duration   231ms) [ OK ]
00:07:46.411 # ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP  	(duration    35ms) [ OK ]
00:07:46.496 # ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP	(duration    33ms) [ OK ]
00:07:46.575 # ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP	(duration  5784ms) [ OK ]
00:07:52.400 # ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP	(duration  5426ms) [ OK ]
00:07:57.866 # ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP	(duration    41ms) [ OK ]
00:07:57.946 # ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP	(duration  5664ms) [ OK ]
00:08:03.654 # ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP	(duration  6316ms) [ OK ]
00:08:10.013 # ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP	(duration  7640ms) [ OK ]
00:08:17.695 # ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP	(duration  6187ms) [ OK ]
00:08:23.926 # ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP	(duration  2879ms) [ OK ]
00:08:26.846 # ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP	(duration  4996ms) [ OK ]
00:08:31.884 # ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP	(duration  3367ms) [ OK ]
00:08:35.292 # ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP	(duration  6048ms) [ OK ]
00:08:41.387 # ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP	(duration  5592ms) [ OK ]
00:08:47.023 # ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP	copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
00:09:17.073 # read: Connection reset by peer
00:09:17.081 # (duration 30028ms) [ FAIL ] client exit code 3, server 2
00:09:17.081 # \nnetns ns3-5ec2369e-aYFE3A socket stat for 10018:
00:09:17.126 # State   Recv-Q    Send-Q        Local Address:Port         Peer Address:Port    
00:09:17.127 # \nnetns ns2-5ec2369e-aYFE3A socket stat for 10018:
00:09:17.141 # State   Recv-Q    Send-Q        Local Address:Port         Peer Address:Port    
00:09:17.156 # ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP	(duration  3878ms) [ OK ]
00:09:21.075 # ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP	(duration  7481ms) [ OK ]
00:09:28.597 # ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP	(duration  9327ms) [ OK ]
00:09:37.970 # ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP	(duration  3515ms) [ OK ]
00:09:41.527 # ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP	(duration  4669ms) [ OK ]
00:09:46.240 # ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP	(duration  2151ms) [ OK ]
00:09:48.435 # FAIL: Could not even run loopback test
00:09:48.479 not ok 1 selftests: net/mptcp: mptcp_connect.sh # exit=1

Of course, I have not yet been able to reproduce it with more debugging enabled.

fix "IPv4: Attempt to release TCP socket in state 1 "... on shutdown

Running the following syzkaller program:

# {Threaded:false Collide:false Repeat:false RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false UseTmpDir:false HandleSegv:false Repro:false Trace:false}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r1, &(0x7f00000013c0)={0x2, 0x4e20, @multicast2}, 0x10)
connect$inet(r1, &(0x7f0000000040)={0x2, 0x0, @loopback}, 0x10)
listen(r1, 0x3)
connect$inet(r0, &(0x7f0000000040)={0x2, 0x4e20, @loopback}, 0x4d)
sendmsg$inet(r0, &(0x7f0000000280)={0x0, 0x0, &(0x7f0000000000)=[{&(0x7f0000000080)="ff", 0x20000081}], 0x1}, 0x0)

Beyond triggering issues/3, it blocks, as expected. Explicitly killing the process yields the following warning:

"IPv4: Attempt to release TCP socket in state 1"

The call trace to the above printk is the following one:

        7fff82f81a4a printk ([kernel.kallsyms])
        7fff8441f637 __sk_destruct ([kernel.kallsyms])
        7fff84aaecbf subflow_ulp_release ([kernel.kallsyms])
        7fff847747ae tcp_cleanup_ulp ([kernel.kallsyms])
        7fff8474c812 tcp_v4_destroy_sock ([kernel.kallsyms])
        7fff846c383a inet_csk_destroy_sock ([kernel.kallsyms])
        7fff846c42ad inet_csk_listen_stop ([kernel.kallsyms])
        7fff846e3f8c tcp_close ([kernel.kallsyms])
        7fff847c2ea6 inet_release ([kernel.kallsyms])
        7fff84408605 __sock_release ([kernel.kallsyms])
        7fff84a9fda3 mptcp_close ([kernel.kallsyms])
        7fff847c2ea6 inet_release ([kernel.kallsyms])
        7fff84408515 __sock_release ([kernel.kallsyms])
        7fff844086e4 sock_close ([kernel.kallsyms])
        7fff83591224 __fput ([kernel.kallsyms])
        7fff82e5b789 task_work_run ([kernel.kallsyms])
        7fff82df7c3a do_exit ([kernel.kallsyms])
        7fff82df9ded do_group_exit ([kernel.kallsyms])
        7fff82e24a5e get_signal ([kernel.kallsyms])
        7fff82c6777d do_signal ([kernel.kallsyms])
        7fff82c0c51f exit_to_usermode_loop ([kernel.kallsyms])
        7fff82c0ee88 do_syscall_64 ([kernel.kallsyms])
        7fff84c05091 entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
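
For readers less used to syzkaller notation, the numeric constants in this report can be decoded with a small user-space sketch. The state numbering below mirrors the kernel's TCP state enum as I understand it; the `SK_`-prefixed names and both helper functions are made up for this illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Older libc headers may not define IPPROTO_MPTCP; 262 (0x106) is the
 * Linux value. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Local copy of the TCP state numbering (names are illustrative). */
enum {
    SK_ESTABLISHED = 1,
    SK_SYN_SENT    = 2,
    SK_SYN_RECV    = 3,
    SK_CLOSE       = 7,
    SK_LISTEN      = 10,
};

/* Map the number printed in the warning to a readable state name. */
const char *tcp_state_name(int state)
{
    switch (state) {
    case SK_ESTABLISHED: return "ESTABLISHED";
    case SK_SYN_SENT:    return "SYN_SENT";
    case SK_SYN_RECV:    return "SYN_RECV";
    case SK_CLOSE:       return "CLOSE";
    case SK_LISTEN:      return "LISTEN";
    default:             return "other";
    }
}

/* Same call as socket$inet_mptcp(0x2, 0x1, 0x106) in the reproducer. */
int open_mptcp_socket(void)
{
    return socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
}
```

Read this way, "state 1" in the warning is ESTABLISHED, and port 0x4e20 in the reproducer is 20000.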

fix 'mmap' related race

Running the mptcp_connect self-tests with the additional '-m mmap' argument produces a splat: a WARN_ON_ONCE in mptcp_reset_timer(), because the mptcp timer is not initialized yet.

get rid of mptcp hooking in tcp_check_req()

We currently have the following code in tcp_check_req():

      if (own_req && sk_is_mptcp(child) && mptcp_sk_is_subflow(child)) {
                reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
                inet_csk_reqsk_queue_drop_and_put(sk, req);
                return child;
        }

Can we instead, for an mp_join child, do the following in subflow_syn_recv_sock()?

  • complete the hashdance,
  • adjust the sk refcnt,
  • return with *own_req == false.

If not, properly document why (here and/or directly in the code).

Note: no need to call subflow_syn_recv_sock() / sock_rps_save_rxhash() in subflow_syn_recv_sock(), tcp_check_req() will do that, since we will still return child != NULL

allow non 'backup' subflows creation

The PM allows setting/controlling the backup flag in newly created subflows, but we set it unconditionally in the MPJ handshake.

Let the MPJ handshake value really be controlled by the PM, so that we can establish non-backup subflows.

The above should enable sending data on multiple subflows simultaneously and will likely need some, possibly many, fixes to the RX and TX paths; notably, we may want to pick a different subflow if/when the peer closes the window for the current one.
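
The selection rule hinted at above (prefer a regular subflow whose window is open, otherwise fall back to a backup one) could look roughly like the following user-space sketch. `struct subflow_view` and `pick_subflow()` are hypothetical names chosen for illustration; the real TX path has to consider far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one subflow as a scheduler might see it. */
struct subflow_view {
    bool     backup;       /* backup flag set in the MPJ handshake */
    bool     established;
    uint32_t send_window;  /* bytes the peer currently allows in flight */
};

/* Return the index of the subflow to transmit on, or -1 if none is usable.
 * Non-backup subflows with an open window win; a backup subflow is used
 * only when no regular subflow can send. */
int pick_subflow(const struct subflow_view *sf, size_t n)
{
    int backup_idx = -1;

    for (size_t i = 0; i < n; i++) {
        if (!sf[i].established || sf[i].send_window == 0)
            continue;               /* closed window: look elsewhere */
        if (!sf[i].backup)
            return (int)i;
        if (backup_idx < 0)
            backup_idx = (int)i;    /* remember the first usable backup */
    }
    return backup_idx;
}
```

With this rule, a peer closing the window on the current subflow simply makes the next call return a different index.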

Allow ss/netstat etc. to show program name for client and listener MPTCP subflows

The TCP sockets list includes the MPTCP subflows.

When dumping the above, e.g. with ss or netstat and the '-p' option, the owning process name is not shown for client and listener MPTCP subflow sockets.

The root cause is that such a sock refers to a 'struct socket' allocated internally by the kernel and not linked to any inode.

Possibly:

SOCK_INODE(ssock)->i_ino =  SOCK_INODE(msk->sk_socket)->i_ino;

in mptcp_subflow_create_socket() will address the above, but the VFS-related implications must be investigated.

reduce mptcp options space usage

'struct mptcp_options_received' is part of the TCP socket and can be zeroed per packet.

'struct mptcp_out_options' is part of 'struct tcp_out_options', is allocated on the stack and is cleared once for each transmitted TCP packet.

Both of the above MPTCP data structures are wider than strictly needed and could be shrunk, saving memory and improving performance.
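
To illustrate the kind of saving meant here, the sketch below compares a structure that spends a full 32-bit word on each flag with one that packs the same flags into bit-fields. The field set is hypothetical and far smaller than the real structures; only the technique is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field set: one full word per boolean-like value. */
struct opts_wide {
    uint64_t sndr_key;
    uint64_t rcvr_key;
    uint32_t mp_capable;  /* only ever 0 or 1 */
    uint32_t mp_join;     /* only ever 0 or 1 */
    uint32_t dss;         /* only ever 0 or 1 */
    uint32_t backup;      /* only ever 0 or 1 */
    uint32_t version;     /* small protocol version number */
};

/* Same information, with the small values packed into bit-fields. */
struct opts_packed {
    uint64_t sndr_key;
    uint64_t rcvr_key;
    unsigned int mp_capable : 1;
    unsigned int mp_join    : 1;
    unsigned int dss        : 1;
    unsigned int backup     : 1;
    unsigned int version    : 4;
};

/* Fewer bytes means less memory and less to zero on every packet. */
_Static_assert(sizeof(struct opts_packed) < sizeof(struct opts_wide),
               "bit-fields should shrink the structure");

/* Bytes saved per instance, and therefore per clearing operation. */
size_t opts_bytes_saved(void)
{
    return sizeof(struct opts_wide) - sizeof(struct opts_packed);
}
```

The exact sizes depend on the ABI, so the sketch only asserts that the packed variant is smaller.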

ssh restart does not work

When restarting ssh on a kernel which default-enables MPTCP I get the following error:

May 29 17:09:02 server sshd[4931]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 29 17:09:02 server sshd[4931]: error: setsockopt IPV6_V6ONLY: Operation not supported
May 29 17:09:02 server sshd[4931]: error: Bind to port 22 on :: failed: Address already in use.
May 29 17:09:02 server sshd[4931]: fatal: Cannot bind any address.
May 29 17:09:02 server systemd[1]: ssh.service: Main process exited, code=exited, status=255/EXCEPTION
May 29 17:09:02 server systemd[1]: ssh.service: Failed with result 'exit-code'.
May 29 17:09:02 server systemd[1]: Failed to start OpenBSD Secure Shell server.

This happens when restarting while logged in to the host. I guess a socket option is missing (SO_REUSE*?).
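
One way to narrow this down is to probe which listener-side socket options a given protocol accepts. The sketch below tries SO_REUSEADDR (assumed here to be the SO_REUSE* option in question) and IPV6_V6ONLY (the option named in the log) on a fresh AF_INET6 stream socket. The function and the `PROBE_*` names are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Older libc headers may not define IPPROTO_MPTCP; 262 is the Linux value. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

#define PROBE_REUSEADDR 0x1
#define PROBE_V6ONLY    0x2

/* Try both options on a new AF_INET6 stream socket of the given protocol.
 * Returns a bitmask of the options that were accepted, or -1 if the socket
 * itself could not be created. */
int probe_listener_sockopts(int protocol)
{
    int one = 1, mask = 0;
    int fd = socket(AF_INET6, SOCK_STREAM, protocol);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == 0)
        mask |= PROBE_REUSEADDR;
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one)) == 0)
        mask |= PROBE_V6ONLY;
    close(fd);
    return mask;
}
```

Comparing probe_listener_sockopts(IPPROTO_TCP) with probe_listener_sockopts(IPPROTO_MPTCP) on the affected kernel would show which option is rejected for MPTCP sockets.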

implement msk diag interface

so that we can dump additional MPTCP-related info to user-space, e.g.:

  • data seq/ack seq
  • number of subflows
  • number of accepted add_addr
  • number of signaled addresses

[syzkaller] INFO: task hung in lock_sock_nested

syzkaller triggered on top of 6fe9a94 with Florian's patch
patch.txt

INFO: task syz-executor.4:22557 blocked for more than 143 seconds.
      Not tainted 5.6.0 #64
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.4  D    0 22557   2905 0x00000004
Call Trace:
 context_switch kernel/sched/core.c:3380 [inline]
 __schedule+0x23c/0x5f0 kernel/sched/core.c:4080
 schedule+0x4a/0x100 kernel/sched/core.c:4154
 __lock_sock+0x80/0xd0 net/core/sock.c:2424
 lock_sock_nested+0x77/0x80 net/core/sock.c:2949
 lock_sock include/net/sock.h:1574 [inline]
 inet_stream_connect+0x27/0x60 net/ipv4/af_inet.c:718
 mptcp_stream_connect+0xad/0x130 net/mptcp/protocol.c:1658
 __sys_connect_file net/socket.c:1859 [inline]
 __sys_connect+0x140/0x180 net/socket.c:1876
 __do_sys_connect net/socket.c:1887 [inline]
 __se_sys_connect net/socket.c:1884 [inline]
 __x64_sys_connect+0x1e/0x30 net/socket.c:1884
 do_syscall_64+0x91/0x2f0 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f60af9cc469
Code: Bad RIP value.
RSP: 002b:00007f60b00bcdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 000000000066bf00 RCX: 00007f60af9cc469
RDX: 000000000000006e RSI: 0000000020000040 RDI: 0000000000000008
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000094
R13: 00000000004142ac R14: 00007f60b00bd5c0 R15: 0000000000000003
NMI backtrace for cpu 0
CPU: 0 PID: 484 Comm: khungtaskd Not tainted 5.6.0 #64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xda/0x116 lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x18/0x64 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x158/0x191 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
 watchdog+0x5f7/0x750 kernel/hung_task.c:289
 kthread+0x153/0x180 kernel/kthread.c:255
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 1589 Comm: systemd-journal Not tainted 5.6.0 #64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:queued_write_lock_slowpath+0x40/0xa0 kernel/locking/qrwlock.c:77
Code: 0f b1 57 04 75 61 8b 03 85 c0 74 4c f0 81 03 00 01 00 00 b9 ff 00 00 00 be 00 01 00 00 8b 03 3d 00 01 00 00 74 0c f3 90 8b 13 <81> fa 00 01 00 00 75 f4 89 f0 f0 0f b1 0b 3d 00 01 00 00 75 de 48
RSP: 0018:ffffc900001abd78 EFLAGS: 00000006
RAX: 0000000000000300 RBX: ffff888139d60420 RCX: 00000000000000ff
RDX: 0000000000000300 RSI: 0000000000000100 RDI: ffff888139d60420
RBP: ffffc900001abd88 R08: ffff88813b3d9140 R09: 0000000000000000
R10: ffffc90000087e58 R11: 0000000000000001 R12: ffff888139d60424
R13: 0000000000000000 R14: ffff888139d60410 R15: ffff888139d60420
FS:  00007ff97c2a98c0(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff978dac028 CR3: 000000013901e003 CR4: 0000000000160ee0
Call Trace:
 queued_write_lock include/asm-generic/qrwlock.h:95 [inline]
 __raw_write_lock_irq include/linux/rwlock_api_smp.h:197 [inline]
 _raw_write_lock_irq+0x41/0x50 kernel/locking/spinlock.c:311
 ep_scan_ready_list.constprop.0+0x68/0x210 fs/eventpoll.c:684
 ep_send_events fs/eventpoll.c:1766 [inline]
 ep_poll+0xbf/0x600 fs/eventpoll.c:1903
 do_epoll_wait+0x130/0x150 fs/eventpoll.c:2298
 __do_sys_epoll_wait fs/eventpoll.c:2308 [inline]
 __se_sys_epoll_wait fs/eventpoll.c:2305 [inline]
 __x64_sys_epoll_wait+0x22/0x30 fs/eventpoll.c:2305
 do_syscall_64+0x91/0x2f0 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff97b573303
Code: 49 89 ca b8 e8 00 00 00 0f 05 48 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 0b c2 00 00 48 89 04 24 49 89 ca b8 e8 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 51 c2 00 00 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffd617ccd10 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
RAX: ffffffffffffffda RBX: 0000560192d9f1e0 RCX: 00007ff97b573303
RDX: 0000000000000014 RSI: 00007ffd617ccd20 RDI: 0000000000000008
RBP: 00007ffd617ccf10 R08: 0000000000027c8a R09: 000000007735961f
R10: 00000000ffffffff R11: 0000000000000293 R12: 00007ffd617ccd20
R13: 0000000000000001 R14: ffffffffffffffff R15: 00071afd69ee9dd5

syzkaller-repro

# {Threaded:false Collide:false Repeat:false RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false UseTmpDir:false HandleSegv:false Repro:false Trace:false}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r0, &(0x7f00000013c0)={0x2, 0x4e20}, 0x10)
listen(r0, 0x0)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
connect$inet(r1, &(0x7f0000000040)={0x2, 0x4e20, @loopback}, 0x4d)
r2 = accept(r0, 0x0, 0x0)
connect$unix(r2, 0x0, 0x0)

C-repro

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};

int main(void)
{
  syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 3ul, 0x32ul, -1, 0ul);
  intptr_t res = 0;
  res = syscall(__NR_socket, 2ul, 1ul, 0x106);
  if (res != -1)
    r[0] = res;
  *(uint16_t*)0x200013c0 = 2;
  *(uint16_t*)0x200013c2 = htobe16(0x4e20);
  *(uint32_t*)0x200013c4 = htobe32(0);
  syscall(__NR_bind, r[0], 0x200013c0ul, 0x10ul);
  syscall(__NR_listen, r[0], 0);
  res = syscall(__NR_socket, 2ul, 1ul, 0x106);
  if (res != -1)
    r[1] = res;
  *(uint16_t*)0x20000040 = 2;
  *(uint16_t*)0x20000042 = htobe16(0x4e20);
  *(uint32_t*)0x20000044 = htobe32(0x7f000001);
  syscall(__NR_connect, r[1], 0x20000040ul, 0x4dul);
  res = syscall(__NR_accept, r[0], 0ul, 0ul);
  if (res != -1)
    r[2] = res;
  syscall(__NR_connect, r[2], 0ul, 0ul);
  return 0;
}

[packetdrill] mp_capable: error only in debug mode

When testing packetdrill with some extra debug kconfig, I have these errors:

00:13:29.962 + ./packetdrill/run_all.py -l -v mptcp/mp_capable
00:13:54.998 v1_bind_tcpfallback_wrongver_3rd_ack.pkt:18: error handling packet: live packet field tcp_data_offset: expected: 11 (0xb) vs actual: 5 (0x5)
00:13:55.000 v1_bind_tcpfallback_wrongver_3rd_ack.pkt:18: error handling packet: live packet field tcp_data_offset: expected: 11 (0xb) vs actual: 5 (0x5)
00:13:55.006 v1_bind_tcpfallback_wrongver_3rd_ack.pkt:18: error handling packet: live packet field tcp_data_offset: expected: 11 (0xb) vs actual: 5 (0x5)
00:13:55.008 v1_connect_tcpfallback_wrongver.pkt:15: error handling packet: live packet field ipv4_total_length: expected: 52 (0x34) vs actual: 64 (0x40)
00:13:55.015 script packet:  2.119982 . 1:1(0) ack 1 win 256 <nop,nop,TS val 100 ecr 700>
00:13:55.018 actual packet:  1.045693 S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1145 ecr 0,nop,wscale 8,mp_capable v1 flags: |H| >
00:13:55.024 v1_connect_tcpfallback_wrongver.pkt:15: error handling packet: live packet field ipv6_payload_len: expected: 32 (0x20) vs actual: 44 (0x2c)
00:13:55.025 script packet:  1.803079 . 1:1(0) ack 1 win 256 <nop,nop,TS val 100 ecr 700>
00:13:55.026 actual packet:  1.015422 S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1115 ecr 0,nop,wscale 8,mp_capable v1 flags: |H| >
00:13:55.030 v1_connect_tcpfallback_wrongver.pkt:15: error handling packet: live packet field ipv4_total_length: expected: 52 (0x34) vs actual: 64 (0x40)
00:13:55.032 script packet:  2.731779 . 1:1(0) ack 1 win 256 <nop,nop,TS val 100 ecr 700>
00:13:55.034 actual packet:  1.060444 S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1160 ecr 0,nop,wscale 8,mp_capable v1 flags: |H| >
00:13:55.036 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagB.pkt (ipv4)]
00:13:55.038 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagB_3rd_ack.pkt (ipv4)]
00:13:55.040 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagH.pkt (ipv4-mapped-v6)]
00:13:55.042 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagB.pkt (ipv4-mapped-v6)]
00:13:55.045 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagB_3rd_ack.pkt (ipv4-mapped-v6)]
00:13:55.046 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagH.pkt (ipv4)]
00:13:55.047 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagB.pkt (ipv6)]
00:13:55.049 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagH.pkt (ipv6)]
00:13:55.050 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_wrongver.pkt (ipv4)]
00:13:55.051 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagB_3rd_ack.pkt (ipv6)]
00:13:55.052 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagH_3rd_ack.pkt (ipv6)]
00:13:55.053 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_wrongver.pkt (ipv6)]
00:13:55.054 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagH_3rd_ack.pkt (ipv4)]
00:13:55.056 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_wrongver.pkt (ipv4-mapped-v6)]
00:13:55.057 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_flagB.pkt (ipv4-mapped-v6)]
00:13:55.058 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_flagH_3rd_ack.pkt (ipv4-mapped-v6)]
00:13:55.059 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_flagB.pkt (ipv6)]
00:13:55.060 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_flagB.pkt (ipv4)]
00:13:55.061 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_flagH.pkt (ipv4-mapped-v6)]
00:13:55.062 FAIL [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_wrongver_3rd_ack.pkt (ipv4)]
00:13:55.063 stdout: 
00:13:55.064 stderr: 
00:13:55.064 FAIL [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_wrongver_3rd_ack.pkt (ipv6)]
00:13:55.065 stdout: 
00:13:55.065 stderr: 
00:13:55.066 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_flagH.pkt (ipv6)]
00:13:55.067 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_mp_capable_connect_no_cs.pkt (ipv4)]
00:13:55.068 FAIL [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_bind_tcpfallback_wrongver_3rd_ack.pkt (ipv4-mapped-v6)]
00:13:55.069 stdout: 
00:13:55.069 stderr: 
00:13:55.070 FAIL [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_wrongver.pkt (ipv4)]
00:13:55.071 stdout: 
00:13:55.071 stderr: 
00:13:55.071 FAIL [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_wrongver.pkt (ipv6)]
00:13:55.072 stdout: 
00:13:55.072 stderr: 
00:13:55.073 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_mp_capable_bind_no_cs.pkt (ipv4-mapped-v6)]
00:13:55.074 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_mp_capable_connect_no_cs.pkt (ipv4-mapped-v6)]
00:13:55.075 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_flagH.pkt (ipv4)]
00:13:55.076 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_mp_capable_bind_no_cs.pkt (ipv4)]
00:13:55.077 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_mp_capable_connect_no_cs.pkt (ipv6)]
00:13:55.078 FAIL [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_connect_tcpfallback_wrongver.pkt (ipv4-mapped-v6)]
00:13:55.079 stdout: 
00:13:55.079 stderr: 
00:13:55.080 OK   [/opt/packetdrill/gtests/net/mptcp/mp_capable/v1_mp_capable_bind_no_cs.pkt (ipv6)]
00:13:55.081 Ran   33 tests:   27 passing,    6 failing,    0 timed out (24.01 sec): mptcp/mp_capable

Extra kconfig: -e KASAN -e KASAN_OUTLINE -d TEST_KASAN -e PROVE_LOCKING -e DEBUG_LOCKDEP -e PREEMPT -e DEBUG_PREEMPT -e DEBUG_SLAVE -e DEBUG_PAGEALLOC -e DEBUG_MUTEXES -e DEBUG_SPINLOCK -e DEBUG_ATOMIC_SLEEP -e PROVE_RCU -e DEBUG_OBJECTS_RCU_HEAD

Tested using https://github.com/multipath-tcp/mptcp_net-next/blob/scripts/ci/virtme.sh
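
The length values in the failures above can be cross-checked against the option lists printed in the log. The sketch below does the arithmetic, assuming packets without payload, a 20-byte IPv4 header, and a 4-byte MP_CAPABLE option on the SYN (my reading of the v1 format). The helper names are made up.

```c
#include <assert.h>

enum {
    IPV4_HDR_LEN     = 20,  /* IPv4 header without options */
    TCP_BASE_HDR_LEN = 20,  /* TCP header without options */

    /* Script packet options: nop(1) + nop(1) + TS(10). */
    SCRIPT_OPT_BYTES = 1 + 1 + 10,

    /* Actual SYN options: mss(4) + sackOK(2) + TS(10) + nop(1) +
     * wscale(3) + mp_capable(4, assumed size on a v1 SYN). */
    ACTUAL_OPT_BYTES = 4 + 2 + 10 + 1 + 3 + 4,
};

/* The TCP data offset field counts 32-bit words. */
int tcp_hdr_len_from_doff(int doff_words)
{
    return doff_words * 4;
}

/* IPv4 total length of a segment that carries no payload. */
int ipv4_total_len(int tcp_opt_bytes)
{
    return IPV4_HDR_LEN + TCP_BASE_HDR_LEN + tcp_opt_bytes;
}

/* The IPv6 payload length excludes the fixed IPv6 header. */
int ipv6_payload_len(int tcp_opt_bytes)
{
    return TCP_BASE_HDR_LEN + tcp_opt_bytes;
}
```

Under those assumptions the expected values (52 and 32) match the scripted ACK, and the actual values (64 and 44) match the SYN carrying mp_capable that the log shows on the wire. A data offset of 5 is a bare 20-byte TCP header, while 11 means 24 bytes of options.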

let PM netlink update live sockets on local addresses list change

Currently, when an MPTCP endpoint is added and/or deleted, the existing MPTCP sockets are not affected.

The idea would be to traverse the MPTCP sockets list and act accordingly: close and destroy subflows using the removed addresses, and try to create subflows for newly added addresses, if the local constraints allow that.

The above would allow the PM netlink interface to start an active-backup scenario.
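
The traversal described above can be modelled in user space as follows. Everything here is hypothetical (the `msk_model` structure, the fixed-size slot array, the per-socket limit); it only sketches the control flow, not the locking or the real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBFLOWS 8

/* Hypothetical model of one MPTCP socket and its subflows. */
struct subflow_slot {
    bool    in_use;
    uint8_t local_id;  /* id of the local address this subflow uses */
};

struct msk_model {
    struct subflow_slot sf[MAX_SUBFLOWS];
    unsigned int limit;  /* local constraint: maximum number of subflows */
};

unsigned int count_subflows(const struct msk_model *m)
{
    unsigned int n = 0;

    for (size_t i = 0; i < MAX_SUBFLOWS; i++)
        n += m->sf[i].in_use ? 1 : 0;
    return n;
}

/* Endpoint removed: close every subflow that uses that address. */
void on_endpoint_removed(struct msk_model *m, uint8_t id)
{
    for (size_t i = 0; i < MAX_SUBFLOWS; i++)
        if (m->sf[i].in_use && m->sf[i].local_id == id)
            m->sf[i].in_use = false;
}

/* Endpoint added: create one subflow from it if the limit allows. */
bool on_endpoint_added(struct msk_model *m, uint8_t id)
{
    if (count_subflows(m) >= m->limit)
        return false;
    for (size_t i = 0; i < MAX_SUBFLOWS; i++) {
        if (!m->sf[i].in_use) {
            m->sf[i].in_use = true;
            m->sf[i].local_id = id;
            return true;
        }
    }
    return false;
}

/* Walk every live socket and apply the endpoint change to it. */
void apply_endpoint_change(struct msk_model *socks, size_t n,
                           uint8_t id, bool added)
{
    for (size_t i = 0; i < n; i++) {
        if (added)
            on_endpoint_added(&socks[i], id);
        else
            on_endpoint_removed(&socks[i], id);
    }
}
```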

Blocking accept() does not return

When running a minimal test program with a typical socket/bind/listen/accept sequence on the server side (blocking calls), the accept() does not return after the handshake completes. I first found this with b239a7b (export/20200624T164427).

After adding some debug output, I see that the call to ssock->ops->accept() in mptcp_accept() is not returning. It does resume after SIGINT when the test program is terminated.

The self tests currently run only nonblocking tests. @pabeni mentioned that the fallback refactor changed some connect-time signaling.

I will bisect this and post an update.
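
A minimal server-side sequence of the kind described could be sketched as below. This is not the reporter's test program: the function names are made up, and the protocol is a parameter so that the same code can be run with IPPROTO_TCP or IPPROTO_MPTCP.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Older libc headers may not define IPPROTO_MPTCP; 262 is the Linux value. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* socket() + bind() + listen() on INADDR_ANY:port (port 0 lets the kernel
 * choose). Returns the listening descriptor, or -1 on failure. */
int make_listener(int protocol, unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, protocol);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(fd, 1) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Blocking accept(): expected to return a connected descriptor once a peer
 * has completed the handshake. This is the call that hangs in the report. */
int accept_one(int listen_fd)
{
    return accept(listen_fd, NULL, NULL);
}
```

Calling accept_one() on a listener created with IPPROTO_MPTCP, then connecting a client to it, would be the blocking case that the self tests do not currently cover.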

[syzkaller] general protection fault in mptcp_stream_connect

HEAD:

4fb948e08615 ("Cleanup") (HEAD) (3 minutes ago)
55a9c834a69e ("net: mptcp: improve fallback to TCP") (3 minutes ago)
431bc5f80631 ("mptcp: add receive buffer auto-tuning") (3 minutes ago)
e15b65dd24f2 ("bpf: fix unused-var without NETDEVICES") (3 minutes ago)
bc4f114 ("[DO-NOT-MERGE] mptcp: enabled by default") (tag: export/20200605T181020, mptcp_net-next/export) (57 minutes ago)
420e02a ("[DO-NOT-MERGE] mptcp: use kmalloc on kasan build") (57 minutes ago)
36b7954 ("mptcp: don't leak msk in token container") (57 minutes ago)
3c886ec ("mptcp: introduce token KUNIT self-tests") (58 minutes ago)
fa2e5ed ("mptcp: move crypto test to KUNIT") (58 minutes ago)
d830aaf ("mptcp: refactor token container.") (58 minutes ago)
1cbe672 ("mptcp: add __init annotation on setup functions") (58 minutes ago)
213fe1d ("mptcp: fix races between shutdown and recvmsg") (58 minutes ago)
5b67054 ("inet_connection_sock: clear inet_num out of destroy helper") (58 minutes ago)
5e296dd ("bpf: fix unused-var without NETDEVICES") (58 minutes ago)
cb8e59c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") (netnext/master, mptcp_net-next/net-next) (2 days ago)

TCP: request_sock_subflow: Possible SYN flooding on port 20000. Sending cookies.  Check SNMP counters.
TCP: request_sock_subflow: Possible SYN flooding on port 20000. Sending cookies.  Check SNMP counters.
TCP: request_sock_subflow: Possible SYN flooding on port 20000. Sending cookies.  Check SNMP counters.
general protection fault, probably for non-canonical address 0x6376b12300000013: 0000 [#1] SMP PTI
CPU: 0 PID: 2036 Comm: syz-executor.0 Not tainted 5.7.0 #98
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:mptcp_stream_connect+0x6f/0x1c0 net/mptcp/protocol.c:1916
Code: 89 de e8 04 6d 2a ff 83 fb 01 0f 84 b3 00 00 00 e8 86 6b 2a ff 49 8b 9f e0 05 00 00 48 85 db 0f 84 9e 00 00 00 e8 71 6b 2a ff <48> 8b 43 20 4c 89 e6 48 89 df 44 89 f1 44 89 ea 48 8b 40 20 e8 d8
RSP: 0018:ffffc9000149fde0 EFLAGS: 00010293
RAX: ffff888136c50d80 RBX: 6376b12300000013 RCX: ffffffff81ef53cc
RDX: 0000000000000000 RSI: ffffffff81ef53ef RDI: 0000000000000005
RBP: ffff88813756ed00 R08: ffff888136c50d80 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000149fe68
R13: 0000000000000010 R14: 0000000000000002 R15: ffff888136c8a200
FS:  00007fc28faa1700(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000136f08001 CR4: 0000000000160ef0
Call Trace:
 __sys_connect_file+0x98/0xa0 net/socket.c:1854
 __sys_connect+0x109/0x140 net/socket.c:1871
 __do_sys_connect net/socket.c:1882 [inline]
 __se_sys_connect net/socket.c:1879 [inline]
 __x64_sys_connect+0x1a/0x20 net/socket.c:1879
 do_syscall_64+0x75/0x220 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc28f413469
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007fc28faa0dd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 000000000068c0e0 RCX: 00007fc28f413469
RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000006
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041427c R14: 00007fc28faa15c0 R15: 0000000000000003
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 6064b28807fd5d39 ]---

syzkaller-reproducer:

# {Threaded:true Collide:true Repeat:true RepeatTimes:0 Procs:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:true NetDevices:true NetReset:true Cgroups:true BinfmtMisc:true CloseFDs:true KCSAN:false DevlinkPCI:false USB:false UseTmpDir:true HandleSegv:true Repro:false Trace:false}
sendmsg$IPVS_CMD_GET_SERVICE(0xffffffffffffffff, 0x0, 0x8001)
sendmsg$IPVS_CMD_GET_DEST(0xffffffffffffffff, 0x0, 0x4000090)
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r0, &(0x7f00000013c0)={0x2, 0x4e20}, 0x10)
listen(r0, 0x0)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
connect$inet(r1, &(0x7f00000000c0)={0x2, 0x4e20, @local}, 0x10)
r2 = accept(r0, 0x0, 0x0)
sendmsg$DEVLINK_CMD_GET(0xffffffffffffffff, 0x0, 0x0)
r3 = open(&(0x7f0000000000)='./file1\x00', 0x44242, 0x0)
r4 = syz_open_procfs(0x0, &(0x7f0000000240)='pagemap\x00')
sendfile(r3, r4, 0x0, 0x80006a01)
connect$inet(r3, &(0x7f0000000000)={0x2, 0x4e24, @broadcast}, 0x10)
lsetxattr$trusted_overlay_opaque(0x0, 0x0, 0x0, 0x0, 0x0)
openat(0xffffffffffffff9c, 0x0, 0x0, 0x0)
ioctl$PIO_FONTX(0xffffffffffffffff, 0x4b6c, 0x0)
close(r1)
r5 = socket$nl_audit(0x10, 0x3, 0x9)
dup3(r5, r2, 0x0)

Kernel-Config:
CURRENT_CONFIG.txt

reduce indirect call usage

The current subflow/mptcp hooking makes use of quite a few additional indirect calls compared to plain TCP.

We can get rid of most of them by:

  • replacing them with plain direct calls
  • using indirect call wrappers (ICW)
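
Assuming "ICW" refers to the kernel's indirect call wrappers, the idea can be re-created in user space as below: compare the function pointer with the most likely target and, on a match, call that target directly. The macro and function names are illustrative, not the kernel's.

```c
#include <assert.h>

/* User-space sketch of the idea behind an indirect call wrapper: if the
 * pointer equals the expected target, make a direct call to that target;
 * otherwise fall back to the indirect call. */
#define SKETCH_INDIRECT_CALL_1(f, f1, ...) \
    ((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

typedef int (*data_ready_fn)(int);

/* Common-case callback: the one we expect almost every time. */
int common_data_ready(int len)
{
    return len;
}

/* Rare-case callback, still reachable through the fallback path. */
int rare_data_ready(int len)
{
    return -len;
}

/* Dispatch through the wrapper instead of calling cb(len) directly. */
int deliver(data_ready_fn cb, int len)
{
    return SKETCH_INDIRECT_CALL_1(cb, common_data_ready, len);
}
```

The result is the same as a plain indirect call; the gain is that the common case becomes a direct, predictable call, which matters most when indirect branches are expensive.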

[syzkaller] WARNING in __mptcp_move_skbs_from_subflow

Head is netnext (d8e79f1 ("nexthop: Fix type of event_type in call_nexthop_notifiers")).

------------[ cut here ]------------
WARNING: CPU: 1 PID: 16 at net/mptcp/protocol.c:249 __mptcp_move_skbs_from_subflow+0x7cb/0xa50 net/mptcp/protocol.c:249
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.7.0-rc6 #83
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xb7/0xfe lib/dump_stack.c:118
 panic+0x22d/0x5b2 kernel/panic.c:221
 __warn.cold+0x2f/0x3b kernel/panic.c:582
 report_bug+0x1d1/0x200 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:175 [inline]
 fixup_bug arch/x86/kernel/traps.c:170 [inline]
 do_error_trap+0xcf/0x100 arch/x86/kernel/traps.c:267
 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:__mptcp_move_skbs_from_subflow+0x7cb/0xa50 net/mptcp/protocol.c:249
Code: 03 5c 24 64 2b 5c 24 5c 44 89 ff 44 01 7c 24 10 89 de 44 01 7c 24 34 e8 73 b2 b0 fe 41 39 df 0f 86 31 fb ff ff e8 75 b1 b0 fe <0f> 0b e9 a4 fb ff ff e8 69 b1 b0 fe 49 8d be c8 00 00 00 e8 fd 7d
RSP: 0018:ffff88811a9cf3a0 EFLAGS: 00010206
RAX: ffff88811a976200 RBX: 0000000000007f70 RCX: ffffffff8273c15d
RDX: 0000000000000100 RSI: ffffffff8273c16b RDI: 0000000000000004
RBP: ffff88806f4c20d8 R08: ffff88811a976200 R09: ffffed10233fdb7e
R10: ffff888119fedbeb R11: ffffed10233fdb7d R12: ffff888119fedb00
R13: ffff888119fedbe8 R14: ffff8880672f5c80 R15: 0000000000007fe4
 move_skbs_to_msk+0x153/0x160 net/mptcp/protocol.c:287
 mptcp_data_ready+0x85/0x1d0 net/mptcp/protocol.c:301
 subflow_data_ready+0xc7/0xe0 net/mptcp/subflow.c:892
 tcp_data_ready+0x72/0x110 net/ipv4/tcp_input.c:4776
 tcp_data_queue+0x9a8/0x2200 net/ipv4/tcp_input.c:4842
 tcp_rcv_established+0x4ab/0xed0 net/ipv4/tcp_input.c:5735
 tcp_v4_do_rcv+0x342/0x480 net/ipv4/tcp_ipv4.c:1623
 tcp_v4_rcv+0x1a9a/0x1c00 net/ipv4/tcp_ipv4.c:2005
 ip_protocol_deliver_rcu+0x42/0x380 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0xc3/0xe0 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x162/0x220 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:441 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 ip_rcv_finish+0x79/0x90 net/ipv4/ip_input.c:414
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x19d/0x1b0 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core+0x156/0x180 net/core/dev.c:5268
 __netif_receive_skb+0x29/0xd0 net/core/dev.c:5382
 process_backlog+0x133/0x2d0 net/core/dev.c:6214
 napi_poll net/core/dev.c:6659 [inline]
 net_rx_action+0x2c0/0x7b0 net/core/dev.c:6727
 __do_softirq+0x10d/0x3be kernel/softirq.c:292
 run_ksoftirqd kernel/softirq.c:604 [inline]
 run_ksoftirqd+0x15/0x20 kernel/softirq.c:596
 smpboot_thread_fn+0x24d/0x3c0 kernel/smpboot.c:165
 kthread+0x1ba/0x210 kernel/kthread.c:268
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:351
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 1 seconds..

Syzkaller reproducer:

# {Threaded:true Collide:false Repeat:true RepeatTimes:0 Procs:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:true NetDevices:true NetReset:false Cgroups:true BinfmtMisc:true CloseFDs:true KCSAN:false DevlinkPCI:false UseTmpDir:true HandleSegv:false Repro:false Trace:false}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r1, &(0x7f00000013c0)={0x2, 0x4e20, @multicast2}, 0x10)
connect$inet(r1, &(0x7f0000000040)={0x2, 0x0, @loopback}, 0x10)
listen(r1, 0x3)
r2 = socket$inet6_tcp(0xa, 0x1, 0x0)
ioctl$sock_SIOCGIFVLAN_SET_VLAN_NAME_TYPE_CMD(r2, 0x8982, 0x0)
connect$inet(r0, &(0x7f0000000040)={0x2, 0x4e20, @loopback}, 0x4d)
sendmsg$inet(r0, &(0x7f0000000280)={0x0, 0x0, &(0x7f0000000000)=[{&(0x7f0000000080)="ff", 0xff3e}], 0x1}, 0x0)

C repro attached.

[syzkaller] WARNING in subflow_data_ready

HEAD is at:

4fb948e08615 ("Cleanup") (HEAD) (3 minutes ago)
55a9c834a69e ("net: mptcp: improve fallback to TCP") (3 minutes ago)
431bc5f80631 ("mptcp: add receive buffer auto-tuning") (3 minutes ago)
e15b65dd24f2 ("bpf: fix unused-var without NETDEVICES") (3 minutes ago)
bc4f114 ("[DO-NOT-MERGE] mptcp: enabled by default") (tag: export/20200605T181020, mptcp_net-next/export) (57 minutes ago)
420e02a ("[DO-NOT-MERGE] mptcp: use kmalloc on kasan build") (57 minutes ago)
36b7954 ("mptcp: don't leak msk in token container") (57 minutes ago)
3c886ec ("mptcp: introduce token KUNIT self-tests") (58 minutes ago)
fa2e5ed ("mptcp: move crypto test to KUNIT") (58 minutes ago)
d830aaf ("mptcp: refactor token container.") (58 minutes ago)
1cbe672 ("mptcp: add __init annotation on setup functions") (58 minutes ago)
213fe1d ("mptcp: fix races between shutdown and recvmsg") (58 minutes ago)
5b67054 ("inet_connection_sock: clear inet_num out of destroy helper") (58 minutes ago)
5e296dd ("bpf: fix unused-var without NETDEVICES") (58 minutes ago)
cb8e59c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") (netnext/master, mptcp_net-next/net-next) (2 days ago)

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1951 at net/mptcp/subflow.c:920 subflow_data_ready+0x16c/0x1d0 net/mptcp/subflow.c:920
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 1951 Comm: syz-executor357 Not tainted 5.7.0-rc7 #90
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xb7/0xfe lib/dump_stack.c:118
 panic+0x22d/0x5b2 kernel/panic.c:221
 __warn.cold+0x2f/0x3b kernel/panic.c:582
 report_bug+0x1d1/0x200 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:175 [inline]
 fixup_bug arch/x86/kernel/traps.c:170 [inline]
 do_error_trap+0xcf/0x100 arch/x86/kernel/traps.c:267
 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:subflow_data_ready+0x16c/0x1d0 net/mptcp/subflow.c:920
Code: 5f c3 e8 f7 fc af fe 49 8d 7e 48 e8 4e c7 cf fe 41 0f b6 5e 48 31 ff 83 e3 18 89 de e8 0d fe af fe 84 db 75 87 e8 d4 fc af fe <0f> 0b e9 7b ff ff ff e8 c8 fc af fe 48 89 ee 4c 89 ef e8 ed b5 ff
RSP: 0018:ffff8881116579f0 EFLAGS: 00010293
RAX: ffff888113dd4600 RBX: 0000000000000000 RCX: ffffffff827485f3
RDX: 0000000000000000 RSI: ffffffff827485fc RDI: 0000000000000001
RBP: ffff88810e7d8940 R08: ffff888113dd4600 R09: ffffed10222320b4
R10: ffff88811119059f R11: ffffed10222320b3 R12: 1ffff110222caf3e
R13: ffff888111190000 R14: ffff888117c34800 R15: ffff888111190598
 tcp_data_ready+0x72/0x110 net/ipv4/tcp_input.c:4776
 tcp_data_queue+0x9a8/0x2200 net/ipv4/tcp_input.c:4842
 tcp_rcv_state_process+0x7d4/0x25aa net/ipv4/tcp_input.c:6392
 tcp_v4_do_rcv+0x1ed/0x480 net/ipv4/tcp_ipv4.c:1651
 sk_backlog_rcv include/net/sock.h:996 [inline]
 __release_sock+0x12b/0x1d0 net/core/sock.c:2546
 release_sock+0x40/0x100 net/core/sock.c:3062
 mptcp_subflow_shutdown net/mptcp/protocol.c:1403 [inline]
 mptcp_shutdown+0x15f/0x320 net/mptcp/protocol.c:2115
 __sys_shutdown+0xce/0x150 net/socket.c:2203
 __do_sys_shutdown net/socket.c:2211 [inline]
 __se_sys_shutdown net/socket.c:2209 [inline]
 __x64_sys_shutdown+0x2b/0x30 net/socket.c:2209
 do_syscall_64+0x8a/0x290 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f7bb01c2469
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf584ac58 EFLAGS: 00000246 ORIG_RAX: 0000000000000030
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bb01c2469
RDX: 00007f7bb01c2469 RSI: 0000000000000001 RDI: 0000000000000003
RBP: 0000000000400680 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000040059f
R13: 00007ffcf584ad40 R14: 0000000000000000 R15: 0000000000000000
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 1 seconds..

syz-repro:

# {Threaded:false Collide:false Repeat:false RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false UseTmpDir:false HandleSegv:false Repro:false Trace:false}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r0, &(0x7f00000013c0)={0x2, 0x4e20, @multicast2}, 0x10)
connect$inet(r0, &(0x7f0000000040)={0x2, 0x4e20, @loopback}, 0x4d)
shutdown(r0, 0x1)

C-repro:

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

uint64_t r[1] = {0xffffffffffffffff};

int main(void)
{
  syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 3ul, 0x32ul, -1, 0ul);
  intptr_t res = 0;
  res = syscall(__NR_socket, 2ul, 1ul, 0x106);
  if (res != -1)
    r[0] = res;
  *(uint16_t*)0x200013c0 = 2;
  *(uint16_t*)0x200013c2 = htobe16(0x4e20);
  *(uint32_t*)0x200013c4 = htobe32(0xe0000002);
  syscall(__NR_bind, r[0], 0x200013c0ul, 0x10ul);
  *(uint16_t*)0x20000040 = 2;
  *(uint16_t*)0x20000042 = htobe16(0x4e20);
  *(uint32_t*)0x20000044 = htobe32(0x7f000001);
  syscall(__NR_connect, r[0], 0x20000040ul, 0x4dul);
  syscall(__NR_shutdown, r[0], 1ul);
  return 0;
}

[EDIT 06/05: Updated HEAD]

remove unneeded branches in critical path

mptcp_write_options() has a lot of unneeded branch instructions (e.g. there is no need to check for any MPJ or DSS option if we already included MPC).

The above is somewhat related to #15: transforming mptcp_out_options into a union would make it explicit that some options cannot be added to the same packet.

Likely other core functions should be audited.

reduce mptcp_out_option struct size

The tcp_out_options struct is allocated on the stack and memset()-ed by tcp_make_synack() and __tcp_transmit_skb().

The 'TCP' part of it is 24 bytes, the MPTCP options account for 120 bytes.

Since we control MPTCP option creation/insertion and do not allow MPC, MPJ, or DSS simultaneously, we could put most MPTCP fields under a union.

Additionally, option writing could be cleaned up a bit by avoiding several conditionals: there is no need to check for anything else after MPC or MPJ.

cleanup sendmsg_frag allocation

Currently sendmsg_frag may block with msk socket lock held:

  • directly, in the mptcp_page_frag_refill()/mptcp_ext_cache_refill() loop
  • indirectly, via do_tcp_sendpages(), if sk_stream_memory_free() fails
  • indirectly, via do_tcp_sendpages(), trying to allocate the new skb
  • indirectly, via do_tcp_sendpages(), trying to forward allocate the memory account for skb

Note that MPTCP currently picks a subflow only if sk_stream_memory_free() is true, but that condition may change across consecutive calls to sendmsg_frag() when the user-space buffer is large.

Blocking with the msk socket lock held is bad and should be avoided. We could try to address this with several changes:

  • in the mptcp_page_frag_refill()/mptcp_ext_cache_refill() loop:

    • call sk_stream_wait_memory() on msk,
    • additionally releasing and acquiring the ssk socket lock around it.
    • loop on memory accounting conditions, too - that part is tricky, as accurate memory accounting needs packet size information that only becomes available later
  • [ab-]use skb_tx_cache to pre-allocate the skb in mptcp_page_frag_refill() memory allocation loop

  • picking a new subflow if the TCP window is closed/the sendbuf is full for the current one

WARNING: Bad mapping: ssn=1 map_seq=498340137 map_data_len=79

Hit once while running an apache-benchmark stress test: 100 concurrent clients, up to 100k requests, 1 KB files (the "simple_ab" test).

server login: [   57.507531] ------------[ cut here ]------------
[   57.514702] Bad mapping: ssn=1 map_seq=498340137 map_data_len=79
[   57.514965] WARNING: CPU: 2 PID: 21 at net/mptcp/subflow.c:602 warn_bad_map.isra.0.part.0+0x3e/0x50
[   57.517687] Kernel panic - not syncing: panic_on_warn set ...
[   57.518693] CPU: 2 PID: 21 Comm: ksoftirqd/2 Kdump: loaded Not tainted 5.7.0-rc6.mptcp #72
[   57.520067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   57.522082] Call Trace:
[   57.522587]  dump_stack+0x76/0xa0
[   57.523332]  panic+0x19a/0x389
[   57.527019]  __warn.cold+0x25/0x2d
[   57.528493]  report_bug+0x10f/0x140
[   57.529145]  do_error_trap+0xcf/0x100
[   57.530655]  do_invalid_op+0x3e/0x50
[   57.532445]  invalid_op+0x1e/0x30
[   57.533104] RIP: 0010:warn_bad_map.isra.0.part.0+0x3e/0x50
[   57.534067] Code: 55 4b d9 00 01 e8 22 51 6a ff 44 8b 6d 00 48 89 df e8 16 51 6a ff 8b 13 44 89 e6 48 c7 c7 c0 4d 33 82 44 89 e9 e8 29 56 4a ff <0f> 0b 5b 5d 41 5c 41 5d c3 66 0f 1f 84 00 00 00 00 00 55 ba 80 00
[   57.537416] RSP: 0018:ffff88811ad9ef18 EFLAGS: 00010282
[   57.538372] RAX: 0000000000000000 RBX: ffff88811743c63c RCX: 0000000000000000
[   57.539857] RDX: 0000000000000004 RSI: ffffffff82f28fd4 RDI: ffffed10235b3dd5
[   57.541131] RBP: ffff88811743c644 R08: 0000000000000001 R09: fffffbfff05e5297
[   57.542377] R10: ffffffff82f294b3 R11: fffffbfff05e5296 R12: 0000000000000001
[   57.543692] R13: 000000000000004f R14: ffff8881188e02f7 R15: ffff88811743c600
[   57.545974]  mptcp_subflow_data_available+0xce8/0xec0
[   57.548493]  subflow_data_ready+0xcb/0x160
[   57.550768]  tcp_data_queue+0x7ae/0x1a30
[   57.555834]  tcp_rcv_established+0x38f/0xa90
[   57.560276]  tcp_v4_do_rcv+0x253/0x350
[   57.560937]  tcp_v4_rcv+0x145b/0x1580
[   57.563106]  ip_protocol_deliver_rcu+0x37/0x270
[   57.563890]  ip_local_deliver_finish+0xa9/0xc0
[   57.564653]  ip_local_deliver+0x1b4/0x1c0
[   57.566978]  ip_sublist_rcv_finish+0x84/0xa0
[   57.567712]  ip_sublist_rcv+0x22c/0x310
[   57.572038]  ip_list_rcv+0x1e4/0x225
[   57.574483]  __netif_receive_skb_list_core+0x439/0x460
[   57.577686]  netif_receive_skb_list_internal+0x3e3/0x560
[   57.580386]  gro_normal_list.part.0+0x14/0x50
[   57.581154]  napi_gro_receive+0x6a/0xb0
[   57.581837]  receive_buf+0x371/0x1cf0
[   57.585596]  virtnet_poll+0x2b7/0x5a0
[   57.587867]  net_rx_action+0x1ec/0x4c0
[   57.590169]  __do_softirq+0xfc/0x29c
[   57.592258]  run_ksoftirqd+0x15/0x20
[   57.592886]  smpboot_thread_fn+0x19d/0x2d0
[   57.595639]  kthread+0x1cc/0x1f0
[   57.597146]  ret_from_fork+0x35/0x40

Allow MPTCP + SYN_COOKIES

Just as a "reminder" so it is tracked as a task:

MPTCP + SYN_COOKIES support is important to allow webservers to enable MPTCP. Currently we fall back to regular TCP when SYN cookies kick in.

Revisit layout of struct mptcp_subflow_context

From Paolo:

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e4ca6320ce76..f5adca93e8fb 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -290,6 +290,7 @@ struct mptcp_subflow_context {
            data_avail : 1,
            rx_eof : 1,
            data_fin_tx_enable : 1,
+           use_64bit_ack : 1, /* Set when we received a 64-bit DSN */
            can_ack : 1;        /* only after processing the remote a key */
    u64     data_fin_tx_seq;
    u32     remote_nonce;

Possibly with a wider scope than this patch, but the above bitfields
contain both fields that are rarely changed (data_fin_tx_enable,
can_ack, request_mptcp, etc...) and fields that are set on every packet
/every DSS (map_valid, data_avail, the newly added use_64bit_ack).

At least 'data_avail' is accessed without holding the ssk socket lock
by mptcp_subflow_recv_lookup(). Perhaps we could revisit the binary
layout of this struct?

Sparse issues (`make C=1`)

Hello,

When I use make C=1 net/mptcp/<file>.c, I get two warnings:

  • net/mptcp/mptcp_diag.c:
./include/net/sock.h:1612:31: warning: context imbalance in 'mptcp_diag_get_info' - unexpected unlock
  • net/mptcp/protocol.c:
net/mptcp/protocol.c:1531:24: warning: context imbalance in 'mptcp_sk_clone' - unexpected unlock

Is it really an issue? If not, is there a way to add something to avoid them?

[syzkaller] INFO: task hung in lock_sock_nested

HEAD is at:

4fb948e08615 ("Cleanup") (HEAD) (3 minutes ago)
55a9c834a69e ("net: mptcp: improve fallback to TCP") (3 minutes ago)
431bc5f80631 ("mptcp: add receive buffer auto-tuning") (3 minutes ago)
e15b65dd24f2 ("bpf: fix unused-var without NETDEVICES") (3 minutes ago)
bc4f114 ("[DO-NOT-MERGE] mptcp: enabled by default") (tag: export/20200605T181020, mptcp_net-next/export) (57 minutes ago)
420e02a ("[DO-NOT-MERGE] mptcp: use kmalloc on kasan build") (57 minutes ago)
36b7954 ("mptcp: don't leak msk in token container") (57 minutes ago)
3c886ec ("mptcp: introduce token KUNIT self-tests") (58 minutes ago)
fa2e5ed ("mptcp: move crypto test to KUNIT") (58 minutes ago)
d830aaf ("mptcp: refactor token container.") (58 minutes ago)
1cbe672 ("mptcp: add __init annotation on setup functions") (58 minutes ago)
213fe1d ("mptcp: fix races between shutdown and recvmsg") (58 minutes ago)
5b67054 ("inet_connection_sock: clear inet_num out of destroy helper") (58 minutes ago)
5e296dd ("bpf: fix unused-var without NETDEVICES") (58 minutes ago)
cb8e59c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") (netnext/master, mptcp_net-next/net-next) (2 days ago)

Process accounting resumed
INFO: task syz-executor.5:32294 blocked for more than 143 seconds.
      Not tainted 5.7.0 #98
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.5  D    0 32294  28072 0x00000004
Call Trace:
 context_switch kernel/sched/core.c:3430 [inline]
 __schedule+0x1e5/0x580 kernel/sched/core.c:4156
 schedule+0x3d/0xa0 kernel/sched/core.c:4231
 __lock_sock+0x82/0xd0 net/core/sock.c:2524
 lock_sock_nested+0x69/0x70 net/core/sock.c:3049
 lock_sock include/net/sock.h:1576 [inline]
 __inet_bind+0x38e/0x440 net/ipv4/af_inet.c:514
 inet_bind+0x7d/0xa0 net/ipv4/af_inet.c:457
 mptcp_bind+0x67/0xb0 net/mptcp/protocol.c:1875
 __sys_bind+0x14b/0x170 net/socket.c:1657
 __do_sys_bind net/socket.c:1668 [inline]
 __se_sys_bind net/socket.c:1666 [inline]
 __x64_sys_bind+0x1a/0x20 net/socket.c:1666
 do_syscall_64+0x75/0x220 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Potentially fixed with upcoming post by Paolo.

[syzkaller] WARNING in mptcp_token_destroy

HEAD is at:

14a0decd6073 ("mptcp: do nonce initialization at subflow creation time") (HEAD) (9 minutes ago)
9dd0407e07c4 ("Cleanup") (9 minutes ago)
420165c3e256 ("mptcp: add receive buffer auto-tuning") (9 minutes ago)
ddd7a892c5d6 ("[DO-NOT-MERGE] mptcp: enabled by default") (9 minutes ago)
ed5ec40ec710 ("[DO-NOT-MERGE] mptcp: use kmalloc on kasan build") (9 minutes ago)
3f2b916101bd ("[Paolo] Squash-to: "net: mptcp: improve fallback to TCP"") (9 minutes ago)
f1d139829ea2 ("net: mptcp: improve fallback to TCP") (11 minutes ago)
ad9bbaa55b9e ("mptcp: don't leak msk in token container") (13 minutes ago)
2e60720dc2cd ("mptcp: introduce token KUNIT self-tests") (13 minutes ago)
40c8b9bd5c19 ("mptcp: move crypto test to KUNIT") (13 minutes ago)
ba4e64831793 ("Squash-to: "mptcp: refactor token container."") (13 minutes ago)
0c9f8be ("mptcp: refactor token container.") (11 hours ago)
d23a86b ("mptcp: add __init annotation on setup functions") (11 hours ago)
70a8b39 ("mptcp: fix races between shutdown and recvmsg") (11 hours ago)
952f1f9 ("inet_connection_sock: clear inet_num out of destroy helper") (11 hours ago)
77b9313 ("bpf: fix unused-var without NETDEVICES") (11 hours ago)
cb8e59c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") (netnext/master, mptcp_net-next/net-next) (5 days ago)

TCP: request_sock_subflow: Possible SYN flooding on port 20000. Sending cookies.  Check SNMP counters.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 23210 at net/mptcp/token.c:284 mptcp_token_destroy+0x2bc/0x350 net/mptcp/token.c:284
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 23210 Comm: syz-executor.2 Not tainted 5.7.0 #105
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xb7/0xfe lib/dump_stack.c:118
 panic+0x29e/0x692 kernel/panic.c:221
 __warn.cold+0x2f/0x3d kernel/panic.c:582
 report_bug+0x28b/0x2f0 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:105 [inline]
 fixup_bug arch/x86/kernel/traps.c:100 [inline]
 do_error_trap+0x10f/0x180 arch/x86/kernel/traps.c:197
 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:216
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:mptcp_token_destroy+0x2bc/0x350 net/mptcp/token.c:284
Code: fe 49 8d 7d 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 6e 49 89 6d 08 e9 54 ff ff ff e8 84 5f 3d fe <0f> 0b eb ab 48 89 ef e8 d8 3b 64 fe e9 9c fd ff ff 48 89 ef e8 cb
RSP: 0018:ffff88811457fcf0 EFLAGS: 00010212
RAX: 0000000000040000 RBX: 0000000000000000 RCX: ffffc90002959000
RDX: 0000000000000035 RSI: ffffffff82fc518c RDI: 0000000000000005
RBP: 0000000000000001 R08: ffff888117ddc600 R09: ffffed10228aff94
R10: 0000000000000003 R11: ffffed10228aff93 R12: 0000000000000000
R13: dffffc0000000000 R14: ffff888119976300 R15: ffff888119680000
 mptcp_stream_connect+0x2ac/0x5f0 net/mptcp/protocol.c:1901
 __sys_connect_file net/socket.c:1854 [inline]
 __sys_connect+0x267/0x2f0 net/socket.c:1871
 __do_sys_connect net/socket.c:1882 [inline]
 __se_sys_connect net/socket.c:1879 [inline]
 __x64_sys_connect+0x6f/0xb0 net/socket.c:1879
 do_syscall_64+0xb7/0x3d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff9d5f1c469
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ff9d660cdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007ff9d5f1c469
RDX: 0000000000000014 RSI: 0000000020000180 RDI: 0000000000000005
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041430a R14: 00007ff9d660d5c0 R15: 0000000000000003
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 1 seconds..

syzkaller-repro:

# {Threaded:false Collide:false Repeat:true RepeatTimes:0 Procs:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:true NetDevices:true NetReset:false Cgroups:true BinfmtMisc:true CloseFDs:true KCSAN:false DevlinkPCI:false USB:false UseTmpDir:true HandleSegv:true Repro:false Trace:false}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r1, &(0x7f00000013c0)={0x2, 0x4e20, @multicast2}, 0x10)
connect$inet(r1, &(0x7f0000000040)={0x2, 0x0, @loopback}, 0x10)
listen(r1, 0x3)
connect$inet(r0, &(0x7f0000000040)={0x2, 0x4e20, @loopback}, 0x4d)
sendmsg$inet(r0, &(0x7f0000000280)={0x0, 0x0, &(0x7f0000000000)=[{&(0x7f0000000080)="ff", 0xfffffdef}], 0x4}, 0x40)
r2 = accept(r1, 0x0, 0x0)
recvfrom$unix(r2, &(0x7f0000000140)=""/43, 0xfffffffffffffdd2, 0x0, 0x0, 0x0)
sendmsg$NL80211_CMD_LEAVE_MESH(r2, 0x0, 0x0)

C-repro:
repro.txt

kernel-config:
CURRENT_CONFIG.txt

[EDIT 06/09: Updated HEAD]

Unstable packetdrill tests

Hello,

I just noticed that in the recent builds from my CI, packetdrill tests were unstable: sometimes OK, sometimes not. Always the same error:

00:16:27.237 mpc_with_data_client.pkt:24: error handling packet: MPTCP option mismatch: 30
00:16:27.237 script packet:  0.420431 P. 1001:1501(500) ack 1 <nop,nop,TS val 100 ecr 700,dss dack4 16777216 dsn8 16790263835767341056 ssn 3909287936 dll 62465 no_checksum flags: MmA,nop,nop>
00:16:27.237 actual packet:  0.410428 P. 1001:1501(500) ack 1 win 256 <nop,nop,TS val 511 ecr 700,dss dack4 3007449509 dsn8 2402075853973337702 ssn 1001 dll 500 no_checksum flags: MmA,nop,nop>
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt (ipv4)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt (ipv6)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_server.pkt (ipv4-mapped-v6)]
00:16:27.237 FAIL [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_client.pkt (ipv4)]
00:16:27.237 stdout: 
00:16:27.237 stderr: 
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_server.pkt (ipv4)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_server.pkt (ipv6)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt (ipv4-mapped-v6)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_server.pkt (ipv6)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_client.pkt (ipv6)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_server.pkt (ipv4)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_server.pkt (ipv4-mapped-v6)]
00:16:27.237 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_client.pkt (ipv4-mapped-v6)]
00:16:27.237 Ran   12 tests:   11 passing,    1 failing,    0 timed out (4.08 sec): mptcp/dss
00:31:41.454 mpc_with_data_client.pkt:24: error handling packet: MPTCP option mismatch: 30
00:31:41.454 script packet:  0.421795 P. 1001:1501(500) ack 1 <nop,nop,TS val 100 ecr 700,dss dack4 16777216 dsn8 16790263835767341056 ssn 3909287936 dll 62465 no_checksum flags: MmA,nop,nop>
00:31:41.454 actual packet:  0.411773 P. 1001:1501(500) ack 1 win 256 <nop,nop,TS val 509 ecr 700,dss dack4 3007449509 dsn8 1816876830602220548 ssn 1001 dll 500 no_checksum flags: MmA,nop,nop>
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt (ipv4)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt (ipv6)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_server.pkt (ipv4-mapped-v6)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_client.pkt (ipv4)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_server.pkt (ipv4)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt (ipv4-mapped-v6)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_server.pkt (ipv6)]
00:31:41.454 FAIL [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_client.pkt (ipv6)]
00:31:41.454 stdout: 
00:31:41.454 stderr: 
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_server.pkt (ipv6)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/dss_ssn_specified_server.pkt (ipv4)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_server.pkt (ipv4-mapped-v6)]
00:31:41.454 OK   [/opt/packetdrill/gtests/net/mptcp/dss/mpc_with_data_client.pkt (ipv4-mapped-v6)]
00:31:41.454 Ran   12 tests:   11 passing,    1 failing,    0 timed out (9.67 sec): mptcp/dss

So far, I cannot reproduce it locally.

[syzkaller] WARNING in subflow_data_ready (v2.0)

HEAD is:

2d948a5e3ead ("mptcp: check for plain TCP sock at accept time") (HEAD) (10 minutes ago)
404b0abd5a9a ("mptcp: support IPV6_V6ONLY setsockopt") (2 hours ago)
5355be031726 ("mptcp: add REUSEADDR/REUSEADDR support") (2 hours ago)
84f9bcd3c1fc ("net: use mptcp setsockopt function for SOL_SOCKET on mptcp sockets") (2 hours ago)
0ec5e66cad3d ("selftests/mptcp: Capture pcap on both sender and receiver") (2 hours ago)
952f6a1 ("[DO-NOT-MERGE] mptcp: enabled by default") (tag: export/20200616T013825, mptcp_net-next/export) (15 hours ago)
b6b8b80 ("[DO-NOT-MERGE] mptcp: use kmalloc on kasan build") (15 hours ago)
c8807b1 ("mptcp: close poll() races") (15 hours ago)
9c7e7ff ("mptcp: add receive buffer auto-tuning") (15 hours ago)
c8c1a0f ("selftests: mptcp: add option to specify size of file to transfer") (15 hours ago)
0e68992 ("mptcp: fallback in case of simultaneous connect") (15 hours ago)
147eccc ("net: mptcp: improve fallback to TCP") (15 hours ago)
2579482 ("mptcp: introduce token KUNIT self-tests") (15 hours ago)
5d5fbf5 ("mptcp: move crypto test to KUNIT") (15 hours ago)
56530a9 ("mptcp: do nonce initialization at subflow creation time") (15 hours ago)
7a138a0 ("mptcp: refactor token container") (15 hours ago)
1dd5359 ("mptcp: add __init annotation on setup functions") (15 hours ago)
292de25 ("mptcp: drop MP_JOIN request sock on syn cookies") (15 hours ago)
9bdf579 ("mptcp: cache msk on MP_JOIN init_req") (15 hours ago)
b3a9e3b ("Linux 5.8-rc1") (tag: v5.8-rc1, netnext/master, mptcp_net-next/net-next) (2 days ago)

splat:

TCP: request_sock_subflow: Possible SYN flooding on port 20000. Sending cookies.  Check SNMP counters.
TCP: request_sock_subflow: Possible SYN flooding on port 20000. Sending cookies.  Check SNMP counters.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 9370 at net/mptcp/subflow.c:885 subflow_data_ready+0x1e6/0x290 net/mptcp/subflow.c:885
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9370 Comm: syz-executor.0 Not tainted 5.7.0 #106
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xb7/0xfe lib/dump_stack.c:118
 panic+0x29e/0x692 kernel/panic.c:221
 __warn.cold+0x2f/0x3d kernel/panic.c:582
 report_bug+0x28b/0x2f0 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:105 [inline]
 fixup_bug arch/x86/kernel/traps.c:100 [inline]
 do_error_trap+0x10f/0x180 arch/x86/kernel/traps.c:197
 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:216
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:subflow_data_ready+0x1e6/0x290 net/mptcp/subflow.c:885
Code: 04 02 84 c0 74 06 0f 8e 91 00 00 00 41 0f b6 5e 48 31 ff 83 e3 18 89 de e8 37 ec 3d fe 84 db 0f 85 65 ff ff ff e8 fa ea 3d fe <0f> 0b e9 59 ff ff ff e8 ee ea 3d fe 48 89 ee 4c 89 ef e8 f3 77 ff
RSP: 0018:ffff88811b2099b0 EFLAGS: 00010206
RAX: ffff888111197000 RBX: 0000000000000000 RCX: ffffffff82fbc609
RDX: 0000000000000100 RSI: ffffffff82fbc616 RDI: 0000000000000001
RBP: ffff8881111bc800 R08: ffff888111197000 R09: ffffed10222a82af
R10: ffff888111541577 R11: ffffed10222a82ae R12: 1ffff11023641336
R13: ffff888111541000 R14: ffff88810fd4ca00 R15: ffff888111541570
 tcp_child_process+0x754/0x920 net/ipv4/tcp_minisocks.c:841
 tcp_v4_do_rcv+0x749/0x8b0 net/ipv4/tcp_ipv4.c:1642
 tcp_v4_rcv+0x2666/0x2e60 net/ipv4/tcp_ipv4.c:1999
 ip_protocol_deliver_rcu+0x29/0x1f0 net/ipv4/ip_input.c:204
 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
 NF_HOOK include/linux/netfilter.h:421 [inline]
 ip_local_deliver+0x2da/0x390 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:441 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:414 [inline]
 NF_HOOK include/linux/netfilter.h:421 [inline]
 ip_rcv+0xef/0x140 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5268
 __netif_receive_skb+0x27/0x1c0 net/core/dev.c:5382
 process_backlog+0x1e5/0x6d0 net/core/dev.c:6226
 napi_poll net/core/dev.c:6671 [inline]
 net_rx_action+0x3e3/0xd70 net/core/dev.c:6739
 __do_softirq+0x18c/0x634 kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
 </IRQ>
 do_softirq.part.0+0x26/0x30 kernel/softirq.c:337
 do_softirq arch/x86/include/asm/preempt.h:26 [inline]
 __local_bh_enable_ip+0x46/0x50 kernel/softirq.c:189
 local_bh_enable include/linux/bottom_half.h:32 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:723 [inline]
 ip_finish_output2+0x78a/0x19c0 net/ipv4/ip_output.c:229
 __ip_finish_output+0x471/0x720 net/ipv4/ip_output.c:306
 dst_output include/net/dst.h:435 [inline]
 ip_local_out+0x181/0x1e0 net/ipv4/ip_output.c:125
 __ip_queue_xmit+0x7a1/0x14e0 net/ipv4/ip_output.c:530
 __tcp_transmit_skb+0x19dc/0x35e0 net/ipv4/tcp_output.c:1238
 __tcp_send_ack.part.0+0x3c2/0x5b0 net/ipv4/tcp_output.c:3785
 __tcp_send_ack net/ipv4/tcp_output.c:3791 [inline]
 tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3791
 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6040 [inline]
 tcp_rcv_state_process+0x36a4/0x49c2 net/ipv4/tcp_input.c:6209
 tcp_v4_do_rcv+0x343/0x8b0 net/ipv4/tcp_ipv4.c:1651
 sk_backlog_rcv include/net/sock.h:996 [inline]
 __release_sock+0x1ad/0x310 net/core/sock.c:2548
 release_sock+0x54/0x1a0 net/core/sock.c:3064
 inet_wait_for_connect net/ipv4/af_inet.c:594 [inline]
 __inet_stream_connect+0x57e/0xd50 net/ipv4/af_inet.c:686
 inet_stream_connect+0x53/0xa0 net/ipv4/af_inet.c:725
 mptcp_stream_connect+0x171/0x5f0 net/mptcp/protocol.c:1920
 __sys_connect_file net/socket.c:1854 [inline]
 __sys_connect+0x267/0x2f0 net/socket.c:1871
 __do_sys_connect net/socket.c:1882 [inline]
 __se_sys_connect net/socket.c:1879 [inline]
 __x64_sys_connect+0x6f/0xb0 net/socket.c:1879
 do_syscall_64+0xb7/0x3d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fb577d06469
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007fb5783d5dd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 000000000068bfa0 RCX: 00007fb577d06469
RDX: 000000000000004d RSI: 0000000020000040 RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041427c R14: 00007fb5783d65c0 R15: 0000000000000003
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 1 seconds..

Syzkaller reproducer:

# {Threaded:true Collide:true Repeat:true RepeatTimes:0 Procs:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:true NetDevices:true NetReset:true Cgroups:true BinfmtMisc:true CloseFDs:true KCSAN:false DevlinkPCI:false USB:false UseTmpDir:true HandleSegv:true Repro:false Trace:false}
r0 = perf_event_open(0x0, 0x0, 0x0, 0xffffffffffffffff, 0x0)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
r2 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r2, &(0x7f00000013c0)={0x2, 0x4e20, @multicast2}, 0x10)
connect$inet(r2, &(0x7f0000000040)={0x2, 0x0, @loopback}, 0x10)
listen(r2, 0x0)
ioctl$int_in(0xffffffffffffffff, 0x5452, &(0x7f0000000000)=0x7)
connect$inet(r1, &(0x7f0000000040)={0x2, 0x4e20, @loopback}, 0x4d)
dup3(r0, r2, 0x0)
socket$inet_icmp(0x2, 0x2, 0x1)

kernel-config:
CURRENT_CONFIG.txt

[EDIT June 22nd: updated HEAD & kconfig]

mptcp_join selftests are failing since "mptcp: fix race in subflow_data_ready()"

Since d47a721 ("mptcp: fix race in subflow_data_ready()"), mptcp_join.sh selftests are failing:

00:17:07.414 # selftests: net/mptcp: mptcp_join.sh
00:17:10.118 # Created /tmp/tmp.8zVGZJ3xfC (size 1 KB) containing data sent by client
00:17:10.137 # Created /tmp/tmp.IDoDVnV14p (size 1 KB) containing data sent by server
00:17:12.630 # no JOIN                              syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:17.699 # single subflow, limited by client    syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:22.683 # single subflow, limited by server    syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:27.012 # read: Resource temporarily unavailable
00:17:27.419 #  client exit code 0, server 3
00:17:27.420 # \nnetns ns1-0-OalavP socket stat for 10003:
00:17:27.499 # State      Recv-Q   Send-Q       Local Address:Port        Peer Address:Port
00:17:27.506 # TIME-WAIT  0        0                 10.0.1.1:10003           10.0.1.2:36698    timer:(timewait,59sec,0)
00:17:27.511 #
00:17:27.520 # TIME-WAIT  0        0                 10.0.1.1:10003           10.0.3.2:41033    timer:(timewait,59sec,0)
00:17:27.525 #
00:17:27.530 # \nnetns ns2-0-OalavP socket stat for 10003:
00:17:27.602 # State   Recv-Q    Send-Q        Local Address:Port         Peer Address:Port
00:17:27.959 # single subflow                       syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:32.198 # read: Resource temporarily unavailable
00:17:32.605 #  client exit code 0, server 3
00:17:32.606 # \nnetns ns1-0-ud2ZCf socket stat for 10004:
00:17:32.679 # State      Recv-Q   Send-Q       Local Address:Port        Peer Address:Port
00:17:32.686 # TIME-WAIT  0        0                 10.0.1.1:10004           10.0.1.2:39122    timer:(timewait,59sec,0)
00:17:32.691 #
00:17:32.699 # TIME-WAIT  0        0                 10.0.1.1:10004           10.0.2.2:56421    timer:(timewait,59sec,0)
00:17:32.705 #
00:17:32.713 # TIME-WAIT  0        0                 10.0.1.1:10004           10.0.3.2:57195    timer:(timewait,59sec,0)
00:17:32.719 #
00:17:32.723 # \nnetns ns2-0-ud2ZCf socket stat for 10004:
00:17:32.780 # State   Recv-Q    Send-Q        Local Address:Port         Peer Address:Port
00:17:33.137 # multiple subflows                    syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:37.506 # read: Resource temporarily unavailable
00:17:37.913 #  client exit code 0, server 3
00:17:37.914 # \nnetns ns1-0-g57Pes socket stat for 10005:
00:17:37.993 # State      Recv-Q   Send-Q       Local Address:Port        Peer Address:Port
00:17:38.000 # TIME-WAIT  0        0                 10.0.1.1:10005           10.0.3.2:56155    timer:(timewait,59sec,0)
00:17:38.008 #
00:17:38.013 # TIME-WAIT  0        0                 10.0.1.1:10005           10.0.1.2:56674    timer:(timewait,59sec,0)
00:17:38.019 #
00:17:38.024 # \nnetns ns2-0-g57Pes socket stat for 10005:
00:17:38.087 # State   Recv-Q    Send-Q        Local Address:Port         Peer Address:Port
00:17:38.455 # multiple subflows, limited by server syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:43.516 # unused signal address                syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:47.876 # read: Resource temporarily unavailable
00:17:48.283 #  client exit code 0, server 3
00:17:48.284 # \nnetns ns1-0-VAChem socket stat for 10007:
00:17:48.363 # State      Recv-Q   Send-Q       Local Address:Port        Peer Address:Port
00:17:48.370 # TIME-WAIT  0        0                 10.0.2.1:10007           10.0.2.2:35795    timer:(timewait,59sec,0)
00:17:48.374 #
00:17:48.383 # TIME-WAIT  0        0                 10.0.1.1:10007           10.0.1.2:53480    timer:(timewait,59sec,0)
00:17:48.389 #
00:17:48.393 # \nnetns ns2-0-VAChem socket stat for 10007:
00:17:48.456 # State   Recv-Q    Send-Q        Local Address:Port         Peer Address:Port
00:17:48.812 # signal address                       syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:53.113 # read: Resource temporarily unavailable
00:17:53.520 #  client exit code 0, server 3
00:17:53.521 # \nnetns ns1-0-duzY1a socket stat for 10008:
00:17:53.587 # State      Recv-Q   Send-Q       Local Address:Port        Peer Address:Port
00:17:53.594 # TIME-WAIT  0        0                 10.0.2.1:10008           10.0.2.2:43809    timer:(timewait,59sec,0)
00:17:53.600 #
00:17:53.608 # TIME-WAIT  0        0                 10.0.1.1:10008           10.0.3.2:42953    timer:(timewait,59sec,0)
00:17:53.614 #
00:17:53.621 # TIME-WAIT  0        0                 10.0.1.1:10008           10.0.1.2:51920    timer:(timewait,59sec,0)
00:17:53.627 #
00:17:53.631 # \nnetns ns2-0-duzY1a socket stat for 10008:
00:17:53.689 # State   Recv-Q    Send-Q        Local Address:Port         Peer Address:Port
00:17:54.032 # subflow and signal                   syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:58.478 # read: Resource temporarily unavailable
00:17:58.886 #  client exit code 0, server 3
00:17:58.886 # \nnetns ns1-0-tkY1Hw socket stat for 10009:
00:17:58.956 # State      Recv-Q   Send-Q       Local Address:Port        Peer Address:Port
00:17:58.963 # TIME-WAIT  0        0                 10.0.1.1:10009           10.0.3.2:46707    timer:(timewait,59sec,0)
00:17:58.969 #
00:17:58.977 # TIME-WAIT  0        0                 10.0.1.1:10009           10.0.4.2:56685    timer:(timewait,59sec,0)
00:17:58.984 #
00:17:58.992 # TIME-WAIT  0        0                 10.0.2.1:10009           10.0.2.2:53505    timer:(timewait,59sec,0)
00:17:58.999 #
00:17:59.012 # TIME-WAIT  0        0                 10.0.1.1:10009           10.0.1.2:54348    timer:(timewait,59sec,0)
00:17:59.017 #
00:17:59.022 # \nnetns ns2-0-tkY1Hw socket stat for 10009:
00:17:59.065 # State   Recv-Q    Send-Q        Local Address:Port         Peer Address:Port
00:17:59.421 # multiple subflows and signal         syn[ ok ] - synack[ ok ] - ack[ ok ]
00:17:59.599 not ok 3 selftests: net/mptcp: mptcp_join.sh # exit=1

[syzkaller] INFO: rcu detected stall in ip_rcv

(don't remember the HEAD I am currently running syzkaller on :-/ )

rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 	0-....: (1 GPs behind) idle=e3a/1/0x4000000000000002 softirq=7243/7245 fqs=23773 
	(t=100000 jiffies g=13181 q=112)
NMI backtrace for cpu 0
CPU: 0 PID: 2584 Comm: syz-executor420 Not tainted 5.7.0-rc6 #84
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xb7/0xfe lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x19/0x84 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x193/0x198 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0xed/0x130 kernel/rcu/tree_stall.h:254
 print_cpu_stall kernel/rcu/tree_stall.h:475 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:549 [inline]
 rcu_pending kernel/rcu/tree.c:3225 [inline]
 rcu_sched_clock_irq.cold+0x310/0x57c kernel/rcu/tree.c:2296
 update_process_times+0x25/0x60 kernel/time/timer.c:1726
 tick_sched_handle+0x63/0xe0 kernel/time/tick-sched.c:176
 tick_sched_timer+0x3e/0xd0 kernel/time/tick-sched.c:1320
 __run_hrtimer kernel/time/hrtimer.c:1520 [inline]
 __hrtimer_run_queues+0x247/0x590 kernel/time/hrtimer.c:1584
 hrtimer_interrupt+0x1e6/0x3f0 kernel/time/hrtimer.c:1646
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113 [inline]
 smp_apic_timer_interrupt+0x86/0x1e0 arch/x86/kernel/apic/apic.c:1138
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:153 [inline]
RIP: 0010:write_comp_data+0x9/0x70 kernel/kcov.c:208
Code: 80 a4 08 00 00 48 8b 11 48 83 c2 01 48 39 d0 76 07 48 89 34 d1 48 89 11 c3 0f 1f 84 00 00 00 00 00 65 4c 8b 04 25 00 0d 02 00 <65> 8b 05 88 99 dd 7e a9 00 01 1f 00 75 51 41 8b 80 a0 08 00 00 83
RSP: 0018:ffff88811b409390 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: ffff88811057cd68 RCX: ffffffff8274917a
RDX: 0000000051910d22 RSI: 00000000928d5e77 RDI: 0000000000000004
RBP: 00000000928d5e77 R08: ffff8881195aaa00 R09: ffffed102368126f
R10: 0000000000000003 R11: ffffed102368126e R12: 0000000051910d22
R13: ffff88811057cd00 R14: ffff888110412638 R15: ffff888110412630
 __token_lookup_msk net/mptcp/token.c:75 [inline]
 __token_bucket_busy+0xea/0x150 net/mptcp/token.c:83
 mptcp_token_new_request+0x98/0x230 net/mptcp/token.c:115
 subflow_init_req+0x1c8/0x6f0 net/mptcp/subflow.c:157
 tcp_conn_request+0x6a7/0x15e0 net/ipv4/tcp_input.c:6653
 subflow_v4_conn_request+0x60/0x90 net/mptcp/subflow.c:316
 tcp_rcv_state_process+0x638/0x25aa net/ipv4/tcp_input.c:6195
 tcp_v4_do_rcv+0x1ed/0x480 net/ipv4/tcp_ipv4.c:1650
 tcp_v4_rcv+0x1b67/0x1c00 net/ipv4/tcp_ipv4.c:1998
 ip_protocol_deliver_rcu+0x42/0x380 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0xc3/0xe0 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x162/0x220 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:441 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
 ip_rcv_finish+0x79/0x90 net/ipv4/ip_input.c:414
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0x19d/0x1b0 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core+0x156/0x180 net/core/dev.c:5268
 __netif_receive_skb+0x29/0xd0 net/core/dev.c:5382
 process_backlog+0x133/0x2d0 net/core/dev.c:6214
 napi_poll net/core/dev.c:6659 [inline]
 net_rx_action+0x2c0/0x7b0 net/core/dev.c:6727
 __do_softirq+0x10d/0x3be kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
 </IRQ>
 do_softirq.part.0+0x26/0x30 kernel/softirq.c:337
 do_softirq arch/x86/include/asm/preempt.h:26 [inline]
 __local_bh_enable_ip+0x46/0x50 kernel/softirq.c:189
 local_bh_enable include/linux/bottom_half.h:32 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:690 [inline]
 ip_finish_output2+0x4a9/0xd60 net/ipv4/ip_output.c:229
 __ip_finish_output net/ipv4/ip_output.c:306 [inline]
 __ip_finish_output+0x1dc/0x420 net/ipv4/ip_output.c:288
 ip_finish_output net/ipv4/ip_output.c:316 [inline]
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0x12b/0x240 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:435 [inline]
 ip_local_out+0x6b/0x80 net/ipv4/ip_output.c:125
 __ip_queue_xmit+0x372/0x9b0 net/ipv4/ip_output.c:530
 __tcp_transmit_skb+0xdb6/0x1a60 net/ipv4/tcp_output.c:1238
 tcp_transmit_skb net/ipv4/tcp_output.c:1254 [inline]
 tcp_connect+0x1281/0x1820 net/ipv4/tcp_output.c:3671
 tcp_v4_connect+0xb02/0xc50 net/ipv4/tcp_ipv4.c:311
 __inet_stream_connect+0x227/0x7f0 net/ipv4/af_inet.c:658
 inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:722
 mptcp_stream_connect+0x12e/0x220 net/mptcp/protocol.c:1807
 __sys_connect_file+0xcf/0xe0 net/socket.c:1854
 __sys_connect+0x160/0x190 net/socket.c:1871
 __do_sys_connect net/socket.c:1882 [inline]
 __se_sys_connect net/socket.c:1879 [inline]
 __x64_sys_connect+0x3e/0x50 net/socket.c:1879
 do_syscall_64+0x8a/0x290 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fbd64788469
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe13606918 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 000000000005638d RCX: 00007fbd64788469
RDX: 000000000000004d RSI: 0000000020000040 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000400f90 R09: 0000000000400f90
R10: 0000000000400f90 R11: 0000000000000246 R12: 0000000000400bb2
R13: 00007ffe13606a20 R14: 0000000000000000 R15: 0000000000000000

syz-repro:

# {Threaded:false Collide:false Repeat:true RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false UseTmpDir:false HandleSegv:false Repro:false Trace:false}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r1, &(0x7f00000013c0)={0x2, 0x4e20, @multicast2}, 0x10)
connect$inet(r1, &(0x7f0000000040)={0x2, 0x0, @loopback}, 0x10)
listen(r1, 0x3)
connect$inet(r0, &(0x7f0000000040)={0x2, 0x4e20, @loopback}, 0x4d)

C-repro:

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(uint64_t ms)
{
  usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts))
    exit(1);
  return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
  char buf[1024];
  va_list args;
  va_start(args, what);
  vsnprintf(buf, sizeof(buf), what, args);
  va_end(args);
  buf[sizeof(buf) - 1] = 0;
  int len = strlen(buf);
  int fd = open(file, O_WRONLY | O_CLOEXEC);
  if (fd == -1)
    return false;
  if (write(fd, buf, len) != len) {
    int err = errno;
    close(fd);
    errno = err;
    return false;
  }
  close(fd);
  return true;
}

static void kill_and_wait(int pid, int* status)
{
  kill(-pid, SIGKILL);
  kill(pid, SIGKILL);
  int i;
  for (i = 0; i < 100; i++) {
    if (waitpid(-1, status, WNOHANG | __WALL) == pid)
      return;
    usleep(1000);
  }
  DIR* dir = opendir("/sys/fs/fuse/connections");
  if (dir) {
    for (;;) {
      struct dirent* ent = readdir(dir);
      if (!ent)
        break;
      if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
        continue;
      char abort[300];
      snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
               ent->d_name);
      int fd = open(abort, O_WRONLY);
      if (fd == -1) {
        continue;
      }
      if (write(fd, abort, 1) < 0) {
      }
      close(fd);
    }
    closedir(dir);
  } else {
  }
  while (waitpid(-1, status, __WALL) != pid) {
  }
}

static void setup_test()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  write_file("/proc/self/oom_score_adj", "1000");
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
  int iter;
  for (iter = 0;; iter++) {
    int pid = fork();
    if (pid < 0)
      exit(1);
    if (pid == 0) {
      setup_test();
      execute_one();
      exit(0);
    }
    int status = 0;
    uint64_t start = current_time_ms();
    for (;;) {
      if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
        break;
      sleep_ms(1);
      if (current_time_ms() - start < 5 * 1000)
        continue;
      kill_and_wait(pid, &status);
      break;
    }
  }
}

uint64_t r[2] = {0xffffffffffffffff, 0xffffffffffffffff};

void execute_one(void)
{
  intptr_t res = 0;
  res = syscall(__NR_socket, 2ul, 1ul, 0x106);
  if (res != -1)
    r[0] = res;
  res = syscall(__NR_socket, 2ul, 1ul, 0x106);
  if (res != -1)
    r[1] = res;
  *(uint16_t*)0x200013c0 = 2;
  *(uint16_t*)0x200013c2 = htobe16(0x4e20);
  *(uint32_t*)0x200013c4 = htobe32(0xe0000002);
  syscall(__NR_bind, r[1], 0x200013c0ul, 0x10ul);
  *(uint16_t*)0x20000040 = 2;
  *(uint16_t*)0x20000042 = htobe16(0);
  *(uint32_t*)0x20000044 = htobe32(0x7f000001);
  syscall(__NR_connect, r[1], 0x20000040ul, 0x10ul);
  syscall(__NR_listen, r[1], 3);
  *(uint16_t*)0x20000040 = 2;
  *(uint16_t*)0x20000042 = htobe16(0x4e20);
  *(uint32_t*)0x20000044 = htobe32(0x7f000001);
  syscall(__NR_connect, r[0], 0x20000040ul, 0x4dul);
}
int main(void)
{
  syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 3ul, 0x32ul, -1, 0ul);
  loop();
  return 0;
}

sort-out {set,get}sockopt handling

{set,get}sockopt should be usable even on non-fallback sockets.

A possible implementation:

Sort all possible setsockopt values (SOCKET, IP and TCP level) into different sets:

  • options applied (set) to all subflows and read (get) from the first one
    (e.g. most/all??? IP level options)

  • options applied/read only to the first subflow
    (???TCP_INFO, SO_BINDTODEVICE???)

  • options available only for fallback socket (or only when a single sub-flow is available)
    (???TCP_INFO, SO_BINDTODEVICE???, TCP_CC_INFO, TCP_REPAIR*)

[possibly get rid of one of the last 2 sets]

Add an MPTCP_SUBFLOW_NR getsockopt to fetch the current number of subflows, and an MPTCP_SUBFLOW {get,set}sockopt to get/set the specified option on the specified subflow.

non-blocking connect fails most of the time

mptcp_stream_connect() does not work well in non-blocking mode. The usage pattern is:

connect()          // -EINPROGRESS
select()/poll()
connect()          // here we get -EINVAL most of the time

The problem is that mptcp_stream_connect() does not track the 'connecting' status: on the 2nd invocation, if the MPC handshake is completed, it tries again to create the main/first subflow and fails.
The solution is to also set 'sock->state' and handle SS_CONNECTING correctly.
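For reference, the user-space side of the pattern described in this issue can be sketched as follows. `nonblocking_connect()` is a hypothetical helper, not part of any library; the protocol is passed in so the same code can be tried with IPPROTO_TCP or IPPROTO_MPTCP (0x106, the value used by the reproducers). On a stack that tracks the connecting state, the second connect() is expected to return 0 or fail with EISCONN rather than EINVAL.

```c
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* 0x106 */
#endif

/*
 * Hypothetical helper: connect in non-blocking mode, wait for the
 * handshake with poll(), then call connect() a second time to collect
 * the result. Returns the connected fd, or -1 with errno set.
 */
static int nonblocking_connect(int protocol, const struct sockaddr_in *addr,
			       int timeout_ms)
{
	struct pollfd pfd;
	int fd, n, saved;

	fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, protocol);
	if (fd < 0)
		return -1;

	/* First connect(): normally fails with EINPROGRESS. */
	if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
		return fd;
	if (errno != EINPROGRESS)
		goto fail;

	/* Wait until the socket becomes writable or reports an error. */
	pfd.fd = fd;
	pfd.events = POLLOUT;
	pfd.revents = 0;
	n = poll(&pfd, 1, timeout_ms);
	if (n == 0)
		errno = ETIMEDOUT;
	if (n <= 0)
		goto fail;

	/* Second connect(): 0 or EISCONN means the handshake completed. */
	if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0 ||
	    errno == EISCONN)
		return fd;

fail:
	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}
```

Calling `nonblocking_connect(IPPROTO_MPTCP, &addr, 1000)` against a listening MPTCP socket exercises the path discussed here; with the bug present, the second connect() is where -EINVAL would surface.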

fix fallback to TCP

Fallback to TCP after the three-way handshake is currently not functional on the server side, as the msk lacks the 'subflow' field required for that.

Additionally, we support neither infinite mapping nor checking for a later fallback.

Support for infinite mapping will allow fixing the above and will simplify the recvmsg and poll fallback paths; sendmsg still needs an explicit fallback check.

refactor token container

The radix_tree lock is contended by every incoming connection: the tree currently requires a lock for lookup and a lock for traversal.

We need fast, uncontended traversal for #19 and #20.

The idea is to replace the radix tree with a large hash table.
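To illustrate why a hash table helps with contention, here is a minimal user-space sketch with one lock per bucket, so operations on tokens that land in different buckets never wait on each other. This is not the kernel implementation: all names (`token_table`, `token_insert()`, `token_lookup()`), the bucket count, and the use of pthread mutexes are assumptions made for the example, and tokens are assumed to be well distributed already.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOKEN_BUCKETS 1024u /* power of two; size chosen arbitrarily */

struct token_entry {
	uint32_t token;
	void *owner;              /* stands in for the msk or request socket */
	struct token_entry *next;
};

struct token_bucket {
	pthread_mutex_t lock;     /* protects only this bucket's chain */
	struct token_entry *head;
};

static struct token_bucket token_table[TOKEN_BUCKETS];

static void token_table_init(void)
{
	for (unsigned int i = 0; i < TOKEN_BUCKETS; i++) {
		pthread_mutex_init(&token_table[i].lock, NULL);
		token_table[i].head = NULL;
	}
}

static struct token_bucket *token_bucket_of(uint32_t token)
{
	return &token_table[token & (TOKEN_BUCKETS - 1)];
}

/* Returns false if the token is already present or allocation fails. */
static bool token_insert(uint32_t token, void *owner)
{
	struct token_bucket *b = token_bucket_of(token);
	struct token_entry *e;
	bool ok = false;

	pthread_mutex_lock(&b->lock);
	for (e = b->head; e; e = e->next)
		if (e->token == token)
			goto out;
	e = malloc(sizeof(*e));
	if (!e)
		goto out;
	e->token = token;
	e->owner = owner;
	e->next = b->head;
	b->head = e;
	ok = true;
out:
	pthread_mutex_unlock(&b->lock);
	return ok;
}

/* Returns the owner registered for the token, or NULL if not found. */
static void *token_lookup(uint32_t token)
{
	struct token_bucket *b = token_bucket_of(token);
	struct token_entry *e;
	void *owner = NULL;

	pthread_mutex_lock(&b->lock);
	for (e = b->head; e; e = e->next) {
		if (e->token == token) {
			owner = e->owner;
			break;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return owner;
}
```

Traversing all entries then means walking the buckets one at a time, holding only the lock of the bucket being visited. An in-kernel version would more likely use spinlocks or RCU for the read side, which this sketch does not attempt to model.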

[syzkaller] WARNING in mptcp_incoming_options

No reproducer yet...

------------[ cut here ]------------
WARNING: CPU: 1 PID: 30472 at net/mptcp/options.c:720 check_fully_established net/mptcp/options.c:720 [inline]
WARNING: CPU: 1 PID: 30472 at net/mptcp/options.c:720 mptcp_incoming_options+0xb89/0xba0 net/mptcp/options.c:826
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 30472 Comm: syz-executor.6 Not tainted 5.6.0 #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x11d/0x181 lib/dump_stack.c:118
 panic+0x210/0x63b kernel/panic.c:221
 __warn.cold+0x36/0x3f kernel/panic.c:582
 report_bug+0x1f4/0x230 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:175 [inline]
 fixup_bug arch/x86/kernel/traps.c:170 [inline]
 do_error_trap+0x97/0xc0 arch/x86/kernel/traps.c:267
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:check_fully_established net/mptcp/options.c:720 [inline]
RIP: 0010:mptcp_incoming_options+0xb89/0xba0 net/mptcp/options.c:826
Code: d0 fe 48 8b 7c 24 10 e8 45 af e7 ff 48 8b 7c 24 18 e8 4b 3e d8 fe 41 0f b6 44 24 48 88 44 24 28 e9 60 f5 ff ff e8 a7 f7 d0 fe <0f> 0b e9 28 fe ff ff e8 9b 74 b8 fe 90 66 2e 0f 1f 84 00 00 00 00
RSP: 0018:ffffc900135ab950 EFLAGS: 00010293
RAX: ffff88808c232180 RBX: ffff88810731d078 RCX: ffffffff82685e4f
RDX: 0000000000000000 RSI: ffffffff82686029 RDI: 0000000000000001
RBP: ffffc900135ab9d0 R08: ffff88808c232180 R09: 0000888101fde840
R10: 0000888101fde849 R11: 0000888101fde843 R12: ffff888101fde800
R13: ffff888139f3e500 R14: 0000000000000040 R15: 00000000c6d2182c
 tcp_data_queue+0x73d/0x2150 net/ipv4/tcp_input.c:4777
 tcp_rcv_state_process+0x726/0x23e1 net/ipv4/tcp_input.c:6387
 tcp_v4_do_rcv+0x220/0x510 net/ipv4/tcp_ipv4.c:1643
 sk_backlog_rcv include/net/sock.h:996 [inline]
 __release_sock+0x135/0x1e0 net/core/sock.c:2460
 tcp_close+0x438/0xa30 net/ipv4/tcp.c:2438
 inet_release+0x86/0x100 net/ipv4/af_inet.c:427
 __sock_release+0x12b/0x160 net/socket.c:605
 sock_release+0x21/0x30 net/socket.c:625
 __mptcp_close_ssk net/mptcp/protocol.c:1069 [inline]
 mptcp_close+0x1ca/0x500 net/mptcp/protocol.c:1292
 inet_release+0x86/0x100 net/ipv4/af_inet.c:427
 __sock_release+0x85/0x160 net/socket.c:605
 sock_close+0x24/0x30 net/socket.c:1283
 __fput+0x1eb/0x4f0 fs/file_table.c:280
 ____fput+0x1f/0x30 fs/file_table.c:313
 task_work_run+0xc2/0x130 kernel/task_work.c:123
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x228/0x230 arch/x86/entry/common.c:165
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
 do_syscall_64+0x39d/0x3f0 arch/x86/entry/common.c:305
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f262df7528d
Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007fff2660f5a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00007f262df7528d
RDX: 0000000000000000 RSI: 0000000000001044 RDI: 0000000000000003
RBP: 0000000000000004 R08: 0000000000670a00 R09: 00000000b0077044
R10: 00000000b0077048 R11: 0000000000000293 R12: 00000000000001f4
R13: 000000000009b8f8 R14: 0000000000670a00 R15: 0000000000000bb8
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 1 seconds..

[syzkaller] INFO: task hung in lock_sock_nested

Head is:

61820897eea1 ("add mptcp_token_destroy")  (HEAD) (5 minutes ago) <Christoph Paasch>
b0d0a0e4a622 ("Squash-to: "mptcp: refactor token container."")  (5 minutes ago) <Paolo Abeni>
f20592a47bdd ("Cleanup")  (5 minutes ago) <Christoph Paasch>
5d3e110f5b3f ("Paolos pastebin")  (5 minutes ago) <Christoph Paasch>
39e00acbdf14 ("FIX inet_csk_prepare_for_destroy_sock")  (5 minutes ago) <Christoph Paasch>
5c71d0591dde ("net: mptcp: improve fallback to TCP")  (5 minutes ago) <Davide Caratti>
f2ddcb183129 ("mptcp: add receive buffer auto-tuning")  (5 minutes ago) <Florian Westphal>
425515bac500 ("[DO-NOT-MERGE] mptcp: enabled by default")  (tag: export/20200604T011812, mptcp_net-next/export) (16 hours ago) <Matthieu Baerts>
a6e5b8cb5b3d ("mptcp: introduce token KUNIT self-tests")  (16 hours ago) <Paolo Abeni>
c8d7079853ff ("mptcp: move crypto test to KUNIT")  (16 hours ago) <Paolo Abeni>
177c4645e9e4 ("mptcp: refactor token container.")  (16 hours ago) <Paolo Abeni>
c28bd2bcde26 ("mptcp: add __init annotation on setup functions")  (16 hours ago) <Paolo Abeni>
2ef1779dfe17 ("bpf: fix unused-var without NETDEVICES")  (16 hours ago) <Matthieu Baerts>
cb8e59cc8720 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")  (netnext/master, mptcp_net-next/net-next) (18 hours ago) <Linus Torvalds>
TCP: request_sock_subflow: Possible SYN flooding on port 20000. Sending cookies.  Check SNMP counters.
INFO: task syz-executor702:1686 blocked for more than 143 seconds.
      Not tainted 5.7.0 #92
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor702 D    0  1686   1680 0x00000000
Call Trace:
 context_switch kernel/sched/core.c:3430 [inline]
 __schedule+0x1e5/0x580 kernel/sched/core.c:4156
 schedule+0x45/0xb0 kernel/sched/core.c:4231
 __lock_sock+0x82/0xd0 net/core/sock.c:2524
 lock_sock_nested+0x69/0x70 net/core/sock.c:3049
 lock_sock include/net/sock.h:1576 [inline]
 inet_stream_connect+0x23/0x50 net/ipv4/af_inet.c:721
 mptcp_stream_connect+0x89/0x200 net/mptcp/protocol.c:1919
 __sys_connect_file+0x98/0xa0 net/socket.c:1854
 __sys_connect+0x109/0x140 net/socket.c:1871
 __do_sys_connect net/socket.c:1882 [inline]
 __se_sys_connect net/socket.c:1879 [inline]
 __x64_sys_connect+0x1a/0x20 net/socket.c:1879
 do_syscall_64+0x75/0x220 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa8821cb469

Syzkaller reproducer:

# {Threaded:false Collide:false Repeat:false RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false UseTmpDir:false HandleSegv:false Repro:false Trace:false}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r0, &(0x7f00000013c0)={0x2, 0x4e20}, 0x10)
listen(r0, 0x0)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
connect$inet(r1, &(0x7f0000000040)={0x2, 0x4e20, @loopback}, 0x4d)
r2 = accept(r0, 0x0, 0x0)
connect$inet(r2, &(0x7f0000000000)={0x2, 0x4e22, @loopback}, 0x10)

C reproducer:

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};

int main(void)
{
  syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 3ul, 0x32ul, -1, 0ul);
  intptr_t res = 0;
  res = syscall(__NR_socket, 2ul, 1ul, 0x106);
  if (res != -1)
    r[0] = res;
  *(uint16_t*)0x200013c0 = 2;
  *(uint16_t*)0x200013c2 = htobe16(0x4e20);
  *(uint32_t*)0x200013c4 = htobe32(0);
  syscall(__NR_bind, r[0], 0x200013c0ul, 0x10ul);
  syscall(__NR_listen, r[0], 0);
  res = syscall(__NR_socket, 2ul, 1ul, 0x106);
  if (res != -1)
    r[1] = res;
  *(uint16_t*)0x20000040 = 2;
  *(uint16_t*)0x20000042 = htobe16(0x4e20);
  *(uint32_t*)0x20000044 = htobe32(0x7f000001);
  syscall(__NR_connect, r[1], 0x20000040ul, 0x4dul);
  res = syscall(__NR_accept, r[0], 0ul, 0ul);
  if (res != -1)
    r[2] = res;
  *(uint16_t*)0x20000000 = 2;
  *(uint16_t*)0x20000002 = htobe16(0x4e22);
  *(uint32_t*)0x20000004 = htobe32(0x7f000001);
  syscall(__NR_connect, r[2], 0x20000000ul, 0x10ul);
  return 0;
}

CURRENT_CONFIG.txt

[Edit 06/05/2020: Updated HEAD, crashtrace and reproducers]

[syzkaller] KASAN: use-after-free Read in inet_diag_lock_handler

HEAD is at:

de8ea51c6cc1 ("selftests/mptcp: add diag interface tests") (HEAD) (13 minutes ago)
b2976e289e79 ("mptcp: add MPTCP socket diag interface") (13 minutes ago)
4742bdcfbb6c ("mptcp: add msk interations helper") (13 minutes ago)
acd1354bf282 ("inet_diag: support for wider protocol numbers") (13 minutes ago)
b8654ae35a36 ("subflow: do not create child subflow for fallback MP_JOIN") (13 minutes ago)
e26366200f9e ("subflow: introduce and use mptcp_can_accept_new_subflow()") (13 minutes ago)
5f6242f61b8a ("subflow: use rsk_ops->send_reset()") (13 minutes ago)
9582f476a82f ("subflow: explicitly check for plain tcp rsk") (13 minutes ago)
4bdbaf1cbd8e ("mptcp: cleanup subflow_finish_connect()") (13 minutes ago)
f8c46af ("DO-NOT-MERGE: mptcp: enabled by default") (tag: export/20200702T083013, mptcp_net-next/export) (8 hours ago)
99e7935 ("DO-NOT-MERGE: mptcp: use kmalloc on kasan build") (8 hours ago)
24c3ef7 ("DO-NOT-MERGE: fsnotify: suppress access/modify events on stream files") (8 hours ago)
bfd8737 ("mptcp: fix DSS map generation on fin retransmission") (8 hours ago)
3f88eb3 ("mptcp: support IPV6_V6ONLY setsockopt") (8 hours ago)
8c7f62e ("mptcp: add REUSEADDR/REUSEPORT support") (8 hours ago)
9dc4753 ("net: use mptcp setsockopt function for SOL_SOCKET on mptcp sockets") (8 hours ago)
04d70af ("mptcp: use mptcp worker for path management") (8 hours ago)
6b6caed ("selftests/mptcp: Capture pcap on both sender and receiver") (8 hours ago)
23212a7 ("Merge branch 'mptcp-add-receive-buffer-auto-tuning'") (mptcp_net-next/net-next) (16 hours ago)

Crash:

==================================================================
BUG: KASAN: use-after-free in inet_diag_lock_handler.part.0+0xae/0xb0 net/ipv4/inet_diag.c:58
Read of size 8 at addr ffff888118a1f808 by task syz-executor863/1271

CPU: 0 PID: 1271 Comm: syz-executor863 Not tainted 5.8.0-rc2 #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xbe/0xfe lib/dump_stack.c:118
 print_address_description.constprop.0+0x3a/0x60 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 inet_diag_lock_handler.part.0+0xae/0xb0 net/ipv4/inet_diag.c:58
 inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline]
 inet_diag_cmd_exact+0x1a1/0x2f0 net/ipv4/inet_diag.c:578
 inet_diag_get_exact_compat+0x226/0x2a0 net/ipv4/inet_diag.c:1267
 inet_diag_rcv_msg_compat+0x15b/0x2c0 net/ipv4/inet_diag.c:1289
 __sock_diag_cmd net/core/sock_diag.c:235 [inline]
 sock_diag_rcv_msg+0x2ca/0x3e0 net/core/sock_diag.c:264
 netlink_rcv_skb+0x156/0x420 net/netlink/af_netlink.c:2469
 sock_diag_rcv+0x24/0x40 net/core/sock_diag.c:275
 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
 netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
 netlink_sendmsg+0x809/0xd50 net/netlink/af_netlink.c:1918
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg net/socket.c:672 [inline]
 ____sys_sendmsg+0x774/0x8e0 net/socket.c:2363
 ___sys_sendmsg+0xff/0x170 net/socket.c:2417
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
 do_syscall_64+0x3e/0x70 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff0c5cf0469
Code: Bad RIP value.
RSP: 002b:00007ffdac0e7988 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffdac0e7998 RCX: 00007ff0c5cf0469
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000401110
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401034
R13: 00007ffdac0e7b20 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 1235:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc mm/kasan/common.c:494 [inline]
 __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
 slab_post_alloc_hook mm/slab.h:586 [inline]
 slab_alloc_node mm/slub.c:2824 [inline]
 slab_alloc mm/slub.c:2832 [inline]
 kmem_cache_alloc+0xb7/0x220 mm/slub.c:2837
 __build_skb+0x21/0x60 net/core/skbuff.c:311
 build_skb+0x1a/0x1d0 net/core/skbuff.c:327
 e1000_clean_rx_irq+0x9e5/0x1340 drivers/net/ethernet/intel/e1000/e1000_main.c:4378
 e1000_clean+0x8cb/0x1b20 drivers/net/ethernet/intel/e1000/e1000_main.c:3800
 napi_poll net/core/dev.c:6690 [inline]
 net_rx_action+0x402/0xde0 net/core/dev.c:6760
 __do_softirq+0x18c/0x61a kernel/softirq.c:292

Freed by task 1235:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 kasan_set_free_info mm/kasan/common.c:316 [inline]
 __kasan_slab_free+0x12f/0x180 mm/kasan/common.c:455
 slab_free_hook mm/slub.c:1474 [inline]
 slab_free_freelist_hook mm/slub.c:1507 [inline]
 slab_free mm/slub.c:3072 [inline]
 kmem_cache_free+0x80/0x290 mm/slub.c:3088
 napi_skb_finish net/core/dev.c:5997 [inline]
 napi_gro_receive+0x37c/0x410 net/core/dev.c:6020
 e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4000 [inline]
 e1000_clean_rx_irq+0x894/0x1340 drivers/net/ethernet/intel/e1000/e1000_main.c:4455
 e1000_clean+0x8cb/0x1b20 drivers/net/ethernet/intel/e1000/e1000_main.c:3800
 napi_poll net/core/dev.c:6690 [inline]
 net_rx_action+0x402/0xde0 net/core/dev.c:6760
 __do_softirq+0x18c/0x61a kernel/softirq.c:292

The buggy address belongs to the object at ffff888118a1f780
 which belongs to the cache skbuff_head_cache of size 216
The buggy address is located 136 bytes inside of
 216-byte region [ffff888118a1f780, ffff888118a1f858)
The buggy address belongs to the page:
page:ffffea00046287c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x200000000000200(slab)
raw: 0200000000000200 dead000000000100 dead000000000122 ffff88811aa2ea00
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888118a1f700: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888118a1f780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888118a1f800: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
                      ^
 ffff888118a1f880: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
 ffff888118a1f900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
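
For readers cross-checking the report, the offset and the caret position in the shadow dump can be reproduced with a little arithmetic. The sketch below is only an illustration: it assumes generic KASAN's 8-byte granule and the 16-shadow-bytes-per-row layout visible in the dump above, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory. */
#define KASAN_GRANULE 8u
/* Assumption: the "Memory state" dump prints 16 shadow bytes per row. */
#define SHADOW_BYTES_PER_ROW 16u

/* Position of a shadow byte in the dump (hypothetical helper type). */
struct shadow_pos {
	uint64_t row_addr;   /* address printed at the start of the row */
	unsigned int column; /* index of the shadow byte within that row */
};

/* Faulting address, from the object start and the reported offset. */
uint64_t buggy_address(uint64_t object_start, uint64_t offset)
{
	return object_start + offset;
}

/* Locate the shadow byte that covers addr in the dump layout. */
struct shadow_pos shadow_position(uint64_t addr)
{
	/* Each row covers 16 * 8 = 128 bytes of memory. */
	uint64_t row_span = (uint64_t)KASAN_GRANULE * SHADOW_BYTES_PER_ROW;
	struct shadow_pos pos;

	pos.row_addr = addr - (addr % row_span);
	pos.column = (unsigned int)((addr % row_span) / KASAN_GRANULE);
	return pos;
}

/* Number of shadow bytes needed to cover an object of the given size. */
unsigned int shadow_bytes_for(uint64_t size)
{
	return (unsigned int)((size + KASAN_GRANULE - 1) / KASAN_GRANULE);
}
```

With the values from the report, `buggy_address(0xffff888118a1f780, 136)` gives `0xffff888118a1f808`, which falls in the row starting at `ffff888118a1f800`, second shadow byte, where the caret points. The 216-byte object needs 27 shadow bytes: the 16 `fb` bytes of the `f780` row plus the first 11 `fb` bytes of the `f800` row, followed by `fc` redzone bytes.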

syzkaller repro:

# {Threaded:false Collide:false Repeat:false RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false USB:false UseTmpDir:false HandleSegv:true Repro:false Trace:false}
r0 = syz_genetlink_get_family_id$devlink(0x0)
sendmsg$DEVLINK_CMD_PORT_SPLIT(0xffffffffffffffff, &(0x7f0000000300)={0x0, 0x0, &(0x7f00000002c0)={&(0x7f0000000200)=ANY=[@ANYBLOB="a0000000", @ANYRES16=r0, @ANYBLOB="000426bd7000fedbdf25090000000e0001006e657474657673696d0000000f0002006e657464657673696d30000008fd0300030000000800090008000000080001007063690011000200303030303a30303a31302e30000000000800030001ffffff08000900080000000e0001006e657464657673696d0000000f0002006e657464657673696d30000008000300030000000800090007000000"], 0xa0}, 0x1, 0x0, 0x0, 0x40}, 0x400c814)
r1 = socket$netlink(0x10, 0x3, 0x4)
r2 = syz_genetlink_get_family_id$devlink(&(0x7f00000000c0)='devlink\x00')
sendmsg$DEVLINK_CMD_PORT_SET(r1, &(0x7f0000000140)={0x0, 0x0, &(0x7f0000000100)={&(0x7f0000000200)={0x6c, r2, 0x1}, 0x6c}}, 0x0)

C-repro:
repro.txt

kernel-config:
CURRENT_CONFIG.txt

weighttp-test on 1KB file with mptcp.org-client in "ndiffports"-mode

Running the test "simple_abndiff" with mptcp.org-client and netnext server [1], I see netnext "stalling".

The test (https://github.com/multipath-tcp/mptcp-scripts/blob/master/testing/testing.py#L1725) creates 100 concurrent clients for 100000 requests of 1KB files. mptcp.org is configured with "ndiffports" 8, which means it will create 8 subflows to the primary IP address without waiting for ADD_ADDR.

At the beginning of the test, netnext reports SYN flooding:
server login: [ 165.274087] TCP: request_sock_subflow: Possible SYN flooding on port 80. Dropping request. Check SNMP counters.

weighttp is reporting connect-timeouts on the mptcp.org-client:
error: connect() failed on port 49932: Connection timed out (110).

Logging in to the netnext-server, I see all apache2 processes spinning at 90% CPU.
Also, I see sshd at 90% :-/

ADD_ADDR: echo bit support

The current implementation supports sending and receiving ADD_ADDR (v1), but not in a reliable way: the receiver is supposed to send the same ADD_ADDR back with the echo bit set to 1. If the sender does not receive this echo after some time, it can send the ADD_ADDR again.

As specified in RFC8684: https://www.rfc-editor.org/rfc/rfc8684.html#sec_add_address

The "E" flag exists to provide reliability for this option. Because this option will often be sent on pure ACKs, there is no guarantee of reliability. Therefore, a receiver receiving a fresh ADD_ADDR option (where E=0) will send the same option back to the sender, but not including the HMAC and with E=1, to indicate receipt. According to local policy, the lack of this type of "echo" can indicate to the initial ADD_ADDR sender that the ADD_ADDR needs to be retransmitted.

https://www.rfc-editor.org/rfc/rfc8684.html#section-3.4.1-9

For the moment, ADD_ADDR options with the echo bit set are ignored. Potentially, a host could retransmit the ADD_ADDR indefinitely because the current version doesn't send the ACK (the echo). It might be interesting to stop the retransmission once the address is used, after a number of retransmissions, etc.
→ To be validated with a version of the upstream kernel not sending the ACK (ADD_ADDR with the echo bit).
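
To make the expected behaviour concrete, here is a minimal userspace sketch of both sides: building the echo for a fresh ADD_ADDR (same option, E=1, no HMAC) and bounding retransmissions on the sender. The struct layouts and function names are hypothetical and are not the kernel's; the limit of 3 retransmissions is only an assumed local policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of an ADD_ADDR (v1) option. */
struct add_addr_opt {
	bool echo;        /* the "E" flag */
	uint8_t addr_id;
	uint8_t addr_len; /* 4 for IPv4, 16 for IPv6 */
	uint8_t addr[16];
	uint16_t port;    /* 0 if absent */
	bool has_hmac;
	uint8_t hmac[8];  /* truncated HMAC, only present when E=0 */
};

/* Receiver side: build the echo of a fresh ADD_ADDR.
 * Returns false if `in` is itself an echo (an echo is never echoed). */
bool add_addr_build_echo(const struct add_addr_opt *in, struct add_addr_opt *out)
{
	if (in->echo)
		return false;
	*out = *in;
	out->echo = true;      /* E=1 indicates receipt */
	out->has_hmac = false; /* the echo does not carry the HMAC */
	memset(out->hmac, 0, sizeof(out->hmac));
	return true;
}

#define ADD_ADDR_MAX_RETRANS 3 /* assumed local policy */

/* Sender side: per-announcement retransmission state (hypothetical). */
struct add_addr_tx_state {
	uint8_t addr_id;
	unsigned int retrans;
	bool echoed;
};

/* Called when an ADD_ADDR with E=1 arrives. */
void add_addr_echo_received(struct add_addr_tx_state *st,
			    const struct add_addr_opt *echo)
{
	if (echo->echo && echo->addr_id == st->addr_id)
		st->echoed = true;
}

/* Called when the retransmission timer fires.
 * Returns true if the ADD_ADDR should be sent again. */
bool add_addr_should_retransmit(struct add_addr_tx_state *st)
{
	if (st->echoed || st->retrans >= ADD_ADDR_MAX_RETRANS)
		return false;
	st->retrans++;
	return true;
}
```

Bounding the retransmissions covers the case described above, where the peer never sends the echo.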

(Feature from the initial roadmap)

REMOVE_ADDR support

As defined in RFC8684:

If, during the lifetime of an MPTCP connection, a previously announced address becomes invalid (e.g., if the interface disappears or an IPv6 address is no longer preferred), the affected host SHOULD announce this situation so that the peer can remove subflows related to this address. Even if an address is not in use by an MPTCP connection, if it has been previously announced, an implementation SHOULD announce its removal. A host MAY also choose to announce that a valid IP address should not be used any longer -- for example, for make-before-break session continuity.
This is achieved through the Remove Address (REMOVE_ADDR) option (Figure 13), which will remove a previously added address (or list of addresses) from a connection and terminate any subflows currently using that address.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
|     Kind      |Length = 3 + n |Subtype|(resvd)|   Address ID  | ...
+---------------+---------------+-------+-------+---------------+
                           (followed by n-1 Address IDs, if required)

Figure 13: Remove Address (REMOVE_ADDR) Option

For security purposes, if a host receives a REMOVE_ADDR option, it must ensure that the affected path or paths are no longer in use before it instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger the sending of a TCP keepalive [RFC1122] on the path, and if a response is received, the path SHOULD NOT be removed. If the path is found to still be alive, the receiving host SHOULD no longer use the specified address for future connections, but it is the responsibility of the host that sent the REMOVE_ADDR to shut down the subflow. Before the address is removed, the requesting host MAY also use MP_PRIO (Section 3.3.8) to request that a path no longer be used. Typical TCP validity tests on the subflow (e.g., ensuring that sequence and ACK numbers are correct) MUST also be undertaken. An implementation can use indications of these test failures as part of intrusion detection or error logging.
The sending and receipt (if no keepalive response was received) of this message SHOULD trigger the sending of RSTs by both hosts on the affected subflow(s) (if possible), as a courtesy, to allow the cleanup of middlebox state before cleaning up any local state.
Address removal is undertaken according to the Address ID, so as to permit the use of NATs and other middleboxes that rewrite source addresses. If an Address ID is not known, the receiver will silently ignore the request.
A subflow that is still functioning MUST be closed with a FIN exchange as in regular TCP, rather than using this option. For more information, see Section 3.3.3.
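
As an illustration of the layout in Figure 13, the sketch below parses a raw REMOVE_ADDR option into its list of Address IDs. It is a standalone userspace example, not the kernel parser: the function name is hypothetical, while the option kind (30) and the REMOVE_ADDR subtype (0x4) are the values assigned by RFC 8684.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_MPTCP          30  /* MPTCP option kind */
#define MPTCP_SUB_REMOVE_ADDR 0x4 /* REMOVE_ADDR subtype */
#define REMOVE_ADDR_BASE_LEN  3   /* kind + length + subtype/reserved */

/* Parse a REMOVE_ADDR option of opt_len bytes.
 * Copies the Address IDs into ids[] and returns how many were found,
 * or -1 if the option is malformed or does not fit in ids[]. */
int remove_addr_parse(const uint8_t *opt, size_t opt_len,
		      uint8_t *ids, size_t max_ids)
{
	size_t n, i;

	/* Length = 3 + n, with at least one Address ID. */
	if (opt_len < REMOVE_ADDR_BASE_LEN + 1)
		return -1;
	if (opt[0] != TCPOPT_MPTCP || opt[1] != opt_len)
		return -1;
	/* The subtype sits in the upper 4 bits of the third byte. */
	if ((opt[2] >> 4) != MPTCP_SUB_REMOVE_ADDR)
		return -1;

	n = opt_len - REMOVE_ADDR_BASE_LEN;
	if (n > max_ids)
		return -1;
	for (i = 0; i < n; i++)
		ids[i] = opt[REMOVE_ADDR_BASE_LEN + i];
	return (int)n;
}
```

A receiver would then look up each returned ID and silently ignore the ones it does not know, as required by the text above.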

https://www.rfc-editor.org/rfc/rfc8684.html#name-remove-address

If needed, new tickets can be created to track: receiver/sender side only and packetdrill support.

(Feature from the initial roadmap)

keep a single work struct in mptcp socket

Currently we have 2 different work_struct in mptcp_sock: one for rtx/ack/eof and one for the PM. We can keep only one of them, reducing the memory usage and likely increasing the efficiency: if multiple events are scheduled in a short time frame, we will acquire the lock only once.
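
A rough userspace model of the idea, to show why a single work item can help: events are recorded as bits in a pending mask, and the one worker drains all of them under a single lock acquisition. All names here are hypothetical and the lock is only counted, not taken; this is a sketch of the pattern, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event bits, one per reason the worker may need to run. */
enum msk_work_event {
	WORK_RTX = 1u << 0,
	WORK_ACK = 1u << 1,
	WORK_EOF = 1u << 2,
	WORK_PM  = 1u << 3,
};

/* Toy stand-in for the MPTCP socket. */
struct msk_model {
	unsigned int pending;           /* events waiting for the worker */
	bool work_queued;               /* is the single work item queued? */
	unsigned int lock_acquisitions; /* how often the worker took the lock */
	unsigned int handled;           /* events processed so far */
};

/* Record an event; returns true only if the work item had to be queued. */
bool msk_schedule(struct msk_model *msk, unsigned int event)
{
	msk->pending |= event;
	if (msk->work_queued)
		return false; /* already queued: the event piggybacks on it */
	msk->work_queued = true;
	return true;
}

/* The single worker: one lock round trip handles every pending event. */
void msk_worker(struct msk_model *msk)
{
	unsigned int events;

	msk->lock_acquisitions++; /* stands in for lock_sock() */
	events = msk->pending;
	msk->pending = 0;
	msk->work_queued = false;
	msk->handled |= events;   /* rtx/ack/eof/PM handling would go here */
	/* release_sock() would go here */
}
```

Scheduling `WORK_RTX` and then `WORK_PM` before the worker runs results in one queued work item and one lock acquisition, instead of two of each with separate work structs.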

MP_FASTCLOSE support (send part remaining)

As defined in RFC8684:

Regular TCP has the means of sending a RST signal to abruptly close a connection. With MPTCP, a regular RST only has the scope of the subflow; it will only close the applicable subflow and will not affect the remaining subflows. MPTCP's connection will stay alive at the data level, in order to permit break-before-make handover between subflows. It is therefore necessary to provide an MPTCP-level "reset" to allow the abrupt closure of the whole MPTCP connection; this is done via the MP_FASTCLOSE option.

MP_FASTCLOSE is used to indicate to the peer that the connection will be abruptly closed and no data will be accepted anymore. The reasons for triggering an MP_FASTCLOSE are implementation specific. Regular TCP does not allow the sending of a RST while the connection is in a synchronized state [RFC0793]. Nevertheless, implementations allow the sending of a RST in this state if, for example, the operating system is running out of resources. In these cases, MPTCP should send the MP_FASTCLOSE. This option is illustrated in Figure 14.

                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +---------------+---------------+-------+-----------------------+
 |     Kind      |    Length     |Subtype|      (reserved)       |
 +---------------+---------------+-------+-----------------------+
 |                      Option Receiver's Key                    |
 |                            (64 bits)                          |
 |                                                               |
 +---------------------------------------------------------------+

Figure 14: Fast Close (MP_FASTCLOSE) Option
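
The layout in Figure 14 can be sketched as a small serializer and a key check. This is a standalone userspace illustration with hypothetical function names; it assumes the 64-bit key is handled as an integer written in network byte order. The option kind (30), the MP_FASTCLOSE subtype (0x7) and the 12-byte length follow RFC 8684.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_MPTCP        30  /* MPTCP option kind */
#define MPTCP_SUB_FASTCLOSE 0x7 /* MP_FASTCLOSE subtype */
#define FASTCLOSE_LEN       12  /* 4-byte header + 64-bit key */

/* Write an MP_FASTCLOSE option carrying the peer's key into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t fastclose_write(uint8_t *buf, size_t buf_len, uint64_t peer_key)
{
	int i;

	if (buf_len < FASTCLOSE_LEN)
		return 0;
	buf[0] = TCPOPT_MPTCP;
	buf[1] = FASTCLOSE_LEN;
	buf[2] = MPTCP_SUB_FASTCLOSE << 4; /* subtype, upper reserved bits 0 */
	buf[3] = 0;                        /* remaining reserved bits */
	for (i = 0; i < 8; i++)            /* key, most significant byte first */
		buf[4 + i] = (uint8_t)(peer_key >> (56 - 8 * i));
	return FASTCLOSE_LEN;
}

/* Check that a received option is a well-formed MP_FASTCLOSE
 * carrying our own key (the "Option Receiver's Key"). */
bool fastclose_key_matches(const uint8_t *opt, size_t opt_len, uint64_t local_key)
{
	uint64_t key = 0;
	int i;

	if (opt_len != FASTCLOSE_LEN || opt[0] != TCPOPT_MPTCP ||
	    opt[1] != FASTCLOSE_LEN || (opt[2] >> 4) != MPTCP_SUB_FASTCLOSE)
		return false;
	for (i = 0; i < 8; i++)
		key = (key << 8) | opt[4 + i];
	return key == local_key;
}
```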

If Host A wants to force the closure of an MPTCP connection, it can do so via two options:

  • Option A (ACK): Host A sends an ACK containing the MP_FASTCLOSE option on one subflow, containing the key of Host B as declared in the initial connection handshake. On all the other subflows, Host A sends a regular TCP RST to close these subflows and tears them down. Host A now enters FASTCLOSE_WAIT state.
  • Option R (RST): Host A sends a RST containing the MP_FASTCLOSE option on all subflows, containing the key of Host B as declared in the initial connection handshake. Host A can tear down the subflows and the connection immediately.

If Host A decides to force the closure by using Option A and sending an ACK with the MP_FASTCLOSE option, the connection shall proceed as follows:

  • Upon receipt of an ACK with MP_FASTCLOSE by Host B, containing the valid key, Host B answers on the same subflow with a TCP RST and tears down all subflows also through sending TCP RST signals. Host B can now close the whole MPTCP connection (it transitions directly to CLOSED state).
  • As soon as Host A has received the TCP RST on the remaining subflow, it can close this subflow and tear down the whole connection (transition from FASTCLOSE_WAIT state to CLOSED state). If Host A receives an MP_FASTCLOSE instead of a TCP RST, both hosts attempted fast closure simultaneously. Host A should reply with a TCP RST and tear down the connection.
  • If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE after one retransmission timeout (RTO) (the RTO of the subflow where the MP_FASTCLOSE has been sent), it SHOULD retransmit the MP_FASTCLOSE. To keep this connection from being retained for a long time, the number of retransmissions SHOULD be limited; this limit is implementation specific. A RECOMMENDED number is 3. If no TCP RST is received in response, Host A SHOULD send a TCP RST with the MP_FASTCLOSE option itself when it releases state in order to clear any remaining state at middleboxes.

If, however, Host A decides to force the closure by using Option R and sending a RST with the MP_FASTCLOSE option, Host B will act as follows: upon receipt of a RST with MP_FASTCLOSE, containing the valid key, Host B tears down all subflows by sending a TCP RST. Host B can now close the whole MPTCP connection (it transitions directly to CLOSED state).
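
The Option A sender behaviour quoted above (wait in FASTCLOSE_WAIT, retransmit on RTO, give up after a bounded number of attempts) can be modelled as a tiny state machine. The sketch below is a hypothetical userspace model of Host A only; the type and function names are invented for the example, and the limit of 3 is the value the RFC text recommends.

```c
#include <assert.h>

enum fc_state { FC_ESTABLISHED, FC_FASTCLOSE_WAIT, FC_CLOSED };

/* Events seen by Host A on the subflow that carried the MP_FASTCLOSE. */
enum fc_event { FC_EV_RST, FC_EV_FASTCLOSE, FC_EV_RTO };

/* What Host A should send in reaction to an event. */
enum fc_action {
	FC_ACT_NONE,
	FC_ACT_RETRANSMIT,         /* resend the ACK + MP_FASTCLOSE */
	FC_ACT_SEND_RST,           /* plain TCP RST */
	FC_ACT_SEND_RST_FASTCLOSE, /* RST carrying MP_FASTCLOSE */
};

#define FC_MAX_RETRANS 3 /* number recommended in the quoted RFC text */

struct fc_conn {
	enum fc_state state;
	unsigned int retrans;
};

/* Host A has sent the ACK + MP_FASTCLOSE and reset the other subflows. */
void fc_start(struct fc_conn *c)
{
	c->state = FC_FASTCLOSE_WAIT;
	c->retrans = 0;
}

enum fc_action fc_handle(struct fc_conn *c, enum fc_event ev)
{
	if (c->state != FC_FASTCLOSE_WAIT)
		return FC_ACT_NONE;

	switch (ev) {
	case FC_EV_RST:
		/* Peer answered with a RST: tear everything down. */
		c->state = FC_CLOSED;
		return FC_ACT_NONE;
	case FC_EV_FASTCLOSE:
		/* Simultaneous fast close: reply with a RST and close. */
		c->state = FC_CLOSED;
		return FC_ACT_SEND_RST;
	case FC_EV_RTO:
		if (c->retrans < FC_MAX_RETRANS) {
			c->retrans++;
			return FC_ACT_RETRANSMIT;
		}
		/* Give up: release state, clearing middleboxes on the way. */
		c->state = FC_CLOSED;
		return FC_ACT_SEND_RST_FASTCLOSE;
	}
	return FC_ACT_NONE;
}
```

The extra state and retransmission counter are exactly what Option R avoids, since there the sender can tear everything down immediately.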

https://www.rfc-editor.org/rfc/rfc8684.html#name-fast-close

If needed, new tickets can be created to track: receiver/sender side only and packetdrill support.

Note: there is a difference with MPTCPv0: https://www.rfc-editor.org/rfc/rfc8684.html#section-appendix.e-2.11

This document describes an additional way of performing a Fast Close -- by sending an MP_FASTCLOSE option on a RST on all subflows. This allows the host to tear down the subflows and the connection immediately.

The mptcp.org implementation only supports the other way (MP_FASTCLOSE on an ACK). It is not recommended to support that other way here, to avoid creating a new TCP state just to be able to retransmit the MP_FASTCLOSE.

  • Kernel: parse and act on incoming FASTCLOSE
  • Kernel: send MP_FASTCLOSE
  • Packetdrill tests incoming FC
  • Packetdrill tests outgoing FC

(Feature from the initial roadmap)

audit mptcp_disconnect()

The:

     lock_sock(sk);

looks suspicious (the lock should already be held by the caller).

And so does the call to:

     tcp_disconnect(sk, flags);

This is not a TCP socket, so we should likely loop over conn_list and join_list instead.
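
A toy userspace model of the suggested shape, purely to illustrate the two points: the function relies on the caller already holding the lock instead of taking it again, and it walks both subflow lists instead of treating the MPTCP socket as a TCP one. Every type and function here is hypothetical; the real fix would operate on the kernel's own structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy subflow: a singly linked list node with a connection flag. */
struct subflow_model {
	bool connected;
	struct subflow_model *next;
};

/* Toy MPTCP socket with the two lists mentioned above. */
struct msk_model {
	bool locked; /* true while the caller holds the socket lock */
	struct subflow_model *conn_list;
	struct subflow_model *join_list;
};

/* Disconnect every subflow on one list; returns how many were connected. */
static int disconnect_list(struct subflow_model *head)
{
	int count = 0;

	for (; head; head = head->next) {
		if (head->connected) {
			head->connected = false; /* per-subflow disconnect */
			count++;
		}
	}
	return count;
}

/* Must be called with the lock held; it is not taken again here.
 * Returns the number of subflows disconnected, or -1 if unlocked. */
int msk_disconnect(struct msk_model *msk)
{
	if (!msk->locked)
		return -1;
	return disconnect_list(msk->conn_list) +
	       disconnect_list(msk->join_list);
}
```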

loss and delay without reordering causes very slow transfer

When running the mptcp_connect kselftest, we can hit timeouts when losses and delays are high but there is no re-ordering, e.g.

00:28:22.462 # INFO: Using loss of 0.52% delay 381 ms on ns3eth4
00:28:22.513 # ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration   293ms) [ OK ]
00:28:23.075 # ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP  	(duration   178ms) [ OK ]
00:28:23.492 # ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP	(duration   166ms) [ OK ]
00:28:23.909 # ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP	(duration   231ms) [ OK ]
00:28:24.386 # ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP  	(duration   203ms) [ OK ]
00:28:24.834 # ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP	(duration   167ms) [ OK ]
00:28:25.253 # ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP	(duration   454ms) [ OK ]
00:28:25.944 # ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP	(duration   465ms) [ OK ]
00:28:26.651 # ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP	(duration   464ms) [ OK ]
00:28:27.363 # ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP	(duration   476ms) [ OK ]
00:28:28.087 # ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP	(duration   718ms) [ OK ]
00:28:29.063 # ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP	(duration   728ms) [ OK ]
00:28:30.038 # ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP	(duration   674ms) [ OK ]
00:28:30.972 # ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP	(duration   742ms) [ OK ]
00:28:31.964 # ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP	(duration 54811ms) [ OK ]
00:29:27.032 # ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP	(duration 58387ms) [ OK ]
00:30:25.712 # ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP	(duration   551ms) [ OK ]
00:30:26.542 # ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP	(duration   590ms) [ OK ]
00:30:27.419 # ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP	(duration   590ms) [ OK ]
00:30:28.281 # ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP	(duration   735ms) [ OK ]
00:30:29.290 # ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP	(duration   662ms) [ OK ]
00:30:30.239 # ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP	(duration   633ms) [ OK ]
00:30:31.144 # ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP	(duration 47109ms) [ OK ]
00:31:18.538 # ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP	(duration 64366ms) [ OK ]
00:32:23.171 # ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP	(duration   509ms) [ OK ]
00:32:23.925 # ns3 MPTCP -> ns1 (dead:beef:1::1:10025) MPTCP	(duration   564ms) [ OK ]
00:32:24.733 # ns3 MPTCP -> ns2 (10.0.1.2:10026      ) MPTCP	(duration   467ms) [ OK ]
00:32:25.450 # ns3 MPTCP -> ns2 (dead:beef:1::2:10027) MPTCP	(duration   473ms) [ OK ]
00:32:26.178 # ns3 MPTCP -> ns2 (10.0.2.1:10028      ) MPTCP	(duration   456ms) [ OK ]
00:32:31.022 # ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP	(duration   466ms) [ OK ]
00:32:31.022 # ns3 MPTCP -> ns4 (10.0.3.1:10030      ) MPTCP	(duration 43213ms) [ OK ]
00:33:11.030 # ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP	(duration 43608ms) [ OK ]
00:33:54.904 # ns4 MPTCP -> ns1 (10.0.1.1:10032      ) MPTCP	(duration 42663ms) [ OK ]
00:34:37.837 # ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP	(duration 43245ms) [ OK ]
00:35:21.343 # ns4 MPTCP -> ns2 (10.0.1.2:10034      ) MPTCP	./mptcp_connect.sh: line 114:  1124 Terminated              ip netns exec ${listener_ns} ./mptcp_connect -t $timeout -l -p $port -s ${srv_proto} $extra_args $local_addr < "$sin" > "$sout"
00:35:46.159 # ./mptcp_connect.sh: line 114:  1130 Terminated              ip netns exec ${connector_ns} ./mptcp_connect -t $timeout -p $port -s ${cl_proto} $extra_args $connect_addr < "$cin" > "$cout"
00:35:46.413 #
00:35:46.415 not ok 1 selftests: net/mptcp: mptcp_connect.sh # TIMEOUT

We only have this issue when there is no re-ordering added with TC netem.

Note: when it is fixed, it could be good to reduce the default timeout, linked to https://patchwork.ozlabs.org/patch/1196109/

Data Acknowledgement if single node is used

MPTCP uses Data Acknowledgements in order to retransmit data if one of the nodes fails permanently. However, if there is only a single node between an MPTCP-capable sender and an MPTCP-capable receiver, is the Data Acknowledgement still operating?
