Code Monkey home page Code Monkey logo

Comments (12)

stapelberg avatar stapelberg commented on May 14, 2024 1

Thanks for the repro program.

On my machine, it prints:

2020/07/22 23:26:08 Making ever more rules...
2020/07/22 23:26:08 Trying 75 rules
2020/07/22 23:26:09 Trying 80 rules
2020/07/22 23:26:09 Trying 85 rules
2020/07/22 23:26:09 Trying 90 rules
2020/07/22 23:26:09 Trying 95 rules
2020/07/22 23:26:09 Trying 100 rules
2020/07/22 23:26:09 Trying 105 rules
2020/07/22 23:26:09 Trying 110 rules
2020/07/22 23:26:09 Trying 115 rules
2020/07/22 23:26:09 Trying 120 rules
2020/07/22 23:26:09 Trying 125 rules
2020/07/22 23:26:09 Trying 130 rules
2020/07/22 23:26:09 Trying 135 rules
2020/07/22 23:26:09 Trying 140 rules
2020/07/22 23:26:09 nftables error: Receive: netlink receive: recvmsg: no buffer space available

My guess would be that due to the NLM_F_ACK flag, each request generates a reply, but the combination of replies exceeds the socket buffer size, and hence the kernel returns -ENOBUFS.

Iā€™m not yet sure what the best way to fix this is:

  1. Do nothing at the library level, and instead rely on users not causing too large reply packets.
  2. Stop requesting acknowledgements to keep reply sizes small? Maybe this only reduces the likelihood of triggering the issue, in case there are replies which are always big.
  3. Reading https://patchwork.ozlabs.org/project/netdev/patch/20090323141858.6808.50788.stgit@Decadence/, I think setting the NoENOBUFS option results in dropping acknowledgements instead of returning -ENOBUFS. Instead of setting NoENOBUFS, we should probably stop requesting acknowledgements.

from nftables.

stapelberg avatar stapelberg commented on May 14, 2024

@mdlayher any thoughts?

from nftables.

mdlayher avatar mdlayher commented on May 14, 2024

Unfortunately I don't have any more info. I've never run into this error in my work and only chose to implement the option because it appeared in a list in the man pages.

from nftables.

danopia avatar danopia commented on May 14, 2024

Ok thanks, I'll see if I can capture and minify a repro for this. I assume without that there's basically nothing to look at šŸ˜„

I am running my code on two different Debian releases, 10 and sid, and see the error frequently on both. (I've modified my calling code to consider ENOBUFS as a successful Flush(), instead of patching NoENOBUFS on, which might be a more reasonable workaround. Really not sure though.)

from nftables.

mdlayher avatar mdlayher commented on May 14, 2024

I took a look at the mailing list thread you mentioned above and it seems to indicate that userspace cannot keep up with the kernel's messages.

When I wrote my netlink code, simplicity and correctness were the primary goals. I know there are some outstanding work items related to OS thread locking and etc that can slow things down, but since I haven't run into them in my use cases, it's low priority work for me.

You could try DisableNSLockThread in https://godoc.org/github.com/mdlayher/netlink#Config but I have no other advice at the moment unfortunately.

from nftables.

danopia avatar danopia commented on May 14, 2024

I took a look at the mailing list thread you mentioned above and it seems to indicate that userspace cannot keep up with the kernel's messages.

That's the part that I don't really understand--it makes sense to fall behind when processing individual packets in userspace as the thread indicates, but I'm just sending ~100 rules to the kernel in one batch and waiting for an ACK batch back. It seems like it should be hard to fall behind on that? I suppose it's related to how the ACK includes handles for rules that are created.

You could try DisableNSLockThread in https://godoc.org/github.com/mdlayher/netlink#Config but I have no other advice at the moment unfortunately.

This is the case already:

nftables/conn.go

Lines 92 to 98 in 26bcabf

disableNSLockThread := false
if cc.NetNS == 0 {
// Since process is operating in main namespace, there is no reason to have NSLockThread lock,
// as it causes some performance penalties.
disableNSLockThread = true
}
return netlink.Dial(unix.NETLINK_NETFILTER, &netlink.Config{NetNS: cc.NetNS, DisableNSLockThread: disableNSLockThread})
and I've confirmed that it's passing true for me.


Here's a super simplified reproduction that just puts a bunch of accept rules into one table/chain: https://gist.github.com/danopia/437173efc0046962fafb746eccfe830e

On my laptop it reaches ENOBUFS at 138 accept rules. I also tried tcp dport 80 ip saddr 1.1.1.1 accept and ENOBUFS appeared at 104 rules. (So rule complexity is relevant but not as relevant as I'd expect.)

> go build .; and sudo setcap cap_net_admin+ep firewall-buf-repro; and ./firewall-buf-repro
2020/07/14 19:54:39 Making ever more rules...
2020/07/14 19:54:39 Trying 75 rules
2020/07/14 19:54:39 Trying 80 rules
2020/07/14 19:54:39 Trying 85 rules
2020/07/14 19:54:39 Trying 90 rules
2020/07/14 19:54:39 Trying 95 rules
2020/07/14 19:54:39 Trying 100 rules
2020/07/14 19:54:39 Trying 105 rules
2020/07/14 19:54:39 Trying 110 rules
2020/07/14 19:54:39 Trying 115 rules
2020/07/14 19:54:39 Trying 120 rules
2020/07/14 19:54:39 Trying 125 rules
2020/07/14 19:54:39 Trying 130 rules
2020/07/14 19:54:39 Trying 135 rules
2020/07/14 19:54:39 Trying 140 rules
2020/07/14 19:54:39 nftables error: Receive: netlink receive: recvmsg: no buffer space available

Does it make sense that this library is unable to acknowledge 150 accept rules in one flush? Is everyone who manages a couple hundred rules creating them piecemeal? I didn't think that this would be too many rules!

# one too many rules to receive Flush() confirmation on my laptop
> sudo nft list table demo-table
table ip demo-table {
        chain cluster-ips {
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
        }
}

from nftables.

mdlayher avatar mdlayher commented on May 14, 2024

Instead of setting NoENOBUFS, we should probably stop requesting acknowledgements.

Seems reasonable to me.

from nftables.

danopia avatar danopia commented on May 14, 2024

Thanks for looking into the netdev source. Since we're dealing with the kernel I suppose it makes sense that packets would get dropped instead of buffered when using NoENOBUFS.

I found the specific message for adding rules and noticed that it has both ECHO and ACK flags masked in:

flags = netlink.Request | netlink.Acknowledge | netlink.Create | unix.NLM_F_ECHO | unix.NLM_F_APPEND

So I decided to do a little bit of testing with the above accept repro:

  • With ECHO | ACK, my loop crashes at 140 rules.
  • With only ACK, it crashes at 280 rules.
  • With only ECHO, it also crashes at 280 rules.
  • With neither, it seems error free up to / past 1000 rules.

Then I adjusted the repro to include 8 verdicts per rule, accept accept accept accept accept accept accept accept:

  • With ECHO | ACK, it crashes at 105 rules
  • With only ACK, it crashes at 280 rules.
  • With only ECHO, it crashes at 170 rules.

So it seems like ACK isn't the only problem, especially as rule complexity is increased. ECHO is a larger contributor towards read buffer space. Frustratingly, I've noticed this unrelated work that depends on the ECHO responses: #88

Maybe there's need for configurability if the masking is being changed? So folks that want ECHO on each frame can keep it. I only care about knowing if the whole batch was accepted as a unit.

(I also toyed with adding calls like conn.SetReadBuffer(200*1000) in conn.go which directly changed the ENOBUFS boundary and could be made configurable, however rmem_max is still at play without resorting to SO_RCVBUFFORCE. So that's a separate hole to go down.)

from nftables.

stapelberg avatar stapelberg commented on May 14, 2024

Thanks for the details!

How does the nft(8) tool handle this issue?

from nftables.

danopia avatar danopia commented on May 14, 2024

When using nft --debug mnl -f ..., I see neither ECHO nor ACK flags on any of the packets. Thus it shouldn't have this problem at all.

When using nft --debug mnl --echo -f ..., I see only the ECHO flag added to all the packets. With this arg I actually see the nft(8) tool fail silently, after around 170 lines of tcp dport http ip saddr 1.1.1.1 accept repeated:

> sudo nft -f nft.txt
> echo $status
0

> sudo nft -f nft.txt --echo
> echo $status
1

I don't fully understand when nft(8) uses ACK. Passing --handle by itself seems to be a no-op. Passing --handle --echo starting using ACK for some packets, but the main rules packets still seem to only have ECHO masked in.

In summary, nft(8) doesn't handle this error at all, but it doesn't use ACK or ECHO by default either, so it generally avoids the issue overall.

from nftables.

stapelberg avatar stapelberg commented on May 14, 2024

Thanks for the details! Sounds like removing the ACK and ECHO flags is the best choice to make, then. Is there any downside to that Iā€™m not seeing? Would you be interested in sending a PR that removes them?

from nftables.

danopia avatar danopia commented on May 14, 2024

I'm closing this as stale. I ignore the error in my program and have had no further issues since then.

from nftables.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    šŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. šŸ“ŠšŸ“ˆšŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ā¤ļø Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.