Comments (12)
Thanks for the repro program.
On my machine, it prints:
2020/07/22 23:26:08 Making ever more rules...
2020/07/22 23:26:08 Trying 75 rules
2020/07/22 23:26:09 Trying 80 rules
2020/07/22 23:26:09 Trying 85 rules
2020/07/22 23:26:09 Trying 90 rules
2020/07/22 23:26:09 Trying 95 rules
2020/07/22 23:26:09 Trying 100 rules
2020/07/22 23:26:09 Trying 105 rules
2020/07/22 23:26:09 Trying 110 rules
2020/07/22 23:26:09 Trying 115 rules
2020/07/22 23:26:09 Trying 120 rules
2020/07/22 23:26:09 Trying 125 rules
2020/07/22 23:26:09 Trying 130 rules
2020/07/22 23:26:09 Trying 135 rules
2020/07/22 23:26:09 Trying 140 rules
2020/07/22 23:26:09 nftables error: Receive: netlink receive: recvmsg: no buffer space available
My guess would be that due to the NLM_F_ACK
flag, each request generates a reply, but the combination of replies exceeds the socket buffer size, and hence the kernel returns -ENOBUFS.
Iām not yet sure what the best way to fix this is:
- Do nothing at the library level, and instead rely on users not causing too large reply packets.
- Stop requesting acknowledgements to keep reply sizes small? Maybe this only reduces the likelihood of triggering the issue, in case there are replies which are always big.
- Reading https://patchwork.ozlabs.org/project/netdev/patch/20090323141858.6808.50788.stgit@Decadence/, I think setting the NoENOBUFS option results in dropping acknowledgements instead of returning
-ENOBUFS
. Instead of setting NoENOBUFS, we should probably stop requesting acknowledgements.
from nftables.
@mdlayher any thoughts?
from nftables.
Unfortunately I don't have any more info. I've never run into this error in my work and only chose to implement the option because it appeared in a list in the man pages.
from nftables.
Ok thanks, I'll see if I can capture and minify a repro for this. I assume without that there's basically nothing to look at š
I am running my code on two different Debian releases, 10 and sid, and see the error frequently on both. (I've modified my calling code to consider ENOBUFS
as a successful Flush(), instead of patching NoENOBUFS
on, which might be a more reasonable workaround. Really not sure though.)
from nftables.
I took a look at the mailing list thread you mentioned above and it seems to indicate that userspace cannot keep up with the kernel's messages.
When I wrote my netlink code, simplicity and correctness were the primary goals. I know there are some outstanding work items related to OS thread locking and etc that can slow things down, but since I haven't run into them in my use cases, it's low priority work for me.
You could try DisableNSLockThread
in https://godoc.org/github.com/mdlayher/netlink#Config but I have no other advice at the moment unfortunately.
from nftables.
I took a look at the mailing list thread you mentioned above and it seems to indicate that userspace cannot keep up with the kernel's messages.
That's the part that I don't really understand--it makes sense to fall behind when processing individual packets in userspace as the thread indicates, but I'm just sending ~100 rules to the kernel in one batch and waiting for an ACK batch back. It seems like it should be hard to fall behind on that? I suppose it's related to how the ACK includes handles for rules that are created.
You could try DisableNSLockThread in https://godoc.org/github.com/mdlayher/netlink#Config but I have no other advice at the moment unfortunately.
This is the case already:
Lines 92 to 98 in 26bcabf
true
for me.
Here's a super simplified reproduction that just puts a bunch of accept
rules into one table/chain: https://gist.github.com/danopia/437173efc0046962fafb746eccfe830e
On my laptop it reaches ENOBUFS at 138 accept
rules. I also tried tcp dport 80 ip saddr 1.1.1.1 accept
and ENOBUFS appeared at 104 rules. (So rule complexity is relevant but not as relevant as I'd expect.)
> go build .; and sudo setcap cap_net_admin+ep firewall-buf-repro; and ./firewall-buf-repro
2020/07/14 19:54:39 Making ever more rules...
2020/07/14 19:54:39 Trying 75 rules
2020/07/14 19:54:39 Trying 80 rules
2020/07/14 19:54:39 Trying 85 rules
2020/07/14 19:54:39 Trying 90 rules
2020/07/14 19:54:39 Trying 95 rules
2020/07/14 19:54:39 Trying 100 rules
2020/07/14 19:54:39 Trying 105 rules
2020/07/14 19:54:39 Trying 110 rules
2020/07/14 19:54:39 Trying 115 rules
2020/07/14 19:54:39 Trying 120 rules
2020/07/14 19:54:39 Trying 125 rules
2020/07/14 19:54:39 Trying 130 rules
2020/07/14 19:54:39 Trying 135 rules
2020/07/14 19:54:39 Trying 140 rules
2020/07/14 19:54:39 nftables error: Receive: netlink receive: recvmsg: no buffer space available
Does it make sense that this library is unable to acknowledge 150 accept
rules in one flush? Is everyone who manages a couple hundred rules creating them piecemeal? I didn't think that this would be too many rules!
# one too many rules to receive Flush() confirmation on my laptop
> sudo nft list table demo-table
table ip demo-table {
chain cluster-ips {
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
accept
}
}
from nftables.
Instead of setting NoENOBUFS, we should probably stop requesting acknowledgements.
Seems reasonable to me.
from nftables.
Thanks for looking into the netdev source. Since we're dealing with the kernel I suppose it makes sense that packets would get dropped instead of buffered when using NoENOBUFS
.
I found the specific message for adding rules and noticed that it has both ECHO and ACK flags masked in:
Line 132 in 7127d9d
So I decided to do a little bit of testing with the above accept
repro:
- With ECHO | ACK, my loop crashes at 140 rules.
- With only ACK, it crashes at 280 rules.
- With only ECHO, it also crashes at 280 rules.
- With neither, it seems error free up to / past 1000 rules.
Then I adjusted the repro to include 8 verdicts per rule, accept accept accept accept accept accept accept accept
:
- With ECHO | ACK, it crashes at 105 rules
- With only ACK, it crashes at 280 rules.
- With only ECHO, it crashes at 170 rules.
So it seems like ACK isn't the only problem, especially as rule complexity is increased. ECHO is a larger contributor towards read buffer space. Frustratingly, I've noticed this unrelated work that depends on the ECHO responses: #88
Maybe there's need for configurability if the masking is being changed? So folks that want ECHO on each frame can keep it. I only care about knowing if the whole batch was accepted as a unit.
(I also toyed with adding calls like conn.SetReadBuffer(200*1000)
in conn.go
which directly changed the ENOBUFS boundary and could be made configurable, however rmem_max
is still at play without resorting to SO_RCVBUFFORCE
. So that's a separate hole to go down.)
from nftables.
Thanks for the details!
How does the nft(8)
tool handle this issue?
from nftables.
When using nft --debug mnl -f ...
, I see neither ECHO nor ACK flags on any of the packets. Thus it shouldn't have this problem at all.
When using nft --debug mnl --echo -f ...
, I see only the ECHO flag added to all the packets. With this arg I actually see the nft(8)
tool fail silently, after around 170 lines of tcp dport http ip saddr 1.1.1.1 accept
repeated:
> sudo nft -f nft.txt
> echo $status
0
> sudo nft -f nft.txt --echo
> echo $status
1
I don't fully understand when nft(8)
uses ACK. Passing --handle
by itself seems to be a no-op. Passing --handle --echo
starting using ACK for some packets, but the main rules packets still seem to only have ECHO masked in.
In summary, nft(8) doesn't handle this error at all, but it doesn't use ACK or ECHO by default either, so it generally avoids the issue overall.
from nftables.
Thanks for the details! Sounds like removing the ACK and ECHO flags is the best choice to make, then. Is there any downside to that Iām not seeing? Would you be interested in sending a PR that removes them?
from nftables.
I'm closing this as stale. I ignore the error in my program and have had no further issues since then.
from nftables.
Related Issues (20)
- Anonymous time collection problem HOT 10
- Please consider create a release/tag HOT 1
- Alignment issues on 32-bit archs: TestAlignedBuff32 & TestAlignedBuffInt32 failures HOT 9
- Test failures on s390x: endianness problems? HOT 11
- Reason for not wrapping libnftnl/libmnl. HOT 1
- GetRules lost expr.Masq HOT 1
- nftables go dynset implementation will not work with libnftnl versions <1.1.9 HOT 1
- High
- Feature: add support for monitor HOT 6
- AddSet IPv4 wrong byte order on Ubuntu 22.04 HOT 8
- Rule Handle not updated after InsertRule even using Flush. HOT 4
- Not all response messages are received causing the receive buffer to overflow HOT 8
- Named quotas and their usage in map HOT 1
- BUG: block in Conn.Flush() HOT 4
- How to get an error when try to add an existed table?
- Adding rules in code produces different results and logs than the rules I added directly from the command line HOT 6
- Use a CIDR prefix as target in a NAT rule HOT 3
- Troubleshooting NFTables Table Creation with Go HOT 2
- Objects implementation refactor HOT 1
- Broken deps HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
š Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ššš
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ā¤ļø Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from nftables.