Comments (5)
Bump, do you have thoughts on this?
A colleague pointed out that nf_conntrack_find_get already does an rcu lock / unlock, so maybe the lock / unlock is not needed at this point in the first place?
from glb-director.
👍 This does look like an oversight, nice catch.
Looking back at history in an internal repo, the lock was added early on because conntrack would leak connections without it, I don't remember the specific context.
It does look like nf_conntrack_find_get
takes a read lock around internal operations but I believe the lock needs to be held for the remaining usage of thash
in the glb scope as well since the internal structure is still referenced.
from glb-director.
Thanks for raising the PR!
I spent some more time thinking about this, and I'm a bit puzzled about it's effects. From my understanding this should lead to machines deadlocking, since the rcu critical section is never left. If that is true, I would have expected you guys to experience & find this bug earlier.
After digging through the Linux source, I think it might be the case that the rcu_lock is never actually hit. See the comment on inet_ehash_bucket and inet_hashinfo. This would mean that SYN_RECV is always handled by the inet_lookup_established case, and conntrack would never find a tuple. What am I missing?
from glb-director.
From my understanding this should lead to machines deadlocking, since the rcu critical section is never left. If that is true, I would have expected you guys to experience & find this bug earlier.
I think it would if that code path was hit, but it looks like nf_ct_get_tuplepr
already checks that the tuple nf_conntrack_find_get
searches for exists, and then nf_ct_tuplehash_to_ctrack
appears to always return a value in this scenario (every other use in nf_conntrack_core.c
appears to just assume it returns a non-NULL value). There could be race conditions there, especially between the nf_ct_get_tuplepr
and nf_conntrack_find_get
, so it should definitely be released, but that's likely the reason we haven't observed it.
See the comment on inet_ehash_bucket and inet_hashinfo. This would mean that SYN_RECV is always handled by the inet_lookup_established case, and conntrack would never find a tuple. What am I missing?
IIRC, at least at the time this code was written, TCP sockets created on receipt of a SYN were not full sockets and weren't available in the established list (this commit mentions something about this torvalds/linux@10feb42). In that case, they were available in conntrack but the real socket was in the parent LISTEN socket which required a spinlock to be held in 3.x kernels.
It'd definitely be worth revisiting whether this is the case in modern (4.x series) kernels since lockless LISTEN access was added, I imagine a lot of this has changed, or at the very least we should be able to retrieve it all locklessly through the socket hashtables.
from glb-director.
IIRC, at least at the time this code was written, TCP sockets created on receipt of a SYN were not full sockets and weren't available in the established list (this commit mentions something about this torvalds/linux@10feb42). In that case, they were available in conntrack but the real socket was in the parent LISTEN socket which required a spinlock to be held in 3.x kernels.
Ah, that makes sense. My Linux network stack knowledge isn't great admittedly.
Thanks for fixing the RCU issue.
from glb-director.
Related Issues (20)
- glb-director-xdp on bonded nic HOT 2
- stopping the glb-director-xdp service does not stop directing traffic
- Metrics of glb-director-xdp HOT 1
- cibuild-create-packages fails to prepare the Docker build environment with a broken packages error HOT 1
- conntrack lookup removal in ipt_GLBREDIRECT breaks with network namespaces HOT 1
- Tag releases please HOT 1
- glb-director failing host + ecmp/ibgp HOT 4
- glb-director fails to build with kernel linux-5.9.1 HOT 1
- Question: XDP Director status HOT 3
- Question about filling state HOT 2
- How was this made?
- how to use glb-director-xdp?
- make error
- Destination Mac address mapping according to backend IP address HOT 2
- Package for Debian buster ? HOT 2
- dperf: a high performance open source l4lb load tester
- cannot build with latest DPDK HOT 1
- GUE healthcheck fails with self IP
- Is it possible to run GLB without GUE? HOT 1
- Branch Protections Audit - 2022-10-05T18-41-55-524 HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from glb-director.