Comments (9)
What is "spreading"?
It's the most common term I found among the discussions in #70 (comment).
My own conclusion: let's say you have a cluster where nodes come and go (which is common) and you run, for example, kubectl set image
(which is also common). The scheduler must have some default logic to avoid placing all the replacement pods on the same recently added node. The problem is that I've observed the opposite behavior in GKE. I've also observed the expected behavior, but I fail to identify the causes.
from kubernetes-kafka.
Good that you brought this up. I noticed that both are present in the recent blog post http://blog.kubernetes.io/2017/09/kubernetes-statefulsets-daemonsets.html too, and they explain it here. I'd gladly accept contributions, but preferably with some proof of the benefits.
PodDisruptionBudgets: We just haven't found the time to investigate the need for it, and our policy is to run with defaults until we learn exactly why we should override them. Manifests need maintenance too.
AntiAffinity: I did quite a bit of research on this for other services in our cluster, and "spreading" should actually be the default behavior, at least if the service is created first. Below is an excerpt from our internal issue tracker. We've observed this to be flaky for Deployments, sometimes spreading and sometimes not (possibly due to services or resource limits), but we've actually never seen our Kafka and Zookeeper pods on the same node in production.
kubernetes/kubernetes#2312
"The code we have already spreads all pods belonging to the same replication controller."
kubernetes/kubernetes#11144 (comment)
"there are other reasons services should be started first"
kubernetes/kubernetes#11369
"We don't handle the case of multiple services matching the same Pod very well"
kubernetes/kubernetes#10242
"Scheduler needs to deal with pods without resource limits"
"Create an rc with 100 pods in a custom namespace and you'll end up with all 100 on the same node"
kubernetes/kubernetes#21074
"selector_spreading functionality in scheduler"
https://github.com/kubernetes/kubernetes/pull/21235/files#diff-d44336036b627f815adec0707e648e4fL68
"CalculateSpreadPriority" removed, "SelectorSpreadPriority" added?
kubernetes/kubernetes#4971
"We are now spreading by both."
kubernetes/kubernetes#41708
"The scheduler SelectorSpread priority funtion didn't have the code to spread pods of StatefulSets."
(merged after 1.5)
kubernetes/kubernetes#27484
"We should investigate (1) if you request and limit 0, does it spread evenly"
https://stackoverflow.com/questions/37784480/avoiding-kubernetes-scheduler-to-run-all-pods-in-single-node-of-kubernetes-clust
"The scheduler should spread your pods if your containers specify resource request for the amount of memory and CPU they need"
Yeah, the PDB in this case seems quite valuable to me, as it ensures that both Kafka and Zookeeper keep the minimum number of members they need to function properly.
I created a simple 3 node cluster in GKE with the following commands:
```shell
gcloud container clusters create kafka --cluster-version=1.7.5
kc apply -f ./zookeeper/
# let things settle
kc apply -f ./
# let things settle again
# upgrade master
gcloud container clusters upgrade kafka --cluster-version=1.7.6 --master
# upgrade nodes
gcloud container clusters upgrade kafka --cluster-version=1.7.6
```
Without the PDB, all my Zookeeper pods ended up on the same node near the end of the migration (the 3rd one) and then all got terminated simultaneously while the last node was being migrated.
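For reference, a PDB that would have prevented that simultaneous termination might look like the sketch below. The label selector and numbers are assumptions for a typical 3-member Zookeeper ensemble, not copied from this repo:

```yaml
apiVersion: policy/v1beta1      # the policy API available around 1.7
kind: PodDisruptionBudget
metadata:
  name: zookeeper
spec:
  minAvailable: 2               # keep quorum in a 3-member ensemble
  selector:
    matchLabels:
      app: zookeeper            # label assumed; match your zk pods
```

With this in place, node drains during the upgrade are refused for as long as evicting a pod would drop the ensemble below quorum.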
That's a simple and useful test, and I realize that "spreading" is an argument for getting rid of the split between persistent and non-persistent zookeeper (the reason to keep it is to better support quorums among 5 zk in a 3-zone cluster).
For a production cluster with services scaling horizontally to 5 instances, wouldn't 6 nodes be considered the minimum? Absence of a single node should always be a non-issue, and as soon as you co-locate instances on a node you're increasing risk.
In #13 (comment) @BenjaminDavison also suggests use of affinity. On the other hand, just this week I heard of positive results with "spreading" after re-creating with the service created first. I'd like to see a discussion about the current state of this in the Kubernetes community. It feels like an anti-pattern to have to use AntiAffinity in every manifest when in fact spreading is a crucial behavior for any horizontally scaled service.
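For context, the per-manifest boilerplate being debated is roughly the following sketch (label key and values assumed; using the preferred variant so scheduling still succeeds when there are fewer nodes than replicas):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname   # spread across nodes
        labelSelector:
          matchLabels:
            app: kafka                        # label assumed
```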
Just FYI podAntiAffinity is extremely inefficient for large clusters as of Kubernetes 1.7. Until the implementation is rewritten, it is best to avoid it.
Thanks for the heads up. I still haven't investigated what the state of "spreading" is in 1.8, but I'll be testing quite a bit in the upcoming weeks.
The official docs mention the performance problems here
Inter-pod affinity and anti-affinity require substantial amount of processing which can slow down scheduling in large clusters significantly. We do not recommend using them in clusters larger than several hundred nodes.
What is "spreading"?
from kubernetes-kafka.
@solsson It has been a while since this has been updated. Are there any updates on what the recommended approach is? I'm wondering if it is worth the trouble to add anti-affinity rules into all the zookeeper and kafka pods or if it is better just to let it ride.