
Comments (9)

solsson avatar solsson commented on July 24, 2024

What is "spreading"?

It's the most common term I found among the discussions in #70 (comment).

My own conclusion: let's say you have a cluster where nodes come and go (which is common) and you run, for example, kubectl set image (which is also common). A scheduler must have some default logic to avoid placing all the replacement pods on the same recently added node. The problem is that I've observed the opposite behavior in GKE. I've also observed the expected behavior, but I fail to identify the causes.

from kubernetes-kafka.

solsson avatar solsson commented on July 24, 2024

Good that you brought this up. I noticed that both are present in the recent blog post http://blog.kubernetes.io/2017/09/kubernetes-statefulsets-daemonsets.html too, and they explain it there. I'd gladly accept contributions, but preferably with some proof of the benefits.

PodDisruptionBudgets: We just haven't found the time to investigate the need for them, and our policy is to run with defaults until we learn exactly why we should override them. Manifests need maintenance too.

AntiAffinity: I did quite a bit of research on this for other services in our cluster, and "spreading" should actually be the default behavior, at least if the service is created first. Below is an excerpt from our internal issue tracker. We've observed this to be flaky for Deployments, sometimes spreading and sometimes not (possibly due to services or resource limits), but we've actually never seen our Kafka and Zookeeper pods on the same node in production.

kubernetes/kubernetes#2312
"The code we have already spreads all pods belonging to the same replication controller."

kubernetes/kubernetes#11144 (comment)
"there are other reasons services should be started first"

kubernetes/kubernetes#11369
"We don't handle the case of multiple services matching the same Pod very well"

kubernetes/kubernetes#10242
"Scheduler needs to deal with pods without resource limits"
"Create an rc with 100 pods in a custom namespace and you'll end up with all 100 on the same node"

kubernetes/kubernetes#21074
"selector_spreading functionality in scheduler"

https://github.com/kubernetes/kubernetes/pull/21235/files#diff-d44336036b627f815adec0707e648e4fL68
"CalculateSpreadPriority" removed, "SelectorSpreadPriority" added?

kubernetes/kubernetes#4971
"We are now spreading by both."

kubernetes/kubernetes#41708
"The scheduler SelectorSpread priority function didn't have the code to spread pods of StatefulSets."
(merged after 1.5)

kubernetes/kubernetes#27484
"We should investigate (1) if you request and limit 0, does it spread evenly"

https://stackoverflow.com/questions/37784480/avoiding-kubernetes-scheduler-to-run-all-pods-in-single-node-of-kubernetes-clust
"The scheduler should spread your pods if your containers specify resource request for the amount of memory and CPU they need"
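For reference, opting out of default spreading and making the constraint explicit would mean adding a podAntiAffinity stanza to the pod template of the StatefulSet. This is a hedged sketch, not taken from the kubernetes-kafka manifests; the app: kafka label and the rest of the template are assumptions:

```yaml
# Hypothetical pod template excerpt (not from kubernetes-kafka).
# Requires that no two pods with label app=kafka land on the same node;
# scheduling fails if no conforming node is available.
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["kafka"]
        topologyKey: kubernetes.io/hostname
```

The hard requiredDuringScheduling form trades availability of scheduling for strict spreading; the soft preferred form discussed later in the thread relaxes that.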


adamresson avatar adamresson commented on July 24, 2024

Yeah, the PDB in this case seems quite valuable to me, as it ensures that both Kafka and Zookeeper keep the minimum number of members they need in the cluster to function properly.

I created a simple 3 node cluster in GKE with the following commands:

gcloud container clusters create kafka --cluster-version=1.7.5


kc apply -f ./zookeeper/
# let things settle
kc apply -f ./
# let things settle again

# upgrade master
gcloud container clusters upgrade kafka --cluster-version=1.7.6 --master

# upgrade nodes
gcloud container clusters upgrade kafka --cluster-version=1.7.6

Without the PDB, all my zookeeper pods ended up on the same node near the end of the migration (the third one) and then all got terminated simultaneously while the last node was migrated.
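A minimal PDB covering the scenario above might look roughly like the following; the name and the app: zookeeper label are assumptions about the manifests, and minAvailable: 2 preserves quorum for a 3-member ensemble:

```yaml
# Hypothetical PodDisruptionBudget sketch (labels assumed).
apiVersion: policy/v1beta1   # the API group available in 1.7-era clusters
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2            # never voluntarily evict below quorum of 3
  selector:
    matchLabels:
      app: zookeeper
```

With this in place, a node drain during the upgrade would block (and retry) rather than evict a second Zookeeper pod while only two are running.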


solsson avatar solsson commented on July 24, 2024

That's a simple and useful test, and I realize that "spreading" is an argument for getting rid of the split between persistent and non-persistent zookeeper (the reason to keep it is to better support quorums among 5 zk in a 3-zone cluster).

For a production cluster that has services scaling horizontally to 5 instances, wouldn't 6 nodes be considered the minimum? Absence of a single node should always be a non-issue, and as soon as you co-locate instances on a node you're increasing risk.


solsson avatar solsson commented on July 24, 2024

In #13 (comment) @BenjaminDavison also suggests use of affinity. On the other hand, just this week I heard of positive results with "spreading" achieved by re-creating resources with the service created first. I'd like to see a discussion about the current state of this in the Kubernetes community. It feels like an anti-pattern to have to use AntiAffinity in every manifest when in fact spreading is a crucial behavior for any horizontally scaled service.


StevenACoffman avatar StevenACoffman commented on July 24, 2024

Just FYI, podAntiAffinity is extremely inefficient for large clusters as of Kubernetes 1.7. Until the implementation is rewritten, it is best to avoid it.


solsson avatar solsson commented on July 24, 2024

Thanks for the heads up. I still haven't investigated what the state of "spreading" is in 1.8, but I'll be testing quite a bit in the upcoming weeks.


StevenACoffman avatar StevenACoffman commented on July 24, 2024

The official docs mention the performance problems here

Inter-pod affinity and anti-affinity require substantial amount of processing which can slow down scheduling in large clusters significantly. We do not recommend using them in clusters larger than several hundred nodes.
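If anti-affinity is used at all despite that caveat, the soft preferred form lets the scheduler try to spread pods while still placing them when no other node fits. A hedged sketch, with the app: kafka label again being an assumption:

```yaml
# Hypothetical "soft" anti-affinity stanza (labels assumed).
# The scheduler prefers nodes without an app=kafka pod, but will
# co-locate rather than leave the pod unschedulable.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: kafka
        topologyKey: kubernetes.io/hostname
```

Note that the performance warning quoted above applies to both the required and the preferred forms.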



coderroggie avatar coderroggie commented on July 24, 2024

@solsson It has been a while since this has been updated. Are there any updates on what the recommended approach is? I'm wondering if it is worth the trouble to add anti-affinity rules into all the zookeeper and kafka pods or if it is better just to let it ride.

