rabbitmq / cluster-operator Goto Github PK
View Code? Open in Web Editor NEWRabbitMQ Cluster Kubernetes Operator
Home Page: https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
License: Mozilla Public License 2.0
RabbitMQ Cluster Kubernetes Operator
Home Page: https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
License: Mozilla Public License 2.0
Describe the bug
Images are uploaded to PivNet with latest
tag. That makes it hard to identify what version of the product they contain. May also lead to incorrect images being used in a deployment.
To Reproduce
$ docker load -i images/rabbitmq
Loaded image: registry.pivotal.io/p-rabbitmq-for-kubernetes-staging/rabbitmq:latest
$ docker load -i images/rabbitmq-for-kubernetes-operator
Loaded image: registry.pivotal.io/p-rabbitmq-for-kubernetes-staging/rabbitmq-for-kubernetes-operator:latest
Expected behavior
Images should be tagged with the product version (eg. 0.7.0-build.30
).
Is your feature request related to a problem? Please describe.
This feature request is not related to a problem. It's a request that should improve the usability of the RabbitmqCluster product.
Describe the solution you'd like
I would like to see events when I do kubectl describe rabbitmqcluster cluster0
. There is no extensive list of type of events that we should record. To start with, we can take a look of the status.conditions
work we've been doing. One idea could be that everytime we are updating the status.conditions
, we can record the reason and messages in events as well.
Describe alternatives you've considered
This feature is a nice-to-have. Without it, people can still get similar information looking at logs and status.conditions, but that will require more work from the user side.
Additional context
Type
, Reason
, and Message
. We should take a look of StatefulSet, Pod, or other k8s objects to align on wording and practices as much as we can.Is your feature request related to a problem? Please describe.
Describe the solution you'd like
Conflicts errors returned when updating any child objects are normal and expected, because our controller reconcile function is not the only actor that can update the state of these child objects. We should log the error, and implement retry logic instead of return on such errors (as simple as requeue the request). Link to our current controller which returns instead of requeue the request on any errors
Examples:
service account token controller
volume controller
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.
Is your feature request related to a problem? Please describe.
During k8s maintenance (eg. upgrades), RabbitmqCluster Pods will be rescheduled to other Kubernetes worker nodes. If two Pods exist on the same worker node, then they will both be evicted at the same time causing the RabbitMQ cluster to lose quorum which can have a number of negative effects (entering pause-minority
prematurely, losing quorum in `quorum-queues e.t.c.). Kubernetes allows the user to configure the number Pods that can be unavailable through Pod Eviction at any given moment by declaring Pod Disruption Budgets. More information available here.
Describe the solution you'd like
Scenario: docs
When I follow the docs
Then I can set up PDBs for my cluster
Notes: docs should recommend using app.kubernetes.io/name
for label matching and mention that additional labels can be added to the CR and then used in PDBs.
Is your feature request related to a problem? Please describe.
This issue aims to track our thinking and any resulting work in terms adding lower-level CRDs that represent RabbitMQ primitives (e.g. vhost
, user
, queue
), as well as higher-level abstractions that build on top of those primitives in conjunction with the pre-existing rabbitmqcluster
CRD.
Describe the solution you'd like
Create a proposal that documents the different CRDs and their relationship with each other. Discuss alternatives for how to progress with the implementation of those CRDs (i.e. start bottom-up or go a different route).
Describe alternatives you've considered
See upcoming proposal that will be created as part of this issue.
Additional context
Pre-exiting work: https://miro.com/app/board/o9J_kvHmivs=/
At the moment, we do not load a backlog of messages in our cluster before we upgrade and test the results. This can be seen on these lines:
Loading a backlog of messages is useful as it tests our logic that certain nodes that are critical to synchronization should wait for the sync to complete before being rolled. Otherwise, messages may be lost.
The solution has two parts -
From the rabbitmq memory docs, we know that paging starts happening at 50% of the memory high watermark. The idea here is that paging will further add time for the synchronisation to occur. This in turn increases the likelihood that nodes need to wait for sync before they can be rolled. So, let's set the backlog to be above 50% of the memory high watermark to create this situation.
vm_memory_high_watermark.absolute
vm_memory_high_watermark_paging_ratio
The RabbitTestTool has a flag to set the initial backlog (initialPublish
perhaps), and the topology file also includes the size of each message. Set this combination such that (number of messages) * (size of a message) is about 70% of the high watermark, the value calculated in the previous section.
At this point, we have a backlog, and messages are being paged to disk.
CRDs are able to describe multiple API versions thanks to the use of the Open API V3 specification. In order for Kubernetes to manage the work of maintaining serving multiple versions, a few conventions have been made:
This means that:
Kubernetes will use a designated version to store metadata for all API versions. When multiple versions are being served by a CRD, a conversion to the storage must happen before the Custom Resource instance is persisted.
---------------------------- -------------- -------------- --------------
| | V2 | | V2 | | V1 | |
| kubectl apply -f v2.yaml |----->| Reconcile |--->| Conversion |--->| k8s |
| | | | | Webhook | | Database |
---------------------------- -------------- -------------- --------------
Metadata must not be lost across conversions
The implementation of a storage version means that it must contain all forms of metadata for all versions. This can either be persisted in annotations or in the API of the storage version itself.
--------------------------------
| apiVersion: V1 |
| metadata: |
> | annotations: |
/ | โ{spec.version2Prop: true}โ |
/ | spec: |
/ --------------------------------
|-----------------------|/
| apiVersion: V1 |
| spec: |
| version2Prop: true |
|-----------------------|\
\ |-----------------------|
\ | apiVersion: V1 |
\ | spec: |
> | version2Prop: true |
|-----------------------|
Keeping state in annotations is more flexible and easier to implement the saving of metadata. However, it complicates matters for the conversion webhook. In order to unpack the state in annotations and marshal that data into a struct, the webhook must first inspect the data to know what version to marshal data into (see comments here).
Having the storage version contain version2Prop
removes the complexity in the conversion webhook, however then you have a leaky API that contains properties that this version cannot utilise. This could potentially lead to reconciliation errors down the line.
We will suggest two possible solutions to implementing conversion webhooks into our workflow.
Please follow the steps provided in the readme of the multiversion crd demo
This example requires an operator and two API versions: v1 and v2. The operator follows the example set out in kubebuilder and implements a simple cronjob job deployment. v1 has a schedule property that is simply of type string
, and v2 has a schedule property that is a struct of type CronSchedule
.
The demo guides you through creating instance sample-one
that is v1, and sample-two
that is v2.
When you run make sample-one-schedule-as-v1
, as detailed in the demo steps, you will see the schedule of sample-one
as it should be, in its string form. However, when you run make sample-one-schedule-as-v2
, you will see the schedule in its CronJob
struct form.
This is a demonstration of the philosophy of multiple versions. There is no concrete idea of a Custom Resource that simply stays at one version, but can be viewed and modified as any serving version. You can modify sample-one
either as a v1 object, or a v2. You can follow the demo steps to see this for yourself.
Investigate how multiple controllers react to the watch events of different api versions.
Does every controller have their reconciliation loop triggered on the change event of every api version, or simply the api version it wishes to reconcile.
Currently the list of plugins is generated by the operator but can only be customized once the cluster is deployed. Since the CR should define the desired state of a deployment, RabbitMQ plugins should be a part of it.
Describe the solution you'd like
I should be able to specify plugins like this:
apiVersion: rabbitmq.pivotal.io/v1beta1
kind: RabbitmqCluster
metadata:
name: foo
spec:
rabbitmq:
additionalPlugins:
- rabbitmq_mqtt
spec.rabbitmq.additionalPlugins
The value of this field should be appended to the list of required plugins ([rabbitmq_peer_discovery_k8s, rabbitmq_prometheus, rabbitmq_management]
) so that the required plugins and the plugins listed here are all enabled.
By default spec.rabbitmq.additionalPlugins
should be empty.
I deploy the simplest cluster:
apiVersion: rabbitmq.pivotal.io/v1beta1
kind: RabbitmqCluster
metadata:
name: simple
When I kubectl edit
this cluster, I see:
apiVersion: rabbitmq.pivotal.io/v1beta1
kind: RabbitmqCluster
metadata:
name: simple
spec:
# other properties here
When I check rabbitmq-plugins list
I can see that rabbitmq_management
, rabbitmq_peer_discovery_k8s
and rabbitmq_prometheus
are enabled since they are essential and therefore always enabled.
I deploy a cluster with plugin configuration:
apiVersion: rabbitmq.pivotal.io/v1beta1
kind: RabbitmqCluster
metadata:
name: mqtt
spec:
rabbitmq:
additionalPlugins:
- rabbitmq_mqtt
When I kubectl edit
this cluster, I see:
apiVersion: rabbitmq.pivotal.io/v1beta1
kind: RabbitmqCluster
metadata:
name: mqtt
spec:
rabbitmq:
additionalPlugins:
- rabbitmq_mqtt
# other properties here
When I check rabbitmq-plugins list
I can see that only rabbitmq_mqtt
, rabbitmq_management
, rabbitmq_peer_discovery_k8s
and rabbitmq_prometheus
are enabled.
Is your feature request related to a problem? Please describe.
We would like the RabbitmqCluster
CRD to be as flexible as possible. That means the user of that CRD should be able to tweak pretty much ...
rabbitmqcluster
resource (and in some cases the grandchildren)To achieve this we will have to refactor the current CRD as it doesn't support any of the above. One pattern we have seen with some of the K8s core resources is to add templates for the children to the parent CRD (e.g. pod template for stateful sets). This seems to be something that would make sense here too.
This issue is meant to track the work to come up with an initial proposal. Subsequent implementation work will be tracked in separate issues in the future.
Describe the solution you'd like
Create a proposal for what a fully flexible rabbitmqcluster
CRD might look like.
Describe alternatives you've considered
See upcoming proposal that will be created as part of this issue.
Additional context
There's some pre-existing work.
Collaborator added: edbyford
Is your feature request related to a problem? Please describe.
There are many knobs and dials in the configuration of our upgrade tests. Currently, we invoke the RabbitTestTool (repo link) with these flags, and with this topology file.
Note here that we are only testing quorum queues here with the use of this policy file.
There are several decisions in both these files that we have some value for, but which may be fine tuned further. Let's try to understand what they mean, and what values would be suitable for our testing. For example,
Describe the solution you'd like
An understanding of the settings available, and baseline values to be used in common, and differently for different policies.
With regards to passing context.TODO()
everywhere. I've been seeing us doing this for a while now. When are we ever going to TODO
this properly and start passing a meaningful context? IMO, we should pass the context we get from the caller of the Reconcile
function. And then hand that through everywhere in our call graph.
Is your feature request related to a problem? Please describe.
Due to the variety of use cases different users can have for RabbitMQ, it is important for us to understand the trade off in requirements of the common use cases for our customers. This will inform us what kind of guarantees we need to ensure when performing an upgrade. We have done some work towards making some of these guarantees. This issue is going to be used to help us keep track of the current state of affairs, and help inform us of what needs to be done next.
Describe the solution you'd like
I would like for us to achieve the following:
Additional context
Here is the current state (to be improved) of our system requirements for performing an upgrade.
Timeline
No internal upgradeability claims - yes
Internal upgradeability claims - ?
External upgradeability claims beta- no
External upgradeability claims GA - no
Data loss
The control plane (Controller and CRD) is dependent on state that is stored in Kubernetes. This is not our responsibility. If the Kubernetes datastore (etcd) experiences data loss, then we lose the metadata for our CRs. The solution to this would be a restore of a previously made backup of etcd.
Consistency
Custom Resource metadata
Although we do not currently have an upgrade workflow, we do know that we will need to migrate properties of the CR between different versions. Without this, we could risk losing properties that have been set by the user if a user were to simply change the API version number of their custom resource.
Availability
Currently our controller-runtime is configured to activate leader election for simple active/passive HA. However, we only run one replica of our manager. This means that we do not have a highly available controller.
Repeatability (no surprises, partition tolerance, etc)
N/A - we do not have an upgrade workflow
Performance degradation
TODO
Usability (manual steps, bulk and single-instance upgrades, maintenance windows, etc)
N/A - we do not have an upgrade workflow
Timeline
No internal upgradeability claims - yes
Internal upgradeability claims - ?
External upgradeability claims beta- no
External upgradeability claims GA - no
Current state for each aspect of the system
Data loss
We are currently running an experiment to see if we experience any data loss with our current default configurations - https://github.com/pivotal/rabbitmq-for-kubernetes/projects/2#card-33229926
Consistency
RabbitMQ can be configured to maximise data consistency. We can only control certain aspects of RabbitMQ to ensure data consistency. Customers have to configure their RabbitMQ resources (exchanges, queues and messages e.t.c) to also minimize data loss in order for our mitigations of data loss to take effect.
We now have a preStop
container hook that will run rabbitmq-queues check_if_node_is_quorum_critical
and rabbitmq-queues check_if_node_is_quorum_critical
. This ensures that all distributed queues will have a chance to tell us if they are in a bad state (i.e. unsynchronised) before any RabbitMQ server nodes are taken down for maintenance, thereby keeping the data consistent.
Note:
This only helps us in the situation of a voluntary shutdown. The default value for โha-promote-on-shutdownโ is โwhen-synced. However, the default value for โha-promote-on-failureโ is โalwaysโ and can lead to data loss as the view here is availability over consistency. We will need to have a โplanโ that allows us to change this to โwhen-syncedโ if we want to make consistency a higher priority.
This does mean that maintenance times could potentially last a bit longer as we wait for queues to be synchronised which users may not like (given the frequency that Pod rollout could occur).
Availability
RabbitMQ can be configured to maximise data availability. Customers have to configure their RabbitMQ resources (exchanges, queues and messages e.t.c) to sacrifice data consistency for data availability.
We now have a preStop
container hook that will run rabbitmq-queues check_if_node_is_quorum_critical
and rabbitmq-queues check_if_node_is_quorum_critical
. This ensures that all distributed queues will have a chance to synchronise before any RabbitMQ server nodes are taken down for maintenance, thereby ensuring that if a node goes down that hosted the leader of the queue, high availability is still maintained because a new leader can be elected quickly.
Repeatability (no surprises, partition tolerance, etc)
N/A - we do not have an upgrade workflow
Performance degradation
N/A
Usability (manual steps, bulk and single-instance upgrades, maintenance windows, etc)
N/A - we do not have an upgrade workflow
After the finaliser work in #70 and #53, we set a finaliser in RabbitmqCluster objects. This will halt deletion of the object until the finalizer is removed. By deleting the namespace where the Operator lives, it immediately deletes the Operator, therefore the finaliser is never removed from RabbitmqCluster instances, which then halts the deletion of the Namespace.
Visible effects of this issue are stuck pipeline and make destroy
never completing if there is a rabbit object in the namespace.
We should change the destroy target to synchronously delete the CRD, which deletes all RabbitmqCluster objects in a cascading fashion, before proceeding to delete the Namespace and cluster scoped RBAC objects.
We should also review and adapt our docs if there is an uninstall procedure, as our customers will also hit this issue.
We should also check the behaviour with the Operator Chart and make sure helm uninstall
is not impacted.
Describe the bug
Files uploaded to PivNet have .tar
extension while they are in fact gzipped tar files and should therefore have .tar.gz
or .tgz
extension. Incorrect extension is confusing to some tools.
Expected behavior
Files uploaded to/downloaded from PivNet should be named .tar.gz
or .tgz
.
Additional context
https://github.com/pivotal/rabbitmq-for-kubernetes-ci/blob/master/tasks/upload-pivnet/task.yml#L37
After the work done in #70 and #53, the workflow of helm uninstall
will not work as expected if there are any RabbitmqCluster objects present in the cluster. During a helm uninstall
, all our resources will receive a signal to terminate at the same time; this will cause the Operator to terminate before the finalizer from RabbitmqCluster objects is removed, leaving those objects stuck in terminating state.
We should look into Chart Hooks, which allow to run a Kuberntes job just before deleting resources. We could combine this with a container with kubectl to delete all RabbitmqClusters before deleting the Operator and the Namespace.
Is your feature request related to a problem? Please describe.
Our pipeline only labels (deplab) images when the job label-images
is arbitrarily triggered. This will cause failures in our Nork pipeline because we added a new "scan root" to scan our images in PivNet registry.
Describe the solution you'd like
Our images should be labelled in every build run, or when there is a new version. Those two scenarios are practically the same. The difference would apply for RabbitMQ Docker images. RMQ image is pinned to a specific version and the process to upgrade this image is manual. The job to label RabbitMQ image should not trigger on every build run, but when there is a new pinned version of RMQ available.
Describe alternatives you've considered
Trigger the label-images
on every run. This would label and push RMQ image on every build run, which is unnecessary.
Additional context
N/A
Is your feature request related to a problem? Please describe.
Bump to the latest and greatest rabbitmq to take advantage the incremental upgrade.
Describe the solution you'd like
Use the docker image of the latest version.
Describe alternatives you've considered
N/A
Additional context
Make sure to also change the version in our pipeline.
Michal found that version 0.8.0 (next) is using RabbitMQ 3.8.2. Let's investigate this and confirm if this is the case for 0.7.0.
preStop
hook (infinite wait to terminate pods)Is your feature request related to a problem? Please describe.
We have a pipeline that runs the RabbitTestTool from Jack while performing a rolling update on our RabbitmqCluster StatefulSet. There has been hundreds of runs, and all reports are saved a postgress db in our GCP project
The RabbitTestTool reports on availability and data loss. We should try to understand under our current configuration and setup, what's the situation with data loss and availability during upgrades.
Describe the solution you'd like
Collect the results from our pipeline found here.
PLACEHOLDER - Please update the Upgrades road map and dedicated KEP for RabbitMQ upgrades with the results from this investigation. This is where we will keep context and these results will feed into further discussions.
If you think of any new investigations to run based on the results of this investigation, please create new issues and update the road map and KEPs accordingly.
Describe alternatives you've considered
N/A
Additional context
N/A
additionalPlugins
property allows arbitrary code execution through shell injection. The command to enable plugins is constructed by string concatenation, including user input (the value of additionalPlugins
). This means that a user who can create RabbitmqCluster
resource can also execute any command inside a RabbitMQ pod, even though that should be controlled through pods/exec
privilege.
To Reproduce
foo
kubectl edit rabbitmqcluster foo
spec:
rabbitmq:
additionalPlugins:
- rabbitmq_shovel; rabbitmqctl delete_vhost /
kubectl exec foo-rabbitmq-server-0 -- rabbitmqctl list_vhosts
Listing vhosts ...
Expected behavior
TBD how exactly this should be handled. Options include:
additionalPlugins
list that is not one of them (the list would include all plugins shipped as part of RabbitMQ at the time a given operator version was released, unfortunately we would have to make sure we keep it up to date when new plugins are added)Describe the bug
We have experienced some difficulty keeping track of outstanding PRs, potentially for a number of reasons:
Goals
I would like this issue to be a place to collect options for improving the PR review process. The focus should either be on tooling or proactive process changes
Anti-goals
The outcome of this issue should not be behavioural change. We don't think that people are wilfully not doing their jobs, rather that we haven't settled on the best approach for this experiment
Collaborator added: st3v
Is your feature request related to a problem? Please describe.
We currently define a prometheus
port on the client service for a given RabbitmqCluster
. Having that port there is misleading. It suggests to have Prometheus scrape this service for metrics on the specified port. Due to its load-balancing nature, if a user scraps this service with Prometheus they will...
Describe the solution you'd like
Remove the prometheus
port from the service. Adapt our docs accordingly.
Describe alternatives you've considered
We could think about adding the prometheus
port to the headless service which doesn't load-balance. Or alternatively, create additional client services, one for each RabbitMQ node. Such services would also be interesting for AMQP client-side load-balancing.
Additional context
n/a
Is your feature request related to a problem? Please describe.
The RabbitMQ node names are > 90 characters for a CR name of 9 of characters. For example, a CR with name mycluster
with 3 replicas will generate these 3 node names:
rabbit@mycluster-rabbitmq-server-0.mycluster-rabbitmq-headless.mynamespace.svc.cluster.local
rabbit@mycluster-rabbitmq-server-1.mycluster-rabbitmq-headless.mynamespace.svc.cluster.local
rabbit@mycluster-rabbitmq-server-2.mycluster-rabbitmq-headless.mynamespace.svc.cluster.local
Describe the solution you'd like
The node names could be rmq-<node-index>@<cr-name>.<namespace>
. For example: [email protected]
.
The environment variable RABBITMQ_NODENAME
can be used to set the node name. These names must be unique within a RMQ cluster. This variable could be templated in the Pod template of the StatefulSet.
Describe alternatives you've considered
We could use rabbit
as prefix and Pod name + namespace e.g. [email protected]
. This name still feels too verbose. Given the CR name and namespace it's easy to identify the Pods related to the CR.
Additional context
We had a conversation with PivNet registry team asking how can we use PivNet registry to publish images to customers. We learnt that we can publish images to selected users only. We want to test this because it would help A LOT to ease the deployment of our product; we specifically have PAs in mind here.
Can we start adding inline comments to our API types? So that kubectl explain
produces meaningful output. E.g...
$ k explain deployment.status.conditions
KIND: Deployment
VERSION: extensions/v1beta1
RESOURCE: conditions <[]Object>
DESCRIPTION:
Represents the latest available observations of a deployment's current
state.
DeploymentCondition describes the state of a deployment at a certain point.
FIELDS:
lastTransitionTime <string>
Last time the condition transitioned from one status to another.
lastUpdateTime <string>
The last time this condition was updated.
message <string>
A human readable message indicating details about the transition.
reason <string>
The reason for the condition's last transition.
status <string> -required-
Status of the condition, one of True, False, Unknown.
type <string> -required-
Type of deployment condition.
Describe the bug
At the moment, our operator hard-codes the cluster domain in the peer-discovery-related env vars of the pod template to cluster.local
. I.e.
apiVersion: apps/v1
kind: StatefulSet
metadata:
...
spec:
...
template:
spec:
...
containers:
- env:
...
- name: RABBITMQ_NODENAME
value: rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
- name: K8S_HOSTNAME_SUFFIX
value: .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
...
Now while cluster.local
is the most common domain used for K8s clusters, it is configurable by the cluster admin and there are clusters that use alternative domains. One such example are guest clusters running in Project Pacific. They use managedcluster.local
as their cluster domain.
$ k -n kube-system get configmap coredns -oyaml
kind: ConfigMap
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health
kubernetes managedcluster.local in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
...
If DNS for the K8s cluster is configured with such an alternative domain, the pods in our StatefulSet
are failing to start up with the following log message ...
$ k get pod
NAME READY STATUS RESTARTS AGE
on-demand-claim-rabbitmq-server-0 0/1 CrashLoopBackOff 8 17m
$ k logs on-demand-claim-rabbitmq-server-0
ERROR: epmd error for host on-demand-claim-rabbitmq-server-0.on-demand-claim-rabbitmq-headless.default.svc.cluster.local: nxdomain (non-existing domain)
To Reproduce
Steps to reproduce the behavior:
cluster.local
RabbitmqCluster
Expected behavior
Two alternatives.
RabbitmqCluster
without changing anything and it just works despite the cluster domain not being cluster.local
OR
clusterDomain
property on the RabbitmqCluster
resource. If not specified by the user, this property defaults to cluster.local
The first option would be much preferred and seems technically feasible without too much effort.
Is your feature request related to a problem? Please describe.
This should solve an integration test flake as a side effect.
Test flake is because of a problem using stale cache during an update action in apiserver. Similar issue was reported in this github issue, and later fixed in this PR. The fix is included in kubernetes 1.16
.
We are currently on kubebuilder 2.2
which uses controller-runtime 0.4
. Once we update to kubebuilder 2.3.0
, we will be using controller-runtime 0.5
, which uses a kubernetes 1.16.4
compatible testEnv (as seen in CR release notes).
Describe the solution you'd like
There has been a new kubebuilder release. release notes
We should take action to upgrade to the latest kubebuilder.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Kubebuilder 2.3.0
uses Go 1.13
Is your feature request related to a problem? Please describe.
There is a new version of KSM 0.7, which uses helm 3. We would like to verify that our helm chart can still be deployed using the new KSM.
Additional context
Talk to @mkuratczyk about how to set up an environment to deploy KSM.
Is your feature request related to a problem? Please describe.
We currently warn about different memory request and limit in the operator logs. This should be exposed in the Status field as well. We concluded that we will have a ResourceProblem
condition that will represent this state. This condition will also check RMQ alarms and misconfigured resource requests in the CR.
Describe the solution you'd like
ResourceProblem (revisit this name)
Add these in the status field of the Cluster CRD. Set them during reconciliation.
For each condition above:
Scenario: Happy path ๐
Given that I have a RabbitMQ cluster deployed
And my memory request matches the memory limit
And there are no alarms in RabbitMQ
When I describe the RabbitmqCluster CR
Then I see the 'ResourceProblem' condition with status set to 'False'
Scenario: Memory request != memory limits
Given I provision RabbitmqCluster with memory request != memory limits
When I describe the RabbitmqCluster CR
Then I see the 'ResourceProblem' condition with status set to 'True'
And the 'reason' and 'message' fields explain the state
Scenario: RabbitMQ alarm
Given I have a RabbitMQ cluster
And it has a memory alarm triggered
When I describe the RabbitmqCluster CR
Then I see the 'ResourceProblem' condition with status set to 'True'
And the 'reason' and 'message' fields explain the state
Describe alternatives you've considered
See design document.
Additional context
If consulting the state of RabbitMQ itself to discover alarms is too complex, you may create a separate issue.
The conditions should follow the Kubernetes API conventions.
This document has the conclusions of our team meeting and how we want the conditions to look like.
Is your feature request related to a problem? Please describe.
In #171080093, we added a couple of conditions. As a result, we now have the infrastructure to implement all the other conditions.
At the moment our CR status doesn't report on reconciliation progress and potential failures during reconciliation.
Describe the solution you'd like
In this story, let's implement the conditions related to reconciliation as we discussed previously.
Conditions:
Reconcilable:
InProgress (represents if the cluster is being reconciled) as discussed, InProgress will be addressed in a separate issue
Add these in the status field of the Cluster CRD. Set them during reconciliation.
For each condition above:
Scenario: Reconcilable Happy path ๐
Given that I deploy a RMQ cluster
And there are no reconciliation failures for any k8s child resource
When I describe the RabbitMQCluster CR
Then I see the condition `Reconcilable` in the status field set to `True`
Scenario: Reconcilable Unhappy path ๐
Given that I deploy a RMQ cluster
And a k8s child resource fails to reconcile
When I describe the RabbitMQCluster CR
Then I see the `Reconcilable` condition in the status field set to `False`
And the condition has a reason and a message field explaining the conditition status
Describe alternatives you've considered
See design document.
Additional context
The conditions should follow the Kubernetes API conventions.
This document has the conclusions of our team meeting and how we want the conditions to look like.
Edit: Reconcilable condition happy state is True. It was False before.
Is your feature request related to a problem? Please describe.
We are not automatically testing our PRs. If we forgot to run some tests before merging (which just happened this week after merging the status.conditions PR), we can break our master branch. We are working under the assumption that the master brach always works, so we have confidence to rebase on master when we work on WIP branches.
Describe the solution you'd like
Add a github hook, that will automatically run unit, integration, and system tests in a separate pipeline, and report the test results (failure or success) back the PRs. New PR should trigger a test run, and new commits to an existing PR should also trigger a test run.
Describe alternatives you've considered
Alternatively, we can keep what we have been doing: replied on the person creating and working on the PRs to run those tests. However, it could be easy to forget.
Is your feature request related to a problem? Please describe.
We do not currently have a solidified work flow for upgrading from CRD version A to B. We do not know what happens when multiple versions of an API need to transition to and from each other.
We need a place to discuss both the problems and solutions for upgrading CRDs.
Describe the solution you'd like
We should use a KEP to help collaborate efforts to explore problems and solutions related to upgrading the operator.
This issue can be closed when:
Progress
apiVersion
Is your feature request related to a problem? Please describe.
At the moment, our status conditions do not track the lastTransitionTime
. Eg:-
$ k get -oyaml rabbitmqclusters.rabbitmq.pivotal.io fj
...
<other resource information>
...
status:
conditions:
- lastTransitionTime: "1970-01-01T00:00:00Z"
reason: AllPodsAreReady
status: "True"
type: AllNodesAvailable
- lastTransitionTime: "1970-01-01T00:00:00Z"
reason: AtLeastOneNodeAvailable
status: "True"
type: ClusterAvailable
These are hardcoded to the Unix start time for now. We would like to do better, and track the last time the status condition transitioned.
Describe the solution you'd like
We pass the current status condition to the Condition
function for each condition manager. The Condition
function works out the values for status
and reason
, and if they are different from that of the current status condition, updates the lastTransitionTime
with the current time (also the lastProbeTime
).
Describe alternatives you've considered
The solution seems straightforward enough, did not consider alternatives. Happy for more suggestions :)
Is your feature request related to a problem? Please describe.
Currently, the user of the RabbitmqCluster
CR must know what the names of the Secret
and client Service
resources are in order to retrieve the necessary information to connect to the RabbitMQ instance as an admin. This is obviously problematic from the perspective of usability. Moreover, this causes the naming of these internal resources to become part of the API. If we ever change the internal names, users will face problems.
Describe the solution you'd like
Have RabbitmqCluster.status
point the user to the necessary Secret
and Service
using a reference that specifies the resources' name and its namespace. For the secret, the reference should also describe the names of the individual keys.
apiVersion: rabbitmq.pivotal.io/v1beta1
kind: RabbitmqCluster
metadata:
...
spec:
...
status:
conditions:
...
admin:
secretRef:
name: on-demand-claim-rabbitmq-admin
namespace: default
keys:
password: password
username: username
serviceRef:
name: on-demand-claim-rabbitmq-ingress
namespace: default
Describe alternatives you've considered
n/a
Additional context
I'd be open to suggestions on alternative naming for the proposed properties.
Is your feature request related to a problem? Please describe.
The use of keyword bases
in kustomize
was deprecated in Kustomize 2.1.0 in favour of keyword resources
. We are using kustomize
3.5.x and always keeping up to the latest.
Describe the solution you'd like
Adapt all our Kustomization files to not use deprecated keywords. Some reorganisation of our manifests might be needed. This will require some exploration.
Describe alternatives you've considered
N/A
Additional context
We use kubectl apply -k
to apply Kustomize templates. This option of kubectl
does not support using folders in the resources
section. The recommended way to use Kustomize templates is piping the output Kubectl as kustomize build some-folder/ | kubectl apply -f -
.
Projects generated with latest Kubebuilder use kustomize
with pipes to kubectl
.
Is your feature request related to a problem? Please describe.
Starting with 0.7, we will publish our images to PivNet Registry. That gives us the opportunity to significantly improve the installation experience.
Describe the solution you'd like
The most obvious tool to take advantage of these images is Helm. While it will require credentials to access PivNet Registry (images can't be public right now), it should be able to install the operator easily with no need to relocate the images.
A separate problem is how to distribute that chart but for now let's just build it. Perhaps there is a chart repository we could upload it to later.
Describe alternatives you've considered
Image relocation should be optional for those who have direct access to PivNet Registry. Any tool other than Helm could be used to deploy the operator. However, given Helm's popularity and the fact we already use it for KSM, I think it makes sense to use it here as well.
Describe the bug
Both an external user and we have noticed intermittently that the controller manager will crash with the following error. Here are some logs from the external user:
2020-03-30T09:56:59.320Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "rabbitmqcluster", "request": "pivotal-rabbitmq-system/rabbitmq"}
I0330 09:59:53.294085 1 leaderelection.go:288] failed to renew lease pivotal-rabbitmq-system/pivotal-rabbitmq-operator-leader-election: failed to tryAcquireOrRenew context deadline exceeded
2020-03-30T09:59:53.310Z ERROR setup problem running manager {"error": "leader election lost"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
main.main
/workspace/main.go:87
runtime.main
/usr/local/go/src/runtime/proc.go:203
Here is an example of it succeeding:
I0330 09:56:40.443087 1 leaderelection.go:242] attempting to acquire leader lease pivotal-rabbitmq-system/pivotal-rabbitmq-operator-leader-election...
I0330 09:56:58.357420 1 leaderelection.go:252] successfully acquired lease pivotal-rabbitmq-system/pivotal-rabbitmq-operator-leader-election
It takes roughly 18 seconds for the lease to be acquired for the particular Kubernetes cluster (customer mentioned that there could be performance issues due to the nature of the cluster; a slow nested environment).
Expected behavior
Ideally we would like to reduce this kind of behaviour. We can achieve this by doing two things:
Additional context
Is your feature request related to a problem? Please describe.
Currently, the client service has a port called http
. I think that's not descriptive enough as it doesn't tell the user what kind of RabbitMQ API is being served on this port.
Describe the solution you'd like
Rename the http
port to management
as that is the API served on that port.
Describe alternatives you've considered
n/a
Additional context
n/a
Collaborator added: zvance
Is your feature request related to a problem? Please describe.
It will be helpful to have a shared glossary that documents some of the terminologies we use within this project. This is to make sure we're all on the same page but also to help catch-up people new to this repo. This can be seen as the first step to more developer-centric docs that live inside this repo, in contrast to the already existing end-user-centric docs that live in a separate repo.
Describe the solution you'd like
glossary.md
in docs
.Describe alternatives you've considered
n/a
Additional context
Look at Cluster API and other projects for inspiration.
Customer tested 0.7.0-build.41 with KSM 0.7. On binding, they received the following error:
error alert message,Service broker error: Bind failure: unable to get credentials from bind template: RUNTIME ERROR: Field does not exist: ip 7:66-110 thunk <ingressIP> from <object <anonymous>> 10:29-38 thunk <mgmtURI> from <object <anonymous>> 17:20-27 object <anonymous> During manifestation
Ingress service looks like this:
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "rabbitmq-rabbitmq-ingress",
"namespace": "ksm-6e0ca5c4-1bee-4d2c-83e0-b23b0c94f5ee",
"selfLink": "/api/v1/namespaces/ksm-6e0ca5c4-1bee-4d2c-83e0-b23b0c94f5ee/services/rabbitmq-rabbitmq-ingress",
"uid": "e593a2ef-cb7e-4a3d-a689-9396adac2383",
"resourceVersion": "4066167",
"creationTimestamp": "2020-04-01T12:51:03Z",
"labels": {
"app.kubernetes.io/component": "rabbitmq",
"app.kubernetes.io/name": "rabbitmq",
"app.kubernetes.io/part-of": "pivotal-rabbitmq"
},
"ownerReferences": [
{
"apiVersion": "rabbitmq.pivotal.io/v1beta1",
"kind": "RabbitmqCluster",
"name": "rabbitmq",
"uid": "610757d2-e605-4c44-801f-8705000ba8ab",
"controller": true,
"blockOwnerDeletion": true
}
]
},
"spec": {
"ports": [
{
"name": "amqp",
"protocol": "TCP",
"port": 5672,
"targetPort": 5672,
"nodePort": 30188
},
{
"name": "http",
"protocol": "TCP",
"port": 15672,
"targetPort": 15672,
"nodePort": 30253
},
{
"name": "prometheus",
"protocol": "TCP",
"port": 15692,
"targetPort": 15692,
"nodePort": 31100
}
],
"selector": {
"app.kubernetes.io/name": "rabbitmq"
},
"clusterIP": "10.100.200.28",
"type": "LoadBalancer",
"sessionAffinity": "None",
"externalTrafficPolicy": "Cluster"
},
"status": {
"loadBalancer": {
"ingress": [
{
"hostname": "ae593a2efcb7e4a3da6899396adac238-1441943493.us-east-1.elb.amazonaws.com"
}
]
}
}
}
Is your feature request related to a problem? Please describe.
We run preStop
container lifecycle hook before any pod termination. The hook checks the synchronization status of mirror queues and quorum status of quorum queues. When users are deleting the entire RabbitmqCluster
, the preStop
hook becomes unnecessary, and we should skip the hook instead.
We have already seen issue related to the hook in our pipeline. Deletion of pods in our upgrade pipeline was once halted for 10 hours because a quorum queue not being able to recover.
Describe the solution you'd like
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Kubebuilder book's chapter on finalizers: https://book.kubebuilder.io/reference/using-finalizers.html
Is your feature request related to a problem? Please describe.
Currently a basic RabbitMQ config file is generated by the operator but can only be customized once the cluster is deployed. Since the CR should define the desired state of a deployment, RabbitMQ configuration should be a part of it.
Describe the solution you'd like
I should be able to specify configuration like this:
apiVersion: rabbitmq.pivotal.io/v1beta1
kind: RabbitmqCluster
metadata:
name: foo
spec:
rabbitmq:
additional_config: |
vm_memory_high_watermark.relative = 0.6
The value of this field should be appended to the configuration we currently generate so that the value in the ConfigMap contains settings from both (first ours and then custom).
Changes to this property should trigger a StatefulSet rolling restart to reload configuration.
This is to start the conversion on adding support for TLS to our operator. I don't have any details in mind yet but wanted to see if there any takers to define the use case and drive forward the technical design through a dedicated enhancement proposal.
Describe the bug
When testing a rolling upgrade monitored with our recommended monitoring approach using the ServiceMonitor Custom Resource, we observed that Pod metrics disappear when the pod enters Terminating
. This would appear to be because the Pod loses its endpoint in this phase transition.
To Reproduce
Steps to reproduce the behavior:
kubectl rollout restart sts <sts-name>
Raft members with >5k entries in the log
graphExpected behavior
Screenshots
Is your feature request related to a problem? Please describe.
As part of re-writing the README, I wanted to reference an example custom resource YAML. We can and do link to the pivnet docs but they aren't quite suited for local development. It also seems like a good idea to have an YAML API reference in the actual codebase. Ideally we would want this to require as little manual input as possible and be designed in such a way as to update itself as the Golang API changes.
Describe the solution you'd like
A process which takes the RabbitmqCluster Golang API definition and converts it into YAML, updating the cr-example.yaml referenced in the README
Describe alternatives you've considered
Is your feature request related to a problem? Please describe.
Our repo doesn't have a proper readme right now. Let's fix that by creating the first version of a readme that contains actual information on what this project is about etc.
Describe the solution you'd like
Self-explanatory: cerate intial readme.md
Describe alternatives you've considered
n/a
Additional context
Look at other projects such as Cluster API for inspiration
We need a workflow to upgrade the Operator. CRD upgrades (e.g. going from RabbitmqCluster v1beta1 to v1) are out of the scope for this issue. We want to explore the risks and impacts of upgrading just the operator. A few questions or scenarios that could help with this explore:
Risk: what happens if we introduce a new config map (or any child resource) that requires a modification to STS
Scenario:
We create a new ConfigMap (e.g. to provide advanced.config
file for RMQ). This will require a modification to the StatefulSet. In our current model, this should work without issues because we always reconcile the STS inline with our opinions (assuming a rolling restart is not an issue).
This will become an issue when full CRD flexibility #14 is implemented because this will require a change to the STS template. In this scenario, we will create a new ConfigMap that won't be used by the existing STS. It would require manual input to start using this new ConfigMap. How can we address this situation?
Describe the bug
Running make integration-tests
fails roughly 50% of the times with error:
------------------------------
โข Failure [1.032 seconds]
RabbitmqclusterController
/Users/pivotal/workspace/rabbitmq-for-kubernetes/controllers/rabbitmqcluster_controller_test.go:36
Persistence configurations
/Users/pivotal/workspace/rabbitmq-for-kubernetes/controllers/rabbitmqcluster_controller_test.go:347
creates the RabbitmqCluster with the specified storage from instance spec [It]
/Users/pivotal/workspace/rabbitmq-for-kubernetes/controllers/rabbitmqcluster_controller_test.go:353
Timed out after 1.005s.
Expected success, but got an error:
<*errors.StatusError | 0xc000a4b7c0>: {
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {
SelfLink: "",
ResourceVersion: "",
Continue: "",
RemainingItemCount: nil,
},
Status: "Failure",
Message: "statefulsets.apps \"rabbit-persistence-1-rabbitmq-server\" not found",
Reason: "NotFound",
Details: {
Name: "rabbit-persistence-1-rabbitmq-server",
Group: "apps",
Kind: "statefulsets",
UID: "",
Causes: nil,
RetryAfterSeconds: 0,
},
Code: 404,
},
}
statefulsets.apps "rabbit-persistence-1-rabbitmq-server" not found
/Users/pivotal/workspace/rabbitmq-for-kubernetes/controllers/rabbitmqcluster_controller_test.go:657
------------------------------
2020-03-11T13:53:19.852Z INFO controllers.rabbitmqcluster Operation Result "unchanged" for resource "rabbit-resource-2-rabbitmq-server" of Type *v1.StatefulSet
2020-03-11T13:53:19.852Z ERROR controllers.rabbitmqcluster Failed to CreateOrUpdate {"error": "Operation cannot be fulfilled on statefulsets.apps \"rabbit-resource-2-rabbitmq-server\": StorageError: invalid object, Code: 4, Key: /registry/statefulsets/rabbit-resource-2/rabbit-resource-2-rabbitmq-server, ResourceVersion: 0, AdditionalErrorMs
g: Precondition failed: UID in precondition: 91a0102e-cb7a-4d54-8c01-5c381f5d06c1, UID in object meta: "}
github.com/go-logr/zapr.(*zapLogger).Error
This is also causing our pipeline to go red requiring manual re-runs. Since the probability of this error is so high, let's fix it!
To Reproduce
Run
$ make generate fmt vet manifests
$ ginkgo -r -untilItFails controllers/
This is identical to what make integration-tests
runs except ginkgo is configured to run the tests until they fail.
A failure will be seen in the first few runs.
Expected behavior
Running
$ make generate fmt vet manifests
$ ginkgo -r -untilItFails controllers/
does not cause the tests to fail. (Can wait for about 10 runs to get relative confidence).
Additional context
Please add additional context here as you dig deeper into this :)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.