openshift / ibm-roks-toolkit
Tooling and controllers to enable hosted control plane OpenShift clusters running on IBM Cloud.
License: Apache License 2.0
See #150 (review)
The configmap should be changed to a secret. The bootstrap pod should look for both configmaps and secrets to apply.
The kube-apiserver operator performs this function for a self-hosted cluster.
We need something to take its place in a ROKS cluster:
https://github.com/openshift/cluster-kube-apiserver-operator/blob/c7b1e077d913fbe6e89d4c10312346234e31e4e4/pkg/operator/configmetrics/configmetrics.go#L20
There is a timing issue that may leave role bindings missing for hours after a cluster deployment (see the example OpenShift API server logs below). The missing shared-resource-viewers role binding causes oc new-app --name myapp https://github.com/openshift/nodejs-ex.git to fail to build with the error "error: build error: After retrying 2 times, Pull image still failed due to error: unauthorized: authentication required". There are likely other impacts beyond this example. The missing role bindings are eventually created hours later, after which oc new-app works.
E0413 16:44:39.080537 1 storage_rbac.go:316] unable to reconcile rolebinding.rbac.authorization.k8s.io/shared-resource-viewers in openshift: rolebindings.rbac.authorization.k8s.io "shared-resource-viewers" is forbidden: could not list rolebinding restrictions: the server could not find the requested resource (get rolebindingrestrictions.authorization.openshift.io)
E0413 16:42:58.333934 1 storage_rbac.go:316] unable to reconcile rolebinding.rbac.authorization.k8s.io/system:node-config-reader in openshift-node: rolebindings.rbac.authorization.k8s.io "system:node-config-reader" is forbidden: could not list rolebinding restrictions: the server could not find the requested resource (get rolebindingrestrictions.authorization.openshift.io)
The failure is when trying to get the release image info.
Using the image from a secured registry fails even when the pull secret is provided. There is no issue when using the same image via the oc 4.12 client with oc adm release info.
# grep releaseImage cluster.yaml
releaseImage: registry.ng.bluemix.net/armada-multi-master/ocp-release:4.11.4-multi
# ./ibm-roks render --pull-secret ~/.docker/config.json
FATA[0000] Error occurred rendering manifests error="unable to read image registry.ng.bluemix.net/armada-multi-master/ocp-release:4.11.4-multi: Head \"https://registry.ng.bluemix.net/v2/armada-multi-master/ocp-release/manifests/4.11.4-multi\": unauthorized: The login credentials are not valid, or your IBM Cloud account is not active."
Using the image from quay.io also fails; this failure is related to manifest lists.
# grep releaseImage cluster.yaml
releaseImage: quay.io/openshift-release-dev/ocp-release:4.11.4-multi
# ./ibm-roks render --pull-secret ~/.docker/config.json
FATA[0001] Error occurred rendering manifests error="unable to parse image quay.io/openshift-release-dev/ocp-release:4.11.4-multi: unknown image manifest of type *manifestlist.DeserializedManifestList from manifest sha256:53679d92dc0aea8ff6ea4b6f0351fa09ecc14ee9eda1b560deeb0923ca2290a1"
The new ROKS metrics component is missing CPU and memory requests, causing the OCP conformance test "[sig-arch] Managed cluster should ensure control plane pods do not run in best-effort QoS [Suite:openshift/conformance/parallel]" to fail.
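As a rough illustration of what the conformance test is checking: Kubernetes assigns the BestEffort QoS class to a pod when none of its containers sets any CPU or memory request or limit. The sketch below is a simplified check written for this note, not the conformance test's code; the function name, container name, and resource values are hypothetical.

```python
from typing import Any, Dict, List


def is_best_effort(pod_spec: Dict[str, Any]) -> bool:
    """Return True when no container sets any CPU or memory request or limit."""
    containers: List[Dict[str, Any]] = (
        (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    )
    for container in containers:
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            values = resources.get(section) or {}
            if "cpu" in values or "memory" in values:
                return False
    return True


# Hypothetical pod specs for illustration; names and values are made up.
without_requests = {"containers": [{"name": "metrics"}]}
with_requests = {
    "containers": [
        {"name": "metrics", "resources": {"requests": {"cpu": "10m", "memory": "50Mi"}}}
    ]
}

print(is_best_effort(without_requests))  # True
print(is_best_effort(with_requests))     # False
```

Adding any CPU or memory request to the component's containers moves the pod out of BestEffort, which is what the test requires.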
There's a race condition at cluster initialization with the manifest bootstrapper pod: it continuously fails and crash loops until the first worker is successfully provisioned in the cluster and the monitoring operator can roll out and initialize the servicemonitors.monitoring.coreos.com CRD.
securitycontextconstraints.security.openshift.io 2020-10-24T00:01:18Z
servicecas.operator.openshift.io 2020-10-24T00:01:38Z
servicemonitors.monitoring.coreos.com 2020-10-24T00:15:27Z
You can see that the CRD's creation comes a couple of minutes after the first worker node comes up in my cluster:
apiVersion: v1
kind: Node
metadata:
  annotations:
    projectcalico.org/IPv4Address: 10.93.34.24/26
    projectcalico.org/IPv4IPIPTunnelAddr: 172.30.32.192
  creationTimestamp: "2020-10-24T00:13:08Z"
The operator then initializes the CRD and everything completes. I believe this race condition, which causes the manifest bootstrapper pod to CrashLoop until the servicemonitors CRD is initialized, can be removed if we either create the CRD with the manifest bootstrapper or rework how it is created.
The bootstrapper ultimately completes on the first iteration after a node joins the OpenShift cluster successfully and the cluster-monitoring-operator-f7b47f45-kw7c4 pod runs.
For more details, this is where the manifest is defined:
https://github.com/openshift/cluster-monitoring-operator/blob/170f91faabc9683a34df29d1d892027292ed0296/manifests/0000_50_cluster-monitoring-operator_00_0servicemonitor-custom-resource-definition.yaml
I see two ServiceMonitors that get applied:
https://github.com/openshift/ibm-roks-toolkit/blob/master/assets/roks-metrics/roks-metrics-servicemonitor.yaml
https://github.com/openshift/ibm-roks-toolkit/blob/release-4.4/assets/cluster-bootstrap/cluster-kube-apiserver-servicemonitor.yaml
This also might be fine to accept but just thought I'd point it out.
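One possible way to tolerate the missing CRD, sketched here only as an illustration and not as the manifest bootstrapper's actual logic, is to apply the manifests whose kinds are already served and defer the rest to a later pass instead of failing. The function name and the sample manifests below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Manifest = Dict[str, object]


def split_by_available_kinds(
    manifests: List[Manifest], available: Set[Tuple[str, str]]
) -> Tuple[List[Manifest], List[Manifest]]:
    """Partition manifests into (ready, deferred) by whether their apiVersion/kind is served."""
    ready: List[Manifest] = []
    deferred: List[Manifest] = []
    for manifest in manifests:
        key = (str(manifest.get("apiVersion")), str(manifest.get("kind")))
        (ready if key in available else deferred).append(manifest)
    return ready, deferred


# Hypothetical input: the ServiceMonitor CRD has not been created yet.
manifests = [
    {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "example"}},
    {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": "example-servicemonitor"},
    },
]
available = {("v1", "ConfigMap")}

ready, deferred = split_by_available_kinds(manifests, available)
print([m["kind"] for m in ready])     # ['ConfigMap']
print([m["kind"] for m in deferred])  # ['ServiceMonitor']
```

The deferred list would be retried once the CRD exists, so the pod could keep running instead of crash looping while it waits for the first worker.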
Telemetry out of ROKS includes entries like cluster_version{type="initial",image="registry.ng.bluemix.net/armada-master/ocp-release:4.6.22-x86_64",...}. Using by-tag pullspecs from trusted registries is not dire, but pivoting to by-digest pullspecs protects you from compromised registries, mutating tags, and other excitement that can happen as an image flows out of Red Hat's build pipeline (with a signature) and over to the new cluster, until the cluster eventually updates to a by-digest pullspec. Some details on mutable-tag concerns in openshift/oc#390. Can we adjust to:
Copying from https://github.ibm.com/alchemy-containers/armada-update/issues/2617
From tugboat apiserver logs, counts of reads and writes by control-plane-operator for one cluster over about 2 minutes:
161 verb="GET" URI="/apis/apps/v1/namespaces/master-c465jhg20s3mckhh6s80/deployments/openshift-apiserver" userAgent="control-plane-operator/v0.0.0 (linux/amd64) kubernetes/$Format"
161 verb="PUT" URI="/apis/apps/v1/namespaces/master-c465jhg20s3mckhh6s80/deployments/openshift-apiserver" userAgent="control-plane-operator/v0.0.0 (linux/amd64) kubernetes/$Format"
161 verb="PUT" URI="/api/v1/namespaces/master-c465jhg20s3mckhh6s80/configmaps/openshift-apiserver-config" userAgent="control-plane-operator/v0.0.0 (linux/amd64) kubernetes/$Format"
162 verb="GET" URI="/apis/apps/v1/namespaces/master-c465jhg20s3mckhh6s80/deployments/openshift-controller-manager" userAgent="control-plane-operator/v0.0.0 (linux/amd64) kubernetes/$Format"
162 verb="PUT" URI="/apis/apps/v1/namespaces/master-c465jhg20s3mckhh6s80/deployments/openshift-controller-manager" userAgent="control-plane-operator/v0.0.0 (linux/amd64) kubernetes/$Format"
162 verb="PUT" URI="/api/v1/namespaces/master-c465jhg20s3mckhh6s80/configmaps/openshift-controller-manager-config" userAgent="control-plane-operator/v0.0.0 (linux/amd64) kubernetes/$Format"
322 verb="GET" URI="/api/v1/namespaces/master-c465jhg20s3mckhh6s80/configmaps/openshift-apiserver-config" userAgent="control-plane-operator/v0.0.0 (linux/amd64) kubernetes/$Format"
324 verb="GET" URI="/api/v1/namespaces/master-c465jhg20s3mckhh6s80/configmaps/openshift-controller-manager-config" userAgent="control-plane-operator/v0.0.0 (linux/amd64) kubernetes/$Format"
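Counts like the ones above can be produced by tallying verb and URI pairs from the apiserver log lines. The following is a minimal sketch, assuming each line carries verb="..." followed by URI="..." as in the excerpt; the function name and the sample lines (including the master-example namespace) are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the verb and URI fields in the order they appear in the excerpt.
FIELD_RE = re.compile(r'verb="(?P<verb>[^"]+)"\s+URI="(?P<uri>[^"]+)"')


def count_requests(lines: Iterable[str]) -> Counter:
    """Count occurrences of each (verb, URI) pair; lines without both fields are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = FIELD_RE.search(line)
        if match:
            counts[(match.group("verb"), match.group("uri"))] += 1
    return counts


# Hypothetical sample lines for illustration.
sample = [
    'verb="GET" URI="/api/v1/namespaces/master-example/configmaps/cfg" userAgent="x"',
    'verb="PUT" URI="/api/v1/namespaces/master-example/configmaps/cfg" userAgent="x"',
    'verb="GET" URI="/api/v1/namespaces/master-example/configmaps/cfg" userAgent="x"',
    "unrelated line",
]

counts = count_requests(sample)
for (verb, uri), n in counts.most_common():
    print(n, verb, uri)
```

Seeing one PUT for each GET of the same object, as in the counts above, is a hint that the client writes on every sync regardless of whether anything changed.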
From the performance team: the control-plane-operator logs contain many entries like this:
kubectl logs control-plane-operator-68d6d7d445-g4czf -n master-c6adpbk20mj47823vl7g --tail=50
2021-11-23T13:23:49.077Z INFO control-plane-operator.OpenShiftAPIServerClient Updating OpenShift APIServer configmap
2021-11-23T13:23:49.116Z INFO control-plane-operator.OpenShiftControllerManagerClient Updating OpenShift Controller Manager deployment
2021-11-23T13:23:49.258Z INFO control-plane-operator.OpenShiftAPIServerClient Updating OpenShift APIServer deployment
I1123 13:23:49.489453 1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.16ba2fabd0aa669c dummy 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ObservedConfigChanged,Message:Writing updated observed config: map[string]interface{}{
"build": map[string]interface{}{"buildDefaults": map[string]interface{}{"resources": map[string]interface{}{}}, "imageTemplateFormat": map[string]interface{}{"format": string("quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d83f2ed0b41ad5f5b08d775a61f78c2459b4938e7cc53d3fb75ec68f672e8e48")}},
"deployer": map[string]interface{}{"imageTemplateFormat": map[string]interface{}{"format": string("quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:56a377e0f3e48105f5dc0d3d70e1d821fcb5f282023f0a5410b7df9d5b617a65")}},
- "dockerPullSecret": map[string]interface{}{
- "internalRegistryHostname": string("image-registry.openshift-image-registry.svc:5000"),
- },
}
,Source:EventSource{Component:,Host:,},FirstTimestamp:2021-11-23 13:23:49.489338012 +0000 UTC m=+527581.065721643,LastTimestamp:2021-11-23 13:23:49.489338012 +0000 UTC m=+527581.065721643,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
2021-11-23T13:23:49.545Z INFO control-plane-operator.OpenShiftControllerManagerClient Updating OpenShift Controller Manager configmap
2021-11-23T13:23:49.682Z INFO control-plane-operator.OpenShiftControllerManagerClient Updating OpenShift Controller Manager deployment
I1123 13:23:49.911168 1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.16ba2fabe9cd1eea dummy 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ObservedConfigChanged,Message:Writing updated observed config: map[string]interface{}{
- "imagePolicyConfig": map[string]interface{}{
- "internalRegistryHostname": string("image-registry.openshift-image-registry.svc:5000"),
- },
"projectConfig": map[string]interface{}{"projectRequestMessage": string("")},
}
,Source:EventSource{Component:,Host:,},FirstTimestamp:2021-11-23 13:23:49.911043818 +0000 UTC m=+527581.487427205,LastTimestamp:2021-11-23 13:23:49.911043818 +0000 UTC m=+527581.487427205,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
2021-11-23T13:23:49.959Z INFO control-plane-operator.OpenShiftAPIServerClient Updating OpenShift APIServer configmap
2021-11-23T13:23:50.028Z INFO control-plane-operator.OpenShiftAPIServerClient Updating OpenShift APIServer deployment
I1123 13:23:50.490253 1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.16ba2fac0c513aee dummy 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ObservedConfigChanged,Message:Writing updated observed config: map[string]interface{}{
"build": map[string]interface{}{"buildDefaults": map[string]interface{}{"resources": map[string]interface{}{}}, "imageTemplateFormat": map[string]interface{}{"format": string("quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d83f2ed0b41ad5f5b08d775a61f78c2459b4938e7cc53d3fb75ec68f672e8e48")}},
"deployer": map[string]interface{}{"imageTemplateFormat": map[string]interface{}{"format": string("quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:56a377e0f3e48105f5dc0d3d70e1d821fcb5f282023f0a5410b7df9d5b617a65")}},
- "dockerPullSecret": map[string]interface{}{
- "internalRegistryHostname": string("image-registry.openshift-image-registry.svc:5000"),
- },
}
,Source:EventSource{Component:,Host:,},FirstTimestamp:2021-11-23 13:23:50.490127086 +0000 UTC m=+527582.066510970,LastTimestamp:2021-11-23 13:23:50.490127086 +0000 UTC m=+527582.066510970,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
2021-11-23T13:23:50.525Z INFO control-plane-operator.OpenShiftControllerManagerClient Updating OpenShift Controller Manager configmap
2021-11-23T13:23:50.645Z INFO control-plane-operator.OpenShiftControllerManagerClient Updating OpenShift Controller Manager deployment
I1123 13:23:50.911327 1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.16ba2fac256a3e8a dummy 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ObservedConfigChanged,Message:Writing updated observed config: map[string]interface{}{
- "imagePolicyConfig": map[string]interface{}{
- "internalRegistryHostname": string("image-registry.openshift-image-registry.svc:5000"),
- },
"projectConfig": map[string]interface{}{"projectRequestMessage": string("")},
}
,Source:EventSource{Component:,Host:,},FirstTimestamp:2021-11-23 13:23:50.91119681 +0000 UTC m=+527582.487580430,LastTimestamp:2021-11-23 13:23:50.91119681 +0000 UTC m=+527582.487580430,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
I1123 13:23:51.490272 1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.16ba2fac47ec105e dummy 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ObservedConfigChanged,Message:Writing updated observed config: map[string]interface{}{
"build": map[string]interface{}{"buildDefaults": map[string]interface{}{"resources": map[string]interface{}{}}, "imageTemplateFormat": map[string]interface{}{"format": string("quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d83f2ed0b41ad5f5b08d775a61f78c2459b4938e7cc53d3fb75ec68f672e8e48")}},
"deployer": map[string]interface{}{"imageTemplateFormat": map[string]interface{}{"format": string("quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:56a377e0f3e48105f5dc0d3d70e1d821fcb5f282023f0a5410b7df9d5b617a65")}},
- "dockerPullSecret": map[string]interface{}{
- "internalRegistryHostname": string("image-registry.openshift-image-registry.svc:5000"),
- },
}
,Source:EventSource{Component:,Host:,},FirstTimestamp:2021-11-23 13:23:51.490130014 +0000 UTC m=+527583.066514718,LastTimestamp:2021-11-23 13:23:51.490130014 +0000 UTC m=+527583.066514718,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
2021-11-23T13:23:51.795Z INFO control-plane-operator.OpenShiftAPIServerClient Updating OpenShift APIServer configmap
2021-11-23T13:23:51.796Z INFO control-plane-operator.OpenShiftControllerManagerClient Updating OpenShift Controller Manager configmap
2021-11-23T13:23:52.087Z INFO control-plane-operator.OpenShiftAPIServerClient Updating OpenShift APIServer deployment
I believe the "Writing updated observed config" events (the recorder_logging.go lines) come from updateObservedConfig - https://github.com/openshift/library-go/blob/release-4.9/pkg/operator/configobserver/config_observer_controller.go#L184
That is called by a sync function - https://github.com/openshift/library-go/blob/release-4.9/pkg/operator/configobserver/config_observer_controller.go#L162
I don't see the configmaps or deployments actually changing over time. The event output suggests that the objects seen by the sync code are always missing those fields - yet I see them in the configmaps.
It seems like that logic is either not seeing the current configmaps or not comparing actual / expected properly.
Final note: In my test deployment this behavior stops after 20 minutes or so. The performance team sees that continuously for all clusters, possibly because they deploy clusters without workers?
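The comparison between actual and expected state mentioned above can be illustrated with a small sketch. This is not the control-plane-operator's code; it only shows the general pattern of skipping the write when the desired data already matches what is stored. The function name, the update callback, and the sample data are hypothetical.

```python
from typing import Callable, Dict


def sync_configmap(
    current: Dict[str, str],
    desired: Dict[str, str],
    update: Callable[[Dict[str, str]], None],
) -> bool:
    """Call update only when the desired data differs from the current data."""
    if current == desired:
        return False
    update(desired)
    return True


# Hypothetical data; `writes` stands in for PUT requests sent to the API server.
writes = []
data = {"config.yaml": "imagePolicyConfig: {}"}

print(sync_configmap(dict(data), dict(data), writes.append))  # False: nothing to write
print(sync_configmap({}, data, writes.append))                # True: one write issued
print(len(writes))                                            # 1
```

If the sync code compares against a stale or incomplete view of the object, the equality check never succeeds and every sync produces a PUT, which would match the request counts reported above.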