
cluster-api-provider-gcp's Introduction




Kubernetes Cluster API Provider GCP

Kubernetes-native declarative infrastructure for GCP.

What is the Cluster API Provider GCP?

The Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration, and management. The API itself is shared across multiple cloud providers, allowing for true Google Cloud hybrid deployments of Kubernetes.

Documentation

Please see our book for in-depth documentation.

Quick Start

Check out our Cluster API Quick Start to create your first Kubernetes cluster on Google Cloud Platform using Cluster API.
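
At a high level, the flow looks something like the sketch below; the exact environment variables, flags, and versions come from the Quick Start itself, so treat this as illustrative only:

    # Install the GCP infrastructure provider into an existing management cluster
    clusterctl init --infrastructure gcp

    # Render a workload cluster manifest and apply it (cluster name and version are placeholders)
    clusterctl generate cluster my-cluster --kubernetes-version v1.22.0 > my-cluster.yaml
    kubectl apply -f my-cluster.yaml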


Support Policy

This provider's versions are compatible with the following versions of Cluster API:

  • Google Cloud Provider v0.3.x: Cluster API v1alpha3 (v0.3.x)
  • Google Cloud Provider v0.4.x: Cluster API v1alpha4 (v0.4.x)
  • Google Cloud Provider v1.0.x: Cluster API v1beta1 (v1.0.x)

This provider's versions are able to install and manage the following versions of Kubernetes:

  • Kubernetes 1.15 through 1.22, with each provider release (Google Cloud Provider v0.3.x, v0.4.x, and v1.0.x) supporting a subset of this range; see the book for the exact compatibility matrix.

Each version of Cluster API for Google Cloud will attempt to support at least two versions of Kubernetes e.g., Cluster API for GCP v0.1 may support Kubernetes 1.13 and Kubernetes 1.14.

NOTE: As the versioning for this project is tied to the versioning of Cluster API, future modifications to this policy may be made to more closely align with other providers in the Cluster API ecosystem.


Getting Involved and Contributing

Are you interested in contributing to cluster-api-provider-gcp? We, the maintainers, and the community would love your suggestions, support, and contributions! The maintainers of the project can be contacted at any time to learn how to get involved.

Before starting with a contribution, please go through the prerequisites of the project.

To set up the development environment, check out the development guide.
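
If you want a head start before reading it, the first steps typically look roughly like this (a sketch assuming the repository's standard Makefile targets; the development guide is authoritative):

    # Clone the provider and run the unit tests (assumes Go and make are installed)
    git clone https://github.com/kubernetes-sigs/cluster-api-provider-gcp.git
    cd cluster-api-provider-gcp
    make test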

In the interest of getting new people involved, we have issues marked as good first issue. These issues have a smaller scope but are very helpful for getting acquainted with the codebase. For more, see the issue tracker. If you're unsure where to start, feel free to reach out to discuss.

See also: Our own contributor guide and the Kubernetes community page.

We also encourage ALL active community participants to act as if they were maintainers, even if you don't have 'official' write permissions. This is a community effort and we are here to serve the Kubernetes community. If you have an active interest and you want to get involved, you have real power!

Office hours

  • Join the SIG Cluster Lifecycle Google Group for access to documents and calendars.
  • Participate in the conversations on Kubernetes Discuss
  • Provider implementers office hours (CAPI)
    • Weekly on Wednesdays @ 10:00 am PT (Pacific Time) on Zoom
    • Previous meetings: [ notes | recordings ]
  • Cluster API Provider GCP office hours (CAPG)
    • Monthly on first Thursday @ 09:00 am PT (Pacific Time) on Zoom
    • Previous meetings: [ notes | recordings ]

Other ways to communicate with the contributors

Please check in with us in the #cluster-api-gcp channel on Slack.

GitHub Issues

Bugs

If you think you have found a bug, please follow the instructions below.

  • First, spend a small amount of time doing due diligence on the issue tracker; your issue might be a duplicate.
  • Get the logs from the custom controllers and paste them into the issue (see the sketch after this list).
  • Open a bug report.
  • Remember that users might be searching for the issue in the future, so please give it a meaningful title to help others.
  • Feel free to reach out to the community on Slack.
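
A minimal sketch of grabbing those logs, assuming the provider was installed into its default capg-system namespace (adjust the namespace and deployment name if your installation differs):

    # Capture recent logs from the CAPG controller manager to attach to the issue
    kubectl logs -n capg-system deployment/capg-controller-manager --since=1h > capg-controller.log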

Tracking new features

We also use the issue tracker to track feature requests. If you have a feature idea that could make Cluster API Provider GCP even more awesome, follow these steps.

  • Open a feature request.
  • Remember that users might be searching for the issue in the future, so please give it a meaningful title to help others.
  • Clearly define the use case with concrete examples. Example: type this and cluster-api-provider-gcp does that.
  • Some of our larger features will require a design. If you would like to include a technical design in your feature request, please go ahead.
  • After the new feature is well understood and the design is agreed upon, we can start coding it. We would love for you to do the coding, so please open up a WIP (work in progress) PR. Happy coding!

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.


cluster-api-provider-gcp's Issues

[Failing Test] (capg-conformance-main-ci-artifacts) No Control Plane machines came into existence

Duplicate of kubernetes/kubernetes#120481


Which jobs are failing?

master-informing:

Which tests are failing?

capg-e2e.[It] Conformance Tests Should run conformance tests


Since when has it been failing?

The last 2 runs failed consecutively; first failure: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-provider-gcp-make-conformance-main-ci-artifacts/1698303295132536832

Testgrid link

https://testgrid.k8s.io/sig-release-master-informing#capg-conformance-main-ci-artifacts

Reason for failure (if possible)

[FAILED] Timed out after 1800.000s.
No Control Plane machines came into existence.
Expected
    <bool>: false
to be true
In [It] at: /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/controlplane_helpers.go:153 @ 09/03/23 12:42:02.614

Allow for local-ssd

Given that local SSDs are very performant and, with a 3-year commitment, less expensive than pd-standard (I do not believe you can get commitment discounts on other storage), I would like to use them for my /var (with a small pd-standard boot disk). The fact that they are ephemeral and that I cannot shut down machines where they are present is not a concern for OpenShift nodes.

Unfortunately, the Cluster API provider does not support them. When I create a machine config with a disk of type local-ssd, I get the following error message:

error launching instance: googleapi: Error 400: Invalid value for field 'resource.disks[1].type': 'PERSISTENT'. Cannot create local SSD as persistent disk., invalid

Here is the spec/providerSpec/disk[1] definition I used:

            - autoDelete: true
              image: blank
              sizeGb: 375
              type: local-ssd

When I look at the allowed configuration values and the reconciler, I don't see any way to specify the disk type as required by the Google API.

Additionally, I could not find a way to keep the sourceImage and sizeGb from being specified, or to set the interface.
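
For reference, the raw Compute Engine API (google.golang.org/api/compute/v1 in Go) models a local SSD as a SCRATCH disk with a local-ssd disk type and an NVMe or SCSI interface, with no source image and a size fixed at 375 GB by GCP. A sketch of what the provider would need to emit, not actual provider code:

    import (
        "fmt"

        compute "google.golang.org/api/compute/v1"
    )

    // buildLocalSSD describes a local SSD attachment in the given zone.
    // Local SSDs are SCRATCH disks: no SourceImage, and GCP fixes the size at 375 GB.
    func buildLocalSSD(zone string) *compute.AttachedDisk {
        return &compute.AttachedDisk{
            Type:       "SCRATCH", // the request above sent "PERSISTENT", hence the 400 error
            AutoDelete: true,
            Interface:  "NVME", // or "SCSI"
            InitializeParams: &compute.AttachedDiskInitializeParams{
                DiskType: fmt.Sprintf("zones/%s/diskTypes/local-ssd", zone),
            },
        }
    }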

I am happy to work on a PR but have two major obstacles.

Any help or advice is greatly appreciated.

Unable to install OpenShift on GCP with quick start.

Team, we are trying to set up OpenShift Container Platform on GCP project "XXXX" using the steps mentioned in this document:

https://docs.openshift.com/container-platform/4.2/installing/installing_gcp/installing-gcp-default.html

Our GCP installation is failing with this error:

time="2021-04-27T07:09:05Z" level=fatal msg="failed to initialize the cluster: Working towards 4.5.35: 86% complete, waiting on authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring"

Attaching the installation log file.

When I debugged further, I found that worker nodes were not getting created because the machine-controller was failing with this error:

oc logs -f machine-api-controllers-76c494db9-qgmmz -c machine-controller -n openshift-machine-api | grep failed

outputs:

E0427 10:21:34.152815 1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="ocp-sda-cluster-445cl-master-0: failed to create scope for machine: error getting credentials secret "gcp-cloud-credentials" in namespace "openshift-machine-api": Secret "gcp-cloud-credentials" not found" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"ocp-sda-cluster-445cl-master-0"}

It was not finding gcp-cloud-credentials, so I created it manually by copying from gcp-credentials in kube-system.
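
(For anyone else hitting this, the manual copy can be done roughly as follows; a workaround sketch assuming jq is available, not the supported fix:)

    # Copy the kube-system credentials to the name/namespace the machine controller expects
    oc get secret gcp-credentials -n kube-system -o json \
      | jq '.metadata = {name: "gcp-cloud-credentials", namespace: "openshift-machine-api"}' \
      | oc apply -f -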

Later it created worker nodes in GCP Compute Engine, but they failed to register with the OpenShift cluster, with these errors:

W0427 10:38:13.369046 1 controller.go:315] ocp-sda-cluster-445cl-worker-c-d25cj: failed to create machine: requeue in: 20s
W0427 10:38:15.345235 1 controller.go:315] ocp-sda-cluster-445cl-worker-a-pr7br: failed to create machine: requeue in: 20s
W0427 10:38:17.318970 1 controller.go:315] ocp-sda-cluster-445cl-worker-a-bxpwn: failed to create machine: requeue in: 20s

I also started a thread on the OpenShift Google Group about it:

https://groups.google.com/g/openshift/c/xhORse_wE9I/m/8PZYZP__AgAJ

Currently, with oc get nodes I can see the master nodes, but no worker nodes are being registered:

ubuntu@openshift-install:~/openshift-sda$ oc get nodes

NAME                                                         STATUS   ROLES    AGE   VERSION
ocp-sda-cluster-445cl-master-0.c.flow-on-k8s-test.internal   Ready    master   30h   v1.18.3+cdb0358
ocp-sda-cluster-445cl-master-1.c.flow-on-k8s-test.internal   Ready    master   30h   v1.18.3+cdb0358
ocp-sda-cluster-445cl-master-2.c.flow-on-k8s-test.internal   Ready    master   30h   v1.18.3+cdb0358

On the GCP console I can see the worker nodes, but they are not getting initialized.

The worker node bootstrap has these logs:

"resource": {
"type": "gce_instance",
"labels": {
"instance_id": "2636235209896160329",
"project_id": "XXXXX",
"zone": "us-central1-a"
}
},
"timestamp": "2021-05-04T13:59:01.459139363Z",
"severity": "ERROR",
"logName": "projects/XXXXX/logs/compute.googleapis.com%2Fshielded_vm_integrity",
"receiveTimestamp": "2021-05-04T13:59:03.468291729Z"

Termination handler may be slow to detect instance preemption while running inside the instance

Some doubts about the overall GCP termination handler capabilities. Reading over the docs, https://cloud.google.com/compute/docs/instances/preemptible#testing-preemption-settings describes how to test that your apps handle preemption well. Our app, in this case the termination handler, does its job perfectly if the event is just simulated. But I don't think a simulated event reproduces the real thing, where the instance is removed immediately. In a real environment, I imagine a spot instance will be transferred to other demand as soon as possible, so the actual time from the preemption event to the instance reaching the TERMINATING state could be quite small, possibly less than the default 5s polling interval we have now. This means our termination handler pod could be killed on the node before it knows it is about to die. If that happens, we leave the node and machine running, the instance is terminated, and worst of all the pod still appears to be running. This is the behavior I observed while manually stopping an instance with gcloud compute instances stop, which is described as one of the test scenarios in the doc above. Even with the guaranteed 30s margin before the hard stop described in the preemption process, our termination handler could be stopped as one of the first processes on the instance, in much less than 5s.

This judgment comes from observing the recommended way to handle the preemption procedure described there: a dedicated runnable, executed by the cloud during the shutdown procedure, that waits for some main process to exit as the result of a soft shutdown. Our termination handler does not run as a shutdown script, so we should follow the other recommended way of handling this from the outside, https://cloud.google.com/compute/docs/instances/create-start-preemptible-instance#gcloud_2, i.e. a hanging HTTP request made by appending ?wait_for_change=true to the metadata URL.
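
A minimal sketch of that hanging watch, run from inside the instance against the metadata server:

    # Blocks until the instance's "preempted" value changes (FALSE -> TRUE on preemption)
    curl -s -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true"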
