metal3-io / baremetal-operator
Bare metal host provisioning integration for Kubernetes
License: Apache License 2.0
The BMO wants the user data values to come from a Secret because that's how OpenShift stores the data needed by a host running an RHCOS image to become a node. That isn't the most convenient form for other types of images, though, so it would be nice to add an easier alternative. We could expand the BareMetalHost CRD to accept base64-encoded data directly, or to have a more defined structure for the data.
See https://github.com/metal3-io/metal3-dev-env/blob/master/provision_host.sh for an example of a script that uses the existing Secret mechanism.
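One possible shape for an inline alternative is sketched below in Go; the struct and field names are assumptions for illustration, not the actual CRD API.

package main

import (
	"encoding/json"
	"fmt"
)

// UserDataSource is a hypothetical sketch of how the spec could accept user
// data either by Secret reference (the existing mechanism) or inline; the
// names are illustrative, not the actual CRD API.
type UserDataSource struct {
	// SecretName references a Secret that holds the user data.
	SecretName string `json:"secretName,omitempty"`
	// Inline holds base64-encoded user data provided directly in the CR.
	Inline string `json:"inline,omitempty"`
}

func main() {
	out, _ := json.Marshal(UserDataSource{Inline: "I2Nsb3VkLWNvbmZpZwo="}) // "#cloud-config\n"
	fmt.Println(string(out))
}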
At the moment, pkg/provisioner/ironic/ironic.go includes a hard-coded IP address for the baremetal-operator to use on its provisioning network:
// FIXME(dhellmann): We need to get our IP on the
// provisioning network from somewhere.
driverInfo["deploy_kernel"] = "http://172.22.0.1/images/ironic-python-agent.kernel"
driverInfo["deploy_ramdisk"] = "http://172.22.0.1/images/ironic-python-agent.initramfs"
As noted by the FIXME comment there, we need a way to discover this IP that makes it easier to use the operator in different environments.
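A minimal sketch of one way to make these URLs configurable, assuming hypothetical environment variable overrides (not settings the operator defines today):

package main

import (
	"fmt"
	"os"
)

// getDeployImageURL prefers an environment variable override and falls back
// to the current hard-coded default. The variable names are assumptions for
// illustration only.
func getDeployImageURL(envVar, defaultURL string) string {
	if v := os.Getenv(envVar); v != "" {
		return v
	}
	return defaultURL
}

func main() {
	kernel := getDeployImageURL("DEPLOY_KERNEL_URL", "http://172.22.0.1/images/ironic-python-agent.kernel")
	ramdisk := getDeployImageURL("DEPLOY_RAMDISK_URL", "http://172.22.0.1/images/ironic-python-agent.initramfs")
	fmt.Println(kernel, ramdisk)
}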
The baremetal-operator should provide a simple way for the Machine actuator to determine whether a given BareMetalHost is in a state ready to be provisioned. This will be used in selection criteria in the Machine actuator.
Right now we hardcode a network name of "Pod networking" for every NIC. This should be either removed, or made dynamic somehow.
https://github.com/metal3-io/baremetal-operator/blob/master/pkg/provisioner/ironic/ironic.go#L378
Example:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"metal3.io/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{},"name":"kube-worker-0","namespace":"metal3"},"spec":{"bmc":{"address":"ipmi://192.168.111.1:6233","credentialsName":"kube-worker-0-bmc-secret"},"bootMACAddress":"00:93:1e:b1:74:87","online":true}}
  creationTimestamp: "2019-05-03T18:24:45Z"
  finalizers:
  - baremetalhost.metal3.io
  generation: 2
  name: kube-worker-0
  namespace: metal3
  resourceVersion: "1382"
  selfLink: /apis/metal3.io/v1alpha1/namespaces/metal3/baremetalhosts/kube-worker-0
  uid: ba1d5285-6dd0-11e9-86cf-4c9a6490472b
spec:
  bmc:
    address: ipmi://192.168.111.1:6233
    credentialsName: kube-worker-0-bmc-secret
  bootMACAddress: 00:93:1e:b1:74:87
  hardwareProfile: ""
  online: true
status:
  errorMessage: ""
  goodCredentials:
    credentials:
      name: kube-worker-0-bmc-secret
      namespace: metal3
    credentialsVersion: "807"
  hardware:
    cpu:
      count: 2
      model: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
      speedGHz: 3.50401
      type: x86_64
    nics:
    - ip: 172.22.0.54
      mac: 00:93:1e:b1:74:87
      model: 0x1af4 0x0001
      name: eth0
      network: Pod Networking
      speedGbps: 0
    - ip: 192.168.111.23
      mac: 00:93:1e:b1:74:89
      model: 0x1af4 0x0001
      name: eth1
      network: Pod Networking
      speedGbps: 0
    ramGiB: 4
    storage:
    - model: QEMU QEMU HARDDISK
      name: /dev/sda
      sizeGiB: 50
      type: HDD
    - model: '0x1af4 '
      name: /dev/vda
      sizeGiB: 8
      type: HDD
  hardwareProfile: unknown
  lastUpdated: "2019-05-03T18:29:58Z"
  operationalStatus: OK
  poweredOn: true
  provisioning:
    ID: c718759b-518e-446b-afd2-010374971f81
    image:
      checksum: ""
      url: ""
    state: ready
Implement the Ironic provisioning workflow.
https://github.com/metalkube/baremetal-operator/blob/master/pkg/provisioning/provisioning.go
Issue #32 discusses the option of configuring Ironic to completely wipe disks in the future.
In the meantime, we should default to at least erasing metadata during deprovisioning to ensure that powering the node back on will not boot back into the OS previously provisioned.
According to Julia Kreger, the Ironic configuration we want is:
[conductor]
automated_clean = True
[deploy]
erase_devices_priority = 0
erase_devices_metadata_priority = 10
This will ensure we always drive a node through cleaning when we set it up. It also ensures we skip the general disk wipe/erase and enables metadata-only wiping.
Ironic is capable of doing discovery of new hosts. This should be an optional capability of the baremetal-operator. When we discover a new host, it should result in automatically creating a corresponding BareMetalHost object.
We need to provide some basic developer documentation about which parts of the operator need to change to add hardware support. For example, adding support for new BMC types, and, once we decide how we're really going to manage hardware profiles, describing that well.
The baremetal-operator should actively monitor the power state of each host to ensure it matches the desired state.
We should show the power status of each host, and we should adjust the power state when it does not match what the CRD shows the user wanted.
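A minimal sketch of that reconciliation rule, using stand-in types rather than the operator's actual API:

package main

import "fmt"

// host is a stand-in for the relevant BareMetalHost fields: the desired
// power state from the spec and the observed state reported by the BMC.
type host struct {
	desiredOn  bool // spec.online
	observedOn bool // status.poweredOn
}

// reconcilePower returns the power action needed to bring the observed
// state in line with the desired state.
func reconcilePower(h host) string {
	switch {
	case h.desiredOn && !h.observedOn:
		return "power on"
	case !h.desiredOn && h.observedOn:
		return "power off"
	default:
		return "no action"
	}
}

func main() {
	fmt.Println(reconcilePower(host{desiredOn: true, observedOn: false})) // power on
}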
We'd like to deploy bare metal servers with RAID configured. Setting up or unsetting RAID requires a vendor driver. This issue proposes a new YAML attribute for configuring RAID on Fujitsu PRIMERGY servers using the iRMC driver.
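A rough sketch of what such an attribute might carry, using hypothetical Go types rather than the proposed API:

package main

import "fmt"

// RAIDVolume and RAIDConfig are illustrative stand-ins for a RAID attribute
// on the host spec; the names and fields are assumptions, not the proposal.
type RAIDVolume struct {
	Name    string
	Level   string // e.g. "0", "1", "5", "10"
	SizeGiB int
}

type RAIDConfig struct {
	Volumes []RAIDVolume
}

func main() {
	cfg := RAIDConfig{Volumes: []RAIDVolume{{Name: "system", Level: "1", SizeGiB: 100}}}
	fmt.Printf("%+v\n", cfg)
}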
Set the state of externally provisioned hosts (masters) so they appear as adopted systems.
For bare metal hosts, a 'hardwareProfile' is required in the spec section.
So far a BareMetalHost will reflect an error if BMC credentials are not set. Once those credentials are set in Ironic, we can use the state of the ironic node to determine if the BMC credentials are valid. We know the credentials are valid when the bare metal node moves to the "manageable" state.
https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/node_states.html
Setting up or resetting BIOS settings requires a vendor driver. This issue proposes a new YAML attribute for configuring BIOS on Fujitsu PRIMERGY servers using the iRMC driver.
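As with the RAID proposal above, a rough sketch of what such an attribute might look like, using hypothetical types:

package main

import "fmt"

// BIOSConfig is an illustrative stand-in for a BIOS settings attribute;
// key/value pairs would be passed through to the vendor driver.
type BIOSConfig struct {
	Settings map[string]string
}

func main() {
	cfg := BIOSConfig{Settings: map[string]string{"hyper_threading_enabled": "true"}}
	fmt.Printf("%+v\n", cfg)
}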
I just checked the resulting hardware details from introspection on the BareMetalHost objects via metal3-io/metal3-dev-env and noticed that the NICs are listed with a speed of 0.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"metal3.io/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{},"name":"kube-worker-0","namespace":"metal3"},"spec":{"bmc":{"address":"ipmi://192.168.111.1:6233","credentialsName":"kube-worker-0-bmc-secret"},"bootMACAddress":"00:93:1e:b1:74:87","online":true}}
  creationTimestamp: "2019-05-03T18:24:45Z"
  finalizers:
  - baremetalhost.metal3.io
  generation: 2
  name: kube-worker-0
  namespace: metal3
  resourceVersion: "1382"
  selfLink: /apis/metal3.io/v1alpha1/namespaces/metal3/baremetalhosts/kube-worker-0
  uid: ba1d5285-6dd0-11e9-86cf-4c9a6490472b
spec:
  bmc:
    address: ipmi://192.168.111.1:6233
    credentialsName: kube-worker-0-bmc-secret
  bootMACAddress: 00:93:1e:b1:74:87
  hardwareProfile: ""
  online: true
status:
  errorMessage: ""
  goodCredentials:
    credentials:
      name: kube-worker-0-bmc-secret
      namespace: metal3
    credentialsVersion: "807"
  hardware:
    cpu:
      count: 2
      model: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
      speedGHz: 3.50401
      type: x86_64
    nics:
    - ip: 172.22.0.54
      mac: 00:93:1e:b1:74:87
      model: 0x1af4 0x0001
      name: eth0
      network: Pod Networking
      speedGbps: 0
    - ip: 192.168.111.23
      mac: 00:93:1e:b1:74:89
      model: 0x1af4 0x0001
      name: eth1
      network: Pod Networking
      speedGbps: 0
    ramGiB: 4
    storage:
    - model: QEMU QEMU HARDDISK
      name: /dev/sda
      sizeGiB: 50
      type: HDD
    - model: '0x1af4 '
      name: /dev/vda
      sizeGiB: 8
      type: HDD
  hardwareProfile: unknown
  lastUpdated: "2019-05-03T18:29:58Z"
  operationalStatus: OK
  poweredOn: true
  provisioning:
    ID: c718759b-518e-446b-afd2-010374971f81
    image:
      checksum: ""
      url: ""
    state: ready
We use Event objects to record history for operations associated with a given host. However, events have a finite lifetime that is shorter than that of the host object, and so when they are cleaned up we will lose some of the history.
To address that, we should consider adding more details to the status block of the host object itself. We probably don't want to keep the full history of the host indefinitely, but we may want to keep more information than we have today.
References:
The operator-sdk supports generating OpenAPI validation parameters for the CRD automatically ("operator-sdk generate openapi"). We need to start taking advantage of that.
See https://github.com/operator-framework/operator-sdk/blob/master/doc/sdk-cli-reference.md#openapi and https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions
This was just launched: https://www.operatorhub.io/
We should consider publishing our bare metal operator to this directory to help make it discoverable. We should wait until it's more functional, though.
Currently, the network info we have on a BareMetalHost looks like this:
nics:
- ip: 172.22.0.86
  mac: 00:5a:10:3f:c2:3d
  model: 0x1af4 0x0001
  name: eth0
  network: Pod Networking
  speedGbps: 0
- ip: 192.168.111.21
  mac: 00:5a:10:3f:c2:3f
  model: 0x1af4 0x0001
  name: eth1
  network: Pod Networking
  speedGbps: 0
If a hostname is provided by DHCP, I would like to see it as a new field in here.
This is needed by metal3-io/cluster-api-provider-baremetal#49
The issue is that we eventually need all of the addresses that show up on a Node to also be in the Machine status. Right now that includes both IP and hostname. The info we have so far will let us populate the Machine status field with the expected IPs, but not hostname.
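A minimal sketch of the requested addition, using a stand-in NIC type rather than the operator's actual status struct:

package main

import "fmt"

// NIC is a stand-in for the NIC status entry, showing the kind of field
// addition being requested here; it is not the operator's actual type.
type NIC struct {
	Name      string
	MAC       string
	IP        string
	SpeedGbps int
	// Hostname would carry the name handed out by DHCP, when one is provided.
	Hostname string
}

func main() {
	fmt.Printf("%+v\n", NIC{Name: "eth1", IP: "192.168.111.21", Hostname: "kube-worker-0"})
}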
The current machine reference field holds the name of a Machine object, which ties the host objects closely to the Cluster API. Let's rename that and make it a simple string to hold a "consumer ID" so we can still track that something is using the host, but not require that the something be a Machine.
When a BareMetalHost is provisioned, if there is an issue during the deployment and the image hasn't changed, we will try again. In fact, we will try forever.
I expect that between each failure, we will bubble up the error message from Ironic per https://github.com/metal3-io/baremetal-operator/blob/master/pkg/provisioner/ironic/ironic.go#L809-L828
However, I do not see that playing out consistently. Take the following example, where the md5sum for ub16-password-is-ubuntu.qcow2 was intentionally corrupted and then eventually fixed, allowing the provisioning to complete. I do not see any error history. When we try the same for ub-16.04-test.img and, in the middle of that process, switch the image to ub16-password-is-ubuntu.qcow2 (this time with a correct checksum), we do see the error bubble up for the ub-16.04-test.img attempt.
I am still struggling to find the exact cause of this and would be happy to fix the bug if I could find it. I suspect--but have not confirmed--that as long as the image doesn't change, https://github.com/metal3-io/baremetal-operator/blob/master/pkg/provisioner/ironic/ironic.go#L823-L824 may not be doing what we anticipate, but this is only conjecture.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal DeprovisioningStarted 100m metal3-baremetal-controller Image deprovisioning started
Normal PowerOn 98m metal3-baremetal-controller Host powered on
Normal DeprovisioningComplete 98m metal3-baremetal-controller Image deprovisioning completed
Normal ProvisioningStarted 88m metal3-baremetal-controller Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
Normal ProvisioningStarted 84m metal3-baremetal-controller Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
Normal ProvisioningStarted 83m metal3-baremetal-controller Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
Normal ProvisioningStarted 81m metal3-baremetal-controller Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
Normal ProvisioningStarted 80m metal3-baremetal-controller Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
Normal ProvisioningStarted 78m metal3-baremetal-controller Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
Normal ProvisioningComplete 75m metal3-baremetal-controller Image provisioning completed for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
Normal DeprovisioningStarted 69m metal3-baremetal-controller Image deprovisioning started
Normal DeprovisioningComplete 67m metal3-baremetal-controller Image deprovisioning completed
Normal ProvisioningStarted 67m metal3-baremetal-controller Image provisioning started for http://172.22.0.1/images/ub-16.04-test.img
Normal ProvisioningError 64m metal3-baremetal-controller Image provisioning failed: node be43421d-c2dd-4684-9eee-c49108520e5c command status errored: {u'message': u'Error verifying image checksum: Image failed to verify against checksum. location: ub-16.04-test.img; image ID: /tmp/ub-16.04-test.img; image checksum: bad md5sum; verification checksum: fd7659f1fb028049596608f4659d5923', u'code': 500, u'type': u'ImageChecksumError', u'details': u'Image failed to verify against checksum. location: ub-16.04-test.img; image ID: /tmp/ub-16.04-test.img; image checksum: bad md5sum; verification checksum: fd7659f1fb028049596608f4659d5923'}
Normal ProvisioningStarted 61m metal3-baremetal-controller Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
Normal ProvisioningComplete 58m metal3-baremetal-controller Image provisioning completed for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
PR #147 adds introspection. The logging it does of the collected data isn't very useful, because the data comes out as a big encoded string.
2019-04-22T10:11:02.682-0400 INFO baremetalhost_ironic Received introspection data {"host": "openshift-worker-0", "data": "eyJhbGxfaW50ZXJmYWNlcyI6eyJldGgwIjp7ImNsaWVudF9pZCI6bnVsbCwiaXAiOiIxNzIuM\
jIuMC4xMiIsIm1hYyI6IjAwOjdlOmUxOjAzOjMwOjNlIiwicHhlIjp0cnVlfSwiZXRoMSI6eyJjbGllbnRfaWQiOm51bGwsImlwIjoiMTkyLjE2OC4xMTEuMjUiLCJtYWMiOiIwMDo3ZTplMTowMzozMDo0MCIsInB4ZSI6ZmFsc2V9fSwiYm9vdF9pbnRlcmZhY2UiOiIwMDo3ZT\
plMTowMzozMDozZSIsImNwdV9hcmNoIjoieDg2XzY0IiwiY3B1cyI6NCwiZGF0YSI6W1siZGlzayIsImxvZ2ljYWwiLCJjb3VudCIsIjEiXSxbImRpc2siLCJ2ZGEiLCJzaXplIiwiNTMiXSxbImRpc2siLCJ2ZGEiLCJ2ZW5kb3IiLCIweDFhZjQiXSxbImRpc2siLCJ2ZGEiLCJ\
vcHRpbWFsX2lvX3NpemUiLCIwIl0sWyJkaXNrIiwidmRhIiwicGh5c2ljYWxfYmxvY2tfc2l6ZSIsIjUxMiJdLFsiZGlzayIsInZkYSIsInJvdGF0aW9uYWwiLCIxIl0sWyJkaXNrIiwidmRhIiwibnJfcmVxdWVzdHMiLCIyNTYiXSxbImRpc2siLCJ2ZGEiLCJzY2hlZHVsZXIi\
LCJtcS1kZWFkbGluZSJdLFsic3lzdGVtIiwicHJvZHVjdCIsIm5hbWUiLCJLVk0iXSxbInN5c3RlbSIsInByb2R1Y3QiLCJ2ZW5kb3IiLCJSZWQgSGF0Il0sWyJzeXN0ZW0iLCJwcm9kdWN0IiwidmVyc2lvbiIsIlJIRUwgNy42LjAgUEMgKGk0NDBGWCArIFBJSVgsIDE5OTYpI\
l0sWyJzeXN0ZW0iLCJwcm9kdWN0IiwidXVpZCIsIjg3MTRiMDE3LWE3NTAtNGJkOS1hNjg1LWQ3NWU4MGYxODg1MCJdLFsiZmlybXdhcmUiLCJiaW9zIiwidmVyc2lvbiIsIjEuMTEuMC0yLmVsNyJdLFsiZmlybXdhcmUiLCJiaW9zIiwiZGF0ZSIsIjA0LzAxLzIwMTQiXSxbIm\
Zpcm13YXJlIiwiYmlvcyIsInZlbmRvciIsIlNlYUJJT1MiXSxbIm1lbW9yeSIsInRvdGFsIiwic2l6ZSIsIjE3MTc5ODY5MTg0Il0sWyJuZXR3b3JrIiwiZXRoMCIsImJ1c2luZm8iLCJ2aXJ0aW9AMCJdLFsibmV0d29yayIsImV0aDAiLCJpcHY0IiwiMTcyLjIyLjAuMTIiXSx\
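A minimal sketch of the kind of post-processing that would make this readable, assuming the payload is base64-encoded JSON as it appears above; the helper name is illustrative:

package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"log"
)

// decodeIntrospectionData turns the base64-encoded payload into a generic
// JSON structure so individual fields can be logged instead of one blob.
func decodeIntrospectionData(encoded string) (map[string]interface{}, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	var data map[string]interface{}
	if err := json.Unmarshal(raw, &data); err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	// "eyJjcHVzIjo0fQ==" is base64 for {"cpus":4}, a stand-in for the real payload.
	data, err := decodeIntrospectionData("eyJjcHVzIjo0fQ==")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(data["cpus"])
}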
If a BMH CR has wrong BMC credentials, the host is added but registration is not retried, even after the credentials are fixed.
---
apiVersion: v1
kind: Secret
metadata:
  name: openshift-node-4-bmc-secret
type: Opaque
data:
  username: YmFkdXNlcgo=
  password: YmFkcGFzcwo=
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-node-4
spec:
  online: true
  bmc:
    address: ipmi://192.168.122.1:6234
    credentialsName: openshift-node-4-bmc-secret
  bootMACAddress: 52:54:00:32:78:1a
  image:
    url: "http://172.22.0.1/images/rhcos-ootpa-latest.qcow2"
    checksum: "http://172.22.0.1/images/rhcos-ootpa-latest.qcow2.md5sum"
BMH ends up in error state:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Registered 15m metal3-baremetal-controller Registered new host
Normal RegistrationError 15m metal3-baremetal-controller Failed to get power state for node 65e845dc-7a5c-4e13-96f7-5f29639e4a28. Error: IPMI call failed: power status.
and is not retried even after the credentials are fixed; the corrected secret is shown after the log excerpt below:
{"level":"info","ts":1558441798.9009247,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-node-4"}
{"level":"info","ts":1558441798.9010067,"logger":"baremetalhost_ironic","msg":"ironic settings","endpoint":"http://localhost:6385/v1/","inspectorEndpoint":"http://localhost:5050/v1/","deployKernelURL":"http://172.22.0.1/images/ironic-python-agent.kernel","deployRamdiskURL":"http://172.22.0.1/images/ironic-python-agent.initramfs"}
{"level":"info","ts":1558441798.9010217,"logger":"baremetalhost","msg":"registering and validating access to management controller","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-
node-4","provisioningState":"registering"}
{"level":"info","ts":1558441798.9010265,"logger":"baremetalhost_ironic","msg":"validating management access","host":"openshift-node-4"}
{"level":"info","ts":1558441798.92309,"logger":"baremetalhost_ironic","msg":"found existing node by ID","host":"openshift-node-4"}
{"level":"info","ts":1558441798.9481359,"logger":"baremetalhost","msg":"stopping on host error","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-node-4","provisioningSta
te":"registering","message":"Failed to get power state for node 65e845dc-7a5c-4e13-96f7-5f29639e4a28. Error: IPMI call failed: power status."}
{
    "apiVersion": "v1",
    "data": {
        "password": "Z29vZHBhc3N3b3JkCg==",
        "username": "Z29vZHVzZXIK"
    },
    "kind": "Secret",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"data\":{\"password\":\"Z29vZHBhc3N3b3JkCg==\",\"username\":\"Z29vZHVzZXIK\"},\"kind\":\"Secret\",\"metadata\":{\"annotations\":{},\"name\":\"openshift-node-4-bmc-secret\",\"namespace\":\"openshift-machine-api\"},\"type\":\"Opaque\"}\n"
        },
        "creationTimestamp": "2019-05-21T12:28:59Z",
        "name": "openshift-node-4-bmc-secret",
        "namespace": "openshift-machine-api",
        "ownerReferences": [
            {
                "apiVersion": "metal3.io/v1alpha1",
                "blockOwnerDeletion": true,
                "controller": true,
                "kind": "BareMetalHost",
                "name": "openshift-node-4",
                "uid": "02aae0a7-7bc4-11e9-abd0-525400f8c71d"
            }
        ],
        "resourceVersion": "287714",
        "selfLink": "/api/v1/namespaces/openshift-machine-api/secrets/openshift-node-4-bmc-secret",
        "uid": "02a8cb40-7bc4-11e9-abd0-525400f8c71d"
    },
    "type": "Opaque"
}
The default network configuration applied by cloud-init is to DHCP on the first interface only. In bare metal environments, we will often want more than that. For example, in metal3-io/metal3-dev-env, the hosts are created with two network interfaces: one for provisioning and one as the primary, external network. The provisioning network is the first interface. When we provision these hosts with a cloud-init based image, we need to provide additional configuration to get the second interface to come up.
Right now this is done manually via extra cloud-init configuration passed through user-data. See: https://github.com/metal3-io/metal3-dev-env/blob/master/provision_host.sh#L15-L45
It would be nice to consider how the baremetal-operator could help make this easier.
Are HardwareProfile and HardwareDetails mutually exclusive?
Cleaning can take a little while, so we may want to expose which step is running. The ironic docs explain which field has the information. We can probably just update the human-readable provisioning status message with that detail when it changes.
If you register a bare metal host with incorrect IPMI credentials, it becomes undeletable, since we only attempt deletion from the nodes.Manageable state in pkg/provisioner/ironic/ironic.go.
We should probably handle this case and delete anyway, since performing a delete from this state works fine via the Ironic CLI/API.
When UserData is accessed here, if it is not set, the process will segfault. We need to check that UserData != nil before accessing it.
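A minimal illustration of the guard, using stand-in types rather than the operator's actual API structs:

package main

import "fmt"

// secretRef and imageSpec are stand-ins for the real types; the point is
// only to show the nil check before dereferencing the pointer.
type secretRef struct{ Name, Namespace string }

type imageSpec struct {
	UserData *secretRef
}

func userDataName(spec imageSpec) string {
	if spec.UserData == nil {
		return "" // nothing provided; avoids the nil dereference
	}
	return spec.UserData.Name
}

func main() {
	fmt.Printf("%q\n", userDataName(imageSpec{})) // ""
}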
The dev instructions talk about launching the operator outside of the cluster for development. We should add instructions for launching it inside the cluster, like one would do for production systems.
Now that we have basic introspection working, we should add proper hardware profile matching to detect the type of hardware being used on a host.
If a BareMetalHost is associated with a Machine, we want to prevent deletion of a BareMetalHost. In that case, the Machine must be deleted first.
This can be done via an admission webhook.
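A sketch of the predicate such a webhook could apply to DELETE requests, using a stand-in type for the host:

package main

import "fmt"

// hostForValidation is a stand-in, not the real BareMetalHost; ConsumerRef
// is non-empty while something (e.g. a Machine) still references the host.
type hostForValidation struct {
	Name        string
	ConsumerRef string
}

// validateDelete refuses deletion while the host is still in use.
func validateDelete(h hostForValidation) error {
	if h.ConsumerRef != "" {
		return fmt.Errorf("host %q is still in use by %q; delete that consumer first", h.Name, h.ConsumerRef)
	}
	return nil
}

func main() {
	fmt.Println(validateDelete(hostForValidation{Name: "kube-worker-0", ConsumerRef: "machine/worker-0"}))
}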
Some BMCs require a URL rather than just an IP address, so we want to make the field name more generic. Some discussion within the team came up with "Address" and "Location". "Address" is not sufficiently clear that it might include a URL. Are there other options?
The template rendering in the helper tools for creating hosts needs to be testable.
It would be useful to have version information available in the logs from the operator. It is possible to tell the go compiler to replace a string in a module with a different value (see https://github.com/openshift-metal3/kni-installer/blob/master/hack/build.sh#L35 for an example). Let's see if that technique works for the operator, too.
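A minimal sketch of the technique in Go; the variable name and build command are illustrative, not the repo's existing tooling:

package main

import "fmt"

// version is meant to be overridden at build time, for example:
//   go build -ldflags "-X main.version=$(git describe --always --dirty)"
// so the running operator can log which build it is.
var version = "unknown"

func main() {
	fmt.Println("baremetal-operator version:", version)
}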
Go's test framework has built-in support for coverage reporting, but we don't use it. It would be good to have that information available as part of the output when unit tests are run.
When we upgrade the baremetal operator we will exercise the design principle that the ironic database is ephemeral. We need to verify that the operator will re-register the host objects so that ironic can manage them, but they won't be reprovisioned. We deal with adopting control plane nodes already, but we don't want previously provisioned hosts to show up as "externally provisioned" so we may need a different workflow for this case. Perhaps ironic's "adopt" feature?
A Node has several conditions. Does it make sense for us to have separate conditions for hosts, too?
The BareMetalHost.Spec currently supports providing a checksum for the image to be provisioned. The checksum can be provided directly as a string, or as a URL which will be fetched.
These questions need to be answered:
Whatever answers emerge, the documentation for the Image struct should be improved with more detail.
Events about BMH state transitions are not displayed by resource describe:
oc describe baremetalhost discovered-node-0 -n openshift-machine-api
Name:         discovered-node-0
Namespace:    openshift-machine-api
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"metalkube.org/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{},"name":"discovered-node-0","namespace":"openshi...
API Version:  metalkube.org/v1alpha1
Kind:         BareMetalHost
Metadata:
  Creation Timestamp:  2019-05-03T09:19:31Z
  Finalizers:
    baremetalhost.metalkube.org
  Generation:        2
  Resource Version:  350682
  Self Link:         /apis/metalkube.org/v1alpha1/namespaces/openshift-machine-api/baremetalhosts/discovered-node-0
  UID:               8f2d0fbb-6d84-11e9-b472-525400a4453e
Spec:
  Bmc:
    Address:
    Credentials Name:  discovered-node-0-bmc-secret
  Boot MAC Address:    52:54:00:b7:e8:e8
  Hardware Profile:
  Online:              true
Status:
  Error Message:       Empty BMC address Missing BMC connection detail 'Address'
  Good Credentials:
  Hardware Profile:
  Last Updated:        2019-05-03T09:19:31Z
  Operational Status:  discovered
  Powered On:          false
  Provisioning:
    ID:
    Image:
      Checksum:
      URL:
    State:
Events:  <none>
While in the baremetal-operator logs:
{"level":"info","ts":1556875171.6777682,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0"}
{"level":"info","ts":1556875171.6778715,"logger":"baremetalhost","msg":"adding finalizer","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0","existingFinalizers":[],"newValue":"bareme
talhost.metalkube.org"}
{"level":"info","ts":1556875171.6889548,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0"}
{"level":"info","ts":1556875171.6891785,"logger":"baremetalhost","msg":"updating owner of secret","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0"}
{"level":"info","ts":1556875171.7072365,"logger":"baremetalhost","msg":"publishing event","reason":"Discovered","message":"Discovered host with unusable BMC details: Empty BMC address Missing BMC connection deta
il 'Address'"}
{"level":"info","ts":1556875171.726009,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0"}
---
apiVersion: v1
kind: Secret
metadata:
  name: discovered-node-0-bmc-secret
type: Opaque
data:
  username: YWRtaW4=
  password: cGFzc3dvcmQ=
# BMC address intentionally left empty to trigger transition
# to 'Discovered' state
---
apiVersion: metalkube.org/v1alpha1
kind: BareMetalHost
metadata:
  name: discovered-node-0
spec:
  online: true
  bmc:
    address:
    credentialsName: discovered-node-0-bmc-secret
  bootMACAddress: 52:54:00:b7:e8:e8
oc apply -f discovered_node_cr.yaml -n openshift-machine-api
oc describe baremetalhost discovered-node-0 -n openshift-machine-api
It looks like it is possible to describe some validation rules in the CRD YAML. What benefit does that give us? Should we take advantage?
Start a user guide and provide configuration examples for different types of BMCs.
We have minishift instructions and another ticket to add minikube instructions, but we should also document using our dev-scripts since that is our preferred environment now that it works.
Implement the Ironic deprovisioning workflow.
https://github.com/metalkube/baremetal-operator/blob/master/pkg/provisioning/provisioning.go
Ironic has the ability to wipe disks as part of the deprovisioning workflow. This is time consuming, so we assume this will not be default behavior. We still may want to provide this as optional behavior in the future.
The firmware details from inspection are available in gophercloud, but not exposed in the host CRD.
On a libvirt/kvm VM I get:
"firmware": {
"bios": {
"date": "04/01/2014",
"version": "1.11.0-2.el7",
"vendor": "SeaBIOS"
}
},
It would be useful to be able to display that information about a host.
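A stand-in sketch (not the actual CRD types) of how those details could be surfaced in the host's hardware status:

package main

import "fmt"

// Firmware is an illustrative stand-in for a firmware entry in the hardware
// status; the fields mirror the BIOS details shown above.
type Firmware struct {
	BIOSVendor  string
	BIOSVersion string
	BIOSDate    string
}

func main() {
	fmt.Printf("%+v\n", Firmware{BIOSVendor: "SeaBIOS", BIOSVersion: "1.11.0-2.el7", BIOSDate: "04/01/2014"})
}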
Our dev setup instructions focus on minishift, but should also include minikube. It's not hugely different, but we should make sure the differences are clearly documented.
See:
$ git grep -n '[^a-zA-Z]oc '
docs/dev-setup.md:126:The output can be passed directly to `oc apply` like this:
docs/dev-setup.md:129:$ go run cmd/make-virt-host/main.go openshift_worker_1 | oc apply -f -
docs/publishing-images.md:69: oc apply -f deploy/dev-operator.yaml
docs/publishing-images.md:71:To monitor the operator, use `oc get pods` to find the pod name for
docs/publishing-images.md:73:`oc log -f $podname` to see the console log output.
When the BMC secret is not found, I get a pretty verbose error in the log. This seems like a condition we can expect, so we should avoid logging a full backtrace for it if possible.
{"level":"info","ts":1554830713.4522338,"logger":"controller_baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-worker-2"}
{"level":"error","ts":1554830713.4526749,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"metalkube-baremetalhost-controller","request":"openshift-machine-api/openshift-worker-2","error":"BMC credentials are invalid: failed to fetch BMC credentials from secret reference: Secret \"openshift-worker-2-bmc-secret\" not found","errorVerbose":"Secret \"openshift-worker-2-bmc-secret\" not found\nfailed to fetch BMC credentials from secret reference\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).getValidBMCCredentials\n\t/go/src/github.com/metalkube/baremetal-opera
tor/pkg/controller/baremetalhost/baremetalhost_controller.go:638\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metalkube/baremetal
-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:243\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:213\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-opera
tor/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vend
or/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361\nBMC credentials are invalid\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_c
ontroller.go:245\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/
sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:213\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/g
o/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/met
alkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361","stacktrace":"github.com/metalkube/baremetal-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/metalkube/baremet
al-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller
/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
I observed an environment that set hardwareProfile: something-invalid. In that case, the BareMetalHost remained in the "match profile" state, with no indication of an error.
We should add some more error handling in this case to make the error more clear through the API.