
ocp-kvm-ipi-automation's Introduction

OpenShift on KVM IPI Installation Automation for IBM Z / LinuxONE (and IBM Power Systems and Intel / AMD x86_64 and ARM64)

This repository contains Ansible playbooks to install an OpenShift cluster via the OpenShift installer 'IPI' method (installer-provisioned infrastructure) using KVM and libvirt on a dedicated Linux host. The primary focus of these playbooks is installing and configuring OpenShift on IBM Z / LinuxONE hosts (architecture: s390x), but IBM Power Systems hosts (architecture: ppc64le), Intel- and AMD-based hosts (architecture: x86_64) and ARM64-based hosts (architecture: aarch64) are also supported.

Documentation

Please make sure to read the included project documentation thoroughly before you get started.

For frequently asked questions please refer to the FAQ document.

If you've encountered an issue while using the playbooks in this repository you might want to take a look at the troubleshooting document.

Contributing

Contributions to this project have to be submitted under the Apache License, Version 2.0. See the included LICENSE file for more information.

Developer Certificate of Origin

When making contributions to this project, you certify the Developer Certificate of Origin.

Submitting Changes

Create GitHub pull requests to contribute changes to this project. Create separate pull requests for each logical enhancement, feature, or problem fix.

You can use GitHub issues to report problems and suggest improvements.

License

The OpenShift on KVM IPI Installation Automation for IBM Z / LinuxONE project and all files included are licensed under the Apache License, Version 2.0. See the included LICENSE file for more information.

Maintainers

Contributors

aliakseimakarau, haubenr, hbrueckner

ocp-kvm-ipi-automation's Issues

Enhancement: improve usability by leveraging custom Ansible callback plugin

Currently some of the long-running playbook tasks (especially the cluster installation task) do not display any meaningful message on the console besides the default 'FAILED - RETRYING: wait until... (XX retries left).' message. This is not particularly helpful for the user and potentially misleading due to the inclusion of the 'FAILED' keyword in the message.

A better alternative would be to display a custom message reassuring the user that the task is still running normally and the installation is progressing. The following Ansible callback plugin could be used for this: https://docs.ansible.com/ansible/latest/collections/community/general/diy_callback.html

For examples of how to customize the messages of 'until' loops with the help of this plugin, see ansible/ansible#32584.
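A minimal sketch of the kind of wording such a custom message could use, written in Python (the language Ansible plugins are written in). The helper progress_message and its parameters are hypothetical: it only illustrates turning the "retries left" countdown into a neutral progress line. It is not part of the diy callback plugin, whose messages are configured separately.

```python
def progress_message(task_name: str, retries_left: int, total_retries: int) -> str:
    """Return a reassuring status line for a task polled by an 'until' loop.

    Hypothetical helper for illustration only; it is not part of the
    community.general.diy callback plugin.
    """
    if not 0 <= retries_left <= total_retries:
        raise ValueError("retries_left must be between 0 and total_retries")
    # Number of checks already performed, derived from the countdown Ansible reports.
    checks_done = total_retries - retries_left
    return (f"{task_name}: still in progress "
            f"(check {checks_done} of {total_retries}), no action needed")


print(progress_message("wait until cluster installation completes", 117, 120))
```

Deliberately, the message avoids the word 'FAILED' so a healthy, long-running poll is not mistaken for an error.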

Enhancement: address various quality-of-life issues

The playbooks need to address the following:

  • add ~root/go path to list of artifacts that are to be cleaned up
  • change all usage of 'yes' and 'no' in playbooks (YAML files) to 'true' and 'false' respectively (for compatibility with YAML linters)
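The 'yes'/'no' to 'true'/'false' conversion could be scripted. The sketch below is a line-based heuristic in Python, not a YAML parser: it assumes simple single-line `key: yes` / `key: no` mappings, leaves quoted strings such as `'yes'` untouched and does not handle flow-style mappings. The function name normalize_yaml_booleans is hypothetical. A YAML linter such as yamllint (its `truthy` rule) can then confirm that no legacy values remain.

```python
import re

# Matches "key: yes" / "key: no" (optionally as "- key: yes"), keeping the
# indentation and any trailing comment. Quoted values like 'yes' do not match.
_BOOL_RE = re.compile(r"^(\s*(?:-\s+)?[\w.-]+:\s+)(yes|no)(\s*(?:#.*)?)$",
                      re.IGNORECASE)


def normalize_yaml_booleans(text: str) -> str:
    """Rewrite unquoted yes/no mapping values to true/false, line by line."""
    out = []
    for line in text.splitlines():
        match = _BOOL_RE.match(line)
        if match:
            value = "true" if match.group(2).lower() == "yes" else "false"
            line = f"{match.group(1)}{value}{match.group(3)}"
        out.append(line)
    # Note: a trailing newline in the input is not preserved.
    return "\n".join(out)


playbook = "- hosts: all\n  become: yes\n  gather_facts: no  # speed up\n"
print(normalize_yaml_booleans(playbook))
```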

Bug: unable to install OCP pre-release (aka nightly) builds

I am unable to install pre-release (aka nightly) builds of OCP. The OpenShift installer fails with this error message:

fatal: [myhost]: FAILED! => changed=true
 cmd:
 - /usr/local/bin/openshift-install
 - create
 - manifests
 - --dir=/root/ocp4-workdir
 delta: '0:00:00.054399'
 end: '2022-06-30 09:12:53.846745'
 msg: non-zero return code
 rc: 3
 start: '2022-06-30 09:12:53.792346'
 stderr: 'level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform: Invalid value: "libvirt": must specify one of the platforms (alibabacloud, aws, azure, baremetal, gcp, ibmcloud, none, nutanix, openstack, ovirt, powervs, vsphere)'
 stderr_lines: <omitted>
 stdout: ‘’
 stdout_lines: <omitted>

Ideally the playbooks support the installation of official OCP releases as well as pre-release builds.

Enhancement: add dedicated FAQ document

The existing documentation does not seem to answer some of the questions users have about the playbooks or the OpenShift installation in general, e.g.:

  • is offline (air-gapped) installation supported? (answer: no, not when using these playbooks)
  • can I install multiple OCP clusters on the same KVM host? (answer: no, not when using these playbooks)
  • can I install an OCP cluster that spans multiple KVM hosts? (answer: no, this topology is not supported by OCP libvirt / KVM IPI)
  • can I use a different type of networking setup, e.g. macvtap or openvswitch? (answer: no, other networking options are not supported by OCP libvirt / KVM IPI)
  • does the OCP cluster installed on my KVM host survive reboots of the host? (answer: yes, use the included playbooks to start the cluster nodes after a host reboot)

A dedicated FAQ document could be helpful for these types of questions.

Enhancement: Allow override of default network type

Red Hat OpenShift Container Platform supports multiple internal network types out of the box: OpenShiftSDN and OVNKubernetes, with OpenShiftSDN being the default. Users should be able to override the default and install a cluster with network type OVNKubernetes instead.
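In the installer's install-config.yaml the network type is selected via the `networking.networkType` field. Below is a minimal Python sketch of how a helper might validate a user override and fall back to the default. The names are hypothetical, only the two network types named above are accepted, and only the `networkType` key is shown (the rest of the `networking` section is omitted).

```python
from typing import Optional

# The two network types named in the issue; OpenShiftSDN is the default.
SUPPORTED_NETWORK_TYPES = ("OpenShiftSDN", "OVNKubernetes")
DEFAULT_NETWORK_TYPE = "OpenShiftSDN"


def networking_section(network_type: Optional[str] = None) -> dict:
    """Return the install-config 'networking' fragment for the chosen type.

    Hypothetical helper: falls back to the default when no override is given
    and rejects anything outside the supported list.
    """
    chosen = network_type or DEFAULT_NETWORK_TYPE
    if chosen not in SUPPORTED_NETWORK_TYPES:
        raise ValueError(
            f"unsupported network type {chosen!r}; "
            f"expected one of {', '.join(SUPPORTED_NETWORK_TYPES)}")
    return {"networking": {"networkType": chosen}}


print(networking_section("OVNKubernetes"))
```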

Compatibility: pin Python3 packages to specific versions

Installing the latest Python3 packages, as well as updating existing ones, is fragile because of cross-dependencies between packages (e.g. pip <-> pyOpenSSL) that can easily break.

Hence the playbooks should pin all Python3 packages that are installed via pip to specific versions.

Broken Python package dependency: openshift version 0.13.0

The Python package openshift is broken for version 0.13.0 - it cannot be installed:

pip3 install -U openshift
Collecting openshift
  Downloading https://files.pythonhosted.org/packages/12/1d/914bfdcc8e1e3b9b88b050d77b934146831b5fe320ae238e3e1813e40fa5/openshift-0.13.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-fk8w5it6/openshift/setup.py", line 49, in <module>
        install_requires=extract_requirements('requirements.txt'),
      File "/tmp/pip-build-fk8w5it6/openshift/setup.py", line 36, in extract_requirements
        with open(filename, 'r') as requirements_file:
    FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-fk8w5it6/openshift/

The playbooks need to exclude this specific package version until this has been resolved.
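pip accepts PEP 440 version specifiers, so a single broken release can be excluded with `!=` while other packages are pinned with `==`. The Python helper pip_requirement below is a hypothetical sketch that builds such specifier strings. For the issue above it yields `openshift!=0.13.0`, which could be installed with `pip3 install 'openshift!=0.13.0'`.

```python
from typing import Iterable, Optional


def pip_requirement(name: str, pinned: Optional[str] = None,
                    excluded: Iterable[str] = ()) -> str:
    """Build a PEP 440 requirement string such as 'openshift!=0.13.0'.

    Hypothetical helper: a pinned version takes precedence over exclusions,
    but pinning to an excluded version is rejected as a configuration error.
    """
    excluded = tuple(excluded)
    if pinned is not None:
        if pinned in excluded:
            raise ValueError(f"{name}: pinned version {pinned} is excluded")
        return f"{name}=={pinned}"
    # Multiple clauses are comma-separated, e.g. 'pkg!=1.0,!=1.1'.
    return name + ",".join(f"!={version}" for version in excluded)


print(pip_requirement("openshift", excluded=["0.13.0"]))
```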

Enhancement: remove cluster from active subscriptions in OCM upon cleanup

Red Hat OCM (https://console.redhat.com/openshift/) keeps track of all OCP clusters installed for a given Red Hat account / user ID. When a cluster is deleted on the target KVM LPAR (e.g. as part of the cleanup playbook), it is put into a 'stale' state but keeps lingering in the OCM UI. With many clusters set up and torn down using these playbooks, the OCM inventory inevitably ends up littered with stale clusters that are long gone.

Using the ocm CLI tooling, stale clusters can be removed from the main OCM view automatically. The cleanup playbook should make use of this if OCM integration is configured by the user.

Compatibility: do not install the latest OS-level and Python packages

Up until now the playbooks have been installing the latest version of all prerequisite OS-level and Python packages with every playbook run. This can lead to unstable systems, as newer library versions with known bugs / incompatibilities might be pulled in (as is the case for libvirt on RHEL 8.5, see also: #12).

Bug: the Python3 prerequisites cannot be installed successfully

When running the setup_host.yml playbook the installation of the Python3 prerequisites fails with this error message:

     Downloading https://files.pythonhosted.org/packages/99/f2/b71b9b5b2400fffac7d42c560ac89f302c4d8e328337b2f05f0a4d9e590d/bcrypt-4.0.0.tar.gz
        Complete output from command python setup.py egg_info:

                =============================DEBUG ASSISTANCE==========================
                If you are seeing an error here please try the following to
                successfully install cryptography:

                Upgrade to the latest pip and try again. This will fix errors for most
                users. See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
                =============================DEBUG ASSISTANCE==========================

        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-t2zo6ly4/bcrypt/setup.py", line 11, in <module>
            from setuptools_rust import RustExtension
        ModuleNotFoundError: No module named 'setuptools_rust'

        ----------------------------------------

    :stderr: WARNING: Running pip install with root privileges is generally not a good idea. Try `__main__.py install --user` instead.
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-t2zo6ly4/bcrypt/

This is likely due to an outdated pip3 installation on the target host.

Cleanup some aspects of the playbooks

There are some inconsistencies in the playbooks, e.g.:

  • host architecture references are sometimes hardcoded (s390x) -> these should be replaced with the corresponding Ansible variable
  • formatting of the playbooks is inconsistent (placement of blank lines etc.)
  • typos in documentation bits (comments, README files)

Various playbooks improvements

The playbooks can be improved in various areas:

  • proper handling of task loops

  • fix sanity check for disk size of directory / mountpoint /var/lib/libvirt
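A minimal Python sketch of a free-space check, under the assumption that the check should still work before /var/lib/libvirt exists (for example on a freshly installed host). It walks up to the nearest existing parent directory, i.e. the file system the directory will live on, and compares that file system's free space with a required minimum. The function names and the 100 GiB threshold are hypothetical.

```python
import os
import shutil


def nearest_existing_path(path: str) -> str:
    """Walk up from path until an existing directory is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the file system root
            break
        current = parent
    return current


def has_enough_space(path: str, required_gib: float) -> bool:
    """True if the file system that holds (or will hold) path has enough free space."""
    target = nearest_existing_path(path)
    free_bytes = shutil.disk_usage(target).free
    return free_bytes >= required_gib * 1024 ** 3


# Hypothetical threshold; use whatever the planned cluster actually requires.
print(has_enough_space("/var/lib/libvirt", required_gib=100))
```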

Compatibility: unable to install Python3 prerequisite packages on RHEL 8.5

The required Python3 packages cannot be installed on the target KVM host if the host is running RHEL 8.5 (or older). The setup_host.yml playbook fails with this error:

...
distutils.errors.DistutilsPlatformError: Rust 1.54.0 does not match extension requirement >=1.56.0
...

The prerequisite bcrypt package requires Rust 1.56 or newer, whereas RHEL 8.5 only provides Rust 1.54.
Due to this, RHEL releases older than 8.6 should be excluded by the playbooks.
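A minimal Python sketch of such a guard, assuming the host's /etc/os-release is used to identify the release (on RHEL it contains `ID="rhel"` and a `VERSION_ID` such as `"8.5"`). The function names are hypothetical. The same tuple comparison also shows why Rust 1.54.0 fails the >=1.56.0 requirement.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


def version_tuple(version: str) -> tuple:
    """Turn '8.5' or '1.54.0' into (8, 5) or (1, 54, 0) for comparison."""
    return tuple(int(part) for part in version.split("."))


def rhel_too_old(os_release_text: str, minimum: str = "8.6") -> bool:
    """True only for RHEL hosts whose VERSION_ID is below minimum."""
    info = parse_os_release(os_release_text)
    if info.get("ID") != "rhel":
        return False  # the issue only concerns RHEL releases
    # A missing VERSION_ID is treated conservatively as "too old".
    return version_tuple(info.get("VERSION_ID", "0")) < version_tuple(minimum)


sample = 'NAME="Red Hat Enterprise Linux"\nID="rhel"\nVERSION_ID="8.5"\n'
print(rhel_too_old(sample))
print(version_tuple("1.54.0") >= version_tuple("1.56.0"))
```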

Add python3-jsonpatch OS-level package dependency

The kubernetes.core.k8s Ansible plugin depends on the third-party Python package 'jsonpatch' being available on the Ansible host. This can be resolved by installing the RHEL OS-level package 'python3-jsonpatch'.
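Whether 'jsonpatch' is importable can be checked with the standard library before the kubernetes.core.k8s tasks run. Below is a minimal Python sketch (the helper name is hypothetical). For the result to be meaningful it has to run under the same Python interpreter that Ansible uses on that host.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules from names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if missing_modules(["jsonpatch"]):
    print("jsonpatch is missing: install the RHEL package python3-jsonpatch")
else:
    print("jsonpatch is available")
```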

Compatibility: tie openshift-installer release to OCP release that is supposed to be installed

For technical reasons the versions of the OpenShift installer and the OCP release that is to be installed on the KVM host must match (or should be fairly close to each other). Tests have shown these combinations to work:

  • installer: release-4.9, OCP: 4.10.x, 4.9.x, 4.8.x
  • installer: release-4.8, OCP: 4.8.x, 4.9.x
  • installer: release-4.7, OCP: 4.7.x

The initial assumption that the OpenShift installer 'release-4.9' is capable of installing OCP releases going back to 4.7.x (and older) is wrong. As part of the installation process, the installer generates CoreOS Ignition files that are used to bootstrap the cluster nodes. It seems that OCP 4.7.x, for instance, is unable to deal with Ignition files that use spec version 3.2.0 (generated by OpenShift installer versions newer than 4.8).
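The tested combinations listed above could be encoded as a lookup table that a pre-flight check consults before starting the installation. Below is a Python sketch. The table mirrors the list above, the function names are hypothetical, and any combination not listed is simply reported as untested (not necessarily broken).

```python
# Tested installer branch -> OCP minor streams, as listed in the issue.
TESTED_COMBINATIONS = {
    "release-4.9": ("4.10", "4.9", "4.8"),
    "release-4.8": ("4.8", "4.9"),
    "release-4.7": ("4.7",),
}


def minor_stream(ocp_version: str) -> str:
    """Reduce a full OCP version such as '4.10.3' to its minor stream '4.10'."""
    major, minor = ocp_version.split(".")[:2]
    return f"{major}.{minor}"


def is_tested_combination(installer_branch: str, ocp_version: str) -> bool:
    """True if this installer branch / OCP version pair appears in the table."""
    return minor_stream(ocp_version) in TESTED_COMBINATIONS.get(installer_branch, ())


print(is_tested_combination("release-4.9", "4.7.0"))
```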

Bug: fix go binaries build

One of the latest changes introduced a bug in the way some of the prerequisite binaries are installed using 'go': the 'go' executable itself is not included in the PATH variable.
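One way to avoid relying on the login shell's PATH is to explicitly prepend the Go installation's bin directory to the environment passed to the build commands. Below is a minimal Python sketch. The directory /usr/local/go/bin is a hypothetical install location, and the helper names are illustrative.

```python
import os
import shutil
from typing import Dict, Optional


def env_with_go(go_bin_dir: str) -> Dict[str, str]:
    """Copy the current environment with go_bin_dir prepended to PATH."""
    env = dict(os.environ)
    existing = env.get("PATH", "")
    # Avoid a trailing separator: an empty PATH entry means the current directory.
    env["PATH"] = go_bin_dir + (os.pathsep + existing if existing else "")
    return env


def find_go(env: Dict[str, str]) -> Optional[str]:
    """Resolve the 'go' executable using the PATH from env (None if absent)."""
    return shutil.which("go", path=env["PATH"])


env = env_with_go("/usr/local/go/bin")  # hypothetical install location
print(find_go(env) or "go not found on PATH")
```

The returned dictionary can be passed as `env=` to `subprocess.run`, so only the build commands see the extended PATH and the rest of the process environment stays untouched.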
