Describe the bug
At seemingly random intervals the Neutron OVN metadata agent crashes; once it has crashed, it enters a crash loop and fails to return to a running state.
To Reproduce
I can't reproduce the error at this time.
Expected behavior
The Neutron OVN metadata agent shouldn't crash.
Kubernetes Version
Currently running v1.28.6. This issue has also been seen in v1.26.10.
POD Description
# kubectl --namespace openstack describe pod neutron-ovn-metadata-agent-default-nrmnk
Name: neutron-ovn-metadata-agent-default-nrmnk
Namespace: openstack
Priority: 0
Service Account: neutron-ovn-metadata-agent
Node: 935822-compute03-ospcv2-dfw.openstack.local/172.28.232.123
Start Time: Fri, 26 Jan 2024 02:43:15 +0000
Labels: application=neutron
component=ovn-metadata-agent
controller-revision-hash=6b4fc69764
pod-template-generation=14
release_group=neutron
Annotations: configmap-bin-hash: 3b17fadc4799090e9f5d65201d90080ae322cff710e2b448a8f9d2c92555a57d
configmap-etc-hash: 05ac75f51498078f0e5f26737411dca523197d952546c1c5d9926732d1d7d7a8
openstackhelm.openstack.org/release_uuid:
Status: Running
IP: 172.28.232.123
IPs:
IP: 172.28.232.123
Controlled By: DaemonSet/neutron-ovn-metadata-agent-default
Init Containers:
init:
Container ID: containerd://9e3aeb65d0f234b6e93298ec960c940a5cbe1a3b88c61ed2b5a05cf7d61b1ee7
Image: quay.io/airshipit/kubernetes-entrypoint:v1.0.0
Image ID: sha256:c092d0dada614fdae3920939c5a9683b2758288f23c2e3b425128653857d7520
Port: <none>
Host Port: <none>
Command:
kubernetes-entrypoint
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 26 Jan 2024 02:43:15 +0000
Finished: Fri, 26 Jan 2024 02:43:17 +0000
Ready: True
Restart Count: 0
Environment:
POD_NAME: neutron-ovn-metadata-agent-default-nrmnk (v1:metadata.name)
NAMESPACE: openstack (v1:metadata.namespace)
INTERFACE_NAME: eth0
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/
DEPENDENCY_SERVICE: openstack:nova-metadata,openstack:neutron-server
DEPENDENCY_DAEMONSET:
DEPENDENCY_CONTAINER:
DEPENDENCY_POD_JSON:
DEPENDENCY_CUSTOM_RESOURCE:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vqjjt (ro)
neutron-metadata-agent-init:
Container ID: containerd://646c5854dc030c26473f090cc62a3b85e33badd985990b90addb67ec6098c383
Image: docker.io/openstackhelm/neutron:2023.1-ubuntu_jammy
Image ID: docker.io/openstackhelm/neutron@sha256:b6f3dcfe8ffe051ed2280857365ebfd51220e8d0ef8c4ef9f9f8f59ddf1a0823
Port: <none>
Host Port: <none>
Command:
/tmp/neutron-metadata-agent-init.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 26 Jan 2024 02:43:18 +0000
Finished: Fri, 26 Jan 2024 02:43:18 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 100m
memory: 64Mi
Environment:
NEUTRON_USER_UID: 42424
Mounts:
/etc/neutron/neutron.conf from neutron-etc (ro,path="neutron.conf")
/tmp from pod-tmp (rw)
/tmp/neutron-metadata-agent-init.sh from neutron-bin (ro,path="neutron-metadata-agent-init.sh")
/var/lib/neutron/openstack-helm from socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vqjjt (ro)
ovn-neutron-init:
Container ID: containerd://177e528248223bd182edb2c6be832085a1164eb2f5a9ca018319a1227fa8b249
Image: docker.io/openstackhelm/neutron:2023.1-ubuntu_jammy
Image ID: docker.io/openstackhelm/neutron@sha256:b6f3dcfe8ffe051ed2280857365ebfd51220e8d0ef8c4ef9f9f8f59ddf1a0823
Port: <none>
Host Port: <none>
Command:
/tmp/neutron-ovn-init.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 26 Jan 2024 02:43:19 +0000
Finished: Fri, 26 Jan 2024 02:43:19 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 100m
memory: 64Mi
Environment: <none>
Mounts:
/tmp from pod-tmp (rw)
/tmp/neutron-ovn-init.sh from neutron-bin (ro,path="neutron-ovn-init.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vqjjt (ro)
Containers:
neutron-ovn-metadata-agent:
Container ID: containerd://dd4814d89b8446c913e22a8121062bae3d49206fd71fb89d6c9806c7e611140b
Image: docker.io/openstackhelm/neutron:2023.1-ubuntu_jammy
Image ID: docker.io/openstackhelm/neutron@sha256:b6f3dcfe8ffe051ed2280857365ebfd51220e8d0ef8c4ef9f9f8f59ddf1a0823
Port: <none>
Host Port: <none>
Command:
/tmp/neutron-ovn-metadata-agent.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 26 Jan 2024 03:17:01 +0000
Finished: Fri, 26 Jan 2024 03:17:14 +0000
Ready: False
Restart Count: 11
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: exec [python /tmp/health-probe.py --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/ovn_metadata_agent.ini --liveness-probe] delay=120s timeout=580s period=600s #success=1 #failure=3
Readiness: exec [python /tmp/health-probe.py --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/ovn_metadata_agent.ini] delay=30s timeout=185s period=190s #success=1 #failure=3
Environment:
RPC_PROBE_TIMEOUT: 60
RPC_PROBE_RETRIES: 2
Mounts:
/etc/neutron/logging.conf from neutron-etc (ro,path="logging.conf")
/etc/neutron/neutron.conf from neutron-etc (ro,path="neutron.conf")
/etc/neutron/ovn_metadata_agent.ini from neutron-etc (ro,path="ovn_metadata_agent.ini")
/etc/neutron/plugins/ml2/ml2_conf.ini from neutron-etc (ro,path="ml2_conf.ini")
/etc/neutron/rootwrap.conf from neutron-etc (ro,path="rootwrap.conf")
/etc/neutron/rootwrap.d/debug.filters from neutron-etc (ro,path="debug.filters")
/etc/neutron/rootwrap.d/dhcp.filters from neutron-etc (ro,path="dhcp.filters")
/etc/neutron/rootwrap.d/dibbler.filters from neutron-etc (ro,path="dibbler.filters")
/etc/neutron/rootwrap.d/ebtables.filters from neutron-etc (ro,path="ebtables.filters")
/etc/neutron/rootwrap.d/ipset-firewall.filters from neutron-etc (ro,path="ipset-firewall.filters")
/etc/neutron/rootwrap.d/iptables-firewall.filters from neutron-etc (ro,path="iptables-firewall.filters")
/etc/neutron/rootwrap.d/l3.filters from neutron-etc (ro,path="l3.filters")
/etc/neutron/rootwrap.d/linuxbridge-plugin.filters from neutron-etc (ro,path="linuxbridge-plugin.filters")
/etc/neutron/rootwrap.d/netns-cleanup.filters from neutron-etc (ro,path="netns-cleanup.filters")
/etc/neutron/rootwrap.d/openvswitch-plugin.filters from neutron-etc (ro,path="openvswitch-plugin.filters")
/etc/neutron/rootwrap.d/privsep.filters from neutron-etc (ro,path="privsep.filters")
/etc/sudoers.d/kolla_neutron_sudoers from neutron-etc (ro,path="neutron_sudoers")
/run from run (rw)
/run/netns from host-run-netns (rw)
/tmp from pod-tmp (rw)
/tmp/health-probe.py from neutron-bin (ro,path="health-probe.py")
/tmp/neutron-ovn-metadata-agent.sh from neutron-bin (ro,path="neutron-ovn-metadata-agent.sh")
/var/lib/neutron from pod-var-neutron (rw)
/var/lib/neutron/openstack-helm from socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vqjjt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
pod-tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
pod-var-neutron:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
run:
Type: HostPath (bare host directory volume)
Path: /run
HostPathType:
neutron-bin:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: neutron-bin
Optional: false
neutron-etc:
Type: Secret (a volume populated by a Secret)
SecretName: neutron-ovn-metadata-agent-default
Optional: false
socket:
Type: HostPath (bare host directory volume)
Path: /var/lib/neutron/openstack-helm
HostPathType:
host-run-netns:
Type: HostPath (bare host directory volume)
Path: /run/netns
HostPathType:
kube-api-access-vqjjt:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: openstack-network-node=enabled
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 35m default-scheduler Successfully assigned openstack/neutron-ovn-metadata-agent-default-nrmnk to 935822-compute03-ospcv2-dfw.openstack.local
Normal Pulled 35m kubelet Container image "quay.io/airshipit/kubernetes-entrypoint:v1.0.0" already present on machine
Normal Created 35m kubelet Created container init
Normal Started 35m kubelet Started container init
Normal Pulled 35m kubelet Container image "docker.io/openstackhelm/neutron:2023.1-ubuntu_jammy" already present on machine
Normal Created 35m kubelet Created container neutron-metadata-agent-init
Normal Started 35m kubelet Started container neutron-metadata-agent-init
Normal Pulled 35m kubelet Container image "docker.io/openstackhelm/neutron:2023.1-ubuntu_jammy" already present on machine
Normal Created 35m kubelet Created container ovn-neutron-init
Normal Started 35m kubelet Started container ovn-neutron-init
Normal Pulled 34m (x4 over 35m) kubelet Container image "docker.io/openstackhelm/neutron:2023.1-ubuntu_jammy" already present on machine
Normal Created 34m (x4 over 35m) kubelet Created container neutron-ovn-metadata-agent
Normal Started 34m (x4 over 35m) kubelet Started container neutron-ovn-metadata-agent
Warning BackOff 35s (x151 over 35m) kubelet Back-off restarting failed container neutron-ovn-metadata-agent in pod neutron-ovn-metadata-agent-default-nrmnk_openstack(b12830e7-4c00-4562-a035-a31aab324ea3)
POD Logs
2024-01-26 03:17:06.670 321 INFO neutron.common.config [-] Logging enabled!
2024-01-26 03:17:06.670 321 INFO neutron.common.config [-] /var/lib/openstack/bin/neutron-ovn-metadata-agent version 22.1.1.dev14
2024-01-26 03:17:06.842 321 INFO neutron.agent.ovn.metadata.ovsdb [-] Getting OvsdbSbOvnIdl for MetadataAgent with retry
2024-01-26 03:17:07.288 330 INFO neutron.agent.ovn.metadata.ovsdb [-] Getting OvsdbSbOvnIdl for MetadataAgent with retry
2024-01-26 03:17:07.292 329 INFO neutron.agent.ovn.metadata.ovsdb [-] Getting OvsdbSbOvnIdl for MetadataAgent with retry
2024-01-26 03:17:13.761 321 INFO neutron.agent.ovn.metadata.agent [-] Cleaning up ovnmeta-f992a46b-8dae-4eef-b80f-2d980466a361 namespace which is not needed anymore
2024-01-26 03:17:14.186 321 CRITICAL neutron [-] Unhandled error: OSError: [Errno 22] failed to open netns
2024-01-26 03:17:14.186 321 ERROR neutron Traceback (most recent call last):
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/bin/neutron-ovn-metadata-agent", line 8, in <module>
2024-01-26 03:17:14.186 321 ERROR neutron sys.exit(main())
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/cmd/eventlet/agents/ovn_metadata.py", line 24, in main
2024-01-26 03:17:14.186 321 ERROR neutron metadata_agent.main()
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/agent/ovn/metadata_agent.py", line 42, in main
2024-01-26 03:17:14.186 321 ERROR neutron agt.start()
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/agent/ovn/metadata/agent.py", line 334, in start
2024-01-26 03:17:14.186 321 ERROR neutron self.sync()
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/agent/ovn/metadata/agent.py", line 65, in wrapped
2024-01-26 03:17:14.186 321 ERROR neutron return f(*args, **kwargs)
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/agent/ovn/metadata/agent.py", line 407, in sync
2024-01-26 03:17:14.186 321 ERROR neutron self.teardown_datapath(self._get_datapath_name(ns))
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/agent/ovn/metadata/agent.py", line 454, in teardown_datapath
2024-01-26 03:17:14.186 321 ERROR neutron ip.garbage_collect_namespace()
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/agent/linux/ip_lib.py", line 267, in garbage_collect_namespace
2024-01-26 03:17:14.186 321 ERROR neutron if self.namespace_is_empty():
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/agent/linux/ip_lib.py", line 262, in namespace_is_empty
2024-01-26 03:17:14.186 321 ERROR neutron return not self.get_devices()
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/agent/linux/ip_lib.py", line 179, in get_devices
2024-01-26 03:17:14.186 321 ERROR neutron devices = privileged.get_device_names(self.namespace)
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 642, in get_device_names
2024-01-26 03:17:14.186 321 ERROR neutron in get_link_devices(namespace, **kwargs)]
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/tenacity/__init__.py", line 333, in wrapped_f
2024-01-26 03:17:14.186 321 ERROR neutron return self(f, *args, **kw)
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/tenacity/__init__.py", line 423, in __call__
2024-01-26 03:17:14.186 321 ERROR neutron do = self.iter(retry_state=retry_state)
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/tenacity/__init__.py", line 360, in iter
2024-01-26 03:17:14.186 321 ERROR neutron return fut.result()
2024-01-26 03:17:14.186 321 ERROR neutron File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
2024-01-26 03:17:14.186 321 ERROR neutron return self.__get_result()
2024-01-26 03:17:14.186 321 ERROR neutron File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2024-01-26 03:17:14.186 321 ERROR neutron raise self._exception
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/tenacity/__init__.py", line 426, in __call__
2024-01-26 03:17:14.186 321 ERROR neutron result = fn(*args, **kwargs)
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/oslo_privsep/priv_context.py", line 271, in _wrap
2024-01-26 03:17:14.186 321 ERROR neutron return self.channel.remote_call(name, args, kwargs,
2024-01-26 03:17:14.186 321 ERROR neutron File "/var/lib/openstack/lib/python3.10/site-packages/oslo_privsep/daemon.py", line 215, in remote_call
2024-01-26 03:17:14.186 321 ERROR neutron raise exc_type(*result[2])
2024-01-26 03:17:14.186 321 ERROR neutron OSError: [Errno 22] failed to open netns
2024-01-26 03:17:14.186 321 ERROR neutron
Additional context
The relevant log entries are:
2024-01-26 03:17:13.761 321 INFO neutron.agent.ovn.metadata.agent [-] Cleaning up ovnmeta-f992a46b-8dae-4eef-b80f-2d980466a361 namespace which is not needed anymore
2024-01-26 03:17:14.186 321 CRITICAL neutron [-] Unhandled error: OSError: [Errno 22] failed to open netns
It seems that the Neutron metadata agent is attempting to clean up ovnmeta-f992a46b-8dae-4eef-b80f-2d980466a361 and failing to do so. The problem is easy to work around: log in to the offending node 935822-compute03-ospcv2-dfw.openstack.local/172.28.232.123 and delete the broken namespace. However, the metadata agent should be able to do this automatically instead of crash-looping.
Resolving the issue with Ansible:
ansible -m shell -a 'ip netns delete ovnmeta-f992a46b-8dae-4eef-b80f-2d980466a361' 935822-compute03-ospcv2-dfw.openstack.local --inventory /etc/genestack/inventory --become
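Longer term, the traceback above shows the OSError escaping the agent's sync/teardown path unhandled. A minimal sketch of the kind of defensive handling being suggested (hypothetical names, not the actual neutron code; assumes errno 22/EINVAL is what "failed to open netns" raises, as in the log above):

```python
import errno


def teardown_with_tolerance(garbage_collect_namespace):
    """Run namespace garbage collection, tolerating namespaces that
    can no longer be opened instead of crashing the whole agent.

    Returns True if collection ran cleanly, False if the namespace was
    broken (OSError 22, 'failed to open netns') and had to be skipped.
    """
    try:
        garbage_collect_namespace()
        return True
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            # The namespace appears in the listing but cannot be
            # entered; skip it so one stale namespace cannot
            # crash-loop the agent. A real fix would also log this.
            return False
        raise
```

With handling along these lines, a single unreadable namespace would be skipped (and could be reported for manual cleanup) rather than taking down the agent on every restart.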
Host netns
When investigating the host, we find the following output.
root@935822-compute03-ospcv2-dfw:~# ip netns
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
ovnmeta-a60825e1-66ff-479c-92c4-2120b092f75c (id: 15)
ovnmeta-2887f499-3ca1-4507-8f93-7585e6a13f63 (id: 14)
cni-0adaaeb7-0666-ee24-ce5e-fb5bc70c4f21 (id: 6)
cni-46d2ca93-5b2a-1574-ecfb-1c1e03d85ed4 (id: 4)
cni-c48dde5f-15ed-c672-35f4-a7b7bd361222 (id: 3)
cni-b595cd77-4a34-89fc-d2ff-8b55e8a76b23 (id: 2)
ovnmeta-970d930a-f047-4a4b-977e-d9a2fe2fe00d (id: 5)
Error: Peer netns reference is invalid.
ovnmeta-f992a46b-8dae-4eef-b80f-2d980466a361
Error: Peer netns reference is invalid.
ovnmeta-691b7a69-1142-4d5b-8395-026a4c676aab
ovnmeta-7ceb1594-e791-4ce2-a218-761e59bb1169 (id: 11)
ovnmeta-1b686933-cc0d-4744-9642-cbbf5e50bed8 (id: 10)
ovnmeta-639ba975-890a-4bc6-a230-3b6548ce6089 (id: 9)
ovnmeta-9b291755-a9b6-4abe-a5a7-6f75a7f6db16 (id: 8)
ovnmeta-0beb4d29-1244-431f-85e8-dd0a5ef1ac88 (id: 7)
cni-08e4d4a9-723a-4959-bcf4-b73d5859acf5 (id: 0)
The namespace ovnmeta-f992a46b-8dae-4eef-b80f-2d980466a361 has no id, and it is immediately preceded in the listing by Error: Peer netns reference is invalid.
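Based on the output above, the broken namespaces appear to be the ones ip netns lists without an "(id: N)" suffix. A small sketch of that heuristic (an assumption drawn from this one node's output, not a documented ip netns contract), which could feed an automated cleanup:

```python
def stale_ovnmeta_namespaces(ip_netns_output: str) -> list[str]:
    """Return ovnmeta-* namespaces that 'ip netns' lists without an
    '(id: N)' suffix -- on this node, those were exactly the namespaces
    whose peer reference was invalid."""
    stale = []
    for line in ip_netns_output.splitlines():
        line = line.strip()
        if line.startswith("ovnmeta-") and "(id:" not in line:
            stale.append(line)
    return stale
```

Each returned name could then be removed with ip netns delete, as in the Ansible one-liner above.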