On an initial deployment of OpenStack in Atmosphere, if the


Error on initial neutron deployment when `openstack_helm_neutron_networks` is populated

vexxhost commented on September 11, 2024

Error on initial neutron deployment when `openstack_helm_neutron_networks` is populated

from atmosphere.

Comments (11)

mnaser commented on September 11, 2024

@runlevel-six now that we are validating Zed, we noticed that this issue shows up with newer releases.

We're updating the order of the playbooks to fix it.


mnaser commented on September 11, 2024

@runlevel-six are you having this issue with a newer release of OpenStack or Wallaby?

I wonder because we actually use this inside the Molecule CI and we don't run into this:

atmosphere/playbooks/generate_workspace.yml

Lines 266 to 280 in f3a14a3

    
    openstack_helm_neutron_networks:
      - name: public
        external: true
        shared: true
        mtu_size: 1500
        port_security_enabled: true
        provider_network_type: flat
        provider_physical_network: external
        subnets:
          - name: public-subnet
            cidr: 10.96.250.0/24
            gateway_ip: 10.96.250.10
            allocation_pool_start: 10.96.250.200
            allocation_pool_end: 10.96.250.220
            enable_dhcp: true


runlevel-six commented on September 11, 2024

@mnaser that is a good question. I know I've seen it running yoga as my atmosphere_openstack_release value, but it is certainly possible that it works fine on wallaby - or even that this is potentially due to something different I'm doing (I don't deploy the Atmosphere Kubernetes cluster, for example), though I'm not sure where the cause would cascade down on those differences.


mnaser commented on September 11, 2024

@runlevel-six can you share the exact error if you have it handy? That might help us research it.


runlevel-six commented on September 11, 2024

The specific error is "msg": "ResourceNotFound: 404: Client Error for url: https://network-<fqdn>/v2.0/networks, AvailabilityZone nova could not be found."

The surrounding logs:

TASK [openstack_helm_neutron : Deploy Helm chart] *********************************************************************************************
changed: [atmosphere-controller-01]

TASK [Create Ingress] *************************************************************************************************************************

TASK [openstack_helm_ingress : Create Ingress (network)] **************************************************************************************
ok: [atmosphere-controller-01]

TASK [openstack_helm_neutron : Wait until network service ready] ******************************************************************************
ok: [atmosphere-controller-01]

TASK [openstack_helm_neutron : Create networks] ***********************************************************************************************
failed: [clt-a-a04-19-2-sr-blade-b] (item={'external': True, 'mtu_size': 1500, 'name': 'public', 'port_security_enabled': True, 'provider_network_type': 'flat', 'provider_physical_network': 'external', 'shared': True, 'subnets': [{'allocation_pool_end': '10.1.160.90', 'allocation_pool_start': '10.1.160.51', 'cidr': '10.1.160.0/24', 'dns_nameservers': ['1.1.1.1', '1.0.0.1'], 'enable_dhcp': True, 'gateway_ip': '10.1.160.1', 'name': 'public-subnet'}]}) => {"ansible_loop_var": "item", "changed": false, "extra_data": {"data": null, "details": "AvailabilityZone nova could not be found.", "response": "{\"NeutronError\": {\"type\": \"AvailabilityZoneNotFound\", \"message\": \"AvailabilityZone nova could not be found.\", \"detail\": \"\"}}"}, "item": {"external": true, "mtu_size": 1500, "name": "public", "port_security_enabled": true, "provider_network_type": "flat", "provider_physical_network": "external", "shared": true, "subnets": [{"allocation_pool_end": "10.1.160.90", "allocation_pool_start": "10.1.160.51", "cidr": "10.1.160.0/24", "dns_nameservers": ["1.1.1.1", "1.0.0.1"], "enable_dhcp": true, "gateway_ip": "10.1.160.1", "name": "public-subnet"}]}, "msg": "ResourceNotFound: 404: Client Error for url: https://network-<fqdn>/v2.0/networks, AvailabilityZone nova could not be found."}

At this point in the plays, the neutron pods are not completely stable, and they stay in the following state until nova is deployed (specifically, until the nova-api service is available):

kubectl -n openstack get po | grep -v 'Running\|Completed'                                       it-45661: Wed Sep 21 14:39:31 2022

NAME                                                  READY   STATUS      RESTARTS   AGE
neutron-dhcp-agent-default-4djfg                      0/1     Init:0/2    0          7m17s
neutron-dhcp-agent-default-g8ww6                      0/1     Init:0/2    0          7m17s
neutron-dhcp-agent-default-xf7z8                      0/1     Init:0/2    0          7m17s
neutron-l3-agent-default-6pfqh                        0/1     Init:0/2    0          7m17s
neutron-l3-agent-default-fgbtv                        0/1     Init:0/2    0          7m17s
neutron-l3-agent-default-fm2g7                        0/1     Init:0/2    0          7m17s
neutron-metadata-agent-default-7c66p                  0/1     Init:0/2    0          7m17s
neutron-metadata-agent-default-dv9v5                  0/1     Init:0/2    0          7m17s
neutron-metadata-agent-default-sfgvw                  0/1     Init:0/2    0          7m17s
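
The same filter can be sketched in Python for use in a script that needs to decide whether the pods have settled. This is only an illustration: it parses the text that `kubectl -n openstack get po` prints and, unlike `grep -v`, checks the STATUS column rather than the whole line. The function name and the two extra pod names in the sample (`neutron-server-example`, `neutron-db-sync-example`) are hypothetical.

```python
def unsettled_pods(kubectl_output: str) -> list[tuple[str, str]]:
    """Return (name, status) for pods that are neither Running nor Completed."""
    pods = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (NAME READY STATUS ...).
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if status not in ("Running", "Completed"):
            pods.append((name, status))
    return pods


# Sample input: the first pod line is taken from the output above,
# the other two are made-up examples of settled pods.
sample = """\
NAME                                   READY   STATUS      RESTARTS   AGE
neutron-dhcp-agent-default-4djfg       0/1     Init:0/2    0          7m17s
neutron-server-example                 1/1     Running     0          9m
neutron-db-sync-example                0/1     Completed   0          9m
"""

if __name__ == "__main__":
    for name, status in unsettled_pods(sample):
        print(f"{name} is still in {status}")
```

An empty result would suggest the pods have settled. As the rest of this thread shows, that alone may not guarantee that network creation will succeed.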

If I follow up after that failure by calling the openstack playbook with just the tag openstack-helm-nova and let nova deploy, then immediately after it finishes all pods are running and stable. I can then re-run the entire openstack playbook and everything works, including the Create networks task that failed prior to nova being deployed.


mnaser commented on September 11, 2024

@runlevel-six would it resolve your issue if we flip things around and deploy nova then neutron? I haven't been able to successfully figure out why this happens, but I think that might mitigate things while still making them work?


runlevel-six commented on September 11, 2024

@mnaser I re-deployed this fresh, this time with the openstack playbook changed to swap the order of the nova and neutron roles as you suggested. I left my release set to yoga. This time nova was deployed and reconciled as far as flux is concerned, but the nova-compute-default daemonset had not yet stabilized when the openstack playbook deployed neutron next. Neutron deployed but I still received the same error as before when it tried to create networks:

TASK [openstack_helm_neutron : Create networks] *********************************************************************************************
failed: [atmosphere-controller-01] (item={'external': True, 'mtu_size': 1500, 'name': 'public', 'port_security_enabled': True, .'provider_network_type': 'flat', 'provider_physical_network': 'external', 'shared': True, 'subnets': [{'allocation_pool_end': '10.1.160.90', 'allocation_pool_start': '10.1.160.51', 'cidr': '10.1.160.0/24', 'dns_nameservers': ['1.1.1.1', '1.0.0.1'], 'enable_dhcp': True, 'gateway_ip': '10.1.160.1', 'name': 'public-subnet'}]}) => {"ansible_loop_var": "item", "changed": false, "extra_data": {"data": null, "details": "AvailabilityZone nova could not be found.", "response": "{\"NeutronError\": {\"type\": \"AvailabilityZoneNotFound\", \"message\": \"AvailabilityZone nova could not be found.\", \"detail\": \"\"}}"}, "item": {"external": true, "mtu_size": 1500, "name": "public", "port_security_enabled": true, "provider_network_type": "flat", "provider_physical_network": "external", "shared": true, "subnets": [{"allocation_pool_end": "10.1.160.90", "allocation_pool_start": "10.1.160.51", "cidr": "10.1.160.0/24", "dns_nameservers": ["1.1.1.1", "1.0.0.1"], "enable_dhcp": true, "gateway_ip": "10.1.160.1", "name": "public-subnet"}]}, "msg": "ResourceNotFound: 404: Client Error for url: https://network-<fqdn>/v2.0/networks, AvailabilityZone nova could not be found."}

Less than a minute after that failure, the neutron pods are all running, and at that point the nova-compute-default daemonset successfully deploys. Re-running the openstack playbook then completes successfully, including network creation.

Also, note that I am not yet using the latest commit of Atmosphere because I haven't finished reviewing the atmosphere operator to test it out. So that is just to say that some of what I am experiencing may be due to not running the latest changes y'all have made.


mnaser commented on September 11, 2024

@runlevel-six are you running this on bare metal or VMs? I am wondering if the issue here is that no agents are up yet, and because of that, it fails.

I guess it's possible that in your case the playbook is running even faster than CI so it manages to get to that stage quite fast.

We're working on moving away from a role that applies HelmRelease and switching that to the operator which should allow us to more intelligently handle these scenarios.
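
If the cause is indeed that network creation runs before any agent has registered, one possible mitigation is to poll for the `nova` availability zone before creating networks. The sketch below is not how Atmosphere handles it; it is a minimal illustration of the idea. `wait_for_availability_zone` and the `list_zones` callable are hypothetical: `list_zones` stands in for whatever returns the zone names Neutron currently reports, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable, Iterable


def wait_for_availability_zone(
    list_zones: Callable[[], Iterable[str]],
    zone: str = "nova",
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll list_zones() until `zone` appears or `timeout` seconds elapse.

    Returns True if the zone was seen, False if the deadline passed first.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if zone in set(list_zones()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Stand-in for a real query: reports no zones twice, then "nova".
    answers = iter([[], [], ["nova"]])
    found = wait_for_availability_zone(lambda: next(answers), interval=0.1)
    print("zone available:", found)
```

A deployment step would call this before the network creation task and fail with a clear message when it returns False, rather than surfacing the `AvailabilityZoneNotFound` error.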


runlevel-six commented on September 11, 2024

I am running this on bare-metal and it does run pretty fast. That could indeed be the cause.


runlevel-six commented on September 11, 2024

Update: I've been leaving nova before neutron and have redeployed a few times without any errors when I run with the release set to wallaby (my prior update was still showing that error when deploying yoga).


mnaser commented on September 11, 2024

@runlevel-six ok so I think then we can swap the ordering in that case


