Comments (11)

mnaser commented on September 11, 2024

@runlevel-six as we are at a point of validating zed, we noticed that this is an issue that shows up with newer releases.

we're updating the order of playbooks to fix it

from atmosphere.

mnaser commented on September 11, 2024

@runlevel-six are you having this issue with a newer release of OpenStack or Wallaby?

I wonder because we actually use this inside the Molecule CI and we don't run into this:

openstack_helm_neutron_networks:
  - name: public
    external: true
    shared: true
    mtu_size: 1500
    port_security_enabled: true
    provider_network_type: flat
    provider_physical_network: external
    subnets:
      - name: public-subnet
        cidr: 10.96.250.0/24
        gateway_ip: 10.96.250.10
        allocation_pool_start: 10.96.250.200
        allocation_pool_end: 10.96.250.220
        enable_dhcp: true

runlevel-six commented on September 11, 2024

@mnaser that is a good question. I know I've seen it running yoga as my atmosphere_openstack_release value, but it is certainly possible that it works fine on wallaby - or even that this is potentially due to something different I'm doing (I don't deploy the Atmosphere Kubernetes cluster, for example), though I'm not sure where the cause would cascade down on those differences.

mnaser commented on September 11, 2024

@runlevel-six can you share the exact error if you have it handy? perhaps that might help us to research it

runlevel-six commented on September 11, 2024

The specific error is "msg": "ResourceNotFound: 404: Client Error for url: https://network-<fqdn>/v2.0/networks, AvailabilityZone nova could not be found."

The surrounding logs:

TASK [openstack_helm_neutron : Deploy Helm chart] *********************************************************************************************
changed: [atmosphere-controller-01]

TASK [Create Ingress] *************************************************************************************************************************

TASK [openstack_helm_ingress : Create Ingress (network)] **************************************************************************************
ok: [atmosphere-controller-01]

TASK [openstack_helm_neutron : Wait until network service ready] ******************************************************************************
ok: [atmosphere-controller-01]

TASK [openstack_helm_neutron : Create networks] ***********************************************************************************************
failed: [clt-a-a04-19-2-sr-blade-b] (item={'external': True, 'mtu_size': 1500, 'name': 'public', 'port_security_enabled': True, 'provider_network_type': 'flat', 'provider_physical_network': 'external', 'shared': True, 'subnets': [{'allocation_pool_end': '10.1.160.90', 'allocation_pool_start': '10.1.160.51', 'cidr': '10.1.160.0/24', 'dns_nameservers': ['1.1.1.1', '1.0.0.1'], 'enable_dhcp': True, 'gateway_ip': '10.1.160.1', 'name': 'public-subnet'}]}) => {"ansible_loop_var": "item", "changed": false, "extra_data": {"data": null, "details": "AvailabilityZone nova could not be found.", "response": "{\"NeutronError\": {\"type\": \"AvailabilityZoneNotFound\", \"message\": \"AvailabilityZone nova could not be found.\", \"detail\": \"\"}}"}, "item": {"external": true, "mtu_size": 1500, "name": "public", "port_security_enabled": true, "provider_network_type": "flat", "provider_physical_network": "external", "shared": true, "subnets": [{"allocation_pool_end": "10.1.160.90", "allocation_pool_start": "10.1.160.51", "cidr": "10.1.160.0/24", "dns_nameservers": ["1.1.1.1", "1.0.0.1"], "enable_dhcp": true, "gateway_ip": "10.1.160.1", "name": "public-subnet"}]}, "msg": "ResourceNotFound: 404: Client Error for url: https://network-<fqdn>/v2.0/networks, AvailabilityZone nova could not be found."}
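
Note that the Neutron error type in the result above sits inside a JSON string nested in extra_data.response, so it has to be decoded a second time. A minimal Python sketch of pulling it out, assuming the task result is already available as a dict; the function name neutron_error_type is hypothetical and not part of Atmosphere:

```python
import json


def neutron_error_type(result):
    """Return the NeutronError type from a failed task result, or None."""
    extra = result.get("extra_data") or {}
    raw = extra.get("response")
    if not raw:
        return None
    try:
        # The response body is itself a JSON document stored as a string.
        body = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(body, dict):
        return None
    error = body.get("NeutronError")
    return error.get("type") if isinstance(error, dict) else None


# Trimmed copy of the failed result shown above.
result = {
    "changed": False,
    "extra_data": {
        "data": None,
        "details": "AvailabilityZone nova could not be found.",
        "response": '{"NeutronError": {"type": "AvailabilityZoneNotFound", '
                    '"message": "AvailabilityZone nova could not be found.", '
                    '"detail": ""}}',
    },
}

print(neutron_error_type(result))  # prints AvailabilityZoneNotFound
```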

At this point in the plays, the neutron pods are not completely stable, and they stay at the following state until nova is deployed (specifically until the nova-api service is available):

kubectl -n openstack get po | grep -v 'Running\|Completed'                                       it-45661: Wed Sep 21 14:39:31 2022

NAME                                                  READY   STATUS      RESTARTS   AGE
neutron-dhcp-agent-default-4djfg                      0/1     Init:0/2    0          7m17s
neutron-dhcp-agent-default-g8ww6                      0/1     Init:0/2    0          7m17s
neutron-dhcp-agent-default-xf7z8                      0/1     Init:0/2    0          7m17s
neutron-l3-agent-default-6pfqh                        0/1     Init:0/2    0          7m17s
neutron-l3-agent-default-fgbtv                        0/1     Init:0/2    0          7m17s
neutron-l3-agent-default-fm2g7                        0/1     Init:0/2    0          7m17s
neutron-metadata-agent-default-7c66p                  0/1     Init:0/2    0          7m17s
neutron-metadata-agent-default-dv9v5                  0/1     Init:0/2    0          7m17s
neutron-metadata-agent-default-sfgvw                  0/1     Init:0/2    0          7m17s
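
The same filter as the grep above can be sketched in Python, which makes it easier to act on the result from a script. This is only an illustration: the function name unready_pods is hypothetical, the column layout is assumed to match the output shown above, and the neutron-server-example row is invented to show a pod being filtered out.

```python
def unready_pods(kubectl_output):
    """Return (name, status) for pods whose STATUS is not Running or Completed."""
    pods = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if status not in ("Running", "Completed"):
            pods.append((name, status))
    return pods


sample = """\
NAME                                  READY   STATUS      RESTARTS   AGE
neutron-dhcp-agent-default-4djfg      0/1     Init:0/2    0          7m17s
neutron-l3-agent-default-6pfqh        0/1     Init:0/2    0          7m17s
neutron-server-example                1/1     Running     0          7m17s
"""

for name, status in unready_pods(sample):
    print(name, status)
```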

If I follow up after that failure by calling the openstack playbook with just the tag openstack-helm-nova and let nova deploy, then immediately after it finishes all pods are running and stable. I can then re-run the entire openstack playbook and everything works, including the Create networks task that failed before nova was deployed.
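
The manual workaround above amounts to retrying network creation until the availability zone exists. A generic Python sketch of that idea follows; it is not how Atmosphere handles this, and retry_until, create_network, and the AvailabilityZoneNotFound class are all hypothetical stand-ins for illustration:

```python
import time


def retry_until(operation, attempts=10, delay=5.0, retry_on=(Exception,), sleep=time.sleep):
    """Call operation() until it succeeds; re-raise the error after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)


class AvailabilityZoneNotFound(Exception):
    """Stand-in for the error Neutron returned in the logs above."""


calls = {"count": 0}


def create_network():
    # Simulated call: fails twice, then succeeds, as if the zone appeared later.
    calls["count"] += 1
    if calls["count"] < 3:
        raise AvailabilityZoneNotFound("AvailabilityZone nova could not be found.")
    return "public"


print(retry_until(create_network, delay=0, retry_on=(AvailabilityZoneNotFound,)))  # prints public
```

Restricting retry_on to the one expected error keeps unrelated failures, such as a bad provider_physical_network, from being retried silently.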

mnaser commented on September 11, 2024

@runlevel-six would it resolve your issue if we flip things around and deploy nova then neutron? I haven't been able to successfully figure out why this happens, but I think that might mitigate things while still making them work?

runlevel-six commented on September 11, 2024

@mnaser I re-deployed this fresh, this time with the openstack playbook changed to swap the order of the nova and neutron roles as you suggested. I left my release set to yoga. This time nova was deployed and reconciled as far as flux is concerned, but the nova-compute-default daemonset had not yet stabilized when the openstack playbook deployed neutron next. Neutron deployed but I still received the same error as before when it tried to create networks:

TASK [openstack_helm_neutron : Create networks] *********************************************************************************************
failed: [atmosphere-controller-01] (item={'external': True, 'mtu_size': 1500, 'name': 'public', 'port_security_enabled': True, 'provider_network_type': 'flat', 'provider_physical_network': 'external', 'shared': True, 'subnets': [{'allocation_pool_end': '10.1.160.90', 'allocation_pool_start': '10.1.160.51', 'cidr': '10.1.160.0/24', 'dns_nameservers': ['1.1.1.1', '1.0.0.1'], 'enable_dhcp': True, 'gateway_ip': '10.1.160.1', 'name': 'public-subnet'}]}) => {"ansible_loop_var": "item", "changed": false, "extra_data": {"data": null, "details": "AvailabilityZone nova could not be found.", "response": "{\"NeutronError\": {\"type\": \"AvailabilityZoneNotFound\", \"message\": \"AvailabilityZone nova could not be found.\", \"detail\": \"\"}}"}, "item": {"external": true, "mtu_size": 1500, "name": "public", "port_security_enabled": true, "provider_network_type": "flat", "provider_physical_network": "external", "shared": true, "subnets": [{"allocation_pool_end": "10.1.160.90", "allocation_pool_start": "10.1.160.51", "cidr": "10.1.160.0/24", "dns_nameservers": ["1.1.1.1", "1.0.0.1"], "enable_dhcp": true, "gateway_ip": "10.1.160.1", "name": "public-subnet"}]}, "msg": "ResourceNotFound: 404: Client Error for url: https://network-<fqdn>/v2.0/networks, AvailabilityZone nova could not be found."}

Less than a minute after that failure, the neutron pods are all running, and at that point the nova-compute-default daemonset deploys successfully. Re-running the openstack playbook then completes successfully, including network creation.

Also, note that I am not yet using the latest commit of Atmosphere because I haven't finished reviewing the atmosphere operator to test it out. So that is just to say that some of what I am experiencing may be due to not running the latest changes y'all have made.

mnaser commented on September 11, 2024

@runlevel-six are you running this on bare metal or VMs? I am wondering if the issue here is that no agents are up yet, and because of that, it fails.

I guess it's possible that in your case the playbook is running even faster than CI so it manages to get to that stage quite fast.
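
If the cause really is that no agents have registered yet (unconfirmed in this thread), one way to guard against it would be to check that the target availability zone has at least one live agent before creating networks. A rough Python sketch over an already-fetched agent list; the helper names are hypothetical, and the alive and availability_zone keys are assumed to match what the Neutron agent listing reports:

```python
def zones_with_live_agents(agents):
    """Return the set of availability zones that have at least one live agent."""
    return {
        agent["availability_zone"]
        for agent in agents
        if agent.get("alive") and agent.get("availability_zone")
    }


def zone_is_ready(agents, zone="nova"):
    """True once at least one live agent reports the given availability zone."""
    return zone in zones_with_live_agents(agents)


# Illustrative data only: no agents yet, then one DHCP agent reporting in.
before = []
after = [
    {"agent_type": "DHCP agent", "alive": True, "availability_zone": "nova"},
    {"agent_type": "Metadata agent", "alive": True, "availability_zone": None},
]

print(zone_is_ready(before))  # prints False
print(zone_is_ready(after))   # prints True
```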

We're working on moving away from a role that applies HelmRelease and switching that to the operator which should allow us to more intelligently handle these scenarios.

runlevel-six commented on September 11, 2024

I am running this on bare-metal and it does run pretty fast. That could indeed be the cause.

runlevel-six commented on September 11, 2024

Update: I've been leaving nova before neutron and have redeployed a few times without any errors when I run with the release set to wallaby (my prior update was still showing that error when deploying yoga).

mnaser commented on September 11, 2024

@runlevel-six ok so I think then we can swap the ordering in that case
