
atmosphere's Introduction

Atmosphere

Community

If you have any questions about Atmosphere or would like to discuss it, you can join the community:

Contributing

You'll need to make sure that you have pre-commit set up and installed in your environment by running this command:

pre-commit install --hook-type commit-msg

atmosphere's People

Contributors

cgmdude, dependabot[bot], etimmons, fitbeard, github-actions[bot], gtirloni, guilhermesteinmuller, hbhutani112, mnaser, mpiscaer, okozachenko1203, pre-commit-ci[bot], renovate[bot], ricolin, rptaylor, runlevel-six, thywyn, yaguangtang


atmosphere's Issues

Glance with Cinder backend

There are a number of outstanding issues when trying to use Glance with Cinder as a backend; this issue tracks all of them:

Expose API average response time metrics

This issue is to target and discuss how we can expose API average response time metrics.

openstack/nova-api-osapi-65dc75df5b-n79jg[nova-osapi]: 2022-08-23 17:50:20.256 1 ERROR nova.api.openstack.wsgi sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r)
openstack/nova-api-osapi-65dc75df5b-n79jg[nova-osapi]: 2022-08-23 17:57:18.657 1 INFO nova.osapi_compute.wsgi.server [req-fb927136-b4c5-43fa-a1ec-06745f58a2af 4fafb46dba2b4d6bad1fa8529fe6499d 8fdb4e6f09aa481ab157863dbfe2227f - default default] 38.102.64.143,10.101.6.141 "GET /v2.1/8fdb4e6f09aa481ab157863dbfe2227f/servers/detail HTTP/1.1" status: 200 len: 0 time: 407.0869071
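
As a sketch of how this could be exposed (this assumes the ingress-nginx controller's Prometheus metrics are already being scraped; the metric names below are the stock ingress-nginx ones and have not been verified against this deployment):

groups:
  - name: openstack-api-latency
    rules:
      # Average API response time per ingress over the last 5 minutes,
      # computed from the stock ingress-nginx request duration histogram.
      - record: openstack:api_response_time_seconds:avg5m
        expr: |
          sum by (ingress, namespace) (rate(nginx_ingress_controller_request_duration_seconds_sum[5m]))
          /
          sum by (ingress, namespace) (rate(nginx_ingress_controller_request_duration_seconds_count[5m]))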

ImageCopyFailure due to privsep issue

When using a setup in which Cinder is a backend for Glance, trying to create VMs from boot volumes fails with the following:

openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server [req-375bce64-cf8a-49a8-aebe-b443dc5d9223 d7a95eca1fd7db54051a82eb1f898b36489bc2302e172d843ebab07fcf2fba7e 0413a906e8314a26a434b42f3134d1c5 - - -] Exception during message handling: cinder.exception.ImageCopyFailure: Failed to copy image to volume: Privsep daemon failed to start
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/volume_utils.py", line 1152, in copy_image_to_volume
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     driver.copy_image_to_volume(
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/driver.py", line 859, in copy_image_to_volume
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     self._copy_image_data_to_volume(
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/driver.py", line 878, in _copy_image_data_to_volume
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     properties = volume_utils.brick_get_connector_properties(
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/volume_utils.py", line 1323, in brick_get_connector_properties
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     return connector.get_connector_properties(root_helper,
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/os_brick/utils.py", line 141, in trace_logging_wrapper
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/os_brick/initiator/connector.py", line 232, in get_connector_properties
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     connector.get_connector_properties(
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/os_brick/initiator/connectors/iscsi.py", line 68, in get_connector_properties
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     initiator = iscsi.get_initiator()
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/os_brick/initiator/connectors/iscsi.py", line 990, in get_initiator
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     lines, _err = self._execute('cat', file_path, run_as_root=True,
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/os_brick/executor.py", line 52, in _execute
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     result = self.__execute(*args, **kwargs)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/os_brick/privileged/rootwrap.py", line 172, in execute
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     return execute_root(*cmd, **kwargs)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_privsep/priv_context.py", line 246, in _wrap
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     self.start()
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_privsep/priv_context.py", line 258, in start
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     channel = daemon.RootwrapClientChannel(context=self)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_privsep/daemon.py", line 383, in __init__
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     super(RootwrapClientChannel, self).__init__(sock)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_privsep/daemon.py", line 198, in __init__
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     self.exchange_ping()
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_privsep/daemon.py", line 211, in exchange_ping
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     raise FailedToDropPrivileges(msg)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server oslo_privsep.daemon.FailedToDropPrivileges: Privsep daemon failed to start
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server 
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server 
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "<decorator-gen-751>", line 2, in create_volume
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/objects/cleanable.py", line 208, in wrapper
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/manager.py", line 772, in create_volume
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     _run_flow()
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/manager.py", line 764, in _run_flow
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     flow_engine.run()
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     for _state in self.run_iter(timeout=timeout):
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     failure.Failure.reraise_if_any(er_failures)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/taskflow/types/failure.py", line 339, in reraise_if_any
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     failures[0].reraise()
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/taskflow/types/failure.py", line 346, in reraise
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     six.reraise(*self._exc_info)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/six.py", line 703, in reraise
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     raise value
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     result = task.execute(**arguments)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/flows/manager/create_volume.py", line 1162, in execute
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     model_update = self._create_from_image(context,
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/utils.py", line 614, in _wrapper
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     return r.call(f, *args, **kwargs)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/tenacity/__init__.py", line 411, in call
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     return self.__call__(*args, **kwargs)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/tenacity/__init__.py", line 423, in __call__
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     do = self.iter(retry_state=retry_state)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/tenacity/__init__.py", line 360, in iter
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     return fut.result()
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     return self.__get_result()
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     raise self._exception
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/tenacity/__init__.py", line 426, in __call__
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     result = fn(*args, **kwargs)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/flows/manager/create_volume.py", line 1058, in _create_from_image
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     model_update = self._create_from_image_cache_or_download(
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/flows/manager/create_volume.py", line 950, in _create_from_image_cache_or_download
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     model_update = self._create_from_image_download(
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/flows/manager/create_volume.py", line 766, in _create_from_image_download
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     volume_utils.copy_image_to_volume(self.driver, context, volume,
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.8/site-packages/cinder/volume/volume_utils.py", line 1169, in copy_image_to_volume
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server     raise exception.ImageCopyFailure(reason=ex)
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server cinder.exception.ImageCopyFailure: Failed to copy image to volume: Privsep daemon failed to start
openstack/cinder-volume-f478fc9f4-cxnpd[cinder-volume]: 2022-09-14 13:41:28.442 9 ERROR oslo_messaging.rpc.server 
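
One possible cause (an assumption, not confirmed in this issue) is that the cinder-volume container is not privileged, so privsep cannot fork its root daemon. Below is a plain Kubernetes sketch of the container settings that privsep typically needs; it is illustrative only and not the actual openstack-helm values path:

# Illustrative container securityContext; privsep forks a root daemon
# inside the container, which generally requires a privileged context.
securityContext:
  privileged: true
  runAsUser: 0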

Nova fails when running Atmosphere in dev env.

When running the converge step, Nova fails.

root@ctl1:/var/log# kubectl -n openstack get pods|grep nova
nova-api-metadata-fcc5c7b9c-cmsdd 0/1 Init:0/2 0 13m
nova-api-metadata-fcc5c7b9c-kqvmf 0/1 Init:0/2 0 13m
nova-api-metadata-fcc5c7b9c-pghct 0/1 Init:0/2 0 13m
nova-api-osapi-7bdddd9c49-2ggn2 0/1 Init:0/1 0 13m
nova-api-osapi-7bdddd9c49-qgg9r 0/1 Init:0/1 0 13m
nova-api-osapi-7bdddd9c49-tt4w8 0/1 Init:0/1 0 13m
nova-bootstrap-qs5bs 0/1 Init:0/1 0 13m
nova-cell-setup-zbhs2 0/1 Init:0/2 0 13m
nova-compute-default-d6ftj 0/2 Init:0/7 0 13m
nova-compute-default-zgzxx 0/2 Init:0/7 0 13m
nova-conductor-74b7d47f7-b4ff5 0/1 Init:0/1 0 13m
nova-conductor-74b7d47f7-rvxrr 0/1 Init:0/1 0 13m
nova-conductor-74b7d47f7-svnv6 0/1 Init:0/1 0 13m
nova-db-init-szkvw 0/3 Completed 0 13m
nova-db-sync-s5qh8 0/1 Completed 0 13m
nova-novncproxy-66f9444464-5l8dp 1/1 Running 0 13m
nova-novncproxy-66f9444464-hnd99 1/1 Running 0 13m
nova-novncproxy-66f9444464-pf52z 1/1 Running 0 13m
nova-scheduler-66f86545cc-6rbs4 0/1 Init:0/1 0 13m
nova-scheduler-66f86545cc-jj8qk 0/1 Init:0/1 0 13m
nova-scheduler-66f86545cc-sjvsc 0/1 Init:0/1 0 13m
rabbitmq-nova-server-0 1/1 Running 0 29m

nova-db-sync-s5qh8 takes longer than 6 minutes to complete.

root@ctl1:# kubectl -n openstack get pods|grep sync
< snip >
nova-db-sync-s5qh8 1/1 Running 0 5m44s
root@ctl1:# kubectl -n openstack get pods|grep sync
< snip >
nova-db-sync-s5qh8 0/1 Completed 0 7m27s

After it completes, I view the logs of nova-api-osapi-7bdddd9c49-tt4w8:

Entrypoint WARNING: 2022/10/28 08:00:07 entrypoint.go:72: Resolving dependency Job nova-rabbit-init in namespace openstack failed: jobs.batch "nova-rabbit-init" not found .
Entrypoint WARNING: 2022/10/28 08:00:07 entrypoint.go:72: Resolving dependency Job nova-ks-endpoints in namespace openstack failed: jobs.batch "nova-ks-endpoints" not found .
Entrypoint WARNING: 2022/10/28 08:00:07 entrypoint.go:72: Resolving dependency Job nova-ks-user in namespace openstack failed: jobs.batch "nova-ks-user" not found .

So nova-rabbit-init is missing and it never comes up. The only workaround I found is to run helm -n openstack uninstall nova and then rerun the playbook.

But this is not an option in my CI process.

Kind regards,

Michiel Piscaer

Introduce API rollouts for Ingress without downtime

At the moment, we do run 3 instances of the ingress; however, if the active one goes down, keepalived can still be pointing at the node whose ingress is down.

We should clean that up so that any of the systems can accept traffic and route it through the Kubernetes mesh, perhaps using externalIP or a similar feature; a sketch follows.
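
A minimal sketch of the externalIPs approach (the Service name, selector, and IP below are placeholders, not taken from the actual charts):

# With spec.externalIPs, every node answering for this IP forwards the
# traffic through kube-proxy to a live ingress pod, so a downed local
# ingress no longer blackholes the VIP.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: openstack
spec:
  selector:
    app.kubernetes.io/name: ingress-nginx
  externalIPs:
    - 192.0.2.10   # placeholder for the VIP currently managed by keepalived
  ports:
    - name: https
      port: 443
      targetPort: 443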

Update doc

I found that Atmosphere includes the Atmosphere Kubernetes operator as a role and introduced Poetry as its dependency manager.
Also, in Molecule, it uses Docker and ttl.sh for the Atmosphere operator image build.
These changes introduce new prerequisites for end users, which should be reflected in the documentation.

Allow for using `auto_bridge_add` with VLANs

It's possible that a user is adding a tagged VLAN to the br-ex port; it's a terrible idea, but it may be something left over from an old use case.

We need to support that. Right now it has to be set manually after the fact, and it doesn't survive system reboots; a hypothetical sketch of the persistent configuration follows.
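
For illustration, a hypothetical netplan snippet of what the persistent configuration might look like (interface names and the VLAN ID are placeholders, and this has not been validated against Atmosphere's auto_bridge_add handling):

network:
  version: 2
  ethernets:
    eno1: {}
  vlans:
    # Tagged VLAN that the user wants plugged into br-ex.
    eno1.100:
      id: 100
      link: eno1
  bridges:
    br-ex:
      openvswitch: {}   # manage br-ex as an OVS bridge
      interfaces: [eno1.100]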

Switch to ClusterAPI

If we switch Atmosphere to Cluster API, that cuts out all of our Kubernetes deployment code, means we no longer need to maintain it ourselves, and also simplifies upgrades.

Error on initial neutron deployment when `openstack_helm_neutron_networks` is populated

On an initial deployment of OpenStack in Atmosphere, if openstack_helm_neutron_networks is left empty, there is no problem finishing the deployment of the openstack-helm-neutron role, and the openstack playbook continues on to deploy the rest of the stack.

But if openstack_helm_neutron_networks is populated, the playbook will fail because the Neutron networks cannot be added until an availability zone is created. That happens, as I understand it, in the openstack-helm-nova role. So when deploying for the first time with this variable populated, the playbook fails, at which point I re-run the openstack playbook with the flag -t openstack-helm-nova to only deploy Nova. Once that completes, I can re-run the entire openstack playbook and everything completes with no errors.

The only entry in the openstack_helm_neutron_networks array I am using is similar to the one below:

openstack_helm_neutron_networks:
- external: true
  mtu_size: 1500
  name: public
  port_security_enabled: true
  provider_network_type: flat
  provider_physical_network: external
  shared: true
  subnets:
  - allocation_pool_end: 10.1.160.90
    allocation_pool_start: 10.1.160.51
    cidr: 10.1.160.0/24
    dns_nameservers:
    - 1.1.1.1
    - 1.0.0.1
    enable_dhcp: true
    gateway_ip: 10.1.160.1
    name: public-subnet

Atmosphere-Operator CrashLoopBackOff issue

The Atmosphere operator is doing the work needed to ensure resources, but it regularly ends up in a CrashLoopBackOff status. In 3 days I have seen 636 restarts, and the entire log looks as follows:

❯ kubectl -n openstack logs atmosphere-operator-5b7f44dc7d-kxk68
2022-10-13 14:13.15 [info     ] Starting Atmosphere operator
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=Namespace name=monitoring
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=Namespace name=kube-system
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=Namespace name=cert-manager
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=Namespace name=openstack
2022-10-13 14:13.15 [info     ] Ensured resource               kind=Namespace name=kube-system
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=ceph namespace=kube-system requires=['namespace-kube-system']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=Namespace name=openstack
2022-10-13 14:13.15 [info     ] Ensured resource               kind=Namespace name=cert-manager
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=percona namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=Namespace name=monitoring
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=openstack-helm-infra namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=bitnami namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=ceph namespace=kube-system requires=['namespace-kube-system']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=Secret name=atmosphere-memcached namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=percona namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=coredns namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=bitnami namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=openstack-helm namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=openstack-helm-infra namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=Service name=memcached-metrics namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=Secret name=atmosphere-memcached namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=prometheus-community namespace=monitoring requires=['namespace-monitoring']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=Service name=memcached-metrics namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=jetstack namespace=cert-manager requires=['namespace-cert-manager']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=openstack-helm namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRepository name=node-feature-discovery namespace=monitoring requires=['namespace-monitoring']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=coredns namespace=openstack requires=['namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRelease name=pxc-operator namespace=openstack requires=['helm-repository-openstack-percona', 'namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=prometheus-community namespace=monitoring requires=['namespace-monitoring']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRelease name=memcached namespace=openstack requires=['secret-openstack-atmosphere-memcached', 'helm-repository-openstack-openstack-helm-infra', 'namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=jetstack namespace=cert-manager requires=['namespace-cert-manager']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRelease name=cert-manager namespace=cert-manager requires=['helm-repository-cert-manager-jetstack', 'namespace-cert-manager']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRepository name=node-feature-discovery namespace=monitoring requires=['namespace-monitoring']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=HelmRelease name=node-feature-discovery namespace=monitoring requires=['helm-repository-monitoring-node-feature-discovery', 'namespace-monitoring']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRelease name=pxc-operator namespace=openstack requires=['helm-repository-openstack-percona', 'namespace-openstack']
2022-10-13 14:13.15 [debug    ] Ensuring resource              kind=PerconaXtraDBCluster name=percona-xtradb namespace=openstack requires=['helm-release-openstack-pxc-operator', 'namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRelease name=memcached namespace=openstack requires=['secret-openstack-atmosphere-memcached', 'helm-repository-openstack-openstack-helm-infra', 'namespace-openstack']
2022-10-13 14:13.15 [info     ] Ensured resource               kind=HelmRelease name=cert-manager namespace=cert-manager requires=['helm-repository-cert-manager-jetstack', 'namespace-cert-manager']
2022-10-13 14:13.16 [info     ] Ensured resource               kind=PerconaXtraDBCluster name=percona-xtradb namespace=openstack requires=['helm-release-openstack-pxc-operator', 'namespace-openstack']
2022-10-13 14:13.16 [debug    ] Ensuring resource              kind=Issuer name=openstack namespace=openstack requires=['helm-release-cert-manager-cert-manager', 'namespace-openstack']
2022-10-13 14:13.16 [debug    ] Ensuring resource              kind=Issuer name=self-signed namespace=openstack requires=['helm-release-cert-manager-cert-manager', 'namespace-openstack']
2022-10-13 14:13.16 [debug    ] Ensuring resource              kind=HelmRelease name=rabbitmq-cluster-operator namespace=openstack requires=['helm-release-cert-manager-cert-manager', 'helm-repository-openstack-bitnami', 'namespace-openstack']
2022-10-13 14:13.17 [info     ] Ensured resource               kind=Issuer name=openstack namespace=openstack requires=['helm-release-cert-manager-cert-manager', 'namespace-openstack']
2022-10-13 14:13.17 [debug    ] Ensuring resource              kind=Certificate name=self-signed-ca namespace=openstack requires=['helm-release-cert-manager-cert-manager', 'namespace-openstack']
2022-10-13 14:13.17 [info     ] Ensured resource               kind=Issuer name=self-signed namespace=openstack requires=['helm-release-cert-manager-cert-manager', 'namespace-openstack']
2022-10-13 14:13.18 [info     ] Ensured resource               kind=Certificate name=self-signed-ca namespace=openstack requires=['helm-release-cert-manager-cert-manager', 'namespace-openstack']
2022-10-13 14:13.18 [info     ] Ensured resource               kind=HelmRelease name=rabbitmq-cluster-operator namespace=openstack requires=['helm-release-cert-manager-cert-manager', 'helm-repository-openstack-bitnami', 'namespace-openstack']
2022-10-13 14:13.19 [debug    ] Ensuring resource              kind=RabbitmqCluster name=keystone namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.19 [debug    ] Ensuring resource              kind=RabbitmqCluster name=senlin namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.19 [debug    ] Ensuring resource              kind=RabbitmqCluster name=cinder namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.20 [info     ] Ensured resource               kind=RabbitmqCluster name=keystone namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.20 [debug    ] Ensuring resource              kind=RabbitmqCluster name=nova namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.20 [info     ] Ensured resource               kind=RabbitmqCluster name=senlin namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.20 [debug    ] Ensuring resource              kind=RabbitmqCluster name=glance namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.20 [info     ] Ensured resource               kind=RabbitmqCluster name=cinder namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.20 [debug    ] Ensuring resource              kind=RabbitmqCluster name=heat namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.22 [info     ] Ensured resource               kind=RabbitmqCluster name=glance namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.22 [debug    ] Ensuring resource              kind=RabbitmqCluster name=neutron namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.22 [info     ] Ensured resource               kind=RabbitmqCluster name=nova namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.22 [debug    ] Ensuring resource              kind=RabbitmqCluster name=barbican namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.22 [info     ] Ensured resource               kind=RabbitmqCluster name=heat namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.24 [info     ] Ensured resource               kind=RabbitmqCluster name=barbican namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
2022-10-13 14:13.24 [info     ] Ensured resource               kind=RabbitmqCluster name=neutron namespace=openstack requires=['helm-release-openstack-rabbitmq-cluster-operator', 'namespace-openstack']
Traceback (most recent call last):
  File "/app/.venv/bin/atmosphere-operator", line 6, in <module>
    sys.exit(main())
  File "/app/atmosphere/cmd/operator.py", line 15, in main
    engine.run()
  File "/app/.venv/lib/python3.10/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
    for _state in self.run_iter(timeout=timeout):
  File "/app/.venv/lib/python3.10/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
    failure.Failure.reraise_if_any(er_failures)
  File "/app/.venv/lib/python3.10/site-packages/taskflow/types/failure.py", line 338, in reraise_if_any
    failures[0].reraise()
  File "/app/.venv/lib/python3.10/site-packages/taskflow/types/failure.py", line 350, in reraise
    raise value
  File "/app/.venv/lib/python3.10/site-packages/taskflow/engines/action_engine/executor.py", line 52, in _execute_task
    result = task.execute(**arguments)
  File "/app/atmosphere/tasks/kubernetes/base.py", line 79, in execute
    self.wait_for_resource(resource)
  File "/app/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "/app/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/app/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 361, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f71333198d0 state=finished returned bool>]

Now, this could be due to something different in my environment, but I am not getting a lot of context from the logs, so I am unsure whether it's something on my end or something wrong with the container in general.

download.cirros-cloud.net

Can it be that download.cirros-cloud.net has a rate limit?

I got the following error:

failed: [ctl1] (item={'name': 'cirros', 'source_url': 'http://download.cirros-cloud.net/0.5.1/', 'image_file': 'cirros-0.5.1-x86_64-disk.img', 'min_disk': 1, 'disk_format': 'qcow2', 'container_format': 'bare', 'is_public': True}) => {"ansible_loop_var": "item", "changed": false, "dest": "/tmp/cirros-0.5.1-x86_64-disk.img", "elapsed": 5, "gid": 0, "group": "root", "item": {"container_format": "bare", "disk_format": "qcow2", "image_file": "cirros-0.5.1-x86_64-disk.img", "is_public": true, "min_disk": 1, "name": "cirros", "source_url": "http://download.cirros-cloud.net/0.5.1/"}, "mode": "0600", "msg": "Request failed", "owner": "root", "response": "HTTP Error 503: Egress is over the account limit.", "size": 16338944, "state": "file", "status_code": 503, "uid": 0, "url": "http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img"}

Add monitoring for `vni*` interfaces

The vni* and vxlan* interfaces are one of the things that constantly cause us problems: they are sometimes down, and when they are, we end up with network problems for the customer.

We should add monitoring to Atmosphere for these interfaces, so that we can always check whether they are up and alert if not.

The Prometheus query should be the following:

node_network_carrier{device=~"(vni|vxlan).*"} == 0
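
A sketch of the alert rule that would wrap this query (group and alert names are placeholders):

groups:
  - name: network-interfaces
    rules:
      - alert: OverlayInterfaceDown
        # Fires when a vni*/vxlan* interface reports no carrier.
        expr: node_network_carrier{device=~"(vni|vxlan).*"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Interface {{ $labels.device }} on {{ $labels.instance }} has no carrier"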

We can see that in the case of the customer issues reported, this is an example where it was caught:

(screenshot of the firing alert omitted)

Allow for missing trusts for Magnum cluster deletion

In the referenced issue, we are failing to delete a cluster because the trust is already gone. We should handle that exception and simply pass through if the trust is already gone, instead of failing because we can't delete something that doesn't exist.

Switch health checks to `httpGet` for APIs

In some environments, Glance stopped responding but the socket was still open, so the API was unhealthy and Kubernetes didn't restart anything.

We need to change all API services to use httpGet instead of tcpSocket to ensure they actually return proper responses; a sketch follows.
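
As an illustration of the change (the port below is the Glance API port; the probe path is an assumption, since most OpenStack APIs answer a version document at "/"):

# Before: only checks that the socket accepts connections.
# readinessProbe:
#   tcpSocket:
#     port: 9292
# After: requires an actual HTTP response from the service.
readinessProbe:
  httpGet:
    path: /
    port: 9292
    scheme: HTTP
  initialDelaySeconds: 5
  periodSeconds: 10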

Missing packages in glance image

Currently, when we use cinder as a backend for glance images, the following traceback is raised:

2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi [req-b4c41781-0afb-4173-9a8d-e052c658b0df dd5664a9739d4afba929b37d3f123f28 f5ed2d21437644adb2669f9ade9c949b - default default] Caught error: 'NoneType' object has no attribute 'Client': AttributeError: 'NoneType' object has no attribute 'Client'
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi Traceback (most recent call last):
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/common/wsgi.py", line 1353, in __call__
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     action_result = self.dispatch(self.controller, action,
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/common/wsgi.py", line 1397, in dispatch
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     return method(*args, **kwargs)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/common/utils.py", line 416, in wrapped
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     return func(self, req, *args, **kwargs)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/api/v2/image_data.py", line 300, in upload
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     self._restore(image_repo, image)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     self.force_reraise()
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     raise self.value
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/api/v2/image_data.py", line 165, in upload
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     image.set_data(data, size, backend=backend)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/domain/proxy.py", line 208, in set_data
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     self.base.set_data(data, size, backend=backend, set_active=set_active)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/notifier.py", line 501, in set_data
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     _send_notification(notify_error, 'image.upload', msg)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     self.force_reraise()
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     raise self.value
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/notifier.py", line 447, in set_data
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     self.repo.set_data(data, size, backend=backend,
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/api/policy.py", line 273, in set_data
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     return self.image.set_data(*args, **kwargs)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/quota/__init__.py", line 322, in set_data
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     self.image.set_data(data, size=size, backend=backend,
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/location.py", line 567, in set_data
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     self._upload_to_store(data, verifier, backend, size)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/location.py", line 473, in _upload_to_store
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     loc_meta) = self.store_api.add_to_backend_with_multihash(
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/backend.py", line 490, in add_to_backend_with_multihash
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     return store_add_to_backend_with_multihash(
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/backend.py", line 467, in store_add_to_backend_with_multihash
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     (location, size, checksum, multihash, metadata) = store.add(
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/driver.py", line 279, in add_adapter
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     metadata_dict) = store_add_fun(*args, **kwargs)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/capabilities.py", line 176, in op_checker
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     return store_op_fun(store, *args, **kwargs)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/_drivers/cinder.py", line 819, in add
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     client = self.get_cinderclient(context)
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/_drivers/cinder.py", line 508, in get_cinderclient
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi     c = cinderclient.Client(
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi AttributeError: 'NoneType' object has no attribute 'Client'
2022-09-02 17:59:11.475 8 ERROR glance.common.wsgi 

This is possibly due to missing packages inside the Glance image, such as os-brick and python-cinderclient.

Check if br-ex has no ports

There are some scenarios, like using vni interfaces on an OVS bridge, in which the ports and tags might vanish and cause public access issues for users.

We should check if br-ex has no ports and alert on that.

Enable local mirroring or distributed cache for images

The OpenStack images are quite large, and instead of downloading them node by node, we should set up a pull-through cache that lives on the control plane, which would allow images to be downloaded once and then transferred locally.

There are alternatives such as Dragonfly or Uber's Kraken, but there are some concerns with those solutions.

Add TLS support for libvirt

We should generate client and server certificates for libvirt and distribute them accordingly, perhaps even building an internal CA specifically for this use case using cert-manager.

Fix `nova` chart readiness

This could be related to #42, but it looks like the first request that comes in to nova-api sometimes returns a 503, so even when the deployment is ready, we get a 503.

It could also be that the ingress is not yet updated with the new endpoints, so that's another potential issue (probably more to do with the way we wait for deployment readiness rather than service readiness).

Atmosphere Operator: optional deployment of ingress-nginx

Our deployment of Atmosphere includes a slightly different architecture for Kubernetes, including a custom deployment of ingress-nginx, and one that uses metal-loadbalancer instead of keepalived. Prior to the move of ingress-nginx into the Atmosphere operator this was simple enough to manage, as we just commented out the deployment of keepalived and ingress-nginx in the openstack Ansible playbook.

Per the conversation we had last Friday (September 30): instead of maintaining a fork of the Atmosphere operator with the flows.py code for the ingress-nginx deployment commented out, it would be ideal if this could be managed much like memcached (with a simple config override such as config.memcached.enabled); the operator would then be a more flexible option for us.

`NodeSoftNetDrops` noisy alerts

In several environments, that alert fires more often than it should, and it ends up causing problems with noisy alerts for no reason. We should either find something more actionable from it or drop the alert.

Horizon image on Quay errors: wallaby, xena, and yoga tags affected

When testing out the Horizon image quay.io/vexxhost/horizon:yoga, I am seeing the following error, which indicates that the version of mysqlclient installed in that container image is not compatible.

Error reported in yoga tag:

Traceback (most recent call last):
  File "/tmp/manage.py", line 19, in <module>
    execute_from_command_line(sys.argv)
  File "/var/lib/openstack/lib/python3.8/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/var/lib/openstack/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    django.setup()
  File "/var/lib/openstack/lib/python3.8/site-packages/django/__init__.py", line 24, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "/var/lib/openstack/lib/python3.8/site-packages/django/apps/registry.py", line 114, in populate
    app_config.import_models()
  File "/var/lib/openstack/lib/python3.8/site-packages/django/apps/config.py", line 301, in import_models
    self.models_module = import_module(models_module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/var/lib/openstack/lib/python3.8/site-packages/django/contrib/auth/models.py", line 3, in <module>
    from django.contrib.auth.base_user import AbstractBaseUser, BaseUserManager
  File "/var/lib/openstack/lib/python3.8/site-packages/django/contrib/auth/base_user.py", line 48, in <module>
    class AbstractBaseUser(models.Model):
  File "/var/lib/openstack/lib/python3.8/site-packages/django/db/models/base.py", line 122, in __new__
    new_class.add_to_class('_meta', Options(meta, app_label))
  File "/var/lib/openstack/lib/python3.8/site-packages/django/db/models/base.py", line 326, in add_to_class
    value.contribute_to_class(cls, name)
  File "/var/lib/openstack/lib/python3.8/site-packages/django/db/models/options.py", line 207, in contribute_to_class
    self.db_table = truncate_name(self.db_table, connection.ops.max_name_length())
  File "/var/lib/openstack/lib/python3.8/site-packages/django/utils/connection.py", line 15, in __getattr__
    return getattr(self._connections[self._alias], item)
  File "/var/lib/openstack/lib/python3.8/site-packages/django/utils/connection.py", line 62, in __getitem__
    conn = self.create_connection(alias)
  File "/var/lib/openstack/lib/python3.8/site-packages/django/db/utils.py", line 204, in create_connection
    backend = load_backend(db['ENGINE'])
  File "/var/lib/openstack/lib/python3.8/site-packages/django/db/utils.py", line 111, in load_backend
    return import_module('%s.base' % backend_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/var/lib/openstack/lib/python3.8/site-packages/django/db/backends/mysql/base.py", line 36, in <module>
    raise ImproperlyConfigured('mysqlclient 1.4.0 or newer is required; you have %s.' % Database.__version__)
django.core.exceptions.ImproperlyConfigured: mysqlclient 1.4.0 or newer is required; you have 1.0.2.

In the xena and wallaby tags I get a different error:

Execution of msgfmt failed: /var/lib/openstack/lib/python3.8/site-packages/monitoring/locale/ko_KR/LC_MESSAGES/django.po:35: end-of-line within string
msgfmt: found 1 fatal error
Execution of msgfmt failed: /var/lib/openstack/lib/python3.8/site-packages/monitoring/locale/en_GB/LC_MESSAGES/django.po:36: end-of-line within string
/var/lib/openstack/lib/python3.8/site-packages/monitoring/locale/en_GB/LC_MESSAGES/django.po:49: end-of-line within string
msgfmt: found 2 fatal errors
Execution of msgfmt failed: /var/lib/openstack/lib/python3.8/site-packages/monitoring/locale/ja/LC_MESSAGES/django.po:39: end-of-line within string
msgfmt: found 1 fatal error
Execution of msgfmt failed: /var/lib/openstack/lib/python3.8/site-packages/monitoring/locale/id/LC_MESSAGES/django.po:31: end-of-line within string
msgfmt: found 1 fatal error
CommandError: compilemessages generated one or more errors.

The OpenStack Helm image openstackhelm/horizon:xena-ubuntu_focal does not have either of these errors, nor does us-docker.pkg.dev/vexxhost-infra/openstack/horizon:wallaby.

Add monitoring for Amphorae

At the moment, if any amphorae go into ERROR status, we don't actually alert on it. However, it is possible that an amphora in an error state would have a direct infrastructure impact on the load balancer.

The nice thing is that the OpenStack exporter already implements this:

openstack-exporter/openstack-exporter@d431ae6

So it should just be a matter of adding the appropriate alerts; a sketch follows.
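
A sketch of what that alert could look like (the metric name and labels are assumed from the exporter's conventions and should be verified against the exporter version in use):

groups:
  - name: octavia
    rules:
      - alert: AmphoraInError
        # Assumed metric/label names; confirm against openstack-exporter.
        expr: count(openstack_loadbalancer_amphora_status{status="ERROR"}) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "One or more Octavia amphorae are in ERROR status"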

Add automatic dependency bumps

We've got a few things that need to be updated automatically:

  • Packages for infrastructure services (Ceph, containers, Kubernetes)
  • Images for services
  • Versions for charts

We should probably use Renovate to take care of this, since I think we're going to be limited in our options with Dependabot.

Stop using pins

The pinning of packages seems to have weird issues and not be enforced; it also breaks Kubernetes upgrades in a weird way.

The alternative is to hold packages instead.

Adding limits for all pods

We recently experienced OOM on the controllers, owing to the fact that it's a small cloud: when we took one of the nodes out of the cluster, pods were rescheduled to the other controllers, causing another sequence of OOMs.

We should implement limits on pods to avoid that; an illustrative example follows.
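
An illustrative resources block (the numbers are placeholders and would need to be sized per service):

resources:
  requests:
    cpu: 500m      # what the scheduler reserves for the pod
    memory: 512Mi
  limits:
    cpu: "2"       # throttled above this
    memory: 2Gi    # OOM-killed above this, instead of taking down the node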

OpenID connect + Keystone + large headers

When using an OpenID connect provider that can potentially provide large headers, Keystone falls over at the ingress level:

upstream sent too big header while reading response header from upstream

We'll need to add the following annotation to the ingress:

nginx.ingress.kubernetes.io/proxy-buffer-size: "8k"

based on oauth2-proxy/oauth2-proxy#676
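
For reference, where the annotation would land (the Ingress name is illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keystone
  namespace: openstack
  annotations:
    # Enlarged buffer so nginx can hold the large OIDC response headers.
    nginx.ingress.kubernetes.io/proxy-buffer-size: "8k"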

Implement tests to verify if new VMs are failing to go up

At the moment, Tempest is able to tell us if we are failing to deploy inside the public network; however, we've had issues where the DHCP agent gets "stuck" and we have no way of knowing whether things are working inside the cloud.

I think we should implement https://github.com/albertodonato/query-exporter and run queries against the instance_faults table in order to extract information that could be useful for detecting whether we are getting a lot of new VM failures; a sketch of such a configuration follows.
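
A sketch of a query-exporter configuration for this (the DSN is a placeholder, and the SQL against nova's instance_faults table is an assumption, not a tested query):

databases:
  nova:
    dsn: mysql://nova:password@percona-xtradb-haproxy/nova   # placeholder DSN

metrics:
  nova_instance_faults_recent:
    type: gauge
    description: Instance faults recorded in the last 5 minutes

queries:
  instance-faults:
    interval: 60
    databases: [nova]
    metrics: [nova_instance_faults_recent]
    # Column alias must match the metric name above.
    sql: >
      SELECT COUNT(*) AS nova_instance_faults_recent
      FROM instance_faults
      WHERE created_at > NOW() - INTERVAL 5 MINUTE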

Glance Upload Internal Server Error

I pulled down the Glance changes made yesterday (Thursday, September 8), but when I use the container image tag referenced (quay.io/vexxhost/glance:677c89c23631e9083261a1a18ed438d8966e0de2), I get a failure when attempting to upload a new OS image to Glance.

If I use the previous image (us-docker.pkg.dev/vexxhost-infra/openstack/glance:22.1.1.dev2-1), everything works fine.

I see this result whether I let Atmosphere attempt to deploy an image defined in my inventory or run a manual upload via the CLI:

openstack image create \
  --disk-format qcow2 \
  --container-format bare \
  --public \
  --file ~/Downloads/cirros-0.3.5-x86_64-disk.img \
  cirros

Here are the relevant logs:

❯ kubectl -n openstack logs --follow -l application=glance,component=api
2022-09-09 12:54:21.801 9 ERROR glance.api.v2.image_data [req-0b4745cd-261c-4616-87ca-38aea8bc7056 b19af6e18b774b1b9de6f05d65131d43 325b64f1784b41328921a79e292cbe9b - default default] Failed to upload image data due to internal error
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi [req-0b4745cd-261c-4616-87ca-38aea8bc7056 b19af6e18b774b1b9de6f05d65131d43 325b64f1784b41328921a79e292cbe9b - default default] Caught error: 'NoneType' object has no attribute 'Rados': AttributeError: 'NoneType' object has no attribute 'Rados'
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi Traceback (most recent call last):
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/common/wsgi.py", line 1353, in __call__
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     action_result = self.dispatch(self.controller, action,
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/common/wsgi.py", line 1397, in dispatch
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     return method(*args, **kwargs)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/common/utils.py", line 416, in wrapped
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     return func(self, req, *args, **kwargs)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/api/v2/image_data.py", line 300, in upload
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     self._restore(image_repo, image)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     self.force_reraise()
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     raise self.value
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/api/v2/image_data.py", line 165, in upload
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     image.set_data(data, size, backend=backend)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/domain/proxy.py", line 208, in set_data
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     self.base.set_data(data, size, backend=backend, set_active=set_active)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/notifier.py", line 501, in set_data
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     _send_notification(notify_error, 'image.upload', msg)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     self.force_reraise()
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     raise self.value
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/notifier.py", line 447, in set_data
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     self.repo.set_data(data, size, backend=backend,
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/api/policy.py", line 273, in set_data
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     return self.image.set_data(*args, **kwargs)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/quota/__init__.py", line 322, in set_data
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     self.image.set_data(data, size=size, backend=backend,
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/location.py", line 567, in set_data
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     self._upload_to_store(data, verifier, backend, size)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance/location.py", line 473, in _upload_to_store
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     loc_meta) = self.store_api.add_to_backend_with_multihash(
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/backend.py", line 490, in add_to_backend_with_multihash
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     return store_add_to_backend_with_multihash(
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/backend.py", line 467, in store_add_to_backend_with_multihash
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     (location, size, checksum, multihash, metadata) = store.add(
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/driver.py", line 279, in add_adapter
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     metadata_dict) = store_add_fun(*args, **kwargs)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/capabilities.py", line 176, in op_checker
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     return store_op_fun(store, *args, **kwargs)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/_drivers/rbd.py", line 536, in add
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     with self.get_connection(conffile=self.conf_file,
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     return next(self.gen)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi   File "/var/lib/openstack/lib/python3.8/site-packages/glance_store/_drivers/rbd.py", line 288, in get_connection
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi     client = rados.Rados(conffile=conffile, rados_id=rados_id)
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi AttributeError: 'NoneType' object has no attribute 'Rados'
2022-09-09 12:54:21.833 9 ERROR glance.common.wsgi 

I did some research, and it looks like an issue associated with the container missing the python3-rados or python3-ceph-common packages, but I cannot say for sure: from what I can tell, python3-ceph-common is installed. I can say that us-docker.pkg.dev/vexxhost-infra/openstack/glance:22.1.1.dev2-1 works fine.
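
One quick way to check whether the rados bindings are importable inside the running container (the deployment name here is an assumption):

# Should print the module path if the rados bindings are present.
kubectl -n openstack exec deploy/glance-api -- \
  python3 -c "import rados; print(rados.__file__)"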

Nova issue found on containers in quay.io/vexxhost/nova

When using quay.io/vexxhost/nova at any version tag alongside quay.io/vexxhost/heat at the matching tag, I am seeing the following error in deployment.apps/nova-novncproxy (it comes from the nova-novncproxy-init-assets init container):

❯ kubectl -n openstack logs nova-novncproxy-54b54dffbf-9lbpr -c nova-novncproxy-init-assets
+ console_kind=novnc
+ '[' novnc == novnc ']'
+ cp -vaRf '/usr/share/novnc/*' /tmp/usr/share/novnc/
cp: cannot stat '/usr/share/novnc/*': No such file or directory

If I swap Atmosphere to use us-docker.pkg.dev/vexxhost-infra/openstack/nova:wallaby alongside us-docker.pkg.dev/vexxhost-infra/openstack/heat:wallaby, I do not see these errors.

Collect container logs in Atmosphere CI

We are frequently facing molecule converge failures in Zuul CI without any meaningful errors showing up in the patches themselves. Collecting all container logs after a molecule converge would make it much easier to find the root cause.
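
As a sketch, the post-converge collection step could be as simple as the following (the output path is a placeholder):

# Dump logs from every container of every pod into one file per pod.
mkdir -p /tmp/container-logs
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  for pod in $(kubectl -n "$ns" get pods -o name); do
    kubectl -n "$ns" logs "$pod" --all-containers --prefix \
      > "/tmp/container-logs/${ns}-${pod#pod/}.log" 2>&1 || true
  done
done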

Switch all services to use UWSGI

At the moment, the services are running the (very) inefficient eventlet-based versions of their API endpoints.

We should flip over to uWSGI (or something using mod_wsgi) so that we have a proper HTTP server that can scale with demand.
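
As an illustration only, and not how the charts would actually wire it up, running Keystone's public API under uWSGI could look like this (the script path and worker counts are assumptions):

# Serve the Keystone public WSGI entry point with a master process and a small worker pool.
uwsgi --master \
  --http-socket :5000 \
  --wsgi-file /var/lib/openstack/bin/keystone-wsgi-public \
  --processes 4 --threads 2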

Migrate to local Harbor registry

We're currently seeing issues with Quay.io, and there's nothing we can do about them until they're fixed. Docker Hub has rate limits that are impossible to work with.

We're going to migrate Atmosphere to a Harbor registry hosted at @vexxhost; I'll update the jobs here shortly.

Push all service logs to OpenSearch

At the moment, once a pod dies, its logs are gone forever and cannot be recovered. This is not good in the long term, since it makes it very hard to troubleshoot things.

We need to push logs into the OpenSearch cluster that we run. I think the best way to do this is to first make sure that all services emit JSON logs, and then run a fluent-bit sidecar that ships those JSON logs to the OpenSearch cluster.
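
A minimal fluent-bit sidecar config sketch; the log path, host, and index are assumptions, with the service writing JSON logs into a shared volume that the sidecar tails (the es output also speaks to OpenSearch):

[INPUT]
    Name    tail
    Path    /var/log/service/*.log
    Parser  json

[OUTPUT]
    Name    es
    Match   *
    Host    opensearch-cluster-master
    Port    9200
    Index   service-logs
    Suppress_Type_Name On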

Add `rabbitmq` image build/sync

At the moment, the quay.io/vexxhost/rabbitmq:3.8.23-management image is updated manually. We'll first need a job that automatically syncs it, and then have it automatically bumped when needed.
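
The sync part could be a scheduled job running something along these lines (a sketch; registry credentials and tag discovery are left out):

# Mirror the upstream image into our registry; the tags shown are the current manual one.
skopeo copy \
  docker://docker.io/library/rabbitmq:3.8.23-management \
  docker://quay.io/vexxhost/rabbitmq:3.8.23-management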
