
ansible-lxc-rpc's Introduction

## NOTE: This repo and all development have moved to Stackforge

### As per all Stackforge projects, bug tracking and release management will be handled in Launchpad, and code reviews will be managed in Gerrit.

ansible-lxc-rpc's People

Contributors

andymcc, apsu, bjoernt, cfarquhar, claco, cloudnull, davidwittman, git-harry, hughsaunders, ionosphere80, jacobwagner, jcannava, mancdaz, matthewoliver, mattt416, miguelgrinberg, nrb, odyssey4me, samyaple, stevelle, woodardchristop


ansible-lxc-rpc's Issues

Address addition of new user_variables

If we add new user_variables, we need a sensible way of notifying the end user, as these won't be in the "/etc/rpc_deploy/user_variables.yml" file and will cause runs to fail.

We currently do a version check on the inventory vs environment vs rpc_user_config files; we should probably consider doing the same for the user_variables file.
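
A minimal sketch of what such a check could look like, assuming a hypothetical version key inside user_variables.yml and a matching expected value shipped with the playbooks (neither variable exists today; the names are illustrative):

```yaml
# Hypothetical pre-flight check for /etc/rpc_deploy/user_variables.yml.
# Both variable names below are illustrative and do not exist in the repo.
- name: Fail early when user_variables.yml is older than the playbooks expect
  fail:
    msg: >
      user_variables.yml is at version {{ user_variables_version | default('unknown') }}
      but these playbooks expect {{ expected_user_variables_version }};
      please merge in the new variables before re-running.
  when: user_variables_version | default(0) != expected_user_variables_version
```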

safe upgrade is upgrading the kernel

When the setup-common role runs, a safe-upgrade is performed to ensure all of the packages that are determined to be safely upgradable are up to date. Sadly, the ansible apt module uses aptitude to perform this upgrade, which attempts to bring in new kernels as they become available. Presently we are depending on kernel 3.13.0-34-generic, while the new stable kernel for Ubuntu 14.04.1 is 3.13.0-35-generic. It would seem that this is going to be a never-ending battle of incremental kernel releases. If we are not going to be deploying a repository of frozen packages at this time, we may need to change the kernel check to key on the version series and allow for kernel releases of >= -34.
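
If we go that route, the check could be relaxed to a minimum version rather than an exact match; a rough sketch using the ansible_kernel fact (the comparison below is illustrative and would need testing against real kernel version strings):

```yaml
# Sketch: require at least the 3.13.0-34 kernel instead of pinning an
# exact release, so routine ABI bumps (e.g. -35) do not break the check.
- name: Verify the host kernel meets the minimum version
  fail:
    msg: "Kernel {{ ansible_kernel }} is older than the required 3.13.0-34"
  when: ansible_kernel | version_compare('3.13.0-34', '<')
```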

Some containers are using unconfined apparmor profiles

Issue by Apsu
Monday Jul 28, 2014 at 16:18 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/239


Some containers, such as neutron-agents, nova-compute and cinder-volumes, are disabling apparmor by setting their profile to "unconfined". This is primarily because we have not yet figured out the right way to provide access to all the host resources and capabilities ( https://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.2/capfaq-0.2.txt ) that these containers require.

We should figure out how to use apparmor correctly so we can at least make a passable attempt at locking these containers down as much as possible, in line with the rest of the cluster containers.
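
As a starting point, a hedged sketch of pointing a container at a named profile instead of "unconfined"; the profile name lxc-openstack is purely a placeholder, and the real profile with the required capabilities still has to be written:

```yaml
# Sketch only: switch the container from "unconfined" to a named AppArmor
# profile. "lxc-openstack" is a placeholder profile that would still need
# to be written and loaded on the host; container_name/physical_host are
# assumed inventory variables.
- name: Set the container AppArmor profile
  lineinfile:
    dest: "/var/lib/lxc/{{ container_name }}/config"
    regexp: "^lxc.aa_profile"
    line: "lxc.aa_profile = lxc-openstack"
  delegate_to: "{{ physical_host }}"
```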

All git repos should be set to a TAG or SHA.

Issue by cloudnull
Monday Aug 25, 2014 at 15:06 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/459


All of the git repos that we use need to be set to a tag or SHA so that we never install anything from source that is unexpected. Additionally, we should create a tarball of all sources we are installing and add them to our frozen repo. Installing from a tarball, and falling back to git repos only if needed, will not only speed up installations (clones can take a long time and vary due to uncontrollable network conditions) but should also give us a mechanism for moving into a CDC environment. IMO, having the repo contain a source directory, holding our tar'd up git repos, from which software is downloaded and installed will be the first step towards fixing the CDC situation where no internet access is available. (A sketch of pinning one of these checkouts follows the list below.)

Here are all of the git repos we have and the respective branches that need to be stabilized:

Cinder:

Cinder Git Branch:

  • inventory/group_vars/cinder_all.yml:65:git_install_branch: stable/icehouse

Glance:

Glance Git Branch:

  • inventory/group_vars/glance_all.yml:75:git_install_branch: stable/icehouse

Heat:

Heat Git Branch:
  • inventory/group_vars/heat_all.yml:70:git_install_branch: stable/icehouse

Horizon:

Horizon Git Branch:

  • inventory/group_vars/horizon.yml:46:git_install_branch: stable/icehouse

Keystone:

Keystone Git Branch:

  • inventory/group_vars/keystone_all.yml:60:git_install_branch: stable/icehouse

Neutron:

Neutron Git Branch:

  • inventory/group_vars/neutron_all.yml:83:git_install_branch: stable/icehouse

Nova:

Nova Git Branch:

  • inventory/group_vars/nova_all.yml:79:git_install_branch: stable/icehouse

RaxMon:

RaxMon Git Branch:

  • etc/rpc_deploy/user_variables.yml:112:maas_repo_version: master

Holland:

Holland Git Branch:

  • playbooks/rpc_support.yml:34: holland_release: "{{ rpc_support_holland_branch|default('v1.0.10') }}"
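
As a rough illustration of the direction proposed above, a hedged sketch of a checkout pinned to a tag or SHA with a frozen-tarball fallback; the tarball URL layout and the glance-specific names are placeholders, not existing repo conventions:

```yaml
# Sketch: prefer a frozen source tarball, and only fall back to a git
# clone pinned to an exact tag/SHA. Paths and URL layout are placeholders.
- name: Fetch the frozen source tarball
  get_url:
    url: "{{ rpc_repo_url }}/sources/glance-{{ git_install_branch }}.tgz"
    dest: "/opt/glance-{{ git_install_branch }}.tgz"
  register: source_tarball
  ignore_errors: yes

- name: Clone from git only when the tarball is unavailable
  git:
    repo: "{{ git_repo }}"
    dest: "/opt/glance_{{ git_install_branch }}"
    version: "{{ git_install_branch }}"  # must be a tag or SHA, never a moving branch
  when: source_tarball | failed
```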

SSH Cert generation for nova fails

@cloudnull @jacobwagner

This fails:

TASK: [nova_compute_sshkey_setup | Sync authorized_keys file] *****************
failed: [node17.uk.com] => {"cmd": "rsync --delay-updates -FF --compress --archive --rsh 'ssh -o StrictHostKeyChecking=no' --out-format='<>%i %n%L' /tmp/authorized_keys [email protected]:/var/lib/nova/.ssh/authorized_keys", "failed": true, "rc": 255}
msg: Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.0]

This won't work unless we force-sync keys to all nodes from the node you're running it on (e.g. I usually use ssh-agent forwarding to make this work, but that doesn't apply here since it will ssh to localhost first).

Additionally, it looks like it would fail on a second run (even if it succeeded on the first), since the shell-out prompts you if the key already exists (which it will), and ansible won't respond to the prompt.

I think we should really reconsider this approach. With the original patch we discussed, the end result was that the same key was used on all nodes, which simplifies things for no real downside: since all the keys have to be synced anyway, access to the nova user on any node gives you access to the nova user on every other compute node, regardless of whether they have the same or different keys.

With separate keys for each compute node, the scale-out isn't great, since each public key has to be copied to all other compute nodes and the authorized_keys file has to be synced to every compute node, for no real upside. Additionally, the authorized_keys file is recreated on every run; generating individual keys adds complexity for no real benefit.
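
For reference, a minimal sketch of the single-shared-key approach, assuming the key pair is generated once on the deployment host and then pushed to every compute node (all paths and task wording are illustrative):

```yaml
# Sketch: one key pair for the nova user across all compute hosts, so no
# host-to-host copying or ssh-agent forwarding is needed. Paths are
# illustrative.
- name: Generate a single nova migration key on the deployment host
  command: ssh-keygen -t rsa -N "" -f /etc/rpc_deploy/nova_migration_key creates=/etc/rpc_deploy/nova_migration_key
  delegate_to: localhost
  run_once: yes

- name: Install the shared private key for the nova user
  copy:
    src: /etc/rpc_deploy/nova_migration_key
    dest: /var/lib/nova/.ssh/id_rsa
    owner: nova
    group: nova
    mode: "0600"

- name: Authorize the shared public key for the nova user
  authorized_key:
    user: nova
    key: "{{ lookup('file', '/etc/rpc_deploy/nova_migration_key.pub') }}"
```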

Move the elasticsearch storage backend

We should move the elasticsearch storage backend to something outside of the container's LV. The current proposal is to bind mount /openstack/elasticsearch into the container and use that as the backend storage for elasticsearch.
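
A minimal sketch of the proposed bind mount, using the standard lxc.mount.entry mechanism (the task layout and the container variables are illustrative):

```yaml
# Sketch: keep elasticsearch data on the host filesystem and bind mount it
# into the container, so the data no longer lives on the container LV.
- name: Create the elasticsearch data directory on the host
  file:
    path: /openstack/elasticsearch
    state: directory

- name: Bind mount the directory into the container
  lineinfile:
    dest: "/var/lib/lxc/{{ container_name }}/config"
    line: "lxc.mount.entry = /openstack/elasticsearch var/lib/elasticsearch none bind 0 0"
  delegate_to: "{{ physical_host }}"
```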

nova-consoleauth needs to use memcached

This is required since we run multiple memcached servers. Whilst we specify memcached_servers in nova.conf, it is under the [keystone_authtoken] section and is not found by nova-consoleauth.

On a side note, should we specify memcached_servers to point to the memcached VIP, or to each individual memcached server?
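
Whatever we decide on the VIP-vs-individual-servers question, the change itself is small; a sketch with the ini_file module, assuming a memcached_servers list variable (the variable name is illustrative):

```yaml
# Sketch: expose memcached_servers under [DEFAULT] so nova-consoleauth
# can find it, instead of only under [keystone_authtoken].
- name: Configure memcached servers for nova-consoleauth
  ini_file:
    dest: /etc/nova/nova.conf
    section: DEFAULT
    option: memcached_servers
    value: "{{ memcached_servers | join(',') }}"
```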

Hosts file can generate duplicate entry weirdness

If you remove containers and re-run ansible, duplicate host entries will be created, leading to very inconsistent and weird results:

root@533812-node16:~/ansible-lxc-rpc/rpc_deployment# grep 222 /etc/hosts
10.241.0.222 infra3_horizon_container-b0288493
10.241.0.222 infra3_heat_apis_container-a7ee79ee
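
One way to keep this idempotent is to replace any existing line for a container instead of appending a new one; a hedged sketch (container_address is an assumed variable):

```yaml
# Sketch: guarantee a single /etc/hosts line per container name, so
# re-runs and rebuilt containers update the entry rather than duplicating it.
- name: Ensure a single hosts entry per container
  lineinfile:
    dest: /etc/hosts
    regexp: "\\s{{ container_name }}$"
    line: "{{ container_address }} {{ container_name }}"
```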

Glance image replication, or other such mechanisms

Issue by byronmccollum
Thursday Aug 28, 2014 at 03:20 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/484


Glance image replicator redux...

Registering an image lands the bits on only one infra node. Spawning an instance from that image could cause the nova scheduler to retry multiple times until the image fetch call goes to the infra node containing the image. If the scheduler retry count is >= the number of infra nodes, this should eventually succeed (after unnecessary delay and an artificially induced lumpy compute distribution), but with any amount of load it's quite possible that all scheduler retries go to infra nodes without the image.

So, are there plans to reintroduce the glance image replicator, or some other such mechanism? Or is the preferred / recommended configuration to use Swift if you have more than one infra node?

[Cinder] iSCSI doesn't require running nova-compute outside of a container after all!

Issue by Apsu
Wednesday Aug 06, 2014 at 16:13 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/314


After lots of digging around and reading kernel code, trying to figure out how to fix iscsitarget's crackheadedness, I discovered that there's another iscsi targeting system built into recent kernels such as 3.13. If we load the scsi_transport_iscsi module, tgtd can talk to it from inside of a container with no problem! The initiator still only needs the iscsi_tcp module.

I made a crappy asciinema recording to demonstrate here: https://asciinema.org/a/11317

We should be able to just add the scsi_transport_iscsi module to cinder's module list and turn is_metal back off by default.
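
The change described above would be roughly the following (a sketch; the persistence step is an assumption about how we'd want to wire it):

```yaml
# Sketch: load the in-kernel iSCSI target transport on the cinder host so
# tgtd can be used from inside the container; the initiator side still
# only needs iscsi_tcp.
- name: Load the scsi_transport_iscsi kernel module
  modprobe:
    name: scsi_transport_iscsi
    state: present

- name: Persist the module across reboots
  lineinfile:
    dest: /etc/modules
    line: scsi_transport_iscsi
```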

Flat network type seems broken on un-containerized compute nodes

When using a flat network type, the linux bridge agent bombs because it attempts to bridge the already-bridged network.

Error:

2014-09-08 02:51:13.332 19696 INFO neutron.plugins.linuxbridge.agent.linuxbridge_neutron_agent [-] Port tapc76d1465-17 updated. Details: {u'admin_state_up': True, u'network_id': u'bb4573bd-220e-491e-8b0a-01b84c5a6551', u'segmentation_id':
 None, u'physical_network': u'vlan', u'device': u'tapc76d1465-17', u'port_id': u'c76d1465-172a-43ec-98cd-ca4f6daa444f', u'network_type': u'flat'}
2014-09-08 02:51:14.604 19696 ERROR neutron.plugins.linuxbridge.agent.linuxbridge_neutron_agent [-] Unable to add br-vlan to brqbb4573bd-22! Exception:
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'brctl', 'addif', 'brqbb4573bd-22', 'br-vlan']
Exit code: 1
Stdout: ''
Stderr: "device br-vlan is a bridge device itself; can't enslave a bridge device to a bridge device.\n"

While this issue still needs verification, the fix would likely require reworking the ml2.ini plugin configuration so that flat network types are bound to an unbridged interface.

metadata_host left unset and defaults to '$my_ip'

I don't think this is an issue, since we're using neutron only and metadata_agent.ini is configured with a correct nova_metadata_ip ({{internal_vip_address}}); however, it may be worth setting metadata_host to also resolve to {{internal_vip_address}} in nova.conf just for consistency's sake.
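
If we do set it, the change would be a one-liner; a sketch with the ini_file module, reusing the existing internal_vip_address variable:

```yaml
# Sketch: pin metadata_host to the internal VIP for consistency with
# nova_metadata_ip in metadata_agent.ini, rather than defaulting to $my_ip.
- name: Set metadata_host in nova.conf
  ini_file:
    dest: /etc/nova/nova.conf
    section: DEFAULT
    option: metadata_host
    value: "{{ internal_vip_address }}"
```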

Frozen Repo missing package for haproxy install

failed: [node1] => (item=haproxy,hatop,vim-haproxy) => {"failed": true, "item": "haproxy,hatop,vim-haproxy"}
stderr: E: Failed to fetch http://dc0e2a2ef0676c3453b1-31bb9324d3aeab0d08fa434012c1e64d.r5.cf1.rackcdn.com/ubuntu/pool/main/liby/libyaml/libyaml-0-2_0.1.4-3ubuntu3_amd64.deb  404  Not Found [IP: 80.239.178.186 80]

Time out on initial Network check is probably too long

Issue by andymcc
Friday Aug 29, 2014 at 13:20 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/485


The initial timeout check is 60 seconds, which seems too long; we could do this at 10 seconds, since it only exists to determine whether we should reboot the container or not. The post-reboot check can stay longer (as it is now, 60 seconds).

This adds a lot of time to the initial run, and if the network is already up (before we even restart it, e.g. because it was already set up on a previous run) then it won't take 60 seconds to confirm.
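
The change itself is tiny; a sketch of the shorter pre-restart check (the exact host/port being probed is illustrative, not the current task):

```yaml
# Sketch: a 10 second pre-restart connectivity check; the post-restart
# check keeps its existing 60 second timeout.
- name: Check container connectivity before deciding whether to restart it
  wait_for:
    host: "{{ container_address }}"
    port: 22
    timeout: 10
  register: initial_network_check
  ignore_errors: yes
```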

README out of date

Issue by jcourtois
Monday Aug 18, 2014 at 15:35 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/398


The README hasn't been updated for about a month. Since then, at least one container (nova-spice-console) has been added, another has been removed (nova-compute), and a number of other changes have been made, at least some of which are probably material.

Can someone give the README a close reading and edit it for accuracy and usability?

Incorrect hosts groups listed in maas_local.yml

For some checks, we reference the _container hosts group (cinder_api_container as an example). The problem here is that if that hosts group is moved onto bare metal then the checks will not get deployed.

All Package repos need to be replaced with the frozen repo

Issue by cloudnull
Monday Aug 25, 2014 at 14:58 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/458


The repo URLs need to be updated to use the CNAME; we presently have:

These repos need to use the rpc_repo_url variable:

Related Issues:

dnsmasq package not available in pinned apt repo

Issue by johnmarkschofield
Wednesday Aug 20, 2014 at 16:36 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/426


When using the pinned apt repo:

root@schof-aio:~# cat /etc/apt/sources.list
deb [arch=amd64] http://dc0e2a2ef0676c3453b1-31bb9324d3aeab0d08fa434012c1e64d.r5.cf1.rackcdn.com LA main
root@schof-aio:~#

The dnsmasq package is not available:

root@schof-aio:~# apt-get update
Hit http://mirror.jmu.edu trusty InRelease
Hit http://mirror.jmu.edu trusty/main amd64 Packages
Hit http://mirror.jmu.edu trusty/main i386 Packages
Hit http://dc0e2a2ef0676c3453b1-31bb9324d3aeab0d08fa434012c1e64d.r5.cf1.rackcdn.com LA InRelease
Ign http://mirror.jmu.edu trusty/main Translation-en_US
Ign http://mirror.jmu.edu trusty/main Translation-en
Hit http://dc0e2a2ef0676c3453b1-31bb9324d3aeab0d08fa434012c1e64d.r5.cf1.rackcdn.com LA/main amd64 Packages
Ign http://dc0e2a2ef0676c3453b1-31bb9324d3aeab0d08fa434012c1e64d.r5.cf1.rackcdn.com LA/main Translation-en_US
Ign http://dc0e2a2ef0676c3453b1-31bb9324d3aeab0d08fa434012c1e64d.r5.cf1.rackcdn.com LA/main Translation-en
Reading package lists... Done
root@schof-aio:~# apt-get install dnsmasq
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package dnsmasq is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  dnsmasq-base

E: Package 'dnsmasq' has no installation candidate
root@schof-aio:~#

This causes the openstack-common.yml playbook to fail:

TASK: [container_common | Ensure container packages are installed] ************
<10.51.50.1> ESTABLISH CONNECTION FOR USER: root
<10.51.50.1> REMOTE_MODULE apt pkg=libpq-dev,dnsmasq,dnsmasq-utils state=present
<10.51.50.1> EXEC ['ssh', '-C', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'StrictHostKeyChecking=no', '-o', 'Port=22', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', u'10.51.50.1', u"/bin/sh -c 'LC_CTYPE=en_US.UTF-8 LANG=en_US.UTF-8 /usr/bin/python'"]
failed: [infra1] => (item=libpq-dev,dnsmasq,dnsmasq-utils) => {"failed": true, "item": "libpq-dev,dnsmasq,dnsmasq-utils"}
msg: No package matching 'dnsmasq' is available

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/openstack-common.retry

infra1                     : ok=21   changed=3    unreachable=0    failed=1

OpenStackInstaller: 2014-08-19 20:00:33,021 - CRITICAL: Failed running playbook 'playbooks/openstack/openstack-common.yml' 3 times. Aborting...

This is a release-blocker.

Glance requires an entry for snet endpoint even when not using swift for the image backend.

Issue by cloudnull
Monday Aug 25, 2014 at 20:43 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/463


When running a new installation, the system attempts to lay down a hosts entry for a service_net endpoint even when using a file backend. The desired outcome would be that all swift-related options are optional and are only parsed when the backend is set to swift. (A sketch of one way to guard this is at the end of this issue.)

The task that is attempting to execute:
https://github.com/rcbops/ansible-lxc-rpc/blob/ead8d1d698aa2820f53a20c753f4f6db03e44b64/rpc_deployment/roles/glance_snet_override/tasks/main.yml

Valid Endpoints:
https://github.com/rcbops/ansible-lxc-rpc/blob/ead8d1d698aa2820f53a20c753f4f6db03e44b64/rpc_deployment/inventory/group_vars/glance_all.yml#L98-L116

Glance Options:
https://github.com/rcbops/ansible-lxc-rpc/blob/master/etc/rpc_deploy/user_variables.yml#L56-L65

  • Running the code results in an error whether glance_swift_enable_snet is true or false:
    glance_swift_enable_snet: false
TASK: [glance_snet_override | Remove hosts entry if glance_swift_enable_snet is False] ***
fatal: [infra1_glance_container-dbd3a387] => One or more undefined variables: 'dict object' has no attribute 'SomeRegion'

FATAL: all hosts have already failed -- aborting

glance_swift_enable_snet: true

TASK: [glance_snet_override | Add hosts entry if glance_swift_enable_snet is True] ***
fatal: [infra1_glance_container-dbd3a387] => One or more undefined variables: 'dict object' has no attribute 'SomeRegion'

FATAL: all hosts have already failed -- aborting

Related Issue: #260
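
One plausible way to get the desired behaviour is to guard the override role on the configured backend; a hedged sketch (the variable name glance_default_store is assumed here, not verified against the repo):

```yaml
# Sketch: only run the snet hosts-entry tasks when glance actually uses
# swift, so file-backed installs never evaluate the swift endpoint data.
- include: roles/glance_snet_override/tasks/main.yml
  when: glance_default_store is defined and glance_default_store == 'swift'
```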

Cinder.conf Template Errors

There is an error when writing the backend sections. Also, the enabled_backends value doesn't seem right. It is using the key instead of the value from rpc_user_config.yml.

Error


< TASK: cinder_common | Setup Cinder Config >


    \   ^__^
     \  (oo)\_______
        (__)\       )\/\
            ||----w |
            ||     ||

fatal: [cinder1_cinder_volumes_container-f30a3071] => {'msg': "One or more undefined variables: 'unicode object' has no attribute 'items'", 'failed': True}
fatal: [cinder1_cinder_volumes_container-f30a3071] => {'msg': 'One or more items failed.', 'failed': True, 'changed': False, 'results': [{'msg': "One or more undefined variables: 'unicode object' has no attribute 'items'", 'failed': True}]}

FATAL: all hosts have already failed -- aborting
cinder.conf Template (partial)

...
{% if cinder_backends is defined %}

enabled_backends={% for backend in cinder_backends|dictsort %}{{ backend.0 }}{% if not loop.last %},{% endif %}{% endfor %}

{% for backend_section in cinder_backends|dictsort %}
[{{ backend_section.0 }}]
{% for key, value in backend_section.1.items() %}
{{ key }}={{ value }}
{% endfor %}

{% endfor %}
{% endif %}
...
Cinder.conf Rendered (partial)

...
enabled_backends=limit_container_types

Then it just blows up around here; see the error above...

...
rpc_user_config.yml (relevant section)

# User defined Storage Hosts, this should be a required group
storage_hosts:
  cinder1:
    ip: 172.29.236.1
    # "container_vars" can be set outside of all other options as
    # host specific optional variables.
    container_vars:
      # In this example we are defining what cinder volumes are
      # on a given host.
      cinder_backends:
        # if the "limit_container_types" argument is set, within
        # the top level key of the provided option the inventory
        # process will perform a string match on the container name with
        # the value found within the "limit_container_types" argument.
        # If any part of the string found within the container
        # name the options are appended as host_vars inside of inventory.
        limit_container_types: cinder_volume
        lvm:
          volume_group: cinder-volumes
          driver: cinder.volume.drivers.lvm.LVMISCSIDriver
          backend_name: LVM_iSCSI

Rabbit FQDN Issues returning

Issue by andymcc
Friday Aug 29, 2014 at 14:29 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/486


Rabbit sometimes fails to start because it can't connect to itself (when using an FQDN):

TASK: [rabbit_common | Install rabbit packages] *******************************
failed: [node12.domain.com_rabbit_mq_container-48e7bee8] => (item=rabbitmq-server)

Since the package install also starts the service, this fails; the logs show the following:

rabbit@node12:

  • unable to connect to epmd (port 4369) on node12: address (cannot connect to host/port)

hosts shows the following:
root@node12:~# cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 node12.domain.com_rabbit_mq_container-48e7bee8

Moving the "Fix /etc/hosts" entry in the rabbit_common to be above the install will fix this.

Pinned Repo broken for linux-image-extra-virtual

Issue by johnmarkschofield
Tuesday Aug 19, 2014 at 17:26 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/415


The vhost_net kernel module is installed by default in the Ubuntu Trusty image on mycloud.rackspace.com. In the VagrantCloud Trusty image and the official Ubuntu image, that kernel module is not present.

The fix is to install the linux-image-extra package, which includes the vhost_net module. To avoid specifying kernel versions in my scripts, I install linux-image-extra-virtual.

With the pinned repo, I am unable to install any of the linux-image-extra-* packages. With a standard ubuntu repo, I can.

So I believe we have mismatches between different kernel packages present in the repo. All need to be updated to a consistent version.

Should be able to limit setup plays to a set of hosts or containers

Issue by hughsaunders
Friday Aug 22, 2014 at 11:38 GMT
Originally opened as https://github.com/rcbops/ansible-lxc-rpc-orig/issues/449


Currently the destroy-containers.yml has variables host_group and container_group that can be supplied with -e to limit the containers that will be removed.

It would be useful if all the playbooks directly included by host-setup.yml followed the same convention, so that, for example, all the galera containers could be rebuilt, or all the galera containers on a single node, etc. (A sketch of the pattern follows the list below.)

Playbooks included by host-setup.yml:

  • include: setup-common.yml #could use host_group
  • include: build-containers.yml #already uses host_group, could use container_group
  • include: restart-containers.yml #already uses host_group, could use container_group
  • include: host-common.yml #could use host_group
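
Following the pattern destroy-containers.yml already uses, each included play could take an optional override; a sketch (the fallback group name is illustrative):

```yaml
# Sketch: allow -e host_group=... / -e container_group=... to narrow the
# play, exactly as destroy-containers.yml already does. The fallback group
# name is illustrative.
- hosts: "{{ container_group | default(host_group | default('all_containers')) }}"
  user: root
  tasks:
    - name: Show which hosts would be targeted
      debug: msg="Running against {{ inventory_hostname }}"
```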

Glance notification mechanism

We configure the notification mechanism (notification_driver) in glance-api.conf to use a message queue. Without a consumer (typically ceilometer), this queue could become rather large.
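
Until a consumer is deployed, one option is simply to turn the driver off; a sketch (whether noop is the right default is exactly the question this issue raises):

```yaml
# Sketch: disable glance notifications while nothing consumes them, so the
# queue cannot grow without bound.
- name: Set the glance notification driver
  ini_file:
    dest: /etc/glance/glance-api.conf
    section: DEFAULT
    option: notification_driver
    value: noop
```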

Missing Values In `metadata_agent.ini` Config

Some values (nova_metadata_ip and metadata_proxy_shared_secret) are not getting set / substituted. I looked at vars/nova_all.yml and everything seems correct. Not sure why the literal template strings aren't being parsed.

[DEFAULT]
auth_url = http://172.29.236.4:5000/v2.0
auth_region = RegionOne
admin_tenant_name = service
admin_user = neutron
admin_password = herpderp
nova_metadata_ip = {{internal_vip_address}}
metadata_proxy_shared_secret = {{nova_metadata_proxy_secret}}
metadata_workers = 10
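
One plausible cause of literal {{ ... }} strings surviving into the rendered file is the file being laid down with copy rather than template; a hedged sketch of the templated version (file ownership and names are illustrative):

```yaml
# Sketch: render metadata_agent.ini with the template module so the Jinja2
# expressions are substituted; a plain copy would leave the literal
# {{ ... }} strings in place, which matches the symptom above.
- name: Drop metadata_agent.ini
  template:
    src: metadata_agent.ini
    dest: /etc/neutron/metadata_agent.ini
    owner: neutron
    group: neutron
```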
