
tendrl-ansible's Introduction

tendrl-ansible

Ansible roles and playbooks for Tendrl!

What does it do?

tendrl-ansible automates installation of Tendrl and helps with cluster expansion, following the description in the Tendrl wiki. Check the installation documentation there to get a basic understanding of the various machine roles in a Tendrl cluster before using tendrl-ansible.

How to get tendrl-ansible?

Just clone this repo:

$ git clone https://github.com/Tendrl/tendrl-ansible.git

or you can install the rpm package from the copr repo tendrl/release, which provides stable Tendrl upstream builds:

# yum copr enable tendrl/release
# yum install tendrl-ansible

See how to enable copr repository if you need more help with this step.

Note that installing tendrl-ansible from rpm package is highly recommended when you use stable builds from tendrl/release copr. Otherwise just cloning the repo is good enough.

Which version of ansible do I need?

Ansible >= 2.7 is required to use tendrl-ansible.
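A quick way to check this requirement is sketched below. The helper name `version_ge` is made up for this illustration and relies on `sort -V` from GNU coreutils. The parsing of `ansible --version` assumes that the first dotted number on the first output line is the version.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper for illustration; relies on "sort -V" (GNU coreutils).
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query ansible when it is actually installed.
if command -v ansible >/dev/null 2>&1; then
    # Assumption: the first dotted number on the first line is the version.
    installed=$(ansible --version | head -n 1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
    if version_ge "$installed" 2.7; then
        echo "ansible $installed is new enough"
    else
        echo "ansible $installed is too old, 2.7 or newer is required" >&2
    fi
fi
```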

What roles and playbooks are there?

This is only a brief overview. Each role has its own README file with more details, along with a list of the ansible variables you can use to tweak it.

Ansible roles for Tendrl:

  • tendrl-ansible.tendrl-copr: installs yum repositories with builds provided by Tendrl project, it uses stable tendrl/release copr by default
  • tendrl-ansible.tendrl-server: installation of Tendrl Server machine (where Tendrl api, web and etcd are running)
  • tendrl-ansible.tendrl-storage-node: installation of Tendrl Storage Node machines (Gluster servers, which you would like to monitor by Tendrl)

Roles installing yum repositories of Tendrl dependencies:

  • tendrl-ansible.grafana-repo: installs official upstream yum repository with latest stable Grafana release.

For convenience, there are also ansible roles for installation of yum repositories with upstream releases of Ceph, Gluster and their installation tools (such as gdeploy):

  • tendrl-ansible.gluster-gdeploy-copr

Playbook files:

  • prechecks.yml: playbook checking requirements before Tendrl installation (see comments inside the playbook file for references)
  • site.yml: main playbook of tendrl-ansible, which one will use to install Tendrl

Where are the roles and playbooks if I use rpm package?

Ansible roles are available in the /usr/share/ansible/roles/ directory, where the role directories are prefixed with tendrl-ansible., for example: /usr/share/ansible/roles/tendrl-ansible.tendrl-server. Each role has its own README.md file, where you can find all details about its usage.

Playbooks are available in /usr/share/doc/tendrl-ansible-1.6.0/ directory, where 1.6.0 is version of tendrl-ansible package.
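Because the directory name contains the package version, it can be easier to ask rpm for the playbook paths than to guess them. The sketch below filters a file list for the two playbooks. `find_playbooks` is a hypothetical helper name, and the usage line assumes the package was installed from rpm.

```shell
# find_playbooks: read a list of file paths on stdin and print only the
# tendrl-ansible playbooks. Hypothetical helper for illustration.
find_playbooks() {
    grep -E '/(site|prechecks)\.yml$'
}

# Usage on a machine with the rpm package installed
# (rpm -ql lists all files of an installed package):
#   rpm -ql tendrl-ansible | find_playbooks
```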

What should I know before using tendrl-ansible?

You need to know how to use ansible and how to deploy and use ssh public keys (to be able to connect via ssh without asking for password).

Moreover since this README file can't provide all details about Tendrl, you should read Tendrl installation documentation as well.

And last but not least, both tendrl-ansible.tendrl-server and tendrl-ansible.tendrl-storage-node roles contain many variables which one can use to tweak the installation. See README files of the roles for their description.

What installation steps from Tendrl installation documentation are not part of tendrl-ansible?

This should be clear from Tendrl installation documentation itself, but for the sake of convenience, here is the list of installation or deployment steps which are out of scope of tendrl-ansible:

  • Deployment and installation of machines (either virtual or bare metal), which includes setup of networking, partitioning of disks, deployment of ssh public keys and so on.
  • Installation and configuration of GlusterFS on the machines, see gdeploy for automation of this task.
  • Setup of dedicated disk for etcd and graphite data directories.
  • Setup of https for Tendrl web and api.
  • Deployment of tls certificates and keys for etcd tls based client server encryption and authentication (this means communication between various tendrl components and etcd instance).

How do I install Tendrl with tendrl-ansible?

  1. Install tendrl-ansible:

    # yum install tendrl-ansible
    

    See section "How to get tendrl-ansible?" in this README file for more details.

  2. Create Ansible inventory file with groups for tendrl_server and gluster_servers. Here is an example of inventory file for 4 node cluster with Gluster:

    [gluster_servers]
    gl1.example.com
    gl2.example.com
    gl3.example.com
    gl4.example.com
    
    [tendrl_server]
    tendrl.example.com
    
  3. Add mandatory ansible variables into the inventory file you created in the previous step.

    This includes:

    • etcd_ip_address configures where etcd instance is listening
    • etcd_fqdn configure tendrl components to be able to connect to etcd instance
    • graphite_fqdn configures tendrl components to be able to connect to graphite instance (this value doesn't reconfigure graphite itself!)

    For simple example cluster from previous step, assuming there is only single network interface on all machines, the code you need to add into the inventory file would look like:

    [all:vars]
    etcd_ip_address=192.0.2.1
    etcd_fqdn=tendrl.example.com
    graphite_fqdn=tendrl.example.com
    

    Where 192.0.2.1 is ip address of tendrl server, tendrl.example.com is a hostname of tendrl server and tendrl.example.com hostname is translated to 192.0.2.1 ip address.

    See full description in README file of tendrl-ansible.tendrl-server role and pay attention to the values you specify there when you use multiple network interfaces on the machines.

    Note: you can define these variables anywhere else you like (e.g. in variable files or directly on the command line), but including them in the inventory gives you a single file with an almost full description of the tendrl-ansible setup for future reference (e.g. rerunning tendrl-ansible later when you need to expand the cluster or make sure the configuration still holds). The only information not stored in the inventory file which you may need in the future is the grafana_admin_passwd file, which contains the grafana admin password generated during the tendrl-ansible run.

  4. Add optional ansible variables into the inventory file.

    Based on Tendrl documentation and description in README files of tendrl-ansible roles, specify values for variables you like to tweak.

    This is important because some features tendrl-ansible can help you with are disabled by default as they require additional user input.

    This includes etcd tls client authentication (etcd_tls_client_auth and other variables), tendrl notifier configuration for snmp or smtp (tendrl_notifier_email_id and other variables), and other tweaks (eg. tendrl_copr_repo variable of tendrl-ansible.tendrl-copr role).

    There are also features such as firewalld setup for Tendrl (variable configure_firewalld_for_tendrl) which are enabled by default, but can be disabled if needed.

  5. If you use tendrl-ansible from rpm package, copy site.yml playbook into working directory (where you already store the inventory file):

    $ cp /usr/share/doc/tendrl-ansible-VERSION/site.yml .
    

    Do the same for prechecks playbook:

    $ cp /usr/share/doc/tendrl-ansible-VERSION/prechecks.yml .
    
  6. Check that ssh can connect to all machines from the inventory file without asking for password or validation of public key by running:

    $ ansible -i inventory_file -m ping all
    

    Ansible should show a "pong" message for every machine. If there are any problems, fix them before going on. If you are not sure what's wrong, consult the documentation of ansible and/or ssh.

    The following example shows how to use ansible become feature when direct ssh login of root user is not allowed and you are connecting via non-root cloud-user account, which can leverage sudo to run any command as root without any password:

    $ ansible --become -u cloud-user -i inventory_file -m ping all
    

    If this is your case, consider converting the command line arguments related to the Ansible become feature into behavioral inventory parameters and adding them to the inventory file. This way, you don't need to specify these arguments again for every ansible command. An example of this update, matching the previous command line example, follows (it should be appended to the [all:vars] section):

    ansible_become=yes
    ansible_user=cloud-user
    

    After this edit, you can rerun the ping example without the become command line arguments:

    $ ansible -i inventory_file -m ping all
    
  7. Now you can run the prechecks playbook to verify that the minimal requirements and setup for Tendrl are satisfied. Any problem found by the prechecks makes the playbook run fail immediately, pointing you to the particular requirement or configuration problem before the installation itself (saving you from unnecessary debugging after installation).

    For production deployment, run the full check:

    $ ansible-playbook -i inventory_file prechecks.yml
    

    For proof of concept deployments, you can leave out the stringent production requirement checks by skipping the production tag:

    $ ansible-playbook -i inventory_file prechecks.yml --skip-tags "production"
    

    If you are not sure why a particular check is there or what is checked exactly, open the playbook file and see comments and/or implementation of the check.

  8. Then we are ready to run ansible to install Tendrl:

    $ ansible-playbook -i inventory_file site.yml
    

    This assumes that ssh keys are deployed on the machines and that a Gluster trusted storage pool is already installed and running there.

  9. Log in to your tendrl server at http://tendrl.example.com (hostname of Tendrl server as specified in the inventory file in step #2) with username admin and default password adminuser.

    Note that tendrl-ansible.tendrl-server role includes setup of admin user account for Tendrl (usable with both api and web interface), and that default password is adminuser. Moreover the admin password is also stored on Tendrl Server machine in /root/password file (this feature of tendrl-ansible is based on TEN-257).
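For reference, the inventory pieces from steps 2, 3 and 6 can be combined into a single file. The sketch below writes such a file with the example host names and addresses used above. The become parameters are only needed when you connect as the non-root cloud-user account, and the final awk command is just an illustrative sanity check that prints the members of the gluster_servers group.

```shell
# Write the example inventory from steps 2, 3 and 6 into one file.
cat > inventory_file <<'EOF'
[gluster_servers]
gl1.example.com
gl2.example.com
gl3.example.com
gl4.example.com

[tendrl_server]
tendrl.example.com

[all:vars]
# mandatory variables (step 3)
etcd_ip_address=192.0.2.1
etcd_fqdn=tendrl.example.com
graphite_fqdn=tendrl.example.com
# behavioral parameters (step 6), only for login via non-root cloud-user
ansible_become=yes
ansible_user=cloud-user
EOF

# Print the host lines of the gluster_servers group as a quick sanity check.
awk '/^\[/ { in_group = ($0 == "[gluster_servers]"); next }
     in_group && NF && $0 !~ /^#/ { print }' inventory_file
```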

How do I expand cluster with tendrl-ansible?

See Tendrl wiki for full details of cluster expansion procedure. This section contains only brief overview of the expand operation for you to understand how tendrl-ansible fits into Tendrl cluster expand operation.

  1. First of all, you need to install the operating system and Gluster on the new server(s), add them to the existing cluster (aka Gluster Trusted Storage Pool) via peer probe, and add bricks on the new server(s) to existing gluster volume(s) based on your needs.

  2. When Gluster is aware of new servers (you see them in output of gluster pool list command), you add the new servers into ansible inventory file (into group gluster_servers) which you used during installation of Tendrl.

    Note that it's important to add new servers into the same inventory file as was used during installation, because you need to ensure that you are using the same set of ansible variables. For the same reason, you need to have the lookup file with the grafana admin password, grafana_admin_passwd, available in the current directory.

  3. Then, you rerun ansible playbook in the same way as done during Tendrl installation:

    $ ansible-playbook -i inventory_file site.yml
    

    During this run, ansible should report "ok" status for already existing machines, while reporting "changed" status for the new machines you just added.

  4. Now, you should be able to see new servers in Tendrl web ui (see Tendrl documentation for details).
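Steps 2 and 3 above can be sketched in shell as follows. Both helper names are hypothetical and only illustrate the idea: the first one adds a host to the gluster_servers group of an INI inventory, the second one checks that the grafana_admin_passwd lookup file from the original installation is present in the current directory.

```shell
# add_gluster_server INVENTORY HOST: add HOST to the [gluster_servers] group
# of an INI inventory, unless the host is already listed in the file.
# Hypothetical helper for illustration only.
add_gluster_server() {
    inventory=$1
    new_host=$2
    if grep -qxF "$new_host" "$inventory"; then
        return 0
    fi
    awk -v host="$new_host" '
        { print }
        $0 == "[gluster_servers]" { print host }
    ' "$inventory" > "$inventory.tmp" && mv "$inventory.tmp" "$inventory"
}

# check_expand_prereqs: fail when the grafana password lookup file from the
# original installation is missing in the current directory.
check_expand_prereqs() {
    if [ ! -f grafana_admin_passwd ]; then
        echo "grafana_admin_passwd not found in $(pwd)" >&2
        return 1
    fi
}

# Usage:
#   add_gluster_server inventory_file gl5.example.com
#   check_expand_prereqs && ansible-playbook -i inventory_file site.yml
```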

Does tendrl-ansible use some ansible tags?

Yes, tendrl-ansible uses ansible tags as listed below.

The purpose of these tags is to make debugging after installation easier by letting you quickly run a particular type of task without rerunning the whole tendrl-ansible playbook.

  • Tags service-enabled and service-started allow one to run just the ansible tasks which enable (or start) all services which Tendrl consists of. This is useful for checking that all services are running as expected.
  • Tag firewalld allows one to run firewalld setup only, making sure that all ports are enabled. Note that the tag doesn't override ansible variable configure_firewalld_for_tendrl, and if you have set it to False, all firewalld tasks will be skipped.
  • All yum tasks are tagged with rpm-installation. This is useful for testing purposes only and there is no reason to use it in production.

Example: The following command will check that all ports are open via firewalld after installation of Tendrl. If all tasks are reported as "ok", the ports have already been opened as expected.

$ ansible-playbook -i inventory_file site.yml --tags firewalld
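The same pattern works for the other tags. The wrapper below is a small sketch with a made up name, `run_tagged`. It only shortens the command line and runs nothing until it is called.

```shell
# run_tagged INVENTORY TAGS: run only the site.yml tasks carrying the given
# comma separated tags. Hypothetical wrapper for illustration.
run_tagged() {
    ansible-playbook -i "$1" site.yml --tags "$2"
}

# Usage:
#   run_tagged inventory_file service-enabled,service-started
# To see which tags a playbook defines:
#   ansible-playbook -i inventory_file site.yml --list-tags
```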

License

Distributed under the terms of the GNU LGPL, version 2.1 license, tendrl-ansible is free and open source software.

tendrl-ansible's People

Contributors

brainfunked, dahorak, ebondare, gowthamshanmugam, japplewhite, julienlim, mbukatov, nthomas-redhat, r0h4n, timothyasirjeyasing


tendrl-ansible's Issues

Need ansible playbook for grafana and monitoring-integration

Add an ansible playbook for installing and configuring grafana and monitoring-integration.
This installation should 1) do the basic configuration based on the default values and
the given variables, and 2) start the grafana-server service and monitoring-integration.

Implement SNMP configuration for tendrl-notifier

Rohan introduced description of SNMP configuration into Tendrl release latest installation guide in commit 67c966d57bbaade16b207a990b83c91e1422f433.

Supported versions of snmp protocol: v2 and v3.

Related Information:

Known Issues:

Example of default configuration file (from tendrl-notifier-1.5.3-10_12_2017_00_40_04.noarch.rpm):

$ cat snmp.conf.yaml 
# V2 end point details
# Enable the below lines if v2 end point to be used for SNMP notifications
# Note: For different communities make different similar entries below additionally
# Here default community is public
# V2 is auth less 

# v2_endpoint:
#  endpoint1:
#    host_ip: 127.0.0.1
#    community: public
#  endpoint2:
#    host_ip: 127.0.0.1
#    community: public


# V3 end point details
# V3 needs authorization, Need to provide md5 password and des password
# Enable the below lines if v3 end point to be used for SNMP notifications

# v3_endpoint:
#  endpoint1:
#    host_ip: 127.0.0.1
#    username: admin
#    auth_key: mymd5pass
#    priv_key: mydespass
#  endpoint2:
#    host_ip: 127.0.0.1
#    username: user
#    auth_key: mymd5pass
#    priv_key: mydespass

tendrl-ansible playbook fails

Hi,

Tried setting up tendrl-ansible with two nodes.

ansible-playbook of site.yml succeeded.

But when I access the tendrl server from a browser, I see the following errors after import.

error Failure in Job 74d8f944-53a9-4d14-84cb-bbf44c8d764a Flow tendrl.flows.ImportCluster with error: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/init.py", line 218, in process_job the_flow.run() File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/init.py", line 103, in run raise ex AtomExecutionFailedError: Atom Execution failed. Error: Error executing post run function: tendrl.objects.Cluster.atoms.ConfigureMonitoring 20 Nov 2017 01:27:19

error Failed post-run: tendrl.objects.Cluster.atoms.ConfigureMonitoring for flow: Import existing Gluster Cluster 20 Nov 2017 01:27:17

error Could not find atom tendrl.objects.Cluster.atoms.ConfigureMonitoring 20 Nov 2017 01:27:15

Could you check?

Remove installation of tendrl-alerting

Based on:

installation of tendrl-alerting component should be removed from tendrl-server role.

@nthomas-redhat verified this conclusion today on gitter:

@mbukatov , In milestone 1 , alerting is not a supported functionality. We are planning to introduce that in milestone-2 and Tendrl/specifications#190 has details.

Relicense from Apache 2.0 to LGPL 2.1

As pointed out by @r0h4n , we need to align license of tendrl-ansible with the rest of Tendrl project, which means relicensing to LGPL 2.1.

Approval for this is needed from all contributors:

If you find yourself in this list, please add "+1" reaction or add a comment with "I agree" text below. I will monitor the approvals, update the list and when all contributors agree, will go on and change the license.

Configuration of cert based etcd auth

Based on poor performance of etcd password authentication, we are going to use authentication based on certificates instead.

The Setup

  • issue cert for each storage node and a tendrl server (we could go further and issue cert for each service, but that would be even more work) (outside of tendrl-ansible)
  • deploy the cert files on all machines (outside of tendrl-ansible)
  • reconfigure etcd for client auth
  • update config file of all tendrl services (parametrized via variable)
  • restart all tendrl and etcd services

Details

@r0h4n describes the setup in our wiki: https://github.com/Tendrl/documentation/wiki/Tendrl-with-a-secure-etcd-cluster

Possible missing details in the description:

  • ETCD_CLIENT_CERT_AUTH="true"
  • setup for tendrl-api

See Also

Related Issues

  • Previous tendrl-ansible issue for password auth, which is being replaced by this cert based approach: #34
  • Tendrl/commons#721 (this issue references the wikipage linked above)
  • Tendrl/api#262

Blocking Issues

These issues block the merge of this into master:

Generating random password and setting it in grafana.ini and monitoring-integration.conf.yaml

Tendrl ansible is supposed to generate a random password for grafana admin, which is then supposed to be inserted into grafana.ini and monitoring-integration.conf.yaml

In monitoring-integration it is supposed to be inserted under credentials under password
https://github.com/Tendrl/monitoring-integration/blob/master/etc/tendrl/monitoring-integration/monitoring-integration.conf.yaml.sample#L27

In Grafana.ini
It should be set under admin_password
https://github.com/Tendrl/monitoring-integration/blob/master/etc/grafana/grafana.ini#L146
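The idea can be sketched in shell as shown below. This is an illustration under assumptions, not the way tendrl-ansible implements it: the password length of 20 characters, the helper name `set_grafana_admin_password` and the assumption that grafana.ini contains an `admin_password = ...` line (possibly commented out with `;`) are all made up for the example.

```shell
# Create the password file only once, so that later runs reuse the same value.
if [ ! -s grafana_admin_passwd ]; then
    # 20 random alphanumeric characters; the subshell keeps the umask change local.
    (
        umask 077
        head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c1-20 > grafana_admin_passwd
    )
fi

# set_grafana_admin_password INI PASSWORD: set admin_password in the given
# grafana.ini file, uncommenting the line when it starts with ";".
# Hypothetical helper; PASSWORD must not contain "|" or "&".
set_grafana_admin_password() {
    ini=$1
    sed "s|^;\{0,1\}admin_password *=.*|admin_password = $2|" "$ini" > "$ini.new" \
        && cat "$ini.new" > "$ini" && rm -f "$ini.new"
}

# Usage:
#   set_grafana_admin_password /etc/grafana/grafana.ini "$(cat grafana_admin_passwd)"
```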

Cover SELinux configuration

Based on SELinux section in https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.2-(install-guide), update tendrl-ansible (most likely add tendrl-selinux role).

Specification

Actual Implementation

SELinux policies are maintained in selinux directories of these 2 repositories:

So that there are the following SELinux packages:

In tendrl-api spec file:

carbon-selinux-1.5.2-20170927T205654.a9e16c0.noarch.rpm
tendrl-grafana-selinux-1.5.2-20170927T205654.a9e16c0.noarch.rpm
tendrl-server-selinux-1.5.2-20170927T205654.a9e16c0.noarch.rpm

In tendrl-gluster-integration spec file:

tendrl-collectd-selinux-1.5.2-20171002T062306.f428920.noarch.rpm
tendrl-node-selinux-1.5.2-20171002T062306.f428920.noarch.rpm

Questions to figure out

  • Why do we have a tendrl-collectd-selinux rpm and a carbon-selinux rpm? Having the tendrl- prefix is a valid approach, as long as the tendrl-collectd policy overrides the collectd policy from the selinux-policy package. Having no prefix could work as long as the policy is not covered in the selinux-policy package.

  • Is it ok to maintain SELinux policies in 2 unrelated repositories?
    No, this is not a good idea. While it would be ok to attach the selinux policy to one of the already established repositories (as written in the specification), spreading it into 2 repositories and covering 3rd party components is not, as it unnecessarily increases the work required for maintenance.

  • Is it ok for specfile for gluster-integration or api to contain selinux policies of 3rd party packages? No, it's not ok.

  • There is a conflict between node and server package (see below). Is this ok?

    # rpm -ql -p tendrl-node-selinux-1.5.2-20171002T062306.f428920.noarch.rpm 
    /usr/share/selinux/packages/tendrl.pp.bz2
    # rpm -ql -p tendrl-server-selinux-1.5.2-20170927T205654.a9e16c0.noarch.rpm 
    /usr/share/selinux/packages/tendrl.pp.bz2
    

    No, this is not ok.

  • What should the default mode of the tendrl domains be? It should be permissive.

Related Issues and Details

Blocker Issues

While I'm able to work on SELinux setup outright, I'm not going to merge the changes without these issues being addressed:

Status update: I'm going to fix these issues during work on moving selinux code into single repository tendrl-selinux.

Upstream Guidelines

https://fedoraproject.org/wiki/SELinux/IndependentPolicy

Update prechecks playbook to match new requirements

Since the hw requirements (wrt cpu and memory) in the installation guide were updated, we need to update the prechecks playbook to match what is stated there.

I don't link to the particular values, as the next release 1.5.5 is not currently finalized and numbers could change.

Replacing grafana.ini file path in /etc/sysconfig/grafana-server

With reference to Tendrl/monitoring-integration#60, there is a need for an ansible script to replace the path of the grafana.ini file in the /etc/sysconfig/grafana-server file.
The paths to all configuration files are defined in the grafana-server file https://github.com/grafana/grafana/blob/master/packaging/rpm/sysconfig/grafana-server
When grafana is installed, this file is copied to /etc/sysconfig/grafana-server.

Grafana is installed when monitoring-integration is installed, so this change needs to be made after monitoring-integration is installed and before grafana-server.service is started

The new path in /etc/sysconfig/grafana-server will be:

CONF_DIR=/etc/tendrl/monitoring-integration/grafana/
CONF_FILE=/etc/tendrl/monitoring-integration/grafana/grafana.ini 
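The requested replacement can be sketched in shell as follows. The helper name `repoint_grafana_conf` is made up for this illustration, and the sketch assumes that the file contains plain `CONF_DIR=...` and `CONF_FILE=...` lines. In tendrl-ansible the same change would be expressed as ansible tasks.

```shell
# repoint_grafana_conf FILE: rewrite CONF_DIR and CONF_FILE in a sysconfig
# style file so that grafana reads its configuration from the
# monitoring-integration directory. Hypothetical helper for illustration.
repoint_grafana_conf() {
    sysconfig=$1
    sed -e 's|^CONF_DIR=.*|CONF_DIR=/etc/tendrl/monitoring-integration/grafana/|' \
        -e 's|^CONF_FILE=.*|CONF_FILE=/etc/tendrl/monitoring-integration/grafana/grafana.ini|' \
        "$sysconfig" > "$sysconfig.new" \
        && cat "$sysconfig.new" > "$sysconfig" && rm -f "$sysconfig.new"
}

# Usage (as root, after monitoring-integration is installed and before
# grafana-server.service is started):
#   repoint_grafana_conf /etc/sysconfig/grafana-server
```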

Add and enable minimal CI checks

Add checks for:

  • yaml file validation
  • something else?

and configure and enable Travis CI job to run it for every pull request.

Can't import ceph cluster

Hello.
First of all, this role fails, demanding graphite_ip_address (it works if you set it manually in the ceph role).
Secondly, I can't import a ceph cluster for some reason (it detects the hosts and their roles, but I think it tries to communicate with them as gluster nodes) and I get timeouts all the time.
I noticed dev packages in the fedora repo, which have a reworked ui and support ceph. Is there a guide somewhere on how to install that version?

Change etcd password auth setup so that it's disabled by default

Based on known problem with etcd password auth and statement in install guide:

Enabling etcd user/password auth is optional, release testing shows increased CPU usage of Etcd processes when auth is enabled (WARNING)

We should change tendrl-ansible so that the feature is disabled by default in this milestone.

This is slated to be removed completely for the next milestone.

Update tendrl-ansible for Tendrl Milestone #1 (3rd milestone)

Tendrl ansible needs to be updated for Gluster Monitoring milestone

Related specifications (only specifications originally listed in previous milestone #2 are mentioned there, full review is needed):

Add vagrantfile and ansible playbook for setting up POC demo cluster

We need to add vagrant code for setup of POC (proof of concept) demonstration cluster.

This includes:

  • vagrantfile with definition of virtual machines for the demo cluster
  • special ansible playbook tweaked for vagrant demo cluster use case
  • make sure it works on Fedora with Vagrant on top of libvirt
  • make sure it works on Mac OS X with Vagrant on top of VirtualBox or VMware (or whatever hypervisor available there, checking one is good enough, volunteer needed for this)

Having this work on Fedora with libvirt has higher priority over Mac OS X with proprietary hypervisors.

tendrl-ansible should meet requirements from TEN-257

This issue covers all requirements from TEN-257, with exception of pre checks (there is a separate issue for that).

  • The system will install and configure the core Tendrl components according to the upstream RPM documentation
  • The playbook should print out that the API admin password has been placed in /root/password on the main tendrl node and the permissions should be 700 on that file. It should not echo out the admin password (so that logging could not capture it).
  • The resulting Tendrl server(s) & performance monitoring server(s) will be able to manage both Ceph and Gluster nodes.
  • All roles should be able to be converged on one node (but not storage clusters themselves)
  • If a storage node is already running ceph or gluster, the resulting node agent on that node will be setup as per the upstream docs for node-agent (the Tendrl system should handle the rest)
  • (Nice to have - not required) The playbook should have tags for the various aspects it does: --tags "precheck,configuration,packages" so that they can be skipped or run alone as desired (example use case: only run precheck, or re-run configuration)
    • tag rpm-installation (but this is useful for testing or development only, no production usage expected - testing that one can install all packages from current repositories)
    • tag service-started and service-enabled (making sure all services are enabled or running as expected)
    • tag firewalld for firewalld related tasks
    • tag configuration is not a good idea, to reconfigure tendrl cluster, update playbook and run it again
    • tag for prechecks is not needed, as all the checks are in a separate playbook file
    • another option is to have tags for particular components, eg. tendrl-ui (again, useful mainly for testing/development)
  • The system runs these basic pre-checks on all nodes to ensure best practices are followed: #7

Items which are struck out are no longer possible to implement in tendrl-ansible as originally intended because of recent changes in Tendrl itself.

True/False supplied from command line (as --extra-vars..) are not properly considered as boolean values

Supplying True/False values from command line is not possible, because they are considered as string (ansible/ansible#14329).

It might be useful to use the |bool filter in places where booleans are expected (for example https://github.com/Tendrl/tendrl-ansible/blob/master/roles/tendrl-server/tasks/etcd.yml#L11 and https://github.com/Tendrl/tendrl-ansible/blob/master/roles/tendrl-server/tasks/etcd.yml#L15).
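Until the roles apply |bool, a possible workaround on the caller side is to pass the extra vars as JSON, because Ansible keeps native types for JSON extra vars while key=value pairs arrive as strings. The sketch below reuses the variables from this report; the wrapper name `run_with_typed_vars` is made up for the illustration.

```shell
# Extra vars in JSON form: true stays a boolean instead of the string "True".
extra_vars='{"tendrl_repo": "master", "etcd_tls_client_auth": true}'

# run_with_typed_vars INVENTORY PLAYBOOK: run the playbook with the JSON
# extra vars defined above. Hypothetical wrapper for illustration.
run_with_typed_vars() {
    ansible-playbook -i "$1" "$2" -e "$extra_vars"
}

# Usage:
#   run_with_typed_vars inventory_file site.yml
```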

Actual result (related to the two lines mentioned above), etcd_tls_client_auth=True is not considered properly:

$ ansible-playbook -i /var/lib/jenkins/usm/ci-usm3.hosts -vvv /var/lib/jenkins/workspace/tendrl-2-ci-usm3-cluster-install/usmqe-setup/tendrl.yml -e 'tendrl_repo=master  etcd_tls_client_auth=True'

<<truncated>>

TASK [tendrl-ansible.tendrl-server : Use http as protocol in etcd urls] ***************************************************************************************************************************
task path: /var/lib/jenkins/workspace/tendrl-2-ci-usm3-cluster-install/tendrl-ansible-tmp/usr/share/ansible/roles/tendrl-ansible.tendrl-server/tasks/etcd.yml:11
Monday 16 October 2017  12:11:03 +0000 (0:00:00.040)       0:01:03.440 ******** 
skipping: [ci-usm3-server.usmqe.lab.eng.brq.redhat.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}

TASK [tendrl-ansible.tendrl-server : Use http as protocol in etcd urls] ***************************************************************************************************************************
task path: /var/lib/jenkins/workspace/tendrl-2-ci-usm3-cluster-install/tendrl-ansible-tmp/usr/share/ansible/roles/tendrl-ansible.tendrl-server/tasks/etcd.yml:15
Monday 16 October 2017  12:11:03 +0000 (0:00:00.029)       0:01:03.470 ******** 
skipping: [ci-usm3-server.usmqe.lab.eng.brq.redhat.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}

TASK [tendrl-ansible.tendrl-server : Configure etcd.conf ETCD_LISTEN_CLIENT_URLS] ************************************************************************************************************************************************************
task path: /var/lib/jenkins/workspace/tendrl-2-ci-usm3-cluster-install/tendrl-ansible-tmp/usr/share/ansible/roles/tendrl-ansible.tendrl-server/tasks/etcd.yml:19
Monday 16 October 2017  12:11:03 +0000 (0:00:00.030)       0:01:03.501 ******** 
fatal: [ci-usm3-server.usmqe.lab.eng.brq.redhat.com]: FAILED! => {
    "failed": true, 
    "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'etcd_url_protocol' is undefined\n\nThe error appears to have been in '/var/lib/jenkins/workspace/tendrl-2-ci-usm3-cluster-install/tendrl-ansible-tmp/usr/share/ansible/roles/tendrl-ansible.tendrl-server/tasks/etcd.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Configure etcd.conf ETCD_LISTEN_CLIENT_URLS\n  ^ here\n"
}
	to retry, use: --limit @/var/lib/jenkins/workspace/tendrl-2-ci-usm3-cluster-install/usmqe-setup/tendrl.retry

Service tendrl-api is down after running ci_default from usmqe-setup

After installing Tendrl via ci_default playbook from usmqe-setup, which reuses tendrl roles from tendrl-ansible for tendrl installation, I see that tendrl-api service is down.

I'm not sure which component is actually responsible for this problem, so I'm creating the issue here to start tracking it somewhere and figure out the actual culprit along the way.

Details

When the playbook finishes, the tendrl-api service is down and in the journal logs I see:

Jun 28 14:42:43 mbukatov-usm1-server.example.com systemd[1]: Started Tendrl Api Daemon.
Jun 28 14:42:43 mbukatov-usm1-server.example.com systemd[1]: Starting Tendrl Api Daemon...
Jun 28 14:42:43 mbukatov-usm1-server.example.com puma[5531]: Puma starting in single mode...
Jun 28 14:42:43 mbukatov-usm1-server.example.com puma[5531]: * Version 3.6.0 (ruby 2.0.0-p648), codename: Sleepy Sunday Serenity
Jun 28 14:42:43 mbukatov-usm1-server.example.com puma[5531]: * Min threads: 0, max threads: 16
Jun 28 14:42:43 mbukatov-usm1-server.example.com puma[5531]: * Environment: production
Jun 28 14:42:43 mbukatov-usm1-server.example.com puma[5531]: * Listening on tcp://0.0.0.0:9292
Jun 28 14:42:43 mbukatov-usm1-server.example.com puma[5531]: Use Ctrl-C to stop
Jun 28 14:42:45 mbukatov-usm1-server.example.com puma[5531]: - Gracefully stopping, waiting for requests to finish
Jun 28 14:42:45 mbukatov-usm1-server.example.com puma[5531]: === puma shutdown: 2017-06-28 14:42:45 +0000 ===
Jun 28 14:42:45 mbukatov-usm1-server.example.com puma[5531]: - Goodbye!
Jun 28 14:42:45 mbukatov-usm1-server.example.com systemd[1]: Stopping Tendrl Api Daemon...
Jun 28 14:42:45 mbukatov-usm1-server.example.com systemd[1]: Stopped Tendrl Api Daemon.

Looking into audit log shows that something actually just stopped it:

type=SERVICE_START msg=audit(1498660963.454:1347): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=tendrl-api comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=SERVICE_STOP msg=audit(1498660965.568:1415): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=tendrl-api comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

But it doesn't match what ansible is supposed to be doing:

TASK [tendrl-server : Install tendrl-api] **************************************
changed: [mbukatov-usm1-server.example.com]                    
                                                                                
TASK [tendrl-server : Configure tendrl-api etcd.yml host] **********************
changed: [mbukatov-usm1-server.example.com]                    
                                                                                
TASK [tendrl-server : Configure tendrl-api etcd.yml user_name] *****************
changed: [mbukatov-usm1-server.example.com]                    
                                                                                
TASK [tendrl-server : Configure tendrl-api etcd.yml password] ******************
changed: [mbukatov-usm1-server.example.com]                    
                                                                                
TASK [tendrl-server : Enable tendrl-api service] *******************************
changed: [mbukatov-usm1-server.example.com]                    
                                                                                
TASK [tendrl-server : Start tendrl-api service] ********************************
ok: [mbukatov-usm1-server.example.com]                         
                                                                                
RUNNING HANDLER [tendrl-server : restart tendrl-api] ***************************
changed: [mbukatov-usm1-server.example.com]  

Pre check playbook should conform to requirements from TEN-257

There is an initial version of the prechecks.yml playbook, which automates a set of checks expected to be run before installing Tendrl. The requirements are described in TEN-257.

This issue is tracking full compliance wrt TEN-257:

  • Checks for duplicate machine IDs in storage nodes and fails with error if found (027c754)
  • Checks for lack of NTP enablement and enables if missing (18e5693)
  • Checks for lack of proper domain name setup and warns (b017203)
  • Checks for lack of DNS setup and warns (b017203)
  • Checks for adequate base OS version (Centos/RHEL 7.3 or higher) and fails with warning if missing
  • Checks for adequate memory and disk (for Tendrl node only) and proceeds with warning if inadequate:
    • based on information from Tendrl-release-v1.5.0-(install-doc).md file
    • this check will require updates when the hw requirements are changed or specified in more details
    • the check is concerned with tendrl-server only as no details are directly given for other nodes, moreover given the current scope when preexisting Gluster Storage cluster is just imported, hw check of gluster storage machines is out of scope anyway

I decided against implementing (requirements listed below won't be added into pre check playbook):

  • Checks for lack of proper host name and /etc/hosts setup and warns (must have a primary IP & host name in /etc/hosts)
  • setup of ntp based time synchronization (when ntp sync is detected to be unconfigured)

Other changes compared to original requirements:

  • The playbook run will just immediately fail when a problem is detected (via assert module), there are no checks that would give a warning only. The reason is that every problem needs a manual intervention and ansible is not suited for reporting warnings.
  • This playbook is not going to change anything on the machines, there should be no side effects.
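The assert-based approach described above can be sketched as follows. This is a minimal illustration, not the actual prechecks.yml: the group name `tendrl_server` and the variable `min_memory_mb` (including its value) are assumptions made for the example, and the real threshold should be taken from the Tendrl install documentation.

```yaml
---
# Hypothetical sketch of fail-fast pre-checks; it only reads facts and
# changes nothing on the machines.
- hosts: tendrl_server        # assumed inventory group name
  gather_facts: true
  vars:
    # Placeholder value, not taken from the Tendrl documentation.
    min_memory_mb: 4096
  tasks:
    - name: Check that the base OS is CentOS/RHEL 7.3 or higher
      assert:
        that:
          - ansible_distribution in ['CentOS', 'RedHat']
          - ansible_distribution_version is version('7.3', '>=')
        msg: "CentOS/RHEL 7.3 or higher is required"

    - name: Check that the Tendrl server has enough memory
      assert:
        that:
          - ansible_memtotal_mb >= min_memory_mb
        msg: "At least {{ min_memory_mb }} MB of memory is expected"
```

Because every check is an `assert`, the play stops at the first failed condition, which matches the decision to fail immediately instead of reporting warnings.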

Remove password based etcd authentication

Since using password based etcd authentication mode makes the whole Tendrl stack extremely slow, with etcd itself consuming about 95% of cpu processing power, changes to the etcd auth configuration need to be made across the Tendrl stack.

Related issues concerned with etcd password auth:

Since the root cause seems to be etcd issue: etcd-io/etcd#3223 (comment), the etcd auth should be removed. This is blocked by: Tendrl/api#294

This basically boils down to reverting changes introduced in #34

bug: fix validation of gpg signatures of storage packages

Installation of ceph storage packages fails during the "Installing Ceph Packages on OSDS" ceph-installer task (which is invoked by tendrl):

TASK [ceph.ceph-common : install distro or red hat storage ceph osd] ***********
fatal: [10.34.108.62]: FAILED! => {
    "changed": true, 
    "failed": true, 
    "rc": 1, 
    "results": [
    .... tons of lines skipped ...
    Package userspace-rcu-0.7.16-3.el7.x86_64.rpm is not signed

Affected version: f90a672

While Tendrl Package Installation Reference states:

Make sure to disable the gpgcheck if the keys are not setup properly.

I didn't disable any additional gpgcheck feature for any ceph repo explicitly as I was not aware of any problems. My bad.

That said, we are not going to disable gpg check of any repositories until we understand what is wrong exactly.

tendrl-node-monitoring restart fails

ansible-playbook -i gce-hosts -u ajeffrey -b site.yml --limit @site.retry

RUNNING HANDLER [tendrl-storage-node : restart tendrl-node-monitoring] *********
fatal: [stg01]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service tendrl-node-monitoring: Job for tendrl-node-monitoring.service failed because start of the service was attempted too often. See "systemctl status tendrl-node-monitoring.service" and "journalctl -xe" for details.\nTo force a start use "systemctl reset-failed tendrl-node-monitoring.service" followed by "systemctl start tendrl-node-monitoring.service" again.\n"}

Task Install tendrl-apid dependency errors

I'm running tendrl-ansible to install tendrl-server on a dedicated VM and tendrl-node on three glusterfs storage servers. Unfortunately I'm getting dependency errors installing the tendrl-apid piece.
The task in question is Install tendrl-apid in install-tendrl.yml
The ansible error output is:

fatal: [ose-tendrl-001]: FAILED! => {"changed": true, "failed": true, "msg": "Error: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)\n           Requires: rubygem-bundler >= 1.13.6\n           Installed: rubygem-bundler-1.7.8-3.el7.noarch (@base)\n               rubygem-bundler = 1.7.8-3.el7\nError: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)\n           Requires: rubygem-mixlib-log >= 1.7.1\nError: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)\n           Requires: rubygem-etcd\nError: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)\n           Requires: rubygem-sinatra >= 1.4.5\nError: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)\n           Requires: rubygem-tilt >= 1.4.1\nError: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)\n           Requires: rubygem-puma >= 3.6.0\nError: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)\n           Requires: rubygem-minitest >= 5.9.1\n           Available: rubygem-minitest-4.3.2-29.el7.noarch (base)\n               rubygem-minitest = 4.3.2-29.el7\n", "rc": 1, "results": ["Loaded plugins: fastestmirror\nLoading mirror speeds from cached hostfile\n * base: centos.serverspace.co.uk\n * epel: mirrors.coreix.net\n * extras: mirrors.vooservers.com\n * updates: centos.serverspace.co.uk\nResolving Dependencies\n--> Running transaction check\n---> Package tendrl-api.noarch 0:1.2.3-05_02_2017_19_43_13 will be installed\n--> Processing Dependency: rubygem-activemodel >= 4.2.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-activesupport >= 4.2.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-bcrypt >= 3.1.10 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-bundler >= 1.13.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing 
Dependency: rubygem-i18n >= 0.7.0 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-minitest >= 5.9.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-mixlib-log >= 1.7.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-puma >= 3.6.0 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-rack >= 1.6.4 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-rack-protection >= 1.5.3 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-rake >= 0.9.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-sinatra >= 1.4.5 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-thread_safe >= 0.3.5 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-tilt >= 1.4.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-tzinfo >= 1.2.2 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-etcd for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Running transaction check\n---> Package rubygem-activemodel.noarch 0:4.2.6-3.el7 will be installed\n--> Processing Dependency: rubygem(builder) for package: rubygem-activemodel-4.2.6-3.el7.noarch\n---> Package rubygem-activesupport.noarch 1:4.2.6-1.el7 will be installed\n---> Package rubygem-bcrypt.x86_64 0:3.1.10-6.el7 will be installed\n---> Package rubygem-i18n.noarch 0:0.7.0-3.el7 will be installed\n---> Package rubygem-rack.noarch 1:1.6.4-2.el7 will be installed\n---> Package rubygem-rack-protection.noarch 0:1.5.3-3.el7 will be installed\n---> Package rubygem-rake.noarch 0:0.9.6-29.el7 will be installed\n---> Package rubygem-thread_safe.noarch 0:0.3.5-2.el7 will be installed\n---> Package 
rubygem-tzinfo.noarch 0:1.2.2-3.el7 will be installed\n---> Package tendrl-api.noarch 0:1.2.3-05_02_2017_19_43_13 will be installed\n--> Processing Dependency: rubygem-bundler >= 1.13.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-minitest >= 5.9.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-mixlib-log >= 1.7.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-puma >= 3.6.0 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-sinatra >= 1.4.5 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-tilt >= 1.4.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-etcd for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Running transaction check\n---> Package rubygem-builder.noarch 0:3.1.4-3.el7 will be installed\n---> Package tendrl-api.noarch 0:1.2.3-05_02_2017_19_43_13 will be installed\n--> Processing Dependency: rubygem-bundler >= 1.13.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-minitest >= 5.9.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-mixlib-log >= 1.7.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-puma >= 3.6.0 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-sinatra >= 1.4.5 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-tilt >= 1.4.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Processing Dependency: rubygem-etcd for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch\n--> Finished Dependency Resolution\n You could try using --skip-broken to work around the problem\n You could try running: rpm -Va --nofiles --nodigest\n"]}

A manual attempt at installing the RPM gives an equivalent:

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.serverspace.co.uk
 * epel: mirrors.coreix.net
 * extras: mirrors.vooservers.com
 * updates: centos.serverspace.co.uk
Resolving Dependencies
--> Running transaction check
---> Package tendrl-api.noarch 0:1.2.3-05_02_2017_19_43_13 will be installed
--> Processing Dependency: rubygem-activemodel >= 4.2.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-activesupport >= 4.2.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-bcrypt >= 3.1.10 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-bundler >= 1.13.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-i18n >= 0.7.0 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-minitest >= 5.9.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-mixlib-log >= 1.7.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-puma >= 3.6.0 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-rack >= 1.6.4 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-rack-protection >= 1.5.3 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-rake >= 0.9.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-sinatra >= 1.4.5 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-thread_safe >= 0.3.5 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-tilt >= 1.4.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-tzinfo >= 1.2.2 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-etcd for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Running transaction check
---> Package rubygem-activemodel.noarch 0:4.2.6-3.el7 will be installed
--> Processing Dependency: rubygem(builder) for package: rubygem-activemodel-4.2.6-3.el7.noarch
---> Package rubygem-activesupport.noarch 1:4.2.6-1.el7 will be installed
---> Package rubygem-bcrypt.x86_64 0:3.1.10-6.el7 will be installed
---> Package rubygem-i18n.noarch 0:0.7.0-3.el7 will be installed
---> Package rubygem-rack.noarch 1:1.6.4-2.el7 will be installed
---> Package rubygem-rack-protection.noarch 0:1.5.3-3.el7 will be installed
---> Package rubygem-rake.noarch 0:0.9.6-29.el7 will be installed
---> Package rubygem-thread_safe.noarch 0:0.3.5-2.el7 will be installed
---> Package rubygem-tzinfo.noarch 0:1.2.2-3.el7 will be installed
---> Package tendrl-api.noarch 0:1.2.3-05_02_2017_19_43_13 will be installed
--> Processing Dependency: rubygem-bundler >= 1.13.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-minitest >= 5.9.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-mixlib-log >= 1.7.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-puma >= 3.6.0 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-sinatra >= 1.4.5 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-tilt >= 1.4.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-etcd for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Running transaction check
---> Package rubygem-builder.noarch 0:3.1.4-3.el7 will be installed
---> Package tendrl-api.noarch 0:1.2.3-05_02_2017_19_43_13 will be installed
--> Processing Dependency: rubygem-bundler >= 1.13.6 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-minitest >= 5.9.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-mixlib-log >= 1.7.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-puma >= 3.6.0 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-sinatra >= 1.4.5 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-tilt >= 1.4.1 for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Processing Dependency: rubygem-etcd for package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch
--> Finished Dependency Resolution
Error: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)
           Requires: rubygem-bundler >= 1.13.6
           Installed: rubygem-bundler-1.7.8-3.el7.noarch (@base)
               rubygem-bundler = 1.7.8-3.el7
Error: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)
           Requires: rubygem-mixlib-log >= 1.7.1
Error: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)
           Requires: rubygem-etcd
Error: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)
           Requires: rubygem-sinatra >= 1.4.5
Error: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)
           Requires: rubygem-tilt >= 1.4.1
Error: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)
           Requires: rubygem-puma >= 3.6.0
Error: Package: tendrl-api-1.2.3-05_02_2017_19_43_13.noarch (tendrl-tendrl)
           Requires: rubygem-minitest >= 5.9.1
           Available: rubygem-minitest-4.3.2-29.el7.noarch (base)
               rubygem-minitest = 4.3.2-29.el7
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

Any ideas what is wrong?

Add ansible-lint run into travis CI check

Add an ansible-lint run to the Travis CI check. Moreover, consider reconfiguring the CI job so that one can run it offline on a local machine, without any dependency on Travis (maybe via tox?).

Remove workaround playbook for disabling iptables

Description

To remove the workaround playbook which disables iptables completely, we need to include firewall setup in tendrl-ansible.

References

Documentation on how to configure the firewall is available now: https://github.com/Tendrl/documentation/wiki/Tendrl-firewall-settings (link added on Nov 22)

Approach

I propose to:

  • tendrl-ansible to open ports for tendrl components only, gluster ports (or anything else) are out of scope now
  • make firewall setup part of both tendrl-server and tendrl-storage-node roles
  • firewall will be configured using firewalld service files, provided by particular tendrl components
  • when firewalld service is not up and running, stop the playbook(via assert) and ask for manual intervention
  • provide way to run tendrl-ansible without reconfiguration of firewall

Why?

Since we can't enable or disable the firewall during Tendrl installation, when firewalld is not running, we can't touch the firewall configuration at all. When tendrl-ansible detects this, an assert will stop the playbook immediately and ask the user to resolve the situation:

  • one could enable firewalld and configure it to open all ports required for gluster and anything else admin requires to work, and then rerun tendrl-ansible
  • one can decide to use iptables tool instead, or any other way to push firewall rules into kernel, in which case one will disable firewall reconfiguration in tendrl-ansible, so that one can take full responsibility for firewall setup. In this mode, tendrl-ansible will not touch firewall at all, assuming it's enabled and configured (this way, one can also decide to not use firewall at all, which is not suggested, but possible)

tendrl-ansible will automate only a single way to configure the firewall, which is firewalld. Automating multiple approaches (eg. both iptables and firewalld) is not reasonable: it would require additional maintenance work and multiply testing efforts.

Moreover, by using firewalld service files when possible, we can simplify maintenance of the firewalld configuration, which will be stored in the repository of the component, so that a change of port would not require an update of tendrl-ansible. I realize that this is not always possible (eg. for etcd, which we don't directly control), but this approach should be strongly preferred.
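A rough sketch of how this proposal might look in a play is shown here. The toggle variable `configure_firewalld_for_tendrl`, the group name `tendrl_server` and the firewalld service name `tendrl-api` are hypothetical names chosen for illustration; the real service file names would come from the particular tendrl components.

```yaml
---
# Sketch only: variable, group and firewalld service names are assumptions.
- hosts: tendrl_server
  become: true
  vars:
    # Set to false to take full responsibility for the firewall setup.
    configure_firewalld_for_tendrl: true
  tasks:
    - name: Check whether firewalld is running (read-only)
      command: systemctl is-active firewalld
      register: firewalld_status
      changed_when: false
      failed_when: false
      when: configure_firewalld_for_tendrl | bool

    - name: Stop and ask for manual intervention when firewalld is not running
      assert:
        that:
          - firewalld_status.stdout == 'active'
        msg: >-
          firewalld is not running; either start and configure it, or set
          configure_firewalld_for_tendrl to false and manage the firewall
          yourself
      when: configure_firewalld_for_tendrl | bool

    - name: Open ports via a firewalld service file shipped by the component
      firewalld:
        service: tendrl-api     # hypothetical service file name
        permanent: true
        immediate: true
        state: enabled
      when: configure_firewalld_for_tendrl | bool
```

With the toggle set to false, all three tasks are skipped, so the play leaves the firewall untouched.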

Implement a clean-up option

The following steps need to be executed:

  • Stop all tendrl components.
  • Stop etcd.
  • Empty the etcd data directory on all etcd nodes.
  • Start etcd and verify that it has started with no keys present.
  • Start the tendrl node agents on all nodes.
  • Start all the other tendrl components.
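The steps above could be sketched as a playbook along these lines. Everything specific here is an assumption made for the example: the group name `tendrl_server` for the etcd nodes, the service name `tendrl-node-agent`, the list variable `tendrl_services` (expected to be defined per host or group and to exclude the node agent), and the data path `/var/lib/etcd`.

```yaml
---
# Sketch only: group names, service names and the etcd data path are assumptions.
- hosts: all
  become: true
  tasks:
    - name: Stop all tendrl components
      service:
        name: "{{ item }}"
        state: stopped
      with_items: "{{ tendrl_services + ['tendrl-node-agent'] }}"

- hosts: tendrl_server          # assumed to be the etcd node(s)
  become: true
  tasks:
    - name: Stop etcd
      service:
        name: etcd
        state: stopped

    - name: List entries in the etcd data directory
      find:
        paths: /var/lib/etcd    # assumed data directory
        file_type: any
        hidden: true
      register: etcd_entries

    - name: Empty the etcd data directory (the directory itself is kept)
      file:
        path: "{{ item.path }}"
        state: absent
      with_items: "{{ etcd_entries.files }}"

    - name: Start etcd
      service:
        name: etcd
        state: started
    # Verifying that etcd holds no keys is left out here, since the right
    # etcdctl invocation depends on the etcd version and endpoint in use.

- hosts: all
  become: true
  tasks:
    - name: Start the tendrl node agent
      service:
        name: tendrl-node-agent
        state: started

    - name: Start all the other tendrl components
      service:
        name: "{{ item }}"
        state: started
      with_items: "{{ tendrl_services }}"
```

Splitting the work into separate plays keeps the ordering global: every host finishes one play before any host starts the next.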

RPM builds required

Please add rpm spec file and any other files (makefile etc) required for enabling copr (master branch and release branch) rpm builds for tendrl-ansible

Implement cluster unmanage feature via ansible

This is a proposal for implementing the cluster unmanage feature via tendrl-ansible. This feature consists of one step, which we could consider performing on its own (for dev/testing purposes):

  • remove tendrl components from the storage machines without interference with storage system
  • delete the cluster from tendrl central store, so that tendrl is no longer aware of it (feature removed based on #50 (comment))

FAQ

How it will be implemented? Most likely it would be a playbook for people to tweak and reuse.

Will this be officially tested and supported? No, qe team declined testing of this feature. It is something targeted to dev, qe and knowledgeable admins who are on their own.

Wouldn't it be better to implement something like this in tendrl directly? Yes, it would, but this feature is not going to be implemented in tendrl in a near future, and we may still find this useful to be implemented here instead.

Questions

There is new description of tendrl uninstallation in https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.4-(install-guide)#uninstall-tendrl

To reliably implement this feature, we need to know exactly:

  • which services, in which order to stop
  • which packages to uninstall
  • which directories to remove (eg. leftover cache, or database with previous state)
  • clarify the step where we say to optionally back up and then delete etcd data (how to restore the backup after tendrl reinstall?)

Use import_playbook instead of include

Ansible version 2.4 considers the include statement deprecated and suggests using import_playbook instead.

# ansible-playbook -i inventory.hosts site.yml 
[DEPRECATION WARNING]: 'include' for playbook includes. You should use 'import_playbook' instead. This feature will be removed in version 2.8. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use 'import_tasks' for static inclusions or 'include_tasks' for dynamic inclusions. This feature will be removed in a future release. Deprecation warnings can be
 disabled by setting deprecation_warnings=False in ansible.cfg.
[DEPRECATION WARNING]: include is kept for backwards compatibility but usage is discouraged. The module documentation details page may explain more about this rationale.. This feature will be removed in a future release. Deprecation 
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.

<<truncated>>
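The change the first warning asks for can be sketched as below. The playbook file names are hypothetical and only stand in for whatever site.yml actually includes.

```yaml
---
# site.yml (sketch): the file names below are hypothetical.
#
# Deprecated form, as flagged by Ansible 2.4:
#   - include: tendrl-server.yml
#
# Replacement for playbook-level includes:
- import_playbook: tendrl-server.yml
- import_playbook: tendrl-storage-node.yml
```

For task-level includes inside roles, the same warning output points to `import_tasks` (static) or `include_tasks` (dynamic) instead.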

tendrl-node-monitoring error on restart

I received this error on running site.yml with hard coded variables for etcd/api
ansible-playbook -i gce-hosts -u ajeffrey -b site.yml --limit @site.retry

RUNNING HANDLER [tendrl-storage-node : restart tendrl-node-monitoring] *********
fatal: [stg03]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service tendrl-node-monitoring: Job for tendrl-node-monitoring.service failed because start of the service was attempted too often. See "systemctl status tendrl-node-monitoring.service" and "journalctl -xe" for details.\nTo force a start use "systemctl reset-failed tendrl-node-monitoring.service" followed by "systemctl start tendrl-node-monitoring.service" again.\n"}
fatal: [stg02]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service tendrl-node-monitoring: A dependency job for tendrl-node-monitoring.service failed. See 'journalctl -xe' for details.\n"}
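The recovery steps that the systemd message itself suggests (`reset-failed` followed by a start) can be sketched as tasks. This only clears the start-rate limit state and does not address whatever makes the service fail in the first place; the group name `gluster_servers` is an assumption.

```yaml
---
# Sketch only: follows the hint printed by systemd, group name is assumed.
- hosts: gluster_servers
  become: true
  tasks:
    - name: Clear the failed state of the monitoring service
      command: systemctl reset-failed tendrl-node-monitoring.service

    - name: Try to start the monitoring service again
      service:
        name: tendrl-node-monitoring
        state: started
```

If the start fails again, `journalctl -xe` on the affected node is the place to look, as the original message says.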

Implement SSL configuration for Tendrl [blocked]

tendrl-ansible is expected to handle deployment of ssl as described in Tendrl/api#264

  • explore the default cert approach
  • allow admin to provide their own ssl certificate
  • implement reconfiguration, or disabling ssl setup
  • implement the grafana related setup (see Issues section below)
  • documentation update

Questions about related changes

Unification related to non ssl setup:

  • In tendrl-ssl.conf we set up the ip address of the apache virtual host, but in non ssl setup we just seem to listen on all interfaces. Would it make sense to unify this and set the ip address there as well?
    • actually @dahorak suggests to not specify ip address for ssl setup: #46 (comment)
    • but using * alone breaks the setup (tendrl ui is not reachable)
    • we can consider specifying ip address for non ssl setup, but that would make such setup more complicated

Questions to figure out:

  • Should I validate that lookup('dig', httpd_server_name) == httpd_ip_address ? Probably not.
    • Moreover if @dahorak 's suggestion to drop ip address in virtual hosts is used, this check would not be needed
  • Is reconfiguration (eg. turning ssl on and off) required? Yes.
  • Shipping the default config as an ansible template may be easier wrt ansible, but it would hide the configuration away from both developers and admins/users of tendrl, and it would make manual tweaks harder. For these reasons, we will keep the sample ssl config in the tendrl-api-httpd package.

Issues blocking merging of this feature

  • grafana web works over ssl as well, reported as Tendrl/api#303
  • IP address is not present in url of redirect requests: Tendrl/api#302
  • In tendrl-ssl.conf we specify ServerName as fqdn, but in non ssl setup, we just leave this with default tendrl value, see: Tendrl/api#217 This should be unified.
  • tendrl web - grafana authentication

Enable rpmlint check in Travis CI

Now that we have a specfile in the repo (based on current Tendrl approach), it would be nice to run rpmlint in the Travis CI job to prevent introducing bugs in the rpm spec file.

Since the rpmlint deb package is not available in the LTS Ubuntu that Travis CI provides, we can't enable this check outright. We would either need to build our own deb package in a PPA and enable it in the travis job, or check if it's possible to use Fedora instead (this I would like much more, but I have no idea how feasible it is ...).
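One way the Fedora option might be explored is to run rpmlint inside a Fedora container from the Travis job. The fragment below is an untested sketch: the spec file name `tendrl-ansible.spec` is an assumption, and whether this fits the project's CI setup would need to be checked.

```yaml
# .travis.yml (sketch): the spec file name is an assumption.
sudo: required
services:
  - docker
script:
  - >-
    docker run --rm -v "$PWD":/src fedora
    sh -c 'dnf -y install rpmlint && rpmlint /src/tendrl-ansible.spec'
```

The same `docker run` command can be executed on a local machine, which would also help with running the checks without Travis.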

User does not need to configure any tags in Tendrl configuration

Role "tendrl-storage-node" contains steps to add "provisioner/gluster" tag for tendrl-node-agents on Gluster storage nodes.

details: https://github.com/Tendrl/tendrl-ansible/blob/master/roles/tendrl-storage-node/tasks/tendrl-node-agent.yml#L21

From tendrl-release 1.5.1 [0], configuration of above tag will not be required since it is automatically done during ImportCluster API call by tendrl backend.

Please remove "Add provisioner/gluster tag into node-agent.conf.yaml" from above playbook

[0] : https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.1-(install-guide)

Tendrl-API fails to be set up properly

After running the Ansible scripts, tendrl-api is not set up properly and returns:

# curl localhost:9292/1.0/ping
<h1>Internal Server Error</h1>

Inside the console I can see a stack trace from Puma:

Etcd::KeyNotFound - Key not found:
	/usr/share/gems/gems/etcd-0.3.0/lib/etcd/client.rb:141:in `process_http_request'
	/usr/share/gems/gems/etcd-0.3.0/lib/etcd/client.rb:114:in `api_execute'
	/usr/share/gems/gems/etcd-0.3.0/lib/etcd/keys.rb:22:in `get'
	/usr/share/tendrl-api/node.rb:6:in `block in <class:Node>'   <--- Watch this
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `call'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `block in compile!'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1006:in `[]'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1006:in `block in process_route'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `catch'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `process_route'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:957:in `block in filter!'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:957:in `each'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:957:in `filter!'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1075:in `block in dispatch!'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:in `dispatch!'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `block in call!'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `call!'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:in `call'
	/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in `call'
	/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:in `call'
	/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in `call'
	/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
	/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
	/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:in `call'
	/usr/share/gems/gems/rack-1.6.4/lib/rack/nulllogger.rb:9:in `call'
	/usr/share/gems/gems/rack-1.6.4/lib/rack/head.rb:13:in `call'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:in `call'
	/usr/share/gems/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:in `call'
	/usr/share/gems/gems/rack-1.6.4/lib/rack/urlmap.rb:66:in `block in call'
	/usr/share/gems/gems/rack-1.6.4/lib/rack/urlmap.rb:50:in `each'
	/usr/share/gems/gems/rack-1.6.4/lib/rack/urlmap.rb:50:in `call'
	/usr/share/gems/gems/puma-3.6.0/lib/puma/configuration.rb:225:in `call'
	/usr/share/gems/gems/puma-3.6.0/lib/puma/server.rb:578:in `handle_request'
	/usr/share/gems/gems/puma-3.6.0/lib/puma/server.rb:415:in `process_client'
	/usr/share/gems/gems/puma-3.6.0/lib/puma/server.rb:275:in `block in run'
	/usr/share/gems/gems/puma-3.6.0/lib/puma/thread_pool.rb:116:in `call'
	/usr/share/gems/gems/puma-3.6.0/lib/puma/thread_pool.rb:116:in `block in spawn_thread'

Looking at https://github.com/Tendrl/api/blob/master/node.rb#L6 it appears that there is a call to get a key which is not yet present in etcd.
Not even the rake seed task is creating this key:
https://github.com/Tendrl/api/blob/master/Rakefile#L12

Update tendrl-ansible for Tendrl Milestone #3 (2nd milestone)

Tendrl ansible needs to be updated for https://github.com/Tendrl/specifications/milestone/3

Related specifications (I went through all items listed in the milestone and searched for configuration, setup and documentation impact, here I'm tracking changes in tendrl-ansible based on that, and questions when the impact is not clear):

See also Install Guide:

Known Problems (making tendrl-ansible based installation fail):

Update packages produces unfinished transactions error

again - in GCE env

japplewh-OSX:tendrl-ansible japplewh$ ansible-playbook -i gce-inventory -b site.yml

produces..

TASK [Update all installed packages] *******************************************
ok: [tendrl]
fatal: [stg-nodes-group-c3n7]: FAILED! => {"changed": true, "failed": true, "msg": "There are unfinished transactions remaining. You might consider running yum-complete-transaction, or "yum-complete-transaction --cleanup-only" and "yum history redo last", first to finish them. If those don't work you'll have to try removing/installing packages by hand (maybe package-cleanup can help).\nThe program yum-complete-transaction is found in the yum-utils package.\nError: Multilib version problems found. This often means that the root\n cause is something else and multilib version checking is just\n pointing out that there is a problem. Eg.:\n \n 1. You have an upgrade for glibc which is missing some\n dependency that another package requires. Yum is trying to\n solve this by installing an older version of glibc of the\n different architecture. If you exclude the bad architecture\n yum will tell you what the root cause is (which package\n requires what). You can try redoing the upgrade with\n --exclude glibc.otherarch ... this should give you an error\n message showing the root cause of the problem.\n \n 2. You have multiple architectures of glibc installed, but\n yum can only see an upgrade for one of those architectures.\n If you don't want/need both architectures anymore then you\n can remove the one with the missing update and everything\n will work.\n \n 3. 
You have duplicate versions of glibc installed already.\n You can use "yum check" to get yum show these errors.\n \n ...you can also use --setopt=protected_multilib=false to remove\n this checking, however this is almost never the correct thing to\n do as something else is very likely to go wrong (often causing\n much more problems).\n \n Protected multilib versions: glibc-2.17-157.el7_3.2.x86_64 != glibc-2.17-157.el7_3.1.i686\n", "rc": 1, "results": ["Loaded plugins: fastestmirror\nLoading mirror speeds from cached hostfile\n * base: repos.forethought.net\n * epel: mirror.steadfast.net\n * extras: centos.mbni.med.umich.edu\n * updates: mirror.cisp.com\nResolving Dependencies\n--> Running transaction check\n---> Package NetworkManager.x86_64 1:1.4.0-19.el7_3 will be updated\n---> Package NetworkManager.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package NetworkManager-adsl.x86_64 1:1.4.0-19.el7_3 will be updated\n---> Package NetworkManager-adsl.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package NetworkManager-bluetooth.x86_64 1:1.4.0-19.el7_3 will be updated\n---> Package NetworkManager-bluetooth.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package NetworkManager-glib.x86_64 1:1.4.0-19.el7_3 will be updated\n---> Package NetworkManager-glib.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package NetworkManager-libnm.x86_64 1:1.4.0-19.el7_3 will be updated\n---> Package NetworkManager-libnm.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package NetworkManager-team.x86_64 1:1.4.0-19.el7_3 will be updated\n---> Package NetworkManager-team.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package NetworkManager-tui.x86_64 1:1.4.0-19.el7_3 will be updated\n---> Package NetworkManager-tui.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package NetworkManager-wifi.x86_64 1:1.4.0-19.el7_3 will be updated\n---> Package NetworkManager-wifi.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package NetworkManager-wwan.x86_64 1:1.4.0-19.el7_3 will be updated\n---> 
Package NetworkManager-wwan.x86_64 1:1.4.0-20.el7_3 will be an update\n---> Package ca-certificates.noarch 0:2017.2.11-70.1.el7_3 will be updated\n---> Package ca-certificates.noarch 0:2017.2.14-70.1.el7_3 will be an update\n---> Package dracut.x86_64 0:033-463.el7 will be updated\n---> Package dracut.x86_64 0:033-463.el7_3.1 will be an update\n---> Package dracut-config-rescue.x86_64 0:033-463.el7 will be updated\n---> Package dracut-config-rescue.x86_64 0:033-463.el7_3.1 will be an update\n---> Package firewalld.noarch 0:0.4.3.2-8.1.el7_3.2 will be updated\n---> Package firewalld.noarch 0:0.4.3.2-8.1.el7_3.3 will be an update\n---> Package firewalld-filesystem.noarch 0:0.4.3.2-8.1.el7_3.2 will be updated\n---> Package firewalld-filesystem.noarch 0:0.4.3.2-8.1.el7_3.3 will be an update\n---> Package glibc.x86_64 0:2.17-157.el7_3.1 will be updated\n--> Processing Dependency: glibc = 2.17-157.el7_3.1 for package: glibc-common-2.17-157.el7_3.1.x86_64\n---> Package glibc.x86_64 0:2.17-157.el7_3.2 will be an update\n---> Package google-cloud-sdk.noarch 0:155.0.0-1.el7 will be updated\n---> Package google-cloud-sdk.noarch 0:159.0.0-1.el7 will be an update\n---> Package google-compute-engine.noarch 0:2.3.7-0.1495559176.el7 will be updated\n---> Package google-compute-engine.noarch 0:2.4.0-0.1497053016.el7 will be an update\n---> Package kernel.x86_64 0:3.10.0-514.21.1.el7 will be installed\n---> Package kernel-tools.x86_64 0:3.10.0-514.16.1.el7 will be updated\n---> Package kernel-tools.x86_64 0:3.10.0-514.21.1.el7 will be an update\n---> Package kernel-tools-libs.x86_64 0:3.10.0-514.16.1.el7 will be updated\n---> Package kernel-tools-libs.x86_64 0:3.10.0-514.21.1.el7 will be an update\n---> Package kpartx.x86_64 0:0.4.9-99.el7_3.1 will be updated\n---> Package kpartx.x86_64 0:0.4.9-99.el7_3.3 will be an update\n---> Package libgudev1.x86_64 0:219-30.el7_3.8 will be updated\n---> Package libgudev1.x86_64 0:219-30.el7_3.9 will be an update\n---> Package 
libnetfilter_conntrack.x86_64 0:1.0.4-2.el7 will be updated\n---> Package libnetfilter_conntrack.x86_64 0:1.0.6-1.el7_3 will be an update\n---> Package nss.x86_64 0:3.28.4-1.0.el7_3 will be updated\n---> Package nss.x86_64 0:3.28.4-1.2.el7_3 will be an update\n---> Package nss-sysinit.x86_64 0:3.28.4-1.0.el7_3 will be updated\n---> Package nss-sysinit.x86_64 0:3.28.4-1.2.el7_3 will be an update\n---> Package nss-tools.x86_64 0:3.28.4-1.0.el7_3 will be updated\n---> Package nss-tools.x86_64 0:3.28.4-1.2.el7_3 will be an update\n---> Package polkit.x86_64 0:0.112-11.el7_3 will be updated\n---> Package polkit.x86_64 0:0.112-12.el7_3 will be an update\n---> Package python-firewall.noarch 0:0.4.3.2-8.1.el7_3.2 will be updated\n---> Package python-firewall.noarch 0:0.4.3.2-8.1.el7_3.3 will be an update\n---> Package python-perf.x86_64 0:3.10.0-514.16.1.el7 will be updated\n---> Package python-perf.x86_64 0:3.10.0-514.21.1.el7 will be an update\n---> Package sudo.x86_64 0:1.8.6p7-21.el7_3 will be updated\n---> Package sudo.x86_64 0:1.8.6p7-22.el7_3 will be an update\n---> Package systemd.x86_64 0:219-30.el7_3.8 will be updated\n---> Package systemd.x86_64 0:219-30.el7_3.9 will be an update\n---> Package systemd-libs.x86_64 0:219-30.el7_3.8 will be updated\n---> Package systemd-libs.x86_64 0:219-30.el7_3.9 will be an update\n---> Package systemd-sysv.x86_64 0:219-30.el7_3.8 will be updated\n---> Package systemd-sysv.x86_64 0:219-30.el7_3.9 will be an update\n---> Package tuned.noarch 0:2.7.1-3.el7_3.1 will be updated\n---> Package tuned.noarch 0:2.7.1-3.el7_3.2 will be an update\n--> Running transaction check\n---> Package glibc.i686 0:2.17-157.el7_3.1 will be installed\n--> Processing Dependency: libfreebl3.so(NSSRAWHASH_3.12.3) for package: glibc-2.17-157.el7_3.1.i686\n--> Processing Dependency: libfreebl3.so for package: glibc-2.17-157.el7_3.1.i686\n---> Package glibc.x86_64 0:2.17-157.el7_3.1 will be updated\n--> Running transaction check\n---> Package 
nss-softokn-freebl.i686 0:3.16.2.3-14.4.el7 will be installed\n--> Finished Dependency Resolution\n"]}
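The error message above itself suggests running yum-complete-transaction (shipped in the yum-utils package) before retrying the update. A minimal sketch of how that could be done with ansible is shown below. This play is hypothetical and not part of tendrl-ansible; the `hosts: all` target is an assumption, and the cleanup may not be enough on its own, since the log also reports a glibc multilib conflict (x86_64 vs. i686) that may need to be resolved by hand.

```yaml
# Hypothetical recovery play, NOT part of tendrl-ansible.
# Assumption: all hosts in the inventory are yum-based (e.g. CentOS 7).
- hosts: all
  become: true
  tasks:
    # yum-complete-transaction is provided by the yum-utils package,
    # as stated in the yum error message.
    - name: Install yum-utils
      yum:
        name: yum-utils
        state: present

    # Discard the journal of unfinished transactions without replaying them,
    # using the flag suggested by the yum error message.
    - name: Clean up unfinished yum transactions
      command: yum-complete-transaction --cleanup-only
```

After running such a play, the original site.yml run can be retried. If the multilib error persists, the message's own advice (for example `yum check`) is the next thing to look at.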

Verify that tendrl-ansible is aligned with the upstream documentation for a particular release

We need to verify that tendrl-ansible matches the instructions in upstream installation documentation exactly.

Reasoning

While tendrl-ansible is an official way to install Tendrl, it is not a mandatory installation method, so all steps automated in its ansible roles and playbooks should be documented in the upstream installation documentation as well.
