nlnog / ring-ansible

RING Ansible playbooks & friends

Python 64.92% Shell 10.47% Ruby 5.02% Perl 3.33% Jinja 16.26%

ring-ansible's Issues

Faster ssh user key distribution

SSH user keys are currently distributed via Ansible, which means they are only updated once every two hours. We'd like to bring this down to minutes.

One possible option is to use sshd's AuthorizedKeysCommand (see https://jpmens.net/2019/03/02/sshd-and-authorizedkeyscommand/) to check GitHub for new keys when a key is not available locally. This would need some rate limiting so that GitHub is not bombarded with a request on every login.
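
A minimal sketch of how such a rate-limited lookup could work, written as a helper that an AuthorizedKeysCommand script might call. It assumes keys are fetched from GitHub's per-user `.keys` URL and that the ring username matches the GitHub username; neither is confirmed by this issue. The function names, the cache directory argument, and the five-minute TTL are all hypothetical.

```python
import re
import time
import urllib.request
from pathlib import Path

CACHE_TTL_SECONDS = 300  # assumption: ask GitHub at most once per 5 minutes per user


def fetch_github_keys(username: str) -> str:
    """Fetch the public keys GitHub publishes for a user (assumed key source)."""
    url = f"https://github.com/{username}.keys"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def authorized_keys(username, cache_dir, fetch=fetch_github_keys,
                    now=time.time, ttl=CACHE_TTL_SECONDS):
    """Return authorized_keys text for a user, using a per-user file cache.

    The cache is the rate limit: a login only triggers a remote request
    when the cached copy is missing or older than `ttl` seconds.
    """
    # Refuse anything that could escape the cache directory.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9_-]*", username):
        return ""
    cache_file = Path(cache_dir) / f"{username}.keys"
    if cache_file.exists() and now() - cache_file.stat().st_mtime < ttl:
        return cache_file.read_text()
    try:
        keys = fetch(username)
    except OSError:
        # Network failure: fall back to a stale cache rather than locking users out.
        return cache_file.read_text() if cache_file.exists() else ""
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(keys)
    return keys
```

A wrapper script would print the returned text to stdout for sshd. Negative results are cached too (an empty file), so repeated logins for a user without keys do not cause repeated requests either.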

transparency in node health/status information and state changes

While looking into how node liveness is determined (API's report of alive_ipv4 and alive_ipv6), I found myself wanting to be able to estimate the age of the health data for a system and understand if the current API response is reflective of the system health, or if a change is likely pending in the next 24 hours (next ring-admin run). One or more of the following would be helpful:

On a system, store more than just the latest status.json. This can be helpful for any user on the node to tell whether something on the system is intermittently unhealthy such as IPv4 or IPv6 connectivity. ring-health is run every 60 minutes from cron, but only the latest output is stored at /var/www/ring/status.json. Adding more history stored on the node would allow for investigation into what data may have changed since the last report to the API.
In the API, perhaps add a method and route to access the data from the health table? Speaking of, can someone provide the schema for the health table or add it to the SCHEMA in ring-admin?
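
As a rough illustration of the first item, the sketch below copies the current status.json into a timestamped history directory and prunes old snapshots. The history directory, the file naming scheme, and the retention count of 48 are assumptions made for the example; nothing like this exists in ring-health as far as this issue states.

```python
import shutil
import time
from pathlib import Path


def archive_status(status_path, history_dir, keep=48, now=None):
    """Copy the latest status file into history_dir and keep the newest `keep` copies.

    `now` is a Unix timestamp; None means the current time.
    """
    status_path = Path(status_path)
    history_dir = Path(history_dir)
    history_dir.mkdir(parents=True, exist_ok=True)

    # UTC timestamps in this format sort lexicographically in chronological order.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    target = history_dir / f"status-{stamp}.json"
    shutil.copy2(status_path, target)

    # Prune everything except the newest `keep` snapshots.
    snapshots = sorted(history_dir.glob("status-*.json"))
    for old in snapshots[: max(len(snapshots) - keep, 0)]:
        old.unlink()
    return target
```

Called from the same hourly cron job right after ring-health writes /var/www/ring/status.json, a retention of 48 would give roughly two days of history on the node.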

With a bit of support, I can begin work on a PR for one or both of the items listed above, which could then enable research into some other contributions.

ring-sqa upstart / systemd scripts

Ansible can't enable the ring-sqa service on a fresh Xenial install:

TASK [ring_sqa : create /etc/ring-sqa/main.conf] *******************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : create /etc/ring-sqa/hosts.conf] ******************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : create /etc/init/ring-sqa4.conf] ******************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : create /etc/init/ring-sqa6.conf] ******************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : file] *********************************************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : file] *********************************************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : service] ******************************************************
fatal: [casablanca01.ring.nlnog.net]: FAILED! => {"changed": false, "failed": true, "msg": "Error when trying to enable ring-sqa4: rc=1 ring-sqa4.service is not a native service, redirecting to systemd-sysv-install\nExecuting /lib/systemd/systemd-sysv-install enable ring-sqa4\nThe script you are attempting to invoke has been converted to an Upstart\njob, but lsb-header is not supported for Upstart jobs.\ninsserv: warning: script 'ring-sqa4' missing LSB tags and overrides\ninsserv: Default-Start undefined, assuming empty start runlevel(s) for script `ring-sqa4'\ninsserv: Default-Stop  undefined, assuming empty stop  runlevel(s) for script `ring-sqa4'\nThe script you are attempting to invoke has been converted to an Upstart\njob, but lsb-header is not supported for Upstart jobs.\ninsserv: warning: script 'ring-sqa6' missing LSB tags and overrides\ninsserv: Default-Start undefined, assuming empty start runlevel(s) for script `ring-sqa6'\ninsserv: Default-Stop  undefined, assuming empty stop  runlevel(s) for script `ring-sqa6'\nupdate-rc.d: error: ring-sqa4 Default-Start contains no runlevels, aborting.\n"}

NO MORE HOSTS LEFT *************************************************************

RUNNING HANDLER [ring_sqa : restart ring-sqa4] *********************************

RUNNING HANDLER [ring_sqa : restart ring-sqa6] *********************************
        to retry, use: --limit @/etc/ansible/ring/playbook.retry

Expose ansible and ring-health status

Expose status of last Ansible run and output of ring-health centrally on api.ring.nlnog.net

put zrbackup scripts in ansible

This needs to move to Ansible: https://github.com/NLNOG/ring-puppet/blob/master/modules/backup/manifests/init.pp

Migrate website away from Wordpress

Replace WordPress with something lightweight.

Requirements:

Migrate current static content
Dynamic page generation from ansible or ring db (news updates, participant list)
Application form for new participants

Fix RPSL updates for AS199036

These are currently not active.

Old script: https://github.com/NLNOG/ring-ansible/blob/master/roles/bird/tools/create-rpsl-as199036.py

Graphite replacement

To process results of https://github.com/NLNOG/ring-ansible/tree/master/roles/ringfpingd

upgrade Unbound for long CNAME chains

As discussed on #ring: it would be nice to update Unbound on the ring nodes so that NLnetLabs/unbound@8878680 is included.

This helps to resolve very long chains of CNAMEs, which seem to be becoming more common at some hyperscalers.

The problem seems to be resolved in Unbound 1.13.2rc1 (and newer). Unfortunately, this version is not available in Ubuntu Jammy, and not in Ubuntu backports (https://packages.ubuntu.com/jammy-backports/allpackages) either. Debian backports/unstable seems to have it though.

This means we would have to use custom built packages.

merge upgrade_prep_disable upcoming Sunday 23:59 UTC

@grizz

merge "upgrade_prep_disable" branch upcoming Sunday at 23:59 UTC

BGP role

Hello,

On the LG, it seems that the Only To Customer (OTC) attribute is always set for every prefix across all the peers.
Personally, I believe this piece of information adds little value to what the tool offers, while it may lead to confusion and misunderstandings.

As far as I can tell from the code of the LG and the configuration of the OpenBGPD route collector, this is not due to a bug in the LG, but rather to the presence of role customer in the bgpd config.

With that config knob set, the route collector would behave as per the following in most cases:

For backward compatibility, if the BGP Role Capability is sent but one is not received, the BGP Speaker SHOULD ignore the absence of the BGP Role Capability and proceed with session establishment. The locally configured BGP Role is used for the procedures described in Section 5.

Since the local role is "customer", the route collector expects the peer to be a "provider", so the procedure from Section 5 that is followed is this one:

The following ingress procedure applies to the processing of the OTC Attribute on route receipt:
...
3. If a route is received from a Provider, a Peer, or an RS and the OTC Attribute is not present, then it MUST be added with a value equal to the AS number of the remote AS.

I wonder whether making the route collector transparent to RFC9234 would be a better choice. In that situation, the looking glass would only show OTC attributes which are set downstream and passed along the path.

RFC9234: https://datatracker.ietf.org/doc/html/rfc9234
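
The ingress behaviour described above can be sketched in a few lines. This is an illustrative model of the RFC 9234 Section 5 ingress rules, not OpenBGPD's implementation; the `Role`, `Route`, and `otc_ingress` names are made up for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """Role of the REMOTE AS as seen from the local speaker."""
    PROVIDER = "provider"
    CUSTOMER = "customer"
    PEER = "peer"
    RS = "rs"
    RS_CLIENT = "rs-client"


@dataclass
class Route:
    prefix: str
    otc: Optional[int] = None  # value of the OTC attribute, None if absent


def otc_ingress(route: Route, remote_role: Role, remote_asn: int) -> Optional[Route]:
    """Apply the OTC ingress rules; return None if the route is a leak."""
    if route.otc is not None:
        # Rule 1: OTC present on a route from a Customer or RS-Client is a leak.
        if remote_role in (Role.CUSTOMER, Role.RS_CLIENT):
            return None
        # Rule 2: from a Peer, OTC must equal the Peer's own AS number.
        if remote_role is Role.PEER and route.otc != remote_asn:
            return None
        return route
    # Rule 3: OTC absent on a route from a Provider, Peer, or RS gets it added.
    if remote_role in (Role.PROVIDER, Role.PEER, Role.RS):
        return Route(route.prefix, otc=remote_asn)
    return route
```

With the collector configured as a customer, every remote AS is treated as `Role.PROVIDER`, so any route arriving without OTC leaves `otc_ingress` with OTC set to the neighbour's AS number. That matches what the LG displays.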

Find alternative for generating LOC records

ring-pdns uses Open Elevation to generate LOC records. The public API is currently down and may not return soon: Jorl17/open-elevation#29 (comment)

Infrastructure monitoring

Some kind of monitoring system that sends mails when ring infrastructure servers or services are down. Monitoring of hosts and services should be automatically configured when they are added to ansible.

remove stuff after upgrade

When we are done with next week's project, clean up files like:

/root/upgrade-12-16

reloading systemd files

fatal: [casablanca01.ring.nlnog.net]: FAILED! => {"changed": false, "failed": true, "msg": "Warning: ring-sqa4.service changed on disk. Run 'systemctl daemon-reload' to reload units.\nJob for ring-sqa4.service failed because the control process exited with error code. See \"systemctl status ring-sqa4.service\" and \"journalctl -xe\" for details.\n"}

Is this approach appropriate? https://lookonmyworks.co.uk/2015/06/24/ansible-systemctl-daemon-reload/

PermitRootLogin without-password is deprecated

ring-ansible/roles/openssh/files/sshd_config.ringnode

Line 15 in 78d92bd

PermitRootLogin without-password

Please consider updating this configuration setting to:

PermitRootLogin prohibit-password

see man sshd_config

[RFC] Helper script for disabling Ansible

Build an ansible-disable script that creates a persistent lock file. The script takes a required parameter that describes the reason for disabling, and it records the value of $USER, or $SUDO_USER when run as root via sudo.

$ ansible-disable "Working on a local branch"
/run/ring-ansible.disable lock created (expires in 8 days)

$ cat /run/ring-ansible.disable
Working on a local branch

Wed, 02 Jan 2019 20:00:38 +0000 by leopold
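
A possible shape for the helper, sketched in Python under a few assumptions: the lock file layout follows the example output above, expiry is judged from the lock file's modification time, and the `create_lock` and `is_disabled` names are hypothetical. How the Ansible run itself would consult the lock is left open.

```python
import getpass
import os
import time
from pathlib import Path

LOCK_PATH = Path("/run/ring-ansible.disable")  # path taken from the example above
LOCK_LIFETIME_DAYS = 8


def create_lock(reason, lock_path=LOCK_PATH, now=None):
    """Write the lock file and return the confirmation message."""
    if not reason.strip():
        raise ValueError("a reason for disabling Ansible is required")
    now = time.time() if now is None else now
    # Prefer the invoking user when running under sudo.
    user = os.environ.get("SUDO_USER") or os.environ.get("USER") or getpass.getuser()
    stamp = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime(now))
    lock_path = Path(lock_path)
    lock_path.write_text(f"{reason}\n\n{stamp} by {user}\n")
    os.utime(lock_path, (now, now))  # expiry is measured from the file's mtime
    return f"{lock_path} lock created (expires in {LOCK_LIFETIME_DAYS} days)"


def is_disabled(lock_path=LOCK_PATH, now=None):
    """Return True while an unexpired lock file exists."""
    now = time.time() if now is None else now
    try:
        mtime = Path(lock_path).stat().st_mtime
    except FileNotFoundError:
        return False
    return now - mtime < LOCK_LIFETIME_DAYS * 86400
```

The ansible-pull wrapper could call `is_disabled()` before each run and skip the run while it returns True. An expired lock is simply ignored, so a forgotten lock cannot disable a node indefinitely.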

FR: Upgrade the RING looking glass to Bird2 era

I strongly believe that the ring LG should be upgraded to Bird2 as that would allow for MP-BGP aka IPv4/IPv6 route exchanges in one session.

[DEPRECATION WARNING]: Instead of sudo/sudo_user, use become/become_user and make sure become_method is 'sudo' (default).

root@20c01:~# /usr/bin/ansible-pull --full -d /etc/ansible/ring -U https://github.com/NLNOG/ring-ansible.git --vault-password-file=/root/.vaultpw -i nodes -l 20c01.ring.nlnog.net -c local playbook.yml
Starting Ansible Pull at 2017-04-04 18:19:09
/usr/bin/ansible-pull --full -d /etc/ansible/ring -U https://github.com/NLNOG/ring-ansible.git --vault-password-file=/root/.vaultpw -i nodes -l 20c01.ring.nlnog.net -c local playbook.yml

[DEPRECATION WARNING]: Instead of sudo/sudo_user, use become/become_user and make sure become_method is 'sudo' (default).
This feature will be removed in a future release.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.

59 23 * * 0 systemctl stop ring-sqa4
59 23 * * 0 systemctl stop ring-sqa6
59 23 * * 0 systemctl disable ring-sqa4
59 23 * * 0 systemctl disable ring-sqa6

Sign host key with a CA pubkey

Suggestion mentioned on IRC:

Have all the host keys signed by a CA, so that we can just trust the CA pubkey and be confident connecting to ring nodes without host key prompts.