ring-ansible's People

Contributors

alarig, arjenz, cloudiellc, cozonac, ddominet, grizz, hameno, ichilton, ipfail, jamuanfc, job, knutasyed, leoluk, lochiiconnectivity, lostepic, lyphiard, mjonkersdatabarn, mrxermon, noahkemm, peter-potvin, pieterlexis, ringforger, rodecker, securebitag, skorpy2009, sndrsmnk, teunvink, tombuyvoets, toreanderson, user1841

ring-ansible's Issues

transparency in node health/status information and state changes

While looking into how node liveness is determined (the API's report of alive_ipv4 and alive_ipv6), I found myself wanting to estimate the age of a system's health data and to understand whether the current API response reflects the system's health, or whether a change is likely pending in the next 24 hours (the next ring-admin run). One or more of the following would be helpful:

  • On a system, store more than just the latest status.json. This would help any user on the node tell whether something on the system, such as IPv4 or IPv6 connectivity, is intermittently unhealthy. ring-health runs every 60 minutes from cron, but only the latest output is stored at /var/www/ring/status.json. Keeping more history on the node would make it possible to investigate what data may have changed since the last report to the API.
  • In the API, perhaps add a method and route to access the data in the health table? Speaking of which, can someone provide the schema for the health table or add it to the SCHEMA in ring-admin?
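As a rough illustration of the first item, a small archiver could run right after ring-health and keep timestamped copies of status.json. This is a minimal sketch assuming Python 3 on the node; the history directory and the one-week retention are hypothetical choices, not existing ring settings.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

# Hypothetical history location; ring-health itself only writes status.json.
STATUS_FILE = Path("/var/www/ring/status.json")
HISTORY_DIR = Path("/var/www/ring/status-history")
KEEP = 24 * 7  # one week of hourly snapshots


def archive_status(status_file: Path = STATUS_FILE,
                   history_dir: Path = HISTORY_DIR,
                   keep: int = KEEP,
                   now: Optional[float] = None) -> Path:
    """Copy status.json to a UTC-timestamped file and prune the oldest copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    history_dir.mkdir(parents=True, exist_ok=True)
    # time.gmtime(None) uses the current time.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    target = history_dir / f"status-{stamp}.json"
    shutil.copy2(status_file, target)
    # This timestamp format sorts chronologically, so the oldest files come first.
    snapshots = sorted(history_dir.glob("status-*.json"))
    for old in snapshots[:-keep]:
        old.unlink()
    return target
```

Calling archive_status() from the same hourly cron entry that runs ring-health would keep the last week of results on the node, so intermittent IPv4 or IPv6 failures remain visible after the next run overwrites status.json.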

With a bit of support, I can begin work on a PR for one or both of the items listed above, which could then enable research into some other contributions.

Other questions:

  • The logic in ring-admin that updates alive_v4 and alive_v6 in the machines table is contained in ansible_process(). What cron job or other trigger results in ansible_process() being called? The closest cron job I see is the one for purge machines; however, that appears to do cleanup rather than call ansible_process().
  • ring-admin will skip marking nodes as dead_v4/dead_v6 if more than 10 are detected down in a single run. Is there any visible report when this happens, or does it go to /dev/null? The concern is that if 10+ nodes legitimately fail in a single run (however often ansible_process() is run), ring-admin will never be able to catch up on subsequent state changes unless enough machines recover.
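To make the second concern concrete, the guard could report when it trips instead of dropping the result silently. The sketch below is hypothetical: nodes_to_mark_dead() is not a ring-admin function, and logging through Python's logging module is an assumption; only the threshold of 10 comes from the description above.

```python
import logging
from typing import Iterable, List

log = logging.getLogger("ring-admin")

# From the issue: if more than 10 nodes are detected down in one run, none are marked dead.
MAX_NEW_DEAD = 10


def nodes_to_mark_dead(newly_down: Iterable[str], family: str,
                       limit: int = MAX_NEW_DEAD) -> List[str]:
    """Return the nodes to mark dead for one address family, or [] if the guard trips."""
    down = sorted(set(newly_down))
    if len(down) > limit:
        # Emit a warning wherever the logging handlers point (stderr, syslog, ...)
        # instead of discarding the result silently, so operators can see that
        # state changes are being held back.
        log.warning("%d nodes down on %s exceeds limit of %d; marking none dead: %s",
                    len(down), family, limit, ", ".join(down))
        return []
    return down
```

Because the warning lists the affected nodes, repeated warnings across runs would also show whether the backlog is shrinking or stuck above the limit.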

ring-sqa upstart / systemd scripts

Ansible can't enable the ring-sqa service on a fresh Xenial install:

TASK [ring_sqa : create /etc/ring-sqa/main.conf] *******************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : create /etc/ring-sqa/hosts.conf] ******************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : create /etc/init/ring-sqa4.conf] ******************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : create /etc/init/ring-sqa6.conf] ******************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : file] *********************************************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : file] *********************************************************
changed: [casablanca01.ring.nlnog.net]

TASK [ring_sqa : service] ******************************************************
fatal: [casablanca01.ring.nlnog.net]: FAILED! => {"changed": false, "failed": true, "msg": "Error when trying to enable ring-sqa4: rc=1 ring-sqa4.service is not a native service, redirecting to systemd-sysv-install\nExecuting /lib/systemd/systemd-sysv-install enable ring-sqa4\nThe script you are attempting to invoke has been converted to an Upstart\njob, but lsb-header is not supported for Upstart jobs.\ninsserv: warning: script 'ring-sqa4' missing LSB tags and overrides\ninsserv: Default-Start undefined, assuming empty start runlevel(s) for script `ring-sqa4'\ninsserv: Default-Stop  undefined, assuming empty stop  runlevel(s) for script `ring-sqa4'\nThe script you are attempting to invoke has been converted to an Upstart\njob, but lsb-header is not supported for Upstart jobs.\ninsserv: warning: script 'ring-sqa6' missing LSB tags and overrides\ninsserv: Default-Start undefined, assuming empty start runlevel(s) for script `ring-sqa6'\ninsserv: Default-Stop  undefined, assuming empty stop  runlevel(s) for script `ring-sqa6'\nupdate-rc.d: error: ring-sqa4 Default-Start contains no runlevels, aborting.\n"}

NO MORE HOSTS LEFT *************************************************************

RUNNING HANDLER [ring_sqa : restart ring-sqa4] *********************************

RUNNING HANDLER [ring_sqa : restart ring-sqa6] *********************************
        to retry, use: --limit @/etc/ansible/ring/playbook.retry
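One way out is to ship native systemd units for ring-sqa4 and ring-sqa6 instead of the Upstart jobs under /etc/init, so that systemctl enable should no longer need to redirect to systemd-sysv-install. The sketch below only renders such a unit; the helper names are hypothetical, and because the issue does not show how ring-sqa is invoked, the ExecStart command line is left to the caller.

```python
def unit_path(family: int) -> str:
    """Hypothetical destination for the native unit of one ring-sqa instance."""
    return f"/etc/systemd/system/ring-sqa{family}.service"


def render_ring_sqa_unit(family: int, exec_start: str) -> str:
    """Render a minimal systemd service unit for ring-sqa (family 4 or 6)."""
    if family not in (4, 6):
        raise ValueError("family must be 4 or 6")
    lines = [
        "[Unit]",
        f"Description=ring-sqa (IPv{family})",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        # Supplied by the caller: the real command line is not part of this issue.
        f"ExecStart={exec_start}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"
```

In the role itself this would more naturally be a template task writing to the path from unit_path(); the Python version just makes the unit contents explicit.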

Migrate website away from WordPress

Replace WordPress with something lightweight.

Requirements:

  • Migrate current static content
  • Dynamic page generation from Ansible or the ring db (news updates, participant list)
  • Application form for new participants
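For the participant list, the dynamic part could be as small as rendering a static HTML fragment from whatever the ring database returns. A minimal sketch, assuming rows arrive as dictionaries with hypothetical company and optional url keys:

```python
from html import escape
from typing import Iterable, List, Mapping


def render_participants(rows: Iterable[Mapping[str, str]]) -> str:
    """Render a static HTML participant list, sorted case-insensitively by name."""
    items: List[str] = []
    for row in sorted(rows, key=lambda r: r["company"].lower()):
        name = escape(row["company"])
        url = row.get("url")
        if url:
            # Escape the URL too, since it ends up inside an attribute value.
            items.append(f'<li><a href="{escape(url, quote=True)}">{name}</a></li>')
        else:
            items.append(f"<li>{name}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"
```

Generating such fragments at build time keeps the site fully static, which fits the "something lightweight" goal better than rendering on every request.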

upgrade Unbound for long CNAME chains

As discussed on #ring: it would be nice to update Unbound on ring nodes so that NLnetLabs/unbound@8878680 is included.

This helps resolve very long CNAME chains, which seem to be becoming more common at some hyperscalers.

The problem seems to be resolved in Unbound 1.13.2rc1 (and newer). Unfortunately, this version is not available in Ubuntu Jammy, and not in Ubuntu backports (https://packages.ubuntu.com/jammy-backports/allpackages) either. Debian backports/unstable seems to have it though.

This means we would have to use custom-built packages.
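When deciding per node whether a custom package is needed, a version check along these lines could help. This is a sketch that only understands upstream-style strings such as 1.13.1 or 1.13.2rc1, assuming the version is taken from Unbound itself rather than from a Debian package version; the function names are hypothetical.

```python
import re
from typing import Tuple

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")
_FINAL = 10**6  # ranks a final release after any release candidate


def parse_unbound_version(text: str) -> Tuple[int, int, int, int]:
    """Turn '1.13.2' or '1.13.2rc1' into a tuple that sorts in release order."""
    m = _VERSION_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognised Unbound version: {text!r}")
    major, minor, patch, rc = m.groups()
    rc_rank = int(rc) if rc is not None else _FINAL
    return (int(major), int(minor), int(patch), rc_rank)


# First release named in the issue as handling long CNAME chains.
MIN_LONG_CNAME = parse_unbound_version("1.13.2rc1")


def supports_long_cname_chains(version: str) -> bool:
    return parse_unbound_version(version) >= MIN_LONG_CNAME
```

With this ordering, 1.13.1 (the version the issue implies Jammy ships) returns False, while 1.13.2rc1 and anything later returns True.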

BGP role

Hello,

On the LG, it seems that the Only To Customer (OTC) flag is always set for every prefix across all peers.
Personally, I believe this piece of information adds little value to what the tool already offers, while it might lead to confusion and misunderstandings.

As far as I can tell from the code of the LG and the configuration of the OpenBGPD route collector, this is not due to a bug in the LG but to the presence of role customer in the bgpd config.

With that config knob set, the route collector behaves as follows in most cases (per RFC 9234):

For backward compatibility, if the BGP Role Capability is sent but one is not received, the BGP Speaker SHOULD ignore the absence of the BGP Role Capability and proceed with session establishment. The locally configured BGP Role is used for the procedures described in Section 5.

Since the local role is "customer", the route collector expects the peer to be a "provider", so the Section 5 procedure that applies is this one:

The following ingress procedure applies to the processing of the OTC Attribute on route receipt:
...
3. If a route is received from a Provider, a Peer, or an RS and the OTC Attribute is not present, then it MUST be added with a value equal to the AS number of the remote AS.

I wonder whether making the route collector transparent to RFC9234 would be a better choice. In that situation, the looking glass would only show OTC attributes which are set downstream and passed along the path.

RFC9234: https://datatracker.ietf.org/doc/html/rfc9234
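The quoted ingress rules can be modelled in a few lines, which shows why every route ends up with OTC on the LG: with a local role of customer and no Role capability from the neighbour, every neighbour is treated as a provider, so rule 3 adds the remote AS as OTC to each route that arrives without one. This is an illustrative Python model of the RFC 9234 Section 5 ingress procedure, not OpenBGPD code; Role, COUNTERPART and otc_ingress() are hypothetical names.

```python
from enum import Enum
from typing import Optional, Tuple


class Role(Enum):
    PROVIDER = "provider"
    RS = "rs"
    RS_CLIENT = "rs-client"
    CUSTOMER = "customer"
    PEER = "peer"


# Without a Role capability from the neighbour, the locally configured role is
# used, so the neighbour is assumed to hold the counterpart role.
COUNTERPART = {
    Role.PROVIDER: Role.CUSTOMER,
    Role.CUSTOMER: Role.PROVIDER,
    Role.RS: Role.RS_CLIENT,
    Role.RS_CLIENT: Role.RS,
    Role.PEER: Role.PEER,
}


def otc_ingress(local_role: Role, remote_as: int,
                otc: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Apply the ingress OTC rules; return (route eligible?, resulting OTC)."""
    remote_role = COUNTERPART[local_role]
    # Rule 1: OTC received from a Customer or RS-Client means a route leak.
    if otc is not None and remote_role in (Role.CUSTOMER, Role.RS_CLIENT):
        return False, otc
    # Rule 2: OTC received from a Peer must equal the peer's AS number.
    if otc is not None and remote_role is Role.PEER and otc != remote_as:
        return False, otc
    # Rule 3: from a Provider, Peer or RS without OTC, add it with the remote AS.
    if otc is None and remote_role in (Role.PROVIDER, Role.PEER, Role.RS):
        return True, remote_as
    return True, otc
```

For example, otc_ingress(Role.CUSTOMER, 64500, None) returns (True, 64500): the collector itself creates the OTC value that the LG then displays.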

Infrastructure monitoring

Some kind of monitoring system that sends email when ring infrastructure servers or services are down. Monitoring of hosts and services should be configured automatically when they are added to Ansible.
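A sketch of the "configured automatically" part: read the Ansible inventory and derive the checks each host needs from its groups. It assumes a simple INI-style inventory and a hypothetical group-to-checks mapping (the group names infra and web are made up); generating the actual monitoring configuration and email alerts is left out.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mapping from inventory group to the checks its members need.
GROUP_CHECKS: Dict[str, List[str]] = {
    "infra": ["ping", "ssh"],
    "web": ["ping", "ssh", "https"],
}


def parse_inventory(text: str) -> Dict[str, List[str]]:
    """Return {group: [hosts]} from a simple INI-style Ansible inventory."""
    groups: Dict[str, List[str]] = {}
    current: Optional[List[str]] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            # [group:vars] and [group:children] sections do not list hosts.
            current = None if ":" in name else groups.setdefault(name, [])
            continue
        if current is not None:
            # The first token is the host; inline variables follow it.
            current.append(line.split()[0])
    return groups


def checks_for(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge per-group checks into one sorted check list per host."""
    result: Dict[str, Set[str]] = {}
    for group, hosts in groups.items():
        for host in hosts:
            result.setdefault(host, set()).update(GROUP_CHECKS.get(group, []))
    return {host: sorted(checks) for host, checks in result.items()}
```

Regenerating the monitoring configuration from this output on every Ansible run would mean a newly added host is monitored without a separate manual step.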

reloading systemd files

fatal: [casablanca01.ring.nlnog.net]: FAILED! => {"changed": false, "failed": true, "msg": "Warning: ring-sqa4.service changed on disk. Run 'systemctl daemon-reload' to reload units.\nJob for ring-sqa4.service failed because the control process exited with error code. See \"systemctl status ring-sqa4.service\" and \"journalctl -xe\" for details.\n"}

Is this approach appropriate? https://lookonmyworks.co.uk/2015/06/24/ansible-systemctl-daemon-reload/
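Whatever mechanism is used (a handler running systemctl daemon-reload, or the daemon_reload option of the systemd module in recent Ansible releases), the ordering that matters is: write the unit file, reload systemd, then restart the service. A minimal Python sketch of that ordering, with a hypothetical install_unit() helper and an injectable command runner so it can be exercised without systemd:

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def default_runner(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def install_unit(path: Path, content: str,
                 run: Callable[[Sequence[str]], None] = default_runner) -> bool:
    """Write a unit file only if it changed, then reload systemd; return changed."""
    if path.exists() and path.read_text() == content:
        return False
    path.write_text(content)
    # systemd keeps using the old definition until daemon-reload, which is
    # what the "changed on disk" warning in the failed task refers to.
    run(["systemctl", "daemon-reload"])
    return True
```

Reloading only when the file actually changed mirrors how an Ansible handler fires only on a changed task, so unchanged runs stay quiet.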

[RFC] Helper script for disabling Ansible

Build an ansible-disable script that creates a persistent lock file. The script takes a required parameter describing the reason for disabling and records the value of $USER or $SUDO_USER (for root).

$ ansible-disable "Working on a local branch"
/run/ring-ansible.disable lock created (expires in 8 days)

$ cat /run/ring-ansible.disable
Working on a local branch

Wed, 02 Jan 2019 20:00:38 +0000 by leopold
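A minimal sketch of the proposed behaviour in Python: the lock holds the reason plus a timestamp and user in the format of the example above, and the file's modification time drives the 8-day expiry. The function names are hypothetical; a wrapper around ansible-pull would call is_disabled() and skip the run while it returns True.

```python
import getpass
import os
import time
from pathlib import Path
from typing import Optional

LOCK = Path("/run/ring-ansible.disable")
MAX_AGE = 8 * 24 * 3600  # seconds; the 8-day expiry from the example output


def current_user() -> str:
    # Under sudo, SUDO_USER names the person behind the root shell.
    return os.environ.get("SUDO_USER") or os.environ.get("USER") or getpass.getuser()


def disable(reason: str, lock: Path = LOCK, now: Optional[float] = None) -> None:
    """Create the lock file with the reason, a UTC timestamp and the user."""
    if not reason.strip():
        raise ValueError("a reason for disabling Ansible is required")
    ts = time.time() if now is None else now
    stamp = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime(ts))
    lock.write_text(f"{reason}\n\n{stamp} by {current_user()}\n")
    # The file's mtime doubles as the creation time for the expiry check.
    os.utime(lock, (ts, ts))


def is_disabled(lock: Path = LOCK, now: Optional[float] = None) -> bool:
    """True while the lock exists and is younger than MAX_AGE."""
    if not lock.exists():
        return False
    ts = time.time() if now is None else now
    return ts - lock.stat().st_mtime < MAX_AGE
```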

[DEPRECATION WARNING]: Instead of sudo/sudo_user, use become/become_user and make sure become_method is 'sudo' (default).

root@20c01:~# /usr/bin/ansible-pull --full -d /etc/ansible/ring -U https://github.com/NLNOG/ring-ansible.git --vault-password-file=/root/.vaultpw -i nodes -l 20c01.ring.nlnog.net -c local playbook.yml
Starting Ansible Pull at 2017-04-04 18:19:09
/usr/bin/ansible-pull --full -d /etc/ansible/ring -U https://github.com/NLNOG/ring-ansible.git --vault-password-file=/root/.vaultpw -i nodes -l 20c01.ring.nlnog.net -c local playbook.yml

[DEPRECATION WARNING]: Instead of sudo/sudo_user, use become/become_user and make sure become_method is 'sudo' (default).
This feature will be removed in a future release.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.

Sign host key with a CA pubkey

Suggestion mentioned on IRC:

have all the host keys signed so that we can just approve the ca pubkey and be confident connecting to ring nodes without host key prompts
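For reference, OpenSSH supports this with host certificates: the CA signs each node's host public key with ssh-keygen -h, and clients trust the CA through an @cert-authority line in known_hosts. The sketch below only builds the command and the known_hosts line; the CA key path, the 52-week validity and the helper names are assumptions.

```python
from typing import List, Sequence


def sign_host_key_cmd(ca_key: str, host_pub: str, hostnames: Sequence[str],
                      validity: str = "+52w") -> List[str]:
    """Build an ssh-keygen invocation that issues a host certificate."""
    if not hostnames:
        raise ValueError("at least one hostname (principal) is required")
    return ["ssh-keygen",
            "-s", ca_key,             # CA private key used for signing
            "-I", hostnames[0],       # certificate identity
            "-h",                     # host certificate, not a user certificate
            "-n", ",".join(hostnames),
            "-V", validity,
            host_pub]


def known_hosts_ca_line(pattern: str, ca_pub: str) -> str:
    """known_hosts entry trusting host certificates signed by this CA."""
    return f"@cert-authority {pattern} {ca_pub.strip()}"
```

On each node, sshd would also need a HostCertificate line in sshd_config pointing at the generated *-cert.pub file, for example one written next to /etc/ssh/ssh_host_ed25519_key.pub.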
