
sr3_tools's Introduction

Sarracenia v3 Data Pump/Cluster Tools (sr3_tools)

sr3_tools are a collection of scripts used to manage data pumps (clusters) running Sarracenia v3 (sr3).


Installation/Setup

To install, clone the repository:

git clone https://github.com/MetPX/sr3_tools.git

Then add the bin directory to your path. Generally this can be done by adding the following lines to ~/.bash_profile or ~/.bashrc, replacing path_to_repo with the location where the repository was cloned:

export PATH=path_to_repo/sr3_tools/bin:$PATH
# install completion script
source path_to_repo/sr3_tools/completion/sr3_tools_completion.bash

dsh is also required. On Ubuntu, it can be installed using:

sudo apt install dsh

Configuration Repository Layout

sr3_tools works in conjunction with a Git repository that contains the Sarracenia configuration files for one or more data pump clusters.

The layout of the repository should be similar to the following:

config_repo_root
├── _dsh_config
│   ├── pump1.list
│   ├── pump1_ssh_config (optional)
│   └── pump2.list
├── pump1
│   ├── cpost
│   ├── credentials.conf
│   ├── default.conf
│   ├── poll
│   ├── post
│   ├── sarra
│   ├── sender
│   ├── shovel
│   ├── subscribe
│   └── winnow
├── pump2
│   ├── cpost
│   ├── credentials.conf
│   ├── default.conf
│   ├── plugins
│   ├── poll
│   ├── post
│   ├── sarra
│   ├── sender
│   ├── shovel
│   ├── subscribe
│   └── winnow
├── .git
└── .gitignore

Files in _dsh_config

These files define the configurations used for dsh and ssh.

  • $pump_name.list is the dsh machine file; a list of hosts to connect to, one per line.

    Example:

  • $pump_name_ssh_config is an SSH client config file. This is optional. When the file exists, it can be used to specify options for the SSH client. See man 5 ssh_config for possible options. The example below uses a jump server to proxy connections to the nodes.

    Example:

    Host *.example.com
        ProxyCommand ssh [email protected] -W %h:%p
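
As an illustration of how the two files in _dsh_config might fit together, the sketch below prints a dsh command line for a given pump. The function name build_dsh_cmd, the example host names and the choice of dsh options are assumptions made for this example; sr3_tools may assemble its command differently.

```shell
# Hypothetical helper (not part of sr3_tools): print the dsh command line
# for one pump.
# Usage: build_dsh_cmd <config_repo_root> <pump_name> <command...>
build_dsh_cmd() (
    repo_root=$1
    pump=$2
    shift 2
    list="$repo_root/_dsh_config/$pump.list"
    ssh_config="$repo_root/_dsh_config/${pump}_ssh_config"

    if [ ! -f "$list" ]; then
        echo "[ERROR] Machine list not found: $list" >&2
        exit 1
    fi

    # -M prefixes each output line with the host name, -c runs the hosts
    # concurrently, -f names the machine file (one host per line).
    cmd="dsh -M -c -f $list"

    # The SSH config is optional: hand it to ssh (-F) only when it exists.
    if [ -f "$ssh_config" ]; then
        cmd="$cmd -o -F$ssh_config"
    fi

    echo "$cmd -- $*"
)
```

A machine list for this sketch could be as small as two lines, for example node1.example.com and node2.example.com (made-up names). Calling build_dsh_cmd ~/config_repo pump1 uptime would then print the command without running it.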
    

Set up config repo on each node

When setting up each node, the config repository should be cloned somewhere (possibly to ~), then the correct config subdirectory (e.g. pump1) should be symlinked to ~/.config/sr3.

Plugins as a Git repo

Plugin code can be managed in a separate Git repo from the configuration files. In this scenario, the plugins repo would be cloned to each node separately from the config repo and symlinked to ~/.config/sr3/plugins. sr3_pull will run git pull on the plugins directory if a .git directory exists inside it.
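
The conditional pull described above can be sketched as follows. This is a minimal illustration, not the code used by sr3_pull; the function name pull_plugins_if_repo and its messages are made up for the example.

```shell
# Hypothetical helper (not part of sr3_tools): pull the plugins repo only
# when the plugins directory is a Git checkout.
# Usage: pull_plugins_if_repo [plugins_dir]
pull_plugins_if_repo() (
    plugins_dir=${1:-"$HOME/.config/sr3/plugins"}
    if [ -d "$plugins_dir/.git" ]; then
        echo "Pulling plugins repo in $plugins_dir"
        git -C "$plugins_dir" pull
    else
        echo "No .git directory in $plugins_dir, skipping plugins pull"
    fi
)
```

With this approach, a plugins directory that is simply part of the config repo is left alone, since it has no .git directory of its own.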

Example initial set up on nodes

Repeat for each node in the data pump:

cd ~
git clone https://git.example.com/config_repo.git
git clone https://git.example.com/plugins_repo.git

# Make sure the parent directory exists before creating the symlinks
mkdir -p ~/.config
ln -s ~/config_repo/pump1 ~/.config/sr3
ln -s ~/plugins_repo ~/.config/sr3/plugins

# Optional - check out a particular branch of the plugins repo
cd ~/.config/sr3/plugins
git checkout dev
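
After the initial setup, it can be useful to confirm that a symlink points where it should. The sketch below is one possible check; check_sr3_link is a hypothetical name and is not provided by sr3_tools.

```shell
# Hypothetical check (not part of sr3_tools): verify that a symlink exists
# and points at the expected target.
# Usage: check_sr3_link <link_path> <expected_target>
check_sr3_link() (
    link=$1
    expected=$2
    if [ ! -L "$link" ]; then
        echo "not a symlink: $link"
        exit 1
    fi
    target=$(readlink "$link")
    if [ "$target" = "$expected" ]; then
        echo "ok: $link -> $target"
    else
        echo "unexpected target: $link -> $target"
        exit 1
    fi
)
```

For the layout above, the call would be check_sr3_link "$HOME/.config/sr3" "$HOME/config_repo/pump1".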

Command Descriptions

sr3d

Usage: sr3d [ -h ] (convert|declare|devsnap|dump|edit|log|restart|sanity|setup|show|status|overview|stop)

"sr3 distributed" runs sr3 on each node, with all command line arguments passed to sr3.

See man sr3.

Special Cases:

  • convert: converts the config from v2 to sr3 on the first node in the cluster, then copies the sr3 config to your workstation and removes the config from the node. The user needs to manually disable the v2 config, commit the sr3 config, and run sr3_pull to update the cluster.
  • remove: runs sr3 remove ... on the cluster and deletes the config from your Git repo.

Examples:

  • sr3d start subscribe/my_config
  • sr3d status poll/my_poll

sr3_commit

Usage: sr3_commit <pathspec>... [-m <msg>]

Simplifies the sequence of git pull, git add <pathspec>, git commit [-m <msg>], git push origin into one step. Recommended replacement for sr(3)_push.

Examples:

  • sr3_commit poll/poll_some_source.conf sarra/get_some_source.conf sender/send_to_server.conf -m "Add a new data feed"
  • sr3_commit my_config.conf

Run sr3_pull afterwards to pull the latest configs to the nodes.
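
To make the sequence concrete, the dry-run sketch below prints the Git commands that correspond to one sr3_commit call, in the order listed above. The function show_commit_steps is hypothetical and only prints; it is not how sr3_commit is implemented, and paths containing spaces are not handled.

```shell
# Hypothetical dry run (not part of sr3_tools): print the Git commands that
# one commit step corresponds to, without running them.
# Usage: show_commit_steps <pathspec>... [-m <msg>]
show_commit_steps() (
    msg=""
    paths=""
    while [ $# -gt 0 ]; do
        case $1 in
            -m)
                msg=$2
                shift 2 || exit 1
                ;;
            *)
                paths="$paths $1"
                shift
                ;;
        esac
    done

    if [ -z "$paths" ]; then
        echo "Usage: show_commit_steps <pathspec>... [-m <msg>]" >&2
        exit 1
    fi

    echo "git pull"
    echo "git add$paths"
    if [ -n "$msg" ]; then
        echo "git commit -m \"$msg\""
    else
        # Without -m, Git opens an editor for the commit message.
        echo "git commit"
    fi
    echo "git push origin"
)
```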


sr3_pull

Does a git pull on each node in the cluster to update the local configs.


sr3_push

Usage: sr3_push file_name ["Commit message"]

"Pushes" a config file by 1) committing the file to Git and 2) running sr3_pull to update the configs on each node.

This exists to provide a familiar workflow for people used to using sr_push, but using Git branches, merging and sr3_pull is preferred. The Git workflow supports changing multiple files at once.

The commit message is optional. If no message is passed on the command line, an editor will open where you can type the commit message.

Examples:

  • sr3_push new_config.conf "AA - description of change"
  • sr3_push another_config.conf Commit message
  • sr3_push some_file.conf

sr3l

Usage: sr3l your_command

"sr3 log" is sr3r with cd ~/.cache/sr3/log run before your command. It is used for searching through logs on all nodes, typically in combination with grep or tail.

Try to be as specific as possible when grepping, e.g. search within sender*my_config*.log rather than sender*.log.

Examples:

  • sr3l grep looking_for_this_filename sender*my_config*.log
  • sr3l tail -n 2 sarra*.log
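
The relationship between sr3l and sr3r can be pictured as a simple prefix on the remote command. The sketch below only builds the command string; build_log_cmd is a hypothetical name, and the real script may join the two parts differently.

```shell
# Hypothetical helper (not part of sr3_tools): build the command string that
# would run on each node, changing to the log directory first.
build_log_cmd() {
    # The tilde stays literal here so that it is expanded on the remote node.
    echo "cd ~/.cache/sr3/log && $*"
}

# The pattern is quoted so the local shell does not expand it.
build_log_cmd tail -n 2 'sarra*.log'
# prints: cd ~/.cache/sr3/log && tail -n 2 sarra*.log
```

Note that an unquoted pattern such as sarra*.log is expanded by the local shell first if matching files happen to exist in the current working directory.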

sr3r

Usage: sr3r your_command

"sr3 run" executes a shell command on all nodes in the cluster.


sr3_scp

Usage: sr3_scp user@server:/remote/file /local/file

Like sr3_ssh, but for scp.


sr3_ssh

Usage: sr3_ssh user@server

"sr3_ssh" SSHes to a remote server, using the same SSH config file that is used for sr3r. This is essentially a shortcut to SSH to a server using the same proxy config that would be used for dsh with the other commands.


Environment Variable Options

Specific environment variables can be used to set options for sr3_tools.

SR3TOOLS_DSH_ARGS

Passes additional arguments to dsh. See the dsh man page for full details.

Example: SR3TOOLS_DSH_ARGS="--remoteshellopt -vvv"

SR3TOOLS_COLOUR_CMDS

Used to force certain commands to output in colour. Replaces all instances of $cmd with $cmd --color=always.

Example: export SR3TOOLS_COLOUR_CMDS="grep ls" will always colourize the output of grep and ls.
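
The replacement can be pictured with the sketch below, which applies a plain substring substitution for each listed command. The function colourize_cmd is hypothetical; the actual matching rules in sr3_tools may be stricter than this.

```shell
# Hypothetical helper (not part of sr3_tools): append --color=always to every
# occurrence of each command named in SR3TOOLS_COLOUR_CMDS.
# This is a naive substring replacement, so a name such as "ls" would also
# match inside longer words.
colourize_cmd() (
    cmd=$1
    for name in $SR3TOOLS_COLOUR_CMDS; do
        cmd=$(printf '%s\n' "$cmd" | sed "s/$name/$name --color=always/g")
    done
    printf '%s\n' "$cmd"
)

SR3TOOLS_COLOUR_CMDS="grep ls"
colourize_cmd "grep ERROR sender_my_config.log"
# prints: grep --color=always ERROR sender_my_config.log
```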


Completion Script

A Bash completion script is included. This script allows automatic completion of arguments for some of the sr3_tools commands.

sr3_tools's People

Contributors

reidsunderland, tysonkaufmann


sr3_tools's Issues

sr3d not working when only options specified

How to replicate

Run sr3d without any commands, only options

sr3d --version
Could not determine which argument contained the action.
Actions are: cleanup declare disable enable list remove restart run sanity setup show start status stop

It only fails when no action is specified. With an action, options are accepted:

sr3d --logStdout status

The command above works.

How to manage configs on systems that don't have access to the Git repo?

We need a way to push the repo to the nodes instead of having the nodes pull it. Maybe using rsync/scp?

This could also be used as a fallback method when the Git server is down.

Ideally a completely clean repo would be pushed out with only the main branch. Any local branches should be ignored.

`sr3_pull` overwrites any local changes on the nodes

sr3_pull currently does a git reset --hard HEAD before pulling the config and plugins repos. This was intentional, I wanted to overwrite local changes that had been made and forgotten about, and I wanted to ensure the configs were identical across all cluster nodes and reflected the remote repo.

But when multiple people are working on stuff, sometimes the local changes need to stay around for a while.

Options:

  • do a git stash --include-untracked before pulling. This would let someone come back and recover their changes. Problem: need a way to get rid of old stashes that aren't needed.
  • warn the user that the local repos are not clean and abort. Provide a --force or similar option to force overwrite local changes.
  • other options?
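
The second option (warn and abort unless forced) could look roughly like the sketch below. It only makes the decision and prints it; the function name ensure_clean_or_force and the --force flag are hypothetical and do not exist in sr3_pull today.

```shell
# Hypothetical guard (not part of sr3_tools): refuse to overwrite local
# changes unless --force is given.
# Usage: ensure_clean_or_force <repo_dir> [--force]
ensure_clean_or_force() (
    repo=$1
    force=${2:-}

    # --porcelain prints one line per modified or untracked file and nothing
    # at all when the working tree is clean.
    changes=$(git -C "$repo" status --porcelain) || exit 2

    if [ -z "$changes" ]; then
        echo "clean"
    elif [ "$force" = "--force" ]; then
        # A real implementation would reset and pull at this point.
        echo "dirty, overwriting (--force)"
    else
        echo "dirty, aborting" >&2
        exit 1
    fi
)
```

For the first option, the same check could instead be followed by git stash --include-untracked, which leaves the question of cleaning up old stashes open.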

add sr3d overview ?

When everything is OK, the old v2 format of sr status is much more compact than the current sr3 status.
People sometimes prefer that more compact format. It is still available, now called "overview".

It might be good to add support for the overview command to sr3d so people can get that view if they want it.
The workaround is trivial: sr3l sr3 overview.

An easy way to search the Git history for deleted files

In v2, when turning off configs, we moved the .conf file to .conf.off. This makes it easy to find and reactivate disabled configs.

In sr3, they're deleted and the history is only in Git. It would be nice to have a way to display all removed config files in a directory, and even grep through all removed configs.
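
One way to approach this with plain Git is shown in the sketch below, which lists every file that was ever deleted under a directory. The function name list_deleted_files is hypothetical; it has to be run from inside the config repository.

```shell
# Hypothetical helper (not part of sr3_tools): list files that were deleted
# under a directory, according to the Git history of the current repository.
# Usage: list_deleted_files [directory]
list_deleted_files() (
    dir=${1:-.}
    # --diff-filter=D keeps only deletions; the empty pretty format leaves
    # just the file names, separated by blank lines that sed removes.
    git log --diff-filter=D --name-only --pretty=format: -- "$dir" |
        sed '/^$/d' | sort -u
)
```

To read a removed config, git log --diff-filter=D -- <path> shows the commit that deleted it, and git show <commit>^:<path> prints the last version of the file before that commit.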

Some sr3/sr3d actions don't make sense on a cluster

While working on #1 I realized that some of the possible actions for sr3 don't make sense on a cluster.

sr3d currently doesn't do any kind of filtering. Maybe it should?

  • edit: if this were to do anything, it should edit the local copy of the config (and maybe sr3_commit it). But it shouldn't run on the remote nodes.
  • add
  • remove: maybe this should alias to sr3_remove? sr3_remove currently only supports one config at a time
  • convert
  • foreground
  • run

`sr3d convert` include files

I realized I don't know how sr3 convert handles include files, and sr3d convert definitely just ignores them.

This should be fixed - if sr3 convert converts include files, then sr3d needs to be updated to copy those converted include files to the workstation so they can be committed.

If sr3 convert ignores include files, we should figure out how we want to handle include files.

Can't remove include files with `sr3d remove`

sr3d remove (and sr3 remove) currently doesn't remove include files.

sr3d remove sender/myinclude.inc
Data Pump Name: pumpname. DSH Machine List: /net/local/home/sunderlandr/sr3-stg-config/_dsh_config/pumpname.list
ddsr-cmc-stage01: 2024-01-05 14:56:52,735 33334 [ERROR] root remove No configuration matched
ddsr-cmc-stage01:
ddsr-cmc-stage02:
ddsr-cmc-stage02: 2024-01-05 14:56:54,536 13028 [ERROR] root remove No configuration matched
[ERROR] File does not exist: /net/local/home/sunderlandr/sr3-stg-config/pumpname/sender/myinclude.inc.conf

  1. It would be nice if we checked if anything was using the include file and not allow it to be removed if any conf files include it.
  2. I don't think sr3 needs to do anything, this could be implemented only with sr3_tools, but maybe we don't want to have that inconsistency?
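
The check proposed in point 1 could be sketched as below. It assumes that configs reference include files with a line of the form include <file>; the function name find_include_users is hypothetical, and the include file name is used as a regular expression, so a dot matches any character.

```shell
# Hypothetical check (not part of sr3_tools): list the .conf files under a
# directory that reference a given include file.
# Usage: find_include_users <config_dir> <include_file_name>
find_include_users() (
    config_dir=$1
    include_name=$2
    # grep -l prints only the names of the files that contain a match.
    find "$config_dir" -name '*.conf' -exec \
        grep -l -E "^[[:space:]]*include[[:space:]]+.*$include_name" {} +
)
```

A remove command could refuse to delete the include file whenever this prints at least one config.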

sr3_commit, likely other tools don't work when on a local branch

sr3_commit and other tools try to run git pull before committing changes to Git. This causes an error when there's no remote branch (yet).

~/sr3-dev-config/ddsr-dev$ sr3_commit flow/hydro_quebec_http.conf sarra/get_hydro_quebec_http.conf "Create new feed for Hydro Quebec scheduled poll"
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=origin/<branch> 1617_flow_sarra_hydro_quebec_http

[ERROR] Problem updating local repository. Please resolve manually and retry.
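
A possible fix is to skip the pull when the current branch has no upstream yet. The sketch below shows that check; has_upstream and pull_if_tracking are hypothetical names, not existing sr3_tools functions.

```shell
# Hypothetical helper (not part of sr3_tools): succeed only when the current
# branch of the given repository has an upstream (tracking) branch.
has_upstream() (
    repo=${1:-.}
    git -C "$repo" rev-parse --abbrev-ref --symbolic-full-name '@{u}' \
        >/dev/null 2>&1
)

# Hypothetical helper: run git pull only when there is something to pull from.
pull_if_tracking() (
    repo=${1:-.}
    if has_upstream "$repo"; then
        git -C "$repo" pull
    else
        echo "No upstream for current branch, skipping git pull"
    fi
)
```

The first push of such a branch would then need git push --set-upstream origin <branch> so that later pulls have tracking information.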

Improve output of sr3_pull

When running sr3_pull, it can be hard to tell which repository is being pulled. At first glance, it's not obvious what is being updated for which repo.

server02: HEAD is now at 6b917ff Merge branch 'issue1677_airnow_bc1' into 'main'
server06: HEAD is now at 6b917ff Merge branch 'issue1677_airnow_bc1' into 'main'
server05: HEAD is now at 6b917ff Merge branch 'issue1677_airnow_bc1' into 'main'
server03: HEAD is now at 6b917ff Merge branch 'issue1677_airnow_bc1' into 'main'
server08: HEAD is now at 6b917ff Merge branch 'issue1677_airnow_bc1' into 'main'
server07: HEAD is now at 6b917ff Merge branch 'issue1677_airnow_bc1' into 'main'
server04: HEAD is now at 6b917ff Merge branch 'issue1677_airnow_bc1' into 'main'
server01: HEAD is now at 6b917ff Merge branch 'issue1677_airnow_bc1' into 'main'
server06: Already up to date.
server02: Already up to date.
server06: HEAD is now at fcae539 Use relPath instead of new_file
server02: HEAD is now at fcae539 Use relPath instead of new_file
server05: Already up to date.
server05: HEAD is now at fcae539 Use relPath instead of new_file
server03: Already up to date.
server04: Already up to date.
server07: Already up to date.
server08: Already up to date.
server03: HEAD is now at fcae539 Use relPath instead of new_file
server04: HEAD is now at fcae539 Use relPath instead of new_file
server07: HEAD is now at fcae539 Use relPath instead of new_file
server01: Already up to date.
server08: HEAD is now at fcae539 Use relPath instead of new_file
server01: HEAD is now at fcae539 Use relPath instead of new_file
server06: Already up to date.
server02: Already up to date.
server07: Already up to date.
server05: Already up to date.
server08: Already up to date.
server03: Already up to date.
server01: Already up to date.
server04: Already up to date.

Maybe we could add a prefix stating which repository is being pulled before the commit message?
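
One simple way to do that is to pipe the output of each pull through a filter that adds a label. The sketch below assumes labels such as config and plugins; prefix_lines is a hypothetical helper.

```shell
# Hypothetical helper (not part of sr3_tools): prefix every line read from
# standard input with a label in square brackets.
# Usage: some_command | prefix_lines <label>
prefix_lines() {
    sed "s/^/[$1] /"
}

printf 'Already up to date.\n' | prefix_lines plugins
# prints: [plugins] Already up to date.
```

On a node this might be used as git pull 2>&1 | prefix_lines config, so that each line shows both the host name added by dsh and the repository it belongs to.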

Bash completion script

It would be nice to have tab autocompletion for sr3_commit, following the way that git add completion works.

sr3l could complete the log file name based on the current directory and config files in it:

$ pwd
.../sender
$ ls
cfg1.conf include1.inc myconfig.conf
$ sr3l grep keyword [TAB]
# should fill sender_
$ sr3l grep keyword sender_
$ sr3l grep keyword sender_[TAB][TAB]
cfg1 myconfig
$ sr3l grep keyword sender_m[TAB]
# should fill myconfig (or maybe myconfig*)
$ sr3l grep keyword sender_myconfig

Configs can't be managed when Git is offline

In v2, sr_push would commit the config file to git and scp it to each node. This allowed configs to be pushed when the Git server is down.

With sr3_tools, the servers pull configs from the Git server, and there's no automated way to update the configs on the nodes if the server isn't available. In emergencies, users could manually edit the configs on each node, or scp changes by hand.

A possible solution would be to re-write sr3_push to rsync the Git repo from the workstation to all the nodes in the cluster. Some extra work would need to be done to avoid rsyncing gitignored files and local branches.
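
As a rough sketch of that idea, the dry run below prints one rsync command per host in a dsh machine list, using an rsync filter rule that excludes files matched by .gitignore. The function name show_rsync_cmds and the example paths are assumptions; the sketch does not deal with local branches, which would still be copied as part of the .git directory.

```shell
# Hypothetical dry run (not part of sr3_tools): print the rsync command for
# each host in a machine list, without running anything.
# Usage: show_rsync_cmds <machine_list> <local_repo_dir> <remote_dir>
show_rsync_cmds() (
    list=$1
    src=$2
    dest=$3
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines in the machine list.
        [ -n "$host" ] || continue
        # ':- .gitignore' tells rsync to read exclude patterns from the
        # .gitignore file found in each directory.
        echo "rsync -a --delete --filter=':- .gitignore' $src/ $host:$dest/"
    done < "$list"
)
```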

sr3_ssh accept numerical argument for node number...

I have a hard time remembering node names (seen a lot of clusters in my day. ;-) so I have this consistent pattern, I do:


sr3l uname -a


to get the node names, and then run sr3_ssh with the hostname I get from that to pick a node to log into. I was thinking... I don't actually care what the hostnames are. I would much prefer it if I could just pick the first node with:


sr3_ssh 1


and not have to enter the actual hostname of the first node. It could just iterate through the dsh configs...
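
A minimal sketch of the lookup, assuming the node number is a 1-based line number in the pump's dsh machine list. The function name nth_host is hypothetical.

```shell
# Hypothetical helper (not part of sr3_tools): print the Nth host (1-based)
# from a dsh machine list.
# Usage: nth_host <machine_list> <n>
nth_host() (
    list=$1
    n=$2
    case $n in
        ''|*[!0-9]*)
            echo "[ERROR] Not a node number: $n" >&2
            exit 1
            ;;
    esac
    host=$(sed -n "${n}p" "$list" 2>/dev/null)
    if [ -z "$host" ]; then
        echo "[ERROR] No node $n in $list" >&2
        exit 1
    fi
    echo "$host"
)
```

sr3_ssh could then treat a purely numeric argument as a node number and pass the result of nth_host to ssh, while any other argument is used as a hostname as before.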
