manta's Introduction

MANTA

Another CLI tool for Alps.

Manta is a frontend CLI for interacting with Shasta; it uses mesa for all Shasta interaction.

Manta's goals:

  • release operators from repetitive tasks.
  • provide quick system feedback.

Manta aggregates information from multiple sources:

  • Shasta Keycloak
  • Shasta API
  • Shasta K8s API
  • local git repo
  • Gitea API (Shasta VCS)
  • Hashicorp Vault

Features

  • List and filter CFS configurations based on cluster name or configuration name
  • List and filter CFS sessions based on cluster name or session name
  • List and filter BOS session templates based on cluster name or session name
  • List nodes in HSM groups
  • List hw configuration/components
  • Create CFS configuration and session (target dynamic) from local repository
  • Create CFS configuration and session (target image) from CSCS SAT input file
  • Watch logs of a CFS session
  • Connect to a node's console
  • Power On/Off or restart nodes individually, in a list or per cluster
  • Restrict operations to nodes belonging to a specific HSM group
  • Filter information to a HSM group
  • Update node boot image based on CFS configuration name
  • Audit/Log
  • Delete all data related to CFS configuration
  • Migrate nodes from HSM group based on hw components profile

Configuration

Manta needs a configuration file in ${HOME}/.config/manta/config.toml like the one shown below:

log = "info"

site = "alps"
hsm_group = "psi-dev"

[sites]

[sites.alps]
socks5_proxy = "socks5h://127.0.0.1:1080"
shasta_base_url = "https://api.cmn.alps.cscs.ch/apis"
keycloak_base_url = "https://api.cmn.alps.cscs.ch/keycloak"
gitea_base_url = "https://api.cmn.alps.cscs.ch/vcs"
k8s_api_url = "https://10.252.1.12:6442"
vault_base_url = "https://hashicorp-vault.cscs.ch:8200"
vault_secret_path = "shasta"
vault_role_id = "b15517de-cabb-06ba-af98-633d216c6d99" # vault in hashicorp-vault.cscs.ch

Manta can log user operations in /var/log/manta/ (Linux) or ${PWD} (MacOS). Please make sure this folder exists and the current user has rwx access to it:

mkdir /var/log/manta
chmod 777 -R /var/log/manta

Legend:

| Name | Mandatory | Type | Description | Example |
|------|-----------|------|-------------|---------|
| MANTA_CSM_TOKEN | no | env | CSM authentication token. If this env var is missing, manta will prompt the user for credentials against CSM Keycloak | |
| log | no | config file | Log details/verbosity | off/error/warn/info/debug/trace |
| hsm_group | no | config file | If it exists, it filters/restricts the HSM groups and/or xnames targeted by the CLI command | psi-dev |
| site | yes | config file | CSM instance manta communicates with. Requires the right site in the "sites" section | alps |
| sites.site_name.socks5_proxy | yes | config file | SOCKS proxy to access the services (only needed if using manta from outside a Shasta management node. Needs VPN. Your VPN IP must be opened in the Hashicorp Vault approle) | socks5h://127.0.0.1:1080 |
| sites.site_name.keycloak_base_url | yes | config file | Keycloak base URL for authentication | https://api.cmn.alps.cscs.ch/keycloak |
| sites.site_name.gitea_base_url | yes | config file | Gitea base URL to fetch CFS layers git repo details | https://api.cmn.alps.cscs.ch/vcs |
| sites.site_name.k8s_api_url | yes | config file | Shasta k8s API URL | https://10.252.1.12:6442 |
| sites.site_name.vault_base_url | yes | config file | Hashicorp Vault base URL storing secrets to authenticate to external services | https://hashicorp-vault.cscs.ch |
| sites.site_name.vault_role_id | yes | config file | Role id related to Hashicorp Vault approle authentication | b15517de-cabb-06ba-af98-633d216c6d99 |
| sites.site_name.vault_secret_path | yes | config file | Path in Vault to find secrets | shasta |
| sites.site_name.shasta_base_url | yes | config file | Shasta API base URL for Shasta related jobs submission | https://api-gw-service-nmn.local/apis |

A note on certificates

Manta expects the CA of the CSM endpoint in PEM format in a file named <SITE>_root_cert.pem under ${HOME}/.config/manta (Linux) or ${HOME}/Library/Application\ Support/local.cscs.manta (MacOS). Please make sure the file contains just one CA. On MacOS, if the file contains more than one certificate and the native-tls module is used, the following part of the security-framework crate will break Manta:

#[cfg(not(target_os = "ios"))]
pub fn from_pem(buf: &[u8]) -> Result<Certificate, Error> {
    let mut items = SecItems::default();
    ImportOptions::new().items(&mut items).import(buf)?;
    if items.certificates.len() == 1 && items.identities.is_empty() && items.keys.is_empty() {
        Ok(Certificate(items.certificates.pop().unwrap()))
    } else {
        Err(Error(base::Error::from(errSecParam)))
    }
}

The error message thrown is usually difficult to interpret and is something like:

thread 'main' panicked at <somepath>/mesa/src/shasta/authentication.rs:65:10:
called `Result::unwrap()` on an `Err` value: reqwest::Error { kind: Builder, source: Error { code: -50, message: "One or more parameters passed to a function were not valid." } }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It's easy to determine how many certs are in the file with openssl:

while openssl x509 -noout -subject; do :; done < ~/.config/manta/alps_root_cert.2certsin1.pem
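The same check could also be done programmatically before the TLS client is built, so the user gets a readable message instead of the panic shown above. The following is only a minimal sketch and is not part of manta or mesa; the function names `count_pem_certificates` and `check_single_ca` are hypothetical, and the scan only counts PEM markers without validating the certificates.

```rust
// Count the certificates in a PEM bundle by counting BEGIN markers.
// This is a plain text scan; it does not validate the certificates.
fn count_pem_certificates(pem: &str) -> usize {
    pem.lines()
        .filter(|line| line.trim() == "-----BEGIN CERTIFICATE-----")
        .count()
}

// Hypothetical pre-flight check: accept only bundles with exactly one CA,
// which is what the security-framework code quoted above requires.
fn check_single_ca(pem: &str) -> Result<(), String> {
    match count_pem_certificates(pem) {
        1 => Ok(()),
        n => Err(format!("expected exactly 1 certificate, found {n}")),
    }
}
```

A caller would read the `<SITE>_root_cert.pem` file into a string and pass it to `check_single_ca` before handing the bytes to the TLS library.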

Example

Get latest (most recent) session

$ manta get session --most-recent
+----------------------------------------------+-------------------------+---------+---------------+---------------+---------------------+----------+-----------+------------------------------------------+
| Name                                         | Configuration           | Target  | Target groups | Ansible limit | Start               | Status   | Succeeded | Job                                      |
+==========================================================================================================================================================================================================+
| batcher-bab0cd68-5c61-4774-a685-bd57f744f62d | eiger-cos-config-3.0.24 | dynamic |               | x1002c6s6b0n0 | 2022-10-29T15:50:19 | complete | true      | cfs-cd39e25e-5b66-4ee9-be1c-027f5cd00683 |
+----------------------------------------------+-------------------------+---------+---------------+---------------+---------------------+----------+-----------+------------------------------------------+

Get logs for a session/layer

$ manta log --session-name batcher-cef892ee-39af-444a-b32c-89478a100e4d --layer-id 0
[2022-09-27T12:41:49Z INFO  manta::shasta_cfs_session_logs::client] Pod name: "cfs-b49cdc2b-d6cb-4477-b502-6be479472546-2jrlg"
Waiting for Inventory
Waiting for Inventory
Waiting for Inventory
Waiting for Inventory
Waiting for Inventory
Waiting for Inventory
Waiting for Inventory
Inventory generation completed
SSH keys migrated to /root/.ssh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
HTTP/1.1 200 OK
content-type: text/html; charset=UTF-8
cache-control: no-cache, max-age=0
x-content-type-options: nosniff
date: Tue, 27 Sep 2022 12:18:16 GMT
server: envoy
transfer-encoding: chunked

Sidecar available
[WARNING]: Invalid characters were found in group names but not replaced, use
-vvvv to see details

PLAY [Compute] *****************************************************************

PLAY [Application] *************************************************************
skipping: no hosts matched

PLAY [Management_Worker] *******************************************************
skipping: no hosts matched

PLAY RECAP *********************************************************************
x1500c7s2b0n0              : ok=1    changed=0    unreachable=0    failed=0    skipped=33   rescued=0    ignored=0

Create a CFS session and watch logs

$ manta apply session --repo-path /home/msopena/ownCloud/Documents/ALPSINFRA/vcluster_shasta_scripts/muttler/muttler_orchestrator/ --watch-logs --ansible-limit x1500c3s4b0n1
[2022-10-08T22:56:31Z INFO  manta::create_session_from_repo] Checking repo /home/msopena/ownCloud/Documents/ALPSINFRA/vcluster_shasta_scripts/muttler/muttler_orchestrator/.git/ status
[2022-10-08T22:56:32Z INFO  manta::create_session_from_repo] CFS configuration name: m-muttler-orchestrator
[2022-10-08T22:56:35Z INFO  manta::create_session_from_repo] CFS session name: m-muttler-orchestrator-20221008225632
[2022-10-08T22:56:35Z INFO  manta] cfs session: m-muttler-orchestrator-20221008225632
[2022-10-08T22:56:35Z INFO  manta] Fetching logs ...
[2022-10-08T22:56:35Z INFO  manta::shasta_cfs_session_logs::client] Pod for cfs session m-muttler-orchestrator-20221008225632 not ready. Trying again in 2 secs. Attempt 1 of 10
[2022-10-08T22:56:38Z INFO  manta::shasta_cfs_session_logs::client] Pod name: cfs-f1588924-f791-4bb8-a565-f61563a4274b-n7bbn
[2022-10-08T22:56:38Z INFO  manta::shasta_cfs_session_logs::client] Container ansible-0 not ready. Trying again in 2 secs. Attempt 1 of 10
[2022-10-08T22:56:40Z INFO  manta::shasta_cfs_session_logs::client] Container ansible-0 not ready. Trying again in 2 secs. Attempt 2 of 10
[2022-10-08T22:56:42Z INFO  manta::shasta_cfs_session_logs::client] Container ansible-0 not ready. Trying again in 2 secs. Attempt 3 of 10
Waiting for Inventory
Waiting for Inventory
Inventory generation completed
SSH keys migrated to /root/.ssh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
HTTP/1.1 200 OK
content-type: text/html; charset=UTF-8
cache-control: no-cache, max-age=0
x-content-type-options: nosniff
date: Sat, 08 Oct 2022 22:56:49 GMT
server: envoy
transfer-encoding: chunked

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Sidecar available
[WARNING]: Invalid characters were found in group names but not replaced, use
-vvvv to see details

PLAY [Compute:Application] *****************************************************

PLAY RECAP *********************************************************************
x1500c3s4b0n1              : ok=8    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Create an interactive session to a node

$ manta console x1500c2s4b0n1
[2022-10-30T02:14:44Z INFO  manta::node_console] Alternatively run - kubectl -n services exec -it cray-console-node-2 -c cray-console-node -- conman -j x1500c2s4b0n1
[2022-10-30T02:14:44Z INFO  manta::node_console] Connecting to console x1500c2s4b0n1
Connected to x1500c2s4b0n1!
Use &. key combination to exit the console.

<ConMan> Connection to console [x1500c2s4b0n1] opened.

<ConMan> Console [x1500c2s4b0n1] joined with <nobody@localhost> on pts/452 at 10-30 02:14.

<ConMan> Console [x1500c2s4b0n1] joined with <nobody@localhost> on pts/453 at 10-30 02:14.

<ConMan> Console [x1500c2s4b0n1] joined with <nobody@localhost> on pts/454 at 10-30 02:14.

<ConMan> Console [x1500c2s4b0n1] joined with <nobody@localhost> on pts/455 at 10-30 02:14.

<ConMan> Console [x1500c2s4b0n1] joined with <nobody@localhost> on pts/468 at 10-30 02:14.

<ConMan> Console [x1500c2s4b0n1] joined with <nobody@localhost> on pts/510 at 10-30 02:14.

<ConMan> Console [x1500c2s4b0n1] joined with <nobody@localhost> on pts/511 at 10-30 02:14.

nid003129 login:

Power off a node

$ manta apply node off --force "x1004c1s4b0n1"

Power on a node

$ manta apply node on "x1004c1s4b0n1"

Deployment

Prerequisites

Install build dependencies

$ cargo install cargo-release cargo-dist git-cliff

Build container image

This repo contains a Dockerfile to build a container image with the manta CLI.

docker build -t manta .

Run

$ docker run -it --network=host -v ~:/root/ manta --help

Build from sources

Install Rust toolchain https://www.rust-lang.org/tools/install

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install cross to be able to compile on different platforms

cargo install cross

Generate binary (cross compilation)

scripts/build

or

rustup target add x86_64-unknown-linux-gnu
cargo build --target=x86_64-unknown-linux-gnu

Development

Prerequisites

Install 'cargo dist' and 'cargo release'

cargo install cargo-dist
cargo install cargo-release

Configure cargo-dist. Accept default options and only target linux assets

cargo dist init -t $(uname -m)-unknown-$(uname -s | tr '[:upper:]' '[:lower:]')-gnu

Then remove the assets for macos and windows

Make sure a github workflow is created in .github/workflows/release.yml

Deployment

This project is already integrated with GitHub Actions through 'cargo release' and 'git cliff'.

git cliff will parse your commits and update the CHANGELOG.md file automatically, as long as your commits follow conventional commits and the git cliff extra commit types.

cargo release <bump level> --execute

Choose your bump level accordingly.

If everything went well, the binary should be located in manta/target/x86_64-unknown-linux-gnu/release/manta

Profiling

Enable capabilities

sudo sysctl -w kernel.perf_event_paranoid=-1

Install perf

sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

Grant access to kernel address map

sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"

Create perf data

perf stat -ad -r 100 target/release/manta get session

Identify bottlenecks and get hotspots for those events

perf record -g --call-graph=dwarf -F max target/release/manta get session

Convert the perf data file to a format the Firefox Profiler understands

perf script -F +pid > manta.perf

Go to https://profiler.firefox.com/ and open manta.perf file

DHAT memory allocation profiling

See https://docs.rs/dhat/latest/dhat/. Note that lto in Cargo.toml needs to be disabled.

Run

cargo run -r --features dhat-heap -- get session

View results (dhat-heap.json file)

https://nnethercote.github.io/dh_view/dh_view.html

manta's People

Contributors

masber, miguelgila

manta's Issues

BUG: manta get CFS configuration details should fetch commit sha when resolving annotated tags

When manta gets the details of a CFS configuration, it fetches the ref details. If the ref is an annotated tag, it shows the tag SHA, which is different from the SHA of the commit it points to. This is default git behavior, since annotated tags are objects themselves. However, the manta get configuration output is confusing because the tag SHA is different from the commit id.
On the operational side this should not have major implications, since CFS sessions do a git checkout <SHA> to switch branches and, if <SHA> is an annotated tag, git will resolve it to the commit it points to automatically.
Probably.

FEAT: store multiple auth token for each site

The auth token is stored in a single file.
This means the user needs to authenticate again every time the site is changed.
Make manta able to manage multiple auth tokens, one for each site, independently.
This will help when migrating.
Should each token exist in a different file, or in a single file with a struct inside?

example:

~/.cache/manta//http

FEATURE: add values to expand SAT template file through CLI params

A new feature was recently added to manta to template a SAT file with values defined in a session vars file.
An improvement would be to give the option to override values from the values file by passing them through the CLI.

example:
manta apply cluster -f my_sat_file_template.yaml -V my_values.yaml --var "config.name="

The command above would overwrite the version value in my_values.yaml to v1.0.3. Then expand the my_sat_file_template.yaml

Having:

# cat my_values.yaml
---
hsm:
  group_name: "zinal_cta"
config:
  name: "test-config"
  version: "v1.0.2"
image:
  version: "v1.0.6"
bos_st:
  name: "deploy-cluster-action"
  version: "v1.0"

And var config.name=new-value

The values in my_values.yaml should update to:

hsm:
  group_name: zinal_cta
config:
  name: new-value
  version: v1.0.2
image:
  version: v1.0.6
bos_st:
  name: deploy-cluster-action
  version: v1.0
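The override step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not manta's implementation: the `Value` enum is a hypothetical stand-in for a parsed YAML document (a real implementation would use a YAML library), and `apply_override` is a hypothetical name. It takes one `key.path=value` string, walks the nested mappings and replaces only the addressed leaf.

```rust
use std::collections::BTreeMap;

// Minimal stand-in for a parsed values file: a value is either a string
// or a nested mapping. A real implementation would use a YAML library.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Map(BTreeMap<String, Value>),
}

// Apply one override of the form "a.b.c=value" to the root mapping,
// creating intermediate mappings when they are missing.
fn apply_override(root: &mut BTreeMap<String, Value>, var: &str) -> Result<(), String> {
    let (path, new_value) = var
        .split_once('=')
        .ok_or_else(|| format!("expected key=value, got '{var}'"))?;
    let mut keys: Vec<&str> = path.split('.').collect();
    // The last path segment names the leaf to set; it must not be empty.
    let last = match keys.pop() {
        Some(key) if !key.is_empty() => key,
        _ => return Err("empty key".to_string()),
    };
    let mut current = root;
    for key in keys {
        let entry = current
            .entry(key.to_string())
            .or_insert_with(|| Value::Map(BTreeMap::new()));
        current = match entry {
            Value::Map(map) => map,
            Value::Str(_) => return Err(format!("'{key}' is not a mapping")),
        };
    }
    current.insert(last.to_string(), Value::Str(new_value.to_string()));
    Ok(())
}
```

With the values file above loaded into `root`, calling `apply_override(&mut root, "config.name=new-value")` changes only `config.name` and leaves every other key as it was.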

FEATURE: move parent HSM group to config file

Currently, HSM hardware inventory management works around the idea of target and parent HSM groups. The algorithm is flexible enough that any HSM group can be used as a parent; however, the code currently has the parent hardcoded to "nodes_free".
Improve the functionality above by adding the parent HSM group as a value in the configuration file:

  • When working with hardware inventory, check that the parent HSM group belongs to the list of HSM groups the user has access to
  • Add a new command manta config set parent-hsm to set a new parent HSM group?

FEATURE: Add subcommand to validate local repo against CSM git

Currently, we use git.cscs.ch to push the ansible scripts configuring the clusters; each of these repos syncs against CSM gitea through a gitlab pipeline.
This process works pretty well. However, there is a delay and external users don't have access to gitea, so it is impossible for them to know when their changes have been synchronized.
This issue is to create a new subcommand manta validate repo <path to local repo> which will compare the git history in a local repo with the git history in gitea.
This will let the user know when it is a good moment to submit a CFS session.

FEATURE: provide summary data when fetching hardware inventory for a cluster

At the moment manta provides a summary at the node level when asked for the hardware inventory related to a cluster,
e.g.

+---------------+-----------+---------------------------------+------------------------------------+
| Node          | 32768 MiB | AMD EPYC 7742 64-Core Processor | SS11 200Gb 2P NIC Mezz REV02 (HSN) |
+==================================================================================================+
| x1003c1s7b0n0 |  ✅ (16)  |              ✅ (2)             |               ✅ (1)               |
|---------------+-----------+---------------------------------+------------------------------------|
| x1003c1s7b0n1 |  ✅ (16)  |              ✅ (2)             |               ✅ (1)               |
|---------------+-----------+---------------------------------+------------------------------------|
| x1003c1s7b1n0 |  ✅ (16)  |              ✅ (2)             |               ✅ (1)               |
|---------------+-----------+---------------------------------+------------------------------------|
| x1003c1s7b1n1 |  ✅ (16)  |              ✅ (2)             |               ✅ (1)               |
+---------------+-----------+---------------------------------+------------------------------------+

It would be better to provide a summary with the counters of each hardware component type within the cluster,
and then provide a parameter to break it down per node, as in the example above.
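The requested cluster-level summary amounts to summing the per-node counters for each component type. A minimal sketch of that aggregation, assuming the per-node inventory is available as a map from component description to quantity (the `NodeInventory` alias and `summarize_cluster` name are hypothetical and do not come from manta or mesa):

```rust
use std::collections::BTreeMap;

// Per-node inventory: component description -> number of units on the node.
type NodeInventory = BTreeMap<String, u32>;

// Sum the per-node counters into one counter per component type.
// The outer map is keyed by node xname.
fn summarize_cluster(nodes: &BTreeMap<String, NodeInventory>) -> BTreeMap<String, u32> {
    let mut totals = BTreeMap::new();
    for inventory in nodes.values() {
        for (component, count) in inventory {
            *totals.entry(component.clone()).or_insert(0) += count;
        }
    }
    totals
}
```

For the four nodes in the table above, each with 16 memory modules and 2 processors, this would report 64 memory modules and 8 processors for the cluster.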

SAT file validation

manta relies on CSM to validate the SAT file. This is confusing for users who are not familiar with how CSM works, especially when using jinja2 SAT files, since the user will mostly interact with the values file.

This ticket is to work on manta SAT file validation to improve the error messages when submitting an incorrect SAT file.

FEAT: manage multiple auth tokens

Currently manta only manages a single token for all sites. This is a problem because the auth token gets rewritten every time the user changes site, so a user working on more than one site at the same time is asked to authenticate on every switch.

This ticket is to make manta manage a token per site like:

Auth token for site1 --> ~/.cache/manta/auth_site1
Auth token for site2 --> ~/.cache/manta/auth_site2
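The per-site layout above could be sketched like this. It is only an illustration: `auth_token_path` and `load_token` are hypothetical names, and giving the token from the environment (for example MANTA_CSM_TOKEN) precedence over the cached file is an assumption, not something this ticket specifies.

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Build the token file path for a site, following the
// <cache dir>/manta/auth_<site> layout proposed above.
fn auth_token_path(cache_dir: &Path, site: &str) -> PathBuf {
    cache_dir.join("manta").join(format!("auth_{site}"))
}

// Resolve the token for a site. Assumption: a token passed through the
// environment wins; otherwise the per-site cache file is read.
// Returns None when neither is available, so the caller can prompt the user.
fn load_token(env_token: Option<String>, cache_dir: &Path, site: &str) -> Option<String> {
    env_token.or_else(|| {
        fs::read_to_string(auth_token_path(cache_dir, site))
            .ok()
            .map(|token| token.trim().to_string())
    })
}
```

Because every site has its own file, switching between site1 and site2 no longer overwrites the other site's token.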

Note: we need to adapt this to env variables
Note: make sure PSI understands this

Use of CSM tags

Although manta does a fairly good job of keeping CSM entities correlated, it is still hard for level 2 and 3 support to find related data. This could be improved by tagging the CSM entities.
Whenever a new CFS configuration, CFS session or BOS sessiontemplate is created, make sure to tag those entities so they can be found.
Tags should be used to replace DATE in SAT files following the users' convention. If the tag is missing, use a timestamp instead; this is easier since manta won't let you create 2 different CFS configurations with the same name.

BUG: homebrew command in Github release page fails on Mac

When trying to install manta through Homebrew on MacOS, a different piece of software is installed:

❯ brew info manta
==> manta: 1.1.4
https://getmanta.app/
Not installed
From: https://github.com/Homebrew/homebrew-cask/blob/HEAD/Casks/m/manta.rb
==> Name
Manta
==> Description
Invoicing desktop app with customizable templates
==> Artifacts
Manta.app (App)
==> Analytics
install: 6 (30 days), 13 (90 days), 55 (365 days)

This is what homebrew has in the manta.rb file:

cask "manta" do
  version "1.1.4"
  sha256 "f980f8d0c233e923a2352fd10521c1a04d059bc15140504bb7d2cfe235838776"

  url "https://github.com/hql287/Manta/releases/download/v#{version}/Manta-#{version}-mac.zip",
      verified: "github.com/hql287/Manta/"
  name "Manta"
  desc "Invoicing desktop app with customizable templates"
  homepage "https://getmanta.app/"

  app "Manta.app"
end

Is Homebrew supported on MacOS?

BUG: BOS sessiontemplate not filtered properly when deleting data

Data deletion works around CFS configurations and HSM groups; for this we need to correlate CFS configurations with BOS sessiontemplates and CFS sessions. There is a bug in manta where BOS sessiontemplates were not filtered properly for the CFS configurations marked for deletion.

BUG: error parsing cli option `ansible-verbosity` of the `apply image` command

error message:

thread 'main' panicked at /home/msopena/.cargo/registry/src/index.crates.io-6f17d22bba15001f/clap_builder-4.5.1/src/parser/error.rs:32:9:
Mismatch between definition and access of `ansible-verbosity`. Could not downcast to TypeId { t: 7428646492878894209665195255548636123 }, need to downcast to TypeId { t: 42966343538335219590177265727833432740 }

FEATURE: Add support for jinja2 templating on SAT file

The SAT file used to deploy clusters is currently a static file, and we would like to add support for jinja2 template features.
An example would be something like:

manta apply cluster -f <SAT file> --session-vars <session vars file>

With SAT file being:

# (C) Copyright 2022-2023 Hewlett Packard Enterprise Development LP
---
schema_version: 1.0.2
configurations:
- name: "{{default.note}}-compute-config-{{default.suffix}}"
  layers:
# The gpu_customize_driver_playbook.yml playbook will install GPU driver and
# SDK/toolkit software into the compute boot image if GPU content is available
# in the expected Nexus repo targets. If GPU content has not been uploaded to
# Nexus this play will be skipped automatically. If GPU content is available in
# Nexus but a non-gpu image is wanted this layer can be commented out.
#BEGIN_GPU_SUPPORT
  - name: uss-gpu-customize-driver-playbook-{{uss.working_branch}}
    playbook: gpu_customize_driver_playbook.yml
    product:
      name: uss
      version: "{{uss.version}}"
      branch: "{{uss.working_branch}}"
    special_parameters:
      ims_require_dkms: true
#END_GPU_SUPPORT
  - name: shs-{{default.network_type}}_install-{{slingshot_host_software.working_branch}}
    playbook: shs_{{default.network_type}}_install.yml
    product:
      name: slingshot-host-software
      version: "{{slingshot_host_software.version}}"
      branch: "{{slingshot_host_software.working_branch}}"
    special_parameters:
      ims_require_dkms: true
  - name: cscs-interfaces
    playbook: cscs-interfaces.yml
    git: 
      url: https://api-gw-service-nmn.local/vcs/cray/cscs-config-management.git
      branch: cscs-23.07.0
  - name: cos-compute-{{uss.working_branch}}
    playbook: cos-compute.yml
    product:
      name: uss
      version: "{{uss.version}}"
      branch: "{{uss.working_branch}}"
    special_parameters:
      ims_require_dkms: true
# The gpu_customize_net_playbook.yml playbook installs GPU network-dependent
# software and any additional GPU packages needed. The playbook will run by
# default if GPU content is available in Nexus, and will be skipped if not. If
# a non-gpu compute-only image is required this layer can be commented out.
#BEGIN_GPU_SUPPORT
  - name: uss-gpu-customize-net-playbook-{{uss.working_branch}}
    playbook: gpu_customize_net_playbook.yml
    product:
      name: uss
      version: "{{uss.version}}"
      branch: "{{uss.working_branch}}"
    special_parameters:
      ims_require_dkms: true
#END_GPU_SUPPORT
  - name: csm-packages-{{csm.version}}
    playbook: csm_packages.yml
    product:
      name: csm
      version: "{{csm.version}}"
  - name: csm-diags-compute-{{csm_diags.version}}
    playbook: csm-diags-compute.yml
    product:
      name: csm-diags
      version: "{{csm_diags.version}}"
  - name: sma-ldms-compute-{{sma.version}}
    playbook: sma-ldms-compute.yml
    product:
      name: sma
      version: "{{sma.version}}"
#  - name: cpe-pe_deploy-{{cpe.working_branch}}
#    playbook: pe_deploy.yml
#    product:
#      name: cpe
#      version: "{{cpe.version}}"
#      branch: "cscs-23.07.0"
##BEGIN_SLURM_SUPPORT
#  - name: slurm-site-{{slurm.working_branch}}
#    playbook: site.yml
#    product:
#      name: slurm
#      version: "{{slurm.version}}"
#      branch: "{{slurm.working_branch}}"
##END_SLURM_SUPPORT
  - name: cscs
    playbook: site.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/cscs-config-management.git
      branch: cscs-23.07.0
  - name: nomad
    playbook: site-client.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/nomad_orchestrator.git
      branch: main
  - name: cos-compute-last-{{uss.working_branch}}
    playbook: cos-compute-last.yml
    product:
      name: uss
      version: "{{uss.version}}"
      branch: "{{uss.working_branch}}"
    special_parameters:
      ims_require_dkms: true

images:
# Uncomment the lines below if ARM images are needed.
#BEGIN_AARCH64_SUPPORT
- name: "{{default.note}}-compute-{{default.suffix}}"
  ref_name: compute_image.aarch64
  base:
    ims: 
      name: "gracehopper-uss-1.0.0-58-csm-1.5.aarch64-1"
      type: image
  configuration: "{{default.note}}-compute-config-{{default.suffix}}"
  configuration_group_names:
  - Compute
  - prealps
  - santis
#END_AARCH64_SUPPORT

session_templates:
# Uncomment the lines below if ARM session templates are needed.
#BEGIN_AARCH64_SUPPORT
- name: "{{default.note}}-compute-template-{{default.suffix}}"
  image:
    image_ref: compute_image.aarch64
  configuration: "{{default.note}}-compute-config-{{default.suffix}}"
  bos_parameters:
    boot_sets:
      compute:
        arch: ARM
        kernel_parameters: ip=dhcp quiet ksocklnd.skip_mr_route_setup=1 cxi_core.disable_default_svc=0 spire_join_token=${SPIRE_JOIN_TOKEN}
        node_roles_groups:
        - Compute
        - prealps
        - santis
        rootfs_provider_passthrough: "dvs:api-gw-service-nmn.local:300:hsn0,nmn0:0"
- name: "{{default.note}}-compute-template-{{default.suffix}}-ramdisk"
  image:
    image_ref: compute_image.aarch64
  configuration: "{{default.note}}-compute-config-{{default.suffix}}"
  bos_parameters:
    boot_sets:
      compute:
        arch: ARM
        kernel_parameters: ip=dhcp quiet ksocklnd.skip_mr_route_setup=1 cxi_core.disable_default_svc=0 spire_join_token=${SPIRE_JOIN_TOKEN}
        node_roles_groups:
        - Compute
        - prealps
        - santis
        rootfs_provider_passthrough: "dvs:api-gw-service-nmn.local:300:hsn0,nmn0:1"
#END_AARCH64_SUPPORT

And session vars file being:

---

base_image: "gracehopper-base-cscs-uss-1.0.0-58-csm-1.5.aarch64-shs-2.1.1-64-cos-3.0-aarch64-compute-image-20"

default:
  network_type: cassini
  note: 'santis'
  suffix: 23.11.0-beta.5-9
  wlm: slurm
  working_branch: "cscs-23.07.0"

slingshot:
  version: 2.1.1-894

slingshot-host-software:
  version: 2.1.1-64-cos-3.0-aarch64
  working_branch: cscs-23.07.0

sma:
  version: 1.9.5

uan:
  version: 2.7.1
  working_branch: cscs-23.07.0

uss:
  version: 1.0.0-58-csm-1.5
  working_branch: cscs-23.07.0-no-nvhpc

feat: add git "tag" value to CFS configuration layers

At the moment CSM does not support using git tags to assign commit ids to CFS configuration layers. This is an issue because it forces users either to use commits if they want immutable CFS configuration layers, or to accept mutable layers that pick up the most recent commit whenever the CFS configuration is updated.
A proposal is to add a new git tag support for CFS configuration layers.
Upon CFS configuration creation, if tag is defined for a CFS configuration layer, then, manta/mesa will check if that layer exists in gitea, fetch its commit id and assign it to the CFS configuration layer.
If both branch and tag are defined, then, the application should crash.

Tasks:

  • Add tag to the CFS configuration layer struct in mesa
  • Get the gitea commit id related to the git tag and assign the commit id to the CFS configuration layer
  • Repeat the previous step for all CFS configuration layers
  • If any layer fails, cancel the transaction; otherwise create the CFS configuration
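The validation part of this proposal (branch and tag are mutually exclusive, and one failing layer cancels the whole operation) can be sketched as below. The `GitRef` and `RefToResolve` types and both function names are hypothetical; they do not reflect the actual mesa structs, and resolving the selected ref to a commit id through gitea is left out.

```rust
// Hypothetical shape of the git section of a CFS configuration layer.
#[derive(Debug, Default)]
struct GitRef {
    branch: Option<String>,
    tag: Option<String>,
}

// Which kind of ref has to be resolved to a commit id.
#[derive(Debug, PartialEq)]
enum RefToResolve<'a> {
    Branch(&'a str),
    Tag(&'a str),
}

// Pick the ref to resolve for one layer; defining both is an error.
fn select_ref(git: &GitRef) -> Result<RefToResolve<'_>, String> {
    match (&git.branch, &git.tag) {
        (Some(_), Some(_)) => Err("branch and tag are mutually exclusive".to_string()),
        (Some(branch), None) => Ok(RefToResolve::Branch(branch.as_str())),
        (None, Some(tag)) => Ok(RefToResolve::Tag(tag.as_str())),
        (None, None) => Err("layer defines neither branch nor tag".to_string()),
    }
}

// Validate every layer before anything is created: the first failing
// layer aborts the whole operation, mirroring "cancel transaction".
fn select_all(layers: &[GitRef]) -> Result<Vec<RefToResolve<'_>>, String> {
    layers.iter().map(select_ref).collect()
}
```

Collecting into a `Result` means no CFS configuration would be created unless every layer passed the check.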

Output data as json

Traditional CLIs have the option to show information as JSON, YAML or, by default, tabular data.
Potentially we could make use of the is_terminal feature to detect whether the user is running manta interactively or not; the former would imply showing output data in JSON format.

FEATURE: Integrate hardware components into SAT file

We would like to create a superset of the HPE SAT file, adding new features like git tags for CFS configuration layers and hardware inventory for HSM groups.
This approach will help in having a self-contained cluster definition in a single file.
Some use cases:

  • Build new clusters from scratch
  • Migrate clusters across different sites
  • Simplify cluster management

e.g.:

hardware:
- pattern: a100:4:epyc:4
configurations:
- name: config-test-__DATE__
  layers:
  - name: test-layer
    playbook: site.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/test_layer.git
      branch: cscs-23.06.0

images:
- name: image-test-__DATE__
  ref_name: test_image
  base:
    ims:
      type: image
      id: 3de9f01b-1981-4248-a7b9-c9803a6bc471
  configuration: config-test-__DATE__
  configuration_group_names:
  - Compute
  - adula

session_templates:
- name: sessiontemplate-test-__DATE__
  image:
    image_ref: test_image
  configuration: config-test-__DATE__
  bos_parameters:
    boot_sets:
      compute:
        kernel_parameters: ip=dhcp quiet spire_join_token=${SPIRE_JOIN_TOKEN}
        node_groups:
        - adula

At the top, we can see a description of the hardware we want the cluster to have:

hardware:
- pattern: a100:4:epyc:4

While processing this SAT file, the end goal is to have an HSM group with 4 Nvidia A100 GPUs and 4 AMD EPYC CPUs. The process of finding the required hardware in the CSM hardware inventory is out of scope, since it is already implemented.
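Reading the pattern could look like the sketch below. It assumes the pattern is a colon-separated list of alternating component names and quantities, which is inferred from the a100:4:epyc:4 example and its description; `parse_hw_pattern` is a hypothetical name, not an existing manta or mesa function.

```rust
// Parse a pattern such as "a100:4:epyc:4" into (component, quantity) pairs.
// Assumption: fields alternate between a component name and its quantity.
fn parse_hw_pattern(pattern: &str) -> Result<Vec<(String, u32)>, String> {
    let fields: Vec<&str> = pattern.split(':').collect();
    if fields.len() % 2 != 0 {
        return Err(format!(
            "'{pattern}' must be a list of component:quantity pairs"
        ));
    }
    let mut parsed = Vec::new();
    for pair in fields.chunks(2) {
        let (name, raw_quantity) = (pair[0], pair[1]);
        if name.is_empty() {
            return Err("empty component name".to_string());
        }
        let quantity = raw_quantity
            .parse::<u32>()
            .map_err(|e| format!("invalid quantity '{raw_quantity}': {e}"))?;
        parsed.push((name.to_string(), quantity));
    }
    Ok(parsed)
}
```

The resulting pairs are what would be handed to the existing hardware lookup to select nodes for the HSM group.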

This task is to:

  1. adapt the logic in manta which reads a SAT file
  2. interact with mesa in order to get the nodes with the required hardware
  3. create/update the HSM group accordingly

missing libssl and libcrypto

Dear Support,

I am writing to report an issue I encountered after installing Manta on the Castaneda system. The Manta binary raised an error related to missing libraries, specifically libssl.so.1.1 and libcrypto.so.1.1.

Castaneda, which is based on Ubuntu, comes with libssl3 as the default SSL library. However, the snap packaging system includes the core18 package, which is the runtime environment for Ubuntu 18.04. This package contains the missing libraries.

Here's the information from the ldd command for the Manta binary:

root@castaneda:~# ldd ./manta
linux-vdso.so.1 (0x00007fff542a5000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc63921c000)
libssl.so.1.1 => not found
libcrypto.so.1.1 => not found
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc6391fc000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc6391f7000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc63910e000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc639109000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc638ee1000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc639dd2000)

To resolve the issue, I created the necessary symbolic links, which provided a workaround. However, I believe this is not an ideal solution. It may be worth considering updating Manta to use the newer version of libssl and libcrypto to ensure compatibility.

Here are the commands I used to create the symbolic links:

ln -s /snap/core18/2751/usr/lib/x86_64-linux-gnu/libssl.so.1.1 /usr/lib/x86_64-linux-gnu/libssl.so.1.1
ln -s /snap/core18/2751/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1

Thanks for your support and best regards
Hussein
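
For triage, the unresolved libraries can be pulled out of `ldd` output like the one in this report. The following is only a sketch, assuming the usual `name => target` line layout that `ldd` prints; `missing_libraries` is a hypothetical helper and not something manta ships.

```rust
/// Return the names of the shared libraries that `ldd` reports as
/// "not found". Lines without `=>` (for example the vdso entry or the
/// dynamic loader path) are ignored.
pub fn missing_libraries(ldd_output: &str) -> Vec<String> {
    ldd_output
        .lines()
        .filter_map(|line| {
            // Each dependency line looks like `libfoo.so.1 => <path or "not found">`.
            let (name, target) = line.split_once("=>")?;
            if target.trim() == "not found" {
                Some(name.trim().to_string())
            } else {
                None
            }
        })
        .collect()
}
```

Fed with the output quoted in the report, this would list libssl.so.1.1 and libcrypto.so.1.1, the two libraries the symbolic links work around.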

clean CSM data

Build a clean subcommand to clean CSM data.

Some examples:

  • manta clean cluster --keep-last 5 --> will delete CFS configurations, CFS sessions, IMS artifacts, BOS sessiontemplates, BOS sessions, BSS... older than the last 5 clusters created. This is not straightforward because we need to correlate this data across services.
  • manta clean configuration --> will delete CFS configurations, CFS sessions, IMS artifacts, BOS sessiontemplates, BOS sessions, BSS... related to the configuration name passed.
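
The `--keep-last` selection itself could look like the sketch below. It assumes the correlation step has already produced one record per cluster with an RFC 3339 UTC creation timestamp (the format shown elsewhere in manta output, e.g. `2024-02-15T19:46:29Z`); the `Deployment` type and `select_for_cleanup` function are hypothetical.

```rust
/// One cluster deployment with its creation time as an RFC 3339 UTC
/// timestamp. (Hypothetical type, used only for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct Deployment {
    pub name: String,
    pub created: String,
}

/// Return the deployments that fall outside the `keep_last` most recent
/// ones, i.e. the candidates for deletion, newest first.
pub fn select_for_cleanup(mut items: Vec<Deployment>, keep_last: usize) -> Vec<Deployment> {
    // RFC 3339 timestamps that share the same UTC offset sort
    // chronologically when compared as plain strings.
    items.sort_by(|a, b| b.created.cmp(&a.created));
    items.into_iter().skip(keep_last).collect()
}
```

The related CFS sessions, IMS artifacts and BOS data would still have to be looked up for each returned record before anything is deleted.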

BUG: avoid running SAT files with only CFS configurations

Currently, SAT files with only CFS configurations are valid. This causes a problem, since those CFS configurations are not linked to any HSM group/tenant.
This issue proposes that manta fail if the user requests to process a SAT file with CFS configurations only.
The validation should make sure that any CFS configuration in the SAT file is used by a CFS session (images section) or a BOS sessiontemplate (session_templates section). It should then tag the CFS configuration with the HSM groups in the CFS session or BOS sessiontemplate.

This issue depends on #53
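
The proposed validation can be sketched as follows, assuming the SAT file has already been parsed into a list of configuration names plus one record per image or session template that references a configuration. `ConfigUse` and `tag_configurations` are hypothetical names for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A reference to a CFS configuration from the images or
/// session_templates section, with the HSM groups named there.
/// (Hypothetical type, used only for this sketch.)
pub struct ConfigUse {
    pub configuration: String,
    pub hsm_groups: Vec<String>,
}

/// Map every configuration to the HSM groups that should tag it.
/// Fails with the list of configurations that nothing references.
pub fn tag_configurations(
    configurations: &[&str],
    uses: &[ConfigUse],
) -> Result<BTreeMap<String, BTreeSet<String>>, Vec<String>> {
    let mut tags = BTreeMap::new();
    let mut unreferenced = Vec::new();

    for name in configurations {
        let matching: Vec<&ConfigUse> = uses
            .iter()
            .filter(|u| u.configuration == *name)
            .collect();
        if matching.is_empty() {
            unreferenced.push((*name).to_string());
            continue;
        }
        // Merge the HSM groups of every image/session template using it.
        let groups: BTreeSet<String> = matching
            .iter()
            .flat_map(|u| u.hsm_groups.iter().cloned())
            .collect();
        tags.insert((*name).to_string(), groups);
    }

    if unreferenced.is_empty() {
        Ok(tags)
    } else {
        Err(unreferenced)
    }
}
```

A SAT file containing only a configurations section would produce an `Err` listing every configuration, which is the failure this issue asks for.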

BUG: command to get CFS logs fails if CFS session name does not exist

manta l <CFS session name> panics if the CFS session name does not exist

/manta log clariden-cos-config-2.3.111
thread 'main' panicked at src/cli/commands/log.rs:26:6:
called `Result::unwrap()` on an `Err` value: reqwest::Error { kind: Status(404), url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.cmn.alps.cscs.ch")), port: None, path: "/apis/cfs/v2/sessions/clariden-cos-config-2.3.111", query: None, fragment: None } }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
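
One way to avoid the panic is to turn the HTTP status into an error value with a readable message instead of calling `unwrap()`. The sketch below is std-only and takes the status code as a plain `u16`; in manta the code would presumably come from the reqwest error (`reqwest::Error::status()`), which is an assumption about how the fix would be wired in. `SessionLookupError` and `check_session_status` are hypothetical names.

```rust
use std::fmt;

/// Errors a CFS session lookup can report to the user.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq)]
pub enum SessionLookupError {
    NotFound(String),
    Http(u16),
}

impl fmt::Display for SessionLookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SessionLookupError::NotFound(name) => {
                write!(f, "CFS session '{}' not found", name)
            }
            SessionLookupError::Http(code) => {
                write!(f, "CFS API request failed with HTTP status {}", code)
            }
        }
    }
}

/// Translate the HTTP status of a session lookup into a result the CLI
/// can print, instead of unwrapping and panicking on 404.
pub fn check_session_status(session_name: &str, status: u16) -> Result<(), SessionLookupError> {
    match status {
        200..=299 => Ok(()),
        404 => Err(SessionLookupError::NotFound(session_name.to_string())),
        other => Err(SessionLookupError::Http(other)),
    }
}
```

The command can then print the error with `eprintln!` and exit with a non-zero code rather than showing a backtrace hint.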

FEATURE: migrate gitea integration from http APIs to git2 library

manta currently relies on the Gitea APIs to fetch commit id information. This is quite limited because the Gitea API lacks features (for example, we can't get the commit id related to a branch or tag name). Also, CFS will be integrated with external git repos, meaning Gitea won't be the only source CFS sessions can clone from.

To overcome the current limitations, we want to perform git repo operations using the git2 library.

FEATURE: resolve `commit id` from branch names when applying SAT files

manta currently does not resolve branch names in SAT files; it instead delegates this task to the CFS service. This has worked so far, but recently we found a bug in CFS where it could not resolve the branch name.

Error when creating a CFS configuration:

Error: Command '['git', 'checkout', 'cfs-layer-psi']' returned non-zero exit status 128.

This error while creating a CFS configuration means the CFS service could not validate the branch name and failed.

When using SAT files, the default behavior hides this because SAT resolves the branch name on behalf of the CFS service.

# sudo sat bootprep run --help
usage: sat bootprep run [-h] [--dry-run] [--save-files] [--no-resolve-branches] [--delete-ims-jobs] [--bos-version {v1,v2}] [--recipe-version RECIPE_VERSION] [--vars-file VARS_FILE] [--vars VARS] [--skip-existing-configs | --overwrite-configs]
                        [--skip-existing-images | --overwrite-images] [--skip-existing-templates | --overwrite-templates] [--output-dir OUTPUT_DIR] [--public-key-file-path PUBLIC_KEY_FILE_PATH | --public-key-id PUBLIC_KEY_ID]
                        input_file

Create images, configurations and session templates.

positional arguments:
  input_file            Path to the input YAML file that defines the configurations, images, and session templates to create.

optional arguments:
  -h, --help            show this help message and exit
  --dry-run, -d         Do a dry-run. Do not actually create CFS configurations, build images, customize images, or create BOS session templates, but walk through all the other steps.
  --save-files, -s      Save files that could be passed to the CFS and BOS to create CFS configurations and BOS session templates, respectively.
  --no-resolve-branches

As shown above, you can instruct the SAT command to delegate branch resolution to CFS by using the --no-resolve-branches argument.

Manta should do the same as SAT and resolve git branches automatically. This will make the command slower but safer to run.

Important: keep the branch name in the request json payload so we can retrieve this information in the future when inspecting the CFS configuration layer details.
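
As an illustration of the resolution step, the sketch below looks up a branch in the text printed by `git ls-remote --heads <url>`, where each line is `<commit id><TAB>refs/heads/<branch>`. How manta would actually obtain the refs (Gitea API, git2, or the git CLI) is left open here; `GitLayerRef` and `resolve_branch` are hypothetical names. The result keeps both the branch and the commit id, in line with the note above.

```rust
/// Branch name plus the commit id it resolved to, so both can be sent
/// in the CFS configuration layer. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
pub struct GitLayerRef {
    pub branch: String,
    pub commit: String,
}

/// Find `branch` in `git ls-remote --heads` output, whose lines have
/// the form `<commit id>\trefs/heads/<branch>`.
pub fn resolve_branch(ls_remote_output: &str, branch: &str) -> Option<GitLayerRef> {
    let wanted = format!("refs/heads/{}", branch);
    ls_remote_output.lines().find_map(|line| {
        let (commit, reference) = line.split_once('\t')?;
        if reference.trim() == wanted.as_str() {
            Some(GitLayerRef {
                branch: branch.to_string(),
                commit: commit.trim().to_string(),
            })
        } else {
            None
        }
    })
}
```

A `None` result means the branch does not exist in the remote, which manta could report before sending any request to CFS.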

FEAT: add HSM groups in CFS configuration and BOS sessiontemplates in SAT file as tags to CFS configurations

One of Manta's goals is to make CFS configurations first-class citizens. For this, we need to make sure we can identify the CFS configurations that belong to each HSM group/tenant.

CFS configurations don't contain HSM information, which makes it difficult to find out which CFS configuration belongs to which tenant. Currently, CSCS embeds the HSM name in the CFS configuration name, which is far from optimal.
This issue proposes adding HSM groups as tags to the CFS configurations.

BUG: Make `manta config show` resilient to network errors

Imagine a user starts using manta without knowing which site manta is configured for. The next step would be to run manta config show or to inspect the configuration file manually (more troublesome). If manta does not have access to the Keycloak API, it won't show an error, but the user still does not know which site manta is pointing to. This is inefficient; manta should instead show the information it has access to even if the network is down.

feat: add a selection list when deleting auth tokens

Manta can manage multiple auth tokens, depending on the sites the user has worked with.
To improve the auth token cleaning process, we will provide the user with a list of local auth tokens to delete

Please choose the site token to delete from the list below:
> alpsm_auth
  alps_auth

FEATURE: rename subcommand `manta apply cluster`, `manta apply image`, `manta apply configuration` to `manta apply sat-file`

We currently have three different types of commands to manage SAT files:

  • apply configuration
  • apply image
  • apply cluster

This is confusing to users. Since version 1.23.0, manta is quite compatible with how HPE manages SAT files (e.g. support for Cray products, IMS images, etc.).

This issue is to get rid of the subcommands above and replace them with a single command called manta apply sat-file

FEATURE: add CSM root parent cert to configuration file

At the moment, the file name of the parent public CA cert that manta uses to trust CSM is dynamically calculated. This is troublesome for the configuration/setup of manta.
Create a new entry in the manta config file specifying the path and file name of the CSM public root CA cert.

FEATURE: `get configuration --name` should lookup git tag

Recently we added new functionality in manta that allows SAT files to have a git tag value in each CFS configuration layer.
This issue requests a git tag lookup when running the manta get configuration -n <CFS configuration name> command. As a result, manta will look up the git tag from the CFS configuration layer commit id and show it on screen with the rest of the information.

Error in converting CFS configuration layer commit id field from YAML SAT file

Manta fails to create a CFS configuration layer from a SAT file when deserializing the commit id value.

From:

  - name: psi
    playbook: layer0.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/vcluster-psidev-ansible.git
      commit: b81d7873da105c1775865588f4b8ab77842de60d

To:

| fora-cos-config-manuel-test-5    | 2024-02-15T19:46:29Z | COMMIT: ec773be15f72aed7c4a2161540200dec14d907f3 NAME: ss11               |
|                                  |                      | COMMIT: 06dcdc36181a05552ceae98da67bb16604dee34a NAME: cos                |
|                                  |                      | COMMIT: 64ab09c1adb0a431c479f8521c528dc9c116a869 NAME: csm                |
|                                  |                      | COMMIT: 3a0d13133a66a9ea4a6763a9202264a2250a113c NAME: cscs               |
|                                  |                      | COMMIT: Not defined NAME: psi                                             |

feat: Add support for SAT file version 2.6

Add support for the following structs in SAT files:

configurations (cray-product-catalog in https://github.com/Cray-HPE/cray-product-catalog):

https://cray-hpe.github.io/docs-sat/en-26/usage/sat_bootprep/#define-cfs-configurations

        product:
          name: cos
          version: 2.4.122
          branch: cscs-23.06.0

images:

see https://cray-hpe.github.io/docs-sat/en-26/usage/sat_bootprep/#define-ims-images

  base:
    image_ref: base_cos_image

and

  base:
    product:
      name: cos
      type: recipe
      version: "2.4.139"

Also support the old SAT file version:

- name: image-cos-__DATE__
  ims:
    is_recipe: false
    id: 22294fa6-f869-4acf-bcf4-0d27df5e824b
  configuration: cos-config-__DATE__
  configuration_group_names:
  - Compute

FEAT: validate base image id when creating a CFS session

Currently, manta delegates base image id validation to CFS when creating a CFS session to build an image. This works, and the returned message is understandable for a CSM sysadmin:

#### Container 'ansible' logs

Inventory generation failed. Exiting

However, this message has no significant meaning to a user who is not familiar with CSM and who interacts with the system through manta and a SAT file.

Therefore, it would be beneficial to be proactive, perform this validation on the manta side, and show a more meaningful message such as base image ID not found. Please choose a different ID.
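
The check itself is small once the list of existing image IDs is available. The sketch below assumes that list has already been fetched from IMS by the caller; `validate_base_image_id` is a hypothetical function name.

```rust
/// Check that the base image ID from the SAT file exists before a CFS
/// session is created. `existing_ids` is assumed to hold the IMS image
/// IDs already fetched by the caller.
pub fn validate_base_image_id(image_id: &str, existing_ids: &[&str]) -> Result<(), String> {
    if existing_ids.iter().any(|id| *id == image_id) {
        Ok(())
    } else {
        Err(format!(
            "base image ID '{}' not found. Please choose a different ID.",
            image_id
        ))
    }
}
```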

BUG: manta ignoring old version of session_template section in SAT file

The new SAT file schema no longer follows the session_templates format shown below:

session_templates:
- name: zinal-client-{{ version }}
  image: zinal-image-20231204014122
  configuration: zinal-client-{{ version }}
  bos_parameters:
    boot_sets:
      compute:
        kernel_parameters: ip=dhcp quiet spire_join_token=${SPIRE_JOIN_TOKEN}
        node_groups:
        - zinal_tds

Because manta tries to be backward compatible, we will add support for this format.

'manta get configuration' does not work on 1.22.3

(ansible2.9) 💻 [mcaubet@castaneda:~]# manta get configuration
2024-02-13 12:52:20 | INFO  | /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mesa-0.27.3/src/common/authentication.rs:47 - Reading CSM authentication token from configuration file
2024-02-13 12:52:20 | INFO  | /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mesa-0.27.3/src/common/authentication.rs:140 - Validate Shasta token against https://api.cmn.alps.cscs.ch/apis/cfs/healthz
2024-02-13 12:52:20 | INFO  | /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mesa-0.27.3/src/common/authentication.rs:151 - Shasta token is valid
thread 'main' panicked at src/common/cfs_configuration_utils.rs:76:43:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(ansible2.9) 💻 [mcaubet@castaneda:~]# manta --version
manta 1.22.3
