
doctor-cluster-config's Introduction

The documentation for all hosts lives in here. The corresponding NixOS configuration is in ./hosts.

New admins: getting started

  1. Install nix (the recommended multi-user installation installs only the Nix package manager, not NixOS)
  2. Enable flake support in nix. This has the same effect as passing the following flags to every nix command, such as nix develop: --extra-experimental-features nix-command --extra-experimental-features flakes
  3. Clone the doctor-cluster-config repo, cd into it and run: nix develop. This opens a shell with additional tools available, such as inv (try inv --list), sops and age.
  4. To generate a new admin key, run (requires age):
mkdir -p ~/.config/sops/age/
age-keygen -o ~/.config/sops/age/keys.txt

Provide the generated public key to an existing admin and wait for them to re-encrypt all secrets in this repo with it. After pulling the re-encrypted secrets, you can read them with sops secrets.yml.
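Step 2 can be made permanent with a single line in nix.conf. The following is a minimal sketch, not part of this repository: the helper name enable_flakes is hypothetical, and the default path assumes a per-user Nix configuration in ~/.config/nix/nix.conf.

```shell
#!/bin/sh
# Append the flake settings to a nix.conf unless an
# experimental-features line is already present.
# enable_flakes is a hypothetical helper name; the default path
# assumes a per-user Nix configuration file.
enable_flakes() {
    conf="${1:-$HOME/.config/nix/nix.conf}"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    if ! grep -q '^experimental-features' "$conf"; then
        echo 'experimental-features = nix-command flakes' >> "$conf"
    fi
}
```

Calling enable_flakes without arguments edits ~/.config/nix/nix.conf. Separately, age-keygen -y ~/.config/sops/age/keys.txt prints the public key that belongs to the key file generated in step 4.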

Apply config to all servers

Choose a deployment target:

$ inv -l
Available tasks:

  cleanup-gcroots
  deploy                   Deploy to servers
  deploy-host              Deploy to a single host, i.e. inv deploy-host --host 192.168.1.2
  deploy-local             Deploy NixOS configuration on the same machine. The NixOS configuration is
  format-disks             Format disks with zfs, i.e.: inv format-disks --hosts new-hostname --disk /dev/nvme0n1
  generate-root-password   Generate password hashes for users i.e. for root in ./hosts/$HOSTNAME.yml
  generate-ssh-cert        Generate ssh cert for host, i.e. inv generate-ssh-cert bill 131.159.102.1
  install-nixos            install nixos, i.e.: inv install-nixos --hosts new-hostname --flakeattr
  ipmi-powercycle
  ipmi-serial
  mount-disks              Mount disks from the installer, i.e.: inv mount-disks --hosts new-hostname --disk /dev/nvme0n1
  print-age-key            Print age key for sops, inv print-age-key --hosts "host1,host2"
  print-tinc-key
  reboot                   Reboot hosts. example usage: fab --hosts clara.r,donna.r reboot
  update-docs              Regenerate docs for all servers
  update-lldp-info         Regenerate lldp info for all servers
  update-sops-files        Update all sops yaml and json files according to .sops.yaml rules

Run!

$ inv deploy

Add new users

Add chair members to ./modules/chair-members.nix and students to ./modules/users/students.nix.

For chair members, use a uid in the 1000-2000 range. For new students, use a uid in the 2000-3000 range. Check that the uid is unique across both files and lies within the appropriate range to avoid conflicts.
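The uniqueness check can be done by hand, or with a small helper like the sketch below. It is not part of this repository: duplicate_uids is a hypothetical name, and the pattern assumes that uids appear in the Nix files as uid = <number>;. It relies on grep -o, which GNU and BSD grep both support.

```shell
#!/bin/sh
# Print every uid that occurs more than once across the given files.
# Assumes uids are written as `uid = <number>;` (an assumption about
# the layout of the Nix files, not something this README confirms).
duplicate_uids() {
    grep -hoE 'uid[[:space:]]*=[[:space:]]*[0-9]+' "$@" \
        | grep -oE '[0-9]+$' \
        | sort -n \
        | uniq -d
}
```

Example: duplicate_uids ./modules/chair-members.nix ./modules/users/students.nix prints nothing when all uids are unique.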

If you need to give reviewers access, e.g. for artifact evaluation, add them to ./modules/users/reviewers.nix. We use the uid range 4000-5000 there. With users.users.<username>.allowedHosts, it is possible to limit the hosts these users can access. To access the machines, they can use the ssh tunnel as described in here.

Add new servers

For installing new servers, see Add servers.

Update system

We use flakes to manage nixpkgs versions. To upgrade use:

$ nix flake update

Then commit flake.lock.

Home-manager

To install home-manager for a user simply run:

$ nix-shell '<home-manager>' -A install

This will initialize home-manager for your user and generate a file similar to the one in home/.config/nixpkgs/home.nix.

Visual Studio Code Server support in NixOS

You can use this to enable support for VS Code Server in NixOS.

An example of the home.nix configured for VS Code support is shown in home/.config/nixpkgs/home.nix.

IPMI

On our TUM rack machines we have IPMI support.

Generally, you can find the IPMI web interface at https://$HOST-mgmt.dse.in.tum.de/ (e.g. https://bill-mgmt.dse.in.tum.de) once the device has been installed in the rack. These addresses are only available through the management network, so you must use the RBG VPN for il1 to access them.
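The URL pattern above can be wrapped in a one-line shell function. This is only an illustrative sketch: ipmi_url is a hypothetical helper, not one of the inv tasks in this repository.

```shell
#!/bin/sh
# Build the IPMI web interface URL for a rack host, following the
# https://$HOST-mgmt.dse.in.tum.de/ pattern.
# ipmi_url is a hypothetical helper, not a task in this repository.
ipmi_url() {
    printf 'https://%s-mgmt.dse.in.tum.de/\n' "$1"
}
```

For example, ipmi_url bill prints the address of the management interface for the host bill.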

You can also retrieve the IP addresses assigned to the IPMI/BMC firmware by running:

$ ipmitool lan print

on the machine. On another host (e.g. your laptop), you can run the following command to get a serial console:

$ inv ipmi-serial --host <ipmi-ip-address>

The following will reboot the machine:

$ inv ipmi-powercycle --host <ipmi-ip-address>

To boot a machine into the BIOS, use:

$ inv ipmi-boot-bios --host <ipmi-ip-address>

The IPMI password here is encrypted with sops. To decrypt it on your machine, your age/pgp fingerprint must be added to .sops.yaml in this repository, and one of the existing users must re-encrypt secrets.yml with your key.

Once connected to the serial console, press enter to get a login prompt. The root password for all machines is also stored in secrets.yaml.

Monitoring

Hosts are monitored here: https://grafana.thalheim.io/d/Y3JuredMz/monitoring?orgId=1

CI

All machines are built by GitLab CI on a self-hosted runner. GitLab will also eventually propagate the build status to the GitHub repository. The resulting builds are uploaded to https://tum-dse.cachix.org, from where machines can download them while upgrading.
