app-of-apps's Introduction

Currently working with DevOps. A few of the things I've worked with or fiddled with to a significant extent:

Git, Docker, Helm, Kubernetes, OpenShift Container Platform, GitHub Actions, Jenkins CI, Circle CI, Terraform, Packer, Ansible, Salt, AWS, Google Cloud Platform, Datadog, Prometheus, Grafana, Node.js, React, Jest, Pytest, C, C++, Python, Java, Haskell, Bash, GitLab, RHEL, Debian, Arch Linux, CentOS, Sass, HTML5

app-of-apps's People

Contributors

d3adb5

app-of-apps's Issues

Isolate namespaces using NetworkPolicy resources

The cluster that sources its applications from this repository uses Calico as a CNI. Calico has support for the NetworkPolicy resource, allowing us to restrict access to and from groups of Pods and namespaces within the cluster.

It is good practice to isolate applications in the network to mitigate lateral movement should any of them be compromised.
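A minimal sketch of what this isolation could look like, assuming a hypothetical namespace named example-app. The policy selects every Pod in the namespace and only admits ingress traffic from Pods in that same namespace. Traffic from elsewhere, such as the Ingress controller, would need an additional rule.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: example-app  # hypothetical namespace
spec:
  # An empty podSelector applies the policy to all Pods in the namespace.
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # A podSelector without a namespaceSelector only matches
        # Pods in the policy's own namespace.
        - podSelector: {}
```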

Automate application and dependency updates

Automate updates for application and chart versions using something like Renovate. The chosen solution needs to create pull requests to be reviewed, approved and merged. If possible, it should also send notifications so we don't rely on GitHub's own.

Related to #4.

Deploy a Mumble server

Mumble is a free and open source voice chat service. It is low latency, very stable, and high quality. Mumble clients have noise cancellation algorithms like RNNoise that are quite good at isolating human speech.

Communication through Mumble will require a UDP port to be exposed. Keep in mind that as of the time of writing, it is not possible to have a Service of type LoadBalancer that exposes endpoints on both TCP and UDP ports. Therefore, NodePort will have to be used.
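A rough sketch of such a Service, assuming the server listens on Mumble's default port 64738 and that its Pods carry the hypothetical label shown in the selector. The node ports are left for Kubernetes to allocate.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mumble  # hypothetical name
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: mumble  # hypothetical label
  ports:
    # Voice traffic travels over UDP.
    - name: voice
      protocol: UDP
      port: 64738
      targetPort: 64738
    # The control channel uses TCP on the same port number.
    - name: control
      protocol: TCP
      port: 64738
      targetPort: 64738
```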

Set up alerts with Alertmanager

Set up alerts with Prometheus Alertmanager. It is likely being deployed already as part of kube-prometheus-stack, but notifications and alerts have not been set up.

For notifications, we can try out different solutions, from Telegram all the way up to PagerDuty (free for up to 5 users). Should the latter be used, let's just make sure nobody gets calls or anything of the sort in the middle of the night. These services are not mission critical; they're not even a product.
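As an illustration of the Telegram option, here is a sketch of an Alertmanager configuration with a single receiver. The bot token and chat ID are placeholders, the grouping and timing values are arbitrary choices, and how this ends up in the kube-prometheus-stack values is left out.

```yaml
route:
  receiver: telegram
  # Group related alerts into one notification.
  group_by: [alertname, namespace]
  group_wait: 30s
  # Arbitrary choice: re-send unresolved alerts twice a day at most.
  repeat_interval: 12h
receivers:
  - name: telegram
    telegram_configs:
      - bot_token: "<bot token>"  # placeholder, should come from a secret
        chat_id: 123456789        # placeholder chat ID
```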

Calculate differences against cluster infrastructure using Argo CLI

Implement a CI workflow to show differences between the desired and the live states. This way we can start using pull requests and have a semblance of validation before merging and altering live infrastructure.

To achieve this, we can use the official Argo CD CLI. It is primarily useful when there are no ephemeral environments configured through an ApplicationSet.
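One way this could be wired into GitHub Actions is sketched below. It assumes the argocd binary is already installed on the runner, that an Application named example-app exists, and that the server address and the ARGOCD_AUTH_TOKEN secret are provided. All of those names are hypothetical.

```yaml
name: argocd-diff
on:
  pull_request:
jobs:
  diff:
    runs-on: ubuntu-latest
    env:
      # The Argo CD CLI reads both variables from the environment.
      ARGOCD_SERVER: argocd.example.com  # hypothetical address
      ARGOCD_AUTH_TOKEN: ${{ secrets.ARGOCD_AUTH_TOKEN }}
    steps:
      - name: Diff live state against the pull request head
        run: |
          # "argocd app diff" exits with 1 when differences are found and
          # with 2 on errors, so only exit code 1 is tolerated here.
          argocd app diff example-app \
            --revision "${{ github.event.pull_request.head.sha }}" \
            || [ $? -eq 1 ]
```

The diff is printed to the job log. Posting it as a pull request comment would be a separate step.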

Deploy Grafana independently

Grafana has been deployed as part of the kube-prometheus-stack for cluster-wide monitoring. It is important to deploy Grafana and Prometheus separately, so that Grafana isn't tied to the Prometheus deployment and can be integrated with Loki as part of #23.

This will create a single interface for cluster monitoring and observation, where we can find metrics and logs alike.
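With a standalone Grafana, both backends could be registered through a data source provisioning file along these lines. The service URLs are assumptions about where Prometheus and Loki end up running, not values taken from this repository.

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-operated.monitoring.svc:9090  # hypothetical URL
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki.logging.svc:3100  # hypothetical URL
```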

Deploy Atlantis to automate Terraform workflow

While I'm personally not a fan of Atlantis, it lets us keep credentials away from CI/CD pipelines. Since we want to keep secrets in HashiCorp Vault, it makes sense to have something like Atlantis on the cluster instead of adding GitHub Actions secrets that are too close to public repositories for comfort.

This issue is mainly for deploying Atlantis itself, not necessarily setting up credentials, which may be tracked through #13.

We may go back on this, of course. Nothing is set in stone.

Aggregate logs so they are easily navigated

We should collect and store logs for all applications on the server so they are easy to navigate. Log collectors like Promtail and log storage solutions like Grafana Loki are available for this.

This will make it easier to detect any problems by looking at logs for applications that may or may not still be around. There is a limit to how many lines of logs Kubernetes keeps around by default.

Configure metrics exporter for Minecraft servers

itzg has an exporter for this: mc-monitor. It should have support for multiple Minecraft server implementations. Other than the exporter itself, we need to consume the metrics and display them through Grafana. That'll involve creating a ServiceMonitor and a Grafana dashboard.
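A sketch of the ServiceMonitor part, assuming the exporter sits behind a Service labelled as shown, with a port named metrics. The release label is also an assumption; it has to match whatever serviceMonitorSelector the Prometheus instance uses.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mc-monitor
  namespace: minecraft  # hypothetical namespace
  labels:
    release: kube-prometheus-stack  # hypothetical, must match the Prometheus selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: mc-monitor  # hypothetical Service label
  endpoints:
    # Refers to a named port on the exporter's Service.
    - port: metrics
      interval: 30s
```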

Use fixed versions for all container images

We should not use latest as a tag, no matter how much we trust container images from LinuxServer. Instead, we should use fixed version numbers and explicitly update the services on the cluster. Ideally, we would have a bot monitoring for updates and creating pull requests for our approval automatically.

We wish to change from this:

image: <repository>:latest
imagePullPolicy: Always

To something like this:

image: <repository>:<specific version>
imagePullPolicy: IfNotPresent

Suggestions here are welcome.

Linkerd control plane certificates are being reset

It looks like the Linkerd control plane components' TLS certificates are being reset a few hours after synchronization through Argo CD. Argo is not detecting drift, but this leads to certificates that are not signed by the trust anchor.

Looking at the MutatingWebhookConfiguration created for the proxy injector, it looks like the CA bundle that is copied over is for the certificate that was put in place after the Argo CD sync. openssl verify says the certificate isn't trusted, even when the CA file is set to the trust anchor public certificate. This could be due to the trust anchor being a self-signed certificate, but it is also possible the certificate issued for these components is not signed by it.

This is not breaking communication between meshed services, but once the certificate is reset to an invalid one, the mutating webhook is unable to inject linkerd-proxy into new Pods, so new containers cannot enter the mesh.

Deploy a simple service mesh

Linkerd is one of the simplest service meshes in existence. It does the job of ensuring communication between containers in the cluster happens entirely over secure TLS tunnels.

Communication remains entirely transparent for the applications on the cluster, meaning we don't have to change anything other than installing Linkerd itself.
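One detail to keep in mind: Linkerd's proxy injector only adds the sidecar to workloads that opt in through the linkerd.io/inject annotation, which can be set on a namespace or on individual workloads. A sketch for a hypothetical namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-app  # hypothetical namespace
  annotations:
    # Pods created in this namespace get the linkerd-proxy sidecar injected.
    linkerd.io/inject: enabled
```

Pods that already exist need to be recreated before they join the mesh.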

Deploy an open source IdP for OIDC such as Keycloak

Nobody wants to pay for Azure AD or GitHub Organizations. Google Workspace costs are based on user count. Luckily there are free implementations for identity providers. The most popular of these, it seems, is Keycloak. We can deploy Keycloak using the official Helm chart for it, then set it up.

It might be interesting to finish setting up HashiCorp Vault before this is done, in case we need any initializing secrets.

Deploy Owncast, the open source Twitch

Owncast is a free and open source service for livestreaming. Compared to raw ffmpeg streams over the network, it offloads the network output from the streamer to the server itself.

While current cluster users may not have a use for it right now, it's still an interesting service to have available, provided it is relatively difficult to exploit.

May benefit from #1 being implemented first.

Activate backups for Minecraft servers

Activate periodic backups for the Minecraft servers hosted on this cluster.

Backups should be done every week or every three days, and shouldn't exceed 20 GB altogether. They should be stored on the HDDs as they won't be written or read very often, and there should be at least 2 replicas for each volume.

Integrate External Secrets Operator and HashiCorp Vault

Create a SecretStore, or multiple such resources, that allow syncing Kubernetes built-in Secrets with data stored in HashiCorp Vault.

The External Secrets Operator has been deployed already in c00ffb4, and is maintained through the Operator Lifecycle Manager. The CRDs should therefore be in place; we just need to configure Kubernetes authentication on Vault and add the necessary service accounts. This may or may not be done on a case-by-case basis to limit the secrets each service account can read.

Current known use cases:

  • When integrating services with SSO, a client ID and client secret are required and should be stored in a secret.
  • Terraform is likely to make its way into this repository (it has been written and is in use); we can deploy Atlantis and mount things like AWS credentials through external secrets.
  • Should this repository or any others turn private for whatever reason, GitHub credentials for Argo CD can be stored in Vault.
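A namespaced SecretStore using Vault's Kubernetes authentication might look roughly like this. The Vault address, KV mount, role and service account names are all hypothetical, and the apiVersion depends on the operator version installed (newer releases serve external-secrets.io/v1).

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault
  namespace: example-app  # hypothetical namespace
spec:
  provider:
    vault:
      server: http://vault.vault.svc:8200  # hypothetical address
      path: secret   # hypothetical KV mount
      version: v2    # KV secrets engine version
      auth:
        kubernetes:
          mountPath: kubernetes  # Vault auth mount for Kubernetes
          role: example-app      # hypothetical Vault role
          serviceAccountRef:
            name: example-app    # hypothetical service account
```

Using one store and one Vault role per namespace is what limits the secrets each service account can read.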

Set up rate limiting through Ingress controller

Global and application-specific rate limits allow us to avoid server overload. We'll likely never reach high rates of requests, but it doesn't hurt to prepare for the worst.

Ingress NGINX is the current controller in use due to its simplicity. It implements rate limiting through annotations that can be added to the Ingress resources.
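A sketch of per-client limits on a single Ingress, using annotations that Ingress NGINX provides. The host, service name and numeric limits are placeholders rather than recommendations.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app  # hypothetical name
  annotations:
    # Requests accepted per second from a single client IP.
    nginx.ingress.kubernetes.io/limit-rps: "10"
    # Concurrent connections allowed from a single client IP.
    nginx.ingress.kubernetes.io/limit-connections: "20"
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com  # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app  # hypothetical Service
                port:
                  number: 80
```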
