
multicluster-observability-operator's Introduction

Observability Overview


This document explains how the different components in Open Cluster Management Observability come together to deliver multicluster fleet observability. We leverage several open source projects: Grafana, Alertmanager, Thanos, Observatorium Operator and API Gateway, and Prometheus. We also leverage a few Open Cluster Management projects, namely Cluster Manager (the Registration Operator) and Klusterlet. The multicluster-observability-operator is the root operator which pulls in everything needed.

Conceptual Diagram

Conceptual Diagram of the Components

Associated GitHub Repositories

Component              | Git Repo                             | Description
MCO Operator           | multicluster-observability-operator  | Operator for monitoring. This is the root repo. If you follow the README instructions here to install, the code from all other repos below is used/referenced.
Endpoint Operator      | endpoint-metrics-operator            | Operator that manages setting up observability and data collection on the managed clusters.
Observatorium Operator | observatorium-operator               | Operator to deploy the Observatorium project. Inside Open Cluster Management, at this time, this means metrics using Thanos. Forked from the main observatorium-operator repo.
Metrics Collector      | metrics-collector                    | Scrapes metrics from Prometheus on managed clusters; the metric collection is shaped by configuring an allow-list.
RBAC Proxy             | rbac_query_proxy                     | Helper service that acts as a multicluster metrics RBAC proxy.
Grafana                | grafana                              | Grafana repo - for dashboarding and metric analytics. Forked from the main grafana repo.
Dashboard Loader       | grafana-dashboard-loader             | Sidecar proxy to load Grafana dashboards from ConfigMaps.
Management Ingress     | management-ingress                   | NGINX-based ingress controller to serve Open Cluster Management services.
Observatorium API      | observatorium                        | API gateway which controls reading and writing of the observability data to the backend infrastructure. Forked from the main observatorium API repo.
Thanos Ecosystem       | kube-thanos                          | Kubernetes-specific configuration for deploying Thanos. The observatorium operator leverages this configuration to deploy the backend Thanos components.

Quick Start Guide

Prerequisites

Note: By default, the API conversion webhook uses the OpenShift service serving certificate feature to manage its certificate. You can replace it with cert-manager if you want to run the multicluster-observability-operator on a plain Kubernetes cluster.
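For the plain-Kubernetes case, here is a minimal sketch of what issuing the webhook certificate with cert-manager might look like, assuming cert-manager is already installed. The resource names and the webhook service name are illustrative placeholders, not the operator's actual configuration:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer                                  # hypothetical name
  namespace: open-cluster-management-observability
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mco-webhook-cert                                   # hypothetical name
  namespace: open-cluster-management-observability
spec:
  secretName: mco-webhook-tls                              # secret the webhook server would mount
  dnsNames:
    - <webhook-service>.open-cluster-management-observability.svc   # placeholder service name
  issuerRef:
    name: selfsigned-issuer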

Use the following quick start commands for building and testing the multicluster-observability-operator:

Clone the Repository

Check out the multicluster-observability-operator repository.

git clone git@github.com:stolostron/multicluster-observability-operator.git
cd multicluster-observability-operator

Build the Operator

Build the multicluster-observability-operator image and push it to a public registry, such as quay.io:

make docker-build docker-push IMG=quay.io/<YOUR_USERNAME_IN_QUAY>/multicluster-observability-operator:latest
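
Note: pushing requires that you are authenticated to the registry; if needed, log in first (assuming Docker as the container engine):

docker login quay.io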

Run the Operator in the Cluster

  1. Create the open-cluster-management-observability namespace if it doesn't exist:
kubectl create ns open-cluster-management-observability
  2. Deploy the minio service, which acts as the storage service for multicluster observability:
kubectl -n open-cluster-management-observability apply -k examples/minio
  3. Replace the operator image and deploy the multicluster-observability-operator:
make deploy IMG=quay.io/<YOUR_USERNAME_IN_QUAY>/multicluster-observability-operator:latest
  4. Deploy the MultiClusterObservability CR:
kubectl apply -f operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml
  5. Verify that all the components of Multicluster Observability are starting up and running:
kubectl -n open-cluster-management-observability get pod
NAME                                                       READY   STATUS    RESTARTS   AGE
minio-79c7ff488d-72h65                                     1/1     Running   0          9m38s
observability-alertmanager-0                               3/3     Running   0          7m17s
observability-alertmanager-1                               3/3     Running   0          6m36s
observability-alertmanager-2                               3/3     Running   0          6m18s
observability-grafana-85fdc8c48d-j67j6                     2/2     Running   0          7m17s
observability-grafana-85fdc8c48d-wnltt                     2/2     Running   0          7m17s
observability-observatorium-api-69cfff4c95-bpw5s           1/1     Running   0          7m2s
observability-observatorium-api-69cfff4c95-gbh7b           1/1     Running   0          7m2s
observability-observatorium-operator-5df6b7949c-kbpmp      1/1     Running   0          7m17s
observability-rbac-query-proxy-d44df47c4-9ccdn             2/2     Running   0          7m15s
observability-rbac-query-proxy-d44df47c4-rtcgh             2/2     Running   0          6m50s
observability-thanos-compact-0                             1/1     Running   0          7m2s
observability-thanos-query-79c4d9488b-bd5sf                1/1     Running   0          7m3s
observability-thanos-query-79c4d9488b-d7wzt                1/1     Running   0          7m3s
observability-thanos-query-frontend-6fdb5d4946-rgblb       1/1     Running   0          7m3s
observability-thanos-query-frontend-6fdb5d4946-shsz2       1/1     Running   0          7m3s
observability-thanos-query-frontend-memcached-0            2/2     Running   0          7m3s
observability-thanos-query-frontend-memcached-1            2/2     Running   0          6m37s
observability-thanos-query-frontend-memcached-2            2/2     Running   0          6m33s
observability-thanos-receive-controller-6b446c5576-hj6xl   1/1     Running   0          7m3s
observability-thanos-receive-default-0                     1/1     Running   0          7m2s
observability-thanos-receive-default-1                     1/1     Running   0          6m20s
observability-thanos-receive-default-2                     1/1     Running   0          5m50s
observability-thanos-rule-0                                2/2     Running   0          7m3s
observability-thanos-rule-1                                2/2     Running   0          6m27s
observability-thanos-rule-2                                2/2     Running   0          5m56s
observability-thanos-store-memcached-0                     2/2     Running   0          7m3s
observability-thanos-store-memcached-1                     2/2     Running   0          6m37s
observability-thanos-store-memcached-2                     2/2     Running   0          6m33s
observability-thanos-store-shard-0-0                       1/1     Running   2          7m3s
observability-thanos-store-shard-1-0                       1/1     Running   2          7m3s
observability-thanos-store-shard-2-0                       1/1     Running   2          7m3s

What is next

After a successful deployment, you can run the following command to check whether your OCP cluster is registered as a managed cluster.

kubectl get managedcluster --show-labels

If the vendor=OpenShift label does not exist on your managed cluster, you can add it manually:

kubectl label managedcluster <managed cluster name> vendor=OpenShift

Then you should see the metrics-collector pod running:

kubectl -n open-cluster-management-addon-observability get pod
endpoint-observability-operator-5c95cb9df9-4cphg   1/1     Running   0          97m
metrics-collector-deployment-6c7c8f9447-brpjj      1/1     Running   0          96m

Expose the Thanos query frontend via a route by running this command:

cat << EOF | kubectl -n open-cluster-management-observability apply -f -
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: query-frontend
spec:
  port:
    targetPort: http
  wildcardPolicy: None
  to:
    kind: Service
    name: observability-thanos-query-frontend
EOF

You can access the Thanos Query UI in a browser using the host from oc get route -n open-cluster-management-observability query-frontend. Metrics should be available when you search for :node_memory_MemAvailable_bytes:sum. The available metrics are listed here
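
As an alternative quick check, you can hit the Prometheus-compatible HTTP API that the query frontend exposes; a sketch, assuming the route from the step above (served over plain HTTP, since no TLS was configured on it):

HOST=$(oc get route -n open-cluster-management-observability query-frontend -o jsonpath='{.spec.host}')
curl -G "http://${HOST}/api/v1/query" --data-urlencode 'query=:node_memory_MemAvailable_bytes:sum'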

Uninstall the Operator in the Cluster

  1. Delete the MultiClusterObservability CR:
kubectl -n open-cluster-management-observability delete -f operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml
  2. Delete the multicluster-observability-operator:
make undeploy
  3. Delete the minio service:
kubectl -n open-cluster-management-observability delete -k examples/minio
  4. Delete the open-cluster-management-observability namespace:
kubectl delete ns open-cluster-management-observability



multicluster-observability-operator's Issues

generate-dashboard-configmap-yaml.sh: asterisk character in panel query is incorrectly evaluated

The script generate-dashboard-configmap-yaml.sh doesn't correctly export dashboards that contain an asterisk character in a panel query.

This issue also occurs when the script generate-dashboard-configmap-yaml.sh is executed in a non-empty directory.

To reproduce this issue, execute the following steps:

  1. Create a dashboard containing a panel with the metric "100 * 100"
  2. Open a terminal session and verify that the current directory isn't empty
  3. Execute the script generate-dashboard-configmap-yaml.sh to export the dashboard created in step 1
  4. Inspect the YAML file containing the exported dashboard ConfigMap: the metric "100 * 100" isn't present, and the "*" character has been replaced with the file list of the current directory. This is unexpected behaviour; a minimal illustration of the likely cause follows.
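
For reference, this behaviour is consistent with an unquoted shell expansion globbing the asterisk; a minimal bash reproduction (attributing this to the script's internals is an assumption):

query='100 * 100'
echo $query      # unquoted: the shell expands '*' to the file names in the current directory
echo "$query"    # quoted: prints '100 * 100' as expected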

Image with latest tag is not picked up, so pods are failing

When I try to install multicluster-observability-operator on an OpenShift cluster, the snapshot (2021...) it refers to is not present in the repository. If I update the tag to the latest snapshot present in the repository (2022...), the YAMLs are updated, but the image is then reverted to the previous snapshot. I am not sure whether some ArgoCD-like functionality is built into these deployments. Could you please look into this and help me with a resolution? How can I update the image to the latest tag (snapshot)? Due to this issue, I am not able to deploy Thanos on my OpenShift cluster; my pods are not running.

Internal error occurred: error resolving resource when running on plain k8s

I see the README is geared toward the hub cluster being OpenShift; however, I also see wording that suggests this may work on plain k8s as well:

Note: By default, the API conversion webhook uses the OpenShift service serving certificate feature to manage its certificate. You can replace it with cert-manager if you want to run the multicluster-observability-operator on a plain Kubernetes cluster.

I'm using local kind clusters with k8s v1.26.0. I've gotten as far as the command below but I'm hitting an error. I'm thinking it could be related to the webhook?

kubectl -n open-cluster-management-observability apply -f operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml

Error from server (InternalError): error when retrieving current configuration of:
Resource: "observability.open-cluster-management.io/v1beta2, Resource=multiclusterobservabilities", GroupVersionKind: "observability.open-cluster-management.io/v1beta2, Kind=MultiClusterObservability"
Name: "observability", Namespace: ""
from server for: "operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml": Internal error occurred: error resolving resource

Is there a way I can get this add-on to work with plain k8s?

username has to be encoded in switch-to-grafana-admin.sh script

I followed the instructions to set up the Grafana developer instance:
https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.2/html/observing_environments/observing-environments-intro#designing-your-grafana-dashboard

When I tried switch-to-grafana-admin.sh, I encountered an issue because my ID is "IAM#changwoo", since I installed RHACM on ROKS (Red Hat OpenShift on IBM Cloud).

Basically, "#" made things complicated.

I suggest a fix like the following at line 70:

encoded_user_name=$(echo -n "$user_name" | jq -sRr '@uri')
userID=$($curlCMD -s -X GET -H "Content-Type: application/json" -H "X-Forwarded-User: $XForwardedUser" "127.0.0.1:3001/api/users/lookup?loginOrEmail=$encoded_user_name" | $PYTHON_CMD -c "import sys, json; print(json.load(sys.stdin)['id'])" 2>/dev/null)

orgID=$($curlCMD -s -X GET -H "Content-Type: application/json" -H "X-Forwarded-User: $XForwardedUser" "127.0.0.1:3001/api/users/lookup?loginOrEmail=$encoded_user_name" | $PYTHON_CMD -c "import sys, json; print(json.load(sys.stdin)['orgId'])" 2>/dev/null)
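
As a quick sanity check of the encoding step:

echo -n 'IAM#changwoo' | jq -sRr '@uri'
# IAM%23changwoo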

Feature request - users should be able to secure Grafana observability route using own certificates

As of now, users are not able to configure TLS settings for the Grafana route (i.e., generate and use their own certificates), and the route uses the default OpenShift ingress certificate. There are no settings in the Observability CRD to specify a certificate for the route.
The idea is to give users more flexibility with the Grafana route configuration so they can configure TLS settings for the route. This could be a reference to a secret in the MultiClusterObservability CR, so that the operator sets the route's TLS settings using certificates from the secret.
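
For illustration only, the requested API could look something like the snippet below. This field does not exist today; its name is hypothetical:

apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  grafanaRouteTLSSecret: grafana-route-certs   # hypothetical field; secret holding tls.crt/tls.key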

Object aliases are already widely used in Nginx Operator.

Describe the bug
Kubernetes resources and their aliases are distinguished only by API group (domain). Resources with the same name are common, and we often forget about it. But in this case, ACM uses a resource name that is already used in an NGINX-driven environment. That can lead to tedious cases.

To Reproduce
Install Open Cluster Management (Red Hat Advanced Cluster Management) and the NGINX Ingress Controller Operator, then try to get a policy (there goes the problem: which one?):

oc api-resources|egrep  '^NAME| plc | pol '
NAME                                  SHORTNAMES         APIGROUP                                       NAMESPACED   KIND
policies                              plc                policy.open-cluster-management.io              true         Policy
policies                              pol                k8s.nginx.org                                                 true         Policy

Workaround
You can use either the fully qualified resource name or the short name:

plc or policies.policy.open-cluster-management.io
pol or policies.k8s.nginx.org
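
For example, to disambiguate explicitly:

oc get policies.policy.open-cluster-management.io -A   # Open Cluster Management policies
oc get plc -A                                          # same resource, via short name
oc get policies.k8s.nginx.org -A                       # NGINX policies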

Solution
There is no solution apart from changing the names. I won't get into who used the name first, but at least we have a trace in the issues.
