
prometheus-kubernetes's Introduction

Monitoring Kubernetes clusters on AWS, GCP and Azure using Prometheus Operator by CoreOS


Note: the work on this repository is now based on CoreOS's kube-prometheus, which will be the default option for Kubernetes 1.7.x and up. For 1.5.x and 1.6.x you can deploy a simpler solution, located in the ./basic directory. The purpose of this project is to provide a simple and interactive method to deploy and configure Prometheus on Kubernetes, especially for users who are not using Helm.

Features

  • Prometheus Operator with support for Prometheus v2.X.X
  • highly available Prometheus and Alertmanager
  • in-cluster deployment using StatefulSets for persistent storage
  • auto-discovery for services and pods
  • automatic RBAC configuration
  • preconfigured alerts
  • preconfigured Grafana dashboards
  • easy to set up; usually less than a minute to deploy a complete monitoring solution for Kubernetes
  • support for Kubernetes v1.7.x and up running in AWS, GCP and Azure
  • tested on clusters deployed using kube-aws, kops, GKE and Azure

One minute deployment

(asciicast demo recording)

Prerequisites

  • Kubernetes cluster and kubectl configured
  • Security Groups configured to allow the following ports:
    • 9100/TCP - node-exporter
    • 10250/TCP - Kubernetes node metrics
    • 10251/TCP - kube-scheduler
    • 10252/TCP - kube-controller-manager
    • 10054/TCP and 10055/TCP - kube-dns

Optional

  • SMTP Account for email alerts
  • Token for Slack alerts

Running Kubernetes 1.12 and up?

If you are running Kubernetes 1.12 or higher, you will also need to run cAdvisor on your cluster (bound to host port 4194) in order to access resource usage and performance characteristics of running containers.
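
A minimal sketch of such a cAdvisor DaemonSet, assuming the upstream google/cadvisor image and binding it to host port 4194 as described above; the image tag, namespace and mounts are assumptions to adjust for your environment:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cadvisor
  template:
    metadata:
      labels:
        app: cadvisor
    spec:
      containers:
      - name: cadvisor
        image: google/cadvisor:v0.30.2   # example tag, use a current release
        ports:
        - name: http
          containerPort: 8080
          hostPort: 4194                 # expose cAdvisor on host port 4194
        volumeMounts:
        - {name: rootfs, mountPath: /rootfs, readOnly: true}
        - {name: var-run, mountPath: /var/run, readOnly: true}
        - {name: sys, mountPath: /sys, readOnly: true}
        - {name: docker, mountPath: /var/lib/docker, readOnly: true}
      volumes:
      - {name: rootfs, hostPath: {path: /}}
      - {name: var-run, hostPath: {path: /var/run}}
      - {name: sys, hostPath: {path: /sys}}
      - {name: docker, hostPath: {path: /var/lib/docker}}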

Pre-Deployment

Clone the repository and check out the latest release: curl -L https://git.io/getPrometheusKubernetes | sh -

Custom settings

All component versions can be configured using the interactive deployment script, as can the SMTP account and the Slack token.

Some other settings that can be changed before deployment:

  • Prometheus replicas: default 2 ==> manifests/prometheus/prometheus-k8s.yaml
  • persistent volume size: default 40Gi ==> manifests/prometheus/prometheus-k8s.yaml
  • allocated memory for Prometheus pods: default 2Gi ==> manifests/prometheus/prometheus-k8s.yaml
  • Alertmanager replicas: default 3 ==> manifests/alertmanager/alertmanager.yaml
  • Alertmanager configuration: ==> assets/alertmanager/alertmanager.yaml
  • custom Grafana dashboards: add yours in assets/grafana/ with names ending in -dashboard.json
  • custom alert rules: ==> assets/prometheus/rules/

Note: please commit your changes before deployment if you wish to keep them; the deploy script will overwrite the changes on most of the files.
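
For orientation, a trimmed sketch of the fields these defaults map to in manifests/prometheus/prometheus-k8s.yaml, assuming the usual Prometheus Operator CRD layout (field names may differ slightly between releases):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  replicas: 2                # Prometheus replicas
  resources:
    requests:
      memory: 2Gi            # memory allocated to the Prometheus pods
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 40Gi    # persistent volume size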

Deploy

./deploy

Now you can access the dashboards locally using the kubectl port-forward command, or expose the services using an ingress or a LoadBalancer. Please check the ./tools directory to quickly configure an ingress or proxy the services to localhost.
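
For example, assuming the default service names created by the deploy script (prometheus-k8s, alertmanager-main and grafana in the monitoring namespace), the services can be proxied locally with:

kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093
kubectl -n monitoring port-forward svc/grafana 3000:3000

On older kubectl releases that cannot port-forward to a Service, forward one of the pods instead (e.g. kubectl -n monitoring port-forward prometheus-k8s-0 9090).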

To remove everything, just execute the ./teardown script.

Updating configurations

  • update alert rules: add or change the rules in assets/prometheus/rules/ and execute scripts/generate-rules-configmap.sh. Then apply the changes using kubectl apply -f manifests/prometheus/prometheus-k8s-rules.yaml -n monitoring
  • update grafana dashboards: add or change the existing dashboards in assets/grafana/ and execute scripts/generate-dashboards-configmap.sh. Then apply the changes using kubectl apply -f manifests/grafana/grafana-dashboards.cm.yaml.

Note: all the Grafana dashboards should have names ending in -dashboard.json.
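
As an illustration, a hypothetical rule file you could drop into assets/prometheus/rules/ (Prometheus 2.x rule format; the job label and severity routing are assumptions that depend on your setup):

groups:
- name: example.rules
  rules:
  - alert: NodeExporterDown
    expr: up{job="node-exporter"} == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "node-exporter on {{ $labels.instance }} has been down for 5 minutes"

After saving it, run scripts/generate-rules-configmap.sh and apply the generated ConfigMap as described above.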

Custom Prometheus configuration

The official documentation for Prometheus Operator custom configuration can be found in custom-configuration.md. If you wish, you can update the Prometheus configuration using the ./tools/custom-configuration/update_config script.

prometheus-kubernetes's People

Contributors

archifleks, arealmaas, bpownow, camilb, downneck, drubin, joseppla, kuramal, leeeboo, matthaywardwebdesign, philicious, qqian1991, sairuby, scottbrenner


prometheus-kubernetes's Issues

SMTP configuration for Grafana?

I was wondering if it would make sense to take the custom SMTP configuration provided for Alertmanager and add it to the Grafana config in /etc/grafana/grafana.ini as well?

With Grafana 4.6.3 there seems to be no Alertmanager plug-in installed.
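
For reference, the Grafana side is a standard [smtp] block in grafana.ini (the values below are placeholders), which could be templated the same way the Alertmanager SMTP settings are:

[smtp]
enabled = true
host = smtp.example.com:587
user = alerts@example.com
password = changeme
from_address = grafana@example.com
from_name = Grafana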

Updating prometheus configuration

I need to add additional Prometheus configuration, but I don't see how I can do it. I found this:

update alert rules: add or change the rules in assets/prometheus/rules/ and execute scripts/generate-rules-configmap.sh. Then apply the changes using kubectl apply -f manifests/prometheus/prometheus-k8s-rules.yaml -n monitoring
update grafana dashboards: add or change the existing dashboards in assets/grafana/ and execute scripts/generate-dashboards-configmap.sh. Then apply the changes using kubectl apply -f manifests/grafana/grafana-dashboards.cm.yaml.

Nothing for what I need.

Docker push failing on fedora, unable to complete init.sh

Building Grafana Docker image and pushing to dockerhub
Sending build context to Docker daemon 206.8 kB
Step 1/5 : FROM grafana/grafana:4.5.0
---> 26ce8e4a4b18
Step 2/5 : MAINTAINER Camil Blanaru [email protected]
---> Running in 9575f1fb31d4
---> e877fa9a5a7b
Removing intermediate container 9575f1fb31d4
Step 3/5 : ADD grafana-config/grafana.ini /etc/grafana/grafana.ini
---> 0fa652b8ca5e
Removing intermediate container 7ae47a753f8a
Step 4/5 : ADD grafana-dashboards/* /var/lib/grafana/dashboards/
---> 4aafce0feadf
Removing intermediate container c9e762439028
Step 5/5 : VOLUME /var/lib/grafana /var/log/grafana /etc/grafana
---> Running in e13c2c2541d1
---> 39f37b7fa2b0
Removing intermediate container e13c2c2541d1
Successfully built 39f37b7fa2b0
The push refers to a repository [registry.fedoraproject.org/gnufreex/grafana]
80ef99734457: Preparing
75e389a229a6: Preparing
5f70bf18a086: Preparing
c21816ecbe0a: Preparing
d6a9e60bf7b5: Preparing
18f9b4e2e1bc: Waiting
unauthorized: authentication required
docker push failed! perhaps you need to login "gnufreex" to dockerhub?
Password:
The push refers to a repository [registry.fedoraproject.org/gnufreex/grafana]
80ef99734457: Preparing
75e389a229a6: Preparing
5f70bf18a086: Preparing
c21816ecbe0a: Preparing
d6a9e60bf7b5: Preparing
18f9b4e2e1bc: Waiting
unauthorized: authentication required
docker push failed a second time! exiting.

error in "Waiting for Operator to register custom resource definitions........................"

I deployed the entire Grafana/Prometheus stack successfully and used it for a few days; when I was trying to apply a few changes, I got a few errors.
Later, I used ./teardown to delete everything, and when I started ./deploy again it got stuck at "Waiting for Operator to register custom resource definitions..........................":

  • "abhijitzanak-mac:~ abhijitzanak$ kubectl get pods -n monitoring
  • NAME READY STATUS RESTARTS AGE
  • prometheus-operator-430984089-97sgq 0/1 CrashLoopBackOff 3 1m
  • abhijitzanak-mac:~ abhijitzanak$ kubectl logs -f prometheus-operator-430984089-97sgq -n monitoring
  • ts=2017-11-26T16:07:45Z caller=operator.go:156 component=alertmanageroperator msg="connection established" cluster-version=v1.6.7
  • ts=2017-11-26T16:07:45Z caller=operator.go:257 component=prometheusoperator msg="connection established" cluster-version=v1.6.7
  • ts=2017-11-26T16:07:45Z caller=main.go:147 msg="Unhandled error received. Exiting..." err="Creating CRD: Alertmanager: the server could not find the requested resource"

It is not processing further and is not able to deploy the entire prometheus-kubernetes stack.
Please let me know how I can solve this issue.

prometheus-k8s-0/prometheus-k8s-1 stuck at pending state

First of all, this tutorial for setting up Prometheus was really great and user friendly.
Unfortunately I got stuck after the deployment was executed.

When I execute the command kubectl get pods -n monitoring, I can see all the Prometheus Operator pods running except prometheus-k8s-0 and prometheus-k8s-1.

Need help to proceed further on this deployment.

Monitoring NGiNX ingress

Hello!

Great piece of software, made deploying prometheus much easier and less painful!
I've however been running into a slight issue: when trying to get Prometheus to recognize an NGiNX ingress that has been deployed using Helm, it won't show up in the /targets list. Other services can be monitored just fine, but NGiNX is the only one being stubborn.

Is this a configuration issue? If so, what could be done to get prometheus to recognize the NGiNX ServiceMonitor?

Thanks 👍

Deployment manifest:
{
  "kind": "Deployment",
  "apiVersion": "extensions/v1beta1",
  "metadata": {
    "name": "default-nginx-ingress-controller",
    "namespace": "ingress",
    "selfLink": "/apis/extensions/v1beta1/namespaces/ingress/deployments/default-nginx-ingress-controller",
    "uid": "a98f2411-fee1-11e7-b7ed-42010a840009",
    "resourceVersion": "159937",
    "generation": 5,
    "creationTimestamp": "2018-01-21T19:31:22Z",
    "labels": {
      "app": "nginx-ingress",
      "chart": "nginx-ingress-0.8.26",
      "component": "controller",
      "heritage": "Tiller",
      "release": "default-nginx-ingress"
    },
    "annotations": {
      "deployment.kubernetes.io/revision": "1"
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "matchLabels": {
        "app": "nginx-ingress",
        "component": "controller",
        "release": "default-nginx-ingress"
      }
    },
    "template": {
      "metadata": {
        "creationTimestamp": null,
        "labels": {
          "app": "nginx-ingress",
          "component": "controller",
          "release": "default-nginx-ingress"
        },
        "annotations": {
          "checksum/config": "ca0755a5e5dbce143322cdb78f9294bee8dffc1b806eee9b879211352ab1bd77"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "nginx-ingress-controller",
            "image": "k8s.gcr.io/nginx-ingress-controller:0.9.0-beta.15",
            "args": [
              "/nginx-ingress-controller",
              "--default-backend-service=ingress/default-nginx-ingress-default-backend",
              "--election-id=ingress-controller-leader",
              "--ingress-class=nginx",
              "--configmap=ingress/default-nginx-ingress-controller"
            ],
            "ports": [
              {
                "name": "http",
                "containerPort": 80,
                "protocol": "TCP"
              },
              {
                "name": "https",
                "containerPort": 443,
                "protocol": "TCP"
              },
              {
                "name": "stats",
                "containerPort": 18080,
                "protocol": "TCP"
              }
            ],
            "env": [
              {
                "name": "POD_NAME",
                "valueFrom": {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.name"
                  }
                }
              },
              {
                "name": "POD_NAMESPACE",
                "valueFrom": {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.namespace"
                  }
                }
              }
            ],
            "resources": {},
            "livenessProbe": {
              "httpGet": {
                "path": "/healthz",
                "port": 10254,
                "scheme": "HTTP"
              },
              "initialDelaySeconds": 10,
              "timeoutSeconds": 1,
              "periodSeconds": 10,
              "successThreshold": 1,
              "failureThreshold": 3
            },
            "readinessProbe": {
              "httpGet": {
                "path": "/healthz",
                "port": 10254,
                "scheme": "HTTP"
              },
              "timeoutSeconds": 1,
              "periodSeconds": 10,
              "successThreshold": 1,
              "failureThreshold": 3
            },
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
          },
          {
            "name": "nginx-ingress-stats-exporter",
            "image": "sophos/nginx-vts-exporter:v0.6",
            "ports": [
              {
                "name": "metrics",
                "containerPort": 9913,
                "protocol": "TCP"
              }
            ],
            "env": [
              {
                "name": "METRICS_ADDR",
                "value": ":9913"
              },
              {
                "name": "METRICS_ENDPOINT",
                "value": "/metrics"
              },
              {
                "name": "METRICS_NS",
                "value": "nginx"
              },
              {
                "name": "NGINX_STATUS",
                "value": "http://localhost:18080/nginx_status/format/json"
              }
            ],
            "resources": {},
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
          }
        ],
        "restartPolicy": "Always",
        "terminationGracePeriodSeconds": 60,
        "dnsPolicy": "ClusterFirst",
        "serviceAccountName": "default-nginx-ingress",
        "serviceAccount": "default-nginx-ingress",
        "securityContext": {},
        "schedulerName": "default-scheduler"
      }
    },
    "strategy": {
      "type": "RollingUpdate",
      "rollingUpdate": {
        "maxUnavailable": 1,
        "maxSurge": 1
      }
    },
    "revisionHistoryLimit": 10
  },
  "status": {
    "observedGeneration": 5,
    "replicas": 1,
    "updatedReplicas": 1,
    "readyReplicas": 1,
    "availableReplicas": 1,
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastUpdateTime": "2018-01-21T19:31:22Z",
        "lastTransitionTime": "2018-01-21T19:31:22Z",
        "reason": "MinimumReplicasAvailable",
        "message": "Deployment has minimum availability."
      }
    ]
  }
}
ServiceMonitor manifest:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-ingress-monitor
  namespace: monitoring
  labels:
    k8s-app: nginx-ingress
spec:
  selector:
    matchLabels:
      k8s-app: nginx-ingress
  endpoints:
  - port: exporter
    interval: 10s
  namespaceSelector:
    matchNames:
    - ingress
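
One thing worth checking: a ServiceMonitor selects Services (not Deployments) by label, and the endpoint port name must match a named port on that Service. A hypothetical Service that the ServiceMonitor above would pick up could look like this (the name, labels and port wiring are assumptions derived from the manifests shown):

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-metrics
  namespace: ingress
  labels:
    k8s-app: nginx-ingress          # must match the ServiceMonitor matchLabels
spec:
  selector:
    app: nginx-ingress              # pod labels from the Helm release
    component: controller
    release: default-nginx-ingress
  ports:
  - name: exporter                  # must match the ServiceMonitor endpoint port name
    port: 9913
    targetPort: metrics             # the nginx-vts-exporter container port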

default backend - 404

So I tried the Ingress controller. When I use the ingress load balancer address I get from the describe command, I get the following message in the browser:

default backend - 404

So it is hitting the frontend, but the backend is gone. The previous setup with a manually added type: LoadBalancer in the svc file worked OK, but we need it to be accessible from the cluster and the AWS network as well.

rbac errors on clean deploy

in attempting to deploy this on a 1.7.8-gke.0 cluster with legacy auth enabled (I'm not using RBAC), I get the following:

Deploying Prometheus Operator
serviceaccount "prometheus-operator" created
clusterrolebinding "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Error from server (Forbidden): error when creating "manifests/prometheus-operator/prometheus-operator-cluster-role-binding.yaml": clusterroles.rbac.authorization.k8s.io "prometheus-operator" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["thirdpartyresources"], APIGroups:["extensions"], Verbs:[""]} PolicyRule{Resources:["customresourcedefinitions"], APIGroups:["apiextensions.k8s.io"], Verbs:[""]} PolicyRule{Resources:["alertmanagers"], APIGroups:["monitoring.coreos.com"], Verbs:[""]} PolicyRule{Resources:["prometheuses"], APIGroups:["monitoring.coreos.com"], Verbs:[""]} PolicyRule{Resources:["servicemonitors"], APIGroups:["monitoring.coreos.com"], Verbs:[""]} PolicyRule{Resources:["statefulsets"], APIGroups:["apps"], Verbs:[""]} PolicyRule{Resources:["configmaps"], APIGroups:[""], Verbs:[""]} PolicyRule{Resources:["secrets"], APIGroups:[""], Verbs:[""]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["update"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["update"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["namespaces"], APIGroups:[""], Verbs:["list"]}] user=&{[email protected] [system:authenticated] map[]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/" "/apis" "/apis/" "/healthz" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]
Waiting for Operator to register custom resource definitions.....done!

is GKE supported without RBAC?

Is it because I am not an Owner of the project?

I'm not seeing my services show up when I scrape

Do I have something misconfigured? I was under the impression that if I launch my services with

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5000"
  creationTimestamp: null

Operator would pick them up on <svc.namespace....>:5000/metrics. What gives? Why don't I see them in the service discovery screen?

How to add exporters manually?

Hello, how can I create a target for an exporter, for example Redis?
In which file should I add the configuration?
Thank you.
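
With the Prometheus Operator, targets are normally added with a ServiceMonitor rather than by editing prometheus.yml. A hedged sketch for a hypothetical Redis exporter that already has a Service labelled app: redis-exporter with a port named metrics (label, namespace and port names here are assumptions); save it as a new manifest next to the other ServiceMonitors and kubectl apply it:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: redis-exporter        # labels on the exporter's Service
  namespaceSelector:
    matchNames:
    - default                    # namespace where the exporter Service lives
  endpoints:
  - port: metrics                # named port on the Service
    interval: 30s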

Prometheus-k8s-0 Pod Stuck in ContainerCreating

Hi,
I've been experiencing a problem with the prometheus-k8s-0 pod. It always gets stuck in the ContainerCreating stage.

The reason is that the volumes can't be mounted:

Warning  FailedMount            1m (x2 over 3m)  kubelet, ip-yada-yada.internal  Unable to mount volumes for pod "prometheus-k8s-0_monitoring(sdsadsad-sdasd-asdsd-1111ac)": timeout expired waiting for volumes to attach/mount for pod "monitoring"/"prometheus-k8s-0". list of unattached/unmounted volumes=[prometheus-k8s-db]
Warning  FailedSync             1m (x2 over 3m)  kubelet, ip-yada-yada.interna Error syncing pod

Have you come across this issue before?
I'm running KOPS 1.8.4.

Thank you!

Azure RBAC error on deploy

Hi

I'm getting an error similar to

#40

I'm running on Azure ACS K8S 1.7.7 and I'm deploying the latest code using commit cd8bdb9

I've tried doing

kubectl create clusterrolebinding admin-binding --clusterrole=cluster-admin --user=<user>

where <user> is obtained using the command

kubectl config view

Deploying Prometheus
serviceaccount "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
clusterrolebinding "prometheus-k8s" created
Error from server (Forbidden): error when creating "manifests/prometheus/prometheus-k8s-rbac.yaml": roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["configmaps"], APIGroups:[""], Verbs:["get"]}] user=&{kubeconfig [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]
Error from server (Forbidden): error when creating "manifests/prometheus/prometheus-k8s-rbac.yaml": roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]}] user=&{kubeconfig [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]
Error from server (Forbidden): error when creating "manifests/prometheus/prometheus-k8s-rbac.yaml": roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]}] user=&{kubeconfig [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]
Error from server (Forbidden): error when creating "manifests/prometheus/prometheus-k8s-rbac.yaml": clusterroles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["nodes/metrics"], APIGroups:[""], Verbs:["get"]} PolicyRule{NonResourceURLs:["/metrics"], Verbs:["get"]}] user=&{kubeconfig [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]
servicemonitor "kube-scheduler" created
servicemonitor "kube-apiserver" created
servicemonitor "kube-controller-manager" created
servicemonitor "node-exporter" created
servicemonitor "kubelet" created
service "prometheus-k8s" created
servicemonitor "prometheus-operator" created
prometheus "k8s" created
servicemonitor "alertmanager" created
servicemonitor "prometheus" created
servicemonitor "kube-dns" created
servicemonitor "kube-state-metrics" created
configmap "prometheus-k8s-rules" created

After installation, I can only see the basic node info but no Kubernetes-specific data such as namespaces, pods and deployments.

Any ideas ?

Persist dashboards created by Grafana users

Whenever the Grafana pod is recreated (e.g. during a rolling update of the k8s cluster), it loses all dashboards that have been created by Grafana users. The new Grafana pod only has the default dashboards.

What's strange is that the alarm definitions seem to be still available:

# kubectl logs  grafana-588cdcbfbc-gd2md -n monitoring grafana
t=2018-07-02T10:24:11+0000 lvl=info msg="Starting Grafana" logger=server version=5.1.1 commit=dc11bf702 compiled=2018-05-07T11:50:10+0000
t=2018-07-02T10:24:11+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2018-07-02T10:24:11+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.provisioning=/etc/grafana/provisioning"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.log.mode=console"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_DATA=/var/lib/grafana"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_LOGS=/var/log/grafana"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_USER=grafana_admin"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_PASSWORD=*********"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_AUTH_ANONYMOUS_ENABLED=false"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_AUTH_BASIC_ENABLED=true"
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Provisioning" logger=settings path=/etc/grafana/provisioning
t=2018-07-02T10:24:11+0000 lvl=info msg="App mode production" logger=settings
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing DB" logger=sqlstore dbtype=sqlite3
t=2018-07-02T10:24:11+0000 lvl=info msg="Starting DB migration" logger=migrator
t=2018-07-02T10:24:11+0000 lvl=info msg="Executing migration" logger=migrator id="copy data account to org"
t=2018-07-02T10:24:11+0000 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account to org"
t=2018-07-02T10:24:11+0000 lvl=info msg="Executing migration" logger=migrator id="copy data account_user to org_user"
t=2018-07-02T10:24:11+0000 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account_user to org_user"
t=2018-07-02T10:24:11+0000 lvl=info msg="Starting plugin search" logger=plugins
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing Alerting" logger=alerting.engine
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing CleanUpService" logger=cleanup
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing Stream Manager"
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing HTTP Server" logger=http.server address=0.0.0.0:3000 protocol=http subUrl= socket=
...
t=2018-07-02T10:26:03+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=37 name="Replicas alert - test" error="Could not find datasource Data source not found" changing state to=alerting
...

Connecting Scylla and getting ingress up

Hello, Gotta say this repository has been a savior for setting up Prometheus and Grafana.
I have scylla setup on my cluster and Prometheus.

But I have no idea how to connect the two, if you have any idea how to connect them or point me to a tutorial, I would appreciate it greatly.

Thanks

Security Group Configuration

First, thanks for putting this together! It makes deploying Prometheus and Grafana in kubernetes super easy.

My deployment went great and most everything is up and running but I am getting errors about the control plane, scheduler, etc. I am certain it is because I have not set up the Security Groups primarily because I am not sure what I should open the ports up to. Can you provide some more guidance around what the ports need to be opened to? Do they need to be opened to the internet or internally between namespaces?

I am using Azure, btw.

Thanks!

Helm Usage

Your docs say

The purpose of this project is to provide a simple and interactive method to deploy and configure Prometheus on Kubernetes, especially for users who are not using Helm.

For users using Helm, what is the way to deploy prometheus-kubernetes?

Is it possible to call the deploy script without interactive user input?

Hey 😊
First of all, great work; I like this setup a lot!

I want to use your setup on a cluster, but I need a way to automate the ./deploy call without interactive user input. Is there a possibility for that? Like passing the parameters directly to the script, e.g. ./deploy --prometheus-operator-version=<version> ... or something like that?

Deployment on GKE shouldn't use self-hosted controller / scheduler

Asking for clarification from anyone else as well but on Google Kubernetes Engine (GKE). Are the kube-controller-manager & kube-scheduler even visible?

I cannot see the pods in kube-system namespace and I think that's because google manages the master node for you. Therefore the service definitions for kube-controller-manager and kube-scheduler are not relevant to a deployment on GKE.

I checked using:

kubectl get pod -n kube-system -l k8s-app=kube-controller-manager
kubectl get pod -n kube-system -l k8s-app=kube-scheduler

The result is that you seem to get constant alerts for those two components being down.
Am I right in the above - or by deploying the relevant service, should those two components be monitorable?

Temp Workaround

If you don't want to deploy those components and therefore not get alerts then you need to do the following:

Note that I deployed all then sort of hacked away by deleting various elements until the alerts stopped. So the above is untested, but I think it should roughly work looking at the deployment script.

Please let me know if they are visible as I'd love to leave the alerts in if they work properly.
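
If they really aren't reachable on GKE, one hedged workaround is to delete the corresponding ServiceMonitors (and, if the deploy script created them, the matching Services in kube-system) so the targets and their alerts disappear, for example:

kubectl -n monitoring delete servicemonitor kube-scheduler kube-controller-manager
kubectl -n kube-system get svc | grep -E 'kube-scheduler|kube-controller-manager'   # then delete any Services the deploy script added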

Deploying twice leads to bizarre errors

When deployed over and over again, at a certain point errors like these surface up:

The Prometheus "k8s" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
configmap "prometheus-k8s-rules" created
The ServiceMonitor "alertmanager" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-apiserver" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-controller-manager" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-dns" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-scheduler" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-state-metrics" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kubelet" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "node-exporter" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "prometheus-operator" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "prometheus" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1

Some metrics are missing

I have installed this yesterday and still not able to get all the metrics

(screenshots attached)

In the third screenshot, alerts in Prometheus are giving me no data points. That is the case for every alert except the "Pods restarting too much" alert, which gives me data about the pods that are failing.

During the install process I only changed NodePort to LoadBalancer for the front ends and left everything else on the defaults. It is Prometheus 2.0 beta.

Prometheus 2G Limit

I have attempted to change this, but no matter what I set it to, it still ends up being 2Gi in the YAML on the cluster.

The problem is that I am just testing it out on GKE using their n1-standard-1 machine type, which has 3.75GB, and once the system components are added the pod cannot be scheduled, even if I add more nodes.

Is there any way to make this a lower value, or to change wherever it's actually being set from? Even if I remove it from the YAML, commit the change and run deploy, it still ends up at 2Gi on the cluster.
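
The memory setting normally comes from the Prometheus custom resource, which the operator uses to generate the StatefulSet, so lowering it in manifests/prometheus/prometheus-k8s.yaml and applying that file directly should take effect even if ./deploy keeps resetting it; a hedged sketch of the relevant fields (values here are illustrative):

spec:
  resources:
    requests:
      memory: 512Mi     # lower the request so the pod fits on an n1-standard-1
    limits:
      memory: 1Gi

Then apply it with kubectl apply -f manifests/prometheus/prometheus-k8s.yaml -n monitoring.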

Custom configuration gets overwritten

I've been attempting to deploy prometheus federation using a custom configuration but think I am not understanding something fully.

I have three clusters with external URLs configured as lab1, lab2 and lab3:

https://api.lab1.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
https://api.lab1.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web

Everything works great for the individual clusters, but when I try to configure lab1 as the federation server the configuration never appears to take...

Steps to recreate after successful deployment and testing external urls:

kubectl -n monitoring delete prometheus k8s

I edit tools/custom-configuration/prometheus-k8s-secret.prometheus.yaml and add the following underneath scrape_configs:

  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
        - 'https://api.lab2.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web'
        - 'https://api.lab3.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web'

This seems fine and when i deploy the secret I can decode the base64 and see that it is correct
kubectl -n monitoring create secret generic prometheus-k8s --from-file=./prometheus-k8s-secret/

However, when I deploy the new Prometheus with
kubectl -n monitoring create -f prometheus-k8s.yaml
it overwrites the prometheus.yaml in the secret (I decoded the base64 as I deployed this and saw it was immediately overwritten):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    prometheus: k8s
spec:
  replicas: 2
  version: v2.2.0-rc.1
  externalUrl: 
   https://api.lab1.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector: {}
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: k8s
  resources:
  storage:
    volumeClaimTemplate:
      metadata:
        annotations:
          annotation1: prometheus
      spec:
        storageClassName: ssd
        resources:
          requests:
            storage: 40Gi
    requests:
      memory: 1Gi
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager-main
      port: web

What am I missing here? I've tried everything I can think of, including scorched-earth re-creation of the cluster and re-cloning the repo.

Doesn't work with Custom cloud provider - with no persistence

Hey, I deployed this by choosing 4: Custom as the cloud provider, so expecting no persistence. Everything got deployed except Prometheus itself; in the operator logs I got this:

ts=2018-04-26T19:47:06.023555918Z caller=operator.go:158 component=alertmanageroperator msg="connection established" cluster-version=v1.10.1
ts=2018-04-26T19:47:06.525003228Z caller=operator.go:540 component=alertmanageroperator msg="CRD created" crd=Alertmanager
ts=2018-04-26T19:47:06.529602055Z caller=operator.go:1036 component=prometheusoperator msg="CRD created" crd=Prometheus
ts=2018-04-26T19:47:06.574189585Z caller=operator.go:1036 component=prometheusoperator msg="CRD created" crd=ServiceMonitor
ts=2018-04-26T19:47:09.547596022Z caller=operator.go:172 component=alertmanageroperator msg="CRD API endpoints ready"
ts=2018-04-26T19:47:10.625701877Z caller=operator.go:305 component=alertmanageroperator msg="Alertmanager added" key=monitoring/main
ts=2018-04-26T19:47:10.625812527Z caller=operator.go:381 component=alertmanageroperator msg="sync alertmanager" key=monitoring/main
ts=2018-04-26T19:47:10.741597917Z caller=operator.go:381 component=alertmanageroperator msg="sync alertmanager" key=monitoring/main
ts=2018-04-26T19:47:10.812573384Z caller=operator.go:345 component=alertmanageroperator msg="update handler" old=318005 cur=318008
ts=2018-04-26T19:47:10.812668745Z caller=operator.go:381 component=alertmanageroperator msg="sync alertmanager" key=monitoring/main
ts=2018-04-26T19:47:12.634674503Z caller=operator.go:272 component=prometheusoperator msg="CRD API endpoints ready"
ts=2018-04-26T19:47:18.843127314Z caller=operator.go:345 component=alertmanageroperator msg="update handler" old=318008 cur=318153
ts=2018-04-26T19:47:18.843345626Z caller=operator.go:381 component=alertmanageroperator msg="sync alertmanager" key=monitoring/main
ts=2018-04-26T19:47:18.844365776Z caller=operator.go:637 component=prometheusoperator msg="update handler" old=318008 cur=318153
E0426 19:47:21.722454       1 streamwatcher.go:109] Unable to decode an event from the watch stream: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:21.726264       1 streamwatcher.go:109] Unable to decode an event from the watch stream: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:22.730912       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:23.735996       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:24.741082       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:25.822169       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:26.828386       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

Help ?

Updating new grafana dashboards.

When I create a dashboard in Grafana it doesn't save, so I must export it and follow the steps here to import it permanently:

update grafana dashboards: add or change the existing dashboards in assets/grafana/ and execute scripts/generate-dashboards-configmap.sh. Then apply the changes using kubectl apply -f manifests/grafana/grafana-dashboards.cm.yaml.

and when I go to apply the changes, I get this,
The ConfigMap "grafana-dashboards" is invalid: metadata.annotations: Too long: must have at most 262144 characters

Kubernetes 1.9.6, what branch should I use?

We're doing a jump from Kubernetes 1.7.3 to 1.9.6 and I was wondering what branch of this project I should use, and whether that branch ditches the cAdvisor metrics for the kubelet stats API.

Deploying to GKE with LoadBalancer results in 404

I've deployed to my GCE / GKE environment with ./deploy.
After everything is up, I can proxy to all instances.

But if I deploy the Nginx ingress over tools/ingress/init.sh I'm getting first a 400 error in the UI.
This could be solved with changing the service type from ClusterIP to NodePort for all services.

Now the status looks good and I can reach my test domain grafana.example.com
But I get a 404 error for all files except the initial html document.
Any idea?

How to get jmx metrics in prometheus-kubernetes project?

Hi guys! Thank you for your work!
Everything is good, but I just don't know how to get metrics from the Prometheus JMX exporter agent.
The agent is installed in the app container and listens on port 9090:

spec:
      containers:
      - image: registry/image:version
        imagePullPolicy: Always
        name: app-name
        env:
        - name: JAVA_CUSTOM_OPTS
          value: "-Xms256m -Xmx512m -javaagent:./jmx_prometheus_javaagent.jar=9090:jmx/prom-jmx-agent-config.yml"
        ports:
        - name: app
          containerPort: 8080
        - name: monitoring
          containerPort: 9090 

How can I add this new target to the Prometheus configuration? Usually I would just edit the prometheus.yml config file, but I don't understand how to add it in your configuration.
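
With the operator, the usual pattern is a Service that exposes the agent port plus a ServiceMonitor that selects it, instead of editing prometheus.yml directly. A hedged sketch, assuming the pods carry the label app: app-name and the Service is created in the application's namespace (names and labels here are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: app-name-jmx
  labels:
    app: app-name
spec:
  selector:
    app: app-name
  ports:
  - name: monitoring        # matches the container port name above
    port: 9090
    targetPort: monitoring
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-name-jmx
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: app-name
  namespaceSelector:
    any: true
  endpoints:
  - port: monitoring
    interval: 30s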

Did not find expected key

Running ./deploy on new kubernetes cluster printed the following while deploying grafana:

Deploying Grafana
Enter Grafana administrator username [admin]: 
Enter Grafana administrator password: *****
secret "grafana-credentials" created
configmap "grafana-dashboards" created
service "grafana" created
error: error converting YAML to JSON: yaml: line 72: did not find expected key

Just thought I would bring your attention to this

Operator reliably creates zombie namespace

kubectl get namespaces
NAME          STATUS        AGE
default       Active        82d
jenkins       Active        10d
kube-public   Active        82d
kube-system   Active        82d
monitoring    Terminating   2h
monitoring2   Terminating   26m
rabbit        Active        2d

I first started the operator deploy script, and it hung at

Waiting for Operator to register custom resource definitions.........................................................................................................................................................................................................................................................................................................................................................................................................................^C
So I interrupted it and ran the teardown script, but the namespace is stuck in the Terminating phase. I cannot run the script again until I change the namespace variable to monitoring2.

{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "creationTimestamp": "2017-09-24T21:05:34Z",
        "deletionTimestamp": "2017-09-24T21:14:27Z",
        "name": "monitoring2",
        "resourceVersion": "11461478",
        "selfLink": "/api/v1/namespaces/monitoring2",
        "uid": "1b94baae-a16c-11e7-8350-02a930332a2a"
    },
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}
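
A namespace stuck in Terminating usually means something inside it cannot be cleaned up (here most likely the monitoring.coreos.com custom resources after the operator was interrupted). A hedged last-resort workaround is to clear the namespace finalizer through the finalize subresource:

kubectl get namespace monitoring -o json > ns.json
# edit ns.json and remove "kubernetes" from spec.finalizers, then:
kubectl proxy &
curl -X PUT --data-binary @ns.json -H "Content-Type: application/json" \
  http://127.0.0.1:8001/api/v1/namespaces/monitoring/finalize

Repeat for monitoring2 if needed.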

Grafana shows less than 2 days of graphs (~30hours)

Hi @camilb ,
Not sure if you ran into this issue, but I have a stock deployment of prometheus-kubernetes using the deploy script (no email/Slack integration) on two k8s clusters deployed with kops. One is using the default Debian Jessie image, the other one is CoreOS. On both, when I look at Grafana's graphs I only ever have the last ~30 hours graphed. Worth mentioning that the deployment is older than 30 hours.

I think it might be related to how much data retention Prometheus has declared in its config. After deployment on k8s there is a StatefulSet for Prometheus, and in its configuration I see - '--storage.tsdb.retention=24h'. How can I change this to a week or more? (Editing the StatefulSet on k8s doesn't work; the changes aren't applied.)

Thanks!

(screenshot attached)

NAME                                   READY     STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2       Running   2          9d
alertmanager-main-1                    2/2       Running   2          9d
alertmanager-main-2                    2/2       Running   2          9d
grafana-2372210178-z9wwp               2/2       Running   2          9d
kube-state-metrics-2628134419-pb6lz    2/2       Running   2          9d
node-exporter-3ks0r                    1/1       Running   1          9d
node-exporter-h0l49                    1/1       Running   1          9d
node-exporter-n6wk9                    1/1       Running   1          9d
node-exporter-ss48g                    1/1       Running   1          9d
node-exporter-stvdz                    1/1       Running   1          9d
prometheus-k8s-0                       2/2       Running   2          9d
prometheus-k8s-1                       2/2       Running   2          9d
prometheus-operator-1613319266-9bfzw   1/1       Running   1          9d
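
Retention is set on the Prometheus custom resource, and the operator regenerates the StatefulSet from it, which is why editing the StatefulSet directly gets reverted. A hedged sketch of the change in manifests/prometheus/prometheus-k8s.yaml:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  retention: 168h     # one week instead of the default 24h

Then apply it with kubectl apply -f manifests/prometheus/prometheus-k8s.yaml -n monitoring.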

Add instructions for updating the Services/Stack to README

The README mentions how to update Grafana Dashboards and Prometheus Alert Rules.
However, it doesn't have information on how to update the stack/services themselves.

  • Can one simply pull latest code and re-run ./deploy script and expect it to deploy any updated services/configs?
  • Do you have to run the ./teardown beforehand and will it persist/reattach the storage and data to the newly created stack then?
  • Do you have to manually run kubectl set image / apply -f for updating?

how does this compare to prometheus operator?

How does that project compare to Prometheus operator and kube-prometheus?
What is the roadmap and intention or problem this is trying to solve that might not be solved elsewhere?

I'm running kube-aws 0.9.7 and I've downloaded Prometheus Operator 0.11.1.
I've got Prometheus running, but I had to open the ports by editing the kube-aws security groups. Dashboards are showing up in Grafana OK.

Failed CronJob resource and RBAC config on Azure

Hi,

we have several deployments on AWS and OnPremise, when we deployed to Azure env, it throws the following errors: (defaults accepted everywhere)

E0222 10:00:53.490896 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/cronjob.go:93: Failed to list *v2alpha1.CronJob: the server could not find the requested resource
E0222 10:00:53.858858 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/namespace.go:80: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list namespaces at the cluster scope

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T19:11:02Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

The deployment went through without errors.

App not showing up in Prometheus target

Hi,

Hope you are well! This repository helped me a lot to configure Prometheus and Grafana.
I have deployed a sample web application.
I need your help to get the application metrics onto the targets page of Prometheus. How can I achieve this?

Appreciate your help.

Thanks,
Akhilesh Appana

how I can persist grafana dashboard data

Your repo helped me immensely to configure Prometheus and Grafana. Today I upgraded my cluster to kops 1.7.1.
My problem statement is:

  1. I am creating a few custom dashboards, e.g. using the rabbitmq exporter to get the queue count.
  2. Daily I restart my nodes; how can I persist my dashboard data and data sources? Do I need to add a PVC for Grafana as well, and if yes, where can I add it?
  3. Why am I not able to see my rabbitmq exporter data in the Grafana dashboard and in Prometheus?

How can I achieve the above?
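
One hedged approach for points 1 and 2 is to back Grafana's data directory with a PersistentVolumeClaim instead of an emptyDir, for example (name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-storage
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

and then mount that claim at /var/lib/grafana in the Grafana deployment (volumes/volumeMounts), so dashboards and data sources created in the UI survive pod restarts.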

kubelet Kubernetes node labels are missing

Hi

I encounter the same issue as described in prometheus/prometheus#3294 when deploying Prometheus.

Earlier (before the Prometheus Operator implementation) metrics like
container_memory_working_set_bytes{id='/'}
provided all the node labels (screenshot attached).

But unfortunately now most of the useful labels are missing (screenshot attached).

Grafana 5.x

Hi

I tried to bump the Grafana version up to v5.1.0-beta1, but then I encountered problems with the persistent storage volumes.

This is the error from the Grafana log

GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied

Reverting back to 4.6.3 resolved the issue.

Any plans for supporting Grafana 5.x?

Azure - Grafana+Prometheus deployments fail with "Addition of a managed disk to a VM with blob based disks is not supported."

I'm running on Azure with acs-engine.

The deployment script ends successfully, but the pods are not able to start:

Warning FailedMount 15s (x2 over 1m) attachdetach AttachVolume.Attach failed for volume "pvc-f1da4bbd-f6dc-11e7-a367-000d3a1b19fd" : Attach volume "XXX-dynamic-pvc-f1da4bbd-f6dc-11e7-a367-000d3a1b19fd" to instance "k8s-nodepool1-18946821-0" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="OperationNotAllowed" Message="Addition of a managed disk to a VM with blob based disks is not supported."
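
This typically means the dynamically provisioned PVCs are requesting managed disks while the acs-engine VMs were created with blob (unmanaged) disks. A hedged workaround is to point the volume claims at a StorageClass whose disk kind matches the VMs; a sketch, with the caveat that parameter names can vary by Kubernetes version:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: unmanaged-standard
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Shared                     # blob-based disks; use Managed only on VMs with managed disks
  storageaccounttype: Standard_LRS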

Remove Kube-DNS target from prometheus

Hi

I'm using this project to monitor a k8s cluster on Azure (using acs-engine). I think kube-dns doesn't have its metrics enabled by default. What's the best way to remove the kube-dns target from Prometheus? I tried to update the ConfigMap on k8s without much success.
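
A hedged way to drop the target is to delete the kube-dns ServiceMonitor that the deploy script created (and the kube-dns metrics Service, if one was added) instead of editing the generated configuration:

kubectl -n monitoring delete servicemonitor kube-dns

The corresponding manifest can be re-applied later if kube-dns metrics get enabled.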
