
prometheus-kubernetes's Introduction

Monitoring Kubernetes clusters on AWS, GCP and Azure using Prometheus Operator by CoreOS


Note: the work on this repository is now based on CoreOS's kube-prometheus, which will be the default option for Kubernetes 1.7.x and up. For 1.5.x and 1.6.x you can deploy a simpler solution, located in the ./basic directory. The purpose of this project is to provide a simple and interactive method to deploy and configure Prometheus on Kubernetes, especially for users who are not using Helm.

Features

  • Prometheus Operator with support for Prometheus v2.X.X
  • highly available Prometheus and Alertmanager
  • in-cluster deployment using StatefulSets for persistent storage
  • auto-discovery for services and pods
  • automatic RBAC configuration
  • preconfigured alerts
  • preconfigured Grafana dashboards
  • easy to set up; usually less than a minute to deploy a complete monitoring solution for Kubernetes
  • support for Kubernetes v1.7.x and up running in AWS, GCP and Azure
  • tested on clusters deployed using kube-aws, kops, GKE and Azure

One minute deployment

(asciicast demo recording)

Prerequisites

  • Kubernetes cluster and kubectl configured
  • Security Groups configured to allow the following ports:
    • 9100/TCP - node-exporter
    • 10250/TCP - Kubernetes node metrics
    • 10251/TCP - kube-scheduler
    • 10252/TCP - kube-controller-manager
    • 10054/TCP and 10055/TCP - kube-dns

Optional

  • SMTP Account for email alerts
  • Token for Slack alerts

Running Kubernetes 1.12 and up?

If you are running Kubernetes 1.12 or higher, you will also need to run cAdvisor on your cluster (bound to host port 4194) in order to access resource usage and performance characteristics of running containers.
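
A minimal sketch of such a cAdvisor DaemonSet, assuming the upstream google/cadvisor image and binding it to host port 4194 as described above; the image tag, namespace and mounts are assumptions to adjust for your environment:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cadvisor
  template:
    metadata:
      labels:
        app: cadvisor
    spec:
      containers:
      - name: cadvisor
        image: google/cadvisor:v0.30.2   # example tag, use a current release
        ports:
        - name: http
          containerPort: 8080
          hostPort: 4194                 # expose cAdvisor on host port 4194
        volumeMounts:
        - {name: rootfs, mountPath: /rootfs, readOnly: true}
        - {name: var-run, mountPath: /var/run, readOnly: true}
        - {name: sys, mountPath: /sys, readOnly: true}
        - {name: docker, mountPath: /var/lib/docker, readOnly: true}
      volumes:
      - {name: rootfs, hostPath: {path: /}}
      - {name: var-run, hostPath: {path: /var/run}}
      - {name: sys, hostPath: {path: /sys}}
      - {name: docker, hostPath: {path: /var/lib/docker}}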

Pre-Deployment

Clone the repository and check out the latest release: curl -L https://git.io/getPrometheusKubernetes | sh -

Custom settings

All component versions can be configured using the interactive deployment script, as can the SMTP account and the Slack token.

Some other settings that can be changed before deployment:

  • Prometheus replicas: default 2 ==> manifests/prometheus/prometheus-k8s.yaml
  • persistent volume size: default 40Gi ==> manifests/prometheus/prometheus-k8s.yaml
  • allocated memory for Prometheus pods: default 2Gi ==> manifests/prometheus/prometheus-k8s.yaml
  • Alertmanager replicas: default 3 ==> manifests/alertmanager/alertmanager.yaml
  • Alertmanager configuration: ==> assets/alertmanager/alertmanager.yaml
  • custom Grafana dashboards: add yours in assets/grafana/ with names ending in -dashboard.json
  • custom alert rules: ==> assets/prometheus/rules/

Note: please commit your changes before deployment if you wish to keep them; the deploy script will overwrite the changes on most of the files.
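
For orientation, a trimmed sketch of the fields these defaults map to in manifests/prometheus/prometheus-k8s.yaml, assuming the usual Prometheus Operator CRD layout (field names may differ slightly between releases):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  replicas: 2                # Prometheus replicas
  resources:
    requests:
      memory: 2Gi            # memory allocated to the Prometheus pods
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 40Gi    # persistent volume size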

Deploy

./deploy

Now you can access the dashboards locally using the kubectl port-forward command, or expose the services using an ingress or a LoadBalancer. Please check the ./tools directory to quickly configure an ingress or proxy the services to localhost.
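
For example, assuming the default service names created by the deploy script (prometheus-k8s, alertmanager-main and grafana in the monitoring namespace), the services can be proxied locally with:

kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093
kubectl -n monitoring port-forward svc/grafana 3000:3000

On older kubectl releases that cannot port-forward to a Service, forward one of the pods instead (e.g. kubectl -n monitoring port-forward prometheus-k8s-0 9090).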

To remove everything, just execute the ./teardown script.

Updating configurations

  • update alert rules: add or change the rules in assets/prometheus/rules/ and execute scripts/generate-rules-configmap.sh. Then apply the changes using kubectl apply -f manifests/prometheus/prometheus-k8s-rules.yaml -n monitoring
  • update grafana dashboards: add or change the existing dashboards in assets/grafana/ and execute scripts/generate-dashboards-configmap.sh. Then apply the changes using kubectl apply -f manifests/grafana/grafana-dashboards.cm.yaml.

Note: all the Grafana dashboards should have names ending in -dashboard.json.
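
As an illustration, a hypothetical rule file you could drop into assets/prometheus/rules/ (Prometheus 2.x rule format; the job label and severity routing are assumptions that depend on your setup):

groups:
- name: example.rules
  rules:
  - alert: NodeExporterDown
    expr: up{job="node-exporter"} == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "node-exporter on {{ $labels.instance }} has been down for 5 minutes"

After saving it, run scripts/generate-rules-configmap.sh and apply the generated ConfigMap as described above.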

Custom Prometheus configuration

The official documentation for Prometheus Operator custom configuration can be found in custom-configuration.md. If you wish, you can update the Prometheus configuration using the ./tools/custom-configuration/update_config script.

prometheus-kubernetes's People

Contributors

archifleks, arealmaas, bpownow, camilb, downneck, drubin, joseppla, kuramal, leeeboo, matthaywardwebdesign, philicious, qqian1991, sairuby, scottbrenner


prometheus-kubernetes's Issues

SMTP configuration for Grafana?

I was wondering if it would make sense to take the custom SMTP configuration provided for Alertmanager and add it to the Grafana config in /etc/grafana/grafana.ini as well?

With Grafana 4.6.3 there seems to be no Alertmanager plug-in installed.
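
For reference, the Grafana side is a standard [smtp] block in grafana.ini (the values below are placeholders), which could be templated the same way the Alertmanager SMTP settings are:

[smtp]
enabled = true
host = smtp.example.com:587
user = alerts@example.com
password = changeme
from_address = grafana@example.com
from_name = Grafana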

Updating prometheus configuration

I need to add additional Prometheus configuration, but I don't see how I can do it. I found this:

update alert rules: add or change the rules in assets/prometheus/rules/ and execute scripts/generate-rules-configmap.sh. Then apply the changes using kubectl apply -f manifests/prometheus/prometheus-k8s-rules.yaml -n monitoring
update grafana dashboards: add or change the existing dashboards in assets/grafana/ and execute scripts/generate-dashboards-configmap.sh. Then apply the changes using kubectl apply -f manifests/grafana/grafana-dashboards.cm.yaml.

Nothing for what I need.

Docker push failing on fedora, unable to complete init.sh

Building Grafana Docker image and pushing to dockerhub
Sending build context to Docker daemon 206.8 kB
Step 1/5 : FROM grafana/grafana:4.5.0
---> 26ce8e4a4b18
Step 2/5 : MAINTAINER Camil Blanaru [email protected]
---> Running in 9575f1fb31d4
---> e877fa9a5a7b
Removing intermediate container 9575f1fb31d4
Step 3/5 : ADD grafana-config/grafana.ini /etc/grafana/grafana.ini
---> 0fa652b8ca5e
Removing intermediate container 7ae47a753f8a
Step 4/5 : ADD grafana-dashboards/* /var/lib/grafana/dashboards/
---> 4aafce0feadf
Removing intermediate container c9e762439028
Step 5/5 : VOLUME /var/lib/grafana /var/log/grafana /etc/grafana
---> Running in e13c2c2541d1
---> 39f37b7fa2b0
Removing intermediate container e13c2c2541d1
Successfully built 39f37b7fa2b0
The push refers to a repository [registry.fedoraproject.org/gnufreex/grafana]
80ef99734457: Preparing
75e389a229a6: Preparing
5f70bf18a086: Preparing
c21816ecbe0a: Preparing
d6a9e60bf7b5: Preparing
18f9b4e2e1bc: Waiting
unauthorized: authentication required
docker push failed! perhaps you need to login "gnufreex" to dockerhub?
Password:
The push refers to a repository [registry.fedoraproject.org/gnufreex/grafana]
80ef99734457: Preparing
75e389a229a6: Preparing
5f70bf18a086: Preparing
c21816ecbe0a: Preparing
d6a9e60bf7b5: Preparing
18f9b4e2e1bc: Waiting
unauthorized: authentication required
docker push failed a second time! exiting.

error in "Waiting for Operator to register custom resource definitions........................"

I deployed the entire Grafana/Prometheus stack successfully and used it for a few days; when I was trying to apply a few changes, I got a few errors.
Later, I used ./teardown to delete everything, and when I started ./deploy again it got stuck at "Waiting for Operator to register custom resource definitions..........................":

  • "abhijitzanak-mac:~ abhijitzanak$ kubectl get pods -n monitoring
  • NAME READY STATUS RESTARTS AGE
  • prometheus-operator-430984089-97sgq 0/1 CrashLoopBackOff 3 1m
  • abhijitzanak-mac:~ abhijitzanak$ kubectl logs -f prometheus-operator-430984089-97sgq -n monitoring
  • ts=2017-11-26T16:07:45Z caller=operator.go:156 component=alertmanageroperator msg="connection established" cluster-version=v1.6.7
  • ts=2017-11-26T16:07:45Z caller=operator.go:257 component=prometheusoperator msg="connection established" cluster-version=v1.6.7
  • ts=2017-11-26T16:07:45Z caller=main.go:147 msg="Unhandled error received. Exiting..." err="Creating CRD: Alertmanager: the server could not find the requested resource"

It is not processing further and is not able to deploy the entire prometheus-kubernetes stack.
Please let me know how I can solve this issue.

prometheus-k8s-0/prometheus-k8s-1 stuck at pending state

First of all, this tutorial for setting up Prometheus was really great and user friendly.
Unfortunately I got stuck after the deployment was executed.

When I execute the command kubectl get pods -n monitoring, I can see all the Prometheus Operator pods running except prometheus-k8s-0 and prometheus-k8s-1.

Need help to proceed further on this deployment.

Monitoring NGiNX ingress

Hello!

Great piece of software, made deploying prometheus much easier and less painful!
I've however been running into a slight issue: when trying to get Prometheus to recognize an NGiNX ingress that has been deployed using Helm, it won't show up in the /targets list. Other services can be monitored just fine, but NGiNX is the only one being stubborn.

Is this a configuration issue? If so, what could be done to get prometheus to recognize the NGiNX ServiceMonitor?

Thanks 👍

Deployment manifest:
{
  "kind": "Deployment",
  "apiVersion": "extensions/v1beta1",
  "metadata": {
    "name": "default-nginx-ingress-controller",
    "namespace": "ingress",
    "selfLink": "/apis/extensions/v1beta1/namespaces/ingress/deployments/default-nginx-ingress-controller",
    "uid": "a98f2411-fee1-11e7-b7ed-42010a840009",
    "resourceVersion": "159937",
    "generation": 5,
    "creationTimestamp": "2018-01-21T19:31:22Z",
    "labels": {
      "app": "nginx-ingress",
      "chart": "nginx-ingress-0.8.26",
      "component": "controller",
      "heritage": "Tiller",
      "release": "default-nginx-ingress"
    },
    "annotations": {
      "deployment.kubernetes.io/revision": "1"
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "matchLabels": {
        "app": "nginx-ingress",
        "component": "controller",
        "release": "default-nginx-ingress"
      }
    },
    "template": {
      "metadata": {
        "creationTimestamp": null,
        "labels": {
          "app": "nginx-ingress",
          "component": "controller",
          "release": "default-nginx-ingress"
        },
        "annotations": {
          "checksum/config": "ca0755a5e5dbce143322cdb78f9294bee8dffc1b806eee9b879211352ab1bd77"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "nginx-ingress-controller",
            "image": "k8s.gcr.io/nginx-ingress-controller:0.9.0-beta.15",
            "args": [
              "/nginx-ingress-controller",
              "--default-backend-service=ingress/default-nginx-ingress-default-backend",
              "--election-id=ingress-controller-leader",
              "--ingress-class=nginx",
              "--configmap=ingress/default-nginx-ingress-controller"
            ],
            "ports": [
              {
                "name": "http",
                "containerPort": 80,
                "protocol": "TCP"
              },
              {
                "name": "https",
                "containerPort": 443,
                "protocol": "TCP"
              },
              {
                "name": "stats",
                "containerPort": 18080,
                "protocol": "TCP"
              }
            ],
            "env": [
              {
                "name": "POD_NAME",
                "valueFrom": {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.name"
                  }
                }
              },
              {
                "name": "POD_NAMESPACE",
                "valueFrom": {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.namespace"
                  }
                }
              }
            ],
            "resources": {},
            "livenessProbe": {
              "httpGet": {
                "path": "/healthz",
                "port": 10254,
                "scheme": "HTTP"
              },
              "initialDelaySeconds": 10,
              "timeoutSeconds": 1,
              "periodSeconds": 10,
              "successThreshold": 1,
              "failureThreshold": 3
            },
            "readinessProbe": {
              "httpGet": {
                "path": "/healthz",
                "port": 10254,
                "scheme": "HTTP"
              },
              "timeoutSeconds": 1,
              "periodSeconds": 10,
              "successThreshold": 1,
              "failureThreshold": 3
            },
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
          },
          {
            "name": "nginx-ingress-stats-exporter",
            "image": "sophos/nginx-vts-exporter:v0.6",
            "ports": [
              {
                "name": "metrics",
                "containerPort": 9913,
                "protocol": "TCP"
              }
            ],
            "env": [
              {
                "name": "METRICS_ADDR",
                "value": ":9913"
              },
              {
                "name": "METRICS_ENDPOINT",
                "value": "/metrics"
              },
              {
                "name": "METRICS_NS",
                "value": "nginx"
              },
              {
                "name": "NGINX_STATUS",
                "value": "http://localhost:18080/nginx_status/format/json"
              }
            ],
            "resources": {},
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
          }
        ],
        "restartPolicy": "Always",
        "terminationGracePeriodSeconds": 60,
        "dnsPolicy": "ClusterFirst",
        "serviceAccountName": "default-nginx-ingress",
        "serviceAccount": "default-nginx-ingress",
        "securityContext": {},
        "schedulerName": "default-scheduler"
      }
    },
    "strategy": {
      "type": "RollingUpdate",
      "rollingUpdate": {
        "maxUnavailable": 1,
        "maxSurge": 1
      }
    },
    "revisionHistoryLimit": 10
  },
  "status": {
    "observedGeneration": 5,
    "replicas": 1,
    "updatedReplicas": 1,
    "readyReplicas": 1,
    "availableReplicas": 1,
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastUpdateTime": "2018-01-21T19:31:22Z",
        "lastTransitionTime": "2018-01-21T19:31:22Z",
        "reason": "MinimumReplicasAvailable",
        "message": "Deployment has minimum availability."
      }
    ]
  }
}
ServiceMonitor manifest:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-ingress-monitor
  namespace: monitoring
  labels:
    k8s-app: nginx-ingress
spec:
  selector:
    matchLabels:
      k8s-app: nginx-ingress
  endpoints:
  - port: exporter
    interval: 10s
  namespaceSelector:
    matchNames:
    - ingress
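
One thing worth checking: a ServiceMonitor selects Services (not Deployments) by label, and the endpoint port name must match a named port on that Service. A hypothetical Service that the ServiceMonitor above would pick up could look like this (the name, labels and port wiring are assumptions derived from the manifests shown):

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-metrics
  namespace: ingress
  labels:
    k8s-app: nginx-ingress          # must match the ServiceMonitor matchLabels
spec:
  selector:
    app: nginx-ingress              # pod labels from the Helm release
    component: controller
    release: default-nginx-ingress
  ports:
  - name: exporter                  # must match the ServiceMonitor endpoint port name
    port: 9913
    targetPort: metrics             # the nginx-vts-exporter container port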

default backend - 404

So I tried the Ingress controller. When I use the ingress load balancer address I get from the describe command, I get the following message in the browser:

default backend - 404

So it is hitting the frontend, but the backend is gone. The previous setup with a manually added type: LoadBalancer in the svc file worked OK, but we need it to be accessible from the cluster and the AWS network as well.

rbac errors on clean deploy

in attempting to deploy this on a 1.7.8-gke.0 cluster with legacy auth enabled (I'm not using RBAC), I get the following:

Deploying Prometheus Operator
serviceaccount "prometheus-operator" created
clusterrolebinding "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Error from server (Forbidden): error when creating "manifests/prometheus-operator/prometheus-operator-cluster-role-binding.yaml": clusterroles.rbac.authorization.k8s.io "prometheus-operator" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["thirdpartyresources"], APIGroups:["extensions"], Verbs:[""]} PolicyRule{Resources:["customresourcedefinitions"], APIGroups:["apiextensions.k8s.io"], Verbs:[""]} PolicyRule{Resources:["alertmanagers"], APIGroups:["monitoring.coreos.com"], Verbs:[""]} PolicyRule{Resources:["prometheuses"], APIGroups:["monitoring.coreos.com"], Verbs:[""]} PolicyRule{Resources:["servicemonitors"], APIGroups:["monitoring.coreos.com"], Verbs:[""]} PolicyRule{Resources:["statefulsets"], APIGroups:["apps"], Verbs:[""]} PolicyRule{Resources:["configmaps"], APIGroups:[""], Verbs:[""]} PolicyRule{Resources:["secrets"], APIGroups:[""], Verbs:[""]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["update"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["update"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["namespaces"], APIGroups:[""], Verbs:["list"]}] user=&{[email protected] [system:authenticated] map[]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/" "/apis" "/apis/" "/healthz" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]
Waiting for Operator to register custom resource definitions.....done!

is GKE supported without RBAC?

Is it because I am not an Owner of the project?

I'm not seeing my services show up when I scrape

Do I have something misconfigured? I was under the impression that if I launch my services with

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5000"
  creationTimestamp: null

Operator would pick them up on <svc.namespace....>:5000/metrics. What gives? Why don't I see them in the service discovery screen?

How to add exporters manually?

Hello, how can I create a target for an exporter, for example Redis?
In which file should I add the configuration?
Thank you.
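
With the Prometheus Operator, targets are normally added with a ServiceMonitor rather than by editing prometheus.yml. A hedged sketch for a hypothetical Redis exporter that already has a Service labelled app: redis-exporter with a port named metrics (label, namespace and port names here are assumptions); save it as a new manifest next to the other ServiceMonitors and kubectl apply it:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: redis-exporter        # labels on the exporter's Service
  namespaceSelector:
    matchNames:
    - default                    # namespace where the exporter Service lives
  endpoints:
  - port: metrics                # named port on the Service
    interval: 30s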

Prometheus-k8s-0 Pod Stuck in ContainerCreating

Hi,
I've been experiencing a problem with the prometheus-k8s-0 pod. It always gets stuck in the ContainerCreating stage.

The reason is that the volumes can't be mounted:

Warning  FailedMount            1m (x2 over 3m)  kubelet, ip-yada-yada.internal  Unable to mount volumes for pod "prometheus-k8s-0_monitoring(sdsadsad-sdasd-asdsd-1111ac)": timeout expired waiting for volumes to attach/mount for pod "monitoring"/"prometheus-k8s-0". list of unattached/unmounted volumes=[prometheus-k8s-db]
Warning  FailedSync             1m (x2 over 3m)  kubelet, ip-yada-yada.interna Error syncing pod

Have you come across this issue before?
I'm running KOPS 1.8.4.

Thank you!

Azure RBAC error on deploy

Hi

I'm getting an error similar to

#40

I'm running on Azure ACS K8S 1.7.7 and I'm deploying the latest code using commit cd8bdb9

I've tried doing

kubectl create clusterrolebinding admin-binding --clusterrole=cluster-admin --user=<user>

where <user> is obtained using the command

kubectl config view

Deploying Prometheus
serviceaccount "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
clusterrolebinding "prometheus-k8s" created
Error from server (Forbidden): error when creating "manifests/prometheus/prometheus-k8s-rbac.yaml": roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["configmaps"], APIGroups:[""], Verbs:["get"]}] user=&{kubeconfig [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]
Error from server (Forbidden): error when creating "manifests/prometheus/prometheus-k8s-rbac.yaml": roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]}] user=&{kubeconfig [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]
Error from server (Forbidden): error when creating "manifests/prometheus/prometheus-k8s-rbac.yaml": roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]}] user=&{kubeconfig [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]
Error from server (Forbidden): error when creating "manifests/prometheus/prometheus-k8s-rbac.yaml": clusterroles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["nodes/metrics"], APIGroups:[""], Verbs:["get"]} PolicyRule{NonResourceURLs:["/metrics"], Verbs:["get"]}] user=&{kubeconfig [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]
servicemonitor "kube-scheduler" created
servicemonitor "kube-apiserver" created
servicemonitor "kube-controller-manager" created
servicemonitor "node-exporter" created
servicemonitor "kubelet" created
service "prometheus-k8s" created
servicemonitor "prometheus-operator" created
prometheus "k8s" created
servicemonitor "alertmanager" created
servicemonitor "prometheus" created
servicemonitor "kube-dns" created
servicemonitor "kube-state-metrics" created
configmap "prometheus-k8s-rules" created

After installation, I can only see the basic node info but no Kubernetes-specific data such as namespaces, pods and deployments.

Any ideas ?

Persist dashboards created by Grafana users

Whenever the Grafana pod is recreated (e.g. during a rolling update of the k8s cluster), it loses all dashboards that have been created by Grafana users. The new Grafana pod only has the default dashboards.

What's strange is that the alarm definitions seem to be still available:

# kubectl logs  grafana-588cdcbfbc-gd2md -n monitoring grafana
t=2018-07-02T10:24:11+0000 lvl=info msg="Starting Grafana" logger=server version=5.1.1 commit=dc11bf702 compiled=2018-05-07T11:50:10+0000
t=2018-07-02T10:24:11+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2018-07-02T10:24:11+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.provisioning=/etc/grafana/provisioning"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.log.mode=console"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_DATA=/var/lib/grafana"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_LOGS=/var/log/grafana"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_USER=grafana_admin"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_PASSWORD=*********"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_AUTH_ANONYMOUS_ENABLED=false"
t=2018-07-02T10:24:11+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_AUTH_BASIC_ENABLED=true"
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
t=2018-07-02T10:24:11+0000 lvl=info msg="Path Provisioning" logger=settings path=/etc/grafana/provisioning
t=2018-07-02T10:24:11+0000 lvl=info msg="App mode production" logger=settings
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing DB" logger=sqlstore dbtype=sqlite3
t=2018-07-02T10:24:11+0000 lvl=info msg="Starting DB migration" logger=migrator
t=2018-07-02T10:24:11+0000 lvl=info msg="Executing migration" logger=migrator id="copy data account to org"
t=2018-07-02T10:24:11+0000 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account to org"
t=2018-07-02T10:24:11+0000 lvl=info msg="Executing migration" logger=migrator id="copy data account_user to org_user"
t=2018-07-02T10:24:11+0000 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account_user to org_user"
t=2018-07-02T10:24:11+0000 lvl=info msg="Starting plugin search" logger=plugins
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing Alerting" logger=alerting.engine
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing CleanUpService" logger=cleanup
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing Stream Manager"
t=2018-07-02T10:24:11+0000 lvl=info msg="Initializing HTTP Server" logger=http.server address=0.0.0.0:3000 protocol=http subUrl= socket=
...
t=2018-07-02T10:26:03+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=37 name="Replicas alert - test" error="Could not find datasource Data source not found" changing state to=alerting
...

Connecting Scylla and getting ingress up

Hello, Gotta say this repository has been a savior for setting up Prometheus and Grafana.
I have scylla setup on my cluster and Prometheus.

But I have no idea how to connect the two, if you have any idea how to connect them or point me to a tutorial, I would appreciate it greatly.

Thanks

Security Group Configuration

First, thanks for putting this together! It makes deploying Prometheus and Grafana in kubernetes super easy.

My deployment went great and most everything is up and running but I am getting errors about the control plane, scheduler, etc. I am certain it is because I have not set up the Security Groups primarily because I am not sure what I should open the ports up to. Can you provide some more guidance around what the ports need to be opened to? Do they need to be opened to the internet or internally between namespaces?

I am using Azure, btw.

Thanks!

Helm Usage

Your docs say

The purpose of this project is to provide a simple and interactive method to deploy and configure Prometheus on Kubernetes, especially for users who are not using Helm.

For users using Helm, what is the way to deploy prometheus-kubernetes?

Is it possible to call the deploy script without interactive user input?

Hey 😊
First of all, great work; I like this setup a lot!

I want to use your setup on a cluster, but I need a way to automate the ./deploy call without interactive user input. Is there a possibility for that? Like passing the parameters directly to the script, e.g. ./deploy --prometheus-operator-version=<version> ... or something like that?

Deployment on GKE shouldn't use self-hosted controller / scheduler

Asking for clarification from anyone else as well but on Google Kubernetes Engine (GKE). Are the kube-controller-manager & kube-scheduler even visible?

I cannot see the pods in kube-system namespace and I think that's because google manages the master node for you. Therefore the service definitions for kube-controller-manager and kube-scheduler are not relevant to a deployment on GKE.

I checked using:

kubectl get pod -n kube-system -l k8s-app=kube-controller-manager
kubectl get pod -n kube-system -l k8s-app=kube-scheduler

The result is that you seem to get constant alerts for those two components being down.
Am I right in the above - or by deploying the relevant service, should those two components be monitorable?

Temp Workaround

If you don't want to deploy those components and therefore not get alerts then you need to do the following:

Note that I deployed all then sort of hacked away by deleting various elements until the alerts stopped. So the above is untested, but I think it should roughly work looking at the deployment script.

Please let me know if they are visible as I'd love to leave the alerts in if they work properly.
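
If they really aren't reachable on GKE, one hedged workaround is to delete the corresponding ServiceMonitors (and, if the deploy script created them, the matching Services in kube-system) so the targets and their alerts disappear, for example:

kubectl -n monitoring delete servicemonitor kube-scheduler kube-controller-manager
kubectl -n kube-system get svc | grep -E 'kube-scheduler|kube-controller-manager'   # then delete any Services the deploy script added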

Deploying twice leads to bizarre errors

When deployed over and over again, at a certain point errors like these surface up:

The Prometheus "k8s" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
configmap "prometheus-k8s-rules" created
The ServiceMonitor "alertmanager" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-apiserver" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-controller-manager" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-dns" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-scheduler" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kube-state-metrics" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "kubelet" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "node-exporter" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "prometheus-operator" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1
The ServiceMonitor "prometheus" is invalid: apiVersion: Invalid value: "monitoring.coreos.com/__internal": must be monitoring.coreos.com/v1

Some metrics are missing

I have installed this yesterday and still not able to get all the metrics

(screenshots attached)

In the third screenshot, alerts in Prometheus are giving me no data points. That is the case for every alert except the "Pods restarting too much" alert, which gives me data about the pods that are failing.

During the install process I only changed NodePort to LoadBalancer for the front ends and left everything else on the defaults. It is Prometheus 2.0 beta.

Prometheus 2G Limit

I have attempted to change this, but no matter what I set it to, it still ends up being 2Gi in the YAML on the cluster.

The problem is that I am just testing it out on GKE using their n1-standard-1 machine type, which has 3.75GB, and once the system components are added the pod cannot be scheduled, even if I add more nodes.

Is there any way to make this a lower value, or to change wherever it's actually being set from? Even if I remove it from the YAML, commit the change and run deploy, it still ends up at 2Gi on the cluster.
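
The memory setting normally comes from the Prometheus custom resource, which the operator uses to generate the StatefulSet, so lowering it in manifests/prometheus/prometheus-k8s.yaml and applying that file directly should take effect even if ./deploy keeps resetting it; a hedged sketch of the relevant fields (values here are illustrative):

spec:
  resources:
    requests:
      memory: 512Mi     # lower the request so the pod fits on an n1-standard-1
    limits:
      memory: 1Gi

Then apply it with kubectl apply -f manifests/prometheus/prometheus-k8s.yaml -n monitoring.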

Custom configuration gets overwritten

I've been attempting to deploy prometheus federation using a custom configuration but think I am not understanding something fully.

I have three clusters with external URLs configured as lab1, lab2 and lab3:

https://api.lab1.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
https://api.lab1.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web

Everything works great for the individual clusters, but when I try to configure lab1 as the federation server the configuration never appears to take...

Steps to recreate after successful deployment and testing external urls:

kubectl -n monitoring delete prometheus k8s

I edit tools/custom-configuration/prometheus-k8s-secret.prometheus.yaml and add the following underneath scrape_configs:

  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
        - 'https://api.lab2.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web'
        - 'https://api.lab3.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web'

This seems fine and when i deploy the secret I can decode the base64 and see that it is correct
kubectl -n monitoring create secret generic prometheus-k8s --from-file=./prometheus-k8s-secret/

However, when I deploy the new Prometheus with
kubectl -n monitoring create -f prometheus-k8s.yaml
it overwrites the prometheus.yaml in the secret (I decoded the base64 as I deployed this and saw it was immediately overwritten):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    prometheus: k8s
spec:
  replicas: 2
  version: v2.2.0-rc.1
  externalUrl: 
   https://api.lab1.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector: {}
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: k8s
  resources:
  storage:
    volumeClaimTemplate:
      metadata:
        annotations:
          annotation1: prometheus
      spec:
        storageClassName: ssd
        resources:
          requests:
            storage: 40Gi
    requests:
      memory: 1Gi
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager-main
      port: web

What am I missing here? I've tried everything I can think of, including scorched-earth re-creation of the cluster and re-cloning the repo.

Doesn't work with Custom cloud provider - with no persistence

Hey, I deployed this by choosing 4: Custom as the cloud provider, so expecting no persistence. Everything got deployed except Prometheus itself; in the operator logs I got this:

ts=2018-04-26T19:47:06.023555918Z caller=operator.go:158 component=alertmanageroperator msg="connection established" cluster-version=v1.10.1
ts=2018-04-26T19:47:06.525003228Z caller=operator.go:540 component=alertmanageroperator msg="CRD created" crd=Alertmanager
ts=2018-04-26T19:47:06.529602055Z caller=operator.go:1036 component=prometheusoperator msg="CRD created" crd=Prometheus
ts=2018-04-26T19:47:06.574189585Z caller=operator.go:1036 component=prometheusoperator msg="CRD created" crd=ServiceMonitor
ts=2018-04-26T19:47:09.547596022Z caller=operator.go:172 component=alertmanageroperator msg="CRD API endpoints ready"
ts=2018-04-26T19:47:10.625701877Z caller=operator.go:305 component=alertmanageroperator msg="Alertmanager added" key=monitoring/main
ts=2018-04-26T19:47:10.625812527Z caller=operator.go:381 component=alertmanageroperator msg="sync alertmanager" key=monitoring/main
ts=2018-04-26T19:47:10.741597917Z caller=operator.go:381 component=alertmanageroperator msg="sync alertmanager" key=monitoring/main
ts=2018-04-26T19:47:10.812573384Z caller=operator.go:345 component=alertmanageroperator msg="update handler" old=318005 cur=318008
ts=2018-04-26T19:47:10.812668745Z caller=operator.go:381 component=alertmanageroperator msg="sync alertmanager" key=monitoring/main
ts=2018-04-26T19:47:12.634674503Z caller=operator.go:272 component=prometheusoperator msg="CRD API endpoints ready"
ts=2018-04-26T19:47:18.843127314Z caller=operator.go:345 component=alertmanageroperator msg="update handler" old=318008 cur=318153
ts=2018-04-26T19:47:18.843345626Z caller=operator.go:381 component=alertmanageroperator msg="sync alertmanager" key=monitoring/main
ts=2018-04-26T19:47:18.844365776Z caller=operator.go:637 component=prometheusoperator msg="update handler" old=318008 cur=318153
E0426 19:47:21.722454       1 streamwatcher.go:109] Unable to decode an event from the watch stream: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:21.726264       1 streamwatcher.go:109] Unable to decode an event from the watch stream: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:22.730912       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:23.735996       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:24.741082       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:25.822169       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0426 19:47:26.828386       1 reflector.go:205] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:279: Failed to list *v1.Prometheus: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

Help ?

Updating new grafana dashboards.

When I create a dashboard in Grafana it doesn't save, so I must export it and follow the steps here to import it permanently:

update grafana dashboards: add or change the existing dashboards in assets/grafana/ and execute scripts/generate-dashboards-configmap.sh. Then apply the changes using kubectl apply -f manifests/grafana/grafana-dashboards.cm.yaml.

and when I go to apply the changes, I get this,
The ConfigMap "grafana-dashboards" is invalid: metadata.annotations: Too long: must have at most 262144 characters

Kubernetes 1.9.6, what branch should I use?

We're doing a jump from Kubernetes 1.7.3 to 1.9.6 and I was wondering what branch of this project I should use, and whether that branch ditches the cAdvisor metrics for the kubelet stats API.

Deploying to GKE with LoadBalancer results in 404

I've deployed to my GCE / GKE environment with ./deploy.
After everything is up, I can proxy to all instances.

But if I deploy the Nginx ingress over tools/ingress/init.sh I'm getting first a 400 error in the UI.
This could be solved with changing the service type from ClusterIP to NodePort for all services.

Now the status looks good and I can reach my test domain grafana.example.com
But I get a 404 error for all files except the initial html document.
Any idea?

How to get jmx metrics in prometheus-kubernetes project?

Hi guys! Thank you for your work!
Everything is good, but I just don't know how to get metrics from the Prometheus JMX exporter agent.
The agent is installed in the app container and listens on port 9090:

spec:
      containers:
      - image: registry/image:version
        imagePullPolicy: Always
        name: app-name
        env:
        - name: JAVA_CUSTOM_OPTS
          value: "-Xms256m -Xmx512m -javaagent:./jmx_prometheus_javaagent.jar=9090:jmx/prom-jmx-agent-config.yml"
        ports:
        - name: app
          containerPort: 8080
        - name: monitoring
          containerPort: 9090 

How can I add this new target to the Prometheus configuration? Usually I would just edit the prometheus.yml config file, but I don't understand how to add it in your configuration.
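
With the operator, the usual pattern is a Service that exposes the agent port plus a ServiceMonitor that selects it, instead of editing prometheus.yml directly. A hedged sketch, assuming the pods carry the label app: app-name and the Service is created in the application's namespace (names and labels here are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: app-name-jmx
  labels:
    app: app-name
spec:
  selector:
    app: app-name
  ports:
  - name: monitoring        # matches the container port name above
    port: 9090
    targetPort: monitoring
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-name-jmx
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: app-name
  namespaceSelector:
    any: true
  endpoints:
  - port: monitoring
    interval: 30s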

Did not find expected key

Running ./deploy on new kubernetes cluster printed the following while deploying grafana:

Deploying Grafana
Enter Grafana administrator username [admin]: 
Enter Grafana administrator password: *****
secret "grafana-credentials" created
configmap "grafana-dashboards" created
service "grafana" created
error: error converting YAML to JSON: yaml: line 72: did not find expected key

Just thought I would bring your attention to this

Operator reliably creates zombie namespace

kubectl get namespaces
NAME          STATUS        AGE
default       Active        82d
jenkins       Active        10d
kube-public   Active        82d
kube-system   Active        82d
monitoring    Terminating   2h
monitoring2   Terminating   26m
rabbit        Active        2d

I first started the operator deploy script, and it hung at

Waiting for Operator to register custom resource definitions.........................................................................................................................................................................................................................................................................................................................................................................................................................^C
So I interrupted it and ran the teardown script, but the namespace is stuck in the Terminating phase. I cannot run the script again until I change the namespace variable to monitoring2.

{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "creationTimestamp": "2017-09-24T21:05:34Z",
        "deletionTimestamp": "2017-09-24T21:14:27Z",
        "name": "monitoring2",
        "resourceVersion": "11461478",
        "selfLink": "/api/v1/namespaces/monitoring2",
        "uid": "1b94baae-a16c-11e7-8350-02a930332a2a"
    },
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}
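
A namespace stuck in Terminating usually means something inside it cannot be cleaned up (here most likely the monitoring.coreos.com custom resources after the operator was interrupted). A hedged last-resort workaround is to clear the namespace finalizer through the finalize subresource:

kubectl get namespace monitoring -o json > ns.json
# edit ns.json and remove "kubernetes" from spec.finalizers, then:
kubectl proxy &
curl -X PUT --data-binary @ns.json -H "Content-Type: application/json" \
  http://127.0.0.1:8001/api/v1/namespaces/monitoring/finalize

Repeat for monitoring2 if needed.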

Grafana shows less than 2 days of graphs (~30hours)

Hi @camilb ,
Not sure if you ran into this issue, but I have a stock deployment of prometheus-kubernetes using the deploy script (no email/Slack integration) on two k8s clusters deployed with kops. One is using the default Debian Jessie image, the other one is CoreOS. On both, when I look at Grafana's graphs I only ever have the last ~30 hours graphed. Worth mentioning that the deployment is older than 30 hours.

I think it might be related to how much data retention Prometheus has declared in its config. After deployment on k8s there is a StatefulSet for Prometheus, and in its configuration I see - '--storage.tsdb.retention=24h'. How can I change this to a week or more? (Editing the StatefulSet on k8s doesn't work; the changes aren't applied.)

Thanks!

(screenshot attached)

NAME                                   READY     STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2       Running   2          9d
alertmanager-main-1                    2/2       Running   2          9d
alertmanager-main-2                    2/2       Running   2          9d
grafana-2372210178-z9wwp               2/2       Running   2          9d
kube-state-metrics-2628134419-pb6lz    2/2       Running   2          9d
node-exporter-3ks0r                    1/1       Running   1          9d
node-exporter-h0l49                    1/1       Running   1          9d
node-exporter-n6wk9                    1/1       Running   1          9d
node-exporter-ss48g                    1/1       Running   1          9d
node-exporter-stvdz                    1/1       Running   1          9d
prometheus-k8s-0                       2/2       Running   2          9d
prometheus-k8s-1                       2/2       Running   2          9d
prometheus-operator-1613319266-9bfzw   1/1       Running   1          9d
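
Retention is set on the Prometheus custom resource, and the operator regenerates the StatefulSet from it, which is why editing the StatefulSet directly gets reverted. A hedged sketch of the change in manifests/prometheus/prometheus-k8s.yaml:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  retention: 168h     # one week instead of the default 24h

Then apply it with kubectl apply -f manifests/prometheus/prometheus-k8s.yaml -n monitoring.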

Add instructions for updating the Services/Stack to README

The README mentions how to update Grafana Dashboards and Prometheus Alert Rules.
However, it doesn't have information on how to update the stack/services themselves.

  • Can one simply pull latest code and re-run ./deploy script and expect it to deploy any updated services/configs?
  • Do you have to run the ./teardown beforehand and will it persist/reattach the storage and data to the newly created stack then?
  • Do you have to manually run kubectl set image / apply -f for updating?

how does this compare to prometheus operator?

How does that project compare to Prometheus operator and kube-prometheus?
What is the roadmap and intention or problem this is trying to solve that might not be solved elsewhere?

I'm running kube-aws 0.9.7 and I've downloaded Prometheus Operator 0.11.1.
I've got Prometheus running, but I had to open the ports by editing the kube-aws security groups. Dashboards are showing up in Grafana OK.

Failed CronJob resource and RBAC config on Azure

Hi,

we have several deployments on AWS and OnPremise, when we deployed to Azure env, it throws the following errors: (defaults accepted everywhere)

E0222 10:00:53.490896 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/cronjob.go:93: Failed to list *v2alpha1.CronJob: the server could not find the requested resource
E0222 10:00:53.858858 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/namespace.go:80: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list namespaces at the cluster scope

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T19:11:02Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

The deployment went through without errors.

App not showing up in Prometheus target

Hi,

Hope you are well! This repository helped me a lot to configure Prometheus and Grafana.
I have deployed a sample web application.
I need your help to get the application metrics onto the targets page of Prometheus. How can I achieve this?

Appreciate your help.

Thanks,
Akhilesh Appana

how I can persist grafana dashboard data

Your repo helped me immensely to configure Prometheus and Grafana. Today I upgraded my cluster to kops 1.7.1.
My problem statement is:

  1. I am creating a few custom dashboards, e.g. using the rabbitmq exporter to get the queue count.
  2. Daily I restart my nodes; how can I persist my dashboard data and data sources? Do I need to add a PVC for Grafana as well, and if yes, where can I add it?
  3. Why am I not able to see my rabbitmq exporter data in the Grafana dashboard and in Prometheus?

How can I achieve the above?
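
One hedged approach for points 1 and 2 is to back Grafana's data directory with a PersistentVolumeClaim instead of an emptyDir, for example (name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-storage
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

and then mount that claim at /var/lib/grafana in the Grafana deployment (volumes/volumeMounts), so dashboards and data sources created in the UI survive pod restarts.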

kubelet Kubernetes node labels are missing

Hi

I encounter the same issue as described in prometheus/prometheus#3294 when deploying Prometheus.

Earlier (before the Prometheus Operator implementation) metrics like
container_memory_working_set_bytes{id='/'}
provided all the node labels (screenshot attached).

But unfortunately now most of the useful labels are missing (screenshot attached).

Grafana 5.x

Hi

I tried to bump the Grafana version up to v5.1.0-beta1, but then I encountered problems with the persistent storage volumes.

This is the error from the Grafana log

GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied

Reverting back to 4.6.3 resolved the issue.

Any plans for supporting Grafana 5.x?

Azure - Grafana+Prometheus deployments fail with "Addition of a managed disk to a VM with blob based disks is not supported."

I'm running on Azure with acs-engine.

The deployment script ends successfully, but the pods are not able to start:

Warning FailedMount 15s (x2 over 1m) attachdetach AttachVolume.Attach failed for volume "pvc-f1da4bbd-f6dc-11e7-a367-000d3a1b19fd" : Attach volume "XXX-dynamic-pvc-f1da4bbd-f6dc-11e7-a367-000d3a1b19fd" to instance "k8s-nodepool1-18946821-0" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="OperationNotAllowed" Message="Addition of a managed disk to a VM with blob based disks is not supported."
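
This typically means the dynamically provisioned PVCs are requesting managed disks while the acs-engine VMs were created with blob (unmanaged) disks. A hedged workaround is to point the volume claims at a StorageClass whose disk kind matches the VMs; a sketch, with the caveat that parameter names can vary by Kubernetes version:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: unmanaged-standard
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Shared                     # blob-based disks; use Managed only on VMs with managed disks
  storageaccounttype: Standard_LRS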

Remove Kube-DNS target from prometheus

Hi

I'm using this project to monitor a k8s cluster on Azure (using acs-engine). I think kube-dns doesn't have its metrics enabled by default. What's the best way to remove the kube-dns target from Prometheus? I tried to update the ConfigMap on k8s without much success.
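
A hedged way to drop the target is to delete the kube-dns ServiceMonitor that the deploy script created (and the kube-dns metrics Service, if one was added) instead of editing the generated configuration:

kubectl -n monitoring delete servicemonitor kube-dns

The corresponding manifest can be re-applied later if kube-dns metrics get enabled.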
