grafana / grafana-operator Goto Github PK

An operator for Grafana that installs and manages Grafana instances, Dashboards and Datasources through Kubernetes/OpenShift CRs

Home Page: https://grafana.github.io/grafana-operator/

License: Apache License 2.0

Makefile 3.78% Dockerfile 0.21% Go 62.49% Shell 0.83% Smarty 0.51% Jsonnet 31.71% Lua 0.47%

grafana-operator kubernetes operator grafana k8s golang go observability monitoring openshift

grafana-operator's Introduction

Grafana Operator

Official Documentation | Quickstart | Installation | Tutorials

The Grafana Operator is a Kubernetes operator built to help you manage your Grafana instances and its resources in and outside of Kubernetes.

Whether you’re running one Grafana instance or many, the Grafana Operator simplifies the processes of installing, configuring, and maintaining Grafana and its resources. Additionally, it's perfect for those who prefer to manage resources using infrastructure as code or using GitOps workflows through tools like ArgoCD and Flux CD.

Getting Started

Installation

Option 1: Helm Chart

Deploy the Grafana Operator easily in your cluster using Helm:

helm upgrade -i grafana-operator oci://ghcr.io/grafana/helm-charts/grafana-operator --version v5.6.3

Option 2: Kustomize & More

Prefer Kustomize, Openshift OLM, or Kubernetes directly? Find detailed instructions in our Installation Guide.

For even more detailed setups, see our documentation.

Example: Deploying Grafana & A Dashboard

Here's a simple example of deploying Grafana and a Grafana Dashboard using the custom resources (CRs) defined by the Grafana Operator:

apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana
  labels:
    dashboards: "grafana"
spec:
  config:
    log:
      mode: "console"
    auth:
      disable_login_form: false
    security:
      admin_user: root
      admin_password: secret

---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: sample-dashboard
spec:
  resyncPeriod: 30s
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  json: >
    {
      "title": "Simple Dashboard",
      "timezone": "browser",
      "refresh": "5s",
      "panels": [],
      "time": {
        "from": "now-6h",
        "to": "now"
      }
    }

For more tailored setups and resources management, check out these guides:

Why Grafana Operator?

Switching to Grafana Operator from traditional deployments amplifies your efficiency by:

Enabling multi-instance and multi-namespace Grafana deployments effortlessly.
Simplifying dashboard, data sources, and plugin management through code.
Supporting both Kubernetes and Openshift with smart adjustments based on the environment.
Allowing management of external Grafana instances for robust GitOps integration.
Providing multi-architecture support, making it versatile across different platforms.
Offering one-click installation through Operatorhub/OLM.

Get In Touch!

Got questions or suggestions? Let us know! The quickest way to reach us is through our GitHub Issues or by joining our weekly public meeting on Tuesdays at 12:30 PM UTC (link here).

Feel free to drop into our Grafana Operator discussions on:

Contributing

For more information on how to contribute to the operator look at CONTRIBUTING.md.

Version Support and Development Mindset

Caution

v4 will stop receiving bug fixes and security updates as of the 22nd of December 2023. We recommend you migrate to v5 if you haven't yet! Please follow our v4 -> v5 Migration Guide to mitigate any potential future risks.

V5 is the current, actively developed and maintained version of the operator, which you can find on the Master Branch.

A more in-depth overview of v5 is available in the intro blog

V5 is a ground-up rewrite of the operator to refocus development on:

Performance
Reliability
Maintainability
Extensibility
Testability
Usability

The previous versions of the operator have some serious tech-debt issues, which effectively prevent community members that aren't massively familiar with the project and/or its codebase from contributing features that they wish to see.

These previous versions, we're built on a "as-needed" basis, meaning that whatever was the fastest way to reach the desired feature, was the way it was implemented. This lead to situations where controllers for different resources were using massively different logic, and features were added wherever and however they could be made to work.

V5 aims to re-focus the operator with a more thought out architecture and framework, that will work better, both for developers and users. With certain standards and approaches, we can provide a better user experience through:

Better designed Custom Resource Definitions (Upstream Grafana Native fields will be supported without having to whitelist them in the operator logic).
- Upstream documentation can be followed to define the Grafana Operator Custom Resources.
- This also means a change in API versions for the resources, but we see this as a benefit, our previous mantra of maintaining a seamless upgrade from version to version, limited us in the changes we wanted to make for a long time.
A more streamlined Grafana resource management workflow, one that will be reflected across all controllers.
Using an upstream Grafana API client (standardizing our interactions with the Grafana API, moving away from bespoke logic).
The use of a more up-to-date Operator-SDK version, making use of newer features.
- along with all relevant dependencies being kept up-to-date.
Proper testing.
Cleaning and cutting down on code.
Multi-instance and Multi-namespace support!

grafana-operator's People

Contributors

Stargazers

Watchers

Forkers

pb82 sedroche wei-lee laurafitzgerald neoludo lgarciaaco steventobin sapcc austincunningham jjaferson techmexdev florinpeter davidffrench gambol99 raserge jiangchen2019 metalmatze lukasgr90 ssshcl thereallukl mvazquezc shikhasriva andreasbergmeier6176 r-lawton benjamink yogeek abergmeier adheipsingh buddhiwathsala jsanda doytsujin redradrat slach lja9702 ossys rubenharutyunov hubertstefanski dulltz kfyharukz cybozu-go abouchama infobloxopen mjattiot obrienrobert polefishu hwanjin-jeong ucloud kqzh rajagopalan-ranganathan kungfuchicken tadayosi cip-core-mirrors tyrion85 dollypreethi nobody4t danielqsj tioxy kingofspades enekofb ctron aland-zhang mickael-carl innovation-sre vrutkovs banzaicloud ocervell ramkumarocp25 helene forestoden oscarneira hashnao sjh0027 wn-doolittle rieger-jared tapih querycap tesharp erezo9 hemajv deanbrunt cathaloconnorrh goof15kw b1zzu isgasho jvargasmurillo-splunk shivpathak weisdd pacospace kirchknopfc georgettica clayrisser davar-playgrounds apefactory arthursens robshelly dpatrick65 kd7lxl jpweber inful gencoo

grafana-operator's Issues

ability to add dashboards via grafana.com url OR id

It would be really nice to be able to add a dashboard by only providing its grafana.com URL or ID in the same way you can when adding them via the Grafana UI

Grafana behind reverse proxy

I am trying to install grafana using the operator, it gets installed, but as I am behind a reverse proxy the UI loads but the links doesn't work.

According to grafana documentation I should use root_url but I don't see a way to change it using the operator.

Thanks

grafana-service: type and name customization is not working

Hello all,

I am trying to modify the grafana-service name and type, but it is not working, I get always the same grafana-service name with ClusterIP type.
Is there a workaround?
Thanks.

Currently I have the following Grafana.yaml

apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: example-grafana
spec:
  service:
    type: "NodePort"
    name: "telemetry"
  ingress:
    enabled: True
  config:
    log:
      mode: "console"
      level: "warn"
    security:
      admin_user: "root"
      admin_password: "secret"
    auth:
      disable_login_form: False
      disable_signout_menu: True
    auth.basic:
      enabled: False
    auth.anonymous:
      enabled: True
  dashboardLabelSelector:
    - matchExpressions:
        - {key: app, operator: In, values: [grafana]}

Next release to OperatorHub

Hi, any idea on when this is due?

Switch to Modules

I currently basically use Travis for compilation because I cannot be bothered.

Switching to Modules would make it easier to contribute and test.

Folder support for dashboards

It would be amazing if we could set Grafana folders for the dashboard CRDs.

Deployment not working

Started changing Operator to 3.0.0 with new roles but now get:

{"level":"info","ts":1576064539.3801906,"logger":"action-runner","msg":"(    5)     FAILED create grafana deployment"}

Status is:

  message: >-
    Deployment.apps "grafana-deployment" is invalid:
    spec.template.spec.containers[0].volumeMounts[5].name: Not found:
    "configmap-grafana-ldap"

Seems strange that it would search for "configmap-grafana-ldap" and not "grafana-ldap" (<- the actual name).

Operator repeatedly reconciles the Grafana service

Hello,

I'm trying to use the grafana-operator (v3.0.1), with a NodePort-type service, and it appears that the operator is repeatedly updating the grafana-service. Here's a 1 sec output to demonstrate the issue:

zvic@zvic-vbox:~/Workspace/decorus-operator (master)$ kubectl get service grafana-service --watch
NAME              TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
grafana-service   NodePort   10.97.25.73   <none>        3000:32303/TCP   2m39s
grafana-service   NodePort   10.97.25.73   <none>        3000:31420/TCP   2m39s
grafana-service   NodePort   10.97.25.73   <none>        3000:32676/TCP   2m39s
grafana-service   NodePort   10.97.25.73   <none>        3000:32281/TCP   2m39s
grafana-service   NodePort   10.97.25.73   <none>        3000:31193/TCP   2m39s
grafana-service   NodePort   10.97.25.73   <none>        3000:31160/TCP   2m39s

(Note the repeatedly changing NodePort assignment which obviously renders it useless).
Watching with -o yaml confirms that the only changing bits in the service are the spec.ports[].nodePort, as well as the resourceVersion.

Looking at the operator log, here's what I see (excerpt, the 0-6 actions and "desired cluster state met" message repeat forever):

zvic@zvic-vbox:~/Workspace/decorus-operator (master)$ kubectl logs grafana-operator-66b44cd969-th8wm | head -n50
{"level":"info","ts":1577623864.9680684,"logger":"cmd","msg":"Go Version: go1.13.5"}
{"level":"info","ts":1577623864.968354,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1577623864.9684932,"logger":"cmd","msg":"operator-sdk Version: v0.12.0"}
{"level":"info","ts":1577623864.9685607,"logger":"cmd","msg":"operator Version: 3.0.0"}
{"level":"info","ts":1577623864.9689298,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1577623865.4359303,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1577623865.4416497,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1577623865.8979354,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8080"}
{"level":"info","ts":1577623865.8982437,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1577623865.9006953,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"grafanadatasource-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1577623865.9013271,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"grafana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1577623865.9015782,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"grafana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1577623865.901804,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"grafana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1577623865.9020362,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"grafana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1577623865.902183,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"grafana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1577623865.9024515,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"grafana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1577623866.414084,"logger":"metrics","msg":"Metrics Service object updated","Service.Name":"grafana-operator-metrics","Service.Namespace":"default"}
{"level":"info","ts":1577623866.4142244,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1577623866.8737087,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"grafanadashboard-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1577623866.873995,"logger":"controller_grafanadashboard","msg":"Starting dashboard controller"}
{"level":"info","ts":1577623866.874477,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1577623866.9749553,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"grafanadashboard-controller"}
{"level":"info","ts":1577623866.9760575,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"grafanadatasource-controller"}
{"level":"info","ts":1577623866.9760916,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"grafana-controller"}
{"level":"info","ts":1577623867.0785172,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"grafanadashboard-controller","worker count":1}
{"level":"info","ts":1577623867.0785792,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"grafanadatasource-controller","worker count":1}
{"level":"info","ts":1577623867.0786958,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"grafana-controller","worker count":1}
{"level":"info","ts":1577623867.0787594,"logger":"controller_grafanadashboard","msg":"no grafana instance available"}
{"level":"info","ts":1577623867.0789087,"logger":"controller_grafanadashboard","msg":"no grafana instance available"}
{"level":"info","ts":1577623867.0789275,"logger":"controller_grafanadashboard","msg":"no grafana instance available"}
{"level":"info","ts":1577623867.1853034,"logger":"action-runner","msg":"(    0)    SUCCESS update admin credentials secret"}
{"level":"info","ts":1577623867.1995697,"logger":"action-runner","msg":"(    1)    SUCCESS update grafana service"}
{"level":"info","ts":1577623867.203105,"logger":"action-runner","msg":"(    2)    SUCCESS update grafana service account"}
{"level":"info","ts":1577623867.2066414,"logger":"action-runner","msg":"(    3)    SUCCESS update grafana config"}
{"level":"info","ts":1577623867.2066796,"logger":"action-runner","msg":"(    4)    SUCCESS plugins unchanged"}
{"level":"info","ts":1577623867.2180324,"logger":"action-runner","msg":"(    5)    SUCCESS update grafana deployment"}
{"level":"info","ts":1577623867.2180629,"logger":"action-runner","msg":"(    6)    SUCCESS check deployment readiness"}
{"level":"info","ts":1577623867.235129,"logger":"grafana-controller","msg":"desired cluster state met"}
{"level":"info","ts":1577623867.2416759,"logger":"action-runner","msg":"(    0)    SUCCESS update admin credentials secret"}
{"level":"info","ts":1577623867.2580156,"logger":"action-runner","msg":"(    1)    SUCCESS update grafana service"}
{"level":"info","ts":1577623867.2635689,"logger":"action-runner","msg":"(    2)    SUCCESS update grafana service account"}
{"level":"info","ts":1577623867.2675772,"logger":"action-runner","msg":"(    3)    SUCCESS update grafana config"}
{"level":"info","ts":1577623867.2676094,"logger":"action-runner","msg":"(    4)    SUCCESS plugins unchanged"}
{"level":"info","ts":1577623867.2779772,"logger":"action-runner","msg":"(    5)    SUCCESS update grafana deployment"}
{"level":"info","ts":1577623867.278476,"logger":"action-runner","msg":"(    6)    SUCCESS check deployment readiness"}
{"level":"info","ts":1577623867.2840967,"logger":"grafana-controller","msg":"desired cluster state met"}

Also, here's my Grafana resource definition:

zvic@zvic-vbox:~/Workspace/decorus-operator (master)$ kubectl get grafana decorus-grafana -o yaml
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  creationTimestamp: "2019-12-29T16:04:04Z"
  generation: 1
  name: decorus-grafana
  namespace: default
  resourceVersion: "3044624"
  selfLink: /apis/integreatly.org/v1alpha1/namespaces/default/grafanas/decorus-grafana
  uid: 9acfae0f-cd8a-4d28-b9ec-a308e6033049
spec:
  compat:
    fixAnnotations: false
  config:
    auth.anonymous:
      enabled: true
    log:
      level: info
      mode: console
    security:
      admin_password: <REDUCTED>
      admin_user: admin
  dashboardLabelSelector:
  - matchLabels:
      decorus.name: decorus0
      decorus.namespace: default
  ingress: {}
  service:
    type: NodePort
status:
  adminPassword: ""
  adminUser: ""
  dashboards:
    default:
    - hash: 63f1d501cd64827e2c1a092062cddd70
      name: decorus-grafana-dashboard-main
      namespace: default
      uid: YfOSvD2Wz
    - hash: a2b33ac2b3d48a3dab851833bf1ef2f3
      name: decorus-grafana-dashboard-signal
      namespace: default
      uid: ujkoT0cZz
    - hash: e6ce0b223ef889455cd0dd8cf8c0bd26
      name: decorus-grafana-dashboard-interval
      namespace: default
      uid: HgJBLKhZk
  failedPlugins: null
  installedPlugins: null
  message: success
  phase: reconciling

Any help is appreciated :-)

GrafanaDashboard is ignored silently if the dashboard content has invalid json

If the json content of a GrafanaDashboard CR has some invalid json, there is no way for the CR creator to know about this, or that Grafana is not accepting it.

There are errors in the Grafana logs (e.g. see below) but the CR creator may not have access to the Grafana logs

t=2019-07-03T14:39:39+0000 lvl=eror msg="failed to load dashboard from " logger=provisioning.dashboard type=file name=default file=/etc/grafana-dashboards/middleware-monitoring_resources-by-namespace.json error="invalid character '(' after object key:value pair"

Suggestion to do basic json validation on the content of the dashboard in the CR before adding it to the grafana-dashboards configmap for Grafana to pickup.
If validation fails, the status field of the CR can be updated with the validation error message.

When grafana repliacas is > 1 , then the log in session is lost during browsing

Hello,
I has set Grafana to use more that 1 replica, also I have created my own kubernetes services (I don't use the one provided by operator because I need to customize the service name due to backward compatibility).
But when I go to the browser I log in and when I move through different grafana pages, the log in session is lost and I cannot get in where privileges are needed.

cannot set blockOwnerDeletion if an ownerReference refers

I've tried to deploy the operator but get stuck once deploying the Grafana instance. Firstly, i get the below:

"error":"serviceaccounts \"grafana-serviceaccount\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched

Which is solved by manually creating the service account but then it moves on to creating the configmaps which fail with a similiar issue:

"Error in CreateConfigFiles, resourceName=grafana-config : err=configmaps \"grafana-config\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on:

I was not able to work out what the above needed as i didn't want to start manually creating everything.

Is this a bug? or have i missed something?

[email protected]

Replicas missing in deepcopy

Replicas have not been added in deepcopy. In current master annotations, labels are available options in the the deployment spec.

func (in *GrafanaDeployment) DeepCopyInto(out *GrafanaDeployment) {
	*out = *in
	if in.Annotations != nil {
		in, out := &in.Annotations, &out.Annotations
		*out = make(map[string]string, len(*in))
		for key, val := range *in {
			(*out)[key] = val
		}
	}
	if in.Labels != nil {
		in, out := &in.Labels, &out.Labels
		*out = make(map[string]string, len(*in))
		for key, val := range *in {
			(*out)[key] = val
		}
	}
	return
}

I can add the replica deep copy in this PR #106

ldap.toml currently not created

The CRD does have options to configure LDAP files but looking at the controller code, it seems there is currently no handling for that!?
Is anyone working on this?
I would assume that config_file should be a ConfigMap reference, right?

If I understand correctly, it would perhaps be enough to insert another init-container, which writes /etc/grafana/ldap.toml?

Grafana pod never created

With this CR

apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: monitoring-grafana
spec:
  ingress:
    enabled: False
  config:
    log:
      mode: "console"
      level: "warn"
    security:
      admin_user: "admin"
      admin_password: "admin"
    auth:
      disable_login_form: False
      disable_signout_menu: False
    auth.basic:
      enabled: True
    auth.anonymous:
      enabled: True
  dashboardLabelSelector:
    - matchExpressions:
        - {key: app, operator: In, values: [grafana]}

The grafana svc is present but the grafana pod is not created.
Here are the operator logs :

grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.359364,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:256","msg":"Phase: Create Config Files"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.3632724,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-serviceaccount does not exist, creating now"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.3850758,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-dashboards resource"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.3909838,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-dashboards does not exist, creating now"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.3993938,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-providers resource"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.4050183,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-providers does not exist, creating now"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.4121027,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-datasources resource"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.4180439,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-datasources does not exist, creating now"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.4305937,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-service resource"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.4351752,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-service does not exist, creating now"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.4587262,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:293","msg":"Config files created"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.4808128,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:256","msg":"Phase: Create Config Files"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.5761788,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-dashboards resource"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.7659416,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-providers resource"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350508.965594,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-datasources resource"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350509.1760461,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-service resource"}
grafana-operator-c59c5b98c-vrbfd grafana-operator {"level":"info","ts":1576350509.3649647,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:293","msg":"Config files created"}

Dashboards not imported after Operator restart

When the Operator is restarted it will not re-import any dashboards it had imported previously. This is because we mark a dashboard as imported and only re-import when the JSON field has changed.

We should always import a dashboard if it is not present in the configmap. A similar problem existed for data sources and was resolved in the same way.

Workaround: you can remove the status field from the dashboard or introduce a small change in the JSON to force the operator to re-import the dashboard.

grafana-controller the server could not find the requested resource

After updating to v3.0.0 I am now seeing this status on the Grafana resource.

  Warning  ProcessingError  4m1s (x16 over 9m12s)  grafana-controller  deployment not ready
  Warning  ProcessingError  3m46s (x6 over 8m44s)  grafana-controller  the server could not find the requested resource (put grafanas.integreatly.org metalmatze)

In the controller itself I can see these panics happening every now and then:

{"level":"error","ts":1575930586.1131256,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"grafana-controller","request":"monitoring/metalmatze","error":"the server could not find the requested resource (put grafanas.integreatly.org metalmatze)","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/travis/gopath/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/travis/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/travis/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/home/travis/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/travis/gopath/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/travis/gopath/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/home/travis/gopath/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}

Here's my Grafana resource:

apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: metalmatze
  namespace: monitoring
spec:
  ingress:
    enabled: false
  config:
    log:
      mode: console
      level: debug
    security:
      admin_user: root
      admin_password: foobar
    auth:
      disable_login_form: true
      disable_signout_menu: true
    auth.basic:
      enabled: False
    auth.anonymous:
      enabled: true
  dashboardLabelSelector:
    - matchExpressions:
        - { key: app, operator: In, values: [grafana] }

Grafana liveness/readiness probes and container port are hardcoded into the template

Container Port: https://github.com/integr8ly/grafana-operator/blob/a35bd7b92f5b1f71ecc7039b22ed8949c740575f/templates/grafana-deployment.yaml#L51

Liveness: https://github.com/integr8ly/grafana-operator/blob/a35bd7b92f5b1f71ecc7039b22ed8949c740575f/templates/grafana-deployment.yaml#L65

Readiness: https://github.com/integr8ly/grafana-operator/blob/a35bd7b92f5b1f71ecc7039b22ed8949c740575f/templates/grafana-deployment.yaml#L57

The spec allows setting a different http_port for grafana (https://github.com/integr8ly/grafana-operator/blob/master/pkg/apis/integreatly/v1alpha1/grafana_types.go#L89) so instead of having the port hardcoded it should point to the value of the spec if any, otherwise it can default to 3000.

panels of type gauge and bar gauge lose their settings

When I try to import dashboards via the CR with panels of type guage or bar gauge the settings for those pannels get lost.

I have tried using the json and the url form of the GrafanaDashboard type. both have same experince.

I can copy the exact same JSON direclty into the same Grafana dashboard import feature and the panels import fine. This only seems to happen when the board gets imported via the Operator and I can't for the life of me figure out what is causing it. I can only assume at this point since I can copy paste directly in that the operator has to be doing something or importing the board in such a way that something somewhere is going funky.

Operator upgrades never finishes

Right now it's impossible to upgrade operator using deployment's rolling upgrade.


	// Become the leader before proceeding
	leader.Become(context.TODO(), "grafana-operator-lock")

	r := ready.NewFileReady()
	err = r.Set()
	if err != nil {
		log.Error(err, "")
		os.Exit(1)
	}
	defer r.Unset()

leader election never succeedes (there is already master instance elected), ready file is not created and finally readiness probe never marks pod as ready.

grafana-dashboards configmap exceeds size limit

Hey there, I think there's a design flow in how the dashboards get stored.

Currently all dashboards get merged together into one "grafana-dashboards" configmap.
Since there's a limit of 1MB per ConfigMap, we ran into the case, that we can't add any more dashboards into grafana.

Error message from grafana-operator:

{"level":"error","ts":1571926076.6034844,"logger":"kubebuilder.controller","caller":"controller/controller.go:209","msg":"Reconciler error","Controller":"grafanadashboard-controller","Request":"schnitzel-test/grafana-dashboard-schnitzel-test-schnitzel-processor-performance","error":"ConfigMap \"grafana-dashboards\" is invalid: []: Too long: must have at most 1048576 characters","stacktrace":"github.com/integr8ly/grafana-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error
\tgrafana-operator/vendor/github.com/go-logr/zapr/zapr.go:128
github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
\tgrafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:209
github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
\tgrafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157
github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
\tgrafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
\tgrafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until
\tgrafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

So either the dashboards need to be seperated into multiple configmaps (which would mean the grafana deployment needs to be updated and grafana itself restarted in the case of added/removed dashboards) or need to be injected using PVCs (might be sucky since depending on the implementation that could require RWX mounts) or some other means.

Pod Disruption Budget for grafana pods

Do we need a pdb spec for grafana pods or is it an overkill for grafana ? Perhaps we should add one. Any thoughts on this ?

Customize labels for Grafana Pod

Currently I don't see how can I set the labels for the Grafana Pod which is managed by the operator. It would be great if I could set custom labels which are then applied to the Grafana Pod, or even better if the Pod would inherit the labels of the Grafana CRD itself.

Support managing loki

It would be great to allow grafana-operator to also manage the various components of loki.

${DS_PROMETHEUS} not processed in Kubernetes GrafanaDashboard kind

We're integrating Strimzi with Prometheus/Grafana. This URL: https://github.com/strimzi/strimzi-kafka-operator/blob/master/metrics/examples/grafana/strimzi-kafka.json - defines a Strimzi/Kafka dashboard.

If I hand-import the dashboard JSON into the Grafana UI, the dashboard works fine. (Precursor steps are to deploy a Prometheus instance and a GrafanaDataSource for Prometheus.)

However, if I include the dashboard JSON - as is - into a GrafanaDashboard CR like this:

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: strimzi-kafka
  namespace: my-ns
  labels:
    app: grafana
spec:
  name: strimzi-kafka.json
  json: >
    {
      "__inputs": [
        "name": "DS_PROMETHEUS"
        ...
      ],
      ...
      "panels": [
         {
         ...
         "datasource": "${DS_PROMETHEUS}"
         ...

The dashboard does get loaded by Grafana, but the datasource variable replacement apparently does not take place in this code path. So the dashboard in the UI fails with "Datasource named ${DS_PROMETHEUS} not found"

The specific steps are:

Deploy Strimzi/Kafka cluster
Deploy Prometheus CRs
Ensure Prometheus is scraping Kafka
Subscribe Grafana Operator v2.0.0
Deploy GrafanaDataSource CR
Deploy GrafanaDashboard CR
Deploy Grafana CR (e.g. example from https://operatorhub.io/operator/grafana-operator)
Port forward 3000 to Grafana pod
Access the Grafana UI from the browser
Observe that the dashboard is present, but is unable to resolve the data source

Is this by design - or - am I doing something wrong? If I hand-edit the resulting grafana-dashboards configmap generated by the Grafana Operator and replace all occurrences of "${DS_PROMETHEUS}" with "Prometheus" - the dashboard then works. So it looks like - in order to use an exported dashboard JSON I have some text replacement to do in the GrafanaDashboard CR. I would have expected the JSON to be handled identically by Grafana whether it flows in from the UI via import, or from a custom resource.

Thanks

Request/Limits

I see the quota allocation is kept as constant.
As an operator user i should be able to define these in kind: Grafana . In my deployment spec, any specific reason for keeping these as constants ??
Each user can have its own quota/limits preferences

const (
	MemoryRequest = "256Mi"
	CpuRequest    = "100m"
	MemoryLimit   = "1024Mi"
	CpuLimit      = "500m"
)

Openshift 3.11 - Creating a Grafana app using the operator

Hey,

While using this operator and having it installed on Openshift 3.11, when trying to deploy a new grafana app from the example yaml (integr8tly\grafana-operator\deploy\examples\Grafana.yaml), I ran into the following error:

{"level":"info","ts":1562846141.0283246,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:186","msg":"Phase: Create Config Files"}

{"level":"info","ts":1562846141.0389605,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:321","msg":"Creating grafana-serviceaccount"}

{"level":"error","ts":1562846141.042217,"logger":"kubebuilder.controller","caller":"controller/controller.go:209","msg":"Reconciler error","Controller":"grafana-controller","Request":"grafana-operator/example-grafana4","error":"serviceaccounts "grafana-serviceaccount" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched, ","stacktrace":"github.com/integr8ly/grafana-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/media/peter/MEDIA/go/src/github.com/integr8ly/grafana-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/media/peter/MEDIA/go/src/github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:209\ngithub.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/media/peter/MEDIA/go/src/github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157\ngithub.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/media/peter/MEDIA/go/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/media/peter/MEDIA/go/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/media/peter/MEDIA/go/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

As a workaround I modified the following rule of the Clusterrole, adding the pairs */finalizers, and it worked!

verbs:
- '*'
  apiGroups:
- integreatly.org
  resources:
- grafanas
- grafanadashboards
- grafanadatasources
- grafanas/finalizers
- grafanadashboards/finalizers
- grafanadatasources/finalizers

Datasources are not dynamic

When I add/remove a GrafanaDataSource CRD even though the file is synced (it appears/disappears automatically in the grafana deployment under /etc/grafana/provisioning/datasources) Grafana doesn't pick up the change. If I kill the Pod manually the newly spawned one will have the correct datasources.

grafana-operator crashing in latest and v1.4.0 too

I am trying to follow this https://github.com/integr8ly/grafana-operator/blob/master/documentation/deploy_grafana.md

but when operator is deployed it crashes with this

{"level":"info","ts":1568894945.5405514,"logger":"leader","caller":"leader/leader.go:82","msg":"Continuing as the leader."}
{"level":"info","ts":1568894945.5801764,"logger":"cmd","caller":"manager/main.go:140","msg":"Registering Components."}
{"level":"info","ts":1568894945.5812306,"logger":"kubebuilder.controller","caller":"controller/controller.go:120","msg":"Starting EventSource","Controller":"grafanadatasource-controller","Source":{"Type":{"metadata":{"creationTimestamp":null},"spec":{"datasources":null,"name":""},"status":{"phase":0,"lastConfig":""}}}}
{"level":"error","ts":1568894945.584245,"logger":"cmd","caller":"manager/main.go:150","msg":"","error":"no matches for kind \"GrafanaDataSource\" in version \"integreatly.org/v1alpha1\"","stacktrace":"github.com/integr8ly/grafana-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\tgrafana-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\tgrafana-operator/cmd/manager/main.go:150\nruntime.main\n\t/usr/local/Cellar/go/1.12.4/libexec/src/runtime/proc.go:200"}```

can somene help me navigate further. so far i have deployed the role, rolebinding and sa and just tried to launch operator to get going.

Cannot add JsonData to datasource

So, when I try to add a datasource like this, with jsonData:

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
name: onprem-grafanadatasource
spec:
name: prometheus.yaml
datasources:
- name: Prometheus
type: prometheus
access: proxy
orgId: 1
url: https://prometheus-k8s.openshift-monitoring.svc:9091
basicAuth: true
basicAuthUser: blablabla
basicAuthPassword: blablabla
isDefault: true
jsonData:
tlsSkipVerify: true
version: 1
editable: true

I get this error from the operator:

E0823 13:18:03.740973 1 reflector.go:205] github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1alpha1.GrafanaDataSource: v1alpha1.GrafanaDataSourceList.Items: []v1alpha1.GrafanaDataSource: v1alpha1.GrafanaDataSource.Spec: v1alpha1.GrafanaDataSourceSpec.Datasources: []v1alpha1.GrafanaDataSourceFields: v1alpha1.GrafanaDataSourceFields.JsonData: ReadString: expects " or n, but found t, error found in #10 byte of ...|pVerify":true},"name|..., bigger context ...|rue,"isDefault":true,"jsonData":{"tlsSkipVerify":true},"name":"Prometheus","orgId":1,"type":"prometh|...

I have also tried what the error suggests and put like:

jsonData: "{tlsSkipVerify: true}"

But then it says it is expecting a { instead of "
So, if I do:
jsonData: {tlsSkipVerify: true}

This is just rewritten to the below and fails again with the original error

  jsonData:
    tlsSkipVerify: true

OpenShift 3.11 Projects with Grafana Operator and related CRs get stuck in "Terminating" state???

Hi,

First of all, thanks for this awesome operator, is very useful!!!

I'm seeing this behaviour consistently. I haven't gone into more details,
but when I have the Grafana Operator deployed on a project, along with Grafana, GrafanaDashboards and GrafanDataSources CRs, and I delete the Project, the project gets stuck in "Terminating" state.

I guess it has to be realated with the k8s finalizers and the order in which resources are deleted....

Getting Ingress to get a consistent IP Address on various managed K8s platforms.

I am using release version 2.0 of the Grafana Operator.

I have the following Grafana CR:

apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: insights-grafana
spec:
  ingress:
    enabled: True
    labels:
      app: grafana
    annotations:
      app: grafana
  service:
    labels:
      app: grafana
    annotations:
      app: grafana
    type: LoadBalancer
  config:
    log:
      mode: "console"
      level: "warn"
    security:
      admin_user: "root"
      admin_password: "secret"
    auth:
      disable_login_form: False
      disable_signout_menu: True
    auth.basic:
      enabled: False
    auth.anonymous:
      enabled: True
  dashboardLabelSelector:
    - matchExpressions:
        - {key: app, operator: In, values: [grafana]}

That above CR works fine on OpenShift, Kops/AWS, GKE. But does not assign an address on EKS/AWS:

tgates@tgu19:~$ kubectl get ingress
NAME              HOSTS   ADDRESS   PORTS   AGE
grafana-ingress   *                 80      33m
tgates@tgu19:~$ kubectl get ingress/grafana-ingress
NAME              HOSTS   ADDRESS   PORTS   AGE
grafana-ingress   *                 80      33m
tgates@tgu19:~$ kubectl describe ingress/grafana-ingress
Name:             grafana-ingress
Namespace:        nuodb
Address:          
Default backend:  default-http-backend:80 (<none>)
Rules:
  Host  Path  Backends
  ----  ----  --------
  *     
           grafana-service:3000 (192.168.42.203:3000)
Annotations:
  app:   grafana
Events:  <none>
tgates@tgu19:~$

If I change the CR spec.service.type to "ClusterIP", then it fixes EKS/AWS, but does not get an assigned Ingress address on GKE (and perhaps others I didn't test).

If I remove the entire spec.service from the CR, The Grafana Operator creates a default Grafana K8s Service using ClusterIP but the Ingress still doesn't have an address on GKE:

tgates@tgu19:~$ kubectl get svc
NAME                               TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                                          AGE
admin                              LoadBalancer   10.104.15.252   35.223.38.33   8888:32474/TCP,48004:31179/TCP,48005:32352/TCP   14m
domain                             ClusterIP      None            <none>         8888/TCP,48004/TCP,48005/TCP                     14m
grafana-service                    ClusterIP      10.104.8.238    <none>         3000/TCP                                         10m
insights-escluster-es-http         LoadBalancer   10.104.8.60     34.67.70.72    9200:31409/TCP                                   14m
insights-server                    LoadBalancer   10.104.11.99    35.202.116.3   8080:32483/TCP                                   14m
insights-server-release-logstash   ClusterIP      10.104.5.88     <none>         8080/TCP                                         11m
kibana-kb-http                     ClusterIP      10.104.8.55     <none>         5601/TCP                                         13m
tgates@tgu19:~$ kubectl get ingress
NAME              HOSTS   ADDRESS   PORTS   AGE
grafana-ingress   *                 80      10m
tgates@tgu19:~$ kubectl get grafana
NAME               AGE
insights-grafana   11m
tgates@tgu19:~$ kubectl describe grafana/insights-grafana
Name:         insights-grafana
Namespace:    nuodb
Labels:       <none>
Annotations:  <none>
API Version:  integreatly.org/v1alpha1
Kind:         Grafana
Metadata:
  Creation Timestamp:  2019-12-05T13:50:37Z
  Generation:          4
  Owner References:
    API Version:           nuodb.com/v2alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  NuodbInsightsServer
    Name:                  insightsserver
    UID:                   acd3b267-1765-11ea-9b3f-42010a800257
  Resource Version:        7043072
  Self Link:               /apis/integreatly.org/v1alpha1/namespaces/nuodb/grafanas/insights-grafana
  UID:                     37d83f3d-1766-11ea-9b3f-42010a800257
Spec:
  Admin Password:  
  Admin User:      
  Anonymous:       false
  Basic Auth:      false
  Config:
    Alerting:
    Analytics:
    Auth:
      Disable Signout Menu:  true
    Auth . Anonymous:
      Enabled:  true
    Auth . Basic:
    Auth . Generic Oauth:
    Auth . Github:
    Auth . Google:
    Auth . Ldap:
    Auth . Proxy:
    Dashboards:
    Database:
    Dataproxy:
    External Image Storage:
    External Image Storage . Azure Blob:
    External Image Storage . Gcs:
    External Image Storage . S 3:
    External Image Storage . Webdav:
    Log:
      Level:  warn
      Mode:   console
    Metrics:
    Metrics . Graphite:
    Panels:
    Paths:
    Plugins:
    Remote Cache:
    Security:
      Admin Password:  secret
      Admin User:      root
    Server:
    Smtp:
    Snapshots:
    Users:
  Dashboard Label Selector:
    Match Expressions:
      Key:       app
      Operator:  In
      Values:
        grafana
  Disable Login Form:    false
  Disable Signout Menu:  false
  Ingress:
    Annotations:
      App:    grafana
    Enabled:  true
    Labels:
      App:    grafana
  Log Level:  
  Service:
Status:
  Failed Plugins:     <nil>
  Installed Plugins:  <nil>
  Last Config:        e148efa7b90017b0c916ef81566ca7c6
  Phase:              3
Events:               <none>
tgates@tgu19:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://35.224.125.114
  name: gke_marketplace-dev-230722_us-central1-a_tgates-cluster-3
contexts:
- context:
    cluster: gke_marketplace-dev-230722_us-central1-a_tgates-cluster-3
    namespace: nuodb
    user: gke_marketplace-dev-230722_us-central1-a_tgates-cluster-3
  name: gke_marketplace-dev-230722_us-central1-a_tgates-cluster-3
current-context: gke_marketplace-dev-230722_us-central1-a_tgates-cluster-3
kind: Config
preferences: {}
users:
- name: gke_marketplace-dev-230722_us-central1-a_tgates-cluster-3
  user:
    auth-provider:
      config:
        access-token: ya29.ImazB9DIdp34yJcnFyJF5v1TDbSMfObp8rSk0EVKszszQIqBrc47wbdFz0OijgHJQsZPt5WFUSrnZuA-rqO9MM2QJG6p1o5t6gUGHh_7BoyBENKZWDjb9bpUqcEHY7ktLX1XNBTQQsY
        cmd-args: config config-helper --format=json
        cmd-path: /usr/lib/google-cloud-sdk/bin/gcloud
        expiry: "2019-12-05T14:35:12Z"
        expiry-key: '{.credential.token_expiry}'
        token-key: '{.credential.access_token}'
      name: gcp
tgates@tgu19:~$

There is a reliable way to have a single CR that works correctly on all K8s/Cloud platforms?

watching grafanas and datasources in all namespaces

The Grafana and GrafanaDataSource resources do not support multiple namespaces and are only reconciled if created in the operators namespace.

Would you mind sharing the reasons that lead to the decision not to support it?

We usually install all operators in one system namespace and then create CRDs in other managed namespaces where we want to install the software the operator manages (Grafana in this case). All operators I used so far (including Prometheus) support this pattern.

Rework the Operator to call Grafana API directly

Having deployed the Grafana Operator I am concerned about its architecture for future releases.
Most problematic would probably be the grafana-dashboards ConfigMap which in my case already is stuffed with 20 objects.
Every time some dashboard changes the deployment gets re-deployed entirely, which might not preferred when deploying often and by different teams.

I propose to have the Operator read GrafanaDashboards CustomResources and when reconciling directly call the Grafana API and add dashboards that way. It should work similarly for datasources and other things too.

Let me know what you think.

Cannot process dashboard

I am trying to import some existing dashboards (the ones from https://github.com/kubernetes-monitoring/kubernetes-mixin project which are the dashboards provided when using the Grafana instance included in the Prometheus Operator installation).

When I create the GrafanaDashboard custom resources corresponding to the JSON files of these dashboards and apply it to my cluster, some are imported correctly but others are not.

Operator logs:

grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597132.3258252,"logger":"action-runner","msg":"(    0)    SUCCESS update admin credentials secret"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597132.3348644,"logger":"action-runner","msg":"(    1)    SUCCESS update grafana service"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597132.339254,"logger":"action-runner","msg":"(    2)    SUCCESS update grafana service account"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597132.3445523,"logger":"action-runner","msg":"(    3)    SUCCESS update grafana config"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597132.3445776,"logger":"action-runner","msg":"(    4)    SUCCESS plugins unchanged"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597132.3511753,"logger":"action-runner","msg":"(    5)    SUCCESS update grafana deployment"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597132.3512046,"logger":"action-runner","msg":"(    6)    SUCCESS check deployment readiness"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597132.3574247,"logger":"grafana-controller","msg":"desired cluster state met"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597136.9083877,"logger":"controller_grafanadashboard","msg":"running periodic dashboard resync"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597136.9091856,"logger":"controller_grafanadashboard","msg":"cannot process dashboard monitoring/namespace-by-pod"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597136.931786,"logger":"controller_grafanadashboard","msg":"cannot process dashboard monitoring/apiserver"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597136.9522867,"logger":"controller_grafanadashboard","msg":"cannot process dashboard monitoring/controller-manager"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597136.9716318,"logger":"controller_grafanadashboard","msg":"cannot process dashboard monitoring/pods"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597136.9854927,"logger":"controller_grafanadashboard","msg":"cannot process dashboard monitoring/scheduler"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597137.0040374,"logger":"controller_grafanadashboard","msg":"cannot process dashboard monitoring/proxy"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597137.025436,"logger":"controller_grafanadashboard","msg":"cannot process dashboard monitoring/kubelet"}
grafana-operator-864477b67-ctwtf grafana-operator {"level":"info","ts":1576597137.0624104,"logger":"controller_grafanadashboard","msg":"cannot process dashboard monitoring/pod-total"}

One of the erroneous Grafana Dashboard status :

$ kubectl describe grafanadashboards.integreatly.org controller-manager
Name:         controller-manager
Namespace:    monitoring
Labels:       app=istio
              monitoring.discovery.dashboards=true
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"integreatly.org/v1alpha1","kind":"GrafanaDashboard","metadata":{"annotations":{},"labels":{"app":"istio","monitoring.discov...
API Version:  integreatly.org/v1alpha1
Kind:         GrafanaDashboard
Metadata:
  Creation Timestamp:  2019-12-17T01:46:30Z
  Generation:          1
  Resource Version:    14865
  Self Link:           /apis/integreatly.org/v1alpha1/namespaces/monitoring/grafanadashboards/controller-manager
  UID:                 69313ed2-9958-499f-a7c9-9e6365ba00c2
Spec:
  Json:  {
[...]
}
  Name:  controller-manager.json
Status:
  Hash:     
  Id:       0
  Message:  json: cannot unmarshal string into Go struct field Row.rows.panels of type bool
  Phase:    failing
  Slug:     
  UID:      
Events:
  Type     Reason           Age                     From                         Message
  ----     ------           ----                    ----                         -------
  Warning  ProcessingError  2m56s (x5259 over 14h)  controller_grafanadashboard  json: cannot unmarshal string into Go struct field Row.rows.panels of type bool

We can see in the previous pasted log that the main error is :
json: cannot unmarshal string into Go struct field Row.rows.panels of type bool

IMPORTANT : When I import the same JSON file directly in Grafana UI, it is imported correctly !

It seems that you use a grafana sdk in your code and that this SDK is responsible of unmarshaling the JSON file.
Do you think it may be a problem of comptibility between the SDK you use and the Grafana image version ?

Howto get grafana 6.4

Is there a sensible way to achieve this? 6.4 has improvements around loki.

tyop/stray descripton (instead of description) field in the openapi crd spec

https://github.com/integr8ly/grafana-operator/blob/master/deploy/crds/Grafana.yaml#L90

This is causing synchronisation issues with gitops flows.
When the crd is applied, Kubernetes ignores/removes the field and thus the git source of truth is alway out of sync.

Phase upgrade seems to not work

I switched Operator to 3.0.0 and get the following errors:

reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1alpha1.Grafana: v1alpha1.GrafanaList.ListMeta: v1.ListMeta.TypeMeta: Kind: Items: []v1alpha1.Grafana: v1alpha1.Grafana.Status: v1alpha1.GrafanaStatus.Phase: ReadString: expects " or n, but found 3, error found in #10 byte of ...|,"phase":3}}],"kind"|..., bigger context ...|nfig":"e8febc1cafc5509cb3f45ddd35a7eed0","phase":3}}],"kind":"GrafanaList","metadata":{"continue":""|...

In the status maps, you find phase values like:

phase: 3

Since phase is a string, I think it would be necessary to at least upgrade the status to be

phase: "3"

If phase was an int in the past, a complete conversion would be great ;)

Grafana provisioning configuration is not generated correctly

Tested using grafana-operator:latest sha 2a564282916d with Grafana 6.2.5.

Found multiple issues when trying to create new datasources using CRD...

1. Casing of keys are not correct

When providing:

kind: GrafanaDataSource
...
      basicAuth: true

The generated config becomes:

      basicauth: true

From what I can tell, Grafana's provisioning config version 0 had snake_case and version 1 has camelCase.
Issue appears to in: pkg/apis/integreatly/v1alpha1/grafanadatasource_types.go
Where the yaml formatting is missing:

BasicAuth         bool   `json:"basicAuth,omitempty"`

Should be:

BasicAuth         bool   `json:"basicAuth,omitempty" yaml:"basicAuth,omitempty"`

2. Missing apiVersion

Documentation shows apiVersion as part of the provisioning file: https://grafana.com/docs/administration/provisioning/#example-datasource-config-file
When not specifying apiVersion 1, the parsing of the provisioning file does not follow documentation and does not work as expected.

3. jsonData and secureJsonData is not generated correctly

Documentation shows both should be generated as a map: https://grafana.com/docs/administration/provisioning/#example-datasource-config-file
However, the CRD expects a JSON string, which Grafana will reject when using apiVersion: 1.

Operator panics when creating a grafana object with 'containers' spec

grafana.yaml

apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: example-grafana
spec:
  config:
    log:
      mode: "console"
      level: "warn"
    auth:
      disable_login_form: True
      disable_signout_menu: True
    auth.basic:
      enabled: True
    auth.proxy:
      auto_sign_up: True
      enabled: True
      header_name: "X-Forwarded-User"
    server:
      http_addr: "127.0.0.1"
      http_port: "3001"
    users:
      allow_sign_up: False
      auto_assign_org: True
      auto_assign_org_role: "Admin"
  service:
    annotations:
      service.alpha.openshift.io/serving-cert-secret-name: grafana-tls
    labels:
      app: grafana
    type: ClusterIP
  containers:
    - args:
        - '-provider=openshift'
        - '-https-address=:3000'
        - '-http-address='
        - '-email-domain=*'
        - '-upstream=http://localhost:3001'
        - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
        - '-tls-cert=/etc/tls/private/tls.crt'
        - '-tls-key=/etc/tls/private/tls.key'
        - '-cookie-secret-file=/etc/proxy/secrets/session_secret'
        - '-openshift-service-account=grafana'
        - '-openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        - '-skip-auth-regex=^/metrics'
      image: quay.io/openshift/origin-oauth-proxy:4.1
      name: oauth-proxy
      ports:
        - containerPort: 3000
          name: web-proxy
      volumeMounts:
        - mountPath: /etc/tls/private
          name: secret-grafana-tls
        - mountPath: /etc/proxy/secrets
          name: secret-grafana-proxy
  secrets:
    - grafana-tls
    - grafana-proxy
  dashboardLabelSelector:
    - matchExpressions:
        - {key: app, operator: In, values: [grafana]}

Grafana Operator startup logs

{"level":"info","ts":1571828027.9506412,"logger":"cmd","caller":"manager/main.go:37","msg":"Go Version: go1.10.8"}
{"level":"info","ts":1571828027.9507186,"logger":"cmd","caller":"manager/main.go:38","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1571828027.9507422,"logger":"cmd","caller":"manager/main.go:39","msg":"operator-sdk Version: v0.9.0"}
{"level":"info","ts":1571828027.9507594,"logger":"cmd","caller":"manager/main.go:40","msg":"operator Version: 2.0.0"}
{"level":"info","ts":1571828027.9516816,"logger":"leader","caller":"leader/leader.go:46","msg":"Trying to become the leader."}
{"level":"info","ts":1571828028.1541784,"logger":"leader","caller":"leader/leader.go:81","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1571828028.1542351,"logger":"leader","caller":"leader/leader.go:82","msg":"Continuing as the leader."}
{"level":"info","ts":1571828028.3330405,"logger":"cmd","caller":"manager/main.go:170","msg":"Registering Components."}
{"level":"info","ts":1571828028.3363254,"logger":"kubebuilder.controller","caller":"controller/controller.go:120","msg":"Starting EventSource","Controller":"grafanadatasource-controller","Source":{"Type":{"metadata":{"creationTimestamp":null},"spec":{"datasources":null,"name":""},"status":{"phase":0,"lastConfig":""}}}}
{"level":"info","ts":1571828028.3406792,"logger":"kubebuilder.controller","caller":"controller/controller.go:120","msg":"Starting EventSource","Controller":"grafana-controller","Source":{"Type":{"metadata":{"creationTimestamp":null},"spec":{"adminPassword":"","adminUser":"","anonymous":false,"basicAuth":false,"config":{"paths":{},"server":{},"database":{},"remote_cache":{},"security":{},"users":{},"auth":{},"auth.basic":{},"auth.anonymous":{},"auth.google":{},"auth.github":{},"auth.generic_oauth":{},"auth.ldap":{},"auth.proxy":{},"dataproxy":{},"analytics":{},"dashboards":{},"smtp":{},"log":{},"metrics":{},"metrics.graphite":{},"snapshots":{},"external_image_storage":{},"external_image_storage.s3":{},"external_image_storage.webdav":{},"external_image_storage.gcs":{},"external_image_storage.azure_blob":{},"alerting":{},"panels":{},"plugins":{}},"disableLoginForm":false,"disableSignoutMenu":false,"ingress":{},"logLevel":"","service":{}},"status":{"phase":0,"installedPlugins":null,"failedPlugins":null,"lastConfig":""}}}}
{"level":"info","ts":1571828028.3491786,"logger":"cmd","caller":"manager/main.go:184","msg":"Starting the Cmd."}
{"level":"info","ts":1571828028.53275,"logger":"kubebuilder.controller","caller":"controller/controller.go:120","msg":"Starting EventSource","Controller":"grafanadashboard-controller","Source":{"Type":{"metadata":{"creationTimestamp":null},"spec":{"json":"","name":""},"status":{"phase":0}}}}
{"level":"info","ts":1571828028.5332046,"logger":"controller_grafanadashboard","caller":"grafanadashboard/dashboard_controller.go:61","msg":"Starting dashboard controller"}
{"level":"info","ts":1571828028.6347058,"logger":"kubebuilder.controller","caller":"controller/controller.go:134","msg":"Starting Controller","Controller":"grafana-controller"}
{"level":"info","ts":1571828028.6348126,"logger":"kubebuilder.controller","caller":"controller/controller.go:134","msg":"Starting Controller","Controller":"grafanadatasource-controller"}
{"level":"info","ts":1571828028.6348999,"logger":"kubebuilder.controller","caller":"controller/controller.go:134","msg":"Starting Controller","Controller":"grafanadashboard-controller"}
{"level":"info","ts":1571828028.7351656,"logger":"kubebuilder.controller","caller":"controller/controller.go:153","msg":"Starting workers","Controller":"grafanadashboard-controller","WorkerCount":1}
{"level":"info","ts":1571828028.7430577,"logger":"kubebuilder.controller","caller":"controller/controller.go:153","msg":"Starting workers","Controller":"grafana-controller","WorkerCount":1}
{"level":"info","ts":1571828028.7430577,"logger":"kubebuilder.controller","caller":"controller/controller.go:153","msg":"Starting workers","Controller":"grafanadatasource-controller","WorkerCount":1}

Operator crashes:

{"level":"info","ts":1571828182.066933,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:256","msg":"Phase: Create Config Files"}
{"level":"info","ts":1571828182.0725026,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-serviceaccount does not exist, creating now"}
{"level":"info","ts":1571828182.1542637,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-dashboards resource"}
{"level":"info","ts":1571828182.1602066,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-dashboards does not exist, creating now"}
{"level":"info","ts":1571828182.1692173,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-providers resource"}
{"level":"info","ts":1571828182.174224,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-providers does not exist, creating now"}
{"level":"info","ts":1571828182.181247,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-datasources resource"}
{"level":"info","ts":1571828182.1862216,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-datasources does not exist, creating now"}
{"level":"info","ts":1571828182.192278,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:286","msg":"Creating the grafana-service resource"}
{"level":"info","ts":1571828182.1962156,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:452","msg":"Resource grafana-service does not exist, creating now"}
{"level":"info","ts":1571828182.2096612,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:293","msg":"Config files created"}
{"level":"info","ts":1571828182.2244709,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:299","msg":"Phase: Install Grafana"}
{"level":"info","ts":1571828182.2461221,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:326","msg":"adding volume for secret 'grafana-tls' as 'secret-grafana-tls'"}
{"level":"info","ts":1571828182.2461627,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:326","msg":"adding volume for secret 'grafana-proxy' as 'secret-grafana-proxy'"}
{"level":"info","ts":1571828182.2462075,"logger":"controller_grafana","caller":"grafana/grafana_controller.go:379","msg":"adding extra container 'oauth-proxy' to 'grafana-deployment'"}
E1023 10:56:22.246367       1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"v1.Container", assertedString:"map[string]interface {}", missingMethod:""} (interface conversion: interface {} is v1.Container, not map[string]interface {})
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/home/travis/.gimme/versions/go1.10.8.linux.amd64/src/runtime/asm_amd64.s:573
/home/travis/.gimme/versions/go1.10.8.linux.amd64/src/runtime/panic.go:502
/home/travis/.gimme/versions/go1.10.8.linux.amd64/src/runtime/iface.go:252
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/pkg/controller/grafana/grafana_controller.go:390
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/pkg/controller/grafana/grafana_controller.go:301
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/pkg/controller/grafana/grafana_controller.go:103
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:207
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/home/travis/.gimme/versions/go1.10.8.linux.amd64/src/runtime/asm_amd64.s:2361
panic: interface conversion: interface {} is v1.Container, not map[string]interface {} [recovered]
	panic: interface conversion: interface {} is v1.Container, not map[string]interface {}

goroutine 712 [running]:
github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x111be80, 0xc4207ae040)
	/home/travis/.gimme/versions/go1.10.8.linux.amd64/src/runtime/panic.go:502 +0x229
github.com/integr8ly/grafana-operator/pkg/controller/grafana.(*ReconcileGrafana).createDeployment(0xc420362c30, 0xc420299500, 0x12710c2, 0x12, 0x0, 0x0)
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/pkg/controller/grafana/grafana_controller.go:390 +0x2055
github.com/integr8ly/grafana-operator/pkg/controller/grafana.(*ReconcileGrafana).installGrafana(0xc420362c30, 0xc420299500, 0xc42009c018, 0xc4209842fa, 0x6, 0xc4209842e0)
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/pkg/controller/grafana/grafana_controller.go:301 +0x8c
github.com/integr8ly/grafana-operator/pkg/controller/grafana.(*ReconcileGrafana).Reconcile(0xc420362c30, 0xc4209842fa, 0x6, 0xc4209842e0, 0xf, 0x4, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/pkg/controller/grafana/grafana_controller.go:103 +0x252
github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc4202bcd20, 0x0)
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:207 +0x139
github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157 +0x36
github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc420a11150)
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc420a11150, 0x3b9aca00, 0x0, 0x1, 0xc4207b1440)
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc420a11150, 0x3b9aca00, 0xc4207b1440)
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/home/travis/gopath/src/github.com/integr8ly/grafana-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:156 +0x388

Am I missing something? Based on this file it worked at some point.

Trying to create that grafana object from the template folder produces the same error.

Ingress is not set up properly in k8s clusters

I am trying v1.4.0 on a GKE cluster. Here is my manifest which I copied from the examples:

apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: example-grafana
spec:
  config:
    log:
      mode: "console"
      level: "warn"
    security:
      admin_user: "root"
      admin_password: "secret"
    auth:
      disable_login_form: False
      disable_signout_menu: True
    auth.basic:
      enabled: False
    auth.anonymous:
      enabled: True
  dashboardLabelSelector:
    - matchExpressions:
        - {key: app, operator: In, values: [grafana]}

The operator creates a ClusterIP service:

$ kubectl get svc
NAME              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
grafana-service   ClusterIP   10.80.15.92   <none>        3000/TCP   114s

For non-OpenShift clusters, shouldn't a NodePort or LoadBalancer service be created?

Here is my ingress which never get assigned an address:

$ kubectl get ingress
NAME              HOSTS   ADDRESS   PORTS   AGE
grafana-ingress   *                 80      7m28s

I deleted the service and created the following NodePort service:

apiVersion: v1
kind: Service
metadata:
  labels:
    application-monitoring: "true"
  name: grafana-service
spec:
  ports:
  - name: grafana
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana
  type: NodePort

Now my Ingress gets an IP address:

$ kubectl get ingress
NAME              HOSTS   ADDRESS         PORTS   AGE
grafana-ingress   *       34.101.217.55   80      10m

A NodePort service is fine for me as this is just for dev/testing, but it would probably be good to make it configurable.

OperatorHub.io contribution

Follow through contribution steps on https://operatorhub.io/contribute to add the Grafana Operator to OperatorHub.io

cc @pb82 for checklist of changes before you're happy to have a v1 on OperatorHub

Grafana plugin support via GrafanaDashboard CR #6
Use an upstream Grafana image, with ability to change/specify a different image #8
Allow custom containers e.g. for oauth proxy #9
Allow a namespace selector to specify where to look for CRs #4 #5
Auto update dashboards configmap when GrafanaDashboard CR's change
Update to latest operator-sdk version
Test & Verify the Operator manually https://github.com/operator-framework/community-operators/blob/master/docs/testing-operators.md#manual-testing-on-kubernetes
Preview the Operator as per https://github.com/operator-framework/community-operators#preview-your-operator-on-operatorhubio
Submit Pull request to https://github.com/operator-framework/community-operators (Kubernetes and OpenShift compatible)

Image used in grafana-operator container is not fully compatible with dashboards

Not sure if it is an issue or just a trick ; it it can be of any help ... I'll just put it here.

To install the Grafana operator on a Kubernetes cluster, I deploy (among others) the operator.yaml file. This manifest uses the following image for the grafana-operator container :

      containers:
        - name: grafana-operator
          image: quay.io/integreatly/grafana-operator:latest
          args:
            - '--grafana-image=quay.io/openshift/origin-grafana'
            - '--grafana-image-tag=4.2'

From a previous Grafana (helm) install, I created a dashboard using grafana/grafana:6.4.2. I tried to import it in my new Grafana installed via Grafana operator using quay.io/openshift/origin-grafana:4.2, and got the following error message :

using Firefox

Failed create dashboard model
e is undefined

using Chrome

Failed create dashboard model
undefined is not iterable (cannot read property Symbol(Symbol.iterator))

⚠️ quay.io/openshift/origin-grafana:4.2 might cause compatibility problems with dashboard created with other Grafana versions available on Github or Helm.

My dashboard is as follows :

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "description": "Request-Error-Duration pattern",
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 2,
  "links": [],
  "panels": [
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": true,
      "colors": [
        "#5794F2",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "Prometheus",
      "description": "",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 3,
        "w": 3,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "options": {},
      "pluginVersion": "6.4.2",
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": false,
        "ymax": null,
        "ymin": null
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "sum(http_request_duration_seconds_count{handler=\"index\",method=\"post\"})",
          "format": "time_series",
          "instant": false,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "timeFrom": null,
      "timeShift": null,
      "title": "Total number of decisions",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "N/A",
          "value": "null"
        }
      ],
      "valueName": "avg"
    },
    {
      "cacheTimeout": null,
      "datasource": "Prometheus",
      "description": "",
      "gridPos": {
        "h": 7,
        "w": 4,
        "x": 3,
        "y": 0
      },
      "id": 7,
      "links": [],
      "options": {
        "fieldOptions": {
          "calcs": [
            "mean"
          ],
          "defaults": {
            "mappings": [],
            "max": 100,
            "min": 0,
            "nullValueMode": "connected",
            "thresholds": [
              {
                "color": "semi-dark-red",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ],
            "unit": "percent"
          },
          "override": {},
          "values": false
        },
        "orientation": "horizontal",
        "showThresholdLabels": false,
        "showThresholdMarkers": true
      },
      "pluginVersion": "6.4.2",
      "targets": [
        {
          "expr": "100*\n(sum(rate(http_request_duration_seconds_count{code=\"400\",handler=\"index\",method=\"post\"}[1h])) by (instance))\n/\n(sum(rate(http_request_duration_seconds_count{handler=\"index\",method=\"post\"}[1h])) by (instance))",
          "refId": "A"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "Decisions in error 400 over last hour",
      "type": "gauge"
    }
  ],
  "refresh": "5s",
  "schemaVersion": 20,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ]
  },
  "timezone": "",
  "title": "OPA RED",
  "uid": "-t41kUcZz",
  "version": 5
}

💡 Removing the second panel, I can successfully import my dashboard.

💡 Modifying the operator manifest as follows, I can successfully import my dashboard.

      containers:
        - name: grafana-operator
          image: quay.io/integreatly/grafana-operator:v2.0.0
          args:
            - '--grafana-image=grafana/grafana'
            - '--grafana-image-tag=6.4.2'

No problem then, just an info to share. And a suggestion : update the image of grafana-operator, or reference a more widely spread image ;)

Plugin Installation

Hi @pb82

I thought i'd open a conversation before jumping into coding. In regard to plugin installation, I can see your using an init container plugins (note, i'm not aware of whats behind the quay.io/openshift/origin-grafana image), but I was wondering if using the official images and their https://grafana.com/docs/installation/docker/#installing-plugins-for-grafana plugin installation was an idea? ... extending the grafana crd type and adding a []Plugins and using the env to roll the deployment ...

Whats your thoughts?

Deployment Spec Enhancements

The current deployment spec has some important parameters missing such as Pod Disruption Budgets, security context, resource quotas. Can we add those in the grafana deployment struct.
I can issue a PR on those. @pb82 :)

hostname seems to be ignored in OpenShift

In OpenShift whenever the route is created it seems the .spec.ingress.hostname is ignored and is always created with the default name. I beleive it is because the .spec.ingress.hostname is not being referenced here:

https://github.com/integr8ly/grafana-operator/blob/f551d384f11ec69147ab7e050f9dffb9fbe80436/pkg/controller/model/grafanaRoute.go#L70

RBAC error

Hi, thanks for this project, really useful !

I have installed the operator following the README instructions (not from operatorhub because the Operator Lifecycle Manager does not apparently allow to configure the installation...let me know if i am wrong on this) on a vanilla kubernetes cluster (installed on a cloud provider with kubeadm in HA mode) :

export GRAFANA_OPERATOR_RELEASE_VERSION="3.0.1"
export GRAFANA_OPERATOR_RELEASE_URL="https://github.com/integr8ly/grafana-operator/archive/v${GRAFANA_OPERATOR_RELEASE_VERSION}.tar.gz"

# Get release
curl -sSL "${GRAFANA_OPERATOR_RELEASE_URL}" | tar -xz
cd grafana-operator-${GRAFANA_OPERATOR_RELEASE_VERSION} 

kubectl create ns monitoring

# Create the custom resource definitions that the operator uses
kubectl create -f deploy/crds

# Create the operator roles
kubectl create -f deploy/roles -n monitoring

# To scan for dashboards in other namespaces, we also need the cluster roles
kubectl create -f deploy/cluster_roles

# To deploy the operator to that namespace
kubectl create -f deploy/operator.yaml -n monitoring

# Waiting for deployments to be rolled out
kubectl rollout status deploy/grafana-operator --namespace monitoring
deployment "grafana-operator" successfully rolled out

# Check pod
kubectl get po -n monitoring
NAME                                READY   STATUS    RESTARTS   AGE
grafana-operator-6d49c98cd4-8mp5c   1/1     Running   0          7h29m

But when I look at the operator's pod logs, I see some repeated RBAC errors :

kubectl logs -f grafana-operator-6d49c98cd4-8mp5c
[...]
E1214 16:04:07.733173       1 reflector.go:205] pkg/cache/internal/informers_map.go:126: Failed to list *v1alpha1.GrafanaDashboard: grafanadashboards.integreatly.org is forbidden: User "system:serviceaccount:monitoring:grafana-operator" cannot list resource "grafanadashboards" in API group "integreatly.org" at the cluster scope

It is odd because RBAC seems to have been created correctly :

kubectl auth can-i  list grafanadashboards --as=system:serviceaccount:monitoring:grafana-operator -n monitoring
yes

Any idea please ?

grafana-plugin image not customisable

Hi,

I'm behind an artifactory proxy, so I have to change grafana-plugins image to remove "quay.io" part, but it's hardcoded in template,

quay.io/integreatly/grafana_plugins_init:{{ .PluginsInitContainerImageTag }}'

Thanks in advance for your help
Ludo

Access denied

With 3.0.1 (on OpenShift 3.11) the deployment now complains:

t=2019-12-16T12:32:30+0000 lvl=eror msg="Access denied to save dashboard" logger=context userId=0 orgId=1 uname= error="Access denied to save dashboard"
t=2019-12-16T12:32:30+0000 lvl=info msg="Request Completed" logger=context userId=0 orgId=1 uname= method=POST path=/api/dashboards/db status=403 remote_addr=10.29.0.49 time_ms=1 size=45 referer=

over and over.

Granafa - bump version to 6.4?

Would this be possible? 6.4 has some additions for Loki

Make grafana pod resources configurable

It is necessary to be able to specify the pod resource used by grafana pods to limit the usage of Node resources when it is shared with other pods of the system.
for example.

resources:
  requests:
    memory: 50Mi
    cpu: 50m
  limits:
    memory: 100Mi
    cpu: 100m