linkerd / linkerd-viz
Top-line service metrics dashboard for Linkerd 1.
License: Apache License 2.0
Once linkerd/linkerd#938 ships, we'll need to update/remove the metrics relabeling rules in our prometheus configs that we use to populate the service label. Using linkerd-provided labels will make us less reliant on regexes.
We currently do port replacements in grafana.ini and prometheus_data_source.json at boot time. It may be simpler to generate these files from scratch.
reference:
https://github.com/BuoyantIO/linkerd-viz/blob/master/linkerd-viz#L15
In terminal:
linkerd viz install | kubectl apply -f -
then:
kubectl get pods -n linkerd-viz
NAME                            READY   STATUS    RESTARTS   AGE
metrics-api-57c76d5c5c-jztcg    2/2     Running   0          56m
prometheus-5bcd95c8fc-vsrhb     2/2     Running   0          56m
tap-6694cbcb97-lsntm            2/2     Running   0          17m
tap-injector-779b797dbf-lrtj6   2/2     Running   0          17m
web-7bff7b8d89-7d9b8            2/2     Running   0          56m
DC/OS and k8s files live in the repository root. As linkerd-viz supports more platforms, it may be better to move them into subdirectories.
Hi,
The parameter grafana.url is not well documented.
I installed linkerd-viz and Grafana from the Helm chart, as described, in the same namespace.
For grafana.url I tried both http://grafana and http://grafana/, but I only get a 502 in return for the Grafana links.
What happened?
Linkerd-viz does something with the URL and produces error log entries like
2023/03/14 10:09:15 http: proxy error: dial tcp: lookup http://grafana: no such host
2023/03/14 10:19:15 http: proxy error: dial tcp: lookup http://grafana/: no such host
depending on the configuration.
So I ended up reading https://linkerd.io/2.12/tasks/grafana/index.html and using the configuration grafana:80 from the example.
Either something is wrong in the rewriting or the documentation needs an update.
Best...
Uwe
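The proxy error quoted above shows the whole value, scheme included, being used as a host name in a DNS lookup. A minimal sketch, assuming the setting expects a host:port value as in the linked docs example (grafana:80); to_host_port is a hypothetical helper for illustration, not part of linkerd-viz, and the port 80 fallback is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: turn a URL-style value into host:port.
# Assumes plain HTTP on port 80 when no port is given.
to_host_port() {
  value=${1#http://}      # drop a leading scheme, if present
  value=${value%%/*}      # drop any path or trailing slash
  case $value in
    *:*) printf '%s\n' "$value" ;;      # port already present
    *)   printf '%s:80\n' "$value" ;;   # assumed default port
  esac
}

to_host_port "http://grafana/"
```

A value that is already in host:port form passes through unchanged.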
Hi,
I'm trying to deploy linkerd-viz in a Kubernetes cluster where I have rights on a single namespace, and I'm getting the following errors:
Failed to list *v1.Service: User \"system:serviceaccount:my-namespace:viewer\" cannot list all services in the cluster"
Failed to list *v1.Pod: User \"system:serviceaccount:my-namespace:viewer\" cannot list all pods in the cluster"
Failed to list *v1.Endpoints: User \"system:serviceaccount:my-namespace:viewer\" cannot list all endpoints in the cluster"
It seems that Prometheus is trying to list services / pods in the whole cluster. Is there a way to have it restrict itself to the namespace my-namespace only? I was thinking that using a __meta_kubernetes_namespace meta label could do the trick, but I'm unsure whether that will change the API call that Prometheus makes, or just filter the services afterwards.
Note that linkerd-viz runs under a viewer service account that can list services / pods inside my namespace.
Thanks!
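One option worth trying: Prometheus's kubernetes_sd_configs accepts a namespaces.names list, which, as far as I understand, scopes discovery to the listed namespaces rather than filtering a cluster-wide list afterwards. A sketch under that assumption, using the namespace from the report; the job name and the temporary output file are placeholders:

```shell
#!/bin/sh
# Write a scrape-config fragment whose discovery is limited to one namespace.
# The job name and the output path are placeholders for illustration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
scrape_configs:
- job_name: 'linkerd'
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - my-namespace
EOF

cat "$conf"
```

Relabeling on __meta_kubernetes_namespace only acts on targets that were already discovered, so on its own it would not be expected to remove the cluster-wide list calls.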
@siggy The viz looks great. I am deploying a cluster for my company on GKE, and really need to secure the public-facing Grafana so that it authenticates against the same Google Account users.
While I can normally do this from config files and the UI, I'm struggling with this setup. It logs in anonymously, but the Grafana admin functions are not available. I assume this is a feature of anonymous logins, which I haven't used before.
Do you have a reference config to get it running with Google Cloud IAM that you could share?
Currently linkerd-viz requires a linker-to-linker configuration to display metrics:
https://github.com/BuoyantIO/linkerd-examples/blob/master/dcos/linker-to-linker/linkerd-config.yml
Modify linkerd-viz to display metrics when in a simple-proxy configuration:
https://github.com/BuoyantIO/linkerd-examples/blob/master/dcos/simple-proxy/linkerd-config.yml
The changes in #33 came as a surprise: the success rate and request volume graphs became empty.
The default scrape interval of 1m matches the range used by the query in Grafana, so the graphs are empty whenever that range does not contain two data points. I'd recommend a lower default, like 30s.
However, this can't be overridden, because the replacement with $SCRAPE_INTERVAL fails.
The root of the issue is here:
sed -i "" "s@scrape_interval:.*@scrape_interval: $SCRAPE_INTERVAL@" $PROMETHEUS_CONF
sed -i "" "s@ evaluation_interval:.*@ evaluation_interval: $SCRAPE_INTERVAL@" $PROMETHEUS_CONF
With sed -i "", GNU sed does not take the separate "" as the backup suffix (that form is what BSD/macOS sed expects), so the arguments are misparsed and the command fails. The replacements never happen, and Prometheus only ever runs with the default "1m" from prometheus-$PLATFORM.yml (whichever file that turns out to be, "k8s" in my case).
This should be sed -i"", with no space (verified inside the container).
It would be a two-character PR, but the default causing empty graphs is also a surprise, so the defaults should probably be adjusted down to ensure the irate() call has data.
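An alternative that sidesteps the difference between GNU and BSD sed is to skip -i altogether and write to a temporary file. A minimal sketch, reusing the SCRAPE_INTERVAL and PROMETHEUS_CONF names from the report; the sample config contents and the 30s value are made up for the demo:

```shell
#!/bin/sh
# Sketch: rewrite the intervals without relying on `sed -i`.
# SCRAPE_INTERVAL and PROMETHEUS_CONF mirror the names in the report;
# the sample file contents below are invented for the demo.
SCRAPE_INTERVAL=30s
PROMETHEUS_CONF=$(mktemp)
cat > "$PROMETHEUS_CONF" <<'EOF'
global:
  scrape_interval: 1m
  evaluation_interval: 1m
EOF

# Write the edited copy to a temp file, then move it over the original.
tmp=$(mktemp)
sed -e "s@scrape_interval:.*@scrape_interval: $SCRAPE_INTERVAL@" \
    -e "s@evaluation_interval:.*@evaluation_interval: $SCRAPE_INTERVAL@" \
    "$PROMETHEUS_CONF" > "$tmp" && mv "$tmp" "$PROMETHEUS_CONF"

cat "$PROMETHEUS_CONF"
```

The trade-off is one extra temporary file in exchange for identical behavior under both sed variants.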
linkerd-viz does not properly aggregate metrics from DC/OS applications deployed as part of groups.
For example, an app named my-group/webapp yields metrics like this:
rt:outgoing:dst:id:_:io_l5d_marathon:my_group:webapp:requests 11
The last two metrics_relabel steps defined at:
https://github.com/BuoyantIO/linkerd-viz/blob/master/dcos/prometheus-dcos.yml#L32
...cause the metric to be rewritten as:
linkerd:incoming:webapp:requests{instance="10.0.2.164",job="linkerd",service="my_group"}
...when the expected metric is:
linkerd:incoming:requests{instance="10.0.2.164",job="linkerd",service="my_group/webapp"}
In prometheus-mesos-marathon.yml, line 11:
marathon_sd_configs:
- servers:
  - 'http://localhost:8080'
should be changed to:
marathon_sd_configs:
- servers:
  - 'http://marathon.mesos:8080'
to match the other linkerd mesos-marathon examples.
You can also modify the linkerd-viz.json under mesos-marathon to use the add-host parameter if that DNS entry does not resolve:
{
  "id": "linkerd-viz",
  "instances": 1,
  "cpus": 1.0,
  "mem": 512.0,
  "acceptedResourceRoles": ["*", "slave_public"],
  "maintainer": "[email protected]",
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "buoyantio/linkerd-viz:latest",
      "parameters": [
        {
          "key": "add-host",
          "value": "marathon.mesos:192.168.250.11"
        }
      ],
      "forcePullImage": true,
      "network": "HOST",
      "privileged": true
    }
  },
  "args": ["mesos-marathon"],
  ...
Additional stats around connection counts and client pools can be helpful in diagnosing performance issues. Consider adding these to the dashboard.
relevant connection stats:
rt:client:connections
rt:client:connects
rt:server:connections
rt:server:connects
relevant client pool stats:
rt:client:pool_cached
rt:client:pool_num_too_many_waiters
rt:client:pool_num_waited
rt:client:pool_size
rt:client:pool_waiters
Modify the linkerd-viz executable to exec prometheus, ensuring it runs as PID 1.
relevant:
https://www.ctl.io/developers/blog/post/gracefully-stopping-docker-containers/
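The reason exec helps is that it replaces the wrapper shell with the target process instead of forking a child, so the process ID is preserved. A small demonstration, using a nested sh in place of prometheus (the variable names are only for this demo):

```shell
#!/bin/sh
# Demo: `exec` replaces the current shell, so the PID stays the same.
# The outer shell prints its PID, then execs an inner shell that prints its own.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

first=$(printf '%s\n' "$pids" | sed -n 1p)    # PID of the wrapper shell
second=$(printf '%s\n' "$pids" | sed -n 2p)   # PID after exec

if [ "$first" = "$second" ]; then
  echo "same PID before and after exec"
else
  echo "PID changed"
fi
```

In a container entrypoint script the same mechanism means the exec'd process keeps PID 1 and receives signals such as SIGTERM directly.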
The current dashboard is very top-level request volume / success rate focused. Consider displaying linkerd health metrics (gc, etc), either on the existing dashboard, or as a separate "health" dashboard.
The README for the Consul deploy says to start Consul with Docker in host networking mode, then start the linkerd-viz Docker container without host networking mode. This will not work, because the linkerd-viz configuration for Consul is set to use localhost:8500, and localhost only reaches Consul if the linkerd-viz container also runs in host networking mode.
I have tried to deploy in DC/OS using the Universe package. The service never deploys or runs. I have looked on each node for the Docker image buoyantio/linkerd-viz, and no image has been pulled. I have also tried creating a service directly from the JSON, and the result is the same. I'm not sure where else to look to resolve this. Not urgent, but I would certainly like to see this working in 1.9.
As reported in Slack, prom/prometheus does not install properly if you are running minikube with podman.
Error: ImageInspectError
Warning InspectFailed 3m58s (x259 over 128m) kubelet Failed to inspect image "prom/prometheus:v2.47.0": rpc error: code = Unknown desc = short-name "prom/prometheus:v2.47.0" did not resolve to an alias and no unqualified-search registries are defined in "/etc/containers/registries.conf"
The reason is that when running minikube with podman, no unqualified-search registries are defined. If you instead run minikube with docker, this works without any issues.
I have to say I'm not entirely sure. I think I would want podman to provide these unqualified search registries out of the box. But on the other hand, maybe it would be good if linkerd was friendly enough to specify the prometheus dependency prefixed with the intended search registry?
Workaround:
minikube ssh
sudo vi /etc/containers/registries.conf
Add the line: unqualified-search-registries = ["docker.io", "quay.io"]
minikube stop && minikube start
In order to support an Automated Build on Docker Hub, the Dockerfile must be self-contained, and not rely on dockerize as a setup step.
I think this broke as part of 71138df. The kubernetes_sd_configs scrape config has a role called "endpoints", but linkerd-viz is using the role "endpoint". The role name must have changed when we upgraded Prometheus.
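A sketch of the corresponding fix, applied to a made-up config fragment rather than the real linkerd-viz file (the sample contents and variable names are assumptions for the demo):

```shell
#!/bin/sh
# Sketch: correct the service-discovery role in a sample config.
# The sample fragment below is invented for the demo.
conf=$(mktemp)
cat > "$conf" <<'EOF'
kubernetes_sd_configs:
- role: endpoint
EOF

# Anchor on end-of-line so an already-correct "endpoints" is left untouched.
fixed=$(sed 's/role: endpoint$/role: endpoints/' "$conf")
printf '%s\n' "$fixed"
```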
scrape_interval and evaluation_interval are hard-coded at 5s, which is quite frequent. Consider increasing this interval, or making it configurable for the user.
Grafana 5 added support for data source provisioning via config file:
http://docs.grafana.org/guides/whats-new-in-v5/#data-sources
This should significantly decrease linkerd-viz's startup complexity, where we can forego hitting Grafana's API to add a data source.
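A sketch of what such a provisioning file could look like, written from a shell script as linkerd-viz's startup does for its other config files. The data source name, the Prometheus URL, and the temporary output directory are assumptions; in a real Grafana install the file would go under the provisioning/datasources directory:

```shell
#!/bin/sh
# Sketch: write a Grafana data source provisioning file.
# The directory, data source name, and Prometheus URL are assumptions.
dir=$(mktemp -d)
cat > "$dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
- name: prometheus
  type: prometheus
  access: proxy
  url: http://localhost:9090
  isDefault: true
EOF

cat "$dir/prometheus.yaml"
```

With a file like this in place, Grafana loads the data source at startup, so no API call is needed after boot.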
Modify the linkerd-viz DC/OS Universe package to support a DC/OS service URL:
http://<DCOS_URL>/service/linkerd-viz/
Attempted to set the server/root_url in the grafana.ini to the DCOS_URL, but got:
{"message":"Invalid Basic Auth Header"}
Where's the chart for linkerd2? And don't point me to the docs that tell me to install a CLI on my local system and install it manually there - that's ridiculous. This should be doable via IaC.
Grafana.com provides a central repository of popular dashboards. It would be great to see linkerd-health-dashboard.json and linkerd-viz-dashboard.json available there.
I'm working through some linkerd examples from your blog.
If I use the linkerd-ingress-controller (without TLS), linkerd-viz works.
But if I switch to linkerd-tls-ingress-controller, Prometheus doesn't get any data.
Is there a way to adjust the config so that it also works with TLS-enabled communication?
Right now the linkerd-viz dashboard requires that the router from which it pulls stats be labeled as "incoming", which might not be the case in all setups. We should think about ways to support other labels, possibly using Grafana templating.
We use Linkerd in a cluster that blocks nearly all ingress/egress traffic that is not whitelisted with NetworkPolicies or GlobalNetworkPolicies (via Calico's CRD).
After successfully upgrading Linkerd from 2.9.4 to 2.10.1, we can't figure out what the viz plugin needs, and the fact that it's installed in its own namespace makes all our previous configuration useless...
Can anyone help with some guidelines on how to proceed? What ports are used to where? If a cluster-wide configuration is needed, what would it look like?
The dashboard today calculates rates per second. For lower request volumes, it would be useful to support per-minute and per-hour rates, likely via a template variable.