
linkerd-viz's Issues

grafana was not installed

In terminal:
linkerd viz install | kubectl apply -f -
then:
kubectl get pods -n linkerd-viz
NAME READY STATUS RESTARTS AGE
metrics-api-57c76d5c5c-jztcg 2/2 Running 0 56m
prometheus-5bcd95c8fc-vsrhb 2/2 Running 0 56m
tap-6694cbcb97-lsntm 2/2 Running 0 17m
tap-injector-779b797dbf-lrtj6 2/2 Running 0 17m
web-7bff7b8d89-7d9b8 2/2 Running 0 56m

Improvement for documentation regarding grafana

Hi,

the parameter grafana.url is not well documented.

I installed linkerd-viz and grafana from the helm chart, as described, in the same namespace.
For grafana.url I tried http://grafana and http://grafana/, but the grafana links only return a 502.

What happened?
Linkerd-viz does something with the url and produces error log entries like
2023/03/14 10:09:15 http: proxy error: dial tcp: lookup http://grafana: no such host
2023/03/14 10:19:15 http: proxy error: dial tcp: lookup http://grafana/: no such host
depending on the configuration.

So I ended up reading https://linkerd.io/2.12/tasks/grafana/index.html and using the configuration grafana:80 from the example.

Something is wrong in the rewriting or the documentation needs an update.
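
For anyone hitting the same 502: the log lines above suggest the value is handed to a TCP dial as a host:port address rather than parsed as a URL, which would explain why grafana:80 works and http://grafana does not. A minimal Python sketch of normalizing such a value, assuming that reading is right. to_host_port is a hypothetical helper, not part of linkerd-viz:

```python
from urllib.parse import urlsplit

# Default ports used when the value carries a scheme but no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def to_host_port(value: str) -> str:
    """Reduce a URL-like value to the bare host:port form (hypothetical helper)."""
    # Prefix scheme-less values with "//" so urlsplit treats them as a network location.
    parts = urlsplit(value if "://" in value else "//" + value)
    if parts.hostname is None:
        raise ValueError(f"no host found in {value!r}")
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 80)
    return f"{parts.hostname}:{port}"
```

Under these assumptions, http://grafana, http://grafana/ and grafana:80 all reduce to grafana:80.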

Best...
Uwe

Limit linkerd instances search to a particular Kubernetes namespace?

Hi,

I'm trying to deploy linkerd-viz in a Kubernetes cluster where I have rights on a single namespace, and I'm getting the following errors:

Failed to list *v1.Service: User \"system:serviceaccount:my-namespace:viewer\" cannot list all services in the cluster"
Failed to list *v1.Pod: User \"system:serviceaccount:my-namespace:viewer\" cannot list all pods in the cluster" 
Failed to list *v1.Endpoints: User \"system:serviceaccount:my-namespace:viewer\" cannot list all endpoints in the cluster" 

It seems that Prometheus is trying to list services / pods in the whole cluster. Is there a way to have it restrict itself to the namespace my-namespace only? I was thinking that using a __meta_kubernetes_namespace meta label could do the trick, but I'm unsure whether that will change the API call that Prometheus does, or just filter the services afterwards.
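
As far as I know, relabeling on __meta_kubernetes_namespace only filters targets after discovery has run, so the cluster-wide list calls would still be made. Prometheus's kubernetes_sd_configs accepts a namespaces.names list, which should scope discovery itself. A minimal Python sketch that builds such a scrape config and prints it as JSON (which is also valid YAML). The job names and the namespaced_scrape_config helper are assumptions for illustration, not the config that linkerd-viz ships:

```python
import json


def namespaced_scrape_config(job_name: str, role: str, namespace: str) -> dict:
    """Build one scrape config whose Kubernetes discovery is limited to a namespace."""
    return {
        "job_name": job_name,
        "kubernetes_sd_configs": [
            # "namespaces.names" restricts discovery to the listed namespaces.
            {"role": role, "namespaces": {"names": [namespace]}}
        ],
    }


if __name__ == "__main__":
    # Hypothetical job names; one job per discovery role mentioned in the errors.
    configs = [
        namespaced_scrape_config(f"my-namespace-{role}", role, "my-namespace")
        for role in ("pod", "service", "endpoints")
    ]
    print(json.dumps({"scrape_configs": configs}, indent=2))
```

The service account would presumably still need list and watch rights on those resources inside my-namespace.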

Note that linkerd-viz runs under a viewer service account that can list services / pods inside my namespace.

Thanks!

How to integrate auth?

@siggy The viz looks great. I am deploying a cluster for my company on GKE, and I really need to secure the public-facing Grafana so that it authenticates against the same Google Account users.

While I can normally do this from config files and the UI, I'm struggling with this setup. It logs in anonymously, but the Grafana admin functions are not available. I assume this is a feature of anonymous logins, which I haven't used before.

Do you have a reference config to get it running with google cloud IAM you could share?

Configurable scrape_interval is broken, breaks 1m Grafana graphs

The changes in #33 made for a surprise, when the success rate and request volume graphs became empty.

The default of 1m matches the query in Grafana, so the graphs become empty when they don't have two data points. I'd recommend a different default, like 30s.
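
To make the arithmetic concrete: irate() needs at least two samples inside the range window. A small Python sketch that counts how many scrapes land in a window, assuming evenly spaced scrapes and a window that excludes its oldest boundary (both are simplifications). samples_in_window is a hypothetical helper:

```python
def samples_in_window(window_s: int, interval_s: int, newest_age_s: int = 0) -> int:
    """Count evenly spaced samples whose age is below window_s.

    newest_age_s is how long ago the most recent scrape happened.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Sample ages are newest_age_s, newest_age_s + interval_s, ... up to the window edge.
    return len(range(newest_age_s, window_s, interval_s))
```

With a 60s window, a 60s interval yields one sample and a 30s interval yields two.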

However, this can't be overridden, because the replacement with $SCRAPE_INTERVAL fails.

The root of the issue is here:

sed -i "" "s@scrape_interval:.*@scrape_interval: $SCRAPE_INTERVAL@" $PROMETHEUS_CONF
sed -i "" "s@ evaluation_interval:.*@ evaluation_interval: $SCRAPE_INTERVAL@" $PROMETHEUS_CONF

With sed -i "" , GNU sed does not take the "" as the suffix for -i; it is read as a separate argument, and the sed command fails. So the replacements never happen, and the command is only ever executed with the default "1m" in prometheus-$PLATFORM.yml (whichever file that turns out to be, "k8s" in my case).

This should be sed -i"", with no space. (verified inside the container)

It would be a two-character PR, but the default causing empty graphs is also a surprise, so the defaults should probably be adjusted down to ensure the irate() call has data.
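
For comparison, the substitution the two sed expressions are meant to perform can be sketched in Python. This only illustrates the intended result and is not a proposed replacement for the script. set_intervals and the sample config text are assumptions:

```python
import re

# Matches either key and everything after it on the same line, like the sed patterns.
_INTERVAL_LINE = re.compile(r"\b(scrape_interval|evaluation_interval):.*")


def set_intervals(conf_text: str, interval: str) -> str:
    """Rewrite scrape_interval and evaluation_interval values in a Prometheus config."""
    return _INTERVAL_LINE.sub(lambda m: f"{m.group(1)}: {interval}", conf_text)


if __name__ == "__main__":
    sample = "global:\n  scrape_interval: 1m\n  evaluation_interval: 1m\n"
    print(set_intervals(sample, "30s"))
```

Like the sed version, this also rewrites any per-job scrape_interval lines.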

DC/OS application group support

linkerd-viz does not properly aggregate metrics from DC/OS applications deployed as part of groups.

For example, an app named my-group/webapp yields metrics like this:

rt:outgoing:dst:id:_:io_l5d_marathon:my_group:webapp:requests 11

The last two metrics_relabel steps defined at:
https://github.com/BuoyantIO/linkerd-viz/blob/master/dcos/prometheus-dcos.yml#L32

...cause the metric to be rewritten as:

linkerd:incoming:webapp:requests{instance="10.0.2.164",job="linkerd",service="my_group"}

...when the expected metric should be:

linkerd:incoming:requests{instance="10.0.2.164",job="linkerd",service="my_group/webapp"}
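
A rough Python sketch of the grouping logic that would produce the expected service label: everything between the io_l5d_marathon segment and the final segment is treated as the app path and joined with "/". It assumes the stat name is always the last colon-separated segment, which will not hold for multi-segment stats. split_marathon_metric is a hypothetical helper, not the relabel config itself:

```python
NAMER = "io_l5d_marathon"


def split_marathon_metric(name: str) -> tuple[str, str]:
    """Return (service, stat) for a linkerd metric name routed via the marathon namer."""
    parts = name.split(":")
    if NAMER not in parts:
        raise ValueError(f"{NAMER} segment not found in {name!r}")
    # Segments after the namer, minus the trailing stat name, form the app path.
    app_path = parts[parts.index(NAMER) + 1 : -1]
    if not app_path:
        raise ValueError(f"no app path found in {name!r}")
    return "/".join(app_path), parts[-1]
```

For the metric above this gives the service my_group/webapp and the stat requests; an ungrouped app keeps its plain name.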

linkerd-viz assumes marathon master runs on localhost

In prometheus-mesos-marathon.yml line 11:

marathon_sd_configs:
  - servers:
    - 'http://localhost:8080'

should be changed to:

marathon_sd_configs:
  - servers:
    - 'http://marathon.mesos:8080'

This matches the other linkerd mesos-marathon examples.

You can also modify the linkerd-viz.json under mesos-marathon to use the add-host parameter if that DNS entry does not resolve:

{
  "id": "linkerd-viz",
  "instances": 1,
  "cpus": 1.0,
  "mem": 512.0,
  "acceptedResourceRoles": ["*", "slave_public"],
  "maintainer": "[email protected]",
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "buoyantio/linkerd-viz:latest",
      "parameters": [
        {
          "key": "add-host",
          "value": "marathon.mesos:192.168.250.11"
        }
      ],
      "forcePullImage": true,
      "network": "HOST",
      "privileged": true
    }
  },
  "args": ["mesos-marathon"],
  ...

display connection and client pool metrics

Additional stats around connection counts and client pools can be helpful in diagnosing performance issues. Consider adding these to the dashboard.

relevant connection stats:

  • rt:client:connections
  • rt:client:connects
  • rt:server:connections
  • rt:server:connects

relevant client pool stats:

  • rt:client:pool_cached
  • rt:client:pool_num_too_many_waiters
  • rt:client:pool_num_waited
  • rt:client:pool_size
  • rt:client:pool_waiters

linkerd health metrics

The current dashboard focuses on top-level request volume and success rate. Consider displaying linkerd health metrics (gc, etc.), either on the existing dashboard or as a separate "health" dashboard.

Consul startup doc error

The README for the Consul deploy says to start Consul with docker in host networking mode, then start the linkerd-viz docker container without host networking mode. This will not work, because the linkerd-viz configuration for Consul is set to use localhost:8500. That localhost address is only reachable if the linkerd-viz container is also run in host networking mode.

DCOS 1.9 linkerd-viz never deploys

I have tried to deploy in DCOS using the universe package, but the service never deploys or runs. I have looked on each node for the docker image buoyantio/linkerd-viz, and no image has been pulled. I have also tried creating a service from just the JSON, with the same result. I'm not sure where else to look to resolve this. It's not urgent, but I would certainly like to see this working in 1.9.

Prometheus does not install under minikube/podman

Problem description

As reported in Slack, prom/prometheus does not install properly if you are running minikube with podman.

Error: ImageInspectError
  Warning  InspectFailed  3m58s (x259 over 128m)  kubelet            Failed to inspect image "prom/prometheus:v2.47.0": rpc error: code = Unknown desc = short-name "prom/prometheus:v2.47.0" did not resolve to an alias and no unqualified-search registries are defined in "/etc/containers/registries.conf"

The reason is that, when minikube runs with podman, no unqualified-search registries are defined. If you run minikube with docker instead, this works without any issues.

Expected behavior

I have to say I'm not entirely sure. I think I would want podman to provide these unqualified search registries out of the box. But on the other hand, maybe it would be good if linkerd was friendly enough to specify the prometheus dependency prefixed with the intended search registry?
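
On the second option, a sketch of how a short image name could be expanded to a fully qualified one. It follows Docker's convention that the first path component is a registry only if it contains a dot or a colon, or is localhost. qualify_image is a hypothetical helper, and defaulting to docker.io is an assumption about the intended registry:

```python
def qualify_image(ref: str, registry: str = "docker.io") -> str:
    """Prefix an image reference with a registry unless it already names one."""
    first, sep, _rest = ref.partition("/")
    # A leading component with "." or ":" (or "localhost") is already a registry host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return ref
    # Single-component names live under "library/" on Docker Hub.
    if not sep and registry == "docker.io":
        return f"{registry}/library/{ref}"
    return f"{registry}/{ref}"
```

For example, prom/prometheus:v2.47.0 becomes docker.io/prom/prometheus:v2.47.0, while a reference that already starts with quay.io is returned unchanged.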

Workaround

  1. Log into minikube with minikube ssh.
  2. Run sudo vi /etc/containers/registries.conf.
  3. Add unqualified-search-registries = ["docker.io", "quay.io"].
  4. Restart minikube with minikube stop && minikube start.

DC/OS service URL

Modify the linkerd-viz DC/OS Universe package to support a DC/OS service URL:
http://<DCOS_URL>/service/linkerd-viz/

Attempted to set the server/root_url in the grafana.ini to the DCOS_URL, but got:

{"message":"Invalid Basic Auth Header"}

Linkerd2-viz chart?

Where's the chart for linkerd2? And don't point me to the docs that tell me to install a CLI on my local system and install it manually there - that's ridiculous. This should be doable via IaC.

Support more flexible router labels

Right now the linkerd-viz dashboard requires that the router from which it pulls stats be labeled as "incoming", which might not be the case in all setups. We should think about ways to support other labels, possibly using grafana templating.

Missing Network Policy Guidelines

We use Linkerd in a cluster that pretty much blocks every INGRESS/EGRESS not white listed with NetworkPolicies or GlobalNetworkPolicies (via Calico's CRD).

After successfully upgrading Linkerd from 2.9.4 to 2.10.1, we can't figure out what the viz plugin needs, and the fact that it's installed in its own namespace makes all our previous configuration useless...

Can anyone help with some guidelines on how to proceed? What ports are used to where? If a cluster-wide configuration is needed, what would it look like?

support for minutely/hourly graphs

The dashboard today calculates rates per second. For lower request volumes, it would be useful to support per-minute and per-hour rates, likely via a template variable.
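
The conversion itself is a plain multiplication of the per-second rate. A tiny Python sketch, where scale_rate and the unit names are assumptions that stand in for whatever the template variable would select:

```python
# Seconds covered by each display unit.
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def scale_rate(per_second: float, unit: str) -> float:
    """Convert a per-second rate to a rate per the given unit."""
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unknown unit {unit!r}")
    return per_second * SECONDS_PER_UNIT[unit]
```

A service at 0.5 requests per second would show as 30 per minute or 1800 per hour.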
