
k8sgpt-operator's People

Contributors

aaroniscode, aisuko, alexsjones, amitamrutiya2210, anaisurlichs, arbreezy, bradmccoydev, dependabot[bot], fossabot, fyuan1316, github-actions[bot], jkleinlercher, juhyung-son, kevincichon, matesousa, matthisholleville, nlamirault, phillipahereza, prometherion, renovate[bot], samrocketman, singhiqbal1007, thschue, tozastation, tylergillson, ultram4rine, vaibhavmalik4187, wujunwei


k8sgpt-operator's Issues

[Question]: failed while calling AI provider localai: error, status code: 401

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

No response

Kubernetes Version

No response

Host OS and its Version

Kubernetes Cluster

Steps to reproduce

I followed the LocalAI steps, but the k8sgpt operator gives the following error:

(screenshots omitted)

Expected behaviour

I expect k8sgpt to talk to the LocalAI backend without needing an OpenAI token.

Actual behaviour

(screenshot omitted)

Additional Information

No response

feature: Create ArtifactHub Entry for k8sgpt-operator

Checklist:

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature in the #k8sgpt slack channel

Is this feature request related to a problem?

  • Yes
  • No

As a user of the k8sgpt-operator, I would like to see this chart on ArtifactHub so that I can subscribe to updates (with an email notification when a new version is available), view the Helm values easily, and see the security posture of the chart.

We can also create a badge on the repository that links to the ArtifactHub page.

Here is a blog post I wrote in the past explaining the process: https://blog.bradmccoy.io/linking-helm-charts-in-github-pages-to-artifacthub-46e02e19abfe

feature: Help wanted implementing release-please

In order to bring the project to a timely build-and-publish cadence, release-please would be ideal to implement.

This should enable us to trigger a rebuild on a tag that releases a new version and chart.

Fix error in the example for k8sgpt-local-ai

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

Yes

Problem Description

In the example for the K8sGPT resource, the spec is wrong. The namespace field doesn't exist under spec; it should be in metadata.

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
spec:
  namespace: default
  model: gpt-3.5-turbo
  backend: localai
  noCache: false
  version: v0.2.7
  enableAI: true

This should be fixed as per: chart/operator/templates/k8sgpt-crd.yaml

Solution Description

Move namespace under metadata instead of spec.
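With the proposed fix applied, the corrected example reads:

```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: default
spec:
  model: gpt-3.5-turbo
  backend: localai
  noCache: false
  version: v0.2.7
  enableAI: true
```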

Benefits

New users won't hit errors when applying the example.

Potential Drawbacks

No response

Additional Information

No response

question: strange logs + missing usage information

Checklist:

  • [x] I've searched for similar issues and couldn't find anything matching
  • [x] I've included steps to reproduce the bug.
  • I've included the version of Kubernetes and k8sgpt.

Subject of the issue

error storing value to cache: open /home/nonroot/.cache/k8sgpt/openai

Your environment

K8s 1.24 (vanilla)
Ubuntu 20.04

Steps to reproduce

Installed the k8sgpt operator following the README and added a valid OpenAI key.

Expected behaviour

not sure to be honest

Actual behaviour

Since I don't know how to use this (how to collect outputs or direct them to any particular Prometheus server, etc.), I checked the logs on the pod and found many messages like this:
error storing value to cache: open /home/nonroot/.cache/k8sgpt/openai-<snip>gc3RhdHVzLg==: file name too long
I also see what look like successful HTTP 200 messages from, presumably, the OpenAI API:
{"level":"info","ts":1682506122.1083932,"msg":"request completed","duration_ms":63152,"method":"GET","remote_addr":"100.72.70.84:45874","status_code":200,"url":"/analyze"}

I don't know whether the installation is working correctly, nor can I figure out how to see the outputs of the analyses.
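For context on the cache message: "file name too long" is an operating-system limit, not a k8sgpt-specific failure. Most Linux filesystems cap a single file name component at 255 bytes (NAME_MAX), which a long base64-encoded cache key can exceed. The limit on a given node can be checked with:

```shell
# NAME_MAX is the maximum length of one file name component on the
# filesystem backing the given path; 255 is typical on Linux.
getconf NAME_MAX /
```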

[Feature]: create Grafana-Dashboard CR optionally

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

This issue is very similar to kyverno/kyverno#7992. When using the grafana-operator, Grafana dashboards should be defined in dedicated GrafanaDashboard CRs, not just a ConfigMap.
So it would be great if the Helm chart also created a GrafanaDashboard CR that references the ConfigMap (opt-in).

Solution Description

very similar to kyverno/kyverno#7992

Benefits

Users who have the grafana-operator deployed automatically get the K8sGPT Grafana dashboard.

Potential Drawbacks

No response

Additional Information

No response

[Question]: There is an error in the system, but the result is empty

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.021

Kubernetes Version

v1.26.8

Host OS and its Version

rockylinux 8.8

Steps to reproduce

Errors in the system

nginx-deployment-65b5dd9c95-8p4vr   0/1     ImagePullBackOff   0          3m18s

Configuration file for K8sGPT

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: monitoring
spec:
  ai:
    enabled: true
    model: ggml-gpt4all-j
    backend: localai
    language: chinese
    baseUrl: http://10.0.0.100:8089/v1
  noCache: false
  version: v0.3.8

Command line access to localai

# curl http://10.0.0.100:8089/v1/models
{"object":"list","data":[{"id":"ggml-gpt4all-j","object":"model"}]}

K8sgpt deployment log

Service nfs-subdir-external-provisioner/cluster.local-nfs-subdir-external-provisioner does not exist
{"level":"info","ts":1694382338.530298,"caller":"server/log.go:50","msg":"request failed. failed while calling AI provider localai: error, status code: 500, message: rpc error: code = Unknown desc = unimplemented","duration_ms":140,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"localai\" explain:true anonymize:true language:\"chinese\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.85.203:54390","status_code":2}

k8sgpt-operator log

Creating new client for 10.96.216.36:8080
Connection established between 10.96.216.36:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.216.36:8080 
K8sGPT address: 10.96.216.36:8080
Finished Reconciling k8sGPT with error: failed to call Analyze RPC: rpc error: code = Unknown desc = failed while calling AI provider localai: error, status code: 500, message: rpc error: code = Unknown desc = unimplemented
2023-09-10T21:45:38Z	ERROR	Reconciler error	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-local-ai","namespace":"monitoring"}, "namespace": "monitoring", "name": "k8sgpt-local-ai", "reconcileID": "a8ee0a63-d11b-4cc3-b75e-d313a99a9f7c", "error": "failed to call Analyze RPC: rpc error: code = Unknown desc = failed while calling AI provider localai: error, status code: 500, message: rpc error: code = Unknown desc = unimplemented"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226

Expected behaviour

I don't understand why it can't connect to LocalAI.

Actual behaviour

No response

Additional Information

No response

[Question]: I'm getting an empty results probably bc this error: failed to call Analyze RPC: rpc error: code = Unimplemented desc = unknown service schema.v1.Server

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.3.8

Kubernetes Version

v1.26.5

Host OS and its Version

ubuntu 20.04

Steps to reproduce

kubectl get results -n k8sgpt-operator-system -o json | jq

Expected behaviour

I'm trying to use k8sgpt to help me with an on-premise Kubernetes cluster, but an error is being logged and the results are empty.

Actual behaviour

{
  "apiVersion": "v1",
  "items": [],
  "kind": "List",
  "metadata": {
    "resourceVersion": ""
  }
}

Additional Information

I'm having this error many times in the operator log
2023-06-22T06:35:41Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"k8sgpt-operator-system"}, "namespace": "k8sgpt-operator-system", "name": "k8sgpt-sample", "reconcileID": "3751af6e-2ea7-498a-a7da-cf417946a810", "error": "failed to call Analyze RPC: rpc error: code = Unimplemented desc = unknown service schema.v1.Server"}

It seems like issue 98, but this is an on-premise Kubernetes cluster hosted on Contabo.
I'm just starting with k8sgpt.
Here is the YAML:
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  enableAI: true
  model: gpt-3.5-turbo
  backend: openai
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key
  noCache: false
  version: v0.3.8

I'm using a personal OpenAI key.

[Feature]: Add integrations support in the operator

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

Feature parity with the current K8sGPT implementation:

More specifically, the operator doesn't currently support the integrations feature (Trivy).

Recently we introduced a no-install option for integrations in the k8sgpt CLI, which skips installing Trivy.

Solution Description

The K8sGPT CR will get a new integrations section to support this feature. It will not install integrations but will assume they are already installed,
e.g. the Trivy operator is already deployed to the K8s cluster where the K8sGPT operator is running.

example CR:

...  
integrations:
  trivy:
    enabled: true
...

Benefits

Feature parity with K8sGPT and ability to extend integrations support in the future

Potential Drawbacks

No response

Additional Information

No response

[Feature]: Slack output

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

No response

Solution Description

In order to provide even more value to users, we should give them the ability to send alerts to Slack.

e.g.

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  model: gpt-3.5-turbo
  backend: openai
  noCache: false
  version: v0.3.0
  enableAI: true
  sinks:
   slack:
     webhook: <WEBHOOK>
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key

The idea is that this would let the user drop in a Slack webhook for a pre-configured instance/channel.

The main area for refactoring code would be here

I would suggest adding a sinks.Emit() style method and a new package pkg/sinks/slack, where the sinks manager would have Slack as a registered sink type. We would also need a Configure method that is set up when the webhook is first detected on the CR.
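The suggested shape could be sketched like this. It is a hypothetical outline only: the interface names follow the issue's suggestion, the Emit signature here returns the JSON payload for illustration, and the actual HTTP POST to the webhook is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sink mirrors the Configure/Emit shape proposed in the issue;
// the real pkg/sinks package may differ.
type Sink interface {
	Configure(webhook string)
	Emit(result string) (payload string, err error)
}

// SlackSink is a hypothetical sink that would POST a Slack-style JSON
// payload to the configured webhook (the HTTP call is left out here).
type SlackSink struct {
	webhook string
}

func (s *SlackSink) Configure(webhook string) { s.webhook = webhook }

func (s *SlackSink) Emit(result string) (string, error) {
	body, err := json.Marshal(map[string]string{"text": result})
	return string(body), err
}

func main() {
	var sink Sink = &SlackSink{}
	sink.Configure("https://hooks.slack.com/services/EXAMPLE")
	payload, _ := sink.Emit("Pod default/nginx is in ImagePullBackOff")
	fmt.Println(payload) // prints {"text":"Pod default/nginx is in ImagePullBackOff"}
}
```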

Benefits

Slack messages

Potential Drawbacks

No response

Additional Information

No response

[Question]: When a result is updated it is incorrectly marked as incremented in the metrics count

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.3.0

Kubernetes Version

v1.27.0

Host OS and its Version

Ubuntu

Steps to reproduce

// Update metrics
k8sgptNumberOfResultsByType.With(prometheus.Labels{
	"kind": resultSpec.Kind,
	"name": resultSpec.Name,
}).Inc()

err = r.Create(ctx, &result)
if err != nil {
	// if the result already exists, we will update it
	if errors.IsAlreadyExists(err) {
		// Get the actual result with metadata rather than our local construct
		var newResult corev1alpha1.Result
		err = r.Get(ctx, client.ObjectKey{Namespace: k8sgptConfig.Namespace,
			Name: name}, &newResult)
		if err != nil {
			k8sgptReconcileErrorCount.Inc()
			return r.finishReconcile(err, false)
		}

Expected behaviour

The metric increment should be moved so it only happens in the create phase, not when an existing result is updated.
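The suggested fix can be illustrated with a small, self-contained sketch (toy types stand in for the Kubernetes client and the Prometheus counter): increment only after Create succeeds, so the already-exists/update path never bumps the count.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for apierrors.IsAlreadyExists in the operator.
var errAlreadyExists = errors.New("already exists")

// store is a toy stand-in for the Kubernetes API: Create fails once a
// result with the same name exists.
type store map[string]bool

func (s store) Create(name string) error {
	if s[name] {
		return errAlreadyExists
	}
	s[name] = true
	return nil
}

// reconcileResult increments the counter only when a brand-new Result is
// created, mirroring the fix: move the Inc() call into the create path.
func reconcileResult(s store, name string, counter *int) {
	if err := s.Create(name); err != nil {
		if errors.Is(err, errAlreadyExists) {
			// update path: do NOT increment the results-by-type metric
			return
		}
		return
	}
	*counter++ // create succeeded: count a genuinely new result
}

func main() {
	s := store{}
	count := 0
	reconcileResult(s, "nginx-result", &count) // first reconcile: new result
	reconcileResult(s, "nginx-result", &count) // second reconcile: update only
	fmt.Println(count)                         // prints 1
}
```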

Actual behaviour

No response

Additional Information

No response

[Bug]: container "manager" in pod "release-k8sgpt-operator-controller-manager-xxx-xxx" is waiting to start: trying and failing to pull image

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

No response

Kubernetes Version

No response

Host OS and its Version

No response

Steps to reproduce

k logs -f release-k8sgpt-operator-controller-manager-xxx-xxx -n k8sgpt-operator-system
Error from server (BadRequest): container "manager" in pod "release-k8sgpt-operator-controller-manager-675c7f8bf-4rkpk" is waiting to start: trying and failing to pull image

Expected behaviour

The container should come up and run, and the image should be pulled successfully.

Actual behaviour

No response

Additional Information

No response

[Question]: Incorrect error/warning when there is no K8sGPT CR

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.3.0

Kubernetes Version

v1.27

Host OS and its Version

No response

Steps to reproduce

E0512 10:15:33.572907       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1alpha1.K8sGPT: the server could not find the requested resource (get k8sgpts.core.k8sgpt.ai)
W0512 10:15:34.628626       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1alpha1.K8sGPT: the server could not find the requested resource (get k8sgpts.core.k8sgpt.ai)
E0512 10:15:34.628853       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1alpha1.K8sGPT: failed to list *v1alpha1.K8sGPT: the server could not find the requested resource (get k8sgpts.core.k8sgpt.ai)
W0512 10:15:36.381299       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1alpha1.K8sGPT: the server could not find the requested resource (get k8sgpts.core.k8sgpt.ai)
E0512 10:15:36.382622       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1alpha1.K8sGPT: failed to list *v1alpha1.K8sGPT: the server could not find the requested resource (get k8sgpts.core.k8sgpt.ai)
Finished Reconciling K8sGPT

This occurs when no K8sGPT resource has been created. The operator shouldn't report this as an error; it should just return.

Expected behaviour

No warnings or errors for this

Actual behaviour

No response

Additional Information

No response

[Question]: May I modify the address of the image "ghcr.io/k8sgpt-ai/k8sgpt:v0.3.8"?

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.0.21

Kubernetes Version

v1.26.8

Host OS and its Version

rockylinux 8.8

Steps to reproduce

k describe pod k8sgpt-deployment-76ffb88d89-pjjfs

  Normal  Scheduled  2m13s  default-scheduler  Successfully assigned monitoring/k8sgpt-deployment-76ffb88d89-pjjfs to k8s-node01
  Normal  Pulling    2m12s  kubelet            Pulling image "ghcr.io/k8sgpt-ai/k8sgpt:v0.3.8"

Expected behaviour

May I modify the address of the image "ghcr.io/k8sgpt-ai/k8sgpt:v0.3.8"?
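If the goal is to pull the k8sgpt image from a different registry, recent versions of the K8sGPT CR appear to expose a repository field for this. Treat the field name as an assumption and check the CRD shipped with your operator version; a sketch:

```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: monitoring
spec:
  # assumed field: overrides the default ghcr.io/k8sgpt-ai/k8sgpt image source
  repository: registry.example.com/mirror/k8sgpt
  version: v0.3.8
```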

[Feature]: Replace current K8sGPT deployment if version is incremented via CR

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

None

Problem Description

No response

Solution Description

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  model: gpt-3.5-turbo
  backend: openai
  noCache: false
  version: v0.3.0
  enableAI: true
  # filters:
  #   - Ingress
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key

In order to enable a better experience when users want to upgrade the version, we should automatically reconcile and redeploy the k8sgpt-deployment.

Benefits

This makes a better upgrade experience.

Potential Drawbacks

No response

Additional Information

No response

[Feature]: Decrease the repo size with a bot

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

No response

Solution Description

We can add some useful bots to help us manage the repo size. Here is an example of reducing the repo size; please take a look: Aisuko#1

Benefits

Automatically helps us decrease the repo size.

Potential Drawbacks

Of course there may be drawbacks.

Additional Information

I believe the free plan is enough to cover a public project.

[Bug]: 401 unauthorized when setting LocalAI as backend with k8sgpt-operator

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.0.16

Kubernetes Version

No response

Host OS and its Version

Ubuntu

Steps to reproduce

  1. Run LocalAI on a Kubernetes cluster (AKS).
  2. Configure k8sgpt-operator with LocalAI as the backend.
  3. Get a 401 error: "Finished Reconciling K8sGPT with error: failed to call Analyze RPC: rpc error: code = Unknown desc = failed while calling AI provider localai: error, status code: 401, message: You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys"

Expected behaviour

Auth is optional, so it shouldn't fail over it.
The same setup works with the k8sgpt CLI and LocalAI locally.

Actual behaviour

No response

Additional Information

No response

[Feature]: Install k8sgpt-overview and servicemonitor in the namespace where kube-prometheus is located

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

Yes

Problem Description

I must install k8sgpt and Prometheus in the same namespace, otherwise the monitor will not work

Solution Description

It would help if a few Helm parameters could be added for this.
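For example, hypothetical chart values along these lines (the key names are illustrative, not the chart's actual parameters) would decouple the two namespaces:

```yaml
# illustrative values.yaml fragment; key names are hypothetical
serviceMonitor:
  enabled: true
  namespace: monitoring   # where kube-prometheus lives
grafanaDashboard:
  enabled: true
  namespace: monitoring
```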

Benefits

This eliminates the need to install k8sgpt and prometheus in the same namespace

LocalAI integration not working with k8sgpt-operator

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.3.8

Kubernetes Version

1.24

Host OS and its Version

linux

Steps to reproduce

  1. Deploy the local-ai chart with the default configuration as detailed here: https://github.com/go-skynet/helm-charts (latest version v2.1.0)
  2. Deploy the k8sgpt-operator Helm chart (latest version v0.0.21)
  3. Apply k8sgpt-config.yaml with the LocalAI base URL configured on port 80:
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: default
spec:
  ai:
    enabled: true
    model: ggml-gpt4all-j
    backend: localai
    baseUrl: http://local-ai.local-ai.svc.cluster.local/v1
  noCache: false
  version: v0.3.8

Expected behaviour

The integration should work, as it does with Azure OpenAI.

Actual behaviour

No response

Additional Information

No response

[Feature]: Results should indicate the AI provider used

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

No response

Solution Description

❯ kubectl get results -A
NAMESPACE                NAME                        BACKEND       AGE
k8sgpt-operator-system   foocharlie                  azureopenai   4h25m
k8sgpt-operator-system   foocharlie7445dfdcdbnvvmv   openai        10m

Adding the backend would be a really nice little convenience

Benefits

It opens up the potential to use multiple backends together.

Potential Drawbacks

No response

Additional Information

No response

[Feature]: support arbitrary uid for openshift environments

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

Yes

Problem Description

It is the same as k8sgpt-ai/k8sgpt#440, which also needs to be fixed in k8sgpt-operator.

Solution Description

The operator should create a deployment with an emptyDir volume and the XDG_ variable set, as in k8sgpt-ai/k8sgpt@44725d9.

Benefits

Then, on OpenShift, the k8sgpt deployment will start with the restricted-v2 SCC and an arbitrary UID. In addition, the k8sgpt deployment can then run with securityContext.readOnlyRootFilesystem, which is also a security benefit.
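A sketch of what the generated Deployment spec could contain. The exact env var is not spelled out in this issue beyond the XDG_ prefix, so XDG_CACHE_HOME and the paths below are assumptions:

```yaml
# sketch only; XDG_CACHE_HOME and paths are assumed names
spec:
  template:
    spec:
      containers:
        - name: k8sgpt
          env:
            - name: XDG_CACHE_HOME
              value: /k8sgpt-data/.cache
          securityContext:
            readOnlyRootFilesystem: true
          volumeMounts:
            - name: k8sgpt-data
              mountPath: /k8sgpt-data
      volumes:
        - name: k8sgpt-data
          emptyDir: {}
```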

Potential Drawbacks

No response

Additional Information

No response

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

dockerfile
Dockerfile
  • golang 1.22-alpine3.19
github-actions
.github/workflows/build_container.yaml
  • actions/checkout v4@0ad4b8fadaa221de15dcec353f45205ec38ea70b
  • keptn/gh-action-extract-branch-name main
  • actions/checkout v4@0ad4b8fadaa221de15dcec353f45205ec38ea70b
  • docker/setup-buildx-action v3@d70bba72b1f3fd22344832f00baa16ece964efeb
  • docker/build-push-action v5@2cdde995de11925a030ce8070c3d77a52ffcf1c0
  • actions/upload-artifact v4@65462800fd760344b1a7b4382951275a0abb4808
  • actions/checkout v4@0ad4b8fadaa221de15dcec353f45205ec38ea70b
  • docker/login-action v3@e92390c5fb421da1463c202d546fed0ec5c39f20
  • docker/setup-buildx-action v3@d70bba72b1f3fd22344832f00baa16ece964efeb
  • docker/build-push-action v5@2cdde995de11925a030ce8070c3d77a52ffcf1c0
  • ubuntu 22.04
  • ubuntu 22.04
  • ubuntu 22.04
.github/workflows/release.yaml
  • actions/checkout v4@0ad4b8fadaa221de15dcec353f45205ec38ea70b
  • google-github-actions/release-please-action v4@a37ac6e4f6449ce8b3f7607e4d97d0146028dc0b
  • actions/checkout v4@0ad4b8fadaa221de15dcec353f45205ec38ea70b
  • stefanprodan/helm-gh-pages master
  • actions/checkout v4@0ad4b8fadaa221de15dcec353f45205ec38ea70b
  • docker/setup-buildx-action v3@d70bba72b1f3fd22344832f00baa16ece964efeb
  • docker/login-action v3@e92390c5fb421da1463c202d546fed0ec5c39f20
  • docker/build-push-action v5@2cdde995de11925a030ce8070c3d77a52ffcf1c0
  • anchore/sbom-action v0.15.11@7ccf588e3cf3cc2611714c2eeae48550fbc17552
  • softprops/action-gh-release v2@9d7c94cfd0a1f3ed45544c887983e9fa900f0564
  • ubuntu 22.04
.github/workflows/test.yaml
  • actions/checkout v4@0ad4b8fadaa221de15dcec353f45205ec38ea70b
  • actions/setup-go v5@cdcb36043654635271a94b9a6d1392de5bb323a7
  • actions/checkout v4@0ad4b8fadaa221de15dcec353f45205ec38ea70b
  • azure/setup-helm v4@fe7b79cd5ee1e45176fcad797de68ecaf3ca4814
  • actions/setup-python v5@82c7e631bb3cdc910f68e0081d67478d79c6982d
  • helm/chart-testing-action v2.6.1@e6669bcd63d7cb57cb4380c33043eebe5d111992
  • helm/kind-action v1.10.0@0025e74a8c7512023d06dc019c617aa3cf561fde
gomod
go.mod
  • go 1.21
  • buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go v1.3.0-20240406062209-1cc152efbf5c.2@1cc152efbf5c
  • buf.build/gen/go/k8sgpt-ai/k8sgpt/protocolbuffers/go v1.34.0-20240406062209-1cc152efbf5c.1@1cc152efbf5c
  • github.com/onsi/ginkgo/v2 v2.17.2
  • github.com/onsi/gomega v1.33.1
  • github.com/prometheus/client_golang v1.19.0
  • google.golang.org/grpc v1.63.2
  • k8s.io/api v0.29.3
  • k8s.io/apimachinery v0.29.3
  • k8s.io/cli-runtime v0.29.3
  • k8s.io/client-go v0.29.3
  • k8s.io/kubectl v0.29.3
  • k8s.io/utils v0.0.0-20240502163921-fe8a2dddb1d0@fe8a2dddb1d0
  • sigs.k8s.io/controller-runtime v0.15.0
  • github.com/stretchr/testify v1.9.0
helm-values
chart/operator/values.yaml
  • gcr.io/kubebuilder/kube-rbac-proxy v0.16.0
  • ghcr.io/k8sgpt-ai/k8sgpt-operator v0.1.3
kustomize
config/manager/kustomization.yaml

  • Check this box to trigger a request for Renovate to run again on this repository

[Bug]: slack report same resource every time analysis

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.3.13

Kubernetes Version

No response

Host OS and its Version

No response

Steps to reproduce

  1. Set the Slack webhook URL callback:
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: openai
spec:
  ai:
    enabled: true
    model: gpt-3.5-turbo
    backend: openai
    secret:
      name: k8sgpt-sample-secret
      key: openai-api-key
    language: english
  noCache: false
  version: v0.3.13
  sink:
    type: slack
    webhook: http://localhost:888/slack
  2. Create some resource which can cause a problem.
  3. Wait for the operator to call back.

Expected behaviour

I think it should be called only one to three times, or this should be configurable in the CRD.

Actual behaviour

It calls back every time because of the code below:

	if len(existing.Spec.Error) == len(res.Spec.Error) && reflect.DeepEqual(res.Labels, existing.Labels) {
		existing.Status.LifeCycle = string(NoOpResult)
		err := c.Status().Update(ctx, &existing)
		return NoOpResult, err
	}

result.Spec.Error can be different every time because it is generated by an LLM, whose output is not deterministic.

Additional Information

We could configure it like below:

  sink:
    type: slack
    webhook: http://localhost:888/slack
    for: 10m

If we can settle on a strategy, I am happy to work on it :)

[Feature]: Grafana dashboards as configmaps by default

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

No response

Solution Description

We should use the generated grafana/* dashboards from Kubebuilder to populate a new configmap e.g. config/grafana which has a kustomize overlay to inject the content into.

Benefits

Dashboards will show up for free if a user is using Grafana.

Potential Drawbacks

No response

Additional Information

No response

standard_init_linux.go:211: exec user process caused "exec format error"

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.0.20

Kubernetes Version

v1.16.9

Host OS and its Version

No response

Steps to reproduce

When I install the operator, it fails with the error: standard_init_linux.go:211: exec user process caused "exec format error"

Expected behaviour

Successful installation

Actual behaviour

When I install the operator, it fails with the error: standard_init_linux.go:211: exec user process caused "exec format error"

Additional Information

No response

invalid character 'e' looking for beginning of value

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.0.8

Kubernetes Version

v1.24.2

Host OS and its Version

centos7.9

Steps to reproduce

  1. OPENAI_TOKEN=xxxxx

  2. kubectl create secret generic k8sgpt-sample-secret --from-literal=openai-api-key=$OPENAI_TOKEN -n kube-system

  3. kubectl apply -f - << EOF
     apiVersion: core.k8sgpt.ai/v1alpha1
     kind: K8sGPT
     metadata:
       name: k8sgpt-sample
       namespace: kube-system
     spec:
       model: gpt-3.5-turbo
       backend: openai
       noCache: false
       version: v0.2.7
       enableAI: true
       secret:
         name: k8sgpt-sample-secret
         key: openai-api-key
     EOF

  4. [root@master ~]# kubectl -n gpt logs k8s-gpt-k8sgpt-operator-controller-manager-599f9655cd-st295
    2023-05-06T01:25:59Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": "127.0.0.1:8080"}
    2023-05-06T01:25:59Z INFO setup starting manager
    2023-05-06T01:25:59Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
    I0506 01:25:59.771627 1 leaderelection.go:248] attempting to acquire leader lease gpt/ea9c19f7.k8sgpt.ai...
    2023-05-06T01:25:59Z INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
    I0506 01:26:24.246227 1 leaderelection.go:258] successfully acquired lease gpt/ea9c19f7.k8sgpt.ai
    2023-05-06T01:26:24Z DEBUG events k8s-gpt-k8sgpt-operator-controller-manager-599f9655cd-st295_ad62cb5e-d719-4497-a78d-d1c838789192 became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"gpt","name":"ea9c19f7.k8sgpt.ai","uid":"74ca4ae0-33dc-465a-b040-d22963c2d1db","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1487977"}, "reason": "LeaderElection"}
    2023-05-06T01:26:24Z INFO Starting EventSource {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "source": "kind source: *v1alpha1.K8sGPT"}
    2023-05-06T01:26:24Z INFO Starting Controller {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT"}
    2023-05-06T01:26:24Z INFO Starting workers {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "worker count": 1}
    Finished Reconciling K8sGPT with error: invalid character 'e' looking for beginning of value
    2023-05-06T01:26:26Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"kube-system"}, "namespace": "kube-system", "name": "k8sgpt-sample", "reconcileID": "7f6fa68c-7ea3-4797-9926-3bbc259eaedb", "error": "invalid character 'e' looking for beginning of value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
    Finished Reconciling K8sGPT with error: invalid character 'e' looking for beginning of value
    2023-05-06T01:26:27Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"kube-system"}, "namespace": "kube-system", "name": "k8sgpt-sample", "reconcileID": "b9426e8f-746e-4ec6-ad74-66fdddca2d29", "error": "invalid character 'e' looking for beginning of value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
    Finished Reconciling K8sGPT with error: invalid character 'e' looking for beginning of value
    2023-05-06T01:26:29Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"kube-system"}, "namespace": "kube-system", "name": "k8sgpt-sample", "reconcileID": "726a9c75-58a6-463d-882e-0c9826cf40b1", "error": "invalid character 'e' looking for beginning of value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
    2023-05-06T01:26:30Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"kube-system"}, "namespace": "kube-system", "name": "k8sgpt-sample", "reconcileID": "512ce716-5cb0-4cfc-b58e-38bdf0c1ba09", "error": "invalid character 'e' looking for beginning of value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
    Finished Reconciling K8sGPT with error: invalid character 'e' looking for beginning of value
    2023-05-06T01:26:32Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"kube-system"}, "namespace": "kube-system", "name": "k8sgpt-sample", "reconcileID": "e9c64a3b-0a9c-4e21-93a9-0310e739b5f6", "error": "invalid character 'e' looking for beginning of value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
    Finished Reconciling K8sGPT with error: invalid character 'e' looking for beginning of value
    Finished Reconciling K8sGPT with error: invalid character 'e' looking for beginning of value
    2023-05-06T01:26:33Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"kube-system"}, "namespace": "kube-system", "name": "k8sgpt-sample", "reconcileID": "59761095-3372-47b5-9cc1-7356505a83ab", "error": "invalid character 'e' looking for beginning of value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
    Finished Reconciling K8sGPT with error: invalid character 'e' looking for beginning of value
    2023-05-06T01:26:35Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"kube-system"}, "namespace": "kube-system", "name": "k8sgpt-sample", "reconcileID": "7248e8ad-694c-4149-a9f1-b5fdbee10816", "error": "invalid character 'e' looking for beginning of value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
    2023-05-06T01:26:37Z ERROR Reconciler error {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"kube-system"}, "namespace": "kube-system", "name": "k8sgpt-sample", "reconcileID": "b54a634b-8890-4c6e-abd1-4388d82ace5f", "error": "invalid character 'e' looking for beginning of value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    Finished Reconciling K8sGPT with error: invalid character 'e' looking for beginning of value
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235

Expected behaviour

I expect this command to output results when the Kubernetes cluster has an issue:

❯ kubectl get results -o json | jq .
{
  "apiVersion": "v1",
  "items": [
    {
      "apiVersion": "core.k8sgpt.ai/v1alpha1",
      "kind": "Result",
      "metadata": {
        "creationTimestamp": "2023-04-26T09:45:02Z",
        "generation": 1,
        "name": "placementoperatorsystemplacementoperatorcontrollermanagermetricsservice",
        "namespace": "default",
        "resourceVersion": "108371",
        "uid": "f0edd4de-92b6-4de2-ac86-5bb2b2da9736"
      },
      "spec": {
        "details": "The error message means that the service in Kubernetes doesn't have any associated endpoints, which should have been labeled with \"control-plane=controller-manager\". \n\nTo solve this issue, you need to add the \"control-plane=controller-manager\" label to the endpoint that matches the service. Once the endpoint is labeled correctly, Kubernetes can associate it with the service, and the error should be resolved.",

Actual behaviour

[root@master ~]# kubectl get results -o json | jq .
{
  "apiVersion": "v1",
  "items": [],
  "kind": "List",
  "metadata": {
    "resourceVersion": ""
  }
}

Additional Information

No response

Error helm install

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

No response

Kubernetes Version

No response

Host OS and its Version

GKE 1.25

Steps to reproduce

Hi

When I try to install, the system reports this error:

helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
Error: parse error at (k8sgpt-operator/templates/deployment.yaml:60): unclosed action

Regards.

Expected behaviour

Hi

When I try to install, the system reports this error:

helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
Error: parse error at (k8sgpt-operator/templates/deployment.yaml:60): unclosed action

Regards.

Actual behaviour

Hi

When I try to install, the system reports this error:

helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
Error: parse error at (k8sgpt-operator/templates/deployment.yaml:60): unclosed action

Regards.

Additional Information

Hi

When I try to install, the system reports this error:

helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
Error: parse error at (k8sgpt-operator/templates/deployment.yaml:60): unclosed action

Regards.

[Feature]: manual restart of k8sgpt-deployment pod needed when openai-api-key gets changed

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

Yes

Problem Description

I needed to change the openai-api-key but the old key was used until I restarted the k8sgpt-deployment pod. From my point of view this is suboptimal.

Solution Description

We should mount the secret as a volume and re-read it when it changes.
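A minimal sketch of the proposed approach, with illustrative names (this is not the operator's current generated Deployment):

```yaml
# Hypothetical fragment of the k8sgpt Deployment: mount the Secret as a
# volume so kubelet refreshes the mounted file when the Secret changes,
# instead of baking the key into an env var at pod start.
spec:
  template:
    spec:
      containers:
        - name: k8sgpt
          volumeMounts:
            - name: openai-key
              mountPath: /etc/k8sgpt/secret
              readOnly: true
      volumes:
        - name: openai-key
          secret:
            secretName: k8sgpt-sample-secret
```

k8sgpt would then need to re-read /etc/k8sgpt/secret/openai-api-key per request (or watch the file), since Kubernetes propagates updates to mounted Secret data automatically, while env vars are fixed for the pod's lifetime.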

Benefits

When changing the openai-api-key in the kubernetes secret there is no other manual thing to do.

Potential Drawbacks

No response

Additional Information

No response

[Question]: May I know how to configure the anonymous and language parameters in k8sgpt-operator?

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.0.16

Kubernetes Version

v1.24

Host OS and its Version

CentOS 7.9

Steps to reproduce

kubectl apply -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  model: gpt-3.5-turbo
  backend: openai
  noCache: false
  version: v0.3.0
  enableAI: true
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key
  baseUrl: https://api.openai.com/v1
EOF

Expected behaviour

I have set the baseUrl parameter above, and I would also like to know how to set the anonymize and language parameters.
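Whether these are exposed in the CR is version-dependent. A sketch of what it could look like if the fields exist — the `anonymized` and `language` field names are assumptions mirroring the CLI flags, so verify against your CRD with `kubectl explain k8sgpt.spec` before applying:

```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  model: gpt-3.5-turbo
  backend: openai
  baseUrl: https://api.openai.com/v1
  # Assumed field names mirroring the CLI's --anonymize/--language flags;
  # confirm they exist in your operator version before applying.
  anonymized: true
  language: english
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key
```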

[Bug]: Corrupted charts due to accidental line breaks

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.3.8

Kubernetes Version

v1.25.6

Host OS and its Version

linux

Steps to reproduce

cd chart/operator
helm template .
# error msg
Error: parse error in "k8sgpt-operator/templates/deployment.yaml": template: k8sgpt-operator/templates/deployment.yaml:60: unclosed action
❯ helm template .
Error: parse error in "k8sgpt-operator/templates/deployment.yaml": template: k8sgpt-operator/templates/deployment.yaml:66: unclosed action
❯ helm template .
Error: parse error in "k8sgpt-operator/templates/deployment.yaml": template: k8sgpt-operator/templates/deployment.yaml:79: unclosed action
...

Expected behaviour

output normal yamls

Actual behaviour

Corrupted charts due to accidental line breaks

Additional Information

No response

[Question]: Adding proxy settings in the k8sgpt-operator

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

No response

Kubernetes Version

No response

Host OS and its Version

No response

Steps to reproduce

  1. Run k8sgpt analyze --filter=Pod --backend azureopenai --explain
  2. Getting following timeout error:
    Error: failed while calling AI provider azureopenai: Post "https://myopenapitests.azure-api.net/mtx/openai/deployments/gpt-35-turbo-002/chat/completions?api-version=2023-03-15-preview": dial tcp 22.201.29.165:443: i/o timeout

I am working behind a proxy and have added my proxy details to the k8sgpt-deployment Deployment.

Expected behaviour

No timeout error and response from azureopenai

Actual behaviour

following error:
Error: failed while calling AI provider azureopenai: Post "https://myopenapitests.azure-api.net/mtx/openai/deployments/gpt-35-turbo-002/chat/completions?api-version=2023-03-15-preview": dial tcp 22.201.29.165:443: i/o timeout

Additional Information

I suspect k8sgpt is trying to connect to Azure OpenAI directly rather than through the proxy.
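One thing worth checking: if k8sgpt uses Go's default HTTP transport, it honors the standard proxy environment variables. A sketch of setting them on the k8sgpt container (proxy host and NO_PROXY values are illustrative):

```yaml
# Hypothetical patch fragment for the k8sgpt Deployment.
# HTTPS_PROXY/NO_PROXY are read by Go's default transport
# (http.ProxyFromEnvironment); whether k8sgpt's AI client uses that
# transport is an assumption to verify.
spec:
  template:
    spec:
      containers:
        - name: k8sgpt
          env:
            - name: HTTPS_PROXY
              value: "http://proxy.corp.example:3128"
            - name: NO_PROXY
              value: ".svc,.cluster.local,10.0.0.0/8"
```

Keeping cluster-internal suffixes in NO_PROXY matters here, otherwise in-cluster calls (e.g. to the API server) would also be routed through the proxy.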

[Feature]: Support remote caching through the K8sGPT CR

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

No response

Solution Description

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  model: gpt-3.5-turbo
  backend: openai
  noCache: false
  version: v0.3.0
  enableAI: true
  # filters:
  #   - Ingress
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key
  remoteCache:
    credentials:
      name: remote-cache-creds
    bucketName: foo

Benefits

Lets users configure the remote cache through the API

Potential Drawbacks

No response

Additional Information

No response

Bug: empty analyze results when running with operator on AKS + error is raised

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v1alpha1

Kubernetes Version

V 1.25.6

Host OS and its Version

Ubunto

Steps to reproduce

  1. install K8SGPT operator on AKS
  2. run kubectl get results -o json | jq .

Expected behaviour

A list with analyzed data is returned, and no error appears in the k8sgpt deployment.

Actual behaviour

returns an empty list:

{
  "apiVersion": "v1",
  "items": [],
  "kind": "List",
  "metadata": {
    "resourceVersion": ""
  }
}

  • error is raised: Error: AI provider openai not specified in configuration. Please run k8sgpt auth

Additional Information

Spoke with Aris on this issue (Aris: "we've recently refactored the way we interact with k8sgpt service and I think we need to tweak it a bit")

[BUG]: LocalAi returns connection refused to k8sgpt-operator

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

V3.0.9

Kubernetes Version

No response

Host OS and its Version

No response

Steps to reproduce

  1. deploy the local-ai chart with the default configuration as detailed here: https://github.com/go-skynet/helm-charts - latest version v2.1.0
  2. deploy the k8sgpt-operator helm chart - latest version v0.0.19.
  3. apply k8sgpt-config.yaml with the localai base URL configured to port 80:
     apiVersion: core.k8sgpt.ai/v1alpha1
     kind: K8sGPT
     metadata:
       name: k8sgpt-local-ai
       namespace: default
     spec:
       ai:
         enabled: true
         model: ggml-gpt4all-j
         backend: localai
         baseUrl: http://local-ai.local-ai.svc.cluster.local/v1
       noCache: false
       version: v0.3.9
  4. after applying you will start to see this error in the localAI logs:
     rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37005: connect: connection refused"

Expected behaviour

no errors in local-ai, and scan results from k8sgpt

Actual behaviour

No response

Additional Information

No response

[Bug]: Chart v0.0.4 and v0.0.5 were not published

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.0.3

Kubernetes Version

No response

Host OS and its Version

No response

Steps to reproduce

Installing the chart does not work, as helm cannot find it:

helm install k8sgpt/k8sgpt-operator  --version 0.0.5

Searching for the chart reveals only the v0.0.3 chart

helm search repo k8sgpt-operator

NAME                    CHART VERSION   APP VERSION     DESCRIPTION
k8sgpt/k8sgpt-operator  v0.0.3          0.1.0           A Helm chart for Kubernetes

Expected behaviour

v0.0.5 chart should exist.

Actual behaviour

Charts after 0.0.3 have not been released.

Additional Information

No response

[Feature]: Client-side Streaming for the API

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

No response

Solution Description

I would like to update the existing API to support client-side streaming. This means that the client will be able to send a continuous stream of analysis data to the server for processing. The server will process the data as it arrives and provide a response based on the complete stream of data received.

Benefits

This streaming capability will enable asynchronous data analysis, allowing for more efficient and responsive processing of large or continuous data sets.

Potential Drawbacks

No response

Additional Information

No response

[Feature]: K8sgpt-deployment ownerref to operator

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

No response

Solution Description

Adding an owner-reference from the k8sgpt-operator to the deployment assets would help visualise it in argocd and flux

Benefits

better visualisation

Potential Drawbacks

No response

Additional Information

No response

[Bug]: k8sgpt-deployment is not synchronized, after k8sgptConfig update

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.3.8

Kubernetes Version

v1.25.6

Host OS and its Version

ubuntu

Steps to reproduce

  1. create a K8sGPT CR with an invalid baseUrl, say
kubectl apply -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: default
spec:
  ai:
    enabled: true
    model: ggml-gpt4all-j
    backend: localai
    baseUrl: http://local-ai.local-ai.svc.cluster.local:8080/v1
  noCache: false
  version: v0.3.8
EOF
  2. check the operator log
     (screenshot omitted)
  3. fix baseUrl to http://local-ai.local-ai.svc.cluster.local/v1
     (screenshot omitted)
  4. check the operator & deployment logs

operator log

(screenshot omitted)

deployment log

(screenshot omitted)

Expected behaviour

k8sGPT-deployment should use the latest configuration after the K8sGPT CR changes

Actual behaviour

The k8sGPT-deployment configuration has not been updated with the latest configuration after the K8sGPT CR changes

Additional Information

No response

[Question]: Should there be a Scan Frequency?

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.3.14

Kubernetes Version

v1.26

Host OS and its Version

No response

Steps to reproduce

  1. Deploy k8s operator using helm
  2. Add K8sGPT CRD with openai backend (for v0.3.14)

After deployment we hit a couple hundred API calls / hour

Expected behaviour

I can see many teams appreciating the constant scan, especially those with monitors in Prom/Grafana; however, this can likely increase costs for some.

Possibly the introduction of a cron-like scan frequency, allowing the user to choose how often they need an AI-assisted audit. Alternatively, notes in the docs setting the expectation that users install/uninstall the application when they need scans.

Actual behaviour

Appears to be constant scanning

Additional Information

Searched around docs, GIT, reddit, YT, and Slack - apologies if this was already discussed somewhere

[Feature]: Sign Helm Charts

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

Helm Provenance and Integrity

Helm has provenance tools which help chart users verify the integrity and origin of a package. Using industry-standard tools based on PKI, GnuPG, and well-respected package managers, Helm can generate and verify signature files.

Integrity is established by comparing a chart to a provenance record. Provenance records are stored in provenance files, which are stored alongside a packaged chart. For example, if a chart is named myapp-1.2.3.tgz, its provenance file will be myapp-1.2.3.tgz.prov.

Provenance files are generated at packaging time (helm package --sign ...), and can be checked by multiple commands, notably helm install --verify.

Right now ArtifactHub shows the Helm chart as not signed, which could stop some people from adopting it: https://artifacthub.io/packages/helm/k8sgpt/k8sgpt-operator

It is easy to do, so I propose that we do it. We can add this capability to the helm-chart-releaser workflow that we already use.

Solution Description

  • Create a GPG key and passphrase
  • Upload them to GitHub Secrets
  • Update the GitHub Action to sign the chart
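The steps above could be sketched as a workflow fragment like this (step names, secret names, and the key name are illustrative, not the repo's actual workflow):

```yaml
# Hypothetical GitHub Actions steps: import a GPG key from repo secrets,
# then package and sign the chart so a .prov provenance file is produced.
- name: Import GPG key
  run: |
    echo "$GPG_PRIVATE_KEY" | gpg --batch --import
    printf '%s' "$GPG_PASSPHRASE" > /tmp/passphrase
    # helm's signing code expects a legacy-format keyring
    gpg --batch --export-secret-keys > ~/.gnupg/secring.gpg
  env:
    GPG_PRIVATE_KEY: ${{ secrets.GPG_PRIVATE_KEY }}
    GPG_PASSPHRASE: ${{ secrets.GPG_PASSPHRASE }}

- name: Package and sign chart
  run: |
    helm package --sign \
      --key "k8sgpt" \
      --keyring ~/.gnupg/secring.gpg \
      --passphrase-file /tmp/passphrase \
      chart/operator
```

Users can then verify with `helm install --verify` against the published `.tgz` and `.tgz.prov` pair.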

Benefits

People will trust the charts more and enterprises with proper security processes and practices will be able to tick their compliance boxes and adopt it.

Potential Drawbacks

No Drawbacks

Additional Information

No response

[Feature]: Install Grafana dashboard in the namespace where kube-prometheus is located

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

Yes

Problem Description

Without making any modifications, the dashboard will be installed in the namespace where k8sgpt is located. If the namespaces of kube-prometheus and k8sgpt are different, the dashboard will not work.

Solution Description

Adding a namespace option for the dashboard in the Helm chart

Benefits

Can make the Grafana dashboard work smoothly.

question: Bad RBAC, unable to list deployments

manager E0424 15:25:14.269895       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:k8sgpt-operator-system:k8sgpt-operator-controller-manager" cannot list resource "deployments" in API group "apps" at the cluster scope
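The missing permission can be granted with a ClusterRole rule along these lines, bound to the operator's service account (the ClusterRole name here is illustrative; in practice this belongs in the operator's generated RBAC):

```yaml
# Grants cluster-wide read access to Deployments, which the failing
# reflector above needs for its list/watch.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8sgpt-operator-deployment-reader
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8sgpt-operator-deployment-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8sgpt-operator-deployment-reader
subjects:
  - kind: ServiceAccount
    name: k8sgpt-operator-controller-manager
    namespace: k8sgpt-operator-system
```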

[Feature]: Remove Results when they are no longer applicable

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

Yes

Problem Description

Results, once created, are not re-checked to see whether they are still an issue. E.g. an issue with nginx might remediate itself, in which case its Result should be removed.

Solution Description

During the reconciliation phase we need to check all relevant results and remove those that are no longer relevant.

Benefits

Accuracy

Potential Drawbacks

No response

Additional Information

No response

[Feature]: Add unit tests into GH check

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

No response

Solution Description

In order to improve our visibility on any issues, we should take the go test ./... results as a metric for PR/branch health by adding them into GH checks.

Benefits

This would improve our frequency of testing and add a quality gate.

Potential Drawbacks

No response

Additional Information

No response

[BUG]: old pod IP is used when k8sgpt-deployment pod is restarted

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.0.3

Kubernetes Version

No response

Host OS and its Version

No response

Steps to reproduce

  1. restart k8sgpt-deployment pod
  2. look at operator logs:

kubectl logs deployment/k8sgpt-k8sgpt-operator-controller-manager

2023-05-31T20:03:39Z    ERROR   Reconciler error        {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"sx-k8sgpt"}, "namespace": "sx-k8sgpt", "name": "k8sgpt-sample", "reconcileID": "c5a51cb1-4a4b-49f8-af9f-db04e0e3f057", "error": "failed to call Analyze RPC: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.130.3.224:8080: connect: no route to host\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler

Expected behaviour

The controller should be resilient against k8sgpt-deployment pod restarts. Why do we use the pod IP in

address = fmt.Sprintf("%s:8080", podList.Items[0].Status.PodIP)

instead of the k8sgpt service?
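One possible direction, sketched as a Service fronting the deployment so the controller can dial a stable DNS name (Service name, namespace, and selector labels are assumptions):

```yaml
# Hypothetical Service in front of the k8sgpt deployment. The controller
# could then dial "k8sgpt.sx-k8sgpt.svc:8080" instead of a pod IP, and
# kube-proxy would route around pod restarts.
apiVersion: v1
kind: Service
metadata:
  name: k8sgpt
  namespace: sx-k8sgpt
spec:
  selector:
    app: k8sgpt-deployment   # assumption: label set on the deployment's pods
  ports:
    - port: 8080
      targetPort: 8080
```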

Actual behaviour

No response

Additional Information

No response

[Feature]: Add some tests for the K8s Operator

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

At the moment, there are not many tests for the k8sgpt operator, and therefore merging dependency PRs and similar things leads to "blind" approvals or some manual effort.

Solution Description

It would make sense to test at least the most important functionality of the operator to get more confidence in approving PRs. An example for an operator which is tested extensively is the keptn lifecycle toolkit (https://github.com/keptn/lifecycle-toolkit).

It would be possible to test some simple things with ginkgo and run e2e tests using kuttl.
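A kuttl case for the operator could be sketched like this — file paths, CR fields, and the child Deployment name are assumptions to adapt to the actual operator output:

```yaml
# --- tests/e2e/basic/00-install.yaml
# kuttl applies this step: create a K8sGPT CR (AI disabled so the test
# needs no API key).
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
spec:
  ai:
    enabled: false
  version: v0.3.8
---
# --- tests/e2e/basic/00-assert.yaml
# kuttl then waits (until its timeout) for a Deployment matching this
# partial state, i.e. the operator reconciled the CR into a ready child.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8sgpt-deployment
status:
  readyReplicas: 1
```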

Benefits

More confidence in CI runs and approvals, more stability of the operator.

Potential Drawbacks

None I can think of

Additional Information

No response
