fidelity / theliv
License: Apache License 2.0
Describe the solution you'd like:
add copyright headers to UI files
Why do you want this feature:
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
As a user of this Kubernetes open source project, I want a deployment guide and Helm chart so I can easily install and configure the project in my Kubernetes cluster.
We need to create docs and assets to make deployment of this project simpler for users.
Theliv should integrate with Prometheus, i.e. Theliv will maintain its own set of Prometheus alerts in each cluster. These alerts would be created and managed by Theliv, and a Prometheus server installed in each Kubernetes cluster is a prerequisite.
This list of problems would then be run through an investigation framework and passed on to aggregation logic, which produces the report cards and shows them to the user.
problem struct:
name: {alert_name}
description: {alert_description}
tags: {alert_tags}
affectedResources: []Resource
details: []string {alert_details}
Resource struct
object: runtime.Object
objectKind: string
ownerKind: string
owner: runtime.Object
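The two structs above could be sketched in Go roughly as follows. This is only an illustration: `Object` is a hypothetical stand-in for `runtime.Object` so the example has no external dependencies, and the choice to fill `Tags` from the alert labels and `Description` from a `description` annotation is an assumption, not something the issue specifies.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for runtime.Object from
// k8s.io/apimachinery, so that this sketch needs no external modules.
type Object interface{}

// Resource mirrors the "Resource struct" fields listed above.
type Resource struct {
	Object     Object
	ObjectKind string
	OwnerKind  string
	Owner      Object
}

// Problem mirrors the "problem struct" fields listed above.
type Problem struct {
	Name              string
	Description       string
	Tags              map[string]string
	AffectedResources []Resource
	Details           []string
}

// NewProblem builds a Problem from an alert's labels and annotations.
// Assumption: Tags are the alert labels and Description comes from the
// "description" annotation; the issue does not define this mapping.
func NewProblem(labels, annotations map[string]string) Problem {
	return Problem{
		Name:        labels["alertname"],
		Description: annotations["description"],
		Tags:        labels,
	}
}

func main() {
	p := NewProblem(
		map[string]string{"alertname": "KubePodCrashLooping", "namespace": "demo"},
		map[string]string{"description": "Pod is crash looping"},
	)
	fmt.Printf("%s: %s\n", p.Name, p.Description)
}
```

`AffectedResources` and `Details` start out empty here; they would be filled in by later stages such as the investigators.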
Documentation link: https://github.com/fidelity/theliv/blob/main/readme.md
Description:
Finally theliv aims to be an extensible framework where custom checks can be plugged in whereever required.
Change whereever to wherever
Currently, Theliv uses a rule-based approach for issue detection and analysis.
Take advantage of a generative AI model to troubleshoot issues and provide suggestions.
Alerts in general give you high-level information about what is going wrong. Sometimes that is enough for the user, but sometimes further troubleshooting is needed. Theliv will maintain a set of "investigator" functions which can be contributed by anyone. These are single-purpose functions that are mapped to specific alerts and help troubleshoot those alerts further.
Imagine something like map[alert_name][investigator_func]. When Theliv gets the list of alerts from the Prometheus API and constructs the problem struct, the struct is passed on to an investigator. For example, when an image pull backoff alert fires, a dedicated investigator function holds the troubleshooting logic for image pull backoff issues.
The investigator will then add more details to the []details field of the problem struct. This also means anyone from the operators team should be able to raise a PR with investigator functions for specific types of alerts (e.g. pending pods, IP exhaustion).
The investigator is one of the main elements of Theliv: instead of an operations team member analyzing and troubleshooting the issue for the application team, Theliv does it and sheds more light on what is happening via report cards in the UI.
The investigator should also have access to some of the following:
cluster name, namespace, kube API client, AWS client (for ingress-related checks), etc.
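The map-based dispatch described above might look something like this in Go. It is a minimal sketch: the names `Input`, `InvestigatorFunc`, and `Investigate`, as well as the alert name `ImagePullBackOff`, are hypothetical, and the kube API and AWS clients are left out of `Input` to keep the example dependency-free.

```go
package main

import "fmt"

// Problem is a trimmed-down version of the problem struct described earlier.
type Problem struct {
	Name    string
	Details []string
}

// Input carries the context an investigator may need. A real version would
// also hold the kube API client and AWS client; both are omitted here.
type Input struct {
	ClusterName string
	Namespace   string
}

// InvestigatorFunc inspects one problem and appends findings to its Details.
type InvestigatorFunc func(in Input, p *Problem)

// investigators maps an alert name to its dedicated investigator.
// The alert name used as key is illustrative only.
var investigators = map[string]InvestigatorFunc{
	"ImagePullBackOff": investigateImagePullBackOff,
}

// investigateImagePullBackOff adds a troubleshooting hint to the problem.
func investigateImagePullBackOff(in Input, p *Problem) {
	p.Details = append(p.Details,
		fmt.Sprintf("check image name, tag and pull secrets in namespace %q", in.Namespace))
}

// Investigate runs the matching investigator, if one is registered, for
// each problem. Problems without an investigator are left unchanged.
func Investigate(in Input, problems []*Problem) {
	for _, p := range problems {
		if fn, ok := investigators[p.Name]; ok {
			fn(in, p)
		}
	}
}

func main() {
	problems := []*Problem{{Name: "ImagePullBackOff"}, {Name: "SomeOtherAlert"}}
	Investigate(Input{ClusterName: "dev", Namespace: "demo"}, problems)
	for _, p := range problems {
		fmt.Println(p.Name, p.Details)
	}
}
```

With this shape, contributing a new investigator amounts to writing one function and adding one map entry, which fits the goal of letting operators raise small, focused PRs.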
Why do you want this feature:
Theliv investigator functions are supposed to analyze the alerts in depth and provide actionable insights and next steps to the users. This means investigator functions should analyze Kubernetes events in combination with the alert information and provide more information to the user.
Describe the solution you'd like:
Theliv provides an investigation framework on top of Prometheus alerts. This means it will analyze alerts from Prometheus and dive deeper to provide actionable insights to the user. For example, when a crashloop backoff alert is triggered, an SRE or DevOps team member would typically dive deeper to figure out the root cause. Often, that involves analyzing the Kubernetes events.
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
The existing aggregation logic needs to be modified accordingly. The logic still creates the report cards for the user to view based on applications. There will be two types of report cards. The first is for cluster-level alerts, cloud-provider-level alerts such as AZ/region failures, and alerts related to management namespaces (where some of the critical add-ons run). The second is only for alerts pertaining to the user-supplied namespace. The alerts within the report cards need to be arranged in a specific order (simple correlation logic), e.g. alerts related to nodes or the API server are at the top, since they could be the root cause of many other issues.
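One way to picture the two report cards and the ordering rule is the Go sketch below. All names (`Alert`, `ReportCards`, `Aggregate`, the `Component` field) and the priority values are assumptions made for illustration; the issue only says that node and API server alerts should come first.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a simplified alert. Namespace is empty for cluster-level or
// cloud-provider-level alerts. Component is a hypothetical classification.
type Alert struct {
	Name      string
	Namespace string
	Component string
}

// ReportCards holds the two kinds of report card described above.
type ReportCards struct {
	Platform  []Alert // cluster, cloud provider, and management namespace alerts
	Namespace []Alert // alerts for the user-supplied namespace
}

// priority lists components whose alerts are likely root causes; a lower
// value sorts first. The values are illustrative assumptions.
var priority = map[string]int{"node": 0, "apiserver": 0}

func rank(a Alert) int {
	if r, ok := priority[a.Component]; ok {
		return r
	}
	return 1
}

// Aggregate splits alerts into the two report cards and puts likely root
// causes first. Alerts from unrelated namespaces are dropped.
func Aggregate(alerts []Alert, userNS string, mgmtNS map[string]bool) ReportCards {
	var rc ReportCards
	for _, a := range alerts {
		switch {
		case a.Namespace == "" || mgmtNS[a.Namespace]:
			rc.Platform = append(rc.Platform, a)
		case a.Namespace == userNS:
			rc.Namespace = append(rc.Namespace, a)
		}
	}
	byRank := func(s []Alert) {
		// Stable sort keeps the original order among alerts of equal rank.
		sort.SliceStable(s, func(i, j int) bool { return rank(s[i]) < rank(s[j]) })
	}
	byRank(rc.Platform)
	byRank(rc.Namespace)
	return rc
}

func main() {
	alerts := []Alert{
		{Name: "CoreDNSDown", Namespace: "kube-system", Component: "pod"},
		{Name: "NodeNotReady", Component: "node"},
		{Name: "PodCrashLooping", Namespace: "demo", Component: "pod"},
	}
	rc := Aggregate(alerts, "demo", map[string]bool{"kube-system": true})
	fmt.Println(rc.Platform)
	fmt.Println(rc.Namespace)
}
```

A stable sort is used so that, apart from promoting likely root causes, alerts keep the order in which they arrived.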
Currently, the Ingress detector uses goroutines to call services in parallel.
If the service call fails in one goroutine, the failure should be recorded. At the end, generate a summary and return it to the user
as a single notification, so the user can try again later.
UI needs to be updated based on the changes in the aggregation logic.
Documentation link: https://github.com/fidelity/theliv/blob/main/readme.md
Description:
While an developer can perfectly be equipped with the necessary skills to debug using the above flow, most of the time they dont want to do it since their focus is on rolling out a business feature to production asap.
Change dont to don't
Describe the solution you'd like:
Create GitHub Actions to push the image to ghcr.io
Why do you want this feature:
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
documentation link: https://github.com/fidelity/theliv/blob/main/readme.md
Description:
E.g Mutiple factors could come into play in a specific issue
Currently, the Ingress detector checks the backend service, which has a service name and port.
For ALB, however, if the service port is "use-annotation", the detector needs to read the Ingress annotations to find the detailed actions.
ALB annotation as reference:
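As a rough sketch of the lookup described above, the Go code below resolves a "use-annotation" backend to the services named in the matching action annotation. It assumes the AWS Load Balancer Controller convention of an `alb.ingress.kubernetes.io/actions.<name>` annotation holding a JSON action with `forwardConfig.targetGroups`; the helper names (`ResolveBackends`, `Backend`) are hypothetical, and only the forward-to-service case is handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// actionPrefix is the assumed annotation prefix for ALB custom actions.
const actionPrefix = "alb.ingress.kubernetes.io/actions."

// targetGroup and action model only the fields this sketch needs.
type targetGroup struct {
	ServiceName string      `json:"serviceName"`
	ServicePort interface{} `json:"servicePort"` // may be a number or a name
}

type action struct {
	Type          string `json:"type"`
	ForwardConfig struct {
		TargetGroups []targetGroup `json:"targetGroups"`
	} `json:"forwardConfig"`
}

// Backend is a resolved service name and port.
type Backend struct{ Name, Port string }

// ResolveBackends returns the real backends for an Ingress backend entry.
// For a normal port it returns the entry unchanged; for "use-annotation"
// it reads the services from the matching action annotation.
func ResolveBackends(annotations map[string]string, name, port string) ([]Backend, error) {
	if port != "use-annotation" {
		return []Backend{{Name: name, Port: port}}, nil
	}
	raw, ok := annotations[actionPrefix+name]
	if !ok {
		return nil, fmt.Errorf("annotation %s%s not found", actionPrefix, name)
	}
	var a action
	if err := json.Unmarshal([]byte(raw), &a); err != nil {
		return nil, fmt.Errorf("parse action %q: %w", name, err)
	}
	var out []Backend
	for _, tg := range a.ForwardConfig.TargetGroups {
		// Target groups without a service name (e.g. ARN-based) are skipped.
		if tg.ServiceName != "" {
			out = append(out, Backend{Name: tg.ServiceName, Port: fmt.Sprint(tg.ServicePort)})
		}
	}
	return out, nil
}

func main() {
	annotations := map[string]string{
		actionPrefix + "forward-multi": `{"type":"forward","forwardConfig":{"targetGroups":[` +
			`{"serviceName":"svc-a","servicePort":"http"},{"serviceName":"svc-b","servicePort":80}]}}`,
	}
	backends, err := ResolveBackends(annotations, "forward-multi", "use-annotation")
	fmt.Println(backends, err)
}
```

Redirect and fixed-response actions have no backing service, so under these assumptions they resolve to an empty list, and the detector would have nothing further to check for them.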