feiyu563 / prometheusalert

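Prometheus Alert is an open-source message forwarding system for operations alert centers. It accepts alert messages from the mainstream monitoring systems Prometheus and Zabbix, the logging system Graylog, and the data visualization system Grafana, and forwards them via DingTalk, WeChat, Huawei Cloud SMS, Tencent Cloud SMS, Tencent Cloud voice calls, Alibaba Cloud SMS, Alibaba Cloud voice calls, and more.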

Home Page: https://feiyu563.gitbook.io

License: MIT License

Dockerfile 0.02% Go 3.52% JavaScript 89.19% Shell 0.03% Batchfile 0.01% CSS 4.90% HTML 2.26% Mustache 0.03% Makefile 0.03%
prometheus graylog grafana alertmanager alert aliyun weixin kubernetes dingding dingtalk

prometheusalert's Issues

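WeChat recovery alerts: can I change the color to green?

I use a WeChat robot with Markdown-format messages (the webhook supports this), and the default alert color is red.

Can I customize the color of recovery messages to something else, such as green? That would make it easier to tell at a glance whether a message is an alert or a recovery. What would I need to change in the program?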

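Alibaba Cloud SMS template configuration

Hello, when I test sending an Alibaba Cloud SMS from the web page, I get the message "params over length limit is 20". The template is configured as "prometheus告警:${code}". How should I change it?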

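[Outlook] This software could grow into a message center

I think this software could develop into a center for converting and sending messages, not just alerts.

It could receive messages from many different source applications, process them in PrometheusAlert, and send them on to many different receiving applications.

Later on it could perhaps be renamed MessageCenter.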

Alerting:
  prometheus                                                                    wx
  grafana                                                                      dingding
  zabbix                                                                       ali
Code hosting:                                                                   tx
  gitlab                                                                       hw
  gogs                                                                         email
Build:            -------->  PrometheusAlert   -------->                        feishu
  jenkins                                                                        ...
  bamboo
Tasks:
  redmine
  jira
...

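Hello author, a question about the DingTalk alert format with Graylog3 + PrometheusAlert

I found material online showing that Graylog3 + PrometheusAlert + DingTalk can be used to push alerts.
While configuring and debugging, I found problems with the DingTalk message format:

  • Problem 1: DingTalk messages sent via http://<pa_url>/graylog3/dingding cannot be opened on Windows 10; it prompts me to install a Windows 10 app
  • Problem 2: the message contains no dynamic content; it is the same every time

I then read the source code and found that the logic checks for a backlog attribute, but Graylog 3.2 does not appear to populate backlog; the information is all inside event. So I am raising this issue to see how best to handle it. Thanks!

Environment: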

Docker + k8s
feiyu563/prometheus-alert:latest
graylog/graylog:3.2
mongo:3
docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.5

Log output:

2020/05/20 12:08:05.369 [I] [value.go:460]  [1589947685369118813] 
{
	"event_definition_id": "5ec4914e23c3ff0016ccd26b",
	"event_definition_type": "aggregation-v1",
	"event_definition_title": "Error Event",
	"event_definition_description": "test {{ $labels.instance }}\ntest2 {{ $labels.volume}}",
	"job_definition_id": "5ec3a8006f2d320014c4dbfe",
	"job_trigger_id": "5ec4ad2575c74d00157110d0",
	"event": {
		"id": "01E8R48PAQZ94Z2SZEHYENKY2J",
		"event_definition_type": "aggregation-v1",
		"event_definition_id": "5ec4914e23c3ff0016ccd26b",
		"origin_context": "urn:graylog:message:es:graylog_0:580ca8e0-9a4f-11ea-af2b-de790f769da2",
		"timestamp": "2020-05-20T04:06:55.000Z",
		"timestamp_processing": "2020-05-20T04:08:05.207Z",
		"timerange_start": null,
		"timerange_end": null,
		"streams": [],
		"source_streams": ["000000000000000000000001"],
		"message": "Error Event",
		"source": "graylog3-867cd4545d-bt9st",
		"key_tuple": [],
		"key": "",
		"priority": 2,
		"alert": true,
		"fields": {
			"message": "rancher-logging-fluentd-linux-5kl4b fluentd: log:[2020-05-20 12:06:40,939][pid:1][tid:139961156640840][system.py:163] ERROR: get net io info by ssh error"
		}
	},
	"backlog": []
}
2020/05/20 12:08:05.369 [I] [graylog2.go:147]  [1589947685369118813] [dingding] {"msgtype":"markdown","markdown":{"title":"IOT-Edge-异常日志告警告警信息","text":"## [IOT-Edge-异常日志告警Graylog2告警信息](https://log.xxx.xxx.com)\n\n#### \n\n![IOT-Edge-异常日志告警](https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png)"},"at":{"atMobiles":["15395105573"],"isAtAll":false}}
2020/05/20 12:08:05.609 [I] [graylog2.go:147]  [1589947685369118813] [dingding] {"errcode":0,"errmsg":"ok"}
2020/05/20 12:08:05.609 [I] [graylog2.go:151]  [1589947685369118813] [dingding] 飞书接口未配置未开启状态,请先配置open-feishu为1
2020/05/20 12:08:05.609 [I] [graylog2.go:155]  [1589947685369118813] [weixin] 企业微信接口未配置未开启状态,请先配置open-weixin为1
2020/05/20 12:08:05.609 [I] [value.go:460]  [158
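
The situation in the log above (an empty `backlog`, with the details under `event.fields`) could be handled with a fallback. The following is only a minimal sketch: `graylogPayload` and `alertText` are hypothetical names rather than PrometheusAlert code, and it assumes backlog entries, when present, carry a `message` key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// graylogPayload models only the fields used below; the JSON keys mirror the
// payload printed in the log above.
type graylogPayload struct {
	Title string `json:"event_definition_title"`
	Event struct {
		Message string                 `json:"message"`
		Fields  map[string]interface{} `json:"fields"`
	} `json:"event"`
	Backlog []map[string]interface{} `json:"backlog"`
}

// alertText (hypothetical helper) prefers a backlog message, then
// event.fields.message, then event.message.
func alertText(raw []byte) (string, error) {
	var p graylogPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", err
	}
	for _, m := range p.Backlog {
		// Assumption: backlog entries expose their text under "message".
		if msg, ok := m["message"].(string); ok && msg != "" {
			return msg, nil
		}
	}
	if msg, ok := p.Event.Fields["message"].(string); ok && msg != "" {
		return msg, nil
	}
	return p.Event.Message, nil
}

func main() {
	raw := []byte(`{"event_definition_title":"Error Event",
		"event":{"message":"Error Event","fields":{"message":"get net io info by ssh error"}},
		"backlog":[]}`)
	text, err := alertText(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

With the payload from the log, this picks up the text in `event.fields.message` instead of producing an empty body.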

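Graylog alert time zone problem

Hello, when Graylog3 sends an alert, the alert start time is 8 hours behind. How can this time zone problem be fixed?

Modified template does not take effect

I modified the DingTalk template, but alerts are still sent with the old template content. Also, if I want to iterate over all labels, how should I change the template?

Problem with the prometheus-alert image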

docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: "/app/PrometheusAlert": permission denied": unknown.

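Specify the location of the config file, views, and other files at runtime

For example, put the executable at /opt/prometheusalert/PrometheusAlert

# Running this command looks for the config file and the view, db, and other folders only in the current directory.
/opt/prometheusalert/PrometheusAlert

As a result, the service cannot be managed with systemd, because the config file and other resources are not found.

Suggestion: change values such as Db_name := "./db/PrometheusAlertDB.db" to use defaults from app.config, and add a parameter for specifying the config.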
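
One way the suggestion could look is sketched below. This is an assumption-laden illustration, not the project's code: the `-base` flag and `resolveBaseDir` are hypothetical, and the sketch simply changes the working directory so that existing relative paths such as "./db/PrometheusAlertDB.db" keep resolving.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// resolveBaseDir returns the directory that holds conf/, views/ and db/.
// An explicit value wins; otherwise the directory of the executable is used.
func resolveBaseDir(explicit string) (string, error) {
	if explicit != "" {
		return filepath.Abs(explicit)
	}
	exe, err := os.Executable()
	if err != nil {
		return "", err
	}
	return filepath.Dir(exe), nil
}

func main() {
	// "-base" is a hypothetical flag name, not an existing PrometheusAlert option.
	base := flag.String("base", "", "directory containing conf/, views/ and db/")
	flag.Parse()

	dir, err := resolveBaseDir(*base)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Switching the working directory keeps relative paths valid without
	// editing each of them.
	if err := os.Chdir(dir); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("working directory:", dir)
}
```

Without any code change, setting `WorkingDirectory=/opt/prometheusalert` in the systemd unit should have the same effect.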

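Do alerts support convergence?

After alerts from Zabbix and Grafana are forwarded here, can they be converged (aggregated and deduplicated)?
Does this feature exist, and if not, is it planned?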

Too much duplicated code when receiving Prometheus alerts

The routers defined by the following two lines

beego.Router("/prometheus/alert", &controllers.PrometheusController{},"post:PrometheusAlert")

beego.Router("/prometheus/router", &controllers.PrometheusController{},"post:PrometheusRouter")

map to two functions in this file that contain a lot of duplicated code. Could they be merged and optimized?

func SendMessageP(message Prometheus,logsign string)(string) {

func SendMessageR(message Prometheus,rwxurl,rddurl,rfsurl,rphone,logsign string)(string) {
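
A rough sketch of how the two entry points might share one body follows. Everything in it is hypothetical (the `Targets` type, `merge`, and `send` are not from the project, and a plain string stands in for the `Prometheus` message type); it only illustrates the idea that the router variant is the default variant plus per-request overrides.

```go
package main

import "fmt"

// Targets is a hypothetical type carrying the per-request overrides that
// SendMessageR receives as separate string arguments.
type Targets struct {
	WxURL, DdURL, FsURL, Phone string
}

// orDefault returns v, or def when v is empty.
func orDefault(v, def string) string {
	if v == "" {
		return def
	}
	return v
}

// merge fills every empty field of t from the configured defaults d.
func (t Targets) merge(d Targets) Targets {
	return Targets{
		WxURL: orDefault(t.WxURL, d.WxURL),
		DdURL: orDefault(t.DdURL, d.DdURL),
		FsURL: orDefault(t.FsURL, d.FsURL),
		Phone: orDefault(t.Phone, d.Phone),
	}
}

// send stands in for the shared body of both functions. Here it only reports
// where the message would go; real code would render and post it.
func send(message string, t Targets, logsign string) string {
	return fmt.Sprintf("[%s] %q -> wx=%s dd=%s fs=%s phone=%s",
		logsign, message, t.WxURL, t.DdURL, t.FsURL, t.Phone)
}

func main() {
	defaults := Targets{DdURL: "https://example.invalid/default-dd", Phone: "10000000000"}

	// Counterpart of SendMessageP: no overrides, defaults only.
	fmt.Println(send("disk full", Targets{}.merge(defaults), "log-1"))

	// Counterpart of SendMessageR: overrides win, the rest falls back.
	override := Targets{DdURL: "https://example.invalid/team-dd"}
	fmt.Println(send("disk full", override.merge(defaults), "log-2"))
}
```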

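Feishu alert detail changes

When I receive Prometheus alerts through the default interface, the title switches between "PrometheusAlert故障告警信息" (fault alert) and "PrometheusAlert故障恢复信息" (fault recovery). With my custom template, however, there is no such change. Which variable controls this?
Here is my template: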
{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}
Prometheus告警信息
[{{$v.labels.desc}}]
告警级别:{{$v.labels.severity}}
开始时间:{{$v.startsAt}}
结束时间:{{$v.endsAt}}
故障主机IP:{{$v.labels.instance}}
故障描述: {{$v.annotations.summary}}
{{ end }}

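DingTalk alerts fail: unsupported protocol

DingTalk alerts have no effect. I looked at the printed log:
PostToDingDing:Post : unsupported protocol scheme ""
app.conf

#Whether to enable the DingTalk alert channel; multiple channels can be enabled at once. 0 = off, 1 = on
    open-dingding=1
    #Default DingTalk robot address
    ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxx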
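
For context, Go's HTTP client reports `unsupported protocol scheme ""` when the request URL has no scheme, which is what happens if an empty string reaches it. So one possible cause is that the robot address was not read from the configuration at all. A small check like the one below (a sketch; `checkWebhookURL` is a hypothetical helper) would surface that earlier with a clearer message.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// checkWebhookURL reports whether raw is usable as an HTTP(S) webhook address.
func checkWebhookURL(raw string) error {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return fmt.Errorf("webhook URL is empty; check that it is set in the config")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("webhook URL %q is invalid: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("webhook URL %q has unsupported scheme %q", raw, u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("webhook URL %q has no host", raw)
	}
	return nil
}

func main() {
	for _, candidate := range []string{
		"",
		"oapi.dingtalk.com/robot/send",
		"https://oapi.dingtalk.com/robot/send?access_token=xxxxxxx",
	} {
		fmt.Printf("%q -> %v\n", candidate, checkWebhookURL(candidate))
	}
}
```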

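Alibaba Cloud voice call: error when configuring a public number pool number

Configuration:

open-alydh=1
#AccessKey ID of the Alibaba Cloud voice main account
ALY_DH_AccessKeyId=2VXfRPdQbRU
#Alibaba Cloud voice API secret
ALY_DH_AccessSecret=zGDaF2VXfRPdQbRU0Ia
#Number shown to the callee for Alibaba Cloud voice calls; must be a purchased number
ALY_DX_CalledShowNumber=
#Alibaba Cloud voice text-to-speech (TTS) template ID
ALY_DH_TtsCode=TTS_1960992

Error: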
{"RequestId":"BA69B4AC-ADF5-4E55-8758-EBD7EE67E8D7","Message":"模板变量缺少对应参数值","Code":"isv.TEMPLATE_MISSING_PARAMETERS"}

Error when ranging over labels

My configuration:

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}
{{if eq $v.status "resolved"}}
## [Prometheus恢复信息]({{$v.generatorURL}})
#### [{{$v.labels.alertname}}]({{$var}})
###### 告警级别:{{$v.labels.severity}}
###### 开始时间:{{$v.startsAt}}
###### 结束时间:{{$v.endsAt}}
###### 故障主机IP:{{$v.labels.instance}}
##### {{$v.annotations.description}}
**Labels**
{{ range $v.labels }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{end}}
{{else}}
## [Prometheus告警信息]({{$v.generatorURL}})
#### [{{$v.labels.alertname}}]({{$var}})
###### 告警级别:{{$v.labels.severity}}
###### 开始时间:{{$v.startsAt}}
###### 结束时间:{{$v.endsAt}}
###### 故障主机IP:{{$v.labels.instance}}
##### {{$v.annotations.description}}
**Labels**
{{ range $v.labels }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{end}}
{{end}}
{{ end }}
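
A likely reason the template above fails: the other templates in these issues read lower-case keys such as `$v.labels.alertname`, which suggests the alert is rendered from JSON-decoded maps. In that case the elements of `labels` are plain values with no `.Name` or `.Value` fields, and `markdown` is not a built-in Go template function. Ranging with two variables gives the key and the value directly. A minimal, self-contained sketch under that assumption (`render` is a hypothetical helper):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// tpl lists every label of every alert. In a map range, the first variable
// is the key and the second is the value.
const tpl = `{{ range $k, $v := .alerts }}**Labels**
{{ range $name, $value := $v.labels }}> - {{ $name }}: {{ $value }}
{{ end }}{{ end }}`

// render decodes an Alertmanager-style payload into generic maps and
// executes tpl against it.
func render(payload []byte) (string, error) {
	var data map[string]interface{}
	if err := json.Unmarshal(payload, &data); err != nil {
		return "", err
	}
	t, err := template.New("labels").Parse(tpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := render([]byte(`{"alerts":[{"labels":{"severity":"warning","alertname":"InstanceDown"}}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Map keys are visited in sorted order, so the label list is stable between alerts.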

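Custom template: @ mentions do not work (DingTalk)

If I @ a person's name, DingTalk just displays the text "@name".
If I @ a phone number, it just displays "@phone number".

In theory, DingTalk should automatically convert an @ phone number into a mention of that person.

Is it simply impossible to implement @ mentions in a template?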

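A question about Prometheus alert templates

My cluster runs a complete Prometheus alerting stack deployed with prometheus-operator. Until now I used email and WeChat Work alerts; the alert messages were complete, and all rules are the official ones with no changes on my side. Today I tested PrometheusAlert. It can send alerts to a Feishu webhook, but the alert information shown is incomplete. Is there a template that covers all of the rules shipped with prometheus-operator? The complete official prometheus-operator rules: https://github.com/coreos/kube-prometheus/blob/release-0.1/manifests/prometheus-rules.yaml
I copied one of the alert rules below: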

        expr: |
          absent(up{job="kube-scheduler"} == 1)
        for: 15m
        labels:
          severity: critical
      - alert: KubeStateMetricsDown
        annotations:
          message: KubeStateMetrics has disappeared from Prometheus target discovery.
          runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatemetricsdown

The email generated by the rule above reads:

1 alert for severity=critical
--
View in AlertManager[1]
[1] Firing
Labels
alertname = KubeStateMetricsDown
prometheus = monitoring/k8s
severity = critical
Annotations
message = KubeStateMetrics has disappeared from Prometheus target discovery.
runbook_url = https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatemetricsdown
Source

The Feishu alert message that PrometheusAlert actually sent in today's test:

PrometheusAlert故障告警信息
## [PrometheusAlertPrometheus故障告警信息](http://prometheus-xxx:xxxx/graph?xxxxxxxxx)

#### [KubeStateMetricsDown](http://alertmanagxxx0:xxxx)

###### 告警级别:信息

###### 开始时间:2020-04-20T12:58:17.64901609Z

###### 结束时间:0001-01-01T00:00:00Z

###### 故障主机IP:

##### 

![PrometheusAlert](https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png)

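Could the alert channel be chosen by values passed in the calling URL, rather than fixed by alert level?

Suppose there are only two alert channels, email and SMS, and I want to decide for myself: send email for every alert level, or email for levels 1 and 2 and SMS for levels 3, 4, and 5.
If this could be decided by parameters passed in the URL, it would be more flexible.
I see that the code at the following location already partly supports resolving the notification channel this way: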

if RMessage.Annotations.Ddurl==""{

I suggest switching SMS and phone calls to this approach as well. If you agree, please let me know and I will help with the changes.

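Suggestions for high availability and generality

  1. Introducing SQLite to store templates locally makes it hard to scale out to multiple instances. Would you consider removing it or adding a switch?
  2. The default input data format is Prometheus, which makes it awkward for other services to integrate. Would you consider adding a more generic interface?
  3. There is no retry-on-failure or logging capability.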

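Template changes made in the web page do not take effect

After I modify the WeChat template (p-weixin) in template management, the change does not take effect, even after a restart. The initial template format is still used. What could be the reason?

The modified template content is as follows: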

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}
[Prometheus告警信息]($v.generatorURL}})
>规则名称: **[{{$v.labels.alertname}}]({{$var}})**
>告警级别: {{$v.labels.level}}
开始时间: {{$v.startsAt}}
结束时间: {{$v.endsAt}}
故障主机: {{$v.labels.instance}}
{{$v.annotations.summary}}
{{$v.annotations.description}}
{{end}}

I added the rule name and annotations.summary, but the alert content still uses the initial format.

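[Feature request] Webhook support for forwarding alert messages through a WeChat Work application

Description:
WeChat Work is now widely used as an instant-messaging tool, and using a WeChat Work application to send alerts and other messages to specific people is a common need (for example, notifying someone to pick up a parcel). For this reason, I hope the webhook can add support for WeChat Work applications.

Features:
The core parameters of the WeChat Work application can be configured in app.conf;

The WeChat Work application supports receiving messages via API;

Support sending and receiving news (image-and-text) messages;

Support sending and receiving Markdown messages;

Support sending and receiving text messages;

Support sending and receiving voice messages;

Support sending and receiving task card messages;

For more, see the WeChat Work API documentation:
https://work.weixin.qq.com/api/doc/90001/90143/90372

Impact:
This would round out the notification channels and improve channel coverage.

Please consider it. Thank you.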

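How to route Graylog3 messages

Currently, after configuring the Graylog3 webhook address, the /graylog3/weixin interface always sends with the same default template. If I define multiple templates, I have to use the /prometheusalert interface with parameters, which means a large number of notifier configurations in Graylog3.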

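Alibaba Cloud SMS sending fails

Alibaba Cloud SMS sending failed.
Failure reason:
Parameter error (isv.PARAM_LENGTH_LIMIT)
Suggestion:
Adjust the parameter length
Is this caused by the SMS content being too long? Could the Alibaba Cloud SMS content be shortened?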
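
The error code points at a template parameter that is longer than allowed. If that is the cause, shortening the value before it is substituted into the SMS template is one way around it. The sketch below is only an illustration: `truncateRunes` is a hypothetical helper, the limit of 20 is taken from the "params over length limit is 20" message quoted in the SMS template issue above, and it is an assumption that the limit counts characters rather than bytes.

```go
package main

import "fmt"

// truncateRunes returns s cut to at most max characters (runes), so that
// multi-byte text such as Chinese is never split in the middle of a character.
func truncateRunes(s string, max int) string {
	if max <= 0 {
		return ""
	}
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max])
}

func main() {
	// Assumed limit; see the note above.
	const limit = 20
	summary := "Reloading Alertmanager's configuration has failed"
	fmt.Println(truncateRunes(summary, limit))
}
```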

go-sqlite3 requires cgo to work

Starting the service reports the following error:
register db Ping default, Binary was compiled with 'CGO_ENABLED=0', go-sqlite3 requires cgo to work. This is a stub
must have one register DataBase alias named default

Custom template error

The template looks correct to me, but it keeps reporting this error:

template: :6: unexpected "," in operand

Template:

{{ $var := .externalURL}}{{ range $k, $v:=.alerts }}
{{ if eq $v.status "resolved" }}
## [告警恢复]
#### [{{$v.labels.alertname}}]
###### 告警级别:{{$v.labels.level}}
###### 开始时间:{{$v.startsAt, timezone="Asia/Shanghai"}}
###### 结束时间:{{$v.endsAt, timezone="Asia/Shanghai"}}
###### 事件详情
alertname: {{$v.labels.alertname}}
endpoint: {{$v.labels.endpoint}}
instance: {{$v.labels.instance}}
namespace: {{$v.labels.namespace}}
job: {{$v.labels.job}}
pod: {{$v.labels.pod}}
prometheus: {{$v.labels.prometheus}}
service: {{$v.labels.service}}
severity: {{$v.labels.severity}}
########{{$v.annotations.description}}
{{else}}
## [告警信息]
#### [{{$v.labels.alertname}}]
###### 告警级别:{{$v.labels.severity}}
###### 开始时间:{{$v.startsAt, timezone="Asia/Shanghai"}}
###### 事件详情
########alertname: {{$v.labels.alertname}}
########endpoint: {{$v.labels.endpoint}}
########instance: {{$v.labels.instance}}
########namespace: {{$v.labels.namespace}}
########job: {{$v.labels.job}}
########pod: {{$v.labels.pod}}
########prometheus: {{$v.labels.prometheus}}
########service: {{$v.labels.service}}
########severity: {{$v.labels.severity}}
##### {{$v.annotations.description}}
{{end}}
{{ end }}

json:

{
	"receiver": "dingtalk-webhook-1",
	"status": "firing",
	"alerts": [{
		"status": "firing",
		"labels": {
			"alertname": "AlertmanagerFailedReload",
			"endpoint": "web",
			"instance": "10.200.221.95:9093",
			"job": "alertmanager-main",
			"namespace": "monitoring",
			"pod": "alertmanager-main-1",
			"prometheus": "monitoring/k8s",
			"service": "alertmanager-main",
			"severity": "warning"
		},
		"annotations": {
			"message": "Reloading Alertmanager's configuration has failed for monitoring/alertmanager-main-1."
		},
		"startsAt": "2020-09-12T08:13:18.582585835Z",
		"endsAt": "0001-01-01T00:00:00Z",
		"generatorURL": "http://prometheus-k8s-1:9090/graph?g0.expr=alertmanager_config_last_reload_successful%7Bjob%3D%22alertmanager-main%22%2Cnamespace%3D%22monitoring%22%7D+%3D%3D+0\u0026g0.tab=1"
	},{
		"status": "resolved",
		"labels": {
			"alertname": "AlertmanagerFailedReload",
			"endpoint": "web",
			"instance": "10.200.59.81:9093",
			"job": "alertmanager-main",
			"namespace": "monitoring",
			"pod": "alertmanager-main-2",
			"prometheus": "monitoring/k8s",
			"service": "alertmanager-main",
			"severity": "warning"
		},
		"annotations": {
			"message": "Reloading Alertmanager's configuration has failed for monitoring/alertmanager-main-2."
		},
		"startsAt": "2020-09-12T08:13:48.582585835Z",
		"endsAt": "2020-09-12T08:17:18.582585835Z",
		"generatorURL": "http://prometheus-k8s-0:9090/graph?g0.expr=alertmanager_config_last_reload_successful%7Bjob%3D%22alertmanager-main%22%2Cnamespace%3D%22monitoring%22%7D+%3D%3D+0\u0026g0.tab=1"
	}],
	"groupLabels": {
		"alertname": "AlertmanagerFailedReload",
		"service": "alertmanager-main"
	},
	"commonLabels": {
		"alertname": "AlertmanagerFailedReload",
		"endpoint": "web",
		"job": "alertmanager-main",
		"namespace": "monitoring",
		"prometheus": "monitoring/k8s",
		"service": "alertmanager-main",
		"severity": "warning"
	},
	"commonAnnotations": {},
	"externalURL": "http://alertmanager-main-0:9093",
	"version": "4",
	"groupKey": "{}:{alertname=\"AlertmanagerFailedReload\", service=\"alertmanager-main\"}"
}
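
The error message names line 6 of the template, which is the first line containing `{{$v.startsAt, timezone="Asia/Shanghai"}}`. Go templates do not accept a comma and keyword argument inside an action; a conversion has to be a function called in the action. Whether PrometheusAlert ships such a function is not shown on this page, so the sketch below registers its own: `toOffset` is a hypothetical helper name, and it uses a fixed UTC+8 offset rather than a named zone so that it does not depend on a time zone database being installed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
	"time"
)

// toOffset (hypothetical helper) parses an RFC 3339 timestamp and formats it
// at a fixed UTC offset given in hours, for example 8 for Asia/Shanghai.
func toOffset(hours int, ts string) (string, error) {
	t, err := time.Parse(time.RFC3339Nano, ts)
	if err != nil {
		return "", err
	}
	zone := time.FixedZone("", hours*3600)
	return t.In(zone).Format("2006-01-02 15:04:05"), nil
}

// tpl calls the helper as a function instead of using ", timezone=...".
const tpl = `{{ range $k, $v := .alerts }}Start: {{ toOffset 8 $v.startsAt }}
{{ end }}`

// render decodes the payload into generic maps and executes tpl.
func render(payload []byte) (string, error) {
	var data map[string]interface{}
	if err := json.Unmarshal(payload, &data); err != nil {
		return "", err
	}
	t, err := template.New("alert").Funcs(template.FuncMap{"toOffset": toOffset}).Parse(tpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := render([]byte(`{"alerts":[{"startsAt":"2020-09-12T08:13:18.582585835Z"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Print(out) // Start: 2020-09-12 16:13:18
}
```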

Displayed time is wrong, off by 8 hours

PrometheusAlertPrometheus故障告警信息
InstanceDown
告警级别: 信息
开始时间: 2020-04-08T07:25:49.905535979Z
结束时间: 0001-01-01T00:00:00Z
故障主机IP: 10.1x.x.x