qihoo360 / doraemon
Doraemon is a Prometheus based monitor system
License: GNU General Public License v3.0
alert-gateway reports an error:
[alerts.go:566] Insert alter failed:Error 1292: Incorrect datetime value: '0000-00-00' for column 'confirmed_at' at row 1
In the file doraemon/cmd/alert-gateway/models/alerts.go:
todayZero, _ := time.ParseInLocation("2006-01-02", "2019-01-01 15:22:22", time.Local)
The confirmed_at column is defined as datetime in the SQL schema.
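The mismatch between the layout and the value on the quoted line can be reproduced in isolation. The sketch below is not Doraemon code; `parseLocal` is a hypothetical helper used only to show that a date-only layout rejects a value that carries a time of day, in which case `time.ParseInLocation` returns an error together with the zero `time.Time`. Whether that zero value is what ends up in `confirmed_at` is an assumption based on the error message above.

```go
package main

import (
	"fmt"
	"time"
)

// parseLocal is a hypothetical helper: it parses value with layout in the
// local time zone and returns whatever time.ParseInLocation returns.
func parseLocal(layout, value string) (time.Time, error) {
	return time.ParseInLocation(layout, value, time.Local)
}

func main() {
	// Date-only layout, value with a time of day: parsing fails and the
	// returned time is the zero value.
	t, err := parseLocal("2006-01-02", "2019-01-01 15:22:22")
	fmt.Println(t.IsZero(), err != nil)

	// A layout that matches the value parses without error.
	t, err = parseLocal("2006-01-02 15:04:05", "2019-01-01 15:22:22")
	fmt.Println(t.IsZero(), err)
}
```

Because the original line discards the error with `_`, a failed parse would go unnoticed; checking the error is the simplest way to catch it.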
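Is there an up-to-date QR code for joining the group? I would like to join the WeChat group.
Hello, I ran through the Doraemon workflow locally and it works quite well. I am using the hook method for alerting, but when an alert recovers, the label values in the POSTed JSON are always None. Is something wrong with my configuration?
I gave it a quick run; there is still a long way to go...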
2020/07/24 11:48:22.738 [E] [alerts.go:482] Insert alter failed:Error 1292: Incorrect datetime value: '0000-00-00' for column 'confirmed_at' at row 1
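Thanks for open-sourcing this; it is a very nice project.
Use case: I have Prometheus instances in several regions, connected over dedicated lines. Because the dedicated lines are unstable, I would like to deploy one Alert-gateway per region so that triggered alerts are sent by the regional Alert-gateway, while rules are distributed centrally from a single Alert-gateway, for example in Beijing. Is this architecture feasible?
I deploy on k8s and want to use a ClusterIP service accessed through an ingress. I changed several places in doraemon.yml:
1. WebUrl = "http://doraemon.***.cn"
2. window.CONFIG = {
    baseURL: 'http://doraemon.***.cn',
};
3.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: doraemon-web
  name: doraemon-web
  namespace: monitoring
spec:
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
  selector:
    app: doraemon-web
After deployment, accessing the domain shows "no data returned".
The alertgateway container log reports:
2020/07/08 18:37:14.829 [C] [panic.go:522] Handler crashed with error runtime error: invalid memory address or nil pointer dereference
Is my configuration wrong, or is only NodePort service access supported at the moment?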
doraemon/cmd/alert-gateway/models/alerts.go
Line 421 in c7ce31f
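Hello! Are WeChat alerts supported?
Installation method: docker
Symptoms:
(1) After installing via docker and opening the address to log in, there is no place to enter a username and password.
(2) Clicking other function buttons does nothing.
Suggestion: could the documentation describe the troubleshooting process?
When adding an alert policy, is WeCom (Enterprise WeChat) alerting supported, and how should it be configured?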
Hi there,
I'm testing this project in a local environment. After manually enabling LDAP auth in the frontend app (by editing doraemon/web/app/page/base/app/login.js and changing chooseMethod from 'local' to 'ldap'), the backend log shows ' nomatch| POST /api/v1/login/ldap'.
Then I dug into the backend code in 'cmd/alert-gateway/controllers/login.go'. It turns out there is no // @router /ldap [post] and no LDAP authentication code there.
So LDAP authentication is in your roadmap, config file and documents, but just not implemented yet. Am I right?
Thanks for your work; it's a good idea and it's helpful, by the way.
The log is as follows:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x969ab0]
Hello,
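Does doraemon only support running on docker and k8s?
Can it be run locally? I cannot find any documentation.
To verify the alerting flow, I created a rule that monitors whether a host's node is up. After I stopped node_exporter, no record appeared in the alert history, and I saw the following errors in the gateway log: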
2020/05/25 15:35:20.049 [I] [controller.go:218] [{2020-05-25 15:27:35.044897044 +0800 CST {主机exporter无响应 主机exporter无响应 871} 2020-05-25 15:27:50.044897044 +0800 CST map[instance:192.168.0.2:9100 job:ops-eryajf-test-1] 2020-05-25 15:35:20.044897044 +0800 CST 0001-01-01 00:00:00 +0000 UTC 2 2020-05-25 15:38:20.044897044 +0800 CST 0}]
2020/05/25 15:35:20.050 [E] [alerts.go:566] Insert alter failed:Error 1292: Incorrect datetime value: '0000-00-00' for column 'confirmed_at' at row 1
2020/05/25 15:35:20.050 [D] [server.go:2774] | 172.19.0.4| 200 | 1.029839ms| match| POST /api/v1/alerts r:/api/v1/alerts/
2020/05/25 15:35:30.322 [E] [panic.go:522] Panic in UpdateMaintainlist:runtime error: invalid memory address or nil pointer dereference
goroutine 11 [running]:
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.UpdateMaintainlist.func1()
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:46 +0xb5
panic(0xa10f00, 0xffd010)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.UpdateMaintainlist()
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:69 +0x9c1
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.init.1.func1()
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:399 +0x64
created by github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.init.1
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:395 +0x35
I imported the SQL provided in the documentation.
Hi, how should a Prometheus data source that uses basic auth be handled?
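The source does not say whether Doraemon can authenticate against such a data source. As a general illustration of what a client has to do, the sketch below sends a Prometheus instant query with HTTP basic auth using Go's standard library. `queryWithBasicAuth`, the credentials, and the stand-in test server are all hypothetical; `/api/v1/query` is the standard Prometheus HTTP API query path.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// queryWithBasicAuth is a hypothetical helper: it runs an instant query
// against a Prometheus-compatible base URL, attaching basic-auth credentials.
func queryWithBasicAuth(baseURL, user, pass, expr string) (int, string, error) {
	target := baseURL + "/api/v1/query?query=" + url.QueryEscape(expr)
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return 0, "", err
	}
	req.SetBasicAuth(user, pass)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

// newFakePrometheus returns a stand-in server (not a real Prometheus) that
// rejects requests without the expected credentials.
func newFakePrometheus(user, pass string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok || u != user || p != pass {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, `{"status":"success"}`)
	}))
}

func main() {
	srv := newFakePrometheus("admin", "secret")
	defer srv.Close()
	code, body, err := queryWithBasicAuth(srv.URL, "admin", "secret", "up")
	fmt.Println(code, body, err)
}
```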
For now, the project initializes tables only if the database does not exist.
However, in some scenarios the database has already been created by a DBA, particularly in production environments. So I suppose we need full support for data migration, from creating the database and the table schemas to populating the necessary data.
With the beego migrate subcommand, it can be done with ease.
Suggestion: make the rule file storage path and persistent storage for tsdb data configurable in the rule-engine module.
Also, has anyone extended the project to support editing prometheus.yml? I would like a UI for configuring monitored nodes as well.
A connectivity test when adding a datasource, similar to Grafana's, would be useful.
ops-alert ops-alert:ops-alert@tcp(10.70.0.122:3306)/
must have one register DataBase alias named default
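Thank you very much. The world has suffered under alertmanager for too long.
My question is whether, beyond k8s and docker-compose, local deployment will be supported in the future, or whether the project could be started as a single binary.
Of course, for now I can follow what the compose file does and package each piece separately for local deployment.
But official documentation for local deployment would be ideal.
Thanks.
Prometheus over TLS is not supported.
After logging in to the web UI, clicking around the feature tabs on the left very often logs me out automatically, and I have to log in again.
Is the code on the master branch currently out of sync with the images? The images appear to be newer.
Sorry, I looked everywhere and could not find a group link, so I opened an issue to ask.
In the architecture diagram linked above, there is a call from prometheus-server to rule-engine annotated "push alerts", but in the doraemon source code rule-engine does not start any server.
How is this call implemented? Thanks.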
doraemon/cmd/alert-gateway/models/alerts.go
Line 127 in 55d03e0
Once the data in the alert table grows, the frontend has trouble handling it. I suggest splitting this into several endpoints with several calls. The following are only examples; please don't laugh.
Event confirmation module: endpoint for retrieving key:value information
func (u *Alerts) EventTagMeta() map[string][]string {
    records := make([]string, 0)
    _, err := Ormer().
        Raw("SELECT labels FROM alert WHERE status=2").QueryRows(&records)
    if err != nil {
        logs.Logger.Warning("get labels meta failed")
    }
    // Collect the values seen for each label key; a set per key removes duplicates.
    seen := make(map[string]map[string]struct{})
    for _, instance := range records {
        // Labels are stored as "key\avalue" pairs joined by "\v".
        for _, pair := range strings.Split(instance, "\v") {
            kv := strings.Split(pair, "\a")
            if len(kv) < 2 {
                continue // skip malformed pairs instead of indexing out of range
            }
            if _, ok := seen[kv[0]]; !ok {
                seen[kv[0]] = make(map[string]struct{})
            }
            seen[kv[0]][kv[1]] = struct{}{}
        }
    }
    // Flatten each set into a slice of unique values.
    meta := make(map[string][]string, len(seen))
    for key, values := range seen {
        for value := range values {
            meta[key] = append(meta[key], value)
        }
    }
    return meta
}
Event confirmation module: endpoint for the content shown when the page first opens
func (u *Alerts) GetEvents(pageNo, pageSize int64) ShowAlerts {
    var showAlerts ShowAlerts
    showAlerts.Alerts = []common.AlertForShow{}
    var records []record
    // Fetch one page of alerts with status=2, newest first.
    Ormer().
        Raw("SELECT id,rule_id,labels,value,count,status,summary,description,confirmed_by,fired_at,confirmed_at,confirmed_before,resolved_at FROM alert WHERE status=2 ORDER BY id DESC LIMIT ?,?",
            (pageNo-1)*pageSize, pageSize).
        QueryRows(&records)
    // Total number of matching rows, for pagination.
    Ormer().
        Raw("SELECT count(*) FROM alert WHERE status=2").
        QueryRow(&showAlerts.Total)
    for _, i := range records {
        showAlerts.Alerts = append(showAlerts.Alerts, i.toAlertForShow())
    }
    return showAlerts
}
Event confirmation module: endpoint used after a key:value pair is selected
func (u *Alerts) GetEvent(pageNo, pageSize int64, labels string) ShowAlerts {
    var showAlerts ShowAlerts
    showAlerts.Alerts = []common.AlertForShow{}
    var records []record
    // Fetch one page of alerts with status=2 whose labels contain the selection.
    Ormer().
        Raw("SELECT id,rule_id,labels,value,count,status,summary,description,confirmed_by,fired_at,confirmed_at,confirmed_before,resolved_at FROM alert WHERE status=2 AND labels LIKE ? ORDER BY id DESC LIMIT ?,?",
            "%"+labels+"%", (pageNo-1)*pageSize, pageSize).
        QueryRows(&records)
    // Total number of matching rows, for pagination.
    Ormer().
        Raw("SELECT count(*) FROM alert WHERE status=2 AND labels LIKE ?",
            "%"+labels+"%").
        QueryRow(&showAlerts.Total)
    for _, i := range records {
        showAlerts.Alerts = append(showAlerts.Alerts, i.toAlertForShow())
    }
    return showAlerts
}
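One caveat with a `LIKE` filter of this kind: `%` and `_` inside the `labels` argument act as wildcards, so a label such as `node_load1` would also match `nodeXload1`. A minimal sketch of escaping them before building the pattern follows. `escapeLike` is a hypothetical helper, not part of the proposal above, and it assumes MySQL's default `\` escape character for `LIKE`.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper rewrites the escape character and both LIKE wildcards in a
// single pass, so already-escaped output is never escaped again.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// escapeLike is a hypothetical helper that makes s match literally inside a
// LIKE pattern.
func escapeLike(s string) string {
	return likeEscaper.Replace(s)
}

func main() {
	// The surrounding % signs stay unescaped and keep their wildcard meaning.
	pattern := "%" + escapeLike("job_name=50%") + "%"
	fmt.Println(pattern)
}
```

The escaped string would then be passed as the bound parameter in place of the raw `labels` value.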
doraemon/cmd/alert-gateway/models/alerts.go
Line 370 in 931e065
    if instance.Status != 0 {
        if elemt.State == AlertStatusOff {
            recoverAlert(*a)
        }
        // Handles the case where the rule is deleted after its trigger condition has already fired.
        if elemt.ValidUntil.Unix()-elemt.LastSentAt.Unix() <= 0 {
            a.State = AlertStatusOff
            recoverAlert(*a)
        }
        Ormer().
            Raw("UPDATE alert SET summary=?,value=? WHERE id=?",
                elemt.Annotations.Summary, elemt.Value, instance.Id).
            Exec()
    } else {
        // Works around the occasional case where resolved_at is empty when an event recovers.
        Ormer().
            Raw("UPDATE alert SET summary=?,value=?, resolved_at=? WHERE id=?",
                elemt.Annotations.Summary, elemt.Value, elemt.ResolvedAt, instance.Id).
            Exec()
    }
}
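Hello, I looked at Doraemon today and most of the features are good. However, there is currently no place to configure alert severity. Could you consider adding this, or adding a label option so that users can define custom labels? The main purpose of severity levels is to let alert plans route different levels through different notification channels.
Example
node_load1 > 10
With this rule in effect, 100 events were generated. The values at the time the events fired fall mainly into two ranges:
Value range | Number of events |
---|---|
10 ~ 15 | 60 |
15 ~ 20 | 40 |
Dynamically change node_load1>10 to node_load1>15 and watch the notifications: events in the 10 ~ 15 range do not trigger a recovery notification.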
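After I configured RabbitMQ alerts, the following index-out-of-range error appears; alert rules for other middleware do not seem to trigger it. How can this be resolved?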
2020/09/03 10:12:37.994 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:37.998 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.001 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.004 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.008 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.012 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.015 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.019 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.022 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.026 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.029 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.033 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.036 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.039 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.043 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.046 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
2020/09/03 10:12:38.047 [D] [server.go:2774] | 127.0.0.1| 200 | 305.073647ms| match| POST /api/v1/alerts r:/api/v1/alerts/
panic: runtime error: index out of range
goroutine 31 [running]:
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.Filter(0xc00038bd98, 0xc00038bd68, 0xc00017c120, 0xc0001ee070)
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:353 +0x31d6
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.InitTimer.func5.2(0xc000258000, 0x13, 0xc00017c120)
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:792 +0x518
created by github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.InitTimer.func5
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:770 +0xd8
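When I choose hook as the alert notification method and enter the endpoint URL, the doraemon-backend container keeps restarting. Other notification methods do not have this problem.
Is DingTalk alerting currently supported?
Installation method: docker
Symptoms:
(1) After installing via docker and opening the address to log in, there is no place to enter a username and password.
The value in a recovery notification is still the value from when the alert fired. Could recovery notifications carry the metric's current, normal value (one that no longer exceeds the threshold) instead?
Environment: UCloud host + docker-compose
After setting up the environment, opening the home page shows the following error: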
error {data: "", msg: "服务器连接超时", code: -1}
2client.ad01.js:1 dark
vendor.dll.js:1 GET http://10.9.170.80:8080/api/v1/login/method net::ERR_CONNECTION_TIMED_OUT
Note: network rules have been configured
src: 192.168.7.x
dst: 192.168.14.249
err: Access to XMLHttpRequest at 'http://192.168.14.249:8080/api/v1/login/username' from origin 'http://192.168.14.249:32000' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
dockercompose_doraemon-frontend_1 /usr/local/openresty/bin/o ... Restarting
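doraemon-frontend keeps Restarting.
I tried building build/frontend/Dockerfile by hand,
changing openresty/openresty:1.15.8.1-1-centos to openresty/openresty:1.17.8.1-0-centos.
That solved the problem.
In practice, Prometheus is sometimes exposed to the public internet behind basic auth. Does adding such a data source support authentication? Also, can the data source address be an in-cluster k8s address, such as http://prometheus-k8s.monitoring.svc.cluster.local:9090?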