
doraemon's People

Contributors

70data, caodonghui123, cx2c, dd01tianhua, dependabot[bot], doublemine, huangwei2013, icemintchocolate, jayryu, jsvisa, lujiajing1126, sencoder


doraemon's Issues

alert-gateway error: writing alert records to the database fails

alert-gateway reports the following error:
[alerts.go:566] Insert alter failed:Error 1292: Incorrect datetime value: '0000-00-00' for column 'confirmed_at' at row 1

In doraemon/cmd/alert-gateway/models/alerts.go:
todayZero, _ := time.ParseInLocation("2006-01-02", "2019-01-01 15:22:22", time.Local)

The SQL schema defines confirmed_at as datetime.

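Label values are empty in alert recovery notifications

Hello, I ran through Doraemon's workflow locally and it works well. I use hook-based alerting, but when an alert recovers, the label value in the JSON that is POSTed to my hook is always None. Is something wrong with my configuration?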
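"None" is how Python prints a JSON null, so the receiving script is seeing a null or missing labels field. A quick way to tell whether the gateway sends it that way, or whether the receiver loses it, is to log the raw request body before decoding anything. The sketch below assumes, hypothetically, that the hook body is a JSON object with a top-level labels field; hookPayload, checkLabels, and hookHandler are illustrative names and should be adapted to the payload Doraemon actually sends.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// hookPayload is a hypothetical shape; only the labels field is inspected.
type hookPayload struct {
	Labels map[string]string `json:"labels"`
}

// checkLabels returns an error when labels is missing or null in body.
func checkLabels(body []byte) (map[string]string, error) {
	var p hookPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, err
	}
	if p.Labels == nil {
		return nil, fmt.Errorf("labels missing or null")
	}
	return p.Labels, nil
}

// hookHandler logs the raw body first, then any problem with labels.
func hookHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	log.Printf("raw hook body: %s", body)
	if _, err := checkLabels(body); err != nil {
		log.Printf("labels problem: %v", err)
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	// Exercise the handler in-process, without opening a port.
	req := httptest.NewRequest(http.MethodPost, "/hook", strings.NewReader(`{"labels":null}`))
	rec := httptest.NewRecorder()
	hookHandler(rec, req)
	fmt.Println(rec.Code) // 200
}
```

Registering hookHandler with http.HandleFunc on a real port and pointing the hook URL at it would show exactly what arrives on recovery.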

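Alerting problems

I ran into two problems while using Doraemon. Could you check whether they are code bugs or misconfiguration on my side?

  1. With alerting via HOOK (email): after the monitored item returned to normal and the web UI already showed the alert as recovered (querying Prometheus' web UI also returned a normal value), I kept intermittently receiving alert emails. The value in those emails was the value from when the alert first fired.

  2. The alert start time received via HOOK differs from the alert start time shown in the web UI.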

The alert rule and alert plan configuration were attached as two screenshots (not reproduced here).

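Alert-gateway question

Thanks for open-sourcing this; it's a great project.
Scenario: I run Prometheus in several regions, interconnected over dedicated lines. Because those links are unreliable, I'd like to deploy one Alert-gateway per region so that alerts fired in a region are sent by that region's Alert-gateway, while rules are distributed centrally by a single Alert-gateway (say, in Beijing). Would this architecture work?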

Kubernetes deployment: Ingress as the entry point

I deploy on Kubernetes and want to use a ClusterIP Service accessed through an Ingress. I changed a few things in doraemon.yml:
1. WebUrl = "http://doraemon.***.cn"

2. window.CONFIG = {
  baseURL: 'http://doraemon.***.cn',
};

3. apiVersion: v1
kind: Service
metadata:
  labels:
    app: doraemon-web
  name: doraemon-web
  namespace: monitoring
spec:
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
  selector:
    app: doraemon-web

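After deployment, opening the domain shows "No data received".
The alertgateway container log shows:
2020/07/08 18:37:14.829 [C] [panic.go:522] Handler crashed with error runtime error: invalid memory address or nil pointer dereference

Did I misconfigure something, or is access currently supported only through a NodePort Service?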

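After installation, the UI does not work

Installation method: Docker

Symptoms:
(1) After installing with Docker and opening the address, there is no field for entering a username and password.
(2) Clicking any other function button does nothing.

Suggestion: could the documentation describe how to troubleshoot such problems?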

no LDAP authentication method

Hi there,
I'm testing this project in a local environment. After manually enabling LDAP auth in the frontend app (by editing doraemon/web/app/page/base/app/login.js and changing chooseMethod from 'local' to 'ldap'), the backend log shows ' nomatch| POST /api/v1/login/ldap'.
Then I dug into the backend code in 'cmd/alert-gateway/controllers/login.go', and it turns out there is no '// @router /ldap [post]' route and no LDAP authentication code there.

So LDAP authentication appears in your roadmap, config file, and documentation, but is just not implemented yet. Am I right?

Thanks for your work, by the way; it's a good idea and very helpful.

Kubernetes deployment: web keeps restarting

Logs:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x969ab0]

Support running locally

Hello,
Does doraemon only support running on Docker and k8s?
Can it be run locally? I cannot find any documentation for that.

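Error when validating a rule

To verify the alerting flow, I created a rule that checks whether a host's node exporter is up. After I stopped node_exporter, nothing appeared in the alert history, and the gateway log showed the following errors: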

2020/05/25 15:35:20.049 [I] [controller.go:218]  [{2020-05-25 15:27:35.044897044 +0800 CST {主机exporter无响应 主机exporter无响应 871} 2020-05-25 15:27:50.044897044 +0800 CST map[instance:192.168.0.2:9100 job:ops-eryajf-test-1] 2020-05-25 15:35:20.044897044 +0800 CST 0001-01-01 00:00:00 +0000 UTC 2 2020-05-25 15:38:20.044897044 +0800 CST 0}]

2020/05/25 15:35:20.050 [E] [alerts.go:566]  Insert alter failed:Error 1292: Incorrect datetime value: '0000-00-00' for column 'confirmed_at' at row 1
2020/05/25 15:35:20.050 [D] [server.go:2774]  |     172.19.0.4| 200 |   1.029839ms|   match| POST     /api/v1/alerts   r:/api/v1/alerts/
2020/05/25 15:35:30.322 [E] [panic.go:522]  Panic in UpdateMaintainlist:runtime error: invalid memory address or nil pointer dereference
goroutine 11 [running]:
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.UpdateMaintainlist.func1()
	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:46 +0xb5
panic(0xa10f00, 0xffd010)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.UpdateMaintainlist()
	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:69 +0x9c1
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.init.1.func1()
	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:399 +0x64
created by github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.init.1
	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:395 +0x35

I imported the SQL provided in the documentation.

Full support for database migration

For now, the project initializes tables only if the database does not exist.

However, in some scenarios the database has already been created by a DBA, particularly in production. So I suggest fully supporting data migration: creating the database, creating the table skeletons, and populating the necessary data.

With the beego migrate subcommand, this can be done easily.

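[Feature suggestion] Alert rule management view

1. Could the alert rule filter offer drop-down menus for data source and alert plan?

2. Could alert rules be organized into rule groups? With many rules the list becomes cluttered, and grouping them would make them easier to manage. If both individual rules and rule groups could be bound to alert plans, it would also be more flexible.

(Screenshot: Snipaste_2020-08-07_09-23-09)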

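Will a local deployment option be provided?

Thank you very much. We have all suffered with Alertmanager for far too long.
My question: besides k8s and docker-compose, will you provide a local deployment option in the future, or ship it as a single binary?
For now, of course, I can follow the compose file and package each component for local deployment myself.
But an official local deployment guide would be ideal.
Thanks.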

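Web UI logs out unexpectedly

After logging in to the web UI, randomly clicking the function tabs on the left very often logs me out, and I have to log in again.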

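Confusion about the "push alerts" call

Architecture

In the architecture diagram at the link above, there is a call from prometheus-server to rule-engine labeled "push alerts", but in the doraemon source the rule-engine does not start any server.

How is this call implemented? Thanks.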

Once the alert table grows, the frontend struggles to handle the data. Suggest splitting into more APIs

func (u *Alerts) ClassifyAlerts() map[string]map[string][]OneAlert {

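Once the alert table grows, the frontend has trouble handling the data. My suggestion is to split it into several APIs that are called separately. Everything below is just an example, so please don't laugh.
Event confirmation module: API for retrieving key:value label metadata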

// EventTagMeta returns, for every label key found on alerts with status=2,
// the de-duplicated list of values seen for that key.
// Labels are stored as "key\avalue" pairs joined by "\v".
func (u *Alerts) EventTagMeta() map[string][]string {
	records := make([]string, 0)
	_, err := Ormer().
		Raw("SELECT labels FROM alert WHERE status=2").QueryRows(&records)
	if err != nil {
		logs.Logger.Warning("get labels meta failed")
	}

	seen := make(map[string]map[string]bool) // key -> set of values already added
	result := make(map[string][]string)
	for _, labels := range records {
		for _, pair := range strings.Split(labels, "\v") {
			kv := strings.SplitN(pair, "\a", 2)
			if len(kv) != 2 {
				// Skip malformed pairs instead of panicking on kv[1].
				continue
			}
			key, value := kv[0], kv[1]
			if seen[key] == nil {
				seen[key] = make(map[string]bool)
			}
			if !seen[key][value] {
				seen[key][value] = true
				result[key] = append(result[key], value)
			}
		}
	}
	return result
}

Event confirmation module: API for loading content when the page first opens

func (u *Alerts) GetEvents(pageNo, pageSize int64) ShowAlerts {
	var showAlerts ShowAlerts
	showAlerts.Alerts = []common.AlertForShow{}
	var records []record

	_, err := Ormer().
		Raw("SELECT id,rule_id,labels,value,count,status,summary,description,confirmed_by,fired_at,confirmed_at,confirmed_before,resolved_at FROM alert WHERE status=2 ORDER BY id DESC LIMIT ?,?",
			(pageNo-1)*pageSize, pageSize).
		QueryRows(&records)
	if err != nil {
		logs.Logger.Warning("get events failed")
	}

	if err := Ormer().
		Raw("SELECT count(*) FROM alert WHERE status=2").
		QueryRow(&showAlerts.Total); err != nil {
		logs.Logger.Warning("count events failed")
	}

	for _, i := range records {
		showAlerts.Alerts = append(showAlerts.Alerts, i.toAlertForShow())
	}
	return showAlerts
}

Event confirmation module: API used after a label (key:value entry) is selected

func (u *Alerts) GetEvent(pageNo, pageSize int64, labels string) ShowAlerts {
	var showAlerts ShowAlerts
	showAlerts.Alerts = []common.AlertForShow{}
	var records []record
	pattern := "%" + labels + "%" // substring match on the stored label string

	_, err := Ormer().
		Raw("SELECT id,rule_id,labels,value,count,status,summary,description,confirmed_by,fired_at,confirmed_at,confirmed_before,resolved_at FROM alert WHERE status=2 AND labels LIKE ? ORDER BY id DESC LIMIT ?,?",
			pattern, (pageNo-1)*pageSize, pageSize).
		QueryRows(&records)
	if err != nil {
		logs.Logger.Warning("get events by labels failed")
	}

	if err := Ormer().
		Raw("SELECT count(*) FROM alert WHERE status=2 AND labels LIKE ?", pattern).
		QueryRow(&showAlerts.Total); err != nil {
		logs.Logger.Warning("count events by labels failed")
	}

	for _, i := range records {
		showAlerts.Alerts = append(showAlerts.Alerts, i.toAlertForShow())
	}
	return showAlerts
}
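Both paging queries compute the MySQL offset as (pageNo-1)*pageSize, so a pageNo of 0 or a non-positive pageSize passed straight from a request would yield a negative offset or limit, which MySQL would most likely reject. A minimal sketch of validating these parameters before they reach LIMIT ?,?; pageParams, the default page size, and the upper bound are hypothetical choices, not Doraemon's.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultPageSize = 20  // hypothetical default
	maxPageSize     = 200 // hypothetical upper bound
)

// pageParams parses raw pageNo/pageSize strings and returns a safe
// (offset, limit) pair for "LIMIT ?,?".
func pageParams(pageNoStr, pageSizeStr string) (offset, limit int64) {
	pageNo, err := strconv.ParseInt(pageNoStr, 10, 64)
	if err != nil || pageNo < 1 {
		pageNo = 1
	}
	pageSize, err := strconv.ParseInt(pageSizeStr, 10, 64)
	if err != nil || pageSize < 1 {
		pageSize = defaultPageSize
	}
	if pageSize > maxPageSize {
		pageSize = maxPageSize
	}
	return (pageNo - 1) * pageSize, pageSize
}

func main() {
	fmt.Println(pageParams("3", "50"))   // 100 50
	fmt.Println(pageParams("0", ""))     // 0 20
	fmt.Println(pageParams("2", "1000")) // 200 200
}
```

Capping pageSize also bounds how much the frontend has to render per request, which is the concern raised in this issue.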

Determining whether an event has recovered

if len(queryres) > 0 {

Isn't there a problem with the logic in this part of the code?

	if instance.Status != 0 {
		if elemt.State == AlertStatusOff {
			recoverAlert(*a)
		}
		// Guards against the case where the trigger rule is deleted after its condition was met
		if elemt.ValidUntil.Unix()-elemt.LastSentAt.Unix() <= 0 {
			a.State = AlertStatusOff
			recoverAlert(*a)
		}

		Ormer().
			Raw("UPDATE alert SET summary=?,value=? WHERE id=?",
				elemt.Annotations.Summary, elemt.Value, instance.Id).
			Exec()
	} else {
		// Guards against the occasional case where resolved_at is empty when an event recovers
		Ormer().
			Raw("UPDATE alert SET summary=?,value=?, resolved_at=? WHERE id=?",
				elemt.Annotations.Summary, elemt.Value, elemt.ResolvedAt, instance.Id).
			Exec()
	}
}

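Alert severity settings

Hello, I looked at Doraemon today and most features are good, but there seems to be no place to configure alert severity. Could you consider adding this, or an option for setting labels so users can define their own? The point of severity levels is that alert plan management could then choose which notification channel to use for each level.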

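[bug] rule-engine: querying rule records

Rule records produced while the rule engine runs.

While the Rule-engine module is running, dynamically adjusting a threshold does not trigger recovery notifications for events.

Example

Trigger rule

node_load1 > 10
With this rule, 100 events were produced. Their values at firing time fell mainly into two ranges:

Value range    Number of events
10 ~ 15        60
15 ~ 20        40

After dynamically changing node_load1 > 10 to node_load1 > 15 and watching the notifications, the events in the 10 ~ 15 range did not trigger recovery notifications.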
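One way to produce the missing recovery notifications, sketched under assumptions: after a rule's expression changes, compare the label sets that were firing under the old expression with those returned by the new one, and treat every label set that disappeared as resolved. Each alert is identified here by its serialized label string (the queries elsewhere on this page store labels as a single string); resolvedAfterUpdate and the instance labels are hypothetical, not the rule engine's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// resolvedAfterUpdate returns the label sets that were firing under the old
// rule but are not returned by the updated rule, sorted for stable output.
func resolvedAfterUpdate(before, after []string) []string {
	stillFiring := make(map[string]struct{}, len(after))
	for _, labels := range after {
		stillFiring[labels] = struct{}{}
	}
	var resolved []string
	for _, labels := range before {
		if _, ok := stillFiring[labels]; !ok {
			resolved = append(resolved, labels)
		}
	}
	sort.Strings(resolved)
	return resolved
}

func main() {
	before := []string{"instance=a", "instance=b", "instance=c"} // firing under node_load1 > 10
	after := []string{"instance=c"}                              // returned by node_load1 > 15
	fmt.Println(resolvedAfterUpdate(before, after))              // [instance=a instance=b]
}
```

Each label set in the result would then go through the normal recovery path, exactly as if its series had dropped below the old threshold.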

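Insert alter failed: index out of range

After I configured RabbitMQ alerts, the following index-out-of-range error appears; alert rules for other middleware do not seem to trigger it. How can I fix this?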
2020/09/03 10:12:37.994 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster�rabbit@do-ns-dev-ops-rabbitmq-172-19-0-122�durable�fa' for key 'ruleid_labels_firedat'
(The same line is logged 16 times in total, from 10:12:37.994 to 10:12:38.046.)
2020/09/03 10:12:38.047 [D] [server.go:2774] | 127.0.0.1| 200 | 305.073647ms| match| POST /api/v1/alerts r:/api/v1/alerts/
panic: runtime error: index out of range
goroutine 31 [running]:
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.Filter(0xc00038bd98, 0xc00038bd68, 0xc00017c120, 0xc0001ee070)
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:353 +0x31d6
github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.InitTimer.func5.2(0xc000258000, 0x13, 0xc00017c120)
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:792 +0x518
created by github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.InitTimer.func5
/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:770 +0xd8

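doraemon-backend container keeps restarting

After I select hook as the alert notification method and enter the endpoint URL, the doraemon-backend container keeps restarting; selecting any other notification method does not cause this.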

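UI buttons do not respond to clicks

Installation method: Docker

Symptoms:
(1) After installing with Docker and opening the address, there is no field for entering a username and password.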

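Alert recovery information

In recovery notifications, the value is still the one recorded when the alert fired. Could recovery notifications instead fetch the monitored item's current, normal value (no longer past the threshold)?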

doraemon-frontend keeps restarting

dockercompose_doraemon-frontend_1 /usr/local/openresty/bin/o ... Restarting
doraemon-frontend stays in the Restarting state.

I tried building build/frontend/Dockerfile manually, changing openresty/openresty:1.15.8.1-1-centos to openresty/openresty:1.17.8.1-0-centos.

That solved the problem.
