shaido987 / riskloc

Implementation of RiskLoc, a method for localizing multi-dimensional root causes.

Python 100.00%

adtributor autoroot hotspot multi-dimensional rca riskloc root-cause root-cause-analysis root-cause-location squeeze time-series

riskloc's Introduction

🎓 I'm a Senior Research Engineer at Noah's Ark Lab at Huawei in Hong Kong.
🔆 My work has mainly been on time series, including anomaly detection, forecasting, root cause analysis/localization, and spatio-temporal graphs. These days I'm getting more involved with robotics.
🌱 As my previous work suggests, my research interests are quite broad. My focus right now is specifically on LLMs and dexterous manipulation.

riskloc's Issues

Question about surprise in Adtributor

The calculation of the surprise value in Adtributor does not seem correct to me.

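The JS divergence formula should be:

$$D_{JS}(P \,\|\, Q) = \frac{1}{2}\sum_i p_i \log\frac{p_i}{m_i} + \frac{1}{2}\sum_i q_i \log\frac{q_i}{m_i}, \qquad m_i = \frac{p_i + q_i}{2}$$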

So, the code should be:

p = df['predict'] / F
q = df['real'] / A
m = (p + q) / 2
df['surprise'] = 0.5 * np.sum(p * np.log(p/m)) + 0.5 * np.sum(q * np.log(q/m))

What do you think? Thanks.
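To make the proposal concrete, here is a minimal sketch of the Jensen-Shannon terms computed per row. It is not the repository's code. It assumes that `F` and `A` are the column totals of `predict` and `real`, and the helper name `js_surprise` is made up for illustration. Note that a single `np.sum` over the whole column gives one scalar for the dimension, whereas the per-row terms below keep one surprise value per element and add up to that scalar.

```python
import numpy as np
import pandas as pd


def js_surprise(df: pd.DataFrame) -> pd.Series:
    """Per-row Jensen-Shannon terms between forecast and actual shares.

    Hypothetical helper: assumes F and A are the totals of the
    'predict' and 'real' columns.
    """
    F = df["predict"].sum()  # assumed meaning of F
    A = df["real"].sum()     # assumed meaning of A
    p = (df["predict"] / F).to_numpy(dtype=float)
    q = (df["real"] / A).to_numpy(dtype=float)
    m = (p + q) / 2.0

    def half_kl(x: np.ndarray) -> np.ndarray:
        # x * log(x / m), using the convention 0 * log(0) = 0
        out = np.zeros_like(x)
        mask = x > 0
        out[mask] = x[mask] * np.log(x[mask] / m[mask])
        return out

    return pd.Series(0.5 * (half_kl(p) + half_kl(q)), index=df.index)


df = pd.DataFrame({"predict": [50.0, 30.0, 20.0], "real": [20.0, 30.0, 50.0]})
surprise = js_surprise(df)      # one value per element
total_jsd = surprise.sum()      # the full JS divergence of the dimension
print(surprise)
print(total_jsd)
```

The second row has identical forecast and actual shares, so its term is zero; the first and third rows are mirror images and contribute equally.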

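HotSpot method: interpretability of PS as a measure of root-cause confidence

Hello, the PS (potential score) method uses the RE (ripple effect) to measure the confidence that something is a root cause. How should the principle behind PS be understood?

Many people guess something like the following:
If an attribute value is a root cause, then the change in the attribute value and the changes in its samples follow the ripple effect;
if the change in the attribute value and the changes in its samples follow the ripple effect, then the attribute value is a root cause.

Is this understanding correct?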
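The relationship in question can be illustrated with a small sketch. It follows my reading of the HotSpot description: the change of a candidate is spread over its leaves in proportion to their forecasts, and the potential score is max(1 - d(v, a) / d(v, f), 0) with Euclidean distance. The function name and the toy data are hypothetical, and this is not the repository's implementation.

```python
import numpy as np


def potential_score(f, v, in_candidate):
    """Potential score of one candidate under the ripple-effect assumption.

    f, v: forecast and actual values of all leaf elements.
    in_candidate: boolean mask of the leaves covered by the candidate.
    """
    f = np.asarray(f, dtype=float)
    v = np.asarray(v, dtype=float)
    mask = np.asarray(in_candidate, dtype=bool)

    f_sum = f[mask].sum()
    d_vf = np.linalg.norm(v - f)
    if f_sum == 0 or d_vf == 0:
        return 0.0

    h = f_sum - v[mask].sum()                # total change of the candidate
    a = f.copy()                             # deduced values
    a[mask] = f[mask] - h * f[mask] / f_sum  # spread h proportionally to forecast
    return max(1.0 - np.linalg.norm(v - a) / d_vf, 0.0)


f = [100.0, 50.0, 50.0, 200.0]
v = [50.0, 25.0, 50.0, 200.0]   # leaves 0 and 1 both dropped by 50%

ps_good = potential_score(f, v, [True, True, False, False])
ps_bad = potential_score(f, v, [True, False, True, False])
print(ps_good, ps_bad)
```

Here the candidate covering leaves 0 and 1 reproduces the actual values exactly and gets a score of 1, while a candidate that mixes a changed and an unchanged leaf scores much lower. The score measures how well the ripple-effect assumption explains the observed values; whether a high score alone proves causation is exactly what the question asks.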

About dataset generation

Hello,
I have a question about the anomaly injection method (scale_anomaly) in generate_dataset.py. Why is the larger of row*(1-r) and 0 taken? This causes the predicted value of some abnormal combinations to be 0, so they are filtered out when using Squeeze.

The result of Adtributor

Hi, I ran the Adtributor algorithm on the B0 dataset but got 0 TP (true positives). Is there something wrong with the code?

Question about the value of "n_remove" in riskloc

Hi,

I hope you are well.

When I used riskloc on my dataset, I noticed that it could precisely find the root causes. However, my goal is to find the anomalies that occur more frequently, so I would treat the rare root causes it found as outliers. I tried increasing the value of "n_remove", but still did not get the result I expected.

Also, when I decreased "n_remove" to 1, the "cutoff" value shifted a lot and the output was empty. When I did the same on another dataset, the result was not affected. I compared the distributions of the measurements in the two datasets: the first is closer to a normal distribution, while the second is long-tailed.

Here are my questions:

Is adjusting n_remove a way to achieve what I want? If so, is there a more reliable way than setting constants arbitrarily?
Does the distribution of the measurement values affect the performance of the algorithm?

I am looking forward to your reply.
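The sensitivity described above can be reproduced with a toy sketch. It rests on an assumption that may not match the repository code: the deviation score is 2(f - v)/(f + v), so it is positive when the actual value is below the forecast, and the cutoff is the mirrored extreme of the scores on the normal side after dropping the `n_remove` most extreme ones. The helper `cutoff_point` and the data are hypothetical.

```python
import numpy as np


def cutoff_point(deviation, n_remove, actual_below_forecast=True):
    """Hypothetical helper: mirrored-extreme cutoff after trimming n_remove scores."""
    d = np.sort(np.asarray(deviation, dtype=float))
    if not 0 <= n_remove < d.size:
        raise ValueError("n_remove must be in [0, number of leaves)")
    if actual_below_forecast:
        # Anomalous leaves have large positive scores; normal noise is assumed
        # symmetric around 0, so mirror the most negative score that is kept.
        return -d[n_remove]
    # Opposite direction: mirror the largest score that is kept.
    return -d[d.size - 1 - n_remove]


# Long-tailed toy scores: two extreme negative values act as outliers.
deviation = [-1.5, -0.9, -0.1, -0.05, 0.0, 0.05, 0.1, 0.8, 0.9, 1.0]

for n_remove in (0, 1, 2):
    c = cutoff_point(deviation, n_remove)
    abnormal = [x for x in deviation if x > c]
    print(n_remove, c, abnormal)
```

With these scores the cutoff moves from 1.5 to 0.9 to 0.1 as `n_remove` goes from 0 to 2, and the abnormal set grows from empty to three leaves. Under this assumption a long-tailed distribution makes the result depend heavily on `n_remove`, while a tightly concentrated one would barely change.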

input data

Hi, I have a question about the input data: how is the data passed to the different algorithms? Thanks a lot in advance.
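As a rough illustration of what such input might look like, the sketch below builds a table with one row per leaf element (one combination of attribute values). The column names `real` and `predict` are the ones used in the Adtributor issue above; the attribute names `region` and `device`, the data, and the one-row-per-leaf layout are assumptions for illustration and are not confirmed by the repository.

```python
import pandas as pd

# Raw records: several rows may share the same attribute combination.
records = pd.DataFrame({
    "region": ["eu", "eu", "us", "us", "us"],
    "device": ["ios", "ios", "ios", "web", "web"],
    "real": [10.0, 5.0, 20.0, 7.0, 3.0],        # actual values
    "predict": [12.0, 6.0, 20.0, 15.0, 5.0],    # forecast values
})

# Hypothetical attribute (dimension) columns.
attributes = ["region", "device"]

# Aggregate to one row per leaf element.
leaves = records.groupby(attributes, as_index=False)[["real", "predict"]].sum()
print(leaves)
```

The resulting `leaves` frame and the `attributes` list have the same shape as the `df` and `attributes` arguments in the `squeeze` signature quoted in a later issue.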

About the forecasts

Thanks for your excellent work! It really helped me a lot.

Recently, I have also been focusing on problems in this area (multi-dimensional root cause analysis), and as you mentioned:

In practice, I found that the most difficult step is to get accurate forecasting values for all leaf elements. Since these are usually quite fine-grained, they don't actually have much data and any forecasts are often inaccurate. This can skew the results.

I've also found it very difficult to get forecast values for all leaves. In particular, certain combinations have only a few values or are almost 0. Is there a suitable forecasting method you would recommend in this case?

Or have you tried using the RiskLoc algorithm in a real industrial scenario? If so, could you share which forecasting method you used?

demo for squeeze

Hi,

I am trying to understand the Squeeze algorithm, and someone recommended your project. However, I can't follow the details of how each function works, especially the meaning of the inputs of def squeeze(df, attributes, delta_threshold=0.9, debug=False), line 124 in squeeze.py. Could you provide a demo in the future? Thank you.

Best,
YYL
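A small sketch may help with the `df` and `attributes` inputs. It does not call the repository; it only builds a plausible input and computes the deviation score 2(f - v)/(f + v) that Squeeze, as I understand the paper, uses to group leaves. The attribute names and data are hypothetical, and the meaning of `delta_threshold` is deliberately left out because the issue does not state it.

```python
import pandas as pd

# One row per leaf element; 'real' is the actual value, 'predict' the forecast.
df = pd.DataFrame({
    "region": ["eu", "eu", "us", "us"],
    "device": ["ios", "web", "ios", "web"],
    "real": [50.0, 40.0, 100.0, 0.0],
    "predict": [100.0, 80.0, 100.0, 0.0],
})
attributes = ["region", "device"]  # hypothetical attribute columns

# Deviation score per leaf; leaves with real + predict == 0 get a score of 0.
total = df["predict"] + df["real"]
df["deviation"] = (
    2.0 * (df["predict"] - df["real"]) / total.where(total != 0)
).fillna(0.0)
print(df)
```

Both `eu` leaves dropped by the same proportion and therefore share the same deviation score, while the `us` leaves score 0. Leaves with similar scores are the ones a clustering step would group together before searching for the attribute combination that explains them.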

Cannot access the datasets

I cannot access the datasets at this link: "All datasets are available at Tsinghua Cloud: https://cloud.tsinghua.edu.cn/d/aa4102a5d1614e57bc36/"
Could you provide a cloud drive link instead?
