thuml / anomaly-transformer

About Code release for "Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight), https://openreview.net/forum?id=LzQQ89U1qm_

License: MIT License

Python 92.73% Shell 7.27%

anomaly-detection deep-learning time-series

anomaly-transformer's Issues

Could you open-source the THOC code?

Thank you.

Why is the combined energy used to get the threshold?

Hi. Thanks for your work.
In the code, both the train and test energy are used to get the threshold.
This seems to differ from the paper, which says: "The threshold δ is determined to make r proportion data of the validation dataset labeled as anomalies."
I wonder why the training data is included when computing the threshold in the code.
I'd really appreciate an answer.
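
The difference this question points at can be made concrete with a small sketch. It assumes, hypothetically, that anomaly scores ("energy") are plain lists of floats and that the threshold is chosen so that roughly r percent of the scores lie above it. The helper `threshold_for_ratio` and all sample values are illustrative, not taken from the repository.

```python
def threshold_for_ratio(scores, ratio_percent):
    """Return a threshold such that about ratio_percent % of scores lie strictly above it."""
    ordered = sorted(scores)
    n_anomalies = round(len(ordered) * ratio_percent / 100)
    # Keep the index inside the list even for extreme ratios.
    n_anomalies = max(0, min(n_anomalies, len(ordered) - 1))
    return ordered[len(ordered) - n_anomalies - 1]


# Made-up scores: the "test" scores are ten times larger than the "validation" scores.
val_energy = [i / 100 for i in range(1, 101)]   # 0.01 .. 1.00
test_energy = [i / 10 for i in range(1, 101)]   # 0.1 .. 10.0

thr_val = threshold_for_ratio(val_energy, 1.0)
thr_combined = threshold_for_ratio(val_energy + test_energy, 1.0)

print("validation only:", thr_val)
print("validation + test:", thr_combined)
```

With these made-up numbers the two thresholds differ by an order of magnitude, which is why the choice of data fed into the percentile matters for the reported metrics.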

Hello, may I ask why the results of these two lines of code are the same in test mode?

metric = torch.softmax((-series_loss - prior_loss), dim=-1)
metric = torch.softmax((series_loss + prior_loss), dim=-1)

How should I reproduce the performance of Anomaly-Transformer on SWaT

I want to reproduce the performance of Anomaly-Transformer on the SWaT dataset, but I don't know how the SWaT dataset is preprocessed. Can you help me? Thank you!

Data download website error

I try to download the data by clicking the link in README, but it goes to "Page unavailable". Are there any other ways to download?

The model scored poorly after commenting out the "detection adjustment" code

Hi, this is amazing work.
I have come across a small problem.
On the MSL dataset, the model performed well. The output looks like:
======================TEST MODE======================
Threshold : 0.0017330783803481142
pred: (73700,) gt: (73700,)
pred: (73700,) gt: (73700,)
Accuracy : 0.9853, Precision : 0.9161, Recall : 0.9473, F-score : 0.9314

But after I commented out the "detection adjustment" code, the score was poor:
======================TEST MODE======================
Threshold : 0.0017330783803481142
pred: (73700,) gt: (73700,)
pred: (73700,) gt: (73700,)
Accuracy : 0.8866, Precision : 0.1120, Recall : 0.0109, F-score : 0.0199

I'm sure that only the "detection adjustment" code was commented out.

Can you help me with this problem?
Thanks.

About "detection adjustment" in lines 339-360 of solver.py

Since some researchers are confused about the "detection adjustment", we provide some clarification here.

(1) Why use "detection adjustment"?

Firstly, I strongly suggest that researchers read the original paper, Xu et al., 2018, which gives a comprehensive explanation of this operation.

In our paper, we follow this convention for the following reasons:

Fair comparison: As we stated in the Implementation details section of our paper, the adjustment is a widely used convention in time series anomaly detection. In particular, on the benchmarks used in our paper, previous methods all use the adjustment operation for evaluation (Shen et al., 2020). Thus, we also adopt the adjustment for model evaluation.
Real-world meaning: One abnormal event causes a segment of abnormal time points, so the adjustment corresponds to the "abnormal event detection" task, which evaluates how well a model detects abnormal events in the whole record. This is a very meaningful task for real-world applications. Once we have detected the abnormal event, we can send a worker to check that time segment for security.

In summary, you can view the adjustment as an "evaluation protocol", which is to measure the capability of models in "abnormal event detection".

(2) We have provided a comprehensive and fair comparison in our paper.

All the baselines that we compared in our paper are also evaluated with this "adjustment". Note that this evaluation is widely used in the previous papers for the experiments on SMD, SWaT, and so on. Thus, the comparison is fair.
For a comprehensive analysis, we also provide a benchmark for the UCR dataset in Appendix L, which is from KDD Cup. The anomalies in this dataset are mostly recorded only at a single time point. Thus, if you want to obtain the comparison on single-time-point anomaly detection, this dataset can provide some intuitions.

If you still have questions about the adjustment, you are welcome to email me to discuss further ([email protected]).

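About the use of the UCR dataset

As far as I know, the UCR dataset consists of 250 different sub-datasets, each of which contains only one anomalous segment. What strategy did you use to test the model? Did you sample a few of the 250 sub-datasets?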

About 'self.test_labels[0:self.win_size]' in lines 187-192 of data_loader.py

I really appreciate your work. Thanks a lot for sharing it with us!

I am following your code step by step to understand the model and apply it to my own data.
While doing this, I ran into a question.

Lines 187-192 of data_loader.py read:

187 def __getitem__(self, index):
***
190 return np.float32(self.train[index:index + self.win_size]), np.float32(self.test_labels[0:self.win_size])
192 return np.float32(self.val[index:index + self.win_size]), np.float32(self.test_labels[0:self.win_size])

Shouldn't the '0' be changed to 'index'?
self.test_labels[0:self.win_size] -> self.test_labels[index:index + self.win_size]

If I have missed anything, please let me know.
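
The change proposed in this question can be illustrated with a toy sliding-window dataset. This is a minimal sketch with hypothetical names (`WindowedSeries`, `values`, `labels`); it only shows what index-aligned label slicing looks like and makes no claim about whether the labels matter in the repository's train and validation modes.

```python
class WindowedSeries:
    """Toy sliding-window dataset; a hypothetical stand-in for the loader discussed above."""

    def __init__(self, values, labels, win_size, step=1):
        assert len(values) == len(labels)
        self.values = values
        self.labels = labels
        self.win_size = win_size
        self.step = step

    def __len__(self):
        return (len(self.values) - self.win_size) // self.step + 1

    def __getitem__(self, index):
        start = index * self.step
        stop = start + self.win_size
        # Labels use the same bounds as the values, so each window stays aligned.
        return self.values[start:stop], self.labels[start:stop]


values = list(range(10))
labels = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
dataset = WindowedSeries(values, labels, win_size=4)

window, window_labels = dataset[3]
print(window, window_labels)
```

With a fixed slice such as `labels[0:win_size]`, every window would instead receive the labels of the first window.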

Some questions about Prior-Association

Thanks for sharing such great work. I have some questions about the prior-association in the code.

Anomaly-Transformer/model/attn.py

Lines 48 to 49 in 72a71e5

    
           sigma = torch.sigmoid(sigma * 5) + 1e-5 
        
           sigma = torch.pow(3, sigma) - 1

In this part, sigma goes through some additional mathematical processing.
Could you please explain the reasons for doing this?
I cannot find any reference to it in your paper.
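
The thread does not state the authors' motivation, but the numerical effect of the two quoted lines can be checked directly. The sketch below applies them to scalars in plain Python and feeds the result into a row-normalized Gaussian over distances, which is how the paper describes the prior-association. The function names are hypothetical, and the snippet is an illustration, not the repository's implementation.

```python
import math


def transform_sigma(raw_sigma):
    """Apply the two quoted lines to a single scalar."""
    squashed = 1.0 / (1.0 + math.exp(-5.0 * raw_sigma)) + 1e-5  # sigmoid(5x) + 1e-5
    return 3.0 ** squashed - 1.0                                 # 3^s - 1


def gaussian_prior_row(center, length, sigma):
    """Row-normalized Gaussian weights over distances |j - center| (illustrative)."""
    weights = [math.exp(-((j - center) ** 2) / (2.0 * sigma ** 2)) for j in range(length)]
    total = sum(weights)
    return [w / total for w in weights]


# Moderate inputs only: very negative values would overflow math.exp in this scalar version.
for raw in (-10.0, 0.0, 10.0):
    print(raw, transform_sigma(raw))

row = gaussian_prior_row(center=5, length=11, sigma=transform_sigma(0.0))
print([round(w, 3) for w in row])
```

Under this reading, the transform maps any learned value to a strictly positive scale bounded by roughly 2, so the Gaussian never receives a zero or negative standard deviation.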

A question about class AnomalyAttention

self.conv1 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=5, padding=2)
What does this line of code do? I commented it out and training still runs.

Why is step set to 100 when testing on SMD?

https://github.com/thuml/Anomaly-Transformer/blob/main/data_factory/data_loader.py#L204
Why is it 100 for SMD while it is 1 for all other datasets? The paper does not explain this in detail either.

About the trained models

Hello! Thank you for uploading this impressive work!

According to the paper, you have tested your method on several datasets. Could you share the already trained models on these datasets? Thank you very much!

About the temperature parameter

Hello, excellent paper 👍. I have a small question for the authors: when series_loss and prior_loss are computed at test time, a parameter temperature is introduced. What is the purpose of this parameter? Could it be replaced by 1/len(prior)?

                series_loss += my_kl_loss(series[u], (
                        prior[u] / torch.unsqueeze(torch.sum(prior[u], dim=-1), dim=-1).repeat(1, 1, 1,
                                                                                               self.win_size)).detach()) * temperature
                prior_loss += my_kl_loss(
                    (prior[u] / torch.unsqueeze(torch.sum(prior[u], dim=-1), dim=-1).repeat(1, 1, 1,
                                                                                            self.win_size)),
                    series[u].detach()) * temperature
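
One way to read the role of `temperature`, assuming the scaled losses are later passed through a softmax over the window (as in the `metric = torch.softmax(...)` lines quoted in another issue on this page), is as a sharpening factor. The sketch below uses made-up per-point values and a hand-written softmax; it is not the repository's code.

```python
import math


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up per-point discrepancy values inside one window.
discrepancy = [0.10, 0.12, 0.02, 0.11]

weights_by_temperature = {
    temperature: softmax([-temperature * d for d in discrepancy])
    for temperature in (1.0, 50.0)
}

for temperature, weights in weights_by_temperature.items():
    print(temperature, [round(w, 3) for w in weights])
```

Any positive constant keeps the ranking of points within a window, but it changes how peaked the softmax output is, so replacing the temperature with a much smaller constant such as 1/len(prior) would flatten the resulting weights.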

F1 and F1-PA

Hello,
I've encountered exactly the same issue as the previous issue and would like to ask your opinion.

To provide more detail:
Running the code without point adjustment (commenting out the PA part as in the previous issue), I got the following results:

Dataset	F1-PA	F1-PA (paper)	F1
MSL	0.9500	0.9359	0.0209
PSM	0.9750	0.9789	0.0217
SMAP	0.9636	0.9669	0.0189
SMD	0.8944	0.9233	0.0201

Although I agree that the F1-PA algorithm has a practical justification (an abnormal time point will cause an alert and make the whole segment noticed in real-world applications), (1) F1 seems to degrade too much, and (2) an AAAI paper raises concerns about the F1-PA metric: even random guessing can achieve a high F1-PA depending on the data distribution.

I would like to ask your opinion on these results.
Thanks in advance.
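
The gap between the two columns of the table can be reproduced on toy data. The sketch below is a minimal, hypothetical re-implementation of point adjustment (a ground-truth segment counts as fully detected if any point in it is predicted) together with a plain F1 computation. The arrays are made up and are not results from the model.

```python
def point_adjust(pred, gt):
    """Mark a whole ground-truth segment as detected if any point in it was predicted."""
    adjusted = list(pred)
    i = 0
    while i < len(gt):
        if gt[i] == 1:
            j = i
            while j < len(gt) and gt[j] == 1:
                j += 1
            if any(pred[i:j]):
                adjusted[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return adjusted


def f1_score(pred, gt):
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One anomalous segment of 100 points, one hit inside it, one false alarm outside it.
gt = [0] * 50 + [1] * 100 + [0] * 50
pred = [0] * 200
pred[60] = 1
pred[10] = 1

raw_f1 = f1_score(pred, gt)
pa_f1 = f1_score(point_adjust(pred, gt), gt)
print(round(raw_f1, 4), round(pa_f1, 4))
```

On this toy input a single hit inside a long segment lifts F1 from below 0.02 to above 0.99, which shows how strongly long segments can amplify the adjusted score.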

Data download website error

I tried to download the data by clicking the links in the README, but both lead to "Page unavailable". Are there any other ways to download it?

A bug occurs during backward propagation of loss1 and loss2 in the train function

Really excellent work!
I just hit a RuntimeError when the program reached line 191 in solver.py.
It says:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 55]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I am running Python 3.8 and PyTorch 1.6.0.
I am not sure whether this is a compatibility issue.

Thank you!

Minimax strategy and early_stopping

Hi, I have two questions about the Minimax strategy and early stopping:
1. loss1 maximizes series_association and loss2 minimizes prior_association. In the original paper, the order was minimize, then maximize. Why did it become maximize, then minimize in the code?
2. Why is score = -loss used for early stopping?

Got error: RuntimeError: All inputs of where should have same/compatible number of dims

The error is raised at loss2.backward().

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Hello,
When I run bash ./scripts/MSL.sh, I get the following error:
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 55]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

solver.py uses gt before the AUC test

Good work. I have a simple question.
solver.py line 338 # detection adjustment
In the detection adjustment stage, the code uses gt to adjust the prediction result, but gt cannot be used before the AUC test.
Why is this done? Is it OK?

Validation wrongly runs on test_loader. Unfair evaluation.

It seems that validation runs on test_loader rather than vali_loader, which is somewhat unfair and would make the results slightly different.

Anomaly-Transformer/solver.py

Line 196 in 72a71e5

vali_loss1, vali_loss2 = self.vali(self.test_loader)

Moreover, directly using thre_loader to find the threshold would leak the test dataset, since thre_loader is built on test_data rather than valid_data.

Anomaly-Transformer/solver.py

Line 254 in 72a71e5

for i, (input_data, labels) in enumerate(self.thre_loader):

Anomaly-Transformer/data_factory/data_loader.py

Lines 66 to 69 in 72a71e5

    
           else: 
        
               return np.float32(self.test[ 
        
                                 index // self.step * self.win_size:index // self.step * self.win_size + self.win_size]), np.float32( 
        
                   self.test_labels[index // self.step * self.win_size:index // self.step * self.win_size + self.win_size])
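
To see which slices the quoted indexing expression selects, the arithmetic can be isolated in a small helper. The function names below are hypothetical, and the `step` and `win_size` values are examples rather than the settings of any particular loader.

```python
def window_start(index, step, win_size):
    # // and * have equal precedence and associate left to right,
    # so this evaluates as (index // step) * win_size.
    return index // step * win_size


def window_bounds(index, step, win_size):
    start = window_start(index, step, win_size)
    return start, start + win_size


starts_step_1 = [window_start(i, step=1, win_size=100) for i in range(4)]
starts_step_100 = [window_start(i, step=100, win_size=100) for i in (0, 99, 100, 250)]

print(starts_step_1)
print(starts_step_100)
print(window_bounds(3, step=1, win_size=100))
```

With `step=1` consecutive indices select non-overlapping windows, while with `step=100` every block of 100 consecutive indices maps to the same window.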

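train_loss is negative, but the test results are correct. Why?

A gradient computation error occurred during reproduction

However, after commenting out the loss2.backward code, it runs.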

Questions on validation and early stopping

Hello,
When I trained the model on the PSM data, I found it difficult to avoid early stopping.
Is it hard to reach the best vali loss1 score and the best vali loss2 score at the same time, so that training stops so early?
vali loss1 score = -rec_loss + AssDiss
vali loss2 score = -rec_loss - AssDiss
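
The tension described in this question can be sketched with a hypothetical early stopper that counts an epoch as an improvement only when both scores improve. Whether the repository's early stopping uses exactly this rule is an assumption here. The class name, the patience value, and the `(rec_loss, ass_dis)` history are all made up.

```python
class TwoScoreEarlyStopping:
    """Hypothetical early stopper: an epoch improves only if both scores improve."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = None
        self.counter = 0
        self.stop = False

    def update(self, score1, score2):
        improved = self.best is None or (score1 > self.best[0] and score2 > self.best[1])
        if improved:
            self.best = (score1, score2)
            self.counter = 0
        else:
            self.counter += 1
            self.stop = self.counter >= self.patience
        return self.stop


# Made-up history: reconstruction loss falls slowly while the discrepancy term grows.
history = [(1.00, 0.10), (0.99, 0.20), (0.98, 0.30), (0.97, 0.40)]

stopper = TwoScoreEarlyStopping(patience=3)
flags = []
for rec_loss, ass_dis in history:
    # The two scores as written in the question: the discrepancy enters with opposite signs.
    flags.append(stopper.update(-rec_loss + ass_dis, -rec_loss - ass_dis))

print(flags)
```

Because the discrepancy term enters the two scores with opposite signs, both can improve together only when the reconstruction loss drops by more than the discrepancy changes, so under this rule the patience counter fills up quickly.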

About the additional computation for sigma

Can you explain the code in lines 48-49, which does not appear in the paper? Thank you.
In particular, why is a power with base 3 computed?

sigma = torch.sigmoid(sigma * 5) + 1e-5
sigma = torch.pow(3, sigma) - 1

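About the UCR dataset

Hello, I could not find the label files on the KDD2021 Multi-dataset Time Series Anomaly Detection website. Could you provide them? Thank you.

Why is "index:index + self.win_size" used to prepare the dataset during training?

In real applications, alerts need to be raised in real time, so the data for the window of 100 points after the current time step is not available. Am I misunderstanding something?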
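
For a streaming setting, one common alternative is to score the window that ends at the newest observation, so that no future data is needed. The sketch below shows only the buffering. `TrailingWindow` is a hypothetical helper and does not describe how the repository builds its windows.

```python
from collections import deque


class TrailingWindow:
    """Keeps the most recent win_size observations (hypothetical helper)."""

    def __init__(self, win_size):
        self.buffer = deque(maxlen=win_size)

    def push(self, value):
        self.buffer.append(value)
        # Return a full window ending at the newest point, or None while warming up.
        if len(self.buffer) == self.buffer.maxlen:
            return list(self.buffer)
        return None


stream = TrailingWindow(win_size=3)
outputs = [stream.push(v) for v in [10, 11, 12, 13]]
print(outputs)
```

Each returned window could then be passed to a trained model, with the score of its last position used as the score of the current time step.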

Dataset processing

Hi everyone,

Thank you for your interesting work. Would it be possible to get more information about the preprocessing? How was the raw dataset processed?

Some issues about baseline and SOTA methods

Many thanks for your excellent work!

I would like to know the details of the baseline and SOTA method code. I cannot find any official code for the LSTM, LSTM-VAE, BeatGAN, and THOC methods. Could you provide links or code repositories so that we can reproduce them?

Thanks again! Looking forward to your kind reply!

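Execution hangs at model instantiation

At runtime, the program hangs while instantiating the solver, specifically at self.build_model() -> self.model = AnomalyTransformer(win_size=self.win_size, enc_in=self.input_c, c_out=self.output_c, e_layers=3) -> self.encoder = Encoder(...)
No error is reported; it simply hangs at this point. What could be the cause?
python 3.6, cuda 10.1, pytorch 1.4.0, GPU model: A100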

The generation details of the NeurIPS-TS benchmark

Hey, exciting work. I noticed that you use the NeurIPS-TS benchmark. However, the original paper contains a collection of benchmarks, including real and synthetic datasets. Can you share the details of how you generated the dataset?

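About the order of the Minimax Strategy

Regarding the Minimax Strategy, my understanding of the paper is that one should first minimize, so that P approaches S, and then maximize, so that S moves away from P. However, the code seems to maximize first and then minimize. Does the order of the Minimax Strategy affect model training? Thank you!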

loss1 = rec_loss - self.k * series_loss
loss2 = rec_loss + self.k * prior_loss

loss1.backward(retain_graph=True)
loss2.backward()

Why use test_loader instead of validation_loader?

I think you should use validation_loader instead of test_loader in the following self.vali function.

Anomaly-Transformer/solver.py

Line 196 in e11c317

vali_loss1, vali_loss2 = self.vali(self.test_loader)

Issues about anomaly duration and real-time application.

Thanks for your wonderful work!

I have two questions about this work.
(1) In all the datasets, the lengths of most anomaly segments are large, e.g. the average lengths of anomaly segment in MSL, SMD, SMAP and PSM are 216, 90, 816 and 339 respectively. But the window size is set as 100, which indicates that in some windows, most timesteps are anomalous. Will these windows affect the detection performance?
(2) Have you tried to evaluate Anomaly Transformer in the real-time scenario?

Some results inconsistent with the original paper of other methods

Hi! Thanks for your attention.
In your paper, I found some results that are inconsistent with those in the original papers of other methods, such as "OmniAnomaly" and "InterFusion". Is something different in the experimental details?

Why is optimizer.step() called twice?

Anomaly-Transformer/solver.py

Lines 189 to 192 in bfe075e

    
           loss1.backward(retain_graph=True) 
        
           self.optimizer.step() 
        
           loss2.backward() 
        
           self.optimizer.step()

When the first optimizer.step() executes, all parameters with gradients from loss1 are updated, but some variables in loss2 are shared with loss1, so this may cause a problem.

Questions on the validation set and the threshold selection algorithm

Hello,

I have a question regarding your work.

How is the validation set selected here? From the dataloader code, the validation set seems to be the test set. Did I read the code correctly?
Anomaly Transformer's thresholding mechanism:
Is the model using the test dataset rather than the validation set to set the threshold?
According to Appendix H, the validation set (which consists of normal data only) is used to pick an appropriate threshold.
However, the code seems to use the test dataset for thresholding.
Did I miss something?

Thanks a lot for your answer in advance.

About 3090

Hello, looking at other questions, I found that you mentioned that your environment uses a 3090, but I found that the 3090 only supports CUDA versions 11 and above. How did you solve this?

About the SMAP dataset

Hello, I could not find the label file SMAP_test_label.npy in the SMAP data provided on the Tsinghua mirror. Could you provide it? Thank you.

FileNotFoundError: [Errno 2] No such file or directory: 'dataset/SMAP/SMAP_test_label.npy'

A question about how the evaluation metrics are computed

# detection adjustment
anomaly_state = False
for i in range(len(gt)):
    if gt[i] == 1 and pred[i] == 1 and not anomaly_state:
        anomaly_state = True
        for j in range(i, 0, -1):
            if gt[j] == 0:
                break
            else:
                if pred[j] == 0:
                    pred[j] = 1
        for j in range(i, len(gt)):
            if gt[j] == 0:
                break
            else:
                if pred[j] == 0:
                    pred[j] = 1
    elif gt[i] == 0:
        anomaly_state = False
    if anomaly_state:
        pred[i] = 1
Is the purpose of this code to correct the model's test results using the test-set labels?
If so, wouldn't the model fail when the test set has no labels (in real applications, the test set is unlabeled anyway)?

About benchmarks

Hello, the "four benchmarks from Tsinghua Cloud" link does not seem to open. Could you share the download method again? Thank you very much!

Error while executing

RuntimeError: Could not infer dtype of numpy.float32
I get the above error while enumerating the PyTorch data loader in the following code:
'for i, (input_data, labels) in enumerate(self.train_loader):'

About NeurIPS-TS dataset

I would like to know the details of your experiments on the NeurIPS-TS dataset.
According to the paper, it includes univariate time series with a total length of 50,000.
I couldn't find any information about how the train and test samples were generated or sampled.
Could you explain the design of your experiment on this dataset in more detail? Are there any plans to update the related code?

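A question about the datasets

Hello, I am using the datasets from this paper to evaluate other models. I have one question: does the training set contain anomalous segments?
My usual approach is that the model learns only from normal data during training and, at prediction time, flags anomalous segments with a threshold when the reconstruction loss is large. I am not sure whether the datasets here match this usual approach. Thank you.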

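A question about computing the prior association in the source code

Hello, what is the purpose of the following code when computing the prior association?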

sigma = torch.sigmoid(sigma * 5) + 1e-5
sigma = torch.pow(3, sigma) - 1

The paper does not seem to mention this operation. I would appreciate an explanation. 😄

Is the point-adjust strategy used when computing Precision, Recall, and F1?

loss2.backward(): runtime error

Thanks for the open-source code! There is one problem confusing me.
When I run the code with [python main.py], I get a PyTorch error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 38]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The error is located in solver.py, line 188:
# Minimax strategy
loss1.backward(retain_graph=True)
self.optimizer.step()
loss2.backward()
self.optimizer.step()

When I change the code to
loss1.backward(retain_graph=True)
# self.optimizer.step()
loss2.backward()
self.optimizer.step()

it works!
I wonder whether the first [self.optimizer.step()] should be commented out. If not, how can the error be resolved?
Thank you.
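
The behaviour reported here can be reproduced on a tiny model. The sketch below assumes a recent PyTorch release, in which `optimizer.step()` updates parameters in place and autograd checks tensor versions during backward. The two-layer model and both losses are hypothetical stand-ins for the real network and its loss1/loss2.

```python
import torch


def run_two_losses(step_between):
    """Return None on success, or the RuntimeError message if backward fails."""
    torch.manual_seed(0)
    # Two layers, so that backward through the graph needs the second layer's weight.
    model = torch.nn.Sequential(
        torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    output = model(torch.randn(16, 4))
    loss1 = output.pow(2).mean()
    loss2 = (output - 1.0).pow(2).mean()

    optimizer.zero_grad()
    try:
        loss1.backward(retain_graph=True)
        if step_between:
            optimizer.step()  # updates the parameters in place
        loss2.backward()      # reuses the graph that was built before the update
        optimizer.step()
    except RuntimeError as error:
        return str(error)
    return None


error_with_step = run_two_losses(step_between=True)
error_without_step = run_two_losses(step_between=False)
print(error_with_step)
print(error_without_step)
```

Note that skipping the first step accumulates the gradients of both losses before a single update, which is the workaround reported in this issue and is not necessarily equivalent to two separate updates.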

Data download website error

I try to download the data by clicking the link in README, but it goes to "Page unavailable". Are there any other ways to download?

About the computation of prior_loss and series_loss in solver.py

prior_loss and series_loss are computed from the same variables. Is that intended, or is it a slip?
