ml4its / mtad-gat-pytorch

PyTorch implementation of MTAD-GAT (Multivariate Time-Series Anomaly Detection via Graph Attention Networks) by Zhao et al. (2020, https://arxiv.org/abs/2009.02040).

License: MIT License

Python 99.04% Shell 0.96%

2021 anomaly-detection attention deep-learning gnn graph-attention-networks graph-neural-networks mtad-gat pytorch time-series

mtad-gat-pytorch's Issues

tqdm is not imported in prediction.py + request

Dear authors,

First of all, I would like to thank you for publicly offering your implementation of the method from Zhao et al. (2020) on GitHub. This material greatly helps me in conducting my master's thesis research into unsupervised methods for multivariate time series anomaly detection.

I have run into one problem: an error while executing your code for the SMAP dataset. Training goes well up until predicting and calculating the anomaly scores, from line 167 in train.py onwards. There, the predict_anomalies function on line 118 calls the get_score function from prediction.py, which on line 51 raises "NameError: name 'tqdm' is not defined". This is probably due to the missing tqdm import statement at the beginning of the prediction.py file.
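The missing import described above can be sketched as follows. This is only an illustration, not the repository's actual fix: the fallback `tqdm` stub and the `score_batches` helper are hypothetical, added so that the snippet also runs where tqdm is not installed.

```python
# Assumption: prediction.py wraps a loop in tqdm(...) without importing it.
try:
    from tqdm import tqdm  # the import the issue reports as missing
except ImportError:
    # Hypothetical fallback: a no-op stand-in so the loop still runs without tqdm.
    def tqdm(iterable, **kwargs):
        return iterable


def score_batches(batches):
    """Hypothetical stand-in for a scoring loop that wraps its iterable in tqdm."""
    return [sum(batch) / len(batch) for batch in tqdm(batches)]


print(score_batches([[1.0, 2.0, 3.0], [4.0, 6.0]]))
```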

I hope this feedback is of value to you.

Apart from mentioning this problem, I have another question. Is it possible to load different datasets into your implementation, or is it hard to do this due to hardcoding practices for example? My intention is to load a custom multivariate time series dataset and evaluate the performance of this method. The dataset comprises several .csv files, each having data of one of multiple IoT sensors where columns constitute the multivariate features and rows the time dimension. In order to transform this data to your data format requirements, I will condense these files into one big file where the multivariate sensor features of all sensors are combined in the columns. Could you recommend a way to load this file into your implementation? I was thinking of adapting the preprocess.py file to do so. Perhaps you could add the option to load custom datasets in the future.
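The "condense into one big file" step could look roughly like the sketch below. It assumes every per-sensor CSV has a header row, numeric cells only, and the same number of time-aligned rows. The file names and the `merge_sensor_csvs` helper are made up for illustration, and how the merged table is then handed to preprocess.py is not covered here.

```python
import csv
import tempfile
from pathlib import Path


def merge_sensor_csvs(paths):
    """Concatenate per-sensor CSV files column-wise into one list of rows.

    Assumes each file has a header row, numeric cells, and the same
    number of time-aligned rows.
    """
    merged = None
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            rows = [[float(cell) for cell in row] for row in reader]
        if merged is None:
            merged = rows
        elif len(rows) != len(merged):
            raise ValueError(f"{path} has {len(rows)} rows, expected {len(merged)}")
        else:
            merged = [left + right for left, right in zip(merged, rows)]
    return merged


# Usage with two small made-up sensor files.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "sensor_a.csv"
    b = Path(tmp) / "sensor_b.csv"
    a.write_text("temp,humidity\n20.5,0.31\n20.7,0.30\n")
    b.write_text("pressure\n1013.0\n1012.5\n")
    merged = merge_sensor_csvs([a, b])

print(merged)  # rows = time steps, columns = all sensor features side by side
```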

In any case, thank you very much for your efforts.

Embedding vector dimension issue in the paper

The dimension of the input data is n×k. After the convolution, the dimension of the data should have changed. However, according to the paper, the dimension is unchanged after the convolution and is still n×k. The result is then sent to the GAT layers.
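One way the shape can stay n×k is zero padding along the time axis: a kernel of odd length K with (K − 1) / 2 padded points on each side gives an output of length n + 2·(K − 1)/2 − K + 1 = n. The sketch below shows this for a single feature column in plain Python. Whether the paper or this repository pads in exactly this way is an assumption, and `conv1d_same` is a hypothetical helper.

```python
def conv1d_same(series, kernel):
    """Zero-padded 1-D convolution (cross-correlation) that preserves length.

    Assumes an odd kernel length, so padding (len(kernel) - 1) // 2 on each
    side gives an output exactly as long as the input.
    """
    pad = (len(kernel) - 1) // 2
    padded = [0.0] * pad + list(series) + [0.0] * pad
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(series))
    ]


n, kernel_size = 100, 7
window = [float(t) for t in range(n)]  # one feature column of length n
smoothed = conv1d_same(window, [1.0 / kernel_size] * kernel_size)
print(len(window), len(smoothed))
```

Applying the same operation to each of the k columns leaves the overall shape at n×k.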

When computing 'e' in the FeatureAttentionLayer, the output of MTAD_GAT 'predictions', 'recons' are all nan, and training is not possible due to the presence of nan.

Dear authors,

Thank you for uploading this code. I am a beginner in multivariate time series anomaly detection and this has been very helpful in my research. I have read and understood your code, but the output is always NaN during training, and I am sure that the input data is normal.

Therefore, I printed the result of each step in forward() in mtad_gat.py. The problem appears after the feature_gat() layer.

So I stepped into feature_gat(): after e = torch.matmul(a_input, self.a).squeeze(3), some NaN values appear, as shown in the figure. After the softmax there are more NaNs; usually one whole column is NaN.

How can I solve this problem? I also tried adjusting batch_size and look_back, but nothing works.
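A possible first check for a report like this is whether non-finite values already enter the model, including ones introduced by normalization. The sketch below assumes min-max scaling, where a constant column leads to 0/0 (which becomes NaN in NumPy or PyTorch arithmetic). Whether that is the cause here is unknown, and both helpers are hypothetical.

```python
import math


def find_non_finite(rows):
    """Return (row, column) positions of NaN or infinite values."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
        if not math.isfinite(value)
    ]


def min_max_scale(rows, eps=1e-8):
    """Min-max scale each column; eps keeps constant columns from dividing by zero."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(value - lo) / (span + eps) for value, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


data = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # second column is constant
scaled = min_max_scale(data)
print(find_non_finite(scaled))
```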

Environment:

Linux
CPU/GPU
torch 1.10.0 (CPU only) / torch 1.10.0+cu111

About the example output, please!!!

The code is written very well and looks very comfortable. Thank you very much. However, I have a question. I trained using the default args on the SMD dataset, and the results obtained were quite different from the "example output". Is it possible that the training parameters for the two are different? If so, could you provide the parameters for the "example output"? Thank you very much.

Exception: Dataset ".\DATASETS\DATA\SMAP_TRAIN_MD.CSV" not available.

(env) PS C:\Users\dheiver\Desktop\Nova pasta\mtad-gat-pytorch> python.exe .\train.py --dataset .\datasets\data\smap_train_md.csv
{'dataset': '.\DATASETS\DATA\SMAP_TRAIN_MD.CSV', 'group': '1-1', 'lookback': 100, 'normalize': True, 'spec_res': False, 'kernel_size': 7, 'use_gatv2': True, 'feat_gat_embed_dim': None, 'time_gat_embed_dim': None, 'gru_n_layers': 1, 'gru_hid_dim': 150, 'fc_n_layers': 3, 'fc_hid_dim': 150, 'recon_n_layers': 1, 'recon_hid_dim': 150, 'alpha': 0.2, 'epochs': 30, 'val_split': 0.1, 'bs': 256, 'init_lr': 0.001, 'shuffle_dataset': True, 'dropout': 0.3, 'use_cuda': True, 'print_every': 1, 'log_tensorboard': True, 'scale_scores': False, 'use_mov_av': False, 'gamma': 1, 'level': None, 'q': None, 'dynamic_pot': False, 'comment': ''}
Traceback (most recent call last):
  File "C:\Users\dheiver\Desktop\Nova pasta\mtad-gat-pytorch\train.py", line 43, in <module>
    raise Exception(f'Dataset "{dataset}" not available.')
Exception: Dataset ".\DATASETS\DATA\SMAP_TRAIN_MD.CSV" not available.
(env) PS C:\Users\dheiver\Desktop\Nova pasta\mtad-gat-pytorch>
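The log shows the --dataset value being upper-cased and then rejected, which suggests the script expects a dataset name rather than a path to a file. The sketch below illustrates that kind of check. The list of names (the datasets mentioned in these issues) and the `resolve_dataset` helper are assumptions, not the actual code of train.py.

```python
KNOWN_DATASETS = ("SMAP", "MSL", "SMD")  # assumption: names mentioned in these issues


def resolve_dataset(arg):
    """Hypothetical helper: map a --dataset argument to a known dataset name."""
    name = arg.upper()
    if name not in KNOWN_DATASETS:
        raise ValueError(
            f'Dataset "{arg}" not available; expected one of {KNOWN_DATASETS}, '
            "not a path to a file."
        )
    return name


print(resolve_dataset("smap"))
```

Under this assumption, passing `--dataset SMAP` (as in the `python preprocess.py --dataset SMAP` call quoted in another issue) would pass the check, while a path such as `.\datasets\data\smap_train_md.csv` would not.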

Some question about the param target_dims

I have a question about this parameter. MTAD-GAT is a multivariate time-series model that uses separate modules to capture temporal dependencies and feature dependencies. If only one dimension is used for training and testing, it degenerates into a univariate time-series model. What is the use of the feature-dependency module in this situation?

The setting about feat_gat_embed_dim and time_gat_embed_dim

Thanks very much for your code. They are valuable and very helpful. But I have some questions regarding the setting of feat_gat_embed_dim and time_gat_embed_dim. How do you set them? What is the general range for them?

losses are always nan

Hi, hope you are fine.
Thanks for this wonderful work.
I tried training with MSL, and SMD, and my losses are always nan.
Moreover, I also tried GDN repo, and I found that there is a difference in MSL data as compared to this repo.
Thanks for any help.

Regards,
Ali

The reason why use shuffle in time-series data

Hi. Thanks for your wonderful work!

I'm curious why 'shuffle=True' is the default option in the function below, given that the data is time-series data.

def create_data_loaders(train_dataset, batch_size, val_split=0.1, shuffle=True, test_dataset=None):

Is there any reason to shuffle the time-series data?
(Or can the GAT still capture time-oriented features even when the data is shuffled?)
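One point that may help with this question: if each training sample is a whole sliding window, shuffling permutes the order of the windows, not the time steps inside a window. The sketch below assumes that windowing scheme; `make_windows` is a hypothetical helper and the series is made up.

```python
import random


def make_windows(series, lookback):
    """Pair each lookback-long window with the value that follows it."""
    return [
        (series[i:i + lookback], series[i + lookback])
        for i in range(len(series) - lookback)
    ]


series = list(range(10))
samples = make_windows(series, lookback=3)

random.seed(0)
random.shuffle(samples)  # analogous to shuffle=True: only the sample order changes

# Every window is still in chronological order internally.
for window, target in samples:
    assert window == sorted(window) and target == window[-1] + 1
print(len(samples))
```

Under this assumption, the temporal structure the model sees within each sample is unaffected by shuffling.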

Reason for using `find_epsilon` in feature-level

Hi. Thanks for your excellent work!
I'm wondering why only the epsilon method is used when predicting anomalies at the feature level, in the line below.

mtad-gat-pytorch/prediction.py

Line 142 in 9e671ea

epsilon = find_epsilon(train_feature_anom_scores, reg_level=2)

Can the POT method also be used here, or is there a reason for using only the epsilon method?
Thanks for your help.

Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

Versions of Python and PyTorch

Which Python and PyTorch versions are used?
Neither is given in requirements.txt.

out_dim or n_features

mtad-gat-pytorch/mtad_gat.py

Line 62 in 8f907cc

    
self.recon_model = ReconstructionModel(window_size, gru_hid_dim, recon_hid_dim, out_dim, recon_n_layers, dropout)

Should "out_dim" be changed to "n_features" to match the shape of the input x, given that the loss is calculated by MSELoss(recons, x)?

Multiple inconsistent training results

First of all, thank you very much for your code! It has been very helpful to me. I have one question: taking the 1-1 dataset of SMD as an example, why does the F1 score on the test set differ significantly across repeated training and testing runs? Sometimes F1 is 0.7, and sometimes it reaches 0.8, so I cannot judge the true performance of the model. Why is this, and how should I solve it?
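Run-to-run variation of this kind often comes from unseeded random number generators (weight initialization, dropout, shuffling). A minimal sketch of fixing the seeds follows; `seed_everything` is a hypothetical helper, and NumPy and PyTorch are seeded only if they are installed. Some GPU operations can remain non-deterministic even with fixed seeds, so reporting the mean and spread over several seeds is a common complement.

```python
import random


def seed_everything(seed):
    """Seed the RNGs that are available; NumPy and PyTorch are optional here."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds PyTorch's random number generators
    except ImportError:
        pass


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```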

ValueError: time data '<built-in function id>' does not match format '%d%m%Y_%H%M%S'

Hello!!
I encountered the error above in the 'result_visualizer' Jupyter notebook. Is there a good way to solve this problem?

Thank you!!
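The message in the title suggests that Python's built-in `id` function, rather than a string holding a run identifier, reached `datetime.strptime`. The sketch below reproduces that behaviour; the identifier value is made up, and only the format string is taken from the error message.

```python
from datetime import datetime

ID_FORMAT = "%d%m%Y_%H%M%S"  # format string quoted in the error message

model_id = "28022021_153045"  # made-up example identifier (a string)
print(datetime.strptime(model_id, ID_FORMAT))

# Passing the built-in function `id` (converted to text) reproduces the error.
try:
    datetime.strptime(str(id), ID_FORMAT)
except ValueError as err:
    print(err)
```

If this is what happens in the notebook, a variable named `id` was probably never assigned before the call, so the name fell back to the built-in.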

Loss and some slice of the output tensors become NAN

When train.py is rerun after a while, the use_bias parameter causes NaNs in the outputs of the attention layers. The NaNs occur after the bias is added to the tensor. Do you have any explanation or solution besides setting the use_bias parameter to False?

about gat_layer

Thank you for your excellent work.
I don't understand something about the GAT layer. Your graph attention is implemented with the function make_attention_input, but it seems that it just copies and concatenates x(v) in various ways. I can't see how this part implements graph attention. Can you explain it in detail?
In addition, if I want to build a graph that is not fully connected (each node has a fixed number of edges), is this possible?
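For reference, the attention described in the MTAD-GAT paper scores every pair of nodes from the concatenation of their vectors, e_ij = LeakyReLU(wᵀ(v_i ⊕ v_j)), normalizes the scores with a softmax over j, and outputs h_i = σ(Σ_j α_ij v_j). Building all pairwise concatenations is what the "copying and splicing" step would correspond to. The plain-Python sketch below follows those formulas and adds an optional adjacency mask for a graph that is not fully connected. It is an illustration, not the repository's implementation; the slope value and the mask handling are assumptions.

```python
import math


def leaky_relu(x, slope=0.2):  # slope value is an assumption
    return x if x > 0 else slope * x


def graph_attention(nodes, w, adjacency=None):
    """Single-head attention over node vectors (lists of floats).

    nodes:     K node vectors of length d
    w:         weight vector of length 2 * d
    adjacency: optional K x K 0/1 matrix; None means fully connected.
               Each node is assumed to have at least one neighbour.
    """
    k = len(nodes)
    outputs = []
    for i in range(k):
        neighbours = [j for j in range(k) if adjacency is None or adjacency[i][j]]
        # e_ij = LeakyReLU(w . (v_i concatenated with v_j))
        scores = [
            leaky_relu(sum(wi * x for wi, x in zip(w, nodes[i] + nodes[j])))
            for j in neighbours
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        alphas = [e / sum(exps) for e in exps]
        # h_i = sigmoid(sum_j alpha_ij * v_j)
        mixed = [
            sum(a * nodes[j][d] for a, j in zip(alphas, neighbours))
            for d in range(len(nodes[i]))
        ]
        outputs.append([1.0 / (1.0 + math.exp(-m)) for m in mixed])
    return outputs


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.5, -0.5, 0.25, 0.25]
ring = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # each node attends to itself and one other
print(graph_attention(nodes, w))
print(graph_attention(nodes, w, adjacency=ring))
```

In a tensor implementation, the same masking is usually done by setting the scores of non-edges to negative infinity before the softmax.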

Computational resource

How many GPUs did you use for training, and how long did it take?

FC layer out_dim not matching Recon layer in_dim

I have run into a logic issue at the point where the Forecast layer output goes into the Reconstruction layer.

gru_n_layers = 1
in_dim = 150
out_dim = 8  # Output dimension of the FC layer
forecast_n_layers = 1
forecast_hid_dim = 150
n_layers = 1
hid_dim = 150
window_size = 20

dropout = 0.0 if n_layers == 1 else dropout
rnn = nn.GRU(in_dim, hid_dim, n_layers, batch_first=True, dropout=dropout)
print(rnn)

fc = nn.Linear(hid_dim, out_dim)
print(fc)

h_final_end = x_
print(h_final_end.shape)
h_final_end_rep = h_final_end.repeat_interleave(window_size, dim=1).view(x_.size(0), window_size, -1)
print(h_final_end_rep.shape)
decoder_out, _ = rnn(h_final_end_rep)
print(decoder_out.shape)
out = fc(decoder_out)
print(out.shape)

The Forecast layer output dimension is 8, referring to n_features, and the Reconstruction layer input dimension is 150, referring to gru_hid_dim.

Running this separately in a notebook, I got the error:
RuntimeError: input.size(-1) must be equal to input_size. Expected 150, got 8

I must have missed something :(

A Question about the implementation.

Thanks for making this repo public. I have some questions after reading your code.

mtad-gat-pytorch/training.py

Line 123 in 8f907cc

recon_loss = torch.sqrt(self.recon_criterion(x, recons))

The paper uses a VAE-like method for the reconstructor, but you simply use MSE, like a naive autoencoder. I wonder whether this is for the stability of optimization. In my case, I tried to sample at every timestep like the LSTM-VAE model, and sometimes the loss just became NaN.

msl and smap dataset preprocess gives error

(venv) PS C:\Users\hp\PycharmProjects\mtad-gat-pytorch> python preprocess.py --dataset SMAP
SMAP test_label (427617,)
Traceback (most recent call last):
File "C:\Users\hp\PycharmProjects\mtad-gat-pytorch\preprocess.py", line 96, in <module>
load_data(ds)
File "C:\Users\hp\PycharmProjects\mtad-gat-pytorch\preprocess.py", line 89, in load_data
concatenate_and_save(c)
File "C:\Users\hp\PycharmProjects\mtad-gat-pytorch\preprocess.py", line 81, in concatenate_and_save
temp = np.load(path.join(dataset_folder, category, filename + ".npy"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\hp\PycharmProjects\mtad-gat-pytorch\venv\Lib\site-packages\numpy\lib\npyio.py", line 427, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/data\train\A-1.npy'
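The mixed separators in 'datasets/data\train\A-1.npy' are usually harmless, since Windows file APIs accept both forward and backward slashes. A more likely explanation is that the raw channel files are simply not present under datasets/data/train. The sketch below lists which expected files are missing; `missing_channel_files` is a hypothetical helper, and the channel list is made up apart from "A-1", which appears in the traceback.

```python
from pathlib import Path


def missing_channel_files(dataset_folder, category, channel_ids):
    """Return the expected .npy paths that do not exist on disk."""
    folder = Path(dataset_folder) / category
    return [
        folder / f"{channel}.npy"
        for channel in channel_ids
        if not (folder / f"{channel}.npy").is_file()
    ]


# "A-1" is the channel named in the traceback; "A-2" is a made-up second entry.
for path in missing_channel_files("datasets/data", "train", ["A-1", "A-2"]):
    print("missing:", path)
```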

Problems encountered in calculating anomaly scores on the SMAP dataset

The problem is shown in the figure below

Such problems only appear on the SMAP dataset

My understanding of mtad_gat.py does not reflect the 'Multivariate Time-series' in the title of the paper.

The data x received by the forward() function in mtad_gat.py has the shape (batch size, window, number of features), but along the feature axis only the first column is non-zero; the remaining columns are all 0. Doesn't that amount to splicing all the dimensions of the data into one long one-dimensional series?

Evaluation Code

How can I just evaluate trained model?

Is there any method or function just for evaluation?

About adjust_predicts(), please!!!

First, thanks for making this repo public. I have learned a lot from the issues, and thanks for your replies.
I have seen the following code many times:

for i in range(len(predict)):
    if any(actual[max(i, 0) : i + 1]) and predict[i] and not anomaly_state:
        anomaly_state = True
        anomaly_count += 1
        for j in range(i, 0, -1):
            if not actual[j]:
                break
            else:
                if not predict[j]:
                    predict[j] = True
                    latency += 1
    elif not actual[i]:
        anomaly_state = False
    if anomaly_state:
        predict[i] = True

It is part of adjust_predicts(). I am very curious: what is its purpose?
And how is the "latency" worked out?
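Read on its own, the quoted loop performs what is often called point adjustment: once any point inside a true anomaly segment is predicted, all points of that segment are marked as detected, and `latency` counts how many points lay between the segment start and the first detection. The sketch below wraps the same logic in a runnable function. The initial values (`anomaly_state = False`, counters at 0) and the return signature are assumptions, since the excerpt does not show them.

```python
def point_adjust(predict, actual):
    """Apply the quoted adjustment loop to boolean prediction and label lists."""
    predict = list(predict)  # work on a copy
    anomaly_state = False    # assumed initial values; not shown in the excerpt
    anomaly_count = 0
    latency = 0
    for i in range(len(predict)):
        if actual[i] and predict[i] and not anomaly_state:
            # First detection inside a true segment: back-fill to the segment start.
            anomaly_state = True
            anomaly_count += 1
            for j in range(i, 0, -1):
                if not actual[j]:
                    break
                if not predict[j]:
                    predict[j] = True
                    latency += 1
        elif not actual[i]:
            anomaly_state = False
        if anomaly_state:
            predict[i] = True  # forward-fill until the segment ends
    return predict, latency, anomaly_count


T, F = True, False
actual = [F, F, T, T, T, T, F, F, T, T, F]
predict = [F, F, F, F, T, F, F, T, F, F, F]
adjusted, latency, count = point_adjust(predict, actual)
print(adjusted, latency, count)
```

In this made-up example the first segment (indices 2 to 5) is detected at index 4, so the whole segment becomes True with a latency of 2 points. The isolated prediction at index 7 stays a false positive, and the second segment (indices 8 and 9) is never detected.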

Pot results on the SMD dataset

Thank you very much for your work. May I ask why the F1 score on the SMD dataset with the POT results is only around 0.75?

some question about model and the result

Hi Axel,
I have some questions about the repo.

I read the OmniAnomaly code and found that it uses 25/55 as the out_dim for MSL and SMAP. I also opened the MSL data, and only the first dimension has values. Can I therefore treat MSL and SMAP as univariate time series? Most papers say these datasets are multivariate, which confuses me.
I use this repo as the baseline for my research, and I found that you replaced the VAE decoder with a GRU. Did you try the original VAE decoder? I ran some experiments but could not reproduce the results of the original paper. If you tried the VAE, did it achieve those results?

I will appreciate it if you can reply as soon as possible.

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ServerMachineDataset/processed\\machine-1-1_train.pkl'

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ServerMachineDataset/processed\machine-1-1_train.pkl'

About data cleaning

Thank you for your wonderful work! I cannot find the data cleaning part in this repo, i.e. computing spectral residuals and replacing abnormal data. Have you implemented it?

The parameter of `adjust_predicts()`

Thank you for your excellent work!
I did not understand the adjust_predicts() function when reading the source code.

In the adjust_predicts() function, the comment indicates that

threshold (float): The threshold of anomaly score. A point is labeled as "anomaly" if its score is [[lower than]] the threshold.

But when you preprocess the data, in the preprocess.py file

for anomaly in anomalies:
      label[anomaly[0] : anomaly[1] + 1] = True

My question is: why does adjust_predicts() label a point as an "anomaly" if its score is lower than the threshold, when the label of an anomaly is set to True during data preprocessing? In my opinion, an anomalous point is one whose score is higher than the threshold.
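Both conventions exist: the direction of the comparison depends on whether the score measures normality (for example a log-likelihood, where low means anomalous) or error (where high means anomalous). The sketch below contrasts the two with made-up scores; `label_anomalies` is a hypothetical helper. Which convention adjust_predicts() actually applies is best checked in its code rather than in the comment.

```python
def label_anomalies(scores, threshold, higher_is_anomalous=True):
    """Turn per-timestamp scores into boolean anomaly predictions."""
    if higher_is_anomalous:  # e.g. forecasting or reconstruction error
        return [s > threshold for s in scores]
    return [s < threshold for s in scores]  # e.g. log-likelihood


errors = [0.1, 0.2, 1.5, 0.3]               # made-up error-style scores
log_likelihoods = [-0.1, -0.2, -1.5, -0.3]  # the same points as likelihood-style scores

print(label_anomalies(errors, threshold=1.0))
print(label_anomalies(log_likelihoods, threshold=-1.0, higher_is_anomalous=False))
```

Both calls flag only the third point, which shows that the two rules agree once the score's direction is taken into account.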

Thanks for your time. Hope you can answer my question.

attention layer

h = self.sigmoid(torch.matmul(attention, x))

I found that the attention weights alpha are multiplied with the original x instead of W * x, which differs from the graph convolution network. Could you explain the reason? Thanks very much!

The issue with the dataset

Hello! Thank you very much for your work. I have a question. If I'm working with multivariate time series data, for example, 800 timestamps with 10 features each, do I only need one .npy file? I would greatly appreciate it if you could answer my question.

Running repo on custom data

Hi! Just wondering if there is a way to run this architecture on a custom dataset? I can run it on the datasets provided in the README file, but I would like to check how this model works on my own custom multivariate time-series dataset.

Thank you in advance

The purpose of `adjust_anomaly_scores`

Thanks for your brilliant work. I would like to know the purpose of adjust_anomaly_scores. Thanks for your time.

    # Remove errors for time steps when transition to new channel (as this will be impossible for model to predict)
    if dataset.upper() not in ['SMAP', 'MSL']:
        return scores

    adjusted_scores = scores.copy()
    if is_train:
        md = pd.read_csv(f'./datasets/data/{dataset.lower()}_train_md.csv')
    else:
        md = pd.read_csv('./datasets/data/labeled_anomalies.csv')
        md = md[md['spacecraft'] == dataset.upper()]

    md = md[md['chan_id'] != 'P-2']

    # Sort values by channel
    md = md.sort_values(by=['chan_id'])

    # Getting the cumulative start index for each channel
    sep_cuma = np.cumsum(md['num_values'].values) - lookback
......................
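The index arithmetic in the last quoted line can be illustrated as follows. Scores exist only from the `lookback`-th raw point onwards, so a channel boundary at raw index c corresponds to score index c − lookback. What the truncated remainder of the function does with these indices is not shown above, so the `suppress_boundaries` step is purely hypothetical, and the channel lengths are made up.

```python
from itertools import accumulate


def channel_boundaries(num_values, lookback):
    """Indices in the score sequence where one channel ends and the next begins."""
    return [c - lookback for c in accumulate(num_values)]


def suppress_boundaries(scores, boundaries, width=1):
    """Hypothetical adjustment: zero `width` scores starting at each boundary."""
    adjusted = list(scores)
    for b in boundaries:
        for i in range(max(b, 0), min(b + width, len(adjusted))):
            adjusted[i] = 0.0
    return adjusted


bounds = channel_boundaries([6, 4, 5], lookback=2)  # made-up channel lengths
scores = [1.0] * 13                                  # 15 raw points minus lookback
adjusted = suppress_boundaries(scores, bounds)
print(bounds)
print(adjusted)
```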