The informer2020 from zhouhaoyi

Model saved under `checkpoints` cannot be accessed with file browser in Colab example

After training, I can't download checkpoint.pth via the file browser panel.
This is a problem related to Google Colab, which requires using a directory name other than checkpoints. See googlecolab/colabtools#621.
However, this problem may not affect the execution of the example.
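
A possible workaround, sketched under assumptions: copy the checkpoint tree to a directory with a different name and download from there. The paths `./checkpoints` and `./saved_models` and the helper `export_checkpoints` are illustrative only and are not part of the repository.

```python
import shutil
from pathlib import Path

def export_checkpoints(src="./checkpoints", dst="./saved_models"):
    """Copy the whole checkpoint tree to a directory with a different name."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"no such directory: {src_path}")
    # dirs_exist_ok (Python 3.8+) lets the copy be repeated after further training.
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    # Return the copied files relative to the new directory, for a quick check.
    return sorted(p.relative_to(dst_path).as_posix()
                  for p in dst_path.rglob("*") if p.is_file())
```

Calling `export_checkpoints()` after training should leave a copy under the new name, which can then be browsed like any other folder.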

RuntimeError when e_layers>3 ?

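Thank you for sharing.

When I try e_layers=4 or larger, training always fails with an error and cannot proceed. The failure happens when calling exp.train(settings).

With e_layers=3 or fewer, everything works. I don't understand why.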

RuntimeError Traceback (most recent call last)
in
13 # train
14 print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
---> 15 exp.train(setting)
16
17 # test

~/max/Informer2020/exp/exp_informer.py in train(self, setting)
169 # encoder - decoder
170 if self.args.output_attention:
--> 171 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
172 else:
173 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
145 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
146 enc_out = self.enc_embedding(x_enc, x_mark_enc)
--> 147 enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
148
149 dec_out = self.dec_embedding(x_dec, x_mark_dec)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
94 inp_len = inp_len//2
95 continue
---> 96 x, attn = encoder(x[:, -inp_len:, :])
97 x_stack.append(x); attns.append(attn)
98 inp_len = inp_len//2

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
65 if self.conv_layers is not None:
66 for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
---> 67 x, attn = attn_layer(x, attn_mask=attn_mask)
68 x = conv_layer(x)
69 attns.append(attn)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
43 new_x, attn = self.attention(
44 x, x, x,
---> 45 attn_mask = attn_mask
46 )
47 x = x + self.dropout(new_x)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/attn.py in forward(self, queries, keys, values, attn_mask)
151 keys,
152 values,
--> 153 attn_mask
154 )
155 out = out.view(B, L, -1)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/attn.py in forward(self, queries, keys, values, attn_mask)
109 u = self.factor * np.ceil(np.log(L_Q)).astype('int').item() # c*ln(L_q)
110
--> 111 scores_top, index = self._prob_QK(queries, keys, sample_k=U_part, n_top=u)
112
113 # add scale factor

~/max/Informer2020/models/attn.py in _prob_QK(self, Q, K, sample_k, n_top)
58 # find the Top_k query with sparisty measurement
59 M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
---> 60 M_top = M.topk(n_top, sorted=False)[1]
61
62 # use the reduced Q to calculate Q_K

RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:26
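
The last two frames suggest a likely cause: `n_top` is computed as `factor * ceil(ln(L_Q))`, which can exceed the number of queries once the sequence has been shortened by the deeper layers. The sketch below works through that arithmetic. It assumes `factor=5`, an input length of 96, and that each distilling step halves the sequence; the reporter's actual settings are not shown, and both function names are made up for illustration.

```python
import math

def n_top_queries(length, factor=5):
    """u = factor * ceil(ln(length)), the number of queries requested from topk."""
    return factor * math.ceil(math.log(length))

def check_layers(seq_len, e_layers, factor=5):
    """Return (layer, length, u, fits) per encoder layer."""
    rows, length = [], seq_len
    for layer in range(e_layers):
        u = n_top_queries(length, factor)
        rows.append((layer, length, u, u <= length))
        length //= 2  # assumption: each distilling step halves the sequence
    return rows

for row in check_layers(seq_len=96, e_layers=4):
    print(row)
```

Under these assumptions the fourth layer sees 12 positions but asks for 15 of them, which is the kind of request `topk` rejects. Clamping `n_top` to the sequence length, for example `min(u, L_Q)`, would keep the call in range; whether that is the right fix for the model is a separate question.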

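Continuing training after OOM

First of all, thank you for sharing.

I am training your model with different parameters and datasets. Sometimes an OOM occurs partway through training. Is it possible to resume training this model from the last checkpoint instead of starting over?

When I call train(settings), training starts from scratch.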

About ECL dataset in your paper

Congratulations on the best paper award!

We collected the ECL dataset and found that it contains a total of 370 clients. However, in the experimental part of the paper, you said "It collects the electricity consumption (Kwh) of 321 clients" and "set ‘MT 320’ as the target value". How did you determine the target value? Can you provide the data of these 321 clients? Thank you.

EarlyStopping's occurrence

Hello,

I am currently trying to run your code to see how it works, but every time the run terminates too soon, apparently because of EarlyStopping. The resulting MSE and MAE were also quite far from the results shown here. I have not done much programming for a long time, so my knowledge is too limited to solve the problem myself. That said, I did try setting the EarlyStopping patience to 100, but the run still ended on its own even though it reported that the EarlyStopping counter was at 3 out of 100. Also, at the start the code prints Use GPU: cuda: 0, which made me wonder whether the training was running on the CPU at first. When I checked with the Task Manager, GPU use was at almost 100%, so I believed it was fine, but the fact that the run terminates so early every time still makes me wonder whether it is using the GPU properly. It would be great if you could provide some help on this.

In case any information on specs is needed:
OS: Windows Server 2019 64-bit
Processor: Intel Xeon CPU @ 2.20GHz 2.20GHz
Memory: 30GB
GPU: Nvidia Tesla V100

Thank you in advance. Let me know if there is any additional information you need.
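
One possible explanation for a run ending with the counter at 3 out of 100 is that the epoch limit was reached before the patience was. The following is a simplified sketch of a patience-based loop, not the repository's implementation; the `train_epochs=6` value is taken from the argument dumps elsewhere in these issues, and the validation losses are invented for illustration.

```python
def run_training(val_losses, train_epochs=6, patience=100):
    """Return (epochs_run, counter, reason) for a patience-based training loop."""
    best = float("inf")
    counter = 0
    for epoch in range(train_epochs):
        loss = val_losses[epoch]
        if loss < best:
            best, counter = loss, 0      # improvement: reset the counter
        else:
            counter += 1                 # no improvement this epoch
            if counter >= patience:
                return epoch + 1, counter, "early stopping"
    return train_epochs, counter, "epoch limit reached"

losses = [0.50, 0.45, 0.40, 0.41, 0.42, 0.43]  # made-up validation losses
print(run_training(losses, train_epochs=6, patience=100))
print(run_training(losses, train_epochs=6, patience=2))
```

If this is what happened, raising the number of training epochs, rather than the patience alone, would be the setting to look at.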

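About the initial input of each stack in EncoderStack

Hello, I would like to ask: in the paper, is the input of each stack the second half of the previous stack's input, or a part of the previous stack's output? I ask because in the code the initial input x seems to be overwritten by the output.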

Informer2020/models/encoder.py

Line 96 in c648cc6

x, attn = encoder(x[:, -inp_len:, :])
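
To make the concern concrete, the toy sketch below compares the two readings. Sequences are plain Python lists, and `toy_encoder` is a hypothetical stand-in that only tags and halves its input; it is not the model's encoder.

```python
def toy_encoder(seq):
    """Hypothetical stand-in: keeps every second element and tags it as processed."""
    return [f"enc({v})" for v in seq[::2]]

def stack_inputs(x, n_stacks, overwrite):
    """Record what each stack receives when slicing with x[-inp_len:]."""
    inp_len = len(x)
    seen = []
    for _ in range(n_stacks):
        chunk = x[-inp_len:]      # mirrors x[:, -inp_len:, :]
        seen.append(chunk)
        out = toy_encoder(chunk)
        if overwrite:
            x = out               # rebinding x, as in "x, attn = encoder(...)"
        inp_len //= 2
    return seen

raw = list(range(8))
print(stack_inputs(raw, 3, overwrite=False)[1])  # slice of the raw input
print(stack_inputs(raw, 3, overwrite=True)[1])   # previous stack's output
```

With the rebinding, the second stack no longer sees the tail of the raw input but the previous stack's output, which is the behavior the question asks about.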

Why is the data not inverse normalized when the model is tested

What is the purpose of the label_len parameter?

How to use the network to predict future data

I don't understand how to use the network to predict future data. There is no example code for predicting unseen data. I want to pass in the previous X values and predict the next values.

Poor documentation on how to predict on new data

I trained the model but I have no idea how to use it to predict on new data. I would like to pass a CSV with two columns, "dates" and "past values", and predict the next values with the model, but from what I understand this is not possible without going into the source code. The Colab example also does not provide any specific code or comment on how to do this.

By the way, what exactly are "batch_x", "batch_y", "batch_x_mark", and "batch_y_mark"? Which ones do I have to pass to the Informer to predict the next values?
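
As far as the training call in the tracebacks above shows, the model is invoked as `self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)`. A rough sketch of how those pieces relate for one univariate window follows. It uses plain lists, `make_window` is a hypothetical helper, and the zero-filled decoder placeholder is an assumption about how `dec_inp` is built rather than something shown in this thread.

```python
def make_window(series, start, seq_len, label_len, pred_len):
    """Slice one training window out of a univariate series (a list of floats)."""
    s_end = start + seq_len
    r_begin = s_end - label_len
    batch_x = series[start:s_end]                     # encoder input: seq_len past values
    batch_y = series[r_begin:s_end + pred_len]        # label_len known values + pred_len targets
    dec_inp = batch_y[:label_len] + [0.0] * pred_len  # known start tokens, zeros for the future
    return batch_x, batch_y, dec_inp

series = [float(i) for i in range(10)]
batch_x, batch_y, dec_inp = make_window(series, start=0, seq_len=6, label_len=2, pred_len=3)
print(batch_x, batch_y, dec_inp)
```

On this reading, `batch_x_mark` and `batch_y_mark` would be the time-stamp features aligned with `batch_x` and `batch_y`, and predicting unseen data would need the future time stamps but not the future values, since those positions are filled with zeros.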

BUG: Wrong attention mask in decoder

A bug when creating the model:

Informer2020/models/model.py

Lines 50 to 53 in a87092b

    
    AttentionLayer(FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
                   d_model, n_heads),
    AttentionLayer(FullAttention(False, factor, attention_dropout=dropout, output_attention=False),
                   d_model, n_heads),

This bug means that the cross-attention in the decoder uses no causal mask, while the self-attention does use one. Fortunately there is no information leak in Informer, but it is still quite different from what you wrote in the paper.

A question about scaling data.

In your code, I see that you fit a data scaler on the train/val/test datasets separately. However, I think this probably uses future information during validation and testing, because in online prediction we can't get the whole dataset in advance. In addition, I didn't find an inverse transformation, which is important for showing the real model performance on the test dataset. Can you give more information on how to deal with data scaling and inverse transformation? Thanks.
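
The approach the question points toward can be sketched as follows: compute the scaling statistics on the training split only, reuse them for validation and test data, and invert the transform before reporting errors. This is a minimal standard-library sketch, not the repository's scaler; `TrainOnlyScaler` and the numbers are hypothetical.

```python
from statistics import mean, pstdev

class TrainOnlyScaler:
    """Standardize with statistics taken from the training split only."""

    def fit(self, train_values):
        self.mean = mean(train_values)
        self.std = pstdev(train_values) or 1.0  # avoid division by zero for constant data
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def inverse_transform(self, values):
        return [v * self.std + self.mean for v in values]

scaler = TrainOnlyScaler().fit([2.0, 4.0, 6.0])   # training split
scaled_test = scaler.transform([8.0, 10.0])       # test split reuses the train statistics
restored = scaler.inverse_transform(scaled_test)  # back to original units
print(scaled_test, restored)
```

Metrics computed on `restored` are in the original units of the data, while metrics on `scaled_test` are in standardized units.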

args.data in Colab Example

Thank you for the excellent work, and sorry to bother you again. 😃
Since the Colab example is meant for custom data, why should we input the data name "ETTh1"?
I also found the code d_inp = 4 if data=='ETTh' else 5 in models/embed.py. Does it mean ETTh uses (Y, M, W, D) while ETTm needs an additional Minute? This is unfriendly for custom data, or at least a little confusing.
Also, I see that you added freq to set the dimensionality of the timestamp. Isn't that enough to get the time features?

What I want to say is that I don't understand why the parameter data should be passed into the model and the DataEmbed class.
In my opinion, data is independent of the model.
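
One way to decouple the embedding width from the dataset name, sketched under assumptions, is to derive it from `freq` alone. The `TIME_FEATURES` table and both functions below are hypothetical and only illustrate the idea; the repository's actual feature set may differ.

```python
from datetime import datetime

# Hypothetical mapping from frequency code to the calendar fields used as time features.
TIME_FEATURES = {
    "h": ("month", "day", "weekday", "hour"),
    "t": ("month", "day", "weekday", "hour", "minute"),
}

def time_features(stamp, freq):
    """Return the calendar fields of a datetime that belong to the given frequency."""
    fields = {
        "month": stamp.month,
        "day": stamp.day,
        "weekday": stamp.weekday(),  # Monday is 0
        "hour": stamp.hour,
        "minute": stamp.minute,
    }
    return [fields[name] for name in TIME_FEATURES[freq]]

def d_inp(freq):
    """Width of the time-feature input, taken from freq instead of the dataset name."""
    return len(TIME_FEATURES[freq])

stamp = datetime(2021, 3, 1, 13, 45)
print(time_features(stamp, "h"), d_inp("h"))
print(time_features(stamp, "t"), d_inp("t"))
```

With a table like this, a custom dataset would only need to state its sampling frequency.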

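Model test results

I tried running the data with the parameters from the scripts, for example
python -u main_informer.py --model informer --data ETTh2 --features S --seq_len 336 --label_len 336 --pred_len 168 --e_layers 2 --d_layers 1 --attn prob --des 'Exp' --itr 5
The test result is mse:0.25672146677970886, mae:0.43616965413093567,
which is close to your updated results.

However, the plotted predictions differ a lot from the true values. Did I miss a parameter, or is this the expected result?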

Informer.ipynb - Colaboratory.pdf

Thank you very much.

decoder without prob attention

In the overall model architecture figure in the paper, the decoder contains prob attention, but the code uses full attention throughout:
DecoderLayer(
AttentionLayer(FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
d_model, n_heads),
AttentionLayer(FullAttention(False, factor, attention_dropout=dropout, output_attention=False)

Why was it done this way?

Please support converting the model to TorchScript format

I hope the model can be converted to TorchScript format so that it can be used more widely.
I want to call a trained Informer model from a Java app. After looking into the JDL library, I learned that the pth model first has to be converted to TorchScript format.
I tried the following code for the conversion:

Exp = Exp_Informer
exp = Exp(args) # initialize the model with the same arguments used for training
pthfile = './checkpoints/test/checkpoint.pth'

examples = exp.trace() # a method I added to the Informer class to obtain the arguments forward() needs; here it returns a tuple
model = exp.model # get the model and load the weights
model.load_state_dict(torch.load(pthfile))

# try to trace and convert
traced_script_module = torch.jit.trace(model, examples)
traced_script_module.save("./traced_model.pt")

I got the following error:
File "E:\pythonspace\deep_learning\Informer2020\models\attn.py", line 110, in forward
U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item() # c*ln(L_k)
AttributeError: 'Tensor' object has no attribute 'astype'

https://zhuanlan.zhihu.com/p/146453159
I suspect this is caused by the numpy functions np.ceil() and np.log(). I tried replacing them with the corresponding torch functions, but that still does not work:

# U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item() # np should be avoided during conversion where possible [https://zhuanlan.zhihu.com/p/146453159]
U_part = self.factor * torch.ceil(torch.log(L_K)).int() # attempted torch equivalent

After this change the error became:
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
encountered an exception while running the Python function with test inputs.
Exception:
log(): argument 'input' (position 1) must be Tensor, not int

I would appreciate any pointers on how to change this if I don't want to use np. Many thanks ❀❀❀
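
Since the sequence length is a scalar, one option is to do this arithmetic with Python's `math` module after converting the length to a plain `int`, which avoids both numpy and tensor-only functions. This is a sketch, not a tested TorchScript fix: `sample_count` is a hypothetical helper, and under `torch.jit.trace` a value computed this way would presumably be baked into the trace as a constant for the example input length.

```python
import math

def sample_count(length, factor=5):
    """Return factor * ceil(ln(length)) using only Python scalars."""
    length = int(length)  # accepts an int or anything int() can convert
    if length < 1:
        raise ValueError("length must be a positive integer")
    return factor * math.ceil(math.log(length))

print(sample_count(96))
print(sample_count(48))
```

In the attention code this would stand in for the `np.ceil(np.log(...))` expression, for example `U_part = sample_count(L_K, self.factor)`.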

the weather dataset

The weather dataset's link in your paper has been unavailable for some time. Are there other ways to download the data, or could you provide the weather data on GitHub? Thanks

Sample (stochastically) from the model?

Hi,

I was wondering if it is possible to sample from the Informer model?

To be more specific:
During decoding in classic Transformer, each token follows a categorical distribution (i.e. the softmax) and is sampled/generated one by one. I understand that it's an advantage of Informer that this sequential decoding is not needed anymore.
But does this mean that one cannot get diverse samples from the Informer model?
If not, how would one go about generating those samples?

Thanks!

In order to change the order of dimensions, why is torch.tensor.view used?

Hello,

thank you for making the code public.

In the code (forward in ProbAttention), you used tensor.view to change the dimensions.

queries = queries.view(B, H, L_Q, -1)
keys = keys.view(B, H, L_K, -1)
values = values.view(B, H, L_K, -1)

Why not use permute? Doesn't the view function break the relationship between heads?
Is the goal of using tensor.view here to change the order of dimensions? If so, why tensor.view?
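
The difference the question is pointing at can be seen without any tensor library. The sketch below labels each element of an `[L, H]` layout with its `(position, head)` pair, then compares a view-style reinterpretation of the flat memory with a true transpose. It is a toy in plain Python lists with the batch and feature dimensions dropped, and both helper names are hypothetical.

```python
def view_as(flat, rows, cols):
    """Reinterpret a flat, row-major list as rows x cols without moving data (like view)."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def transpose(grid):
    """Swap the two axes (like transpose or permute on those axes)."""
    return [list(col) for col in zip(*grid)]

L, H = 2, 3
# Row-major [L, H] layout: each entry is labelled (position, head).
flat = [(l, h) for l in range(L) for h in range(H)]
original = view_as(flat, L, H)

viewed = view_as(flat, H, L)    # "view(H, L)": rows mix positions and heads
swapped = transpose(original)   # "transpose": row h holds only head h

print(viewed[0])   # [(0, 0), (0, 1)]
print(swapped[0])  # [(0, 0), (1, 0)]
```

In the view-style result the first row contains two different heads at the same position, whereas the transposed result keeps one head per row across all positions.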

Concatenating Feature maps from encoder

Hi,

I really appreciate you helping others to understand the code by proactively replying to the questions.

While going through the paper and the figure in the paper, it is mentioned that the output from the encoder is a concatenated feature map from each stack.

"we concatenate all the stacks’ outputs and have the final hidden representation of encoder."

But in the code, it seems like only the final encoder representation is used as an input to the decoder (without concatenating final encoder representation with lower level embedding representations.)

Could you please clarify why there is this discrepancy?

Thanks in advance.

Dataloaders for ECL and Weather

Hello everyone,
first of all, thank you for your amazing paper!
We are currently trying to reproduce your results and want to run the experiments on the weather and ECL datasets. Would it be possible for you to also publish the two data loaders you used for those particular datasets, to keep the preprocessing consistent?
Thanks a lot in advance!

about "circular" padding in conv op

Hello, just wondering why "circular" padding is chosen for all the conv operations, including the projection of token embeddings?

Reproducing the results for ETTh2

Hello,

Thanks a lot for publishing your results and code; I enjoyed reading the paper.
While trying to reproduce the paper results, the output was way off, especially for the ETTh2 dataset (run with the same configuration as in the Colab notebook).

testing : informer_ETTh2_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df512_atprob_ebtimeF_dtTrue_exp_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2857
test shape: (89, 32, 24, 7) (89, 32, 24, 7)
test shape: (2848, 24, 7) (2848, 24, 7)
mse:0.8689931831590327, mae:0.7622690107174594

Could you please let me know if you used different hyperparameters, or what I am doing wrong?

Thanks in advance.

Regards,
Kiran

results

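After running the program, trues.npy in the results folder is three-dimensional. How does it relate to the original two-dimensional data? The predictions in preds.npy are also three-dimensional. Why are they not a single column like the prediction target 'OT'? I don't quite understand and would appreciate your guidance.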
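
A likely reading, offered as an assumption rather than a statement about the repository's code, is that each saved sample is one sliding forecast window, which gives a shape of (number of windows, pred_len, number of features). The sketch below builds such windows from a 2-D table held as a list of rows; `forecast_windows` and `shape` are hypothetical helpers.

```python
def forecast_windows(rows, seq_len, pred_len):
    """Collect the pred_len target rows that follow each seq_len input window."""
    n_windows = len(rows) - seq_len - pred_len + 1
    return [rows[i + seq_len:i + seq_len + pred_len] for i in range(n_windows)]

def shape(windows):
    """Shape of a nested list: (windows, steps per window, features per step)."""
    return (len(windows), len(windows[0]), len(windows[0][0]))

table = [[float(t), float(t) * 10] for t in range(10)]  # 10 time steps, 2 features
windows = forecast_windows(table, seq_len=4, pred_len=3)
print(shape(windows))
```

Under this reading, consecutive windows overlap by all but one time step, so a single column per target would have to be picked out of the windows, for example by taking one step from each.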

There are many dependency issues. Please provide a working Google Colab notebook, if possible, for further contribution.

A question about "attn.py"?

In attn.py, lines 90-95:

    B, L, H, D = queries.shape
    _, S, _, _ = keys.shape

    queries = queries.view(B, H, L, -1)
    keys = keys.view(B, H, S, -1)
    values = values.view(B, H, S, -1)

Why does it use "view" rather than "transpose"?

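A question about the double data type

PyTorch defaults to float, which is faster and usually about as accurate as double. Why are both the data and the model in double precision in the code? Is there some other consideration? Thanks~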

About the ECL dataset

Thanks for the code! I noticed that you mentioned in another issue that the ECL dataset used in the Informer paper is the same as https://github.com/laiguokun/multivariate-time-series-data. Could you specify which column in the electricity dataset corresponds to the target "MT_320"? Thanks!

Is it possible to combine ETTh1 and ETTh2 data to predict them simultaneously

Is it possible to combine ETTh1 and ETTh2 data to predict them simultaneously? so the output shape would be something like (batch_size, prediction_length, feature, 2), where 2 is ETTh1 and ETTh2.

IndexError when using 'learned' or 'fixed' in args.embed

args.model = 'informerstack' # model of experiment, options: [informer, informerstack, informerlight(TBD)]

args.data = 'custom' # data
args.root_path = './' # root path of data file
args.data_path = 'test.csv' # data file
args.features = 'S' # forecasting task, options:[M, S, MS(TBD)]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate
args.target = 'target' # target feature in S or MS task
args.freq = 't' # freq for time features encoding

args.seq_len = 128 # input sequence length of Informer encoder
args.label_len = 96 # start token length of Informer decoder
args.pred_len = 15 # prediction sequence length

args.enc_in = 1 # encoder input size number of features in input
args.dec_in = 1 # decoder input size number of features
args.c_out = 7 # output size output dimension before FN
args.factor = 5 # probsparse attn factor
args.d_model = 512 # dimension of model
args.n_heads = 8 # num of heads
args.e_layers = 3 # num of encoder layers
args.d_layers = 2 # num of decoder layers
args.d_ff = 512 # dimension of fcn in model
args.dropout = 0.05 # dropout
args.attn = 'full' # attention used in encoder, options:[prob, full]
args.embed = 'fixed' # time features encoding, options:[timeF, fixed, learned]
args.activation = 'gelu' # activation
args.distil = True # whether to use distilling in encoder
args.output_attention = False # whether to output attention in encoder

args.batch_size = 64
args.learning_rate = 0.0001 ## 0.0001
args.loss = 'mse'
args.lradj = 'type1'

args.num_workers = 0
args.itr = 1
args.train_epochs = 6
args.patience = 3
args.des = 'exp'

Training with the parameters above fails with this error:

IndexError Traceback (most recent call last)
in
9 # train
10 print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
---> 11 exp.train(setting)
12
13 # test

~/max/Informer2020/exp/exp_informer.py in train(self, setting)
169 # encoder - decoder
170 if self.args.output_attention:
--> 171 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
172 else:
173 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
144 def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec,
145 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
--> 146 enc_out = self.enc_embedding(x_enc, x_mark_enc)
147 enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
148

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/embed.py in forward(self, x, x_mark)
105
106 def forward(self, x, x_mark):
--> 107 x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
108
109 return self.dropout(x)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/embed.py in forward(self, x)
75 x = x.long()
76
---> 77 minute_x = self.minute_embed(x[:,:,4]) if hasattr(self, 'minute_embed') else 0.
78 hour_x = self.hour_embed(x[:,:,3])
79 weekday_x = self.weekday_embed(x[:,:,2])

IndexError: index 4 is out of bounds for dimension 2 with size 4

With 'timeF', everything works fine.

Inconsistent with the original paper?

Hello:

Thank you for your work. While studying it, I found it confusing that in your code (line 68 in attn.py)

V_sum = V.sum(dim=-2)

you directly sum V for the flattened rows of the attention matrix. However, in your original paper you use the average value of V, which seems more reasonable to me. Which way is correct?

Thank you!
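
For context on why the mean is the natural value here: a query whose scores are all equal gets softmax weights of 1/L, so its attention output equals the mean of V, not the sum. The sketch below checks this with plain Python lists; the values are made up and the helper names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, values):
    """Weighted sum of value vectors using softmax(scores) as weights."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
uniform = attend([0.0, 0.0, 0.0], V)              # all scores equal
mean_v = [sum(col) / len(V) for col in zip(*V)]   # column-wise mean of V
sum_v = [sum(col) for col in zip(*V)]             # column-wise sum of V
print(uniform, mean_v, sum_v)
```

The uniform-attention output matches `mean_v`, and `sum_v` is larger by a factor of the sequence length.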

ValueError: could not convert string to float:

Hi,
The same script that was working this morning now gives me:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-8eee1c4218f9> in <module>()
     14     target=args.target,
     15     timeenc=timeenc,
---> 16     freq=args.freq
     17 )
     18 data_loader = DataLoader(

5 frames
/content/Informer2020/data/data_loader.py in __init__(self, root_path, flag, size, features, data_path, target, scale, timeenc, freq)
    216         self.root_path = root_path
    217         self.data_path = data_path
--> 218         self.__read_data__()
    219 
    220     def __read_data__(self):

/content/Informer2020/data/data_loader.py in __read_data__(self)
    244         if self.scale:
    245             train_data = df_data[border1s[0]:border2s[0]]
--> 246             self.scaler.fit(train_data.values)
    247             data = self.scaler.transform(df_data.values)
    248         else:

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py in fit(self, X, y)
    667         # Reset internal state before fitting
    668         self._reset()
--> 669         return self.partial_fit(X, y)
    670 
    671     def partial_fit(self, X, y=None):

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y)
    698         X = check_array(X, accept_sparse=('csr', 'csc'),
    699                         estimator=self, dtype=FLOAT_DTYPES,
--> 700                         force_all_finite='allow-nan')
    701 
    702         # Even in the case of `with_mean=False`, we update the mean anyway

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    529                     array = array.astype(dtype, casting="unsafe", copy=False)
    530                 else:
--> 531                     array = np.asarray(array, order=order, dtype=dtype)
    532             except ComplexWarning:
    533                 raise ValueError("Complex data not supported\n"

/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

ValueError: could not convert string to float:

Note : args.freq = 'h' # freq for time features encoding
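
The empty value after "could not convert string to float:" suggests that some cell in the CSV is blank, whitespace-only, or otherwise non-numeric by the time it reaches the scaler. A small check like the one below can locate such cells before training. It is a sketch using only the standard library, it assumes the first column holds the date, and `find_non_numeric` is a hypothetical helper.

```python
import csv
import io

def find_non_numeric(csv_file, skip_columns=1):
    """Return (line_number, column_name, cell) for every cell float() cannot parse."""
    reader = csv.reader(csv_file)
    header = next(reader)
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name, cell in zip(header[skip_columns:], row[skip_columns:]):
            try:
                float(cell)
            except ValueError:
                problems.append((line_no, name, cell))
    return problems

# Made-up sample with one blank cell in the "load" column.
sample = io.StringIO("date,load,OT\n2021-01-01 00:00,1.5,30.1\n2021-01-01 01:00,,29.8\n")
print(find_non_numeric(sample))
```

For a real file, the same function can be called on `open(path, newline="")`.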

error:report

When I used 'informerstack' instead of informer in command below:
python -u main_informer.py --model informerstack --data ETTm1 --attn prob --freq t

an error occurred:
(informer) lizhaorui@server-System-Product-Name:~/DL/Informer2020$ python -u main_informer.py --model informerstack --data ETTh1 --attn prob --freq h
Args in experiment:
Namespace(activation='gelu', attn='prob', batch_size=32, c_out=7, checkpoints='./checkpoints/', d_ff=2048, d_layers=1, d_model=512, data='ETTh1', data_path='ETTh1.csv', dec_in=7, des='test', device_ids=[0, 1], devices='0,1', distil=True, dropout=0.05, dvices='0,1', e_layers=2, embed='timeF', enc_in=7, factor=5, features='M', freq='h', gpu=0, itr=2, label_len=48, learning_rate=0.0001, loss='mse', lradj='type1', model='informerstack', n_heads=8, num_workers=0, output_attention=False, patience=3, pred_len=48, root_path='./data/ETT/', seq_len=512, target='OT', train_epochs=6, use_amp=False, use_gpu=True, use_multi_gpu=True)
Use GPU: cuda:0

start training : informerstack_ETTh1_ftM_sl512_ll48_pl48_dm512_nh8_el2_dl1_df2048_atprob_fc5_ebtimeF_dtTrue_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8081
val 2833
test 2833
Traceback (most recent call last):
File "main_informer.py", line 91, in
exp.train(setting)
File "/home/lizhaorui/DL/Informer2020/exp/exp_informer.py", line 199, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/DL/Informer2020/models/model.py", line 147, in forward
enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/DL/Informer2020/models/encoder.py", line 99, in forward
x_stack = torch.cat(x_stack, -2)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/RegisterCPU.cpp:5925 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/RegisterCUDA.cpp:7100 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:641 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradNestedTensor: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:10525 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
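
The empty list passed to `torch.cat` would be consistent with the stack containing no encoders at all. The sketch below assumes the stack depths are derived from `e_layers` as `list(range(e_layers, 2, -1))`; that expression is an assumption about the repository version in use and is not shown in this report, where `e_layers=2`.

```python
def stack_depths(e_layers):
    """Assumed rule: one encoder per depth from e_layers down to 3."""
    return list(range(e_layers, 2, -1))

for e_layers in (2, 3, 4):
    depths = stack_depths(e_layers)
    status = "ok" if depths else "no encoders, nothing to concatenate"
    print(e_layers, depths, status)
```

If that assumption holds, `e_layers` would need to be at least 3 for informerstack to build any encoder.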

Was the model not pretrained on a large amount of data?

question about the test part result

Hi zhouhaoyi,
I am a machine learning beginner. I watched your video about Informer on bilibili, and I was curious about it, so I came here to look at the code.
I downloaded the code and ran it, but I was confused by the result.
My understanding is that the test results represent the 'OT' values, but the values I got look like this:
-0.22336945 0.01950708 -0.08188418 -0.00959123 -0.34219092 0.19225414 -0.40179282
whereas the real values look like this:
10.114 3.55 6.183 1.564 3.716 1.462 9.567
I am confused about this. Can you help me? What is the reason?

About the inconsistency of the ProbSparse self-attention implementation

First of all, thanks for your great work.

In attn.py, line 111, I find:
scores_top, index = self._prob_QK(queries, keys, u, U)
It seems that U is passed as n_top, whereas according to the paper it should be u?

Also, U is set to c ln(n) instead of m ln(n) in line 108:
U = self.factor * np.ceil(np.log(S)).astype('int').item()

I don't know whether I have misunderstood this, and I hope for your reply.

Can't use multi-gpu (Colab example)

Hi, Thanks for your great work.

I just tried to run the provided Colab example using multiple GPUs (0,1,2,3).

After changing use_multi_gpu from False to True, I got AssertionError: Invalid device id.

Does that mean Colab refuses to offer 4 GPUs, or is there a bug?

By the way, I want to make sure that my problem can be solved by the Informer model.

Let's say my custom dataset has 1 date column, N feature columns, and 1 numerical target column, and the goal is to predict the value of the target column using the other columns in the same row.

Is this kind of problem called "multivariate predict univariate", so that I should choose 'MS' in args.features?

Thanks again.

For custom dataset

Congratulations on the best paper award!
I'm new to transformers for time-series data and just found this cool paper!
It would be great if you could provide an example showing how to use custom data.
Thanks in advance!

on running informer.py file

`Use GPU: cuda:0

start training : informer_ETTh1_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df1024_atprob_ebfixed_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8521
val 2857
test 2857
Traceback (most recent call last):
File "main_informer.py", line 69, in
exp.train(setting)
File "/content/Informer2020/exp/exp_informer.py", line 157, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/model.py", line 71, in forward
dec_out = self.decoder(dec_out, enc_out, x_mask=dec_self_mask, cross_mask=dec_enc_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/decoder.py", line 46, in forward
x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/decoder.py", line 23, in forward
attn_mask=x_mask
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/attn.py", line 141, in forward
attn_mask
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/attn.py", line 25, in forward
attn_mask = TriangularCausalMask(B, L, device=queries.device)
File "/content/Informer2020/utils/masking.py", line 7, in init
self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)
RuntimeError: "triu" not implemented for 'Bool'`

perf test

First of all, I like the idea behind the attention-matrix computation, and it should work.

I then ran a small performance test, and the results don't seem to add up. I'd appreciate it if anyone could shed some light on this.

The code snippet is provided below. I basically removed everything except the attention layers and changed the encoder depth to 5.

=================RESULTS=======================
When using full attention: 0.008 sec per forward pass and 18.8 GB of GPU RAM
When using prob attention: 0.21 sec per forward pass and >20 GB of GPU RAM

=================ENV===========================
UBUNTU: 20.04
NVIDIA DRIVER: 460.39
CUDA: 11.03
PYTORCH: 1.7.1
GPU: RTX3090
=================CODE==========================

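import torch
import torch.nn as nn

# Import paths assumed from the Informer2020 repository layout.
from models.encoder import Encoder, EncoderLayer
from models.attn import FullAttention, ProbAttention, AttentionLayer

class InformerSpeedTest(nn.Module):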
    def __init__(self, enc_in, dec_in, c_out, seq_len, label_len, out_len,
                factor=5, d_model=512, n_heads=8, e_layers=5, d_layers=2, d_ff=512,
                dropout=0.0, attn='prob', embed='fixed', data='ETTh', activation='gelu', 
                device=torch.device('cuda:0')):
        super(InformerSpeedTest, self).__init__()
        self.attn = attn

        # Attention
        Attn = ProbAttention if attn=='prob' else FullAttention
        # Encoder
        self.encoder = Encoder(
            [
                EncoderLayer(
                    AttentionLayer(Attn(False, factor, attention_dropout=dropout), 
                                d_model, n_heads),
                    d_model,
                    d_ff,
                    dropout=dropout,
                    activation=activation
                ) for l in range(e_layers)
            ]
        )

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
        enc_out = self.encoder(x_enc, attn_mask=enc_self_mask)
        return enc_out

if __name__ == "__main__":
    import argparse
    import time

    parser = argparse.ArgumentParser()
    parser.add_argument("--prob", action="store_true")
    args = parser.parse_args()
    rounds = 50        # number of timed forward passes
    batch_size = 128
    length = 512       # sequence length
    d_model = 512
    device = "cuda:0"
    attn = None        # anything other than "prob" selects FullAttention

    if args.prob:
        print("Use prob")
        attn = "prob"
    else:
        print("Use full")

    test = InformerSpeedTest(None, None, None, None, None, None, attn=attn).to(device)
    test.train() #test.eval()

    for i in range(rounds):
        print(f"Round: {i}")
        x = torch.randn(batch_size, length, d_model).to(device)
        s = time.time()
        test(x, None, None, None)
        print(f"Cost: {time.time() - s:.6f}s")
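
One thing worth checking in a measurement like this: CUDA kernels are launched asynchronously, so `time.time()` around a forward call may capture only the launch overhead unless the device is synchronized first. A minimal sketch of a timing helper follows; `time_forward` and its parameters are hypothetical names, and the `sync` argument is meant to be `torch.cuda.synchronize` on a GPU.

```python
import time

def time_forward(fn, sync=lambda: None, warmup=3, rounds=10):
    """Return the mean wall-clock seconds per call of `fn`.

    `sync` is called before each clock read so that pending asynchronous
    work (e.g. CUDA kernels) is included in the measurement.
    """
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(rounds):
        fn()
    sync()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) / rounds

# Stand-in workload so the sketch runs without a GPU.
mean_s = time_forward(lambda: sum(range(10_000)))
print(f"{mean_s:.6f} s per call")
```

On a GPU this would be called as `time_forward(lambda: test(x, None, None, None), sync=torch.cuda.synchronize)`.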

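How to forecast a custom series sampled at 1 s intervals

Thanks to the authors for such impressive work!

My custom dataset has a sampling interval of 1 s, but the freq option seems to support only granularities down to minutes. What should I modify so that the model can forecast a series sampled every second?

I would also like to ask whether the input sequence can have variable length at prediction time. For example, whether the input is the previous 60 seconds or the previous 40 seconds of a trajectory, the model would always predict the following 60 seconds.
Thanks in advance :)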
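
A possible starting point for second-level data, as a minimal sketch: encode each timestamp as calendar features scaled to [-0.5, 0.5], adding second-of-minute to the usual minute, hour and day fields. The function name `second_level_features` and the exact feature set are assumptions, not the repository's implementation; the model's time-feature input width would have to match the length of this vector.

```python
from datetime import datetime
from typing import List

def second_level_features(ts: datetime) -> List[float]:
    """Encode a timestamp as six calendar features in [-0.5, 0.5]."""
    return [
        ts.second / 59.0 - 0.5,                       # second of minute
        ts.minute / 59.0 - 0.5,                       # minute of hour
        ts.hour / 23.0 - 0.5,                         # hour of day
        ts.weekday() / 6.0 - 0.5,                     # day of week (Monday = 0)
        (ts.day - 1) / 30.0 - 0.5,                    # day of month
        (ts.timetuple().tm_yday - 1) / 365.0 - 0.5,   # day of year
    ]

feats = second_level_features(datetime(2021, 1, 1, 0, 0, 30))
print(len(feats), feats[0])
```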

Colab error on ETTm1

Hi,

I was trying to run the model in the provided Colab notebook with the ETTm1 dataset,
and I ran into the error RuntimeError: mat1 dim 1 must match mat2 dim 0:

/content/Informer2020/exp/exp_informer.py in train(self, setting)
    171                     outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
    172                 else:
--> 173                     outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
    174 
    175                 f_dim = -1 if self.args.features=='MS' else 0

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/content/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
     67     def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, 
     68                 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
---> 69         enc_out = self.enc_embedding(x_enc, x_mark_enc)
     70         enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
     71 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/content/Informer2020/models/embed.py in forward(self, x, x_mark)
    105 
    106     def forward(self, x, x_mark):
--> 107         x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
    108 
    109         return self.dropout(x)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/content/Informer2020/models/embed.py in forward(self, x)
     92 
     93     def forward(self, x):
---> 94         return self.embed(x)
     95 
     96 class DataEmbedding(nn.Module):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
     91 
     92     def forward(self, input: Tensor) -> Tensor:
---> 93         return F.linear(input, self.weight, self.bias)
     94 
     95     def extra_repr(self) -> str:

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
-> 1692         output = input.matmul(weight.t())
   1693         if bias is not None:
   1694             output += bias

RuntimeError: mat1 dim 1 must match mat2 dim 0

All I changed was:

args.data = 'ETTm1' # data

Am I missing anything in the configuration?
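
The traceback ends in a linear layer inside the embedding, which suggests that the number of time features in `x_mark` does not match what that layer was built for. A minimal sketch of such a mismatch, assuming the time-feature embedding sizes its input from a frequency code: the `FREQ_TO_DIM` table and `check_time_features` are hypothetical stand-ins, and the repository's actual mapping should be checked in models/embed.py. Under this assumption, switching `args.data` to a minute-level dataset without also switching the frequency setting could produce this kind of shape error.

```python
# Assumed number of time features per frequency code (hypothetical table).
FREQ_TO_DIM = {"h": 4, "t": 5}  # 'h' = hourly, 't' = minutely

def check_time_features(data_freq, model_freq):
    """Raise if the data yields a different number of time features
    than the embedding's linear layer was built to accept."""
    got, expected = FREQ_TO_DIM[data_freq], FREQ_TO_DIM[model_freq]
    if got != expected:
        raise ValueError(
            f"x_mark has {got} features but the embedding expects {expected}"
        )
    return got

print(check_time_features("h", "h"))   # consistent configuration
try:
    check_time_features("t", "h")      # minute-level data, hourly model setting
except ValueError as err:
    print(err)
```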

Error of tensor dimension

After executing the following command:
python -u main_informer.py --model informer --data ETTh1

with your downloaded data, I get the following error:
Use GPU: cuda:0

start training : informer_ETTh1_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df1024_atprob_ebfixed_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8521
val 2857
test 2857
Traceback (most recent call last):
File "main_informer.py", line 69, in <module>
exp.train(setting)
File "C:\Users\User\Informer2020\exp\exp_informer.py", line 157, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "C:\Users\User\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\User\Informer2020\models\model.py", line 67, in forward
enc_out = self.enc_embedding(x_enc, x_mark_enc)
File "C:\Users\User\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\User\Informer2020\models\embed.py", line 95, in forward
x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
RuntimeError: The size of tensor a (98) must match the size of tensor b (96) at non-singleton dimension 1
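
A length change from 96 to 98 is what a stride-1 convolution with kernel size 3 produces when it pads two steps on each side instead of one. The arithmetic below is a sketch under the assumption that one of the summed embeddings is such a convolution and that its padding is the culprit; `conv1d_out_len` is a hypothetical helper implementing the standard output-length formula.

```python
def conv1d_out_len(length, kernel_size, padding, stride=1, dilation=1):
    """Standard 1-D convolution output length (padding applied on both sides)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv1d_out_len(96, kernel_size=3, padding=1))  # length preserved
print(conv1d_out_len(96, kernel_size=3, padding=2))  # two extra steps
```

If this is the cause, it may be worth checking how the installed PyTorch version interprets the padding argument for the padding mode in use.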

about the distilling process

From your paper I read: "To enhance the robustness of the distilling operation, we build halving replicas of the main stack and progressively decrease the number of self-attention distilling layers by dropping one layer at a time, like a pyramid in Fig.(3), such that their output dimension is aligned. Thus, we concatenate all the stacks' outputs and have the final hidden representation of encoder." Somehow I failed to locate the corresponding code sections (also marked with a black box in the picture below), though I did notice a max-pool here, which I believe is part of the upper-portion operation:

Informer2020/models/encoder.py

Line 15 in 956ac31

self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

Any clue?
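
As a rough aid for locating the pieces, the length bookkeeping of such a pyramid can be sketched as follows. It assumes each distilling step shortens the sequence with the `MaxPool1d(kernel_size=3, stride=2, padding=1)` quoted above, that a stack with n attention layers has n - 1 distilling steps, and that each replica takes the last half of the previous stack's input. The function names are hypothetical and this is not the repository's code.

```python
def maxpool_out_len(length, kernel_size=3, stride=2, padding=1):
    """Output length of a 1-D max pool (standard formula, dilation 1)."""
    return (length + 2 * padding - kernel_size) // stride + 1

def pyramid_lengths(seq_len, e_layers=3):
    """Return (input_len, n_layers, output_len) for each stack of the pyramid."""
    stacks = []
    for i in range(e_layers):
        inp_len = seq_len // (2 ** i)   # replica i sees the last seq_len / 2^i steps
        n_layers = e_layers - i         # one attention layer fewer per replica
        out_len = inp_len
        for _ in range(n_layers - 1):   # one distilling step between attention layers
            out_len = maxpool_out_len(out_len)
        stacks.append((inp_len, n_layers, out_len))
    return stacks

for inp_len, n_layers, out_len in pyramid_lengths(96):
    print(inp_len, n_layers, out_len)
```

With `seq_len=96`, every stack ends at the same length under these assumptions, which is what allows the outputs to be concatenated.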

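About algorithmic complexity

attn.py, line 48:

K_expand = K.unsqueeze(-3).expand(B, H, S, L, E)

The space complexity of this step is rather high, exceeding that of the original Transformer. Is there a better way to implement it?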
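
To put rough numbers on this, the sketch below compares the element count of a full score matrix with that of a sampled-key tensor, per batch element and head. It assumes the expanded tensor is subsequently indexed to gather `u = factor * ceil(ln L)` sampled keys per query, each of per-head width `E`; the helper names and these assumptions are illustrative only. Note that `expand` itself returns a view in PyTorch and allocates nothing; memory is consumed once the sampled entries are gathered into a new tensor.

```python
import math

def full_score_elems(L):
    """Elements in an L x L attention score matrix."""
    return L * L

def sampled_key_elems(L, E, factor=5):
    """Elements in an L x u x E tensor of sampled keys, u = factor * ceil(ln L)."""
    u = min(L, factor * math.ceil(math.log(L)))
    return L * u * E

L, E = 512, 64  # e.g. d_model = 512 split over 8 heads
print(full_score_elems(L), sampled_key_elems(L, E))
```

Under these assumptions the sampled tensor is several times larger than the full score matrix at L = 512, and becomes the smaller of the two only once L exceeds u * E.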

multi-gpu training support

Can this model support multi-gpu training to speed up training cycles?

Eq3

Hi,

Thanks for the nice work. I have one question about eq3. To achieve sparsity, the proposed Probsparse attends only to the top-u queries. I am a little bit confused about this design. From my side, it's more rational to attend to partial keys for each query. Could you please elaborate more on this design? Thanks.
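
For readers following the question, the selection step can be sketched as below: each query is scored by the max-minus-mean of its scaled dot products with the keys, and only the `u` highest-scoring queries are kept for full attention. This is a simplified reading of the paper's sparsity measurement (it scores against all keys rather than a sample), and `top_u_queries` is a hypothetical name.

```python
import math

def top_u_queries(Q, K, u):
    """Return indices of the u queries with the largest max-minus-mean score."""
    d = len(Q[0])
    scores = []
    for q in Q:
        # Scaled dot product of this query with every key.
        dots = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        scores.append(max(dots) - sum(dots) / len(dots))
    ranked = sorted(range(len(Q)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:u])

Q = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
K = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
print(top_u_queries(Q, K, u=1))
```

A query whose dot products are nearly uniform scores close to zero and is left out, while a query with one dominant key scores high and is kept.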

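Question about model training

Hello, I ran into something puzzling when training the model with this code. Although I set training to 6 epochs, in almost every run the val loss and test loss stop decreasing after just 1 epoch (2 epochs at most), and the final results are not particularly good.

I tried several datasets: ETTh1, ETTm1, and two other weather datasets. I mostly used the default parameters and made few changes to the code.

I suspect I may be doing something wrong somewhere and would appreciate any suggestions for improvement. Thank you.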

About the best result's hyperparameters

Hi.
Thank you for your amazing paper.
I think your ideas and the technical design are awesome.

I am trying to replicate your paper, but I have some questions.
The Informer prediction result (len=336) shown below is very interesting,
so I would like to know the detailed hyperparameters behind it,
e.g. seq_len, label_len, pred_len, e_layers, d_layers, and all other parameters.

Could you tell me the specific hyperparameters?
Also, how could this forecasting system be used for anomaly detection?
(I just want to know your thoughts.)

Thank you very much.
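
On the anomaly-detection question, one generic approach (not something the authors describe) is to flag points whose forecast residual is unusually large. A minimal sketch, assuming aligned lists of actual and forecast values; `flag_anomalies` and the threshold `k` are hypothetical:

```python
import statistics

def flag_anomalies(actual, forecast, k=3.0):
    """Return indices whose residual lies more than k standard deviations
    from the mean residual."""
    residuals = [a - f for a, f in zip(actual, forecast)]
    mu = statistics.mean(residuals)
    sigma = statistics.pstdev(residuals)
    if sigma == 0:
        return []  # all residuals identical: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - mu) > k * sigma]

actual = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1, 0.9, 1.0, 1.0]
forecast = [1.0] * 10
print(flag_anomalies(actual, forecast, k=2.0))
```

In practice the residual statistics would be estimated on data believed to be normal, so that a large anomaly does not inflate its own threshold.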

Little problem in utils/masking.py

Line 15 in utils/masking.py should be dtype=torch.bool, not dytpe=torch.bool. ^-^

	AttentionLayer(FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
	d_model, n_heads),
	AttentionLayer(FullAttention(False, factor, attention_dropout=dropout, output_attention=False),
	d_model, n_heads),
