zhouhaoyi / informer2020
The GitHub repository for the paper "Informer" accepted by AAAI 2021.
License: Apache License 2.0
After training, I can't download checkpoint.pth via the file browser panel.
This is a Google Colab issue: the folder needs a name other than checkpoints (see googlecolab/colabtools#621).
However, this problem should not affect running the example.
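Thanks for sharing.
Whenever I try e_layers=4 or larger, training always fails when I call exp.train(settings) and cannot proceed.
With e_layers=3 or fewer, everything works normally. I don't understand why.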
RuntimeError Traceback (most recent call last)
in
13 # train
14 print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
---> 15 exp.train(setting)
16
17 # test
~/max/Informer2020/exp/exp_informer.py in train(self, setting)
169 # encoder - decoder
170 if self.args.output_attention:
--> 171 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
172 else:
173 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
145 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
146 enc_out = self.enc_embedding(x_enc, x_mark_enc)
--> 147 enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
148
149 dec_out = self.dec_embedding(x_dec, x_mark_dec)
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
94 inp_len = inp_len//2
95 continue
---> 96 x, attn = encoder(x[:, -inp_len:, :])
97 x_stack.append(x); attns.append(attn)
98 inp_len = inp_len//2
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
65 if self.conv_layers is not None:
66 for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
---> 67 x, attn = attn_layer(x, attn_mask=attn_mask)
68 x = conv_layer(x)
69 attns.append(attn)
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
43 new_x, attn = self.attention(
44 x, x, x,
---> 45 attn_mask = attn_mask
46 )
47 x = x + self.dropout(new_x)
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/attn.py in forward(self, queries, keys, values, attn_mask)
151 keys,
152 values,
--> 153 attn_mask
154 )
155 out = out.view(B, L, -1)
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/attn.py in forward(self, queries, keys, values, attn_mask)
109 u = self.factor * np.ceil(np.log(L_Q)).astype('int').item() # c*ln(L_q)
110
--> 111 scores_top, index = self._prob_QK(queries, keys, sample_k=U_part, n_top=u)
112
113 # add scale factor
~/max/Informer2020/models/attn.py in _prob_QK(self, Q, K, sample_k, n_top)
58 # find the Top_k query with sparisty measurement
59 M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
---> 60 M_top = M.topk(n_top, sorted=False)[1]
61
62 # use the reduced Q to calculate Q_K
RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:26
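A plausible cause, under the assumptions below, is that distilling shrinks the sequence until `n_top` exceeds the number of queries in the last encoder layer, so `topk` is asked for more elements than exist. The sketch assumes `seq_len=96`, `factor=5`, `e_layers - 1` distilling steps each implemented as `MaxPool1d(kernel_size=3, stride=2, padding=1)`, and `n_top = factor * ceil(ln L_Q)` as in the traceback. The helper names are hypothetical.

```python
import math


def distil_len(length):
    # Output length of MaxPool1d(kernel_size=3, stride=2, padding=1),
    # assumed here to be the distilling step between encoder layers.
    return (length + 2 * 1 - 3) // 2 + 1


def n_top(l_q, factor=5):
    # Number of "active" queries selected by ProbSparse attention: c * ceil(ln L_Q).
    return factor * math.ceil(math.log(l_q))


def layer_report(seq_len, e_layers, factor=5):
    """Return (layer index, L_Q, n_top, topk is valid) for every encoder layer."""
    report = []
    length = seq_len
    for i in range(e_layers):
        u = n_top(length, factor)
        report.append((i, length, u, u <= length))
        if i < e_layers - 1:  # assumed: e_layers - 1 distilling steps
            length = distil_len(length)
    return report


for row in layer_report(96, 4):
    print(row)

# A defensive fix inside the attention layer would clamp before calling topk:
# u = min(u, L_Q)
```

Under these assumptions, the fourth layer sees only 12 queries but requests 15. With three layers, the last layer still has 24 queries for 20 requested, which is consistent with the report above.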
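First of all, thanks for sharing.
I have been training your model with different parameters and datasets. Sometimes training runs out of memory (OOM) halfway through. Can I resume training this model from the last checkpoint instead of starting over?
When I call train(settings), training restarts from the beginning.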
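The `train(setting)` call restarts from scratch. A generic PyTorch pattern for resuming is to save the optimizer state and the epoch alongside the model weights. This sketch assumes you add it to your own training loop; `save_training_state` and `load_training_state` are hypothetical helpers, not functions in the repository.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_training_state(path, model, optimizer, epoch):
    # Store everything needed to continue: weights, optimizer state, last finished epoch.
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)


def load_training_state(path, model, optimizer, device="cpu"):
    # Restore weights and optimizer state; return the epoch to resume from.
    state = torch.load(path, map_location=device)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1


# Usage with a stand-in model (replace with exp.model and its optimizer).
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
ckpt = os.path.join(tempfile.mkdtemp(), "resume.pth")
save_training_state(ckpt, model, optimizer, epoch=2)

fresh = nn.Linear(4, 1)
fresh_opt = torch.optim.Adam(fresh.parameters(), lr=1e-4)
start_epoch = load_training_state(ckpt, fresh, fresh_opt)
```

Start the epoch loop at `start_epoch`. If the learning rate is adjusted per epoch, recompute it for `start_epoch` as well.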
Congratulations on the best paper award!
We collected the ECL dataset and found that it contains a total of 370 clients. However, in the experimental part of the paper, you say "It collects the electricity consumption (Kwh) of 321 clients" and "set 'MT 320' as the target value". How did you choose the target value? Could you provide the data for these 321 clients? Thank you.
Hello,
I am currently trying to run your code to see how it works, but every time the run terminates too early because of EarlyStopping, and the resulting MSE and MAE are quite far from the results shown here. I have had almost no involvement with programming for a long time, so my knowledge is too limited to solve the problem myself. I did try setting the EarlyStopping patience to 100, but the code still ended on its own even though it reported that the EarlyStopping counter was at 3 out of 100.
Also, at the start the code prints "Use GPU: cuda:0", which made me worry that training might initially be running on the CPU. When I checked Task Manager, GPU usage was almost 100%, so I believe it is fine. Still, the fact that the run always ends early makes me wonder whether it is using the GPU properly. It would be great if you could help me with this.
In case any information on my specs is needed:
OS: Windows Server 2019 64-bit
Processor: Intel Xeon CPU @ 2.20GHz 2.20GHz
Memory: 30GB
GPU: Nvidia Tesla V100
Thank you in advance. Let me know if there is any additional information you need.
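One possible explanation is that the loop stops after a fixed number of epochs, and early stopping can only end it sooner. This assumes the default `train_epochs=6` from the example configuration; compare `args.train_epochs = 6` and `args.patience = 3` in the settings quoted later on this page. The simplified loop below uses a hypothetical `run_training` function and no real model. It shows that raising `patience` alone does not extend training.

```python
class EarlyStopping:
    """Simplified early stopping: stop after `patience` epochs without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.counter = 0
        self.best = float("inf")
        self.early_stop = False

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True


def run_training(val_losses, train_epochs, patience):
    """Return (epochs actually run, final counter) for a given validation-loss curve."""
    stopper = EarlyStopping(patience)
    epochs_run = 0
    for epoch in range(train_epochs):  # hard upper bound, independent of patience
        epochs_run += 1
        stopper.step(val_losses[epoch])
        if stopper.early_stop:
            break
    return epochs_run, stopper.counter


losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.67, 0.5, 0.4]
print(run_training(losses, train_epochs=6, patience=100))  # (6, 3): stops at the epoch limit
print(run_training(losses, train_epochs=8, patience=100))  # (8, 0)
```

With `train_epochs=6` the run ends with the counter at 3 even though the patience is 100. Increasing `train_epochs` is what lets training continue.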
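Hello, I'd like to ask: in the paper, is each stack's input the second half of the previous stack's input, or part of the previous stack's output? In the code, it looks like the initial input x gets overwritten by the output.
Informer2020/models/encoder.py, line 96 in c648cc6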
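This sketch takes one reading of the paper: each stack receives the trailing half of the previous stack's input. Under that reading, the slices come from the unmodified input rather than from an `x` that was overwritten by a stack's output. The helper `stack_inputs` is hypothetical, and time steps are represented as plain integers.

```python
def stack_inputs(sequence, n_stacks):
    """Input slice for each stack: the last len(sequence) // 2**i steps of the ORIGINAL input."""
    slices = []
    for i in range(n_stacks):
        inp_len = len(sequence) // (2 ** i)
        slices.append(sequence[-inp_len:])  # always slice the original, never a stack's output
    return slices


steps = list(range(8))  # time-step indices 0..7
for s in stack_inputs(steps, 3):
    print(s)
```

In the encoder loop, this corresponds to keeping each stack's output in a separate variable, for example `x_s, attn = encoder(x[:, -inp_len:, :])`, so that `x` still holds the original input.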
I don't understand how to use the network to predict future data; there is no example code for predicting unseen data. I want to pass X previous values and predict the next values.
I trained the model but have no idea how to use it to predict new data. I would like to pass a CSV with two columns, "dates" and "past values", and predict the next values with the model. From what I understand, this is not possible without modifying the source code. The Colab example also does not provide any code or comments on how to do this.
By the way, what exactly are "batch_x", "batch_y", "batch_x_mark", and "batch_y_mark"? Which ones do I have to pass to Informer to predict the next values?
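The training call in the tracebacks on this page, `self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)`, suggests the following roles:

- `batch_x` is the encoder input window of `seq_len` steps.
- `batch_y` is a window of `label_len + pred_len` steps that overlaps the end of `batch_x`.
- The `_mark` tensors hold the matching time features.

The NumPy sketch below cuts one such sample from a 1-D series. The helper `make_windows` is hypothetical, and the zero placeholder for the forecast horizon is an assumption about how `dec_inp` is built.

```python
import numpy as np


def make_windows(series, start, seq_len, label_len, pred_len):
    """Cut one (encoder input, target window, decoder input) sample from a 1-D series."""
    s_end = start + seq_len
    r_begin = s_end - label_len              # decoder's known part overlaps the end of x
    r_end = r_begin + label_len + pred_len
    x = series[start:s_end]                  # plays the role of batch_x
    y = series[r_begin:r_end]                # plays the role of batch_y
    # Decoder input: the known label_len steps followed by placeholders for the future.
    dec_inp = np.concatenate([y[:label_len], np.zeros(pred_len)])
    return x, y, dec_inp


series = np.arange(20, dtype=float)
x, y, dec_inp = make_windows(series, start=0, seq_len=8, label_len=4, pred_len=3)
# x holds steps 0..7, y holds steps 4..10, dec_inp is steps 4..7 followed by three zeros
```

To forecast beyond the end of your data, follow these steps:

1. Take the last `seq_len` observations as `x`.
2. Use the last `label_len` of them as the known part of `dec_inp`.
3. Generate the future timestamps for `batch_y_mark` yourself.

The last `pred_len` outputs would then be the forecast.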
A bug when creating the model:
Lines 50 to 53 in a87092b
This bug causes the decoder's cross-attention to use no causal mask, while its self-attention uses a causal mask. Fortunately, there is no information leak in Informer, but this is still completely different from what you wrote in the paper.
In your model code, I found that you use a data scaler for the train/val/test datasets separately. However, I think you may be using future information during validation and testing, because during online prediction we can't get the whole dataset in advance. In addition, I didn't find an inverse transformation, which is important for showing the model's real performance on the test dataset. Could you give more information on how to handle data scaling and the inverse transformation? Thanks.
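In the loader code quoted further down this page, `self.scaler.fit(train_data.values)` is followed by `self.scaler.transform(df_data.values)`. This appears to fit the scaler on the training rows only and then apply it to the whole series. Below is a minimal NumPy sketch of that pattern, plus the inverse transform needed to report errors in the original units. The `SimpleScaler` class is a hypothetical stand-in for scikit-learn's `StandardScaler`.

```python
import numpy as np


class SimpleScaler:
    """Standardise each column with the mean/std of the data passed to fit()."""

    def fit(self, data):
        self.mean = data.mean(axis=0)
        self.std = data.std(axis=0)
        return self

    def transform(self, data):
        return (data - self.mean) / self.std

    def inverse_transform(self, data):
        return data * self.std + self.mean


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
train, test = data[:3], data[3:]

scaler = SimpleScaler().fit(train)              # statistics from the training split only
test_scaled = scaler.transform(test)            # what the model sees
preds_scaled = test_scaled                      # stand-in for model output in scaled units
preds = scaler.inverse_transform(preds_scaled)  # back to original units for metrics
```

Metrics computed on `preds_scaled` are in standardised units. This is also why raw model outputs look like small values around zero rather than the original magnitudes.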
Thank you for the excellent work, and sorry to bother you again. 😃
Since the Colab example is meant for custom data, why do we have to pass the data name "ETTh1"?
I also found the code d_inp = 4 if data=='ETTh' else 5 in models/embed.py. Does this mean ETTh uses (Y, M, W, D) while ETTm needs an additional minute field? This is unfriendly for custom data, or at least a little confusing.
I also found that you added freq to set the dimensionality of the timestamp features. Isn't that enough to get the time features?
What I mean is that I don't understand why the data parameter has to be passed into the model and the DataEmbed class. In my opinion, the data should be independent of the model.
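I tried running the data with the parameters from the scripts, for example:
python -u main_informer.py --model informer --data ETTh2 --features S --seq_len 336 --label_len 336 --pred_len 168 --e_layers 2 --d_layers 1 --attn prob --des 'Exp' --itr 5
The test result was mse:0.25672146677970886, mae:0.43616965413093567,
which is close to your updated results.
However, the plotted predictions differ a lot from the true values. Did I miss some parameter, or is this simply the expected result?
Informer.ipynb - Colaboratory.pdf
Thank you very much.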
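I hope you could support converting the model to TorchScript format so that it can be used more widely.
I want to call a trained Informer model from a Java app. After looking into the DJL library, I learned that the .pth model must first be converted to TorchScript.
I tried converting it with the following code: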
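import torch
from exp.exp_informer import Exp_Informer

Exp = Exp_Informer
exp = Exp(args)  # initialise the model with the same args used for training
pthfile = './checkpoints/test/checkpoint.pth'
examples = exp.trace()  # custom method added to obtain the inputs forward() needs; returns a tuple here
model = exp.model  # get the model and load the trained weights
model.load_state_dict(torch.load(pthfile))
# run inference through the tracer and convert
traced_script_module = torch.jit.trace(model, examples)
traced_script_module.save("./traced_model.pt")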
I got the following error:
File "E:\pythonspace\deep_learning\Informer2020\models\attn.py", line 110, in forward
U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item() # c*ln(L_k)
AttributeError: 'Tensor' object has no attribute 'astype'
https://zhuanlan.zhihu.com/p/146453159
I suspect this is caused by numpy's np.ceil() and np.log() functions. I tried replacing them with the corresponding torch functions, but it still didn't work:
# U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item()  # avoid np when converting [https://zhuanlan.zhihu.com/p/146453159]
U_part = self.factor * torch.ceil(torch.log(L_K)).int()  # attempted torch equivalent
After this change, the error became:
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
encountered an exception while running the Python function with test inputs.
Exception:
log(): argument 'input' (position 1) must be Tensor, not int
I'd be grateful for any guidance on how to change this without using np. Many thanks ❀❀❀
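The two errors suggest that the sequence length behaves differently in the two passes. During tracing it appears to arrive as a 0-dim tensor, hence the first error. The sanity-check run then passes a plain Python `int`, which `torch.log` rejects, hence the second error. One sketch that avoids NumPy is to convert the length with `int()` and use the standard `math` module. This assumes the sequence lengths are fixed at export time. The helper name `sample_count` is hypothetical.

```python
import math


def sample_count(length, factor=5):
    """c * ceil(ln(length)) without NumPy.

    `length` may be a Python int or a 0-dim tensor; int() handles both.
    """
    return factor * math.ceil(math.log(int(length)))


# Possible replacements for the two NumPy expressions in ProbAttention.forward:
# U_part = sample_count(L_K, self.factor)
# u = sample_count(L_Q, self.factor)
print(sample_count(96))  # 25
print(sample_count(48))  # 20
```

Because `int()` turns the length into a constant, the tracer may emit a TracerWarning. The exported model should then only receive inputs with the same lengths used during tracing.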
The weather dataset's link in your paper has been unavailable for some time. Is there another way to download the data, or could you provide the weather data on GitHub? Thanks
Hi,
I was wondering if it is possible to sample from the Informer model?
To be more specific:
During decoding in classic Transformer, each token follows a categorical distribution (i.e. the softmax) and is sampled/generated one by one. I understand that it's an advantage of Informer that this sequential decoding is not needed anymore.
But does this mean that one cannot get diverse samples from the Informer model?
If not, how would one go about generating those samples?
Thanks!
Hello,
thank you for making the code public.
In the code (forward in ProbAttention), you used the tensor.view to change the dimensions.
queries = queries.view(B, H, L_Q, -1)
keys = keys.view(B, H, L_K, -1)
values = values.view(B, H, L_K, -1)
Why not use permute? Doesn't the view function break the relationship between heads?
Is the goal of using tensor.view here to change the order of dimensions? If so, why tensor.view?
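`view`, like NumPy's `reshape`, reinterprets memory in its existing order; it does not move axes. For a tensor laid out as `(B, L, H, D)`, reshaping to `(B, H, L, D)` therefore mixes time steps and heads. A transpose of axes 1 and 2 keeps each head's sequence intact. The check below uses NumPy as a stand-in; `torch.Tensor.view` and `transpose(2, 1)` behave analogously.

```python
import numpy as np

B, L, H, D = 1, 3, 2, 1
# Encode (time step, head) into each value: 10 * l + h, so we can see where elements land.
x = np.array([[[[10 * l + h] for h in range(H)] for l in range(L)]], dtype=float)  # (B, L, H, D)

reshaped = x.reshape(B, H, L, D)      # what .view(B, H, L, -1) does
transposed = x.transpose(0, 2, 1, 3)  # what .transpose(2, 1) does

head0_reshaped = reshaped[0, 0, :, 0]      # values 0, 1, 10: mixes heads 0 and 1
head0_transposed = transposed[0, 0, :, 0]  # values 0, 10, 20: head 0 at every time step
print(head0_reshaped, head0_transposed)
```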
Hi,
I really appreciate you helping others to understand the code by proactively replying to the questions.
While going through the paper and the figure in the paper, it is mentioned that the output from the encoder is a concatenated feature map from each stack.
"we concatenate all the stacks’ outputs and have the final hidden representation of encoder."
But in the code, it seems like only the final encoder representation is used as an input to the decoder (without concatenating final encoder representation with lower level embedding representations.)
Could you please clarify this discrepancy?
Thanks in advance.
Hello everyone,
first of all, thank you for your amazing paper!
We are currently trying to reproduce your results and want to run the experiments on the weather and ECL datasets. Would it be possible for you to also publish the two data loaders you used for those datasets, so that the preprocessing stays consistent?
Thanks a lot in advance!
Hello, just wondering why "circular" padding is chosen for all the conv operations, including the projection of token embeddings?
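With circular padding, the values used to pad one end of the sequence are taken from the other end. A size-3 kernel therefore sees the last step as the neighbour of the first step instead of a zero. NumPy's `np.pad(..., mode="wrap")` reproduces the padding that `Conv1d(..., padding=1, padding_mode='circular')` applies along the time axis. The sketch compares it with zero padding using a simple summing kernel.

```python
import numpy as np

sequence = np.array([1.0, 2.0, 3.0, 4.0])
wrapped = np.pad(sequence, 1, mode="wrap")     # 4, 1, 2, 3, 4, 1
zeroed = np.pad(sequence, 1, mode="constant")  # 0, 1, 2, 3, 4, 0

kernel = np.ones(3)  # a size-3 summing filter, like kernel_size=3 with padding=1
print(np.convolve(wrapped, kernel, mode="valid"))  # edge outputs: 4+1+2 and 3+4+1
print(np.convolve(zeroed, kernel, mode="valid"))   # edge outputs: 0+1+2 and 3+4+0
```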
Hello,
Thanks a lot for publishing your results and code; I enjoyed reading the paper.
While trying to reproduce the paper's results, the output was way off, especially for the ETTh2 dataset. (I ran it with the same configuration as in the Colab notebook.)
testing : informer_ETTh2_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df512_atprob_ebtimeF_dtTrue_exp_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2857
test shape: (89, 32, 24, 7) (89, 32, 24, 7)
test shape: (2848, 24, 7) (2848, 24, 7)
mse:0.8689931831590327, mae:0.7622690107174594
Could you please let me know whether you used different hyperparameters, or what I am doing wrong?
Thanks in advance.
Regards,
Kiran
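After running the program, trues.npy in the results folder is three-dimensional. How does it relate to the original two-dimensional data? The predictions in preds.npy are also three-dimensional. Why aren't they a single column like the prediction target 'OT'? I don't quite understand; any guidance would be appreciated.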
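The log earlier on this page shows `test shape: (2848, 24, 7)`. This suggests the saved arrays have shape `(number of test windows, pred_len, number of output features)`: one `pred_len`-step forecast per sliding window, for every feature when `features='M'`. The sketch below uses random stand-in data to pull out the target column. It assumes 'OT' is the last column, as in the ETT CSV files.

```python
import numpy as np

n_windows, pred_len, n_features = 5, 24, 7
rng = np.random.default_rng(0)
preds = rng.normal(size=(n_windows, pred_len, n_features))  # stand-in for np.load('preds.npy')

ot_preds = preds[:, :, -1]    # (n_windows, pred_len): the 'OT' forecasts of every window
first_step = preds[:, 0, -1]  # one-step-ahead 'OT' forecast per window, a single column
```

Each row of `ot_preds` appears to be a forecast starting one step after the previous row's, so consecutive rows overlap in time.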
In attn.py, lines 90-95:
B, L, H, D = queries.shape
_, S, _, _ = keys.shape
queries = queries.view(B, H, L, -1)
keys = keys.view(B, H, S, -1)
values = values.view(B, H, S, -1)
Why does it use "view" rather than "transpose"?
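PyTorch defaults to float (32-bit), which is faster and usually about as accurate as double. Why are both the data and the model in double precision in the code? Is there some other consideration? Thanks~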
Thanks for the code! I noticed that you mentioned in another issue that the ECL dataset used the Informer paper is the same as https://github.com/laiguokun/multivariate-time-series-data. Could you help to specify which column in the electricity dataset corresponds to the target "MT_320"? Thanks!
Is it possible to combine ETTh1 and ETTh2 data to predict them simultaneously? so the output shape would be something like (batch_size, prediction_length, feature, 2), where 2 is ETTh1 and ETTh2.
args.model = 'informerstack' # model of experiment, options: [informer, informerstack, informerlight(TBD)]
args.data = 'custom' # data
args.root_path = './' # root path of data file
args.data_path = 'test.csv' # data file
args.features = 'S' # forecasting task, options:[M, S, MS(TBD)]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate
args.target = 'target' # target feature in S or MS task
args.freq = 't' # freq for time features encoding
args.seq_len = 128 # input sequence length of Informer encoder
args.label_len = 96 # start token length of Informer decoder
args.pred_len = 15 # prediction sequence length
args.enc_in = 1 # encoder input size number of features in input
args.dec_in = 1 # decoder input size number of features
args.c_out = 7 # output size output dimension before FN
args.factor = 5 # probsparse attn factor
args.d_model = 512 # dimension of model
args.n_heads = 8 # num of heads
args.e_layers = 3 # num of encoder layers
args.d_layers = 2 # num of decoder layers
args.d_ff = 512 # dimension of fcn in model
args.dropout = 0.05 # dropout
args.attn = 'full' # attention used in encoder, options:[prob, full]
args.embed = 'fixed' # time features encoding, options:[timeF, fixed, learned]
args.activation = 'gelu' # activation
args.distil = True # whether to use distilling in encoder
args.output_attention = False # whether to output attention in encoder
args.batch_size = 64
args.learning_rate = 0.0001 ## 0.0001
args.loss = 'mse'
args.lradj = 'type1'
args.num_workers = 0
args.itr = 1
args.train_epochs = 6
args.patience = 3
args.des = 'exp'
I trained with the parameters above and got this error:
IndexError Traceback (most recent call last)
in
9 # train
10 print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
---> 11 exp.train(setting)
12
13 # test
~/max/Informer2020/exp/exp_informer.py in train(self, setting)
169 # encoder - decoder
170 if self.args.output_attention:
--> 171 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
172 else:
173 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
144 def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec,
145 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
--> 146 enc_out = self.enc_embedding(x_enc, x_mark_enc)
147 enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
148
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/embed.py in forward(self, x, x_mark)
105
106 def forward(self, x, x_mark):
--> 107 x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
108
109 return self.dropout(x)
~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/max/Informer2020/models/embed.py in forward(self, x)
75 x = x.long()
76
---> 77 minute_x = self.minute_embed(x[:,:,4]) if hasattr(self, 'minute_embed') else 0.
78 hour_x = self.hour_embed(x[:,:,3])
79 weekday_x = self.weekday_embed(x[:,:,2])
IndexError: index 4 is out of bounds for dimension 2 with size 4
With 'timeF', everything works fine.
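The `fixed` temporal embedding in the traceback reads calendar fields by column index: `x[:,:,4]` for minute, `3` for hour, `2` for weekday. The time-feature array must therefore have a fifth column whenever a minute embedding exists, that is, with `freq='t'`. The stdlib sketch below builds such integer fields from timestamps, with the column order month, day, weekday, hour, minute following those indices. Three parts of it are assumptions:

- The minute is bucketed into 15-minute slots, matching 15-minute data such as ETTm1.
- The `'s'` option, which appends a second column for per-second data, is a hypothetical extension the repository does not provide.
- The names `FIELDS` and `time_marks` are illustrative.

```python
from datetime import datetime

# Calendar fields per frequency, in the column order the fixed embedding indexes.
# 's' (adds a second column) is a hypothetical extension for per-second data.
FIELDS = {
    "h": ["month", "day", "weekday", "hour"],
    "t": ["month", "day", "weekday", "hour", "minute"],
    "s": ["month", "day", "weekday", "hour", "minute", "second"],
}


def time_marks(timestamps, freq):
    """Integer calendar features, one row per timestamp."""
    rows = []
    for ts in timestamps:
        values = {
            "month": ts.month,
            "day": ts.day,
            "weekday": ts.weekday(),
            "hour": ts.hour,
            "minute": ts.minute // 15,  # assumed 15-minute buckets (4 slots)
            "second": ts.second,
        }
        rows.append([values[name] for name in FIELDS[freq]])
    return rows


stamps = [datetime(2016, 7, 1, 0, 0), datetime(2016, 7, 1, 0, 15)]
print(time_marks(stamps, "h"))  # 4 columns: index 4 does not exist
print(time_marks(stamps, "t"))  # 5 columns: column 4 is the minute bucket
```

Keeping `freq` consistent between the data loader and the model avoids this index mismatch.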
Hello:
Thank you for your work. While studying it, I found something confusing in your code (line 68 in attn.py):
V_sum = V.sum(dim=-2)
You directly sum V for those flat rows of the attention matrix. However, your paper uses the average value of V, which seems more reasonable to me. Which one is correct?
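For the queries that are not selected, the paper's description fills the output with the average of V, while `V.sum(dim=-2)` gives the sum. The two differ only by the constant factor `L_V`, so with the sum the lazy outputs grow with sequence length. The NumPy check below uses a hypothetical helper `lazy_context` and random stand-in values for V.

```python
import numpy as np


def lazy_context(V, L_Q, reduce="mean"):
    """Initial output for all queries: the mean (or sum) of V over the key axis."""
    # V has shape (B, H, L_V, D); the result has shape (B, H, L_Q, D).
    pooled = V.mean(axis=-2) if reduce == "mean" else V.sum(axis=-2)
    return np.broadcast_to(pooled[:, :, None, :], V.shape[:2] + (L_Q, V.shape[-1])).copy()


rng = np.random.default_rng(0)
V = rng.normal(size=(2, 8, 96, 64))
mean_ctx = lazy_context(V, L_Q=96, reduce="mean")
sum_ctx = lazy_context(V, L_Q=96, reduce="sum")
```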
Thank you!
Hi
Same script that was working this morning now gives me :
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-8eee1c4218f9> in <module>()
14 target=args.target,
15 timeenc=timeenc,
---> 16 freq=args.freq
17 )
18 data_loader = DataLoader(
/content/Informer2020/data/data_loader.py in __init__(self, root_path, flag, size, features, data_path, target, scale, timeenc, freq)
216 self.root_path = root_path
217 self.data_path = data_path
--> 218 self.__read_data__()
219
220 def __read_data__(self):
/content/Informer2020/data/data_loader.py in __read_data__(self)
244 if self.scale:
245 train_data = df_data[border1s[0]:border2s[0]]
--> 246 self.scaler.fit(train_data.values)
247 data = self.scaler.transform(df_data.values)
248 else:
/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py in fit(self, X, y)
667 # Reset internal state before fitting
668 self._reset()
--> 669 return self.partial_fit(X, y)
670
671 def partial_fit(self, X, y=None):
/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y)
698 X = check_array(X, accept_sparse=('csr', 'csc'),
699 estimator=self, dtype=FLOAT_DTYPES,
--> 700 force_all_finite='allow-nan')
701
702 # Even in the case of `with_mean=False`, we update the mean anyway
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
529 array = array.astype(dtype, casting="unsafe", copy=False)
530 else:
--> 531 array = np.asarray(array, order=order, dtype=dtype)
532 except ComplexWarning:
533 raise ValueError("Complex data not supported\n"
/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
ValueError: could not convert string to float:
Note: args.freq = 'h' # freq for time features encoding
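`could not convert string to float` during `self.scaler.fit(train_data.values)` indicates that some column being scaled contains text that cannot be parsed as a number. Typical causes are a header repeated mid-file, a placeholder such as `n/a`, or a thousands separator. The stdlib sketch below locates such cells before training. The helper `find_non_numeric` is hypothetical, and the `date` column name is an assumption about your CSV.

```python
import csv
import io


def find_non_numeric(csv_file, skip=("date",)):
    """Return (row number, column, value) for every cell that float() cannot parse."""
    problems = []
    reader = csv.DictReader(csv_file)
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            if column in skip:
                continue
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append((row_no, column, value))
    return problems


sample = io.StringIO("date,load,OT\n2021-01-01 00:00,1.5,3.2\n2021-01-01 01:00,n/a,3.4\n")
print(find_non_numeric(sample))  # [(3, 'load', 'n/a')]
```

For a real file, pass the handle from `open(path, newline="")`.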
When I used 'informerstack' instead of 'informer' in the command below:
python -u main_informer.py --model informerstack --data ETTm1 --attn prob --freq t
an error occurred:
(informer) lizhaorui@server-System-Product-Name:~/DL/Informer2020$ python -u main_informer.py --model informerstack --data ETTh1 --attn prob --freq h
Args in experiment:
Namespace(activation='gelu', attn='prob', batch_size=32, c_out=7, checkpoints='./checkpoints/', d_ff=2048, d_layers=1, d_model=512, data='ETTh1', data_path='ETTh1.csv', dec_in=7, des='test', device_ids=[0, 1], devices='0,1', distil=True, dropout=0.05, dvices='0,1', e_layers=2, embed='timeF', enc_in=7, factor=5, features='M', freq='h', gpu=0, itr=2, label_len=48, learning_rate=0.0001, loss='mse', lradj='type1', model='informerstack', n_heads=8, num_workers=0, output_attention=False, patience=3, pred_len=48, root_path='./data/ETT/', seq_len=512, target='OT', train_epochs=6, use_amp=False, use_gpu=True, use_multi_gpu=True)
Use GPU: cuda:0
start training : informerstack_ETTh1_ftM_sl512_ll48_pl48_dm512_nh8_el2_dl1_df2048_atprob_fc5_ebtimeF_dtTrue_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8081
val 2833
test 2833
Traceback (most recent call last):
File "main_informer.py", line 91, in <module>
exp.train(setting)
File "/home/lizhaorui/DL/Informer2020/exp/exp_informer.py", line 199, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/DL/Informer2020/models/model.py", line 147, in forward
enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/DL/Informer2020/models/encoder.py", line 99, in forward
x_stack = torch.cat(x_stack, -2)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
CPU: registered at /pytorch/build/aten/src/ATen/RegisterCPU.cpp:5925 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/RegisterCUDA.cpp:7100 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:641 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradNestedTensor: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:10525 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Hi zhouhaoyi,
I am just a machine learning beginner. I watched your video about Informer on Bilibili and was curious about the model, so I came here to look at the code.
I downloaded the code and ran it, but I was confused by the results.
My understanding is that the test results represent the 'OT' values, but the values I got look like this:
-0.22336945 0.01950708 -0.08188418 -0.00959123 -0.34219092 0.19225414 -0.40179282
while the real values look like this:
10.114 3.55 6.183 1.564 3.716 1.462 9.567
I am confused about this. Can you help me? What's the reason?
Thanks for your great work first.
I found this in the code, attn.py line 111:
scores_top, index = self._prob_QK(queries, keys, u, U)
It seems that U is passed as n_top, but according to the paper it should be u?
Also, U is set to c*ln(n) instead of m*ln(n) in line 108:
U = self.factor * np.ceil(np.log(S)).astype('int').item()
I don't know whether I am misunderstanding this and hope for your reply.
Hi, Thanks for your great work.
I just tried to run the provided Colab example using multiple GPUs (0,1,2,3).
After changing use_multi_gpu from False to True, I got AssertionError: Invalid device id.
Does that mean Colab refuses to offer 4 GPUs, or is there a bug?
By the way, I want to make sure whether my problem can be solved with the Informer model.
Let's say my custom dataset has 1 date column, N feature columns, and 1 numerical target column, and the goal is to predict the value of the target column from the other columns in the same row.
Is this kind of problem called "multivariate predict univariate", so that I should choose 'MS' for args.features?
Thanks again.
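`Invalid device id` appears to be the assertion PyTorch raises when a requested GPU index does not exist. Colab usually assigns a single GPU per session, so requesting `0,1,2,3` would fail there. Below is a small pre-flight check using a hypothetical helper `parse_devices`. In practice you would pass `torch.cuda.device_count()` as `available`; it is a parameter here so the check runs without a GPU.

```python
def parse_devices(devices, available):
    """Turn a string like '0,1,2,3' into a list of ints, rejecting ids that do not exist."""
    ids = [int(d) for d in devices.replace(" ", "").split(",") if d]
    missing = [d for d in ids if d < 0 or d >= available]
    if missing:
        raise ValueError(f"requested GPU ids {missing}, but only {available} device(s) visible")
    return ids


print(parse_devices("0", available=1))  # [0]
try:
    parse_devices("0,1,2,3", available=1)
except ValueError as err:
    print(err)
```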
Congratulations on the best paper award!
I'm new to the transformers with time-series data and just found this cool paper!
It will be great if you could provide an example to show how to use the custom data.
Thanks in advance!
Use GPU: cuda:0
start training : informer_ETTh1_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df1024_atprob_ebfixed_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8521
val 2857
test 2857
Traceback (most recent call last):
File "main_informer.py", line 69, in <module>
exp.train(setting)
File "/content/Informer2020/exp/exp_informer.py", line 157, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/model.py", line 71, in forward
dec_out = self.decoder(dec_out, enc_out, x_mask=dec_self_mask, cross_mask=dec_enc_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/decoder.py", line 46, in forward
x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/decoder.py", line 23, in forward
attn_mask=x_mask
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/attn.py", line 141, in forward
attn_mask
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/attn.py", line 25, in forward
attn_mask = TriangularCausalMask(B, L, device=queries.device)
File "/content/Informer2020/utils/masking.py", line 7, in __init__
self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)
RuntimeError: "triu" not implemented for 'Bool'
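As the error says, the PyTorch version in that environment does not implement `torch.triu` for `bool` tensors. A version-independent way to build the same causal mask is to compute `triu` on a float tensor and convert to `bool` afterwards. The sketch is a standalone function, not the repository's `TriangularCausalMask` class. It assumes the `[B, 1, L, L]` mask shape and that `True` marks future positions to be masked.

```python
import torch


def causal_mask(batch, length, device="cpu"):
    """True above the diagonal = positions a query may NOT attend to (future steps)."""
    shape = (batch, 1, length, length)
    ones = torch.ones(shape)  # float dtype: triu is implemented for it
    return torch.triu(ones, diagonal=1).bool().to(device)


mask = causal_mask(batch=2, length=4)
print(mask[0, 0].int())
```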
First of all, I like the idea for calculating the attention matrix, and it should work.
However, I ran a small performance test and the results don't seem to add up. I'd appreciate it if anyone could shed some light on this.
The code snippet is below. I removed everything except the attention layers and changed the layer depth to 5.
=================RESULTS=======================
When using full attention: 0.008 sec per forward and 18.8 GRAM
When using prob attention: 0.21 sec per forward and >20 GRAM
=================ENV===========================
UBUNTU: 20.04
NVIDIA DRIVER: 460.39
CUDA: 11.03
PYTORCH: 1.7.1
GPU: RTX3090
=================CODE==========================
import argparse
import time

import torch
import torch.nn as nn

# Run from the Informer2020 repository root so these imports resolve.
from models.attn import AttentionLayer, FullAttention, ProbAttention
from models.encoder import Encoder, EncoderLayer


class InformerSpeedTest(nn.Module):
    def __init__(self, enc_in, dec_in, c_out, seq_len, label_len, out_len,
                 factor=5, d_model=512, n_heads=8, e_layers=5, d_layers=2, d_ff=512,
                 dropout=0.0, attn='prob', embed='fixed', data='ETTh', activation='gelu',
                 device=torch.device('cuda:0')):
        super(InformerSpeedTest, self).__init__()
        self.attn = attn
        # Attention
        Attn = ProbAttention if attn == 'prob' else FullAttention
        # Encoder only: e_layers attention layers, no distilling
        self.encoder = Encoder(
            [
                EncoderLayer(
                    AttentionLayer(Attn(False, factor, attention_dropout=dropout),
                                   d_model, n_heads),
                    d_model,
                    d_ff,
                    dropout=dropout,
                    activation=activation
                ) for l in range(e_layers)
            ]
        )

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec,
                enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
        enc_out = self.encoder(x_enc, attn_mask=enc_self_mask)
        return enc_out


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--prob", action="store_true")
    args = parser.parse_args()

    rounds = 50
    batch_size = 128
    length = 512
    d_model = 512
    device = "cuda:0"

    attn = "prob" if args.prob else "full"
    print(f"Use {attn}")

    test = InformerSpeedTest(None, None, None, None, None, None, attn=attn).to(device)
    test.train()  # test.eval()
    for i in range(rounds):
        print(f"Round: {i}")
        x = torch.randn(batch_size, length, d_model).to(device)
        torch.cuda.synchronize()  # CUDA runs asynchronously: sync before starting the timer
        s = time.time()
        test(x, None, None, None)
        torch.cuda.synchronize()  # wait for the forward pass to finish before stopping the timer
        print(f"Cost: {time.time() - s:.6f}s")
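Thanks to everyone for such impressive work!
My custom dataset has a sampling interval of 1 s, but the freq option seems to support only down to minutes. How should I modify the code to forecast series sampled every second?
Also, can the input sequence have a variable length at prediction time? For example, whether I input the previous 60 seconds or the previous 40 seconds of a trajectory, I would always predict the next 60 seconds.
Thanks in advance :)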
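On the second question: the data loaders and training setup on this page use fixed window lengths (`seq_len`, and `label_len + pred_len`). Histories of different lengths would therefore need to be brought to `seq_len` first. One simple option is to left-pad shorter histories by repeating their first row and to crop longer ones to the most recent steps. This is an assumption, not a feature of the repository, and the helper `fit_history` is hypothetical.

```python
import numpy as np


def fit_history(history, seq_len):
    """Crop or left-pad a (T, features) history to exactly seq_len rows."""
    history = np.asarray(history, dtype=float)
    if len(history) >= seq_len:
        return history[-seq_len:]  # keep the most recent steps
    pad = seq_len - len(history)
    return np.pad(history, ((pad, 0), (0, 0)), mode="edge")  # repeat the earliest row


short = np.arange(40, dtype=float).reshape(40, 1)  # 40 s of history
full = np.arange(60, dtype=float).reshape(60, 1)   # 60 s of history
print(fit_history(short, 60).shape, fit_history(full, 60).shape)
```

Repeating the edge value avoids injecting artificial zeros. Ideally, the model should also see padded examples during training.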
Hi,
I was trying to run the model in the provided Colab notebook with the ETTm1 dataset,
and I ran into the error RuntimeError: mat1 dim 1 must match mat2 dim 0:
/content/Informer2020/exp/exp_informer.py in train(self, setting)
171 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
172 else:
--> 173 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
174
175 f_dim = -1 if self.args.features=='MS' else 0
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/content/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
67 def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec,
68 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
---> 69 enc_out = self.enc_embedding(x_enc, x_mark_enc)
70 enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
71
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/content/Informer2020/models/embed.py in forward(self, x, x_mark)
105
106 def forward(self, x, x_mark):
--> 107 x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
108
109 return self.dropout(x)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/content/Informer2020/models/embed.py in forward(self, x)
92
93 def forward(self, x):
---> 94 return self.embed(x)
95
96 class DataEmbedding(nn.Module):
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
91
92 def forward(self, input: Tensor) -> Tensor:
---> 93 return F.linear(input, self.weight, self.bias)
94
95 def extra_repr(self) -> str:
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1690 ret = torch.addmm(bias, input, weight.t())
1691 else:
-> 1692 output = input.matmul(weight.t())
1693 if bias is not None:
1694 output += bias
RuntimeError: mat1 dim 1 must match mat2 dim 0
All I changed was:
args.data = 'ETTm1' # data
Am I missing anything in the configuration?
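The last frames of the traceback show the failure inside F.linear, which computes input.matmul(weight.t()) on the time-mark tensor x_mark. That product requires the last dimension of x_mark to equal the layer's in_features. A minimal NumPy sketch of this shape rule; the 4- and 5-column widths are illustrative and not read from the repository:

```python
import numpy as np


def linear(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """NumPy stand-in for F.linear: x @ weight.T + bias.

    weight has shape (out_features, in_features), as in torch.nn.Linear,
    so the last dimension of x must equal in_features.
    """
    if x.shape[-1] != weight.shape[1]:
        raise ValueError(
            f"input has {x.shape[-1]} features, layer expects {weight.shape[1]}"
        )
    return x @ weight.T + bias


d_model = 512
weight = np.zeros((d_model, 4))  # illustrative: layer built for 4 time-feature columns
bias = np.zeros(d_model)

print(linear(np.zeros((32, 96, 4)), weight, bias).shape)  # (32, 96, 512)
try:
    linear(np.zeros((32, 96, 5)), weight, bias)  # illustrative: 5 columns from the loader
except ValueError as err:
    print(err)
```

So a likely thing to check is whether the settings that determine the number of time-feature columns in the data loader agree with those used to build the embedding.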
After executing the following command:
python -u main_informer.py --model informer --data ETTh1
with your downloaded data, I get the following error:
Use GPU: cuda:0
start training : informer_ETTh1_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df1024_atprob_ebfixed_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8521
val 2857
test 2857
Traceback (most recent call last):
File "main_informer.py", line 69, in <module>
exp.train(setting)
File "C:\Users\User\Informer2020\exp\exp_informer.py", line 157, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "C:\Users\User\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\User\Informer2020\models\model.py", line 67, in forward
enc_out = self.enc_embedding(x_enc, x_mark_enc)
File "C:\Users\User\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\User\Informer2020\models\embed.py", line 95, in forward
x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
RuntimeError: The size of tensor a (98) must match the size of tensor b (96) at non-singleton dimension 1
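The mismatch is exactly two time steps (98 vs. 96), which is what the standard torch.nn.Conv1d output-length formula gives if a kernel-size-3 convolution over the 96 input steps is padded by 2 on each side instead of 1. A small sketch of that arithmetic, assuming the value embedding is such a convolution (kernel size and padding here are assumptions, not read from the traceback):

```python
def conv1d_out_len(length: int, kernel_size: int = 3, padding: int = 1,
                   stride: int = 1, dilation: int = 1) -> int:
    # Output-length formula documented for torch.nn.Conv1d
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


print(conv1d_out_len(96, padding=1))  # 96: matches the positional embedding length
print(conv1d_out_len(96, padding=2))  # 98: two steps too long
```

If this is the cause, the padding of the value-embedding convolution is the setting to inspect.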
From your paper I read: "To enhance the robustness of the distilling operation, we build halving replicas of the main stack and progressively decrease the number of self-attention distilling layers by dropping one layer at a time, like a pyramid in Fig.(3), such that their output dimension is aligned. Thus, we concatenate all the stacks' outputs and have the final hidden representation of encoder." However, I could not locate the corresponding code (also boxed in the picture below), though I do notice the max-pool here, which I believe is part of the upper portion of the operation:
Informer2020/models/encoder.py
Line 15 in 956ac31
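Read literally, the quoted passage describes replicas where replica i sees only the last L/2^i input steps and has one fewer attention layer (hence one fewer halving distilling step) than replica i-1, so all replicas end at the same length and can be concatenated along time. Later versions of the repository appear to add an EncoderStack class (used by InformerStack) for this; whether it exists at commit 956ac31 is not confirmed here. A shape-only NumPy sketch, with attention layers treated as identity and each distilling layer replaced by a plain stride-2 max-pool (the real layer also applies a convolution, normalization and activation); the (3, 2, 1) layer counts are an assumption:

```python
import numpy as np


def distil(x: np.ndarray) -> np.ndarray:
    # Stand-in for a distilling layer: halve the time axis by pairwise max-pooling
    t, d = x.shape
    t -= t % 2  # drop a trailing odd step so pairs line up
    return x[:t].reshape(t // 2, 2, d).max(axis=1)


def encoder(x: np.ndarray, n_attn_layers: int) -> np.ndarray:
    # Attention layers treated as identity (shapes only);
    # one distilling layer sits between consecutive attention layers
    for _ in range(n_attn_layers - 1):
        x = distil(x)
    return x


def encoder_stack(x: np.ndarray, layers_per_replica=(3, 2, 1)) -> np.ndarray:
    outputs = []
    for i, n_layers in enumerate(layers_per_replica):
        inp_len = x.shape[0] // (2 ** i)  # replica i sees the last L / 2^i steps
        outputs.append(encoder(x[-inp_len:], n_layers))
    return np.concatenate(outputs, axis=0)  # concatenate along time


x = np.random.randn(96, 8)  # (seq_len, d_model)
print([encoder(x[-(96 // 2 ** i):], n).shape for i, n in enumerate((3, 2, 1))])
print(encoder_stack(x).shape)  # (72, 8)
```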
attn.py, line 48:
K_expand = K.unsqueeze(-3).expand(B, H, S, L, E)
This step has rather high space complexity, exceeding that of the original Transformer. Is there a better way to implement it?
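Note that expand returns a view and allocates no new memory; the large allocation presumably comes from the indexing step that follows, which gathers the sampled keys into a new tensor of shape (B, H, L_Q, U, E). A rough element count per (batch, head), assuming U = factor · ⌈ln L_K⌉ sampled keys per query (capped at L_K), head size E = d_model / n_heads, and L_Q = L_K = L; the function name is illustrative:

```python
import math


def attention_buffer_sizes(L: int, d_model: int = 512, n_heads: int = 8, factor: int = 5):
    """Elements per (batch, head) in two intermediate tensors.

    full:    the L x L score matrix of canonical attention
    sampled: gathered keys of shape (L, U, E), assuming U = factor * ceil(ln L)
    """
    E = d_model // n_heads
    U = min(factor * math.ceil(math.log(L)), L)
    full = L * L
    sampled = L * U * E
    return full, sampled


print(attention_buffer_sizes(512))
```

For the speed-test configuration earlier in this document (length 512, d_model 512, 8 heads, factor 5) this gives 262,144 vs. 1,146,880 elements, so the gathered keys alone are about 4.4 times the full score matrix, which is consistent with the higher memory reported for prob attention there. The gather only becomes the smaller of the two once L exceeds U · E.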
Can this model support multi-gpu training to speed up training cycles?
Hi,
Thanks for the nice work. I have a question about Eq. (3). To achieve sparsity, the proposed ProbSparse attention attends only to the top-u queries. I am a bit confused by this design: to me, it seems more natural to attend to a subset of keys for each query. Could you please elaborate on this design? Thanks.
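For reference, the selection rule can be sketched as follows. Each query is scored with the max-minus-mean measure M(q_i, K) = max_j(q_i k_j^T / sqrt(d)) - mean_j(q_i k_j^T / sqrt(d)); the u highest-scoring queries get ordinary softmax attention, and the remaining "lazy" queries are filled with the mean of V. This NumPy sketch computes the measure over all keys instead of a random key sample, so it shows which queries are kept but not the complexity saving; the function name and the mean-of-V fill are assumptions:

```python
import numpy as np


def probsparse_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, u: int):
    """Sketch of top-u query selection (no key sampling, no batch/heads).

    Q: (L_Q, d), K: (L_K, d), V: (L_K, d_v).
    Returns (output of shape (L_Q, d_v), indices of the active queries).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (L_Q, L_K)
    m = scores.max(axis=1) - scores.mean(axis=1)   # max-minus-mean sparsity measure
    top = np.argsort(-m)[: min(u, Q.shape[0])]     # the u most "active" queries
    # Lazy queries: assumed to output the mean of V
    out = np.repeat(V.mean(axis=0, keepdims=True), Q.shape[0], axis=0)
    s = scores[top]
    w = np.exp(s - s.max(axis=1, keepdims=True))   # numerically stable softmax
    w /= w.sum(axis=1, keepdims=True)
    out[top] = w @ V
    return out, top
```

Calling it with u equal to the number of queries reproduces ordinary softmax attention, which is a quick sanity check.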
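Hello, I am puzzled by something I see when training the model. Although I set training to 6 epochs, after essentially 1 epoch (2 at most) the model's val loss and test loss stop decreasing, and the final results are not especially good.
I tried several datasets: ETTh1, ETTm1, and two other weather datasets. I mostly used the default parameters and changed very little in the code.
I suspect I am doing something wrong somewhere and would appreciate some suggestions for improvement. Thanks.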
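One thing worth checking is how the learning-rate schedule and early stopping interact with a 6-epoch budget. The sketch below assumes, hypothetically, a schedule that halves the learning rate after every epoch and early stopping with a patience of 3 epochs; under those assumptions the learning rate in epoch 6 is 1/32 of the initial value, so later epochs make much smaller updates. The function name and the base_lr default are illustrative; check the training script's actual defaults before relying on these numbers:

```python
from typing import List, Tuple


def simulate_training(val_losses: List[float], base_lr: float = 1e-4,
                      patience: int = 3) -> Tuple[List[float], int]:
    """Hypothetical helper: replay a list of per-epoch validation losses.

    Assumes lr = base_lr * 0.5 ** (epoch - 1) and that training stops once the
    validation loss has not improved for `patience` consecutive epochs.
    Returns the learning rate used in each epoch that ran and the last epoch run.
    """
    lrs: List[float] = []
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        lrs.append(base_lr * 0.5 ** (epoch - 1))
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return lrs, epoch
    return lrs, len(val_losses)


lrs, last_epoch = simulate_training([0.50, 0.48, 0.49, 0.50, 0.51, 0.52])
print(last_epoch, lrs[-1])
```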
Hi.
Thank you for your amazing paper.
I think your ideas and this architecture are awesome.
I am replicating your paper, but I have a question.
The Informer prediction results below (prediction length 336) are very interesting,
so I would like to know the detailed hyperparameters behind them,
e.g. seq_len, label_len, pred_len, e_layers, d_layers, and all other parameters.
Thank you very much.
Line 15 in utils/masking.py should be dtype=torch.bool, not dytpe=torch.bool. ^-^
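With the keyword fixed, that line builds an upper-triangular boolean causal mask. A NumPy analogue for checking the expected values (not the repository's PyTorch code; the function name and the [B, 1, L, L] shape are assumptions):

```python
import numpy as np


def triangular_causal_mask(batch_size: int, length: int) -> np.ndarray:
    # True strictly above the diagonal marks future positions a query may not attend to;
    # np.triu applies to the last two axes of a higher-dimensional array
    return np.triu(np.ones((batch_size, 1, length, length), dtype=bool), k=1)


print(triangular_causal_mask(1, 3)[0, 0])
```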