

mvts_transformer's Issues

question about my dataset

Thank you for sharing your code. There are some parts of it that I don't understand because of my limited ability; could you help me?

  1. The scenario you are modelling segments the data directly, but the scenario I am dealing with uses a sliding window to traverse the data dynamically. Where should I change the code if I want to load my own data this way? (See the sketch after this list.)
  2. I still don't quite understand the difference between ImputationDataset and TransductionDataset; it seems there is no difference between classification and regression tasks in the unsupervised learning phase.

Looking forward to your reply. Thanks for your help!
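For reference, the sliding-window preprocessing I have in mind is roughly this (a minimal sketch; the window and stride values are placeholders):

    import numpy as np

    def sliding_windows(x, window, stride):
        # x: (T, feat_dim) array -> (num_windows, window, feat_dim)
        starts = range(0, len(x) - window + 1, stride)
        return np.stack([x[i:i + window] for i in starts])

    windows = sliding_windows(np.random.randn(1000, 9), window=100, stride=10)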

failsafe_requirements.txt missing

Hello, the file failsafe_requirements.txt mentioned in the README does not seem to be present in the repo?

When installing the package from a new conda environment, I just had to downgrade python to 3.8 to avoid sktime installation issues, and it now seems to work.

Sparse, Binary Data: Interpolate Missing

I am running into an issue related to the type of data I am using. I built a new data class that preprocesses data into the same dataframe format and indexing as the provided examples (appending repeated (sample) sequences to a dataframe indexed by sample number, with each row a timestep and each column a feature). However, the data I am using is extremely sparse and binary: many NaNs and few 1s. I noticed that data.py has a function called interpolate_missing, which I am running on my sparse dataframe. However, it replaces the NaNs with ones, creating a univariate DF. I'm happy to write my own function that simply replaces my NaNs with 0s (see the sketch below), but I am worried binary data might not work well with this model type. Could you please provide intuition or guidance here?
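For reference, the replacement I have in mind is just a one-liner on my own sparse dataframe (sketch):

    # Replace NaNs with 0 instead of interpolating, since the data is binary
    df = df.fillna(0)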

Also, I am running this as a supervised regression task to predict the (discretized) end-time of the sequence. My current strategy is to simply provide a label_df with the numerical discretized end-time for each sample, but I know there are other ways to label for this task. Any intuition on whether this strategy will be effective, or should I try something else?

Thanks,

Ian

--change_output Option and Transferring Learning

Reposting a conversation with the author of the paper and this code regarding the --change_output option, for others who might have a similar question.

My original question to George:
"Thanks, one more I'm going to shoot at you (and post to github) is around the "--change_output" config option. I'm holding thumbs to hear that this is an option that allows us to pre-train, possibly fine-tune a model on a specific task, then load that fine-tuned model, change the output layer and fine-tune it for a another task? And we might even want to freeze everything except the norm and output layers?"

Answer:
"Yes, that's pretty much the envisioned use, but practically the way it implements it is very simple: when you use it, all weights except for the output layer will be loaded from the specified checkpoint. The output layer weights (their name should start with "output_layer") will be initialized as defined in the model's code. So this indeed allows you to either fine-tune the same exact model for another task, or define another model (e.g. subclass of original) with a different output layer (e.g. different output dimensions).

The --freeze option will allow you to do the second thing you are asking. No gradients will be computed (and no parameter updates performed) for any layer except for the output layer.
The norm layers suggestion is interesting; in my case, I was simply using this to evaluate pre-training / fine-tuning on the same exact input dataset, so the batchnorm statistics were the same. However, if you want to change the dataset, then yes, it makes sense to make the batchnorm parameters trainable.
Here is where this can be added:

if config['freeze']:
"

Data Shapes for Multi-class Classification

Hello,
I hope you are doing well.

I have tried to do multi-class classification from scratch with your transformer code on my own data set, but I have not been able to yet. I think the problem is with my data shapes. I have 1442 samples; each sample has 51 rows (time steps) and 9 columns as features. Also, for each sample I have four labels (four classes). I wanted to know what the shape and format of 'all_df', 'labels_df', and 'all_IDs' should be. Right now they have the following shapes:
all_df = (1442*51, 9)
labels_df = (1442, 4)
all_IDs = (1442,)

With these shapes, the code gives me an index error when it splits the data, saying 5071 is out of bounds for 1442. Also, I noticed that in line 66 of main.py:

labels = my_data.labels_df.values.flatten()
labels_df is flattened, and I do not understand why; it changes the label indexing.

Also, I tried to solve this by deleting the 'flatten()' call; however, that raises another error when calculating the validation loss before training starts, saying that the target for the loss should be a 1D tensor, not a multi-target tensor.

I would really appreciate help with this.
Regards
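A guess at a fix, in case it helps others (an assumption on my part: if the classification loss expects integer class indices, the four one-hot columns would need to collapse into a single label column):

    import pandas as pd

    # labels_df: (1442, 4) one-hot dataframe -> (1442, 1) dataframe of class indices 0..3
    labels_df = pd.DataFrame(labels_df.values.argmax(axis=1), index=labels_df.index)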

Learning loss problem & predict procedure

Hi George, when performing "train models from scratch" with my dataset, the RMSE loss turns to NaN.
I want to check the normalizer for my dataset; I wonder how.
If normalization is performed, I wonder whether the normalized value is predicted even in the masked section.

Also, I want to check the fine-tuned model structure, but I cannot find it. Can you tell me how?
I wonder whether the mask is applied to the test set even when fine-tuning.

Thank you

Problem running Test_only mode

Hi George, really like the project! I have been trying it out for a couple of weeks now, training multiple models, including some with my own datasets. However, while training works without any problems, I have not been able to get the test_only mode running. I keep getting this error:
per_batch['predictions'].append(predictions.cpu().numpy()) RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

I have used the following commands:
Training:
python src/main.py --output_dir .\experiments --comment "regression from Scratch" --name custom_regression --records_file Regression_records.xls --data_dir ..\Datasets\CUSTOM --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 100 --lr 0.001 --optimizer RAdam --pos_encoding learnable --task regression

Testing (not working):
python src/main.py --output_dir .\experiments --comment "regression from Scratch" --name Custom_regression --records_file Regression_records.xls --data_dir ..\Datasets\CUSTOM --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 100 --lr 0.001 --optimizer RAdam --pos_encoding learnable --task regression --test_pattern TEST --test_only testset --load_model ./experiments/custom_regression_2022-10-20_17-05-04_MjH/checkpoints/model_best.pth

I have also tried the exact commands mentioned in this issue, which seem to work for the user that opened that issue, yet I still get the same error.

I have tested with both Python 3.7 and 3.8, with the normal requirements.txt as well as the failsafe_requirements.txt (using Anaconda).

At this point I am unsure what I am doing wrong and what else to try to get the test_only mode working.
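For what it's worth, following the error message itself, the local patch I am considering (presumably in running.py, where the line lives) is:

    per_batch['predictions'].append(predictions.detach().cpu().numpy())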

question about masking

Hey there! I want to pretrain the model on my custom dataset from scratch, but my model wasn't learning until I made the masking of the series dynamic. My question is: should I dynamically mask only the training set (mask series in __getitem__)? And the validation set should be masked statically for evaluating the training process, right? Please correct me if I'm mistaken.

forecaster.predict results

I've launched a one-epoch training run on the toy2 dataset and modified the train.py code to call forecaster.predict twice for the first test data sample:

xc, yc, xt, _ = test_samples
yt_pred1 = forecaster.predict(xc, yc, xt)
print("yt_pred 1[0][0]:")
print(yt_pred1[0][0])
yt_pred2 = forecaster.predict(xc, yc, xt)
print("yt_pred 2[0][0]:")
print(yt_pred2[0][0])

But I'm getting two different prediction results with the same input (same xc, yc, and xt):

yt_pred 1[0][0]:
tensor([ 0.2833, 0.2584, 0.3955, 0.1239, 0.1491, -0.2220, 0.3673, 0.1451,
0.0191, 0.0947, 0.4993, -0.2045, 0.2724, 0.0498, 0.0839, 0.2188,
0.0291, -0.0505, 0.2537, 0.2825])
yt_pred 2[0][0]:
tensor([-0.0851, 0.1524, 0.1037, -0.0464, -0.1989, 0.0934, 0.0636, 0.0913,
0.2973, 0.0513, 0.3559, 0.1850, 0.1016, 0.1844, 0.5109, 0.0665,
0.2945, 0.3052, 0.3375, 0.1235])

Why two different predictions? What am I missing?
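One guess about what I might be missing (an assumption: dropout or other stochastic layers stay active unless the model is switched to evaluation mode):

    import torch

    forecaster.eval()  # disable dropout etc. (assuming forecaster is an nn.Module)
    with torch.no_grad():
        yt_pred1 = forecaster.predict(xc, yc, xt)
        yt_pred2 = forecaster.predict(xc, yc, xt)
    # with stochastic layers disabled, the two predictions should now match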

obtain embeddings from trained models

Hello! Thank you for sharing the code of the paper!

I want to know if there is an easy way of extracting the embeddings (z_t) of a trained model. I was able to pre-train the model (unsupervised learning through input masking), but after I obtain the .pth files, I am struggling to obtain the embeddings for the dataset.
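What I am trying looks roughly like this (sketch; the module name transformer_encoder and the (seq_length, batch_size, d_model) output shape are taken from ts_transformer.py):

    import torch

    embeddings = {}

    def grab(module, inputs, output):
        embeddings['z_t'] = output.detach()  # (seq_length, batch_size, d_model)

    handle = model.transformer_encoder.register_forward_hook(grab)
    with torch.no_grad():
        model(X, padding_masks)  # a normal forward pass fills embeddings['z_t']
    handle.remove()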

DummyTSTransformerEncoder

Are you missing this class in ts_transformer.py along with DummyTSTransformerEncoderClassiregressor?

Need your suggestions, Thanks

Hi, I have read the paper (A Transformer-based Framework for Multivariate Time Series Representation Learning); it is very meaningful work. I want to use this project for a prediction task. My task is a regression problem, and my dataset can be described as follows:
X = [
    [[time series sequence 1]],
    [[time series sequence 2]],
    ...
    [[time series sequence s]],
]
and y = [
    [[label 1]],
    [[label 2]],
    ...
    [[label s]],
]
where sequences are not the same length.
So I want to just use your model definition (in ts_transformer.py: https://github.com/gzerveas/mvts_transformer/blob/master/src/models/ts_transformer.py) and pad all sequences in my dataset to the same length before feeding them into the TST model.

Is there anything I need to pay attention to here? Or do you have other suggestions?

Thanks.
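The padding I plan to apply looks like this (sketch; sequences is a list of (seq_len_i, feat_dim) float tensors, and the mask convention of True at real timesteps is my understanding of this repo):

    import torch
    from torch.nn.utils.rnn import pad_sequence

    X = pad_sequence(sequences, batch_first=True)  # (batch, max_len, feat_dim), zero-padded
    lengths = torch.tensor([s.shape[0] for s in sequences])
    # True at real timesteps, False at padding
    padding_masks = torch.arange(X.shape[1])[None, :] < lengths[:, None]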

Providing pretrained models?

Would it be possible to provide any of the pretrained models (without finetuning) or host them somewhere? I would be particularly interested in that for the Beijing data.

Thanks!

have you computed the EER?

Thank you for your complete code with the detailed description.
After completing the training, have you computed the Equal Error Rate (EER) on the classification dataset using the pre-trained model?
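To make the question concrete, the computation I have in mind is this (sketch; y_true and y_score are hypothetical binary labels and classifier scores):

    import numpy as np
    from sklearn.metrics import roc_curve

    fpr, tpr, _ = roc_curve(y_true, y_score)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # threshold where FPR ~= FNR
    eer = (fpr[idx] + fnr[idx]) / 2        # equal error rate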

EEG Classification

I am implementing your paper for EEG classification. The EEG data has dimension 19×120000, where 19 is the number of electrodes and 120000 is the number of time points. I would like to understand how this dataset can be fed into the code.
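To be concrete, this is the conversion I imagine, following the dataframe format used by the provided data classes (rows = time steps, columns = features, index = sample ID); the window length is an arbitrary assumption on my part:

    import numpy as np
    import pandas as pd

    eeg = np.random.randn(19, 120000)     # stand-in for the real (electrodes, time) array
    win = 1000                            # hypothetical window length; must divide 120000
    windows = eeg.T.reshape(-1, win, 19)  # (num_samples, win, 19)

    all_df = pd.concat(
        pd.DataFrame(w, columns=[f'ch{c}' for c in range(19)], index=[i] * win)
        for i, w in enumerate(windows)
    )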

Discussion about "Extracted representations" mentioned in the "FUTURE WORK"

I am very interested in the "Extracted representations" mentioned in "FUTURE WORK", but I still have some areas of confusion and would like to seek your opinion.

  1. The aggregated representation Z of the Transformer can be used to evaluate the similarity of time series. Do we directly aggregate the representation Z of shape (w, d), where w is the number of time points and d is the output feature dimension of the Transformer, into a (1, d)-dimensional feature? In that case, does each dimension represent the similarity between this sequence and all other sequences? But we input m time series (dimensions) and only get d similarity values, so it seems this cannot represent the similarity between the original sequences. Should this (w, d) Z be passed through a fully connected layer to match the dimension of the original input features (m) and then be aggregated? Or should we aggregate the pre-trained reconstruction x (with m dimensions)? Are similarities between the original multidimensional time series (m dimensions) only evaluated when d = m?
    Secondly, with regard to the "clustering and visualization" part, I am not quite sure how to proceed. Should we visualize or cluster the aggregated (1, d) or (1, m) features, or should we operate on (w, d)? Do you have any specific suggestions, or any relevant works that can be referenced for clustering and visualization? (See the sketch after this list.)

  2. Each time step's representation is processed independently, so we could assign greater weight to certain time points. As we all know, a Transformer can learn the relationships between time points naturally. If we manually apply weights to certain time points, does it mean we are adding a prior constraint? And where should this constraint be added, in the model or in the data processing? Do you have any specific suggestions?
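For instance, this is the aggregation and visualization I currently picture (sketch; mean pooling over the time axis is my own assumption, not something prescribed by the paper):

    import numpy as np
    from sklearn.manifold import TSNE

    # Z: (num_series, w, d) array of per-timestep representations from the encoder
    z_agg = Z.mean(axis=1)                            # (num_series, d), one vector per series
    z_2d = TSNE(n_components=2).fit_transform(z_agg)  # 2-D points for plotting/clustering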

Thank you very much! I know my questions may be a bit long and complex, but I will try to learn to ask better questions. Thanks again.

Hyperparameters and accuracy for AtrialFibrillation dataset.

Hello,

I am attempting to perform pre-training and fine-tuning on the AtrialFibrillation dataset, but I am unable to locate the hyperparameters and corresponding performance metrics in the relevant paper. Can you please provide me with this information, if available?

Solved: Minor Test Reproduction Issue

Hi, trying to reproduce results from the paper and running into a seemingly trivial error. Would appreciate any help.
I run the following with no issues:

python src/main.py --output_dir experiments --comment "pretraining through imputation" --name pretrained --records_file Imputation_records.xls --data_dir "datasets/Monash_UEA_UCR_Regression_Archive/AppliancesEnergy/" --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam  --pos_encoding learnable --num_layers 3  --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 64

python src/main.py --output_dir experiments --comment "finetune for regression" --name finetuned --records_file Regression_records.xls --data_dir datasets/Monash_UEA_UCR_Regression_Archive/AppliancesEnergy/ --data_class tsra --pattern TRAIN --val_pattern TEST  --epochs 600 --lr 0.001 --optimizer RAdam --pos_encoding learnable  --load_model experiments/pretrained_2023-02-21_18-29-57_Ijb/checkpoints/model_best.pth --task regression --change_output --num_layers 3  --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 64

When I try to run:

python src/main.py --output_dir experiments --comment "test" --name test  --data_dir datasets/Monash_UEA_UCR_Regression_Archive/AppliancesEnergy/ --data_class tsra  --load_model experiments/finetuned_2023-02-21_18-40-55_2J1/checkpoints/model_best.pth --pattern TEST --test_only testset --num_layers 3  --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 64 --task regression

It seems as though total_samples = 0 somehow. This is the full error (I added the print statement to print total_samples):

UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
2023-02-21 19:03:43,404 | INFO : Using device: cpu
2023-02-21 19:03:43,404 | INFO : Loading and preprocessing data ...
66it [00:00, 136.70it/s]
2023-02-21 19:03:43,998 | INFO : 33 samples may be used for training
2023-02-21 19:03:43,998 | INFO : 9 samples will be used for validation
2023-02-21 19:03:43,998 | INFO : 0 samples will be used for testing
2023-02-21 19:03:44,003 | INFO : Creating model ...
2023-02-21 19:03:44,006 | INFO : Model:
TSTransformerEncoderClassiregressor(
  (project_inp): Linear(in_features=24, out_features=128, bias=True)
  (pos_enc): FixedPositionalEncoding(
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer_encoder): TransformerEncoder(
    (layers): ModuleList(
      (0): TransformerBatchNormEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): _LinearWithBias(in_features=128, out_features=128, bias=True)
        )
        (linear1): Linear(in_features=128, out_features=512, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=512, out_features=128, bias=True)
        (norm1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (norm2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
      (1): TransformerBatchNormEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): _LinearWithBias(in_features=128, out_features=128, bias=True)
        )
        (linear1): Linear(in_features=128, out_features=512, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=512, out_features=128, bias=True)
        (norm1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (norm2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
      (2): TransformerBatchNormEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): _LinearWithBias(in_features=128, out_features=128, bias=True)
        )
        (linear1): Linear(in_features=128, out_features=512, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=512, out_features=128, bias=True)
        (norm1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (norm2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (dropout1): Dropout(p=0.1, inplace=False)
  (output_layer): Linear(in_features=18432, out_features=1, bias=True)
)
2023-02-21 19:03:44,006 | INFO : Total number of parameters: 616449
2023-02-21 19:03:44,006 | INFO : Trainable parameters: 616449
Loaded model from experiments/finetuned_2023-02-21_18-40-55_2J1/checkpoints/model_best.pth. Epoch: 188
total_samples: 0
Traceback (most recent call last):
  File "src/main.py", line 307, in <module>
    main(config)
  File "src/main.py", line 196, in main
    aggr_metrics_test, per_batch_test = test_evaluator.evaluate(keep_all=True)
  File "/mvts_transformer/src/running.py", line 471, in evaluate
    epoch_loss = epoch_loss / total_samples  # average loss per element for whole epoch
ZeroDivisionError: division by zero

Fwiw, this is the path to the test file and it is populated with data:
datasets/Multivariate2018_ts/Multivariate_ts/SpokenArabicDigits/SpokenArabicDigits_TEST.ts

EDIT: Solved, silly typo: --pattern should be --test_pattern.

Something wrong when training from scratch

Hello

I want to pre-train the model from scratch using your data. I've tried your classification command, but many bugs occur.

Has anyone successfully run the code from scratch?

BTW, how is the masking performed in the code?

how can the code be made runnable for a dataframe of column size 18260?

2023-04-12 17:47:26,636 | WARNING : Not all samples have same length: maximum length set to 18260
2023-04-12 17:47:28,605 | INFO : 1212 samples may be used for training
2023-04-12 17:47:28,605 | INFO : 304 samples will be used for validation
2023-04-12 17:47:28,605 | INFO : 0 samples will be used for testing

2023-04-12 17:53:36,348 | INFO : Total number of parameters: 3656002
2023-04-12 17:53:36,348 | INFO : Trainable parameters: 3656002

ERROR

Hello author, I get this error when running on the dataset. What does a file with the .ts suffix mean?

Exception: No .ts files found using pattern: 'TRAIN'

Imputation with nan's produces loss to be nan

Hi George,
Imputation using this package is a bit confusing. I tried keeping NaNs for the values to be imputed, but then my loss is obviously NaN. I cannot use 0, because it can be a legitimate value. I tried -1, but it seems to throw a ZeroDivisionError after the 1st epoch.
I can give you my data, i.e. the wastewater class.
Here is my JSON config file:
{
    "data_dir": "\\Hscpigdcapmdw05\sas\Use....\inputdata",
    "output_dir": "\\Hscpigdcapmdw05\sas\Use...\mvts_imputed",
    "model": "transformer",
    "data_class": "wastewater",
    "task": "imputation",
    "d_model": 64,
    "activation": "relu",
    "num_heads": 4,
    "num_layers": 8,
    "pos_encoding": "learnable",
    "epochs": 10,
    "normalization": "minmax",
    "test_ratio": 0.1,
    "val_ratio": 0.05,
    "mean_mask_length": 6,
    "mask_mode": "concurrent",
    "mask_distribution": "bernoulli",
    "exclude_feats": ["geoLat", "geoLong", "phureg", "sewershedPop"],
    "data_window_len": 15,
    "lr": 0.001,
    "batch_size": 5,
    "masking_ratio": 0.05
}

example_data_class.py is missing

Hi,
Thanks for your work making this accessible to the community.
The example_data_class.py is referenced in your README, but I can't find it. Could you please add it?
Thank you!
Ian

categorical variables

Hi George, I was wondering what your opinion is on using your TST model on multivariate time series datasets with categorical variables. The dataset I have in mind is encrypted network traffic. This would consist of fields such as "Timestamp", "time since last packet", "packet size", "protocol" (categorical), and various binary columns for TCP flags.

Thanks for your hard work; looking forward to hearing from you :)
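For concreteness, this is the preprocessing I would try first (sketch; one-hot encoding the protocol field is my own idea, not something from the paper):

    import pandas as pd

    # df holds the numeric fields plus a categorical 'protocol' column
    df = pd.get_dummies(df, columns=['protocol'], dtype=float)  # one-hot -> numeric features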

what's the right way to calculate predicted value and extract the representation?

Hi George,
Great project. This really opens a new avenue for MTS analysis.
I have run your code and everything looks good and very fast. When I tried to use the model to calculate predictions myself, the values didn't match the results in the "predictions" folder.
What I did: load the checkpoint of the best model, feed the first row of targets.npy from the "predictions" folder into the model with padding_masks = torch.BoolTensor([[1]]), and compare the output with the first row of predictions.npy. My results don't match your prediction output. Could you share some ideas on how to do this on a record-by-record basis?
Another question is how to extract the encoder values. For example, I have PM2.5 data as a 24-by-9 dataframe; how do I get the corresponding 24-by-128 representation dataframe? Once this step is done, I can use the new dataframe as input to other models and avoid computing these values on the fly, to save time.
Thank you in advance.

repository directory

Hi George, thanks for your open-source code. It is very clear and organized.

But I am new to using shell scripts; could you please give a directory tree of the entire repository? That would be very helpful for understanding the architecture. I am confused about where I should put the downloaded data and where I should make the experiments folder. Currently, I am trying the following tree:

  • experiments
  • src
    - datasets
    - models
    - regression
    - utils
    - main.py
    - optimizers.py
    - options.py
    - running.py

After cd mvts_transformer, I run python src/main.py --output_dir experiments --comment "regression from Scratch" --name FloodModeling1_fromScratch_Regression --records_file Regression_records.xls --data_dir Datasets/Regression/FloodModeling1/ --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 100 --lr 0.001 --optimizer RAdam --pos_encoding learnable --task regression, but it shows No files found using: Datasets/Regression/FloodModeling1/*.

Shape Issue When Masking A Single Timesteps Feature Values

I'm attempting a classification on custom data. There are 8 features and 447 time steps or samples in the train/val set. I'm guessing the issue is with my dataset, so I provide some shape prints below.

The issue occurs in dataset.py, around line 263 (I added a lot of comments, so the number might be a bit off), at the line:

for m in range(X.shape[1]):  # feature dimension

which throws IndexError: tuple index out of range.

Printing out some variables to debug, just before the above problematic line, we can see X is a single dimension array with size = number of features:

    if distribution == 'geometric':  # stateful (Markov chain)
        if mode == 'separate':  # each variable (feature) is independent
            mask = np.ones(X.shape, dtype=bool)
            print(f'type X: {type(X)}')
            print(f'X.shape: {X.shape}')
            print(f'X.shape[1]: {X.shape[1]}')

Gives:

type X: <class 'numpy.ndarray'>
X.shape: (8,)

Further up, around line 35 or so is where the noise_mask is called. I've printed out some variables there to debug too:

    X = self.feature_df.loc[self.IDs[ind]].values  # (seq_length, feat_dim) array
    print(f'\nshape X: {X.shape}')
    print(f'X: {X}')
    print(f'self.feature_df: {self.feature_df.shape}')
    print(f'self.IDs[ind]: {self.IDs[ind]}')
    print(f'\nBuilding mask')
    mask = noise_mask(X, self.masking_ratio, self.mean_mask_length, self.mode, self.distribution,
                      self.exclude_feats)  # (seq_length, feat_dim) boolean array

Gives:

shape X: (8,)
X: [ 0.62708933  0.75219719 -0.65542292 -0.25243002 -1.11766093 -1.75127136
 -0.79571237 -0.17200066]
self.feature_df: (447, 8)
self.IDs[ind]: 2022-05-27T00:00:00.000000000
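My current suspicion (an assumption on my side): each of my IDs is a single timestamp, so self.feature_df.loc[self.IDs[ind]] matches exactly one row and returns a 1-D Series, which is why X.shape is (8,) rather than (seq_length, 8). A quick diagnostic:

    # Wrapping the key in a list forces a DataFrame (2-D) even for a single match;
    # if this prints (1, 8), each "sample" is really just one timestep.
    print(self.feature_df.loc[[self.IDs[ind]]].values.shape)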

Edit: The command I'm using to run is:

python src/main.py --output_dir experiments --comment "pretraining through imputation" --name pretrained_ex1 --records_file Imputation_records_ex1.xls --data_dir data_preprocessing/1 --data_class csv --pattern TRAIN --val_ratio 0.2 --epochs 2 --lr 0.001 --optimizer RAdam  --pos_encoding learnable --num_layers 3  --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128

Error whenever I try to run classification with dataset from timeseriesclassification.com

Regression works perfectly fine for me, but all datasets from timeseriesclassification.com give the following error:

Traceback (most recent call last):
  File "/home/avshmelev/hw/./mvts_transformer/src/main.py", line 307, in <module>
    main(config)
  File "/home/avshmelev/hw/./mvts_transformer/src/main.py", line 235, in main
    aggr_metrics_val, best_metrics, best_value = validate(val_evaluator, tensorboard_writer, config, best_metrics,
  File "/home/avshmelev/hw/mvts_transformer/src/running.py", line 222, in validate
    np.savez(pred_filepath, **per_batch)
  File "<__array_function__ internals>", line 200, in savez
  File "/home/avshmelev/.local/lib/python3.9/site-packages/numpy/lib/npyio.py", line 615, in savez
    _savez(file, args, kwds, False)
  File "/home/avshmelev/.local/lib/python3.9/site-packages/numpy/lib/npyio.py", line 716, in _savez
    val = np.asanyarray(val)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10,) + inhomogeneous part.

The training process didn't even start. I used the following command to start training (like training from scratch in your README):

python ./mvts_transformer/src/main.py --output_dir experiments --comment "classification from Scratch" --name ArticularyWordRecognition_fromScratch --records_file Classification_records.xls --data_dir Multivariate_ts/ArticularyWordRecognition/ --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 400 --lr 0.001 --optimizer RAdam --pos_encoding learnable --task classification --key_metric accuracy --batch_size 32

Now I'm just trying to understand what's wrong, using the dataset "ArticularyWordRecognition" from timeseriesclassification.com. Could you try to run it and give feedback on how it went on your side? If your code wasn't made to work with this dataset, could you please point me toward the right way of fixing it? It is very important to get this right, because I want to run your code on my own dataset, which is very similar to "ArticularyWordRecognition".

Of course, I can try to fix the code myself, but your guidance would help me avoid logical mistakes in the training pipeline (who knows what I might get wrong while fixing the problem).
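In case it helps, this is the workaround I am considering (assumption: newer NumPy versions refuse to build ragged arrays implicitly, so each per_batch entry has to be wrapped as an object array before saving):

    import numpy as np

    # per_batch values are lists of differently-shaped batch arrays (ragged)
    per_batch_obj = {k: np.asarray(v, dtype=object) for k, v in per_batch.items()}
    np.savez(pred_filepath, **per_batch_obj)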

very long time series, performance drop

As implied in the text, prior work combines a triplet loss with a deep causal CNN with dilation in order to make the method effective for very long time series. Is there any relevant code to refer to? This method does not seem to work well on a time series dataset with a sequence length of 5000 and 2 features.

Time-series forecasting

Hi, how can I set up the code for time-series forecasting? What would be the necessary changes? Thanks.

How to perform mask as Figure.1?

Hello, George

Thank you very much for your code. Could you tell me how to perform the masking shown in Figure 1? I found many functions in dataset.py, but it is difficult for me to piece them together without any guidance. Could you provide some more details?

Thanks a lot.
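For anyone else reading: my current understanding of the masking, as a simplified single-feature sketch of what noise_mask in dataset.py appears to do (the actual repository code may differ):

    import numpy as np

    def geom_noise_mask_single(L, lm, masking_ratio):
        # Markov chain over two states (masked / unmasked): masked segment lengths
        # follow a geometric distribution with mean lm, and on average a fraction
        # masking_ratio of the L timesteps ends up masked (mask value False).
        keep_mask = np.ones(L, dtype=bool)
        p_m = 1 / lm                                     # prob. of leaving the masked state
        p_u = p_m * masking_ratio / (1 - masking_ratio)  # prob. of entering the masked state
        p = [p_m, p_u]
        state = int(np.random.rand() > masking_ratio)    # 1 = unmasked, 0 = masked
        for i in range(L):
            keep_mask[i] = state
            if np.random.rand() < p[state]:
                state = 1 - state
        return keep_mask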

question of relationship between multiple sequences

Hi, I noticed that you focus on the temporal characteristics of the signals (Transformer-based), and it seems you don't make full use of, or extract, the relationship information between multiple sequences. That is to say, the multiple time series are processed in parallel, and their shared characteristics are not captured. However, this relationship seems to exist. Do you have any opinions on this?

Extracting Imputed values ?

Hi,

My heartiest congratulations on implementing this wonderful work on transformers. I have also added a "WasteWaterClass" to my branch, with a new data.py.

It would be great if anybody could tell me how to extract the imputed values after doing unsupervised training using masking.

Thanks
Jag
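To be concrete, the extraction I am after is roughly this (sketch; I am assuming the boolean mask is True at observed positions and False at the masked positions to be imputed):

    import numpy as np

    # X: (seq_length, feat_dim) original values; predictions: model output, same shape
    X_imputed = np.where(mask, X, predictions)  # keep observed values, fill in masked ones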

Testing procedure

First of all, thank you for providing such a complete implementation of your code.
In the paper you mention that "After fixing the hyperparameters, the entire training set was used to train the model again, which was finally evaluated on the official test set." Could you explain how this final training procedure (on the entire training set) was performed?

Was a predefined number of epochs used to train the model, after which it was evaluated on the test set? Or was the test set used as a validation set?

Thanks in advance.

Test result in multivariate dataset without pretrain

Hi,

I'm trying to study your code on a multivariate classification dataset without pretraining, and I chose Handwriting as an example.

In order to achieve the paper's performance, I used the hyperparameters given in your paper.

So I trained the model with the command below.

python src/main.py --output_dir experiments --comment "classification from Scratch" --name HW --records_file Classification_records.xls --data_dir data/Multivariate_ts/Handwriting --data_class tsra --pattern TRAIN --epochs 400 --lr 0.001 --optimizer RAdam --pos_encoding learnable --task classification --key_metric accuracy --val_ratio 0.2 --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 256 --batch_size 128

And then I tested the model with the command below.

python src/main.py --output_dir experiments --comment "classification from Scratch" --name HW --records_file Classification_records.xls --data_dir data/Multivariate_ts/Handwriting --data_class tsra --pattern TRAIN --epochs 400 --lr 0.001 --optimizer RAdam --pos_encoding learnable --task classification --key_metric accuracy --val_ratio 0 --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 256 --batch_size 128 --test_pattern TEST --test_only testset --load_model experiments/HW_2022-07-27_20-01-05_axV/checkpoints/model_best.pth

I thought I used the same data split, the same model, and the same hyperparameters, but in the end the accuracy is 0.25882352941176473, which differs from the 0.3 in the paper. Is there any step I missed?

Some difficulties reproducing the results in paper.

Thank you for your significant contributions to the field of time series, which have given me the opportunity to build on your work in downstream tasks. However, I am facing some challenges in reproducing the results of your paper. I used the unsupervised pre-training mode for regression on the AppliancesEnergy data, with the specific parameter values given in your paper (Table 14).

The specific pre-training code is as follows.
python src/main.py --output_dir experiments --comment "pretraining through imputation" --name pretrained --records_file Imputation_records.xls --data_dir "/AppliancesEnergy/" --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam --pos_encoding learnable --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128

The specific fine-tune code is as follows.
python src/main.py --output_dir experiments --comment "finetune for regression" --name finetuned --records_file Regression_records.xls --data_dir /AppliancesEnergy/ --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 200 --lr 0.001 --optimizer RAdam --pos_encoding learnable --load_model /pretrained_2023-02-17_21-13-58_dtF/checkpoints/model_best.pth --task regression --change_output --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128

The specific test code is as follows.
python src/main.py --output_dir experiments --comment "test" --name test --data_dir /AppliancesEnergy/ --data_class tsra --load_model /finetuned_2023-02-17_21-28-36_l5O/checkpoints/model_best.pth --pattern TEST --test_only testset --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128 --task regression

I have run these three steps (pre-training, fine-tuning, and testing) several times, but each time the test loss differs substantially from the paper. I saw a similar situation in issue 19 (Test result in multivariate dataset without pretrain #19), where the author said he solved it by searching for the best epoch and then training the model for that number of epochs on the whole training set. However, I did not really understand what he meant. Would you be able to provide a detailed explanation, or some other advice for a beginner?

The pretrained RMSE result for AppliancesEnergy is 2.375 (Table 4), but I got these test results:
Test Summary: loss: 11.078025
Test Summary: loss: 11.406574
Test Summary: loss: 10.409667

Very high loss when finetuning

Dear Author,

I am running your commands and find that the pretraining seems fine while the finetuning looks wrong. The pretraining loss is just 0.140160306.

The commands I run are

CUDA_VISIBLE_DEVICES=4 python src/main.py --output_dir experiments --comment "pretraining through imputation" --name BeijingPM25Quality_pretrained --records_file Imputation_records.xls --data_dir BeijingPM25Quality --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam --batch_size 32 --pos_encoding learnable --d_model 128

CUDA_VISIBLE_DEVICES=1 python src/main.py --output_dir experiments --comment "finetune for regression" --name BeijingPM25Quality_finetuned --records_file Regression_records.xls --data_dir BeijingPM25Quality --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 200 --lr 0.001 --optimizer RAdam --pos_encoding learnable --d_model 128 --load_model /home/xzhoubi/paperreading/mvts_transformer/experiments/BeijingPM25Quality_pretrained_2022-07-19_10-27-28_tlB/checkpoints/model_best.pth --task regression --change_output --batch_size 128

Can you please help check it?

2022-07-19 17:42:53,244 | INFO : Epoch 85 Training Summary: epoch: 85.000000 | loss: 1024.587302 | 
2022-07-19 17:42:53,244 | INFO : Epoch runtime: 0.0 hours, 0.0 minutes, 4.77277946472168 seconds

2022-07-19 17:42:53,244 | INFO : Avg epoch train. time: 0.0 hours, 0.0 minutes, 4.6006609103258915 seconds
2022-07-19 17:42:53,245 | INFO : Avg batch train. time: 0.048943201173679694 seconds
2022-07-19 17:42:53,245 | INFO : Avg sample train. time: 0.00038602625527151295 seconds
Training Epoch:  42%|████████████████████▊                            | 85/200 [07:40<10:05,  5.26s/it]Training Epoch 86   0.0% | batch:         0 of        94 |       loss: 566.886
Training Epoch 86   1.1% | batch:         1 of        94        |       loss: 686.58
Training Epoch 86   2.1% | batch:         2 of        94        |       loss: 1297.63
Training Epoch 86   3.2% | batch:         3 of        94        |       loss: 976.956
Training Epoch 86   4.3% | batch:         4 of        94        |       loss: 565.19
Training Epoch 86   5.3% | batch:         5 of        94        |       loss: 809.262
Training Epoch 86   6.4% | batch:         6 of        94        |       loss: 1095.96
Training Epoch 86   7.4% | batch:         7 of        94        |       loss: 1047.49
Training Epoch 86   8.5% | batch:         8 of        94        |       loss: 782.682
Training Epoch 86   9.6% | batch:         9 of        94        |       loss: 697.767
Training Epoch 86  10.6% | batch:        10 of        94        |       loss: 900.141
Training Epoch 86  11.7% | batch:        11 of        94        |       loss: 919.351
Training Epoch 86  12.8% | batch:        12 of        94        |       loss: 782.872
Training Epoch 86  13.8% | batch:        13 of        94        |       loss: 1082.41
Training Epoch 86  14.9% | batch:        14 of        94        |       loss: 1004.29
Training Epoch 86  16.0% | batch:        15 of        94        |       loss: 960.513
Training Epoch 86  17.0% | batch:        16 of        94        |       loss: 776.499
Training Epoch 86  18.1% | batch:        17 of        94        |       loss: 995.985
Training Epoch 86  19.1% | batch:        18 of        94        |       loss: 655.607
Training Epoch 86  20.2% | batch:        19 of        94        |       loss: 733.846
Training Epoch 86  21.3% | batch:        20 of        94        |       loss: 1190.87
Training Epoch 86  22.3% | batch:        21 of        94        |       loss: 698.143
Training Epoch 86  23.4% | batch:        22 of        94        |       loss: 992.943
Training Epoch 86  24.5% | batch:        23 of        94        |       loss: 1017.47
Training Epoch 86  25.5% | batch:        24 of        94        |       loss: 696.403
Training Epoch 86  26.6% | batch:        25 of        94        |       loss: 822.942
Training Epoch 86  27.7% | batch:        26 of        94        |       loss: 935.869
Training Epoch 86  28.7% | batch:        27 of        94        |       loss: 1040.06
Training Epoch 86  29.8% | batch:        28 of        94        |       loss: 904.523
Training Epoch 86  30.9% | batch:        29 of        94        |       loss: 882.923
Training Epoch 86  31.9% | batch:        30 of        94        |       loss: 805.928
Training Epoch 86  33.0% | batch:        31 of        94        |       loss: 803.492
Training Epoch 86  34.0% | batch:        32 of        94        |       loss: 1720.69
Training Epoch 86  35.1% | batch:        33 of        94        |       loss: 778.216
Training Epoch 86  36.2% | batch:        34 of        94        |       loss: 729.644
Training Epoch 86  37.2% | batch:        35 of        94        |       loss: 1233.58
Training Epoch 86  38.3% | batch:        36 of        94        |       loss: 960.826
Training Epoch 86  39.4% | batch:        37 of        94        |       loss: 986.129
Training Epoch 86  40.4% | batch:        38 of        94        |       loss: 1316.68
Training Epoch 86  41.5% | batch:        39 of        94        |       loss: 1351.79
Training Epoch 86  42.6% | batch:        40 of        94        |       loss: 1661.48
Training Epoch 86  43.6% | batch:        41 of        94        |       loss: 956.305
Training Epoch 86  44.7% | batch:        42 of        94        |       loss: 1017.96
Training Epoch 86  45.7% | batch:        43 of        94        |       loss: 851.958
Training Epoch 86  46.8% | batch:        44 of        94        |       loss: 816.494
Training Epoch 86  47.9% | batch:        45 of        94        |       loss: 603.491
Training Epoch 86  48.9% | batch:        46 of        94        |       loss: 710.572
Training Epoch 86  50.0% | batch:        47 of        94        |       loss: 1318.47
Training Epoch 86  51.1% | batch:        48 of        94        |       loss: 905.094
Training Epoch 86  52.1% | batch:        49 of        94        |       loss: 662.117
Training Epoch 86  53.2% | batch:        50 of        94        |       loss: 850.853
Training Epoch 86  54.3% | batch:        51 of        94        |       loss: 1007.81
Training Epoch 86  55.3% | batch:        52 of        94        |       loss: 1236.99
Training Epoch 86  56.4% | batch:        53 of        94        |       loss: 809.194
Training Epoch 86  57.4% | batch:        54 of        94        |       loss: 1075.82
Training Epoch 86  58.5% | batch:        55 of        94        |       loss: 859.909
Training Epoch 86  59.6% | batch:        56 of        94        |       loss: 739.112
Training Epoch 86  60.6% | batch:        57 of        94        |       loss: 992.518
Training Epoch 86  61.7% | batch:        58 of        94        |       loss: 953.861
Training Epoch 86  62.8% | batch:        59 of        94        |       loss: 881.18
Training Epoch 86  63.8% | batch:        60 of        94        |       loss: 878.613
Training Epoch 86  64.9% | batch:        61 of        94        |       loss: 1006.92
Training Epoch 86  66.0% | batch:        62 of        94        |       loss: 728.144
Training Epoch 86  67.0% | batch:        63 of        94        |       loss: 865.157
Training Epoch 86  68.1% | batch:        64 of        94        |       loss: 895.809
Training Epoch 86  69.1% | batch:        65 of        94        |       loss: 616.984
Training Epoch 86  70.2% | batch:        66 of        94        |       loss: 893.007
Training Epoch 86  71.3% | batch:        67 of        94        |       loss: 859.431
Training Epoch 86  72.3% | batch:        68 of        94        |       loss: 1648.19
Training Epoch 86  73.4% | batch:        69 of        94        |       loss: 657.725
Training Epoch 86  74.5% | batch:        70 of        94        |       loss: 960.164
Training Epoch 86  75.5% | batch:        71 of        94        |       loss: 666.139
Training Epoch 86  76.6% | batch:        72 of        94        |       loss: 3079.8
Training Epoch 86  77.7% | batch:        73 of        94        |       loss: 802.407
Training Epoch 86  78.7% | batch:        74 of        94        |       loss: 1103.64
Training Epoch 86  79.8% | batch:        75 of        94        |       loss: 1029.07
Training Epoch 86  80.9% | batch:        76 of        94        |       loss: 1488.64
Training Epoch 86  81.9% | batch:        77 of        94        |       loss: 924.513
Training Epoch 86  83.0% | batch:        78 of        94        |       loss: 909.587
Training Epoch 86  84.0% | batch:        79 of        94        |       loss: 862.864
Training Epoch 86  85.1% | batch:        80 of        94        |       loss: 607.052
Training Epoch 86  86.2% | batch:        81 of        94        |       loss: 967.5
Training Epoch 86  87.2% | batch:        82 of        94        |       loss: 942.684
Training Epoch 86  88.3% | batch:        83 of        94        |       loss: 1217.01
Training Epoch 86  89.4% | batch:        84 of        94        |       loss: 685.092
Training Epoch 86  90.4% | batch:        85 of        94        |       loss: 949.638
Training Epoch 86  91.5% | batch:        86 of        94        |       loss: 737.985
Training Epoch 86  92.6% | batch:        87 of        94        |       loss: 1085.89
Training Epoch 86  93.6% | batch:        88 of        94        |       loss: 936.676
Training Epoch 86  94.7% | batch:        89 of        94        |       loss: 1203.51
Training Epoch 86  95.7% | batch:        90 of        94        |       loss: 677.801
Training Epoch 86  96.8% | batch:        91 of        94        |       loss: 2214.77
Training Epoch 86  97.9% | batch:        92 of        94        |       loss: 1357.56
Training Epoch 86  98.9% | batch:        93 of        94        |       loss: 1019.23

2022-07-19 17:42:57,306 | INFO : Epoch 86 Training Summary: epoch: 86.000000 | loss: 974.012262 | 
2022-07-19 17:42:57,307 | INFO : Epoch runtime: 0.0 hours, 0.0 minutes, 3.9919965267181396 seconds

2022-07-19 17:42:57,307 | INFO : Avg epoch train. time: 0.0 hours, 0.0 minutes, 4.593583417493243 seconds
2022-07-19 17:42:57,307 | INFO : Avg batch train. time: 0.04886790869673663 seconds
2022-07-19 17:42:57,307 | INFO : Avg sample train. time: 0.00038543240623370055 seconds
2022-07-19 17:42:57,307 | INFO : Evaluating on validation set ...
Evaluating Epoch 86   0.0% | batch:         0 of        40      |       loss: 7538.28
Evaluating Epoch 86   2.5% | batch:         1 of        40      |       loss: 1100.53
Evaluating Epoch 86   5.0% | batch:         2 of        40      |       loss: 2441.92
Evaluating Epoch 86   7.5% | batch:         3 of        40      |       loss: 7944.98
Evaluating Epoch 86  10.0% | batch:         4 of        40      |       loss: 2934.04
Evaluating Epoch 86  12.5% | batch:         5 of        40      |       loss: 2394.65
Evaluating Epoch 86  15.0% | batch:         6 of        40      |       loss: 8225.28
Evaluating Epoch 86  17.5% | batch:         7 of        40      |       loss: 3071.4
Evaluating Epoch 86  20.0% | batch:         8 of        40      |       loss: 3004.23
Evaluating Epoch 86  22.5% | batch:         9 of        40      |       loss: 2549.05
Evaluating Epoch 86  25.0% | batch:        10 of        40      |       loss: 5039.37
Evaluating Epoch 86  27.5% | batch:        11 of        40      |       loss: 1271.33
Evaluating Epoch 86  30.0% | batch:        12 of        40      |       loss: 7026.6
Evaluating Epoch 86  32.5% | batch:        13 of        40      |       loss: 4039.62
Evaluating Epoch 86  35.0% | batch:        14 of        40      |       loss: 1919.55
Evaluating Epoch 86  37.5% | batch:        15 of        40      |       loss: 3505.34
Evaluating Epoch 86  40.0% | batch:        16 of        40      |       loss: 5214.82
Evaluating Epoch 86  42.5% | batch:        17 of        40      |       loss: 2959.36
Evaluating Epoch 86  45.0% | batch:        18 of        40      |       loss: 2551.97
Evaluating Epoch 86  47.5% | batch:        19 of        40      |       loss: 6823
Evaluating Epoch 86  50.0% | batch:        20 of        40      |       loss: 4544.8
Evaluating Epoch 86  52.5% | batch:        21 of        40      |       loss: 1190.93
Evaluating Epoch 86  55.0% | batch:        22 of        40      |       loss: 3702.28
Evaluating Epoch 86  57.5% | batch:        23 of        40      |       loss: 3874.76
Evaluating Epoch 86  60.0% | batch:        24 of        40      |       loss: 1572.05
Evaluating Epoch 86  62.5% | batch:        25 of        40      |       loss: 3755.92
Evaluating Epoch 86  65.0% | batch:        26 of        40      |       loss: 10556.1
Evaluating Epoch 86  67.5% | batch:        27 of        40      |       loss: 3082.73
Evaluating Epoch 86  70.0% | batch:        28 of        40      |       loss: 1867.05
Evaluating Epoch 86  72.5% | batch:        29 of        40      |       loss: 10148.6
Evaluating Epoch 86  75.0% | batch:        30 of        40      |       loss: 1724.54
Evaluating Epoch 86  77.5% | batch:        31 of        40      |       loss: 1341.73
Evaluating Epoch 86  80.0% | batch:        32 of        40      |       loss: 7704.38
Evaluating Epoch 86  82.5% | batch:        33 of        40      |       loss: 7095.86
Evaluating Epoch 86  85.0% | batch:        34 of        40      |       loss: 1109.71
Evaluating Epoch 86  87.5% | batch:        35 of        40      |       loss: 5296.75
Evaluating Epoch 86  90.0% | batch:        36 of        40      |       loss: 6882.2
Evaluating Epoch 86  92.5% | batch:        37 of        40      |       loss: 2588.44
Evaluating Epoch 86  95.0% | batch:        38 of        40      |       loss: 3639.52
Evaluating Epoch 86  97.5% | batch:        39 of        40      |       loss: 11084.6
2022-07-19 17:42:58,800 | INFO : Validation runtime: 0.0 hours, 0.0 minutes, 1.4921939373016357 seconds

2022-07-19 17:42:58,800 | INFO : Avg val. time: 0.0 hours, 0.0 minutes, 1.4729443497127956 seconds
2022-07-19 17:42:58,800 | INFO : Avg batch val. time: 0.03682360874281989 seconds
2022-07-19 17:42:58,800 | INFO : Avg sample val. time: 0.0002917877079462749 seconds

Getting an error while running. Could anyone suggest where I should change the code?

Traceback (most recent call last):
  File "/content/drive/MyDrive/502/src/main.py", line 307, in <module>
    main(config)
  File "/content/drive/MyDrive/502/src/main.py", line 235, in main
    aggr_metrics_val, best_metrics, best_value = validate(val_evaluator, tensorboard_writer, config, best_metrics,
  File "/content/drive/MyDrive/502/src/running.py", line 191, in validate
    aggr_metrics, per_batch = val_evaluator.evaluate(epoch, keep_all=True)
  File "/content/drive/MyDrive/502/src/running.py", line 451, in evaluate
    predictions = self.model(X.to(self.device), padding_masks)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/drive/MyDrive/502/src/models/ts_transformer.py", line 303, in forward
    output = self.transformer_encoder(inp, src_key_padding_mask=~padding_masks)  # (seq_length, batch_size, d_model)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/transformer.py", line 306, in forward
    output = mod(output, src_mask=mask, is_causal=is_causal, src_key_padding_mask=src_key_padding_mask_for_layers)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() got an unexpected keyword argument 'is_causal'
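A workaround I am considering (my assumption: the custom TransformerBatchNormEncoderLayer was written before PyTorch 2.0, which now passes an is_causal keyword through nn.TransformerEncoder), sketched as a signature change in src/models/ts_transformer.py:

    # accept and ignore the new keyword for compatibility with newer PyTorch;
    # the rest of the method body stays exactly as in the repository
    def forward(self, src, src_mask=None, src_key_padding_mask=None, is_causal=False):
        ...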

One pre-trained model for all datasets or one for each?

Hi @gzerveas , when I read your paper, I thought you pre-trained ONE model based on all datasets and fine-tuned on each dataset for the specific classification/regression task. However, based on the README in this repository, it seems that each dataset has a pre-trained model. Which method did you use in your experiments?

Only test mode not working

Hi,

I think there is a typo in the test-only behaviour (--test_only testset): the input to the pipeline factory should be pipeline_factory(config) instead of pipeline_factory(config['task']), which does not run.
