
lstm_encoder_decoder's Introduction

Building an LSTM Encoder-Decoder using PyTorch to make Sequence-to-Sequence Predictions

Requirements

  • Python 3+
  • PyTorch
  • numpy

1 Overview

There are many instances where we would like to predict how a time series will behave in the future. For example, we may be interested in forecasting web page viewership, weather conditions (temperature, humidity, etc.), power usage, or traffic volume. In this project, we will focus on making sequence-to-sequence predictions for time series data. A sequence-to-sequence prediction uses n_i input values of the time series to predict the next n_o values. An example sequence-to-sequence prediction for the number of views Stephen Hawking's Wikipedia page receives is shown below.


Here, the past few months of viewership (black) would be used to predict the next month (red) of viewership.

For sequence-to-sequence time series predictions, the past values of the time series often influence the future values. In the case of Stephen Hawking's Wikipedia page, more people might view his page after a film or documentary about him is released. The increase in public discussion might stimulate other people to view his Wikipedia page, causing an upward trend in viewership. The Long Short-Term Memory (LSTM) neural network is well-suited to model this type of problem because it can learn long-term dependencies in the data. To make sequence-to-sequence predictions using an LSTM, we use an encoder-decoder architecture.

The LSTM encoder-decoder consists of two LSTMs. The first LSTM, or the encoder, processes an input sequence and generates an encoded state. The encoded state summarizes the information in the input sequence. The second LSTM, or the decoder, uses the encoded state to produce an output sequence. Note that the input and output sequences can have different lengths.
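As a rough sketch (not the repository's exact implementation; class and variable names below are illustrative), each half can be built around a single PyTorch nn.LSTM, with the decoder adding a linear layer that maps its hidden state back to the output dimension:

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)

    def forward(self, x):                 # x: (seq_len, batch_size, input_size)
        _, (h, c) = self.lstm(x)          # keep only the final hidden and cell states
        return h, c                       # the encoded state

class Decoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, input_size)

    def forward(self, x, hidden):         # x: (batch_size, input_size), one time step
        out, hidden = self.lstm(x.unsqueeze(0), hidden)   # add a seq_len = 1 dimension
        return self.linear(out.squeeze(0)), hidden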

We will build an LSTM encoder-decoder using PyTorch to make sequence-to-sequence predictions for time series data. For illustrative purposes, we will apply our model to a synthetic time series dataset. In Section 2, we will prepare the synthetic time series dataset to input into our LSTM encoder-decoder. In Section 3, we will build the LSTM encoder-decoder using PyTorch. We discuss how we train the model and use it to make predictions. Finally, in Section 4, we will evaluate our model on the training and test datasets.

2 Preparing the Time Series Dataset

We prepare the time series dataset in generate_dataset.py. For our time series, we consider the noisy sinusoidal curve plotted below.

We treat the first 80 percent of the time series as the training set and the last 20 percent as the test set. The time series, split into the training and test data, is shown below.

Right now, our dataset is one long time series. In order to train the LSTM encoder-decoder, we need to subdivide the time series into many shorter sequences of n_i input values and n_o target values. We can achieve this by windowing the time series. To do this, we start at the first y value and collect n_i values as input and the next n_o values as targets. Then, we slide our window to the second (stride = 1) or third (stride = 2) y value and repeat the procedure. We do this until the window no longer fits into the dataset, for a total of n_w windows. We organize the inputs in a matrix, X, with shape (n_i, n_w) and the targets in a matrix, Y, with shape (n_o, n_w). The windowing procedure is shown schematically for n_i = 3, n_o = 2, and stride = 1 below.

We will feed X and Y into our LSTM encoder-decoder for training. The LSTM encoder-decoder expects X and Y to be 3-dimensional, with the third dimension being the number of features. Since we only have one feature, y, we augment the shape of X to (n_i, n_w, 1) and Y to (n_o, n_w, 1).

We apply the windowing procedure to our synthetic time series, using n_i = 80, n_o = 20, and stride = 5. A sample training window is shown below.
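Wired together with the helpers in generate_dataset.py, the windowing and tensor conversion might look roughly like this (the noisy sinusoid below is a stand-in; the repository's own synthetic-series construction may differ):

import numpy as np
import generate_dataset

# stand-in noisy sinusoid; the repository generates its own synthetic series
t = np.linspace(0., 400., 2000)
y = np.sin(2. * np.pi * t / 50.) + 0.2 * np.random.randn(len(t))

# first 80 percent for training, last 20 percent for testing
split = int(0.8 * len(y))
y_train, y_test = y[:split], y[split:]

# window the series: n_i = 80 input values, n_o = 20 target values, stride = 5
iw, ow, s = 80, 20, 5
Xtrain, Ytrain = generate_dataset.windowed_dataset(y_train.reshape(-1, 1), input_window = iw,
                                                   output_window = ow, stride = s)
Xtest, Ytest = generate_dataset.windowed_dataset(y_test.reshape(-1, 1), input_window = iw,
                                                 output_window = ow, stride = s)

# convert to torch tensors shaped (n_i, n_w, 1) and (n_o, n_w, 1)
X_train, Y_train, X_test, Y_test = generate_dataset.numpy_to_torch(Xtrain, Ytrain, Xtest, Ytest)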

3 Build the LSTM Encoder-Decoder using PyTorch

We use PyTorch to build the LSTM encoder-decoder in lstm_encoder_decoder.py. The LSTM encoder takes an input sequence and produces an encoded state (i.e., cell state and hidden state). We feed the last encoded state produced by the LSTM encoder, along with the last value of the input sequence, into the LSTM decoder. With this information, the LSTM decoder makes predictions. During training, we allow the LSTM decoder to make predictions in three different ways. First, we can predict recursively.

That is, we recurrently feed each predicted decoder output back into the LSTM decoder until we have an output sequence of the desired length. Second, we can make predictions using teacher forcing. In teacher forcing, we feed the true target values into the LSTM decoder.

Teacher forcing acts like "training wheels." If the model makes a bad prediction, it is put back in place with the true value. Finally, we can make predictions using mixed teacher forcing.

Sometimes, we provide the LSTM decoder with the true value, while other times we require it to use the predicted value. Thus, the "training wheels" are on some of the time. When we use teacher forcing or mixed teacher forcing, we provide an option in the code called dynamic teacher forcing. If dynamic teacher forcing is turned on, the amount of teacher forcing is slowly reduced each epoch. Dynamic teacher forcing helps the model learn the structure of the data at the beginning of training, but then slowly transitions it to make predictions on its own.
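A simplified sketch of the decoder loop under these three modes might look like the following (all names are placeholders; train_model in lstm_encoder_decoder.py is the authoritative implementation):

import random
import torch

# decoder_input: last value of the input window, shape (batch_size, num_features)
# decoder_hidden: (hidden state, cell state) handed over from the encoder
outputs = torch.zeros(target_len, batch_size, num_features)

for t in range(target_len):
    decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
    outputs[t] = decoder_output

    if training_prediction == 'recursive':
        decoder_input = decoder_output              # always feed the prediction back in
    elif training_prediction == 'teacher_forcing':
        decoder_input = target_batch[t, :, :]       # always feed the true target value
    else:                                           # 'mixed_teacher_forcing'
        if random.random() < teacher_forcing_ratio:
            decoder_input = target_batch[t, :, :]   # "training wheels" on
        else:
            decoder_input = decoder_output          # predict on its own

# with dynamic teacher forcing, teacher_forcing_ratio is reduced a little after each epoch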

Once the LSTM decoder has produced an output sequence of the desired length, we compare the predicted values to the true target values and compute the mean squared error as our loss function. We then use backpropagation to adjust the weights in the LSTM gates to minimize the loss function.
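In code, one training step could be sketched roughly as follows (placeholder names throughout; Adam is shown only as an example optimizer):

import torch.nn as nn
import torch.optim as optim

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr = 0.01)   # model: the full encoder-decoder

optimizer.zero_grad()
loss = criterion(outputs, target_batch)   # mean squared error between predictions and targets
loss.backward()                           # backpropagate through the decoder and encoder
optimizer.step()                          # adjust the weights in the LSTM gates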

After we have trained the LSTM encoder-decoder, we can use it to make predictions for data in the test set. All predictions for test data are done recursively.
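For a single test window, this might look something like the sketch below, assuming a predict method that takes an input window and a target_len and returns the predicted values:

# X_test has shape (n_i, n_w, 1); take one window and predict the next 20 values recursively
X_window = X_test[:, 0, :]                          # one input window, shape (80, 1)
Y_pred = model.predict(X_window, target_len = 20)   # predicted values for that window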

4 Evaluate LSTM Encoder-Decoder on Train and Test Datasets

Now, let's evaluate our model performance. In example.py, we build an LSTM encoder-decoder that takes in 80 time series values and predicts the next 20 values. During training, we use mixed teacher forcing. We set the teacher forcing ratio to 0.6, so that the decoder randomly switches between teacher forcing and recursive prediction, using teacher forcing about 60 percent of the time. For this run, we set the size of the encoded state produced by the LSTM encoder to 15. Larger hidden states allow the LSTM encoder to store more information about the input series. The model specifications are shown below.

import lstm_encoder_decoder

# X_train and Y_train are the windowed training tensors; ow = 20 is the output window length
model = lstm_encoder_decoder.lstm_seq2seq(input_size = X_train.shape[2], hidden_size = 15)

loss = model.train_model(X_train, Y_train, n_epochs = 50, target_len = ow, batch_size = 5, 
                         training_prediction = 'mixed_teacher_forcing', teacher_forcing_ratio = 0.6, 
                         learning_rate = 0.01, dynamic_tf = False)

We have plotted a few example predictions from our model for the train and test data.

We see that for both the train and test sets, the LSTM encoder-decoder is able to learn the pattern in the time series.

lstm_encoder_decoder's People

Contributors

lkulowski


lstm_encoder_decoder's Issues

Model does not seem to behave well with trending data

Hi! I saw your model and thought it was really cool! I tried to use it to predict some data, and it did seem to pick up the patterns really well, but it really struggled with trending data. For example:
[image: synthetic_time_series]

I tried to train using this format:
[image: train_test_split]
[image: windowed_data]
But ended up with the following:
[image: predictions]

The model seems to underestimate the test data (which kind of makes sense, because it was trained on a part of the data that had lower values).

Do you know any way to correct this?

Thanks by the way, your code was helpful and explained a lot of concepts.

R² is too low when using this model to predict

I have a dataset which describes the water level of a river; it looks like this:
[image]
And then I process the data like this:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    import generate_dataset

    res_df = pd.read_csv(my_table)
    t = res_df['TM']    
    y = res_df['Z']
    t_train, t_test, y_train, y_test = train_test_split(t, y, test_size=0.25)
    # set size of input/output windows
    iw = 72
    ow = 72
    s = 12
    # generate windowed training/test datasets
    Xtrain, Ytrain = generate_dataset.windowed_dataset(y_train.to_numpy().reshape(-1, 1),
                                                       input_window=iw, output_window=ow, stride=s)
    Xtest, Ytest = generate_dataset.windowed_dataset(y_test.to_numpy().reshape(-1, 1),
                                                     input_window=iw, output_window=ow, stride=s)
    X_train, Y_train, X_test, Y_test = generate_dataset.numpy_to_torch(Xtrain, Ytrain, Xtest, Ytest)

You can see I want to predict 72 water levels for the next 72 hours from the last 72 hours, and the dimensions of X_train/Y_train are (72, 1259, 1) and of X_test/Y_test are (72, 412, 1).
Then I use this code to test the model:

import numpy as np
import torch
import sklearn.metrics
from math import sqrt

model = LstmSeq2seq(input_size=X_train.shape[2], hidden_size=20)
device = torch.device('cuda:0' if torch.cuda.is_available() else "cpu")
model.to(device)
loss = model.train_model(X_train, Y_train, n_epochs=50, target_len=ow, batch_size=4, training_prediction='mixed_teacher_forcing',
                                 teacher_forcing_ratio=0.6, learning_rate=0.006, dynamic_tf=False)
# hidden_size = 20
np.save(loss_path, loss, allow_pickle=True)
torch.save(model, model_path)
for sample in range(0, Y_test.shape[1]):
    X_slice = X_test[:, sample, :]
    Y_slice = Y_test[:, sample, :]
    rmse = sqrt(sklearn.metrics.mean_squared_error(Y_slice.cpu().numpy(), model.predict(X_slice,
                    target_len=Y_test.shape[0])))
    r2 = sklearn.metrics.r2_score(Y_slice.cpu().numpy(), model.predict(X_slice, target_len=Y_test.shape[0]))
    print(rmse, ',', r2)

However, the outputs for the 412 slices look like this:

0.44626987 , -0.5325643609308734
0.4078824 , -0.27453414173048074
0.43178117 , -0.16933621375610097
0.47949684 , -0.3551825689227257
0.474012 , -0.4368760433523293
0.50186276 , -0.4779415136088816
0.47483996 , -0.49918125915299805
0.43614775 , -0.4350776345576517
0.41064182 , -0.5490551658763032
0.40776572 , -0.3227591162474419
0.41470525 , -0.4134024058370649
0.37772563 , -0.10867208658414595
0.5142976 , -0.25404413112346536
……

You can see that almost all of the 412 R² values are smaller than 0, so it's a bad model.
I have adjusted the LSTM parameters several times, but I don't get an ideal result. Am I making a mistake somewhere? If you are here, please answer me.

Possible API misunderstanding and error.

Hello,
I am trying to use your LSTM to solve a problem. I tried the example and everything worked fine. Now I want to derive a 7-dimensional feature from a 7-dimensional time series with a window size of 20 and 60000 samples. So I thought the shape of the training data should be (20, 60000, 7) and the shape of the target data should be (7, 60000, 1). Am I getting this wrong?

Also when I try to learn on the custom data I am getting the following error:

Traceback (most recent call last):
  File "model_creation.py", line 141, in <module>
    main()
  File "model_creation.py", line 130, in main
    loss = model.train_model(x_train, y_train, n_epochs=10, target_len=7, batch_size=50, training_prediction="mixed_teacher_enforcing",
  File "C:\Users\tomsc\gdm-python\classes\lstm_encoder_decoder.py", line 220, in train_model
    loss.backward()
  File "C:\Users\tomsc\anaconda3\lib\site-packages\torch\tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\tomsc\anaconda3\lib\site-packages\torch\autograd\__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Can you help me with that?

Thanks and Greetings

Fitted curve does not behave well but loss curve seems good

Hi Author, thank you for sharing your code!
I tried to apply this code to my own dataset, which is multivariate, but the fitted curve does not behave well even though the loss curve seems good. The loss curve looks like:
[image: loss curve]
and the fitted curve:
[image: fitted curve]
Can I have your suggestions? Thanks a lot!

Decoder input does not come from Encoder (only hidden states are transferred), why?

Hi !

Thanks for this great code, it has been very useful to me. I just had a question about your encoder-decoder model in the lstm_encoder_decoder.py file.
During training, you initialize the decoder like this:

decoder_input = input_batch[-1, :, :]
decoder_hidden = encoder_hidden
decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden)

Meaning that the hidden states are transferred from the encoder to the decoder but not the input tensor.
Hence, if the reduced encoded version produced by the encoder is ignored and not passed to the decoder, I don't understand why an encoder-decoder architecture is used. Correct me if I'm wrong, but I think that in the above example, the current model does what a standard LSTM would do (i.e., transferring hidden states from one cell to another). So from what I understand, for an auto-encoder, the above lines of code would be replaced by:

decoder_input = encoder_input[-1, :, :]
decoder_hidden = encoder_hidden
decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden)

Again, I'm still learning to better understand auto-encoders, so if you could explain to me why you coded it this way, I'd be happy to hear about it.

Thanks for the code,
Cyril

Confused about the Dimensions

Hi, thanks for this very detailed and reader-friendly tutorial and code.
I am very interested in encoders and decoders, and now I am finally getting closer to them.
Just a little bit curious about the dimensions:

lstm_out, self.hidden = self.lstm(x_input.unsqueeze(0), encoder_hidden_states)

In my understanding, the input to the LSTM should be (seq_len, batch_size, input_size),
but here the input becomes "x_input.unsqueeze(0), encoder_hidden_states".
Can you explain this in more detail?
Like why we use unsqueeze to add one dimension, and how do we combine the hidden state with the new x input?
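For reference, here is a minimal shape sketch of what unsqueeze(0) does to a single time step, assuming a batch size of 5 and one feature; the hidden and cell states are simply passed to nn.LSTM as its second argument:

import torch

x_input = torch.randn(5, 1)              # one time step: (batch_size = 5, input_size = 1)
x_step = x_input.unsqueeze(0)            # -> (1, 5, 1), i.e. (seq_len = 1, batch_size, input_size)

lstm = torch.nn.LSTM(input_size = 1, hidden_size = 15)
h0 = torch.zeros(1, 5, 15)               # (num_layers, batch_size, hidden_size)
c0 = torch.zeros(1, 5, 15)
out, (h1, c1) = lstm(x_step, (h0, c0))   # out: (1, 5, 15)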

One more question about the encoder: you initialized the hidden and cell states, but I didn't see you use them in the encoder?

Thanks!!!!!!!!!!!!!!

Selecting Data for Batches

input_batch = input_tensor[:, b: b + batch_size, :]

target_batch = target_tensor[:, b: b + batch_size, :]

Basically, these lines are wrong and should be changed as below to iterate over all samples, although this still doesn't cover the last samples that don't fill (i.e., are fewer than) one full batch.

input_batch = input_tensor[:, b * batch_size: (b + 1) * batch_size, :]
target_batch = target_tensor[:, b * batch_size: (b + 1) * batch_size, :]
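A minimal sketch of one way to also pick up that final, smaller batch (assuming tensors shaped (seq_len, n_samples, features); this is not the repository's code):

n_samples = input_tensor.shape[1]
n_batches = (n_samples + batch_size - 1) // batch_size     # ceiling division
for b in range(n_batches):
    # slicing past the end simply truncates, so the last batch may be smaller
    input_batch = input_tensor[:, b * batch_size: (b + 1) * batch_size, :]
    target_batch = target_tensor[:, b * batch_size: (b + 1) * batch_size, :]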

How to deal with multivariate input?

Hi Author,

Thanks for sharing such amazing code. I am wondering if this code can be extended to cases with multivariate input data?

Thanks for any help.

Shaoxing
