
lstm's People

Contributors

alfiuman avatar bbtarzan12 avatar carlosascari avatar kurnianggoro avatar naught101 avatar nicodjimenez avatar scottmackay2 avatar

lstm's Issues

Is the backpropagation part needed?

I think that, given the definition of the network and the loss function, backpropagation should be computed automatically. Why do we have to define it explicitly in the code?
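(For context: this repo writes the backward pass by hand in NumPy, so the gradients have to be derived and coded explicitly. Frameworks with automatic differentiation derive them for you. A minimal sketch using PyTorch, which is an assumption here since this repo does not use any such framework:)

import torch

# one trainable parameter vector and a squared-error loss
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)
loss = (w @ x - 1.0) ** 2

loss.backward()   # autograd computes dloss/dw; no hand-written backward pass needed
print(w.grad)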

action recognition

Hi,
I am new to LSTMs, and I want to know whether your code can be applied to action recognition on skeleton data. Also, which deep learning framework (Caffe, Theano, TensorFlow) could I use?
Thanks

I don't understand wi_diff, wf_diff, etc.

# diffs w.r.t. inputs

self.param.wi_diff += np.outer(di_input, self.xc)
self.param.wf_diff += np.outer(df_input, self.xc)
self.param.wo_diff += np.outer(do_input, self.xc)
self.param.wg_diff += np.outer(dg_input, self.xc)
self.param.bi_diff += di_input
self.param.bf_diff += df_input       
self.param.bo_diff += do_input
self.param.bg_diff += dg_input       

I am not clear about what this code is doing.
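(These lines accumulate the gradients of the loss with respect to the gate weight matrices and biases: for a pre-activation y = W . xc + b, dL/dW is the outer product of dL/dy, here di_input and friends, with the concatenated input xc, and dL/db is dL/dy itself. A small shape check with made-up sizes, just to illustrate:)

import numpy as np

mem_cell_ct, x_dim = 3, 2                    # hypothetical sizes
xc = np.random.random(x_dim + mem_cell_ct)   # concatenated [x(t), h(t-1)], length 5
di_input = np.random.random(mem_cell_ct)     # dL/d(pre-activation of the input gate)

wi_diff = np.outer(di_input, xc)             # shape (3, 5), same shape as wi
bi_diff = di_input                           # shape (3,), same shape as bi
print(wi_diff.shape, bi_diff.shape)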

can't converge

I tested it, but it can't converge to the target:

when: y_list = [-0.8333333333, 0.33333, 0.166666667, -0.8]
result: iter 9999: y_pred = [-0.76083, -0.00007, -0.00007, -0.79996], loss: 1.442e-01

I added a linear layer after h; now it can converge to any target:

import numpy as np

from lstm import LstmParam, LstmNetwork


class ToyLossLayer:
    def __init__(self, mem_cell_ct):
        # weights of a linear read-out layer applied to h
        self.v = np.zeros(mem_cell_ct)

    def value(self, pred):
        return self.v.dot(pred)

    def loss(self, pred, label):
        out = self.value(pred)
        return (out - label) ** 2

    def bottom_diff(self, pred, label):
        out = self.value(pred)
        df = 2 * (out - label) / self.v.shape[0]
        diff = df * self.v
        # also update the read-out weights
        self.v -= 0.1 * pred * df
        return diff


def example_0():
    # learns to repeat simple sequence from random inputs
    np.random.seed(0)

    # parameters for input data dimension and lstm cell count
    mem_cell_ct = 1000
    x_dim = 50
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)
    y_list = [-8.8, 80.2, 3.1, 8000.8]
   
    input_val_arr = [np.random.random(x_dim) for _ in y_list]
    loss=ToyLossLayer(mem_cell_ct)
    for cur_iter in range(10000):
        # print(y_list)
        print("iter", "%2s" % str(cur_iter), end=": ")
        for ind in range(len(y_list)):
            lstm_net.x_list_add(input_val_arr[ind])

        print("y_pred = [" +
              ", ".join(["% 2.5f" % loss.value(lstm_net.lstm_node_list[ind].state.h) for ind in range(len(y_list))]) +
              "]", end=", ")

        lossv = lstm_net.y_list_is(y_list, loss)
        print("loss:", "%.3e" % lossv)
        lstm_param.apply_diff(lr=0.1)
        lstm_net.x_list_clear()


if __name__ == "__main__":
    example_0()

when: y_list = [-8.8, 80.2, 3.1, 8000.8]
result: iter 9999: y_pred = [-8.72629, 80.11621, 3.70385, 8000.36304], loss: 5.988e-01

No tanh on state.s in 'bottom_data_is'

On line 95 of lstm.py, as far as I can see, you omit applying tanh() to the new cell state before multiplying it by the squashed o(t).

As stated in the last equation on page 20 of the article you mention in your readme, and in this excellent tutorial I found (https://colah.github.io/posts/2015-08-Understanding-LSTMs/), you have to apply tanh() to the new cell state before you multiply it with o(t). I don't see you doing that in your code, so unless this is corrected somewhere else that I failed to notice, it should be fixed.

Otherwise, this is an excellent resource, thanks a lot :)

You forgot the tanh function in the last computation in bottom_data_is()

Issue: lstm.py, line 98.

There is a problem with the line self.state.h = self.state.s * self.state.o: you forgot the tanh function. The formula is h_{t} = o_{t} * tanh(s_{t}), so the correct line of code is as follows.

self.state.h = tanh(self.state.s) * self.state.o

Pasted the partial lines of code as follows.

def bottom_data_is(self, x, s_prev = None, h_prev = None):
    # if this is the first lstm node in the network
    if s_prev is None: s_prev = np.zeros_like(self.state.s)
    if h_prev is None: h_prev = np.zeros_like(self.state.h)
    # save data for use in backprop
    self.s_prev = s_prev
    self.h_prev = h_prev

    # concatenate x(t) and h(t-1)
    xc = np.hstack((x,  h_prev))
    self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
    self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
    self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
    self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
    self.state.s = self.state.g * self.state.i + s_prev * self.state.f
    self.state.h = self.state.s * self.state.o
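(For reference, a self-contained sketch of how the forward line and the matching backward derivatives would change if the tanh were applied; this is an assumption about a possible patch, not code from the repo:)

import numpy as np

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

# made-up state for one node, just to compare the two variants
s = np.random.randn(4)           # cell state s(t)
o = sigmoid(np.random.randn(4))  # output gate o(t)
top_diff_h = np.random.randn(4)  # dL/dh(t) coming from above
top_diff_s = np.random.randn(4)  # dL/ds(t) carried back from t+1

# current code: h = s * o
h_now = s * o
ds_now = o * top_diff_h + top_diff_s

# with the tanh applied, as suggested: h = tanh(s) * o
h_tanh = np.tanh(s) * o
do_tanh = np.tanh(s) * top_diff_h
ds_tanh = o * (1. - np.tanh(s) ** 2) * top_diff_h + top_diff_s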

Each cell gets x_input

Hello. I have a question about the dimension of the input.

In case

mem_cell_ct = 2
x_dim = 1

Biases are not considered.

The input of the first cell should be: [x_value, 0]
The input of the second cell: [cell_1_output, 0]

But in your case:
The input of the first cell is: [x_value, 0, 0]
The input of the second cell: [x_value, cell_1_output, 0]

Is it correct that cells in hidden layers get the actual input?
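(In this implementation the cells are not chained one after another within a single time step: every gate of every cell sees the full concatenated vector [x(t), h(t-1)], built by xc = np.hstack((x, h_prev)) in bottom_data_is. A small shape check with the sizes from the question, as a sketch:)

import numpy as np

mem_cell_ct = 2
x_dim = 1

x = np.array([0.5])              # x(t), the actual input at this time step
h_prev = np.zeros(mem_cell_ct)   # h(t-1), zeros for the first node

xc = np.hstack((x, h_prev))      # [0.5, 0., 0.], length x_dim + mem_cell_ct = 3
wg = np.random.random((mem_cell_ct, x_dim + mem_cell_ct))  # each gate matrix is (2, 3)
print(xc, np.dot(wg, xc).shape)  # both cells receive x(t) and both components of h(t-1)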

Error in using 2 outputs

This implementation is for 50 inputs and 1 output. I'm trying to use it with 2 outputs, and it gives the following error.

File "test.py", line 43, in example_0
loss = lstm_net.y_list_is(y_list, ToyLossLayer)
File "lstm.py", line 155, in y_list_is
diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
File "test.py", line 17, in bottom_diff
diff[0] = 2 * (pred[0] - label)
ValueError: setting an array element with a sequence.
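(The traceback comes from the toy loss layer in test.py, which writes a whole label sequence into the single slot diff[0]. One possible workaround, sketched here and essentially the same idea as the loss layer in the "y_list dimension" issue further down, is to read as many hidden units as there are outputs:)

import numpy as np

class TwoOutputLossLayer:
    # hypothetical variant of ToyLossLayer where each label is a length-2 sequence
    @classmethod
    def loss(cls, pred, label):
        label = np.asarray(label)
        return np.sum((pred[:len(label)] - label) ** 2)

    @classmethod
    def bottom_diff(cls, pred, label):
        label = np.asarray(label)
        diff = np.zeros_like(pred)
        diff[:len(label)] = 2 * (pred[:len(label)] - label)
        return diff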

Ported to Julia

Hi,
I implemented your code (with some changes) in Julia here if you'd like to include it in the README along with the D port.

question

Hi, would you please explain what bottom and top mean in your code? Thank you very much.

What does "M" really mean?

I am very interested in this LSTM project, but I could not understand the "M" in your blog very well. In the blog you say that "M" is the total number of memory cells, but I think "M" equals the number of time steps. I do note that there are LSTM networks with multiple memory cells, but I think the LSTM network you discuss is comprised of only one memory cell.

Usage of the LSTM

Hi,
I'm very new to LSTMs and I'm quite confused about how to make use of them.
What do the random input values mean here, when you're teaching it the small y_list sequence?
I ported the code to C++, and I would basically like to teach it how to speak after reading some text at the character level (as discussed here). My purpose is to understand what my input values and y_list should be.
Thanks a lot.

about tanh(s)

hi, sorry to bother you again
I read this line in your code: self.state.h = self.state.s * self.state.o
but in the paper, it seems it should be:
self.state.h = np.tanh(self.state.s) * self.state.o
would you tell me which one is right?

Why is there no 0.5* in the equation for the derivative of state(t)?

Thanks for your great work on this blog; it's the clearest and simplest blog on LSTMs I have read.
In your blog, the derivative of s(t) is obtained from this equation:
[equation screenshot]
I wonder why the left side of this equation is equal to the right side; in my opinion, left = 0.5 * right. And why do we need top_diff_s if we could calculate
[equation screenshot] directly?
[equation screenshot]

I am new to NNs and LSTMs; your blog is the best and clearest about the implementation and derivation of LSTMs. Thanks for your great work.

I have a question about ds: how is ds computed?

def top_diff_is(self, top_diff_h, top_diff_s):
    # notice that top_diff_s is carried along the constant error carousel
    ds = self.state.o * top_diff_h + top_diff_s

This is your code above. I want to ask why top_diff_s is added; I don't understand your comment.

mem_cell_ct = 100 but only 4 LSTM nodes

Hi,
I was wondering how did you choose the number of memory cells?
And why and how did you choose 100 as a value for your parameter?

And finally, just to be clear: do you create 4 LSTM nodes in the code because of the 4 "input sequence / target value" pairs?

And what is the difference between a memory cell and an LSTM node? I've quickly read the paper that you recommend in your blog, but it's still not clear to me.

Thanks in advance for your answers.
Ick

Where does constant error carousel come from?

referencing the first line in your backward pass:

# notice that top_diff_s is carried along the constant error carousel
ds = self.state.o * top_diff_h + top_diff_s

Mathematically, where does the + top_diff_s come from? Is my guess accurate that it is purely a fudge factor to prevent the gradient from going to zero (hence preventing the vanishing gradient)? Or is there more math behind it that I'm overlooking?

Thanks again for your clarification!
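(A sketch of where the extra term comes from, assuming this repo's forward pass where h(t) = o(t) * s(t) and s(t+1) = g(t+1) * i(t+1) + f(t+1) * s(t): the cell state s(t) affects the loss both through h(t) and through s(t+1), so the chain rule gives two terms,

dL/ds(t) = o(t) * dL/dh(t) + f(t+1) * dL/ds(t+1)

top_diff_s is the second term, handed back from node t+1; if I read the code correctly, it is what the next node stores as bottom_diff_s = ds * self.state.f. So it is not a fudge factor: it is the part of the gradient that flows along the cell state, i.e. the constant error carousel.)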

new dataset

If I want to run the code on a new dataset, and the data range is not in [-1, 1], what should I do?
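(One common approach, sketched here as an assumption rather than something the repo provides, is to rescale the targets into [-1, 1] before training and map predictions back afterwards; the example values below are taken from the earlier "can't converge" issue:)

import numpy as np

y = np.array([-8.8, 80.2, 3.1, 8000.8])           # targets in their original range
y_min, y_max = y.min(), y.max()
y_scaled = 2 * (y - y_min) / (y_max - y_min) - 1  # now in [-1, 1]

def unscale(pred):
    # map a prediction back to the original range
    return (pred + 1) / 2 * (y_max - y_min) + y_min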

loss compute

@classmethod
def loss(self, pred, label):
    return (pred[0] - label) ** 2

@classmethod
def bottom_diff(self, pred, label):
    diff = np.zeros_like(pred)
    diff[0] = 2 * (pred[0] - label)
    return diff

Why use pred[0] to compute the loss and bottom_diff?

y_list dimension

Conceptually, why don't y_list and input_val_arr have the same dimensions? Is your toy example intended as a multi-input, single-output network?

Changing x_dim = 1 had poor convergence results (a feeble attempt to make y_list and input_val_arr look more familiar by giving them equal size). However, replicating y_list, so that for each x there is an associated y (single-input, single-output), converged adequately and looked more like the common examples I've seen around the web:

yy = [-0.5,0.2,0.1, -0.5]
y_list = [[y for _ in xrange(x_dim)] for y in yy]
input_val_arr = [np.random.random(x_dim) for _ in yy]

and then I made very minimal changes to the ToyLossLayer:

@classmethod
def loss(self, pred, label):
    return np.sum((pred[:len(label)] - label) ** 2)

@classmethod
def bottom_diff(self, pred, label):
    diff = np.zeros_like(pred)
    diff[:len(label)] = 2 * (pred[:len(label)] - label)
    return diff

I'm very new to NNs and so far have only seen single-input, single-output cases where x and y are equally sized vectors (for example, Karpathy's word prediction RNN: https://gist.github.com/karpathy/d4dee566867f8291f086). I thought to check with you on your intention for example_0() to make sure I'm understanding your work fully. Am I close?

Thanks! This is the easiest-to-read code I've found on LSTMs.

Execution problem

Hi @nicodjimenez, I would like to work with your code, but while executing the test.py file I get: global name 'ToyLossLayer' is not defined.

Kindly explain what ToyLossLayer is and why we pass it to y_list_is() in lstm.py.
I also don't understand what "end" is in the print statement.

print("iter", "%2s" % str(cur_iter), end=": ")

Swift port

Hello,

I made a Swift port over here: https://github.com/emilianbold/swift-lstm

Thanks for the code! I hope to use it for some practical application soon.

PS: The repository has no license, so is it safe to assume some sort of public-domain license? I want to be able to use my port and tweaks...

lstm.py, line 97

On line 97 of lstm.py,
self.state.s = self.state.g * self.state.i + s_prev * self.state.f
should be
self.state.s = np.tanh(self.state.g * self.state.i + s_prev * self.state.f)
