
lstm's People

Contributors

alfiuman avatar bbtarzan12 avatar carlosascari avatar kurnianggoro avatar naught101 avatar nicodjimenez avatar scottmackay2 avatar

lstm's Issues

Is the backpropagation part needed?

I think that, given the definition of the network and the loss function, backpropagation should be computed automatically. Why do we have to define it explicitly in the code?
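(For context: this repo writes the backward pass by hand in NumPy, so the gradients have to be derived and coded explicitly. Frameworks with automatic differentiation derive them for you. A minimal sketch using PyTorch, which is an assumption here since this repo does not use any such framework:)

import torch

# one trainable parameter vector and a squared-error loss
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)
loss = (w @ x - 1.0) ** 2

loss.backward()   # autograd computes dloss/dw; no hand-written backward pass needed
print(w.grad)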

action recognition

Hi,
I am new to LSTMs, and I want to know whether your code can be applied to action recognition on skeleton data. Also, which deep learning framework (Caffe, Theano, TensorFlow) could I use?
Thanks

I don't understand wi_diff, wf_diff, etc.

# diffs w.r.t. inputs

self.param.wi_diff += np.outer(di_input, self.xc)
self.param.wf_diff += np.outer(df_input, self.xc)
self.param.wo_diff += np.outer(do_input, self.xc)
self.param.wg_diff += np.outer(dg_input, self.xc)
self.param.bi_diff += di_input
self.param.bf_diff += df_input       
self.param.bo_diff += do_input
self.param.bg_diff += dg_input       

I am not clear about what this code is doing.
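(These lines accumulate the gradients of the loss with respect to the gate weight matrices and biases: for a pre-activation y = W . xc + b, dL/dW is the outer product of dL/dy, here di_input and friends, with the concatenated input xc, and dL/db is dL/dy itself. A small shape check with made-up sizes, just to illustrate:)

import numpy as np

mem_cell_ct, x_dim = 3, 2                    # hypothetical sizes
xc = np.random.random(x_dim + mem_cell_ct)   # concatenated [x(t), h(t-1)], length 5
di_input = np.random.random(mem_cell_ct)     # dL/d(pre-activation of the input gate)

wi_diff = np.outer(di_input, xc)             # shape (3, 5), same shape as wi
bi_diff = di_input                           # shape (3,), same shape as bi
print(wi_diff.shape, bi_diff.shape)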

can't converge

I tested it, but it can't converge to the target:

when: y_list = [-0.8333333333, 0.33333, 0.166666667, -0.8]
result: iter 9999: y_pred = [-0.76083, -0.00007, -0.00007, -0.79996], loss: 1.442e-01

I added a linear layer after h; now it can converge to any target:

import numpy as np

from lstm import LstmParam, LstmNetwork


class ToyLossLayer:
    def __init__(self, mem_cell_ct):
        # weights of a linear read-out layer applied to h
        self.v = np.zeros(mem_cell_ct)

    def value(self, pred):
        return self.v.dot(pred)

    def loss(self, pred, label):
        out = self.value(pred)
        return (out - label) ** 2

    def bottom_diff(self, pred, label):
        out = self.value(pred)
        df = 2 * (out - label) / self.v.shape[0]
        diff = df * self.v
        # also update the read-out weights
        self.v -= 0.1 * pred * df
        return diff


def example_0():
    # learns to repeat simple sequence from random inputs
    np.random.seed(0)

    # parameters for input data dimension and lstm cell count
    mem_cell_ct = 1000
    x_dim = 50
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)
    y_list = [-8.8, 80.2, 3.1, 8000.8]
   
    input_val_arr = [np.random.random(x_dim) for _ in y_list]
    loss=ToyLossLayer(mem_cell_ct)
    for cur_iter in range(10000):
        # print(y_list)
        print("iter", "%2s" % str(cur_iter), end=": ")
        for ind in range(len(y_list)):
            lstm_net.x_list_add(input_val_arr[ind])

        print("y_pred = [" +
              ", ".join(["% 2.5f" % loss.value(lstm_net.lstm_node_list[ind].state.h) for ind in range(len(y_list))]) +
              "]", end=", ")

        lossv = lstm_net.y_list_is(y_list, loss)
        print("loss:", "%.3e" % lossv)
        lstm_param.apply_diff(lr=0.1)
        lstm_net.x_list_clear()


if __name__ == "__main__":
    example_0()

when: y_list = [-8.8, 80.2, 3.1, 8000.8]
result: iter 9999: y_pred = [-8.72629, 80.11621, 3.70385, 8000.36304], loss: 5.988e-01

No tanh on state.s in 'bottom_data_is'

On line 95 of lstm.py, as far as I can see, you omit applying tanh() to the new cell state before multiplying it by the squashed o(t).

As stated in the last equation on page 20 of the article you mention in your readme, and in this excellent tutorial I found (https://colah.github.io/posts/2015-08-Understanding-LSTMs/), you have to apply tanh() to the new cell state before you multiply it with o(t). I don't see you doing that in your code, so unless this is corrected somewhere else that I failed to notice, it should be fixed.

Otherwise, this is an excellent resource, thanks a lot :)

You forgot the tanh function in the last computation in bottom_data_is()

Issue: lstm.py, line 98.

There is a problem with the line self.state.h = self.state.s * self.state.o: you forgot the tanh function. The formula is h_{t} = o_{t} * tanh(s_{t}), so the correct line of code is as follows.

self.state.h = tanh(self.state.s) * self.state.o

Pasted the partial lines of code as follows.

def bottom_data_is(self, x, s_prev = None, h_prev = None):
    # if this is the first lstm node in the network
    if s_prev is None: s_prev = np.zeros_like(self.state.s)
    if h_prev is None: h_prev = np.zeros_like(self.state.h)
    # save data for use in backprop
    self.s_prev = s_prev
    self.h_prev = h_prev

    # concatenate x(t) and h(t-1)
    xc = np.hstack((x,  h_prev))
    self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
    self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
    self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
    self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
    self.state.s = self.state.g * self.state.i + s_prev * self.state.f
    self.state.h = self.state.s * self.state.o
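(For reference, a self-contained sketch of how the forward line and the matching backward derivatives would change if the tanh were applied; this is an assumption about a possible patch, not code from the repo:)

import numpy as np

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

# made-up state for one node, just to compare the two variants
s = np.random.randn(4)           # cell state s(t)
o = sigmoid(np.random.randn(4))  # output gate o(t)
top_diff_h = np.random.randn(4)  # dL/dh(t) coming from above
top_diff_s = np.random.randn(4)  # dL/ds(t) carried back from t+1

# current code: h = s * o
h_now = s * o
ds_now = o * top_diff_h + top_diff_s

# with the tanh applied, as suggested: h = tanh(s) * o
h_tanh = np.tanh(s) * o
do_tanh = np.tanh(s) * top_diff_h
ds_tanh = o * (1. - np.tanh(s) ** 2) * top_diff_h + top_diff_s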

Each cell gets x_input

Hello. I have a question about the dimension of the input.

In case

mem_cell_ct = 2
x_dim = 1

Biases are not considered.

The input of the first cell should be: [x_value, 0]
The input of the second cell: [cell_1_output, 0]

But in your case:
The input of the first cell is: [x_value, 0, 0]
The input of the second cell: [x_value, cell_1_output, 0]

Is it correct that cells in hidden layers get the actual input?
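(In this implementation the cells are not chained one after another within a single time step: every gate of every cell sees the full concatenated vector [x(t), h(t-1)], built by xc = np.hstack((x, h_prev)) in bottom_data_is. A small shape check with the sizes from the question, as a sketch:)

import numpy as np

mem_cell_ct = 2
x_dim = 1

x = np.array([0.5])              # x(t), the actual input at this time step
h_prev = np.zeros(mem_cell_ct)   # h(t-1), zeros for the first node

xc = np.hstack((x, h_prev))      # [0.5, 0., 0.], length x_dim + mem_cell_ct = 3
wg = np.random.random((mem_cell_ct, x_dim + mem_cell_ct))  # each gate matrix is (2, 3)
print(xc, np.dot(wg, xc).shape)  # both cells receive x(t) and both components of h(t-1)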

Error in using 2 outputs

This implementation is for 50 inputs and 1 output. I'm trying to use it with 2 outputs, and it gives the following error.

File "test.py", line 43, in example_0
loss = lstm_net.y_list_is(y_list, ToyLossLayer)
File "lstm.py", line 155, in y_list_is
diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
File "test.py", line 17, in bottom_diff
diff[0] = 2 * (pred[0] - label)
ValueError: setting an array element with a sequence.
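(The traceback comes from the toy loss layer in test.py, which writes a whole label sequence into the single slot diff[0]. One possible workaround, sketched here and essentially the same idea as the loss layer in the "y_list dimension" issue further down, is to read as many hidden units as there are outputs:)

import numpy as np

class TwoOutputLossLayer:
    # hypothetical variant of ToyLossLayer where each label is a length-2 sequence
    @classmethod
    def loss(cls, pred, label):
        label = np.asarray(label)
        return np.sum((pred[:len(label)] - label) ** 2)

    @classmethod
    def bottom_diff(cls, pred, label):
        label = np.asarray(label)
        diff = np.zeros_like(pred)
        diff[:len(label)] = 2 * (pred[:len(label)] - label)
        return diff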

Ported to Julia

Hi,
I implemented your code (with some changes) in Julia here if you'd like to include it in the README along with the D port.

question

Hi, would you please explain what bottom and top mean in your code? Thank you very much.

What does "M" really mean?

I am very interested in this LSTM project, but I could not understand the "M" in your blog very well. In the blog you say that "M" is the total number of memory cells, but I think "M" equals the number of time steps. I do note that there are LSTM networks with multiple memory cells, but I think the LSTM network you discuss is comprised of only one memory cell.

Usage of the LSTM

Hi,
I'm very new to LSTMs and I'm quite confused about how to make use of them.
What do the random input values mean here, when you're teaching it the small y_list sequence?
I ported the code to C++, and I would basically like to teach it how to speak after reading some text at the character level (as discussed here). My purpose is to understand what my input values and y_list should be.
Thanks a lot.

about tanh(s)

hi, sorry to bother you again
I read this line in your code: self.state.h = self.state.s * self.state.o
but in the paper, it seems it should be:
self.state.h = np.tanh(self.state.s) * self.state.o
would you tell me which one is right?

Why is there no 0.5* in the equation for the derivative of state(t)?

Thanks for your great work on this blog; it's the clearest and simplest blog on LSTMs I have read.
In your blog, the derivative of s(t) is obtained from this equation:
[equation screenshot]
I wonder why the left side of this equation is equal to the right side; in my opinion, left = 0.5 * right. And why do we need top_diff_s if we could calculate
[equation screenshot] directly?
[equation screenshot]

I am new to NNs and LSTMs; your blog is the best and clearest about the implementation and derivation of LSTMs. Thanks for your great work.

I have a question about ds: how is ds computed?

def top_diff_is(self, top_diff_h, top_diff_s):
    # notice that top_diff_s is carried along the constant error carousel
    ds = self.state.o * top_diff_h + top_diff_s

This is your code above. I want to ask why top_diff_s is added; I don't understand your comment.

mem_cell_ct = 100 but only 4 LSTM nodes

Hi,
I was wondering how did you choose the number of memory cells?
And why and how did you choose 100 as a value for your parameter?

And finally, just to be clear: do you create 4 LSTM nodes in the code because of the 4 "input sequence / target value" pairs?

And what is the difference between a memory cell and an LSTM node? I've quickly read the paper that you recommend in your blog, but it's still not clear to me.

Thanks in advance for your answers.
Ick

Where does constant error carousel come from?

referencing the first line in your backward pass:

# notice that top_diff_s is carried along the constant error carousel
ds = self.state.o * top_diff_h + top_diff_s

Mathematically, where does the + top_diff_s come from? Is my guess accurate that it is purely a fudge factor to prevent the gradient from going to zero (hence preventing the vanishing gradient)? Or is there more math behind it that I'm overlooking?

Thanks again for your clarification!
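(A sketch of where the extra term comes from, assuming this repo's forward pass where h(t) = o(t) * s(t) and s(t+1) = g(t+1) * i(t+1) + f(t+1) * s(t): the cell state s(t) affects the loss both through h(t) and through s(t+1), so the chain rule gives two terms,

dL/ds(t) = o(t) * dL/dh(t) + f(t+1) * dL/ds(t+1)

top_diff_s is the second term, handed back from node t+1; if I read the code correctly, it is what the next node stores as bottom_diff_s = ds * self.state.f. So it is not a fudge factor: it is the part of the gradient that flows along the cell state, i.e. the constant error carousel.)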

new dataset

If I want to run the code on a new dataset, and the data range is not in [-1, 1], what should I do?
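(One common approach, sketched here as an assumption rather than something the repo provides, is to rescale the targets into [-1, 1] before training and map predictions back afterwards; the example values below are taken from the earlier "can't converge" issue:)

import numpy as np

y = np.array([-8.8, 80.2, 3.1, 8000.8])           # targets in their original range
y_min, y_max = y.min(), y.max()
y_scaled = 2 * (y - y_min) / (y_max - y_min) - 1  # now in [-1, 1]

def unscale(pred):
    # map a prediction back to the original range
    return (pred + 1) / 2 * (y_max - y_min) + y_min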

loss compute

@classmethod
def loss(self, pred, label):
    return (pred[0] - label) ** 2

@classmethod
def bottom_diff(self, pred, label):
    diff = np.zeros_like(pred)
    diff[0] = 2 * (pred[0] - label)
    return diff

Why use pred[0] to compute the loss and bottom_diff?

y_list dimension

Conceptually, why don't y_list and input_val_arr have the same dimensions? Is your toy example intended as a multi-input, single-output network?

Changing x_dim = 1 had poor convergence results (a feeble attempt to make y_list and input_val_arr look more familiar by giving them equal size). However, replicating y_list, so that for each x there is an associated y (single-input, single-output), converged adequately and looked more like the common examples I've seen around the web:

yy = [-0.5,0.2,0.1, -0.5]
y_list = [[y for _ in xrange(x_dim)] for y in yy]
input_val_arr = [np.random.random(x_dim) for _ in yy]

and then I made very minimal changes to the ToyLossLayer:

@classmethod
def loss(self, pred, label):
    return np.sum((pred[:len(label)] - label) ** 2)

@classmethod
def bottom_diff(self, pred, label):
    diff = np.zeros_like(pred)
    diff[:len(label)] = 2 * (pred[:len(label)] - label)
    return diff

I'm very new to NNs and so far have only seen single-input, single-output cases where x and y are equally sized vectors (for example, Karpathy's word prediction RNN: https://gist.github.com/karpathy/d4dee566867f8291f086). I thought to check with you on your intention for example_0() to make sure I'm understanding your work fully. Am I close?

Thanks! This is the easiest-to-read code I've found on LSTMs.

Execution problem

Hi @nicodjimenez, I would like to work with your code, but while executing the test.py file I get: global name 'ToyLossLayer' is not defined.

Kindly explain what ToyLossLayer is and why we pass it to y_list_is() in lstm.py.
I also don't understand what "end" is in the print statement.

print("iter", "%2s" % str(cur_iter), end=": ")

Swift port

Hello,

I made a Swift port over here: https://github.com/emilianbold/swift-lstm

Thanks for the code! I hope to use it for some practical application soon.

PS: The repository has no license, so is it safe to assume some sort of public-domain license? I want to be able to use my port and tweaks...

lstm.py, line 97

On line 97 of lstm.py,
self.state.s = self.state.g * self.state.i + s_prev * self.state.f
should be
self.state.s = np.tanh(self.state.g * self.state.i + s_prev * self.state.f)
