
rnn-tutorial-gru-lstm's People

Contributors

dennybritz

rnn-tutorial-gru-lstm's Issues

about s_t1

I think s_t1 should instead be computed as follows:
s_t1 = (T.ones_like(z_t1) - z_t1) * s_t1_prev + z_t1 * c_t1
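
For context, both update conventions appear in the literature; they differ only in whether z_t1 gates the previous state or the candidate, and the learned parameters adapt either way. A minimal NumPy sketch (toy dimensions, plain sigmoid instead of hard_sigmoid) contrasting the tutorial's line with the one proposed above:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One GRU step with toy sizes: hidden size H, embedding size E.
H, E = 4, 3
rng = np.random.RandomState(0)
U = rng.randn(3, H, E) * 0.1   # input weights for the z, r, c gates
W = rng.randn(3, H, H) * 0.1   # recurrent weights for the z, r, c gates
b = np.zeros((3, H))
x_e = rng.randn(E)
s_prev = rng.randn(H)

z = sigmoid(U[0].dot(x_e) + W[0].dot(s_prev) + b[0])
r = sigmoid(U[1].dot(x_e) + W[1].dot(s_prev) + b[1])
c = np.tanh(U[2].dot(x_e) + W[2].dot(s_prev * r) + b[2])

s_tutorial = (1.0 - z) * c + z * s_prev   # convention used in gru_theano.py
s_proposed = (1.0 - z) * s_prev + z * c   # convention suggested in this issue

# The two are equivalent up to relabeling z as (1 - z); neither is wrong.
print(s_tutorial, s_proposed)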

functools32 error

The file "requirements.txt" must be changed in the line 6

Now: functools32==3.2.3.post2
Later: functools32==3.2.3-2

Readme Error

The following line must be corrected:

source venv/bin/active

should read:

source venv/bin/activate

Update your RNN tutorial part 4

Hi Denny,

You have written an impressive tutorial about RNNs. I am wondering when you will publish part 4 of the RNN tutorial on your blog.

Best,
Siqin

Comment Scoring

Can this be used to score a comment as well, e.g., to get the probability of the comment under the language model?
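
A minimal sketch of one way to do this, assuming (as in the generate_sentence code quoted in another issue below) that model.predict(sentence) returns one softmax distribution over the vocabulary per input position; summing the log-probabilities of the observed next words gives a sentence score:

import numpy as np

def score_sentence(model, sentence_indices):
    # Return the log-probability of a tokenized sentence under the model.
    # Assumes model.predict(indices) yields one probability distribution over
    # the vocabulary per position, where position t predicts the word at t+1.
    probs = model.predict(sentence_indices)
    log_prob = 0.0
    for t in range(len(sentence_indices) - 1):
        next_word = sentence_indices[t + 1]
        log_prob += np.log(probs[t][next_word] + 1e-12)  # epsilon guards against log(0)
    return log_prob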

about batch size of the sgd algorithm

Firstly, thank you very much! Your blog has helped so many people learn about RNNs.
I have some questions about the batch-size parameter. I am new to deep learning, so please forgive me if my question looks stupid.
I have learned that when we use the SGD algorithm to optimize the loss function of a CNN, we always give SGD a batch size, but I never use a batch size of 1; I think one is too small.
When the batch size equals 1, the equation below does not seem right. (The screenshot is from the online book Neural Networks and Deep Learning, http://neuralnetworksanddeeplearning.com/chap1.html.)

[screenshot of the book's mini-batch gradient estimate, in which the true gradient is approximated by averaging the per-example gradients over a mini-batch of size m]

But I have read your blog and GitHub code, and I found that in both the RNN and the LSTM you use a batch size of 1. So my first question is: why do you use a batch size of 1?
I also found that your code does not support changing the batch size of the SGD algorithm. I am trying to modify your code to support a configurable batch size (see the sketch below). Or do you think it is necessary to modify it?
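
A minimal NumPy sketch of the averaging step in question, with a hypothetical grad_fn standing in for the model's gradient computation (this is not the tutorial's code, just an illustration of mini-batch vs. batch-size-1 SGD):

import numpy as np

def sgd_epoch(params, X, y, grad_fn, learning_rate=0.01, batch_size=1):
    # One epoch of (mini-batch) SGD over NumPy arrays X and y.
    # grad_fn(params, X_batch, y_batch) is assumed to return the gradient of
    # the loss averaged over the batch; with batch_size=1 this reduces to the
    # pure per-example SGD used in the tutorial.
    n = len(X)
    order = np.random.permutation(n)
    for start in range(0, n, batch_size):
        batch = order[start:start + batch_size]
        grad = grad_fn(params, X[batch], y[batch])
        params = params - learning_rate * grad
    return params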

ValueError: sum(pvals[:-1]) > 1.0

Hi, I am following the tutorial by Denny Britz here, and I ran into a problem with np.random.multinomial(1, next_word_probs).

Here is my related code:

def generate_sentence(model, index_to_word, word_to_index, min_length=5):
    # We start the sentence with the start token plus a fixed seed prefix
    new_sentence = [word_to_index[SENTENCE_START_TOKEN], word_to_index["white"], word_to_index["plane"]]
    # Repeat until we get an end token
    while not new_sentence[-1] == word_to_index[SENTENCE_END_TOKEN]:
        next_word_probs = model.predict(new_sentence)[-1]
        samples = np.random.multinomial(1, next_word_probs)
        sampled_word = np.argmax(samples)
        new_sentence.append(sampled_word)
        # Sometimes we get stuck if the sentence becomes too long, e.g. "........" :(
        # And: we don't want sentences with UNKNOWN_TOKEN's
        if len(new_sentence) > 100 or sampled_word == word_to_index[UNKNOWN_TOKEN]:
            return None
    if len(new_sentence) < min_length:
        return None
    return new_sentence

And here is the error message:

Traceback (most recent call last):

  File "<ipython-input-13-5282294d5250>", line 1, in <module>
    runfile('C:/Users/cerdas/Documents/bil/lat/rnn-tutorial-gru-lstm-master/train.py', wdir='C:/Users/cerdas/Documents/bil/lat/rnn-tutorial-gru-lstm-master')

  File "C:\Users\cerdas\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\cerdas\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/cerdas/Documents/bil/lat/rnn-tutorial-gru-lstm-master/train.py", line 53, in <module>
    generate_sentences(model, 10, index_to_word, word_to_index)

  File "C:\Users\cerdas\Documents\bil\lat\rnn-tutorial-gru-lstm-master\utils.py", line 189, in generate_sentences
    sent = generate_sentence(model, index_to_word, word_to_index)

  File "C:\Users\cerdas\Documents\bil\lat\rnn-tutorial-gru-lstm-master\utils.py", line 166, in generate_sentence
    samples = np.random.multinomial(1, next_word_probs)

  File "mtrand.pyx", line 4630, in mtrand.RandomState.multinomial

ValueError: sum(pvals[:-1]) > 1.0

I have searched for similar issues, and I suspect the problem is with np.random.multinomial.

I found a likely explanation in this answer:

The root of this problem arises from NumPy's implicit data casting: the output of my softmax() is of float32 type; however, numpy.random.multinomial() implicitly casts the pvals to float64. This data-type casting can sometimes cause pvals.sum() to exceed 1.0 due to numerical rounding.

But I still have no idea how to solve the problem.
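
A common workaround, offered here as a sketch rather than as the tutorial author's fix, is to cast the probabilities to float64 yourself and renormalize before sampling, e.g. inside generate_sentence:

import numpy as np

next_word_probs = model.predict(new_sentence)[-1]
# Cast to float64 and renormalize so the values still sum to 1.0 after the
# precision change; this avoids the "sum(pvals[:-1]) > 1.0" ValueError.
next_word_probs = np.asarray(next_word_probs, dtype=np.float64)
next_word_probs = next_word_probs / next_word_probs.sum()
samples = np.random.multinomial(1, next_word_probs)
sampled_word = np.argmax(samples)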

two GRU layers?

Hi,

I have read your GRU code, https://github.com/dennybritz/rnn-tutorial-gru-lstm/blob/master/gru_theano.py, and I see that two GRU layers are stacked:
# GRU Layer 1
z_t1 = T.nnet.hard_sigmoid(U[0].dot(x_e) + W[0].dot(s_t1_prev) + b[0])
r_t1 = T.nnet.hard_sigmoid(U[1].dot(x_e) + W[1].dot(s_t1_prev) + b[1])
c_t1 = T.tanh(U[2].dot(x_e) + W[2].dot(s_t1_prev * r_t1) + b[2])
s_t1 = (T.ones_like(z_t1) - z_t1) * c_t1 + z_t1 * s_t1_prev

# GRU Layer 2
z_t2 = T.nnet.hard_sigmoid(U[3].dot(s_t1) + W[3].dot(s_t2_prev) + b[3])
r_t2 = T.nnet.hard_sigmoid(U[4].dot(s_t1) + W[4].dot(s_t2_prev) + b[4])
c_t2 = T.tanh(U[5].dot(s_t1) + W[5].dot(s_t2_prev * r_t2) + b[5])
s_t2 = (T.ones_like(z_t2) - z_t2) * c_t2 + z_t2 * s_t2_prev

# Final output calculation
# Theano's softmax returns a matrix with one row, we only need the row
o_t = T.nnet.softmax(V.dot(s_t2) + c)[0]

Can I set 'o_t = T.nnet.softmax(V.dot(s_t1) + c)[0]'?
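
For what it's worth, a minimal sketch of that single-layer variant (assuming, as in gru_theano.py, that both layers share the same hidden dimension and V maps the hidden state to the vocabulary, so V.dot(s_t1) is shape-compatible):

# Inside forward_prop_step: keep only GRU Layer 1 and read the output off s_t1.
z_t1 = T.nnet.hard_sigmoid(U[0].dot(x_e) + W[0].dot(s_t1_prev) + b[0])
r_t1 = T.nnet.hard_sigmoid(U[1].dot(x_e) + W[1].dot(s_t1_prev) + b[1])
c_t1 = T.tanh(U[2].dot(x_e) + W[2].dot(s_t1_prev * r_t1) + b[2])
s_t1 = (T.ones_like(z_t1) - z_t1) * c_t1 + z_t1 * s_t1_prev

# The second layer is then simply unused (or can be removed along with
# U[3:6], W[3:6], and b[3:6]).
o_t = T.nnet.softmax(V.dot(s_t1) + c)[0]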

encoding error?

Hi dennybritz,

I tried to run "train.py" and couldn't pass the data file "reddit-comments-2015.csv" reading part in load_data. My python environment is WinPython-64bit-3.4.4.4Qt5.

At first, the error said that str doesn't have the decode attribute. If I removed the decode part, I got the message similar to the following line:
"UnicodeEncodeError: 'gbk' codec can't encode character '\udca0' in position 356: illegal multibyte sequence".

I could open the csv file in Notepad++ and see its encoding as 'utf-8'.

What did I do wrong? Is it because python 2/3 code incompatible with each other? How can I fix the problem?

Thanks.
chenmaosi
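
A minimal sketch of one possible fix for the Python 3 case described above, assuming the loading code opens the CSV in the default (locale) encoding and then calls .decode('utf-8'): open the file with an explicit UTF-8 encoding instead and drop the decode call (the path and column index here are illustrative):

import csv

# Python 3: read the CSV as UTF-8 text directly; errors='replace' is a
# pragmatic way to survive any stray undecodable bytes instead of raising.
with open("data/reddit-comments-2015.csv", "r", encoding="utf-8", errors="replace") as f:
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header row
    comments = [row[0] for row in reader]  # assumes the comment text is in the first column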
