Comments (4)

ragulpr commented on May 20, 2024

Hi there,
Thanks for reaching out! I need to be clearer about this: I haven't had time to join the two scripts together yet. I'll get back to you ASAP with an updated answer, but for now:
init_alpha: -785.866918162 is an error (init_alpha must be > 0).

Note that for large magnitudes of alpha, the mean of tte is approximately the same as the more complex log-based estimate.
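
As a minimal sketch of that relationship (the function name and exact formula here are assumptions, not necessarily what the wtte-rnn examples use), a geometric-style log estimate like the following is positive by construction and approaches the plain mean of tte as it grows:

import numpy as np

# Illustrative sketch only; the name and formula are assumptions, not library code.
def estimate_init_alpha(tte):
    tte_mean = np.nanmean(tte)
    # Log-based (geometric) estimate; for large tte_mean this is roughly tte_mean itself.
    init_alpha = -1.0 / np.log(1.0 - 1.0 / (tte_mean + 1.0))
    assert init_alpha > 0, "init_alpha must be positive; check tte for NaNs or negative values"
    return init_alpha

A negative value like -785.87 therefore points to something wrong upstream (e.g. NaNs or negative tte values feeding the estimate), not to a usable initialization.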

Furthermore

  • Initialization is important. Gradients explode if you're too far off, and more censored data leads to a higher probability of exploding gradients initially.
  • The learning rate depends on the data and may need to be of a magnitude you didn't expect.
  • Are you feeding in masked steps? Variable-length sequences have no clean implementation at the moment; I haven't had time to get the masking layer to work. The current workaround is to set n_timesteps = None and run a training step with one input sequence at a time, with something like:

NB: NOT TESTED:

def epoch():
    # Train on one (unpadded) sequence at a time; requires n_timesteps=None in the model.
    for i in range(n_samples):
        model.fit(x_train[i:i + 1, :seq_length[i], :],   # keep the batch dimension
                  y_train[i:i + 1, :seq_length[i], :],
                  epochs=1,
                  batch_size=1,
                  verbose=2)

An even better initial debugging approach is to simply reshape the data to [n_non_masked_samples, 1, n_features] (feeding in only the observed timesteps), fit a simple ANN on that, and only move to the RNN once it works; a sketch is below.
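
A minimal sketch of that reshaping, assuming the padded arrays mark masked timesteps with NaN (variable and function names here are illustrative):

import numpy as np

# x_padded, y_padded: [n_samples, n_timesteps, n_features/targets], NaN where padded (assumption).
def flatten_observed_timesteps(x_padded, y_padded):
    mask = ~np.isnan(x_padded).any(axis=-1)        # True where the timestep holds real data
    x_flat = x_padded[mask][:, np.newaxis, :]      # [n_non_masked_samples, 1, n_features]
    y_flat = y_padded[mask][:, np.newaxis, :]
    return x_flat, y_flat

Fit a simple non-recurrent model on x_flat/y_flat first, then switch to the RNN once that behaves.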

Would love to see forks!

ragulpr commented on May 20, 2024

There are multiple reasons for NaNs to show up, but I just found a very important one:

shift_discrete_padded_features, which is supposed to hide the target, is currently broken and apparently doesn't. This means that if the "event" indicator is part of the input, the network can make a perfect prediction, causing an exploding gradient.

I'm trying to fix it ASAP.
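
To illustrate what the shift is meant to achieve (a sketch of the intended behaviour, not the actual shift_discrete_padded_features implementation): the network should only see features that were available before the timestep it is predicting.

import numpy as np

# Sketch only; not the library's code.
def shift_features_one_step(x_padded, fill_value=0.0):
    # x_padded: [n_samples, n_timesteps, n_features]
    x_shifted = np.roll(x_padded, shift=1, axis=1)  # push every feature one step forward in time
    x_shifted[:, 0, :] = fill_value                 # nothing is known before the first timestep
    return x_shifted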

NataliaVConnolly commented on May 20, 2024

Hi Egil,

Thanks for the update! Here's a fork with the notebook Combined_data_pipeline_and_analysis in examples/keras.

  https://github.com/NataliaVConnolly/wtte-rnn-1

The last cell shows an example of training with just one input sequence. It does result in a non-NaN loss, although a very large one (but I didn't optimize the initial alpha or the network config much).

Cheers,
Natalia (aka hedgy123 :))

ragulpr commented on May 20, 2024

@NataliaVConnolly Sorry for the wait. It took me some time to figure out what was wrong!

  • Too much censoring leads to instability. It works when using more frequent committers (<50% censoring); in the example I only use those who committed on at least 10 days.
  • You train on one subject but initialize alpha using the mean over all subjects. This leads to a high probability of exploding gradients (see the sketch after this list).
  • As mentioned above, if this was done before the fix of shift_discrete_padded_features, that would also lead to NaN (a perfect fit) after some training.
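
A rough sketch of the first two points, assuming targets of shape [n_subjects, n_timesteps, 2] with tte in column 0 and an uncensored indicator (1 = event observed) in column 1; the names and the exact init_alpha formula are illustrative, not taken from the repo:

import numpy as np

def select_training_subjects(x, y, max_censoring=0.5):
    # Keep only subjects whose sequences are less than max_censoring censored
    # (assumption: column 1 of y is the uncensored indicator).
    censored_frac = 1.0 - np.nanmean(y[:, :, 1], axis=1)
    keep = censored_frac < max_censoring
    return x[keep], y[keep]

def init_alpha_for(y_subset):
    # Compute init_alpha from the subjects actually being trained on, not the whole population.
    tte_mean = np.nanmean(y_subset[:, :, 0])
    return -1.0 / np.log(1.0 - 1.0 / (tte_mean + 1.0))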

Check out the new data_pipeline and let me know if you have more questions! :)
