
sru's People

Contributors

adrianbg, ajsyp, andfoy, cdfox-asapp, hpasapp, jeremyasapp, jweese-asapp, jwohlwend, kzjeef, ryanferg, taolei87, taoleicn, thesage21, yzhang87, zheolong


sru's Issues

Error when using SRU in DrQA

Hi,
When I use SRU in DrQA instead of LSTM, this error happens:

File "/home/hebian.ww/DrQA/drqa/reader/cuda_functional.py", line 359, in forward
  stream=SRU_STREAM
File "cupy/cuda/function.pyx", line 129, in cupy.cuda.function.Function.__call__ (cupy/cuda/function.cpp:3963)
File "cupy/cuda/function.pyx", line 111, in cupy.cuda.function._launch (cupy/cuda/function.cpp:3600)
File "cupy/cuda/driver.pyx", line 127, in cupy.cuda.driver.launchKernel (cupy/cuda/driver.cpp:2541)
File "cupy/cuda/driver.pyx", line 62, in cupy.cuda.driver.check_status (cupy/cuda/driver.cpp:1446)
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle

This is the same error as #4, but I get it with a single GPU.

About the k value in SRUCell: why can it be 4 when n_in != out_size?

Hi @taolei87 ,

I have a question about the weight matrix dimensions.
In the SRUCell code, I found k = 4 if n_in != out_size else 3,
but when I read the paper there are only 3 weight matrices: W, Wf, Wr.

I also found that n_in is not equal to out_size when the layer index is 0, but I don't understand why k = 4 in that case. What is the extra weight besides W, Wf, Wr?

Below is the init code:

class SRUCell(nn.Module):
    def __init__(self, n_in, n_out, dropout=0, vari_dropout=0,
                 use_tanh=1, bidirectional=False):
 ....
        out_size = n_out*2 if bidirectional else n_out
        k = 4 if n_in != out_size else 3
        self.size_per_dir = n_out*k
        self.weight = nn.Parameter(torch.Tensor(
            n_in,
            self.size_per_dir*2 if bidirectional else self.size_per_dir
        ))
  ...

Below is where n_in is not equal to out_size:

class SRU(nn.Module):
    def __init__(self, input_size, hidden_size,
                 num_layers=2, dropout=0, vari_dropout=0,
                 use_tanh=1, bidirectional=False):
     ...
    ...
        self.n_in = input_size
        self.n_out = hidden_size
       ...
        self.out_size = hidden_size*2 if bidirectional else hidden_size

        for i in range(num_layers):
            l = SRUCell(n_in=self.n_in if i == 0 else self.out_size,
                        n_out=self.n_out,
                        dropout=dropout if i+1 != num_layers else 0,
                        vari_dropout=vari_dropout,
                        use_tanh=use_tanh,
                        bidirectional=bidirectional)
            self.rnn_lst.append(l)
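
For reference, this is how I currently picture the shapes. This is only my own sketch (the slicing and the name W_extra are guesses, not code from the repo), so it may well be wrong:

import torch

n_in, n_out = 100, 50                  # first layer: n_in != out_size, so k = 4
k = 4 if n_in != n_out else 3

# one parameter of shape (n_in, n_out * k), as in SRUCell.__init__ (bidirectional=False)
weight = torch.randn(n_in, n_out * k)

# my guess at the slicing: W, W_f, W_r, plus one extra block whose role I'm asking about
W, W_f, W_r, W_extra = weight.split(n_out, dim=1)

x = torch.randn(n_in)
u = x @ weight                         # (n_out * k,): all projections computed at once
print(W.shape, W_f.shape, W_r.shape, W_extra.shape, u.shape)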

Thanks

How does SRU work for a decoder?

Would SRU provide any benefit for a decoder that decodes step by step based on the previous decoded output? It would no longer be parallelizable over time steps, so would the only latency saving come from the reduced number of operations per time step (compared to an LSTM)?
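
For what it's worth, this is the kind of loop I have in mind. It is only a sketch assuming the output, state = rnn(x, c0) interface shown elsewhere in this repo; I have not checked whether feeding one time step at a time (or passing None as the initial state) is actually supported:

import torch
from cuda_functional import SRU   # import used in the other examples here

decoder = SRU(input_size=256, hidden_size=256, num_layers=2).cuda()

x_t = torch.randn(1, 1, 256).cuda()    # (length=1, batch=1, input_size)
state = None                           # assuming None means a zero initial state
outputs = []
for _ in range(10):
    h_t, state = decoder(x_t, state)   # one step: h_t has shape (1, 1, hidden_size)
    outputs.append(h_t)
    x_t = h_t                          # placeholder for "embed the previous prediction"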

Weight and grad shapes mismatch

Hi!

I have some issues running the language_model example.

$ python3 train_lm.py --train train.txt --dev valid.txt --test test.txt 
Namespace(batch_size=32, bias=-3, clip_grad=5, d=910, depth=6, dev='valid.txt', dropout=0.7, lr=1.0, lr_decay=0.98, lr_decay_epoch=175, lstm=False, max_epoch=300, rnn_dropout=0.2, test='test.txt', train='train.txt', unroll_size=35, weight_decay=1e-05)

WARNING: set_bias() is deprecated. use `highway_bias` option in SRUCell() constructor.

WARNING: set_bias() is deprecated. use `highway_bias` option in SRUCell() constructor.

WARNING: set_bias() is deprecated. use `highway_bias` option in SRUCell() constructor.

WARNING: set_bias() is deprecated. use `highway_bias` option in SRUCell() constructor.

WARNING: set_bias() is deprecated. use `highway_bias` option in SRUCell() constructor.

WARNING: set_bias() is deprecated. use `highway_bias` option in SRUCell() constructor.
vocab size: 10000
num of parameters: 24026720
train_lm.py:110: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  norms = [ "{:.0f}".format(x.norm().data[0]) for x in self.parameters() ]
        p_norm: ['100', '45', '90', '45', '90', '45', '90', '45', '90', '45', '90', '47', '90', '0']

SRU loaded for gpu 0
train_lm.py:140: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  torch.nn.utils.clip_grad_norm(model.parameters(), args.clip_grad)
Traceback (most recent call last):
  File "train_lm.py", line 262, in <module>
    main(args)
  File "train_lm.py", line 206, in main
    train_ppl = train_model(epoch, model, train)
  File "train_lm.py", line 145, in train_model
    p.data.add_(-lr, p.grad.data)
RuntimeError: expand(torch.cuda.FloatTensor{[2, 910]}, size=[1820]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

It looks like some weight and its gradient end up with different shapes.
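
A quick way to see which parameter is affected would be something like this (just a diagnostic to run after loss.backward(), not a fix):

# print every parameter whose gradient shape differs from its own shape
for name, p in model.named_parameters():
    if p.grad is not None and p.grad.shape != p.shape:
        print(name, "param:", tuple(p.shape), "grad:", tuple(p.grad.shape))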

Latest master SRU fails to train

Thanks for your awesome work on this model. We are currently using it to process language and visual features and generate segmentation masks from referring expressions; however, we're experiencing major issues with the latest master revision.

Until commit 43c85ed, our model trained perfectly (without gradient clipping or any additional techniques). After updating to the latest version, the model fails to converge during training. Also, our old weights are not compatible with the current SRU version, even though we haven't touched our code at all. Was any incompatibility introduced after the multigpu branch was merged?

Here's a little snippet depicting our current declaration and usage of the SRU model:

class Net(nn.Module):
    def __init__(self, *args, **kwargs):
        ...
        self.lang_model = SRU(emb_size, hid_size, num_layers=lang_layers)
        ...
        self.mrnn = SRU(mixed_size, hid_mixed_size,
                        num_layers=mixed_layers)

    def forward(self, vis, lang):
        ...
        lang = self.emb(lang)
        # LxB representation
        lang = torch.transpose(lang, 0, 1)
        # input has dimensions: seq_length x batch_size (1) x we_dim
        lang, _ = self.lang_model(lang)
        ...
        # input has dimensions: seq_length x batch_size (1) x mix_size
        output, _ = self.mrnn(q)
        ...

I would appreciate any guidance on solving this issue. Right now our model takes 5 days to train, and we would like to parallelize it across multiple GPUs to reduce that time.

Bi-directional forward and backward seem incorrect: each direction only captures half of input x element-wise

Hi Tao,

I recently found an issue in the bi-directional case.
For example, with input_size = 6, hidden_size = 3, direction_count = 2, length = 2, batch_size = 2, we have k == 3.

The x matrix looks like this (the f or r letter before each number marks the forward or the reverse direction (flip == 1)):

[[[f-0.302948 f-0.255578 f-0.110915  r0.1591    r0.928114  r0.92241 ]

  [f-0.50604   f0.391675 f-0.187608  r0.468802 r-0.648262 r-0.177739]]

 [[ f0.50936   f0.67189  f-0.619738  r0.377355  r0.545083 r-0.971449]
  [ f0.948531 f-0.551092  f0.227567 r-0.46116  r-0.496896 r-0.769874]]]

I added a print statement in the forward kernel; here is the output:

[F] col:0  L:0 N:0 D:0 DIR:0 act:1 k:3 d:3 x: -0.302948 
[F] col:1  L:0 N:0 D:1 DIR:0 act:1 k:3 d:3 x: -0.255578 
[F] col:2  L:0 N:0 D:2 DIR:0 act:1 k:3 d:3 x: -0.110915 
[F] col:3  L:0 N:0 D:0 DIR:1 act:1 k:3 d:3 x: 0.377355 
[F] col:4  L:0 N:0 D:1 DIR:1 act:1 k:3 d:3 x: 0.545083 
[F] col:5  L:0 N:0 D:2 DIR:1 act:1 k:3 d:3 x: -0.971449 
[F] col:6  L:0 N:1 D:0 DIR:0 act:1 k:3 d:3 x: -0.506040 
[F] col:7  L:0 N:1 D:1 DIR:0 act:1 k:3 d:3 x: 0.391675 
[F] col:8  L:0 N:1 D:2 DIR:0 act:1 k:3 d:3 x: -0.187608 
[F] col:9  L:0 N:1 D:0 DIR:1 act:1 k:3 d:3 x: -0.461160 
[F] col:10  L:0 N:1 D:1 DIR:1 act:1 k:3 d:3 x: -0.496896
[F] col:11  L:0 N:1 D:2 DIR:1 act:1 k:3 d:3 x: -0.769874
[F] col:0  L:1 N:0 D:0 DIR:0 act:1 k:3 d:3 x: 0.509360 
[F] col:1  L:1 N:0 D:1 DIR:0 act:1 k:3 d:3 x: 0.671890 
[F] col:2  L:1 N:0 D:2 DIR:0 act:1 k:3 d:3 x: -0.619738 
[F] col:3  L:1 N:0 D:0 DIR:1 act:1 k:3 d:3 x: 0.159100 
[F] col:4  L:1 N:0 D:1 DIR:1 act:1 k:3 d:3 x: 0.928114 
[F] col:5  L:1 N:0 D:2 DIR:1 act:1 k:3 d:3 x: 0.922410 
[F] col:6  L:1 N:1 D:0 DIR:0 act:1 k:3 d:3 x: 0.948531 
[F] col:7  L:1 N:1 D:1 DIR:0 act:1 k:3 d:3 x: -0.551092 
[F] col:8  L:1 N:1 D:2 DIR:0 act:1 k:3 d:3 x: 0.227567 
[F] col:9  L:1 N:1 D:0 DIR:1 act:1 k:3 d:3 x: 0.468802 
[F] col:10  L:1 N:1 D:1 DIR:1 act:1 k:3 d:3 x: -0.648262 
[F] col:11  L:1 N:1 D:2 DIR:1 act:1 k:3 d:3 x: -0.177739 

For your reference, this is the code I added to print the values:

void sru_bi_fwd(...) {
    for (int row = 0; row < len; ++row)
    {
        ...
        ...
        *hp = (val*mask - (*xp))*g2 + (*xp);
        printf("[F] col:%d  L:%d N:%d D:%d DIR:%d act:%d k:%d d:%d x:%f\n",
               col, cnt, (col/d2), (col%d), flip, activation_type, k, d, *(xp));

I found that the forward direction (flip == 0, printed as DIR) only accesses the left half of the input x, and the backward direction (flip == 1) only accesses the right half.

This behavior is very different from every other case in SRU (k == 3 uni-directional, k == 4 uni/bi-directional), since each direction can only keep half of the input information through the reset gate, while the activation sees all of the input x.

Do you think this is an issue?

output vs. hidden

Hi,

I'm trying to understand what exactly "output" is and what "hidden" is. Based on the code in cuda_functional.py, it seems that "output" corresponds to the hidden states of an LSTM, and "hidden" corresponds to the cell states of an LSTM. Is this understanding right?

Or are they in fact the same thing: hidden is just the last time-step hidden state of each layer, while output is the all-time-step hidden states of the last layer?

output, hidden = rnn(x)      # forward pass

# output is (length, batch size, hidden size * number of directions)
# hidden is (layers, batch size, hidden size * number of directions)

prevx = input
lstc = []
for i, rnn in enumerate(self.rnn_lst):
    h, c = rnn(prevx, c0[i])
    prevx = h
    lstc.append(c)

if return_hidden:
    return prevx, torch.stack(lstc)
else:
    return prevx
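
A quick shape check along those lines (my own snippet, assuming the SRU(input_size, hidden_size, num_layers) constructor used elsewhere in this repo):

import torch
from cuda_functional import SRU

rnn = SRU(input_size=128, hidden_size=64, num_layers=3).cuda()
x = torch.randn(35, 16, 128).cuda()   # (length, batch, input_size)

output, hidden = rnn(x)
print(output.shape)   # expected (35, 16, 64): all time steps, last layer
print(hidden.shape)   # expected (3, 16, 64): last time step, every layer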

Different input dimension compared to output dimension

Hi, I'm trying to implement a naive version of this paper in Keras, and I was wondering how the case n_in != n_out is handled.

I went through the code a few times and couldn't understand the element-wise multiplication of (1 - r_t) with x_t if x_t has a different shape than r_t.
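
My current reading of the PyTorch code (which may well be wrong, hence the question) is that when n_in != n_out a fourth weight slice projects x_t down to n_out, and the highway term uses that projection instead of the raw x_t. Roughly, as a NumPy sketch with made-up names:

import numpy as np

n_in, n_out = 300, 128
x_t = np.random.randn(n_in)

# k = 4 case: W, W_f, W_r plus an extra projection W_x (my naming)
W, W_f, W_r, W_x = (np.random.randn(n_in, n_out) for _ in range(4))

r_t = 1.0 / (1.0 + np.exp(-(x_t @ W_r)))       # reset gate, shape (n_out,)
c_t = np.tanh(x_t @ W)                         # stand-in for the real recurrent state
h_t = r_t * c_t + (1.0 - r_t) * (x_t @ W_x)    # highway uses the projected x_t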

Calculating backward for SRU results in a CUDA error

I'm not sure how, but I'm seeing this error when I try to compute the backward function. I don't know if you've come across this during your debugging?

Traceback (most recent call last):
  File "gan_language.py", line 341, in <module>
    G.backward(one)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/nick/wgan-gp/sru/cuda_functional.py", line 417, in backward
    stream=SRU_STREAM
  File "cupy/cuda/function.pyx", line 129, in cupy.cuda.function.Function.__call__ (cupy/cuda/function.cpp:4010)  File "cupy/cuda/function.pyx", line 111, in cupy.cuda.function._launch (cupy/cuda/function.cpp:3647)
  File "cupy/cuda/driver.pyx", line 127, in cupy.cuda.driver.launchKernel (cupy/cuda/driver.cpp:2541)
  File "cupy/cuda/driver.pyx", line 62, in cupy.cuda.driver.check_status (cupy/cuda/driver.cpp:1446)
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle

Confused about the variable k in the code.

"k = 4 if n_in != out_size else 3".
I notice that many places involve the variable k which sometimes be 4 and sometimes be 3. It seems that it is related to the different layer. I think the k is related to the 3 different W matrix in the paper which are W, W_f, W_r. Could you please explain about it? Thanks.

torch binding?

Hi
Thanks for releasing the code. I am wondering whether it is possible to release a torch binding?
Thanks!

Unable to reproduce speech experiments

Can you please provide more detailed instructions on how to run the speech experiments from your paper? I am a graduate student participating in the ICLR 2018 Reproducibility Challenge, and we are having some difficulty reproducing the speech experiments as described by the installation instructions in the sru/speech directory. I was able to install Kaldi and CUDA 8, but I am uncertain how to build @yzhang87's forked CNTK version. According to your instructions:

Build KaldiReader in CNTK (follow the instruction).

I assume "KaldiReader" is the name of the file in @yzhang87's fork, is this correct? Are Microsoft's official instructions the correct instructions or are there specific instructions we should look for in the forked repository? Several steps from the official instructions are no longer appropriate for the fork. Notably, the fork expects a missing file called mkl.h which is no longer provided in any of the open source MKLML releases, which are suggested by the Microsoft instructions. Can you clarify how to obtain the correct MKLML dependencies to run your experiment? Thank you!

python setup.py install fails

error:

Traceback (most recent call last):
  File "setup.py", line 53, in <module>
    version=get_version(),
  File "setup.py", line 21, in get_version
    with open(version_py) as fh:
FileNotFoundError: [Errno 2] No such file or directory: 'version.py'

SRU's convergence slows down in the second half of training compared to LSTM

Hi, I use SRU in TensorFlow for a seq2seq model.

I compared the loss of SRU and LSTM (same hyperparameters and network structure); after 10 epochs, SRU's convergence slows down.
At epoch 50, SRU's loss is 0.05 while LSTM's loss is only 0.01, although the validation losses are very close.
My code is here: https://github.com/johnnykthink/SRU-Tensorflow/blob/master/sru.py

So, how can I fix this? Thanks.

The training loss graph (300k data pairs; blue is SRU, gray is LSTM):
image

The validation loss graph (30k data pairs):
image

loss=nan

@taolei87 hi, first I want to thank you very much. Your work on SRU has helped me a lot; seeing models with SRU run so fast feels like a dream. So I deployed SRU in my own code, replacing LSTM/GRU. However, my new model with SRU always hits loss=nan. I studied previous issues and found that we should use gradient clipping. Can you give an example of how to set up gradient clipping? I did not find one in the published code, and the new pull requests may touch on it. I wish more and more people could enjoy the speed of SRU, it's magic!
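
For reference, the generic PyTorch pattern I am currently trying looks like this (optimizer, model and loss come from my own training loop, and older PyTorch versions call the function clip_grad_norm without the trailing underscore); I am not sure it is what the authors intend:

import torch

optimizer.zero_grad()
loss.backward()
# clip the global gradient norm before the optimizer step; max_norm=5.0 is an arbitrary choice
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()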

Question about SRU training speed in TensorFlow

Hi Tao,

Thanks for the great work; I have implemented your paper in TensorFlow.
In a seq2seq task, SRU training is 1.6x faster than TensorFlow's BasicLSTMCell!
And the accuracy is a little better than LSTM.

But how can I get the 5-10x speedup reported in your paper? Right now I'm feeding data to the model with feed_dict; I will try tfrecords later for comparison.

Thanks.

SRU module doesn't appear to use residual/skip connections

@taolei87 thanks for the repo again. Really good code.

One thing I was analyzing is that the SRU class doesn't seem to have skip connections. Shouldn't it be:

        prevx = input
        lstc = []
        for i, rnn in enumerate(self.rnn_lst):
            h, c = rnn(prevx, c0[i])
            prevx += h #you have prevx = h

In this way the connections are residual, which is useful for stacking multiple layers.

There is a bug in the language model experiment

In the main function of language_model, the word vocabulary should contain all words from train+dev+test; otherwise the program will throw an exception.

model = Model(train+dev+test, args)

A little question about the architecture

Could anyone give me some tips about the architecture?
In the last equation, does that mean input x_t should have the same size as output h_t, since it is an element-wise product?
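
For context, these are the equations as I read them from the paper (transcribed from memory, so they may not match the exact published version):

\begin{aligned}
\tilde{x}_t &= W x_t \\
f_t &= \sigma(W_f x_t + b_f) \\
r_t &= \sigma(W_r x_t + b_r) \\
c_t &= f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{x}_t \\
h_t &= r_t \odot g(c_t) + (1 - r_t) \odot x_t
\end{aligned}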
Thanks a lot.

question: SRU and end-to-end speech recognition models

Hi,

Part of the experiments with SRU were conducted on speech recognition tasks (section 4.5); however, there is no evaluation of end-to-end models like DeepSpeech2 with SRU. Have you tried this? Are there any architectural problems with applying SRU to the DeepSpeech2 model, or to end-to-end speech recognition models in general?

Many thanks for the wonderful work on SRU.

Best Regards,
Jacek

no support for variable sequence lengths with pytorch PackedSequence

Hi, one of the merits of RNNs is support for variable sequence lengths. In many applications (speech, NLP etc.), a batch of samples consists of sequences that do not have the same length, thus requiring padding for batching. To avoid training on the padded and non-informative parts, pytorch uses PackedSequence objects. With SRU, I get the following error using a PackedSequence as input:

in forward assert input.dim() == 3 # (len, batch, n_in)
AttributeError: 'PackedSequence' object has no attribute 'dim'

Do you plan to add support for PackedSequence objects?
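
In the meantime, the workaround I can think of is to keep the padded tensor, run SRU on it, and mask the padded positions afterwards. A rough sketch (rnn, the padded input x of shape (max_len, batch, input_size), and the list lengths of true sequence lengths are assumed from my own setup):

import torch

output, state = rnn(x)                     # run SRU on the padded batch as usual

max_len, batch = x.size(0), x.size(1)
positions = torch.arange(max_len).unsqueeze(1)              # (max_len, 1)
mask = positions < torch.tensor(lengths).unsqueeze(0)       # (max_len, batch), True = real token
output = output * mask.unsqueeze(-1).float().to(output.device)  # zero out the padded steps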

word2vec processing procedure

I am new to word2vec, so I'm confused about the complete process of "Download pre-trained word embeddings such as word2vec; make it into text format" in the classification task. I cloned the word2vec repo, followed the quick-start at https://code.google.com/archive/p/word2vec/, and ran the demo scripts ./demo-word.sh and ./demo-phrases.sh. But I don't know how to "make it into text format". Could you please give a more precise description?
Thanks!
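
Update: one way that seems to work for converting the binary vectors to text format is gensim (this is my own workaround, not verified against the classification script; the file names are just examples):

from gensim.models import KeyedVectors

# load the binary GoogleNews-style vectors and re-save them as plain text
vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
vectors.save_word2vec_format("GoogleNews-vectors-negative300.txt", binary=False)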

Error with pytorch=0.4

When I use SRU with pytorch=0.4, I get this error:
optimizer.step()
File "/usr/local/lib/python3.5/dist-packages/torch/optim/sgd.py", line 93, in step
d_p.add_(weight_decay, p.data)
RuntimeError: The expanded size of the tensor (56) must match the existing size (112) at non-singleton dimension 1
When I replace SRU with GRU, this does not happen.
I'm sorry I haven't investigated further where the error comes from, but I wanted to report it.

An error occurred in the backward pass after adding a slice of the hidden state

Hi,
when I use SRU in NMT, this error happens:

bidirectional = True
output, hidden = enc(x)      # forward pass

# (4, 64, 400) - (layers x batch_size x rnn_size*2)
hidden = hidden[:, :, 0:rnn_size] + hidden[:, :, rnn_size:]

output, hidden = dec(y, hidden)

...

loss.div(x.size(1)).backward() 

loss.div(x.size(1)).backward()
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 156, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/init.py", line 98, in backward
variables, grad_variables, retain_graph)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/function.py", line 91, in apply
return self._forward_cls.backward(self, *args)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/tensor.py", line 29, in backward
grad_input[ctx.index] = grad_output
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 85, in setitem
return SetItem.apply(self, key, value)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/tensor.py", line 43, in forward
i._set_index(ctx.index, value)
RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/THCTensorCopy.cu:31

No module named 'cupy'

I have installed cupy with pip install cupy. However, I get "No module named 'cupy'".
What is the reason for this?

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-52f03c45a3e3> in <module>()
     11 from torch import optim
     12 import torch.nn.functional as F
---> 13 from cuda_functional import SRU, SRUCell

/home/quoniammm/version-control/mine-pytorch-examples/torch_basic/cuda_functional.py in <module>()
      7 import torch.nn as nn
      8 from torch.autograd import Function, Variable
----> 9 from cupy.cuda import function
     10 from pynvrtc.compiler import Program
     11 from collections import namedtuple

ModuleNotFoundError: No module named 'cupy'

Terminal output of conda list:

certifi                   2016.2.28                py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cupy                      1.0.3                     <pip>
......
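
A quick check to see whether the interpreter running the notebook is the same environment that pip installed cupy into (just a diagnostic):

import sys
print(sys.executable)   # which Python the notebook is actually running
print(sys.path)         # where it looks for packages

try:
    import cupy
    print("cupy", cupy.__version__, "found at", cupy.__file__)
except ImportError as e:
    print("cupy not importable from this interpreter:", e)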

Different F1 scores in DrQA when resuming with different batch sizes

Here's a simple test: train the model for 2 epochs on SQuAD. Here's a bash dump from my machine.

(venv) ubuntu@x:~/work/sru/DrQA$ python train.py -e 0 -bs 32  --resume checkpoint_epoch_2.pt
seed: 937
10/09/2017 09:36:19 [program starts.]
10/09/2017 09:36:46 [Data loaded.]
10/09/2017 09:36:46 [loading previous model...]
2806118 parameters
10/09/2017 09:37:11 [dev EM: 55.638599810785244 F1: 66.81417432626549]
(venv) ubuntu@x:~/work/sru/DrQA$ python train.py -e 0 -bs 1  --resume checkpoint_epoch_2.pt
seed: 937
10/09/2017 09:37:36 [program starts.]
10/09/2017 09:38:03 [Data loaded.]
10/09/2017 09:38:03 [loading previous model...]
2806118 parameters
10/09/2017 09:39:33 [dev EM: 55.68590350047304 F1: 66.70301086732972]
(venv) ubuntu@ip-172-31-9-81:~/work/sru/DrQA$ python train.py -e 0 -bs 32  --resume checkpoint_epoch_2.pt
seed: 937
10/09/2017 09:42:17 [program starts.]
10/09/2017 09:42:45 [Data loaded.]
10/09/2017 09:42:45 [loading previous model...]
2806118 parameters
10/09/2017 09:43:10 [dev EM: 55.638599810785244 F1: 66.81417432626549]

The predictions should not depend on the batch size, right? Also, what's the point of setting the batch size to 1 here?

CUDA 9 Support

Does the existing codebase work with CUDA 9 as well? On the Titan V GPU we could take advantage of FP16 to further speed up training.

I couldn't run it because I got an error, and I couldn't find the reason.

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.cu line=66 error=30 : unknown error
Traceback (most recent call last):
File "/home/lai/filespace/eclipse-workpplace/sru-master/language_model/train_lm.py", line 14, in
import cuda_functional as MF
File "/home/lai/filespace/eclipse-workpplace/sru-master/cuda_functional.py", line 13, in
tmp_ = torch.rand(1,1).cuda()
File "/home/lai/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 66, in cuda
return new_type(self.size()).copy
(self, async)
File "/home/lai/anaconda3/lib/python3.6/site-packages/torch/cuda/init.py", line 269, in _lazy_new
return super(_CudaBase, cls).new(cls, *args, **kwargs)
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.cu:66

hidden states in input

Could you please add the possibility to feed hidden states to the forward pass, as with nn.LSTMCell?

train_lm.py using LSTM results in nan/inf

Hi,
Did you use a seed in your language model experiments? I tried to run the LSTM experiment with the default parameters, but training terminated at epoch 53. Looking at train_lm.py, I think this is because of:

if math.isnan(loss.data[0]) or math.isinf(loss.data[0]): 

Thanks,

AttributeError when preprocessing data for DrQA

First I ran download.sh, and it successfully downloaded GloVe and the train/dev JSONs for SQuAD. However, python prepro.py gave me this:

Traceback (most recent call last):
  File "prepro.py", line 243, in <module>
    vocab_tag = list(nlp.tagger.tag_names)
AttributeError: 'Tagger' object has no attribute 'tag_names'

My spaCy version is 2.0.3, and it seems something broke in the update from the 1.x version pinned in requirements; I didn't succeed in fixing it myself.
Any suggestions?

Error when using DataParallel

When using SRU with DataParallel, I received the following error:
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle

A single GPU works fine.

About grad: gradient check fails in some cases; how do I correctly calculate x's gradient?

Hi Tao Lei,

In your SRU implementation, the backward step updates a grad_u matrix, but in many frameworks like TensorFlow the grad op is only required to compute the gradients of the op's input parameters.

U = X.dot( [Wx, Wf, Wr] )

As I understand it, U's shape is [seq_length, batch_size, n_out * 3 * direction_cnt], but W's shape is [n_in, n_out * 3 * direction_cnt].

If I can compute grad_u, how can I convert it to grad_w?

I noticed musyoku's code does a conversion like that (https://github.com/musyoku/chainer-sru/blob/master/sru/sru.py), but I have trouble understanding it.

Could you give me some advice?
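
To make the question concrete, this is the plain chain-rule version I have in mind for the matrix-product part (it ignores the extra gradient that flows into x through the highway term, so it is only a sketch):

import numpy as np

seq_len, batch, n_in, n_out, dirs = 10, 4, 32, 16, 1
X = np.random.randn(seq_len, batch, n_in)
W = np.random.randn(n_in, n_out * 3 * dirs)
grad_u = np.random.randn(seq_len, batch, n_out * 3 * dirs)   # as returned by the backward kernel

# U = X @ W  =>  dL/dW = X^T @ dL/dU, after flattening time and batch together
X2d = X.reshape(-1, n_in)                   # (seq_len * batch, n_in)
grad_u2d = grad_u.reshape(-1, W.shape[1])   # (seq_len * batch, n_out * 3 * dirs)
grad_w = X2d.T @ grad_u2d                   # (n_in, n_out * 3 * dirs), same shape as W

# and the matmul part of dL/dX (the highway path would add another term to this)
grad_x = (grad_u2d @ W.T).reshape(seq_len, batch, n_in)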
