
ordered-neurons's People

Contributors

arkaung, bharatr21, shawntan, yikangshen


ordered-neurons's Issues

Question about dataset construction

Hello Yikang,

I'm Yangming Li, a research intern at the HIT-SCIR lab. Thank you for your work on this repository. However, I found some problems with the dataset construction (including the test set):

1. The use of the PyTorch API `narrow` unexpectedly discards some words and results in an incorrect PPL score (see the sketch after this list).

2. The sliding window over the whole corpus does not appear to be contiguous, and therefore generates far less data than usual.
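
For context, a minimal sketch of the kind of batchify that point 1 refers to. The exact code in utils.py may differ; this is only an illustration of how `narrow` silently truncates trailing tokens:

    import torch

    def batchify(data, bsz):
        # Keep only as many tokens as fit into an integral number of columns.
        nbatch = data.size(0) // bsz
        # narrow() drops the remaining data.size(0) - nbatch * bsz tokens,
        # so up to bsz - 1 words never contribute to the reported PPL.
        data = data.narrow(0, 0, nbatch * bsz)
        # Reshape into (seq_len, bsz) for sequential batching.
        return data.view(bsz, -1).t().contiguous()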

Thanks again for your contribution to this repository.
Yangming, 19/08/02

Question about the model design details

Hi, thanks for sharing the source code.

According to Equation (10) in your paper, I guess the last element of $\tilde{i}_t$ will always be zero, e.g., [0.8, 0.3, 0.1, 0].
Is this on purpose? If so, could you please explain why? I think this would let the topmost neuron chunk keep copying history without ever writing in anything new; is that correct?
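
For readers hitting this thread, a small numerical sketch of why the last element comes out as zero, as I read Eq. (10): the master input gate is $1 - \mathrm{cumax}(\cdot)$, and cumax is a cumulative sum over a softmax, whose final entry is always 1. NumPy is used here purely for illustration; the repository computes this inside its ON-LSTM cell:

    import numpy as np

    logits = np.array([1.0, 0.5, -0.2, -1.0])   # arbitrary pre-activations
    softmax = np.exp(logits) / np.exp(logits).sum()
    cumax = np.cumsum(softmax)                  # monotonically increasing, ends at 1.0
    i_tilde = 1.0 - cumax                       # monotonically decreasing, ends at 0.0
    print(cumax[-1], i_tilde[-1])               # prints: 1.0 0.0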

How to train with main.py using multiple GPUs?

@yikangshen @shawntan Is there an easy way to train the model with main.py on multiple GPUs to replicate the experiments?

When using model = nn.DataParallel(model) before train(), the initialization goes into the LSTM stack and then into the ONLSTM cell to return the weights, but it throws an error.

We also tried doing model = nn.DataParallel(model) after hidden = model.init_hidden(args.batch_size), and it seems the LinearDropConnect layer can't access the .weight tensors.
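
For anyone else hitting this, a rough sketch of the two orderings described above, assuming model and args are already set up as in main.py; it only restates the attempts and is not a working fix:

    import torch.nn as nn

    # Attempt 1: wrap before training; the error is raised while the wrapper
    # walks into the ONLSTM stack during initialization.
    model = nn.DataParallel(model)

    # Attempt 2: initialize the hidden state first, then wrap; the
    # LinearDropConnect layer then cannot access its .weight tensors.
    hidden = model.init_hidden(args.batch_size)
    model = nn.DataParallel(model)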

cur_loss suddenly increases to a larger number

Hi Yikang, thanks a lot for this awesome paper!

When I try to run the command below,

python main.py --batch_size 20 --dropout 0.45 --dropouth 0.3 --dropouti 0.5 --wdrop 0.45 --chunk_size 10 --seed 141 --epoch 1000

the following error is triggered at a certain (5th) epoch:

File "main.py", line 269, in
train()
File "main.py", line 245, in train
elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss), cur_loss / math.log(2)))
OverflowError: math range error

My initial finding is that cur_loss suddenly increases to a much larger number (from ~5 to more than 10000), which results in this error.
However, I am not sure what causes this sudden, huge increase.
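
For context, the OverflowError itself comes from Python's math.exp, which raises once its argument exceeds roughly 709. A sketch of a guard one could put around the logging line; this is not part of the repository, only an illustration:

    import math

    cur_loss = 10000.0              # hypothetical diverged loss value
    try:
        ppl = math.exp(cur_loss)
    except OverflowError:
        ppl = float('inf')          # exp(x) overflows for x > ~709, so report inf instead
    print(ppl)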

utils/batchify() cannot work for unsupervised parsing

Hi, it seems that this function should be different for the PTB dataset when doing unsupervised parsing, right?

Also, could you give detailed guidance on how to train the unsupervised model? I found that a lot of code would need to be changed manually.

Thanks a lot!

High performance for right-branching strategy

I really appreciate you releasing the code.

I found that when I test the right-branching baseline on the WSJ test set, the F1 is really high (39.87), which does not match the result in the paper (16.5).

I have just changed the code

distance = model.distance[0].squeeze().data.cpu().numpy()
distance_in = model.distance[1].squeeze().data.cpu().numpy()

into

distance = numpy.array([numpy.arange(len(sen), 0, -1)] * 3)
distance_in = numpy.array([numpy.arange(len(sen), 0, -1)] * 3)

This represents a right-branching strategy.

And the result on WSJ test set is:
[screenshot of the evaluation output]

So, what might be the reason? Thanks a lot if you could help me out.

Default Parameters

Hi Yikang,

If I want to reproduce your work, what parameters should I use?

In the README, you suggest using the default parameters in main.py. At the same time, you provide another set of parameters: "python main.py --batch_size 20 --dropout 0.45 --dropouth 0.3 --dropouti 0.5 --wdrop 0.45 --chunk_size 10 --seed 141 --epoch 1000 --data /path/to/your/data".

Which one should I use? I tried both, and after 48 hours the quoted parameters outperform the defaults, so I would like to double-check with you.

Thanks,
Ian

Did you use the test data during training in the Unsupervised Parsing experiment?

Reviewing the following code, I find that the training split condition 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG' covers all sections from 00 through 24, so the training data also contains the validation (22) and test (23) data. Is this correct?

for id in file_ids:
    if 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG':
        train_file_ids.append(id)
    if 'WSJ/22/WSJ_2200.MRG' <= id <= 'WSJ/22/WSJ_2299.MRG':
        valid_file_ids.append(id)
    if 'WSJ/23/WSJ_2300.MRG' <= id <= 'WSJ/23/WSJ_2399.MRG':
        test_file_ids.append(id)
    # elif 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/01/WSJ_0199.MRG' or 'WSJ/24/WSJ_2400.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG':
    #     rest_file_ids.append(id)

Data Directory used when running test_phrase_grammar.py

Hi Yikang and other Contributors,

Thank you for making the source code public! I am trying to reproduce your results, but I am not sure what path to pass as the --data command-line argument of test_phrase_grammar.py. I downloaded the PTB data and am currently using treebank_3/parsed/mrg as the data argument, but it does not work.

The listings under treebank_3/parsed/mrg:
atis brown readme.mrg swbd wsj

The listings under treebank_3/parsed/mrg/wsj:

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 MERGE.LOG

Thank you for your time!
Ian

Tuning contextual embeddings with hierarchical relations

I have a masked LM pretrained with BERT.

The embeddings are poor at the sentence level, but do well for base tokens.
There is a natural tree structure to my corpus that I believe stands to gain from something like ON-LSTM.

Do you think swapping out the embedding layer of the ON-LSTM with pretrained BERT embeddings could be fruitful?

Question about the unidirectional ON-LSTM

In your paper, you use a unidirectional ON-LSTM to train a language model and then parse with the output distances of the pretrained language model. How can we explain that the level of the first token is independent of the future tokens? Is there any bidirectional way to do it?

ZeroDivisionError in test_phrase_grammar

Hi, when I run test_phrase_grammar.py, I get the following error:

ZeroDivisionError: float division by zero

This is the specific error:
[screenshot of the full traceback]

checkpoint download

I'm sorry to bother you, but when I try to test this model, I cannot find where to download the checkpoint. With the link you provided, I only found '.txt' files; where can I download 'PTB.pt'?

FileNotFoundError: [Errno 2] No such file or directory: 'PTB.pt'

How to train parsing

Hi
I wonder how to train the parsing model. main.py seems to be only for training the LM.

Besides, when I try testing the parsing, python test_phrase_grammar.py --cuda gives the error No such file or directory: 'PTB.pt'.

Best regards,
Ron

Can main.py run without a GPU?

I installed PyTorch 0.4 with CUDA set to none, and after I run main.py I get an error: torch.cuda.LongTensor is not enabled. I haven't found a similar problem online. Do I need a computer with a GPU and CUDA?
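
The error message suggests the script constructs torch.cuda tensors unconditionally. A generic PyTorch pattern for keeping everything on the CPU looks like the sketch below; this is not the repository's actual code, only an illustration of the device-guard idiom:

    import torch

    # Pick the device once, then move tensors and the model explicitly,
    # instead of constructing torch.cuda.* tensors directly.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    batch = torch.zeros(35, 20, dtype=torch.long, device=device)  # hypothetical (bptt, batch) shape
    # model = model.to(device)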

Where is the `corpus` object?

Hi, I found that test_phrase_grammar.py refers to a corpus object many times, but I could not find its definition or initialization.

What does the chunk_size mean in the ONLSTMStack object?

From the argparser, there is a variable chunk_size, described as “number of units per chunk”. What does this chunk refer to? Is it a mini-batch, part of a batch, or part of the sequence length?

Is it related to the paragraph before section 5 in the paper?

As the master gates only focus on coarse-grained control, modeling them with the same dimensions as the hidden states is computationally expensive and unnecessary. In practice, we set $\tilde{f}_t$ and $\tilde{i}_t$ to be $D_m = \frac{D}{C}$-dimensional vectors, where $D$ is the dimension of the hidden state and $C$ is a chunk size factor. We repeat each dimension $C$ times before the element-wise multiplication with $f_t$ and $i_t$. The downsizing significantly reduces the number of extra parameters that we need to add to the LSTM. Therefore, every neuron within each $C$-sized chunk shares the same master gates.

i.e., where the hidden state of the RNN cell is split into chunks and each chunk is gated individually? If so, could you give an example of what the chunk and hidden-state computation looks like?
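
A small sketch of how I read that paragraph, in plain PyTorch (1.1+ for repeat_interleave); the names below are made up and this is not the repo's ONLSTMCell. With hidden size D and chunk size C, the master gates have D/C dimensions and each value is repeated C times before it gates the ordinary gates:

    import torch

    D, C = 12, 3                       # hypothetical hidden size and chunk size
    n_chunks = D // C                  # master gates live in this smaller space
    master_logits = torch.randn(n_chunks)
    f_master = torch.cumsum(torch.softmax(master_logits, dim=-1), dim=-1)  # cumax
    # Repeat each master-gate value C times so that all C neurons in a chunk
    # share the same master gate before multiplying with the ordinary f_t / i_t.
    f_master_full = f_master.repeat_interleave(C)      # shape: (D,)
    print(f_master_full.shape)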

Question about ON-LSTM hidden state initialization

I find that the hidden vector is initialized with zeros only at the very beginning.
Then, for each batch, the hidden vector is detached by calling

hidden = repackage_hidden(hidden)

My question is: why not create a new hidden vector for each batch instead of reusing the old one, i.e.

hidden = model.init_hidden(batch_size)
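
For readers unfamiliar with the AWD-LSTM convention this repo follows: repackage_hidden keeps the hidden values (so state carries across the contiguous batches produced by batchify) but cuts the autograd graph. A sketch of what such a helper typically looks like; the repo's own version may differ slightly:

    import torch

    def repackage_hidden(h):
        # Detach hidden states from the graph of the previous batch,
        # keeping their values for truncated back-propagation through time.
        if isinstance(h, torch.Tensor):
            return h.detach()
        return tuple(repackage_hidden(v) for v in h)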

Confusion on eq. 15

Dear Yikang,

I am new to NLP, but I really like this paper and appreciate your work. I am now reading the paper and have a question about Eq. (15) in Section 5.2: what does $y_t$ mean? Thank you!

Processing of variable length sequences in a batch

Hello, I have just started learning about language models, and I became very interested in your method after reading the paper. After reading it carefully, however, I have a question I would like to ask you. In the paper, you directly split the words in the corpus into equal-length batches. But when every sentence in a batch has a different length, how should I handle it? I looked into the official PyTorch handling (nn.utils.rnn.pad_packed_sequence), but I don't know whether this method is suitable for your code. Could you please give me some advice?
Thanks
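
For reference, the standard PyTorch padding/packing pattern the question mentions looks roughly like the sketch below. Note, however, that the ON-LSTM stack in this repo is a custom cell and, as far as I can tell, does not consume PackedSequence objects directly, so this is only a generic illustration:

    import torch
    from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

    # Three hypothetical sentences of different lengths, already mapped to word ids.
    sents = [torch.tensor([4, 8, 15]), torch.tensor([16, 23]), torch.tensor([42])]
    lengths = torch.tensor([len(s) for s in sents])

    padded = pad_sequence(sents, batch_first=True)            # shape: (batch, max_len)
    packed = pack_padded_sequence(padded, lengths, batch_first=True,
                                  enforce_sorted=False)       # for nn.LSTM-style modules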
