
nqg's Introduction

NQG

This repository contains code for the paper "Neural Question Generation from Text: A Preliminary Study"

About this code

The experiments in the paper were done with an in-house deep learning tool, so we re-implemented the model in PyTorch as a reference.

This code implements only the NQG+ setting from the paper. Within one hour of training on a Tesla P100, the NQG+ model achieves a 12.78 BLEU-4 score on the dev set.

If you find this code useful in your research, please consider citing:

@article{zhou2017neural,
  title={Neural Question Generation from Text: A Preliminary Study},
  author={Zhou, Qingyu and Yang, Nan and Wei, Furu and Tan, Chuanqi and Bao, Hangbo and Zhou, Ming},
  journal={arXiv preprint arXiv:1704.01792},
  year={2017}
}

How to run

Prepare the dataset and code

Make an experiment home folder for NQG data and code:

NQG_HOME=~/workspace/nqg
mkdir -p $NQG_HOME/code
mkdir -p $NQG_HOME/data
cd $NQG_HOME/code
git clone https://github.com/magic282/NQG.git
cd $NQG_HOME/data
wget https://res.qyzhou.me/redistribute.zip
unzip redistribute.zip

After unzipping, the data and code under $NQG_HOME should be organized as:

nqg
├── code
│   └── NQG
│       └── seq2seq_pt
└── data
    └── redistribute
        ├── QG
        │   ├── dev
        │   ├── test
        │   ├── test_sample
        │   └── train
        └── raw

Then collect vocabularies:

python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
       $NQG_HOME/data/redistribute/QG/train/train.txt.source.txt \
       $NQG_HOME/data/redistribute/QG/train/train.txt.target.txt \
       $NQG_HOME/data/redistribute/QG/train/vocab.txt
python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
       $NQG_HOME/data/redistribute/QG/train/train.txt.bio \
       $NQG_HOME/data/redistribute/QG/train/bio.vocab.txt
python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
       $NQG_HOME/data/redistribute/QG/train/train.txt.pos \
       $NQG_HOME/data/redistribute/QG/train/train.txt.ner \
       $NQG_HOME/data/redistribute/QG/train/train.txt.case \
       $NQG_HOME/data/redistribute/QG/train/feat.vocab.txt
head -n 20000 $NQG_HOME/data/redistribute/QG/train/vocab.txt > $NQG_HOME/data/redistribute/QG/train/vocab.txt.20k
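CollectVocab.py is not reproduced here; a minimal sketch of its assumed behavior (count whitespace-separated tokens across all input files and write them to the last argument, sorted by descending frequency — the real script may differ) would look like:

```python
# Hypothetical sketch of what CollectVocab.py is assumed to do; the actual
# script in seq2seq_pt may differ in format details.
import sys
from collections import Counter

def collect_vocab(in_paths, out_path):
    # Count whitespace tokens across all input files.
    counts = Counter()
    for path in in_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())
    # Write "word frequency" lines, most frequent first.
    with open(out_path, "w", encoding="utf-8") as f:
        for word, freq in counts.most_common():
            f.write("%s %d\n" % (word, freq))

if __name__ == "__main__" and len(sys.argv) > 2:
    collect_vocab(sys.argv[1:-1], sys.argv[-1])
```

Under this assumption, the `head -n 20000` command above simply keeps the 20k most frequent words for the model vocabulary.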

Setup the environment

Package Requirements:

nltk scipy numpy pytorch

PyTorch version: This code requires PyTorch v0.4.0.

Python version: This code requires Python3.

Warning: Older versions of NLTK have a bug in the PorterStemmer. Therefore, a fresh installation or update of NLTK is recommended.
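After installing or updating NLTK, a quick smoke test (hypothetical, not part of the repo) confirms the stemmer behaves sanely:

```python
# Hypothetical sanity check for the NLTK installation; not part of the repo.
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem("running"))  # a correct installation prints "run"
```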

A Docker image is also provided.

Docker image

docker pull magic282/pytorch:0.4.0

Run training

The script run_squad_qg.sh is an example; modify it to match your configuration.

Without Docker

bash $NQG_HOME/code/NQG/seq2seq_pt/run_squad_qg.sh $NQG_HOME/data/redistribute/QG $NQG_HOME/code/NQG/seq2seq_pt

With Docker

nvidia-docker run --rm -ti -v $NQG_HOME:/workspace magic282/pytorch:0.4.0

Then inside the docker:

bash code/NQG/seq2seq_pt/run_squad_qg.sh /workspace/data/redistribute/QG /workspace/code/NQG/seq2seq_pt

nqg's People

Contributors

magic282

nqg's Issues

docker image empty?

Hi,

I tried to get into the Docker image after downloading it and found that it is empty; see below:

..\ngq>docker run -it 2caa29d6a3b3 /bin/bash
root@3de7bdd7d8ec:/workspace# ls
root@3de7bdd7d8ec:/workspace# exit
exit

I pulled the image with your command docker pull magic282/pytorch:0.4.0. I also tried mounting a volume into the image and found the mounted container empty as well. Please help!

wget error

When I run "wget https://res.qyzhou.me/redistribute.zip", I get this error:

--11:28:30-- https://res.qyzhou.me/redistribute.zip
=> `redistribute.zip'
Resolving host res.qyzhou.me... 104.31.76.139, 104.31.77.139, 2400:cb00:2048:1::681f:4d8b, ...
Connecting to res.qyzhou.me|104.31.76.139|:443... connected.
OpenSSL: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
Unable to establish the SSL connection.

Testing own input

I am done with training the model but am confused about how to generate questions for my own input. Could you please tell me which file has to be run?

Code for NQG++

Would it be possible to provide the code, or at least some snippets, for the NQG++ architecture with the shared embedding matrix and pre-trained word embeddings?
That would really be a great help!
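For reference, a common way to share one embedding matrix between input lookup and output projection in PyTorch is weight tying. This is a generic sketch with hypothetical sizes, not the repo's actual NQG++ code:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 20000, 300  # hypothetical sizes

# One embedding table used for input lookup...
shared_emb = nn.Embedding(vocab_size, emb_dim)

# ...and tied to the output projection (both modules share one Parameter,
# since nn.Linear's weight has shape (vocab_size, emb_dim) as well).
generator = nn.Linear(emb_dim, vocab_size, bias=False)
generator.weight = shared_emb.weight

tokens = torch.randint(0, vocab_size, (4, 7))   # (batch, seq_len)
logits = generator(shared_emb(tokens))          # (batch, seq_len, vocab)
print(logits.shape)  # torch.Size([4, 7, 20000])
```

Pre-trained vectors could then be copied once into shared_emb.weight.data and both sides would see them.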

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 2. Got 6 and 64 in dimension 0

def forward(self, input, bio, feats, hidden=None):
    .....
    .....
    featsEmb = [self.feat_lut(feat) for feat in feats[0]]
    featsEmb = torch.cat(featsEmb, dim=-1)

While trying to run the above code, I got this error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 2. Got 6 and 1 in dimension 0

So I used pad_sequence to match the dimensions:

featsEmb = [self.feat_lut(feat) for feat in feats[0]]
featsEmb = pad_sequence(featsEmb, batch_first=True)
featsEmb = torch.cat(tuple(featsEmb), dim=-1)

But now I am getting another dimension error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 2. Got 6 and 64 in dimension 0

I printed and checked the dimensions, and they all look correct:

print("Shape:------------>")
print(len(featsEmb))
for i in range(len(featsEmb)):
    print("feat[%d]: %d" % (i, len(featsEmb[i])))
    for j in range(len(featsEmb[i])):
        print("feat[%d][%d]: %d" % (i, j, len(featsEmb[i][j])))
    print()
Shape:------------>
3

feat[0]: 6
feat[0][0]: 64
feat[0][1]: 64
feat[0][2]: 64
feat[0][3]: 64
feat[0][4]: 64
feat[0][5]: 64

feat[1]: 6
feat[1][0]: 64
feat[1][1]: 64
feat[1][2]: 64
feat[1][3]: 64
feat[1][4]: 64
feat[1][5]: 64

feat[2]: 6
feat[2][0]: 64
feat[2][1]: 64
feat[2][2]: 64
feat[2][3]: 64
feat[2][4]: 64
feat[2][5]: 64

I think it may be because of a different version of PyTorch: I am running the latest version, while the code was written for PyTorch v0.4.0.
How can I fix this error and concatenate the tensors correctly?

Any help would be greatly appreciated. Thank you in advance.
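For reference, torch.cat requires every dimension except the concatenation dimension to match across its inputs. A minimal illustration, with hypothetical shapes chosen to mirror the 3 × 6 × 64 printout above:

```python
import torch

# Three per-feature embeddings of shape (6, 64, 16); the sizes are
# hypothetical, chosen to mirror the printout in this issue.
feats_emb = [torch.zeros(6, 64, 16) for _ in range(3)]
cat = torch.cat(feats_emb, dim=-1)
print(cat.shape)  # torch.Size([6, 64, 48])

# A mismatch in any non-concatenation dimension raises the error above.
try:
    torch.cat([torch.zeros(6, 64, 16), torch.zeros(1, 64, 16)], dim=-1)
except RuntimeError as e:
    print("RuntimeError:", e)
```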

Can't reproduce results reported in the paper

I'm trying to reproduce the results reported in the paper but am getting a considerably lower BLEU score (by almost 1 BLEU point) on both the dev and test set.

I ran run_squad_qg.sh with the pre-defined parameters.
I then used the generated model model_e20.pt and ran translate.py with the following parameters:

python translate.py \
       -model "${MODELPATH}/model_e20.pt" \
       -src "${DATAPATH}/test/dev.txt.shuffle.test.source.txt" \
       -bio "${DATAPATH}/test/dev.txt.shuffle.test.bio" \
       -feats "${DATAPATH}/test/dev.txt.shuffle.test.pos" "${DATAPATH}/test/dev.txt.shuffle.test.ner" "${DATAPATH}/test/dev.txt.shuffle.test.case" \
       -tgt "${DATAPATH}/test/dev.txt.shuffle.test.target.txt" \
       -output "${SAVEPATH}/pred.txt" \
       -replace_unk \
       -verbose \
       -n_best 10 \
       -gpu 0

I then ran test.py in PyBLEU as

python3 test.py ../../../../data/redistribute/QG/test/dev.txt.shuffle.test.target.txt ../../../../data/generated/pred.txt

which resulted in a BLEU score of 11.26, almost one BLEU point lower than the NQG+ score reported for the test set in the paper (12.18).
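As an independent sanity check on the scorer itself, corpus-level BLEU can also be computed with NLTK. This is a toy sketch with made-up tokenized sentences; differences in tokenization and smoothing between scorers can easily account for fractions of a BLEU point:

```python
from nltk.translate.bleu_score import corpus_bleu

# Toy stand-ins for the target file and pred.txt, pre-tokenized;
# each hypothesis is paired with a list of reference sentences.
references = [[["what", "is", "the", "capital", "of", "france", "?"]]]
hypotheses = [["what", "is", "the", "capital", "of", "france", "?"]]
score = corpus_bleu(references, hypotheses)
print(score)  # 1.0 for an exact match
```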

Do you have an explanation or an intuition for this considerable discrepancy? Maybe the parameters in run_squad_qg.sh are not those that were used to compute the results for the paper?

And thanks a lot for making this code public :)

How to implement the "NQG-POS" setting in the paper

Hello,
Firstly thanks for the code and instruction, it really helps me a lot!
I am wondering how to implement the "NQG-POS" setting in the paper by modifying this repo
(so the input would be only the paragraph, NER, BIO, and case features).

Glove word embedding

Hello,
I was wondering how to specify 'pre_word_vecs_enc' for the pretrained word embeddings.
The README does not contain formatting instructions for this file.
Thank you
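In OpenNMT-style code, pre_word_vecs_enc usually points to a serialized tensor aligned with the source vocabulary; whether this repo follows that convention is an assumption to verify against its data-loading code. A hypothetical sketch of building such a tensor from a GloVe text file:

```python
# Hypothetical sketch; verify the expected format against the repo's
# data-loading code before relying on it.
import torch

def build_pretrained(glove_path, vocab_words, dim=300):
    """Build a (len(vocab_words), dim) tensor from a GloVe-format text
    file; words missing from GloVe keep zero vectors."""
    vectors = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == dim + 1:
                vectors[parts[0]] = [float(x) for x in parts[1:]]
    emb = torch.zeros(len(vocab_words), dim)
    for i, word in enumerate(vocab_words):
        if word in vectors:
            emb[i] = torch.tensor(vectors[word])
    return emb

# Hypothetical usage (file names are placeholders):
# emb = build_pretrained("glove.840B.300d.txt", vocab_words)
# torch.save(emb, "pre_word_vecs_enc.pt")
```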

Request for test module

Hi,
Firstly thanks for the code and instruction
I have trained the model following the given instructions, but I am unable to test it on real data.

If you could give me instructions on how to test my trained model (i.e., taking a source paragraph with POS, NER, BIO, and case tags) and generate a question as output, it would be very helpful.

Thanks
