
deep-qa's Introduction

OVERVIEW

This code implements a convolutional neural network architecture for learning to match question and answer sentences described in the paper:

Aliaksei Severyn and Alessandro Moschitti. Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks. SIGIR, 2015

The network combines a state-of-the-art convolutional sentence model with an advanced question-answer matching model, and introduces a novel relational model to encode related words in a question-answer pair.

The addressed task is a popular answer sentence selection benchmark, where the goal is, for each question, to select the relevant answer sentences. The dataset was first introduced by (Wang et al., 2007) and further elaborated by (Yao et al., 2013). It is freely available.

Evaluation is performed using the standard 'trec_eval' script.
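
For reference, trec_eval takes a relevance-judgments (qrel) file and a run file; a typical invocation looks like this (file names here are hypothetical):

$ trec_eval qrels.test run.test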

DEPENDENCIES

  • python 2.7+
  • numpy
  • theano
  • scikit-learn (sklearn)
  • pandas
  • tqdm
  • fish
  • numba

The Python packages can be easily installed using the standard tool, pip.
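
For example, one way to install them all at once (the repo does not pin package versions):

$ pip install numpy theano scikit-learn pandas tqdm fish numba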

EMBEDDINGS

The pre-initialized word2vec embeddings have to be downloaded from here.

BUILD

To build the required train/dev/test sets in the format expected by the network, run:

$ sh run_build_datasets.sh

This parses the raw XML files containing QA pairs and converts them into a format suitable for the deep learning model. The output files are stored under the folders TRAIN and TRAIN-ALL, corresponding to the TRAIN and TRAIN-ALL training settings described in the paper.

Next, the script extracts word embeddings for all words in the vocabulary. We use pre-trained word embeddings obtained by running the word2vec tool on a merged Wikipedia dump and the AQUAINT corpus (provided under the 'embeddings' folder). Missing words are randomly initialized from the uniform distribution [-0.25, +0.25]. For further details, please refer to the paper.
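
As a minimal sketch of the out-of-vocabulary handling described above (only the 50-dimensional embeddings and the [-0.25, +0.25] range come from this setup; all names and data are hypothetical):

import numpy as np

ndim = 50                                   # embedding dimensionality
rng = np.random.RandomState(1234)           # arbitrary seed
word2vec = {'question': rng.randn(ndim)}    # stand-in for the pre-trained vectors
vocab = ['question', 'answerability']       # stand-in for the dataset vocabulary

emb = {}
for word in vocab:
    if word in word2vec:
        emb[word] = word2vec[word]
    else:
        # missing word: random init from U[-0.25, +0.25]
        emb[word] = rng.uniform(-0.25, 0.25, size=ndim)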

TRAIN AND TEST

To train the model in the TRAIN setting, run:

$ python run_nnet.py TRAIN

To train in the TRAIN-ALL setting, which uses 53,417 QA pairs, run:

$ python run_nnet.py TRAIN-ALL

The parameters of the trained network are dumped under the 'exp.out' folder.

The results reported by the 'trec_eval' script should be around these numbers:

TRAIN: MAP: 0.7325 MRR: 0.8018

TRAIN-ALL: MAP: 0.7654 MRR: 0.8186

NOTE: Small variations on different platforms are expected due to differences in random seeds which affect random initialization of network weights.

REFERENCES

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. Answer extraction as sequence tagging with tree edit distance. In NAACL, 2013.

Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. What is the Jeopardy model? A quasi-synchronous grammar for QA. In EMNLP, 2007.

License

This software is licensed under the Apache 2 license.

deep-qa's People

Contributors

aseveryn, dbonadiman


deep-qa's Issues

Can this work on Paragraph Level?

Great work, great algorithms, great paper Severyn!

I have a question though: can this work for reranking pairs of paragraphs (Q+A), say of 2-3 sentences each?

Question on '/tmp/trec-merged.txt'

A question on '/tmp/trec-merged.txt'. I have installed Theano as well as most of the other dependencies by installing WinPython. After running the first step,

To build the required train/dev/test sets in the format expected by the network, run:
$ sh run_build_datasets.sh

via "os.system('run_build_datasets.sh')" in the main directory of the project, I got:

jacana-qa-naacl2013-data-results/train.xml
outdir TRAIN
Traceback (most recent call last):
  File "parse.py", line 178, in <module>
    qids, questions, answers, labels = load_data(all_fname)
  File "parse.py", line 15, in load_data
    lines = open(fname).readlines()
IOError: [Errno 2] No such file or directory: '/tmp/trec-merged.txt'
Vocab size 17022
embeddings/aquaint+wiki.txt.gz.ndim=50.bin
vocab_size, layer1_size 2470719 50
. . . . . . . . . . . . . . . . . . . . . . . . . done
Words found in wor2vec embeddings 16201
ndim 50
Using zero vector as random
random_words_count 821
(17023L, 50L)
TRAIN\emb_aquaint+wiki.txt.gz.ndim=50.bin.npy
Vocab size 56952
embeddings/aquaint+wiki.txt.gz.ndim=50.bin
vocab_size, layer1_size 2470719 50
. . . . . . . . . . . . . . . . . . . . . . . . . done
Words found in wor2vec embeddings 51250
ndim 50
Using zero vector as random
random_words_count 5702
(56953L, 50L)
TRAIN-ALL\emb_aquaint+wiki.txt.gz.ndim=50.bin.npy
bash: make: command not found
bash: make: command not found

The first problem is '/tmp/trec-merged.txt', which I failed to find.
What is '/tmp/trec-merged.txt'? Is it inside the downloaded zip file, or how do I create it?
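
Not resolved in the thread, but note that a Unix-style path like '/tmp/trec-merged.txt' generally does not exist for a WinPython interpreter, and the trailing 'bash: make: command not found' lines point to a missing Unix build toolchain; on Debian/Ubuntu it can be installed with:

$ sudo apt-get install build-essential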

Implementation doubt

Hi,

Please excuse me if this doubt is very naive. When evaluating MAP, for each test query, should we ideally compare it against every answer text in the training set, or only against the answers in the mini-batch?

Architecture question

Pretty cool stuff.
Reading the code, I'm just wondering why there are so many levels of indirection from indices to word2vec sentence matrices.

It goes: parsing -> creation of an "alphabet" to map words to indices -> creation of questions/answers as series of alphabet indices -> creation of an alphabet-index-to-word2vec mapping. This also requires an NN layer that performs the index-to-word2vec lookup before the convolution.

Is there a reason to bother with indices at all, rather than transforming everything straight into a word2vec matrix, either at parsing time or just before the feed-forward phase? It seems the code would then be more tolerant of being fed new document pairs containing words that exist in word2vec but not in the "alphabet" mapping.
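
For concreteness, the indirection being discussed looks roughly like this (a toy sketch with hypothetical names, not the repo's actual code):

import numpy as np

alphabet = {'what': 0, 'is': 1, 'a': 2, 'cat': 3}        # word -> index
E = np.random.uniform(-0.25, 0.25, (len(alphabet), 50))  # index -> embedding row
E = E.astype('float32')

question = ['what', 'is', 'a', 'cat']
idxs = np.array([alphabet[w] for w in question], dtype='int32')  # sentence as indices
sent_matrix = E[idxs]   # lookup-layer output: shape (4, 50), ready for convolution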

NotImplementedError: The image and the kernel must have the same type.inputs(float64), kerns(float32)

rzai@rzai00:~/prj/deep-qa$ python run_nnet.py TRAIN
Using gpu device 0: GeForce GTX 1080 (CNMeM is disabled)
Running training in the TRAIN setting
y_train (array([0, 1], dtype=int32), array([4370, 348]))
y_dev (array([0, 1], dtype=int32), array([926, 222]))
y_test (array([0, 1], dtype=int32), array([1233, 284]))
q_train (4718, 33)
q_dev (1148, 33)
q_test (1517, 33)
a_train (4718, 40)
a_dev (1148, 40)
a_test (1517, 40)
Generating random vocabulary for word overlap indicator features with dim: 5
Gaussian
Loading word embeddings from TRAIN/emb_aquaint+wiki.txt.gz.ndim=50.bin.npy
Word embedding matrix size: (17023, 50)
batch_size 50
n_epochs 25
learning_rate 0.1
max_norm 0
Traceback (most recent call last):
  File "run_nnet.py", line 500, in <module>
    main()
  File "run_nnet.py", line 189, in main
    nnet_q.set_input((x_q, x_q_overlap))
  File "/home/rzai/prj/deep-qa/nn_layers.py", line 64, in set_input
    self.output = self.output_func(input)
  File "/home/rzai/prj/deep-qa/nn_layers.py", line 88, in output_func
    layer.set_input(cur_input)
  File "/home/rzai/prj/deep-qa/nn_layers.py", line 64, in set_input
    self.output = self.output_func(input)
  File "/home/rzai/prj/deep-qa/nn_layers.py", line 102, in output_func
    layer.set_input(input)
  File "/home/rzai/prj/deep-qa/nn_layers.py", line 64, in set_input
    self.output = self.output_func(input)
  File "/home/rzai/prj/deep-qa/nn_layers.py", line 88, in output_func
    layer.set_input(cur_input)
  File "/home/rzai/prj/deep-qa/nn_layers.py", line 64, in set_input
    self.output = self.output_func(input)
  File "/home/rzai/prj/deep-qa/nn_layers.py", line 435, in output_func
    image_shape=self.input_shape)
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/nnet/conv.py", line 151, in conv2d
    return op(input, filters)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 509, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/nnet/conv.py", line 626, in make_node
    "inputs(%s), kerns(%s)" % (_inputs.dtype, _kerns.dtype))
NotImplementedError: The image and the kernel must have the same type.inputs(float64), kerns(float32)
rzai@rzai00:~/prj/deep-qa$
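
Not confirmed in this thread, but this dtype mismatch (float64 inputs meeting float32 kernels) commonly goes away in Theano when the default float type is pinned to float32, e.g.:

$ THEANO_FLAGS=floatX=float32 python run_nnet.py TRAIN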

some doubt of adadelta

In the function get_adagrad_updates, classical adadelta uses only the recent (exponentially decayed) exp_sqr_grads, but in your implementation the exp_sqr_grads are accumulated from the first step onward, which is how adagrad is expressed.
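
For reference, the distinction being raised, as a minimal numeric sketch (all names and constants hypothetical):

import numpy as np

grad = np.array([0.1, -0.2])   # current gradient
acc = np.zeros_like(grad)      # squared-gradient accumulator
rho = 0.95                     # adadelta decay constant

# AdaGrad: squared gradients accumulate from the first step and never decay.
acc_adagrad = acc + grad ** 2

# AdaDelta: an exponentially decaying average keeps only recent history.
acc_adadelta = rho * acc + (1.0 - rho) * grad ** 2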

downsample has been moved to pool

Relating to Theano/Theano#4337: when I run 'python2.7 run_nnet.py TRAIN', it gives this:
Traceback (most recent call last):
  File "run_nnet.py", line 15, in <module>
    import nn_layers
  File "/media/sf_D_DRIVE/installed/githubs/deep-qa/nn_layers.py", line 6, in <module>
    from theano.tensor.signal import downsample
ImportError: cannot import name downsample

I then rewrote 'from theano.tensor.signal import downsample' in nn_layers.py to 'from theano.tensor.signal import pool as downsample', and everything works.
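
A version-tolerant variant of that one-line fix (a sketch; like the fix above, it assumes the pool module exposes the function that nn_layers.py actually calls):

# in nn_layers.py
try:
    from theano.tensor.signal import pool as downsample   # Theano >= 0.8
except ImportError:
    from theano.tensor.signal import downsample           # older Theano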

How to show the low embedding of a question or an answer in the process of test ?

I'm very interested in this work and want to do some follow-up work on it.

What confuses me is how to obtain the low-dimensional embedding of a question or an answer at test time.

In the file 'run_nnet.py', 'train_nnet' is the whole CNN.
If I want the low-dimensional embedding of a question, I should look at the input of 'classifier' (equivalently, the output of 'hidden_layer'). Would you kindly tell me how to extract and print it?

Thank you very much for your help~

Best,
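
One way to extract such intermediate values in Theano (a hedged sketch, not code from the repo; x_q and x_q_overlap are the symbolic inputs visible in run_nnet.py, while x_a, x_a_overlap and hidden_layer stand for the corresponding answer inputs and the hidden-layer variable, whatever they are named there):

import theano

# Compile a function that stops at hidden_layer instead of the classifier.
# In nn_layers.py each layer stores its symbolic result in `layer.output`
# after set_input() has been called, so that attribute can be used directly.
get_embedding = theano.function(
    inputs=[x_q, x_q_overlap, x_a, x_a_overlap],
    outputs=hidden_layer.output,
)
# Calling get_embedding on a batch then returns the joint representation
# that feeds the classifier.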

Explanation for map_score() method.

I looked at the map_score() method used to compute mean average precision:

  1. It sorts the query-label pairs in decreasing order of predicted score.
  2. Then, if the original label > 0, i.e. if it was a correct answer for the question, it increments the correct count and computes precision@current_index. This means:
  3. Incorrect behaviour: for any predicted score (predicted scores are in [0, 1]), if the original label is 1 the pair is counted towards precision. It could be that predicted_score=0.2, i.e. the current answer is not predicted as correct/relevant for the question, but since its original label=1 it is still used to calculate precision.
  4. Ideally, predicted scores should be rounded to 0 or 1 based on some threshold, and an item should count as relevant only if label == rounded score.

Original code (deep-qa/run_nnet.py, lines 403-418 at commit 249a1ec):

def map_score(qids, labels, preds):
    qid2cand = defaultdict(list)
    for qid, label, pred in zip(qids, labels, preds):
        qid2cand[qid].append((pred, label))

    average_precs = []
    for qid, candidates in qid2cand.iteritems():
        average_prec = 0
        running_correct_count = 0
        for i, (score, label) in enumerate(sorted(candidates, reverse=True), 1):
            if label > 0:
                running_correct_count += 1
                average_prec += float(running_correct_count) / i
        average_precs.append(average_prec / (running_correct_count + 1e-6))
    map_score = sum(average_precs) / len(average_precs)
    return map_score
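
A toy invocation of the function above (hypothetical data; it additionally needs 'from collections import defaultdict'):

# Both questions have their relevant answer ranked first by the predictions,
# so MAP comes out at ~1.0 (the 1e-6 smoothing makes it fractionally lower).
qids   = [1, 1, 1, 2, 2]
labels = [0, 1, 0, 1, 0]
preds  = [0.2, 0.9, 0.4, 0.7, 0.1]
print map_score(qids, labels, preds)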

Reworked Code: https://github.com/gvishal/rank-text-cnn/blob/master/code/utils.py#L24-L44

pip install numba --user failed

envy@ub1404:/media/envy/data1t/os_prj/github/deep-qa$ pip install numba --user
Requirement already satisfied (use --upgrade to upgrade): numba in /home/envy/.local/lib/python2.7/site-packages
Downloading/unpacking llvmlite (from numba)
Downloading llvmlite-0.10.0.tar.gz (92kB): 92kB downloaded
Running setup.py (path:/tmp/pip_build_envy/llvmlite/setup.py) egg_info for package llvmlite

Requirement already satisfied (use --upgrade to upgrade): numpy in /home/envy/.local/lib/python2.7/site-packages (from numba)
Requirement already satisfied (use --upgrade to upgrade): enum34 in /home/envy/.local/lib/python2.7/site-packages (from numba)
Downloading/unpacking singledispatch (from numba)
Downloading singledispatch-3.4.0.3-py2.py3-none-any.whl
Downloading/unpacking funcsigs (from numba)
Downloading funcsigs-1.0.2-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): six in /home/envy/.local/lib/python2.7/site-packages (from singledispatch->numba)
Installing collected packages: llvmlite, singledispatch, funcsigs
Running setup.py install for llvmlite
usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: -c --help [cmd1 cmd2 ...]
or: -c --help-commands
or: -c cmd --help

error: option --single-version-externally-managed not recognized
Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_envy/llvmlite/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-4f3RJo-record/install-record.txt --single-version-externally-managed --compile --user:
usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]

or: -c --help [cmd1 cmd2 ...]

or: -c --help-commands

or: -c cmd --help

error: option --single-version-externally-managed not recognized


Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_envy/llvmlite/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-4f3RJo-record/install-record.txt --single-version-externally-managed --compile --user failed with error code 1 in /tmp/pip_build_envy/llvmlite
Storing debug log for failure in /home/envy/.pip/pip.log
envy@ub1404:/media/envy/data1t/os_prj/github/deep-qa$
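
Not resolved in the thread, but the 'option --single-version-externally-managed not recognized' failure while building llvmlite is typical of an outdated pip/setuptools; a common first step (an assumption, not a confirmed fix here) is:

$ pip install --upgrade pip setuptools
$ pip install numba --user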

How can I test on Microblog dataset?

Hello.
I'm having trouble with an experiment on this code.
I'm interested in the paper "Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks" and have successfully run the answer sentence selection experiment.
But I can't run the TREC Microblog retrieval test, because it requires the raw rankings of the top 30 systems on TMB 2011.
Can anyone tell me how I can get this data to finish the experiment?

Floating point exception (core dumped)

After running 'run_build_datasets.sh', all the files were generated inside the TRAIN and TRAIN-ALL folders, but when I try to train the network with 'python run_nnet.py TRAIN' or 'python run_nnet.py TRAIN-ALL', it stops after printing 'Generating adadelta updates' (see below):


Zero out dummy word: True
1%|▎ | 8/1122 [00:03<08:00, 2.32it/s]
Floating point exception (core dumped)


Can someone help me out with this?
