hitvoice / drqa

A PyTorch implementation of Reading Wikipedia to Answer Open-Domain Questions.

Shell 2.48% Python 97.52%

pytorch squad drqa

drqa's Issues

How to get the 78.6 F1 score?

Hi,

Thanks for creating this repo!

When I ran the code with the default options for 30 epochs, I got an F1 score of 78.0 to 78.1. Did I miss something? Do I need more training epochs?

Thanks,
Tao

Hi, I'm new to machine learning and I have a question about model predictions. What kind of data do I have to provide? Or, more generally, how do I actually make predictions with this particular model?
Thanks

Using DrQA for SQuAD 2.0 and other datasets

Is it possible to use DrQA with SQuAD 2.0 or other QA datasets? If so, what would the steps be?

get_answer_index() takes 4 positional arguments but 5 were given

Running prepro.py gives this error:

Traceback (most recent call last):
File "prepro.py", line 165, in
train.answer_start, train.answer_end)])
File "prepro.py", line 163, in
zip(*[get_answer_index(a, b, c, d, e) for a, b, c, d, e in
TypeError: get_answer_index() takes 4 positional arguments but 5 were given

Planning to implement the Attend It Again paper

Hello there!

I am planning to implement the attention-again mechanism from the paper Attend It Again on top of DrQA.

Attend It Again works roughly as follows:

This model has two LSTM layers. In the bottom LSTM layer, we use the traditional attention mechanism and generate the hidden state of the LSTM unit from the previous hidden state and the current input. In the next step, we integrate the hidden state of the previous LSTM unit in the top layer, the current input feature, and the current output from the bottom LSTM layer.

My plan is to take doc_hiddens and x1_emb and feed them, along with question_hiddens, to an attention layer similar to qemb_match, then feed the result to an LSTM network similar to doc_rnn. The output of that network would then go into start_attn and end_attn to produce start_scores and end_scores.

Could you tell me whether this is likely to improve the F1 score?

Training stops at "Data loaded"

Hello, I'm a new researcher in machine reading comprehension.
When I run "python train.py -e 40 -bs 32", the process stops at "Data loaded".
Could you suggest a solution?

Only decode on a test set

Can the trained model be used just to decode a new test set (same JSON format as dev), instead of training and decoding at the same time? Thanks

Trying to understand the index_answer function

The last branch of this function returns (None, None). Does this condition actually arise, or is it there only to avoid a crash?
I am trying to implement the same paper, and when I compute the final labels for my context-question pairs, many answers raise a ValueError. Is this a flaw in the dataset?
Thank you.
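
For readers following this question, here is a minimal sketch of how a character-level answer span can be mapped to token indices, and where a (None, None) result can come from. The names `whitespace_token_spans` and `char_span_to_token_span` and the whitespace tokenizer are assumptions made for illustration; this is not the repository's actual code.

```python
def whitespace_token_spans(text):
    """Return (start, end) character offsets for each whitespace-separated token."""
    spans = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        spans.append((start, end))
        pos = end
    return spans


def char_span_to_token_span(token_spans, answer_start, answer_end):
    """Map a character span to (first_token, last_token) indices.

    Returns (None, None) when either boundary does not coincide with a token
    boundary, e.g. because the tokenizer did not split where the annotated
    answer starts or ends.
    """
    starts = [s for s, _ in token_spans]
    ends = [e for _, e in token_spans]
    try:
        return starts.index(answer_start), ends.index(answer_end)
    except ValueError:
        return None, None


context = "The tower is 324m tall."
spans = whitespace_token_spans(context)

# "324m" lines up with a whole token.
start = context.index("324m")
aligned = char_span_to_token_span(spans, start, start + len("324m"))

# "324" ends in the middle of the token "324m", so no token end matches.
misaligned = char_span_to_token_span(spans, start, start + len("324"))
```

In this sketch the failing cases are tokenization mismatches rather than corrupt annotations; one common option is to drop such examples from the training set.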

Getting low F1 and EM scores

I have been working independently to implement this paper and I have referred to this repo on many occasions. I started training the model but I am not getting satisfactory results.
I am not updating the GloVe embeddings during training and am not using the POS and ENT features. I have, however, included f_align_feature.
Can you tell me some reasons why this might be the case?

prepro.py , `to_id` function assigns id using tokens in BOTH train and dev?

Here's a code snippet of prepro.py:

full = train + dev
vocab, counter = build_vocab([row[5] for row in full], [row[1] for row in full])
w2id = {w: i for i, w in enumerate(vocab)}

def to_id(row, unk_id=1):
    context_tokens = row[1]
    context_features = row[2]
    context_tags = row[3]
    context_ents = row[4]
    question_tokens = row[5]
    question_ids = [w2id[w] if w in w2id else unk_id for w in question_tokens]
    context_ids = [w2id[w] if w in w2id else unk_id for w in context_tokens]
    ...

train = list(map(to_id, train))
dev = list(map(to_id, dev))

If my interpretation is right, this means that when processing the dev set, the FULL vocab set (constructed from train+dev) is used to determine if words in dev set are UNK. Shouldn't it be using vocab constructed from the train set only?
Let me know if my interpretation is right :)
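
To make the difference concrete, here is a small sketch comparing a vocabulary built from train only with one built from train plus dev. The helper names `build_vocab_from` and `to_ids`, the `<PAD>`/`<UNK>` entries, and the toy data are all hypothetical; only the `unk_id=1` default mirrors the snippet above.

```python
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"  # assumed special tokens at ids 0 and 1


def build_vocab_from(token_lists):
    """Build a vocabulary (most frequent first) from an iterable of token lists."""
    counter = Counter(tok for tokens in token_lists for tok in tokens)
    return [PAD, UNK] + [tok for tok, _ in counter.most_common()]


def to_ids(tokens, w2id, unk_id=1):
    """Map tokens to ids, falling back to unk_id for out-of-vocabulary tokens."""
    return [w2id.get(tok, unk_id) for tok in tokens]


train_tokens = [["who", "wrote", "hamlet"], ["who", "built", "rome"]]
dev_tokens = [["who", "painted", "guernica"]]

train_only = {w: i for i, w in enumerate(build_vocab_from(train_tokens))}
train_and_dev = {w: i for i, w in enumerate(build_vocab_from(train_tokens + dev_tokens))}

# With a train-only vocabulary, unseen dev words collapse to UNK (id 1).
ids_train_only = to_ids(dev_tokens[0], train_only)

# With a train+dev vocabulary, every dev word keeps its own id.
ids_full = to_ids(dev_tokens[0], train_and_dev)
```

Whether the second variant is a problem depends on where the vectors for dev-only words come from. If they only receive fixed pretrained embeddings and are never updated using dev labels, no label information leaks, but that is an assumption to check against the actual training code.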

New dataset

How can I add a new dataset for training and testing? And how can I add a new dataset for prediction?

Different function of evaluating metrics

I am having trouble reproducing the results. The evaluation code in this repo computes the start and end indexes as follows:

        max_len = self.opt['max_len'] or score_s.size(1)
        for i in range(score_s.size(0)):
            scores = torch.ger(score_s[i], score_e[i])
            scores.triu_().tril_(max_len - 1)
            scores = scores.numpy()
            s_idx, e_idx = np.unravel_index(np.argmax(scores), scores.shape)

I am using the following to calculate the start and end indexes from the predictions.

           preds = model(context, question, context_mask, question_mask)
           p1, p2 = preds
           y1, y2 = label[:,0], label[:,1]
           loss = F.nll_loss(p1, y1) + F.nll_loss(p2, y2)
           valid_loss += loss.item()
           yp1 = torch.argmax(p1, dim=1)
           yp2 = torch.argmax(p2, dim=1)
           yps = torch.stack([yp1, yp2], dim=1)

           y_min, _ = torch.min(yps,1) # corresponds to s_idx 
           y_max, _ = torch.max(yps,1) # corresponds to e_idx

I tried both methods and I am getting different results. Is something wrong with the latter approach?
Thank you
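
The two decoders in this question are not equivalent, which the following plain-Python sketch tries to show. The first snippet searches over all pairs with start <= end < start + max_len, while the second picks each index independently and only reorders them afterwards. The function names and the toy scores are illustrative assumptions, and the scores are treated as probabilities (with log-probabilities the product would become a sum).

```python
def independent_argmax(score_s, score_e):
    """Pick start and end separately, then order them (the min/max variant)."""
    s = max(range(len(score_s)), key=score_s.__getitem__)
    e = max(range(len(score_e)), key=score_e.__getitem__)
    return min(s, e), max(s, e)


def joint_argmax(score_s, score_e, max_len=None):
    """Pick the pair (s, e) maximising score_s[s] * score_e[e]
    subject to s <= e < s + max_len."""
    n = len(score_s)
    max_len = max_len or n
    best, best_score = (0, 0), float("-inf")
    for s in range(n):
        for e in range(s, min(s + max_len, n)):
            score = score_s[s] * score_e[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score


# Toy case where the most likely end position lies before the most likely start.
score_s = [0.1, 0.2, 0.7]
score_e = [0.6, 0.1, 0.3]

swapped = independent_argmax(score_s, score_e)
joint_span, joint_score = joint_argmax(score_s, score_e)
```

Here the independent variant returns the span (0, 2), whose joint score is only 0.1 * 0.3, while the constrained search returns (2, 2). Differences of this kind, plus the length limit, are one plausible reason for the diverging metrics.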

Training stopped at epoch 1

Can you tell me how long the training process takes to complete?

I am using a Google Colab notebook, and it has been stuck at epoch 1 for the last 20 minutes.

Question about POS and NER in the model

Does the model map each POS tag and NER tag category to a one-hot encoding? If not, why? It doesn't make sense to me how you can just supply the category ID directly in the embedding.
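
One way to see why passing a category ID to an embedding layer is reasonable: an embedding lookup is mathematically the same as multiplying a one-hot vector by a weight matrix. The sketch below shows this in plain Python. The table values and sizes are made up, and it does not claim to describe how this repository actually encodes POS and NER tags.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vecmat(vector, matrix):
    """Multiply a row vector (length n) by an n x d matrix given as a list of rows."""
    dim = len(matrix[0])
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(dim)
    ]


# Hypothetical embedding table: 4 tag categories, 3 dimensions each.
tag_embeddings = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

tag_id = 2

# Direct lookup by category ID ...
via_lookup = tag_embeddings[tag_id]

# ... selects the same row as one-hot encoding followed by a matrix product.
via_one_hot = vecmat(one_hot(tag_id, len(tag_embeddings)), tag_embeddings)
```

The lookup simply skips the multiplications by zero, so the ID is never used as a numeric magnitude.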

Does it require GPU acceleration?

Hi. Does it require GPU acceleration, such as the GPU build of PyTorch? Can it be adapted to run on CPUs? How many cores and how much RAM are required to run it?

Gradient flow of the failing model

I am trying to reproduce this paper and I have referred to this repository. My training is not giving satisfactory results. The metrics are subpar. Upon investigation, I tried plotting the layer gradients and found this. I have used the function from this thread.

From the figure above, it seems as if the middle layers are not learning anything.
What should I try doing in order to fix this?

AssertionError: Torch not compiled with CUDA enabled

$ python3 train.py -e 40 -bs 32

02/15/2020 05:17:11 [Program starts. Loading data...]
02/15/2020 05:22:48 {'log_per_updates': 3, 'data_file': 'SQuAD/data.msgpack', 'model_dir': '/Users/balagopalbhallamudi/Desktop/DrQA/models', 'save_last_only': False, 'save_dawn_logs': False, 'seed': 1013, 'cuda': False, 'epochs': 40, 'batch_size': 32, 'resume': '', 'resume_options': False, 'reduce_lr': 0.0, 'optimizer': 'adamax', 'grad_clipping': 10, 'weight_decay': 0, 'learning_rate': 0.1, 'momentum': 0, 'tune_partial': 1000, 'fix_embeddings': False, 'rnn_padding': False, 'question_merge': 'self_attn', 'doc_layers': 3, 'question_layers': 3, 'hidden_size': 128, 'num_features': 4, 'pos': True, 'ner': True, 'use_qemb': True, 'concat_rnn_layers': True, 'dropout_emb': 0.4, 'dropout_rnn': 0.4, 'dropout_rnn_output': True, 'max_len': 15, 'rnn_type': 'lstm', 'pretrained_words': True, 'vocab_size': 91590, 'embedding_dim': 300, 'pos_size': 50, 'ner_size': 19}
02/15/2020 05:22:48 [Data loaded.]
02/15/2020 05:22:48 Epoch 1
02/15/2020 07:07:48 > epoch [ 1] updates[ 2707] train loss[4.38260] remaining[0:00:00]

02/15/2020 07:09:46 dev EM: 53.140964995269634 F1: 64.78947947738538
Traceback (most recent call last):
File "train.py", line 377, in
main()
File "train.py", line 87, in main
model.save(model_file, epoch, [em, f1, best_val_score])
File "/Users/balagopalbhallamudi/Desktop/DrQA/drqa/model.py", line 147, in save
'torch_cuda_state': torch.cuda.get_rng_state()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/random.py", line 20, in get_rng_state
_lazy_init()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 196, in _lazy_init
_check_driver()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 94, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
(base) Balagopals-MacBook-Pro:DrQA balagopalbhallamudi$ python3 interact.py
Traceback (most recent call last):
File "interact.py", line 31, in
checkpoint = torch.load(args.model_file, map_location=lambda storage, loc: storage)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 525, in load
with _open_file_like(f, 'rb') as opened_file:
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 212, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 193, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/best_model.pt'
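
The first traceback fails inside model.save at `torch.cuda.get_rng_state()` even though the options log shows 'cuda': False. The missing models/best_model.pt in the second traceback is presumably a consequence, since the save crashed before any checkpoint was written. A minimal sketch of guarding the CUDA call follows; the function name `collect_rng_state` and the dictionary layout are assumptions, not the repository's code, and the import is wrapped so the sketch also loads where PyTorch is not installed.

```python
import random

try:
    import torch
except ImportError:  # keeps the sketch importable without PyTorch
    torch = None


def collect_rng_state():
    """Gather RNG states for a checkpoint, touching CUDA only when available."""
    state = {"random_state": random.getstate()}
    if torch is not None:
        state["torch_state"] = torch.random.get_rng_state()
        # torch.cuda.get_rng_state() raises on builds without CUDA support,
        # so guard it instead of calling it unconditionally.
        state["torch_cuda_state"] = (
            torch.cuda.get_rng_state() if torch.cuda.is_available() else None
        )
    return state


rng_state = collect_rng_state()
```

Code that restores a checkpoint would need the matching check, restoring the CUDA state only when the stored value is not None.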

FileNotFoundError: [Errno 2] No such file or directory: 'SQuAD/meta.msgpack'

I ran download.sh and I saw two files in the SQuAD folder:
dev-v1.1.json train-v1.1.json

Then I got the error when running:
python train.py -e 40 -bs 32

Where can I download the meta.msgpack? Thanks.

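Too many UNK tokens

After preprocessing with squad_preprocess.py, the context and question loaded by the load_squad function contain too many UNK tokens, and the final vocab size is 40,000+, as shown in the figure below.

However, after converting the ID-encoded context and question stored in data.msgpack back to strings using the vocab, I found that there are too many UNK tokens. How should I deal with this?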

Update function doesn't work correctly.

I had already trained the model for 8 epochs when training stopped because my PC crashed. I saved the model folder and transferred it to another PC. On the second PC, I ran prepro.py to produce meta.msgpack and data.msgpack again. Then I restarted training with the command:
"python3 train.py -e 40 -bs 32 -rs checkpoint_epoch_8.pt -ro "

The strange thing is that I got this:
(02/08/2018 05:49:43 [Data loaded.]
02/08/2018 05:49:43 [loading previous model...]
02/08/2018 06:03:50 [dev EM: 44.11542100283822 F1: 56.78141180138346])
whereas at the evaluation step after the 8th epoch (on the first PC) I had got
(02/05/2018 03:45:31 dev EM: 66.29139072847683 F1: 76.02391342380288)
How is it possible for the accuracy on the dev set to drop?

The train loss has also increased:
02/08/2018 06:03:50 Epoch 9
02/08/2018 06:07:01 epoch [ 9] updates[ 21641] train loss[3.85778] remaining[2 days, 4:23:09]
02/08/2018 06:09:00 epoch [ 9] updates[ 21644] train loss[4.76032] remaining[1 day, 11:30:17]
02/08/2018 06:12:22 epoch [ 9] updates[ 21647] train loss[4.78624] remaining[1 day, 17:57:18]
02/08/2018 06:14:58 epoch [ 9] updates[ 21650] train loss[4.76184] remaining[1 day, 17:00:43]
02/08/2018 06:19:14 epoch [ 9] updates[ 21653] train loss[4.79866] remaining[1 day, 22:13:37]
02/08/2018 06:22:23 epoch [ 9] updates[ 21656] train loss[4.88899] remaining[1 day, 22:20:00]
02/08/2018 06:24:32 epoch [ 9] updates[ 21659] train loss[4.91365] remaining[1 day, 20:01:18]
02/08/2018 06:26:16 epoch [ 9] updates[ 21662] train loss[4.88356] remaining[1 day, 17:30:11]
02/08/2018 06:27:48 epoch [ 9] updates[ 21665] train loss[4.81403] remaining[1 day, 15:14:03]
02/08/2018 06:29:32 epoch [ 9] updates[ 21668] train loss[4.76201] remaining[1 day, 13:45:44]
02/08/2018 06:31:33 epoch [ 9] updates[ 21671] train loss[4.74199] remaining[1 day, 12:58:03]
02/08/2018 06:33:21 epoch [ 9] updates[ 21674] train loss[4.73257] remaining[1 day, 12:00:50]
02/08/2018 06:35:34 epoch [ 9] updates[ 21677] train loss[4.72988] remaining[1 day, 11:43:07]
02/08/2018 06:37:44 epoch [ 9] updates[ 21680] train loss[4.69789] remaining[1 day, 11:24:20]
02/08/2018 06:39:13 epoch [ 9] updates[ 21683] train loss[4.70580] remaining[1 day, 10:26:02]
02/08/2018 06:41:08 epoch [ 9] updates[ 21686] train loss[4.71170] remaining[1 day, 10:00:15]
02/08/2018 06:42:29 epoch [ 9] updates[ 21689] train loss[4.70412] remaining[1 day, 9:05:42]
02/08/2018 06:44:02 epoch [ 9] updates[ 21692] train loss[4.70031] remaining[1 day, 8:28:25]
02/08/2018 06:45:49 epoch [ 9] updates[ 21695] train loss[4.67916] remaining[1 day, 8:06:20]
02/08/2018 06:47:25 epoch [ 9] updates[ 21698] train loss[4.65448] remaining[1 day, 7:37:25]
02/08/2018 06:49:12 epoch [ 9] updates[ 21701] train loss[4.64674] remaining[1 day, 7:19:29]
02/08/2018 06:50:58 epoch [ 9] updates[ 21704] train loss[4.61606] remaining[1 day, 7:02:11]

whereas at the 8th epoch (on the first PC) I had:
2/05/2018 03:23:24 epoch [ 8] updates[ 21624] train loss[3.56107] remaining[0:13:05]
02/05/2018 03:25:22 epoch [ 8] updates[ 21627] train loss[3.56098] remaining[0:10:37]
02/05/2018 03:27:28 epoch [ 8] updates[ 21630] train loss[3.56091] remaining[0:08:10]
02/05/2018 03:29:17 epoch [ 8] updates[ 21633] train loss[3.56084] remaining[0:05:43]
02/05/2018 03:30:57 epoch [ 8] updates[ 21636] train loss[3.56072] remaining[0:03:16]
02/05/2018 03:32:57 epoch [ 8] updates[ 21639] train loss[3.56061] remaining[0:00:49]

However, after some iterations the train loss seems to come down again, although some fluctuations remain:
02/09/2018 10:32:30 epoch [ 9] updates[ 23123] train loss[4.03933] remaining[13:33:00]
02/09/2018 10:34:10 epoch [ 9] updates[ 23126] train loss[4.03983] remaining[13:30:44]
02/09/2018 10:36:24 epoch [ 9] updates[ 23129] train loss[4.03977] remaining[13:28:56]
02/09/2018 10:37:39 epoch [ 9] updates[ 23132] train loss[4.04052] remaining[13:26:20]
02/09/2018 10:39:15 epoch [ 9] updates[ 23135] train loss[4.04012] remaining[13:24:01]
02/09/2018 10:40:40 epoch [ 9] updates[ 23138] train loss[4.03946] remaining[13:21:34]
02/09/2018 10:42:24 epoch [ 9] updates[ 23141] train loss[4.04051] remaining[13:19:22]
02/09/2018 10:44:23 epoch [ 9] updates[ 23144] train loss[4.04052] remaining[13:17:22]
02/09/2018 10:46:02 epoch [ 9] updates[ 23147] train loss[4.04009] remaining[13:15:06]
02/09/2018 10:47:39 epoch [ 9] updates[ 23150] train loss[4.04014] remaining[13:12:49]
02/09/2018 10:49:17 epoch [ 9] updates[ 23153] train loss[4.03977] remaining[13:10:32]
02/09/2018 10:51:50 epoch [ 9] updates[ 23156] train loss[4.03920] remaining[13:08:59]
02/09/2018 10:53:59 epoch [ 9] updates[ 23159] train loss[4.03874] remaining[13:07:07]
02/09/2018 10:55:47 epoch [ 9] updates[ 23162] train loss[4.03874] remaining[13:04:59]

Is there an explanation for this behavior?

save best model

        # save
        if not args.save_last_only or epoch == epoch_0 + args.epoches - 1:
            model_file = os.path.join(model_dir, 'checkpoint_epoch_{}.pt'.format(epoch))
            model.save(model_file, epoch)
            if f1 > best_val_score:
                best_val_score = f1
                copyfile(
                    os.path.join(model_dir, model_file),
                    os.path.join(model_dir, 'best_model.pt'))
                log.info('[new best model saved.]')

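In the save section of train.py, copyfile is given the src and dst file names, but model_file is already the result of os.path.join, so there is no need to call os.path.join on it again inside copyfile, right?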
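
A short sketch of what the second join does, using posixpath so the result is the same on every platform. The directory and file names are illustrative only.

```python
import posixpath  # same semantics as os.path on Linux and macOS

model_dir = "models"
model_file = posixpath.join(model_dir, "checkpoint_epoch_1.pt")

# Joining again with a relative model_dir duplicates the directory.
joined_again = posixpath.join(model_dir, model_file)

# With an absolute model_dir the second join is harmless, because join()
# discards earlier components when a later one is absolute.
abs_dir = "/home/user/models"
abs_file = posixpath.join(abs_dir, "checkpoint_epoch_1.pt")
abs_joined_again = posixpath.join(abs_dir, abs_file)
```

So the extra join is redundant in either case and only breaks the copy when model_dir is relative. The options log in the CUDA issue above shows an absolute model_dir, which may be why the problem goes unnoticed in normal runs.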

Can't run "bash download.sh"

When I run "bash download.sh", this happens:

Error: ${REQUIRED[i]} is not installed.

I installed everything.

Using DrQA on a Chinese dataset

Is it expected that this code can be applied to a Chinese language dataset with only minor changes?

I understand that I will need to provide the following:

Chinese train/dev data files in the SQuAD format
GloVe word vectors trained on the Chinese language
spaCy Chinese language models
Changes in prepro.py to take care of things such as tokenization, add encoding="utf8" to file read/write statements, etc.

I would very much appreciate any insights, especially if there are known reasons why this is not expected to work.

Is there a way to know the score of the prediction to analyse whether it is right or wrong?

@hitvoice Consider the evidence and questions below:

{
  "evidence": "I am on vacation from July 31st and coming back next month",
  "question": {
    "1": "when he is going on vacation?",
    "2": "when he is returning back from vacation?"
  }
}

The answers will be:
"when he is going on vacation?": "July 31st",
"when he is returning back from vacation?": "next month",

This works as expected. But consider the case where I have not provided the return details and the evidence is just

"evidence":"I am on vacation from July 31st"

Then I get the answers below:
"when he is going on vacation?": "July 31st",
"when he is returning back from vacation?": "July 31st",

We know that the return date is not July 31st. Is there a way to get the score of the prediction and, based on some threshold, mark the answer as invalid or blank?
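
A rough sketch of the thresholding idea, assuming the model exposes per-token start and end probabilities for the passage. The function names, the scores, and the threshold of 0.5 are all hypothetical; nothing here is taken from the repository's interface.

```python
def best_span(score_s, score_e, max_len=15):
    """Return ((start, end), score) maximising score_s[start] * score_e[end]."""
    best, best_score = None, float("-inf")
    for s in range(len(score_s)):
        for e in range(s, min(s + max_len, len(score_e))):
            score = score_s[s] * score_e[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score


def answer_or_blank(tokens, score_s, score_e, threshold):
    """Return (answer_text, score); the text is "" when score < threshold."""
    span, score = best_span(score_s, score_e)
    if span is None or score < threshold:
        return "", score
    start, end = span
    return " ".join(tokens[start:end + 1]), score


tokens = ["I", "am", "on", "vacation", "from", "July", "31st"]

# Made-up scores for a question the passage answers well ...
confident = answer_or_blank(
    tokens,
    [0, 0, 0, 0, 0, 0.9, 0.1],
    [0, 0, 0, 0, 0, 0.05, 0.95],
    threshold=0.5,
)

# ... and for one where the probability mass is spread out.
unsure = answer_or_blank(
    tokens,
    [0.2, 0.1, 0.1, 0.1, 0.1, 0.3, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.3],
    threshold=0.5,
)
```

Because scores of this kind are usually normalised over the positions of a single passage, they can stay high even when the passage contains no answer. A threshold is therefore a heuristic and would need tuning on held-out data.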

Finetune against a custom dataset

Hi Runqi Yang,
Thanks for such a wonderful repo.
Could you give some quick guidance on how to fine-tune the model on a new dataset, starting from a checkpoint of a model trained on SQuAD?

Adding evidence as a database (like Wikipedia)

Say the model is trained and exported for prediction, and you want to add all the evidence as a database in "id", "txt" format, so that multiple users can query the dataset for answers. How can such datasets be added? Would that require another Python script, such as a dataset or document reader?

msgpack.exceptions.UnpackValueError: Unpack failed: error = 0

I followed all of the instructions and then got this error. Would anyone know how to go about troubleshooting this?

msgpack.exceptions.UnpackValueError: Unpack failed: error = 0

Complete Error Message:

Traceback (most recent call last):
  File "/home/samrat/Documents/StudyBuddy/Algorithms/DrQA/interact.py", line 36, in <module>
    meta = msgpack.load(f, encoding='utf8')
  File "/home/samrat/Documents/StudyBuddy/venv/lib/python3.6/site-packages/msgpack/__init__.py", line 58, in unpack
    return unpackb(data, **kwargs)
  File "msgpack/_unpacker.pyx", line 211, in msgpack._unpacker.unpackb
msgpack.exceptions.UnpackValueError: Unpack failed: error = 0

Just FYI, I trained the model in Google Colab and downloaded it from there. Could this have caused any problems? I ran interact.py in Colab and it worked fine so I am really unsure what the problem is.

Thank You in Advance!!

How long to run the model for the default params

The code is really helpful. I was just curious how long it took you to run one epoch.

Test example

How can I test this project with an example question?

Regarding train.py

For how many epochs can I train the model?
Are 40 epochs sufficient?

no model file

I ran the command python interact.py and got this error:
(pt) swapnilbhadade@hitvoice:~/pt/DrQA-1$ python interact.py
Traceback (most recent call last):
  File "interact.py", line 22, in <module>
    checkpoint = torch.load(args.model_file)
  File "/home/swapnilbhadade/pt/lib/python3.5/site-packages/torch/serialization.py", line 301, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'models/best_model.pt'