
neural-qa's Issues

Adding progress bar for loops in generator.py

On running the command:

python generator.py --templates data/annotations_monument.csv  --output data/monument_300

The output is:

39
for 1th template
for 2th template
for 3th template
...
for 39th template

This output could be replaced with a progress bar, using a library such as tqdm or progressbar.
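
A minimal sketch of the change, assuming the loop in generator.py iterates over a collection named templates (the variable name and loop body here are assumptions, not the actual code):

from tqdm import tqdm

# wrap the existing template loop in tqdm to get a live progress bar;
# `templates` and `process_template` are hypothetical stand-ins for the
# names actually used in generator.py
for template in tqdm(templates, desc="Generating", unit="template"):
    process_template(template)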

sep_dot encoding problem

The encode() method in generator_utils.py can encode a dot that belongs to an abbreviation as a <sep_dot> token, which corrupts the dataset and hurts final performance. For example:

select ?x where { dbr_Bandon_A.F.C. dbo_ceo ?x }

is encoded as 'select var_x where brack_open dbr_Bandon_A.F.C sep_dot dbo_ceo var_x brack_close'
instead of the intended 'select var_x where brack_open dbr_Bandon_A.F.C. dbo_ceo var_x brack_close'
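
One possible fix, as a sketch: only treat a dot as the SPARQL triple separator when it stands alone between whitespace, leaving dots inside identifiers such as dbr_Bandon_A.F.C. untouched (the function name is illustrative, not the one in generator_utils.py):

import re

def encode_dots(sparql):
    # replace only standalone dots (surrounded by whitespace) with sep_dot,
    # so abbreviation dots inside entity names are preserved
    return re.sub(r'(?<=\s)\.(?=\s)', 'sep_dot', sparql)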

Zero division error

The README states:

Split the data_.* files into train_.*, dev_.*, and test_.* (usually 80-10-10%).

But the code in split_in_train_dev_test.py is:

TRAINING_PERCENTAGE = 90
TEST_PERCENTAGE = 0
DEV_PERCENTAGE = 10

This causes a ZeroDivisionError later in the script because the test set remains empty.
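
A sketch of the fix, matching the 80-10-10 split described in the README and failing early if any split would be empty:

# constants matching the README's 80-10-10 split
TRAINING_PERCENTAGE = 80
TEST_PERCENTAGE = 10
DEV_PERCENTAGE = 10

# defensive guards (an addition, not in the original script): fail early
# instead of hitting a ZeroDivisionError further down
assert TRAINING_PERCENTAGE + TEST_PERCENTAGE + DEV_PERCENTAGE == 100
assert min(TRAINING_PERCENTAGE, TEST_PERCENTAGE, DEV_PERCENTAGE) > 0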

Fix tensorflow warnings

Command:

python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en

Output:

WARNING:tensorflow:From build_vocab.py:43: __init__ (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /home/petrichor/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:154: __init__ (from tensorflow.contrib.learn.python.learn.preprocessing.categorical_vocabulary) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /home/petrichor/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:170: tokenizer (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.

Following the deprecation instructions (switching to tensorflow/transform or tf.data) would resolve these warnings.
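
Since the script only needs token counts, another option is to drop tf.contrib entirely. A minimal sketch of build_vocab.py without it, assuming whitespace tokenization is adequate for this corpus (the original tokenizer may differ):

import sys
from collections import Counter

# count whitespace-separated tokens in the input file
counts = Counter()
with open(sys.argv[1]) as f:
    for line in f:
        counts.update(line.split())

# emit the vocabulary, most frequent tokens first
for token, _ in counts.most_common():
    print(token)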

Training Error

I got the following error while trying to run the command:

sh train.sh data/monument_300 120000

output:

Job id 0
Loading hparams from ../data/monument_300_model/hparams
Updating hparams.test_prefix: None -> ../data/monument_300/test
Updating hparams.num_train_steps: 12000 -> 120000
  saving hparams to ../data/monument_300_model/hparams
  saving hparams to ../data/monument_300_model/best_bleu/hparams
  attention=
  attention_architecture=standard
  batch_size=128
  beam_width=0
  best_bleu=0
  best_bleu_dir=../data/monument_300_model/best_bleu
  bpe_delimiter=None
  colocate_gradients_with_ops=True
  decay_factor=0.98
  decay_steps=10000
  dev_prefix=../data/monument_300/dev
  dropout=0.2
  encoder_type=uni
  eos=</s>
  epoch_step=0
  forget_bias=1.0
  infer_batch_size=32
  init_op=uniform
  init_weight=0.1
  learning_rate=1.0
  length_penalty_weight=0.0
  log_device_placement=False
  max_gradient_norm=5.0
  max_train=0
  metrics=[u'bleu']
  num_buckets=5
  num_embeddings_partitions=0
  num_gpus=1
  num_layers=2
  num_residual_layers=0
  num_train_steps=120000
  num_units=128
  optimizer=sgd
  out_dir=../data/monument_300_model
  pass_hidden_state=True
  random_seed=None
  residual=False
  share_vocab=False
  sos=<s>
  source_reverse=False
  src=en
  src_max_len=50
  src_max_len_infer=None
  src_vocab_file=../data/monument_300_model/vocab.en
  src_vocab_size=2228
  start_decay_step=0
  steps_per_external_eval=None
  steps_per_stats=100
  test_prefix=../data/monument_300/test
  tgt=sparql
  tgt_max_len=50
  tgt_max_len_infer=None
  tgt_vocab_file=../data/monument_300_model/vocab.sparql
  tgt_vocab_size=1763
  time_major=True
  train_prefix=../data/monument_300/train
  unit_type=lstm
  vocab_prefix=../data/monument_300/vocab
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/tarun/dbpedia/NSpM/nmt/nmt/nmt.py", line 495, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/tarun/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/tarun/dbpedia/NSpM/nmt/nmt/nmt.py", line 488, in main
    run_main(FLAGS, default_hparams, train_fn, inference_fn)
  File "/home/tarun/dbpedia/NSpM/nmt/nmt/nmt.py", line 481, in run_main
    train_fn(hparams, target_session=target_session)
  File "nmt/train.py", line 171, in train
    train_model = model_helper.create_train_model(model_creator, hparams, scope)
  File "nmt/model_helper.py", line 69, in create_train_model
    src_dataset = tf.contrib.data.TextLineDataset(src_file)
AttributeError: 'module' object has no attribute 'TextLineDataset'
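
For context, TextLineDataset graduated from tf.contrib.data to tf.data in TensorFlow 1.4, so whether the tf.contrib.data attribute exists depends on the installed release. A hedged sketch of the fix at the failing line in model_helper.py:

# TextLineDataset lives at tf.data.TextLineDataset in TF 1.4 and later
src_dataset = tf.data.TextLineDataset(src_file)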

Tensorflow version

command:

sh train.sh data/monument_300 120000

output:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/petrichor/Projects/GSoC/test/nmt/nmt/nmt.py", line 28, in <module>
    from . import inference
  File "/home/petrichor/Projects/GSoC/test/nmt/nmt/inference.py", line 25, in <module>
    from . import attention_model
  File "/home/petrichor/Projects/GSoC/test/nmt/nmt/attention_model.py", line 22, in <module>
    from . import model
  File "/home/petrichor/Projects/GSoC/test/nmt/nmt/model.py", line 31, in <module>
    utils.check_tensorflow_version()
  File "/home/petrichor/Projects/GSoC/test/nmt/nmt/utils/misc_utils.py", line 33, in check_tensorflow_version
    raise EnvironmentError("Tensorflow version must >= 1.2.1")
OSError: Tensorflow version must >= 1.2.1

The latest stable release of TensorFlow is 1.12.2 (source: https://github.com/tensorflow/tensorflow/releases).

The nmt model included as a submodule in this repository enforces "Tensorflow version must >= 1.2.1", yet its check fails even against a current installation. I cloned the latest nmt repository and it worked fine with the latest TensorFlow, so the problem lies in the outdated nmt submodule.

These are a few possible ways to tackle this issue:

  • Specify the required TensorFlow version (>= 1.2.1) in the documentation.
  • Update the attached nmt submodule (a sketch of a more tolerant version check is shown below).
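
As a sketch of what a more robust check could look like (the helper mirrors misc_utils.check_tensorflow_version, but the original's internals are an assumption), a proper version comparison accepts any release at or above the minimum, whereas a naive string comparison would reject "1.12.2" as smaller than "1.2.1":

from distutils.version import LooseVersion
import tensorflow as tf

def check_tensorflow_version(min_version="1.2.1"):
    # LooseVersion compares numerically per component, so 1.12.2 >= 1.2.1
    if LooseVersion(tf.__version__) < LooseVersion(min_version):
        raise EnvironmentError("Tensorflow version must >= %s" % min_version)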

Fix PIPELINE

The PIPELINE file in /gsoc/aman/ has a few inconsistencies and can be improved:

  • Convert it to proper Markdown format.
  • Some instructions could be elaborated further.
  • A few of the commands fail on files generated by the preceding steps: for example, step 4 had issues with the file generated in step 3.
  • The PIPELINE instead points to an earlier folder named data in the initial directory, whose files work with the code without any error (python decision_tree.py data/manual-annotation-updated-v2.csv).

Issues while running pipeline 1

I tried running the pipelines, but Python-version-related issues caused errors in pipeline 1.

The solution that worked was using
from urllib.request import urlopen and then urlopen(<url>)
instead of import urllib and then urllib.request.urlopen(<url>).
Make sure to use Python 3.7 to run the pipelines, as @panchbhai1969's code uses it.

It works because the urllib and urllib2 modules from Python 2.x have been combined into the urllib package in Python 3, as mentioned here.
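
For reference, a minimal Python 3 version of the call (url stands for whatever endpoint the pipeline queries):

from urllib.request import urlopen

# Python 3: urlopen lives in urllib.request
response = urlopen(url)
data = response.read()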

Also, while setting up the project I realised it would be better to have a requirements.txt file.
I would like to add one as my initial contribution.

Quality control on DBNQA dataset

This task consists of verifying the quality of the annotated questions in the DBNQA dataset, which involves the following:

  • check SPARQL query templates for SPARQL syntax errors (a rough sketch of this and the next check follows the list);
  • check for SPARQL queries that return empty results;
  • check that the annotated templates and SPARQL templates are well-formed (e.g. (1) no double spaces between words; (2) no spelling mistakes; (3) the question actually makes sense)
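
A rough sketch of the first two checks, assuming rdflib and SPARQLWrapper are acceptable dependencies and that the templates are valid standalone SPARQL:

from rdflib.plugins.sparql import prepareQuery
from SPARQLWrapper import SPARQLWrapper, JSON

def has_valid_syntax(query):
    # rdflib raises a parse error for syntactically invalid SPARQL
    try:
        prepareQuery(query)
        return True
    except Exception:
        return False

def returns_results(query, endpoint="https://dbpedia.org/sparql"):
    # run the query against the endpoint and check for non-empty bindings
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return len(results["results"]["bindings"]) > 0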

BLEU Accuracy

The BLEU test gave me a value of 29.5, whereas according to the paper the value should be much higher (~70).
I trained for 120000 training steps.
Here is a snapshot of the result:
[snapshot of the result]
