dbpedia / neural-qa
This project is forked from liberai/nspm.
A Neural QA Model for DBpedia using Neural SPARQL Machines.
License: MIT License
On running the command:
python generator.py --templates data/annotations_monument.csv --output data/monument_300
the output is:
39
for 1th template
for 2th template
for 3th template
...
for 38th template
for 39th template
This could be converted into a progress bar instead, using a library like tqdm or progressbar, as sketched below.
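For instance, a minimal sketch with tqdm (the loop body is a stand-in, not generator.py's actual code):

from tqdm import tqdm

templates = range(39)  # stand-in for the 39 parsed templates
for template in tqdm(templates, desc="Generating", unit="template"):
    pass  # per-template generation work would go here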
The encode() method in generator_utils.py may encode an abbreviation's trailing dot into a <sep_dot> token, which would corrupt the dataset and affect the final performance. For example:
select ?x where { dbr_Bandon_A.F.C. dbo_ceo ?x }
will be encoded into 'select var_x where brack_open dbr_Bandon_A.F.C sep_dot dbo_ceo var_x brack_close'
but not 'select var_x where brack_open dbr_Bandon_A.F.C. dbo_ceo var_x brack_close'
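A minimal sketch of one possible fix (not the repository's actual code): treat a dot as a separator token only when it stands alone, so trailing dots in entity names survive.

import re

def encode_dots(query):
    # Match a dot only when preceded by whitespace and followed by
    # whitespace or end of string, leaving dots inside names such as
    # dbr_Bandon_A.F.C. untouched.
    return re.sub(r'(?<=\s)\.(?=\s|$)', 'sep_dot', query)

print(encode_dots('select var_x where brack_open dbr_Bandon_A.F.C. dbo_ceo var_x brack_close'))
# -> select var_x where brack_open dbr_Bandon_A.F.C. dbo_ceo var_x brack_close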
Use LiberAI/NSpM as a submodule; keep the gsoc folder only in this fork.
As mentioned in the repository, to download the SubjectiveEye3D dataset for pipeline3 one has to use https://s3.amazonaws.com/subjectiveEye/0.9/subjectiveEye3D/part-r-00000.gz. However, the compressed file seems to be broken and nothing can be extracted from it.
The readme states:
Split the data_.* files into train_.*, dev_.*, and test_.* (usually 80-10-10%).
But the constants in split_in_train_dev_test.py are:
TRAINING_PERCENTAGE = 90
TEST_PERCENTAGE = 0
DEV_PERCENTAGE = 10
This causes a ZeroDivisionError later in the code because the test set remains empty.
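A minimal sketch of a split consistent with the README's 80-10-10 (the constant names follow split_in_train_dev_test.py, but the splitting logic below is illustrative, not the repository's code):

import random

TRAINING_PERCENTAGE = 80
DEV_PERCENTAGE = 10
TEST_PERCENTAGE = 10

def split(lines):
    random.shuffle(lines)
    n_train = len(lines) * TRAINING_PERCENTAGE // 100
    n_dev = len(lines) * DEV_PERCENTAGE // 100
    train = lines[:n_train]
    dev = lines[n_train:n_train + n_dev]
    test = lines[n_train + n_dev:]  # non-empty, so no ZeroDivisionError downstream
    return train, dev, test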
Command:
python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en
Output:
WARNING:tensorflow:From build_vocab.py:43: __init__ (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /home/petrichor/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:154: __init__ (from tensorflow.contrib.learn.python.learn.preprocessing.categorical_vocabulary) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /home/petrichor/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:170: tokenizer (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
These warnings can be handled by following the printed update instructions, i.e. migrating away from tf.contrib.learn preprocessing to tensorflow/transform or tf.data.
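Since the script only needs token frequencies, one way to sidestep the deprecated tf.contrib.learn preprocessing entirely is a pure-Python vocabulary builder. A minimal sketch (not the repository's actual implementation; the real build_vocab.py may also emit special tokens such as <unk>, <s>, </s>):

import sys
from collections import Counter

# Count whitespace-separated tokens across the input file.
counts = Counter()
with open(sys.argv[1]) as f:
    for line in f:
        counts.update(line.split())

# Emit one token per line, most frequent first.
for token, _ in counts.most_common():
    print(token)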
I got the following error while trying to run the command:
sh train.sh data/monument_300 120000
output:
Job id 0
Loading hparams from ../data/monument_300_model/hparams
Updating hparams.test_prefix: None -> ../data/monument_300/test
Updating hparams.num_train_steps: 12000 -> 120000
saving hparams to ../data/monument_300_model/hparams
saving hparams to ../data/monument_300_model/best_bleu/hparams
attention=
attention_architecture=standard
batch_size=128
beam_width=0
best_bleu=0
best_bleu_dir=../data/monument_300_model/best_bleu
bpe_delimiter=None
colocate_gradients_with_ops=True
decay_factor=0.98
decay_steps=10000
dev_prefix=../data/monument_300/dev
dropout=0.2
encoder_type=uni
eos=</s>
epoch_step=0
forget_bias=1.0
infer_batch_size=32
init_op=uniform
init_weight=0.1
learning_rate=1.0
length_penalty_weight=0.0
log_device_placement=False
max_gradient_norm=5.0
max_train=0
metrics=[u'bleu']
num_buckets=5
num_embeddings_partitions=0
num_gpus=1
num_layers=2
num_residual_layers=0
num_train_steps=120000
num_units=128
optimizer=sgd
out_dir=../data/monument_300_model
pass_hidden_state=True
random_seed=None
residual=False
share_vocab=False
sos=<s>
source_reverse=False
src=en
src_max_len=50
src_max_len_infer=None
src_vocab_file=../data/monument_300_model/vocab.en
src_vocab_size=2228
start_decay_step=0
steps_per_external_eval=None
steps_per_stats=100
test_prefix=../data/monument_300/test
tgt=sparql
tgt_max_len=50
tgt_max_len_infer=None
tgt_vocab_file=../data/monument_300_model/vocab.sparql
tgt_vocab_size=1763
time_major=True
train_prefix=../data/monument_300/train
unit_type=lstm
vocab_prefix=../data/monument_300/vocab
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/tarun/dbpedia/NSpM/nmt/nmt/nmt.py", line 495, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/home/tarun/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/home/tarun/dbpedia/NSpM/nmt/nmt/nmt.py", line 488, in main
run_main(FLAGS, default_hparams, train_fn, inference_fn)
File "/home/tarun/dbpedia/NSpM/nmt/nmt/nmt.py", line 481, in run_main
train_fn(hparams, target_session=target_session)
File "nmt/train.py", line 171, in train
train_model = model_helper.create_train_model(model_creator, hparams, scope)
File "nmt/model_helper.py", line 69, in create_train_model
src_dataset = tf.contrib.data.TextLineDataset(src_file)
AttributeError: 'module' object has no attribute 'TextLineDataset'
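This error usually means the installed TensorFlow predates the Dataset API: tf.contrib.data appeared around TF 1.2, and TextLineDataset moved to tf.data in TF 1.4. A quick sanity check, as a sketch:

import tensorflow as tf

print(tf.__version__)  # the Dataset API requires at least TF 1.2

# On TF 1.4+ the stable spelling is tf.data.TextLineDataset:
dataset = tf.data.TextLineDataset(['../data/monument_300/train.en'])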
command:
sh train.sh data/monument_300 120000
output:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/petrichor/Projects/GSoC/test/nmt/nmt/nmt.py", line 28, in <module>
from . import inference
File "/home/petrichor/Projects/GSoC/test/nmt/nmt/inference.py", line 25, in <module>
from . import attention_model
File "/home/petrichor/Projects/GSoC/test/nmt/nmt/attention_model.py", line 22, in <module>
from . import model
File "/home/petrichor/Projects/GSoC/test/nmt/nmt/model.py", line 31, in <module>
utils.check_tensorflow_version()
File "/home/petrichor/Projects/GSoC/test/nmt/nmt/utils/misc_utils.py", line 33, in check_tensorflow_version
raise EnvironmentError("Tensorflow version must >= 1.2.1")
OSError: Tensorflow version must >= 1.2.1
The latest stable release of TensorFlow is 1.12.2 (source: https://github.com/tensorflow/tensorflow/releases). The nmt model included as a submodule in this repository requires TensorFlow >= 1.2.1, which 1.12.2 clearly satisfies, yet the check still fails; the old check appears to compare version strings lexicographically, and "1.12.2" sorts before "1.2.1". When I cloned the latest nmt repository, it worked fine with the latest TensorFlow installation, so the problem is the outdated nmt submodule.
A few possible ways to tackle this issue: update the nmt submodule to the current upstream, or patch the version check to compare version components numerically, as sketched below.
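A demonstration of the string-comparison pitfall and a numeric alternative (illustrative, not the actual nmt code):

# Lexicographic comparison misorders dotted versions:
print('1.12.2' < '1.2.1')  # True -- string comparison, not numeric

# Comparing integer components behaves correctly:
def version_tuple(v):
    return tuple(int(part) for part in v.split('.'))

print(version_tuple('1.12.2') >= version_tuple('1.2.1'))  # True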
The PIPELINE file in /gsoc/aman/ has a few inconsistencies and can be improved.
python decision_tree.py data/manual-annotation-updated-v2.csv
I tried running the pipelines, but I was getting errors in pipeline 1 because of some Python version related issues.
The solution which worked was using
from urllib.request import urlopen
and then urlopen(<url>), instead of import urllib and then urllib.request.urlopen(<url>).
Make sure to use Python 3.7 to run the pipelines, as @panchbhai1969's code uses it. This works because the urllib and urllib2 modules from Python 2.x have been combined into the urllib module in Python 3, as mentioned here.
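A self-contained sketch of the working Python 3 form (the URL is just an example, not one the pipelines use):

from urllib.request import urlopen

# Fetch a page and report the HTTP status and payload size.
with urlopen('https://dbpedia.org') as response:
    print(response.status, len(response.read()))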
Also, while setting up the project I realised it would be better to have a requirements.txt file. I would like to add one as my initial contribution.
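A minimal sketch of what such a file might contain, limited to packages mentioned in these issues (the pins are illustrative assumptions, not verified project dependencies):

tensorflow==1.12.2  # latest stable release referenced above
tqdm                # for the suggested generator progress bar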
Remove the double space in build_vocab.py
This task consists of verifying the quality of the annotated questions in the DBNQA dataset.