nspm's Issues

Duplicate Initialization during Training Process

While executing the demo training step from the README, I saw these two statements printed every epoch, so I suppose there may be some redundant code in train.py:

Table trying to initialize from file ../data/monument_300_model/vocab.en is already initialized.
Table trying to initialize from file ../data/monument_300_model/vocab.en is already initialized.

Trying to solve it.

What does ppl mean?

Thanks for your excellent work on this interesting topic.
When training with the provided monument_300 data, I saw output like this:

step 4100 lr 1 step-time 2.35s wps 2.36K ppl 64.10 gN 3.08 bleu 2.74, Sat Dec 8 14:28:50 2018
Can you tell me what ppl and gN mean? And why is the BLEU score so small?
Thank you very much.
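For what it's worth, judging from nmt/model_helper.py (quoted in a later issue below), ppl is the perplexity, the exponential of the average per-token loss:

perplexity = utils.safe_exp(total_loss / total_predict_count)

gN is most likely the global gradient norm logged at each step, and a low BLEU score this early in training is expected; it should rise as training progresses.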

Conflicting dependencies

When trying to install the dependencies with
pip install -r requirements.txt
several errors concerning conflicting dependencies are raised.
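One way to narrow the conflicts down (a sketch, not an official fix) is to install into a clean virtual environment so that the resolver only sees this project's pins:

python3 -m venv .venv
. .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt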

How true

Space: the final frontier. These are the voyages of the starship Enterprise. On its five-year mission... to explore new worlds... to seek out new forms of life and new civilizations... boldly going where no man has ever been.

Corrupted file

unzip art_30.zip
Archive: art_30.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of art_30.zip or
art_30.zip.zip, and cannot find art_30.zip.ZIP, period.
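One common cause of this error is a truncated or HTML-error-page download rather than a genuinely corrupted archive. As a first check (sketch):

unzip -t art_30.zip        # test the archive's integrity
ls -l art_30.zip           # compare the size against the file on the server

If the test fails, re-downloading the file usually resolves it.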

Issues in the monument_300 zip file in the data folder

There are some issues with the build_vocab script in the monument_300 zip file in the data folder: some required libraries are not imported, and it still contains Python 2-only code. A corrected version:

import sys

import numpy as np
from tensorflow.contrib import learn

# reload(sys) and sys.setdefaultencoding("utf-8") only exist in Python 2,
# so both lines can simply be removed under Python 3.

x_text = list()
# The input file is passed as the first command-line argument,
# e.g. python build_vocab.py data/monument_300/data_300.en
with open(sys.argv[1]) as f:
    for line in f:
        # unicode() no longer exists in Python 3; str is already unicode,
        # so the call can be dropped.
        x_text.append(line[:-1])

# x_text = ['This is a cat','This must be boy', 'This is a a dog']
max_document_length = max([len(x.split(" ")) for x in x_text])

## Create the VocabularyProcessor object, setting the max length of the documents.
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

## Transform the documents using the vocabulary.
x = np.array(list(vocab_processor.fit_transform(x_text)))

## Extract the word:id mapping from the object.
vocab_dict = vocab_processor.vocabulary_._mapping

## Sort the vocabulary dictionary on the basis of the values (ids).
sorted_vocab = sorted(vocab_dict.items(), key=lambda x: x[1])

## Treat the ids as indexes into a list and create a list of words in ascending
## order of ids: the word with id i goes at index i of the list.
vocabulary = list(list(zip(*sorted_vocab))[0])

for v in vocabulary:
    print(v)
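With these fixes the script runs under Python 3 using the same invocation as the README:

python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en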

Deprecated Tensorflow functions

While running

python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en    

The Python interpreter gives the following warning:

WARNING:tensorflow:From build_vocab.py:43: init (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.

We can use tensorflow/transform or tf.data instead, to keep up with recent updates (as suggested by the warning itself); a sketch of one possible replacement is below.
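A minimal drop-in sketch that avoids the deprecated VocabularyProcessor entirely, assuming all we need is the list of tokens in order of first appearance (this roughly mirrors the id order the processor assigns, though the processor also reserves an id for unknown tokens; this is not the project's official fix):

import sys

vocab = {}
with open(sys.argv[1]) as f:
    for line in f:
        for token in line.rstrip("\n").split(" "):
            # dict insertion order preserves first appearance (Python 3.7+)
            vocab.setdefault(token, len(vocab))

for token in vocab:
    print(token)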

ZeroDivisionError while training

File "nmt/model_helper.py", line 444, in compute_perplexity
perplexity = utils.safe_exp(total_loss / total_predict_count)
ZeroDivisionError: integer division or modulo by zero
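A possible guard (a hypothetical sketch, not the upstream fix): total_predict_count is zero when the dev/test files are empty or fail to load, so failing early with a clear message is friendlier than the ZeroDivisionError:

if total_predict_count == 0:
    raise ValueError("no predictions counted; check that the dev/test files exist and are non-empty")
perplexity = utils.safe_exp(total_loss / total_predict_count)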

fault in tensorflow version check

Wrong output from the TensorFlow version check in NSpM/nmt/nmt/utils/misc_utils.py:
EnvironmentError: Tensorflow version must >= 1.2.1

changes to be made
from

if tf.__version__ < "1.2.1":
  raise EnvironmentError("Tensorflow version must >= 1.2.1")

to

if tf.__version__ < "1.02.1":
    raise EnvironmentError("Tensorflow version must >= 1.02.1")

Where is the LC-QuAD dataset?

The LC-QuAD data set has 5000 pairs, but when I generated it from the LC-QuAD csv file in the data path, the result exceeded hundreds of thousands of sentence pairs. Can you please help me generate the accurate LC-QuAD data set?

Possible fault in version check implemented in nmt (linked to tensorflow's implementation)

The version check in nmt/utils/misc_utils.py has a snippet of code for checking the version of TF installed on the local machine

def check_tensorflow_version():
  if tf.__version__ < "1.2.1":
    raise EnvironmentError("Tensorflow version must >= 1.2.1")

I have tested the code using both the current version of TF and TF-nightly, as the authors recommend in their README.md. The current version is 1.12, and the check fails because it compares the version strings lexicographically rather than numerically.

I have created an issue on the main project of nmt as well, but since this would affect the working of this project as well, I am also opening an issue here.
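A more robust check (a sketch; upstream may prefer a different utility) compares parsed version components instead of raw strings:

from distutils.version import LooseVersion

def check_tensorflow_version():
  if LooseVersion(tf.__version__) < LooseVersion("1.2.1"):
    raise EnvironmentError("Tensorflow version must >= 1.2.1")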

Deprecation warning in build_vocab.py

Environment

  • Python 3.7.6
  • tensorflow==1.14.0

Log

$ python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en
WARNING:tensorflow:From build_vocab.py:44: VocabularyProcessor.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:154: CategoricalVocabulary.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.categorical_vocabulary) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:170: tokenizer (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.

Fixes in README.md

It may seem trivial, but README.md files are the first point of information that a person refers to when trying to understand a project. The repository's README.md is good, but I have found some errors that need attention.

NUMLINES= $(echo awk '{ print $1}' | cat data/monument_300/data_300.sparql | wc -l)
There shouldn't be any space after the =; the line should be:
NUMLINES=$(echo awk '{ print $1}' | cat data/monument_300/data_300.sparql | wc -l)

Other README.md related issues and suggestions brought up by users are #12 #14 #17
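Incidentally, if the intent is just to count the lines (an assumption on my part), the pipeline through echo and awk can be dropped entirely:

NUMLINES=$(wc -l < data/monument_300/data_300.sparql)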

cat: output.txt: No such file or directory

On running the ./ask.sh script I'm getting the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/NSpM/nmt/nmt/nmt.py", line 707, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/NSpM/nmt/nmt/nmt.py", line 700, in main
    run_main(FLAGS, default_hparams, train_fn, inference_fn)
  File "/content/NSpM/nmt/nmt/nmt.py", line 658, in run_main
    save_hparams=(jobid == 0))
  File "/content/NSpM/nmt/nmt/nmt.py", line 607, in create_or_load_hparams
    hparams = extend_hparams(hparams)
  File "/content/NSpM/nmt/nmt/nmt.py", line 493, in extend_hparams
    unk=vocab_utils.UNK)
  File "/content/NSpM/nmt/nmt/utils/vocab_utils.py", line 137, in check_vocab
    raise ValueError("vocab_file '%s' does not exist." % vocab_file)
ValueError: vocab_file '../data/monument_300/vocab.en' does not exist.
# Job id 0
# Devices visible to TensorFlow: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 2340982298104704118), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 7897109989793672363)]
# Creating output directory ../data/monument_300_model ...

ANSWER IN SPARQL SEQUENCE:
cat: output.txt: No such file or directory

Can someone please help me with this?
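For anyone hitting the same error: the traceback says vocab_file '../data/monument_300/vocab.en' does not exist, so a likely fix (a sketch, assuming the vocabulary files were simply never generated) is to run the README's vocabulary step for both sides before ./ask.sh:

python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en
python build_vocab.py data/monument_300/data_300.sparql > data/monument_300/vocab.sparql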

Issues while running pipeline 1

I tried running the pipelines, but because of some Python version related issues I was getting errors in pipeline 1.

The solution which worked was using
from urllib.request import urlopen and then urlopen(<url>)
instead of import urllib and then urllib.request.urlopen(<url>).
Make sure to use Python 3.7 to run the pipelines, as @panchbhai1969's code uses it.

It works because
the urllib and urllib2 modules from Python 2.x have been combined into the urllib module in Python 3, as mentioned here.
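In code, the fix looks like this (a sketch with a placeholder URL):

from urllib.request import urlopen

with urlopen("http://example.org/resource") as response:
    data = response.read()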

Also, while setting up the project I realised it will be better to have a requirements.txt file.
I would like to do it too as my initial contribution.

How to use "analyse.sh" and "filter_dataset.py"?

Hello everyone

First of all, thank you for sharing the code of this project.
I was able to train the model and make some predictions, but now I want to find the shortcomings of the model, so I want to analyze on which questions/queries the model performs well.

I found the "analyse.sh" script and the "filter_dataset.py".
Now I want to ask you what's the purpose of these files and how to use them.

Thank you for your time
Kind regards
Nicolas

adding requirements.txt

To specify the packages and the versions required for the project, e.g.:
enum34
numpy
tensorflow==1.2.0

Error in path in ask.sh?

Hey, I don't know if I am just misunderstanding the instructions, but the project kept giving me ValueError: "Can't load save_path when it is None." whenever I tried to follow the instructions in the readme precisely.

I believe it would work correctly with your readme instructions if you would change this part of ask.sh:
python -m nmt.nmt --vocab_prefix=../$1/vocab --model_dir=../$1_model --inference_input_file=./to_ask.txt --inference_output_file=./output.txt --out_dir=../$1_model --src=en --tgt=sparql | tail -n4

to this:
python -m nmt.nmt --vocab_prefix=../$1/vocab --model_dir=../$1 --inference_input_file=./to_ask.txt --inference_output_file=./output.txt --out_dir=../$1 --src=en --tgt=sparql | tail -n4
(the $1_model suffix duplicates the word oddly in the folder names)

Please, feel free to correct me if I am wrong, and thank you for your awesome paper.

Fix prediction script

One of the last TensorFlow updates broke the ./ask.sh script. It probably has to be rewritten from scratch.

Duplicates in movies dataset

Hello!

Firstly thank you very much for your repository and research. This is a very interesting field. I am currently using your monument dataset as the training data in my master thesis.

I noticed you uploaded a new dataset called movies_300.zip several days ago. I intended to try it in my experiments as well, but I found that it has many duplicate lines in the training file (e.g. "how long is the longest movie" appears 227 times in 'train.en').
Could you explain what is the reason for that? Is it appropriate to use this dataset for training or this dataset is just made for other tasks?

Thank you and best regards
Xiaoyu

Tiny fix in example of Readme.md

Hi, a very small thing, but when running the example presented in the Readme.md, in the "Interpreter Module" section, the argument "--output" is not currently supported. Here is the fixed line of code.

Current:
python nspm/interpreter.py --input data/art_30 --output data/art_30 --query "yuncken freeman has architected in how many cities?"

New:
python nspm/interpreter.py --input data/art_30 --query "yuncken freeman has architected in how many cities?"

Attention for LSTM

Hello,
I really like your work on using seq2seq for creating SPARQL queries; just one question: was there a specific reason not to include attention while training? As far as I understood the tensorflow NMT guide, you would have to add something like --attention=scaled_luong to the options in your train.sh. Did you evaluate whether it works better with/without attention?
Greetings!
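For reference, the change would presumably look like this inside train.sh (a sketch; the other options are whatever the script already passes):

python -m nmt.nmt --attention=scaled_luong ...  # existing train.sh options unchanged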

Possible need to rename files

While splitting the data file into train, dev and test sets by running the following commands given in the README.md

cd data/monument_300/
python ../../split_in_train_dev_test.py --lines $NUMLINES  --dataset data.sparql

I ran into the following error:

Traceback (most recent call last):
  File "../../split_in_train_dev_test.py", line 42, in <module>
    with open(sparql_file) as original_sparql, open(en_file) as original_en:
IOError: [Errno 2] No such file or directory: 'data.sparql'

which can be solved by renaming the files in monument_300 (data_300.sparql and data_300.en to data.sparql and data.en)
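In shell form, the rename described above (sketch):

cd data/monument_300/
mv data_300.sparql data.sparql
mv data_300.en data.en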

How to deal with out of vocabulary words?

Hi,

I recently utilized the technique that has been discussed in this project for transforming a natural language sentence into a SPARQL query. Based on this, I created an end-to-end question answering system as part of my final year project. The system works well for known resource names; however, for questions which contain out-of-vocabulary words (resource names/words not part of the training data), the system does not predict an accurate query.

In the Neural Machine Translation for Query Construction paper, it says that external pre-trained word embeddings help deal with vocabulary mismatch. I am not sure how this would be implemented, could you provide any insight? I am already finished with the project but I would still like to learn about this.

The project I created is available on GitHub and can be found here if you would like to see. There's also a deployed version of the system and can be found here.

Thanks for the help in advance.
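A generic illustration of the idea (not this project's code, and the helper name is made up): initialize the encoder's embedding matrix from pre-trained vectors such as GloVe, so that words outside the training data still start from meaningful representations:

import numpy as np

def build_embedding_matrix(vocabulary, pretrained, dim=300):
    # vocabulary: list of tokens; pretrained: dict mapping token -> np.ndarray of shape (dim,)
    matrix = np.random.uniform(-0.1, 0.1, (len(vocabulary), dim)).astype(np.float32)
    for i, token in enumerate(vocabulary):
        vec = pretrained.get(token)
        if vec is not None:
            # copy the pre-trained vector; tokens without one keep their random init
            matrix[i] = vec
    return matrix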

Update README.md to indicate the use of Python 2.7

Since support for Python 2 is now being rescinded, it would be great if the README.md could indicate that the code was written in Python 2, so any future developers could set up the appropriate development environment.
