frankwork / conv_relation
TensorFlow implementation of Relation Classification via Convolutional Deep Neural Network
According to the documentation in scorer.pl:
In the examples above, the first three files are OK, while the last one contains four errors. And answer_key2.txt contains the true labels for the training dataset.
So the first argument to scorer.pl should be the network's predictions, and the second argument should be test_keys.txt. But in your log files I found this line:
!!!WARNING!!! The proposed file contains 1 label(s) of type 'Entity-Destination(e2,e1)', which is NOT present in the key file.
It seems that you passed test_keys.txt as the first argument and the network's predictions as the second.
Hello, running the code directly produces the following error:
Parent directory of saved_models/cnn-200-50/model.ckpt doesn't exist, can't save.
How can this be fixed?
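A likely fix (my suggestion, not part of the repo): create the checkpoint directory before TensorFlow's saver tries to write to it. The path below is taken from the error message; adjust it if your config differs.

```python
# Create the missing checkpoint directory reported in the error message.
import os

os.makedirs('saved_models/cnn-200-50', exist_ok=True)  # no-op if it already exists
```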
Why is the lexical feature vector reshaped to 6 × word_dim? What does the 6 stand for?
Hello,
I couldn't understand why rid is included when building the tf.train.SequenceExample(), in the following lines:
rid = raw_example.label
ex.context.feature['rid'].int64_list.value.append(rid)
Does this mean that rid is treated as a feature to train the CNN on?
Thanks in advance
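For what it's worth, my understanding of the layout (a pure-Python sketch of what the SequenceExample holds, not the repo's code or the TensorFlow API): rid is the relation label (it comes from raw_example.label), stored in the example's context so it stays attached to the serialized record; the CNN's actual inputs live in the feature lists.

```python
def pack_example(token_ids, rid):
    """Sketch of the SequenceExample layout: context vs. feature_lists."""
    return {
        'context': {'rid': [rid]},               # per-example label, not a model input
        'feature_lists': {'tokens': token_ids},  # variable-length CNN inputs
    }

ex = pack_example([4, 8, 15], rid=2)
```

So rid is not a feature the CNN trains on; it is the supervision target, packed alongside the inputs so parsing a record recovers both.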
Could you provide the original dataset?
The position should be calculated for each word in the sentence, relative to the two target entity words.
An example was given in the paper:
People have been moving back into downtown
The corresponding embedding for "moving", relative to "People" and "downtown", should be:
[WordVec, 3, -3]
What the code actually uses is the position of the two entity words themselves. This largely defeats the purpose of position embeddings, which are meant to capture structural features of the sentence.
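The intended computation can be sketched as follows (position_features is a hypothetical helper, not the repo's code): each word gets its signed distance to each of the two entity head words.

```python
def position_features(tokens, e1_idx, e2_idx):
    """Signed distance of every token to the two entity words."""
    return [(i - e1_idx, i - e2_idx) for i in range(len(tokens))]

sent = 'People have been moving back into downtown'.split()
# "People" is entity 1 (index 0), "downtown" is entity 2 (index 6);
# "moving" (index 3) gets (3, -3), matching the paper's example.
feats = position_features(sent, 0, 6)
```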
Hello! I would like to know whether you have modified your model into a PCNN, and whether that source code is publicly available.
Hello, in the datasets ending in ".cln" under the data folder, what do the first five numbers represent? Are they the five word-level lexical features? Why are they numbers?
Is the data-preprocessing code available?
Thanks.
I changed the code to do validation on a dev set instead of the test set. But when I wanted to test the model on my test set, I got an error when mapping words to ids (reader/base.py), because the vocab.txt file was built only from the train and dev data.
Do I need to use all three of the train, dev, and test sets to build vocab.txt?
Wouldn't that be a heavy constraint when using the model to predict on new data whose vocabulary we don't know in advance?
Thanks in advance for your answer.
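A common workaround (my suggestion, not the repo's code) is to reserve an `<unk>` id when building vocab.txt from the training data, and to map any word unseen at test time to it, so the vocabulary never has to cover future data:

```python
UNK = '<unk>'

def build_vocab(sentences):
    """Assign ids from training sentences only; id 0 is reserved for <unk>."""
    vocab = {UNK: 0}
    for sent in sentences:
        for w in sent:
            vocab.setdefault(w, len(vocab))
    return vocab

def words_to_ids(words, vocab):
    """Map words to ids, falling back to the <unk> id for unseen words."""
    return [vocab.get(w, vocab[UNK]) for w in words]
```

With this, the model still only learns embeddings for training-time words, but inference on unseen vocabulary no longer crashes.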
Hi,
Thanks for the released code. I ran it and got an accuracy of 0.779 but no F1-score (in fact, the F1-score is usually about 4-5% lower than the accuracy). However, the F1-score in the paper is roughly 80-82% (excluding the WordNet lexical features).
So I wonder whether the paper uses tricks that are missing here. Have you been able to reproduce the paper's result?
Thanks.
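For reference, SemEval-2010 Task 8 reports macro-averaged F1 over the relation classes excluding Other, which typically comes out a few points below raw accuracy. A simplified pure-Python version (my sketch, not the official scorer.pl, which also handles directionality details):

```python
def macro_f1(gold, pred, excluded=('Other',)):
    """Macro-averaged F1 over all classes except those in `excluded`."""
    labels = sorted((set(gold) | set(pred)) - set(excluded))
    f1s = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0
```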
Hi, thanks for releasing the source code.
I got a DataLossError when running the code. Your log files show the code running fine, so I do not know what went wrong on my machine.
Do you have any idea what happened?
Thank you very much.
Best,
Dat.
2018-01-12 16:43:37.268221: W tensorflow/core/framework/op_kernel.cc:1192] Data loss: truncated record at 5986508
Traceback (most recent call last):
File "src/train.py", line 170, in <module>
tf.app.run()
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "src/train.py", line 164, in main
train(sess, m_train, m_valid)
File "src/train.py", line 103, in train
_, loss, acc = sess.run(fetches)
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: truncated record at 5986508
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,6], [?], [?,?], [?,?], [?,?]], output_types=[DT_INT64, DT_INT64, DT_INT64, DT_INT64, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Caused by op u'IteratorGetNext', defined at:
File "src/train.py", line 170, in <module>
tf.app.run()
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "src/train.py", line 139, in main
train_data, test_data, word_embed = base_reader.inputs()
File "/Users/dqnguyen/workspace/RelationExtraction/conv_relation-master/src/reader/base.py", line 317, in inputs
pad_value, shuffle=True)
File "/Users/dqnguyen/workspace/RelationExtraction/conv_relation-master/src/reader/base.py", line 280, in read_tfrecord_to_batch
batch = iterator.get_next()
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 259, in get_next
name=name))
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 706, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/Users/dqnguyen/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
DataLossError (see above for traceback): truncated record at 5986508
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,6], [?], [?,?], [?,?], [?,?]], output_types=[DT_INT64, DT_INT64, DT_INT64, DT_INT64, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
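In my experience, "truncated record" usually means a partially written TFRecord file, e.g. from an interrupted preprocessing run. Deleting the cached records and letting the reader regenerate them often resolves it. The data directory and *.tfrecord pattern below are assumptions, not the repo's exact layout.

```python
import glob
import os

def clear_tfrecords(data_dir='data'):
    """Remove cached TFRecord files so the reader rebuilds them from scratch."""
    removed = []
    for path in glob.glob(os.path.join(data_dir, '*.tfrecord')):
        os.remove(path)
        removed.append(path)
    return removed
```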
Hello Frank, @FrankWork
I think you have implemented the model well. However, it seems you made a mistake: you use the test dataset as the validation dataset (which is strictly forbidden in ML), so the performance shown in your log files is not reliable.
Maybe I have misunderstood your code. Looking forward to your reply.