strubell / lisa

Linguistically-Informed Self-Attention implemented in TensorFlow

License: Apache License 2.0

Python 48.00% Shell 0.57% Perl 51.42%

lisa's Introduction

LISA: Linguistically-Informed Self-Attention

This is a work-in-progress, but much-improved, re-implementation of the linguistically-informed self-attention (LISA) model described in the following paper:

Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. Linguistically-Informed Self-Attention for Semantic Role Labeling. Conference on Empirical Methods in Natural Language Processing (EMNLP). Brussels, Belgium. October 2018.

To exactly replicate the results in the paper at the cost of an unpleasantly hacky codebase, you can use the original LISA code here.

Requirements:

>= Python 3.6
>= TensorFlow 1.9 (tested up to 1.12)

Quick start:

Data setup (CoNLL-2005):

Get pre-trained word embeddings (GloVe):

wget -P embeddings http://nlp.stanford.edu/data/glove.6B.zip
unzip -j embeddings/glove.6B.zip glove.6B.100d.txt -d embeddings

Get CoNLL-2005 data in the right format using the preprocess-conll05 repo (https://github.com/strubell/preprocess-conll05). Follow the instructions all the way through further preprocessing.
Make sure the correct data paths are set in config/conll05.conf

Train a model:

To train a model with save directory model using the configuration conll05-lisa.conf:

bin/train.sh config/conll05-lisa.conf --save_dir model

Evaluate a model:

To evaluate the latest checkpoint saved in the directory model:

bin/evaluate.sh config/conll05-lisa.conf --save_dir model

Evaluate an exported model:

To evaluate the best¹ checkpoint so far, saved in the directory model (with id 1554216594):

bin/evaluate-exported.sh config/conll05-lisa.conf --save_dir model/export/best_exporter/1554216594

Training

The bin/train.sh script calls src/train.py, the entry point for training, with parameters specified in a top-level config (e.g. conll05-lisa.conf). The following table describes the command-line parameters that may be passed to src/train.py to configure training:

Name	Type	Description	Default value
`train_files`	string	Comma-separated list of training data files.	None
`dev_files`	string	Comma-separated list of development data files.	None
`save_dir`	string	Directory to save models, outputs, etc. If the directory already exists and contains a trained model, training will restart where it left off. Vocabularies will be re-used.	None
`transition_stats`	string	File containing pre-computed transition statistics between labels. Tab-separated file with one label-label-probability triple per line.	None
`hparams`	string	Comma separated list of `name=value` hyperparameter settings.	None
`debug`	boolean	Whether to run in debug mode: a little faster and smaller.	False
`data_config`	string	Path to data configuration json.	None
`model_configs`	string	Comma-separated list of paths to model configuration json.	None
`task_configs`	string	Comma-separated list of paths to task configuration json.	None
`layer_configs`	string	Comma-separated list of paths to layer configuration json.	None
`attention_configs`	string	Comma-separated list of paths to attention configuration json.	None
`keep_k_best_models`	int	Number of best models to keep.	1
`best_eval_key`	string	Key corresponding to the evaluation to be used for determining early stopping. The value must correspond to a named eval under the `eval_fns` entry in a task config.	None
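
As a rough illustration of the `transition_stats` file format described in the table (one tab-separated label-label-probability triple per line), the sketch below reads such a file into a dictionary. The function name `read_transition_stats` and the example labels and probabilities are made up for illustration; this is not the loading code used in src/train.py.

```python
import io


def read_transition_stats(lines):
    """Parse `label<TAB>label<TAB>probability` lines into a dict keyed by label pair."""
    transitions = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        prev_label, next_label, prob = line.split("\t")
        transitions[(prev_label, next_label)] = float(prob)
    return transitions


# Made-up file contents; a real file holds statistics pre-computed from data.
example_file = io.StringIO("B-A0\tI-A0\t0.75\nB-A0\tO\t0.25\n")
stats = read_transition_stats(example_file)
print(stats[("B-A0", "I-A0")])
```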

Hyperparameters

The following table lists optimization/training hyperparameters that can be set through the hparams command line flag. Hyperparameters are first initialized to the default values defined in src/constants.py. These are then overridden by hyperparameters set in the model config (e.g. glove_basic.json), which are in turn overridden by hyperparameters specified at the command line. Hyperparameter loading is implemented in src/train_utils.py.

Name	Type	Description	Default value
`learning_rate`	float	Initial learning rate.	0.04
`beta1`	float	Adam first moment decay rate.	0.9
`beta2`	float	Adam second moment decay rate.	0.98
`epsilon`	float	Adam epsilon.	1e-12
`decay_rate`	float	Exponential rate of decay for learning rate.	1.5
`use_nesterov`	boolean	Whether to use Nesterov momentum in Adam.	true
`decay_steps`	int	If `warmup_steps` is not set, perform stepwise decay of learning rate every this many steps.	5000
`warmup_steps`	int	Number of training steps to linearly increase learning rate before exponential decay.	8000
`batch_size`	int	Approximate number of sentences per batch.	256
`shuffle_buffer_multiplier`	int	Value to multiply by batch size to determine buffer size for efficient shuffling of examples during training. Higher means better shuffles, lower means less initial time required to fill shuffle buffer.	100
`eval_throttle_secs`	int	Do not run evaluation unless at least this many seconds have passed since the last evaluation.	1000
`eval_every_steps`	int	Evaluate every this many steps.	1000
`num_train_epochs`	int	Iterate through the full training data this many times.	10000
`gradient_clip_norm`	float	Clip gradients to this maximum value.	5.0
`label_smoothing`	float	Amount of label corruption for smoothing. Smoothing not performed if this value is 0.	0.1
`moving_average_decay`	float	Rate of decay for moving average of model parameters. Averaging not performed if this value is 0.	0.999
`average_norms`	boolean	Whether to average variables representing norms in parameter averaging.	false
`input_dropout`	float	Dropout rate on input layer (embeddings).	1.0
`bilinear_dropout`	float	Dropout rate used in bilinear classifier.	1.0
`mlp_dropout`	float	Dropout rate used in MLP layers.	1.0
`attn_dropout`	float	Dropout rate on attention in transformer.	1.0
`ff_dropout`	float	Dropout rate in feed-forward layer in transformer.	1.0
`prepost_dropout`	float	Dropout rate applied before and after the feed-forward part of transformer layer.	1.0
`random_seed`	int	Random seed to use for training.	time.time()

Model hyperparameters (e.g. layer size, number of self-attention heads) are set in the model config json.
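
The override order described in this section (defaults, then model config, then command line) can be sketched as follows. This is a minimal illustration, not the code in src/train_utils.py: the helper names `parse_hparams`, `cast_like` and `resolve_hparams` are hypothetical, the model config value is made up, and only three of the defaults from the table are included.

```python
def parse_hparams(hparams_str):
    """Split a comma-separated `name=value` string into a dict of raw strings."""
    overrides = {}
    for item in hparams_str.split(","):
        if not item.strip():
            continue  # tolerate empty input and trailing commas
        name, value = item.split("=", 1)
        overrides[name.strip()] = value.strip()
    return overrides


def cast_like(current, raw):
    """Cast a raw string to the type of the value it overrides."""
    if isinstance(current, bool):  # check bool first: bool is a subclass of int
        return raw.lower() == "true"
    return type(current)(raw)


def resolve_hparams(defaults, model_config, hparams_str):
    """Apply defaults, then model config values, then command line overrides."""
    resolved = dict(defaults)
    resolved.update(model_config)
    for name, raw in parse_hparams(hparams_str).items():
        # Raises KeyError for names that have no default (an assumption of this sketch).
        resolved[name] = cast_like(resolved[name], raw)
    return resolved


defaults = {"learning_rate": 0.04, "batch_size": 256, "use_nesterov": True}
model_config = {"learning_rate": 0.06}  # made-up model config value
resolved = resolve_hparams(defaults, model_config, "batch_size=128,use_nesterov=false")
print(resolved)
```

Here the model config changes `learning_rate`, and the command line string changes `batch_size` and `use_nesterov`; anything not mentioned keeps its default.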

Evaluation

TODO

Custom configuration [WIP]

LISA model configuration is defined through a combination of configuration files. A top-level config defines a specific model configuration and dataset by setting other configurations. Top-level configs are written in bash, and bottom-level configs are written in json. Here is an example top-level config, conll05-lisa.conf, which defines the basic LISA model and CoNLL-2005 data:

# use CoNLL-2005 data  
source config/conll05.conf  
  
# take glove embeddings as input  
model_configs=config/model_configs/glove_basic.json  
  
# joint pos/predicate layer, parse heads and labels, and srl  
task_configs="config/task_configs/joint_pos_predicate.json,config/task_configs/parse_heads.json,config/task_configs/parse_labels.json,config/task_configs/srl.json"  
  
# use parse in attention  
attention_configs="config/attention_configs/parse_attention.json"  
  
# specify the layers  
layer_configs="config/layer_configs/lisa_layers.json"

And the top-level data config for the CoNLL-2005 dataset that it loads, conll05.conf:

data_config=config/data_configs/conll05.json  
data_dir=$DATA_DIR/conll05st-release-new  
train_files=$data_dir/train-set.gz.parse.sdeps.combined.bio  
dev_files=$data_dir/dev-set.gz.parse.sdeps.combined.bio  
test_files=$data_dir/test.wsj.gz.parse.sdeps.combined.bio,$data_dir/test.brown.gz.parse.sdeps.combined.bio

Note that $DATA_DIR is a bash global variable, but all the other variables are defined in these configs.

There are five types of bottom-level configurations, specifying different aspects of the model:

data configs: Data configs define a mapping from columns in a one-word-per-line formatted file (e.g. the CoNLL-X format) to named features and labels that will be provided to the model as batches.
model configs: Model configs define hyperparameters, both model hyperparameters, like various embedding dimensions, and optimization hyperparameters, like learning rate. Optimization hyperparameters can be reset at the command line using the hparams command line parameter, which takes a comma-separated list of name=value hyperparameter settings. Model hyperparameters cannot be redefined in this way, since this would invalidate a serialized model.
task configs: Task configs define a task: label, evaluation, and how predictions are formed from the model. Each task (e.g. SRL, parse edges, parse labels) should have its own task config.
layer configs: Layer configs attach tasks to layers, defining which layer representations should be trained to predict named labels (from the data config). The number of layers in the model is determined by the maximum depth listed in layer configs.
attention configs (optional): Attention configs define special attention functions which replace attention heads, i.e. syntactically-informed self attention. Omitting any attention configs results in a model performing simple single- or multi-task learning.

How these different configuration files work is specified in more detail below.

Data configs

A full example data config can be seen here: conll05.json.

Each top-level entry in the json defines a named feature or label that will be provided to the model. The following table describes the possible parameters for configuring how each input is interpreted.

Field	Type	Description	Default value
`conll_idx`	int or list	Column in the data file corresponding to this input.	N/A (required)
`vocab`	string	Name of the vocabulary used to map this (string) input to int.	None (output of converter is int)
`type`	string	Type of `conll_idx`. Possible types are: range, other (int/list). "range" can be used to specify that a variable-length range of columns should be read in at once and passed to the converter. Otherwise, the given single int or list of columns is read in and passed to the converter.	"other" (int/list)
`feature`	boolean	Whether this input should be used as a feature, i.e. provided to the model as input.	false
`label`	boolean	Whether this input should be used as a label, i.e. provided to the model as a label.	false
`updatable`	boolean	Whether this vocab should be updated after its initial creation (i.e. after creating a vocab based on the training data).	false
`converter`	json	A json object defining a function (name and, optionally, parameters) for converting the raw input. These functions are defined in `src/data_converters.py`.	`idx_list_converter`
`oov`	boolean	Whether an `OOV` entry should be added to this input's vocabulary.	false
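
To make the table concrete, here is a small sketch that fills in the documented defaults for each entry of a data config. The entry names (`word`, `gold_pos`), the column indices and the vocab names are invented for illustration and are not taken from conll05.json; the `name` key of the converter object is also an assumption (the text only confirms a `params` field).

```python
import json

# Defaults as listed in the table above; `conll_idx` has none because it is required.
FIELD_DEFAULTS = {
    "vocab": None,
    "type": "other",
    "feature": False,
    "label": False,
    "updatable": False,
    "converter": {"name": "idx_list_converter"},  # "name" key is assumed
    "oov": False,
}


def with_defaults(entry):
    """Return a copy of one data config entry with missing fields set to defaults."""
    if "conll_idx" not in entry:
        raise ValueError("conll_idx is required")
    filled = dict(FIELD_DEFAULTS)
    filled.update(entry)
    return filled


# Hypothetical two-entry data config.
example_config = json.loads("""
{
  "word": {"conll_idx": 3, "feature": true, "vocab": "word", "oov": true},
  "gold_pos": {"conll_idx": 4, "label": true, "vocab": "gold_pos"}
}
""")

data_config = {name: with_defaults(entry) for name, entry in example_config.items()}
print(json.dumps(data_config["gold_pos"], indent=2))
```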

Converters

The data config specifies a converter function and vocabulary for each desired column in the input data file. For each entry in the data config and each line in the input file, the column values specified by conll_idx are read in and provided to the given converter. Data generators, which take the data config and data file as input to perform this mapping, are defined in src/data_generator.py.

New converter functions can be defined in src/data_converters.py. At a minimum, every converter function takes two parameters: split_line, the current line in the data file split by whitespace, and idx, the value of conll_idx. Converters may also take additional parameters, whose values are defined via the params field in the converter json object. The output of a converter is a list of strings.

For example, the default converter, idx_list_converter, simply takes a single column index or list of indices and returns a list containing the corresponding column values in the input file:

def idx_list_converter(split_line, idx):
  if isinstance(idx, int):
    return [split_line[idx]]
  return [split_line[i] for i in idx]
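
A converter with an extra parameter might look like the sketch below. `join_columns_converter`, its `divider` parameter, the `CONVERTERS` lookup table and `apply_converter` are all hypothetical names introduced here; how src/data_converters.py actually looks up and calls converters may differ.

```python
def idx_list_converter(split_line, idx):
    """Default converter from above: return the selected column value(s) as a list."""
    if isinstance(idx, int):
        return [split_line[idx]]
    return [split_line[i] for i in idx]


def join_columns_converter(split_line, idx, divider="/"):
    """Hypothetical converter: join the selected columns into a single string."""
    return [divider.join(idx_list_converter(split_line, idx))]


# Assumed dispatch mechanism: look the function up by the name given in the json.
CONVERTERS = {
    "idx_list_converter": idx_list_converter,
    "join_columns_converter": join_columns_converter,
}


def apply_converter(converter_json, split_line, idx):
    """Call the named converter, passing any `params` as keyword arguments."""
    fn = CONVERTERS[converter_json["name"]]
    return fn(split_line, idx, **converter_json.get("params", {}))


split_line = "1 The DT det".split()  # made-up input line
converter_json = {"name": "join_columns_converter", "params": {"divider": "|"}}
print(apply_converter(converter_json, split_line, [2, 3]))
```

Every converter keeps the same two leading parameters (`split_line`, `idx`) and returns a list of strings, so the caller does not need to know which converter it is invoking.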

Vocabs

When a vocab is specified for an entry in the data config, that vocab is used to map the string output of the converter to integer values suitable for features/labels in a TensorFlow model.² This mapping occurs in the map_strings_to_ints function in src/dataset.py.
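
In plain Python, the string-to-int mapping could be sketched as below. This is only an illustration of the idea, not the TensorFlow implementation in src/dataset.py: the function names, the choice of index 0 for `OOV`, and the first-seen ordering of the vocabulary are assumptions.

```python
def build_vocab(tokens, oov=False):
    """Assign each distinct token an integer id, optionally reserving an OOV entry."""
    vocab = {}
    if oov:
        vocab["OOV"] = 0  # position of the OOV entry is an assumption of this sketch
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def map_to_ints(strings, vocab=None):
    """Map converter output to ints, via the vocab if one is given."""
    if vocab is None:
        # No vocab: the converter output is assumed to already be integer-valued.
        return [int(s) for s in strings]
    if "OOV" in vocab:
        return [vocab.get(s, vocab["OOV"]) for s in strings]
    return [vocab[s] for s in strings]  # raises KeyError on unseen strings


word_vocab = build_vocab(["the", "cat", "the"], oov=True)
print(map_to_ints(["the", "dog"], word_vocab))  # "dog" is unseen, so it maps to OOV
print(map_to_ints(["3", "7"]))                  # no vocab: parsed as ints
```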

TODO: vocab initialization
TODO: pre-trained word embeddings

Model configs

TODO

Layer configs

TODO

Task configs

TODO

Attention configs

TODO

Footnotes

1: "Best" is determined by best_eval_key, with default value for a given dataset in the top-level data config, e.g. config/conll05.conf. The value of best_eval_key must correspond to a named eval under the eval_fns entry in a task config. ↩︎

2: If no vocab is specified, then it's assumed that the output of the converter can be interpreted as an integer. ↩︎

lisa's Issues

I want to know how to preprocess the CoNLL-2012 dataset. Could you give some instructions?

Invalid argument error during training

Hello Ms. Strubell :-) I am trying to train and evaluate your LISA model on the CoNLL05 dataset. I followed the recipe at https://github.com/strubell/preprocess-conll05 for preprocessing the CoNLL-2005 dataset and adapted the data paths in the configuration file accordingly. When I run training, the initialization steps of the TensorFlow model seem to work normally, but immediately after "filling up the shuffle buffer" I get the following error. Do you have any idea what causes this error? And do you have any pretrained models for the CoNLL05 dataset?

2018-10-18 23:39:20.446629: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:135] Shuffle buffer filled.
Traceback (most recent call last):
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[862] = 5199 is not in [0, 1968)
[[Node: LISA/Nadam/update_LISA/word_type_embeddings/embeddings/GatherV2_3 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@LISA/Nadam/update_LISA/word_type_embeddings/embeddings/ScatterAdd"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](LISA/Nadam/update_LISA/word_type_embeddings/embeddings/add_1, LISA/Nadam/update_LISA/word_type_embeddings/embeddings/Unique, LISA/Nadam/update_LISA/word_type_embeddings/embeddings/GatherV2_3/axis)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "src/train.py", line 143, in <module>
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 451, in train_and_evaluate
return executor.run()
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 590, in run
return self.run_local()
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 691, in run_local
saving_listeners=saving_listeners)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1173, in _train_model_default
saving_listeners)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1451, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 583, in run
run_metadata=run_metadata)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1059, in run
run_metadata=run_metadata)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1150, in run
raise six.reraise(*original_exc_info)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1135, in run
return self._sess.run(*args, **kwargs)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1207, in run
run_metadata=run_metadata)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 987, in run
return self._sess.run(*args, **kwargs)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[862] = 5199 is not in [0, 1968)
[[Node: LISA/Nadam/update_LISA/word_type_embeddings/embeddings/GatherV2_3 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@LISA/Nadam/update_LISA/word_type_embeddings/embeddings/ScatterAdd"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](LISA/Nadam/update_LISA/word_type_embeddings/embeddings/add_1, LISA/Nadam/update_LISA/word_type_embeddings/embeddings/Unique, LISA/Nadam/update_LISA/word_type_embeddings/embeddings/GatherV2_3/axis)]]

Caused by op 'LISA/Nadam/update_LISA/word_type_embeddings/embeddings/GatherV2_3', defined at:
File "src/train.py", line 143, in <module>
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 451, in train_and_evaluate
return executor.run()
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 590, in run
return self.run_local()
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 691, in run_local
saving_listeners=saving_listeners)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/Users/xiaotang/Documents/SRL/LISA/src/model.py", line 294, in model_fn
train_op = optimizer.apply_gradients(zip(gradients, variables), global_step=tf.train.get_global_step())
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/contrib/optimizer_v2/optimizer_v2.py", line 866, in apply_gradients
self._distributed_apply, filtered, global_step=global_step, name=name)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/distribute.py", line 1053, in merge_call
return self._merge_call(merge_fn, *args, **kwargs)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/distribute.py", line 1060, in _merge_call
return merge_fn(self._distribution_strategy, *args, **kwargs)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/contrib/optimizer_v2/optimizer_v2.py", line 964, in _distributed_apply
var, update, grad)))
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/distribute.py", line 868, in update
return self._update(var, fn, *args, **kwargs)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/training/distribute.py", line 1144, in _update
return fn(var, *args, **kwargs)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/contrib/optimizer_v2/optimizer_v2.py", line 958, in update
return processor.update_op(self, g, state)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/contrib/optimizer_v2/optimizer_v2.py", line 81, in update_op
return optimizer._apply_sparse_duplicate_indices(g, self._v, *args)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/contrib/optimizer_v2/optimizer_v2.py", line 1204, in _apply_sparse_duplicate_indices
return self._apply_sparse(gradient_no_duplicate_indices, var, state)
File "/Users/xiaotang/Documents/SRL/LISA/src/lazy_adam_v2.py", line 228, in _apply_sparse
state)
File "/Users/xiaotang/Documents/SRL/LISA/src/lazy_adam_v2.py", line 212, in _apply_sparse_shared
m_bar_slice = array_ops.gather(m_bar, indices)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2659, in gather
return gen_array_ops.gather_v2(params, indices, axis, name=name)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3142, in gather_v2
"GatherV2", params=params, indices=indices, axis=axis, name=name)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/Users/xiaotang/Documents/soft/miniconda3/envs/deep_nlp/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[862] = 5199 is not in [0, 1968)
[[Node: LISA/Nadam/update_LISA/word_type_embeddings/embeddings/GatherV2_3 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@LISA/Nadam/update_LISA/word_type_embeddings/embeddings/ScatterAdd"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](LISA/Nadam/update_LISA/word_type_embeddings/embeddings/add_1, LISA/Nadam/update_LISA/word_type_embeddings/embeddings/Unique, LISA/Nadam/update_LISA/word_type_embeddings/embeddings/GatherV2_3/axis)]]

Use elmo and other word embedding in new framework

Hi Ms. Strubell, thanks for your great work! I would like to try LISA with other state-of-the-art word embedding approaches (e.g. ELMo) to improve accuracy on a private dataset :-) Does the current LISA framework support other word embeddings? How should I modify the framework to use them?

training SA model

Question on your paper.

Hello,
I'm here to ask a question.
It's not about code but about your paper.

To my understanding, the last term of the training objective should be
logP(yt_dep | Vg, X)
instead of logP(yt_dep | Pg, X).
Or Pg should be removed.

Because predicting predicates with POS tagging precedes parsing dependency.
And Pg is the answer for yt_dep.

I think it doesn't make sense to use Gold Parse(Pg) to predict yt_dep.

I'll look forward to your gold explanation. :)
TY

Question about the evaluation of CoNLL-2005 in the end-to-end setting.

Hi, I have a question about the evaluation in the end-to-end setting.
While studying the code, I found that your evaluation of CoNLL-2005 differs from that of He et al. (2018) [https://github.com/luheng/lsgn/blob/master/srl_eval_utils.py], lines 190-207.
I then ran some evaluation experiments with the following two files (a gold file and a system file):

gold file:

  -                (AM-LOC*               *
  -                       *               *
  -                       *               *
  -                       *               *
  -                       *)              *
  -                       *               *
  -                    (A0*)              *
  call                  (V*)              *
  -                    (A1*               *
  -                       *            (A1*
  -                       *)              *)
  -                    (A2*               *
  -                       *               *
  -                       *               *
  -                       *               *
  -                       *               *
  discipline              *             (V*)
  -                       *               *
  -                       *        (AM-MNR*
  -                       *               *
  -                       *               *
  -                       *               *
  -                       *)              *)
  -                       *               *

system file:

  -                (AM-LOC*               *
  -                       *               *
  -                       *               *
  -                       *               *
  -                       *)              *
  -                       *               *
  -                    (A0*)              *
  call                  (V*)              *
  -                    (A1*               *
  -                       *            (A1*
  -                       *)              *)
  -                    (A2*               *
  -                       *               *
  -                       *               *
  -                       *               *
  -                       *               *
  -                       *               *
  ,                       *             (V*)
  -                       *        (AM-MNR*
  -                       *               *
  -                       *               *
  -                       *               *
  -                       *)              *)
  -                       *               *

After running the evaluation script, we get the results as follows:

WARNING : sentence 0 : verb discipline at position 16 : missing predicted prop! Counting all arguments as missed!
Number of Sentences    :           1
Number of Propositions :           2
Percentage of perfect props :  50.00

              corr.  excess  missed    prec.    rec.      F1
------------------------------------------------------------
   Overall        4       0       2   100.00   66.67   80.00
----------
        A0        1       0       0   100.00  100.00  100.00
        A1        1       0       1   100.00   50.00   66.67
        A2        1       0       0   100.00  100.00  100.00
    AM-LOC        1       0       0   100.00  100.00  100.00
    AM-MNR        0       0       1     0.00    0.00    0.00
------------------------------------------------------------
         V        1       0       1   100.00   50.00   66.67
------------------------------------------------------------

The evaluation script reports only 4 correct, 0 excess, and 2 missed arguments.
It also warns that the second predicate, "discipline", is missing and counts all of its arguments as missed.
I think the precision should be 4 / (4 + 2), where 2 is the number of arguments of the wrongly predicted predicate.
Your code does not count the arguments of wrongly predicted predicates, which results in a higher precision.

Did I miss something or understand something wrongly?

Thank you very much for your reply!
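
For reference, the Overall row in the table above can be reproduced from the corr./excess/missed counts, assuming the usual definitions precision = corr / (corr + excess) and recall = corr / (corr + missed). The helper name `prf` is made up for this sketch. The second call shows the figures that would result if the two arguments attached to the wrongly predicted predicate were counted as excess, as the question proposes.

```python
def prf(correct, excess, missed):
    """Return (precision, recall, F1) in percent from argument counts."""
    predicted = correct + excess
    gold = correct + missed
    precision = 100.0 * correct / predicted if predicted else 0.0
    recall = 100.0 * correct / gold if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


reported = prf(correct=4, excess=0, missed=2)  # counts printed by the script
proposed = prf(correct=4, excess=2, missed=2)  # counting the 2 arguments as excess

print("reported: %.2f %.2f %.2f" % reported)
print("proposed: %.2f %.2f %.2f" % proposed)
```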

Question about Zero Volatile GPU-Util

Hello,
I am trying to train and evaluate your LISA model on CoNLL dataset.
While trying to train the model on a GPU, I use the command CUDA_VISIBLE_DEVICES=0 bin/evaluate.sh config/conll05-lisa.conf --save_dir model. However, nothing seems to run on the GPU: nvidia-smi shows a volatile GPU-util of zero.
How can I make the best use of the GPU with TensorFlow Estimators?
Do you have any idea what causes this problem?

eval the model with golden parsing

I am wondering whether there is a place to configure evaluation with gold parses.

Question about use of nn_utils.MLP in srl_bilinear

Hi!
Really nice and interesting work!
I was looking at the code to better understand the bilinear classification step for SRL (the srl_bilinear method of output_fns, line 169).
Why do you use a single MLP layer to project both the predicate vectors and the word (role) vectors (line 195), and then take two slices of the result (line 196) for the rest of the computation, instead of using two separate MLPs for predicates and roles? Is it so that the projection of the roles also affects that of the predicates, and vice versa? (At least, this is what I would expect to happen, since it should be a fully connected layer.)

many thanks!
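
One way to probe this question is the sketch below. It assumes the MLP is a single dense layer followed by an elementwise activation, with nothing that mixes output units (such as layer normalization); whether that matches nn_utils.MLP is not established here. Under that assumption, projecting to twice the width and slicing gives the same values as two separate layers built from the two halves of the weight matrix, so the two slices do not interact in the forward pass. All names and sizes are made up, and for simplicity the same input vector is fed to both projections.

```python
import math
import random


def dense(x, weights, bias):
    """One dense layer with tanh: y_j = tanh(sum_i x_i * W[i][j] + b_j)."""
    return [
        math.tanh(sum(xi * weights[i][j] for i, xi in enumerate(x)) + bias[j])
        for j in range(len(bias))
    ]


random.seed(0)
in_dim, proj_dim = 4, 3
W = [[random.uniform(-1, 1) for _ in range(2 * proj_dim)] for _ in range(in_dim)]
b = [random.uniform(-1, 1) for _ in range(2 * proj_dim)]
x = [random.uniform(-1, 1) for _ in range(in_dim)]

# Single layer of width 2 * proj_dim, then two slices.
joint = dense(x, W, b)
joint_pred, joint_role = joint[:proj_dim], joint[proj_dim:]

# Two separate layers that use the corresponding column blocks of W and b.
sep_pred = dense(x, [row[:proj_dim] for row in W], b[:proj_dim])
sep_role = dense(x, [row[proj_dim:] for row in W], b[proj_dim:])

same_pred = all(math.isclose(a, c) for a, c in zip(joint_pred, sep_pred))
same_role = all(math.isclose(a, c) for a, c in zip(joint_role, sep_role))
print(same_pred, same_role)
```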
