Official PyTorch implementation of the paper "Similarity-based Memory Enhanced Joint Entity and Relation Extraction" accepted on the International Conference on Computational Science 2023.
To set up an environment first create the python 3.10
environment using conda
or virtualenv
,
activate it, and install
poetry using pip:
pip install poetry==1.4.2
make setup
OR
To force the CUDA version of pytorch library run the following:
make setup-cuda
We built our solution based on wonderful work from JEREX repository. We also used the code from Edge-oriented Graph repository to process the CDR dataset. We used the code as git submodules which You can initialize by running the following:
git submodule init && git submodule update --recursive
Because we import some code from the JEREX It might be useful to add the ./submodules/jerex
directory to
PYTHONPATH.
Create the '.env' file based on '.env.example' examples and set the variables:
- PRETRAINED_MODELS_DIR - The directory where the pretrained huggingface ๐ค models are stored;
- WANDB_PROJECT_NAME - If you want to use Weights&Biases logging You can set the project name used by the logger;
You can download the datasets we used in our experiments in ready-to-use format:
./scripts/datasets/fetch_datasets.sh
You can download the datasets we used in our experiments in ready-to-use format:
./scripts/fetch_models.sh
We used Hydra to create a hierarchical configuration for running
our experiments. The ./config
directory contains all the .yaml
file used to run the scripts.
You can make use of downloaded model checkpoint and run the inference on the CDR dataset using the following command:
python ./scripts/run.py --config-name memory_re/cdr/test
All the artifacts created during run will be logged into directory created in ./storage/runs
. The
script is configured to visualize predictions in .html file (using the code from JEREX) and
visualize the memory activations for entities and mentions categories.
You can run training on the CDR dataset using the following command:
python ./scripts/run.py --config-name memory_re/cdr/train