
similarity-based-memory-re's Introduction

Similarity-based Memory Enhanced Joint Entity and Relation Extraction

Official PyTorch implementation of the paper "Similarity-based Memory Enhanced Joint Entity and Relation Extraction", accepted at the International Conference on Computational Science 2023.


Setup

Install requirements

To set up the environment, first create a Python 3.10 environment with conda or virtualenv and activate it. Then install Poetry using pip and run the setup target:

pip install poetry==1.4.2
make setup

OR

To force installation of the CUDA version of the PyTorch library, run the following:

make setup-cuda

Init git submodules

We built our solution on the wonderful work from the JEREX repository. We also used code from the Edge-oriented Graph repository to process the CDR dataset. Both are included as git submodules, which you can initialize by running the following:

git submodule init && git submodule update --recursive

Because we import some code from JEREX, it might be useful to add the ./submodules/jerex directory to PYTHONPATH.
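If setting PYTHONPATH in the shell is inconvenient (for example in a notebook), the same effect can be approximated from inside Python. The snippet below is a minimal sketch, not part of the repository: the helper name add_jerex_to_path is hypothetical, and it assumes the submodule lives at ./submodules/jerex relative to the repository root, as stated above.

```python
import sys
from pathlib import Path


def add_jerex_to_path(repo_root: str = ".") -> str:
    """Prepend <repo_root>/submodules/jerex to sys.path and return its absolute path.

    Hypothetical helper: the repository itself only suggests setting PYTHONPATH.
    """
    jerex_dir = str((Path(repo_root) / "submodules" / "jerex").resolve())
    # Avoid adding the same entry twice when the helper is called repeatedly.
    if jerex_dir not in sys.path:
        sys.path.insert(0, jerex_dir)
    return jerex_dir


if __name__ == "__main__":
    # Run from the repository root.
    print(add_jerex_to_path("."))
```

Call the helper before any import that depends on the JEREX code.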

Set environment variables

Create a '.env' file based on the '.env.example' file and set the variables:

  • PRETRAINED_MODELS_DIR - the directory where the pretrained Hugging Face 🤗 models are stored;
  • WANDB_PROJECT_NAME - if you want to use Weights & Biases logging, the project name used by the logger.
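To check that the '.env' file contains what you expect, a small standard-library parser can be handy. This is only a sketch under assumptions: it assumes the file holds plain KEY=VALUE lines, the helper name load_env_file is hypothetical, and it is not necessarily how the project scripts read the file.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines and export them into os.environ.

    Hypothetical helper; blank lines, comments and lines without '=' are skipped.
    """
    loaded = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes, if any
        loaded[key] = value
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    for name, value in load_env_file(".env").items():
        print(f"{name}={value}")
```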

Download datasets

You can download the datasets we used in our experiments in ready-to-use format:

./scripts/datasets/fetch_datasets.sh

Download pretrained models

You can download the pretrained models we used in our experiments:

./scripts/fetch_models.sh

Configuration

We used Hydra to create a hierarchical configuration for running our experiments. The ./config directory contains all the .yaml files used to run the scripts.

Example inference

You can use the downloaded model checkpoint to run inference on the CDR dataset with the following command:

python ./scripts/run.py --config-name memory_re/cdr/test

All artifacts created during the run are logged to a directory created in ./storage/runs. The script is configured to visualize predictions in an .html file (using code from JEREX) and to visualize the memory activations for entity and mention categories.
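To find the artifacts of the most recent run, you can look for the newest directory under ./storage/runs. The sketch below assumes that each run gets its own directory directly inside ./storage/runs; that layout and the helper name latest_run_dir are assumptions, not something the repository documents.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(runs_root: str = "./storage/runs") -> Optional[Path]:
    """Return the most recently modified sub-directory of runs_root, or None.

    Hypothetical helper; only direct children of runs_root are considered.
    """
    root = Path(runs_root)
    if not root.is_dir():
        return None
    run_dirs = [p for p in root.iterdir() if p.is_dir()]
    # max() with default=None handles the case of an empty runs directory.
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    print(latest_run_dir())
```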

Training

You can run training on the CDR dataset using the following command:

python ./scripts/run.py --config-name memory_re/cdr/train
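If you want to launch the inference and training runs from Python (for example to queue several experiments), the commands above can be wrapped with subprocess. This is a minimal sketch: the helper names are hypothetical, and the optional overrides use Hydra's generic key=value command-line syntax, with keys that depend on the files in ./config and are not listed here.

```python
import subprocess
import sys
from typing import List, Sequence


def build_run_command(config_name: str, overrides: Sequence[str] = ()) -> List[str]:
    """Build the argument list for ./scripts/run.py with a Hydra config name.

    `overrides` are optional Hydra-style "key=value" strings (keys are up to you).
    """
    return [sys.executable, "./scripts/run.py", "--config-name", config_name, *overrides]


def run_experiment(config_name: str, overrides: Sequence[str] = ()) -> int:
    """Run the script from the repository root and return its exit code."""
    completed = subprocess.run(build_run_command(config_name, overrides), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Same as: python ./scripts/run.py --config-name memory_re/cdr/train
    raise SystemExit(run_experiment("memory_re/cdr/train"))
```

Using sys.executable keeps the child process in the same Python environment that was set up earlier.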
