
replm's Introduction

REPLM: Document-Level In-Context Few-Shot Relation Extraction via Pre-Trained Language Models

Figure: Overview of our REPLM framework.

This is the original implementation of the paper. You can cite the paper as follows:

@article{ozyurt2023context,
  title={Document-Level In-Context Few-Shot Relation Extraction via Pre-Trained Language Models},
  author={Ozyurt, Yilmazcan and Feuerriegel, Stefan and Zhang, Ce},
  journal={arXiv preprint arXiv:2310.11085},
  year={2023}
}

We used Python 3.8.5 in our experiments.

You can install the required libraries into a fresh virtual Python environment via pip install -r requirements.txt.
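
As a minimal sketch (assuming python3.8 is available on your machine and that requirements.txt sits in the repository root), the setup could look like this:

# create and activate a fresh virtual environment with Python 3.8
python3.8 -m venv replm-env
source replm-env/bin/activate

# install the dependencies listed in this repository
pip install -r requirements.txt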

Data pre-processing

The first step is to download the DocRED dataset, following the instructions in the original repository. As a result, you should have a new folder ./DocRED.

Then you can run the pre-processing pipeline via DocRED_preprocess/main.sh.
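
A short sketch of this step, assuming the dataset files are already in ./DocRED and the command is issued from the repository root:

# run the pre-processing pipeline on the downloaded DocRED files
bash DocRED_preprocess/main.sh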

Running our REPLM framework

Run the inference for L different sets of in-context few-shot examples (by changing <seed_no>):

python extract_relations.py --relation <rel_id> --seed <seed_no> --experiments_main_folder experiment_<rel_id> --experiment_folder seed<seed_no>
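
For instance, a minimal sketch that covers five different sets of in-context examples for a hypothetical relation ID (P17 is used here only as a placeholder) could be:

# placeholder relation ID; replace P17 with the relation you want to extract
rel_id=P17

# run inference with 5 different sets of in-context few-shot examples
for seed in 0 1 2 3 4; do
    python extract_relations.py \
        --relation $rel_id \
        --seed $seed \
        --experiments_main_folder experiment_$rel_id \
        --experiment_folder seed$seed
done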

After running for different seeds, aggregate their results as follows:

python aggregate_extractions.py --temperature <temperature> --threshold <threshold> --experiments_main_folder experiment_<rel_id>

This should yield experiment_<rel_id>/aggregated_predictions.csv.
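
Continuing the hypothetical example above, the aggregation call could look as follows; the temperature and threshold values shown here are placeholders, not the settings reported in the paper:

# aggregate the per-seed extractions for the placeholder relation P17;
# 0.1 and 0.5 are placeholder hyperparameter values
python aggregate_extractions.py \
    --temperature 0.1 \
    --threshold 0.5 \
    --experiments_main_folder experiment_P17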

Running our REPLM framework on OpenAI's GPT models

As our REPLM framework is transferable to other LM backbones, you can easily replace the default LM with your favourite one, such as one of the GPT models from OpenAI. Specifically, if you want to experiment with gpt-3.5-turbo, you can run the inference via the following:

python extract_relations_openai.py --model_name gpt-3.5-turbo --relation <rel_id> --seed <seed_no> --experiments_main_folder experiment_<rel_id> --experiment_folder seed<seed_no>

Important Note: Don't forget to set the OPENAI_API_KEY environment variable before running the experiments.
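
A minimal sketch of such a run, with a placeholder key and the same hypothetical relation ID as above:

# make the API key available to the script (replace with your own key)
export OPENAI_API_KEY="sk-..."

# run inference with gpt-3.5-turbo as the LM backbone
python extract_relations_openai.py \
    --model_name gpt-3.5-turbo \
    --relation P17 \
    --seed 0 \
    --experiments_main_folder experiment_P17 \
    --experiment_folder seed0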

Evaluation via external knowledge base

The relations extracted from DocRED can be further evaluated against WikiData.

To achieve this, you first need to clone simple-wikidata-db and follow the steps there to obtain a local copy of WikiData. (Note: one could in principle use the online query service of WikiData instead of a local copy; however, it is too slow for querying thousands of extracted relations.)

For convenience, we repeat the important steps here (a consolidated sketch follows this list):

  • Open the ./simple-wikidata-db directory, which you recently cloned.

  • Fetch the most recent dump of WikiData:

wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz

  • Run the pre-processing (note: it is a CPU-heavy step):

python simple_wikidata_db/preprocess_dump.py --input_file latest-all.json.gz --out_dir PROCESSED_DATA/ --num_lines_in_dump -1 --processes 200 --batch_size 250000

  • The local copy of WikiData should be ready at ./simple-wikidata-db/PROCESSED_DATA
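
Putting the steps above together, a consolidated sketch (assuming simple-wikidata-db has already been cloned into the current directory and that its preprocessing script keeps the flags shown in its README):

# enter the previously cloned repository
cd simple-wikidata-db

# fetch the most recent WikiData dump (a very large download)
wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz

# build the local copy (CPU-heavy; adjust --processes to your core count)
python simple_wikidata_db/preprocess_dump.py \
    --input_file latest-all.json.gz \
    --out_dir PROCESSED_DATA/ \
    --num_lines_in_dump -1 \
    --processes 200 \
    --batch_size 250000

# the local copy should now be available here
ls PROCESSED_DATA/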

The final step is to compare the extracted relations against the WikiData entries:

python pred_to_wikidata.py --pred_folder experiment_<rel_id> --pred_file aggregated_predictions.csv --rel_id <rel_id> -p_alias "simple-wikidata-db/PROCESSED_DATA/aliases" -p_rels "simple-wikidata-db/PROCESSED_DATA/entity_rels"

This should yield experiment_<rel_id>/preds_in_wikidata.csv, which lists the extracted relations from the documents that also appear in the knowledge base WikiData.
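
As a quick sanity check, you can compare the sizes of the two output files, e.g. for the placeholder relation P17 used in the examples above:

# rough comparison: number of aggregated extractions vs. those confirmed in WikiData
# (line counts include the CSV header row, if present)
wc -l experiment_P17/aggregated_predictions.csv experiment_P17/preds_in_wikidata.csv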


replm's Issues

Some questions about the paper

First, according to the file relation_docs_dev in the provided code, can I assume that in your method the distribution of relations for each test document is known?
Second, is the evaluation in your experiments mention-level or entity-level? (By entity-level evaluation, I mean that a mention is counted as correct if its span matches a ground-truth mention span.)
Finally, since I did not run experiments with the REBEL method, I am not sure whether the large difference in F1 scores is caused by REBEL using micro-F1 while your method uses weighted-averaged F1.
