HVPNet

Code for the NAACL 2022 (Findings) paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction".

Model Architecture

The overall architecture of our hierarchical modality fusion network.

Requirements

To run the code, install the requirements:

pip install -r requirements.txt

Data Preprocess

To extract visual object images, we first use the NLTK parser to extract noun phrases from the text and then apply a visual grounding toolkit to detect objects. Detailed steps are as follows:

  1. Using the NLTK parser (or spaCy, or TextBlob) to extract noun phrases from the text (a minimal sketch follows this list).
  2. Applying the visual grounding toolkit to detect objects. Taking the twitter2015 dataset as an example, the extracted objects are stored in twitter2015_aux_images. The object images follow the naming format imgname_pred_yolo_crop_num.png, where imgname is the name of the raw image the object comes from and num is the index of the object predicted by the toolkit. (Note that in train/val/test.txt, each text has a one-to-one relationship with a raw image, so imgname can serve as a unique identifier for the raw image.)
  3. Establishing the correspondence between the raw images and the objects. We construct a dictionary that records this correspondence. Taking twitter2015/twitter2015_train_dict.pth as an example, the dictionary has the format {imgname: ['imgname_pred_yolo_crop_num0.png', 'imgname_pred_yolo_crop_num1.png', ...]}, where each key is the name of a raw image and each value is a list of its object images.
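
A minimal sketch of step 1 with NLTK (the chunk grammar below is illustrative and may differ from the exact grammar used in our preprocessing):

import nltk

# One-time downloads of the tokenizer and POS-tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def extract_noun_phrases(text):
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    # Illustrative NP chunk: optional determiner, adjectives, then nouns.
    tree = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}").parse(tagged)
    return [" ".join(tok for tok, _ in sub.leaves())
            for sub in tree.subtrees() if sub.label() == "NP"]

print(extract_noun_phrases("A little girl is holding a white dog."))
# -> ['A little girl', 'a white dog']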

Both the detected objects and the dictionary recording the correspondence between raw images and objects are available via our data links.
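
A sketch of steps 2 and 3, grouping detected crops under their raw image and saving the dictionary with torch.save; the paths are placeholders, and the released dictionaries are built per split (train/val/test) rather than from a single folder as here:

import os
from collections import defaultdict

import torch

aux_dir = "data/NER_data/twitter2015_aux_images"  # placeholder path

img2objs = defaultdict(list)
for fname in sorted(os.listdir(aux_dir)):
    # Naming format: imgname_pred_yolo_crop_num.png
    imgname = fname.split("_pred_yolo_crop_")[0]
    img2objs[imgname].append(fname)

torch.save(dict(img2objs), "twitter2015_train_dict.pth")

# Reading it back later:
img2objs = torch.load("twitter2015_train_dict.pth")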

Data Download

  • Twitter2015 & Twitter2017

    The text data follows the CoNLL format (an illustrative snippet appears after this list). You can download the Twitter2015 data via this link and the Twitter2017 data via this link. Please place them in data/NER_data.

    You can also put them anywhere and modify the path configuration in run.py.

  • MRE

    The MRE dataset comes from MEGA; many thanks.

    You can download the MRE dataset with detected visual objects using the following commands:

    cd data
    wget 120.27.214.45/Data/re/multimodal/data.tar.gz
    tar -xzvf data.tar.gz
    mv data RE_data
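
For reference, a CoNLL-style NER file carries one token and its BIO tag per line, with a blank line between sentences; the IMGID line that ties a sentence to its raw image is our assumption of the usual layout in this dataset family, so check the downloaded files for the exact format:

IMGID:71
Kobe	B-PER
Bryant	I-PER
in	O
Los	B-LOC
Angeles	I-LOC
today	O
.	O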

The expected structure of files is:

HVPNet
 |-- data
 |    |-- NER_data
 |    |    |-- twitter2015  # text data
 |    |    |    |-- train.txt
 |    |    |    |-- valid.txt
 |    |    |    |-- test.txt
 |    |    |    |-- twitter2015_train_dict.pth  # {imgname: [object-image]}
 |    |    |    |-- ...
 |    |    |-- twitter2015_images       # raw image data
 |    |    |-- twitter2015_aux_images   # object image data
 |    |    |-- twitter2017
 |    |    |-- twitter2017_images
 |    |    |-- twitter2017_aux_images
 |    |-- RE_data
 |    |    |-- img_org          # raw image data
 |    |    |-- img_vg           # object image data
 |    |    |-- txt              # text data
 |    |    |-- ours_rel2id.json # relation data
 |-- models	# models
 |    |-- bert_model.py
 |    |-- modeling_bert.py
 |-- modules
 |    |-- metrics.py    # metric
 |    |-- train.py  # trainer
 |-- processor
 |    |-- dataset.py    # processor, dataset
 |-- logs     # code logs
 |-- run.py   # main 
 |-- run_ner_task.sh
 |-- run_re_task.sh
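
As an optional sanity check before training, a small hypothetical helper script (not part of the repo) can confirm the layout above is in place:

import os

# Hypothetical helper: verify the expected data directories exist.
expected = [
    "data/NER_data/twitter2015",
    "data/NER_data/twitter2015_images",
    "data/NER_data/twitter2015_aux_images",
    "data/RE_data/img_org",
    "data/RE_data/img_vg",
    "data/RE_data/txt",
]
missing = [p for p in expected if not os.path.isdir(p)]
print("Data layout OK" if not missing else f"Missing directories: {missing}")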

Train

NER Task

The data path and GPU-related configuration are in run.py. To train the NER model, run the following scripts:

bash run_twitter15.sh
bash run_twitter17.sh

Checkpoints can be downloaded via Twitter15_ckpt and Twitter17_ckpt.

RE Task

To train the RE model, run the following script:

bash run_re_task.sh

Checkpoints can be downloaded via re_ckpt.

Test

NER Task

To test the NER model, you can download the model checkpoints we provide via Twitter15_ckpt or Twitter17_ckpt, or use your own trained model. Set load_path to the model path, set dataset_name to twitter15 or twitter17, and then run the following script:

python -u run.py \
      --dataset_name="twitter15/twitter17" \
      --bert_name="bert-base-uncased" \
      --seed=1234 \
      --only_test \
      --max_seq=80 \
      --use_prompt \
      --prompt_len=4 \
      --sample_ratio=1.0 \
      --load_path='your_ner_ckpt_path'

RE Task

To test the RE model, you can download the model checkpoint we provide via re_ckpt or use your own trained model. Set load_path to the model path, then run the following script:

python -u run.py \
      --dataset_name="MRE" \
      --bert_name="bert-base-uncased" \
      --seed=1234 \
      --only_test \
      --max_seq=80 \
      --use_prompt \
      --prompt_len=4 \
      --sample_ratio=1.0 \
      --load_path='your_re_ckpt_path'

Acknowledgement

The acquisition of the Twitter15 and Twitter17 data refers to the code from UMT; many thanks.

The acquisition of the MNRE data for the multimodal relation extraction task refers to the code from MEGA; many thanks.

Papers for the Project & How to Cite

If you use or extend our work, please cite the paper as follows:

@inproceedings{DBLP:conf/naacl/ChenZLYDTHSC22,
  author    = {Xiang Chen and
               Ningyu Zhang and
               Lei Li and
               Yunzhi Yao and
               Shumin Deng and
               Chuanqi Tan and
               Fei Huang and
               Luo Si and
               Huajun Chen},
  editor    = {Marine Carpuat and
               Marie{-}Catherine de Marneffe and
               Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z},
  title     = {Good Visual Guidance Make {A} Better Extractor: Hierarchical Visual
               Prefix for Multimodal Entity and Relation Extraction},
  booktitle = {Findings of the Association for Computational Linguistics: {NAACL}
               2022, Seattle, WA, United States, July 10-15, 2022},
  pages     = {1607--1618},
  publisher = {Association for Computational Linguistics},
  year      = {2022},
  url       = {https://doi.org/10.18653/v1/2022.findings-naacl.121},
  doi       = {10.18653/v1/2022.findings-naacl.121},
  timestamp = {Tue, 23 Aug 2022 08:36:33 +0200},
  biburl    = {https://dblp.org/rec/conf/naacl/ChenZLYDTHSC22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
