
reasoning_in_ee's Introduction

OntoED and OntoEvent

OntoED: A Model for Low-resource Event Detection with Ontology Embedding

๐ŸŽ The project is an official implementation for OntoED model and a repository for OntoEvent dataset, which has firstly been proposed in the paper OntoED: Low-resource Event Detection with Ontology Embedding accepted by ACL 2021.

🤗 The implementations are based on Huggingface's Transformers, and the code reorganization draws on MAVEN's baselines and DeepKE.

🤗 We also provide some baseline implementations for reproduction.

Brief Introduction

OntoED is a model for event detection under low-resource conditions. It models the relationships between event types through ontology embedding: knowledge can be transferred from high-resource event types to low-resource ones, and unseen event types can be connected to seen ones via the event ontology.

Project Structure

The structure of data and code is as follows:

Reasoning_In_EE
├── README.md
├── OntoED			# model
│   ├── README.md
│   ├── data_utils.py		# for data processing
│   ├── ontoed.py			# main model
│   ├── run_ontoed.py		# for model running
│   └── run_ontoed.sh		# bash file for model running
├── OntoEvent		# data
│   ├── README.md
│   ├── __init__.py
│   ├── event_dict_data_on_doc.json.zip		# raw full ED data
│   ├── event_dict_train_data.json			# ED data for training
│   ├── event_dict_test_data.json			# ED data for testing
│   ├── event_dict_valid_data.json			# ED data for validation
│   └── event_relation.json					# event-event relation data
└── baselines		# baseline models
    ├── DMCNN
    │   ├── README.md
    │   ├── convert.py			# for data processing
    │   ├── data				# data
    │   │   └── labels.json
    │   ├── dmcnn.config		# configure training & testing
    │   ├── eval.sh				# bash file for model evaluation
    │   ├── formatter
    │   │   ├── DmcnnFormatter.py	# runtime data processing
    │   │   └── __init__.py
    │   ├── main.py				# project entrance
    │   ├── model
    │   │   ├── Dmcnn.py		# main model
    │   │   └── __init__.py
    │   ├── raw
    │   │   └── 100.utf8		# word vector
    │   ├── reader
    │   │   ├── MavenReader.py	# runtime data reader
    │   │   └── __init__.py
    │   ├── requirements.txt	# requirements
    │   ├── train.sh			# bash file for model training
    │   └── utils
    │       ├── __init__.py
    │       ├── configparser_hook.py
    │       ├── evaluation.py
    │       ├── global_variables.py
    │       ├── initializer.py
    │       └── runner.py
    ├── JMEE
    │   ├── README.md
    │   ├── data				# to store data file
    │   ├── enet
    │   │   ├── __init__.py
    │   │   ├── consts.py		# configurable parameters
    │   │   ├── corpus
    │   │   │   ├── Corpus.py	# dataset class
    │   │   │   ├── Data.py
    │   │   │   ├── Sentence.py
    │   │   │   └── __init__.py
    │   │   ├── models			# modules of JMEE
    │   │   │   ├── DynamicLSTM.py
    │   │   │   ├── EmbeddingLayer.py
    │   │   │   ├── GCN.py
    │   │   │   ├── HighWay.py
    │   │   │   ├── SelfAttention.py
    │   │   │   ├── __init__.py
    │   │   │   ├── ee.py
    │   │   │   └── model.py	# main model
    │   │   ├── run
    │   │   │   ├── __init__.py
    │   │   │   └── ee
    │   │   │       ├── __init__.py
    │   │   │       └── runner.py	# runner class
    │   │   ├── testing.py		# evaluation
    │   │   ├── training.py		# training
    │   │   └── util.py
    │   ├── eval.sh				# bash file for model evaluation
    │   ├── requirements.txt	# requirements
    │   └── train.sh			# bash file for model training
    ├── README.md
    ├── eq1.png
    ├── eq2.png
    ├── jointEE-NN
    │   ├── README.md
    │   ├── data
    │   │   └── fistDoc.nnData4.txt	# data format sample
    │   ├── evaluateJEE.py			# model evaluation
    │   ├── jeeModels.py			# main model
    │   ├── jee_processData.py		# data process
    │   └── jointEE.py				# project entrance
    └── stanford.zip			# cleaned dataset for baseline models

Requirements

  • python==3.6.9

  • torch==1.8.0 (lower may also be OK)

  • transformers==2.8.0

  • sklearn==0.20.2

Usage

1. Project Preparation: Download this project and unzip the dataset. You can download the archive directly, or run git clone https://github.com/231sm/Reasoning_In_EE.git in your terminal.

cd [LOCAL_PROJECT_PATH]

git clone https://github.com/231sm/Reasoning_In_EE.git

2. Running Preparation: Adjust the parameters in the run_ontoed.sh bash file, and set the actual paths for 'LABEL_PATH' and 'RELATION_PATH' at the end of data_utils.py.

cd Reasoning_In_EE/OntoED

vim run_ontoed.sh
(input the parameters, save and quit)

vim data_utils.py
(input the path of 'LABEL_PATH' and 'RELATION_PATH', save and quit)

Hint:

3. Running Model: Run ./run_ontoed.sh for training, validation, and testing. A folder containing the configuration, model weights, and results (in is_test_true_eval_results.txt) will be saved at the path you set with '--output_dir' in the bash file run_ontoed.sh.

cd Reasoning_In_EE/OntoED

./run_ontoed.sh
('--do_train', '--do_eval', '--evaluate_during_training', and '--do_test' must all be set in 'run_ontoed.sh')

Alternatively, you can run run_ontoed.py directly and pass the parameters by hand (they can be copied from 'run_ontoed.sh'):

python run_ontoed.py --para... 
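The manual invocation can also be assembled from a script. The following is only a sketch: it uses the flags named in this README ('--do_train', '--do_eval', '--evaluate_during_training', '--do_test', '--output_dir'), while the helper name build_command and the output path are hypothetical. Any other parameters the script needs still have to be copied from run_ontoed.sh.

```python
import shlex


def build_command(output_dir, extra_args=()):
    """Assemble an argument list for run_ontoed.py (hypothetical helper)."""
    cmd = [
        "python", "run_ontoed.py",
        "--do_train", "--do_eval",
        "--evaluate_during_training", "--do_test",
        "--output_dir", output_dir,
    ]
    # Remaining parameters are not listed in this README;
    # copy them from run_ontoed.sh and pass them via extra_args.
    cmd.extend(extra_args)
    return cmd


# "./outputs/ontoed_run" is a placeholder path, not one used by the project.
cmd = build_command("./outputs/ontoed_run")
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.run(cmd) from inside Reasoning_In_EE/OntoED once the remaining parameters have been filled in.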

About the Dataset

OntoEvent is a dataset for event detection (ED) that is also annotated with correlations among events. It contains 13 supertypes and 100 subtypes, with 60,546 event instances drawn from 4,115 documents. Please refer to OntoEvent for details.

Statistics

The statistics of OntoEvent are shown below; the detailed data schema is described in our paper.

Dataset        #Doc    #Instance  #SuperType  #SubType  #EventCorrelation
ACE 2005       599     4,090      8           33        None
TAC KBP 2017   167     4,839      8           18        None
FewEvent       -       70,852     19          100       None
MAVEN          4,480   111,611    21          168       None
OntoEvent      4,115   60,546     13          100       3,804

Data Format

The OntoEvent dataset is stored in json format.

๐Ÿ’For each event instance in event_dict_data_on_doc.json, the data format is as below:

{
    'doc_id': '...', 
    'doc_title': 'XXX', 
    'sent_id': , 
    'event_mention': '......', 
    'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], 
    'trigger': '...', 
    'trigger_pos': [, ], 
    'event_type': ''
}
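To make the schema concrete, here is a minimal Python sketch that checks whether a record carries the fields listed above. The helper name check_instance and the sample record are made up for illustration. The field types are assumptions read off the template (for instance, that 'sent_id' is an integer), and the sample 'trigger_pos' value is only a placeholder, since its exact meaning is not specified here.

```python
# Expected fields and their assumed Python types (inferred from the template).
EXPECTED_FIELDS = {
    "doc_id": str,
    "doc_title": str,
    "sent_id": int,
    "event_mention": str,
    "event_mention_tokens": list,
    "trigger": str,
    "trigger_pos": list,
    "event_type": str,
}


def check_instance(instance):
    """Return a list of problems found in one event instance (empty if none)."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in instance:
            problems.append(f"missing field: {field}")
        elif not isinstance(instance[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


# Hypothetical record; the values are not taken from OntoEvent.
sample = {
    "doc_id": "doc-0",
    "doc_title": "Example title",
    "sent_id": 0,
    "event_mention": "Rebels attacked the town .",
    "event_mention_tokens": ["Rebels", "attacked", "the", "town", "."],
    "trigger": "attacked",
    "trigger_pos": [1, 2],  # placeholder value
    "event_type": "Conflict.Attack",
}

print(check_instance(sample))
```

A check like this can be run over each record after loading a data file with json.load, before the data is passed to the model.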

๐Ÿ’For each event relation in event_relation.json, we list the event instance pair, and the data format is as below:

'EVENT_RELATION_1': [ 
    [
        {
            'doc_id': '...', 
            'doc_title': 'XXX', 
            'sent_id': , 
            'event_mention': '......', 
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], 
            'trigger': '...', 
            'trigger_pos': [, ], 
            'event_type': ''
        }, 
        {
            'doc_id': '...', 
            'doc_title': 'XXX', 
            'sent_id': , 
            'event_mention': '......', 
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], 
            'trigger': '...', 
            'trigger_pos': [, ], 
            'event_type': ''
        }
    ], 
    ...
]
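As an illustration of how this structure can be traversed, the sketch below counts, for each relation, how often each pair of event types occurs. It assumes the top level of event_relation.json is a dictionary from relation name to a list of two-element pairs, as the template suggests. The helper name count_type_pairs and the sample data are hypothetical, and the sample records carry only 'event_type' for brevity. Relations whose pairs are not instance dictionaries are skipped.

```python
from collections import Counter


def count_type_pairs(relations):
    """Map each relation name to a Counter of (head_type, tail_type) pairs."""
    counts = {}
    for relation, pairs in relations.items():
        counter = Counter()
        for head, tail in pairs:
            # Only instance-level pairs (dicts with an 'event_type') are counted.
            if isinstance(head, dict) and isinstance(tail, dict):
                counter[(head["event_type"], tail["event_type"])] += 1
        counts[relation] = counter
    return counts


# Hypothetical sample; real records contain all fields shown in the template.
sample_relations = {
    "EVENT_RELATION_1": [
        [{"event_type": "Conflict.Attack"}, {"event_type": "Conflict.Protest"}],
        [{"event_type": "Conflict.Attack"}, {"event_type": "Conflict.Protest"}],
    ],
}

counts = count_type_pairs(sample_relations)
print(counts["EVENT_RELATION_1"].most_common(1))
```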

๐Ÿ’Especially for "COSUPER", "SUBSUPER" and "SUPERSUB", we list the event type pair, and the data format is as below:

"COSUPER": [
    ["Conflict.Attack", "Conflict.Protest"], 
    ["Conflict.Attack", "Conflict.Sending"], 
    ...
]
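Type-level pairs like these can be turned into a lookup table. The following sketch builds a directed adjacency map from the two pairs shown above. The helper name build_type_graph is hypothetical, and treating each pair as a directed edge from the first type to the second is an assumption; if a relation is meant to be symmetric, both directions would need to be added.

```python
from collections import defaultdict


def build_type_graph(type_pairs):
    """Build a map from each head event type to the set of its tail types."""
    graph = defaultdict(set)
    for head, tail in type_pairs:
        graph[head].add(tail)
    return graph


# The two pairs shown in the format example above.
cosuper = [
    ["Conflict.Attack", "Conflict.Protest"],
    ["Conflict.Attack", "Conflict.Sending"],
]

graph = build_type_graph(cosuper)
print(sorted(graph["Conflict.Attack"]))  # ['Conflict.Protest', 'Conflict.Sending']
```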

How to Cite

📋 Thank you very much for your interest in our work. If you use or extend our work, please cite the following paper:

@inproceedings{ACL2021_OntoED,
    title = "{O}nto{ED}: Low-resource Event Detection with Ontology Embedding",
    author = "Deng, Shumin  and
      Zhang, Ningyu  and
      Li, Luoqiu  and
      Hui, Chen  and
      Huaixiao, Tou  and
      Chen, Mosha  and
      Huang, Fei  and
      Chen, Huajun",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.220",
    doi = "10.18653/v1/2021.acl-long.220",
    pages = "2828--2839"
}

reasoning_in_ee's People

Contributors

231sm, riroaki, zxlzr


reasoning_in_ee's Issues

about train dataset

When testing, I found that a sentence containing no event is still predicted as belonging to some event type. It turns out that the training set only contains sentences with events. How should sentences without events be added to the training set? What values should be filled in the trigger and trigger_pos fields? After adding sentences without an event type, what other changes are needed?

question about trigger identification

Hello, I could not find the part of the code that handles trigger identification. In the model I only found event classification using instance_embedding. I am not sure whether I have misunderstood the code. How do you perform trigger identification, and how are the corresponding evaluation metrics computed?

Missing file named "settings"

hi, I have noticed an error when trying to run the train.py: "from settings import parameters as para", there is probably a missing file named settings, is there something I missed or would you please upload this file?

Is the model applicable to Chinese event extraction?

Hello, I have skimmed your paper and noticed that all experiments were run on English datasets. Does the model also apply to Chinese datasets? If so, what would need to be modified?

Implementation Problem

Thank you for sharing! In your paper, three modules are discussed:

  1. Event Detection
  2. Event Ontology Learning
  3. Event Correlation Inference

However, in this repository I can only find the implementation of Event Detection. Did I overlook the implementation of the other two modules, or did you forget to upload them?

Question about the OntoEvent dataset

Has only part of the OntoEvent dataset been open-sourced? In the released data, I could not find the field for event-event relations.
