The emotion_proposition_store from sebastianruder

#Construction and Analysis of an Emotion Proposition Store

This is the repository for my Bachelor's thesis on the construction and analysis of an Emotion Proposition Store. The project makes the following contributions:

- Designing and evaluating patterns that are frequent and clearly associated with an emotion. These patterns can be used as-is to extract tuples of emotion holders and causes from the web as well as from domain-specific corpora.
- Acquiring more than 1,700,000 propositions from the Annotated Gigaword news corpus using these patterns, then filtering and generalizing them by employing co-reference resolution and named-entity recognition (NER). Each proposition contains information about the emotion, the emotion holder, and the cause of that emotion.
- Storing these propositions in an emotion proposition store, which we make available to the research community.
- Analysing and evaluating the propositions to gain further understanding of emotions in news text as well as of the capabilities of the resource. Distributional analysis allows us to identify ambiguous concepts as well as single-word and compound expressions that are highly associated with an emotion. Through topic modelling, we explore underlying themes that are associated with certain emotions or shared between different ones.
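
As an illustration of the kind of association score used in the distributional analysis, here is a minimal Python sketch of pointwise mutual information (PMI) between expressions and emotions. It is not the thesis implementation (the repository's code is Java). The function name `pmi_scores`, the `min_count` parameter, and the toy input pairs are assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(pairs, min_count=1):
    """Return PMI for each (expression, emotion) pair observed in `pairs`.

    PMI(w, e) = log2( P(w, e) / (P(w) * P(e)) ), with all probabilities
    estimated by relative frequency over the observed pairs.
    `min_count` (hypothetical parameter) drops rare pairs, whose PMI
    estimates tend to be unreliable.
    """
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    emotion_counts = Counter(emotion for _, emotion in pairs)

    scores = {}
    for (word, emotion), n in joint.items():
        if n < min_count:
            continue
        # n/total divided by (count(w)/total * count(e)/total)
        ratio = n * total / (word_counts[word] * emotion_counts[emotion])
        scores[(word, emotion)] = math.log2(ratio)
    return scores


if __name__ == "__main__":
    # Invented toy data, not taken from the proposition store.
    toy = [("win", "joy"), ("win", "joy"), ("loss", "sadness"), ("loss", "joy")]
    for pair, score in sorted(pmi_scores(toy).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

A positive score means the expression co-occurs with the emotion more often than expected under independence; a negative score means less often.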

##Structure of this repository

This repository is organized as follows:

- NRC-Emotion-Lexicon-v0.92: The NRC Word-Emotion Association Lexicon by Saif Mohammad.
- R: R code to generate summaries for agreement with the NRC Emotion Lexicon.
- anno_gigaword: The forked Annotated Gigaword Java API by Courtney Napoles, Matthew Gormley, and Benjamin Van Durme.
- annotation: The annotated files of the two annotation tasks. patterns_annotated contains the pattern annotation, while bigrams_annotated contains the annotation of the bigrams.
- dependencies: The JCommon and JFreeChart libraries used for generating charts.
- mallet: The pseudo-documents and topic models generated using MALLET.
- emotion_word_sources: Related work that was used as a source for the patterns.
- out: The directory of the results.
  - Emotion proposition store: The extracted propositions in chunks of 100,000 lines. Each line has the following tab-separated format: `ID \t emotion \t pattern \t emotion holder \t NP cause \t S cause subject \t S cause predicate \t S cause object \t S cause prepositional objects \t cause bag-of-words`.
  - Patterns: The pattern templates and the regular expressions.
  - Scores: The lists of unigrams and bigrams ranked by pointwise mutual information (PMI) or chi-square for Plutchik's eight emotions, sorted by source (emotion holder, NP cause, S cause subject + predicate, S cause predicate + object). These can be used as an emotion lexicon.
  - Sentences: The extracted propositions along with the sentences that they were extracted from, in chunks of 100,000 lines.
  - Stats: Statistics about the patterns and the extracted propositions.
- src: The source directory.
  - AgigaReader: Class to extract propositions from the Annotated Gigaword corpus.
  - Analyzer: Class to analyze extractions.
  - AnnotationComparer: Class to compare pattern and bigram annotations.
  - AnnotationTaskGenerator: Class to create the bigram annotation task.
  - EmotionPatternExtractor: Class to convert pattern templates into regular expressions.
  - Enums: Class containing various enumerations.
  - Extensions: Class containing various extension methods.
  - Extraction: Class storing information about an extraction.
  - GitFileSplitter: Class to split files for upload via GitHub.
  - MALLETProcessor: Class to process MALLET topic distributions.
  - RandomWriter: Class to create the pattern annotation task.
  - ResultsCleaner: Class to remove duplicates and erroneous patterns from results.
  - ResultsReader: Class to read extractions and write score files.
  - ResultsStatsWriter: Class to write statistics about extracted propositions.
  - Stats: Class to store and write emotion and pattern statistics.
  - Utils: Utility class containing IO, token- and tree-processing methods.
  - Visualizer: Class to generate charts from association scores.
  - WordTypesExtractor: Class to extract word types from Tsvetkov et al.
- thesis: The directory for the thesis LaTeX project.
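
For readers who want to load the emotion proposition store outside of the Java code, the ten-field tab-separated format listed above can be read roughly as follows. This is a minimal Python sketch under stated assumptions: the names `Proposition` and `parse_propositions` are invented for this example, and it assumes that missing cause components appear as empty fields, which this README does not confirm.

```python
from typing import Iterable, Iterator, NamedTuple


class Proposition(NamedTuple):
    """One line of the proposition store; field order follows the README."""
    id: str
    emotion: str
    pattern: str
    emotion_holder: str
    np_cause: str
    s_cause_subject: str
    s_cause_predicate: str
    s_cause_object: str
    s_cause_prep_objects: str
    cause_bag_of_words: str


def parse_propositions(lines: Iterable[str]) -> Iterator[Proposition]:
    """Yield a Proposition for every well-formed tab-separated line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != len(Proposition._fields):
            continue  # skip lines that do not have exactly ten fields
        yield Proposition(*fields)


if __name__ == "__main__":
    # Invented sample line, not copied from the actual store.
    sample = ["1\tjoy\tsome-pattern\tfans\tthe victory\t\t\t\t\tvictory\n"]
    for prop in parse_propositions(sample):
        print(prop.emotion, "|", prop.emotion_holder, "|", prop.np_cause)
```

To read one of the chunk files, pass an open file handle, for example `parse_propositions(open(path, encoding="utf-8"))`, where `path` points at a file in the proposition store directory. The UTF-8 encoding is also an assumption.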
