
emotion_proposition_store's Introduction

# Construction and Analysis of an Emotion Proposition Store

This is the repository for my Bachelor's thesis on the construction and analysis of an Emotion Proposition Store. The project makes the following contributions:

  • Designing and evaluating patterns that are frequent and clearly associated with an emotion. These patterns can be used as-is to extract tuples of emotion holders and causes from the web as well as from special domain corpora.
  • Acquiring more than 1,700,000 propositions from the Annotated Gigaword news corpus using these patterns, filtering, and generalizing them by employing co-reference resolution and named-entity recognition (NER). These propositions contain information about the emotion, the emotion holder, and the cause of said emotion.
  • Storing these propositions in an emotion proposition store, which we make available to the research community.
  • Analysing and evaluating them to gain further understanding about emotions in news text as well as the capabilities of the resource. Distributional analysis allows us to determine ambiguous concepts as well as single-word and compound expressions that are highly associated with an emotion. Through topic modelling, we explore underlying themes that are associated with certain emotions or shared between different ones.
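As a rough illustration of the distributional analysis, the following sketch computes point-wise mutual information (PMI), one of the measures the score lists in this repository are ranked by, between words and emotions. It is a minimal sketch and not the thesis code: the function name `pmi_scores` and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return PMI(word, emotion) for every observed (word, emotion) pair."""
    pairs = list(pairs)
    total = len(pairs)
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    emotion_counts = Counter(emotion for _, emotion in pairs)

    scores = {}
    for (word, emotion), joint in pair_counts.items():
        # PMI = log2( P(word, emotion) / (P(word) * P(emotion)) )
        scores[(word, emotion)] = math.log2(
            joint * total / (word_counts[word] * emotion_counts[emotion])
        )
    return scores


# Toy observations, invented for illustration only (not taken from the store).
toy_pairs = [
    ("war", "fear"), ("war", "fear"), ("war", "anger"),
    ("victory", "joy"), ("victory", "joy"), ("tax", "anger"),
]
scores = pmi_scores(toy_pairs)

# Rank words for one emotion by descending PMI.
fear_ranking = sorted(
    ((word, score) for (word, emotion), score in scores.items() if emotion == "fear"),
    key=lambda item: item[1],
    reverse=True,
)
```

A positive score means the word co-occurs with the emotion more often than chance would predict. PMI is known to favour rare words, so a minimum frequency cut-off is a common addition in practice.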

## Structure of this repository

This repository is organized as follows:

  • NRC-Emotion-Lexicon-v0.92: The NRC Word-Emotion Association Lexicon by Saif Mohammad.
  • R: R code to generate summaries for agreement with the NRC Emotion Lexicon.
  • anno_gigaword: The forked Annotated Gigaword Java API by Courtney Napoles, Matthew Gormley, and Benjamin Van Durme.
  • annotation: The annotated files of the two annotation tasks. patterns_annotated contains the pattern annotation, while bigrams_annotated contains the annotation of the bigrams.
  • dependencies: The JCommon and JFreeChart libraries used for generating charts.
  • mallet: The pseudo-documents and topic models generated using MALLET.
  • emotion_word_sources: Related work that was used as a source for the patterns.
  • out: The directory of the results.
    • Emotion proposition store: The extracted propositions in shelves of 100,000 lines. They have the following format: ID \t emotion \t pattern \t emotion holder \t NP cause \t S cause subject \t S cause predicate \t S cause object \t S cause prepositional objects \t cause bag-of-words.
    • Patterns: The pattern templates and the regular expressions.
    • Scores: The lists of unigrams and bigrams ranked by point-wise mutual information (PMI) or chi-square for Plutchik's eight emotions, sorted by source (emotion holder, NP cause, S cause subject + predicate, S cause predicate + object). These can be used as an emotion lexicon.
    • Sentences: The extracted propositions along with the sentences that they were extracted from in chunks of 100,000 lines.
    • Stats: Statistics about the patterns and the extracted propositions.
  • src: The source directory.
  • thesis: The directory for the thesis LaTeX project.

emotion_proposition_store's People

Contributors

sebastianruder


emotion_proposition_store's Issues

How does NRC work for feature extraction process?

Hello,
I have seen your program and I want to use the NRC Emotion Lexicon for English. As far as I understand, we look up the emotion_value for each word in the NRC Emotion Lexicon and add that value to a dictionary entry for the corresponding emotion, which gives an overall score for the sentence. Do we need to binarize these scores before doing machine learning, or can we apply machine learning to the dictionary directly?

Please clarify.
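The procedure described in the question can be sketched roughly as follows. This is only an illustration, not the repository's code: `TOY_LEXICON` is a tiny hand-made stand-in whose entries are invented (they are not real NRC entries), and `emotion_counts` and `binarize` are hypothetical helper names. Both the raw counts and the binarized vector are shown, since which one works better depends on the model and the data.

```python
from collections import Counter

# Plutchik's eight emotions, used here as a fixed feature order.
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Invented stand-in for a word -> emotions lexicon (not real NRC data).
TOY_LEXICON = {
    "attack": {"anger", "fear"},
    "threat": {"fear"},
    "win": {"joy"},
}


def emotion_counts(tokens, lexicon):
    """Count, per emotion, how many tokens the lexicon associates with it."""
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token.lower(), ()):
            counts[emotion] += 1
    # Return a fixed-length vector so every sentence has the same features.
    return [counts[emotion] for emotion in EMOTIONS]


def binarize(vector):
    """Map each count to 1 if the emotion occurs at all, else 0."""
    return [1 if value > 0 else 0 for value in vector]


tokens = "The attack and the threat".split()
count_features = emotion_counts(tokens, TOY_LEXICON)   # [1, 0, 0, 2, 0, 0, 0, 0]
binary_features = binarize(count_features)             # [1, 0, 0, 1, 0, 0, 0, 0]
```

Either vector can be passed to a classifier as it stands. Binarizing discards how often an emotion occurs, so it is a modelling choice rather than a required step.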
