svo_automation's Introduction

SVO Triplet Automation of Narrative Stories for Social Sciences

The goal of the project is to build a pipeline that automates the generation of subject-verb-object (SVO) triplets for social science research. For example, character relationships can be visualized as networks in Gephi based on SVO triplets. Ultimately, we want to integrate the pipeline into PC-ACE, the NLP software developed by Professor Roberto Franzosi of the Sociology Department at Emory.

The whole pipeline is composed of three steps:

  • Data Cleaning
  • Anaphora Resolution
  • SVO Triplets Extraction

Data Cleaning

  • Clean the text converted from PDF format
  • Extract the title and content of each Emory Lynching article and separate the two

Anaphora Resolution: Stanford CoreNLP

  • Replace mentions of entities (e.g. pronouns like "he" and "she") with their most representative mention using Stanford CoreNLP's coreference (anaphora) resolution
  • This step improves SVO extraction, and makes its results easier to validate, because each actor is identified by name rather than by pronoun

For example:

  Before: Bill Cato Attempted to Assault Mrs. Vickers. He was shot to death.
  After:  Bill Cato Attempted to Assault Mrs. Vickers. Bill Cato was shot to death.
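
The substitution step can be sketched in plain Python. This is only an illustration: the chain format below (a representative mention plus token spans) is a hypothetical simplification, not the structure Stanford CoreNLP actually returns, and `resolve_mentions` is a name invented for this example.

```python
from typing import Dict, List

def resolve_mentions(sentences: List[List[str]], chains: List[Dict]) -> List[List[str]]:
    """Replace each mention span with its chain's representative mention.

    Each chain is assumed (hypothetically) to look like:
    {"representative": "Bill Cato", "mentions": [(sentence_idx, start, end), ...]}
    where start/end are token offsets and end is exclusive.
    """
    resolved = [list(tokens) for tokens in sentences]
    replacements = []
    for chain in chains:
        representative = chain["representative"].split()
        for sent_idx, start, end in chain["mentions"]:
            replacements.append((sent_idx, start, end, representative))
    # Apply right to left so earlier token offsets stay valid after each splice.
    for sent_idx, start, end, representative in sorted(
        replacements, key=lambda r: (r[0], r[1]), reverse=True
    ):
        resolved[sent_idx][start:end] = representative
    return resolved

sentences = [
    ["Bill", "Cato", "Attempted", "to", "Assault", "Mrs.", "Vickers", "."],
    ["He", "was", "shot", "to", "death", "."],
]
# "He" (sentence 1, token 0) refers back to "Bill Cato".
chains = [{"representative": "Bill Cato", "mentions": [(1, 0, 1)]}]
resolved = resolve_mentions(sentences, chains)
```

Applying replacements from right to left avoids recomputing offsets when a one-token pronoun is replaced by a multi-token name.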

SVO Extraction: ClausIE

  • Format the cleaned Emory Lynching corpus (cleaned_corenlp_lynching.txt) into clausie_input.txt, the input from which ClausIE produces triplets
  • Extract only the SVOs from sentences-test-out.txt into svo.txt
  • Filter the SVO set into terminal_svo.txt, keeping only triplets whose subject is a confirmed social actor
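
The final filtering step might look roughly like the sketch below. The actor set and the function name `filter_by_social_actor` are hypothetical, chosen for illustration; the project's actual list of confirmed social actors is not shown here, and the second triplet is made-up data that exists only to show a rejected case.

```python
from typing import Iterable, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (subject, verb, object)

# Hypothetical actor list for illustration only.
SOCIAL_ACTORS: Set[str] = {"mob", "girl", "prisoner", "sheriff", "people"}

def filter_by_social_actor(triplets: Iterable[Triplet], actors: Set[str]) -> List[Triplet]:
    """Keep only triplets whose subject (case-insensitive) is a known social actor."""
    return [t for t in triplets if t[0].strip().lower() in actors]

triplets = [
    ("mob", "estimate", "shooting"),
    ("rope", "have", "knot"),          # made-up example: subject is not a social actor
    ("Sheriff", "convene", "court"),
]
kept = filter_by_social_actor(triplets, SOCIAL_ACTORS)
```

Matching on the lowercased subject keeps the check simple; a real actor list would likely also need to handle plurals and multi-word subjects.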

The SVO results will look like the following (verbs are reduced to their stems, so a stem such as estim stands for estimate):

  S: mob            , V: estimate       , O: shooting
  S: girl           , V: protect        , O: negro
  S: prisoner       , V: have           , O: neck
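
Lines in this format can be read back into tuples with a small parser. This is a sketch that assumes every line follows the `S: ..., V: ..., O: ...` layout of the sample above; the helper name `parse_svo_line` is invented for this example.

```python
import re
from typing import Optional, Tuple

# Matches "S: <subject> , V: <verb> , O: <object>" with arbitrary padding.
SVO_LINE = re.compile(
    r"S:\s*(?P<s>.*?)\s*,\s*V:\s*(?P<v>.*?)\s*,\s*O:\s*(?P<o>.*?)\s*$"
)

def parse_svo_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, verb, object), or None if the line is not an SVO line."""
    match = SVO_LINE.search(line)
    if match is None:
        return None
    return match.group("s"), match.group("v"), match.group("o")

sample = "  S: prisoner       , V: have            , O: neck "
parsed = parse_svo_line(sample)
```

Returning None for non-matching lines lets a caller skip headers or blank lines without raising.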

Data Visualization

  • The output file is ready to be loaded into Gephi:

         Node1     Edge      Node2
    0   people     have      wrath
    1   people     have      hands
    2   county     have       duty
    3  sheriff  convene      court
    4  sheriff      try  criminals
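
One way to produce an edge list that Gephi's spreadsheet importer can read is to write a CSV with Source and Target columns. The sketch below assumes Node1 maps to Source, Node2 to Target, and the verb (Edge) to the edge Label; that mapping and the function name `write_gephi_edges` are assumptions made for this example, not the project's confirmed output format.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

def write_gephi_edges(triplets: Iterable[Tuple[str, str, str]], handle: TextIO) -> None:
    """Write (subject, verb, object) triplets as a Gephi-style edge list CSV."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Label"])
    for subject, verb, obj in triplets:
        # The subject and object become nodes; the verb labels the edge.
        writer.writerow([subject, obj, verb])

edges = [
    ("people", "have", "wrath"),
    ("sheriff", "convene", "court"),
]
buffer = io.StringIO()  # stands in for a file opened with newline=""
write_gephi_edges(edges, buffer)
rows = buffer.getvalue().splitlines()
```

When writing to disk, open the file with `newline=""` so the csv module controls line endings.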
    

Dependencies:

  • Stanford CoreNLP
  • NLTK
  • ClausIE
  • enchant

Version

Alpha version. The project is still subject to change. Comments and advice are welcome.

Contributors

dwzhou, hjian42
