sbenthall / bluestocking

A bookish botscript eager to provide her considerate opinion

License: GNU General Public License v3.0

Python 100.00%

bluestocking's People

Contributors

dkush dwins bluestocking

bluestocking's Issues

pronoun resolution

In the preprocessing step, replace pronouns with the chunks they refer to.
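A rough sketch of what that preprocessing pass might look like, assuming the text has already been chunked into (text, tag) pairs. The `PRONOUNS` set, the `"NP"` tag, and the function name `resolve_pronouns` are all hypothetical, and "most recent noun phrase" is only a naive stand-in for real coreference resolution.

```python
# Hypothetical pronoun list; a real pass would need a fuller inventory.
PRONOUNS = {"he", "she", "it", "they", "him", "them"}


def resolve_pronouns(chunks):
    """Replace each pronoun chunk with the most recent non-pronoun noun phrase.

    `chunks` is a list of (text, tag) pairs; the tag "NP" marks a noun phrase.
    Pronouns with no preceding noun phrase are left unchanged.
    """
    resolved = []
    last_np = None
    for text, tag in chunks:
        if tag == "NP" and text.lower() in PRONOUNS:
            if last_np is not None:
                text = last_np  # substitute the antecedent guess
        elif tag == "NP":
            last_np = text  # remember this noun phrase as a candidate antecedent
        resolved.append((text, tag))
    return resolved


chunks = [
    ("The parser", "NP"), ("is", "VP"), ("slow", "ADJP"),
    ("It", "NP"), ("needs", "VP"), ("work", "NP"),
]
resolved = resolve_pronouns(chunks)
```

Here "It" is replaced by "The parser". The heuristic will pick the wrong antecedent whenever another noun phrase sits between the pronoun and its referent, so it is only a starting point.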

'factchecker' demo

Build demo functionality:

Given a text:
tokenize the text and look up every word in Wikipedia
build a knowledge base from the collected articles
evaluate the consistency of the original text with the knowledge base
segment the claims into: supported claims, new claims, and contradicted claims.
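The final segmentation step could be sketched as follows. This assumes relations are (truth, subject, object) triples like the ones the parser prints, with the first element read as "asserted" or "negated"; that reading, and the name `segment_claims`, are assumptions rather than the project's actual API.

```python
def segment_claims(claims, knowledge_base):
    """Sort claims into supported, new, and contradicted.

    Both arguments are iterables of (truth, subject, object) triples.
    A claim is supported if the same triple is in the knowledge base,
    contradicted if the triple with the opposite truth value is there,
    and new otherwise.
    """
    kb = set(knowledge_base)
    segments = {"supported": [], "new": [], "contradicted": []}
    for truth, subj, obj in claims:
        if (truth, subj, obj) in kb:
            segments["supported"].append((truth, subj, obj))
        elif ((not truth), subj, obj) in kb:
            segments["contradicted"].append((truth, subj, obj))
        else:
            segments["new"].append((truth, subj, obj))
    return segments


kb = [(True, "whale", "mammal"), (False, "whale", "fish")]
claims = [(True, "whale", "mammal"), (True, "whale", "fish"), (True, "whale", "large")]
segments = segment_claims(claims, kb)
```

Exact triple matching is the simplest possible consistency check; it does no inference and treats differently worded claims as unrelated.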

introduce Concept class (and subclasses?)

Introduce Concept class as wrapper around a semantic node. A Concept has:

a set of terms that elicit it (e.g. a single term, or set of words in a WordNet synset)
a reference to the knowledge base it is a part of (see Ned Block on Conceptual Role Semantics)
a method that returns the available relations for this concept, in the knowledge base and/or based on wordnet (antonyms)
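A minimal sketch of such a class, assuming the knowledge base is a plain list of (truth, subject, object) triples. The attribute and method names are hypothetical, and the WordNet-based relations (antonyms) are left out so that the sketch has no external dependencies.

```python
class Concept:
    """Wrapper around a semantic node (illustrative sketch only)."""

    def __init__(self, terms, knowledge_base):
        # Terms that elicit this concept, e.g. the words of one synset.
        self.terms = frozenset(terms)
        # Reference (not a copy) to the knowledge base the concept belongs to.
        self.knowledge_base = knowledge_base

    def relations(self):
        """Return the knowledge base relations that mention any of the terms."""
        return [
            (truth, subj, obj)
            for truth, subj, obj in self.knowledge_base
            if subj in self.terms or obj in self.terms
        ]


kb = [(True, "dog", "animal"), (False, "dog", "cat"), (True, "cat", "animal")]
dog = Concept({"dog", "hound"}, kb)
dog_relations = dog.relations()
```

Because the concept holds a reference to the knowledge base, relations added later are visible through `relations()` without rebuilding the concept.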

wikipedia article grabber

Script the following functionality: given a word (or article title), grab the Wikipedia article on the subject and strip the markup to return plain text.

For now, don't worry about marking the articles for different handling on the Concept level.
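One possible approach is to ask the MediaWiki API for a plain-text extract instead of stripping markup by hand. The sketch below assumes the English Wikipedia endpoint and the `extracts` query property with `explaintext`; the function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build a MediaWiki API URL requesting a plain-text extract of `title`."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,  # plain text instead of HTML
        "redirects": 1,    # follow redirects to the target article
        "format": "json",
        "titles": title,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_text(response_json):
    """Pull the article text out of an API response, or None if absent."""
    pages = json.loads(response_json).get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None


def fetch_article_text(title):
    """Fetch the plain text of a Wikipedia article (performs a network call)."""
    request = urllib.request.Request(
        build_query_url(title),
        headers={"User-Agent": "bluestocking-sketch/0.1"},  # hypothetical agent
    )
    with urllib.request.urlopen(request) as response:
        return extract_text(response.read().decode("utf-8"))
```

Calling `fetch_article_text("Ada Lovelace")` would return the article body as a string, or `None` when no page with that title exists. Keeping URL construction and response parsing separate from the network call makes both easy to test offline.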

smarter knowledge base aggregation

Currently, knowledge bases are just built by appending the relations parsed from individual documents in the corpus.

This can lead to the introduction of contradictory relations into the knowledge base.

One way to deal with this would be to check for consistency between Concepts when adding relations to a knowledge base. If the concept in a newly added set of relations is similar enough to an existing concept, then the two concepts can be merged and all relations applicable to either can be adopted. If the concepts are too divergent (despite, say, having the same triggering word), then the new concept can be preserved separately.
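The merge-or-keep-separate idea might be sketched like this. Jaccard overlap between relation sets is used as the similarity measure and 0.5 as the threshold; both are arbitrary choices made for illustration, and representing the knowledge base as a dict from triggering word to a list of relation sets is an assumption.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def add_concept(kb, term, relations, threshold=0.5):
    """Add a concept's relations to `kb`, merging with a similar concept.

    `kb` maps a triggering term to a list of concepts, each stored as a set
    of (truth, subject, object) triples. Returns the set that now holds the
    new relations.
    """
    new = set(relations)
    candidates = kb.setdefault(term, [])
    for existing in candidates:
        if jaccard(existing, new) >= threshold:
            existing |= new  # similar enough: adopt the relations of both
            return existing
    candidates.append(new)  # too different: keep as a separate concept
    return new


kb = {}
add_concept(kb, "bank", {(True, "bank", "river"), (True, "bank", "water")})
add_concept(kb, "bank", {(True, "bank", "river"), (True, "bank", "water"),
                         (True, "bank", "shore")})
add_concept(kb, "bank", {(True, "bank", "money")})
```

After these three calls, "bank" holds two concepts: a merged riverside sense with three relations and a separate financial sense with one. Note that merging on overlap alone does not detect contradictions between the merged relations; that check would have to be added.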

why tautological relations?

The parser is pulling out tautological relations like:

[(True, 'yesterday', 'yesterday'), (True, 'today', 'today')]

which are throwing off consistency scoring.
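Until the cause in the parser is found, one workaround would be to filter these out before scoring. The sketch below assumes each relation is a (truth, subject, object) triple as in the output above; `drop_tautologies` is a hypothetical helper name.

```python
def drop_tautologies(relations):
    """Remove relations whose subject and object are the same term."""
    return [
        (truth, subj, obj)
        for truth, subj, obj in relations
        if subj != obj
    ]


parsed = [
    (True, "yesterday", "yesterday"),
    (True, "today", "today"),
    (True, "today", "Tuesday"),
]
filtered = drop_tautologies(parsed)
```

This only hides the symptom; it does not explain why the parser emits self-relations in the first place.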