
summarytweets's Introduction

SummaryTweets

## Authors

Irene Feng, Orestis Lykouropoulos, Patrick Xu

## Overview

SummaryTweets is a Python-based sentence compression program that primarily uses TF-IDF and phrase substitution to reduce a given text to a desired length. The default output length is 140 characters, the length of one Tweet.

We use TF-IDF, or Term Frequency-Inverse Document Frequency, to estimate the importance of each term, and we sum the term scores within each sentence to build a sentence score. TF-IDF rests on the idea that, given a large enough sample of text, the more frequently a word appears across that sample, the less important it tends to be. Using word counts from a large collection of sample texts, a corpus, we assign each word a score that reflects its importance, and from these scores we determine the importance of each sentence. TF-IDF is not a perfect scoring method, but it has proved quite accurate in our application.
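The scoring idea above can be sketched roughly as follows. This is a minimal illustration, not the code in tf_idf.py: the function names, the tiny three-document corpus, and the choice of `log(N / df)` as the inverse document frequency are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idf_table(corpus_docs):
    """Map each word to log(N / document frequency) over a list of documents."""
    n_docs = len(corpus_docs)
    doc_freq = Counter()
    for doc in corpus_docs:
        doc_freq.update(set(tokenize(doc)))
    return {word: math.log(n_docs / float(df)) for word, df in doc_freq.items()}


def sentence_scores(text, idf, default_idf):
    """Score each sentence as the sum of the TF-IDF weights of its words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    term_freq = Counter(tokenize(text))
    scores = []
    for sentence in sentences:
        score = sum(term_freq[w] * idf.get(w, default_idf) for w in tokenize(sentence))
        scores.append((sentence, score))
    return scores


# Toy corpus standing in for a real one (hypothetical data).
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell sharply",
]
idf = idf_table(corpus)
# Words never seen in the corpus get the largest weight a seen word could have.
scores = sentence_scores("The cat sat. The stock market fell.", idf, math.log(len(corpus)))
```

In this toy run, "the" appears in every corpus document and so contributes nothing, while the second sentence outscores the first because all of its remaining words are rare in the corpus.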

Phrasal substitution draws on the Paraphrase Database (PPDB). SummaryTweets currently uses over 30,000 lexical rules; far more could be used, at the cost of substitution accuracy.
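As a rough sketch of what rule-based phrase substitution can look like, assuming the rules are held in a plain dictionary from phrase to paraphrase: the `RULES` entries and the `substitute_shorter` helper below are hypothetical and written for illustration; they are not actual PPDB entries or the project's implementation.

```python
import re

# Hypothetical rules in the spirit of lexical paraphrases; illustrative only.
RULES = {
    "in order to": "to",
    "a large number of": "many",
    "approximately": "about",
}


def substitute_shorter(text, rules):
    """Replace each phrase with its paraphrase when the paraphrase is shorter."""
    # Try longer phrases first so multi-word rules win over their substrings.
    for phrase in sorted(rules, key=len, reverse=True):
        replacement = rules[phrase]
        if len(replacement) >= len(phrase):
            continue  # a rule that does not save characters is not useful here
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


compressed = substitute_shorter(
    "We waited approximately an hour in order to see a large number of birds.", RULES
)
# compressed == "We waited about an hour to see many birds."
```

Skipping rules that do not shorten the text keeps every substitution a net saving, which is the property that matters when compressing toward a character limit.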

## How to Use

SummaryTweets has been uploaded to the following webpage.

http://www.cs.dartmouth.edu/cgi-bin/cgiwrap/patxu/summary.cgi

It can also be used by running the bash shell script run.sh, which runs tf_idf.py with appropriate input arguments. The Python file itself can be run with

python tf_idf.py

Use the flag "-h" for information about the input arguments.

## Structure

/CorpusFolder- contains the Brown Corpus and ARPA bigram probabilities

/pickl- contains the serialized dictionaries for our corpus. These dictionaries are used for sentence compression and TF-IDF

/stat_parser- uses the CKY algorithm to return a parse tree of a sentence

/styles- files for marking up the webpage

summarytweets's Issues

TypeError: ord() expected a character, but string of length 0 found

I have tried to run the repository and found the following error:

C:\Users\lenovo-pc\Desktop\SummaryTweets-master>c:\Python27\python.exe tf_idf.py -textfile input.txt
Parsing Corpus...
Traceback (most recent call last):
  File "tf_idf.py", line 167, in <module>
    program = tfidf()
  File "tf_idf.py", line 31, in __init__
    self.compressor = parse_compress.compressor()
  File "C:\Users\lenovo-pc\Desktop\SummaryTweets-master\parse_compress.py", line 18, in __init__
    self.all_phrases = pickle.load(all_phrases)
  File "c:\Python27\lib\pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "c:\Python27\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "c:\Python27\lib\pickle.py", line 1175, in load_binput
    i = ord(self.read(1))
TypeError: ord() expected a character, but string of length 0 found

I am using Windows 10. I googled the error and tried the suggested solution of replacing r with rb when opening the pickle dictionary, but it did not work.
Kindly help me.
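
The traceback shows the unpickler running out of input partway through the file. Two common causes are a file opened in text mode on Windows and a pickle file that is empty or incomplete on disk. Since switching to rb did not help, the second may be worth checking, though that is only a guess. Below is a minimal sketch of a loader that reports each case more clearly; `load_pickle` and the demo file name are hypothetical and not part of the repository.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a pickle in binary mode, failing early with a clearer message."""
    if os.path.getsize(path) == 0:
        raise ValueError("pickle file is empty: %s" % path)
    # "rb" matters on Windows: text mode translates line endings and can stop
    # reading at a Ctrl-Z (0x1A) byte, so the unpickler sees a short stream.
    with open(path, "rb") as handle:
        try:
            return pickle.load(handle)
        except (EOFError, pickle.UnpicklingError):
            raise ValueError("pickle file looks truncated or corrupt: %s" % path)


# Round trip with a throwaway file; the file name is a placeholder.
demo_path = os.path.join(tempfile.mkdtemp(), "all_phrases.pickle")
with open(demo_path, "wb") as handle:
    pickle.dump({"in order to": "to"}, handle, protocol=2)
restored = load_pickle(demo_path)
```

If the size check or the truncation message fires for one of the files in the pickl folder, downloading or regenerating that file would be the next thing to try.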
