reddit_seer's People

Contributors

andrewprins, ffrankies, troylmadsen

reddit_seer's Issues

Build linear regression model for predicting post popularity

Considerations:

  • Should, at the very least, use the bag of words from the title and the post text to try to predict popularity
  • Do not confuse linear regression with logistic regression. We can try logistic regression separately, but like the naive Bayes algorithm it predicts "buckets" or categories rather than a number.
  • Ideally, would be extensible and allow additional features to be easily added
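
A minimal sketch of the idea above, using only the Python standard library. The post fields (`title`, `text`, `score`), the helper names, and the sample posts are all assumptions made for illustration, and the plain gradient-descent fit stands in for whatever library we end up using (scikit-learn's `CountVectorizer` and `LinearRegression` would likely replace most of it). Extra features are passed in as functions, so new ones can be added without touching the model code.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(posts):
    """Map every word seen in a title or post text to a column index."""
    vocab = {}
    for post in posts:
        for token in tokenize(post["title"] + " " + post["text"]):
            vocab.setdefault(token, len(vocab))
    return vocab

def featurize(post, vocab, extra_features=()):
    """Bag-of-words counts, followed by any extra numeric features."""
    counts = Counter(tokenize(post["title"] + " " + post["text"]))
    vector = [0.0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = float(n)
    return vector + [float(f(post)) for f in extra_features]

def predict(x, weights, bias):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def fit_linear_regression(X, y, lr=0.05, epochs=2000):
    """Least-squares fit by batch gradient descent (illustrative only)."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(xi, weights, bias) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Made-up posts, purely for illustration.
posts = [
    {"title": "cute cat", "text": "look at my cat", "score": 10},
    {"title": "tax question", "text": "boring tax form", "score": 1},
    {"title": "cute dog", "text": "my dog", "score": 8},
]
vocab = build_vocabulary(posts)
X = [featurize(p, vocab) for p in posts]
y = [float(p["score"]) for p in posts]
weights, bias = fit_linear_regression(X, y)
```

With only three posts the model simply memorizes the training scores, so this shows the plumbing rather than a meaningful predictor. A feature such as title length could be added by passing `extra_features=[lambda p: len(p["title"])]` to `featurize`.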

Set up the naive bayes algorithm for predicting post popularity

Considerations:

  • This will not give us an actual number, but rather use "buckets" or categories (e.g., unpopular, popular, neutral, average)
  • Defining and setting up these buckets will be a major consideration and will affect our output
  • Will probably need to do a refresher on how this works (hopefully, though, there's a scikit-learn module for this to do the bulk of the work for us)
  • Try to make this extensible in case we choose to add more features (e.g.: sentiment, variability from the average content, etc)
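
To make the bucket idea concrete, here is a rough standard-library sketch of a multinomial naive Bayes classifier with add-one smoothing. The bucket thresholds (5 and 50), the bucket names, and the sample titles are assumptions picked for illustration, not values we have agreed on. If we use scikit-learn instead, its `MultinomialNB` class covers the same ground.

```python
import math
import re
from collections import Counter, defaultdict

def to_bucket(score, thresholds=((5, "unpopular"), (50, "average"))):
    """Turn a raw score into a category; the cut-offs are placeholders."""
    for upper, label in thresholds:
        if score < upper:
            return label
    return "popular"

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def train_naive_bayes(docs, labels):
    """Count documents per bucket and word occurrences per bucket."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(doc, model):
    """Return the bucket with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            logp += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Made-up training data, purely for illustration.
titles = ["cute cat picture", "cute dog video",
          "tax form question", "boring tax rant"]
scores = [120, 80, 2, 1]
model = train_naive_bayes(titles, [to_bucket(s) for s in scores])
```

Changing the `thresholds` argument changes which bucket each training post falls into, which is why the bucket definitions have such a large effect on the output.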

Comparison of different factors and popularity

Need visualizations to make comparisons like the following:

  • Does sentiment predict popularity?
  • Does time of posting predict popularity?
  • Does difference from average predict popularity? (semi-stretch goal?)
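
Before plotting anything, the numbers behind these comparisons could be computed along the following lines. This is only a sketch: the field names `created_utc` (a Unix timestamp) and `score` are assumptions about how our post records are stored, and the Pearson coefficient is just one possible way to summarize how strongly a factor tracks popularity.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

def mean_score_by_hour(posts):
    """Average score of posts grouped by the UTC hour they were posted."""
    by_hour = defaultdict(list)
    for post in posts:
        posted = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        by_hour[posted.hour].append(post["score"])
    return {hour: mean(scores) for hour, scores in sorted(by_hour.items())}

def pearson(xs, ys):
    """Pearson correlation between a factor (xs) and popularity (ys)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / (sx * sy)
```

The dictionary returned by `mean_score_by_hour` maps directly onto a bar chart (hour on the x-axis, mean score on the y-axis), and `pearson` can be applied to sentiment scores or difference-from-average values in the same way.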

Figure out a way to extract information from images

Considerations:

  • A stretch goal for us
  • Multiple ways to use image info for popularity prediction:
    • Directly infer popularity from the image using a simple CNN or ANN. An ANN would more easily allow us to add more features, but is less suitable for images than a CNN.
    • Use some kind of ANN to automatically 'label' images and perform analysis on the labels
    • Use some kind of ANN to extract text from images, and perform analysis on the text

Analyze how informative a post is

Considerations:

  • Stretch goal
  • Suggested by Dr. Moore - can lead to interesting discussion about whether or not reddit promotes discussion

Analyze sentiment on a given text or bag-of-words

Considerations:

  • Most likely, will be using the actual text, not bag-of-words
  • Need to be careful with the libraries we use, since most of them just do a simple num_positive_words - num_negative_words calculation
  • May be possible to do more complex sentiment analysis, such as extracting emotions
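
For reference, the simple counting approach mentioned above looks roughly like this. The word lists are tiny made-up placeholders, not a real sentiment lexicon. The sketch is meant to show why the approach is risky: it ignores negation and context.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"good", "great", "love", "awesome"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}

def naive_sentiment(text):
    """num_positive_words - num_negative_words, with no context handling."""
    tokens = re.findall(r"[a-z']+", text.lower())
    num_positive = sum(token in POSITIVE for token in tokens)
    num_negative = sum(token in NEGATIVE for token in tokens)
    return num_positive - num_negative
```

With these word lists, `naive_sentiment("this is not good")` comes out positive because "not" is ignored, which is exactly the kind of mistake to watch for when choosing a library.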

Analyze how different a post is from others

Currently, the basic idea is to find an 'average' bag-of-words for a subreddit, then compute the root-mean-square difference between each post's bag-of-words and that average to measure how much the post differs from it.

This might be even more useful if we do analysis on comments.
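
A small sketch of that calculation, assuming each post's bag-of-words is stored as a `collections.Counter` of word counts. The function names are made up for illustration. Taking the mean over the union of the post's words and the average's words is one possible choice, not something we have settled on.

```python
import math
from collections import Counter

def average_bag(bags):
    """Mean count of every word across all posts in a subreddit."""
    totals = Counter()
    for bag in bags:
        totals.update(bag)
    return {word: count / len(bags) for word, count in totals.items()}

def rms_difference(bag, average):
    """Root-mean-square difference between one post and the average."""
    words = set(bag) | set(average)
    if not words:
        return 0.0
    squared = sum((bag.get(w, 0) - average.get(w, 0)) ** 2 for w in words)
    return math.sqrt(squared / len(words))

# Made-up bags, purely for illustration.
bags = [Counter({"cat": 2}), Counter({"dog": 2})]
average = average_bag(bags)
difference = rms_difference(bags[0], average)
```

A post whose word counts match the average exactly gets a difference of 0, and the value grows as the post uses words more or less often than the subreddit as a whole. The same functions would work unchanged on bags built from comments.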
