
reddit-prediction-model's Introduction

Reddit Upvote Prediction Model

A linear regression model aimed at identifying the features that best predict a Reddit post's upvotes.

Description

This repo stores a research project investigating how different features influence a linear regression model that predicts a Reddit post's upvotes. The finalized paper can be found in pdf/Research_Paper.pdf.
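The kind of model described above can be illustrated with a rough sketch. It fits an ordinary least-squares linear regression with an intercept using NumPy. It is not the project's actual script, and the function names are hypothetical. The toy feature matrix stands in for whatever per-post features the real model uses, which are not listed here.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Fit y ~ X @ w + b by ordinary least squares and return (w, b).

    Hypothetical helper for illustration; not taken from the repository.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the intercept is learned as the last coefficient.
    X1 = np.column_stack([X, np.ones(len(X))])
    theta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return theta[:-1], theta[-1]

def predict(X, w, b):
    """Apply the fitted weights and intercept to new feature rows."""
    return np.asarray(X, dtype=float) @ w + b
```

For example, fitting `[[1, 0], [2, 1], [3, 0], [4, 1]]` against `[3, 8, 7, 12]` recovers weights close to `[2, 3]` and an intercept close to `1`.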

To start, this project web-mined the top 500 subreddits by subscriber count and then attempted to collect each one's top 500 posts from the last 365 days. The top 500 subreddits can be found here, and the 246,472 posts can be found here. NOTE: This data includes both SFW and NSFW content; you have been warned.

The finalized prediction model script can be found here. Numerous scripts were used to optimize and analyze the model, including ablation studies; these can be found in the scripts folder. Any images produced by the scripts are stored in the images folder.
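In an ablation study, the model is refit with one feature removed at a time, and the increase in error is measured. Below is a minimal sketch of that loop. It assumes an OLS model with an intercept, a fixed train/test split, and mean squared error as the metric. All three are assumptions, and the repository's scripts may use different metrics or splits. The function names and the feature names in the example are hypothetical.

```python
import numpy as np

def mse_of_fit(X_train, y_train, X_test, y_test):
    """Fit OLS with an intercept on the training split and return the test MSE."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    theta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    pred = np.column_stack([X_test, np.ones(len(X_test))]) @ theta
    return float(np.mean((pred - y_test) ** 2))

def ablation(X_train, y_train, X_test, y_test, names):
    """Return {feature name: increase in test MSE when that feature is dropped}."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    baseline = mse_of_fit(X_train, y_train, X_test, y_test)
    deltas = {}
    for j, name in enumerate(names):
        # Keep every column except feature j, refit, and compare with the baseline.
        keep = [k for k in range(X_train.shape[1]) if k != j]
        mse = mse_of_fit(X_train[:, keep], y_train, X_test[:, keep], y_test)
        deltas[name] = mse - baseline
    return deltas
```

A large positive delta suggests the dropped feature carried predictive signal. A delta near zero suggests the model barely relied on it.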

This project was a class assignment for CSE 158 (Fall 2021), and the assignment description can be found here. The approach and model were inspired by a research paper that can be found here, and the dataset that paper uses can be found here.

Getting Started

Dependencies

Authors

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

Acknowledgments

  • frontpagemetrics.com
    • Data Used: 2021-11-19.csv
  • CSE 158 Datasets
    • Understanding the interplay between titles, content, and communities in social media. Himabindu Lakkaraju, Julian McAuley, Jure Leskovec. ICWSM, 2013.
    • Data Used: submissions.csv.gz

reddit-prediction-model's People

Contributors

  • donaldwolfson


reddit-prediction-model's Issues

Scrape Reddit for metadata

Use PRAW to scrape Reddit's 500 most popular subreddits and collect metadata on each one's top submissions of the year. More info on submissions can be found here. Make sure to handle PRAW's API request rate limits, which are described here.
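Below is a minimal sketch of the scraping loop. It assumes a PRAW client is passed in. `Subreddit.top(time_filter="year", limit=...)` is PRAW's listing call for a subreddit's top submissions. The function name `fetch_top_submissions` and the extra `pause` between subreddits are hypothetical choices. PRAW already paces its requests using the rate-limit headers Reddit returns, so the pause is only extra caution and not a requirement.

```python
import time

def fetch_top_submissions(reddit, subreddit_names, limit=500, pause=1.0):
    """Yield (subreddit_name, submission) pairs for each subreddit's top posts of the year.

    `reddit` is assumed to behave like a praw.Reddit instance.
    `pause` is a hypothetical courtesy delay, in seconds, between subreddits.
    """
    for name in subreddit_names:
        # time_filter="year" limits the listing to roughly the last 365 days.
        for submission in reddit.subreddit(name).top(time_filter="year", limit=limit):
            yield name, submission
        # Extra spacing between subreddits; PRAW also throttles individual requests.
        time.sleep(pause)
```

In practice, `reddit` would be created with something like `praw.Reddit(client_id="...", client_secret="...", user_agent="...")`. Private, quarantined, or banned subreddits may raise errors, which a real run would need to catch and skip.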

CSV data should be returned in this format (if possible):

image_id,unixtime,rawtime,title,total_votes,reddit_id,number_of_upvotes,subreddit,number_of_downvotes,localtime,score,number_of_comments,username
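Python's `csv.DictWriter` is one way to emit rows in exactly this column order. In this sketch, the function name `write_rows` is hypothetical. Columns with no value, such as `image_id`, are written as empty strings through `restval`.

```python
import csv
import io

# Column order copied from the format requested above.
FIELDNAMES = [
    "image_id", "unixtime", "rawtime", "title", "total_votes", "reddit_id",
    "number_of_upvotes", "subreddit", "number_of_downvotes", "localtime",
    "score", "number_of_comments", "username",
]

def write_rows(rows, fh):
    """Write dict rows to the file-like object fh as CSV with a header line."""
    # restval="" fills missing columns; extrasaction="raise" rejects unexpected keys.
    writer = csv.DictWriter(fh, fieldnames=FIELDNAMES, restval="", extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends.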

Here is a possible mapping of those CSV values to the values in the Submissions:

  • image_id: ?
  • unixtime: created_utc
  • rawtime: Conversion of created_utc
  • title: title (PRAW's name attribute is the submission's fullname, not its title)
  • total_votes: Math involving upvote_ratio and score
  • reddit_id: id
  • number_of_upvotes: score
  • subreddit: subreddit
  • number_of_downvotes: Math involving upvote_ratio and score
  • localtime: Conversion of created_utc
  • score: score (?)
  • number_of_comments: num_comments (avoids loading the full comment tree)
  • username: author
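The "math involving upvote_ratio and score" can be worked out directly. Let u be upvotes and d be downvotes. Reddit's score is s = u - d and upvote_ratio is r = u / (u + d). Then the total is u + d = s / (2r - 1), with u = r(u + d) and d = (1 - r)(u + d). The estimate is undefined at r = 0.5.

The sketch below applies this formula and the rest of the mapping to a PRAW Submission. It is hedged in several ways:

  • Reddit fuzzes vote counts and rounds upvote_ratio, so the vote numbers are approximations.
  • It reads `title` and `num_comments` from the submission.
  • It fills number_of_upvotes with the estimate rather than the raw score.
  • The ISO-8601 strings used for rawtime and localtime are assumptions and may not match the original dataset's format.

Both function names are hypothetical.

```python
from datetime import datetime, timezone

def estimate_votes(score, upvote_ratio):
    """Estimate (upvotes, downvotes, total) from net score and upvote ratio.

    score = u - d and ratio = u / (u + d), so u + d = score / (2 * ratio - 1).
    Returns (None, None, None) when ratio is 0.5, where the formula is undefined.
    Results are approximate because Reddit fuzzes votes and rounds the ratio.
    """
    denom = 2 * upvote_ratio - 1
    if abs(denom) < 1e-9:
        return None, None, None
    total = score / denom
    up = upvote_ratio * total
    down = total - up
    return round(up), round(down), round(total)

def submission_to_row(sub):
    """Map a PRAW-like Submission to the requested CSV columns (a sketch)."""
    created = datetime.fromtimestamp(sub.created_utc, tz=timezone.utc)
    up, down, total = estimate_votes(sub.score, sub.upvote_ratio)
    return {
        "image_id": "",  # no obvious PRAW equivalent; left open in the mapping
        "unixtime": int(sub.created_utc),
        "rawtime": created.isoformat(),  # assumption: ISO-8601 UTC string
        "title": sub.title,
        "total_votes": total,
        "reddit_id": sub.id,
        "number_of_upvotes": up,  # estimated, not the raw score
        "subreddit": str(sub.subreddit),
        "number_of_downvotes": down,
        "localtime": created.astimezone().isoformat(),  # assumption: machine-local time
        "score": sub.score,
        "number_of_comments": sub.num_comments,
        # author is None for deleted accounts.
        "username": str(sub.author) if sub.author else "[deleted]",
    }
```

For example, a post with score 60 and upvote_ratio 0.8 comes out to roughly 80 upvotes, 20 downvotes, and 100 total votes.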
