/r/cscareerquestions Salary Scraper

This script scrapes /r/cscareerquestions salary sharing threads for offer information and writes these details to a csv file. Currently it records info for the following fields: company, location, salary, relocation bonus, signing bonus, stock, and total compensation.

Overview

The /r/cscareerquestions subreddit hosts periodic salary sharing threads where people share details of their job offers (like this one).

If you just want the data and don't need to run the script, see output/salaries.csv.

Commenters don't use a common format when inputting data (e.g. they often write amounts as text instead of numbers), so none of the fields are strictly numeric. This means it's hard to do any analysis of salary/relocation/signing/stock without first doing some serious cleanup of the data. For now, this is mostly useful as a personal reference for what salaries to expect from various companies.
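As a rough illustration of the cleanup this would take, here is a minimal sketch that pulls the first dollar-like amount out of a free-text field. It is not part of the script; the name `parse_amount` and the regular expression are assumptions made for this example, and it only covers simple forms such as "70k", "52.5K" and "$75,000".

```python
import re
from typing import Optional

# A number with thousands separators or an optional decimal part,
# optionally followed by a k/K suffix (hypothetical pattern for this sketch).
_AMOUNT = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)([kK]\b)?")


def parse_amount(text: str) -> Optional[float]:
    """Return the first amount found in text as a float, or None if there is none."""
    match = _AMOUNT.search(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    if match.group(2):
        # "70k" means 70 thousand
        number *= 1000
    return number


print(parse_amount("70k"))      # 70000.0
print(parse_amount("$75,000"))  # 75000.0
print(parse_amount("None"))     # None
```

Entries such as "5 - 10%" or the mistyped "$45,00" still come back as misleading numbers, so anything parsed this way needs a manual review pass.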

Setup And Run

Clone this repository:

git clone https://github.com/anders617/cscareerquestions-salaries.git

Install the praw Reddit API wrapper (with either pip or conda):

pip install praw
conda install -c conda-forge praw

Install the dotenv library (with either pip or conda):

pip install -U python-dotenv
conda install -c conda-forge python-dotenv

Next, you will need credentials to use the Reddit API.

Navigate to https://www.reddit.com/prefs/apps and click the "create app" button. Create an app in order to get a client id and client secret.

You can find the CLIENT_ID and CLIENT_SECRET in the locations marked below:

[Image: Reddit app settings with the client id and client secret marked]

Create a new .env file in the same directory as salaries.py with the following contents (using your new client id/secret):

CLIENT_ID='YOUR_CLIENT_ID'
CLIENT_SECRET='YOUR_CLIENT_SECRET'
USER_AGENT='python'
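For reference, the three variables above correspond to the keyword arguments of praw's `Reddit` constructor. The following is only a sketch of how they could be loaded, not necessarily how salaries.py does it; the helper names `reddit_kwargs` and `make_reddit` are hypothetical.

```python
import os
from typing import Dict, Mapping


def reddit_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Map the .env variable names onto praw.Reddit keyword arguments."""
    return {
        "client_id": env["CLIENT_ID"],
        "client_secret": env["CLIENT_SECRET"],
        "user_agent": env["USER_AGENT"],
    }


def make_reddit():
    """Build a read-only praw client (requires praw and python-dotenv)."""
    # Imported here so reddit_kwargs works without the third-party packages.
    import praw
    from dotenv import load_dotenv

    load_dotenv()  # looks for a .env file and loads it into os.environ
    return praw.Reddit(**reddit_kwargs())
```

A missing variable raises `KeyError` in `reddit_kwargs`, which makes an incomplete .env file easy to spot.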

Run salaries.py in the terminal:

python salaries.py --output=output/salaries.csv --verbose

Output

You should get output similar to the following:

[...]
========================================================
Company: Financial Institution
Location: Charlotte, NC
Salary: 70k
Relocation: None
Signing: None
Stock: 5 - 10%
Total: 77k
========================================================
Company: Health Insurance
Location: Buffalo, NY
Salary: $45,00 (That was a year ago, offer is now $50k)
Relocation: $0
Signing: $0
Stock: $2,300, but since we are a non-profit, bonuses are dependent on meeting our financial goals for the year.
Total: $53,000
========================================================
Company: Northrop Grumman
Location: Richmond VA
Salary: 52.5K
Relocation: None
Signing: None
Stock: I think we get these, a couple thousand if we hit goals.
Total: None
========================================================
Company: Digital Agency
Location: Southern Brazil
Salary: $5.9k (year)
Relocation: None
Signing: None
Stock: None
Total: None
========================================================
Company: SAAS
Location: Chicago
Salary: $75,000
Relocation: $0
Signing: $0
Stock: No stock, yearly bonus depends. My last one was about 1.6k
Total: None
========================================================
718 Salaries Recorded From 718 Relevant Comments (Out Of 4458 Total) In 11 Salary Sharing Threads
16.1% of comments were salaries

10 Most Common Companies:
        Google: 30
        Amazon: 24
        Microsoft: 20
        Big 4: 19
        Finance: 18
        Facebook: 15
        Defense: 10
        IBM: 10
        Capital One: 10
        Fintech: 7

10 Most Common Locations:
        Seattle: 33
        NYC: 27
        Bay Area: 25
        San Francisco: 16
        Chicago: 16
        London: 14
        Toronto: 13
        Redmond, WA: 12
        SF: 12
        Austin, TX: 12
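The verbose record blocks shown above use a plain `Field: value` layout separated by rules of `=` characters. If you only have the terminal output on hand, a sketch like the following could turn it back into dictionaries. The function name `parse_records` is hypothetical, and the field list is taken from the sample output rather than from the script itself.

```python
from typing import Dict, List

# Field labels as they appear in the sample output above.
FIELDS = ("Company", "Location", "Salary", "Relocation", "Signing", "Stock", "Total")


def parse_records(output: str) -> List[Dict[str, str]]:
    """Split verbose output into one dict per record."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if line and set(line) == {"="}:
            # A rule of '=' characters closes the record collected so far.
            if current:
                records.append(current)
                current = {}
        elif ": " in line:
            # Split on the first ": " only, since values may contain colons.
            field, value = line.split(": ", 1)
            if field in FIELDS:
                current[field] = value
    if current:
        records.append(current)
    return records
```

Lines from the summary section (for example the company and location counts) are ignored because their labels are not in `FIELDS`.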

Here are the first few lines of output/salaries.csv:

Date | Company | Salary | Location | Relocation | Signing | Stock | Total | Url
2019-09-09 20:38:39 | Amazon Web Services | 112k/yr | Austin Texas | 9k lump sum post tax, miles/meals reimbursed | 38k first year 22k second year | 80k over 4 years | ~150k a year? plus | https://www.reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/ezqn8rr
2019-09-05 06:19:35 | mature NYC startup | $105,000 | New York | 0 | 0 | 17,000 stock options | $105,000 (valuing options at $0) | https://www.reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/ez3bre4
2019-09-04 13:38:07 | Finance | 80k | Boston | 5k | 5k | 0 | 85k | https://www.reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/eyyx82q

You can view the entire output from a recent run in output/salaries.csv.
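If you want to work with the csv file directly, the standard library is enough for simple tallies. The sketch below assumes the file is comma-delimited with the header row listed above (Date, Company, Salary, and so on); the function names `read_rows` and `most_common_companies` are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse csv text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def most_common_companies(
    rows: Iterable[Dict[str, str]], n: int = 10
) -> List[Tuple[str, int]]:
    """Count rows per company, skipping rows with an empty Company field."""
    counts = Counter(row["Company"].strip() for row in rows if row.get("Company"))
    return counts.most_common(n)
```

For example, `most_common_companies(read_rows(pathlib.Path("output/salaries.csv").read_text(encoding="utf-8")))` would produce a tally like the "Most Common Companies" list printed by the script, provided the assumptions about the file format hold.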

Modifying The Script

Currently this only looks at New Grad salary sharing threads, but it can be adapted to parse other threads: edit the submission_ids list in main.py so that it contains the ids of the salary sharing threads you want.

The id of a thread can be found in its url (e.g. the id of reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/ is czhew5).
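If you are collecting several threads, the id can also be extracted programmatically. This is a small sketch, assuming the usual `/comments/<id>/` url layout shown in the example above; the helper name `thread_id` is hypothetical.

```python
import re
from typing import Optional

# The id is the path segment that follows "/comments/".
_THREAD_ID = re.compile(r"/comments/([a-z0-9]+)(?:/|$)")


def thread_id(url: str) -> Optional[str]:
    """Return the thread id from a Reddit thread url, or None if it has none."""
    match = _THREAD_ID.search(url)
    return match.group(1) if match else None


url = "reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/"
print(thread_id(url))  # czhew5
```

The resulting strings are what the submission_ids list expects.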
