sc_salary_data's Introduction

Salary Data Explorations

This is a project to explore the SC Salary data that is provided through the state's Transparency Portal.

There are two sets of files currently in the raw_data directory: salary CSV files and state employee counts by agency. Files dated before July 2019 were pulled from Archive.org's copy of the site. At some points the CSV file links appear to have been broken, so some files may be missing. Please also note that the SC State Accountability Portal provides a careful description of the limitations of this information, including agencies not being included or not providing full details about all compensation. Review that source fully to understand the accuracy limits of this data.

Data Files

There are two sets of data provided in this project currently: salary data as disclosed by the state government for people making more than $50,000 a year, and a count of employees by agency as provided by the state. These two data sets should not be expected to match: the state's employee counts carry no disclaimer stating that only employees making more than $50,000 are included, so the counts presumably cover employees who do not appear in the salary files. The employee counts are included here because they should, in theory, make it easier to create rough estimates of how these lower-paid employees affect the calculation of averages.
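As a rough illustration of that kind of estimate, the sketch below bounds an agency-wide average salary from the reported salaries plus the headcount. It is not part of this project's scripts. The function name and the figures are hypothetical, and it assumes every employee missing from the salary file earns somewhere between $0 and $50,000.

```python
def overall_average_bounds(reported_count, reported_average, total_count,
                           low_floor=0.0, low_ceiling=50_000.0):
    """Return (lower, upper) bounds on the average salary of all employees.

    reported_count and reported_average describe the employees listed in the
    salary file; total_count is the agency headcount. Employees who are not
    listed are assumed to earn between low_floor and low_ceiling.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    unreported = total_count - reported_count
    if unreported < 0:
        raise ValueError("headcount is smaller than the number of reported salaries")
    reported_total = reported_count * reported_average
    lower = (reported_total + unreported * low_floor) / total_count
    upper = (reported_total + unreported * low_ceiling) / total_count
    return lower, upper


# Hypothetical figures, not taken from the data set.
low, high = overall_average_bounds(reported_count=40, reported_average=80_000,
                                   total_count=100)
print(f"overall average is between ${low:,.0f} and ${high:,.0f}")
# prints: overall average is between $32,000 and $62,000
```

The gap between the two bounds shows how much the unlisted employees can move the average: here the $80,000 average of the listed employees overstates the agency-wide figure by at least $18,000.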

Scripts

This project provides a few simple scripts to help acquire files over time and clean them up.

get_salary_data.py

Checks for a new version of the salary data file on the admin.sc.gov site and adds it to the raw_data directory if one is found. This script is designed to run as a daily cron job and assumes the page layout and links use the same markup they have used since 2015 (one day this assumption will no longer hold).
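The general shape of such a check can be sketched as follows. This is an illustration under stated assumptions, not the actual contents of get_salary_data.py: the class and function names are hypothetical, the page URL is a placeholder, and comparing file contents by SHA-256 hash is only one possible way to decide whether a download is new.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse


class CsvLinkParser(HTMLParser):
    """Collect the targets of <a> tags whose path ends in .csv."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).path.lower().endswith(".csv"):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def find_csv_links(html, base_url):
    """Return absolute URLs for every CSV link in the page markup."""
    parser = CsvLinkParser(base_url)
    parser.feed(html)
    return parser.links


def is_new(content, raw_dir):
    """True if no .csv file in raw_dir has exactly the same bytes as content."""
    digest = hashlib.sha256(content).hexdigest()
    for path in Path(raw_dir).glob("*.csv"):
        if hashlib.sha256(path.read_bytes()).hexdigest() == digest:
            return False
    return True


# Placeholder markup and URL, standing in for the real page.
page = '<a href="/files/salaries.csv">Salary data</a> <a href="/about">About</a>'
print(find_csv_links(page, "https://example.org/transparency/"))
# prints: ['https://example.org/files/salaries.csv']
```

Comparing by content rather than by file name is a deliberate choice in this sketch, since the files in raw_data are renamed after download and a name-based check would not recognize them.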

combine_files.py

Takes all .csv files in the raw_data directory and combines them into the processed.json file in order to provide the full historic data set.
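A minimal sketch of this kind of merge, using only the standard library, is shown below. It is not the actual combine_files.py. It assumes each CSV has a header row and is UTF-8 encoded, and the function name and the source_file field it adds are hypothetical.

```python
import csv
import json
from pathlib import Path


def combine_csv_files(raw_dir, out_path):
    """Read every .csv in raw_dir and write all rows to one JSON file.

    Returns the number of rows written.
    """
    records = []
    for csv_path in sorted(Path(raw_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Hypothetical extra field: remember which snapshot a row came from.
                row["source_file"] = csv_path.name
                records.append(row)
    with Path(out_path).open("w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
    return len(records)
```

A call such as `combine_csv_files("raw_data", "processed.json")` would then rebuild the combined file from scratch on each run. Tagging each row with its source file keeps the historic snapshots distinguishable after they are merged.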

get_emp_data.py

Downloads the employee count reports, which are currently provided in PDF format. It is designed to run as a daily cron job and assumes that the link doesn't change and that the state does not switch to CSV (which could cause naming conflicts with the salary data).
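One way such a naming conflict could be guarded against is to put the data set's name into every saved file name. The sketch below illustrates the idea only. It is not how get_emp_data.py currently names files, and the function name and the kind labels are hypothetical.

```python
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def local_name(url, kind, fetched_on):
    """Build a file name such as 'emp_counts_2020-01-15.pdf'.

    kind is a hypothetical label for the data set ('salary' or 'emp_counts').
    Keeping it in the name means the two data sets cannot collide, even if
    both are one day published as .csv files.
    """
    suffix = Path(urlparse(url).path).suffix.lower() or ".bin"
    return f"{kind}_{fetched_on.isoformat()}{suffix}"


# Placeholder URL, standing in for the real report link.
print(local_name("https://example.org/reports/Count.PDF", "emp_counts", date(2020, 1, 15)))
# prints: emp_counts_2020-01-15.pdf
```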

Licensing

The included License applies only to the other material in this project. The salary information (provided by the SC State government) and the content in the raw_data directory (which is unaltered from the source except for the file names) are both public domain.
