Code Monkey home page Code Monkey logo

san-francisco-compensation-case-study's Introduction

Case Study - San Francisco Compensation Using Python and SQL

Introduction

The San Francisco Salaries case study involves a dataset of salaries for public sector employees in the city of San Francisco.

The City of San Francisco maintains a public database of employee compensation data, which is updated annually and made available to the public for transparency and accountability purposes.

This case study will follow the 5 phases of the Data Analysis process: Ask, Prepare, Process, Analyze, and Act (APPAA) to explore the factors that contribute to employee compensation in San Francisco.

By analyzing the dataset, we can gain insights into the distribution of salaries across different job titles, the average compensation for different levels of experience, the correlation between years of experience and salaries, and the pension debt for different job titles.

The findings from this case study can be used by policymakers and public sector organizations to ensure fair and equitable compensation for all public sector employees in San Francisco.

ASK

In this case study, we will explore the San Francisco Salaries dataset to analyze different factors that contribute to employee compensation. The dataset includes a range of variables, such as job titles, total pay, years of experience, and pension debt, making it a valuable resource for understanding the compensation of public sector employees. To achieve this goal, we will address the following questions:

  1. What are the top 10 job titles with the highest number of employee?
  2. What is the distribution of salaries of top 10 job titles with the highest number?
  3. What is the average basic pay by job title and year between 2017 and 2021?
  4. What are the top 5 job titles with the highest average base pay in 2021 for the job titles that have at least 50 employees?
  5. What are the top 10 job titles with the highest number of employees in 2021 that have an average total pay and benefits above the overall average?
  6. Which job titles had the highest average BasePay in 2021?
  7. What was the average BasePay for employees in the three job titles with the highest number of employees in 2021?
  8. How many employees had a base pay less than the average base pay for their job title in each year from 2017 to 2021?
  9. How many employees had a total pay and benefits greater than $200,000 in each year from 2017 to 2021?
  10. Which job title had the highest average OvertimePay in 2021?

By answering these questions, we can gain valuable insights into the factors that contribute to employee compensation in San Francisco and identify areas where interventions may be necessary to ensure fair and equitable compensation for all public sector employees.

PROCESS

Cleaning and Preparation of data for analysis

USING PYTHON

Read the CSV file

import pandas as pd

df = pd.read_csv(r'...\sf-salaries.csv')

 

Check for nulls

df.isnull().sum() / len(df) * 100

 

Remove spaces between words in each column

df.columns = df.columns.str.replace(' ', '')

 

Drop unessential columns

df = df.drop(['EmployeeName', 'Notes', 'Agency'], axis=1)

 

Remove rows with 0 and negative values in the 'BasePay' column

df = df[df['BasePay'] > 0]

 

Save the cleaned data to a new CSV file

df.to_csv('SF_Salaries_cleaned.csv', index=False)

 

ANALYZE

We analyzed the results of our queries and discovered the following insights:

USING MySQL

  1. What are the top 10 job titles with the highest number of employee?
SELECT JobTitle, COUNT(*) AS TotalEmployees
FROM Salaries
GROUP BY JobTitle
ORDER BY TotalEmployees DESC
LIMIT 10;

 

  1. What is the distribution of salaries of top 10 job titles with the highest number?
SELECT JobTitle, COUNT(*) AS TotalEmployees, 
    MIN(BasePay) AS MinSalary, MAX(BasePay) AS MaxSalary,
    AVG(BasePay) AS AvgSalary, STDDEV(BasePay) AS StdDevSalary
FROM Salaries
WHERE JobTitle IN (
    SELECT JobTitle
    FROM (
        SELECT JobTitle, COUNT(*) AS TotalEmployees
        FROM Salaries
        GROUP BY JobTitle
        ORDER BY TotalEmployees DESC
        LIMIT 10
    ) AS top_jobs
)
GROUP BY JobTitle
ORDER BY TotalEmployees DESC;

 

  1. What is the average basic pay by job title and year between 2017 and 2021?
SELECT JobTitle, Year, ROUND(AVG(BasePay), 2) AS AverageBasicPay
FROM salaries
WHERE JobTitle IN (
  SELECT JobTitle
  FROM (
    SELECT JobTitle, COUNT(*) AS TotalEmployees
    FROM salaries
    GROUP BY JobTitle
    ORDER BY TotalEmployees DESC
  ) AS top_jobs
)
AND Year BETWEEN '2017' AND '2021'
GROUP BY JobTitle, Year
ORDER BY JobTitle, Year;

 

  1. What are the top 5 job titles with the highest average base pay in 2021 for the job titles that have at least 50 employees?
SELECT JobTitle, AVG(BasePay) AS AverageBasePay
FROM salaries
WHERE JobTitle IN (
  SELECT JobTitle
  FROM (
    SELECT JobTitle, COUNT(*) AS TotalEmployees
    FROM salaries
    WHERE Year = '2021'
    GROUP BY JobTitle
    HAVING TotalEmployees >= 50
  ) AS top_jobs
)
GROUP BY JobTitle
ORDER BY AverageBasePay DESC
LIMIT 5;

 

  1. What are the top 10 job titles with the highest number of employees in 2021 that have an average total pay and benefits above the overall average?
SELECT JobTitle, COUNT(*) AS TotalEmployees
FROM salaries
WHERE JobTitle IN (
  SELECT JobTitle
  FROM (
    SELECT JobTitle, AVG(TotalPay&Benefits) AS AverageTotalPayBenefits
    FROM salaries
    WHERE Year = '2021'
    GROUP BY JobTitle
    HAVING AverageTotalPayBenefits > (
      SELECT AVG(TotalPay&Benefits)
      FROM salaries
    )
  ) AS top_jobs
)
GROUP BY JobTitle
ORDER BY TotalEmployees DESC
LIMIT 10;

 

  1. Which job titles had the highest average BasePay in 2021?
SELECT JobTitle, ROUND(AVG(BasePay),2) AS AvgBasePay
FROM salaries
WHERE JobTitle IN (
    SELECT JobTitle
    FROM (
        SELECT JobTitle, COUNT(*) AS TotalEmployees
        FROM salaries
        GROUP BY JobTitle
        ORDER BY TotalEmployees DESC
        LIMIT 10
    ) AS top_jobs
)
AND Year = 2021
GROUP BY JobTitle
ORDER BY AvgBasePay DESC;

 

  1. What was the average BasePay for employees in the three job titles with the highest number of employees in 2021?
SELECT ROUND(AVG(BasePay),2) AS AvgBasePay
FROM salaries
WHERE JobTitle IN (
    SELECT JobTitle
    FROM (
        SELECT JobTitle, COUNT(*) AS TotalEmployees
        FROM salaries
        WHERE Year = 2021
        GROUP BY JobTitle
        ORDER BY TotalEmployees DESC
        LIMIT 3
    ) AS top_jobs
)
AND Year = 2021;

 

  1. How many employees had a base pay less than the average base pay for their job title in each year from 2017 to 2021?
SELECT Year, JobTitle, COUNT(*) AS NumEmployeesBelowAvgBasePay
FROM (
  SELECT Year, JobTitle, BasePay, AVG(BasePay) OVER (PARTITION BY JobTitle) AS AvgBasePay
  FROM salaries
  WHERE Year BETWEEN '2017' AND '2021'
) AS base_pay_by_jobtitle
WHERE BasePay < AvgBasePay
GROUP BY Year, JobTitle
ORDER BY 2,1;

 

  1. How many employees had a total pay and benefits greater than $200,000 in each year from 2017 to 2021?
SELECT Year, COUNT(*) AS NumEmployeesOver200K
FROM (
  SELECT Year, Status, (BasePay + OvertimePay + OtherPay + Benefits + PensionDebt) AS TotalPayAndBenefits
  FROM salaries
) AS pay_and_benefits
WHERE TotalPayAndBenefits > 200000 AND Year BETWEEN '2017' AND '2021' AND Status = 'FT'
GROUP BY Year;

 

  1. Which job title had the highest average OvertimePay in 2021?
SELECT JobTitle, ROUND(AVG(OvertimePay),2) AS AvgOvertimePay
FROM salaries
WHERE JobTitle IN (
    SELECT JobTitle
    FROM (
        SELECT JobTitle, COUNT(*) AS TotalEmployees
        FROM salaries
        WHERE Year = 2020
        GROUP BY JobTitle
        ORDER BY TotalEmployees DESC
        LIMIT 10
    ) AS top_jobs
)
AND Year = 2021
GROUP BY JobTitle
ORDER BY AvgOvertimePay DESC
LIMIT 1;

 

ACT

Conclusion:

Based on our analysis of the salaries dataset, we can conclude that the top job titles with the highest number of employees are "Transit Operator", "Special Nurse", and "Registered Nurse". Additionally, we found that the distribution of salaries among the top 10 job titles is quite wide, with some job titles having a large range of salaries. We also found that the average basic pay varies by job title and year, with some job titles experiencing fluctuations in average pay over time.  

Recommendation:

My analysis suggests that organizations should focus on ensuring that employees are paid fairly and equitably across job titles, particularly for job titles that have a large number of employees. Additionally, organizations should consider implementing measures to address the wide range of salaries among job titles and ensure that there is transparency and consistency in pay practices. Finally, organizations should regularly review salary data to identify any trends or fluctuations in average pay by job title and year, and adjust their compensation strategies as needed to remain competitive in the job market.  

References

Dataset Source: Transparent California  

Licenses: Database: Open Database, Contents: Database Contents

Contact Information

LinkedIn  

UpWork

san-francisco-compensation-case-study's People

Contributors

ninogarcia avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.