- Installation
- Project Motivation
- File Descriptions
- Results
- Acknowledgements
The original data is 197MB. To download it, go to the Stack Overflow Annual Developer Survey page and choose the 2019 dataset.
No libraries beyond the Anaconda distribution of Python should be needed to run the code here. This project was built using Anaconda 1.7.2, Python 3.6.3 and Jupyter Notebook 5.7.8.
Libraries used:
- pandas
- numpy
- matplotlib
- seaborn
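The gather and assess steps can be sketched with pandas as shown below. This is a minimal sketch, not the notebook's own code. It assumes the downloaded archive has been extracted and that the responses live in a file named survey_results_public.csv. That file name is an assumption about the download and is not stated in this README. The hypothetical helper `load_survey` loads the responses and reports the fraction of missing answers in each column.

```python
import pandas as pd


def load_survey(path="survey_results_public.csv"):
    """Hypothetical helper: load the survey and report missing values.

    The default file name is an assumption about the extracted download.
    Returns the DataFrame and the fraction of missing answers per column,
    sorted from most to least missing (a quick "assess" step).
    """
    # low_memory=False avoids mixed-type warnings on wide survey files
    df = pd.read_csv(path, low_memory=False)
    missing = df.isnull().mean().sort_values(ascending=False)
    return df, missing
```

Usage example: `df, missing = load_survey()` followed by `missing.head(10)` lists the ten least-answered questions. This helps decide which columns to drop or impute before any analysis.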
For this project, I was interested in using Stack Overflow data from 2019 to better understand:
- What does the data show about men and women in terms of salary and working hours?
- How does company size influence the job satisfaction of developers?
- Do people in different countries learn about software development in different ways?
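The sketch below shows how the first two questions could be answered with pandas groupby. It is an illustration, not the notebook's actual method. The column names Gender, ConvertedComp, WorkWeekHrs, OrgSize and JobSat, and the answer labels it matches on, are assumptions about the 2019 schema. Check them against the survey's schema file. The 1 to 5 satisfaction scale is likewise a hypothetical encoding introduced here.

```python
import pandas as pd

# Assumed 2019 answer labels for JobSat, mapped to a hypothetical 1-5 scale
JOB_SAT_SCORES = {
    "Very dissatisfied": 1,
    "Slightly dissatisfied": 2,
    "Neither satisfied nor dissatisfied": 3,
    "Slightly satisfied": 4,
    "Very satisfied": 5,
}


def pay_and_hours_by_gender(df):
    """Median compensation and weekly hours for respondents answering exactly Man or Woman."""
    # Multi-select answers such as "Woman;Man" are excluded by the exact match
    subset = df[df["Gender"].isin(["Man", "Woman"])]
    return subset.groupby("Gender")[["ConvertedComp", "WorkWeekHrs"]].median()


def satisfaction_by_org_size(df):
    """Mean satisfaction score per company size, lowest first."""
    # Unmapped or missing JobSat answers become NaN and are skipped by mean()
    scores = df["JobSat"].map(JOB_SAT_SCORES)
    return scores.groupby(df["OrgSize"]).mean().sort_values()
```

The median is used for pay because self-reported compensation is heavily skewed by a few very large values. A mean would be pulled upward by those outliers.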
The full data exploration is contained in the notebook Stackoverflow 2019 Survey.ipynb.
This is carried out according to the CRISP-DM process and the data science process: gather, assess, clean, analyze, model, and visualize.
Markdown cells and comments are used to clarify all the steps and answer the questions I pose to the dataset. The remaining files in this repository are:
- -org-size-and-developer-2jjb.png: Image of a clustered bar chart depicting the relationship between company size and survey respondent satisfaction
- programming-training-correlated-with-heat.png: Image of a heat map depicting correlations between respondents’ country and non-degree software development education
- man.png and women.png: Images describing the survey results for men and women
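One way to build a country by learning-method matrix for a heat map is sketched below. It is a hypothetical helper, not necessarily how the chart above was produced. It assumes the non-degree education answers sit in a column named EduOther as semicolon-separated multi-select strings, which is an assumption about the 2019 schema. It also computes the share of each country's respondents choosing each method rather than correlation coefficients. It relies on `DataFrame.explode`, which needs pandas 0.25 or later.

```python
import pandas as pd


def education_share_by_country(df, countries, column="EduOther"):
    """Hypothetical helper: share of respondents per country selecting each method.

    Assumes `column` holds semicolon-separated multi-select answers.
    """
    subset = df[df["Country"].isin(countries)][["Country", column]].dropna()
    # Split each answer string and explode to one row per (respondent, method)
    methods = subset.assign(**{column: subset[column].str.split(";")}).explode(column)
    counts = pd.crosstab(methods["Country"], methods[column])
    # Divide by respondents per country, not by total selections,
    # because one respondent can pick several methods
    respondents = subset["Country"].value_counts()
    return counts.div(respondents, axis=0)
```

The resulting DataFrame can be passed directly to seaborn's `heatmap` function. Each row shows what fraction of that country's respondents reported each learning method.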
The main findings of the analysis can be found in the post available here.
Thanks to Udacity for providing such a great project topic, and to Stack Overflow for providing the dataset.