This repository shows the basics of using PySpark, inspired by the Lynda course Apache PySpark by Example.
The data used is Chicago's reported crime data, which can be downloaded here (more than 1.5 GB): https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD
and Chicago's police stations data, which can be downloaded here: https://data.cityofchicago.org/api/views/z8bn-74gv/rows.csv?accessType=DOWNLOAD
sample.csv is a sample of the crimes data, and Police_Stations.csv is the entire police stations dataset.
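
As a starting point, the two files named above could be loaded into Spark DataFrames roughly as follows. This is a minimal sketch, not necessarily how the repository's own code does it: the helper names (`build_session`, `load_csv`, `load_datasets`), the app name, the local master setting, and the assumption that both CSV files have a header row and sit in the same directory are all illustrative choices.

```python
from pathlib import Path

# File names taken from this README; adjust if your copies live elsewhere.
CRIMES_CSV = "sample.csv"
STATIONS_CSV = "Police_Stations.csv"


def build_session(app_name="chicago-crimes"):
    """Create (or reuse) a local SparkSession. The app name is arbitrary."""
    # Imported lazily so the helpers below can be imported without Spark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master("local[*]").appName(app_name).getOrCreate()


def load_csv(spark, path):
    """Read a CSV file into a DataFrame, assuming it has a header row."""
    # header=True uses the first line as column names;
    # inferSchema=True lets Spark scan the data to guess column types.
    return spark.read.csv(str(path), header=True, inferSchema=True)


def load_datasets(spark, data_dir="."):
    """Return (crimes, stations) DataFrames read from data_dir."""
    base = Path(data_dir)
    return load_csv(spark, base / CRIMES_CSV), load_csv(spark, base / STATIONS_CSV)
```

With PySpark installed, `crimes, stations = load_datasets(build_session())` followed by `crimes.printSchema()` would show the inferred columns. Schema inference needs an extra pass over the data, so for the full 1.5 GB crimes file an explicit schema may be the better choice.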