The files here contain PySpark code written in Zeppelin notebooks on AWS.
To run them, create an EMR cluster with Spark and Zeppelin installed, then import these files into Zeppelin.
The data referenced in these files is publicly available in the S3 bucket, so the code should run without any extra data setup.
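
As a rough illustration of how a notebook paragraph might load one of the public files, here is a minimal sketch. The bucket name `example-bucket`, the key `data/sample.csv`, and the helper names `s3_uri` and `read_public_csv` are hypothetical placeholders, not taken from these files. It assumes the `spark` session that Zeppelin's `%pyspark` interpreter provides and a CSV file with a header row; substitute the paths that appear in the notebooks themselves.

```python
# In Zeppelin, start the paragraph with the interpreter directive: %pyspark

def s3_uri(bucket, key, scheme="s3"):
    """Build an S3 URI; on EMR the plain s3:// scheme is normally used."""
    return "{}://{}/{}".format(scheme, bucket, key.lstrip("/"))

def read_public_csv(spark, bucket, key):
    """Read a CSV object from S3 into a Spark DataFrame.

    `spark` is the SparkSession that Zeppelin creates for %pyspark paragraphs.
    """
    return spark.read.csv(s3_uri(bucket, key), header=True, inferSchema=True)

# Hypothetical usage inside a Zeppelin paragraph (placeholder bucket and key):
# df = read_public_csv(spark, "example-bucket", "data/sample.csv")
# df.printSchema()
```

Keeping the path construction in a small helper makes it easy to point every paragraph at a different bucket or prefix if the data is copied elsewhere.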