KLaHD's Projects
A batch-processing data pipeline using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform and orchestrated from locally hosted Airflow containers. The end products are a Superset dashboard and a Postgres database, hosted on an EC2 instance (currently powered down).
Personal Data Engineering Projects
Fetches data from an API with a script scheduled by Airflow, sends it to Kafka, consumes it with Spark, and writes it to Cassandra
The GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Purity UI Dashboard - Free and Open Source Chakra UI Dashboard
Spark with Scala. Big data project that analyzes 35 GB of Parquet data (~400 GB as decompressed CSV) and extracts business insights from it
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Configuration files for the post 'Getting Started with Apache Zeppelin on Amazon EMR, using AWS Glue, RDS, and S3'
Zeppelin Notebooks for use on AWS EMR with and without using Zepl