Joao Pedro Afonso Cerqueira's Projects
Apache Airflow for K8s clusters with Docker Compose orchestration. Includes example workflows for jobs such as webhooks and web scrapers
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using Spark ML and H2O ML
Scripts to benchmark distributed Alternating Least Squares (ALS)
cluster-management-python-pyspark-ngrams-samples
Experimentation with Confluent Kafka tools and client solutions
Docker container for Jupyter Notebooks, using another repo as a baseline hook
Technical assignment
Hadoop Cloudera investigations
My adaptation of the flume-logs ingestion process
H2O and sparklyr setup in RStudio with demos/trials for Hadoop Spark
This is the core of the lost_saturn project, a modern approach to data science focused on enabling data science in containerised environments everywhere. It was built first as a local setup and then transformed into a container solution. Its tools are centralized in Jupyter, with Spark and H2O.ai AutoML. Ideal for running Jupyter notebooks in WSL (Windows Subsystem for Linux) or in Docker containers with Ubuntu 18.04 LTS
A deployment and setup of Apache Spark for multi-tenant usage in Kubernetes clusters. It deploys one executor per K8s pod and scales linearly.
Elasticsearch publisher using Hadoop as source and Spark 1.6 as ETL engine :: Running package for Cloudera CDH 5.9.0 Cluster
Technical test: GitHub repo for the test container
TensorFlow in Java. If Google can do it, I can do it!
AWS CLI and Terraform for a 6-node Cloudera CDH cluster with Hadoop, Spark, and Hive