Name: Guilherme de A. Gasque
Type: User
Company: Next Reason
Bio: Data Architect @ Next Reason | Master's student in Innovation Technology @ UNIFESP | Community Director @ Think Big Think Data | 🐍 Pyth♥n
Twitter: guigasque
Location: Campinas, SP - Brazil
Blog: https://www.linkedin.com/in/guigasque/
Guilherme de A. Gasque's Projects
Canonical knowledge base of the Foundation for Public Code vereniging
Example code for the Arduino IDE (Arduino / ESP8266 / ESP32)
A topic-centric list of high-quality open datasets. PRs welcome ☛☛☛
DEPRECATED; see https://github.com/boot2docker/boot2docker/pull/1408
The Data Engineering Cookbook
An Interface for the Brazilian Public Healthcare Data Repository (DATASUS) for the R Language
Data processing pipeline that extracts raw data, applies ETL transformations to prepare it for analysis, stores it in Delta format on Amazon S3, and uses PySpark within a Jupyter environment running in a Docker container managed by Jenkins
Google Cloud Client Library for Python
Geographic boundary coordinates (in KML and JSON) for Brazilian states and municipalities
Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta Lake, PostgreSQL + Debezium CDC, MySQL, Airflow, Kafka (Strimzi), DataHub, OpenMetadata, Zeppelin, Jupyter, JFrog Container Registry
Serverless Reference Architecture for Real-time Stream Processing
Model, visualizations, and animation of the Lorenz system
A registry of publicly available datasets on AWS
How to query MongoDB through an SSH tunnel with Python
Covers building a PySpark image for processing data in S3 (particularly in Delta format), as well as the intricacies of configuring PySpark within an existing Kubernetes environment whose services run on the python:3.9-slim-buster image.
An R package for reading data in the DBC (compressed DBF) format used by DATASUS.
A utility for mocking out the Python Requests library.
R package for (down)loading data from IBGE (Instituto Brasileiro de Geografia e Estatística)
Send data into Slack using this GitHub Action!
Operator for Apache Spark-on-Kubernetes for Stackable Data Platform
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.