Name: Soumil Nitin Shah
Type: User
Company: Lead Data Engineer | AWS & Apache Hudi Expert | Spark & AWS Glue Enthusiast | YouTuber
Bio: Lead Data Engineer specializing in Apache Hudi, AWS, and big data. Creator of the "LakeBoost" framework. YouTuber with 42k subscribers, sharing tech insights.
Location: New York
Blog: https://soumilshah.com/
Soumil Nitin Shah's Projects
Accelerating Data Processing: Leveraging Apache Hudi with DynamoDB for Faster Commit Time Retrieval
Architecture Powering Downstream Systems with CDC from a Hudi Transactional Data Lake
Hudi Best Practices: Handling Failed Inserts/Upserts with Error Tables
Advanced Python Object-Oriented Programming with Metaclasses
Advantages of Metadata Indexing and Asynchronous Indexing in Hudi: Hands-on Lab
test
Airflow Docker Image for the Development environment
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Airflow Docker Image for the Production environment
Airflow Tutorials
This repo demonstrates how to set up an airflow environment and custom Selenium plugin.
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
An easy-to-use Python utility class for accessing incremental data from Hudi Data Lakes
apache hudi delta streamer labs
apache-hudi-lake-formation
Apache Hudi Table Services | Hands-on Labs
apache-spark-on-lambdas
apache-x-table-docker-tutorial
apache-x-table-sync-aws-cloud-shell
AppleStock
Arduino-meets-software-log-sensordata-database-python
code
arduino-thingspeak-python-ifttt-processing-Twilio
Async callback Pattern to Automate orchestrating EMR Serverless Jobs with Step Functions
athena-iceberg-demo
Athena usage is a simple Python library that extracts all usage information for a given date range and workgroup
console app