A data processing pipeline that extracts raw data, transforms it for analysis, and stores the result in Delta format on Amazon S3. The ETL code uses PySpark within a Jupyter environment running in a Docker container managed by Jenkins.
guigasque / docker-pyspark-delta-s3
License: Apache License 2.0
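
The extract, transform, and store steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`delta_s3_spark_conf`, `delta_table_path`, `run_pipeline`), the CSV input, and the cleaning steps are assumptions made for the example. It also assumes the `delta-spark` and `hadoop-aws` packages are available to Spark and that AWS credentials are supplied by the environment.

```python
from typing import Dict


def delta_s3_spark_conf() -> Dict[str, str]:
    """Spark settings commonly used to enable Delta Lake and the S3A connector."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def delta_table_path(bucket: str, table: str) -> str:
    """Build an s3a:// URI for a table (bucket and table names are hypothetical)."""
    return f"s3a://{bucket.strip('/')}/{table.strip('/')}"


def run_pipeline(raw_csv_path: str, bucket: str, table: str) -> None:
    """Extract a raw CSV, apply a simple cleanup, and write it as a Delta table."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("etl-sketch")
    for key, value in delta_s3_spark_conf().items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Extract: read the raw file, treating the first row as the header.
    raw = spark.read.option("header", "true").csv(raw_csv_path)

    # Transform: an assumed cleanup step (drop fully empty rows and duplicates).
    cleaned = raw.dropna(how="all").dropDuplicates()

    # Load: write in Delta format to the S3 location.
    cleaned.write.format("delta").mode("overwrite").save(delta_table_path(bucket, table))
```

In a Jupyter notebook inside the container, a call such as `run_pipeline("data/raw.csv", "my-bucket", "silver/sales")` would run all three steps; the file path and bucket name here are placeholders.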