In this repo, I recap my solutions to the assignments in the IBM Data Engineering Professional Specialization on Coursera. The specialization is listed as a 15-month program; I completed it in less than 3 weeks.

The specialization covers how to:
- Create, design, and manage relational databases and apply database administration (DBA) concepts to RDBMSes such as MySQL, PostgreSQL, and IBM Db2.
- Develop and execute SQL queries using SELECT, INSERT, UPDATE, and DELETE statements, database functions, stored procedures, nested queries, and JOINs.
- Demonstrate working knowledge of NoSQL & Big Data using MongoDB, Cassandra, Cloudant, Hadoop, Apache Spark, Spark SQL, Spark ML, Spark Streaming.
- Implement ETL & Data Pipelines with Bash, Airflow & Kafka; architect, populate, deploy Data Warehouses; create BI reports & interactive dashboards.

The specialization consists of 13 courses, the last of which is a capstone project:
- Introduction to Data Engineering
- Python for Data Science, AI & Development
- Python Project for Data Engineering
- Introduction to Relational Databases (RDBMS)
- Databases and SQL for Data Science with Python
- Hands-on Introduction to Linux Commands and Shell Scripting
- Relational Database Administration (DBA)
- ETL and Data Pipelines with Shell, Airflow and Kafka
- Getting Started with Data Warehousing and BI Analytics
- Introduction to NoSQL Databases
- Introduction to Big Data with Spark and Hadoop
- Data Engineering and Machine Learning using Spark
- Data Engineering Capstone Project

The capstone project uses the following tools and technologies:
- OLTP database: MySQL
- NoSQL database: MongoDB
- Production data warehouse: Db2 on Cloud
- Staging data warehouse: PostgreSQL
- Big data platform: Hadoop
- Big data analytics platform: Spark
- Business intelligence dashboard: IBM Cognos Analytics
- Data pipelines: Apache Airflow