
data-engineering-certificate-projects's Introduction

IBM Data Engineering Professional Certificate Projects

This repository contains projects I completed while pursuing the IBM Data Engineering Professional Certificate. They cover a wide range of topics and tools, including ETL and ELT processes, data lakes, data warehouses, and data lakehouses. I also gained hands-on experience with tools such as Kafka, Apache Airflow, Hadoop, HDFS, and Apache Spark.

Course Details

  1. Introduction to Data Engineering

    List basic skills required for an entry-level data engineering role. Discuss various stages and concepts in the data engineering lifecycle. Describe data engineering technologies such as Relational Databases, NoSQL Data Stores, and Big Data Engines. Summarize concepts in data security, governance, and compliance.

  2. Python for Data Science, AI & Development

    Describe Python Basics including Data Types, Expressions, Variables, and Data Structures. Apply Python programming logic using Branching, Loops, Functions, Objects & Classes. Demonstrate proficiency in using Python libraries such as Pandas, Numpy, and Beautiful Soup. Access web data using APIs and web scraping from Python in Jupyter Notebooks.

  3. Python Project for Data Engineering

    Demonstrate your skills in Python for working with and manipulating data. Implement web scraping and use APIs to extract data with Python. Play the role of a Data Engineer working on a real project to extract, transform, and load data. Use Jupyter notebooks and IDEs to complete your project.

  4. Introduction to Relational Databases (RDBMS)

    Describe data, databases, relational databases, and cloud databases. Describe information and data models, relational databases, and relational model concepts (including schemas and tables). Explain an Entity Relationship Diagram and design a relational database for a specific use case. Develop a working knowledge of popular DBMSes including MySQL, PostgreSQL, and IBM Db2.

  5. Databases and SQL for Data Science with Python

    Analyze data within a database using SQL and Python. Create a relational database and work with multiple tables using DDL commands. Construct basic to intermediate level SQL queries using DML commands. Compose more powerful queries with advanced SQL techniques like views, transactions, stored procedures, and joins.

  6. Hands-on Introduction to Linux Commands and Shell Scripting

    Describe the Linux architecture and common Linux distributions, and update and install software on a Linux system. Perform common informational, file, content, navigational, compression, and networking commands in the Bash shell. Develop shell scripts using Linux commands, environment variables, pipes, and filters. Schedule cron jobs in Linux with crontab and explain the cron syntax.

  7. Relational Database Administration (DBA)

    Create, query, and configure databases and access and build system objects such as tables. Perform basic database management including backing up and restoring databases as well as managing user roles and permissions. Monitor and optimize important aspects of database performance. Troubleshoot database issues such as connectivity, login, and configuration and automate functions such as reports, notifications, and alerts.

  8. ETL and Data Pipelines with Shell, Airflow and Kafka

    Describe and contrast Extract, Transform, Load (ETL) processes and Extract, Load, Transform (ELT) processes. Explain batch vs concurrent modes of execution. Implement an ETL pipeline through shell scripting. Describe data pipeline components, processes, tools, and technologies.

  9. Getting Started with Data Warehousing and BI Analytics

    Explore the architecture, features, and benefits of data warehouses, data marts, and data lakes and identify popular data warehouse system vendors. Design and populate a data warehouse, and model and query data using CUBE, ROLLUP, and materialized views. Identify popular data analytics and business intelligence tools and vendors and create data visualizations using IBM Cognos Analytics. Design and load data into a data warehouse, write aggregation queries, create materialized query tables, and create an analytics dashboard.

  10. Introduction to NoSQL Databases

    Differentiate between the four main categories of NoSQL repositories. Describe the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools. Perform common tasks using MongoDB, including create, read, update, and delete (CRUD) operations. Execute keyspace, table, and CRUD operations in Cassandra.

  11. Introduction to Big Data with Spark and Hadoop

    Explain the impact of big data, including use cases, tools, and processing methods. Describe Apache Hadoop architecture, ecosystem, practices, and user-related applications, including Hive, HDFS, HBase, Spark, and MapReduce. Apply Spark programming basics, including parallel programming basics for DataFrames, data sets, and Spark SQL. Use Spark’s RDDs and data sets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.

  12. Machine Learning with Apache Spark

    Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence. Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines. Construct the data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML. Demonstrate connecting to Spark clusters, build ML pipelines, perform feature extraction and transformation, and model persistence.

  13. Data Engineering Capstone Project

    Demonstrate proficiency in skills required for an entry-level data engineering role. Design and implement various concepts and components in the data engineering lifecycle such as data repositories. Showcase working knowledge with relational databases, NoSQL data stores, big data engines, data warehouses, and data pipelines. Apply skills in Linux shell scripting, SQL, and Python programming languages to Data Engineering problems.
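
As a small illustration of the ETL versus ELT contrast listed under course 8, here is a minimal sketch using only Python's standard library. It is not taken from the projects in this repository: the sample data, the table names (`tracks_etl`, `tracks_raw`, `tracks_elt`), and the helper functions are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; column names and values are made up for illustration.
RAW_CSV = """name,duration_min
alice,3
bob,5
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def run_etl(conn, text):
    """ETL: transform in Python first, then load only the cleaned rows."""
    rows = extract(text)
    cleaned = [
        (r["name"].capitalize(), int(r["duration_min"]) * 60) for r in rows
    ]
    conn.execute("CREATE TABLE tracks_etl (name TEXT, duration_sec INTEGER)")
    conn.executemany("INSERT INTO tracks_etl VALUES (?, ?)", cleaned)


def run_elt(conn, text):
    """ELT: load the raw rows unchanged, then transform inside the database."""
    rows = extract(text)
    conn.execute("CREATE TABLE tracks_raw (name TEXT, duration_min INTEGER)")
    conn.executemany(
        "INSERT INTO tracks_raw VALUES (?, ?)",
        [(r["name"], int(r["duration_min"])) for r in rows],
    )
    # The transformation is expressed in SQL and runs where the data lives.
    conn.execute(
        """
        CREATE TABLE tracks_elt AS
        SELECT upper(substr(name, 1, 1)) || substr(name, 2) AS name,
               duration_min * 60 AS duration_sec
        FROM tracks_raw
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_CSV)
run_elt(conn, RAW_CSV)

etl_rows = conn.execute(
    "SELECT name, duration_sec FROM tracks_etl ORDER BY name"
).fetchall()
elt_rows = conn.execute(
    "SELECT name, duration_sec FROM tracks_elt ORDER BY name"
).fetchall()
print(etl_rows)
print(elt_rows)
```

Both paths end with the same cleaned table. The difference is where the transformation runs: in the pipeline code before loading (ETL), or in the database after the raw data has landed (ELT), which also keeps the untransformed `tracks_raw` table available for later reprocessing.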

Contributors

freddyjaoko
