moataz-elmesmary / data-science-roadmap

Data Science Roadmap from A to Z

Home Page: https://www.linkedin.com/posts/moatazelmesmary_github-moataz-elmesmarydata-science-roadmap-activity-6974293994960769024-yCdi?utm_source=share&utm_medium=member_desktop

License: MIT License

data-analysis data-engineering data-science data-visualization deep-learning machine-learning mathematics probability python sql

data-science-roadmap's Introduction

     DATA SCIENCE ROADMAP 🏴‍☠️ 2024

Data Science Roadmap for anyone interested in how to break into the field!

This repository is intended to provide a free Self-Learning Roadmap to learn the field of Data Science. I provide some of the best free resources.


  Our Previous Roadmap ♥️
   ⚠️ Before we start, ⚠️

If you don't know what Data Science is, or the project life cycle (from Business Understanding to Deployment), which programming language to choose, the job descriptions, the soft and hard skills the field requires, Data Science applications, or the most common mistakes, then

📌This Video is for you (Highly Recommended ✔️)

Data Science vs Data Analytics vs Data Engineering - What's the Difference?



People often use these terms interchangeably, but they are distinct:

🔸 Data Science
A multidisciplinary field that focuses on examining raw and structured data sets to provide potentially actionable insights. Data Science is concerned with asking the right questions rather than finding exact answers. Data scientists need skill sets centered on Computer Science, Mathematics, and Statistics. They analyze data with techniques such as machine learning, trend analysis, linear regression, and predictive modeling. The tools they use to apply these techniques include Python and R.

🔸 Data Analytics
Focuses on examining existing data sets and creating solutions to capture, process, and organize data in order to draw actionable insights. This field looks for general process, business, and engineering improvements based on questions we don't yet know the answers to. Data analysts need skill sets centered on Statistics, Mathematics, and a high-level understanding of Computer Science. The work involves data cleaning, data visualization, and simple modeling. Common Data Analytics tools include Microsoft Power BI, Tableau, and SQL.

🔸 Data Engineering
Focuses on creating the infrastructure and tools required to support the business. Data engineers look for the optimal ways to store and extract data, which involves writing scripts and building data warehouses. Data engineers need skill sets centered on Software Engineering, Computer Science, and high-level Data Science. The tools they use are mainly Python, Java, Scala, Hadoop, and Spark.

Prepare your workspace

Tip 1️⃣ : Pick one and stick to it. (📁Click)


Anaconda: A toolkit that covers everything you need to write and run code, from the PowerShell prompt to Jupyter Notebook and PyCharm, and even RStudio (if you want to try R).


Atom: A more advanced, general-purpose code editor that works well for Python. Note that GitHub archived Atom in December 2022, so it no longer receives updates.
Google Colab: It’s like a Jupyter Notebook, but in the cloud. You don’t need to install anything locally, and the important libraries are already installed, for example NumPy, Pandas, Matplotlib, and scikit-learn.
PyCharm: PyCharm is another excellent IDE that enables you to integrate with libraries such as NumPy and Matplotlib, allowing you to work with array viewers and interactive plots.
Thonny: An IDE for teaching and learning programming. It comes with a debugger, supports code completion, and highlights syntax errors.

Most learning platforms have integrated code exercises, so you don’t need to install anything locally. But to learn properly, you should have an IDE installed on your local machine. There are many options to choose from, and the differences from one platform to another are small.

Tip 2️⃣ : Focus on one course at least.

Tip 3️⃣ : Don’t chase certifications.

Tip 4️⃣ : Don’t rush for ML without having a good background in programming & maths.

This track is divided into 3 phases ⬇️ :

  1. Beginner: you get a basic understanding of data analysis, tools and techniques.

  2. Intermediate: dive deeper in more complex topics of ML, Math and data engineering.

  3. Advanced: where we learn more advanced Math, DL and Deployment.

🔔 For DataCamp courses, the GitHub Student Developer Pack gives 3 free months. Google how to get it.
If you have already used it, do not hesitate to contact us to get an account with free access. 🌺

Legend

  • 📹 Video Content
  • 📕 Online Article Content / Book

💡 Roadmap Explanation ▶️ Youtube Video 🎥


🔰 Beginner 🔰

Algorithms Book Every piece of code could be called an algorithm, but this book covers the more interesting bits.
Specializations (data structures-algorithms)

1. Descriptive Statistics
   📹 Intro to descriptive statistics | Same Course on YouTube
   📹 Statistics Fundamentals - StatQuest - Youtube
   📕 Online statistics education
   📕 Intro to descriptive statistics Article1 & Article2
   📹 Arabic Course
   📹 Intro to Inferential Statistics++
   📕 Practical Statistics for Data Scientists
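
To make the descriptive statistics topic concrete, here is a minimal sketch using only Python's built-in `statistics` module. The `orders` list is made-up data for illustration, and the 1.5 × IQR outlier rule is just one common convention, not something the courses above prescribe.

```python
import statistics

# Hypothetical sample: daily order counts for a small shop (made-up numbers).
orders = [12, 15, 11, 15, 40, 13, 15, 14]

mean = statistics.mean(orders)      # arithmetic average, pulled up by the value 40
median = statistics.median(orders)  # middle value, robust to extreme values
mode = statistics.mode(orders)      # most frequent value
stdev = statistics.stdev(orders)    # sample standard deviation (spread)

# Quartiles split the sorted data into four parts; IQR is the middle 50% range.
q1, q2, q3 = statistics.quantiles(orders, n=4)
iqr = q3 - q1

# One common rule of thumb: flag points more than 1.5 * IQR outside the quartiles.
outliers = [x for x in orders if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"mean={mean:.2f} median={median} mode={mode} stdev={stdev:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr} outliers={outliers}")
```

Comparing the mean with the median is a quick way to spot skew: a single large value moves the mean much more than the median.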

2. Probability
   📹 Khan Academy
   📹 Arabic Course
   📕 Introduction to Probability

3. Programming Languages

 🔹R - good tool for visualization and statistical analysis.
   📹 Introduction to R (Datacamp)
   📹 Data Science Specialization - coursera
   📕 An Introduction to R
   📕 R for Data Science

 🔹Python💯
   📹 Introduction to Python Programming
   📹 OOP
   📹 Arabic - Hassouna | Elzero
   📹 Python Full Course - FreeCodeCamp on YouTube
   📕 Intro to Python for CS and Data Science
   more in OOP
4. Pandas
   📹 Corey Schafer-Youtube
   📕 Kaggle
   📕 Docs
   📹 Data School-Youtube
   📹 Arabic Course
   📹 PandasAI🐼1 - 2 Enhances the capabilities of Pandas by integrating Generative AI functionalities into it.
5. NumPy
   📕 Kaggle - NumPy
   📹 Arabic Course
   📕 Tutorial
   📕 Docs
6. SciPy
   📕 Tutorial
   📕 Docs
7. Data Cleaning: One of the MOST important skills you need to master to become a good data scientist. Mastering it takes practice on many datasets.
   Read this
   📹 Course 1
   📕 Notebook1
   📕 Notebook2
   📕 Notebook3
   📕 Kaggle Data cleaning
8. Data Visualization 📊
   📹 Introduction to Data Visualization with Matplotlib or
   📹 Corey Schafer - Playlist on Youtube or
   📹 sentdex - Playlist on YouTube
   📕 Kaggle to Data Visualization with Seaborn
   📹 Playlist-Youtube
   📹 Course1: Intro to Data Visualization with Seaborn
   📹 Course2: Intermediate Data Visualization with Seaborn
   📹 Course3: Understanding and Visualizing with Python

9. EDA (Note: it is already covered in the probability course above)
   📹 DataCamp-EDA in Python
   📹 IBM-EDA for Machine Learning

10. Dashboards

Power BI
   📹 Power BI - Youtube (Alex)
   📹 Power BI training
   📹 Arabic - Youtube (Zanoon)
   📹 Arabic - Youtube
Tableau
   📕 Tutorial
   📹 docs
   📹 course - datacamp
   📹 Simplilearn - Youtube

11. SQL and DB
   📹 SQL for Data Analysis (Udacity - notes 📋 or Simplilearn)
   📹 Intro to SQL or IBM (SQL for Data Science)
   📹 Intro to Relational Databases in SQL
   📹 Arabic Course
   📹 Arabic -ITI by Eng.Ramy Advanced - [Course Materials]
   📹 365 Data Science - SQL
   📝 Practice HackerRank & DataLemur
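
If you want to practice SQL without installing a database server, Python's built-in `sqlite3` module is enough for a first try. The sketch below is only an illustration: the `customers` and `orders` tables and their rows are made up, and SQLite's dialect differs in places from the databases used in the courses above.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, invented for this example.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Amina'), (2, 'Omar'), (3, 'Sara');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0);
""")

# LEFT JOIN keeps customers with no orders; GROUP BY aggregates per customer.
query = """
SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total DESC, c.name;
"""
rows = conn.execute(query).fetchall()
for name, n_orders, total in rows:
    print(name, n_orders, total)

conn.close()
```

The customer with no orders still appears in the result with a count of 0, which is the main difference between a `LEFT JOIN` and an inner `JOIN`.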

12. Python Regular Expression
   📕 Tutorial
13. Time Series Analysis
   📹 Track - DataCamp
   📹 Course - Coursera
   📕 Book
   📕 fbprophet
   📹 Arabic Source Video1 & Video2

At the end of the Beginner phase, apply everything you've learned to a project.
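
A first project can start very small. As a rough sketch of the data cleaning step from the Beginner list, the example below cleans a handful of made-up records in plain Python. In practice you would probably use Pandas for this; the field names, the `parse_age` and `clean` helpers, and the choice of median imputation are all assumptions made for illustration.

```python
import statistics

# Hypothetical raw records, as they might come out of a CSV export (made-up data).
raw_rows = [
    {"name": " Alice ", "city": "cairo", "age": "34"},
    {"name": "bob", "city": "Giza ", "age": ""},
    {"name": "Alice", "city": "Cairo", "age": "34"},  # duplicate once cleaned
    {"name": "Carol", "city": "ALEXANDRIA", "age": "n/a"},
    {"name": "Dave", "city": "cairo", "age": "29"},
]

def parse_age(text):
    """Return the age as an int, or None when the field is missing or not numeric."""
    text = text.strip()
    return int(text) if text.isdigit() else None

def clean(rows):
    # Steps 1-3: normalize whitespace and capitalization, parse numbers, drop duplicates.
    seen, cleaned = set(), []
    for row in rows:
        record = (row["name"].strip().title(),
                  row["city"].strip().title(),
                  parse_age(row["age"]))
        if record not in seen:
            seen.add(record)
            cleaned.append({"name": record[0], "city": record[1], "age": record[2]})
    # Step 4: fill missing ages with the median of the known ages.
    median_age = statistics.median(r["age"] for r in cleaned if r["age"] is not None)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = median_age
    return cleaned

cleaned_rows = clean(raw_rows)
for r in cleaned_rows:
    print(r)
```

Whether to impute, drop, or flag missing values depends on the dataset, so treat the median fill here as one option among several.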


🔰 Intermediate 🔰

1. Math for ML: consists of Linear Algebra, Calculus and PCA.
📹 Mathematics for Machine Learning and Data Science - Andrew Ng
📹 Specialization
📹 Mathematics for Machine Learning - Most of the needed basics

🔹Linear Algebra
   📹 Khan Academy - Linear Algebra
   📹 Mathematics for Machine Learning: Linear Algebra
   📹 3Blue1Brown - Essence of Linear Algebra
🔹Calculus
   📹 Multivariate Calculus - Coursera
   📹 Essence of calculus - Youtube
🔹PCA
   📹 PCA - Coursera

2. Machine Learning
   📹 Coursera - Old Course by Andrew Ng (Octave/Matlab)
   📹 Coursera - Andrew Ng's new ML Specialization (Python)
   📹 Machine Learning - StatQuest - YouTube
   📹 Machine Learning Stanford Full Course on YouTube by Andrew
   📹 CS480/680 Intro to Machine Learning - Spring 2019 - University of Waterloo
   📹 SYDE 522 – Machine Intelligence (Winter 2018, University of Waterloo)
   📹 Machine Learning for Engineers 2022 / (YouTube)
   📹 Introduction to Machine Learning Course - Udacity
   📹 Hesham Asem - Arabic content
   📹 IBM ML with Python
   📹 Machine Learning From Scratch - YouTube (Python Engineer)
   📕 Hands On ML (1st & 2nd & 3rd) Editions | Code: View on Github
   📹 ML Algorithms in Practice
   📹 ML scientist
   📹 Project

3. Web Scraping/APIs
   📹 course
   📕 intro2
   📕 Tutorial
   📕 Book for both topics
APIs
   📕 Tutorial
   📕 Article
   📕 Tutorial
4. Stats.
   📕 Think Stats - Book
   📕 Think Bayes - Book
5. Advanced SQL
   📹 Joining Data in SQL - DataCamp
   📹 Intermediate SQL - DataCamp
   📹 More advanced SQL

7. Feature Engineering
   📕 Tutorial
   📕 Article
   📕 Book
8. Interpreting Shapley-based explanations of ML models
   📕 SHAP
   📕 Kaggle ML explainability
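
To get a feel for what a Shapley-based explanation means before opening the SHAP resources, here is a small sketch that computes exact Shapley values by brute force. It does not use the SHAP library: the toy `model`, its coefficients, and the `baseline` and `instance` inputs are all invented for illustration, and enumerating every feature ordering only works for a handful of features.

```python
from itertools import permutations

# Hypothetical toy "model": a price from three features (made-up coefficients).
def model(features):
    size, rooms, garden = features["size"], features["rooms"], features["garden"]
    # The garden bonus only applies to larger homes, so size and garden interact.
    return 50.0 * size + 10.0 * rooms + (30.0 if garden and size > 1 else 0.0)

baseline = {"size": 1, "rooms": 2, "garden": 0}  # reference input
instance = {"size": 2, "rooms": 3, "garden": 1}  # input we want to explain

def value(coalition):
    """Model output when only the features in `coalition` take the instance's values."""
    mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in baseline}
    return model(mixed)

def shapley_values():
    names = list(baseline)
    orderings = list(permutations(names))
    totals = {name: 0.0 for name in names}
    for order in orderings:
        present = set()
        for name in order:
            before = value(present)
            present.add(name)
            # Marginal contribution of `name` given the features added before it.
            totals[name] += value(present) - before
    # Average the marginal contributions over all orderings.
    return {name: total / len(orderings) for name, total in totals.items()}

phi = shapley_values()
print(phi)
print(sum(phi.values()), model(instance) - model(baseline))
```

The contributions add up to the difference between the prediction for `instance` and the prediction for `baseline`, and the interaction bonus is split evenly between `size` and `garden`.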

After finishing this level, apply what you've learned to 2 or 3 good-sized projects.

Read this book, please 📖 Introduction to Statistical Learning with Applications in R (seriously, read it)


🔰 Advanced 🔰

1. Deep Learning
   📹 Deep Learning Fundamentals
   📹 Introduction to Deep Learning - MIT
   📹 Specialization
   📕 Dive into Deep Learning (En) | (Ar) version ➡️Part1 & Part2
   📹 Deep Learning UC Berkeley
   📕 github of Dive into DL
   📹 Stanford Lecture - Convolutional Neural Networks for Visual Recognition
   📹 University of Waterloo - ML / DL
   📕 Deep Learning for coders with fastai & PyTorch

2. Tensorflow
   📹 Specialization
   📹 Youtube
    fast.ai's Deep Learning Courses

TensorFlow is generally regarded as stronger in visualization tooling and in deploying trained models. Go for PyTorch if you want more flexibility, easier debugging, and often shorter training times.

3. PyTorch
   📹 PyTorch (UC Berkeley - Youtube) - Lec3 (The 5 parts)
   📹 PyTorch - Dr. Data Science - Youtube
   📹 Pytorch Tutorial - Aladdin - Youtube
   📹 PyTorch Course (2022) - Youtube
   📕 Deep Learning With Pytorch
   📕 Machine Learning with PyTorch and Scikit-Learn -2022

4. Advanced Data Science
   📹 Advanced Data Science with IBM Specialization Includes Apache Spark
 ☠️Advanced ML Topics🧠 | Lecs (YouTube)
   📹 Stanford CS330: Deep Multi-Task and Meta Learning I Autumn 2022 - Materials
   📹 18.409 Algorithmic Aspects of Machine Learning Spring 2015 - MIT
 ☠️ML based Computer Vision | Lecs (YouTube)
   📹 CS 198-126: Modern Computer Vision Fall 2022 (UC Berkeley)
   📹 NOC:Deep Learning For Visual Computing - IIT Kharagpur
   📹 Deep Learning for Computer Vision - Michigan

5. NLP
   📹 Specialization - Coursera
   📹 Arabic - Ahmed El Sallab
   📹 Stanford CS224N Lectures - Winter 2021- YouTube
   📹 Stanford XCS224U Lectures - Spring 2021- YouTube
   📹 Introduction to Natural Language Processing in Python
 🔸LLMs: What is a Large Language Model?
   📹 Generative AI for Everyone (Andrew Ng) - Coursera🆕
   📹 Generative AI with LLMs
   📹 LLM Foundations
   📹 How do ChatGPT / Transformers work? 1 - 2 - 3 (overview & the maths behind them)
   📹 Prompt Engineering | (Ar) If you want to get the most out of LLMs
   📹 LLMOps A Lec going through the entire LLM pipeline

6. Inferential Statistics
   📹 Specialization, 2nd & 3rd courses
   📹 course
7. Bayesian Statistics
   📹 1 - From Concept to Data Analysis
   📹 2 - Techniques and Models
   📹 3 - Mixture Models
8. Model Deployment
   📕 Flask tutorial
   📹 TensorFlow: Data and Deployment Specialization
   📹 Deploy Models with TensorFlow Serving and Flask
   📹 How to Deploy a Machine Learning Model to Google Cloud - Daniel Bourke
   If you're interested in more deployment methods, search for (FastAPI - Heroku - chitra)

9. MLOps: a combination of Model Deployment, Model Serving, Model Monitoring, and Model Maintenance.
   🔗 MLOps-zoomcamp
   🔗 MLOps-guide
   📕 Practical MLOps
10. Probabilistic Graphical Models
   📹 Specialization - Coursera
   📹 Spring 2016, University of Utah - YouTube

🌟 Read these books, they will be beneficial to you.
  📖 Bayesian Reasoning and Machine Learning
  📖 The Elements of Statistical Learning
  📖 Pattern Recognition and Machine Learning - Bishop (Advanced)

   Recommended by Eng.Mohamed Hammad.

📌PROJECTS ⏬


   🎥Deena Gergis - End to end Project
   🎥Machine Learning Projects - Youtube
   💻Top 10 Data Science Projects for Beginners
   💻12 Data Science Projects for Beginners and Experts
   💻Data Science Projects & Ideas
   💻Top 310+ Machine Learning Projects for 2023
   💻10 End-to-End Guided Data Science Projects
   🎥Real-World ML Tutorial w/ Scikit Learn
   💻Python Codes in Data Science
   🎥End To End ML Project With Dockers,Github Actions And Deployment
   💻12 free Data Science projects to practice Python and Pandas (resolve interactive online)


📌 Common Tools ⤵️


English | Arabic | Book
🎥 Git - Udacity | 🎥 شخبط وانت مطمن 🚀 | 📕 Pro Git
📖 w3schools | 🎥 almadrasa |
 | 🎥 Elzero |

📌 More Books 📌 Check This!

  📕 🔥 12 Free Important Books 🔥
  📕 Mathematics for Machine Learning
  📕 An Introduction to Statistical Learning
  📕 Understanding ML: From Theory to Algorithms
  📕 Probabilistic Machine Learning: An Introduction
  📕 storytelling with data ✔️Important data visualization guide.


📌 Collection of the best Cheat sheets

  1. Importing Data

  2. Pandas

   - (1)    - (2)    - (3)

  1. Matplotlib

  2. Seaborn

  3. Probability

  4. Supervised Learning

  5. Unsupervised Learning

  6. Deep Learning

  7. Machine Learning Tips and Tricks

  8. Probabilities and Statistics

  9. Comprehensive Stanford Master Cheat Sheet

  10. Linear Algebra and Calculus

  11. Data Science Cheat Sheet

  12. Keras Cheat Sheet

  13. Deep Learning with Keras Cheat Sheet

  14. Visual Guide to Neural Network Infrastructures

  15. Scikit-Learn Python Cheat Sheet

  16. Scikit-learn Cheat Sheet: Choosing the Right Estimator

  17. Tensorflow Cheat Sheet

  18. Machine Learning Test Cheat Sheet

  19. Machine Learning Cheat Sheets (Recommended Guide): go over the topics in this sheet, my friend, and see what you're missing


The best way to practice is to take part in competitions.

Competitions will make you even more proficient in Data Science.
Kaggle is one of the most popular platforms for data science competitions, and it hosts many competitions you can join according to your knowledge level.

You can also check these platforms for data science competitions:
- Driven Data
- Codalab
- Iron Viz
- Topcoder
- CrowdANALYTIX Community
- Bitgrit


📓 Data Science Interview Questions: ▶️   - (1)  - (2)  - (3)  - (4)  - (5)  - (6) Arabic Podcast🎧
                    - (7) 30 days of interview preparation📖


🎧Data Science Podcasts: 🎙️
The Best Way to Stay Up-to-Date on the Latest Data Science Trends and Developments

Podcasts | About | Produced by
Data Science at Home | A podcast that provides practical advice and tutorials on data science topics. | Francesco Gadaleta
Data Stories | An interview-driven podcast that tells the stories of data scientists and how they're using their skills to make a difference in the world. | Enrico Bertini and Moritz Stefaner
O'Reilly Data Show | A podcast that covers a wide range of data science topics, from machine learning to artificial intelligence to big data. | Ben Lorica, the Chief Data Scientist at O'Reilly
Learning Machines 101 | Mathematics, statistics, and algorithms that power the machine learning systems that we rely on every day. | Richard Golden, a machine learning engineer and researcher at Google AI
Data Engineering Podcast | Tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation. | Tobias Macey, a data engineer at Netflix
Data Science Mixer | A great resource for anyone who wants to learn more about data science and the latest trends in the field. It is also a great way to get inspired by the work of other data scientists and machine learning engineers. | Alteryx, a data science and analytics software company
Chai Time Data Science Show | Interviews top data scientists, practitioners, and researchers from around the world. | Sanyam Bhutani, a data scientist and machine learning engineer at Google AI.
Becoming a Data Scientist | Podcast that interviews data scientists about their journey to becoming a data scientist. | Renee Teate, a data scientist and machine learning engineer at Google AI.
AI Today Podcast | Explores the latest trends and developments in artificial intelligence. | Ron Schmelzer and Kathleen Walch
Gradient Dissent | A weekly podcast that explores the latest research in machine learning and artificial intelligence. | Lukas Biewald, co-founder of Weights & Biases
Data Skeptic | A podcast that challenges the conventional wisdom in data science and asks tough questions about the ethics and implications of data-driven decision making. | Kyle Polich, a data scientist and machine learning engineer
Linear Digressions | A podcast that covers a wide range of data science topics, from the technical to the theoretical. | Katie Malone and Ben Jaffe
The Data Engineering Show | For data engineering and BI practitioners to go beyond theory, and learn from the biggest influencers in tech about their practical day to day data challenges. | Eldad Farkash and Benjamin Wagner, who are both data engineering experts with experience at companies like Firebolt and Sisense
DataTalks.Club | A weekly online community of data enthusiasts and practitioners that learn from each other and share their knowledge and experiences through meetups, workshops, and a podcast. | A rotating cast of data experts
Datacast | Top data scientists and practitioners in the data and AI infrastructure space. | James Le, who is a data infrastructure expert with experience at companies like Google and Netflix
How to Get an Analytics Job Podcast | A great resource for anyone who is interested in a career in analytics. The guests share their insights and advice on how to get started in analytics and how to succeed in an analytics career. | John David Ariansen, an analytics agency owner and career coach
The Analytics Power Hour | Five awesome people, an occasional guest, and drinks all around tackling the hottest data and analytics topics of the day. | Tim Wilson, Michael Helbling, Josh Crowhurst, and Val Kroll. They are all analytics experts from different companies

    👀 Arabic Podcasts??
     I see you, bored on your commute

   📻Arabic Data Podcast | Spotify by Eng. Kareem Abdelsalam
   📻إلي البيانات وما بعدها (To Data and Beyond) by Eng. Youssef Hosni
   📻Garage Education by Eng. Mostafa Alaa
   📻Data Science بالعربي


📌 Data Analysis Recommendations.
Books (📕 The Data Analysis Workshop & 📕 Head First Data Analysis)
FWD - (The 3 Levels)
Google Data Analytics Professional Certificate
IBM Data Analyst Professional Certificate
Google Advanced Data Analytics Professional Certificate 🆕
Alex The Analyst - YouTube📺
Note: A good knowledge & projects in just Excel, SQL & Power BI / Tableau can bring you great opportunities.
  Excel - More Resources: (Arabic 1📹 - Arabic 2📹 - Books 📄 and cheat sheets for revising)

📌 Data Engineering Recommendations.
Books (📕 Fundamentals of Data Engineering & 📕 Designing Data-Intensive Applications)
Arabic Podcast, Starting a Career in Data Engineering.
For Arabic speakers, I recommend 2 YouTube channels: (Garage Education & Big Data بالعربي)
Roadmap 1 - (Recommended)
Roadmap 2
Roadmap 3
IBM Data Engineering Professional Certificate
Note: Good knowledge of and projects in SQL, Python, Apache Spark/Hadoop, Data Modeling, and Data Warehousing (Arabic, starting from the 7th video) can bring you great opportunities. Start with them, then go for the other tools, concepts, and cloud platforms.


📁 CV / Resumes 📝

📌 Data & AI Companies in Egypt   -   AI/ML Driven Companies In Egypt


Contact Me 📱



data-science-roadmap's People

Contributors

moataz-elmesmary


data-science-roadmap's Issues

Missing Link

There is a missing link for the probability textbook, a google drive link.

math

I guess math must be in beginner page? At least before studying stats or probability

Lovely roadmap

Loved how you gathered and organized everything, keep up the amazing work ❤️
