Data Scientist Roadmap2024

Description

Mastering the tools in this guide — including programming languages, machine learning libraries, and cloud platforms — is crucial for data science success.

I've categorized them based on difficulty:

  • Green text: Mandatory and easiest
  • Yellow text: Moderately difficult
  • Red text: Toughest and for pros (color codes are present here)

Structure:

List of tools, libraries and concepts


Programming Languages:

  • Python
  • R

Frameworks & Libraries:

  • Scikit-learn
  • NumPy
  • Pandas
  • TensorFlow
  • PyTorch
  • XGBoost
  • LightGBM
  • Keras (High-level deep learning API)
  • JAX (High-performance numerical computation)
  • CatBoost (Gradient boosting framework)
  • StaMPS (Scalable Modeling and Partitioning for Statistics)

Cloud Platforms & Services:

  • Docker (Containerization platform)
  • Learn any one of the following:
    • GCP (Google Cloud Platform)
      • Cloud Storage
      • Compute Engine
      • Cloud SQL
      • Cloud Functions
      • BigQuery
      • Vertex AI (successor to AI Platform)
    • Azure (Microsoft Azure)
      • Blob Storage
      • Virtual Machines
      • SQL Database / Azure Database for PostgreSQL/MySQL
      • Azure Functions
      • Azure Synapse Analytics
      • Azure Machine Learning
    • AWS (Amazon Web Services)
      • AWS S3
      • AWS EC2
      • AWS RDS
      • AWS Lambda
      • AWS Redshift
      • AWS SageMaker
  • Kubeflow (Cloud-native machine learning platform)
  • Kubernetes (Container orchestration platform)

Data Tools & Libraries:

  • SQL (including OLAP & OLTP variations)
    • SQLBolt, a simple and interactive tutorial. [2H]
  • Pandas
  • Elasticsearch
  • Dask (Parallel computing library for big data)
  • Spark (Large-scale data processing framework)
  • Airbyte (Open-source data integration platform)
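As a rough illustration of the OLTP and OLAP distinction mentioned in the SQL item above, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table and its rows are hypothetical, and SQLite is only a stand-in: real OLAP workloads usually run on a dedicated warehouse.

```python
import sqlite3

# In-memory database; the "orders" table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)

# OLTP-style work: small transactional writes and point lookups.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
        [
            ("alice", "EU", 120.0),
            ("bob", "EU", 80.0),
            ("carol", "US", 200.0),
            ("dave", "US", 50.0),
        ],
    )
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (1,)
).fetchone()

# OLAP-style work: scan many rows and aggregate them.
revenue_by_region = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(revenue_by_region)
```

The point lookup touches a single row by key, while the aggregate reads the whole table. That difference in access pattern is what separates the two workload types.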

Web Development Frameworks:

  • FastAPI
  • Uvicorn (ASGI server commonly used to run FastAPI apps)
  • Streamlit (Machine learning app development framework)

Machine Learning Concepts:

  • Supervised Learning
    • Regression
    • Classification
  • Unsupervised Learning
    • Clustering
    • Dimensionality Reduction
  • Recommendation Systems
  • Time Series Forecasting
  • Natural Language Processing (NLP)
    • Text Mining
    • Natural Language Understanding (NLU)
      • Sentiment Analysis
      • Named Entity Recognition (NER)
      • Question Answering (QA)
    • Natural Language Generation (NLG)
  • Deep Learning Techniques
    • Convolutional Neural Networks (CNNs)
    • Long Short-Term Memory networks (LSTMs)
    • Generative AI
  • Reinforcement Learning
  • Bayesian Optimization
  • Statistics
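To make the Regression item under Supervised Learning concrete, here is a minimal sketch of ordinary least squares for a single feature, written in plain Python. The `fit_line` helper and the toy data are assumptions for illustration. In practice a library such as Scikit-learn would normally do this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Predict the label for an unseen input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

The same fit-then-predict pattern carries over to classification, where the target is a category rather than a number.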

DevOps & MLOps Tools:

  • Airflow (Workflow orchestration tool)
  • MLflow (Machine learning lifecycle management)
  • Prometheus (Monitoring and alerting system)
  • Grafana (Data visualization and analytics tool)
  • Git version control (e.g., GitLab, GitHub)

Data Visualization Tools:

  • Tableau
  • Matplotlib (Python plotting library)
  • Seaborn (Statistical data visualization library built on top of Matplotlib)
  • Power BI (Microsoft business intelligence platform)

Other:

  • ETL (Extract, Transform, Load) processes

  • Optimisation algorithms (can be broader than just machine learning)

  • Distributed training

  • Curse of dimensionality

  • Financial modeling

    • MIT Course: Mathematics With Applications In Finance
      • The purpose of the class is to expose undergraduate and graduate students to the mathematical concepts and techniques used in the financial industry. Mathematics lectures are mixed with lectures illustrating the corresponding application in the financial industry.
  • LLMs

    • LangChain Agents
    • Prompt engineering
    • RAG (Retrieval-Augmented Generation)
    • Fine-tuning
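The "Curse of dimensionality" item above can be seen in a small experiment. The sketch below draws random points in the unit hypercube and measures how much farther the farthest point is from a query than the nearest one. The `relative_contrast` helper, the sample size, and the seed are assumptions chosen for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


# Compare the contrast across increasing dimensionality.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrasts.items():
    print(f"dim={dim:>4}  relative contrast={value:.2f}")
```

As the dimension grows, the nearest and farthest points end up at nearly the same distance, which is one reason distance-based methods such as nearest-neighbor search and clustering degrade on high-dimensional data.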

Interviews

Work in progress:

  1. Updating the PyTorch material with notebooks containing code and concepts (3/20 done).

Contributors

  • xandie985
