mrdbourke / zero-to-mastery-ml

All course materials for the Zero to Mastery Machine Learning and Data Science course.

Home Page: https://dbourke.link/ZTMmlcourse

Jupyter Notebook 100.00% Shell 0.01%

machine-learning data-science deep-learning

zero-to-mastery-ml's Introduction

Zero to Mastery Machine Learning

Welcome! This repository contains all of the code, notebooks, images and other materials related to the Zero to Mastery Machine Learning Course on Udemy and zerotomastery.io.

Quick links

🎥 Watch the first 10 hours of the course on YouTube.
📚 Read the materials of the course in a beautiful online book.
🤔 Found something wrong with the code? Leave an issue.
❓ Got a question? Post a discussion (see the question template).

Updates

12 September 2024 - Working on updating the materials for 2025, see progress in #105
12 October 2023 - Created an online book version of the course materials, see: https://dev.mrdbourke.com/zero-to-mastery-ml/

What this course focuses on

Create a framework for working through problems (6-step machine learning modelling framework)
Find tools to fit the framework
Targeted practice = use tools and framework steps to work on end-to-end machine learning modelling projects

How this course is structured

Section 1 - Getting your mind and computer ready for machine learning (concepts, computer setup)
Section 2 - Tools for machine learning and data science (pandas, NumPy, Matplotlib, Scikit-Learn)
Section 3 - End-to-end structured data projects (classification and regression)
Section 4 - Neural networks, deep learning and transfer learning with TensorFlow 2.0
Section 5 - Communicating and sharing your work

Student notes

Some students have taken and shared extensive notes on this course, see them below.

If you'd like to submit yours, open a pull request.

Chester's notes - https://github.com/chesterheng/machinelearning-datascience
Sophia's notes - https://www.rockyourcode.com/tags/udemy-complete-machine-learning-and-data-science-zero-to-mastery/

zero-to-mastery-ml's People

Contributors

Stargazers

Watchers

Forkers

oikwunze saikat1506 luther1014 untmdsprt matus-dubrava sechan9999 nivetha-jayakumar githubcratos fatima-safa novusli andresnino1 blerk8 ankit-bm kasthuribai khushbu-03 sachinpardeshi01 tripleorange mrcongliu liberto-siahaan joem4311 siva-kumar-k ryanb082 rezatriawan cbutlerp sarakainimame nabeelkhan adwaithpanukunnel alphago7 web-nirav jjcatulle parveznawaz miors aunglwinoo-psc maaliv ademata baloou supriyo97 sachinyar arthurcab juanperezarango abdo-learn nathimagubane rachoor ma9shah amgit2 mary-clayton tigerjoy shubhambartwal devopsvenkat1488 ancarter98 kmlspktaa mamun-developer sankarng vijitkamboj nihiinnovation stkline moacybarros arthur-ananyan adityabikram1918 vikasdakshinamurthy freebooterish patrickgco koumik magicandcode vengatesanns manas588 amitkandukuri antharix max2191 sharshenova anamika1214 amuaamir1 theadamhenning strekozza rmit-s3536515-jianning-pan antonioturco79 masoodbabri ak4550 chellarao-chowdary zahrataleb7 nithgovindasivan manish22ba sichistory tchigher ssultanov 0xkoios drdurham neerajs296 andrilablateral vikashhela aberzelius ethan-fang hridoy100 rmlai tiboine nick-socci jgreenwd satwik163 yogeshvishnole dpcoolmufa

zero-to-mastery-ml's Issues

Make predictions on test data batch using the loaded full model

Ml

Update notebook of "end-to-end-heart-disease-classification"

In the confusion matrix plot, "True label" is on the x-axis and "Predicted label" is on the y-axis. It should be the other way around.
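In scikit-learn's `confusion_matrix`, rows correspond to the true labels and columns to the predicted labels. When the matrix is drawn as an image, the rows run down the y-axis and the columns across the x-axis. A minimal sketch using plain Matplotlib follows (the notebook itself may plot with seaborn; `plot_conf_mat` and the toy labels are illustrative, not the course's code):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def plot_conf_mat(y_true, y_preds):
    """Plot a confusion matrix with correctly oriented axis labels (hypothetical helper)."""
    cm = confusion_matrix(y_true, y_preds)  # rows = true labels, columns = predicted labels
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.imshow(cm, cmap="Blues")
    # Annotate each cell with its count (x = column index, y = row index)
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    ax.set_xlabel("Predicted label")  # columns run along the x-axis
    ax.set_ylabel("True label")       # rows run along the y-axis
    return fig, ax, cm

# Toy labels (made up for illustration)
y_true = [0, 0, 1, 1, 1]
y_preds = [0, 1, 1, 1, 0]
fig, ax, cm = plot_conf_mat(y_true, y_preds)
print(cm)  # [[1 1]
           #  [1 2]]
```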

Couldn't get through Discord verification

Hi, I'm Rostislav Alpin, a new student on the course "Complete Machine Learning & Data Science Bootcamp 2023". I couldn't get through the Discord verification, even after logging in to Discord. Please help me resolve the issue.
Thanks.

Predicting bulldozer price - Wrong Hyperparameters Tuning

In lecture no. 196, you use RandomizedSearchCV with the default cv=5 to tune hyperparameters. I think that's the wrong approach for time series data, because:

With cv=5, a regressor is cross-validated with plain 5-fold KFold, which ignores the time order of the data: in most folds the model is trained on later sales and validated on earlier ones, so it gets to see the future
This results in a poor evaluation of the best hyperparameters

What ChatGPT says:

What we can do is use sklearn's `TimeSeriesSplit`!

You should suggest the correct way of doing this in your course soon!
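A minimal sketch of passing `TimeSeriesSplit` as the `cv` argument to `RandomizedSearchCV`. The data is synthetic and assumed to be sorted oldest to newest already (for the bulldozer data that would mean sorting by sale date first), and the search space is illustrative rather than the course's. Each split trains only on rows that come before its validation rows:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic data, assumed to already be sorted oldest -> newest
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 3 + rng.normal(size=200)

tscv = TimeSeriesSplit(n_splits=5)

# Every validation fold comes strictly after its training rows
for train_idx, val_idx in tscv.split(X):
    assert train_idx.max() < val_idx.min()

param_distributions = {  # illustrative search space, not the course's
    "n_estimators": [10, 20, 40],
    "max_depth": [None, 3, 5],
    "min_samples_leaf": [1, 2, 4],
}

rs_model = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=3,
    cv=tscv,  # time-ordered splits instead of the default 5-fold KFold
    random_state=42,
)
rs_model.fit(X, y)
print(rs_model.best_params_)
```

The trade-off is that early splits train on less data, but no split ever validates on rows older than its training rows.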

Typo issue in introduction-to-numpy.ipynb

Under the heading "What unique values are in the array a3?", the word "unique" is misspelled in the text "how to find the unique values in a numpy array".

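For reference, `np.unique()` returns the sorted unique values of an array. The `a3` below is a made-up stand-in, not the notebook's actual array:

```python
import numpy as np

# Hypothetical stand-in for the notebook's a3
a3 = np.array([[[1, 2, 2], [3, 1, 4]],
               [[4, 4, 5], [2, 6, 1]]])

unique_values = np.unique(a3)  # flattens, de-duplicates and sorts
print(unique_values)  # [1 2 3 4 5 6]

# return_counts=True also returns how often each value occurs
values, counts = np.unique(a3, return_counts=True)
print(counts)  # [3 3 1 3 1 1]
```
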
I got errors when I was trying to do data preprocessing

I was trying to follow your steps to convert the categorical features in the car_sales DataFrame to numbers, but got some errors.

This is the thread:
https://github.com/scikit-learn/scikit-learn/issues/17741
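Without the error messages it's hard to say exactly what went wrong. One common stumbling block when one-hot encoding `car_sales` is missing values in the categorical columns (`OneHotEncoder` in Scikit-Learn versions before 0.24 rejects NaN). A minimal sketch, assuming the column names used in the course (`Make`, `Colour`, `Doors`, `Odometer (KM)`) and a tiny made-up DataFrame, fills the gaps first and then encodes:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up version of car_sales with one missing Make
car_sales = pd.DataFrame({
    "Make": ["Honda", "Toyota", "Honda", np.nan],
    "Colour": ["White", "Red", "Blue", "White"],
    "Doors": [4, 4, 3, 4],
    "Odometer (KM)": [35431, 87899, 11179, 213095],
})

# Fill missing categories with a placeholder string before encoding
car_sales["Make"] = car_sales["Make"].fillna("missing")

categorical_features = ["Make", "Colour", "Doors"]
transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(handle_unknown="ignore"), categorical_features)],
    remainder="passthrough",  # keep Odometer (KM) unchanged
    sparse_threshold=0,       # always return a dense array
)
transformed_X = transformer.fit_transform(car_sales)
print(transformed_X.shape)  # (4, 9): 3 Make + 3 Colour + 2 Doors + 1 Odometer
```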

Resolved Error in sklearn Lesson File - Incorrect Data Splitting

I have resolved an error in the provided sklearn lesson file. Below is my updated code along with the corrected data splitting after preprocessing:

# Corrected data splitting after preprocessing
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_transform_df, y, test_size=0.2, random_state=5)

# Fit and score the model (model, param and evaluation_metrics are defined elsewhere in the lesson file)
grid_cv = GridSearchCV(estimator=model, param_grid=param, cv=5, verbose=2)
grid_cv.fit(X_train, y_train)

y_preds = grid_cv.predict(X_test)
evaluation_metrics(y_test, y_preds)

You can also access the IPython Notebook containing the complete code and execution results updated-Notebook.
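An alternative to transforming the whole dataset before splitting (a different technique from the one in this issue, offered only as one option) is to split the raw data first and put the preprocessing inside a `Pipeline`. `GridSearchCV` then re-fits the encoder on each training fold, and the train and test columns always line up. A sketch with synthetic data; the column names and parameter grid are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic car-sales-like data (hypothetical)
rng = np.random.default_rng(5)
n = 60
X = pd.DataFrame({
    "Make": rng.choice(["Honda", "Toyota", "BMW"], size=n),
    "Odometer (KM)": rng.integers(1_000, 200_000, size=n),
})
y = 20_000 - X["Odometer (KM)"] * 0.05 + (X["Make"] == "BMW") * 5_000

# Split the raw data first
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

preprocessor = ColumnTransformer(
    [("one_hot", OneHotEncoder(handle_unknown="ignore"), ["Make"])],
    remainder="passthrough",
)
pipe = Pipeline([("preprocess", preprocessor),
                 ("model", RandomForestRegressor(random_state=5))])

# Parameters of a pipeline step are addressed as <step>__<param>
param_grid = {"model__n_estimators": [10, 50], "model__max_depth": [None, 5]}
grid_cv = GridSearchCV(pipe, param_grid=param_grid, cv=3)
grid_cv.fit(X_train, y_train)

y_preds = grid_cv.predict(X_test)
print(grid_cv.best_params_, grid_cv.score(X_test, y_test))
```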

Zero to mastery

regarding joining of ZTM community channel

I'm not able to join the ZTM Discord community channel

Improve the end-to-end-heart-disease classification model score...

I tried CatBoost with tuned hyperparameters and it gives me a score of 0.96! Can we use it in our model?

ZeroToMastery SciKit Learn Exercises

plot_roc_curve no longer exists in Scikit-Learn 1.2+ but is still used in the exercises; the replacement is RocCurveDisplay

ML

plot_roc_curve is not supported in the Scikit-Learn version shown

Before sklearn 1.2:

from sklearn.metrics import plot_roc_curve
svc_disp = plot_roc_curve(svc, X_test, y_test)
rfc_disp = plot_roc_curve(rfc, X_test, y_test, ax=svc_disp.ax_)
From sklearn 1.2:

from sklearn.metrics import RocCurveDisplay
svc_disp = RocCurveDisplay.from_estimator(svc, X_test, y_test)
rfc_disp = RocCurveDisplay.from_estimator(rfc, X_test, y_test, ax=svc_disp.ax_)

I can't get through to the Discord Community

For advanced users: you should export the conda virtual environment as a resource

I've run into some dependency issues with the code shown in the ML videos. To avoid this, you could provide a requirements.txt or environment .yml file as part of the resources.

Thanks & Regards
Koteswara
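A full environment file can be produced with `conda env export` or `pip freeze`. As a lighter-weight sketch, the helper below (`library_versions` is a hypothetical name) reports the installed versions of a few key libraries using the standard library's `importlib.metadata`, which is often enough to spot a version mismatch when reporting a problem:

```python
from importlib.metadata import PackageNotFoundError, version

def library_versions(packages=("numpy", "pandas", "scikit-learn", "matplotlib")):
    """Return {package: installed version, or None if not installed}."""
    versions = {}
    for package in packages:
        try:
            versions[package] = version(package)
        except PackageNotFoundError:
            versions[package] = None  # not installed in this environment
    return versions

for name, ver in library_versions().items():
    print(f"{name}: {ver}")
```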

New to GitHub

I'm completely new to GitHub and don't know how to use it or what to do on GitHub for the machine learning and data science course. Please suggest a step-by-step guide for someone getting started on GitHub for the first time.

I have taken the machine learning and data science course.

I need help as a beginner on GitHub.

If there is any issue, I can be contacted via email ([email protected]) or on +91 8169044393 and +91 993077743.

Zero To Mastery Data Science and Machine Learning Resources.

Pandas Exercises Solution

In[24]

This no longer works in pandas 2.0+:
car_sales.groupby(["Make"]).mean()

mean() now needs numeric_only=True to skip the non-numeric columns:
car_sales.groupby(["Make"]).mean(numeric_only=True)
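A small runnable version of the fix, using a made-up slice of `car_sales` (column names follow the course; the values are invented). The string column `Colour` is what makes the plain `mean()` fail in pandas 2.0+:

```python
import pandas as pd

# Small made-up slice of car_sales
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "Honda"],
    "Colour": ["White", "Red", "Blue", "White"],
    "Odometer (KM)": [150043, 87899, 32549, 45698],
    "Doors": [4, 4, 3, 4],
})

# In pandas 2.0+, mean() raises a TypeError on string columns unless numeric_only=True
make_means = car_sales.groupby(["Make"]).mean(numeric_only=True)
print(make_means)
```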

error section 6 vid 55

I'm coding along with Complete A.I. & Machine Learning, Data Science Bootcamp 2024, section 6, video 55 (selecting and viewing data with pandas, part 2). I try to run the code:

car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)

When I run it, I get the following error, and I can't find any solution or fix for it:

<>:1: SyntaxWarning: invalid escape sequence '$'
<>:1: SyntaxWarning: invalid escape sequence '$'
C:\Users\sweet\AppData\Local\Temp\ipykernel_16004\2312081839.py:1: SyntaxWarning: invalid escape sequence '$'
car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)

ValueError Traceback (most recent call last)
Cell In[170], line 1
----> 1 car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\generic.py:6640, in NDFrame.astype(self, dtype, copy, errors)
6634 results = [
6635 ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
6636 ]
6638 else:
6639 # else, only a single dtype is given
-> 6640 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6641 res = self._constructor_from_mgr(new_data, axes=new_data.axes)
6642 return res.finalize(self, method="astype")

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
427 elif using_copy_on_write():
428 copy = False
--> 430 return self.apply(
431 "astype",
432 dtype=dtype,
433 copy=copy,
434 errors=errors,
435 using_cow=using_copy_on_write(),
436 )

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
361 applied = b.apply(f, **kwargs)
362 else:
--> 363 applied = getattr(b, f)(**kwargs)
364 result_blocks = extend_blocks(applied, result_blocks)
366 out = type(self).from_blocks(result_blocks, self.axes)

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
755 raise ValueError("Can not squeeze with more than one column.")
756 values = values[0, :] # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
760 new_values = maybe_coerce_values(new_values)
762 refs = None

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:237, in astype_array_safe(values, dtype, copy, errors)
234 dtype = dtype.numpy_dtype
236 try:
--> 237 new_values = astype_array(values, dtype, copy=copy)
238 except (ValueError, TypeError):
239 # e.g. _astype_nansafe can fail on object-dtype of strings
240 # trying to convert to float
241 if errors == "ignore":

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:182, in astype_array(values, dtype, copy)
179 values = values.astype(dtype, copy=copy)
181 else:
--> 182 values = _astype_nansafe(values, dtype, copy=copy)
184 # in pandas we don't store numpy str dtypes, so convert to object
185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:133, in _astype_nansafe(arr, dtype, copy, skipna)
129 raise ValueError(msg)
131 if copy or arr.dtype == object or dtype == object:
132 # Explicit copy, or required since NumPy can't view from / to object.
--> 133 return arr.astype(dtype, copy=True)
135 return arr.astype(dtype, copy=copy)

ValueError: invalid literal for int() with base 10: '$4,000.00'
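The `ValueError` shows that nothing was removed from `'$4,000.00'`. Since pandas 2.0, `Series.str.replace()` treats the pattern as a literal string by default (`regex=False`), so the character class only matches that exact text. Passing `regex=True` and writing the pattern as a raw string (which also avoids the `SyntaxWarning`) fixes it. Note that also stripping the `.` turns `$4,000.00` into `400000`; the sketch below strips only `$` and `,` and converts through `float`. The prices are made up:

```python
import pandas as pd

car_sales = pd.DataFrame({"Price": ["$4,000.00", "$5,000.00", "$7,000.50"]})

car_sales["Price"] = (
    car_sales["Price"]
    .str.replace(r"[\$,]", "", regex=True)  # "$4,000.00" -> "4000.00"
    .astype(float)                           # "4000.00" -> 4000.0
    .astype(int)                             # 4000.0 -> 4000 (truncates any cents)
)
print(car_sales["Price"].tolist())  # [4000, 5000, 7000]
```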

Fix Sklearn version upgrades videos/code

Some students are getting different results when running different models in Scikit-Learn.

This is because of different version upgrades (e.g. Scikit-Learn 0.23.0 -> 1.0.0).

Find the videos/code that is showing the worst results and update them with the newer versions.

Predicting bulldozer price - Converting string to category

Instead of getting the object columns converted to ordered categories, I'm getting a "bound method" in the output. The output doesn't match what's shown in the course. Please fix this and let me know.
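One plausible cause (an assumption, since the issue doesn't show the code) is writing `.cat.as_ordered` without parentheses, which stores the method object in the column instead of calling it. A minimal sketch of the conversion loop with the call in place, using a made-up two-column DataFrame and `DataFrame.items()` (the older `iteritems()` was removed in pandas 2.0):

```python
import pandas as pd

# Made-up stand-in for the bulldozer DataFrame
df_tmp = pd.DataFrame({
    "state": ["Texas", "Alabama", "Texas"],
    "SalePrice": [66000, 57000, 10000],
})

for label, content in df_tmp.items():
    if pd.api.types.is_string_dtype(content):
        # Note the (): as_ordered() returns a new, ordered categorical Series
        df_tmp[label] = content.astype("category").cat.as_ordered()

print(df_tmp["state"].cat.categories.tolist())  # ['Alabama', 'Texas']
print(df_tmp["state"].cat.codes.tolist())       # [1, 0, 1]
```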

value error in keras layer

ValueError: Only instances of keras.Layer can be added to a Sequential model. Received: <tensorflow_hub.keras_layer.KerasLayer object at 0x7d4e252400a0> (of type <class 'tensorflow_hub.keras_layer.KerasLayer'>)

It's showing this error for the following code:

# Setup the model layers
model = tf.keras.Sequential([
    hub.KerasLayer(MODEL_URL),
    # Wrap hub.KerasLayer in a Lambda layer
    tf.keras.layers.Dense(units=OUTPUT_SHAPE,
                          activation="softmax")  # Layer 2 (output layer)
])

Cannot participate in Discord

I got this error message:

Your message could not be delivered. This is usually because you don't share a server with the recipient or the recipient is only accepting direct messages from friends. You can see the full list of reasons here: https://support.discord.com/hc/en-us/articles/360060145013

Make predictions on test data batch using the loaded full model

Issue regarding colliding dog breed name when plotting

Currently, in our visualization code, the dog breed labels sometimes collide with each other, making the breed names hard to read. To fix this and improve the look of our graphs, we can stop the breed names from overlapping.
(Screenshot: with overlap)

Proposed solution:
Call the tight_layout() function in our visualization code after plotting the data batches.
(Screenshot: no overlap)
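A sketch of where the call goes, using random images and a made-up breed name in place of the notebook's batches. The function name `show_25_images` mirrors the course's helper, but this body is an assumption. `plt.tight_layout()` adjusts the subplot spacing so the titles no longer overlap:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

def show_25_images(images, labels):
    """Plot 25 images in a 5x5 grid with their labels as titles (hypothetical body)."""
    fig = plt.figure(figsize=(10, 10))
    for i in range(25):
        ax = plt.subplot(5, 5, i + 1)
        ax.imshow(images[i])
        ax.set_title(labels[i])
        ax.axis("off")
    plt.tight_layout()  # re-space subplots so the titles don't collide
    return fig

# Hypothetical data: random RGB images and a long-ish breed name
rng = np.random.default_rng(0)
images = rng.random((25, 32, 32, 3))
labels = ["scottish_deerhound"] * 25
fig = show_25_images(images, labels)
```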

Discord Community invite link is invalid

error installing Jupyter

Hi, I get this error when trying to install Jupyter through the terminal on macOS.
How can I fix it?

Thanks

Getting the error below when running import matplotlib.pyplot as plt

ImportError Traceback (most recent call last)
Cell In[7], line 1
----> 1 import matplotlib.pyplot as plt

File ~\Desktop\smart_project\env\Lib\site-packages\matplotlib\__init__.py:161
157 from packaging.version import parse as parse_version
159 # cbook must import matplotlib only within function
160 # definitions, so it is safe to import from it here.
--> 161 from . import _api, _version, cbook, _docstring, rcsetup
162 from matplotlib.cbook import sanitize_sequence
163 from matplotlib._api import MatplotlibDeprecationWarning

File ~\Desktop\smart_project\env\Lib\site-packages\matplotlib\rcsetup.py:27
25 from matplotlib import _api, cbook
26 from matplotlib.cbook import ls_mapper
---> 27 from matplotlib.colors import Colormap, is_color_like
28 from matplotlib._fontconfig_pattern import parse_fontconfig_pattern
29 from matplotlib._enums import JoinStyle, CapStyle

File ~\Desktop\smart_project\env\Lib\site-packages\matplotlib\colors.py:52
49 from numbers import Real
50 import re
---> 52 from PIL import Image
53 from PIL.PngImagePlugin import PngInfo
55 import matplotlib as mpl

File ~\Desktop\smart_project\env\Lib\site-packages\PIL\Image.py:88
79 MAX_IMAGE_PIXELS: int | None = int(1024 * 1024 * 1024 // 4 // 3)
82 try:
83 # If the _imaging C module is not present, Pillow will not load.
84 # Note that other modules should not refer to _imaging directly;
85 # import Image and use the Image.core variable instead.
86 # Also note that Image.core is not a publicly documented interface,
87 # and should be considered private and subject to change.
---> 88 from . import _imaging as core
90 if version != getattr(core, "PILLOW_VERSION", None):
91 msg = (
92 "The _imaging extension was built for another version of Pillow or PIL:\n"
93 f"Core version: {getattr(core, 'PILLOW_VERSION', None)}\n"
94 f"Pillow version: {version}"
95 )

ImportError: DLL load failed while importing _imaging: The specified module could not be found.

The slide correction is itself incorrect: a1's shape is (3,), not (1, 3)

The lecture at https://academy.zerotomastery.io/courses/complete-machine-learning-and-data-science-bootcamp-2020/lectures/14123223

says the shape of a1 is (1, 3).

This is incorrect. It should simply be (3,).

Proof
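This is easy to check: an array built from a flat list is 1-D with shape `(3,)`, while `(1, 3)` describes a 2-D array with one row. Assuming `a1 = np.array([1, 2, 3])`:

```python
import numpy as np

a1 = np.array([1, 2, 3])  # assumed definition of a1
print(a1.shape, a1.ndim)  # (3,) 1

# A (1, 3) array needs an extra dimension, e.g. via reshape
a1_row = a1.reshape(1, 3)
print(a1_row.shape, a1_row.ndim)  # (1, 3) 2
```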

Can't plot the bar graph

inplace = True in AI/ML course

https://academy.zerotomastery.io/courses/complete-machine-learning-and-data-science-bootcamp-2020/lectures/12693715

In this lecture we use inplace=True. It works, but it raises a warning.
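The issue doesn't quote the warning, so this is an assumption: a likely candidate is the `FutureWarning` that recent pandas 2.x versions emit for chained assignment with an inplace method, e.g. `df["col"].fillna(value, inplace=True)`. Assigning the result back to the column avoids it. The DataFrame and column name below are made up:

```python
import numpy as np
import pandas as pd

# Made-up DataFrame with one missing value
car_sales_missing = pd.DataFrame({"Odometer": [1000.0, np.nan, 3000.0]})

# Instead of: car_sales_missing["Odometer"].fillna(..., inplace=True)
car_sales_missing["Odometer"] = car_sales_missing["Odometer"].fillna(
    car_sales_missing["Odometer"].mean()  # mean() skips NaN, so this is 2000.0
)
print(car_sales_missing["Odometer"].tolist())  # [1000.0, 2000.0, 3000.0]
```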

Pandas 1.5.3 causes `ValueError`

Course:
"Complete Machine Learning & Data Science Bootcamp 2023"
Section 12, video 195, "Preprocessing Our Data", In the exercise "Make Predictions on Test Data"

Issue:
A ValueError is thrown, as shown below.

# Manually adjust to have auctioneerID_is_missing column
df_test["auctioneerID_is_missing"] = False
df_test.head()

# Make predictions on the test data
test_preds = ideal_model.predict(df_test)

A ValueError occurs:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[75], line 2
      1 # Make predictions on the test data
----> 2 test_preds = ideal_model.predict(df_test)

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:981, in ForestRegressor.predict(self, X)
    979 check_is_fitted(self)
    980 # Check data
--> 981 X = self._validate_X_predict(X)
    983 # Assign chunk of trees to jobs
    984 n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:602, in BaseForest._validate_X_predict(self, X)
    599 """
    600 Validate X whenever one tries to predict, apply, predict_proba."""
    601 check_is_fitted(self)
--> 602 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
    603 if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
    604     raise ValueError("No support for np.int64 index based sparse matrices")

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:548, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    483 def _validate_data(
    484     self,
    485     X="no_validation",
   (...)
    489     **check_params,
    490 ):
    491     """Validate input data and set or check the `n_features_in_` attribute.
    492 
    493     Parameters
   (...)
    546         validated.
    547     """
--> 548     self._check_feature_names(X, reset=reset)
    550     if y is None and self._get_tags()["requires_y"]:
    551         raise ValueError(
    552             f"This {self.__class__.__name__} estimator "
    553             "requires y to be passed, but the target y is None."
    554         )

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:481, in BaseEstimator._check_feature_names(self, X, reset)
    476 if not missing_names and not unexpected_names:
    477     message += (
    478         "Feature names must be in the same order as they were in fit.\n"
    479     )
--> 481 raise ValueError(message)

ValueError: The feature names should match those that were passed during fit.
Feature names must be in the same order as they were in fit.

Tests:
By the error alone, one could assume the error was caused by the addition of the missing column. After a bit of research and troubleshooting, I ran the following tests to determine if they had the same columns, in order.

set(df_test.columns) == set(X_train.columns)
[Output]: True

df_test.columns.tolist() == X_train.columns.tolist()
[Output]: False

sorted(df_test.columns) == sorted(X_train.columns)
[Output]: True

Solution:
To fix the column order, I reindexed the test data based on the columns of the training data:

df_test = df_test.reindex(X_train.columns, axis=1)

This worked, as shown by the following lines in the exercise:

# Make predictions on the test data
test_preds = ideal_model.predict(df_test)
test_preds

which resulted in:

array([17030.00927386, 14355.53565165, 46623.08774286, ...,
       11964.85073347, 16496.71079281, 27119.99044029])
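A small reproduction of the fix with made-up column names: `reindex(columns=...)` (equivalent to `reindex(..., axis=1)`) reorders the test columns to match the training columns. Any training column missing from the test set would be added filled with NaN, so it's worth checking the result:

```python
import pandas as pd

# Hypothetical training and test frames: same columns, different order
X_train = pd.DataFrame({"YearMade": [2004, 1996], "state": [1, 2],
                        "auctioneerID_is_missing": [False, True]})
df_test = pd.DataFrame({"auctioneerID_is_missing": [False], "YearMade": [2010],
                        "state": [3]})

print(set(df_test.columns) == set(X_train.columns))          # True
print(df_test.columns.tolist() == X_train.columns.tolist())  # False

# Reorder the test columns to match the order seen during fit
df_test = df_test.reindex(columns=X_train.columns)
print(df_test.columns.tolist() == X_train.columns.tolist())  # True
```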

Ml zero to hero

Update Sklearn API `plot_roc_curve` -> `RocCurveDisplay`

Link to notebook changed: https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/section-3-structured-data-projects/end-to-end-heart-disease-classification.ipynb

Error

As of Scikit-Learn 1.2, sklearn.metrics.plot_roc_curve has been removed (it was deprecated in 1.0) in favour of sklearn.metrics.RocCurveDisplay.

How to check your Scikit-Learn version

You can check your Scikit-Learn version with:

import sklearn
sklearn.__version__

How to update your Scikit-Learn version

You can run the following command in your terminal with your Conda (or other) environment active to upgrade Scikit-Learn (the -U stands for "upgrade"):

pip install -U scikit-learn

Previous code (this will error if running Scikit-Learn version 1.2+)

# This will error if run in Scikit-Learn version 1.2+
from sklearn.metrics import plot_roc_curve

Also:

# This will error if run in Scikit-Learn version 1.2+
from sklearn.metrics import plot_roc_curve 
plot_roc_curve(gs_log_reg, X_test, y_test);

New code (this will work with Scikit-Learn version 1.2+)

from sklearn.metrics import RocCurveDisplay # RocCurveDisplay.from_estimator() is available in Scikit-Learn 1.0+

And to plot a ROC curve, note the use of RocCurveDisplay.from_estimator():

# Scikit-Learn 1.2.0 or later
from sklearn.metrics import RocCurveDisplay 

# from_estimator() = use a model to plot ROC curve on data
RocCurveDisplay.from_estimator(estimator=gs_log_reg, 
                               X=X_test, 
                               y=y_test);