zero-to-mastery-ml's Introduction

Zero to Mastery Machine Learning

Welcome! This repository contains all of the code, notebooks, images and other materials related to the Zero to Mastery Machine Learning Course on Udemy and zerotomastery.io.

Updates

  • 12 October 2023 - Created an online book version of the course materials, see: https://dev.mrdbourke.com/zero-to-mastery-ml/ (currently a work in progress)
  • 7 Sep 2023 onward - Working on updating the materials for 2024, see the progress in #63
  • 25 Aug 2023 - Update section 3 end-to-end bulldozer regression notebook for Scikit-Learn 1.3+ (column order for predictions should match column order for training). See #62 for more.

What this course focuses on

  1. Create a framework for working through problems (6 step machine learning modelling framework)
  2. Find tools to fit the framework
  3. Targeted practice = use tools and framework steps to work on end-to-end machine learning modelling projects

How this course is structured

  • Section 1 - Getting your mind and computer ready for machine learning (concepts, computer setup)
  • Section 2 - Tools for machine learning and data science (pandas, NumPy, Matplotlib, Scikit-Learn)
  • Section 3 - End-to-end structured data projects (classification and regression)
  • Section 4 - Neural networks, deep learning and transfer learning with TensorFlow 2.0
  • Section 5 - Communicating and sharing your work

Student notes

Some students have taken and shared extensive notes on this course, see them below.

If you'd like to submit yours, leave a pull request.

  1. Chester's notes - https://github.com/chesterheng/machinelearning-datascience
  2. Sophia's notes - https://www.rockyourcode.com/tags/udemy-complete-machine-learning-and-data-science-zero-to-mastery/

zero-to-mastery-ml's People

Contributors

anishchand99, arpadikuma, djreyab, dmathewwws, imadsai, itzzmesid, jaintj95, magicandcode, majidshakeelshawl, matus-dubrava, mauhpr, mrdbourke, mtosity, mziolkowski21, pavanskipo, rodrez, rzmk, saketmunda, suzukikijai, tiboine, wloszynski

zero-to-mastery-ml's Issues

Update Sklearn API `plot_roc_curve` -> `RocCurveDisplay`

Link to notebook changed: https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/section-3-structured-data-projects/end-to-end-heart-disease-classification.ipynb

Error

As of Scikit-Learn 1.2, the function sklearn.metrics.plot_roc_curve has been removed (it was deprecated in version 1.0) in favour of sklearn.metrics.RocCurveDisplay.

How to check your Scikit-Learn version

You can check your Scikit-Learn version with:

import sklearn
sklearn.__version__

How to update your Scikit-Learn version

You can run the following command in your terminal with your Conda (or other) environment active to upgrade Scikit-Learn (the -U stands for "upgrade"):

pip install -U scikit-learn

Previous code (this will error if running Scikit-Learn version 1.2+)

# This will error if run in Scikit-Learn version 1.2+
from sklearn.metrics import plot_roc_curve

Also:

# This will error if run in Scikit-Learn version 1.2+
from sklearn.metrics import plot_roc_curve 
plot_roc_curve(gs_log_reg, X_test, y_test);

New code (this will work with Scikit-Learn version 1.2+)

from sklearn.metrics import RocCurveDisplay # required instead of plot_roc_curve in Scikit-Learn 1.2+

And to plot a ROC curve, note the use of RocCurveDisplay.from_estimator():

# Scikit-Learn 1.2.0 or later
from sklearn.metrics import RocCurveDisplay 

# from_estimator() = use a model to plot ROC curve on data
RocCurveDisplay.from_estimator(estimator=gs_log_reg, 
                               X=X_test, 
                               y=y_test); 

Fix Sklearn version upgrades videos/code

Some students are getting different results when running different models in Scikit-Learn.

This is because of different version upgrades (e.g. Scikit-Learn 0.23.0 -> 1.0.0).

Find the videos/code that show the largest differences in results and update them for the newer versions.

error section 6 vid 55

I am coding along with the "Complete A.I. & Machine Learning, Data Science Bootcamp 2024". In video 55 of section 6, "Selecting and Viewing Data with pandas Part 2", I try to run the code:

car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)

I run it and get the following error, for which I cannot find a fix:

<>:1: SyntaxWarning: invalid escape sequence '$'
<>:1: SyntaxWarning: invalid escape sequence '$'
C:\Users\sweet\AppData\Local\Temp\ipykernel_16004\2312081839.py:1: SyntaxWarning: invalid escape sequence '$'
car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)


ValueError Traceback (most recent call last)
Cell In[170], line 1
----> 1 car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\generic.py:6640, in NDFrame.astype(self, dtype, copy, errors)
6634 results = [
6635 ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
6636 ]
6638 else:
6639 # else, only a single dtype is given
-> 6640 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6641 res = self._constructor_from_mgr(new_data, axes=new_data.axes)
6642 return res.finalize(self, method="astype")

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
427 elif using_copy_on_write():
428 copy = False
--> 430 return self.apply(
431 "astype",
432 dtype=dtype,
433 copy=copy,
434 errors=errors,
435 using_cow=using_copy_on_write(),
436 )

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
361 applied = b.apply(f, **kwargs)
362 else:
--> 363 applied = getattr(b, f)(**kwargs)
364 result_blocks = extend_blocks(applied, result_blocks)
366 out = type(self).from_blocks(result_blocks, self.axes)

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
755 raise ValueError("Can not squeeze with more than one column.")
756 values = values[0, :] # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
760 new_values = maybe_coerce_values(new_values)
762 refs = None

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:237, in astype_array_safe(values, dtype, copy, errors)
234 dtype = dtype.numpy_dtype
236 try:
--> 237 new_values = astype_array(values, dtype, copy=copy)
238 except (ValueError, TypeError):
239 # e.g. _astype_nansafe can fail on object-dtype of strings
240 # trying to convert to float
241 if errors == "ignore":

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:182, in astype_array(values, dtype, copy)
179 values = values.astype(dtype, copy=copy)
181 else:
--> 182 values = _astype_nansafe(values, dtype, copy=copy)
184 # in pandas we don't store numpy str dtypes, so convert to object
185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:133, in _astype_nansafe(arr, dtype, copy, skipna)
129 raise ValueError(msg)
131 if copy or arr.dtype == object or dtype == object:
132 # Explicit copy, or required since NumPy can't view from / to object.
--> 133 return arr.astype(dtype, copy=True)
135 return arr.astype(dtype, copy=copy)

ValueError: invalid literal for int() with base 10: '$4,000.00'
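
One possible cause, sketched here under assumptions: since pandas 2.0, Series.str.replace treats the pattern as a literal string by default (regex=False), so a character class such as [$,.] is never matched and the text '$4,000.00' reaches astype(int) unchanged. Passing regex=True and writing the pattern as a raw string avoids both the ValueError and the SyntaxWarning. The sample values below are illustrative stand-ins, not the course's CSV.

```python
import pandas as pd

# Hypothetical stand-in for the course's car_sales DataFrame
car_sales = pd.DataFrame({"Price": ["$4,000.00", "$5,000.00", "$7,000.00"]})

# Raw string (r"...") avoids the invalid-escape SyntaxWarning;
# regex=True makes pandas treat the pattern as a regular expression
car_sales["Price"] = (
    car_sales["Price"]
    .str.replace(r"[\$,\.]", "", regex=True)
    .astype(int)
)

print(car_sales["Price"].tolist())
```

Note that removing the decimal point turns "$4,000.00" into 400000, so the values are effectively in cents unless they are divided by 100 afterwards.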

The plot_roc_curve is not supported in the shown version

Before sklearn 1.2:

from sklearn.metrics import plot_roc_curve
svc_disp = plot_roc_curve(svc, X_test, y_test)
rfc_disp = plot_roc_curve(rfc, X_test, y_test, ax=svc_disp.ax_)

From sklearn 1.2:

from sklearn.metrics import RocCurveDisplay
svc_disp = RocCurveDisplay.from_estimator(svc, X_test, y_test)
rfc_disp = RocCurveDisplay.from_estimator(rfc, X_test, y_test, ax=svc_disp.ax_)

Resolved Error in sklearn Lesson File - Incorrect Data Splitting

I have resolved an error in the provided sklearn lesson file. Below is my updated code along with the corrected data splitting after preprocessing:

Corrected data splitting after preprocessing

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_transform_df, y, test_size=0.2, random_state=5)

Fit and score the model

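from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model, param_grid=param, cv=5, verbose=2)
grid_cv.fit(X_train, y_train)  # search hyperparameters on the training split only
y_preds = grid_cv.predict(X_test)  # predict with the best estimator found
evaluation_metrics(y_test, y_preds)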

You can also access the IPython Notebook containing the complete code and execution results updated-Notebook.
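
The snippet above does not show how X_transform_df, model, param or evaluation_metrics are defined. The sketch below fills them in with hypothetical stand-ins (synthetic data, a small random forest and a simple metrics helper) purely so the split, fit and score steps can be run end to end; the lesson file's own definitions may differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical preprocessed feature table and target
rng = np.random.default_rng(5)
X_transform_df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f0", "f1", "f2"])
y = X_transform_df["f0"] * 3 + rng.normal(scale=0.1, size=100)

def evaluation_metrics(y_true, y_pred):
    """Hypothetical helper: return R^2 and mean absolute error."""
    return {"r2": r2_score(y_true, y_pred), "mae": mean_absolute_error(y_true, y_pred)}

# Hypothetical model and parameter grid
model = RandomForestRegressor(random_state=5)
param = {"n_estimators": [10, 20], "max_depth": [None, 3]}

X_train, X_test, y_train, y_test = train_test_split(
    X_transform_df, y, test_size=0.2, random_state=5
)

grid_cv = GridSearchCV(estimator=model, param_grid=param, cv=5)
grid_cv.fit(X_train, y_train)
y_preds = grid_cv.predict(X_test)
metrics = evaluation_metrics(y_test, y_preds)
print(metrics)
```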

Pandas Exercises Solution

In[24]

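This no longer works in pandas 2.0+ (it raises a TypeError because non-numeric columns are no longer dropped automatically):
car_sales.groupby(["Make"]).mean()

Pass numeric_only=True so the mean is computed over the numeric columns only:
car_sales.groupby(["Make"]).mean(numeric_only=True)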
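
As a small illustration, assuming a cut-down version of the car_sales table (the rows below are made up for the example), numeric_only=True keeps only the numeric columns in the grouped result:

```python
import pandas as pd

# Illustrative stand-in for the exercise's car_sales DataFrame
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota"],
    "Colour": ["White", "Red", "Blue"],
    "Odometer (KM)": [150043, 87899, 32549],
    "Price": [4000, 5000, 7000],
})

# The string column "Colour" is skipped rather than raising an error
mean_by_make = car_sales.groupby(["Make"]).mean(numeric_only=True)
print(mean_by_make)
```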

error installing Jupyter

hi, I get this error while trying to install Jupyter through terminal in macOS.
how can I fix it?

thanks

[Screenshot: terminal error]

New to git hub

I'm completely new to GitHub and don't know how to use it or what to do on GitHub for the machine learning and data science course. Please suggest guidelines as you would for a child getting started on GitHub for the first time.

I have taken a course on machine learning and data science.

I need help as a beginner on GitHub.

If there is any issue, please inform me via email, i.e. [email protected], or by phone: +91 8169044393 and +91 993077743.

Predicting bulldozer price - Wrong Hyperparameters Tuning

In lecture no. 196, you use RandomizedSearchCV with the default cv=5 to tune hyperparameters. I think that's the wrong approach for time series data, because:

  • With cv=5 the data is split into 5 folds without regard to time order, so the model is sometimes trained on later sales and validated on earlier ones
  • This can result in a misleading evaluation of the best hyperparameters

What ChatGPT says: [screenshot]

What we can do is use Scikit-Learn's TimeSeriesSplit!

[Figure: Scikit-Learn TimeSeriesSplit cross-validation indices plot]

You should suggest the correct way of doing this in your course soon!
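
A minimal sketch of that suggestion, assuming the rows are already sorted by date: pass a TimeSeriesSplit object as cv so every validation fold comes after its training fold. The synthetic data, the parameter grid and n_iter below are illustrative assumptions, not the course's bulldozer setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic data standing in for rows sorted by sale date
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

# Each split trains on earlier rows and validates on the rows that follow
tscv = TimeSeriesSplit(n_splits=5)

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_distributions={"n_estimators": [10, 20], "max_depth": [None, 3, 5]},
    n_iter=3,
    cv=tscv,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```

This only preserves time order if the DataFrame is sorted chronologically before fitting, since TimeSeriesSplit works on row positions.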

Pandas 1.5.3 causes `ValueError`

Course:
"Complete Machine Learning & Data Science Bootcamp 2023"
Section 12, video 195, "Preprocessing Our Data", In the exercise "Make Predictions on Test Data"

Issue:
ValueError is thrown as demonstrated.

# Manually adjust to have auctioneerID_is_missing column
df_test["auctioneerID_is_missing"] = False
df_test.head()

# Make predictions on the test data
test_preds = ideal_model.predict(df_test)

A ValueError occurs:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[75], line 2
      1 # Make predictions on the test data
----> 2 test_preds = ideal_model.predict(df_test)

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:981, in ForestRegressor.predict(self, X)
    979 check_is_fitted(self)
    980 # Check data
--> 981 X = self._validate_X_predict(X)
    983 # Assign chunk of trees to jobs
    984 n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:602, in BaseForest._validate_X_predict(self, X)
    599 """
    600 Validate X whenever one tries to predict, apply, predict_proba."""
    601 check_is_fitted(self)
--> 602 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
    603 if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
    604     raise ValueError("No support for np.int64 index based sparse matrices")

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:548, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    483 def _validate_data(
    484     self,
    485     X="no_validation",
   (...)
    489     **check_params,
    490 ):
    491     """Validate input data and set or check the `n_features_in_` attribute.
    492 
    493     Parameters
   (...)
    546         validated.
    547     """
--> 548     self._check_feature_names(X, reset=reset)
    550     if y is None and self._get_tags()["requires_y"]:
    551         raise ValueError(
    552             f"This {self.__class__.__name__} estimator "
    553             "requires y to be passed, but the target y is None."
    554         )

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:481, in BaseEstimator._check_feature_names(self, X, reset)
    476 if not missing_names and not unexpected_names:
    477     message += (
    478         "Feature names must be in the same order as they were in fit.\n"
    479     )
--> 481 raise ValueError(message)

ValueError: The feature names should match those that were passed during fit.
Feature names must be in the same order as they were in fit.

Tests:
By the error alone, one could assume the error was caused by the addition of the missing column. After a bit of research and troubleshooting, I ran the following tests to determine if they had the same columns, in order.

set(df_test.columns) == set(X_train.columns)
[Output]: True

df_test.columns.tolist() == X_train.columns.tolist()
[Output]: False

sorted(df_test.columns) == sorted(X_train.columns)
[Output]: True

Solution:
To fix the column order, I had to reindex the test data, based on the columns of the train data

df_test = df_test.reindex(X_train.columns, axis=1)

The code then ran successfully, as demonstrated by the following lines in the exercise.

# Make predictions on the test data
test_preds = ideal_model.predict(df_test)
test_preds

which resulted in:

array([17030.00927386, 14355.53565165, 46623.08774286, ...,
       11964.85073347, 16496.71079281, 27119.99044029])
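
The column-order check and fix can be tried on a toy example. The column names and values below are illustrative assumptions; only the reindex call comes from the solution above.

```python
import pandas as pd

# Toy frames with the same columns in a different order
X_train = pd.DataFrame({
    "YearMade": [2004],
    "MachineHoursCurrentMeter": [68.0],
    "auctioneerID_is_missing": [False],
})
df_test = pd.DataFrame({
    "auctioneerID_is_missing": [False],
    "YearMade": [1999],
    "MachineHoursCurrentMeter": [3688.0],
})

same_set_before = set(df_test.columns) == set(X_train.columns)
same_order_before = df_test.columns.tolist() == X_train.columns.tolist()

# Reorder the test columns to match the training columns
df_test = df_test.reindex(X_train.columns, axis=1)
same_order_after = df_test.columns.tolist() == X_train.columns.tolist()

print(same_set_before, same_order_before, same_order_after)
```

One design note: reindex fills any column missing from df_test with NaN, whereas df_test[X_train.columns] would raise a KeyError, which can be preferable when a missing column should be noticed rather than silently filled.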

Issue regarding colliding dog breed name when plotting

Currently, in our visualization code, the dog breed labels sometimes collide with each other, making it difficult to read the breed names clearly. To address this problem and enhance the visual appeal of our graphs, we can implement a solution that prevents the breed names from overlapping.

[Screenshots: breed labels overlapping]

Proposed Solution:
We can call Matplotlib's tight_layout() function in our visualization code after plotting the data batches.

[Screenshots: breed labels no longer overlapping]
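
A minimal sketch of the proposed change, using random arrays and made-up label strings in place of the real image batch and breed names (both are assumptions for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook

import matplotlib.pyplot as plt
import numpy as np

# Random images and placeholder labels standing in for a data batch
rng = np.random.default_rng(0)
images = rng.random((25, 32, 32, 3))
labels = [f"breed_name_number_{i}" for i in range(25)]

fig = plt.figure(figsize=(10, 10))
for i in range(25):
    ax = plt.subplot(5, 5, i + 1)
    ax.imshow(images[i])
    ax.set_title(labels[i])
    ax.axis("off")

# Adjust subplot spacing so titles do not overlap neighbouring plots
plt.tight_layout()
```

If long labels still collide after tight_layout(), a larger figsize or a smaller title font size are further options.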
