ageron / handson-ml3 Goto Github PK

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

License: Apache License 2.0

Jupyter Notebook 99.96% Shell 0.01% Dockerfile 0.01% Makefile 0.01% Python 0.03%

handson-ml3's Introduction

Machine Learning Notebooks, 3rd edition

This project aims at teaching you the fundamentals of Machine Learning in python. It contains the example code and solutions to the exercises in the third edition of my O'Reilly book Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow (3rd edition):

Note: If you are looking for the second edition notebooks, check out ageron/handson-ml2. For the first edition, see ageron/handson-ml.

Quick Start

Want to play with these notebooks online without having to install anything?

(recommended)

⚠ Colab provides a temporary environment: anything you do will be deleted after a while, so make sure you download any data you care about.

Other services may work as well, but I have not fully tested them:

Just want to quickly look at some notebooks, without executing any code?

github.com's notebook viewer also works but it's not ideal: it's slower, the math equations are not always displayed correctly, and large notebooks often fail to open.

Want to run this project using a Docker image?

Read the Docker instructions.

Want to install this project on your own machine?

Start by installing Anaconda (or Miniconda), git, and if you have a TensorFlow-compatible GPU, install the GPU driver, as well as the appropriate version of CUDA and cuDNN (see TensorFlow's documentation for more details).

Next, clone this project by opening a terminal and typing the following commands (do not type the first $ signs on each line, they just indicate that these are terminal commands):

$ git clone https://github.com/ageron/handson-ml3.git
$ cd handson-ml3

Next, run the following commands:

$ conda env create -f environment.yml
$ conda activate homl3
$ python -m ipykernel install --user --name=python3

Finally, start Jupyter:

$ jupyter notebook

If you need further instructions, read the detailed installation instructions.

FAQ

Which Python version should I use?

I recommend Python 3.10. If you follow the installation instructions above, that's the version you will get. Any version ≥3.7 should work as well.

I'm getting an error when I call load_housing_data()

If you're getting an HTTP error, make sure you're running the exact same code as in the notebook (copy/paste it if needed). If the problem persists, please check your network configuration. If it's an SSL error, see the next question.

I'm getting an SSL error on MacOSX

You probably need to install the SSL certificates (see this StackOverflow question). If you downloaded Python from the official website, then run /Applications/Python\ 3.10/Install\ Certificates.command in a terminal (change 3.10 to whatever version you installed). If you installed Python using MacPorts, run sudo port install curl-ca-bundle in a terminal.

I've installed this project locally. How do I update it to the latest version?

See INSTALL.md

How do I update my Python libraries to the latest versions, when using Anaconda?

See INSTALL.md

Contributors

I would like to thank everyone who contributed to this project, either by providing useful feedback, filing issues or submitting Pull Requests. Special thanks go to Haesun Park and Ian Beauregard who reviewed every notebook and submitted many PRs, including help on some of the exercise solutions. Thanks as well to Steven Bunkley and Ziembla who created the docker directory, and to github user SuperYorio who helped on some exercise solutions. Thanks a lot to Victor Khaustov who submitted plenty of excellent PRs, fixing many errors. And lastly, thanks to Google ML Developer Programs team who supported this work by providing Google Cloud Credit.

handson-ml3's People

Contributors

Stargazers

Watchers

Forkers

qmnxxy gongqingyi allensmile emilio-garcia-ie mohamed-dhouioui micseb ahmed-gamal97 weiwongfaye pchoengtawee yiranh bigfito chandrashekarkishor ocipoc peterleong hailiwang jhkim06 nisarpro efwfe dylanhogg sczhai alvarosimonmerino miladshiraniucb sparsh-ai fredymad hkhdair kolheabhay ydeh22 vusaleyvaz playfloor hitoshikumagai qhuzhl maninderpreetpuri mrslnv jonathan-marin-pavia riabovk learn2free tushark26 juwiragiye giader mbrukman vi3itor henrywu2019 balakishan77 haarstud thimotyb nikhiljose-ai kirenz natthanaphop-isa kurhula atulsharma89 rajasekhar2226 ttb-git demonsive2 azurecloudmonk yayachenyi anhnguyendepocen tarikkaankoc semaahmed soumyardas breathmark cpusummer-wdn rasmi shij316 andresrogers juliom86 mustafabozkaya telixia ashleykeung2021 scnmej luongvy oluwasemilorebadejo saurabh2086 ai-ml-next-steps moisestohias timills nirdigital sayaniboral mathdebater truongnhannguyen thebigbrother hamplustech marymlucas garrettjenkinson michelangelo367 akilbo eslamelsheikh crimsonkin nbsantos kirtast huning2009 cyclopian2022 python-repository-hub karlmarxk motta99 pedrocorma persistence18 kfarzad bartashevich-igor bioinfomonzino aneyewitness

handson-ml3's Issues

pydot requirement missing for keras "plot_model" functionality

Cloned repo and created new env from the environment.yml file today. When I ran through notebook, "10_neural_nets_with_keras", the, "tf.keras.utils.plot_model(...)" cell failed with missing pydot requirement. Cell is marked as "extra code", and overall this doesn't really impact running through the notebook.

[BUG] Fig 4-15 generated from the code from Github does not have the label for 0.30 probability boundary line (red)

Code from Github that generates Fig 4-15:

It doe not have a label 0.30 as it is in book for the 0.30 probability boundary line (which is red).

Multilabel Classification FutureWarning with KNeighborsClassifier.predict()

In the Multilabel Classification section, the following code generates a warning:

knn_clf.predict([some_digit])

The warning is:

[c:\tools\Anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228](file:///C:/tools/Anaconda3/lib/site-packages/sklearn/neighbors/_classification.py:228): FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
  mode, _ = stats.mode(_y[neigh_ind, k], axis=1)

I verified I have the latest repo code.

Versions:

OS: Windoows 11
Python: 3.9.13
Scikit-Learn: 1.0.2

[BUG] Chapter 4 - the regularization lack of data in the book

Thanks for helping us improve this project!

Before you create this issue
Please make sure you are using the latest updated code and libraries: see https://github.com/ageron/handson-ml3/blob/main/INSTALL.md#update-this-project-and-its-libraries

Also please make sure to read the FAQ (https://github.com/ageron/handson-ml3#faq) and search for existing issues (both open and closed), as your question may already have been answered: https://github.com/ageron/handson-ml3/issues

Describe the bug
Hi, on Chapter 4 - page 156 and 157 of the book the new X and y are not defined which makes the fit result different from the book
they are defined in the Colab of the book but as extra code.

Screenshots
If applicable, add screenshots to help explain your problem.
this is the result if we go through the path of the book:

This is the part in Colab defining the new X and y

which result in these answers :

Thanks for the helpful and great book of yours:)

`target_col` not used in `to_seq2seq_dataset` in Chapter 15.

The code for to_seq2seq_dataset is this

def to_seq2seq_dataset(series, seq_length=56, ahead=14, target_col=1,
                       batch_size=32, shuffle=False, seed=None):
    ds = to_windows(tf.data.Dataset.from_tensor_slices(series), ahead + 1)
    ds = to_windows(ds, seq_length).map(lambda S: (S[:, 0], S[:, 1:, 1]))
    if shuffle:
        ds = ds.shuffle(8 * batch_size, seed=seed)
    return ds.batch(batch_size)

but this doesn't seem to use the value of target_col. Should it not be

ds = to_windows(ds, seq_length).map(lambda S: (S[:, 0], S[:, 1:, target_col]))

on the second line?

Problem creating HOML3 environment on M2

The problem I encountered seems identical to the post #35, but I am trying to create it on MacBook Air M2 not Windows. However, I followed the suggestions in that post:

Comment out the line "- gym[classic_control,atari,accept-rom-license]~=0.26.1" in environment.yml, then run "conda env create -f environment.yml" again. It went through smoothly without a problem.
conda activate homl3
Run 'pip install "gym[classic_control]~=0.26.2" ' It went throgh smoothly without a problem.
Run 'pip install "gym[atari]~=0.26.2" ' It went throgh smoothly without a problem..
Run 'pip install "gym[atari,accept-rom-license]~=0.26.2" ' It does not go through, with the following error messages:

Building wheel for AutoROM.accept-rom-license (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for AutoROM.accept-rom-license (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [467 lines of output]
.........
icense_2f193bcf7ba24e08abf0d13bb4ca9490/AutoROM.py", line 173, in torrent_tar
raise RuntimeError(
RuntimeError: Terminating attempt to download ROMs after 180 seconds, this has failed, please report it.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for AutoROM.accept-rom-license
Failed to build AutoROM.accept-rom-license
ERROR: Could not build wheels for AutoROM.accept-rom-license, which is required to install pyproject.toml-based projects

Thank you in advance for any help.

The complete mesages are appended below (it is too long so not convenient to take a screenshot):

(homl3) lxl@Black-Tiger handson-ml3 % pip install "gym[atari,accept-rom-license]=0.26.2"
Requirement already satisfied: gym[accept-rom-license,atari]=0.26.2 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (0.26.2)
Requirement already satisfied: cloudpickle>=1.2.0 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from gym[accept-rom-license,atari]=0.26.2) (2.2.1)
Requirement already satisfied: numpy>=1.18.0 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from gym[accept-rom-license,atari]=0.26.2) (1.23.5)
Requirement already satisfied: gym-notices>=0.0.4 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from gym[accept-rom-license,atari]=0.26.2) (0.0.8)
Requirement already satisfied: ale-py=0.8.0 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from gym[accept-rom-license,atari]=0.26.2) (0.8.1)
Collecting autorom[accept-rom-license]=0.4.2
Using cached AutoROM-0.4.2-py3-none-any.whl (16 kB)
Requirement already satisfied: importlib-resources in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from ale-py~=0.8.0->gym[accept-rom-license,atari]=0.26.2) (5.12.0)
Requirement already satisfied: typing-extensions in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from ale-py=0.8.0->gym[accept-rom-license,atari]=0.26.2) (4.4.0)
Requirement already satisfied: requests in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from autorom[accept-rom-license]=0.4.2->gym[accept-rom-license,atari]=0.26.2) (2.28.2)
Requirement already satisfied: tqdm in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from autorom[accept-rom-license]=0.4.2->gym[accept-rom-license,atari]=0.26.2) (4.64.1)
Requirement already satisfied: click in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from autorom[accept-rom-license]=0.4.2->gym[accept-rom-license,atari]=0.26.2) (8.1.3)
Collecting AutoROM.accept-rom-license
Using cached AutoROM.accept-rom-license-0.5.5.tar.gz (22 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting libtorrent
Using cached libtorrent-2.0.7-cp310-cp310-macosx_10_9_x86_64.whl (5.6 MB)
Requirement already satisfied: certifi>=2017.4.17 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from requests->autorom[accept-rom-license]=0.4.2->gym[accept-rom-license,atari]=0.26.2) (2022.12.7)
Requirement already satisfied: idna<4,>=2.5 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from requests->autorom[accept-rom-license]=0.4.2->gym[accept-rom-license,atari]=0.26.2) (3.4)
Requirement already satisfied: charset-normalizer<4,>=2 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from requests->autorom[accept-rom-license]=0.4.2->gym[accept-rom-license,atari]=0.26.2) (2.1.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages (from requests->autorom[accept-rom-license]=0.4.2->gym[accept-rom-license,atari]~=0.26.2) (1.26.14)
Building wheels for collected packages: AutoROM.accept-rom-license
Building wheel for AutoROM.accept-rom-license (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for AutoROM.accept-rom-license (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [467 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying AutoROM.py -> build/lib
installing to build/bdist.macosx-10.9-x86_64/wheel
running install
running install_lib
creating build/bdist.macosx-10.9-x86_64
creating build/bdist.macosx-10.9-x86_64/wheel
copying build/lib/AutoROM.py -> build/bdist.macosx-10.9-x86_64/wheel
running install_egg_info
running egg_info
writing AutoROM.accept_rom_license.egg-info/PKG-INFO
writing dependency_links to AutoROM.accept_rom_license.egg-info/dependency_links.txt
writing requirements to AutoROM.accept_rom_license.egg-info/requires.txt
writing top-level names to AutoROM.accept_rom_license.egg-info/top_level.txt
reading manifest file 'AutoROM.accept_rom_license.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.txt'
writing manifest file 'AutoROM.accept_rom_license.egg-info/SOURCES.txt'
Copying AutoROM.accept_rom_license.egg-info to build/bdist.macosx-10.9-x86_64/wheel/AutoROM.accept_rom_license-0.5.5-py3.10.egg-info
running install_scripts
time=0 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=0
total payload download=0
total failed bytes=0
time=5 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=0
total payload download=0
total failed bytes=0
time=10 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=0
total payload download=0
total failed bytes=0
time=15 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=0
total payload download=0
total failed bytes=0
time=20 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=25 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=30 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=35 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=40 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=45 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=50 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=55 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=60 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=65 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=70 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=75 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=80 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=85 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=90 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=95 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=100 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=105 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=110 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=115 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=120 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=125 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=130 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=135 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=140 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=145 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=150 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=155 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=160 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=165 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=170 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
time=175 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=180 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=185 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=190 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=195 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=200 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=205 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=210 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=215 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=220 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=225 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=230 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=235 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=240 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=245 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=250 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=255 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=260 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=265 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=270 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=275 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=280 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=285 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=290 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=295 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=300 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=305 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=310 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=315 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=320 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=325 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=330 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=335 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=340 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=345 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=350 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
Have been attempting to download for more than 180 seconds, consider terminating?
time=355 / 180 seconds - Trying to download atari roms
current status=downloading metadata (downloading_metadata)
total downloaded bytes=281
total payload download=0
total failed bytes=0
AutoROM will download the Atari 2600 ROMs.
They will be installed to:
/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-install-pp_3ry5t/autorom-accept-rom-license_2f193bcf7ba24e08abf0d13bb4ca9490/build/bdist.macosx-10.9-x86_64/wheel/AutoROM/roms

  Existing ROMs will be overwritten.
  Traceback (most recent call last):
    File "/Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/Users/lxl/opt/anaconda3/envs/homl3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
      return _build_backend().build_wheel(wheel_directory, config_settings,
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 413, in build_wheel
      return self._build_with_temp_dir(['bdist_wheel'], '.whl',
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 398, in _build_with_temp_dir
      self.run_setup()
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 484, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 335, in run_setup
      exec(code, locals())
    File "<string>", line 18, in <module>
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 108, in setup
      return distutils.core.setup(**attrs)
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1221, in run_command
      super().run_command(command)
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 360, in run
      self.run_command("install")
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1221, in run_command
      super().run_command(command)
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-build-env-ce1n6e7j/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 15, in run
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-install-pp_3ry5t/autorom-accept-rom-license_2f193bcf7ba24e08abf0d13bb4ca9490/AutoROM.py", line 350, in main
      with open(torrent_tar() if source_file is None else source_file, "rb") as fh:
    File "/private/var/folders/s9/zk8r4hnd1p9dh9tf9s0xytfw0000gn/T/pip-install-pp_3ry5t/autorom-accept-rom-license_2f193bcf7ba24e08abf0d13bb4ca9490/AutoROM.py", line 173, in torrent_tar
      raise RuntimeError(
  RuntimeError: Terminating attempt to download ROMs after 180 seconds, this has failed, please report it.
  [end of output]

[QUESTION]

I'm a bit confused about the "Encoding Categorical Features Using Embeddings" on Chapter 13,

There you provided an example about encoding a categorical data, and said that 50,000 one-hot encoded categories is equivalent to around 100 embedding dimensions...
But the sample you provided isn't clear enough to understand the comparison...

The code you provided just output an Tensor shape=(3, 2) of the categories, and what else? (I expected a complete Model that outputs some real training data with compile and fit, so we can see the difference).

Can you provide a link to a more clear comparison between one-hot encoding and Embeddings?
Where should I use one or the other?

Thanks for sharing your knowledge with 3º version of the book, very impressive!

[QUESTION] why in chapter 4 code example in colab A is not Identity matrix?

Thanks for helping us improve this project!

Describe what is unclear to you
Hi
for the closed form of regularization in regression Equation 4-9 is discussed in the book
it is mentioned that A is an identity matrix
there is one part in the Colab of the chapter for the closed form but A is defined as [0, 0][0, 1].
Is there any special reason for that or am I missing something
I saw the np.ones is used for X_b
but I still don't get why A is not defined as [1, 0][0,1]

To Reproduce
If the question relates to a specific piece of code, please copy the code that fails here, using code blocks like this:

alpha = 0.1
A = np.array([[0., 0.], [0., 1.]])
X_b = np.c_[np.ones(m), X]
np.linalg.inv(X_b.T @ X_b + alpha * A) @ X_b.T @ y

Expected behavior
If applicable, a clear and concise description of what you expected to happen.

Screenshots

[BUG] homl.info/colab3 URL is down

The URL from the book (https://homl.info/colab3) is down

[IDEA] Requesting to Include Graph Neural Networks

The field of graph representation learning has grown at an incredible and sometimes unwieldy pace over the past few years, and a lot of new algorithms and innovations were made in the field. I read the second version completely it's a power-packed guide for beginners to gain knowledge on both machine learning and deep learning. But I find this important puzzle was missing there, I request the author to add a chapter on graph neural networks in upcoming versions. The hands-on experience with graph neural networks will be useful for all the readers. Thanks.

[IDEA] Ch. 2: DataFrame.corr update

Good day!

File https://github.com/ageron/handson-ml3/blob/main/02_end_to_end_machine_learning_project.ipynb
In [35]:
Line: corr_matrix = housing.corr()

In the latest version of Pandas (2.0.2) attribute numeric_only of corr() method is defaulted to False. This causes an issue:
ValueError: could not convert string to float: 'NEAR BAY'

I suggest adding a comment about this issue and possible solution in the comment section). Alternatively you can default the value manually. For me code corr_matrix = housing.corr(numeric_only=True) works just fine.

Thank you!

chapter-15, cell[41], Multivariate Time Series

After training mulvar_model with train_mulvar_ds and valid_mulvar_ds, I am trying to predict the model on mulvar_test using model.predict(mulvar_test).
and it's returning the array of floats containing 914 values(914 values maybe because of the shape of mulvar_test is (914, 5 ) ) and i guess thats the array of predictions on rail.
all i want to know if there's any way that model.predict will return only one value instead of array 914 values?

Unable to understand "If you use the row index as a unique identifier, you need to make sure that new data gets appended to the end of the dataset, and no row ever gets deleted" on page 50

I actually had doubt regarding "But both these solutions will break next time you fetch an updated dataset" which is referring to the methods of creating test set using numpy's random.permutation method and creating, loading the test set.

Now, I understand that even if we use a global seed when using numpy's permutation method in order to the same numbers, it'll cause issues if our data gets rearranged as it will cause the ML algorithm to see more data than its supposed to.

And creating the test set and loading is kind of a static approach as we will have to manually create it again and again everytime there is a change in the data set.

The solution suggested is to use the hash method. Now the part I'm struggling to understand is, if the data changes order here, we still have the same issue right (assuming the index column remains unchanged) or else these index values would need to be attached/fixed for each row instance. I guess we can counter that by using some sort of id for each instance

Now coming to my actual question, "If you use the row index as a unique identifier, you need to make sure that new data gets appended to the end of the dataset, and no row ever gets deleted", I understand here that if we insert new data in between, it'll jumble up the old data which can end in the wrong place (test data getting into training data and vice-versa). But why is it suggested to avoid deleting data?

If we are properly loading the csv file and the hash function isn't even dependent on our data's size, why then would this be an issue? If it's because the data (row) has already been fed to our model, isn't that more to do with our model than an issue with using row index as unique identifier?

I saw a similar question here: https://stackoverflow.com/questions/62229411/unique-identifier-in-a-dataset-index-problems
But I don't understand the answer given there

[BUG]Chapter 11 of the book contains incorrect content

Describe the bug
In the third edition of the book, in Chapter 11 of the Max Norm Regulation, in the last paragraph, you write, 'The max_norm() function has an axis argument that defaults to 0.' A Dense layer usually has weights of shape [number of inputs, number of neurons], so using axis=0 means that the max-norm constraint will apply independently to each neuron’s Weight vector ", which looks similar to 'https://www.tensorflow.org/api_docs/python/tf/keras/constraints/MaxNorm'. The explanation given is different, axis=0 represents the input dimension

[BUG] chapter 5, Setting the parameter C to float("inf") doesn't work.

Thanks for helping us improve this project!

Describe the bug
On chapter 5 for the code of "Figure 5-1. Large margin classification" there is one part on google colab where the C parameter is set to float("inf"), however, when I run the program I receive an error indicating that its not acceptable to use inf
page 176- google colab

To Reproduce
Please copy the code that fails here, using code blocks like this:

# extra code – this cell generates and saves Figure 5–1

import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC
from sklearn import datasets

iris = datasets.load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = iris.target

setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]

# SVM Classifier model
svm_clf = SVC(kernel="linear", C=float("inf")) ##this is the part with the issue
svm_clf.fit(X, y)

# Bad models
x0 = np.linspace(0, 5.5, 200)
pred_1 = 5 * x0 - 20
pred_2 = x0 - 1.8
pred_3 = 0.1 * x0 + 0.5

def plot_svc_decision_boundary(svm_clf, xmin, xmax):
    w = svm_clf.coef_[0]
    b = svm_clf.intercept_[0]

    # At the decision boundary, w0*x0 + w1*x1 + b = 0
    # => x1 = -w0/w1 * x0 - b/w1
    x0 = np.linspace(xmin, xmax, 200)
    decision_boundary = -w[0] / w[1] * x0 - b / w[1]

    margin = 1/w[1]
    gutter_up = decision_boundary + margin
    gutter_down = decision_boundary - margin
    svs = svm_clf.support_vectors_

    plt.plot(x0, decision_boundary, "k-", linewidth=2, zorder=-2)
    plt.plot(x0, gutter_up, "k--", linewidth=2, zorder=-2)
    plt.plot(x0, gutter_down, "k--", linewidth=2, zorder=-2)
    plt.scatter(svs[:, 0], svs[:, 1], s=180, facecolors='#AAA',
                zorder=-1)

fig, axes = plt.subplots(ncols=2, figsize=(10, 2.7), sharey=True)

plt.sca(axes[0])
plt.plot(x0, pred_1, "g--", linewidth=2)
plt.plot(x0, pred_2, "m-", linewidth=2)
plt.plot(x0, pred_3, "r-", linewidth=2)
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs", label="Iris versicolor")
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo", label="Iris setosa")
plt.xlabel("Petal length")
plt.ylabel("Petal width")
plt.legend(loc="upper left")
plt.axis([0, 5.5, 0, 2])
plt.gca().set_aspect("equal")
plt.grid()

plt.sca(axes[1])
plot_svc_decision_boundary(svm_clf, 0, 5.5)
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs")
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo")
plt.xlabel("Petal length")
plt.axis([0, 5.5, 0, 2])
plt.gca().set_aspect("equal")
plt.grid()

save_fig("large_margin_classification_plot")
plt.show()

And if you got an exception, please copy the full stacktrace here:

InvalidParameterError                     Traceback (most recent call last)
[<ipython-input-11-ad9449b51ae6>](https://localhost:8080/#) in <cell line: 18>()
     16 # SVM Classifier model
     17 svm_clf = SVC(kernel="linear", C=float("inf"))
---> 18 svm_clf.fit(X, y)
     19 
     20 # Bad models

2 frames
[/usr/local/lib/python3.9/dist-packages/sklearn/utils/_param_validation.py](https://localhost:8080/#) in validate_parameter_constraints(parameter_constraints, params, caller_name)
     95                 )
     96 
---> 97             raise InvalidParameterError(
     98                 f"The {param_name!r} parameter of {caller_name} must be"
     99                 f" {constraints_str}. Got {param_val!r} instead."

InvalidParameterError: The 'C' parameter of SVC must be a float in the range (0.0, inf). Got inf instead.

Screenshots

Versions (please complete the following information):
all codes and versions set in this code:
https://colab.research.google.com/github/ageron/handson-ml3/blob/main/05_support_vector_machines.ipynb#scrollTo=bkZHEPZwizro

Figure 16-2

In the centre box showing the text windows with purple markers, Window 2 starts on the letter 'i', but I would have expected it to start on 'z', as the windows should only overlap by one letter since the window size = (length + 1) and the shift=length.

That is what appears to be correctly shown in the bottom box in teal / blue.

[QUESTION] mybinder dosen't start for this repo

why is this repo not working on mybinder.org?

here is build logs from mybinder.org

==== build logs ====

Found built image, launching...
Launching server...
Launch attempt 1 failed, retrying...
Launch attempt 2 failed, retrying...
Launch attempt 3 failed, retrying...
Failed to launch image gesiscss/binder-r2d-g5b5b759-ageron-2dhandson-2dml3-6daeaf:a858c72c76d42285d73c7f3b21dc9f4302c4bd26

[BUG] ageron/handson-ml3 image not found on dockerhub

Custom Optimizers not working [BUG]

To Reproduce run following

class MyMomentumOptimizer(tf.keras.optimizers.Optimizer):
    def __init__(self, learning_rate=0.001, momentum=0.9, name="MyMomentumOptimizer", **kwargs):
        """Call super().__init__() and use _set_hyper() to store hyperparameters"""
        super().__init__('MyMomentumOptimizer', **kwargs)
        self._set_hyper("learning_rate", kwargs.get("lr", learning_rate)) # handle lr=learning_rate
        self._set_hyper("decay", self._initial_decay) # 
        self._set_hyper("momentum", momentum)
    
    def _create_slots(self, var_list):
        """For each model variable, create the optimizer variable associated with it.
        TensorFlow calls these optimizer variables "slots".
        For momentum optimization, we need one momentum slot per model variable.
        """
        for var in var_list:
            self.add_slot(var, "momentum")

    @tf.function
    def _resource_apply_dense(self, grad, var):
        """Update the slots and perform one optimization step for one model variable
        """
        var_dtype = var.dtype.base_dtype
        lr_t = self._decayed_lr(var_dtype) # handle learning rate decay
        momentum_var = self.get_slot(var, "momentum")
        momentum_hyper = self._get_hyper("momentum", var_dtype)
        momentum_var.assign(momentum_var * momentum_hyper - (1. - momentum_hyper)* grad)
        var.assign_add(momentum_var * lr_t)

    def _resource_apply_sparse(self, grad, var):
        raise NotImplementedError

    def get_config(self):
        base_config = super().get_config()
        return {
            **base_config,
            "learning_rate": self._serialize_hyperparameter("learning_rate"),
            "decay": self._serialize_hyperparameter("decay"),
            "momentum": self._serialize_hyperparameter("momentum"),
        }

Following Exception Occured

AttributeError: 'MyMomentumOptimizer' object has no attribute '_set_hyper'
Versions (please complete the following information):

OS: [Debian 4.19.269-1]
Python: [3.7]
TensorFlow: [2.11]

Additional context
After taking a look at base class I couldn't find _set_hyper in base class

in chapter 10 : why input shape should be (input_shape=x_train[:1]) not in this way input_shape = x_train[1:].shape)

[BUG]: Chp2 - ValueError

Hi Aurélien,

Describe the bug

Chapter 2, page 75.

When I run a piece of code word for word from the book, I obtain an error.

The code

df_output = pd.DataFrame(cat_encoder.transform(df_test_unknown), 
                         columns=cat_encoder.get_feature_names_out(), 
                         index=df_test_unknown.index)

And if you got an exception, please copy the full stacktrace here:

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[183], line 1
----> 1 df_output = pd.DataFrame(cat_encoder.transform(df_test_unknown), 
      2                          columns=cat_encoder.get_feature_names_out(), 
      3                          index=df_test_unknown.index)

File [~/ml/studysession/lib/python3.10/site-packages/pandas/core/frame.py:762](https://file+.vscode-resource.vscode-cdn.net/Users/[user]/ml/~/ml/studysession/lib/python3.10/site-packages/pandas/core/frame.py:762), in DataFrame.__init__(self, data, index, columns, dtype, copy)
    754         mgr = arrays_to_mgr(
    755             arrays,
    756             columns,
   (...)
    759             typ=manager,
    760         )
    761     else:
--> 762         mgr = ndarray_to_mgr(
    763             data,
    764             index,
    765             columns,
    766             dtype=dtype,
    767             copy=copy,
    768             typ=manager,
    769         )
    770 else:
    771     mgr = dict_to_mgr(
...
    418 passed = values.shape
    419 implied = (len(index), len(columns))
--> 420 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (2, 1), indices imply (2, 5)

Expected behavior

I am supposed to receive a DataFrame

Screenshots

Versions (please complete the following information):

OS: MacOSX 12.6
Python: 3.10
Scikit-Learn: 1.2.1

Thank you for your time

[QUESTION] Ridge regression cost function

What's the right definition of the cost function for Ridge regression?

The doc version:

The book version:

In the 2nd edition, 2 is used instead of m. Anyway, it's not obvious why with m.

19_training_and_deploying_at_scale.ipynb error

I tried to run the subject notebook in Colab and received the below error at [this section]. (https://colab.research.google.com/github/ageron/handson-ml3/blob/main/19_training_and_deploying_at_scale.ipynb#scrollTo=Querying_TF_Serving_through_the_REST_API). Please help. Thanks.

Codes caused the error:

`import requests

server_url = "http://localhost:8501/v1/models/my_mnist_model:predict"
response = requests.post(server_url, data=request_json)
response.raise_for_status() # raise an exception in case of error
response = response.json()`

Error messages:

ConnectionRefusedError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/urllib3/connection.py in _new_conn(self)
158 conn = connection.create_connection(
--> 159 (self._dns_host, self.port), self.timeout, **extra_kw)
160

19 frames
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

NewConnectionError Traceback (most recent call last)
NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

MaxRetryError Traceback (most recent call last)
MaxRetryError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/my_mnist_model:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

ConnectionError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
514 raise SSLError(e, request=request)
515
--> 516 raise ConnectionError(e, request=request)
517
518 except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/my_mnist_model:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused'))

[KMeans default algorithm is being set to Lloyd]

Thanks for helping us improve this project!

Is your feature request related to a problem? Please describe.
Notebook 09_unsupervised_learning.ipynb section "Accelerated K-Means"

For Elkan's variant of K-Means, use algorithm="elkan". For regular KMeans, use algorithm="full". The default is "auto", which uses the full algorithm since Scikit-Learn 1.1 (it used Elkan's algorithm before that).

Describe the solution you'd like
change default algorithm to "LIoyd", reference to scikit-learn/scikit-learn@bacc91c

Describe alternatives you've considered
change default algorithm to "LIoyd", reference to scikit-learn/scikit-learn@bacc91c

Additional context
change default algorithm to "LIoyd", reference to scikit-learn/scikit-learn@bacc91c

[QUESTION] Why at the last paragraph of page 147 is recommended not to shuffle instances and on page 147 in warning paragraph is recommended to shuffle instances?

Could you please explain difference in those 2 paragraphs?

[QUESTION] For ridge regression (p.156), what means that "keep the model weights as small as possible"? Does the weights here mean model coefficients?

Install tensorflow-gpu with conda

The following sentence from "INSATALL.md" is somewhat outdated:

but the good news is that they will be installed automatically when you install the tensorflow-gpu package from Anaconda.

It is because the official command below uses tf version 2.4 which is not compatible with the jupyter notebooks for homl3.

conda install -c anaconda tensorflow-gpu

Cf. https://anaconda.org/anaconda/tensorflow-gpu

Is there any other easy way to use tensorflow-gpu?

[IDEA] Remove sharex=False in figure 2-13?

This isn't a bug, but rather a possible suggested improvement. In figure 2-13 the option sharex=False is used to handle a bug that used to occur in matplotlib.

The argument sharex=False fixes a display bug: without it, the x-axis values and label are not displayed (see: pandas-dev/pandas#10611).

That issue has been resolved. So there's no longer the need to include that argument. This is a possible improvement, the code still works perfectly.

Chap 16, cell[49], Reusing Pretrained Embeddings and Language Models.

Hello, I'm using Nvidia Geforce 1050 ti 4gb, after running cell [49], I'm getting this error.
Kindly share the requirements of GPU which will run smoothly with this book.

W tensorflow/core/framework/op_kernel.cc:1733] RESOURCE_EXHAUSTED: failed to allocate memory

[QUESTION] Chapter 11, cell 46 is using momentum, when it should not

In Chapter 11, cells 46 and 48 are the same. When I assume cell 46 should be SGD without momentum, and 48 with momentum.

[Bug] Chpt2: Issue while fetching the output of preprocessing.get_feature_names_out()

Describe the bug
The issue occurs while creating the final preprocessing pipeline before Select and Train a Model topic. Expected output of preprocessing.get_feature_names_out() should be the list of all the features in the preprocessing pipeline, but instead, I get an AttributeError: Transformer geo (type ClusterSimilarity) does not provide get_feature_names_out.

To Reproduce

def column_ratio(X):
    return X[:, [0]] / X[:, [1]]

def ratio_name(function_transformer, features_names_in):
    return ["ratio"] #features names out

def ratio_pipeline():
    return make_pipeline(
        SimpleImputer(strategy='median'),
        FunctionTransformer(column_ratio, feature_names_out=ratio_name),
        StandardScaler()
    )

log_pipeline = make_pipeline(
    SimpleImputer(strategy='median'),
    FunctionTransformer(np.log, feature_names_out='one-to-one'),
    StandardScaler()
)

cluster_simil = ClusterSimilarity(n_clusters=10, gamma=1., random_state=42)
default_num_pipeline = make_pipeline(
    SimpleImputer(strategy='median'),
    StandardScaler()
    )

preprocessing = ColumnTransformer([
    ('bedrooms', ratio_pipeline(), ['total_bedrooms', 'total_rooms']),
    ('rooms_per_house', ratio_pipeline(), ['total_rooms', 'households']),
    ('people_per_house', ratio_pipeline(), ['population', 'households']),
    ('log', log_pipeline, ['total_bedrooms', 'total_rooms', 'population', 'households', 'median_income']),
    ('geo', cluster_simil, ["latitude", "longitude"]),
    ('cat', cat_pipeline, make_column_selector(dtype_include=object)),
],
remainder=default_num_pipeline) #one column remaining: housing_median_age

#This is where the issue happens
preprocessing.get_feature_names_out()

And if you got an exception, please copy the full stacktrace here:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[119], line 1
----> 1 preprocessing.get_feature_names_out()

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\compose\_column_transformer.py:511, in ColumnTransformer.get_feature_names_out(self, input_features)
    509 transformer_with_feature_names_out = []
    510 for name, trans, column, _ in self._iter(fitted=True):
--> 511     feature_names_out = self._get_feature_name_out_for_transformer(
    512         name, trans, column, input_features
    513     )
    514     if feature_names_out is None:
    515         continue

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\compose\_column_transformer.py:479, in ColumnTransformer._get_feature_name_out_for_transformer(self, name, trans, column, feature_names_in)
    477 # An actual transformer
    478 if not hasattr(trans, "get_feature_names_out"):
--> 479     raise AttributeError(
    480         f"Transformer {name} (type {type(trans).__name__}) does "
    481         "not provide get_feature_names_out."
    482     )
    483 return trans.get_feature_names_out(names)

AttributeError: Transformer geo (type ClusterSimilarity) does not provide get_feature_names_out.

Expected behavior
It should list of the names of all the features in the pipeline.

Versions (please complete the following information):

OS: [Win 11]
Python: [3.10]
Scikit-Learn: [e.g., 1.2.1]

[BUG] Chapter 13: Solution to Exercise 10 might have an issue (paragraph e)

I am completely new to the material, so could be wrong, but I think the solution of exercise 10 has an issue. The solution presents the following function:

def compute_mean_embedding(inputs):
    not_pad = tf.math.count_nonzero(inputs, axis=-1)
    n_words = tf.math.count_nonzero(not_pad, axis=-1, keepdims=True)    
    sqrt_n_words = tf.math.sqrt(tf.cast(n_words, tf.float32))
    return tf.reduce_sum(inputs, axis=1) / sqrt_n_words

This function is supposed to calculate the square root of the number of words by counting non-zero vectors.
However, it is plugged into the model after the Embeddings layer, like so:

embedding_size = 20
tf.random.set_seed(42)

text_vectorization = tf.keras.layers.TextVectorization(
    max_tokens=max_tokens, output_mode="int")
text_vectorization.adapt(sample_reviews)

model = tf.keras.Sequential([
    text_vectorization,
    tf.keras.layers.Embedding(input_dim=max_tokens,
                              output_dim=embedding_size,
                              mask_zero=True),  # <pad> tokens => zero vectors
    tf.keras.layers.Lambda(compute_mean_embedding),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

From my tests, it does not seem that mask_zero=True guarantees that the padding token 0 (output of the TextVectorization layer) is embedded to a zeros vector by the Embedding layer. So in fact, I'm not sure it correctly counts the number of words. If I am correct, it always uses the square root of the length of the longest sentence in the batch.

Here is my suggested solution. The layer should replace both the Embedding and Lambda layers in the notebook's solution. Note, however, that when I tested a model using this layer, it did not have a better accuracy than the notebook's solution for the same dataset and hyperparameters.

class SentenceEmbedding(tf.keras.layers.Layer):
    def __init__(self, input_dim, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.embedding_layer = tf.keras.layers.Embedding(
            input_dim=self.input_dim,
            output_dim=self.output_dim,
            mask_zero=True
        )

    def call(self, input):
        n_words = tf.math.count_nonzero(input, axis=-1, keepdims=True, dtype=tf.float32)
        embeddings_output = self.embedding_layer(input)
        return tf.reduce_sum(embeddings_output, axis=1) / tf.sqrt(n_words)

    def get_config(self):
        base_config = super().get_config()
        return {
            **base_config,
            'input_dim': self.input_dim,
            'output_dim': self.output_dim
        }

[BUG] Running cell 5 of 05_support_vector_machines.ipynb notebook gives error

I am running cells from top till cell 5 and when running cell 5 with this code:

# extra code – this cell generates and saves Figure 5–1

import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC
from sklearn import datasets

iris = datasets.load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = iris.target

setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]

# SVM Classifier model
svm_clf = SVC(kernel="linear", C=float("inf"))
svm_clf.fit(X, y)

# Bad models
x0 = np.linspace(0, 5.5, 200)
pred_1 = 5 * x0 - 20
pred_2 = x0 - 1.8
pred_3 = 0.1 * x0 + 0.5

def plot_svc_decision_boundary(svm_clf, xmin, xmax):
    w = svm_clf.coef_[0]
    b = svm_clf.intercept_[0]

    # At the decision boundary, w0*x0 + w1*x1 + b = 0
    # => x1 = -w0/w1 * x0 - b/w1
    x0 = np.linspace(xmin, xmax, 200)
    decision_boundary = -w[0] / w[1] * x0 - b / w[1]

    margin = 1/w[1]
    gutter_up = decision_boundary + margin
    gutter_down = decision_boundary - margin
    svs = svm_clf.support_vectors_

    plt.plot(x0, decision_boundary, "k-", linewidth=2, zorder=-2)
    plt.plot(x0, gutter_up, "k--", linewidth=2, zorder=-2)
    plt.plot(x0, gutter_down, "k--", linewidth=2, zorder=-2)
    plt.scatter(svs[:, 0], svs[:, 1], s=180, facecolors='#AAA',
                zorder=-1)

fig, axes = plt.subplots(ncols=2, figsize=(10, 2.7), sharey=True)

plt.sca(axes[0])
plt.plot(x0, pred_1, "g--", linewidth=2)
plt.plot(x0, pred_2, "m-", linewidth=2)
plt.plot(x0, pred_3, "r-", linewidth=2)
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs", label="Iris versicolor")
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo", label="Iris setosa")
plt.xlabel("Petal length")
plt.ylabel("Petal width")
plt.legend(loc="upper left")
plt.axis([0, 5.5, 0, 2])
plt.gca().set_aspect("equal")
plt.grid()

plt.sca(axes[1])
plot_svc_decision_boundary(svm_clf, 0, 5.5)
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs")
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo")
plt.xlabel("Petal length")
plt.axis([0, 5.5, 0, 2])
plt.gca().set_aspect("equal")
plt.grid()

save_fig("large_margin_classification_plot")
plt.show()

I am getting error:

---------------------------------------------------------------------------
InvalidParameterError                     Traceback (most recent call last)
Cell In[5], line 18
     16 # SVM Classifier model
     17 svm_clf = SVC(kernel="linear", C=float("inf"))
---> 18 svm_clf.fit(X, y)
     20 # Bad models
     21 x0 = np.linspace(0, 5.5, 200)

File ~\anaconda3\envs\ml_1\Lib\site-packages\sklearn\svm\_base.py:180, in BaseLibSVM.fit(self, X, y, sample_weight)
    147 def fit(self, X, y, sample_weight=None):
    148     """Fit the SVM model according to the given training data.
    149 
    150     Parameters
   (...)
    178     matrices as input.
    179     """
--> 180     self._validate_params()
    182     rnd = check_random_state(self.random_state)
    184     sparse = sp.isspmatrix(X)

File ~\anaconda3\envs\ml_1\Lib\site-packages\sklearn\base.py:581, in BaseEstimator._validate_params(self)
    573 def _validate_params(self):
    574     """Validate types and values of constructor parameters
    575 
    576     The expected type and values must be defined in the `_parameter_constraints`
   (...)
    579     accepted constraints.
    580     """
--> 581     validate_parameter_constraints(
    582         self._parameter_constraints,
    583         self.get_params(deep=False),
    584         caller_name=self.__class__.__name__,
    585     )

File ~\anaconda3\envs\ml_1\Lib\site-packages\sklearn\utils\_param_validation.py:97, in validate_parameter_constraints(parameter_constraints, params, caller_name)
     91 else:
     92     constraints_str = (
     93         f"{', '.join([str(c) for c in constraints[:-1]])} or"
     94         f" {constraints[-1]}"
     95     )
---> 97 raise InvalidParameterError(
     98     f"The {param_name!r} parameter of {caller_name} must be"
     99     f" {constraints_str}. Got {param_val!r} instead."
    100 )

InvalidParameterError: The 'C' parameter of SVC must be a float in the range (0.0, inf). Got inf instead.

How to fix it?

Chapter 9

Hi, in chapter 9 it mentioned about active learning but without examples, I'm wondering if there is any recommendation on active learning? tutorial/examples?

Thanks!!!

Chap# 16, cell [64], ValueError.

"copied the exact code on jupyter-notebook. still getting the value error"
here's the function im calling to predict translation:

def translate(sentence_en):
translation = ""
for word_idx in range(max_length):
X = np.array([sentence_en]) # encoder input
X_dec = np.array(["startofseq " + translation]) # decoder input
y_proba = model.predict((X, X_dec))[0, word_idx] # last token's probas
predicted_word_id = np.argmax(y_proba)
predicted_word = text_vec_layer_es.get_vocabulary()[predicted_word_id]
if predicted_word == "endofseq":
break
translation += " " + predicted_word
return translation.strip()

ValueError: Layer "sequential_1" expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=string>, <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>]

Problem creating HOML3 environment

Having trouble creating HOML3 environment on Windows 10. I thought this might be due to having standalone Python 3.10 installed, so I uninstalled that, then uninstalled and reinstalled anaconda. Each time I retry creating the HOML3 environment, I first clean up by deleting env3\homl3 in the anaconda folder.

Here's the error. Before trying to create the HOML3 environment this time, I did python -m pip install libtorrent and got Requirement already satisfied: libtorrent in c:\users\x475a\anaconda3\lib\site-packages (2.0.7)

Thank you.

Building wheel for AutoROM.accept-rom-license (pyproject.toml): finished with status 'error'
Failed to build AutoROM.accept-rom-license

Pip subprocess error:
  error: subprocess-exited-with-error

  × Building wheel for AutoROM.accept-rom-license (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [66 lines of output]
      C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\config\setupcfg.py:508: SetuptoolsDeprecationWarning: The license_file parameter is deprecated, use license_files instead.
        warnings.warn(msg, warning_class)
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib
      copying AutoROM.py -> build\lib
      installing to build\bdist.win-amd64\wheel
      running install
      running install_lib
      creating build\bdist.win-amd64
      creating build\bdist.win-amd64\wheel
      copying build\lib\AutoROM.py -> build\bdist.win-amd64\wheel\.
      running install_egg_info
      running egg_info
      writing AutoROM.accept_rom_license.egg-info\PKG-INFO
      writing dependency_links to AutoROM.accept_rom_license.egg-info\dependency_links.txt
      writing requirements to AutoROM.accept_rom_license.egg-info\requires.txt
      writing top-level names to AutoROM.accept_rom_license.egg-info\top_level.txt
      reading manifest file 'AutoROM.accept_rom_license.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      adding license file 'LICENSE.txt'
      writing manifest file 'AutoROM.accept_rom_license.egg-info\SOURCES.txt'
      Copying AutoROM.accept_rom_license.egg-info to build\bdist.win-amd64\wheel\.\AutoROM.accept_rom_license-0.5.0-py3.10.egg-info
      running install_scripts
      Traceback (most recent call last):
        File "C:\Users\x475a\anaconda3\envs\homl3\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 351, in <module>
          main()
        File "C:\Users\x475a\anaconda3\envs\homl3\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 333, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "C:\Users\x475a\anaconda3\envs\homl3\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 249, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\build_meta.py", line 413, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\build_meta.py", line 398, in _build_with_temp_dir
          self.run_setup()
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\build_meta.py", line 484, in run_setup
          super(_BuildMetaLegacyBackend,
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 18, in <module>
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
          return run_commands(dist)
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
          dist.run_commands()
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\dist.py", line 1208, in run_command
          super().run_command(command)
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
          cmd_obj.run()
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\wheel\bdist_wheel.py", line 360, in run
          self.run_command("install")
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\dist.py", line 1208, in run_command
          super().run_command(command)
        File "C:\Users\x475a\AppData\Local\Temp\pip-build-env-vk00i65p\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
          cmd_obj.run()
        File "<string>", line 11, in run
        File "C:\Users\x475a\AppData\Local\Temp\pip-install-n3yw4sxy\autorom-accept-rom-license_e037a052a27841a5a54970ef670daf18\AutoROM.py", line 13, in <module>
          import libtorrent as lt
      ImportError: DLL load failed while importing libtorrent: The specified module could not be found.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for AutoROM.accept-rom-license
ERROR: Could not build wheels for AutoROM.accept-rom-license, which is required to install pyproject.toml-based projects

failed

CondaEnvException: Pip failed

[BUG]tensorflow cannot find GPU with "docker-compose up", but it works with "docker run". nvidia-smi shows GPU with cuda version error.

Thanks for helping us improve this project!

Describe the bug
tensorflow cannot find GPU with "docker-compose up", but it works with "docker run".
To Reproduce
make run
make exec
ipython

Then I get these:

In [1]: import tensorflow as tf
2023-04-30 20:38:26.804632: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Librar
y (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-30 20:38:27.019561: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plug
in cuBLAS when one has already been registered

In [2]: tf.config.list_physical_devices('GPU')
2023-04-30 20:38:30.584296: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (34)
2023-04-30 20:38:30.584411: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 3b0aa2bc90a6
2023-04-30 20:38:30.584443: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 3b0aa2bc90a6
2023-04-30 20:38:30.584559: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: NOT_FOUND: was unable to find libcuda.s
o DSO loaded into this program
2023-04-30 20:38:30.584654: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 530.30.2
Out[2]: []

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

This is my docker-compose.yml
version: "3"
services:
handson-ml3:
build:
context: ../
dockerfile: ./docker/Dockerfile.gpu #Dockerfile
args:
- username=devel
- userid=1000
container_name: handson-ml3
image: ageron/handson-ml3:latest-gpu #latest
restart: unless-stopped
logging:
driver: json-file
options:
max-size: 50m
ports:
- "8888:8888"
- "8890:8890"
- "6006:6006"
volumes:
- ../:/home/devel/handson-ml3
command: /opt/conda/envs/homl3/bin/jupyter-lab --ip=0.0.0.0 --port=8890 --no-browser
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu, utility]

[QUESTION] SVC's default value for `decision_function_shape`

According to sklearn's document, it seems that the hyperparmeter decision_function_shape for SVC model is "ovr", not "ovo". But in the book, "ovo" is mentioned and explained in the section about Multiclass classification, Chapter 3.

On the other hand, it is written in the jupyter notebook as follows:

If you want decision_function() to return all 45 scores, you can set the decision_function_shape hyperparameter to "ovo". The default value is "ovr", but don't let this confuse you: SVC always uses OvO for training. This hyperparameter only affects whether or not the 45 scores get aggregated or not:

This should be also explained in the book.

[QUESTION]what is changed in this book since i have started reading second edition. is it necessary to move to third edition

[BUG]

Thanks for helping us improve this project!

Describe the bug
Part 1 - Chapter 2 - (EndtoEndML), in creating ColumnTransformer supplying feature_names_out as argument rises TypeError exception.

To Reproduce
Please copy the code that fails here, using code blocks like this:

def ratio_pipeline(name=None):
    return make_pipeline(
        SimpleImputer(strategy="median"),
        FunctionTransformer(column_ratio, feature_names_out= lambda input_features: [name]),
        StandardScaler())

And if you got an exception, please copy the full stacktrace here:

Traceback (most recent call last):
  File "F:\*****\*****\*****\CHAPTER2-END_TO_END_PROJECT\final-example.py", line 49, in <module>
    ("bedrooms_ratio", ratio_pipeline("bedrooms_ratio"), ["total_bedrooms", "total_rooms"]),
  File "F:\*****\*****\*****\CHAPTER2-END_TO_END_PROJECT\final-example.py", line 38, in ratio_pipeline
    FunctionTransformer(column_ratio, feature_names_out= lambda input_features: [name]),
TypeError: FunctionTransformer.__init__() got an unexpected keyword argument 'feature_names_out'

Expected behavior
The preprocessing fit and transform the dataset

Versions (please complete the following information):

OS: Windows 10
Python: 3.10
TensorFlow: ****
Scikit-Learn: 1.0.2
Other libraries that may be connected with the issue:*****

Additional context

This is my full code

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler, OneHotEncoder
from data import housing


class ClusterSimilarity(BaseEstimator, TransformerMixin):
    def __init__(self, n_clusters=10, gamma=1.0, random_state=None):
        self.n_clusters = n_clusters
        self.gamma = gamma
        self.random_state = random_state

    def fit(self, X, y=None, sample_weight=None):
        self.kmeans_ = KMeans(self.n_clusters, random_state=self.random_state)
        self.kmeans_.fit(X, sample_weight=sample_weight)
        return self  # always return self!

    def transform(self, X):
        return rbf_kernel(X, self.kmeans_.cluster_centers_, gamma=self.gamma)

    def get_feature_names_out(self, names=None):
        return [f"Cluster {i} similarity" for i in range(self.n_clusters)]


def column_ratio(X):
    return X[:, [0]] / X[:, [1]]


def ratio_pipeline(name=None):
    return make_pipeline(
        SimpleImputer(strategy="median"),
        FunctionTransformer(column_ratio, feature_names_out= lambda input_features: [name]),
        StandardScaler())


log_pipeline = make_pipeline(SimpleImputer(strategy="median"), FunctionTransformer(np.log), StandardScaler())
cluster_simil = ClusterSimilarity(n_clusters=10, gamma=1., random_state=42)
default_num_pipeline = make_pipeline(SimpleImputer(strategy="median"),StandardScaler())

cat_pipeline = make_pipeline(SimpleImputer(strategy="most_frequent"), OneHotEncoder(handle_unknown="ignore"))

preprocessing = ColumnTransformer([
    ("bedrooms_ratio", ratio_pipeline("bedrooms_ratio"), ["total_bedrooms", "total_rooms"]),
    ("rooms_per_house", ratio_pipeline("rooms_per_house"), ["total_rooms", "households"]),
    ("people_per_house", ratio_pipeline("people_per_house"), ["population", "households"]),
    ("log", log_pipeline, ["total_bedrooms", "total_rooms", "population", "households", "median_income"]),
    ("geo", cluster_simil, ["latitude", "longitude"]),
    ("cat", cat_pipeline, make_column_selector(dtype_include=np.object)),
], remainder=default_num_pipeline)  # one column remaining: housing_median_age


if __name__ == '__main__':
    housing_prepared = preprocessing.fit_transform(housing)
    print(housing_prepared.shape)
    print(preprocessing.get_feature_names_out())

Chapter 15 : Forecasting Several Steps Ahead cell[50]

When i run the code in cell 50, I am getting a entirely differnt forecast.. which is actually quite away from the actuals. What can be the reason ?

Regards,
Ravi

[BUG]

Hi, on chapter 2, page 75
on this code:

df_output = pd.DataFrame(cat_encoder.transform(df_test_unknown),
 columns=cat_encoder.get_feature_names_out(),
 index=df_test_unknown.index)

I received an error like this:

NotFittedError                            Traceback (most recent call last)
[<ipython-input-111-4a1afad009dd>](https://localhost:8080/#) in <cell line: 2>()
      1 cat_encoder = OneHotEncoder(sparse=False)
----> 2 df_output = pd.DataFrame(cat_encoder.transform(df_test_unknown),
      3                          columns=cat_encoder.get_feature_names_out(),
      4                          index=df_test_unknown.index)

2 frames
[/usr/local/lib/python3.9/dist-packages/sklearn/utils/validation.py](https://localhost:8080/#) in check_is_fitted(estimator, attributes, msg, all_or_any)
   1388 
   1389     if not fitted:
-> 1390         raise NotFittedError(msg % {"name": type(estimator).__name__})
   1391 
   1392 

NotFittedError: This OneHotEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Screenshots

however when I tried the main colab of the book the code worked perfectly fine.
I believe it's due to the code:

cat_encoder = OneHotEncoder(sparse=False)

and although it just gives a suggestion for it on the book it must be necessary to avoid the errors for transformation.
Thanks for your amazing book.

Chapter 10 - Error using fetch_california_housing

I just installed everything using the related conda environment. Everything works fine, except for when using fetch_california_housing. As in the line

housing = fetch_california_housing()

The error I get is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In [6], line 8
      5 from sklearn.pipeline import make_pipeline
      6 from sklearn.preprocessing import StandardScaler
----> 8 housing = fetch_california_housing()
      9 X_train_full, X_test, y_train_full, y_test = train_test_split(
     10     housing.data, housing.target, random_state=42)
     11 X_train, X_valid, y_train, y_valid = train_test_split(
     12     X_train_full, y_train_full, random_state=42)

File ~/.conda/envs/homl3/lib/python3.10/site-packages/sklearn/datasets/_california_housing.py:153, in fetch_california_housing(data_home, download_if_missing, return_X_y, as_frame)
    150     remove(archive_path)
    152 else:
--> 153     cal_housing = joblib.load(filepath)
    155 feature_names = [
    156     "MedInc",
    157     "HouseAge",
   (...)
    163     "Longitude",
    164 ]
    166 target, data = cal_housing[:, 0], cal_housing[:, 1:]

File ~/.conda/envs/homl3/lib/python3.10/site-packages/joblib/numpy_pickle.py:587, in load(filename, mmap_mode)
    581             if isinstance(fobj, str):
    582                 # if the returned file object is a string, this means we
    583                 # try to load a pickle file generated with an version of
    584                 # Joblib so we load it with joblib compatibility function.
    585                 return load_compatibility(fobj)
--> 587             obj = _unpickle(fobj, filename, mmap_mode)
    588 return obj

File ~/.conda/envs/homl3/lib/python3.10/site-packages/joblib/numpy_pickle.py:506, in _unpickle(fobj, filename, mmap_mode)
    504 obj = None
    505 try:
--> 506     obj = unpickler.load()
    507     if unpickler.compat_mode:
    508         warnings.warn("The file '%s' has been generated with a "
    509                       "joblib version less than 0.10. "
    510                       "Please regenerate this pickle file."
    511                       % filename,
    512                       DeprecationWarning, stacklevel=3)

File ~/.conda/envs/homl3/lib/python3.10/pickle.py:1213, in _Unpickler.load(self)
   1211             raise EOFError
   1212         assert isinstance(key, bytes_types)
-> 1213         dispatch[key[0]](self)
   1214 except _Stop as stopinst:
   1215     return stopinst.value

KeyError: 194

I have executed the same code, but without activating the conda environment (so using the stuff as installed by my OS - I use Arch Linux) and everything works fine. But when using the conda env provided, I get the error above. I also tried updating sklearn to 1.2, but I receive the same error.

[QUESTION] Why all estimators should be fitted to training data only?

Page 45 says:

As will all estimators, it is important to fit scalers to the training data only: never use fit() or fit_transform() for anything else than training set.

Could you please explain why it is important and what happens if this recommendation is not followed?

[IDEA] 3rd paper quality is much better than 2nd edition

Just wanted to say that 3rd paper quality is much better than 2nd edition. 2nd edition was harder to read because of easier light reflection from the paper.

[QUESTION] About the keyword argument "feature_names_out" of "FunctionTransformer" class

The definition of ratio_pipeline() function in the jupyter notebook of Chapter 2 contains the following transformer:

FunctionTransformer(column_ratio,
                    feature_names_out=[name]),

But the keyword feature_names_out is not listed as an argument in the scikit-learn document of FunctionTransformer class or its parent classes, and it's not clear how to understand it.

Is that just for naming the transformer to be built or for any other pourpose? And why isn't it listed as a keyword argment of the FunctionTransformer's constructor or of its parent classes'?

Link is not working chapter 11[BUG]

I'm currently reading the book from O'reilly website and in chapter 11 there is one link that is not working. It is under Unsupervised Pretraining header and when i'm trying to open the link https://homl.info/extra-anns which redirects to https://colab.research.google.com/github/ageron/handson-ml3/blob/main/extra_ann_architectures.ipynb but this is not working and when i cheked the repo, I dont find it as well.

@ageron can you please update it.

scipy.stats.reciprocal is deprecated (at documentation level)

In solution to exercice 10 of chapter 5, the random variable scipy.stats.reciprocal is used. It has been replaced (since v1.4.0), at least in Scipy's documentation, by scipy.stats.loguniform.

[BUG] Chapter 2 cell 72: '.toarray()' missing in df_test example

Running cell 72 in the Chapter 2 returns a 2x5 sparse matrix and not a numpy array.
I run this in Jetbrains DataSpell.

To Reproduce
cell 72:

cat_encoder.transform(df_test)

Not an exception and the output is correct - it just misses the step of converting the output to a numpy array. The np array output is displayed both in the book (page 74 at the top as well as in the notebook).

<2x5 sparse matrix of type '<class 'numpy.float64'>'
	with 2 stored elements in Compressed Sparse Row format>

Expected behavior
The expected output is the output in the book (p. 74) or in the notebook (out:72)
I achieve this result by adding .toarray() to the code

Versions (please complete the following information):

OS: MacOS Ventura 13.1
Python: 3.10
TensorFlow: as per requirements.txt
Scikit-Learn: as per requirements.txt
Other libraries that may be connected with the issue: Using Jetbrains DataSpell

Additional context

[BUG] docker compose build fails with unresolved packages

=> ERROR [ 4/12] RUN echo ' - pyvirtualdisplay' >> /tmp/environment. 14.1s

[ 4/12] RUN echo ' - pyvirtualdisplay' >> /tmp/environment.yml && conda env create -f /tmp/environment.yml && conda clean -afy && find /opt/conda/ -follow -type f -name '.a' -delete && find /opt/conda/ -follow -type f -name '.pyc' -delete && find /opt/conda/ -follow -type f -name '*.js.map' -delete && rm /tmp/environment.yml:
#8 0.444 Collecting package metadata (repodata.json): ...working... done
#8 12.95 Solving environment: ...working... failed
#8 12.96
#8 12.96 ResolvePackageNotFound:
#8 12.96 - pyglet=1.5
#8 12.96 - box2d-py
#8 12.96

executor failed running [/bin/sh -c echo ' - pyvirtualdisplay' >> /tmp/environment.yml && conda env create -f /tmp/environment.yml && conda clean -afy && find /opt/conda/ -follow -type f -name '.a' -delete && find /opt/conda/ -follow -type f -name '.pyc' -delete && find /opt/conda/ -follow -type f -name '*.js.map' -delete && rm /tmp/environment.yml]: exit code: 1
ERROR: Service 'handson-ml3' failed to build : Build failed

ageron / handson-ml3 Goto Github PK

handson-ml3's Introduction

Machine Learning Notebooks, 3rd edition

Quick Start

Want to play with these notebooks online without having to install anything?

Just want to quickly look at some notebooks, without executing any code?

Want to run this project using a Docker image?

Want to install this project on your own machine?

FAQ

Contributors

handson-ml3's People

Contributors

Stargazers

Watchers

Forkers

handson-ml3's Issues

The complete mesages are appended below (it is too long so not convenient to take a screenshot):

Error messages:

=> ERROR [ 4/12] RUN echo ' - pyvirtualdisplay' >> /tmp/environment. 14.1s

Recommend Projects

Recommend Topics

Recommend Org