inside-deep-learning's Issues

Chapter 2 "optimizer.zero_grad()"

At the beginning of chapter 2,
2.1.3 The training loop,

you used the following steps in your train_simple_network function:
optimizer.zero_grad()
loss.backward()
optimizer.step()

but at the end of the chapter, in the run_epoch function, you used a different order:

2.4.2 Training and testing passes

if model.training:
loss.backward()
optimizer.step()
optimizer.zero_grad()

Is there any reason for switching the order? I thought we had to zero the gradients first thing at every epoch.
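
For what it's worth, here is a minimal toy sketch (illustrative names and data, not the book's code) showing that the two orderings are interchangeable: as long as zero_grad() runs at some point between one backward() call and the next, step() sees exactly the same gradients either way.

import torch
import torch.nn as nn

# Toy setup (illustrative): a linear model on random data.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_func = nn.MSELoss()
X, y = torch.randn(8, 4), torch.randn(8, 1)

# Ordering used in train_simple_network: zero the gradients first.
for _ in range(3):
    optimizer.zero_grad()              # clear gradients from the previous iteration
    loss_func(model(X), y).backward()  # accumulate fresh gradients
    optimizer.step()                   # update the parameters

# Ordering used in run_epoch: zero the gradients last.
# step() still sees only the gradients from the most recent backward(),
# because zero_grad() runs before the next backward() call.
for _ in range(3):
    loss_func(model(X), y).backward()
    optimizer.step()
    optimizer.zero_grad()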

Custom Dataset class on Chapter_1

Hi, I was getting the following error when executing this code:

import torch
from torch.utils.data import Dataset
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True)

class SimpleDataset(Dataset):
    def __init__(self, X, y):
        super(SimpleDataset, self).__init__()
        self.X = X
        self.y = y
    
    def __getitem__(self, index):
        inputs = torch.tensor(self.X[index, :], dtype=torch.float32)
        targets = torch.tensor(int(self.y[index]), dtype=torch.int64)
        return inputs, targets

    def __len__(self):
        return self.X.shape[0]

dataset = SimpleDataset(X, y)
example, label = dataset[0]
InvalidIndexError: (tensor(0), slice(None, None, None))

The error was fixed when I changed the fetch_openml call to:

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

The problem is that without as_frame=False, scikit-learn now returns the data as a pandas DataFrame rather than a NumPy array.
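
An alternative sketch (hypothetical, not from the book) keeps the default DataFrame output and instead indexes positionally with .iloc inside __getitem__:

import torch
import pandas as pd
from torch.utils.data import Dataset

class SimpleDataFrameDataset(Dataset):
    def __init__(self, X: pd.DataFrame, y: pd.Series):
        super().__init__()
        self.X = X
        self.y = y

    def __getitem__(self, index):
        row = self.X.iloc[index].to_numpy()  # positional row access works on a DataFrame
        inputs = torch.tensor(row, dtype=torch.float32)
        targets = torch.tensor(int(self.y.iloc[index]), dtype=torch.int64)
        return inputs, targets

    def __len__(self):
        return len(self.X)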

Chapter_6.ipynb wrong test data

test_data = torchvision.datasets.FashionMNIST("./", train=True, transform=transforms.ToTensor(), download=True)

should use train=False
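
That is, the corrected line would be:

# Load the actual test split (train=False) rather than a second copy of the training data.
test_data = torchvision.datasets.FashionMNIST("./", train=False, transform=transforms.ToTensor(), download=True)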

This changes the figures in this chapter significantly.

a question about autograd, thanks

Dear Edward,

From page 21 to page 23, when we are talking about autograd,

we test whether the condition ||prev - cur|| < epsilon is satisfied to check whether we have reached the minimum.

My question is: why not just test whether the gradient at cur is zero or not?

That is to say, can

while torch.linalg.norm(x_cur - x_prev) > epsilon:

be replaced by

epsilon = 1e-12 # a small enough value

while abs(cur.grad) > epsilon:

?

thanks a lot !
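
For reference, a minimal toy sketch (not the book's code; the function and names are illustrative) of gradient descent that stops on the gradient magnitude instead of the step size. In practice the gradient rarely becomes exactly zero, so it is compared against a small epsilon; for vector-valued x one would test torch.linalg.norm(x.grad) instead of abs:

import torch

x = torch.tensor(0.0, requires_grad=True)
eta, epsilon = 0.1, 1e-6

while True:
    loss = (x - 3.0) ** 2              # toy function with its minimum at x = 3
    loss.backward()                    # populates x.grad
    if torch.abs(x.grad) < epsilon:    # stop when the gradient is (nearly) zero
        break
    with torch.no_grad():
        x -= eta * x.grad              # gradient-descent step
    x.grad.zero_()                     # clear the gradient before the next backward()

print(x.item())                        # ~3.0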

A suggestion for ch1

Backpropagation is really an important and fundamental topic in deep learning.

Yes, I admit that backpropagation is a little math-heavy and a little hard for newcomers,

but I also cannot imagine that someone who cannot understand backpropagation can really understand deep learning.

Backpropagation hurts, but it is a good hurt and a necessary one.

You cannot omit it, so please add backpropagation to ch1.

Error in Chapter_6.ipynb

When running in Colab (using a GPU), I got the following error in this cell:

rnn_3layer = nn.Sequential( #Simple old style RNN 
  EmbeddingPackable(nn.Embedding(len(all_letters), 64)), #(B, T) -> (B, T, D)
  nn.RNN(64, n, num_layers=3, batch_first=True), #(B, T, D) -> ( (B,T,D) , (S, B, D)  )
  LastTimeStep(rnn_layers=3), #We need to take the RNN output and reduce it to one item, (B, D)
  nn.Linear(n, len(name_language_data)), #(B, D) -> (B, classes)
)

#Apply gradient clipping to maximize its performance
for p in rnn_3layer.parameters():
    p.register_hook(lambda grad: torch.clamp(grad, -5, 5))

rnn_results = train_network(rnn_3layer, loss_func, train_lang_loader, val_loader=test_lang_loader, score_funcs={'Accuracy': accuracy_score}, device=device, epochs=10)

Error is:

/usr/local/lib/python3.6/dist-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)

    242 
    243     data, batch_sizes = \
--> 244         _VF._pack_padded_sequence(input, lengths, batch_first)
    245     return _packed_sequence_init(data, batch_sizes, sorted_indices, None)
    246 
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
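
A possible workaround (a sketch, assuming the error comes from the pack_padded_sequence call made inside EmbeddingPackable) is to keep the lengths tensor on the CPU before packing, since pack_padded_sequence in PyTorch >= 1.7 requires a 1D CPU int64 tensor:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Illustrative tensors: embedded is (B, T, D) and may live on the GPU,
# while lengths must be moved to the CPU before packing.
device = "cuda" if torch.cuda.is_available() else "cpu"
embedded = torch.randn(4, 7, 64, device=device)
lengths = torch.tensor([7, 5, 3, 2], device=device)

packed = pack_padded_sequence(embedded, lengths.cpu(),  # .cpu() avoids the RuntimeError
                              batch_first=True, enforce_sorted=False)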

some typos in ch 2

  • p47: you write W_{d,c} instead of W^{d,c}

  • p46: your comment about Y_pred.ravel() could have been made earlier, on p40 where it was first introduced

a typo on page 36

In the code snippet 'train_simple_network'

optimizer.step() # updates all the parameters theta(k+1) = theta(k) eta gradient

it should have been theta(k+1) = theta(k) - eta * gradient

Adding mounting file info

Hi,
Thanks for the examples.
I think it would make sense to add the following to the notebooks:

from google.colab import drive
drive.mount('/content/drive')

and

# Here you want to customize the path to the right location in your drive
!cp drive/MyDrive/Inside\ Deep\ Learning/idlmam.py .

AttributeError in Chapter_2.ipynb

Hi,

when executing the 2nd cell:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import * 
from idlmam import *

I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-376bfb908340> in <module>()
      2 import torch.nn as nn
      3 import torch.nn.functional as F
----> 4 from torch.utils.data import *
      5 from idlmam import *

AttributeError: module 'torch.utils.data' has no attribute 'BatchSamplerDistributedSamplerDataset'

It is solved by importing only the needed classes explicitly:

from torch.utils.data import Dataset, DataLoader, TensorDataset

I'm not sure if it has something to do with my setup on Colab, but based on this post, it seems to be related to version 1.7.0 of PyTorch.

Thanks

Sliding filter, Code snippet in Chapter 3.2.1

In Chapter 3.2.1, there is an implementation of sliding the filter over the input:

filter = [1, 0, -1]
input = [1, 0, 2, -1, 1, 2]
output = []
for i in range(len(input) - len(filter)):
    result = 0
    for j in range(len(filter)):
        result += input[i+j] * filter[j]
    output.append(result) 

The outer loop does not reach the last possible filter position, so it should be:

filter = [1, 0, -1]
input = [1, 0, 2, -1, 1, 2]
output = []
for i in range(len(input) - len(filter) + 1):
    result = 0
    for j in range(len(filter)):
        result += input[i+j] * filter[j]
    output.append(result) 
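
For the example above, the corrected loop produces len(input) - len(filter) + 1 = 4 outputs, [-1, 1, 1, -3], whereas the original loop stops one position early and only yields [-1, 1, 1].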

PS: @EdwardRaff your book is absolutely brilliant!

Chapter 3.4.4 - Dimensions after the nn.Flatten module

In Chapter 3.4.4 the code for creating a first CNN is shown. For the nn.Flatten layer before the last layer, the code comment (point 10 in the book) says, "Converts from (B, C, W, H) -> (B, D) so we can use a Linear layer".
Shouldn't it actually be (B, filters, C, W, H) -> (B, filters*D)?
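
For reference, a small sketch (shapes are illustrative) of how nn.Flatten with its default start_dim=1 reshapes a convolutional feature map; note that the output of a conv layer is already (B, C, H, W), where C is the number of filters of that layer, so there is no separate filters axis:

import torch
import torch.nn as nn

x = torch.randn(32, 16, 14, 14)   # (B=32, C=16, H=14, W=14)
flat = nn.Flatten()(x)            # collapses everything after the batch dimension
print(flat.shape)                 # torch.Size([32, 3136]), i.e. (B, C*H*W)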

typos and suggestions

Figure 6.2: the 4th "high complexity" function figure is not a function because the figure suggests that a particular x value can map to multiple distinct y values.

Section 6.3.2: in the algorithm annotations on page 231, "cat=3" should be "dim=3"

Section 6.6.2: "anything time 0" should be "anything times 0"

Section 6.6.2: "forget fate" -> "forget gate" in the second to last paragraph in that section.

Section 9.0: "GANS" -> "GANs" right before the start of section 9.1.

Section 9.5.1: in the algorithm annotations on page 382 "inear" -> "linear"

Section 9.5.3: "which is now always a possibility" -> "which is not always a possibility"

Section 11.2: in the first figure describing the data: "journxe" -> "journée"

Also in the last paragraph of Section 11.2: "almond" translates to "amande" not "amende". Perhaps a homograph example might be clearer, like "avocat" which means both "lawyer" and "avocado" in French.

In Figure 11.6, it is unclear why there is an arrow from the "z hat" box to the "Attention" box.

Section 12.2.1: "EmbeddingAttentionBad" -> "EmbeddingAttentionBag"

Section 13.1.1: The cats and dogs dataset is no longer downloadable from the link assigned to "data_url_zip" in the code example.

Section 14.3.1: It might be informative to show some additional parameter settings for the Beta distribution. In particular, settings for which the distribution is not U-shaped, is asymmetrical, or looks like a uniform distribution.

There is a random figure on the next to last page in the book. Is this normal?

Use padding mask for attention in SimpleTransformerClassifier

I think in the forward pass of the TransformerEncoder a padding mask for the attention should be used.
The padding tokens need to be excluded when calculating the attention weights. This is related to Chapter 12.2.1.

See cell 33 here. See also the PyTorch docs for reference.

It should be changed into something like this (the src_key_padding_mask needs to be True for the values that need to be masked out):

def forward(self, input):
    if self.padding_idx is not None:
        mask = input != self.padding_idx
        src_key_padding_mask = torch.logical_not(mask)
    else:
        mask = input == input
        src_key_padding_mask = None
    x = self.embd(input) #(B, T, D)
    x = self.position(x) #(B, T, D)
    #Because the result of our code is (B, T, D), but transformers
    #take input as (T, B, D), we will have to permute the order
    #of the dimensions before and after
    x = self.transformer(x.permute(1,0,2), src_key_padding_mask=src_key_padding_mask) #(T, B, D)
    x = x.permute(1,0,2) #(B, T, D)
    #average over time
    context = x.sum(dim=1)/mask.sum(dim=1).unsqueeze(1)
    return self.pred(self.attn(x, context, mask=mask))
