
Comments (14)

hunkim commented on May 14, 2024

@kkweon : "I think it's still better to explicitly define which loss function we are using in every file (even if we stick to MSE and do backprop by hand)"

I think it's a very good idea.

from deeplearningzerotoall.

kkweon commented on May 14, 2024

I agree it can look tedious, but I thought I'd bring it up for the following reasons:

  • I thought Lab10-X1 was optional, like an extra challenge. Currently, it can give people the wrong idea that RMSE is okay to use for any classification problem.
    • Quora has some great answers for anyone who wants to know why it isn't a good idea.
  • It is still good practice to explicitly write down the loss function we are trying to minimize/maximize.
    • Although TensorFlow and other frameworks do the dirty work of calculating gradients on their own, it's still required to write a correct loss function. What people actually need to learn is how to define a loss function on their own.
    • So I think it's still better to explicitly define which loss function we are using in every file (even if we stick to MSE and do backprop by hand). It might look more complex, but people will already have written and used many loss functions before coming to this Lab 10.
    • For example, in the code, we can write as below and leave a comment on why it's not advised to use this l2 loss when solving a classification problem.
# Forward
...
# Note: this l2 (squared-error) loss is not advised for classification problems
loss = tf.reduce_sum(tf.square(y_pred - Y)) / 2  # or loss = tf.nn.l2_loss(y_pred - Y), which already includes the 1/2 factor
diff = y_pred - Y
...


cynthia commented on May 14, 2024

So, here are my two cents: consistency-wise, I'm not sure I fully agree that TF notation is easier to understand than numpy notation.

As for the slides, content-wise I don't think I can add more than what has been mentioned above - but who is your audience? If you want your audience to be at graduate-school (or at least CS-undergrad) level, the slides are fine. If you want to make the material accessible to everyone, using mathematical notation is not a great idea. (Even the most "obvious" Greek characters are enough to scare away most programmers.)


cynthia commented on May 14, 2024

MSE will still work, just not very well, as it's less suitable for a multinomial distribution. I'm guessing the choice was made for simplicity, as cross-entropy + softmax would make the lab harder to follow.

For pedagogical reasons, not introducing extra concepts that are extensions of the base concept being taught (in this case, backprop) is probably better for the audience. (That said, TensorFlow's notation decreases readability significantly with little gain, so there is that...)
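To make the MSE-vs-cross-entropy point concrete, here is a small numpy sketch (the logits and the one-hot label are made up) comparing the two losses on a 3-class example:

```python
import numpy as np

# Hypothetical 3-class example: softmax output vs. a one-hot label
logits = np.array([2.0, 1.0, 0.1])
target = np.array([1.0, 0.0, 0.0])   # one-hot label for class 0

probs = np.exp(logits - logits.max())
probs /= probs.sum()                 # softmax

mse = np.mean((probs - target) ** 2)  # treats class probabilities as regression targets
ce = -np.sum(target * np.log(probs))  # matches the multinomial distribution

# With softmax + cross-entropy, the gradient w.r.t. the logits collapses to
# (probs - target); the MSE gradient carries an extra softmax-Jacobian factor.
grad_ce = probs - target
```

The clean `(probs - target)` gradient is one reason cross-entropy pairs so naturally with softmax, even if it is an extra concept for a backprop lab.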


cynthia commented on May 14, 2024

I second the point that it would be useful for readers if the method used were noted, probably with an inline comment mentioning "you wouldn't do this in a real-world environment" about the inadequacy of the pieces used. (And deal with better tools for this later.)

I understand that TensorFlow is the cool thing to do, but I'm a bit curious if it would have been better to do this in raw numpy for beginners. (TF's lazy evaluation and un-Pythonic notation can be confusing even for seasoned Python programmers.)
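For a sense of what a raw-numpy version might look like, here is a sketch of a two-layer network with the lab's l2 loss, forward and backward; all shapes and names here are hypothetical, not taken from the lab code:

```python
import numpy as np

# A two-layer network with an l2 loss, forward and backward in plain numpy.
# Hypothetical shapes: N=4 samples, 3 inputs, 5 hidden units, 2 outputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
Y = rng.standard_normal((4, 2))
W1 = rng.standard_normal((3, 5))
W2 = rng.standard_normal((5, 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward
a1 = sigmoid(X @ W1)                  # (N, hidden_dim)
y_pred = a1 @ W2                      # (N, output_dim)
loss = np.sum((y_pred - Y) ** 2) / 2  # same l2 loss as the TF version

# Backward, by hand
d_out = y_pred - Y                    # dE/dy_pred
dW2 = a1.T @ d_out                    # (hidden_dim, output_dim), matches W2
d_a1 = d_out @ W2.T                   # (N, hidden_dim)
dW1 = X.T @ (d_a1 * a1 * (1 - a1))    # sigmoid'(z) = a1 * (1 - a1)
```

Nothing here beyond `@`, `.T`, and elementwise ops, which arguably reads closer to the slide math than the TF graph version.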


hunkim commented on May 14, 2024

Thanks for your discussion. I tried to make one slide to explain forward/backprop (for single values). Could you do a quick sanity check for me?

There are many variables (so there might be some typos), but we'll see if I can explain it well in my video/lecture at HKUST.

[image: slide showing forward/backprop for single values]


kkweon commented on May 14, 2024

It looks good, but some dimensions are wrong, which I suppose you are already aware of.

I think it's also worth mentioning how to do a quick dimension check.
We know that the gradient of W2 (= dE/dW2) must have the same dimension as W2.

So, we know

  • dE/dW2 must have the dimension shape of (hidden_dim, output_dim)
    • assuming a1 * W2 = (N, hidden_dim) * (hidden_dim, output_dim)
  • dE/dsigma_2 has the shape of (N, output_dim)
  • a_1 has the shape of (N, hidden_dim)
  • Therefore, we know a_1 must be transposed such that
    (hidden_dim, N) x (N, output_dim) = (hidden_dim, output_dim)
  • dE/dW2 = t(a_1) * dE/dsigma_2

So as long as we remember that the derivative of W2 must have the same dimension as W2, we can just focus on normal calculus (without worrying about the matrices).
After that, we can correct the dimensions easily (usually by changing the order of computation or transposing).
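The shape bookkeeping above can be checked mechanically in numpy; the sizes here are hypothetical:

```python
import numpy as np

# Dimension check for dE/dW2 = t(a_1) * dE/dsigma_2 (hypothetical sizes)
N, hidden_dim, output_dim = 8, 5, 3
a1 = np.ones((N, hidden_dim))        # activations feeding W2
d_sigma2 = np.ones((N, output_dim))  # upstream gradient dE/dsigma_2
W2 = np.ones((hidden_dim, output_dim))

# (hidden_dim, N) @ (N, output_dim) -> (hidden_dim, output_dim)
dW2 = a1.T @ d_sigma2
assert dW2.shape == W2.shape         # gradient must match the parameter's shape
```

If the transpose were omitted, `a1 @ d_sigma2` would raise a shape error immediately, which is exactly the kind of quick check being described.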


hunkim commented on May 14, 2024

This is a matrix version:

[image: slide showing the matrix version of forward/backprop]


hunkim commented on May 14, 2024

@kkweon Thanks for the comments. Figure 1 is for single values, so there is no need to worry about dimensions there; I just wanted to show how forward and backprop work with the simple chain rule.

I added Figure 2 for the matrix version, and I believe the dimensions are all correct. Basically, we can write code directly from these rules. Could you do a quick check? Cheers!

For easy commenting, I shared the slides + LaTeX code at https://docs.google.com/presentation/d/1_ZmtfEjLmhbuM_PqbDYMXXLAqeWN0HwuhcSKnUQZ6MM/edit?usp=sharing.


hunkim commented on May 14, 2024

@cynthia I agree. Using TF to write backprop is not the best idea. However, I don't want to introduce new numpy functions such as np.dot, etc.

Do you think we can simplify this code as much as possible? For example, l1 = tf.add(tf.matmul(X, w1), b1) -> l1 = tf.matmul(X, w1) + b1.
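For what it's worth, the same equivalence can be shown in numpy terms (TensorFlow tensors overload + the same way, so the shorter spelling computes the identical result); the arrays here are made up for illustration:

```python
import numpy as np

# numpy analogue of the two TF spellings; operator overloading makes them identical
X = np.arange(6.0).reshape(2, 3)
w1 = np.ones((3, 4))
b1 = np.full(4, 0.5)

verbose = np.add(np.matmul(X, w1), b1)  # cf. l1 = tf.add(tf.matmul(X, w1), b1)
concise = X @ w1 + b1                   # cf. l1 = tf.matmul(X, w1) + b1
assert np.array_equal(verbose, concise)
```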

It's just my thought. Feel free to add yours.


kkweon commented on May 14, 2024

There were two typos (I left comments in the Google Slides). Everything else looks good to me.


hunkim commented on May 14, 2024

@cynthia I see. Perhaps you could make a simple numpy version of lab 10-X1? I'd really appreciate it.

As for the slides, I guess they are for more advanced students/developers. Certainly, they are not for beginners.


cynthia commented on May 14, 2024

Sure, that's probably a separate issue, I'll send in a numpy PR when I have time.

As for the remark about advanced students, I think advanced students deserve better datasets. I'm personally a bit uncomfortable with the data used ([1 2 3] -> [1 2 3]), as it's not the best data for demonstrating the characteristics of the underlying algorithms involved. Obviously, this is a subjective remark from one person, so feel free to ignore it.

Aside from that nit, LGTM. (The LGTM is not for the slides; I haven't looked at them carefully, so I don't have any remarks there.)


hunkim commented on May 14, 2024

@cynthia "I'll send in a numpy PR when I have time." +1

"data used ([1 2 3] -> [1 2 3]) as it's not the best": agreed. However, I used that data in the theory lecture part, so it's hard to change in the lab. When I remake the theory video, I'll change it. Thanks for your comments.

