Comments (14)
@kkweon: "I think it's still better to explicitly define what loss function we are using in every file (even if we stick to the MSE and do backprops by hand)"
I think it's a very good idea.
from deeplearningzerotoall.
I agree it can look tedious, but I thought I'd bring it up for the following reasons:
- I thought Lab 10-X1 was optional, like an extra challenge. Currently, it can give people the wrong idea that RMSE is okay to use for any classification problem. (Quora has some great answers for anyone who wants to know why it isn't a good idea.)
- It is still good practice to explicitly write down the loss function we are trying to minimize/maximize.
- Although TensorFlow and other frameworks will do the dirty work of calculating gradients on their own, a correct loss function still has to be written. What people actually need to learn is how to define a loss function on their own.
- So I think it's still better to explicitly define what loss function we are using in every file (even if we stick to the MSE and do backprops by hand). It might look more complex, but people will already have written and gotten used to many loss functions before coming to Lab 10.
- For example, in the code, we can write as below and leave a comment explaining why it's not advisable to use this l2 loss when solving a classification problem.
# Forward
...
# L2 loss -- shown for simplicity; not advisable for classification
loss = tf.reduce_sum(tf.square(y_pred - Y)) / 2  # or: loss = tf.nn.l2_loss(y_pred - Y), which already includes the 1/2
diff = y_pred - Y
...
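For contrast, here is a minimal plain-numpy sketch (hypothetical helper names, not the lab code) of the L2 loss next to the softmax cross-entropy that a classification problem actually calls for:

```python
import numpy as np

def l2_loss(y_pred, y):
    # squared-error (L2) loss -- fine for regression, a poor fit for classification
    return np.sum((y_pred - y) ** 2) / 2

def softmax_cross_entropy(logits, y_onehot):
    # the usual choice for a multinomial classification problem;
    # subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.sum(y_onehot * log_probs) / len(logits)

logits = np.array([[2.0, 0.5, 0.1]])
y = np.array([[1.0, 0.0, 0.0]])
print(l2_loss(logits, y))
print(softmax_cross_entropy(logits, y))
```

The L2 loss treats the outputs as raw regression targets, while the cross-entropy interprets them as unnormalized class scores, which is the point the inline comment should make.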
So, here are my two cents: consistency-wise, I'm not sure I fully agree that TF notation is easier to understand than numpy notation.
As for the slides, content-wise I don't think I can add more than what has been mentioned above, but who is your audience? If you want your audience to be at graduate school (or at least CS undergrad) level, the slides are fine. If you want to make the material accessible to everyone, using mathematical notation is not a great idea. (Even the most "obvious" Greek characters are enough to scare away most programmers.)
MSE will still work, just not very well, as it's less suitable for a multinomial distribution. I'm guessing the choice was made for simplicity, as cross-entropy + softmax would make it harder to follow.
For pedagogical reasons, not introducing extra concepts that are extensions of the base concept being taught (in this case, backprop) is probably better for the audience. (That said, TensorFlow's notation decreases readability significantly with little gain, so there is that...)
I second the point that it would be useful for readers if the method used was noted, probably with an inline comment mentioning "you wouldn't do this in a real-world environment" about the inadequacy of the pieces used. (And deal with better tools for this later.)
I understand that TensorFlow is the cool thing to use, but I'm a bit curious whether it would have been better to do this in raw numpy for beginners. (TF's lazy evaluation and un-Pythonic notation can be confusing even for seasoned Python programmers.)
Thanks for your discussion. I tried to make one slide to explain forward/backprop (for single values). Could you do a quick sanity check for me?
There are many variables (so there might be some typos), but we will see if I can explain it well in my video/lecture at HKUST.
It seems good, but some dimensions are wrong, which I suppose you are aware of.
I think it's also worth mentioning how to do a quick dimension check.
We know that the gradient of W2 (= dE/dW2) must have the same dimensions as W2.
So, we know
- dE/dW2 must have the shape (hidden_dim, output_dim)
- assuming a1 * W2 = (N, hidden_dim) * (hidden_dim, output_dim)
- dE/dsigma_2 has the shape of (N, output_dim)
- a_1 has the shape of (N, hidden_dim)
- Therefore, we know a_1 must be transposed such that (hidden_dim, N) x (N, output_dim) = (hidden_dim, output_dim)
- dE/dW2 = t(a_1) * dE/dsigma_2
So as long as we know the derivative of W2 must have the same dimensions, we can just focus on normal calculus (without worrying about matrices).
After that, we can correct the dimensions easily (usually by changing the order of computation or transposing).
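That dimension check can be sketched in a few lines of numpy (hypothetical sizes, shapes only):

```python
import numpy as np

N, hidden_dim, output_dim = 4, 5, 3

a_1 = np.random.randn(N, hidden_dim)        # hidden-layer activations
d_sigma_2 = np.random.randn(N, output_dim)  # dE/dsigma_2 from the layer above

# dE/dW2 must match W2's shape (hidden_dim, output_dim),
# so a_1 has to be transposed before the product:
dW2 = a_1.T @ d_sigma_2
assert dW2.shape == (hidden_dim, output_dim)
```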
This is a matrix version:
@kkweon Thanks for the comments. Figure 1 is for single values, so no need to worry about dimensions there. In Figure 1, I just wanted to show how forward and backprop work with the simple chain rule.
I added Figure 2 for the matrix version, and I believe the dimensions are all correct. Basically, we can directly write code from these rules. Could you do a quick check? Cheers!
For easy comments, I shared the slides + latex code at https://docs.google.com/presentation/d/1_ZmtfEjLmhbuM_PqbDYMXXLAqeWN0HwuhcSKnUQZ6MM/edit?usp=sharing.
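Directly translating those matrix rules into code might look like the following numpy sketch (sigmoid activations, L2 loss, toy shapes; an illustration under those assumptions, not the actual lab code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N, input_dim, hidden_dim, output_dim = 4, 3, 5, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(N, input_dim))
Y = rng.normal(size=(N, output_dim))
W1 = rng.normal(size=(input_dim, hidden_dim)); b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(hidden_dim, output_dim)); b2 = np.zeros(output_dim)

# Forward
s1 = X @ W1 + b1
a1 = sigmoid(s1)
s2 = a1 @ W2 + b2
y_pred = sigmoid(s2)
E = np.sum((y_pred - Y) ** 2) / 2

# Backward (chain rule in matrix form)
d_s2 = (y_pred - Y) * y_pred * (1 - y_pred)  # dE/ds2, shape (N, output_dim)
dW2 = a1.T @ d_s2                            # (hidden_dim, output_dim), matches W2
db2 = d_s2.sum(axis=0)
d_s1 = (d_s2 @ W2.T) * a1 * (1 - a1)         # dE/ds1, shape (N, hidden_dim)
dW1 = X.T @ d_s1                             # (input_dim, hidden_dim), matches W1
db1 = d_s1.sum(axis=0)

assert dW2.shape == W2.shape and dW1.shape == W1.shape
```

Each transpose appears exactly where the dimension check above says it must, which makes the code almost mechanical to derive from the slide's rules.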
@cynthia I agree. Using TF to write backprop is not the best idea. However, I don't want to introduce new numpy functions such as np.dot, etc.
Do you think we can simplify this code as much as possible? For example, l1 = tf.add(tf.matmul(X, w1), b1) -> l1 = tf.matmul(X, w1) + b1.
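The same simplification can be seen in a numpy sketch (TF's operator overloading mirrors numpy's here; toy shapes, not the lab code):

```python
import numpy as np

X = np.ones((2, 3))
w1 = np.ones((3, 4))
b1 = np.zeros(4)

# verbose style, mirroring tf.add(tf.matmul(X, w1), b1)
l1_verbose = np.add(np.matmul(X, w1), b1)

# operator-overloaded style, mirroring tf.matmul(X, w1) + b1
l1 = np.matmul(X, w1) + b1

assert np.array_equal(l1, l1_verbose)
```

Both forms build the same computation; the overloaded `+` just reads more like the math on the slides.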
It's just my thought. Feel free to add yours.
There were two typos (I left comments in the Google Slides). Everything else looks good to me.
@cynthia I see. Perhaps you could make a simple numpy version of Lab 10-X1? I'd really appreciate it.
As for the slides, I guess they are for more advanced students/developers; certainly not for beginners.
Sure, that's probably a separate issue; I'll send in a numpy PR when I have time.
As for the remark about this being for advanced students, I think advanced students deserve better datasets. I'm personally a bit uncomfortable with the data used ([1 2 3] -> [1 2 3]), as it's not the best data for demonstrating the characteristics of the underlying algorithms. Obviously, this is a subjective remark from one person, so feel free to ignore it.
Aside from that nit, LGTM. (The LGTM is not for the slides; I haven't looked at them carefully, so I don't have any remarks.)
@cynthia "I'll send in a numpy PR when I have time." +1
"data used ([1 2 3] -> [1 2 3]) as it's not the best" , agree. However, I used that in my theory lecture part, so it's hard to change in the lab. When I remake the theory video, I'll change it. Thanks for your comments.