
Comments (12)

kkweon commented on May 14, 2024

We just need to divide by the batch size:

N = X.shape[0]
d_W1 = tf.matmul(tf.transpose(X), d_l1) / N
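To see why, here is a quick NumPy check (a sketch, not from the thread; the shapes are illustrative): the summed matmul gradient divided by N equals the average of the per-example outer-product gradients, which is what the tf.reduce_mean in the cost implies.

import numpy as np

# Illustrative shapes: N examples, 2 inputs, 2 hidden units.
N = 4
X = np.random.randn(N, 2).astype(np.float32)
d_l1 = np.random.randn(N, 2).astype(np.float32)

batched = X.T @ d_l1 / N  # matmul form, divided by batch size
per_example = np.mean([np.outer(X[i], d_l1[i]) for i in range(N)], axis=0)
print(np.allclose(batched, per_example))  # True

In the TF graph itself, tf.cast(tf.shape(X)[0], tf.float32) plays the role of N, since the batch dimension is usually unknown at graph-construction time.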


hunkim commented on May 14, 2024

How about something like this? I removed sigma_prime.

W1 = tf.Variable(tf.random_normal([2, 2]), name='weight1')
b1 = tf.Variable(tf.random_normal([2]), name='bias1')
layer1 = tf.sigmoid(tf.matmul(X, W1) + b1)

W2 = tf.Variable(tf.random_normal([2, 1]), name='weight2')
b2 = tf.Variable(tf.random_normal([1]), name='bias2')
Y_pred = tf.sigmoid(tf.matmul(layer1, W2) + b2)

# cost/loss function
cost = -tf.reduce_mean(Y * tf.log(Y_pred) + (1 - Y) *
                       tf.log(1 - Y_pred))

d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)
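# Derivation: with cost L = -[Y*log(p) + (1-Y)*log(1-p)] and p = Y_pred,
# dL/dp = -Y/p + (1-Y)/(1-p) = (p - Y) / (p * (1 - p)).
# The 1e-7 term guards against division by zero when the sigmoid saturates.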
d_sigma = Y_pred * (1 - Y_pred)

# Layer 2
d_o2 = d_Y_pred * d_sigma
d_l2 = tf.multiply(d_o2, d_sigma)
d_b2 = d_l2
d_W2 = tf.matmul(tf.transpose(layer1), d_l2)

# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(layer1)[0], dtype=tf.float32)

# Layer 1
d_o1 = layer1 * (1 - layer1)
d_l1 = tf.multiply(tf.matmul(d_l2, tf.transpose(W2)), d_o1)
d_b1 = d_l1
d_W1 = tf.matmul(tf.transpose(X), d_l1)

# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])

# Weight update
step = [
  tf.assign(W2, W2 - learning_rate * d_W2_mean),
  tf.assign(b2, b2 - learning_rate * d_b2_mean),
  tf.assign(W1, W1 - learning_rate * d_W1_mean),
  tf.assign(b1, b1 - learning_rate * d_b1_mean)
]
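For reference, the snippets in this thread assume a surrounding setup roughly like the following (a reconstruction, not shown in the comments; the XOR data and the learning-rate value are assumptions):

import tensorflow as tf  # TF 1.x graph-mode API
import numpy as np

# XOR data (assumed; the thread is about a 2-2-1 network)
x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y_data = np.array([[0], [1], [1], [0]], dtype=np.float32)

learning_rate = 0.1  # assumed value
X = tf.placeholder(tf.float32, [None, 2])
Y = tf.placeholder(tf.float32, [None, 1])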


kkweon commented on May 14, 2024

I don't have a machine to run a test on right now, but I guess it will work.
However, I'm not sure it's better. It already looks quite complicated to me.


hunkim commented on May 14, 2024

@kkweon Do you need a machine to run it? :-) I think your brain is enough.

The previous code starts with diff = hypothesis - Y, which is hard to understand.

Let me know if you have any refactoring suggestions.


kkweon commented on May 14, 2024

@hunkim
I personally prefer d_o2 * d_sigma over tf.multiply(d_o2, d_sigma), because:

  • it's more natural
  • it's safer, since every operator is overloaded on the tf.Tensor class and tested
  • it's less verbose

As you may remember, when TensorFlow moved to 1.0, all the basic operations were renamed.
People who used tf.mul had to manually fix their code to tf.multiply.
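To illustrate the operator-overloading point, both forms below build the same element-wise multiply op (a minimal sketch):

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
c1 = a * b              # dispatches to tf.Tensor.__mul__, an element-wise multiply
c2 = tf.multiply(a, b)  # explicit op; identical result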


hunkim commented on May 14, 2024

Refactored:

# cost/loss function
cost = -tf.reduce_mean(Y * tf.log(Y_pred) + (1 - Y) *
                       tf.log(1 - Y_pred))

# Loss derivative
d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)

# Layer 2
d_sigma2 = Y_pred * (1 - Y_pred)
d_l2 = d_Y_pred * d_sigma2
d_b2 = d_l2
d_W2 = tf.matmul(tf.transpose(layer1), d_l2)

# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(layer1)[0], dtype=tf.float32)

# Layer 1
d_sigma1 = layer1 * (1 - layer1)
d_l1 = d_l2 * d_sigma1
d_b1 = d_l1
d_W1 = tf.matmul(tf.transpose(X), d_l1)

# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])

# Weight update
step = [
  tf.assign(W2, W2 - learning_rate * d_W2_mean),
  tf.assign(b2, b2 - learning_rate * d_b2_mean),
  tf.assign(W1, W1 - learning_rate * d_W1_mean),
  tf.assign(b1, b1 - learning_rate * d_b1_mean)
]

make sense?
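For completeness, the step list is meant to be run repeatedly inside a session. A minimal training loop, assuming the XOR setup sketched earlier:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10001):
        sess.run(step, feed_dict={X: x_data, Y: y_data})
        if i % 1000 == 0:
            print(i, sess.run(cost, feed_dict={X: x_data, Y: y_data}))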


kkweon commented on May 14, 2024

looks good. autopep8 will do the rest.


hunkim commented on May 14, 2024

@kkweon This is the right version:

# Network
#          p1     a1           l1     p2     a2           l2 (y_pred)
# X -> (*) -> (+) -> (sigmoid) -> (*) -> (+) -> (sigmoid) -> (loss)
#       ^      ^                   ^      ^
#       |      |                   |      |
#       W1     b1                  W2     b2

# Loss derivative
d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)

# Layer 2
d_sigma2 = Y_pred * (1 - Y_pred)
d_a2 = d_Y_pred * d_sigma2
d_p2 = d_a2
d_b2 = d_a2
d_W2 = tf.matmul(tf.transpose(l1), d_p2)

# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(l1)[0], dtype=tf.float32)

# Layer 1
d_l1 = tf.matmul(d_p2, tf.transpose(W2))
d_sigma1 = l1 * (1 - l1)
d_a1 = d_l1 * d_sigma1
d_b1 = d_a1
d_p1 = d_a1
d_W1 = tf.matmul(tf.transpose(X), d_a1)

# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])

# Weight update
step = [
  tf.assign(W2, W2 - learning_rate * d_W2_mean),
  tf.assign(b2, b2 - learning_rate * d_b2_mean),
  tf.assign(W1, W1 - learning_rate * d_W1_mean),
  tf.assign(b1, b1 - learning_rate * d_b1_mean)
]

Can you run it in your brain?
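For a machine check rather than a brain check, one can compare the hand-derived gradient against TensorFlow's autodiff (a sketch, assuming the setup above and that l1 is the first layer's sigmoid output): the two should agree up to the 1e-7 epsilon in d_Y_pred.

auto_d_W2 = tf.gradients(cost, W2)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    manual, auto = sess.run([d_W2_mean, auto_d_W2],
                            feed_dict={X: x_data, Y: y_data})
    print(np.max(np.abs(manual - auto)))  # ~0 if the derivation is right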


kkweon commented on May 14, 2024

Yes, the comment really helped. It looks great.
If I can, anyone should be able to run this in his/her brain. So, it's awesome.


hunkim commented on May 14, 2024

@kkweon Do you like the naming? p for product and a for addition.


kkweon commented on May 14, 2024

@hunkim It should be fine with the comment. Honestly, I thought it was the name of an activation layer, but I was able to figure it out by reading the comment.


hunkim commented on May 14, 2024

@kkweon I still don't like the names. Let me know if you have any suggestions.

