
Comments (12)

kkweon commented on May 14, 2024

We just need to divide by the batch size:

N = X.shape[0]
d_W1 = tf.matmul(tf.transpose(X), d_l1) / N
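To see why, here is a quick NumPy check (a sketch, not from the thread; the shapes are illustrative): the summed matmul gradient divided by N equals the average of the per-example outer-product gradients, which is what the tf.reduce_mean in the cost implies.

import numpy as np

# Illustrative shapes: N examples, 2 inputs, 2 hidden units.
N = 4
X = np.random.randn(N, 2).astype(np.float32)
d_l1 = np.random.randn(N, 2).astype(np.float32)

batched = X.T @ d_l1 / N  # matmul form, divided by batch size
per_example = np.mean([np.outer(X[i], d_l1[i]) for i in range(N)], axis=0)
print(np.allclose(batched, per_example))  # True

In the TF graph itself, tf.cast(tf.shape(X)[0], tf.float32) plays the role of N, since the batch dimension is usually unknown at graph-construction time.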


hunkim commented on May 14, 2024

How about something like this? I removed sigma_prime.

W1 = tf.Variable(tf.random_normal([2, 2]), name='weight1')
b1 = tf.Variable(tf.random_normal([2]), name='bias1')
layer1 = tf.sigmoid(tf.matmul(X, W1) + b1)

W2 = tf.Variable(tf.random_normal([2, 1]), name='weight2')
b2 = tf.Variable(tf.random_normal([1]), name='bias2')
Y_pred = tf.sigmoid(tf.matmul(layer1, W2) + b2)

# cost/loss function
cost = -tf.reduce_mean(Y * tf.log(Y_pred) + (1 - Y) *
                       tf.log(1 - Y_pred))

d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)
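# Derivation: with cost L = -[Y*log(p) + (1-Y)*log(1-p)] and p = Y_pred,
# dL/dp = -Y/p + (1-Y)/(1-p) = (p - Y) / (p * (1 - p)).
# The 1e-7 term guards against division by zero when the sigmoid saturates.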
d_sigma = Y_pred * (1 - Y_pred)

# Layer 2
d_o2 = d_Y_pred * d_sigma
d_l2 = tf.multiply(d_o2, d_sigma)
d_b2 = d_l2
d_W2 = tf.matmul(tf.transpose(layer1), d_l2)

# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(layer1)[0], dtype=tf.float32)

# Layer 1
d_o1 = layer1 * (1 - layer1)
d_l1 = tf.multiply(tf.matmul(d_l2, tf.transpose(W2)), d_o1)
d_b1 = d_l1
d_W1 = tf.matmul(tf.transpose(X), d_l1)

# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])

# Weight update
step = [
  tf.assign(W2, W2 - learning_rate * d_W2_mean),
  tf.assign(b2, b2 - learning_rate * d_b2_mean),
  tf.assign(W1, W1 - learning_rate * d_W1_mean),
  tf.assign(b1, b1 - learning_rate * d_b1_mean)
]
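For reference, the snippets in this thread assume a surrounding setup roughly like the following (a reconstruction, not shown in the comments; the XOR data and the learning-rate value are assumptions):

import tensorflow as tf  # TF 1.x graph-mode API
import numpy as np

# XOR data (assumed; the thread is about a 2-2-1 network)
x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y_data = np.array([[0], [1], [1], [0]], dtype=np.float32)

learning_rate = 0.1  # assumed value
X = tf.placeholder(tf.float32, [None, 2])
Y = tf.placeholder(tf.float32, [None, 1])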


kkweon commented on May 14, 2024

I don't have a machine to run a test on right now, but I guess it will work.
However, I'm not sure it's better. It already looks quite complicated to me.


hunkim commented on May 14, 2024

@kkweon Do you need a machine to run it? :-) I think your brain is enough.

The previous code starts with diff = hypothesis - Y, which is hard to understand.

Let me know if you have any refactoring suggestions.


kkweon commented on May 14, 2024

@hunkim
I personally prefer d_o2 * d_sigma over tf.multiply(d_o2, d_sigma), because:

  • it's more natural
  • it's safer, since every operator is overloaded on the tf.Tensor class and tested
  • it's less verbose

As you may remember, when TensorFlow moved to 1.0, all the basic operations were renamed.
People who used tf.mul had to manually fix their code to tf.multiply.
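To illustrate the operator-overloading point, both forms below build the same element-wise multiply op (a minimal sketch):

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
c1 = a * b              # dispatches to tf.Tensor.__mul__, an element-wise multiply
c2 = tf.multiply(a, b)  # explicit op; identical result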


hunkim commented on May 14, 2024

Refactored:

# cost/loss function
cost = -tf.reduce_mean(Y * tf.log(Y_pred) + (1 - Y) *
                       tf.log(1 - Y_pred))

# Loss derivative
d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)

# Layer 2
d_sigma2 = Y_pred * (1 - Y_pred)
d_l2 = d_Y_pred * d_sigma2
d_b2 = d_l2
d_W2 = tf.matmul(tf.transpose(layer1), d_l2)

# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(layer1)[0], dtype=tf.float32)

# Layer 1
d_sigma1 = layer1 * (1 - layer1)
d_l1 = d_l2 * d_sigma1
d_b1 = d_l1
d_W1 = tf.matmul(tf.transpose(X), d_l1)

# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])

# Weight update
step = [
  tf.assign(W2, W2 - learning_rate * d_W2_mean),
  tf.assign(b2, b2 - learning_rate * d_b2_mean),
  tf.assign(W1, W1 - learning_rate * d_W1_mean),
  tf.assign(b1, b1 - learning_rate * d_b1_mean)
]

make sense?
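For completeness, the step list is meant to be run repeatedly inside a session. A minimal training loop, assuming the XOR setup sketched earlier:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10001):
        sess.run(step, feed_dict={X: x_data, Y: y_data})
        if i % 1000 == 0:
            print(i, sess.run(cost, feed_dict={X: x_data, Y: y_data}))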


kkweon commented on May 14, 2024

looks good. autopep8 will do the rest.


hunkim commented on May 14, 2024

@kkweon This is the right version:

# Network
#          p1     a1           l1     p2     a2           l2 (y_pred)
# X -> (*) -> (+) -> (sigmoid) -> (*) -> (+) -> (sigmoid) -> (loss)
#       ^      ^                   ^      ^
#       |      |                   |      |
#       W1     b1                  W2     b2

# Loss derivative
d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)

# Layer 2
d_sigma2 = Y_pred * (1 - Y_pred)
d_a2 = d_Y_pred * d_sigma2
d_p2 = d_a2
d_b2 = d_a2
d_W2 = tf.matmul(tf.transpose(l1), d_p2)

# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(l1)[0], dtype=tf.float32)

# Layer 1
d_l1 = tf.matmul(d_p2, tf.transpose(W2))
d_sigma1 = l1 * (1 - l1)
d_a1 = d_l1 * d_sigma1
d_b1 = d_a1
d_p1 = d_a1
d_W1 = tf.matmul(tf.transpose(X), d_a1)

# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])

# Weight update
step = [
  tf.assign(W2, W2 - learning_rate * d_W2_mean),
  tf.assign(b2, b2 - learning_rate * d_b2_mean),
  tf.assign(W1, W1 - learning_rate * d_W1_mean),
  tf.assign(b1, b1 - learning_rate * d_b1_mean)
]

Can you run it in your brain?
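For a machine check rather than a brain check, one can compare the hand-derived gradient against TensorFlow's autodiff (a sketch, assuming the setup above and that l1 is the first layer's sigmoid output): the two should agree up to the 1e-7 epsilon in d_Y_pred.

auto_d_W2 = tf.gradients(cost, W2)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    manual, auto = sess.run([d_W2_mean, auto_d_W2],
                            feed_dict={X: x_data, Y: y_data})
    print(np.max(np.abs(manual - auto)))  # ~0 if the derivation is right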


kkweon commented on May 14, 2024

Yes, the comment really helped. It looks great.
If I can, anyone should be able to run this in his/her brain. So, it's awesome.


hunkim commented on May 14, 2024

@kkweon Do you like the naming? p for product and a for addition.


kkweon commented on May 14, 2024

@hunkim It should be fine with the comment. Honestly, I thought it was the name of an activation layer, but I was able to figure it out by reading the comment.


hunkim commented on May 14, 2024

@kkweon I still don't like the names. Let me know if you have any suggestions.

