- What is Neural Network
- How Neural Network Works
- Steps to build a Neural Network
- How Forward propagation works
- Error Calculation in Neural Network
- How Back Propagation works
- Matrix calculation of neural network in Python
Architecture of Neural Network
Neural Network Formation
\[ \mathbf{Sigmoid\;function=\frac{1}{1+e^{-x}}} \]
\[ \mathbf{Output\;of\;h_1=\frac{1}{1+e^{-h_1}}} \]
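As a quick illustration, here is a minimal Python sketch of the sigmoid function (the same helper is defined again in the numpy code at the end of this post); the value 0.4176 is the h1 computed in the forward-propagation section below:
import math

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x))
    return 1 / (1 + math.exp(-x))

print(sigmoid(0))        # 0.5
print(sigmoid(0.4176))   # ~0.60290881, the Out h1 value from the forward propagation below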
Steps to train Neural Network
Forward Propagation in Neural Network
To explain how a neural network works, let's assume we have only one row of the dataset below: x1 = 0.03 and x2 = 0.09, with target outputs T1 = 0.01 and T2 = 0.99. The initial weights and biases are w1 = 0.11, w2 = 0.27, w3 = 0.19, w4 = 0.52 for the hidden layer, w5 = 0.44, w6 = 0.48, w7 = 0.23, w8 = 0.29 for the output layer, and b1 = 0.39, b2 = 0.42.
1. Hidden layer Calculation
\[ h_1=x_1w_1+x_2w_2+b_1=0.03*0.11+0.09*0.27+0.39=0.4176 \]
\[ \mathbf{Out\;h_1=\frac{1}{1+e^{-h_1}}} \] So, \[ Out\;h_1=\frac{1}{1+e^{-0.4176}}=0.60290881 \]
\[ h_2=x_1w_3+x_2w_4+b_1=0.03*0.19+0.09*0.52+0.39=0.4425 \]
\[ Out\;h_2=\frac{1}{1+e^{-0.4425}}=0.60885457 \]
2. Output layer Calculation
\[ y_1=Out\;h_1*w_5+Out\;h_2*w_6+b_2=0.60290881*0.44+0.60885457*0.48+0.42=0.97753007 \]
\[ Out\;y_1=\frac{1}{1+e^{-0.97753007}}=0.72661785 \]
\[ y_2=Out\;h_1*w_7+Out\;h_2*w_8+b_2=0.60290881*0.23+0.60885457*0.29+0.42=0.73523685 \]
\[ Out\;y_2=\frac{1}{1+e^{-0.73523685}}=0.67595341 \]
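The whole forward pass above can be reproduced in a few lines of plain Python as a sanity check. This is only a sketch; variable names such as out_h1 are illustrative:
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# single input row and initial weights / biases used in the worked example
x1, x2 = 0.03, 0.09
w1, w2, w3, w4 = 0.11, 0.27, 0.19, 0.52
w5, w6, w7, w8 = 0.44, 0.48, 0.23, 0.29
b1, b2 = 0.39, 0.42

# hidden layer
h1 = x1 * w1 + x2 * w2 + b1                  # 0.4176
h2 = x1 * w3 + x2 * w4 + b1                  # 0.4425
out_h1, out_h2 = sigmoid(h1), sigmoid(h2)    # 0.60290881, 0.60885457

# output layer
y1 = out_h1 * w5 + out_h2 * w6 + b2          # 0.97753007
y2 = out_h1 * w7 + out_h2 * w8 + b2          # 0.73523685
out_y1, out_y2 = sigmoid(y1), sigmoid(y2)    # 0.72661785, 0.67595341
print(out_y1, out_y2)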
Error Calculation in Neural Network
So, calculating the squared error for y1 and y2:
\[ \mathbf{E_1}=\frac{1}{2}(T_1-Out\;y_1)^2=\frac{1}{2}(0.01-0.72661785)^2=0.25677057 \]
\[ \mathbf{E_2}=\frac{1}{2}(T_2-Out\;y_2)^2=\frac{1}{2}(0.99-0.67595341)^2=0.04931263 \]
\[ \mathbf{E_{Total}}=E_1+E_2=0.25677057+0.04931263=0.30608320 \]
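The same error numbers can be checked with a couple of lines of Python (a small sketch reusing the outputs computed above):
T1, T2 = 0.01, 0.99
out_y1, out_y2 = 0.72661785, 0.67595341

E1 = 0.5 * (T1 - out_y1) ** 2    # 0.25677057
E2 = 0.5 * (T2 - out_y2) ** 2    # 0.04931263
E_total = E1 + E2                # 0.30608320
print(E1, E2, E_total)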
Back Propagation in Neural Network
1. Back Propagation for the Output Layer
Consider weight w5: we want to know how much a change in w5 affects the total error of our neural network, \( \frac{dE_{Total}}{dw_5} \).
\[ \mathbf{\frac{dE_{Total}}{dw_5}}=\frac{dE_{Total}}{dOut\;y_1}*\frac{dOut\;y_1}{dy_1}*\frac{dy_1}{dw_5}\;\;[Applying\;chain\;rule] \]
\[ \mathbf{E_{Total}=\frac{1}{2}(T_1-Out\;y_1)^2+\frac{1}{2}(T_2-Out\;y_2)^2} \]
\[ \mathbf{\frac{dE_{Total}}{dOut\;y_1}}=2*\frac{1}{2}(T_1-Out\;y_1)^{2-1}*(0-1)+0\;\;[As\;\frac{d}{dx}x^n=nx^{n-1}] \] \[ =-(T_1-Out\;y_1)=-(0.01-0.72661785)=0.71661785 \]
\[ \mathbf{\frac{dE_{Total}}{dOut\;y_1}=0.71661785} \]
\[ Out\;y_1=\frac{1}{1+e^{-y_1}} \]
\[ \mathbf{\frac{d\;Out\;y_1}{dy_1}}=\frac{d}{dy_1}(1+e^{-y_1})^{-1} \]
\[ =-1*(1+e^{-y_1})^{-1-1}*[0+(-e^{-y_1})]\;\;[As\;\mathbf{\frac{d}{dx}e^{-x}=-e^{-x}}\;,\;derivative\;of\;sigmoid\;explained\;at\;the\;end\;of\;this\;tutorial] \]
\[ =-(1+e^{-y_1})^{-2}*(-e^{-y_1}) \]
\[ =\frac{e^{-y_1}}{(1+e^{-y_1})^2} \]
\[ =\frac{1}{(1+e^{-y_1})}*\frac{e^{-y_1}}{(1+e^{-y_1})} \]
\[ =\frac{1}{(1+e^{-y_1})}*\frac{(1+e^{-y_1})-1}{(1+e^{-y_1})} \]
\[ =\frac{1}{(1+e^{-y_1})}*\left[1-\frac{1}{(1+e^{-y_1})}\right] \]
\[ =Out\;y_1*(1-Out\;y_1)=0.72661785*(1-0.72661785) \]
\[ \mathbf{\frac{d\;Out\;y_1}{dy_1}=0.19864435} \]
\[ \mathbf{\frac{dy_1}{dw_5}=Out\;h_1=0.60290881} \]
\[ \mathbf{\frac{dE_{Total}}{dw_5}}=\frac{dE_{Total}}{dOut\;y_1}*\frac{dOut\;y_1}{dy_1}*\frac{dy_1}{dw_5} \]
\[ \mathbf{\frac{dE_{Total}}{dw_5}=0.71661785*0.19864435*0.60290881=0.08582533} \]
Update w5 with the change value

\[ \mathbf{(new)\,w_5=w_5-\eta*\frac{dE_{Total}}{dw_5}}\;\;[\eta\;is\;learning\;rate] \]
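Numerically, the w5 gradient and update can be verified with a small Python sketch (taking η = 0.3, the learning rate used in the code at the end of this post):
T1 = 0.01
out_y1, out_h1 = 0.72661785, 0.60290881
w5, eta = 0.44, 0.3

dEtotal_douty1 = -(T1 - out_y1)            # 0.71661785
douty1_dy1 = out_y1 * (1 - out_y1)         # 0.19864435
dy1_dw5 = out_h1                           # 0.60290881

dEtotal_dw5 = dEtotal_douty1 * douty1_dy1 * dy1_dw5   # 0.08582533
new_w5 = w5 - eta * dEtotal_dw5                       # ~0.41425240
print(dEtotal_dw5, new_w5)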
2. Back Propagation for the Hidden Layer

\[ \mathbf{\frac{dE_{Total}}{dw_1}=\frac{dE_{Total}}{dOut\;h_1}*\frac{dOut\;h_1}{dh_1}*\frac{dh_1}{dw_1}\;\;[Applying\;chain\;rule]} \]
\[ \mathbf{\frac{dE_{Total}}{dOut\;h_1}=\frac{dE_1}{dOut\;h_1}+\frac{dE_2}{dOut\;h_1}} \]
\[ \mathbf{\frac{dE_1}{dOut\;h_1}=\frac{dE_1}{dy_1}*\frac{dy_1}{dOut\;h_1}} \]
\[ \mathbf{\frac{dE_1}{dy_1}}=\frac{dE_1}{dOut\;y_1}*\frac{dOut\;y_1}{dy_1} \]
\[ \frac{dE_1}{dy_1}=[2*\frac{1}{2}(T_1-Out\;y_1)^{2-1}*(0-1)]*\frac{dOut\;y_1}{dy_1}\;\;\mathbf{[As\;E_1=\frac{1}{2}(T_1-Out\;y_1)^2]} \]
\[ \frac{dE_1}{dy_1}=[0.71661785]*\frac{dOut\;y_1}{dy_1} \]
\[ \mathbf{\frac{dE_1}{dy_1}}=0.71661785*\mathbf{0.19864435}=\mathbf{0.14235209} \]
\[ \mathbf{\frac{dy_1}{dOut\;h_1}}=\frac{d}{dOut\;h_1}(Out\;h_1*w_5+Out\;h_2*w_6+b_2)=w_5+0+0=0.44 \]
\[ \mathbf{\frac{dE_1}{dOut\;h_1}=\frac{dE_1}{dy_1}*\frac{dy_1}{dOut\;h_1}=0.14235209*0.44=0.06263492} \]
\[ \mathbf{\frac{dE_2}{dOut\;h_1}=\frac{dE_2}{dy_2}*\frac{dy_2}{dOut\;h_1}} \]
\[ \mathbf{\frac{dE_2}{dy_2}}=\frac{dE_2}{dOut\;y_2}*\frac{dOut\;y_2}{dy_2} \]
\[ \frac{dE_2}{dy_2}=\frac{d}{dOut\;y_2}[\frac{1}{2}(T_2-Out\;y_2)^2]*\frac{dOut\;y_2}{dy_2} \]
\[ \frac{dE_2}{dy_2}=[2*\frac{1}{2}(T_2-Out\;y_2)^{2-1}*(0-1)]*\frac{dOut\;y_2}{dy_2} \]
\[ \frac{dE_2}{dy_2}=[(0.99-0.67595341)*(0-1)]*\frac{dOut\;y_2}{dy_2} \]
\[ \frac{dE_2}{dy_2}=-0.31404659*\frac{dOut\;y_2}{dy_2} \]
\[ \frac{dE_2}{dy_2}=-0.31404659*[\mathbf{Out\;y_2(1-Out\;y_2)}]\;\;[Same\;type\;of\;calculation\;as\;before] \]
\[ \frac{dE_2}{dy_2}=-0.31404659*[0.67595341*(1-0.67595341)] \]
\[ \frac{dE_2}{dy_2}=-0.31404659*0.21904039 \]
\[ \mathbf{\frac{dE_2}{dy_2}=-0.06878889} \]
\[ \frac{dy_2}{dOut\;h_1}=\frac{d}{dOut\;h_1}(Out\;h_1*w_7+Out\;h_2*w_8+b_2)=w_7+0+0=0.23 \]
\[ \mathbf{\frac{dE_2}{dOut\;h_1}=\frac{dE_2}{dy_2}*\frac{dy_2}{dOut\;h_1}=-0.06878889*0.23=-0.01582144} \]
\[ \mathbf{\frac{dE_{Total}}{dOut\;h_1}=\frac{dE_1}{dOut\;h_1}+\frac{dE_2}{dOut\;h_1}}=0.06263492-0.01582144 \]
\[ \mathbf{\frac{dE_{Total}}{dOut\;h_1}=0.04681348} \]
\[ \mathbf{\frac{dE_{Total}}{dw_1}=\frac{dE_{Total}}{dOut\;h_1}*\frac{dOut\;h_1}{dh_1}*\frac{dh_1}{dw_1}} \]
\[ \mathbf{\frac{dOut\;h_1}{dh_1}=Out\;h_1(1-Out\;h_1)}=0.60290881*(1-0.60290881) \]
\[ \mathbf{\frac{dOut\;h_1}{dh_1}=0.23940978} \]
\[ \mathbf{\frac{dh_1}{dw_1}=\frac{d}{dw_1}(x_1w_1+x_2w_2+b_1)=x_1=0.03} \]
\[ \mathbf{\frac{dE_{Total}}{dw_1}=\frac{dE_{Total}}{dOut\;h_1}*\frac{dOut\;h_1}{dh_1}*\frac{dh_1}{dw_1}}=\mathbf{0.04681348*0.23940978*0.03} \]
\[ \mathbf{\frac{dE_{Total}}{dw_1}=0.00033623} \]
Update w1 with the change value

\[ \mathbf{(new)\,w_1=w_1-\eta*\frac{dE_{Total}}{dw_1}\;\;[\eta\;is\;learning\;rate]} \]
\[ =0.11-0.3*\mathbf{0.00033623}\;\;[Taking\;\eta=0.3] \]
So,
\[ \mathbf{(new)\,w_1=0.10989913} \]
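And the corresponding numeric check for w1, again as a small Python sketch:
out_h1, x1 = 0.60290881, 0.03
w1, eta = 0.11, 0.3

dEtotal_douth1 = 0.06263492 + (-0.01582144)   # 0.04681348, the sum of the dE1 and dE2 terms above
douth1_dh1 = out_h1 * (1 - out_h1)            # 0.23940978
dh1_dw1 = x1                                  # 0.03

dEtotal_dw1 = dEtotal_douth1 * douth1_dh1 * dh1_dw1   # 0.00033623
new_w1 = w1 - eta * dEtotal_dw1                       # ~0.10989913
print(dEtotal_dw1, new_w1)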
******* This is the end of 1st iteration of our Neural Network model *******
It is important to note that the model is not properly trained yet, as we only back-propagated through one sample (the first row) of the training set. Repeating everything we did for all the samples (each row) will yield the complete model.
Neural Network Matrix Calculation in Python
Hidden Layer Matrix Calculation
\[ \begin{pmatrix} h_1\\ h_2 \end{pmatrix}=\begin{pmatrix} w_1 & w_2\\ w_3 & w_4 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix}+\begin{pmatrix} b_1\\ b_1 \end{pmatrix}=\begin{pmatrix} x_1w_1+x_2w_2+b_1\\ x_1w_3+x_2w_4+b_1 \end{pmatrix} \]
\[ \begin{pmatrix} Out\;h_1\\ Out\;h_2 \end{pmatrix}=\begin{pmatrix} \phi(h_1)\\ \phi(h_2) \end{pmatrix}=\begin{pmatrix} \phi(x_1w_1+x_2w_2+b_1)\\ \phi(x_1w_3+x_2w_4+b_1) \end{pmatrix} \]
Output Layer Matrix Calculation
\[ \begin{pmatrix} y_1\\ y_2 \end{pmatrix}=\begin{pmatrix} w_5 & w_6\\ w_7 & w_8 \end{pmatrix}\begin{pmatrix} Out\;h_1\\ Out\;h_2 \end{pmatrix}+\begin{pmatrix} b_2\\ b_2 \end{pmatrix}=\begin{pmatrix} Out\;h_1w_5+Out\;h_2w_6+b_2\\ Out\;h_1w_7+Out\;h_2w_8+b_2 \end{pmatrix} \] \[ \begin{pmatrix} Out\;y_1\\ Out\;y_2 \end{pmatrix}=\begin{pmatrix} \phi(y_1)\\ \phi(y_2) \end{pmatrix}=\begin{pmatrix} \phi(Out\;h_1w_5+Out\;h_2w_6+b_2)\\ \phi(Out\;h_1w_7+Out\;h_2w_8+b_2) \end{pmatrix} \]
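As a quick sanity check, here is a minimal numpy sketch of this matrix form of the forward pass; it should reproduce the scalar numbers from the worked example:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.03, 0.09])
W_h = np.array([[0.11, 0.27],    # w1, w2
                [0.19, 0.52]])   # w3, w4
W_y = np.array([[0.44, 0.48],    # w5, w6
                [0.23, 0.29]])   # w7, w8
b1, b2 = 0.39, 0.42

h = W_h @ x + b1        # [0.4176, 0.4425]
out_h = sigmoid(h)      # [0.60290881, 0.60885457]
y = W_y @ out_h + b2    # [0.97753007, 0.73523685]
out_y = sigmoid(y)      # [0.72661785, 0.67595341]
print(out_y)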
Error Calculation
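Writing the two error terms from the error-calculation section in the same matrix notation:
\[ \begin{pmatrix} E_1\\ E_2 \end{pmatrix}=\frac{1}{2}\begin{pmatrix} (T_1-Out\;y_1)^2\\ (T_2-Out\;y_2)^2 \end{pmatrix},\;\;\;E_{Total}=E_1+E_2 \]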
Neural Network in Numpy Python
##########################################################################
# Neural Network from Scratch using numpy
##########################################################################
import numpy as np
# input data x variable
x_val = np.array([[0.03, 0.09],
                  [0.04, 0.10],
                  [0.05, 0.11],
                  [0.06, 0.12]])
# output data y variable
y_val = np.array([[0.01, 0.99],
                  [0.99, 0.01],
                  [0.01, 0.99],
                  [0.99, 0.01]])
###############################################
# Initializing weights
# 1st layer Weights
w1 = 0.11
w2 = 0.27
w3 = 0.19
w4 = 0.52
# 2nd layer weights
w5 = 0.44
w6 = 0.48
w7 = 0.23
w8 = 0.29
# Bias
b1 = 0.39
b2 = 0.42
# Learning rate
eta = 0.3
# setting 100 iteration to tune our neural network algorithm
iteration = 100
# 1st layer weights matrix
weights_h1 = np.array([[w1], [w2]])
weights_h2 = np.array([[w3], [w4]])
# 2nd layer weights matrix
weights_y1 = np.array([[w5], [w6]])
weights_y2 = np.array([[w7], [w8]])
##################### Forward Propagation ##########################
# Entire hidden layer weight matrix
weights_h = np.row_stack((weights_h1.T, weights_h2.T))
# Entire output layer weight matrix
weights_y = np.row_stack((weights_y1.T, weights_y2.T))
# Sigmoid Activation function ==> S(x) = 1/(1+e^(-x))
def sigmoid(x, deriv=False):
    # when deriv=True, x is assumed to already be a sigmoid output, so S'(x) = x*(1-x)
    if deriv == True:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
h = np.dot(x_val, weights_h.T) + b1
# Entire 1st layer output matrix
out_h = sigmoid(h)
y = np.dot(out_h, weights_y.T) + b2
# Entire 2nd layer output matrix
out_y = sigmoid(y)
##################### Error Calculation ##########################
# E as E total
E_total = (np.square(y_val - out_y))/2
##################### Back Propagation ##########################
# 1. Update 2nd layer weights with change value
# (dE_Total)/(dOut y)
dE_total_dout_y = -(y_val - out_y)
# (dOut y)/(dy)
dout_y_dy = out_y * (1 - out_y)
# (dy)/(dw) for the 2nd layer, i.e. the hidden layer outputs
dy_dw = out_h
# For each iteration
for iter in range(iteration):
    # For each row of input data update the 2nd layer weight matrix
    for row in range(len(x_val)):
        # (dE_Total)/(dw_5) = (dE_Total)/(dOut y_1)*(dOut y_1)/(dy_1)*(dy_1)/(dw_5)
        dE_Total_dw5 = dE_total_dout_y[row][0] * round(dout_y_dy[row][0], 8) * dy_dw[row][0]
        dE_Total_dw5 = round(dE_Total_dw5, 8)
        # (dE_Total)/(dw_6) = (dE_Total)/(dOut y_1)*(dOut y_1)/(dy_1)*(dy_1)/(dw_6)
        dE_Total_dw6 = dE_total_dout_y[row][0] * round(dout_y_dy[row][0], 8) * dy_dw[row][1]
        dE_Total_dw6 = round(dE_Total_dw6, 8)
        # (dE_Total)/(dw_7) = (dE_Total)/(dOut y_2)*(dOut y_2)/(dy_2)*(dy_2)/(dw_7)
        dE_Total_dw7 = dE_total_dout_y[row][1] * round(dout_y_dy[row][1], 8) * dy_dw[row][0]
        dE_Total_dw7 = round(dE_Total_dw7, 8)
        # (dE_Total)/(dw_8) = (dE_Total)/(dOut y_2)*(dOut y_2)/(dy_2)*(dy_2)/(dw_8)
        dE_Total_dw8 = dE_total_dout_y[row][1] * round(dout_y_dy[row][1], 8) * dy_dw[row][1]
        dE_Total_dw8 = round(dE_Total_dw8, 8)
        # Combine all differential weights
        dE_Total_dw_2nd_layer = np.array([[dE_Total_dw5, dE_Total_dw6],
                                          [dE_Total_dw7, dE_Total_dw8]])
        # Updated weights for 2nd layer
        # (new)w_5 = w_5 - eta*(dE_Total)/(dw_5)   [eta is learning rate]
        weights_y = weights_y - (eta * dE_Total_dw_2nd_layer)
        # 2. Update 1st layer weights with change value
        # (dE)/(dy) = (dE)/(dOut y)*(dOut y)/(dy)
        dE_dy = -(y_val - out_y) * (out_y * (1 - out_y))
        # (dE_1)/(dOut h_1) and (dE_2)/(dOut h_1), since dy_1/dOut h_1 = w_5 and dy_2/dOut h_1 = w_7
        dE_dOut_h1 = dE_dy * np.array([[w5, w7]])
        # (dE_1)/(dOut h_2) and (dE_2)/(dOut h_2), since dy_1/dOut h_2 = w_6 and dy_2/dOut h_2 = w_8
        dE_dOut_h2 = dE_dy * np.array([[w6, w8]])
        # (dE_Total)/(dOut h_1) = (dE_1)/(dOut h_1) + (dE_2)/(dOut h_1)
        dE_Total_dOut_h1 = dE_dOut_h1[row][0] + dE_dOut_h1[row][1]
        # (dE_Total)/(dOut h_2) = (dE_1)/(dOut h_2) + (dE_2)/(dOut h_2)
        dE_Total_dOut_h2 = dE_dOut_h2[row][0] + dE_dOut_h2[row][1]
        # (dOut h)/(dh) = Out h * (1 - Out h)
        dOut_h_dh = out_h * (1 - out_h)
        # (dh)/(dw) for the 1st layer, i.e. the inputs
        dh_dw = x_val
        # (dE_Total)/(dw_1) = (dE_Total)/(dOut h_1)*(dOut h_1)/(dh_1)*(dh_1)/(dw_1)
        dE_Total_dw1 = dE_Total_dOut_h1 * dOut_h_dh[row][0] * dh_dw[row][0]
        dE_Total_dw1 = round(dE_Total_dw1, 8)
        # (dE_Total)/(dw_2) = (dE_Total)/(dOut h_1)*(dOut h_1)/(dh_1)*(dh_1)/(dw_2)
        dE_Total_dw2 = dE_Total_dOut_h1 * dOut_h_dh[row][0] * dh_dw[row][1]
        dE_Total_dw2 = round(dE_Total_dw2, 8)
        # (dE_Total)/(dw_3) = (dE_Total)/(dOut h_2)*(dOut h_2)/(dh_2)*(dh_2)/(dw_3)
        dE_Total_dw3 = dE_Total_dOut_h2 * dOut_h_dh[row][1] * dh_dw[row][0]
        dE_Total_dw3 = round(dE_Total_dw3, 8)
        # (dE_Total)/(dw_4) = (dE_Total)/(dOut h_2)*(dOut h_2)/(dh_2)*(dh_2)/(dw_4)
        dE_Total_dw4 = dE_Total_dOut_h2 * dOut_h_dh[row][1] * dh_dw[row][1]
        dE_Total_dw4 = round(dE_Total_dw4, 8)
        # Combine all differential weights
        dE_Total_dw_1st_layer = np.array([[dE_Total_dw1, dE_Total_dw2],
                                          [dE_Total_dw3, dE_Total_dw4]])
        # Updated weights for 1st layer
        # (new)w_1 = w_1 - eta*(dE_Total)/(dw_1)   [eta is learning rate]
        weights_h = weights_h - (eta * dE_Total_dw_1st_layer)
    print('iteration: ' + str(iter) + ' complete')
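Once the loop finishes, the updated weight matrices can be used for one more forward pass to inspect the predictions; a small sketch reusing the variables defined above:
# predictions with the updated weights
out_h_new = sigmoid(np.dot(x_val, weights_h.T) + b1)
out_y_new = sigmoid(np.dot(out_h_new, weights_y.T) + b2)
print('updated 1st layer weights:')
print(weights_h)
print('updated 2nd layer weights:')
print(weights_y)
print('predictions after training:')
print(out_y_new)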
To make everything easier to remember, here is a list of all the equations of our neural network.
Neural Network Equations
Forward Propagation
h1 = x1w1 + x2w2 + b1
\[ \mathbf{Out\;h_1=\frac{1}{1+e^{-h_1}}} \]
h2 = x1w3 + x2w4 + b1
\[ \mathbf{Out\;h_2=\frac{1}{1+e^{-h_2}}} \]
y1 = Out h1*w5 + Out h2*w6 + b2
\[ \mathbf{Out\;y_1=\frac{1}{1+e^{-y_1}}} \]
y2 = Out h1*w7 + Out h2*w8 + b2
\[ \mathbf{Out\;y_2=\frac{1}{1+e^{-y_2}}} \]
Error Calculation
\[ \mathbf{E_1=\frac{1}{2}(T_1-Out\;y_1)^2} \] \[ \mathbf{E_2=\frac{1}{2}(T_2-Out\;y_2)^2} \]
Back Propagation
1. Update 2nd layer weights
\[ \mathbf{\frac{dE_{Total}}{dw_5}=\frac{dE_{Total}}{dOut\;y_1}*\frac{dOut\;y_1}{dy_1}*\frac{dy_1}{dw_5}} \]
2. Update 1st layer weights
\[ \mathbf{\frac{dE_{Total}}{dw_1}=\frac{dE_{Total}}{dOut\;h_1}*\frac{dOut\;h_1}{dh_1}*\frac{dh_1}{dw_1}} \]
Conclusion
Derivative of sigmoid function
\[ \mathbf{\frac{d}{dx}e^{-x}=-e^{-x}} \]
Explanation
\[ \mathbf{\frac{d}{dx}e^{-x}=\frac{de^{-x}}{d(-x)}*\frac{d}{dx}(-x)=e^{-x}*(-1)=-e^{-x}} \]
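Combining this with the algebra used in the back propagation section gives the derivative of the sigmoid itself, which is where the Out*(1 − Out) terms throughout this post come from:
\[ \frac{d}{dx}\left(\frac{1}{1+e^{-x}}\right)=\frac{e^{-x}}{(1+e^{-x})^2}=\frac{1}{1+e^{-x}}*\left(1-\frac{1}{1+e^{-x}}\right)=S(x)*(1-S(x)) \]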