Neural networks are used in many areas, such as speech recognition, face recognition, marketing, and healthcare. Artificial neural networks loosely mimic the behavior of the human brain and try to solve a given (data-driven) problem the way a human would. A neural network consists of multiple layers of perceptrons. When you feed input data to a neural network, the data is passed through those layers of perceptrons to produce the output.

In this tutorial, I will explain each step of training a neural network with a simple example, and then write a **Neural Network from Scratch using numpy in Python**. After reading this tutorial you will have answers to the questions below:

- *What is a Neural Network*
- *How a Neural Network works*
- *Steps to build a Neural Network*
- *How Forward Propagation works*
- *Error calculation in a Neural Network*
- *How Back Propagation works*
- *Matrix calculation of a neural network in Python*

Before moving into each step of the neural network, let me give you an overview of the neural network architecture.

**Architecture of Neural Network**

A neural network consists of three layers:

- **Input Layer:** The input data is fed into this layer. The output of the input layer goes to the hidden layer.
- **Hidden Layer:** Located between the input and output layers. The input of the hidden layer is the output of the input layer. In real-world examples there can be multiple hidden layers. To explain the neural network, I am using one hidden layer in this article.
- **Output Layer:** The output of the hidden layer goes to the output layer. This layer generates the predicted output of the neural network. In the above picture and for this article I am considering a two-class neural network (Out $y_1$, Out $y_2$).

**Neural Network Formation**

Before listing all the equations of a simple neural network, let me make clear that an artificial neural network equation consists of three things:

- **Linear function**
- **Bias**
- **Activation function**

The output of any layer is an **activation function** applied to a **linear function** plus a **bias**.

For example, the **input of $H_1$** (the value the hidden unit receives from the input layer) is:

$$\text{Input of } H_1 = x_1 w_1 + x_2 w_2 + b_1$$

Here:

- $x_1 w_1 + x_2 w_2$ is the linear function
- $b_1$ is the bias (constant)

An **activation function** is required to calculate the output of any layer.

Now let's calculate the **output of $H_1$**.

To calculate the output of $H_1$ you need to apply an activation function to the input of $H_1$. You can use any activation function, such as Sigmoid, Tanh, or ReLU. For this tutorial, I am **using the sigmoid function** as my activation function.

Let me show you the equation for the sigmoid function:

$$S(x) = \frac{1}{1 + e^{-x}}$$

So after applying the activation function to the input of $H_1$, we get the output of $H_1$:

$$\text{Output of } H_1 = S(x_1 w_1 + x_2 w_2 + b_1)$$
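As a rough illustration of the three pieces (linear function, bias, activation function), here is a minimal Python sketch of a single hidden unit. The function names `sigmoid` and `neuron_output` are my own choices for this example; the input, weight, and bias values are the ones used in the numpy code later in this tutorial.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation: S(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_output(x, w, b):
    """Linear function plus bias, followed by the sigmoid activation."""
    z = np.dot(x, w) + b   # input of the unit: x1*w1 + x2*w2 + b
    return sigmoid(z)      # output of the unit


x = np.array([0.03, 0.09])   # x1, x2 (first row of the data)
w = np.array([0.11, 0.27])   # w1, w2
out_h1 = neuron_output(x, w, b=0.39)
print(round(float(out_h1), 4))
```

The same helper works for any unit in the network; only the inputs, weights, and bias change.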

**Steps to train Neural Network**

1. **Forward Propagation**
2. **Error Calculation**
3. **Back Propagation**

Throughout this tutorial I use the following notation:

- **h1** (after applying linear function and bias) as input of H1
- **h2** (after applying linear function and bias) as input of H2
- **Out h1** (after applying activation function) as output of H1
- **Out h2** (after applying activation function) as output of H2
- **y1** (after applying linear function and bias) as input of y1 layer
- **y2** (after applying linear function and bias) as input of y2 layer
- **Out y1** (after applying activation function) as output of y1 layer
- **Out y2** (after applying activation function) as output of y2 layer
- **E**$_{Total}$ as total error of the neural network model

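**Forward Propagation in Neural Network**

The target output ($T_1$ | $T_2$) holds two class probabilities (for example, the probability to __win__ and the probability to __lose__).

To explain how a neural network works, let's assume we have only one row of the dataset: the first row of the data used in the numpy code in this tutorial, with inputs $x_1 = 0.03$, $x_2 = 0.09$ and targets $T_1 = 0.01$, $T_2 = 0.99$.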

- **First:** Input data ($x_1$, $x_2$) is fed into the input layer
- **Second:** Hidden layer ($H_1$, $H_2$) calculation
- **Third:** Predict the output by output layer ($y_1$, $y_2$) calculation

To calculate forward propagation by hand, let's take some numbers for the weights, biases, and target values or actual output (first-row output $T_1$ | $T_2$), along with the input values (first row of our dataset).
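To make the hand calculation reproducible, the starting values can be written down as numpy arrays. This is only a sketch: the numbers are taken from the numpy code later in this tutorial, while the names `W_hidden` and `W_output` and the one-row-per-unit layout are my own assumptions for illustration.

```python
import numpy as np

# First row of the dataset: inputs (x1, x2) and targets (T1, T2)
x = np.array([0.03, 0.09])
t = np.array([0.01, 0.99])

# Assumed layout: one row of weights per unit
# Hidden layer: row 0 feeds h1 (w1, w2), row 1 feeds h2 (w3, w4)
W_hidden = np.array([[0.11, 0.27],
                     [0.19, 0.52]])
# Output layer: row 0 feeds y1 (w5, w6), row 1 feeds y2 (w7, w8)
W_output = np.array([[0.44, 0.48],
                     [0.23, 0.29]])

b1, b2 = 0.39, 0.42   # one shared bias per layer
eta = 0.3             # learning rate

print(W_hidden.shape, W_output.shape)
```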

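**1. Hidden layer Calculation**

There are **two hidden units** ($h_1$, $h_2$) in the hidden layer. Let's calculate the output of those hidden units.

$$h_1 = x_1 w_1 + x_2 w_2 + b_1 = 0.03 \times 0.11 + 0.09 \times 0.27 + 0.39 = 0.4176$$

$$h_2 = x_1 w_3 + x_2 w_4 + b_1 = 0.03 \times 0.19 + 0.09 \times 0.52 + 0.39 = 0.4425$$

Applying the sigmoid function to each input gives the hidden unit outputs:

$$\text{Out } h_1 = S(0.4176) = 0.60290881$$

$$\text{Out } h_2 = S(0.4425) = 0.60885457$$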
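The hidden layer numbers can be reproduced with a few lines of numpy. This is a minimal sketch; the variable names are my own, and the weights are arranged with one row per hidden unit, which is an assumption of this example.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation: S(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


x = np.array([0.03, 0.09])           # x1, x2
W_hidden = np.array([[0.11, 0.27],   # w1, w2 -> h1
                     [0.19, 0.52]])  # w3, w4 -> h2
b1 = 0.39

h = W_hidden @ x + b1    # inputs of the hidden units (h1, h2)
out_h = sigmoid(h)       # outputs of the hidden units (Out h1, Out h2)

print(np.round(h, 4))
print(np.round(out_h, 6))
```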

**2. Output layer Calculation**

$$y_1 = \text{Out } h_1 \cdot w_5 + \text{Out } h_2 \cdot w_6 + b_2 = 0.60290881 \times 0.44 + 0.60885457 \times 0.48 + 0.42 = 0.97753007$$

$$y_2 = \text{Out } h_1 \cdot w_7 + \text{Out } h_2 \cdot w_8 + b_2 = 0.60290881 \times 0.23 + 0.60885457 \times 0.29 + 0.42 = 0.73523685$$

Applying the sigmoid function gives $\text{Out } y_1 = S(0.97753007) = 0.72661785$ and $\text{Out } y_2 = S(0.73523685) \approx 0.67595$.

Now that we have **Out y1** and **Out y2** (the predicted target values), we will calculate the **error** to find out how accurately our neural network is predicting.
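The output layer calculation can be checked with a few lines of numpy. This is a sketch that starts from the hidden unit outputs used in the hand calculation; the variable names and the one-row-per-unit weight layout are assumptions of this example.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation: S(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


out_h = np.array([0.60290881, 0.60885457])  # Out h1, Out h2
W_output = np.array([[0.44, 0.48],          # w5, w6 -> y1
                     [0.23, 0.29]])         # w7, w8 -> y2
b2 = 0.42

y = W_output @ out_h + b2   # inputs of the output units (y1, y2)
out_y = sigmoid(y)          # predicted outputs (Out y1, Out y2)

print(np.round(y, 6))
print(np.round(out_y, 6))
```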

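**Error Calculation in Neural Network**

We will use **Mean Square Error** to find out accuracy:

$$E = \frac{1}{2}\,(T - \text{Out } y)^2$$

The target $T_1$ for the first row is 0.01, but the predicted output (Out $y_1$) is 0.72661785, therefore there is an error.

So calculating the **mean square error for $y_1$**:

$$E_1 = \frac{1}{2}\,(T_1 - \text{Out } y_1)^2 = \frac{1}{2}\,(0.01 - 0.72661785)^2 = 0.25677057$$

Similarly, the **mean square error for $y_2$** is:

$$E_2 = \frac{1}{2}\,(T_2 - \text{Out } y_2)^2 = 0.04931263$$

The **total error** for our neural network (**after one iteration**) is the sum of these errors:

$$E_{Total} = E_1 + E_2 = 0.25677057 + 0.04931263 = 0.3060832$$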
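The error values can be verified in Python. This sketch assumes the first-row targets (0.01, 0.99) from the numpy code in this tutorial and recomputes the outputs from $y_1$ and $y_2$; the variable names are my own.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation: S(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


t = np.array([0.01, 0.99])               # targets T1, T2 (first row)
y = np.array([0.97753007, 0.73523685])   # y1, y2 from the forward pass
out_y = sigmoid(y)                       # Out y1, Out y2

errors = 0.5 * (t - out_y) ** 2          # E1, E2
E_total = errors.sum()                   # E_Total = E1 + E2

print(np.round(errors, 6), round(float(E_total), 6))
```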

**Back Propagation in Neural Network**

After calculating the error, the next step is **Back Propagation**.

The goal of back propagation is to update each weight ($w_1$, $w_2$, $\dots$, $w_8$) in the network so that the actual output moves closer to the target output. In this way we can minimize the error for each output neuron ($y_1$ and $y_2$).

Let me explain **how back propagation works in Neural Network**.

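**1. Back Propagation for Output Layer**

Let's start with the output layer weight $w_5$.

**Consider weight $w_5$**: we want to know how much a change in $w_5$ affects the total error of our neural network ($\partial E_{Total} / \partial w_5$). By the chain rule:

$$\frac{\partial E_{Total}}{\partial w_5} = \frac{\partial E_{Total}}{\partial \text{Out } y_1} \cdot \frac{\partial \text{Out } y_1}{\partial y_1} \cdot \frac{\partial y_1}{\partial w_5}$$

For the first factor, $E_{Total} = E_1 + E_2$, and only $E_1$ depends on Out $y_1$:

$$\frac{\partial E_{Total}}{\partial \text{Out } y_1} = -(T_1 - \text{Out } y_1) = -(0.01 - 0.72661785) = 0.71661785$$

For the second factor, the derivative of the sigmoid function gives:

$$\frac{\partial \text{Out } y_1}{\partial y_1} = \text{Out } y_1\,(1 - \text{Out } y_1) = 0.72661785 \times (1 - 0.72661785) = 0.19864435$$

For the third factor, since $y_1 = \text{Out } h_1 \cdot w_5 + \text{Out } h_2 \cdot w_6 + b_2$:

$$\frac{\partial y_1}{\partial w_5} = \text{Out } h_1 = 0.60290881$$

Putting it together:

$$\frac{\partial E_{Total}}{\partial w_5} = 0.71661785 \times 0.19864435 \times 0.60290881 \approx 0.08582533$$

**Update $w_5$ with change value**

This change value tells us how much $w_5$ affects the total error of our neural network model. To decrease the error, we subtract the change value, scaled by the learning rate $\eta$, from the current weight ($w_5$). With $\eta = 0.3$ (the learning rate used in the code in this tutorial):

$$(new)\,w_5 = w_5 - \eta \cdot \frac{\partial E_{Total}}{\partial w_5} = 0.44 - 0.3 \times 0.08582533 = 0.41425240$$

In the same way, we can calculate the updated values of the other weights ($w_6$, $w_7$, $w_8$) of the output layer.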
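The update of $w_5$ can be traced step by step in plain Python. This sketch assumes the first-row target $T_1 = 0.01$ and the learning rate 0.3 from the numpy code in this tutorial; the variable names are hypothetical.

```python
# Values from the forward pass for the first row
t1 = 0.01            # target T1
out_y1 = 0.72661785  # Out y1
out_h1 = 0.60290881  # Out h1
w5 = 0.44            # current weight
eta = 0.3            # learning rate (taken from the code in this tutorial)

# Chain rule factors
dE_dout_y1 = -(t1 - out_y1)            # dE_Total / dOut y1
dout_y1_dy1 = out_y1 * (1 - out_y1)    # dOut y1 / dy1 (sigmoid derivative)
dy1_dw5 = out_h1                       # dy1 / dw5

dE_dw5 = dE_dout_y1 * dout_y1_dy1 * dy1_dw5
w5_new = w5 - eta * dE_dw5             # gradient descent step

print(round(dout_y1_dy1, 8), round(w5_new, 8))
```

The rounded result agrees with the updated weight 0.41425240 from the hand calculation.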

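**2. Back Propagation for Hidden Layer**

Now we update the hidden layer weights ($w_1$, $w_2$, $w_3$, $w_4$).

**Consider weight $w_1$**: we want to know how much a change in $w_1$ affects the total error of our neural network ($\partial E_{Total} / \partial w_1$):

$$\frac{\partial E_{Total}}{\partial w_1} = \frac{\partial E_{Total}}{\partial \text{Out } h_1} \cdot \frac{\partial \text{Out } h_1}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1}$$

__Note__**:** The output of each hidden layer neuron (Out $h_1$, Out $h_2$) contributes to the output of each output neuron (Out $y_1$, Out $y_2$) and therefore contributes to both errors:

$$\frac{\partial E_{Total}}{\partial \text{Out } h_1} = \frac{\partial E_1}{\partial \text{Out } h_1} + \frac{\partial E_2}{\partial \text{Out } h_1}$$

**Now,**

$$\frac{\partial E_1}{\partial \text{Out } h_1} = \frac{\partial E_1}{\partial y_1} \cdot \frac{\partial y_1}{\partial \text{Out } h_1} = \frac{\partial E_1}{\partial y_1} \cdot w_5$$

$$\frac{\partial E_2}{\partial \text{Out } h_1} = \frac{\partial E_2}{\partial y_2} \cdot \frac{\partial y_2}{\partial \text{Out } h_1} = \frac{\partial E_2}{\partial y_2} \cdot w_7$$

where $\frac{\partial E_k}{\partial y_k} = -(T_k - \text{Out } y_k) \cdot \text{Out } y_k\,(1 - \text{Out } y_k)$ for $k = 1, 2$.

**Now,**

$$\frac{\partial \text{Out } h_1}{\partial h_1} = \text{Out } h_1\,(1 - \text{Out } h_1), \qquad \frac{\partial h_1}{\partial w_1} = x_1$$

**Update $w_1$ with change value**

This change value tells us how much $w_1$ affects the total error of our neural network model. To decrease the error, we subtract the change value, scaled by the learning rate $\eta$, from the current weight ($w_1$):

$$(new)\,w_1 = w_1 - \eta \cdot \frac{\partial E_{Total}}{\partial w_1}$$

In the same way, we can calculate the updated values of the other weights ($w_2$, $w_3$, $w_4$) of the hidden layer.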
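The chain rule for $w_1$ can be sketched in numpy as follows. The inputs, targets, weights, biases, and learning rate are taken from the numpy code in this tutorial; the variable names and the one-row-per-unit weight layout are my own assumptions for this example.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation: S(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


x = np.array([0.03, 0.09])                          # x1, x2
t = np.array([0.01, 0.99])                          # T1, T2
W_hidden = np.array([[0.11, 0.27], [0.19, 0.52]])   # rows: h1, h2
W_output = np.array([[0.44, 0.48], [0.23, 0.29]])   # rows: y1, y2
b1, b2, eta = 0.39, 0.42, 0.3

# Forward pass
out_h = sigmoid(W_hidden @ x + b1)
out_y = sigmoid(W_output @ out_h + b2)

# dE_k/dy_k for both output units
dE_dy = -(t - out_y) * out_y * (1 - out_y)
# dE_Total/dOut h1 = dE1/dy1 * w5 + dE2/dy2 * w7
dE_dout_h1 = dE_dy[0] * W_output[0, 0] + dE_dy[1] * W_output[1, 0]
# dOut h1/dh1 (sigmoid derivative) and dh1/dw1
dout_h1_dh1 = out_h[0] * (1 - out_h[0])
dh1_dw1 = x[0]

dE_dw1 = dE_dout_h1 * dout_h1_dh1 * dh1_dw1
w1_new = W_hidden[0, 0] - eta * dE_dw1   # gradient descent step
print(dE_dw1, w1_new)
```

Because the gradient is multiplied by the small input $x_1$ and two sigmoid derivatives, the change in $w_1$ is much smaller than the change in $w_5$.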

********* This is the end of 1st iteration of our Neural Network model *********

It is important to note that the model is not properly trained yet, as we only back-propagated through one sample (the first row) from the training set. Repeating everything we did for all the samples (each row), over many iterations, will yield a complete model.

**Neural Network Matrix Calculation in Python**

Now let's express the same calculations in matrix form, so that we can write the **neural network from scratch using numpy in Python**.

**Hidden Layer Matrix Calculation**
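A plausible matrix form of the hidden layer for a single input row, reconstructed from the forward propagation equations in this tutorial (the layout of the weight matrix is an assumption, chosen so that $w_1, w_2$ feed $h_1$ and $w_3, w_4$ feed $h_2$):

$$\begin{bmatrix} h_1 & h_2 \end{bmatrix} = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} w_1 & w_3 \\ w_2 & w_4 \end{bmatrix} + \begin{bmatrix} b_1 & b_1 \end{bmatrix}$$

$$\begin{bmatrix} \text{Out } h_1 & \text{Out } h_2 \end{bmatrix} = \begin{bmatrix} S(h_1) & S(h_2) \end{bmatrix}$$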

**Output Layer Matrix Calculation**
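Under the same assumed layout ($w_5, w_6$ feed $y_1$ and $w_7, w_8$ feed $y_2$), the output layer for a single row can be written as:

$$\begin{bmatrix} y_1 & y_2 \end{bmatrix} = \begin{bmatrix} \text{Out } h_1 & \text{Out } h_2 \end{bmatrix} \begin{bmatrix} w_5 & w_7 \\ w_6 & w_8 \end{bmatrix} + \begin{bmatrix} b_2 & b_2 \end{bmatrix}$$

$$\begin{bmatrix} \text{Out } y_1 & \text{Out } y_2 \end{bmatrix} = \begin{bmatrix} S(y_1) & S(y_2) \end{bmatrix}$$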

**Error Calculation**
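In matrix form the error is computed element by element and then summed. A sketch consistent with the per-output errors $E_1$ and $E_2$ used in this tutorial:

$$\begin{bmatrix} E_1 & E_2 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} (T_1 - \text{Out } y_1)^2 & (T_2 - \text{Out } y_2)^2 \end{bmatrix}, \qquad E_{Total} = E_1 + E_2$$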

**Neural Network in Numpy Python**

```
##########################################################################
# Neural Network from Scratch using numpy
##########################################################################
import numpy as np

# input data x variable (one sample per row)
x_val = np.array([[0.03, 0.09],
                  [0.04, 0.10],
                  [0.05, 0.11],
                  [0.06, 0.12]])
# output data y variable (targets T1 | T2 per row)
y_val = np.array([[0.01, 0.99],
                  [0.99, 0.01],
                  [0.01, 0.99],
                  [0.99, 0.01]])

###############################################
# Initializing weights
# 1st layer Weights
w1 = 0.11
w2 = 0.27
w3 = 0.19
w4 = 0.52
# 2nd layer weights
w5 = 0.44
w6 = 0.48
w7 = 0.23
w8 = 0.29
# Bias (kept constant, as in the hand calculation)
b1 = 0.39
b2 = 0.42
# Learning rate
eta = 0.3
# setting 100 iteration to tune our neural network algorithm
iteration = 100

# Entire hidden layer weight matrix: row 0 feeds h1, row 1 feeds h2
weights_h = np.array([[w1, w2],
                      [w3, w4]])
# Entire output layer weight matrix: row 0 feeds y1, row 1 feeds y2
weights_y = np.array([[w5, w6],
                      [w7, w8]])


# Sigmoid Activation function ==> S(x) = 1 / (1 + e^(-x))
def sigmoid(x, deriv=False):
    # With deriv=True, x must already be a sigmoid output S; returns S * (1 - S)
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))


# For each iteration
for it in range(iteration):
    # For each row of input data, run forward pass, error, and back propagation
    for row in range(len(x_val)):
        x = x_val[row]
        t = y_val[row]

        ##################### Forward Propagation ##########################
        h = np.dot(weights_h, x) + b1        # (h1, h2)
        out_h = sigmoid(h)                   # (Out h1, Out h2)
        y = np.dot(weights_y, out_h) + b2    # (y1, y2)
        out_y = sigmoid(y)                   # (Out y1, Out y2)

        ##################### Error Calculation ##########################
        # E as E total = E1 + E2
        E_total = np.sum(np.square(t - out_y) / 2)

        ##################### Back Propagation ##########################
        # (dE_k)/(dy_k) = (dE_k)/(dOut y_k) * (dOut y_k)/(dy_k)
        dE_dy = -(t - out_y) * sigmoid(out_y, deriv=True)

        # 1. 2nd layer: (dE_Total)/(dw_5) = (dE_1)/(dy_1) * Out h_1, and so on
        dE_Total_dw_2nd_layer = np.outer(dE_dy, out_h)

        # 2. 1st layer (uses the 2nd layer weights from before their update)
        # (dE_Total)/(dOut h_1) = (dE_1)/(dy_1)*w_5 + (dE_2)/(dy_2)*w_7
        dE_Total_dOut_h = np.dot(weights_y.T, dE_dy)
        # (dOut h_1)/(dh_1) = Out h_1 * (1 - Out h_1)
        dOut_h_dh = sigmoid(out_h, deriv=True)
        # (dE_Total)/(dw_1) = (dE_Total)/(dOut h_1)*(dOut h_1)/(dh_1)*x_1
        dE_Total_dw_1st_layer = np.outer(dE_Total_dOut_h * dOut_h_dh, x)

        # (new)w = w - eta * (dE_Total)/(dw)  [eta is learning rate]
        weights_y = weights_y - (eta * dE_Total_dw_2nd_layer)
        weights_h = weights_h - (eta * dE_Total_dw_1st_layer)

    print('iteration: ' + str(it) + ' complete, last row error: ' + str(round(E_total, 8)))
```

To make everything easier to remember, let me list all the equations of the neural network.

**Neural Network Equations**

**Forward Propagation**

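$$h_1 = x_1 w_1 + x_2 w_2 + b_1$$

$$h_2 = x_1 w_3 + x_2 w_4 + b_1$$

$$y_1 = \text{Out } h_1 \cdot w_5 + \text{Out } h_2 \cdot w_6 + b_2$$

$$y_2 = \text{Out } h_1 \cdot w_7 + \text{Out } h_2 \cdot w_8 + b_2$$

**Error Calculation**

$$E_1 = \frac{1}{2}\,(T_1 - \text{Out } y_1)^2, \qquad E_2 = \frac{1}{2}\,(T_2 - \text{Out } y_2)^2, \qquad E_{Total} = E_1 + E_2$$

**Back Propagation**

**1. Update 2nd layer weights**

$$\frac{\partial E_{Total}}{\partial w_5} = \frac{\partial E_{Total}}{\partial \text{Out } y_1} \cdot \frac{\partial \text{Out } y_1}{\partial y_1} \cdot \frac{\partial y_1}{\partial w_5}, \qquad (new)\,w_5 = w_5 - \eta \cdot \frac{\partial E_{Total}}{\partial w_5}$$

**2. Update 1st layer weights**

$$\frac{\partial E_{Total}}{\partial w_1} = \left( \frac{\partial E_1}{\partial \text{Out } h_1} + \frac{\partial E_2}{\partial \text{Out } h_1} \right) \cdot \frac{\partial \text{Out } h_1}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1}, \qquad (new)\,w_1 = w_1 - \eta \cdot \frac{\partial E_{Total}}{\partial w_1}$$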

**Conclusion**

Each back propagation pass updates all the weights ($w_1$, $w_2$, $\dots$, $w_5$, $\dots$, $w_8$). After updating, you need to calculate the total error again by doing a forward pass.

**Derivative of sigmoid function**

**Explanation**
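A short derivation of the identity used in back propagation, starting from the sigmoid definition $S(x) = 1 / (1 + e^{-x})$ and applying the chain rule:

$$S(x) = (1 + e^{-x})^{-1}$$

$$S'(x) = -(1 + e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}$$

Since $\frac{e^{-x}}{1 + e^{-x}} = 1 - \frac{1}{1 + e^{-x}} = 1 - S(x)$, this gives:

$$S'(x) = S(x)\,\bigl(1 - S(x)\bigr)$$

This is why the derivative of a unit's output with respect to its input can be written using only the output itself, for example $\text{Out } y_1\,(1 - \text{Out } y_1)$.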

**If you have any questions or suggestions regarding this topic, see you in the comment section. I will try my best to answer.**