Three-Layer Feed-Forward ANN
This section works through the mathematical calculations for a simple artificial neural network (ANN) with two hidden layers, each having three ReLU (Rectified Linear Unit) nodes, and an output layer with a Sigmoid activation function. Let’s denote the inputs as \(x_1\), \(x_2\), and \(x_3\); the weights between the input layer and the first hidden layer as \(w_{ij}^{(1)}\) (where \(i\) indexes the input node and \(j\) the hidden node); the biases of the first hidden layer as \(b_j^{(1)}\); the weights between the first and second hidden layers as \(w_{jk}^{(2)}\); the biases of the second hidden layer as \(b_k^{(2)}\); the weights between the second hidden layer and the output layer as \(w_{kl}^{(3)}\); and the bias of the output layer as \(b_l^{(3)}\).
The calculations for the nodes in the first hidden layer (\(h_1^{(1)}\), \(h_2^{(1)}\), and \(h_3^{(1)}\)) are as follows:
\[
h_j^{(1)} = \text{ReLU}\left( w_{1j}^{(1)} x_1 + w_{2j}^{(1)} x_2 + w_{3j}^{(1)} x_3 + b_j^{(1)} \right), \qquad j = 1, 2, 3
\]
The ReLU function is defined as \(\text{ReLU}(x) = \max(0, x)\), meaning that it outputs the input value if it is positive and zero otherwise.
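For example, \(\text{ReLU}(2.5) = 2.5\), while \(\text{ReLU}(-1.3) = 0\).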
The calculations for the nodes in the second hidden layer (\(h_1^{(2)}\), \(h_2^{(2)}\), and \(h_3^{(2)}\)) are similar:
\[
h_k^{(2)} = \text{ReLU}\left( w_{1k}^{(2)} h_1^{(1)} + w_{2k}^{(2)} h_2^{(1)} + w_{3k}^{(2)} h_3^{(1)} + b_k^{(2)} \right), \qquad k = 1, 2, 3
\]
Finally, the calculation for the output node (\(y\)) with the Sigmoid activation function is:
\[
y = \text{Sigmoid}\left( w_{11}^{(3)} h_1^{(2)} + w_{21}^{(3)} h_2^{(2)} + w_{31}^{(3)} h_3^{(2)} + b_1^{(3)} \right)
\]
where \(\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}\) is the sigmoid function, which squashes any real-valued input into the range between 0 and 1.
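Chaining the three layers together and treating \(\mathbf{x} = (x_1, x_2, x_3)\) as a row vector, the whole forward pass can also be written compactly in matrix form. This is simply a restatement of the per-layer equations above, with \(W^{(1)}\), \(W^{(2)}\), \(W^{(3)}\) and \(\mathbf{b}^{(1)}\), \(\mathbf{b}^{(2)}\), \(b^{(3)}\) collecting the weights and biases already defined:
\[
y = \text{Sigmoid}\left( \text{ReLU}\left( \text{ReLU}\left( \mathbf{x} W^{(1)} + \mathbf{b}^{(1)} \right) W^{(2)} + \mathbf{b}^{(2)} \right) W^{(3)} + b^{(3)} \right)
\]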
import numpy as np
# Define the neural network architecture
input_size = 3    # Number of input features
hidden_size1 = 3  # Number of nodes in the first hidden layer
hidden_size2 = 3  # Number of nodes in the second hidden layer
output_size = 1   # Number of output nodes
# Initialize weights and biases with random values
np.random.seed(0)
# Weights and biases: input layer -> first hidden layer
weights_input_hidden1 = np.random.randn(input_size, hidden_size1)
bias_hidden1 = np.zeros((1, hidden_size1))
# Weights and biases: first hidden layer -> second hidden layer
weights_hidden1_hidden2 = np.random.randn(hidden_size1, hidden_size2)
bias_hidden2 = np.zeros((1, hidden_size2))
# Weights and biases: second hidden layer -> output layer
weights_hidden_output = np.random.randn(hidden_size2, output_size)
bias_output = np.zeros((1, output_size))
# Define the sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Define the ReLU activation function
def ReLU(x):
    return np.maximum(0, x)
# Define the forward propagation function
def forward_propagation(X):
    # Hidden layer 1: linear combination of the inputs followed by ReLU
    hidden_input1 = np.dot(X, weights_input_hidden1) + bias_hidden1
    hidden_output1 = ReLU(hidden_input1)
    # Hidden layer 2: linear combination of the first hidden layer's outputs followed by ReLU
    hidden_input2 = np.dot(hidden_output1, weights_hidden1_hidden2) + bias_hidden2
    hidden_output2 = ReLU(hidden_input2)
    # Output layer: linear combination of the second hidden layer's outputs followed by Sigmoid
    output_input = np.dot(hidden_output2, weights_hidden_output) + bias_output
    output = sigmoid(output_input)
    return hidden_input1, hidden_output1, hidden_input2, hidden_output2, output_input, output
# Example input data: two samples, each with three features
X = np.array([[0, 0, 1], [0, 1, 0]])
# Perform forward propagation
hidden_input1, hidden_output1, hidden_input2, hidden_output2, output_input, output = forward_propagation(X)
# Print the output
print("Output:")
print(output)
Output:
[[2.86873055e-02]
[3.81899579e-05]]
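As a quick sanity check, the first sample’s output can be recomputed one layer at a time, mirroring the equations above. This is a minimal sketch that assumes the code above has already been run, so np, X, and the weight and bias arrays are in scope.
# Recompute the output for the first input sample, layer by layer
x_sample = X[0]                                               # first sample, shape (3,)
z1 = np.dot(x_sample, weights_input_hidden1) + bias_hidden1   # hidden layer 1 pre-activation
a1 = ReLU(z1)                                                 # hidden layer 1 output
z2 = np.dot(a1, weights_hidden1_hidden2) + bias_hidden2       # hidden layer 2 pre-activation
a2 = ReLU(z2)                                                 # hidden layer 2 output
z3 = np.dot(a2, weights_hidden_output) + bias_output          # output layer pre-activation
y = sigmoid(z3)                                               # final output
print(y)  # should match the first row of the output above (about 2.87e-02)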
Outline of the Maths for an ANN
Input Layer:
Input data: X
First Hidden Layer:
Number of nodes: 3
Weight matrix: W1 (3x3)
Bias vector: b1 (1x3)
Activation function: ReLU (Rectified Linear Unit)
The pre-activation of the first hidden layer (Z1) is calculated as follows:
Z1 = X * W1 + b1
Where:
X is the input data (a row vector),
W1 is the weight matrix for the first hidden layer,
b1 is the bias vector for the first hidden layer,
“*” represents matrix multiplication.
Apply ReLU activation to Z1:
A1 = ReLU(Z1)
A1 is the output of the first hidden layer.
Second Hidden Layer:
Number of nodes: 3
Weight matrix: W2 (3x3)
Bias vector: b2 (1x3)
Activation function: ReLU
The pre-activation of the second hidden layer (Z2) is calculated in the same way as for the first hidden layer:
Z2 = A1 * W2 + b2
Apply ReLU activation to Z2:
A2 = ReLU(Z2)
A2 is the output of the second hidden layer.
Output Layer:
Number of nodes: 1
Weight matrix: W3 (3x1)
Bias vector: b3 (1x1)
Activation function: Sigmoid
The pre-activation of the output layer (Z3) is calculated as:
Z3 = A2 * W3 + b3
Apply the Sigmoid activation to Z3 to get the final output (Y):
Y = Sigmoid(Z3)
This is the forward pass of the network, and it calculates the output based on the given input data.
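To tie this outline back to code, here is a small, self-contained NumPy sketch of exactly these steps. The weight and bias values and the input sample are arbitrary illustrative numbers chosen here, not the values used in the earlier example.
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative parameters with the shapes given in the outline
rng = np.random.default_rng(42)
W1, b1 = rng.standard_normal((3, 3)), np.zeros((1, 3))   # first hidden layer
W2, b2 = rng.standard_normal((3, 3)), np.zeros((1, 3))   # second hidden layer
W3, b3 = rng.standard_normal((3, 1)), np.zeros((1, 1))   # output layer

X = np.array([[0.5, -1.0, 2.0]])   # one input sample as a row vector

Z1 = X @ W1 + b1     # first hidden layer pre-activation
A1 = relu(Z1)        # first hidden layer output
Z2 = A1 @ W2 + b2    # second hidden layer pre-activation
A2 = relu(Z2)        # second hidden layer output
Z3 = A2 @ W3 + b3    # output layer pre-activation
Y = sigmoid(Z3)      # final output, squashed between 0 and 1
print(Y)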