Vector Form of Backpropagation
This section derives the backpropagation algorithm, in vector form, for a neural network with one hidden layer, a ReLU activation function in the hidden layer, and a linear activation function in the output layer.
Assumptions: let

- \(X\) be the input vector (in the batched form used below, the columns of \(X\) stack the \(m\) training inputs),
- \(W^{(1)}\) be the weight matrix for the hidden layer,
- \(b^{(1)}\) be the bias vector for the hidden layer,
- \(W^{(2)}\) be the weight matrix for the output layer,
- \(b^{(2)}\) be the bias vector for the output layer,
- \(Z^{(1)}\) be the weighted sum of inputs for the hidden layer,
- \(A^{(1)}\) be the output of the hidden layer after applying the ReLU activation,
- \(Z^{(2)}\) be the weighted sum of inputs for the output layer,
- \(A^{(2)}\) be the predicted output of the neural network,
- \(Y\) be the true output (one column per training example in the batched form).
The forward propagation equations are as follows, with the biases broadcast across the \(m\) columns:

\[
Z^{(1)} = W^{(1)} X + b^{(1)}, \qquad A^{(1)} = \mathrm{ReLU}\!\left(Z^{(1)}\right),
\]

\[
Z^{(2)} = W^{(2)} A^{(1)} + b^{(2)}, \qquad A^{(2)} = Z^{(2)},
\]

where the last equality holds because the output activation is linear (the identity).
And the loss function is typically defined as the Mean Squared Error (MSE):

\[
L = \frac{1}{m} \sum_{i=1}^{m} \left\lVert A^{(2)}_{i} - Y_{i} \right\rVert^{2},
\]

where \(m\) is the number of training examples and the subscript \(i\) denotes the \(i\)-th column (training example).
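To make the forward pass and the loss concrete, here is a minimal NumPy sketch; the layer sizes \(n_x\), \(n_h\), \(n_y\), the batch size \(m\), and the random initialization are illustrative assumptions rather than values taken from the derivation above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) sizes: n_x inputs, n_h hidden units, n_y outputs, m examples.
n_x, n_h, n_y, m = 3, 4, 1, 5

# Inputs and targets, with training examples stored as columns.
X = rng.normal(size=(n_x, m))
Y = rng.normal(size=(n_y, m))

# Parameters.
W1 = 0.1 * rng.normal(size=(n_h, n_x))
b1 = np.zeros((n_h, 1))
W2 = 0.1 * rng.normal(size=(n_y, n_h))
b2 = np.zeros((n_y, 1))

# Forward propagation.
Z1 = W1 @ X + b1          # (n_h, m), bias broadcast across the m columns
A1 = np.maximum(Z1, 0.0)  # ReLU
Z2 = W2 @ A1 + b2         # (n_y, m)
A2 = Z2                   # linear output activation

# Mean squared error over the m training examples.
loss = np.sum((A2 - Y) ** 2) / m
print("MSE:", loss)
```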
Now, let’s go through one iteration of backpropagation:
Compute the loss gradient with respect to the output layer. Because the output activation is linear, \(\partial L / \partial Z^{(2)} = \partial L / \partial A^{(2)}\):

\[
\frac{\partial L}{\partial Z^{(2)}} = \frac{\partial L}{\partial A^{(2)}} = \frac{2}{m}\left(A^{(2)} - Y\right),
\]

\[
\frac{\partial L}{\partial W^{(2)}} = \frac{\partial L}{\partial Z^{(2)}} \left(A^{(1)}\right)^{\top}, \qquad
\frac{\partial L}{\partial b^{(2)}} = \sum_{i=1}^{m} \left(\frac{\partial L}{\partial Z^{(2)}}\right)_{i},
\]

where the sum for the bias runs over the \(m\) columns.
Backpropagate the gradient to the hidden layer, applying the chain rule through the ReLU (whose derivative is \(1\) where \(Z^{(1)} > 0\) and \(0\) elsewhere):

\[
\frac{\partial L}{\partial A^{(1)}} = \left(W^{(2)}\right)^{\top} \frac{\partial L}{\partial Z^{(2)}}, \qquad
\frac{\partial L}{\partial Z^{(1)}} = \frac{\partial L}{\partial A^{(1)}} \odot \mathbf{1}\!\left[Z^{(1)} > 0\right],
\]

where \(\odot\) denotes the element-wise product.
Backpropagate the gradient to the input layer to obtain the gradients of the first-layer parameters:

\[
\frac{\partial L}{\partial W^{(1)}} = \frac{\partial L}{\partial Z^{(1)}} X^{\top}, \qquad
\frac{\partial L}{\partial b^{(1)}} = \sum_{i=1}^{m} \left(\frac{\partial L}{\partial Z^{(1)}}\right)_{i}.
\]
Now, you can update the weights and biases using a learning rate \(\alpha\):

\[
W^{(k)} \leftarrow W^{(k)} - \alpha \, \frac{\partial L}{\partial W^{(k)}}, \qquad
b^{(k)} \leftarrow b^{(k)} - \alpha \, \frac{\partial L}{\partial b^{(k)}}, \qquad k = 1, 2.
\]
This completes one iteration of the backpropagation algorithm for the given neural network architecture.
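Putting the steps together, the sketch below runs one full iteration (forward pass, gradient computation, and parameter update) in NumPy; the function name `backprop_one_iteration`, the layer sizes, and the learning rate are illustrative assumptions, not part of the derivation above.

```python
import numpy as np

def backprop_one_iteration(X, Y, W1, b1, W2, b2, alpha=0.01):
    """One iteration: forward pass, gradients, and gradient-descent update."""
    m = X.shape[1]

    # Forward propagation.
    Z1 = W1 @ X + b1
    A1 = np.maximum(Z1, 0.0)          # ReLU
    Z2 = W2 @ A1 + b2
    A2 = Z2                           # linear output activation

    loss = np.sum((A2 - Y) ** 2) / m  # MSE over the m examples

    # Gradient at the output layer (linear activation, so dL/dZ2 = dL/dA2).
    dZ2 = (2.0 / m) * (A2 - Y)
    dW2 = dZ2 @ A1.T
    db2 = dZ2.sum(axis=1, keepdims=True)

    # Backpropagate to the hidden layer through the ReLU.
    dA1 = W2.T @ dZ2
    dZ1 = dA1 * (Z1 > 0)

    # Gradients of the first-layer parameters (uses the input X).
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)

    # Gradient-descent update with learning rate alpha.
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return loss, W1, b1, W2, b2


# Usage with illustrative sizes (assumed, not from the text).
rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 3, 4, 1, 5
X = rng.normal(size=(n_x, m))
Y = rng.normal(size=(n_y, m))
W1, b1 = 0.1 * rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = 0.1 * rng.normal(size=(n_y, n_h)), np.zeros((n_y, 1))

for step in range(5):
    loss, W1, b1, W2, b2 = backprop_one_iteration(X, Y, W1, b1, W2, b2, alpha=0.05)
    print(step, loss)
```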