Vector Form of Backpropagation

This section derives the backpropagation algorithm, in vector form, for a neural network with one hidden layer, a ReLU activation in the hidden layer, and a linear activation in the output layer.

Assumptions:

  • Let \(X\) be the input matrix, with one training example per row,

  • \(W^{(1)}\) be the weight matrix for the hidden layer,

  • \(b^{(1)}\) be the bias for the hidden layer,

  • \(W^{(2)}\) be the weight matrix for the output layer,

  • \(b^{(2)}\) be the bias for the output layer,

  • \(Z^{(1)}\) be the weighted sum of inputs for the hidden layer,

  • \(A^{(1)}\) be the output of the hidden layer after applying the ReLU activation,

  • \(Z^{(2)}\) be the weighted sum of inputs for the output layer,

  • \(A^{(2)}\) be the predicted output of the neural network,

  • \(Y\) be the true output.

The forward propagation equations are as follows:

\[ Z^{(1)} = X \cdot W^{(1)} + b^{(1)} \]
\[ A^{(1)} = \text{ReLU}(Z^{(1)}) \]
\[ Z^{(2)} = A^{(1)} \cdot W^{(2)} + b^{(2)} \]
\[ A^{(2)} = Z^{(2)} \]
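
As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The shapes (\(m\) examples, \(n\) input features, \(h\) hidden units, one output) and all variable names are assumptions made for this example only:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h = 4, 3, 5                    # examples, input features, hidden units (illustrative)

X  = rng.normal(size=(m, n))         # input matrix, one example per row
W1 = 0.1 * rng.normal(size=(n, h))   # hidden-layer weights
b1 = np.zeros(h)                     # hidden-layer bias
W2 = 0.1 * rng.normal(size=(h, 1))   # output-layer weights
b2 = np.zeros(1)                     # output-layer bias

Z1 = X @ W1 + b1                     # weighted sum for the hidden layer
A1 = np.maximum(Z1, 0)               # ReLU activation
Z2 = A1 @ W2 + b2                    # weighted sum for the output layer
A2 = Z2                              # linear output activation
```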

And the loss function is typically defined as the Mean Squared Error (MSE):

\[ \text{MSE} = \frac{1}{2m} \sum_{i=1}^{m} \left(A^{(2)}_i - Y_i\right)^2 \]

where \(m\) is the number of training examples.
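
With predictions and targets as column vectors, the loss is one line of NumPy; the values below are made up for illustration:

```python
import numpy as np

A2 = np.array([[0.5], [1.2], [-0.3]])   # example predictions (m = 3)
Y  = np.array([[0.0], [1.0], [ 0.5]])   # example targets

m = Y.shape[0]
mse = np.sum((A2 - Y) ** 2) / (2 * m)   # the 1/2 factor cancels when differentiating
print(mse)                              # 0.155
```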

Now, let’s go through one iteration of backpropagation; a NumPy sketch of these steps follows the list:

  1. Compute the loss gradient with respect to the output layer:

\[ dZ^{(2)} = A^{(2)} - Y \]

Strictly, \(\frac{\partial\,\text{MSE}}{\partial A^{(2)}} = \frac{1}{m}\left(A^{(2)} - Y\right)\); the \(\frac{1}{m}\) factor is deferred to the parameter gradients in the next step.
  2. Backpropagate the gradient to the hidden layer:

\[ dW^{(2)} = \frac{1}{m} A^{(1)T} \cdot dZ^{(2)} \]
\[ db^{(2)} = \frac{1}{m} \sum_{i=1}^{m} dZ^{(2)}_i \]
\[ dZ^{(1)} = dZ^{(2)} \cdot (W^{(2)})^T \]
\[ dZ^{(1)}[Z^{(1)} \leq 0] = 0 \]

The second equation applies the ReLU derivative: entries of \(dZ^{(1)}\) where \(Z^{(1)} \leq 0\) receive zero gradient.
  3. Backpropagate the gradient to the input layer:

\[ dW^{(1)} = \frac{1}{m} X^T \cdot dZ^{(1)} \]
\[ db^{(1)} = \frac{1}{m} \sum_{i=1}^{m} dZ^{(1)}_i \]
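
Here is the promised backward pass as a self-contained NumPy sketch, repeating the forward pass from the example above; shapes and variable names remain illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h = 4, 3, 5
X  = rng.normal(size=(m, n))
Y  = rng.normal(size=(m, 1))
W1 = 0.1 * rng.normal(size=(n, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.normal(size=(h, 1)); b2 = np.zeros(1)

# Forward pass (as derived above).
Z1 = X @ W1 + b1
A1 = np.maximum(Z1, 0)
Z2 = A1 @ W2 + b2
A2 = Z2

# Step 1: loss gradient at the output (the 1/m factor is applied below).
dZ2 = A2 - Y                          # shape (m, 1)

# Step 2: output-layer gradients, then push the gradient through W2.
dW2 = (A1.T @ dZ2) / m                # shape (h, 1)
db2 = dZ2.sum(axis=0) / m             # shape (1,)
dZ1 = dZ2 @ W2.T                      # shape (m, h)
dZ1[Z1 <= 0] = 0                      # ReLU derivative: mask inactive units

# Step 3: first-layer gradients.
dW1 = (X.T @ dZ1) / m                 # shape (n, h)
db1 = dZ1.sum(axis=0) / m             # shape (h,)
```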

Now, you can update the weights and biases using a learning rate \(\alpha\):

\[ W^{(2)} = W^{(2)} - \alpha \cdot dW^{(2)} \]
\[ b^{(2)} = b^{(2)} - \alpha \cdot db^{(2)} \]
\[ W^{(1)} = W^{(1)} - \alpha \cdot dW^{(1)} \]
\[ b^{(1)} = b^{(1)} - \alpha \cdot db^{(1)} \]
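
Continuing the backward-pass sketch, the updates translate directly; the learning rate value is arbitrary:

```python
alpha = 0.01        # learning rate (illustrative value)

# Plain gradient-descent updates, using the gradients computed above.
W2 -= alpha * dW2
b2 -= alpha * db2
W1 -= alpha * dW1
b1 -= alpha * db1
```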

This completes one iteration of the backpropagation algorithm for the given neural network architecture.