Vector Form of Backpropagation

This section derives the backpropagation algorithm, in vector form, for a neural network with one hidden layer, a ReLU activation in the hidden layer, and a linear activation in the output layer.

Assumptions:

  • Let \(X\) be the input matrix, with one training example per row,

  • \(W^{(1)}\) be the weight matrix for the hidden layer,

  • \(b^{(1)}\) be the bias for the hidden layer,

  • \(W^{(2)}\) be the weight matrix for the output layer,

  • \(b^{(2)}\) be the bias for the output layer,

  • \(Z^{(1)}\) be the weighted sum of inputs for the hidden layer,

  • \(A^{(1)}\) be the output of the hidden layer after applying the ReLU activation,

  • \(Z^{(2)}\) be the weighted sum of inputs for the output layer,

  • \(A^{(2)}\) be the predicted output of the neural network,

  • \(Y\) be the true output.

The forward propagation equations are as follows:

\[ Z^{(1)} = X \cdot W^{(1)} + b^{(1)} \]
\[ A^{(1)} = \text{ReLU}(Z^{(1)}) \]
\[ Z^{(2)} = A^{(1)} \cdot W^{(2)} + b^{(2)} \]
\[ A^{(2)} = Z^{(2)} \]
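
As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The shapes (\(m\) examples, \(n\) input features, \(h\) hidden units, one output) and all variable names are assumptions made for this example only:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h = 4, 3, 5                    # examples, input features, hidden units (illustrative)

X  = rng.normal(size=(m, n))         # input matrix, one example per row
W1 = 0.1 * rng.normal(size=(n, h))   # hidden-layer weights
b1 = np.zeros(h)                     # hidden-layer bias
W2 = 0.1 * rng.normal(size=(h, 1))   # output-layer weights
b2 = np.zeros(1)                     # output-layer bias

Z1 = X @ W1 + b1                     # weighted sum for the hidden layer
A1 = np.maximum(Z1, 0)               # ReLU activation
Z2 = A1 @ W2 + b2                    # weighted sum for the output layer
A2 = Z2                              # linear output activation
```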

And the loss function is typically defined as the Mean Squared Error (MSE):

\[ \text{MSE} = \frac{1}{2m} \sum_{i=1}^{m} \left(A^{(2)}_i - Y_i\right)^2 \]

where \(m\) is the number of training examples.
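
With predictions and targets as column vectors, the loss is one line of NumPy; the values below are made up for illustration:

```python
import numpy as np

A2 = np.array([[0.5], [1.2], [-0.3]])   # example predictions (m = 3)
Y  = np.array([[0.0], [1.0], [ 0.5]])   # example targets

m = Y.shape[0]
mse = np.sum((A2 - Y) ** 2) / (2 * m)   # the 1/2 factor cancels when differentiating
print(mse)                              # 0.155
```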

Now, let’s go through one iteration of backpropagation; a NumPy sketch of these steps follows the list:

  1. Compute the loss gradient with respect to the output layer:

\[ dZ^{(2)} = A^{(2)} - Y \]

Strictly, \(\frac{\partial\,\text{MSE}}{\partial A^{(2)}} = \frac{1}{m}\left(A^{(2)} - Y\right)\); the \(\frac{1}{m}\) factor is deferred to the parameter gradients in the next step.
  2. Backpropagate the gradient to the hidden layer:

\[ dW^{(2)} = \frac{1}{m} A^{(1)T} \cdot dZ^{(2)} \]
\[ db^{(2)} = \frac{1}{m} \sum_{i=1}^{m} dZ^{(2)}_i \]
\[ dZ^{(1)} = dZ^{(2)} \cdot (W^{(2)})^T \]
\[ dZ^{(1)}[Z^{(1)} \leq 0] = 0 \]

The second equation applies the ReLU derivative: entries of \(dZ^{(1)}\) where \(Z^{(1)} \leq 0\) receive zero gradient.
  3. Backpropagate the gradient to the input layer:

\[ dW^{(1)} = \frac{1}{m} X^T \cdot dZ^{(1)} \]
\[ db^{(1)} = \frac{1}{m} \sum_{i=1}^{m} dZ^{(1)}_i \]
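
Here is the promised backward pass as a self-contained NumPy sketch, repeating the forward pass from the example above; shapes and variable names remain illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h = 4, 3, 5
X  = rng.normal(size=(m, n))
Y  = rng.normal(size=(m, 1))
W1 = 0.1 * rng.normal(size=(n, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.normal(size=(h, 1)); b2 = np.zeros(1)

# Forward pass (as derived above).
Z1 = X @ W1 + b1
A1 = np.maximum(Z1, 0)
Z2 = A1 @ W2 + b2
A2 = Z2

# Step 1: loss gradient at the output (the 1/m factor is applied below).
dZ2 = A2 - Y                          # shape (m, 1)

# Step 2: output-layer gradients, then push the gradient through W2.
dW2 = (A1.T @ dZ2) / m                # shape (h, 1)
db2 = dZ2.sum(axis=0) / m             # shape (1,)
dZ1 = dZ2 @ W2.T                      # shape (m, h)
dZ1[Z1 <= 0] = 0                      # ReLU derivative: mask inactive units

# Step 3: first-layer gradients.
dW1 = (X.T @ dZ1) / m                 # shape (n, h)
db1 = dZ1.sum(axis=0) / m             # shape (h,)
```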

Now, you can update the weights and biases using a learning rate \(\alpha\):

\[ W^{(2)} = W^{(2)} - \alpha \cdot dW^{(2)} \]
\[ b^{(2)} = b^{(2)} - \alpha \cdot db^{(2)} \]
\[ W^{(1)} = W^{(1)} - \alpha \cdot dW^{(1)} \]
\[ b^{(1)} = b^{(1)} - \alpha \cdot db^{(1)} \]
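
Continuing the backward-pass sketch, the updates translate directly; the learning rate value is arbitrary:

```python
alpha = 0.01        # learning rate (illustrative value)

# Plain gradient-descent updates, using the gradients computed above.
W2 -= alpha * dW2
b2 -= alpha * db2
W1 -= alpha * dW1
b1 -= alpha * db1
```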

This completes one iteration of the backpropagation algorithm for the given neural network architecture.