General Backpropagation for a Deep ANN
Notation:
\( L \): the number of layers in the network
\( w_{ij}^{(l)} \): the weight associated with the connection between neuron \( i \) in layer \( l-1 \) and neuron \( j \) in layer \( l \)
\( a_i^{(l)} \): the activation (output) of neuron \( i \) in layer \( l \)
\( z_i^{(l)} \): the weighted input to neuron \( i \) in layer \( l \) (before the activation function is applied)
\( \delta_i^{(l)} \): the error of neuron \( i \) in layer \( l \)
The steps for backpropagation in a deep neural network are as follows:
Forward Pass:
Compute the activation \( a_i^{(l)} \) for each neuron in each layer using the input features and current weights.
Compute the error \( \delta_i^{(L)} \) for each neuron in the output layer \( L \) from the derivative of the specified loss function, e.g. \( \delta_i^{(L)} = \frac{\partial J}{\partial a_i^{(L)}} \cdot f'(z_i^{(L)}) \).
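As a minimal sketch of these two steps (assuming a fully connected network with sigmoid activations and a mean-squared-error loss; all function and variable names here are illustrative, not part of the notation above), the forward pass and the output-layer error could be written in NumPy as:

```python
import numpy as np

def sigmoid(z):
    """Activation function f(z) used throughout this sketch."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Its derivative f'(z) = f(z) * (1 - f(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def forward_pass(x, W, b):
    """Return the pre-activations z^{(l)} and activations a^{(l)} of every layer.

    W is a list of weight matrices, one per non-input layer, with W[l][i, j] the
    weight from neuron i of one layer to neuron j of the next (matching w_{ij}
    above); b[l] holds the biases of the layer that W[l] feeds into.
    """
    activations = [x]                  # the first activation is the input itself
    zs = []
    for Wl, bl in zip(W, b):
        z = activations[-1] @ Wl + bl  # z_j = sum_i a_i * w_{ij} + b_j
        zs.append(z)
        activations.append(sigmoid(z)) # a^{(l)} = f(z^{(l)})
    return zs, activations

def output_error(activations, zs, y):
    """delta^{(L)} for the MSE loss J = 0.5 * ||a^{(L)} - y||^2."""
    return (activations[-1] - y) * sigmoid_prime(zs[-1])
```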
Backward Pass:
For each layer \( l = L-1, L-2, \ldots, 2 \):
Compute \( \delta_i^{(l)} \) for each neuron in layer \( l \):
\[ \delta_i^{(l)} = \left(\sum_{k} w_{ik}^{(l+1)} \delta_k^{(l+1)}\right) \cdot f'(z_i^{(l)}) \]
where \( f'(\cdot) \) is the derivative of the activation function.
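Continuing the same sketch, and reusing sigmoid_prime and the list-of-matrices layout from the forward-pass snippet, the hidden-layer errors can be computed as:

```python
def backward_pass(W, zs, delta_L):
    """Propagate the output-layer error back through the hidden layers.

    deltas[l] is the error of the layer that W[l] feeds into, so deltas[-1]
    is the output layer and deltas[0] is the first hidden layer.
    """
    deltas = [None] * len(W)
    deltas[-1] = delta_L
    for l in range(len(W) - 2, -1, -1):
        # delta_i = (sum_k w_{ik} * delta_k of the next layer) * f'(z_i)
        deltas[l] = (W[l + 1] @ deltas[l + 1]) * sigmoid_prime(zs[l])
    return deltas
```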
Weight Update:
Update the weights using the computed errors:
\[ w_{ij}^{(l)} = w_{ij}^{(l)} - \alpha \cdot \frac{\partial J}{\partial w_{ij}^{(l)}} \]
where \( \alpha \) is the learning rate and \( J \) is the total cost function.
The partial derivative with respect to a weight is given by:
\[ \frac{\partial J}{\partial w_{ij}^{(l)}} = a_i^{(l-1)} \delta_j^{(l)} \]
The biases are updated in the same way, using \( \frac{\partial J}{\partial b_j^{(l)}} = \delta_j^{(l)} \).
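With the stored activations and errors from the sketches above, both gradient expressions turn into a plain gradient-descent step; the outer product below builds the full matrix of \( a_i^{(l-1)} \delta_j^{(l)} \) terms:

```python
def gradient_step(W, b, activations, deltas, alpha):
    """Update weights and biases in place for one training example."""
    for l in range(len(W)):
        # dJ/dw_{ij} = a_i (previous layer) * delta_j (this layer): an outer product
        W[l] -= alpha * np.outer(activations[l], deltas[l])
        # dJ/db_j = delta_j
        b[l] -= alpha * deltas[l]
```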
Repeat the forward pass, backward pass, and weight update for a fixed number of iterations or until convergence.
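Tying the sketches together, one possible training loop looks like the following (the layer widths, random initialization, single training pair, iteration budget, and learning rate are all arbitrary choices made for the illustration):

```python
# Illustrative setup: layer widths, random weights, and a single (x, y) training pair.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
W = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(n) for n in sizes[1:]]
x, y = rng.standard_normal(sizes[0]), np.array([0.0, 1.0])

for epoch in range(1000):  # fixed budget; a loss-based stopping test could replace it
    zs, activations = forward_pass(x, W, b)
    deltas = backward_pass(W, zs, output_error(activations, zs, y))
    gradient_step(W, b, activations, deltas, alpha=0.1)
```

In practice the gradients would usually be averaged over a mini-batch of examples before each update rather than applied per example as in this single-sample sketch.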
It’s important to note that the activation function, loss function, and learning rate are design choices that depend on the specific problem being solved. Additionally, regularization techniques and optimization algorithms (like Adam, RMSprop, etc.) are often used to improve training stability and convergence speed.