One-Node Backpropagation with ReLU

In this document we work through the backpropagation algorithm for an artificial neural network with one input \(x\), one hidden layer containing a single ReLU node, and one linear output node.

Step 1: Initialization

Let’s initialize the network’s parameters:

  • Weight of the hidden neuron (\(w\)): Initialize randomly, e.g., \(w = 0.5\).

  • Bias of the hidden neuron (\(b\)): Initialize randomly, e.g., \(b = 0.2\).

  • Weight of the output neuron (\(v\)): Initialize randomly, e.g., \(v = 0.3\).

  • Bias of the output neuron (\(c\)): Initialize randomly, e.g., \(c = 0.1\).

Hyperparameters:

  • Learning rate (\(\alpha\)): Choose a suitable learning rate, e.g., \(\alpha = 0.01\).

Step 2: Forward Pass

For each training example, perform the forward pass through the network:

  • Input (\(x\)): The input data.

  • Hidden layer output (\(h\)): Apply the ReLU activation function, defined as \(\text{ReLU}(z) = \max(0, z)\).

    \[h = \text{ReLU}(w \cdot x + b)\]
  • Output (\(y\)): Compute the network’s output.

    \[y = v \cdot h + c\]
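
For example, with the initial values \(w = 0.5\), \(b = 0.2\), \(v = 0.3\), \(c = 0.1\) and the input \(x = 1\) used in the code below:

    \[h = \text{ReLU}(0.5 \cdot 1 + 0.2) = 0.7, \qquad y = 0.3 \cdot 0.7 + 0.1 = 0.31\]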

Step 3: Compute the Loss

Calculate the loss using a loss function, such as the mean squared error (MSE); the factor of \(\frac{1}{2}\) below is included so that the gradient comes out to simply \(y - y_{\text{target}}\):

  • Target output (\(y_{\text{target}}\)): The desired output for the given input.

    \[J(\theta) = \frac{1}{2}(y - y_{\text{target}})^2\]
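
Continuing the example, the initial prediction \(y = 0.31\) against the target \(y_{\text{target}} = 3\) used in the code below gives

    \[J = \frac{1}{2}(0.31 - 3)^2 = \frac{1}{2}(-2.69)^2 \approx 3.618,\]

which matches the cost printed at iteration 0 of the program.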

Step 4: Backpropagation

Calculate the gradients of the loss with respect to the parameters (\(w\), \(b\), \(v\), \(c\)) using backpropagation:

  • Compute the gradient of the loss with respect to the output, \(\frac{\partial J}{\partial y} = y - y_{\text{target}}\), and use the chain rule to compute the gradients for the other parameters (a numerical example follows this list):

    \(\frac{\partial J}{\partial v} = \frac{\partial J}{\partial y} \cdot h\)

    \(\frac{\partial J}{\partial c} = \frac{\partial J}{\partial y}\)

    \(\frac{\partial J}{\partial h} = \frac{\partial J}{\partial y} \cdot v\)

    \(\frac{\partial J}{\partial w} = \frac{\partial J}{\partial h} \cdot \frac{\partial h}{\partial(w \cdot x + b)} \cdot x\)

    \(\frac{\partial J}{\partial b} = \frac{\partial J}{\partial h} \cdot \frac{\partial h}{\partial(w \cdot x + b)}\)

    Here \(\frac{\partial h}{\partial(w \cdot x + b)}\) is the derivative of the ReLU: 1 if \(w \cdot x + b > 0\) and 0 otherwise.

  • Update the parameters using gradient descent with the learning rate \(\alpha\):

    \(v \leftarrow v - \alpha \frac{\partial J}{\partial v}, \qquad c \leftarrow c - \alpha \frac{\partial J}{\partial c}\)

    \(w \leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}\)
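
Plugging in the numbers from the running example (\(y = 0.31\), \(h = 0.7\), \(v = 0.3\), \(y_{\text{target}} = 3\)): \(\frac{\partial J}{\partial y} = 0.31 - 3 = -2.69\), so \(\frac{\partial J}{\partial v} = -2.69 \cdot 0.7 = -1.883\), \(\frac{\partial J}{\partial c} = -2.69\), and \(\frac{\partial J}{\partial h} = -2.69 \cdot 0.3 = -0.807\). Since \(w \cdot x + b = 0.7 > 0\), the ReLU derivative is 1, so \(\frac{\partial J}{\partial w} = \frac{\partial J}{\partial b} = -0.807\). One gradient step with \(\alpha = 0.01\) then gives \(w \leftarrow 0.5 - 0.01 \cdot (-0.807) = 0.50807\), exactly the value printed after the first iteration of the program below.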

Repeat steps 2-4 for a specified number of iterations or until the loss converges to a minimum. This process will train the network to make accurate predictions for the given input data. The choice of learning rate and the number of iterations can significantly affect the training process, and tuning these hyperparameters is an important part of training neural networks effectively.
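
The short Python program below implements this training loop end to end for a single training example, \(x = 1\) with \(y_{\text{target}} = 3\):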

# Initialize parameters
w = 0.5
b = 0.2
v = 0.3
c = 0.1
learning_rate = 0.01

# Training data
x = 1  # Input feature
y_target = 3  # Target output

# Number of training iterations
num_iterations = 10

# Gradient Descent
for i in range(num_iterations):
    # Forward pass
    h = max(0, w * x + b)  # ReLU activation
    y_pred = v * h + c
    
    # Compute the cost (half squared error)
    cost = 0.5 * (y_pred - y_target)**2
    dy = y_pred - y_target  # dJ/dy, gradient of the cost w.r.t. the prediction
    
    # Backpropagation
    # Compute gradients
    dv = dy * h  # dJ/dv
    dc = dy      # dJ/dc
    dh = dy * v  # dJ/dh
    
    # ReLU derivative: 1 if the pre-activation is positive, 0 otherwise
    if w * x + b > 0:
        dw = x * dh
        db = dh
    else:
        dw = 0
        db = 0
    
    # Update parameters using gradient descent
    w = w - learning_rate * dw
    b = b - learning_rate * db
    v = v - learning_rate * dv
    c = c - learning_rate * dc
    
    # Print progress
    print(f"Iteration {i}: Cost = {cost:.4f}, Predicted Output = {y_pred:.2f}")

print("\nTrained Parameters:")
print(f"w = {w:.4f}, b = {b:.4f}, v = {v:.4f}, c = {c:.4f}")
Iteration 0: Cost = 3.6180, Predicted Output = 0.31
Iteration 1: Cost = 3.4974, Predicted Output = 0.36
Iteration 2: Cost = 3.3776, Predicted Output = 0.40
Iteration 3: Cost = 3.2585, Predicted Output = 0.45
Iteration 4: Cost = 3.1402, Predicted Output = 0.49
Iteration 5: Cost = 3.0228, Predicted Output = 0.54
Iteration 6: Cost = 2.9061, Predicted Output = 0.59
Iteration 7: Cost = 2.7904, Predicted Output = 0.64
Iteration 8: Cost = 2.6757, Predicted Output = 0.69
Iteration 9: Cost = 2.5620, Predicted Output = 0.74

Trained Parameters:
w = 0.5950, b = 0.2950, v = 0.4929, c = 0.3480
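
After ten iterations the cost is still falling, so training has not yet converged. As a minimal sketch of running the same loop until the loss bottoms out, assuming an illustrative tolerance tol and iteration cap max_iters (neither is from the run above):

# Train the same one-node network until the cost is below a tolerance.
# tol and max_iters are illustrative choices, not from the run above.
w, b, v, c = 0.5, 0.2, 0.3, 0.1
learning_rate = 0.01
x, y_target = 1, 3
tol, max_iters = 1e-8, 100_000

for i in range(max_iters):
    # Forward pass
    h = max(0, w * x + b)
    y_pred = v * h + c
    cost = 0.5 * (y_pred - y_target)**2
    if cost < tol:
        break
    # Backward pass (same gradients as above)
    dy = y_pred - y_target
    dv, dc, dh = dy * h, dy, dy * v
    relu_grad = 1 if w * x + b > 0 else 0
    dw, db = x * dh * relu_grad, dh * relu_grad
    # Gradient descent step
    w, b = w - learning_rate * dw, b - learning_rate * db
    v, c = v - learning_rate * dv, c - learning_rate * dc

print(f"Stopped at iteration {i}: Cost = {cost:.2e}, Predicted Output = {y_pred:.4f}")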