Back Propagation with ReLU: one input, one hidden node, one output
In this section we use gradient descent to train a simple artificial neural network (ANN) with one input node, one hidden node (using the ReLU activation function), and one output node. We will use a basic regression problem as an example.
Problem Statement: We want to train a neural network to predict the output (y) based on a single input feature (x). The network architecture is as follows:
Input layer: One neuron with input \(x\).
Hidden layer: One neuron with weight \(w\) and bias \(b\), applying the ReLU activation function.
Output layer: One neuron with weight \(v\) and bias \(c\), representing the predicted output.
The network’s output \(y\) can be expressed as:
\[y = v \cdot \text{ReLU}(w \cdot x + b) + c\]
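As a minimal sketch of this computation (the names forward and relu here are illustrative, not taken from any library), the formula maps directly onto NumPy; the full training script at the end of this section carries out the same steps inline:
import numpy as np

def relu(z):
    # ReLU activation: elementwise max(0, z)
    return np.maximum(0, z)

def forward(x, w, b, v, c):
    # Hidden pre-activation, hidden activation, then the linear output
    z = w * x + b
    h = relu(z)
    return v * h + c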
Data: Let’s assume we have a dataset of input-output pairs \((x, y_{\text{target}})\) for training. For the worked example and the code below we use a single pair, \(x = 2.0\) and \(y_{\text{target}} = 2.0\).
Initialization:
Initialize the parameters randomly or with predetermined values:
\(w = 0.5\)
\(b = 0.2\)
\(v = 0.3\)
\(c = 0.1\)
Set the learning rate \(\alpha\) (e.g., \(\alpha = 0.1\)).
Training: We’ll use gradient descent to update the network parameters in order to minimize the mean squared error (MSE) cost function
\[J(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left(y_i - y_{\text{target},i}\right)^2\]
where \(N\) is the number of training examples, \(y_i\) is the predicted output for the \(i\)-th example, and \(y_{\text{target},i}\) is the target output for the \(i\)-th example. The factor of \(\frac{1}{2}\) is included so that it cancels when differentiating the squared error; it matches the per-example cost used below.
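As a concrete check, plugging the single training pair and the initial parameters above into the network (writing \(z = w \cdot x + b\) for the hidden pre-activation and \(h = \text{ReLU}(z)\) for the hidden activation, the same names used in the code below) gives:
\[z = 0.5 \cdot 2.0 + 0.2 = 1.2, \qquad h = \text{ReLU}(1.2) = 1.2,\]
\[y = 0.3 \cdot 1.2 + 0.1 = 0.46, \qquad J = \tfrac{1}{2}(0.46 - 2.0)^2 \approx 1.186.\]
The initial prediction is well below the target, so the updates below will push the parameters in a direction that increases the output.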
Here’s how we update the parameters in each training iteration:
1. For each training example \((x, y_{\text{target}})\), calculate the predicted output \(y\) using the current network parameters:
\[y = v \cdot \text{ReLU}(w \cdot x + b) + c\]
2. Calculate the cost for this example:
\[J(\theta) = \frac{1}{2}(y - y_{\text{target}})^2\]
3. Compute the gradients of the cost with respect to the parameters, writing \(z = w \cdot x + b\) and \(h = \text{ReLU}(z)\) (a numerical check of these gradients follows the procedure):
\(\frac{\partial J}{\partial v} = (y - y_{\text{target}}) \cdot h\)
\(\frac{\partial J}{\partial c} = y - y_{\text{target}}\)
\(\frac{\partial J}{\partial w} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h} \cdot \frac{\partial h}{\partial z} \cdot \frac{\partial z}{\partial w} = (y - y_{\text{target}}) \cdot v \cdot \mathbb{1}[z > 0] \cdot x\)
\(\frac{\partial J}{\partial b} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h} \cdot \frac{\partial h}{\partial z} \cdot \frac{\partial z}{\partial b} = (y - y_{\text{target}}) \cdot v \cdot \mathbb{1}[z > 0]\)
Here \(\mathbb{1}[z > 0]\) is the derivative of the ReLU, equal to 1 when \(z > 0\) and 0 otherwise.
4. Update the parameters using gradient descent:
\(v = v - \alpha \cdot \frac{\partial J}{\partial v}\)
\(c = c - \alpha \cdot \frac{\partial J}{\partial c}\)
\(w = w - \alpha \cdot \frac{\partial J}{\partial w}\)
\(b = b - \alpha \cdot \frac{\partial J}{\partial b}\)
Repeat steps 1-4 for a specified number of training iterations or until the cost converges to a minimum. In each iteration, you use the gradients to adjust the parameters in the direction that minimizes the cost function, thus training the neural network.
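To build confidence in the chain-rule expressions above, the analytic gradients can be compared against finite-difference approximations. The sketch below (all function and variable names are illustrative) perturbs each parameter by a small \(\epsilon\) and compares the resulting change in cost to the analytic gradient, using the initial parameters and the single training pair from above.
import numpy as np

def relu(z):
    return np.maximum(0, z)

def cost(params, x, y_target):
    # Per-example cost J = 0.5 * (y - y_target)^2 for the one-hidden-unit network
    w, b, v, c = params
    y = v * relu(w * x + b) + c
    return 0.5 * (y - y_target) ** 2

def analytic_grads(params, x, y_target):
    # Chain-rule gradients derived above
    w, b, v, c = params
    z = w * x + b
    h = relu(z)
    y = v * h + c
    dJ_dy = y - y_target
    relu_grad = 1.0 if z > 0 else 0.0
    return np.array([dJ_dy * v * relu_grad * x,   # dJ/dw
                     dJ_dy * v * relu_grad,       # dJ/db
                     dJ_dy * h,                   # dJ/dv
                     dJ_dy])                      # dJ/dc

params = np.array([0.5, 0.2, 0.3, 0.1])   # w, b, v, c
x, y_target, eps = 2.0, 2.0, 1e-6

numeric = np.zeros_like(params)
for i in range(len(params)):
    bumped = params.copy()
    bumped[i] += eps
    numeric[i] = (cost(bumped, x, y_target) - cost(params, x, y_target)) / eps

print("analytic:", analytic_grads(params, x, y_target))
print("numeric: ", numeric)
The two sets of values should agree to several decimal places, which confirms that the analytic formulas match the cost function they were derived from.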
This example illustrates a basic gradient descent approach for training a simple neural network with ReLU activation. In practice, real-world scenarios involve larger networks, more data, and libraries like TensorFlow or PyTorch to handle the training process efficiently.
import numpy as np
# Define the ReLU activation function
def relu(x):
    return np.maximum(0, x)
# Initialize the network parameters
w = 0.5
b = 0.2
v = 0.3
c = 0.1
learning_rate = 0.1
# Define the training data
x = 2.0
y_target = 2.0
# Forward pass
z = w * x + b
h = relu(z)
y = v * h + c
# Compute the cost
cost = 0.5 * (y - y_target)**2
# Compute the gradients
dJ_dy = y - y_target
dJ_dv = dJ_dy * h
dJ_dc = dJ_dy
dJ_dh = dJ_dy * v
dJ_dz = dJ_dh if z > 0 else 0
dJ_dw = dJ_dz * x
dJ_db = dJ_dz
# Update the parameters using gradient descent
v -= learning_rate * dJ_dv
c -= learning_rate * dJ_dc
w -= learning_rate * dJ_dw
b -= learning_rate * dJ_db
# Print the updated parameters
print(f"Updated w: {w}")
print(f"Updated b: {b}")
print(f"Updated v: {v}")
print(f"Updated c: {c}")
Updated w: 0.5924
Updated b: 0.2462
Updated v: 0.4848
Updated c: 0.254
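The script above performs a single parameter update. As a sketch of the "repeat steps 1-4" part of the procedure (the iteration budget and stopping threshold here are arbitrary choices, not taken from the text above), the same update can be wrapped in a loop:
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Re-initialize the parameters and data
w, b, v, c = 0.5, 0.2, 0.3, 0.1
learning_rate = 0.1
x, y_target = 2.0, 2.0

for step in range(50):                      # arbitrary iteration budget
    # Forward pass (step 1) and cost (step 2)
    z = w * x + b
    h = relu(z)
    y = v * h + c
    cost = 0.5 * (y - y_target) ** 2

    # Backward pass (step 3), same gradients as above
    dJ_dy = y - y_target
    dJ_dv = dJ_dy * h
    dJ_dc = dJ_dy
    dJ_dz = dJ_dy * v if z > 0 else 0.0
    dJ_dw = dJ_dz * x
    dJ_db = dJ_dz

    # Gradient-descent update (step 4)
    v -= learning_rate * dJ_dv
    c -= learning_rate * dJ_dc
    w -= learning_rate * dJ_dw
    b -= learning_rate * dJ_db

    if cost < 1e-6:                         # arbitrary stopping threshold
        break

print(f"Final cost after {step + 1} steps: {cost:.6f}")
print(f"Final prediction: {v * relu(w * x + b) + c:.4f}")
On this single training pair the cost shrinks rapidly and the prediction approaches the target of 2.0 within a few dozen iterations.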