Gradient Descent for a Neural Network with Sigmoid Activation

Here’s a simplified neural network architecture:

  1. Input layer: One neuron with input \(x\).

  2. Hidden layer: One neuron with weight \(w\) and bias \(b\), applying the sigmoid activation function.

  3. Output layer: One neuron with weight \(v\) and bias \(c\), applying the linear (identity) activation function.

The network’s output \(y\) can be expressed as:

\[y = v \cdot \text{sigmoid}(w \cdot x + b) + c\]
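
As a quick sanity check, this forward computation can be written directly in NumPy; the parameter and input values below are arbitrary placeholders, not values from this page:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Arbitrary illustrative values for the parameters and the input
w, b, v, c = 0.5, -0.1, 1.2, 0.3
x = 2.0

y = v * sigmoid(w * x + b) + c   # scalar prediction of the network
print(y)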

Your goal is to use gradient descent to train the network by updating the weights and biases to minimize a cost function \(J(\theta)\), where \(\theta\) represents all the network parameters (\(w\), \(b\), \(v\), and \(c\)).

Initialization:

  • Initialize the parameters randomly or with predetermined values.

  • Set the learning rate \(\alpha\) (e.g., \(\alpha = 0.1\)).
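
A minimal sketch of this initialization with scalar parameters (the seed and the use of standard-normal draws are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)        # arbitrary seed for reproducibility
w, b = rng.standard_normal(), rng.standard_normal()   # hidden-layer weight and bias
v, c = rng.standard_normal(), rng.standard_normal()   # output-layer weight and bias
alpha = 0.1                           # learning rate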

Training:

  1. For each training example, compute the predicted output \(y\) using the current network parameters:

    \[y = v \cdot \text{sigmoid}(w \cdot x + b) + c\]
  2. Compute the cost function \(J(\theta)\) (e.g., the squared error for a single example) between the predicted output \(y\) and the target output \(y_{\text{target}}\):

    \[J(\theta) = \frac{1}{2}(y - y_{\text{target}})^2\]
  3. Compute the gradients of the cost function with respect to the parameters \(\theta\) using backpropagation. For example:

    • \(\frac{\partial J}{\partial v} = (y - y_{\text{target}}) \cdot \text{sigmoid}(w \cdot x + b)\)

    • \(\frac{\partial J}{\partial c} = y - y_{\text{target}}\)

    • \(\frac{\partial J}{\partial w} = (y - y_{\text{target}}) \cdot v \cdot \text{sigmoid}'(w \cdot x + b) \cdot x\)

    • \(\frac{\partial J}{\partial b} = (y - y_{\text{target}}) \cdot v \cdot \text{sigmoid}'(w \cdot x + b)\)

    Here \(\text{sigmoid}'(z) = \text{sigmoid}(z) \cdot (1 - \text{sigmoid}(z))\), which is why the code below computes the derivative directly from the sigmoid output.

  4. Update the parameters using gradient descent:

    • \(v = v - \alpha \cdot \frac{\partial J}{\partial v}\)

    • \(c = c - \alpha \cdot \frac{\partial J}{\partial c}\)

    • \(w = w - \alpha \cdot \frac{\partial J}{\partial w}\)

    • \(b = b - \alpha \cdot \frac{\partial J}{\partial b}\)

Repeat steps 1-4 over all training examples for many iterations (epochs) until the cost function converges to a minimum, at which point the network is trained.
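
To make steps 1-4 concrete before the vectorized NumPy implementation below, here is a minimal scalar sketch of the training loop; the training pair and the initial parameter values are illustrative only:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative training pair and starting parameters
x, y_target = 1.0, 1.0
w, b, v, c = 0.5, 0.0, 0.5, 0.0
alpha = 0.1

for _ in range(1000):
    # Step 1: forward pass
    h = sigmoid(w * x + b)
    y = v * h + c
    # Step 2: cost (squared error for a single example)
    J = 0.5 * (y - y_target) ** 2
    # Step 3: gradients from the bullet list above; h * (1 - h) is sigmoid'(w*x + b)
    dJ_dy = y - y_target
    dJ_dv = dJ_dy * h
    dJ_dc = dJ_dy
    dJ_dw = dJ_dy * v * h * (1 - h) * x
    dJ_db = dJ_dy * v * h * (1 - h)
    # Step 4: gradient-descent updates
    v -= alpha * dJ_dv
    c -= alpha * dJ_dc
    w -= alpha * dJ_dw
    b -= alpha * dJ_db

The full implementation below follows the same procedure in vectorized form, so it can process several training examples at once.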

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sig):
    # `sig` is already the sigmoid output, so sigmoid'(z) = sig * (1 - sig)
    return sig * (1 - sig)

# Initialize random weights and biases
input_size = 1
hidden_size = 1
output_size = 1

np.random.seed(1)
weights_input_hidden = np.random.rand(input_size, hidden_size)
weights_hidden_output = np.random.rand(hidden_size, output_size)

bias_hidden = np.random.rand(1, hidden_size)
bias_output = np.random.rand(1, output_size)

# Define learning rate
learning_rate = 0.01

# Training data
X = np.array([[0], [1]])
y = np.array([[0], [1]])

# Training loop
epochs = 1000
for epoch in range(epochs):
    # Forward pass
    hidden_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_output = sigmoid(hidden_input)

    final_input = np.dot(hidden_output, weights_hidden_output) + bias_output
    predicted_output = final_input  # Linear activation for output layer

    # Calculate error as target minus prediction (the negative of dJ/dy)
    error = y - predicted_output

    # Backpropagation
    output_delta = error
    hidden_error = output_delta.dot(weights_hidden_output.T)
    hidden_delta = hidden_error * sigmoid_derivative(hidden_output)

    # Update weights and biases; '+=' is correct because `error` already
    # carries the minus sign of the gradient
    weights_hidden_output += hidden_output.T.dot(output_delta) * learning_rate
    weights_input_hidden += X.T.dot(hidden_delta) * learning_rate
    bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
    bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate
    
# Testing the trained network
new_data = np.array([[2.5]])
hidden_layer_activation = sigmoid(np.dot(new_data, weights_input_hidden) + bias_hidden)
output = np.dot(hidden_layer_activation, weights_hidden_output) + bias_output

print("Input:", new_data)
print("Predicted Output:", output)
Input: [[2.5]]
Predicted Output: [[0.94234077]]