Gradient Descent Neural Network sigmoid
Here’s a simplified neural network architecture:
- Input layer: One neuron with input \(x\). 
- Hidden layer: One neuron with weight \(w\) and bias \(b\), applying the sigmoid activation function. 
- Output layer: One neuron with weight \(v\) and bias \(c\), applying the linear (identity) activation function. 
The network’s output \(y\) can be expressed as: \[y = v \cdot \text{sigmoid}(w \cdot x + b) + c\]
Your goal is to use gradient descent to train the network by updating the weights and biases to minimize a cost function \(J(\theta)\), where \(\theta\) represents all the network parameters (\(w\), \(b\), \(v\), and \(c\)).
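As a minimal sketch, the forward pass of this one-hidden-neuron network can be written as a scalar function. The function name `forward` and the parameter values below are illustrative assumptions, not values from the text.

```python
import math

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b, v, c):
    # Hidden neuron: sigmoid of the weighted input plus bias
    a = sigmoid(w * x + b)
    # Output neuron: linear (identity) activation
    return v * a + c

# Illustrative parameter values (assumed for this example)
y = forward(x=0.0, w=0.5, b=0.0, v=2.0, c=1.0)
print(y)  # sigmoid(0) = 0.5, so y = 2.0 * 0.5 + 1.0 = 2.0
```

With \(x = 0\) and \(b = 0\) the hidden activation is exactly 0.5, which makes the result easy to check by hand.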
Initialization:
- Initialize the parameters randomly or with predetermined values. 
- Set the learning rate \(\alpha\) (e.g., \(\alpha = 0.1\)). 
Training:
- For each training example, compute the predicted output \(y\) using the current network parameters: \[y = v \cdot \text{sigmoid}(w \cdot x + b) + c\]
- Compute the cost function \(J(\theta)\) between the predicted output \(y\) and the actual target output \(y_{\text{target}}\), e.g., half the squared error for a single example (the factor \(\frac{1}{2}\) cancels when differentiating): \[J(\theta) = \frac{1}{2}(y - y_{\text{target}})^2\]
- Compute the gradients of the cost function with respect to the parameters \(\theta\) using backpropagation. Writing \(a = \text{sigmoid}(w \cdot x + b)\) for the hidden activation:
  - \(\frac{\partial J}{\partial v} = (y - y_{\text{target}}) \cdot a\) 
  - \(\frac{\partial J}{\partial c} = y - y_{\text{target}}\) 
  - \(\frac{\partial J}{\partial w} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial \text{sigmoid}} \cdot \frac{\partial \text{sigmoid}}{\partial(w \cdot x + b)} \cdot \frac{\partial(w \cdot x + b)}{\partial w} = (y - y_{\text{target}}) \cdot v \cdot a(1 - a) \cdot x\) 
  - \(\frac{\partial J}{\partial b} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial \text{sigmoid}} \cdot \frac{\partial \text{sigmoid}}{\partial(w \cdot x + b)} \cdot \frac{\partial(w \cdot x + b)}{\partial b} = (y - y_{\text{target}}) \cdot v \cdot a(1 - a)\) 
 
- Update the parameters using gradient descent:
  - \(v = v - \alpha \cdot \frac{\partial J}{\partial v}\) 
  - \(c = c - \alpha \cdot \frac{\partial J}{\partial c}\) 
  - \(w = w - \alpha \cdot \frac{\partial J}{\partial w}\) 
  - \(b = b - \alpha \cdot \frac{\partial J}{\partial b}\) 
 
Repeat the four training steps above (forward pass, cost, gradients, update) over multiple iterations and examples until the cost stops decreasing appreciably, which indicates that gradient descent has converged to a (possibly local) minimum.
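The training steps can be sketched for a single training example in plain Python. This is an illustration under stated assumptions rather than a definitive implementation: the starting parameters, the example \((x, y_{\text{target}}) = (1, 1)\), and the helper names `cost`, `gradients`, and `numerical_gradients` are all hypothetical. The sketch also compares the backpropagation gradients with central finite differences, a common sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(params, x, y_target):
    # J = 1/2 * (y - y_target)^2 with y = v * sigmoid(w * x + b) + c
    w, b, v, c = params
    y = v * sigmoid(w * x + b) + c
    return 0.5 * (y - y_target) ** 2

def gradients(params, x, y_target):
    # Analytic gradients from the chain rule (backpropagation)
    w, b, v, c = params
    a = sigmoid(w * x + b)       # hidden activation
    y = v * a + c                # network output
    dJ_dy = y - y_target
    da_dz = a * (1.0 - a)        # derivative of sigmoid at w * x + b
    dJ_dw = dJ_dy * v * da_dz * x
    dJ_db = dJ_dy * v * da_dz
    dJ_dv = dJ_dy * a
    dJ_dc = dJ_dy
    return [dJ_dw, dJ_db, dJ_dv, dJ_dc]

def numerical_gradients(params, x, y_target, eps=1e-6):
    # Central finite differences, used only to check the analytic gradients
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        diff = cost(plus, x, y_target) - cost(minus, x, y_target)
        grads.append(diff / (2 * eps))
    return grads

# Assumed starting point, in the order w, b, v, c
params = [0.5, -0.3, 0.8, 0.1]
x, y_target = 1.0, 1.0
alpha = 0.1  # learning rate

analytic = gradients(params, x, y_target)
numeric = numerical_gradients(params, x, y_target)
max_diff = max(abs(g - n) for g, n in zip(analytic, numeric))

cost_before = cost(params, x, y_target)
for _ in range(100):
    grads = gradients(params, x, y_target)
    # Gradient descent update: theta = theta - alpha * dJ/dtheta
    params = [p - alpha * g for p, g in zip(params, grads)]
cost_after = cost(params, x, y_target)

print("max gradient difference:", max_diff)
print("cost before:", cost_before, "cost after:", cost_after)
```

If the analytic and numerical gradients agree closely and the cost falls over the iterations, the derivatives and the update rule are consistent with each other.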
import numpy as np
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def sigmoid_derivative(sig):
    return sig * (1 - sig)
# Initialize random weights and biases
input_size = 1
hidden_size = 1
output_size = 1
np.random.seed(1)
weights_input_hidden = np.random.rand(input_size, hidden_size)
weights_hidden_output = np.random.rand(hidden_size, output_size)
bias_hidden = np.random.rand(1, hidden_size)
bias_output = np.random.rand(1, output_size)
# Define learning rate
learning_rate = 0.01
# Training data
X = np.array([[0], [1]])
y = np.array([[0], [1]])
# Training loop
epochs = 1000
for epoch in range(epochs):
    # Forward pass
    hidden_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_output = sigmoid(hidden_input)
    final_input = np.dot(hidden_output, weights_hidden_output) + bias_output
    predicted_output = final_input  # Linear activation for output layer
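    # Calculate error as (target - prediction), i.e. the negative of dJ/dy
    # for J = 1/2 * (prediction - target)^2
    error = y - predicted_output
    # Backpropagation: the deltas carry the negative gradient,
    # so the updates below add them (+=) instead of subtracting
    output_delta = error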
    hidden_error = output_delta.dot(weights_hidden_output.T)
    hidden_delta = hidden_error * sigmoid_derivative(hidden_output)
    # Update weights and biases
    weights_hidden_output += hidden_output.T.dot(output_delta) * learning_rate
    weights_input_hidden += X.T.dot(hidden_delta) * learning_rate
    bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
    bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate
    
# Testing the trained network
new_data = np.array([[2.5]])
hidden_layer_activation = sigmoid(np.dot(new_data, weights_input_hidden) + bias_hidden)
output = np.dot(hidden_layer_activation, weights_hidden_output) + bias_output
print("Input:", new_data)
print("Predicted Output:", output)
Input: [[2.5]]
Predicted Output: [[0.94234077]]
