Gradient Descent Neural Network sigmoid


Here’s a simplified neural network architecture:

Input layer: One neuron with input \(x\).
Hidden layer: One neuron with weight \(w\) and bias \(b\), applying the sigmoid activation function.
Output layer: One neuron with weight \(v\) and bias \(c\), applying the linear (identity) activation function.

The network’s output \(y\) can be expressed as:

\[y = v \cdot \text{sigmoid}(w \cdot x + b) + c\]
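As a minimal sketch of this formula for a single scalar input, the forward pass can be written in a few lines of Python. The function name `forward` and the parameter values in the example call are illustrative assumptions, not part of the network definition above.

```python
import math

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b, v, c):
    # Hidden neuron: sigmoid activation of the weighted input plus bias
    h = sigmoid(w * x + b)
    # Output neuron: linear (identity) activation
    return v * h + c

# Hypothetical parameter values chosen so the result is easy to check by hand:
# w*x + b = 0, sigmoid(0) = 0.5, so y = 2.0 * 0.5 + 1.0 = 2.0
y = forward(x=0.0, w=0.5, b=0.0, v=2.0, c=1.0)
print(y)
```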

Your goal is to use gradient descent to train the network by updating the weights and biases to minimize a cost function \(J(\theta)\), where \(\theta\) represents all the network parameters (\(w\), \(b\), \(v\), and \(c\)).

Initialization:

Initialize the parameters randomly or with predetermined values.
Set the learning rate \(\alpha\) (e.g., \(\alpha = 0.1\)).

Training:

For each training example, compute the predicted output \(y\) using the current network parameters:

\[y = v \cdot \text{sigmoid}(w \cdot x + b) + c\]
Compute the cost function \(J(\theta)\) between the predicted output \(y\) and the actual target output \(y_{\text{target}}\). For a single example, a common choice is the squared error, halved so that the factor of 2 cancels when differentiating:

\[J(\theta) = \frac{1}{2}(y - y_{\text{target}})^2\]
Compute the gradients of the cost function with respect to the parameters \(\theta\) using backpropagation. Writing \(s = \text{sigmoid}(w \cdot x + b)\) for the hidden activation, and using \(\frac{\partial s}{\partial(w \cdot x + b)} = s(1 - s)\):
- \(\frac{\partial J}{\partial v} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial v} = (y - y_{\text{target}}) \cdot s\)
- \(\frac{\partial J}{\partial c} = y - y_{\text{target}}\)
- \(\frac{\partial J}{\partial w} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial \text{sigmoid}} \cdot \frac{\partial \text{sigmoid}}{\partial(w \cdot x + b)} \cdot \frac{\partial(w \cdot x + b)}{\partial w} = (y - y_{\text{target}}) \cdot v \cdot s(1 - s) \cdot x\)
- \(\frac{\partial J}{\partial b} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial \text{sigmoid}} \cdot \frac{\partial \text{sigmoid}}{\partial(w \cdot x + b)} \cdot \frac{\partial(w \cdot x + b)}{\partial b} = (y - y_{\text{target}}) \cdot v \cdot s(1 - s)\)
Update the parameters using gradient descent:
- \(v = v - \alpha \cdot \frac{\partial J}{\partial v}\)
- \(c = c - \alpha \cdot \frac{\partial J}{\partial c}\)
- \(w = w - \alpha \cdot \frac{\partial J}{\partial w}\)
- \(b = b - \alpha \cdot \frac{\partial J}{\partial b}\)

Repeat the four training steps above (forward pass, cost, gradients, update) over the training examples for multiple iterations until the cost stops decreasing, which indicates that training has converged.
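The training steps above can be sketched for a single example using plain scalars. This is an illustrative sketch, not the only way to implement it: the starting parameter values, the helper names (`cost`, `gradients`, `numerical_gradients`, `gd_step`), and the finite-difference check are assumptions added here to confirm that the chain-rule gradients agree with a numerical estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(params, x, y_target):
    # J = 1/2 * (y - y_target)^2 for one training example
    w, b, v, c = params
    y = v * sigmoid(w * x + b) + c
    return 0.5 * (y - y_target) ** 2

def gradients(params, x, y_target):
    # Chain rule evaluated explicitly; returns [dJ/dw, dJ/db, dJ/dv, dJ/dc]
    w, b, v, c = params
    h = sigmoid(w * x + b)             # hidden activation
    y = v * h + c                      # network output
    dJ_dy = y - y_target
    dJ_dz = dJ_dy * v * h * (1.0 - h)  # gradient at the hidden pre-activation
    return [dJ_dz * x, dJ_dz, dJ_dy * h, dJ_dy]

def numerical_gradients(params, x, y_target, eps=1e-6):
    # Central finite differences, used only as a sanity check
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((cost(plus, x, y_target) - cost(minus, x, y_target)) / (2 * eps))
    return grads

def gd_step(params, x, y_target, alpha=0.1):
    # theta = theta - alpha * dJ/dtheta for every parameter
    grads = gradients(params, x, y_target)
    return [p - alpha * g for p, g in zip(params, grads)]

params = [0.5, -0.2, 0.8, 0.1]  # hypothetical starting values for w, b, v, c
x, y_target = 1.0, 1.0          # hypothetical training example

analytic = gradients(params, x, y_target)
numeric = numerical_gradients(params, x, y_target)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))

new_params = gd_step(params, x, y_target)
print("max gradient difference:", max_diff)
print("cost before:", cost(params, x, y_target))
print("cost after: ", cost(new_params, x, y_target))
```

With a small enough learning rate, the cost after the step should be lower than before; if it rises, reducing \(\alpha\) is the usual first thing to try.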

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sig):
    return sig * (1 - sig)

# Initialize random weights and biases
input_size = 1
hidden_size = 1
output_size = 1

np.random.seed(1)
weights_input_hidden = np.random.rand(input_size, hidden_size)
weights_hidden_output = np.random.rand(hidden_size, output_size)

bias_hidden = np.random.rand(1, hidden_size)
bias_output = np.random.rand(1, output_size)

# Define learning rate
learning_rate = 0.01

# Training data
X = np.array([[0], [1]])
y = np.array([[0], [1]])

# Training loop
epochs = 1000
for epoch in range(epochs):
    # Forward pass
    hidden_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_output = sigmoid(hidden_input)

    final_input = np.dot(hidden_output, weights_hidden_output) + bias_output
    predicted_output = final_input  # Linear activation for output layer

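    # Calculate error as target minus prediction. This is the negative of
    # dJ/d(prediction), so the updates below add (rather than subtract)
    # the gradient terms. Note that y here holds the targets.
    error = y - predicted_output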

    # Backpropagation
    output_delta = error
    hidden_error = output_delta.dot(weights_hidden_output.T)
    hidden_delta = hidden_error * sigmoid_derivative(hidden_output)

    # Update weights and biases
    weights_hidden_output += hidden_output.T.dot(output_delta) * learning_rate
    weights_input_hidden += X.T.dot(hidden_delta) * learning_rate
    bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
    bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate
    
# Testing the trained network
new_data = np.array([[2.5]])
hidden_layer_activation = sigmoid(np.dot(new_data, weights_input_hidden) + bias_hidden)
output = np.dot(hidden_layer_activation, weights_hidden_output) + bias_output

print("Input:", new_data)
print("Predicted Output:", output)

Input: [[2.5]]
Predicted Output: [[0.94234077]]