Cost Functions

Cost functions, also known as loss functions or objective functions, play a crucial role in training artificial neural networks (ANNs). They measure the discrepancy between the network’s predictions and the actual target values during training. The appropriate cost function depends on the specific task at hand, such as regression or classification. Here’s a summary of some common cost functions for ANNs:

1. Mean Squared Error (MSE):

  • Used for regression tasks.

  • Measures the average squared difference between predicted values and actual target values.

  • Penalizes large errors more heavily than small ones, since the differences are squared, which makes it sensitive to outliers.

  • Mathematically, MSE is defined as:

\[J(\theta) = \frac{1}{N} \sum_{i=1}^{N} (y_i - y_{\text{target},i})^2\]
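As a concrete illustration, here is a minimal NumPy sketch of MSE over a batch of predictions (the function name `mse` and the sample values are illustrative, not from the text):

```python
import numpy as np

def mse(y_pred, y_target):
    """Mean squared error: average of the squared residuals."""
    return np.mean((y_pred - y_target) ** 2)

# Small regression batch: each squared residual is averaged
y_pred = np.array([2.5, 0.0, 2.1])
y_target = np.array([3.0, -0.5, 2.0])
print(mse(y_pred, y_target))  # (0.25 + 0.25 + 0.01) / 3 = 0.17
```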

2. Mean Absolute Error (MAE):

  • Another cost function for regression.

  • Measures the average absolute difference between predicted values and actual target values.

  • Less sensitive to outliers compared to MSE.

  • Mathematically, MAE is defined as:

\[J(\theta) = \frac{1}{N} \sum_{i=1}^{N} |y_i - y_{\text{target},i}|\]
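Following the same pattern, a minimal sketch of MAE; note how a single large residual contributes linearly rather than quadratically (values are illustrative):

```python
import numpy as np

def mae(y_pred, y_target):
    """Mean absolute error: average of the absolute residuals."""
    return np.mean(np.abs(y_pred - y_target))

# The outlier residual of 10 shifts MAE by 10/N, whereas
# MSE would be shifted by 100/N
y_pred = np.array([2.5, 0.0, 12.0])
y_target = np.array([3.0, -0.5, 2.0])
print(mae(y_pred, y_target))  # (0.5 + 0.5 + 10.0) / 3 ≈ 3.67
```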

3. Cross-Entropy Loss (Log Loss):

  • Used for binary and multiclass classification tasks.

  • Measures the dissimilarity between predicted class probabilities and true class labels.

  • Commonly used with logistic and softmax activation functions in the output layer.

  • For binary classification:

\[J(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left(y_{\text{target},i} \log(y_i) + (1 - y_{\text{target},i}) \log(1 - y_i)\right)\]

  • For multiclass classification:

\[J(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{\text{target},i,c} \log(y_{i,c})\]
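Both variants can be sketched directly from the formulas above; clipping the predicted probabilities to avoid log(0) is a common numerical-stability assumption, not part of the definition:

```python
import numpy as np

def binary_cross_entropy(y_pred, y_target, eps=1e-12):
    # Clip probabilities so log() never sees exactly 0 or 1
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_target * np.log(y_pred)
                    + (1 - y_target) * np.log(1 - y_pred))

def categorical_cross_entropy(y_pred, y_target, eps=1e-12):
    # y_pred: (N, C) softmax outputs; y_target: (N, C) one-hot labels
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_target * np.log(y_pred), axis=1))

# Binary: a confident correct prediction (0.9) costs little
print(binary_cross_entropy(np.array([0.9, 0.2]), np.array([1.0, 0.0])))

# Multiclass: only the true class's log-probability contributes
probs = np.array([[0.7, 0.2, 0.1]])
onehot = np.array([[1.0, 0.0, 0.0]])
print(categorical_cross_entropy(probs, onehot))  # -log(0.7) ≈ 0.357
```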

4. Hinge Loss (SVM Loss):

  • Used for support vector machines (SVMs) and other margin-based classifiers.

  • Encourages correct classification with a margin between classes.

  • Mathematically, with target labels encoded as -1 or +1, hinge loss is defined as:

\[J(\theta) = \frac{1}{N} \sum_{i=1}^{N} \max(0, 1 - y_{\text{target},i} \cdot y_i)\]
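A minimal sketch, assuming raw (unsquashed) network scores and targets encoded as -1/+1 as noted above; the sample values are illustrative:

```python
import numpy as np

def hinge_loss(scores, y_target):
    # Zero loss once a sample is correctly classified with margin >= 1;
    # otherwise the loss grows linearly with the margin violation
    return np.mean(np.maximum(0.0, 1.0 - y_target * scores))

scores = np.array([2.0, -1.5, 0.3])     # raw classifier outputs
y_target = np.array([1.0, -1.0, -1.0])  # -1/+1 labels
print(hinge_loss(scores, y_target))     # (0 + 0 + 1.3) / 3 ≈ 0.433
```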

5. Huber Loss:

  • Used for regression tasks.

  • Combines the benefits of both MSE and MAE by having a quadratic loss for small errors and a linear loss for large errors.

  • Less sensitive to outliers than MSE.

  • Mathematically, Huber loss is defined as:

\[\begin{split}J(\theta) = \frac{1}{N} \sum_{i=1}^{N} \begin{cases} \frac{1}{2}(y_i - y_{\text{target},i})^2 & \text{if } |y_i - y_{\text{target},i}| \leq \delta \\ \delta |y_i - y_{\text{target},i}| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}\end{split}\]
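A sketch of the piecewise definition above, with np.where selecting the quadratic or linear branch per sample (delta=1.0 is an illustrative choice):

```python
import numpy as np

def huber_loss(y_pred, y_target, delta=1.0):
    residual = np.abs(y_pred - y_target)
    quadratic = 0.5 * residual ** 2                # |error| <= delta
    linear = delta * residual - 0.5 * delta ** 2   # |error| > delta
    return np.mean(np.where(residual <= delta, quadratic, linear))

# Same outlier-heavy batch as the MAE example: the residual of 10
# falls on the linear branch, so it is not squared
y_pred = np.array([2.5, 0.0, 12.0])
y_target = np.array([3.0, -0.5, 2.0])
print(huber_loss(y_pred, y_target))  # (0.125 + 0.125 + 9.5) / 3 = 3.25
```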

The choice of the cost function should align with the specific problem being solved, as different tasks (e.g., regression, binary classification, multiclass classification) require different loss functions.