Problem Sheet 8#

Question 1#

  1. Suppose we have a cost function for producing a product that depends on two variables, \(x\) and \(y\), given by:

\[C(x,y) = 100x^2 + 50xy + 2y^2 - 100x - 50y + 200\]

Find the values of \(x\) and \(y\) that minimize the cost function using the Newton-Raphson method, starting from the initial guess \((x_0, y_0) = (1, 1)\).
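A minimal Python sketch of the iteration is given below (the function names and the fixed loop count are my own choices, not part of the sheet); the gradient and Hessian are worked out by hand from \(C\). Since \(C\) is quadratic, a single Newton step already reaches the stationary point, and the same sketch applies verbatim to Question 2b, which repeats this cost function.

```python
import numpy as np

# Newton-Raphson on grad C = 0 for
# C(x, y) = 100x^2 + 50xy + 2y^2 - 100x - 50y + 200.

def grad(v):
    x, y = v
    return np.array([200 * x + 50 * y - 100,   # dC/dx
                     50 * x + 4 * y - 50])     # dC/dy

def hessian(v):
    return np.array([[200.0, 50.0],            # constant for a quadratic
                     [50.0, 4.0]])

v = np.array([1.0, 1.0])                       # initial guess (x0, y0)
for _ in range(5):
    v = v - np.linalg.solve(hessian(v), grad(v))  # Newton step
print(v)                                       # stationary point of C
```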

Question 2#

a. Suppose we have a loss function of two variables, \(\theta_1\) and \(\theta_2\), given by:

\[J(\theta_1, \theta_2) = \theta_1^2 + \theta_2^2 - 4\theta_1 - 4\theta_2\]

Find the values of \(\theta_1\) and \(\theta_2\) that minimize the loss function using the Newton-Raphson method, with the initial conditions \(\theta_1 = 2.0\) and \(\theta_2 = 2.0\).
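The same recipe applies here; a short sketch (illustrative names only) follows. The gradient is \((2\theta_1 - 4,\ 2\theta_2 - 4)\) and the Hessian is the constant matrix \(2I\), so the gradient already vanishes at the initial condition \((2, 2)\) and the iteration stays there.

```python
import numpy as np

# J(t1, t2) = t1^2 + t2^2 - 4t1 - 4t2

def grad(t):
    return np.array([2 * t[0] - 4, 2 * t[1] - 4])

H = 2 * np.eye(2)                    # constant Hessian of J

t = np.array([2.0, 2.0])             # initial condition
t = t - np.linalg.solve(H, grad(t))  # one Newton step
print(t)                             # stays at (2, 2): already the minimizer
```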

b. Suppose we have a cost function for producing a product that depends on two variables, \(x\) and \(y\), given by:

\[C(x,y) = 100x^2 + 50xy + 2y^2 - 100x - 50y + 200\]

Find the values of \(x\) and \(y\) that minimize the cost function using the Newton-Raphson method, starting from the initial guess \((x_0, y_0) = (1, 1)\).

c. Describe in your own words the steps of the multi-variable Newton-Raphson method.

Question 4#

Describe in your own words the perceptron and the role of different activation functions.
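As a concrete reference point, here is a minimal sketch (the weights and names are made up for illustration) of a perceptron's weighted sum passed through three common activation functions:

```python
import numpy as np

def step(z):      # classic perceptron threshold, output in {0, 1}
    return np.where(z >= 0, 1, 0)

def sigmoid(z):   # smooth and differentiable, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):      # max(0, z), common in hidden layers
    return np.maximum(0, z)

w, b = np.array([0.5, -0.3]), 0.1    # weights and bias (illustrative)
x = np.array([1.0, 2.0])             # input
z = w @ x + b                        # weighted sum
print(step(z), sigmoid(z), relu(z))  # same z, different activations
```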

Question 5#

a. Describe in your own words the McCulloch-Pitts Neuron.

b. State the mathematical formula for the feed-forward calculation of an artificial neural network (ANN) with three inputs, two hidden layers, each having three ReLU (Rectified Linear Unit) nodes, and an output layer with a Sigmoid activation function.

c. State the mathematical formula for the feed-forward calculation of a general artificial neural network (ANN) with \(n_{inputs}\) inputs, \(k\) hidden layers, each having \(n_k\) ReLU (Rectified Linear Unit) nodes, and an output layer with a Sigmoid activation function.
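A hedged Python sketch of the feed-forward pass described in parts b and c follows (random weights for illustration; the layer sizes and names are my own assumptions). Each hidden layer computes \(a^{(l)} = \mathrm{ReLU}(W^{(l)} a^{(l-1)} + b^{(l)})\) and the output layer applies a sigmoid:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights, biases):
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):   # the k hidden ReLU layers
        a = relu(W @ a + b)
    W, b = weights[-1], biases[-1]                # sigmoid output layer
    return sigmoid(W @ a + b)

# Part b's architecture: 3 inputs, two hidden layers of 3 nodes, 1 output.
sizes = [3, 3, 3, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
print(feed_forward(np.array([1.0, 2.0, 3.0]), weights, biases))
```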

Question 6#

a. i. Describe in your own words the gradient descent algorithm and state the strengths and weaknesses of the algorithm.

ii. Find the minimum of the simple quadratic cost function

\[ J(\theta) = \theta^2+2\theta+100 \]

using gradient descent, with the initial guess of \(\theta_0=10\) and a learning rate of \(\alpha = 0.1\), for two iterations.
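A minimal sketch of the two requested updates, with the derivative \(J'(\theta) = 2\theta + 2\) worked out by hand:

```python
# J(t) = t^2 + 2t + 100, so dJ/dt = 2t + 2
theta, alpha = 10.0, 0.1
for i in range(2):
    theta = theta - alpha * (2 * theta + 2)   # gradient-descent update
    print(i + 1, theta)   # iteration 1: 7.8, iteration 2: 6.04
```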

b. In your own words, outline the proof of the following theorem:

Let \( f: \mathbb{R}^n \rightarrow \mathbb{R} \) be a convex and continuously differentiable function. Assume the gradient of \( f \), denoted by \( \nabla f(x) \), is Lipschitz continuous with Lipschitz constant \( L > 0 \). If the learning rate \( \alpha \) satisfies \( 0 < \alpha \leq \frac{1}{L} \), then the sequence \( \{x_k\} \) generated by the gradient descent update rule

\[ x_{k+1} = x_k - \alpha \nabla f(x_k) \]

converges to the global minimum of \( f \).
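As an aid (this is one standard route, not the sheet's required proof), the key inequality is the descent lemma for an \(L\)-Lipschitz gradient,

\[ f(y) \leq f(x) + \nabla f(x)^\top (y - x) + \frac{L}{2}\|y - x\|^2. \]

Substituting \(y = x_{k+1} = x_k - \alpha \nabla f(x_k)\) gives

\[ f(x_{k+1}) \leq f(x_k) - \alpha\left(1 - \frac{L\alpha}{2}\right)\|\nabla f(x_k)\|^2 \leq f(x_k) - \frac{\alpha}{2}\|\nabla f(x_k)\|^2 \quad \text{for } 0 < \alpha \leq \frac{1}{L}, \]

so the iterates make monotone progress; combining this with the convexity inequality \(f(x_k) - f(x^*) \leq \nabla f(x_k)^\top (x_k - x^*)\) and telescoping yields convergence to the global minimum \(x^*\).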

c. Find the minimum of the simple cost function with two variables:

\[J(\theta_0, \theta_1) = \theta_0^2 + 2\theta_1^2,\]

using gradient descent, with the initial guesses

  • \(\theta_0 = 1.0\)

  • \(\theta_1 = 2.0\)

and a learning rate of \(\alpha = 0.1\), for two iterations.
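A sketch of the two iterations (vectorized with NumPy; the gradient \((2\theta_0, 4\theta_1)\) is computed by hand):

```python
import numpy as np

theta = np.array([1.0, 2.0])   # (theta_0, theta_1)
alpha = 0.1
for i in range(2):
    grad = np.array([2 * theta[0], 4 * theta[1]])
    theta = theta - alpha * grad
    print(i + 1, theta)   # iteration 1: (0.8, 1.2); iteration 2: (0.64, 0.72)
```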

d. Given the cost function with three variables:

\[J(\theta_1, \theta_2, \theta_3) = \theta_1^2 + 2\theta_2^2 + 3\theta_3^2\]

Minimize it using gradient descent, given the initial conditions \(\theta_1 = 1.0\), \(\theta_2 = 2.0\), \(\theta_3 = 3.0\) and a learning rate of \(\alpha = 0.1\), for two iterations.
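The same pattern extends directly; a sketch with the hand-computed gradient \((2\theta_1, 4\theta_2, 6\theta_3)\):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])   # (theta_1, theta_2, theta_3)
alpha = 0.1
for i in range(2):
    grad = np.array([2.0, 4.0, 6.0]) * theta   # (2*t1, 4*t2, 6*t3)
    theta = theta - alpha * grad
    print(i + 1, theta)  # iteration 1: (0.8, 1.2, 1.2); iteration 2: (0.64, 0.72, 0.48)
```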

Question 7#

  1. Describe the relevance and the different types of cost functions for artificial neural networks.
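For reference, two of the most common choices, sketched in Python (the names and example values are illustrative):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: typical for regression outputs."""
    return np.mean((y - y_hat) ** 2)

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Cross-entropy: typical for sigmoid (probabilistic) outputs."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])        # targets
y_hat = np.array([0.9, 0.2, 0.7])    # network outputs
print(mse(y, y_hat), binary_cross_entropy(y, y_hat))
```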

Question 8#

a. Outline the back-propagation algorithm for artificial neural networks.
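A minimal numerical sketch of one forward pass, one backward pass, and one parameter update (one ReLU hidden layer, sigmoid output; the shapes, names, and the \(\tfrac{1}{2}\)-scaled squared-error loss are my own choices, not the sheet's):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)                 # single input vector
y = np.array([1.0])                        # target
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)
alpha = 0.1

# Forward pass: keep the intermediates needed by the backward pass.
z1 = W1 @ x + b1
a1 = np.maximum(0, z1)                     # ReLU
z2 = W2 @ a1 + b2
y_hat = 1.0 / (1.0 + np.exp(-z2))          # sigmoid

# Backward pass: apply the chain rule layer by layer from the output,
# for the loss L = 0.5 * (y_hat - y)^2.
delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # dL/dz2
dW2, db2 = np.outer(delta2, a1), delta2
delta1 = (W2.T @ delta2) * (z1 > 0)          # ReLU derivative is the mask z1 > 0
dW1, db1 = np.outer(delta1, x), delta1

# Gradient-descent update of every parameter.
W2, b2 = W2 - alpha * dW2, b2 - alpha * db2
W1, b1 = W1 - alpha * dW1, b1 - alpha * db1
```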