# Problem Sheet 9

## Question 1

a. Prove the following theorem on the convergence of gradient descent for convex and smooth functions:

Let \( f: \mathbb{R}^n \rightarrow \mathbb{R} \) be a convex and continuously differentiable function. Assume the gradient of \( f \), denoted by \( \nabla f(x) \), is Lipschitz continuous with Lipschitz constant \( L > 0 \). If the learning rate \( \alpha \) satisfies \( 0 < \alpha \leq \frac{1}{L} \), then the sequence \( \{x_k\} \) generated by the gradient descent update rule:

\[ x_{k+1} = x_k - \alpha \nabla f(x_k), \]

converges to the global minimum of \( f \), attained at the minimiser \( x^* = \arg\min_x f(x) \).

b. Explain the statement and relevance of the above theorem.

c. Outline in your own words the proof of the above theorem.

d. Outline possible strengths, weaknesses, and pitfalls of implementing gradient descent to find a minimum of a function; a minimal implementation sketch follows this question for reference.
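For reference when discussing parts (a) and (d), the following is a minimal sketch of the update rule in the theorem, assuming NumPy. The quadratic test function, the matrix \(A\), the starting point, and the iteration count are illustrative assumptions and not part of the question; for this choice of \(f\), the Lipschitz constant of the gradient is simply the largest eigenvalue of \(A\).

```python
# A minimal sketch (illustrative, not part of the question) of the update rule
# x_{k+1} = x_k - alpha * grad f(x_k) on a smooth convex quadratic.
import numpy as np

def gradient_descent(grad, x0, alpha, n_steps):
    """Run n_steps of gradient descent from x0 with a fixed learning rate alpha."""
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(n_steps):
        x = x - alpha * grad(x)          # the update rule from the theorem
        history.append(x.copy())
    return np.array(history)

# Illustrative convex test problem: f(x) = 0.5 x^T A x - b^T x with A symmetric
# positive definite, so grad f(x) = A x - b and L = lambda_max(A).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

L = np.max(np.linalg.eigvalsh(A))        # Lipschitz constant of grad f
x_star = np.linalg.solve(A, b)           # global minimiser for this quadratic

xs = gradient_descent(grad_f, x0=np.array([5.0, -5.0]), alpha=1.0 / L, n_steps=200)
print("final iterate:   ", xs[-1])
print("distance to x*:  ", np.linalg.norm(xs[-1] - x_star))
print("f(x_k) - f(x*):  ", f(xs[-1]) - f(x_star))
```

With \( \alpha = 1/L \) the iterates contract towards \( x^* = A^{-1}b \); taking \( \alpha \) too large (for this quadratic, larger than \( 2/L \)) makes the iteration diverge, which is one of the pitfalls part (d) invites you to discuss.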

## Question 2

a. In your own words, explain the statement and briefly outline the proof of the Universal Approximation Theorem for sigmoidal activation functions, stated as follows. Let \( C(X,\mathbb{R}^{m})\) denote the set of continuous functions from a subset \(X\) of Euclidean space \(\mathbb{R}^{n}\) to Euclidean space \(\mathbb{R}^{m}\).

Let \(\sigma\) be any continuous sigmoidal function. Then the finite sums of the form:

\[ G(\vec{x})=\sum_{j=1}^N \alpha_j\sigma(\vec{y}_j\cdot \vec{x}+\theta_j),\]

are dense in \(C([0,1]^n)\). In other words, given any \(f\in C([0,1]^n)\) and \(\epsilon >0\), there is a sum \(G(\vec{x})\) of the above form, for which:

\[ |G(\vec{x})-f(\vec{x}) |<\epsilon,\]

for all \(\vec{x}\in [0,1]^n\). (A small numerical illustration of such a sum is sketched after part b below.)

b. Explain the relevance of the Universal Approximation Theorem to Artificial Neural Networks.
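The theorem is an existence result (Cybenko's proof, via the Hahn–Banach and Riesz representation theorems, is non-constructive), but a sum of the stated form is easy to exhibit numerically. The sketch below, assuming NumPy, takes \(n=1\), draws the inner parameters \(y_j,\theta_j\) at random, and fits the outer coefficients \(\alpha_j\) by least squares against an arbitrary continuous target on \([0,1]\); the target function, the number of terms \(N\), and the fitting procedure are illustrative assumptions only.

```python
# A numerical illustration (not the content of the proof) that finite sums
# G(x) = sum_j alpha_j * sigma(y_j * x + theta_j) can approximate a
# continuous target on [0, 1]; here n = 1, the inner parameters (y_j, theta_j)
# are drawn at random, and the alpha_j are fitted by least squares.
import numpy as np

sigma = lambda t: 1.0 / (1.0 + np.exp(-t))     # the logistic sigmoid

rng = np.random.default_rng(0)
N = 30                                          # number of terms in the sum
y = rng.uniform(-20.0, 20.0, size=N)            # inner weights y_j (illustrative range)
theta = rng.uniform(-20.0, 20.0, size=N)        # offsets theta_j

f = lambda x: np.sin(2 * np.pi * x) + 0.5 * x   # an arbitrary continuous target on [0, 1]

x_train = np.linspace(0.0, 1.0, 400)
Phi = sigma(np.outer(x_train, y) + theta)       # Phi[i, j] = sigma(y_j * x_i + theta_j)
alpha, *_ = np.linalg.lstsq(Phi, f(x_train), rcond=None)

G = lambda x: sigma(np.outer(x, y) + theta) @ alpha

x_test = np.linspace(0.0, 1.0, 1000)
print("max |G(x) - f(x)| on test grid:", np.max(np.abs(G(x_test) - f(x_test))))
```

Driving the reported maximum error below a prescribed \(\epsilon\) generally requires increasing \(N\) (and choosing the inner parameters more carefully than by random sampling), which mirrors how the hidden-layer width of a one-hidden-layer network controls its approximation power, the point at issue in part (b).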