SI152: Numerical Optimization
Lec 1.
Optimization
Three elements of an optimization problem: Objective, Variables, Constraints.
\[\textbf{Objective} : \min_{x\in\mathbb{R}^n} f(x) \\ \textbf{Variables} : x \\ \textbf{Constraints} : \begin{cases} c_i(x) = 0, &i\in\mathcal{E} \\ c_i(x) \leq 0, &i\in\mathcal{I} \\ x\in\Omega \end{cases} \]
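For example (an illustrative instance, not from the lecture), a two-variable problem with one equality constraint, one inequality constraint, and a box set \(\Omega\):
\[\min_{x\in\mathbb{R}^2} \ (x_1-2)^2 + (x_2-1)^2 \quad \text{s.t.} \quad x_1 + x_2 - 1 = 0,\quad x_1^2 - x_2 \leq 0,\quad x\in\Omega=[0,10]^2 \]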
Classification:
- Linear Optimization v.s. Nonlinear Optimization
- Constrained Optimization v.s. Unconstrained Optimization
- Continuous Optimization v.s. Integer Optimization
- Stochastic Optimization v.s. Deterministic Optimization
- Convex Optimization v.s. Nonconvex Optimization
- Single-objective Optimization v.s. Multi-objective Optimization
- Bilevel Optimization v.s. Single-level Optimization
- Bilevel Optimization: \(F(x, y(x))\) is the objective function of the upper level problem, which depends on the solution \(y(x)\) of the lower level problem.
- Global Optimization v.s. Local Optimization
Equation \(\iff\) Optimization
Iterative algorithms: \(x_{k+1} = \mathcal{M} (x_k)\)
Generally, the sequence generated by an iterative algorithm converges to an “optimal solution”.
- Globally convergent algorithm v.s. Locally convergent algorithm
Convergence rates
Let the sequence \(\{x_k\}\) converge to \(x^*\).
Q-linear convergence: If there exists a constant \(c \in [0,1)\) and \(\hat{k}\geq 0\) such that
\[|| x_{k+1} - x^* ||_2 \leq c || x_{k} - x^* ||_2 ~,~\forall k\geq \hat{k} \]
then \(\{x_k\}\) converges Q-linearly to \(x^*\).
- i.e.
\[\limsup_{k\to\infty} \dfrac{|| x_{k+1} - x^* ||_2}{|| x_{k} - x^* ||_2 } < 1 \]
- also called geometric convergence.
Q-superlinear convergence: If there exists a sequence \(\{c_k\} \to 0\) such that
\[|| x_{k+1} - x^* ||_2 \leq c_k || x_{k} - x^* ||_2 \]
then \(\{x_k\}\) converges Q-superlinearly to \(x^*\).
- i.e.
\[\lim_{k\to\infty} \dfrac{|| x_{k+1} - x^* ||_2}{|| x_{k} - x^* ||_2 } = 0 \]
Q-quadratic convergence: If there exists a constant \(c \geq 0\) and \(\hat{k}\geq 0\) such that
\[|| x_{k+1} - x^* ||_2 \leq c || x_{k} - x^* ||_2^2 ~,~\forall k\geq \hat{k} \]
then \(\{x_k\}\) converges Q-quadratically to \(x^*\).
“R-” convergence: skipped. From now on we drop the “Q-” prefix.
Sublinear convergence (Arithmetic Convergence): If the sequence \(\{r_k\}\) converges to \(r^*\) in such a way that
\[|| r_{k+1} - r^* ||_2 \leq C \dfrac{|| r_{0} - r^* ||_2}{k^p} , k\geq 1, 0<p<\infty \]
where \(C\) is a fixed positive number, the sequence is said to converge arithmetically to \(r^*\) with order \(p\).
- i.e.
\[\limsup_{k\to\infty} \dfrac{|| r_{k+1} - r^* ||_2}{|| r_{k} - r^* ||_2 } = 1 \]
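As a quick sanity check (not part of the lecture; the helper name `empirical_rates` and the synthetic error sequences are mine), the ratios appearing in these definitions can be estimated numerically:

```python
import numpy as np

def empirical_rates(errors):
    """Print e_{k+1}/e_k (linear test) and e_{k+1}/e_k^2 (quadratic test)
    for a given sequence of errors e_k = ||x_k - x*||."""
    e = np.asarray(errors, dtype=float)
    for k in range(len(e) - 1):
        print(f"k={k}: ratio={e[k + 1] / e[k]:.3e}, "
              f"quad_ratio={e[k + 1] / e[k] ** 2:.3e}")

# Q-linear example (ratio 0.5) vs. Q-quadratic example (quad_ratio 1).
empirical_rates([0.5 ** k for k in range(1, 8)])
empirical_rates([0.5 ** (2 ** k) for k in range(5)])
```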
Lec 2.
Linear equations
Solution methods:
- Direct methods: Gaussian elimination
- Iterative methods
- Conjugate gradient method
The Jacobi Iteration Method
For the \(n \times n\) linear system \(Ax=b\), let the solution be \(x^*\).
Let \(A=L+D+U\)
\[L=\begin{bmatrix} 0 & \cdots & \cdots & 0 \\ a_{21} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{bmatrix}, \quad D=\begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & a_{nn} \end{bmatrix}, \quad U=\begin{bmatrix} 0 & a_{12} & \cdots & a_{1n} \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & a_{n-1,n} \\ 0 & \cdots & \cdots & 0 \end{bmatrix} \]
Transform \((D+L+U)x=b\) into:
\[x = D^{-1} b - D^{-1}(L+U)x \]
So the Jacobi iteration (matrix form) is:
\[x^{(k+1)} = D^{-1} b - D^{-1}(L+U)x^{(k)} \stackrel{def}{=} Mx^{(k)} + c \]
(Elementwise form):
\[x^{(k+1)}_i = \dfrac{1}{a_{ii}}\left(b_i - \sum_{j\neq i} a_{ij} x^{(k)}_{j} \right) \]
Convergence:
\[x^{(k+1)} - x^* = M(x^{(k)} - x^*) \implies ||x^{(k+1)} - x^*|| \leq ||M||\cdot||x^{(k)} - x^*|| \]
So \(||M||<1 \implies \text{linear convergence}\), where \(||M||\) is the spectral norm of \(M\).
Weighted Jacobi method (1):
\[x^{(k+1)} = (1-\omega)x^{(k)} + \omega D^{-1}(b - (L+U)x^{(k)}) \]
Weighted Jacobi method (2):
\[\begin{aligned} x^{(k+1)} &= (1-\omega)x^{(k)} + \omega D^{-1}(b - (L+U)x^{(k)}) \\ &= x^{(k)} + \omega D^{-1}(b - Ax^{(k)}) \end{aligned} \]
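A minimal sketch of this iteration in the residual form \(x^{(k+1)} = x^{(k)} + \omega D^{-1}(b - Ax^{(k)})\) (the function name, stopping rule, and test matrix are mine, not from the lecture; \(\omega = 1\) recovers plain Jacobi):

```python
import numpy as np

def weighted_jacobi(A, b, omega=1.0, x0=None, tol=1e-10, max_iter=500):
    """(Weighted) Jacobi: x <- x + omega * D^{-1} (b - A x)."""
    x = np.zeros(len(b)) if x0 is None else np.asarray(x0, dtype=float)
    d = np.diag(A)                      # diagonal entries a_ii (assumed nonzero)
    for k in range(max_iter):
        r = b - A @ x                   # residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        x = x + omega * r / d           # elementwise division by a_ii
    return x, max_iter

# Strictly diagonally dominant system, so the iteration converges.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = weighted_jacobi(A, b)
print(x, iters)
```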
Gauss-Seidel Method
(Matrix form):
\[x^{(k+1)} = (L+D)^{-1} (b - U x^{(k)}) \]
- Often converges faster than Jacobi.
- Still depends on the spectral radius but with improved convergence.
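A sketch of the elementwise sweep (equivalent to applying \((L+D)^{-1}\) by forward substitution); the function name and defaults are mine:

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=500):
    """Gauss-Seidel: like Jacobi, but each updated component is used
    immediately within the same sweep."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    for k in range(max_iter):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k
    return x, max_iter
```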
Nonlinear equations
Consider the nonlinear equation \(f(x)=0\) with solution \(x_*\).
Definition
A function \(G\) is Lipschitz continuous with constant \(L \geq 0\) in \(\mathcal{X}\) if
\[\lVert G(x_1) -G(x_2) \rVert \leq L \lVert x_1 - x_2 \rVert ~,~\forall x_1, x_2 \in \mathcal{X} \]
Bisection Method
Uses the Intermediate Value Theorem:
If \(f(x)\) is continuous on \([a, b]\) and \(f(a) \cdot f(b) < 0\), then \(\exists c\in (a, b)\) such that \(f(c) = 0\).
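A minimal sketch of the resulting algorithm (interval halving until the bracket is small; the tolerance and example are mine):

```python
def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Bisection: assumes f is continuous on [a, b] with f(a) * f(b) < 0."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    for _ in range(max_iter):
        c = 0.5 * (a + b)
        fc = f(c)
        if fc == 0.0 or 0.5 * (b - a) < tol:
            return c
        if fa * fc < 0:        # root lies in [a, c]
            b, fb = c, fc
        else:                  # root lies in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

print(bisection(lambda x: x ** 3 - 2.0, 1.0, 2.0))   # ~ 2 ** (1/3)
```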
Newton’s method
From:
\[f(x) = f(x_k) + f'(x_k)(x-x_k) + o(x-x_k) \]
Then:
\[x_{k+1} := x_{k} - \dfrac{f(x_{k})}{f'(x_{k})} \]
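A minimal sketch of this update (stopping rule and example are mine; it assumes \(f'(x_k) \neq 0\) along the way):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for a scalar equation f(x) = 0."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)      # x_{k+1} = x_k - f(x_k) / f'(x_k)
    return x

# sqrt(2) as the root of f(x) = x^2 - 2
print(newton(lambda x: x ** 2 - 2.0, lambda x: 2.0 * x, x0=1.0))
```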
Quadratic convergence:
Let \(e_k = x_k - x_*\)
\[\begin{aligned} e_{k+1} &= e_{k} - \dfrac{0 + f'(x_*)e_k + \frac{f''(x_*)}{2}e_k^2 + O(e_k^3)}{f'(x_*)+f''(x_*)e_k + O(e_k^2)} \\ &= e_{k} - e_{k}\left(1 + \dfrac{f''(x_*)}{2f'(x_*)}e_k + O(e_k^2)\right) \left(1 - \dfrac{f''(x_*)}{f'(x_*)}e_k + O(e_k^2)\right) \\ &= \dfrac{f''(x_*)}{2f'(x_*)}e_k^2 \end{aligned} \]
Problems of Newton’s method
- May cause cycling and even divergence. (\(f(x) = \arctan(x)\))
- May have \(f'(x_k) = 0\). (\(f(x) = x^3\))
- \(f(x_k)\) may be undefined.
Secant method
Change tangent into secant:
\[x_{k+1} := x_{k} - \dfrac{x_{k} - x_{k-1}}{f(x_{k}) - f(x_{k-1})} f(x_{k}) \]
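A minimal sketch (two starting points are required instead of a derivative; names and stopping rule are mine, and it assumes \(f(x_k) \neq f(x_{k-1})\)):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: replace f'(x_k) by the slope through the last two iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            break
        x2 = x1 - (x1 - x0) / (f1 - f0) * f1    # secant update
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1

print(secant(lambda x: x ** 2 - 2.0, 1.0, 2.0))     # ~ sqrt(2)
```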
Superlinear convergence:
Let \(e_k = x_k - x_*\), \(M=\dfrac{f''(x_*)}{2f'(x_*)}\)
\[\begin{aligned} f(x_*+e_k) &\approx e_k f'(x_*) (1+M e_k) \\ f(x_*+e_k) - f(x_*+e_{k-1}) &\approx f'(x_*) (e_k-e_{k-1}) (1+M(e_k+e_{k-1})) \\ e_{k+1} &\approx e_{k} - \dfrac{e_k (1+M e_k)}{1+M(e_k+e_{k-1})} \\ &= \dfrac{M e_{k-1}e_{k}}{1+M(e_k+e_{k-1})} \approx M e_{k-1}e_{k} \end{aligned} \]
If \(|e_{k+1}|\approx C|e_k|^p\), then
\[C|e_k|^p \approx |M||e_{k-1}||e_{k}| \\ |e_k| = C|e_{k-1}|^p \approx \left(\dfrac{|M|}{C}\right)^{\frac{1}{p-1}} |e_{k-1}|^{\frac{1}{p-1}} \]
Matching exponents gives \(p = \frac{1}{p-1}\), i.e., \(p = \frac{\sqrt{5}+1}{2}\), and matching constants gives \(C = |M|^{\frac{\sqrt{5}-1}{2}}\), so we can get:
\[|e_{k+1}|\approx \left| \dfrac{f''(x_*)}{2f'(x_*)} \right|^{\frac{\sqrt{5}-1}{2}}|e_k|^{\frac{\sqrt{5}+1}{2}} \]
Newton’s method for multivariable
Suppose we have \(F: \mathbb{R}^n \to \mathbb{R}^n\) and want to solve \(F(x) = 0\).
Then
\[\begin{gathered} \nabla F(x_k)^T d_k = -F(x_k) \\ x_{k+1} = x_k + d_k \end{gathered} \]
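A minimal sketch for a square system with an explicit Jacobian (the helper `newton_system` and the 2-D example are mine; here `J(x)` denotes \(\nabla F(x)^T\)):

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    """Newton's method for F(x) = 0 with x in R^n; J(x) is the Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        d = np.linalg.solve(J(x), -Fx)   # solve J(x_k) d_k = -F(x_k)
        x = x + d
    return x

# Example: intersect x1^2 + x2^2 = 4 with x2 = exp(x1) - 1.
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 4.0,
                        np.exp(x[0]) - x[1] - 1.0])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]],
                        [np.exp(x[0]), -1.0]])
print(newton_system(F, J, x0=[1.0, 1.0]))
```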
Quadratic convergence:
Newton’s method converges quadratically! (under nice assumptions)
- \(F\) is continuously differentiable in an open convex set \(\mathcal{X}\subset\mathbb{R}^n\) with \(x_* \in\mathcal{X}\).
- The Jacobian of \(F\) at \(x_*\) is invertible and its inverse is bounded in norm by \(M > 0\), i.e.,
\[\lVert \nabla F(x_*)^{-T} \rVert _2 \leq M \]
- If \(M\) is large, then following the gradient may send us far away.
- For some neighborhood of \(x_*\) with radius \(r > 0\) contained in \(\mathcal{X}\), i.e.,
\[\mathbb{B}(x_*, r) := \{ x\in\mathbb{R}^n \mid \lVert x-x_* \rVert _2 \leq r \}\subset \mathcal{X} \]
the Jacobian of \(F\) is Lipschitz continuous with constant \(L\) in \(\mathbb{B}(x_*, r)\).
- If \(L\) is large, then the gradients of the functions change rapidly.
Equations and Optimization
Linear
\[Kx = y \implies \min_x \dfrac{1}{2}\lVert Kx-y \rVert _2^2 \iff K^T(Kx-y)=0 \]
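A quick numerical check (random data; the variable names are mine): the least-squares solution returned by `np.linalg.lstsq` satisfies the normal equations \(K^T(Kx - y) = 0\).

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((20, 3))          # overdetermined system K x = y
y = rng.standard_normal(20)

x_lstsq, *_ = np.linalg.lstsq(K, y, rcond=None)   # minimize ||K x - y||_2
x_normal = np.linalg.solve(K.T @ K, K.T @ y)      # normal equations
print(np.allclose(x_lstsq, x_normal))             # True
print(np.linalg.norm(K.T @ (K @ x_lstsq - y)))    # ~ 0
```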
Nonlinear
- Nonlinear equation \(\implies\) Optimization
- \(F(x) = 0 \implies \min_{x\in\mathbb{R}^n} f(x) := \frac{1}{2}\lVert F(x) \rVert _2^2\)
- However, we generally do not prefer to solve the latter, since
\[ \nabla f(x) = \nabla F(x) F(x), \quad \nabla^2 f(x) = \nabla F(x) \nabla F(x)^T + \sum_i F_i(x)\nabla^2 F_i(x) \]
so the Hessian involves second derivatives of \(F\), and stationary points of \(f\) need not solve \(F(x) = 0\).
- Optimization \(\implies\) Nonlinear equation
- For any \(C^1\)-smooth \(f\),
\[\min_{x\in\mathbb{R}^n} f(x) \implies \nabla f(x) = 0 \]
(a first-order necessary condition).
- For any \(C^1\)-smooth convex \(f\), the implication becomes an equivalence: \(\nabla f(x) = 0 \iff x\) is a global minimizer.