SI152: Numerical Optimization
Lec 1.
Optimization
Three elements of an optimization problem: Objective, Variables, Constraints.
\[\textbf{Objective} : \min_{x\in\mathbb{R}^n} f(x) \\ \textbf{Variables} : x \\ \textbf{Constraints} : \begin{cases} c_i(x) = 0, &i\in\mathcal{E} \\ c_i(x) \leq 0, &i\in\mathcal{I} \\ x\in\Omega \end{cases} \]
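For example (an illustrative instance, not from the lecture), a two-variable problem with one equality constraint, one inequality constraint, and a box set \(\Omega\):
\[\min_{x\in\mathbb{R}^2} \ (x_1-2)^2 + (x_2-1)^2 \quad \text{s.t.} \quad x_1 + x_2 - 1 = 0,\quad x_1^2 - x_2 \leq 0,\quad x\in\Omega=[0,10]^2 \]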
Classification:
- Linear Optimization v.s. Nonlinear Optimization
- Constrained Optimization v.s. Unconstrained Optimization
- Continuous Optimization v.s. Integer Optimization
- Stochastic Optimization v.s. Deterministic Optimization
- Convex Optimization v.s. Nonconvex Optimization
- Single-objective Optimization v.s. Multi-objective Optimization
- Bilevel Optimization v.s. Single-level Optimization
- Bilevel Optimization: \(F(x, y(x))\) is the objective function of the upper level problem, which depends on the solution \(y(x)\) of the lower level problem.
- Global Optimization v.s. Local Optimization
Equation \(\iff\) Optimization
Iterative algorithms: \(x_{k+1} = \mathcal{M} (x_k)\)
Generally, the sequence generated by an iterative algorithm converges to an “optimal solution”.
- Globally convergent algorithm v.s. Locally convergent algorithm
Convergence rates
Let the sequence \(\{x_k\}\) converge to \(x^*\).
Q-linear convergence: If there exists a constant \(c \in [0,1)\) and \(\hat{k}\geq 0\) such that
\[|| x_{k+1} - x^* ||_2 \leq c || x_{k} - x^* ||_2 ~,~\forall k\geq \hat{k} \]
then \(\{x_k\}\) converges Q-linearly to \(x^*\).
- i.e.
\[\limsup_{k\to\infty} \dfrac{|| x_{k+1} - x^* ||_2}{|| x_{k} - x^* ||_2 } < 1 \]
- also called geometric convergence.
Q-superlinear convergence: If there exists a sequence \(\{c_k\} \to 0\) such that
\[|| x_{k+1} - x^* ||_2 \leq c_k || x_{k} - x^* ||_2 \]
then \(\{x_k\}\) converges Q-superlinearly to \(x^*\).
- i.e.
\[\lim_{k\to\infty} \dfrac{|| x_{k+1} - x^* ||_2}{|| x_{k} - x^* ||_2 } = 0 \]
Q-quadratic convergence: If there exists a constant \(c \geq 0\) and \(\hat{k}\geq 0\) such that
\[|| x_{k+1} - x^* ||_2 \leq c || x_{k} - x^* ||_2^2 ~,~\forall k\geq \hat{k} \]
then \(\{x_k\}\) converges Q-quadratically to \(x^*\).
“R-” convergence: skipped. From now on we drop the “Q-” prefix.
Sublinear convergence (Arithmetic Convergence): If the sequence \(\{r_k\}\) converges to \(r^*\) in such a way that
\[|| r_{k+1} - r^* ||_2 \leq C \dfrac{|| r_{0} - r^* ||_2}{k^p} , k\geq 1, 0<p<\infty \]
where \(C\) is a fixed positive number, the sequence is said to converge arithmetically to \(r^*\) with order \(p\).
- i.e.
\[\limsup_{k\to\infty} \dfrac{|| r_{k+1} - r^* ||_2}{|| r_{k} - r^* ||_2 } = 1 \]
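As a quick sanity check (not part of the lecture; the helper name `empirical_rates` and the synthetic error sequences are mine), the ratios appearing in these definitions can be estimated numerically:

```python
import numpy as np

def empirical_rates(errors):
    """Print e_{k+1}/e_k (linear test) and e_{k+1}/e_k^2 (quadratic test)
    for a given sequence of errors e_k = ||x_k - x*||."""
    e = np.asarray(errors, dtype=float)
    for k in range(len(e) - 1):
        print(f"k={k}: ratio={e[k + 1] / e[k]:.3e}, "
              f"quad_ratio={e[k + 1] / e[k] ** 2:.3e}")

# Q-linear example (ratio 0.5) vs. Q-quadratic example (quad_ratio 1).
empirical_rates([0.5 ** k for k in range(1, 8)])
empirical_rates([0.5 ** (2 ** k) for k in range(5)])
```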
Lec 2.
Linear equations
Solution methods:
- Direct methods: Gaussian elimination
- Iterative methods
- Conjugate gradient method
The Jacobi Iteration Method
For the \(n \times n\) linear system \(Ax=b\), let the solution be \(x^*\).
Let \(A=L+D+U\)
\[L=\begin{bmatrix} 0 & \cdots & \cdots & 0 \\ a_{21} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{bmatrix}, \quad D=\begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & a_{nn} \end{bmatrix}, \quad U=\begin{bmatrix} 0 & a_{12} & \cdots & a_{1n} \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & a_{n-1,n} \\ 0 & \cdots & \cdots & 0 \end{bmatrix} \]
Transform \((D+L+U)x=b\) into:
\[x = D^{-1} b - D^{-1}(L+U)x \]
So the Jacobi iteration (matrix form) is:
\[x^{(k+1)} = D^{-1} b - D^{-1}(L+U)x^{(k)} \stackrel{def}{=} Mx^{(k)} + c \]
(Elementwise form):
\[x^{(k+1)}_i = \dfrac{1}{a_{ii}}\left(b_i - \sum_{j\neq i} a_{ij} x^{(k)}_{j} \right) \]
Convergence:
\[x^{(k+1)} - x^* = M(x^{(k)} - x^*) \implies ||x^{(k+1)} - x^*|| \leq ||M||\cdot||x^{(k)} - x^*|| \]
So \(||M||<1 \implies \text{linear convergence}\), where \(||M||\) is the spectral norm of \(M\).
Weighted Jacobi method (1):
\[x^{(k+1)} = (1-\omega)x^{(k)} + \omega D^{-1}(b - (L+U)x^{(k)}) \]
Weighted Jacobi method (2):
\[\begin{aligned} x^{(k+1)} &= (1-\omega)x^{(k)} + \omega D^{-1}(b - (L+U)x^{(k)}) \\ &= x^{(k)} + \omega D^{-1}(b - Ax^{(k)}) \end{aligned} \]
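A minimal sketch of this iteration in the residual form \(x^{(k+1)} = x^{(k)} + \omega D^{-1}(b - Ax^{(k)})\) (the function name, stopping rule, and test matrix are mine, not from the lecture; \(\omega = 1\) recovers plain Jacobi):

```python
import numpy as np

def weighted_jacobi(A, b, omega=1.0, x0=None, tol=1e-10, max_iter=500):
    """(Weighted) Jacobi: x <- x + omega * D^{-1} (b - A x)."""
    x = np.zeros(len(b)) if x0 is None else np.asarray(x0, dtype=float)
    d = np.diag(A)                      # diagonal entries a_ii (assumed nonzero)
    for k in range(max_iter):
        r = b - A @ x                   # residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        x = x + omega * r / d           # elementwise division by a_ii
    return x, max_iter

# Strictly diagonally dominant system, so the iteration converges.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = weighted_jacobi(A, b)
print(x, iters)
```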
Gauss-Seidel Method
(Matrix form):
\[x^{(k+1)} = (L+D)^{-1} (b - U x^{(k)}) \]
- Often converges faster than Jacobi.
- Still depends on the spectral radius but with improved convergence.
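A sketch of the elementwise sweep (equivalent to applying \((L+D)^{-1}\) by forward substitution); the function name and defaults are mine:

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=500):
    """Gauss-Seidel: like Jacobi, but each updated component is used
    immediately within the same sweep."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    for k in range(max_iter):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k
    return x, max_iter
```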
Nonlinear equations
Consider the nonlinear equation \(f(x)=0\) with solution \(x_*\).
Definition
A function \(G\) is Lipschitz continuous with constant \(L \geq 0\) in \(\mathcal{X}\) if
\[\lVert G(x_1) -G(x_2) \rVert \leq L \lVert x_1 - x_2 \rVert ~,~\forall x_1, x_2 \in \mathcal{X} \]
Bisection Method
Uses the Intermediate Value Theorem:
If \(f(x)\) is continuous on \([a, b]\) and \(f(a) \cdot f(b) < 0\), then \(\exists c\in (a, b)\) such that \(f(c) = 0\).
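A minimal sketch of the resulting algorithm (interval halving until the bracket is small; the tolerance and example are mine):

```python
def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Bisection: assumes f is continuous on [a, b] with f(a) * f(b) < 0."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    for _ in range(max_iter):
        c = 0.5 * (a + b)
        fc = f(c)
        if fc == 0.0 or 0.5 * (b - a) < tol:
            return c
        if fa * fc < 0:        # root lies in [a, c]
            b, fb = c, fc
        else:                  # root lies in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

print(bisection(lambda x: x ** 3 - 2.0, 1.0, 2.0))   # ~ 2 ** (1/3)
```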
Newton’s method
From:
\[f(x) = f(x_k) + f'(x_k)(x-x_k) + o(x-x_k) \]
Then:
\[x_{k+1} := x_{k} - \dfrac{f(x_{k})}{f'(x_{k})} \]
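A minimal sketch of this update (stopping rule and example are mine; it assumes \(f'(x_k) \neq 0\) along the way):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for a scalar equation f(x) = 0."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)      # x_{k+1} = x_k - f(x_k) / f'(x_k)
    return x

# sqrt(2) as the root of f(x) = x^2 - 2
print(newton(lambda x: x ** 2 - 2.0, lambda x: 2.0 * x, x0=1.0))
```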
Quadratic convergence:
Let \(e_k = x_k - x_*\)
\[\begin{aligned} e_{k+1} &= e_{k} - \dfrac{0 + f'(x_*)e_k + \frac{f''(x_*)}{2}e_k^2 + O(e_k^3)}{f'(x_*)+f''(x_*)e_k + O(e_k^2)} \\ &= e_{k} - e_{k}\left(1 + \dfrac{f''(x_*)}{2f'(x_*)}e_k + O(e_k^2)\right) \left(1 - \dfrac{f''(x_*)}{f'(x_*)}e_k + O(e_k^2)\right) \\ &= \dfrac{f''(x_*)}{2f'(x_*)}e_k^2 \end{aligned} \]
Problems of Newton’s method
- May cause cycling and even divergence. (\(f(x) = \arctan(x)\))
- May have \(f'(x_k) = 0\). (\(f(x) = x^3\))
- \(f(x_k)\) may be undefined.
Secant method
Change tangent into secant:
\[x_{k+1} := x_{k} - \dfrac{x_{k} - x_{k-1}}{f(x_{k}) - f(x_{k-1})} f(x_{k}) \]
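A minimal sketch (two starting points are required instead of a derivative; names and stopping rule are mine, and it assumes \(f(x_k) \neq f(x_{k-1})\)):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: replace f'(x_k) by the slope through the last two iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            break
        x2 = x1 - (x1 - x0) / (f1 - f0) * f1    # secant update
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1

print(secant(lambda x: x ** 2 - 2.0, 1.0, 2.0))     # ~ sqrt(2)
```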
Superlinear convergence:
Let \(e_k = x_k - x_*\), \(M=\dfrac{f''(x_*)}{2f'(x_*)}\)
\[\begin{aligned} f(x_*+e_k) &\approx e_k f'(x_*) (1+M e_k) \\ f(x_*+e_k) - f(x_*+e_{k-1}) &\approx f'(x_*) (e_k-e_{k-1}) (1+M(e_k+e_{k-1})) \\ e_{k+1} &\approx e_{k} - \dfrac{e_k (1+M e_k)}{1+M(e_k+e_{k-1})} \\ &= \dfrac{M e_{k-1}e_{k}}{1+M(e_k+e_{k-1})} \approx M e_{k-1}e_{k} \end{aligned} \]
If \(|e_{k+1}|\approx C|e_k|^p\), then
\[C|e_k|^p \approx |M||e_{k-1}||e_{k}| \\ |e_k| = C|e_{k-1}|^p \approx \left(\dfrac{|M|}{C}\right)^{\frac{1}{p-1}} |e_{k-1}|^{\frac{1}{p-1}} \]
Matching exponents gives \(p = \frac{1}{p-1}\), i.e., \(p = \frac{\sqrt{5}+1}{2}\), and matching constants gives \(C = |M|^{\frac{\sqrt{5}-1}{2}}\), so we can get:
\[|e_{k+1}|\approx \left| \dfrac{f''(x_*)}{2f'(x_*)} \right|^{\frac{\sqrt{5}-1}{2}}|e_k|^{\frac{\sqrt{5}+1}{2}} \]
Newton’s method for multivariable
Suppose we have \(F: \mathbb{R}^n \to \mathbb{R}^n\) and want to solve \(F(x) = 0\).
Then
\[\begin{gathered} \nabla F(x_k)^T d_k = -F(x_k) \\ x_{k+1} = x_k + d_k \end{gathered} \]
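A minimal sketch for a square system with an explicit Jacobian (the helper `newton_system` and the 2-D example are mine; here `J(x)` denotes \(\nabla F(x)^T\)):

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    """Newton's method for F(x) = 0 with x in R^n; J(x) is the Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        d = np.linalg.solve(J(x), -Fx)   # solve J(x_k) d_k = -F(x_k)
        x = x + d
    return x

# Example: intersect x1^2 + x2^2 = 4 with x2 = exp(x1) - 1.
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 4.0,
                        np.exp(x[0]) - x[1] - 1.0])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]],
                        [np.exp(x[0]), -1.0]])
print(newton_system(F, J, x0=[1.0, 1.0]))
```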
Quadratic convergence:
Newton’s method converges quadratically! (under nice assumptions)
- \(F\) is continuously differentiable in an open convex set \(\mathcal{X}\subset\mathbb{R}^n\) with \(x_* \in\mathcal{X}\).
- The Jacobian of \(F\) at \(x_*\) is invertible and its inverse is bounded in norm by \(M > 0\), i.e.,
\[\lVert \nabla F(x_*)^{-T} \rVert _2 \leq M \]
- If \(M\) is large, then following the gradient may send us far away.
- For some neighborhood of \(x_*\) with radius \(r > 0\) contained in \(\mathcal{X}\), i.e.,
\[\mathbb{B}(x_*, r) := \{ x\in\mathbb{R}^n \mid \lVert x-x_* \rVert _2 \leq r \}\subset \mathcal{X} \]
the Jacobian of \(F\) is Lipschitz continuous with constant \(L\) in \(\mathbb{B}(x_*, r)\).
- If \(L\) is large, then the gradients of the functions change rapidly.
Equations and Optimization
Linear
\[Kx = y \implies \min_x \dfrac{1}{2}\lVert Kx-y \rVert _2^2 \iff K^T(Kx-y)=0 \]
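A quick numerical check (random data; the variable names are mine): the least-squares solution returned by `np.linalg.lstsq` satisfies the normal equations \(K^T(Kx - y) = 0\).

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((20, 3))          # overdetermined system K x = y
y = rng.standard_normal(20)

x_lstsq, *_ = np.linalg.lstsq(K, y, rcond=None)   # minimize ||K x - y||_2
x_normal = np.linalg.solve(K.T @ K, K.T @ y)      # normal equations
print(np.allclose(x_lstsq, x_normal))             # True
print(np.linalg.norm(K.T @ (K @ x_lstsq - y)))    # ~ 0
```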
Nonlinear
- Nonlinear equation \(\implies\) Optimization
- \(F(x) = 0 \implies \min_{x\in\mathbb{R}^n} f(x) := \frac{1}{2}\lVert F(x) \rVert _2^2\)
- However, we generally do not prefer to solve the latter, since
\[ \nabla f(x) = \nabla F(x) F(x), \quad \nabla^2 f(x) = \nabla F(x) \nabla F(x)^T + \sum_i F_i(x)\nabla^2 F_i(x) \]
so the Hessian involves second derivatives of \(F\), and stationary points of \(f\) need not solve \(F(x) = 0\).
- Optimization \(\implies\) Nonlinear equation
- For any \(C^1\)-smooth \(f\),
\[\min_{x\in\mathbb{R}^n} f(x) \implies \nabla f(x) = 0 \]
(a first-order necessary condition).
- For any \(C^1\)-smooth convex \(f\), the implication becomes an equivalence: \(\nabla f(x) = 0 \iff x\) is a global minimizer.