M is a positive semidefinite matrix \(\iff\) all principal submatrices \(P\) of \(M\) are PSD
Note: This follows by considering the quadratic form \(x^T Mx\) and restricting \(x\) to the index set that defines the principal submatrix. The converse direction (taking \(P = M\)) is trivial.
M is PSD \(\iff\) all principal minors are non-negative
Write \(M\) as a quadratic form:
\[
x^T M x = \sum_{i,j}M_{ij}x_ix_j
\]
Taking \(x\) to be the standard basis vector \(e_i\) gives \(M_{ii} \ge 0\), hence \(\mathbf{tr}(M) \ge 0\). Next, take \(x\) to be zero everywhere except at positions \(i\) and \(j\); this restricts the quadratic form to the \(2\times 2\) principal submatrix on \(\{i,j\}\), whose determinant (a principal minor) must be non-negative:
\[
\begin{gathered}
M_{ii}M_{jj} - M_{ij}^2 \ge 0 ~~(\text{PSD}) \\
\implies |M_{ij}| \le \sqrt{M_{ii}M_{jj}} \le \frac{M_{ii} + M_{jj}}{2}
\end{gathered}
\]
where the last step is the AM-GM inequality.
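These entrywise bounds are easy to sanity-check numerically. A minimal sketch, using \(M = AA^T\) as an arbitrary PSD matrix (the matrix and sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = A @ A.T  # any matrix of the form A A^T is PSD

n = M.shape[0]
for i in range(n):
    for j in range(n):
        # |M_ij| <= sqrt(M_ii M_jj) <= (M_ii + M_jj) / 2
        assert abs(M[i, j]) <= np.sqrt(M[i, i] * M[j, j]) + 1e-9
        assert np.sqrt(M[i, i] * M[j, j]) <= (M[i, i] + M[j, j]) / 2 + 1e-9
```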
General definition of a norm:
Matrix norm:
Spectral radius
Two ==equivalent== ways to represent a convex set:
A closed convex set \(S\) is the intersection of all closed halfspaces \(H\) containing it.
Polar
Let \(S \subseteq \mathbb{R}^n\) be a convex set containing the origin. The polar of \(S\) is defined as follows
\[
S^{\circ} := \{y ~|~ y^Tx \le 1, ~\forall x \in S\}
\]
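A small sketch of the definition for \(S\) the Euclidean unit ball (an assumed example): \(\sup_{x \in S} y^Tx = \lVert y \rVert_2\), attained at \(x = y/\lVert y\rVert\), so \(S^\circ = \{y : \lVert y\rVert_2 \le 1\}\) and the unit ball is self-polar.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(4)

# The supremum over the unit ball is ||y||_2, attained at x = y/||y||.
x_opt = y / np.linalg.norm(y)
assert abs(x_opt @ y - np.linalg.norm(y)) < 1e-12

# Random points of the ball never beat that supremum (Cauchy-Schwarz).
samples = rng.standard_normal((10000, 4))
samples /= np.maximum(np.linalg.norm(samples, axis=1, keepdims=True), 1.0)
assert np.max(samples @ y) <= np.linalg.norm(y) + 1e-12
```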
Note
Properties of the polar:
Polar duality of convex cones
Notes
Conjugation of convex functions
Let \(f: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}\) be a convex function. The ==conjugate== of \(f\) is
\[
f^*(y) := \sup\limits_{x}(y^Tx - f(x))
\]
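A quick numerical sketch, using the standard self-conjugate example \(f(x) = x^2/2\), whose conjugate is \(f^*(y) = y^2/2\); the sup is approximated on a fine grid (grid bounds are an assumption that covers the maximizer \(x = y\)):

```python
import numpy as np

# f(x) = x^2/2 is self-conjugate: f*(y) = sup_x (yx - x^2/2) = y^2/2.
x = np.linspace(-10.0, 10.0, 200001)

def conjugate(y):
    # approximate the supremum over the grid
    return np.max(y * x - 0.5 * x**2)

for y in (-2.0, 0.5, 3.0):
    assert abs(conjugate(y) - 0.5 * y**2) < 1e-6
```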
Properties of the conjugate
Affine functions are convex: \(f(x) = a^T x+b\)
Affine functions are both convex and concave.
Every _norm_ is convex.
Proof: let \(\pi(x)\) be a norm; then for \(\lambda \in [0,1]\), \(\pi(\lambda x + (1-\lambda)y) \le \pi(\lambda x) + \pi((1-\lambda)y) = \lambda\pi(x) + (1-\lambda)\pi(y)\), by the triangle inequality and absolute homogeneity.
\(f\) is convex \(\iff\) epi(\(f\)) is convex
A convex function \(f\) is called closed if its epigraph is a closed set.
Corollary: on the convex hull of points \(x_1,\dots,x_m\), a convex function is bounded above by \(\max_i f(x_i)\).
pf: \(f(x) = f(\sum\alpha_i x_i) \le \sum \alpha_i f(x_i) \le \max\limits_{i} f(x_i)\)
Note: the convexity of level sets does not characterize convex functions, but quasiconvex functions.
Some convex sets
Ellipsoid (\(\{x ~|~ (x-a)^T Q (x-a) \le r^2\}\)) is convex and closed
pf: \(x^TQy := \langle x, y \rangle\) satisfies the three defining properties of an inner product
- bilinearity
- symmetry
- positivity
The three properties above hold \(\iff\) Q is PSD
\(\epsilon\)-neighborhood:
Necessary and Sufficient Convexity Condition for smooth function:
subgradient property is characteristic of convex functions:
Examples
For a convex function, every local minimum is a global minimum.
First-order necessary and sufficient condition (convex functions)
\(x^* \in \mathbf{dom}f\) is a minimizer \(\iff\) \(0 \in \partial f(x^*)\)
A differentiable function \(f\) is \(\mu\)-strongly convex if
\[
f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{\mu}{2} \|y-x\|^2
\]
Note
Note: Intuitively speaking, strong convexity means that there exists a quadratic lower bound on the growth of the function.
Equivalent definition
\[
\begin{align}
&(i)~f(y)\ge f(x)+\nabla f(x)^T(y-x)+\frac{\mu}{2}\lVert y-x \rVert^2,~\forall x, y.\\
&(ii)~g(x) = f(x)-\frac{\mu}{2}\lVert x \rVert^2~\text{is convex}.\\
&(iii)~\langle \nabla f(x) - \nabla f(y),x-y \rangle \ge \mu \lVert x-y\rVert^2,~\forall x, y.\\
&(iv)~f(\alpha x+ (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y) - \frac{\alpha (1-\alpha)\mu}{2}\lVert x-y\rVert^2,~\forall x, y,~\alpha \in [0,1].\\
&(v)~\nabla^2 f(x) \succeq \mu \boldsymbol{I},~\forall x.
\end{align}
\]
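For a quadratic \(f(x) = \frac{1}{2}x^TAx\) with \(A \succ 0\), the modulus is \(\mu = \lambda_{\min}(A)\). A minimal numerical sketch of (i) and (iii) under that assumption (the matrix and dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B @ B.T + 0.5 * np.eye(4)      # symmetric positive definite
mu = np.linalg.eigvalsh(A).min()   # strong convexity modulus of f

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    d2 = np.sum((y - x) ** 2)
    # (i) quadratic lower bound on the growth of f
    assert f(y) >= f(x) + grad(x) @ (y - x) + 0.5 * mu * d2 - 1e-9
    # (iii) strong monotonicity of the gradient
    assert (grad(x) - grad(y)) @ (x - y) >= mu * d2 - 1e-9
```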
Consider an optimization problem in standard form (not necessarily convex)
\[
\begin{array}{ll}
\underset{x}{\text{minimize}} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, ~i=1,\cdots,m \\
~ & h_i(x) = 0, ~i=1,\cdots,p
\end{array}
\]
The Lagrangian is
\[
L(x,\boldsymbol{\lambda},\boldsymbol{\mu}) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \mu_i h_i(x)
\]
The Lagrange dual function is defined as
\[
g(\lambda, \mu) = \inf_{x} L(x,\lambda,\mu)
\]
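As a concrete illustrative example (my own, not from the post): minimize \(x^2\) subject to \(1 - x \le 0\). The infimum of \(L(x,\lambda) = x^2 + \lambda(1-x)\) over \(x\) is attained at \(x = \lambda/2\), giving \(g(\lambda) = \lambda - \lambda^2/4\). A grid-based check:

```python
import numpy as np

# Toy problem:  minimize x^2  subject to  1 - x <= 0   (p* = 1 at x* = 1)
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x), minimized at x = lam/2,
# so the dual function is g(lam) = lam - lam^2 / 4.
x = np.linspace(-5.0, 5.0, 100001)

def g_numeric(lam):
    # approximate inf_x L(x, lam) on the grid
    return np.min(x**2 + lam * (1.0 - x))

for lam in (0.0, 1.0, 2.0, 4.0):
    assert abs(g_numeric(lam) - (lam - lam**2 / 4.0)) < 1e-6
```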
Lagrange dual problem
\[
\begin{array}{ll}
\underset{\lambda, \mu}{\text{maximize}} & g(\lambda, \mu) \\
\text{subject to} & \boldsymbol{\lambda} \succeq \mathbf{0}
\end{array}
\]
Weak duality
\[
d^* \le p^*
\]
Strong duality
\[
d^* = p^*
\]
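A sketch checking strong duality on a toy convex problem (an assumed example): minimize \(x^2\) subject to \(x \ge 1\), so \(p^* = 1\); its dual function works out to \(g(\lambda) = \lambda - \lambda^2/4\). Slater's condition holds, so \(d^* = p^*\) is expected.

```python
import numpy as np

# Dual function of: minimize x^2 s.t. x >= 1   is   g(lam) = lam - lam^2/4.
lam = np.linspace(0.0, 10.0, 100001)
g = lam - lam**2 / 4.0

d_star = g.max()   # attained at lam = 2
p_star = 1.0
assert abs(d_star - p_star) < 1e-8   # strong duality holds
```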
Complementary slackness
KKT conditions
Tangent cone
Let \(M\) be a (nonempty) convex set and \(x^* \in M\); the tangent cone of \(M\) at \(x^*\) is the cone
\[
\begin{split}
T_M(x^*) &= \{h \in \mathbb{R}^n ~|~ x^* + th \in M ~\text{for some}~ t > 0 \} \\
&= \{\lambda (y - x^*) ~|~ y \in M, ~\lambda \ge 0\}
\end{split}
\]
Note:
e.g. polyhedron
\[
M = \{x | Ax \le b\} = \{x | a_i^Tx \le b_i, \; i = 1,\dots,m\}
\]
the tangent cone at \(x^*\) is
\[
T_M(x^*) = \{h~|~a_i^T h \le 0, ~\forall i, ~a_i^T x^* = b_i\}
\]
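A small sanity check of this formula, using the nonnegative orthant as the polyhedron (an assumed example with \(a_i = -e_i\), \(b_i = 0\)): a direction \(h\) lies in the tangent cone exactly when a small step along \(h\) stays feasible.

```python
import numpy as np

# M = {x : -x <= 0} (nonnegative orthant).  At x* = (0, 1) only the
# first constraint is active, so T_M(x*) = {h : h[0] >= 0}.
x_star = np.array([0.0, 1.0])

def in_tangent_cone(h):
    return bool(h[0] >= 0.0)

def feasible(x):
    return bool(np.all(x >= 0.0))

t = 1e-3  # small step along h
for h in (np.array([1.0, -5.0]), np.array([0.0, -1.0]), np.array([-1.0, 0.0])):
    assert in_tangent_cone(h) == feasible(x_star + t * h)
```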
Normal cone: the polar cone of the tangent cone
\[
N_M(x^*) = \{g \in \mathbb{R}^n ~|~ \langle g, y-x^*\rangle \le 0, ~\forall y \in M\}
\]
Note:
~ | Stepsize Rule | Convergence Rate | Iteration Complexity |
---|---|---|---|
Gradient descent | |||
strongly convex & smooth | \(\eta_t = \frac{2}{\mu + L}\) | \(O\left(\left(\frac{\kappa -1}{\kappa +1}\right)^t\right)\) | \(O\left(\frac{\log\frac{1}{\epsilon}}{\log\frac{\kappa+1}{\kappa-1}}\right)\) |
convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O(\frac{1}{t})\) | \(O(\frac{1}{\epsilon})\) |
Frank-Wolfe | |||
(strongly) convex & smooth | \(\eta_t = \frac{1}{t}\) | \(O(\frac{1}{t})\) | \(O(\frac{1}{\epsilon})\) |
Projected GD | |||
convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O(\frac{1}{t})\) | \(O(\frac{1}{\epsilon})\) |
strongly convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O\left((1-\frac{1}{\kappa})^t\right)\) | \(O(\kappa\log\frac{1}{\epsilon})\) |
Subgradient method | |||
convex & Lipschitz | \(\eta_t = \frac{1}{\sqrt{t}}\) | \(O(\frac{1}{\sqrt{t}})\) | \(O(\frac{1}{\epsilon^2})\) |
strongly convex & Lipschitz | \(\eta_t = \frac{1}{t}\) | \(O\left(\frac{1}{t}\right)\) | \(O(\frac{1}{\epsilon})\) |
Proximal GD | |||
convex & smooth (w.r.t. \(f\)) | \(\eta_t = \frac{1}{L}\) | \(O(\frac{1}{t})\) | \(O(\frac{1}{\epsilon})\) |
strongly convex & smooth (w.r.t. \(f\)) | \(\eta_t = \frac{1}{L}\) | \(O\left((1-\frac{1}{\kappa})^t\right)\) | \(O(\kappa\log\frac{1}{\epsilon})\) |
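The first gradient-descent row can be sketched numerically. A minimal run on a toy quadratic of my own choosing (\(\mu = 1\), \(L = 10\), minimizer at the origin): with \(\eta = \frac{2}{\mu+L}\), the error contracts by exactly \(\frac{\kappa-1}{\kappa+1}\) per step.

```python
import numpy as np

# f(x) = 1/2 x^T A x with A = diag(1, 10): mu = 1, L = 10, kappa = 10.
A = np.diag([1.0, 10.0])
mu, L = 1.0, 10.0
eta = 2.0 / (mu + L)
rho = (L / mu - 1.0) / (L / mu + 1.0)   # contraction factor (kappa-1)/(kappa+1)

x = np.array([1.0, 1.0])
for _ in range(50):
    prev = np.linalg.norm(x)
    x = x - eta * (A @ x)               # gradient step
    assert np.linalg.norm(x) <= rho * prev + 1e-12
```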
Original post: https://www.cnblogs.com/yychi/p/9398439.html