Convex Optimization (III): Inequality-Constrained Optimization Algorithms (the Interior-Point Method)

Inequality-Constrained Optimization Problems

Convex optimization problems can be grouped into three tiers of difficulty, each with its own solution methods. Problems in the first tier can be solved directly from the KKT conditions; those in the second tier, which carry linear equality constraints, can be solved with Newton's method.

The interior-point method discussed in this article belongs to the third tier: it further handles inequality constraints in the optimization problem, as follows:

$$\begin{aligned} \min &\quad f_0(x)\\ \operatorname{s.t.} &\quad f_i(x)\leq 0,\quad i=1,\cdots,m\\ &\quad Ax=b, \end{aligned}\qquad (1)$$

where $f_0,f_1,\cdots,f_m:\mathbb R^n\to \mathbb R$ are convex and continuously differentiable, and $A\in\mathbb R^{p\times n}$ with $\operatorname{rank}A=p<n$. Assume problem (1) is solvable, i.e., an optimal point $x^*$ exists; write $f_0(x^*)$ as $f_0^*$.

Assumption: problem (1) is strictly feasible, i.e., there exists $x\in\mathcal D$ such that $Ax=b$ and $f_i(x)<0$ for all $i$.

The KKT conditions are:

$$\begin{aligned} Ax^*=b,\quad f_i(x^*)&\leq 0,\quad i=1,\cdots,m\\ \lambda^*&\geq 0\\ \nabla f_0(x^*)+\sum_{i=1}^m\lambda_i^*\nabla f_i(x^*)+A^T\nu^*&=0\\ \lambda_i^*f_i(x^*)&=0,\quad i=1,\cdots,m. \end{aligned}\qquad (2)$$

The Logarithmic Barrier

To solve problem (1), we use a penalty function to approximate it by an equality-constrained problem

$$\begin{aligned} \min &\quad f_0(x)+\sum_{i=1}^m \mathbb I_-(f_i(x))\\ \operatorname{s.t.}&\quad Ax=b, \end{aligned}$$

where $\mathbb I_-:\mathbb R\to \mathbb R$ is the indicator function of the non-positive reals,

$$\mathbb I_-(u)=\begin{cases} 0 & u\leq 0\\ \infty & u>0. \end{cases}$$

The Log-Barrier Method

Define the logarithmic barrier

$$\phi(x)=-\sum_{i=1}^m\log(-f_i(x)),$$

whose domain $\operatorname{dom} \phi=\{x\in\mathbb R^n\mid f_i(x)<0,\ i=1,\cdots,m \}$ requires the inequality constraints to hold strictly.

Introduce a parameter $t>0$. Then $\frac{1}{t}\phi(x)$ is a smooth approximation of the indicator penalty: for each constraint, $-\frac{1}{t}\log(-u)$ approximates $\mathbb I_-(u)$, and the larger $t$ is, the better the approximation, as shown in the figure below. When solving an optimization problem in practice, one can increase $t$ at each iteration to tighten the approximation, e.g., by simply setting $t$ equal to the iteration count.
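To make this concrete, here is a minimal sketch (the helper name `barrier_approx` is ours, not from the text) evaluating $-\frac{1}{t}\log(-u)$ at a few points $u<0$ for increasing $t$:

```python
import math

# Minimal sketch (illustrative helper): the barrier approximation
# -(1/t) * log(-u) of the indicator I_-(u), defined only for u < 0.
def barrier_approx(u, t):
    """-(1/t) * log(-u); requires strict feasibility u < 0."""
    assert u < 0, "log barrier requires u < 0"
    return -(1.0 / t) * math.log(-u)

# For any fixed u < 0 the value shrinks toward I_-(u) = 0 as t grows.
for t in (1, 10, 100):
    print(t, [round(barrier_approx(u, t), 4) for u in (-2.0, -0.5, -0.01)])
```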

The gradient and Hessian of the log-barrier function $\phi(x)$ are

$$\begin{aligned} \nabla\phi(x)&=\sum_{i=1}^m\frac{1}{-f_i(x)}\nabla f_i(x),\\ \nabla^2\phi(x)&=\sum_{i=1}^m \frac{1}{f_i^2(x)}\nabla f_i(x)\nabla f_i^T(x)+\sum_{i=1}^m\frac{1}{-f_i(x)}\nabla^2f_i(x). \end{aligned}$$

For the returned point to be $\epsilon$-optimal, it suffices that $m/t=\epsilon$. The algorithm is described as follows:
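In outline, the barrier method centers at the current $t$ (Newton's method on $t f_0+\phi$), then increases $t$ by a factor $\mu>1$, stopping once $m/t\leq\epsilon$. A runnable one-dimensional sketch (the toy problem $\min\,(x-2)^2$ s.t. $x\leq 1$ and all names are ours, for illustration only):

```python
# Hedged sketch of the barrier method's outer loop on the toy problem
#   min (x - 2)^2   s.t.  x <= 1       (m = 1, no equality constraints).
def centering(x, t, iters=50):
    # Damped Newton on psi(x) = t*(x-2)^2 - log(1 - x), domain x < 1.
    for _ in range(iters):
        g = 2*t*(x - 2) + 1.0/(1.0 - x)      # psi'(x)
        h = 2*t + 1.0/(1.0 - x)**2           # psi''(x) > 0
        step = -g / h
        while x + step >= 1.0:               # damp to stay strictly feasible
            step *= 0.5
        x += step
    return x

def barrier_method(x=0.0, t=1.0, mu=10.0, eps=1e-6, m=1):
    while m / t > eps:                       # duality-gap bound m/t
        x = centering(x, t)                  # centering step at current t
        t *= mu                              # increase t geometrically
    return x

print(barrier_method())  # approaches the constrained optimum x* = 1
```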

The Central Path

Consider the optimization problem

$$\begin{aligned} \min &\quad tf_0(x)+\phi(x)\\ \operatorname{s.t.} &\quad Ax=b. \end{aligned}\qquad (3)$$

Assume problem (3) can be solved by Newton's method, and that for every $t>0$ it has a unique solution $x^*(t)$. The central path of problem (1) is the set of central points $x^*(t)$.

A point lies on the central path if and only if (the centrality condition) $x^*(t)$ is strictly feasible and there exists $\hat\nu\in \mathbb R^p$ such that (4) holds:

$$\begin{aligned} 0&=t\nabla f_0(x^*(t))+\nabla \phi(x^*(t))+A^T\hat \nu\\ &=t\nabla f_0(x^*(t))+\sum_{i=1}^m\frac{1}{-f_i(x^*(t))}\nabla f_i(x^*(t))+A^T\hat \nu. \end{aligned}\qquad (4)$$

From (4) it follows that every central point yields a dual feasible point. Define

$$\lambda_i^*(t) = -\frac{1}{tf_i(x^*(t))},\quad i=1,\cdots,m,\qquad \nu^*(t)=\frac{\hat \nu}{t}.$$

Then $\lambda^*(t)$ and $\nu^*(t)$ are dual feasible. By the centrality condition, the pair $(\lambda^*(t),\nu^*(t))$ satisfies the stationarity equation of the KKT system (2), so $x^*(t)$ minimizes the Lagrangian $L(x,\lambda^*(t),\nu^*(t))$.

Hence the dual function $g(\lambda^*(t),\nu^*(t))$ is finite, and

$$\begin{aligned} g(\lambda^*(t),\nu^*(t))&=f_0(x^*(t))+\sum_{i=1}^m\lambda_i^*(t)f_i(x^*(t))+\nu^*(t)^T(Ax^*(t)-b)\\ &=f_0(x^*(t))-m/t\leq f_0^*. \end{aligned}$$

So the duality gap is $m/t$, and $f_0(x^*(t))-f_0^*\leq m/t$, i.e., $x^*(t)$ is at worst $m/t$-suboptimal.
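The $m/t$ gap can be checked numerically. A hedged sketch on a toy problem of ours (not from the text): for $\min\,(x-2)^2$ s.t. $x-1\leq 0$, the dual function works out to $g(\lambda)=\lambda-\lambda^2/4$, and the gap at the central point equals $m/t=1/t$:

```python
# Toy check (illustrative instance): min (x-2)^2  s.t.  x - 1 <= 0.
# Central point: 2t(x-2) + 1/(1-x) = 0; dual point: lambda*(t) = -1/(t*f1).
def central_point(t, x=0.0):
    for _ in range(100):                  # damped Newton on t*f0 + phi
        g = 2*t*(x - 2) + 1.0/(1.0 - x)
        h = 2*t + 1.0/(1.0 - x)**2
        step = -g / h
        while x + step >= 1.0:            # stay strictly feasible
            step *= 0.5
        x += step
    return x

def dual_value(lam):
    # g(lambda) = min_x (x-2)^2 + lam*(x-1), minimized at x = 2 - lam/2
    return lam - lam**2 / 4.0

t = 50.0
xs = central_point(t)
lam = -1.0 / (t * (xs - 1.0))             # lambda*(t) = -1/(t*f1(x*(t)))
gap = (xs - 2.0)**2 - dual_value(lam)
print(gap, 1.0/t)                         # the two agree: gap = m/t
```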

Interpretation via the Modified KKT Conditions

Modified KKT conditions: for the approximation $x^*(t)$, we have $x=x^*(t)$ if and only if there exist $\lambda^*,\nu^*$ satisfying

$$\begin{aligned} Ax^*=b,\quad f_i(x^*)&\leq 0,\quad i=1,\cdots,m\\ \lambda^*&\geq 0\\ \nabla f_0(x^*)+\sum_{i=1}^m\lambda_i^*\nabla f_i(x^*)+A^T\nu^*&=0\\ -\lambda_i^*f_i(x^*)&=1/t,\quad i=1,\cdots,m. \end{aligned}\qquad (5)$$

Compared with (2), the only difference is that $\lambda_i^*f_i(x^*)=0$ is replaced by $-\lambda_i^*f_i(x^*)=1/t$.

Example: an Inequality-Form Linear Program

Consider the LP

$$\begin{aligned} \min&\quad c^Tx\\ \operatorname{s.t.} &\quad Ax\leq b. \end{aligned}$$

The log-barrier function is

$$\phi(x)=-\sum_{i=1}^m\log(b_i-a_i^Tx),\qquad \operatorname{dom}\phi=\{x\mid Ax<b\}.$$

The centrality condition reads

$$tc+\sum_{i=1}^m\frac{1}{b_i-a_i^Tx}a_i=0.$$

As $t$ increases, the approximate solution approaches the true optimum; the trajectory it traces out is the central path, as shown in the figure below.
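As a concrete illustration (the toy instance and names are ours), the central path of the one-dimensional LP $\min x$ s.t. $-1\leq x\leq 1$ can be computed by Newton centering and compared with its closed form $x^*(t)=(1-\sqrt{1+t^2})/t$:

```python
import math

# Toy LP central path: min x  s.t. -1 <= x <= 1.  The centrality condition
#   t + 1/(1 - x) - 1/(1 + x) = 0  has closed form x*(t) = (1 - sqrt(1+t^2))/t.
def center(t, x=0.0, iters=100):
    # Damped Newton on psi(x) = t*x - log(1-x) - log(1+x)
    for _ in range(iters):
        g = t + 1.0/(1.0 - x) - 1.0/(1.0 + x)
        h = 1.0/(1.0 - x)**2 + 1.0/(1.0 + x)**2
        step = -g / h
        while abs(x + step) >= 1.0:     # damp to stay strictly inside (-1, 1)
            step *= 0.5
        x += step
    return x

for t in (1.0, 10.0, 100.0):
    print(t, center(t), (1.0 - math.sqrt(1.0 + t*t)) / t)
# x*(t) slides toward the true optimum x = -1 as t grows.
```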

Relation between the Modified KKT Conditions and Newton's Method

The Newton KKT system is

$$\begin{bmatrix} t\nabla^2 f_0(x)+\nabla^2\phi(x)&A^T\\ A&0 \end{bmatrix}\begin{bmatrix} v_n\\ \nu_n \end{bmatrix}=\begin{bmatrix} -t\nabla f_0(x)-\nabla\phi(x)\\ 0 \end{bmatrix}$$

λi=1tfi(x)\lambda_i = -\frac{1}{tf_i(x)}代入改进KKT条件,解得v=vnv=v_nν=νn/t\nu=\nu_n/t。详见Boyd 2004章节11.3.4[1]

Approximation Error of the Log-Barrier Method

We take the time-varying optimization problem of Fazlyab et al. 2018[2] as an example, considering only the inequality constraints:

$$\begin{aligned} \min &\quad f_0(x,t)\\ \operatorname{s.t.} &\quad f_i(x,t)\leq 0,\quad i\in[p] \end{aligned}$$

Its optimal solution is

$$x^*(t)=\mathop{\arg\min}\limits_{x \in \mathbb R^n} f_0(x,t)+\sum_{i=1}^p\mathbb I_-(f_i(x,t))\qquad (6)$$

and the log-barrier approximation of the optimal solution is

$$\hat x^*(t)=\mathop{\arg\min}\limits_{x\in\hat{\mathcal D}(t)}f_0(x,t)-\frac{1}{c(t)}\sum_{i=1}^p\log(s(t)-f_i(x,t))\qquad (7)$$

where $\hat {\mathcal D}(t):=\{x\in\mathbb R^n\mid f_i(x,t)<s(t),\ i\in[p]\}$. The slack variable $s(t)$ guarantees $x(0)\in\hat {\mathcal D}(0)$ at the initial time.

Assumption 1 (convexity): for every $t\geq 0$, the functions $f_i(x,t)$, $i\in [p]$, are convex, and $f_0(x,t)$ is uniformly strongly convex.

Assumption 2 (Slater's condition): the feasible set has nonempty interior, i.e., there exists $x^\dagger$ such that $f_i(x^\dagger,t)<0$ for all $i\in[p]$.

Lemma 1: Let $x^*(t)$ and $\hat x^*(t)$ be defined as in (6) and (7), respectively. Then, under Assumptions 1 and 2, and for any $\lambda^*\in \Gamma^*(t)$ and $t\geq 0$, the following inequality holds:

$$|f_0(\hat x^*(t),t)-f_0(x^*(t),t)|\leq \frac{p}{c(t)}+s(t)\left(\inf_{\lambda^*\in\Gamma^*(t)}\|\lambda^*\|_1\right)$$

Proof: define

$$\tilde x^*(t)=\mathop{\arg\min}\limits_{x\in\mathbb R^n}f_0(x,t)+\sum_{i=1}^p\mathbb I_-(f_i(x,t)-s(t))$$

By perturbation and sensitivity analysis (Boyd 2004, Section 5.6), for $s(t)\geq 0$ we can establish the inequalities

$$0\leq f_0(x^*(t),t)-f_0(\tilde x^*(t),t)\leq \sum_{i=1}^p\lambda_i^* s(t)\qquad(8)$$

The first inequality holds because $s(t)$ enlarges the feasible set, so the optimal value can only decrease, never increase; the second follows directly from sensitivity analysis (Boyd 2004, Section 5.6). Replacing the indicator function $\mathbb I_-(u)$ by the log barrier $-\frac{1}{c}\log(-u)$, we obtain the upper bound (Boyd 2004, Chapter 11)

$$f_0(\hat x^*(t),t)-f_0(\tilde x^*(t),t)\leq \frac{p}{c(t)}\qquad (9)$$

Combining (8) and (9),

$$\begin{aligned} |f_0(\hat x^*(t),t)-f_0(x^*(t),t)|&\leq \frac{p}{c(t)}+\sum_{i=1}^p\lambda_i^* s(t)\\ &\leq \frac{p}{c(t)}+s(t)\left(\inf_{\lambda^*\in\Gamma^*(t)}\|\lambda^*\|_1\right) \end{aligned}$$

Define

$$\Phi(x,c,s,t)=f_0(x,t)-\frac{1}{c}\sum_{i=1}^p\log(s-f_i(x,t))$$

and consider the prediction-correction algorithm

$$\dot x=-\nabla_{xx}^{-1}\Phi\left(P\nabla_x\Phi+\nabla_{xs}\Phi\,\dot s+\nabla_{xc}\Phi\,\dot c+\nabla_{xt}\Phi\right)\qquad (10)$$
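The flow (10) can be integrated with forward Euler. A hedged sketch on an illustrative scalar instance of our own (not the paper's): $f_0(x,t)=(x-\cos t)^2$, $f_1(x,t)=x-0.5$, with constant $c$ and $s$ so that $\dot c=\dot s=0$, and $P=\alpha$:

```python
import math

# Hedged sketch: forward-Euler integration of the prediction-correction flow
#   xdot = -Phi_xx^{-1} (alpha*Phi_x + Phi_xt)
# for Phi(x,t) = (x - cos t)^2 - (1/c) * log(s - (x - 0.5)), c and s fixed.
c, s, alpha, dt = 100.0, 0.1, 5.0, 1e-3

def grad_x(x, t):    # Phi_x = 2(x - cos t) + (1/c) / (s - f1)
    return 2.0*(x - math.cos(t)) + (1.0/c)/(s - (x - 0.5))

def hess_xx(x, t):   # Phi_xx = 2 + (1/c) / (s - f1)^2
    return 2.0 + (1.0/c)/(s - (x - 0.5))**2

def grad_xt(x, t):   # Phi_xt = 2 sin t (barrier term has no explicit t)
    return 2.0*math.sin(t)

x, t = 0.0, 0.0
while t < 2.0:
    x += dt * (-(alpha*grad_x(x, t) + grad_xt(x, t)) / hess_xx(x, t))
    t += dt
print(x, abs(grad_x(x, t)))  # x stays in D(t) (x < s + 0.5 = 0.6) and
                             # |Phi_x| decays roughly like e^{-alpha*t}
```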

To apply the log-barrier method, it remains to show that $x(t)\in\hat {\mathcal D}(t)$ for all $t\geq 0$.

Lemma 2: Let $\hat x^*(t)$ be defined in (7) and $x(t)$ be the solution of (10) for $x(0)\in \hat {\mathcal D}(0)$, $s(0)>\max_if_i(x(0),0)$, and $c(0)> 0$. Then, under Assumptions 1 and 2, $x(t)$ satisfies $x(t)\in \hat {\mathcal D}(t)$ for all $t\geq 0$. Moreover, the following inequality holds:

$$\|x(t)-\hat x^*(t)\|\leq C_3 e^{-\alpha t}$$

where $0\leq C_3:=\frac{1}{m_f}\|\nabla _x\Phi(x(0),c(0),s(0),0)\|<\infty$.

Proof: since $f_0$ is $m_f$-strongly convex and $c>0$, $\Phi$ is also $m_f$-strongly convex on $\hat {\mathcal D}(t)$, so $\nabla_{xx}^{-1}\Phi$ exists and is uniformly bounded. Substituting (10) into the closed-loop dynamics gives $\frac{\mathrm d}{\mathrm dt}\nabla_x\Phi=-\alpha \nabla_x \Phi$ (taking $P=\alpha I$), so $\|\nabla_x \Phi\|$ is bounded. Note that

$$\nabla_x \Phi(x,c,s,t)=\nabla_x f_0(x,t)+\frac{1}{c}\sum_{i=1}^p\frac{\nabla_x f_i(x,t)}{s-f_i(x,t)},$$

which is unbounded on the boundary of $\hat {\mathcal D}(t)$. The boundedness of $\|\nabla_x \Phi\|$ therefore implies $x(t)\in \hat {\mathcal D}(t)$. Strong convexity then gives

$$\|x(t)-\hat x^*(t)\|\leq \frac{1}{m_f}\|\nabla _x\Phi(x(t),c(t),s(t),t)\|$$

and combining this with $\frac{\mathrm d}{\mathrm dt}\nabla_x\Phi=-\alpha \nabla_x \Phi$ completes the proof.


  1. Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press. ↩︎

  2. M. Fazlyab, S. Paternain, V. M. Preciado, and A. Ribeiro, “Prediction-correction interior-point method for time-varying convex optimization,” IEEE Trans. Automat. Contr., vol. 63, no. 7, pp. 1973–1986, Jul. 2018. ↩︎