【论文笔记】用无投影动力学解决分布式优化问题

写在前面

原论文:Distributed Optimization with Projection-free Dynamics.

本文是Chen 2021[1]的笔记,主要记录了一种带约束的gradient tracking算法,该算法无需投影算子即可保证约束集始终满足。

问题描述和算法

考虑分布式优化问题

minxF(x)s.t.xΩ\min_{x}\, F(x) \quad \text{s.t.} \,\, x\in\Omega

其中F(x)=1ni=1nfi(x)F(x)=\frac{1}{n}\sum_{i=1}^nf_i(x)

假设1

  • 集合Ω\Omega紧凸非空;
  • fif_i凸可微,fi\nabla f_iΩ\Omegaκ\kappa-Lipschitz连续。
  • 有向图是强连通平衡的。

给出算法

x˙i=j=1naij(xjxi)+β(t)(vixi)y˙i=j=1naij(yjyi)+˙fi(xi)vi=argminvΩyiTv\begin{aligned} \dot x_i&=\sum_{j=1}^n a_{ij}(x_j-x_i)+\beta(t)(v_i-x_i)\\ \dot y_i&=\sum_{j=1}^n a_{ij}(y_j-y_i)+\dot \nabla f_i(x_i)\\ v_i&=\arg\min_{v\in\Omega}y_i^Tv \end{aligned}

其中初始条件满足xi(0),vi(0)Ωx_i(0),v_i(0)\in\Omegayi(0)=fi(xi(0))y_i(0)=\nabla f_i(x_i(0))

此外,控制参数β(t)\beta(t)满足limtβ(t)=0\operatorname{lim}_{t\to\infty}\beta(t)=0limt0tβ(τ)dτ=\operatorname{lim}_{t\to\infty}\int_0^t\beta(\tau)d\tau=\infty

收敛性证明

写成矩阵形式

x˙=Lx+β(vx)y˙=Ly+g˙\begin{aligned} \dot x&=-Lx+\beta(v-x)\\ \dot y&=-Ly+\dot g \end{aligned}

其中gi:=fi(xi)g_i:=\nabla f_i(x_i)

引理1:在假设1满足情况下,对任意iVi\in\mathcal V,如果xi(0)Ωx_i(0)\in\Omega,那么xi(t)Ωx_i(t)\in\Omegat>0t>0

证明:首先给出法锥定义,NΩ(x)={vvT(yx)0,yΩ}\mathcal N_\Omega(x)=\{v|v^T(y-x)\leq 0, y\in\Omega \}

由于(xiPΩ(xi))T(yxi)0(x_i-P_\Omega(x_i))^T(y-x_i)\leq 0,故xiPΩ(xi)NΩ(xi)x_i-P_\Omega(x_i)\in \mathcal N_\Omega(x_i)

因为viΩv_i\in\Omega,故vixiTΩ(xi)v_i-x_i\in \mathcal T_\Omega(x_i)。而一旦xjΩx_j\in\Omega,则有xjxiTΩ(xi)x_j-x_i\in \mathcal T_\Omega(x_i)。(需要所有agent的约束集相同)

因此x˙i=j=1naij(xjxi)+β(t)(vixi)TΩ(xi)\dot x_i=\sum_{j=1}^n a_{ij}(x_j-x_i)+\beta(t)(v_i-x_i)\in\mathcal T_\Omega(x_i)

定义能量函数

E=12xiPΩ(xi)2E = \frac{1}{2}\|x_i-P_\Omega(x_i)\|^2

对时间求导

E˙=(xiPΩ(xi))Tx˙i\dot E=(x_i-P_\Omega (x_i))^T\dot x_i

因为切锥法锥是正交的,一旦xiΩx_i\in\Omega,则E˙=0\dot E=0,故得证。

引理2:给定ε(t)0\varepsilon(t)\geq 0s(t)0s(t)\geq 0γ(t)0\gamma (t)\geq 0,如果limtε(t)=0\operatorname{lim}_{t\to\infty}\varepsilon(t)=0limt0tγ(τ)dτ=\operatorname{lim}_{t\to\infty}\int_0^t\gamma(\tau)d\tau=\infty

s˙(t)γ(t)s(t)+γ(t)ε(t),\dot s(t)\leq -\gamma(t)s(t)+\gamma (t)\varepsilon(t),

那么limts(t)=0\lim_{t\to \infty}s (t)=0

证明:令h(t)=exp0tγ(τ)dτh(t)=\operatorname{exp}\int_0^t\gamma(\tau)d\tau。则limth(t)=\operatorname{lim}_{t\to\infty}h(t)=\inftyh˙(t)=γ(t)h(t)\dot h(t)=\gamma(t)h(t)

由于s(t)0s(t)\geq 0s˙(t)γ(t)s(t)+γ(t)ε(t)\dot s(t)\leq -\gamma(t)s(t)+\gamma(t)\varepsilon(t),两边同乘h(t)h(t)得到

ddt(s(t)h(t))γ(t)h(t)ε(t).\frac{d}{dt}(s(t)h(t))\leq \gamma(t)h(t)\varepsilon(t).

由比较引理,对上式两边同时积分,得到

s(t)s(0)h(t)+1h(t)0tγ(τ)h(τ)ε(τ)dτ.s(t)\leq \frac{s(0)}{h(t)}+\frac{1}{h(t)}\int_0^t\gamma(\tau)h(\tau)\varepsilon(\tau)d\tau.

如果0tγ(τ)h(τ)ε(τ)dτ\int_0^t\gamma(\tau)h(\tau)\varepsilon(\tau)d\tau\leq \infty,那么limts(t)=0\lim_{t\to \infty}s(t)=0

否则根据L’ Hospital rule,有

limtsups(t)limtγ(t)h(t)ε(t)γ(t)h(t)=limtε(t)=0.\lim_{t\to\infty} \sup s(t)\leq \lim_{t\to\infty}\frac{\gamma(t)h(t)\varepsilon(t)}{\gamma(t)h(t)}=\lim_{t\to\infty}\varepsilon(t)=0.

L’ Hospital rule:

定理1:在假设1和初始条件满足情况下,

  • 所有xix_i达成一致性;
  • 所有yiy_i渐进收敛于1ni=1ngi\frac{1}{n}\sum_{i=1}^n g_i
  • 所有xix_i收敛于最优解xx^*

证明:1)由引理1,xiΩx_i\in\Omega。又viΩv_i\in\Omega,故vixiv_i-x_i有界。因此β(vixi)0\beta(v_i-x_i)\to 0t0t\to 0。由Shi 2013[2]可知,xixj0x_i-x_j\to 0

2)定义W=y1n1n1nTgW=y-\frac{1}{n}1_n1_n^Tg。考虑y=y0+yy=y_0+y_\bot,使得y0ker(L)y_0\in\ker(L)yker(L)y_\bot\in \ker(L)_\bot

由于1nTy=1nTg1_n^T y=1_n^Tg,且y0ker(L)y_0\in\ker(L),故y01n1n1nTg=0y_0-\frac{1}{n}1_n1_n^Tg=0。即W=yker(L)W=y_\bot\in \ker(L)_\bot

(实际上1nTy0=1nTg1_n^T y_0=1_n^Tg1nTy=01_n^Ty_\bot=0

定义能量函数J=12W2J=\frac{1}{2}\|W\|^2。由于图是平衡图,LT1n=0L^T1_n=0。对时间求导,

J˙=(y1n1n1nTg)TLy+(y1n1n1nTg)TΠg˙WTLsW+WΠg˙λ2W2+WΠg˙\begin{aligned} \dot J&=-(y-\frac{1}{n}1_n1_n^Tg)^TLy+(y-\frac{1}{n}1_n1_n^Tg)^T\Pi\dot g\\ &\leq -W^TL_sW+\|W\|\|\Pi\|\|\dot g\|\\ &\leq -\lambda_2\|W\|^2+\|W\|\|\Pi\|\|\dot g\|\\ \end{aligned}

ddtW=ddt2J=J˙Wλ2W2+WΠg˙\frac{d}{dt}\|W\|=\frac{d}{dt}\sqrt{2J}=\frac{\dot J}{\|W\|}\leq-\lambda_2\|W\|^2+\|W\|\|\Pi\|\|\dot g\|

由于ggκ\kappa-Lipschitz的,g˙κx˙=κLx+β(vx)\|\dot g\|\leq \kappa\|\dot x\|=\kappa\|-Lx+\beta(v-x)\|。由1)的分析,g˙0\|\dot g\|\to 0

再由引理2,W0\|W\|\to 0

3)令xˉ=1ni=1nxi\bar x=\frac{1}{n}\sum_{i=1}^nx_i。考虑函数

V=F(xˉ)F(x)0V=F(\bar x)-F(x^*)\geq 0

定义gˉi=fi(xˉ)\bar g_i = \nabla f_i(\bar x)。对时间求导,

V˙=1n(1nTgˉ)Txˉ˙=1n2(1nTgˉ)T1nTx˙=1n2gˉT1n1nT(Lx+β(vx))=βn2gˉT1n1nT(vx)=βn2(gˉT1n1nTnyT)(vx)+βnyT(vx)\begin{aligned} \dot V&=\frac{1}{n}(1_n^T\bar g)^T\dot {\bar x}\\ &=\frac{1}{n^2}(1_n^T\bar g)^T1_n^T\dot x\\ &=\frac{1}{n^2}\bar g^T1_n1_n^T(-Lx+\beta(v-x))\\ &=\frac{\beta}{n^2}\bar g^T1_n1_n^T(v-x)\\ &=\frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-x)+\frac{\beta}{n}y^T(v-x) \end{aligned}

由于yTvyT(1nx)y^Tv\leq y^T(1_n\otimes x^*)恒成立,得到

V˙βn2(gˉT1n1nTnyT)(vx)+βnyT(1nxx)=βn2(gˉT1n1nTnyT)(v1nx+1nxx)+βnyT(1nxx)=βn2(gˉT1n1nTnyT)(v1nx)+βn2gˉT1n1nT(1nxx)\begin{aligned} \dot V&\leq \frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-x)+\frac{\beta}{n}y^T(1_n\otimes x^*-x)\\ &=\frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-1_n\otimes x^*+1_n\otimes x^*-x)+\frac{\beta}{n}y^T(1_n\otimes x^*-x)\\ &=\frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-1_n\otimes x^*)+\frac{\beta}{n^2}\bar g^T1_n1_n^T(1_n\otimes x^*-x)\\ \end{aligned}

由凸函数性质

βn2gT1n1nT(1nxx)=β(1n1nTgˉ)T(xxˉ)β(F(x)F(xˉ))=βV\begin{aligned} \frac{\beta}{n^2}g^T1_n1_n^T(1_n\otimes x^*-x)&=\beta(\frac{1}{n}1_n^T\bar g)^T(x^*-\bar x)\\ &\leq \beta(F(x^*)-F(\bar x))=-\beta V \end{aligned}

βn2(gˉT1n1nTnyT)(v1nx)=βn2(gT1n1nTnyT)(v1nx)+βn2(gˉT1n1nTgT1n1nT)(v1nx)βn(W+κn1nxˉx)v1nx\begin{aligned} \frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-1_n\otimes x^*)&=\frac{\beta}{n^2}( g^T1_n1_n^T-ny^T)(v-1_n\otimes x^*)\\ &\quad +\frac{\beta}{n^2}(\bar g^T1_n1_n^T- g^T1_n1_n^T)(v-1_n\otimes x^*)\\ &\leq \frac{\beta}{n}(\|W\|+\kappa n\|1_n\otimes \bar x-x\|)\|v-1_n\otimes x^*\| \end{aligned}

因为vi,xΩv_i,x^*\in\Omega,故v1nxc\|v-1_n\otimes x^*\|\leq c有界。则

V˙βV+cβn(W+κn1nxˉx)\dot V\leq -\beta V+\frac{c\beta}{n}(\|W\|+\kappa n\|1_n\otimes \bar x-x\|)

再次利用引理2,得到V0V\to 0

由于F(x)F(x)不是强凸,需要加证明xxx\to x^*,见原论文。


  1. G. Chen, P. Yi, and Y. Hong, “Distributed Optimization with Projection-free Dynamics,” May 2021. ↩︎

  2. G. Shi and K. H. Johansson, “Robust consensus for continuous-time multiagent dynamics,” SIAM Journal on Control and Optimization, vol. 51, no. 5, pp. 3673–3691, 2013. ↩︎