写在前面
原论文:Distributed Optimization with Projection-free Dynamics.
本文是Chen 2021的笔记,主要记录了一种带约束的gradient tracking算法,该算法无需投影算子即可保证约束集始终满足。
问题描述和算法
考虑分布式优化问题
min x F ( x ) s.t. x ∈ Ω \min_{x}\, F(x) \quad \text{s.t.} \,\, x\in\Omega
x min F ( x ) s.t. x ∈ Ω
其中F ( x ) = 1 n ∑ i = 1 n f i ( x ) F(x)=\frac{1}{n}\sum_{i=1}^nf_i(x) F ( x ) = n 1 ∑ i = 1 n f i ( x ) 。
假设1 :
集合Ω \Omega Ω 紧凸非空;
f i f_i f i 凸可微,∇ f i \nabla f_i ∇ f i 在Ω \Omega Ω 上κ \kappa κ -Lipschitz连续。
有向图是强连通平衡的。
给出算法
x ˙ i = ∑ j = 1 n a i j ( x j − x i ) + β ( t ) ( v i − x i ) y ˙ i = ∑ j = 1 n a i j ( y j − y i ) + ∇ ˙ f i ( x i ) v i = arg min v ∈ Ω y i T v \begin{aligned}
\dot x_i&=\sum_{j=1}^n a_{ij}(x_j-x_i)+\beta(t)(v_i-x_i)\\
\dot y_i&=\sum_{j=1}^n a_{ij}(y_j-y_i)+\dot \nabla f_i(x_i)\\
v_i&=\arg\min_{v\in\Omega}y_i^Tv
\end{aligned}
x ˙ i y ˙ i v i = j = 1 ∑ n a i j ( x j − x i ) + β ( t ) ( v i − x i ) = j = 1 ∑ n a i j ( y j − y i ) + ∇ ˙ f i ( x i ) = arg v ∈ Ω min y i T v
其中初始条件满足x i ( 0 ) , v i ( 0 ) ∈ Ω x_i(0),v_i(0)\in\Omega x i ( 0 ) , v i ( 0 ) ∈ Ω ,y i ( 0 ) = ∇ f i ( x i ( 0 ) ) y_i(0)=\nabla f_i(x_i(0)) y i ( 0 ) = ∇ f i ( x i ( 0 ) ) 。
此外,控制参数β ( t ) \beta(t) β ( t ) 满足lim t → ∞ β ( t ) = 0 \operatorname{lim}_{t\to\infty}\beta(t)=0 l i m t → ∞ β ( t ) = 0 且lim t → ∞ ∫ 0 t β ( τ ) d τ = ∞ \operatorname{lim}_{t\to\infty}\int_0^t\beta(\tau)d\tau=\infty l i m t → ∞ ∫ 0 t β ( τ ) d τ = ∞ 。
收敛性证明
写成矩阵形式
x ˙ = − L x + β ( v − x ) y ˙ = − L y + g ˙ \begin{aligned}
\dot x&=-Lx+\beta(v-x)\\
\dot y&=-Ly+\dot g
\end{aligned}
x ˙ y ˙ = − L x + β ( v − x ) = − L y + g ˙
其中g i : = ∇ f i ( x i ) g_i:=\nabla f_i(x_i) g i : = ∇ f i ( x i ) 。
引理1:在假设1满足情况下,对任意i ∈ V i\in\mathcal V i ∈ V ,如果x i ( 0 ) ∈ Ω x_i(0)\in\Omega x i ( 0 ) ∈ Ω ,那么x i ( t ) ∈ Ω x_i(t)\in\Omega x i ( t ) ∈ Ω ,t > 0 t>0 t > 0 。
证明:首先给出法锥定义,N Ω ( x ) = { v ∣ v T ( y − x ) ≤ 0 , y ∈ Ω } \mathcal N_\Omega(x)=\{v|v^T(y-x)\leq 0, y\in\Omega \} N Ω ( x ) = { v ∣ v T ( y − x ) ≤ 0 , y ∈ Ω } 。
由于( x i − P Ω ( x i ) ) T ( y − x i ) ≤ 0 (x_i-P_\Omega(x_i))^T(y-x_i)\leq 0 ( x i − P Ω ( x i ) ) T ( y − x i ) ≤ 0 ,故x i − P Ω ( x i ) ∈ N Ω ( x i ) x_i-P_\Omega(x_i)\in \mathcal N_\Omega(x_i) x i − P Ω ( x i ) ∈ N Ω ( x i ) 。
因为v i ∈ Ω v_i\in\Omega v i ∈ Ω ,故v i − x i ∈ T Ω ( x i ) v_i-x_i\in \mathcal T_\Omega(x_i) v i − x i ∈ T Ω ( x i ) 。而一旦x j ∈ Ω x_j\in\Omega x j ∈ Ω ,则有x j − x i ∈ T Ω ( x i ) x_j-x_i\in \mathcal T_\Omega(x_i) x j − x i ∈ T Ω ( x i ) 。(需要所有agent的约束集相同 )
因此x ˙ i = ∑ j = 1 n a i j ( x j − x i ) + β ( t ) ( v i − x i ) ∈ T Ω ( x i ) \dot x_i=\sum_{j=1}^n a_{ij}(x_j-x_i)+\beta(t)(v_i-x_i)\in\mathcal T_\Omega(x_i) x ˙ i = ∑ j = 1 n a i j ( x j − x i ) + β ( t ) ( v i − x i ) ∈ T Ω ( x i ) 。
定义能量函数
E = 1 2 ∥ x i − P Ω ( x i ) ∥ 2 E = \frac{1}{2}\|x_i-P_\Omega(x_i)\|^2
E = 2 1 ∥ x i − P Ω ( x i ) ∥ 2
对时间求导
E ˙ = ( x i − P Ω ( x i ) ) T x ˙ i \dot E=(x_i-P_\Omega (x_i))^T\dot x_i
E ˙ = ( x i − P Ω ( x i ) ) T x ˙ i
因为切锥法锥是正交的,一旦x i ∈ Ω x_i\in\Omega x i ∈ Ω ,则E ˙ = 0 \dot E=0 E ˙ = 0 ,故得证。
引理2:给定ε ( t ) ≥ 0 \varepsilon(t)\geq 0 ε ( t ) ≥ 0 ,s ( t ) ≥ 0 s(t)\geq 0 s ( t ) ≥ 0 和γ ( t ) ≥ 0 \gamma (t)\geq 0 γ ( t ) ≥ 0 ,如果lim t → ∞ ε ( t ) = 0 \operatorname{lim}_{t\to\infty}\varepsilon(t)=0 l i m t → ∞ ε ( t ) = 0 且lim t → ∞ ∫ 0 t γ ( τ ) d τ = ∞ \operatorname{lim}_{t\to\infty}\int_0^t\gamma(\tau)d\tau=\infty l i m t → ∞ ∫ 0 t γ ( τ ) d τ = ∞ ,
s ˙ ( t ) ≤ − γ ( t ) s ( t ) + γ ( t ) ε ( t ) , \dot s(t)\leq -\gamma(t)s(t)+\gamma (t)\varepsilon(t),
s ˙ ( t ) ≤ − γ ( t ) s ( t ) + γ ( t ) ε ( t ) ,
那么lim t → ∞ s ( t ) = 0 \lim_{t\to \infty}s (t)=0 lim t → ∞ s ( t ) = 0 。
证明:令h ( t ) = exp ∫ 0 t γ ( τ ) d τ h(t)=\operatorname{exp}\int_0^t\gamma(\tau)d\tau h ( t ) = e x p ∫ 0 t γ ( τ ) d τ 。则lim t → ∞ h ( t ) = ∞ \operatorname{lim}_{t\to\infty}h(t)=\infty l i m t → ∞ h ( t ) = ∞ 且h ˙ ( t ) = γ ( t ) h ( t ) \dot h(t)=\gamma(t)h(t) h ˙ ( t ) = γ ( t ) h ( t ) 。
由于s ( t ) ≥ 0 s(t)\geq 0 s ( t ) ≥ 0 和s ˙ ( t ) ≤ − γ ( t ) s ( t ) + γ ( t ) ε ( t ) \dot s(t)\leq -\gamma(t)s(t)+\gamma(t)\varepsilon(t) s ˙ ( t ) ≤ − γ ( t ) s ( t ) + γ ( t ) ε ( t ) ,两边同乘h ( t ) h(t) h ( t ) 得到
d d t ( s ( t ) h ( t ) ) ≤ γ ( t ) h ( t ) ε ( t ) . \frac{d}{dt}(s(t)h(t))\leq \gamma(t)h(t)\varepsilon(t).
d t d ( s ( t ) h ( t ) ) ≤ γ ( t ) h ( t ) ε ( t ) .
由比较引理,对上式两边同时积分,得到
s ( t ) ≤ s ( 0 ) h ( t ) + 1 h ( t ) ∫ 0 t γ ( τ ) h ( τ ) ε ( τ ) d τ . s(t)\leq \frac{s(0)}{h(t)}+\frac{1}{h(t)}\int_0^t\gamma(\tau)h(\tau)\varepsilon(\tau)d\tau.
s ( t ) ≤ h ( t ) s ( 0 ) + h ( t ) 1 ∫ 0 t γ ( τ ) h ( τ ) ε ( τ ) d τ .
如果∫ 0 t γ ( τ ) h ( τ ) ε ( τ ) d τ ≤ ∞ \int_0^t\gamma(\tau)h(\tau)\varepsilon(\tau)d\tau\leq \infty ∫ 0 t γ ( τ ) h ( τ ) ε ( τ ) d τ ≤ ∞ ,那么lim t → ∞ s ( t ) = 0 \lim_{t\to \infty}s(t)=0 lim t → ∞ s ( t ) = 0 。
否则根据L’ Hospital rule,有
lim t → ∞ sup s ( t ) ≤ lim t → ∞ γ ( t ) h ( t ) ε ( t ) γ ( t ) h ( t ) = lim t → ∞ ε ( t ) = 0. \lim_{t\to\infty} \sup s(t)\leq \lim_{t\to\infty}\frac{\gamma(t)h(t)\varepsilon(t)}{\gamma(t)h(t)}=\lim_{t\to\infty}\varepsilon(t)=0.
t → ∞ lim sup s ( t ) ≤ t → ∞ lim γ ( t ) h ( t ) γ ( t ) h ( t ) ε ( t ) = t → ∞ lim ε ( t ) = 0 .
L’ Hospital rule:
定理1:在假设1和初始条件满足情况下,
所有x i x_i x i 达成一致性;
所有y i y_i y i 渐进收敛于1 n ∑ i = 1 n g i \frac{1}{n}\sum_{i=1}^n g_i n 1 ∑ i = 1 n g i 。
所有x i x_i x i 收敛于最优解x ∗ x^* x ∗ 。
证明:1)由引理1,x i ∈ Ω x_i\in\Omega x i ∈ Ω 。又v i ∈ Ω v_i\in\Omega v i ∈ Ω ,故v i − x i v_i-x_i v i − x i 有界。因此β ( v i − x i ) → 0 \beta(v_i-x_i)\to 0 β ( v i − x i ) → 0 当t → 0 t\to 0 t → 0 。由Shi 2013可知,x i − x j → 0 x_i-x_j\to 0 x i − x j → 0 。
2)定义W = y − 1 n 1 n 1 n T g W=y-\frac{1}{n}1_n1_n^Tg W = y − n 1 1 n 1 n T g 。考虑y = y 0 + y ⊥ y=y_0+y_\bot y = y 0 + y ⊥ ,使得y 0 ∈ ker ( L ) y_0\in\ker(L) y 0 ∈ ker ( L ) 且y ⊥ ∈ ker ( L ) ⊥ y_\bot\in \ker(L)_\bot y ⊥ ∈ ker ( L ) ⊥ 。
由于1 n T y = 1 n T g 1_n^T y=1_n^Tg 1 n T y = 1 n T g ,且y 0 ∈ ker ( L ) y_0\in\ker(L) y 0 ∈ ker ( L ) ,故y 0 − 1 n 1 n 1 n T g = 0 y_0-\frac{1}{n}1_n1_n^Tg=0 y 0 − n 1 1 n 1 n T g = 0 。即W = y ⊥ ∈ ker ( L ) ⊥ W=y_\bot\in \ker(L)_\bot W = y ⊥ ∈ ker ( L ) ⊥ 。
(实际上1 n T y 0 = 1 n T g 1_n^T y_0=1_n^Tg 1 n T y 0 = 1 n T g 且1 n T y ⊥ = 0 1_n^Ty_\bot=0 1 n T y ⊥ = 0 )
定义能量函数J = 1 2 ∥ W ∥ 2 J=\frac{1}{2}\|W\|^2 J = 2 1 ∥ W ∥ 2 。由于图是平衡图,L T 1 n = 0 L^T1_n=0 L T 1 n = 0 。对时间求导,
J ˙ = − ( y − 1 n 1 n 1 n T g ) T L y + ( y − 1 n 1 n 1 n T g ) T Π g ˙ ≤ − W T L s W + ∥ W ∥ ∥ Π ∥ ∥ g ˙ ∥ ≤ − λ 2 ∥ W ∥ 2 + ∥ W ∥ ∥ Π ∥ ∥ g ˙ ∥ \begin{aligned}
\dot J&=-(y-\frac{1}{n}1_n1_n^Tg)^TLy+(y-\frac{1}{n}1_n1_n^Tg)^T\Pi\dot g\\
&\leq -W^TL_sW+\|W\|\|\Pi\|\|\dot g\|\\
&\leq -\lambda_2\|W\|^2+\|W\|\|\Pi\|\|\dot g\|\\
\end{aligned}
J ˙ = − ( y − n 1 1 n 1 n T g ) T L y + ( y − n 1 1 n 1 n T g ) T Π g ˙ ≤ − W T L s W + ∥ W ∥ ∥ Π ∥ ∥ g ˙ ∥ ≤ − λ 2 ∥ W ∥ 2 + ∥ W ∥ ∥ Π ∥ ∥ g ˙ ∥
又
d d t ∥ W ∥ = d d t 2 J = J ˙ ∥ W ∥ ≤ − λ 2 ∥ W ∥ 2 + ∥ W ∥ ∥ Π ∥ ∥ g ˙ ∥ \frac{d}{dt}\|W\|=\frac{d}{dt}\sqrt{2J}=\frac{\dot J}{\|W\|}\leq-\lambda_2\|W\|^2+\|W\|\|\Pi\|\|\dot g\|
d t d ∥ W ∥ = d t d 2 J = ∥ W ∥ J ˙ ≤ − λ 2 ∥ W ∥ 2 + ∥ W ∥ ∥ Π ∥ ∥ g ˙ ∥
由于g g g 是κ \kappa κ -Lipschitz的,∥ g ˙ ∥ ≤ κ ∥ x ˙ ∥ = κ ∥ − L x + β ( v − x ) ∥ \|\dot g\|\leq \kappa\|\dot x\|=\kappa\|-Lx+\beta(v-x)\| ∥ g ˙ ∥ ≤ κ ∥ x ˙ ∥ = κ ∥ − L x + β ( v − x ) ∥ 。由1)的分析,∥ g ˙ ∥ → 0 \|\dot g\|\to 0 ∥ g ˙ ∥ → 0 。
再由引理2,∥ W ∥ → 0 \|W\|\to 0 ∥ W ∥ → 0 。
3)令x ˉ = 1 n ∑ i = 1 n x i \bar x=\frac{1}{n}\sum_{i=1}^nx_i x ˉ = n 1 ∑ i = 1 n x i 。考虑函数
V = F ( x ˉ ) − F ( x ∗ ) ≥ 0 V=F(\bar x)-F(x^*)\geq 0
V = F ( x ˉ ) − F ( x ∗ ) ≥ 0
定义g ˉ i = ∇ f i ( x ˉ ) \bar g_i = \nabla f_i(\bar x) g ˉ i = ∇ f i ( x ˉ ) 。对时间求导,
V ˙ = 1 n ( 1 n T g ˉ ) T x ˉ ˙ = 1 n 2 ( 1 n T g ˉ ) T 1 n T x ˙ = 1 n 2 g ˉ T 1 n 1 n T ( − L x + β ( v − x ) ) = β n 2 g ˉ T 1 n 1 n T ( v − x ) = β n 2 ( g ˉ T 1 n 1 n T − n y T ) ( v − x ) + β n y T ( v − x ) \begin{aligned}
\dot V&=\frac{1}{n}(1_n^T\bar g)^T\dot {\bar x}\\
&=\frac{1}{n^2}(1_n^T\bar g)^T1_n^T\dot x\\
&=\frac{1}{n^2}\bar g^T1_n1_n^T(-Lx+\beta(v-x))\\
&=\frac{\beta}{n^2}\bar g^T1_n1_n^T(v-x)\\
&=\frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-x)+\frac{\beta}{n}y^T(v-x)
\end{aligned}
V ˙ = n 1 ( 1 n T g ˉ ) T x ˉ ˙ = n 2 1 ( 1 n T g ˉ ) T 1 n T x ˙ = n 2 1 g ˉ T 1 n 1 n T ( − L x + β ( v − x ) ) = n 2 β g ˉ T 1 n 1 n T ( v − x ) = n 2 β ( g ˉ T 1 n 1 n T − n y T ) ( v − x ) + n β y T ( v − x )
由于y T v ≤ y T ( 1 n ⊗ x ∗ ) y^Tv\leq y^T(1_n\otimes x^*) y T v ≤ y T ( 1 n ⊗ x ∗ ) 恒成立,得到
V ˙ ≤ β n 2 ( g ˉ T 1 n 1 n T − n y T ) ( v − x ) + β n y T ( 1 n ⊗ x ∗ − x ) = β n 2 ( g ˉ T 1 n 1 n T − n y T ) ( v − 1 n ⊗ x ∗ + 1 n ⊗ x ∗ − x ) + β n y T ( 1 n ⊗ x ∗ − x ) = β n 2 ( g ˉ T 1 n 1 n T − n y T ) ( v − 1 n ⊗ x ∗ ) + β n 2 g ˉ T 1 n 1 n T ( 1 n ⊗ x ∗ − x ) \begin{aligned}
\dot V&\leq \frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-x)+\frac{\beta}{n}y^T(1_n\otimes x^*-x)\\
&=\frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-1_n\otimes x^*+1_n\otimes x^*-x)+\frac{\beta}{n}y^T(1_n\otimes x^*-x)\\
&=\frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-1_n\otimes x^*)+\frac{\beta}{n^2}\bar g^T1_n1_n^T(1_n\otimes x^*-x)\\
\end{aligned}
V ˙ ≤ n 2 β ( g ˉ T 1 n 1 n T − n y T ) ( v − x ) + n β y T ( 1 n ⊗ x ∗ − x ) = n 2 β ( g ˉ T 1 n 1 n T − n y T ) ( v − 1 n ⊗ x ∗ + 1 n ⊗ x ∗ − x ) + n β y T ( 1 n ⊗ x ∗ − x ) = n 2 β ( g ˉ T 1 n 1 n T − n y T ) ( v − 1 n ⊗ x ∗ ) + n 2 β g ˉ T 1 n 1 n T ( 1 n ⊗ x ∗ − x )
由凸函数性质
β n 2 g T 1 n 1 n T ( 1 n ⊗ x ∗ − x ) = β ( 1 n 1 n T g ˉ ) T ( x ∗ − x ˉ ) ≤ β ( F ( x ∗ ) − F ( x ˉ ) ) = − β V \begin{aligned}
\frac{\beta}{n^2}g^T1_n1_n^T(1_n\otimes x^*-x)&=\beta(\frac{1}{n}1_n^T\bar g)^T(x^*-\bar x)\\
&\leq \beta(F(x^*)-F(\bar x))=-\beta V
\end{aligned}
n 2 β g T 1 n 1 n T ( 1 n ⊗ x ∗ − x ) = β ( n 1 1 n T g ˉ ) T ( x ∗ − x ˉ ) ≤ β ( F ( x ∗ ) − F ( x ˉ ) ) = − β V
β n 2 ( g ˉ T 1 n 1 n T − n y T ) ( v − 1 n ⊗ x ∗ ) = β n 2 ( g T 1 n 1 n T − n y T ) ( v − 1 n ⊗ x ∗ ) + β n 2 ( g ˉ T 1 n 1 n T − g T 1 n 1 n T ) ( v − 1 n ⊗ x ∗ ) ≤ β n ( ∥ W ∥ + κ n ∥ 1 n ⊗ x ˉ − x ∥ ) ∥ v − 1 n ⊗ x ∗ ∥ \begin{aligned}
\frac{\beta}{n^2}(\bar g^T1_n1_n^T-ny^T)(v-1_n\otimes x^*)&=\frac{\beta}{n^2}( g^T1_n1_n^T-ny^T)(v-1_n\otimes x^*)\\
&\quad +\frac{\beta}{n^2}(\bar g^T1_n1_n^T- g^T1_n1_n^T)(v-1_n\otimes x^*)\\
&\leq \frac{\beta}{n}(\|W\|+\kappa n\|1_n\otimes \bar x-x\|)\|v-1_n\otimes x^*\|
\end{aligned}
n 2 β ( g ˉ T 1 n 1 n T − n y T ) ( v − 1 n ⊗ x ∗ ) = n 2 β ( g T 1 n 1 n T − n y T ) ( v − 1 n ⊗ x ∗ ) + n 2 β ( g ˉ T 1 n 1 n T − g T 1 n 1 n T ) ( v − 1 n ⊗ x ∗ ) ≤ n β ( ∥ W ∥ + κ n ∥ 1 n ⊗ x ˉ − x ∥ ) ∥ v − 1 n ⊗ x ∗ ∥
因为v i , x ∗ ∈ Ω v_i,x^*\in\Omega v i , x ∗ ∈ Ω ,故∥ v − 1 n ⊗ x ∗ ∥ ≤ c \|v-1_n\otimes x^*\|\leq c ∥ v − 1 n ⊗ x ∗ ∥ ≤ c 有界。则
V ˙ ≤ − β V + c β n ( ∥ W ∥ + κ n ∥ 1 n ⊗ x ˉ − x ∥ ) \dot V\leq -\beta V+\frac{c\beta}{n}(\|W\|+\kappa n\|1_n\otimes \bar x-x\|)
V ˙ ≤ − β V + n c β ( ∥ W ∥ + κ n ∥ 1 n ⊗ x ˉ − x ∥ )
再次利用引理2,得到V → 0 V\to 0 V → 0 。
由于F ( x ) F(x) F ( x ) 不是强凸,需要加证明x → x ∗ x\to x^* x → x ∗ ,见原论文。