Matrix Calculus Study Notes

Layout conventions

Numerator layout

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{pmatrix}$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{pmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_n}{\partial x} \end{pmatrix}$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{cccc} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{array}\right]$$

$$\frac{\partial y}{\partial \mathbf{X}}=\left[\begin{array}{cccc} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{array}\right]$$

$$\frac{\partial \mathbf{Y}}{\partial x}=\left[\begin{array}{cccc} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{array}\right]$$

$$\mathrm{d}\mathbf{X}=\left[\begin{array}{cccc} \mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n} \\ \mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn} \end{array}\right]$$

Denominator layout

$$\frac{\partial y}{\partial \mathbf{x}}=\left[\begin{array}{c} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{array}\right]$$

$$\frac{\partial \mathbf{y}}{\partial x}=\left[\begin{array}{cccc} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{array}\right]$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{cccc} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{array}\right]$$

$$\frac{\partial y}{\partial \mathbf{X}}=\left[\begin{array}{cccc} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{array}\right]$$
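
The two layouts of a vector-by-vector derivative are transposes of each other. The sketch below illustrates this with a finite-difference Jacobian. The helper `numerical_jacobian`, the matrix `A`, and the point `x0` are hypothetical names and values chosen only for illustration.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Numerator-layout Jacobian: entry (i, j) approximates d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(f(x))
    jac = np.zeros((y0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        # Central difference along coordinate j.
        jac[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

# Hypothetical example map y = A x with A of shape (m, n) = (2, 3).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
f = lambda x: A @ x
x0 = np.array([0.5, -1.0, 2.0])

J_num = numerical_jacobian(f, x0)   # numerator layout, shape (m, n)
J_den = J_num.T                     # denominator layout, shape (n, m)

print(J_num.shape, J_den.shape)
print(np.allclose(J_num, A), np.allclose(J_den, A.T))
```

For a linear map the numerator-layout Jacobian is `A` itself, and the denominator-layout one is `A.T`.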

Vector-by-vector derivatives

Derivation 1

Let $v = v(\mathbf{x})$ be a scalar function and $\mathbf{u} = \mathbf{u}(\mathbf{x})$ a vector function. We want $\frac{\partial (v\mathbf{u})}{\partial \mathbf{x}}$.

Numerator layout

$$\frac{\partial (v\mathbf{u})_i}{\partial x_j}=\frac{\partial (v u_i)}{\partial x_j}=\frac{\partial v}{\partial x_j}u_i + v\frac{\partial u_i}{\partial x_j}=u_i\left(\frac{\partial v}{\partial \mathbf{x}}\right)_j + v\left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right)_{ij}$$

Hence

$$\frac{\partial (v\mathbf{u})}{\partial \mathbf{x}}=\mathbf{u}\frac{\partial v}{\partial \mathbf{x}}+v\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$$

Denominator layout

$$\frac{\partial (v\mathbf{u})_j}{\partial x_i}=\frac{\partial (v u_j)}{\partial x_i}=\frac{\partial v}{\partial x_i}u_j + v\frac{\partial u_j}{\partial x_i}=\left(\frac{\partial v}{\partial \mathbf{x}}\right)_i u_j + v\left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right)_{ij}$$

Hence

$$\frac{\partial (v\mathbf{u})}{\partial \mathbf{x}}=\frac{\partial v}{\partial \mathbf{x}}\mathbf{u}^T+v\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$$
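
The product rule in both layouts can be checked numerically. The following is a minimal sketch using central finite differences. The test functions `v` and `u`, the point `x0`, and the helper `numerical_jacobian` are hypothetical and chosen only for illustration.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Numerator-layout Jacobian: entry (i, j) approximates d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(f(x))
    jac = np.zeros((y0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

# Hypothetical test functions: v is scalar-valued, u is vector-valued.
v = lambda x: np.sin(x[0]) + x[1] * x[2]
u = lambda x: np.array([x[0] * x[1], x[2] ** 2])
x0 = np.array([0.3, -0.7, 1.2])

dv = numerical_jacobian(v, x0)                         # shape (1, n), a row vector
du = numerical_jacobian(u, x0)                         # shape (m, n)
lhs = numerical_jacobian(lambda x: v(x) * u(x), x0)    # shape (m, n)

# Numerator layout: u (m x 1) times dv/dx (1 x n), plus v * du/dx.
rhs_num = u(x0)[:, None] @ dv + v(x0) * du
# Denominator layout: dv/dx (n x 1) times u^T (1 x m), plus v * du/dx (n x m).
rhs_den = dv.T @ u(x0)[None, :] + v(x0) * du.T

ok_num = np.allclose(lhs, rhs_num, atol=1e-6)
ok_den = np.allclose(lhs.T, rhs_den, atol=1e-6)
print(ok_num, ok_den)
```

The denominator-layout result is the transpose of the numerator-layout one, which is why the outer product changes from $\mathbf{u}\,\partial v/\partial\mathbf{x}$ to $(\partial v/\partial\mathbf{x})\,\mathbf{u}^T$.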

Derivation 2

Let $\mathbf{g}(\mathbf{u}):\mathbb{R}^{n}\to\mathbb{R}^{n}$ with $\mathbf{u}=\mathbf{u}(\mathbf{x})$. Componentwise, the chain rule reads

$$\frac{\partial g_i}{\partial x_j}=\sum_{k}\frac{\partial g_i}{\partial u_k}\frac{\partial u_k}{\partial x_j}$$

so in numerator layout

$$\frac{\partial \mathbf{g}}{\partial \mathbf{x}} = \frac{\partial \mathbf{g}}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$$
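
As a rough check, the numerator-layout chain rule can be verified with finite-difference Jacobians. The functions `u` and `g`, the point `x0`, and the helper `numerical_jacobian` below are hypothetical choices for illustration. The dimension of `x` is taken to be 2 here, which the text does not specify.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Numerator-layout Jacobian: entry (i, j) approximates d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(f(x))
    jac = np.zeros((y0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

# Hypothetical maps: u maps R^2 -> R^3, g maps R^3 -> R^3.
u = lambda x: np.array([x[0] * x[1], np.sin(x[1]), x[0] + x[1] ** 2])
g = lambda t: np.array([t[0] ** 2, t[0] * t[2], np.exp(t[1])])
x0 = np.array([0.4, -0.9])

J_g = numerical_jacobian(g, u(x0))                  # dg/du, shape (3, 3)
J_u = numerical_jacobian(u, x0)                     # du/dx, shape (3, 2)
J_gx = numerical_jacobian(lambda x: g(u(x)), x0)    # dg/dx, shape (3, 2)

# Numerator layout: dg/dx = (dg/du) (du/dx).
chain_ok = np.allclose(J_gx, J_g @ J_u, atol=1e-6)
print(chain_ok)
```

In denominator layout every factor is transposed, so the product appears in the reverse order, `J_u.T @ J_g.T`.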

Example

Let $l=\|\mathbf{X}\mathbf{w}-\mathbf{y}\|^2$, where $\mathbf{X}\in\mathbb{R}^{m\times n}$, $\mathbf{w}\in\mathbb{R}^{n}$, and $\mathbf{y}\in\mathbb{R}^{m}$. Find $\frac{\partial l}{\partial \mathbf{w}}$.

Set $\mathbf{u} = \mathbf{X}\mathbf{w}-\mathbf{y}$, so that $l = \mathbf{u}^T\mathbf{u}$. In denominator layout the chain rule multiplies in the reverse order, with $\frac{\partial \mathbf{u}}{\partial \mathbf{w}} = \mathbf{X}^T$ and $\frac{\partial l}{\partial \mathbf{u}} = 2\mathbf{u}$:

$$\frac{\partial l}{\partial \mathbf{w}} = \frac{\partial \mathbf{u}}{\partial \mathbf{w}} \frac{\partial l}{\partial \mathbf{u}}=\mathbf{X}^T \cdot 2\mathbf{u}=2\mathbf{X}^T\left(\mathbf{X}\mathbf{w}-\mathbf{y}\right)$$
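
The least-squares gradient $2\mathbf{X}^T(\mathbf{X}\mathbf{w}-\mathbf{y})$ can be compared against a finite-difference estimate. This is only a sketch: the sizes `m` and `n`, the random data, and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                       # arbitrary sizes chosen for the check
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
w = rng.normal(size=n)

def loss(w):
    """Squared Euclidean norm of the residual X w - y."""
    return np.sum((X @ w - y) ** 2)

# Closed form from the derivation (denominator layout, shape (n,)).
grad_analytic = 2 * X.T @ (X @ w - y)

# Central differences along each coordinate direction of w.
eps = 1e-6
grad_numeric = np.array([
    (loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

grad_ok = np.allclose(grad_analytic, grad_numeric, atol=1e-5)
print(grad_ok)
```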

Scalar-by-vector derivatives

Differentials

Denominator layout

For a scalar function $f$ of $\mathbf{X}\in\mathbb{R}^{m\times n}$:

$$\mathrm{d}f=\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\partial f}{\partial x_{ij}}\mathrm{d}x_{ij}=\mathrm{tr}\left(\left(\frac{\partial f}{\partial \mathbf{X}}\right)^T\mathrm{d}\mathbf{X}\right)$$
Rules

$$\mathrm{d}\left(\mathbf{X} \pm \mathbf{Y}\right) = \mathrm{d}\mathbf{X} \pm \mathrm{d}\mathbf{Y}$$

$$\mathrm{d}\left(\mathbf{X}\mathbf{Y}\right) = \left(\mathrm{d}\mathbf{X}\right)\mathbf{Y} + \mathbf{X}\,\mathrm{d}\mathbf{Y}$$

$$\mathrm{d}\left(\mathbf{X}^T\right)=\left(\mathrm{d}\mathbf{X}\right)^T$$

$$\mathrm{d}\,\mathrm{tr}\left(\mathbf{X}\right)=\mathrm{tr}\left(\mathrm{d}\mathbf{X}\right)$$

$$\mathrm{d}\mathbf{X}^{-1}=-\mathbf{X}^{-1}\left(\mathrm{d}\mathbf{X}\right)\mathbf{X}^{-1}$$

$$\mathrm{d}\left|\mathbf{X}\right|=\mathrm{tr}\left(\mathbf{X}^{*}\mathrm{d}\mathbf{X}\right) = \left|\mathbf{X}\right|\mathrm{tr}\left(\mathbf{X}^{-1}\mathrm{d}\mathbf{X}\right)$$

$$\mathrm{d}\left(\mathbf{X} \odot \mathbf{Y}\right)=\mathrm{d}\mathbf{X} \odot \mathbf{Y}+\mathbf{X} \odot \mathrm{d}\mathbf{Y}$$

$$\mathrm{d}\sigma\left(\mathbf{X}\right)=\sigma^{\prime}\left(\mathbf{X}\right) \odot \mathrm{d}\mathbf{X}$$

Here $\mathbf{X}^{*}$ is the adjugate of $\mathbf{X}$, $\odot$ is the element-wise (Hadamard) product, and $\sigma$ is a scalar function applied element-wise. The second form of the determinant rule requires $\mathbf{X}$ to be invertible.
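
The inverse and determinant rules are the least obvious of the list, so a small numerical check may help. The sketch below compares the first-order predictions with actual changes under a tiny perturbation `dX`. The matrix construction (a symmetric positive definite `X`, used only to guarantee invertibility), the perturbation scale, and the seed are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))
X = A @ A.T + np.eye(n)               # assumption: SPD so that X is invertible
dX = 1e-6 * rng.normal(size=(n, n))   # small, otherwise arbitrary perturbation

X_inv = np.linalg.inv(X)

# Rule: d(X^{-1}) = -X^{-1} (dX) X^{-1}
d_inv_actual = np.linalg.inv(X + dX) - X_inv
d_inv_pred = -X_inv @ dX @ X_inv
inv_ok = np.allclose(d_inv_actual, d_inv_pred, rtol=0, atol=1e-9)

# Rule: d|X| = |X| tr(X^{-1} dX), compared after dividing both sides by |X|
det_X = np.linalg.det(X)
rel_det_actual = (np.linalg.det(X + dX) - det_X) / det_X
rel_det_pred = np.trace(X_inv @ dX)
det_ok = np.isclose(rel_det_actual, rel_det_pred, rtol=0, atol=1e-9)

print(inv_ok, det_ok)
```

Both comparisons agree only up to second-order terms in `dX`, which is why the perturbation is kept small and the tolerances are absolute.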

Trace tricks

A scalar equals its own trace, $x = \mathrm{tr}\left(x\right)$, which lets us wrap a scalar differential in a trace and then apply:

$$\mathrm{tr}\left(\mathbf{X}^T\right)=\mathrm{tr}\left(\mathbf{X}\right)$$

$$\mathrm{tr}\left(\mathbf{X} \pm \mathbf{Y}\right)=\mathrm{tr}\left(\mathbf{X}\right) \pm \mathrm{tr}\left(\mathbf{Y}\right)$$

$$\mathrm{tr}\left(\mathbf{X}\mathbf{Y}\right) = \mathrm{tr}\left(\mathbf{Y}\mathbf{X}\right)$$

$$\mathrm{tr}\left(\mathbf{A}^T\left(\mathbf{B}\odot\mathbf{C}\right)\right)=\mathrm{tr}\left(\left(\mathbf{A}\odot\mathbf{B}\right)^T\mathbf{C}\right)$$

In the last identity $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ have the same shape.
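
These identities are easy to confirm on random matrices. The snippet below is a quick sanity check rather than a proof. The shapes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.normal(size=(3, 3))                       # square, for tr(S^T) = tr(S)
X = rng.normal(size=(3, 4))
Y = rng.normal(size=(4, 3))                       # XY is 3x3 and YX is 4x4
A, B, C = (rng.normal(size=(3, 4)) for _ in range(3))

transpose_ok = np.isclose(np.trace(S.T), np.trace(S))
cyclic_ok = np.isclose(np.trace(X @ Y), np.trace(Y @ X))
# Both sides equal the sum over all entries of A * B * C.
hadamard_ok = np.isclose(np.trace(A.T @ (B * C)), np.trace((A * B).T @ C))

print(transpose_ok, cyclic_ok, hadamard_ok)
```

Note that in the cyclic identity the two products have different sizes, yet their traces coincide.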

Worked example

Let $f = \mathrm{tr}\left(\mathbf{Y}^T\mathbf{M}\mathbf{Y}\right)$ with $\mathbf{Y} = \sigma\left(\mathbf{W}\mathbf{X}\right)$. Find $\frac{\partial f}{\partial \mathbf{X}}$.

First differentiate with respect to $\mathbf{Y}$:

$$\mathrm{d}f=\mathrm{tr}\left(\mathrm{d}\mathbf{Y}^T\mathbf{M}\mathbf{Y}+\mathbf{Y}^T\mathbf{M}\,\mathrm{d}\mathbf{Y}\right)\Rightarrow\frac{\partial f}{\partial \mathbf{Y}}=\mathbf{M}\mathbf{Y}+\mathbf{M}^T\mathbf{Y}$$

The differential of $\mathbf{Y}$ is a matrix, so no trace is taken here:

$$\mathrm{d}\mathbf{Y} = \sigma^{\prime}\left(\mathbf{W}\mathbf{X}\right)\odot\left(\mathbf{W}\,\mathrm{d}\mathbf{X}\right)$$

Substituting and using the Hadamard trace identity:

$$\mathrm{d}f=\mathrm{tr}\left(\left(\frac{\partial f}{\partial\mathbf{Y}}\right)^T\mathrm{d}\mathbf{Y}\right)=\mathrm{tr}\left(\left(\frac{\partial f}{\partial\mathbf{Y}}\right)^T\left(\sigma^{\prime}\left(\mathbf{W}\mathbf{X}\right)\odot\left(\mathbf{W}\,\mathrm{d}\mathbf{X}\right)\right)\right)=\mathrm{tr}\left(\left(\frac{\partial f}{\partial\mathbf{Y}}\odot\sigma^{\prime}\left(\mathbf{W}\mathbf{X}\right)\right)^T\mathbf{W}\,\mathrm{d}\mathbf{X}\right)$$

Therefore

$$\frac{\partial f}{\partial \mathbf{X}}=\mathbf{W}^T\left(\left(\mathbf{M}\mathbf{Y}+\mathbf{M}^T\mathbf{Y}\right)\odot\sigma^{\prime}\left(\mathbf{W}\mathbf{X}\right)\right)$$
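
The final formula can be checked against finite differences. In the sketch below, $\sigma$ is assumed to be the logistic sigmoid, which the text leaves unspecified. The shapes `p`, `q`, `r`, the random data, and the seed are likewise arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q, r = 3, 4, 2                 # W is (p, q), X is (q, r), M is (p, p)
W = rng.normal(size=(p, q))
X = rng.normal(size=(q, r))
M = rng.normal(size=(p, p))

# Assumption: sigma is the logistic sigmoid, applied element-wise.
sigma = lambda Z: 1.0 / (1.0 + np.exp(-Z))
sigma_prime = lambda Z: sigma(Z) * (1.0 - sigma(Z))

def f(X):
    """f = tr(Y^T M Y) with Y = sigma(W X)."""
    Y = sigma(W @ X)
    return np.trace(Y.T @ M @ Y)

# Closed form from the derivation: W^T ((M Y + M^T Y) * sigma'(W X)).
Y = sigma(W @ X)
grad_analytic = W.T @ ((M @ Y + M.T @ Y) * sigma_prime(W @ X))

# Central differences, one entry of X at a time.
eps = 1e-6
grad_numeric = np.zeros_like(X)
for i in range(q):
    for j in range(r):
        E = np.zeros_like(X)
        E[i, j] = eps
        grad_numeric[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

grad_ok = np.allclose(grad_analytic, grad_numeric, atol=1e-6)
print(grad_ok)
```

The gradient has the same shape as `X`, as expected in denominator layout.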

References
https://zhuanlan.zhihu.com/p/24709748
https://en.wikipedia.org/wiki/Matrix_calculus#convert_differential_derivative
