Calculus Notes 04: Common Matrix Derivative Operations
4.1 Basic Matrix Derivative Examples
4.1.1 Derivative example 1: \(f(x)=A_{m\times n}\cdot x_{n \times 1}\) \(\Rightarrow f'_{x^T}(x)=A_{m\times n}\)
For example:
\[A= \begin{bmatrix} a_1&a_2&a_3\\ b_1&b_2&b_3 \end{bmatrix}, x= \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} \Rightarrow f(x)= \begin{bmatrix} a_1x_1+a_2x_2+a_3x_3\\ b_1x_1+b_2x_2+b_3x_3 \end{bmatrix} \]
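Place \(\partial f_i/\partial x_j\) in row \(i\), column \(j\). Each component of \(f\) is linear in \(x\), so the coefficients carry over unchanged:
\[\tag{1} f'_{x^T}(x)= \begin{bmatrix} a_1&a_2&a_3\\ b_1&b_2&b_3 \end{bmatrix}=A \]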
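Result (1) can be sanity-checked numerically. The following is a minimal sketch using central differences in plain Python; the matrix `A`, the point `x0`, and the helper `numerical_jacobian` are illustrative assumptions, not part of the original notes.

```python
# Finite-difference check that the Jacobian of f(x) = A x equals A.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # arbitrary 2x3 test matrix (assumption)
x0 = [0.5, -1.0, 2.0]   # arbitrary test point (assumption)


def f(x):
    """Return the vector A x as a list."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d f_i / d x_j."""
    m = len(func(x))
    jac = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2 * h)
    return jac


J = numerical_jacobian(f, x0)
# Largest entrywise deviation between the numerical Jacobian and A.
max_error = max(abs(J[i][j] - A[i][j]) for i in range(2) for j in range(3))
print(max_error)  # should be close to zero
```

Because \(f\) is linear, the central difference is exact up to floating-point rounding, so the printed error should be tiny.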
4.1.2 Derivative example 2: \(f(x)= x_{1 \times m}\cdot A_{m\times m} \cdot x^T_{m \times 1} \Rightarrow f'_x(x)=x_{1 \times m}\cdot (A_{m\times m}+A_{m\times m}^T)\)
For example:
\[x= \begin{bmatrix} x_1&x_2 \end{bmatrix}, A= \begin{bmatrix} a&b\\ c&d \end{bmatrix}, x^T= \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \]
\[\Rightarrow f(x)= \begin{bmatrix} ax_1+cx_2&bx_1+dx_2 \end{bmatrix} \cdot \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \]
\[\qquad\quad = \begin{bmatrix} a{x_1}^2+bx_1x_2+cx_1x_2+dx_2^2 \end{bmatrix} \]
Then:
\[f'_x(x)= \begin{bmatrix} 2ax_1+bx_2+cx_2&2dx_2+bx_1+cx_1 \end{bmatrix} \]
\[\tag{2} = \begin{bmatrix} x_1&x_2 \end{bmatrix} \cdot \begin{bmatrix} a&b\\ c&d \end{bmatrix} + \begin{bmatrix} x_1&x_2 \end{bmatrix} \cdot \begin{bmatrix} a&c\\ b&d \end{bmatrix} =x\cdot (A+A^T) \]
Here \(x\) is a row vector, so the gradient is also a row vector. For a column vector \(v\), the same result reads \((v^T\cdot A\cdot v)'_v=(A+A^T)\cdot v\).
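As a rough numerical check of this quadratic-form rule, the sketch below compares a central-difference gradient of \(x\cdot A\cdot x^T\) with the closed form whose \(k\)-th entry is \(\sum_i x_i(a_{ik}+a_{ki})\), that is, the row vector \(x\cdot(A+A^T)\). The test values and the helper `numerical_gradient` are assumptions chosen only for illustration.

```python
# Finite-difference check of the gradient of f(x) = x A x^T.
A = [[1.0, 2.0],
     [3.0, 4.0]]      # arbitrary non-symmetric test matrix (assumption)
x0 = [0.7, -1.3]      # arbitrary test point (assumption)


def f(x):
    """Quadratic form sum_i sum_j x_i * A[i][j] * x_j."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))


def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function."""
    grad = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


# Closed form: entry k of x (A + A^T) is sum_i x_i * (A[i][k] + A[k][i]).
closed_form = [sum(x0[i] * (A[i][k] + A[k][i]) for i in range(2))
               for k in range(2)]
numeric = numerical_gradient(f, x0)
max_error = max(abs(n - c) for n, c in zip(numeric, closed_form))
print(max_error)  # should be close to zero
```

A non-symmetric `A` is used on purpose: with a symmetric matrix the check could not distinguish \(A+A^T\) from \(2A\).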
4.1.3 Derivative example 3: \(f(x)=x_{n\times 1}^T\cdot a_{n \times 1} \Rightarrow f'_x(x)=(a^T\cdot x)'_x=a\)
For example:
\[x^T= \begin{bmatrix} x_1&x_2 \end{bmatrix}, a= \begin{bmatrix} a_1\\ a_2 \end{bmatrix} \]
\[\Rightarrow f(x)= x^T\cdot a= \begin{bmatrix} x_1a_1+x_2a_2 \end{bmatrix} =a^T\cdot x \]
where:
\[x= \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \]
Since \(f\) is linear in \(x\), the partial derivative with respect to each \(x_i\) is simply the coefficient \(a_i\):
\[\tag{3} f'_x(x)= (a^T\cdot x)'_x = \begin{bmatrix} a_1\\ a_2 \end{bmatrix} =a \]
4.1.4 Derivative example 4: \(f(x)=x_{m\times 1}^T\cdot A_{m \times n}\cdot y_{n \times 1} \Rightarrow f'_x(x)=Ay,\ f'_A(x)=xy^T\)
For example:
\[x^T= \begin{bmatrix} x_1&x_2&x_3 \end{bmatrix}, A= \begin{bmatrix} a_1&a_2\\ a_3&a_4\\ a_5&a_6 \end{bmatrix}, y= \begin{bmatrix} y_1\\ y_2\\ \end{bmatrix} \]
\[\Rightarrow f(x) =x^T\cdot A\cdot y= \begin{bmatrix} a_1x_1+a_3x_2+a_5x_3&a_2x_1+a_4x_2+a_6x_3\\ \end{bmatrix} \cdot \begin{bmatrix} y_1\\ y_2\\ \end{bmatrix} \]
\[\qquad\qquad\qquad\qquad\qquad\quad = \begin{bmatrix} (a_1x_1+a_3x_2+a_5x_3)\cdot y_1+(a_2x_1+a_4x_2+a_6x_3)\cdot y_2 \end{bmatrix} \]
Differentiating with respect to each \(x_i\), and then with respect to each entry of \(A\), gives:
\[f'_x(x)= \begin{bmatrix} a_1y_1+a_2y_2\\ a_3y_1+a_4y_2\\ a_5y_1+a_6y_2 \end{bmatrix} =A \cdot y \]
\[\tag{4} f'_A(x)= \begin{bmatrix} x_1y_1&x_1y_2\\ x_2y_1&x_2y_2\\ x_3y_1&x_3y_2 \end{bmatrix} =x\cdot y^T \]
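Both rules for the bilinear form, \(f'_x=A\cdot y\) and \(f'_A=x\cdot y^T\), can be checked with the same finite-difference idea. This is only a sketch; the values of `x0`, `A0`, `y0` and the step `H` are arbitrary assumptions.

```python
# Finite-difference check of d(x^T A y)/dx = A y and d(x^T A y)/dA = x y^T.
x0 = [1.0, -2.0, 0.5]                        # arbitrary 3-vector (assumption)
A0 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # arbitrary 3x2 matrix (assumption)
y0 = [0.3, -0.7]                             # arbitrary 2-vector (assumption)
H = 1e-6                                     # finite-difference step


def f(x, A):
    """Scalar x^T A y for the fixed vector y0."""
    return sum(x[i] * A[i][j] * y0[j] for i in range(3) for j in range(2))


# Numerical gradient with respect to x.
grad_x = []
for i in range(3):
    xp = list(x0)
    xp[i] += H
    xm = list(x0)
    xm[i] -= H
    grad_x.append((f(xp, A0) - f(xm, A0)) / (2 * H))

# Numerical gradient with respect to A, one entry at a time.
grad_A = [[0.0] * 2 for _ in range(3)]
for i in range(3):
    for j in range(2):
        Ap = [row[:] for row in A0]
        Ap[i][j] += H
        Am = [row[:] for row in A0]
        Am[i][j] -= H
        grad_A[i][j] = (f(x0, Ap) - f(x0, Am)) / (2 * H)

# Closed forms: A y (3x1) and the outer product x y^T (3x2).
Ay = [sum(A0[i][j] * y0[j] for j in range(2)) for i in range(3)]
xyT = [[x0[i] * y0[j] for j in range(2)] for i in range(3)]

err_x = max(abs(grad_x[i] - Ay[i]) for i in range(3))
err_A = max(abs(grad_A[i][j] - xyT[i][j]) for i in range(3) for j in range(2))
print(err_x, err_A)  # both should be close to zero
```

Note that the gradient with respect to \(A\) has the same \(3\times 2\) shape as \(A\) itself, which is why the result is the outer product \(x\cdot y^T\) rather than a vector.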
4.2 Derivative of a Matrix Norm
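Let \(X_{N \times n}\) be a matrix, and let \(a_{n \times 1}\) and \(y_{N \times 1}\) be column vectors.
Let \(f(a)=||X\cdot a-y||^2\). The derivative \(f'_a(a)\) is obtained as follows.
Because \(X\cdot a-y\) is a column vector, its squared norm is its inner product with itself:
\[f(a)=(X\cdot a-y)^T\cdot (X\cdot a-y) \]
\[\qquad \qquad =(a^T\cdot X^T -y^T)\cdot (X\cdot a-y) \]
\[\tag{5} \qquad \qquad\qquad\qquad\qquad\quad =a^T\cdot X^T X \cdot a -a^T\cdot X^T\cdot y-y^T\cdot X \cdot a + y^Ty \]
In (5):
For the term \(a^T\cdot X^T X \cdot a\), applying (2) to the column vector \(a\) and using the symmetry of \(X^TX\) gives:
\[(a^T\cdot X^T X \cdot a)'_a=(X^TX+(X^TX)^T)\cdot a=2X^TX\cdot a \]
For the term \(a^T\cdot X^T\cdot y\), (3) gives:
\[(a^T\cdot X^T\cdot y)'_a=[a^T\cdot (X^T\cdot y)]'_a=X^T\cdot y \]
The term \(y^T\cdot X \cdot a\) is a scalar and therefore equals its own transpose:
\[(y^T\cdot X \cdot a)'_a=(a^T\cdot X^T\cdot y)'_a=X^T\cdot y \]
The term \(y^Ty\) does not depend on \(a\), so its derivative is zero. Combining the terms:
\[f'_a(a)=(||X\cdot a-y||^2)'_a=2(X^TX\cdot a-X^T\cdot y)=2X^T\cdot (X\cdot a-y) \]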
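The least-squares gradient \(2X^T\cdot(X\cdot a-y)\) can be checked numerically as well. The sketch below assumes a small \(3\times 2\) matrix `X` and arbitrary vectors `y` and `a0`; none of these values come from the original notes.

```python
# Finite-difference check of the gradient of f(a) = ||X a - y||^2.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # N = 3, n = 2 (assumption)
y = [1.0, 0.0, -1.0]                       # arbitrary target vector (assumption)
a0 = [0.5, -0.25]                          # arbitrary test point (assumption)
H = 1e-6                                   # finite-difference step


def residual(a):
    """Return the residual vector X a - y."""
    return [sum(X[i][j] * a[j] for j in range(2)) - y[i] for i in range(3)]


def f(a):
    """Squared Euclidean norm of the residual."""
    return sum(r * r for r in residual(a))


# Numerical gradient with respect to a.
numeric = []
for k in range(2):
    ap = list(a0)
    ap[k] += H
    am = list(a0)
    am[k] -= H
    numeric.append((f(ap) - f(am)) / (2 * H))

# Closed form 2 X^T (X a - y): entry k is 2 * sum_i X[i][k] * r_i.
r0 = residual(a0)
closed_form = [2 * sum(X[i][k] * r0[i] for i in range(3)) for k in range(2)]

max_error = max(abs(n - c) for n, c in zip(numeric, closed_form))
print(max_error)  # should be close to zero
```

Setting this gradient to zero yields the normal equations \(X^TX\cdot a=X^T\cdot y\), which is the usual reason this derivative is computed.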
4.3 Trace Derivative Examples
4.3.1 Trace derivative example 1: \(tr'_A(A)=I\)
Let \(A_{m\times m}\) be a square matrix and \(tr(A)\) its trace. Then:
\[tr(A)=\sum_{i=1}^m a_{ii} \]
Each diagonal entry \(a_{ii}\) appears exactly once with coefficient 1, and no off-diagonal entry appears at all, so:
\[\tag{6} \Rightarrow tr'_A(A)=I= \begin{bmatrix} 1&&&\\ &1&&\\ &&\ddots&\\ &&&1 \end{bmatrix} \]
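4.3.2 Trace derivative example 2: \(tr'_A(A\cdot B)=B^T\)
Let \(A_{m\times m}\) and \(B_{m\times m}\) be square matrices and \(tr(A\cdot B)\) the trace of \(A\cdot B\). Then:
\[tr(A\cdot B)=\sum_{i=1}^m\sum_{j=1}^m a_{ij}b_{ji} \]
The partial derivative with respect to \(a_{ij}\) is \(b_{ji}\), which is entry \((i,j)\) of \(B^T\). Therefore:
\[\tag{7} tr'_A(A\cdot B)=\left(\sum_{i=1}^m\sum_{j=1}^m a_{ij}b_{ji}\right)'_A=B^T \]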
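4.3.3 Trace derivative example 3: \(tr'_A(A\cdot A^T)=2\cdot A\)
Let \(A_{m\times m}\) be a square matrix and \(tr(A\cdot A^T)\) the trace of \(A\cdot A^T\). Entry \((i,i)\) of \(A\cdot A^T\) is \(\sum_{j} a_{ij}a_{ij}\), so:
\[tr(A\cdot A^T)=\sum_{i=1}^m\sum_{j=1}^m a_{ij}a_{ij}=\sum_{i=1}^m\sum_{j=1}^m a^2_{ij} \]
The partial derivative with respect to \(a_{ij}\) is \(2a_{ij}\). Therefore:
\[\tag{8} tr'_A(A\cdot A^T)=\left(\sum_{i=1}^m\sum_{j=1}^m a^2_{ij}\right)'_A=2\cdot A \]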
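The trace rules \(tr'_A(A\cdot B)=B^T\) and \(tr'_A(A\cdot A^T)=2\cdot A\) can be checked entry by entry. The following is a sketch under the assumption of arbitrary \(2\times 2\) test matrices; `numerical_gradient` is a hypothetical helper written for this check.

```python
# Finite-difference check of d tr(AB)/dA = B^T and d tr(A A^T)/dA = 2A.
A0 = [[1.0, 2.0], [3.0, 4.0]]     # arbitrary test matrix (assumption)
B = [[0.5, -1.0], [2.0, 1.5]]     # arbitrary test matrix (assumption)
H = 1e-6                          # finite-difference step


def trace_AB(A):
    """tr(A B) = sum_i sum_j A[i][j] * B[j][i]."""
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))


def trace_AAT(A):
    """tr(A A^T): diagonal entry i of A A^T is sum_j A[i][j] * A[i][j]."""
    return sum(A[i][j] * A[i][j] for i in range(2) for j in range(2))


def numerical_gradient(func, A):
    """Central-difference derivative of a scalar function of a 2x2 matrix."""
    grad = [[0.0] * 2 for _ in range(2)]
    for i in range(2):
        for j in range(2):
            Ap = [row[:] for row in A]
            Ap[i][j] += H
            Am = [row[:] for row in A]
            Am[i][j] -= H
            grad[i][j] = (func(Ap) - func(Am)) / (2 * H)
    return grad


g_AB = numerical_gradient(trace_AB, A0)
g_AAT = numerical_gradient(trace_AAT, A0)

# Compare with B^T (entry (i, j) is B[j][i]) and with 2A.
err_AB = max(abs(g_AB[i][j] - B[j][i]) for i in range(2) for j in range(2))
err_AAT = max(abs(g_AAT[i][j] - 2 * A0[i][j])
              for i in range(2) for j in range(2))
print(err_AB, err_AAT)  # both should be close to zero
```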
4.4 Determinant Derivative Example: \(|A|'_A=|A|\cdot (A^{-1})^T\)
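Let \(A_{m\times m}\) be a square matrix, \(|A|\) its determinant, \(a_{ij}\) an arbitrary entry of \(A\), and \(A_{ij}\) the cofactor of \(a_{ij}\).
Expanding the determinant along any row \(i\) gives:
\[|A|=a_{i1}A_{i1}+a_{i2}A_{i2}+\cdots+a_{im}A_{im} \]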
The cofactors \(A_{i1},\dots,A_{im}\) contain no entry from row \(i\), so differentiating the row expansion with respect to \(a_{ij}\) leaves only \(A_{ij}\):
\[\Rightarrow \frac{\partial |A|}{\partial a_{ij}}=A_{ij} \]
Collecting these partial derivatives for all \(i\) and \(j\), with \(A^*\) denoting the adjugate matrix of \(A\) (the transpose of the cofactor matrix):
\[\tag{9} |A|'_A = \begin{bmatrix} A_{11}&A_{12}&\cdots&A_{1m}\\ A_{21}&A_{22}&\cdots&A_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ A_{m1}&A_{m2}&\cdots&A_{mm} \end{bmatrix} =(A^*)^T \]
For an invertible \(A\), the inverse satisfies \(A^{-1}=\frac{A^*}{|A|}\), so \((A^*)^T=|A|\cdot (A^{-1})^T\) and therefore:
\[\tag{10} |A|'_A=|A|\cdot (A^{-1})^T \]
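Formula (10) can be checked without computing an inverse: if the numerical gradient \(G\) equals \(|A|\cdot(A^{-1})^T\), then \(A\cdot G^T=|A|\cdot I\). The sketch below tests that identity for one \(3\times 3\) matrix; the matrix `A0` and the helper `det3` are assumptions made for illustration.

```python
# Finite-difference check of d|A|/dA = |A| (A^{-1})^T for a 3x3 matrix.
A0 = [[2.0, 1.0, 0.0],
      [1.0, 3.0, 1.0],
      [0.0, 1.0, 4.0]]   # arbitrary invertible test matrix (assumption)
H = 1e-6                 # finite-difference step


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


# Numerical gradient G[i][j] approximating d|A| / d a_ij.
G = [[0.0] * 3 for _ in range(3)]
for i in range(3):
    for j in range(3):
        Ap = [row[:] for row in A0]
        Ap[i][j] += H
        Am = [row[:] for row in A0]
        Am[i][j] -= H
        G[i][j] = (det3(Ap) - det3(Am)) / (2 * H)

d = det3(A0)

# If G = |A| (A^{-1})^T, then A G^T = |A| I.
# Entry (i, k) of A G^T is sum_j A0[i][j] * G[k][j].
product = [[sum(A0[i][j] * G[k][j] for j in range(3)) for k in range(3)]
           for i in range(3)]
max_error = max(abs(product[i][k] - (d if i == k else 0.0))
                for i in range(3) for k in range(3))
print(d, max_error)  # max_error should be close to zero
```

Checking \(A\cdot G^T=|A|\cdot I\) rather than inverting \(A\) keeps the example short and avoids a hand-written matrix inverse.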
Source: https://www.cnblogs.com/efancn/p/18770330