Table of Contents
- Overview
- Main Contents
- Symbol Description
- Two representations of $Y=Conv(K,X)$
- $Y=K\tilde{X}$
- $Y=\mathcal{K}X$
- kernel orthogonal regularization
- orthogonal convolution
Wang J, Chen Y, Chakraborty R, Yu S X. Orthogonal Convolutional Neural Networks. arXiv, 2019.
@article{wang2019orthogonal,
  title={Orthogonal Convolutional Neural Networks},
  author={Wang, Jiayun and Chen, Yubei and Chakraborty, Rudrasis and Yu, Stella X},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}
Overview
This paper proposes a method for enforcing orthogonality on the convolutional layers of a CNN.
Main Contents
Symbol Description
\(X \in \mathbb{R}^{N \times C \times H \times W}\): input
\(K \in \mathbb{R}^{M \times C \times k \times k}\): convolution kernel
\(Y \in \mathbb{R}^{N \times M \times H' \times W'}\): output
\[Y = Conv(K, X).
\]
Two representations of \(Y=Conv(K,X)\)
\(Y=K\tilde{X}\)
Here \(K \in \mathbb{R}^{M \times Ck^2}\), with each row being one flattened convolution kernel, \(\tilde{X} \in \mathbb{R}^{Ck^2 \times H'W'}\) collects the flattened input patches (im2col), and \(Y \in \mathbb{R}^{M \times H'W'}\) (per sample).
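As a concrete illustration, here is a minimal sketch (assuming PyTorch; all sizes and variable names are chosen for the example, not taken from the paper's code) that checks the identity \(Y=K\tilde{X}\) against a standard convolution:

```python
import torch
import torch.nn.functional as F

# Toy sizes chosen for illustration (single sample, N = 1).
M, C, k, H, W = 8, 3, 3, 32, 32
S, P = 1, 1                                   # stride and padding

K = torch.randn(M, C, k, k)
X = torch.randn(1, C, H, W)

# im2col: each column of X_tilde is one flattened C*k^2 input patch.
X_tilde = F.unfold(X, kernel_size=k, stride=S, padding=P)    # (1, C*k^2, H'W')

# Y = K X_tilde with the kernel reshaped to (M, C*k^2).
Y_mat = K.reshape(M, -1) @ X_tilde[0]                        # (M, H'W')

# Same result from the standard convolution.
Y_conv = F.conv2d(X, K, stride=S, padding=P).reshape(M, -1)
print(torch.allclose(Y_mat, Y_conv, atol=1e-4))              # True
```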
\(Y=\mathcal{K}X\)
Here \(X \in \mathbb{R}^{CHW}\) is the input image flattened into a vector, \(\mathcal{K} \in \mathbb{R}^{MH'W' \times CHW}\) is the structured matrix induced by the kernel, the inner product of each row of \(\mathcal{K}\) with \(X\) gives one output element of the convolution, and \(Y \in \mathbb{R}^{MH'W'}\).
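For tiny sizes, \(\mathcal{K}\) can be materialized explicitly by convolving the standard basis images, one per column (a minimal sketch assuming PyTorch; the sizes are hypothetical):

```python
import torch
import torch.nn.functional as F

# Tiny hypothetical sizes so the dense matrix stays small.
M, C, k, H, W = 2, 2, 3, 5, 5
S, P = 1, 1
K = torch.randn(M, C, k, k)

# Column (c, h, w) of \mathcal{K} is the convolution of the basis image e_{chw}.
E = torch.eye(C * H * W).reshape(C * H * W, C, H, W)
cols = F.conv2d(E, K, stride=S, padding=P)           # (CHW, M, H', W')
K_cal = cols.reshape(C * H * W, -1).T                # (MH'W', CHW)

# Check: the matrix-vector product reproduces the convolution.
X = torch.randn(1, C, H, W)
Y = F.conv2d(X, K, stride=S, padding=P)
print(torch.allclose(K_cal @ X.reshape(-1), Y.reshape(-1), atol=1e-4))  # True
```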
kernel orthogonal regularization
This amounts to requiring \(KK^T=I\) (row orthogonality) or \(K^TK=I\) (column orthogonality) for the reshaped kernel matrix \(K \in \mathbb{R}^{M \times Ck^2}\); the regularization terms are
\[L_{korth-row}= \|KK^T-I\|_F,\\
L_{korth-col} = \|K^TK-I\|_F.
\]
The author explained in the latest version of the paper that the two are equivalent.
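A minimal sketch of this regularizer (assuming PyTorch; `kernel_orth_loss` is a name chosen here for illustration, not from the paper's code):

```python
import torch

def kernel_orth_loss(K: torch.Tensor, rows: bool = True) -> torch.Tensor:
    """Kernel orthogonality penalty on the reshaped kernel matrix.

    K has shape (M, C, k, k) and is flattened to (M, C*k^2) as in Y = K X_tilde.
    rows=True gives ||K K^T - I||_F, rows=False gives ||K^T K - I||_F.
    """
    Km = K.reshape(K.shape[0], -1)            # (M, C*k^2)
    G = Km @ Km.T if rows else Km.T @ Km
    I = torch.eye(G.shape[0], device=K.device, dtype=K.dtype)
    return torch.norm(G - I, p="fro")

# Usage: add the penalty (times a weight) to the task loss during training.
K = torch.randn(8, 3, 3, 3, requires_grad=True)
loss = kernel_orth_loss(K)
loss.backward()
```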
orthogonal convolution
What the authors actually want is \(\mathcal{K}\mathcal{K}^T=I\) or \(\mathcal{K}^T\mathcal{K}=I\).
Let \(\mathcal{K}(ihw,\cdot)\) denote row \((i-1)H'W'+(h-1)W'+w\) of \(\mathcal{K}\), and \(\mathcal{K}(\cdot, ihw)\) column \((i-1)HW+(h-1)W+w\).
Then \(\mathcal{K}\mathcal{K}^T=I\) is equivalent to
\[\tag{5}
\langle \mathcal{K}(ih_1w_1, \cdot), \mathcal{K}(jh_2w_2,\cdot)\rangle =
\left\{
\begin{array}{ll}
1, & (i,h_1,w_1)=(j,h_2,w_2), \\
0, & \text{else}.
\end{array} \right.
\]
\(\mathcal{K}^T\mathcal{K}=I\) is equivalent to
\[\tag{10}
\langle \mathcal{K}(\cdot, ih_1w_1), \mathcal{K}(\cdot, jh_2w_2)\rangle =
\left\{
\begin{array}{ll}
1, & (i,h_1,w_1)=(j,h_2,w_2), \\
0, & \text{else}.
\end{array} \right.
\]
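For tiny sizes, condition (5) can be checked by brute force on the dense matrix \(\mathcal{K}\) built as in the earlier sketch, i.e. by forming every inner product at once (a minimal sketch assuming PyTorch; sizes hypothetical):

```python
import torch
import torch.nn.functional as F

# Build the dense \mathcal{K} as before and test row orthonormality directly.
M, C, k, H, W = 2, 2, 3, 5, 5
K = torch.randn(M, C, k, k)
E = torch.eye(C * H * W).reshape(C * H * W, C, H, W)
K_cal = F.conv2d(E, K, padding=1).reshape(C * H * W, -1).T   # (MH'W', CHW)

gram = K_cal @ K_cal.T                        # all the inner products in (5)
print(torch.allclose(gram, torch.eye(gram.shape[0]), atol=1e-4))  # False for a random K
```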
In fact, these conditions contain a great deal of redundancy, and they can be simplified to a more compact form.
(5) is equivalent to
\[\tag{7}
Conv(K, K, padding=P, stride=S)=I_{r0},
\]
where \(I_{r0}\in \mathbb{R}^{M\times M \times (2P/S+1) \times (2P/S+1)}\) equals \(1\) only at the entries \((i,i,\lfloor \frac{k-1}{S} \rfloor+1,\lfloor \frac{k-1}{S} \rfloor+1)\), \(i=1,\ldots, M\), and \(0\) everywhere else, with
\[P= \lfloor \frac{k-1}{S} \rfloor \cdot S.
\]
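A minimal sketch of the penalty implied by (7), \(\|Conv(K,K,padding=P,stride=S)-I_{r0}\|_F\) (assuming PyTorch; `conv_orth_loss` is a hypothetical helper name, not the paper's code):

```python
import torch
import torch.nn.functional as F

def conv_orth_loss(K: torch.Tensor, stride: int = 1) -> torch.Tensor:
    """Row-orthogonality penalty ||Conv(K, K, padding=P, stride=S) - I_r0||_F.

    K has shape (M, C, k, k); it serves both as the filter bank and as a batch
    of M inputs of size C x k x k, so the output has shape
    (M, M, 2P/S+1, 2P/S+1) with P = floor((k-1)/S) * S.
    """
    M, C, k, _ = K.shape
    P = ((k - 1) // stride) * stride
    out = F.conv2d(K, K, stride=stride, padding=P)
    # I_r0: 1 at the centre position (i, i, floor((k-1)/S)+1, ...) in 1-based indexing.
    target = torch.zeros_like(out)
    idx = torch.arange(M)
    c = (k - 1) // stride
    target[idx, idx, c, c] = 1.0
    return torch.norm(out - target, p="fro")
```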
The derivation goes roughly as follows (it is genuinely hard to write out clearly):
For \(\mathcal{K}^T\mathcal{K}\), in the special case \(S=1\), (10) is equivalent to
\[\tag{11}
Conv(K^T,K^T, padding=k-1, stride=1)=I_{c0},
\]
where \(I_{c0} \in \mathbb{R}^{C \times C \times (2k-1) \times (2k-1)}\) likewise equals \(1\) only at the entries \((i,i,k,k)\), \(i=1,\ldots,C\), and \(0\) everywhere else. Here \(K^T \in \mathbb{R}^{C \times M \times k \times k}\) denotes \(K\) with its first two axes swapped.
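The column version for \(S=1\) then looks just like the row version after transposing the first two axes of the kernel (a minimal sketch assuming PyTorch; sizes hypothetical):

```python
import torch
import torch.nn.functional as F

# Column-orthogonality check of (11): convolve K^T with itself, padding k-1.
M, C, k = 4, 3, 3
K = torch.randn(M, C, k, k)
K_T = K.transpose(0, 1).contiguous()            # (C, M, k, k)
out = F.conv2d(K_T, K_T, padding=k - 1)         # (C, C, 2k-1, 2k-1)

# I_c0: 1 at entries (i, i, k, k) in 1-based indexing, 0 elsewhere.
I_c0 = torch.zeros_like(out)
idx = torch.arange(C)
I_c0[idx, idx, k - 1, k - 1] = 1.0
loss_col = torch.norm(out - I_c0, p="fro")
print(loss_col)
```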
Likewise,
\[\min_K \|\mathcal{K}\mathcal{K}^T-I\|_F
\]
and
\[\min_K \|\mathcal{K}^T\mathcal{K}-I\|_F
\]
are equivalent.
On the other hand, the kernel orthogonality regularization mentioned at the beginning is only a necessary (but not sufficient) condition for orthogonal convolution: \(KK^T=I\) and \(K^TK=I\) are equivalent to
\[Conv(K,K,padding=0)=I_{r0} \\
Conv(K^T, K^T, padding=0)=I_{c0},
\]
where \(I_{r0} \in \mathbb{R}^{M \times M \times 1 \times 1}\) and \(I_{c0} \in \mathbb{R}^{C \times C \times 1 \times 1}\).
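Indeed, with padding \(0\) only the fully aligned position survives, so \(Conv(K,K,padding=0)\) is exactly the Gram matrix \(KK^T\) of the flattened kernels used by the kernel orthogonality regularizer (a minimal sketch assuming PyTorch):

```python
import torch
import torch.nn.functional as F

M, C, k = 4, 3, 3
K = torch.randn(M, C, k, k)

# Conv(K, K, padding=0) has a single spatial position; entry (i, j, 0, 0) is the
# inner product of kernels i and j, i.e. (K K^T) on the flattened (M, Ck^2) matrix.
gram_conv = F.conv2d(K, K, padding=0).reshape(M, M)
gram_mat = K.reshape(M, -1) @ K.reshape(M, -1).T
print(torch.allclose(gram_conv, gram_mat, atol=1e-4))   # True
```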