Table of Contents
- Overview
- Main Contents
- Symbol Description
- Two representations of $Y=Conv(K,X)$
- $Y=K\tilde{X}$
- $Y=\mathcal{K}X$
- kernel orthogonal regularization
- orthogonal convolution
Wang J, Chen Y, Chakraborty R, Yu S X. Orthogonal Convolutional Neural Networks. arXiv, 2019.
@article{wang2019orthogonal,
  title={Orthogonal Convolutional Neural Networks},
  author={Wang, Jiayun and Chen, Yubei and Chakraborty, Rudrasis and Yu, Stella X},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}
Overview
This paper proposes a method for enforcing orthogonality on the convolutional layers of a CNN.
Main Contents
Symbol Description
\(X \in \mathbb{R}^{N \times C \times H \times W}\): input
\(K \in \mathbb{R}^{M \times C \times k \times k}\): convolution kernel
\(Y \in \mathbb{R}^{N \times M \times H' \times W'}\): output
\[Y = Conv(K, X).
\]
Two representations of \(Y=Conv(K,X)\)
\(Y=K\tilde{X}\)
Here \(K \in \mathbb{R}^{M \times Ck^2}\), with each row being one flattened convolution kernel, \(\tilde{X} \in \mathbb{R}^{Ck^2 \times H'W'}\) collects the flattened input patches (im2col), and \(Y \in \mathbb{R}^{M \times H'W'}\) (per sample).
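As a concrete illustration, here is a minimal sketch (assuming PyTorch; all sizes and variable names are chosen for the example, not taken from the paper's code) that checks the identity \(Y=K\tilde{X}\) against a standard convolution:

```python
import torch
import torch.nn.functional as F

# Toy sizes chosen for illustration (single sample, N = 1).
M, C, k, H, W = 8, 3, 3, 32, 32
S, P = 1, 1                                   # stride and padding

K = torch.randn(M, C, k, k)
X = torch.randn(1, C, H, W)

# im2col: each column of X_tilde is one flattened C*k^2 input patch.
X_tilde = F.unfold(X, kernel_size=k, stride=S, padding=P)    # (1, C*k^2, H'W')

# Y = K X_tilde with the kernel reshaped to (M, C*k^2).
Y_mat = K.reshape(M, -1) @ X_tilde[0]                        # (M, H'W')

# Same result from the standard convolution.
Y_conv = F.conv2d(X, K, stride=S, padding=P).reshape(M, -1)
print(torch.allclose(Y_mat, Y_conv, atol=1e-4))              # True
```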
\(Y=\mathcal{K}X\)
Here \(X \in \mathbb{R}^{CHW}\) is the input image flattened into a vector, \(\mathcal{K} \in \mathbb{R}^{MH'W' \times CHW}\) is the structured matrix induced by the kernel, the inner product of each row of \(\mathcal{K}\) with \(X\) gives one output element of the convolution, and \(Y \in \mathbb{R}^{MH'W'}\).
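For tiny sizes, \(\mathcal{K}\) can be materialized explicitly by convolving the standard basis images, one per column (a minimal sketch assuming PyTorch; the sizes are hypothetical):

```python
import torch
import torch.nn.functional as F

# Tiny hypothetical sizes so the dense matrix stays small.
M, C, k, H, W = 2, 2, 3, 5, 5
S, P = 1, 1
K = torch.randn(M, C, k, k)

# Column (c, h, w) of \mathcal{K} is the convolution of the basis image e_{chw}.
E = torch.eye(C * H * W).reshape(C * H * W, C, H, W)
cols = F.conv2d(E, K, stride=S, padding=P)           # (CHW, M, H', W')
K_cal = cols.reshape(C * H * W, -1).T                # (MH'W', CHW)

# Check: the matrix-vector product reproduces the convolution.
X = torch.randn(1, C, H, W)
Y = F.conv2d(X, K, stride=S, padding=P)
print(torch.allclose(K_cal @ X.reshape(-1), Y.reshape(-1), atol=1e-4))  # True
```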
kernel orthogonal regularization
This amounts to requiring \(KK^T=I\) (row orthogonality) or \(K^TK=I\) (column orthogonality) for the reshaped kernel matrix \(K \in \mathbb{R}^{M \times Ck^2}\); the regularization terms are
\[L_{korth-row}= \|KK^T-I\|_F,\\
L_{korth-col} = \|K^TK-I\|_F.
\]
The author explained in the latest version of the paper that the two are equivalent.
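A minimal sketch of this regularizer (assuming PyTorch; `kernel_orth_loss` is a name chosen here for illustration, not from the paper's code):

```python
import torch

def kernel_orth_loss(K: torch.Tensor, rows: bool = True) -> torch.Tensor:
    """Kernel orthogonality penalty on the reshaped kernel matrix.

    K has shape (M, C, k, k) and is flattened to (M, C*k^2) as in Y = K X_tilde.
    rows=True gives ||K K^T - I||_F, rows=False gives ||K^T K - I||_F.
    """
    Km = K.reshape(K.shape[0], -1)            # (M, C*k^2)
    G = Km @ Km.T if rows else Km.T @ Km
    I = torch.eye(G.shape[0], device=K.device, dtype=K.dtype)
    return torch.norm(G - I, p="fro")

# Usage: add the penalty (times a weight) to the task loss during training.
K = torch.randn(8, 3, 3, 3, requires_grad=True)
loss = kernel_orth_loss(K)
loss.backward()
```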
orthogonal convolution
What the authors actually want is \(\mathcal{K}\mathcal{K}^T=I\) or \(\mathcal{K}^T\mathcal{K}=I\).
Let \(\mathcal{K}(ihw,\cdot)\) denote row \((i-1)H'W'+(h-1)W'+w\) of \(\mathcal{K}\), and \(\mathcal{K}(\cdot, ihw)\) column \((i-1)HW+(h-1)W+w\).
Then \(\mathcal{K}\mathcal{K}^T=I\) is equivalent to
\[\tag{5}
\langle \mathcal{K}(ih_1w_1, \cdot), \mathcal{K}(jh_2w_2,\cdot)\rangle =
\left\{
\begin{array}{ll}
1, & (i,h_1,w_1)=(j,h_2,w_2), \\
0, & \text{else}.
\end{array} \right.
\]
\(\mathcal{K}^T\mathcal{K}=I\) is equivalent to
\[\tag{10}
\langle \mathcal{K}(\cdot, ih_1w_1), \mathcal{K}(\cdot, jh_2w_2)\rangle =
\left\{
\begin{array}{ll}
1, & (i,h_1,w_1)=(j,h_2,w_2), \\
0, & \text{else}.
\end{array} \right.
\]
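For tiny sizes, condition (5) can be checked by brute force on the dense matrix \(\mathcal{K}\) built as in the earlier sketch, i.e. by forming every inner product at once (a minimal sketch assuming PyTorch; sizes hypothetical):

```python
import torch
import torch.nn.functional as F

# Build the dense \mathcal{K} as before and test row orthonormality directly.
M, C, k, H, W = 2, 2, 3, 5, 5
K = torch.randn(M, C, k, k)
E = torch.eye(C * H * W).reshape(C * H * W, C, H, W)
K_cal = F.conv2d(E, K, padding=1).reshape(C * H * W, -1).T   # (MH'W', CHW)

gram = K_cal @ K_cal.T                        # all the inner products in (5)
print(torch.allclose(gram, torch.eye(gram.shape[0]), atol=1e-4))  # False for a random K
```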
In fact, these conditions contain a great deal of redundancy, and they can be simplified to a more compact form.
(5) is equivalent to
\[\tag{7}
Conv(K, K, padding=P, stride=S)=I_{r0},
\]
where \(I_{r0}\in \mathbb{R}^{M\times M \times (2P/S+1) \times (2P/S+1)}\) equals \(1\) only at the entries \((i,i,\lfloor \frac{k-1}{S} \rfloor+1,\lfloor \frac{k-1}{S} \rfloor+1)\), \(i=1,\ldots, M\), and \(0\) everywhere else, with
\[P= \lfloor \frac{k-1}{S} \rfloor \cdot S.
\]
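A minimal sketch of the penalty implied by (7), \(\|Conv(K,K,padding=P,stride=S)-I_{r0}\|_F\) (assuming PyTorch; `conv_orth_loss` is a hypothetical helper name, not the paper's code):

```python
import torch
import torch.nn.functional as F

def conv_orth_loss(K: torch.Tensor, stride: int = 1) -> torch.Tensor:
    """Row-orthogonality penalty ||Conv(K, K, padding=P, stride=S) - I_r0||_F.

    K has shape (M, C, k, k); it serves both as the filter bank and as a batch
    of M inputs of size C x k x k, so the output has shape
    (M, M, 2P/S+1, 2P/S+1) with P = floor((k-1)/S) * S.
    """
    M, C, k, _ = K.shape
    P = ((k - 1) // stride) * stride
    out = F.conv2d(K, K, stride=stride, padding=P)
    # I_r0: 1 at the centre position (i, i, floor((k-1)/S)+1, ...) in 1-based indexing.
    target = torch.zeros_like(out)
    idx = torch.arange(M)
    c = (k - 1) // stride
    target[idx, idx, c, c] = 1.0
    return torch.norm(out - target, p="fro")
```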
The derivation goes roughly as follows (it is genuinely hard to write out clearly):
For \(\mathcal{K}^T\mathcal{K}\), in the special case \(S=1\), (10) is equivalent to
\[\tag{11}
Conv(K^T,K^T, padding=k-1, stride=1)=I_{c0},
\]
where \(I_{c0} \in \mathbb{R}^{C \times C \times (2k-1) \times (2k-1)}\) likewise equals \(1\) only at the entries \((i,i,k,k)\), \(i=1,\ldots,C\), and \(0\) everywhere else. Here \(K^T \in \mathbb{R}^{C \times M \times k \times k}\) denotes \(K\) with its first two axes swapped.
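The column version for \(S=1\) then looks just like the row version after transposing the first two axes of the kernel (a minimal sketch assuming PyTorch; sizes hypothetical):

```python
import torch
import torch.nn.functional as F

# Column-orthogonality check of (11): convolve K^T with itself, padding k-1.
M, C, k = 4, 3, 3
K = torch.randn(M, C, k, k)
K_T = K.transpose(0, 1).contiguous()            # (C, M, k, k)
out = F.conv2d(K_T, K_T, padding=k - 1)         # (C, C, 2k-1, 2k-1)

# I_c0: 1 at entries (i, i, k, k) in 1-based indexing, 0 elsewhere.
I_c0 = torch.zeros_like(out)
idx = torch.arange(C)
I_c0[idx, idx, k - 1, k - 1] = 1.0
loss_col = torch.norm(out - I_c0, p="fro")
print(loss_col)
```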
Likewise,
\[\min_K \|\mathcal{K}\mathcal{K}^T-I\|_F
\]
and
\[\min_K \|\mathcal{K}^T\mathcal{K}-I\|_F
\]
are equivalent.
On the other hand, the kernel orthogonality regularization mentioned at the beginning is only a necessary (but not sufficient) condition for orthogonal convolution: \(KK^T=I\) and \(K^TK=I\) are equivalent to
\[Conv(K,K,padding=0)=I_{r0} \\
Conv(K^T, K^T, padding=0)=I_{c0},
\]
where \(I_{r0} \in \mathbb{R}^{M \times M \times 1 \times 1}\) and \(I_{c0} \in \mathbb{R}^{C \times C \times 1 \times 1}\).
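Indeed, with padding \(0\) only the fully aligned position survives, so \(Conv(K,K,padding=0)\) is exactly the Gram matrix \(KK^T\) of the flattened kernels used by the kernel orthogonality regularizer (a minimal sketch assuming PyTorch):

```python
import torch
import torch.nn.functional as F

M, C, k = 4, 3, 3
K = torch.randn(M, C, k, k)

# Conv(K, K, padding=0) has a single spatial position; entry (i, j, 0, 0) is the
# inner product of kernels i and j, i.e. (K K^T) on the flattened (M, Ck^2) matrix.
gram_conv = F.conv2d(K, K, padding=0).reshape(M, M)
gram_mat = K.reshape(M, -1) @ K.reshape(M, -1).T
print(torch.allclose(gram_conv, gram_mat, atol=1e-4))   # True
```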