
Orthogonal Convolutional Neural Networks

Table of Contents

  • Overview
  • Main Contents
    • Symbol Description
    • Two representations of $Y=Conv(K,X)$
      • $Y=K\tilde{X}$
      • $Y=\mathcal{K}X$
    • kernel orthogonal regularization
    • orthogonal convolution

Wang J, Chen Y, Chakraborty R, et al. Orthogonal Convolutional Neural Networks [J]. arXiv: Computer Vision and Pattern Recognition, 2019.

@article{wang2019orthogonal,
  title={Orthogonal Convolutional Neural Networks},
  author={Wang, Jiayun and Chen, Yubei and Chakraborty, Rudrasis and Yu, Stella X},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}

Overview

This paper proposes a method for making the convolutional layers of a CNN orthogonal.

Main Contents

Symbol Description

\(X \in \mathbb{R}^{N \times C \times H \times W}\): input

\(K \in \mathbb{R}^{M \times C \times k \times k}\): convolution kernel

\(Y \in \mathbb{R}^{N \times M \times H' \times W'}\): output

\[
Y = Conv(K, X).
\]

Two representations of \(Y=Conv(K,X)\)


\(Y=K\tilde{X}\)

Here \(K\in \mathbb{R}^{M \times Ck^2}\), where each row is one flattened convolution kernel, \(\tilde{X} \in \mathbb{R}^{Ck^2 \times H'W'}\) collects the unrolled input patches (im2col), and \(Y \in \mathbb{R}^{M \times H'W'}\).
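Below is a minimal sketch of this representation, assuming PyTorch: `F.unfold` builds \(\tilde{X}\) from the input patches, and a plain matrix product with the reshaped kernel reproduces `F.conv2d`. The shapes and variable names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

N, C, H, W = 2, 3, 8, 8
M, k, S, P = 4, 3, 1, 1                      # filters, kernel size, stride, padding

X = torch.randn(N, C, H, W)
K = torch.randn(M, C, k, k)

X_tilde = F.unfold(X, kernel_size=k, padding=P, stride=S)   # (N, C*k*k, H'*W')
Y_mat = K.reshape(M, -1) @ X_tilde                          # (N, M, H'*W')

H_out = (H + 2 * P - k) // S + 1
W_out = (W + 2 * P - k) // S + 1
Y = Y_mat.reshape(N, M, H_out, W_out)

# the im2col matrix product agrees with the built-in convolution
assert torch.allclose(Y, F.conv2d(X, K, padding=P, stride=S), atol=1e-5)
```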

\(Y=\mathcal{K}X\)

Here \(X \in \mathbb{R}^{CHW}\), i.e. the image is stretched into a single vector, \(\mathcal{K} \in \mathbb{R}^{MH'W' \times CHW}\), and the inner product of each row of \(\mathcal{K}\) with this vector is equivalent to one convolution operation, so \(Y \in \mathbb{R}^{MH'W'}\).
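A minimal sketch of this second representation, again assuming PyTorch: by linearity, column \(j\) of \(\mathcal{K}\) is the convolution of the kernel with the \(j\)-th standard-basis image, so the explicit matrix can be built column by column and checked against `F.conv2d`. The construction is illustrative only.

```python
import torch
import torch.nn.functional as F

C, H, W = 2, 6, 6
M, k, S, P = 3, 3, 1, 1

K = torch.randn(M, C, k, k)

# every standard-basis image, convolved once, gives one column of K_cal
eye = torch.eye(C * H * W).reshape(C * H * W, C, H, W)
cols = F.conv2d(eye, K, padding=P, stride=S)                  # (CHW, M, H', W')
K_cal = cols.reshape(C * H * W, -1).T                         # (M*H'*W', C*H*W)

X = torch.randn(1, C, H, W)
y_matrix = K_cal @ X.reshape(-1)
y_conv = F.conv2d(X, K, padding=P, stride=S).reshape(-1)
assert torch.allclose(y_matrix, y_conv, atol=1e-5)
```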

kernel orthogonal regularization

This amounts to requiring \(KK^T=I\) (row orthogonality) or \(K^TK=I\) (column orthogonality); the regularization terms are

\[L_{korth-row}= \|KK^T-I\|_F,\\
L_{korth-col} = \|K^TK-I\|_F.
\]

The authors note in the latest version of the paper that the two are equivalent.
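As a rough illustration, a kernel-orthogonality penalty of this form can be written as follows (a sketch assuming PyTorch; `kernel_orth_loss` is an illustrative name, not the authors' released code):

```python
import torch

def kernel_orth_loss(weight: torch.Tensor, rows: bool = True) -> torch.Tensor:
    M = weight.shape[0]
    K = weight.reshape(M, -1)                 # flatten to (M, C*k*k)
    gram = K @ K.T if rows else K.T @ K       # (M, M) or (C*k*k, C*k*k)
    eye = torch.eye(gram.shape[0], device=weight.device, dtype=weight.dtype)
    # Frobenius norm of the deviation from the identity
    return torch.linalg.norm(gram - eye)

weight = torch.randn(8, 3, 3, 3)              # M=8, C=3, k=3
loss = kernel_orth_loss(weight)               # add this to the training loss
```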

orthogonal convolution

What the authors actually want is \(\mathcal{K}\mathcal{K}^T=I\) or \(\mathcal{K}^T\mathcal{K}=I\).

Let \(\mathcal{K}(ihw,\cdot)\) denote the \(((i-1)H'W'+(h-1)W'+w)\)-th row, and correspondingly \(\mathcal{K}(\cdot, ihw)\) the \(((i-1)HW+(h-1)W+w)\)-th column.

Then \(\mathcal{K}\mathcal{K}^T=I\) is equivalent to

\[\tag{5}
\langle \mathcal{K}(ih_1w_1, \cdot), \mathcal{K}(jh_2w_2,\cdot)\rangle =
\left\{
\begin{array}{ll}
1, & (i,h_1,w_1)=(j,h_2,w_2) \\
0, & \text{else}.
\end{array} \right.
\]

\(\mathcal{K}^T\mathcal{K}=I\) is equivalent to

\[\tag{10}
\langle \mathcal{K}(\cdot, ih_1w_1), \mathcal{K}(\cdot, jh_2w_2)\rangle =
\left\{
\begin{array}{ll}
1, & (i,h_1,w_1)=(j,h_2,w_2) \\
0, & \text{else}.
\end{array} \right.
\]

In fact, checking all of these inner products involves a lot of redundancy, and the condition can be simplified further.

(5) is equivalent to

\[\tag{7}
Conv(K, K, padding=P, stride=S)=I_{r0},
\]

where \(I_{r0}\in \mathbb{R}^{M\times M \times (2P/S+1) \times (2P/S+1)}\) is \(1\) only at \([i,i,\lfloor \frac{k-1}{S} \rfloor+1,\lfloor \frac{k-1}{S} \rfloor+1]\), \(i=1,\ldots, M\), and all other elements are \(0\), with

\[P= \lfloor \frac{k-1}{S} \rfloor \cdot S.
\]

The derivation (given as hand-drawn figures in the original post, omitted here) rests on the observation that the inner product of two rows of \(\mathcal{K}\) is the correlation between one kernel and a spatially shifted copy of another; output locations differing by \((\Delta h, \Delta w)\) correspond to input shifts of \(S\Delta h\) and \(S\Delta w\), so the two kernel patches overlap only when \(|\Delta h|, |\Delta w| \le \lfloor \frac{k-1}{S} \rfloor = P/S\), and all of these overlapping correlations are collected exactly by the self-convolution in (7).

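A minimal sketch of the resulting row-orthogonality penalty, assuming PyTorch (the function name and target construction are illustrative, following (7) rather than the authors' released code):

```python
import torch
import torch.nn.functional as F

def conv_orth_row_loss(weight: torch.Tensor, stride: int = 1) -> torch.Tensor:
    M, C, k, _ = weight.shape
    P = ((k - 1) // stride) * stride
    # treat the kernel tensor as a batch of M "images" and convolve it with itself
    out = F.conv2d(weight, weight, padding=P, stride=stride)  # (M, M, 2P/S+1, 2P/S+1)
    target = torch.zeros_like(out)
    idx = torch.arange(M)
    ctr = P // stride                                         # zero-shift position
    target[idx, idx, ctr, ctr] = 1.0                          # this is I_r0
    return torch.linalg.norm((out - target).reshape(-1))      # Frobenius norm

loss = conv_orth_row_loss(torch.randn(8, 3, 3, 3), stride=1)
```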

For \(\mathcal{K}^T\mathcal{K}\), in the special case \(S=1\), (10) is equivalent to

\[\tag{11}
Conv(K^T, K^T, padding=k-1, stride=1)=I_{c0},
\]

where \(I_{c0} \in \mathbb{R}^{C \times C \times (2k-1) \times (2k-1)}\) is likewise \(1\) only at \((i,i,k,k)\), \(i=1,\ldots,C\), and zero elsewhere. Here \(K^T \in \mathbb{R}^{C \times M \times k \times k}\) denotes \(K\) with its first two axes (output and input channels) transposed.

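The corresponding column version for stride 1, again as a hedged PyTorch sketch of (11):

```python
import torch
import torch.nn.functional as F

def conv_orth_col_loss(weight: torch.Tensor) -> torch.Tensor:
    M, C, k, _ = weight.shape
    Kt = weight.permute(1, 0, 2, 3).contiguous()          # K^T: (C, M, k, k)
    out = F.conv2d(Kt, Kt, padding=k - 1, stride=1)       # (C, C, 2k-1, 2k-1)
    target = torch.zeros_like(out)
    idx = torch.arange(C)
    target[idx, idx, k - 1, k - 1] = 1.0                  # this is I_c0
    return torch.linalg.norm((out - target).reshape(-1))

loss = conv_orth_col_loss(torch.randn(8, 3, 3, 3))
```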

Similarly,

\[\min_K \|\mathcal{K}\mathcal{K}^T-I\|_F
\]

and

\[\min_K \|\mathcal{K}^T\mathcal{K}-I\|_F
\]

are equivalent.

On the other hand, the kernel orthogonality mentioned at the beginning is only a necessary (but not sufficient) condition for orthogonal convolution: \(KK^T=I\) and \(K^TK=I\) are equivalent to

\[Conv(K,K,padding=0)=I_{r0}, \\
Conv(K^T, K^T, padding=0)=I_{c0},
\]

where \(I_{r0} \in \mathbb{R}^{M \times M \times 1 \times 1}\) and \(I_{c0} \in \mathbb{R}^{C \times C \times 1 \times 1}\), i.e. only the zero-shift terms of (7) and (11) are enforced.
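A quick numerical check of this claim (a sketch assuming PyTorch): with padding 0 and stride 1, the self-convolution collapses to a single spatial position and equals the Gram matrix of the flattened kernel, which is exactly the kernel-orthogonality condition from before.

```python
import torch
import torch.nn.functional as F

M, C, k = 4, 3, 3
weight = torch.randn(M, C, k, k)

self_conv = F.conv2d(weight, weight, padding=0)          # (M, M, 1, 1)
gram = weight.reshape(M, -1) @ weight.reshape(M, -1).T   # K K^T of the flattened kernel

assert torch.allclose(self_conv.squeeze(), gram, atol=1e-5)
```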
