Convolutional Neural Network Model_Kevin Shark’s Blog

Convolutional Neural Network Model

Convolutional Neural Network (LeNet)

Model structure: convolutional layer block, fully connected layer block

  • Convolutional layer block: 2 convolutional layers, each paired with a max pooling layer. Because LeNet is an early CNN, each convolutional layer is followed by a sigmoid activation and then a pooling layer. Today ReLU is more commonly used.
  • Fully connected layer block: its input is two-dimensional (sample × feature). When the output of the convolutional layer block is passed to the fully connected layer block, each sample in the mini-batch is flattened into a vector.

As the network deepens, the feature maps in LeNet gradually shrink in height and width while the number of channels increases.
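The shrinking width and growing channel count can be traced with the standard output-size formula for convolution and pooling windows. The sketch below is only an illustration: the 28×28 single-channel input and the kernel, padding, and channel values in `LAYERS` are assumptions borrowed from a common LeNet-style configuration, not values stated in this article.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

# Hypothetical LeNet-style configuration: (name, out_channels, kernel, stride, padding).
# out_channels=None marks a pooling layer, which keeps the channel count.
LAYERS = [
    ("conv1", 6, 5, 1, 2),
    ("pool1", None, 2, 2, 0),
    ("conv2", 16, 5, 1, 0),
    ("pool2", None, 2, 2, 0),
]

def trace_shapes(channels, size, layers=LAYERS):
    """Return (name, channels, height/width) after every layer."""
    shapes = []
    for name, out_channels, kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, channels, size))
    return shapes

shapes = trace_shapes(channels=1, size=28)
for name, c, s in shapes:
    print(f"{name}: {c} x {s} x {s}")

# Flattening turns each sample into one feature vector for the fully connected block.
_, c, s = shapes[-1]
flat_features = c * s * s
print("flattened features per sample:", flat_features)
```

Under these assumed settings the spatial size goes 28, 14, 10, 5 while the channels go 6, 6, 16, 16, and each sample is flattened into 400 features before the fully connected layer block.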

Deep Convolutional Neural Network (AlexNet)

Model structure: 5 layers of convolution + 2 layers of fully connected hidden layer + 1 layer of fully connected output layer

  • Convolution layer: The first two use 11×11 and 5×5 convolution kernels, and the rest are 3×3 convolution kernels. The first, second, and fifth convolutional layers all use a 3×3 max pooling layer with a stride of 2.
  • Fully connected layer: 2 fully connected layers with 4096 outputs carry nearly 1GB of model parameters.
  • Activation function: AlexNet uses the ReLU activation function. Compared with sigmoid, ReLU is cheaper to compute and makes the model easier to train under different initializations. For example, under some initializations the sigmoid output is extremely close to 0 or 1, where its gradient is almost 0, which makes it difficult for the model to keep updating. The gradient of ReLU in the positive interval is always 1.
  • Overfitting: AlexNet uses dropout to control model complexity and prevent overfitting. It also uses extensive image augmentation, including flipping, cropping, and color changes, to further reduce overfitting.
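The difference between the two activation functions is easiest to see in their gradients. The following is a minimal sketch in plain Python; the helper names and the sample inputs are chosen here for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), largest at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of max(0, x); the value at exactly 0 is set to 0 by convention here.
    return 1.0 if x > 0 else 0.0

for x in (0.5, 5.0, 10.0):
    print(f"x={x}: sigmoid'={sigmoid_grad(x):.6f}, relu'={relu_grad(x)}")
```

As the input grows, the sigmoid output saturates toward 1 and its gradient shrinks toward 0, while the ReLU gradient stays at 1 for every positive input.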

Network using repeating elements (VGG)

Model structure: VGG block + fully connected layer block

  • VGG block: one or more convolutional layers followed by a pooling layer. Each convolutional layer uses a 3×3 kernel with a padding of 1, and the block ends with a max pooling layer with a 2×2 window and a stride of 2.
  • Fully connected layer block: similar to LeNet

VGG is a very regular network: each block halves the height and width and, in most blocks, doubles the number of channels. Compared with AlexNet, it offers a simple, fixed recipe for building deep convolutional models.
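The halving and doubling pattern can be checked with a short shape trace. This is a sketch under stated assumptions: the 224×224 input and the block plan in `CONV_ARCH` are hypothetical values in the style of a small VGG variant, not numbers given in this article.

```python
def vgg_block_shape(channels_out, size):
    """Shape after one VGG block: 3x3 convolutions with padding 1 keep the size,
    then a 2x2 max pooling layer with stride 2 halves it."""
    conv_size = (size + 2 * 1 - 3) // 1 + 1   # unchanged by the convolutions
    pooled = (conv_size - 2) // 2 + 1          # halved by the pooling layer
    return channels_out, pooled

# Hypothetical plan: (number of convolutional layers, output channels) per block.
CONV_ARCH = [(1, 64), (1, 128), (2, 256), (2, 512), (2, 512)]

def trace_vgg(size=224, arch=CONV_ARCH):
    """Return (channels, height/width) after every VGG block."""
    shapes = []
    for _num_convs, channels_out in arch:
        channels, size = vgg_block_shape(channels_out, size)
        shapes.append((channels, size))
    return shapes

vgg_shapes = trace_vgg()
for channels, size in vgg_shapes:
    print(f"{channels} x {size} x {size}")
```

The number of convolutional layers in a block does not affect the output size, because every 3×3 convolution with a padding of 1 preserves height and width; only the pooling layer at the end of the block changes them.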

Network in Network (NiN)

Model structure: NiN block

  • NiN block: AlexNet uses several convolutional layers followed by fully connected layers that produce the output. NiN takes a different approach: it chains small blocks, each made of a convolutional layer followed by “fully connected” layers, into a network. Because a fully connected layer takes two-dimensional input (sample, feature) while a convolutional layer generally works on four-dimensional input (sample, channel, height, width), the NiN block uses 1×1 convolutional layers in place of fully connected layers: each element in the spatial dimensions (height and width) acts as a sample, and the channels act as features. The convolution window sizes follow AlexNet: 11×11, 5×5, and 3×3. Each NiN block before the final one is followed by a max pooling layer with a 3×3 window and a stride of 2.

Compared with AlexNet, NiN removes the last 3 fully connected layers. Instead, it uses a NiN block whose number of output channels equals the number of label classes, then a global average pooling layer that averages all elements in each channel and uses the result directly for classification. The benefit is a significantly smaller model; the cost is that training can take longer.
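To make the two ideas concrete, here is a minimal sketch in plain Python of a 1×1 convolution (a per-pixel fully connected layer over the channels) followed by global average pooling. The tiny 2-channel input and the 3-class weight matrix are made-up values for illustration, and the nested-list layout is an assumption of this sketch, not how a deep learning library stores tensors.

```python
def conv1x1(x, weight):
    """Apply a 1x1 convolution.

    x:      input as nested lists with shape [c_in][h][w]
    weight: weights as nested lists with shape [c_out][c_in]
    Every pixel is treated like a sample whose features are its channels.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [sum(weight[o][i] * x[i][r][c] for i in range(c_in)) for c in range(w)]
            for r in range(h)
        ]
        for o in range(len(weight))
    ]

def global_avg_pool(x):
    """Average all elements in each channel, giving one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

x = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.0, 1.0], [1.0, 0.0]],   # channel 1
]
weight = [
    [1.0, 0.0],    # output channel 0 copies input channel 0
    [1.0, 2.0],    # output channel 1 mixes both input channels
    [0.0, -1.0],   # output channel 2 negates input channel 1
]

y = conv1x1(x, weight)         # shape [3][2][2]: one channel per class
scores = global_avg_pool(y)    # one score per class, no fully connected layer
print(scores)
```

The only learned parameters here are the 3×2 entries of `weight`, regardless of the height and width of the input, which is where the reduction in model size comes from.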

Network with parallel connections (GoogLeNet)

  • Inception block: the basic block of GoogLeNet, which borrows NiN's idea of chaining small networks into a larger network. Each Inception block contains 4 parallel paths. The first three paths use 1×1, 3×3, and 5×5 convolutional layers to extract feature information at different spatial scales. The second and third paths first apply a 1×1 convolutional layer to reduce the number of input channels and thus the complexity of the model. The last path uses a 3×3 max pooling layer followed by a 1×1 convolutional layer to change the number of channels. Appropriate padding is applied on all 4 paths so that the output has the same height and width as the input.
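The sketch below checks two properties of such a block: all four paths keep the input height and width, and the 1×1 reduction on the 5×5 path cuts the number of weights. It assumes, as in the usual Inception design, that the four outputs are concatenated along the channel dimension; the channel counts (192 in, 64, 96/128, 16/32, 32) are illustrative values chosen for this example rather than numbers from this article, and bias terms are ignored.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

def inception_shape(size, c1, c2, c3, c4):
    """Return (channels, size) of an Inception block output.

    c1: output channels of the 1x1 path
    c2: (reduce, out) channels of the 1x1 -> 3x3 path
    c3: (reduce, out) channels of the 1x1 -> 5x5 path
    c4: output channels of the 3x3 max pooling -> 1x1 path
    """
    sizes = [
        out_size(size, 1),                                 # 1x1 conv
        out_size(out_size(size, 1), 3, padding=1),         # 1x1 conv, 3x3 conv
        out_size(out_size(size, 1), 5, padding=2),         # 1x1 conv, 5x5 conv
        out_size(out_size(size, 3, stride=1, padding=1), 1),  # 3x3 pool, 1x1 conv
    ]
    if len(set(sizes)) != 1:
        raise ValueError("paths disagree on spatial size")
    # Assumed: the four path outputs are concatenated along the channel dimension.
    return c1 + c2[1] + c3[1] + c4, sizes[0]

def weights_5x5_direct(c_in, c_out):
    """Weights of a single 5x5 convolution from c_in to c_out channels."""
    return c_in * c_out * 5 * 5

def weights_5x5_reduced(c_in, c_mid, c_out):
    """Weights of a 1x1 reduction to c_mid channels followed by a 5x5 convolution."""
    return c_in * c_mid + c_mid * c_out * 5 * 5

print(inception_shape(28, 64, (96, 128), (16, 32), 32))
print(weights_5x5_direct(192, 32), weights_5x5_reduced(192, 16, 32))
```

With these assumed channel counts, the reduced 5×5 path needs roughly a tenth of the weights of a direct 5×5 convolution over all 192 input channels.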

Residual network (ResNet)

[Figure: residual block (https://d2l.ai/_images/resnet-block.svg)]

  • Residual block: Normally, the input to an activation function is the result of computing through the network layer by layer. As the network keeps getting deeper, gradients tend to become unstable (exploding or vanishing gradients), and the error stops decreasing as layers are added. The residual block is designed to address this gradient instability: through a skip connection, the output of the block also takes the input of the block into account.

  • The principle of the residual block: $a^{[l+2]} = g(z^{[l+2]} + a^{[l]}) = g(w^{[l+2]} a^{[l+1]} + b^{[l+2]} + a^{[l]})$. Ignoring $b^{[l+2]}$ for now, when the gradient vanishes and $w^{[l+2]} = 0$, we get $a^{[l+2]} = g(a^{[l]})$, which is equivalent to passing the output of layer $l$ directly through ReLU. The vanishing gradient therefore has no negative impact.
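The special case described above can be checked numerically. The following is a minimal sketch in plain Python with $g$ taken to be ReLU; the vectors and the helper names (`relu`, `matvec`, `residual_step`) are made up for this illustration.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def residual_step(a_l, a_l1, w, b):
    """Compute a[l+2] = g(w a[l+1] + b + a[l]) with g = ReLU."""
    z = [zi + bi for zi, bi in zip(matvec(w, a_l1), b)]
    return relu([zi + ai for zi, ai in zip(z, a_l)])

# a[l] is itself a ReLU output, so its entries are non-negative.
a_l = [0.5, 2.0, 0.0]
a_l1 = [1.0, 3.0, 0.2]

# Vanished weights and bias: the layer contributes nothing on its own.
w_zero = [[0.0, 0.0, 0.0] for _ in range(3)]
b_zero = [0.0, 0.0, 0.0]

a_l2 = residual_step(a_l, a_l1, w_zero, b_zero)
print(a_l2)
```

Because the entries of `a_l` are non-negative, ReLU leaves them unchanged, so the block reduces to the identity mapping on its input instead of zeroing out the signal.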

Densely connected network (DenseNet)

Model structure: dense block + transition layer

  • Dense block: DenseNet and ResNet are very similar. The difference is that DenseNet does not add the output of the previous module to the output of the current module as ResNet does; instead, it concatenates the two along the channel dimension.
  • Transition layer: to keep the concatenated channels from making the model too complex, the transition layer reduces the number of channels with a 1×1 convolutional layer and halves the height and width with an average pooling layer with a stride of 2, which further reduces complexity.
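The channel bookkeeping can be sketched in a few lines of plain Python. All numbers below (64 input channels, 4 convolutional layers per block, 32 new channels per layer, a 56×56 feature map, and a transition layer that halves the channels) are hypothetical values for illustration, and the 2×2 pooling window is an assumption of this sketch.

```python
def dense_block_channels(c_in, num_convs, growth_rate):
    """Channels after a dense block: each layer's output is concatenated
    to its input along the channel dimension, adding growth_rate channels."""
    channels = c_in
    for _ in range(num_convs):
        channels += growth_rate
    return channels

def transition_shape(size, c_out):
    """Shape after a transition layer: a 1x1 convolution sets the channel
    count to c_out, then average pooling with stride 2 (assumed 2x2 window)
    halves the height and width."""
    pooled = (size - 2) // 2 + 1
    return c_out, pooled

c_after_block = dense_block_channels(c_in=64, num_convs=4, growth_rate=32)
c_after_trans, size_after_trans = transition_shape(size=56, c_out=c_after_block // 2)

print("after dense block:", c_after_block, "channels")
print("after transition:", c_after_trans, "channels,", size_after_trans, "x", size_after_trans)
```

With addition, as in ResNet, the channel count would have stayed at 64; with concatenation it grows to 192 within a single block, which is why a transition layer is placed between dense blocks.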

This article is from the internet and does not represent the position of 1024programmer. Please indicate the source when reprinting: https://www.1024programmer.com/convolutional-neural-network-model_kevin-sharks-blog/

author: admin
