ResNet significantly changed the view of how to parametrize functions in deep networks. DenseNet (densely connected convolutional network) is to some extent the logical extension of this (Huang et al., 2017). DenseNet is characterized by a connectivity pattern in which each layer connects to all preceding layers, and by concatenation (rather than the addition operator in ResNet) to preserve and reuse features from earlier layers. To understand how to arrive at it, let us take a small detour into mathematics.
import torch
from torch import nn
from d2l import torch as d2l
from mxnet import init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l
import tensorflow as tf
from d2l import tensorflow as d2l
8.7.1. From ResNet to DenseNet
Recall the Taylor expansion of a function. At the point x = 0 it can be written as
$$f(x) = f(0) + x \cdot \left[f'(0) + x \cdot \left[\frac{f''(0)}{2!} + x \cdot \left[\frac{f'''(0)}{3!} + \cdots\right]\right]\right]. \tag{8.7.1}$$
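For instance, for f(x) = e^x every derivative at 0 equals 1, so (8.7.1) specializes to the nested form

$$e^x = 1 + x \cdot \left[1 + x \cdot \left[\frac{1}{2!} + x \cdot \left[\frac{1}{3!} + \cdots\right]\right]\right],$$

which is just the familiar series 1 + x + x²/2! + x³/3! + …, written so that each bracket refines the preceding, lower-order approximation.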
The key point is that it decomposes a function into terms of increasingly higher order. In a similar vein, ResNet decomposes functions into
$$f(x) = x + g(x). \tag{8.7.2}$$
That is, ResNet decomposes f into a simple linear term and a more complex nonlinear term. What if we wanted to capture (not necessarily add) information beyond two terms? One such solution is DenseNet (Huang et al., 2017).
Fig. 8.7.1 The main difference between ResNet (left) and DenseNet (right) in cross-layer connections: use of addition and use of concatenation.
As shown in Fig. 8.7.1, the key difference between ResNet and DenseNet is that in the latter case outputs are concatenated (denoted by [,]) rather than added. As a result, we perform a mapping from x to its values after applying an increasingly complex sequence of functions:
$$x \to \left[x, f_1(x), f_2([x, f_1(x)]), f_3([x, f_1(x), f_2([x, f_1(x)])]), \ldots\right]. \tag{8.7.3}$$
In the end, all these functions are combined in an MLP to reduce the number of features again. In terms of implementation this is quite simple: rather than adding terms, we concatenate them. The name DenseNet arises from the fact that the dependency graph between variables becomes quite dense. The final layer of such a chain is densely connected to all previous layers. The dense connections are shown in Fig. 8.7.2.
Fig. 8.7.2 Dense connections in DenseNet. Note how the dimensionality increases with depth.
The main components that comprise a DenseNet are dense blocks and transition layers. The former define how the inputs and outputs are concatenated, while the latter control the number of channels so that it does not become too large, since the expansion x → [x, f1(x), f2([x, f1(x)]), …] can become quite high-dimensional. A minimal sketch of this growth follows.
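To see concretely how quickly this expansion grows, here is a minimal PyTorch sketch, with two stand-in convolutions playing the roles of f1 and f2 (the layer choices are purely illustrative), that mimics the concatenation in (8.7.3):

import torch

X = torch.randn(4, 3, 8, 8)  # a batch of 4 inputs with 3 channels of size 8x8
f1 = torch.nn.LazyConv2d(10, kernel_size=3, padding=1)  # stand-in for f1
f2 = torch.nn.LazyConv2d(10, kernel_size=3, padding=1)  # stand-in for f2

Y1 = torch.cat((X, f1(X)), dim=1)    # [x, f1(x)]: 3 + 10 = 13 channels
Y2 = torch.cat((Y1, f2(Y1)), dim=1)  # [x, f1(x), f2([x, f1(x)])]: 13 + 10 = 23 channels
print(Y1.shape, Y2.shape)  # torch.Size([4, 13, 8, 8]) torch.Size([4, 23, 8, 8])

Each step keeps everything computed so far and appends the new features on the channel dimension, which is exactly what the dense blocks below implement.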
8.7.2. Dense Blocks
DenseNet uses the modified "batch normalization, activation, and convolution" structure of ResNet (see the exercise in Section 8.6). First, we implement this convolution block structure.
def conv_block(num_channels):
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=3, padding=1))
def conv_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(),
            nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk
class ConvBlock(nn.Module):
    num_channels: int
    training: bool = True

    @nn.compact
    def __call__(self, X):
        Y = nn.relu(nn.BatchNorm(not self.training)(X))
        Y = nn.Conv(self.num_channels, kernel_size=(3, 3), padding=(1, 1))(Y)
        Y = jnp.concatenate((X, Y), axis=-1)
        return Y
class ConvBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels):
        super(ConvBlock, self).__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(
            filters=num_channels, kernel_size=(3, 3), padding='same')
        self.listLayers = [self.bn, self.relu, self.conv]

    def call(self, x):
        y = x
        for layer in self.listLayers.layers:
            y = layer(y)
        y = tf.keras.layers.concatenate([x, y], axis=-1)
        return y
A dense block consists of multiple convolution blocks, each using the same number of output channels. In the forward propagation, however, we concatenate the input and output of each convolution block on the channel dimension. Lazy evaluation allows us to adjust the dimensionality automatically.
class DenseBlock(nn.Module):
    def __init__(self, num_convs, num_channels):
        super(DenseBlock, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block(num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = torch.cat((X, Y), dim=1)
        return X
class DenseBlock(nn.Block):
    def __init__(self, num_convs, num_channels):
        super().__init__()
        self.net = nn.Sequential()
        for _ in range(num_convs):
            self.net.add(conv_block(num_channels))

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = np.concatenate((X, Y), axis=1)
        return X
class DenseBlock(nn.Module):
    num_convs: int
    num_channels: int
    training: bool = True

    def setup(self):
        layer = []
        for i in range(self.num_convs):
            layer.append(ConvBlock(self.num_channels, self.training))
        self.net = nn.Sequential(layer)

    def __call__(self, X):
        return self.net(X)
class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_convs, num_channels):
        super(DenseBlock, self).__init__()
        self.listLayers = []
        for _ in range(num_convs):
            self.listLayers.append(ConvBlock(num_channels))

    def call(self, x):
        for layer in self.listLayers.layers:
            x = layer(x)
        return x
In the following example, we define a DenseBlock instance with two convolution blocks of 10 output channels. When using an input with 3 channels, we will get an output with 3 + 10 + 10 = 23 channels. The number of convolution block channels controls the growth in the number of output channels relative to the number of input channels. This is also referred to as the growth rate.
blk = DenseBlock(2, 10)
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape
torch.Size([4, 23, 8, 8])
blk = DenseBlock(2, 10)
X = np.random.uniform(size=(4, 3, 8, 8))
blk.initialize()
Y = blk(X)
Y.shape
(4, 23, 8, 8)
blk = DenseBlock(2, 10)
X = jnp.zeros((4, 8, 8, 3))
Y = blk.init_with_output(d2l.get_key(), X)[0]
Y.shape
(4, 8, 8, 23)
blk = DenseBlock(2, 10)
X = tf.random.uniform((4, 8, 8, 3))
Y = blk(X)
Y.shape
TensorShape([4, 8, 8, 23])
8.7.3. Transition Layers
Since each dense block increases the number of channels, adding too many of them will lead to an excessively complex model. A transition layer is used to control the complexity of the model. It reduces the number of channels by using a 1 × 1 convolution. Moreover, it halves the height and width via average pooling with a stride of 2.
def transition_block(num_channels):
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=1),
        nn.AvgPool2d(kernel_size=2, stride=2))
def transition_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=1),
            nn.AvgPool2D(pool_size=2, strides=2))
    return blk
class TransitionBlock(nn.Module):
    num_channels: int
    training: bool = True

    @nn.compact
    def __call__(self, X):
        X = nn.BatchNorm(not self.training)(X)
        X = nn.relu(X)
        X = nn.Conv(self.num_channels, kernel_size=(1, 1))(X)
        X = nn.avg_pool(X, window_shape=(2, 2), strides=(2, 2))
        return X
class TransitionBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels, **kwargs):
        super(TransitionBlock, self).__init__(**kwargs)
        self.batch_norm = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(num_channels, kernel_size=1)
        self.avg_pool = tf.keras.layers.AvgPool2D(pool_size=2, strides=2)

    def call(self, x):
        x = self.batch_norm(x)
        x = self.relu(x)
        x = self.conv(x)
        return self.avg_pool(x)
Apply a transition layer with 10 channels to the output of the dense block in the previous example. This reduces the number of output channels to 10, and halves the height and width.
blk = transition_block(10)
blk(Y).shape
torch.Size([4, 10, 4, 4])
blk = transition_block(10)
blk.initialize()
blk(Y).shape
(4, 10, 4, 4)
blk = TransitionBlock(10)
blk.init_with_output(d2l.get_key(), Y)[0].shape
(4, 4, 4, 10)
blk = TransitionBlock(10)
blk(Y).shape
TensorShape([4, 4, 4, 10])
8.7.4. DenseNet Model
Next, we will construct a DenseNet model. DenseNet first uses the same single convolutional layer and max-pooling layer as in ResNet.
class DenseNet(d2l.Classifier):
    def b1(self):
        return nn.Sequential(
            nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3),
            nn.LazyBatchNorm2d(), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
class DenseNet(d2l.Classifier):
    def b1(self):
        net = nn.Sequential()
        net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
                nn.BatchNorm(), nn.Activation('relu'),
                nn.MaxPool2D(pool_size=3, strides=2, padding=1))
        return net
class DenseNet(d2l.Classifier):
    num_channels: int = 64
    growth_rate: int = 32
    arch: tuple = (4, 4, 4, 4)
    lr: float = 0.1
    num_classes: int = 10
    training: bool = True

    def setup(self):
        self.net = self.create_net()

    def b1(self):
        return nn.Sequential([
            nn.Conv(64, kernel_size=(7, 7), strides=(2, 2),
                    padding='same'),
            nn.BatchNorm(not self.training),
            nn.relu,
            lambda x: nn.max_pool(x, window_shape=(3, 3),
                                  strides=(2, 2), padding='same')
        ])
class DenseNet(d2l.Classifier):
    def b1(self):
        return tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(
                64, kernel_size=7, strides=2, padding='same'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(
                pool_size=3, strides=2, padding='same')])
Then, similar to the four modules made up of residual blocks that ResNet uses, DenseNet uses four dense blocks. As with ResNet, we can set the number of convolutional layers used in each dense block. Here, we set it to 4, consistent with the ResNet-18 model in Section 8.6. Furthermore, we set the number of channels (i.e., the growth rate) for the convolutional layers in the dense block to 32, so 128 channels will be added to each dense block.
In ResNet, the height and width are reduced between modules by a residual block with a stride of 2. Here, we use the transition layer to halve the height and width and to halve the number of channels. Similar to ResNet, a global pooling layer and a fully connected layer are connected at the end to produce the output.
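Before looking at the constructor, it may help to trace the channel arithmetic it performs. The helper below is a hypothetical convenience of our own (not part of d2l or DenseNet), mirroring the num_channels updates in the code that follows:

def channel_counts(num_channels=64, growth_rate=32, arch=(4, 4, 4, 4)):
    """Mirror the channel bookkeeping of the DenseNet constructor below."""
    counts = []
    for i, num_convs in enumerate(arch):
        # Each dense block adds num_convs * growth_rate channels
        num_channels += num_convs * growth_rate
        # Every transition layer (all stages but the last) halves the channel count
        if i != len(arch) - 1:
            num_channels //= 2
        counts.append(num_channels)
    return counts

print(channel_counts())  # [96, 112, 120, 248]

So the stem's 64 channels grow to 192 in the first dense block, are halved to 96 by the transition layer, and so on, ending with 248 channels entering the global pooling layer.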
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add_module(f'dense_blk{i+1}',
                            DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add_module(f'tran_blk{i+1}', transition_block(
                num_channels))
    self.net.add_module('last', nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        nn.LazyLinear(num_classes)))
    self.net.apply(d2l.init_cnn)
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential()
    self.net.add(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(transition_block(num_channels))
    self.net.add(nn.BatchNorm(), nn.Activation('relu'),
                 nn.GlobalAvgPool2D(), nn.Dense(num_classes))
    self.net.initialize(init.Xavier())
@d2l.add_to_class(DenseNet)
def create_net(self):
    net = self.b1()
    for i, num_convs in enumerate(self.arch):
        net.layers.extend([DenseBlock(num_convs, self.growth_rate,
                                      training=self.training)])
        # The number of output channels in the previous dense block
        num_channels = self.num_channels + (num_convs * self.growth_rate)
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(self.arch) - 1:
            num_channels //= 2
            net.layers.extend([TransitionBlock(num_channels,
                                               training=self.training)])
    net.layers.extend([
        nn.BatchNorm(not self.training),
        nn.relu,
        lambda x: nn.avg_pool(x, window_shape=x.shape[1:3],
                              strides=x.shape[1:3], padding='valid'),
        lambda x: x.reshape((x.shape[0], -1)),
        nn.Dense(self.num_classes)
    ])
    return net
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = tf.keras.models.Sequential(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(TransitionBlock(num_channels))
    self.net.add(tf.keras.models.Sequential([
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.GlobalAvgPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes)]))
8.7.5. Training
Since we are using a deeper network here, in this section we reduce the input height and width from 224 to 96 to simplify the computation.
model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)
model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)
model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)
trainer = d2l.Trainer(max_epochs=10)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
with d2l.try_gpu():
    model = DenseNet(lr=0.01)
    trainer.fit(model, data)
8.7.6. Summary and Discussion
The main components that comprise DenseNet are dense blocks and transition layers. For the latter, we need to keep the dimensionality under control when composing the network by adding transition layers that shrink the number of channels again. In terms of cross-layer connections, in contrast to ResNet, where inputs and outputs are added together, DenseNet concatenates inputs and outputs on the channel dimension. Although these concatenation operations reuse features to achieve computational efficiency, unfortunately they lead to heavy GPU memory consumption. As a result, applying DenseNet may require more memory-efficient implementations that may increase training time (Pleiss et al., 2017).
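As one illustration of how memory can be traded for compute, the dense blocks could be wrapped with gradient checkpointing so that their intermediate activations are recomputed during the backward pass rather than stored. The following is a minimal PyTorch sketch of that idea, not the shared-storage implementation proposed by Pleiss et al. (2017); details such as batch normalization statistics during recomputation would need care in a real implementation.

import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedDenseBlock(torch.nn.Module):
    """Recompute a dense block's activations in the backward pass to save memory."""
    def __init__(self, dense_block):
        super().__init__()
        self.dense_block = dense_block

    def forward(self, X):
        # use_reentrant=False is the non-reentrant checkpointing mode in recent PyTorch versions
        return checkpoint(self.dense_block, X, use_reentrant=False)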
8.7.7. Exercises
Why do we use average pooling rather than max-pooling in the transition layer?
One of the advantages mentioned in the DenseNet paper is that its model parameters are smaller than those of ResNet. Why is this the case?
One problem for which DenseNet has been criticized is its high memory consumption.
Is this really the case? Try changing the input shape to 224 × 224 to compare the actual GPU memory consumption empirically.
Can you think of an alternative means of reducing the memory consumption? How would you need to change the framework?
Implement the various DenseNet versions presented in Table 1 of the DenseNet paper (Huang et al., 2017).
Design an MLP-based model by applying the DenseNet idea. Apply it to the house price prediction task in Section 5.7.