Dynamic ReLU: An Input-Dependent Dynamic Activation Function
Abstract
Rectified Linear Units (ReLU) are commonly used in deep neural networks. So far, ReLU and its generalizations (parametric or non-parametric) have been static, performing the same operation on all input samples. This paper proposes a dynamic rectifier, DY-ReLU, whose parameters are generated by a hyper function over all input elements. The key insight of DY-ReLU is to encode the global context into the hyper function and adapt the piecewise linear activation function accordingly. Compared with its static counterpart, DY-ReLU adds negligible extra computation but significantly improves representation power, especially for lightweight networks. Simply applying DY-ReLU to MobileNetV2 raises the Top-1 accuracy of ImageNet classification from 72.0% to 76.2%, with only a 5% increase in FLOPs.
1. Dynamic ReLU
The main idea of Dynamic ReLU comes from Dynamic Conv, which encodes the global context of the input to produce input-dependent convolution kernels. As shown in Figure 1, Dynamic ReLU extends this idea to the activation function: ReLU is a piecewise linear function, so the hyper function encodes global information into the slopes and intercepts of K linear pieces, and the activation takes the maximum over those pieces at every position. Formally:
$$
y_{c}=f_{\theta(\boldsymbol{x})}\left(x_{c}\right)=\max_{1 \leq k \leq K}\left\{a_{c}^{k}(\boldsymbol{x})\, x_{c}+b_{c}^{k}(\boldsymbol{x})\right\}
$$

$$
a_{c}^{k}(\boldsymbol{x})=\alpha^{k}+\lambda_{a}\, \Delta a_{c}^{k}(\boldsymbol{x}), \qquad b_{c}^{k}(\boldsymbol{x})=\beta^{k}+\lambda_{b}\, \Delta b_{c}^{k}(\boldsymbol{x})
$$
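Here $\alpha^{k}$ and $\beta^{k}$ are the initial slope and intercept, $\lambda_{a}$ and $\lambda_{b}$ are fixed scales, and $\Delta a_{c}^{k}(\boldsymbol{x})$, $\Delta b_{c}^{k}(\boldsymbol{x})$ are the residuals produced by the hyper function $\theta(\boldsymbol{x})$. The following minimal NumPy sketch makes the computation concrete; the hyper-function output `delta` is a random stand-in, not the paper's module, and all shapes are illustrative.

import numpy as np

B, C, H, W, K = 2, 4, 8, 8, 2
rng = np.random.default_rng(0)
x = rng.standard_normal((B, C, H, W)).astype("float32")

# stand-in for the hyper function output: residuals in (-1, 1), one set per sample and channel
delta = np.tanh(rng.standard_normal((B, C, 2 * K)).astype("float32"))

alpha = np.array([1.0, 0.0], dtype="float32")   # initial slopes alpha^k
beta = np.zeros(K, dtype="float32")             # initial intercepts beta^k
lam_a, lam_b = 1.0, 0.5                         # scales lambda_a, lambda_b

a = alpha + lam_a * delta[..., :K]              # slopes a_c^k(x), shape [B, C, K]
b = beta + lam_b * delta[..., K:]               # intercepts b_c^k(x), shape [B, C, K]

# y_c = max_k (a_c^k(x) * x_c + b_c^k(x)), broadcast over the spatial positions
pieces = a[:, :, None, None, :] * x[..., None] + b[:, :, None, None, :]
y = pieces.max(axis=-1)
print(y.shape)  # (2, 4, 8, 8)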
Based on this formulation, the paper proposes three variants (their hyper-function output sizes are sketched right after this list):
- Spatial- and channel-shared Dynamic ReLU (DY-ReLU-A)
- Spatial-shared, channel-wise Dynamic ReLU (DY-ReLU-B)
- Spatial- and channel-wise Dynamic ReLU (DY-ReLU-C, shared across neither)
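What distinguishes the three variants is how many coefficients the hyper function has to produce and whether they are modulated spatially. A rough, purely illustrative sketch of the output sizes (the classes in Section 2.4.1 make these concrete):

# Illustrative only: size of the hyper function output theta(x) for each variant,
# assuming K linear pieces, C channels and an HxW feature map.
K, C = 2, 64
print("DY-ReLU-A:", 2 * K, "coefficients per sample, shared over all channels and positions")
print("DY-ReLU-B:", 2 * K * C, "coefficients per sample, one set per channel")
print("DY-ReLU-C:", 2 * K * C, "coefficients per sample plus a 1xHxW spatial attention map")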
2. Code Reproduction
Code Fixes
This article revises 博哥's project 重新思考神经网络的激活函数:Dynamic ReLU 复现: it corrects DyReLU-A, fixes the error raised in the Conv1D (1d) case, and adds DyReLU-C (the variant that shares coefficients across neither spatial positions nor channels).
2.1 Download and Import the Required Packages
%matplotlib inline
import paddle
import paddle.fluid as fluid
import numpy as np
import matplotlib.pyplot as plt
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
from paddle.io import Dataset, DataLoader
from paddle import nn
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
import os
from matplotlib.pyplot import figure
from paddle import ParamAttr
from paddle.nn.layer.norm import _BatchNormBase
2.2 Create the Dataset
train_tfm = transforms.Compose([
transforms.RandomResizedCrop(32),
transforms.RandomHorizontalFlip(0.5),
transforms.ToTensor(),
transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
test_tfm = transforms.Compose([
transforms.Resize((32, 32)),
transforms.ToTensor(),
transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
paddle.vision.set_image_backend('cv2')
# Use the CIFAR-10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm, )
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=256
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=4)
2.3 Label Smoothing
class LabelSmoothingCrossEntropy(nn.Layer):
def __init__(self, smoothing=0.1):
super().__init__()
self.smoothing = smoothing
def forward(self, pred, target):
confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        # pick the log-probability of the target class for each sample
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        # uniform component: mean negative log-probability over all classes
        smooth_loss = paddle.mean(-log_probs, axis=-1)
loss = confidence * nll_loss + self.smoothing * smooth_loss
return loss.mean()
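A quick, purely illustrative call on random logits (the tensors below are made up) shows how the smoothed loss is used alongside plain cross entropy:

# Illustrative usage on random logits; the values are arbitrary.
logits = paddle.randn([4, 10])
targets = paddle.to_tensor([1, 3, 5, 7])
smooth_ce = LabelSmoothingCrossEntropy(smoothing=0.1)
print(float(smooth_ce(logits, targets)))        # smoothed cross entropy
print(float(F.cross_entropy(logits, targets)))  # plain cross entropy, for comparison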
2.4 ResNet-DyReLU
2.4.1 Dynamic ReLU
class DyReLU(nn.Layer):
def __init__(self, channels, reduction=4, k=2, conv_type='2d'):
super(DyReLU, self).__init__()
self.channels = channels
self.k = k
self.conv_type = conv_type
assert self.conv_type in ['1d', '2d']
self.fc1 = nn.Linear(channels, channels // reduction)
self.relu = nn.ReLU()
self.fc2 = nn.Linear(channels // reduction, 2*k)
self.sigmoid = nn.Sigmoid()
self.register_buffer('lambdas', paddle.to_tensor([1.]*k + [0.5]*k))
self.register_buffer('init_v', paddle.to_tensor([1.] + [0.]*(2*k - 1)))
def get_relu_coefs(self, x):
theta = paddle.mean(x, axis=-1)
if self.conv_type == '2d':
theta = paddle.mean(theta, axis=-1)
theta = self.fc1(theta)
theta = self.relu(theta)
theta = self.fc2(theta)
theta = 2 * self.sigmoid(theta) - 1
return theta
def forward(self, x):
raise NotImplementedError
class DyReLUA(DyReLU):
def __init__(self, channels, reduction=8, k=2, conv_type='2d'):
super(DyReLUA, self).__init__(channels, reduction, k, conv_type)
self.fc2 = nn.Linear(channels // reduction, 2*k)
def forward(self, x):
assert x.shape[1] == self.channels
theta = self.get_relu_coefs(x)
relu_coefs = theta.reshape((-1, 2 * self.k)) * self.lambdas + self.init_v
        if self.conv_type == '1d':
            # BxCxL -> LxCxBx1 so the per-sample coefficients broadcast over channels and length
            x_perm = x.transpose([2, 1, 0]).unsqueeze(-1)
            output = x_perm * relu_coefs[:, :self.k] + relu_coefs[:, self.k:]
            # LxCxBxK -> BxCxL
            result = paddle.max(output, axis=-1).transpose([2, 1, 0])
        elif self.conv_type == '2d':
            # BxCxHxW -> HxWxCxBx1 so the per-sample coefficients broadcast over channels and positions
            x_perm = x.transpose([2, 3, 1, 0]).unsqueeze(-1)
            output = x_perm * relu_coefs[:, :self.k] + relu_coefs[:, self.k:]
            # HxWxCxBxK -> BxCxHxW
            result = paddle.max(output, axis=-1).transpose([3, 2, 0, 1])
return result
class DyReLUB(DyReLU):
def __init__(self, channels, reduction=8, k=2, conv_type='2d'):
super(DyReLUB, self).__init__(channels, reduction, k, conv_type)
self.fc2 = nn.Linear(channels // reduction, 2*k*channels)
def forward(self, x):
assert x.shape[1] == self.channels
theta = self.get_relu_coefs(x)
relu_coefs = theta.reshape([-1, self.channels, 2 * self.k]) * self.lambdas + self.init_v
if self.conv_type == '1d':
# BxCxL -> LxBxCx1
x_perm = x.transpose([2, 0, 1]).unsqueeze(-1)
output = x_perm * relu_coefs[:, :, :self.k] + relu_coefs[:, :, self.k:]
# LxBxCx2 -> BxCxL
result = paddle.max(output, axis=-1).transpose([1, 2, 0])
elif self.conv_type == '2d':
# BxCxHxW -> HxWxBxCx1
x_perm = x.transpose([2, 3, 0, 1]).unsqueeze(-1)
output = x_perm * relu_coefs[:, :, :self.k] + relu_coefs[:, :, self.k:]
# HxWxBxCx2 -> BxCxHxW
result = paddle.max(output, axis=-1).transpose([2, 3, 0, 1])
return result
class DyReLUC(DyReLU):
def __init__(self, channels, reduction=8, k=2, conv_type='2d'):
super(DyReLUC, self).__init__(channels, reduction, k, conv_type)
self.fc2 = nn.Linear(channels // reduction, 2*k*channels)
if self.conv_type == '1d':
self.conv = nn.Conv1D(channels, 1, 1)
elif self.conv_type == '2d':
self.conv = nn.Conv2D(channels, 1, 1)
def forward(self, x):
assert x.shape[1] == self.channels
theta = self.get_relu_coefs(x)
if self.conv_type == '1d':
B, C, L = x.shape
self.spatial_theta = self.conv(x) # Bx1xL
self.spatial_theta = paddle.minimum(F.softmax(self.spatial_theta/10) * L / 3, paddle.ones_like(self.spatial_theta))
theta = theta.reshape([-1, self.channels * 2 * self.k, 1]) * self.spatial_theta
theta = theta.transpose([0, 2, 1]).reshape((-1, L, self.channels, 2 * self.k)) #BxLxCx(2k)
relu_coefs = theta * self.lambdas + self.init_v
# BxCxL -> BxLxCx1
x_perm = x.transpose([0, 2, 1]).unsqueeze(-1)
output = x_perm * relu_coefs[:, :, :, :self.k] + relu_coefs[:, :, :, self.k:]
# BxLxCx2 -> BxCxL
result = paddle.max(output, axis=-1).transpose([0, 2, 1])
elif self.conv_type == '2d':
B, C, H, W = x.shape
self.spatial_theta = self.conv(x) # Bx1xHxW
self.spatial_theta = paddle.minimum(F.softmax(self.spatial_theta/10) * H * W / 3, paddle.ones_like(self.spatial_theta))
theta = theta.reshape([-1, self.channels * 2 * self.k, 1, 1]) * self.spatial_theta
theta = theta.transpose([0, 2, 3, 1]).reshape((-1, H, W, self.channels, 2 * self.k)) #BxHxWxCx(2k)
relu_coefs = theta * self.lambdas + self.init_v
# BxCxHxW -> BxHxWxCx1
x_perm = x.transpose([0, 2, 3, 1]).unsqueeze(-1)
output = x_perm * relu_coefs[:, :, :, :,:self.k] + relu_coefs[:, :, :, :, self.k:]
# BxHxWxCx2 -> BxCxHxW
result = paddle.max(output, axis=-1).transpose([0, 3, 1, 2])
return result
model = DyReLUA(64)
paddle.summary(model, (1, 64, 32, 32))
---------------------------------------------------------------------------
Layer (type) Input Shape Output Shape Param #
===========================================================================
Linear-1 [[1, 64]] [1, 8] 520
ReLU-1 [[1, 8]] [1, 8] 0
Linear-3 [[1, 8]] [1, 4] 36
Sigmoid-1 [[1, 4]] [1, 4] 0
===========================================================================
Total params: 556
Trainable params: 556
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.25
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.25
---------------------------------------------------------------------------
{'total_params': 556, 'trainable_params': 556}
model = DyReLUB(64)
paddle.summary(model, (1, 64, 32, 32))
---------------------------------------------------------------------------
Layer (type) Input Shape Output Shape Param #
===========================================================================
Linear-4 [[1, 64]] [1, 8] 520
ReLU-2 [[1, 8]] [1, 8] 0
Linear-6 [[1, 8]] [1, 256] 2,304
Sigmoid-2 [[1, 256]] [1, 256] 0
===========================================================================
Total params: 2,824
Trainable params: 2,824
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.25
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.26
---------------------------------------------------------------------------
{'total_params': 2824, 'trainable_params': 2824}
model = DyReLUC(64)
paddle.summary(model, (1, 64, 32, 32))
---------------------------------------------------------------------------
Layer (type) Input Shape Output Shape Param #
===========================================================================
Linear-7 [[1, 64]] [1, 8] 520
ReLU-3 [[1, 8]] [1, 8] 0
Linear-9 [[1, 8]] [1, 256] 2,304
Sigmoid-3 [[1, 256]] [1, 256] 0
Conv2D-1 [[1, 64, 32, 32]] [1, 1, 32, 32] 65
===========================================================================
Total params: 2,889
Trainable params: 2,889
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.25
Forward/backward pass size (MB): 0.01
Params size (MB): 0.01
Estimated Total Size (MB): 0.27
---------------------------------------------------------------------------
{'total_params': 2889, 'trainable_params': 2889}
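The parameter counts reported by `paddle.summary` above can be checked by hand from the layer definitions (channels=64, reduction=8, k=2):

# Where the summary numbers come from (channels=64, reduction=8, k=2).
C, r, k = 64, 8, 2
fc1    = C * (C // r) + (C // r)                 # 64*8 + 8    = 520
fc2_a  = (C // r) * (2 * k) + 2 * k              # 8*4  + 4    = 36
fc2_bc = (C // r) * (2 * k * C) + 2 * k * C      # 8*256 + 256 = 2304
conv_c = C * 1 + 1                               # 1x1 conv: 64 weights + 1 bias = 65
print(fc1 + fc2_a)            # 556  (DyReLU-A)
print(fc1 + fc2_bc)           # 2824 (DyReLU-B)
print(fc1 + fc2_bc + conv_c)  # 2889 (DyReLU-C)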
2.4.2 ResNet-DyReLUC
class BasicBlock(nn.Layer):
expansion = 1
def __init__(
self,
inplanes,
planes,
stride=1,
downsample=None,
groups=1,
base_width=64,
dilation=1,
norm_layer=None,
):
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
if dilation > 1:
raise NotImplementedError(
"Dilation > 1 not supported in BasicBlock"
)
self.conv1 = nn.Conv2D(
inplanes, planes, 3, padding=1, stride=stride, bias_attr=False
)
self.bn1 = norm_layer(planes)
self.relu1 = DyReLUC(planes)
self.conv2 = nn.Conv2D(planes, planes, 3, padding=1, bias_attr=False)
self.bn2 = norm_layer(planes)
self.relu2 = DyReLUC(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu1(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu2(out)
return out
class BottleneckBlock(nn.Layer):
expansion = 4
def __init__(
self,
inplanes,
planes,
stride=1,
downsample=None,
groups=1,
base_width=64,
dilation=1,
norm_layer=None,
):
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
width = int(planes * (base_width / 64.0)) * groups
self.conv1 = nn.Conv2D(inplanes, width, 1, bias_attr=False)
self.bn1 = norm_layer(width)
self.conv2 = nn.Conv2D(
width,
width,
3,
padding=dilation,
stride=stride,
groups=groups,
dilation=dilation,
bias_attr=False,
)
self.bn2 = norm_layer(width)
self.conv3 = nn.Conv2D(
width, planes * self.expansion, 1, bias_attr=False
)
self.bn3 = norm_layer(planes * self.expansion)
self.relu1 = DyReLUC(width)
self.relu2 = DyReLUC(width)
self.relu3 = DyReLUC(planes * self.expansion)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu1(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu2(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu3(out)
return out
class ResNet(nn.Layer):
"""ResNet model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
Block (BasicBlock|BottleneckBlock): Block module of model.
depth (int, optional): Layers of ResNet, Default: 50.
width (int, optional): Base width per convolution group for each convolution block, Default: 64.
num_classes (int, optional): Output num_features of last fc layer. If num_classes <= 0, last fc layer
will not be defined. Default: 1000.
with_pool (bool, optional): Use pool before the last fc layer or not. Default: True.
groups (int, optional): Number of groups for each convolution block, Default: 1.
Returns:
:ref:`api_paddle_nn_Layer`. An instance of ResNet model.
Examples:
.. code-block:: python
import paddle
from paddle.vision.models import ResNet
from paddle.vision.models.resnet import BottleneckBlock, BasicBlock
# build ResNet with 18 layers
resnet18 = ResNet(BasicBlock, 18)
# build ResNet with 50 layers
resnet50 = ResNet(BottleneckBlock, 50)
# build Wide ResNet model
wide_resnet50_2 = ResNet(BottleneckBlock, 50, width=64*2)
# build ResNeXt model
resnext50_32x4d = ResNet(BottleneckBlock, 50, width=4, groups=32)
x = paddle.rand([1, 3, 224, 224])
out = resnet18(x)
print(out.shape)
# [1, 1000]
"""
def __init__(
self,
block,
depth=50,
width=64,
num_classes=1000,
with_pool=True,
groups=1,
):
super().__init__()
layer_cfg = {
18: [2, 2, 2, 2],
34: [3, 4, 6, 3],
50: [3, 4, 6, 3],
101: [3, 4, 23, 3],
152: [3, 8, 36, 3],
}
layers = layer_cfg[depth]
self.groups = groups
self.base_width = width
self.num_classes = num_classes
self.with_pool = with_pool
self._norm_layer = nn.BatchNorm2D
self.inplanes = 64
self.dilation = 1
self.conv1 = nn.Conv2D(
3,
self.inplanes,
kernel_size=3,
stride=1,
padding=1,
bias_attr=False,
)
self.bn1 = self._norm_layer(self.inplanes)
self.relu = DyReLUC(self.inplanes)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
if with_pool:
self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
if num_classes > 0:
self.fc = nn.Linear(512 * block.expansion, num_classes)
def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
norm_layer = self._norm_layer
downsample = None
previous_dilation = self.dilation
if dilate:
self.dilation *= stride
stride = 1
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2D(
self.inplanes,
planes * block.expansion,
1,
stride=stride,
bias_attr=False,
),
norm_layer(planes * block.expansion),
)
layers = []
layers.append(
block(
self.inplanes,
planes,
stride,
downsample,
self.groups,
self.base_width,
previous_dilation,
norm_layer,
)
)
self.inplanes = planes * block.expansion
for _ in range(1, blocks):
layers.append(
block(
self.inplanes,
planes,
groups=self.groups,
base_width=self.base_width,
norm_layer=norm_layer,
)
)
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
if self.with_pool:
x = self.avgpool(x)
if self.num_classes > 0:
x = paddle.flatten(x, 1)
x = self.fc(x)
return x
model = ResNet(BasicBlock, depth=18, num_classes=10)
paddle.summary(model, (1, 3, 32, 32))
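The comparison in Section 4 also reports DyReLU-A and DyReLU-B models, which are trained in the separate notebooks mentioned there. As one hypothetical way to reuse the classes above without duplicating the ResNet definition, the helper below (not part of the original notebooks) swaps the activations after construction:

# Hypothetical helper: build ResNet-18 and replace every DyReLUC with another variant.
def make_resnet18_dyrelu(variant='C', num_classes=10):
    act_cls = {'A': DyReLUA, 'B': DyReLUB, 'C': DyReLUC}[variant]
    net = ResNet(BasicBlock, depth=18, num_classes=num_classes)
    if variant != 'C':
        for layer in [net] + net.sublayers():
            for name, child in layer.named_children():
                if isinstance(child, DyReLUC):
                    setattr(layer, name, act_cls(child.channels))
    return net

model_a = make_resnet18_dyrelu('A')  # e.g. ResNet18-DyReLUA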
2.5 Training
learning_rate = 0.1
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'
model = ResNet(BasicBlock, depth=18, num_classes=10)
criterion = LabelSmoothingCrossEntropy()
scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Momentum(parameters=model.parameters(), learning_rate=scheduler, weight_decay=5e-4)
gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}} # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}} # for recording accuracy
loss_iter = 0
acc_iter = 0
for epoch in range(n_epochs):
# ---------- Training ----------
model.train()
train_num = 0.0
train_loss = 0.0
val_num = 0.0
val_loss = 0.0
accuracy_manager = paddle.metric.Accuracy()
val_accuracy_manager = paddle.metric.Accuracy()
print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
for batch_id, data in enumerate(train_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
accuracy_manager.update(acc)
if batch_id % 10 == 0:
loss_record['train']['loss'].append(loss.numpy())
loss_record['train']['iter'].append(loss_iter)
loss_iter += 1
loss.backward()
optimizer.step()
optimizer.clear_grad()
train_loss += loss
train_num += len(y_data)
scheduler.step()
total_train_loss = (train_loss / train_num) * batch_size
train_acc = accuracy_manager.accumulate()
acc_record['train']['acc'].append(train_acc)
acc_record['train']['iter'].append(acc_iter)
acc_iter += 1
# Print the information.
print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))
# ---------- Validation ----------
model.eval()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
val_accuracy_manager.update(acc)
val_loss += loss
val_num += len(y_data)
total_val_loss = (val_loss / val_num) * batch_size
loss_record['val']['loss'].append(total_val_loss.numpy())
loss_record['val']['iter'].append(loss_iter)
val_acc = val_accuracy_manager.accumulate()
acc_record['val']['acc'].append(val_acc)
acc_record['val']['iter'].append(acc_iter)
print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))
# ===================save====================
if val_acc > best_acc:
best_acc = val_acc
paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))
print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
2.6 Experimental Results
def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
''' Plot learning curve of your CNN '''
maxtrain = max(map(float, record['train'][title]))
maxval = max(map(float, record['val'][title]))
ymax = max(maxtrain, maxval) * 1.1
mintrain = min(map(float, record['train'][title]))
minval = min(map(float, record['val'][title]))
ymin = min(mintrain, minval) * 0.9
total_steps = len(record['train'][title])
x_1 = list(map(int, record['train']['iter']))
x_2 = list(map(int, record['val']['iter']))
figure(figsize=(10, 6))
plt.plot(x_1, record['train'][title], c='tab:red', label='train')
plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
plt.ylim(ymin, ymax)
plt.xlabel('Training steps')
plt.ylabel(ylabel)
plt.title('Learning curve of {}'.format(title))
plt.legend()
plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')
(Figure: loss learning curve, main_files/main_30_1.png)
plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')
(Figure: accuracy learning curve, main_files/main_31_0.png)
import time
work_path = 'work/model'
model = ResNet(BasicBlock, depth=18, num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughout:2900
3. ResNet
3.1 ResNet
model = paddle.vision.models.resnet18(num_classes=10)
# adapt the ImageNet stem to 32x32 CIFAR-10 images: 3x3 stride-1 conv, no max pooling
model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)
model.maxpool = nn.Identity()
paddle.summary(model, (1, 3, 128, 128))
3.2 Training
learning_rate = 0.1
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model1'
model = paddle.vision.models.resnet18(num_classes=10)
model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)
model.maxpool = nn.Identity()
criterion = LabelSmoothingCrossEntropy()
scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Momentum(parameters=model.parameters(), learning_rate=scheduler, weight_decay=5e-4)
gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record1 = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}} # for recording loss
acc_record1 = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}} # for recording accuracy
loss_iter = 0
acc_iter = 0
for epoch in range(n_epochs):
# ---------- Training ----------
model.train()
train_num = 0.0
train_loss = 0.0
val_num = 0.0
val_loss = 0.0
accuracy_manager = paddle.metric.Accuracy()
val_accuracy_manager = paddle.metric.Accuracy()
print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
for batch_id, data in enumerate(train_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
accuracy_manager.update(acc)
if batch_id % 10 == 0:
loss_record1['train']['loss'].append(loss.numpy())
loss_record1['train']['iter'].append(loss_iter)
loss_iter += 1
loss.backward()
optimizer.step()
optimizer.clear_grad()
train_loss += loss
train_num += len(y_data)
scheduler.step()
total_train_loss = (train_loss / train_num) * batch_size
train_acc = accuracy_manager.accumulate()
acc_record1['train']['acc'].append(train_acc)
acc_record1['train']['iter'].append(acc_iter)
acc_iter += 1
# Print the information.
print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))
# ---------- Validation ----------
model.eval()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
val_accuracy_manager.update(acc)
val_loss += loss
val_num += len(y_data)
total_val_loss = (val_loss / val_num) * batch_size
loss_record1['val']['loss'].append(total_val_loss.numpy())
loss_record1['val']['iter'].append(loss_iter)
val_acc = val_accuracy_manager.accumulate()
acc_record1['val']['acc'].append(val_acc)
acc_record1['val']['iter'].append(acc_iter)
print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))
# ===================save====================
if val_acc > best_acc:
best_acc = val_acc
paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))
print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
3.3 Experimental Results
plot_learning_curve(loss_record1, title='loss', ylabel='CE Loss')
(Figure: loss learning curve, main_files/main_42_0.png)
plot_learning_curve(acc_record1, title='acc', ylabel='Accuracy')
(Figure: accuracy learning curve, main_files/main_43_0.png)
import time
work_path = 'work/model1'
model = paddle.vision.models.resnet18(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1517: UserWarning: Skip loading for conv1.weight. conv1.weight receives a shape [64, 3, 3, 3], but the expected shape is [64, 3, 7, 7].
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
Throughput:6641
4. Comparison of Experimental Results
| Model | Train Acc | Val Acc | Parameters |
| --- | --- | --- | --- |
| ResNet18 w/o DyReLU | 0.8978 | 0.9213 | 11183562 |
| ResNet18 w/ DyReLUA | 0.8842 | 0.8759 | 11360662 |
| ResNet18 w/ DyReLUB | 0.8986 | 0.9101 | 12072626 |
| ResNet18 w/ DyReLUC | 0.9054 | 0.9245 | 12076547 |
The DyReLU-A and DyReLU-B experiments are in DyReLUA.ipynb and DyReLUB.ipynb, respectively.
Summary
This article reproduces all three Dynamic ReLU variants. On CIFAR-10, possibly because the dataset is small, DyReLU-A and DyReLU-B fall short of the plain-ReLU baseline, while DyReLU-C improves validation accuracy by 0.32 percentage points (92.13% to 92.45%).
References
Paper: Dynamic ReLU (ECCV 2020)
Code: Islanna/DynamicReLU (unofficial)
Project: 重新思考神经网络的激活函数:Dynamic ReLU 复现 (the project this article corrects and extends with DyReLU-C)