
Dynamic ReLU: An Input-Dependent Dynamic Activation Function

Abstract

        Rectified Linear Units (ReLU) are widely used in deep neural networks. So far, ReLU and its generalizations (non-parametric or parametric) have been static: they perform the same operation for every input sample. This paper proposes a dynamic rectifier, DY-ReLU, whose parameters are generated by a hyper-function over all input elements. The key insight of DY-ReLU is to encode the global context into this hyper-function and adapt the piecewise linear activation function accordingly. Compared with its static counterpart, DY-ReLU adds negligible extra computation but significantly increases representation power, especially for lightweight networks. By simply applying DY-ReLU to MobileNetV2, the Top-1 accuracy of ImageNet classification rises from 72.0% to 76.2% with only a 5% increase in FLOPs.

1. Dynamic ReLU

        The main idea of Dynamic ReLU comes from Dynamic Convolution: Dynamic Conv encodes the global context to produce input-dependent convolution kernels. As shown in Figure 1, Dynamic ReLU extends this idea to the activation function. Since ReLU is a piecewise linear function, each piece has a slope and an intercept; the global information is encoded into coefficients that adjust these slopes and intercepts, and the activation takes the maximum over the resulting linear pieces at every position. Concretely:

$$
\begin{aligned}
y_{c} &= f_{\boldsymbol{\theta}(\boldsymbol{x})}\left(x_{c}\right)=\max_{1 \le k \le K}\left\{a_{c}^{k}(\boldsymbol{x})\, x_{c}+b_{c}^{k}(\boldsymbol{x})\right\} \\
a_{c}^{k}(\boldsymbol{x}) &= \alpha^{k}+\lambda_{a}\, \Delta a_{c}^{k}(\boldsymbol{x}), \qquad
b_{c}^{k}(\boldsymbol{x}) = \beta^{k}+\lambda_{b}\, \Delta b_{c}^{k}(\boldsymbol{x})
\end{aligned}
$$
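        To make the formula concrete, take K = 2 with the initialization used later in the code (init_v corresponds to α¹ = 1, α² = 0, β¹ = β² = 0; lambdas to λ_a = 1, λ_b = 0.5). If the hyper-function outputs zero residuals Δa and Δb, the activation collapses to an ordinary ReLU:

$$
y_c = \max\{1\cdot x_c + 0,\; 0\cdot x_c + 0\} = \max(x_c, 0) = \mathrm{ReLU}(x_c)
$$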

        Based on this formulation, the paper proposes three variants (the hyper-function output grows from variant A to C):

  1. DyReLU-A: coefficients shared across both spatial positions and channels (the hyper-function outputs 2K values per input)
  2. DyReLU-B: coefficients shared across spatial positions but computed per channel (2KC outputs)
  3. DyReLU-C: coefficients shared across neither spatial positions nor channels, i.e. channel-wise coefficients further modulated by a spatial attention map

2. Code Reproduction

Code corrections

        This article revises 博哥's project 重新思考神经网络的激活函数:Dynamic ReLU 复现, fixing DyReLU-A as well as the errors raised for Conv1D inputs, and adds DyReLU-C (the variant that is shared neither spatially nor across channels).

2.1 Download and import the required packages

%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
import paddle.vision.transforms as transforms
from paddle.io import Dataset, DataLoader
from paddle import ParamAttr
from paddle.nn.layer.norm import _BatchNormBase

2.2 Create the dataset

train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

test_tfm = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
paddle.vision.set_image_backend('cv2')
# Use the Cifar10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm, )
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=256
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=4)
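
A quick sanity check (a hypothetical addition, not part of the original notebook) to confirm what one batch looks like before training:

# Hypothetical check: inspect one training batch.
x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape)  # expected: [256, 3, 32, 32]
print(y_batch.shape)  # expected: [256], integer class labels in [0, 10)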

2.3 Label smoothing

class LabelSmoothingCrossEntropy(nn.Layer):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, pred, target):

        confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        smooth_loss = paddle.mean(-log_probs, axis=-1)
        loss = confidence * nll_loss + self.smoothing * smooth_loss

        return loss.mean()
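
As a quick check of the implementation (a hypothetical addition, not part of the original notebook): with smoothing=0 the loss should coincide with plain cross-entropy, and with smoothing>0 it mixes in a uniform term.

# Hypothetical check: smoothing=0 should reproduce standard cross-entropy.
logits = paddle.randn([4, 10])
target = paddle.to_tensor([1, 3, 5, 7])
print(float(LabelSmoothingCrossEntropy(smoothing=0.0)(logits, target)))
print(float(F.cross_entropy(logits, target)))  # should match the line above
print(float(LabelSmoothingCrossEntropy(smoothing=0.1)(logits, target)))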

2.4 ResNet-DyReLU

2.4.1 Dynamic ReLU
class DyReLU(nn.Layer):
    def __init__(self, channels, reduction=4, k=2, conv_type='2d'):
        super(DyReLU, self).__init__()
        self.channels = channels
        self.k = k
        self.conv_type = conv_type
        assert self.conv_type in ['1d', '2d']

        self.fc1 = nn.Linear(channels, channels // reduction)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(channels // reduction, 2*k)
        self.sigmoid = nn.Sigmoid()

        # λ_a = 1.0 scales the slope residuals, λ_b = 0.5 scales the intercept residuals
        self.register_buffer('lambdas', paddle.to_tensor([1.]*k + [0.5]*k))
        # initial coefficients [α^1..α^K, β^1..β^K] = [1, 0, ..., 0], i.e. a plain ReLU
        self.register_buffer('init_v', paddle.to_tensor([1.] + [0.]*(2*k - 1)))

    def get_relu_coefs(self, x):
        theta = paddle.mean(x, axis=-1)
        if self.conv_type == '2d':
            theta = paddle.mean(theta, axis=-1)
        theta = self.fc1(theta)
        theta = self.relu(theta)
        theta = self.fc2(theta)
        theta = 2 * self.sigmoid(theta) - 1
        return theta

    def forward(self, x):
        raise NotImplementedError

class DyReLUA(DyReLU):
    def __init__(self, channels, reduction=8, k=2, conv_type='2d'):
        super(DyReLUA, self).__init__(channels, reduction, k, conv_type)
        self.fc2 = nn.Linear(channels // reduction, 2*k)

    def forward(self, x):
        assert x.shape[1] == self.channels
        theta = self.get_relu_coefs(x)

        relu_coefs = theta.reshape((-1, 2 * self.k)) * self.lambdas + self.init_v

        if self.conv_type == '1d':
            # BxCxL -> LxCxBx1 (keep the batch axis next to the per-sample coefficients)
            x_perm = x.transpose([2, 1, 0]).unsqueeze(-1)
            output = x_perm * relu_coefs[:, :self.k] + relu_coefs[:, self.k:]
            # LxCxBxk -> BxCxL
            result = paddle.max(output, axis=-1).transpose([2, 1, 0])
        elif self.conv_type == '2d':
            # BxCxHxW -> HxWxCxBx1 (keep the batch axis next to the per-sample coefficients)
            x_perm = x.transpose([2, 3, 1, 0]).unsqueeze(-1)
            output = x_perm * relu_coefs[:, :self.k] + relu_coefs[:, self.k:]
            # HxWxCxBxk -> BxCxHxW
            result = paddle.max(output, axis=-1).transpose([3, 2, 0, 1])

        return result

class DyReLUB(DyReLU):
    def __init__(self, channels, reduction=8, k=2, conv_type='2d'):
        super(DyReLUB, self).__init__(channels, reduction, k, conv_type)
        self.fc2 = nn.Linear(channels // reduction, 2*k*channels)

    def forward(self, x):
        assert x.shape[1] == self.channels
        theta = self.get_relu_coefs(x)

        relu_coefs = theta.reshape([-1, self.channels, 2 * self.k]) * self.lambdas + self.init_v

        if self.conv_type == '1d':
            # BxCxL -> LxBxCx1
            x_perm = x.transpose([2, 0, 1]).unsqueeze(-1)
            output = x_perm * relu_coefs[:, :, :self.k] + relu_coefs[:, :, self.k:]
            # LxBxCx2 -> BxCxL
            result = paddle.max(output, axis=-1).transpose([1, 2, 0])

        elif self.conv_type == '2d':
            # BxCxHxW -> HxWxBxCx1
            x_perm = x.transpose([2, 3, 0, 1]).unsqueeze(-1)

            output = x_perm * relu_coefs[:, :, :self.k] + relu_coefs[:, :, self.k:]

            # HxWxBxCx2 -> BxCxHxW
            result = paddle.max(output, axis=-1).transpose([2, 3, 0, 1])

        return result


class DyReLUC(DyReLU):
    def __init__(self, channels, reduction=8, k=2, conv_type='2d'):
        super(DyReLUC, self).__init__(channels, reduction, k, conv_type)
        self.fc2 = nn.Linear(channels // reduction, 2*k*channels)
        if self.conv_type == '1d':
            self.conv = nn.Conv1D(channels, 1, 1)
        elif self.conv_type == '2d':
            self.conv = nn.Conv2D(channels, 1, 1)

    def forward(self, x):
        assert x.shape[1] == self.channels
        theta = self.get_relu_coefs(x)

        if self.conv_type == '1d':
            B, C, L = x.shape
            self.spatial_theta = self.conv(x)   # Bx1xL

            self.spatial_theta = paddle.minimum(F.softmax(self.spatial_theta/10) * L / 3, paddle.ones_like(self.spatial_theta))

            theta = theta.reshape([-1, self.channels * 2 * self.k, 1]) * self.spatial_theta
            theta = theta.transpose([0, 2, 1]).reshape((-1, L, self.channels, 2 * self.k))    #BxLxCx(2k)
            relu_coefs = theta * self.lambdas + self.init_v

            # BxCxL -> BxLxCx1
            x_perm = x.transpose([0, 2, 1]).unsqueeze(-1)
            output = x_perm * relu_coefs[:, :, :, :self.k] + relu_coefs[:, :, :, self.k:]
            # BxLxCx2 -> BxCxL
            result = paddle.max(output, axis=-1).transpose([0, 2, 1])

        elif self.conv_type == '2d':
            B, C, H, W = x.shape
            self.spatial_theta = self.conv(x)   # Bx1xHxW

            # normalize the spatial attention over all H*W positions, then clip to 1
            self.spatial_theta = F.softmax(self.spatial_theta.reshape([B, 1, H * W]) / 10, axis=-1).reshape([B, 1, H, W])
            self.spatial_theta = paddle.minimum(self.spatial_theta * H * W / 3, paddle.ones_like(self.spatial_theta))

            theta = theta.reshape([-1, self.channels * 2 * self.k, 1, 1]) * self.spatial_theta
            theta = theta.transpose([0, 2, 3, 1]).reshape((-1, H, W, self.channels, 2 * self.k))    #BxHxWxCx(2k)

            relu_coefs = theta * self.lambdas + self.init_v
            # BxCxHxW -> BxHxWxCx1
            x_perm = x.transpose([0, 2, 3, 1]).unsqueeze(-1)

            output = x_perm * relu_coefs[:, :, :, :,:self.k] + relu_coefs[:, :, :, :, self.k:]

            # BxHxWxCx2 -> BxCxHxW
            result = paddle.max(output, axis=-1).transpose([0, 3, 1, 2])

        return result
model = DyReLUA(64)
paddle.summary(model, (1, 64, 32, 32))


---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Linear-1          [[1, 64]]              [1, 8]              520      
    ReLU-1            [[1, 8]]              [1, 8]               0       
   Linear-3           [[1, 8]]              [1, 4]              36       
   Sigmoid-1          [[1, 4]]              [1, 4]               0       
===========================================================================
Total params: 556
Trainable params: 556
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.25
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.25
---------------------------------------------------------------------------






{'total_params': 556, 'trainable_params': 556}
model = DyReLUB(64)
paddle.summary(model, (1, 64, 32, 32))
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Linear-4          [[1, 64]]              [1, 8]              520      
    ReLU-2            [[1, 8]]              [1, 8]               0       
   Linear-6           [[1, 8]]             [1, 256]            2,304     
   Sigmoid-2         [[1, 256]]            [1, 256]              0       
===========================================================================
Total params: 2,824
Trainable params: 2,824
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.25
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.26
---------------------------------------------------------------------------






{'total_params': 2824, 'trainable_params': 2824}
model = DyReLUC(64)
paddle.summary(model, (1, 64, 32, 32))
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Linear-7          [[1, 64]]              [1, 8]              520      
    ReLU-3            [[1, 8]]              [1, 8]               0       
   Linear-9           [[1, 8]]             [1, 256]            2,304     
   Sigmoid-3         [[1, 256]]            [1, 256]              0       
   Conv2D-1      [[1, 64, 32, 32]]      [1, 1, 32, 32]          65       
===========================================================================
Total params: 2,889
Trainable params: 2,889
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.25
Forward/backward pass size (MB): 0.01
Params size (MB): 0.01
Estimated Total Size (MB): 0.27
---------------------------------------------------------------------------






{'total_params': 2889, 'trainable_params': 2889}
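Before plugging the activations into ResNet, a quick shape check with a batch size larger than 1 (a hypothetical addition, not part of the original notebook); this is exactly the case where the original DyReLU-A port broke:

# Hypothetical check: all three variants should preserve the input shape for batch size > 1.
x = paddle.randn([2, 64, 8, 8])
for act in [DyReLUA(64), DyReLUB(64), DyReLUC(64)]:
    assert act(x).shape == x.shape
print('ok')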
2.4.2 ResNet-DyReLUC
class BasicBlock(nn.Layer):
    expansion = 1

    def __init__(
        self,
        inplanes,
        planes,
        stride=1,
        downsample=None,
        groups=1,
        base_width=64,
        dilation=1,
        norm_layer=None,
    ):
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D

        if dilation > 1:
            raise NotImplementedError(
                "Dilation > 1 not supported in BasicBlock"
            )

        self.conv1 = nn.Conv2D(
            inplanes, planes, 3, padding=1, stride=stride, bias_attr=False
        )
        self.bn1 = norm_layer(planes)
        self.relu1 = DyReLUC(planes)
        self.conv2 = nn.Conv2D(planes, planes, 3, padding=1, bias_attr=False)
        self.bn2 = norm_layer(planes)
        self.relu2 = DyReLUC(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu1(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu2(out)

        return out


class BottleneckBlock(nn.Layer):

    expansion = 4

    def __init__(
        self,
        inplanes,
        planes,
        stride=1,
        downsample=None,
        groups=1,
        base_width=64,
        dilation=1,
        norm_layer=None,
    ):
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        width = int(planes * (base_width / 64.0)) * groups

        self.conv1 = nn.Conv2D(inplanes, width, 1, bias_attr=False)
        self.bn1 = norm_layer(width)

        self.conv2 = nn.Conv2D(
            width,
            width,
            3,
            padding=dilation,
            stride=stride,
            groups=groups,
            dilation=dilation,
            bias_attr=False,
        )
        self.bn2 = norm_layer(width)

        self.conv3 = nn.Conv2D(
            width, planes * self.expansion, 1, bias_attr=False
        )
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu1 = DyReLUC(width)
        self.relu2 = DyReLUC(width)
        self.relu3 = DyReLUC(planes * self.expansion)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu1(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu2(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu3(out)

        return out


class ResNet(nn.Layer):
    """ResNet model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
    Args:
        Block (BasicBlock|BottleneckBlock): Block module of model.
        depth (int, optional): Layers of ResNet, Default: 50.
        width (int, optional): Base width per convolution group for each convolution block, Default: 64.
        num_classes (int, optional): Output num_features of last fc layer. If num_classes <= 0, last fc layer
                            will not be defined. Default: 1000.
        with_pool (bool, optional): Use pool before the last fc layer or not. Default: True.
        groups (int, optional): Number of groups for each convolution block, Default: 1.
    Returns:
        :ref:`api_paddle_nn_Layer`. An instance of ResNet model.
    Examples:
        .. code-block:: python
            import paddle
            from paddle.vision.models import ResNet
            from paddle.vision.models.resnet import BottleneckBlock, BasicBlock
            # build ResNet with 18 layers
            resnet18 = ResNet(BasicBlock, 18)
            # build ResNet with 50 layers
            resnet50 = ResNet(BottleneckBlock, 50)
            # build Wide ResNet model
            wide_resnet50_2 = ResNet(BottleneckBlock, 50, width=64*2)
            # build ResNeXt model
            resnext50_32x4d = ResNet(BottleneckBlock, 50, width=4, groups=32)
            x = paddle.rand([1, 3, 224, 224])
            out = resnet18(x)
            print(out.shape)
            # [1, 1000]
    """

    def __init__(
        self,
        block,
        depth=50,
        width=64,
        num_classes=1000,
        with_pool=True,
        groups=1,
    ):
        super().__init__()
        layer_cfg = {
            18: [2, 2, 2, 2],
            34: [3, 4, 6, 3],
            50: [3, 4, 6, 3],
            101: [3, 4, 23, 3],
            152: [3, 8, 36, 3],
        }
        layers = layer_cfg[depth]
        self.groups = groups
        self.base_width = width
        self.num_classes = num_classes
        self.with_pool = with_pool
        self._norm_layer = nn.BatchNorm2D

        self.inplanes = 64
        self.dilation = 1

        self.conv1 = nn.Conv2D(
            3,
            self.inplanes,
            kernel_size=3,
            stride=1,
            padding=1,
            bias_attr=False,
        )
        self.bn1 = self._norm_layer(self.inplanes)
        self.relu = DyReLUC(self.inplanes)

        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        if with_pool:
            self.avgpool = nn.AdaptiveAvgPool2D((1, 1))

        if num_classes > 0:
            self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2D(
                    self.inplanes,
                    planes * block.expansion,
                    1,
                    stride=stride,
                    bias_attr=False,
                ),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(
            block(
                self.inplanes,
                planes,
                stride,
                downsample,
                self.groups,
                self.base_width,
                previous_dilation,
                norm_layer,
            )
        )
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(
                block(
                    self.inplanes,
                    planes,
                    groups=self.groups,
                    base_width=self.base_width,
                    norm_layer=norm_layer,
                )
            )

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.with_pool:
            x = self.avgpool(x)

        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            x = self.fc(x)

        return x
model = ResNet(BasicBlock, depth=18, num_classes=10)
paddle.summary(model, (1, 3, 32, 32))

2.5 Training

learning_rate = 0.1
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'

model = ResNet(BasicBlock, depth=18, num_classes=10)

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Momentum(parameters=model.parameters(), learning_rate=scheduler, weight_decay=5e-4)
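# Note (added for clarity): scheduler.step() is called once per batch below, so
# T_max = 50000 // batch_size * n_epochs = 195 * 50 = 9750 steps,
# i.e. a single cosine decay cycle spanning the whole training run.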

gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record['train']['loss'].append(loss.numpy())
            loss_record['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        optimizer.clear_grad()
        
        train_loss += loss
        train_num += len(y_data)
        scheduler.step()

    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record['train']['acc'].append(train_acc)
    acc_record['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record['val']['loss'].append(total_val_loss.numpy())
    loss_record['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record['val']['acc'].append(val_acc)
    acc_record['val']['iter'].append(acc_iter)
    
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))

2.6 Results

def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
    ''' Plot learning curve of your CNN '''
    maxtrain = max(map(float, record['train'][title]))
    maxval = max(map(float, record['val'][title]))
    ymax = max(maxtrain, maxval) * 1.1
    mintrain = min(map(float, record['train'][title]))
    minval = min(map(float, record['val'][title]))
    ymin = min(mintrain, minval) * 0.9

    total_steps = len(record['train'][title])
    x_1 = list(map(int, record['train']['iter']))
    x_2 = list(map(int, record['val']['iter']))
    figure(figsize=(10, 6))
    plt.plot(x_1, record['train'][title], c='tab:red', label='train')
    plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
    plt.ylim(ymin, ymax)
    plt.xlabel('Training steps')
    plt.ylabel(ylabel)
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')

(Figure: learning curve of the training/validation loss for ResNet18-DyReLUC)

plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')

(Figure: learning curve of the training/validation accuracy for ResNet18-DyReLUC)

import time
work_path = 'work/model'
model = ResNet(BasicBlock, depth=18, num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughput: 2900

3. ResNet

3.1 ResNet

model = paddle.vision.models.resnet18(num_classes=10)
model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)  # 3x3 stride-1 stem for 32x32 CIFAR inputs
model.maxpool = nn.Identity()  # drop the stem max-pool so early feature maps keep 32x32 resolution
paddle.summary(model, (1, 3, 128, 128))

3.2 Training

learning_rate = 0.1
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model1'

model = paddle.vision.models.resnet18(num_classes=10)
model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)
model.maxpool = nn.Identity()

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Momentum(parameters=model.parameters(), learning_rate=scheduler, weight_decay=5e-4)

gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record1 = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record1 = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record1['train']['loss'].append(loss.numpy())
            loss_record1['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        optimizer.clear_grad()
        
        train_loss += loss
        train_num += len(y_data)
        scheduler.step()

    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record1['train']['acc'].append(train_acc)
    acc_record1['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record1['val']['loss'].append(total_val_loss.numpy())
    loss_record1['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record1['val']['acc'].append(val_acc)
    acc_record1['val']['iter'].append(acc_iter)
    
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))

3.3 Results

plot_learning_curve(loss_record1, title='loss', ylabel='CE Loss')

(Figure: learning curve of the training/validation loss for the baseline ResNet18)

plot_learning_curve(acc_record1, title='acc', ylabel='Accuracy')

(Figure: learning curve of the training/validation accuracy for the baseline ResNet18)

import time
work_path = 'work/model1'
model = paddle.vision.models.resnet18(num_classes=10)
model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)  # match the architecture used for training
model.maxpool = nn.Identity()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))


Throughput: 6641


4. Comparison of Results

Model                  Train Acc   Val Acc   Parameters
ResNet18 w/o DyReLU    0.8978      0.9213    11183562
ResNet18 w DyReLUA     0.8842      0.8759    11360662
ResNet18 w DyReLUB     0.8986      0.9101    12072626
ResNet18 w DyReLUC     0.9054      0.9245    12076547

        The DyReLUA and DyReLUB experiments can be found in DyReLUA.ipynb and DyReLUB.ipynb, respectively.

Summary

        This article implements the three Dynamic ReLU variants. On CIFAR10, possibly because the dataset is too small, the first two variants fall below the baseline accuracy, while DyReLUC improves validation accuracy by 0.32% (92.13% → 92.45%).

References

Paper: Dynamic ReLU (ECCV 2020)
Code: Islanna/DynamicReLU (unofficial implementation)
Project: 重新思考神经网络的激活函数:Dynamic ReLU 复现 (the project corrected here, with DyReLUC newly added)
