论文复现：基于PaddleClas复现VovNet

转自AI Studio，原文链接：引入在目标检测中，DenseNet表现良好，通过聚合不同感受野特征层的方式，保留了中间特征层的信息。它通过feature reuse 使得模型的大小和flops大大降低，但是，实验证明，DenseNet backbone更加耗时也增加了能耗：dense connection架构使得输入channel线性递增，导致了更多的内存访问消耗，进而导致更多的计算消耗和能耗。

AI Studio

467人浏览 · 2022-05-13 00:46:30

AI Studio · 2022-05-13 00:46:30 发布

转自AI Studio，原文链接：

引入

在目标检测中，DenseNet表现良好，通过聚合不同感受野特征层的方式，保留了中间特征层的信息。它通过feature reuse 使得模型的大小和flops大大降低，但是，实验证明，DenseNet backbone更加耗时也增加了能耗：dense connection架构使得输入channel线性递增，导致了更多的内存访问消耗，进而导致更多的计算消耗和能耗。因此文章介绍了一种由OSA（one-shot-aggregation）模块组成的叫做VoVNet的高效体系结构。该网络在继承DenseNet的多感受野表示多种特征的优点的情况下，同时解决了密集连接效率低的问题。该网络性能优于DenseNet，而且速度也比DenseNet快2倍。此外，VoVNet网络速度和效率也都优于ResNet，而且对于小目标检测的性能有了显著提高。

在本文中将通过PaddlePaddle深度学习框架复现vovnet39、vovnet57,并添加至PaddleClas深度学习套件中，完成flowers102数据集的全流程训练、验证与测试。

模型介绍

相关资料
- 论文地址：【VovNet】
- 代码地址：【VoVNet.pytorch】
系列模型：
- 分别为轻量级网络，例如VoVNet-27-slim，大型网络，例如VoVNet-39/57。VoVNet由一个包括3个卷积层和4个输出步长为32的OSA模块组成。OSA模块由5个卷积层组成，具有相同的输入/输出信道，用于最小化MAC。

模型重要概念介绍

（1）rethinking densenet

实际上，DenseNet通过密集连接来交换特征数量和特征质量。尽管DenseNet的表现证明了这种交换是有益的，但从能源和时间的角度来看，这种交换还有其他一些缺点。

首先，密集的连接会导致较高的内存访问成本。因为在固定的计算量或模型参数下，当输入和输出信道尺寸相同时，MAC可以最小化。密集连接增加了输入通道大小，而输出通道大小保持不变，因此，每个层的输入和输出通道大小都不平衡。因此，DenseNet在计算量或参数相同的模型中具有较高的mac值，并且消耗更多的能量和时间。
其次，密集连接引入瓶颈结构，影响了GPU并行计算的效率。当模型尺寸较大时，线性增加的输入尺寸是一个严重的问题，因为它使得整体计算相对于深度呈二次增长。为了抑制这种增长，DenseNet采用了增加1×1卷积层的瓶颈结构来保持3 × 3卷积层的输入大小不变。尽管这种方法可以减少FLOPs和参数，但同时它会损害GPU并行计算的效率。瓶颈结构将一个3 × 3卷积层分成两个较小的层，导致更多的顺序计算，从而降低了推理速度。

（2）OSA模块

文章提出一次性聚合（one-shot aggregation 即OSA）模块，该模块将其特征同时聚合到最后一层。如图1所示。每个卷积层包含双向连接，一个连接到下一层以产生具有更大感受野的特征，而另一个仅聚合到最终输出特征映射中一次。
与DenseNet的不同之处在于，每一层的输出并没有按路线（route）到所有后续的中间层，这使得中间层的输入大小是恒定的。这样就提高了GPU的计算效率。
另外一个不同之处在于没有了密集连接，因此MAC比DenseNet小得多。
此外，由于OSA模块聚集了浅层特征，它包含的层更少。因此，OSA模块被设计成只有几层，可以在GPU中高效计算。

VoVNet复现

复现后的OSA模块：


class _OSA_module(nn.Layer):
    def __init__(self,
                 in_ch,
                 stage_ch,
                 concat_ch,
                 layer_per_block,
                 module_name,
                 identity=False):
        super(_OSA_module, self).__init__()

        self.identity = identity
        self.layers = nn.LayerList()
        in_channel = in_ch
        for i in range(layer_per_block):
            self.layers.append(nn.Sequential(
                *conv3x3(in_channel, stage_ch, module_name, i)))
            in_channel = stage_ch

        # feature aggregation
        in_channel = in_ch + layer_per_block * stage_ch
        self.concat = nn.Sequential(
            *(conv1x1(in_channel, concat_ch, module_name, 'concat')))

    def forward(self, x):
        identity_feat = x
        output = []
        output.append(x)
        for layer in self.layers:
            x = layer(x)
            output.append(x)

        x = paddle.concat(output, axis=1)
        xt = self.concat(x)

        if self.identity:
            xt = xt + identity_feat

        return xt


class _OSA_stage(nn.Sequential):
    def __init__(self,
                 in_ch,
                 stage_ch,
                 concat_ch,
                 block_per_stage,
                 layer_per_block,
                 stage_num):
        super(_OSA_stage, self).__init__()

        if not stage_num == 2:
            self.add_sublayer('Pooling',
                nn.MaxPool2D(kernel_size=3, stride=2, ceil_mode=True))

        module_name = f'OSA{stage_num}_1'
        self.add_sublayer(module_name,
            _OSA_module(in_ch,
                        stage_ch,
                        concat_ch,
                        layer_per_block,
                        module_name))
        for i in range(block_per_stage-1):
            module_name = f'OSA{stage_num}_{i+2}'
            self.add_sublayer(module_name,
                _OSA_module(concat_ch,
                            stage_ch,
                            concat_ch,
                            layer_per_block,
                            module_name,
                            identity=True))

复现后VovNet39、VovNet57、VovNet27_slim网络模型

class VoVNet(nn.Layer):
    def __init__(self,
                 config_stage_ch,
                 config_concat_ch,
                 block_per_stage,
                 layer_per_block,
                 class_num=1000):
        super(VoVNet, self).__init__()

        # Stem module
        stem = conv3x3(3,   64, 'stem', '1', 2)
        stem += conv3x3(64,  64, 'stem', '2', 1)
        stem += conv3x3(64, 128, 'stem', '3', 2)
        self.add_sublayer('stem', nn.Sequential(*stem))

        stem_out_ch = [128]
        in_ch_list = stem_out_ch + config_concat_ch[:-1]
        self.stage_names = []
        for i in range(4): #num_stages
            name = 'stage%d' % (i+2)
            self.stage_names.append(name)
            self.add_sublayer(name,
                            _OSA_stage(in_ch_list[i],
                                       config_stage_ch[i],
                                       config_concat_ch[i],
                                       block_per_stage[i],
                                       layer_per_block,
                                       i+2))

        self.classifier = nn.Linear(config_concat_ch[-1], class_num)

        for m in self.sublayers():
            if isinstance(m, nn.Conv2D):
                # nn.init.kaiming_normal_(m.weight)
                kaiming_normal_(m.weight)
            elif isinstance(m, (nn.BatchNorm2D, nn.GroupNorm)):
                # nn.init.constant_(m.weight, 1)
                # nn.init.constant_(m.bias, 0)
                ones_(m.weight)
                zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                # nn.init.constant_(m.bias, 0)
                zeros_(m.bias)

    def forward(self, x):
        x = self.stem(x)
        for name in self.stage_names:
            x = getattr(self, name)(x)
        x = F.adaptive_avg_pool2d(x, (1, 1)).flatten(1)
        x = self.classifier(x)
        return x



def _load_pretrained(pretrained, model, model_url="", use_ssld=False):
    if pretrained is False:
        pass
    elif pretrained is True:
        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
    elif isinstance(pretrained, str):
        load_dygraph_pretrain(model, pretrained)
    else:
        raise RuntimeError(
            "pretrained type is not available. Please use `string` or `boolean` type."
        )

def _vovnet(arch,
            config_stage_ch,
            config_concat_ch,
            block_per_stage,
            layer_per_block,
            pretrained,
            progress,
            **kwargs):
    model = VoVNet(config_stage_ch, config_concat_ch,
                   block_per_stage, layer_per_block,
                   **kwargs)
    if pretrained:
        _load_pretrained(pretrained, model)
    return model


def vovnet57(pretrained=False, progress=True, **kwargs):
    r"""Constructs a VoVNet-57 model as described in
    `"An Energy and GPU-Computation Efficient Backbone Networks"
    <https://arxiv.org/abs/1904.09730>`_.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vovnet('vovnet57', [128, 160, 192, 224], [256, 512, 768, 1024],
                    [1,1,4,3], 5, pretrained, progress, **kwargs)


def vovnet39(pretrained=False, progress=True, **kwargs):
    r"""Constructs a VoVNet-39 model as described in
    `"An Energy and GPU-Computation Efficient Backbone Networks"
    <https://arxiv.org/abs/1904.09730>`_.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vovnet('vovnet39', [128, 160, 192, 224], [256, 512, 768, 1024],
                    [1,1,2,2], 5, pretrained, progress, **kwargs)


def vovnet27_slim(pretrained=False, progress=True, **kwargs):
    r"""Constructs a VoVNet-39 model as described in
    `"An Energy and GPU-Computation Efficient Backbone Networks"
    <https://arxiv.org/abs/1904.09730>`_.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vovnet('vovnet27_slim', [64, 80, 96, 112], [128, 256, 384, 512],
                    [1,1,1,1], 5, pretrained, progress, **kwargs)

将复现后的vovnet合入PaddleClas中开始模型的训练、测试和评估。

模型训练

1.查看飞桨版本

In [ ]

# 1.查看飞桨版本
import paddle
print(paddle.__version__)

2.安装PaddleClas环境

添加了vovnet的PaddleClas已打包至PaddleClas.zip中，通过解压进行环境安装

参考文档：https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/install_en.md

In [ ]

# !git clone https://github.com/PaddlePaddle/PaddleClas.git
!unzip PaddleClas.zip
%cd ./PaddleClas/
!pip install --upgrade pip
!pip3 install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple

2.1 了解PaddleClas深度学习套件

官方文档：[https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)

拥有图像识别、图像分类、特征学习等内容

2.2 PaddleClas全局配置

参数名称	具体含义	默认值
checkpoints	断点模型路径，用于恢复训练	null
pretrained_model	预训练模型路径	null
output_dir	保存模型路径	"./output/"
save_interval	每隔多少个epoch保存模型	1
eval_during_train	是否在训练时进行评估	True
eval_interval	每隔多少个epoch进行模型评估	1
epochs	训练总epoch数	无
print_batch_step	每隔多少个mini-batch打印输出	10
use_visualdl	是否是用visualdl可视化训练过程	False
image_shape	图片大小	[3，224，224]
save_inference_dir	inference模型的保存路径	"./inference"
eval_mode	eval的模式	"classification"

注：image_shape值除了默认还可以选择list, shape: (3,)

eval_mode除了默认值还可以选择"retrieval"

参考文档：https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/config_description_en.md

3.1 官方分类尝试

3.1.1 下载并解压flowers102数据集

训练集中有：

train_list.txt：训练集，1020张图

val_list.txt：验证集，1020张图

train_extra_list.txt：大的训练集，7169张图

图片展示：

数据写入情况：

In [ ]

# 1.修改当前路径
%cd ./dataset/
# 2.下载数据集
!wget https://paddle-imagenet-models-name.bj.bcebos.com/data/flowers102.zip
# 3.解压数据
!unzip flowers102.zip

3.2 配置文件

我们要首先对配置文件进行修改。 AI Studio由于没有共享内存，所以需要修改num_workers: 0,其他的可以不修改。

查看自己的环境是否为CPU或者是GPU然后对device:进行修改

3.3 进行训练

这里我们选用复现好的vovnet39网络模型的配置进行鲜花分类模型的训练

python tools/train.py -c ./ppcls/configs/ImageNet/VovNet/vovnet39_x1.0.yaml

运行tools文件夹下的train.py文件

-c指的是训练使用的配置文件的路径为./ppcls/configs/ImageNet/VovNet/vovnet39_x1.0.yaml

-o表示的是是否使用预训练模型，可以是选择为True或False，也可以使用预训练模型存放路径。

In [ ]

# 切换目录到PaddleClas下
%cd /home/aistudio/PaddleClas
# 开始训练
!python tools/train.py -c ./ppcls/configs/ImageNet/VovNet/vovnet39_x1.0.yaml

3.4 模型预测

-c为训练配置文件

-o Infer.infer_imgs=为预测的图片

-o Global.pretrained_model=为用于预测的模型

In [ ]

!python tools/infer.py -c ./ppcls/configs/ImageNet/VovNet/vovnet39_x1.0.yaml\
                     -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
                     -o Global.pretrained_model=output_vov/vovnet39/latest

3.5 结果解析

[{'class_ids': [75, 45, 43, 18, 58], 'scores': [1.0, 0.0, 0.0, 0.0, 0.0], 'file_name': 'dataset/flowers102/jpg/image_00001.jpg', 'label_names': []}]

这里对75,45,43,18,58的比例进行了分析，最后以75为最后结果。

3.6 模型评估

In [ ]

import paddle

!python -m paddle.distributed.launch \
    tools/eval.py \
        -c ./ppcls/configs/ImageNet/VovNet/vovnet39_x1.0.yaml \
        -o Global.pretrained_model=output_vov/vovnet39/latest

[Eval][Epoch 0][Avg]CELoss: 0.34251, loss: 0.34251, top1: 0.90885, top5: 0.99464