[Paper Reproduction] A TRANSFORMER-BASED SIAMESE NETWORK FOR CHANGE DETECTION

Index Terms:
Change detection, Transformer, Siamese network, attention mechanism, multilayer perceptron, remote sensing.

(1) Abstract

1.1 Method Summary

ChangeFormer is a Transformer-based Siamese network architecture for extracting changed regions (Change Detection, CD) from a pair of co-registered remote sensing images taken at different times. Unlike fully convolutional network (ConvNet) based CD frameworks, ChangeFormer unifies a hierarchical Transformer encoder with a multilayer perceptron (MLP) decoder in a Siamese network, efficiently capturing the multi-scale, long-range details required for accurate CD. Experiments on two CD datasets (the Paddle version was trained and tested only on LEVIR-CD) show that the end-to-end trainable ChangeFormer architecture achieves better CD performance than previous approaches. The original code and pretrained models can be found via the reference links above.

1.2 Model Overview

Note: the following analysis of ChangeFormer draws on the CSDN blog post 【论文笔记】A Transformer-based Siamese network for change detection.

  • ChangeFormer's main contribution is applying Transformer modules end to end for change detection (CD), rather than a CNN+Transformer hybrid design, showing that a pure Transformer network with an MLP head can also achieve SOTA results. The overall architecture can be divided into three modules:

    • hierarchical transformer encoder: the Siamese Transformer network used for feature extraction and encoding;
    • difference modules: four difference modules that compute multi-scale feature differences;
    • lightweight MLP decoder: fuses the multi-scale features and predicts the CD mask.
  • The network pipeline:

    • The bitemporal image pair is fed in, and the Siamese Transformer network extracts multi-level features from the pre- and post-change images separately;
    • The difference modules take the four groups of features at different scales produced by different layers of the Siamese network and concatenate and fuse them;
    • The multi-level features from the previous step are fed into the MLP decoder for aggregation, producing the change detection mask (see the sketch below the figure).

[Figure: ChangeFormer overall architecture]
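To make the pipeline concrete, here is a minimal sketch of the forward pass described above. The names are illustrative only; in the repository this wiring lives inside the ChangeFormerV6 class, and the difference modules are implemented inside the decoder.

# Illustrative sketch of the ChangeFormer pipeline (not the repo's actual class)
def changeformer_forward(encoder, decoder, img_a, img_b):
    feats_a = encoder(img_a)   # Siamese: the same encoder weights process both dates
    feats_b = encoder(img_b)   # each call returns 4 feature maps at 1/4, 1/8, 1/16, 1/32 scale
    # the difference computation and the MLP decoding both happen inside `decoder`
    return decoder(feats_a, feats_b)  # multi-scale predictions; the last one is the CD mask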

(2) Model Architecture

It took only about two years from ViT's debut in 2020 for Vision Transformers to become dominant, to the point where they look poised to take over computer vision: ViT, Swin Transformer, and other Transformer variants now appear on the leaderboards of most major CV tasks.

The success of Transformers in NLP and CV comes mainly from global attention and the ability to model long-range dependencies. For an introduction to Transformers, and vision Transformers in particular, see the AI Studio course 从零开始学视觉Transformer.

Although Transformers are widely used and perform well on natural image tasks, their adoption in remote sensing has lagged behind. In the change detection field addressed by this paper, BiT introduced a Transformer into the CD task but still used a CNN backbone (ResNet18) (for an introduction to and hands-on experience with BiT, see 【第六期论文复现赛-变化检测】Remote Sensing Image Change Detection with Transformers). Not until ChangeFormer, the paper reproduced here, was a Transformer architecture used for the entire model. Let's take a look at how ChangeFormer uses Transformers.

The overall architecture was already described in Section (1) and is not repeated here; the subsections below explain the three modules and show their code.

2.1 hierarchical transformer encoder

  • The hierarchical transformer encoder is the basic building block of the ChangeFormer encoder and is used to build the Siamese feature extraction network.

  • The Transformer blocks inside ChangeFormer's hierarchical transformer encoder are essentially the classic ViT blocks; the main modification is the downsampling module inserted between the stacked Transformer stages.

    • The downsample module subsamples the patch tokens, mimicking a CNN hierarchy to obtain multi-scale image features.
      [Figure: downsampling between stacked Transformer stages]
  • The encoder code is shown below:

# Imports and helpers assumed from the repo context (OverlapPatchEmbed and Block
# are layers defined elsewhere in ChangeFormer-pd):
import math
import paddle as pd
import paddle.nn as nn

# Transformer Encoder with x2, x4, x8, x16 scales
class EncoderTransformer_v3(nn.Layer):
    def __init__(
        self,
        img_size=256,
        patch_size=3,
        in_chans=3,
        num_classes=2,
        embed_dims=[32,64,128,256],
        num_heads=[2,2,4,8],
        mlp_ratios=[4,4,4,4],
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.,
        norm_layer=nn.LayerNorm,
        depths=[3,3,6,18],
        sr_ratios=[8,4,2,1]):
        super().__init__()
        self.num_classes = num_classes
        self.depths = depths
        self.embed_dims = embed_dims

        # patch embedding definitions
        self.patch_embed1 = OverlapPatchEmbed(
            img_size=img_size,
            patch_size=7,
            stride=4,
            in_chans=in_chans,
            embed_dim=embed_dims[0])
        self.patch_embed2 = OverlapPatchEmbed(
            img_size=img_size // 4,
            patch_size=patch_size,
            stride=2,
            in_chans=embed_dims[0],
            embed_dim=embed_dims[1])
        self.patch_embed3 = OverlapPatchEmbed(
            img_size=img_size // 8,
            patch_size=patch_size,
            stride=2,
            in_chans=embed_dims[1],
            embed_dim=embed_dims[2])
        self.patch_embed4 = OverlapPatchEmbed(
            img_size=img_size // 16,
            patch_size=patch_size,
            stride=2,
            in_chans=embed_dims[2],
            embed_dim=embed_dims[3])

        # Stage-1 (x1/4 scale)
        dpr = [x.item() for x in pd.linspace(0, drop_path_rate, sum(depths))]
        cur = 0
        self.block1 = nn.LayerList([Block(dim=embed_dims[0],
                                          num_heads=num_heads[0],
                                          mlp_ratio=mlp_ratios[0],
                                          qkv_bias=qkv_bias,
                                          qk_scale=qk_scale,
                                          drop=drop_rate,
                                          attn_drop=attn_drop_rate,
                                          drop_path=dpr[cur + i],
                                          norm_layer=norm_layer,
                                          sr_ratio=sr_ratios[0]) for i in range(depths[0])])
        self.norm1 = norm_layer(embed_dims[0])

        # Stage-2 (x1/8 scale)
        cur += depths[0]
        self.block2 = nn.LayerList([Block(dim=embed_dims[1],
                                          num_heads=num_heads[1],
                                          mlp_ratio=mlp_ratios[1],
                                          qkv_bias=qkv_bias,
                                          qk_scale=qk_scale,
                                          drop=drop_rate,
                                          attn_drop=attn_drop_rate,
                                          drop_path=dpr[cur + i],
                                          norm_layer=norm_layer,
                                          sr_ratio=sr_ratios[1]) for i in range(depths[1])])
        self.norm2 = norm_layer(embed_dims[1])

        # Stage-3 (x1/16 scale)
        cur += depths[1]
        self.block3 = nn.LayerList([Block(dim=embed_dims[2],
                                          num_heads=num_heads[2],
                                          mlp_ratio=mlp_ratios[2],
                                          qkv_bias=qkv_bias,
                                          qk_scale=qk_scale,
                                          drop=drop_rate,
                                          attn_drop=attn_drop_rate,
                                          drop_path=dpr[cur + i],
                                          norm_layer=norm_layer,
                                          sr_ratio=sr_ratios[2]) for i in range(depths[2])])
        self.norm3 = norm_layer(embed_dims[2])

        # Stage-4 (x1/32 scale)
        cur += depths[2]
        self.block4 = nn.LayerList([Block(dim=embed_dims[3],
                                          num_heads=num_heads[3],
                                          mlp_ratio=mlp_ratios[3],
                                          qkv_bias=qkv_bias,
                                          qk_scale=qk_scale,
                                          drop=drop_rate,
                                          attn_drop=attn_drop_rate,
                                          drop_path=dpr[cur + i],
                                          norm_layer=norm_layer,
                                          sr_ratio=sr_ratios[3]) for i in range(depths[3])])
        self.norm4 = norm_layer(embed_dims[3])

        self.apply(self._init_weights)

        # for ent in self.parameters():
        #     self._init_weights(ent)

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_op = nn.initializer.TruncatedNormal(std=.02)
            trunc_normal_op(m.weight)
            # trunc_normal_(m.weight, std=.02)
            if isinstance(m, nn.Linear) and m.bias is not None:
                init_bias = nn.initializer.Constant(0)
                init_bias(m.bias)
                # nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.LayerNorm):
            init_bias = nn.initializer.Constant(0)
            init_bias(m.bias)
            # nn.init.constant_(m.bias, 0)
            init_weight = nn.initializer.Constant(1.0)
            init_weight(m.weight)
            # nn.init.constant_(m.weight, 1.0)
        elif isinstance(m, nn.Conv2D):
            fan_out = m._kernel_size[0] * m._kernel_size[1] * m._out_channels
            fan_out //= m._groups
            init_weight = nn.initializer.Normal(0, math.sqrt(2.0 / fan_out))
            init_weight(m.weight)
            # m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))
            if m.bias is not None:
                init_bias = nn.initializer.Constant(0)
                init_bias(m.bias)
                # m.bias.data.zero_()

    def reset_drop_path(self, drop_path_rate):
        dpr = [
            x.item() for x in pd.linspace(
                0, drop_path_rate, sum(
                    self.depths))]
        cur = 0
        for i in range(self.depths[0]):
            self.block1[i].drop_path.drop_prob = dpr[cur + i]

        cur += self.depths[0]
        for i in range(self.depths[1]):
            self.block2[i].drop_path.drop_prob = dpr[cur + i]

        cur += self.depths[1]
        for i in range(self.depths[2]):
            self.block3[i].drop_path.drop_prob = dpr[cur + i]

        cur += self.depths[2]
        for i in range(self.depths[3]):
            self.block4[i].drop_path.drop_prob = dpr[cur + i]

    def forward_features(self, x):
        # print(x)
        B = x.shape[0]
        outs = []

        # stage 1
        x1, H1, W1 = self.patch_embed1(x)
        # print(x1,H1,W1)
        for i, blk in enumerate(self.block1):
            x1 = blk(x1, H1, W1)
        # print(x1)
        x1 = self.norm1(x1)
        x1 = x1.reshape([B, H1, W1, -1]).transpose([0, 3, 1, 2])
        # print(x1)
        outs.append(x1)

        # stage 2
        x1, H1, W1 = self.patch_embed2(x1)
        for i, blk in enumerate(self.block2):
            x1 = blk(x1, H1, W1)
        x1 = self.norm2(x1)
        x1 = x1.reshape([B, H1, W1, -1]).transpose([0, 3, 1, 2])
        outs.append(x1)

        # stage 3
        x1, H1, W1 = self.patch_embed3(x1)
        for i, blk in enumerate(self.block3):
            x1 = blk(x1, H1, W1)
        x1 = self.norm3(x1)
        x1 = x1.reshape([B, H1, W1, -1]).transpose([0, 3, 1, 2])
        outs.append(x1)

        # stage 4
        x1, H1, W1 = self.patch_embed4(x1)
        for i, blk in enumerate(self.block4):
            x1 = blk(x1, H1, W1)
        x1 = self.norm4(x1)
        x1 = x1.reshape([B, H1, W1, -1]).transpose([0, 3, 1, 2])
        outs.append(x1)
        return outs

    def forward(self, x):
        x = self.forward_features(x)
        return x
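A quick way to check the multi-scale behaviour is to push a dummy tensor through the encoder. The snippet below is only a sanity-check sketch; the import path is an assumption about the repo layout. With the patch-embedding strides of 4/2/2/2 above, the four outputs come out at 1/4, 1/8, 1/16, and 1/32 of the input resolution.

# Sanity check (the import path is an assumption about the repo layout)
import paddle
# from models.ChangeFormer import EncoderTransformer_v3

enc = EncoderTransformer_v3(img_size=256, in_chans=3)
x = paddle.randn([1, 3, 256, 256])
for f in enc(x):
    print(f.shape)
# expected: [1, 32, 64, 64], [1, 64, 32, 32], [1, 128, 16, 16], [1, 256, 8, 8]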

2.2 difference module

  • The difference module's job is to combine the bitemporal features at each scale and output their difference.

    • The basic procedure is concatenation followed by convolution.
    • Note that this does not directly compute the absolute difference of the feature maps; instead, the optimal distance metric at each scale is learned during training.
  • Difference module code:

# Difference Layer
def conv_diff(in_channels, out_channels):
    return nn.Sequential(
        nn.Conv2D(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2D(out_channels),
        nn.Conv2D(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU()
    )
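As a small illustration of how a difference module is used, the decoder later concatenates the two temporal feature maps along the channel axis and lets conv_diff learn the comparison. A minimal usage sketch with random stand-in tensors:

# Illustrative usage of conv_diff with random tensors
import paddle

diff = conv_diff(in_channels=2 * 64, out_channels=64)
f_a = paddle.randn([1, 64, 64, 64])   # pre-change feature map
f_b = paddle.randn([1, 64, 64, 64])   # post-change feature map
d = diff(paddle.concat([f_a, f_b], axis=1))
print(d.shape)  # [1, 64, 64, 64]: a learned, per-scale "difference" feature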

2.3 MLP decoder

  • The MLP decoder works in three steps:

    • MLP & Upsampling
      • An MLP layer unifies the channel number of the different feature maps, which are then upsampled to H/4 x W/4.
    • Concatenation & Fusion
      • The upsampled feature maps from the previous step are concatenated and then fused with another MLP layer.
    • Upsampling & Classification
      • The result is upsampled again to the full H x W resolution, and a final MLP layer produces the CD mask.
  • MLP decoder code:

# Relies on the repo context: `paddle as pd`, `paddle.nn as nn`, `paddle.nn.functional as F`,
# plus helpers (MLP, conv_diff, make_prediction, resize, UpsampleConvLayer, ResidualBlock,
# ConvLayer) defined elsewhere in ChangeFormer-pd.
class DecoderTransformer_v3(nn.Layer):
    """
    Transformer Decoder
    """

    def __init__(
        self,
        input_transform='multiple_select',
        in_index=[0,1,2,3],
        align_corners=True,
        in_channels=[32,64,128,256],
        embedding_dim=64,
        output_nc=2,
        decoder_softmax=False,
        feature_strides=[2,4,8,16]):
        super(DecoderTransformer_v3, self).__init__()
        # assert
        assert len(feature_strides) == len(in_channels)
        assert min(feature_strides) == feature_strides[0]

        # settings
        self.feature_strides = feature_strides
        self.input_transform = input_transform
        self.in_index = in_index
        self.align_corners = align_corners
        self.in_channels = in_channels
        self.embedding_dim = embedding_dim
        self.output_nc = output_nc
        c1_in_channels, c2_in_channels, c3_in_channels, c4_in_channels = self.in_channels

        # MLP decoder heads
        self.linear_c4 = MLP(
            input_dim=c4_in_channels,
            embed_dim=self.embedding_dim)
        self.linear_c3 = MLP(
            input_dim=c3_in_channels,
            embed_dim=self.embedding_dim)
        self.linear_c2 = MLP(
            input_dim=c2_in_channels,
            embed_dim=self.embedding_dim)
        self.linear_c1 = MLP(
            input_dim=c1_in_channels,
            embed_dim=self.embedding_dim)

        # convolutional Difference Layers
        self.diff_c4 = conv_diff(
            in_channels=2 * self.embedding_dim,
            out_channels=self.embedding_dim)
        self.diff_c3 = conv_diff(
            in_channels=2 * self.embedding_dim,
            out_channels=self.embedding_dim)
        self.diff_c2 = conv_diff(
            in_channels=2 * self.embedding_dim,
            out_channels=self.embedding_dim)
        self.diff_c1 = conv_diff(
            in_channels=2 * self.embedding_dim,
            out_channels=self.embedding_dim)

        # taking outputs from middle of the encoder
        self.make_pred_c4 = make_prediction(
            in_channels=self.embedding_dim,
            out_channels=self.output_nc)
        self.make_pred_c3 = make_prediction(
            in_channels=self.embedding_dim,
            out_channels=self.output_nc)
        self.make_pred_c2 = make_prediction(
            in_channels=self.embedding_dim,
            out_channels=self.output_nc)
        self.make_pred_c1 = make_prediction(
            in_channels=self.embedding_dim,
            out_channels=self.output_nc)

        # Final linear fusion layer
        self.linear_fuse = nn.Sequential(
            nn.Conv2D(
                in_channels=self.embedding_dim *
                len(in_channels),
                out_channels=self.embedding_dim,
                kernel_size=1),
            nn.BatchNorm2D(
                self.embedding_dim))

        # Final predction head
        self.convd2x = UpsampleConvLayer(
            self.embedding_dim,
            self.embedding_dim,
            kernel_size=4,
            stride=2)
        self.dense_2x = nn.Sequential(ResidualBlock(self.embedding_dim))
        self.convd1x = UpsampleConvLayer(
            self.embedding_dim,
            self.embedding_dim,
            kernel_size=4,
            stride=2)
        self.dense_1x = nn.Sequential(ResidualBlock(self.embedding_dim))
        self.change_probability = ConvLayer(
            self.embedding_dim,
            self.output_nc,
            kernel_size=3,
            stride=1,
            padding=1)

        # Final activation
        self.output_softmax = decoder_softmax
        self.active = nn.Sigmoid()

    def _transform_inputs(self, inputs):
        """Transform inputs for decoder.
        Args:
            inputs (list[Tensor]): List of multi-level img features.
        Returns:
            Tensor: The transformed inputs
        """

        if self.input_transform == 'resize_concat':
            inputs = [inputs[i] for i in self.in_index]
            upsampled_inputs = [
                resize(
                    input=x,
                    size=inputs[0].shape[2:],
                    mode='bilinear',
                    align_corners=self.align_corners) for x in inputs
            ]
            inputs = pd.concat(upsampled_inputs, axis=1)  # paddle.concat takes `axis`, not `dim`
        elif self.input_transform == 'multiple_select':
            inputs = [inputs[i] for i in self.in_index]
        else:
            inputs = inputs[self.in_index]

        return inputs

    def forward(self, inputs1, inputs2):
        # Transforming encoder features (select layers)
        x_1 = self._transform_inputs(inputs1)  # len=4: 1/4, 1/8, 1/16, 1/32 scale features
        x_2 = self._transform_inputs(inputs2)  # len=4: 1/4, 1/8, 1/16, 1/32 scale features

        # img1 and img2 features
        c1_1, c2_1, c3_1, c4_1 = x_1
        c1_2, c2_2, c3_2, c4_2 = x_2

        ############## MLP decoder on C1-C4 ###########
        n, _, h, w = c4_1.shape

        outputs = []
        # Stage 4: x1/32 scale
        _c4_1 = self.linear_c4(c4_1).transpose([0, 2, 1]).reshape(
            [n, -1, c4_1.shape[2], c4_1.shape[3]])
        _c4_2 = self.linear_c4(c4_2).transpose([0, 2, 1]).reshape(
            [n, -1, c4_2.shape[2], c4_2.shape[3]])
        _c4 = self.diff_c4(pd.concat((_c4_1, _c4_2), axis=1))
        p_c4 = self.make_pred_c4(_c4)
        outputs.append(p_c4)
        _c4_up = resize(_c4,
                        size=c1_2.shape[2:],
                        mode='bilinear',
                        align_corners=False)

        # Stage 3: x1/16 scale
        _c3_1 = self.linear_c3(c3_1).transpose([0, 2, 1]).reshape(
            [n, -1, c3_1.shape[2], c3_1.shape[3]])
        _c3_2 = self.linear_c3(c3_2).transpose([0, 2, 1]).reshape(
            [n, -1, c3_2.shape[2], c3_2.shape[3]])
        _c3 = self.diff_c3(pd.concat((_c3_1, _c3_2), axis=1)) + \
            F.interpolate(_c4, scale_factor=2, mode="bilinear")
        p_c3 = self.make_pred_c3(_c3)
        outputs.append(p_c3)
        _c3_up = resize(_c3,
                        size=c1_2.shape[2:],
                        mode='bilinear',
                        align_corners=False)

        # Stage 2: x1/8 scale
        _c2_1 = self.linear_c2(c2_1).transpose([0, 2, 1]).reshape(
            [n, -1, c2_1.shape[2], c2_1.shape[3]])
        _c2_2 = self.linear_c2(c2_2).transpose([0, 2, 1]).reshape(
            [n, -1, c2_2.shape[2], c2_2.shape[3]])
        _c2 = self.diff_c2(pd.concat((_c2_1, _c2_2), axis=1)) + \
            F.interpolate(_c3, scale_factor=2, mode="bilinear")
        p_c2 = self.make_pred_c2(_c2)
        outputs.append(p_c2)
        _c2_up = resize(_c2,
                        size=c1_2.shape[2:],
                        mode='bilinear',
                        align_corners=False)

        # Stage 1: x1/4 scale
        _c1_1 = self.linear_c1(c1_1).transpose([0, 2, 1]).reshape(
            [n, -1, c1_1.shape[2], c1_1.shape[3]])
        _c1_2 = self.linear_c1(c1_2).transpose([0, 2, 1]).reshape(
            [n, -1, c1_2.shape[2], c1_2.shape[3]])
        _c1 = self.diff_c1(pd.concat((_c1_1, _c1_2), axis=1)) + \
            F.interpolate(_c2, scale_factor=2, mode="bilinear")
        p_c1 = self.make_pred_c1(_c1)
        outputs.append(p_c1)

        # Linear Fusion of difference image from all scales
        _c = self.linear_fuse(pd.concat((_c4_up, _c3_up, _c2_up, _c1), axis=1))

        # #Dropout
        # if dropout_ratio > 0:
        #     self.dropout = nn.Dropout2d(dropout_ratio)
        # else:
        #     self.dropout = None

        # Upsampling x2 (x1/2 scale)
        x = self.convd2x(_c)
        # Residual block
        x = self.dense_2x(x)
        # Upsampling x2 (x1 scale)
        x = self.convd1x(x)
        # Residual block
        x = self.dense_1x(x)

        # Final prediction
        cp = self.change_probability(x)

        outputs.append(cp)

        if self.output_softmax:
            temp = outputs
            outputs = []
            for pred in temp:
                outputs.append(self.active(pred))

        return outputs
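The MLP helper used by linear_c1 through linear_c4 is not shown above. Judging from how it is called (its output is transposed to [B, C, N] and reshaped back into a feature map), a minimal sketch consistent with the decoder code is a per-pixel linear projection like the one below; the actual class in the repo may differ in detail.

# Minimal sketch of the per-pixel projection assumed by linear_c1..linear_c4
import paddle.nn as nn

class MLP(nn.Layer):
    def __init__(self, input_dim=2048, embed_dim=768):
        super().__init__()
        self.proj = nn.Linear(input_dim, embed_dim)

    def forward(self, x):                      # x: [B, C, H, W]
        x = x.flatten(2).transpose([0, 2, 1])  # -> [B, H*W, C]
        return self.proj(x)                    # -> [B, H*W, embed_dim]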

(3) Reproduction Accuracy

The results on the LEVIR-CD test set are shown below; the reproduction passed acceptance.

Network        Optimizer   Epochs   Batch Size   Dataset    F1-Score
ChangeFormer   AdamW       200      16           LEVIR-CD   90.347%

(4) Environment and Data Preparation

  • Clone the repository:
%cd work
!git clone https://github.com/HULEIYI/ChangeFormer-pd.git

# Note: the code is already placed under the work directory with paths configured; using it directly is recommended.
Cloning into 'ChangeFormer-pd'...
remote: Enumerating objects: 182, done.
remote: Counting objects: 100% (182/182), done.
remote: Compressing objects: 100% (156/156), done.
remote: Total 182 (delta 26), reused 168 (delta 19), pack-reused 0
Receiving objects: 100% (182/182), 32.78 MiB | 27.00 KiB/s, done.
Resolving deltas: 100% (26/26), done.
Checking connectivity... done.
  • Unzip the data and pretrained weights:
!unzip -qo data/data161372/Data-and-PreWeight.zip -d data/data161372
  • Unzip the trained checkpoints:
!unzip -qo data/data162790/checkpoints.zip -d data/data162790
!unzip -qo data/data162790/pretrained_changeformer.zip -d data/data162790
%mv data/data162790/checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256/best_ckpt.pdparam data/data162790/checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256/best_ckpt.pdparams
%mv data/data162790/pretrained_changeformer/pretrained_changeformer.pdparam data/data162790/pretrained_changeformer/pretrained_changeformer.pdparams
  • Install dependencies:
!pip install tifffile
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting tifffile
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d8/38/85ae5ed77598ca90558c17a2f79ddaba33173b31cf8d8f545d34d9134f0d/tifffile-2021.11.2-py3-none-any.whl (178 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.9/178.9 kB 3.3 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.15.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from tifffile) (1.19.5)
Installing collected packages: tifffile
Successfully installed tifffile-2021.11.2

[notice] A new release of pip available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip

(5) Quick Start

  • Model training:
    • Run the run_ChangeFormer_LEVIR.sh script to start training; the hyperparameters in the script are already aligned with the original paper.
%cd work/ChangeFormer-pd/scripts
/home/aistudio/work/ChangeFormer-pd/scripts
!sh run_ChangeFormer_LEVIR.sh

# Note:
#     If you get an error that pretrained_changeformer.pdparams does not exist, rename pretrained_changeformer.pdparam to pretrained_changeformer.pdparams in the corresponding path.
#     This error comes from a naming mistake when the dataset was uploaded and does not affect training.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
['gpu:0']
W0817 16:58:56.233995   960 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 16:58:56.238157   960 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
initialize network with normal
================ (Wed Aug 17 16:58:57 2022) ================
gpu_ids: ['gpu:0'] project_name: CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 checkpoint_root: ./checkpoints vis_root: ./vis num_workers: 0 dataset: CDDataset data_name: LEVIR batch_size: 16 split: trainval split_val: test img_size: 256 shuffle_AB: False n_class: 2 embed_dim: 256 pretrain: ../../data/data162790/pretrained_changeformer/pretrained_changeformer.pdparam multi_scale_train: True multi_scale_infer: False multi_pred_weights: [0.5, 0.5, 0.5, 0.8, 1.0] net_G: ChangeFormerV6 loss: ce optimizer: adamw lr: 0.0001 max_epochs: 200 lr_policy: linear lr_decay_iters: 100 checkpoint_dir: ./checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 vis_dir: ./vis/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 

Initializing backbone weights from: ../../data/data162790/pretrained_changeformer/pretrained_changeformer.pdparam


lr: 0.0001000
 
  0%|                                                   | 0/509 [00:00<?, ?it/s]/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:654: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.")
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.bool, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
  0%|                                           | 1/509 [00:03<31:52,  3.77s/it]Is_training: True. [0,199][1,509], imps: 92.96, est: 77.87h, G_loss: 1.34087, running_mf1: 0.52390
  8%|███▌                                      | 43/509 [01:10<12:27,  1.60s/it]^C
  • Model evaluation:
    • The trained weights are under data/data162790/checkpoints and can be inspected there.
    • Run the provided eval_ChangeFormer_LEVIR.sh script to try it.
!sh eval_ChangeFormer_LEVIR.sh
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
[]
W0817 17:00:43.274778  1190 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 17:00:43.278940  1190 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
initialize network with normal
gpu:0
================ (Wed Aug 17 17:00:44 2022) ================
gpu_ids: [] project_name: CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 print_models: False checkpoints_root: ../../data/data162790/checkpoints vis_root: ./vis num_workers: 8 dataset: CDDataset data_name: LEVIR batch_size: 1 split: test img_size: 256 n_class: 2 embed_dim: 256 net_G: ChangeFormerV6 checkpoint_name: best_ckpt.pdparam checkpoint_dir: ../../data/data162790/checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 vis_dir: ./vis/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 loading last checkpoint...
Eval Historical_best_acc = 0.9492 (at epoch 183)

Begin evaluation...
Is_training: False. [1,2048],  running_mf1: 0.96424
Is_training: False. [101,2048],  running_mf1: 0.50000
Is_training: False. [201,2048],  running_mf1: 0.96965
Is_training: False. [301,2048],  running_mf1: 0.98952
Is_training: False. [401,2048],  running_mf1: 0.50000
Is_training: False. [501,2048],  running_mf1: 0.49922
Is_training: False. [601,2048],  running_mf1: 0.50000
Is_training: False. [701,2048],  running_mf1: 0.88527
Is_training: False. [801,2048],  running_mf1: 0.93557
Is_training: False. [901,2048],  running_mf1: 0.89550
Is_training: False. [1001,2048],  running_mf1: 0.97242
Is_training: False. [1101,2048],  running_mf1: 0.92867
Is_training: False. [1201,2048],  running_mf1: 0.49982
Is_training: False. [1301,2048],  running_mf1: 0.95010
Is_training: False. [1401,2048],  running_mf1: 0.50000
Is_training: False. [1501,2048],  running_mf1: 0.96817
Is_training: False. [1601,2048],  running_mf1: 0.97160
Is_training: False. [1701,2048],  running_mf1: 0.49924
Is_training: False. [1801,2048],  running_mf1: 0.50000
Is_training: False. [1901,2048],  running_mf1: 0.49871
Is_training: False. [2001,2048],  running_mf1: 0.96150
acc: 0.99034 miou: 0.90691 mf1: 0.94919 iou_0: 0.98988 iou_1: 0.82394 F1_0: 0.99491 F1_1: 0.90347 precision_0: 0.99399 precision_1: 0.91975 recall_0: 0.99584 recall_1: 0.88777 
  • Model prediction:
    • Load the trained weights and run the model for prediction.
    • Use demo_LEVIR.py
      • The predicted images are written to ChangeFormer-pd/samples_LEVIR/predict_CD_ChangeFormerV6.
%cd ..
/home/aistudio/work/ChangeFormer-pd
!python demo_LEVIR.py
W0817 18:22:09.385291  4511 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 18:22:09.389484  4511 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
initialize network with normal
gpu:0
best_acc:  0.9491929514314832
process: ['test_77_0512_0256.png']
process: ['test_102_0512_0000.png']
process: ['test_121_0768_0256.png']
process: ['test_2_0000_0000.png']
process: ['test_2_0000_0512.png']
process: ['test_7_0256_0512.png']
process: ['test_55_0256_0000.png']
  • Model export:
    • The model_path argument is the path to the weights file with the .pdparams suffix.
    • The save_inference_dir argument is the folder where the static graph model is saved.
!python export_model.py --model_path ../../data/data162790/checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256/best_ckpt.pdparams --save_inference_dir ./inference/
W0817 18:27:48.933506  5191 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 18:27:48.937757  5191 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
  • Static graph inference:
    • Use infer.py for inference (a general sketch of the underlying paddle.inference pattern follows below).
    • The model_dir argument is the folder containing the exported static graph model.
    • The img_dir argument is the folder of images to predict; it must contain two subfolders, A and B.
!python infer.py --model_dir ./inference/ --img_dir ./samples_LEVIR/
total file number is 11
finish
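For reference, static-graph inference in Paddle generally follows the paddle.inference API: load the exported model and params files, create a predictor, copy the two input images into the input handles, and read the predicted scores back. The sketch below shows only that general pattern; the file names and input order are assumptions, so the repo's infer.py remains the authoritative reference.

# General paddle.inference pattern (file names and input order are assumptions)
import numpy as np
from paddle.inference import Config, create_predictor

config = Config("./inference/model.pdmodel", "./inference/model.pdiparams")
predictor = create_predictor(config)

img_a = np.random.rand(1, 3, 256, 256).astype("float32")  # stand-in for an image from A/
img_b = np.random.rand(1, 3, 256, 256).astype("float32")  # stand-in for an image from B/
for name, arr in zip(predictor.get_input_names(), [img_a, img_b]):
    handle = predictor.get_input_handle(name)
    handle.reshape(arr.shape)
    handle.copy_from_cpu(arr)

predictor.run()
out = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(out.shape)  # per-pixel change scores; argmax over the class axis gives the CD mask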

(6) TIPC Basic Chain Test

This part depends on auto_log, which needs to be installed as follows:

For a detailed introduction to auto_log, see https://github.com/LDOUBLEV/AutoLog

#%cd ~
%cd /home/aistudio/work/ChangeFormer-pd
!git clone https://github.com/LDOUBLEV/AutoLog
%cd ./AutoLog/
!pip3 install -r requirements.txt
!python3 setup.py bdist_wheel
!pip3 install ./dist/auto_log-1.2.0-py3-none-any.whl
%cd ..

## This step may fail due to network issues.
  • Run the command to prepare a small batch of data:
%cd work/ChangeFormer-pd/
!bash ./test_tipc/prepare.sh test_tipc/configs/ChangeFormer/train_infer_python.txt 'lite_train_lite_infer'
  • Run the command for small-batch training, export, and inference in one go:
!bash test_tipc/test_train_inference_python.sh test_tipc/configs/ChangeFormer/train_infer_python.txt 'lite_train_lite_infer'

(7) Project Summary

  • Based on the original paper and the paper-notes blog post 【论文笔记】A Transformer-based Siamese network for change detection, this project gives a fairly detailed walkthrough of ChangeFormer.
  • The project provides the Paddle-based reproduction repository and quick-start commands for training, testing, prediction, and ChangeFormer model export.
  • As for the model code, since the Paddle and PyTorch frameworks are architecturally similar, converting the code is not complicated; see the《论文复现赛指南》, which lists many torch-to-paddle API correspondences that can be used directly and suggests workarounds built from existing ops for APIs Paddle does not provide (a small example follows below).
    • For torch APIs with no ready-made mapping, my tip is to read the torch source code and rewrite the corresponding functionality in Paddle style; that usually works.
    • For problems you cannot solve yourself, you can also ask in the Paddle community; the RD assigned to the reproduction competition will help resolve many issues. Kudos to Paddle for this! 👍
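To give a flavour of these conversions, here are a few torch-to-paddle correspondences of the kind that come up in code like ChangeFormer's (illustrative, not exhaustive):

# A few torch -> paddle correspondences (illustrative, not exhaustive):
#   torch.nn.Module          -> paddle.nn.Layer
#   torch.nn.ModuleList      -> paddle.nn.LayerList
#   torch.cat(xs, dim=1)     -> paddle.concat(xs, axis=1)
#   tensor.permute(0, 2, 1)  -> tensor.transpose([0, 2, 1])
import paddle

a = paddle.ones([1, 2, 4, 4])
b = paddle.zeros([1, 2, 4, 4])
c = paddle.concat([a, b], axis=1)   # torch: torch.cat([a, b], dim=1)
d = c.transpose([0, 2, 3, 1])       # torch: c.permute(0, 2, 3, 1)
print(c.shape, d.shape)             # [1, 4, 4, 4] [1, 4, 4, 2]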

(8) Acknowledgements

  • Thanks to Baidu PaddlePaddle for providing the paper reproduction competition platform; competing for prizes also improved my deep learning coding skills and deepened my understanding of my research direction.
  • Special thanks to 晖哥, the RD of the remote sensing reproduction group, for help with the task code, and to 芋泥啵啵姐 for the compute support!
  • Last of all, thanks to 孔远杭 (KKKloveqbh) for the great help!

This article is a repost.
Link to the original project
