[Paper Reproduction] A TRANSFORMER-BASED SIAMESE NETWORK FOR CHANGE DETECTION

Index Terms:
Change detection, transformer, Siamese network, attention mechanism, multilayer perceptron, remote sensing.

(1) Abstract

1.1 Method Overview

ChangeFormer is a Transformer-based Siamese network architecture for extracting changed regions (Change Detection, CD) from a pair of remote sensing images taken at two different times. Unlike fully convolutional network (ConvNet) based CD frameworks, ChangeFormer unifies a hierarchical Transformer encoder with a multilayer perceptron (MLP) decoder in a Siamese architecture to efficiently capture the multi-scale, long-range details required for accurate CD. Experiments on two CD datasets (the Paddle version was trained and tested only on LEVIR-CD) show that the end-to-end trainable ChangeFormer architecture achieves better CD performance than previous methods. The original code and pretrained models can be found via the reference links above.

1.2 Model Overview

Note: the following analysis of ChangeFormer draws on the CSDN blog post 【论文笔记】A Transformer-based Siamese network for change detection.

  • ChangeFormer's main contribution is applying Transformer modules throughout the change detection (CD) network, rather than building a CNN+Transformer hybrid, demonstrating that a pure Transformer network with an MLP head can also reach SOTA performance. The overall architecture can be divided into three modules:

    • hierarchical transformer encoder: forms the Siamese Transformer network for feature extraction and encoding;
    • difference modules: four difference modules that compute multi-scale feature differences;
    • lightweight MLP decoder: a lightweight MLP decoder that fuses the multi-scale features and predicts the CD mask.
  • Network pipeline (a minimal code sketch is given after the figure below):

    • A pair of change-detection images is fed in, and the Siamese Transformer network extracts multi-level features from the pre- and post-change images;
    • the difference modules take the four groups of multi-scale features produced by different encoder stages and concatenate and fuse them;
    • the multi-level features from the previous step are aggregated by the MLP decoder to produce the change-detection mask.

[Figure: ChangeFormer overall architecture]
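To make the pipeline concrete, here is a minimal sketch of the forward pass in Paddle. It reuses the EncoderTransformer_v3 and DecoderTransformer_v3 classes shown later in this article; the wrapper class name and constructor arguments are illustrative and are not the repository's actual ChangeFormerV6 definition.

import paddle
import paddle.nn as nn

class ChangeFormerSketch(nn.Layer):
    # Illustrative wrapper, not the repository's ChangeFormerV6 class.
    def __init__(self):
        super().__init__()
        # A single Siamese encoder: the same weights process both time steps.
        self.encoder = EncoderTransformer_v3(img_size=256, embed_dims=[32, 64, 128, 256])
        # The difference modules and the MLP decoder live inside the decoder class.
        self.decoder = DecoderTransformer_v3(in_channels=[32, 64, 128, 256],
                                             embedding_dim=64, output_nc=2)

    def forward(self, img_a, img_b):
        feats_a = self.encoder(img_a)  # 4 feature maps at 1/4, 1/8, 1/16, 1/32 scale
        feats_b = self.encoder(img_b)  # same encoder instance, shared parameters
        # Multi-scale predictions; the last element is the full-resolution CD mask logits.
        return self.decoder(feats_a, feats_b)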

(2) Model Architecture

In just two years since ViT burst onto the scene in 2020, Vision Transformers have become ubiquitous and even look set to unify computer vision. ViT, Swin Transformer, and other Transformer variants now appear on the leaderboards of virtually every major CV task.

Transformers shine in both NLP and CV mainly because of their global attention and their ability to model long-range dependencies. For an introduction to Transformers, and vision Transformers in particular, see the AI Studio course 从零开始学视觉Transformer.

Although Transformers are widely used and highly effective on natural-image tasks, their adoption in remote sensing has lagged behind. In the change-detection field addressed by this paper, BiT introduced Transformers to CD but still kept a CNN backbone (ResNet18) (for an introduction to and hands-on experience with BiT, see 【第六期论文复现赛-变化检测】Remote Sensing Image Change Detection with Transformers). ChangeFormer, the paper reproduced here, is the first to use a pure Transformer architecture as the model backbone. Let us take a closer look at what the Transformer brings to ChangeFormer.

The overall architecture has already been described in Section (1) and is not repeated here; the three modules are explained below, together with their code.

2.1 hierarchical transformer encoder

  • The hierarchical transformer encoder is the basic module of the ChangeFormer encoder part, used to build the Siamese feature-extraction network.

    • In short, a Siamese network is a backbone framework in which a pair of images is processed by networks with shared parameters to accomplish some task. For an introduction to Siamese networks, see 【第六期论文复现赛-变化检测】SNUNet-CD.
  • The Transformer blocks in ChangeFormer's hierarchical transformer encoder are essentially the classic ViT blocks; the main modification is the Downsample module inserted between the stacked Transformer stages.

    • The Downsample module's main function is to downsample the patches, mimicking the CNN design to obtain multi-scale image features.
      [Figure: hierarchical transformer encoder]
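The encoder code in the next block builds on an OverlapPatchEmbed patch-embedding module that performs this per-stage downsampling but is not reproduced in this article. Below is a minimal sketch of what it does, assuming the SegFormer-style design (a strided, overlapping convolution followed by LayerNorm); the repository implementation may differ in details.

import paddle
import paddle.nn as nn

class OverlapPatchEmbed(nn.Layer):
    # Minimal sketch only; assumption: SegFormer-style overlapping patch embedding.
    def __init__(self, img_size=256, patch_size=7, stride=4, in_chans=3, embed_dim=64):
        super().__init__()
        # Strided convolution over overlapping patches: downsamples by `stride`.
        self.proj = nn.Conv2D(in_chans, embed_dim, kernel_size=patch_size,
                              stride=stride, padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                       # [B, embed_dim, H', W']
        _, _, H, W = x.shape
        x = x.flatten(2).transpose([0, 2, 1])  # [B, H'*W', embed_dim] token sequence
        x = self.norm(x)
        return x, H, W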
  • Encoder code (a short usage sketch follows the class definition):

# Transformer Encoder producing features at x1/4, x1/8, x1/16, x1/32 scales
class EncoderTransformer_v3(nn.Layer):
    def __init__(
        self,
        img_size=256,
        patch_size=3,
        in_chans=3,
        num_classes=2,
        embed_dims=[32,64,128,256],
        num_heads=[2,2,4,8],
        mlp_ratios=[4,4,4,4],
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.,
        norm_layer=nn.LayerNorm,
        depths=[3,3,6,18],
        sr_ratios=[8,4,2,1]):
        super().__init__()
        self.num_classes = num_classes
        self.depths = depths
        self.embed_dims = embed_dims

        # patch embedding definitions
        self.patch_embed1 = OverlapPatchEmbed(
            img_size=img_size,
            patch_size=7,
            stride=4,
            in_chans=in_chans,
            embed_dim=embed_dims[0])
        self.patch_embed2 = OverlapPatchEmbed(
            img_size=img_size // 4,
            patch_size=patch_size,
            stride=2,
            in_chans=embed_dims[0],
            embed_dim=embed_dims[1])
        self.patch_embed3 = OverlapPatchEmbed(
            img_size=img_size // 8,
            patch_size=patch_size,
            stride=2,
            in_chans=embed_dims[1],
            embed_dim=embed_dims[2])
        self.patch_embed4 = OverlapPatchEmbed(
            img_size=img_size // 16,
            patch_size=patch_size,
            stride=2,
            in_chans=embed_dims[2],
            embed_dim=embed_dims[3])

        # Stage-1 (x1/4 scale)
        dpr = [x.item() for x in pd.linspace(0, drop_path_rate, sum(depths))]
        cur = 0
        self.block1 = nn.LayerList([Block(dim=embed_dims[0],
                                          num_heads=num_heads[0],
                                          mlp_ratio=mlp_ratios[0],
                                          qkv_bias=qkv_bias,
                                          qk_scale=qk_scale,
                                          drop=drop_rate,
                                          attn_drop=attn_drop_rate,
                                          drop_path=dpr[cur + i],
                                          norm_layer=norm_layer,
                                          sr_ratio=sr_ratios[0]) for i in range(depths[0])])
        self.norm1 = norm_layer(embed_dims[0])

        # Stage-2 (x1/8 scale)
        cur += depths[0]
        self.block2 = nn.LayerList([Block(dim=embed_dims[1],
                                          num_heads=num_heads[1],
                                          mlp_ratio=mlp_ratios[1],
                                          qkv_bias=qkv_bias,
                                          qk_scale=qk_scale,
                                          drop=drop_rate,
                                          attn_drop=attn_drop_rate,
                                          drop_path=dpr[cur + i],
                                          norm_layer=norm_layer,
                                          sr_ratio=sr_ratios[1]) for i in range(depths[1])])
        self.norm2 = norm_layer(embed_dims[1])

        # Stage-3 (x1/16 scale)
        cur += depths[1]
        self.block3 = nn.LayerList([Block(dim=embed_dims[2],
                                          num_heads=num_heads[2],
                                          mlp_ratio=mlp_ratios[2],
                                          qkv_bias=qkv_bias,
                                          qk_scale=qk_scale,
                                          drop=drop_rate,
                                          attn_drop=attn_drop_rate,
                                          drop_path=dpr[cur + i],
                                          norm_layer=norm_layer,
                                          sr_ratio=sr_ratios[2]) for i in range(depths[2])])
        self.norm3 = norm_layer(embed_dims[2])

        # Stage-4 (x1/32 scale)
        cur += depths[2]
        self.block4 = nn.LayerList([Block(dim=embed_dims[3],
                                          num_heads=num_heads[3],
                                          mlp_ratio=mlp_ratios[3],
                                          qkv_bias=qkv_bias,
                                          qk_scale=qk_scale,
                                          drop=drop_rate,
                                          attn_drop=attn_drop_rate,
                                          drop_path=dpr[cur + i],
                                          norm_layer=norm_layer,
                                          sr_ratio=sr_ratios[3]) for i in range(depths[3])])
        self.norm4 = norm_layer(embed_dims[3])

        self.apply(self._init_weights)

        # for ent in self.parameters():
        #     self._init_weights(ent)

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_op = nn.initializer.TruncatedNormal(std=.02)
            trunc_normal_op(m.weight)
            # trunc_normal_(m.weight, std=.02)
            if isinstance(m, nn.Linear) and m.bias is not None:
                init_bias = nn.initializer.Constant(0)
                init_bias(m.bias)
                # nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.LayerNorm):
            init_bias = nn.initializer.Constant(0)
            init_bias(m.bias)
            # nn.init.constant_(m.bias, 0)
            init_weight = nn.initializer.Constant(1.0)
            init_weight(m.weight)
            # nn.init.constant_(m.weight, 1.0)
        elif isinstance(m, nn.Conv2D):
            fan_out = m._kernel_size[0] * m._kernel_size[1] * m._out_channels
            fan_out //= m._groups
            init_weight = nn.initializer.Normal(0, math.sqrt(2.0 / fan_out))
            init_weight(m.weight)
            # m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))
            if m.bias is not None:
                init_bias = nn.initializer.Constant(0)
                init_bias(m.bias)
                # m.bias.data.zero_()

    def reset_drop_path(self, drop_path_rate):
        dpr = [
            x.item() for x in pd.linspace(
                0, drop_path_rate, sum(
                    self.depths))]
        cur = 0
        for i in range(self.depths[0]):
            self.block1[i].drop_path.drop_prob = dpr[cur + i]

        cur += self.depths[0]
        for i in range(self.depths[1]):
            self.block2[i].drop_path.drop_prob = dpr[cur + i]

        cur += self.depths[1]
        for i in range(self.depths[2]):
            self.block3[i].drop_path.drop_prob = dpr[cur + i]

        cur += self.depths[2]
        for i in range(self.depths[3]):
            self.block4[i].drop_path.drop_prob = dpr[cur + i]

    def forward_features(self, x):
        # print(x)
        B = x.shape[0]
        outs = []

        # stage 1
        x1, H1, W1 = self.patch_embed1(x)
        # print(x1,H1,W1)
        for i, blk in enumerate(self.block1):
            x1 = blk(x1, H1, W1)
        # print(x1)
        x1 = self.norm1(x1)
        x1 = x1.reshape([B, H1, W1, -1]).transpose([0, 3, 1, 2])
        # print(x1)
        outs.append(x1)

        # stage 2
        x1, H1, W1 = self.patch_embed2(x1)
        for i, blk in enumerate(self.block2):
            x1 = blk(x1, H1, W1)
        x1 = self.norm2(x1)
        x1 = x1.reshape([B, H1, W1, -1]).transpose([0, 3, 1, 2])
        outs.append(x1)

        # stage 3
        x1, H1, W1 = self.patch_embed3(x1)
        for i, blk in enumerate(self.block3):
            x1 = blk(x1, H1, W1)
        x1 = self.norm3(x1)
        x1 = x1.reshape([B, H1, W1, -1]).transpose([0, 3, 1, 2])
        outs.append(x1)

        # stage 4
        x1, H1, W1 = self.patch_embed4(x1)
        for i, blk in enumerate(self.block4):
            x1 = blk(x1, H1, W1)
        x1 = self.norm4(x1)
        x1 = x1.reshape([B, H1, W1, -1]).transpose([0, 3, 1, 2])
        outs.append(x1)
        return outs

    def forward(self, x):
        x = self.forward_features(x)
        return x
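A brief usage sketch (illustrative, assuming the Block and OverlapPatchEmbed layers used above are defined): the same encoder instance is applied to both time steps, which is exactly the weight sharing of the Siamese design, and the expected output shapes follow from the 4/2/2/2 strides of the four patch embeddings.

import paddle

encoder = EncoderTransformer_v3(img_size=256, embed_dims=[32, 64, 128, 256])

img_a = paddle.randn([1, 3, 256, 256])  # pre-change image
img_b = paddle.randn([1, 3, 256, 256])  # post-change image

feats_a = encoder(img_a)  # shared weights:
feats_b = encoder(img_b)  # same encoder instance for both images

for f in feats_a:
    print(f.shape)
# Expected: [1, 32, 64, 64], [1, 64, 32, 32], [1, 128, 16, 16], [1, 256, 8, 8]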

2.2 difference module

  • The difference module's main job is to concatenate the bi-temporal features at each scale and output their difference.

    • The basic process is concatenation followed by convolution (an illustrative usage sketch follows the code below).
    • Note that this operation does not directly compute the absolute difference between the feature maps; instead, the optimal distance metric at each scale is learned during training.
  • difference module code:

# Difference Layer
def conv_diff(in_channels, out_channels):
    return nn.Sequential(
        nn.Conv2D(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2D(out_channels),
        nn.Conv2D(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU()
    )
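An illustrative use of conv_diff (the tensor names and shapes are for demonstration only): the bi-temporal features of one scale are concatenated along the channel axis, and the two stacked convolutions learn a distance metric for that scale rather than taking an absolute difference.

import paddle

feat_a = paddle.randn([1, 64, 64, 64])  # time-1 feature map, C=64
feat_b = paddle.randn([1, 64, 64, 64])  # time-2 feature map, C=64

diff_layer = conv_diff(in_channels=2 * 64, out_channels=64)
diff = diff_layer(paddle.concat([feat_a, feat_b], axis=1))
print(diff.shape)  # [1, 64, 64, 64]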

2.3 MLP decoder

  • The MLP decoder works in three steps:

    • MLP & Upsampling
      • An MLP layer unifies the channel dimensions of the different feature maps, which are then upsampled to H/4 × W/4 (a sketch of this MLP projection module is given after the decoder code below).
    • Concatenation & Fusion
      • The upsampled feature maps are concatenated and then fused by another MLP layer.
    • Upsampling & Classification
      • A further upsampling restores the H × W resolution, and a final MLP layer produces the CD mask.
  • MLP decoder code:

class DecoderTransformer_v3(nn.Layer):
    """
    Transformer Decoder
    """

    def __init__(
        self,
        input_transform='multiple_select',
        in_index=[0,1,2,3],
        align_corners=True,
        in_channels=[32,64,128,256],
        embedding_dim=64,
        output_nc=2,
        decoder_softmax=False,
        feature_strides=[2,4,8,16]):
        super(DecoderTransformer_v3, self).__init__()
        # assert
        assert len(feature_strides) == len(in_channels)
        assert min(feature_strides) == feature_strides[0]

        # settings
        self.feature_strides = feature_strides
        self.input_transform = input_transform
        self.in_index = in_index
        self.align_corners = align_corners
        self.in_channels = in_channels
        self.embedding_dim = embedding_dim
        self.output_nc = output_nc
        c1_in_channels, c2_in_channels, c3_in_channels, c4_in_channels = self.in_channels

        # MLP decoder heads
        self.linear_c4 = MLP(
            input_dim=c4_in_channels,
            embed_dim=self.embedding_dim)
        self.linear_c3 = MLP(
            input_dim=c3_in_channels,
            embed_dim=self.embedding_dim)
        self.linear_c2 = MLP(
            input_dim=c2_in_channels,
            embed_dim=self.embedding_dim)
        self.linear_c1 = MLP(
            input_dim=c1_in_channels,
            embed_dim=self.embedding_dim)

        # convolutional Difference Layers
        self.diff_c4 = conv_diff(
            in_channels=2 * self.embedding_dim,
            out_channels=self.embedding_dim)
        self.diff_c3 = conv_diff(
            in_channels=2 * self.embedding_dim,
            out_channels=self.embedding_dim)
        self.diff_c2 = conv_diff(
            in_channels=2 * self.embedding_dim,
            out_channels=self.embedding_dim)
        self.diff_c1 = conv_diff(
            in_channels=2 * self.embedding_dim,
            out_channels=self.embedding_dim)

        # taking outputs from middle of the encoder
        self.make_pred_c4 = make_prediction(
            in_channels=self.embedding_dim,
            out_channels=self.output_nc)
        self.make_pred_c3 = make_prediction(
            in_channels=self.embedding_dim,
            out_channels=self.output_nc)
        self.make_pred_c2 = make_prediction(
            in_channels=self.embedding_dim,
            out_channels=self.output_nc)
        self.make_pred_c1 = make_prediction(
            in_channels=self.embedding_dim,
            out_channels=self.output_nc)

        # Final linear fusion layer
        self.linear_fuse = nn.Sequential(
            nn.Conv2D(
                in_channels=self.embedding_dim *
                len(in_channels),
                out_channels=self.embedding_dim,
                kernel_size=1),
            nn.BatchNorm2D(
                self.embedding_dim))

        # Final predction head
        self.convd2x = UpsampleConvLayer(
            self.embedding_dim,
            self.embedding_dim,
            kernel_size=4,
            stride=2)
        self.dense_2x = nn.Sequential(ResidualBlock(self.embedding_dim))
        self.convd1x = UpsampleConvLayer(
            self.embedding_dim,
            self.embedding_dim,
            kernel_size=4,
            stride=2)
        self.dense_1x = nn.Sequential(ResidualBlock(self.embedding_dim))
        self.change_probability = ConvLayer(
            self.embedding_dim,
            self.output_nc,
            kernel_size=3,
            stride=1,
            padding=1)

        # Final activation
        self.output_softmax = decoder_softmax
        self.active = nn.Sigmoid()

    def _transform_inputs(self, inputs):
        """Transform inputs for decoder.
        Args:
            inputs (list[Tensor]): List of multi-level img features.
        Returns:
            Tensor: The transformed inputs
        """

        if self.input_transform == 'resize_concat':
            inputs = [inputs[i] for i in self.in_index]
            upsampled_inputs = [
                resize(
                    input=x,
                    size=inputs[0].shape[2:],
                    mode='bilinear',
                    align_corners=self.align_corners) for x in inputs
            ]
            inputs = pd.concat(upsampled_inputs, axis=1)
        elif self.input_transform == 'multiple_select':
            inputs = [inputs[i] for i in self.in_index]
        else:
            inputs = inputs[self.in_index]

        return inputs

    def forward(self, inputs1, inputs2):
        # Transforming encoder features (select layers)
        x_1 = self._transform_inputs(inputs1)  # len=4: 1/4, 1/8, 1/16, 1/32 scales
        x_2 = self._transform_inputs(inputs2)  # len=4: 1/4, 1/8, 1/16, 1/32 scales

        # img1 and img2 features
        c1_1, c2_1, c3_1, c4_1 = x_1
        c1_2, c2_2, c3_2, c4_2 = x_2

        ############## MLP decoder on C1-C4 ###########
        n, _, h, w = c4_1.shape

        outputs = []
        # Stage 4: x1/32 scale
        _c4_1 = self.linear_c4(c4_1).transpose([0, 2, 1]).reshape(
            [n, -1, c4_1.shape[2], c4_1.shape[3]])
        _c4_2 = self.linear_c4(c4_2).transpose([0, 2, 1]).reshape(
            [n, -1, c4_2.shape[2], c4_2.shape[3]])
        _c4 = self.diff_c4(pd.concat((_c4_1, _c4_2), axis=1))
        p_c4 = self.make_pred_c4(_c4)
        outputs.append(p_c4)
        _c4_up = resize(_c4,
                        size=c1_2.shape[2:],
                        mode='bilinear',
                        align_corners=False)

        # Stage 3: x1/16 scale
        _c3_1 = self.linear_c3(c3_1).transpose([0, 2, 1]).reshape(
            [n, -1, c3_1.shape[2], c3_1.shape[3]])
        _c3_2 = self.linear_c3(c3_2).transpose([0, 2, 1]).reshape(
            [n, -1, c3_2.shape[2], c3_2.shape[3]])
        _c3 = self.diff_c3(pd.concat((_c3_1, _c3_2), axis=1)) + \
            F.interpolate(_c4, scale_factor=2, mode="bilinear")
        p_c3 = self.make_pred_c3(_c3)
        outputs.append(p_c3)
        _c3_up = resize(_c3,
                        size=c1_2.shape[2:],
                        mode='bilinear',
                        align_corners=False)

        # Stage 2: x1/8 scale
        _c2_1 = self.linear_c2(c2_1).transpose([0, 2, 1]).reshape(
            [n, -1, c2_1.shape[2], c2_1.shape[3]])
        _c2_2 = self.linear_c2(c2_2).transpose([0, 2, 1]).reshape(
            [n, -1, c2_2.shape[2], c2_2.shape[3]])
        _c2 = self.diff_c2(pd.concat((_c2_1, _c2_2), axis=1)) + \
            F.interpolate(_c3, scale_factor=2, mode="bilinear")
        p_c2 = self.make_pred_c2(_c2)
        outputs.append(p_c2)
        _c2_up = resize(_c2,
                        size=c1_2.shape[2:],
                        mode='bilinear',
                        align_corners=False)

        # Stage 1: x1/4 scale
        _c1_1 = self.linear_c1(c1_1).transpose([0, 2, 1]).reshape(
            [n, -1, c1_1.shape[2], c1_1.shape[3]])
        _c1_2 = self.linear_c1(c1_2).transpose([0, 2, 1]).reshape(
            [n, -1, c1_2.shape[2], c1_2.shape[3]])
        _c1 = self.diff_c1(pd.concat((_c1_1, _c1_2), axis=1)) + \
            F.interpolate(_c2, scale_factor=2, mode="bilinear")
        p_c1 = self.make_pred_c1(_c1)
        outputs.append(p_c1)

        # Linear Fusion of difference image from all scales
        _c = self.linear_fuse(pd.concat((_c4_up, _c3_up, _c2_up, _c1), axis=1))

        # #Dropout
        # if dropout_ratio > 0:
        #     self.dropout = nn.Dropout2d(dropout_ratio)
        # else:
        #     self.dropout = None

        # Upsampling x2 (x1/2 scale)
        x = self.convd2x(_c)
        # Residual block
        x = self.dense_2x(x)
        # Upsampling x2 (x1 scale)
        x = self.convd1x(x)
        # Residual block
        x = self.dense_1x(x)

        # Final prediction
        cp = self.change_probability(x)

        outputs.append(cp)

        if self.output_softmax:
            temp = outputs
            outputs = []
            for pred in temp:
                outputs.append(self.active(pred))

        return outputs
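The decoder above relies on a per-scale MLP projection (linear_c1 through linear_c4) whose class is not reproduced in this article. Below is a minimal sketch consistent with how it is called (flatten the feature map into tokens, then a single Linear layer unifies the channel dimension); the repository implementation may differ in details.

import paddle
import paddle.nn as nn

class MLP(nn.Layer):
    # Minimal sketch of the per-scale projection used by linear_c1..linear_c4.
    def __init__(self, input_dim=2048, embed_dim=768):
        super().__init__()
        self.proj = nn.Linear(input_dim, embed_dim)

    def forward(self, x):
        # x: [B, C, H, W] -> [B, H*W, C] -> [B, H*W, embed_dim]
        x = x.flatten(2).transpose([0, 2, 1])
        return self.proj(x)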

(3) Reproduction Accuracy

Results on the LEVIR-CD test set are as follows; the reproduction passed acceptance.

Network        Optimizer   Epochs   Batch Size   Dataset    F1-Score
ChangeFormer   AdamW       200      16           LEVIR-CD   90.347%

(4) Environment and Data Preparation

  • Clone the repository:
%cd work
!git clone https://github.com/HULEIYI/ChangeFormer-pd.git

# Note: the code is already under the work directory with the paths configured; using it directly is recommended.
Cloning into 'ChangeFormer-pd'...
remote: Enumerating objects: 182, done.
remote: Counting objects: 100% (182/182), done.
remote: Compressing objects: 100% (156/156), done.
remote: Total 182 (delta 26), reused 168 (delta 19), pack-reused 0
Receiving objects: 100% (182/182), 32.78 MiB | 27.00 KiB/s, done.
Resolving deltas: 100% (26/26), done.
Checking connectivity... done.
  • Unzip the data and pretrained weights:
!unzip -qo data/data161372/Data-and-PreWeight.zip -d data/data161372
  • Unzip the trained weights:
!unzip -qo data/data162790/checkpoints.zip -d data/data162790
!unzip -qo data/data162790/pretrained_changeformer.zip -d data/data162790
%mv data/data162790/checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256/best_ckpt.pdparam data/data162790/checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256/best_ckpt.pdparams
%mv data/data162790/pretrained_changeformer/pretrained_changeformer.pdparam data/data162790/pretrained_changeformer/pretrained_changeformer.pdparams
  • Install dependencies:
!pip install tifffile
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting tifffile
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d8/38/85ae5ed77598ca90558c17a2f79ddaba33173b31cf8d8f545d34d9134f0d/tifffile-2021.11.2-py3-none-any.whl (178 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.9/178.9 kB 3.3 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.15.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from tifffile) (1.19.5)
Installing collected packages: tifffile
Successfully installed tifffile-2021.11.2

[notice] A new release of pip is available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip

(5) Quick Start

  • Model training:
    • Use the run_ChangeFormer_LEVIR.sh script to start training; the hyperparameters in the script are aligned with the original paper.
%cd work/ChangeFormer-pd/scripts
/home/aistudio/work/ChangeFormer-pd/scripts
!sh run_ChangeFormer_LEVIR.sh

# Note:
#     If an error reports that pretrained_changeformer.pdparams does not exist, rename pretrained_changeformer.pdparam to pretrained_changeformer.pdparams in the corresponding path.
#     This error comes from a naming mistake when the dataset was uploaded and does not affect training.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
['gpu:0']
W0817 16:58:56.233995   960 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 16:58:56.238157   960 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
initialize network with normal
================ (Wed Aug 17 16:58:57 2022) ================
gpu_ids: ['gpu:0'] project_name: CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 checkpoint_root: ./checkpoints vis_root: ./vis num_workers: 0 dataset: CDDataset data_name: LEVIR batch_size: 16 split: trainval split_val: test img_size: 256 shuffle_AB: False n_class: 2 embed_dim: 256 pretrain: ../../data/data162790/pretrained_changeformer/pretrained_changeformer.pdparam multi_scale_train: True multi_scale_infer: False multi_pred_weights: [0.5, 0.5, 0.5, 0.8, 1.0] net_G: ChangeFormerV6 loss: ce optimizer: adamw lr: 0.0001 max_epochs: 200 lr_policy: linear lr_decay_iters: 100 checkpoint_dir: ./checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 vis_dir: ./vis/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 

Initializing backbone weights from: ../../data/data162790/pretrained_changeformer/pretrained_changeformer.pdparam


lr: 0.0001000
 
  0%|                                                   | 0/509 [00:00<?, ?it/s]/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:654: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.")
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.bool, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
  0%|                                           | 1/509 [00:03<31:52,  3.77s/it]Is_training: True. [0,199][1,509], imps: 92.96, est: 77.87h, G_loss: 1.34087, running_mf1: 0.52390
  8%|███▌                                      | 43/509 [01:10<12:27,  1.60s/it]^C
  • Model evaluation:
    • The trained weight files are already under data/data162790/checkpoints; feel free to inspect them.
    • Use the provided eval_ChangeFormer_LEVIR.sh script.
!sh eval_ChangeFormer_LEVIR.sh
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
[]
W0817 17:00:43.274778  1190 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 17:00:43.278940  1190 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
initialize network with normal
gpu:0
================ (Wed Aug 17 17:00:44 2022) ================
gpu_ids: [] project_name: CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 print_models: False checkpoints_root: ../../data/data162790/checkpoints vis_root: ./vis num_workers: 8 dataset: CDDataset data_name: LEVIR batch_size: 1 split: test img_size: 256 n_class: 2 embed_dim: 256 net_G: ChangeFormerV6 checkpoint_name: best_ckpt.pdparam checkpoint_dir: ../../data/data162790/checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 vis_dir: ./vis/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256 loading last checkpoint...
Eval Historical_best_acc = 0.9492 (at epoch 183)

Begin evaluation...
Is_training: False. [1,2048],  running_mf1: 0.96424
Is_training: False. [101,2048],  running_mf1: 0.50000
Is_training: False. [201,2048],  running_mf1: 0.96965
Is_training: False. [301,2048],  running_mf1: 0.98952
Is_training: False. [401,2048],  running_mf1: 0.50000
Is_training: False. [501,2048],  running_mf1: 0.49922
Is_training: False. [601,2048],  running_mf1: 0.50000
Is_training: False. [701,2048],  running_mf1: 0.88527
Is_training: False. [801,2048],  running_mf1: 0.93557
Is_training: False. [901,2048],  running_mf1: 0.89550
Is_training: False. [1001,2048],  running_mf1: 0.97242
Is_training: False. [1101,2048],  running_mf1: 0.92867
Is_training: False. [1201,2048],  running_mf1: 0.49982
Is_training: False. [1301,2048],  running_mf1: 0.95010
Is_training: False. [1401,2048],  running_mf1: 0.50000
Is_training: False. [1501,2048],  running_mf1: 0.96817
Is_training: False. [1601,2048],  running_mf1: 0.97160
Is_training: False. [1701,2048],  running_mf1: 0.49924
Is_training: False. [1801,2048],  running_mf1: 0.50000
Is_training: False. [1901,2048],  running_mf1: 0.49871
Is_training: False. [2001,2048],  running_mf1: 0.96150
acc: 0.99034 miou: 0.90691 mf1: 0.94919 iou_0: 0.98988 iou_1: 0.82394 F1_0: 0.99491 F1_1: 0.90347 precision_0: 0.99399 precision_1: 0.91975 recall_0: 0.99584 recall_1: 0.88777 
  • Model prediction:
    • Load the trained weights and run prediction with the model.
    • Use demo_LEVIR.py:
      • The predicted images are written to ChangeFormer-pd/samples_LEVIR/predict_CD_ChangeFormerV6.
%cd ..
/home/aistudio/work/ChangeFormer-pd
!python demo_LEVIR.py
W0817 18:22:09.385291  4511 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 18:22:09.389484  4511 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
initialize network with normal
gpu:0
best_acc:  0.9491929514314832
process: ['test_77_0512_0256.png']
process: ['test_102_0512_0000.png']
process: ['test_121_0768_0256.png']
process: ['test_2_0000_0000.png']
process: ['test_2_0000_0512.png']
process: ['test_7_0256_0512.png']
process: ['test_55_0256_0000.png']
  • Model export:
    • The model_path argument is the path to the .pdparams weight file.
    • The save_inference_dir argument is the directory where the static-graph model will be saved.
!python export_model.py --model_path ../../data/data162790/checkpoints/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_trainval_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256/best_ckpt.pdparams --save_inference_dir ./inference/
W0817 18:27:48.933506  5191 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 18:27:48.937757  5191 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
  • Static-graph inference:
    • Use infer.py for inference.
    • The model_dir argument is the directory containing the exported static-graph model.
    • The img_dir argument is the directory of images to predict; it must contain two subfolders, A and B.
!python infer.py --model_dir ./inference/ --img_dir ./samples_LEVIR/
total file number is 11
finish

(6) TIPC Basic Chain Test

This part depends on auto_log, which needs to be installed as follows:

For a detailed introduction to auto_log, see https://github.com/LDOUBLEV/AutoLog

#%cd ~
%cd /home/aistudio/work/ChangeFormer-pd
!git clone https://github.com/LDOUBLEV/AutoLog
%cd ./AutoLog/
!pip3 install -r requirements.txt
!python3 setup.py bdist_wheel
!pip3 install ./dist/auto_log-1.2.0-py3-none-any.whl
%cd ..

## This step may fail due to network issues.
  • Run the command to prepare a small batch of data:
%cd work/ChangeFormer-pd/
!bash ./test_tipc/prepare.sh test_tipc/configs/ChangeFormer/train_infer_python.txt 'lite_train_lite_infer'
  • Run the command for small-batch training, export, and inference in one go:
!bash test_tipc/test_train_inference_python.sh test_tipc/configs/ChangeFormer/train_infer_python.txt 'lite_train_lite_infer'

(7) Project Summary

  • Based on the original paper and the 【论文笔记】A Transformer-based Siamese network for change detection blog notes, this project gives a fairly detailed walkthrough of ChangeFormer.
  • The project provides the Paddle-based reproduction repository, together with quick-start commands for training, testing, prediction, and ChangeFormer model export.
  • As for the model code, converting it is not complicated because the Paddle and PyTorch frameworks are structurally similar; see the《论文复现赛指南》, which provides many PyTorch-to-Paddle API mappings that can be used directly, as well as workarounds built from existing ops for APIs that Paddle does not implement.
    • For PyTorch APIs with no ready-made solution, my tip is to read the PyTorch source and rewrite the corresponding functionality in Paddle style, which usually works.
      • Also, for problems you cannot solve yourself, you can ask for help in the Paddle community; the RDs dedicated to the reproduction competition will help resolve many issues. Kudos to Paddle for that! 👍

(8) Acknowledgements

  • Thanks to Baidu PaddlePaddle for providing the paper reproduction competition platform; besides competing for prizes, it improved my deep learning coding skills and deepened my understanding of my research direction.
  • Special thanks to 晖哥, the RD of the remote-sensing reproduction group, for help with the task code, and to 芋泥啵啵姐 for the compute support!
  • And last but not least, thanks to 孔远杭 (KKKloveqbh) for the generous help!

This article is a repost.
Original project link
