A Simple Data Augmentation Example: The 11th "China Software Cup" Remote Sensing Track
1. Project Introduction
This project is based on the baseline for the change detection problem of Track A4 of the "China Software Cup" College Student Software Design Competition. It implements several custom data augmentations and explains how to use them, to help contestants in the resurrection round design augmentations of their own. Only basic augmentations are included here; the aim is merely to spark ideas so that you can achieve better results. One thing to make clear: the augmentations in this project are for teaching purposes only, and there is no guarantee that they will actually improve your score.

If this project helps you or gives you some inspiration, feel free to give it a like. Also, given my limited ability I cannot guarantee that everything below is correct; if you spot any mistake, please do point it out and I will fix it promptly.

If you run into problems while studying the project, or have any suggestions, you can also reach me through the comments or my QQ 1694666307. 😀 (I can only offer technical help; friends hoping I will tune hyperparameters for them should try on their own first.)

Thanks again to the teachers and fellow students of the organizing committee for their timely answers, which helped me solve many problems. Also, without Lin's baseline this project would not exist; many thanks to him.

Link to the official competition website

1.1 Project Outline
This project implements two parts, pre-training data augmentation and training-time data augmentation. All image transformations are performed with PaddleRS Transform classes.

Pre-training augmentation copies the original samples, transforms the copies and saves them; the originals and the augmented copies are then packed into the Dataset together.

Training-time augmentation happens inside the training loop: every time a sample is read for training it is transformed first, then fed into the model.

Details and implementation code follow below.

2. Data Preprocessing and Pre-training Data Augmentation
Here we first unpack the dataset and import the packages we need for later steps. It is recommended to run the cells below that install PaddleRS and unpack the dataset first, and then continue reading. Pre-training augmentation is implemented in this part.

In [73]

# Some installed libraries depend on this version of pyzmq, so reinstall it
!pip install pyzmq==18.1.1

# Install third-party libraries
!pip install scikit-image > /dev/null
!pip install matplotlib==3.4 > /dev/null

print("Updating PaddleRS")

# Install PaddleRS (the version cached on AI Studio)
!unzip -o -d /home/aistudio/data/ /home/aistudio/data/data135375/PaddleRS-develop.zip > /dev/null
!mv /home/aistudio/data/PaddleRS-develop /home/aistudio/data/PaddleRS
print("Unpacked, installing")
!pip install -e /home/aistudio/data/PaddleRS > /dev/null

# sys.path may not be refreshed in time, so update it manually
print("Updating sys.path")
import sys
sys.path.append('/home/aistudio/data/PaddleRS')
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: pyzmq==18.1.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (18.1.1)
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/opt/conda/envs/python35-paddle120-env/bin/python -m pip install --upgrade pip' command.
Updating PaddleRS
Unpacked, installing
Updating sys.path
In [74]

# Import the libraries we will use

print("Importing libraries")
import random
import os
import os.path as osp
from copy import deepcopy
from functools import partial

import cv2
import numpy as np
import paddle
import paddlers as pdrs
from paddlers import transforms as T
from skimage.io import imread, imsave
from PIL import Image, ImageFilter, ImageEnhance
from tqdm import tqdm
from matplotlib import pyplot as plt
import numbers
Importing libraries
In [75]

# Unpack the dataset
# This involves heavy file IO and may take a while

# Original training set
!unzip -o -d /home/aistudio/data/dataset /home/aistudio/data/data134796/train_data.zip > /dev/null
# Test set
!unzip -o -d /home/aistudio/data/dataset /home/aistudio/data/data134796/test_data.zip > /dev/null
print("Unpacking finished")

DATA_DIR = '/home/aistudio/data/dataset/'
Unpacking finished
2.1 Prerequisites for Pre-training Data Augmentation
In this part we briefly describe how Transform and Dataset work in PaddleRS and, building on that, write an augmentation that mirrors every sample horizontally.

2.1.1 How Dataset works
In PaddleRS, Dataset works somewhat differently from PaddlePaddle, so even with PaddlePaddle experience you should still mind the differences.

A Dataset stores only the path of each sample. Take cd_dataset, the change detection dataset used in this project, as an example: it maintains a list file_list whose entries are dicts, one per sample, structured as follows:

(each entry is a dict with the keys image_t1, image_t2, mask and aux_masks, holding file paths)
aux_masks may be None. The other PaddleRS datasets use a similar implementation with minor differences; for instance, the image classification dataset has only an image and no mask.

In the __getitem__ method, the sample to be fetched is first deep-copied into a sample dict containing the paths of the two temporal images and the mask. That sample is then passed straight into the Transform, which performs the actual reading and decoding of the images. This is another difference from PaddlePaddle: because image reading relies on the Transform, the transforms attribute of a PaddleRS Dataset must not be empty at instantiation, or an error will be raised.
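A minimal sketch of this flow (simplified and hypothetical; the real CDDataset does more bookkeeping, but the path-then-decode pattern is the same):

from copy import deepcopy

class SketchDataset:
    def __init__(self, file_list, transforms):
        # PaddleRS requires non-empty transforms here, since they do the decoding
        assert transforms is not None
        self.file_list = file_list  # a list of dicts holding file paths
        self.transforms = transforms

    def __getitem__(self, idx):
        sample = deepcopy(self.file_list[idx])  # still only paths at this point
        return self.transforms(sample)          # Transform reads and decodes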

2.1.2 The Transform workflow
As mentioned above, reading images in PaddleRS is the job of Transform, so let us describe the Transform workflow in more detail. In PaddleRS, Transform also behaves differently from PaddlePaddle; personally I find PaddlePaddle's design of this feature more reasonable, but PaddleRS's design has its merits too.

In PaddleRS, a Dataset must be given a transforms argument, which can be a T.Compose or a single T.ImgDecoder, but not any other standalone Transform. Even with only one Transform, it must be wrapped in a Compose. The main reason is that Compose first calls ImgDecoder to read the images from their paths, while all other Transforms operate on images, so passing them bare would raise an error. Several Transform examples are given below.

In [76]

# Display a sample's three images

def showImg(img):
    %matplotlib inline

    fig = plt.figure(figsize=(20, 20))

    ax = fig.add_subplot(131)
    plt.imshow(img['image'])

    ax = fig.add_subplot(132)
    plt.imshow(img['image2'])

    ax = fig.add_subplot(133)
    plt.imshow(img['mask'])

In [77]

# Use image 327 as an example

img_path_t1 = "/home/aistudio/data/dataset/train/A/train_327.png"
img_path_t2 = "/home/aistudio/data/dataset/train/B/train_327.png"
img_path_label = "/home/aistudio/data/dataset/train/label/train_327.png"

# A sample in the Dataset is structured as follows
item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)

print(item_dict.keys())
print(type(item_dict['mask']))

# The Dataset maintains a file_list made up of many such item_dicts
decoder = T.ImgDecoder()
item_dict = decoder(item_dict)
print(item_dict.keys())
print(type(item_dict['mask']))

# As we can see, ImgDecoder parses the paths and returns the images. Note that the
# paths of the two temporal images are kept, while mask is replaced by its image
showImg(item_dict)
dict_keys(['image_t1', 'image_t2', 'mask'])
<class 'str'>
dict_keys(['image_t1', 'image_t2', 'mask', 'image', 'image2', 'im_shape', 'scale_factor'])
<class 'numpy.ndarray'>
(image output)
In [78]

# Another example: applying normalization

item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)

norm = T.Normalize(
    mean=[0.485, 0.455, 0.405],
    std=[0.229, 0.224, 0.226])

item_dict = decoder(item_dict)

item_dict = norm(item_dict)

showImg(item_dict)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
(image output)
In [79]

# We can also use Compose to combine multiple Transforms; then we no longer
# need to call ImgDecoder ourselves

item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)

compose = T.Compose([T.Normalize(
    mean=[0.485, 0.455, 0.405],
    std=[0.229, 0.224, 0.226])])

item_dict = compose(item_dict)

showImg(item_dict)

Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
(image output)
2.2 Implementing Pre-training Data Augmentation
Having learned how to use Transform, we can now try to add the horizontal mirror of every image to the training set.

First we verify the effect of the flip with a test, and then we iterate over the dataset and expand it.

In [80]
# First, define a transform using the RandomHorizontalFlip implemented in PaddleRS.
# Since we are augmenting the whole dataset, set the flip probability to 1
pretrain_transforms = T.Compose([T.RandomHorizontalFlip(prob=1.)])

# Next, show the original image for comparison

img_path_t1 = "/home/aistudio/data/dataset/train/A/train_1.png"
img_path_t2 = "/home/aistudio/data/dataset/train/B/train_1.png"
img_path_label = "/home/aistudio/data/dataset/train/label/train_1.png"
item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
decoder = T.ImgDecoder()
showImg(decoder(item_dict))
(image output)
In [81]

# Then show the transformed image to check that the flip takes effect

item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
item_dict = pretrain_transforms(item_dict)
showImg(item_dict)
(image output)
In [82]
# After validating our transform, we can process the whole dataset
item_dicts = []

times = 1

for i in range(1, 637):
    img_path_t1 = osp.join(DATA_DIR, 'train', 'A', 'train_{}.png'.format(i))
    img_path_t2 = osp.join(DATA_DIR, 'train', 'B', 'train_{}.png'.format(i))
    img_path_label = osp.join(DATA_DIR, 'train', 'label', 'train_{}.png'.format(i))
    item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
    item_dict = pretrain_transforms(item_dict)
    # cv2 uses BGR channel order by default, so convert before writing
    cv2.imwrite(osp.join(DATA_DIR, 'train', 'A', 'train_{}.png'.format(i+637*times)), cv2.cvtColor(item_dict['image'], cv2.COLOR_RGB2BGR))
    cv2.imwrite(osp.join(DATA_DIR, 'train', 'B', 'train_{}.png'.format(i+637*times)), cv2.cvtColor(item_dict['image2'], cv2.COLOR_RGB2BGR))
    # Note: the mask is a single-channel image and needs no color conversion
    cv2.imwrite(osp.join(DATA_DIR, 'train', 'label', 'train_{}.png'.format(i+637*times)), item_dict['mask'])

In [83]
# Now display the flipped version of the first image to check that our code is correct

img_path_t1 = "/home/aistudio/data/dataset/train/A/train_638.png"
img_path_t2 = "/home/aistudio/data/dataset/train/B/train_638.png"
img_path_label = "/home/aistudio/data/dataset/train/label/train_638.png"
item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
decoder = T.ImgDecoder()
showImg(decoder(item_dict))
(image output)
2.3 Summary of Pre-training Data Augmentation
With the code above we doubled the dataset by adding the left-right flip of every image.

When trying other kinds of augmentation you can follow the same recipe: write the transforms, display the original and the transformed image, and inspect the effect. Once the code is validated, apply the augmentation to the whole dataset.

PaddleRS already implements a number of practical augmentations, such as randomly swapping the two temporal images, and random cropping, blurring and flipping. They all work like the random flip above, can be implemented just as simply, and can be combined with Compose; for details see the PaddleRS data augmentation docs on GitHub. A combined example is sketched below.
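For illustration, a minimal sketch (not part of the baseline) chaining several built-in operators for the offline stage, assuming the operators keep the signatures used elsewhere in this project:

# Hypothetical combination: blur and temporal swap, each with its own
# probability, followed by the deterministic flip used above
combo_transforms = T.Compose([
    T.RandomBlur(prob=0.5),
    T.RandomSwap(prob=0.5),
    T.RandomHorizontalFlip(prob=1.),
])
item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
showImg(combo_transforms(item_dict))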

After the dataset has been expanded, we can split it into training and validation sets. The baseline code below automatically enumerates and splits every file in the folder, so our augmentation does not require any changes to it.

In [84]

# Split the training/validation sets and generate file-name lists

import random
from glob import glob

# Random number generator seed
RNG_SEED = 114514

# Adjust this parameter to control the proportion of training data
TRAIN_RATIO = 0.8

# Dataset path
DATA_DIR = '/home/aistudio/data/dataset/'

def write_rel_paths(phase, names, out_dir, prefix=''):
    """Store relative file paths in a txt file"""
    with open(osp.join(out_dir, phase+'.txt'), 'w') as f:
        for name in names:
            f.write(
                ' '.join([
                    osp.join(prefix, 'A', name),
                    osp.join(prefix, 'B', name),
                    osp.join(prefix, 'label', name)
                ])
            )
            f.write('\n')

random.seed(RNG_SEED)

# Randomly split the training/validation sets
names = list(map(osp.basename, glob(osp.join(DATA_DIR, 'train', 'label', '*.png'))))
# Sort the file names so that repeated runs give consistent results
names.sort()
random.shuffle(names)
len_train = int(len(names)*TRAIN_RATIO)  # rounds down
write_rel_paths('train', names[:len_train], DATA_DIR, prefix='train')
write_rel_paths('val', names[len_train:], DATA_DIR, prefix='train')

# Process the test set
test_names = map(osp.basename, glob(osp.join(DATA_DIR, 'test', 'A', '*.png')))
test_names = sorted(test_names)
write_rel_paths(
    'test',
    test_names,
    DATA_DIR,
    prefix='test'
)

print("Dataset split finished.")
Dataset split finished.
3. Model Construction and Training-time Data Augmentation
This project uses the PaddleRS suite to build the model training and inference framework. PaddleRS is a remote sensing platform developed on top of PaddlePaddle. It supports common remote sensing tasks such as image classification, object detection, image segmentation and change detection, and helps developers conveniently complete the full remote-sensing deep-learning workflow from training to deployment. For change detection, PaddleRS currently supports 9 state-of-the-art (SOTA) models, and the complex training and inference procedures are encapsulated in a handful of APIs, providing an out-of-the-box experience.

In [85]

# Define global variables
# The hyperparameters used in the experiments can be adjusted here

print("Declaring global variables")

# Random seed
SEED = 1919810

# Dataset path
DATA_DIR = '/home/aistudio/data/dataset/'

# Experiment path. Output model weights and results are saved under this directory
EXP_DIR = '/home/aistudio/exp/'

# Path for saving the best model
BEST_CKP_PATH = osp.join(EXP_DIR, 'best_model', 'model.pdparams')

# Number of training epochs
NUM_EPOCHS = 100

# Save the model weights every this many epochs
SAVE_INTERVAL_EPOCHS = 10

# Initial learning rate
LR = 0.001

# Learning-rate decay step (note: measured in iterations, not epochs), i.e. the
# learning rate is halved every this many iterations
DECAY_STEP = 1000

# Training batch size
TRAIN_BATCH_SIZE = 16

# Inference batch size
INFER_BATCH_SIZE = 16

# Number of processes used for data loading
NUM_WORKERS = 4

# Patch (crop) size
CROP_SIZE = 256

# Sliding-window stride used during inference
STRIDE = 64

# Original image size
ORIGINAL_SIZE = (1024, 1024)
Declaring global variables
In [86]

# Fix random seeds to make the results as reproducible as possible

random.seed(SEED)
np.random.seed(SEED)
paddle.seed(SEED)
<paddle.fluid.core_avx.Generator at 0x7f818a504770>
In [87]

# Define some helper functions

def info(msg, **kwargs):
    print(msg, **kwargs)

def warn(msg, **kwargs):
    print('\033[0;31m'+msg, **kwargs)

def quantize(arr):
    return (arr*255).astype('uint8')
3.1 Model Construction
As a demonstration, this project uses BIT-CD [1], a Transformer-based change detection model published by the LEVIR group in 2021. See this link for the original paper, and this link for the authors' official implementation.

The project contains some code for running as a background task, such as the manual saving of weight files below. For everything related to background tasks, see my other project, "Remote sensing change detection runnable as a background task".

[1] Hao Chen, Zipeng Qi, and Zhenwei Shi. Remote Sensing Image Change Detection with Transformers. IEEE Transactions on Geoscience and Remote Sensing.

In [88]

# Build the model with a one-line PaddleRS API call

!mkdir ~/.cache/paddle/
!mkdir ~/.cache/paddle/hapi/
!mkdir ~/.cache/paddle/hapi/weights/
!cp resnet18.pdparams ~/.cache/paddle/hapi/weights/
!cp resnet34.pdparams ~/.cache/paddle/hapi/weights/

print("Building model")
model = pdrs.tasks.BIT(
    # Number of output classes
    num_classes=2,
    # Whether to use a mixed loss; by default, train with cross-entropy loss
    use_mixed_loss=False,
    # Number of input channels
    in_channels=3,
    # Backbone network; 'resnet18' and 'resnet34' are supported
    backbone='resnet18',
    # Number of resnet stages in the backbone
    n_stages=4,
    # Whether to use a tokenizer to obtain semantic tokens
    use_tokenizer=True,
    # Token length
    token_len=4,
    # If no tokenizer is used, tokens are obtained by pooling. This sets the
    # pooling mode, 'max' or 'avg' (max pooling or average pooling)
    pool_mode='max',
    # Width and height of the pooled feature map (with pooling, the token
    # length equals pool_size squared)
    pool_size=2,
    # Whether to add positional embeddings in the Transformer encoder
    enc_with_pos=True,
    # Number of attention blocks in the Transformer encoder
    enc_depth=1,
    # Embedding dimension of each attention head in the Transformer encoder
    enc_head_dim=64,
    # Number of attention blocks in the Transformer decoder
    dec_depth=8,
    # Embedding dimension of each attention head in the Transformer decoder
    dec_head_dim=8
)
print("Model built")
mkdir: cannot create directory '/home/aistudio/.cache/paddle/': File exists
Building model
W0602 14:52:43.797078 160 dynamic_loader.cc:305] The third-party dynamic library (libcudnn.so) that Paddle depends on is not configured correctly. (error code is /usr/local/cuda/lib64/libcudnn.so: cannot open shared object file: No such file or directory)
Suggestions:
1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
2. Configure third-party dynamic library environment variables as follows:
- Linux: set LD_LIBRARY_PATH by export LD_LIBRARY_PATH=...
- Windows: set PATH by set PATH=XXX;
Model built
    In [89]

# View the network structure
# In PaddleRS, the paddle.nn.Layer network object can be accessed through the
# net attribute of a ChangeDetector object

model.net
BIT(
(backbone): Backbone(
(resnet): ResNet(
(conv1): Conv2D(3, 64, kernel_size=[7, 7], stride=[2, 2], padding=3, data_format=NCHW)
(bn1): BatchNorm2D(num_features=64, momentum=0.9, epsilon=1e-05)
(relu): ReLU()
(maxpool): MaxPool2D(kernel_size=3, stride=2, padding=1)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2D(64, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn1): BatchNorm2D(num_features=64, momentum=0.9, epsilon=1e-05)
(relu): ReLU()
(conv2): Conv2D(64, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn2): BatchNorm2D(num_features=64, momentum=0.9, epsilon=1e-05)
)
(1): BasicBlock(
(conv1): Conv2D(64, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn1): BatchNorm2D(num_features=64, momentum=0.9, epsilon=1e-05)
(relu): ReLU()
(conv2): Conv2D(64, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn2): BatchNorm2D(num_features=64, momentum=0.9, epsilon=1e-05)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2D(64, 128, kernel_size=[3, 3], stride=[2, 2], padding=1, data_format=NCHW)
(bn1): BatchNorm2D(num_features=128, momentum=0.9, epsilon=1e-05)
(relu): ReLU()
(conv2): Conv2D(128, 128, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn2): BatchNorm2D(num_features=128, momentum=0.9, epsilon=1e-05)
(downsample): Sequential(
(0): Conv2D(64, 128, kernel_size=[1, 1], stride=[2, 2], data_format=NCHW)
(1): BatchNorm2D(num_features=128, momentum=0.9, epsilon=1e-05)
)
)
(1): BasicBlock(
(conv1): Conv2D(128, 128, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn1): BatchNorm2D(num_features=128, momentum=0.9, epsilon=1e-05)
(relu): ReLU()
(conv2): Conv2D(128, 128, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn2): BatchNorm2D(num_features=128, momentum=0.9, epsilon=1e-05)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2D(128, 256, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn1): BatchNorm2D(num_features=256, momentum=0.9, epsilon=1e-05)
(relu): ReLU()
(conv2): Conv2D(256, 256, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn2): BatchNorm2D(num_features=256, momentum=0.9, epsilon=1e-05)
(downsample): Sequential(
(0): Conv2D(128, 256, kernel_size=[1, 1], data_format=NCHW)
(1): BatchNorm2D(num_features=256, momentum=0.9, epsilon=1e-05)
)
)
(1): BasicBlock(
(conv1): Conv2D(256, 256, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn1): BatchNorm2D(num_features=256, momentum=0.9, epsilon=1e-05)
(relu): ReLU()
(conv2): Conv2D(256, 256, kernel_size=[3, 3], padding=1, data_format=NCHW)
(bn2): BatchNorm2D(num_features=256, momentum=0.9, epsilon=1e-05)
)
)
(layer4): Identity()
(avgpool): Identity()
(fc): Identity()
)
(upsample): Upsample(scale_factor=2, mode=nearest, align_corners=False, align_mode=0, data_format=NCHW)
(conv_out): Conv3x3(
(seq): Sequential(
(0): Pad2D(padding=[1, 1, 1, 1], mode=constant, value=0.0, data_format=NCHW)
(1): Conv2D(256, 32, kernel_size=[3, 3], data_format=NCHW)
)
)
)
(conv_att): Conv1x1(
(seq): Sequential(
(0): Conv2D(32, 4, kernel_size=[1, 1], data_format=NCHW)
)
)
(encoder): TransformerEncoder(
(layers): LayerList(
(0): LayerList(
(0): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): SelfAttention(
(fc_q): Linear(in_features=32, out_features=512, dtype=float32)
(fc_k): Linear(in_features=32, out_features=512, dtype=float32)
(fc_v): Linear(in_features=32, out_features=512, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=512, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
)
)
(decoder): TransformerDecoder(
(layers): LayerList(
(0): LayerList(
(0): Residual2(
(fn): PreNorm2(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): CrossAttention(
(fc_q): Linear(in_features=32, out_features=64, dtype=float32)
(fc_k): Linear(in_features=32, out_features=64, dtype=float32)
(fc_v): Linear(in_features=32, out_features=64, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=64, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): LayerList(
(0): Residual2(
(fn): PreNorm2(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): CrossAttention(
(fc_q): Linear(in_features=32, out_features=64, dtype=float32)
(fc_k): Linear(in_features=32, out_features=64, dtype=float32)
(fc_v): Linear(in_features=32, out_features=64, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=64, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(2): LayerList(
(0): Residual2(
(fn): PreNorm2(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): CrossAttention(
(fc_q): Linear(in_features=32, out_features=64, dtype=float32)
(fc_k): Linear(in_features=32, out_features=64, dtype=float32)
(fc_v): Linear(in_features=32, out_features=64, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=64, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(3): LayerList(
(0): Residual2(
(fn): PreNorm2(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): CrossAttention(
(fc_q): Linear(in_features=32, out_features=64, dtype=float32)
(fc_k): Linear(in_features=32, out_features=64, dtype=float32)
(fc_v): Linear(in_features=32, out_features=64, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=64, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(4): LayerList(
(0): Residual2(
(fn): PreNorm2(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): CrossAttention(
(fc_q): Linear(in_features=32, out_features=64, dtype=float32)
(fc_k): Linear(in_features=32, out_features=64, dtype=float32)
(fc_v): Linear(in_features=32, out_features=64, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=64, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(5): LayerList(
(0): Residual2(
(fn): PreNorm2(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): CrossAttention(
(fc_q): Linear(in_features=32, out_features=64, dtype=float32)
(fc_k): Linear(in_features=32, out_features=64, dtype=float32)
(fc_v): Linear(in_features=32, out_features=64, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=64, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(6): LayerList(
(0): Residual2(
(fn): PreNorm2(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): CrossAttention(
(fc_q): Linear(in_features=32, out_features=64, dtype=float32)
(fc_k): Linear(in_features=32, out_features=64, dtype=float32)
(fc_v): Linear(in_features=32, out_features=64, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=64, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(7): LayerList(
(0): Residual2(
(fn): PreNorm2(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): CrossAttention(
(fc_q): Linear(in_features=32, out_features=64, dtype=float32)
(fc_k): Linear(in_features=32, out_features=64, dtype=float32)
(fc_v): Linear(in_features=32, out_features=64, dtype=float32)
(fc_out): Sequential(
(0): Linear(in_features=64, out_features=32, dtype=float32)
(1): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
(1): Residual(
(fn): PreNorm(
(norm): LayerNorm(normalized_shape=[32], epsilon=1e-05)
(fn): FeedForward(
(0): Linear(in_features=32, out_features=64, dtype=float32)
(1): GELU(approximate=False)
(2): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(3): Linear(in_features=64, out_features=32, dtype=float32)
(4): Dropout(p=0.0, axis=None, mode=upscale_in_train)
)
)
)
)
)
)
(upsample): Upsample(scale_factor=4, mode=bilinear, align_corners=False, align_mode=0, data_format=NCHW)
(conv_out): Sequential(
(0): Conv3x3(
(seq): Sequential(
(0): Pad2D(padding=[1, 1, 1, 1], mode=constant, value=0.0, data_format=NCHW)
(1): Conv2D(32, 32, kernel_size=[3, 3], data_format=NCHW)
(2): BatchNorm2D(num_features=32, momentum=0.9, epsilon=1e-05)
(3): ReLU()
)
)
(1): Conv3x3(
(seq): Sequential(
(0): Pad2D(padding=[1, 1, 1, 1], mode=constant, value=0.0, data_format=NCHW)
(1): Conv2D(32, 2, kernel_size=[3, 3], data_format=NCHW)
)
)
)
)
3.2 Training-time Data Augmentation
Training-time augmentation works somewhat differently from pre-training augmentation, but once you have mastered the latter, this part will pose no difficulty either.

Thanks to PaddleRS's high level of encapsulation, we do not need to care about the concrete training-time pipeline: we only need to pass a suitable Compose when defining the Dataset, and the rest is handled for us.

This section therefore focuses on how to write a data augmentation operator of your own. It contains two examples: random rotation and color balance.

3.2.1 How a Transform works
We already know that once the required Transforms are passed into a Compose, the transformations it contains are applied to each image sequentially. So to add our own augmentation we only need to write our own Transform. Let us first outline how a Transform works.

When a Transform is called, its apply method is invoked first. It receives a sample argument containing the sample's images, labels and so on, identical in structure to the return value of the ImgDecoder mentioned above. Inside apply, the sample's image and mask are dispatched to methods such as apply_im and apply_mask, each of which transforms the image it receives and returns the result.

As for why image and mask must be distinguished: their formats differ. In this project, image is a 3-channel color image stored as a (1024, 1024, 3) numpy array, while mask has shape (1024, 1024); using a single method for both would cause shape errors and similar problems.
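The skeleton of a custom operator therefore looks as follows (a minimal sketch; the hypothetical IdentityTransform does nothing, but RandomRotation below follows exactly this pattern):

class IdentityTransform(T.operators.Transform):
    """A do-nothing Transform that illustrates the dispatch structure."""

    def apply_im(self, image):
        # transform the (H, W, 3) image array here
        return image

    def apply_mask(self, mask):
        # transform the (H, W) mask array here
        return mask

    def apply(self, sample):
        # dispatch each field of the sample to the matching method
        sample['image'] = self.apply_im(sample['image'])
        if 'image2' in sample:
            sample['image2'] = self.apply_im(sample['image2'])
        if 'mask' in sample:
            sample['mask'] = self.apply_mask(sample['mask'])
        return sample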

3.2.2 Implementing Random Rotation
In PaddleRS, most Transforms are implemented with OpenCV. To make our augmentation as compatible as possible, we likewise use OpenCV functions to rotate the images.

Since the operation needs to be random, apply draws a random number and performs the rotation only if that number is below prob.

See the code below for the concrete implementation.

In [91]

# A Transform implementing random rotation

class RandomRotation(T.operators.Transform):
    """
    Random rotation, with probability prob. When a rotation happens, one of the
    four angles 45°, 135°, 225° and 315° is chosen at random and applied to the
    original image.

    Args:
        prob(float, optional): Probability of rotating the input. Defaults to .5.
    """

    def __init__(self, prob=0.5):
        self.prob = prob
        super(RandomRotation, self).__init__()

    def apply_im(self, image):
        # Rotate the image around its center (the corners cut off by the
        # rotation are filled with 0 by warpAffine)
        h, w = image.shape[:2]
        center = (w // 2, h // 2)
        M_1 = cv2.getRotationMatrix2D(center, self.angle, 1)
        image = cv2.warpAffine(image, M_1, (w, h))
        return image

    def apply_mask(self, mask):
        # Rotate the mask. Use nearest-neighbor interpolation so that no
        # interpolated (invalid) label values are introduced
        h, w = mask.shape[:2]
        center = (w // 2, h // 2)
        M_1 = cv2.getRotationMatrix2D(center, self.angle, 1)
        mask = cv2.warpAffine(mask, M_1, (w, h), flags=cv2.INTER_NEAREST)
        return mask

    def apply_bbox(self, bbox, height):
        # Not needed for change detection
        pass

    def apply_segm(self, segms, height, width):
        # Not needed for change detection
        pass

    def apply(self, sample):
        # Decide with probability prob whether to rotate this sample
        if random.random() < self.prob:
            # Pick the rotation angle at random
            random_angle = random.random()
            if random_angle < 0.25:
                self.angle = 45
            elif random_angle < 0.5:
                self.angle = 135
            elif random_angle < 0.75:
                self.angle = 225
            else:
                self.angle = 315
            im_h, im_w = sample['image'].shape[:2]
            sample['image'] = self.apply_im(sample['image'])
            if 'image2' in sample:
                sample['image2'] = self.apply_im(sample['image2'])
            if 'mask' in sample:
                sample['mask'] = self.apply_mask(sample['mask'])
            if 'aux_masks' in sample:
                sample['aux_masks'] = list(
                    map(self.apply_mask, sample['aux_masks']))
            if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0:
                sample['gt_bbox'] = self.apply_bbox(sample['gt_bbox'], im_h)
            if 'gt_poly' in sample and len(sample['gt_poly']) > 0:
                sample['gt_poly'] = self.apply_segm(sample['gt_poly'], im_h,
                                                    im_w)
        return sample

In [92]

# First, show the original image again

img_path_t1 = "/home/aistudio/data/dataset/train/A/train_2.png"
img_path_t2 = "/home/aistudio/data/dataset/train/B/train_2.png"
img_path_label = "/home/aistudio/data/dataset/train/label/train_2.png"
item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
decoder = T.ImgDecoder()
showImg(decoder(item_dict))
(image output)
In [95]

# For testing, set the probability to 1 so that the rotation always happens

item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
temp_transforms = T.Compose([RandomRotation(prob=1.)])
showImg(temp_transforms(item_dict))
(image output)
3.2.3 Implementing Color Balance
Next let us look at a color balance implementation. Only image needs balancing; the mask's values must not be altered, so apply_mask simply returns the mask unchanged, while apply_im processes the image. The implementation follows this article: Python Opencv 色彩平衡.

In [96]
class ColorBalance(T.operators.Transform):
    """
    Color balance (gray-world white balancing).

    Args:
        prob(float, optional): Probability. Defaults to .5.
    """

    def __init__(self, prob=0.5):
        self.prob = prob
        super(ColorBalance, self).__init__()

    def apply_im(self, image):
        # Gray-world balancing: scale each channel so that all channel means
        # become equal to their common mean K. The decoded image is RGB, so the
        # names b/g/r below are nominal; the scaling is channel-symmetric anyway
        b, g, r = cv2.split(image)
        B = np.mean(b)
        G = np.mean(g)
        R = np.mean(r)
        K = (R + G + B) / 3
        Kb = K / B
        Kg = K / G
        Kr = K / R
        # Scale each channel by its gain (addWeighted also saturates to uint8)
        b = cv2.addWeighted(b, Kb, b, 0, 0)
        g = cv2.addWeighted(g, Kg, g, 0, 0)
        r = cv2.addWeighted(r, Kr, r, 0, 0)
        image = cv2.merge([b, g, r])
        return image

    def apply_mask(self, mask):
        # The mask must keep its label values, so return it unchanged
        return mask

    def apply_bbox(self, bbox, height):
        # Not needed for change detection
        pass

    def apply_segm(self, segms, height, width):
        # Not needed for change detection
        pass

    def apply(self, sample):
        if random.random() < self.prob:
            im_h, im_w = sample['image'].shape[:2]
            sample['image'] = self.apply_im(sample['image'])
            if 'image2' in sample:
                sample['image2'] = self.apply_im(sample['image2'])
            if 'mask' in sample:
                sample['mask'] = self.apply_mask(sample['mask'])
            if 'aux_masks' in sample:
                sample['aux_masks'] = list(
                    map(self.apply_mask, sample['aux_masks']))
            if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0:
                sample['gt_bbox'] = self.apply_bbox(sample['gt_bbox'], im_h)
            if 'gt_poly' in sample and len(sample['gt_poly']) > 0:
                sample['gt_poly'] = self.apply_segm(sample['gt_poly'], im_h,
                                                    im_w)
        return sample
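To see what the gray-world scaling does numerically, here is a quick NumPy check (illustrative only; it ignores the uint8 saturation that cv2.addWeighted performs):

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
means = img.reshape(-1, 3).mean(axis=0)          # per-channel means
k = means.mean()                                 # common target mean K
balanced = img.astype(np.float64) * (k / means)  # scale each channel by K/mean
print(balanced.reshape(-1, 3).mean(axis=0))      # all three means are now ~K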

In [99]

# Again, show the original image

img_path_t1 = "/home/aistudio/data/dataset/train/A/train_5.png"
img_path_t2 = "/home/aistudio/data/dataset/train/B/train_5.png"
img_path_label = "/home/aistudio/data/dataset/train/label/train_5.png"
item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
decoder = T.ImgDecoder()
showImg(decoder(item_dict))
(image output)
In [100]

# For testing, set the probability to 1 so that color balancing always happens

item_dict = dict(image_t1=img_path_t1, image_t2=img_path_t2, mask=img_path_label)
temp_transforms = T.Compose([ColorBalance(prob=1.)])
showImg(temp_transforms(item_dict))
(image output)
3.3 Dataset Construction
After designing our own Transforms, we still have to use them when defining the datasets. The procedure is: first create the transforms, then pass them as an argument when creating the Dataset. Below we build a transforms pipeline that uses random rotation, random color balance, random blur and random temporal swapping.

Note that some augmentations must be applied to the training, validation and test sets alike for the enhancement to work, e.g. normalization and sharpening.

In [101]

# Build the data transformations to use (augmentation and preprocessing).
# Compose combines multiple transformations, which are executed sequentially in order

train_transforms = T.Compose([
    # Random cropping
    T.RandomCrop(
        # The cropped region is rescaled to this size
        crop_size=CROP_SIZE,
        # Fix the aspect ratio of the cropped region to 1
        aspect_ratio=[1.0, 1.0],
        # Let the cropped region cover a varying fraction of the original image,
        # no less than 1/5 of the original height and width
        scaling=[0.2, 1.0]
    ),
    # Color balance
    ColorBalance(prob=0.5),
    # Random rotation
    RandomRotation(prob=0.5),
    # Random blur
    T.RandomBlur(prob=0.1),
    # Random temporal swap
    T.RandomSwap(prob=0.2),
    # Random horizontal flip with probability 50%
    T.RandomHorizontalFlip(prob=0.5),
    # Random vertical flip with probability 50%
    T.RandomVerticalFlip(prob=0.5),
    # Normalization
    T.Normalize(
        mean=[0.485, 0.455, 0.405],
        std=[0.229, 0.224, 0.226]
    )
])
eval_transforms = T.Compose([
    # During validation, feed the images at their original size and only normalize them.
    # Validation and training must use the same normalization
    T.Normalize(
        mean=[0.485, 0.455, 0.405],
        std=[0.229, 0.224, 0.226]
    )
])

# Instantiate the datasets
train_dataset = pdrs.datasets.CDDataset(
    data_dir=DATA_DIR,
    file_list=osp.join(DATA_DIR, 'train.txt'),
    label_list=None,
    transforms=train_transforms,
    num_workers=NUM_WORKERS,
    shuffle=True,
    binarize_labels=True
)
eval_dataset = pdrs.datasets.CDDataset(
    data_dir=DATA_DIR,
    file_list=osp.join(DATA_DIR, 'val.txt'),
    label_list=None,
    transforms=eval_transforms,
    num_workers=0,
    shuffle=False,
    binarize_labels=True
)
2022-06-02 15:33:09 [INFO] 1018 samples in file /home/aistudio/data/dataset/train.txt
2022-06-02 15:33:09 [INFO] 255 samples in file /home/aistudio/data/dataset/val.txt
3.4 Model Training
With AI Studio's advanced hardware (16 GB V100) and the default hyperparameters, training takes about 50 minutes in total, and the best mIoU on the validation set at the end of training is about 0.89 (a reference value; actual results may fluctuate).

If VisualDL logging was enabled during training (it is on by default), the visualizations can be viewed in the "Data & Model Visualization" tab; set logdir to the vdl_log subdirectory under EXP_DIR. A tutorial on using VisualDL in a notebook is available here.

Note that PaddleRS selects the best model on the validation set by mIoU, whereas the competition officially evaluates with the F1 score.

The mIoU and F1 metrics for the change detection task are defined as:

$$\mathrm{mIoU}=\frac{1}{2}\left(\frac{TP}{FN+FP+TP}+\frac{TN}{FP+FN+TN}\right)$$

$$F1=\frac{2 \cdot TP}{2 \cdot TP + FN + FP}$$

where $TP$ is the number of samples predicted as changed that actually changed, $TN$ the number predicted as unchanged that are actually unchanged, $FP$ the number predicted as changed that are actually unchanged, and $FN$ the number predicted as unchanged that actually changed.

In addition, PaddleRS reports per-class metrics on the validation set, so for binary change detection, metrics such as category_acc and category_F1-score each contain two entries, presented as a list. Since change detection mainly cares about the changed class, it is more meaningful to observe and compare the second entry of each metric (i.e. the second element of the list).
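As a sanity check of the definitions above, a minimal sketch (not part of the baseline) that computes both metrics from the four confusion-matrix counts:

def change_metrics(tp, tn, fp, fn):
    # IoU of the changed and unchanged classes, then their mean
    iou_changed = tp / (tp + fp + fn)
    iou_unchanged = tn / (tn + fp + fn)
    miou = (iou_changed + iou_unchanged) / 2
    # F1 score of the changed class
    f1 = 2 * tp / (2 * tp + fp + fn)
    return miou, f1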

In [102]

# Create the experiment directory if it does not exist (create recursively)

if not osp.exists(EXP_DIR):
    os.makedirs(EXP_DIR)
In [103]

# Build the learning-rate scheduler and the optimizer

# Fixed-step learning-rate decay
lr_scheduler = paddle.optimizer.lr.StepDecay(
    LR,
    step_size=DECAY_STEP,
    # Decay factor; here the learning rate is halved each time
    gamma=0.5
)

# Construct an Adam optimizer
optimizer = paddle.optimizer.Adam(
    learning_rate=lr_scheduler,
    parameters=model.net.parameters()
)
In [104]

# Train with a one-line PaddleRS API call

model.train(
    num_epochs=NUM_EPOCHS,
    train_dataset=train_dataset,
    train_batch_size=TRAIN_BATCH_SIZE,
    eval_dataset=eval_dataset,
    optimizer=optimizer,
    save_interval_epochs=SAVE_INTERVAL_EPOCHS,
    # Log every this many iterations
    log_interval_steps=10,
    save_dir=EXP_DIR,
    # Whether to use early stopping, terminating training when the metric stops improving
    early_stop=False,
    # Whether to enable VisualDL logging
    use_vdl=True,
    # Checkpoint to resume training from
    resume_checkpoint=None
)
(KeyboardInterrupt: the training run recorded in this notebook was interrupted manually inside model.train; the full stack trace is omitted.)
In [105]

# Inspect the trained models stored in the experiment directory.
# The best_model subdirectory corresponds to the model with the best validation metrics.
# Inside it, eval_details.json contains the confusion-matrix information recorded during
# validation; model.pdopt contains the state of the optimizer used during training;
# model.pdparams contains the model weights; model.yml contains the model configuration
# (preprocessing parameters, model specification, etc.)

!ls /home/aistudio/exp/best_model/
ls: cannot access '/home/aistudio/exp/best_model/': No such file or directory
3.5 Model Inference
Note! Inference uses the test set's transform, which likewise needs normalization and the other shared preprocessing.

With AI Studio's advanced hardware (16 GB V100) and the default hyperparameters, inference takes about 4 minutes in total.

The inference script obtains the binary change map from the change probability map with a fixed threshold, 0.5 by default, which can be adjusted according to the model's actual behavior. Of course, more sophisticated threshold-selection algorithms such as Otsu's method or k-means clustering can also be substituted, as sketched below.
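For instance, a minimal sketch of swapping in Otsu thresholding (it uses skimage.filters.threshold_otsu from the scikit-image package installed earlier, and the quantize helper defined above):

from skimage.filters import threshold_otsu

def binarize(prob_map):
    # pick the threshold from the probability map itself instead of a fixed 0.5
    t = threshold_otsu(prob_map)
    return quantize(prob_map > t)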

The forward inference results are stored in the out subdirectory of EXP_DIR. The files in that subdirectory can be packed, the archive renamed, and then submitted to the competition system. Before submitting, please read the submission guidelines carefully.

In [ ]

# Define the dataset used during inference

class InferDataset(paddle.io.Dataset):
    """
    Change detection inference dataset.

    Args:
        data_dir (str): Directory containing the dataset.
        transforms (paddlers.transforms.Compose): Data transformations to apply.
    """

    def __init__(
        self,
        data_dir,
        transforms
    ):
        super().__init__()

        self.data_dir = data_dir
        self.transforms = deepcopy(transforms)

        pdrs.transforms.arrange_transforms(
            model_type='changedetector',
            transforms=self.transforms,
            mode='test'
        )

        with open(osp.join(data_dir, 'test.txt'), 'r') as f:
            lines = f.read()
            lines = lines.strip().split('\n')

        samples = []
        names = []
        for line in lines:
            items = line.strip().split(' ')
            items = list(map(pdrs.utils.path_normalization, items))
            item_dict = {
                'image_t1': osp.join(data_dir, items[0]),
                'image_t2': osp.join(data_dir, items[1])
            }
            samples.append(item_dict)
            names.append(osp.basename(items[0]))

        self.samples = samples
        self.names = names

    def __getitem__(self, idx):
        name = self.names[idx]
        sample = deepcopy(self.samples[idx])
        output = self.transforms(sample)
        return name, \
               paddle.to_tensor(output[0]), \
               paddle.to_tensor(output[1]),

    def __len__(self):
        return len(self.samples)

In [ ]

# Since the original images are large, the following class and functions deal
# with cropping images into patches and stitching them back together.

class WindowGenerator:
    def __init__(self, h, w, ch, cw, si=1, sj=1):
        self.h = h
        self.w = w
        self.ch = ch
        self.cw = cw
        if self.h < self.ch or self.w < self.cw:
            raise NotImplementedError
        self.si = si
        self.sj = sj
        self._i, self._j = 0, 0

    def __next__(self):
        # Move in row-major (C) order
        if self._i > self.h:
            raise StopIteration

        bottom = min(self._i+self.ch, self.h)
        right = min(self._j+self.cw, self.w)
        top = max(0, bottom-self.ch)
        left = max(0, right-self.cw)

        if self._j >= self.w-self.cw:
            if self._i >= self.h-self.ch:
                # Set an illegal value so that the iteration can early-stop
                self._i = self.h+1
            self._goto_next_row()
        else:
            self._j += self.sj
            if self._j > self.w:
                self._goto_next_row()

        return slice(top, bottom, 1), slice(left, right, 1)

    def __iter__(self):
        return self

    def _goto_next_row(self):
        self._i += self.si
        self._j = 0

def crop_patches(dataloader, ori_size, window_size, stride):
    """
    Crop the data in dataloader into patches.

    Args:
        dataloader (paddle.io.DataLoader): An iterable yielding the original samples (each containing any number of images).
        ori_size (tuple): Height and width of the original images, as a tuple (h, w).
        window_size (int): Patch size.
        stride (int): Number of pixels the sliding window moves horizontally or vertically at each step.

    Returns:
        A generator yielding the cropping result for each item of iter(`dataloader`). The patches from one image are concatenated
            along the batch dimension. For example, if `ori_size` is 1024 while `window_size` and `stride` are both 512, each item
            returned by `crop_patches` has a batch_size 4 times that of the corresponding item in iter(`dataloader`).
    """

    for name, *ims in dataloader:
        ims = list(ims)
        h, w = ori_size
        win_gen = WindowGenerator(h, w, window_size, window_size, stride, stride)
        all_patches = []
        for rows, cols in win_gen:
            # NOTE: a generator cannot be used here, because lazy evaluation would give unexpected results
            patches = [im[..., rows, cols] for im in ims]
            all_patches.append(patches)
        yield name[0], tuple(map(partial(paddle.concat, axis=0), zip(*all_patches)))

def recons_prob_map(patches, ori_size, window_size, stride):
    """Rebuild the full-size image from the patches; the inverse of crop_patches"""
    # NOTE: currently only batch size 1 is handled
    h, w = ori_size
    win_gen = WindowGenerator(h, w, window_size, window_size, stride, stride)
    prob_map = np.zeros((h, w), dtype=np.float32)
    cnt = np.zeros((h, w), dtype=np.float32)
    # XXX: win_gen and patches must have the same length; this is not checked here
    for (rows, cols), patch in zip(win_gen, patches):
        prob_map[rows, cols] += patch
        cnt[rows, cols] += 1
    prob_map /= cnt
    return prob_map
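As a quick sanity check of the patch count (arithmetic only, using the globals defined earlier):

# With ORIGINAL_SIZE=(1024, 1024), CROP_SIZE=256 and STRIDE=64, each axis has
# (1024 - 256) // 64 + 1 = 13 window positions, so crop_patches splits every
# image into 13 * 13 = 169 patches stacked along the batch dimension
n = (ORIGINAL_SIZE[0] - CROP_SIZE) // STRIDE + 1
print(n * n)  # 169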
In [24]

# Create the output directory if it does not exist (create recursively)

out_dir = osp.join(EXP_DIR, 'out')
if not osp.exists(out_dir):
    os.makedirs(out_dir)

# Load the historically best weights into the model
state_dict = paddle.load(BEST_CKP_PATH)
# Again, access the network object through the net attribute
model.net.set_state_dict(state_dict)

# Instantiate the test set
test_dataset = InferDataset(
    DATA_DIR,
    # Note: the test stage must use the same normalization as training
    T.Compose([
        # Color balance
        ColorBalance(prob=1.),

        # RandomFilter(prob=1., use_filter="sharpen"),

        T.Normalize(
            mean=[0.485, 0.455, 0.405],
            std=[0.229, 0.224, 0.226]
        )
    ])
)

# Create the DataLoader
test_dataloader = paddle.io.DataLoader(
    test_dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    drop_last=False,
    return_list=True
)
In [25]

# Main inference loop

info("Starting inference")

model.net.eval()
len_test = len(test_dataset)
test_patches = crop_patches(
    test_dataloader,
    ORIGINAL_SIZE,
    CROP_SIZE,
    STRIDE
)
with paddle.no_grad():
    for name, (t1, t2) in tqdm(test_patches, total=len_test):
        shape = paddle.shape(t1)
        pred = paddle.zeros(shape=(shape[0], 2, *shape[2:]))
        for i in range(0, shape[0], INFER_BATCH_SIZE):
            pred[i:i+INFER_BATCH_SIZE] = model.net(t1[i:i+INFER_BATCH_SIZE], t2[i:i+INFER_BATCH_SIZE])[0]
        # Take channel 1 (counting from 0) of the softmax output as the change probability
        prob = paddle.nn.functional.softmax(pred, axis=1)[:, 1]
        # Rebuild the full probability map from the patches
        prob = recons_prob_map(prob.numpy(), ORIGINAL_SIZE, CROP_SIZE, STRIDE)
        # The default threshold is 0.5, i.e. pixels whose change probability exceeds 0.5 are classified as changed
        out = quantize(prob > 0.5)

        imsave(osp.join(out_dir, name), out, check_contrast=False)

info("Inference finished")
Starting inference
100%|██████████| 363/363 [06:28<00:00, 1.05s/it]
Inference finished
In [26]

# Show inference results
# Re-run this cell to view different samples

def show_images_in_row(im_paths, fig, title=''):
    n = len(im_paths)
    fig.suptitle(title)
    axs = fig.subplots(nrows=1, ncols=n)
    for idx, (path, ax) in enumerate(zip(im_paths, axs)):
        # Remove tick marks and borders
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.spines['bottom'].set_visible(False)
        ax.spines['left'].set_visible(False)
        ax.get_xaxis().set_ticks([])
        ax.get_yaxis().set_ticks([])

        im = imread(path)
        ax.imshow(im)

# Number of samples to display
num_imgs_to_show = 4
# Draw samples at random
chosen_indices = random.choices(range(len_test), k=num_imgs_to_show)

# See https://stackoverflow.com/a/68209152
fig = plt.figure(constrained_layout=True)
fig.suptitle("Inference Results")

subfigs = fig.subfigures(nrows=3, ncols=1)

# Read the first-temporal-phase images
im_paths = [osp.join(DATA_DIR, test_dataset.samples[idx]['image_t1']) for idx in chosen_indices]
show_images_in_row(im_paths, subfigs[0], title='Image 1')

# Read the second-temporal-phase images
im_paths = [osp.join(DATA_DIR, test_dataset.samples[idx]['image_t2']) for idx in chosen_indices]
show_images_in_row(im_paths, subfigs[1], title='Image 2')

# Read the change maps
im_paths = [osp.join(out_dir, test_dataset.names[idx]) for idx in chosen_indices]
show_images_in_row(im_paths, subfigs[2], title='Change Map')

# Render the figure
fig.canvas.draw()
Image.frombytes('RGB', fig.canvas.get_width_height(), fig.canvas.tostring_rgb())
(image output)
In [27]

# Pack and compress the inference results into a zip file. If you changed the
# default output directory, adjust this command accordingly.
# Official typo: submission -> submisson

!zip -j submisson.zip /home/aistudio/exp/out/* > /dev/null
4. References
Introduction to the remote sensing data
PaddleRS documentation
