1. Dense Nested Attention Network for Infrared Small Target Detection

Paper: Dense Nested Attention Network for Infrared Small Target Detection

Research significance and value

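Single-frame infrared small target (SIRST) detection is widely used in applications such as maritime surveillance, early-warning systems, and precision guidance. Compared with general object detection, infrared small target detection has several distinctive characteristics:
1) Small: because of the long imaging distance, infrared targets are usually small, ranging from one pixel to a few dozen pixels in the image.
2) Dim: infrared targets usually have a low signal-to-clutter ratio (SCR) and are easily drowned out by strong noise and cluttered backgrounds.
3) Shapeless: infrared small targets have limited shape characteristics.
4) Changeable: the size and shape of infrared targets vary greatly across scenes.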

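The contributions of the paper are as follows:

  • A dense nested attention network (DNANet) is proposed to preserve small targets in deep layers.
  • An open-source dataset with rich targets (NUDT-SIRST) is released.
  • The method performs well on all existing SIRST datasets.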

The authors provide a PyTorch implementation (Code); this project reproduces it with Paddle.

2. Algorithm details

The core model of the paper is shown in the figure below.

[Figure: DNANet architecture]

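The model consists of three parts. (a) Feature extraction module: the input image is first fed into the dense nested interactive module (DNIM) for progressive feature fusion, and the features at different semantic levels are then adaptively enhanced by the channel and spatial attention module (CSAM). (b) Feature pyramid fusion module (FPFM): the enhanced features are upsampled and concatenated to fuse the outputs of multiple layers. (c) Eight-connected neighborhood clustering: the segmentation map is clustered to determine the centroid of each target region.

DNIM is the authors' modified UNet++ network, whose features are further fused and strengthened by the attention mechanism. The core of the algorithm lies in the design of this feature extraction network.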
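
Step (c) can be illustrated with a minimal sketch in plain Python. It is not the project's code (the metric code later in this post relies on skimage's `measure.label(..., connectivity=2)` for the same purpose); the function name `cluster_centroids` and the toy mask are made up for illustration.

```python
from collections import deque

def cluster_centroids(mask):
    """Group foreground pixels of a binary mask into 8-connected regions
    and return one (row, col) centroid per region (hypothetical helper)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the 8 neighbours of each pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            n = len(pixels)
            centroids.append((sum(p[0] for p in pixels) / n,
                              sum(p[1] for p in pixels) / n))
    return centroids

# Toy mask: two diagonal pixels form one region under 8-connectivity,
# and two vertically adjacent pixels form a second region.
mask = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(cluster_centroids(mask))  # [(0.5, 0.5), (2.5, 4.0)]
```

With 4-connectivity the two diagonal pixels would be split into separate regions, which is why eight-connectivity suits targets that are only a few pixels wide.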

3. Results

[Figure: results]

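4. Reproducing the paper

4.1 Environment dependencies

PaddlePaddle 2.3, scikit-image (skimage)

4.2 Dataset

The data are already included in this project, so nothing needs to be downloaded.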

Sirst Dai

[Figure: sample images from the dataset]

!unzip sirst/images.zip -d sirst/
!unzip sirst/masks.zip -d  sirst/

5. Evaluation metrics

The metrics are the same as in the paper: the ROC curve, the false-alarm rate, and the detection rate. A PR curve is added as well.
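
As a rough sketch of how the two rates are normalized (following the `PD_FA` class in this project), the detection rate is the fraction of ground-truth targets that were matched, and the false-alarm rate is the unmatched predicted area divided by the total number of pixels. The function name `detection_and_false_alarm` and all numbers below are made up for illustration.

```python
def detection_and_false_alarm(matched_targets, total_targets,
                              false_alarm_pixels, image_size, num_images):
    """Return (PD, FA) from accumulated counts (hypothetical helper)."""
    p_d = matched_targets / total_targets
    f_a = false_alarm_pixels / (image_size[0] * image_size[1] * num_images)
    return p_d, f_a

# Illustrative counts only: 9 of 10 targets matched, 20 false-alarm pixels
# accumulated over two 256x256 images.
p_d, f_a = detection_and_false_alarm(9, 10, 20, (256, 256), 2)
print(p_d, f_a)
```

PD is counted per target while FA is counted per pixel, so the two rates are not complementary.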

For an introduction to these metrics, see link. The Paddle-based metric code is given below.

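import numpy as np
import paddle
import paddle.nn.functional as F
from skimage import measure

'''
ROC Metric
'''
class ROCMetric(object):
    """Accumulates TP/FP/positive counts over score thresholds for the ROC and PR curves
    """
    def __init__(self, nclass, bins):  # bins sets how many discrete thresholds are sampled along the ROC curve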
        super(ROCMetric, self).__init__()
        self.nclass = nclass
        self.bins = bins
        self.tp_arr = np.zeros(self.bins + 1)
        self.pos_arr = np.zeros(self.bins + 1)
        self.fp_arr = np.zeros(self.bins + 1)
        self.neg_arr = np.zeros(self.bins + 1)
        self.class_pos = np.zeros(self.bins + 1)

    def update(self, preds, labels):
        for iBin in range(self.bins + 1):
            score_thresh = (iBin + 0.0) / self.bins
            i_tp, i_pos, i_fp, i_neg, i_class_pos = cal_tp_pos_fp_neg(
                preds, labels, self.nclass, score_thresh)
            self.tp_arr[iBin] += i_tp
            self.pos_arr[iBin] += i_pos
            self.fp_arr[iBin] += i_fp
            self.neg_arr[iBin] += i_neg
            self.class_pos[iBin] += i_class_pos

    def get(self):
        tp_rates = self.tp_arr / (self.pos_arr + 0.001)
        fp_rates = self.fp_arr / (self.neg_arr + 0.001)

        recall = self.tp_arr / (self.pos_arr + 0.001)
        precision = self.tp_arr / (self.class_pos + 0.001)

        return tp_rates, fp_rates, recall, precision

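    def reset(self):
        # size the accumulators from self.bins instead of a hard-coded 11
        self.tp_arr = np.zeros(self.bins + 1)
        self.pos_arr = np.zeros(self.bins + 1)
        self.fp_arr = np.zeros(self.bins + 1)
        self.neg_arr = np.zeros(self.bins + 1)
        self.class_pos = np.zeros(self.bins + 1)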

def cal_tp_pos_fp_neg(output, target, nclass, score_thresh):
    predict = paddle.cast((F.sigmoid(output) > score_thresh), 'float32')

    if len(target.shape) == 3:
        # target is a paddle.Tensor, so add the channel axis with paddle.unsqueeze
        target = paddle.unsqueeze(paddle.cast(target, 'float32'), axis=1)
    elif len(target.shape) == 4:
        target = paddle.cast(target, 'float32')
    else:
        raise ValueError("Unknown target dimension")

    intersection = predict * ((predict == target))
    intersection = paddle.cast(intersection, 'float32')
    tp = intersection.sum()
    fp = (predict * ((predict != target))).sum()
    tn = ((1 - predict) * ((predict == target))).sum()
    fn = (((predict != target)) * (1 - predict)).sum()
    pos = tp + fn
    neg = fp + tn
    class_pos = tp + fp

    return tp, pos, fp, neg, class_pos
'''
PD FA Metric
'''
class PD_FA():
    def __init__(self, nclass, bins, image_size):
        super(PD_FA, self).__init__()
        self.nclass = nclass
        self.bins = bins
        self.image_area_total = []
        self.image_area_match = []
        self.FA = np.zeros(self.bins + 1)
        self.PD = np.zeros(self.bins + 1)
        self.target = np.zeros(self.bins + 1)
        self.image_size = image_size

    def update(self, preds, labels, image_size):
        preds = preds* 255
        labels = labels * 255
        for iBin in range(self.bins + 1):
            score_thresh = iBin * (255 / self.bins)
            predits = np.array(preds > score_thresh).astype('int64')
            predits = np.reshape(predits, image_size)
            labelss = np.array(labels).astype('int64')  # P
            labelss = np.reshape(labelss, image_size)
            image = measure.label(predits, connectivity=2)
            coord_image = measure.regionprops(image)
            label = measure.label(labelss, connectivity=2)
            coord_label = measure.regionprops(label)
            self.target[iBin] += len(coord_label)
            self.image_area_total = []
            self.image_area_match = []
            self.distance_match = []
            self.dismatch = []
            for K in range(len(coord_image)):
                area_image = np.array(coord_image[K].area)
                self.image_area_total.append(area_image)

            for i in range(len(coord_label)):
                centroid_label = np.array(list(coord_label[i].centroid))
                for m in range(len(coord_image)):
                    centroid_image = np.array(list(coord_image[m].centroid))
                    distance = np.linalg.norm(centroid_image - centroid_label)
                    area_image = np.array(coord_image[m].area)
                    if distance < 3:
                        self.distance_match.append(distance)
                        self.image_area_match.append(area_image)

                        del coord_image[m]
                        break

            self.dismatch = [
                x for x in self.image_area_total if x not in self.image_area_match]
            self.FA[iBin] += np.sum(self.dismatch)
            self.PD[iBin] += len(self.distance_match)

    def get(self, img_num):

        Final_FA = self.FA / ((self.image_size[0] * self.image_size[1]) * img_num)
        Final_PD = self.PD / self.target

        return Final_FA, Final_PD

    def reset(self):
        self.FA = np.zeros([self.bins + 1])
        self.PD = np.zeros([self.bins + 1])
        self.target = np.zeros([self.bins + 1])  # also clear the target count used as the PD denominator
'''
mIOU Metric
'''
class mIoU():
    def __init__(self, nclass):
        super(mIoU, self).__init__()
        self.nclass = nclass
        self.reset()

    def update(self, preds, labels):
        correct, labeled = batch_pix_accuracy(preds, labels)
        inter, union = batch_intersection_union(preds, labels, self.nclass)
        self.total_correct += correct
        self.total_label += labeled
        self.total_inter += inter
        self.total_union += union

    def get(self):
        pixAcc = 1.0 * self.total_correct / (np.spacing(1) + self.total_label)
        IoU = 1.0 * self.total_inter / (np.spacing(1) + self.total_union)
        mIoU = IoU.mean()
        return pixAcc, mIoU

    def reset(self):
        self.total_inter = 0
        self.total_union = 0
        self.total_correct = 0
        self.total_label = 0

def batch_pix_accuracy(output, target):

    if len(target.shape) == 3:
        target = np.expand_dims(target, axis=1)
    elif len(target.shape) == 4:
        target = target
    else:
        raise ValueError("Unknown target dimension")

    assert output.shape == target.shape, "Predict and Label Shape Don't Match"
    predict = (output > 0)
    pixel_labeled = (target > 0).sum()
    pixel_correct = (((predict == target))*((target > 0))).sum()



    assert pixel_correct <= pixel_labeled, "Correct area should be smaller than Labeled"
    return pixel_correct, pixel_labeled


def batch_intersection_union(output, target, nclass):

    mini = 1
    maxi = 1
    nbins = 1
    predict = (output > 0)
    if len(target.shape) == 3:
        target = np.expand_dims(target, axis=1)
    elif len(target.shape) == 4:
        target = target
    else:
        raise ValueError("Unknown target dimension")
    intersection = predict * ((predict == target))

    area_inter, _  = np.histogram(intersection, bins=nbins, range=(mini, maxi))
    area_pred,  _  = np.histogram(predict, bins=nbins, range=(mini, maxi))
    area_lab,   _  = np.histogram(target, bins=nbins, range=(mini, maxi))
    area_union     = area_pred + area_lab - area_inter
    return area_inter, area_union
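
The intersection/union bookkeeping in `batch_intersection_union` can be checked by hand on a tiny case. The sketch below is a plain-Python stand-in, not the project's code: `binary_iou` is a hypothetical helper, and the two flattened masks are made up.

```python
import sys

def binary_iou(pred, target):
    """IoU of two flattened 0/1 masks, using union = pred + label - intersection."""
    area_inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    area_pred = sum(pred)
    area_lab = sum(target)
    area_union = area_pred + area_lab - area_inter
    # A tiny epsilon guards against division by zero, like np.spacing(1) above.
    return area_inter / (sys.float_info.epsilon + area_union)

pred   = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
# intersection = 2, prediction area = 3, label area = 3, union = 4
print(binary_iou(pred, target))
```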

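5.1 Baseline algorithms

At the end we compare this algorithm with classic segmentation algorithms. For that purpose we also implemented Unet and FCN in Paddle.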

import paddle 
from unet import Unet
from fcn import FCN
model = Unet(n_class=1)
model = paddle.Model(model)
model.summary((1, 1, 256, 256))
model = FCN(num_classes=1)
model = paddle.Model(model)
model.summary((1, 1, 256, 256))
!python -m pip install -U scikit-image

5.2 Basic blocks

The authors use the basic blocks of VGG and ResNet as the building blocks of the feature extraction network.

import paddle
from paddle import nn
import paddle.nn.functional as F
use_gpu = True
paddle.device.set_device('gpu:0') if use_gpu else paddle.device.set_device('cpu')
paddle.seed(1024)
class VGG_CBAM_Block(nn.Layer):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2D(in_channels, out_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2D(out_channels, out_channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2D(out_channels)
        self.ca = ChannelAttention(out_channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.ca(out) * out
        out = self.sa(out) * out
        out = self.relu(out)
        return out

class ChannelAttention(nn.Layer):
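    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.max_pool = nn.AdaptiveMaxPool2D(1)
        # Use the `ratio` argument (default 16) rather than a hard-coded 16.
        # Note: bias_attr=None keeps Paddle's default bias; bias_attr=False would disable it.
        self.fc1   = nn.Conv2D(in_planes, in_planes // ratio, 1, bias_attr=None)
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Conv2D(in_planes // ratio, in_planes, 1, bias_attr=None)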
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)

class SpatialAttention(nn.Layer):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1
        self.conv1 = nn.Conv2D(2, 1, kernel_size, padding=padding, bias_attr=None)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        avg_out = paddle.mean(x, axis=1, keepdim=True)
        max_out = paddle.max(x, axis=1, keepdim=True)
        x = paddle.concat([avg_out, max_out], axis=1)
        x = self.conv1(x)
        return self.sigmoid(x)

class Res_CBAM_block(nn.Layer):
    def __init__(self, in_channels, out_channels, stride = 1):
        super(Res_CBAM_block, self).__init__()
        self.conv1 = nn.Conv2D(in_channels, out_channels, kernel_size = 3, stride = stride, padding = 1)
        self.bn1 = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2D(out_channels, out_channels, kernel_size = 3, padding = 1)
        self.bn2 = nn.BatchNorm2D(out_channels)
        if stride != 1 or out_channels != in_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2D(in_channels, out_channels, kernel_size = 1, stride = stride),
                nn.BatchNorm2D(out_channels))
        else:
            self.shortcut = None

        self.ca = ChannelAttention(out_channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        residual = x
        if self.shortcut is not None:
            residual = self.shortcut(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.ca(out) * out
        out = self.sa(out) * out
        out += residual
        out = self.relu(out)
        return out
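
The attention applied inside both blocks above can be summarized as follows. This is read off the `ChannelAttention` and `SpatialAttention` code rather than quoted from the paper; bias terms are omitted, $W_1$ and $W_2$ stand for the 1x1 convolutions `fc1` and `fc2`, $f^{7\times 7}$ for the 7x7 convolution `conv1`, and $\mathrm{Mean}_c$, $\mathrm{Max}_c$ for the mean and max over the channel axis.

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(W_2\,\mathrm{ReLU}(W_1\,\mathrm{AvgPool}(F)) + W_2\,\mathrm{ReLU}(W_1\,\mathrm{MaxPool}(F))\big) \\
F'      &= M_c(F) \otimes F \\
M_s(F') &= \sigma\big(f^{7\times 7}([\mathrm{Mean}_c(F');\ \mathrm{Max}_c(F')])\big) \\
F''     &= M_s(F') \otimes F'
\end{aligned}
```

Here $F$ is the output of the second batch-norm layer, $\otimes$ is element-wise multiplication with broadcasting, and $F''$ is what the final ReLU (after the residual addition in `Res_CBAM_block`) receives.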

5.3 Dense feature interaction network

class DNANet(nn.Layer):
    def __init__(self, num_classes, input_channels, block, num_blocks, nb_filter,deep_supervision=False):
        super(DNANet, self).__init__()
        self.relu = nn.ReLU()
        self.deep_supervision = deep_supervision
        self.pool  = nn.MaxPool2D(2, 2)
        self.up    = nn.Upsample(scale_factor=2,   mode='bilinear', align_corners=True)
        self.down  = nn.Upsample(scale_factor=0.5, mode='bilinear', align_corners=True)

        self.up_4  = nn.Upsample(scale_factor=4,   mode='bilinear', align_corners=True)
        self.up_8  = nn.Upsample(scale_factor=8,   mode='bilinear', align_corners=True)
        self.up_16 = nn.Upsample(scale_factor=16,  mode='bilinear', align_corners=True)

        self.conv0_0 = self._make_layer(block, input_channels, nb_filter[0])
        self.conv1_0 = self._make_layer(block, nb_filter[0],  nb_filter[1], num_blocks[0])
        self.conv2_0 = self._make_layer(block, nb_filter[1],  nb_filter[2], num_blocks[1])
        self.conv3_0 = self._make_layer(block, nb_filter[2],  nb_filter[3], num_blocks[2])
        self.conv4_0 = self._make_layer(block, nb_filter[3],  nb_filter[4], num_blocks[3])

        self.conv0_1 = self._make_layer(block, nb_filter[0] + nb_filter[1],  nb_filter[0])
        self.conv1_1 = self._make_layer(block, nb_filter[1] + nb_filter[2] + nb_filter[0],  nb_filter[1], num_blocks[0])
        self.conv2_1 = self._make_layer(block, nb_filter[2] + nb_filter[3] + nb_filter[1],  nb_filter[2], num_blocks[1])
        self.conv3_1 = self._make_layer(block, nb_filter[3] + nb_filter[4] + nb_filter[2],  nb_filter[3], num_blocks[2])

        self.conv0_2 = self._make_layer(block, nb_filter[0]*2 + nb_filter[1], nb_filter[0])
        self.conv1_2 = self._make_layer(block, nb_filter[1]*2 + nb_filter[2]+ nb_filter[0], nb_filter[1], num_blocks[0])
        self.conv2_2 = self._make_layer(block, nb_filter[2]*2 + nb_filter[3]+ nb_filter[1], nb_filter[2], num_blocks[1])

        self.conv0_3 = self._make_layer(block, nb_filter[0]*3 + nb_filter[1], nb_filter[0])
        self.conv1_3 = self._make_layer(block, nb_filter[1]*3 + nb_filter[2]+ nb_filter[0], nb_filter[1], num_blocks[0])

        self.conv0_4 = self._make_layer(block, nb_filter[0]*4 + nb_filter[1], nb_filter[0])

        self.conv0_4_final = self._make_layer(block, nb_filter[0]*5, nb_filter[0])

        self.conv0_4_1x1 = nn.Conv2D(nb_filter[4], nb_filter[0], kernel_size=1, stride=1)
        self.conv0_3_1x1 = nn.Conv2D(nb_filter[3], nb_filter[0], kernel_size=1, stride=1)
        self.conv0_2_1x1 = nn.Conv2D(nb_filter[2], nb_filter[0], kernel_size=1, stride=1)
        self.conv0_1_1x1 = nn.Conv2D(nb_filter[1], nb_filter[0], kernel_size=1, stride=1)

        if self.deep_supervision:
            self.final1 = nn.Conv2D (nb_filter[0], num_classes, kernel_size=1)
            self.final2 = nn.Conv2D (nb_filter[0], num_classes, kernel_size=1)
            self.final3 = nn.Conv2D (nb_filter[0], num_classes, kernel_size=1)
            self.final4 = nn.Conv2D (nb_filter[0], num_classes, kernel_size=1)
        else:
            self.final  = nn.Conv2D (nb_filter[0], num_classes, kernel_size=1)

    def _make_layer(self, block, input_channels,  output_channels, num_blocks=1):
        layers = []
        layers.append(block(input_channels, output_channels))
        for i in range(num_blocks-1):
            layers.append(block(output_channels, output_channels))
        return nn.Sequential(*layers)

    def forward(self, input):
        x0_0 = self.conv0_0(input)
        x1_0 = self.conv1_0(self.pool(x0_0))
        x0_1 = self.conv0_1(paddle.concat([x0_0, self.up(x1_0)], 1))

        x2_0 = self.conv2_0(self.pool(x1_0))
        x1_1 = self.conv1_1(paddle.concat([x1_0, self.up(x2_0),self.down(x0_1)], 1))
        x0_2 = self.conv0_2(paddle.concat([x0_0, x0_1, self.up(x1_1)], 1))

        x3_0 = self.conv3_0(self.pool(x2_0))
        x2_1 = self.conv2_1(paddle.concat([x2_0, self.up(x3_0),self.down(x1_1)], 1))
        x1_2 = self.conv1_2(paddle.concat([x1_0, x1_1, self.up(x2_1),self.down(x0_2)], 1))
        x0_3 = self.conv0_3(paddle.concat([x0_0, x0_1, x0_2, self.up(x1_2)], 1))

        x4_0 = self.conv4_0(self.pool(x3_0))
        x3_1 = self.conv3_1(paddle.concat([x3_0, self.up(x4_0),self.down(x2_1)], 1))
        x2_2 = self.conv2_2(paddle.concat([x2_0, x2_1, self.up(x3_1),self.down(x1_2)], 1))
        x1_3 = self.conv1_3(paddle.concat([x1_0, x1_1, x1_2, self.up(x2_2),self.down(x0_3)], 1))
        x0_4 = self.conv0_4(paddle.concat([x0_0, x0_1, x0_2, x0_3, self.up(x1_3)], 1))

        Final_x0_4 = self.conv0_4_final(
            paddle.concat([self.up_16(self.conv0_4_1x1(x4_0)),self.up_8(self.conv0_3_1x1(x3_1)),
                       self.up_4 (self.conv0_2_1x1(x2_2)),self.up  (self.conv0_1_1x1(x1_3)), x0_4], 1))

        if self.deep_supervision:
            output1 = self.final1(x0_1)
            output2 = self.final2(x0_2)
            output3 = self.final3(x0_3)
            output4 = self.final4(Final_x0_4)
            return [output1, output2, output3, output4]
        else:
            output = self.final(Final_x0_4)
            return output
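
The input widths passed to `_make_layer` in the constructor above follow one pattern, which the sketch below reproduces. The helper `node_in_channels` is hypothetical and only mirrors the arithmetic in the constructor: node `conv{i}_{j}` (with `j >= 1`) concatenates `j` earlier feature maps from its own level, one upsampled map from the deeper level `i + 1`, and, except at level 0, one downsampled map from the shallower level `i - 1`.

```python
def node_in_channels(i, j, nb_filter):
    """Input channels of node conv{i}_{j} (j >= 1) in the dense nested module."""
    same_level = j * nb_filter[i]                      # x{i}_0 ... x{i}_{j-1}
    from_deeper = nb_filter[i + 1]                     # upsampled x{i+1}_{j-1}
    from_shallower = nb_filter[i - 1] if i > 0 else 0  # downsampled x{i-1}_{j}
    return same_level + from_deeper + from_shallower

nb_filter = [16, 32, 64, 128, 256]
print(node_in_channels(0, 1, nb_filter))  # 16 + 32 = 48
print(node_in_channels(1, 2, nb_filter))  # 2*32 + 64 + 16 = 144
print(node_in_channels(3, 1, nb_filter))  # 128 + 256 + 64 = 448
print(node_in_channels(0, 4, nb_filter))  # 4*16 + 32 = 96
```

Checking a few nodes this way is a quick guard against channel mismatches when changing `nb_filter`.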

5.4 Verifying the network

in_channels = 1
nb_filter = [16, 16, 32, 64, 128]
num_blocks = [2, 2, 2, 2]
model = DNANet(num_classes=1,input_channels=in_channels, block=Res_CBAM_block, num_blocks=num_blocks, nb_filter=nb_filter)
model = paddle.Model(model)
model.summary((1, 1, 256, 256))

6. Building the data pipeline

from paddle.io import Dataset,DataLoader
from paddle.optimizer import Adam
from paddle.optimizer.lr import ReduceOnPlateau
from paddle.vision.transforms import transforms as T
import matplotlib.pyplot as plt
from skimage.segmentation import mark_boundaries
import glob 
import os 
from PIL import Image 
import numpy as np 
from utils import * 
from copy import deepcopy
from scipy.integrate import simps
class InfraredDataset(Dataset):
    def __init__(self, dataset_dir, image_index, image_size=256):
        super(InfraredDataset, self).__init__()
        self.dataset_dir = dataset_dir
        self.image_index = image_index
        self.transformer = T.Compose([
            T.Resize((int(image_size), int(image_size))),
            T.Grayscale(),
            T.ToTensor(),
        ])

    def __getitem__(self, index):
        image_index = self.image_index[index].strip('\n')
        image_path = os.path.join(self.dataset_dir, 'images', '%s.png' % image_index)
        label_path = os.path.join(self.dataset_dir, 'masks', '%s_pixels0.png' % image_index)
        image = Image.open(image_path)
        label = Image.open(label_path)
        train_image = self.transformer(image)
        label = self.transformer(label)
        return paddle.cast(train_image, 'float32'), paddle.cast(label, 'float32')

    def __len__(self):
        return len(self.image_index)
f = open('./sirst/idx_427/trainval.txt').readlines()
ds = InfraredDataset(dataset_dir='./sirst', image_index=f)
image , label = next(iter(ds))
image, label = image.numpy(), label.numpy()

6.1 Verifying the data pipeline

plt.subplot(121)
plt.imshow(image[0], cmap='gray')
plt.subplot(122)
plt.imshow(np.uint8(label[0]), cmap='gray')

[Figure: a sample image (left) and its mask (right)]

6.2 Building the DataLoader

dataset_dir = './sirst'
train_index = open('./sirst/idx_320/train.txt').readlines()
test_index = open('./sirst/idx_320/test.txt').readlines()
batch_size = 8
image_size = (256, 256)
train_ds = InfraredDataset(dataset_dir, train_index, image_size[0])
test_ds = InfraredDataset(dataset_dir, test_index, image_size[0])
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=8)
test_dl = DataLoader(test_ds, batch_size=1,
                        shuffle=False,  num_workers=8)

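7. Model training

To save space, the training loop and the loss are defined in utils.py. Instead of the IoU loss used in the paper, we use the Dice loss, which is more common in segmentation.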
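
Since utils.py is not shown here, the following is only a minimal plain-Python sketch of a soft Dice loss, not the project's implementation. The function name `dice_loss`, the flattened list inputs, and the smoothing constant `smooth=1.0` are assumptions for illustration.

```python
def dice_loss(probs, target, smooth=1.0):
    """Soft Dice loss on flattened inputs.

    probs:  predicted foreground probabilities in [0, 1]
    target: ground-truth mask values (0 or 1)
    smooth: assumed smoothing constant that avoids division by zero
    """
    intersection = sum(p * t for p, t in zip(probs, target))
    total = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)

print(dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0]))  # perfect overlap: 0.0
print(dice_loss([1.0, 1.0, 0.0, 0.0], [0, 0, 1, 1]))  # no overlap: 0.8
```

Because the loss is a ratio of overlap to total foreground mass, the large number of background pixels does not dominate it, which is helpful when targets cover only a few pixels.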

init_epoch = 0  # starting epoch
epochs = 100  # total number of training epochs
model_type ='dnanet'
in_channels = 1
nb_filter = [16, 32, 64, 128, 256]
num_blocks = [2, 2, 2, 2]
model = DNANet(num_classes=1,input_channels=in_channels, block=Res_CBAM_block, num_blocks=num_blocks, nb_filter=nb_filter)
ckpt = 'weights/%s_best.params' % model_type  # where the best weights are saved
scheduler = paddle.optimizer.lr.ReduceOnPlateau(learning_rate=0.001, factor=0.5, patience=5, verbose=True)
optimizer = paddle.optimizer.AdamW(learning_rate=scheduler, parameters=model.parameters())

def train():
    best_wts = deepcopy(model.state_dict())
    best_loss = float('inf')

    for epoch in range(init_epoch, epochs):
        model.train()
        train_loss, train_metric = loss_epoch(
            epoch, model, loss_func, train_dl, sanity_check=False, opt=optimizer)

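        model.eval()
        with paddle.no_grad():
            # Note: validation reuses train_dl here; pass test_dl instead to select the best model on held-out data.
            val_loss, val_metric = loss_epoch(
                epoch, model, loss_func, train_dl, sanity_check=False, opt=None, roc=None)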

        if val_loss < best_loss:
            best_loss = val_loss
            best_wts = deepcopy(model.state_dict())
            print("Save Best Model")
            paddle.save(model.state_dict(), ckpt)
train()
model.set_state_dict(paddle.load(ckpt))
model.eval()
TF = T.Compose([
    T.Grayscale(),
    T.Resize((int(image_size[0]), int(image_size[1]))),
    T.ToTensor(),
])
image = Image.open('./sirst/images/Misc_1.png')
label = Image.open('./sirst/masks/Misc_1_pixels0.png')
tensor_img = TF(image)
tensor_img = paddle.unsqueeze(tensor_img, 0)
pred = model(tensor_img)[0]

8. Model validation

import cv2 
w, h = image.size
prediction = F.sigmoid(pred[0])
prediction = cv2.resize(prediction.numpy(), (w, h))
plt.figure(figsize=(30, 30))
plt.subplot(131)
plt.title('Input')
plt.imshow(np.array(image), cmap='gray')
plt.subplot(132)
plt.title('Pred')
plt.imshow(prediction, cmap='gray')
plt.subplot(133)
plt.title('Label')
plt.imshow(label, cmap='gray')

[Figure: input image, prediction, and label]

from tqdm import tqdm

def evaluate(model, ckpt):
    roc = ROCMetric(1, 10)
    pd_fa = PD_FA(1, 10, image_size)
    miou  = mIoU(1)
    model.set_state_dict(paddle.load(ckpt))
    model.eval()
    with paddle.no_grad():  # inference only, no gradients needed
        for i, (xb, yb) in enumerate(tqdm(test_dl)):
            output = model(xb)
            preds = F.sigmoid(output)
            roc.update(output, yb)
            pd_fa.update(preds, yb, image_size)
            miou.update(output, yb)
    true_positive_rate, false_positive_rate, recall, precision = roc.get()
    FA, PD = pd_fa.get(img_num=len(test_dl))
    _, mean_IOU = miou.get()
    return true_positive_rate, false_positive_rate, recall, precision, FA, PD, mean_IOU
res = []
for model_type in ['unet', 'dnanet',  'fcn']:
    if model_type == 'dnanet':
        in_channels = 1
        nb_filter = [16, 32, 64, 128, 256]
        num_blocks = [2, 2, 2, 2]
        model = DNANet(num_classes=1,input_channels=in_channels, block=Res_CBAM_block, num_blocks=num_blocks, nb_filter=nb_filter)

    elif model_type == 'unet':
        model = Unet(n_class=1)
    else:
        model = FCN(1)

    ckpt = 'weights/%s_best.params' % model_type  # location of the saved best weights
    result = evaluate(model, ckpt)
    res.append(result)
# each entry: true_positive_rate, false_positive_rate, recall, precision, FA, PD, mean_IOU
plt.title('ROC')
plt.plot(res[0][1], res[0][0], label='UNet')
plt.plot(res[1][1], res[1][0], label='DNA')
plt.plot(res[2][1], res[2][0], label='FCN')
plt.legend()
plt.xlabel('False-positive rate')
plt.ylabel('True-positive rate')

[Figure: ROC curves of UNet, DNANet, and FCN]

plt.title('R-P')
plt.plot(res[0][2], res[0][3], label='Unet')
plt.plot(res[1][2], res[1][3], label='DNA')
plt.plot(res[2][2], res[2][3], label='FCN')
plt.legend()
plt.xlabel('Recall')
plt.ylabel('Precision')

[Figure: precision-recall curves of UNet, DNANet, and FCN]

print(' DNANet   PD {:.3f}  FA {:.6f}  IOU {:.3f}'.format(res[1][4][0], res[1][5][0],res[1][6]))
 DNANet   PD 0.990  FA 0.000489  IOU 0.799

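9. Conclusion

With the Dice loss, the Paddle reproduction scores slightly higher than the results reported in the paper.

Algorithm             mIoU (%)   PD (%)
Original paper        79.26      98.48
Paddle reproduction   79.90      99.00

In the PR and ROC curves above, DNANet outperforms Unet and FCN, which also demonstrates the effectiveness of the algorithm. In addition, the DNANet model is only 18 MB, whereas the Unet and FCN models reach 100 MB and 43 MB respectively.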

This article is only a repost. Original project: https://aistudio.baidu.com/aistudio/projectdetail/4262944
