[5th Paper Reproduction Challenge - Semantic Segmentation] ESPNet
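ESPNet for the 5th Paper Reproduction Challenge. ESPNet is a semantic segmentation model. The reproduction target was 60.30% mIoU on the Cityscapes validation set; this reproduction reaches 61.82% mIoU, and the algorithm has been merged into PaddleSeg.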
[Paper Reproduction Challenge] ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
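The paper proposes the Efficient Spatial Pyramid (ESP) module. Based on convolution factorization, it decomposes a standard convolution into point-wise convolutions and a spatial pyramid of dilated convolutions, which substantially improves efficiency in computation, memory, and power. ESPNet reaches 60.30% mIoU on the Cityscapes validation set; this reproduction reaches 61.82% mIoU, and the algorithm has been merged into PaddleSeg.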
Reference code: https://github.com/sacmehta/ESPNet
Project repository: https://github.com/simuler/ESPNet
I. Model Architecture
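As shown in the figure above, the paper presents four network architectures; the first three output a mask at 1/8 of the input resolution. The paper also introduces a hyperparameter for the number of ESP modules stacked at a level (`level2_depth` and `level3_depth` in the code). Because the feature maps in the first two stages are large, and therefore expensive in both computation and memory, ESP modules are stacked only in the later stages.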
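ESPNet-A is the base network: it takes an RGB image as input, learns features at different spatial levels with ESP modules, and produces the mask with a 1x1 convolution. ESPNet-B improves information flow by sharing the feature maps of the previous strided ESP module and the previous ESP module. ESPNet-C reinforces the input image inside ESPNet-B (a down-sampled copy of the image is fed into the later stages), further improving information flow. Finally, ESPNet builds on ESPNet-C by adding a lightweight decoder, which produces a mask at the same resolution as the input image.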
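The decoder's upsampling layers in the core code (`level3_up`, `level2_up`, `out_up`) are all `Conv2DTranspose` with kernel size 2, stride 2 and padding 0. The sketch below, assuming the standard transposed-convolution output-size formula with dilation 1 and an input whose sides are divisible by 8, checks that three such layers take a 1/8 map back to full resolution for a 1024x512 crop (the crop size in the training config). `conv_transpose_out` and `decoder_sizes` are illustrative helpers, not part of PaddleSeg.

```python
# Sketch: three stride-2 transposed convolutions restore a 1/8 map to full size.
# Output size of a transposed convolution along one axis (dilation 1):
#   out = (in - 1) * stride - 2 * padding + kernel_size + output_padding


def conv_transpose_out(size, kernel_size=2, stride=2, padding=0, output_padding=0):
    """Spatial size produced by a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding


def decoder_sizes(height, width, num_upsamples=3):
    """Trace (height, width) from the 1/8-resolution map up through the decoder."""
    h, w = height // 8, width // 8  # ESPNet-A/B/C output masks at 1/8 resolution
    sizes = [(h, w)]
    for _ in range(num_upsamples):
        h, w = conv_transpose_out(h), conv_transpose_out(w)
        sizes.append((h, w))
    return sizes


print(decoder_sizes(512, 1024))  # [(64, 128), (128, 256), (256, 512), (512, 1024)]
```

With kernel size equal to stride and no padding, each layer exactly doubles the size, so no cropping or `output_padding` is needed.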
II. ESP (Efficient Spatial Pyramid Module)
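As shown in figure (a), the ESP module uses convolution factorization to decompose a standard convolution into a point-wise convolution and a spatial pyramid of dilated convolutions.
Step 1 (reduce): for an input with M channels and K branches, a 1x1 point-wise convolution first reduces the channel count to d = N/K, where N is the module's output width.
Step 2 (split and transform): the reduced feature map is convolved in parallel by 3x3 kernels with different dilation rates (1, 2, 4, 8 and 16 in the code below). The Paddle code is: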
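```python
# Inside the ESP module's __init__ (K = 5 parallel 3x3 branches).
# padding == dilation keeps the spatial size unchanged for a 3x3 kernel.
# d_conv1 outputs remain_channels (what is left after the other four branches),
# so the concatenated branch outputs add up to the module's output width.
self.d_conv1 = nn.Conv2D(branch_channels, remain_channels, 3, padding=1, bias_attr=False)
self.d_conv2 = nn.Conv2D(branch_channels, branch_channels, 3, padding=2, dilation=2, bias_attr=False)
self.d_conv4 = nn.Conv2D(branch_channels, branch_channels, 3, padding=4, dilation=4, bias_attr=False)
self.d_conv8 = nn.Conv2D(branch_channels, branch_channels, 3, padding=8, dilation=8, bias_attr=False)
self.d_conv16 = nn.Conv2D(branch_channels, branch_channels, 3, padding=16, dilation=16, bias_attr=False)
```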
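The arithmetic behind these five branches can be checked without Paddle. The sketch below verifies that `padding == dilation` preserves the spatial size, lists the extent each dilated 3x3 kernel covers, and compares weight counts of a standard 3x3 convolution against an idealized reduce + split-transform ESP module. The channel split rule in `split_channels` (four branches of `out // 5`, the remainder to the first) and the assumption that N is divisible by K in `param_counts` are simplifications for illustration; biases and batch norm are ignored.

```python
# Back-of-the-envelope arithmetic for one ESP module (K = 5 branches, 3x3 kernels).


def conv_out(size, kernel_size=3, stride=1, padding=0, dilation=1):
    """Output size of a (dilated) convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def effective_kernel(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return (kernel_size - 1) * dilation + 1


def split_channels(out_channels, k=5):
    """Assumed split: k-1 branches of width out//k, the first takes the remainder."""
    branch = out_channels // k
    remain = out_channels - (k - 1) * branch
    return remain, branch


def param_counts(m, n_out, k=5, kernel_size=3):
    """Weights (no bias) of a standard conv vs. an idealized ESP module."""
    standard = kernel_size ** 2 * m * n_out
    d = n_out // k                             # assumes n_out divisible by k
    reduce = m * d                             # 1x1 point-wise reduction M -> d
    transform = k * kernel_size ** 2 * d * d   # k parallel d -> d dilated 3x3 convs
    return standard, reduce + transform


dilations = [1, 2, 4, 8, 16]
print([conv_out(64, padding=r, dilation=r) for r in dilations])  # [64, 64, 64, 64, 64]
print([effective_kernel(3, r) for r in dilations])               # [3, 5, 9, 17, 33]
print(split_channels(128))                                       # (28, 25)
print(param_counts(100, 100))                                    # (90000, 20000)
```

For M = N = 100 the idealized ESP module needs MN/K + (3N)^2/K = 2,000 + 18,000 weights versus 90,000 for a standard 3x3 convolution, while its largest branch covers a 33x33 window.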
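Figure (b) shows the block diagram of the ESP module. Combining the outputs of dilated convolutions with large dilation rates tends to produce gridding artifacts, so the paper uses hierarchical feature fusion (HFF) to remove them. A skip connection between the module's input and output is also added to improve information flow.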
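A NumPy sketch of the merge step described above, assuming the ordering used by the reference implementation: the dilation-1 output is kept as-is, and the outputs with dilation 2, 4, 8 and 16 are summed cumulatively (d2, d2+d4, d2+d4+d8, ...) before everything is concatenated on the channel axis. `hff_merge` is a hypothetical helper working on NCHW arrays, not a PaddleSeg API; the optional `residual` argument stands in for the input/output skip connection.

```python
import numpy as np


def hff_merge(branches, residual=None):
    """Hierarchical feature fusion over branch outputs of shape (N, C_i, H, W).

    branches[0] is the dilation-1 output (its width may differ);
    branches[1:] share one shape and are ordered by increasing dilation.
    """
    fused = [branches[0]]
    running = branches[1]
    fused.append(running)
    for b in branches[2:]:
        running = running + b          # add the next dilation to the running sum
        fused.append(running)
    out = np.concatenate(fused, axis=1)
    if residual is not None:           # optional skip connection from the input
        out = out + residual
    return out


d1 = np.full((1, 2, 4, 4), 1.0)                                # dilation-1 branch
others = [np.full((1, 1, 4, 4), v) for v in (2.0, 3.0, 4.0, 5.0)]
merged = hff_merge([d1] + others)
print(merged.shape)                 # (1, 6, 4, 4)
print(merged[0, :, 0, 0].tolist())  # [1.0, 1.0, 2.0, 5.0, 9.0, 14.0]
```

Because the sums are additions rather than extra convolutions, HFF adds essentially no parameters or compute.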
III. HFF Comparison
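The figure above compares ESP modules using 3x3 dilated kernels with dilation rate r = 2. The module with hierarchical feature fusion (HFF) removes the gridding artifacts and produces a cleaner segmentation.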
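To see where gridding comes from, the toy 1D sketch below computes the receptive-field footprint of stacked dilated convolutions with all-ones kernels. Two stacked dilation-2 layers only ever touch even offsets, leaving holes; putting a dilation-1 layer in the path fills them. This illustrates the artifact and why mixing in smaller dilation rates helps; it is a simplification, not the HFF computation itself, and `dilated_kernel` / `stacked_response` are illustrative helpers.

```python
import numpy as np


def dilated_kernel(kernel_size, dilation):
    """All-ones 1D kernel with (dilation - 1) zeros inserted between taps."""
    k = np.zeros((kernel_size - 1) * dilation + 1)
    k[::dilation] = 1.0
    return k


def stacked_response(dilations, kernel_size=3):
    """Impulse response (receptive-field footprint) of stacked 1D dilated convs."""
    response = np.array([1.0])
    for r in dilations:
        response = np.convolve(response, dilated_kernel(kernel_size, r))
    return response


print((stacked_response([2, 2]) > 0).astype(int).tolist())  # [1, 0, 1, 0, 1, 0, 1, 0, 1]
print((stacked_response([1, 2]) > 0).astype(int).tolist())  # [1, 1, 1, 1, 1, 1, 1]
```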
IV. Experimental Results
The figure shows ESPNet's results on the Cityscapes dataset, with an mIoU of 60.3%.
VI. Core Code
```python
import paddle
import paddle.nn as nn

from paddleseg.models import layers
from paddleseg.utils import utils

# ESPNetEncoder, BNPReLU and DilatedResidualBlock are defined elsewhere in the
# same model file and are not reproduced here.


class ESPNetV1(nn.Layer):
    """
    The ESPNetV1 implementation based on PaddlePaddle.

    The original article refers to
    Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi.
    "ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation"
    (https://arxiv.org/abs/1803.06815).

    Args:
        num_classes (int): The unique number of target classes.
        in_channels (int, optional): Number of input channels. Default: 3.
        level2_depth (int, optional): Number of DilatedResidualBlock modules stacked at level 2. Default: 2.
        level3_depth (int, optional): Number of DilatedResidualBlock modules stacked at level 3. Default: 3.
        pretrained (str, optional): The path or url of pretrained model. Default: None.
    """

    def __init__(self,
                 num_classes,
                 in_channels=3,
                 level2_depth=2,
                 level3_depth=3,
                 pretrained=None):
        super().__init__()
        self.encoder = ESPNetEncoder(num_classes, in_channels, level2_depth,
                                     level3_depth)

        # Each Conv2DTranspose below (kernel 2, stride 2, padding 0) doubles the resolution.
        self.level3_up = nn.Conv2DTranspose(num_classes,
                                            num_classes,
                                            2,
                                            stride=2,
                                            padding=0,
                                            output_padding=0,
                                            bias_attr=False)
        self.br3 = layers.SyncBatchNorm(num_classes)
        self.level2_proj = nn.Conv2D(in_channels + 128,
                                     num_classes,
                                     1,
                                     bias_attr=False)
        self.combine_l2_l3 = nn.Sequential(
            BNPReLU(2 * num_classes),
            DilatedResidualBlock(2 * num_classes, num_classes, residual=False),
        )
        self.level2_up = nn.Sequential(
            nn.Conv2DTranspose(num_classes,
                               num_classes,
                               2,
                               stride=2,
                               padding=0,
                               output_padding=0,
                               bias_attr=False),
            BNPReLU(num_classes),
        )
        self.out_proj = layers.ConvBNPReLU(16 + in_channels + num_classes,
                                           num_classes,
                                           3,
                                           padding='same',
                                           stride=1)
        self.out_up = nn.Conv2DTranspose(num_classes,
                                         num_classes,
                                         2,
                                         stride=2,
                                         padding=0,
                                         output_padding=0,
                                         bias_attr=False)
        self.pretrained = pretrained

    def init_weight(self):
        if self.pretrained is not None:
            utils.load_entire_model(self, self.pretrained)

    def forward(self, x):
        # p1, p2, p3: encoder features at 1/2, 1/4 and 1/8 of the input resolution.
        p1, p2, p3 = self.encoder(x)
        up_p3 = self.level3_up(p3)  # 1/8 -> 1/4
        combine = self.combine_l2_l3(paddle.concat([up_p3, p2], axis=1))
        up_p2 = self.level2_up(combine)  # 1/4 -> 1/2
        combine = self.out_proj(paddle.concat([up_p2, p1], axis=1))
        out = self.out_up(combine)  # 1/2 -> full resolution
        return [out]
```
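The channel widths of the two `paddle.concat` calls in `forward` follow directly from the layer definitions in `__init__`. The sketch below derives how many channels `p1` and `p2` must carry for those layers to accept the concatenated inputs; it assumes the encoder returns features with exactly these widths (note that in this excerpt `forward` does not call `level2_proj` or `br3` itself). `decoder_channel_trace` is an illustrative helper, not part of PaddleSeg.

```python
def decoder_channel_trace(num_classes=19, in_channels=3):
    """Channel counts implied by the layer definitions in ESPNetV1.__init__."""
    up_p3 = num_classes                          # level3_up keeps num_classes channels
    combine_in = 2 * num_classes                 # combine_l2_l3 starts with BNPReLU(2 * num_classes)
    p2 = combine_in - up_p3                      # so p2 must carry num_classes channels
    up_p2 = num_classes                          # DilatedResidualBlock -> num_classes, then level2_up
    out_proj_in = 16 + in_channels + num_classes # input width expected by out_proj
    p1 = out_proj_in - up_p2                     # so p1 must carry 16 + in_channels channels
    return {"p1": p1, "p2": p2, "up_p2": up_p2, "up_p3": up_p3, "out": num_classes}


print(decoder_channel_trace(num_classes=19, in_channels=3))
```

For Cityscapes (19 classes, RGB input) every traced tensor happens to have 19 channels, since 16 + 3 = 19.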
VII. Try ESPNet Online
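Run the cells below to train and validate ESPNet.
Step 1: extract the Cityscapes dataset.
Step 2: train ESPNet.
Step 3: evaluate ESPNet on the validation set. The result below uses the best checkpoint saved during training; the corresponding logs and VisualDL files can be downloaded from the links in the reproduction results.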
```
# step 1: unzip data
%cd ~/data/data64550/
!tar -xf cityscapes.tar
# step 2: train
%cd ~/ESPNet/
!python train.py --config /home/aistudio/ESPNet/configs/espnetv1/espnetv1_cityscapes_1024x512_120k.yml --do_eval --use_vdl --log_iter 10 --save_interval 2000 --save_dir output
# step 3: val
%cd /home/aistudio/ESPNet/
!python val.py --config /home/aistudio/ESPNet/configs/espnetv1/espnetv1_cityscapes_1024x512_120k.yml --model_path output/best_model/model.pdparams
```
```
/home/aistudio/ESPNet
2022-01-07 10:32:36 [INFO]
---------------Config Information---------------
batch_size: 4
iters: 120000
loss:
  coef:
  - 1
  types:
  - ignore_index: 255
    type: CrossEntropyLoss
    weight:
    - 2.79834108
    - 6.92945723
    - 3.84068512
    - 9.94349362
    - 9.77098823
    - 9.51484
    - 10.30981624
    - 9.94307377
    - 4.64933892
    - 9.55759938
    - 7.86692178
    - 9.53126629
    - 10.3496365
    - 6.67234062
    - 10.26054204
    - 10.28785275
    - 10.28988296
    - 10.40546021
    - 10.13848367
lr_scheduler:
  end_lr: 0.0
  learning_rate: 0.001
  power: 0.9
  type: PolynomialDecay
model:
  in_channels: 3
  level2_depth: 2
  level3_depth: 8
  num_classes: 19
  type: ESPNetV1
optimizer:
  type: adam
  weight_decay: 0.0002
train_dataset:
  dataset_root: /home/aistudio/data/data64550/cityscapes
  mode: train
  transforms:
  - max_scale_factor: 2.0
    min_scale_factor: 0.5
    scale_step_size: 0.25
    type: ResizeStepScaling
  - crop_size:
    - 1024
    - 512
    type: RandomPaddingCrop
  - type: RandomHorizontalFlip
  - brightness_range: 0.4
    contrast_range: 0.4
    saturation_range: 0.4
    type: RandomDistort
  - type: Normalize
  type: Cityscapes
val_dataset:
  dataset_root: /home/aistudio/data/data64550/cityscapes
  mode: val
  transforms:
  - type: Normalize
  type: Cityscapes
------------------------------------------------
```
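The config uses `PolynomialDecay` with `learning_rate: 0.001`, `end_lr: 0.0` and `power: 0.9`. A minimal sketch of that schedule, assuming the number of decay steps equals `iters` (120000) and no cycling, which is how the polynomial decay formula is usually applied for these configs:

```python
def poly_lr(step, base_lr=0.001, end_lr=0.0, power=0.9, decay_steps=120000):
    """Polynomial decay: (base - end) * (1 - step / decay_steps) ** power + end."""
    step = min(step, decay_steps)  # hold end_lr once decay_steps is reached (no cycling)
    return (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


for s in (0, 30000, 60000, 90000, 120000):
    print(s, poly_lr(s))
```

With `power` below 1 the rate falls more slowly early on than a linear schedule would: halfway through training it is still about 0.000536, roughly 54% of the initial value.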
```
W0107 10:32:36.203390  9343 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0107 10:32:36.203438  9343 device_context.cc:465] device: 0, cuDNN Version: 7.6.
2022-01-07 10:32:40 [INFO] Loading pretrained model from output/best_model/model.pdparams
2022-01-07 10:32:40 [INFO] There are 211/211 variables loaded into ESPNetV1.
2022-01-07 10:32:40 [INFO] Loaded trained params of model successfully
2022-01-07 10:32:40 [INFO] Start evaluating (total_samples: 500, total_iters: 500)...
500/500 [==============================] - 81s 162ms/step - batch_cost: 0.1618 - reader cost: 0.1204
2022-01-07 10:34:01 [INFO] [EVAL] #Images: 500 mIoU: 0.6182 Acc: 0.9341 Kappa: 0.9148
2022-01-07 10:34:01 [INFO] [EVAL] Class IoU:
[0.9667 0.768  0.8798 0.4199 0.4632 0.5244 0.4507 0.6023 0.8938 0.5562
 0.9045 0.6813 0.4034 0.901  0.4674 0.5768 0.3732 0.2815 0.6312]
2022-01-07 10:34:01 [INFO] [EVAL] Class Acc:
[0.9911 0.8374 0.9423 0.7118 0.6159 0.6614 0.6543 0.7398 0.9456 0.7007
 0.9254 0.7594 0.6243 0.9352 0.7434 0.6963 0.5144 0.495  0.7439]
```
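The mIoU reported in the log is the unweighted mean of the 19 per-class IoUs, where each class IoU is TP / (TP + FP + FN) from the confusion matrix. The sketch below shows that computation on a toy two-class confusion matrix, then checks that the logged class IoUs average to the logged 0.6182. `iou_from_confusion` is an illustrative helper, not PaddleSeg's own metric code.

```python
import numpy as np


def iou_from_confusion(conf):
    """Per-class IoU = TP / (TP + FP + FN); rows are ground truth, columns predictions."""
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp   # predicted as the class but labelled otherwise
    fn = conf.sum(axis=1) - tp   # labelled as the class but predicted otherwise
    return tp / np.maximum(tp + fp + fn, 1)  # avoid division by zero for absent classes


toy = [[50, 10],
       [5, 35]]
print(iou_from_confusion(toy))  # class 0: 50 / 65, class 1: 35 / 50

# Per-class IoUs copied from the evaluation log above.
logged_class_iou = [0.9667, 0.768, 0.8798, 0.4199, 0.4632, 0.5244, 0.4507, 0.6023,
                    0.8938, 0.5562, 0.9045, 0.6813, 0.4034, 0.901, 0.4674, 0.5768,
                    0.3732, 0.2815, 0.6312]
print(round(float(np.mean(logged_class_iou)), 4))  # 0.6182
```

Because every class counts equally, the weakest classes in the log (IoU around 0.28 to 0.42) pull the mean well below the 0.9341 pixel accuracy.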
VIII. Reproduction Results
The challenge required 60.3% mIoU on the Cityscapes validation set; this reproduction reaches 61.82% mIoU.
Environment:
paddlepaddle==2.2.0
Tesla V100 × 4
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|---|---|---|---|---|---|---|---|
| ESPNetV1 | - | 1024x512 | 120000 | 61.82% | 62.20% | 62.89% | model / log / vdl |
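IX. Lessons Learned
1. When reproducing the model, consult the PyTorch-to-Paddle API mapping table often and carefully compare the differences between corresponding APIs. PaddleSeg also provides some more complete building blocks that can be used directly, such as those in `layers`.
2. If the reproduced accuracy differs significantly from the target, carefully compare your settings against the parameters in the original paper.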
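As an example of the API differences mentioned in tip 1, two well-known ones matter when porting PyTorch weights: batch-norm momentum has opposite meanings (PyTorch updates `running = (1 - m) * running + m * batch` with default 0.1, Paddle updates `running = m * running + (1 - m) * batch` with default 0.9), and `Linear` weights are stored transposed (`[out, in]` in PyTorch, `[in, out]` in Paddle). The sketch below is a hypothetical conversion helper operating on NumPy arrays; it is not part of this project or of PaddleSeg, and the set of `Linear` weight keys must be supplied by the caller.

```python
import numpy as np


def torch_to_paddle_momentum(torch_momentum=0.1):
    """Equivalent Paddle BatchNorm momentum for a given PyTorch momentum."""
    return 1.0 - torch_momentum


BN_RENAMES = {"running_mean": "_mean", "running_var": "_variance"}


def convert_state_dict(torch_sd, linear_keys=()):
    """Hypothetical helper: map a PyTorch state dict (NumPy arrays) to Paddle names.

    - BatchNorm buffers are renamed (running_mean -> _mean, running_var -> _variance).
    - num_batches_tracked has no Paddle counterpart and is dropped.
    - Linear weights listed in linear_keys are transposed ([out, in] -> [in, out]).
    """
    paddle_sd = {}
    for key, value in torch_sd.items():
        prefix, _, name = key.rpartition(".")
        if name == "num_batches_tracked":
            continue
        if name in BN_RENAMES:
            key = f"{prefix}.{BN_RENAMES[name]}"
        elif key in linear_keys:
            value = value.T
        paddle_sd[key] = value
    return paddle_sd


torch_sd = {
    "bn.weight": np.ones(2),
    "bn.running_mean": np.zeros(2),
    "bn.running_var": np.ones(2),
    "bn.num_batches_tracked": np.array(5),
    "fc.weight": np.arange(6.0).reshape(2, 3),  # PyTorch layout [out=2, in=3]
}
converted = convert_state_dict(torch_sd, linear_keys={"fc.weight"})
print(sorted(converted))
print(converted["fc.weight"].shape)  # (3, 2)
```

Convolution weights share the `[out, in, kH, kW]` layout in both frameworks, so they can be copied without reshaping.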
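X. Acknowledgements
Many thanks to the AI Studio platform for the compute and prize support, and to the Paddle team for their hard work.
Many thanks to dudu for bringing me into this competition and helping me avoid many detours.
Finally, I hope the Paper Reproduction Challenge keeps getting better.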
About the author
Name: 宁文彬 (Ning Wenbin)
University: Northeastern University (东北大学)
Year: second-year master's student
GitHub: [https://github.com/simuler](https://github.com/simuler)