
ParC-Net: Position-Aware Circular Convolution with the Merits of ConvNets and Transformers
This article presents ParC-Net, a lightweight, purely convolutional model that inherits the global modeling ability of ViTs while retaining the high computational efficiency of convolutions.
Abstract
Recently, vision transformers have begun to show impressive results that significantly outperform large convolution-based models. However, in the realm of small models for mobile or resource-constrained devices, ConvNets still hold advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position-aware circular convolution (ParC), a lightweight convolution op that possesses a global receptive field while producing location-sensitive features like a local convolution. We combine ParC with squeeze-and-excitation operations to form a MetaFormer-like block, which further gains a transformer-like attention mechanism. The block can be used in a plug-and-play manner to replace the corresponding blocks in ConvNets or transformers. Experimental results show that, on common vision tasks and datasets, ParC-Net outperforms popular lightweight ConvNets and transformer-based models while using fewer parameters and running faster. On ImageNet-1K, ParC-Net reaches 78.6% classification accuracy with about 5 million parameters, saving 11% of the parameters and 13% of the computation of MobileViT while being 0.2% more accurate and 23% faster at inference (on an ARM-based Rockchip RK3288); compared with DeiT, it uses only 0.5x the parameters yet gains 2.7% accuracy. ParC-Net also performs well on MS-COCO object detection and PASCAL VOC segmentation.
1. ParC Block
As shown in the figure below, the paper compares ConvNets with ViTs and summarizes several advantages ViTs hold over ConvNets:
- ViTs are good at extracting global features
- ViTs adopt the MetaFormer structure
- ViTs aggregate information in a data-driven way (i.e., conditioned on the input)
Based on these observations, the paper proposes the ParC block, which retains the high computational efficiency of ConvNets while inheriting the global modeling ability of ViTs. As shown in Figure 2, the input is split into two halves along the channel dimension: one half performs global modeling along the horizontal (H) dimension and the other along the vertical (V) dimension. Taking the vertical branch as an example, the global modeling can be written as:
$$
\begin{aligned}
pe^{V} &= F\left(\widetilde{pe}^{V}\right) = \left[pe_{0}^{V}, pe_{1}^{V}, \cdots, pe_{h-1}^{V}\right]^{T} \\
pe_{e}^{V} &= E_{V}\left(pe^{V}, w\right) \\
k^{V} &= F(\widetilde{k}) = \left[k_{0}^{V}, k_{1}^{V}, \cdots, k_{h-1}^{V}\right] \\
x^{p} &= x + pe_{e}^{V} \\
y_{i,j} &= \sum_{t \in (0, h-1)} k_{t}^{V} \, x^{p}_{((i+t) \bmod h,\; j)}
\end{aligned}
$$
The equations can be read as follows: a position embedding is first added to the input along the V dimension, and the input is then copied and concatenated with itself so that the subsequent global modeling can cover the whole dimension (implemented with a convolution kernel whose size equals the V dimension, as shown in Figure 3). The position embedding is essential here: because of the circular operation, the convolution by itself can no longer perceive position. The sketch below illustrates this equivalence.
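To make the circular behaviour concrete, here is a minimal NumPy sketch (my own illustration, not the paper's code) showing that "concatenate the input with a copy of itself, then slide a full-size kernel" reproduces the $(i+t) \bmod h$ indexing of the last equation:
import numpy as np

h = 6                      # size of the vertical (V) dimension
x = np.random.rand(h)      # one column of features, position embedding already added
k = np.random.rand(h)      # ParC kernel: as long as the dimension it covers

# Direct evaluation of y_i = sum_t k_t * x_{(i+t) mod h}
y_mod = np.array([sum(k[t] * x[(i + t) % h] for t in range(h)) for i in range(h)])

# ParC's "concat then slide" trick: append x[:-1] and run a plain valid convolution
x_cat = np.concatenate([x, x[:-1]])
y_cat = np.array([np.dot(k, x_cat[i:i + h]) for i in range(h)])

assert np.allclose(y_mod, y_cat)  # identical circular responses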
Advantages of the ParC block:
- ParC extracts global features and captures pixel-to-pixel interactions over the entire spatial extent
- Self-attention scales quadratically with input size, so replacing it with ParC greatly reduces the computational cost (see the rough estimate below)
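As a back-of-the-envelope comparison (my own estimate with hypothetical numbers, not a figure from the paper), counting multiply-accumulates on a 14x14 feature map with 64 channels and treating ParC as a depthwise operation:
# Rough per-layer op counts (hypothetical setting: 14x14 map, 64 channels)
H = W = 14
C = 64
attn_ops = (H * W) ** 2 * C                    # self-attention: quadratic in tokens
parc_ops = (H * W) * H * C + (H * W) * W * C   # ParC-V + ParC-H: linear in tokens
print(attn_ops, parc_ops)                      # 2458624 vs 351232, roughly 7x fewer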
2. Code Reproduction
2.1 Download and import the required packages
%matplotlib inline
import paddle
import numpy as np
import matplotlib.pyplot as plt
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
from paddle.io import Dataset, DataLoader
from paddle import nn
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
import os
from matplotlib.pyplot import figure
from paddle import ParamAttr
from paddle.nn.layer.norm import _BatchNormBase
2.2 Create the dataset
train_tfm = transforms.Compose([
transforms.RandomResizedCrop(32),
transforms.RandomHorizontalFlip(0.5),
transforms.ToTensor(),
transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
test_tfm = transforms.Compose([
transforms.Resize((32, 32)),
transforms.ToTensor(),
transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
paddle.vision.set_image_backend('cv2')
# Use the Cifar10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm)
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=256
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=4)
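As a quick sanity check (optional, not part of the original notebook), fetching one batch confirms the tensor shapes before building the model:
# Grab a single batch and inspect its shape
for images, targets in train_loader:
    print(images.shape)  # expect [256, 3, 32, 32]
    break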
2.3 Label smoothing
class LabelSmoothingCrossEntropy(nn.Layer):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing
    def forward(self, pred, target):
        confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        # Pick out the log-probability of the target class for each sample
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        # Uniform term that spreads the smoothing mass over all classes
        smooth_loss = paddle.mean(-log_probs, axis=-1)
        loss = confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()
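A small consistency check (my own, not part of the original notebook): with smoothing=0 the loss should coincide with the standard cross entropy:
logits = paddle.randn([4, 10])
targets = paddle.randint(0, 10, [4])
ls0 = LabelSmoothingCrossEntropy(smoothing=0.0)(logits, targets)
ce = F.cross_entropy(logits, targets)
print(float(ls0), float(ce))  # the two values should match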
2.4 ResNet-ParC
2.4.1 ParC
class ParC_op(nn.Layer):
    def __init__(self, dim, type, global_kernel_size, use_pe=True, groups=1):
        super().__init__()
        self.type = type  # 'H': global conv along the width; 'W': along the height
        self.dim = dim
        self.global_kernel_size = global_kernel_size
        self.use_pe = use_pe
        # The kernel spans the entire dimension it operates on
        self.kernel_size = (1, self.global_kernel_size) if type == 'H' else (self.global_kernel_size, 1)
        self.gcc_conv = nn.Conv2D(dim, dim, kernel_size=self.kernel_size, stride=1, padding=0, groups=groups)
        if use_pe:
            # Learnable position embedding: needed because the circular
            # convolution is otherwise position-agnostic
            if self.type == 'H':
                self.pe = self.create_parameter((1, dim, 1, self.global_kernel_size))
            else:
                self.pe = self.create_parameter((1, dim, self.global_kernel_size, 1))
    def forward(self, x):
        if self.use_pe:
            x = x + self.pe
        # Circular padding: concatenating x with x[..., :-1] and running a
        # "valid" convolution implements the (i + t) mod h indexing
        x_cat = paddle.concat([x, x[:, :, :, :-1]], axis=-1) if self.type == 'H' else paddle.concat([x, x[:, :, :-1, :]], axis=-2)
        x = self.gcc_conv(x_cat)
        return x
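A quick shape check for the operator (a sketch; the spatial size must equal global_kernel_size, since the kernel spans the whole dimension):
parc_h = ParC_op(dim=32, type='H', global_kernel_size=14)
x = paddle.randn([1, 32, 14, 14])
print(parc_h(x).shape)  # [1, 32, 14, 14]: global receptive field along W, shape preserved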
class ParCBlock(nn.Layer):
expansion = 1
def __init__(
self,
inplanes,
planes,
stride=1,
downsample=None,
groups=1,
base_width=64,
dilation=1,
norm_layer=None,
use_pe = True,
global_kernel_size = 14
):
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
width = int(planes * (base_width / 64.0)) * groups
self.conv1 = nn.Conv2D(inplanes, width, 1, bias_attr=False)
self.bn1 = norm_layer(width)
self.parc_H = ParC_op(width//2, 'H', global_kernel_size, use_pe, groups = groups)
self.parc_W = ParC_op(width//2, 'W', global_kernel_size, use_pe, groups = groups)
self.bn2 = norm_layer(width)
self.conv3 = nn.Conv2D(
width, planes * self.expansion, 1, bias_attr=False
)
self.bn3 = norm_layer(planes * self.expansion)
self.relu = nn.ReLU()
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
outH, outW = paddle.chunk(out, 2, axis=1)
outH = self.parc_H(outH)
outW = self.parc_W(outW)
out = paddle.concat([outH, outW], axis=1)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
model = ParCBlock(64, 64)
paddle.summary(model, (1, 64, 14, 14))
---------------------------------------------------------------------------
Layer (type) Input Shape Output Shape Param #
===========================================================================
Conv2D-1 [[1, 64, 14, 14]] [1, 64, 14, 14] 4,096
BatchNorm2D-1 [[1, 64, 14, 14]] [1, 64, 14, 14] 256
ReLU-1 [[1, 64, 14, 14]] [1, 64, 14, 14] 0
Conv2D-2 [[1, 32, 14, 27]] [1, 32, 14, 14] 14,368
ParC_op-1 [[1, 32, 14, 14]] [1, 32, 14, 14] 448
Conv2D-3 [[1, 32, 27, 14]] [1, 32, 14, 14] 14,368
ParC_op-2 [[1, 32, 14, 14]] [1, 32, 14, 14] 448
BatchNorm2D-2 [[1, 64, 14, 14]] [1, 64, 14, 14] 256
Conv2D-4 [[1, 64, 14, 14]] [1, 64, 14, 14] 4,096
BatchNorm2D-3 [[1, 64, 14, 14]] [1, 64, 14, 14] 256
===========================================================================
Total params: 38,592
Trainable params: 38,208
Non-trainable params: 384
---------------------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.77
Params size (MB): 0.15
Estimated Total Size (MB): 0.96
---------------------------------------------------------------------------
{'total_params': 38592, 'trainable_params': 38208}
2.4.2 ResNet-ParC
class BasicBlock(nn.Layer):
expansion = 1
def __init__(
self,
inplanes,
planes,
stride=1,
downsample=None,
groups=1,
base_width=64,
dilation=1,
norm_layer=None,
):
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
if dilation > 1:
raise NotImplementedError(
"Dilation > 1 not supported in BasicBlock"
)
self.conv1 = nn.Conv2D(
inplanes, planes, 3, padding=1, stride=stride, bias_attr=False
)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU()
self.conv2 = nn.Conv2D(planes, planes, 3, padding=1, bias_attr=False)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class BottleneckBlock(nn.Layer):
expansion = 4
def __init__(
self,
inplanes,
planes,
stride=1,
downsample=None,
groups=1,
base_width=64,
dilation=1,
norm_layer=None,
):
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
width = int(planes * (base_width / 64.0)) * groups
self.conv1 = nn.Conv2D(inplanes, width, 1, bias_attr=False)
self.bn1 = norm_layer(width)
self.conv2 = nn.Conv2D(
width,
width,
3,
padding=dilation,
stride=stride,
groups=groups,
dilation=dilation,
bias_attr=False,
)
self.bn2 = norm_layer(width)
self.conv3 = nn.Conv2D(
width, planes * self.expansion, 1, bias_attr=False
)
self.bn3 = norm_layer(planes * self.expansion)
self.relu = nn.ReLU()
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class ResNet(nn.Layer):
"""ResNet model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
Block (BasicBlock|BottleneckBlock): Block module of model.
depth (int, optional): Layers of ResNet, Default: 50.
width (int, optional): Base width per convolution group for each convolution block, Default: 64.
num_classes (int, optional): Output num_features of last fc layer. If num_classes <= 0, last fc layer
will not be defined. Default: 1000.
with_pool (bool, optional): Use pool before the last fc layer or not. Default: True.
groups (int, optional): Number of groups for each convolution block, Default: 1.
Returns:
:ref:`api_paddle_nn_Layer`. An instance of ResNet model.
Examples:
.. code-block:: python
import paddle
from paddle.vision.models import ResNet
from paddle.vision.models.resnet import BottleneckBlock, BasicBlock
# build ResNet with 18 layers
resnet18 = ResNet(BasicBlock, 18)
# build ResNet with 50 layers
resnet50 = ResNet(BottleneckBlock, 50)
# build Wide ResNet model
wide_resnet50_2 = ResNet(BottleneckBlock, 50, width=64*2)
# build ResNeXt model
resnext50_32x4d = ResNet(BottleneckBlock, 50, width=4, groups=32)
x = paddle.rand([1, 3, 224, 224])
out = resnet18(x)
print(out.shape)
# [1, 1000]
"""
def __init__(
self,
block,
parc_insert_locs=[3, 4, 3, 2],
use_parc = [False, False, True, True],
global_kernel_size = [32, 16, 8, 4],
depth=50,
width=64,
num_classes=1000,
with_pool=True,
groups=1,
):
super().__init__()
layer_cfg = {
18: [2, 2, 2, 2],
34: [3, 4, 6, 3],
50: [3, 4, 6, 3],
101: [3, 4, 23, 3],
152: [3, 8, 36, 3],
}
layers = layer_cfg[depth]
self.groups = groups
self.base_width = width
self.num_classes = num_classes
self.with_pool = with_pool
self.parc_insert_locs = parc_insert_locs
self._norm_layer = nn.BatchNorm2D
self.inplanes = 64
self.dilation = 1
self.conv1 = nn.Conv2D(
3,
self.inplanes,
kernel_size=3,
stride=1,
padding=1,
bias_attr=False,
)
self.bn1 = self._norm_layer(self.inplanes)
self.relu = nn.ReLU()
self.layer1 = self._make_layer(block, 64, layers[0], use_parc=use_parc[0],
parc_insert_locs=parc_insert_locs[0], global_kernel_size=global_kernel_size[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2, use_parc=use_parc[1],
parc_insert_locs=parc_insert_locs[1], global_kernel_size=global_kernel_size[1])
self.layer3 = self._make_layer(block, 256, layers[2], stride=2, use_parc=use_parc[2],
parc_insert_locs=parc_insert_locs[2], global_kernel_size=global_kernel_size[2])
self.layer4 = self._make_layer(block, 512, layers[3], stride=2, use_parc=use_parc[3],
parc_insert_locs=parc_insert_locs[3], global_kernel_size=global_kernel_size[3])
if with_pool:
self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
if num_classes > 0:
self.fc = nn.Linear(512 * block.expansion, num_classes)
def _make_layer(self, block, planes, blocks, stride=1, dilate=False, use_parc=False, parc_insert_locs=-1, global_kernel_size=14):
norm_layer = self._norm_layer
downsample = None
previous_dilation = self.dilation
if dilate:
self.dilation *= stride
stride = 1
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2D(
self.inplanes,
planes * block.expansion,
1,
stride=stride,
bias_attr=False,
),
norm_layer(planes * block.expansion),
)
layers = []
layers.append(
block(
self.inplanes,
planes,
stride,
downsample,
self.groups,
self.base_width,
previous_dilation,
norm_layer,
)
)
self.inplanes = planes * block.expansion
for i in range(1, blocks):
if use_parc:
if i >= parc_insert_locs:
layers.append(
ParCBlock(
self.inplanes,
planes,
groups=self.groups,
base_width=self.base_width,
norm_layer=norm_layer,
use_pe = True,
global_kernel_size = global_kernel_size
)
)
continue
layers.append(
block(
self.inplanes,
planes,
groups=self.groups,
base_width=self.base_width,
norm_layer=norm_layer,
)
)
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
if self.with_pool:
x = self.avgpool(x)
if self.num_classes > 0:
x = paddle.flatten(x, 1)
x = self.fc(x)
return x
model = ResNet(BasicBlock, depth=18, parc_insert_locs=[2, 2, 1, 1], num_classes=10)
paddle.summary(model, (1, 3, 32, 32))
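To confirm where the ParC blocks actually land in the network, a short check (a sketch, assuming the model built above): with use_parc=[False, False, True, True] and parc_insert_locs=[2, 2, 1, 1], ParCBlock should appear only in the later stages:
# List the residual blocks that were replaced with ParCBlock
for name, sublayer in model.named_sublayers():
    if isinstance(sublayer, ParCBlock):
        print(name)  # expect entries only under layer3 / layer4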
2.5 Training
learning_rate = 0.1
n_epochs = 100
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'
model = ResNet(BasicBlock, depth=18, parc_insert_locs=[2, 2, 1, 1], num_classes=10)
criterion = LabelSmoothingCrossEntropy()
scheduler = paddle.optimizer.lr.MultiStepDecay(learning_rate=learning_rate, milestones=[30, 60, 80], verbose=False)
optimizer = paddle.optimizer.Momentum(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)
gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}} # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}} # for recording accuracy
loss_iter = 0
acc_iter = 0
for epoch in range(n_epochs):
# ---------- Training ----------
model.train()
train_num = 0.0
train_loss = 0.0
val_num = 0.0
val_loss = 0.0
accuracy_manager = paddle.metric.Accuracy()
val_accuracy_manager = paddle.metric.Accuracy()
print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
for batch_id, data in enumerate(train_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
accuracy_manager.update(acc)
if batch_id % 10 == 0:
loss_record['train']['loss'].append(loss.numpy())
loss_record['train']['iter'].append(loss_iter)
loss_iter += 1
loss.backward()
optimizer.step()
optimizer.clear_grad()
train_loss += loss
train_num += len(y_data)
scheduler.step()
total_train_loss = (train_loss / train_num) * batch_size
train_acc = accuracy_manager.accumulate()
acc_record['train']['acc'].append(train_acc)
acc_record['train']['iter'].append(acc_iter)
acc_iter += 1
# Print the information.
print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))
# ---------- Validation ----------
model.eval()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
val_accuracy_manager.update(acc)
val_loss += loss
val_num += len(y_data)
total_val_loss = (val_loss / val_num) * batch_size
loss_record['val']['loss'].append(total_val_loss.numpy())
loss_record['val']['iter'].append(loss_iter)
val_acc = val_accuracy_manager.accumulate()
acc_record['val']['acc'].append(val_acc)
acc_record['val']['iter'].append(acc_iter)
print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))
# ===================save====================
if val_acc > best_acc:
best_acc = val_acc
paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))
print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
2.6 Results
def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
''' Plot learning curve of your CNN '''
maxtrain = max(map(float, record['train'][title]))
maxval = max(map(float, record['val'][title]))
ymax = max(maxtrain, maxval) * 1.1
mintrain = min(map(float, record['train'][title]))
minval = min(map(float, record['val'][title]))
ymin = min(mintrain, minval) * 0.9
total_steps = len(record['train'][title])
x_1 = list(map(int, record['train']['iter']))
x_2 = list(map(int, record['val']['iter']))
figure(figsize=(10, 6))
plt.plot(x_1, record['train'][title], c='tab:red', label='train')
plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
plt.ylim(ymin, ymax)
plt.xlabel('Training steps')
plt.ylabel(ylabel)
plt.title('Learning curve of {}'.format(title))
plt.legend()
plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')
plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')
import time
work_path = 'work/model'
model = ResNet(BasicBlock, depth=18, parc_insert_locs=[2, 2, 1, 1], num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughout:3806
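Note that wrapping an asynchronous GPU loop with time.time() can be skewed by queued kernels and first-run warm-up; a steadier measurement (a sketch, assuming a CUDA device) syncs explicitly:
with paddle.no_grad():
    for _ in range(5):                                # warm-up passes
        model(paddle.randn([batch_size, 3, 32, 32]))
    paddle.device.cuda.synchronize()                  # drain queued kernels
    start = time.time()
    for x_data, _ in val_loader:
        model(x_data)
    paddle.device.cuda.synchronize()                  # wait before stopping the clock
print("Throughput:{}".format(int(len(val_dataset) // (time.time() - start))))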
3. ResNet
3.1 ResNet
model = paddle.vision.models.resnet18(num_classes=10)
model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)
model.maxpool = nn.Identity()
paddle.summary(model, (1, 3, 32, 32))
3.2 Training
learning_rate = 0.1
n_epochs = 100
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model1'
model = paddle.vision.models.resnet18(num_classes=10)
model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)
model.maxpool = nn.Identity()
criterion = LabelSmoothingCrossEntropy()
scheduler = paddle.optimizer.lr.MultiStepDecay(learning_rate=learning_rate, milestones=[30, 60, 80], verbose=False)
optimizer = paddle.optimizer.Momentum(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)
gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record1 = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}} # for recording loss
acc_record1 = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}} # for recording accuracy
loss_iter = 0
acc_iter = 0
for epoch in range(n_epochs):
# ---------- Training ----------
model.train()
train_num = 0.0
train_loss = 0.0
val_num = 0.0
val_loss = 0.0
accuracy_manager = paddle.metric.Accuracy()
val_accuracy_manager = paddle.metric.Accuracy()
print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
for batch_id, data in enumerate(train_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
accuracy_manager.update(acc)
if batch_id % 10 == 0:
loss_record1['train']['loss'].append(loss.numpy())
loss_record1['train']['iter'].append(loss_iter)
loss_iter += 1
loss.backward()
optimizer.step()
optimizer.clear_grad()
train_loss += loss
train_num += len(y_data)
scheduler.step()
total_train_loss = (train_loss / train_num) * batch_size
train_acc = accuracy_manager.accumulate()
acc_record1['train']['acc'].append(train_acc)
acc_record1['train']['iter'].append(acc_iter)
acc_iter += 1
# Print the information.
print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))
# ---------- Validation ----------
model.eval()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
val_accuracy_manager.update(acc)
val_loss += loss
val_num += len(y_data)
total_val_loss = (val_loss / val_num) * batch_size
loss_record1['val']['loss'].append(total_val_loss.numpy())
loss_record1['val']['iter'].append(loss_iter)
val_acc = val_accuracy_manager.accumulate()
acc_record1['val']['acc'].append(val_acc)
acc_record1['val']['iter'].append(acc_iter)
print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))
# ===================save====================
if val_acc > best_acc:
best_acc = val_acc
paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))
print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
3.3 Results
plot_learning_curve(loss_record1, title='loss', ylabel='CE Loss')
plot_learning_curve(acc_record1, title='acc', ylabel='Accuracy')
import time
work_path = 'work/model1'
model = paddle.vision.models.resnet18(num_classes=10)
model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)
model.maxpool = nn.Identity()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughout:3705
4. Comparison of Results
Model | Train Acc | Val Acc | Parameters |
---|---|---|---|
ResNet18 w ParC | 0.8638 | 0.8960 | 6735050 |
ResNet18 w/o ParC | 0.8631 | 0.8910 | 11183562 |
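The Parameters column can be reproduced directly (a sketch; assumes the classes and imports defined above are still in scope):
def count_params(m):
    # Total number of parameters registered on the layer
    return sum(int(np.prod(p.shape)) for p in m.parameters())

parc_model = ResNet(BasicBlock, depth=18, parc_insert_locs=[2, 2, 1, 1], num_classes=10)
plain_model = paddle.vision.models.resnet18(num_classes=10)
plain_model.conv1 = nn.Conv2D(3, 64, 3, padding=1, bias_attr=False)
plain_model.maxpool = nn.Identity()
print(count_params(parc_model), count_params(plain_model))  # ~6.7M vs ~11.2M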
Summary
Inserting ParC blocks into ResNet18 cuts the parameter count by about 40% while still delivering comparable, in fact slightly better (+0.5% val accuracy), performance than the plain ResNet18.
References
Paper: ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer (ECCV 2022)
Code: hkzhang91/ParC-Net