MixConv: Depthwise Separable Convolution with Mixed Receptive Fields

Abstract

Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation, we propose a new mixed depthwise convolution (MixConv), which naturally mixes multiple kernel sizes in a single convolution. As a simple drop-in replacement for vanilla depthwise convolution, our MixConv improves the accuracy and efficiency of existing MobileNets on both ImageNet classification and COCO object detection. To demonstrate the effectiveness of MixConv, we integrate it into an AutoML search space and develop a new family of models, called MixNets, which outperform previous mobile models including MobileNetV2 [23] (ImageNet top-1 accuracy +4.2%), ShuffleNetV2 [18] (+3.5%), MnasNet [29] (+1.3%), ProxylessNAS [2] (+2.2%), and FBNet [30] (+2.0%). In particular, our MixNet-L achieves a state-of-the-art 78.9% ImageNet top-1 accuracy under typical mobile settings (<600M FLOPS).

1. MixConv

Unlike standard depthwise convolution, which applies a kernel of the same size to every channel, MixConv partitions the channels into groups and applies a different kernel size to each group.
(Figure: MixConv splits the channels into groups and applies a different kernel size to each group.)
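For intuition, here is a minimal sketch of the grouping (assuming four groups and the 2*i + 3 kernel-size scheme used by the implementation in Section 2.4.1 below):

import paddle

x = paddle.randn([1, 16, 32, 32])               # 16 channels, NCHW layout
groups = paddle.split(x, [4, 4, 4, 4], axis=1)  # four equal channel groups
for i, g in enumerate(groups):
    k = 2 * i + 3                               # group i gets a k x k depthwise kernel
    print(g.shape, '-> kernel size', k)         # kernel sizes 3, 5, 7, 9

Each group is convolved with its own kernel size and the outputs are concatenated back along the channel axis, so the layer is a drop-in replacement for an ordinary depthwise convolution.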

2. Code Reproduction

2.1 Install and import the required packages

!pip install paddlex
%matplotlib inline
import paddle
import numpy as np
import matplotlib.pyplot as plt
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
from paddle.io import Dataset, DataLoader
from paddle import nn
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
import os
from matplotlib.pyplot import figure
import paddlex
from paddle import ParamAttr

2.2 Create the datasets

train_tfm = transforms.Compose([
    transforms.Resize((130, 130)),
    transforms.ColorJitter(brightness=0.2,contrast=0.2, saturation=0.2),
    paddlex.transforms.MixupImage(),
    transforms.RandomResizedCrop(128, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.RandomRotation(20),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
paddle.vision.set_image_backend('cv2')
# Use the Cifar10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform=train_tfm)
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test', transform=test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=2)

2.3 Label smoothing

class LabelSmoothingCrossEntropy(nn.Layer):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, pred, target):

        confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        smooth_loss = paddle.mean(-log_probs, axis=-1)
        loss = confidence * nll_loss + self.smoothing * smooth_loss

        return loss.mean()
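The loss above is (1 - smoothing) * NLL(target) + smoothing * mean(-log_probs), i.e. cross-entropy against a target distribution that puts 0.9 on the true class and spreads the remaining 0.1 uniformly over all classes. A quick sanity check (a minimal sketch with random logits; the shapes are illustrative assumptions):

criterion = LabelSmoothingCrossEntropy(smoothing=0.1)
logits = paddle.randn([4, 10])             # batch of 4 samples, 10 classes
targets = paddle.to_tensor([1, 3, 5, 7])   # int64 class indices
print(criterion(logits, targets))          # scalar Tensor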

2.4 AlexNet-MixConv

2.4.1 MixConv
def _SplitChannels(channels, num_groups):
    split_channels = [channels//num_groups for _ in range(num_groups)]
    split_channels[0] += channels - sum(split_channels)
    return split_channels
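# e.g. _SplitChannels(128, 4) -> [32, 32, 32, 32]
#      _SplitChannels(10, 4)  -> [4, 2, 2, 2]   (the remainder goes to the first group)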

class MixConv(nn.Layer):
    # num_groups: the number of groups the channels are split into
    def __init__(self, num_channels, num_filters, filter_size, stride, padding, num_groups=4, name=None):
        super().__init__()
        self.num_groups = num_groups
        self.split_in_channels = _SplitChannels(num_channels, num_groups)
        self.split_out_channels = _SplitChannels(num_filters, num_groups)
        self.mixconvs = nn.LayerList()
        for i in range(num_groups):
            self.mixconvs.append(nn.Conv2D(self.split_in_channels[i], self.split_out_channels[i], 
                2 * i + 3, stride, (2 * i + 3)//2, groups=self.split_in_channels[i], bias_attr=False))       
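        # Each group i above uses kernel size 2*i + 3 (3, 5, 7, 9 for num_groups=4),
        # 'same' padding (2*i + 3) // 2, and groups equal to its input channels
        # (i.e. depthwise); the filter_size and padding arguments are thus unused.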
    
    def forward(self, x):
        if self.num_groups == 1:
            return self.mixconvs[0](x)

        x_split = paddle.split(x, self.split_in_channels, axis=1)
        
        x = [conv(t) for conv, t in zip(self.mixconvs, x_split)]
        x = paddle.concat(x, axis=1)

        return x
model = MixConv(16, 64, 3, 1, 2, 4)
paddle.summary(model, (1, 16, 224, 224))
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Conv2D-1      [[1, 4, 224, 224]]   [1, 16, 224, 224]         144      
   Conv2D-2      [[1, 4, 224, 224]]   [1, 16, 224, 224]         400      
   Conv2D-3      [[1, 4, 224, 224]]   [1, 16, 224, 224]         784      
   Conv2D-4      [[1, 4, 224, 224]]   [1, 16, 224, 224]        1,296     
===========================================================================
Total params: 2,624
Trainable params: 2,624
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 3.06
Forward/backward pass size (MB): 24.50
Params size (MB): 0.01
Estimated Total Size (MB): 27.57
---------------------------------------------------------------------------








{'total_params': 2624, 'trainable_params': 2624}
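The per-layer parameter counts are easy to verify: each group is a depthwise convolution over 4 input channels with 16 output channels (channel multiplier 4) and kernel size k, so it holds 16 * k * k weights (bias_attr=False):

print([16 * k * k for k in (3, 5, 7, 9)])  # [144, 400, 784, 1296], summing to 2624

which matches the summary above.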
class AlexNet_Mixconv(nn.Layer):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features=nn.Sequential(
            nn.Conv2D(3, 48, 11, 4, 11//2),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
            nn.Conv2D(48, 128, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
            MixConv(128, 256, 3, 1, 1),
            nn.ReLU(),
            MixConv(256, 256, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2D(256, 128, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
        )
        self.classifier=nn.Sequential(
            nn.Linear(3*3*128,2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048,2048),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(2048,num_classes),
        )
 
 
    def forward(self, x):
        x = self.features(x)
        x = paddle.flatten(x, 1)
        x = self.classifier(x)
        return x
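With 128×128 inputs, the feature-map size shrinks as 128 → 32 (11×11 conv, stride 4) → 15 (3×3 max-pool, stride 2) → 7 (second pool) → 3 (final pool), so the flattened feature vector has 3 × 3 × 128 = 1152 elements, matching the in_features of the first Linear layer (and the [1, 1152] shape in the summary below).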
model = AlexNet_Mixconv(num_classes=10)
paddle.summary(model, (1, 3, 128, 128))
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Conv2D-5      [[1, 3, 128, 128]]    [1, 48, 32, 32]        17,472     
    ReLU-5       [[1, 48, 32, 32]]     [1, 48, 32, 32]           0       
  MaxPool2D-1    [[1, 48, 32, 32]]     [1, 48, 15, 15]           0       
   Conv2D-6      [[1, 48, 15, 15]]     [1, 128, 15, 15]       153,728    
    ReLU-6       [[1, 128, 15, 15]]    [1, 128, 15, 15]          0       
  MaxPool2D-2    [[1, 128, 15, 15]]     [1, 128, 7, 7]           0       
   Conv2D-7       [[1, 32, 7, 7]]       [1, 64, 7, 7]           576      
   Conv2D-8       [[1, 32, 7, 7]]       [1, 64, 7, 7]          1,600     
   Conv2D-9       [[1, 32, 7, 7]]       [1, 64, 7, 7]          3,136     
   Conv2D-10      [[1, 32, 7, 7]]       [1, 64, 7, 7]          5,184     
   MixConv-2      [[1, 128, 7, 7]]      [1, 256, 7, 7]           0       
    ReLU-7        [[1, 256, 7, 7]]      [1, 256, 7, 7]           0       
   Conv2D-11      [[1, 64, 7, 7]]       [1, 64, 7, 7]           576      
   Conv2D-12      [[1, 64, 7, 7]]       [1, 64, 7, 7]          1,600     
   Conv2D-13      [[1, 64, 7, 7]]       [1, 64, 7, 7]          3,136     
   Conv2D-14      [[1, 64, 7, 7]]       [1, 64, 7, 7]          5,184     
   MixConv-3      [[1, 256, 7, 7]]      [1, 256, 7, 7]           0       
    ReLU-8        [[1, 256, 7, 7]]      [1, 256, 7, 7]           0       
   Conv2D-15      [[1, 256, 7, 7]]      [1, 128, 7, 7]        295,040    
    ReLU-9        [[1, 128, 7, 7]]      [1, 128, 7, 7]           0       
  MaxPool2D-3     [[1, 128, 7, 7]]      [1, 128, 3, 3]           0       
   Linear-1         [[1, 1152]]           [1, 2048]          2,361,344   
    ReLU-10         [[1, 2048]]           [1, 2048]              0       
   Dropout-1        [[1, 2048]]           [1, 2048]              0       
   Linear-2         [[1, 2048]]           [1, 2048]          4,196,352   
    ReLU-11         [[1, 2048]]           [1, 2048]              0       
   Dropout-2        [[1, 2048]]           [1, 2048]              0       
   Linear-3         [[1, 2048]]            [1, 10]            20,490     
===========================================================================
Total params: 7,065,418
Trainable params: 7,065,418
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 2.09
Params size (MB): 26.95
Estimated Total Size (MB): 29.23
---------------------------------------------------------------------------






{'total_params': 7065418, 'trainable_params': 7065418}

2.5 Training

learning_rate = 0.001
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'

model = AlexNet_Mixconv(num_classes=10)

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)
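# Note: scheduler.step() is called once per batch below, so T_max above equals the
# total number of optimizer steps: (50000 // batch_size) * n_epochs = 19500.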

best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record['train']['loss'].append(loss.numpy())
            loss_record['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()
        
        train_loss += loss
        train_num += len(y_data)

    total_train_loss = (train_loss / train_num) * batch_size
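    # train_loss sums per-batch mean losses, so the batch_size / train_num factor
    # recovers the average per-batch loss (exact here because drop_last=True).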
    train_acc = accuracy_manager.accumulate()
    acc_record['train']['acc'].append(train_acc)
    acc_record['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record['val']['loss'].append(total_val_loss.numpy())
    loss_record['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record['val']['acc'].append(val_acc)
    acc_record['val']['iter'].append(acc_iter)
    
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
#===epoch: 0, lr=0.0010000000===#
#===epoch: 0, train loss is: [1.8332046], train acc is: 37.31%===#
#===epoch: 0, val loss is: [1.5604398], val acc is: 51.62%===#
#===epoch: 1, lr=0.0009990134===#
#===epoch: 1, train loss is: [1.6065372], train acc is: 49.95%===#
#===epoch: 1, val loss is: [1.4431052], val acc is: 58.04%===#
#===epoch: 2, lr=0.0009960574===#
#===epoch: 2, train loss is: [1.5085369], train acc is: 54.82%===#
#===epoch: 2, val loss is: [1.3632524], val acc is: 62.80%===#
#===epoch: 3, lr=0.0009911436===#
#===epoch: 3, train loss is: [1.4487118], train acc is: 58.17%===#
#===epoch: 3, val loss is: [1.300965], val acc is: 65.54%===#
#===epoch: 4, lr=0.0009842916===#
#===epoch: 4, train loss is: [1.4092876], train acc is: 60.00%===#
#===epoch: 4, val loss is: [1.2562582], val acc is: 66.83%===#
#===epoch: 5, lr=0.0009755283===#
#===epoch: 5, train loss is: [1.3632561], train acc is: 62.24%===#
#===epoch: 5, val loss is: [1.2178173], val acc is: 68.88%===#
#===epoch: 6, lr=0.0009648882===#
#===epoch: 6, train loss is: [1.336868], train acc is: 63.29%===#
#===epoch: 6, val loss is: [1.2229024], val acc is: 69.12%===#
#===epoch: 7, lr=0.0009524135===#
#===epoch: 7, train loss is: [1.3104041], train acc is: 64.57%===#
#===epoch: 7, val loss is: [1.1900145], val acc is: 70.71%===#
#===epoch: 8, lr=0.0009381533===#
#===epoch: 8, train loss is: [1.290957], train acc is: 65.48%===#
#===epoch: 8, val loss is: [1.1561929], val acc is: 71.58%===#
#===epoch: 9, lr=0.0009221640===#
#===epoch: 9, train loss is: [1.2713991], train acc is: 66.38%===#
#===epoch: 9, val loss is: [1.1582695], val acc is: 71.44%===#
#===epoch: 10, lr=0.0009045085===#
#===epoch: 10, train loss is: [1.256575], train acc is: 67.27%===#
#===epoch: 10, val loss is: [1.1347567], val acc is: 72.52%===#
#===epoch: 11, lr=0.0008852566===#
#===epoch: 11, train loss is: [1.2349375], train acc is: 68.05%===#
#===epoch: 11, val loss is: [1.1213648], val acc is: 73.43%===#
#===epoch: 12, lr=0.0008644843===#
#===epoch: 12, train loss is: [1.2255381], train acc is: 68.28%===#
#===epoch: 12, val loss is: [1.1092324], val acc is: 73.64%===#
#===epoch: 13, lr=0.0008422736===#
#===epoch: 13, train loss is: [1.211379], train acc is: 69.12%===#
#===epoch: 13, val loss is: [1.0956241], val acc is: 74.48%===#
#===epoch: 14, lr=0.0008187120===#
#===epoch: 14, train loss is: [1.2031662], train acc is: 69.55%===#
#===epoch: 14, val loss is: [1.0745709], val acc is: 75.38%===#
#===epoch: 15, lr=0.0007938926===#
#===epoch: 15, train loss is: [1.1895174], train acc is: 70.18%===#
#===epoch: 15, val loss is: [1.081457], val acc is: 75.05%===#
#===epoch: 16, lr=0.0007679134===#
#===epoch: 16, train loss is: [1.1810952], train acc is: 70.33%===#
#===epoch: 16, val loss is: [1.0502316], val acc is: 76.76%===#
#===epoch: 17, lr=0.0007408768===#
#===epoch: 17, train loss is: [1.1669109], train acc is: 71.11%===#
#===epoch: 17, val loss is: [1.05597], val acc is: 76.17%===#
#===epoch: 18, lr=0.0007128896===#
#===epoch: 18, train loss is: [1.1530827], train acc is: 71.83%===#
#===epoch: 18, val loss is: [1.047121], val acc is: 76.39%===#
#===epoch: 19, lr=0.0006840623===#
#===epoch: 19, train loss is: [1.145995], train acc is: 72.13%===#
#===epoch: 19, val loss is: [1.023506], val acc is: 77.50%===#
#===epoch: 20, lr=0.0006545085===#
#===epoch: 20, train loss is: [1.1302441], train acc is: 72.87%===#
#===epoch: 20, val loss is: [1.0353966], val acc is: 77.25%===#
#===epoch: 21, lr=0.0006243449===#
#===epoch: 21, train loss is: [1.121871], train acc is: 73.14%===#
#===epoch: 21, val loss is: [1.0212026], val acc is: 78.35%===#
#===epoch: 22, lr=0.0005936907===#
#===epoch: 22, train loss is: [1.1141613], train acc is: 73.35%===#
#===epoch: 22, val loss is: [1.0185486], val acc is: 78.14%===#
#===epoch: 23, lr=0.0005626666===#
#===epoch: 23, train loss is: [1.1039811], train acc is: 73.73%===#
#===epoch: 23, val loss is: [1.0128148], val acc is: 78.19%===#
#===epoch: 24, lr=0.0005313953===#
#===epoch: 24, train loss is: [1.0911169], train acc is: 74.48%===#
#===epoch: 24, val loss is: [1.0095358], val acc is: 78.32%===#
#===epoch: 25, lr=0.0005000000===#
#===epoch: 25, train loss is: [1.0807213], train acc is: 74.97%===#
#===epoch: 25, val loss is: [0.9975869], val acc is: 78.58%===#
#===epoch: 26, lr=0.0004686047===#
#===epoch: 26, train loss is: [1.0722029], train acc is: 75.41%===#
#===epoch: 26, val loss is: [1.0026505], val acc is: 78.49%===#
#===epoch: 27, lr=0.0004373334===#
#===epoch: 27, train loss is: [1.0646522], train acc is: 75.48%===#
#===epoch: 27, val loss is: [0.97930884], val acc is: 79.76%===#
#===epoch: 28, lr=0.0004063093===#
#===epoch: 28, train loss is: [1.0545902], train acc is: 75.98%===#
#===epoch: 28, val loss is: [0.97865117], val acc is: 79.64%===#
#===epoch: 29, lr=0.0003756551===#
#===epoch: 29, train loss is: [1.0444697], train acc is: 76.59%===#
#===epoch: 29, val loss is: [0.96476597], val acc is: 80.13%===#
#===epoch: 30, lr=0.0003454915===#
#===epoch: 30, train loss is: [1.037737], train acc is: 76.71%===#
#===epoch: 30, val loss is: [0.9573404], val acc is: 80.40%===#
#===epoch: 31, lr=0.0003159377===#
#===epoch: 31, train loss is: [1.0279362], train acc is: 77.33%===#
#===epoch: 31, val loss is: [0.9777868], val acc is: 79.72%===#
#===epoch: 32, lr=0.0002871104===#
#===epoch: 32, train loss is: [1.0181235], train acc is: 77.45%===#
#===epoch: 32, val loss is: [0.9529455], val acc is: 80.93%===#
#===epoch: 33, lr=0.0002591232===#
#===epoch: 33, train loss is: [1.0126927], train acc is: 78.02%===#
#===epoch: 33, val loss is: [0.9532719], val acc is: 80.87%===#
#===epoch: 34, lr=0.0002320866===#
#===epoch: 34, train loss is: [1.0000519], train acc is: 78.33%===#
#===epoch: 34, val loss is: [0.94096804], val acc is: 81.23%===#
#===epoch: 35, lr=0.0002061074===#
#===epoch: 35, train loss is: [0.9938114], train acc is: 78.71%===#
#===epoch: 35, val loss is: [0.9470541], val acc is: 81.20%===#
#===epoch: 36, lr=0.0001812880===#
#===epoch: 36, train loss is: [0.98845834], train acc is: 78.93%===#
#===epoch: 36, val loss is: [0.93678844], val acc is: 81.60%===#
#===epoch: 37, lr=0.0001577264===#
#===epoch: 37, train loss is: [0.98586285], train acc is: 78.99%===#
#===epoch: 37, val loss is: [0.93320215], val acc is: 81.62%===#
#===epoch: 38, lr=0.0001355157===#
#===epoch: 38, train loss is: [0.9760749], train acc is: 79.36%===#
#===epoch: 38, val loss is: [0.9337833], val acc is: 81.80%===#
#===epoch: 39, lr=0.0001147434===#
#===epoch: 39, train loss is: [0.9714146], train acc is: 79.79%===#
#===epoch: 39, val loss is: [0.9247616], val acc is: 82.04%===#
#===epoch: 40, lr=0.0000954915===#
#===epoch: 40, train loss is: [0.9661569], train acc is: 79.73%===#
#===epoch: 40, val loss is: [0.92751354], val acc is: 81.85%===#
#===epoch: 41, lr=0.0000778360===#
#===epoch: 41, train loss is: [0.9587111], train acc is: 80.09%===#
#===epoch: 41, val loss is: [0.92223495], val acc is: 82.17%===#
#===epoch: 42, lr=0.0000618467===#
#===epoch: 42, train loss is: [0.959283], train acc is: 80.08%===#
#===epoch: 42, val loss is: [0.92457324], val acc is: 82.09%===#
#===epoch: 43, lr=0.0000475865===#
#===epoch: 43, train loss is: [0.9539068], train acc is: 80.48%===#
#===epoch: 43, val loss is: [0.9255979], val acc is: 82.05%===#
#===epoch: 44, lr=0.0000351118===#
#===epoch: 44, train loss is: [0.9511745], train acc is: 80.55%===#
#===epoch: 44, val loss is: [0.91929686], val acc is: 82.39%===#
#===epoch: 45, lr=0.0000244717===#
#===epoch: 45, train loss is: [0.9490688], train acc is: 80.66%===#
#===epoch: 45, val loss is: [0.91968024], val acc is: 82.34%===#
#===epoch: 46, lr=0.0000157084===#
#===epoch: 46, train loss is: [0.9512466], train acc is: 80.48%===#
#===epoch: 46, val loss is: [0.9192811], val acc is: 82.42%===#
#===epoch: 47, lr=0.0000088564===#
#===epoch: 47, train loss is: [0.94866693], train acc is: 80.64%===#
#===epoch: 47, val loss is: [0.91951287], val acc is: 82.40%===#
#===epoch: 48, lr=0.0000039426===#
#===epoch: 48, train loss is: [0.948632], train acc is: 80.63%===#
#===epoch: 48, val loss is: [0.9191138], val acc is: 82.36%===#
#===epoch: 49, lr=0.0000009866===#
#===epoch: 49, train loss is: [0.94711], train acc is: 80.66%===#
#===epoch: 49, val loss is: [0.91893375], val acc is: 82.38%===#
0.8241693037974683

2.6 Results

def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
    ''' Plot learning curve of your CNN '''
    maxtrain = max(map(float, record['train'][title]))
    maxval = max(map(float, record['val'][title]))
    ymax = max(maxtrain, maxval) * 1.1
    mintrain = min(map(float, record['train'][title]))
    minval = min(map(float, record['val'][title]))
    ymin = min(mintrain, minval) * 0.9

    x_1 = list(map(int, record['train']['iter']))
    x_2 = list(map(int, record['val']['iter']))
    figure(figsize=(10, 6))
    plt.plot(x_1, record['train'][title], c='tab:red', label='train')
    plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
    plt.ylim(ymin, ymax)
    plt.xlabel('Training steps')
    plt.ylabel(ylabel)
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')

(Figure: training/validation loss curves for AlexNet-MixConv.)

plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')

(Figure: training/validation accuracy curves for AlexNet-MixConv.)

import time
work_path = 'work/model'
model = AlexNet_Mixconv(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughout:1134
def get_cifar10_labels(labels):  
    """Return the text labels of the CIFAR10 dataset."""
    text_labels = [
        'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
        'horse', 'ship', 'truck']
    return [text_labels[int(i)] for i in labels]
def show_images(imgs, num_rows, num_cols, pred=None, gt=None, scale=1.5):  
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if paddle.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if pred and gt:
            ax.set_title("pt: " + pred[i] + "\ngt: " + gt[i])
    return axes
work_path = 'work/model'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = AlexNet_Mixconv(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 128, 128, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).  (repeated 18 times, once per image)
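The warning appears because X is still normalized with the ImageNet mean/std, so some pixel values fall outside [0, 1] and matplotlib clips them before display. A minimal fix (a sketch, not applied in the run above) is to de-normalize before plotting:

mean = paddle.to_tensor([0.485, 0.456, 0.406])
std = paddle.to_tensor([0.229, 0.224, 0.225])
X_show = X * std + mean   # X is NHWC after the transpose, so [3] broadcasts over the channel axis
axes = show_images(X_show, 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))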

(Figure: predictions vs. ground truth for 18 validation images.)

3. AlexNet

3.1 AlexNet

class AlexNet(nn.Layer):
    def __init__(self,num_classes=10):
        super(AlexNet, self).__init__()
        self.features=nn.Sequential(
            nn.Conv2D(3,48, kernel_size=11, stride=4, padding=11//2),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
            nn.Conv2D(48,128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
            nn.Conv2D(128, 256,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.Conv2D(256,256,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.Conv2D(256,128,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
        )
        self.classifier=nn.Sequential(
            nn.Linear(3*3*128,2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048,2048),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(2048,num_classes),
        )
 
 
    def forward(self, x):
        x = self.features(x)
        x = paddle.flatten(x, 1)
        x = self.classifier(x)
        return x
model = AlexNet(num_classes=10)
paddle.summary(model, (1, 3, 128, 128))
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Conv2D-49     [[1, 3, 128, 128]]    [1, 48, 32, 32]        17,472     
    ReLU-33      [[1, 48, 32, 32]]     [1, 48, 32, 32]           0       
 MaxPool2D-13    [[1, 48, 32, 32]]     [1, 48, 15, 15]           0       
   Conv2D-50     [[1, 48, 15, 15]]     [1, 128, 15, 15]       153,728    
    ReLU-34      [[1, 128, 15, 15]]    [1, 128, 15, 15]          0       
 MaxPool2D-14    [[1, 128, 15, 15]]     [1, 128, 7, 7]           0       
   Conv2D-51      [[1, 128, 7, 7]]      [1, 256, 7, 7]        295,168    
    ReLU-35       [[1, 256, 7, 7]]      [1, 256, 7, 7]           0       
   Conv2D-52      [[1, 256, 7, 7]]      [1, 256, 7, 7]        590,080    
    ReLU-36       [[1, 256, 7, 7]]      [1, 256, 7, 7]           0       
   Conv2D-53      [[1, 256, 7, 7]]      [1, 128, 7, 7]        295,040    
    ReLU-37       [[1, 128, 7, 7]]      [1, 128, 7, 7]           0       
 MaxPool2D-15     [[1, 128, 7, 7]]      [1, 128, 3, 3]           0       
   Linear-13        [[1, 1152]]           [1, 2048]          2,361,344   
    ReLU-38         [[1, 2048]]           [1, 2048]              0       
   Dropout-9        [[1, 2048]]           [1, 2048]              0       
   Linear-14        [[1, 2048]]           [1, 2048]          4,196,352   
    ReLU-39         [[1, 2048]]           [1, 2048]              0       
  Dropout-10        [[1, 2048]]           [1, 2048]              0       
   Linear-15        [[1, 2048]]            [1, 10]            20,490     
===========================================================================
Total params: 7,929,674
Trainable params: 7,929,674
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 1.90
Params size (MB): 30.25
Estimated Total Size (MB): 32.34
---------------------------------------------------------------------------






{'total_params': 7929674, 'trainable_params': 7929674}

3.2 Training

learning_rate = 0.001
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model1'

model = AlexNet(num_classes=10)

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)

best_acc = 0.0
val_acc = 0.0
loss_record1 = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record1 = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record1['train']['loss'].append(loss.numpy())
            loss_record1['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()
        
        train_loss += loss
        train_num += len(y_data)

    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record1['train']['acc'].append(train_acc)
    acc_record1['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record1['val']['loss'].append(total_val_loss.numpy())
    loss_record1['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record1['val']['acc'].append(val_acc)
    acc_record1['val']['iter'].append(acc_iter)
    
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
#===epoch: 0, lr=0.0010000000===#
#===epoch: 0, train loss is: [1.9650538], train acc is: 31.60%===#
#===epoch: 0, val loss is: [1.6305186], val acc is: 48.43%===#
#===epoch: 1, lr=0.0009990134===#
#===epoch: 1, train loss is: [1.7221233], train acc is: 43.71%===#
#===epoch: 1, val loss is: [1.6270477], val acc is: 49.51%===#
#===epoch: 2, lr=0.0009960574===#
#===epoch: 2, train loss is: [1.6308397], train acc is: 48.63%===#
#===epoch: 2, val loss is: [1.4422747], val acc is: 57.20%===#
#===epoch: 3, lr=0.0009911436===#
#===epoch: 3, train loss is: [1.5810406], train acc is: 51.07%===#
#===epoch: 3, val loss is: [1.4248943], val acc is: 59.70%===#
#===epoch: 4, lr=0.0009842916===#
#===epoch: 4, train loss is: [1.5372194], train acc is: 53.51%===#
#===epoch: 4, val loss is: [1.4051503], val acc is: 59.93%===#
#===epoch: 5, lr=0.0009755283===#
#===epoch: 5, train loss is: [1.5010643], train acc is: 55.45%===#
#===epoch: 5, val loss is: [1.3973494], val acc is: 60.99%===#
#===epoch: 6, lr=0.0009648882===#
#===epoch: 6, train loss is: [1.4671948], train acc is: 57.16%===#
#===epoch: 6, val loss is: [1.3499789], val acc is: 62.58%===#
#===epoch: 7, lr=0.0009524135===#
#===epoch: 7, train loss is: [1.4449805], train acc is: 57.94%===#
#===epoch: 7, val loss is: [1.3680706], val acc is: 62.40%===#
#===epoch: 8, lr=0.0009381533===#
#===epoch: 8, train loss is: [1.4209102], train acc is: 59.30%===#
#===epoch: 8, val loss is: [1.2962121], val acc is: 65.51%===#
#===epoch: 9, lr=0.0009221640===#
#===epoch: 9, train loss is: [1.408174], train acc is: 60.22%===#
#===epoch: 9, val loss is: [1.31671], val acc is: 64.29%===#
#===epoch: 10, lr=0.0009045085===#
#===epoch: 10, train loss is: [1.3806214], train acc is: 61.43%===#
#===epoch: 10, val loss is: [1.2446662], val acc is: 67.59%===#
#===epoch: 11, lr=0.0008852566===#
#===epoch: 11, train loss is: [1.3622785], train acc is: 62.25%===#
#===epoch: 11, val loss is: [1.2531981], val acc is: 67.60%===#
#===epoch: 12, lr=0.0008644843===#
#===epoch: 12, train loss is: [1.3495497], train acc is: 63.01%===#
#===epoch: 12, val loss is: [1.2252071], val acc is: 68.14%===#
#===epoch: 13, lr=0.0008422736===#
#===epoch: 13, train loss is: [1.3302865], train acc is: 63.91%===#
#===epoch: 13, val loss is: [1.2246354], val acc is: 68.83%===#
#===epoch: 14, lr=0.0008187120===#
#===epoch: 14, train loss is: [1.325365], train acc is: 64.09%===#
#===epoch: 14, val loss is: [1.1886824], val acc is: 70.48%===#
#===epoch: 15, lr=0.0007938926===#
#===epoch: 15, train loss is: [1.3083125], train acc is: 64.63%===#
#===epoch: 15, val loss is: [1.2410982], val acc is: 67.76%===#
#===epoch: 16, lr=0.0007679134===#
#===epoch: 16, train loss is: [1.2942201], train acc is: 65.51%===#
#===epoch: 16, val loss is: [1.1892152], val acc is: 70.74%===#
#===epoch: 17, lr=0.0007408768===#
#===epoch: 17, train loss is: [1.284402], train acc is: 66.05%===#
#===epoch: 17, val loss is: [1.2043357], val acc is: 70.05%===#
#===epoch: 18, lr=0.0007128896===#
#===epoch: 18, train loss is: [1.2684674], train acc is: 66.85%===#
#===epoch: 18, val loss is: [1.1422471], val acc is: 72.52%===#
#===epoch: 19, lr=0.0006840623===#
#===epoch: 19, train loss is: [1.2642417], train acc is: 67.08%===#
#===epoch: 19, val loss is: [1.1457285], val acc is: 72.66%===#
#===epoch: 20, lr=0.0006545085===#
#===epoch: 20, train loss is: [1.2530406], train acc is: 67.44%===#
#===epoch: 20, val loss is: [1.1435425], val acc is: 72.41%===#
#===epoch: 21, lr=0.0006243449===#
#===epoch: 21, train loss is: [1.230555], train acc is: 68.47%===#
#===epoch: 21, val loss is: [1.151703], val acc is: 72.39%===#
#===epoch: 22, lr=0.0005936907===#
#===epoch: 22, train loss is: [1.2243475], train acc is: 68.78%===#
#===epoch: 22, val loss is: [1.1317416], val acc is: 73.12%===#
#===epoch: 23, lr=0.0005626666===#
#===epoch: 23, train loss is: [1.2125044], train acc is: 69.28%===#
#===epoch: 23, val loss is: [1.131524], val acc is: 73.56%===#
#===epoch: 24, lr=0.0005313953===#
#===epoch: 24, train loss is: [1.1983484], train acc is: 70.06%===#
#===epoch: 24, val loss is: [1.1417092], val acc is: 73.41%===#
#===epoch: 25, lr=0.0005000000===#
#===epoch: 25, train loss is: [1.1918993], train acc is: 70.11%===#
#===epoch: 25, val loss is: [1.1028641], val acc is: 74.72%===#
#===epoch: 26, lr=0.0004686047===#
#===epoch: 26, train loss is: [1.1755028], train acc is: 70.75%===#
#===epoch: 26, val loss is: [1.0835562], val acc is: 75.52%===#
#===epoch: 27, lr=0.0004373334===#
#===epoch: 27, train loss is: [1.1704789], train acc is: 71.13%===#
#===epoch: 27, val loss is: [1.0854902], val acc is: 76.30%===#
#===epoch: 28, lr=0.0004063093===#
#===epoch: 28, train loss is: [1.1576461], train acc is: 71.59%===#
#===epoch: 28, val loss is: [1.0876684], val acc is: 75.67%===#
#===epoch: 29, lr=0.0003756551===#
#===epoch: 29, train loss is: [1.1432424], train acc is: 72.16%===#
#===epoch: 29, val loss is: [1.0748475], val acc is: 76.01%===#
#===epoch: 30, lr=0.0003454915===#
#===epoch: 30, train loss is: [1.1328856], train acc is: 72.69%===#
#===epoch: 30, val loss is: [1.0685778], val acc is: 76.72%===#
#===epoch: 31, lr=0.0003159377===#
#===epoch: 31, train loss is: [1.1225183], train acc is: 73.08%===#
#===epoch: 31, val loss is: [1.0607836], val acc is: 76.54%===#
#===epoch: 32, lr=0.0002871104===#
#===epoch: 32, train loss is: [1.114567], train acc is: 73.67%===#
#===epoch: 32, val loss is: [1.0464559], val acc is: 77.76%===#
#===epoch: 33, lr=0.0002591232===#
#===epoch: 33, train loss is: [1.1031892], train acc is: 74.00%===#
#===epoch: 33, val loss is: [1.0455275], val acc is: 77.51%===#
#===epoch: 34, lr=0.0002320866===#
#===epoch: 34, train loss is: [1.0884582], train acc is: 74.94%===#
#===epoch: 34, val loss is: [1.0408577], val acc is: 77.93%===#
#===epoch: 35, lr=0.0002061074===#
#===epoch: 35, train loss is: [1.0837501], train acc is: 74.68%===#
#===epoch: 35, val loss is: [1.0423734], val acc is: 77.95%===#
#===epoch: 36, lr=0.0001812880===#
#===epoch: 36, train loss is: [1.0759592], train acc is: 74.98%===#
#===epoch: 36, val loss is: [1.0235242], val acc is: 78.32%===#
#===epoch: 37, lr=0.0001577264===#
#===epoch: 37, train loss is: [1.0702893], train acc is: 75.37%===#
#===epoch: 37, val loss is: [1.016121], val acc is: 79.02%===#
#===epoch: 38, lr=0.0001355157===#
#===epoch: 38, train loss is: [1.0639284], train acc is: 75.69%===#
#===epoch: 38, val loss is: [1.0210428], val acc is: 78.49%===#
#===epoch: 39, lr=0.0001147434===#
#===epoch: 39, train loss is: [1.0564318], train acc is: 76.00%===#
#===epoch: 39, val loss is: [1.0193514], val acc is: 79.28%===#
#===epoch: 40, lr=0.0000954915===#
#===epoch: 40, train loss is: [1.051025], train acc is: 76.37%===#
#===epoch: 40, val loss is: [1.014566], val acc is: 79.20%===#
#===epoch: 41, lr=0.0000778360===#
#===epoch: 41, train loss is: [1.0399163], train acc is: 76.71%===#
#===epoch: 41, val loss is: [1.014955], val acc is: 78.92%===#
#===epoch: 42, lr=0.0000618467===#
#===epoch: 42, train loss is: [1.0356172], train acc is: 77.01%===#
#===epoch: 42, val loss is: [1.0119495], val acc is: 79.00%===#
#===epoch: 43, lr=0.0000475865===#
#===epoch: 43, train loss is: [1.0322536], train acc is: 77.08%===#
#===epoch: 43, val loss is: [1.0156872], val acc is: 79.09%===#
#===epoch: 44, lr=0.0000351118===#
#===epoch: 44, train loss is: [1.0303307], train acc is: 77.04%===#
#===epoch: 44, val loss is: [1.0064815], val acc is: 79.43%===#
#===epoch: 45, lr=0.0000244717===#
#===epoch: 45, train loss is: [1.02571], train acc is: 77.35%===#
#===epoch: 45, val loss is: [1.0124086], val acc is: 79.30%===#
#===epoch: 46, lr=0.0000157084===#
#===epoch: 46, train loss is: [1.0294893], train acc is: 77.29%===#
#===epoch: 46, val loss is: [1.0075635], val acc is: 79.39%===#
#===epoch: 47, lr=0.0000088564===#
#===epoch: 47, train loss is: [1.0229493], train acc is: 77.52%===#
#===epoch: 47, val loss is: [1.0071208], val acc is: 79.50%===#
#===epoch: 48, lr=0.0000039426===#
#===epoch: 48, train loss is: [1.022951], train acc is: 77.40%===#
#===epoch: 48, val loss is: [1.0082942], val acc is: 79.53%===#
#===epoch: 49, lr=0.0000009866===#
#===epoch: 49, train loss is: [1.0234195], train acc is: 77.50%===#
#===epoch: 49, val loss is: [1.0080563], val acc is: 79.57%===#
0.7956882911392406

3.3 Results

plot_learning_curve(loss_record1, title='loss', ylabel='CE Loss')

(Figure: training/validation loss curves for the AlexNet baseline.)

plot_learning_curve(acc_record1, title='acc', ylabel='Accuracy')

(Figure: training/validation accuracy curves for the AlexNet baseline.)

import time
work_path = 'work/model1'
model = AlexNet(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughout:1165
work_path = 'work/model1'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = AlexNet(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 128, 128, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).  (repeated 18 times; see the de-normalization note in Section 2.6)

(Figure: predictions vs. ground truth for 18 validation images, AlexNet baseline.)

4. Comparison of Results

| Model               | Train Acc | Val Acc | Parameters |
| ------------------- | --------- | ------- | ---------- |
| AlexNet w/o MixConv | 0.7750    | 0.79569 | 7,929,674  |
| AlexNet w/ MixConv  | 0.8048    | 0.82417 | 7,065,418  |

Summary

MixConv reduces the parameter count by 864,256 while converging noticeably faster and improving validation accuracy by 0.02848 (0.79569 → 0.82417).

This article is a repost; original project: https://aistudio.baidu.com/aistudio/projectdetail/4349384
