![cover](https://img-blog.csdnimg.cn/img_convert/9a63bc53a6b5f8e569e60c2f556c2693.png)
[AI Training Camp, Session 3] Fine-Grained Rice Classification Based on PaddleClas
This article uses the ResNet50_vd model from the PaddleClas suite to classify rice varieties, reaching an accuracy of 0.99. It also explains and implements a Transformer model for the same classification task.
1. Project Introduction
Rice is one of the world's three major staple crops and originated in China, where archaeological evidence dates its cultivation to around 6000 BC. Thanks to its high yield, efficiency, and broad adaptability, rice is a principal food crop across Asia and Africa, and many regard it as a source of life, so the quality of rice varieties matters greatly. Varieties are judged against different quality criteria, including appearance, cooking aroma, and taste. Traditional manual inspection classifies grains against physical criteria and is expensive, inefficient, and unreliable. With the rapid progress of machine vision and image processing, automatic inspection based on physical features such as color, texture, and size has become practical, but extracting good features is still hard and takes experience. Against this background, deep learning is a promising approach to rice variety classification. This article mainly uses the ResNet50_vd model to classify rice, covering environment setup, data processing, model configuration, training and tuning, and model export and inference. In addition, to explore how Transformer models apply to classification, it walks through how to build and implement a Transformer model.
2. Approach
To classify rice varieties quickly and efficiently, we first use the PaddleClas suite provided by PaddlePaddle. PaddleClas not only makes it easy to build models, but also covers the complete workflow: data augmentation, preprocessing, training, and evaluation. Users can flexibly configure parameters to train and tune models quickly.
PaddleClas is a powerful image-classification suite, well suited to both beginners and professionals. Its advantages include:
1. User-friendly, easy to get started with and debug;
2. Efficient computation and flexible model configuration;
3. Support for the common image-classification algorithms and models, covering a range of application scenarios;
4. Strong extensibility, with custom models and metrics.
PaddleClas repository: github
PaddleClas documentation: user guide
Meanwhile, to better understand the Transformer model, we also build a classification pipeline directly with the PaddlePaddle API, covering parameter configuration, data loading, model definition, and training and optimization. The Transformer, proposed by Vaswani et al. in 2017, is a neural network built on self-attention; it let neural networks surpass traditional recurrent approaches on machine translation and other natural-language-processing tasks. It remains one of the most important models in NLP and is the core algorithm behind well-known pretrained models such as GPT, BERT, and Transformer-XL.
A Transformer has two main parts: an encoder and a decoder. Each is a stack of identical layers containing multi-head self-attention (Multi-Head Attention) and a feed-forward network (Feed-Forward Neural Network). The two stacks are structurally similar, except the decoder adds an extra attention block that looks at the input context when generating output.
Concretely, self-attention maps the input sequence to queries, keys, and values; weights are computed from query-key similarities and then applied to the values to produce the output. The similarity is a dot product, converted to weights via a scaling factor and a softmax. In this way, self-attention learns the dependencies within the input sequence and the importance of each position in different contexts. The strength of the Transformer is that it models these internal dependencies, capturing the global structure of the sequence; multi-head attention learns contextual information at different levels and from different aspects, improving expressiveness.
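The scaled dot-product step just described can be sketched in a few lines of NumPy (an illustrative single-head example, not code from PaddleClas or the ViT implementation later in this article):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # query-key similarity, scaled by sqrt(d_k)
    weights = softmax(scores)        # normalize similarities into attention weights
    return weights @ v               # weighted sum of the values

q = np.random.rand(4, 8)   # 4 query positions, dim 8
k = np.random.rand(4, 8)   # keys share the query dimension
v = np.random.rand(4, 16)  # values may have a different dimension
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 16)
```

Each output row is a convex combination of the value rows, with weights determined by how similar the corresponding query is to each key.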
Transformers are now used not only in NLP but also widely in image classification, object detection, and image segmentation; see PaddleViT for more. Here we take image classification as the example. The overall structure of a Transformer classification model is shown below:
As the figure shows, the core of the Transformer lies in the attention blocks of the encoder. The image is converted into tokens analogous to word embeddings and fed to the network; key information is extracted through the stacked encoder layers, and a softmax classifier produces the final prediction. The details are covered step by step below.
3. Environment Setup
- Python >= 3.6
- PaddlePaddle >= 2.1
- PaddleClas
Note: the steps below use common Linux commands such as ls, cd, and clone; a few of them:
ls: list the files and folders in the current directory.
cd: change directory.
mkdir: create a directory.
cp: copy files or directories.
mv: move or rename files or directories.
rm: delete files or directories.
unzip: extract a zip archive.
For other commands, see this Linux tutorial.
# Switch to the work directory
%cd /home/aistudio/work
# Clone PaddleClas locally
!git clone https://github.com/PaddlePaddle/PaddleClas.git
# Enter the PaddleClas directory
%cd /home/aistudio/work/PaddleClas
# Install the project's extra dependencies
!pip install -r requirements.txt
!python setup.py install
4. Dataset Introduction and Processing
The dataset comes from the Rice Image Dataset on Kaggle. The archive is about 220 MB and contains five rice varieties: Arborio, Basmati, Ipsala, Jasmine, Karacadag.
Each variety has 15000 images, 75000 in total. The dataset was attached when this project was created, so it only needs to be unzipped.
## Check the mounted dataset directory and unzip it.
%cd /home/aistudio/data/
!unzip data199018/Rice_Image_Dataset.zip
## Alternatively, download the dataset with wget and unzip it.
#%cd /home/aistudio/data/
#!wget https://www.muratkoklu.com/datasets/Rice_Image_Dataset.zip
#!unzip Rice_Image_Dataset.zip
4.1 Inspecting the Data
The extracted data lives in /data/Rice_Image_Dataset and contains five folders, one per rice variety.
To show the different varieties, we display one image from each.
## Import modules
import cv2,os
import matplotlib.pyplot as plt
import warnings
## Warnings sometimes appear; ignore them
warnings.filterwarnings("ignore")
## With inline plotting enabled, plt.show() can be omitted
%matplotlib inline
path = "/home/aistudio/data/Rice_Image_Dataset/"
data_list = os.listdir(path)
#print(data_list)
index = 1
plt.figure(figsize=(12,3))
for cur_dir in data_list:
    if not cur_dir.endswith(".txt"):
        for data in os.listdir(os.path.join(path, cur_dir)):
            img = cv2.imread(os.path.join(path, cur_dir, data))  # note: cv2 loads BGR, so colors may look off in matplotlib
            #print(img.shape)
            plt.subplot(1, 5, index)
            index += 1
            plt.title(cur_dir)
            plt.imshow(img)
            break
4.2 Dataset Split
With the data in place, organize it into the following structure; each line of train_list.txt and val_list.txt looks like:
# Each line separates the image path and the label with a space
Arborio(1000).jpg 0
The first field is the image name and the second the label, usually starting from 0. We split the data into training and validation sets at a 7:3 ratio and generate train_list.txt and val_list.txt, together with the corresponding train and val image folders; the new files are written to /home/aistudio/data/.
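As a quick sanity check on the split arithmetic (a standalone sketch mirroring the `int(n * train_ratio)` cutoff used in the script below): with 15000 images per class and a 0.7 ratio, each class contributes 10500 training and 4500 validation images.

```python
n = 15000           # images per class
train_ratio = 0.7
split = int(n * train_ratio)  # images before this index go to the training set
print(split, n - split)       # 10500 4500
```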
The label file contains:
0 Arborio
1 Basmati
2 Ipsala
3 Jasmine
4 Karacadag
import os
import numpy as np
from PIL import Image
import io
import re
import shutil
root_path = "/home/aistudio/data/Rice_Image_Dataset/"
data_list = os.listdir(root_path)
save_path = "/home/aistudio/data/"
# Create the output folders
train = "/home/aistudio/data/train/"
val = "/home/aistudio/data/val/"
if not os.path.exists(train):
os.makedirs(train)
if not os.path.exists(val):
os.makedirs(val)
## train/validation split ratio
train_ratio = 0.7
## Open the list files for writing
train_file = open(os.path.join(save_path, 'train_list.txt'), 'w', encoding='utf-8')
val_file = open(os.path.join(save_path, 'val_list.txt'), 'w', encoding='utf-8')
label_dict = os.path.join(save_path, 'label_list.txt')
## Counters for train/val images
a = 0
b = 0
with open(label_dict, "w") as label_list:
    label_id = 0
    for i, path in enumerate(sorted(data_list)):
        if not path.endswith(".txt") and "_" not in path:
            label_list.write("{0} {1}\n".format(label_id, path))
            image_path = os.listdir(os.path.join(root_path, path))
            n = len(image_path)  # n = 15000
            for index, img in enumerate(image_path):
                try:
                    # img is a relative path at this point
                    img_f = os.path.join(root_path, path, img)
                    ## verify the image is readable; only keep it if so
                    with open(img_f, "rb") as img_file:
                        save_img = Image.open(io.BytesIO(img_file.read()))
                    if index < int(n * train_ratio):
                        shutil.copyfile(os.path.join(root_path, path, img), os.path.join(train, img))
                        ## strip spaces and parentheses from the file name, then rename
                        new_img = re.sub(r"[() ]", "", img)
                        os.rename(os.path.join(train, img), os.path.join(train, new_img))
                        train_file.write("{0} {1}\n".format(os.path.join(train, new_img), label_id))
                        a += 1
                    else:
                        shutil.copyfile(os.path.join(root_path, path, img), os.path.join(val, img))
                        new_img = re.sub(r"[() ]", "", img)
                        os.rename(os.path.join(val, img), os.path.join(val, new_img))
                        val_file.write("{0} {1}\n".format(os.path.join(val, new_img), label_id))
                        b += 1
                except Exception:
                    # skip unreadable images
                    continue
            label_id += 1
train_file.close()
val_file.close()
print(a)
print(b)
## Inspect the generated files
!tree -L 1 /home/aistudio/data
!head -20 /home/aistudio/data/train_list.txt
5. Model Training
PaddleClas ships a rich model zoo: 29 model series, with training configurations and pretrained weights for 134 models on the ImageNet-1k dataset. For preprocessing it offers 8 data-augmentation methods, making it easy to expand the data and improve model robustness. To train, we only need to edit the yaml configuration in the configs folder for the chosen model. Here we use ResNet50_vd from the model zoo; its configuration file is at …/configs/ImageNet/ResNet/ResNet50_vd.yaml. For parameter details, see the model configuration guide.
ResNet50_vd.yaml configuration:
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
  device: gpu ## GPU by default; switch to cpu if needed
  save_interval: 1 ## save a checkpoint every epoch
  eval_during_train: True ## evaluate during training
eval_interval: 1
  epochs: 10 ## number of training epochs
  print_batch_step: 100 ## log every 100 iterations
use_visualdl: True
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: ResNet50_vd
  class_num: 5 # number of classes
# loss function config for traing/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
  name: Momentum ## optimizer
momentum: 0.9
lr:
name: CosineWarmup
learning_rate: 0.1
    warmup_epoch: 10 ## learning-rate warmup epochs
regularizer:
name: 'L2'
coeff: 0.00007
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
      image_root: ./ ## dataset root directory
      cls_label_path: /home/aistudio/data/train_list.txt ## absolute path to the generated list file
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
batch_transform_ops:
- MixupOperator:
alpha: 0.2
sampler:
name: DistributedBatchSampler
batch_size: 32
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./
cls_label_path: /home/aistudio/data/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 32
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 1
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 1
class_id_map_file: data/label_list.txt
Metric:
Train:
Eval:
- TopkAcc:
topk: [1, 5]
After editing the configuration file, run the code below to train and evaluate at the same time. The final evaluation accuracy reaches 0.99853.
# Enter the PaddleClas directory
%cd /home/aistudio/work/PaddleClas
!python3 tools/train.py --config /home/aistudio/work/PaddleClas/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
Transformer Classification Model
This part explains and implements classification with a Transformer: parameter configuration, data loading, network definition, and training and optimization. The configuration covers data loading/saving, training, model, and optimizer parameters, and can be customized as needed. A log file and a statistics class record information while the model runs; the details follow.
import logging, time
train_parameters = {
    ## data
    "image_size": 224,
    "input_channels": 3,
    "resize_short": 256,
    "mean_rgb": [0.485, 0.456, 0.406],  # channel means
    "std_rgb": [0.229, 0.224, 0.225],   # channel standard deviations
    "data_dir": r"/home/aistudio/data/",  # where the training data lives
    "train_file_list": "train_list.txt",  # must match the list files generated earlier
    "val_file_list": "val_list.txt",
    "label_file": "label_list.txt",
    ## training
    "batch_size": 16,
    "save_path": "./freeze-model",
    "save_freq": 10,  ## checkpoint frequency (epochs)
    "last_epoch": 0,
    "pretrained": False,
    "pretrained_dir": r"/home/aistudio",
    "mode": "train",
    "use_gpu": True,
    "num_works": 1,
    "accum_iter": 1,  ## gradient accumulation steps
    "debug_freq": 100,
    ## model
    "num_epochs": 200,
    "num_classes": 5,  # number of classes
    "patch_size": 16,
    "embed_dim": 768,
    "depth": 12,
    "num_heads": 8,
    "attn_head_size": None,
    "mlp_ratio": 4.0,
    "qkv_bias": True,
    "dropout": 0.0,
    "attention_dropout": 0.0,
    "droppath": 0,
    ## optimizer
    "base_lr": 0.002,
    "weight_decay": 0.01,
    "betas": [0.9, 0.999],
    "eps": 1e-8,
    "warmup_epochs": 40,
    "warmup_start_lr": 0.00001,
    "end_lr": 0.0001
}
## Initialize logging; call this once, then use the global `logger` for all log records
def init_log_config():
    """
    Initialize logging configuration.
    :return:
    """
    global logger
    logger = logging.getLogger()   ## create the logger object
    logger.setLevel(logging.INFO)  ## global log level
    log_path = os.path.join(os.getcwd(), 'logs')
    if not os.path.exists(log_path):
        os.makedirs(log_path)
    log_name = os.path.join(log_path, 'train.log')
    sh = logging.StreamHandler()
    sh.setLevel(logging.INFO)      ## console log level
    fh = logging.FileHandler(log_name, mode='w')
    fh.setLevel(logging.DEBUG)     ## file log level
    # output format
    formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
    fh.setFormatter(formatter)
    sh.setFormatter(formatter)
    # attach the console and file handlers to the logger
    logger.addHandler(sh)
    logger.addHandler(fh)
## Statistics helper
class AverageMeter():
""" Meter for monitoring losses"""
def __init__(self):
self.avg = 0
self.sum = 0
self.cnt = 0
self.reset()
def reset(self):
"""reset all values to zeros"""
self.avg = 0
self.sum = 0
self.cnt = 0
def update(self, val, n=1):
"""update avg by val and n, where val is the avg of n values"""
self.sum += val * n
self.cnt += n
self.avg = self.sum / self.cnt
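To make the weighted-average bookkeeping concrete, here is the meter exercised on two batches of different sizes (the class is re-declared minimally so the snippet runs on its own):

```python
class AverageMeter:
    """Track a running weighted average, e.g. of per-batch loss."""
    def __init__(self):
        self.avg = self.sum = self.cnt = 0
    def update(self, val, n=1):
        # val is the mean over n samples, so weight it by n
        self.sum += val * n
        self.cnt += n
        self.avg = self.sum / self.cnt

meter = AverageMeter()
meter.update(0.5, n=10)  # batch of 10 samples with mean loss 0.5
meter.update(1.0, n=30)  # batch of 30 samples with mean loss 1.0
print(meter.avg)         # 0.875 = (0.5*10 + 1.0*30) / 40
```

Weighting by the batch size matters when the last batch is smaller than the rest; a plain mean of batch means would be slightly off.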
In PaddlePaddle, Dataset and DataLoader are two key data-handling interfaces covering reading, preprocessing, and batched loading. Dataset reads and processes individual samples; DataLoader is an iterator that splits the data into batches of batch_size and returns them. Using Dataset generally involves these steps:
- Create a custom dataset class inheriting from paddle.io.Dataset.
- Implement the __init__, __getitem__, and __len__ methods.
- In __init__, set up dataset parameters such as the data path.
- In __getitem__, read an image or other data and preprocess it.
- In __len__, return the dataset size.
Using DataLoader is simpler:
- Create a DataLoader object with the custom dataset as input; batch size, shuffling, multiprocessing, and other options can be configured.
import os
import math
from paddle.io import Dataset
from paddle.io import DataLoader
from paddle.io import DistributedBatchSampler
from paddle.vision import transforms
from paddle.vision import image_load
class RiceImageDataset(Dataset):
def __init__(self, file_folder, is_train=True, transform_ops=None):
super().__init__()
self.file_folder = file_folder
self.transforms = transform_ops
self.img_path_list = []
self.label_list = []
list_name = 'train_list.txt' if is_train else 'val_list.txt'
self.list_file = os.path.join(self.file_folder, list_name)
assert os.path.isfile(self.list_file), f'{self.list_file} not exist!'
        # parse the list file: each line is '<image path> <label>'
with open(self.list_file, 'r') as infile:
for line in infile:
img_path = line.strip().split()[0]
img_label = int(line.strip().split()[1])
self.img_path_list.append(os.path.join(self.file_folder, img_path))
self.label_list.append(img_label)
def __len__(self):
return len(self.label_list)
    # return one (image, label) sample
def __getitem__(self, index):
data = image_load(self.img_path_list[index]).convert('RGB')
data = self.transforms(data)
label = self.label_list[index]
return data, label
## Preprocessing ops; you can implement your own or use the built-in transforms, as done here
def train_transforms_vit(is_train):
    if is_train:  ## training transforms
        transform = transforms.Compose([
            transforms.RandomResizedCrop(size=(train_parameters["image_size"], train_parameters["image_size"]),
                                         interpolation='bicubic'),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean=train_parameters["mean_rgb"], std=train_parameters["std_rgb"])])
    else:  ## validation transforms
        transform = transforms.Compose([
            transforms.Resize(size=train_parameters["resize_short"]),
            transforms.CenterCrop(size=(train_parameters["image_size"], train_parameters["image_size"])),
            transforms.ToTensor(),
            transforms.Normalize(mean=train_parameters["mean_rgb"], std=train_parameters["std_rgb"])])
    return transform
## Show one sample
trans = train_transforms_vit(True)
rice_data = RiceImageDataset(train_parameters["data_dir"],True,trans)
print(rice_data[0])
## Batched data loading
def get_dataloader(dataset, is_train=True, use_dist_sampler=False):
batch_size = train_parameters["batch_size"]
    if use_dist_sampler:  ## multi-GPU training
sampler = DistributedBatchSampler(dataset=dataset,
batch_size=batch_size,
shuffle=is_train,
drop_last=is_train)
dataloader = DataLoader(dataset=dataset,
batch_sampler=sampler,
num_workers=train_parameters["num_works"])
else:
dataloader = DataLoader(dataset=dataset,
batch_size=batch_size,
num_workers=train_parameters["num_works"],
shuffle=is_train,
drop_last=is_train)
return dataloader
data_loader = get_dataloader(rice_data, True, False)
print(len(data_loader))
Transformers were first used in natural language processing, where the input is a sequence of embeddings and the network predicts the probability of the next token. We therefore need an analogous treatment of images so the model can train normally. Concretely: split the image into equal-sized patches and treat each patch as one input element, forming an input sequence. Stacked Transformer layers then learn a global vector representation capturing the semantics of the whole image. Since ViT contains no recurrence or convolution, the model needs information about the relative or absolute positions of tokens to exploit sequence order. Position encodings have the same dimension as the embeddings, so the two can simply be added; they come in fixed and learnable variants, and here we use learnable position encodings.
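Splitting an image into patch tokens, as described above, is pure reshaping. A NumPy shape sketch (illustrative only; the ViT implementation below does this with a strided Conv2D, which is equivalent to patchify plus a shared linear projection — and note that 16*16*3 = 768 happens to equal the embedding dimension used here):

```python
import numpy as np

B, C, H, W, P = 2, 3, 224, 224, 16
img = np.random.rand(B, C, H, W)
# split H and W into P-sized patches: [B, C, H/P, P, W/P, P]
x = img.reshape(B, C, H // P, P, W // P, P)
# bring each patch's pixels together: [B, H/P, W/P, C, P, P]
x = x.transpose(0, 2, 4, 1, 3, 5)
# flatten the patch grid and each patch's pixels: [B, num_patches, C*P*P]
patches = x.reshape(B, (H // P) * (W // P), C * P * P)
print(patches.shape)  # (2, 196, 768)
```

Each of the 196 rows is one 16x16x3 patch flattened into a 768-dimensional vector, ready to be treated like a word embedding.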
Because this project is a classification task, a cls_token is also prepended to the input sequence and used to compute the classification output. Since every position in a Transformer attends over the whole sequence, prepending a cls_token lets the model "see" the entire input when producing it. In ViT, the cls_token is a trainable parameter with the same embedding dimension as the patch tokens; it helps the model capture sequence-level semantics, improving classification performance and feature quality. After these steps, the processed tokens are fed into the Encoder. The number of encoder layers is configurable, and each layer has the same structure: Multi-Head Attention, LayerNorm, and a Feed-Forward Network.
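The token handling just described — prepend a cls token, then add position encodings — comes down to simple shape manipulation. A shape-only NumPy sketch (illustrative; in the real model cls_token and pos_embed are learnable Paddle parameters):

```python
import numpy as np

B, N, C = 2, 196, 768  # batch, patches (14*14 for a 224 image with 16px patches), embed dim
patch_tokens = np.zeros((B, N, C))
cls_token = np.zeros((1, 1, C))                        # one learnable token...
cls_tokens = np.broadcast_to(cls_token, (B, 1, C))     # ...broadcast over the batch
x = np.concatenate([cls_tokens, patch_tokens], axis=1) # cls token goes first
pos_embed = np.zeros((1, N + 1, C))                    # one position vector per token, incl. cls
x = x + pos_embed                                      # same dims, so a plain elementwise add
print(x.shape)  # (2, 197, 768)
```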
In NLP, attention weights the elements at different positions of the sequence so that information important for the current prediction is emphasized. In image classification, attention serves local and global feature extraction. After training, ViT learns feature representations at multiple levels, from shallow to high-level, and finally a global representation of the image. Compared with a conventional convolutional network, this kind of feature extraction is more flexible and extensible. A schematic of the attention mechanism is shown below:
The attention function maps a Query and a set of Key-Value pairs to an output. Query and Key share a dimension d_k, Value may have a different one, and all outputs are vectors; this particular form is called "Scaled Dot-Product Attention". We first compute the similarity of the Query with each Key, scale the similarity matrix, and apply softmax to obtain the weights over the Keys; the weighted sum of the Values is the output. The formula is:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
The scaling factor acts as a normalizer, preventing large swings during training. Compared with single-head attention, Multi-Head Attention lets the model attend to different positions at once; each head can represent different relationships in its own subspace, which a single head generally cannot.
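Multi-head attention splits the embedding dimension across the heads rather than duplicating it. With this project's settings (embed_dim=768, num_heads=8), each head works on 96 features; the split, in standalone NumPy terms (this mirrors the reshape done by transpose_multihead in the code below):

```python
import numpy as np

B, N, n_heads, head_dim = 2, 197, 8, 96  # 8 * 96 = 768 = embed_dim
x = np.random.rand(B, N, n_heads * head_dim)
x = x.reshape(B, N, n_heads, head_dim)   # split the channel dim into heads
x = x.transpose(0, 2, 1, 3)              # [B, n_heads, N, head_dim]
print(x.shape)  # (2, 8, 197, 96)
```

With the head axis moved next to the batch axis, attention can be computed for all heads in parallel as a batched matrix product.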
The encoder block also uses Layer Normalization, which in essence regularizes the optimization landscape and speeds up convergence. After attention, each layer has a Feed-Forward Network that performs a position-wise transformation: two linear layers with a non-linearity between them (GELU in the implementation below), which increases model capacity. (If curious, try removing the feed-forward network; accuracy will suffer.)
At the end of the network, the usual softmax classifier produces the output. This completes the model architecture; training and optimization follow.
See the code below for the details. The code runs, but for lack of time the full training has not been completed; feel free to run it yourself.
import paddle
import paddle.nn as nn
class Identity(nn.Layer):
    ## identity layer: returns its input unchanged
def __init__(self):
super().__init__()
def forward(self, x):
return x
## Patch embedding
class PatchEmbedding(nn.Layer):
"""Patch Embedding
Apply patch embedding (which is implemented using Conv2D) on input data.
Attributes:
image_size: image size
patch_size: patch size
num_patches: num of patches
patch_embddings: patch embed operation (Conv2D)
"""
def __init__(self,
image_size=224,
patch_size=16,
in_channels=3,
embed_dim=768):
super().__init__()
self.image_size = image_size
        self.patch_size = patch_size  ## patch size
        self.num_patches = (image_size // patch_size) * (image_size // patch_size)  ## number of patches
self.patch_embedding = nn.Conv2D(in_channels=in_channels,
out_channels=embed_dim,
kernel_size=patch_size,
stride=patch_size)
def forward(self, x):
x = self.patch_embedding(x)
        x = x.flatten(2)            # [B, C, H, W] -> [B, C, h*w]
        x = x.transpose([0, 2, 1])  # [B, C, h*w] -> [B, h*w, C] = [B, N, C]; N: number of patches, C: embed_dim
return x
## Attention
class Attention(nn.Layer):
""" Attention module
Attention module for ViT, here q, k, v are assumed the same.
The qkv mappings are stored as one single param.
Attributes:
num_heads: number of heads
attn_head_size: feature dim of single head
all_head_size: feature dim of all heads
qkv: a nn.Linear for q, k, v mapping
scales: 1 / sqrt(single_head_feature_dim)
out: projection of multi-head attention
attn_dropout: dropout for attention
proj_dropout: final dropout before output
softmax: softmax op for attention
"""
def __init__(self,
embed_dim,
num_heads,
attn_head_size=None,
qkv_bias=True,
dropout=0.,
attention_dropout=0.):
super().__init__()
self.embed_dim = embed_dim
self.num_heads = num_heads
if attn_head_size is not None:
self.attn_head_size = attn_head_size
else:
assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
            self.attn_head_size = embed_dim // num_heads  ## per-head dimension
        self.all_head_size = self.attn_head_size * num_heads  ## total dimension across heads (= embed_dim)
        ## weight initialization
w_attr_1, b_attr_1 = self._init_weights()
        ## one linear layer produces q, k and v together
self.qkv = nn.Linear(embed_dim,
self.all_head_size * 3, # weights for q, k, and v
weight_attr=w_attr_1,
bias_attr=b_attr_1 if qkv_bias else False)
        ## scaling factor 1/sqrt(head_dim)
self.scales = self.attn_head_size ** -0.5
w_attr_2, b_attr_2 = self._init_weights()
        ## output projection
self.out = nn.Linear(self.all_head_size,
embed_dim,
weight_attr=w_attr_2,
bias_attr=b_attr_2)
self.attn_dropout = nn.Dropout(attention_dropout)
self.proj_dropout = nn.Dropout(dropout)
self.softmax = nn.Softmax(axis=-1)
def _init_weights(self):
weight_attr = paddle.ParamAttr(initializer=nn.initializer.TruncatedNormal(std=.02))
bias_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(0.0))
return weight_attr, bias_attr
    ## reshape so attention is computed per head over all patches
def transpose_multihead(self, x):
"""[B, N, C] -> [B, N, n_heads, head_dim] -> [B, n_heads, N, head_dim]"""
new_shape = x.shape[:-1] + [self.num_heads, self.attn_head_size]
x = x.reshape(new_shape) # [B, N, C] -> [B, N, n_heads, head_dim]
x = x.transpose([0, 2, 1, 3]) # [B, N, n_heads, head_dim] -> [B, n_heads, N, head_dim]
return x
def forward(self, x):
qkv = self.qkv(x).chunk(3, axis=-1)
        ## map applies the same reshape to q, k, v; three separate Linear layers would also work, with care about shapes
q, k, v = map(self.transpose_multihead, qkv)
q = q * self.scales
attn = paddle.matmul(q, k, transpose_y=True) # [B, n_heads, N, N]
attn = self.softmax(attn)
attn = self.attn_dropout(attn)
z = paddle.matmul(attn, v) # [B, n_heads, N, head_dim]
z = z.transpose([0, 2, 1, 3]) # [B, N, n_heads, head_dim]
new_shape = z.shape[:-2] + [self.all_head_size]
z = z.reshape(new_shape) # [B, N, all_head_size]
z = self.out(z)
z = self.proj_dropout(z)
return z
## Feed-forward network
class Mlp(nn.Layer):
""" MLP module
Impl using nn.Linear and activation is GELU, dropout is applied.
Ops: fc -> act -> dropout -> fc -> dropout
Attributes:
fc1: nn.Linear
fc2: nn.Linear
act: GELU
dropout: dropout after fc
"""
def __init__(self,
embed_dim,
mlp_ratio,
dropout=0.):
super().__init__()
w_attr_1, b_attr_1 = self._init_weights()
self.fc1 = nn.Linear(embed_dim,
int(embed_dim * mlp_ratio),
weight_attr=w_attr_1,
bias_attr=b_attr_1)
w_attr_2, b_attr_2 = self._init_weights()
self.fc2 = nn.Linear(int(embed_dim * mlp_ratio),
embed_dim,
weight_attr=w_attr_2,
bias_attr=b_attr_2)
self.act = nn.GELU()
self.dropout = nn.Dropout(dropout)
def _init_weights(self):
weight_attr = paddle.ParamAttr(
            initializer=paddle.nn.initializer.TruncatedNormal(std=0.02))  # std 0.02, consistent with the other TruncatedNormal inits
bias_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.Constant(0.0))
return weight_attr, bias_attr
def forward(self, x):
x = self.fc1(x)
x = self.act(x)
x = self.dropout(x)
x = self.fc2(x)
x = self.dropout(x)
return x
## Encoder building block
class TransformerLayer(nn.Layer):
"""Transformer Layer
Transformer layer contains attention, norm, mlp and residual
Attributes:
embed_dim: transformer feature dim
attn_norm: nn.LayerNorm before attention
mlp_norm: nn.LayerNorm before mlp
mlp: mlp modual
attn: attention modual
"""
def __init__(self,
embed_dim,
num_heads,
attn_head_size=None,
qkv_bias=True,
mlp_ratio=4.,
dropout=0.,
attention_dropout=0.,
droppath=0.):
super().__init__()
w_attr_1, b_attr_1 = self._init_weights()
self.attn_norm = nn.LayerNorm(embed_dim,
weight_attr=w_attr_1,
bias_attr=b_attr_1,
epsilon=1e-6)
self.attn = Attention(embed_dim,
num_heads,
attn_head_size,
qkv_bias,
dropout,
attention_dropout)
#self.drop_path = DropPath(droppath) if droppath > 0. else Identity()
w_attr_2, b_attr_2 = self._init_weights()
self.mlp_norm = nn.LayerNorm(embed_dim,
weight_attr=w_attr_2,
bias_attr=b_attr_2,
epsilon=1e-6)
self.mlp = Mlp(embed_dim, mlp_ratio, dropout)
def _init_weights(self):
weight_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(1.0))
bias_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(0.0))
return weight_attr, bias_attr
def forward(self, x):
h = x
x = self.attn_norm(x)
x = self.attn(x)
#x = self.drop_path(x)
x = x + h
h = x
x = self.mlp_norm(x)
x = self.mlp(x)
#x = self.drop_path(x)
x = x + h
return x
## The Encoder: a stack of TransformerLayer blocks
class Encoder(nn.Layer):
"""Transformer encoder
Encoder encoder contains a list of TransformerLayer, and a LayerNorm.
Attributes:
layers: nn.LayerList contains multiple EncoderLayers
encoder_norm: nn.LayerNorm which is applied after last encoder layer
"""
def __init__(self,
embed_dim,
num_heads,
depth,
attn_head_size=None,
qkv_bias=True,
mlp_ratio=4.0,
dropout=0.,
attention_dropout=0.,
droppath=0.):
super().__init__()
        # stochastic depth decay
depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
layer_list = []
for i in range(depth):
layer_list.append(TransformerLayer(embed_dim,
num_heads,
attn_head_size,
qkv_bias,
mlp_ratio,
dropout,
attention_dropout,
depth_decay[i]))
self.layers = nn.LayerList(layer_list)
w_attr_1, b_attr_1 = self._init_weights()
self.encoder_norm = nn.LayerNorm(embed_dim,
weight_attr=w_attr_1,
bias_attr=b_attr_1,
epsilon=1e-6)
def _init_weights(self):
weight_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(1.0))
bias_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(0.0))
return weight_attr, bias_attr
def forward(self, x):
for layer in self.layers:
x = layer(x)
x = self.encoder_norm(x)
return x
## The full model
class VisionTransformer(nn.Layer):
"""ViT transformer
ViT Transformer, classifier is a single Linear layer for finetune,
For training from scratch, two layer mlp should be used.
Classification is done using cls_token.
Args:
image_size: int, input image size, default: 224
patch_size: int, patch size, default: 16
in_channels: int, input image channels, default: 3
num_classes: int, number of classes for classification, default: 1000
embed_dim: int, embedding dimension (patch embed out dim), default: 768
depth: int, number ot transformer blocks, default: 12
num_heads: int, number of attention heads, default: 12
attn_head_size: int, dim of head, if none, set to embed_dim // num_heads, default: None
mlp_ratio: float, ratio of mlp hidden dim to embed dim(mlp in dim), default: 4.0
qkv_bias: bool, If True, enable qkv(nn.Linear) layer with bias, default: True
dropout: float, dropout rate for linear layers, default: 0.
attention_dropout: float, dropout rate for attention layers default: 0.
droppath: float, droppath rate for droppath layers, default: 0.
representation_size: int, set representation layer (pre-logits) if set, default: None
"""
def __init__(self,
image_size=224,
patch_size=16,
in_channels=3,
num_classes=1000,
embed_dim=768,
depth=12,
num_heads=12,
attn_head_size=None,
mlp_ratio=4,
qkv_bias=True,
dropout=0.,
attention_dropout=0.,
droppath=0.,
representation_size=None):
super().__init__()
        # patch embedding first
self.patch_embedding = PatchEmbedding(image_size,
patch_size,
in_channels,
embed_dim)
        # learnable position encoding
self.position_embedding = paddle.create_parameter(
shape=[1, 1 + self.patch_embedding.num_patches, embed_dim],
dtype='float32',
default_initializer=paddle.nn.initializer.TruncatedNormal(std=.02))
        # class token
self.cls_token = paddle.create_parameter(
shape=[1, 1, embed_dim],
dtype='float32',
default_initializer=paddle.nn.initializer.TruncatedNormal(std=.02))
self.pos_dropout = nn.Dropout(dropout)
        # encoder (stack of multi-head self-attention blocks)
self.encoder = Encoder(embed_dim,
num_heads,
depth,
attn_head_size,
qkv_bias,
mlp_ratio,
dropout,
attention_dropout,
droppath)
        # optional pre-logits head
if representation_size is not None:
self.num_features = representation_size
w_attr_1, b_attr_1 = self._init_weights()
self.pre_logits = nn.Sequential(
nn.Linear(embed_dim,
representation_size,
weight_attr=w_attr_1,
bias_attr=b_attr_1),
nn.ReLU())
else:
self.pre_logits = Identity()
        # classifier
w_attr_2, b_attr_2 = self._init_weights()
self.classifier = nn.Linear(embed_dim,
num_classes,
weight_attr=w_attr_2,
bias_attr=b_attr_2)
def _init_weights(self):
weight_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.Constant(1.0))
bias_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.Constant(0.0))
return weight_attr, bias_attr
def forward_features(self, x):
x = self.patch_embedding(x)
cls_tokens = self.cls_token.expand((x.shape[0], -1, -1))
        ## prepend the cls_tokens at the front; order matters
x = paddle.concat((cls_tokens, x), axis=1)
        # add the position encodings to the token sequence
x = x + self.position_embedding
x = self.pos_dropout(x)
        # encode
x = self.encoder(x)
x = self.pre_logits(x[:, 0]) # cls_token only
return x
def forward(self, x):
x = self.forward_features(x)
logits = self.classifier(x)
return logits
## Build the model from train_parameters
def build_vit():
model = VisionTransformer(image_size=train_parameters["image_size"],
patch_size=train_parameters["patch_size"],
in_channels=train_parameters["input_channels"],
num_classes=train_parameters["num_classes"],
embed_dim=train_parameters["embed_dim"],
depth=train_parameters["depth"],
num_heads=train_parameters["num_heads"],
attn_head_size=train_parameters["attn_head_size"],
mlp_ratio=train_parameters["mlp_ratio"],
qkv_bias=train_parameters["qkv_bias"],
dropout=train_parameters["dropout"],
attention_dropout=train_parameters["attention_dropout"],
droppath=train_parameters["droppath"],
representation_size=None)
return model
Define the training and validation functions.
## Aggregate results across multiple GPUs
import logging
import math
import sys
import paddle.distributed as dist
def all_reduce_mean(x):
    """perform all_reduce on Tensor for gathering results from multi-gpus"""
    """Only needed for multi-GPU runs; still has minor issues:
    world_size = dist.get_world_size()
    if world_size > 1:
        x_reduce = paddle.to_tensor(x)
        dist.all_reduce(x_reduce)
        x_reduce = x_reduce / world_size
        return x_reduce.item()
    """
    return x
## Training function
def train(dataloader,
model,
optimizer,
criterion,
epoch,
total_epochs,
total_batches,
debug_steps=100,
accum_iter=1):
time_st = time.time()
train_loss_meter = AverageMeter()
train_acc_meter = AverageMeter()
master_loss_meter = AverageMeter()
master_acc_meter = AverageMeter()
model.train()
optimizer.clear_grad()
for batch_id, data in enumerate(dataloader):
# get data
images = data[0]
label = data[1]
batch_size = images.shape[0]
# forward
output = model(images)
loss = criterion(output, label)
loss_value = loss.item()
if not math.isfinite(loss_value):
print("Loss is {}, stopping training".format(loss_value))
sys.exit(1)
loss = loss / accum_iter
# backward and step
loss.backward()
if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
optimizer.step()
optimizer.clear_grad()
pred = paddle.nn.functional.softmax(output)
acc = paddle.metric.accuracy(pred, label.unsqueeze(1)).item()
# sync from other gpus for overall loss and acc
"""
master_loss = all_reduce_mean(loss_value)
master_acc = all_reduce_mean(acc)
master_batch_size = all_reduce_mean(batch_size)
master_loss_meter.update(master_loss, master_batch_size)
master_acc_meter.update(master_acc, master_batch_size)
"""
train_loss_meter.update(loss_value, batch_size)
train_acc_meter.update(acc, batch_size)
if batch_id % debug_steps == 0 or batch_id + 1 == len(dataloader):
logger.info(f"Epoch[{epoch:03d}/{total_epochs:03d}], "
f"Step[{batch_id:04d}/{total_batches:04d}], "
f"Lr: {optimizer.get_lr():04f}")
logger.info(f"Loss: {loss_value:.4f} ({train_loss_meter.avg:.4f}), "
f"Avg Acc: {train_acc_meter.avg:.4f}")
#paddle.distributed.barrier()
train_time = time.time() - time_st
return (train_loss_meter.avg,
train_acc_meter.avg,
master_loss_meter.avg,
master_acc_meter.avg,
train_time)
## Validation function
def validate(dataloader,
model,
criterion,
total_batches,
debug_steps=100):
model.eval()
val_loss_meter = AverageMeter()
val_acc1_meter = AverageMeter()
val_acc5_meter = AverageMeter()
master_loss_meter = AverageMeter()
master_acc1_meter = AverageMeter()
master_acc5_meter = AverageMeter()
time_st = time.time()
for batch_id, data in enumerate(dataloader):
# get data
images = data[0]
label = data[1]
batch_size = images.shape[0]
output = model(images)
loss = criterion(output, label)
loss_value = loss.item()
pred = paddle.nn.functional.softmax(output)
acc1 = paddle.metric.accuracy(pred, label.unsqueeze(1)).item()
        acc5 = paddle.metric.accuracy(pred, label.unsqueeze(1), k=5).item()  # note: with only 5 classes, top-5 accuracy is trivially 1.0
# sync from other gpus for overall loss and acc
"""
master_loss = all_reduce_mean(loss_value)
master_acc1 = all_reduce_mean(acc1)
master_acc5 = all_reduce_mean(acc5)
master_batch_size = all_reduce_mean(batch_size)
master_loss_meter.update(master_loss, master_batch_size)
master_acc1_meter.update(master_acc1, master_batch_size)
master_acc5_meter.update(master_acc5, master_batch_size)
"""
val_loss_meter.update(loss_value, batch_size)
val_acc1_meter.update(acc1, batch_size)
val_acc5_meter.update(acc5, batch_size)
if batch_id % debug_steps == 0:
local_message = (f"Step[{batch_id:04d}/{total_batches:04d}], "
f"Avg Loss: {val_loss_meter.avg:.4f}, "
f"Avg Acc@1: {val_acc1_meter.avg:.4f}, "
f"Avg Acc@5: {val_acc5_meter.avg:.4f}")
"""
master_message = (f"Step[{batch_id:04d}/{total_batches:04d}], "
f"Avg Loss: {master_loss_meter.avg:.4f}, "
f"Avg Acc@1: {master_acc1_meter.avg:.4f}, "
f"Avg Acc@5: {master_acc5_meter.avg:.4f}")
"""
logger.info(local_message)
#paddle.distributed.barrier()
val_time = time.time() - time_st
return (val_loss_meter.avg,
val_acc1_meter.avg,
val_acc5_meter.avg,
master_loss_meter.avg,
master_acc1_meter.avg,
master_acc5_meter.avg,
val_time)
Main entry point: run main() to train or validate. For lack of time the full run was not completed; this is provided for reference and study.
# Main function
def main():
    ## initial configuration
paddle.device.set_device('gpu')
#paddle.distributed.init_parallel_env()
#world_size = paddle.distributed.get_world_size()
paddle.seed(0)
init_log_config()
    ## build the model
model = build_vit()
if train_parameters["mode"] == "train":
dataloader_train = get_dataloader(RiceImageDataset(train_parameters["data_dir"],True,train_transforms_vit(True)), True, False)
total_batch_train = len(dataloader_train)
logger.info(f"----- Total # of train batch (single gpu): {total_batch_train}")
dataloader_val = get_dataloader(RiceImageDataset(train_parameters["data_dir"],False,train_transforms_vit(False)), True, False)
total_batch_val = len(dataloader_val)
logger.info(f"----- Total # of val batch (single gpu): {total_batch_val}")
    ## define the loss and the optimizer
criterion = paddle.nn.CrossEntropyLoss()
    if train_parameters["mode"] == "train": ## an optimizer is only needed when training
cosine_lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
learning_rate=train_parameters["base_lr"],
T_max=train_parameters["num_epochs"] - train_parameters["warmup_epochs"],
eta_min=train_parameters["end_lr"],
            last_epoch=-1) ## start from the initial learning rate
lr_scheduler = paddle.optimizer.lr.LinearWarmup(
learning_rate=cosine_lr_scheduler,
warmup_steps=train_parameters["warmup_epochs"],
start_lr=train_parameters["warmup_start_lr"],
end_lr=train_parameters["base_lr"],
last_epoch=-1)
else:
lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
learning_rate=train_parameters["base_lr"],
T_max=train_parameters["num_epochs"],
eta_min=train_parameters["end_lr"],
            last_epoch=-1) ## start from the initial learning rate
optimizer = paddle.optimizer.AdamW(
parameters=model.parameters(),
learning_rate=lr_scheduler, # set to scheduler
beta1=train_parameters["betas"][0],
beta2=train_parameters["betas"][1],
weight_decay=train_parameters["weight_decay"],
epsilon=train_parameters["eps"])
if train_parameters["pretrained"]:
model_state = paddle.load(train_parameters["pretrained_dir"])
model.set_state_dict(model_state)
logger.info(f'----- Pretrained: Load model state from {train_parameters["pretrained_dir"]}')
    ## distributed training
#model = paddle.DataParallel(model)
    ## validation only, no training
if train_parameters["mode"] != "train":
logger.info("----- Start Validation")
val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
dataloader=dataloader_val,
model=model,
criterion=criterion,
total_batches=total_batch_val,
debug_steps=100)
local_message = ("----- Validation: " +
f"Validation Loss: {val_loss:.4f}, " +
f"Validation Acc@1: {val_acc1:.4f}, " +
f"Validation Acc@5: {val_acc5:.4f}, " +
f"time: {val_time:.2f}")
logger.info(local_message)
return
    ## train, validating periodically
logger.info("----- Start Train")
for epoch in range(train_parameters["num_epochs"]):
# Train one epoch
train_loss, train_acc, avg_loss, avg_acc, train_time = train(
dataloader=dataloader_train,
model=model,
optimizer=optimizer,
criterion=criterion,
epoch=epoch,
total_epochs=train_parameters["num_epochs"],
total_batches=total_batch_train,
debug_steps=100,
accum_iter=train_parameters["accum_iter"])
# update lr
lr_scheduler.step()
general_message = (f"----- Epoch[{epoch:03d}/{train_parameters['num_epochs']:03d}], "
f"Lr: {optimizer.get_lr():.4f}, "
f"time: {train_time:.2f}, "
f"Train Loss: {train_loss:.4f}, "
f"Train Acc: {train_acc:.4f}")
logger.info(general_message)
if epoch % train_parameters["debug_freq"] == 0 or epoch == train_parameters["num_epochs"]:
logger.info(f'----- Validation after Epoch: {epoch}')
val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
dataloader=dataloader_val,
model=model,
criterion=criterion,
total_batches=total_batch_val,
debug_steps=100)
local_message = (f'----- Epoch[{epoch:03d}/{(train_parameters["num_epochs"]):03d}], ' +
f"Validation Loss: {val_loss:.4f}, " +
f"Validation Acc@1: {val_acc1:.4f}, " +
f"Validation Acc@5: {val_acc5:.4f}, " +
f"time: {val_time:.2f}")
master_message = (f"----- Epoch[{epoch:03d}/{train_parameters['num_epochs']:03d}], " +
f"Validation Loss: {avg_loss:.4f}, " +
f"Validation Acc@1: {avg_acc1:.4f}, " +
f"Validation Acc@5: {avg_acc5:.4f}, " +
f"time: {val_time:.2f}")
logger.info(local_message)
logger.info(master_message)
if epoch % train_parameters["save_freq"] == 0 or epoch == train_parameters["num_epochs"]:
model_path = os.path.join(
train_parameters["save_path"], f"Epoch-{epoch}-Loss-{avg_loss}.pdparams")
state_dict = dict()
state_dict['model'] = model.state_dict()
state_dict['optimizer'] = optimizer.state_dict()
paddle.save(state_dict, model_path)
logger.info(f"----- Save model: {model_path}")
main()
6. Model Evaluation
Since evaluation ran alongside training above, a separate evaluation pass is not strictly necessary. To evaluate on its own, run the following.
%cd /home/aistudio/work/PaddleClas
!python3 tools/eval.py \
-c /home/aistudio/work/PaddleClas/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml \
-o Global.pretrained_model=/home/aistudio/work/PaddleClas/output/ResNet50_vd/best_model
7. Model Prediction
With evaluation done, use the tools/infer.py script for prediction.
!python tools/infer.py \
-c /home/aistudio/work/PaddleClas/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml \
-o Global.pretrained_model=/home/aistudio/work/PaddleClas/output/ResNet50_vd/best_model \
-o Infer.infer_imgs=/home/aistudio/data/Arborio14953.jpg
8. Model Export
To deploy the model for real-world use, we first need to export an inference model, which the prediction engine then loads for inference.
PaddleClas exports models via tools/export_model.py, producing three files:
inference.pdmodel: the network structure;
inference.pdiparams: the network weights;
inference.pdiparams.info: additional model information, usually ignorable.
%cd /home/aistudio/work/PaddleClas/
!python tools/export_model.py \
-c /home/aistudio/work/PaddleClas/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml \
-o Global.pretrained_model=/home/aistudio/work/PaddleClas/output/ResNet50_vd/best_model \
-o Global.save_inference_dir=deploy/models/ResNet50_vd
9. Model Inference
Once the model is exported, run inference with the scripts under deploy/python.
%cd /home/aistudio/work/PaddleClas/
!python deploy/python/predict_cls.py -c deploy/configs/inference_cls.yaml
Summary
- Rice classification with PaddleClas reaches very high accuracy.
- Data augmentation, pretraining, and other models are worth trying next for further validation and tuning.
- A simple analysis and implementation of Transformer-based classification is provided for reference.
- Model deployment is the next topic to learn.
Thanks to PaddlePaddle mentor Li Wenbo for his help and guidance.