BEiT: When BERT Meets Images, a New Paradigm Beyond ViT
A Paddle implementation of the BEiT fine-tuning architecture, reaching 88.60% top-1, possibly the highest accuracy among open-source Paddle models to date
🔥🔥🔥 Check out PASSL, the PaddlePaddle self-supervised learning library
PASSL includes contrastive self-supervised algorithms such as SimCLR, MoCo, BYOL, and CLIP, as well as vision Transformer models such as Vision-Transformer, Swin-Transformer, BEiT, CVT, T2T, and MLP_Mixer
BEiT (arXiv, code)
Hi guys, we meet again. This time we take on the BEiT model; let's start with the architecture diagram.
BEiT is BERT for images. It is similar to ViT, except that during training some image patches are randomly masked; through this masking scheme, the model learns to predict the visual tokens of the original image even when the input image is corrupted.
The best BEiT model reaches 88.60% top-1, possibly the highest accuracy among known open-source Paddle models.
# Starting from BERT
The author named it BEiT; echoing BERT, the intent is clearly to replicate BERT's success in CV.
What is BERT?
BERT is an NLP model proposed by Google in 2018.
BERT delivered stunning results on SQuAD 1.1, the top-level machine reading comprehension benchmark, surpassing humans on both of its metrics, and set new best results on 11 different NLP tasks, including pushing the GLUE benchmark to 80.4% (a 7.6% absolute improvement) and reaching 86.7% accuracy on MultiNLI (a 5.6% absolute improvement). BERT was widely seen as a milestone for NLP and one of the field's most important recent advances.
===> Takeaway: deep-and-narrow models work well, and MLM (masked language modeling) dominates NLP
🔥 BERT --> BEiT: evolving from NLP to CV
In NLP, BERT converts every word into a token (with its WordPiece tokenizer), randomly masks some tokens, and trains the model to predict the masked tokens from the incomplete sentence.
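As a minimal sketch of that masking step (hypothetical token IDs and a made-up `mlm_mask` helper, not BERT's actual preprocessing code):

```python
import random

def mlm_mask(token_ids, mask_id, mask_ratio=0.15):
    """Randomly replace a fraction of tokens with [MASK]; return corrupted ids and labels."""
    ids = list(token_ids)
    labels = [-100] * len(ids)          # -100 = position ignored by the loss
    for i in range(len(ids)):
        if random.random() < mask_ratio:
            labels[i] = ids[i]          # the model must recover the original token here
            ids[i] = mask_id
    return ids, labels

corrupted, labels = mlm_mask([101, 2023, 2003, 1037, 7953, 102], mask_id=103)
```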
So how does the computer vision field do this?
The image domain has long had a similar idea of covering a region and predicting what was hidden, for example Image Inpainting, a sub-field of self-supervised learning.
Inpainting masks part of an image and asks the network to fill it back in on its own.
So what does BEiT do?
Let's walk through BEiT's basic pipeline, which splits roughly into two parts.
* Masked Image
First, just like common ViT architectures, the original image is split into image patches; this part is covered in the public PPViT course, which is worth a listen.
Next comes BEiT's clever trick.
Randomly select image patches for masking, then flatten them (a sketch of the masking strategy follows below).
The masking here is not uniformly random; BEiT uses a blockwise masking strategy.
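A sketch of blockwise masking in the spirit of the BEiT paper (the parameter values follow the paper: at least 16 patches per block, aspect ratio in [0.3, 1/0.3], roughly 40% of patches masked overall); the `blockwise_mask` helper below is my own illustration, not the official implementation:

```python
import math
import random
import numpy as np

def blockwise_mask(grid=14, mask_ratio=0.4, min_block=16, max_aspect=1 / 0.3):
    """Mask rectangular blocks of patches until ~mask_ratio of the grid is covered."""
    mask = np.zeros((grid, grid), dtype=bool)
    target = int(grid * grid * mask_ratio)
    while mask.sum() < target:
        s = random.randint(min_block, target - int(mask.sum()) + min_block)  # block area
        r = math.exp(random.uniform(math.log(0.3), math.log(max_aspect)))    # aspect ratio
        h = min(grid, max(1, int(round(math.sqrt(s * r)))))
        w = min(grid, max(1, int(round(math.sqrt(s / r)))))
        top = random.randint(0, grid - h)
        left = random.randint(0, grid - w)
        mask[top:top + h, left:left + w] = True   # blocks may overlap; total only grows
    return mask  # shape (grid, grid); flatten it to get bool_masked_pos

bool_masked_pos = blockwise_mask().reshape(-1)  # 196 patch positions for a 224/16 grid
```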
The flattened tokens are then processed further so they can enter the encoder.
Then..., after passing through the Encoder, a series of h vectors is produced, one per masked image patch. As the figure below shows, 5 patches are masked, and the BEiT encoder produces 5 corresponding output vectors h2, h3, h6, h7, h14, which then go through an fc layer to predict a series of numbers.
We will explain what these numbers mean later; first, let's look at the forward-pass code for the above.
```python
class VisionTransformerForMaskedImageModeling(nn.Layer):
    ...
    def forward_features(self, x, bool_masked_pos):
        x = self.patch_embed(x, bool_masked_pos=bool_masked_pos)
        batch_size, seq_len, _ = x.shape  # `shape` is a property, not a method
        cls_tokens = self.cls_token.expand([batch_size, -1, -1])
        mask_token = self.mask_token.expand([batch_size, seq_len, -1])
        # replace the masked visual tokens by mask_token
        w = bool_masked_pos.unsqueeze(-1).astype(mask_token.dtype)
        x = x * (1 - w) + mask_token * w
        x = paddle.concat((cls_tokens, x), axis=1)
        if self.pos_embed is not None:
            x = x + self.pos_embed
        x = self.pos_drop(x)
        rel_pos_bias = self.rel_pos_bias() if self.rel_pos_bias is not None else None
        for blk in self.blocks:
            x = blk(x, rel_pos_bias=rel_pos_bias)
        return self.norm(x)

    def forward(self, x, bool_masked_pos, return_all_tokens=False):
        x = self.forward_features(x, bool_masked_pos=bool_masked_pos)
        x = x[:, 1:]  # drop the [CLS] token
        if return_all_tokens:
            return self.lm_head(x)
        else:
            # return predictions only at the masked positions
            return self.lm_head(x[bool_masked_pos])
```
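To make the shapes concrete, here is a hypothetical call, assuming `model` is a fully constructed VisionTransformerForMaskedImageModeling (the class above is abridged) and reusing the illustrative `blockwise_mask` helper from earlier:

```python
import numpy as np

# hypothetical smoke test: 224x224 input, 16x16 patches -> 196 mask positions
images = paddle.randn([2, 3, 224, 224])
mask = np.stack([blockwise_mask().reshape(-1) for _ in range(2)])  # [2, 196] bool
bool_masked_pos = paddle.to_tensor(mask)
logits = model(images, bool_masked_pos)  # [num_masked, vocab_size] visual-token logits
```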
* dVAE
Now let's explain what those predicted numbers mean.
They are called visual tokens.
Where do they come from? The original image goes through a Tokenizer, which produces a sequence of visual tokens. This brings us to the legendary variational autoencoder structure (those familiar with GANs will get it instantly). In one sentence: visual tokens are latent variables of the image and can be seen as an alternative representation of it. Space is limited, so we won't go deep into VAEs here.
Visual tokens carry image information; the counterpart of the Tokenizer is a Decoder, which reconstructs the original image from the visual tokens.
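Conceptually, the pre-training targets come from this tokenizer. A hedged sketch of target construction (the `tokenizer` call below stands in for the discrete VAE that BEiT borrows from DALL-E, whose codebook has 8192 entries; the function name and shapes are illustrative):

```python
# visual tokens as prediction targets (illustrative, assuming a DALL-E-style dVAE tokenizer)
with paddle.no_grad():
    visual_tokens = tokenizer(images)         # [B, 14, 14] discrete codes in {0..8191}
    visual_tokens = visual_tokens.flatten(1)  # [B, 196], one code per patch
labels = visual_tokens[bool_masked_pos]       # targets at the masked positions only
```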
That wraps up the model components.
🔥 Self-Supervised Training of BEiT
=== Bro, congrats on having the patience to read this far
With the architecture covered, let's see how self-supervised training works.
First, the original image is passed forward, roughly like this.
The visual tokens generated from the original image are the targets: we train the network so that the visual tokens predicted for the masked patches match the original image's visual tokens. In self-supervised learning this training approach is called generative learning.
During training, the optimization proceeds in steps: first the dVAE is optimized with what we call a reconstruction loss, tuning the Tokenizer and Decoder so the dVAE both learns good latents and reconstructs the original image well; then the Encoder and the Masked Image Modeling head are optimized so they better predict the corresponding visual tokens.
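Putting the pieces together, the MIM objective is just a cross-entropy over the dVAE codebook. A minimal sketch under the same assumptions as above (`model`, `tokenizer`, and `bool_masked_pos` as introduced earlier; not the official training loop):

```python
import paddle.nn.functional as F

# masked-image-modeling step (sketch): predict the dVAE codes at masked positions
logits = model(images, bool_masked_pos)      # [num_masked, 8192]
with paddle.no_grad():
    targets = tokenizer(images).flatten(1)   # [B, 196] ground-truth visual tokens
loss = F.cross_entropy(logits, targets[bool_masked_pos])
loss.backward()
```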
Pretty cumbersome, right? That's exactly where MAE's brilliance lies; MAE is truly simplicity at its finest. Still, BEiT is the cornerstone of many later methods built on the MIM (Masked Image Modeling) idea, and it brought a fresh stream to a self-supervised field dominated by contrastive learning.
It also shows that NLP's MLM paradigm can transfer to CV.
👍 BEiT Fine-Tuning
After self-supervised training, we take the BEiT Encoder weights and attach a classification head for downstream classification tasks.
Concretely, for the linear-probing variant we freeze the Encoder weights obtained in the previous step and train only the linear classifier's weights (full fine-tuning would update the Encoder as well); a sketch follows below.
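A hedged Paddle sketch of the linear-probing setup (freezing via `stop_gradient`; the parameter names follow the Beit class and factory functions defined later in this article):

```python
# linear probing sketch: freeze everything except the classification head
model = beit_base_patch16_224()
for name, param in model.named_parameters():
    if not name.startswith("head"):
        param.stop_gradient = True  # exclude this parameter from gradient computation
# only model.head.{weight, bias} remain trainable
```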
The official performance is as follows.
This project walks you through implementing the BEiT fine-tuning architecture, reproducing the official accuracy, and transferring it to your own tasks.
Basic Modules
- MLP module
- DropPath module
- PatchEmbed module
```python
import math
from functools import partial

import paddle
import paddle.nn as nn
import paddle.nn.functional as F

trunc_normal_ = nn.initializer.TruncatedNormal(std=0.02)
zeros_ = nn.initializer.Constant(value=0.0)
ones_ = nn.initializer.Constant(value=1.0)


def drop_path(inputs, drop_prob=0.0, training=False):
    """drop path op
    Args:
        inputs: tensor with arbitrary shape
        drop_prob: float number of drop path probability, default: 0.0
        training: bool, if current mode is training, default: False
    Returns:
        output: output tensor after drop path
    """
    if drop_prob == 0.0 or not training:
        return inputs
    keep_prob = 1 - drop_prob
    shape = (inputs.shape[0],) + (1,) * (inputs.ndim - 1)
    random_tensor = keep_prob + paddle.rand(shape, dtype=inputs.dtype)
    random_tensor = random_tensor.floor()  # binarize: 0 drops the path, 1 keeps it
    output = inputs / keep_prob * random_tensor  # rescale to keep the expectation unchanged
    return output


class DropPath(nn.Layer):
    """DropPath class"""

    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, inputs):
        return drop_path(inputs, self.drop_prob, self.training)


class Identity(nn.Layer):
    """Identity layer

    The output of this layer is the input without any change.
    Use this layer to avoid if conditions in some forward methods.
    """

    def __init__(self):
        super().__init__()

    def forward(self, input):
        return input


class Mlp(nn.Layer):
    """MLP module

    MLP using nn.Linear with GELU activation; dropout is applied.
    Ops: fc1 -> act -> dropout -> fc2 -> dropout
    """

    def __init__(
        self,
        in_features,
        hidden_features=None,
        out_features=None,
        act_layer=nn.GELU,
        drop=0.0,
    ):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


class PatchEmbed(nn.Layer):
    """2D Image to Patch Embedding

    Apply patch embedding on input images. The embedding is implemented with a Conv2D op.
    """

    def __init__(
        self,
        img_size=224,
        patch_size=16,
        in_chans=3,
        embed_dim=768,
        norm_layer=None,
        flatten=True,
    ):
        super().__init__()
        img_size = (img_size, img_size)
        patch_size = (patch_size, patch_size)
        self.img_size = img_size
        self.patch_size = patch_size
        self.grid_size = (img_size[0] // patch_size[0], img_size[1] // patch_size[1])
        self.num_patches = self.grid_size[0] * self.grid_size[1]
        self.flatten = flatten
        self.proj = nn.Conv2D(
            in_chans, embed_dim, kernel_size=patch_size, stride=patch_size
        )
        self.norm = norm_layer(embed_dim) if norm_layer else Identity()

    def forward(self, x):
        B, C, H, W = x.shape
        assert (
            H == self.img_size[0] and W == self.img_size[1]
        ), f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
        x = self.proj(x)
        if self.flatten:
            x = x.flatten(2).transpose((0, 2, 1))  # BCHW -> BNC
        x = self.norm(x)
        return x
```
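A quick sanity check of these modules (shapes only; my own example, not from the original project):

```python
# hypothetical shape check: 224x224 image -> 196 patch tokens of dim 768
pe = PatchEmbed(img_size=224, patch_size=16, embed_dim=768)
tokens = pe(paddle.randn([1, 3, 224, 224]))
print(tokens.shape)  # [1, 196, 768]
```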
Building the Network
```python
class Attention(nn.Layer):
    """Attention Layer"""

    def __init__(
        self,
        dim,
        num_heads=8,
        qkv_bias=False,
        attn_drop=0.0,
        proj_drop=0.0,
        window_size=None,
        attn_head_dim=None,
    ):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        if attn_head_dim is not None:
            head_dim = attn_head_dim
        all_head_dim = head_dim * self.num_heads
        self.scale = head_dim ** -0.5
        self.qkv = nn.Linear(dim, all_head_dim * 3, bias_attr=False)
        if qkv_bias:
            # q and v get learnable biases; k's bias stays zero (see forward)
            self.q_bias = paddle.create_parameter(
                shape=[all_head_dim], dtype="float32", default_initializer=zeros_
            )
            self.v_bias = paddle.create_parameter(
                shape=[all_head_dim], dtype="float32", default_initializer=zeros_
            )
        else:
            self.q_bias = None
            self.v_bias = None
        if window_size:
            self.window_size = window_size
            self.num_relative_distance = (2 * window_size[0] - 1) * (
                2 * window_size[1] - 1
            ) + 3
            self.relative_position_bias_table = paddle.create_parameter(
                shape=[self.num_relative_distance, num_heads],
                dtype="float32",
                default_initializer=zeros_,
            )  # 2*Wh-1 * 2*Ww-1, nH
            coords_h = paddle.arange(window_size[0])
            coords_w = paddle.arange(window_size[1])
            coords = paddle.stack(paddle.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww
            coords_flatten = paddle.flatten(coords, 1)  # 2, Wh*Ww
            relative_coords = coords_flatten.unsqueeze(
                axis=2
            ) - coords_flatten.unsqueeze(axis=1)  # 2, Wh*Ww, Wh*Ww
            relative_coords = relative_coords.transpose([1, 2, 0])  # Wh*Ww, Wh*Ww, 2
            relative_coords[:, :, 0] += window_size[0] - 1
            relative_coords[:, :, 1] += window_size[1] - 1
            relative_coords[:, :, 0] *= 2 * window_size[1] - 1
            relative_position_index = paddle.zeros(
                [
                    window_size[0] * window_size[1] + 1,
                    window_size[0] * window_size[1] + 1,
                ],
                dtype=relative_coords.dtype,
            )
            # Wh*Ww, Wh*Ww
            relative_position_index[1:, 1:] = relative_coords.sum(-1)
            # the three extra entries cover cls-to-patch, patch-to-cls, and cls-to-cls
            relative_position_index[0, 0:] = self.num_relative_distance - 3
            relative_position_index[0:, 0] = self.num_relative_distance - 2
            relative_position_index[0, 0] = self.num_relative_distance - 1
            self.register_buffer("relative_position_index", relative_position_index)
        else:
            self.window_size = None
            self.relative_position_bias_table = None
            self.relative_position_index = None
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(all_head_dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x, rel_pos_bias):
        B, N, C = x.shape
        qkv_bias = None
        if self.q_bias is not None:
            qkv_bias = paddle.concat(
                (self.q_bias, paddle.zeros_like(self.v_bias), self.v_bias)
            )
        qkv = F.linear(x=x, weight=self.qkv.weight, bias=qkv_bias)
        qkv = qkv.reshape([B, N, 3, self.num_heads, -1]).transpose([2, 0, 3, 1, 4])
        q, k, v = qkv[0], qkv[1], qkv[2]
        q = q * self.scale
        attn = q @ k.transpose([0, 1, 3, 2])
        if self.relative_position_bias_table is not None:
            relative_position_bias = self.relative_position_bias_table[
                self.relative_position_index.reshape([-1])
            ].reshape(
                [
                    self.window_size[0] * self.window_size[1] + 1,
                    self.window_size[0] * self.window_size[1] + 1,
                    -1,
                ]
            )  # Wh*Ww, Wh*Ww, nH
            relative_position_bias = relative_position_bias.transpose(
                [2, 0, 1]
            )  # nH, Wh*Ww, Wh*Ww
            attn = attn + relative_position_bias.unsqueeze(axis=0)
        if rel_pos_bias is not None:
            attn = attn + rel_pos_bias
        attn = F.softmax(attn, axis=-1)
        attn = self.attn_drop(attn)
        x = (attn @ v).transpose([0, 2, 1, 3]).reshape([B, N, -1])
        x = self.proj(x)
        x = self.proj_drop(x)
        return x
```
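To make the bias-table bookkeeping concrete: for the default 14x14 patch grid there are (2*14-1)^2 = 729 relative offsets between patches, plus 3 extra entries for cls-to-patch, patch-to-cls, and cls-to-cls, giving 732 rows. A quick check (my own illustration):

```python
# sanity check of the relative-position bookkeeping for a 14x14 grid
Wh = Ww = 14
num_relative_distance = (2 * Wh - 1) * (2 * Ww - 1) + 3
print(num_relative_distance)  # 732 rows in the bias table
print(Wh * Ww + 1)            # 197 = 196 patch tokens + 1 cls token per index axis
```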
```python
class Block(nn.Layer):
    def __init__(
        self,
        dim,
        num_heads,
        mlp_ratio=4.0,
        qkv_bias=False,
        drop=0.0,
        attn_drop=0.0,
        drop_path=0.0,
        init_values=None,
        act_layer=nn.GELU,
        norm_layer=nn.LayerNorm,
        window_size=None,
        attn_head_dim=None,
    ):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.attn = Attention(
            dim,
            num_heads=num_heads,
            qkv_bias=qkv_bias,
            attn_drop=attn_drop,
            proj_drop=drop,
            window_size=window_size,
            attn_head_dim=attn_head_dim,
        )
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(
            in_features=dim,
            hidden_features=mlp_hidden_dim,
            act_layer=act_layer,
            drop=drop,
        )
        if init_values:
            # LayerScale: learnable per-channel scaling of each residual branch
            self.gamma_1 = paddle.create_parameter(
                shape=[dim],
                dtype="float32",
                default_initializer=nn.initializer.Constant(value=init_values),
            )
            self.gamma_2 = paddle.create_parameter(
                shape=[dim],
                dtype="float32",
                default_initializer=nn.initializer.Constant(value=init_values),
            )
        else:
            self.gamma_1, self.gamma_2 = None, None

    def forward(self, x, rel_pos_bias):
        if self.gamma_1 is None:
            x = x + self.drop_path(self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias))
            x = x + self.drop_path(self.mlp(self.norm2(x)))
        else:
            x = x + self.drop_path(
                self.gamma_1 * self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias)
            )
            x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x)))
        return x
```
```python
class RelativePositionBias(nn.Layer):
    def __init__(self, window_size, num_heads):
        super().__init__()
        self.window_size = window_size
        self.num_relative_distance = (2 * window_size[0] - 1) * (
            2 * window_size[1] - 1
        ) + 3
        self.relative_position_bias_table = paddle.create_parameter(
            shape=[self.num_relative_distance, num_heads],
            dtype="float32",
            default_initializer=zeros_,
        )  # 2*Wh-1 * 2*Ww-1, nH
        coords_h = paddle.arange(window_size[0])
        coords_w = paddle.arange(window_size[1])
        coords = paddle.stack(paddle.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww
        coords_flatten = paddle.flatten(coords, 1)  # 2, Wh*Ww
        relative_coords = coords_flatten.unsqueeze(axis=2) - coords_flatten.unsqueeze(
            axis=1
        )  # 2, Wh*Ww, Wh*Ww
        relative_coords = relative_coords.transpose([1, 2, 0])  # Wh*Ww, Wh*Ww, 2
        relative_coords[:, :, 0] += window_size[0] - 1  # shift to start from 0
        relative_coords[:, :, 1] += window_size[1] - 1
        relative_coords[:, :, 0] *= 2 * window_size[1] - 1
        relative_position_index = paddle.zeros(
            [window_size[0] * window_size[1] + 1, window_size[0] * window_size[1] + 1],
            dtype=relative_coords.dtype,
        )
        relative_position_index[1:, 1:] = relative_coords.sum(-1)  # Wh*Ww, Wh*Ww
        relative_position_index[0, 0:] = self.num_relative_distance - 3
        relative_position_index[0:, 0] = self.num_relative_distance - 2
        relative_position_index[0, 0] = self.num_relative_distance - 1
        self.register_buffer("relative_position_index", relative_position_index)

    def forward(self):
        # gather the shared bias from the table; Beit passes the result to every Block
        relative_position_bias = self.relative_position_bias_table[
            self.relative_position_index.reshape([-1])
        ].reshape(
            [
                self.window_size[0] * self.window_size[1] + 1,
                self.window_size[0] * self.window_size[1] + 1,
                -1,
            ]
        )  # Wh*Ww+1, Wh*Ww+1, nH
        return relative_position_bias.transpose([2, 0, 1])  # nH, Wh*Ww+1, Wh*Ww+1
```
```python
class Beit(nn.Layer):
    """Beit Layer"""

    def __init__(
        self,
        img_size=224,
        patch_size=16,
        in_chans=3,
        num_classes=1000,
        embed_dim=768,
        depth=12,
        num_heads=12,
        mlp_ratio=4.0,
        qkv_bias=True,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.0,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6),
        init_values=None,
        use_abs_pos_emb=True,
        use_rel_pos_bias=False,
        use_shared_rel_pos_bias=False,
        use_mean_pooling=True,
        init_scale=0.001,
    ):
        super().__init__()
        self.num_classes = num_classes
        self.num_features = self.embed_dim = embed_dim
        self.patch_embed = PatchEmbed(
            img_size=img_size,
            patch_size=patch_size,
            in_chans=in_chans,
            embed_dim=embed_dim,
        )
        num_patches = self.patch_embed.num_patches
        self.cls_token = paddle.create_parameter(
            shape=[1, 1, embed_dim],
            dtype="float32",
            default_initializer=trunc_normal_,
        )
        if use_abs_pos_emb:
            self.pos_embed = paddle.create_parameter(
                shape=[1, num_patches + 1, embed_dim],
                dtype="float32",
                default_initializer=trunc_normal_,
            )
        else:
            self.pos_embed = None
        self.pos_drop = nn.Dropout(p=drop_rate)
        if use_shared_rel_pos_bias:
            self.rel_pos_bias = RelativePositionBias(
                window_size=self.patch_embed.grid_size, num_heads=num_heads
            )
        else:
            self.rel_pos_bias = None
        # stochastic depth decay rule: drop path rate grows linearly with depth
        dpr = [x.item() for x in paddle.linspace(0, drop_path_rate, depth)]
        self.use_rel_pos_bias = use_rel_pos_bias
        self.blocks = nn.LayerList(
            [
                Block(
                    dim=embed_dim,
                    num_heads=num_heads,
                    mlp_ratio=mlp_ratio,
                    qkv_bias=qkv_bias,
                    drop=drop_rate,
                    attn_drop=attn_drop_rate,
                    drop_path=dpr[i],
                    norm_layer=norm_layer,
                    init_values=init_values,
                    window_size=self.patch_embed.grid_size
                    if use_rel_pos_bias
                    else None,
                )
                for i in range(depth)
            ]
        )
        self.norm = Identity() if use_mean_pooling else norm_layer(embed_dim)
        self.fc_norm = norm_layer(embed_dim) if use_mean_pooling else None
        self.head = nn.Linear(embed_dim, num_classes) if num_classes > 0 else Identity()
        self.apply(self._init_weights)
        self.fix_init_weight()
        if isinstance(self.head, nn.Linear):
            trunc_normal_(self.head.weight)
            self.head.weight.set_value(
                self.head.weight.multiply(paddle.to_tensor(init_scale))
            )
            self.head.bias.set_value(
                self.head.bias.multiply(paddle.to_tensor(init_scale))
            )

    def fix_init_weight(self):
        def rescale(param, layer_id):
            param.set_value(param.divide(paddle.to_tensor(math.sqrt(2.0 * layer_id))))

        for layer_id, layer in enumerate(self.blocks):
            rescale(layer.attn.proj.weight, layer_id + 1)
            rescale(layer.mlp.fc2.weight, layer_id + 1)

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight)
            if isinstance(m, nn.Linear) and m.bias is not None:
                zeros_(m.bias)
        elif isinstance(m, nn.LayerNorm):
            zeros_(m.bias)
            ones_(m.weight)

    def forward_features(self, x):
        x = self.patch_embed(x)
        batch_size, seq_len, _ = x.shape
        cls_tokens = self.cls_token.expand([batch_size, -1, -1])
        x = paddle.concat((cls_tokens, x), axis=1)
        if self.pos_embed is not None:
            x = x + self.pos_embed
        x = self.pos_drop(x)
        rel_pos_bias = self.rel_pos_bias() if self.rel_pos_bias is not None else None
        for blk in self.blocks:
            x = blk(x, rel_pos_bias=rel_pos_bias)
        x = self.norm(x)
        if self.fc_norm is not None:
            t = x[:, 1:, :]
            return self.fc_norm(t.mean(1))
        else:
            return x[:, 0]

    def forward(self, x):
        x = self.forward_features(x)
        x = self.head(x)
        return x
```
Model Definitions
```python
def beit_base_patch16_224(**kwargs):
    crop_pct = 0.9  # recommended evaluation crop ratio for this variant (informational)
    model = Beit(
        img_size=224,
        patch_size=16,
        embed_dim=768,
        depth=12,
        num_heads=12,
        mlp_ratio=4.0,
        use_abs_pos_emb=False,
        use_rel_pos_bias=True,
        init_values=0.1,
        **kwargs
    )
    return model


def beit_base_patch16_384(**kwargs):
    crop_pct = 1.0
    model = Beit(
        img_size=384,
        patch_size=16,
        embed_dim=768,
        depth=12,
        num_heads=12,
        mlp_ratio=4.0,
        use_abs_pos_emb=False,
        use_rel_pos_bias=True,
        init_values=0.1,
        **kwargs
    )
    return model


def beit_large_patch16_224(**kwargs):
    crop_pct = 0.9
    model = Beit(
        img_size=224,
        patch_size=16,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        mlp_ratio=4.0,
        use_abs_pos_emb=False,
        use_rel_pos_bias=True,
        init_values=1e-5,
        **kwargs
    )
    return model


def beit_large_patch16_384(**kwargs):
    crop_pct = 1.0
    model = Beit(
        img_size=384,
        patch_size=16,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        mlp_ratio=4.0,
        use_abs_pos_emb=False,
        use_rel_pos_bias=True,
        init_values=1e-5,
        **kwargs
    )
    return model


def beit_large_patch16_512(**kwargs):
    crop_pct = 1.0
    model = Beit(
        img_size=512,
        patch_size=16,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        mlp_ratio=4.0,
        use_abs_pos_emb=False,
        use_rel_pos_bias=True,
        init_values=1e-5,
        **kwargs
    )
    return model
```
Loading Model Weights
```python
# beit base 224
m = beit_base_patch16_224()
m.set_state_dict(paddle.load('/home/aistudio/data/data110564/beit_base_patch16_224_ft22kto1k.pdparams'))

# beit base 384
m = beit_base_patch16_384()
m.set_state_dict(paddle.load('/home/aistudio/data/data110564/beit_base_patch16_384_ft22kto1k.pdparams'))

# beit large 224
m = beit_large_patch16_224()
m.set_state_dict(paddle.load('/home/aistudio/data/data110564/beit_large_patch16_224_ft22kto1k.pdparams'))

# beit large 384
m = beit_large_patch16_384()
m.set_state_dict(paddle.load('/home/aistudio/data/data110564/beit_large_patch16_384_ft22kto1k.pdparams'))

# beit large 512
m = beit_large_patch16_512()
m.set_state_dict(paddle.load('/home/aistudio/data/data110564/beit_large_patch16_512_ft22kto1k.pdparams'))
```
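As a quick hedged sanity check after loading weights (dummy input; my own example, not part of the original project), one can confirm the output shape of the last model loaded above:

```python
# hypothetical smoke test: beit_large_patch16_512 expects 512x512 input, 1000 classes out
m.eval()
out = m(paddle.randn([1, 3, 512, 512]))
print(out.shape)  # [1, 1000]
```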
Using BEiT via PASSL
PASSL repository: https://github.com/PaddlePaddle/PASSL
```python
# install dependencies
! pip install ftfy

# clone the PASSL repo
# !git clone https://github.com/PaddlePaddle/PASSL.git
# if that is too slow, unzip the pre-downloaded PASSL archive instead
!unzip -oq /home/aistudio/PASSL-main.zip

# enter the project directory
%cd PASSL-main/
```
```python
import paddle.nn as nn
from passl.modeling.backbones import build_backbone
from passl.modeling.heads import build_head
from passl.utils.config import get_config


class Model(nn.Layer):
    def __init__(self, cfg_file):
        super().__init__()
        cfg = get_config(cfg_file)
        self.backbone = build_backbone(cfg.model.architecture)
        self.head = build_head(cfg.model.head)

    def forward(self, x):
        x = self.backbone(x)
        x = self.head(x)
        return x


cfg_file = "configs/beit/beit_base_p16_224.yaml"
m = Model(cfg_file)
```
| Arch | Weight | Top-1 Acc | Top-5 Acc | Crop ratio | # Params |
|---|---|---|---|---|---|
| beit_base_p16_224 | ft 22k to 1k | 85.21 | 97.66 | 0.9 | 87M |
| beit_base_p16_384 | ft 22k to 1k | 86.81 | 98.14 | 1.0 | 87M |
| beit_large_p16_224 | ft 22k to 1k | 87.48 | 98.30 | 0.9 | 304M |
| beit_large_p16_384 | ft 22k to 1k | 88.40 | 98.60 | 1.0 | 304M |
| beit_large_p16_512 | [ft 22k to 1k](https://passl.bj.bcebos.com/vision_transformers/beit/beit_large_p16_512_ft.pdparams) | 88.60 | 98.66 | 1.0 | 304M |
Validating Model Accuracy with [ppma](https://github.com/lmk123568/Paddle_Model_Analysis)

```python
! pip install ppma

# extract the ImageNet validation set
! tar -xf /home/aistudio/data/data96753/ILSVRC2012_img_val.tar -C /home/aistudio/data/data96753

import ppma
import paddle

m = beit_large_patch16_512()
m.set_state_dict(paddle.load('/home/aistudio/data/data110564/beit_large_patch16_512_ft22kto1k.pdparams'))

data_path = "/home/aistudio/data/data96753"
ppma.imagenet.val(m, data_path, batch_size=32, img_size=512, crop_pct=1.0, normalize='inception')
```