PaddleNLP FasterERNIE: End-to-End Optimization of Pretrained Models for Simple, Fast Deployment
Reposted from AI Studio. Project link: https://aistudio.baidu.com/aistudio/projectdetail/3200308
In recent years, Transformer-based NLP models have advanced rapidly, and Transformer-based techniques have essentially replaced the core techniques behind both fundamental NLP capabilities and key applications. Academically, the SOTA results on most NLP tasks are now set by Transformer-family models, yet putting them into production still faces deployment hurdles. For the text-preprocessing stage of Transformer models, two factors in particular get in the way of a unified training-and-inference deployment:
- The preprocessing logic is complex and must be re-implemented in C++, which is costly. Training-side code is mostly written in Python, so industrial deployment requires porting and aligning it, and the community currently lacks a general, efficient C++ reference implementation.
- Python text preprocessing is an order of magnitude slower than a C++ implementation, so closing this gap is highly valuable in industrial settings. At serving time the tokenizer is also a performance bottleneck for NLP models, especially for compact models such as ERNIE-Tiny, where text preprocessing can account for up to 30% of total prediction time.
For these two reasons, we built the text-preprocessing step of common pretrained models into a native Paddle operator, FasterTokenizer. FasterTokenizer is implemented in C++ and exposed through a Python API. It converts raw text into the numerical inputs the model expects, and it can be exported as part of the model and used directly for inference, giving pretrained models a unified training-and-inference development experience with high-performance text processing built in.
To take this further, we embedded FasterTokenizer into the pretrained ERNIE model, so the model's computation graph itself contains the high-performance text-processing operator. Text tasks get a simpler, unified training-and-inference workflow, and Python deployment gets faster inference. In addition, FasterERNIE uses the Fused TransformerEncoder API from Paddle 2.2, which provides further training and inference speedups on NVIDIA GPUs.
The full source code for this project is open-sourced in PaddleNLP.
If you find it helpful, please give the repo a ⭐️ star ⭐️ so it doesn't get lost! Link: https://github.com/PaddlePaddle/PaddleNLP
The text-processing operator: FasterTokenizer
FasterTokenizer ships with PaddleNLP 2.2, so first install paddlenlp 2.2 or later.
!pip install --upgrade paddlenlp -i https://pypi.tuna.tsinghua.edu.cn/simple
# !pip install --upgrade numpy -i https://pypi.tuna.tsinghua.edu.cn/simple
import numpy
numpy.__version__
'1.19.5'
PaddleNLP 2.2 provides the FasterTokenizer Python API.
Supported Model | FasterTokenizer API Usage |
---|---|
ERNIE, Chinese | FasterTokenizer.from_pretrained("ernie-1.0") |
ERNIE 2.0 Base, English | FasterTokenizer.from_pretrained("ernie-2.0-en") |
ERNIE 2.0 Large, English | FasterTokenizer.from_pretrained("ernie-2.0-large-en") |
BERT-Base, Uncased | FasterTokenizer.from_pretrained("bert-base-uncased") |
BERT-Large, Uncased | FasterTokenizer.from_pretrained("bert-large-uncased") |
BERT-base, Cased | FasterTokenizer.from_pretrained("bert-base-cased") |
BERT-Large, Cased | FasterTokenizer.from_pretrained("bert-large-cased") |
BERT-Base, Multilingual Cased | FasterTokenizer.from_pretrained("bert-base-multilingual-cased") |
BERT-Base, Chinese | FasterTokenizer.from_pretrained("bert-base-chinese") |
BERT-Base (Whole Word Masking), Chinese | FasterTokenizer.from_pretrained("bert-wwm-chinese") |
BERT-Base (Whole Word Masking, EXT Data), Chinese | FasterTokenizer.from_pretrained("bert-wwm-ext-chinese") |
RoBERTa-Base (Whole Word Masking, EXT Data), Chinese | FasterTokenizer.from_pretrained("roberta-wwm-ext") |
RoBERTa-Large (Whole Word Masking, EXT Data), Chinese | FasterTokenizer.from_pretrained("roberta-wwm-ext-large") |
import time
import numpy as np
import paddlenlp
from paddlenlp.transformers import ErnieTokenizer
from paddlenlp.experimental import FasterTokenizer
from paddlenlp.experimental import to_tensor
# ERNIE Tokenizer using PaddleNLP 2.2 FasterTokenizer
faster_tokenizer = FasterTokenizer.from_pretrained("ernie-1.0")
# PaddleNLP 2.1 ErnieTokenizer
tokenizer = ErnieTokenizer.from_pretrained("ernie-1.0")
text = '在世界几大古代文明中,中华文明源远流长、从未中断,至今仍充满蓬勃生机与旺盛生命力,这在人类历史上是了不起的奇迹。' \
'本固根深、一脉相承的历史文化是铸就这一奇迹的重要基础。先秦时期是中华文化的创生期,奠定了此后几千年中华文化发展的' \
'基础。'
length = len(text)
print(f"length:{length}")
f_input_ids, f_token_type_ids = faster_tokenizer(to_tensor([text]), max_seq_len=length)
f_input_ids, f_token_type_ids = f_input_ids.numpy()[0], f_token_type_ids.numpy()[0]
print(f"ernie faster tokenizer, f_input_ids: {f_input_ids}")
print(f"ernie faster tokenizer, f_token_type_ids: {f_token_type_ids}")
encoded_inputs = tokenizer(text, max_seq_len=length)
input_ids, token_type_ids = encoded_inputs["input_ids"], encoded_inputs["token_type_ids"]
print(f"ernie tokenizer, input_ids: {input_ids}")
print(f"ernie tokenizer, token_type_ids: {token_type_ids}")
print(f"f_input_ids == input_ids : {np.array_equal(f_input_ids, input_ids)}")
print(f"f_token_type_ids == token_type_ids : {np.array_equal(f_token_type_ids, token_type_ids)}")
The results above show that FasterTokenizer produces exactly the same output as ErnieTokenizer.
FasterTokenizer achieves its high preprocessing throughput through three techniques (a conceptual sketch of the second one follows the list):
- Aggressive operator-level memory optimization, reducing data allocation and copying
- A cache for BPE tokenization results, which speeds up preprocessing of frequently seen tokens
- OpenMP multithreading across the samples in a batch
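As a rough illustration of the caching idea only, and not PaddleNLP's actual C++ implementation, the effect is essentially memoization: the sub-word split of each distinct token is computed once and reused. A minimal Python sketch, where the `split_fn` stand-in splitter is hypothetical:
```python
# Conceptual sketch of a tokenization cache: each distinct token is split into
# sub-words once, and later occurrences are served from a dictionary.
_cache = {}

def cached_split(token, split_fn):
    if token not in _cache:
        _cache[token] = split_fn(token)  # expensive sub-word split, done once
    return _cache[token]

# Trivial stand-in splitter: one piece per character.
print(cached_split("打球", split_fn=list))  # computed: ['打', '球']
print(cached_split("打球", split_fn=list))  # served from the cache
```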
So how much faster is the 2.2 FasterTokenizer than the 2.1 tokenizer?
A simple benchmark illustrates the gap.
Using the ERNIE tokenizer as the example, we process 1000 samples with batch size 32 and compare throughput over 10 epochs of repeated preprocessing.
batch_size = 32
data = [text] * 1000
epochs = 10
batches = [
to_tensor(data[idx:idx + batch_size])
for idx in range(0, len(data), batch_size)
]
start = time.time()
for _ in range(epochs):
for batch_data in batches:
input_ids, token_type_ids = faster_tokenizer(
batch_data, max_seq_len=length)
end = time.time()
total_tokens = epochs * len(data) * length
print("The throughput of FasterTokenizer: {:,.2f} tokens/s".format((
total_tokens / (end - start))))
The throughput of FasterTokenizer: 9,016,956.58 tokens/s
batches = [
data[idx:idx + batch_size]
for idx in range(0, len(data), batch_size)
]
start = time.time()
for _ in range(epochs):
for batch_data in batches:
encoded_inputs = tokenizer(batch_data, max_seq_len=length)
end = time.time()
total_tokens = epochs * len(data) * length
print("The throughput of ErnieTokenizer: {:,.2f} tokens/s".format((
total_tokens / (end - start))))
The throughput of ErnieTokenizer: 136,282.50 tokens/s
The results above show a dramatic gain for FasterTokenizer over ErnieTokenizer, roughly a 66x speedup in this run, as the quick calculation below confirms.
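The two printed throughputs give the speedup directly:
```python
# Speedup implied by the two benchmark runs above.
faster_tps = 9_016_956.58  # FasterTokenizer, tokens/s
ernie_tps = 136_282.50     # ErnieTokenizer, tokens/s
print(f"speedup: {faster_tps / ernie_tps:.1f}x")  # ~66.2x
```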
To improve the deployment experience further, PaddleNLP 2.2 builds FasterTokenizer directly into the ERNIE model, producing the FasterERNIE model.
We also benchmarked the text-processing throughput of different frameworks on an Intel® Xeon® Gold 6271C CPU @ 2.60GHz with 16 threads and a text length of 128.
FasterERNIE
We use the public Chinese sentiment classification dataset ChnSentiCorp as the example and walk through training, prediction, and deployment with FasterERNIE.
Load the dataset and model
FasterErnieForSequenceClassification is the standard network for fine-tuning ERNIE on text classification. It supports the following pretrained models:
Supported Model | FasterErnieForSequenceClassification API Usage |
---|---|
ERNIE, Chinese | FasterErnieForSequenceClassification.from_pretrained("ernie-1.0") |
ERNIE 2.0 Base, English | FasterErnieForSequenceClassification.from_pretrained("ernie-2.0-en") |
ERNIE 2.0 Large, English | FasterErnieForSequenceClassification.from_pretrained("ernie-2.0-large-en") |
import os
import time

import numpy as np
import paddle
import paddle.nn.functional as F
import paddlenlp as ppnlp
from paddlenlp.datasets import load_dataset
from paddlenlp.transformers import LinearDecayWithWarmup
from paddlenlp.experimental import FasterErnieForSequenceClassification, to_tensor
def create_dataloader(dataset, mode='train', batch_size=1):
def trans_fn(example):
return {
            # raw text
"text": example["text"],
            # label
"label": np.array(
example["label"], dtype="int64")
}
dataset.map(trans_fn)
shuffle = True if mode == 'train' else False
if mode == 'train':
batch_sampler = paddle.io.DistributedBatchSampler(
dataset, batch_size=batch_size, shuffle=shuffle)
else:
batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=batch_size, shuffle=shuffle)
return paddle.io.DataLoader(dataset=dataset, batch_sampler=batch_sampler)
train_ds, dev_ds = load_dataset("chnsenticorp", splits=["train", "dev"])
max_seq_len = 128
batch_size = 32
# Load the pretrained model for sequence classification
model = FasterErnieForSequenceClassification.from_pretrained(
'ernie-1.0',
num_classes=len(train_ds.label_list),
max_seq_len=max_seq_len)
# Build the training data loader
train_data_loader = create_dataloader(
train_ds, mode='train', batch_size=batch_size)
# Build the validation data loader
dev_data_loader = create_dataloader(
dev_ds, mode='dev', batch_size=batch_size)
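For reference, each ChnSentiCorp example carries a raw "text" string and an integer "label" (0 = negative, 1 = positive), which are exactly the fields that trans_fn above relies on. A quick peek at one training example:
```python
# Inspect one training example and the label list of the dataset.
print(train_ds[0]["text"][:30], "... label:", train_ds[0]["label"])
print("label list:", train_ds.label_list)
```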
Model training and evaluation
After defining the loss function, optimizer, and evaluation metric, training can begin.
epochs = 3
learning_rate = 5e-5
warmup_proportion = 0.1
weight_decay = 0.01
num_training_steps = len(train_data_loader) * epochs
# Learning-rate schedule: linear decay with warmup
lr_scheduler = LinearDecayWithWarmup(learning_rate, num_training_steps, warmup_proportion)
# Exclude all bias and LayerNorm parameters from weight decay
decay_params = [
p.name for n, p in model.named_parameters()
if not any(nd in n for nd in ["bias", "norm"])
]
# AdamW optimizer
optimizer = paddle.optimizer.AdamW(
learning_rate=lr_scheduler,
parameters=model.parameters(),
weight_decay=weight_decay,
apply_decay_param_fun=lambda x: x in decay_params)
# Cross-entropy loss
criterion = paddle.nn.loss.CrossEntropyLoss()
# Accuracy metric
metric = paddle.metric.Accuracy()
@paddle.no_grad()
def evaluate(model, criterion, metric, data_loader):
    # Evaluate on the validation set: report mean loss and accuracy
model.eval()
metric.reset()
losses = []
for batch in data_loader:
texts, labels = batch['text'], batch['label']
logits, predictions = model(texts)
loss = criterion(logits, labels)
losses.append(loss.numpy())
correct = metric.compute(logits, labels)
metric.update(correct)
accu = metric.accumulate()
print("eval loss: %.5f, accuracy: %.5f" % (np.mean(losses), accu))
model.train()
metric.reset()
# Start training
global_step = 0
tic_train = time.time()
total_train_time = 0
# Directory for saving model checkpoints
ckpt_dir = "./ckpt"
for epoch in range(1, epochs + 1):
for step, batch in enumerate(train_data_loader, start=1):
texts, labels = batch["text"], batch["label"]
logits, predictions = model(texts)
loss = criterion(logits, labels)
probs = F.softmax(logits, axis=1)
correct = metric.compute(logits, labels)
metric.update(correct)
acc = metric.accumulate()
        # Back-propagate gradients
loss.backward()
        # Update parameters, step the LR schedule, and clear gradients
optimizer.step()
lr_scheduler.step()
optimizer.clear_grad()
global_step += 1
if global_step % 10 == 0:
time_diff = time.time() - tic_train
total_train_time += time_diff
print(
"global step %d, epoch: %d, batch: %d, loss: %.5f, accuracy: %.5f, speed: %.2f step/s"
% (global_step, epoch, step, loss, acc,
10 / time_diff))
tic_train = time.time()
if global_step % 100 == 0:
save_dir = os.path.join(ckpt_dir, "model_%d" % global_step)
if not os.path.exists(save_dir):
os.makedirs(save_dir)
evaluate(model, criterion, metric, dev_data_loader)
model.save_pretrained(save_dir)
tic_train = time.time()
Prediction
Calling the predict function defined below outputs the prediction results directly.
data = [
"这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般",
"怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片",
"作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。"]
label_map = {0: 'negative', 1: 'positive'}
def predict(model, data, label_map, batch_size=1):
    # Split the data into batches
batches = [
data[idx:idx + batch_size] for idx in range(0, len(data), batch_size)
]
results = []
model.eval()
for texts in batches:
        # Forward pass: the model returns logits and predicted class ids
logits, preds = model(texts)
preds = preds.numpy()
labels = [label_map[i] for i in preds]
results.extend(labels)
return results
# Run prediction and print the results
results = predict(model, data, label_map, batch_size)
for idx, text in enumerate(data):
print(text, " : ", results[idx])
这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般 : negative
怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片 : negative
作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。 : positive
Export the model
The trained model can be exported as a static computation graph for deployment inference. A single call to the to_static() method exports it.
save_path = os.path.join("export", "inference")
model.to_static(save_path)
[2022-02-11 11:32:41,988] [ INFO] - Already save the static model to the path export/inference
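After export, the directory holds the static-graph files that the deployment code below expects. A quick sanity check, assuming the same save_path as above:
```python
import os

# The Paddle inference predictor loads these two artifacts in the next step.
save_path = os.path.join("export", "inference")
for suffix in (".pdmodel", ".pdiparams"):
    print(save_path + suffix, "exists:", os.path.exists(save_path + suffix))
```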
Deployment
With the model exported, the Paddle Inference library handles deployment.
A Python implementation of the inference predictor looks like this:
class Predictor(object):
def __init__(self,
save_path,
batch_size=32):
self.batch_size = batch_size
model_file = save_path + ".pdmodel"
params_file = save_path + ".pdiparams"
if not os.path.exists(model_file):
raise ValueError("The model file {} is not found.".format(
model_file))
if not os.path.exists(params_file):
raise ValueError("The params file {} is not found.".format(
params_file))
        # Load the static computation graph
config = paddle.inference.Config(model_file, params_file)
config.enable_use_gpu(100, 0)
config.switch_use_feed_fetch_ops(False)
config.delete_pass("embedding_eltwise_layernorm_fuse_pass")
        # Create the predictor and fetch its input/output handles
self.predictor = paddle.inference.create_predictor(config)
self.input_handle = self.predictor.get_input_handle(
self.predictor.get_input_names()[0])
self.output_handles = [
self.predictor.get_output_handle(name)
for name in self.predictor.get_output_names()
]
def predict(self, data, label_map):
        # Feed the raw text directly into the model
self.input_handle.copy_from_cpu(data)
        # Run inference
self.predictor.run()
        # Fetch logits and predicted class ids
logits = self.output_handles[0].copy_to_cpu()
preds = self.output_handles[1].copy_to_cpu()
labels = [label_map[pred] for pred in preds]
return labels
batch_size = 1
predictor = Predictor(save_path, batch_size)
data = [
"这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般",
"怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片",
"作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。"]
label_map = {0: "negative", 1: "positive"}
batches = [
data[idx:idx + batch_size]
for idx in range(0, len(data), batch_size)
]
results = []
for batch in batches:
labels = predictor.predict(batch, label_map=label_map)
results.extend(labels)
for idx, text in enumerate(data):
print(text, " : ", results[idx])
这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般 : negative
怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片 : negative
作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。 : positive
As the implementation above shows, inference no longer needs any text preprocessing: feed in the raw text and the predictions come out.
A C++ deployment example looks like this:
#include <gflags/gflags.h>
#include <iostream>
#include <numeric>
#include "paddle/include/paddle_inference_api.h"
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_bool(use_gpu, true, "enable gpu");
template <typename T>
void GetOutput(paddle_infer::Predictor* predictor,
std::string output_name,
std::vector<T>* out_data) {
auto output = predictor->GetOutputHandle(output_name);
std::vector<int> output_shape = output->shape();
int out_num = std::accumulate(
output_shape.begin(), output_shape.end(), 1, std::multiplies<int>());
out_data->resize(out_num);
output->CopyToCpu(out_data->data());
}
int main(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
paddle_infer::Config config;
config.SetModel(FLAGS_model_file, FLAGS_params_file);
if (FLAGS_use_gpu) {
config.EnableUseGpu(100, 0);
}
auto pass_builder = config.pass_builder();
pass_builder->DeletePass("embedding_eltwise_layernorm_fuse_pass");
auto predictor = paddle_infer::CreatePredictor(config);
std::vector<std::string> data{
"这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般",
"请问:有些字打错了,我怎么样才可以回去编辑一下啊?",
"本次入住酒店的网络不是很稳定,断断续续,希望能够改进。"};
auto input_names = predictor->GetInputNames();
auto text = predictor->GetInputHandle(input_names[0]);
text->ReshapeStrings(data.size());
text->CopyStringsFromCpu(&data);
predictor->Run();
std::vector<float> logits;
std::vector<int64_t> preds;
auto output_names = predictor->GetOutputNames();
GetOutput(predictor.get(), output_names[0], &logits);
GetOutput(predictor.get(), output_names[1], &preds);
for (size_t i = 0; i < data.size(); i++) {
std::cout << data[i] << " : " << preds[i] << std::endl;
}
return 0;
}
As the C++ code above shows, the deployment no longer needs a complex C++ tokenizer implementation, which greatly cuts the development effort. By our count, the ERNIE/BERT inference deployment code in PaddleNLP 2.2 is about 94% smaller than in version 2.1.
The full C++ deployment example is available at: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/faster/faster_ernie/seq_cls
Join the WeChat group and learn with us
Join the PaddleNLP technical discussion group on WeChat and talk NLP with the community!