[Reproduction Challenge, NLP] Canine Model

AI Studio · Published 2022-06-24 01:35:08

In [2]

# Before you start, run this cell to switch your working directory!

import os
print(f"current dir at {os.getcwd()}")
if os.getcwd() == '/home/aistudio':
    os.chdir("./work/canine_paddle")
    print(f"changing working dir into {os.getcwd()}")
current dir at /home/aistudio
changing working dir into /home/aistudio/work/canine_paddle
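Canine Paddle Implementation

1. Introduction
The world has an enormous number of languages and words. In multilingual settings, the vocab and tokenization schemes of conventional pretrained models inevitably run into out-of-vocabulary and unknown-token cases. Canine offers a tokenization-free pretraining approach that improves model performance on multilingual tasks.

Paper: CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

Canine Paddle reproduction repository: kevinng77/canine_paddle

2. Dataset and Reproduced Accuracy
This reproduction uses the TydiQA dataset (see the official tydiqa repo); the data processing follows the official canine/tydiqa code.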

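TydiQA is a multilingual reading-comprehension dataset. The Tydi data contains 180k+ Wikipedia articles and 200k+ article-question pairs, covering 11 languages. On TydiQA, Canine reaches 66% F1 on the Passage Selection Task and 52.8% F1 on the Minimal Answer Span Task, about 2% above the TydiQA baseline (mBERT).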

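| TydiQA task | Canine paper | This repository (reproduced) |
| Passage Selection Task (SELECTP) | 66.0% | 65.92% |
| Minimal Answer Span Task (MINSPAN) | 52.8% | 55.04% |
The metric is macro F1; the reproduced results shown here are averages over multiple rounds of fine-tuning, prediction, and evaluation.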
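To make the metric concrete, the sketch below computes a macro-averaged F1 in the usual sense of the term: one F1 score per language, then an unweighted mean across languages. The per-language precision and recall values are made-up placeholders, and the official TydiQA evaluation script may differ in details (for example, which languages it includes), so treat this as an illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def macro_f1(per_language: dict) -> float:
    """Unweighted mean of the per-language F1 scores."""
    scores = [f1_score(p, r) for p, r in per_language.values()]
    return sum(scores) / len(scores)


# Hypothetical (precision, recall) pairs, NOT results from this repository.
example = {"lang_a": (0.5, 0.5), "lang_b": (1.0, 0.5)}
print(f"macro F1 = {macro_f1(example):.4f}")
```

Because every language carries the same weight, a language with few examples influences the final score as much as a large one.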

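3. Environment Setup
3.1 Hardware and Software Environment
Framework: paddlepaddle==2.3.0
Hardware: the earlier steps (model weight conversion, data preprocessing, etc.) can run in a CPU environment; fine-tuning requires a V100 32G GPU plus 32 GB of RAM.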
In [3]

# Reproducing the TydiQA task requires the h5py and absl-py libraries

!pip install paddlenlp==2.3.1 h5py absl-py -q
ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
parl 1.4.1 requires pyzmq==18.1.1, but you have pyzmq 22.3.0 which is incompatible.
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the ‘/opt/conda/envs/python35-paddle120-env/bin/python -m pip install --upgrade pip’ command.
In [2]

# Needed for weight conversion (do not install when fine-tuning or in the AI Studio GPU environment)

!pip install transformers==4.19.2 torch==1.11.0 -q

WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the ‘/opt/conda/envs/python35-paddle120-env/bin/python -m pip install --upgrade pip’ command.
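3.2 Model Preparation
3.2.1 Writing the Model Architecture
Write the model's modeling.py and tokenizer.py files. Reading the Canine paper and source code shows that the Canine architecture is similar to Bert: both use a Transformer Encoder as the model body. Canine differs in that it:

Applies convolution layers before and after the Transformer Encoder to compress (downsample) and resample (upsample) the sequence.
Uses Hash Embedding to reduce the model size.
Adds a Local Attention Encoder between the Embedding and the Transformer Encoder to learn local character information.
Encodes text as Unicode code points instead of using WordPiece or BPE.
Given these four points, the approach to building the model is clear: start from paddlenlp/transformers/bert/modeling.py and modify it to implement the main body of Canine, then add Hash_Embedding, Local_Attention_Encoder, and Projection_Conv to implement the four differences above.

The finished model is placed at paddlenlp/transformers/canine/modeling.py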
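As a rough illustration of the Hash Embedding idea, the sketch below maps one Unicode code point to several small embedding tables through different multiplicative hashes and concatenates the selected slices. The prime multipliers, bucket count, and dimensions are illustrative assumptions chosen to keep the demo tiny; they are not read from this repository's modeling.py.

```python
import random

# Assumed hash multipliers; one prime per hash function.
PRIMES = [31, 43, 59, 61]


def hash_bucket_ids(codepoint, num_buckets):
    """Return one bucket index per hash function for a single code point."""
    return [((codepoint + 1) * prime) % num_buckets for prime in PRIMES]


def build_tables(num_buckets, shard_dim, seed=0):
    """Create one small random embedding table per hash function."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(shard_dim)] for _ in range(num_buckets)]
        for _ in PRIMES
    ]


def hash_embed(codepoint, tables):
    """Concatenate the slice selected by each hash into one embedding vector."""
    num_buckets = len(tables[0])
    vector = []
    for table, bucket in zip(tables, hash_bucket_ids(codepoint, num_buckets)):
        vector.extend(table[bucket])
    return vector


tables = build_tables(num_buckets=64, shard_dim=4)
embedding = hash_embed(ord("C"), tables)
print(hash_bucket_ids(ord("C"), 64), len(embedding))
```

The tables hold 4 x 64 x 4 values no matter how many distinct code points appear. Table size is therefore decoupled from the size of the Unicode range, at the cost of occasional collisions within a single table.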

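3.2.2 Converting the Pretrained Weights
[Tip] The Paddle weights have already been uploaded and are downloaded automatically when you call model.from_pretrained('canine-s'). If you only want to try the fine-tuning workflow, you can skip this step.

Convert the pretrained weights from the torch weights

First, download the Hugging Face Canine torch weights: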
In [ ]

# 1. Run the following to download the torch weights locally:

!mkdir -p data/huggingface_weight || echo "dir exist"
!wget -O data/huggingface_weight/model.bin https://huggingface.co/google/canine-s/resolve/main/pytorch_model.bin
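Build the weight mapping between the Hugging Face Canine and our Canine implementation; see weight_mapping.py in reproduction_utils for details. This step generates torch_paddle_layer_map.json, which stores, for each torch layer weight in Canine, the corresponding Paddle layer name and shape, making inspection and debugging easier.

Run convert_weight.py to convert the weights: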

In [5]

# 2. Convert the weights:

!python -m reproduction_utils.weight_convert_files.convert_weight \
    --pytorch_checkpoint_path=data/huggingface_weight/model.bin \
    --paddle_dump_path=data/paddle_weight/model_state.pdparams \
    --layer_mapping_file=reproduction_utils/weight_convert_files/torch_paddle_layer_map.json
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module’s documentation for alternative uses
import imp

converting torch weight at data/huggingface_weight/model.bin
to paddle weight at data/paddle_weight/model_state.pdparams
based on layer mapping file reproduction_utils/weight_convert_files/torch_paddle_layer_map.json
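A conversion of this kind usually comes down to renaming each parameter and transposing the weights of fully connected layers. torch.nn.Linear stores its weight as [out_features, in_features], while paddle.nn.Linear uses [in_features, out_features]. The sketch below shows that idea on plain nested lists; the mapping format and the layer names are hypothetical and are not the actual contents of torch_paddle_layer_map.json.

```python
def transpose(matrix):
    """Transpose a 2-D nested list."""
    return [list(column) for column in zip(*matrix)]


def convert_state_dict(torch_state, layer_map):
    """Rename parameters and transpose the ones flagged in the mapping."""
    paddle_state = {}
    for torch_name, weight in torch_state.items():
        rule = layer_map[torch_name]
        converted = transpose(weight) if rule["transpose"] else weight
        paddle_state[rule["paddle_name"]] = converted
    return paddle_state


# Hypothetical names and mapping, for illustration only.
torch_state = {
    "encoder.dense.weight": [[1, 2, 3], [4, 5, 6]],  # [out=2, in=3]
    "encoder.dense.bias": [0.1, 0.2],
}
layer_map = {
    "encoder.dense.weight": {"paddle_name": "encoder.linear.weight", "transpose": True},
    "encoder.dense.bias": {"paddle_name": "encoder.linear.bias", "transpose": False},
}
converted = convert_state_dict(torch_state, layer_map)
print(converted["encoder.linear.weight"])
```

Keeping the mapping in a separate file, as the repository does, makes it easy to see which layer a shape mismatch comes from when a conversion fails.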
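3.2.3 Forward-Pass Accuracy Check
(This step requires torch and transformers to be installed.) Please run the forward-pass check in a CPU environment.

During verification, running the torch Canine model prints "Using unk_token, but it is not set yet."; this is a normal message. Across multiple checks with random samples, the outputs of the Paddle model and the Hugging Face model agree to within $10^{-5}$ to $10^{-7}$.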

In [6]

# Tokenizer alignment check

!python -m reproduction_utils.token_check
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module’s documentation for alternative uses
import imp
Downloading: 100%|██████████████████████████████| 657/657 [00:00<00:00, 480kB/s]
Downloading: 100%|██████████████████████████████| 854/854 [00:00<00:00, 626kB/s]
Downloading: 100%|██████████████████████████████| 670/670 [00:00<00:00, 454kB/s]
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.

running tokenizer check
pd inputs: [57344, 67, 97, 110, 105, 110, 101, 32, 109, 111, 100, 101, 108, 32, 105, 115, 32, 116, 111, 107, 101, 110, 105, 122, 97, 116, 105, 111, 110, 45, 102, 114, 101, 101, 46, 57345]
pt inputs: [57344, 67, 97, 110, 105, 110, 101, 32, 109, 111, 100, 101, 108, 32, 105, 115, 32, 116, 111, 107, 101, 110, 105, 122, 97, 116, 105, 111, 110, 45, 102, 114, 101, 101, 46, 57345]
torch token matched paddle? True
torch token matched paddle? True
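The ids printed above are simply Unicode code points framed by two special ids. The sketch below reproduces that encoding for a single sentence, assuming the first and last ids in the printed output (57344 and 57345) are the [CLS] and [SEP] markers; padding, truncation, and the other tokenizer options are left out.

```python
CLS_ID = 0xE000  # 57344, first id in the printed output (assumed to be [CLS])
SEP_ID = 0xE001  # 57345, last id in the printed output (assumed to be [SEP])


def encode(text):
    """Map each character to its Unicode code point and add the two special ids."""
    return [CLS_ID] + [ord(ch) for ch in text] + [SEP_ID]


ids = encode("Canine model is tokenization-free.")
print(ids)
print(len(ids))
```

No vocabulary file is involved, so any character in any script gets an id and there is no out-of-vocabulary case.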
In [9]

# Model alignment check; loads the pretrained weights from canine_paddle/data/paddle_weight

!python -m reproduction_utils.forward_ppg_check --model_dir="./data/paddle_weight"
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module’s documentation for alternative uses
import imp

running forward propagation check, 10 random samples
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
mean diff: tensor(3.4447e-07) max diff tensor(3.5446e-06)
mean diff: tensor(3.4213e-07) max diff tensor(3.1590e-06)
mean diff: tensor(3.5274e-07) max diff tensor(4.5598e-06)
mean diff: tensor(3.0772e-07) max diff tensor(3.4422e-06)
mean diff: tensor(6.0501e-07) max diff tensor(7.4301e-06)
mean diff: tensor(3.3807e-07) max diff tensor(2.8983e-06)
mean diff: tensor(4.3866e-07) max diff tensor(5.2536e-06)
mean diff: tensor(4.2190e-07) max diff tensor(3.6359e-06)
mean diff: tensor(5.1109e-07) max diff tensor(9.9614e-06)
mean diff: tensor(3.4987e-07) max diff tensor(3.4571e-06)
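Each line above reports two numbers for one random sample, presumably the mean and the maximum absolute difference between the two models' outputs. A minimal sketch of that comparison on flat lists is shown below; the values are made up, and the real script compares torch and Paddle tensors rather than Python lists.

```python
def compare_outputs(reference, candidate):
    """Return (mean, max) of the element-wise absolute differences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    diffs = [abs(a - b) for a, b in zip(reference, candidate)]
    return sum(diffs) / len(diffs), max(diffs)


# Made-up values standing in for the outputs of two frameworks.
torch_out = [0.25, -1.5, 3.0, 0.0]
paddle_out = [0.25, -1.5, 3.0 + 4e-6, 2e-6]
mean_diff, max_diff = compare_outputs(torch_out, paddle_out)
print(f"mean diff: {mean_diff:.4e} max diff: {max_diff:.4e}")
assert max_diff < 1e-5, "outputs differ by more than the chosen tolerance"
```

The maximum is the stricter of the two figures, since a single badly converted weight can hide inside a small mean.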
3.3 Data Preparation
3.3.1 Downloading the Tydi Dataset
In [10]
!mkdir data/tydi || echo "dir exist"
!wget -O data/tydi/tydiqa-v1.0-dev.jsonl.gz https://storage.googleapis.com/tydiqa/v1.0/tydiqa-v1.0-dev.jsonl.gz
!wget -O data/tydi/tydiqa-v1.0-train.jsonl.gz https://storage.googleapis.com/tydiqa/v1.0/tydiqa-v1.0-train.jsonl.gz
--2022-06-19 16:33:34-- https://storage.googleapis.com/tydiqa/v1.0/tydiqa-v1.0-dev.jsonl.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.43.16, 172.217.160.112, 172.217.160.80, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.43.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 160614310 (153M) [application/gzip]
Saving to: 'data/tydi/tydiqa-v1.0-dev.jsonl.gz'

data/tydi/tydiqa-v1 100%[===================>] 153.17M 3.68MB/s in 39s

2022-06-19 16:34:16 (3.91 MB/s) - 'data/tydi/tydiqa-v1.0-dev.jsonl.gz' saved [160614310/160614310]

--2022-06-19 16:34:16-- https://storage.googleapis.com/tydiqa/v1.0/tydiqa-v1.0-train.jsonl.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.163.48, 142.251.43.16, 172.217.160.112, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.163.48|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1729651634 (1.6G) [application/gzip]
Saving to: 'data/tydi/tydiqa-v1.0-train.jsonl.gz'

data/tydi/tydiqa-v1 100%[===================>] 1.61G 2.97MB/s in 13m 15s

2022-06-19 16:47:33 (2.07 MB/s) - 'data/tydi/tydiqa-v1.0-train.jsonl.gz' saved [1729651634/1729651634]

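3.3.2 Processing the Dataset
This step takes about 4 hours and is best run in a CPU environment. The data is saved in the work/canine_paddle/data folder, so it is not lost when you switch AI Studio environments.

Note: the detailed data-processing configuration can be found in the tydi_canine folder.

Option 1: download and extract the already-processed training and test data.

Link: https://pan.baidu.com/s/1QVHh3cTztKAgAEEXUlqxWg?pwd=ia6i ; extraction code: ia6i
After downloading, place the two h5df files in the data/tydi directory, as follows: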

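./canine_paddle # repository root
|-- data # repository data directory
|   ├── tydi # tydi data
|   │   ├── dev.h5df # test data extracted from tydiqa-v1.0-dev.jsonl.gz
|   │   ├── train.h5df # training data extracted from tydiqa-v1.0-train.jsonl.gz
You may want to save the files to your own Baidu Netdisk and then use the bypy library to download them to AI Studio.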

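Option 2: process the official raw dataset

Run the following to generate the test dataset dev.h5df. It takes about 40 minutes (22.8 minutes in the run logged below), produces about 2.5 GB of data, and contains 330k+ samples (336499 features in the log below).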

In [4]
!python3 -m tydi_canine.prepare_tydi_data \
    --input_jsonl="**/tydiqa-v1.0-dev.jsonl.gz" \
    --output_dir=data/tydi/dev.h5df \
    --max_seq_length=2048 \
    --doc_stride=512 \
    --max_question_length=256 \
    --logging_steps=2000 \
    --is_training=false
I0619 21:25:15.262827 140499667293952 prepare_tydi_data.py:134] >>> input features will be store at data/tydi/dev.h5df
I0619 21:25:15.620631 140499667293952 pd_io.py:53] >>> loading file from data/tydi/tydiqa-v1.0-dev.jsonl.gz
I0619 21:25:15.668764 140499667293952 prepare_tydi_data.py:187] Examples processed: 0
I0619 21:27:27.986339 140499667293952 prepare_tydi_data.py:187] Examples processed: 2000
I0619 21:29:35.484807 140499667293952 prepare_tydi_data.py:187] Examples processed: 4000
I0619 21:32:27.297904 140499667293952 prepare_tydi_data.py:187] Examples processed: 6000
I0619 21:34:35.204277 140499667293952 prepare_tydi_data.py:187] Examples processed: 8000
I0619 21:37:27.616865 140499667293952 prepare_tydi_data.py:187] Examples processed: 10000
I0619 21:39:44.970473 140499667293952 prepare_tydi_data.py:187] Examples processed: 12000
I0619 21:41:51.723424 140499667293952 prepare_tydi_data.py:187] Examples processed: 14000
I0619 21:44:40.178485 140499667293952 prepare_tydi_data.py:187] Examples processed: 16000
I0619 21:46:47.189610 140499667293952 prepare_tydi_data.py:187] Examples processed: 18000
I0619 21:48:02.327348 140499667293952 prepare_tydi_data.py:211] Examples with correct context retained: 9212 of 18670
I0619 21:48:02.327539 140499667293952 prepare_tydi_data.py:216] Number of total features 336499
time cose: 22.80 min
Run the following to generate the training dataset train.h5df. It takes about 3 hours, produces about 1.4 GB of data, and contains roughly 460k samples.

In [ ]
!python3 -m tydi_canine.prepare_tydi_data \
    --input_jsonl="**/tydiqa-v1.0-train.jsonl.gz" \
    --output_dir=data/tydi/train.h5df \
    --max_seq_length=2048 \
    --doc_stride=512 \
    --max_question_length=256 \
    --include_unknowns=0.1 \
    --logging_steps=30000 \
    --is_training=true
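The --max_seq_length and --doc_stride flags describe a sliding window over documents that are too long for a single input. The sketch below shows the usual scheme, assuming doc_stride is the distance between the start offsets of consecutive windows. It ignores the room taken by the question and the special characters, so the spans actually produced by prepare_tydi_data may differ.

```python
def sliding_windows(doc_length, window_size, stride):
    """Return (start, end) offsets of overlapping windows that cover the document."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    start = 0
    while True:
        end = min(start + window_size, doc_length)
        windows.append((start, end))
        if end == doc_length:
            break
        start += stride
    return windows


# A 3000-character document with the window and stride values used above.
for start, end in sliding_windows(doc_length=3000, window_size=2048, stride=512):
    print(start, end)
```

One document therefore turns into several features, which is consistent with the dev-set log earlier in this section, where 9212 retained examples yield 336499 features.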
In [ ]

# After processing, you can delete the **/tydiqa-v1.0-train.jsonl.gz file; it is not needed afterwards

!rm ./data/tydi/tydiqa-v1.0-train.jsonl.gz
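4. Model Usage
4.1 Usage Example
In [ ]

# Downloading Hugging Face weights on AI Studio is slow, so the project ships uploaded canine-s weights.

# Run this cell to copy the model weights into the paddlenlp cache location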

!mkdir -p /home/aistudio/.paddlenlp/models/canine-s/
!cp …/model_state.pdparams /home/aistudio/.paddlenlp/models/canine-s/
In [22]
from canine import CanineTokenizer
from canine import CanineModel
import paddle

tokenizer = CanineTokenizer.from_pretrained("canine-s")
model = CanineModel.from_pretrained("canine-s")
text = ["canine is tokenization-free"]

inputs = tokenizer(text,
                   padding="longest",
                   return_attention_mask=True,
                   return_token_type_ids=True, )
pd_inputs = {k: paddle.to_tensor(v) for (k, v) in inputs.items()}
seq_outputs, pooling_outputs = model(**pd_inputs)
print(seq_outputs.shape)
print(pooling_outputs.shape)
[2022-06-19 21:14:47,553] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/canine-s/model_state.pdparams
[1, 29, 768]
[1, 768]
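4.2 Reproducing the TydiQA Task
The training arguments can be found in run_tydi.py. For the choice of training hyperparameters, optimizer, loss, and so on, see note.md in the repository root.

Note: the official paper does not state the fine-tuning configuration, so this reproduction consulted and tried both the fine-tuning configuration of the official canine repository (batch_size=512, epoch=10, lr=5e-5) and that of the tydiqa baseline repository (batch_size=16, epoch=3, lr=5e-5). batch_size=512 is approximated through gradient accumulation.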
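Gradient accumulation reaches a large effective batch by summing the gradients of several small batches before a single optimizer update. The sketch below checks the idea on a one-parameter least-squares problem in plain Python: averaging the gradients of 32 micro-batches of 16 samples gives the same gradient as one batch of 512. The toy model and data are assumptions for illustration only; in this repository the number of accumulated batches is set through --gradient_accumulation_steps.

```python
import random


def mean_gradient(w, batch):
    """Gradient w.r.t. w of the batch mean of 0.5 * (w * x - y) ** 2."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


rng = random.Random(0)
data = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(512)]
w = 0.3

# One large batch of 512 samples.
full_batch_grad = mean_gradient(w, data)

# 32 micro-batches of 16 samples; each gradient is scaled by 1/32 and accumulated.
micro_batch_size, accumulation_steps = 16, 32
accumulated_grad = 0.0
for step in range(accumulation_steps):
    micro_batch = data[step * micro_batch_size:(step + 1) * micro_batch_size]
    accumulated_grad += mean_gradient(w, micro_batch) / accumulation_steps

print(f"full batch: {full_batch_grad:.6f}  accumulated: {accumulated_grad:.6f}")
```

Only one micro-batch has to fit in GPU memory at a time, which is what makes an effective batch of 512 feasible on a single card.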

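Experiments showed clear overfitting with 10 epochs of training, and 3 epochs generally scored 2-3% higher than 10 epochs.

4.2.1 Training the Model
Training on a single V100 32G takes about 8 hours (for multi-GPU, only the launch command changes, to !python -m paddle.distributed.launch --selected_gpus='0' run_tydi.py, with the GPU ids to use listed in --selected_gpus).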

In [67]

# !python -m paddle.distributed.launch --selected_gpus='0' run_tydi.py \  # use this line for multi-GPU training

!python run_tydi.py \
    --train_input_dir=data/tydi/train.h5df \
    --do_train \
    --max_seq_length=2048 \
    --train_batch_size=16 \
    --learning_rate=5e-5 \
    --num_train_epochs=2 \
    --warmup_proportion=0.1 \
    --logging_steps=1000 \
    --checkout_steps=50000 \
    --seed=2022 \
    --fp16 \
    --gradient_accumulation_steps=1 \
    --output_dir=data/tydiqa_baseline_model/train
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - ********** Configuration Arguments **********
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - candidate_beam: None
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - checkout_steps: 50000
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - dev_split_ratio: 0.002
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - do_file_construct: False
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - do_predict: False
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - do_train: True
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - fp16: True
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - gradient_accumulation_steps: 1
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - learning_rate: 5e-05
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - logging_steps: 1000
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - max_answer_length: 2048
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - max_position: None
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - max_seq_length: 2048
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - max_to_predict: None
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - num_train_epochs: 2
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - output_dir: data/tydiqa_baseline_model/train
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - output_prediction_file: None
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - precomputed_predict_file: None
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - predict_batch_size: None
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - predict_file: None
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - scale_loss: 4096
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - seed: 2022
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - state_dict_path: None
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - train_batch_size: 16
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - train_input_dir: data/tydi/train.h5df
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - warmup_proportion: 0.1
06/19/2022 23:04:19 - INFO - tydi_canine.run_tydi_lib - **************************************************
[2022-06-19 23:04:19,761] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/canine-s/model_state.pdparams
W0619 23:04:19.762317 14081 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0619 23:04:19.766005 14081 gpu_context.cc:306] device: 0, cuDNN Version: 7.6.
[2022-06-19 23:04:24,336] [ INFO] - Weights of CanineForTydiQA not initialized from pretrained model: ['span_classifier.weight', 'span_classifier.bias', 'answer_type_classifier.weight', 'answer_type_classifier.bias']
06/19/2022 23:04:24 - INFO - tydi_canine.run_tydi_lib - >>> num_training_samples: 459358
06/19/2022 23:04:24 - INFO - tydi_canine.run_tydi_lib - >>> num_dev_samples: 920
06/19/2022 23:04:24 - INFO - tydi_canine.run_tydi_lib - >>> theory batch size (batch * gradient step * n_gpus): 16
06/19/2022 23:04:24 - INFO - tydi_canine.run_tydi_lib - >>> num_train_steps for each GPU: 57420
06/19/2022 23:04:24 - INFO - tydi_canine.run_tydi_lib - >>> start training…
06/19/2022 23:11:42 - INFO - tydi_canine.run_tydi_lib - Step 1000/57420 train loss 2.3659 dev loss 1.3957 acc 0.00% diff 711.52 time 7.29min
06/19/2022 23:18:57 - INFO - tydi_canine.run_tydi_lib - Step 2000/57420 train loss 1.3206 dev loss 1.0875 acc 64.84% diff 619.26 time 7.26min
06/19/2022 23:26:13 - INFO - tydi_canine.run_tydi_lib - Step 3000/57420 train loss 1.0976 dev loss 0.9824 acc 55.47% diff 589.34 time 7.26min
06/19/2022 23:33:28 - INFO - tydi_canine.run_tydi_lib - Step 4000/57420 train loss 1.0229 dev loss 0.9321 acc 57.42% diff 619.86 time 7.25min
Found inf or nan, current scale is: 65536.0, decrease to: 65536.0*0.5
06/19/2022 23:40:43 - INFO - tydi_canine.run_tydi_lib - Step 5000/57420 train loss 0.9527 dev loss 0.8803 acc 49.61% diff 576.10 time 7.26min
06/19/2022 23:47:59 - INFO - tydi_canine.run_tydi_lib - Step 6000/57420 train loss 0.9712 dev loss 0.8608 acc 62.89% diff 553.36 time 7.26min
06/19/2022 23:55:15 - INFO - tydi_canine.run_tydi_lib - Step 7000/57420 train loss 0.9411 dev loss 0.8116 acc 66.80% diff 515.69 time 7.27min
Found inf or nan, current scale is: 65536.0, decrease to: 65536.0*0.5
06/20/2022 00:02:33 - INFO - tydi_canine.run_tydi_lib - Step 8000/57420 train loss 0.8542 dev loss 0.8332 acc 69.14% diff 550.46 time 7.29min
06/20/2022 00:09:48 - INFO - tydi_canine.run_tydi_lib - Step 9000/57420 train loss 0.8530 dev loss 0.8211 acc 60.16% diff 543.80 time 7.25min
06/20/2022 00:17:03 - INFO - tydi_canine.run_tydi_lib - Step 10000/57420 train loss 0.8235 dev loss 0.7979 acc 62.50% diff 518.39 time 7.25min
06/20/2022 00:24:18 - INFO - tydi_canine.run_tydi_lib - Step 11000/57420 train loss 0.8169 dev loss 0.7724 acc 78.12% diff 524.05 time 7.25min
06/20/2022 00:31:33 - INFO - tydi_canine.run_tydi_lib - Step 12000/57420 train loss 0.8347 dev loss 0.7843 acc 69.92% diff 541.53 time 7.25min
06/20/2022 00:38:48 - INFO - tydi_canine.run_tydi_lib - Step 13000/57420 train loss 0.8145 dev loss 0.7679 acc 66.02% diff 501.39 time 7.25min
06/20/2022 00:46:02 - INFO - tydi_canine.run_tydi_lib - Step 14000/57420 train loss 0.8008 dev loss 0.7735 acc 65.62% diff 514.75 time 7.24min
06/20/2022 00:53:22 - INFO - tydi_canine.run_tydi_lib - Step 15000/57420 train loss 0.7900 dev loss 0.7639 acc 78.12% diff 414.54 time 7.34min
06/20/2022 01:00:42 - INFO - tydi_canine.run_tydi_lib - Step 16000/57420 train loss 0.8032 dev loss 0.7543 acc 73.05% diff 464.55 time 7.32min
06/20/2022 01:08:00 - INFO - tydi_canine.run_tydi_lib - Step 17000/57420 train loss 0.7608 dev loss 0.7073 acc 73.44% diff 510.86 time 7.30min
06/20/2022 01:15:22 - INFO - tydi_canine.run_tydi_lib - Step 18000/57420 train loss 0.7751 dev loss 0.7345 acc 78.91% diff 469.59 time 7.37min
06/20/2022 01:22:38 - INFO - tydi_canine.run_tydi_lib - Step 19000/57420 train loss 0.7113 dev loss 0.7479 acc 68.75% diff 497.30 time 7.26min
Found inf or nan, current scale is: 131072.0, decrease to: 131072.0*0.5
06/20/2022 01:29:57 - INFO - tydi_canine.run_tydi_lib - Step 20000/57420 train loss 0.7142 dev loss 0.7227 acc 74.22% diff 477.53 time 7.33min
06/20/2022 01:37:17 - INFO - tydi_canine.run_tydi_lib - Step 21000/57420 train loss 0.7234 dev loss 0.6891 acc 71.48% diff 476.39 time 7.33min
06/20/2022 01:44:33 - INFO - tydi_canine.run_tydi_lib - Step 22000/57420 train loss 0.7083 dev loss 0.6853 acc 70.31% diff 473.48 time 7.26min
06/20/2022 01:51:49 - INFO - tydi_canine.run_tydi_lib - Step 23000/57420 train loss 0.7435 dev loss 0.6792 acc 81.25% diff 438.71 time 7.27min
06/20/2022 01:59:08 - INFO - tydi_canine.run_tydi_lib - Step 24000/57420 train loss 0.6861 dev loss 0.6798 acc 74.22% diff 487.15 time 7.32min
06/20/2022 02:06:27 - INFO - tydi_canine.run_tydi_lib - Step 25000/57420 train loss 0.6648 dev loss 0.6937 acc 76.95% diff 411.52 time 7.31min
06/20/2022 02:13:43 - INFO - tydi_canine.run_tydi_lib - Step 26000/57420 train loss 0.7159 dev loss 0.6683 acc 74.61% diff 446.82 time 7.26min
06/20/2022 02:20:58 - INFO - tydi_canine.run_tydi_lib - Step 27000/57420 train loss 0.7019 dev loss 0.6713 acc 75.00% diff 431.98 time 7.26min
06/20/2022 02:28:17 - INFO - tydi_canine.run_tydi_lib - Step 28000/57420 train loss 0.6771 dev loss 0.6604 acc 71.09% diff 495.70 time 7.31min
06/20/2022 02:35:35 - INFO - tydi_canine.run_tydi_lib - Step 29000/57420 train loss 0.6475 dev loss 0.6605 acc 76.95% diff 479.70 time 7.30min
06/20/2022 02:42:50 - INFO - tydi_canine.run_tydi_lib - Step 30000/57420 train loss 0.6903 dev loss 0.6392 acc 79.69% diff 433.50 time 7.25min
06/20/2022 02:50:04 - INFO - tydi_canine.run_tydi_lib - Step 31000/57420 train loss 0.6527 dev loss 0.6326 acc 77.34% diff 402.65 time 7.24min
06/20/2022 02:57:20 - INFO - tydi_canine.run_tydi_lib - Step 32000/57420 train loss 0.6250 dev loss 0.6356 acc 80.86% diff 382.66 time 7.25min
06/20/2022 03:04:35 - INFO - tydi_canine.run_tydi_lib - Step 33000/57420 train loss 0.6186 dev loss 0.6035 acc 83.98% diff 393.27 time 7.26min
06/20/2022 03:11:50 - INFO - tydi_canine.run_tydi_lib - Step 34000/57420 train loss 0.6007 dev loss 0.6019 acc 78.91% diff 430.07 time 7.24min
06/20/2022 03:19:05 - INFO - tydi_canine.run_tydi_lib - Step 35000/57420 train loss 0.6066 dev loss 0.5945 acc 78.91% diff 403.21 time 7.25min
06/20/2022 03:26:20 - INFO - tydi_canine.run_tydi_lib - Step 36000/57420 train loss 0.5684 dev loss 0.6145 acc 75.39% diff 400.80 time 7.25min
06/20/2022 03:33:37 - INFO - tydi_canine.run_tydi_lib - Step 37000/57420 train loss 0.5617 dev loss 0.6195 acc 81.64% diff 339.78 time 7.29min
06/20/2022 03:40:52 - INFO - tydi_canine.run_tydi_lib - Step 38000/57420 train loss 0.5483 dev loss 0.6139 acc 78.52% diff 424.13 time 7.25min
Found inf or nan, current scale is: 131072.0, decrease to: 131072.0*0.5
06/20/2022 03:48:07 - INFO - tydi_canine.run_tydi_lib - Step 39000/57420 train loss 0.5343 dev loss 0.6105 acc 73.83% diff 396.68 time 7.25min
06/20/2022 03:55:24 - INFO - tydi_canine.run_tydi_lib - Step 40000/57420 train loss 0.5400 dev loss 0.6095 acc 78.91% diff 343.46 time 7.29min
06/20/2022 04:02:40 - INFO - tydi_canine.run_tydi_lib - Step 41000/57420 train loss 0.5492 dev loss 0.5925 acc 80.86% diff 355.37 time 7.25min
06/20/2022 04:09:55 - INFO - tydi_canine.run_tydi_lib - Step 42000/57420 train loss 0.5127 dev loss 0.5984 acc 76.95% diff 376.34 time 7.26min
06/20/2022 04:17:11 - INFO - tydi_canine.run_tydi_lib - Step 43000/57420 train loss 0.5194 dev loss 0.6068 acc 79.30% diff 327.65 time 7.26min
Found inf or nan, current scale is: 65536.0, decrease to: 65536.0*0.5
06/20/2022 04:24:28 - INFO - tydi_canine.run_tydi_lib - Step 44000/57420 train loss 0.5211 dev loss 0.6100 acc 77.34% diff 367.96 time 7.29min
06/20/2022 04:31:46 - INFO - tydi_canine.run_tydi_lib - Step 45000/57420 train loss 0.5185 dev loss 0.6149 acc 81.25% diff 287.38 time 7.30min
06/20/2022 04:39:02 - INFO - tydi_canine.run_tydi_lib - Step 46000/57420 train loss 0.5103 dev loss 0.5860 acc 77.73% diff 361.80 time 7.26min
06/20/2022 04:46:18 - INFO - tydi_canine.run_tydi_lib - Step 47000/57420 train loss 0.4860 dev loss 0.6056 acc 79.30% diff 374.09 time 7.27min
06/20/2022 04:53:35 - INFO - tydi_canine.run_tydi_lib - Step 48000/57420 train loss 0.4563 dev loss 0.5870 acc 80.08% diff 327.53 time 7.29min
06/20/2022 05:00:51 - INFO - tydi_canine.run_tydi_lib - Step 49000/57420 train loss 0.4578 dev loss 0.5935 acc 78.12% diff 352.49 time 7.26min
06/20/2022 05:08:07 - INFO - tydi_canine.run_tydi_lib - Step 50000/57420 train loss 0.4629 dev loss 0.5675 acc 79.30% diff 340.87 time 7.27min
06/20/2022 05:15:28 - INFO - tydi_canine.run_tydi_lib - Step 51000/57420 train loss 0.4788 dev loss 0.5684 acc 75.00% diff 372.25 time 7.35min
06/20/2022 05:22:45 - INFO - tydi_canine.run_tydi_lib - Step 52000/57420 train loss 0.4699 dev loss 0.5763 acc 76.17% diff 357.04 time 7.27min
06/20/2022 05:29:58 - INFO - tydi_canine.run_tydi_lib - Step 53000/57420 train loss 0.4400 dev loss 0.5686 acc 76.95% diff 346.14 time 7.22min
06/20/2022 05:37:11 - INFO - tydi_canine.run_tydi_lib - Step 54000/57420 train loss 0.4240 dev loss 0.5781 acc 77.34% diff 338.02 time 7.22min
Found inf or nan, current scale is: 131072.0, decrease to: 131072.0*0.5
06/20/2022 05:44:24 - INFO - tydi_canine.run_tydi_lib - Step 55000/57420 train loss 0.4664 dev loss 0.5793 acc 77.34% diff 322.41 time 7.22min
06/20/2022 05:51:42 - INFO - tydi_canine.run_tydi_lib - Step 56000/57420 train loss 0.4537 dev loss 0.5716 acc 76.17% diff 336.51 time 7.29min
06/20/2022 05:58:59 - INFO - tydi_canine.run_tydi_lib - Step 57000/57420 train loss 0.4184 dev loss 0.5762 acc 76.17% diff 337.59 time 7.29min
06/20/2022 06:02:16 - INFO - tydi_canine.run_tydi_lib - Step 57420/57420 train loss 0.1692 dev loss 0.5771 acc 76.17% diff 337.76 time 3.28min
06/20/2022 06:02:16 - INFO - tydi_canine.run_tydi_lib - training done, total steps trained: 57420
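The step count in the log can be checked by hand: 459358 samples at batch size 16 give 28710 steps per epoch when the last partial batch is rounded up, and 2 epochs give 57420. The sketch below redoes that arithmetic and adds a linear warmup followed by linear decay, which is one common reading of --warmup_proportion=0.1. The schedule shape is an assumption, since this section does not show which scheduler run_tydi.py uses.

```python
import math


def total_steps(num_samples, batch_size, epochs, accumulation_steps=1, n_gpus=1):
    """Optimizer steps per GPU, rounding the last partial batch of each epoch up."""
    effective_batch = batch_size * accumulation_steps * n_gpus
    return math.ceil(num_samples / effective_batch) * epochs


def learning_rate(step, num_steps, base_lr, warmup_proportion):
    """Assumed schedule: linear warmup to base_lr, then linear decay to zero."""
    warmup_steps = int(num_steps * warmup_proportion)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (num_steps - step) / (num_steps - warmup_steps)


steps = total_steps(num_samples=459358, batch_size=16, epochs=2)
print(steps)  # 57420, matching "Step .../57420" in the log
for step in (0, 2871, 5742, 57420):
    print(step, learning_rate(step, steps, base_lr=5e-5, warmup_proportion=0.1))
```

Under this assumed schedule the learning rate would peak at step 5742, roughly 42 minutes into the run at the logged pace of about 7.3 minutes per 1000 steps.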
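4.2.2 Evaluating on the Tydi Task
Evaluation follows the official tydi instructions. We test with the weights from the end of training and ignore intermediate checkpoints.

Step 1: run the following to generate the evaluation file pred.jsonl. Because the TydiQA evaluation procedure is somewhat unusual, this can be done on a single GPU or on multiple GPUs (for multi-GPU, just change --selected_gpus to 0,1,2,3):

[Note] pred.jsonl is a file whose format meets the TydiQA evaluation requirements; for the format, see: TydiQA evaluation file example.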

In [70]

# !python3 -m paddle.distributed.launch --selected_gpus='0' run_tydi.py \  # use this line for multi-GPU runs

!python3 run_tydi.py \
    --state_dict_path=data/tydiqa_baseline_model/train \
    --predict_file=data/tydi/tydiqa-v1.0-dev.jsonl.gz \
    --precomputed_predict_file=data/tydi/dev.h5df \
    --do_predict \
    --max_seq_length=2048 \
    --max_answer_length=100 \
    --candidate_beam=30 \
    --predict_batch_size=32 \
    --logging_steps=100 \
    --seed=2022 \
    --output_dir=data/tydiqa_baseline_model/predict \
    --output_prediction_file=data/tydiqa_baseline_model/predict/pred.jsonl
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - ********** Configuration Arguments **********
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - candidate_beam: 30
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - checkout_steps: 40000
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - dev_split_ratio: 0.002
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - do_file_construct: False
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - do_predict: True
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - do_train: False
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - fp16: False
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - gradient_accumulation_steps: 1
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - learning_rate: None
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - logging_steps: 100
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - max_answer_length: 100
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - max_position: None
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - max_seq_length: 2048
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - max_to_predict: None
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - num_train_epochs: None
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - output_dir: data/tydiqa_baseline_model/predict
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - output_prediction_file: data/tydiqa_baseline_model/predict/pred.jsonl
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - precomputed_predict_file: data/tydi/dev.h5df
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - predict_batch_size: 32
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - predict_file: data/tydi/tydiqa-v1.0-dev.jsonl.gz
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - scale_loss: 4096
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - seed: 2022
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - state_dict_path: data/tydiqa_baseline_model/train
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - train_batch_size: None
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - train_input_dir: None
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - warmup_proportion: None
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - **************************************************
06/20/2022 08:57:10 - INFO - tydi_canine.run_tydi_lib - >>> Number of prediction samples: 336499
[2022-06-20 08:57:10,776] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/canine-s/model_state.pdparams
W0620 08:57:10.778136 12853 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0620 08:57:10.781677 12853 gpu_context.cc:306] device: 0, cuDNN Version: 7.6.
[2022-06-20 08:57:15,480] [ INFO] - Weights of CanineForTydiQA not initialized from pretrained model: ['span_classifier.weight', 'span_classifier.bias', 'answer_type_classifier.weight', 'answer_type_classifier.bias']
06/20/2022 08:57:15 - INFO - tydi_canine.run_tydi_lib -

loading weight from data/tydiqa_baseline_model/train/tydi_seed_2022.pdparams
06/20/2022 08:57:18 - INFO - tydi_canine.run_tydi_lib - step 0/10516 time 0.02min
06/20/2022 08:58:27 - INFO - tydi_canine.run_tydi_lib - step 100/10516 time 1.16min
06/20/2022 08:59:38 - INFO - tydi_canine.run_tydi_lib - step 200/10516 time 1.17min
06/20/2022 09:00:48 - INFO - tydi_canine.run_tydi_lib - step 300/10516 time 1.18min
06/20/2022 09:01:59 - INFO - tydi_canine.run_tydi_lib - step 400/10516 time 1.18min
06/20/2022 09:03:10 - INFO - tydi_canine.run_tydi_lib - step 500/10516 time 1.18min
06/20/2022 09:04:21 - INFO - tydi_canine.run_tydi_lib - step 600/10516 time 1.18min
06/20/2022 09:05:32 - INFO - tydi_canine.run_tydi_lib - step 700/10516 time 1.18min
06/20/2022 09:06:43 - INFO - tydi_canine.run_tydi_lib - step 800/10516 time 1.18min
06/20/2022 09:07:54 - INFO - tydi_canine.run_tydi_lib - step 900/10516 time 1.18min
06/20/2022 09:09:05 - INFO - tydi_canine.run_tydi_lib - step 1000/10516 time 1.18min
06/20/2022 09:10:16 - INFO - tydi_canine.run_tydi_lib - step 1100/10516 time 1.18min
06/20/2022 09:11:27 - INFO - tydi_canine.run_tydi_lib - step 1200/10516 time 1.18min
06/20/2022 09:12:38 - INFO - tydi_canine.run_tydi_lib - step 1300/10516 time 1.18min
06/20/2022 09:13:49 - INFO - tydi_canine.run_tydi_lib - step 1400/10516 time 1.18min
06/20/2022 09:15:00 - INFO - tydi_canine.run_tydi_lib - step 1500/10516 time 1.18min
06/20/2022 09:16:11 - INFO - tydi_canine.run_tydi_lib - step 1600/10516 time 1.18min
06/20/2022 09:17:22 - INFO - tydi_canine.run_tydi_lib - step 1700/10516 time 1.18min
06/20/2022 09:18:33 - INFO - tydi_canine.run_tydi_lib - step 1800/10516 time 1.18min
06/20/2022 09:19:44 - INFO - tydi_canine.run_tydi_lib - step 1900/10516 time 1.18min
06/20/2022 09:20:55 - INFO - tydi_canine.run_tydi_lib - step 2000/10516 time 1.18min
06/20/2022 09:22:06 - INFO - tydi_canine.run_tydi_lib - step 2100/10516 time 1.18min
06/20/2022 09:23:17 - INFO - tydi_canine.run_tydi_lib - step 2200/10516 time 1.18min
06/20/2022 09:24:28 - INFO - tydi_canine.run_tydi_lib - step 2300/10516 time 1.18min
06/20/2022 09:25:39 - INFO - tydi_canine.run_tydi_lib - step 2400/10516 time 1.18min
06/20/2022 09:26:50 - INFO - tydi_canine.run_tydi_lib - step 2500/10516 time 1.18min
06/20/2022 09:28:01 - INFO - tydi_canine.run_tydi_lib - step 2600/10516 time 1.18min
06/20/2022 09:29:12 - INFO - tydi_canine.run_tydi_lib - step 2700/10516 time 1.18min
06/20/2022 09:30:23 - INFO - tydi_canine.run_tydi_lib - step 2800/10516 time 1.18min
06/20/2022 09:31:34 - INFO - tydi_canine.run_tydi_lib - step 2900/10516 time 1.18min
06/20/2022 09:32:45 - INFO - tydi_canine.run_tydi_lib - step 3000/10516 time 1.18min
06/20/2022 09:33:56 - INFO - tydi_canine.run_tydi_lib - step 3100/10516 time 1.18min
06/20/2022 09:35:07 - INFO - tydi_canine.run_tydi_lib - step 3200/10516 time 1.18min
06/20/2022 09:36:18 - INFO - tydi_canine.run_tydi_lib - step 3300/10516 time 1.18min
06/20/2022 09:37:29 - INFO - tydi_canine.run_tydi_lib - step 3400/10516 time 1.18min
06/20/2022 09:38:40 - INFO - tydi_canine.run_tydi_lib - step 3500/10516 time 1.18min
06/20/2022 09:39:51 - INFO - tydi_canine.run_tydi_lib - step 3600/10516 time 1.18min
06/20/2022 09:41:02 - INFO - tydi_canine.run_tydi_lib - step 3700/10516 time 1.18min
06/20/2022 09:42:13 - INFO - tydi_canine.run_tydi_lib - step 3800/10516 time 1.18min
06/20/2022 09:43:24 - INFO - tydi_canine.run_tydi_lib - step 3900/10516 time 1.18min
06/20/2022 09:44:35 - INFO - tydi_canine.run_tydi_lib - step 4000/10516 time 1.18min
06/20/2022 09:45:46 - INFO - tydi_canine.run_tydi_lib - step 4100/10516 time 1.18min
06/20/2022 09:46:57 - INFO - tydi_canine.run_tydi_lib - step 4200/10516 time 1.18min
06/20/2022 09:48:08 - INFO - tydi_canine.run_tydi_lib - step 4300/10516 time 1.18min
06/20/2022 09:49:19 - INFO - tydi_canine.run_tydi_lib - step 4400/10516 time 1.18min
06/20/2022 09:50:30 - INFO - tydi_canine.run_tydi_lib - step 4500/10516 time 1.18min
06/20/2022 09:51:41 - INFO - tydi_canine.run_tydi_lib - step 4600/10516 time 1.18min
06/20/2022 09:52:52 - INFO - tydi_canine.run_tydi_lib - step 4700/10516 time 1.19min
06/20/2022 09:54:03 - INFO - tydi_canine.run_tydi_lib - step 4800/10516 time 1.18min
06/20/2022 09:55:14 - INFO - tydi_canine.run_tydi_lib - step 4900/10516 time 1.18min
06/20/2022 09:56:25 - INFO - tydi_canine.run_tydi_lib - step 5000/10516 time 1.18min
06/20/2022 09:57:36 - INFO - tydi_canine.run_tydi_lib - step 5100/10516 time 1.18min
06/20/2022 09:58:47 - INFO - tydi_canine.run_tydi_lib - step 5200/10516 time 1.18min
06/20/2022 09:59:58 - INFO - tydi_canine.run_tydi_lib - step 5300/10516 time 1.18min
06/20/2022 10:01:09 - INFO - tydi_canine.run_tydi_lib - step 5400/10516 time 1.18min
06/20/2022 10:02:21 - INFO - tydi_canine.run_tydi_lib - step 5500/10516 time 1.18min
06/20/2022 10:03:32 - INFO - tydi_canine.run_tydi_lib - step 5600/10516 time 1.18min
06/20/2022 10:04:43 - INFO - tydi_canine.run_tydi_lib - step 5700/10516 time 1.18min
06/20/2022 10:05:54 - INFO - tydi_canine.run_tydi_lib - step 5800/10516 time 1.18min
06/20/2022 10:07:05 - INFO - tydi_canine.run_tydi_lib - step 5900/10516 time 1.18min
06/20/2022 10:08:16 - INFO - tydi_canine.run_tydi_lib - step 6000/10516 time 1.19min
06/20/2022 10:09:27 - INFO - tydi_canine.run_tydi_lib - step 6100/10516 time 1.19min
06/20/2022 10:10:38 - INFO - tydi_canine.run_tydi_lib - step 6200/10516 time 1.18min
06/20/2022 10:11:49 - INFO - tydi_canine.run_tydi_lib - step 6300/10516 time 1.18min
06/20/2022 10:13:00 - INFO - tydi_canine.run_tydi_lib - step 6400/10516 time 1.18min
06/20/2022 10:14:11 - INFO - tydi_canine.run_tydi_lib - step 6500/10516 time 1.19min
06/20/2022 10:15:22 - INFO - tydi_canine.run_tydi_lib - step 6600/10516 time 1.19min
06/20/2022 10:16:33 - INFO - tydi_canine.run_tydi_lib - step 6700/10516 time 1.18min
06/20/2022 10:17:45 - INFO - tydi_canine.run_tydi_lib - step 6800/10516 time 1.19min
06/20/2022 10:18:56 - INFO - tydi_canine.run_tydi_lib - step 6900/10516 time 1.18min
06/20/2022 10:20:07 - INFO - tydi_canine.run_tydi_lib - step 7000/10516 time 1.18min
06/20/2022 10:21:18 - INFO - tydi_canine.run_tydi_lib - step 7100/10516 time 1.18min
06/20/2022 10:22:29 - INFO - tydi_canine.run_tydi_lib - step 7200/10516 time 1.18min
06/20/2022 10:23:40 - INFO - tydi_canine.run_tydi_lib - step 7300/10516 time 1.18min
06/20/2022 10:24:51 - INFO - tydi_canine.run_tydi_lib - step 7400/10516 time 1.18min
06/20/2022 10:26:02 - INFO - tydi_canine.run_tydi_lib - step 7500/10516 time 1.18min
06/20/2022 10:27:13 - INFO - tydi_canine.run_tydi_lib - step 7600/10516 time 1.18min
06/20/2022 10:28:24 - INFO - tydi_canine.run_tydi_lib - step 7700/10516 time 1.19min
06/20/2022 10:29:35 - INFO - tydi_canine.run_tydi_lib - step 7800/10516 time 1.18min
06/20/2022 10:30:46 - INFO - tydi_canine.run_tydi_lib - step 7900/10516 time 1.18min
06/20/2022 10:31:57 - INFO - tydi_canine.run_tydi_lib - step 8000/10516 time 1.18min
06/20/2022 10:33:08 - INFO - tydi_canine.run_tydi_lib - step 8100/10516 time 1.18min
06/20/2022 10:34:19 - INFO - tydi_canine.run_tydi_lib - step 8200/10516 time 1.19min
06/20/2022 10:35:30 - INFO - tydi_canine.run_tydi_lib - step 8300/10516 time 1.18min
06/20/2022 10:36:41 - INFO - tydi_canine.run_tydi_lib - step 8400/10516 time 1.18min
06/20/2022 10:37:52 - INFO - tydi_canine.run_tydi_lib - step 8500/10516 time 1.18min
06/20/2022 10:39:03 - INFO - tydi_canine.run_tydi_lib - step 8600/10516 time 1.19min
06/20/2022 10:40:15 - INFO - tydi_canine.run_tydi_lib - step 8700/10516 time 1.18min
06/20/2022 10:41:26 - INFO - tydi_canine.run_tydi_lib - step 8800/10516 time 1.18min
06/20/2022 10:42:37 - INFO - tydi_canine.run_tydi_lib - step 8900/10516 time 1.18min
06/20/2022 10:43:48 - INFO - tydi_canine.run_tydi_lib - step 9000/10516 time 1.19min
06/20/2022 10:44:59 - INFO - tydi_canine.run_tydi_lib - step 9100/10516 time 1.18min
06/20/2022 10:46:10 - INFO - tydi_canine.run_tydi_lib - step 9200/10516 time 1.18min
06/20/2022 10:47:21 - INFO - tydi_canine.run_tydi_lib - step 9300/10516 time 1.18min
06/20/2022 10:48:32 - INFO - tydi_canine.run_tydi_lib - step 9400/10516 time 1.18min
06/20/2022 10:49:43 - INFO - tydi_canine.run_tydi_lib - step 9500/10516 time 1.19min
06/20/2022 10:50:54 - INFO - tydi_canine.run_tydi_lib - step 9600/10516 time 1.19min
06/20/2022 10:52:05 - INFO - tydi_canine.run_tydi_lib - step 9700/10516 time 1.18min
06/20/2022 10:53:16 - INFO - tydi_canine.run_tydi_lib - step 9800/10516 time 1.18min
06/20/2022 10:54:27 - INFO - tydi_canine.run_tydi_lib - step 9900/10516 time 1.18min
06/20/2022 10:55:38 - INFO - tydi_canine.run_tydi_lib - step 10000/10516 time 1.18min
06/20/2022 10:56:49 - INFO - tydi_canine.run_tydi_lib - step 10100/10516 time 1.18min
06/20/2022 10:58:00 - INFO - tydi_canine.run_tydi_lib - step 10200/10516 time 1.18min
06/20/2022 10:59:12 - INFO - tydi_canine.run_tydi_lib - step 10300/10516 time 1.19min
06/20/2022 11:00:23 - INFO - tydi_canine.run_tydi_lib - step 10400/10516 time 1.18min
06/20/2022 11:01:34 - INFO - tydi_canine.run_tydi_lib - step 10500/10516 time 1.18min
06/20/2022 11:01:53 - INFO - tydi_canine.run_tydi_lib - >>> start generating TyqiQA task evaluation file
06/20/2022 11:01:54 - INFO - tydi_canine.run_tydi_lib - >>> Loaded predicted logits from [‘data/tydiqa_baseline_model/predict/results_gpu_0.pickle’]
06/20/2022 11:01:54 - INFO - tydi_canine.run_tydi_lib - >>> start processing predicted logits
06/20/2022 11:01:56 - INFO - tydi_canine.run_tydi_lib - Reading: data/tydi/tydiqa-v1.0-dev.jsonl.gz
06/20/2022 11:02:04 - INFO - tydi_canine.run_tydi_lib - loading precomputed evaluation meta data…
06/20/2022 11:05:45 - INFO - tydi_canine.run_tydi_lib - Num candidate examples loaded (includes all shards): 18670
06/20/2022 11:05:45 - INFO - tydi_canine.run_tydi_lib - Num candidate features loaded: 336499
06/20/2022 11:05:45 - INFO - tydi_canine.run_tydi_lib - Num prediction result features: 336499
06/20/2022 11:05:46 - INFO - tydi_canine.postproc - Post-processing predictions started.
06/20/2022 11:05:48 - INFO - tydi_canine.postproc - Start Combining results and articles…
06/20/2022 11:05:48 - INFO - tydi_canine.postproc - >>> step 0/691668
06/20/2022 11:06:50 - INFO - tydi_canine.postproc - >>> step 50000/691668
06/20/2022 11:07:47 - INFO - tydi_canine.postproc - >>> step 100000/691668
06/20/2022 11:08:39 - INFO - tydi_canine.postproc - >>> step 150000/691668
06/20/2022 11:08:49 - INFO - tydi_canine.postproc - >>> step 200000/691668
06/20/2022 11:08:56 - INFO - tydi_canine.postproc - >>> step 250000/691668
06/20/2022 11:09:03 - INFO - tydi_canine.postproc - >>> step 300000/691668
06/20/2022 11:09:09 - INFO - tydi_canine.postproc - >>> step 350000/691668
06/20/2022 11:09:14 - INFO - tydi_canine.postproc - >>> step 400000/691668
06/20/2022 11:09:19 - INFO - tydi_canine.postproc - >>> step 450000/691668
06/20/2022 11:09:22 - INFO - tydi_canine.postproc - >>> step 500000/691668
06/20/2022 11:09:24 - INFO - tydi_canine.postproc - >>> step 550000/691668
06/20/2022 11:09:25 - INFO - tydi_canine.postproc - >>> step 600000/691668
06/20/2022 11:09:26 - INFO - tydi_canine.postproc - >>> step 650000/691668
06/20/2022 11:09:26 - INFO - tydi_canine.postproc - Num candidate examples found: 18670
06/20/2022 11:09:26 - INFO - tydi_canine.postproc - Num candidate features found: 336499
06/20/2022 11:09:26 - INFO - tydi_canine.postproc - Num results found: 336499
06/20/2022 11:09:26 - INFO - tydi_canine.postproc - len(merged): 691668
06/20/2022 11:09:26 - INFO - tydi_canine.postproc - >>> Collecting & formatting Article Answers…
06/20/2022 11:09:31 - WARNING - root - No passage predicted for eval_example -157495529391570507. Choosing first.
06/20/2022 11:09:44 - WARNING - root - No predictions for eval_example 15806027368664557
06/20/2022 11:09:54 - WARNING - root - No passage predicted for eval_example 692274191345985157. Choosing first.
06/20/2022 11:09:56 - INFO - tydi_canine.postproc - >>> Step 1000/18670
06/20/2022 11:09:58 - WARNING - root - No passage predicted for eval_example 8586073032272855090. Choosing first.
06/20/2022 11:10:01 - WARNING - root - No passage predicted for eval_example -3668665318389239427. Choosing first.
06/20/2022 11:10:04 - WARNING - root - No passage predicted for eval_example -6268274736183476846. Choosing first.
06/20/2022 11:10:04 - WARNING - root - No passage predicted for eval_example -215014285213980107. Choosing first.
06/20/2022 11:10:14 - WARNING - root - No passage predicted for eval_example 3541081022049335864. Choosing first.
06/20/2022 11:10:20 - INFO - tydi_canine.postproc - >>> Step 2000/18670
06/20/2022 11:10:21 - WARNING - root - No passage predicted for eval_example 2029377693637076368. Choosing first.
06/20/2022 11:10:31 - WARNING - root - No passage predicted for eval_example -4153920489965267783. Choosing first.
06/20/2022 11:10:34 - WARNING - root - No passage predicted for eval_example 9118697991497450969. Choosing first.
06/20/2022 11:10:42 - WARNING - root - No passage predicted for eval_example 1121313378616415793. Choosing first.
06/20/2022 11:10:42 - INFO - tydi_canine.postproc - >>> Step 3000/18670
06/20/2022 11:10:49 - WARNING - root - No passage predicted for eval_example -6546365819310023734. Choosing first.
06/20/2022 11:10:51 - WARNING - root - No passage predicted for eval_example 3869441511486046909. Choosing first.
06/20/2022 11:10:53 - WARNING - root - No passage predicted for eval_example -7460178754977380099. Choosing first.
06/20/2022 11:10:55 - WARNING - root - No passage predicted for eval_example -5583398636434503809. Choosing first.
06/20/2022 11:10:58 - WARNING - root - No passage predicted for eval_example -1579250984781414326. Choosing first.
06/20/2022 11:11:01 - WARNING - root - No passage predicted for eval_example 5964889685116234539. Choosing first.
06/20/2022 11:11:02 - WARNING - root - No predictions for eval_example 3843729750068551852
06/20/2022 11:11:04 - INFO - tydi_canine.postproc - >>> Step 4000/18670
06/20/2022 11:11:04 - WARNING - root - No passage predicted for eval_example -4220329321148598058. Choosing first.
06/20/2022 11:11:08 - WARNING - root - No passage predicted for eval_example -1243519876584071162. Choosing first.
06/20/2022 11:11:17 - WARNING - root - No passage predicted for eval_example 5478730504437513486. Choosing first.
06/20/2022 11:11:25 - WARNING - root - No passage predicted for eval_example 3092608921581193966. Choosing first.
06/20/2022 11:11:27 - INFO - tydi_canine.postproc - >>> Step 5000/18670
06/20/2022 11:11:30 - WARNING - root - No predictions for eval_example -6615280656492729722
06/20/2022 11:11:38 - WARNING - root - No predictions for eval_example -4524843330972236201
06/20/2022 11:11:39 - WARNING - root - No passage predicted for eval_example -1411403610895126481. Choosing first.
06/20/2022 11:11:49 - INFO - tydi_canine.postproc - >>> Step 6000/18670
06/20/2022 11:11:53 - WARNING - root - No passage predicted for eval_example 4163179397873590275. Choosing first.
06/20/2022 11:12:03 - WARNING - root - No predictions for eval_example 6373078339589919319
06/20/2022 11:12:07 - WARNING - root - No passage predicted for eval_example 5656430593574082335. Choosing first.
06/20/2022 11:12:07 - WARNING - root - No passage predicted for eval_example 6610201684546286526. Choosing first.
06/20/2022 11:12:11 - INFO - tydi_canine.postproc - >>> Step 7000/18670
06/20/2022 11:12:22 - WARNING - root - No passage predicted for eval_example -4404602207821076479. Choosing first.
06/20/2022 11:12:25 - WARNING - root - No passage predicted for eval_example -4709649541188559837. Choosing first.
06/20/2022 11:12:27 - WARNING - root - No passage predicted for eval_example 8687855370743918974. Choosing first.
06/20/2022 11:12:30 - WARNING - root - No predictions for eval_example 5902650000795019979
06/20/2022 11:12:31 - INFO - tydi_canine.postproc - >>> Step 8000/18670
06/20/2022 11:12:32 - WARNING - root - No passage predicted for eval_example 1112061616471594681. Choosing first.
06/20/2022 11:12:42 - WARNING - root - No passage predicted for eval_example -4709175724619041853. Choosing first.
06/20/2022 11:12:44 - WARNING - root - No passage predicted for eval_example 3678655315858773176. Choosing first.
06/20/2022 11:12:48 - WARNING - root - No passage predicted for eval_example 3172690327163837648. Choosing first.
06/20/2022 11:12:54 - WARNING - root - No predictions for eval_example -7013079242391337056
06/20/2022 11:12:57 - WARNING - root - No passage predicted for eval_example -4194866533342765131. Choosing first.
06/20/2022 11:12:57 - INFO - tydi_canine.postproc - >>> Step 9000/18670
06/20/2022 11:13:11 - WARNING - root - No passage predicted for eval_example 8270259553561019967. Choosing first.
06/20/2022 11:13:12 - WARNING - root - No passage predicted for eval_example -7461931697155385758. Choosing first.
06/20/2022 11:13:16 - INFO - tydi_canine.postproc - >>> Step 10000/18670
06/20/2022 11:13:29 - WARNING - root - No passage predicted for eval_example 8727562133571856683. Choosing first.
06/20/2022 11:13:35 - WARNING - root - No passage predicted for eval_example 6112979102531531059. Choosing first.
06/20/2022 11:13:38 - INFO - tydi_canine.postproc - >>> Step 11000/18670
06/20/2022 11:13:45 - WARNING - root - No passage predicted for eval_example 7794975204473342532. Choosing first.
06/20/2022 11:13:46 - WARNING - root - No passage predicted for eval_example 5918522504162160213. Choosing first.
06/20/2022 11:13:48 - WARNING - root - No passage predicted for eval_example -1978910050848973876. Choosing first.
06/20/2022 11:13:49 - WARNING - root - No passage predicted for eval_example -6582729939781779767. Choosing first.
06/20/2022 11:14:00 - INFO - tydi_canine.postproc - >>> Step 12000/18670
06/20/2022 11:14:01 - WARNING - root - No passage predicted for eval_example -6579362358884965533. Choosing first.
06/20/2022 11:14:04 - WARNING - root - No predictions for eval_example -7109862816436123410
06/20/2022 11:14:23 - WARNING - root - No passage predicted for eval_example -5269545664347125369. Choosing first.
06/20/2022 11:14:27 - INFO - tydi_canine.postproc - >>> Step 13000/18670
06/20/2022 11:14:29 - WARNING - root - No passage predicted for eval_example -3242799976610035832. Choosing first.
06/20/2022 11:14:33 - WARNING - root - No predictions for eval_example -7699728260210150066
06/20/2022 11:14:47 - INFO - tydi_canine.postproc - >>> Step 14000/18670
06/20/2022 11:14:48 - WARNING - root - No passage predicted for eval_example 2341642335923759980. Choosing first.
06/20/2022 11:14:49 - WARNING - root - No passage predicted for eval_example -5545495779511166009. Choosing first.
06/20/2022 11:14:51 - WARNING - root - No passage predicted for eval_example 5944538058051359599. Choosing first.
06/20/2022 11:14:59 - WARNING - root - No passage predicted for eval_example -9079375040722190104. Choosing first.
06/20/2022 11:15:01 - WARNING - root - No passage predicted for eval_example -1181159886776109180. Choosing first.
06/20/2022 11:15:03 - WARNING - root - No passage predicted for eval_example -2999534583346695596. Choosing first.
06/20/2022 11:15:05 - WARNING - root - No passage predicted for eval_example 2269660853108546534. Choosing first.
06/20/2022 11:15:05 - WARNING - root - No passage predicted for eval_example -486358702203362747. Choosing first.
06/20/2022 11:15:08 - INFO - tydi_canine.postproc - >>> Step 15000/18670
06/20/2022 11:15:10 - WARNING - root - No passage predicted for eval_example -7113919627051621662. Choosing first.
06/20/2022 11:15:17 - WARNING - root - No passage predicted for eval_example 2513886235746790035. Choosing first.
06/20/2022 11:15:22 - WARNING - root - No passage predicted for eval_example -1547749879440032670. Choosing first.
06/20/2022 11:15:32 - INFO - tydi_canine.postproc - >>> Step 16000/18670
06/20/2022 11:15:35 - WARNING - root - No passage predicted for eval_example 8771092624763674503. Choosing first.
06/20/2022 11:15:42 - WARNING - root - No predictions for eval_example 5585953506784041766
06/20/2022 11:15:45 - WARNING - root - No passage predicted for eval_example 2595983157228829776. Choosing first.
06/20/2022 11:15:45 - WARNING - root - No passage predicted for eval_example 5761353756339863087. Choosing first.
06/20/2022 11:15:51 - WARNING - root - No passage predicted for eval_example 8111145340577365527. Choosing first.
06/20/2022 11:15:51 - WARNING - root - No passage predicted for eval_example -6120628694887196063. Choosing first.
06/20/2022 11:15:53 - INFO - tydi_canine.postproc - >>> Step 17000/18670
06/20/2022 11:16:12 - WARNING - root - No passage predicted for eval_example 9035334114280147788. Choosing first.
06/20/2022 11:16:15 - WARNING - root - No passage predicted for eval_example 206926814747769566. Choosing first.
06/20/2022 11:16:15 - WARNING - root - No passage predicted for eval_example 9052542596602415287. Choosing first.
06/20/2022 11:16:15 - INFO - tydi_canine.postproc - >>> Step 18000/18670
06/20/2022 11:16:26 - INFO - root - Num post-processed results: 18660
06/20/2022 11:16:26 - INFO - tydi_canine.run_tydi_lib - Prediction finished for all shards.
06/20/2022 11:16:26 - INFO - tydi_canine.run_tydi_lib - Total output predictions: 18660
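--state_dict_path: path to the fine-tuned weights file. If a directory is given, the tydi_seed_{seed}.pdparams weights inside that directory are loaded.

--predict_file: path to the tydiqa-v1.0-dev.jsonl.gz file downloaded from the official TyDi QA release.

--output_dir: directory where the run logs are written.

--output_prediction_file: path of the output JSON evaluation file.

Step 2: run the official TyDi scoring program, pointing predictions_path at the pred.jsonl produced in the previous step.

The tydi_eval.py and eval_utils.py files needed for TyDi evaluation come from the official TyDi repository.

Running the code below shows that after only 2 epochs of training (the official repository is configured for 10 epochs), CANINE scores more than 3% above the TyDi mBERT baseline on both TydiQA metrics.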

In [79]
!python3 official_tydi/tydi_eval.py \
  --gold_path=data/tydi/tydiqa-v1.0-dev.jsonl.gz \
  --predictions_path=data/tydiqa_baseline_model/predict/pred.jsonl \
  --verbose=False
I0620 16:35:01.788677 140525045303040 eval_utils.py:291] Parsing data/tydi/tydiqa-v1.0-dev.jsonl.gz (gzip)…
I0620 16:35:08.704256 140525045303040 tydi_eval.py:479] 7556 examples have minimal answers
I0620 16:35:08.704433 140525045303040 tydi_eval.py:480] ****************************************
I0620 16:35:08.704489 140525045303040 eval_utils.py:211] Reading predictions from file: data/tydiqa_baseline_model/predict/pred.jsonl
Passage & english & \fpr{60.9}{67.3}{55.6}
Minimal Answer & english & \fpr{47.7}{57.2}{40.9}
{“passage-best-threshold-f1”: 0.6091743119266055, “passage-best-threshold-precision”: 0.6734279918864098, “passage-best-threshold-recall”: 0.5561139028475712, “passage-best-threshold”: 1.7843270301818848, “passage-recall-at-precision>=0.5”: 0.6984924623115578, “passage-precision-at-precision>=0.5”: 0.5012019230769231, “passage-recall-at-precision>=0.75”: 0.4438860971524288, “passage-precision-at-precision>=0.75”: 0.7571428571428571, “passage-recall-at-precision>=0.9”: 0.2780569514237856, “passage-precision-at-precision>=0.9”: 0.907103825136612, “minimal-best-threshold-f1”: 0.47718759121647764, “minimal-best-threshold-precision”: 0.572279321350196, “minimal-best-threshold-recall”: 0.4091945406545443, “minimal-best-threshold”: 3.0788726806640625, “minimal-recall-at-precision>=0.5”: 0.4351140871817951, “minimal-precision-at-precision>=0.5”: 0.5038621129565187, “minimal-recall-at-precision>=0.75”: 0.2648600289512845, “minimal-precision-at-precision>=0.75”: 0.7517350821705575, “minimal-recall-at-precision>=0.9”: 0.08933464039620799, “minimal-precision-at-precision>=0.9”: 0.9074518734983232}
Passage & arabic & \fpr{83.9}{87.3}{80.6}
Minimal Answer & arabic & \fpr{71.7}{76.2}{67.6}
{“passage-best-threshold-f1”: 0.8385185185185186, “passage-best-threshold-precision”: 0.8734567901234568, “passage-best-threshold-recall”: 0.8062678062678063, “passage-best-threshold”: 2.782742500305176, “passage-recall-at-precision>=0.5”: 0.8727445394112061, “passage-precision-at-precision>=0.5”: 0.71796875, “passage-recall-at-precision>=0.75”: 0.8689458689458689, “passage-precision-at-precision>=0.75”: 0.7506152584085316, “passage-recall-at-precision>=0.9”: 0.761633428300095, “passage-precision-at-precision>=0.9”: 0.9001122334455668, “minimal-best-threshold-f1”: 0.7166665868374188, “minimal-best-threshold-precision”: 0.7622372339352328, “minimal-best-threshold-recall”: 0.6762374688721449, “minimal-best-threshold”: 3.860295295715332, “minimal-recall-at-precision>=0.5”: 0.7094175252233874, “minimal-precision-at-precision>=0.5”: 0.6020179817157552, “minimal-recall-at-precision>=0.75”: 0.6807293875395697, “minimal-precision-at-precision>=0.75”: 0.7501768064596533, “minimal-recall-at-precision>=0.9”: 0.49143306616165144, “minimal-precision-at-precision>=0.9”: 0.9003817837626016}
Passage & bengali & \fpr{61.7}{62.5}{61.0}
Minimal Answer & bengali & \fpr{51.4}{62.5}{43.6}
{“passage-best-threshold-f1”: 0.6172839506172839, “passage-best-threshold-precision”: 0.625, “passage-best-threshold-recall”: 0.6097560975609756, “passage-best-threshold”: 2.35416316986084, “passage-recall-at-precision>=0.5”: 0.6504065040650406, “passage-precision-at-precision>=0.5”: 0.5405405405405406, “passage-recall-at-precision>=0.75”: 0.4878048780487805, “passage-precision-at-precision>=0.75”: 0.75, “passage-recall-at-precision>=0.9”: 0.24390243902439024, “passage-precision-at-precision>=0.9”: 0.9090909090909091, “minimal-best-threshold-f1”: 0.5136347551342907, “minimal-best-threshold-precision”: 0.6254258488988128, “minimal-best-threshold-recall”: 0.4357475176754024, “minimal-best-threshold”: 4.6143341064453125, “minimal-recall-at-precision>=0.5”: 0.4521409602983532, “minimal-precision-at-precision>=0.5”: 0.5628693587387662, “minimal-recall-at-precision>=0.75”: 0.30776934850578797, “minimal-precision-at-precision>=0.75”: 0.7509572103541227, “minimal-recall-at-precision>=0.9”: 0.1520575443292071, “minimal-precision-at-precision>=0.9”: 0.9275510204081632}
Passage & finnish & \fpr{63.6}{64.2}{63.0}
Minimal Answer & finnish & \fpr{55.5}{67.4}{47.2}
{“passage-best-threshold-f1”: 0.6358557643473844, “passage-best-threshold-precision”: 0.642051282051282, “passage-best-threshold-recall”: 0.6297786720321932, “passage-best-threshold”: 2.322622299194336, “passage-recall-at-precision>=0.5”: 0.7293762575452716, “passage-precision-at-precision>=0.5”: 0.5, “passage-recall-at-precision>=0.75”: 0.5160965794768612, “passage-precision-at-precision>=0.75”: 0.7588757396449705, “passage-recall-at-precision>=0.9”: 0.3722334004024145, “passage-precision-at-precision>=0.9”: 0.9024390243902439, “minimal-best-threshold-f1”: 0.5550572674976648, “minimal-best-threshold-precision”: 0.6744244217982379, “minimal-best-threshold-recall”: 0.47159000922733935, “minimal-best-threshold”: 5.607261657714844, “minimal-recall-at-precision>=0.5”: 0.5672765128184869, “minimal-precision-at-precision>=0.5”: 0.5029851746990583, “minimal-recall-at-precision>=0.75”: 0.4132970478490073, “minimal-precision-at-precision>=0.75”: 0.7500576053556058, “minimal-recall-at-precision>=0.9”: 0.2586424859932716, “minimal-precision-at-precision>=0.9”: 0.901858256403505}
W0620 16:35:09.052904 140525045303040 tydi_eval.py:233] Predictions missing for 1 examples.
I0620 16:35:09.053056 140525045303040 tydi_eval.py:234] Missing ids: [15806027368664557]
Passage & indonesian & \fpr{65.3}{70.3}{60.9}
Minimal Answer & indonesian & \fpr{57.0}{62.9}{52.1}
{“passage-best-threshold-f1”: 0.6525974025974027, “passage-best-threshold-precision”: 0.7027972027972028, “passage-best-threshold-recall”: 0.6090909090909091, “passage-best-threshold”: 4.978901386260986, “passage-recall-at-precision>=0.5”: 0.746969696969697, “passage-precision-at-precision>=0.5”: 0.5015259409969481, “passage-recall-at-precision>=0.75”: 0.5454545454545454, “passage-precision-at-precision>=0.75”: 0.7515657620041754, “passage-recall-at-precision>=0.9”: 0.33181818181818185, “passage-precision-at-precision>=0.9”: 0.9012345679012346, “minimal-best-threshold-f1”: 0.5698175817649904, “minimal-best-threshold-precision”: 0.629327181339338, “minimal-best-threshold-recall”: 0.5205902218724493, “minimal-best-threshold”: 5.424781799316406, “minimal-recall-at-precision>=0.5”: 0.5881461611726815, “minimal-precision-at-precision>=0.5”: 0.5028073064142924, “minimal-recall-at-precision>=0.75”: 0.3683726523303538, “minimal-precision-at-precision>=0.75”: 0.7505162449347397, “minimal-recall-at-precision>=0.9”: 0.17684339114577521, “minimal-precision-at-precision>=0.9”: 0.9035592016354452}
Passage & japanese & \fpr{52.0}{58.6}{46.7}
Minimal Answer & japanese & \fpr{42.4}{53.1}{35.2}
{“passage-best-threshold-f1”: 0.5198019801980198, “passage-best-threshold-precision”: 0.5855018587360595, “passage-best-threshold-recall”: 0.46735905044510384, “passage-best-threshold”: 3.351984977722168, “passage-recall-at-precision>=0.5”: 0.5192878338278932, “passage-precision-at-precision>=0.5”: 0.5043227665706052, “passage-recall-at-precision>=0.75”: 0.3486646884272997, “passage-precision-at-precision>=0.75”: 0.7507987220447284, “passage-recall-at-precision>=0.9”: 0.21364985163204747, “passage-precision-at-precision>=0.9”: 0.9, “minimal-best-threshold-f1”: 0.4235726977165413, “minimal-best-threshold-precision”: 0.5314451838172493, “minimal-best-threshold-recall”: 0.35210300104300724, “minimal-best-threshold”: 4.627134323120117, “minimal-recall-at-precision>=0.5”: 0.35925322955546485, “minimal-precision-at-precision>=0.5”: 0.5123125525228042, “minimal-recall-at-precision>=0.75”: 0.26545638495010104, “minimal-precision-at-precision>=0.75”: 0.7521264240252864, “minimal-recall-at-precision>=0.9”: 0.1243942037821494, “minimal-precision-at-precision>=0.9”: 0.9029062431827922}
W0620 16:35:09.126857 140525045303040 tydi_eval.py:233] Predictions missing for 1 examples.
I0620 16:35:09.127008 140525045303040 tydi_eval.py:234] Missing ids: [-6615280656492729722]
Passage & swahili & \fpr{69.6}{72.6}{66.7}
Minimal Answer & swahili & \fpr{62.1}{65.2}{59.3}
{“passage-best-threshold-f1”: 0.6955719557195571, “passage-best-threshold-precision”: 0.7263969171483622, “passage-best-threshold-recall”: 0.6672566371681415, “passage-best-threshold”: 3.9792022705078125, “passage-recall-at-precision>=0.5”: 0.8017699115044248, “passage-precision-at-precision>=0.5”: 0.5050167224080268, “passage-recall-at-precision>=0.75”: 0.6247787610619469, “passage-precision-at-precision>=0.75”: 0.7510638297872341, “passage-recall-at-precision>=0.9”: 0.44778761061946903, “passage-precision-at-precision>=0.9”: 0.9035714285714286, “minimal-best-threshold-f1”: 0.6208197471155671, “minimal-best-threshold-precision”: 0.6518607344713455, “minimal-best-threshold-recall”: 0.5926006677012232, “minimal-best-threshold”: 4.171348571777344, “minimal-recall-at-precision>=0.5”: 0.6443927365667972, “minimal-precision-at-precision>=0.5”: 0.5034879181253109, “minimal-recall-at-precision>=0.75”: 0.5040697230708423, “minimal-precision-at-precision>=0.75”: 0.7500878372486539, “minimal-recall-at-precision>=0.9”: 0.3758919218887072, “minimal-precision-at-precision>=0.9”: 0.9050444986247413}
Passage & korean & \fpr{62.2}{68.9}{56.7}
Minimal Answer & korean & \fpr{39.4}{47.9}{33.5}
{“passage-best-threshold-f1”: 0.6217765042979942, “passage-best-threshold-precision”: 0.6888888888888889, “passage-best-threshold-recall”: 0.566579634464752, “passage-best-threshold”: 3.5800743103027344, “passage-recall-at-precision>=0.5”: 0.6762402088772846, “passage-precision-at-precision>=0.5”: 0.5078431372549019, “passage-recall-at-precision>=0.75”: 0.5143603133159269, “passage-precision-at-precision>=0.75”: 0.7576923076923077, “passage-recall-at-precision>=0.9”: 0.2741514360313316, “passage-precision-at-precision>=0.9”: 0.9051724137931034, “minimal-best-threshold-f1”: 0.39417932534116856, “minimal-best-threshold-precision”: 0.4790794877223433, “minimal-best-threshold-recall”: 0.3348405021715303, “minimal-best-threshold”: 4.7461347579956055, “minimal-recall-at-precision>=0.5”: 0.3187114699134658, “minimal-precision-at-precision>=0.5”: 0.5002559780920223, “minimal-recall-at-precision>=0.75”: 0, “minimal-precision-at-precision>=0.75”: 0, “minimal-recall-at-precision>=0.9”: 0, “minimal-precision-at-precision>=0.9”: 0}
Passage & russian & \fpr{64.7}{66.2}{63.3}
Minimal Answer & russian & \fpr{49.3}{58.1}{42.8}
{“passage-best-threshold-f1”: 0.647175421209118, “passage-best-threshold-precision”: 0.6622718052738337, “passage-best-threshold-recall”: 0.6327519379844961, “passage-best-threshold”: 1.5278520584106445, “passage-recall-at-precision>=0.5”: 0.7209302325581395, “passage-precision-at-precision>=0.5”: 0.5003362474781439, “passage-recall-at-precision>=0.75”: 0.5174418604651163, “passage-precision-at-precision>=0.75”: 0.75, “passage-recall-at-precision>=0.9”: 0.34205426356589147, “passage-precision-at-precision>=0.9”: 0.9028132992327366, “minimal-best-threshold-f1”: 0.49289272582109733, “minimal-best-threshold-precision”: 0.5813522812499168, “minimal-best-threshold-recall”: 0.4277982186585645, “minimal-best-threshold”: 4.268627643585205, “minimal-recall-at-precision>=0.5”: 0.45303643977365304, “minimal-precision-at-precision>=0.5”: 0.5025031213056952, “minimal-recall-at-precision>=0.75”: 0.31390674958684145, “minimal-precision-at-precision>=0.75”: 0.7524960866264002, “minimal-recall-at-precision>=0.9”: 0.11343088850886904, “minimal-precision-at-precision>=0.9”: 0.9021712527914699}
W0620 16:35:09.236227 140525045303040 tydi_eval.py:233] Predictions missing for 8 examples.
I0620 16:35:09.236372 140525045303040 tydi_eval.py:234] Missing ids: [-7699728260210150066, -7109862816436123410, -7013079242391337056, -4524843330972236201, 3843729750068551852, 5585953506784041766, 5902650000795019979, 6373078339589919319]
Passage & telugu & \fpr{84.1}{82.6}{85.6}
Minimal Answer & telugu & \fpr{77.8}{82.4}{73.7}
{“passage-best-threshold-f1”: 0.8406004288777699, “passage-best-threshold-precision”: 0.8258426966292135, “passage-best-threshold-recall”: 0.8558951965065502, “passage-best-threshold”: 3.646038055419922, “passage-recall-at-precision>=0.5”: 0.9155749636098981, “passage-precision-at-precision>=0.5”: 0.5138888888888888, “passage-recall-at-precision>=0.75”: 0.8806404657933042, “passage-precision-at-precision>=0.75”: 0.7553058676654182, “passage-recall-at-precision>=0.9”: 0.7656477438136827, “passage-precision-at-precision>=0.9”: 0.9006849315068494, “minimal-best-threshold-f1”: 0.7781368588172927, “minimal-best-threshold-precision”: 0.8243306021082858, “minimal-best-threshold-recall”: 0.7368455905242973, “minimal-best-threshold”: 6.738748550415039, “minimal-recall-at-precision>=0.5”: 0.8284551801756967, “minimal-precision-at-precision>=0.5”: 0.502025829291251, “minimal-recall-at-precision>=0.75”: 0.7773096948922542, “minimal-precision-at-precision>=0.75”: 0.755843293434474, “minimal-recall-at-precision>=0.9”: 0.6329431445948274, “minimal-precision-at-precision>=0.9”: 0.9009339653913606}
Passage & thai & \fpr{64.0}{65.0}{63.1}
Minimal Answer & thai & \fpr{51.5}{62.3}{43.9}
{“passage-best-threshold-f1”: 0.6404773744405767, “passage-best-threshold-precision”: 0.649848637739657, “passage-best-threshold-recall”: 0.6313725490196078, “passage-best-threshold”: 3.4539432525634766, “passage-recall-at-precision>=0.5”: 0.7568627450980392, “passage-precision-at-precision>=0.5”: 0.5012987012987012, “passage-recall-at-precision>=0.75”: 0.5058823529411764, “passage-precision-at-precision>=0.75”: 0.7532846715328467, “passage-recall-at-precision>=0.9”: 0.2088235294117647, “passage-precision-at-precision>=0.9”: 0.902542372881356, “minimal-best-threshold-f1”: 0.5148919305454571, “minimal-best-threshold-precision”: 0.622668865711407, “minimal-best-threshold-recall”: 0.43891986780074155, “minimal-best-threshold”: 6.228517532348633, “minimal-recall-at-precision>=0.5”: 0.4919076994984223, “minimal-precision-at-precision>=0.5”: 0.5077927705263584, “minimal-recall-at-precision>=0.75”: 0.304829936533608, “minimal-precision-at-precision>=0.75”: 0.753432755504459, “minimal-recall-at-precision>=0.9”: 0.024739996751653098, “minimal-precision-at-precision>=0.9”: 0.9125252648013584}
Total # examples in gold: 18670, # ex. in pred: 18660 (including english)
*** Macro Over 10 Languages, excluding English **
Passage F1:0.671 P:0.698 R:0.647611
\fpr{67.1}{69.8}{64.8}
Minimal F1:0.558 P:0.638 R:0.498727
\fpr{55.8}{63.8}{49.9}
*** / Aggregate Scores ****
{"avg_passage_f1": 0.6709659300823626, "avg_passage_recall": 0.6476108490540536, "avg_passage_precision": 0.6982056079387957, "avg_minimal_f1": 0.5579669476591489, "avg_minimal_recall": 0.49872730655467007, "avg_minimal_precision": 0.6382151841052168}
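--gold_path: path to the tydiqa-v1.0-dev.jsonl.gz file downloaded from the official source.

--predictions_path: path to the JSON evaluation file produced in step 1.

Step 3: Clean up intermediate files

The data/tydiqa_baseline_model/predict folder will contain results_gpu_*.pickle files that store the logits. They can be deleted once testing is finished.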
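The cleanup step can be scripted. The following is a minimal sketch, assuming the pickle files sit directly inside the predict folder named above; the helper name `remove_logit_pickles` is hypothetical and not part of the repository.

```python
import glob
import os


def remove_logit_pickles(predict_dir):
    """Delete results_gpu_*.pickle files in predict_dir and return the removed paths."""
    pattern = os.path.join(predict_dir, "results_gpu_*.pickle")
    removed = []
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(path)
    return removed


if __name__ == "__main__":
    # Folder taken from the text above; adjust it if your output directory differs.
    for path in remove_logit_pickles("data/tydiqa_baseline_model/predict"):
        print(f"removed {path}")
```

Only files matching the `results_gpu_*.pickle` pattern are touched, so the JSON evaluation file and any logs in the same folder are left in place.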

4.2.3 Reproduction results
The results below are macro F1 scores averaged over several rounds of fine-tuning, prediction, and evaluation:

| TydiQA task | Canine paper | This repository |
| --- | --- | --- |
| Passage Selection Task (SELECTP) | 66.0% | 65.92% |
| Minimal Answer Span Task (MINSPAN) | 52.8% | 55.04% |
Logs and evaluation files for each fine-tuning run are available in the logs folder. Summary of the training results:

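| Device | Batch size | Acc grad steps | Theoretical batch size | Seed | Epochs | TydiQA SelectP F1 | TydiQA MinSpan F1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | 16 | 1 | 16 | 2021 | 3 | 66.01% | 55.77% |
| V100 | 16 | 1 | 16 | 666 | 3 | 67.02% | 56.17% |
| V100 | 16 | 32 | 512 | 5121 | 10 | 64.35% | 53.58% |
| V100 | 16 | 32 | 512 | 555 | 4 | 66.29% | 54.12% |
| 3090*4 | 14 | 9 | 504 | 5123 | 4 | 65.93% | 55.60% |
| Average | - | - | - | - | - | 65.92% | 55.04% |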
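The averages in the last row can be re-derived from the five runs. The sketch below copies the per-run scores from the table above and takes a plain arithmetic mean, which is an assumption about how the reported averages were computed. Under that assumption the MinSpan mean comes out at about 55.048, which the table appears to report truncated as 55.04%.

```python
from statistics import mean

# (SelectP F1, MinSpan F1) in percent, copied from the five runs in the table above.
runs = [
    (66.01, 55.77),
    (67.02, 56.17),
    (64.35, 53.58),
    (66.29, 54.12),
    (65.93, 55.60),
]

selectp_avg = mean(r[0] for r in runs)
minspan_avg = mean(r[1] for r in runs)

print(f"SelectP average: {selectp_avg:.3f}%")
print(f"MinSpan average: {minspan_avg:.3f}%")
```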

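The table below lists all the other fine-tuning runs carried out during the reproduction. Because of their parameter configuration they are not counted toward the reproduced accuracy, but they still give some indication of how the model trains on the Tydi task.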

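| Device | Batch size | Acc grad steps | Theoretical batch size | Seed | Mixed precision | Epochs | Warm up | TydiQA SelectP F1 | TydiQA MinSpan F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100*1 | 20 | 25 | 500 | 6 | No | 10 | 0.01 | 64.38% | 53.73% |
| 3090*4 | 10 | 12 | 480 | 6 | No | 10 | 0.01 | 65.23% | 53.49% |
| 3090*4 | 10 | 1 | 40 | 6 | No | 10 | 0.01 | 67.31% | 53.11% |
| V100*4 | 16 | 1 | 64 | 2022 | Yes | 3 | 0.01 | 67.26% | 56.41% |
| V100*4 | 16 | 1 | 64 | 2020 | Yes | 3 | 0.01 | 67.29% | 56.42% |
| V100*4 | 8 | 1 | 32 | 2020 | Yes | 3 | 0.1 | 67.26% | 56.37% |
| A4000*4 | 8 | 1 | 32 | 2020 | Yes | 3 | 0.1 | 67.43% | 55.91% |
| Average | - | - | - | - | - | - | - | 66.59% | 55.06% |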
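Notes:

The official warm-up ratio is 0.1.
Here, theoretical batch_size = batch_size * accumulate_gradient_steps * number_GPU.
Apart from the warm-up ratio and the batch size, all other training settings match the official configuration.
On the Tydi Leaderboard, the Top5/Top1 scores for TydiQA SelectP are 72.5%/79.5%, and for TydiQA MinSpan they are 63.4%/72.35%, so Canine still trails the state of the art.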
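The theoretical batch size formula in the notes above can be checked against the tables with a few lines of Python. This is a small sketch: the function name `effective_batch_size` is hypothetical, and it assumes that a device entry such as 3090*4 means four GPUs and that a bare V100 means a single card.

```python
def effective_batch_size(batch_size, accumulate_gradient_steps, number_gpu):
    """Theoretical batch size as defined in the notes above."""
    return batch_size * accumulate_gradient_steps * number_gpu


# (device, per-GPU batch size, accumulation steps, assumed GPU count) from rows of the tables.
configs = [
    ("3090*4", 14, 9, 4),
    ("A4000*4", 8, 1, 4),
    ("V100", 16, 32, 1),
]

for device, batch_size, acc_steps, n_gpu in configs:
    print(device, effective_batch_size(batch_size, acc_steps, n_gpu))
```

The three rows give 504, 32, and 512, matching the theoretical batch size column for those runs.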
5. Model inference and deployment
See the test_tipc folder for details.

Automated test scripts
See the test_tipc folder for details.
LICENSE
This project is released under the Apache 2.0 license.
Reference links
https://paperswithcode.com/paper/canine-pre-training-an-efficient-tokenization

https://github.com/google-research/language/tree/master/language/canine

https://github.com/huggingface/transformers/tree/main/src/transformers/models/canine

https://github.com/google-research-datasets/tydiqa
