[Paper Reproduction] AAAI 2021 Open Intent Recognition Model: ADB

This project reproduces ADB, the open intent recognition model published at AAAI 2021.

AI Studio · Published 2023-04-26 19:38:02

Adaptive Decision Boundary

Title: Deep Open Intent Classification with Adaptive Decision Boundary
Authors: Hanlei Zhang, Hua Xu, Ting-En Lin
Affiliations: Tsinghua University, Siemens
Paper: accepted at AAAI 2021, https://arxiv.org/abs/2012.10209

Abstract: Open intent classification is a challenging task in dialogue systems. On the one hand, it should ensure the quality of known intent identification. On the other hand, it needs to detect the open (unknown) intent without prior knowledge. Current models are limited in finding the appropriate decision boundary to balance the performances of both known intents and the open intent. In this paper, we propose a post-processing method to learn the adaptive decision boundary (ADB) for open intent classification. We first utilize the labeled known intent samples to pre-train the model. Then, we automatically learn the adaptive spherical decision boundary for each known class with the aid of well-trained features. Specifically, we propose a new loss function to balance both the empirical risk and the open space risk. Our method does not need open intent samples and is free from modifying the model architecture. Moreover, our approach is surprisingly insensitive with less labeled data and fewer known intents. Extensive experiments on three benchmark datasets show that our method yields significant improvements compared with the state-of-the-art methods. The codes are released at this https URL.
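The boundary-learning step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the project's `run.py` code: the function names, the toy 2-D features, and the use of plain gradient descent are assumptions made for illustration. Each known class gets a centroid (the mean of its features) and a radius `softplus(raw)`; the loss pushes a radius outward when a sample falls outside its ball (empirical risk) and pulls it inward when the sample is inside (open space risk). At inference a sample is assigned to its nearest centroid, or to the open class if it lies outside that centroid's radius.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def softplus(x):
    return math.log1p(math.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def class_centroids(features, labels, num_classes):
    """Mean feature vector of each known class."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for z, y in zip(features, labels):
        counts[y] += 1
        for j, v in enumerate(z):
            sums[y][j] += v
    return [[s / counts[k] for s in sums[k]] for k in range(num_classes)]

def boundary_loss(features, labels, centroids, raw_radii):
    """Average of (d - r) for samples outside their ball and (r - d) for samples inside."""
    total = 0.0
    for z, y in zip(features, labels):
        d = euclidean(z, centroids[y])
        r = softplus(raw_radii[y])
        total += (d - r) if d > r else (r - d)
    return total / len(features)

def fit_radii(features, labels, centroids, lr=0.05, epochs=200):
    """Plain gradient descent on the raw radius parameters (an assumption of this sketch)."""
    raw = [0.0] * len(centroids)
    n = len(features)
    for _ in range(epochs):
        grads = [0.0] * len(centroids)
        for z, y in zip(features, labels):
            d = euclidean(z, centroids[y])
            r = softplus(raw[y])
            d_loss_d_r = -1.0 if d > r else 1.0
            # Chain rule: d softplus(raw) / d raw = sigmoid(raw)
            grads[y] += d_loss_d_r * sigmoid(raw[y]) / n
        raw = [p - lr * g for p, g in zip(raw, grads)]
    return [softplus(p) for p in raw]

def predict(z, centroids, radii, open_label=-1):
    """Nearest centroid, rejected as open if outside that centroid's radius."""
    dists = [euclidean(z, c) for c in centroids]
    k = min(range(len(dists)), key=dists.__getitem__)
    return k if dists[k] <= radii[k] else open_label

# Toy features standing in for the pre-trained encoder's outputs.
features = [(0, 1), (0, -1), (1, 0), (-1, 0), (10, 1), (10, -1), (11, 0), (9, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
centroids = class_centroids(features, labels, num_classes=2)
radii = fit_radii(features, labels, centroids)
print(predict((0.2, 0.1), centroids, radii), predict((5.0, 5.0), centroids, radii))
```

In this toy setup every training sample sits at distance 1 from its centroid, so the learned radii settle close to 1 and a point far from both centroids is rejected as open.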

Model architecture:

Datasets:

Experimental results:

# Upgrade paddlenlp
!pip install -U paddlenlp
# Unzip the dataset
!unzip -d data data/data154307/ADB_data.zip

# # Step-by-step run, step 1: pre-training
# !python run.py \
#     --seed 0 \
#     --start_Pretrain \
#     --do_train \
#     --do_eval \
#     --device gpu \
#     --dataset "stackoverflow" \
#     --known_cls_ratio 0.25 \
#     --labeled_ratio 0.5 \
#     --output_dir "done_model" \
#     --model_name_or_path "ernie-2.0-base-en" \
#     --overwrite_output_dir \
#     --warmup_ratio 0.1 \
#     --learning_rate 2e-5 \
#     --num_train_epochs 100 \
#     --per_device_train_batch_size 128 \
#     --per_device_eval_batch_size 64 \
#     --metric_for_best_model "eval_macro_f1" \
#     --early_stopping \
#     --evaluation_strategy epoch \
#     --load_best_model_at_end \
#     --save_strategy epoch \
#     --save_total_limit 1 \
#     --disable_tqdm True

# # Step-by-step run, step 2: adaptive decision boundary
# !python run.py \
#     --seed 0 \
#     --start_ADB \
#     --device gpu \
#     --dataset "stackoverflow" \
#     --known_cls_ratio 0.25 \
#     --labeled_ratio 0.5 \
#     --output_dir "done_model" \
#     --model_name_or_path "done_model" \
#     --lr_boundary 0.05 \
#     --num_train_epochs 100 \
#     --adb_device_train_batch_size 128 \
#     --adb_device_eval_batch_size 64

# One-shot training and testing
!python run.py \
    --seed 0 \
    --start_Pretrain \
    --do_train \
    --do_eval \
    --device gpu \
    --dataset "stackoverflow" \
    --known_cls_ratio 0.25 \
    --labeled_ratio 1.0 \
    --output_dir "done_model" \
    --model_name_or_path "ernie-2.0-base-en" \
    --overwrite_output_dir \
    --warmup_ratio 0.1 \
    --learning_rate 2e-5 \
    --num_train_epochs 100 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 64 \
    --metric_for_best_model "eval_macro_f1" \
    --early_stopping \
    --evaluation_strategy epoch \
    --load_best_model_at_end \
    --save_strategy epoch \
    --save_total_limit 1 \
    --disable_tqdm True \
    --start_ADB \
    --lr_boundary 0.05 \
    --adb_device_train_batch_size 128 \
    --adb_device_eval_batch_size 64

Reproducing the experimental results of Table 2 and Table 3

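Open classification results on the BANKING, OOS, and StackOverflow datasets with different known class ratios (25%, 50%, and 75%).
"Accuracy" and "F1-score" denote the accuracy and macro F1-score over all classes.
"Open" and "Known" denote the macro F1-score over the open class and the known classes, respectively.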
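As a rough illustration of how these four numbers relate, the sketch below computes them from lists of true and predicted labels. The function name `open_intent_metrics` and the `"open"` label string are hypothetical, and the project's own evaluation code may differ in details such as which classes enter the macro average.

```python
def open_intent_metrics(y_true, y_pred, open_label="open"):
    """Return Accuracy, F1-score, Open and Known as percentages."""
    # Classes are taken from the union of true and predicted labels (an assumption).
    classes = list(dict.fromkeys(list(y_true) + list(y_pred)))
    f1 = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    known = [f1[c] for c in classes if c != open_label]
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "Accuracy": 100.0 * correct / len(y_true),
        "F1-score": 100.0 * sum(f1.values()) / len(f1),
        "Open": 100.0 * f1.get(open_label, 0.0),
        "Known": 100.0 * sum(known) / len(known) if known else 0.0,
    }

y_true = ["a", "a", "b", "b", "open", "open"]
y_pred = ["a", "open", "b", "b", "open", "a"]
print(open_intent_metrics(y_true, y_pred))
```

Because "Open" covers a single class while "Known" averages over many, a model can score well on "F1-score" and "Known" while its "Open" score stays low.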

Reproduced results:

Known class ratio	BANKING Accuracy	BANKING F1-score	OOS Accuracy	OOS F1-score	StackOverflow Accuracy	StackOverflow F1-score
25%	63.8	65.6195	70.4	64.9428	49.7	56.4304
50%	71.4	79.6125	77.63	80.8123	72.72	77.53
75%	81.27	87.5629	84.91	88.596	77.67	83.2664

Reproduced results with ernie-2.0-base-en:

Known class ratio	BANKING Open	BANKING Known	OOS Open	OOS Known	StackOverflow Open	StackOverflow Known
25%	69.0889	65.4369	77.5117	64.6121	51.8410	57.3483
50%	64.2706	80.0163	78.5085	80.8431	68.1969	78.4633
75%	60.8696	88.0231	80.2738	88.6703	50.8588	85.4269

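With ernie-2.0-base-en, the results fall somewhat short of those reported in the paper; only on OOS are they reasonably close. Testing with BERT weights is left for later.
First, the Open class is not classified very well.
Second, the model is highly sensitive to having fewer samples and fewer known classes, rather than insensitive as the paper claims.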

%%capture
for choice_dataset in ['oos','stackoverflow','banking']:
    for choice_know_ratio in [0.25,0.5,0.75]:
        !python run.py \
            --seed 0 \
            --start_Pretrain \
            --do_train \
            --do_eval \
            --device gpu \
            --dataset {choice_dataset} \
            --known_cls_ratio {choice_know_ratio} \
            --labeled_ratio 1.0 \
            --output_dir "done_model" \
            --model_name_or_path "ernie-2.0-base-en" \
            --overwrite_output_dir \
            --warmup_ratio 0.1 \
            --learning_rate 2e-5 \
            --num_train_epochs 100 \
            --per_device_train_batch_size 128 \
            --per_device_eval_batch_size 64 \
            --metric_for_best_model "eval_macro_f1" \
            --early_stopping \
            --evaluation_strategy epoch \
            --load_best_model_at_end \
            --save_strategy epoch \
            --save_total_limit 1 \
            --disable_tqdm True \
            --start_ADB \
            --lr_boundary 0.05 \
            --adb_device_train_batch_size 128 \
            --adb_device_eval_batch_size 64

import pandas as pd
test_result = pd.read_csv("./outputs/results.csv")
test_result

	Known	Open	F1-score	Accuracy	dataset	known_cls_ratio	labeled_ratio
0	64.6121	77.5117	64.9428	70.40	oos	0.25	1.0
1	80.8431	78.5085	80.8123	77.63	oos	0.50	1.0
2	88.6703	80.2738	88.5960	84.91	oos	0.75	1.0
3	57.3483	51.8410	56.4304	49.70	stackoverflow	0.25	1.0
4	78.4633	68.1969	77.5300	72.72	stackoverflow	0.50	1.0
5	85.4269	50.8588	83.2664	77.67	stackoverflow	0.75	1.0
6	65.4369	69.0889	65.6195	63.80	banking	0.25	1.0
7	80.0163	64.2706	79.6125	71.40	banking	0.50	1.0
8	88.0231	60.8696	87.5629	81.27	banking	0.75	1.0
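The long-format rows above can be regrouped into the per-ratio layout used in the result tables. The sketch below uses only the standard library and embeds the nine rows shown above as CSV text; that `results.csv` is comma-separated with exactly these column names is an assumption, and `pivot_results` is a hypothetical helper.

```python
import csv
import io

# Rows copied from the table above; the comma-separated layout is assumed.
CSV_TEXT = """Known,Open,F1-score,Accuracy,dataset,known_cls_ratio,labeled_ratio
64.6121,77.5117,64.9428,70.40,oos,0.25,1.0
80.8431,78.5085,80.8123,77.63,oos,0.50,1.0
88.6703,80.2738,88.5960,84.91,oos,0.75,1.0
57.3483,51.8410,56.4304,49.70,stackoverflow,0.25,1.0
78.4633,68.1969,77.5300,72.72,stackoverflow,0.50,1.0
85.4269,50.8588,83.2664,77.67,stackoverflow,0.75,1.0
65.4369,69.0889,65.6195,63.80,banking,0.25,1.0
80.0163,64.2706,79.6125,71.40,banking,0.50,1.0
88.0231,60.8696,87.5629,81.27,banking,0.75,1.0
"""

def pivot_results(csv_text, metrics=("Accuracy", "F1-score")):
    """Group rows as table[known_cls_ratio][dataset][metric]."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ratio = float(row["known_cls_ratio"])
        table.setdefault(ratio, {})[row["dataset"]] = {m: float(row[m]) for m in metrics}
    return table

table = pivot_results(CSV_TEXT)
datasets = ["banking", "oos", "stackoverflow"]
for ratio in sorted(table):
    cells = [f"{table[ratio][d][m]:.2f}" for d in datasets for m in ("Accuracy", "F1-score")]
    print(f"{ratio:.0%}", *cells, sep="\t")
```

Passing `metrics=("Open", "Known")` gives the second table in the same way.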