Custom NLU Components in Rasa

Problem

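The goal is to extend an existing Rasa NLU model with custom components, such as sentiment analysis, spell checking, or a character-level tokenizer.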




Introduction to the Rasa NLU pipeline

The pipeline defines which processing steps the input goes through on its way to the output, for example:


pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"

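The components are called one after another, and each produces output. That output either goes straight into the final result or serves as input to a later component.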
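To make the hand-off between components concrete, here is a minimal sketch in plain Python. It is not Rasa's implementation: the message is modeled as a dictionary, and `whitespace_tokenizer`, `token_counter`, and `run_pipeline` are hypothetical names invented for illustration.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
PipelineStep = Callable[[Message], None]


def whitespace_tokenizer(message: Message) -> None:
    """Hypothetical tokenizer: provides 'tokens' from the raw text."""
    message["tokens"] = message["text"].split()


def token_counter(message: Message) -> None:
    """Hypothetical downstream step: requires 'tokens', provides 'token_count'."""
    message["token_count"] = len(message["tokens"])


def run_pipeline(text: str, pipeline: List[PipelineStep]) -> Message:
    """Call each step in order; every step reads from and writes to one shared message."""
    message: Message = {"text": text}
    for step in pipeline:
        step(message)
    return message


result = run_pipeline("Hello stupid bot", [whitespace_tokenizer, token_counter])
print(result)
```

Order matters in this sketch: putting `token_counter` first would fail because the `tokens` key it needs has not been produced yet. The same reasoning explains why a component that requires tokens has to come after a tokenizer in the pipeline.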





Steps

The following adds a sentiment analysis component as an example:

  1. Install the natural language processing library spaCy
pip install spacy
python -m spacy download en_core_web_sm
  2. Create a project: rasa init --no-prompt

  3. Add the following data at the top of nlu.md

## intent: feedback
- It’s very helpful
- I had the best experience speaking with you
- no feedback
- ok
- You are the most stupid bot I have ever seen
- the worst
  4. Create labels.txt with the sentiment labels, one per line, in the same order as the examples above
pos
pos
neu
neu
neg
neg
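The labels carry no reference to the sentences they describe, so they can only be matched to the training examples by position. The sketch below shows that pairing on the six examples from nlu.md. `pair_labels` is a hypothetical helper, and the length check is an added safeguard: Python's `zip` stops silently at the shorter sequence, so a missing label would otherwise go unnoticed.

```python
from typing import List, Tuple


def pair_labels(examples: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Pair the i-th example with the i-th label, refusing mismatched lengths."""
    if len(examples) != len(labels):
        raise ValueError(f"{len(examples)} examples but {len(labels)} labels")
    return list(zip(examples, labels))


examples = [
    "It's very helpful",
    "I had the best experience speaking with you",
    "no feedback",
    "ok",
    "You are the most stupid bot I have ever seen",
    "the worst",
]
# Same content as labels.txt, read line by line
labels = "pos\npos\nneu\nneu\nneg\nneg".splitlines()

for text, label in pair_labels(examples, labels):
    print(f"{label}: {text}")
```

Because the match is purely positional, reordering the examples in nlu.md without reordering labels.txt would attach the wrong sentiment to a sentence.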
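  5. Build the sentiment analysis component in sentiment.py. It subclasses Component and implements four methods: train() for training, process() for parsing, persist() for saving, and load() for loading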
import os
import pickle
from typing import Any, Text, Dict
from rasa.nlu.components import Component
from nltk.classify import NaiveBayesClassifier

SENTIMENT_MODEL_FILE_NAME = "sentiment_classifier.pkl"


class SentimentAnalyzer(Component):
    """Custom sentiment analysis component."""
    name = "sentiment"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["en"]

    def __init__(self, component_config=None):
        super().__init__(component_config)
        self.clf = None  # set by train(); stays None until the component is trained

    def train(self, training_data, cfg, **kwargs):
        """Load the sentiment labels from a text file, collect and format the
        tokens of the training examples, and fit the sentiment classifier."""
        with open("labels.txt", "r") as f:
            labels = f.read().splitlines()
        examples = training_data.training_examples  # list of Message objects
        tokens = [[t.text for t in ex.get("tokens")] for ex in examples]
        processed_tokens = [self.preprocessing(t) for t in tokens]
        # zip pairs examples and labels by position and stops at the shorter list
        labeled_data = list(zip(processed_tokens, labels))
        self.clf = NaiveBayesClassifier.train(labeled_data)

    def convert_to_rasa(self, value, confidence):
        """Convert the model output into the Rasa NLU output format."""
        entity = {"value": value,
                  "confidence": confidence,
                  "entity": "sentiment",
                  "extractor": "sentiment_extractor"}
        return entity

    def preprocessing(self, tokens):
        """Build a bag-of-words representation of one example."""
        return {word: True for word in tokens}

    def process(self, message, **kwargs):
        """Take the tokens of a new message, pass them to the classifier,
        and attach the prediction to the message."""
        if self.clf is None:
            print("No training!")
        else:
            tokens = [t.text for t in message.get("tokens")]
            tb = self.preprocessing(tokens)
            pred = self.clf.prob_classify(tb)
            sentiment = pred.max()
            confidence = pred.prob(sentiment)
            entity = self.convert_to_rasa(sentiment, confidence)
            message.set("entities", [entity], add_to_output=True)

    def persist(self, file_name, model_dir):
        """Pickle the whole component into the model directory."""
        classifier_file = os.path.join(model_dir, SENTIMENT_MODEL_FILE_NAME)
        with open(classifier_file, "wb") as f:
            pickle.dump(self, f, pickle.HIGHEST_PROTOCOL)
        return {"classifier_file": SENTIMENT_MODEL_FILE_NAME}

    @classmethod
    def load(cls, meta: Dict[Text, Any], model_dir=None, model_metadata=None, cached_component=None, **kwargs):
        """Load the pickled component back from the model directory."""
        file_name = meta.get("classifier_file")
        classifier_file = os.path.join(model_dir, file_name)
        with open(classifier_file, "rb") as f:
            return pickle.load(f)
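A trained component has to be written to disk after training and read back when the model is loaded. The sketch below runs such a persist/load round trip in a temporary directory. It makes two assumptions that differ from the component above: `TinyComponent` is a hypothetical stand-in rather than a Rasa class, and only the classifier state is pickled instead of the whole object (here a plain dictionary stands in for the trained classifier).

```python
import os
import pickle
import tempfile

MODEL_FILE_NAME = "sentiment_classifier.pkl"


class TinyComponent:
    """Hypothetical stand-in for a trained component; not a Rasa class."""

    def __init__(self, clf=None):
        self.clf = clf

    def persist(self, model_dir):
        """Write the classifier state into model_dir and return the metadata."""
        path = os.path.join(model_dir, MODEL_FILE_NAME)
        with open(path, "wb") as f:
            pickle.dump(self.clf, f, pickle.HIGHEST_PROTOCOL)
        # Only the file name is stored; the directory is supplied again at load time
        return {"classifier_file": MODEL_FILE_NAME}

    @classmethod
    def load(cls, meta, model_dir):
        """Rebuild the component from the file named in the metadata."""
        path = os.path.join(model_dir, meta["classifier_file"])
        with open(path, "rb") as f:
            return cls(clf=pickle.load(f))


with tempfile.TemporaryDirectory() as model_dir:
    meta = TinyComponent(clf={"stupid": "neg"}).persist(model_dir)
    restored = TinyComponent.load(meta, model_dir)

print(restored.clf)
```

Storing only a relative file name in the metadata keeps the saved model relocatable: the directory can be moved or unpacked elsewhere, as long as the loader is told where it now lives.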
  6. Edit config.yml to add the custom component, written as module_name.ClassName
language: en
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "sentiment.SentimentAnalyzer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"
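The name `sentiment.SentimentAnalyzer` reads as "the class `SentimentAnalyzer` in the module `sentiment`", which means sentiment.py has to be importable from wherever training is run. As a rough illustration of how such a dotted name can be turned into a class, here is a sketch using the standard library. `resolve_component` is a hypothetical helper, and Rasa's own loading code may work differently.

```python
import importlib


def resolve_component(path: str):
    """Resolve 'module_name.ClassName' to the class object it names."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module_name.ClassName', got {path!r}")
    module = importlib.import_module(module_name)  # fails if the module is not importable
    return getattr(module, class_name)


# A standard-library class is used here so the sketch runs without a Rasa project
component_class = resolve_component("collections.OrderedDict")
print(component_class)
```

In this sketch, a typo in either half of the name surfaces as an `ImportError` (module not found) or an `AttributeError` (class not found in the module).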
  7. When a rude user sends the friendly greeting "Hello stupid bot", the parse result contains:
"entities": [
    {
      "value": "neg",
      "confidence": 0.8181818181818182,
      "entity": "sentiment",
      "extractor": "sentiment_extractor"
    }
  ]
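Downstream code can pick the sentiment out of the parse result by filtering the entity list. The sketch below assumes the result is available as a Python dictionary shaped like the output above. `get_sentiment` is a hypothetical helper, and the `text` field is included only for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def get_sentiment(parse_result: Dict[str, Any]) -> Optional[Tuple[str, float]]:
    """Return (label, confidence) of the first 'sentiment' entity, or None."""
    for entity in parse_result.get("entities", []):
        if entity.get("entity") == "sentiment":
            return entity["value"], entity["confidence"]
    return None


parse_result = {
    "text": "Hello stupid bot",
    "entities": [
        {
            "value": "neg",
            "confidence": 0.8181818181818182,
            "entity": "sentiment",
            "extractor": "sentiment_extractor",
        }
    ],
}

print(get_sentiment(parse_result))
```

Filtering on the `entity` field rather than taking the first list item keeps the lookup correct when other extractors add their own entities to the same list.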




Notes

This post follows the original tutorial, How to Enhance Rasa NLU Models with Custom Components.





References

  1. Custom NLU Components
  2. How to Enhance Rasa NLU Models with Custom Components | Rasa Blog
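  3. Rasa: Custom NLU Components Explained
  4. spaCy, an industrial-strength natural language processing library for Python
  5. Rasa custom component predicts the average value as its confidence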