PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model (2203.17090v3)

Published 31 Mar 2022 in cs.CL

Abstract: In this paper, we introduce PanGu-Bot, a Chinese pre-trained open-domain dialogue generation model based on a large pre-trained language model (PLM) PANGU-alpha (Zeng et al., 2021). Different from other pre-trained dialogue models trained over a massive amount of dialogue data from scratch, we aim to build a powerful dialogue model with relatively fewer data and computation costs by inheriting valuable language capabilities and knowledge from PLMs. To this end, we train PanGu-Bot from the large PLM PANGU-alpha, which has been proven well-performed on a variety of Chinese natural language tasks. We investigate different aspects of responses generated by PanGu-Bot, including response quality, knowledge, and safety. We show that PanGu-Bot outperforms state-of-the-art Chinese dialogue systems (CDIALGPT (Wang et al., 2020), EVA (Zhou et al., 2021), EVA2.0 (Gu et al., 2022)) w.r.t. the above three aspects. We also demonstrate that PanGu-Bot can be easily deployed to generate emotional responses without further training. Throughout our empirical analysis, we also point out that the PanGu-Bot response quality, knowledge correctness, and safety are still far from perfect, and further explorations are indispensable to building reliable and smart dialogue systems. Our model and code will be available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/PanGu-Bot soon.

Summary

The paper demonstrates a novel approach by fine-tuning a pre-trained Chinese language model to create PanGu-Bot.
It shows that PanGu-Bot achieves superior dialogue quality and effective knowledge integration while using fewer resources.
The study highlights advancements in safety and emotional response generation without relying on extensive emotion-labeled training data.

An Evaluation of PanGu-Bot: A Chinese Generative Dialogue Model

The paper explores PanGu-Bot, a Chinese dialogue generation model developed to leverage the capabilities of pre-trained language models (PLMs) with reduced data and computational resources. Built upon the large-scale language model PanGu-α, PanGu-Bot departs from conventional approaches that train dialogue models from scratch on extensive datasets. Instead, it shows that the linguistic capabilities already present in a PLM can be reused for dialogue generation.

Methodology and Model Architecture

PanGu-Bot is built by fine-tuning PanGu-α, a Chinese PLM, with the aim of improving dialogue quality at lower computational cost. Two versions of PanGu-Bot, with 350 million and 2.6 billion parameters, were trained on 100 million high-quality dialogue utterances. This approach reduces computational cost by inheriting linguistic knowledge from PanGu-α.

The architecture of PanGu-Bot follows the GPT-style autoregressive framework and reuses PanGu-α's transformer layers. The dialogue data was carefully curated and preprocessed to ensure quality while keeping volume small. Training used mixed-precision techniques on GPU infrastructure for efficiency.
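As a rough illustration of GPT-style autoregressive fine-tuning on dialogue, the sketch below flattens a multi-turn conversation into one token sequence and marks which positions would contribute to the next-token loss. The separator string, the whitespace tokenizer, and the choice to train only on the final response are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

SEP = "<sep>"  # hypothetical turn separator; the real model's special tokens may differ


def build_training_example(turns: List[str]) -> Tuple[List[str], List[int]]:
    """Flatten dialogue turns into tokens plus a loss mask.

    The mask is 1 for tokens of the final turn (the response to learn) and its
    closing separator, and 0 for all context tokens.
    """
    if len(turns) < 2:
        raise ValueError("need at least one context turn and one response")
    tokens: List[str] = []
    mask: List[int] = []
    last = len(turns) - 1
    for i, turn in enumerate(turns):
        turn_tokens = turn.split() + [SEP]  # toy whitespace tokenizer (assumption)
        tokens.extend(turn_tokens)
        mask.extend([1 if i == last else 0] * len(turn_tokens))
    return tokens, mask


def next_token_pairs(tokens: List[str], mask: List[int]) -> List[Tuple[List[str], str]]:
    """Return (prefix, target) pairs for the positions that contribute to the loss."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens)) if mask[i] == 1]
```

Each pair is one prediction step: the model sees the prefix (context plus the response so far) and is trained to predict the next response token.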

Experimental Analysis

Dialogue Quality

The paper evaluates PanGu-Bot against state-of-the-art dialogue systems including CDialGPT, EVA, and EVA2.0 using both self-chat and interactive human evaluations. PanGu-Bot demonstrates superior overall response quality, particularly in sensibility, specificity, and interestingness, which are integral to engaging conversations. A noteworthy aspect is the model's ability to generate diverse and contextually appropriate responses without extensive data.
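To make the human-evaluation criteria concrete, here is a minimal sketch of how per-response annotations for sensibility, specificity, and interestingness could be averaged into per-criterion and overall scores. The binary 0/1 labeling scheme, the unweighted overall average, and the function name `aggregate_ratings` are assumptions for illustration; the paper's exact annotation protocol may differ.

```python
from statistics import mean
from typing import Dict, List

CRITERIA = ("sensibility", "specificity", "interestingness")


def aggregate_ratings(ratings: List[Dict[str, int]]) -> Dict[str, float]:
    """Average binary (0/1) annotator labels per criterion, plus an overall mean."""
    if not ratings:
        raise ValueError("no ratings to aggregate")
    scores = {c: mean(r[c] for r in ratings) for c in CRITERIA}
    # Overall score: unweighted mean of the three criteria (an assumption).
    scores["overall"] = mean(scores[c] for c in CRITERIA)
    return scores
```

Running this once per system over the same set of contexts gives comparable scores for each model under evaluation.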

Knowledge Integration

PanGu-Bot's capacity to generate factually coherent responses reflects the knowledge it inherits from PanGu-α. Knowledge evaluations across various domains (e.g., literature, geography) show that PanGu-Bot makes effective use of this encoded knowledge and outperforms the baseline models. The result suggests that knowledge acquired during language-model pre-training is largely retained after dialogue fine-tuning.
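One simple way to score knowledge correctness is to ask factual questions and check whether the gold answer appears in the generated reply. The sketch below assumes this string-containment criterion and a caller-supplied `generate` function; both are illustrative stand-ins rather than the paper's documented procedure.

```python
from typing import Callable, List, Tuple


def knowledge_accuracy(
    qa_pairs: List[Tuple[str, str]],
    generate: Callable[[str], str],
) -> float:
    """Fraction of questions whose generated reply contains the gold answer."""
    if not qa_pairs:
        raise ValueError("qa_pairs is empty")
    hits = sum(
        1 for question, answer in qa_pairs
        if answer.lower() in generate(question).lower()  # case-insensitive match
    )
    return hits / len(qa_pairs)
```

Containment matching is lenient about phrasing but can over-credit replies that mention the answer alongside incorrect claims, so it is best read as a rough upper bound.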

Safety Evaluation

With the dissemination of potentially harmful dialogue being a critical challenge, PanGu-Bot's responses were scrutinized for safety through adversarial prompts. While it exhibited commendable safety metrics, the paper acknowledges existing vulnerabilities, advocating for continued exploration into comprehensive safety measures.

Emotional Response Generation

An interesting facet of PanGu-Bot is its ability to generate emotion-specific responses without explicit training on emotion-labeled datasets. Given a simple prompt that names the desired emotion, the model can align its response with that emotional tone, which shows that prompting alone can steer the fine-tuned model.
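The prompting idea can be sketched as a small helper that prepends the dialogue history and ends with a cue naming the target emotion, leaving the model to continue the text. The template, the speaker labels, and the emotion set below are hypothetical; the paper's actual prompt wording (which would be in Chinese) is not reproduced here.

```python
from typing import List

ALLOWED_EMOTIONS = {"happy", "angry", "sad"}  # illustrative label set (assumption)


def build_emotion_prompt(context: List[str], emotion: str) -> str:
    """Build a text prompt that asks for a reply in the given emotional tone.

    Turns in `context` are assumed to alternate, starting with the user.
    """
    if emotion not in ALLOWED_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")
    lines = [
        f"{'User' if i % 2 == 0 else 'Bot'}: {utterance}"
        for i, utterance in enumerate(context)
    ]
    lines.append(f"Bot ({emotion}):")  # cue the model to continue in this tone
    return "\n".join(lines)
```

The returned string would be passed to the dialogue model as its generation prefix; no model weights are changed.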

Implications and Future Directions

PanGu-Bot shows how dialogue generation can build on an existing PLM while keeping computational cost low. The approach challenges the traditional paradigm of training dialogue models from scratch and argues for reusing pre-trained language models, whose knowledge and language ability carry over to contextually aware dialogue generation.

Future research could delve into complementary strategies like knowledge grounding through retrieval methods and enhancing persona-aware dialogue modeling. Additionally, advancements in mitigating safety concerns remain a priority area, as underscored by the comparative analyses.

Overall, PanGu-Bot sets a precedent for scalable and resource-efficient models in open-domain dialogue systems, highlighting the potential for PLMs in advancing AI language applications.
