Log Parsing with Prompt-based Few-shot Learning (2302.07435v1)

Published 15 Feb 2023 in cs.SE

Abstract: Logs generated by large-scale software systems provide crucial information for engineers to understand the system status and diagnose problems of the systems. Log parsing, which converts raw log messages into structured data, is the first step to enabling automated log analytics. Existing log parsers extract the common part as log templates using statistical features. However, these log parsers often fail to identify the correct templates and parameters because: 1) they often overlook the semantic meaning of log messages, and 2) they require domain-specific knowledge for different log datasets. To address the limitations of existing methods, in this paper, we propose LogPPT to capture the patterns of templates using prompt-based few-shot learning. LogPPT utilises a novel prompt tuning method to recognise keywords and parameters based on a few labelled log data. In addition, an adaptive random sampling algorithm is designed to select a small yet diverse training set. We have conducted extensive experiments on 16 public log datasets. The experimental results show that LogPPT is effective and efficient for log parsing.

Authors (2)
  1. Van-Hoang Le (19 papers)
  2. Hongyu Zhang (147 papers)
Citations (46)

Summary

  • The paper introduces LogPPT, which integrates prompt-based few-shot learning to parse logs efficiently using minimal labeled data.
  • It employs an adaptive random sampling algorithm to select a small yet diverse set of training samples for labelling.
  • Empirical evaluations show that LogPPT outperforms traditional log parsers in accuracy and robustness across multiple datasets.

Log Parsing with Prompt-based Few-shot Learning: An Overview

The paper "Log Parsing with Prompt-based Few-shot Learning" by Le and Zhang introduces a novel approach to improve the process of log parsing through a method named LogPPT. This paper targets the inherent limitations of traditional log parsers that struggle with semantic comprehension and require substantial domain expertise.

Core Contributions

Innovation in Approach

LogPPT combines prompt-based few-shot learning with adaptive data sampling. The approach is centered on a pre-trained language model, specifically RoBERTa, for improved semantic understanding. By adopting a prompt tuning strategy, LogPPT can recognize keywords and parameters in log messages from a minimal amount of labelled data.
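
To make the idea concrete, here is a minimal, hypothetical training sketch (not the authors' released code): a RoBERTa masked-language model is fine-tuned so that keyword tokens predict themselves while parameter tokens predict a virtual label token added to the vocabulary. The virtual token name `i-val`, the label scheme, and the toy example are all assumptions for illustration.

```python
# Hypothetical sketch: prompt-based token labelling for log parsing.
# Keyword positions are trained to predict their own token; parameter
# positions are trained to predict a virtual "i-val" label token.
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Add the virtual label token and resize the embedding matrix to match.
tokenizer.add_tokens(["i-val"])
model.resize_token_embeddings(len(tokenizer))
param_id = tokenizer.convert_tokens_to_ids("i-val")

# One toy labelled log message: 0 = keyword, 1 = parameter.
words = ["Connection", "from", "10.0.0.5", "closed"]
word_labels = [0, 0, 1, 0]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
targets = enc["input_ids"].clone()
for pos, word_idx in enumerate(enc.word_ids(0)):
    if word_idx is None:
        targets[0, pos] = -100        # ignore <s>, </s> in the loss
    elif word_labels[word_idx] == 1:
        targets[0, pos] = param_id    # parameters predict the virtual token
    # keywords keep their own token id as the prediction target

loss = model(**enc, labels=targets).loss  # standard MLM cross-entropy
loss.backward()                           # one step of few-shot tuning
```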

Adaptive Random Sampling

To select a diverse set of training samples at low labelling cost, the authors propose an adaptive random sampling algorithm. The algorithm draws a small yet representative subset of the raw logs for labelling, allowing LogPPT to function with as few as 32 labelled samples.
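
One plausible reading of such a strategy, sketched under my own assumptions (greedy max-min selection over token-set Jaccard similarity; the paper's exact distance measure and pooling may differ):

```python
# Hypothetical adaptive sampling sketch: greedily keep the candidate
# that is least similar to anything already selected, so the labelled
# set stays small but diverse.
import random

def jaccard(a: set, b: set) -> float:
    """Token-set overlap between two log messages."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def adaptive_sample(logs: list[str], k: int = 32, pool: int = 64) -> list[str]:
    token_sets = [set(m.split()) for m in logs]
    selected = [random.randrange(len(logs))]      # seed with a random log
    while len(selected) < k:
        batch = random.sample(range(len(logs)), min(pool, len(logs)))
        candidates = [c for c in batch if c not in selected]
        if not candidates:
            continue
        # Pick the candidate farthest from its nearest selected neighbour.
        best = min(candidates,
                   key=lambda c: max(jaccard(token_sets[c], token_sets[s])
                                     for s in selected))
        selected.append(best)
    return [logs[i] for i in selected]

# few_shot_set = adaptive_sample(raw_logs)  # e.g. 32 logs to hand-label
```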

Prompt Tuning Mechanism

The paper utilizes a template-free prompt tuning method that casts log parsing as the masked-language-modelling task the pre-trained model was originally trained on: keywords are predicted as themselves, while parameters are mapped to a virtual label token. This shift away from conventional supervised learning enables accurate parsing from limited data without domain-specific pre-processing.
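
Continuing the hedged sketch above (this reuses `tokenizer`, `model`, and `param_id` from the training snippet), inference then reduces to reading the MLM head's top prediction at each position and replacing parameter words with a wildcard. The `<*>` convention follows common log-parsing benchmarks; the rest is illustrative.

```python
# Hypothetical inference step: positions whose top MLM prediction is the
# virtual "i-val" token are treated as parameters and masked with "<*>".
# Reuses tokenizer, model, and param_id from the training sketch above.
import torch

def parse_log(message: str) -> str:
    words = message.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        preds = model(**enc).logits.argmax(dim=-1)[0]
    is_param = [False] * len(words)
    for pos, word_idx in enumerate(enc.word_ids(0)):
        if word_idx is not None and preds[pos].item() == param_id:
            is_param[word_idx] = True   # any flagged subword marks the word
    return " ".join("<*>" if p else w
                    for w, p in zip(words, is_param))

# After tuning: parse_log("Connection from 10.0.0.5 closed")
# would ideally return "Connection from <*> closed".
```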

Empirical Evaluation

Experimental Setup and Results

LogPPT was evaluated on 16 public log datasets, showing significant improvements in both Group Accuracy and Parsing Accuracy. It consistently achieved high parsing accuracy, exceeding state-of-the-art parsers such as Drain and Spell by substantial margins, and it maintained its advantage even on unseen log messages.
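
For reference, the two metrics have standard definitions in the log-parsing literature (the function names below are mine): Parsing Accuracy is the fraction of messages whose predicted template string matches the ground truth exactly, while Group Accuracy credits a message only if its predicted template groups exactly the same set of messages as its true template.

```python
# Sketch of the two standard log-parsing metrics.
from collections import defaultdict

def parsing_accuracy(pred: list[str], truth: list[str]) -> float:
    """Share of messages whose predicted template string is exact."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def group_accuracy(pred: list[str], truth: list[str]) -> float:
    """Share of messages grouped identically under pred and truth."""
    def index_groups(templates):
        members = defaultdict(set)
        for i, t in enumerate(templates):
            members[t].add(i)
        return [frozenset(members[t]) for t in templates]
    gp, gt = index_groups(pred), index_groups(truth)
    return sum(p == t for p, t in zip(gp, gt)) / len(truth)
```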

Robustness and Efficiency

The approach exhibited robustness across diverse logging formats without needing domain-specific adjustments. Runtime evaluations showed competitive processing times, with GPU acceleration enabling LogPPT to handle large log volumes efficiently.

Implications and Future Prospects

Theoretical and Practical Implications

The introduction of prompt-based few-shot learning to log parsing may redefine the baseline for semantic log analysis. By reducing the need for hand-crafted, domain-specific rules and frequent retraining, the method scales well in dynamic environments where logging statements evolve.

Potential Developments

The principles behind LogPPT could be expanded to other domains requiring minimal training data yet leveraging deep semantic models. Future work may explore deeper integration of such models into operational log systems or extend the adaptive sampling methods to other forms of data analytics.

Overall, this paper presents a well-articulated contribution to log parsing methodologies, emphasizing semantic understanding with reduced human intervention. It invites continued exploration into the role of pre-trained models in handling the complexities of structured data conversion.
