
Using LLMs for the Extraction and Normalization of Product Attribute Values (2403.02130v4)

Published 4 Mar 2024 in cs.CL

Abstract: Product offers on e-commerce websites often consist of a product title and a textual product description. In order to enable features such as faceted product search or to generate product comparison tables, it is necessary to extract structured attribute-value pairs from the unstructured product titles and descriptions and to normalize the extracted values to a single, unified scale for each attribute. This paper explores the potential of using LLMs, such as GPT-3.5 and GPT-4, to extract and normalize attribute values from product titles and descriptions. We experiment with different zero-shot and few-shot prompt templates for instructing LLMs to extract and normalize attribute-value pairs. We introduce the Web Data Commons - Product Attribute Value Extraction (WDC-PAVE) benchmark dataset for our experiments. WDC-PAVE consists of product offers from 59 different websites which provide schema.org annotations. The offers belong to five different product categories, each with a specific set of attributes. The dataset provides manually verified attribute-value pairs in two forms: (i) directly extracted values and (ii) normalized attribute values. The normalization of the attribute values requires systems to perform the following types of operations: name expansion, generalization, unit of measurement conversion, and string wrangling. Our experiments demonstrate that GPT-4 outperforms the PLM-based extraction methods SU-OpenTag, AVEQA, and MAVEQA by 10%, achieving an F1-score of 91%. For the extraction and normalization of product attribute values, GPT-4 achieves a similar performance to the extraction scenario, while being particularly strong at string wrangling and name expansion.

Citations (5)

Summary

  • The paper introduces a novel method using LLMs (GPT-3.5 and GPT-4) to extract and normalize product attribute values with high accuracy.
  • It details the custom WDC PAVE dataset with 24,583 verified attribute-value pairs from 59 websites, emphasizing the challenges of normalization.
  • Experiments reveal GPT-4 achieves up to 98% F1-score in normalization tasks, significantly outperforming traditional PLM-based methods.

Using LLMs for the Extraction and Normalization of Product Attribute Values

Introduction

The paper "Using LLMs for the Extraction and Normalization of Product Attribute Values" explores the capabilities of LLMs, specifically GPT-3.5 and GPT-4, in extracting and normalizing product attribute values from e-commerce product descriptions. This work introduces the WDC Product Attribute-Value Extraction (WDC PAVE) dataset, consisting of 1,420 product offers across five categories, annotated with schema.org vocabulary. These annotations include both directly extracted attribute values and normalized values, requiring operations such as name expansion, generalization, unit normalization, and string wrangling.

WDC Product Attribute-Value Extraction Dataset

WDC PAVE was derived from the Web Data Commons Large-Scale Product Matching corpus and features offers from 59 different websites. The dataset comprises 24,583 manually verified attribute-value pairs. For normalization, 37 out of 70 attributes require transformation to enable applications like faceted product search. This normalization involves operations such as converting abbreviations to full forms (e.g., "HP" to "Hewlett-Packard"), categorizing specific labels under broader classes (e.g., "Oatmeal" to "Snacks and Breakfast"), standardizing measurements (e.g., "7''" to "17.8 cm"), and simplifying strings for consistency (see Figure 1).

Figure 1: Product offer with extracted and normalized attribute-value pairs.
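The four normalization operations can be sketched in code. The following is a minimal illustration, assuming simple lookup tables and helper functions; the mappings and example values are hypothetical and not taken from WDC-PAVE itself:

```python
# Illustrative sketch of the four normalization operations; the lookup
# tables and helpers below are hypothetical, not the paper's method.

NAME_EXPANSIONS = {"HP": "Hewlett-Packard"}          # name expansion
CATEGORY_MAP = {"Oatmeal": "Snacks and Breakfast"}   # generalization

def inches_to_cm(value: str) -> str:
    """Unit-of-measurement conversion, e.g. 7'' -> 17.8 cm."""
    inches = float(value.rstrip("'\""))
    return f"{inches * 2.54:.1f} cm"

def wrangle(value: str) -> str:
    """String wrangling: collapse whitespace and unify casing."""
    return " ".join(value.split()).title()

print(NAME_EXPANSIONS["HP"])       # Hewlett-Packard
print(CATEGORY_MAP["Oatmeal"])     # Snacks and Breakfast
print(inches_to_cm("7''"))         # 17.8 cm
print(wrangle("  usb   flash drive "))  # Usb Flash Drive
```

In the paper these transformations are performed by the LLM itself as part of the prompt, rather than by hand-written rules; the sketch only makes the operation types concrete.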

Experimental Setup

The dataset was split into training and test sets, and various prompt templates were designed for GPT-3.5 and GPT-4. The prompts use a JSON schema to instruct the models to perform either direct extraction or combined extraction and normalization. The experiments cover zero-shot and few-shot settings, where a subset of labeled examples is used to guide the model's predictions. The models' outputs were evaluated using F1-scores, with GPT-4 achieving superior results across scenarios due to its stronger string manipulation and more reliable adherence to the requested schema.
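A prompt of this kind can be assembled programmatically. The sketch below is a hypothetical approximation of a JSON-schema style prompt; the attribute names, wording, and "n/a" convention are assumptions for illustration, not the paper's exact template:

```python
import json

# Hypothetical target schema: attribute names are illustrative only.
TARGET_SCHEMA = {
    "Brand": "string or n/a",
    "Capacity": "string or n/a",
    "Product Type": "string or n/a",
}

def build_prompt(title: str, description: str) -> str:
    """Assemble a zero-shot extraction prompt around a JSON schema."""
    return (
        "Extract the following attributes from the product offer and "
        "return them as JSON. Use 'n/a' if an attribute is not mentioned.\n"
        f"Schema: {json.dumps(TARGET_SCHEMA)}\n"
        f"Title: {title}\n"
        f"Description: {description}"
    )

prompt = build_prompt("SanDisk 64GB USB Flash Drive", "High-speed storage.")
print(prompt)
```

In the few-shot variants, labeled demonstrations (input offer plus gold JSON output) would be appended before the target offer; for the normalization scenario, the schema descriptions would additionally specify the target scale for each attribute.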

Attribute Value Extraction

The extraction task focused on retrieving attribute values exactly as they appear in the product title and description, without normalization. Both GPT-3.5 and GPT-4 were effective when provided with example values and task demonstrations. GPT-4 achieved a peak F1-score of 91%, outperforming PLM-based methods such as AVEQA by a 10% margin.

Extraction with Normalization

The extraction and normalization scenario demanded not only identifying attribute values but also transforming them. GPT-4 demonstrated considerable proficiency with normalization tasks, especially those involving string wrangling and name expansion, reaching F1-scores of 95% and 98%, respectively. The unit of measurement normalization posed challenges, though demonstrations and example values substantially improved performance (Figure 2).

Figure 2: JSON-schema prompt template with an in-context demonstration (black font = extraction; black + red font = extraction and normalization).
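The reported F1-scores can be understood as pair-level precision and recall over predicted versus gold attribute-value pairs. The sketch below is a minimal, assumed scoring scheme (exact-match on pairs); the paper's precise matching granularity may differ:

```python
# Minimal pair-level F1 sketch: exact match on (attribute, value) pairs.
# The scoring granularity is an assumption, not the paper's exact protocol.

def f1(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)          # true positives: exact pair matches
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("Brand", "Hewlett-Packard"), ("Capacity", "64 GB")}
pred = {("Brand", "Hewlett-Packard"), ("Capacity", "64GB")}  # unnormalized value
print(round(f1(pred, gold), 2))  # 0.5
```

Under exact-match scoring, an unnormalized value like "64GB" counts as an error against the gold "64 GB", which is why the normalization scenario is strictly harder than plain extraction.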

Conclusion

The paper shows that GPT-4 is highly effective at extracting and normalizing product attribute values, significantly outperforming traditional PLM-based methods. While GPT-4's performance is robust across several normalization operations, future work could explore integrating computational tools or code interpreters to further strengthen numerical normalization. These results suggest substantial potential for LLMs to streamline data wrangling in e-commerce and related fields.
