Rethinking Tabular Data Understanding with Large Language Models (2312.16702v1)

Published 27 Dec 2023 in cs.CL, cs.AI, cs.DB, and cs.LG

Abstract: LLMs have been shown to be capable of various tasks, yet their ability to interpret and reason over tabular data remains underexplored. This study investigates tabular data understanding from three core perspectives: the robustness of LLMs to structural perturbations in tables, a comparative analysis of textual and symbolic reasoning over tables, and the potential of boosting model performance by aggregating multiple reasoning pathways. We discover that structural variance among tables presenting the same content causes a notable performance decline, particularly in symbolic reasoning tasks, which prompts the proposal of a method for table structure normalization. Moreover, textual reasoning slightly edges out symbolic reasoning, and a detailed error analysis reveals that each exhibits different strengths depending on the specific task. Notably, aggregating textual and symbolic reasoning pathways with a mix self-consistency mechanism achieves SOTA performance, with an accuracy of 73.6% on WIKITABLEQUESTIONS, a substantial advance over existing LLM table-processing paradigms.

Summary

  • The paper introduces a table normalization technique that mitigates performance drops when LLMs encounter diverse table orientations.
  • The study integrates textual and symbolic reasoning with a self-consistency mechanism, achieving state-of-the-art results on WikiTableQuestions.
  • The approach significantly enhances tabular data interpretation by combining multiple reasoning pathways, setting new benchmarks for LLM performance.

Tabular Data and LLMs

LLMs are currently less adept at handling structured tabular data than unstructured text. Challenges arise from structural variations of tables, such as those with headers in the first row (column tables) or in the first column (row tables), as well as from questions that require numerical operations. The two orientations are illustrated in the sketch below.
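
To make the orientation issue concrete, here is a minimal sketch (with made-up data, not from the paper) showing how the same table serializes into very different token sequences depending on orientation:

```python
# The same two-record table serialized in both orientations. An LLM sees
# entirely different token sequences for identical content.
headers = ["Name", "Age"]
records = [["Alice", "34"], ["Bob", "29"]]

# Column table: header row first, then one line per record.
column_table = "\n".join(" | ".join(row) for row in [headers] + records)

# Row table: transposed, so each header leads its own line.
rows = [[h, *vals] for h, vals in zip(headers, zip(*records))]
row_table = "\n".join(" | ".join(row) for row in rows)

print(column_table)  # Name | Age / Alice | 34 / Bob | 29
print(row_table)     # Name | Alice | Bob / Age | 34 | 29
```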

Robustness and Reasoning

LLMs tend to struggle when table structures are altered. Different orientations of the same information cause significant performance drops, with transposed tables posing a particular challenge. To counter this, the paper proposes a table structure normalization method (NORM) that improves LLM robustness to structural changes; a heuristic stand-in is sketched below. Textual reasoning slightly outperforms symbolic reasoning overall, though each exhibits distinct advantages on specific tasks.
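
The paper's NORM step is LLM-based; the following is only a rough, pure-Python stand-in that illustrates the underlying idea of detecting a transposed table and restoring header-first-row form. The heuristic (comparing type homogeneity of rows versus columns) and the example data are assumptions for illustration, not the paper's method:

```python
def is_numeric(cell: str) -> bool:
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False

def line_homogeneity(lines: list[list[str]]) -> float:
    # Average type consistency of each line once its leading (header-like)
    # cell is dropped: 1.0 means every body is all-numeric or all-text.
    scores = []
    for line in lines:
        body = line[1:]
        if body:
            frac = sum(map(is_numeric, body)) / len(body)
            scores.append(max(frac, 1.0 - frac))
    return sum(scores) / len(scores) if scores else 0.0

def normalize(table: list[list[str]]) -> list[list[str]]:
    """Return the table with headers as the first row, transposing when the
    rows (rather than the columns) look like homogeneous data series."""
    columns = [list(col) for col in zip(*table)]
    if line_homogeneity(table) > line_homogeneity(columns):
        return columns  # rows were data series -> table was transposed
    return table

# Hypothetical transposed input: attributes run down the first column.
transposed = [["Name", "Alice", "Bob"],
              ["Age", "34", "29"],
              ["City", "Paris", "Oslo"]]
print(normalize(transposed))
# [['Name', 'Age', 'City'], ['Alice', '34', 'Paris'], ['Bob', '29', 'Oslo']]
```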

Performance Boost With Multiple Reasoning Aggregation

LLMs interpret tabular data more reliably when multiple reasoning pathways are integrated. The paper combines textual and symbolic reasoning with a mix self-consistency mechanism, achieving state-of-the-art performance on the WikiTableQuestions dataset with an accuracy of 73.6%. The aggregation step is sketched below.
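
A minimal sketch of the aggregation step, assuming answers have already been sampled from both pathways. The sampled lists below are hypothetical stand-ins; in practice they would come from repeated LLM calls (textual chain-of-thought runs and executed generated programs):

```python
from collections import Counter

def mix_self_consistency(textual_answers, symbolic_answers):
    """Majority vote over answers pooled from both reasoning pathways.
    Failed symbolic runs (e.g., program execution errors, marked None)
    are dropped before voting."""
    symbolic_ok = [a for a in symbolic_answers if a is not None]
    votes = Counter(textual_answers) + Counter(symbolic_ok)
    answer, _count = votes.most_common(1)[0]
    return answer

# Hypothetical samples: five textual chain-of-thought answers and five
# answers from executing generated programs (None marks a failed run).
textual = ["1984", "1984", "1985", "1984", "1984"]
symbolic = ["1984", None, "1984", "1985", "1984"]
print(mix_self_consistency(textual, symbolic))  # -> "1984"
```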

Conclusion

This research outlines the difficulties LLMs face with tabular data and presents normalization strategies and reasoning pathway aggregation as effective solutions. The combination of textual and symbolic reasoning, enhanced by self-consistency, leads to significant advances over existing table processing frameworks, establishing new benchmarks in LLMs' abilities to understand and reason over tabular data.
