
Integrating and querying similar tables from PDF documents using deep learning (1901.04672v1)

Published 15 Jan 2019 in cs.IR

Abstract: A large amount of public data produced by enterprises is published in semi-structured PDF form. Extracting tabular data from reports and other PDF publications is useful for many data consolidation tasks, such as analysing and aggregating a company's financial reports. Queries over structured tabular data in PDF files are normally processed in an unstructured way, for example through text matching. This is mainly because the binary PDF format is optimized for layout and rendering and offers little support for automated parsing of data. Moreover, even tables of the same type vary across PDF files in schema and in their row or column headers, which makes it difficult for a single query plan to cover all relevant tables. This paper proposes a deep learning based method that enables SQL-like querying and analysis of financial tables from annual reports in PDF format. This is achieved through table type classification and nearest row search. We demonstrate that using word embeddings trained on Google News for header matching clearly outperforms the text-match based approach used in traditional databases. We also introduce a practical system that uses this technology to query and analyse financial tables in PDF documents from various sources.
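The contrast between text matching and embedding-based header matching can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a tiny hand-made vector table (`TOY_VECTORS`) in place of embeddings trained on Google News, represents a multi-word row header by averaging its word vectors, and selects the nearest row by cosine similarity. The helper names `header_vector`, `nearest_row`, and `text_match` are hypothetical.

```python
import math

# Hypothetical toy word vectors standing in for embeddings trained on Google News.
# Real pretrained vectors would be much higher-dimensional and loaded from a model file.
TOY_VECTORS = {
    "revenue": [0.9, 0.1, 0.0],
    "turnover": [0.85, 0.15, 0.05],
    "sales": [0.8, 0.2, 0.0],
    "total": [0.1, 0.1, 0.1],
    "net": [0.2, 0.6, 0.1],
    "profit": [0.1, 0.9, 0.2],
    "income": [0.15, 0.85, 0.25],
    "cash": [0.0, 0.1, 0.9],
}


def header_vector(header):
    """Average the vectors of known words in a header; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in header.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_row(query, row_headers):
    """Return the row header whose averaged embedding is closest to the query's."""
    q = header_vector(query)
    if q is None:
        return None
    best, best_score = None, -1.0
    for header in row_headers:
        h = header_vector(header)
        if h is None:
            continue  # skip headers with no known words
        score = cosine(q, h)
        if score > best_score:
            best, best_score = header, score
    return best


def text_match(query, row_headers):
    """Baseline: exact case-insensitive string match on the row header."""
    for header in row_headers:
        if header.lower() == query.lower():
            return header
    return None


rows = ["Turnover", "Net income", "Cash"]
print(text_match("Revenue", rows))      # None
print(nearest_row("Revenue", rows))     # Turnover
print(nearest_row("Net profit", rows))  # Net income
```

Exact text matching finds no row for "Revenue" because the table calls it "Turnover", while the embedding lookup maps the query to the semantically closest header. Swapping in real pretrained vectors would only change how `header_vector` looks words up.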

Citations (3)