
LMDX: Language Model-based Document Information Extraction and Localization (2309.10952v2)

Published 19 Sep 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Large language models (LLMs) have revolutionized NLP, improving the state of the art and exhibiting emergent capabilities across various tasks. However, they have not yet been successfully applied to extracting information from visually rich documents, a task at the core of many document processing workflows that involves extracting key entities from semi-structured documents. The main obstacles to adopting LLMs for this task are the absence of layout encoding within LLMs, which is critical for high-quality extraction, and the lack of a grounding mechanism to localize predicted entities within the document. In this paper, we introduce LLM-based Document Information Extraction and Localization (LMDX), a methodology that reframes the document information extraction task for an LLM. LMDX enables extraction of singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. Finally, we apply LMDX to the PaLM 2-S and Gemini Pro LLMs and evaluate it on the VRDU and CORD benchmarks, setting a new state of the art and showing how LMDX enables the creation of high-quality, data-efficient parsers.
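The abstract's "layout encoding" refers to serializing OCR text together with spatial information so a text-only LLM can reason about document geometry. As a rough illustration only, the sketch below builds a prompt in that spirit: each OCR line's pixel coordinates are quantized into coarse buckets and appended to its text. The `quantize` and `build_prompt` helpers, the two-digit `xx|yy` code format, and the OCR input shape are all assumptions for this sketch, not the paper's exact scheme.

```python
def quantize(value, extent, buckets=100):
    """Map a pixel coordinate to a coarse bucket (0..buckets-1) so that
    nearby text shares similar position codes. Assumed scheme, not the
    paper's exact quantization."""
    return min(buckets - 1, int(value / extent * buckets))

def build_prompt(ocr_lines, page_w, page_h, task):
    """Serialize OCR lines as 'text xx|yy' segments, then append the
    extraction task. `ocr_lines` is a hypothetical OCR output format:
    a list of dicts with 'text', 'x', 'y' keys."""
    segments = []
    for line in ocr_lines:
        xq = quantize(line["x"], page_w)
        yq = quantize(line["y"], page_h)
        segments.append(f'{line["text"]} {xq:02d}|{yq:02d}')
    return "\n".join(segments) + "\n\n" + task
```

For example, a receipt line at pixel position (500, 900) on a 1000x1000 page would serialize as `Total: $10.00 50|90`, letting the model associate entities with approximate page regions and emit coordinate codes that can be mapped back to the source OCR segments for localization.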

