Legal-HNet: Mixing Legal Long-Context Tokens with Hartley Transform (2311.05089v1)

Published 9 Nov 2023 in cs.CL and cs.AI

Abstract: Since its introduction, the transformer architecture has seen wide adoption in NLP applications, but it also has limitations. Although the self-attention mechanism can generate very rich representations of the input text, its effectiveness may be limited in specialized domains such as legal, where, for example, LLMs often have to process very long texts. In this paper, we explore replacing the attention-based layers with simpler token-mixing mechanisms: Hartley and Fourier transforms. Using these non-parametric techniques, we train models from scratch on long input documents in the legal domain. We also introduce a new hybrid Seq2Seq architecture, an attention-free encoder connected to an attention-based decoder, which performs quite well on existing summarization tasks with much lower compute and memory requirements. We believe that similar, if not better, performance can be achieved by adopting these simpler infrastructures, including on abstractive text summarization tasks with long-range correlations. This not only makes training models from scratch accessible to more people, but also helps reduce the carbon footprint of training.
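To make the idea of non-parametric token mixing concrete, here is a minimal sketch of a discrete Hartley transform (DHT) applied along the sequence axis of a small token matrix. It is an illustration under stated assumptions, not the paper's implementation: the function names `dht` and `hartley_mix` are hypothetical, the O(N^2) loop stands in for a fast transform, and mixing only along the sequence dimension is an assumption (the actual layer may also mix along the hidden dimension).

```python
import math


def dht(x):
    """Discrete Hartley transform of a real sequence.

    H[k] = sum_t x[t] * (cos(2*pi*k*t/N) + sin(2*pi*k*t/N))
    Reference O(N^2) version; real input gives real output.
    """
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for t in range(n):
            angle = 2.0 * math.pi * k * t / n
            acc += x[t] * (math.cos(angle) + math.sin(angle))
        out.append(acc)
    return out


def hartley_mix(tokens):
    """Mix information across token positions (hypothetical helper).

    `tokens` is a list of seq_len rows, each a list of hidden_dim floats.
    Each hidden-dimension column is transformed independently along the
    sequence axis, so every output position depends on every input position.
    No learned parameters are involved.
    """
    seq_len = len(tokens)
    hidden = len(tokens[0])
    columns = [dht([tokens[t][d] for t in range(seq_len)]) for d in range(hidden)]
    # Transpose back to (seq_len, hidden_dim) layout.
    return [[columns[d][t] for d in range(hidden)] for t in range(seq_len)]


tokens = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
mixed = hartley_mix(tokens)
```

Unlike the Fourier transform, the Hartley transform maps real inputs to real outputs, so no imaginary part has to be discarded. It is also its own inverse up to a factor of 1/N, which gives a quick sanity check: applying `hartley_mix` twice and dividing by the sequence length recovers the original tokens.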

Authors (2)

Daniele Giofré
Sneha Ghantasala
