Emergent Mind

Legal-HNet: Mixing Legal Long-Context Tokens with Hartley Transform

(2311.05089)
Published Nov 9, 2023 in cs.CL and cs.AI

Abstract

Since its introduction, the transformers architecture has seen great adoption in NLP applications, but it also has limitations. Although the self-attention mechanism allows for generating very rich representations of the input text, its effectiveness may be limited in specialized domains such as legal, where, for example, language models often have to process very long texts. In this paper, we explore alternatives to replace the attention-based layers with simpler token-mixing mechanisms: Hartley and Fourier transforms. Using these non-parametric techniques, we train models with long input documents from scratch in the legal domain setting. We also introduce a new hybrid Seq2Seq architecture, a no-attention-based encoder connected with an attention-based decoder, which performs quite well on existing summarization tasks with much less compute and memory requirements. We believe that similar, if not better performance, as in the case of long correlations of abstractive text summarization tasks, can be achieved by adopting these simpler infrastructures. This not only makes training models from scratch accessible to more people, but also contributes to the reduction of the carbon footprint during training.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.