
Abstract

Memory-augmented LLMs have demonstrated remarkable performance in long-term human-machine interactions, which fundamentally rely on iterative recall and reasoning over history to generate high-quality responses. However, such repeated recall-reason steps easily produce biased thoughts, i.e., inconsistent reasoning results when the same history is recalled for different questions. In contrast, humans can keep thoughts in memory and recall them without repeated reasoning. Motivated by this human capability, we propose a novel memory mechanism called TiM (Think-in-Memory) that enables LLMs to maintain an evolving memory of historical thoughts along the conversation stream. The TiM framework consists of two crucial stages: (1) before generating a response, an LLM agent recalls relevant thoughts from memory, and (2) after generating a response, the LLM agent post-thinks and incorporates both historical and new thoughts to update the memory. TiM thus eliminates repeated reasoning by saving post-thinking thoughts as the history. In addition, we formulate basic principles for organizing the thoughts in memory based on well-established operations (i.e., insert, forget, and merge), allowing the thoughts to be dynamically updated and evolved. Furthermore, we introduce Locality-Sensitive Hashing into TiM to achieve efficient retrieval in long-term conversations. We conduct qualitative and quantitative experiments on real-world and simulated dialogues covering a wide range of topics, demonstrating that equipping existing LLMs with TiM significantly enhances their performance in generating responses for long-term interactions.

Overview

  • The paper introduces Think-in-Memory (TiM), a novel framework for enhancing LLMs by incorporating a human-like memory mechanism, improving long-term interaction capabilities.

  • TiM reduces biased or repetitive reasoning by enabling LLMs to recall historical thoughts and post-think after each response, and it manages memory efficiently with insert, forget, and merge operations.

  • Rigorous analysis demonstrates that TiM enhances LLMs' long-term memory capabilities without requiring significant architectural changes, making it applicable across various LLM architectures.

  • Empirical results indicate that TiM significantly outperforms conventional memory mechanisms in tasks requiring long-term memory, such as medical dialogues, showing improved retrieval accuracy, response correctness, and contextual coherence.

Introduction

The continuous evolution of LLMs has significantly pushed the boundaries of AI's capabilities, with exceptional performance in domains ranging from finance to healthcare. However, their application in long-term human-machine interactions remains hampered by an inherent limitation: LLMs struggle to handle prolonged conversational history effectively. In particular, the typical approach of iteratively recalling and reasoning over historical data often leads to inconsistent outcomes and a high computational load. This paper introduces a novel framework, Think-in-Memory (TiM), designed to mitigate these limitations by incorporating a human-like memory mechanism into LLMs, thereby enhancing their long-term interaction capabilities.

The Think-in-Memory (TiM) Framework

TiM stands out by enabling LLMs to recall historical thoughts and post-think after generating a response, significantly reducing the tendency towards biased or repetitive reasoning. This two-stage framework not only facilitates the generation of more accurate and coherent responses but also dynamically updates memories to reflect new interactions and learnings. Importantly, TiM incorporates basic operations such as insert, forget, and merge to manage the memory efficiently, making it highly reflective of human cognitive processes.
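To make the mechanics concrete, here is a minimal Python sketch of the recall-respond-post-think loop and the three memory operations. The `ThoughtMemory` class, the `embed` and `llm` callables, and the prompt wording are illustrative assumptions for this summary, not the paper's actual implementation.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

class ThoughtMemory:
    """Hypothetical thought store; each entry is a (thought_text, embedding) pair."""

    def __init__(self, embed):
        self.embed = embed          # assumed callable: str -> np.ndarray
        self.thoughts = []

    # --- the three organization operations named in the paper ---
    def insert(self, thought):
        """Store a newly generated thought."""
        self.thoughts.append((thought, self.embed(thought)))

    def forget(self, is_stale):
        """Drop thoughts flagged as outdated or contradicted."""
        self.thoughts = [(t, v) for t, v in self.thoughts if not is_stale(t)]

    def merge(self, i, j, merged_text):
        """Consolidate two overlapping thoughts into one."""
        self.thoughts = [tv for k, tv in enumerate(self.thoughts) if k not in (i, j)]
        self.insert(merged_text)

    def recall(self, query, k=3):
        """Stage 1: return the k stored thoughts most similar to the query."""
        qv = self.embed(query)
        ranked = sorted(self.thoughts, key=lambda tv: cosine(qv, tv[1]), reverse=True)
        return [t for t, _ in ranked[:k]]

def respond(llm, memory, question):
    # Stage 1: recall saved thoughts instead of re-reasoning over raw history.
    context = memory.recall(question)
    answer = llm(question, context)

    # Stage 2: post-think -- distill this turn into a reusable thought
    # so future turns can recall the conclusion rather than re-derive it.
    new_thought = llm(f"State the key fact or inference from Q: {question} A: {answer}", [])
    memory.insert(new_thought)
    return answer
```

Because each turn's reasoning is persisted as a thought, later questions recall conclusions rather than re-deriving them from raw history, which is the consistency property TiM targets.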

Key Contributions

The paper makes bold claims about the effectiveness of TiM, backed by rigorous qualitative and quantitative analysis. By introducing principles for thought organization based on well-established cognitive processes and utilizing Locality-Sensitive Hashing for efficient memory retrieval, TiM sets a new precedent for enhancing LLMs' long-term memory capabilities. The method's LLM-agnostic nature further ensures wide applicability across various LLM architectures without necessitating significant modifications.
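This summary does not spell out the paper's exact LSH configuration, but a common choice for cosine-similarity retrieval is random-hyperplane hashing, sketched below; the parameters and bucket layout are assumptions for illustration.

```python
import numpy as np

class LSHIndex:
    """Random-hyperplane LSH sketch: similar embeddings tend to share a bucket,
    so a query inspects one bucket instead of scanning the whole memory."""

    def __init__(self, dim, n_planes=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_planes, dim))  # random hyperplanes
        self.buckets = {}                               # hash key -> [(text, vec)]

    def _key(self, v):
        # Sign pattern of projections onto the hyperplanes, as a hashable key.
        return ((self.planes @ v) > 0).tobytes()

    def insert(self, text, v):
        self.buckets.setdefault(self._key(v), []).append((text, v))

    def query(self, v, k=3):
        # Rank only the candidates that landed in the query's bucket.
        candidates = self.buckets.get(self._key(v), [])
        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        return sorted(candidates, key=lambda tv: cos(v, tv[1]), reverse=True)[:k]
```

In practice, several independent hash tables are typically used so that near neighbors missed by one table are caught by another; this single-table version only illustrates the constant-time bucket lookup that keeps retrieval cheap as conversations grow.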

Numerical Results and Insights

The paper presents compelling numerical results demonstrating TiM's superiority over previous memory mechanisms. Across multiple datasets covering a broad range of topics and languages, LLMs equipped with TiM consistently outperform those using conventional memory mechanisms in terms of retrieval accuracy, response correctness, and contextual coherence. These improvements are particularly pronounced in real-world applications such as medical dialogues, where accurate, long-term memory recall is crucial for diagnosis accuracy and reliability.

Implications for the Future of AI Interactions

By addressing LLMs' limitations in long-term memory processing, TiM paves the way for more sophisticated and human-like AI agents capable of sustaining meaningful, multi-turn interactions across diverse domains. The framework's potential goes beyond enhancing existing LLMs to suggest a new design paradigm for future AI systems, focusing on cognitive accuracy and efficiency in long-term scenarios.

Conclusion

TiM represents a significant advancement in the field of LLMs, offering a viable solution to the challenge of enabling long-term memory in AI agents. Through a well-conceptualized framework and compelling experimental evidence, this paper highlights the importance of mirroring human-like memory mechanisms in AI systems to improve their interaction capabilities and effectiveness in real-world applications.
