
Abstract

Multi-modal large language models (MLLMs) have achieved powerful capabilities for visual semantic understanding in recent years. However, little is known about how MLLMs comprehend visual information and how they interpret features from different modalities. In this paper, we propose a new method for identifying multi-modal neurons in transformer-based multi-modal LLMs. Through a series of experiments, we highlight three critical properties of multi-modal neurons using four well-designed quantitative evaluation metrics. Furthermore, we introduce a knowledge editing method based on the identified multi-modal neurons that modifies a specific token into another designated token. We hope our findings can inspire further explanatory research on the mechanisms of multi-modal LLMs.
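
The abstract describes the approach only at a high level. As a rough illustration of what neuron identification and neuron-based token editing can look like, the sketch below scores FFN neurons by how strongly their output directions, weighted by activations on image tokens, contribute to a target token's logit, and then re-aims the top-scoring neuron toward a designated replacement token. All tensor names, shapes, the scoring rule, and the edit rule are illustrative assumptions on toy random weights, not the paper's actual implementation.

# Hedged sketch: toy stand-ins for one FFN layer of a multi-modal LLM.
# Nothing here is taken from the paper; the scoring and edit rules are assumptions.
import torch

torch.manual_seed(0)

d_model, d_ffn, vocab_size, n_img_tokens = 64, 256, 1000, 9

# Assumed ingredients: FFN activations on image tokens, the FFN down-projection,
# and the unembedding matrix of the language model.
ffn_acts = torch.rand(n_img_tokens, d_ffn)            # neuron activations on image tokens
W_down = torch.randn(d_ffn, d_model) / d_model ** 0.5 # FFN output projection
W_unembed = torch.randn(d_model, vocab_size) / d_model ** 0.5

target_token = 42  # vocabulary id of the concept word being probed (hypothetical)

# Score each neuron by how much its activation-weighted output direction
# contributes to the target token's logit, averaged over image tokens.
neuron_dirs = W_down @ W_unembed[:, target_token]  # (d_ffn,)
scores = (ffn_acts * neuron_dirs).mean(dim=0)      # (d_ffn,)
top_neurons = torch.topk(scores, k=5).indices
print("candidate multi-modal neurons:", top_neurons.tolist())

# Illustrative "knowledge editing": shift the top neuron's output direction so it
# pushes probability mass toward a different, designated token instead.
new_token = 77
edit_strength = 1.0
direction = W_unembed[:, new_token] - W_unembed[:, target_token]
with torch.no_grad():
    W_down[top_neurons[0]] += edit_strength * direction

In a real model, ffn_acts, W_down, and W_unembed would be read from a specific layer of the MLLM rather than sampled randomly, and any edit would be checked against held-out prompts to confirm it changes only the intended token.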
