- The paper introduces LaBraM with masked unsupervised pre-training and neural spectrum tokenization that effectively models heterogeneous EEG data.
- It segments EEG signals into fixed-length patches to handle diverse channel configurations and achieves state-of-the-art performance on tasks like TUAB and TUEV.
- Empirical results show significant gains in balanced accuracy and AUROC, with further improvements on emotion recognition and gait prediction tasks.
Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI
Motivation and Problem Setting
The development of EEG-based deep learning models in BCI has been constrained by task-specific dataset design, heterogeneity in electrode configurations, limited dataset sizes, and the low signal-to-noise ratio (SNR) inherent in EEG signals. Prior approaches primarily relied on CNNs, RNNs, and GNNs to encode spatial or temporal features, but these suffer from weak generalization and poor cross-dataset adaptability. The rise of large language models (LLMs) demonstrated a scalable paradigm for foundation models via generic self-supervised pre-training, motivating the search for EEG models capable of learning universal neural representations. However, EEG data acquisition faces unique hurdles: variation in channel layouts, short recording durations, inconsistent sample lengths, and annotation expense. The authors therefore set out to build a unified, cross-dataset EEG foundation model that leverages masked, unsupervised pre-training to efficiently capture generic EEG representations.
Model Architecture and Training Paradigms
The proposed Large Brain Model (LaBraM) employs a neural Transformer backbone with critical architectural innovations to handle arbitrary channel counts and signal lengths. EEG signals are segmented into fixed-length channel patches, enabling consistent input representation across heterogeneous setups. For each patch, a temporal encoder (stacked 1-D convolution blocks, group normalization, GELU) extracts temporal features. Learnable spatial and temporal embeddings are added to each patch embedding, providing channel and position-aware encoding. Patchwise attention and modified Transformer encoder layers (query/key normalization, bias omission) yield robust sequence representations.
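The fixed-length patching described above can be sketched as follows; the patch length (one second at an assumed 200 Hz sampling rate) and function name are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def segment_into_patches(eeg, patch_len=200):
    """Split a (channels, time) EEG array into fixed-length patches.

    Returns an array of shape (channels, n_patches, patch_len); trailing
    samples that do not fill a whole patch are dropped. Because patching
    operates per channel, any electrode montage maps onto the same
    patch-sequence input format.
    """
    n_ch, n_t = eeg.shape
    n_patches = n_t // patch_len
    return eeg[:, :n_patches * patch_len].reshape(n_ch, n_patches, patch_len)

# A 32-channel, 10-second recording at the assumed 200 Hz rate
eeg = np.random.randn(32, 2000)
patches = segment_into_patches(eeg)
print(patches.shape)  # (32, 10, 200)
```

Each patch is then passed through the temporal encoder, and spatial (channel) and temporal (position) embeddings are added before the Transformer layers.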
A significant novelty lies in the neural tokenizer: vector-quantized neural spectrum prediction. Instead of tokenizing raw EEG or reconstructing the waveform directly, the method discretizes patch representations by reconstructing amplitude and phase spectra (via the DFT), capturing neurophysiological semantics beyond time-domain noise. The codebook is trained with a cosine-similarity objective on ℓ2-normalized embeddings, which improves codebook utilization. Pre-training proceeds via masked EEG modeling: random patch tokens are replaced with learnable mask tokens, and the Transformer is trained to predict the masked tokens, akin to masked language modeling in NLP, with symmetric masking added to improve data efficiency and regularization.
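The tokenizer's two ingredients can be sketched in a few lines; the codebook size, embedding dimension, and function names below are assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def quantize(embedding, codebook):
    """Return the index of the codebook entry with the highest cosine
    similarity to the patch embedding (nearest neighbor on the sphere)."""
    sims = l2_normalize(codebook) @ l2_normalize(embedding)
    return int(np.argmax(sims))

def spectrum_targets(patch):
    """Amplitude and phase spectra of a 1-D patch via the real DFT;
    these serve as the tokenizer decoder's reconstruction targets."""
    spec = np.fft.rfft(patch)
    return np.abs(spec), np.angle(spec)

codebook = rng.standard_normal((8192, 64))   # assumed codebook size/dim
embedding = rng.standard_normal(64)
token = quantize(embedding, codebook)        # discrete neural token id

amp, phase = spectrum_targets(rng.standard_normal(200))
print(amp.shape, phase.shape)                # (101,) (101,)
```

Regressing spectra rather than raw samples makes the reconstruction target robust to the time-domain noise that dominates EEG at low SNR.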
Empirical Evaluation and Results
LaBraM was trained on over 2,500 hours of EEG signals from ~20 diverse datasets, representing the largest collection in BCI literature. Three model variants (5.8M–369M parameters) were tested across four downstream tasks: abnormal detection (TUAB), event type classification (TUEV), emotion recognition (SEED-V), gait prediction (MoBI).
Strong numerical improvements were demonstrated over all state-of-the-art baselines in both classification and regression metrics. On TUAB, LaBraM-Huge achieved a balanced accuracy of 0.8258 and AUROC of 0.9162, outperforming previous Transformer-based and CNN variants. On TUEV, the challenging multi-class classification task, LaBraM-Huge attained a balanced accuracy of 0.6616 and weighted F1 of 0.8329. The emotion and gait tasks corroborated generalization, with LaBraM-Huge yielding 0.4102 accuracy on SEED-V and a 0.5632 Pearson correlation for gait prediction. Notably, increasing both model size and pre-training data volume produced consistent improvements, in line with the scaling laws observed for LLMs, suggesting further gains are plausible with larger datasets and models.
Ablations confirmed key design choices: spatial embeddings proved essential for cross-dataset adaptability, symmetric masking enhanced performance (especially at large scale), and vector-quantized neural spectrum prediction was particularly effective for semantic representation learning. Fine-tuning experiments compared partial adaptation (the last n Transformer layers) against linear probing, showing that full or partial fine-tuning outperforms probing alone.
Implications, Limitations, and Outlook
LaBraM establishes a foundation-model paradigm for EEG-based BCI, capable of unsupervised cross-task generalization and of adapting to heterogeneous channel configurations without bespoke engineering. The approach demonstrates that strong representational capability and cross-task transfer can be acquired from unlabeled EEG data at scale, mitigating annotation bottlenecks. Practical implications include universal BCI interfaces, robust clinical diagnostic tools (seizure, sleep, emotion, motor imagery), and scalable deployment across varied hardware.
On the theoretical side, the neural tokenizer and spectrum-based tokenization bridge physiological semantics and foundation-model learning, opening avenues for aligning EEG codes with natural language, vision, and multimodal representations. Future directions include: i) collecting orders of magnitude more EEG data to explore emergent abilities, ii) leveraging parameter-efficient adaptation strategies (adapters, LoRA, prompt tuning) as in LLMs, iii) integrating multimodal physiological and behavioral signals, and iv) investigating the scaling regime for EEG foundation models.
Current limitations include dataset size compared to vision and language corpora, unimodal focus, and resource-intensive full fine-tuning. Addressing these will require broad collaborative data collection and methodological advances.
Conclusion
This work introduces LaBraM, a scalable foundation model for generic EEG representation learning in BCI. By segmenting signals, utilizing codebook-driven spectrum tokenization, and performing masked unsupervised Transformer pre-training, LaBraM delivers substantial improvements across diverse downstream tasks and proves robust to dataset heterogeneity. Its implications reach toward universal EEG interfaces, transfer learning, and multimodal alignment, marking a crucial step in large-scale EEG modeling for BCI.