- The paper demonstrates the efficacy of linear alignment with voxel-wise encoding models, achieving a 90.12% test identification accuracy in decoding music from fMRI data.
- It employs functional and anatomical alignment techniques to pinpoint critical brain regions like the superior temporal gyrus and primary auditory cortex that process musical stimuli.
- Temporal analysis indicates that extended neural engagement enhances identification accuracy, suggesting potential applications in personalized music therapy.
Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity
The paper "Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity" explores the feasibility of decoding musical information from neural data, utilizing functional Magnetic Resonance Imaging (fMRI) and advanced machine learning models. This paper is rooted in the growing interest in understanding the neural correlates of music perception and its potential applications in various domains, including personalized music recommendation systems and music therapy.
Overview
The investigation hinges on the GTZAN fMRI dataset, comprising data from five participants who listened to 540 musical stimuli spanning ten genres. The authors used the Contrastive Language-Audio Pretraining (CLAP) model to extract latent representations of the musical stimuli and built voxel-wise encoding models to pinpoint brain regions responsive to them. The paper underscores the importance of functional and anatomical alignment techniques for cross-subject decoding, which helps overcome limitations inherent in fMRI data, such as low temporal resolution and a low signal-to-noise ratio (SNR).
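The CLAP feature-extraction step can be illustrated with a short sketch. The version below uses the Hugging Face transformers CLAP implementation with the `laion/clap-htsat-unfused` checkpoint; the paper does not specify which CLAP weights, library, or preprocessing were used, so those details are assumptions for illustration only.

```python
# Minimal sketch: extracting CLAP latent representations for audio clips.
# Assumes the Hugging Face "laion/clap-htsat-unfused" checkpoint; the paper's
# exact CLAP weights and preprocessing are not specified here.
import librosa
import torch
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("laion/clap-htsat-unfused")
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

def clap_embeddings(wav_paths, sr=48_000):
    """Return one CLAP audio embedding per clip as an (n_clips, dim) tensor."""
    waveforms = [librosa.load(p, sr=sr)[0] for p in wav_paths]
    inputs = processor(audios=waveforms, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        return model.get_audio_features(**inputs)  # (n_clips, embedding_dim)
```

These embeddings serve as the shared feature space onto which both the encoding and decoding models described below are fit.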
Methodological Insights
The paper adopts a multifaceted approach, leveraging CLAP for feature extraction and various alignment techniques for optimizing cross-subject data integration:
- Cross-subject Alignment: Three techniques were tested: anatomical alignment, functional alignment via hyperalignment, and linear alignment using ridge regression (a minimal ridge-alignment sketch follows this list). These methods account for intersubject variability, improving the robustness of the decoding models.
- Encoding Models: Voxel-wise encoding models were built to map CLAP-extracted features onto neural data, identifying brain regions highly responsive to music. The models used cross-validated prediction correlations with a threshold of 0.1 to select 833 voxels considered critical for music processing (see the encoding-model sketch after this list).
- Decoding Pipeline: The decoding pipeline trained a ridge regression model on aligned brain activity and used a retrieval-based approach to match predicted musical features with their true counterparts in CLAP space, ranking candidates by L2 distance to identify the top-k nearest stimuli (see the retrieval sketch after this list).
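The linear-alignment idea can be sketched as a voxel-to-voxel ridge mapping from a source subject into a reference subject's space, fit on responses to shared stimuli. The variable names and regularization grid below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: linear cross-subject alignment with ridge regression.
# X_src / X_ref are (n_shared_stimuli, n_voxels) response matrices for the
# same stimuli in two subjects; the alpha grid is an illustrative assumption.
from sklearn.linear_model import RidgeCV

def fit_linear_alignment(X_src, X_ref, alphas=(1.0, 10.0, 100.0, 1000.0)):
    """Learn a voxel-to-voxel mapping from a source subject to a reference subject."""
    aligner = RidgeCV(alphas=alphas)
    aligner.fit(X_src, X_ref)  # one multi-output ridge model
    return aligner

# Usage: project held-out source-subject data into the reference space.
# aligner = fit_linear_alignment(X_src_train, X_ref_train)
# X_src_aligned = aligner.predict(X_src_test)
```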
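The voxel-wise encoding step can likewise be sketched: a ridge model predicts each voxel's response from CLAP features, and voxels whose cross-validated prediction correlation exceeds 0.1 are retained (the paper reports 833 such voxels). The fold count and alpha value below are assumptions.

```python
# Minimal sketch: voxel-wise encoding model with correlation-based voxel selection.
# features: (n_stimuli, n_clap_dims) CLAP embeddings; bold: (n_stimuli, n_voxels)
# fMRI responses. The 5-fold split and alpha are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def select_music_voxels(features, bold, threshold=0.1, n_splits=5, alpha=100.0):
    """Return indices of voxels whose cross-validated prediction correlation exceeds the threshold."""
    preds = np.zeros_like(bold)
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(features):
        model = Ridge(alpha=alpha).fit(features[train_idx], bold[train_idx])
        preds[test_idx] = model.predict(features[test_idx])
    # Pearson correlation between predicted and measured responses, per voxel.
    corr = np.array([np.corrcoef(preds[:, v], bold[:, v])[0, 1] for v in range(bold.shape[1])])
    return np.where(corr > threshold)[0]
```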
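Finally, the retrieval-based decoding step can be sketched as ridge regression from the aligned, selected voxel responses to CLAP embeddings, followed by an L2 nearest-neighbour search over the candidate stimuli. The top-k logic and hyperparameters here are an illustrative reconstruction, not the authors' exact code.

```python
# Minimal sketch: decoding CLAP features from brain activity and retrieving
# the top-k nearest stimuli by L2 distance. Hyperparameters are assumptions.
import numpy as np
from sklearn.linear_model import Ridge

def decode_and_retrieve(X_train, Y_train, X_test, candidate_embeddings, k=5, alpha=100.0):
    """Predict CLAP embeddings from voxel responses, then rank candidates by L2 distance."""
    decoder = Ridge(alpha=alpha).fit(X_train, Y_train)
    Y_pred = decoder.predict(X_test)  # (n_test, n_clap_dims)
    # Pairwise L2 distances between predictions and every candidate embedding.
    dists = np.linalg.norm(Y_pred[:, None, :] - candidate_embeddings[None, :, :], axis=-1)
    return np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest stimuli

# Top-1 identification accuracy, assuming test trial i corresponds to candidate i:
# top1 = (decode_and_retrieve(...)[:, 0] == np.arange(len(X_test))).mean()
```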
Key Findings
The results reveal significant advancements in music decoding accuracy:
- Identification Accuracy: The best-performing method, linear alignment, achieved a test identification accuracy of 0.9012, surpassing both the anatomical and hyperalignment approaches. This highlights the efficacy of simple linear models for cross-subject music decoding.
- Genre Decoding: The model exhibited high performance in classifying musical genres, particularly for classical and jazz, while genres like disco and metal showed higher misclassification rates due to overlapping musical features.
- Temporal Dynamics: A detailed temporal analysis revealed that identification accuracy peaks toward the later part of the 15-second stimulus window, suggesting that prolonged neural engagement enhances decoding performance.
Neural Correlates and Practical Implications
The paper identifies critical brain regions involved in music perception, such as the superior temporal gyrus (STG), primary auditory cortex, planum temporale, and inferior parietal lobule. These areas play vital roles in auditory processing, suggesting that the neural mechanisms underlying music perception are rooted in complex, interconnected brain networks.
Future Directions
Future work could build on this foundation by incorporating neuroimaging techniques with higher temporal resolution, such as EEG or iEEG, which would be crucial for decoding the rhythmic elements of music more accurately. Additionally, exploring generative models that synthesize music from neural data holds promise for artistic and therapeutic applications. For instance, integrating music therapy with neural decoding could lead to personalized interventions for managing psychological conditions such as anxiety and depression.
Conclusion
"Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity" sets a new benchmark in the field of neuromusicology by demonstrating the feasibility and high accuracy of decoding music from cross-subject neural activity. The methodological rigor and significant findings pave the way for future research aimed at unraveling the neural dynamics of music perception and leveraging these insights for practical applications in personalized music therapy and beyond.